NES boot loader usage

This page covers usage of the boot loader, implementations, and tools. See the NES boot loader specification for details on the protocol and operation.

Boot loader cartridge

The following ROM runs a boot loader at reset, ready to receive a program. It prints the status of the boot loader, to help with diagnosing problems when sending it a program block. Send it test.bin and it should make a low beep. It works on NTSC and PAL.

bootloader-2.zip: ROM that runs the boot loader at reset, source code, and test file.

Constructing a program block

To build a 256-byte program, first assemble the code into a 256-byte file with the code beginning at offset 7. With the ca65 assembler, this can be achieved with the following minimal boot.cfg file:

# boot.cfg
MEMORY {
    CODE: start = 0, size = $100, fill=yes;
}
SEGMENTS {
    CODE: load = CODE;
}

Then put the code in the source file, with the first 7 bytes reserved. This example plays a low tone:

        .res 7
        
        lda #$47
        sta $4015
        sta $4000
        sta $4001
        sta $4002
        sta $4003

forever:
        jmp forever

Assemble using the following:

ca65 tone.s
ld65 -C boot.cfg -o tone.bin tone.o

When assembled, tone.bin should contain the following:

00: 00 00 00 00 00 00 00 A9 47 8D 15 40 8D 00 40 8D
10: 01 40 8D 02 40 8D 03 40 4C 18 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
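
Before converting the file, it can help to confirm that the assembler output has the layout described above. The following Python sketch is only an illustration: the names expected_image and layout_problems are hypothetical, and the check that the reserved bytes are zero is an assumption based on the dump above, not a documented requirement.

```python
HEADER_SIZE = 7     # bytes reserved by ".res 7"
BLOCK_SIZE = 0x100  # "size = $100" in boot.cfg

# Machine code of the tone example, copied from the dump above.
TONE_CODE = bytes.fromhex(
    "A9 47 8D 15 40 8D 00 40 8D 01 40 8D 02 40 8D 03 40 4C 18 00"
)

def expected_image(code):
    """Reserved header, then code, then zero fill up to 256 bytes."""
    if HEADER_SIZE + len(code) > BLOCK_SIZE:
        raise ValueError("code does not fit in one program block")
    fill = BLOCK_SIZE - HEADER_SIZE - len(code)
    return bytes(HEADER_SIZE) + code + bytes(fill)

def layout_problems(image):
    """Return a list of problems found; empty if the layout looks right."""
    problems = []
    if len(image) != BLOCK_SIZE:
        problems.append("expected %d bytes, got %d" % (BLOCK_SIZE, len(image)))
    if any(image[:HEADER_SIZE]):
        problems.append("reserved bytes 0-6 are not zero")
    return problems
```

For a real file, something like layout_problems(open("tone.bin", "rb").read()) should return an empty list.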

Next, use make_boot.c, the program block creation tool. To build it on Linux, run:

gcc -o make_boot make_boot.c

To convert the assembled code into a program block suitable for sending to the boot loader, run:

./make_boot tone.bin

The result should look like this:

00: B8 45 CC 51 93 97 B8 6A 1D 4E 57 FD 4E FF FD 4E
10: 7F FD 4E BF FD 4E 3F FD CD E7 FF FF FF FF FF FF
20: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
...
E0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
F0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
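
Comparing the two dumps suggests how the code bytes are transformed: each byte appears with all bits inverted and the bit order reversed (for example $A9 becomes $6A, and $00 becomes $FF). The Python sketch below reproduces only that per-byte transformation, as inferred from the dumps on this page. It does not generate the first seven bytes that make_boot fills in, and the function names are hypothetical; the specification remains the authority on the format.

```python
def encode_byte(value):
    """Invert all eight bits, then reverse their order."""
    inverted = value ^ 0xFF
    return int(format(inverted, "08b")[::-1], 2)

def encode(data):
    """Apply encode_byte to every byte of a bytes object."""
    return bytes(encode_byte(b) for b in data)

# First five code bytes of tone.bin (offset 7 onward) and their
# counterparts in the converted block shown above.
plain = bytes.fromhex("A9 47 8D 15 40")
wire = bytes.fromhex("6A 1D 4E 57 FD")
print(encode(plain) == wire)
```

Applying the transformation twice gives back the original byte, so the same function can be used to decode a block for inspection.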

Send this to the NES. On Linux, the file may be sent as raw binary with the following commands (replace ttyUSB0 with your serial device name):

stty -F /dev/ttyUSB0 sane
stty -F /dev/ttyUSB0 raw 57600 cs8 -crtscts
cat tone.bin > /dev/ttyUSB0
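
The same steps can be approximated from a script. The sketch below uses Python's standard termios module and is only a rough equivalent of the stty settings above (raw mode, 57600 baud, 8 data bits, no hardware flow control); it does not replicate "stty sane". The helper names raw_8n1 and send_file are hypothetical, and /dev/ttyUSB0 is the same placeholder device name.

```python
import os
import termios

def raw_8n1(attrs, speed):
    """Return termios attributes for raw mode, 8 data bits, no RTS/CTS."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    iflag &= ~(termios.IGNBRK | termios.BRKINT | termios.PARMRK
               | termios.ISTRIP | termios.INLCR | termios.IGNCR
               | termios.ICRNL | termios.IXON)
    oflag &= ~termios.OPOST
    lflag &= ~(termios.ECHO | termios.ECHONL | termios.ICANON
               | termios.ISIG | termios.IEXTEN)
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CRTSCTS)
    cflag |= termios.CS8
    return [iflag, oflag, cflag, lflag, speed, speed, list(cc)]

def send_file(path, device="/dev/ttyUSB0"):
    """Configure the serial device and write the whole file to it."""
    with open(path, "rb") as f:
        data = f.read()
    fd = os.open(device, os.O_WRONLY | os.O_NOCTTY)
    try:
        attrs = raw_8n1(termios.tcgetattr(fd), termios.B57600)
        termios.tcsetattr(fd, termios.TCSANOW, attrs)
        while data:                      # os.write may write only part
            data = data[os.write(fd, data):]
        termios.tcdrain(fd)              # wait until everything is sent
    finally:
        os.close(fd)
```

With this in place, send_file("tone.bin") plays the role of the cat command.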

Receiving further data

A program can receive additional data once it begins executing, for example to receive more code to write to $100-$1FF or data to load into another part of memory.

The boot loader may delay execution of the program long enough that some bytes are lost, so padding must be added between the program block and any additional data. The make_boot tool does this by appending 64 bytes of padding: $FF bytes, with the last one set to $FE. To skip this padding, keep reading bytes until $FE is received.
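
The effect of the padding can be modelled on the host. In this Python sketch, bytes lost at the start of the padding do not change what the program receives, as long as the final $FE survives. The function names are hypothetical, and the model assumes that bytes are lost whole and only from the front of the padding.

```python
PAD_LENGTH = 64   # total padding bytes, per the description above
SYNC = 0xFE       # last padding byte

def padding():
    """63 bytes of $FF followed by a single $FE."""
    return bytes([0xFF] * (PAD_LENGTH - 1) + [SYNC])

def receive_after_sync(stream, count=256):
    """Discard bytes up to and including the first $FE, then take count bytes."""
    start = stream.index(SYNC) + 1
    return stream[start:start + count]

extra = bytes(range(256))          # example payload for $100-$1FF
sent = padding() + extra
for lost in (0, 10, 63):           # number of padding bytes that went missing
    print(lost, receive_after_sync(sent[lost:]) == extra)
```

Note that the payload itself may contain $FE; only the first $FE seen after the program starts is treated as the end of the padding.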

The following code receives an additional 256 bytes and writes them to $100-$1FF.

        ; Skip $FF synchronization bytes until $FE is received.
sync:   jsr read_serial
        cmp #$FE
        bne sync
        
        ; Receive 256 bytes and write to $100-$1FF
        ldx #0
more:   jsr read_serial
        sta $100,x
        inx
        bne more
        
        ...

; Waits for and receives byte via serial on second controller port.
; No more than 19 cycles may be spent between calls to this routine,
; or data will be lost.
; Out: A = received byte
; Preserved: X, Y
read_serial:
        lda #1
start:  bit $4017       ; Wait for start bit
        beq start
        lsr
        ror
        nop             ; Remove for PAL timing
dbit:   lsr $4017       ; Read start bit and first 7 data bits
        pha
        pla
        pha
        pla
        nop             ; Remove for PAL timing
        nop
        ror a
        bcs last        ; Loop until carry shifts out
        jmp dbit
last:   nop
        lsr $4017       ; Read final data bit
        ror a
        eor #$FF        ; Un-invert received byte
        rts
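
The shift-register logic of read_serial can be followed more easily in a host-side model. The Python sketch below ignores timing entirely and assumes that the NES sees every bit inverted, which is what the start-bit test and the final eor #$FF in the routine imply. The names line_samples and read_serial_model are hypothetical.

```python
def line_samples(byte):
    """Levels read from $4017 bit 0 for one frame: start bit, then the
    data bits least significant first, all inverted (start reads as 1)."""
    return [1] + [((byte >> i) & 1) ^ 1 for i in range(8)]

def read_serial_model(samples):
    """Mirror the lsr $4017 / ror a sequence of read_serial."""
    it = iter(samples)
    a = 0x80                              # after lda #1, lsr, ror
    while True:
        carry_in = next(it)               # lsr $4017
        carry_out = a & 1
        a = (carry_in << 7) | (a >> 1)    # ror a
        if carry_out:                     # bcs last: marker bit shifted out
            break
    carry_in = next(it)                   # final lsr $4017
    a = (carry_in << 7) | (a >> 1)        # ror a drops the start bit
    return a ^ 0xFF                       # eor #$FF

print(hex(read_serial_model(line_samples(0xA9))))
```

The marker bit loaded into A is what ends the loop after eight reads (the start bit and seven data bits), so no separate bit counter is needed.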

Boot loader implementations

Below are several implementations of the boot loader, each with a different tradeoff between robustness and code size.

Each implementation has nearly ideal serial timing. The times below are in CPU cycles.

Time interval                      NTSC                PAL
                                   Actual    Ideal     Actual    Ideal
Start bit to middle of data bit    46.5±3.5  46.6      44.5±3.5  43.3
One data bit to the next           31        31.1      29        28.9
Last bit to start bit checking     26-37     20-42     28-37     19-39
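
The ideal values in the first two rows follow from the CPU clock and the baud rate. The Python sketch below recomputes them, assuming approximate CPU clocks of 1.789773 MHz (NTSC) and 1.662607 MHz (PAL) and the 57600 baud rate used in the stty command earlier on this page. It also adds up the bit loop of the Basic implementation listed below, using standard 6502 cycle counts and assuming that no taken branch crosses a page boundary.

```python
NTSC_CPU_HZ = 1789773   # assumed NTSC CPU clock
PAL_CPU_HZ = 1662607    # assumed PAL CPU clock
BAUD = 57600

def ideal_cycles(cpu_hz, baud=BAUD):
    """Return (start bit to middle of first data bit, bit to bit) in cycles."""
    per_bit = cpu_hz / baud
    return round(1.5 * per_bit, 1), round(per_bit, 1)

# Basic implementation, from one "lsr $4017" to the next (NTSC version).
BASIC_BIT_LOOP = [
    ("rol a", 2), ("bcc dbit (taken)", 3),
    ("dey", 2), ("bne dbit (taken)", 3),
    ("dey", 2), ("bne dbit (taken)", 3),
    ("dey", 2), ("bne dbit (not taken)", 2),
    ("ldy #3", 2), ("nop", 2), ("nop", 2), ("lsr $4017", 6),
]

def loop_cycles(loop, pal=False):
    """Sum the loop; the PAL version drops one 2-cycle NOP."""
    total = sum(cycles for _, cycles in loop)
    return total - 2 if pal else total

print(ideal_cycles(NTSC_CPU_HZ), loop_cycles(BASIC_BIT_LOOP))
print(ideal_cycles(PAL_CPU_HZ), loop_cycles(BASIC_BIT_LOOP, pal=True))
```

Because the loop length must be a whole number of cycles, the actual bit period differs slightly from the ideal one, and the sampling point drifts by a fraction of a cycle per bit over the course of a byte.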

Basic

A basic implementation waits for the first signature byte and verifies the 8-bit checksum. This one is 47 bytes.

sum = <0                ; Checksum; initialized when first byte is written
badcrc: ldx #0          ; Number of bytes received
notsig:
byte:   lda #$01        ; Wait for start bit
start:  bit $4017
        beq start
        ldy #6          ; Delay from start bit to middle of data bit
dbit:   dey
        bne dbit
        ldy #3          ; Delay between bits
        nop             ; Remove this NOP for PAL timing
        nop
        lsr $4017       ; Read data bit
        rol a
        bcc dbit
        sta 0,x         ; Store received byte
        cpx #1          ; Verify first byte of signature
        bcs past
        cmp #$E2
        bne notsig
past:   adc sum         ; Update checksum
        sta sum
        inx
        bne byte
        tay             ; Verify checksum
        bne badcrc
        jmp $0007       ; Execute received code
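
The checksum arithmetic in this listing can be modelled on the host. Reading the code, the running sum starts as the first byte itself (it is stored at address 0, where sum lives), and every adc executes with carry set by the preceding cmp or cpx, so each byte adds itself plus one. The Python sketch below is an interpretation of the listing rather than a statement of the specification; it checks the program block shown earlier on this page. The function names are hypothetical.

```python
def decode(wire):
    """Bytes as the loader stores them: bit order reversed, then inverted."""
    return bytes(int(format(b, "08b")[::-1], 2) ^ 0xFF for b in wire)

def basic_checksum(block):
    """Final value of sum after 256 bytes; the loader accepts only zero."""
    if len(block) != 256 or block[0] != 0xE2:
        raise ValueError("need 256 bytes starting with the $E2 signature byte")
    total = block[0]                        # sta 0,x with X = 0 overwrites sum
    for byte in block:
        total = (total + byte + 1) & 0xFF   # adc sum with carry set
    return total

# Program block for the tone example, as sent over the wire.
WIRE = bytes.fromhex(
    "B8 45 CC 51 93 97 B8 6A 1D 4E 57 FD 4E FF FD 4E"
    "7F FD 4E BF FD 4E 3F FD CD E7"
) + b"\xFF" * 230

print(basic_checksum(decode(WIRE)))
```

A single corrupted byte changes the final value, so the loader restarts and waits for another block instead of jumping to $0007.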

Minimal

A minimal implementation receives the 256-byte program block and begins executing it, without any checking. This one is only 30 bytes.

        ; NTSC version
        ldx #0          ; Number of bytes received
byte:   lda #$01
start:  bit $4017       ; Wait for start bit
        beq start
        lsr             ; A = 0
        nop
dbit:   ldy #3          ; Delay between bits
        lsr $4017       ; Read bit. First time reads 1 for start bit.
dly:    dey             ; Delay
        bne dly
        rol a           ; Move bit into shift register
        sta 0,x         ; Delay, and store received byte on final iter
        bcc dbit
        inx
        bne byte
        jmp $0007       ; Execute received code

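The PAL version reduces the delay between bits by two cycles: the 4-cycle sta 0,x inside the bit loop is replaced by a 2-cycle nop and moved after the loop, and the nop before the loop is dropped.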

        ; PAL version
        ldx #0          ; Number of bytes received
byte:   lda #$01
start:  bit $4017       ; Wait for start bit
        beq start
        lsr             ; A = 0
dbit:   ldy #3          ; Delay between bits
        lsr $4017       ; Read bit. First time reads 1 for start bit.
dly:    dey             ; Delay
        bne dly
        rol a           ; Move bit into shift register
        nop
        bcc dbit
        sta 0,x         ; Store received byte
        inx
        bne byte
        jmp $0007       ; Execute received code

Full

A full implementation waits for the 4-byte signature and verifies the 16-bit checksum, a CRC-16. This one is 93 bytes. For PAL timing, remove the indicated NOP.

crc     = <0
temp    = <2

badcrc:
notsig: ldx #0          ; Number of bytes received
byte:   lda #$01        ; Wait for start bit
start:  bit $4017
        beq start
        ldy #6          ; Delay from start bit to middle of data bit
dbit:   dey
        bne dbit
        ldy #3          ; Delay between data bits
        nop             ; Remove this NOP for PAL timing
        nop
        lsr $4017       ; Read bit
        rol a
        bcc dbit
        cpx #4          ; Verify signature if one of first four bytes
        bcs past
        eor #$E2        ; Handle signature with partial signature before it
        bne not1st
        ldx #0
not1st: eor signature-1,x ; Share last zero byte of JMP
        bne notsig
past:   sta 0,x         ; Write after signature verify clears A, so that
        inx             ; CRC gets cleared for later
        bne byte
        txa             ; Calculate CRC-16 of user data
        ldx #5
check:  eor 0,x
        sta crc+1       ; based on Greg Cook's CRC-16 code
        lsr
        lsr
        lsr
        lsr
        tay
        asl
        eor crc
        sta temp
        tya
        eor crc+1
        sta crc+1
        asl
        asl
        asl
        tay
        asl
        asl
        eor crc+1
        sta crc
        tya
        rol
        eor temp
        inx
        bne check
        eor crc         ; Verify checksum
        bne badcrc
        jmp $0007       ; Execute received code
signature:
        .byte $5D^$E2, $CC^$E2, $75^$E2
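
The signature search in this listing restarts whenever $E2 arrives among the first four bytes, so a partial signature followed by a complete one is still recognized. The Python sketch below models only that search, on bytes as the loader stores them; it does not model the CRC. The signature bytes are taken from the listing ($E2 followed by the three values in the .byte line), and the function name is hypothetical.

```python
SIGNATURE = bytes([0xE2, 0x5D, 0xCC, 0x75])

def signature_end(stream):
    """Return the index just past the first accepted signature, or None."""
    x = 0                                # bytes of the signature matched so far
    for i, byte in enumerate(stream):
        if byte == SIGNATURE[0]:
            x = 0                        # $E2 always restarts the match
        if byte == SIGNATURE[x]:
            x += 1
            if x == len(SIGNATURE):
                return i + 1
        else:
            x = 0                        # mismatch: back to waiting for $E2
    return None

# Noise, a lone $E2, then a complete signature.
print(signature_end(bytes([0x00, 0xE2, 0xE2, 0x5D, 0xCC, 0x75, 0x11])))
```

Once the signature is accepted, the listing stores the remaining bytes without further checks until all 256 have arrived.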

Change log