This page covers usage of the boot loader, implementations, and tools. See the NES boot loader specification for details on the protocol and operation.
The following ROM runs a boot loader at reset, ready to receive a program. It prints the status of the boot loader, to help with diagnosing problems when sending it a program block. Send it test.bin and it should make a low beep. It works on NTSC and PAL.
bootloader-2.zip: ROM that runs the boot loader at reset, source code, and test file.
To build a 256-byte program, first assemble the code into a 256-byte file with the code beginning at offset 7. With the cc65 toolchain (ca65 assembler and ld65 linker), this can be achieved with the following minimal ld65 configuration file, boot.cfg:
# boot.cfg
MEMORY {
CODE: start = 0, size = $100, fill=yes;
}
SEGMENTS {
CODE: load = CODE;
}
Then put the code in the source file, with the first 7 bytes reserved. This example plays a low tone:
.res 7
lda #$47
sta $4015
sta $4000
sta $4001
sta $4002
sta $4003
forever:
jmp forever
Assemble using the following:
ca65 tone.s
ld65 -C boot.cfg -o tone.bin tone.o
When assembled, tone.bin should contain the following:
00: 00 00 00 00 00 00 00 A9 47 8D 15 40 8D 00 40 8D
10: 01 40 8D 02 40 8D 03 40 4C 18 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
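To check an assembled file against the dump above, the expected image can be rebuilt from the listing. This is a minimal sketch in Python; the names expected_tone_image and first_mismatch are made up for illustration and are not part of the boot loader tools.

```python
# Code bytes transcribed from the dump above, annotated with the source lines.
CODE = bytes([
    0xA9, 0x47,        # lda #$47
    0x8D, 0x15, 0x40,  # sta $4015
    0x8D, 0x00, 0x40,  # sta $4000
    0x8D, 0x01, 0x40,  # sta $4001
    0x8D, 0x02, 0x40,  # sta $4002
    0x8D, 0x03, 0x40,  # sta $4003
    0x4C, 0x18, 0x00,  # forever: jmp forever (forever is at offset $18)
])


def expected_tone_image() -> bytes:
    """Return the 256-byte image: 7 reserved bytes, the code, then zero fill."""
    image = bytes(7) + CODE
    return image + bytes(256 - len(image))


def first_mismatch(actual: bytes, expected: bytes):
    """Return the first differing offset, or None if both images match."""
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    for offset, (a, b) in enumerate(zip(actual, expected)):
        if a != b:
            return offset
    return None
```

To use it, read tone.bin in binary mode and pass its contents to first_mismatch together with expected_tone_image(); a result of None means the file matches the dump.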
Next, you will be using make_boot.c, the program block creation tool. To build this tool on Linux, type the following:
gcc -o make_boot make_boot.c
To convert the assembled code into a program block suitable for sending to the boot loader, execute the following:
./make_boot tone.bin
The result should look like this:
00: B8 45 CC 51 93 97 B8 6A 1D 4E 57 FD 4E FF FD 4E
10: 7F FD 4E BF FD 4E 3F FD CD E7 FF FF FF FF FF FF
20: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
...
E0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
F0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
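Comparing this dump with the tone.bin dump suggests how the code bytes are encoded: from offset 7 onward, each byte appears inverted and with its bit order reversed. The sketch below checks that observation in Python. It is inferred only from the two dumps on this page, not taken from make_boot.c or the specification, and encode_byte is a hypothetical name.

```python
def encode_byte(value: int) -> int:
    """Invert all eight bits of a byte, then reverse their order."""
    inverted = value ^ 0xFF
    return int(f"{inverted:08b}"[::-1], 2)


# Offsets $07-$1A of tone.bin and of the program block, copied from the dumps.
TONE = bytes.fromhex("A9478D15408D00408D01408D02408D03404C1800")
BLOCK = bytes.fromhex("6A1D4E57FD4EFFFD4E7FFD4EBFFD4E3FFDCDE7FF")

# Every code byte maps to its program block counterpart.
assert bytes(encode_byte(b) for b in TONE) == BLOCK

# The first four block bytes match the signature bytes $E2, $5D, $CC, $75
# that appear in the boot loader listings on this page.
assert [encode_byte(b) for b in (0xE2, 0x5D, 0xCC, 0x75)] == [0xB8, 0x45, 0xCC, 0x51]
```

A plausible reason is that the boot loader listings shift received bits in with rol and never un-invert them, so the tool would have to compensate in advance. Bytes 4 to 6 of the block are not covered by this sketch.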
Send this to the NES. On Linux, the file may be sent as raw binary with the following commands (replace ttyUSB0 with your serial device name):
stty -F /dev/ttyUSB0 sane
stty -F /dev/ttyUSB0 raw 57600 cs8 -crtscts
cat tone.bin > /dev/ttyUSB0
A program can receive additional data once it begins executing, for example to receive more code to write to $100-$1FF or data to load into another part of memory.
The boot loader may delay execution of the program long enough that some bytes are lost, so padding must be added between the program block and any additional data. The make_boot tool does this, adding 64 bytes of $FF padding, with the last byte set to $FE. To skip this padding, keep reading bytes until $FE is received.
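The effect of the padding can be modeled on the host side. The sketch below assumes the 64 padding bytes are 63 bytes of $FF followed by a single $FE, which is one reading of the description above. It only illustrates why the scheme tolerates lost bytes and is not a replacement for make_boot; make_padding and skip_padding are hypothetical names.

```python
PAD_LENGTH = 64  # total padding length stated in the text above


def make_padding(length: int = PAD_LENGTH) -> bytes:
    """Return length-1 bytes of $FF followed by one $FE sync byte (assumed layout)."""
    return b"\xff" * (length - 1) + b"\xfe"


def skip_padding(received: bytes) -> bytes:
    """Discard everything up to and including the first $FE, like a sync loop."""
    return received[received.index(0xFE) + 1:]


# Additional data that itself contains $FE and $FF bytes.
extra = bytes(range(256))

# Even if the program misses up to 63 leading padding bytes, the first $FE it
# sees is still the sync byte, so the additional data arrives intact.
for lost in range(PAD_LENGTH):
    received = (make_padding() + extra)[lost:]
    assert skip_padding(received) == extra
```

If more than 63 bytes were lost, the sync byte itself would be missed and the receiver would lock onto the first $FE in the data instead.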
The following code receives an additional 256 bytes and writes them to $100-$1FF.
; Skip $FF synchronization bytes until $FE is received.
sync: jsr read_serial
cmp #$FE
bne sync
; Receive 256 bytes and write to $100-$1FF
ldx #0
more: jsr read_serial
sta $100,x
inx
bne more
...
; Waits for and receives byte via serial on second controller port.
; No more than 19 cycles may be spent between calls to this routine,
; or data will be lost.
; Out: A = received byte
; Preserved: X, Y
read_serial:
lda #1
start: bit $4017 ; Wait for start bit
beq start
lsr
ror
nop ; Remove for PAL timing
dbit: lsr $4017 ; Read start bit and first 7 data bits
pha
pla
pha
pla
nop ; Remove for PAL timing
nop
ror a
bcs last ; Loop until carry shifts out
jmp dbit
last: nop
lsr $4017 ; Read final data bit
ror a
eor #$FF ; Un-invert received byte
rts
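The shift-register logic of read_serial can be traced with a small host-side model. This Python sketch ignores timing entirely and assumes each sample is the value of bit 0 of $4017 at the moment of the read: the start bit reads as 1 and the data bits arrive least significant first and inverted, as the ror and eor #$FF instructions suggest. The function names are made up for illustration.

```python
def ror(a: int, carry: int):
    """Model the 6502 ROR A instruction: return (new A, new carry)."""
    return ((carry << 7) | (a >> 1)), a & 1


def samples_for(byte: int):
    """Port samples for one frame: start bit reads 1, then 8 inverted data bits, LSB first."""
    return [1] + [((byte >> i) & 1) ^ 1 for i in range(8)]


def read_serial_model(samples) -> int:
    """Assemble one byte the way read_serial does, without any timing."""
    reads = iter(samples)
    a = 0x80                    # after lda #1 / lsr / ror: marker bit in bit 7
    while True:
        carry = next(reads)     # lsr $4017 moves port bit 0 into carry
        a, carry = ror(a, carry)
        if carry:               # marker bit falls out after 8 reads
            break
    carry = next(reads)         # final data bit
    a, carry = ror(a, carry)
    return a ^ 0xFF             # un-invert the received byte


assert all(read_serial_model(samples_for(b)) == b for b in range(256))
```

The marker bit loaded into A is what ends the loop after the start bit and seven data bits, so the routine needs no separate bit counter.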
Below are several implementations of the boot loader, each with a different tradeoff between robustness and code size.
Each implementation has nearly ideal serial timing. All times below are in CPU cycles.
| Time interval | NTSC actual | NTSC ideal | PAL actual | PAL ideal |
|---|---|---|---|---|
| Start bit to middle of data bit | 46.5±3.5 | 46.6 | 44.5±3.5 | 43.3 |
| One data bit to the next | 31 | 31.1 | 29 | 28.9 |
| Last bit to start bit checking | 26-37 | 20-42 | 28-37 | 19-39 |
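The ideal values in the first two rows follow from the CPU clock rate and the baud rate. The sketch below reproduces them in Python, assuming the nominal CPU clocks of 1,789,773 Hz (NTSC) and 1,662,607 Hz (PAL) and the 57600 baud rate used in the stty command earlier on this page. The last row is not derived here.

```python
NTSC_CPU_HZ = 1_789_773  # assumption: nominal NTSC CPU clock
PAL_CPU_HZ = 1_662_607   # assumption: nominal PAL CPU clock
BAUD = 57_600            # from the stty command on this page


def cycles_per_bit(cpu_hz: int, baud: int = BAUD) -> float:
    """CPU cycles that elapse during one serial bit."""
    return cpu_hz / baud


def start_to_first_data_bit(cpu_hz: int, baud: int = BAUD) -> float:
    """Cycles from the start bit's leading edge to the middle of the first data bit."""
    return 1.5 * cycles_per_bit(cpu_hz, baud)


for name, hz in (("NTSC", NTSC_CPU_HZ), ("PAL", PAL_CPU_HZ)):
    print(f"{name}: {start_to_first_data_bit(hz):.1f} cycles to first data bit, "
          f"{cycles_per_bit(hz):.1f} cycles per bit")
```

The factor of 1.5 is one full start bit plus half a data bit, which places the sample in the middle of the first data bit.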
A basic implementation waits for the first signature byte and verifies the 8-bit checksum. This one is 47 bytes.
sum = <0 ; Checksum; initialized when first byte is written
badcrc: ldx #0 ; Number of bytes received
notsig:
byte: lda #$01 ; Wait for start bit
start: bit $4017
beq start
ldy #6 ; Delay from start bit to middle of data bit
dbit: dey
bne dbit
ldy #3 ; Delay between bits
nop ; Remove this NOP for PAL timing
nop
lsr $4017 ; Read data bit
rol a
bcc dbit
sta 0,x ; Store received byte
cpx #1 ; Verify first byte of signature
bcs past
cmp #$E2
bne notsig
past: adc sum ; Update checksum
sta sum
inx
bne byte
tay ; Verify checksum
bne badcrc
jmp $0007 ; Execute received code
A minimal implementation receives the 256-byte program block and begins executing it, without any checking. This one is only 30 bytes.
; NTSC version
ldx #0 ; Number of bytes received
byte: lda #$01
start: bit $4017 ; Wait for start bit
beq start
lsr ; A = 0
nop
dbit: ldy #3 ; Delay between bits
lsr $4017 ; Read bit. First time reads 1 for start bit.
dly: dey ; Delay
bne dly
rol a ; Move bit into shift register
sta 0,x ; Delay, and store received byte on final iter
bcc dbit
inx
bne byte
jmp $0007 ; Execute received code
The PAL version just reduces the delay by two cycles by reordering things slightly.
; PAL version
ldx #0 ; Number of bytes received
byte: lda #$01
start: bit $4017 ; Wait for start bit
beq start
lsr ; A = 0
dbit: ldy #3 ; Delay between bits
lsr $4017 ; Read bit. First time reads 1 for start bit.
dly: dey ; Delay
bne dly
rol a ; Move bit into shift register
nop
bcc dbit
sta 0,x ; Store received byte
inx
bne byte
jmp $0007 ; Execute received code
A full implementation waits for the 4-byte signature and verifies the 16-bit checksum (a CRC-16). This one is 93 bytes. For PAL timing, remove the indicated NOP.
crc = <0
temp = <2
badcrc:
notsig: ldx #0 ; Number of bytes received
byte: lda #$01 ; Wait for start bit
start: bit $4017
beq start
ldy #6 ; Delay from start bit to middle of data bit
dbit: dey
bne dbit
ldy #3 ; Delay between data bits
nop ; Remove this NOP for PAL timing
nop
lsr $4017 ; Read bit
rol a
bcc dbit
cpx #4 ; Verify signature if one of first four bytes
bcs past
eor #$E2 ; Handle signature with partial signature before it
bne not1st
ldx #0
not1st: eor signature-1,x ; Share last zero byte of JMP
bne notsig
past: sta 0,x ; Write after signature verify clears A, so that
inx ; CRC gets cleared for later
bne byte
txa ; Calculate CRC-16 of user data
ldx #5
check: eor 0,x
sta crc+1 ; based on Greg Cook's CRC-16 code
lsr
lsr
lsr
lsr
tay
asl
eor crc
sta temp
tya
eor crc+1
sta crc+1
asl
asl
asl
tay
asl
asl
eor crc+1
sta crc
tya
rol
eor temp
inx
bne check
eor crc ; Verify checksum
bne badcrc
jmp $0007 ; Execute received code
signature:
.byte $5D^$E2, $CC^$E2, $75^$E2