This page covers usage of the boot loader, implementations, and tools. See the NES boot loader specification for details on the protocol and operation.
The following ROM runs a boot loader at reset, ready to receive a program. It prints the boot loader's status to help diagnose problems when sending it a program block. Send it test.bin and it should play a low beep. It works on both NTSC and PAL.
bootloader-2.zip: ROM that runs the boot loader at reset, plus its source code and a test file.
To build a 256-byte program, first assemble the code into a 256-byte file with the code beginning at offset 7. With the ca65 assembler, this can be achieved with the following minimal boot.cfg file:
; boot.cfg
MEMORY {
    CODE: start = 0, size = $100, fill = yes;
}
SEGMENTS {
    CODE: load = CODE;
}
Then put the code in the source file, with the first 7 bytes reserved. This example plays a low tone:
    .res 7
    lda #$47
    sta $4015
    sta $4000
    sta $4001
    sta $4002
    sta $4003
forever:
    jmp forever
Assemble using the following:
ca65 tone.s
ld65 -C boot.cfg -o tone.bin tone.o
When assembled, tone.bin should contain the following:
00: 00 00 00 00 00 00 00 A9 47 8D 15 40 8D 00 40 8D
10: 01 40 8D 02 40 8D 03 40 4C 18 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
E0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Next, you will use make_boot.c, the program block creation tool. To build it on Linux, run:
gcc -o make_boot make_boot.c
To convert the assembled code into a program block suitable for sending to the boot loader, run:
./make_boot tone.bin
The result should look like this:
00: B8 45 CC 51 93 97 B8 6A 1D 4E 57 FD 4E FF FD 4E
10: 7F FD 4E BF FD 4E 3F FD CD E7 FF FF FF FF FF FF
20: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
...
E0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
F0: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
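Comparing the two dumps, each byte of the program block appears to be the corresponding memory byte with its bits reversed and then inverted: $A9 at offset 7 becomes $6A, $47 becomes $1D, and $00 padding becomes $FF. This fits the boot loaders below, which shift each received bit in with rol (so the first bit on the wire, the LSB, ends up in bit 7) and never invert the level. The C sketch below is an inference from the dumps, not code taken from make_boot.c; reverse_bits, encode_byte and decode_byte are hypothetical names. Decoding the first seven bytes this way gives $E2 $5D $CC $75 $36 $16 $E2, the signature followed by what appear to be checksum bytes that make_boot fills in.

```c
#include <stdint.h>

/* Reverse the bit order of a byte (bit 0 <-> bit 7, bit 1 <-> bit 6, ...). */
static uint8_t reverse_bits(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Memory byte -> byte sent over serial (mapping inferred from the dumps above). */
static uint8_t encode_byte(uint8_t mem)
{
    return (uint8_t)~reverse_bits(mem);
}

/* Byte sent -> memory byte. Reversal and inversion commute and each undoes
   itself, so decoding is the same mapping as encoding. */
static uint8_t decode_byte(uint8_t wire)
{
    return encode_byte(wire);
}
```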
Send the resulting program block to the NES. On Linux, the file can be sent as raw binary with the following commands (replace ttyUSB0 with your serial device name):
stty -F /dev/ttyUSB0 sane
stty -F /dev/ttyUSB0 raw 57600 cs8 -crtscts
cat tone.bin > /dev/ttyUSB0
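To send from a program instead of the shell, roughly the same port settings can be applied with termios. This is a sketch, not part of the boot loader tools: configure_port and send_file are hypothetical names, and cfmakeraw() and CRTSCTS are Linux/BSD extensions rather than strict POSIX. Like cat, it sends the file's bytes unchanged.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes cfmakeraw() and CRTSCTS on glibc */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Apply settings roughly equivalent to "stty raw 57600 cs8 -crtscts". */
int configure_port(struct termios *t)
{
    cfmakeraw(t);                         /* raw mode, no echo or line editing */
    t->c_cflag &= ~(tcflag_t)CSIZE;
    t->c_cflag |= CS8;                    /* 8 data bits */
    t->c_cflag &= ~(tcflag_t)CRTSCTS;     /* no hardware flow control */
    if (cfsetispeed(t, B57600) != 0 || cfsetospeed(t, B57600) != 0)
        return -1;
    return 0;
}

/* Send the bytes of path to a serial device such as /dev/ttyUSB0. */
int send_file(const char *dev, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int fd = open(dev, O_WRONLY | O_NOCTTY);
    if (fd < 0) {
        fclose(f);
        return -1;
    }

    struct termios t;
    int rc = 0;
    if (tcgetattr(fd, &t) != 0 || configure_port(&t) != 0 ||
        tcsetattr(fd, TCSANOW, &t) != 0)
        rc = -1;

    unsigned char buf[256];
    size_t n;
    while (rc == 0 && (n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t off = 0; off < n; ) {
            ssize_t w = write(fd, buf + off, n - off);
            if (w < 0) {
                rc = -1;
                break;
            }
            off += (size_t)w;
        }
    }
    if (rc == 0 && ferror(f))
        rc = -1;
    if (rc == 0 && tcdrain(fd) != 0)      /* wait until every byte is transmitted */
        rc = -1;

    close(fd);
    fclose(f);
    return rc;
}
```

For example, send_file("/dev/ttyUSB0", "tone.bin") mirrors the commands above.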
A program can receive additional data once it begins executing, for example to receive more code to write to $100-$1FF or data to load into another part of memory.
The boot loader may take a while to start executing the program, and bytes arriving during that time can be lost, so padding must be placed between the program block and any additional data. The make_boot tool adds 64 bytes of padding: all are $FF except the last, which is $FE. To skip the padding, keep reading bytes until $FE is received.
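A sketch of both sides of the padding in C, following the description above: make_padding builds the 64 bytes make_boot is described as adding, and find_data_start mirrors the sync loop below by discarding everything up to and including the first $FE, however many leading $FF bytes were lost. Both names are hypothetical. Because read_serial shifts bits in LSB first with ror and un-inverts them with eor #$FF, these bytes (and the data after them) should be sent as-is, without the bit reversal seen in the program block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD_LEN 64

/* 63 bytes of $FF followed by a single $FE marker. */
static void make_padding(uint8_t pad[PAD_LEN])
{
    memset(pad, 0xFF, PAD_LEN);
    pad[PAD_LEN - 1] = 0xFE;
}

/* Index of the first byte after the $FE marker in a received stream,
   or -1 if no marker was seen. */
static long find_data_start(const uint8_t *rx, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (rx[i] == 0xFE)
            return (long)(i + 1);
    return -1;
}
```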
The following code receives an additional 256 bytes and writes them to $100-$1FF.
; Skip $FF synchronization bytes until $FE is received.
sync:
    jsr read_serial
    cmp #$FE
    bne sync

; Receive 256 bytes and write to $100-$1FF
    ldx #0
more:
    jsr read_serial
    sta $100,x
    inx
    bne more
    ...

; Waits for and receives a byte via serial on the second controller port.
; No more than 19 cycles may be spent between calls to this routine,
; or data will be lost.
; Out: A = received byte
; Preserved: X, Y
read_serial:
    lda #1
start:
    bit $4017       ; Wait for start bit
    beq start
    lsr
    ror
    nop             ; Remove for PAL timing
dbit:
    lsr $4017       ; Read start bit and first 7 data bits
    pha
    pla
    pha
    pla
    nop             ; Remove for PAL timing
    nop
    ror a
    bcs last        ; Loop until carry shifts out
    jmp dbit
last:
    nop
    lsr $4017       ; Read final data bit
    ror a
    eor #$FF        ; Un-invert received byte
    rts
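The loop in read_serial uses A itself as its counter: lsr then ror leave a single 1 in bit 7, which moves right one place per bit read and drops into the carry after eight reads, leaving the start bit and seven data bits in A. The final ror then pushes the start bit out. The C model below steps through the same shifts; it assumes, as the eor #$FF and the "read start bit" comment imply, that $4017 bit 0 reads 1 for the start bit and the inverse of each data bit, LSB first. line_samples and read_serial_model are hypothetical names.

```c
#include <stdint.h>

/* Assumed levels seen at $4017 bit 0 for one serial byte:
   start bit reads 1, then each data bit reads inverted, LSB first. */
static void line_samples(uint8_t byte, uint8_t s[9])
{
    s[0] = 1;
    for (int i = 0; i < 8; i++)
        s[i + 1] = (uint8_t)(((byte >> i) & 1) ^ 1);
}

/* Step-by-step model of read_serial's shifts. */
static uint8_t read_serial_model(const uint8_t s[9])
{
    uint8_t a = 0x80;                        /* lda #1 / lsr / ror: marker in bit 7 */
    int k = 0;
    int carry;
    do {
        int bit = s[k++];                    /* lsr $4017: bit 0 -> carry */
        carry = a & 1;                       /* ror a: old bit 0 -> carry ... */
        a = (uint8_t)((bit << 7) | (a >> 1)); /* ... and read bit -> bit 7 */
    } while (!carry);                        /* bcs last once the marker drops out */
    a = (uint8_t)((s[k] << 7) | (a >> 1));   /* last: final bit in, start bit out */
    return (uint8_t)(a ^ 0xFF);              /* eor #$FF */
}
```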
Below are several implementations of the boot loader, each with a different tradeoff between robustness and code size.
Each implementation has nearly ideal serial timing. Times below are in CPU cycles.

| Time interval | NTSC actual | NTSC ideal | PAL actual | PAL ideal |
|---|---|---|---|---|
| Start bit to middle of data bit | 46.5±3.5 | 46.6 | 44.5±3.5 | 43.3 |
| One data bit to the next | 31 | 31.1 | 29 | 28.9 |
| Last bit to start bit checking | 26-37 | 20-42 | 28-37 | 19-39 |
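The first two Ideal values follow from the bit rate: one bit lasts (CPU clock / 57600) cycles, and the middle of the first data bit is 1.5 bit times after the start edge. The sketch below assumes the commonly cited CPU clocks of 1,789,773 Hz (NTSC) and 1,662,607 Hz (PAL) and the 57600 baud rate set with stty above; the macro and function names are illustrative.

```c
/* Assumed clock rates and bit rate; not taken from the boot loader sources. */
#define NTSC_CPU_HZ 1789773.0   /* 21.477272 MHz master clock / 12 */
#define PAL_CPU_HZ  1662607.0   /* 26.601712 MHz master clock / 16 */
#define BAUD        57600.0

/* CPU cycles per serial bit. */
static double cycles_per_bit(double cpu_hz)
{
    return cpu_hz / BAUD;
}

/* Cycles from the start bit's leading edge to the middle of the first data bit. */
static double start_to_mid_data(double cpu_hz)
{
    return 1.5 * cycles_per_bit(cpu_hz);
}
```

Printed with "%.1f", these give 31.1 and 46.6 for NTSC and 28.9 and 43.3 for PAL, matching the Ideal columns.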
A basic implementation waits for the first signature byte and verifies the 8-bit checksum. This one is 47 bytes.
sum = <0            ; Checksum; initialized when first byte is written

badcrc:
    ldx #0          ; Number of bytes received
notsig:
byte:
    lda #$01        ; Wait for start bit
start:
    bit $4017
    beq start
    ldy #6          ; Delay from start bit to middle of data bit
dbit:
    dey
    bne dbit
    ldy #3          ; Delay between bits
    nop             ; Remove this NOP for PAL timing
    nop
    lsr $4017       ; Read data bit
    rol a
    bcc dbit
    sta 0,x         ; Store received byte
    cpx #1          ; Verify first byte of signature
    bcs past
    cmp #$E2
    bne notsig
past:
    adc sum         ; Update checksum
    sta sum
    inx
    bne byte
    tay             ; Verify checksum
    bne badcrc
    jmp $0007       ; Execute received code
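Tracing the adc loop above: the first byte is stored at address 0 (which is sum) before being added, so it is counted twice, and every adc runs with the carry set (by cpx #1, or by the matching cmp #$E2), which adds 256 in total and vanishes mod 256. One reading, then, is that a block passes when its first byte plus the sum of all 256 bytes is 0 mod 256. Decoding the make_boot output above (reversing and inverting the bits of each byte) gives E2 5D CC 75 36 16 E2 A9 ..., which satisfies this, with byte 6 apparently serving as the 8-bit checksum. The C sketch below encodes that reading; basic_block_ok and basic_checksum_byte are hypothetical names, and the role of byte 6 is an inference from the example, not from make_boot.c.

```c
#include <stdint.h>

/* Acceptance test of the basic loader as read from its code:
   block[0] must be $E2, and block[0] + sum(block) must be 0 mod 256. */
static int basic_block_ok(const uint8_t block[256])
{
    unsigned sum = block[0];               /* first byte is effectively added twice */
    for (int i = 0; i < 256; i++)
        sum += block[i];
    return block[0] == 0xE2 && (sum & 0xFF) == 0;
}

/* Value to store at block[pos] (pos != 0) so that basic_block_ok() passes. */
static uint8_t basic_checksum_byte(const uint8_t block[256], int pos)
{
    unsigned sum = block[0];
    for (int i = 0; i < 256; i++)
        if (i != pos)
            sum += block[i];
    return (uint8_t)(0x100 - (sum & 0xFF));
}
```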
A minimal implementation receives the 256-byte program block and begins executing it, without any checking. This one is only 30 bytes.
; NTSC version
    ldx #0          ; Number of bytes received
byte:
    lda #$01
start:
    bit $4017       ; Wait for start bit
    beq start
    lsr             ; A = 0
    nop
dbit:
    ldy #3          ; Delay between bits
    lsr $4017       ; Read bit. First time reads 1 for start bit.
dly:
    dey             ; Delay
    bne dly
    rol a           ; Move bit into shift register
    sta 0,x         ; Delay, and store received byte on final iter
    bcc dbit
    inx
    bne byte
    jmp $0007       ; Execute received code
The PAL version shortens the time between bits by two cycles, from 31 to 29, by slightly reordering the loop.
; PAL version
    ldx #0          ; Number of bytes received
byte:
    lda #$01
start:
    bit $4017       ; Wait for start bit
    beq start
    lsr             ; A = 0
dbit:
    ldy #3          ; Delay between bits
    lsr $4017       ; Read bit. First time reads 1 for start bit.
dly:
    dey             ; Delay
    bne dly
    rol a           ; Move bit into shift register
    nop
    bcc dbit
    sta 0,x         ; Store received byte
    inx
    bne byte
    jmp $0007       ; Execute received code
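In both minimal versions the start bit itself serves as the loop counter: A starts at 0, the start bit reads as 1 and is shifted in first, and after eight more rol instructions it drops into the carry and ends the loop. Because each bit enters at bit 0 and moves up, the first data bit sent (the LSB) ends up in bit 7, and no inversion is applied. The C model below assumes the same line behavior as read_serial (start bit reads 1, data bits read inverted, LSB first); uart_samples and minimal_receive are hypothetical names.

```c
#include <stdint.h>

/* Assumed levels seen at $4017 bit 0 for one serial byte:
   start bit reads 1, then each data bit reads inverted, LSB first. */
static void uart_samples(uint8_t byte, uint8_t s[9])
{
    s[0] = 1;
    for (int i = 0; i < 8; i++)
        s[i + 1] = (uint8_t)(((byte >> i) & 1) ^ 1);
}

/* Model of the minimal loader's bit loop (NTSC and PAL differ only in timing). */
static uint8_t minimal_receive(const uint8_t s[9])
{
    uint8_t a = 0;                        /* lda #$01 / lsr: A = 0 */
    int k = 0;
    int carry;
    do {
        int bit = s[k++];                 /* lsr $4017: bit 0 -> carry */
        carry = a >> 7;                   /* rol a: old bit 7 -> carry ... */
        a = (uint8_t)((a << 1) | bit);    /* ... and the read bit -> bit 0 */
    } while (!carry);                     /* bcc dbit */
    return a;                             /* value written by sta 0,x */
}
```

Under these assumptions, sending $6A stores $A9, matching offset 7 of the tone example.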
A full implementation waits for the 4-byte signature and verifies a 16-bit CRC. This one is 93 bytes. For PAL timing, remove the indicated NOP.
crc  = <0
temp = <2

badcrc:
notsig:
    ldx #0              ; Number of bytes received
byte:
    lda #$01            ; Wait for start bit
start:
    bit $4017
    beq start
    ldy #6              ; Delay from start bit to middle of data bit
dbit:
    dey
    bne dbit
    ldy #3              ; Delay between data bits
    nop                 ; Remove this NOP for PAL timing
    nop
    lsr $4017           ; Read bit
    rol a
    bcc dbit
    cpx #4              ; Verify signature if one of first four bytes
    bcs past
    eor #$E2            ; Handle signature with partial signature before it
    bne not1st
    ldx #0
not1st:
    eor signature-1,x   ; Share last zero byte of JMP
    bne notsig
past:
    sta 0,x             ; Write after signature verify clears A, so that
    inx                 ; CRC gets cleared for later
    bne byte
    txa                 ; Calculate CRC-16 of user data
    ldx #5
check:
    eor 0,x
    sta crc+1           ; based on Greg Cook's CRC-16 code
    lsr
    lsr
    lsr
    lsr
    tay
    asl
    eor crc
    sta temp
    tya
    eor crc+1
    sta crc+1
    asl
    asl
    asl
    tay
    asl
    asl
    eor crc+1
    sta crc
    tya
    rol
    eor temp
    inx
    bne check
    eor crc             ; Verify checksum
    bne badcrc
    jmp $0007           ; Execute received code
signature:
    .byte $5D^$E2, $CC^$E2, $75^$E2
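The signature check in the full implementation behaves like a small matcher: X counts matched bytes, any mismatch resets it to 0, and an $E2 always restarts the match at the first byte, so a partial signature followed by the real one (for example $E2 $5D $E2 $5D $CC $75) is still found. At X = 0 the eor reads the $00 high byte of jmp $0007, so the four expected byte values work out to $E2 $5D $CC $75. The C sketch below models only this signature search, not the CRC-16, on received byte values as the loader stores them; find_signature_end is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

static const uint8_t SIGNATURE[4] = { 0xE2, 0x5D, 0xCC, 0x75 };

/* Returns the stream index just past the signature, or -1 if not found. */
static long find_signature_end(const uint8_t *rx, size_t len)
{
    size_t x = 0;                          /* X: signature bytes matched so far */
    for (size_t i = 0; i < len; i++) {
        if (rx[i] == SIGNATURE[0])
            x = 0;                         /* eor #$E2 / bne not1st / ldx #0 */
        if (rx[i] == SIGNATURE[x]) {
            if (++x == 4)
                return (long)(i + 1);      /* cpx #4: signature complete */
        } else {
            x = 0;                         /* notsig: start over */
        }
    }
    return -1;
}
```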