Detailed DMC Operation
I have come up with and implemented a model of the DMC which explains all DMC behavior I've observed so far, and properly handles the DMC saw waves hack. The following is a description of the model, which may not match the DMC hardware. Following the model are several tests which yield the same results on NES hardware as on an emulator based on the model.
Model
The DMC consists of three units:
- output unit
- sample buffer
- DMA unit
The sample buffer either holds a single sample byte or is empty. It is filled by the DMA unit and emptied by the output unit. Only the output unit can empty it, so once loaded with a sample byte, that byte will eventually be output.
The output unit is always cycling. During each cycle it either outputs a sample byte or is silent for equal duration. The type of cycle is determined just before the cycle starts, based on the full/empty status of the sample buffer. Once a cycle has started, its type can't be changed nor can it be interrupted.
A cycle consists of 8 steps, each consisting of a delay followed by a possible action. The delay is determined by the DMC period ($4010) at the time the delay begins, thus the DMC period is accessed 8 times per cycle. For a sample output cycle, at the end of each step the next sample bit is applied to the DAC. For a silence cycle, each step consists of just the delay.
After a cycle is complete, a new cycle is started; its type is determined by the status of the sample buffer. If the sample buffer is empty, a silence cycle is started, otherwise a sample output cycle is started (using the sample from the buffer) and the sample buffer is cleared.
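The cycle behavior described above can be sketched in a few lines of Python. This is a minimal sketch of the model, not a description of the hardware. The class and method names are hypothetical, and two details are assumptions taken from common NES APU documentation rather than from this text: bits are shifted out least significant first, and each bit moves the 7-bit DAC up or down by 2 unless that would leave the 0..127 range.

```python
class SampleBuffer:
    """Holds one sample byte, or None when empty (hypothetical name)."""

    def __init__(self):
        self.byte = None


class OutputUnit:
    """Sketch of the always-cycling output unit (hypothetical name)."""

    def __init__(self, buffer, dac=32):
        self.buffer = buffer
        self.dac = dac            # 7-bit DAC level, 0..127
        self.shift = None         # None while in a silence cycle
        self.steps_left = 0
        self.begin_cycle()

    def begin_cycle(self):
        # The cycle type is decided here and cannot change until the cycle ends.
        self.shift = self.buffer.byte    # None selects a silence cycle
        self.buffer.byte = None          # only the output unit empties the buffer
        self.steps_left = 8

    def end_of_step(self):
        """Call once each time a step's delay expires."""
        if self.shift is not None:
            # Assumption: LSB first, +2 for a 1 bit, -2 for a 0 bit, clamped.
            delta = 2 if self.shift & 1 else -2
            if 0 <= self.dac + delta <= 127:
                self.dac += delta
            self.shift >>= 1
        self.steps_left -= 1
        if self.steps_left == 0:
            self.begin_cycle()
```

In this sketch, a $55 byte placed in the buffer during a silence cycle leaves the DAC untouched until that cycle ends, then toggles it between 34 and 32 over the following eight steps.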
The DMA unit constantly watches for an opportunity to fill the sample buffer. If the sample buffer is empty and there are bytes remaining in the current sample, the DMA unit reads the next sample byte from memory and puts it into the sample buffer. After fetching the byte, it decrements the number of remaining sample bytes. If this becomes zero, further action is taken: if looping is enabled, the sample is restarted (see below); otherwise an IRQ is generated (if enabled via $4010).
When the CPU writes to $4015, bit 4 of the written value selects one of two actions. If bit 4 is 0, the number of bytes remaining in the current sample is set to 0. If bit 4 is 1 and the number of remaining bytes is zero, the sample is restarted; if bytes still remain, nothing happens.
Restarting the DMC sample involves setting the DMA unit's current address and bytes remaining based on the values in DMC registers $4012 and $4013.
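The DMA unit, the $4015 write and the restart can be sketched the same way. This is a rough illustration with hypothetical names (DmaUnit, poll, read_byte). The register decoding (address = $C000 + $4012 * 64, length = $4013 * 16 + 1) and the address wrap from $FFFF to $8000 are assumptions taken from common NES APU documentation; they agree with the values used in the tests below ($4012 = $80 gives $E000, $4013 = 1 gives 17 bytes) but are not spelled out in this text.

```python
class DmaUnit:
    """Sketch of the DMA unit plus the sample buffer it fills."""

    def __init__(self, read_byte):
        self.read_byte = read_byte   # hypothetical callback: address -> byte
        self.reg_4012 = 0            # sample address register
        self.reg_4013 = 0            # sample length register
        self.loop = False
        self.irq_enabled = False
        self.irq_flag = False
        self.address = 0
        self.remaining = 0
        self.buffer = None           # None means the sample buffer is empty

    def restart(self):
        # Assumed decoding of $4012 and $4013.
        self.address = 0xC000 + self.reg_4012 * 64
        self.remaining = self.reg_4013 * 16 + 1

    def write_4015(self, value):
        if value & 0x10:
            if self.remaining == 0:
                self.restart()       # enable while idle: restart the sample
        else:
            self.remaining = 0       # disable: the buffer is left untouched

    def poll(self):
        """Fill the sample buffer if it is empty and bytes remain."""
        if self.buffer is not None or self.remaining == 0:
            return
        self.buffer = self.read_byte(self.address)
        # Assumed wrap-around of the sample address.
        self.address = 0x8000 if self.address == 0xFFFF else self.address + 1
        self.remaining -= 1
        if self.remaining == 0:
            if self.loop:
                self.restart()
            elif self.irq_enabled:
                self.irq_flag = True

    def playing(self):
        """Status as read back from $4015 bit 4 in this model."""
        return self.remaining > 0
```

Note that write_4015 never touches the buffer, which is how the sketch reproduces the rule that only the output unit can empty it.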
Notes
If nothing is currently playing, the sample buffer will be empty and the output unit will be in a silence cycle. If the DMC is then enabled, the DMA unit will immediately fill the buffer. If the sample's length is only 1 byte, this will result in almost immediate clearing of the DMC's enable bit and an IRQ (if it's enabled). The sample buffer will then be filled, so when the output unit completes its current silence cycle, it will play the buffered sample byte.
If a sample is currently playing and the DMC is then disabled, the sample buffer will still be full and the output unit will be in the middle of a sample byte. The currently playing sample byte will therefore complete, then the remaining sample byte in the buffer will also play, then silence follows. It might be possible for the CPU to disable the DMC just after the output unit empties the sample buffer and before the DMA unit notices it's empty, which would result in one fewer sample byte than usual before silence (the DMA unit seems to take a few cycles to run through its checking cycle).
The number of clock cycles until the transition of the DMC from enabled to disabled can be calculated as follows (the sample buffer will already be full):
clocks until the end of the output unit's step +
steps remaining in current cycle * DMC sample bit period +
(sample bytes remaining - 1) * 8 * DMC sample bit period
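The calculation above translates directly into a small function. The function and parameter names are hypothetical, and the sketch assumes that "steps remaining in current cycle" does not count the step currently in progress.

```python
def clocks_until_disabled(clocks_left_in_step, steps_left_in_cycle,
                          bytes_remaining, bit_period):
    """Clock cycles until the DMC status goes from enabled to disabled.

    Assumes the sample buffer is already full, as stated in the text.
    """
    return (clocks_left_in_step
            + steps_left_in_cycle * bit_period
            + (bytes_remaining - 1) * 8 * bit_period)
```

For example, with a bit period of 428 clocks (the lowest NTSC rate, a value not given in this text), a step that has just ended, 3 steps left in the cycle and 2 bytes remaining, the result is 3 * 428 + 8 * 428 = 4708 clocks.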
Tests
I developed and tested the model with several carefully-designed sequences which were run on NES hardware. The model agrees with the results, and an emulator based on the model generates the same results. Each test is titled by the conclusion which it supports. Samples are shown of the output of NES hardware and the sequence which generated it.
NES ROMs for some tests are available. The ROM name is listed at the beginning of assembly sequences, if one is included. I plan on making better test ROMs which are designed to find defects in emulation, rather than test the DMC model as the current tests are designed to do. Feedback on improvements is welcome.
Each sequence starts out with the DMC's DAC stabilized at 32 (1/4 full range), and the DMC sample set to a series of $55 sample values which result in alternating positive and negative transitions. The DMC's frequency is set to the lowest rate and IRQ is disabled ($4010 = 0). The DMC sample length is set to 17 bytes ($4013 = 1). Many sequences mark a point in time by directly setting the DAC in order to generate a noticeable output transition.
There is an independent 8-sample-bit output section
Start (large positive transition) and stop (large negative transition) DMC at regular intervals
The DMC only responds every 8 sample bit periods, and the sample always plays for a multiple of 8 sample bits, indicating an independent sample output section that can only be configured every 8 sample bits, even when it's currently not playing anything (silent).
The lowest DMC frequency results in an 8-bit sample taking approximately 1.9 msec. By making the delays double this (3.8 msec each), the latency becomes constant:
; latency.nes
ldy #4 ; iterations
loop:
lda #36 ; mark output
sta $4011
lda #$10 ; start DMC
sta $4015
lda #22 ; delay 2.2 msec
jsr ms_delay
lda #32 ; mark output
sta $4011
lda #0 ; stop DMC
sta $4015
lda #43 ; delay 4.3 msec
jsr ms_delay
dey
bne loop
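The 1.9 msec figure quoted for this test can be checked with a little arithmetic. The constants below are assumptions taken from common NES documentation rather than from this text: an NTSC CPU clock of about 1.789773 MHz and a period of 428 CPU clocks per sample bit at the lowest rate ($4010 rate 0).

```python
NTSC_CPU_HZ = 1_789_773        # assumed NTSC CPU clock rate
LOWEST_RATE_PERIOD = 428       # assumed CPU clocks per sample bit at rate 0


def byte_duration_ms(bit_period, cpu_hz=NTSC_CPU_HZ):
    """Duration in milliseconds of one 8-step output cycle."""
    return 8 * bit_period / cpu_hz * 1000


duration = byte_duration_ms(LOWEST_RATE_PERIOD)
print(f"{duration:.2f} msec per sample byte")
```

Under these assumptions one sample byte takes 3424 CPU clocks, which comes out just over 1.9 msec and agrees with the figure in the text.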
There is an intermediate buffer in addition to what the 8-bit sample output section uses
In the previous test, the stopping latency was always more than 8 sample bits, indicating an extra byte buffer which the 8-bit sample output section draws on. This test sets up a ramp sample in memory and configures the DMC for it. It starts the DMC and marks the output. Then it changes the sample value in memory to a neutral toggling value. The output shows that the sample doesn't begin playing until well after the DMC is started (due to the previously demonstrated latency), so if the DMC didn't buffer the sample value, it would play the neutral toggling value. Instead it plays the original ramp value, indicating an additional byte buffer. The development cartridge I use contains RAM in the upper 32K address space and allows the CPU to modify it; with ROM the equivalent could be achieved by switching banks.
lda #$80 ; DMC sample at $E000
sta $4012
lda #0 ; DMC sample length = 1 byte
sta $4013
ldy #4 ; iterations
loop:
lda #$FF ; set sample value to ramp
sta $E000
lda #$10 ; start DMC
sta $4015
lda #30 ; mark output
sta $4011
lda #$55 ; set sample value to neutral
sta $E000
lda #42 ; delay 4.2 msec
jsr ms_delay
dey
bne loop
The intermediate buffer can only be emptied by the sample output section
This test enables the DMC, then immediately disables it, but an 8-bit sample is output anyway. This supports the existence of an intermediate buffer; it is filled immediately when the DMC is enabled, and is only emptied when the sample output section needs a new byte.
; buffer_retained.nes
lda #$10 ; start
sta $4015
lda #$00 ; immediately stop
sta $4015
The status changes to "not playing" when all sample bytes have been read
Since the 8-bit sample output unit and intermediate buffer form an effective 2-byte buffer, the status can change to "not playing" up to 16 sample bit periods before the last sample bit is added to the DAC.
This test starts a 17-byte sample and polls the status. Once it changes to "not playing", the output is marked with a transition.
The transition occurs immediately after the last bit of the 15th sample byte is applied to the DAC, then the 16th and 17th sample bytes are output.
; status.nes (status_irq.nes for IRQ version)
lda #$10 ; start DMC
sta $4015
wait:
bit $4015 ; wait for status to change to 0
bne wait
lda #0 ; mark output
sta $4011
If the sample length is changed to 1 byte, the status changes to "not playing" before the sample even starts! This is explained by the intermediate buffer being filled immediately.
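Both status results follow from the model and can be reproduced with a tiny event-level simulation. This is a sketch with hypothetical names that ignores exact clock timing: it only tracks the order in which the DMA unit fetches bytes and the output unit finishes cycles, starting from an empty buffer and a silence cycle.

```python
def bytes_output_when_status_clears(length):
    """Return how many sample bytes have been completely output at the
    moment the DMA unit fetches the last byte (status -> "not playing")."""
    if length < 1:
        raise ValueError("sample length must be at least 1 byte")
    remaining = length
    buffer_full = False
    outputting = False     # the output unit starts in a silence cycle
    completed = 0          # bytes whose 8 bits have all reached the DAC
    while True:
        # DMA unit: refill the buffer whenever it is empty and bytes remain.
        if not buffer_full and remaining > 0:
            buffer_full = True
            remaining -= 1
            if remaining == 0:
                return completed
        # Output unit: finish the current cycle, then start the next one,
        # taking the buffered byte if there is one.
        if outputting:
            completed += 1
        outputting = buffer_full
        buffer_full = False


print(bytes_output_when_status_clears(17))  # 17-byte sample
print(bytes_output_when_status_clears(1))   # 1-byte sample
```

For the 17-byte sample the function returns 15, matching the transition seen after the 15th byte, and for the 1-byte sample it returns 0, since the status clears before any sample bit is output.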