How to Set Up and Use NMIs On the Commodore 64

Published on: 21st July 2025

Article by Jonathan Woods (Kodiak).
Reviewed for technical accuracy by Olivier Bernhard (Amok64).

GLOSSARY

IRQ = Interrupt ReQuest
IRST = Interrupt RaSTer aka raster interrupt aka raster IRQ
NMI = Non-Maskable Interrupt
Handler = Interrupt Service Routine (ISR) = the code executed when an interrupt is triggered
Chaining = rewriting the interrupt-handler vectors during code execution by a handler, to point them to a new handler

Introduction

As a Commodore 64 coder you will have at some stage encountered the IRQ and will be aware of its great utility in being triggered by the raster when it reaches a predetermined scanline in what is widely known as a raster interrupt, or IRST in the hyper-insistent sense of the official Commodore 64 Programmer's Reference Guide.

Cover of the Commodore 64 Programmer's Reference Guide

IRST mentioned in the Commodore 64 Programmer's Reference Guide

In fact, so user-friendly and helpful is this feature that most commercial games on the C64 use it, although beginner coders might not automatically agree with it being easy to set up!

However, there is a mysterious cousin interrupt service available on the C64 called the Non-Maskable Interrupt (NMI) and it can run within the same program as the IRQ in such a fashion as to be able to have two interrupt “threads”, as it were, working in harmony with each other.

In fact, because the NMI takes priority over the IRQ, you can use the NMI to interrupt IRQ service routines, so that you have an interrupt that interrupts other interrupts!

Now, at first this might sound superfluous and indeed, in most real-world game situations it is.

But consider where you have a raster interrupt performing a long task, for example, some char scrolling, and you want to nip in during that process to change a background colour or plex a sprite.

I cover a real-game deployment of this technique in my article, The Wild Wood Deconstructed, and of course, the technique is used extensively in both Seawolves and Parallaxian, as well as the now-cancelled Deep Winter (you can download the .prg VICE executable of the NMI-powered falling snow effect here (PAL-only)).

Deep Winter's falling snow effect over a parallaxed scrolling background on the Commodore 64.

In cases like this, where there are many screen splits required in the midst of long IRST tasks, the NMI is your friend... as long as you know how to set it up – and have the patience to work through its lack of user-friendliness, as we shall see.

The NMI is a Timer Interrupt

Like the “native mode” of the IRQ, the NMI is a timer interrupt.

Perhaps poorly known to, or seldom considered by, IRST users is the fact that when running as a timer interrupt, the IRQ uses the 16-bit frequency registers on CIA Chip #1, which has two timers, known respectively as Timer A and Timer B.

Either timer can be used to trigger instances of the IRQ when it is not an IRST but is, instead, a timer IRQ, and the timing between each triggering of the IRQ consists of the number of intervening machine cycles (hereinafter referred to as CPU cycles) expressed as a 16-bit number in the usual 6502 convention of Lo- and Hi-bytes.

This, by the way, is how interrupts on the VIC-20 work, as that machine lacks a raster interrupt feature.

Meanwhile, back at the C64 ranch, the NMI runs on CIA Chip #2 which, like CIA #1, also has a Timer A and Timer B, both of which may be used to control the trigger points of the NMI, which occur in the same fashion, i.e., after completing a specified number of CPU cycles that were set in the relevant timer’s Lo- and Hi-frequency registers.

As a timer-only interrupt, the NMI is unlike the IRQ in that it cannot be triggered by the raster position.

So... let’s say you want to wait 10 scan lines between two NMI trigger points.

On PAL, that would be 10 x 63 cycles = 630 cycles.

So we know we need to set the “frequency” to 630 expressed as a Lo- and Hi-byte number.

In hexadecimal, 630 = $0276, so LO byte = $76 and HI byte = $02.

These are then written into the frequency registers, so, for example, if we’re using Timer A on CIA #2, we would do this:

LDA #$76
STA $DD04

LDA #$02
STA $DD05
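If you would rather not convert by hand, the split can be checked on a modern machine. The sketch below is a hypothetical Python helper (timer_bytes is an illustrative name, not part of the sample program or of any C64 toolchain) that turns a cycle count into the two bytes written to $DD04 and $DD05, assuming PAL's 63 cycles per raster line:

```python
PAL_CYCLES_PER_LINE = 63  # PAL C64: 63 CPU cycles per raster line


def timer_bytes(cycles):
    """Split a 16-bit cycle count into (lo, hi) bytes for $DD04/$DD05."""
    if not 0 <= cycles <= 0xFFFF:
        raise ValueError("CIA timers are 16-bit: 0..65535 cycles")
    return cycles & 0xFF, cycles >> 8


lo, hi = timer_bytes(10 * PAL_CYCLES_PER_LINE)
print(f"LDA #${lo:02X} : STA $DD04")
print(f"LDA #${hi:02X} : STA $DD05")
```

For the 10-line example this gives the same $76 / $02 pair used above.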

Note that NMIs, like IRSTs, can be chained so that during each NMI handler, we can rewrite to the above timer registers with a new 16-bit representation of the CPU cycle tally to be expended until the next NMI fires... this is analogous to writing a new value to $D012 when chaining IRSTs.

Activating the NMI at a Consistent Start Position on the Screen

It’s one thing to understand that the triggering of NMIs (and timer IRQs) is controlled by setting the number of CPU cycles between instances, whereas IRSTs use $D012 to trigger instances.

It is quite another to ensure that the NMI interrupt chain starts at precisely the same on-screen position every time it is launched.

After all, if you can’t dictate an exact start position, the NMI chain becomes worthless as it could begin pretty much at any raster line, which is impractical for any program that requires it for screen splitting graphical effects.

There are numerous ways to achieve a consistent starting position and these are still under refinement by the demo scene.

Personally, and with flippant disregard for setting things up as near instantaneously as possible, I like to get the NMIs underway using this flow:

Set up a one-instance (i.e. for one frame only) IRST to fire at an exact scanline.
Use that IRQ to initiate the NMI chain.
Kill the set-up IRST and proceed to set up the main IRST chain that will run in pseudo-parallel with the NMIs.

(Don’t worry, sample code for this follows at the end of the article).

Entering & Exiting an NMI Handler – Dealing with the CPU Registers

When entering an IRQ handler, the clichéd code would be:

PHA ; [3 cycles]
TXA ; [2 cycles]
PHA ; [3 cycles]
TYA ; [2 cycles]
PHA ; [3 cycles]

and then on leaving:

PLA ; [4 cycles]
TAY ; [2 cycles]
PLA ; [4 cycles]
TAX ; [2 cycles]
PLA ; [4 cycles]

In all, that amounts to 13 cycles at the start and 16 cycles at the end, giving a total of 29 cycles or almost half of one 63 cycle scanline, just to store and recover the three registers.

Of course, not every handler uses all three registers; in cases where an IRST handler only performs a simple or short task, it might only need the A-register, in which case the entry code can be trimmed to this:

PHA ; [3 cycles]

and on exit:

PLA ; [4 cycles]

That reduces “register-handling” overhead to 7 cycles.

However, where every spare CPU cycle matters (as I found when developing Seawolves), it is better to do this on entry:

STA ZP_IRQ_HOLD_A ; [3 cycles]

and on exit:

LDA ZP_IRQ_HOLD_A ; [3 cycles]

(Where ZP_IRQ_HOLD_A is a Zero Page variable).

This brings it down to 6 cycles overall.

Or, where you need all 3 registers:

STA ZP_IRQ_HOLD_A ; [3 cycles]
STX ZP_IRQ_HOLD_X ; [3 cycles]
STY ZP_IRQ_HOLD_Y ; [3 cycles]

and on exit:

LDA ZP_IRQ_HOLD_A ; [3 cycles]
LDX ZP_IRQ_HOLD_X ; [3 cycles]
LDY ZP_IRQ_HOLD_Y ; [3 cycles]

Leaving you with 9-in, 9-out, i.e. 18 cycles in total.
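The tallies quoted so far can be double-checked with a few lines of Python. This is only a bookkeeping sketch: the cycle counts are the ones annotated in the listings above, and the helper name cost is made up for illustration.

```python
LINE_CYCLES = 63  # one PAL raster line

# Cycle counts per instruction, as annotated in the listings above
CYCLES = {
    "PHA": 3, "PLA": 4, "TXA": 2, "TYA": 2, "TAX": 2, "TAY": 2,
    "STA zp": 3, "STX zp": 3, "STY zp": 3,
    "LDA zp": 3, "LDX zp": 3, "LDY zp": 3,
}


def cost(instructions):
    """Total cycles for a sequence of instruction names."""
    return sum(CYCLES[name] for name in instructions)


stack_all = cost(["PHA", "TXA", "PHA", "TYA", "PHA",
                  "PLA", "TAY", "PLA", "TAX", "PLA"])
zp_all = cost(["STA zp", "STX zp", "STY zp",
               "LDA zp", "LDX zp", "LDY zp"])

for name, total in (("stack", stack_all), ("zero page", zp_all)):
    share = 100 * total / LINE_CYCLES
    print(f"{name}: {total} cycles = {share:.0f}% of a raster line")
```

The zero-page version saves 11 cycles per handler when all three registers are preserved.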

Naturally this cycle-preservation issue is only important if you are developing a cutting edge game, as chances are you will need every drop of raster time available to keep the IRST chain from stalling.

And this same storing and recovering of registers is just as necessary with the NMI as it is with the IRQ, so you would typically assign an additional 3 ZP variables to store the 3 registers when entering and exiting NMIs, as follows:

STA ZP_NMI_HOLD_A ; [3 cycles]
STX ZP_NMI_HOLD_X ; [3 cycles]
STY ZP_NMI_HOLD_Y ; [3 cycles]

and on exit:

LDA ZP_NMI_HOLD_A ; [3 cycles]
LDX ZP_NMI_HOLD_X ; [3 cycles]
LDY ZP_NMI_HOLD_Y ; [3 cycles]

If a program is running many NMI handlers in an “IRQ-rich environment”, I personally find this far more efficient than faffing around with the stack instructions.

Exiting an NMI Handler – Chaining the Next NMI Trigger Position

Again, let’s consider first how we dictate to the CPU where it will fire the next interrupt handler in an IRST scenario:

LDA #<IRQ_2 ; This is the label name for the next IRQ handler in this example
STA $FFFE

LDA #>IRQ_2
STA $FFFF

LDA #$96 ; $96 = 150 in decimal = next raster trigger line in this example
STA $D012

LDA ZP_IRQ_HOLD_A ; Recover the A-Reg (assumes the other 2 regs not used)

ASL $D019 ; Acknowledge the IRQ has occurred, so that the next one will fire

RTI

So we set the address vectors for the next handler, then we tell the raster service (so-to-speak) which line we want that handler to execute on, we recover the registers – in this example, only A-Reg is assumed to have been used by the handler – then we acknowledge the current IRQ to release the IRST service in the VIC-II chip to start waiting until the scanline arrives at which the next instance of an IRQ should fire and last of all, we ReTurn from Interrupt (RTI).

The same principles apply with the NMI; we set the vectors (only this time using $FFFA-$FFFB), we set the # of CPU cycles until the next NMI (in this example, for the NMI on Timer A in CIA chip #2, using $DD04-$DD05), we recover the registers (in this example, only A-Reg is assumed to have been used by the handler), we acknowledge the NMI and finally we RTI.

LDA #<NMI_2 ; This is the label name for the next NMI handler in this example
STA $FFFA

LDA #>NMI_2
STA $FFFB

LDA #<CYCLES_AFTER_NMI_2
STA $DD04

LDA #>CYCLES_AFTER_NMI_2
STA $DD05

LDA ZP_NMI_HOLD_A ; Recover the A-Reg (assumes the other 2 regs not used)

BIT $DD0D ; Acknowledge the NMI

RTI

Note, however, that we do not specify the cycles between the current handler and the next one here; rather, we are specifying the # of cycles between the next NMI handler (“NMI_2”) and the one after that (“NMI_3”), so that is an added layer of user-unfriendliness in working with NMIs beyond the absence of the useful raster-powered trigger that the IRQ can enjoy.

In other words, with NMIs (and in a corollary fashion with timer IRQs) we have to set the gaps between NMI#2 and NMI#3 during the execution of NMI#1.

For the gap between NMI#3 and NMI#4, we have to set it in NMI#2, and so on.

So, if we have 7 NMI handlers (for example), then in the last NMI handler (NMI#7), we set the gap between NMI#1 and NMI#2.

The table below should illustrate this better:

NMI#1 handler	Sets the CPU cycle gap (“frequency”) between NMI#2 + NMI#3 (as well as its other handler tasks)
NMI#2 handler	Sets the CPU cycle gap (“frequency”) between NMI#3 + NMI#4 (as above!)
NMI#3 handler	Sets the CPU cycle gap (“frequency”) between NMI#4 + NMI#5 (as above!)
NMI#4 handler	Sets the CPU cycle gap (“frequency”) between NMI#5 + NMI#6 (as above!)
NMI#5 handler	Sets the CPU cycle gap (“frequency”) between NMI#6 + NMI#7 (as above!)
NMI#6 handler	Sets the CPU cycle gap (“frequency”) between NMI#7 + NMI#1 (as above!)
NMI#7 handler	Sets the CPU cycle gap (“frequency”) between NMI#1 + NMI#2 (as above!)
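The "one handler ahead" rule in the table is easy to get wrong for longer chains, so here is a small Python sketch that generates the same rows for any chain length. The function name gap_set_by is hypothetical and exists only for this illustration.

```python
def gap_set_by(handler, total_handlers):
    """Return (a, b): handler number `handler` sets the gap between NMI#a and NMI#b."""
    a = handler % total_handlers + 1  # the NMI that fires next
    b = a % total_handlers + 1        # the NMI that fires after that
    return a, b


for h in range(1, 8):
    a, b = gap_set_by(h, 7)
    print(f"NMI#{h} handler sets the gap between NMI#{a} and NMI#{b}")
```

With 7 handlers this reproduces the table above, including the two rows that wrap around to NMI#1.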

It is natural to think that a 10 raster line gap would require 10 x 63 = 630 cycles, but in real-world coding situations it will typically end up being 630 +/- a few cycles to, for example, shunt a background colour change off the edge of the screen to the LHS or RHS to avoid ugly artefacts at the colour change point (the infamous grey dot phenomenon).

Generally speaking, the trickiest one to get right will be the gap between the last NMI handler and the first NMI handler on the next frame.

In the above example, that would be the gap between NMI#7 and NMI#1, which is calculated using the formula below:

“Last gap” (or "wraparound gap") = Total # of cycles on-screen minus total CPU cycles in the other gaps minus # of NMI handlers.

For the example above in PAL, that would be (63*312 = 19,656) minus (total CPU cycles in the other 6 gaps) minus (total # of NMI handlers = 7).

If you want to add some cycles to a gap, you must subtract the same number of cycles from the next gap to maintain stability of the NMI chain.

Likewise, if you want to add an extra NMI handler, you have to adjust the formula above accordingly.
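As a sanity check, the wraparound formula can be expressed as a short Python function. This is a sketch rather than a tool from the article: wraparound_gap is an illustrative name, the defaults assume PAL, and the first set of gaps is the one used by the sample program at the end of this article (10, 20, 40 and 10 raster lines).

```python
def wraparound_gap(other_gaps, cycles_per_line=63, lines_per_frame=312):
    """Last gap = cycles per frame - sum of the other gaps - number of NMI handlers."""
    handlers = len(other_gaps) + 1  # the wraparound gap belongs to one more handler
    return cycles_per_line * lines_per_frame - sum(other_gaps) - handlers


# Five-handler chain: four user-defined gaps plus the wraparound gap
print(wraparound_gap([630, 1260, 2520, 630]))

# Adding a sixth handler (here with a hypothetical 5-line gap) changes both terms
print(wraparound_gap([630, 1260, 2520, 630, 315]))
```

The first call returns 14611, which matches GAP_NMI5_AND_NMI1 in the sample program; the second shows that an extra handler costs its own gap plus one further cycle.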

Obviously, to do all of this right, you will have to avail of a spreadsheet; the table below shows an outline of how such a spreadsheet might be modelled for a 9-handler NMI chain:

NMI#1 handler	Sets gap between #2 & #3	10 raster lines	630 cycles
NMI#2 handler	Sets gap between #3 & #4	20 raster lines	1260 cycles
NMI#3 handler	Sets gap between #4 & #5	15 raster lines	945 cycles
NMI#4 handler	Sets gap between #5 & #6	20 raster lines	1260 cycles
NMI#5 handler	Sets gap between #6 & #7	20 raster lines	1260 cycles
NMI#6 handler	Sets gap between #7 & #8	23 raster lines	1449 cycles
NMI#7 handler	Sets gap between #8 & #9	10 raster lines	630 cycles
NMI#8 handler	Sets gap between #9 & #1	N raster lines	X cycles
NMI#9 handler	Sets gap between #1 & #2	10 raster lines	630 cycles
		TOTAL (excluding X)	8064 cycles

We generally don’t really need to know what N is in the above table, but we need to calculate X, which we do as follows on PAL machines:

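NMI formula for calculating the residual cycle gap on the C64: X = (63 x 312) - (total cycles in the other 8 gaps = 8064) - (total # of NMI handlers = 9) = 19,656 - 8064 - 9 = 11,583 cycles.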

(For NTSC, one screen frame consumes 65 x 263 = 17095 cycles, and remember that the # of cycles per raster line is 65, not 63, so you would adjust accordingly).
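In place of a spreadsheet, the same sum can be done in a few lines of Python. The sketch below takes the raster-line gaps from the 9-handler table and works out X for both video standards; residual_gap is an illustrative name, and the NTSC figures assume the 65-cycle, 263-line timing quoted above.

```python
# Gaps from the 9-handler table, in raster lines (the wraparound gap X is excluded)
GAP_LINES = [10, 20, 15, 20, 20, 23, 10, 10]


def residual_gap(gap_lines, cycles_per_line, lines_per_frame):
    """X = cycles per frame - cycles in the known gaps - number of NMI handlers."""
    handlers = len(gap_lines) + 1
    known = sum(lines * cycles_per_line for lines in gap_lines)
    return cycles_per_line * lines_per_frame - known - handlers


print("PAL  X =", residual_gap(GAP_LINES, 63, 312))
print("NTSC X =", residual_gap(GAP_LINES, 65, 263))
```

Note that on NTSC every gap in the table has to be recomputed at 65 cycles per line, not just X.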

NMIs Stalling the IRST Chain

As I mentioned above, I like to use an NMI handler to quickly shoot in to perform some minor task that must always occur at a predetermined raster line, regardless of whether or not IRST code is running at that screen position.

However, care must be taken to ensure that an IRST handler is not triggered at any location where an NMI handler is executing, as the NMI cannot be interrupted by the IRQ under normal circumstances; it only works the other way round, i.e., the IRQ can be interrupted by the NMI.*

If your code falls afoul of that rule, it will cause the IRST chain to stall (aka “raster stall”), that is, the affected IRST trigger point will be missed and the whole IRQ chain will be broken, with unsightly consequences or, worse, the program crashing.

Parallaxian WIP testbed before raster stall

Parallaxian WIP testbed during raster stall

Accordingly, take care with your IRQs and NMIs to ensure that a handler of the former is never requested during the execution of a handler of the latter.
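One way to spot such clashes before running anything is to work out roughly where each NMI lands on screen and compare that with your IRST trigger lines. The Python sketch below does this for the gaps used in the sample program at the end of this article. It rests on two assumptions that should be treated with caution: each interval lasts the timer value plus one cycle (which is what the "minus # of NMI handlers" term in the wraparound formula implies), and the timer is started exactly at cycle 0 of the start line, which is only approximately true in practice.

```python
CYCLES_PER_LINE, LINES = 63, 312  # PAL
FRAME = CYCLES_PER_LINE * LINES


def trigger_positions(start_line, latch_values):
    """Return (raster line, cycle within line) for each NMI, in firing order.

    Assumes each interval lasts latch + 1 cycles and that the timer is
    started at cycle 0 of start_line (an approximation).
    """
    t = start_line * CYCLES_PER_LINE
    positions = []
    for latch in latch_values:
        t = (t + latch + 1) % FRAME
        positions.append(divmod(t, CYCLES_PER_LINE))
    return positions


# Intervals in firing order: start-up -> NMI#1, then #1 -> #2, #2 -> #3, and so on
intervals = [630, 630, 1260, 2520, 630]
for n, (line, cycle) in enumerate(trigger_positions(76, intervals), start=1):
    print(f"NMI#{n}: line {line}, cycle {cycle}")
```

Under these assumptions the five NMIs land near lines 86, 96, 116, 156 and 166, clear of the sample program's IRSTs at lines 0, 80 and 120.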

* Technically, one could in theory allow IRQs to fire inside an NMI by using CLI before the NMI handler code comes to its acknowledgment; in this case, the stack should ideally be used in such a regime for the storage and recovery of registers. As for any compelling practical use for this... I can think of none right now but that does not mean there are no usage-case examples!

NMIs Stalling the NMI Chain

Every coder who knows about IRSTs will have at some stage encountered raster stall (as mentioned in the previous section), more usually caused by a handler taking too long to complete its code and therefore not having time to exit before the next designated handler should fire.

Typically the stall does not crash the program, and recovers to restore the erstwhile full integrity of the interrupt chain after a screen frame or two.

With the NMI, should a handler take too long (which might, in real-world applications, happen where NMI handlers are tightly spaced), a similar effect occurs.

However, it does not recover to its erstwhile range of predetermined trigger lines, but instead reconstitutes itself shunted on down the screen.

So now you would have your NMI chain intact once more, but dislocated from its correct set-up position, with ugly or disastrous consequences for the display.

This is just something to bear in mind when working with NMIs (and indeed, timer IRQs) and can easily be avoided by assigning longer tasks to the IRSTs and using the NMIs for shorter jobs, as suggested above.

Advantage Over Raster Interrupts: More Precise Timing

Whereas IRSTs can be controlled down to raster line level, NMIs can be set to fire at a target position along a raster line, giving a more efficient means of attaining a precise trigger position than using NOPs or other expedients to conform an IRST to your will.

In this respect, NMIs (or timer IRQs) may be thought of as affording more control than IRSTs, albeit that comes at the cost of far greater awkwardness in setting them up.
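Moving a trigger point sideways along its raster line means changing one gap and compensating in the next, so that the chain still adds up to exactly one frame. A minimal Python sketch of that bookkeeping follows; shift_trigger is a hypothetical helper name, and the gaps are those of the sample program at the end of this article.

```python
def shift_trigger(gaps, index, delta):
    """Move the NMI that ends gap `index` by `delta` cycles, keeping the frame total.

    The cycles added to one gap are taken from the next one (wrapping around),
    so every other trigger point stays where it was.
    """
    shifted = list(gaps)
    shifted[index] += delta
    shifted[(index + 1) % len(gaps)] -= delta
    return shifted


# Sample program gaps: 1->2, 2->3, 3->4, 4->5, 5->1
gaps = [630, 1260, 2520, 630, 14611]
earlier = shift_trigger(gaps, 3, -20)  # fire NMI#5 20 cycles earlier
print(earlier)
print(sum(gaps) == sum(earlier))
```

Here the 20 cycles removed from the gap between NMI#4 and NMI#5 are added to the wraparound gap, so the total is unchanged.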

For most game scenarios, this is unlikely to matter but for Parallaxian it has come into play and certainly, for modern demos, I would fully expect the use of NMIs and timer IRQs to be far more commonplace than in games.

Indeed, I know of very few games that use timer interrupts at all, which is understandable given the ease of setting up IRSTs compared to NMIs.

Stabilising an NMI

Let me state my own view right here that stabilising interrupts is mostly unnecessary, at least in the kind of games I have been working on, except where there is something that has to be cycle-exact in its execution.

The whole idea of stabilising an interrupt is to ensure that a handler fires at the exact same position consistently and thereby (a) avoiding the “mid-80s screen split jitter” and (b) guaranteeing a start position that can be used as a datum, if you will, for timing-critical tasks.

However, if sprites come into play and traverse the trigger point, then even a stabilised interrupt will be knocked out of kilter somewhat, since sprite-rendering always takes CPU precedence over interrupts.

With an IRST, probably the most popular way to stabilise an interrupt is the tried-and-tested Double IRQ method, but this is of no use with an NMI.

Instead, if you really MUST stabilise an NMI, the go-to technique for most coders seems to be the Inverted Timer method, which entails using the value of a timer other than the one running the NMI to negate the cycle variance at the NMI handler’s entry point caused by the CPU waiting to complete its current “main loop” instruction at that point.

(Note that IRSTs can also be stabilised this way, but most coders seem to prefer to use the Double IRQ method).

You can use either one of CIA#1’s IRQ timers (as long as the chosen timer is not running a timer IRQ) or you can use CIA#2’s other timer, i.e., the one that the NMI is not running on.

I like to implement by (a) starting the stabilisation timer first, as the very first thing the game initialisation code does, before any IRQ/IRST or NMI are set up and then (b) applying the inverted timer method where required, near the start of an NMI handler.
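To see why the inverted timer method cancels the jitter, it helps to count the cycles spent in the delay "sled" that the sample program's fifth NMI handler branches into (the bytes assembled from LDA #$A9, LDA #$A9, LDA $EAA5). The Python sketch below is a toy decoder written for this illustration only; it knows just the four opcodes the sled can decode to and is not a general 6502 emulator.

```python
# Bytes assembled from: LDA #$A9 / LDA #$A9 / LDA $EAA5
SLED = [0xA9, 0xA9, 0xA9, 0xA9, 0xAD, 0xA5, 0xEA]

# opcode: (length in bytes, cycles) for the four opcodes the sled can decode to
OPCODES = {
    0xA9: (2, 2),  # LDA #imm
    0xAD: (3, 4),  # LDA abs
    0xA5: (2, 3),  # LDA zp
    0xEA: (1, 2),  # NOP
}


def sled_cycles(entry):
    """Cycles spent in the sled when the branch lands `entry` bytes into it."""
    pc, cycles = entry, 0
    while pc < len(SLED):
        length, cost = OPCODES[SLED[pc]]
        pc += length
        cycles += cost
    return cycles


for timer in range(7, 0, -1):   # $DC04 readings 7..1, as in the handler's comment
    entry = timer ^ 0b111       # EOR #%00000111, i.e. 7 - timer
    late = 7 - timer            # cycles of lateness implied by the reading
    print(timer, entry, sled_cycles(entry), late + sled_cycles(entry))
```

The last column is the same for every timer reading: the later the handler starts, the further into the sled it jumps and the fewer cycles it burns, so the code after the sled always begins at the same point.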

To start the stabilisation timer, I would typically use the “Hermit method” (see the sample program at the end of this article for its deployment).

(For more coder ideas on faster ways to set up a stable timer for raster purposes, see this basketcase thread on CSDB).

Apart from the inverted timer method, there are other even more esoteric ways to achieve stabilisation in an NMI, as covered in this equally bonkers thread on CSDB.

A Final Optimisation

Demo coders would not be demo coders if they were not obsessed with saving clock cycles, and even in the exiting of an NMI handler there is a rather slick expedient, as follows.

As you can see from my earlier code examples discussing exiting NMIs, the final tasks are:

BIT $DD0D

RTI

But during the NMI set-up routine we can do this:

LDA #$40 ; $40 = the opcode for RTI
STA $DD0C

Now when we come to exit an NMI handler, we can replace the BIT + RTI instructions by simply writing:

JMP $DD0C

This will perform the BIT + RTI operations but take one precious cycle less to do so!

For a fuller explanation, see this Codebase article.

SAMPLE CODE + DOWNLOAD

You can download some sample code I prepared for this article here, or you can just copy and paste the code listed below.

You can also download the VICE-executable .prg here.

NOTE 1: The code below is written for CBM Prg Studio but you can convert it to your favourite assembler format easily, even if you're semi-competent!

NOTE 2: I edited the code using Notepad++, which I personally like for all my assembly language work.

NOTE 3: This is one way to set up NMIs, not the only way, so I am not insisting you follow this pattern, nor am I saying my way is the best way.


; NMI and IRST annotated testbed
; By KODIAK 2025
; Feel free to tweak the code below at your leisure.
;

*=$0801

	BYTE    $0B, $08, $0A, $00, $9E, $34, $30, $39, $36, $00, $00, $00

*=$1000

			JSR INIT_HERMITSYNC
			JSR INIT_IRQNMI
					
INFINITE_LOOP		JMP INFINITE_LOOP

; //////////////////////////////////

; INIT_HERMITSYNC
; NOTE: This snippet was originally devised by "Hermit" on codebase, so the annotation is his, not mine! 
; See: https://codebase.c64.org/doku.php?id=base:using_a_timer_as_an_inverted_raster_x-pos_register_method 							
;							
					
INIT_HERMITSYNC		SEI

			; Ensure Timer A on CIA#1 always starts @ exact same point when game begins - used to stabilise interrupts (if required)
			;
			LDA #17
					
DC04_INIT		CMP $D012			; Scan raster							[4 cycles]
			BNE *-3				; Re-scan raster if it =/= 17					[3* cycles]
			
			LDY #08				; Start with value 08						[2 cycles]
			STY $DC04			; Initiate Timer A on CIA#1 with LO byte write			[4 cycles]
			DEY				; Y = Y-1, to cycle through 8 iterations from 08 --> 00		[2 cycles x 8 = 16 in total]
			BNE *-1				; Delay loop takes 39 cycles	[(3 x 7) + (2 x 1) = 23, plus 16 from the 8 DEYs = 39]
			
			; @ this stage, Y-reg = 00
			; @ this stage, A-reg = 17
			
			STY $DC05			; Ensure a zero written to HI byte of Timer A on CIA#1		[4 cycles]
			STA $DC0E,Y			; Write 17 (%00010001) to control reg: force-load the start value (8) + restart Timer A on CIA#1	[5 cycles]
			
			LDA #17				; Same value as used for the original raster scan		[2 cycles]
			CMP $D012			; Test to see if scanline has ended (new line) or not (same line)[4 cycles]
			
			STY $D015			; Sprite disable (on off-chance they are pre-enabled)		[4 cycles]
			BNE DC04_INIT			; Resynchronise new line started after 63 cycles		[3* cycles]
			
			RTS
						
; /////////////////////////////

; INIT_IRQNMI
; Function: Initialise the IRQ + NMI threads 
;
INIT_IRQNMI		LDA #%01111111
			STA $DC0D 			; Disable timer-based IRQs on CIA#1, as we use its timer A for "Hermit's consistent countdown" below 
			LDA $DC0D 			; Precautionary ack of IRQ
			
			LDA #%00110101			; Bank out Kernal and BASIC, but keep I/O working
			STA $01
			
			LDA #<IRST_SETUP_NMI		; LO byte of the one-shot set-up handler's address
			STA $FFFE
			LDA #>IRST_SETUP_NMI		; HI byte
			STA $FFFF
					
IRSTSINITD011A		BIT $D011
			BPL IRSTSINITD011A
IRSTSINITD011B		BIT $D011
			BMI IRSTSINITD011B							
														
			LDA #%00011011			; Clear the High bit (lines 256-311 on PAL) = #$1B = #27, so that IRSTNMI starts not below line 255
			STA $D011	
			
			LDY #00 			; Set raster line trigger position for IRSTNMI at #$00 (line 0)
			STY $D012 
			
			INY
			STY $D01A			; Enable IRSTs
			
			CLI				; Allow IRQs to fire again now that all interrupt services have been set up
										
			RTS	

; /////////////////////////////						
					
; IRST_SETUP_NMI
; Function: Runs only once (at start of program) to set up the NMI... some coders might insist this should be stabilised 
; since it's starting the timer, but I have found no inconsistency issues using it as it is (so far at least!)
;

IRST_SETUP_NMI		STA ZP_IRQ_HOLD_A		; Store A-Reg
				
			; Do a raster wait to get the NMI trigger positions in the right approximate locations (doesn't need to be cycle exact)
			;
			; Ensure raster is somewhere in LOWER 57 y-positions when raster line trigger position is set for first time
			;
INITIRQ1		BIT $D011
			BMI INITIRQ1
INITIRQ2		BIT $D011
			BPL INITIRQ2
					
			LDA #76				; Experiment with this value - I set this arbitrarily for predictable start position for NMIs
IRST_SETUP_NMI_RAS	CMP $D012
			BNE IRST_SETUP_NMI_RAS
			
			; Set NMI vectors
			;
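			LDA #<NMIHANDLER1		; LO byte of the first NMI handler's address
			STA $FFFA
			LDA #>NMIHANDLER1		; HI byte
			STA $FFFB
			
			; Set stable NMI frequency 
			;
			LDA #<GAP_NMI1_AND_NMI2		; LO byte of the cycle count
			STA $DD04
			LDA #>GAP_NMI1_AND_NMI2		; HI byte
			STA $DD05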
			
			; Insert an RTI instruction at $DD0C so that we can exit the NMI in 3 cycles by JMP $DD0C instead of BIT $DD0D + RTI
			; NOTE: This method might crash some C64Cs: https://csdb.dk/release/?id=139000
			;
			LDA #$40			; #$40 = opcode for RTI
			STA $DD0C
																				
			; Activate NMI
			;
			LDA #%10000001 			; Set Timer A on CIA #2 to enable NMIs generated by timer A underflow
			BIT $DD0D			; "Safety measure"
			STA $DD0D
			LDA #%10010001
			STA $DD0E			; Enable Timer A interrupts on CIA #2 (which is the CIA chip used for NMIs) + runs it in continuous mode
			
			; Set normal IRST vectors... the current IRST handler (IRST_SETUP_NMI) is never executed again during the game
			;
			LDA #<IRST0
			STA $FFFE
			LDA #>IRST0
			STA $FFFF
			
			LDA ZP_IRQ_HOLD_A		; Recover A-Reg
			
			ASL $D019			; Acknowledge IRQ the quick way - some coders argue this is bad & we should write #%00000001 to it instead
			
			RTI	

; /////////////////////////////

NMIHANDLER1		STA ZP_NMI_HOLD_A		; Store A-Reg

			LDA #01
			STA $D021						
									
			; Reset NMI vectors
			;
			LDA #<NMIHANDLER2
			STA $FFFA
			LDA #>NMIHANDLER2
			STA $FFFB
			
			; Reset stable NMI frequency 
			;
			LDA #<GAP_NMI2_AND_NMI3
			STA $DD04
			LDA #>GAP_NMI2_AND_NMI3
			STA $DD05
										
			; Recover registers
			;
			LDA ZP_NMI_HOLD_A
			
			; Exit NMI
			;
			JMP $DD0C			; This performs BIT $DD0D (acknowledge NMI) + RTI... see "A Final Optimisation" in the article for more info

; /////////////////////////////

NMIHANDLER2		STA ZP_NMI_HOLD_A		; Store A-Reg

			LDA #02
			STA $D021						
									
			; Reset NMI vectors
			;
			LDA #<NMIHANDLER3
			STA $FFFA
			LDA #>NMIHANDLER3
			STA $FFFB
			
			; Reset stable NMI frequency 
			;
			LDA #<GAP_NMI3_AND_NMI4
			STA $DD04
			LDA #>GAP_NMI3_AND_NMI4
			STA $DD05
										
			; Recover registers
			;
			LDA ZP_NMI_HOLD_A
			
			; Exit NMI
			;
			JMP $DD0C

; /////////////////////////////	

NMIHANDLER3		STA ZP_NMI_HOLD_A		; Store A-Reg

			LDA #03
			STA $D021						
									
			; Reset NMI vectors
			;
			LDA #<NMIHANDLER4
			STA $FFFA
			LDA #>NMIHANDLER4
			STA $FFFB
			
			; Reset stable NMI frequency 
			;
			LDA #<GAP_NMI4_AND_NMI5
			STA $DD04
			LDA #>GAP_NMI4_AND_NMI5
			STA $DD05
										
			; Recover registers
			;
			LDA ZP_NMI_HOLD_A
			
			; Exit NMI
			;
			JMP $DD0C

; /////////////////////////////		

NMIHANDLER4		STA ZP_NMI_HOLD_A		; Store A-Reg

			LDA #04
			STA $D021							
									
			; Reset NMI vectors
			;
			LDA #<NMIHANDLER5
			STA $FFFA
			LDA #>NMIHANDLER5
			STA $FFFB
			
			; Reset stable NMI frequency 
			;
			LDA #<GAP_NMI5_AND_NMI1
			STA $DD04
			LDA #>GAP_NMI5_AND_NMI1
			STA $DD05
										
			; Recover registers
			;
			LDA ZP_NMI_HOLD_A
			
			; Exit NMI
			;
			JMP $DD0C

; /////////////////////////////		

NMIHANDLER5		STA ZP_NMI_HOLD_A		; Store A-Reg

; ----------------------------- Stabilise NMI (inverted timer method) -------------------

			; Perform a BPL branch that = f(7-A)
			;
			LDA $DC04			; Set A-reg = JITTER = countdown value from CIA#1 Timer A as it jitters between 7 and 1
			EOR #%00000111			; Invert result, i.e., set A = 7-A
			STA *+4				; Write the inversion to BPL's branch address
			BPL *+2				; <<< SELF-MODIFIED branch location... Branch forward by A, i.e., the number of bytes equal to the result 
							; of (7-JITTER) above
			
			; The BPL above will branch to any point in the following 7 bytes of RAM as f($DC04)
			
			LDA #$A9			; If A = 0, just execute LDAs from this point... if A = 1, start with $A9 (opcode for LDA #imm) + execute
			LDA #$A9			; If A = 2, just execute LDAs from this point... if A = 3, start with $A9 (opcode for LDA #imm) + execute
			LDA $EAA5			; If A = 4, just execute LDA $EAA5... if A = 5, just execute LDA $EA (as $A5 is the opcode for LDA zp), 
							; and if A = 6, just execute NOP (as $EA is the opcode for NOP)
			
; ----------------------------- @ this stage, NMI is stable -----------------------------

			LDA #05
			STA $D021							
									
			; Reset NMI vectors
			;
			LDA #<NMIHANDLER1
			STA $FFFA
			LDA #>NMIHANDLER1
			STA $FFFB
			
			; Reset stable NMI frequency 
			;
			LDA #<GAP_NMI1_AND_NMI2
			STA $DD04
			LDA #>GAP_NMI1_AND_NMI2
			STA $DD05
										
			; Recover registers
			;
			LDA ZP_NMI_HOLD_A
			
			; Exit NMI
			;
			JMP $DD0C
			
; /////////////////////////////	

IRST0			STA ZP_IRQ_HOLD_A		; Store A-Reg

			LDA #08
			STA $D020

			; Reset IRQ vectors
			;
			LDA #<IRST1	
			STA $FFFE
			LDA #>IRST1
			STA $FFFF
			
			; Set desired scanline for next IRST to trigger at
			; 
			LDA #80		
			STA $D012
											
			; Recover registers
			;
			LDA ZP_IRQ_HOLD_A
			
			ASL $D019			; Acknowledge IRQ	
			
			RTI

; /////////////////////////////	

IRST1			STA ZP_IRQ_HOLD_A		; Store A-Reg

			LDA #09
			STA $D020

			; Reset IRQ vectors
			;
			LDA #<IRST2	
			STA $FFFE
			LDA #>IRST2
			STA $FFFF
			
			; Set desired scanline for next IRST to trigger at
			;
			LDA #120		
			STA $D012
											
			; Recover registers
			;
			LDA ZP_IRQ_HOLD_A
			
			ASL $D019			; Acknowledge IRQ
			
			RTI

; /////////////////////////////	

IRST2			STA ZP_IRQ_HOLD_A		; Store A-Reg

			LDA #10
			STA $D020

			; Reset IRQ vectors
			;
			LDA #<IRST0	
			STA $FFFE
			LDA #>IRST0
			STA $FFFF
			
			; Reset scanline to position 0 for IRST0
			;
			LDA #00		
			STA $D012
											
			; Recover registers
			;
			LDA ZP_IRQ_HOLD_A
			
			ASL $D019			; Acknowledge IRQ
			
			RTI

; /////////////////////////////	

GAP_NMI1_AND_NMI2 = 630		; 10 raster lines
GAP_NMI2_AND_NMI3 = 1260	; 20 raster lines
GAP_NMI3_AND_NMI4 = 2520	; 40 raster lines
GAP_NMI4_AND_NMI5 = 630		; 10 raster lines
GAP_NMI5_AND_NMI1 = 14611	; Formula: "wrap around gap" = (63 cycles * 312 scanlines) - (total # of user-defined gap cycles) - (total # of NMI handlers)
				; i.e. GAP_NMI5_AND_NMI1 = 19656 - (630 + 1260 + 2520 + 630) - 5

ZP_NMI_HOLD_A = $03					
ZP_IRQ_HOLD_A = $04

I would encourage the reader to try the following experiments with the code above:

Try adding / subtracting one cycle to GAP_NMI5_AND_NMI1 to see the effect.
Try to add an extra NMI handler whilst keeping the NMI chain stable.
Try a different value for the CMP $D012 in the NMI start-up routine.
Try removing the NMI stabilisation code from the 5th NMI handler; NOTE that in this example there is no need for stabilisation as there is not a proper “main loop” in the program running outside the interrupt code, but I have inserted the stabilisation code here anyway for experimentation purposes.
Try adjusting the cycles between NMI handler 4 and 5 to bring the stabilisation position off-screen on the LHS. Remember, when doing this you will have to add to the next NMI cycle gap (GAP_NMI5_AND_NMI1) as many cycles as you subtract from GAP_NMI4_AND_NMI5 otherwise the NMI chain will slide / shunt out of sync!
Try adding a long-ish task, e.g., a delay using a loop with some NOPs, inside one of the NMI handlers to see if you can trigger any stall events.
Try adding some sprites and moving them slowly up and down the screen to see if they have any effect on the interrupt trigger positions.
Try changing things so that NMI handlers run on Timer B instead of Timer A (use the NMI REGISTERS information below to guide you).

NMI REGISTERS

$DD04-$DD05: Lo- and Hi-Byte of Timer A’s “frequency” (i.e. # of cycles between NMI trigger events)
Read: Current timer value.
Write: Set timer start value.

$DD06-$DD07: Lo- and Hi-Byte of Timer B’s “frequency” (i.e. # of cycles between NMI trigger events)
Read: Current timer value.
Write: Set timer start value.

$DD0D: Interrupt control and status register.
Read:
Bit #0: 1 = Timer A underflow occurred.
Bit #1: 1 = Timer B underflow occurred.
...
Bit #7: NMI has been generated.

Write:
Bit #0: 1 = Enable NMIs generated by timer A underflow.
Bit #1: 1 = Enable NMIs generated by timer B underflow.

$DD0E: Timer A control register.
Bit #0: 0 = Stop timer; 1 = Start timer.
...
Bit #4: 1 = Load start value into timer.
...
Bit #7: TOD speed; 0 = 60 Hz; 1 = 50 Hz.

$DD0F: Timer B control register.
Bit #0: 0 = Stop timer; 1 = Start timer.
...
Bit #4: 1 = Load start value into timer.
...
Bit #7: 0 = Writes to the TOD registers set the clock; 1 = Writes to the TOD registers set the alarm.

Further Observations on the NMI

Without delving into the technicalities of 40-50 year old circuit diagrams (which I am unqualified to do), I think for some finality in covering this quite advanced subject we should make some closing observations about the NMI vis-à-vis the IRQ (for hyper-nerds interested in the electronics aspect of interrupts on the 6502 processor):

The IRQ is "level sensitive", which means its signal line has to be in a low state for it to fire, and it can do so repeatedly while in that low state.
Conversely, the NMI is "edge sensitive", which means only the actual transition from high to low in its signal line allows it to fire, so it cannot fire over and over while in a low state; rather it has to revert back to a high state first.
Some old literature on the NMI suggests that a handler must execute an SEI at its start. In fact, only the NMI's native ROM handler does so, and superfluously at that; custom NMI handlers have no need for it because, as with the IRQ, the I-flag is set during the CPU's internal interrupt sequence before the handler code is reached. This has caused some confusion in coding circles over the years, unsurprisingly enough.

For more on the above-described technical peculiarities, see Investigating Interrupts and edge sensitive and level sensitive interrupts.

Finally, the flowchart below is included for reference purposes and shows the IRQ and NMI processes in their "native" form, with the KERNAL ROM banked in, BASIC running, etc.

IRQ + NMI flowchart during default ROM operations
(Note the superfluous SEI in the ROM's NMI handler!)
