Write a Chip8 retro gaming emulator in one day. The CPU. And it’s done.

After the introduction to the Chip8 platform, loading the ROM, and creating the controller and the display, it’s time to complete the Chip8 emulator and spend quality time with some classic 8-bit games. If you missed the previous parts, you might want to have a look before moving on. In this article I am going to create the CPU and put the whole thing together.
Chip8 emulator running Space Invaders
See the complete code on GitHub

Understanding the CPU

Chip8 was just an interpreter, a virtual machine. It could run on many different devices, so it’s important to note that there is no such thing as a Chip8 CPU. However, we will be talking a lot about opcodes, registers and the stack, so it is convenient to think of Chip8 as an actual machine and pretend there was a CPU. I’ll stick to that for the rest of the article. The Cpu class is where the magic happens in my Chip8 emulator.

The stack

As mentioned previously, Chip8 uses the stack exclusively for calls and never for values. When executing a CALL instruction, Chip8 pushes the return address on the stack. Conversely, when executing a RETURN instruction, it pops that address from the stack in order to resume execution from where it was before the call. All the computations happen by means of registers. Entries in the stack are 16 bits wide, and the stack cannot contain more than 16 entries. A Chip8 stack and its push() and pop() operations can be implemented very easily with a List.

private var stack = List[Short]()
private def pop : Short = {
  val value = stack.head
  stack = stack.tail
  value
}

private def push(value:Short) = {
  stack = value :: stack
}

This code uses some Scala-specific syntax, so I am going to provide some explanation. As you can see, the stack is a List of Shorts. Popping one value means removing the first element of the list, known as the head. Every entry that comes after the head is part of the List’s tail. pop() simply detaches the head of the list, replaces the list with its tail and then returns the old head. push() prepends a new element to the List with the :: operator, so the new element becomes the head of the List.
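
The List itself does not enforce the 16-entry limit mentioned above. A minimal sketch of one way to add that check, assuming a hypothetical CallStack class that is not part of the emulator, and assuming that throwing an exception is an acceptable reaction to overflow and underflow:

```scala
// Hypothetical helper, not taken from the emulator: a call stack capped at 16 entries.
final class CallStack(maxDepth: Int = 16) {
  private var entries = List.empty[Short]

  def push(value: Short): Unit = {
    // Refuse to grow past the limit instead of silently accepting deeper nesting
    if (entries.length >= maxDepth) throw new IllegalStateException("Chip8 stack overflow")
    entries = value :: entries
  }

  def pop(): Short = entries match {
    case head :: tail =>
      entries = tail
      head
    case Nil =>
      // A RETURN without a matching CALL ends up here
      throw new IllegalStateException("Chip8 stack underflow")
  }

  def depth: Int = entries.length
}
```

Failing loudly is a debugging aid more than anything else: a ROM that overflows the stack is usually a sign of a bug in the CALL or RETURN implementation.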

The registers

The introduction to Chip8 contains an explanation of the Chip8 registers.

import java.util.concurrent.atomic.AtomicInteger

private val regVX = Array.ofDim[Byte](16)
private var regI:Short = _
private var regSound:Byte = _
private val regDelay:AtomicInteger = new AtomicInteger()
private var regPC:Int = Memory.PROGRAM_START

Notice the program counter register regPC, initialized to Memory.PROGRAM_START (0x200) by default, and the delay register regDelay. The delay register is an AtomicInteger. As soon as its value is set to something greater than zero, it is decremented 60 times per second until it reaches zero again. The delay register is very important for precise timing.

A thread is probably the best approach to keep the timer as decoupled as possible from everything else, such as the CPU. Since two threads then access the register, I use an AtomicInteger.

Understanding the opcodes

Chip8 opcodes are always two bytes long and the system is big endian: the most significant byte comes first and the least significant byte comes next. I know, I always confuse big and little endian too. This is the correct sequence of operations to fetch an opcode from memory:

val highByte = ram(address)
val lowByte = ram(address + 1)
(((highByte << 8) & 0xFF00) | (lowByte & 0xFF)).toShort
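
The masks in that expression matter because bytes are signed on the JVM: a byte such as 0xF0 is widened to a negative Int, and without the mask its sign bits would overwrite the high byte. A small self-contained sketch of the same operation, assuming a plain byte array as a stand-in for the emulator's memory and a hypothetical readOpcode function that returns an Int rather than a Short:

```scala
object OpcodeFetch {
  // Hypothetical stand-in for the emulator's memory access: ram is a plain byte array
  def readOpcode(ram: Array[Byte], address: Int): Int = {
    val highByte = ram(address) & 0xFF       // mask removes the sign extension
    val lowByte  = ram(address + 1) & 0xFF
    (highByte << 8) | lowByte                // big endian: most significant byte first
  }
}
```

With the bytes 0x12 and 0xF0 stored at consecutive addresses, readOpcode returns 0x12F0. Dropping the mask on the low byte would give 0xFFFFFFF0 instead.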

A Chip8 opcode contains two parts: the instruction and the data used by the instruction. The convention in all Chip8 documentation is to use letters for the data. Some opcodes use x, y and n; these are nibbles, sequences of four bits. In other opcodes you will find kk and nnn, which are two-nibble and three-nibble constants respectively. Opcodes that work with addresses, such as jumps or calls, use the three-nibble constant nnn. Very often the x and y nibbles are used to access values by indexing into the value registers (see the introduction to Chip8).
A good example is the opcode to draw a sprite on the screen:

Dxyn - DRW Vx, Vy, n
Display n-byte sprite starting at memory location I at (Vx, Vy)
Example: D125

The opcode D125 draws a sprite. It gets the horizontal coordinate of the sprite by reading value register 1 (x=1) and the vertical coordinate by reading value register 2 (y=2). The sprite is 5 lines tall. Remember that sprites are always 8 pixels wide. This opcode instructs the Chip8 to read the sprite data starting from the memory address currently stored inside register I, the index register.

8xy3 - XOR Vx, Vy
Set Vx = Vx XOR Vy
Example: 85F3

This opcode performs a bitwise xor between the value stored inside value register 5 and the value stored inside value register 15. It will store the result inside value register 5.

One more example that uses the kk constant

6xkk - LD Vx, byte
Set Vx = kk
Example: 6123

The hexadecimal value 23 is loaded inside value register 1.

Let's have a look at the jump opcode

1nnn - JP addr
Jump to location nnn
Example: 1344

This will jump to absolute address 0x344.
The following piece of Chip8 code draws a sprite, in this example the sprite data is stored at memory location 0x123. The sprite is 5 lines tall and will be drawn at coordinates x=6, y=8

A123 - LD I, 0x123      // Loads value 0x123 into the index register
6006 - LD V0, 6         // Sets V0 to 6
6108 - LD V1, 8         // Sets V1 to 8
D015 - DRW V0, V1, 5    // Displays a 5-byte sprite starting at memory location I at (x=V0=6, y=V1=8)
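
To check your reading of the examples above, it can help to turn the opcode values back into mnemonics. The following is only a sketch: MiniDisassembler is a hypothetical helper that is not part of the emulator, it takes the opcode as an Int, and it knows only the opcode families discussed in this section.

```scala
object MiniDisassembler {
  // Decodes only the opcode families discussed above; anything else is shown as raw hex.
  def disassemble(opcode: Int): String = {
    val x   = (opcode & 0x0F00) >> 8   // second nibble
    val y   = (opcode & 0x00F0) >> 4   // third nibble
    val n   = opcode & 0x000F          // last nibble
    val kk  = opcode & 0x00FF          // low byte
    val nnn = opcode & 0x0FFF          // low three nibbles
    (opcode & 0xF000) match {
      case 0x1000           => f"JP 0x$nnn%03X"
      case 0x6000           => f"LD V$x%X, 0x$kk%02X"
      case 0x8000 if n == 3 => f"XOR V$x%X, V$y%X"
      case 0xA000           => f"LD I, 0x$nnn%03X"
      case 0xD000           => f"DRW V$x%X, V$y%X, $n%d"
      case _                => f"0x$opcode%04X"
    }
  }
}
```

For instance, MiniDisassembler.disassemble(0xD015) gives "DRW V0, V1, 5" and MiniDisassembler.disassemble(0x85F3) gives "XOR V5, VF".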

Implementing the CPU

Chip8 has several kinds of opcodes: opcodes for drawing, jump and call opcodes, conditional opcodes and arithmetic opcodes. All of them follow the same pattern of containing the data that they need. I am not going to list them all here, as there are plenty of resources online. What I used is Cowgod's Chip-8 Technical Reference v1.0, which is very detailed and complete.

To emulate a CPU means to implement the fetch+decode+execute cycle

  1. Fetch: retrieve the opcode from memory at the address stored in the program counter (regPC)
  2. Decode: a magic stage where the CPU's hardware translates the opcode value into signals that activate the right circuits. This is kind of unnecessary in software emulation
  3. Execute: perform the operation

This process repeats for every clock cycle. We can use a while loop for this, with the boolean value returned from display.isRunning() as the exit condition. This method checks whether the display window is still open. As soon as the user clicks the close button on the window, the CPU stops executing and the program terminates.

One more thing is needed inside the loop: a mechanism to control the execution speed. Without it the emulation will be too fast, and programs that assume a certain CPU frequency will not work properly. Programs should never assume a specific CPU frequency, Chip8 programs least of all. A simple sleep will do the trick. In my Chip8 emulator I used a value of 1000 Hz for CPU_FREQUENCY_HZ. This might be too high, but I found that playing games is much more pleasant with this value.

while(display.isRunning()) {
  // fetch the next opcode, execute it, then sleep to hold CPU_FREQUENCY_HZ
}

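As a rough illustration of the throttled loop described above, here is a minimal sketch. The names are assumptions: isRunning stands in for display.isRunning(), step stands in for one fetch plus execute, and CpuLoop is a hypothetical object that does not exist in the emulator.

```scala
object CpuLoop {
  // isRunning and step are hypothetical stand-ins for display.isRunning() and fetch + execute
  def run(isRunning: () => Boolean, step: () => Unit, frequencyHz: Int): Unit = {
    // Integer division: 1 ms per instruction at 1000 Hz, 2 ms at 500 Hz
    val sleepMillis = 1000L / frequencyHz
    while (isRunning()) {
      step()
      Thread.sleep(sleepMillis)
    }
  }
}
```

Because of the integer division, any frequency above 1000 Hz yields a sleep of zero milliseconds in this sketch, and the actual rate also depends on how precisely the operating system honours the sleep.
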
Fetching the opcode means reading it from memory at the address inside the program counter register. For each cycle I read the current opcode and then increment the program counter by two. Remember that each opcode is two bytes long.

private var regPC:Int = 0x200        // Chip8 programs always start at address 0x200
def fetch : Short = {
  val value = memory.ramReadTwoBytes(regPC)
  regPC = regPC + 2
  value
}

As soon as I started coding, I felt the need for a bunch of methods to easily extract the nibbles from the opcodes. Without them the code gets too messy. This is the decode phase, and these are the methods.

private def getXandKK(instruction:Short) : (Byte, Byte) = {
  val x: Byte = ((instruction & 0x0F00) >> 8).toByte
  val kk: Byte = (instruction & 0x00FF).toByte
  (x, kk)
}

private def getX(instruction:Short) : Byte = {
  ((instruction & 0x0F00) >> 8).toByte
}

private def getXandY(instruction:Short) : (Byte, Byte) = {
  val x: Byte = ((instruction & 0x0F00) >> 8).toByte
  val y: Byte = ((instruction & 0x00F0) >> 4).toByte
  (x, y)
}

private def getXYN(instruction:Short) : (Byte, Byte, Byte) = {
  val x: Byte = ((instruction & 0x0F00) >> 8).toByte
  val y: Byte = ((instruction & 0x00F0) >> 4).toByte
  val n: Byte = (instruction & 0x000F).toByte
  (x, y, n)
}

private def getNNN(instruction:Short) : Short = {
  (instruction & 0x0FFF).toShort
}

In Scala I can use tuples to return more than one value, which is why you see expressions like (x, y, n) at the end of the methods. It's very convenient.
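
On the calling side a tuple can be unpacked into separate values in a single line. The sketch below repeats getXandKK inside a hypothetical TupleDemo object so that it runs on its own. The unsigned helper is an assumption of mine rather than something from the emulator: it shows how to read a kk value of 0x80 or above, which the signed JVM Byte stores as a negative number.

```scala
object TupleDemo {
  def getXandKK(instruction: Short): (Byte, Byte) = {
    val x: Byte  = ((instruction & 0x0F00) >> 8).toByte
    val kk: Byte = (instruction & 0x00FF).toByte
    (x, kk)
  }

  // Widen a signed JVM byte back to its unsigned 0..255 value
  def unsigned(b: Byte): Int = b & 0xFF

  def describe(instruction: Short): String = {
    val (x, kk) = getXandKK(instruction)   // unpack the tuple into two values
    s"V$x = ${unsigned(kk)}"
  }
}
```

For the opcode 61F0 the raw kk byte is -16, while unsigned(kk) is 240, so describe returns "V1 = 240".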


The main component of the CPU is a huge conditional block that performs the correct operation for each opcode. I opted for a switch based on pattern matching. Pattern matching is basically a switch on steroids. You might not have it if you are using a different language, in which case you can just fall back to the good old if..else construct. This is an example of the pattern matching conditional logic that I used.

def execute(instr : Short) : Unit = {
  instr match {
    case v if v == 0x00EE =>
      // Return from subroutine
      regPC = pop

    case v if v == 0x00E0 =>
      // Clear display

    case v if (v & 0xF0FF) == 0xF007 =>
      // Set Vx = delay timer value
      val x = getX(v)
      regVX(x) = (regDelay.get() & 0xFF).toByte

    // ... the remaining opcodes follow the same pattern
  }
}

Chip8 has 35 opcodes and I will not list the whole implementation here as this would clutter the article. I'm pretty sure that by this point you are able to figure out what is happening for the remaining opcodes yourself. Have a look at the Cpu class.
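
To give an idea of how a few more cases could look, here is a self-contained sketch covering the three opcodes explained earlier (1nnn, 6xkk and 8xy3). MiniCpu is a hypothetical class and not the Cpu class from the repository. It takes opcodes as Int and matches on the top nibble, which is a different style from the guards shown above.

```scala
final class MiniCpu {
  val regVX = Array.ofDim[Byte](16)
  var regPC: Int = 0x200

  def execute(instr: Int): Unit = (instr & 0xF000) match {
    case 0x1000 =>
      // 1nnn: jump to address nnn
      regPC = instr & 0x0FFF
    case 0x6000 =>
      // 6xkk: load the constant kk into Vx
      regVX((instr & 0x0F00) >> 8) = (instr & 0x00FF).toByte
    case 0x8000 if (instr & 0x000F) == 3 =>
      // 8xy3: Vx = Vx XOR Vy
      val x = (instr & 0x0F00) >> 8
      val y = (instr & 0x00F0) >> 4
      regVX(x) = (regVX(x) ^ regVX(y)).toByte
    case _ =>
      // Everything else is left out of this sketch on purpose
      throw new IllegalArgumentException(f"Opcode 0x$instr%04X not handled in this sketch")
  }
}
```

Running 650F, 6FF0 and 85F3 in sequence leaves 0xFF in V5, and running 1344 sets the program counter to 0x344.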

The timer thread

Now that the CPU is done we need the delay timer. We can emulate it as a thread that decrements the delay register 60 times per second until the value reaches zero. The implementation is quite straightforward.

var lastDelayDecrement = System.currentTimeMillis()
val delayThread = new Thread {
  override def run(): Unit = {
    while(!isInterrupted) {
      val elapsed = System.currentTimeMillis() - lastDelayDecrement
      // Roughly 60 Hz
      if (elapsed >= 16) {
        lastDelayDecrement = System.currentTimeMillis()
        // Count down until the register reaches zero
        if (regDelay.get() > 0) regDelay.decrementAndGet()
      }
    }
  }
}

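A loop that keeps polling the clock keeps one core busy. As an alternative sketch, and not what the emulator does, the same countdown can be handed to a ScheduledExecutorService from the Java standard library. DelayTimer, tick and start are hypothetical names, and the lambdas assume Scala 2.12 or later.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object DelayTimer {
  val regDelay = new AtomicInteger(0)

  // Atomically decrement the register, but never below zero
  def tick(): Unit = {
    regDelay.updateAndGet(v => if (v > 0) v - 1 else 0)
    ()
  }

  def start(): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    // 1,000,000 / 60 microseconds is about 16.67 ms, slightly closer to 60 Hz than a 16 ms check
    scheduler.scheduleAtFixedRate(() => tick(), 0L, 1000000L / 60, TimeUnit.MICROSECONDS)
    scheduler   // call shutdownNow() on the returned scheduler when the display closes
  }
}
```

The trade-off is that the timer no longer lives in a Thread you can interrupt. Stopping it means calling shutdownNow() on the scheduler that start() returns.
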
Let's finish this

We can start wiring all the components together and creating the Chip8 emulator.

println("+++ " + EmulatorParameters.NAME + " started +++")
val memory = new Memory

This first part creates the memory.

println("Loading interpreter")
val interpreter = new Interpreter(memory)

Here we create the interpreter and load the system fonts in memory. With the next piece of code we load the ROM. The name of the ROM was previously retrieved from the command line arguments and stored in the programName variable.

println("Loading '" + programName + "'")
val programSize = interpreter
    .getOrElse {

We move on to instantiating the CPU, the Controller and the Display. This will show the display window.

 println("Loaded " + programName + " : " + programSize + " bytes" )
 val controller = new Controller
 val display = new Display(memory, controller)
 val cpu = new Cpu(memory, controller)

At this point we can start the two threads, one for rendering graphics on the display and the other one for taking care of the delay register.


And finally we can run the main CPU loop

while(display.isRunning()) {
  // fetch, execute and throttle as described earlier
}

We want to exit the CPU execution loop as soon as the user closes the window. In addition to that, we also want to stop the delay thread.


And this is it. The emulator is complete. Congratulations if you made it this far.

Debugging the damn thing

The good news is that you have a working Chip8 emulator.

The bad news is that you probably did something wrong and need to debug this thing by trying several ROMs and comparing the results to a working Chip8 emulator. That's ok. If it worked on the first try you should be suspicious anyway. You can test individual features by running the tests inside the programs folder. These programs are far less complex than demos and games and make it a lot easier to debug your Chip8 emulator. Unfortunately I cannot help you with this. All I can say is don't give up now! You're almost there!

If you want to use my Chip8 emulator as a reference for debugging, you can enable debug logs by setting EmulatorParameters.DEBUG_CPU to true. This will print the execution address and the value of every register for each instruction. Collecting this output and diffing it against your Chip8 emulator's output (just copy the same code into your CPU) can greatly help in spotting problems.
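
If you prefer to compare traces programmatically, something along these lines could work. This is only a sketch: the log format (program counter followed by V0 to VF in hex) is a made-up assumption and will not match the DEBUG_CPU output exactly, and TraceDiff is a hypothetical helper.

```scala
object TraceDiff {
  // Hypothetical log format: program counter followed by the value registers, all in hex
  def traceLine(pc: Int, regs: Array[Byte]): String =
    f"$pc%04X " + regs.map(b => f"${b & 0xFF}%02X").mkString(" ")

  // Index of the first line where the two traces disagree, if any.
  // A missing line in the shorter trace counts as a difference.
  def firstDivergence(reference: Seq[String], mine: Seq[String]): Option[Int] = {
    val idx = reference.zipAll(mine, "", "").indexWhere { case (a, b) => a != b }
    if (idx >= 0) Some(idx) else None
  }
}
```

The first diverging line usually points at the opcode that misbehaves: the instruction executed just before it is the one to inspect.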


Thank you for reading this far, I hope this series of articles was a good inspiration for you to code your own Chip8 emulator. I had a lot of fun doing this, I hope to find the time to write more about retro gaming emulation, it's a fascinating thing to learn how things were done before our days. Please share your Chip8 emulator projects here in the comments!

Please check out the mighty SuperCHIP emulator as well!
