CHIP-8 in Rust

The source code for this project can be found here on my GitHub. This was an exercise in learning Rust for me, so the code may not follow best practices everywhere. I have not gone back and rewritten any of it since I completed it, which was some time ago now. Critique and feedback are welcome.

I think emulators are very cool. I’ve been playing old Nintendo and PlayStation games on emulators since high school. Recently I’ve been curious how they work, and after much lurking on r/emuDev I decided to try my hand at writing one. Ultimately I’d like to write a Game Boy or NES emulator, but the usual advice is to start with CHIP-8, so here we are.

I’ve taken quite a liking to Rust since completing this project. Rust makes me nostalgic for the low-level language work I did in college, which I don’t get to do much these days. Most importantly, there are SDL2 bindings for Rust, so all the handy windowing, graphics, and input functionality I might otherwise get from C++ is available.

Reference Material

I primarily relied on this page for the architecture and instruction set of the CHIP-8: Cowgod’s Chip-8 Technical Reference. It is often cited in comments on r/emuDev, which is where I found it. That page provides a thorough overview of the CHIP-8 hardware and architecture as well as a full listing of the instruction set. Note that while that page also covers the Super CHIP-8, I did not implement that functionality.

A Note on Dependencies

This project uses two crates beyond the standard library; their documentation can be found here:

rand – used to generate random numbers
rust-sdl2 – SDL2 bindings for Rust, used for all graphical window, sound, and keyboard functionality

The Overall Architecture

The CHIP-8 consists of the following components:

4096 bytes of memory, addressable per byte
16-bit program counter register
16 8-bit general purpose registers
8-bit register for the stack pointer
16-level 16-bit call stack
8-bit register for the delay timer
8-bit register for the sound timer
16-bit register called ‘I’ or index
A 16-key keypad
64x32 pixel monochrome display
A single-tone sound device

There is also a 60 Hz clock used solely to decrement the timer registers. The CPU clock has no single standard speed: it varied with the hardware running CHIP-8, and 500 Hz to 700 Hz is a commonly used range, so the emulator lets the user configure the clock speed. This matters because some ROMs run best at a particular clock speed.

I have organized these pieces into the following modules:

cpu.rs - Models the memory, registers, and the fetch-decode-execute loop
screen.rs - Models the monochrome display
keypad.rs - Models the 16-key keypad
audio.rs - Models the single-tone audio device
main.rs - Emulator entry point, takes user input and runs the emulator

First let’s look at each of those components individually, then we’ll tie everything together and break down the emulator execution loop that actually runs the chosen ROM.

The CPU

The CPU is modeled by the struct below. Most of the fields are straightforward: they map directly to one of the components listed above. Some of the fields, such as status and wait_for_key, hold internal state and will come up later.

pub struct Cpu {
    mem: [u8; 4096],
    regs: [u8; 16],
    pc: u16,
    idx: u16,
    stack: [u16; 16],
    sp: i8,
    sound: u8,
    delay: u8,
    timer_time: Instant,
    screen: screen::Screen,
    audio: audio::Audio,
    keypad: keypad::Keypad,
    beep: bool,
    pub status: CpuStatus,
    wait_for_key: bool,
}

The CPU tick and the instruction implementations are all methods in the Cpu impl block. Decoding and executing instructions is covered later, but a typical instruction implementation looks like this:

impl Cpu {
    ...
    //5XY0 - skip next instruction if value in register X and Y are equal
    fn category_5(&mut self, x: u8, y: u8) {
        let xval: u8 = self.regs[x as usize];
        let yval: u8 = self.regs[y as usize];
        if xval == yval {
            self.pc = self.pc + 2;
        }
    }
}

The CPU also has several helper functions to handle per-tick updates such as updating the keyboard state in Cpu.keypad and decrementing the timers. The 60 Hz clock is also maintained in the Cpu object.

impl Cpu {
    ...
    pub fn tick(&mut self, kb_state: sdl2::keyboard::KeyboardState) {
        let now = Instant::now();
        //decrement timers if 60 Hz worth of micros have passed since last time
        if now.duration_since(self.timer_time).as_micros() >= MICROS_60_HZ {
            self.timer_time = now;
            self.decrement_counters();
        }
        ...
        //Process keyboard input passed in from main
        self.keypad.update_pressed_keys(kb_state);
        ...
    }
    pub fn decrement_counters(&mut self) {
        if self.delay > 0 {
            self.delay = self.delay - 1;
        }
        if self.sound > 0 {
            self.sound = self.sound - 1;
        }
        if self.sound == 0 {
            self.beep = false;
        }
    }
}
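The MICROS_60_HZ constant isn’t shown above. A minimal standalone sketch of the same timer logic follows, assuming the constant is one 60 Hz period in microseconds and is typed u128 because Duration::as_micros returns u128. The Timers struct is a hypothetical stand-in for the timer fields of Cpu, and saturating_sub replaces the explicit greater-than-zero checks.

```rust
use std::time::{Duration, Instant};

/// Assumed value: one 60 Hz period in microseconds (1_000_000 / 60, truncated to 16_666).
/// Typed u128 so it compares directly with Duration::as_micros().
const MICROS_60_HZ: u128 = 1_000_000 / 60;

/// Hypothetical stand-in for the timer fields of Cpu.
struct Timers {
    delay: u8,
    sound: u8,
    timer_time: Instant,
}

impl Timers {
    /// Decrements both timers (stopping at 0) if a full 60 Hz period has passed
    /// since the last decrement. Returns true if a decrement happened.
    fn update(&mut self, now: Instant) -> bool {
        if now.duration_since(self.timer_time).as_micros() >= MICROS_60_HZ {
            self.timer_time = now;
            self.delay = self.delay.saturating_sub(1);
            self.sound = self.sound.saturating_sub(1);
            true
        } else {
            false
        }
    }
}
```

Because at most one decrement happens per tick, this relies on the CPU tick (roughly 1.4 ms to 2 ms at 500 to 700 Hz) being much shorter than a 16.7 ms timer period.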

On the original CHIP-8 systems the first 512 bytes of memory (0x000 to 0x1FF) were reserved for the interpreter itself. An emulator doesn’t need to store the interpreter, so the only data in this space is the sprite data for the built-in font. Any additional sprites a program needs are stored in the program’s own memory space. All internal values are initialized to 0 except the program counter, which starts at 0x200 where programs are loaded, and the stack pointer, which starts at -1 to mark an empty stack.

const FONT_DATA: [u8; 80] = [
    0xF0, 0x90, 0x90, 0x90, 0xF0, // 0
    0x20, 0x60, 0x20, 0x20, 0x70, // 1
    ...
    0xF0, 0x80, 0xF0, 0x80, 0xF0, // E
    0xF0, 0x80, 0xF0, 0x80, 0x80  // F
];
...
pub fn new(ctx: &sdl2::Sdl, fname: &str) -> Cpu {
    let mut cpu = Cpu {
        mem: [0; 4096],
        regs: [0; 16],
        pc: 0x200,
        idx: 0,
        stack: [0; 16],
        sp: -1,
        sound: 0,
        delay: 0,
        ...
    };
    // Copy the 80-byte font into the start of memory (0x000 to 0x04F).
    cpu.mem[..FONT_DATA.len()].copy_from_slice(&FONT_DATA);
    ...
    // Return the initialized CPU.
    cpu
}
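Because FONT_DATA is copied to address 0 and each glyph is 5 bytes (one byte per row), the Fx29 instruction (I = address of the font sprite for Vx) reduces to a multiplication. The following is a minimal sketch under those assumptions. The function and constant names are hypothetical, and masking Vx to its low nibble is one reasonable choice rather than something this post specifies.

```rust
/// Bytes per built-in font glyph: each digit sprite is 5 rows, one byte per row.
const FONT_GLYPH_BYTES: u16 = 5;
/// Assumed load address of FONT_DATA, matching the constructor above (address 0).
const FONT_START: u16 = 0x000;

/// Hypothetical helper: the address Fx29 would store in I for the hex digit in Vx.
/// Only the low nibble is used, since the font covers digits 0x0 through 0xF.
fn font_sprite_addr(vx: u8) -> u16 {
    FONT_START + (vx & 0x0F) as u16 * FONT_GLYPH_BYTES
}
```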

The ROM itself is copied into memory while the Cpu is being constructed, as shown later. Once emulation begins, the CPU is responsible for fetching, decoding, and executing the instructions in memory.

The Screen

The screen is represented by an SDL2 window. The CHIP-8 display is only 64x32 pixels, which is very hard to see on a modern monitor, so my implementation applies a scale factor of 10 for a 640x320 window.

This is the screen displaying the opcodes from a test ROM:

The data structure for the screen is very simple because most of it is abstracted into the SDL2 WindowCanvas object. The only major screen function beyond instantiating the SDL window object is drawing the new pixel array when the DXYN instruction is executed.

pub struct Screen {
    pixels: [bool; NUM_PIXELS],
    canvas: sdl2::render::WindowCanvas,
}
impl Screen {
    pub fn draw(&mut self) {
        self.canvas.set_draw_color(COLOR_OFF);
        self.canvas.clear();
        self.canvas.set_draw_color(COLOR_ON);
        for number in 0..(NUM_PIXELS) {
            if self.pixels[number] {
                let xcoord = ((number % SCREEN_WIDTH_BASE) * SCALE) as i32;
                let ycoord = ((number / SCREEN_WIDTH_BASE) * SCALE) as i32;
                let rect = Rect::new(xcoord, ycoord, SCALE as u32, SCALE as u32);
                self.canvas.fill_rect(rect).unwrap();
            }
        }
        self.canvas.present();
    }
}

The CPU makes use of the set_pixel and get_pixel helper functions on the screen object in order to update the boolean pixel array when DXYN is executed. The draw function then just draws the current pixel array to the screen – all the XORing and collision detection is performed in the CPU.
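The set_pixel and get_pixel helpers aren’t shown in this post. The following is a hypothetical sketch of what they might look like, using the same row-major indexing as draw() (index = x + y * 64). PixelBuffer is an assumed stand-in for Screen without the SDL canvas.

```rust
const SCREEN_WIDTH_BASE: usize = 64;
const SCREEN_HEIGHT_BASE: usize = 32;
const NUM_PIXELS: usize = SCREEN_WIDTH_BASE * SCREEN_HEIGHT_BASE;

/// Hypothetical screen with only the boolean pixel array (no SDL canvas).
struct PixelBuffer {
    pixels: [bool; NUM_PIXELS],
}

impl PixelBuffer {
    /// Row-major index: a full row of 64 pixels, then the next row.
    fn index(x: usize, y: usize) -> usize {
        x + y * SCREEN_WIDTH_BASE
    }

    /// Inverse of index(), the same math draw() uses to place each rectangle.
    fn coords(idx: usize) -> (usize, usize) {
        (idx % SCREEN_WIDTH_BASE, idx / SCREEN_WIDTH_BASE)
    }

    fn get_pixel(&self, idx: usize) -> bool {
        self.pixels[idx]
    }

    fn set_pixel(&mut self, idx: usize, on: bool) {
        self.pixels[idx] = on;
    }
}
```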

The scale factor and the pixel colors are kept as constants in screen.rs. From the CPU’s perspective the screen is still 64x32 and pixels are simply on or off, so the scale factor or colors can be changed without touching the CPU code.

const COLOR_OFF: Color = Color {
    r: 0,
    g: 0,
    b: 0,
    a: 0xFF,
};
const COLOR_ON: Color = Color {
    r: 0xFF,
    g: 0xFF,
    b: 0xFF,
    a: 0xFF,
};
pub const SCALE: usize = 10;
pub const SCREEN_WIDTH_BASE: usize = 64;
pub const SCREEN_WIDTH: usize = SCREEN_WIDTH_BASE * SCALE;
pub const SCREEN_HEIGHT_BASE: usize = 32;
pub const SCREEN_HEIGHT: usize = SCREEN_HEIGHT_BASE * SCALE;
pub const NUM_PIXELS: usize = SCREEN_WIDTH_BASE * SCREEN_HEIGHT_BASE;

The Keypad

The keypad module represents the 16-key CHIP-8 input device. The keypad struct is fairly simple:

pub struct Keypad {
    pub keys: [bool; 16],  // bool per key, true if pressed
    pub key_pressed: bool, // bool for 'is any key pressed'
    pub latest_key: u8,    // integer ID of last key pressed
}

SDL2’s keyboard module handles the interface with the physical keyboard; my keypad module maintains the state needed by the CPU’s handful of keypad-related instructions. This is done by the update_pressed_keys function in keypad.rs, which runs on each CPU tick and keeps the keys array up to date.

pub fn update_pressed_keys(&mut self, state: sdl2::keyboard::KeyboardState) {
    let pressed_keys: HashSet<Scancode> = state.pressed_scancodes().collect();
    for number in 0..16 {
        let pressed = pressed_keys.contains(&SCANCODE_MAP[number]);
        if !self.keys[number] && pressed {
            self.key_pressed = true;
            self.latest_key = number as u8;
        }
        self.keys[number] = pressed;
    }
}

The instructions that care about the keypad values then reference the state stored in the Keypad object, knowing they are up-to-date as of the beginning of the tick.
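The SCANCODE_MAP table isn’t shown. As an assumption, a common choice maps the original COSMAC VIP hex keypad (1 2 3 C / 4 5 6 D / 7 8 9 E / A 0 B F) onto the 1234/QWER/ASDF/ZXCV block of a QWERTY keyboard. The following sketch uses plain chars instead of SDL scancodes so it runs without SDL. It reproduces the newly-pressed edge detection from update_pressed_keys and adds the lookup that Ex9E and ExA1 need. KEY_MAP and is_down are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical layout: index = CHIP-8 key value, entry = physical QWERTY key.
const KEY_MAP: [char; 16] = [
    'x', '1', '2', '3', // 0, 1, 2, 3
    'q', 'w', 'e', 'a', // 4, 5, 6, 7
    's', 'd', 'z', 'c', // 8, 9, A, B
    '4', 'r', 'f', 'v', // C, D, E, F
];

struct Keypad {
    keys: [bool; 16],
    key_pressed: bool,
    latest_key: u8,
}

impl Keypad {
    /// Same logic as update_pressed_keys: a key only counts as newly pressed
    /// if it was up on the previous tick and is down now.
    fn update(&mut self, pressed: &HashSet<char>) {
        for number in 0..16 {
            let down = pressed.contains(&KEY_MAP[number]);
            if !self.keys[number] && down {
                self.key_pressed = true;
                self.latest_key = number as u8;
            }
            self.keys[number] = down;
        }
    }

    /// Ex9E skips the next instruction when this is true; ExA1 skips when it is false.
    fn is_down(&self, vx: u8) -> bool {
        self.keys[(vx & 0x0F) as usize]
    }
}
```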

The Audio

The audio module is a lot of boilerplate code to make SDL2 emit a tone. The CHIP-8 sound device is a monotone on-or-off setup that emits its tone whenever the sound timer register is non-zero.

These are the high-level pieces; the code is essentially taken from the rust-sdl2 audio docs.

pub enum AudioState {
    On,
    Off,
}
pub struct Audio {
    pub device: sdl2::audio::AudioDevice<SquareWave>,
    pub state: AudioState,
}
impl Audio {
    ...
    pub fn beep(&mut self, state: AudioState) {
        match state {
            AudioState::On => self.device.resume(),
            AudioState::Off => self.device.pause(),
        }
    }
}

The beep function is called on every tick when the sound timer is checked, setting AudioState::On if the timer is non-zero and AudioState::Off otherwise.
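The SquareWave type comes from the rust-sdl2 audio example. Its per-sample logic is sketched below as a plain method so it runs without SDL; in the real module the same loop sits inside an implementation of the sdl2::audio::AudioCallback trait. Here phase_inc is the tone frequency divided by the sample rate, for example 440.0 / 44_100.0 for a 440 Hz tone at 44.1 kHz.

```rust
/// Square-wave generator state, adapted from the rust-sdl2 audio example.
struct SquareWave {
    phase_inc: f32, // tone frequency / sample rate
    phase: f32,     // position within the current cycle, in [0, 1)
    volume: f32,
}

impl SquareWave {
    /// Fills a buffer of samples: high for the first half of each cycle, low for the second.
    fn fill(&mut self, out: &mut [f32]) {
        for sample in out.iter_mut() {
            *sample = if self.phase <= 0.5 { self.volume } else { -self.volume };
            self.phase = (self.phase + self.phase_inc) % 1.0;
        }
    }
}
```

Since the device is only resumed or paused, the generator itself never needs to know about the sound timer.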

Connecting the Pieces

The entry point file main.rs handles various important setup pieces before starting the main loop.

First, the desired clock speed and ROM path are read from the user. This is mostly parsing and validating the user’s input. The clock speed is converted from hertz to microseconds per cycle; this value is then used to pad time at the end of each tick so that every cycle takes a consistent amount of time.

let clock_speed_micros = ((1 as f32 / clock_hz) * 1_000_000 as f32) as u32;
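As a worked example of that conversion, here it is as a standalone function with a hypothetical name, checked against the common 500 to 700 Hz range. The cast to u32 truncates, so 600 Hz gives 1666 µs per cycle, which runs very slightly fast (about 600.24 Hz).

```rust
/// Microseconds per CPU cycle for a clock speed in Hz (truncated toward zero).
fn micros_per_cycle(clock_hz: f64) -> u32 {
    (1_000_000.0 / clock_hz) as u32
}
```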

Next, SDL is initialized and an event pump is created. The SDL context should only be initialized once, so this handle is passed into the downstream objects that need it.

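    // panic!(e) with a non-literal argument is rejected in the 2021 edition,
    // so the error is formatted into the panic message instead.
    let sdl_context = sdl2::init()
        .unwrap_or_else(|e| panic!("failed to initialize SDL: {}", e));
    let mut event_pump = sdl_context
        .event_pump()
        .unwrap_or_else(|e| panic!("failed to create SDL event pump: {}", e));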

The Cpu object is created from the SDL context and the path to the ROM. The Cpu constructor creates the screen, audio, and keypad objects and passes the context to those that need it. The screen and audio constructors initialize the SDL video and audio subsystems, while the keypad needs no SDL handle because the keyboard state is passed in from the event pump on every tick. Finally, the contents of the ROM file are copied into memory starting at 0x200.

//main.rs
let mut cpu = cpu::Cpu::new(&sdl_context, rom_path.to_str().unwrap());

//cpu.rs -- Cpu new(...) function
let mut cpu = Cpu {
    mem: [0; 4096],
    ...
    timer_time: Instant::now(),
    screen: screen::Screen::new(ctx),
    audio: audio::Audio::new(ctx),
    keypad: keypad::Keypad::new(),
    beep: false,
    status: CpuStatus::Running,
    wait_for_key: false,
};
// Copy the ROM into memory starting at the initial PC (0x200).
// This panics if the ROM is larger than the 3584 bytes available.
let start = cpu.pc as usize;
cpu.mem[start..start + rom.len()].copy_from_slice(&rom);
//screen.rs -- Screen new(...) function
pub fn new(ctx: &sdl2::Sdl) -> Screen {
    let video_subsystem = ctx.video().unwrap();
    let window = video_subsystem.window("Chip8", SCREEN_WIDTH as u32, SCREEN_HEIGHT as u32)
        .position_centered()
        .build()
        .expect("failed to create SDL window");
    ...
    screen
}

The SDL context is passed into the Cpu constructor, and the Cpu object owns the screen, keypad, and audio objects. This works because the loop in main.rs only ever needs to interact directly with the Cpu object. Keeping these handles outside the Cpu object was very messy and caused a lot of headaches.

Once the initial user input and the initialization of the CPU are complete the steady-state run loop begins and continues until either the user exits the window or the final ROM instruction is passed.

    loop {
        match cpu.status {
            cpu::CpuStatus::Running => {
                // Closing the window halts the CPU; the loop exits on the next pass.
                for event in event_pump.poll_iter() {
                    if let Event::Quit { .. } = event {
                        cpu.halt();
                    }
                }
                let start = Instant::now();
                cpu.tick(event_pump.keyboard_state());
                // Sleep for whatever remains of this cycle so each tick takes
                // roughly clock_speed_micros in total.
                let cycle = Duration::from_micros(clock_speed_micros as u64);
                let elapsed = start.elapsed();
                if elapsed < cycle {
                    std::thread::sleep(cycle - elapsed);
                }
            }
            cpu::CpuStatus::Halted => break,
        }
    }

If the CPU status has changed to cpu::CpuStatus::Halted, the program exits. Otherwise, note the time this cycle started, execute a CPU tick, then wait until sufficient microseconds have passed to maintain the clock speed before moving on to the next cycle. The tick function handles fetching, decoding, and executing the instructions in the CPU and maintaining the CPU state.

Executing a ROM

CPU execution on the CHIP-8 uses a basic fetch-decode-execute loop. Instructions are read from memory, decoded into operation and operands, and then the appropriate CPU instruction is executed with those operands.

CHIP-8 instructions are 16 bits long and occupy two consecutive memory addresses. They are stored big-endian (high byte first), and programs are loaded starting at address 0x200. The valid CHIP-8 instructions are as follows:

Opcode	Operation
00E0	Clear screen
00EE	Return from subroutine: PC = stack pop
0nnn	SYS nnn: machine-code call, unused (ignored by modern interpreters)
1nnn	Jump: PC = nnn
2nnn	Call: push PC, PC = nnn
3xkk	Skip (PC += 2) if Vx == kk
4xkk	Skip (PC += 2) if Vx != kk
5xy0	Skip (PC += 2) if Vx == Vy
6xkk	Vx = kk
7xkk	Vx = Vx + kk (Vf unchanged)
8xy0	Vx = Vy
8xy1	Vx = Vx OR Vy
8xy2	Vx = Vx AND Vy
8xy3	Vx = Vx XOR Vy
8xy4	Vx = Vx + Vy, Vf = carry
8xy5	Vx = Vx - Vy, Vf = NOT borrow
8xy6	Vx = Vx >> 1, Vf = low bit shifted out
8xy7	Vx = Vy - Vx, Vf = NOT borrow
8xyE	Vx = Vx << 1, Vf = high bit shifted out
9xy0	Skip (PC += 2) if Vx != Vy
Annn	I = nnn
Bnnn	PC = V0 + nnn
Cxkk	Vx = random byte AND kk
Dxyn	Draw n-byte sprite from memory at I at (Vx, Vy), Vf = collision
Ex9E	Skip (PC += 2) if key Vx is pressed
ExA1	Skip (PC += 2) if key Vx is not pressed
Fx07	Vx = delay timer
Fx0A	Wait for a key press, store the key in Vx
Fx15	Delay timer = Vx
Fx18	Sound timer = Vx
Fx1E	I = I + Vx
Fx29	I = address of the font sprite for digit Vx
Fx33	Store BCD of Vx in memory at I, I+1, I+2
Fx55	Store V0 through Vx in memory starting at I
Fx65	Load V0 through Vx from memory starting at I

The lowercase letters (x, y, kk, n, nnn) are operands, while the uppercase letters and digits are fixed values that determine the operation. For example, only bits 8-11 of an Fx65 instruction can vary (i.e. operand x); the other bits are the same for every Fx65 instruction.
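The fixed/operand split can be expressed as a bit mask: an opcode matches a pattern when (op & mask) == value, with mask bits set only on the fixed nibbles. This is a hypothetical helper for illustration, for example in a disassembler, covering a few rows of the table. The project itself dispatches on the high nibble and then on the low bits.

```rust
/// An opcode pattern: bits set in mask are fixed, the remaining bits are operands.
struct Pattern {
    name: &'static str,
    mask: u16,
    value: u16,
}

/// A few rows of the opcode table written as masks (hypothetical, not exhaustive).
const PATTERNS: [Pattern; 4] = [
    Pattern { name: "00E0", mask: 0xFFFF, value: 0x00E0 },
    Pattern { name: "8xy5", mask: 0xF00F, value: 0x8005 },
    Pattern { name: "Annn", mask: 0xF000, value: 0xA000 },
    Pattern { name: "Fx65", mask: 0xF0FF, value: 0xF065 },
];

/// Returns the name of the first pattern whose fixed bits match op, if any.
fn identify(op: u16) -> Option<&'static str> {
    PATTERNS
        .iter()
        .find(|p| (op & p.mask) == p.value)
        .map(|p| p.name)
}
```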

Individual CPU cycles, or ticks, are performed via the tick function in the Cpu implementation block in cpu.rs. This function is invoked in the main emulator loop in main.rs once per clock cycle.

pub fn tick(&mut self, kb_state: sdl2::keyboard::KeyboardState) {
    let now = Instant::now();
    if now.duration_since(self.timer_time).as_micros() >= MICROS_60_HZ {
        self.timer_time = now;
        self.decrement_counters();
    }
    if self.sound > 0 {
        self.audio.beep(audio::AudioState::On);
    } else {
        self.audio.beep(audio::AudioState::Off);
    }
    self.keypad.update_pressed_keys(kb_state);
    let instruction = Self::fetch(self);
    let (category, x, y, n, nn, nnn) = Self::decode(self, instruction);
    match category {
        0x0 => Self::category_0(self, n),
        0x1 => Self::category_1(self, nnn),
        0x2 => Self::category_2(self, nnn),
        0x3 => Self::category_3(self, x, nn),
        0x4 => Self::category_4(self, x, nn),
        0x5 => Self::category_5(self, x, y),
        0x6 => Self::category_6(self, x, nn),
        0x7 => Self::category_7(self, x, nn),
        0x8 => Self::category_8(self, x, y, n),
        0x9 => Self::category_9(self, x, y),
        0xA => Self::category_a(self, nnn),
        0xB => Self::category_b(self, nnn),
        0xC => Self::category_c(self, x, nn),
        0xD => Self::category_d(self, x, y, n),
        0xE => Self::category_e(self, x, n),
        0xF => Self::category_f(self, x, nn),
        _ => {
            panic!("Unknown opcode, panicking.");
        }
    }
}

The tick function carries out several important steps. First, the current time is compared with the time the timers were last decremented; if at least one 60 Hz period has passed, that timestamp is reset and the timers are decremented. Then, if the sound timer is non-zero, the audio device is resumed; otherwise it is paused.

Next, the keyboard state, passed in from main.rs via the SDL event_pump, is updated in the Keypad object. This keeps the Cpu’s record of currently pressed keys and the most recently pressed key current on every tick. That state is referenced by the handful of keypad instructions (Ex9E, ExA1, and Fx0A).

Finally the fetch, decode, and execute processes happen. These steps perform the actual execution of the ROM and run the program.

Fetch

The fetch function is straightforward: it reads the two bytes at the current program counter (PC) and PC+1 and combines them big-endian (the first byte is the high byte) into a single 16-bit instruction. It then advances the PC by 2 and returns the instruction. No processing is done on the instruction during the fetch step.

    fn fetch(&mut self) -> u16 {
        let mut opcode: u16 = 0;
        opcode = opcode + (self.mem[self.pc as usize]) as u16;
        opcode = opcode << 8;
        opcode = opcode + (self.mem[(self.pc+1) as usize]) as u16;
        self.pc = self.pc + 2;
        return opcode;
    }
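The same big-endian combination can be applied to a whole ROM, which is handy for a disassembly listing. The following hypothetical helper uses u16::from_be_bytes. It assumes every instruction is 2-byte aligned from the start of the ROM, which isn’t guaranteed because ROMs can mix data and code.

```rust
/// Splits ROM bytes into big-endian 16-bit opcodes, the same combination
/// fetch() performs one instruction at a time. A trailing odd byte is ignored.
fn opcodes(rom: &[u8]) -> Vec<u16> {
    rom.chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}
```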

Decode

The decode function serves to split the 16-bit instruction into the various subpieces that might be needed for execution. The high 4 bits of the instruction are never used as an operand, but different instructions use different combinations of the low 12 bits. For example ANNN uses the low 12 bits as a single operand while 8XY5 uses the middle 8 bits as two separate 4-bit operands.

    fn decode(&self, instruction: u16) -> (u8, u8, u8, u8, u8, u16) {
        let category: u8 = (instruction >> 12) as u8;
        let x: u8 = ((instruction & 0x0F00) >> 8) as u8;
        let y: u8 = ((instruction & 0x00F0) >> 4) as u8;
        let n: u8 = (instruction & 0x000F) as u8;
        let nn: u8 = (instruction & 0x00FF) as u8;
        let nnn: u16 = instruction & 0x0FFF;
        return (category, x, y, n, nn, nnn);
    }
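Another common decoding approach, sketched here for a handful of instructions, is to decode straight into an enum. The execute step can then match on named instructions rather than a tuple of fields. The Instruction enum and its variant names are hypothetical, and any opcode not listed falls through to Unknown.

```rust
/// A few CHIP-8 instructions as an enum (hypothetical, not exhaustive).
#[derive(Debug, PartialEq)]
enum Instruction {
    ClearScreen,                  // 00E0
    Jump(u16),                    // 1nnn
    SkipIfEqual { x: u8, y: u8 }, // 5xy0
    SubXY { x: u8, y: u8 },       // 8xy5
    SetIndex(u16),                // Annn
    Draw { x: u8, y: u8, n: u8 }, // Dxyn
    Unknown(u16),
}

fn decode(op: u16) -> Instruction {
    let x = ((op & 0x0F00) >> 8) as u8;
    let y = ((op & 0x00F0) >> 4) as u8;
    let n = (op & 0x000F) as u8;
    let nnn = op & 0x0FFF;
    // Match on all four nibbles; operand nibbles are wildcards.
    match (op >> 12, x, y, n) {
        (0x0, 0x0, 0xE, 0x0) => Instruction::ClearScreen,
        (0x1, _, _, _) => Instruction::Jump(nnn),
        (0x5, _, _, 0x0) => Instruction::SkipIfEqual { x, y },
        (0x8, _, _, 0x5) => Instruction::SubXY { x, y },
        (0xA, _, _, _) => Instruction::SetIndex(nnn),
        (0xD, _, _, _) => Instruction::Draw { x, y, n },
        _ => Instruction::Unknown(op),
    }
}
```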

Execute

The execute function is more involved than fetch and decode as it contains the implementation for all the instructions. Many of the CHIP-8’s instructions can be uniquely identified by the first 4 bits; where this is not possible the instructions must be further isolated using the low 8 or low 4 bits.

The tick function begins the execute phase with a match block on the category returned by decode. Each category of instruction, identified by the first 4 bits, has its own method called from this match.

match category {
    0x0 => Self::category_0(self, n),
    0x1 => Self::category_1(self, nnn),
    0x2 => Self::category_2(self, nnn),
    0x3 => Self::category_3(self, x, nn),
    0x4 => Self::category_4(self, x, nn),
    0x5 => Self::category_5(self, x, y),
    0x6 => Self::category_6(self, x, nn),
    0x7 => Self::category_7(self, x, nn),
    0x8 => Self::category_8(self, x, y, n),
    0x9 => Self::category_9(self, x, y),
    0xA => Self::category_a(self, nnn),
    0xB => Self::category_b(self, nnn),
    0xC => Self::category_c(self, x, nn),
    0xD => Self::category_d(self, x, y, n),
    0xE => Self::category_e(self, x, n),
    0xF => Self::category_f(self, x, nn),
    _ => {
        panic!("Unknown opcode, panicking.");
    }
}

Where a category contains more than one instruction, further pattern matching in the category function selects the right code. Compare category A, which contains a single instruction, with category 8, whose instructions are distinguished by the low 4 bits.

fn category_a(&mut self, nnn: u16) {
    self.idx = nnn;
}

fn category_8(&mut self, x: u8, y: u8, n: u8) {
    match n {
        0x0 => self.regs[x as usize] = self.regs[y as usize],
        0x1 => self.regs[x as usize] = self.regs[x as usize] | self.regs[y as usize],
        0x2 => self.regs[x as usize] = self.regs[x as usize] & self.regs[y as usize],
        0x3 => self.regs[x as usize] = self.regs[x as usize] ^ self.regs[y as usize],
        0x4 => {
            let addn: (u8, bool) = self.regs[x as usize].overflowing_add(self.regs[y as usize]);
            self.regs[x as usize] = addn.0;
            if addn.1 {
                self.regs[0xF] = 1;
            } else {
                self.regs[0xF] = 0;
            }
        },
        ...
        _ => panic!("Unsupported op code")

    }
}
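The remaining category 8 arms are elided above. The following hypothetical sketch shows two of them: 8xy5 and 8xy7 can use overflowing_sub the same way 8xy4 uses overflowing_add. VF is written after Vx so the flag survives when x is 0xF. This follows the NOT-borrow reading in the opcode table (VF = 1 when no borrow occurs). Cowgod’s reference words the 8xy5 condition as Vx > Vy, which gives a different VF only when the two registers are equal.

```rust
/// 8xy5: Vx = Vx - Vy, VF = 1 if no borrow occurred, else 0.
fn sub_xy(regs: &mut [u8; 16], x: usize, y: usize) {
    let (result, borrow) = regs[x].overflowing_sub(regs[y]);
    regs[x] = result;
    regs[0xF] = if borrow { 0 } else { 1 }; // written last so VF holds the flag
}

/// 8xy7: Vx = Vy - Vx, VF = 1 if no borrow occurred, else 0.
fn subn_xy(regs: &mut [u8; 16], x: usize, y: usize) {
    let (result, borrow) = regs[y].overflowing_sub(regs[x]);
    regs[x] = result;
    regs[0xF] = if borrow { 0 } else { 1 };
}
```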

The top-level emulator loop continues like this, invoking ticks on the CPU, until either the user exits the window or the final instruction in the ROM is passed. Putting it all together you can fire up a ROM:

An Aside on Drawing

The instructions in the CHIP-8 are straightforward: most of them involve basic arithmetic or boolean logic and manipulation of the program counter. The one that seems to trip people up the most is the DXYN instruction. DXYN reads an n-byte sprite from memory starting at I and draws it at (VX, VY), one 8-pixel row (one byte) at a time. Each resulting screen pixel is the current pixel XOR’d with the sprite pixel, and VF is set to 1 if any pixel that was on gets turned off, which is how programs detect sprite collisions. It’s easy to get the logic wrong if you don’t read the instruction description carefully.

My implementation of DXYN is below; it uses the draw function and the other screen.rs helpers described above. The CPU does all of the index math and pixel XOR’ing against the screen object’s boolean pixel array and then updates that array. When screen.draw() is finally called, all it does is redraw the pixel array.

//cpu.rs
fn category_d(&mut self, x: u8, y: u8, n: u8) {
    // The starting coordinates wrap; the sprite itself is clipped at the screen edges.
    let xcoord: usize = (self.regs[x as usize] as usize) % screen::SCREEN_WIDTH_BASE;
    let ycoord: usize = (self.regs[y as usize] as usize) % screen::SCREEN_HEIGHT_BASE;
    self.regs[0xF] = 0;
    for row in 0..(n as usize) {
        if ycoord + row >= screen::SCREEN_HEIGHT_BASE {
            break; // the remaining rows are below the bottom edge
        }
        // Each sprite row is one byte in memory, starting at I.
        let line = self.mem[(self.idx as usize) + row];
        for col in 0..8 {
            if xcoord + col >= screen::SCREEN_WIDTH_BASE {
                break; // the remaining columns are past the right edge
            }
            let pix_idx = (xcoord + col) + (ycoord + row) * screen::SCREEN_WIDTH_BASE;
            let sprite_pixel = ((line >> (7 - col)) & 0x1) != 0;
            let cur_pixel = self.screen.get_pixel(pix_idx);
            let new_pixel = cur_pixel ^ sprite_pixel;
            // Collision: a pixel that was on has been turned off.
            if cur_pixel && !new_pixel {
                self.regs[0xF] = 1;
            }
            self.screen.set_pixel(pix_idx, new_pixel);
        }
    }
    self.screen.draw();
}
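A minimal standalone version of the same XOR-and-clip logic makes the erase behavior easy to check. Drawing a sprite once lights its pixels with VF = 0, and drawing it again at the same spot clears them and sets VF = 1. The draw_sprite function and its flat bool buffer are hypothetical stand-ins for the Cpu and Screen types.

```rust
const W: usize = 64;
const H: usize = 32;

/// XOR-draws sprite rows at (x, y) with the same wrap-then-clip rules as
/// category_d, returning the new VF (1 if any lit pixel was turned off).
fn draw_sprite(pixels: &mut [bool; W * H], sprite: &[u8], x: usize, y: usize) -> u8 {
    let (x0, y0) = (x % W, y % H);
    let mut vf = 0;
    for (row, &line) in sprite.iter().enumerate() {
        if y0 + row >= H {
            break;
        }
        for col in 0..8 {
            if x0 + col >= W {
                break;
            }
            let idx = (x0 + col) + (y0 + row) * W;
            let on = ((line >> (7 - col)) & 1) == 1;
            if pixels[idx] && on {
                vf = 1; // this pixel is about to be turned off
            }
            pixels[idx] ^= on;
        }
    }
    vf
}
```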

An interesting effect of the XOR-based drawing strategy is that moving sprites flicker. A sprite can only be erased by drawing it again over itself, so it briefly disappears between being erased and being redrawn in its new position. Here’s an example from one of the commonly available ROMs:

Improvements and Additions

As I mentioned, this project was as much about learning Rust for me as it was about making an emulator. I would definitely like to revisit it, clean up the structure, and make it more idiomatic Rust, for example with better use of enums and pattern matching. As part of that I intend to learn more about Rust’s testing features; you’ll notice there are no unit tests in this project.

Another major improvement is a live “debugger” – which is really to say a live view of the registers, program counter, stack, etc. That was a little ambitious when I initially wrote this but it seems very attainable now and will be very useful if I do get around to a more complicated emulator.

Finally, I would like to implement a GUI for the initial setup, where the clock speed and ROM are chosen. Ideally this would also provide a top-level interface to fall back to when emulation ends, instead of closing the emulator entirely. I’ve been looking into gtk-rs for this.

This was a fun little project and I’m definitely a fan of Rust now. I look forward to tackling more emulation projects and improving my technical writing along the way. Thanks for reading!