Chip-8 (#88)

by Ethan Price

CHIP-8 was an interpreted language used in the 1970's for basic games like Pong. While it is technically an interpreted language, it defines many components a real computer has, such as registers and a processor. Today, like many other gaming consoles, it is kept alive by emulators allowing it to be played on most any modern computer. Emulation is where raw operation codes (the most low-level way of feeding instructions to a computer) are translated by another computer, giving a totally different system the ability to run programs not designed for it.

Your job is to make an emulator. It will only cover some basic parts of it, but all emulators have to start somewhere. I have written a simple program, and will give you a list of all the opcodes I used and what they are supposed to do. We won't worry about graphics right now, just simple operations. Four things need to be done:

1. Make the emulator read the file. All opcodes are four digit hex numbers,
but those will need to be split.
2. Set up the registers and other assorted things so they can be modified.
3. Interpret the function of the opcode in Ruby.
4. Dump all the registers so you can check and make sure everything went well.

Like I said, all opcodes are four digit long hex numbers. So you read four hex digits worth of data, process it, then move on to the next four digits. There are 16 registers (areas where data can be stored). They are named V0 through VF, and are all eight bits wide. VF is usually used as a carry for math operations.

Here is the list of opcodes you will need to interpret:

NNN is an address
KK is an 8 bit constant
X and Y are two 4 bits constants

0000 This is nothing. If you see this, it should mean that you are at the end
of the file and have nothing left to read, so you therefore need to exit
gracefully.

1NNN Jump to the address NNN of the file

3XKK Skip next instruction if VX == KK

6XKK VX = KK

7XKK VX = VX + KK (Note: The documentation does not mention this, but I would
assume that VF would act as a carry if needed. The
included program will not need a carry for this
function, so providing for one is purely optional.)

8XY0 VX = VY

8XY1 VX = VX OR VY

8XY2 VX = VX AND VY

8XY3 VX = VX XOR VY

8XY4 VX = VX + VY, Sets VF to 1 if there is a carry, 0 if there isn't

8XY5 VX = VX - VY, VF is set to 0 if there is a borrow, 1 if there isn't.

8X06 VX = VX SHIFT RIGHT 1 (VX=VX/2), VF is set to the value of the least
significant bit of VX before the shift.

8XY7 VX = VY - VX, VF is set to 0 if there is a borrow, 1 if there isn't.

8X0E VX = VX SHIFT LEFT 1 (VX=VX*2), VF is set to the value of the most
significant bit of VX before the shift.

CXKK VX = Random number AND KK

Lets explain how to read these opcodes. Take opcode 1NNN for example. Since the 1NNN starts with 1, we know that no matter what comes after it we will be doing a jump instruction. Same with the other codes. For the opcode 8XY2, we know if an opcode starts with 8 and ends in 2 it will be the operation "VX = VX AND VY".

The variables NNN, KK, X and Y are a little different. They can represent any hex number (0-F), and also are used to note how big of chunks to read them in. For NNN, you will want to read those as a three digit hex number. Lets do an example opcode (in hex): 1234. Since it starts with 1, we know it is a "Jump to address NNN" command. The three numbers after that (the NNN part), are 234, so we know to jump to location 234 in the file (this is a hex number, not decimal). KK and X/Y are similar, just for different sized chunks.

While I'm on the subject, let's talk about addresses in detail. Your current address in the file is how many bytes you are from the beginning of the file. So if I am at address 008 in the file, I am eight bytes away from the beginning of the file, and therefore the next byte I read will be byte nine. Say we had the opcode 100F, that means we would jump to after byte 16 (it is a hex remember).

If your wondering how the carries work, here are some examples. First off, addition. Say we add these two numbers (opcode 8xy4):

11111111 <--This is VX
+
00000001 <--This is VY

Obviously this would equal 100000000, but that is too big for one register. So what we do is make VF equal 00000001, and then roll over to 00000000. Sort of like an old car odometer. When it hits 99,999 miles, it goes back to 00,000 miles, except we are keeping track of the fact that it rolled over. Subtraction is similar (opcode 8xy5):

00000000 <-- This is VX
-
00000001 <-- This is VY

Since the registers can only deal with positive numbers, we have to keep it positive. So what we do is "borrow", making VX equal to 100000000. All of a sudden our subtraction works:

100000000 <--VX
-
00000001 <--VY
=
11111111

If we do this though, we have to make sure VF is zero to indicate that a borrow was needed. If a borrow was not needed, VF gets set to 00000001, since you didn't need the extra.

Now for shifts. To sum it up, if you shift left, the far left number gets taken away, and a zero added to the far right. So 10101111 would become 01011110, because we got rid of the far left number and added a zero to the far right. Shift right is the same except going the other way (get rid of far right number, add 0 to the left). As you can see, a number disappears when you shift. VF is set to the number that disappears (obviously it can only be a 1 or 0).

A full list of opcodes can be had at David Winter's page, under section 1.5:

http://www.pdc.kth.se/%7Elfo/chip8/CHIP8.htm

He also has a full explanation of everything, so if check there for more details. If you want to write code for an opcode that I didn't list, by all means go for it.

Be sure to start reading the attached program from the beginning. To my understanding in a real Chip-8 game up to address 200 was reserved for machine code which we wont worry about.

The register values you should get are:

V1:01000101
V2:10111011
V3:11101100
V4:this number should be random, so do multiple runs to make sure it changes
VF:00000000

A note: If everything is working as it is supposed to, you should run straight through the file once. If you are looping, I would start with double checking all your math operations.

Good Luck!

Chip-8 Test Program


Quiz Summary

As some of you may know, the annual ICFP contest was just two weekends ago and this year's problem involved the creation of a small virtual machine, much as this quiz does. We didn't plan that, but it turned out to be a pleasant synergy.

The ICFP machine was really a non-ideal environment for Ruby. The performance issues there really called for C speed. In contrast, Chip-8 programs seem to fit Ruby emulation nicely.

I'm not going to cover it below, but do spend a little time playing with Sander Land's complete emulator if you have a Windows box handy. It runs games from the quiz-linked site and isn't much more code than some of the other solutions.

I do want to take a look at Boris Prinz's solution though, so let's get to it. This first chunk of code isn't the actual quiz solution, but it's a wonderful tool for seeing how solutions process Chip-8 files. Here's Boris's disassembler:

ruby
class Chip8Disassembler
CODES = {
/0000/ => 'exit',
/1(...)/ => 'goto \1',
/3(.)(..)/ => 'skip next if V\1 == 0x\2',
/6(.)(..)/ => 'V\1 = 0x\2',
/7(.)(..)/ => 'V\1 = V\1 + 0x\2',
/8(.)(.)0/ => 'V\1 = V\2',
/8(.)(.)1/ => 'V\1 = V\1 | V\2',
/8(.)(.)2/ => 'V\1 = V\1 & V\2',
/8(.)(.)3/ => 'V\1 = V\1 ^ V\2',
/8(.)(.)4/ => 'V\1 = V\1 + V\2',
/8(.)(.)5/ => 'V\1 = V\1 - V\2',
/8(.)06/ => 'V\1 = V\1 >> 1',
/8(.)(.)7/ => 'V\1 = V\2 - V\1',
/8(.)0E/ => 'V\1 = V\1 << 1',
/C(.)(..)/ => 'V\1 = rand() & 0x\2',
}

def self.code2text hexcode
CODES.each do |re, subs|
if hexcode =~ re
return hexcode.sub(re, subs)
end
end
'???'
end

def self.dump binary
opcodes = binary.unpack "n*"
opcodes.each_with_index do |code, waddr|
code_hex = "%04X" % code
printf("%03X: [%s] %s\n", waddr*2, code_hex, code2text(code_hex));
end
end
end

binary = File.read(ARGV[0])
Chip8Disassembler.dump(binary)

If you skip to the end, you can see that this code just slurps the Chip-8 program passed as an argument and feeds the whole thing to dump().

The dump() method separates the program into opcodes with a simple call to unpack(). The "n*" pattern for unpack() has it read two characters at a time as an unsigned integer. It also dictates that the number is in "network byte-order" which avoids endian issues on different architectures. From there each opcode is traversed, turned into a hex code, and handed off to the code2text() method for translation.

Inside code2text(), the passed hexcode is compared to each pattern of the CODES Hash in turn. When a match is found, the key and value Hash pair are used to perform a regular expression conversion from code to source statement and that statement is returned.

Each of those statements is decorated with the code_hex and the address of the instruction in the program file, then printed. The end result looks like this, for the quiz test program:

000: [6177] V1 = 0x77
002: [6245] V2 = 0x45
004: [7101] V1 = V1 + 0x01
006: [8320] V3 = V2
008: [8121] V1 = V1 | V2
00A: [8122] V1 = V1 & V2
00C: [8233] V2 = V2 ^ V3
00E: [8134] V1 = V1 + V3
010: [8235] V2 = V2 - V3
012: [8106] V1 = V1 >> 1
014: [8327] V3 = V2 - V3
016: [830E] V3 = V3 << 1
018: [64FF] V4 = 0xFF
01A: [C411] V4 = rand() & 0x11
01C: [32BB] skip next if V2 == 0xBB
01E: [1000] goto 000
020: [0000] exit

The hex codes from the quiz description are in brackets in the middle and you can see the operations just to the right of that. Hopefully that makes it clear how you break down instructions and what they are suppose to do.

Now we should be well equipped to read Boris's actual solution. Here's how it begins:

ruby
class Chip8Emulator
attr_accessor :register

VF = 15 # index of the carry/borrow register

def initialize
@register = [0] * 16 # V0..VF
end

# ...

I doubt there is anything tricky there. The registers are just an Array of Integers which are made available through the accessor. We also see a convenient VF constant defined for accessing that register.

Here's the main event loop of the emulator:

ruby
# ...

def exec(program)
opcodes = program.unpack('n*')
current = 0 # current opcode index
loop do
opcode = opcodes[current]

# these are needed often:
x = opcode >> 8 & 0x0F
y = opcode >> 4 & 0x0F
kk = opcode & 0xFF
nnn = opcode & 0x0FFF

case opcode >> 12 # first nibble
when 0 then return
when 1 then current = (nnn) / 2 and next
when 3 then current += 1 if @register[x] == kk
when 6 then @register[x] = kk
when 7 then add(x, kk)
when 8
case opcode & 0x0F # last nibble
when 0 then @register[x] = @register[y]
when 1 then @register[x] |= @register[y]
when 2 then @register[x] &= @register[y]
when 3 then @register[x] ^= @register[y]
when 4 then add(x, @register[y])
when 5 then subtract(x, x, y)
when 6 then shift_right(x)
when 7 then subtract(x, y, x)
when 0xE then shift_left(x)
else raise "Unknown opcode: " + opcode.to_s(16)
end
when 0xC then random(x, kk)
else raise "Unknown opcode: " + opcode.to_s(16)
end
current += 1 # next opcode
end
end

# ...

This entire method basically comes right out of the quiz. The program is first unpack()ed into opcodes, as we saw with the disassembler and a current opcode pointer is initialized.

Next the code enters into an infinite loop() of opcode processing. Inside that loop(), the current opcode is pulled and then split into the x, y, kk, and nnn variables the quiz discusses.

The big case statement after that is the opcode instruction processor. Most of those are trivial implementations of the quiz descriptions, but you can see that the more complex operations are delegated to helper methods, which we will examine in just a moment.

The last thing of note here is the jump implementation with the `and next` code on the end of it. That forces the next iteration of the loop() to start immediately, bypassing the current opcode pointer increment after the case statement.

Let's see those helper methods:

ruby
# ...

def add(reg, value)
result = @register[reg] + value
@register[reg] = result & 0xFF
@register[VF] = result >> 8 # carry
end

def subtract(reg, a, b)
result = @register[a] - @register[b]
@register[reg] = result & 0xFF
@register[VF] = - (result >> 8) # borrow
end

def shift_right(reg)
@register[VF] = @register[reg] & 0x01
@register[reg] >>= 1
end

def shift_left(reg)
@register[VF] = @register[reg] >> 7
@register[reg] = (@register[reg] << 1) & 0xFF
end

def random(reg, kk)
@register[reg] = rand(256) & kk
end

# ...

These are mainly just the operations that affect the VF register. Because they require multiple steps, it was easier to move these into separate methods and leave the event loop case statement clear. These definitions still come right out of the quiz.

Finally, we have the last bit of code needed to dump registers and kick-off the program run in the first place. Here's the code:

ruby
# ...

# show all registers
def dump
0.upto(VF) do |reg|
printf("V%1X:%08b\n", reg, @register[reg])
end
end
end

if $0 == __FILE__
ARGV.each do |program|
emu = Chip8Emulator.new
emu.exec(File.read(program))
emu.dump
end
end

The dump() method is a simple print routine for register names and values, iterating over all the registers. Below that we have the code the executes and dumps each program provided as an argument, using the methods we have been examining.

Boris's code did include unit tests. Each opcode was tested against a hand crafted mini-program to check functionality. Here's the subtraction test, for example:

ruby
# ... other test code not shown ...

def test_subtract
@emu.exec("\x60\x00" + "\x61\x01" + "\x80\x15" + "\0\0")
assert_equal [255, 1] + [0]*13 + [1], @emu.register
end

# ... other test code not shown ...

Don't miss the other solutions folks. Take some time to look them over. All are quite clever. You will even find elements not covered in this summary, like Alexandru E. Ungur's assembler!

My thanks to all who took tried their hand at building their own emulator. I hope you found it a lot easier than I did building a good ICFP emulator.

In tomorrow's quiz, we will see just how far a simple method like upcase() can really go in terms of useful application...