What’s the smallest virtual machine you could create? Or, why would you want one?
In the days of ubiquitous bandwidth and fast computers we often don’t care if a VM or container image is a gig, or two, in size. However, this rapidly changes when you are confronted with edge cases: people living in less well-connected countries, or, in our case, needing to run hundreds or even thousands of virtual machines for automated test cases over a VPN.
Recently, we started working on the oVirt Terraform Provider, which has unfortunately seen better days. Apart from needing to land our own much-needed changes, at the time of writing we also have a list of pull requests that have been open for a long time.
Maintaining a project is easy if one works on it alone, or with just a few people in close coordination. One can make decisions, execute them, and get a reasonable quality despite the lack of tests. However, when more people get involved, or a project lives on for a long time, automated tests are essential to keep the quality up and avoid breaking things.
If one doesn’t have tests, the development speed inevitably slows down to a crawl as time passes and new features get bolted on top. This is especially true for a community project, where the motivation to review PRs dwindles. (Not to mention the fact that the bug reports piling up cost time and effort to fix.)
This is why we needed a solution to run automated tests quickly. The previous test suite was hard-coded for specific identifiers, such as cluster IDs, template IDs, etc. This made it impossible for contributors to run these tests, and incredibly hard to build a continuous integration system to verify patches.
One of the problems we encountered while developing the new test suite was the question of the oVirt template: we had no readily available template we could rely on to be present in any random oVirt cluster. (We even had reports where someone removed the blank template!)
So, we needed a virtual machine image. What to do, what to do? Build a miniature version of CentOS? Package Alpine Linux in a VM image?
No. It’s not that these are not all valid solutions, but they would require the test suite to download external dependencies and then upload a not-so-small file to the oVirt cluster used for testing.
Instead, an old memory came up from the school days: a little utility written by the system administrators which would inject itself into the boot sector, written entirely in Assembler… Assembler! That was it! One could just write a very simple Hello World program! This could fit in the boot sector and would be the smallest virtual machine image possible: 512 bytes in size. Small enough to just commit the binary into the Git repo.
Would it work? We had no idea. So, we had to go learn Assembler and dust off a fair bit of long-forgotten knowledge about how computers boot. In the old days we would have had to use Altavista or go through 600-page books to learn what we needed to, but thanks to Google and GitHub we found several examples of people doing just that.
At this point, we could have just taken an existing example under a permissive license and called it a day. But what would we have learned then? Or more importantly, if we just ran someone else’s code without understanding it, how would we know that it actually does what we need it to do, 100% of the time? You can’t really hide anything super malicious in 512 bytes, but the VM may fail to boot at random times or shut down, which would lead to flaky tests: the last thing we wanted for a project with constrained resources.
So, down we go into the rabbit hole. Let’s learn some Assembler! The first thing we (re-)learned was the fact that there is no one Assembler *language*. Different variants have their own syntax. After a bit of digging we settled on the Netwide Assembler (NASM).
Let’s start our little program with ORG 0x7C00. This first directive tells NASM which memory address the program will start at: 0x7C00 is the address where the BIOS loads the program in the boot sector. Next, we need to tell NASM to assume our program is running in 16 bit mode, which is done with BITS 16. (The CPU is in this mode when the BIOS runs; the switch to 32 or 64 bit mode happens later in the boot process.)
Let’s compile our program and run it with QEMU:
    nasm -f bin -o ourprogram.raw ourprogram.asm
    qemu-system-x86_64 \
        -nographic \
        -serial mon:stdio \
        -drive file=ourprogram.raw,format=raw \
        -monitor telnet::2000,server,nowait
The output isn’t terribly surprising, since we haven’t written any instructions yet:
    Booting from Hard Disk...
    Boot failed: could not read the boot disk
The BIOS didn’t detect a valid boot sector, since the last two bytes of the boot sector must be 0x55 and 0xAA (the 16-bit value 0xAA55 stored in little-endian order). Let’s fix that by padding the image with zero bytes up to byte 510 and then adding the magic bytes for the BIOS:
    TIMES 510 - ($ - $$) DB 0 ; Fill up 510 bytes
    DW 0xAA55 ; Write magic bytes for boot loader
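The arithmetic behind these two lines can be checked outside the assembler. The following Python sketch is not part of the original project, and the helper name `build_sector` is made up for illustration. It pads an arbitrary blob of code to 510 bytes and appends the signature the way `DW 0xAA55` does on little-endian x86, so the bytes on disk end up as 0x55 followed by 0xAA.

```python
import struct

SECTOR_SIZE = 512  # size of a classic BIOS boot sector in bytes


def build_sector(code: bytes) -> bytes:
    """Pad `code` with zeroes to 510 bytes and append the boot signature.

    Hypothetical helper mirroring `TIMES 510 - ($ - $$) DB 0` and `DW 0xAA55`.
    """
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code does not fit in front of the signature")
    padding = bytes(SECTOR_SIZE - 2 - len(code))  # 510 - ($ - $$) zero bytes
    signature = struct.pack("<H", 0xAA55)         # 16-bit word, little-endian
    return code + padding + signature


if __name__ == "__main__":
    sector = build_sector(b"")  # no instructions yet at this point
    print(len(sector), sector[-2:].hex())  # prints: 512 55aa
```

The padding shrinks as code is added, so the result stays at exactly 512 bytes no matter how many instructions come first, as long as they fit.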
This changes the output: the boot sector is now actually loaded, but doesn’t do anything yet:
Booting from Hard Disk...
Fantastic! The last bit that we need is writing our Hello World text to the screen. Thankfully, we don’t need to write graphics drivers or anything of the sort. We can either use the INT 0x10 BIOS function or write directly to the 0xB8000 memory address.
Before we begin, let’s talk about CPU architecture in very broad strokes. (CPU architects please don’t read this.) RAM is fast, but not quite fast enough. In order for the CPU to work with data we need to load it into the so-called registers. These are comparatively tiny pieces of very fast memory built directly onto the CPU chip. The CPU then uses the data from the registers to perform operations on the data.
Most of the heavy lifting in our program will be done by the BIOS, so the only thing we need to do is to load a byte into the AL register (the lower 8 bits of the accumulator register), set the AH register value to 0x0E to select the print-character function, and then call interrupt 0x10 to trigger the BIOS to print the character.
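To make the relationship between AL, AH and the accumulator concrete, here is a small Python model. It is an illustration only, and the function names are invented: the 16-bit accumulator AX is simply AH in the upper 8 bits and AL in the lower 8 bits.

```python
def combine_ax(ah: int, al: int) -> int:
    """Return the 16-bit AX value for the given AH (high) and AL (low) bytes."""
    if not (0 <= ah <= 0xFF and 0 <= al <= 0xFF):
        raise ValueError("AH and AL are 8-bit registers")
    return (ah << 8) | al


def split_ax(ax: int) -> tuple[int, int]:
    """Split a 16-bit AX value into its (AH, AL) halves."""
    return (ax >> 8) & 0xFF, ax & 0xFF


if __name__ == "__main__":
    # AH = 0x0E selects the print-character function, AL holds the character.
    ax = combine_ax(0x0E, ord("H"))
    print(hex(ax))  # prints: 0xe48
```

Setting AH once and then changing only AL for each character therefore leaves the function selector untouched.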
Let’s do that. As a first step, let’s create a label that we can reference with the text we want to print, before the end of our program:
    ORG 0x7C00 ; Starting address of the boot loader
    BITS 16 ; Start program in 16 bit mode
    text: DB "Hello oVirt!", 0 ; Embed data into binary, zero-terminated.
    TIMES 510 - ($ - $$) DB 0 ; Fill up 510 bytes
    DW 0xAA55 ; Write magic bytes for boot loader
Fantastic, now the text will be added into our binary. (Note that running the program now would make the CPU interpret our text as CPU instructions, which is not what we want.)
As a next step, let’s add an instruction to load the memory address of this text label into the SI register. (SI is the register for string operations.) This is done with the MOV SI, text instruction:
    ORG 0x7C00
    BITS 16
    ; Move the address of the text label into the SI register
    MOV SI, text
    text: DB "Hello oVirt!", 0
    TIMES 510 - ($ - $$) DB 0
    DW 0xAA55
Next up, we need to tell the BIOS that we want to print a character. This is done by putting the byte 0x0E into the AH register. (The upper 8 bits of the accumulator.) This is done with the MOV AH, 0x0E instruction:
    ORG 0x7C00
    BITS 16
    MOV SI, text
    ; Tell the BIOS that we want to print a character
    MOV AH, 0x0E
    text: DB "Hello oVirt!", 0
    TIMES 510 - ($ - $$) DB 0
    DW 0xAA55
Finally, we need a loop that always loads the next byte from the address in the SI register into AL, increases SI by one, and then calls INT 0x10. If the value of AL is 0, the program should stop. This is done as follows:
    .printChar:
        ; Load byte from the address in SI into AL and advance SI by one
        LODSB
        ; Check if AL is 0.
        ; OR AL, AL is a common alternative; for AL both forms encode in two bytes.
        CMP AL, 0
        ; If yes, jump to the .stop label
        JE .stop
        ; Trigger BIOS print method
        INT 0x10
        ; Repeat for next byte
        JMP .printChar
    .stop:
        ; Stop the CPU
        HLT
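The control flow of this loop may be easier to follow in a high-level model. The Python sketch below is only an illustration and not how the BIOS works internally; `bios_print` is a made-up name, and the model assumes the direction flag is clear so that LODSB moves SI forward.

```python
def bios_print(memory: bytes, si: int) -> str:
    """Model of the .printChar loop: return what the BIOS is asked to print."""
    printed = []
    while True:
        al = memory[si]  # LODSB: load the byte at address SI into AL ...
        si += 1          # ... and advance SI by one
        if al == 0:      # CMP AL, 0 / JE .stop
            break
        printed.append(chr(al))  # INT 0x10 with AH = 0x0E prints AL
    return "".join(printed)


if __name__ == "__main__":
    data = b"Hello oVirt!\x00"
    print(bios_print(data, 0))  # prints: Hello oVirt!
```

The zero byte at the end of the text is what terminates the loop; without it the loop would keep printing whatever bytes follow in memory.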
Put together, this is our entire program:
    ; Tell the assembler the starting address
    ORG 0x7C00
    ; Tell the assembler we are running in 16 bit mode
    BITS 16
    ; Move the address of the text label below to the SI register
    MOV SI, text
    ; Tell the BIOS that we want to print a character
    MOV AH, 0x0E
    .printChar:
        ; Load byte from the address in SI into AL and advance SI by one
        LODSB
        ; Check if AL is 0.
        ; OR AL, AL is a common alternative; for AL both forms encode in two bytes.
        CMP AL, 0
        ; If yes, jump to the .stop label
        JE .stop
        ; Trigger BIOS print method
        INT 0x10
        ; Repeat for next byte
        JMP .printChar
    .stop:
        ; Stop the CPU
        HLT
    text:
        ; Embed this text into the binary, terminated with a 0
        DB "Hello oVirt!", 0
    ; Fill up the binary to 510 bytes with zeroes
    TIMES 510 - ($ - $$) DB 0
    ; Write the boot sector magic.
    DW 0xAA55
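To get a feeling for how little of the 512 bytes is actually used, the image can also be put together by hand. The Python sketch below is an assumption-laden illustration, not output taken from NASM: the opcodes are hand-encoded from the x86 instruction tables with short (8-bit) jumps, and an assembler is free to choose other encodings, so compare against your own build before relying on it.

```python
import struct

ORIGIN = 0x7C00        # load address given by ORG 0x7C00
CODE_BEFORE_TEXT = 15  # total size in bytes of the instructions below
TEXT_ADDRESS = ORIGIN + CODE_BEFORE_TEXT  # what the `text` label resolves to

# Hand-encoded 16-bit machine code (assumed encodings, see lead-in).
code = b"".join([
    b"\xBE" + struct.pack("<H", TEXT_ADDRESS),  # MOV SI, text
    b"\xB4\x0E",                                # MOV AH, 0x0E
    b"\xAC",                                    # .printChar: LODSB
    b"\x3C\x00",                                # CMP AL, 0
    b"\x74\x04",                                # JE .stop (forward 4 bytes)
    b"\xCD\x10",                                # INT 0x10
    b"\xEB\xF7",                                # JMP .printChar (back 9 bytes)
    b"\xF4",                                    # .stop: HLT
])
text = b"Hello oVirt!\x00"

image = code + text
image += bytes(510 - len(image))    # TIMES 510 - ($ - $$) DB 0
image += struct.pack("<H", 0xAA55)  # DW 0xAA55

if __name__ == "__main__":
    print(len(code), hex(TEXT_ADDRESS), len(image))  # prints: 15 0x7c0f 512
```

Under these assumptions the instructions and the text together take 28 bytes; everything else up to the signature is padding.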
If we now run this program with QEMU we’ll see the following:
    Booting from Hard Disk...
    Hello oVirt!
Fantastic! We have our 512 byte VM image. Of course, this project has evolved significantly and has received contributions on GitHub, including a CI/CD system, tests, and a proper readme, but these are the very baby steps we took to create our test image. Enjoy!
My older 80286 and 80386 based computers didn’t yet have these flexible BIOSes where you could freely define any drive geometry you needed for the disk you just bought. Nor were the disks smart enough to tell you, because they were almost “analog” devices operating at bitstream level with all the control logic residing on the add-on card you plugged into your ISA bus. So if you wanted to support an unknown drive not pre-configured in the BIOS, or if in fact you were using one of those fancy RLL controllers that added 33% of capacity by changing the physical format from MFM (modified frequency modulation) to run length limited, modifying the boot sector to load a different set of parameters for the “Winchester drive” was the only option, apart from continuously patching and burning a new BIOS with the geometry (and an updated checksum) built in. And since I tended to use that additional space to run things like Microport Unix (for the 80286) or QNX apart from DOS, I also had to add a boot choice prompt. It was all done in MASM (Microsoft’s Assembler, which came with Microsoft Fortran, Microsoft Cobol and Microsoft’s Basic Compiler, I believe) and I should still have the source code somewhere… I then used a small Turbo Pascal program and the INT 13 BIOS routines to overwrite the boot sector.
All of that was made easy because the IBM PC, PC-XT and PC-AT came with a nicely printed manual that not only included the full electrical schematic but also the original BIOS source code with comments. That’s what launched the personal computer: full disclosure! Hear that, Apple?