I Was Reincarnated as a 6502 CPU After Accidentally Emulating Myself (Now I Have Trust Issues)

$ hexdump -C What_Is_Emulation

The process of recreating one computer system, device, or software environment on another system

  • Recreating the hardware behavior of another machine
  • Translating instructions meant for the original system into ones your current system understands
  • Lets software (e.g., games, operating systems) run as if it were on its native platform
? How is Emulation different from Simulation

Type       Description                                                                    Example
Emulator   Mimics the actual hardware behavior. High accuracy.                            QEMU
Simulator  Models behavior or logic, not necessarily the hardware. Approximate accuracy.  MATLAB & Simulink

My First Emulator

  • First emulator: John GBA (Android)
  • Played Pokémon Emerald extensively.
  • Over-leveled Swampert (Level 90+).
  • Could defeat Ice-types with just Muddy Water!

Key Features of John GBA

  • Legal
  • Save States
  • Fast Forward
  • Dropbox Sync
  • Cheat Codes

How to emulate?

  • An emulator has to be designed around the internal architecture of the computer it emulates.
  • Most modern computers follow the von Neumann architecture: a CPU, memory, and other devices connected by a bus. A bus is a bunch of electrical lines that carry signals between those parts.

CPU emulation

  • A big problem is that the CPU, memory devices, sound devices, etc. all work in parallel, but almost all emulators used to be written for single-processor machines, so the devices cannot be emulated in parallel. Instead, the emulator runs each device in turn for a short slice of emulated time.
  • 
    // Main emulator loop: each emulated device gets a turn, one after another
    while (run) {
        executeCPU(cycles_to_execute);   // Emulated CPU: fetch, decode, execute
        generateInterrupts();            // Emulate hardware interrupts
        emulateGraphics();               // Emulate graphics processor
        emulateSound();                  // Emulate sound hardware
        emulateOtherHardware();          // Emulate any other hardware
        timeSynchronization();           // Keep emulator speed real-time
    }
    
  • The CPU is the core of the emulation and also serves as its clock. Many computers have hardware that keeps time (for example, timers and timer-driven interrupts), but the main way an emulator tracks time is by counting the CPU cycles used by each executed instruction. That is how the emulator main loop knows how much emulated time has passed.
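The cycle-counting idea can be sketched as follows. This is only an illustration under made-up assumptions: the clock speed, the frame rate, the `Machine` struct, and the functions `execute_cpu` and `run_frame` are all hypothetical, and the stub CPU pretends every instruction costs 4 cycles.

```c
#include <stdint.h>

/* Hypothetical numbers: a 1 MHz CPU and a 50 Hz display. */
enum { CPU_HZ = 1000000, FRAMES_PER_SECOND = 50,
       CYCLES_PER_FRAME = CPU_HZ / FRAMES_PER_SECOND };

typedef struct {
    uint64_t total_cycles;  /* emulated time, measured in CPU cycles */
    unsigned frames;        /* frames produced so far */
} Machine;

/* Stub CPU core: pretends every instruction takes 4 cycles.
   Returns the cycles actually consumed, which may overshoot
   the budget by part of one instruction. */
int execute_cpu(Machine *m, int budget)
{
    int used = 0;
    while (used < budget) {
        used += 4;          /* a real core would fetch/decode/execute here */
    }
    m->total_cycles += (uint64_t)used;
    return used;
}

/* Run one video frame worth of emulated time. */
void run_frame(Machine *m)
{
    execute_cpu(m, CYCLES_PER_FRAME);
    /* devices would be updated here, and then the emulator would
       wait until 1/50 s of host time has passed */
    m->frames++;
}
```

Because the budget is expressed in cycles rather than host time, the emulated program sees the same timing no matter how fast the host is.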

Two types of CPU emulation :

  1. Interpreted emulation
  2. Binary translation
    • Static binary translation
    • Dynamic binary translation

    interpreted emulation

  • In interpreted emulation we fetch the instruction codes, decode what they mean, and execute the function of each decoded instruction.
  • A basic CPU reads bytes from the memory address pointed to by a special register (the PC, or Program Counter).
  • Even the most basic CPUs (and virtual machines like Java's) have at least two registers: the PC and the SP.

fetch decode loop

  • Fetches a byte (or several bytes) from the memory location pointed to by a special register (PC).
  • The SP, or Stack Pointer, is a pointer into memory. It is used to keep a stack data structure, that is, a LIFO (Last In First Out) structure, which is useful for retrieving the last data added to the structure. The SP is decremented or incremented as values are pushed onto or popped off the stack.
  • CPUs that work with only these registers and keep their operands on the stack are called stack machines.
  • A byte or group of bytes that defines a single instruction is called an opcode.
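To make the PC, SP, and opcode ideas concrete, here is a minimal sketch of an interpreter for a toy stack machine. The three opcodes and every name (`ToyCpu`, `toy_push`, `toy_pop`, `toy_run`) are invented for illustration and do not correspond to any real CPU.

```c
#include <stdint.h>

/* Hypothetical opcodes for a toy stack machine (not a real ISA). */
enum { OP_HALT = 0x00, OP_PUSH = 0x01, OP_ADD = 0x02 };

typedef struct {
    uint8_t mem[256];
    uint8_t pc;   /* Program Counter: address of the next opcode */
    uint8_t sp;   /* Stack Pointer: address of the top of the stack */
} ToyCpu;

/* The stack grows downward: SP is decremented on push... */
void toy_push(ToyCpu *c, uint8_t v) { c->mem[--c->sp] = v; }
/* ...and incremented on pop. */
uint8_t toy_pop(ToyCpu *c) { return c->mem[c->sp++]; }

/* Fetch-decode-execute until HALT; returns the value on top of the stack. */
uint8_t toy_run(ToyCpu *c)
{
    for (;;) {
        uint8_t opcode = c->mem[c->pc++];               /* fetch */
        switch (opcode) {                               /* decode */
        case OP_PUSH:                                   /* execute */
            toy_push(c, c->mem[c->pc++]);               /* operand follows the opcode */
            break;
        case OP_ADD: {
            uint8_t b = toy_pop(c);
            uint8_t a = toy_pop(c);
            toy_push(c, (uint8_t)(a + b));
            break;
        }
        case OP_HALT:
        default:
            return c->mem[c->sp];
        }
    }
}
```

Loading the bytes `01 02 01 03 02 00` (PUSH 2, PUSH 3, ADD, HALT) at address 0 of a zeroed `ToyCpu` and calling `toy_run` leaves 5 on top of the stack.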

opcode examples

  • 1000 0XXX (Intel 8080 instruction)
    • 1000 0XXX → ADD REG
    • 1 byte instruction
    • XXX is a 3 bit reg code
    • function : adds a general purpose register to the accumulator (special register)
    • e.g., ADD B : 10000000
    • XXX Register
      000 B
      001 C
      010 D
      011 E
      100 H
      101 L
      110 M (memory at the address in HL)
      111 A (accumulator)
  • SUBQ.B #data, Dn (Motorola 68000 instruction)
    • It stands for subtract quick
    • It subtracts a small number (1-8) from a data register
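Decoding an opcode like this usually comes down to masking bit fields. The sketch below assumes the 8080 encoding of ADD r as documented by Intel, 1000 0SSS (so ADD B is 0x80), with the register code in the low three bits. The names `decode_add` and `reg_names` are made up for this example.

```c
#include <stdint.h>

/* Returns 1 and stores the 3-bit register code if `opcode` is an
   8080 ADD r instruction (bit pattern 1000 0SSS), otherwise 0. */
int decode_add(uint8_t opcode, unsigned *reg_code)
{
    if ((opcode & 0xF8) != 0x80)   /* upper five bits must be 10000 */
        return 0;
    *reg_code = opcode & 0x07;     /* lower three bits select the register */
    return 1;
}

/* Register names indexed by the 3-bit code; 110 means "memory at HL". */
const char *const reg_names[8] = { "B", "C", "D", "E", "H", "L", "M", "A" };
```

In a real interpreter the same masks normally end up inside a `switch` or a 256-entry table indexed by the opcode byte.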

C trick in emulators to access the same register data in multiple ways


#include <stdint.h>

typedef uint8_t  UINT8;
typedef uint16_t UINT16;
typedef uint32_t UINT32;

typedef union
{
    UINT32 d;   /* Access it as a 32-bit value (maybe for full register set) */
    UINT16 w;   /* Access it as a 16-bit value */
    struct
    {
        UINT8 l, h;  /* Low and high bytes (this order assumes a little-endian host) */
        UINT16 pad;  /* Pads the struct out to the full 32 bits */
    } b;       /* Access as two bytes */
} i8080Reg;
                    
    Using this, registers can be accessed as:
  • A 32 bit value
  • A 16 bit value
  • Two separate bytes (low and high parts). This is useful because some instructions operate on 8 bit while others on 16 bit registers
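A small usage sketch of the same trick, using a cut-down, hypothetical `RegPair` type so it stands on its own. It assumes a little-endian host; on a big-endian host the `l` and `h` members would have to be swapped.

```c
#include <stdint.h>

/* Hypothetical register-pair type; the layout assumes a little-endian host. */
typedef union {
    uint16_t w;                  /* the pair as one 16-bit value */
    struct { uint8_t l, h; } b;  /* low and high bytes of the same storage */
} RegPair;

/* Load H and L separately, then read the combined HL value. */
uint16_t make_hl(uint8_t h, uint8_t l)
{
    RegPair hl;
    hl.b.h = h;   /* what an 8-bit instruction writing H would do */
    hl.b.l = l;   /* what an 8-bit instruction writing L would do */
    return hl.w;  /* what a 16-bit instruction reading HL would see */
}
```

No shifting or masking is needed: both views share the same bytes, so a write through one is immediately visible through the other.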

Instruction Emulation

  • Here we take the ISA (Instruction Set Architecture) of the CPU and reproduce the behavior of each instruction in the language the emulator is written in
  • 
    instruction (operands){
        get_operands;
        perform_calculations;
        store_result;
        update_time;
        return to the main loop / fetch next opcode;
    }
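As one possible instance of that template, here is a sketch of the 8080's MOV B,C, chosen because it touches no flags. The `Cpu8080` struct and the function name are hypothetical, and the cost of 5 cycles is an assumption to verify against the 8080 datasheet.

```c
#include <stdint.h>

typedef struct {
    uint8_t  b, c;     /* two of the 8080's general-purpose registers */
    uint64_t cycles;   /* running cycle counter used as the emulator's clock */
} Cpu8080;

/* MOV B,C: copy register C into register B; no flags are affected. */
void op_mov_b_c(Cpu8080 *cpu)
{
    uint8_t value = cpu->c;  /* get_operands */
                             /* perform_calculations: nothing to compute for a move */
    cpu->b = value;          /* store_result */
    cpu->cycles += 5;        /* update_time (assumed cycle cost) */
}                            /* returning hands control back to the main loop */
```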
                        

    Flags

    • One of the harder tasks to emulate using a high-level language
    • In most CPUs, the result of an operation (like ADD, SUB, AND, etc.) isn't just the number you get: the CPU also updates a set of flags that describe properties of the result.
    • Flags are single-bit variables or registers which are set after some arithmetic or logic instructions. Common in 8-bit and 16-bit CPUs, and also in many modern CPUs (like x86).
    • They are hard to handle because, most of the time, a single instruction changes more than one flag.
    • Some examples of flags are carry, zero, sign flag, etc.
    • Emulating flags is quite expensive because you're replicating hardware's parallel work in sequential software.
    • You often end up with multiple if statements per flag. Each if can become a branch/jump in the compiled code, and modern CPUs hate unpredictable jumps because they stall instruction pipelines.
    • This is much less of a problem if your host CPU (the one running the emulator) has flags similar to your guest CPU (the one you're emulating), e.g., the 8080 and x86 have similar flags.
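To show what "multiple if statements" looks like in practice, here is a sketch that computes the five 8080 flags for an 8-bit addition. The bit positions follow the 8080 status byte; the function name `add8_flags` and the `FLAG_*` constants are made up for this example.

```c
#include <stdint.h>

/* Flag bit positions as in the 8080 status byte. */
enum { FLAG_C = 0x01, FLAG_P = 0x04, FLAG_AC = 0x10, FLAG_Z = 0x40, FLAG_S = 0x80 };

/* Adds b to a, stores the 8-bit result in *out, and returns the new flags. */
uint8_t add8_flags(uint8_t a, uint8_t b, uint8_t *out)
{
    unsigned sum = (unsigned)a + b;   /* keep 9 bits so the carry is visible */
    uint8_t  r   = (uint8_t)sum;
    uint8_t  f   = 0;

    if (sum > 0xFF)                       f |= FLAG_C;   /* carry out of bit 7 */
    if (r == 0)                           f |= FLAG_Z;   /* result is zero */
    if (r & 0x80)                         f |= FLAG_S;   /* bit 7 set: negative */
    if (((a & 0x0F) + (b & 0x0F)) > 0x0F) f |= FLAG_AC;  /* carry out of bit 3 */

    /* Parity: fold the byte down to one bit; 0 means an even number of 1 bits. */
    uint8_t p = r;
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    if ((p & 1) == 0)                     f |= FLAG_P;   /* even parity */

    *out = r;
    return f;
}
```

A common way to cut the branches is to precompute the zero, sign, and parity bits in a 256-entry table indexed by the result, which trades the `if` chain for one memory lookup.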

    Memory

      Memory emulation can be slow because

    • Memory access is extremely common. Every instruction needs to fetch its code, and many need to read/write data
    • The access logic is complex. In software, the emulator must check every time what kind of memory is being accessed. Some addresses point to ROM, some to I/O registers, and some to special banks that map to different physical pages at different times
    • This may involve scanning lists, calling functions, handling bank switching, and simulating MMU behavior.
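A sketch of what that per-access check can look like, for an invented 16-bit memory map with one switchable ROM bank. The map, the `Memory` struct, and the rule that any write to ROM space selects a bank are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical memory map:
   0x0000-0x3FFF  ROM bank 0 (fixed)
   0x4000-0x7FFF  switchable ROM bank
   0x8000-0xFFFF  RAM */
typedef struct {
    uint8_t  rom[4][0x4000];   /* four 16 KiB ROM banks */
    uint8_t  ram[0x8000];
    unsigned bank;             /* currently selected switchable bank */
} Memory;

/* Every read has to work out which region the address falls in. */
uint8_t mem_read(const Memory *m, uint16_t addr)
{
    if (addr < 0x4000) return m->rom[0][addr];
    if (addr < 0x8000) return m->rom[m->bank][addr - 0x4000];
    return m->ram[addr - 0x8000];
}

/* Writes to ROM space do not store data; here they select a bank. */
void mem_write(Memory *m, uint16_t addr, uint8_t value)
{
    if (addr < 0x8000)
        m->bank = value & 3;
    else
        m->ram[addr - 0x8000] = value;
}
```

These comparisons run on every single guest memory access, which is why emulators often replace them with a table of per-page pointers or handlers.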

    Other performance concerns

    1. Alignment checks - Some CPUs forbid multi-byte reads/writes to unaligned addresses (e.g., reading a 32-bit word from address 0x0001). The emulator must check every such access and raise the corresponding exception.
    2. Endianness conversion - If the emulated CPU uses a different byte order (big-endian vs. little-endian) than the host, then every multi-byte read/write needs byte swapping.
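Both concerns can be sketched in one helper that reads a 32-bit big-endian word from guest memory. The function name and its return codes are hypothetical; a real core would raise the guest's alignment exception rather than return -1.

```c
#include <stdint.h>
#include <stddef.h>

/* Reads a 32-bit big-endian word from guest memory on any host.
   Returns 0 on success, -1 if the address is not 4-byte aligned,
   -2 if the access would run past the end of guest memory. */
int read32_be(const uint8_t *mem, size_t mem_size, uint32_t addr, uint32_t *out)
{
    if (addr & 3u)
        return -1;                                  /* alignment check */
    if (mem_size < 4 || addr > mem_size - 4)
        return -2;                                  /* bounds check */

    /* Assemble the value byte by byte, so the host's byte order never matters. */
    *out = ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] <<  8) |
            (uint32_t)mem[addr + 3];
    return 0;
}
```

Building the word from individual bytes is slower than a single host load plus a byte swap, but it is portable and makes the cost of the conversion easy to see.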

    Modern CPUs and MMU

    Modern CPUs have:

    • Virtual Addresses – seen by running programs
    • Protection
    • Memory-mapped devices

    They are managed by the Memory Management Unit (MMU)

    When emulating such CPUs, you also have to emulate the MMU, which is slow because:
    • Every memory access may involve a translation step
    • Access rights need to be checked
    • Page faults/exceptions must be handled
    Translation step?
    A CPU instruction asks for a virtual address → the page table gives the physical location → that physical memory is read or written.

    Remember: MMU does all this in parallel, but you can’t while emulating XD
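The translation step might be sketched like this, assuming a deliberately simplified single-level page table with 4 KiB pages. Real MMUs use multi-level tables and a TLB; all names and sizes here are hypothetical.

```c
#include <stdint.h>

enum { PAGE_SHIFT = 12, PAGE_SIZE = 1 << PAGE_SHIFT, NUM_PAGES = 16 };
enum { PTE_PRESENT = 1, PTE_WRITABLE = 2 };

typedef struct {
    uint32_t frame;   /* physical page number */
    unsigned flags;   /* PTE_* permission bits */
} PageTableEntry;

typedef enum { XLATE_OK, XLATE_PAGE_FAULT, XLATE_PROTECTION } XlateResult;

/* Translate a virtual address using a hypothetical single-level page table. */
XlateResult translate(const PageTableEntry table[NUM_PAGES], uint32_t vaddr,
                      int is_write, uint32_t *paddr)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;        /* which virtual page */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);    /* position inside the page */

    if (page >= NUM_PAGES || !(table[page].flags & PTE_PRESENT))
        return XLATE_PAGE_FAULT;                  /* no mapping: page fault */
    if (is_write && !(table[page].flags & PTE_WRITABLE))
        return XLATE_PROTECTION;                  /* access rights check failed */

    *paddr = (table[page].frame << PAGE_SHIFT) | offset;
    return XLATE_OK;
}
```

All three checks sit in front of every guest load and store, which is the overhead the bullets above describe.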

    Interrupts

      Both interrupts and exceptions temporarily stop what the CPU is currently doing so it can run special code to handle an event
    • Hardware interrupts are generated by hardware outside the CPU (e.g., keyboard, timer, network card) and sent to the CPU through dedicated pins on the control bus.
    • Exceptions are generated inside the CPU when it detects a problem during instruction execution, e.g., divide by zero or an illegal opcode.

    How is an interrupt handled?

    1. Stop execution at the current PC (Program Counter).
    2. Save PC and sometimes other CPU state (registers, flags) in memory or special registers.
    3. Jump to the interrupt handler (special code that deals with the event).
    4. When handler finishes:
      • Restore saved state.
      • Resume execution from where it stopped.
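In an emulator, those steps might be sketched as below for an invented 16-bit CPU that saves only the PC on its stack. The handler address, the `IrqCpu` struct, and both function names are assumptions, not taken from any specific chip.

```c
#include <stdint.h>

typedef struct {
    uint16_t pc, sp;
    int      irq_pending;          /* set by emulated hardware */
    int      interrupts_enabled;
    uint8_t  mem[0x10000];
} IrqCpu;

enum { IRQ_VECTOR = 0x0038 };      /* hypothetical handler address */

/* Called by the main loop between instructions (steps 1 to 3). */
void service_interrupt(IrqCpu *c)
{
    if (!c->irq_pending || !c->interrupts_enabled)
        return;
    c->irq_pending = 0;
    c->interrupts_enabled = 0;                   /* no nesting until the handler returns */
    c->mem[--c->sp] = (uint8_t)(c->pc >> 8);     /* save PC on the stack, high byte first */
    c->mem[--c->sp] = (uint8_t)(c->pc & 0xFF);
    c->pc = IRQ_VECTOR;                          /* jump to the handler */
}

/* Emulates the handler's final "return from interrupt" instruction (step 4). */
void return_from_interrupt(IrqCpu *c)
{
    uint16_t lo = c->mem[c->sp++];
    uint16_t hi = c->mem[c->sp++];
    c->pc = (uint16_t)((hi << 8) | lo);          /* resume where execution stopped */
    c->interrupts_enabled = 1;
}
```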
    As you might have guessed, interpreted emulators are slow. I know a way to make them faster:
    • Write them in assembly instead.
      • Problem: the emulator is not portable without rewriting the CPU core (a tradeoff between portability and performance).
    • Another way to increase performance is a threaded emulator.
      • Instead of decoding an instruction every time it runs, it decodes it once and stores a pointer to the handler code for reuse.
      • Faster, but less portable code.
      • Not all C compilers can do this; some need inline assembly or special features.
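The "decode once, keep a pointer" idea can be approximated in standard C with function pointers, as in the sketch below. The opcodes and names are invented. True direct threading, where each handler jumps straight to the next one, typically relies on a compiler extension such as GCC's labels as values, so this portable variant is slower than the real thing.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { int acc; } Vm;
typedef void (*Handler)(Vm *);

void op_inc(Vm *vm) { vm->acc += 1; }
void op_dbl(Vm *vm) { vm->acc *= 2; }
void op_nop(Vm *vm) { (void)vm; }

/* Decode step, done once: map each hypothetical opcode byte to its handler. */
void predecode(const uint8_t *code, size_t n, Handler *out)
{
    for (size_t i = 0; i < n; i++) {
        switch (code[i]) {
        case 0x01: out[i] = op_inc; break;
        case 0x02: out[i] = op_dbl; break;
        default:   out[i] = op_nop; break;
        }
    }
}

/* Execute step, repeatable: no decoding, just indirect calls. */
void run_threaded(Vm *vm, const Handler *prog, size_t n)
{
    for (size_t i = 0; i < n; i++)
        prog[i](vm);
}
```

The `switch` runs once per instruction in `predecode`; every later pass through `run_threaded` skips it, which pays off for code that executes many times, such as loops.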

    Binary translation

    An interpreter reads one instruction, decodes it, and executes it. Binary translation, on the other hand, converts the guest machine code into equivalent host machine code, so the host can run it directly.
    • Speeds up emulation compared to interpreting instructions one-by-one.
      Two main types:
        Static binary translation
      • Translate the entire program before running it.
      • Like translating a whole book before reading.
      • Can be faster because translation is done once, but harder with complex or self-modifying code.
        Dynamic binary translation
      • Translate instructions while the program runs (just-in-time).
      • Translates small blocks while running, caches them, and reuses them if needed.
      • Only translates code that actually executes.
      • Like translating a speech as it happens. More flexible, adapts to dynamic code, but adds overhead during execution.
    QEMU uses dynamic binary translation to run ARM code on x86 machines.
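The caching half of dynamic translation can be sketched without generating any machine code. In the example below the translator is a placeholder that only counts how often it runs and returns a fake handle; a real translator would emit host instructions into executable memory. All names and the cache size are hypothetical, and this is not how QEMU structures its cache.

```c
#include <stdint.h>

enum { CACHE_SIZE = 64 };   /* direct-mapped, hypothetical size */

typedef struct {
    int      valid;
    uint32_t guest_pc;      /* start address of the guest block */
    uint32_t host_code_id;  /* stand-in for a pointer to generated host code */
} CacheEntry;

typedef struct {
    CacheEntry entries[CACHE_SIZE];
    unsigned   translations;   /* how many times the translator ran */
} TranslationCache;

/* Placeholder translator: a real one would emit host instructions here. */
uint32_t translate_block(TranslationCache *tc, uint32_t guest_pc)
{
    tc->translations++;
    return guest_pc ^ 0xC0DE0000u;   /* fake handle, for illustration only */
}

/* Return the translated block for guest_pc, translating only on a miss. */
uint32_t lookup_block(TranslationCache *tc, uint32_t guest_pc)
{
    CacheEntry *e = &tc->entries[guest_pc % CACHE_SIZE];
    if (!e->valid || e->guest_pc != guest_pc) {   /* miss: translate and cache */
        e->host_code_id = translate_block(tc, guest_pc);
        e->guest_pc = guest_pc;
        e->valid = 1;
    }
    return e->host_code_id;                       /* hit: reuse the earlier work */
}
```

Looking up the same guest address twice runs the translator only once, which is where the speedup over an interpreter comes from for code that runs repeatedly.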

    ありがとう (Thank you)

    Made with ❤️ by Saksham

    Check out this amazing explanation video:

    QR