Back in Cambridge, I decided to focus on assembly code generation, which is the last layer of compilation. Several things have to be done to create a new target backend.
This directory contains, for each architecture, a sub-directory that implements the architecture-specific code. I created my own xtensa sub-directory, which contains the following files:
- emit.mlp: a pre-processed OCaml file (which syntax highlighters struggle with, by the way). It implements asmcomp/emit.mli and consists of translating Linearize.fundecl code to assembly. This is obviously architecture-specific, and I worked on it by roughly translating what was done for ARM.
- arch.ml: defines architecture-dependent values such as endianness and addressing modes.
- proc.ml: describes registers, calling conventions and the side effects of instructions on registers. It is used by the register allocator.
- selection.ml: operation and addressing selection, overriding the default behavior. This is useful because Xtensa doesn’t have double-precision hardware floating point, for example.
- scheduling.ml: instruction timing hints.
- CSE.ml: common subexpression elimination. Set to default.
- reload.ml: instruction reloading. Set to default.
- xtensa.S: architecture-specific, handwritten assembly code that acts as the glue between C and OCaml code. It handles calls to the garbage collector.
Last week I finished filling in proc.ml so that I could start debugging. When linking failed, I realized that I had forgotten to fill in the xtensa.S assembly stubs.
There are quite a few functions to fill in:
- caml_call_gc: call the runtime garbage collector
- caml_alloc1: allocate 4 bytes
- caml_alloc2: allocate 8 bytes
- caml_allocN: allocate N-4 bytes, with N given in a register
- caml_c_call: call a C function
- caml_start_program: entry point after caml runtime startup
- caml_callback_exn: callback from C to OCaml with one argument
- caml_callback2_exn: callback from C to OCaml with two arguments
- caml_callback3_exn: callback from C to OCaml with three arguments
- trap_handler: callback from exception
- caml_raise_exn: raise an exception from OCaml
- caml_raise_exception: raise an exception from C
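To make the allocation stubs a bit more concrete, here is a rough C model of what caml_alloc1, caml_alloc2 and caml_allocN are expected to do. It assumes the usual OCaml minor-heap scheme, where the allocation pointer moves downward and the garbage collector is called when there is no room left. The struct, the model_ function names, the 4-byte header and the "GC empties the minor heap" stand-in are all illustrative assumptions, not the actual stub code.

```c
#include <stddef.h>
#include <stdint.h>

#define HEADER_BYTES 4u /* assumption: one 32-bit header word per block */

/* Hypothetical model of the minor heap state kept by the runtime. */
typedef struct {
    uintptr_t young_ptr;   /* lowest allocated address, moves downward */
    uintptr_t young_limit; /* lower bound of the minor heap */
    uintptr_t young_end;   /* upper bound of the minor heap */
    unsigned gc_calls;     /* how many times the GC stand-in ran */
} heap_model;

/* Stand-in for caml_call_gc: pretend a minor collection empties the heap. */
static void model_call_gc(heap_model *h) {
    h->young_ptr = h->young_end;
    h->gc_calls++;
}

/* Model of caml_allocN: total_bytes includes the header, so the caller
   gets total_bytes - 4 usable bytes. Returns the address just past the header. */
uintptr_t model_alloc_n(heap_model *h, size_t total_bytes) {
    if (h->young_ptr - h->young_limit < total_bytes) {
        model_call_gc(h); /* not enough room: collect, then allocate */
    }
    h->young_ptr -= total_bytes;
    return h->young_ptr + HEADER_BYTES;
}

/* Models of caml_alloc1 and caml_alloc2: fixed payloads of 4 and 8 bytes. */
uintptr_t model_alloc1(heap_model *h) { return model_alloc_n(h, HEADER_BYTES + 4); }
uintptr_t model_alloc2(heap_model *h) { return model_alloc_n(h, HEADER_BYTES + 8); }
```

The real stubs do this with the allocation pointer held in a dedicated register, which is why they have to be written by hand in assembly.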
The process is not that straightforward, as compiling and linking for the ESP32 relies on Espressif’s IoT Development Framework (ESP-IDF), which contains the linker script and the required libraries. The ~easiest~ way I have found so far to get some OCaml native code running on the ESP32 is the following:
ocamlopt-esp32 test.ml -dstartup -o main.o -S -dstartup will generate two assembly files and fail on linking:
main.s is the main source code
main.o.startup.s is the startup code which will then call
startup-c.c that will be the glue between ESP-IDF entry point app_main and OCaml runtime entry point
- Put all these files in an ESP-IDF component subdirectory of a project. That is for example
- Put library files generated by the compilation of ocaml-esp32 in a lib directory
- Create a relocatable object file
- Add the libraries in the component Makefile through
- I use QEMU for debugging. This GitHub repository explains how to do it. It works out of the box with the gdb shipped with the repository.
- ESP32 WROVER kits have a JTAG interface, which will allow me to test my code on real hardware once it works on QEMU.
Funny stuff encountered
Conditional branches don’t have legs
The conditional branch has a range of ±128 bytes. My generated code tried to jump further, which produced the error “Error: jump target out of range; no usable trampoline found”. I had to put a jump instruction close to the conditional, as I often need to go far away. The jump to label has a range of ±131075 bytes. If that’s not enough, I can address the whole space with a jump to an address held in a register.
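The choice between these three forms can be summarised by a small decision function. This is only a sketch, written in C rather than in the OCaml of emit.mlp; the enum and function names are made up, and it treats the ranges quoted above as symmetric, which glosses over the exact encoding limits.

```c
#include <stdint.h>

/* Ranges as quoted in the text, treated as symmetric (a simplification). */
#define COND_BRANCH_RANGE 128
#define JUMP_RANGE 131075

typedef enum {
    BRANCH_DIRECT,      /* the conditional branch reaches the target itself */
    BRANCH_VIA_JUMP,    /* conditional branch to a nearby jump-to-label */
    BRANCH_VIA_REGISTER /* conditional branch to a nearby jump-to-register */
} branch_form;

/* Pick the cheapest form that can reach a target `offset` bytes away. */
branch_form choose_branch_form(int32_t offset) {
    /* Widen before negating so INT32_MIN cannot overflow. */
    int64_t distance = offset < 0 ? -(int64_t)offset : (int64_t)offset;
    if (distance <= COND_BRANCH_RANGE) {
        return BRANCH_DIRECT;
    }
    if (distance <= JUMP_RANGE) {
        return BRANCH_VIA_JUMP;
    }
    return BRANCH_VIA_REGISTER;
}
```

In a code emitter the offset is usually not known yet when the branch is emitted, so a conservative option is to always emit the jump form and accept the extra instruction.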
Never look forward
The PC-relative load has a range of [-262141, -4]. Therefore, data must be placed before every load and store instruction that refers to it. The assembler handles this on its own when compiling a single file, but the linker doesn’t seem to handle it well across files. I had to add extra symbols.
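The constraint is easy to state as a check on addresses. The sketch below is a simplified model in C: the function name is hypothetical, and the offset is measured from the address of the load instruction itself, ignoring any alignment adjustment the hardware applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range of the PC-relative load, as quoted in the text. */
#define PCREL_MIN_OFFSET (-262141)
#define PCREL_MAX_OFFSET (-4)

/* Returns true if data at `literal_addr` can be reached by a PC-relative
   load located at `load_addr`. The data must sit strictly before the load. */
bool literal_reachable(uint32_t load_addr, uint32_t literal_addr) {
    int64_t offset = (int64_t)literal_addr - (int64_t)load_addr;
    return offset >= PCREL_MIN_OFFSET && offset <= PCREL_MAX_OFFSET;
}
```

Any data placed after the load, even one word later, fails the check, which is why the layout across object files matters.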
What you see is not what you get
Xtensa processors can have a feature called “Windowed registers”. It allows a processor to have a given number of registers (64) while only a contiguous subset of them (16) is visible at any instant.
On a call, you can ask the processor to move this window to the right by a number of registers, which can be 0, 4, 8, or 12. There are special instructions that magically handle the fact that this window can overflow, by spilling registers to stack memory.
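The window described above can be modelled with a little modular arithmetic. This is a sketch in C with made-up names; it only models which physical register a visible register maps to, and leaves out the overflow and spilling that the special instructions take care of.

```c
#include <assert.h>

#define PHYSICAL_REGS 64 /* total registers, as in the text */
#define VISIBLE_REGS 16  /* registers visible at any instant */

/* window_start is the physical index of the register currently seen as a0.
   Returns the physical index behind the visible register a<n>. */
int physical_index(int window_start, int n) {
    assert(window_start >= 0 && window_start < PHYSICAL_REGS);
    assert(n >= 0 && n < VISIBLE_REGS);
    return (window_start + n) % PHYSICAL_REGS;
}

/* Move the window to the right by `increment` registers, wrapping around. */
int rotate_window(int window_start, int increment) {
    assert(window_start >= 0 && window_start < PHYSICAL_REGS);
    assert(increment == 0 || increment == 4 || increment == 8 || increment == 12);
    return (window_start + increment) % PHYSICAL_REGS;
}
```

With a rotation of 8, for example, physical_index(caller, 8) equals physical_index(rotate_window(caller, 8), 0): the register the caller sees as a8 is the one the callee sees as a0.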
That makes the ABI a bit special, as the a8 register of the caller is the a0 register of the callee if the call8 instruction is used. call12 is compatible, as the entry instruction handles everything for you. However, call0 is not compatible with entry: as the document explains, it throws an IllegalInstruction exception. Guess what? I wanted to start with the call0 ABI as it’s simpler to reason about, but C code is compiled against