Back in Cambridge, I decided to focus on assembly code generation, which is the last layer of compilation. Creating a new target backend involves several pieces.


In /asmcomp/:

This directory contains one sub-directory per architecture, each implementing the architecture-specific code. I created my own xtensa sub-directory, which contains the following files:

  • emit.mlp: a pre-processed OCaml file (which syntax highlighters struggle with, by the way). It implements asmcomp/emit.mli and translates Linearize.fundecl code into assembly. This is obviously architecture-specific, and I worked on it by roughly translating what was done for ARM.
  • defines architecture-dependent values such as endianness and addressing modes.
  • describes registers, calling conventions and the side effects of instructions on registers. Used by the register allocator.
  • operation and addressing selection, overriding the default behavior. Useful because, for example, Xtensa doesn’t have double-precision hardware floating point.
  • instruction timing hints.
  • common subexpression elimination. Set to default.
  • instruction reloading. Set to default.

In /asmrun/:

  • xtensa.S: architecture-specific, handwritten assembly code that provides the glue between C and OCaml code. It also handles calls to the garbage collector.


Writing code

Last week I finished filling in emit.mlp and started debugging. When linking failed, I realized I had forgotten to fill in the assembly stubs in xtensa.S. There are quite a few functions to implement:

  • caml_call_gc: call the runtime garbage collector.
  • caml_alloc1: allocate 4 bytes
  • caml_alloc2: allocate 8 bytes
  • caml_allocN: allocate N-4 bytes, with N given in a register
  • caml_c_call: call a C function
  • caml_start_program: entry point after caml runtime startup
  • caml_callback_exn: callback from C to OCaml with one argument
  • caml_callback2_exn: callback from C to OCaml with two arguments
  • caml_callback3_exn: callback from C to OCaml with three arguments
  • trap_handler: callback from exception
  • caml_raise_exn: raise an exception from OCaml
  • caml_raise_exception: raise an exception from C
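To give an idea of what the allocation stubs have to do, here is a toy C model of the usual OCaml minor-heap fast path: bump a pointer downwards and call the collector when there is no room left. This is only a sketch, not the actual xtensa.S code. The names minor_heap, young_ptr, young_limit, call_gc and alloc_bytes are made up for the illustration, and the "collector" simply resets the heap.

```c
#include <stddef.h>

/* Toy minor heap: allocation moves young_ptr down towards young_limit.
   All names here are hypothetical, chosen for this illustration only. */
static char minor_heap[64];
static char *young_limit = minor_heap;
static char *young_ptr = minor_heap + sizeof minor_heap;
static int gc_calls = 0;

/* Stand-in for caml_call_gc: a real collector would first promote live
   blocks; this one only empties the toy heap. */
static void call_gc(void)
{
    gc_calls++;
    young_ptr = minor_heap + sizeof minor_heap;
}

/* Model of the allocation path for a request of `bytes` bytes.
   Assumes bytes <= sizeof minor_heap. */
static void *alloc_bytes(size_t bytes)
{
    if ((size_t)(young_ptr - young_limit) < bytes)
        call_gc();          /* slow path: not enough room left */
    young_ptr -= bytes;     /* fast path: bump the pointer down */
    return young_ptr;
}
```

In the real stubs the size is either fixed (caml_alloc1, caml_alloc2) or passed in a register (caml_allocN), and the slow path has to save the live registers before entering the runtime.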

Linking it

The process is not that straightforward, as compiling and linking for ESP32 relies on Espressif’s IoT Development Framework (ESP-IDF), which contains the linker script and the required libraries. The ~easiest~ way I have found so far to get some OCaml native code running on the ESP32 is the following:

  • ocamlopt-esp32 -o main.o -S -dstartup will generate two assembly files and fail at link time:
      • main.s is the main source code
      • main.o.startup.s is the startup code, which will then call the main.s entry point.
  • Create startup-c.c that will be the glue between ESP-IDF entry point app_main and OCaml runtime entry point caml_main.
  • Put all these files in an ESP-IDF component subdirectory of a project. That is for example hello_caml/main/.
  • Put library files generated by the compilation of ocaml-esp32 in a lib directory hello_caml/lib/:
      • libasmrun.a
      • libstdlib.a
      • std_exit.o
  • Create a relocatable object file startup-c.o from startup-c.c, main.s and main.o.startup.s.
  • Add the libraries in the component Makefile through COMPONENT_ADD_LDFLAGS and COMPONENT_EXTRA_INCLUDES.
  • make
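For the glue file mentioned in the steps above, a minimal sketch could look like the following. It assumes the classic runtime prototype void caml_main(char **argv) and the ESP-IDF convention that the application entry point is void app_main(void). The USE_OCAML_RUNTIME macro and the stand-in caml_main are hypothetical additions so that the file also builds on a host without libasmrun.a, and the argv contents are arbitrary.

```c
/* startup-c.c: sketch of the glue between ESP-IDF and the OCaml runtime. */
#include <stddef.h>

#ifdef USE_OCAML_RUNTIME
/* Provided by libasmrun.a: initializes the runtime and runs the program. */
extern void caml_main(char **argv);
#else
/* Host-side stand-in (hypothetical) so the file builds without the runtime. */
static int caml_main_calls = 0;
static void caml_main(char **argv)
{
    (void)argv;
    caml_main_calls++;
}
#endif

/* ESP-IDF calls app_main once the system has started. */
void app_main(void)
{
    /* caml_main expects a NULL-terminated argv; the program name is arbitrary. */
    static char *argv[] = { "ocaml-esp32", NULL };
    caml_main(argv);
}
```

When building the real component, USE_OCAML_RUNTIME would be defined so that caml_main resolves to the symbol in libasmrun.a.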

Debugging stuff

  • I use QEMU for debugging. This GitHub repository explains how to do it. It works out of the box with the gdb shipped with the repository.
  • ESP32 WROVER kits have a JTAG interface, which will allow me to test my code on real hardware once it works on QEMU.

Funny stuff encountered

Conditional branches don’t have legs

The conditional branch has a range of ±128 bytes. My generated code tried to jump further, which produced Error: jump target out of range; no usable trampoline found. As I often need to go far away, I had to put a jump instruction close to the conditional. The jump to label has a range of ±131075 bytes. If that’s not enough, I can address the whole space with a jump to an address held in a register.
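The decision described above can be sketched in C as follows. This is not the code from emit.mlp, only an illustration: the ranges are the ones quoted in the text, the "inverted branch over a jump" shape is one common way to do it, and the registers a2, a3 and the scratch register a8 are arbitrary choices for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define BRANCH_RANGE 128L     /* conditional branch reach, as quoted in the text */
#define JUMP_RANGE   131075L  /* jump-to-label reach, as quoted in the text */

typedef enum { DIRECT_BRANCH, BRANCH_OVER_JUMP, BRANCH_OVER_JUMP_REGISTER } branch_form;

static bool in_range(long offset, long range)
{
    return offset >= -range && offset <= range;
}

/* Pick the cheapest form that can reach a target `offset` bytes away. */
static branch_form select_branch_form(long offset)
{
    if (in_range(offset, BRANCH_RANGE)) return DIRECT_BRANCH;
    if (in_range(offset, JUMP_RANGE))   return BRANCH_OVER_JUMP;
    return BRANCH_OVER_JUMP_REGISTER;
}

/* Write the assembly for "branch to target if a2 == a3" into buf.
   Far targets use the inverted condition to skip over a jump. */
static int emit_beq(char *buf, size_t len, const char *target, long offset)
{
    switch (select_branch_form(offset)) {
    case DIRECT_BRANCH:
        return snprintf(buf, len, "beq a2, a3, %s\n", target);
    case BRANCH_OVER_JUMP:
        return snprintf(buf, len, "bne a2, a3, 1f\nj %s\n1:\n", target);
    default: /* a8 is an arbitrary scratch register for this sketch */
        return snprintf(buf, len, "bne a2, a3, 1f\nmovi a8, %s\njx a8\n1:\n", target);
    }
}
```

The cost of the far forms is one or two extra instructions per branch, which is why it is worth keeping the direct form whenever the target is known to be close.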

Never look forward

The PC-relative load has a range of [-262141, -4]. The data must therefore be placed before every instruction that loads it. The assembler handles this on its own when compiling a single file, but the linker doesn’t seem to handle it well across files. I had to add extra symbols.
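As a small illustration of that constraint, the check below tells whether a piece of data can be reached from a given load. It is a simplified sketch: it uses the raw range quoted above and ignores alignment details, and the function name literal_reachable is made up for the example.

```c
#include <stdbool.h>

/* PC-relative load range, as quoted in the text. */
#define LOAD_MIN_OFFSET (-262141L)
#define LOAD_MAX_OFFSET (-4L)

/* True if the data at literal_addr can be loaded from load_addr.
   The offset is always negative: the data has to come first. */
static bool literal_reachable(unsigned long load_addr, unsigned long literal_addr)
{
    long offset = (long)literal_addr - (long)load_addr;
    return offset >= LOAD_MIN_OFFSET && offset <= LOAD_MAX_OFFSET;
}
```

For instance, data placed 4 bytes before the load is reachable, while data placed right after it never is, however close.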

What you see is not what you get

Xtensa processors can have a feature called “Windowed registers”. It allows a processor to have a given number of physical registers (64) while only a contiguous window of them (16) is visible at any instant.

On a call, you can ask the processor to move this window to the right by a number of registers: 0, 4, 8, or 12. There are special instructions that magically handle the fact that this window can overflow, by spilling registers to stack memory. That makes the ABI a bit special, as the a8 register of the caller is the a0 register of the callee if the call8 instruction is used.
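The renaming can be modelled in a few lines of C. This is only a sketch of the arithmetic, using the 64 physical and 16 visible registers mentioned above. The function names and the idea of describing the window by the index of its first physical register are choices made for the illustration.

```c
#define PHYSICAL_REGS 64  /* physical registers, from the text */
#define VISIBLE_REGS  16  /* registers visible in one window, from the text */

/* Register number under which the callee sees the caller's register
   `caller_reg` after a call that rotates the window by `rotation`
   (0, 4, 8 or 12). Returns -1 if the callee cannot see it. */
static int callee_view(int caller_reg, int rotation)
{
    int reg = caller_reg - rotation;
    return (reg >= 0 && reg < VISIBLE_REGS) ? reg : -1;
}

/* Physical register behind visible register `reg` when the window starts
   at physical register `window_start`. The register file wraps around. */
static int physical_reg(int window_start, int reg)
{
    return (window_start + reg) % PHYSICAL_REGS;
}
```

With a rotation of 8, callee_view(8, 8) gives 0, which is the a8-becomes-a0 case described above, and the caller's a0 to a7 are hidden from the callee.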

call4, call8 and call12 can be mixed freely, as the entry instruction handles everything for you. However, call0 is not compatible with entry: as the documentation explains, it raises an IllegalInstruction exception. Guess what? I wanted to start with the call0 ABI as it’s simpler to reason about, but C code is compiled against the call8 ABI.