Back in Cambridge, I decided to focus on assembly code generation, which is the last layer of compilation. Several things need to be done to create a new target backend.
# TODO
# In /asmcomp/:
This directory contains a sub-directory for each architecture, implementing the architecture-specific code. I created my own `xtensa` sub-directory, which contains the following files:

- `emit.mlp`: a pre-processed OCaml file (which syntax highlighting has a lot of trouble with, by the way). It implements `asmcomp/emit.mli` and consists of translating `Linearize.fundecl` code to assembly. This is obviously architecture-specific, and I worked on it by roughly translating what was done for ARM.
- `arch.ml`: defines architecture-dependent values such as endianness and addressing modes.
- `proc.ml`: describes registers, calling conventions and the side effects of instructions on registers. Used by the register allocator.
- `selection.ml`: operation and addressing selection, overriding the default behavior. Useful because Xtensa doesn't have double-precision hardware floating point, for example.
- `scheduling.ml`: instruction timing hints.
- `CSE.ml`: common subexpression elimination. Set to default.
- `reload.ml`: instruction reloading. Set to default.
# In /asmrun/:
- `xtensa.S`: handwritten, architecture-specific assembly code that glues C and OCaml code together. It handles calls to the garbage collector.
# Progress
# Writing code
Last week I finished filling in `emit.mlp` and `proc.ml` so that I could start debugging. When linking failed, I realized that I had forgotten to fill in the assembly stubs in `xtensa.S`.
There are quite a few functions to fill in:
- `caml_call_gc`: call the runtime garbage collector
- `caml_alloc1`: allocate 4 bytes
- `caml_alloc2`: allocate 8 bytes
- `caml_allocN`: allocate N-4 bytes, with N given in a register
- `caml_c_call`: call a C function
- `caml_start_program`: entry point after caml runtime startup
- `caml_callback_exn`: callback from C to OCaml with one argument
- `caml_callback2_exn`: callback from C to OCaml with two arguments
- `caml_callback3_exn`: callback from C to OCaml with three arguments
- `trap_handler`: callback from exception
- `caml_raise_exn`: raise an exception from OCaml
- `caml_raise_exception`: raise an exception from C
# Linking it
The process is not that straightforward, as compiling and linking for the ESP32 relies on Espressif's IoT Development Framework (ESP-IDF), which contains the linker script and required libraries. The ~easiest~ way I have found so far to get some OCaml native code running on the ESP32 is the following:
- `ocamlopt-esp32 test.ml -dstartup -o main.o -S -dstartup` will generate two assembly files and fail on linking:
  - `main.s` is the main source code
  - `main.o.startup.s` is the startup code, which will then call the `main.s` entry point.
- Create `startup-c.c` that will be the glue between the ESP-IDF entry point `app_main` and the OCaml runtime entry point `caml_main`.
- Put all these files in an ESP-IDF component subdirectory of a project. That is for example `hello_caml/main/`.
- Put the library files generated by the compilation of ocaml-esp32 in a lib directory `hello_caml/lib/`:
  - `libasmrun.a`
  - `libstdlib.a`
  - `std_exit.o`
- Create a relocatable object file `startup-c.o` from `startup-c.c`, `main.s` and `main.o.startup.s`.
- Add the libraries in the component Makefile through `COMPONENT_ADD_LDFLAGS` and `COMPONENT_EXTRA_INCLUDES`.
- `make`
# Debugging stuff
- I use QEMU for debugging. This GitHub repository explains how to do it. It works out of the box with the gdb shipped with the repository.
- ESP32 WROVER kits have a JTAG interface, that will allow me to test my code on real hardware, once it works on QEMU.
# Funny stuff encountered
# Conditional branches don't have legs
The conditional branch has a range of ±128 bytes. My generated code tried to jump further, which produced the error `Error: jump target out of range; no usable trampoline found`. Since I often need to go far away, I had to put a jump instruction close to the conditional branch. The jump to a label has a range of ±131075 bytes. If that's not enough, I can address the whole space with a jump to an address held in a register.
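The choice between these three options can be modelled roughly as below. This is only a sketch in C, not the code from `emit.mlp`: the names `select_branch` and `branch_kind` are hypothetical, and the ranges are the figures quoted above, treated as symmetric for simplicity.

```c
#include <stdlib.h>

/* Ranges quoted in the text, in bytes. Treating them as symmetric is an
   assumption made to keep the sketch short. */
#define COND_BRANCH_RANGE 128L
#define JUMP_RANGE        131075L

/* Hypothetical names for the three ways of reaching a branch target. */
enum branch_kind {
    BRANCH_DIRECT,       /* conditional branch straight to the target      */
    BRANCH_VIA_JUMP,     /* conditional branch paired with a nearby `j`    */
    BRANCH_VIA_REGISTER  /* target address in a register, then `jx`        */
};

/* Pick the cheapest sequence able to cover `displacement` bytes. */
enum branch_kind select_branch(long displacement)
{
    long distance = labs(displacement);

    if (distance <= COND_BRANCH_RANGE)
        return BRANCH_DIRECT;
    if (distance <= JUMP_RANGE)
        return BRANCH_VIA_JUMP;
    return BRANCH_VIA_REGISTER;
}
```

For instance, a displacement of 100 bytes stays a plain conditional branch, 5000 bytes needs the nearby jump, and anything beyond the jump range falls back to the register form.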
# Never look forward
The PC-relative load has a range of [-262141, -4]. Therefore the data must be placed before every load and store instruction that refers to it. The assembler handles this on its own when compiling a single file, but the linker doesn't seem to handle it well across files. I had to add extra symbols.
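To make the constraint concrete, here is a small C check of whether a piece of data is reachable from a given load. It is a simplified model under stated assumptions: the function name `literal_reachable` is hypothetical, and the offset is computed directly from the instruction address, ignoring any alignment adjustment the hardware applies.

```c
#include <stdbool.h>

/* Offset range of the PC-relative load, as quoted in the text (bytes). */
#define LOAD_MIN_OFFSET (-262141L)
#define LOAD_MAX_OFFSET (-4L)

/* True if the data at `literal_addr` can be reached by a PC-relative
   load located at `load_addr`. The offset is always negative, so the
   data has to sit at a lower address than the load. */
bool literal_reachable(long literal_addr, long load_addr)
{
    long offset = literal_addr - load_addr;
    return offset >= LOAD_MIN_OFFSET && offset <= LOAD_MAX_OFFSET;
}
```

Data placed after the load, even by a few bytes, fails this check, which is why the placement matters once several files are linked together.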
# What you see is not what you get
Xtensa processors can have a feature called "windowed registers". It allows a processor to have a given number of registers (64) while only a contiguous subset of them (16) is visible at any given time.
On a call, you can ask the processor to move this window to the right by a number of registers: 0, 4, 8, or 12. Special instructions magically handle the fact that this window can overflow, by spilling registers to stack memory.
That makes the ABI a bit special, as the a8 register of the caller is the a0 register of the callee if the call8 instruction is used.
Using call4, call8 and call12 together is compatible, as the entry instruction handles everything for you. However, call0 is not compatible with entry: as the documentation explains, it throws an IllegalInstruction exception. Guess what? I wanted to start with the call0 ABI as it's simpler to reason about, but the C code is compiled against the call8 ABI.
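The renaming can be written down as a tiny C function. This is an illustrative model only, not part of the backend: `callee_register` is a hypothetical name, and returning -1 for registers that fall outside the callee's window is a convention chosen for this sketch.

```c
#include <assert.h>

/* Number of registers visible through the window at any time. */
#define VISIBLE_REGS 16

/* Given a caller register index (0..15) and the window rotation of the
   call (0, 4, 8 or 12 for call0, call4, call8, call12), return the index
   under which the callee sees that register, or -1 if the register is
   not visible to the callee. */
int callee_register(int caller_reg, int rotation)
{
    assert(rotation == 0 || rotation == 4 || rotation == 8 || rotation == 12);
    assert(caller_reg >= 0 && caller_reg < VISIBLE_REGS);

    int callee_reg = caller_reg - rotation;
    return callee_reg >= 0 ? callee_reg : -1;
}
```

With a rotation of 8, the caller's a8 maps to the callee's a0 and the caller's a10 to a2, while the caller's a0 to a7 are hidden from the callee.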