doc: Update and expand firmware README

- Remove all text about other software than firmware.
- Remove the Reset section.
- Include a diagram and detailed explanation about the state machine
  in close vicinity.
- Describe the test firmware.

Co-authored-by: Joachim Strömbergson <joachim@assured.se>
2025-02-22 23:59:55 -05:00 · 2024-06-27 22:23:49 +02:00 · 2024-06-27 22:23:49 +02:00 · f1534e5dad
commit f1534e5dad
parent cc16c8481c
1 changed files with 195 additions and 181 deletions
--- a/hw/application_fpga/fw/README.md
+++ b/hw/application_fpga/fw/README.md
@@ -1,201 +1,103 @@
-# Tillitis TKey software
-
-**NOTE:** Documentation migrated to dev.tillitis.se, this is kept for
-history. This is likely to be outdated.
+# Firmware

 ## Introduction

-This text is both an introduction to and a requirement specification
-of the TKey firmware, its protocol, and an overview of how TKey
-applications are supposed to work. For an overview of the TKey
-concepts, see [System Description](system_description.md).
+This text is an introduction to, a requirement specification of,
+and some implementation notes on the TKey firmware. It also gives a
+few hints on developing and debugging the firmware.

-First, some definitions:
+This text is specific to the firmware. For a more general description
+of how to implement device apps, see [the TKey Developer
+Handbook](https://dev.tillitis.se/).
+
+## Definitions

 - Firmware - software in ROM responsible for loading applications. The
-  firmware is included as part of the FPGA bit stream.
- Application or app - software supplied by the host machine which is
+  firmware is included as part of the FPGA bitstream and not
+  replaceable on a usual consumer TKey.
+- Device application or app - software supplied by the client which is
  received, loaded, measured, and started by the firmware.

+## CPU modes and firmware
+
 The TKey has two modes of software operation: firmware mode and
-application mode. The firmware mode has the responsibility of
-receiving, measuring, loading, and starting the application. When the
-firmware is about to start the application it switches to a more
-constrained environment, the application mode.
+application mode. The TKey always starts in firmware mode and starts
+the firmware. When the firmware is about to start the application it
+switches to a more constrained environment, the application mode.

-The firmware and application uses a memory mapped input/output (MMIO)
-for communication with the hardware. The memory map is constrained
-when running in application mode, e.g. FW-RAM and UDS isn't readable,
-and several MMIO addresses are either not readable or not writable for
-the application.
+The TKey hardware cores are memory mapped. Firmware has complete
+access, except that the UDS is readable only once. The memory map is
+constrained when running in application mode, e.g. FW\_RAM and UDS
+aren't readable, and several other hardware addresses are either not
+readable or not writable for the application.

-See table in the [System
-Description](system_description.md#memory-mapped-hardware-functions)
-for details about access rules control in the memory system and MMIO.
+See the table in [the Developer
+Handbook](https://dev.tillitis.se/memory/) for an overview of the
+memory access control.

-The firmware (and optionally all software) on the TKey can communicate
-to the host via the `UART_{RX,TX}_{STATUS,DATA}` registers, using the
-framing protocol described in the [Framing
+## Communication
+
+The firmware communicates with the host via the
+`UART_{RX,TX}_{STATUS,DATA}` registers, using the framing protocol
+described in the [Framing
 Protocol](https://dev.tillitis.se/protocol/).

-The firmware defines a protocol on top of this framing layer which is
+The firmware uses a protocol on top of this framing layer which is
 used to bootstrap the application. All commands are initiated by the
-host. All commands receive a reply. See [Firmware
+client. All commands receive a reply. See [Firmware
 protocol](#firmware-protocol) for specific details.

-Applications define their own protocol used for communication with
-their host part. They may or may not be based on the Framing Protocol.
-
-## CPU
-
-The CPU is a PicoRV32, a 32-bit RISC-V processor (arch: RV32IC\_Zmmul)
-which runs the firmware and the application. The firmware and
-application both run in RISC-V machine mode. All types are
-little-endian.
-
-## Constraints
+## Memory constraints

 - ROM: 6 kByte.
+- FW\_RAM: 2 kByte.
 - RAM: 128 kByte.
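
A minimal sketch of how the RAM limit could be checked before accepting an app. The constants come from this text (RAM starts at `0x4000_0000` and is 128 kByte); the helper names `app_fits_in_ram()` and `app_end_addr()` are hypothetical and not taken from the firmware source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values stated in this README. */
#define RAM_BASE 0x40000000UL
#define RAM_SIZE (128UL * 1024UL)

/* Hypothetical helper: true if an app of app_size bytes fits in RAM. */
static bool app_fits_in_ram(size_t app_size)
{
	return app_size > 0 && app_size <= RAM_SIZE;
}

/* Hypothetical helper: first address after an app loaded at RAM_BASE. */
static uint32_t app_end_addr(size_t app_size)
{
	return (uint32_t)(RAM_BASE + app_size);
}
```

Note that an app filling all of RAM would leave no room for its own stack, so a real limit is likely to be stricter than this check.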

-## Firmware
+## Firmware behaviour

-The purpose of the firmware is to bootstrap and measure an
-application.
+The purpose of the firmware is to load, measure, and start an
+application received from the client over the USB/UART.

-The TKey has 128 kilobyte RAM. Firmware loads the application at the
-start of RAM. The current C runtime (`crt0.S`) of apps in our [apps
-repo](https://github.com/tillitis/tillitis-key1-apps) sets up the
-stack to start just below the end of RAM. This means that a larger app
-comes at the compromise of it having a smaller stack.
-
-The firmware is part of FPGA bitstream (ROM), and is loaded at
-`0x0000_0000`.
-
-When the firmware starts it clears all RAM and then wait for commands
-coming in on `UART_RX`.
-
-Typical use scenario:
-
-  1. The host sends the `FW_CMD_LOAD_APP` command with the size of the
-     device app and the optional user-supplied secret as arguments and
-     and gets a `FW_RSP_LOAD_APP` back. After using this it's not
-     possible to restart the loading of an application.
-  2. If the the host receive a sucessful response, it will send
-     multiple `FW_CMD_LOAD_APP_DATA` commands, together containing the
-     full application.
-  3. On receiving`FW_CMD_LOAD_APP_DATA` commands the firmware places
-     the data into `0x4000_0000` and upwards. The firmware replies
-     with a `FW_RSP_LOAD_APP_DATA` response to the host for each
-     received block except the last data block.
-  4. When the final block of the application image is received with a
-     `FW_CMD_LOAD_APP_DATA`, the firmware measure the application by
-     computing a BLAKE2s digest over the entire application. Then
-     firmware send back the `FW_RSP_LOAD_APP_DATA_READY` response
-     containing the measurement.
-  5. The Compound Device Identifier (CDI) is then computed by using
-     the `UDS`, application digest, and the `USS`, and placed in
-     `CDI`. (see [Compound Device Identifier
-     computation](#compound-device-identifier-computation)) Then the
-     start address of the device app, `0x4000_0000`, is written to
-     `APP_ADDR` and the size to `APP_SIZE` to let the device
-     application know where it is loaded and how large it is, if it
-     wants to relocate in RAM.
-  6. The firmware now clears the special `FW_RAM` where it keeps it
-     stack. After this it does no more function calls and uses no more
-     automatic variables.
-  7. Firmware starts the application by first switching to application
-     mode by writing to the `SWITCH_APP` register. In this mode the
-     MMIO region is restricted, e.g. some registers are removed
-     (`UDS`), and some are switched from read/write to read-only (see
-     [memory
-     map](system_description.md#memory-mapped-hardware-functions)).
-
-     Then the firmware jumps to what's in `APP_ADDR` which starts
-     the application.
-
-     There is now no other means of getting back from application mode
-     to firmware mode than resetting/power cycling the device.
-
-### Developing firmware
-
-Standing in `hw/application_fpga/` you can run `make firmware.elf` to
-build just the firmware. You don't need all the FPGA development tools
-mentioned in [Toolchain setup](../toolchain_setup.md).
-
-You need clang with 32 RISC-V support `-march=rv32iczmmul` which comes
-in clang 15. If you don't have version 15 you might get by with
-`-march=rv32imc` but things will break if you ever cause it to emit
-`div` instructions.
-
-You also need `llvm-ar`, `llvm-objcopy`, `llvm-size` and `lld`.
-Typically these are all available in packages called "clang", "llvm",
-"lld" or similar.
-
-If your available `objcopy` and `size` commands is anything other than
-the default `llvm-objcopy` and `llvm-size` define `OBJCOPY` and `SIZE`
-to whatever they're called on your system before calling `make
-firmware.elf`.
-
-If you want to use our emulator, clone the `tk1` branch of [our
-version of qemu](https://github.com/tillitis/qemu) and build:
-
-```
-$ git clone -b tk1 https://github.com/tillitis/qemu
-$ mkdir qemu/build
-$ cd qemu/build
-$ ../configure --target-list=riscv32-softmmu --disable-werror
-$ make -j $(nproc)
-```
-
-(Built with warnings-as-errors disabled, see [this
-issue](https://github.com/tillitis/qemu/issues/3).)
-
-Run it like this:
-
-```
-$ /path/to/qemu/build/qemu-system-riscv32 -nographic -M tk1,fifo=chrid -bios firmware.elf \
-  -chardev pty,id=chrid
-```
-
-This attaches the FIFO to a tty, something like `/dev/pts/16` which
-you can use with host software to talk to the firmware.
-
-To quit QEMU you can use: `Ctrl-a x` (see `Ctrl-a ?` for other commands).
-
-Debugging? Use the HTIF console by removing `-DNOCONSOLE` from the
-`CFLAGS` and using the helper functions in `lib.c` like `htif_puts()`
-`htif_putinthex()` `htif_hexdump()` and friends for printf-like
-debugging.
-
-You can add `-d guest_errors` to the qemu commandline to make QEMU
-send errors from the TK1 machine to stderr, typically things like
-memory writes outside of mapped regions.
-
-You can also use the qemu monitor for debugging, e.g. `info
-registers`, or run qemu with `-d in_asm` or `-d trace:riscv_trap`.
-
-### Reset
-
-The PicoRV32 starts executing at `0x0000_0000`. We allow no `.data` or
-`.bss` sections. Our firmware starts at the `_start` symbol in
-`start.S` which first clears all RAM for any remaining data from
-previous applications, then initializes a stack starting at the top of
-`FW_RAM` at `0xd000_0800` and downwards and then calls `main`.
-
-When the initialization is finished, the firmware waits for incoming
-commands from the host, by busy-polling the `UART_RX_{STATUS,DATA}`
-registers. When a complete command is read, the firmware executes the
-command.
+The firmware binary is part of the FPGA bitstream as the initial
+values of the Block RAMs used to construct the `FW_ROM`. The `FW_ROM`
+start address is located at `0x0000_0000` in the CPU memory map, which
+is the CPU reset vector.

 ### Firmware state machine

+This is the state diagram of the firmware. There are only four states.
+A change of state occurs when we receive specific I/O or when a fatal
+error occurs.
+
+```mermaid
+stateDiagram-v2
+     S1: initial
+     S2: loading
+     S3: running
+     SE: failed
+
+     [*] --> S1
+
+     S1 --> S1: Commands
+     S1 --> S2: LOAD_APP
+     S1 --> SE: Error
+
+     S2 --> S2: LOAD_APP_DATA
+     S2 --> S3: Last block received
+     S2 --> SE: Error
+
+     S3 --> [*]
+```
+
 States:

- `initial` - At start.
- `loading` - Expect application data.
- `run` - Computes CDI and starts the application.
- `fail` - Stops waiting for commands, flashes LED forever.
+- `initial` - At start. Allows the commands `NAME_VERSION`, `GET_UDI`,
+  `LOAD_APP`.
+- `loading` - Expect application data. Allows only the command
+  `LOAD_APP_DATA`.
+- `run` - Computes CDI and starts the application. Allows no commands.
+- `fail` - Stops waiting for commands, flashes LED forever. Allows no
+  commands.
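
The transitions described above can be sketched as a pure function. This is an illustration under assumptions, not the firmware's implementation: the enum and function names are hypothetical, errors other than disallowed commands are ignored, and a command arriving in `run` or `fail` is simply mapped to `fail`.

```c
#include <stdbool.h>

/* Hypothetical names for the four states in the list above. */
enum state { STATE_INITIAL, STATE_LOADING, STATE_RUN, STATE_FAIL };

/* Hypothetical names for the commands mentioned in the list above. */
enum cmd { CMD_NAME_VERSION, CMD_GET_UDI, CMD_LOAD_APP, CMD_LOAD_APP_DATA };

/* Return the next state given the current state and a received command.
 * last_block is only meaningful for CMD_LOAD_APP_DATA. */
static enum state next_state(enum state s, enum cmd c, bool last_block)
{
	switch (s) {
	case STATE_INITIAL:
		if (c == CMD_NAME_VERSION || c == CMD_GET_UDI)
			return STATE_INITIAL;
		if (c == CMD_LOAD_APP)
			return STATE_LOADING;
		return STATE_FAIL;
	case STATE_LOADING:
		if (c == CMD_LOAD_APP_DATA)
			return last_block ? STATE_RUN : STATE_LOADING;
		return STATE_FAIL;
	default:
		/* run and fail allow no commands (modelling assumption). */
		return STATE_FAIL;
	}
}
```

Keeping the transition logic free of I/O like this makes it easy to check every state and command pair on a development machine.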

 Commands in state `initial`:

@@ -212,17 +114,98 @@ Commands in state `loading`:
 |------------------------|----------------------------------|
 | `FW_CMD_LOAD_APP_DATA` | unchanged or `run` on last chunk |

-Commands in state `run`: None.
-
-Commands in state `fail`: None.
-
 See [Firmware protocol in the Dev
 Handbook](http://dev.tillitis.se/protocol/#firmware-protocol) for the
 definition of the specific commands and their responses.

+State changes from "initial" to "loading" when receiving `LOAD_APP`,
+which also sets the size of the app and thereby the number of data
+blocks to expect. After that we expect several `LOAD_APP_DATA`
+commands until the last block is received, when state is changed to
+"running".
+
+In "running", the loaded device app is measured, the Compound Device
+Identifier (CDI) is computed, we do some cleanup of firmware data
+structures, flip to application mode, and finally start the app, which
+ends the firmware state machine.
+
+The device app is now running in application mode. There is no other
+means of getting back from application mode to firmware mode than
+resetting/power cycling the device. Note that ROM is still accessible
+in the memory map, so it's still possible to execute firmware code in
+application mode, but with no privileged access.
+
+Firmware loads the application at the start of RAM (`0x4000_0000`). It
+uses the special FW\_RAM for its own stack.
+
+When the firmware starts it clears all FW\_RAM, then sets up a stack
+there before jumping to `main()`.
+
+When reset is released, the CPU starts executing the firmware. It
+begins by clearing all CPU registers, then sets up a stack for
+itself, and then jumps to `main()`.
+
+Beginning at `main()` it sets up the "system calls", then fills the
+entire RAM with pseudorandom data and sets up the RAM address and
+data hardware scrambling with values from the True Random Number
+Generator (TRNG). It then waits for data coming in through the UART.
+
+Typical expected use scenario:
+
+  1. The client sends the `FW_CMD_LOAD_APP` command with the size of
+     the device app and the optional 32 byte hash of the user-supplied
+     secret as arguments and gets a `FW_RSP_LOAD_APP` back. After
+     using this it's not possible to restart the loading of an
+     application.
+
+  2. If the client receives a successful response, it will send
+     multiple `FW_CMD_LOAD_APP_DATA` commands, together containing the
+     full application.
+
+  3. On receiving `FW_CMD_LOAD_APP_DATA` commands the firmware places
+     the data into `0x4000_0000` and upwards. The firmware replies
+     with a `FW_RSP_LOAD_APP_DATA` response to the client for each
+     received block except the last data block.
+
+  4. When the final block of the application image is received with a
+     `FW_CMD_LOAD_APP_DATA`, the firmware measures the application by
+     computing a BLAKE2s digest over the entire application. Then
+     the firmware sends back the `FW_RSP_LOAD_APP_DATA_READY` response
+     containing the digest.
+
+  5. The Compound Device Identifier
+     ([CDI](#compound-device-identifier-computation)) is then
+     computed by doing a new BLAKE2s using the Unique Device Secret
+     (UDS), the application digest, and any User Supplied Secret
+     (USS).
+
+  6. The start address of the device app, currently `0x4000_0000`, is
+     written to `APP_ADDR` and the size of the binary to `APP_SIZE` to
+     let the device application know where it is loaded and how large
+     it is, if it wants to relocate in RAM.
+
+  7. The firmware now clears the special `FW_RAM` where it keeps its
+     stack. After this it performs no more function calls and uses no
+     more automatic variables.
+
+  8. Firmware starts the application by first switching from
+     firmware mode to application mode by writing to the `SWITCH_APP`
+     register. In this mode the MMIO region is restricted, e.g. some
+     registers are removed (`UDS`), and some are switched from
+     read/write to read-only (see [the memory
+     map](https://dev.tillitis.se/memory/)).
+
+     Then the firmware jumps to what is in `APP_ADDR` which starts the
+     application.
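
Steps 1 to 4 of the scenario above can be sketched as a small loader that tracks how many bytes are still expected. This is a hedged illustration: `struct loader`, `loader_init()` and `loader_add_block()` are hypothetical names, the block size is whatever the caller passes in, and a plain buffer stands in for RAM at `0x4000_0000`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical loader state, set up when LOAD_APP is received. */
struct loader {
	uint8_t *dst; /* Next write position in the destination buffer */
	size_t left;  /* Number of app bytes still expected */
};

static void loader_init(struct loader *l, uint8_t *ram, size_t app_size)
{
	l->dst = ram;
	l->left = app_size;
}

/* Copy one LOAD_APP_DATA block. Bytes beyond the announced app size
 * are ignored. Returns true when this was the last block. */
static bool loader_add_block(struct loader *l, const uint8_t *block,
			     size_t block_len)
{
	size_t n = block_len < l->left ? block_len : l->left;

	memcpy(l->dst, block, n);
	l->dst += n;
	l->left -= n;

	return l->left == 0;
}
```

When `loader_add_block()` returns true, the caller would go on to measure the app and reply with the digest, as in step 4.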
+
+If during this whole time any commands are received which are not
+allowed in the current state, we enter the "failed" state and execute
+an illegal instruction. An illegal instruction traps the CPU and the
+hardware blinks the status LED red until a power cycle. No further
+instructions are executed.
+
 ### User-supplied Secret (USS)

-USS is a 32 bytes long secret provided by the user. Typically a host
+USS is a 32-byte secret provided by the user. Typically a client
 program gets a secret from the user and then does a key derivation
 function of some sort, for instance a BLAKE2s, to get 32 bytes which
 it sends to the firmware to be part of the CDI computation.
@@ -256,6 +239,10 @@ Then we continue with the CDI computation by updating with an optional
 USS and then finalizing the hash, storing the resulting digest in
 `CDI`.

+We also clear the entire `FW_RAM` where the stack lived, including the
+BLAKE2s context with the UDS, very soon after that, just before
+jumping to the application.
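
A minimal sketch of one common way to clear a memory region that held secrets. It is not taken from the firmware source, and the name `secure_wipe()` is hypothetical. Writing through a `volatile` pointer keeps the compiler from removing the stores as dead code, which can happen with a plain `memset()` on memory that is never read again.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: overwrite len bytes at buf with zeros.
 * The volatile qualifier forces every store to be emitted. */
static void secure_wipe(void *buf, size_t len)
{
	volatile uint8_t *p = buf;

	while (len--) {
		*p++ = 0;
	}
}
```

A caller would pass the start of the region and its size, for example a hash context that has held the UDS.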
+
 ### Firmware services

 The firmware exposes a BLAKE2s function through a function pointer
@@ -283,11 +270,38 @@ typedef struct {
 ```

 The `libcommon` library in
-[tillitis-key1-apps](https://github.com/tillitis/tillitis-key1-apps/)
-has a wrapper for using this function called `blake2s()`.
+[tkey-libs](https://github.com/tillitis/tkey-libs/)
+has a wrapper for using this function called `blake2s()` which needs
+to be maintained if you make any changes to the firmware call.

-## Applications
+## Developing firmware

-See [our apps repo](https://github.com/tillitis/tillitis-key1-apps)
-for examples of client and TKey apps as well as libraries for writing
-both.
+Standing in `hw/application_fpga/` you can run `make firmware.elf` to
+build just the firmware. You don't need all the FPGA development
+tools. See [the Developer Handbook](https://dev.tillitis.se/tools/)
+for the tools you need. The easiest is probably to use our OCI image,
+`ghcr.io/tillitis/tkey-builder`.
+
+[Our version of qemu](https://dev.tillitis.se/tools/#qemu-emulator) is
+also useful for debugging the firmware. You can attach GDB, use
+breakpoints, et cetera.
+
+If you want to use plain debug prints you can remove `-DNOCONSOLE`
+from the `CFLAGS` in the Makefile and use the helper functions in
+`lib.c` like `htif_puts()`, `htif_putinthex()`, `htif_hexdump()` and
+friends for printf-like debugging. Note that these functions are only
+usable in qemu.
+
+### Test firmware
+
+The test firmware is in `testfw`. It's currently a bit of a hack and
+just runs through expected behavior of the hardware cores, giving
+special focus to access control in firmware mode and application mode.
+
+It outputs results on the UART. This means that you have to attach a
+terminal program to the serial port device, even if it's running in
+qemu. It waits for you to type a character before starting the tests.
+
+It needs to be compiled with `-Os` instead of `-O2` in `CFLAGS` in the
+ordinary `application_fpga/Makefile` to be able to fit in the 6 kByte
+ROM.