# Firmware implementation notes

## Introduction

This text is specific to the firmware, the piece of software in TKey
ROM. For a more general description of how to implement device apps,
see [the TKey Developer Handbook](https://dev.tillitis.se/).

## Definitions

- Firmware: Software in ROM responsible for loading, measuring, and
  starting applications. The firmware is included as part of the FPGA
  bitstream and is not replaceable on a typical consumer TKey.
- Client: Software running on a computer or a mobile phone the TKey is
  inserted into.
- Device application or app: Software supplied by the client that runs
  on the TKey.

## CPU modes and firmware

The TKey has two modes of software operation: firmware mode and
application mode. The TKey always starts in firmware mode, running the
firmware. When the application starts, the hardware automatically
switches to a more constrained environment: application mode.

The TKey hardware cores are memory mapped, but memory access differs
depending on the mode. Firmware has complete access, except that the
Unique Device Secret (UDS) words are readable only once, even in
firmware mode. The memory map is constrained when running in
application mode: for example, FW\_RAM and UDS aren't readable, and
several other hardware addresses are either not readable or not
writable for the application.

When a device app makes a system call, the context switches back to
firmware mode. However, the UDS is still not available, protected by
two measures: 1) the UDS words can only be read out once and have
already been read by the firmware when measuring the app, and 2) the
UDS is protected by hardware after execution leaves ROM for the first
time.

See the table in [the Developer
Handbook](https://dev.tillitis.se/memory/) for an overview of the
memory access control.

## Communication

The firmware communicates with the client using the
`UART_{RX,TX}_{STATUS,DATA}` registers. On top of that it uses three
protocols: the USB Mode Protocol, the TKey Framing Protocol, and the
firmware's own protocol.

To communicate between the CPU and the CH552 USB controller, the
firmware uses an internal protocol, used only within the TKey, which
we call the USB Mode Protocol. It is used in both directions.

| *Name*   | *Size*         | *Comment*                          |
|----------|----------------|------------------------------------|
| Endpoint | 1B             | Origin or destination USB endpoint |
| Length   | 1B             | Number of bytes following          |
| Payload  | `Length` bytes | Actual data from or to firmware    |

The different endpoints:

| *Name* | *Value* | *Comment*                                                           |
|--------|---------|---------------------------------------------------------------------|
| CTRL   | 0x20    | A USB HID special debug pipe. Useful for debug prints.              |
| CDC    | 0x40    | USB CDC-ACM, a serial port on the client.                           |
| HID    | 0x80    | A USB HID security token device, useful for FIDO-type applications. |
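
As an illustration of the frame layout, the C sketch below wraps a
payload in a USB Mode Protocol frame. It is not the firmware's code:
the function `usb_mode_frame()` and the enum names are hypothetical,
and only the byte layout and the endpoint values come from the tables
above. Because `Length` is a single byte, one frame can carry at most
255 payload bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Endpoint values from the table above (enum names are hypothetical). */
enum usb_mode_endpoint {
	USB_MODE_CTRL = 0x20,
	USB_MODE_CDC = 0x40,
	USB_MODE_HID = 0x80,
};

/*
 * Hypothetical helper: write endpoint, length and payload into out.
 * Returns the total frame size, or 0 if the payload does not fit the
 * 1-byte length field or the output buffer is too small.
 */
size_t usb_mode_frame(uint8_t *out, size_t outlen, uint8_t endpoint,
		      const uint8_t *payload, size_t len)
{
	if (len > UINT8_MAX || outlen < len + 2) {
		return 0;
	}

	out[0] = endpoint;
	out[1] = (uint8_t)len;
	memcpy(&out[2], payload, len);

	return len + 2;
}
```

For example, a 3-byte payload to the CDC endpoint becomes the 5-byte
frame `40 03 xx xx xx`.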

On top of the USB Mode Protocol is [the TKey Framing
Protocol](https://dev.tillitis.se/protocol/) which is described in the
Developer Handbook.

The firmware uses a protocol on top of this framing layer which is
used to bootstrap an application. All commands are initiated by the
client. All commands receive a reply. See [Firmware
protocol](http://dev.tillitis.se/protocol/#firmware-protocol) in the
Dev Handbook for specific details.

## Memory constraints

| *Name*  | *Size*    | *FW mode* | *App mode* |
|---------|-----------|-----------|------------|
| ROM     | 8 kByte   | r-x       | r          |
| FW\_RAM | 4 kByte*  | rw-       | -          |
| RAM     | 128 kByte | rwx       | rwx        |

\*FW\_RAM is divided into the following areas:

- Firmware stack: 3824 bytes.
- resetinfo: 256 bytes.
- The rest is available for `.data` and `.bss`.
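
Assuming 4 kByte means 4096 bytes, the areas above leave
4096 - 3824 - 256 = 16 bytes for `.data` and `.bss`. A compile-time
check like the following sketch can catch a change to one area that
makes them overflow FW\_RAM. The macro names are hypothetical, and
`_Static_assert` requires C11.

```c
/* Hypothetical names; sizes taken from the FW_RAM list above. */
#define FW_RAM_SIZE    4096 /* 4 kByte, assumed to be 4096 bytes */
#define FW_STACK_SIZE  3824
#define RESETINFO_SIZE 256

/* Bytes left over for .data and .bss. */
#define FW_DATA_BSS_SIZE (FW_RAM_SIZE - FW_STACK_SIZE - RESETINFO_SIZE)

/* Fail the build if stack and resetinfo no longer fit in FW_RAM. */
_Static_assert(FW_DATA_BSS_SIZE >= 0, "FW_RAM areas overflow FW_RAM");
```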

## Firmware behaviour

The purpose of the firmware is to load, measure, and start an
application received from the client over the USB/UART.

The firmware binary is part of the FPGA bitstream as the initial
values of the Block RAMs used to construct the `FW_ROM`. The `FW_ROM`
start address is located at `0x0000_0000` in the CPU memory map, which
is also the CPU reset vector.

### Reset intentions

We have a number of reset options, which we call `startfrom`:

1. Start from flash slot 1 (default): `FLASH1`.
2. Start from flash slot 2: `FLASH2`.
3. Load and start an app from flash slot 1 with a specific app hash:
   `FLASH1_VER`.
4. Load and start an app from flash slot 2 with a specific app hash:
   `FLASH2_VER`.
5. Load and start a new app from the client: `CLIENT`.
6. Load and start an app from the client with a specific app hash:
   `CLIENT_VER`.

### Firmware state machine

This is the state diagram of the firmware. State changes occur when we
receive specific I/O or when a fatal error occurs.

```mermaid
stateDiagram-v2
     S0: initial
     S1: waitcommand
     S2: loading
     S3: flash_loading
     S4: auth_app
     S5: starting
     S6: compute_cdi
     SE: failed

     [*] --> S0

     S0 --> S1
     S0 --> S4: Default

     S1 --> S1: Commands
     S1 --> S2: LOAD_APP
     S1 --> SE: Error

     S2 --> S2: LOAD_APP_DATA
     S2 --> S6: Last block received
     S2 --> SE: Error

     S6 --> S3

     S3 --> S5

     S4 --> S5
     S4 --> SE: Error

     SE --> [*]
     S5 --> [*]
```

States:

- `initial`: We start by checking resetinfo data in `FW_RAM` for
  `startfrom`.
- `waitcommand`: Waiting for initial commands from client. Allows the
  commands `NAME_VERSION`, `GET_UDI`, `LOAD_APP`.
- `loading`: Expecting application data from client. Allows only the
  command `LOAD_APP_DATA`.
- `flash_loading`: Loads and authenticates an app from flash. Computes
  the CDI and creates or checks the authentication of the flash app.
  Allows no commands.
- `starting`: Starts the application. Does not return to firmware.
  Allows no commands.
- `failed`: Halts the CPU. Allows no commands.

Allowed `startfrom` values in `resetinfo`, checked in state `initial`:

| *startfrom*  | *next state*    |
|--------------|-----------------|
| `FLASH1`     | `flash_loading` |
| `FLASH2`     | `flash_loading` |
| `FLASH1_VER` | `flash_loading` |
| `FLASH2_VER` | `flash_loading` |
| `CLIENT`     | `waitcommand`   |
| `CLIENT_VER` | `waitcommand`   |

I/O in state `flash_loading`:

| *I/O*              | *next state* |
|--------------------|--------------|
| Last app data read | `starting`   |

Commands in state `waitcommand`:

| *command*             | *next state* |
|-----------------------|--------------|
| `FW_CMD_NAME_VERSION` | unchanged    |
| `FW_CMD_GET_UDI`      | unchanged    |
| `FW_CMD_LOAD_APP`     | `loading`    |

Commands in state `loading`:

| *command*              | *next state*                          |
|------------------------|---------------------------------------|
| `FW_CMD_LOAD_APP_DATA` | unchanged or `starting` on last chunk |

No other states allow commands.

See [Firmware protocol in the Dev
Handbook](http://dev.tillitis.se/protocol/#firmware-protocol) for the
definition of the specific commands and their responses.
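
The transitions in the tables above can be summarised as two small
functions. This is a sketch under assumptions: the enums and the
function names `initial_next()` and `cmd_next()` are hypothetical, the
firmware may encode commands and `startfrom` values differently, and
treating an unknown `startfrom` value as `failed` is a choice made
here rather than something stated above.

```c
#include <stdbool.h>

/* Hypothetical encodings of the names used in the tables above. */
enum startfrom { FLASH1, FLASH2, FLASH1_VER, FLASH2_VER, CLIENT, CLIENT_VER };

enum fw_state { INITIAL, WAITCOMMAND, LOADING, FLASH_LOADING, STARTING, FAILED };

enum fw_cmd {
	FW_CMD_NAME_VERSION,
	FW_CMD_GET_UDI,
	FW_CMD_LOAD_APP,
	FW_CMD_LOAD_APP_DATA,
};

/* State `initial`: pick the next state from resetinfo's startfrom. */
enum fw_state initial_next(enum startfrom from)
{
	switch (from) {
	case FLASH1:
	case FLASH2:
	case FLASH1_VER:
	case FLASH2_VER:
		return FLASH_LOADING;
	case CLIENT:
	case CLIENT_VER:
		return WAITCOMMAND;
	default:
		return FAILED; /* unknown value: an assumption made here */
	}
}

/*
 * States `waitcommand` and `loading`: the state after receiving cmd.
 * last_chunk is true when a LOAD_APP_DATA carried the final block.
 * A command not allowed in the current state leads to `failed`.
 */
enum fw_state cmd_next(enum fw_state state, enum fw_cmd cmd, bool last_chunk)
{
	switch (state) {
	case WAITCOMMAND:
		if (cmd == FW_CMD_NAME_VERSION || cmd == FW_CMD_GET_UDI) {
			return WAITCOMMAND;
		}
		if (cmd == FW_CMD_LOAD_APP) {
			return LOADING;
		}
		return FAILED;
	case LOADING:
		if (cmd == FW_CMD_LOAD_APP_DATA) {
			/* The CDI is computed on the way to `starting`. */
			return last_chunk ? STARTING : LOADING;
		}
		return FAILED;
	default:
		return FAILED; /* no other state allows commands */
	}
}
```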

Plain text explanation of the states:

- `initial`: Execution starts here. The firmware checks `startfrom`
  in `FW_RAM` to decide what to do next.

  For all `FLASH*` values of `startfrom` the next state is
  `flash_loading`. Otherwise it goes to `waitcommand`, indicating that
  it should wait for further commands from the client.

- `flash_loading`: Loads and measures an app from flash and computes
  the Compound Device Identifier (CDI). The app is then authenticated
  against a stored digest to check that no one has changed the app by
  manipulating the flash. The computation is:

  `digest = blake2s(cdi, nonce from flash)`

  and the result is compared against the stored digest in the app's
  flash slot.

- `waitcommand`: Waits for commands from the client. The state changes
  to `loading` when receiving `LOAD_APP`, which also sets the size of
  the app and thereby the number of data blocks to expect. After that
  we expect several `LOAD_APP_DATA` commands until the last block is
  received, after which the CDI is computed (`compute_cdi`) and the
  state changes to `starting`.

- `compute_cdi`: The Compound Device Identifier (CDI) is computed and
  we go to `starting`.

- `starting`: Clean up firmware data structures, enable the system
  calls, and start the app, which ends the firmware state machine.
  Hardware guarantees that we leave firmware mode automatically when
  the program counter leaves ROM.
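
The digest comparison in `flash_loading` could be done in constant
time, so that the time taken does not reveal how many leading bytes
match. The sketch below shows that common technique; `digest_equal()`
and `DIGEST_SIZE` are hypothetical names, and whether the firmware
compares digests this way is not stated here. 32 bytes is the default
BLAKE2s output size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DIGEST_SIZE 32 /* BLAKE2s-256 output */

/* Compare two digests without an early exit on the first mismatch. */
bool digest_equal(const uint8_t a[DIGEST_SIZE], const uint8_t b[DIGEST_SIZE])
{
	uint8_t diff = 0;

	/* Accumulate all differing bits; always touch every byte. */
	for (size_t i = 0; i < DIGEST_SIZE; i++) {
		diff |= (uint8_t)(a[i] ^ b[i]);
	}

	return diff == 0;
}
```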

After `starting`, the device app runs in application mode. We can,
however, return to firmware mode (except for access to the UDS) by
making system calls. Note that ROM is still readable, but is now
hardware-protected from execution, except through the system call
mechanism.

### Golden path

Firmware loads the application at the start of RAM (`0x4000_0000`)
from either flash or the UART. It uses part of the special FW\_RAM for
its own stack.

When reset is released, the CPU starts executing the firmware. It
begins in `start.S` by clearing all CPU registers, clearing all of
FW\_RAM, setting up a stack for itself there, and then jumping to
`main()`. The assembly part of the firmware also includes an interrupt
handler for the system calls, but the handler is not yet enabled.

Beginning at `main()`, it fills the entire RAM with pseudo-random data
and sets up the RAM address and data hardware scrambling with values
from the True Random Number Generator (TRNG).

1. Check the special resetinfo area in FW\_RAM to see if there is any
   data about why a reset was made. All zeroes(?) means default
   behaviour.

2. If it was reset with the intention of starting a device app from
   the client, see [App loaded from client](#app-loaded-from-client)
   below.

3. The default is to start the first device app from flash. If
   resetinfo says otherwise, it starts the other one.

4. Load the flash app into RAM without USS.

5. Compute the digest of the loaded app.

6. Compare it against the stored app digest in the partition table to
   detect whether the app has been corrupted on flash. If it has, halt
   the CPU.

7. Proceed to [Start the device app](#start-the-device-app) below.

If the app is the first step in a chain, it's the job of the app
itself to reset the TKey when it has done its job. For instance, a
verified boot loader app:

- includes a security policy, for instance a public key and code to
  check a signature.

- the app reads the message and the signature over the message (the
  digest of the next app in the chain) from the filesystem or from
  the client.

- if the signature over the message is verified as made by the
  corresponding private key, this app would do a `reset()`, passing
  the digest to the firmware for verification and instructing it to
  start *just* that app.

- firmware would see the instructions about the reset in FW\_RAM:

  1. Where to expect the next app from: client, a slot in the
     filesystem?
  2. The expected digest of the next app.
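
A minimal sketch of how these two pieces of information could be
passed through resetinfo and checked by the firmware after the reset.
The struct layout, field names, and functions are hypothetical; the
real resetinfo format is defined in the firmware source. The reset
itself would presumably be the `RESET` system call.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DIGEST_SIZE 32

/* Hypothetical encoding of the startfrom values. */
enum startfrom { FLASH1, FLASH2, FLASH1_VER, FLASH2_VER, CLIENT, CLIENT_VER };

/* Hypothetical layout of the resetinfo area in FW_RAM. */
struct resetinfo {
	uint32_t startfrom;              /* one of enum startfrom */
	uint8_t app_digest[DIGEST_SIZE]; /* expected digest of next app */
};

/* App side: record where the next app comes from and its digest. */
void resetinfo_request(struct resetinfo *ri, enum startfrom from,
		       const uint8_t digest[DIGEST_SIZE])
{
	ri->startfrom = (uint32_t)from;
	memcpy(ri->app_digest, digest, DIGEST_SIZE);
	/* The app would now ask for a reset. */
}

/* Firmware side: may the freshly measured app be started? */
bool resetinfo_allows(const struct resetinfo *ri,
		      const uint8_t measured[DIGEST_SIZE])
{
	bool verify = ri->startfrom == FLASH1_VER ||
		      ri->startfrom == FLASH2_VER ||
		      ri->startfrom == CLIENT_VER;

	if (!verify) {
		return true; /* no specific app requested */
	}

	return memcmp(ri->app_digest, measured, DIGEST_SIZE) == 0;
}
```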

#### App loaded from client

Firmware waits for data coming in through the UART.

1. The client sends the `FW_CMD_LOAD_APP` command with the size of
   the device app and the optional 32-byte hash of the user-supplied
   secret as arguments and gets a `FW_RSP_LOAD_APP` back. After this
   command, it's not possible to restart the loading of an
   application.

2. If the client receives a successful response, it sends multiple
   `FW_CMD_LOAD_APP_DATA` commands, together containing the full
   application.

3. On receiving `FW_CMD_LOAD_APP_DATA` commands, the firmware places
   the data into `0x4000_0000` and upwards. The firmware replies with
   a `FW_RSP_LOAD_APP_DATA` response to the client for each received
   block except the last data block.

4. When the final block of the application image is received with a
   `FW_CMD_LOAD_APP_DATA`, the firmware measures the application by
   computing a BLAKE2s digest over the entire application. Then the
   firmware sends back the `FW_RSP_LOAD_APP_DATA_READY` response
   containing the digest.
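
Steps 3 and 4 amount to copying fixed-size chunks into RAM until the
size given in `FW_CMD_LOAD_APP` has been received. In this sketch, a
`CHUNK_SIZE` of 127 bytes is an assumption (a 128-byte frame minus the
command byte), the final chunk is assumed to be padded to full size,
and the loader struct and functions are hypothetical. On a TKey, `ram`
would be `0x4000_0000`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed data bytes per LOAD_APP_DATA command; check the Dev Handbook. */
#define CHUNK_SIZE 127

/* Hypothetical loader state. */
struct app_loader {
	uint8_t *dst; /* where the next chunk goes */
	size_t left;  /* bytes still expected, from LOAD_APP's size */
};

void loader_init(struct app_loader *l, uint8_t *ram, size_t app_size)
{
	l->dst = ram;
	l->left = app_size;
}

/*
 * Copy one LOAD_APP_DATA chunk into place. Bytes past the app size
 * (padding in the final chunk) are ignored. Returns true when this
 * was the last chunk, i.e. when it is time to measure the app.
 */
bool loader_chunk(struct app_loader *l, const uint8_t chunk[CHUNK_SIZE])
{
	size_t n = l->left < CHUNK_SIZE ? l->left : CHUNK_SIZE;

	memcpy(l->dst, chunk, n);
	l->dst += n;
	l->left -= n;

	return l->left == 0;
}
```

With these assumptions, a 300-byte app arrives in three chunks:
127, 127, and a final chunk carrying the remaining 46 bytes.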

#### Start the device app

1. If there is an app digest in resetinfo left by the previous app,
   compare it with the digest of the loaded app. Halt the CPU if they
   differ.

2. The Compound Device Identifier
   ([CDI](#compound-device-identifier-computation)) is then computed
   by doing a new BLAKE2s over the Unique Device Secret (UDS), the
   application digest, and any User Supplied Secret (USS) digest
   already received.

3. The start address of the device app, currently `0x4000_0000`, is
   written to `APP_ADDR` and the size of the binary to `APP_SIZE` to
   let the device application know where it is loaded and how large it
   is, if it wants to relocate in RAM.

4. The firmware now clears the part of the special `FW_RAM` where it
   keeps its stack.

5. The interrupt handler for system calls is enabled.

6. Firmware starts the application by jumping to the contents of
   `APP_ADDR`. Hardware automatically switches from firmware mode to
   application mode. In this mode some memory access is restricted,
   e.g. some addresses are inaccessible (`UDS`), and some are switched
   from read/write to read-only (see [the memory
   map](https://dev.tillitis.se/memory/)).

If, at any point during this process, a command is received that is
not allowed in the current state, or any error occurs, we enter the
`failed` state and execute an illegal instruction. An illegal
instruction traps the CPU, and the hardware blinks the status LED red
until a power cycle. No further instructions are executed.

### User-supplied Secret (USS)

USS is a 32-byte secret provided by the user. Typically a client
program gets a secret from the user and then applies a key derivation
function of some sort, for instance BLAKE2s, to get 32 bytes, which it
sends to the firmware to be part of the CDI computation.

### Compound Device Identifier computation

The CDI is computed by:

```
CDI = blake2s(UDS, blake2s(app), USS)
```

In an ideal world, software would never be able to read UDS at all and
we would have a BLAKE2s function in hardware that would be the only
thing able to read the UDS. Unfortunately, we couldn't fit a BLAKE2s
implementation in the FPGA at this time.

The firmware instead does the CDI computation using the special
firmware-only `FW_RAM` which is invisible after switching to app mode.
We keep the entire firmware stack in `FW_RAM` and clear the stack just
before switching to app mode just in case.

We sleep for a random number of cycles before reading out the UDS,
call `blake2s_update()` with it and then immediately call
`blake2s_update()` again with the program digest, destroying the UDS
stored in the internal context buffer. UDS should now not be in
`FW_RAM` anymore. We can read UDS only once per power cycle so UDS
should now not be available even to firmware.

Then we continue with the CDI computation by updating with an optional
USS digest and finalizing the hash, storing the resulting digest in
`CDI`.
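
Clearing firmware secrets, such as the stack, before switching to app
mode relies on the stores actually happening. In C, a common way to
keep the compiler from optimising away a final `memset()` on memory
that is never read again is to write through a `volatile` pointer, as
in this sketch. The function name `secure_wipe()` is hypothetical, and
the firmware may clear memory differently, for example in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Overwrite a buffer with zeroes through a volatile pointer so the
 * compiler cannot treat the stores as dead and remove them.
 */
void secure_wipe(void *buf, size_t len)
{
	volatile uint8_t *p = (volatile uint8_t *)buf;

	while (len--) {
		*p++ = 0;
	}
}
```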

### Firmware system calls

The firmware provides a system call mechanism through the use of the
PicoRV32 interrupt handler. System calls are triggered by writing to
the trigger address `0xe1000000`, typically through a function with a
signature like this:

```c
int syscall(uint32_t number, uint32_t arg1);
```

The arguments are the system call number and up to 6 generic
arguments passed to the system call handler. The caller should place
the system call number in the a0 register and the arguments in
registers a1 to a7 according to the RISC-V calling convention. The
caller is responsible for saving and restoring registers.

The syscall handler returns execution to the next instruction after
the store instruction to the trigger address. The return value from
the syscall is then available in x10 (a0).

To add or change syscalls, see the `syscall_handler()` in
`syscall_handler.c`.

Currently supported syscalls:

| *Name*      | *Number* | *Argument* | *Description*                    |
|-------------|----------|------------|----------------------------------|
| RESET       | 1        | Unused     | Reset the TKey                   |
| SET\_LED    | 10       | Colour     | Set the colour of the status LED |
| GET\_VIDPID | 12       | Unused     | Get Vendor and Product ID        |
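
On the firmware side, the handler essentially dispatches on the number
passed in `a0`. The sketch below is hypothetical: `syscall_dispatch()`,
the enum names, and the `fake_*` variables standing in for
memory-mapped hardware are made up for illustration, and returning -1
for an unknown number is an assumption. The real handler is
`syscall_handler()` in `syscall_handler.c`.

```c
#include <stdint.h>

/* System call numbers from the table above (names are hypothetical). */
enum {
	SYSCALL_RESET = 1,
	SYSCALL_SET_LED = 10,
	SYSCALL_GET_VIDPID = 12,
};

/* Stand-ins for hardware registers; on a TKey these are memory mapped. */
static uint32_t fake_led;
static uint32_t fake_vidpid;
static int fake_reset_requested;

/*
 * Dispatch on the system call number (from a0) with one argument
 * (from a1). The return value ends up in a0 for the caller.
 */
int syscall_dispatch(uint32_t number, uint32_t arg1)
{
	switch (number) {
	case SYSCALL_RESET:
		fake_reset_requested = 1; /* real hardware would not return */
		return 0;
	case SYSCALL_SET_LED:
		fake_led = arg1;
		return 0;
	case SYSCALL_GET_VIDPID:
		return (int)fake_vidpid;
	default:
		return -1; /* unknown system call */
	}
}
```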

## Developing firmware

Standing in `hw/application_fpga/` you can run `make firmware.elf` to
build just the firmware. You don't need all the FPGA development
tools. See [the Developer Handbook](https://dev.tillitis.se/tools/)
for the tools you need. The easiest is probably to use our OCI image,
`ghcr.io/tillitis/tkey-builder`.

[Our version of qemu](https://dev.tillitis.se/tools/#qemu-emulator) is
also useful for debugging the firmware. You can attach GDB, use
breakpoints, et cetera.

There is a special make target for QEMU: `qemu_firmware.elf`, which
sets `-DQEMU_DEBUG`, so you can use debug prints through the
`debug_*()` functions. Note that these functions are only usable in
QEMU and that you might need to run `make clean` before building if
you have built before.

If you want debug prints to show up on the special TKey HID debug
endpoint instead, define `-DTKEY_DEBUG`.

Note that if you use `TKEY_DEBUG` you *must* have something listening
on the corresponding HID device. It's usually the last HID device
created. On Linux, for instance, this means the last reported hidraw
in `dmesg` is the one you should do `cat /dev/hidrawX` on.

### tkey-libs

Most of the utility functions that the firmware uses live in
`tkey-libs`. The canonical place where you can find tkey-libs is at:

  https://github.com/tillitis/tkey-libs

but we have vendored it for firmware use in `../tkey-libs`. See the
top-level README for how to update it.

### Test firmware

The test firmware is in `testfw`. It's currently a bit of a hack and
just runs through expected behavior of the hardware cores, giving
special focus to access control in firmware mode and application mode.

It outputs results on the UART. This means that you have to attach a
terminal program to the serial port device, even if it's running in
QEMU. It waits for you to type a character before starting the tests.

It needs to be compiled with `-Os` instead of `-O2` in `CFLAGS` in the
ordinary `application_fpga/Makefile` to be able to fit in the 6 kByte
ROM.