From 86aedcce69b2fbe2d1742158caaaf8d89fc1a779 Mon Sep 17 00:00:00 2001 From: Michael Cardell Widerkrantz Date: Thu, 17 Oct 2024 16:26:51 +0200 Subject: [PATCH] Revise top-level README for the hardware design Merged information from fpga.md, and hence fpga.md is removed. --- doc/system_description/fpga.md | 186 ----------------- hw/application_fpga/README.md | 352 ++++++++++++--------------------- 2 files changed, 127 insertions(+), 411 deletions(-) delete mode 100644 doc/system_description/fpga.md diff --git a/doc/system_description/fpga.md b/doc/system_description/fpga.md deleted file mode 100644 index 2e18e0c..0000000 --- a/doc/system_description/fpga.md +++ /dev/null @@ -1,186 +0,0 @@ -# The TKey FPGA - -## Introduction -The TKey application FPGA (application_fpga design) contain -the application platform CPU system onto which secure applications -are loaded, measured and executed. The platform is a compact -System on Chip (SoC) with the following cores. - -![The Application FPGA block diagram](../images/application_fpga_block_diagram.png) - -The application FPGA is currently implemented using a Lattice -[iCE40 UltraPlus UP5K device](https://www.latticesemi.com/en/Products/FPGAandCPLD/iCE40UltraPlus). -Only open tools are used in the toolchain. - - -### Top level - -The top level application_fpga design contain instances of all cores as -well as the memory system. The memory system allows the CPU to access -cores in different ways given the current exection mode. There are two -execution modes - firmware and application. Basically, in application mode -the access is more restrictive. - -The API for all cores is described in the Memory mapped hardware -functions in the [System -Description](system_description.md#memory-mapped-hardware-functions). - - -### Cores - -#### CPU -The CPU core is an instance of the [PicoRV32 core](https://github.com/YosysHQ/picorv32). -The instance enables the following features - -- Compressed ISA (C extension) -- Fast multiplication. Two cycles for 32x32 multiplication -- Barrel shifter - -No other modification to the core has been done. -No interrupts are used. - - -#### Clock and reset - -The device does not rely on external clock or reset. Instead the -internal HFOSC oscillator combined with an internal PLL is used to generate -the main clock. Currently the clock frequency driving the SoC is 18 MHz. - -The device also generates its own reset. - - -#### FW ROM - -The ROM memory containing the firmware. After reset the CPU will -read from the ROM to load, measure and start applications. - -The RAM memort is only accessible by the firmware. - - -#### FW RAM - -A 512w32 small RAM only accessible by the firmware. The firmware -use the RAM during loading and measurement of the application. - - -#### UDS - -Unique Device Secret memory. - -A 256 bit memory implemented using eight 32-bit registers. The -registers can only be accessed once between power cycling. This means -that the UDS **must** be read as u32. If read as u8, only the first -byte from a given address will be correct, subsequent bytes will be -zero. - -The UDS can only be read in FW mode. Reading from the UDS in APP mode -will return all zeros. - - -#### Application RAM - -The 128 KByte main RAM. The RAM is only used by applications. -The memory is cleared by firmware before an application is loaded. - -The application RAM is available to use by firmware and applications. - - -#### Timer - -A general purpose 32 bit timer. The timer will count down from -the initial value to one. In order to handle long time sequences -(minutes, hours, days) there is also a 32 bit prescaler. - -The timer is available to use by firmware and applications. - - -#### UART - -A standard UART interface for receiving bytes from and send bytes -to the host via the interface MCU on the TKey. - -The UART default speed is 62500 bps, but can be adjusted by the -application. (Note that the host must set the same bitrate too.) - -The UART contain a 512 but Rx-FIFO with status (data available). - -The timer is available to use by firmware and applications. - - -#### TRNG - -The TRNG is a ring oscillator based internal entropy source, or -True Random Number Generator (TRNG). By default the TRNG use 32 -free running digital oscillators. By default, the oscillators are -sampled after 4096 cycles. The states are XOR combined to create -a single bit. After another 4096 cycles a second bit is created, -and the two bits are XOR combined to a single entropy bit. The -entropy bit is added to a 32 bit entropy word. - -After 32 bits has been collected, the data ready flag is set, -indicating that an entropy word is ready for consumption. Note -that entropy bit generation and collections is running continuously, -bits will be added, and the discarded after 32 more bits have -been added. - -If a data word has been read from the TRNG, by default at least -32 bits will collected before new data will be available. - -The TRNG is available to use by firmware and applications. - -Note: The TRNG generates entropy with a fairly good quality. -However for security related use cases, for example keys, the TRNG -should not be used directly. Instead use it to create a seed -for a Digital Random Bit Generator (DRBG), also known as a -Cryptographically Safe Pseudo Random Number Generator (CSPRNG). - -Examples of such generators are Hash_DRGG, CTR_DRBG, HKDF. - - -#### Touch sensor - -The core provides a stable interface to the touch sensor on the -TKey device. Using the core, the firmware and applications can -get information about touch events and manage detection of -events. - -It is recommended that SW start by acknowledge any stray events prior -to signal the user that a touch event is expected and then start -waiting for an event. - -The touch sensor is available to use by firmware and applications. - - -#### TKey - -The TKey core contains several functions, and acts as -main HW interface between firmware and applications. The core -includes: - -- Read access to the 64 bit FPGA design name, expressed as ASCII chars. -- Read access to the 32 bit FPGA design version, expressed as an integer - -- Control and status access for the RGB LED on TKey board - - Setting bit 0 high turns on the Blue LED. - - Setting bit 1 high turns on the Green LED. - - Setting bit 2 high turns on the Red LED. - -- Control and status access for the 4 GPIOs on the TKey board - - GPIO 1 and 2 are inputs and provide read access to the - current sampled value digital values on the pins. - - - GPIO 3 and 4 are outputs. The digital value written to - the bits will be presented on the pins. - -- Application read access to information about the loaded - application. The information is written by the firmware. - - Start address - - Size of address - -- Application read access to the CDI generated and written - by the firmware when the application is loaded. - -- Application-Firmware execution mode control. Can be read - by the application and written by firmware. When written - to by the firmware, the hardware will switch to application - mode and start executing the application. diff --git a/hw/application_fpga/README.md b/hw/application_fpga/README.md index 4d3a243..5b6caaa 100644 --- a/hw/application_fpga/README.md +++ b/hw/application_fpga/README.md @@ -1,93 +1,116 @@ -# System Description +# TKey hardware design -**NOTE:** Documentation migrated to dev.tillitis.se, this is kept for -history. This is likely to be outdated. +![The Application FPGA block diagram](../../doc/images/application_fpga_block_diagram.png) -## Purpose and Revision +The application FPGA is implemented using a Lattice [iCE40 UltraPlus +UP5K +device](https://www.latticesemi.com/en/Products/FPGAandCPLD/iCE40UltraPlus). +Only open tools are used in the toolchain. -The purpose of this document is to provide a description of the -Tillitis TKey. What it is, what is supposed to be used for, by whom, -where and possible use cases. The document also provides a functional -level description of features and components of the TKey. +The design top level is in `rtl/application_fpga.v`. It contains +instances of all cores as well as the memory system. -Finally, the document acts as a requirement description. For the -requirements, the document follows -[RFC2119](https://datatracker.ietf.org/doc/html/rfc2119) to indicate -requirement levels. +The memory system allows the CPU to access cores in different ways +given the current execution mode. There are two execution modes - +firmware and application. Basically, in application mode the access is +more restrictive. -The described functionality and requirements applies to version 1 of -the TKey (TK1) +The rest of the components are under `cores`. They typically have +their own `README.md` file documenting them and their API in detail. -The intended users of this document are: -- Implementors of the TKey hardware, firmware and SDKs -- Developers of secure applications for the TKey -- Technically skilled third parties that wants to understand the - TKey +Hardware functions with APIs, assets, and input/output are memory +mapped starting at base address `0xc000_0000`. For specific offsets +and bitmasks, see the file `fw/tk1_mem.h`. +Rough memory map: -## Introduction -The TKey is a USB-connected, RISC-V based application platform. The -purpose of the TKey is to provide a secure environment for TKey device -apps that provides some security functionality to the client as needed -by the use case and device user. Some examples of such security -functionality are: +| *name* | *prefix* | +|---------|----------| +| ROM | 0x00 | +| RAM | 0x40 | +| TRNG | 0xc0 | +| Timer | 0xc1 | +| UDS | 0xc2 | +| UART | 0xc3 | +| Touch | 0xc4 | +| FW\_RAM | 0xd0 | +| TK1 | 0xff | -- TOTP token generators -- Signing oracles -- SSH login dongles +## `clk_reset_gen` +Generator for system clock and system reset. -## TKey Security Features +The device does not rely on external clock or reset. Instead the +internal HFOSC oscillator combined with an internal PLL is used to +generate the main clock. Currently the clock frequency driving the SoC +is 21 MHz. -### Measured Based Security -The key, unique feature of the TKey is that it measures the secure -application when the application is being loaded onto the device. The -measurement (a hash digest), combined with a Unique Device Secret -(UDS) is used to derive a base secret for the application. +The device also generates its own reset. -The consequence of this is that if the application is altered, -the base secret derived will also change. Conversely, if the keys -derived from the base secret are the same as the last time the -application was loaded onto the same device, the application can -be trusted not to have been altered. +## `fw_ram` -Note that since the UDS is per-device unique, the same application -loaded onto another TKey device will derive a different set of keys. -This ties keys to a specific device. +Special firmware-only RAM. Unreachable from app mode. -The derivation can also be combined with a User Supplied Secret -(USS). This means that keys derived are both based on something the user -has - the specific device, and something the user knows (the USS). And -the derived can be trusted because of the measurement being used -by the derivation, thereby verifying the integrity of the application. +## `picorv32` -### Execution monitor +A softcore 32 bit RISC-V CPU from [upstreams +PicoRV32](https://github.com/YosysHQ/picorv32). -The purpose of the The Tillitis TKey execution monitor is to ensure -that execution of instructions does not happen from memory areas -containing application data och the stack. +Configuration of optional CPU features like `ENABLE_FAST_MUL` et +cetera are in `rtl/application_fpga.v`. -The monitor continuously observes the address from which the CPU wants -to fetch the next instruction. If that address falls within a defined -address range from which execution is not allowed, the monitor will -force the CPU to read an illegal instruction. This will cause the CPU -to enter its trap state. There is no way out of this state, and the -user must perform a power cycle of the TKey device. +The instance enables the following features -Currently the following rules are implemented by the execution monitor -(future releases may add more rules): +- Compressed ISA (C extension) +- Fast multiplication. Two cycles for 32x32 multiplication +- Barrel shifter -- Execution from the firmware RAM (fw\_ram) is always blocked by the - monitor -- Applications can define the area within RAM from which execution - should be blocked +No other modification to the core has been done. No interrupts are +used. -The application can define its no execution area to the -ADDR\_CPU\_MON\_FIRST and ADDR\_CPU\_MON\_LAST registers in the tk1 core. -When the registers have been set the application can enable the -monitor for the area by writing to the ADDR\_CPU\_MON\_CTRL register. -Note that once the monitor has been enabled it can't be disabled and -the addresses defining the area can't be changed. +## `ram` + +The 128 KByte main RAM. The RAM is only used by applications. +The memory is cleared by firmware before an application is loaded. + +The application RAM is available to use by firmware and applications. + +MC: Is the scrambling here now? + +## `rom` + +ROM for the firmware. The firmware.elf minus the ELF header is +embedded here in the bitstream. + +After reset the CPU will initialize the program counter to start of +ROM. + +## `timer` + +A general purpose 32 bit timer. The timer will count down from the +initial value to one. In order to handle long time sequences (minutes, +hours, days) there is also a 32 bit prescaler. + +The timer is available to use by firmware and applications. + +## `tk1` + +See [tk1 README](core/tk1/README.md) for details. + +Contains: + +- FPGA name and version. +- Execution mode control: firmware mode/application mode. +- RGB LED control. +- General purpose input/output (GPIO) pin control. +- Application introspection: start address and size of binary. +- BLAKE2s function access. +- Compound Device Identity (CDI). +- Unique Device Identity (UDI). +- RAM memory protection. +- Security monitor. +- SPI main. +- System reset. ### Illegal instruction monitor @@ -112,7 +135,7 @@ also not directly scale to multiple TKey devices. The memory protection is based on two separate mechanisms: 1. Address randomisation -2. Address dependent data randomization +2. Address dependent data randomization The address randomization is implemented by XORing the CPU address with the contents of the ADDR\_RAM\_ADDR\_RAND register in the tk1 @@ -150,177 +173,56 @@ use real encryption (for example PRINCE) for memory content protection. From the application point of view such a change will be transparent. -## Assets +## `touch_sense` -The TKey store and use the following assets internally: +Provide touch sensor events to software. -- UDS - Unique Device Secret. 256 bits. Provisioned and stored during - device manufacturing. Never to be replaced during the life time of - a given device. Used to derive application secrets. Must never leave - the device. Tillitis will NOT store a copy of the UDS. Can be read - by firmware once between power cycling +It is recommended that software start by acknowledge any stray events +prior to signalling to the user that a touch event is expected and +then start waiting for an event. -- UDI - Unique Device ID. 64 bits. Provisioned and stored during - device manufacturing. Only accessible by FW. Never to be replaced or - altered during the life time of a given device. May be copied, - extracted, read from the device. +The touch sensor is available to use by firmware and applications. -- CDI - Compound Device Identity. Computed by the FW when an - application is loaded using the UDS and the application binary. Used - by the application to derive secrets, keys as needed. The CDI should - never be exposed. +## `trng` -Additionally the following asset could be provided from the host: +Provide a True Random Number Generator to software. -- USS - User Supplied Secret. May possibly be replaced many times. - Supplied from the host to the device. Should not be revealed to a - third party. +The TRNG generates entropy with a fairly good quality. However for +security related use cases, for example key generation, the TRNG +should not be used directly. Instead use it to create a seed, and +reseed as necessary, for a Digital Random Bit Generator (DRBG), also +known as a Cryptographically Safe Pseudo Random Number Generator +(CSPRNG). Examples of such generators are Hash\_DRGG, CTR\_DRBG, and +HKDF. -## Memory +## `uart` -Addressing: +A simple universal asynchronous receiver/transmitter (UART) used for +communication with the client through the USB controller. -``` -31st bit 0th bit -v v -0000 0000 0000 0000 0000 0000 0000 0000 +The UART default speed is 62500 bps, but can be adjusted by the +application. (Note that the host must set the same bitrate too.) -- Bits [31 .. 30] (2 bits): Top level prefix (described below) -- Bits [29 .. 24] (6 bits): Core select. We want to support at least 16 cores -- Bits [23 .. 0] (24 bits): Memory/in-core address. -``` +The UART contain a 512 but Rx-FIFO with status (data available). -Assigned top level prefixes: +The UART is available to use by firmware and applications. -| *name* | *prefix* | *address length* | -|----------|----------|--------------------------------------| -| ROM | 0b00 | 30 bit address | -| RAM | 0b01 | 30 bit address | -| reserved | 0b10 | | -| MMIO | 0b11 | 6 bits for core select, 24 bits rest | +## `uds` -| *memory* | *first byte* | *last byte* | -|----------|--------------|------------------------------| -| ROM | 0x0000\_0000 | 0x0000\_17ff | -| RAM | 0x4000\_0000 | 0x4001\_ffff | -| MMIO | 0xc000\_0000 | 0xffff\_ffff (last possible) | +The Unique Device Secret. During build initial value is taken from +`data/uds.hex`. -### Memory mapped hardware functions +However, with the finished bitstream you can patch in another UDS and +UDI by using `tools/patch_uds_udi.py` with nextpnr. -Hardware functions, assets, and input/output are memory mapped (MMIO) -starting at base address `0xc000_0000`. For specific offsets/bitmasks, -see the file [tk1_mem.h](../../hw/application_fpga/fw/tk1_mem.h) (in -this repo). - -Assigned core prefixes: - -| *name* | *address prefix* | -|--------|------------------| -| TRNG | 0xc0 | -| TIMER | 0xc1 | -| UDS | 0xc2 | -| UART | 0xc3 | -| TOUCH | 0xc4 | -| FW_RAM | 0xd0 | -| TK1 | 0xff | - -*Nota bene*: MMIO accesses should be 32 bit wide, e.g use `lw` and -`sw`. Exceptions are `UDS`, `FW_RAM` and `QEMU_DEBUG`. - -| *name* | *fw* | *app* | *size* | *type* | *content* | *description* | -|--------------------|-------|-----------|--------|----------|-----------|-------------------------------------------------------------------------| -| `TRNG_STATUS` | r | r | | | | TRNG_STATUS_READY_BIT is 1 when an entropy word is available. | -| `TRNG_ENTROPY` | r | r | 4B | u32 | | Entropy word. Reading a word will clear status. | -| `TIMER_CTRL` | r/w | r/w | | | | If TIMER_STATUS_RUNNING_BIT in TIMER_STATUS is 0, setting | -| | | | | | | TIMER_CTRL_START_BIT here starts the timer. | -| | | | | | | If TIMER_STATUS_RUNNING_BIT in TIMER_STATUS is 1, setting | -| | | | | | | TIMER_CTRL_STOP_BIT here stops the timer. | -| `TIMER_STATUS` | r | r | | | | TIMER_STATUS_RUNNING_BIT is 1 when the timer is running. | -| `TIMER_PRESCALER` | r/w | r/w | 4B | | | Prescaler init value. Write blocked when running. | -| `TIMER_TIMER` | r/w | r/w | 4B | | | Timer init or current value while running. Write blocked when running. | -| `UDS_FIRST` | r[^3] | invisible | 4B | u8[32] | | First word of Unique Device Secret key. Note: Read once per power up. | -| `UDS_LAST` | | invisible | | | | The last word of the UDS. Note: Read once per power up. | -| `UART_BITRATE` | r/w | | | | | TBD | -| `UART_DATABITS` | r/w | | | | | TBD | -| `UART_STOPBITS` | r/w | | | | | TBD | -| `UART_RX_STATUS` | r | r | 1B | u8 | | Non-zero when there is data to read | -| `UART_RX_DATA` | r | r | 1B | u8 | | Data to read. Only LSB contains data | -| `UART_RX_BYTES` | r | r | 4B | u32 | | Number of bytes received from the host and not yet read by SW, FW. | -| `UART_TX_STATUS` | r | r | 1B | u8 | | Non-zero when it's OK to write data to send. | -| `UART_TX_DATA` | w | w | 1B | u8 | | Data to send. Only LSB contains data | -| `TOUCH_STATUS` | r/w | r/w | | | | TOUCH_STATUS_EVENT_BIT is 1 when touched. After detecting a touch | -| | | | | | | event (reading a 1), write anything here to acknowledge it. | -| `FW_RAM` | r/w | invisible | 2 kiB | u8[2048] | | Firmware-only RAM. | -| `UDI` | r | invisible | 8B | u64 | | Unique Device ID (UDI). | -| `QEMU_DEBUG` | w | w | | u8 | | Debug console (only in QEMU) | -| `NAME0` | r | r | 4B | char[4] | "tk1 " | ID of core/stick | -| `NAME1` | r | r | 4B | char[4] | "mkdf" | ID of core/stick | -| `VERSION` | r | r | 4B | u32 | 1 | Current version. | -| `SYSTEM_MODE_CTRL` | r/w | r | 1B | u8 | | Write anything here to trigger the switch to application mode. Reading | -| | | | | | | returns 0 if device is in firmware mode, 0xffffffff if in app mode. | -| `LED` | r/w | r/w | 1B | u8 | | Control of the color LEDs in RBG LED on the board. | -| | | | | | | Bit 0 is Blue, bit 1 is Green, and bit 2 is Red LED. | -| `GPIO` | r/w | r/w | 1B | u8 | | Bits 0 and 1 contain the input level of GPIO 1 and 2. | -| | | | | u8 | | Bits 3 and 4 store the output level of GPIO 3 and 4. | -| `APP_ADDR` | r/w | r | 4B | u32 | | Firmware stores app load address here, so app can read its own location | -| `APP_SIZE` | r/w | r | 4B | u32 | | Firmware stores app app size here, so app can read its own size | -| `BLAKE2S` | r/w | r | 4B | u32 | | Function pointer to a BLAKE2S function in the firmware | -| `CDI_FIRST` | r/w | r | 32B | u8[32] | | Compound Device Identifier (CDI). UDS+measurement... | -| `CDI_LAST` | | r | | | | Last word of CDI | -| `RAM_ASLR` | w | invisible | 4B | u32 | | Address Space Randomization seed value for the RAM | -| `RAM_SCRAMBLE` | w | invisible | 4B | u32 | | Data scrambling seed value for the RAM | -| `CPU_MON_CTRL` | w | w | 4B | u32 | | Bit 0 enables CPU execution monitor. Can't be unset. Lock adresses | -| `CPU_MON_FIRST` | w | w | 4B | u32 | | First address of the area monitored for execution attempts | -| `CPU_MON_LAST` | w | w | 4B | u32 | | Last address of the area monitored for execution attempts | - -[^3]: The UDS can only be read *once* per power-cycle. - -## Subsystems and Components - -The TKey as a project, system and secure application platform -consists of a number of subsystems and components, modules, support -libraries etc. Roughly these can be divided into: - -- TKey boards. PCB designs including schematics, Bill of Material - (BOM) and layout, as needed for development, production and and - general usage of the TKey devices - -- TKey programmer. SW, PCB designs including schematics, Bill of - Material (BOM) and layout, as needed for development, production - and and provisioning, programming general usage - -- USB to UART controller. FW for the MCU implementing the USB host - interface on the TKey - -- application\_fpga. FPGA design with cores including CPU and memory that - implements the secure application platform - -- application\_fpga FW. The base software running on the CPU as needed to - boot, load applications, measure applications, dderive base secret etc - -- One or more applications loaded onto the application\_fpga to provide - some functionality to the user of the host - -- host side application loader. Software that talks to the FW in the - application\_fpga to load a secure application - -- host side boot, management. Support software to boot, authenticate - the TKey device connected to a host - -- host side secure application. Software that communicates with the - secure application running in the application\_fpga as needed to solve - a security objective - -- application\_fpga FW SDK. Tools, libraries, documentation and examples - to support development of the application\_fpga firmware - -- secure application SDK. Tools, libraries, documentation and examples - to support development of the secure applications to be loaded onto - the application\_fpga - -- host side secure application SDK. Tools, libraries, documentation and - examples to support development of the host applications +It's a 256 bit memory implemented using eight 32-bit registers. The +registers can only be accessed once between power cycling. This means +that the UDS **must** be read as u32. If read as u8, only the first +byte from a given address will be correct, subsequent bytes will be +zero. +The UDS can only be read in firmware mode. Reading from the UDS in +application mode will return all zeros. ## References More detailed information about the firmware running on the device can