From 3cf218469c3b914abf1cda73108239ad9f486783 Mon Sep 17 00:00:00 2001
From: Michael Cardell Widerkrantz <mc@tillitis.se>
Date: Tue, 26 Mar 2024 17:45:38 +0100
Subject: [PATCH] hw/tool: UDI/UDS storage
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Describe how the UDI and UDS are actually stored in the FPGA, how they
are accessed, and how they are initialized by the patch_uds_udi.py
script.

Co-authored-by: Joachim Strömbergson <joachim@assured.se>
---
 hw/application_fpga/core/tk1/README.md     | 24 ++++++++--
 hw/application_fpga/core/uds/README.md     | 42 ++++++++++++++---
 hw/application_fpga/tools/patch_uds_udi.py | 53 +++++++++++++---------
 3 files changed, 87 insertions(+), 32 deletions(-)
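
Note: the sketch below is an illustration, not code from this patch. It
models how a list of 32-bit words could be spread over 32 LUT init
values, one LUT per output bit, with the pattern repeated across the
unused address bits. The helper names (lut_inits, read_word), the
repetition layout (bit k * len(words) + i) and the example UDI values
are assumptions. The UDS read-enable handling is deliberately left out.

```python
def lut_inits(words, lut_bits=16, word_bits=32):
    """Return one 16-bit init value per output bit (hypothetical helper)."""
    repeat = lut_bits // len(words)
    inits = []
    for idx in range(word_bits):
        init = 0
        for i, word in enumerate(words):
            if (word >> idx) & 0x1:
                # Repeat the bit so the unused upper address inputs
                # do not affect the value read back.
                for k in range(repeat):
                    init |= 1 << (k * len(words) + i)
        inits.append(init)
    return inits


def read_word(inits, addr):
    """Model a read: LUT number idx supplies bit idx of the word."""
    word = 0
    for idx, init in enumerate(inits):
        word |= ((init >> addr) & 0x1) << idx
    return word


udi = [0x00010203, 0x04050607]  # made-up example values
inits = lut_inits(udi)
assert [read_word(inits, a) for a in range(2)] == udi
assert read_word(inits, 2) == udi[0]  # upper address bits are don't-care
```

In this model each LUT contributes exactly one bit of every word, which
is why 32 LUTs are needed for a 32-bit read bus regardless of how many
words are stored.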

diff --git a/hw/application_fpga/core/tk1/README.md b/hw/application_fpga/core/tk1/README.md
index 8007aaf..ee46e90 100644
--- a/hw/application_fpga/core/tk1/README.md
+++ b/hw/application_fpga/core/tk1/README.md
@@ -110,9 +110,27 @@ secret for any secrets it needs to perform its intended use case.
   ADDR_UDI_LAST:  0x31
 ```
 
-These registers provide read access to the 64-bit unique device
-identity. The UDI is stored as ROM within the FPGA configuration. The
-registers can't be written to.
+These read-only registers provide access to the 64-bit Unique Device
+Identity (UDI).
+
+The two UDI words are stored using 32 named SB\_LUT4 FPGA multiplexer
+(MUX) instances, identified in the source code as "udi\_rom\_idx". One
+instance for each bit in the core read\_data output bus.
+
+Each SB\_LUT4 MUX is able to store 16 bits of data, in total 512 bits.
+But since the UDI is 64 bits, we only use the two LSBs in each MUX.
+Note that only the least significant address bit of each SB\_LUT4
+instance is connected to the CPU address. This means that only the
+two LSBs in each MUX can be addressed.
+
+During build of the FPGA design, the UDI is set to a known bit
+pattern, which means that the SB\_LUT4 instantiations are initialized
+to a fixed bit pattern.
+
+The tool 'patch\_uds\_udi.py' is used to replace the fixed bit pattern
+with a unique bit pattern before generating the per device unique FPGA
+bitstream. This allows us to generate these device unique FPGA
+bitstreams without having to do a full FPGA build.
 
 
 ### RAM memory protecion
diff --git a/hw/application_fpga/core/uds/README.md b/hw/application_fpga/core/uds/README.md
index f10feff..145411f 100644
--- a/hw/application_fpga/core/uds/README.md
+++ b/hw/application_fpga/core/uds/README.md
@@ -5,16 +5,21 @@ Unique Device Secret core
 ## Introduction
 
 This core store and protect the Unique Device Secret (UDS) asset. The
-UDS can be accessed as eight separate 32-bit words. The words can be
-accessed in any order, but a given word can only be accessed once
-between reset cycles. The words can only be accessed as long as the
-fw_app_mode input is low, implying that the CPU is executing the FW.
+UDS can be accessed as eight separate 32-bit words. The words can only
+be accessed as long as the fw_app_mode input is low, implying that the
+CPU is executing the FW.
 
-Each UDS words has a companion read bit that is set when the word is
-accessed. This means that the even if the chip select (cs) control
+The UDS words can be accessed in any order, but a given word can only
+be accessed once between reset cycles. This read once functionality is
+implemented with a companion read bit for each word. The read bit is
+set when the word is first accessed. The read bit controls if the real
+UDS word is returned or not.
+
+This means that even if the chip select (cs) control
 input is forced high, the content will become all zero when the read
 bit has been set after one cycle.
 
+
 ## API
 There are eight addresses in the API. These are defined by the
 two values ADDR_UDS_FIRST and ADDR_UDS_LAST:
@@ -31,4 +36,27 @@ Any access to another address will be ignored by the core.
 
 ## Implementation
 
-The UDS words are implemented using discrete registers.
+These read-only registers provide read access to the 256-bit UDS.
+
+The eight UDS words are stored using 32 named SB\_LUT4 FPGA
+multiplexer (MUX) instances, identified in the source code as
+"uds\_rom\_idx". One instance for each bit in the core read\_data
+output bus.
+
+During build of the FPGA design, the UDS is set to a known bit
+pattern, which means that the SB\_LUT4 instantiations are initialized
+to a fixed bit pattern.
+
+The tool 'patch\_uds\_udi.py' is used to replace the fixed bit pattern
+with a unique bit pattern before generating the per device unique FPGA
+bitstream. This allows us to generate these device unique FPGA
+bitstreams without having to do a full FPGA build.
+
+Each SB\_LUT4 MUX is able to store 16 bits of data, in total 512 bits.
+But since the UDS is 256 bits, we only use the eight LSBs in each MUX.
+
+The eight MSBs in each MUX will be initialized to zero. The read
+access bit (see description above) for a given word is used as the
+highest address bit to the MUXes. This forces any subsequent accesses
+to a UDS word to read from the MUX MSBs, not the LSBs where the UDS is
+stored.
diff --git a/hw/application_fpga/tools/patch_uds_udi.py b/hw/application_fpga/tools/patch_uds_udi.py
index e1226ed..19b0e3b 100644
--- a/hw/application_fpga/tools/patch_uds_udi.py
+++ b/hw/application_fpga/tools/patch_uds_udi.py
@@ -5,32 +5,40 @@
 # Written by Myrtle Shah <gatecat@ds0.me>
 # SPDX-License-Identifier: GPL-2.0-only
 #
-# patch_uds_udi.py
-# --------------
-# Python program that patches the UDS and UDI implemented using
-# named LUT4 instances to have unique initial values, not the generic
-# values used during synthesis, p&r and mapping. This allows us to
-# generate device unique bitstreams without running the complete flow.
+# Script to patch in a Unique Device Secret (UDS) and a Unique Device
+# Identifier (UDI) from files into a bitstream.
 #
-# Both the UDI and UDS are using bit indexing from 32 LUTs for each
-# word, i.e., the first word consists of bit 0 from each 32 LUTs and
-# so on.
+# It's supposed to be run like this:
 #
-# The size requirements for the UDI and UDS are specified as 1 bit (8
-# bytes of data) and 3 bits (32 bytes of data), respectively. The UDI
-# does not occupy the entire LUT4 instance, and to conserve resources,
-# the pattern of the UDI is repeated over the unused portion of the
-# LUT4 instance. This eliminates the need to drive the three MSB pins
-# while still achieving the correct output.
+# nextpnr-ice40 --up5k --package sg48 --ignore-loops \
+#    --json application_fpga_par.json --run patch_uds_udi.py
 #
-# In the case of UDS, a read-enable signal is present, and the most
-# significant bit serves as the read-enable input. This requires the
-# lower half of initialization bits to be forced to zero, ensuring
-# that the memory outputs zero when the read-enable signal is
-# inactive.
+# with this environment:
 #
+# - UDS_HEX: path to the UDS file, typically the path to
+#   ../data/uds.hex
+# - UDI_HEX: path to the UDI file, typically the path to ../data/udi.hex
+# - OUT_ASC: path to the ASC output that is then used by icebram and icepack.
 #
-#=======================================================================
+# The script changes the UDS and UDI that are stored in named 4-input
+# LUT (LUT4) instances in the JSON file so we can generate device
+# unique bitstreams without running the complete flow just to change
+# UDS and UDI. Then we can just run the final bitstream generation
+# from the ASC file.
+#
+# We represent our UDI and UDS values as a number of 32-bit words:
+#
+# - UDI: 2 words.
+# - UDS: 8 words.
+#
+# We reserve 32 named 4-input LUTs *each* to store the data: UDS in
+# "uds_rom_idx" and UDI in "udi_rom_idx".
+#
+# The script repeats the value in the LUTs so we don't have to care
+# about the value of the unused address bits.
+#
+# Their implementation is documented in ../core/uds/README.md
+# and ../core/tk1/README.md
 
 import os
 
@@ -49,7 +57,8 @@ def rewrite_lut(lut, idx, data, has_re=False):
     new_init = 0
     for i, word in enumerate(data):
         if (word >> idx) & 0x1:
-            # repeat so inputs above address have a don't care value
+            # repeat so we don't have to care about inputs above
+            # address
             repeat = (16 // len(data))
             for k in range(repeat):
                 # UDS also has a read enable