Evaluating Timing Errors in AI Hardware Accelerators

Team: Andrew Chamberlin

Sponsor: USU Bridge Lab

Project Problem & Relevance

Simulated timing errors are typically modeled either as random events or as a single process.
Both methods neglect the nature of timing errors in real circuits.
Absolute worst-case simulated errors are too restrictive.
Better-than-worst-case simulated errors leave room for flaws in the design.
By emulating and characterizing timing errors, this project gives future research a basis for modeling simulated errors more accurately during testing.

Design a Systolic Multiplier Array (SMA) with parameterized size in Verilog.
Use AXI interface protocols and a custom control scheme to move data across clock domain boundaries.
Develop software to drive the AXI interface.
Test cross-layer interactions to ensure the AXI interface is correct and error-free.
Design a method of overclocking only the SMA logic to create timing errors.
Create a test structure in the software for repeatable, customizable tests driven over an external UART interface.
Create scripts on the host for generating test cases and communicating over UART.
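The golden results that host-side test scripts compare against can come from a small software reference model. The following is a minimal sketch under stated assumptions, not the project's actual scripts: it assumes an output-stationary dataflow (each processing element accumulates one output), and the names `systolic_matmul` and `make_test_case`, as well as the 8-bit operand width, are hypothetical.

```python
import random


def make_test_case(n, seed, bits=8):
    """Generate a repeatable pair of n x n operand matrices (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the same test can be replayed
    limit = (1 << bits) - 1
    a = [[rng.randint(0, limit) for _ in range(n)] for _ in range(n)]
    b = [[rng.randint(0, limit) for _ in range(n)] for _ in range(n)]
    return a, b


def systolic_matmul(a, b):
    """Cycle-by-cycle model of an output-stationary systolic array computing a @ b.

    Assumed dataflow: rows of `a` enter from the left, columns of `b` enter
    from the top, and each input stream is skewed by its row/column index.
    """
    n, k, m = len(a), len(a[0]), len(b[0])
    acc = [[0] * m for _ in range(n)]    # accumulator in each processing element
    a_reg = [[0] * m for _ in range(n)]  # operand register passed to the right
    b_reg = [[0] * m for _ in range(n)]  # operand register passed downward
    for t in range(n + m + k - 2):       # cycles until the last product is done
        # Visit PEs in reverse order so neighbours still hold last cycle's values.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    s = t - i
                    a_in = a[i][s] if 0 <= s < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    s = t - j
                    b_in = b[s][j] if 0 <= s < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc
```

Seeding the generator keeps each test case reproducible, so a failing case can be re-sent to the hardware at a different clock setting and compared against the same golden output.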

Natural circuit errors were found to be discrete and deterministic with small variance across multiple runs.
Future research will include undervolting as an alternative way to induce timing errors, as well as more sophisticated testing and analysis.
Because only a small, discrete set of errors is possible in a given controlled environment, error detection and correction schemes should not assume that errors are equivalent to completely random outcomes.
Project management is difficult, and thorough up-front design saves considerable time that would otherwise go to later redesigns of the system.
Next, we will expand the interface to include direct memory access between the programmable logic (PL) and the processing system (PS), allowing for larger and faster transactions.
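One way to check whether observed errors form a small discrete set is to tally the bitwise difference between golden and observed outputs across repeated runs. The following is a minimal sketch: the function name `error_patterns` and the XOR-based error signature are assumptions made for illustration, and the values in the usage example are synthetic, not measured results.

```python
from collections import Counter


def error_patterns(golden, runs):
    """Count the XOR difference between golden and observed words over all runs.

    golden: list of expected integer outputs
    runs:   list of runs, each a list of observed integer outputs
    """
    counts = Counter()
    for observed in runs:
        for expected, actual in zip(golden, observed):
            diff = expected ^ actual  # bits that flipped in this output word
            if diff:
                counts[diff] += 1
    return counts


# Synthetic example data (not from the project): three runs of three outputs.
golden = [0x10, 0x20, 0x30]
runs = [
    [0x10, 0x24, 0x30],
    [0x10, 0x24, 0x30],
    [0x11, 0x24, 0x30],
]
patterns = error_patterns(golden, runs)
print(len(patterns), patterns.most_common())
```

A handful of distinct patterns that recur across runs is consistent with deterministic errors, whereas a uniformly random error model would spread counts over many different patterns.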