# Multithread accelerators on FPGAs: a Dataflow-based Approach



## <u>Francesco Ratto<sup>1</sup></u>, Luigi Raffo<sup>1</sup>, Francesca Palumbo<sup>2</sup>

<sup>1</sup>University of Cagliari (IT), <sup>2</sup>University of Sassari (IT)

francesco.ratto@unica.it





### Abstract

Multithreading is a well-known technique for delivering performance gains: it raises resource efficiency by exploiting periods of underutilization. In this work, we describe a model-based approach for designing custom multithread hardware accelerators targeting reconfigurable fabric. The approach exploits dataflow models of applications and tagged tokens to let the resulting hardware support concurrent threads. Results show that the proposed accelerators achieve a valuable trade-off between a set of parallel single-thread accelerators and a single-thread accelerator multiplexed in time. The ongoing and future work to validate and improve the design approach is presented.





Starting from the single-thread **dataflow specification** of an application, the proposed **model-based approach\*** makes it possible to design a corresponding multithread hardware accelerator **without** any explicit need for **data synchronization**.



Developed components that meet the above functional requirements:

- Multithread FIFO interface that allows selecting the reading thread (Req 3) and reports the status of the FIFO with respect to each thread (Req 2).
- Actor architecture with shared logic to process tokens. The state is replicated for each thread and multiplexed depending on the tag of the input token.
- Two FIFO architectures: one with a dedicated memory per thread and simple control logic, and one with two shared memories (one storing the tokens, one storing their order) that requires more complex control logic.
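The components listed above can be illustrated with a small behavioural model. The sketch below is not the authors' RTL: the class names `SharedMemoryFifo` and `AccumulatorActor`, the use of integer tags, and the way the order memory is searched are all assumptions made for illustration. It models the shared-memory FIFO variant (one token memory plus one order memory) and an actor whose state is replicated per thread while the processing logic is shared.

```python
class SharedMemoryFifo:
    """Behavioural model of a FIFO with a shared token memory and an order memory.

    Hypothetical model: the order memory is assumed to hold (tag, slot) pairs
    in arrival order; a read for a given tag returns the oldest token of that tag.
    """

    def __init__(self, depth: int) -> None:
        self._tokens = [None] * depth      # shared token memory
        self._free = list(range(depth))    # free slots of the token memory
        self._order = []                   # order memory: (tag, slot) pairs

    def count(self, tag: int) -> int:
        """Per-thread status: number of stored tokens carrying this tag."""
        return sum(1 for t, _ in self._order if t == tag)

    def write(self, tag: int, value: int) -> None:
        if not self._free:
            raise OverflowError("FIFO is full")
        slot = self._free.pop()
        self._tokens[slot] = value
        self._order.append((tag, slot))

    def read(self, tag: int) -> int:
        """Read the oldest token of the selected thread, skipping other threads."""
        for i, (t, slot) in enumerate(self._order):
            if t == tag:
                del self._order[i]
                self._free.append(slot)
                return self._tokens[slot]
        raise LookupError(f"no token with tag {tag}")


class AccumulatorActor:
    """Actor with shared logic and one copy of the state per thread."""

    def __init__(self, num_threads: int) -> None:
        self._state = [0] * num_threads    # replicated state, indexed by tag

    def fire(self, tag: int, value: int) -> tuple[int, int]:
        # The tag of the input token selects which state copy is used;
        # the addition itself is the logic shared by all threads.
        self._state[tag] += value
        return tag, self._state[tag]
```

In this model a token of thread 1 written between two tokens of thread 0 can be read first, while the tokens of thread 0 still leave in their original order. The dedicated-memory variant would replace the order search with one independent queue per thread.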

#### Results

The design approach has been **tested** on two versions (Baseline and Matrix) of a **two-stage filter** adopted in the motion compensation phase of the HEVC standard.

The performance gain is obtained by **exploiting idle periods** in the single-thread execution and by allowing light threads to **"overtake"** heavier ones.



The resource utilization of the multithread accelerators lies between that of the single-thread accelerators and that of the corresponding set of replicas.

### Ongoing and future work

A complete host **processor – accelerator environment** with OS support and an API for thread instantiation is **under development.** 



Validation of the approach on a heterogeneous platform processing LTE workloads, in collaboration with the CC Chair at the CFAED of TU Dresden.



**Integration** of the proposed design approach **with HLS** tools to make it available to developers and speed up the design process.

\* Ratto, F., Esposito, S., Sau, C., Raffo, L., & Palumbo, F. (2022). Multithread Accelerators on FPGAs: A Dataflow-Based Approach. In Proceedings of PARMA-DITAM 2022. Schloss Dagstuhl-Leibniz-Zentrum für Informatik.

## CPS SUMMER SCHOOL 2022, 19–23 SEPTEMBER, PULA



1. A firing actor must tag the output tokens with the same tag as the input ones.



2. The firing rules must be adjusted so that only **matching tokens** can fire the execution.



3. **FIFOs must provide semi-out-of-order read**, letting the reading actor choose among the first tokens of the different flows of execution.
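The three requirements can be sketched together in a few lines of software. This is only an illustrative model under stated assumptions, not the hardware described here: `Token`, `TaggedFifo` and `fire_adder` are hypothetical names, tags are assumed to be small integers identifying threads, and the fixed-priority scan over tags is an arbitrary scheduling choice.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    tag: int    # thread identifier carried by the token (assumed integer)
    value: int


class TaggedFifo:
    """FIFO model exposing the first token of each flow of execution (Req 3)."""

    def __init__(self, num_threads: int) -> None:
        self._queues = [deque() for _ in range(num_threads)]

    def write(self, token: Token) -> None:
        self._queues[token.tag].append(token)

    def count(self, tag: int) -> int:
        return len(self._queues[tag])

    def read(self, tag: int) -> Token:
        # Order is preserved within a thread, but not across threads.
        return self._queues[tag].popleft()


def fire_adder(in_a: TaggedFifo, in_b: TaggedFifo, out: TaggedFifo,
               num_threads: int) -> bool:
    """Try to fire a two-input adder actor once; return True if it fired."""
    for tag in range(num_threads):  # fixed-priority scan, an assumption
        # Req 2: fire only when both inputs hold a token with the same tag.
        if in_a.count(tag) > 0 and in_b.count(tag) > 0:
            a = in_a.read(tag)      # Req 3: read the head of the chosen flow
            b = in_b.read(tag)
            # Req 1: the output token inherits the tag of the input tokens.
            out.write(Token(tag, a.value + b.value))
            return True
    return False
```

With one token of thread 0 and one of thread 1 waiting on the first input, and only a token of thread 1 on the second, the actor fires for thread 1 and leaves the thread 0 token in place until its matching operand arrives.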

