Migrating from Vitis HLS
Note
This tutorial assumes that you are familiar with Vitis HLS and have some experience with TAPA. We introduce the TAPA coding style and show how to migrate from Vitis HLS to TAPA.
Example 1: Basics with VecAdd
In this tutorial, we’ll walk through the process of converting a vector addition example from Vitis HLS to TAPA. We’ll cover the key changes needed to make your code TAPA-compatible.
Step 1: Update the Includes
First, we need to replace the HLS-specific header with TAPA’s header.
#include <hls_vector.h>
-#include <hls_stream.h>
+#include <tapa.h>
#include "assert.h"
Other HLS headers, such as ap_int.h and hls_vector.h, are still supported and can be included as usual.
Step 2: Update the Top Function
Next, we’ll modify the top function to use TAPA’s memory-mapped interface.
void vadd(
- hls::vector<uint32_t, NUM_WORDS>* in1,
- hls::vector<uint32_t, NUM_WORDS>* in2,
- hls::vector<uint32_t, NUM_WORDS>* out,
+ tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in1,
+ tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in2,
+ tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out,
int size
) {
- #pragma HLS INTERFACE m_axi port = in1 bundle = gmem0
- #pragma HLS INTERFACE m_axi port = in2 bundle = gmem1
- #pragma HLS INTERFACE m_axi port = out bundle = gmem0
In this step, we replace pointer parameters with tapa::mmap<T>, which is passed by value. We also remove the HLS interface pragmas, as they are no longer needed.
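Passing tapa::mmap<T> by value copies a handle, not the memory it refers to. The standard C++ sketch below models this with a hypothetical MmapHandle type, which is not a TAPA class. It assumes that in software simulation an mmap behaves like a non-owning pointer-plus-size handle. The task receives its own copy of the handle, but its writes still land in the caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a by-value memory handle. It stores only a
// pointer and an element count, so copying it never copies the buffer.
template <typename T>
class MmapHandle {
 public:
  MmapHandle(T* ptr, std::size_t size) : ptr_(ptr), size_(size) {}
  T& operator[](std::size_t i) const { return ptr_[i]; }
  std::size_t size() const { return size_; }

 private:
  T* ptr_;
  std::size_t size_;
};

// Takes the handle by value, the way a task takes tapa::mmap<T> in this tutorial.
void fill_with_index(MmapHandle<std::uint32_t> out, int size) {
  for (int i = 0; i < size; i++) {
    out[static_cast<std::size_t>(i)] = static_cast<std::uint32_t>(i);
  }
}
```

Under that assumption, mmap parameters need no * or &: copying the handle is cheap and does not duplicate the data.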
Step 3: Update Stream Definitions
Now, let’s update the stream definitions to use TAPA’s stream type.
- hls::stream<hls::vector<uint32_t, NUM_WORDS>> in1_stream("input_stream_1");
- hls::stream<hls::vector<uint32_t, NUM_WORDS>> in2_stream("input_stream_2");
- hls::stream<hls::vector<uint32_t, NUM_WORDS>> out_stream("output_stream");
+ tapa::stream<hls::vector<uint32_t, NUM_WORDS>> in1_stream("input_stream_1");
+ tapa::stream<hls::vector<uint32_t, NUM_WORDS>> in2_stream("input_stream_2");
+ tapa::stream<hls::vector<uint32_t, NUM_WORDS>> out_stream("output_stream");
In this step, we replace hls::stream with tapa::stream. The default depth for TAPA streams is 2, matching Vitis HLS behavior. However, TAPA's software simulation enforces the depth, which lets you catch potential issues early.
Note
For setting a different depth, use tapa::stream<DATA_TYPE, FIFO_DEPTH>.
For stream arrays, use tapa::streams<DATA_TYPE, ARRAY_SIZE, FIFO_DEPTH>.
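To make the depth rule concrete, here is a standard C++ model of a fixed-depth FIFO. BoundedFifo and BoundedFifoArray are hypothetical names used for illustration, not TAPA types. Their template parameters mirror the DATA_TYPE, ARRAY_SIZE, and FIFO_DEPTH parameters above, with an assumed default depth of 2. Because this model is single-threaded, a write to a full FIFO returns false instead of blocking.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of a FIFO whose capacity is fixed at compile time.
// Depth defaults to 2, matching the default stream depth described above.
template <typename T, std::size_t Depth = 2>
class BoundedFifo {
 public:
  // Returns false instead of blocking when the FIFO is already full.
  bool try_write(const T& value) {
    if (data_.size() >= Depth) return false;
    data_.push_back(value);
    return true;
  }

  // Returns false when there is nothing to read.
  bool try_read(T& value) {
    if (data_.empty()) return false;
    value = data_.front();
    data_.pop_front();
    return true;
  }

  bool full() const { return data_.size() >= Depth; }
  bool empty() const { return data_.empty(); }

 private:
  std::deque<T> data_;
};

// Hypothetical model of an array of FIFOs, shaped like
// tapa::streams<DATA_TYPE, ARRAY_SIZE, FIFO_DEPTH>.
template <typename T, std::size_t ArraySize, std::size_t Depth = 2>
using BoundedFifoArray = std::array<BoundedFifo<T, Depth>, ArraySize>;
```

With the default depth, a third write before any read is rejected. In hardware, or in a simulator that enforces depth, the writer would stall at that point.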
Note
TAPA optimizes how communication channels are implemented compared to Vitis HLS. Vitis HLS can implement FIFOs with shift registers (SRL) or block RAMs (BRAM), but it often defaults to BRAM even when that is unnecessary, such as for wide but shallow FIFOs. TAPA uses a higher threshold before choosing BRAM, which reduces overall BRAM consumption. For FIFOs that are both wide (≥36 bits) and deep (≥4096 entries), TAPA automatically selects URAM instead of BRAM, improving resource efficiency.
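The URAM rule in the note above can be written as a simple predicate. This is a hypothetical helper for illustration, not TAPA's actual implementation. It encodes only the two thresholds stated in the note. The exact BRAM threshold is not given, so it is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Thresholds taken from the note above. Only the URAM rule is modeled.
constexpr std::uint64_t kUramMinWidthBits = 36;
constexpr std::uint64_t kUramMinDepth = 4096;

// Hypothetical helper: true if a FIFO falls in the "wide and deep" range
// for which the note says TAPA selects URAM instead of BRAM.
constexpr bool is_wide_and_deep(std::uint64_t width_bits, std::uint64_t depth) {
  return width_bits >= kUramMinWidthBits && depth >= kUramMinDepth;
}

// Width of one hls::vector<uint32_t, NUM_WORDS> element in this tutorial:
// NUM_WORDS = 512 / (8 * 4) = 16 words of 32 bits each.
constexpr std::uint64_t kVecAddWidthBits = 16 * 32;
```

In this tutorial, the 512-bit streams use the default depth of 2, so they fall well outside the URAM range even though they are wide.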
Step 4: Update Task Invocations
Finally, we’ll update how tasks are invoked using TAPA’s task API.
- #pragma HLS dataflow
- load_input(in1, in1_stream, size);
- load_input(in2, in2_stream, size);
- compute_add(in1_stream, in2_stream, out_stream, size);
- store_result(out, out_stream, size);
+ tapa::task()
+ .invoke(load_input, in1, in1_stream, size)
+ .invoke(load_input, in2, in2_stream, size)
+ .invoke(compute_add, in1_stream, in2_stream, out_stream, size)
+ .invoke(store_result, out, out_stream, size)
+ ;
In this step, we remove the #pragma HLS dataflow directive, as TAPA always generates dataflow designs. We replace the function calls with tapa::task() and .invoke(), chaining the invocations together and adding a semicolon only at the end.
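The chained style works because each call returns a reference to the same object, so the whole task graph forms a single expression that ends in one semicolon. The hypothetical TaskGraphSketch below illustrates only that C++ idiom. Unlike tapa::task(), it records a task name rather than taking a function, and it never runs anything. It is not how TAPA is implemented.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder that mimics the fluent .invoke() chaining syntax.
class TaskGraphSketch {
 public:
  // Returning *this by reference is what allows calls to be chained.
  template <typename... Args>
  TaskGraphSketch& invoke(std::string task_name, Args&&... /*args*/) {
    names_.push_back(std::move(task_name));
    return *this;
  }

  const std::vector<std::string>& names() const { return names_; }

 private:
  std::vector<std::string> names_;
};
```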
Step 5: Update Task Definitions
Lastly, update the task function signatures to use TAPA’s stream types.
void compute_add(
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& in1_stream,
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& in2_stream,
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
+ tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& in1_stream,
+ tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& in2_stream,
+ tapa::ostream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
int size
) {
Compared to Vitis HLS, TAPA requires stream arguments to be directional. We use tapa::istream for input streams and tapa::ostream for output streams, in place of hls::stream.
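The standard C++ sketch below models why directional stream types help. Fifo, ReadEnd, and WriteEnd are hypothetical names, not TAPA types. A task that receives a ReadEnd& has no write() method to call, so using a stream in the wrong direction becomes a compile-time error rather than a simulation bug. To keep the sketch short, it leaves out the depth limit.

```cpp
#include <cassert>
#include <deque>

// Shared storage for the hypothetical FIFO model.
template <typename T>
class FifoStorage {
 protected:
  std::deque<T> data_;
};

// Read-only view: a task that takes ReadEnd<T>& can only consume data.
template <typename T>
class ReadEnd : public virtual FifoStorage<T> {
 public:
  T read() {
    T value = this->data_.front();
    this->data_.pop_front();
    return value;
  }
  bool empty() const { return this->data_.empty(); }
};

// Write-only view: a task that takes WriteEnd<T>& can only produce data.
template <typename T>
class WriteEnd : public virtual FifoStorage<T> {
 public:
  void write(const T& value) { this->data_.push_back(value); }
};

// The channel declared in the parent binds to either view.
template <typename T>
class Fifo : public ReadEnd<T>, public WriteEnd<T> {};

// Directional task signatures, in the spirit of load_input and store_result.
void produce(WriteEnd<int>& out, int n) {
  for (int i = 0; i < n; i++) out.write(i);
}

int consume_sum(ReadEnd<int>& in, int n) {
  int sum = 0;
  for (int i = 0; i < n; i++) sum += in.read();
  return sum;
}
```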
Note
There is no need to specify the stream depth here, and the streams are passed by reference.
Similarly, replace pointers to external memory with tapa::mmap<DATA_TYPE>:
void store_result(
- hls::vector<uint32_t, NUM_WORDS>* out,
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
+ tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out,
+ tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
int size
) {
// ...
}
void load_input(
- hls::vector<uint32_t, NUM_WORDS>* in,
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& inStream,
+ tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in,
+ tapa::ostream<hls::vector<uint32_t, NUM_WORDS>>& inStream,
int size
) {
// ...
}
Note
The tapa::mmap<DATA_TYPE> is passed by value, without * or &.
Note
In store_result, the code reads from out_stream, so it is actually a tapa::istream. Likewise, load_input writes to inStream, so it is actually a tapa::ostream.
Don't be confused by the stream names.
Final Code for Example 1
// Copyright (c) 2024 RapidStream Design Automation, Inc. and contributors.
// All rights reserved. The contributor(s) of this file has/have agreed to the
// RapidStream Contributor License Agreement.
#include <hls_vector.h>
#include <tapa.h>
#include "assert.h"
#define MEMORY_DWIDTH 512
#define SIZEOF_WORD 4
#define NUM_WORDS ((MEMORY_DWIDTH) / (8 * SIZEOF_WORD))
#define DATA_SIZE 4096
void load_input(tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in,
                tapa::ostream<hls::vector<uint32_t, NUM_WORDS>>& inStream,
                int Size) {
  for (int i = 0; i < Size; i++) {
#pragma HLS pipeline II = 1
    inStream << in[i];
  }
}

void compute_add(tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& in1_stream,
                 tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& in2_stream,
                 tapa::ostream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
                 int Size) {
  for (int i = 0; i < Size; i++) {
#pragma HLS pipeline II = 1
    out_stream << (in1_stream.read() + in2_stream.read());
  }
}

void store_result(tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out,
                  tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
                  int Size) {
  for (int i = 0; i < Size; i++) {
#pragma HLS pipeline II = 1
    out[i] = out_stream.read();
  }
}

extern "C" {

void vadd(tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in1,
          tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in2,
          tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out, int size) {
  tapa::stream<hls::vector<uint32_t, NUM_WORDS>> in1_stream("input_stream_1");
  tapa::stream<hls::vector<uint32_t, NUM_WORDS>> in2_stream("input_stream_2");
  tapa::stream<hls::vector<uint32_t, NUM_WORDS>> out_stream("output_stream");

  tapa::task()
      .invoke(load_input, in1, in1_stream, size)
      .invoke(load_input, in2, in2_stream, size)
      .invoke(compute_add, in1_stream, in2_stream, out_stream, size)
      .invoke(store_result, out, out_stream, size);
}

}
Note
By updating the includes, top function, stream definitions, task invocations, and task definitions, we’ve successfully migrated the vector addition example from Vitis HLS to TAPA.
Example 2: Complex Scenarios
In this tutorial, we’ll explore more advanced migration scenarios, focusing on dataflow in loops and computation in the top function, which are not covered in Example 1 and require additional attention.
Scenario 1: Dataflow in a Loop
TAPA enforces a strict separation between communication structures and computing units. This means that the “dataflow-in-a-loop” coding style in Vitis HLS is not directly supported in TAPA.
Warning
The compiler enforces that a task that instantiates other tasks contains only stream definitions and task invocations.
In the following Vitis HLS example, the dataflow region is defined inside a loop so that it executes for multiple iterations. This is not allowed in TAPA, because it would introduce additional logic that hinders the quality of results.
// before
for (int i = 0; i < size; i++) { // this loop is invalid in TAPA
#pragma HLS dataflow
  load_input(...);
  compute_add(...);
  // ...
}

void load_input(
  // ...
) {
  foo(); bar();
}
For the code to be compatible with TAPA, we push the loop into the tasks:
// after
// for (int i = 0; i < size; i++) {
tapa::task()
    .invoke(load_input, in1, in1_stream, size)
    .invoke(load_input, in2, in2_stream, size)
    .invoke(compute_add, in1_stream, in2_stream, out_stream, size)
    .invoke(store_result, out, out_stream, size)
    ;
// }

void load_input(
  // ...
) {
  for (int i = 0; i < size; i++) { // move the loop here
    foo(); bar();
  }
}
We remove the outer loop from the top function and move the loop into each task function. While this restriction may seem inconvenient, it helps ensure good timing quality in the generated hardware.
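To see why moving the loop into the tasks preserves the computation, the sketch below compares the two structures in plain sequential C++. The functions are hypothetical stand-ins. A std::queue plays the role of a stream, and the per-element work is a simple doubling. For simplicity, the "after" version runs its stages one after another with an unbounded queue, unlike the concurrent hardware. The sketch only shows that, for a per-element pipeline like this one, both forms produce the same results.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// "Before": the outer loop wraps the whole pipeline, one element per pass.
std::vector<int> loop_outside(const std::vector<int>& in) {
  std::vector<int> out;
  for (int i = 0; i < static_cast<int>(in.size()); i++) {
    std::queue<int> stream;
    stream.push(in[i]);                 // load stage
    out.push_back(stream.front() * 2);  // compute stage
    stream.pop();
  }
  return out;
}

// "After": each stage owns its loop, and the stages communicate through a stream.
void load_stage(const std::vector<int>& in, std::queue<int>& stream, int size) {
  for (int i = 0; i < size; i++) stream.push(in[i]);
}

void compute_stage(std::queue<int>& stream, std::vector<int>& out, int size) {
  for (int i = 0; i < size; i++) {
    out.push_back(stream.front() * 2);
    stream.pop();
  }
}

std::vector<int> loop_inside(const std::vector<int>& in) {
  std::queue<int> stream;
  std::vector<int> out;
  int size = static_cast<int>(in.size());
  load_stage(in, stream, size);
  compute_stage(stream, out, size);
  return out;
}
```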
Scenario 2: Computation in the Top Function
TAPA aims to strictly separate communication and computation for performance and quality of results. Therefore, computation should not be performed in the top function.
In the following example, the top function performs computation in a dataflow region. This is not allowed in TAPA, and the computation should be pushed into child tasks:
// before
size /= NUM_WORDS;
#pragma HLS dataflow
load_input(in1, in1_stream, size);
load_input(in2, in2_stream, size);
compute_add(in1_stream, in2_stream, out_stream, size);
store_result(out, out_stream, size);
While size /= NUM_WORDS seems trivial, it is not allowed in TAPA, because it introduces computation in the top function. We need to move the computation into the child tasks:
// after
tapa::task()
    .invoke(load_input, in1, in1_stream, size)
    .invoke(load_input, in2, in2_stream, size)
    .invoke(compute_add, in1_stream, in2_stream, out_stream, size)
    .invoke(store_result, out, out_stream, size)
    ;

void load_input(
    tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in,
    tapa::ostream<hls::vector<uint32_t, NUM_WORDS>>& inStream,
    int size
) {
  size /= NUM_WORDS; // move the computation here
  for (int i = 0; i < size; i++) {
#pragma HLS pipeline II=1
    inStream << in[i];
  }
}
TAPA requires the top function to focus solely on task invocation and communication structure setup.
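As a worked example of the value being moved: with MEMORY_DWIDTH = 512 and SIZEOF_WORD = 4, NUM_WORDS is 512 / (8 * 4) = 16. Assuming size counts 32-bit elements, as the division suggests, the DATA_SIZE value of 4096 becomes 4096 / 16 = 256 vector transfers. The sketch below checks this arithmetic at compile time. num_vector_words is a hypothetical helper name. Because each task now performs the division itself, every task must apply the same formula to the same size argument, or their loop trip counts would disagree.

```cpp
#include <cassert>

// Same macros as the tutorial code.
#define MEMORY_DWIDTH 512
#define SIZEOF_WORD 4
#define NUM_WORDS ((MEMORY_DWIDTH) / (8 * SIZEOF_WORD))
#define DATA_SIZE 4096

// Hypothetical helper: the computation each task now performs locally,
// converting a count of 32-bit elements into a count of vector transfers.
constexpr int num_vector_words(int size) { return size / NUM_WORDS; }

static_assert(NUM_WORDS == 16, "a 512-bit bus holds sixteen 32-bit words");
static_assert(num_vector_words(DATA_SIZE) == 256, "4096 / 16 == 256");
```

Integer division truncates, so a size that is not a multiple of 16 leaves its trailing elements unprocessed. This matches the original HLS code.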
Final Code for Example 2
// Copyright (c) 2024 RapidStream Design Automation, Inc. and contributors.
// All rights reserved. The contributor(s) of this file has/have agreed to the
// RapidStream Contributor License Agreement.
#include <hls_vector.h>
#include <tapa.h>
#include "assert.h"
#define MEMORY_DWIDTH 512
#define SIZEOF_WORD 4
#define NUM_WORDS ((MEMORY_DWIDTH) / (8 * SIZEOF_WORD))
#define DATA_SIZE 4096
void load_input(tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in,
                tapa::ostream<hls::vector<uint32_t, NUM_WORDS>>& inStream,
                int size) {
  size /= NUM_WORDS;
  for (int i = 0; i < size; i++) {
#pragma HLS pipeline II = 1
    inStream << in[i];
  }
}

void compute_add(tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& in1_stream,
                 tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& in2_stream,
                 tapa::ostream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
                 int size) {
  size /= NUM_WORDS;
  for (int i = 0; i < size; i++) {
#pragma HLS pipeline II = 1
    out_stream << (in1_stream.read() + in2_stream.read());
  }
}

void store_result(tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out,
                  tapa::istream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
                  int size) {
  size /= NUM_WORDS;
  for (int i = 0; i < size; i++) {
#pragma HLS pipeline II = 1
    out[i] = out_stream.read();
  }
}

extern "C" {

void vadd(tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in1,
          tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in2,
          tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out, int size) {
  tapa::stream<hls::vector<uint32_t, NUM_WORDS>> in1_stream("input_stream_1");
  tapa::stream<hls::vector<uint32_t, NUM_WORDS>> in2_stream("input_stream_2");
  tapa::stream<hls::vector<uint32_t, NUM_WORDS>> out_stream("output_stream");

  tapa::task()
      .invoke(load_input, in1, in1_stream, size)
      .invoke(load_input, in2, in2_stream, size)
      .invoke(compute_add, in1_stream, in2_stream, out_stream, size)
      .invoke(store_result, out, out_stream, size);
}

}
Note
TAPA requires a strict separation between communication structures and computing units. Pushing the loop and the computation into child tasks keeps the top function purely structural, which helps TAPA achieve good timing quality in the generated hardware.
Example 3: HLS-Compat Helpers
HLS-Compat Helpers are designed to bridge the gap between Vitis HLS and TAPA, allowing for incremental migration and verification. These helpers provide HLS-compatible behavior while using TAPA coding style.
Warning
These helpers are intended only for software simulation and are not synthesizable. They exist because migrating existing HLS code to TAPA can take considerable effort. The HLS-Compat helpers let you migrate incrementally and verify the correctness of the code at each step using software simulation.
We start from the HLS code of Example 1: Basics with VecAdd to demonstrate the usage of HLS-Compat helpers.
Step 1: Include the Compat Header
To use the HLS-Compat helpers, include tapa/host/compat.h in addition to tapa.h.
#include <hls_vector.h>
-#include <hls_stream.h>
+#include <tapa.h>
+#include <tapa/host/compat.h>
#include "assert.h"
Step 2: Use Infinite-Depth Streams
In Vitis HLS's software simulation, hls::stream has infinite depth. While this simplifies development, it does not match the hardware behavior. TAPA takes a different approach and enforces a fixed stream depth in simulation, as the hardware does. This is usually desirable, because it helps catch potential issues early.
However, infinite-depth streams can still be useful during development for migration and verification.
The HLS-Compat helpers provide tapa::hls_compat::stream, which behaves like hls::stream in software simulation.
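The difference matters most when tasks run one after another, as they do in Vitis HLS's sequential software simulation. In the standard C++ sketch below, BoundedQueue and run_producer_then_consumer are hypothetical names. A producer writes all its items before the consumer starts. With an unbounded queue this completes. With a depth-2 queue, the producer gets stuck on its third write, which would be a deadlock in hardware. By a convention chosen only for this sketch, a depth of 0 means "unbounded".

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical FIFO model; max_depth == 0 means unbounded, like hls::stream
// in Vitis HLS software simulation.
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t max_depth) : max_depth_(max_depth) {}

  bool try_write(int value) {
    if (max_depth_ != 0 && data_.size() >= max_depth_) return false;
    data_.push_back(value);
    return true;
  }

  bool try_read(int& value) {
    if (data_.empty()) return false;
    value = data_.front();
    data_.pop_front();
    return true;
  }

 private:
  std::size_t max_depth_;
  std::deque<int> data_;
};

// Runs the producer to completion, then the consumer, mimicking sequential
// scheduling. Returns false if the producer gets stuck on a full queue.
bool run_producer_then_consumer(BoundedQueue& q, int n, int& sum) {
  for (int i = 0; i < n; i++) {
    if (!q.try_write(i)) return false;  // would block forever: deadlock
  }
  sum = 0;
  for (int i = 0; i < n; i++) {
    int v = 0;
    q.try_read(v);
    sum += v;
  }
  return true;
}
```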
- hls::stream<hls::vector<uint32_t, NUM_WORDS>> in1_stream("input_stream_1");
- hls::stream<hls::vector<uint32_t, NUM_WORDS>> in2_stream("input_stream_2");
- hls::stream<hls::vector<uint32_t, NUM_WORDS>> out_stream("output_stream");
+ tapa::hls_compat::stream<hls::vector<uint32_t, NUM_WORDS>> in1_stream("input_stream_1");
+ tapa::hls_compat::stream<hls::vector<uint32_t, NUM_WORDS>> in2_stream("input_stream_2");
+ tapa::hls_compat::stream<hls::vector<uint32_t, NUM_WORDS>> out_stream("output_stream");
Step 3: Use Direction-Agnostic Streams
HLS uses hls::stream& for both stream input and output; TAPA, however, requires tapa::istream& for input streams and tapa::ostream& for output streams. tapa::hls_compat::stream_interface& is the HLS-Compat equivalent of tapa::istream& and tapa::ostream& that exposes the APIs of both in software simulation.
void compute_add(
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& in1_stream,
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& in2_stream,
- hls::stream<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
+ tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>& in1_stream,
+ tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>& in2_stream,
+ tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>& out_stream,
int size
) {
Step 4: Schedule Tasks Sequentially
Vitis HLS schedules dataflow tasks sequentially in software simulation. TAPA's tapa::task(), however, schedules them in parallel by default to accelerate simulation and mimic the hardware behavior. tapa::hls_compat::task() is the HLS-Compat equivalent of tapa::task() that schedules tasks sequentially in the order of invocation.
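Conversely, a fixed-depth FIFO works well once the tasks can make progress in an interleaved fashion. The sketch below is a hypothetical round-robin scheduler written in standard C++. It is not TAPA's runtime. In each round, the producer writes one item if there is room, and the consumer reads one item if one is available. As a result, 8 items pass through a depth-2 FIFO without stalling. Running the same two tasks strictly in order would get stuck on the third write, which is why sequential scheduling goes together with infinite-depth streams.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical fixed-depth FIFO.
struct DepthFifo {
  std::size_t depth;
  std::deque<int> data;
  bool full() const { return data.size() >= depth; }
  bool empty() const { return data.empty(); }
};

// Interleaves a producer (writes 0..n-1) and a consumer (sums what it reads),
// one step each per round. Returns the consumer's sum; `rounds` reports how
// many scheduling rounds were executed.
int run_interleaved(std::size_t depth, int n, int& rounds) {
  DepthFifo fifo{depth, {}};
  int produced = 0, consumed = 0, sum = 0;
  rounds = 0;
  while (consumed < n) {
    bool progress = false;
    if (produced < n && !fifo.full()) {  // producer step
      fifo.data.push_back(produced++);
      progress = true;
    }
    if (!fifo.empty()) {  // consumer step
      sum += fifo.data.front();
      fifo.data.pop_front();
      consumed++;
      progress = true;
    }
    rounds++;
    if (!progress) break;  // no task can advance: deadlock
  }
  return sum;
}
```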
- load_input(in1, in1_stream, size);
- load_input(in2, in2_stream, size);
- compute_add(in1_stream, in2_stream, out_stream, size);
- store_result(out, out_stream, size);
+ tapa::hls_compat::task()
+ .invoke(load_input, in1, in1_stream, size)
+ .invoke(load_input, in2, in2_stream, size)
+ .invoke(compute_add, in1_stream, in2_stream, out_stream, size)
+ .invoke(store_result, out, out_stream, size)
+ ;
Warning
Remember that this helper is only for software simulation and is not synthesizable. If sequential scheduling is desired in hardware, use tapa::task() and pass a token between tasks to signal completion and enforce the order of execution.
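A token is simply a small message on a dedicated stream. The upstream task writes it when it finishes, and the downstream task reads it before starting, so the read acts as a barrier. The standard C++ sketch below models this handshake under a hypothetical round-robin scheduler. TokenDemo, step_a, step_b, and run_with_token are illustrative names, not TAPA APIs. Although the scheduler offers stage B a turn first in every round, stage B does no work until stage A's token arrives.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical cooperative model of two tasks ordered by a completion token.
struct TokenDemo {
  std::deque<bool> token;        // dedicated token stream
  std::vector<std::string> log;  // records which stage did work, in order
  int a_steps = 0;
  int b_steps = 0;
  bool b_started = false;

  // Stage A: does its work, then writes one token to signal completion.
  bool step_a(int work) {
    if (a_steps < work) {
      log.push_back("A");
      a_steps++;
      if (a_steps == work) token.push_back(true);
      return true;
    }
    return false;
  }

  // Stage B: makes no progress until it can read the token.
  bool step_b(int work) {
    if (!b_started) {
      if (token.empty()) return false;  // wait for stage A
      token.pop_front();                // consume the token
      b_started = true;
    }
    if (b_steps < work) {
      log.push_back("B");
      b_steps++;
      return true;
    }
    return false;
  }
};

// Round-robin scheduler: offers both stages a turn until neither can advance.
std::vector<std::string> run_with_token(int work) {
  TokenDemo demo;
  bool progress = true;
  while (progress) {
    bool moved_b = demo.step_b(work);  // B is deliberately offered a turn first
    bool moved_a = demo.step_a(work);
    progress = moved_a || moved_b;
  }
  return demo.log;
}
```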
HLS-Compat Version of Example 1
// Copyright (c) 2024 RapidStream Design Automation, Inc. and contributors.
// All rights reserved. The contributor(s) of this file has/have agreed to the
// RapidStream Contributor License Agreement.
#include <hls_vector.h>
#include <tapa.h>
#include <tapa/host/compat.h>
#include "assert.h"
#define MEMORY_DWIDTH 512
#define SIZEOF_WORD 4
#define NUM_WORDS ((MEMORY_DWIDTH) / (8 * SIZEOF_WORD))
#define DATA_SIZE 4096
void load_input(
    tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in,
    tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>&
        inStream,
    int Size) {
  for (int i = 0; i < Size; i++) {
#pragma HLS pipeline II = 1
    inStream << in[i];
  }
}

void compute_add(
    tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>&
        in1_stream,
    tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>&
        in2_stream,
    tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>&
        out_stream,
    int Size) {
  for (int i = 0; i < Size; i++) {
#pragma HLS pipeline II = 1
    out_stream << (in1_stream.read() + in2_stream.read());
  }
}

void store_result(
    tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out,
    tapa::hls_compat::stream_interface<hls::vector<uint32_t, NUM_WORDS>>&
        out_stream,
    int Size) {
  for (int i = 0; i < Size; i++) {
#pragma HLS pipeline II = 1
    out[i] = out_stream.read();
  }
}

extern "C" {

void vadd(tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in1,
          tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> in2,
          tapa::mmap<hls::vector<uint32_t, NUM_WORDS>> out, int size) {
  tapa::hls_compat::stream<hls::vector<uint32_t, NUM_WORDS>> in1_stream(
      "input_stream_1");
  tapa::hls_compat::stream<hls::vector<uint32_t, NUM_WORDS>> in2_stream(
      "input_stream_2");
  tapa::hls_compat::stream<hls::vector<uint32_t, NUM_WORDS>> out_stream(
      "output_stream");

  tapa::hls_compat::task()
      .invoke(load_input, in1, in1_stream, size)
      .invoke(load_input, in2, in2_stream, size)
      .invoke(compute_add, in1_stream, in2_stream, out_stream, size)
      .invoke(store_result, out, out_stream, size);
}

}
Warning
tapa::hls_compat APIs are for software simulation only and are NOT synthesizable.
You must finish the migration before synthesis. This includes removing #include <tapa/host/compat.h> and replacing every tapa::hls_compat API with its synthesizable equivalent.