C++ API

This page documents the TAPA C++ library (#include <tapa.h>). Types and functions live in the tapa namespace unless noted otherwise.

Task Invocation

`tapa::task`

The task hierarchy builder. An upper-level task constructs a tapa::task and chains .invoke() calls on it. The tapa::task destructor waits for all joined child instances to finish before returning.

struct task {
  // Invoke func with the given arguments using the default join mode.
  template <typename Func, typename... Args>
  task& invoke(Func&& func, Args&&... args);

  // Invoke func with an explicit mode (tapa::join or tapa::detach).
  template <internal::InvokeMode mode, typename Func, typename... Args>
  task& invoke(Func&& func, Args&&... args);

  // Invoke func N times with the given mode.
  template <internal::InvokeMode mode, int N, typename Func, typename... Args>
  task& invoke(Func&& func, Args&&... args);
};

Invoke modes:

Mode	Behavior
`tapa::join` (default)	The task runs concurrently with siblings; the parent waits for it to finish before returning.
`tapa::detach`	Fire-and-forget; the parent does not wait for the task to finish. Use with care — the parent may return before the detached task completes.

Example:

void Top(tapa::istream<float>& in, tapa::ostream<float>& out, int n) {
  tapa::task()
      .invoke(LoadData, in, n)
      .invoke<tapa::detach>(MonitorTask, n)
      .invoke(StoreData, out, n);
}

`tapa::seq`

A sequential index generator. When tapa::seq{} is passed as an argument to .invoke() with a repeat count N, each invocation receives a unique integer (0, 1, 2, …, N−1). Use this to distribute indexed work across task instances, such as assigning each instance its slice of a stream array.

tapa::streams<float, 4> channels;
tapa::task().invoke<tapa::join, 4>(Worker, channels, tapa::seq{});
// Worker instance 0 gets channels[0], instance 1 gets channels[1], etc.
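
The index distribution described above can be pictured with a small standalone model. This is a sketch in plain C++, not TAPA's implementation: SeqModel, InvokeRepeated, and IndicesForFourInstances are hypothetical names, and the sequential loop stands in for N task instances that would run concurrently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for tapa::seq: hands out 0, 1, 2, ... on each call.
struct SeqModel {
  int next = 0;
  int operator()() { return next++; }
};

// Hypothetical stand-in for .invoke<mode, N>(func, ..., tapa::seq{}):
// calls func n times, passing each call the next index from the generator.
template <typename Func>
void InvokeRepeated(int n, Func func) {
  SeqModel seq;
  for (int i = 0; i < n; ++i) func(seq());
}

// Collects the indices that four modeled instances would receive.
inline std::vector<int> IndicesForFourInstances() {
  std::vector<int> seen;
  InvokeRepeated(4, [&seen](int index) { seen.push_back(index); });
  return seen;
}
```

In this model, IndicesForFourInstances() yields one distinct index per instance, which is the property a worker relies on when it selects its own element of a stream array.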

`tapa::executable`

Wraps a path to an XO or bitstream file for use in .invoke(). When an executable is passed as the second argument to .invoke(), the task runs on hardware (via FRT) instead of in software simulation.

class executable {
 public:
  explicit executable(std::string path);
};

Usage:

tapa::task().invoke(MyKernel, tapa::executable("my_kernel.xo"), arg1, arg2);

Streams

Streams are the fundamental inter-task communication primitive. Each stream is a fixed-depth FIFO. Blocking operations stall until data or space is available; non-blocking operations return immediately.
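
The difference between blocking and non-blocking operations can be illustrated with a single-threaded model of a fixed-depth FIFO. The sketch below is an assumption-laden illustration rather than TAPA code: FifoModel is a hypothetical class that borrows the method names documented on this page and models only the non-blocking calls. A blocking call would wait at exactly the points where these return false.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Single-threaded model of a fixed-depth FIFO. Only the non-blocking
// operations are modeled; a blocking call would wait where these return false.
template <typename T, std::size_t Depth = 2>
class FifoModel {
 public:
  bool empty() const { return items_.empty(); }
  bool full() const { return items_.size() == Depth; }

  // Non-blocking write: fails instead of stalling when the FIFO is full.
  bool try_write(const T& val) {
    if (full()) return false;
    items_.push_back(val);
    return true;
  }

  // Non-blocking, destructive read: fails when the FIFO is empty.
  bool try_read(T& val) {
    if (empty()) return false;
    val = items_.front();
    items_.pop_front();
    return true;
  }

  // Non-blocking, non-destructive read of the head element.
  bool try_peek(T& val) const {
    if (empty()) return false;
    val = items_.front();
    return true;
  }

 private:
  std::deque<T> items_;
};
```

With the default depth of 2, a third try_write fails until a read frees a slot. That is the backpressure a producer task experiences when its consumer falls behind.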

`tapa::stream<T, Depth>`

Bidirectional FIFO that owns the underlying storage. Declared inside an upper-level task and passed to child tasks as istream<T>& (read end) or ostream<T>& (write end). The default depth is 2.

template <typename T, uint64_t Depth = 2>
class stream;

`tapa::istream<T>`

Read-only view of a stream. Always passed by reference in task signatures: tapa::istream<T>&.

Method	Blocking	Destructive	Description
`read()`	yes	yes	Blocks until an element is available, then returns it.
`read(bool& ok)`	no	yes	Non-blocking read; sets `ok` to true if an element was consumed.
`try_read(T& val)`	no	yes	Non-blocking read; returns true and writes to `val` if successful.
`peek(bool& ok)`	no	no	Returns the next element without consuming it; sets `ok`.
`try_peek(T& val)`	no	no	Non-blocking peek; returns true if data was available.
`empty()`	no	no	Returns true if the stream contains no elements.
`eot(bool& ok)`	no	no	Returns true if the head element is an end-of-transaction marker; sets `ok` to indicate whether a head element was available.
`open()`	yes	yes	Blocks until an EoT marker arrives, then consumes it. Used to receive stream closure.
`try_open()`	no	yes	Non-blocking variant of `open()`; returns true if EoT was consumed.

`tapa::ostream<T>`

Write-only view of a stream. Always passed by reference in task signatures: tapa::ostream<T>&.

Method	Blocking	Destructive	Description
`write(const T& val)`	yes	yes	Blocks until space is available, then writes `val`.
`try_write(const T& val)`	no	yes	Non-blocking write; returns true if the element was written.
`full()`	no	no	Returns true if the stream is full.
`close()`	yes	yes	Writes an end-of-transaction marker; blocks until space is available.
`try_close()`	no	yes	Non-blocking variant of `close()`; returns true if the EoT was written.

`tapa::streams<T, N, Depth>`

Array of N streams of type T, each with depth Depth. Declared in an upper-level task and unpacked by index when passed to child tasks.

`tapa::istreams<T, N>` / `tapa::ostreams<T, N>`

Array of N read-only or write-only stream views. Always passed by reference in task signatures.

Note

All stream types (istream, ostream, istreams, ostreams) must be passed by reference in task signatures. Passing by value is a compile error.

Memory (mmap)

`tapa::mmap<T>`

A pointer-like handle for synchronous bulk memory access. Backed by a contiguous host allocation. In a task signature, tapa::mmap<T> is passed by value.

template <typename T>
class mmap {
 public:
  explicit mmap(T* ptr);
  mmap(T* ptr, uint64_t size);
  template <typename Container>
  explicit mmap(Container& container);  // accepts std::vector etc.

  T* data() const;
  uint64_t size() const;

  template <uint64_t N>
  mmap<vec_t<T, N>> vectorized() const;  // reinterpret as wider element type

  template <typename U>
  mmap<U> reinterpret() const;  // reinterpret element type
};

`tapa::async_mmap<T>`

Decoupled memory access type. Instead of blocking on each memory operation, the kernel issues read/write requests and collects responses through five FIFO channels. This allows the kernel to pipeline memory operations. Passed by reference in task signatures: tapa::async_mmap<T>&.

See async_mmap channels below for channel details.

`tapa::mmaps<T, N>`

Array of N tapa::mmap<T> regions. Passed by value as a single argument and unpacked by the framework one region per child invocation.

template <typename T, uint64_t N>
class mmaps;

Directional mmap wrappers (host-side only)

Used in the top-level tapa::invoke() call to express direction hints. The kernel task signature uses plain tapa::mmap<T> or tapa::mmaps<T, N>.

Wrapper	Direction
`tapa::read_only_mmap<T>`	Host writes, kernel reads
`tapa::write_only_mmap<T>`	Kernel writes, host reads
`tapa::read_write_mmap<T>`	Both read and write
`tapa::placeholder_mmap<T>`	No direction hint
`tapa::read_only_mmaps<T, N>`	Array variant of `read_only_mmap`
`tapa::write_only_mmaps<T, N>`	Array variant of `write_only_mmap`
`tapa::read_write_mmaps<T, N>`	Array variant of `read_write_mmap`

`tapa::aligned_allocator<T>`

STL-compatible allocator that returns page-aligned memory suitable for DMA transfers. Use this with std::vector when allocating host buffers that will be passed to a kernel.

std::vector<float, tapa::aligned_allocator<float>> buf(n);
tapa::invoke(MyKernel, bitstream, tapa::read_only_mmap<float>(buf), n);
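
To show what "page-aligned" means for an STL allocator, here is a minimal sketch of one. It is not TAPA's implementation: PageAlignedAllocator and StartsOnPageBoundary are hypothetical names, the 4096-byte page size is an assumption, and posix_memalign restricts the sketch to POSIX systems.

```cpp
#include <stdlib.h>  // posix_memalign, free (POSIX)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Assumed page size; real code should query the system (e.g. via sysconf).
constexpr std::size_t kAssumedPageSize = 4096;

// Minimal page-aligned allocator sketch. Not TAPA's code.
template <typename T>
struct PageAlignedAllocator {
  using value_type = T;

  PageAlignedAllocator() = default;
  template <typename U>
  PageAlignedAllocator(const PageAlignedAllocator<U>&) {}

  T* allocate(std::size_t n) {
    void* ptr = nullptr;
    // posix_memalign returns 0 on success and stores an aligned pointer.
    if (posix_memalign(&ptr, kAssumedPageSize, n * sizeof(T)) != 0) {
      throw std::bad_alloc();
    }
    return static_cast<T*>(ptr);
  }

  void deallocate(T* ptr, std::size_t) { free(ptr); }
};

// All instances are interchangeable because the allocator is stateless.
template <typename T, typename U>
bool operator==(const PageAlignedAllocator<T>&,
                const PageAlignedAllocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const PageAlignedAllocator<T>&,
                const PageAlignedAllocator<U>&) {
  return false;
}

// Returns true if the vector's storage starts on a page boundary.
inline bool StartsOnPageBoundary(
    const std::vector<float, PageAlignedAllocator<float>>& buf) {
  return reinterpret_cast<std::uintptr_t>(buf.data()) % kAssumedPageSize == 0;
}
```

A std::vector using this allocator behaves like any other vector. The only observable difference is the address of its storage.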

async_mmap Channels

tapa::async_mmap<T> exposes five public member channels. The kernel writes addresses to the request channels and reads results from the response channels. As with ordinary streams, the operations prefixed with try_ are non-blocking.

Channel	Type	Direction	Description
`read_addr`	`ostream<int64_t>`	kernel → memory	Write an element index to request a read. The framework converts the index to a byte offset internally.
`read_data`	`istream<T>`	memory → kernel	Read the data returned by a previously issued read request.
`write_addr`	`ostream<int64_t>`	kernel → memory	Write an element index to request a write.
`write_data`	`ostream<T>`	kernel → memory	Write the data to be written at the requested address.
`write_resp`	`istream<uint8_t>`	memory → kernel	Drain write-completion acknowledgements. Each response value encodes `burst_length - 1` (i.e., a value of 0 means one write completed, 255 means 256 writes completed).

Warning

The kernel must drain write_resp to avoid deadlock. If the response channel fills up, the memory subsystem stops issuing further write completions and the kernel stalls.
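
The write_resp encoding is easy to get wrong by one, so the arithmetic is sketched here as plain C++. WritesAcknowledged and TotalWritesAcknowledged are hypothetical helper names used only for illustration. A kernel would apply the same addition to each value it drains from write_resp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each write_resp value encodes burst_length - 1, so one response
// acknowledges (value + 1) writes.
inline int WritesAcknowledged(std::uint8_t resp) {
  return static_cast<int>(resp) + 1;
}

// Totals the writes acknowledged by a sequence of drained responses.
inline int TotalWritesAcknowledged(const std::vector<std::uint8_t>& responses) {
  int total = 0;
  for (std::uint8_t resp : responses) total += WritesAcknowledged(resp);
  return total;
}
```

A kernel that issued n writes can treat the transfer as complete once the running total reaches n.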

Typical async_mmap read pattern:

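void Reader(tapa::async_mmap<float>& mem, tapa::ostream<float>& out, int n) {
  // i_req counts issued requests; i_resp counts received responses.
  for (int i_req = 0, i_resp = 0; i_resp < n;) {
#pragma HLS pipeline II=1
    // Issue the next read request if the address channel has space.
    if (i_req < n && !mem.read_addr.full()) {
      mem.read_addr.write(i_req);
      ++i_req;
    }
    // Collect a response if one is ready and forward it downstream.
    float val;
    if (mem.read_data.try_read(val)) {
      out.write(val);
      ++i_resp;
    }
  }
}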
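
To see why decoupling requests from responses helps, the same loop shape can be run against a toy memory in plain C++. Everything here is an illustrative assumption: MemoryModel and ReadAll are hypothetical names, the memory answers every request after a fixed number of loop iterations, and channel capacity is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Toy memory: answers each read request a fixed number of cycles later.
class MemoryModel {
 public:
  MemoryModel(std::vector<float> data, int latency)
      : data_(std::move(data)), latency_(latency) {}

  // Request side (plays the role of read_addr).
  void Request(std::int64_t index) { in_flight_.push_back({index, latency_}); }

  // Advances one cycle; requests complete in the order they were issued.
  void Tick() {
    for (auto& req : in_flight_) {
      if (req.cycles_left > 0) --req.cycles_left;
    }
    while (!in_flight_.empty() && in_flight_.front().cycles_left == 0) {
      ready_.push_back(data_[static_cast<std::size_t>(in_flight_.front().index)]);
      in_flight_.pop_front();
    }
  }

  // Response side (plays the role of read_data.try_read).
  bool TryRead(float& val) {
    if (ready_.empty()) return false;
    val = ready_.front();
    ready_.pop_front();
    return true;
  }

 private:
  struct Pending {
    std::int64_t index;
    int cycles_left;
  };
  std::vector<float> data_;
  int latency_;
  std::deque<Pending> in_flight_;
  std::deque<float> ready_;
};

// Same loop shape as Reader: issue at most one request and collect at most
// one response per iteration. Reports the iteration count through cycles.
inline std::vector<float> ReadAll(MemoryModel& mem, int n, int& cycles) {
  std::vector<float> out;
  cycles = 0;
  for (int i_req = 0, i_resp = 0; i_resp < n; ++cycles) {
    if (i_req < n) mem.Request(i_req++);
    float val;
    if (mem.TryRead(val)) {
      out.push_back(val);
      ++i_resp;
    }
    mem.Tick();
  }
  return out;
}
```

In this model, reading 4 elements with a latency of 3 takes 7 iterations: the latency is paid once and then overlapped with later requests, rather than paid again for every element.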

Utilities

`tapa::vec_t<T, N>`

An N-element SIMD vector of type T. Stores elements as a packed bit array, which maps directly to wide AXI ports. Supports element access via operator[], element-wise arithmetic operators, and common reductions (sum, product).

template <typename T, int N>
struct vec_t {
  static constexpr int length = N;
  static constexpr int width = widthof<T>() * N;  // total bit width

  T& operator[](int pos);
  const T& operator[](int pos) const;
};

Related free functions: truncated<begin, end>(vec), cat(v1, v2), make_vec<N>(val).
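
The element-wise and reduction semantics can be sketched with a simplified stand-in. VecModel and Sum below are hypothetical names, and the model uses std::array storage instead of a packed bit array, so it illustrates behavior only and says nothing about TAPA's actual layout or function signatures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Simplified stand-in for an N-element vector; uses std::array storage
// rather than a packed bit array.
template <typename T, std::size_t N>
struct VecModel {
  std::array<T, N> elems{};
  T& operator[](std::size_t pos) { return elems[pos]; }
  const T& operator[](std::size_t pos) const { return elems[pos]; }
};

// Element-wise addition: result[i] = a[i] + b[i].
template <typename T, std::size_t N>
VecModel<T, N> operator+(const VecModel<T, N>& a, const VecModel<T, N>& b) {
  VecModel<T, N> result;
  for (std::size_t i = 0; i < N; ++i) result[i] = a[i] + b[i];
  return result;
}

// Sum reduction over all elements.
template <typename T, std::size_t N>
T Sum(const VecModel<T, N>& v) {
  T total{};
  for (std::size_t i = 0; i < N; ++i) total += v[i];
  return total;
}
```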

`tapa::widthof<T>()`

Returns the bit width of type T. For ap_int<W> and ap_uint<W>, returns W. For plain C++ types, returns sizeof(T) * CHAR_BIT.

template <typename T>
inline constexpr int widthof();

template <typename T>
inline constexpr int widthof(T object);  // deduce T from argument
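
The two width rules can be modeled with a trait and a partial specialization. This is a sketch under stated assumptions, not the library's code: FakeApUint is a hypothetical placeholder for an arbitrary-precision type such as ap_uint<W>, and WidthOf and VecWidth are illustrative names.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for an arbitrary-precision integer such as ap_uint<W>.
template <int W>
struct FakeApUint {};

// Plain C++ types: width is the storage size in bits.
template <typename T>
struct WidthOf {
  static constexpr int value = sizeof(T) * CHAR_BIT;
};

// Arbitrary-precision model: width is the declared W, not the storage size.
template <int W>
struct WidthOf<FakeApUint<W>> {
  static constexpr int value = W;
};

// Total bit width of an N-element vector of T, as in vec_t<T, N>::width.
template <typename T, int N>
constexpr int VecWidth() {
  return WidthOf<T>::value * N;
}
```

Under this model, a vector of sixteen 32-bit elements is 512 bits wide, a common width for a wide memory port.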

EoT macros

End-of-transaction macros simplify consuming a stream until a sentinel marker is received.

Macro	Description
`TAPA_WHILE_NOT_EOT(stream)`	Loop body executes once per data element; loop exits when the EoT marker is seen.
`TAPA_WHILE_NEITHER_EOT(s1, s2)`	Two-stream variant; exits when either stream reaches EoT.
`TAPA_WHILE_NONE_EOT(s1, s2, s3)`	Three-stream variant.

// Example: consume all elements from 'in' and forward to 'out'
TAPA_WHILE_NOT_EOT(in) {
  out.write(in.read());
}
in.open();   // consume the EoT marker
out.close(); // send EoT marker downstream
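
The forwarding pattern above can be modeled without TAPA by treating the EoT marker as a flag carried alongside each element. The sketch below is illustrative only: Slot, EotStreamModel, head_is_eot, and Forward are hypothetical names, the stream is unbounded and single-threaded, and the plain while loop stands in for TAPA_WHILE_NOT_EOT.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Each FIFO slot carries a payload plus an end-of-transaction flag.
struct Slot {
  int value;
  bool eot;
};

// Unbounded single-threaded stream model; real streams have a fixed depth.
struct EotStreamModel {
  std::deque<Slot> slots;

  void write(int value) { slots.push_back({value, false}); }
  void close() { slots.push_back({0, true}); }  // append the EoT marker

  // True if the head slot is the EoT marker. Requires a non-empty stream.
  bool head_is_eot() const { return slots.front().eot; }

  int read() {
    int value = slots.front().value;
    slots.pop_front();
    return value;
  }
  void open() { slots.pop_front(); }  // consume the EoT marker
};

// Forwards data until the marker is at the head, then passes the marker on.
inline void Forward(EotStreamModel& in, EotStreamModel& out) {
  while (!in.head_is_eot()) out.write(in.read());
  in.open();
  out.close();
}
```

The model makes the two final calls explicit: the loop stops with the marker still at the head of the input, so the marker has to be consumed separately and then re-emitted for the next task in the chain.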

Synthesis pragmas (C++ attributes)

These C++ attributes are recognised by TAPA and lowered to Vitis HLS pragmas during synthesis. They have no effect in software simulation.

Attribute	Description
`[[tapa::pipeline(II)]]`	Pipeline the enclosing loop or function with initiation interval `II`.
`[[tapa::unroll(factor)]]`	Unroll the enclosing loop by `factor`.
`[[tapa::target("ignore")]]`	Mark a task for custom RTL replacement. TAPA generates a port-signature template but does not synthesize the task body.

Note

[[tapa::target("ignore")]] was formerly written as [[tapa::target("non_synthesizable", "xilinx")]]. The "ignore" form is the current spelling.

`tapa::hls` sub-namespace

tapa::hls::stream<T> is a stream type that behaves like hls::stream<T> in software simulation: it has effectively infinite depth, so producers never block in simulation. Use it when incrementally migrating a Vitis HLS design and you want software simulation to pass without tuning stream depths. #include <tapa.h> includes this automatically.

Note

tapa::hls::stream synthesizes to the same RTL FIFO as tapa::stream<T, N> with the declared depth N. The infinite depth only applies to software simulation. The practical reason to replace it before hardware build is that software simulation with tapa::hls::stream will not expose backpressure bugs — switching to tapa::istream<T>& / tapa::ostream<T>& with a tuned depth catches those bugs at simulation time rather than on hardware.
