Tasks
Purpose: Understand TAPA's three task types and their constraints.
Prerequisites: The Programming Model
Why this exists
TAPA organizes an FPGA accelerator as a hierarchy of C++ functions called tasks. This hierarchy lets the compiler assign each leaf task to an independent HLS module synthesized in parallel, while upper-level tasks provide the wiring between those modules. The result is a design whose parallel structure is explicit in the source code rather than inferred from pragmas.
Mental model
A TAPA design forms a tree of tasks:
Top-level task (entry point, kernel boundary)
├── Upper-level task (orchestration only)
│ ├── Leaf task A (computation)
│ └── Leaf task B (computation)
└── Leaf task C (computation)
Each level has a distinct role:
- Leaf task — performs computation: loops, arithmetic, stream reads/writes. May call ordinary C++ functions. Must NOT invoke other TAPA tasks.
- Upper-level task — orchestrates execution. Its body may only instantiate
streams and invoke child tasks with
tapa::task().invoke(...). It contains no computation of its own. - Top-level task — the kernel entry point invoked from the host via
tapa::invoke. For thexilinx-vitistarget (the default), the top-level task must itself be an upper-level task.
Minimal correct example
The VecAdd function from the vector-add example is a top-level upper-level
task. It instantiates three streams, then invokes four leaf tasks:
void VecAdd(tapa::mmap<const float> a, tapa::mmap<const float> b,
tapa::mmap<float> c, uint64_t n) {
tapa::stream<float> a_q("a");
tapa::stream<float> b_q("b");
tapa::stream<float> c_q("c");
tapa::task()
.invoke(Mmap2Stream, a, n, a_q)
.invoke(Mmap2Stream, b, n, b_q)
.invoke(Add, a_q, b_q, c_q, n)
.invoke(Stream2Mmap, c_q, c, n);
}
Mmap2Stream, Add, and Stream2Mmap are leaf tasks that each perform a
specific computation. VecAdd contains no computation — only stream
declarations and .invoke(...) calls.
Detached tasks
By default a parent task waits for all child tasks to finish before it terminates. A detached task is instead left running; the parent does not wait for it. This is useful for purely data-driven tasks that have no natural termination point (e.g., a constant data source or an infinite switch network).
tapa::task().invoke<tapa::detach>(LeafTask, arg1, arg2);
Detached tasks are similar to std::thread::detach in the C++ STL. Because
their state does not need to be tracked, they avoid fan-out termination signals
and reduce area.
By default, TAPA tasks are joined: the parent waits for each child to complete.
Use tapa::detach only when the child task genuinely does not need to
terminate on program completion.
Rules
- Leaf tasks receive streams by reference (
istream<T>&,ostream<T>&) and mmap interfaces by value (mmap<T>). - An upper-level task body must contain only stream instantiations and
.invoke(...)calls — no loops, arithmetic, or other computation. async_mmapchannel operations (read_addr,read_data, etc.) are leaf-task-only.- For the
xilinx-vitistarget (the default), the top-level task must be an upper-level task — it cannot be a leaf task. - Leaf templated tasks (template functions that compute directly) are supported. Non-leaf templated tasks that invoke other tasks are not yet supported.
Common mistakes
Wrong — computation inside an upper-level task body:
// Wrong: for loop makes this a leaf task, not an upper-level task
void BadUpper(tapa::mmap<float> mem, uint64_t n) {
tapa::stream<float> q("q");
for (uint64_t i = 0; i < n; ++i) { // <-- computation here
q.write(mem[i]);
}
tapa::task().invoke(Consumer, q, n);
}
Right — move computation into a dedicated leaf task:
void Loader(tapa::mmap<float> mem, uint64_t n, tapa::ostream<float>& q) {
for (uint64_t i = 0; i < n; ++i) {
q.write(mem[i]);
}
}
void GoodUpper(tapa::mmap<float> mem, uint64_t n) {
tapa::stream<float> q("q");
tapa::task()
.invoke(Loader, mem, n, q)
.invoke(Consumer, q, n);
}
See also
Next step: Streams