T is a reproducibility-first domain-specific language (DSL) for polyglot data science. It provides a functional, immutable language for constructing composable micropipelines: first-class, introspectable computation graphs that coordinate R, Python, Julia, Quarto, and Shell execution within a unified system. Pipelines in T are not configuration artifacts but executable program structures with explicit dataflow, typed nodes, and content-addressed outputs. Artifacts are automatically serialized and exchanged across language boundaries, allowing polyglot workflows to compose without manual I/O glue code.
Built on Nix, T integrates declarative environment management and deterministic builds at the language level, enabling reproducible execution across machines and operating systems. Workflow structure, dependency resolution, environment specification, and provenance tracking are intrinsic properties of the language rather than concerns delegated to external tooling. As a result, reproducible workflows are the default in T rather than an afterthought.
T also includes a growing collection of data manipulation verbs inspired by the R tidyverse ecosystem, particularly packages such as dplyr, stringr, and lubridate. This makes it possible to perform exploratory data analysis directly from the T REPL before promoting computations into reproducible pipelines.
Status: Version 0.52.0 “Kaméhaméha”.
T’s core strength is its mandatory pipeline architecture. Outside the REPL, you define code as a series of nodes in a directed acyclic graph (DAG), and T handles the “glue” between them:
-- A reproducible polyglot pipeline
p = pipeline {
  -- 1. Load data natively in T (CSV backend)
  data = node(
    command = read_csv("examples/sample_data.csv") |> filter($age > 25),
    serializer = "csv"
  )

  -- 2. Train a statistical model in R (using the rn() wrapper)
  model_r = rn(
    command = <{ lm(score ~ age, data = data) }>,
    serializer = "pmml",
    deserializer = "csv"
  )

  -- 3. Predict natively in T (no R/Python runtime needed for evaluation!)
  predictions = node(
    command = data |> mutate($pred = predict(data, model_r)),
    deserializer = "pmml"
  )

  -- 4. Generate a shell report
  report = shn(command = <{
    printf 'R model results cached at: %s\n' "$T_NODE_model_r/artifact"
  }>)
}

-- Build the pipeline into reproducible Nix artifacts
build_pipeline(p)
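The dependency structure of this example can be sketched outside T. The Python snippet below is only an illustration, not T's scheduler: the `dependencies` map is an assumption read off the node commands (`model_r` reads `data`, `predictions` reads `data` and `model_r`, `report` reads `model_r`), and the standard library's `graphlib` is used to derive one valid build order.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for the example pipeline:
# each key is a node, each value the set of nodes it reads from.
dependencies = {
    "data": set(),
    "model_r": {"data"},
    "predictions": {"data", "model_r"},
    "report": {"model_r"},
}


def build_order(deps):
    """Return one valid order in which the nodes could be built."""
    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(deps).static_order())


order = build_order(dependencies)
print(order)
```

Any order produced this way builds `data` before `model_r`, and `model_r` before both `predictions` and `report`; the relative order of the last two is not constrained.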
Pipelines are not mandatory in the T REPL. This is a deliberate design choice intended to support exploratory data analysis and rapid experimentation before computations are promoted into reproducible pipelines. Users can also launch R, Python, or Julia REPLs directly from within a T project, inheriting the same pinned environments and project dependencies. This allows exploratory work to take place in familiar ecosystems while remaining integrated with T’s reproducibility model.
T is not designed to replace your existing tools; it is designed to orchestrate them. It addresses the “dependency drift” and “works on my machine” syndrome by making Nix mandatory.
If a = 1 / 0, then a is an Error value, not an exception. Logic is auditable and predictable.
With intent blocks and structured metadata, T is built for a future where humans and AI collaborate on complex data workflows.
When you define a node using node(), rn() (R), pyn() (Python), jln() (Julia), or shn() (Shell), T treats the result as a first-class Node object. These objects transition through two main states:
After build_pipeline(), the node points to a concrete, immutable artifact in the Nix store.
When you call read_node("node_name") in the REPL, T looks at the node’s serializer and attempts to automatically load the data back into the T environment:
| Serializer | Resulting T Type | Backend |
|---|---|---|
| default / serialize | Varies | Native T binary serialization |
| arrow | DataFrame | Apache Arrow IPC (zero-copy) |
| csv | DataFrame | Native CSV parser |
| json | Dict / List | JSON parser |
| pmml | Model | Native T model evaluator |
If a node’s serializer is not supported for automatic
deserialization, read_node() returns the Computed
Node object itself. This object contains all the metadata
necessary to load the artifact manually or inspect its provenance.
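The lookup-with-fallback behaviour described here can be illustrated with a small Python sketch. It is not T's implementation: the `LOADERS` table, the `read_artifact` function, the dictionary-shaped node, and the `png` serializer are all hypothetical names chosen for the example, and only the CSV and JSON cases are modelled.

```python
import csv
import io
import json

# Hypothetical loaders keyed by serializer name. T's other backends
# (Arrow, PMML, native binary) are deliberately not modelled here.
LOADERS = {
    "csv": lambda raw: list(csv.DictReader(io.StringIO(raw.decode("utf-8")))),
    "json": lambda raw: json.loads(raw.decode("utf-8")),
}


def read_artifact(node, raw):
    """Deserialize raw bytes if the node's serializer is known, else return the node."""
    loader = LOADERS.get(node["serializer"])
    if loader is None:
        # Unsupported serializer: hand back the node metadata unchanged,
        # so the caller can still find the artifact and load it manually.
        return node
    return loader(raw)


csv_node = {"name": "data", "serializer": "csv"}
other_node = {"name": "plot", "serializer": "png"}  # hypothetical unsupported format

rows = read_artifact(csv_node, b"age,score\n30,7\n")
fallback = read_artifact(other_node, b"\x89PNG")
```

Returning the node rather than raising keeps the metadata available to the caller, which is what makes manual loading possible.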
You can use explain() to look inside a built node:
-- Example: Inspecting a built R node
> model_node = p.model_r
> explain(model_node)
{
  `kind`: "computed_node",
  `name`: "model_r",
  `runtime`: "R",
  `path`: "/nix/store/...-model_r/artifact",
  `serializer`: "pmml",
  `class`: "lm",
  `dependencies`: ["data"]
}
The path field is the “escape hatch”: it gives you the
absolute path to the node’s output in the Nix store. You can use this to
start an external interpreter and inspect the file directly, or pass it
to a custom loader like read_parquet(model_node.path).
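As an illustration of inspecting an artifact from an external interpreter, the Python sketch below lists the top-level elements of a PMML file (PMML is an XML format). Several things here are assumptions: the helper name `pmml_top_level_elements` is hypothetical, and a small hand-written PMML-like document in a temporary directory stands in for the real store path, which in practice would come from the node's path field.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def pmml_top_level_elements(artifact_path):
    """Return the tag names of the top-level elements in a PMML file."""
    root = ET.parse(str(artifact_path)).getroot()
    # ElementTree prefixes tags with "{namespace}"; keep only the local name.
    return [child.tag.split("}")[-1] for child in root]


# Stand-in for a Nix store artifact: a tiny hand-written PMML-like document,
# not the output of a real model.
sample = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header/>
  <DataDictionary/>
  <RegressionModel functionName="regression"/>
</PMML>"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "artifact"
    path.write_text(sample, encoding="utf-8")
    tags = pmml_top_level_elements(path)

print(tags)
```

The same pattern applies to other serializers: read the path from the node, then open it with whatever parser matches the format.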
For a more streamlined experience, you can use our External Helper Packages for R, Python, and Julia, which automate log resolution and deserialization from within those environments.