T — The Orchestration Engine for Polyglot Data Science

T is an experimental orchestration engine designed for declarative, reproducible pipelines. It provides a functional Domain-Specific Language (DSL) that coordinates R, Python, and Shell nodes—with Julia support planned—within a Nix-managed infrastructure.

Unlike traditional scripting languages, T is built as a specification-first engine that makes data analysis explicit, inspectable, and pipeline-oriented. This architecture lets humans and LLMs collaborate on defining high-level intent while T handles the low-level orchestration and environmental consistency.

Status: Version 0.51.1 “Sangoku”, latest stabilization release.


The Polyglot Pipeline

T’s core strength is its mandatory pipeline architecture: to execute code in T, you define it as a series of nodes in a directed acyclic graph (DAG). T handles the “glue”:

-- A reproducible polyglot pipeline
p = pipeline {
  -- 1. Load data natively in T (CSV backend)
  data = node(
    command = read_csv("examples/sample_data.csv") |> filter($age > 25),
    serializer = "csv"
  )
  
  -- 2. Train a statistical model in R (using the rn() wrapper)
  model_r = rn(
    command = <{ lm(score ~ age, data = data) }>,
    serializer = "pmml",
    deserializer = "csv"
  )
  
  -- 3. Predict natively in T (no R/Python runtime needed for evaluation!)
  predictions = node(
    command = data |> mutate($pred = predict(data, model_r)),
    deserializer = "pmml"
  )

  -- 4. Generate a shell report
  report = shn(command = <{
    printf 'R model results cached at: %s\n' "$T_NODE_model_r/artifact"
  }>)
}

-- Build the pipeline into reproducible Nix artifacts
build_pipeline(p)

What is T?

T is not designed to replace your existing tools; it is designed to orchestrate them. It addresses “dependency drift” and the “works on my machine” syndrome by making Nix a mandatory part of every build.


Foreign Language Nodes & Deserialization

When you define a node using node(), rn() (R), pyn() (Python), or shn() (Shell)—with Julia support planned—T treats the result as a first-class Node object. These objects transition through two main states:

  1. Unbuilt Node: A specification of what to run (command, runtime, environment variables).
  2. Computed Node: After build_pipeline(), the node points to a concrete, immutable artifact in the Nix store.
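The transition between the two states can be sketched as follows, using only the constructs shown in the introduction (the exact REPL behavior is illustrative, not a guaranteed output):

```
-- Unbuilt: a pure specification of what to run
p = pipeline {
  data = node(
    command = read_csv("examples/sample_data.csv"),
    serializer = "csv"
  )
}

-- p.data is still an Unbuilt Node here: no artifact exists yet

-- After building, the same name resolves to a Computed Node
build_pipeline(p)
explain(p.data)  -- now reports a concrete /nix/store path
```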

Automatic Deserialization

When you call read_node("node_name") in the REPL, T looks at the node’s serializer and attempts to automatically load the data back into the T environment:

Serializer            Resulting T Type   Backend
default / serialize   Varies             Native T binary serialization
arrow                 DataFrame          Apache Arrow IPC (zero-copy)
csv                   DataFrame          Native CSV parser
json                  Dict / List        JSON parser
pmml                  Model              Native T model evaluator
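For example, assuming the pipeline from the introduction has been built, automatic deserialization picks the loader from each node’s serializer (a hedged sketch; the resulting types follow the table above):

```
-- "data" was serialized as CSV, so it comes back as a DataFrame
df = read_node("data")

-- "model_r" was serialized as PMML, so it comes back as a native
-- Model usable by T's evaluator (e.g. via predict), with no R runtime
m = read_node("model_r")
```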

Looking into the “Entrails”

If a node’s serializer is not supported for automatic deserialization, read_node() returns the Computed Node object itself. This object contains all the metadata necessary to load the artifact manually or inspect its provenance.

You can use explain() to look inside a built node:

-- Example: Inspecting a built R node
> model_node = p.model_r
> explain(model_node)
{
  `kind`: "computed_node",
  `name`: "model_r",
  `runtime`: "R",
  `path`: "/nix/store/...-model_r/artifact",
  `serializer`: "pmml",
  `class`: "lm",
  `dependencies`: ["data"]
}

The path field is the “escape hatch”—it gives you the absolute path to the node’s output in the Nix store. You can use this to start an external interpreter and inspect the file directly, or pass it to a custom loader like read_parquet(model_node.path).
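A minimal sketch of the escape hatch, using the computed node from the explain() example above (read_parquet is a hypothetical custom loader, as noted; the store path itself comes from the build):

```
-- Grab the concrete artifact path from the computed node
artifact_path = model_node.path

-- Hand it to any external tool or custom loader of your choosing
raw = read_parquet(artifact_path)  -- hypothetical loader
```

Because the path is an ordinary Nix store location, anything outside T (an R session, a Python script, a shell one-liner) can read the same immutable artifact.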


Documentation

Getting Started

User Guides

Advanced Topics

Developer Resources

Reference & Support