Anyscale and NovaSky Team Release SkyRL tx v0.1.0: Bringing a Tinker Compatible Reinforcement Learning (RL) Engine to Local GPU Clusters



How can AI teams run Tinker style reinforcement learning on large language models using their own infrastructure with a single unified engine? Anyscale and the NovaSky team (UC Berkeley) have released SkyRL tx v0.1.0, which gives developers a way to run a Tinker compatible training and inference engine directly on their own hardware, while keeping the same minimal API that Tinker exposes in its managed service.

The research team describes SkyRL tx as a unified training and inference engine that implements the Tinker API and allows people to run a Tinker like service on their own infrastructure. This v0.1.0 release is the first in the series to support reinforcement learning end to end, and it also makes sampling significantly faster.

Tinker API in brief

Tinker from Thinking Machines is a training API built around four core functions:

  • forward_backward performs a forward pass and a backward pass and accumulates gradients.
  • optim_step updates model weights based on those gradients.
  • sample generates tokens for interaction, evaluation or RL actions.
  • save_state writes checkpoints for resuming training.

Instead of a full task specific fine tuning abstraction, Tinker exposes these low level primitives so that users can implement their own supervised or reinforcement learning loops in regular Python code, while the service handles GPU scheduling and distributed execution.
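As a rough illustration, a minimal training loop over these primitives could look like the sketch below. The client handle, keyword arguments and batch format are hypothetical stand-ins for this article, not the exact signatures of the Tinker SDK.

    def train(client, dataloader, eval_prompts, num_steps=1000):
        # Hypothetical sketch of a loop over the four Tinker primitives.
        # `client`, the keyword arguments and the batch format are
        # illustrative assumptions, not the exact Tinker SDK signatures.
        for step, batch in enumerate(dataloader):
            if step >= num_steps:
                break
            # Forward plus backward pass; gradients accumulate service side.
            client.forward_backward(batch)
            # Apply the accumulated gradients with the optimizer settings.
            client.optim_step(learning_rate=1e-4)
            # Periodically sample to inspect behavior or collect RL actions.
            if step % 50 == 0:
                print(client.sample(prompts=eval_prompts, max_tokens=128))
            # Write a checkpoint so training can resume later.
            if step % 100 == 0:
                client.save_state(name=f"checkpoint-{step}")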


SkyRL tx targets this exact API and implements an open backend that users can deploy locally. It keeps the Tinker programming model, while removing the need to rely only on the hosted environment.

Where SkyRL tx fits inside SkyRL

SkyRL is a full stack reinforcement learning library for large language models that includes skyrl-agent for long horizon agents, skyrl-train for training, and skyrl-gym for tool use environments such as math, coding, search and SQL.

Within this stack, skyrl-tx is marked as an experimental cross platform library that exposes a local Tinker like REST API for model post training. SkyRL tx therefore becomes the system layer that connects RL logic, environments and training code to concrete GPU resources through the Tinker interface.

Architecture: an inference engine that also trains

The SkyRL tx architecture is described as an inference engine that also supports backward passes. It has four main components:

  • REST API server that processes incoming requests from different users.
  • Database that tracks metadata about models, checkpoints, requests and futures, and also acts as a job queue (sketched after this list). The current implementation uses SQLite behind an interface that also supports other SQL databases such as Postgres.
  • Engine that schedules and batches requests across users. Each engine instance serves a single base model and can attach many LoRA adapters.
  • Worker that executes forward and backward passes and holds model definitions and optimizer states. Support for multiple workers is expected to enable more advanced multi node sharding in upcoming versions.
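
The database as job queue design can be pictured with a small, self contained sketch: the API server inserts a pending row and returns a future id, a worker claims the row and writes a result, and the client polls until the future resolves. The schema and status values below are illustrative assumptions, not skyrl-tx's actual tables.

    import json
    import sqlite3
    import uuid

    # Illustrative schema only, not skyrl-tx's actual tables.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE futures (id TEXT PRIMARY KEY, request TEXT, "
        "status TEXT, result TEXT)"
    )

    def submit(request: dict) -> str:
        # API server side: enqueue the request and hand back a future id.
        fid = uuid.uuid4().hex
        conn.execute(
            "INSERT INTO futures VALUES (?, ?, 'pending', NULL)",
            (fid, json.dumps(request)),
        )
        return fid

    def work_one() -> None:
        # Worker side: claim one pending request and record its result.
        row = conn.execute(
            "SELECT id, request FROM futures WHERE status = 'pending' LIMIT 1"
        ).fetchone()
        if row:
            fid, req = row
            result = {"echo": json.loads(req)}  # stand-in for a real pass
            conn.execute(
                "UPDATE futures SET status = 'done', result = ? WHERE id = ?",
                (json.dumps(result), fid),
            )

    def poll(fid: str):
        # Client side: return the result once the future has resolved.
        status, result = conn.execute(
            "SELECT status, result FROM futures WHERE id = ?", (fid,)
        ).fetchone()
        return json.loads(result) if status == "done" else None

    fid = submit({"op": "forward_backward", "batch_size": 8})
    work_one()
    print(poll(fid))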
What v0.1.0 adds

The v0.1.0 release focuses on reinforcement learning support and performance improvements. The official release highlights several concrete changes:

  • Sampling is now much faster, since it is jitted and properly batched and sharded in the engine.
  • Different sampling parameters per request, per request seeds and stop tokens are now supported, which is useful when many experiments share a base model (see the sketch after this list).
  • After several fixes, the RL loop now runs properly through the engine.
  • Gradient checkpointing support and micro batching for sampling are implemented.
  • Postgres is now supported as a database backend, next to SQLite.
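
As a concrete illustration of the per request sampling support, the sketch below shows two experiments sharing one base model while sampling very differently. The client handle and the field names (temperature, seed, stop) are assumptions about the shape of such requests, not the exact skyrl-tx schema.

    def run_experiments(client, prompts):
        # Hypothetical illustration of per request sampling parameters.
        # Field names are assumptions, not the exact skyrl-tx request schema.
        requests = [
            # Deterministic, seeded decoding for an evaluation run.
            {"prompt": prompts[0], "temperature": 0.0, "seed": 1,
             "stop": ["\n\n"]},
            # Higher temperature exploration for an RL rollout.
            {"prompt": prompts[1], "temperature": 0.9, "seed": 42,
             "stop": ["</s>"]},
        ]
        # The engine can batch both against the same base model even though
        # each request carries its own parameters, seed and stop tokens.
        return [client.sample(**req) for req in requests]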

Running RL end to end on 8 H100 GPUs

The official release contains a specific code recipe for running reinforcement learning end to end on a cluster with 8 H100 GPUs.

First, users clone the SkyRL repository and, in the skyrl-tx folder, start the engine with:

    uv run --extra gpu --extra tinker -m tx.tinker.api \
      --base-model Qwen/Qwen3-4B \
      --max-lora-adapters 3 \
      --max-lora-rank 1 \
      --tensor-parallel-size 8 \
      --train-micro-batch-size 8 > out.log
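
Here --tensor-parallel-size 8 matches the 8 available GPUs, --max-lora-adapters 3 and --max-lora-rank 1 bound how many LoRA adapters the engine will serve and at what rank, and the server output is redirected to out.log.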

Then they clone the Tinker Cookbook from the Thinking Machines team and, in the tinker_cookbook/recipes folder, run:

    export TINKER_API_KEY=dummy
    export WANDB_API_KEY=<your key>
    uv run --with wandb --with tinker rl_loop.py \
      base_url=http://localhost:8000 \
      model_name="Qwen/Qwen3-4B" \
      lora_rank=1 \
      max_length=1024 \
      save_every=100
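
Because the backend runs locally, TINKER_API_KEY only needs a placeholder value, while base_url points the Tinker Cookbook client at the local engine on port 8000 instead of the hosted service; model_name and lora_rank mirror the settings used when the engine was started.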

This produces a reward curve that confirms the RL loop runs correctly through the local SkyRL tx backend.

Key Takeaways

  • SkyRL tx v0.1.0 implements a local, Tinker compatible engine that unifies training and inference for LLM post training.
  • The system exposes the Tinker primitives forward_backward, optim_step, sample and save_state over REST, while handling batching, LoRA adapters and device placement internally.
  • The architecture is split into an API server, a SQL database, a scheduling engine and workers that execute forward and backward passes for a single base model with multiple LoRA adapters.
  • v0.1.0 adds end to end reinforcement learning support, faster jitted and sharded sampling, per request sampling parameters, gradient checkpointing, micro batching and Postgres support.

SkyRL tx v0.1.0 is a practical step for dev teams that want Tinker style reinforcement learning on their own clusters with a consistent Tinker API surface. The design that treats the system as an inference engine that also runs backward passes is clean and reduces stack divergence. Support for LoRA, gradient checkpointing, micro batching and Postgres is a concrete systems upgrade. Overall, this release turns Tinker compatibility into an actionable local RL backend for LLMs.

Check out the Repo and Official Release.

Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
