How-To Series · Episode 46 / 59 · Module 7: Power Tools

Hermes · Batch Processing

Generate training data overnight. Or eval a model. Same primitive.

After this video · You can now run Hermes across large prompt sets unattended.

batch_runner.py runs the full agent, tools included, across hundreds or thousands of prompts in parallel and produces structured trajectory data, primarily for fine-tuning or model evaluation. The input is a JSONL file with one {"prompt": ...} object per line; each entry may also carry an optional image or cwd. Run it with --dataset_file, --batch_size, --run_name, --model, and --num_workers; each prompt gets its own isolated session. Output lands in data/<run_name>/, headlined by trajectories.jsonl in ShareGPT format (from/value) with per-trajectory tool stats.

It is built for long runs: checkpointing plus a content-based --resume that retries failures, automatic quality filtering (samples with no reasoning or with hallucinated tools are dropped), and toolset distributions so that trajectories cover diverse tool combinations.
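The input format described above is simple enough to generate from a script. The following is a minimal sketch, assuming only the documented {"prompt": ...} shape; the helper names write_prompt_dataset and read_prompt_dataset are hypothetical and not part of Hermes, and the optional image / cwd fields are left out.

```python
import json
from pathlib import Path


def write_prompt_dataset(prompts, path):
    """Write one {"prompt": ...} JSON object per line (JSONL)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for prompt in prompts:
            # Only the "prompt" key is written; optional per-entry fields
            # such as image or cwd are omitted from this sketch.
            f.write(json.dumps({"prompt": prompt}, ensure_ascii=False) + "\n")
    return path


def read_prompt_dataset(path):
    """Read the file back and fail early on malformed entries."""
    entries = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            if "prompt" not in entry:
                raise ValueError(f"line {line_no}: missing 'prompt' key")
            entries.append(entry)
    return entries
```

Calling write_prompt_dataset(my_prompts, "data/prompts.jsonl") would produce a file in the shape passed to --dataset_file; running read_prompt_dataset on it before a long unattended run is a cheap way to catch a malformed line early.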

About these resources. Every command in this video comes from the Batch Processing doc.

New words here · Plain English

one sentence each · full glossary

Batch · Running many similar tasks in one go, without human supervision between them.

Eval · Short for evaluation: testing an AI model on a set of questions to measure how well it performs.

Sources · What this video distills

1 docs page · every command below traces to it

Primary · batch_runner.py, dataset format, run flags, ShareGPT output, checkpointing/resume, quality filtering, distributions

Batch Processing

Commands shown · Copy and paste

each shows the source doc it came from

Run · from source

python batch_runner.py --dataset_file=data/prompts.jsonl --batch_size=20 --run_name=coding_v1 --model=anthropic/claude-sonnet-4.6 --num_workers=8
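Once a run like this finishes, trajectories.jsonl can be inspected with a few lines of Python. This is a rough sketch only: the source states that turns use ShareGPT from/value keys, but the name of the field holding the list of turns ("conversations" here, the common ShareGPT convention) is an assumption, as is the hypothetical helper name summarize_trajectories.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_trajectories(path, turns_key="conversations"):
    """Count trajectories and turns per speaker in a ShareGPT-style JSONL file."""
    speaker_counts = Counter()
    n_trajectories = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            n_trajectories += 1
            # turns_key is an assumption; adjust it to the real field name.
            for turn in record.get(turns_key, []):
                speaker_counts[turn.get("from", "unknown")] += 1
    return n_trajectories, speaker_counts
```

For the run above, the path would be data/coding_v1/trajectories.jsonl. If the turn counts all come back zero, the turns field probably has a different name than the one assumed here.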

Resume · from source

python batch_runner.py --dataset_file=... --run_name=coding_v1 --resume
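To picture what a content-based resume means in general, the sketch below keys each prompt by a hash of its text and skips the ones already completed, so failed or unseen prompts are picked up again. It illustrates the idea only and is not how batch_runner.py implements --resume; prompt_key and pending_prompts are hypothetical names.

```python
import hashlib


def prompt_key(prompt):
    """Stable identifier derived from the prompt text itself."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def pending_prompts(all_prompts, completed_keys):
    """Return prompts whose content hash is not in the completed set.

    Prompts that failed never enter completed_keys, so they are retried.
    """
    return [p for p in all_prompts if prompt_key(p) not in completed_keys]
```

Keying on content rather than on line position means that reordering the dataset, or appending new prompts to it, does not cause already-finished prompts to run twice.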

List distributions · from source

python batch_runner.py --list_distributions

Going deeper · Related Hermes docs

further reading · not sources of facts shown above

the single-script analogue

predictable cost-per-trajectory at scale

Next in the series · Episodes that build on this

E45 · Code Execution

E38 · Subagent Delegation

E50 · The Provider Landscape