Cutechess-cli: Command-line chess engine tester
Definition
Cutechess-cli is the command-line tournament manager from the open-source Cute Chess project. It runs automated matches and tournaments between chess engines that speak the UCI or XBoard/WinBoard protocol. Designed for headless operation on servers, it schedules games, enforces time controls, adjudicates results, and saves complete records (e.g., PGN) for later analysis.
How it is used in chess
Although it doesn’t play chess itself, cutechess-cli is a foundational tool in computer chess:
- Engine-versus-engine testing to compare strength, speed, and stability.
- Regression testing of new engine versions or parameter changes.
- Large-scale statistical experiments (e.g., SPRT testing) to validate Elo gains.
- Round-robin and gauntlet tournaments using curated opening suites or random positions.
- Continuous integration on clusters for open-source engines, often with thousands of games per patch.
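The strength comparisons listed above usually come down to turning a win/draw/loss tally into an Elo difference. A minimal sketch of that conversion under the standard logistic Elo model follows; the function name `elo_from_score` is an illustrative assumption and is not part of cutechess-cli.

```python
import math


def elo_from_score(wins: int, draws: int, losses: int) -> float:
    """Elo difference implied by a match result under the logistic Elo model.

    Hypothetical helper for post-processing match totals; cutechess-cli
    prints its own estimate, which may be computed differently.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # A 55% score over 1000 games.
    print(round(elo_from_score(wins=300, draws=500, losses=200), 1))
```

A point estimate like this says nothing about uncertainty, which is why large game counts or a sequential test are used before accepting a change.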
Key features
- Protocols: Runs UCI and XBoard engines side-by-side.
- Tournaments: Round-robin, gauntlet, and head-to-head matches with automatic color alternation.
- Time controls: Whole-game time with increment (e.g., 60+1, meaning 60 seconds plus 1 second per move), moves-per-session (e.g., 40/15+0.1), or fixed limits on depth, nodes, or time per move.
- Openings: Start games from PGN or EPD opening files; shuffle or repeat with color-reversed pairs to reduce bias.
- Adjudication: Early draw or resignation rules (score and move-count thresholds), plus standard rules (checkmate, stalemate, repetition, 50-move rule).
- Concurrency: Runs many games in parallel to utilize multi-core hardware.
- Output: PGN game files, live match scores, and optional statistics (including Elo estimates, and pass/fail for SPRT).
- Cross-platform: Works on Linux, macOS, and Windows; ideal for servers or CI pipelines.
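To make the time-control notation above concrete, here is a small sketch that splits a string such as 40/15+0.1 into its parts. It assumes the moves/time+increment layout, with time given in seconds or as minutes:seconds; the `TimeControl` class and `parse_tc` function are hypothetical helpers, and special values such as an infinite time control are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeControl:
    moves: Optional[int]  # moves per session; None means the whole game
    seconds: float        # base time per session, in seconds
    increment: float      # seconds added after each move


def parse_tc(text: str) -> TimeControl:
    """Parse 'moves/time+increment' (illustrative; not cutechess-cli's parser)."""
    moves = None
    if "/" in text:
        moves_part, text = text.split("/", 1)
        moves = int(moves_part)
    increment = 0.0
    if "+" in text:
        text, inc_part = text.split("+", 1)
        increment = float(inc_part)
    if ":" in text:
        # minutes:seconds form, e.g. 1:30
        minutes, secs = text.split(":", 1)
        seconds = int(minutes) * 60 + float(secs)
    else:
        seconds = float(text)
    return TimeControl(moves, seconds, increment)


if __name__ == "__main__":
    for tc in ("60+1", "40/15+0.1", "1:30+0.5"):
        print(tc, "->", parse_tc(tc))
```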
Strategic and historical significance
During the 2010s and beyond, cutechess-cli became a de facto standard for reproducible engine testing. Its automation, opening handling, and built-in SPRT support enabled rapid, community-driven improvement of leading engines. Many public testing frameworks, from large-scale volunteer systems to single-machine lab setups, have used cutechess-cli to run controlled experiments on evaluation functions, search heuristics, and endgame handling. Consistent testing of this kind underpins much of the measured Elo progress in top engines.
Examples
Below are representative command patterns. Exact flags can vary by version; consult your local help (cutechess-cli -help) for details.
- Basic head-to-head match (100 games, 4 parallel games):
  cutechess-cli -engine cmd=./stockfish name=SF option.Hash=256 option.Threads=4 -engine cmd=./engineB name=EngB option.Hash=256 option.Threads=4 -each proto=uci tc=60+0.6 timemargin=50 -games 100 -concurrency 4 -openings file=book.pgn format=pgn order=random -repeat -pgnout sf_vs_engb.pgn -event "Local Match" -site "Lab"
- Gauntlet (one engine vs. many; the first engine listed plays all the others):
  cutechess-cli -tournament gauntlet -engine cmd=./candidate name=NewBuild -engine cmd=./ref1 name=Ref1 -engine cmd=./ref2 name=Ref2 -each tc=10+0.1 proto=uci -openings file=starter.epd format=epd order=random -games 200 -concurrency 8 -draw movenumber=80 movecount=8 score=10 -resign movecount=3 score=600 -pgnout gauntlet.pgn
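The -draw and -resign thresholds in the gauntlet command can be read as simple rules over the scores the engines report. The sketch below is one simplified reading of those rules, not cutechess-cli's actual implementation: it assumes one centipawn score per half-move from the mover's point of view, and it treats "movecount" as moves by each side. The function name `adjudicate` is hypothetical.

```python
from typing import List, Optional


def adjudicate(scores: List[int],
               draw_movenumber: int = 80, draw_movecount: int = 8,
               draw_score: int = 10,
               resign_movecount: int = 3,
               resign_score: int = 600) -> Optional[str]:
    """Return 'draw', 'white resigns', 'black resigns', or None.

    scores[i] is the centipawn score reported by the side that played
    half-move i (White plays the even-numbered half-moves).
    """
    quiet_plies = 0   # consecutive half-moves with a near-zero score
    hopeless = [0, 0]  # consecutive hopeless scores for White, Black
    for ply, score in enumerate(scores):
        side = ply % 2
        # Resign rule: the mover has reported a lost score several times in a row.
        hopeless[side] = hopeless[side] + 1 if score <= -resign_score else 0
        if hopeless[side] >= resign_movecount:
            return ("white", "black")[side] + " resigns"
        # Draw rule: both sides near zero for long enough, late enough in the game.
        quiet_plies = quiet_plies + 1 if abs(score) <= draw_score else 0
        full_move = ply // 2 + 1
        if full_move >= draw_movenumber and quiet_plies >= 2 * draw_movecount:
            return "draw"
    return None


if __name__ == "__main__":
    print(adjudicate([0] * 200))
    print(adjudicate([700, -700] * 3))
```

Seen this way, lowering movenumber or movecount ends games earlier and saves time, at the cost of occasionally cutting off a position one engine would still have won.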
- SPRT regression test (detect a small Elo gain; SomePatch stands in for whatever option enables the change under test):
  cutechess-cli -engine cmd=./stockfish name=Base option.Threads=2 -engine cmd=./stockfish name=Test option.Threads=2 option.SomePatch=on -each proto=uci tc=5+0.05 -games 4000 -sprt elo0=0 elo1=5 alpha=0.05 beta=0.05 -openings file=mini.epd format=epd order=random -concurrency 16 -pgnout sprt_test.pgn
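The elo0, elo1, alpha, and beta parameters define a sequential probability ratio test. As a rough illustration of what such a test computes, the sketch below uses a common normal-approximation form of the log-likelihood ratio together with Wald's stopping bounds. This is an assumption made for illustration: cutechess-cli's internal formula may differ, and all function names here are hypothetical.

```python
import math
from typing import Tuple


def expected_score(elo: float) -> float:
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_bounds(alpha: float, beta: float) -> Tuple[float, float]:
    """Wald's lower and upper stopping bounds for the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_llr(wins: int, draws: int, losses: int,
             elo0: float, elo1: float) -> float:
    """Approximate LLR of H1 (elo1) against H0 (elo0) from W/D/L counts."""
    games = wins + draws + losses
    if games == 0:
        return 0.0
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5, or 0 points).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * score ** 2) / games
    if variance == 0.0:
        return 0.0
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return games * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * variance)


def sprt_status(wins: int, draws: int, losses: int,
                elo0: float = 0.0, elo1: float = 5.0,
                alpha: float = 0.05, beta: float = 0.05) -> str:
    lower, upper = sprt_bounds(alpha, beta)
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"


if __name__ == "__main__":
    print(sprt_status(wins=1000, draws=2000, losses=1000))
    print(sprt_status(wins=1200, draws=2000, losses=800))
```

The test is evaluated as results accumulate, so a run usually stops well before the -games limit once the ratio crosses either bound.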
- Fixed-depth testing to isolate search/eval changes (tc=inf disables the clock so that only the depth limit applies):
  cutechess-cli -engine cmd=./engineA -engine cmd=./engineB -each proto=uci tc=inf depth=15 -games 200 -openings file=balanced.pgn format=pgn order=random -pgnout depth15.pgn
Practical tips
- Fix engine options (Hash, Threads, Syzygy paths, etc.) for both sides to ensure fairness.
- Use a diverse opening suite and -repeat to play each position with colors reversed, minimizing opening and color bias.
- Set timemargin appropriately: too small a margin causes avoidable time forfeits from scheduling or I/O jitter, while too large a margin lets an engine overstep its clock without penalty.
- For reproducibility, pin CPU affinity or keep the machine load stable; parallel games can contend for resources.
- When tuning parameters with SPRT, choose realistic elo0/elo1 bounds; wider bounds decide faster but risk missing tiny improvements.
- Depth/nodes tests remove time-management variance; time-control tests are closer to real play. Use both during development.
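One way to follow the fairness tips above is to generate the command from a script, so that both engines always receive the same options through -each. The sketch below only builds and prints the argument list; it uses flags that appear in the examples in this article, while the function name `build_match_command` and all file names are hypothetical.

```python
import shlex
from typing import Dict, List


def build_match_command(engines: Dict[str, str],
                        shared_options: Dict[str, str],
                        tc: str, games: int, concurrency: int,
                        openings: str, pgn_out: str) -> List[str]:
    """Build a cutechess-cli argument list with identical options per engine."""
    argv = ["cutechess-cli"]
    for name, cmd in engines.items():
        argv += ["-engine", f"cmd={cmd}", f"name={name}"]
    # Everything after -each applies to every engine, which keeps
    # Hash, Threads, and similar settings identical on both sides.
    argv += ["-each", "proto=uci", f"tc={tc}"]
    argv += [f"option.{key}={value}" for key, value in shared_options.items()]
    opening_format = "pgn" if openings.endswith(".pgn") else "epd"
    argv += ["-games", str(games), "-concurrency", str(concurrency),
             "-openings", f"file={openings}", f"format={opening_format}",
             "order=random", "-repeat", "-pgnout", pgn_out]
    return argv


if __name__ == "__main__":
    command = build_match_command(
        engines={"Base": "./engine_base", "Test": "./engine_test"},
        shared_options={"Hash": "256", "Threads": "1"},
        tc="10+0.1", games=200, concurrency=4,
        openings="starter.epd", pgn_out="match.pgn",
    )
    print(shlex.join(command))
```

The resulting list can be passed to subprocess.run as is, which avoids shell quoting problems and makes the exact configuration easy to log alongside the PGN.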
Common pitfalls
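- Mismatched protocols (e.g., starting an XBoard engine with proto=uci): the engine typically never completes the startup handshake, so it fails to start or its games end immediately instead of being played.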
- Uncontrolled randomness: if your engine has a random seed option, fix it during A/B testing to reduce noise.
- Skewed hardware: running engines with different thread counts or memory sizes invalidates Elo comparisons.
- Over-aggressive adjudication: too-early resign/draw thresholds can bias results; keep thresholds conservative unless you’ve validated them.
Interesting facts and anecdotes
- The “cli” stands for command-line interface, distinguishing it from the Cute Chess GUI aimed at human users.
- It is widely used by community testing frameworks for top engines; its robust scheduling and PGN output make large volunteer-driven testnets feasible.
- Cutechess-cli can also run chess variants when supported by both the engine and protocol (e.g., Chess960 via UCI_Chess960).
- By standardizing testing, cutechess-cli helped popularize statistically sound acceptance tests (like SPRT) in engine development, making “+3 Elo” claims meaningfully verifiable.