R-ConstraintBench stress-tests LLMs on real-world planning and complex reasoning. Initial findings: no model stays consistently feasible at high complexity; o3 leads on the synthetic tests, while GPT-5 leads on the real data center migration task.
Announcing R-ConstraintBench: A novel way to stress-test LLM reasoning abilities under interacting constraints (labelbox.com)