Disarray builds AI research agents for long-horizon autonomy over heterogeneous data

Disarray agents manage the end-to-end research loop: translating high-level goals, finding relevant data, forming hypotheses, writing and running code, interpreting partial or failed results, and deciding what to try next. 

Disarray tackles two key challenges in AI research agents: generating stronger hypotheses so that experimentation is efficient, and avoiding the failure modes that derail long-running agent loops. The central insight behind Disarray’s approach is that success comes from better context, not more context. Disarray’s core differentiator is a context graph that captures rich lineage across data, code, execution history, and documentation. The graph enables precise retrieval of the most relevant information and multi-hop analysis that surfaces novel insights from prior work. With more targeted context, higher-order understanding, and intelligent harnesses, Disarray agents form better hypotheses, stay focused over time, and rapidly deliver breakthrough results in long-horizon research.
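
To make the idea of multi-hop analysis over a lineage graph more concrete, here is a minimal sketch. It is not Disarray’s implementation: the node names, the edge labels, and the functions `build_graph` and `multi_hop` are all hypothetical, and a plain breadth-first walk stands in for whatever retrieval a real context graph would use.

```python
from collections import deque

# Hypothetical lineage edges: (source, relation, target).
# Every name below is illustrative, not taken from a real system.
EDGES = [
    ("dataset:train.csv", "read_by", "code:features.py"),
    ("code:features.py", "executed_in", "run:017"),
    ("run:017", "documented_in", "doc:notes.md"),
    ("dataset:train.csv", "described_by", "doc:schema.md"),
]


def build_graph(edges):
    """Build an adjacency list mapping each node to its outgoing (relation, target) pairs."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, [])
    return graph


def multi_hop(graph, start, max_hops):
    """Breadth-first walk returning {node: hop distance} for nodes within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for _rel, nxt in graph[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


graph = build_graph(EDGES)
reachable = multi_hop(graph, "dataset:train.csv", max_hops=2)
```

With a budget of two hops, the walk reaches the code that read the dataset, the schema document, and the run that executed that code, while the run’s notes (three hops away) stay out of the retrieved context. Capping the hop count is one simple way to favor targeted context over more context.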

Validated in autonomous ML experimentation

In fully autonomous runs limited to 24 hours and a single GPU per Kaggle competition, Disarray agents earned 28 medals across vision, NLP, tabular, and object-detection tasks, including nine top-10 finishes and one result that outperformed all human teams.

ML model development is Disarray’s first product use case, but the underlying capabilities extend to a much broader class of open-ended research problems.

Toward recursive self-improvement

Every time a Disarray agent runs an experiment, it generates a detailed trace: what context it retrieved, what hypothesis it formed, what code it executed, what errors it hit, how it recovered, and the final result. These high-fidelity execution traces provide precise training data for improving every component of Disarray. Today, agent outcomes and user interactions are used to self-heal the context graph. Dynamic self-improvement of the harness and the planning modules is an active area of research at Disarray.
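
The trace fields listed above map naturally onto a simple record type. The sketch below is an assumption about what such a record might look like, not Disarray’s actual schema: the class `ExperimentTrace`, its field names, and the sample values are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ExperimentTrace:
    """Hypothetical record of one experiment; field names are illustrative."""

    retrieved_context: List[str]  # what context the agent retrieved
    hypothesis: str               # what hypothesis it formed
    code: str                     # what code it executed
    errors: List[str] = field(default_factory=list)      # what errors it hit
    recoveries: List[str] = field(default_factory=list)  # how it recovered
    result: Optional[dict] = None                        # the final result

    def to_json(self) -> str:
        """Serialize the trace so it can be stored and later used as training data."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; none of these come from a real run.
trace = ExperimentTrace(
    retrieved_context=["doc:schema.md"],
    hypothesis="Adding a date feature improves validation accuracy",
    code="train(features=['date'])",
)
trace.errors.append("KeyError: 'date'")
trace.recoveries.append("parsed the timestamp column into a date column")
trace.result = {"status": "completed"}

serialized = trace.to_json()
```

Keeping errors and recoveries as separate lists preserves the order in which the agent hit and resolved problems, which is the kind of detail a later training step could learn from.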

The instrumentation that runs research today forms the foundation for recursive self-improvement tomorrow.

Follow our work

Follow our work on context graphs, long-running agent harnesses, autonomous experimentation, and the journey towards recursive self-improvement for AI systems.

Join us

We’ve built a strong foundation and proven what’s possible, but the most interesting problems are still ahead. If you’re naturally curious, energized by tough challenges, and want to push the boundaries of automation in AI development, come join us! Reach us at careers@disarray.ai.