May 19, 2026
Engineering

How We Made Our Monorepo Ergonomic for Agents

Michael Crabtree

At Basis, we’re obsessed with this question: How do we make our codebase ergonomic for agents? There are decades of learnings in software engineering on what a well-designed codebase looks like for humans (small functions, defined modules, no over-bloated documents, etc.). How do we evolve that for agents?

The Atlas team at Basis is responsible for internal agents and context. Our product is the codebase itself. A codebase is two things at once. It is the source code that runs in production, and it is the context that coding agents use to make decisions. So to make our product truly friendly for our users, we had to make the monorepo as agent-native, as ergonomic, as possible.

We did it. In three months, token usage per developer increased more than 5x and commit velocity increased by 2.5x.

Our Vision

Basis has placed a core bet on intelligence. From the very beginning of Basis three years ago, we believed that most of our code would soon be written by agents, and built our company accordingly. We hit that point in intelligence about nine months ago.

At that point, it became easy to imagine a world where agents consistently deliver high-quality, well-tested code while engineers focus on the challenging task of actually making engineering decisions.

But we weren't there yet. While coding agents are capable in isolation, they are prone to mistakes when dropped into a working codebase without supporting infrastructure.

This is not a new problem; it is also true of any new hire. A fast-growing company like Basis might onboard multiple engineers every month, so it has always been important to make the codebase easy to learn. But unlike a human, an agent has to "onboard" to the codebase on every single trajectory. As we adopted coding agents, the "onboardings" at Basis went from a handful a month to thousands a month. At that rate, small inconsistencies, contradictions, and gaps that previously went unnoticed compound quickly.

Principles for an Agent-Native Codebase

Figure 1. The five principles, as small multiples: (i) Canonicality, (ii) Localization, (iii) Verifiability, (iv) Interoperability, (v) Default-no. Stated negatively where the negation is load-bearing: Default-no is the only one of the five whose phrasing materially affects how it is applied. The grammatical asymmetry is intentional.

The primary levers to empower coding agents are context and tools. To get to our end state of fluent agents, we developed five principles to guide the development of those levers.

1. Canonicality. Every artifact in the repo is either a source of truth about the system as it is today, or a record of intent and history. It is never both. An agent reading your codebase needs an explicit map of what to trust as a description of reality and what to read as a plan, a hypothesis, or a memory.

2. Localization. Context should live as close to where it is used as possible. It only moves up as it becomes more generally applicable. This reduces the likelihood that agents miss relevant context.

3. Verifiability. Agents need verification of their work. We built mechanisms to enforce that, including sub-agent roles, pre-commit hooks, and tests.

4. Interoperability. No layer of the architecture binds the team to a single vendor. AI technology is moving too fast to bet on a single platform. Locking into a vendor this early in AI development risks missing large benefits down the road.

5. Default-no. Any context that is loaded automatically must be scrutinized closely. Tokens that earn no behavior are a tax on every session, paid by every agent and every engineer. Stating it negatively is intentional. When the default is "include," loaded files balloon; when the default is "exclude," every line earns its place.

The architecture we built is the implementation of these principles in code.

Canon vs. Not Canon

The first step in applying our principles was categorizing existing context into canonical and non-canonical categories. This was a rigorous process that forced the team to gather and collate many types of information from across the codebase, and then engage in intense discussions to reconcile them. Through that reconciliation process, we formalized our approach in a documentation-standards document that maps every artifact type in the repo to an authority level.

Canon is material a coding agent should treat as a source of truth about how the system works today. It includes root and nested AGENTS.md files, skills, the docs/ directory, and inline code comments and docstrings. These artifacts say, "This is the current state and how we work in it."

Not canon is useful context that is not a source of truth about the current codebase. It includes plans and specs (.specs/ and Linear), and historical rationale (.notes/).

Both categories are valuable. The potential mistake is treating not-canon as canon. A Linear ticket may describe a feature that was never implemented, or was implemented differently than planned. If the agent reads that ticket and treats it as truth, it will be confused about the correct state of the world. By explicitly marking what is and is not canonical, we give agents a more nuanced ontology.

The question this may raise is, "Why allow agents to see non-canonical information at all?" The answer is that non-canonical information can still be extremely valuable when parsing complex situations. Agents need a way to reach back to specific moments in history and answer questions like "Why did we write this code this way?" In a pre-agent world, the answer was a Slack DM to whoever wrote the commit. Now the answer is .notes/. 

For example, when our incident response agent, Clueso, debugs a user report, non-canonical context helps Clueso understand whether it is a bug or a feature. While specifications tell Clueso the latest intended behavior, the notes indicate important edge cases that were considered by the original code author.

Our full mapping is published below as the Authority Map.

The Authority Map

Canon: a source of truth about today
  • AGENTS.md: Root and nested directives: how we work, here.
  • Skills: Cross-cutting procedures loaded on match.
  • docs/: Durable architecture and onboarding.
  • Docstrings: Contract, invariants, side effects.
  • Comments: Non-obvious local reasoning, in place.

Not Canon: intent, history, hypothesis
  • .specs/: Repo-backed product and tech specs.
  • Linear: Active project specs, alignment-stage decisions.
  • .notes/: Change-set tradeoffs, recorded close to the work.
  • PR descriptions: Why the diff exists; what it discarded.
  • Slack threads: The unrecorded conversation, fossilised.
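A minimal sketch of how the repo-backed rows of the Authority Map could be encoded as a lookup. The function name `authority_of`, the rule order, and the "source" label for ordinary code files are assumptions made for illustration; Linear, PR descriptions, and Slack live outside the repo and are not modeled here.

```python
from pathlib import PurePosixPath

# Hypothetical encoding of the Authority Map; names and rule order are
# assumptions for illustration, not the actual standards document.
NOT_CANON_DIRS = {".specs", ".notes"}
CANON_DIRS = {"docs"}
SKILL_ROOT = (".agents", "skills")


def authority_of(path: str) -> str:
    """Classify a repo-relative path as "canon", "not-canon", or "source"."""
    p = PurePosixPath(path)
    parts = p.parts
    # Plans, specs, and notes are never canon, whatever they contain.
    if any(part in NOT_CANON_DIRS for part in parts):
        return "not-canon"
    if p.name == "AGENTS.md":
        return "canon"
    if parts[:2] == SKILL_ROOT:
        return "canon"
    if parts and parts[0] in CANON_DIRS:
        return "canon"
    # Ordinary source files: their docstrings and comments are canon.
    return "source"
```

For example, `authority_of(".notes/tasks-tradeoffs.md")` returns `"not-canon"`, so an agent (or a scanner) would read that file as history rather than as a description of the current system.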

The Six-Layer Architecture

The Authority Map gave us a clean six-layer architecture.

The Context Pyramid (Layers 1-3)

Nested AGENTS.md (loaded by directory)
  • Migrations AGENTS.md: never edit committed migrations; always add a new one.
  • Folder AGENTS.md: directory-specific rules.

Skills (loaded on match)
  • Database: query patterns, schema conventions.
  • Testing: when tests are required and how to write them.
  • Skills: backend, frontend, pr, docs, transactions.

Root AGENTS.md (always loaded)
  • Principles, workflow, communication, type safety, naming. Every line is read by every agent in every session.

Layer 1: Root AGENTS.md. Our engineering principles, workflow definitions, and communication patterns. Loaded in every session. Currently around 300 lines. It is the highest-leverage file in the repository: every token is seen by every agent, every time. For Claude users, we simply symlink CLAUDE.md to AGENTS.md.

Layer 2: Nested AGENTS.md files. More than 100 of these across the monorepo, each scoped to its directory. The backend AGENTS.md specifies import conventions, concurrency patterns, and dependency rules. Each file is narrow and operational. 

Example:

### Imports

All Python imports go at the top of the file.

- Strongly avoid inline/deferred imports to work around circular imports. A circular import means the module structure is wrong--fix the structure instead.

- Only acceptable reason for a non-top-level import: the imported module has expensive load-time side effects and the calling code path is rarely executed.
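To make directory scoping concrete, here is a minimal sketch of resolving which AGENTS.md files apply to a given file, ordered from the root down to the most specific directory. The function name and the in-memory `existing` set are assumptions for illustration; how a particular agent harness actually discovers and loads nested files is tool-specific.

```python
from pathlib import PurePosixPath


def applicable_agents_files(target: str, existing: set[str]) -> list[str]:
    """List the AGENTS.md files that apply to `target`, root first.

    `target` is a repo-relative file path; `existing` is the set of
    repo-relative AGENTS.md paths present in the repo.
    """
    directory = PurePosixPath(target).parent
    # Walk from the target's directory up to the repo root, then reverse so
    # general context comes first and the most specific context comes last.
    chain = [directory, *directory.parents]
    candidates = [(d / "AGENTS.md").as_posix() for d in reversed(chain)]
    return [c for c in candidates if c in existing]
```

With a root file, a `backend/AGENTS.md`, and a `backend/db/migrations/AGENTS.md`, a change to a migration picks up all three, while a frontend file picks up only the root file and whatever sits under `frontend/`.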

Layer 3: Skills. The .agents/skills/ directory contains skill packages covering backend architecture, frontend patterns, testing standards, documentation conventions, and domain-specific knowledge for products. 

Layer 4: Sub-agent roles. The .agents/roles/ directory defines more than half a dozen specialized agents, each with its own context window. The verifier runs diff-scoped tests and pre-commit hooks, then reports pass/fail with actionable failure details. The standards-enforcer validates code against all applicable AGENTS.md files and skills, checking for overly defensive programming, dead code, and missing test coverage. 

# verifier.md (frontmatter)

---
id: verifier
name: verifier
description: Runs diff-scoped tests, pre-commit hooks, and relevant lint/type checks, then reports pass/fail status with actionable failure details.
codex_agent_key: verifier
codex_model: gpt-5.5
codex_model_reasoning_effort: low
codex_model_verbosity: low
---

Layer 5: Unified MCP. Our unified MCP server gives agents access to external systems: Linear for project context, Slack for team communication, Better Stack for logs, PostHog for analytics, and dev database access for validation. An agent investigating a bug can pull the relevant Linear ticket, check production logs, and query the database without the engineer manually copying context into the prompt. 

Layer 6: Tests. Automated enforcement that catches standards violations before they reach CI. Ruff for Python linting and formatting, BasedPyright for type checking, ESLint and Prettier for TypeScript, plus detections for large files, private keys, and merge conflicts. These pre-commit hooks are the last line of defense; they enforce the standards even when an agent (or a human) forgets to follow them.
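The large-file, private-key, and merge-conflict detections are simple enough to sketch. The version below is an illustration of the kind of check involved, not the hooks themselves: the size threshold, the regular expressions, and the function name are all assumptions.

```python
import re

# Illustrative threshold and patterns; the real hook configuration is not
# shown in this post.
MAX_BYTES = 500_000
CONFLICT_MARKER = re.compile(r"^(<{7} |={7}$|>{7} )", re.MULTILINE)
PRIVATE_KEY = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")


def check_file(name: str, content: bytes) -> list[str]:
    """Return human-readable problems found in one staged file."""
    problems = []
    if len(content) > MAX_BYTES:
        problems.append(f"{name}: larger than {MAX_BYTES} bytes")
    text = content.decode("utf-8", errors="replace")
    if CONFLICT_MARKER.search(text):
        problems.append(f"{name}: unresolved merge conflict marker")
    if PRIVATE_KEY.search(text):
        problems.append(f"{name}: possible private key")
    return problems
```

These are heuristics: a line consisting of exactly seven equals signs would be flagged even outside a merge, which is the usual trade-off for a check that must be fast and deterministic.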

Rewriting AGENTS.md

Our repo contained lots of AGENTS.md files that had been written before we codified our principles. We found about 20 of them, and they were in rough shape. Here are the three most common issues we saw across the AGENTS.md files.

First, many of the files described the codebase to the agents rather than instructing them. For example, one AGENTS.md said: "SRC is where we put all our source code." Of course, the agent already knows what an src/ folder is; it has been trained on hundreds of thousands of repositories with that convention.

Compare that with an instruction like "use strict type checking" or "never use inline imports to work around circular dependencies; fix the module structure instead." These operational directives change the agent's behavior. They tell the agent how we expect it to work.

Second, when our AGENTS.md files did include instructions, they were often all high-priority, "must-follow" directives. When you tell an agent in strongly worded terms that everything is important, it makes nothing important. One of the trickier parts of refining the rules was consistently embedding an accurate sense of priority into the prose. The default-no and localization principles helped guide us here. Removing unnecessary emphasis and placing instructions where they applied yielded the agent behavior we wanted.

Third, we also needed to organize information that applied in multiple scenarios across folders. For example, knowledge about the intricacies of our Tasks product could not properly live only in the backend AGENTS.md. This knowledge was necessary for frontend business logic as well. We embedded cross-folder knowledge in skills that could be loaded by the agent on demand. Originally we used a /docs folder, but we moved to skills because current models are post-trained to load skills effectively. (Docs are now for explicitly human-facing material.)

We codified five authoring rules for AGENTS.md files, each of them a corollary of the principles:

  1. Instruction quality. Write for agents, not for humans. The objective of your AGENTS.md files should be to explain to an agent how to operate. They should not become permanent documentation for humans.
  2. Hierarchy-first placement. Place context at the most specific directory that fully owns it. Information moves up only when it is genuinely shared.
  3. Resilient references. Use descriptive names rather than exact file paths. Paths change; descriptions are stable.
  4. Text-only, search-friendly content. No ASCII art, no binary content, no formatting that interferes with search or parsing.
  5. Default-no. Would an agent reasonably need this information for the majority of tasks in this directory? If not, it belongs somewhere else.
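Rules 3 and 4 are the most mechanical of the five, so they lend themselves to a lint. The sketch below flags path-like references and ASCII-art lines in an AGENTS.md file; the regular expressions and the function name are rough heuristics assumed for illustration, and they will produce false positives (URLs and markdown table separators, for instance).

```python
import re

# Heuristics assumed for illustration, not an actual Basis check.
# PATH_LIKE: at least one slash and a file extension, e.g. backend/app/db.py
PATH_LIKE = re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\.\w+\b")
# BOX_DRAWING: Unicode box-drawing characters, or +--- / ---+ style borders
BOX_DRAWING = re.compile(r"[\u2500-\u257F]|[+|]-{3,}|-{3,}[+|]")


def lint_agents_md(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for rule 3 and rule 4 violations."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PATH_LIKE.search(line):
            findings.append((lineno, "exact file path; prefer a descriptive name"))
        if BOX_DRAWING.search(line):
            findings.append((lineno, "ASCII art or box drawing"))
    return findings
```

Rules 1, 2, and 5 are judgment calls about content and placement, which is why they are better suited to review by an agent or a human than to a regex.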

The team rewrote AGENTS.md files across about 20 folders, migrating contextual knowledge to skills and replacing descriptive content with operational instructions. Examples of what survived the rewrite:

Canon context is a source of truth you can trust to inform decisions. Non-canonical context is context that indicates intent, notes, temporary states, etc.

Prefer early returns over deep nesting.

Write code that can be understood without referencing other files. Be explicit rather than clever.

These directives live in the root AGENTS.md, so they are loaded into every agent session across the entire monorepo. They are the rules we want followed regardless of where an agent is working. The root AGENTS.md is currently around 300 lines, and every line has been argued over.

The Cleanup

With the instruction layer rebuilt and the architecture in place, we finally turned to the codebase itself. Ryan Moffat used coding agents to audit every directory against the newly codified instructions, producing a list of nine projects with thousands of lines of violations.

We then deployed agents to fix the problems that agents had perpetuated. The agents that had been absorbing bad patterns were now given explicit, well-structured instructions to rewrite code according to the new standards. 

The rewrite touched an estimated 20 to 30 percent of the entire codebase across the nine completed projects. The principles told us where the bar was; the cleanup was the cost of getting the existing code up to that bar so that it could serve as canon. There is no shortcut. An agent-native codebase demands more local correctness than a human-only one, because every file is context and the agents are constantly onboarding.

Refactoring with agents hit natural limits. Often, there were structural reasons for the technical debt that agents could not solve. We prioritized the most frequently visible parts of the codebase that agents could fix. We then prioritized the visible areas that required human intervention. The rest we left to be cleaned up in our normal processes. 

Maintaining Canonical Context

The first question anyone asks when they see our architecture is, "How do you keep all that from rotting?" 

Maintenance starts with owners. Every canonical artifact at Basis carries an explicit owner field in YAML frontmatter at the top of the file. A CI/CD check ensures that any new skill or non-production markdown file has a corresponding owner. When our automated context cleanup system flags something, the owner is responsible for reviewing it. 
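A minimal sketch of what such an owner check could look like. It uses a deliberately small hand-written parser for simple `key: value` frontmatter so the example stays self-contained; a real check would more likely use a YAML library, and the function names here are assumptions.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a leading `---` ... `---` block of simple `key: value` lines.

    Illustrative only: handles flat string fields, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return {}  # No closing delimiter: treat as missing frontmatter.


def files_missing_owner(files: dict[str, str]) -> list[str]:
    """Return the sorted paths whose frontmatter lacks a non-empty `owner`."""
    return sorted(
        path
        for path, text in files.items()
        if not parse_frontmatter(text).get("owner")
    )
```

In CI, a non-empty result from `files_missing_owner` would fail the build and name the offending files, which keeps the ownership rule from depending on anyone remembering it.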

We have a set of cloud agent automations that review the monorepo. This is what we call our Automatic Context system. Three of those automations target context directly:

  • A CI/CD check ensures merges meet our deterministic standards: valid frontmatter, operational directives rather than descriptive prose, and proper grammar.
  • A scanner runs daily to do a broad sweep of skills and AGENTS.md files for staleness, contradictions, duplicated instructions, broken references, and missing context for recent changes.
  • Workers run daily to pick up tickets from the scanner and implement small, scoped fixes.
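Of the problems the scanner looks for, duplicated instructions are the easiest to detect deterministically. The sketch below shows only that slice, under the assumption that instructions are compared line by line after light normalization; the function names and the length cutoff are illustrative, and detecting staleness or contradictions would need an agent's judgment rather than string matching.

```python
from collections import defaultdict


def normalize(line: str) -> str:
    """Drop list markers, lowercase, and collapse whitespace for comparison."""
    return " ".join(line.strip().lstrip("-*• ").lower().split())


def duplicated_instructions(
    files: dict[str, str], min_len: int = 20
) -> dict[str, list[str]]:
    """Map each instruction that appears in more than one file to those files."""
    seen: dict[str, set[str]] = defaultdict(set)
    for path, text in files.items():
        for line in text.splitlines():
            key = normalize(line)
            if len(key) >= min_len:  # Skip headings and short fragments.
                seen[key].add(path)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}
```

Under the localization principle, a hit usually means the instruction should live in only one place: the most specific directory that fully owns it, or the root file if it is genuinely shared.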

The broader point: automated context maintenance is only possible because we agreed on what is canonical. A scanner can sweep AGENTS.md files and skills for contradictions because canonical context is, by definition, supposed to agree with itself. Non-canonical context is allowed to disagree with itself; specs are revised, plans are abandoned, .notes/ entries capture decisions made at moments that no longer exist. If you do not draw the line between what must be self-consistent and what need not be, you cannot run a scanner over either category.

Closing the Validation Loop

Alongside the problem of agents writing non-standard code, we also recognized that our testing wasn’t standardized. One of our principles was that agents’ work requires verification, so we expanded our testing frameworks. This was a separate effort, led by Bhavdeep Sethi on the platform side and the Atlas team on the agent behavior side.

We found a lot of success with an inter-team structure: pairing one engineer focused on solving the traditional technical problems of testing with another engineer focused on the agent's instructions. Bhavdeep built the testing infrastructure: unit tests, integration tests, proper fixtures and markers, CI integration. The Atlas team's contribution was embedding testing standards into the agent behavior layer. This approach treated agent behavior as a first-class requirement rather than an afterthought.

We created a testing skill that defines what tests are expected, when they are required, and how they should be structured. We extensively evaluated whether our guidelines induced agents to produce the tests we wanted. Sometimes agents were too verbose. Other times, they were extremely lazy. Getting the skill language correct required some work. It was worth the investment to have agents that consistently produced tests according to our standards.

How We Judged the Result

We started with the simplest metric to measure: token usage per developer. The hypothesis was that if we solved the problems making coding agents perform poorly, engineers would be able to trust agents to do more work, which would let engineers manage more agents simultaneously, which would increase token usage. We set a goal of 5x token usage in one quarter. It felt ambitious because engineers at Basis were already coding-agent power users. When we hit that goal, we knew we were enabling developers to parallelize more and spend less time fixing agent output.

Increasing AI usage is only meaningful if it enhances the team's overall productivity. Weekly commit velocity over this time increased by 2.5 times. By the end of this work, 100% of our engineering team was working with multiple worktrees. Engineers were coming to us asking for better tooling to help them manage more agents.

What's Next

Coding agents are a new kind of consumer of your codebase, with their own failure modes, their own appetite for context, and their own demands on what counts as a well-organized repo. Most companies have not begun to take that seriously. The ones that do will find, as we did, that the work is bigger than expected, the principles are non-obvious, and the payoff is substantial.

Now that we have given coding agents an ergonomic environment to succeed in, we are optimizing the entire AI-native software development lifecycle at Basis. This includes new approaches such as proof-based development, redesigning our code review process, and experimenting with automatic code maintenance, the natural extension of the Automatic Context machinery from the instruction layer to the code itself.

If you want to join an agent-native company, we’re hiring.

Michael Crabtree is Atlas Tech Lead at Basis. Ryan Moffat led the codebase standards audit and owns the Automatic Context system. Bhavdeep Sethi built the testing infrastructure. Seth Schiesel contributed to this post.
