Code churn is not a new concept. Software engineering research has tracked it for decades as an indicator of codebase stability and development process health. What has changed is the stakes: in most active repositories, a significant share of commits is now AI-assisted.
Churn ratio measures how much recently added code gets rewritten or deleted within a short window, typically two weeks. A line committed on Monday and modified again by the following Monday did not survive. It was not stable code; it was a draft that made it to the repository.
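As code, the definition is a single division. A minimal sketch, assuming a 14-day window and illustrative counts (definitions vary by tool):

```python
def churn_ratio(lines_added: int, lines_churned: int) -> float:
    """lines_churned: lines from this period that were rewritten or
    deleted again within the window (assumed 14 days here)."""
    return lines_churned / lines_added if lines_added else 0.0

# 120 of the 500 lines added this period were rewritten within two weeks.
print(churn_ratio(lines_added=500, lines_churned=120))  # 0.24
```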
High churn is not inherently a problem. Exploratory development churns. Prototypes churn. Code under active architectural revision churns by design. The signal becomes important when churn is elevated across normal feature work, when it is increasing over time, and when the increase coincides with a change in how the team is writing code.
What the research shows
GitClear’s longitudinal analysis of a large sample of repositories found that AI-assisted commits show approximately twice the churn rate of human-authored commits. That finding has held consistently across their dataset.
To make that concrete: a team where 60% of commits are AI-assisted, with the 2× churn multiplier applied to that portion, is producing meaningfully less durable code than its total commit volume implies. If the team commits 10,000 lines a month and 6,000 of them are AI-assisted, the effective output of code that survives past two weeks is considerably lower than the gross additions figure suggests.
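A back-of-the-envelope version of that calculation. The 10% human baseline churn rate is an assumption for illustration; the 2× multiplier is the GitClear figure:

```python
# Hypothetical monthly figures from the example above.
total_lines = 10_000
ai_lines = 6_000                  # 60% AI-assisted
human_lines = total_lines - ai_lines

human_churn = 0.10                # assumed baseline: 10% of human lines churn
ai_churn = 2 * human_churn        # GitClear's ~2x multiplier applied

surviving = (human_lines * (1 - human_churn)
             + ai_lines * (1 - ai_churn))
print(surviving)                  # 8400 of 10,000 gross lines survive
```

Under those assumed rates, 8,400 of the 10,000 gross lines survive the window: roughly a sixth of the reported output is rework waiting to happen.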
The same research identifies a 4× increase in duplicate code blocks in high-adoption codebases. Duplication and churn often appear together — AI generation tends toward pattern-matching rather than the kind of abstraction that reduces repetition, which means similar code gets written in multiple places rather than being factored into shared functions. The churn comes later, when someone notices the duplication and consolidates it.
Why most teams are not reading this signal
Standard git analytics surfaces additions, deletions, and commit counts. Some tools add velocity trends and author breakdowns. Churn ratio requires tracking whether specific lines are modified again within a defined window after they were written: slightly more involved than raw line counts, and it requires a clear definition of what “churned” means.
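One way to approximate it from history alone: take each commit's added lines from `git show --numstat`, snapshot the branch 14 days later, and count how many of those lines `git blame` still attributes to the commit. A rough sketch; the survival-by-attribution heuristic and its simplifications (full SHAs required, renames and merge commits ignored) are assumptions, not any particular tool's method:

```python
import subprocess
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=14)

def git(*args: str) -> str:
    """Run git in the current repository and return stdout."""
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

def numstat(commit: str) -> list[tuple[int, str]]:
    """(lines added, path) for each file touched by the commit."""
    out = []
    for row in git("show", "--numstat", "--format=", commit).splitlines():
        if not row.strip():
            continue
        added, _deleted, path = row.split("\t", 2)
        if added.isdigit():          # "-" marks binary files; skip them
            out.append((int(added), path))
    return out

def churn(commit: str) -> tuple[int, int]:
    """Return (churned, added) line counts for one commit.

    `commit` must be a full SHA and older than WINDOW, so that a
    later snapshot of the branch exists to compare against.
    """
    ts = int(git("show", "-s", "--format=%ct", commit))
    cutoff = datetime.fromtimestamp(ts, tz=timezone.utc) + WINDOW
    # Branch state roughly 14 days after the commit landed.
    snapshot = git("rev-list", "-1",
                   f"--before={cutoff:%Y-%m-%d %H:%M:%S %z}", "HEAD").strip()
    added = survived = 0
    for n, path in numstat(commit):
        added += n
        try:
            blame = git("blame", "--line-porcelain", snapshot, "--", path)
        except subprocess.CalledProcessError:
            continue                 # file gone at snapshot: all lines churned
        # blame re-attributes edited lines, so lines still credited to
        # this commit are the ones that survived the window unchanged.
        survived += sum(1 for line in blame.splitlines()
                        if line.startswith(commit))
    return added - survived, added
```

Blame-based survival is a heuristic: a line that moves verbatim stays attributed to its original commit, while any edit re-attributes it, which lines up reasonably well with the draft-that-made-it-to-the-repository framing above.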
Most dashboards do not calculate it. Most engineering managers have not asked for it because they are not aware it exists as a trackable number. The teams that do measure it tend to surface it in technical metrics reviews rather than in the management reporting where velocity and commit counts live.
The result is a signal that is genuinely predictive of code quality trends, sitting unused in data that most teams already have.
What a concerning pattern looks like
Churn ratio is most useful as a before-and-after comparison rather than as an absolute number. A team at a churn ratio of 1.8 is not clearly in trouble or clearly fine without context. A team that was at 0.9 two years ago and is now at 1.8, with the inflection point coinciding with significant AI adoption, has a signal worth investigating.
The investigation is a calibration question: which part of the workflow is producing unstable code? Is it concentrated in one developer’s output, one repository, one type of task? Is it feature work, refactoring, test code? The churn ratio is the signal that tells you there is a question. The breakdown tells you where to look.
It is also worth separating two distinct patterns. In the first, churn is high and uniformly distributed across all AI-assisted commits — which suggests a workflow problem where AI output is not being reviewed carefully enough before commit. In the second, churn is elevated only in certain task categories — which suggests the tools are being applied well in some contexts and poorly in others.
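Separating the two patterns is a grouping exercise over per-commit churn data. A sketch, where the `commits.csv` export and its column names (`ai_assisted`, `task_type`, and so on) are hypothetical:

```python
import pandas as pd

# One row per commit: churned/added line counts plus whatever labels you
# can attach (assistant-tagged, author, task type from the tracker).
df = pd.read_csv("commits.csv")          # hypothetical export
df = df[df["lines_added"] > 0]
df["churn_ratio"] = df["lines_churned"] / df["lines_added"]

ai = df[df["ai_assisted"]]

# Pattern 1: uniformly high churn across AI commits -> review-process problem.
print(ai["churn_ratio"].describe())

# Pattern 2: churn concentrated in certain task types -> tool-fit problem.
print(ai.groupby("task_type")["churn_ratio"].mean().sort_values())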
Both findings are actionable. Neither is visible without the number.
What a healthy number looks like
There is no universal benchmark, because churn varies by codebase maturity, team size, and the nature of the work. Greenfield development has higher natural churn than maintenance work on a stable system.
The useful reference point is your own pre-AI baseline — the churn ratio your team was running before AI tools entered the workflow. If current churn is materially above that number, and the increase tracks with adoption, you have a concrete finding to work with. If churn has held steady through adoption, or improved, that is also a finding — one that tells you the AI workflow is producing durable code.
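The baseline comparison itself is two numbers and a delta. A sketch, again assuming the hypothetical per-commit export and a known rollout date:

```python
import pandas as pd

df = pd.read_csv("commits.csv", parse_dates=["committed_at"])
df = df[df["lines_added"] > 0]
df["churn_ratio"] = df["lines_churned"] / df["lines_added"]

ADOPTION = pd.Timestamp("2024-03-01")    # assumed AI rollout date

baseline = df[df["committed_at"] < ADOPTION]["churn_ratio"].mean()
current = df[df["committed_at"] >= ADOPTION]["churn_ratio"].mean()
print(f"pre-AI baseline: {baseline:.2f}, current: {current:.2f}, "
      f"change: {current / baseline:.1f}x")
```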
Both outcomes are worth knowing.