Can AI Understand a Codebase With 15 Years of History?

Millions of lines, hundreds of tables, dozens of integrations: what AI grasps in hours, where it fails, how teams deploy RAG and repo indexing, and why expert + model beats either alone.

Published: 23 June 2026

Key takeaways

Legacy is accumulated complexity, not framework age. A fifteen-year project stacks several technology generations, stale documentation, and business logic embedded in SQL and cron jobs. The pain is not “Java 8” — it is that changing one field can touch five integrations.

In hours, AI delivers what takes humans weeks — with proper data prep. Without repo indexing, DB schemas, and git history, the model sees fragments and fills gaps from generic patterns. With RAG, semantic search, and agent tools (Cursor, Claude Code, Windsurf Deep Wiki), the picture forms an order of magnitude faster.

AI excels at search, impact analysis, and documentation. Where price is calculated, where document approval happens, which modules read orders_legacy — answered in minutes when the repo is accessible.

AI struggles with “why we decided this” and blind trust. It was not in the 2014 meeting, does not know the bank contract, and can confidently describe a non-existent API. Hallucinations are dangerous because they sound plausible.

The optimal model is expert + AI. The developer or architect sets boundaries, verifies outputs, and decides; the model is the analyst that never tires of reading three million lines.

Introduction: why legacy stays the main pain

Most companies do not rewrite systems every five years. They evolve them: new modules, integrations, regulatory changes. Retail ERP, B2B CRM, core banking, government systems — they run for decades. Downtime costs more than a year of maintenance.

A typical mature corporate project: 1.5–4M lines of code, 8–15k files, 200–600 tables in the primary DB plus reporting stores, 20–40 external integrations (banks, e-invoicing, marketplaces, ERP bridges, message buses). Teams turned over five to ten times; some authors are unreachable.

Onboarding is not “read the README.” It is months of code, incident postmortems, and learning where truth lives in code vs. in two people’s heads. Leadership naturally asks: can we delegate first-pass discovery to AI?

The 2026 answer is yes, partially and conditionally — not “upload a zip to ChatGPT and get architecture.” You need indexing, repo-aware tools, and human verification. Below: what models grasp, where they fail, and how teams deploy this in practice.

What a typical fifteen-year project looks like

Technical layers from different eras

Over fifteen years one repo (or family of repos) accumulates waves of tech: monolith on Java or .NET, JSP or WebForms, stored procedures; then REST services, Angular or React frontends, a reporting service. A “temporary” Python export script still runs in prod. Microservices were extracted partially — three new services beside a core nobody dares touch.

Traces of many teams show in coding style, naming, multilingual comments, and duplicate abstraction layers. “Old” modules often behave more reliably than “new” ones because they have been patched at the edges for a decade.

Missing up-to-date documentation

The Confluence page “Architecture v3” is from 2019, before the Kafka migration. Swagger covers new APIs only; legacy exchanges XML via a schema one integrator knows. Actual behavior diverges from docs: a config flag, a 02:00 cron, a manual operator step.

Some rules exist only in people’s heads: “do not touch this table before month close,” “this endpoint is deprecated but the bank still hits it.” They rarely land in git.

For AI, docs are one source among many — never trusted without code cross-check. The upside: models can generate documentation from code and shrink the gap.

Complex business processes

ERP, CRM, banking, and public-sector systems are not CRUD apps. Business logic accumulated: approvals, limits, tax rules, document states, multi-step orders. A bug is not a UI glitch — it is a fine, frozen account, or regulator rejection.

High cost of errors changes the game: “quick patch from AI” is not enough. You need consequence analysis. AI finds where totals are computed; humans decide whether the formula can change before release.

How modern AI models analyze code

What changed in recent years

Three shifts made legacy analysis realistic.

Context windows grew from thousands to hundreds of thousands and millions of tokens. You still cannot load an entire repo at once, but a module, DB schema, or call chain fits in one session.

Code understanding in current models (Claude, GPT-4o, Gemini, Codestral, etc.) is strong enough to trace dependencies, explain SQL, and map DTOs to tables. It is not perfect, but on a first pass it is comparable to a strong mid-level developer.

Repository tools moved beyond chat: Cursor and Claude Code index projects, traverse files, grep, read git blame; Windsurf Deep Wiki builds live wikis; enterprise RAG connects GitLab, Jira, Confluence.

Data AI can analyze

Source	What it gives the model
Source code	Logic, dependencies, APIs
SQL schemas and migrations	Data model, table evolution
OpenAPI, WSDL, protobuf	Integration contracts
Documentation (even stale)	Intent and glossary
Git history	Who changed what, when, why (if commits are honest)
Logs, configs, feature flags	Runtime behavior

The more sources are linked in one index + RAG pipeline, the less the model invents. Code alone without DB schema is a classic failure mode: AI finds entity Order but misses a PostgreSQL trigger.

How AI builds a project map

Typical analysis pipeline: dependency graph (imports, package calls, HTTP clients); domain entities (order, counterparty, shipment) from models, tables, REST paths; scenarios (“create order → reserve → pay → ship”) as class and queue chains; integration points (external URLs, Kafka topics, SFTP folders).

The agent does not “memorize the repo” — it queries like a senior with ripgrep and an IDE: “where is status updated in shipments,” “who calls LegacyBillingAdapter.”
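The dependency-graph step can be approximated with a few lines of Python. This is a minimal sketch, not how any particular agent works internally: it assumes Java sources are already loaded into memory, that in-house code shares one package prefix (the `com.acme` root and the function names are hypothetical), and it ignores static and wildcard imports.

```python
import re
from collections import defaultdict

# Plain single-class imports only; static and wildcard imports are ignored in this sketch.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)


def build_package_graph(sources: dict[str, str], root: str) -> dict[str, set[str]]:
    """Map each package to the in-house packages it imports.

    sources: file path -> Java source text (hypothetical in-memory corpus).
    root: package prefix that marks in-house code, e.g. "com.acme" (assumption).
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for text in sources.values():
        pkg_match = PACKAGE_RE.search(text)
        if not pkg_match:
            continue  # skip files without a package declaration
        pkg = pkg_match.group(1)
        for imported in IMPORT_RE.findall(text):
            if not imported.startswith(root + "."):
                continue  # ignore JDK and third-party imports
            target = imported.rsplit(".", 1)[0]  # drop the class name
            if target != pkg:
                graph[pkg].add(target)
    return dict(graph)


def who_depends_on(graph: dict[str, set[str]], package: str) -> list[str]:
    """Reverse lookup: which packages import the given one."""
    return sorted(src for src, targets in graph.items() if package in targets)
```

The reverse lookup answers the "who calls LegacyBillingAdapter" style of question at package granularity. Reflection, dependency injection by name, and calls made from SQL remain invisible to an import scan.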

Experiment: giving AI a fifteen-year project

Below is a typical scenario reconstruction based on real wholesale ERP patterns (monolith + satellite services). Numbers are representative; your project may differ, but orders of magnitude should feel familiar.

Starting conditions

~2.8M lines of Java, Kotlin, SQL, JavaScript, XML configs
~11,400 files in the main monorepo + 4 satellite repos
~380 tables in Oracle (core) + 90 in PostgreSQL (reporting)
34 integrations: banks, e-invoicing, marketplaces, accounting bridge, SMS, customs
Documentation: ~40% of modules lack current descriptions; wiki partially contradicts code

AI setup: repo indexing, DDL dump access (no PII), read-only git, IDE agent. No prod logs, no oral legends from the team.

What AI understands in the first hours

Over 4–8 hours of targeted sessions (not one continuous run), a tooled model usually produces:

Top-level architecture: monolith core-app, extracted print and notification services, overnight batch 01:00–04:00, bus for order events.

Core entities: counterparty, contract, order, shipment, invoice, payment — mapped to packages and tables.

Key scenarios: order placement, discount approval, warehouse reservation, invoicing, bank reconciliation — with REST, UI, and scheduler entry points.

Integration points: adapter list, URLs, formats, common failure modes (bank timeout, queue retries).

This is faster than a new senior without such tools — humans spend time navigating and guessing where to look.

What humans still study for weeks

Non-obvious dependency maps: “field discount_reason affects the tax line via a view not referenced in Java.”

Informal rules: seasonal procedures, key-client exceptions, a one-region workaround.

Quality and risk: untested modules, last P1 incident areas, who to call when nightly batch fails.

Change policy: Friday deploy rules, DBA windows.

AI accelerates the first 60–70% of the map but does not replace team conversations and incident memory. Onboarding from “three months” toward “six weeks” with a good AI loop is realistic; “full understanding in one week” is not.

Where AI performs best

Finding business logic

Questions like “where is line total computed with discount and VAT” are a strength. The agent finds PriceCalculator, reporting SQL, and a duplicate legacy method nobody removed.

“Where is document approval” — workflow engine + approval_steps + notifications.

“Where does status change” — enum grep, mapper update, event listener.

Humans can answer the same questions, but it takes days; AI needs minutes when the index is fresh.

Change impact analysis

Before refactoring client_id or dropping a table, you need impact analysis. AI lists the affected JPA entities, reports, integration DTOs, stored procedures, and tests. The list is not guaranteed complete (dynamic SQL and reflection hide references), but it removes roughly 80% of the drudgery.

Especially valuable before DB migrations or column type changes.
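A first-pass version of such an impact scan might look like the sketch below. It assumes files are already loaded as text, uses a hypothetical extension-to-kind mapping, and matches names as case-insensitive substrings. That over-reports on purpose, since a false hit is cheaper than a missed consumer.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to artifact kind.
KINDS = {".java": "code", ".kt": "code", ".sql": "sql", ".xml": "config", ".yaml": "config"}


def name_variants(column: str) -> set[str]:
    """Return the snake_case name plus its camelCase form (client_id -> clientId)."""
    head, *rest = column.split("_")
    return {column, head + "".join(part.capitalize() for part in rest)}


def find_references(files: dict[str, str], column: str) -> dict[str, list[tuple[str, int]]]:
    """Group (path, line number) hits by artifact kind.

    Only literal mentions are found. Names assembled at runtime
    (dynamic SQL, reflection) will not show up here.
    """
    needles = {variant.lower() for variant in name_variants(column)}
    hits: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, text in sorted(files.items()):
        kind = KINDS.get(PurePosixPath(path).suffix, "other")
        for lineno, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if any(needle in lowered for needle in needles):
                hits[kind].append((path, lineno))
    return dict(hits)
```

Grouping by kind makes the review order obvious: SQL and config hits usually deserve attention first, because compilers and unit tests do not cover them.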

Documentation generation

From code: module descriptions, missing OpenAPI, component diagrams (Mermaid, PlantUML), entity glossaries. Windsurf Deep Wiki and peers do this semi-automatically; teams in the Platform9 / Monday.com webinar cite live repo docs as early AI ROI.

Mark output as generated and review it — otherwise wiki drifts again, just prettier.
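As an illustration of the diagram step, the sketch below turns a list of dependency edges into Mermaid flowchart text and stamps it as generated. The edge data and the function name are hypothetical, and labels are assumed to contain no double quotes.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render (source, target) dependency edges as a Mermaid flowchart."""
    ids: dict[str, str] = {}

    def node_id(name: str) -> str:
        # Stable short ids keep labels with dashes or dots valid in Mermaid.
        return ids.setdefault(name, f"n{len(ids)}")

    lines = [
        "%% generated from code; review before publishing",
        "flowchart LR",
    ]
    for src, dst in sorted(set(edges)):  # de-duplicate and keep output stable
        s, d = node_id(src), node_id(dst)
        lines.append(f'    {s}["{src}"] --> {d}["{dst}"]')
    return "\n".join(lines)


print(to_mermaid([("core-app", "print-service"), ("core-app", "notification-service")]))
```

Sorting the edges keeps the output deterministic, so a regenerated diagram only shows up in a diff when the dependencies actually changed.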

Faster developer onboarding

New hires ask RAG chat: “how is order cancellation implemented,” “why two PaymentServices,” “where are bank X integration logs.” Answers with file links shrink time-to-first-commit.

Not a mentor replacement — compression of the first weeks of repo wandering.

Limitations of AI

Context limits

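Even a million tokens is not 2.8M lines: at a rough ten tokens per line (an assumption; the ratio varies by language and style), that codebase is on the order of 28M tokens. You need indexing, chunking, hierarchical module summaries. Without that, the model sees a slice and extrapolates.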

Copy-paste legacy and magic strings add noise — AI may “merge” two similar classes in its head.
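Chunking is the simplest of these steps to show. The sketch below splits a file into overlapping line windows and keeps the path and line range with each chunk so later answers can cite them. The window size, overlap, and dictionary layout are illustrative assumptions; real indexers often split on syntax boundaries instead, and hierarchical summaries are not covered here.

```python
def chunk_lines(path: str, text: str, size: int = 60, overlap: int = 10) -> list[dict]:
    """Split a file into overlapping line windows, keeping file/line provenance."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    lines = text.splitlines()
    step = size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + size]
        chunks.append({
            "path": path,
            "start_line": start + 1,           # 1-based, as editors show it
            "end_line": start + len(window),
            "text": "\n".join(window),
        })
        if start + size >= len(lines):
            break  # this window already reaches the end of the file
    return chunks
```

The overlap exists so that a method cut at a window boundary still appears whole in at least one neighbouring chunk.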

Missing business context

Code shows what, rarely why. A “temporary” 2017 bank API workaround looks like nonsense until an architect explains it.

Historical constraints (license, hardware, SLA contract) are not in git. ADRs help when they exist; in legacy, often they do not.

Hallucinations and misinterpretation

Models confidently cite non-existent endpoints, confuse v1 and v2 APIs, miss reflection-based calls. Blind trust risk is higher for management than engineers — because answers are well structured.

Rule: verify every AI output for prod decisions with file/line references or tests. For critical paths, add a second model or peer review, as teams that use AI at high volume recommend.
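The file/line check is easy to automate as a first filter. The sketch below assumes the model's answer has already been parsed into a path, a line number, and a symbol, and that repository files are available as text; the function name and status strings are hypothetical.

```python
def check_citation(files: dict[str, str], path: str, line: int, symbol: str, radius: int = 3) -> str:
    """Classify a cited location.

    Returns 'ok', 'missing_file', 'out_of_range', or 'symbol_not_found'.
    radius: how many lines around the cited one may contain the symbol,
    since models are often off by a few lines.
    """
    if path not in files:
        return "missing_file"
    lines = files[path].splitlines()
    if not 1 <= line <= len(lines):
        return "out_of_range"
    lo = max(0, line - 1 - radius)
    hi = min(len(lines), line + radius)
    window = lines[lo:hi]
    return "ok" if any(symbol in candidate for candidate in window) else "symbol_not_found"
```

A result of 'ok' only shows that the cited text exists. Whether the code does what the model claims still needs a human reading or a test.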

How companies deploy AI for legacy systems

Repository indexing

Minimum corpus: code (all prod repos including SQL and infra), DB schema (DDL, Flyway/Liquibase migrations), docs (Confluence export, ADRs, README). Reindex on merge to main, not once a year.

Secrets: exclude .env files, keys, and PII from the index, and tie this rule to the corporate policy on AI agent sandboxing.
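A pre-index filter for the secrets rule could start as small as the sketch below. The name patterns are an illustrative assumption, not a complete policy. The content check only catches PEM-style private key blocks, and PII detection needs dedicated tooling that is out of scope here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical deny-list of file names that should never reach the index.
EXCLUDED_NAMES = (".env", ".env.*", "*.pem", "*.key", "*.p12", "*.jks", "id_rsa*")
# Common tail of PEM headers such as "-----BEGIN RSA PRIVATE KEY-----".
PRIVATE_KEY_MARKER = "PRIVATE KEY-----"


def is_indexable(path: str, text: str) -> bool:
    """Reject files by name pattern or by an embedded private key block."""
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, pattern) for pattern in EXCLUDED_NAMES):
        return False
    return PRIVATE_KEY_MARKER not in text


def filter_corpus(files: dict[str, str]) -> dict[str, str]:
    """Keep only the files that are safe to send to the indexer."""
    return {path: text for path, text in files.items() if is_indexable(path, text)}
```

Filtering before indexing matters more than filtering at query time: once a secret is embedded in a vector store, it is much harder to prove it has been removed.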

Internal knowledge base

Dependency graph (modules, services, tables) + semantic search (“where is credit limit mentioned”). Tools: enterprise RAG (Azure AI Search, Elasticsearch + embeddings, on-prem stacks), IDE agents with project index.

Wiki becomes a secondary layer: generated from code, reviewed, versioned beside the repo.

RAG vs. plain chat

Generic ChatGPT does not see your git. RAG retrieves current chunks: class, migration, wiki page. Without RAG, answers read like an average of Stack Overflow; with RAG, they point to “your InvoiceService.java, line 142.”

Legacy needs precision and citations. RAG + agent tools is the 2026 default for internal systems.
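The retrieval half of RAG can be sketched without any external service. The example below swaps real embeddings for a bag-of-words cosine similarity, which is a deliberate simplification, and assumes chunks are dictionaries with `path`, `start_line`, and `text` keys (a hypothetical layout). The prompt wording is likewise only an illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; camelCase and snake_case identifiers are split into parts."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", spaced.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunks: list[dict], question: str, k: int = 3) -> list[dict]:
    """Return up to k chunks that share vocabulary with the question, best first."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in chunks]
    scored = [pair for pair in scored if pair[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(question: str, hits: list[dict]) -> str:
    """Assemble a prompt that forces answers to cite retrieved sources."""
    context = "\n\n".join(f"[{c['path']}:{c['start_line']}]\n{c['text']}" for c in hits)
    return (
        "Answer using only the sources below and cite them as [path:line].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Carrying the path and start line into the prompt is what makes cited answers possible; the similarity function can be replaced by an embedding model without changing that contract.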

Keeping knowledge fresh

On merge to main: reindex touched modules, diff-summary for architecture maps, optional PR comment “consumers of table X changed.” Documentation stops being a 2019 snapshot.
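The merge hook described above reduces to two small questions: which modules changed, and which tables did a migration touch. The sketch below assumes the changed paths come from something like `git diff --name-only`, that a module is the first two directory levels (a hypothetical convention), and that migrations are plain SQL; the comment wording is illustrative.

```python
import re
from pathlib import PurePosixPath

# Table names following ALTER TABLE or DROP TABLE, with optional IF EXISTS.
TABLE_DDL_RE = re.compile(
    r"\b(?:ALTER|DROP)\s+TABLE\s+(?:IF\s+EXISTS\s+)?([\w.]+)", re.IGNORECASE
)


def modules_to_reindex(changed_paths: list[str], depth: int = 2) -> list[str]:
    """Collapse changed file paths into the module directories that need reindexing."""
    modules = set()
    for path in changed_paths:
        dir_parts = PurePosixPath(path).parent.parts
        modules.add("/".join(dir_parts[:depth]) or ".")  # "." stands for the repo root
    return sorted(modules)


def touched_tables(migration_sql: str) -> list[str]:
    """List tables altered or dropped by a migration script, lowercased."""
    return sorted({name.lower() for name in TABLE_DDL_RE.findall(migration_sql)})


def pr_comment(tables: list[str]) -> str:
    """Build the optional PR comment; empty when no schema change was found."""
    if not tables:
        return ""
    return "Schema change detected; review consumers of: " + ", ".join(tables)
```

Reindexing per module keeps the hook fast enough to run on every merge, which is the point of the whole exercise.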

Can AI replace an experienced developer?

What AI already does better

Search speed across millions of lines without fatigue. Parallel traversal of many modules. Draft diagrams and dependency tables. Recall of file names and signatures — with indexing.

What stays human

Architecture: service boundaries, domain splits, strangler-fig strategy for monoliths — decades-long trade-offs.

Business process: product owner alignment, regulation, integrator negotiations.

Risk: “ship on Friday?”, rollback plans, stakeholder warnings.

Communication: explain to the CFO why refactor takes a quarter, not “AI said it’s easy.”

Optimal working model

Developer / architect = expert and final filter. AI = analyst, tech writer, navigator. Ritual: question → cited answer → verification → decision → ADR. Same pattern as teams restructuring SDLC around agents, not “asked ChatGPT and deployed.”

Future: how legacy maintenance will change

Self-documenting systems

Docs generated from main and published automatically; wiki/code drift becomes a CI failure. Knowledge base is a living artifact, not a PDF.
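One way to turn drift into a CI failure is to check that every type a doc page names still exists in code. The sketch below assumes docs mark code identifiers with backticks and that sources declare types with `class`, `interface`, or `enum`; both conventions and the function name are assumptions for illustration.

```python
import re

# Capitalized identifiers in backticks, e.g. `PriceCalculator`.
DOC_SYMBOL_RE = re.compile(r"`([A-Z][A-Za-z0-9]+)`")
# Java/Kotlin style type declarations.
CLASS_DECL_RE = re.compile(r"\b(?:class|interface|enum)\s+([A-Z][A-Za-z0-9]*)")


def stale_doc_symbols(docs: dict[str, str], sources: dict[str, str]) -> dict[str, list[str]]:
    """Return, per doc page, the referenced type names that no source file declares."""
    declared = {name for text in sources.values() for name in CLASS_DECL_RE.findall(text)}
    stale: dict[str, list[str]] = {}
    for path, text in docs.items():
        missing = sorted(set(DOC_SYMBOL_RE.findall(text)) - declared)
        if missing:
            stale[path] = missing
    return stale
```

A CI job would load both corpora, call `stale_doc_symbols`, and exit non-zero when the result is not empty. This catches renamed or deleted types only; behavioral drift still needs review.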

Digital architecture assistants

Internal assistants answer: “what breaks if we drop this column,” “technical debt in billing module,” “who last changed bank Y integration.” Linking monitoring and tickets adds incident context — missing from pure code RAG today.

Legacy in five years

Onboarding: weeks instead of months at the same quality bar. Maintenance: less bus factor on “the one who remembers.” Evolution: more product work, less archaeology. Legacy will not vanish — systems will evolve longer and rewrite less often when architecture tolerates change.

FAQ

Can ChatGPT read our entire repo at once?

No. Even large context windows cannot hold a multi-million-line codebase. You need indexing, RAG, or file-search agents (Cursor, Claude Code). Zip upload to web chat works only for small projects or selected modules.

How much onboarding time does AI save?

Typical outcome: initial system map compresses from weeks to days; full context (risk, policy, nuance) still measured in months — but productive work starts earlier.

Are hallucinations dangerous for legacy?

Yes. A confident wrong answer about an API or DB trigger can cause prod incidents. Require file citations, human review, and tests on critical paths.

Do we need separate RAG if we have Cursor?

Cursor is often enough for one developer working in the IDE. For whole teams that need wiki, Jira, and code search in one place, plus compliance, you need a central corporate index with access control.

Can AI replace the architect on legacy?

No. It accelerates fact gathering; decisions on boundaries, migrations, and trade-offs stay with accountable humans.

What to index first?

Main branches of all prod repos, DB DDL and migrations, OpenAPI/integration contracts, ADRs and runbooks. Prod logs and PII only per security policy.

How to verify AI did not invent a dependency?

Open the cited file, grep the symbol, run tests or static analysis. For doubtful areas — second model or colleague.

Does this work in banking and government?

Yes, with on-prem or private cloud and no code leakage to external SaaS without contract. Many banks already run local LLM + RAG; regulation matters more than model choice.

Conclusion

Can AI understand a fifteen-year-old project? Yes, for a large share of discovery, and faster than humans on first-pass reconnaissance. Architecture, entities, scenarios, integrations, impact analysis, and draft docs are strengths when indexing and agent tools are in place.

Today’s boundary: informal context, historical “why,” dynamic-code completeness, and prod accountability. Hallucinations are real; verification is mandatory.

The future is not replacing developers but expert + AI: less archaeology, more product evolution. Practical step this week: index main, connect an IDE agent or RAG, run a control experiment — three typical newcomer questions (“where is order status,” “who writes table X,” “which services depend on adapter Y”) — verified by a senior. The time delta shows ROI without a “transformation” deck.