Contents
A ten- to fifteen-year-old enterprise system is not just “old code” — it is a living organism: millions of lines, hundreds of tables, dozens of integrations, and knowledge that never made it into Confluence. Onboarding a new developer can easily stretch to months. The question for 2026: can modern AI map such a system faster than a human — and where are the limits?
Key takeaways
Legacy is accumulated complexity, not framework age. A fifteen-year project stacks several technology generations, stale documentation, and business logic embedded in SQL and cron jobs. The pain is not “Java 8” — it is that changing one field can touch five integrations.
In hours, AI delivers what takes humans weeks — with proper data prep. Without repo indexing, DB schemas, and git history, the model sees fragments and fills gaps from generic patterns. With RAG, semantic search, and agent tools (Cursor, Claude Code, Windsurf Deep Wiki), the picture forms an order of magnitude faster.
AI excels at search, impact analysis, and documentation. Where price is calculated, where document approval happens, which modules read orders_legacy — answered in minutes when the repo is accessible.
AI struggles with “why we decided this” and blind trust. It was not in the 2014 meeting, does not know the bank contract, and can confidently describe a non-existent API. Hallucinations are dangerous because they sound plausible.
The optimal model is expert + AI. The developer or architect sets boundaries, verifies outputs, and decides; the model is the analyst that never tires of reading three million lines.
Introduction: why legacy stays the main pain
Most companies do not rewrite systems every five years. They evolve them: new modules, integrations, regulatory changes. Retail ERP, B2B CRM, core banking, government systems — they run for decades. Downtime costs more than a year of maintenance.
A typical mature corporate project: 1.5–4M lines of code, 8–15k files, 200–600 tables in the primary DB plus reporting stores, 20–40 external integrations (banks, e-invoicing, marketplaces, ERP bridges, message buses). Teams turned over five to ten times; some authors are unreachable.
Onboarding is not “read the README.” It is months of code, incident postmortems, and learning where truth lives in code vs. in two people’s heads. Leadership naturally asks: can we delegate first-pass discovery to AI?
The 2026 answer is yes, partially and conditionally — not “upload a zip to ChatGPT and get architecture.” You need indexing, repo-aware tools, and human verification. Below: what models grasp, where they fail, and how teams deploy this in practice.
What a typical fifteen-year project looks like
Technical layers from different eras
Over fifteen years one repo (or family of repos) accumulates waves of tech: monolith on Java or .NET, JSP or WebForms, stored procedures; then REST services, Angular or React frontends, a reporting service. A “temporary” Python export script still runs in prod. Microservices were extracted partially — three new services beside a core nobody dares touch.
Traces of many teams show in coding style, naming, multilingual comments, and duplicate abstraction layers. “Old” modules often behave more reliably than “new” ones because they have been patched at the edges for a decade.
Missing up-to-date documentation
The Confluence page “Architecture v3” is from 2019, before the Kafka migration. Swagger covers new APIs only; legacy exchanges XML via a schema one integrator knows. Actual behavior diverges from docs: a config flag, a 02:00 cron, a manual operator step.
Some rules exist only in people’s heads: “do not touch this table before month close,” “this endpoint is deprecated but the bank still hits it.” They rarely land in git.
For AI, docs are one source among many — never trusted without code cross-check. The upside: models can generate documentation from code and shrink the gap.
Complex business processes
ERP, CRM, banking, and public-sector systems are not CRUD apps. Business logic accumulated: approvals, limits, tax rules, document states, multi-step orders. A bug is not a UI glitch — it is a fine, frozen account, or regulator rejection.
High cost of errors changes the game: “quick patch from AI” is not enough. You need consequence analysis. AI finds where totals are computed; humans decide whether the formula can change before release.
Related reading: why ERP systems become monoliths, designing systems that last ten years.
How modern AI models analyze code
What changed in recent years
Three shifts made legacy analysis realistic.
Context windows grew from thousands to hundreds of thousands and millions of tokens. You still cannot load an entire repo at once, but a module, DB schema, or call chain fits in one session.
Code understanding in specialized models (Claude, GPT-4o, Gemini, Codestral, etc.) is strong enough to trace dependencies, explain SQL, and map DTOs to tables — not perfect, but comparable to a strong mid-level developer on a first pass.
Repository tools moved beyond chat: Cursor and Claude Code index projects, traverse files, grep, read git blame; Windsurf Deep Wiki builds live wikis; enterprise RAG connects GitLab, Jira, Confluence.
Data AI can analyze
| Source | What it gives the model |
|---|---|
| Source code | Logic, dependencies, APIs |
| SQL schemas and migrations | Data model, table evolution |
| OpenAPI, WSDL, protobuf | Integration contracts |
| Documentation (even stale) | Intent and glossary |
| Git history | Who changed what, when, why (if commits are honest) |
| Logs, configs, feature flags | Runtime behavior |
The more sources are linked in one index + RAG pipeline, the less the model invents. Code alone without DB schema is a classic failure mode: AI finds entity Order but misses a PostgreSQL trigger.
How AI builds a project map
Typical analysis pipeline: dependency graph (imports, package calls, HTTP clients); domain entities (order, counterparty, shipment) from models, tables, REST paths; scenarios (“create order → reserve → pay → ship”) as class and queue chains; integration points (external URLs, Kafka topics, SFTP folders).
The agent does not “memorize the repo” — it queries like a senior with ripgrep and an IDE: “where is status updated in shipments,” “who calls LegacyBillingAdapter.”
Experiment: giving AI a fifteen-year project
Below is a typical scenario reconstruction based on real wholesale ERP patterns (monolith + satellite services). Numbers are representative; your project may differ, but orders of magnitude should feel familiar.
Starting conditions
- ~2.8M lines of Java, Kotlin, SQL, JavaScript, XML configs
- ~11,400 files in the main monorepo + 4 satellite repos
- ~380 tables in Oracle (core) + 90 in PostgreSQL (reporting)
- 34 integrations: banks, e-invoicing, marketplaces, accounting bridge, SMS, customs
- Documentation: ~40% of modules lack current descriptions; wiki partially contradicts code
AI setup: repo indexing, DDL dump access (no PII), read-only git, IDE agent. No prod logs, no oral legends from the team.
What AI understands in the first hours
Over 4–8 hours of targeted sessions (not one continuous run), a tooled model usually produces:
Top-level architecture: monolith core-app, extracted print and notification services, overnight batch 01:00–04:00, bus for order events.
Core entities: counterparty, contract, order, shipment, invoice, payment — mapped to packages and tables.
Key scenarios: order placement, discount approval, warehouse reservation, invoicing, bank reconciliation — with REST, UI, and scheduler entry points.
Integration points: adapter list, URLs, formats, common failure modes (bank timeout, queue retries).
This is faster than a new senior without such tools — humans spend time navigating and guessing where to look.
What humans still study for weeks
Non-obvious dependency maps: “field discount_reason affects the tax line via a view not referenced in Java.”
Informal rules: seasonal procedures, key-client exceptions, a one-region workaround.
Quality and risk: untested modules, last P1 incident areas, who to call when nightly batch fails.
Change policy: Friday deploy rules, DBA windows.
AI accelerates the first 60–70% of the map but does not replace team conversations and incident memory. Onboarding from “three months” toward “six weeks” with a good AI loop is realistic; “full understanding in one week” is not.
Where AI performs best
Finding business logic
Questions like “where is line total computed with discount and VAT” are a strength. The agent finds PriceCalculator, reporting SQL, and a duplicate legacy method nobody removed.
“Where is document approval” — workflow engine + approval_steps + notifications.
“Where does status change” — enum grep, mapper update, event listener.
Humans can too — in days; AI in minutes with a fresh index.
Change impact analysis
Before refactoring client_id or dropping a table, you need impact analysis. AI lists JPA entities, reports, integration DTOs, stored procedures, tests. Not 100% guaranteed (dynamic SQL, reflection) but removes ~80% of drudgery.
Especially valuable before DB migrations or column type changes.
Documentation generation
From code: module descriptions, missing OpenAPI, component diagrams (Mermaid, PlantUML), entity glossaries. Windsurf Deep Wiki and peers do this semi-automatically; teams in the Platform9 / Monday.com webinar cite live repo docs as early AI ROI.
Mark output as generated and review it — otherwise wiki drifts again, just prettier.
Faster developer onboarding
New hires ask RAG chat: “how is order cancellation implemented,” “why two PaymentServices,” “where are bank X integration logs.” Answers with file links shrink time-to-first-commit.
Not a mentor replacement — compression of the first weeks of repo wandering.
Limitations of AI
Context limits
Even a million tokens is not 2.8M lines. You need indexing, chunking, hierarchical module summaries. Without that, the model sees a slice and extrapolates.
Copy-paste legacy and magic strings add noise — AI may “merge” two similar classes in its head.
Missing business context
Code shows what, rarely why. A “temporary” 2017 bank API workaround looks like nonsense until an architect explains it.
Historical constraints (license, hardware, SLA contract) are not in git. ADRs help when they exist; in legacy, often they do not.
Hallucinations and misinterpretation
Models confidently cite non-existent endpoints, confuse v1 and v2 APIs, miss reflection-based calls. Blind trust risk is higher for management than engineers — because answers are well structured.
Rule: verify every AI output for prod decisions with file/line references or tests. For critical paths — second model or peer review, as high-volume AI teams recommend.
How companies deploy AI for legacy systems
Repository indexing
Minimum corpus: code (all prod repos including SQL and infra), DB schema (DDL, Flyway/Liquibase migrations), docs (Confluence export, ADRs, README). Reindex on merge to main, not once a year.
Secrets: index without .env, keys, PII; corporate policy ties to AI agent sandboxing.
Internal knowledge base
Dependency graph (modules, services, tables) + semantic search (“where is credit limit mentioned”). Tools: enterprise RAG (Azure AI Search, Elasticsearch + embeddings, on-prem stacks), IDE agents with project index.
Wiki becomes a secondary layer: generated from code, reviewed, versioned beside the repo.
RAG vs. plain chat
Generic ChatGPT does not see your git. RAG retrieves current chunks: class, migration, wiki page. Without RAG, answers average Stack Overflow; with RAG, “in your InvoiceService.java line 142.”
Legacy needs precision and citations. RAG + agent tools is the 2026 default for internal systems.
Keeping knowledge fresh
On merge to main: reindex touched modules, diff-summary for architecture maps, optional PR comment “consumers of table X changed.” Documentation stops being a 2019 snapshot.
Can AI replace an experienced developer?
What AI already does better
Search speed across millions of lines without fatigue. Parallel traversal of many modules. Draft diagrams and dependency tables. Recall of file names and signatures — with indexing.
What stays human
Architecture: service boundaries, domain splits, strangler-fig strategy for monoliths — decades-long trade-offs.
Business process: product owner alignment, regulation, integrator negotiations.
Risk: “ship on Friday?”, rollback plans, stakeholder warnings.
Communication: explain to the CFO why refactor takes a quarter, not “AI said it’s easy.”
Optimal working model
Developer / architect = expert and final filter. AI = analyst, tech writer, navigator. Ritual: question → cited answer → verification → decision → ADR. Same pattern as teams restructuring SDLC around agents, not “asked ChatGPT and deployed.”
Future: how legacy maintenance will change
Self-documenting systems
Docs generated from main and published automatically; wiki/code drift becomes a CI failure. Knowledge base is a living artifact, not a PDF.
Digital architecture assistants
Internal assistants answer: “what breaks if we drop this column,” “technical debt in billing module,” “who last changed bank Y integration.” Linking monitoring and tickets adds incident context — missing from pure code RAG today.
Legacy in five years
Onboarding: weeks instead of months at the same quality bar. Maintenance: less bus factor on “the one who remembers.” Evolution: more product work, less archaeology. Legacy will not vanish — systems will evolve longer and rewrite less often when architecture tolerates change.
FAQ
Can ChatGPT read our entire repo at once?
No. Even large contexts do not fit multi-million-line codebases. You need indexing, RAG, or file-search agents (Cursor, Claude Code). Zip upload to web chat works only for small projects or selected modules.
How much onboarding time does AI save?
Typical outcome: initial system map compresses from weeks to days; full context (risk, policy, nuance) still measured in months — but productive work starts earlier.
Are hallucinations dangerous for legacy?
Yes. A confident wrong answer about an API or DB trigger can cause prod incidents. Require file citations, human review, and tests on critical paths.
Do we need separate RAG if we have Cursor?
Often enough for one developer in the IDE. For whole teams, wiki + Jira + code search, and compliance — you need a central corporate index with access control.
Can AI replace the architect on legacy?
No. It accelerates fact gathering; decisions on boundaries, migrations, and trade-offs stay with accountable humans.
What to index first?
Main branches of all prod repos, DB DDL and migrations, OpenAPI/integration contracts, ADRs and runbooks. Prod logs and PII only per security policy.
How to verify AI did not invent a dependency?
Open the cited file, grep the symbol, run tests or static analysis. For doubtful areas — second model or colleague.
Does this work in banking and government?
Yes, with on-prem or private cloud and no code leakage to external SaaS without contract. Many banks already run local LLM + RAG; regulation matters more than model choice.
Further reading
This article is the pillar for legacy + AI. Deep dives:
Conclusion
Can AI understand a fifteen-year-old project? — Yes, for a large share of discovery, faster than humans on first-pass reconnaissance. Architecture, entities, scenarios, integrations, impact analysis, and draft docs are strengths — with indexing and agent tools.
Today’s boundary: informal context, historical “why,” dynamic-code completeness, and prod accountability. Hallucinations are real; verification is mandatory.
The future is not replacing developers but expert + AI: less archaeology, more product evolution. Practical step this week: index main, connect an IDE agent or RAG, run a control experiment — three typical newcomer questions (“where is order status,” “who writes table X,” “which services depend on adapter Y”) — verified by a senior. The time delta shows ROI without a “transformation” deck.