← All posts

Why EAV Looks Like a Good Idea — Then Destroys ERP Performance

Entity–Attribute–Value looks perfect at the start: flexibility without migrations. Why JOINs, indexes, and reports break ERP years later — and where EAV fits, and where it does not.

Contents

ERP developers keep coming back to the same idea: instead of dozens of tables with columns, use three universal entities, and any field can be added without a migration. The EAV (Entity–Attribute–Value) pattern — entity, attribute, value — looks almost perfect in a demo. A customer card is assembled in minutes. A new reference book is one row in metadata. The architect gets approval, the team celebrates the absence of ALTER TABLE.

Two or three years later, the same architecture becomes a bottleneck: cards take seconds to open, reports run for hours, DBAs live inside query execution plans, and developers are afraid to touch the "universal engine." The problem is not that EAV is "bad." The problem is that it solves flexibility at the cost of performance, development complexity, and rising maintenance cost — and that price only becomes visible once you already have a lot of data.

This article is a deep dive into EAV in the ERP context. If you want a broader view of universality as an architectural mistake, see the most expensive mistake in ERP architecture. Here the focus is on mechanics: why the pattern attracts teams, how it breaks the database, and what to use instead.

Key takeaways

EAV is a deferred-complexity trap. At the start you win on schema change speed; at millions of rows you lose on every SELECT.

Every field is an extra JOIN. Ten attributes — ten joins; fifty — a query the optimizer cannot stabilize.

Indexes and typing break in a single value column. Searching by phone and filtering by price need different strategies; a universal index does not exist.

Reports and analytics are EAV's weak spot. BI tools and regulated reporting expect tables with columns, not attribute pivots.

EAV belongs on the periphery, not in the core. Settings, rare custom fields, prototypes — yes. Orders, stock, ledger entries — no.

EAV is too good at the design stage. That is exactly why experienced architects do not build the ERP core on it.


Introduction: Why EAV Keeps Coming Back

ERP lives in a world where requirements change faster than the release cycle. Today you need a SKU; tomorrow a serial number; the day after that storage temperature and fifty more fields "only for this client." The classic relational model answers with migrations: ALTER TABLE, deploy, regression. EAV promises to never touch the schema — only add rows to the values table.

For an architect who has watched integrators spend months getting a column added in production, that sounds like salvation. For the business in a presentation, it sounds like "the system configures itself without programmers." For a developer in the first sprint, it sounds like an elegant abstraction. That is why EAV keeps resurfacing in every generation of ERP projects, despite mountains of postmortems from previous teams.

EAV in plain terms: instead of storing a customer in a row of the customers table with columns name, phone, email, you store an entity identifier and a set of "attribute name → value" pairs in separate rows. One table describes everything. Flexibility is infinite — until the first heavy report.


Chapter 1. What Is EAV

The classic relational model

A customer table is the example everyone understands:

id name phone email
1 Ivan Petrov +380501234567 [email protected]

Advantages:

  • Simple queriesSELECT name, phone FROM customers WHERE id = 1.
  • Indexes behave predictablyINDEX(phone) speeds up phone lookup.
  • Clear structure — the schema is visible in \d customers, in the IDE, in documentation.

DBMSs have been optimized for this model for decades. Column type is known, statistics are meaningful, execution plans are stable.

The EAV model

Instead of columns — value rows:

entity_id attribute value
1 name Ivan Petrov
1 phone +380501234567
1 email [email protected]

In real systems the schema is slightly more complex: separate entities, attributes, attribute_values tables with types and references. But the idea is the same: data structure is described by data, not by DDL.

Why developers think this is brilliant:

  • You can add fields without changing the database schema — a new row in attributes, not ALTER TABLE.
  • You can create any entitiesentity_type = 'product', 'order', 'warehouse_cell'.
  • No migrations for every customer field (in theory).
  • You can build a universal object constructor — the UI reads metadata and renders a form.

On paper, the problem of "the business asks for one more field" is solved forever.


Chapter 2. Why EAV Looks Like the Perfect ERP Solution

Business requirements keep changing

Typical evolution of a product card:

When What they ask for
Today SKU, manufacturer
Tomorrow Serial number, color
In a month Expiry date, storage temperature
In a year 50 more fields for industry, branch, regulator

In a normalized model, each step is a migration, tests, DBA sign-off, report updates. In EAV — configuration: an administrator or integrator adds an attribute in the UI. For the product owner that is a competitive advantage: react to the market faster.

The illusion of infinite flexibility

The architect thinks:

"We will create one universal table and never change the database structure again."

It sounds beautiful. It sounds like the end of the war between development and implementation. It sounds like a scalable product for a thousand clients with different processes.

The dream of a fully universal ERP

On the slide:

  • any reference book;
  • any document;
  • any business process;
  • any set of fields —

everything is stored the same way. One persistence engine, one reporting engine, one getField('...') API. It seems the ERP architectural problem is solved forever.

In practice you did not solve the problem — you deferred it into the layer of queries, indexes, and debugging. More on the cost of universality in the article on universal ERP architecture.


Chapter 3. The First Hit to Performance

A simple SELECT turns into a nightmare

In a normal model:

SELECT name, phone, email
FROM customers
WHERE id = 1;

One primary-key lookup. Microseconds.

In EAV for the same three fields:

SELECT
    MAX(CASE WHEN a.code = 'name'  THEN av.value_string END) AS name,
    MAX(CASE WHEN a.code = 'phone' THEN av.value_string END) AS phone,
    MAX(CASE WHEN a.code = 'email' THEN av.value_string END) AS email
FROM entities e
JOIN attribute_values av ON av.entity_id = e.id
JOIN attributes a ON a.id = av.attribute_id
WHERE e.id = 1
  AND e.entity_type = 'customer'
GROUP BY e.id;

Already three attributes — JOIN, GROUP BY, conditional aggregate (pivot). The alternative is a separate JOIN per field:

SELECT v_name.value  AS name,
       v_phone.value AS phone,
       v_email.value AS email
FROM entities e
JOIN attribute_values v_name  ON v_name.entity_id  = e.id
JOIN attributes a_name        ON a_name.id = v_name.attribute_id  AND a_name.code = 'name'
JOIN attribute_values v_phone ON v_phone.entity_id = e.id
JOIN attributes a_phone       ON a_phone.id = v_phone.attribute_id AND a_phone.code = 'phone'
JOIN attribute_values v_email ON v_email.entity_id = e.id
JOIN attributes a_email       ON a_email.id = v_email.attribute_id AND a_email.code = 'email'
WHERE e.id = 1;

For each field — an extra join. A customer card with 20 fields — 20 JOINs just to show a form.

The number of joins grows

Fields on object Query character
10 Tolerable at small volume
50 Noticeable CPU load, complex plans
200 A separate engineering project, caches, denormalization

Read cost grows linearly with the number of attributes in the worst case and worse when filtering on several fields at once.

The DB optimizer starts to suffer

Symptoms:

  • Complex execution plans — dozens of nodes, nested loop, hash join on millions of rows in attribute_values.
  • Unstable queries — after ANALYZE the plan changes; yesterday's fast report hangs today.
  • Unexpected degradation — one attribute added, and the order list query slows fivefold because the optimizer chose a different JOIN order.

Relational DBMSs optimize fixed schemas extremely well. EAV turns every SELECT into a dynamic puzzle.


Chapter 4. Indexes Stop Saving You

In a normal table the index is obvious

CREATE INDEX idx_customers_phone ON customers(phone);

SELECT * FROM customers WHERE phone = '+380501234567';

Index scan, thousands of rows per second. The DBA is happy.

In EAV everything lives in universal columns

Logically:

attribute = 'phone'
value = '+380501234567'

Physically — millions of rows in attribute_values, where value_string holds names, phones, dates, JSON, and numbers mixed together.

Problems:

  • An index on value alone is useless for selectivity — half the table is "Ivan" and "red".
  • An index on (attribute_id, value_string) helps one attribute but swells with every new entity type.
  • Composite indexes multiply: for phone, for sku, for status — separate strategies, because types and cardinality differ.

There become too many indexes

For each "hot" attribute group the DBA adds a partial or composite index. The result:

  • database size grows — indexes rival the data;
  • RAM consumption grows — the hot set no longer fits in memory;
  • writes slow down — every value insert updates several B-trees.

In a normalized table one order record is one row, a few indexes. In EAV one order record with 30 fields is 30 INSERTs and a cascade of index updates.


Chapter 5. Filtering Becomes Expensive

The business wants a simple report

"Show products that are red and cost more than 100 euros."

In a normal schema:

SELECT id, name, price
FROM products
WHERE color = 'red'
  AND price > 100;

Two conditions on typed columns. An index on (color, price) or a bitmap plan — a standard task.

In EAV — a chain of JOINs

You need entities that simultaneously have:

  • attribute color with value 'red';
  • attribute price with value > 100.

Typical pattern:

SELECT e.id
FROM entities e
WHERE e.entity_type = 'product'
  AND e.id IN (
      SELECT av1.entity_id
      FROM attribute_values av1
      JOIN attributes a1 ON a1.id = av1.attribute_id
      WHERE a1.code = 'color' AND av1.value_string = 'red'
  )
  AND e.id IN (
      SELECT av2.entity_id
      FROM attribute_values av2
      JOIN attributes a2 ON a2.id = av2.attribute_id
      WHERE a2.code = 'price' AND av2.value_number > 100
  );

Two subqueries, four JOINs, set intersection. Add sorting by name, grouping by category, and pagination — the query keeps growing.

Complex filters are a disaster

Especially painful when one report has:

  • dozens of attributes in SELECT;
  • filters on five to ten fields;
  • GROUP BY and aggregations;
  • sorting on untyped value.

Each condition is another branch in the plan. The user waits. The business asks why a "simple list" does not work.


ERP lives on reports

Operational accounting is forms and cards. But ERP value for management is in summaries: sales by region, stock as of a date, receivables, cost, plan vs actual. Those queries:

  • touch large volumes of data;
  • run regularly and in parallel;
  • do not tolerate "five times slower since Monday" degradation.

EAV is a poor fit for analytics

Reasons:

  • many JOINs — every report column is pivoted from attributes;
  • many conversionsCAST, CASE, type coercion from value_string;
  • complex aggregationSUM(price) requires assembling price into a row per entity first.

A common outcome: reports move to nightly ETL into a separate warehouse with normalized marts. The EAV ERP core remains for data entry; analytics lives in a copy of the data. You pay for two models instead of one.

Why BI tools dislike EAV

Power BI, Metabase, internal report builders expect:

  • tables with names;
  • columns with types;
  • relationships via foreign keys.

EAV forces you either to write views with hundreds of lines or configure a semantic layer that emulates a normal schema on top of EAV. In effect you build a normalized model a second time — on top of the first — to fix an architectural choice.


Chapter 7. Typing Starts to Break Down

One column stores everything

A typical universal cell value_string (or several value_* columns):

100
150.5
Ivan Petrov
2025-01-01
true
{"nested": "json"}

What is the value?

For the DBMS and optimizer — a string. For the business — a number, date, flag, reference. The application must guess the type from attribute metadata on every operation.

Every operation requires conversion

WHERE CAST(av.value_string AS NUMERIC) > 100
  • CAST kills index use on the value.
  • Type errors surface at runtime: 'N/A' in a price field breaks a report.
  • Comparing dates as strings gives wrong sort order without a strict format.

Constant CAST, CONVERT, checks in the application — extra CPU on every row and a source of quiet bugs in reporting.


Chapter 8. Data Volume Grows Explosively

Simple math

Objects Fields per object Rows in attribute_values
100,000 100 10,000,000
1,000,000 50 50,000,000
500,000 orders 80 fields 40,000,000 orders alone

In a normalized model 100,000 customers — 100,000 rows. In EAV — 100,000 × N attributes.

What happens next

Growth in:

  • tables — backup and restore measured in hours;
  • indexes — rebuild and vacuum become a planned project;
  • replication — read replica lag is visible to operators;
  • cloud cost — more disk, more IOPS, more RAM.

Every new field increases volume

Even if an attribute is filled for 1% of records, in a horizontal EAV model an empty value is often still stored or complicates queries with LEFT JOIN. Adding a "rare" field in the configurator is not free for infrastructure.


Chapter 9. Why Development Gets Harder Too

Queries are hard to read

A simple list of orders for the week in EAV — a page of SQL with subqueries, aliases av_status, a_status, av_total. Code review turns into archaeology. A new developer cannot answer "what fields does an order have?" without querying attributes.

Errors become more expensive

  • Logic is spread across metadata, a universal service, and SQL.
  • Data is non-obvious — a typo in an attribute code breaks a report for one client.
  • Hard to debug — a "wrong status" bug requires tracing entity → values → client 17 override.

Dependence on "magicians"

Two or three people appear who "understand the engine." They leave — onboarding takes months. That is not mature architecture; it is bus factor at minimum. See also why ERP systems turn into monoliths — EAV accelerates the same process.


Chapter 10. The Real Story of Most EAV ERPs

The typical trajectory is not theory — it is a pattern from dozens of projects.

Stage 1. Enthusiasm

"We built a universal platform. Any reference book in a day. Competitors run migrations — we run config."

Stage 2. Growth

Clients appear, documents, millions of rows in attribute_values. Functionality grows. Still tolerable on powerful hardware.

Stage 3. Slowdown

User complaints:

  • cards take long to open;
  • reports are slow;
  • reference search lags.

Support blames "large data volume." DBAs open EXPLAIN ANALYZE for the first time.

Stage 4. Workarounds

  • caches for cards and lists in Redis;
  • materialized views for the top 10 reports;
  • denormalized tables orders_flat for the UI;
  • a separate search index (Elasticsearch, OpenSearch) for reference books.

Each workaround is an admission that EAV cannot carry the hot path.

Stage 5. De facto abandonment of EAV

The system grows specialized tables: orders, order_lines, inventory_movements. EAV remains for "custom fields" and legacy configuration. You get a hybrid — the most honest architecture, but expensive: you pay for both worlds until you cut out the old core.


Chapter 11. When EAV Is Actually Useful

EAV is not an anti-pattern everywhere. It is a tool with a narrow scope.

System settings

Module configuration, feature flags, integration parameters — few records, rare changes, no heavy analytics.

Metadata

When structure is unknown upfront and volume is small: form field definitions, import schema, external system mapping.

Rarely used additional fields

Custom fields on a customer card — manager comment, internal code, non-standard attribute for one branch. A mistake here does not break the balance sheet.

Small volume and non-critical performance

Prototype, internal tool, low-code platform — the user consciously trades query speed for flexibility.

Rule of thumb: if stock, money, taxes, or regulated reporting depend on a field — it belongs in a typed table column, not an EAV row.


Chapter 12. What to Use Instead of EAV

Classic normalized tables

The foundation of most successful ERPs. An order is an orders table, lines are order_lines, foreign keys, invariants in code or CHECK. Boring, predictable, fast.

JSON fields for rare attributes

PostgreSQL JSONB, MySQL JSON — flexibility without a JOIN per field:

ALTER TABLE products ADD COLUMN extras JSONB;
-- extras->>'color', GIN index when needed

Pros: one row per product, fewer joins, simpler card. Cons: weaker typing, JSON indexing requires discipline. Fits peripheral data.

Hybrid architecture

The most popular compromise in mature systems:

Layer Storage
Core (order, product, stock) Typed columns
Extensions (client custom) JSONB or a separate EAV table with small volume
UI settings Metadata

80% of queries hit normal tables; 20% tolerate flexibility.

Entity engine with physical tables

Metadata describes an entity, but when the schema is published the system generates real tables (ALTER or separate tenant schema). Flexibility at design time, normal DB performance at runtime. Complexity lives in the generation tool and migrations — but that is honest complexity, not complexity hidden in JOINs.


Conclusion

The core problem with EAV

Not that it is a "bad idea." That it is too good an idea at the design stage.

The architect at the whiteboard sees:

  • flexibility;
  • universality;
  • no migration for every field.

They do not see (or defer):

  • millions of rows in the values table;
  • an avalanche of JOINs in every report;
  • indexes breaking in a single value column;
  • analytics and BI pain;
  • rising maintenance cost and dependence on a few people who "know how it works."

The same path most large ERPs take

  1. Enthusiasm for universality.
  2. Fight with performance — caches, hardware, DBAs.
  3. Denormalization — flat tables, materialized views.
  4. Return to specialized tables in the core.

That is why experienced architects use EAV as a supporting tool — settings, custom fields, prototypes — but do not build the core the orders, warehouse, and money flow through on it.

If you are choosing a data schema for a new ERP today, ask one question: "In three years, can we explain the EXPLAIN of the main sales report to a junior developer in an hour?" On a normalized model — yes. On EAV in the core — probably not. And that costs more than any migration saved at the start.