Smart document registry: free Google Drive OCR and Gemini quota tricks

Google Apps Script, hidden Drive OCR, Gemini key rotation, and LockService — a Habr pipeline for cataloging huge archives.

In brief

How do you walk gigabytes of PDFs and DOCX on Google Drive, extract paper titles and abstracts, and survive Gemini API quotas? A Habr write-up chains Google Apps Script, built-in Drive OCR, time triggers, LockService, and API key rotation — no paid document parsers.

What happened

The author needed to catalog a large scientific archive: exact title, short summary, and whether a specific researcher co-authored each paper. A naive Apps Script hit three walls at once.

  1. Six-minute execution limit: OCR plus an LLM call takes 15–40 seconds per heavy PDF, so the run dies around file 20.
  2. Binary formats: Apps Script (GAS) cannot read PDF or DOCX natively, and paid parsers are expensive.
  3. Gemini free-tier quotas: requests quickly start failing with HTTP 429.
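A quick budget check shows why the run stalls. This is back-of-the-envelope arithmetic using only the figures quoted above (a 360-second execution limit and 15 to 40 seconds per heavy file); the function name is illustrative.

```javascript
// Rough estimate: how many files fit into one Apps Script execution.
// Assumes the 6-minute (360 s) limit and the 15-40 s per-file cost quoted above.
function filesPerRun(limitSeconds, secondsPerFile) {
  return Math.floor(limitSeconds / secondsPerFile);
}

const worstCase = filesPerRun(360, 40); // 9 files when every PDF is slow
const bestCase = filesPerRun(360, 15);  // 24 files when every PDF is fast
```

A single run therefore covers roughly 9 to 24 heavy files, which is consistent with the script dying around file 20.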

The fix stacks several tricks. The first is hidden Google Drive OCR: through the Drive API, copy the PDF or DOCX into a temporary Google Doc with ocr: true, which uses the same engine Drive applies when you open a scan as a Google Doc by hand. Read the text with DocumentApp, and delete the temporary file in a finally block, or Drive fills up with junk.
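The OCR step might look roughly like the sketch below. It assumes the Drive advanced service with its v2-style API is enabled (the v3 service uses different field names), and the option names follow the article's description. The function name, the temp-file title, and the injectable drive and docs parameters are illustrative, not the author's code.

```javascript
// Sketch: OCR a Drive file by copying it into a temporary Google Doc.
// Assumes the Drive advanced service (v2-style API) is enabled in the project.
// In Apps Script the defaults resolve to the global Drive and DocumentApp;
// passing stand-ins makes the function testable outside Apps Script.
function extractTextWithOcr(fileId, drive = Drive, docs = DocumentApp) {
  let tempId = null;
  try {
    // ocr: true asks Drive to run OCR while creating the copy.
    const copy = drive.Files.copy(
      { title: 'ocr-temp-' + fileId },   // hypothetical temp-file name
      fileId,
      { ocr: true, ocrLanguage: 'ru' }
    );
    tempId = copy.id;
    return docs.openById(tempId).getBody().getText();
  } finally {
    // Always remove the temporary Doc, even if OCR or reading failed.
    if (tempId) {
      drive.Files.remove(tempId);
    }
  }
}
```

Inside Apps Script the call is simply extractTextWithOcr(file.getId()). The finally block is what keeps Drive clean when a run is interrupted partway through a file.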

To beat the six-minute cap, use a Google Sheet as a simple database: it caches the names of processed files, a time trigger restarts the script every minute, and each new run skips finished rows and picks up where the last one stopped, for example at file 16. LockService prevents races: while one run spends over a minute on OCR for a single PDF, the next trigger must not start and write duplicate rows.
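A resumable run can be sketched as follows. The processedNames array stands in for the filename column read from the Sheet, and processOne is a caller-supplied function that does the OCR, the LLM call, and the row append. The five-minute budget and the one-second lock wait are assumptions. This sketch holds the script lock for the whole run rather than per file, which is a simplification of the scheme described above.

```javascript
// Sketch: one trigger-driven run that resumes where the previous one stopped.
// In Apps Script the lock defaults to LockService.getScriptLock().
function runBatch(fileNames, processedNames, processOne,
                  lock = LockService.getScriptLock(),
                  now = Date.now,
                  budgetMs = 5 * 60 * 1000) {   // stop before the 6-minute cap (assumed margin)
  // If an earlier run still holds the lock, let this trigger firing pass.
  if (!lock.tryLock(1000)) {
    return { processed: [], skippedRun: true };
  }
  const started = now();
  const done = new Set(processedNames);
  const processed = [];
  try {
    for (const name of fileNames) {
      if (done.has(name)) continue;            // finished in an earlier run
      if (now() - started > budgetMs) break;   // leave the rest for the next trigger
      processOne(name);                        // OCR + LLM + append a row
      done.add(name);
      processed.push(name);
    }
  } finally {
    lock.releaseLock();
  }
  return { processed, skippedRun: false };
}
```

Because the skip check runs against names already recorded, a run killed mid-file only loses that one file's work; the next trigger starts it again from scratch.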

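Gemini key rotation: keep an array of AI Studio keys and switch to the next one on a 429; once you have cycled through the whole pool, sleep 30 seconds so the requests-per-minute (RPM) limit resets. Ask the LLM for JSON (responseMimeType: application/json) so the title and summary come back in one call, with no Markdown fences to strip.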
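The rotation logic could be sketched like this. The endpoint shape and the generationConfig.responseMimeType field follow the public Gemini REST API; the function name, prompt wording, retry cap, and the injectable fetchFn and sleepFn parameters are assumptions for illustration. The model identifier is passed in by the caller, since the exact Flash Lite model is not named here.

```javascript
// Sketch: call Gemini with a pool of API keys, rotating on HTTP 429.
// fetchFn and sleepFn default to the Apps Script services UrlFetchApp and Utilities.
function callGeminiWithRotation(text, keys, model,
    fetchFn = (url, params) => UrlFetchApp.fetch(url, params),
    sleepFn = (ms) => Utilities.sleep(ms),
    maxAttempts = keys.length * 3) {            // assumed retry cap
  const payload = JSON.stringify({
    contents: [{ parts: [{ text:
      'Return JSON with fields "title" and "summary" for this document:\n' + text }] }],
    generationConfig: { responseMimeType: 'application/json' }
  });
  let keyIndex = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const url = 'https://generativelanguage.googleapis.com/v1beta/models/' +
        model + ':generateContent?key=' + keys[keyIndex];
    const response = fetchFn(url, {
      method: 'post',
      contentType: 'application/json',
      payload: payload,
      muteHttpExceptions: true   // inspect the status code instead of throwing
    });
    const code = response.getResponseCode();
    if (code === 200) {
      const body = JSON.parse(response.getContentText());
      // With responseMimeType set, the text part is itself a JSON string.
      return JSON.parse(body.candidates[0].content.parts[0].text);
    }
    if (code !== 429) {
      throw new Error('Gemini request failed with HTTP ' + code);
    }
    keyIndex = (keyIndex + 1) % keys.length;   // quota hit: move to the next key
    if (keyIndex === 0) {
      sleepFn(30000);                          // whole pool exhausted: wait for the RPM window
    }
  }
  throw new Error('All keys still rate-limited after ' + maxAttempts + ' attempts');
}
```

Errors other than 429 are rethrown rather than retried, so a bad key or malformed request surfaces immediately instead of burning through the pool.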

Why it matters

The pattern shows that Apps Script can run long background pipelines when you chunk the work and guard the state. For hundreds to thousands of Drive files, though not millions, this is cheaper than a dedicated OCR server.

Trade-offs: dependence on Google quotas and the need for temp-file hygiene. According to the write-up, the combination of free OCR, Flash Lite, and a key pool can process on the order of 1,500 documents per day on three keys in about 2 hours of trigger time.

In practice

  1. Enable the Drive API advanced service in the Apps Script editor; DocumentApp alone is not enough.
  2. OCR with Drive.Files.copy, ocr: true, ocrLanguage: "ru"; delete temp files in try/finally.
  3. Track progress in Google Sheets; hash/skip processed names before the loop.
  4. Time-driven triggers; delete triggers when the catalog finishes.
  5. LockService.getScriptLock() per file — no parallel double-processing.
  6. GEMINI_API_KEYS pool, rotate on 429, Utilities.sleep(30000) when all keys hit RPM.
  7. Set responseMimeType: "application/json" to get structured fields without stripping Markdown json code fences.
  8. Non-text formats (.pptx, .xlsx) → placeholder rows, zero tokens.
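Steps 4 and 8 can be sketched as below. The ScriptApp calls are the standard Apps Script trigger API; the function names, the handler name passed in, and the extension list are hypothetical, and scriptApp is injectable only so the code can be exercised outside Apps Script.

```javascript
// Sketch for steps 4 and 8; names are illustrative.

// Create the minute trigger only if this handler does not have one yet.
// Returns true when a new trigger was created.
function ensureMinuteTrigger(handlerName, scriptApp = ScriptApp) {
  const exists = scriptApp.getProjectTriggers()
      .some((t) => t.getHandlerFunction() === handlerName);
  if (!exists) {
    scriptApp.newTrigger(handlerName).timeBased().everyMinutes(1).create();
  }
  return !exists;
}

// Remove the handler's triggers once the catalog is complete.
// Returns the number of triggers deleted.
function removeTriggers(handlerName, scriptApp = ScriptApp) {
  let removed = 0;
  scriptApp.getProjectTriggers().forEach((t) => {
    if (t.getHandlerFunction() === handlerName) {
      scriptApp.deleteTrigger(t);
      removed++;
    }
  });
  return removed;
}

// Non-text formats get a placeholder row and no LLM call.
const PLACEHOLDER_EXTENSIONS = ['.pptx', '.xlsx'];

function needsPlaceholder(fileName) {
  const lower = fileName.toLowerCase();
  return PLACEHOLDER_EXTENSIONS.some((ext) => lower.endsWith(ext));
}
```

Checking needsPlaceholder before the OCR step means presentation and spreadsheet files cost neither a temp copy nor any tokens.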

Takeaway

The Habr article is a practical autonomous Drive document registry: OCR without third-party services, LLM field extraction, resilience to timeouts and quotas. If your archive lives in Google cloud, adapt columns and prompts — code fragments are in the original.