Document AI
Applied Document AI at Rényi AI
At Rényi AI, we explore the frontier of Document AI to solve real-world problems across a wide spectrum of domains. Our applied projects are designed to unlock value from unstructured documents—whether freshly scanned, historical, or highly confidential. Our work blends state-of-the-art machine learning, computer vision, and natural language processing into powerful solutions that automate, enhance, and protect document workflows.
Intelligent Document Processing for the Hungarian State Treasury
Project: Document Classification and Separation
In collaboration with the Hungarian State Treasury (Magyar Államkincstár), we supported the digital transformation of administrative workflows. A substantial number of documents were scanned in bulk during this process. However, as different case-related documents were mixed together, manual navigation through the digitized stacks slowed operations significantly.
To address this, we developed an automated system for document separation and classification, enabling faster and more structured access to digital records. Our solution improved operational efficiency and set the stage for more intelligent downstream automation.
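As a rough illustration of what document separation involves, the sketch below groups a bulk-scanned page stream into individual documents. It is not the system described above. It assumes a hypothetical page classifier that has already produced, for each page, a document type and a flag marking whether the page opens a new document. The names `PagePrediction`, `Document`, `split_stream`, and the example labels are all invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PagePrediction:
    """Hypothetical per-page output of a page classifier (an assumption)."""
    page_index: int
    label: str             # predicted document type, e.g. "application"
    starts_document: bool  # True if the page is predicted to open a new document


@dataclass
class Document:
    label: str
    pages: List[int]


def split_stream(predictions: List[PagePrediction]) -> List[Document]:
    """Group a scanned page stream into documents using per-page predictions."""
    documents: List[Document] = []
    for pred in predictions:
        # Open a new document at a predicted boundary, or at the very first page.
        if pred.starts_document or not documents:
            documents.append(Document(label=pred.label, pages=[]))
        documents[-1].pages.append(pred.page_index)
    return documents


# Made-up predictions for a five-page scan containing three documents.
stream = [
    PagePrediction(0, "application", True),
    PagePrediction(1, "application", False),
    PagePrediction(2, "id_copy", True),
    PagePrediction(3, "decision", True),
    PagePrediction(4, "decision", False),
]

for doc in split_stream(stream):
    print(doc.label, doc.pages)
```

In this sketch each document takes the label of its first page. A production system might instead aggregate the predictions of all pages in the document, for example by majority vote.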
Document AI in Historical Archives
We are actively contributing to digitization and information retrieval in national archival institutions, including:
- Historical Archives of Hungarian State Security
- National Archives of Hungary
These archives present unique challenges: many documents are degraded, handwritten, or stored in difficult-to-read formats. Our work here spans several focus areas:
- Document Image Enhancement: Using advanced computer vision techniques, we restore and improve the legibility of deteriorated archival scans.
- Custom OCR Solutions: We train and fine-tune optical character recognition models on archival material, significantly improving transcription accuracy for historical fonts, degraded pages, and unusual layouts.
- Digital Archivist Applications: We develop high-level tools such as retrieval-augmented generation (RAG) systems that make archival data searchable. These tools combine OCR and large language models (LLMs) to help historians and researchers ask natural language questions and receive document-grounded answers.
- Sensitive Data Detection: To aid daily administrative workflows, we build systems that identify and redact sensitive personal or classified information in documents requested through inquiries, which supports compliant and efficient document handling.
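To make the sensitive data detection step more concrete, here is a minimal rule-based redaction sketch. It is an assumption-laden illustration, not the approach used in these projects: the two patterns (email addresses and phone-like digit sequences), the `PATTERNS` table, and the `redact` function are hypothetical, and real archival material would likely call for domain-specific rules or a trained named-entity model.

```python
import re
from typing import List, Tuple

# Illustrative patterns only (assumptions); they are neither exhaustive nor
# tuned for any particular archive or jurisdiction.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d \-]{7,}\d"),
}


def redact(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each match with a [LABEL] placeholder and return the findings."""
    findings: List[Tuple[str, str]] = []
    for label, pattern in PATTERNS.items():
        # Bind the current label as a default argument so the callback
        # records the right category for each match.
        def _mask(match: re.Match, label: str = label) -> str:
            findings.append((label, match.group(0)))
            return f"[{label}]"

        text = pattern.sub(_mask, text)
    return text, findings


sample = "Contact: kovacs.anna@example.hu, tel. +36 30 123 4567."
redacted, found = redact(sample)
print(redacted)
print(found)
```

Returning the findings alongside the redacted text lets a human reviewer audit what was masked, which matters when a false positive would hide information a researcher is entitled to see.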