
A data processing workflow using DIGI-XTRACT to classify, detect, and extract both handwritten and printed historical records from the 14th century

Service offering: Data extraction in historical records

BUSINESS CHALLENGES

About our client

DIGI-TEXX’s client is one of the most traditional and long-established libraries in Germany. It preserves an impressive array of books spanning centuries, reflecting the richness of human civilization and cultural diversity. In addition, our client is renowned for safeguarding precious manuscripts and historical records from the 14th century.

The challenges

Our client’s holdings include over 6 million printed volumes, with extensive collections of journals, manuscripts, and maps that reflect centuries of academic development. All of these documents need to be digitized, both for better management and to give readers and researchers worldwide an easily accessible platform.

In their digital transformation journey, some of the challenges faced by our client are:

  • A shortage of staff to process and digitize the immense volume of historical records they hold
  • Massive amounts of handwritten ancient characters that must be extracted accurately
  • Few service providers can deliver the processed documents in MARC 21 format

The scope

Recognizing and extracting ancient characters, including letters, punctuation marks, spaces, and numbers.

  • Document types (both printed and handwritten)
    • Census information in apartments and buildings (names, addresses, occupation, building information, etc.) 
    • Information about constructions and infrastructures (employee name, year of establishment, age, castle information, etc.)
    • Survey information
    • Documents are written in Fraktur – a Western calligraphy style of the Latin alphabet
  • Languages: Old French, Old German
  • Volume: Over 10 million ancient characters for one project

DIGITIZING SERVICE FOR HISTORICAL RECORDS

We offer

  • A data processing workflow built on DIGI-XTRACT, a document processing solution based on Machine Learning (ML) and Deep Learning (DL) technologies, to classify, detect, and extract both handwritten and printed historical records
  • An experienced workforce to validate ancient characters, especially Old German

Historical data processing workflow

  • DIGI-TEXX’s system receives scanned documents 
  • Apply DIGI-XTRACT to:
    • Classify the quality of the input data
    • Detect required fields 
    • Extract data 
  • Human validation to ensure the accuracy of each extracted character
  • Export the output, including images and metadata
  • Transfer data to the client system
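
The workflow above can be sketched in a few lines of Python. This is only an illustrative outline: DIGI-XTRACT's actual interface is not described here, so every class, function, score, and threshold below is hypothetical. Detection and extraction are represented by pre-filled fields, and routing only low-confidence fields to a human validator is an assumption, not a documented part of the process.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedField:
    name: str          # field label, e.g. "occupation"
    text: str          # text produced by the (unmodeled) extraction step
    confidence: float  # hypothetical per-field score in [0, 1]


@dataclass
class Record:
    image_id: str
    quality: str = "unclassified"
    fields: list[ExtractedField] = field(default_factory=list)
    needs_review: list[str] = field(default_factory=list)


def classify_quality(sharpness: float, cutoff: float = 0.5) -> str:
    """Label scan quality from a hypothetical sharpness score."""
    return "good" if sharpness >= cutoff else "poor"


def flag_for_validation(record: Record, threshold: float = 0.98) -> Record:
    """Queue every field scoring below the threshold for human validation."""
    record.needs_review = [f.name for f in record.fields if f.confidence < threshold]
    return record


def export(record: Record) -> dict:
    """Bundle the image reference with its extracted metadata."""
    return {
        "image": record.image_id,
        "quality": record.quality,
        "metadata": {f.name: f.text for f in record.fields},
        "needs_review": record.needs_review,
    }


# Made-up sample data, standing in for the output of detection and extraction.
record = Record(image_id="scan_0001.tif", quality=classify_quality(0.8))
record.fields = [
    ExtractedField("name", "Johann Müller", 0.99),
    ExtractedField("occupation", "Schmied", 0.91),
]
print(export(flag_for_validation(record)))
```

In this sketch the "occupation" field falls below the assumed threshold, so it is the only one queued for a human validator before the record is exported.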

BUSINESS OUTCOME

  • Extract ancient characters with 98% accuracy
  • Process 240,000 ancient characters per day
  • Enable structured data for archiving and managing historical documents
  • Open an accessible digital platform for the public to read and research historical records
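
To put these figures in perspective, the short calculation below combines them with the project volume stated in the scope (10 million characters, taken as a round lower bound). It assumes a steady daily throughput and that the 98% figure is a per-character accuracy; neither assumption is confirmed by the case study, and the function names are illustrative.

```python
def days_needed(total_chars: int, chars_per_day: int) -> int:
    """Whole working days needed at a steady daily throughput (rounded up)."""
    return -(-total_chars // chars_per_day)  # ceiling division on integers


def expected_errors(total_chars: int, accuracy_percent: int) -> int:
    """Characters expected to be wrong at a given per-character accuracy."""
    return total_chars * (100 - accuracy_percent) // 100


TOTAL_CHARS = 10_000_000   # project volume from the scope (lower bound)
CHARS_PER_DAY = 240_000    # throughput from the business outcome
ACCURACY_PERCENT = 98      # accuracy from the business outcome

print(days_needed(TOTAL_CHARS, CHARS_PER_DAY))         # 42
print(expected_errors(TOTAL_CHARS, ACCURACY_PERCENT))  # 200000
```

Under these assumptions, one project of this size takes roughly six weeks of processing, and about 200,000 characters would still differ from the source.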
