User:Hrishikesh.kb/OCR

Corpus

Notes based on the paper "Building Data Sets for Indian Language OCR Research" by C.V. Jawahar, Anand Kumar, A. Phaneendra, and K.J. Jinesh, IIIT Hyderabad.

Generating a large database of annotated document images involves:

- Identifying the content/source
- Employing well-defined, repeatable pre-processing steps to create multiple images suited to various document image analysis (DIA) tasks
- Following consistent labeling procedures for annotation
- Storing annotation information in a structured way for effective access

Annotation

Labelling image components (often with text).
Additional details are also useful, such as:
- Layout information
- Language/Script
- Scanning parameters
- Printing parameters
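
As a rough illustration of the points above, an annotation record might carry the component labels together with the additional details, and be stored in a structured form such as JSON. This is only a sketch: the class names, field names (image_file, scan_dpi, print_font and so on) and sample values are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ComponentLabel:
    # Bounding box of the component in pixels: (left, top, width, height).
    bbox: tuple
    # Ground-truth text for the component; empty if it carries no text.
    text: str = ""


@dataclass
class PageAnnotation:
    # All field names below are hypothetical, chosen for illustration.
    image_file: str
    language: str
    script: str
    scan_dpi: int        # a scanning parameter
    print_font: str      # a printing parameter
    layout: str          # layout information
    components: list = field(default_factory=list)


page = PageAnnotation(
    image_file="page_0001.png",
    language="Malayalam",
    script="Malayalam",
    scan_dpi=300,
    print_font="unknown",
    layout="single-column",
)
page.components.append(ComponentLabel(bbox=(40, 60, 500, 32), text="example line"))

# Structured storage: serialize the whole record to JSON and read it back.
record = json.dumps(asdict(page), ensure_ascii=False, indent=2)
restored = json.loads(record)
```

Keeping the scanning and printing details in the same record as the labels means a later experiment can filter pages by resolution or script without reopening the images.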

Levels of annotation
- Structural level
- Functional level
- Content level

Of these, content-level annotation is critical for OCR.
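
One possible way to picture the three levels is as fields on a tree of page blocks. The notes do not define the levels, so the reading used here is an assumption: structural = kind of block, functional = role of the block on the page, content = the transcribed text. The Block class, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    kind: str                 # structural level, e.g. "text" or "graphic" (assumed values)
    role: str = ""            # functional level, e.g. "title" or "paragraph" (assumed values)
    text: str = ""            # content level: ground-truth transcription
    children: List["Block"] = field(default_factory=list)


def content_level(block):
    """Collect the transcriptions of all text-bearing blocks, depth first."""
    texts = [block.text] if block.text else []
    for child in block.children:
        texts.extend(content_level(child))
    return texts


page = Block(kind="page", children=[
    Block(kind="text", role="title", text="Heading"),
    Block(kind="text", role="paragraph", children=[
        Block(kind="line", text="first line"),
        Block(kind="line", text="second line"),
    ]),
    Block(kind="graphic", role="figure"),
])

# The content-level annotation is what an OCR system would be evaluated against.
ocr_ground_truth = content_level(page)
```

Under this reading, an OCR evaluation needs only the list returned by content_level, while layout analysis work would use the kind and role fields.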
