Publication Database of Lublin University of Technology Staff

Status:
Authors:	Tokovarov Mikhail, Kaczorowska Monika, Miłosz Marek
Year of publication:	2020
Document version:	Electronic
Language:	English
Journal issue:	3
Volume:	14
Pages:	30 - 38
Web of Science® Times Cited:	2
Scopus® Citations:	3
Databases:	Web of Science | Scopus
Result of statutory research:	NO
Conference material:	NO
OA publication:	YES
License:
Access method:	Open journal
Text version:	Final published version
Time of release:	At the time of publication
OA publication date:	1 September 2020
Abstracts:	English
	In the modern world, fast and efficient processing of non-digital (handwritten or typed) texts is a task of extreme importance. Like many other fields, optical character recognition (OCR) benefits from the application of machine learning (ML), which makes it possible to develop effective and accurate methods. To achieve good performance, a machine learning algorithm requires a large amount of data. A large database of handwritten characters prepared by the National Institute of Standards and Technology (NIST), USA, can currently be used for training an ML model. However, significant differences exist between handwriting styles in the US and Poland. This fact, along with the absence of Polish characters, makes the NIST database less useful for developing an OCR model for the Polish language. To the best of the authors' knowledge, no database with samples of Polish handwriting exists. The present research is focused on filling this gap, i.e. gathering and preparing an extensive database of Polish handwritten characters. The paper presents the very first database of Polish handwriting samples. The database is far larger than all the datasets used in previous attempts to implement OCR for Polish handwriting. It is also the first fully publicly accessible database of Polish handwriting of this scale. The same method and the developed tools can be used to build handwritten character databases for other languages.

Development of Extensive Polish Handwritten Characters Database for Text Recognition Research

Journal article