Publication Database of Lublin University of Technology Employees

Status:
Authors:	Wójcicki Piotr, Zientarski Tomasz
Year of publication:	2024
Document version:	Printed | Electronic
Language:	English
Volume:	12
Pages:	49817 - 49825
Impact Factor:	3.6
Web of Science® Times Cited:	0
Scopus® Citations:	2
Databases:	Web of Science | Scopus | IEEE Xplore
Result of statutory research:	NO
Conference material:	NO
OA publication:	YES
License:
Access method:	Publisher's website
Text version:	Final published version
Time of publication:	At the time of publication
OA publication date:	12 April 2024
Abstracts:	English
	Word recognition in Slavic languages is not an easy task due to the complicated declension of words and the variety of diacritical signs. Polish is a representative of the West Slavic languages, which are written in Latin characters. Automatic handwritten word recognition in Slavic languages is difficult because of the poor recognition rate of letters with diacritical signs and the lack of good handwritten text corpora for languages with declension. The main aim of the research is to investigate the possibility of correcting typos made in the final phase of recognizing Polish words. The method developed is based on letter recognition by means of convolutional neural networks (CNNs) and text matching algorithms for the resulting words. In the first stage, we use a purpose-designed convolutional neural network for character recognition. In the second stage, after combining letters into words, we apply a post-processing error correction method, which improves the efficiency of recognition of misspelled words. We checked the efficiency of word matching for several measures of word similarity, i.e., edit distance (Damerau-Levenshtein), string matching (Sørensen-Dice) and a list of candidates. In addition, we examine how word length and the number of misplaced letters affect the behaviour of the algorithms used. The analysis is carried out for bigram and trigram methods. By combining different methods of assessing word similarity, a better selection of lists of proposed words has been achieved. The article proposes an innovative method for correcting post-processing errors in recognizing Polish words, with the efficiency of correct word matching ranging from 76% to 99%, depending on the measure used and the word length.
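
	The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`ngrams`, `dice`, `osa_distance`, `rank_candidates`), the sample lexicon and the ranking rule (smallest edit distance first, ties broken by the higher Sørensen-Dice score) are assumptions made for illustration only. The edit distance shown is the restricted Damerau-Levenshtein variant (optimal string alignment), which may differ from the variant used in the article.

```python
from collections import Counter


def ngrams(word, n=2):
    """Return the multiset of character n-grams of `word` (bigrams for n=2)."""
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))


def dice(a, b, n=2):
    """Sørensen-Dice coefficient on character n-grams (1.0 means identical sets)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        # Both words are shorter than n, so no n-grams exist to compare.
        return 1.0 if a == b else 0.0
    overlap = sum((ga & gb).values())  # multiset intersection
    return 2 * overlap / total


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def rank_candidates(word, lexicon, n=2, top=3):
    """Hypothetical ranking rule: edit distance first, then Dice score."""
    key = lambda c: (osa_distance(word, c), -dice(word, c, n))
    return sorted(lexicon, key=key)[:top]


# Example: a recognized word that lost its diacritics, matched against a toy lexicon.
print(rank_candidates("ksiazka", ["kot", "książka", "dom"]))
```

	Passing `n=3` to `dice` or `rank_candidates` switches the comparison from bigrams to trigrams, the two n-gram sizes analysed in the article.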


Polish Word Recognition Based on n-Gram Methods

Journal article