Publication Database of the Staff of Politechnika Lubelska (Lublin University of Technology)

Status:
Authors:	Badurowicz Marcin
Year of publication:	2022
Document version:	Electronic
Language:	English
Issue number:	1
Volume:	18
Pages:	89 - 98
Scopus® citations:	1
Databases:	Scopus
Result of statutory research	NO
Conference material:	NO
OA publication:	YES
License:
Access method:	Publisher's website
Text version:	Final published version
Time of release:	At the time of publication
Date of OA publication:	24 March 2022
Abstracts:	English
	In the paper, the authors present the outcome of web scraping software that allows for the automated classification of source code. The software system was prepared for a discussion forum for software developers to find fragments of source code that were published without being marked as code snippets. The analyzer software uses a Machine Learning binary classification model to differentiate between programming language source code and highly technical text about software. The analyzer model was prepared using the AutoML subsystem without human intervention or fine-tuning, and its accuracy on the described problem exceeds 95%. The analyzer based on the automatically generated model has been deployed, and after the first year of continuous operation its False Positive Rate is less than 3%. A similar process may be introduced in document management in the software development process, where automatic tagging and search for code or pseudo-code may be useful for archiving purposes.


Detection of source code in internet texts using automatically generated machine learning models

Journal article