Features of the Knowledge Base of the System of Automated Construction of Logic and Linguistic Models of Text Documents

Anastasiia Vavilenkova

The article outlines the problem of finding meaningful units in electronic text documents and analyzes the main shortcomings of existing approaches to extracting knowledge from textual information. It studies the construction of logic and linguistic models of electronic text documents. In particular, it describes and examines the knowledge bases of the system of automated construction of logic and linguistic models of Ukrainian-language text documents. The author proposes a scheme for formalizing textual information based on the construction of a logic and linguistic model of an electronic text document. The first stage of construction is the formation of logic and linguistic models of natural language sentences, which uses a specially developed method of automated formation of such models. This method parses natural language sentences, uses natural language words as a thesaurus database, and uses a database of rules to identify logical connections. This is made possible by knowledge base 1, developed by the author, which determines the role of each word in an electronic text document. It is a production model with formalized rules of the Ukrainian language for forming the phrases that can serve as members of a natural language sentence. The author created knowledge base 2 to find connections between the sentences of an electronic text document. It is a set of productions that reflect the principles of synthesis of logic and linguistic models of natural language sentences, i.e., the rules for combining and replacing the structural components of these models.
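The following minimal Python sketch makes the idea of a production model concrete. It uses IF-THEN rules to assign sentence-member roles to words, then groups an agreeing adjective with its noun into one phrase. It is not the author's knowledge base 1. The `Token` and `Production` types, the four rules, and the assumption that each word arrives already tagged with part of speech and case are all hypothetical simplifications for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Token:
    word: str
    pos: str        # part of speech, assumed pre-tagged: "NOUN", "VERB", "ADJ", ...
    case: str = ""  # grammatical case for nominals, e.g. "nom", "acc"
    role: str = ""  # sentence member assigned by the productions


@dataclass
class Production:
    name: str
    condition: Callable[[list[Token], int], bool]  # IF part, tested on word i
    role: str                                      # THEN part: role to assign


# Hypothetical, heavily simplified rules; NOT the author's actual rule set.
RULES = [
    Production("nominative noun -> subject",
               lambda s, i: s[i].pos == "NOUN" and s[i].case == "nom", "subject"),
    Production("verb -> predicate",
               lambda s, i: s[i].pos == "VERB", "predicate"),
    Production("accusative noun -> object",
               lambda s, i: s[i].pos == "NOUN" and s[i].case == "acc", "object"),
    Production("adjective agreeing in case with the next noun -> attribute",
               lambda s, i: s[i].pos == "ADJ" and i + 1 < len(s)
               and s[i + 1].pos == "NOUN" and s[i + 1].case == s[i].case,
               "attribute"),
]


def assign_roles(sentence: list[Token]) -> list[Token]:
    """Fire the first matching production for every word (mutates the tokens)."""
    for i, tok in enumerate(sentence):
        for rule in RULES:
            if rule.condition(sentence, i):
                tok.role = rule.role
                break  # first matching production wins
    return sentence


def phrases(sentence: list[Token]) -> list[tuple[str, str]]:
    """Merge attributes into the phrase of the word that follows them."""
    result, pending = [], []
    for tok in sentence:
        if tok.role == "attribute":
            pending.append(tok.word)
            continue
        result.append((tok.role, " ".join(pending + [tok.word])))
        pending = []
    return result
```

Take the tagged sentence «Студент читає цікаву книгу» ("The student reads an interesting book"). The rules mark its words as subject, predicate, attribute and object, and `phrases` merges «цікаву книгу» into a single object phrase. A realistic rule base for Ukrainian would need many more conditions, such as gender and number agreement, prepositional phrases and free word order.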
Knowledge base 3, used to build the linguistic component of the logic and linguistic model of a text document, is a set of productions containing the rules for forming transition networks that interpret the thematic progression of the text. The application of the developed formalized rules is demonstrated on specific text fragments. The developed knowledge bases make it possible to trace the process of forming logic and linguistic models of electronic text documents.
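Thematic progression can be illustrated with a toy transition network. The sketch below assumes each sentence model has already been reduced to a hypothetical (theme, rheme) pair. It recognizes only two textbook progression types: linear, where the rheme of one sentence becomes the theme of the next, and constant theme. It is a drastically simplified stand-in, not the rule set of knowledge base 3. The names `SentenceModel`, `link_type`, `transition_network` and `dominant_progression` are invented for this example.

```python
from collections import Counter
from typing import NamedTuple


class SentenceModel(NamedTuple):
    theme: str  # what the sentence is about (given information)
    rheme: str  # what is said about it (new information)


def link_type(prev: SentenceModel, cur: SentenceModel) -> str:
    """Hypothetical production: classify the thematic link between two sentences."""
    if cur.theme == prev.rheme:
        return "linear"    # rheme of the previous sentence becomes the new theme
    if cur.theme == prev.theme:
        return "constant"  # the same theme is carried through
    return "break"         # no thematic link found by these two rules


def transition_network(text: list[SentenceModel]) -> list[tuple[int, int, str]]:
    """Arcs (from_sentence, to_sentence, label) between consecutive sentences."""
    return [(i, i + 1, link_type(a, b))
            for i, (a, b) in enumerate(zip(text, text[1:]))]


def dominant_progression(arcs: list[tuple[int, int, str]]) -> str:
    """Most frequent arc label, taken as the text's prevailing progression type."""
    counts = Counter(label for _, _, label in arcs)
    return counts.most_common(1)[0][0] if counts else "none"
```

For a three-sentence text modelled as ("system", "knowledge base"), ("knowledge base", "rules"), ("rules", "productions"), both arcs are labelled "linear", so `dominant_progression` reports a linear progression. Each further progression type would add one more rule to `link_type`.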

Keywords: meaningful units; natural language; electronic text document; logic and linguistic model; knowledge base; production model
