Features of the Knowledge Base of the System of Automated Construction of Logic and Linguistic Models of Text Documents

pp. 75–83
National Aviation University

The article outlines the problem of finding meaningful units in electronic text documents and analyzes the main shortcomings of existing approaches to extracting knowledge from textual information. It studies the process of constructing logic and linguistic models of electronic text documents and, in particular, describes the knowledge bases of a system for the automated construction of logic and linguistic models of Ukrainian-language text documents. The author proposes a scheme for formalizing textual information based on the construction of a logic and linguistic model of an electronic text document. The first stage of construction is the formation of logic and linguistic models of natural language sentences, using a specially developed method of automated formation of such models. This method is based on parsing natural language sentences, using a thesaurus database of natural language words, and using a database of rules to identify logical connections. This in turn is made possible by knowledge base 1, developed by the author, which is used to determine the role of each word in an electronic text document and serves as a production model with formalized rules of the Ukrainian language for forming the phrases that can act as members of a natural language sentence. Knowledge base 2 was created by the author to find connections between the sentences of an electronic text document. It is a set of productions that reflect the principles of synthesis of logic and linguistic models of natural language sentences, i.e. the rules for combining and replacing structural components of these models.
Knowledge base 3, used to build the linguistic component of the logic and linguistic model of a text document, is a set of productions containing the rules for forming transition networks that interpret the thematic progression of the text. The application of the developed formalized rules is demonstrated on specific text fragments. Applying the developed knowledge bases makes it possible to trace the process of forming logic and linguistic models of electronic text documents.
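
As a rough illustration of what a production model for forming phrases might look like, the following minimal Python sketch applies IF-THEN rules to a sequence of tagged words. Everything in it is an assumption made for illustration: the tag names, the phrase labels, the `PRODUCTIONS` table and the `apply_productions` function are hypothetical and are not taken from the article, whose actual rules are formalized for Ukrainian.

```python
from typing import List, Tuple

# Hypothetical productions: IF the part-of-speech pattern matches,
# THEN group the words under the given phrase label.
# Longer patterns are listed first so that they take priority.
PRODUCTIONS: List[Tuple[Tuple[str, ...], str]] = [
    (("ADJ", "NOUN"), "noun_group"),
    (("ADV", "VERB"), "verb_group"),
    (("NOUN",), "noun_group"),
    (("VERB",), "verb_group"),
]


def apply_productions(tagged: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Scan left to right; the first production whose pattern matches fires."""
    result = []
    i = 0
    while i < len(tagged):
        for pattern, label in PRODUCTIONS:
            window = tagged[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                result.append((label, [word for word, _ in window]))
                i += len(pattern)
                break
        else:
            # No production applies: keep the word but mark it as unmatched.
            result.append(("unmatched", [tagged[i][0]]))
            i += 1
    return result


if __name__ == "__main__":
    sentence = [("new", "ADJ"), ("system", "NOUN"), ("quickly", "ADV"),
                ("parses", "VERB"), ("text", "NOUN")]
    for label, words in apply_productions(sentence):
        print(label, words)
```

In this sketch the rule order encodes priority, which is only one possible conflict-resolution strategy for a production system; the article's knowledge bases may resolve competing rules differently.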

