Natural language processing (NLP) is a field of artificial intelligence that helps machines understand and interpret human language, and helps humans interact with machines in return.
Understanding a text is much more complex than it seems. Computers need to identify context; perform syntactic, morphological, semantic and lexical analysis; produce summaries; translate into other languages; absorb information; interpret meanings and sentiments; and learn from all of it.
How did NLP come about?
Research began in the 1940s with machine translation, whose purpose was to distinguish between languages and to handle differences in syntax. These early efforts evolved into dictionary-based systems capable of simple translations.
In the 1950s, Noam Chomsky published his studies on generative grammar (the theory that languages make infinite use of finite lexical and grammatical resources) and turned NLP research, which had previously been aimed solely at computing, toward linguistics as well.
However, it was only in the late 1960s that things began to take off. After Chomsky’s work on linguistic competence (the set of “norms” internalized in our minds that lets us judge whether sentences are grammatically correct), several methods were developed, with models of meaning representation as the main concern.
A major highlight of this period was Eliza, the first chatbot in history. Created by the German-born computer scientist Joseph Weizenbaum at MIT’s Artificial Intelligence laboratory, Eliza simulated conversations between patient and psychologist, playing the role of the psychologist.
By the late 1970s, scientists were emphasizing research into systems able to carry out tasks guided by dialogue.
The 1980s were characterized by attempts to solve open problems in NLP and by progress in interpreting context.
In the 1990s, research made notable advances in areas such as language identification and the handling of ambiguity.
In recent years, it has become possible to develop and model sophisticated linguistic systems to meet specific objectives. Today, natural language processing is part of our everyday life in the smallest of things and keeps evolving.
How does it work?
Pre-processing
To condense the data and shape the language so that the machine can understand it, textual pre-processing is indispensable.
Normalization is the stage that covers adjustments such as tokenization, the removal of tags and special characters, and the conversion of capital letters to lowercase. Lexical tokenization marks each word in the text as a token, while sentential tokenization identifies and marks sentences. This is what begins to give the text structure.
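A minimal sketch of these normalization and tokenization steps in plain Python (the sentence, regexes and variable names are illustrative; real projects usually rely on libraries such as NLTK or spaCy):

```python
import re

text = "NLP helps machines understand language. It's everywhere!"

# Sentential tokenization: a naive split after sentence-ending punctuation.
sentences = re.split(r"(?<=[.!?])\s+", text)

# Normalization: lowercase and strip special characters from each sentence.
normalized = [re.sub(r"[^a-z0-9\s']", "", s.lower()) for s in sentences]

# Lexical tokenization: mark each word as a token.
tokens = [sentence.split() for sentence in normalized]

print(sentences)  # ['NLP helps machines understand language.', "It's everywhere!"]
print(tokens)     # [['nlp', 'helps', 'machines', 'understand', 'language'], ["it's", 'everywhere']]
```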
Stopword removal discards very frequent words, such as “the”, “that” and “of”, because in general they aren’t relevant to building the model. It should only be applied when the stopwords really don’t matter for the task.
Removing numerals and the symbols that accompany them (for example, “$”, “km”, “kg”) is usually necessary as well.
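Continuing the sketch, a hedged example of stopword and numeral/symbol removal in plain Python (the token list, stopword set and unit symbols are made up for illustration; NLTK and spaCy ship full stopword lists per language):

```python
import re

# Tiny illustrative stopword list; real libraries provide complete ones.
stopwords = {"the", "and", "of", "that", "is"}

tokens = ["the", "bag", "weighs", "2", "kg", "and", "costs", "$", "30"]

# Remove stopwords only when they don't matter for the task at hand.
no_stopwords = [t for t in tokens if t not in stopwords]

# Remove numerals and accompanying symbols/units such as "$", "km" or "kg".
units = {"$", "km", "kg"}
cleaned = [t for t in no_stopwords
           if not re.fullmatch(r"\d+([.,]\d+)?", t) and t not in units]

print(cleaned)  # ['bag', 'weighs', 'costs']
```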
The use of spell checkers is quite common for dealing with typos, abbreviations and informal vocabulary.
Stemming reduces a word to its stem, while lemmatization reduces it to its lemma (dictionary form).
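To make the difference concrete, a brief sketch with NLTK’s Porter stemmer and WordNet lemmatizer (the library choice and word list are illustrative, and the WordNet data must be downloaded separately):

```python
# Requires `pip install nltk` and nltk.download('wordnet') for the lemmatizer.
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

words = ["studies", "running", "better"]

# Stems are truncated forms, not necessarily real words.
print([stemmer.stem(w) for w in words])                   # ['studi', 'run', 'better']

# Lemmas are dictionary forms; here the words are treated as verbs.
print([lemmatizer.lemmatize(w, pos="v") for w in words])  # e.g. ['study', 'run', 'better']
```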
Processing
There are 7 levels of processing: phonological, morphological, lexical, syntactic, semantic, discourse and pragmatic.
Phonology identifies and interprets the sounds that form words when the machine needs to understand spoken language.
Morphology deals with the composition and nature of words, dividing them into morphemes (the smallest units of meaning that make up words: ending, root, stem, affix, theme and thematic vowel), while syntactic processing analyzes sentence formation and the structures the language allows.
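As a glimpse of how a machine exposes this morphosyntactic information, a minimal part-of-speech tagging sketch with NLTK (the sentence is illustrative, and the tokenizer and tagger models have to be downloaded first):

```python
# Requires nltk.download('punkt') and nltk.download('averaged_perceptron_tagger').
import nltk

tokens = nltk.word_tokenize("The machine translates simple sentences.")

# Each token gets a part-of-speech tag, typically something like:
# [('The', 'DT'), ('machine', 'NN'), ('translates', 'VBZ'),
#  ('simple', 'JJ'), ('sentences', 'NNS'), ('.', '.')]
print(nltk.pos_tag(tokens))
```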
The lexical level analyzes the input stream of characters (much like the source code of a program) and produces a sequence of lexical tokens, capturing the individual meaning of words.
Semantic analysis deals with the meaning of the sentence, extracting a meaning from the syntactic structure, while discourse analysis checks the overall meaning of the text.
Finally, pragmatic processing interprets the concepts extracted from the text, checking whether the meaning produced by semantic analysis is correct and resolving meanings that aren’t clear.
Types of approach
The levels of processing are approached in 4 ways: symbolic, statistical, connectionist and hybrid.
The symbolic approach is based on unambiguous and well-structured linguistic rules.
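A toy illustration of the symbolic approach: a single hand-written, unambiguous rule (the date pattern and sample text below are invented for illustration):

```python
import re

# A rule-based extractor: one explicit rule that recognizes dates written as DD/MM/YYYY.
DATE_RULE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

text = "The meeting was moved from 05/03/2024 to 12/03/2024."
print(DATE_RULE.findall(text))  # [('05', '03', '2024'), ('12', '03', '2024')]
```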
The statistical approach uses mathematical models to infer the correct handling of these levels without applying hand-written linguistic rules.
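A minimal statistical sketch, assuming scikit-learn is available: a bag-of-words Naive Bayes classifier that learns from labeled examples instead of rules (the training sentences and labels are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: sentences with sentiment labels.
train_texts = ["I loved this product", "great service",
               "terrible experience", "I hate waiting"]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words counts feed a Naive Bayes model; no linguistic rules involved.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["the service was great"]))  # likely ['positive']
```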
The connectionist approach also develops generic models of language, but does so by combining statistical learning with theories of knowledge representation, which lets it transform, infer from and manipulate texts.
Finally, the hybrid approach mixes the previous ones, which makes it flexible and able to treat NLP problems broadly and effectively.
NLP Technologies
We can find natural language processing in many different settings, and the trend is for it to reach more and more areas. Here are some of the technologies!
Dialogue systems try to reproduce human conversation through voice, text and even gestures.
Question answering, as the name says, answers users’ specific questions. Apple’s Siri is a well-known example.
Natural language understanding turns sentences of human language into a meaning representation that machines can work with.
Machine translation, the complex technology of translating texts from one language into another, is considered one of the major challenges of artificial intelligence.
Bothub
A great example of a natural language processing system is one of the most recent creations of our team at Ilhasoft. Drawing on the expertise and insights we gained while enhancing the Push platform, we developed Bothub: a smart, democratic and open platform that can understand multiple languages (7 so far).
The software allows IT professionals to train sample sentences and synonyms and to produce translations.
Did you like this content? Follow our blog!!