Natural Language Processing (NLP)
Natural language processing (NLP) is the process by which artificial intelligence interprets and manipulates human language. In short, NLP provides computers with the ability to understand natural language. NLP bridges the gap between human communication and computer understanding, and it draws on many disciplines, such as computer science and computational linguistics.
There's a good chance you've interacted with NLP through voice-operated GPS systems, digital assistants, speech-to-text dictation software, customer service chatbots, and other consumer conveniences. NLP also plays a growing role in enterprise solutions that help streamline business operations, increase employee productivity, and simplify mission-critical business processes.
It improves communication between humans and computers
NLP is a technology in which computers process human language in the form of text or voice data. It helps computers 'understand' the full meaning of that data, including the speaker's or writer's intent and sentiment. This is very important, as it is the foundation that allows NLP to be used in applications like speech-to-text dictation software, voice-operated GPS systems, and the like. (IBM, 2020)
It helps to analyse large volumes of text
In today's interconnected world, there is a large volume of natural language text containing valuable knowledge and information. "Today's machines can analyse more language-based data than humans without fatigue and in a consistent, unbiased way." Considering the staggering amount of unstructured data generated every day, from medical records to social media posts, NLP technology is key to analysing text and speech data efficiently.
(SAS, n.d.)
It assists in providing a structure to natural language
As humans who communicate in our native language daily, we might not realise the complexity and diversity of the human language. We express ourselves verbally and in writing. There are hundreds of languages and dialects. Each language has a unique set of grammar and syntax rules, terms and slang. We often misspell, abbreviate words, or omit punctuation when we write. We have regional accents, mumble, stutter and borrow terms from other languages when we speak.
Here are some examples to help you understand some of the difficulties a computer might face while trying to interpret natural language.
"Hey bro, Ill be there n 20m pls sorry Im a dog 4 being l8." (misspellings, abbreviations, and slang)
"The fish is ready to eat." (ambiguous: is the fish the eater or the meal?)
Named-entity recognition (NER)
Otherwise known as NER, this NLP task helps computers identify words or phrases in text and classify them into predefined categories, such as people, organisations, and locations.
For example, NER could analyse a news article and identify every mention of a company or product. It can differentiate between entities that look identical by using the semantics of the surrounding text.
For instance, in the sentence, 'Daniel McDonald's son went to McDonald's and ordered a Happy Meal,' NER could help the computer recognise the two instances of "McDonald's" as two separate entities: one a restaurant and the other a person.
(Lutkevich, B., 2021)
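Real NER systems use statistical or neural models trained on labelled data. The toy sketch below is a hypothetical illustration only: it uses a tiny invented gazetteer plus one context rule to show how surrounding words can disambiguate the two identical "McDonald's" strings in the article's example.

```python
# Toy named-entity tagger (illustrative sketch, not a real NER model).
# KNOWN_ORGS is an invented mini-gazetteer for this example.
KNOWN_ORGS = {"McDonald's", "Google", "IBM"}

def tag_entities(tokens):
    """Label gazetteer hits, using context to separate person vs organisation."""
    entities = []
    for i, tok in enumerate(tokens):
        if tok in KNOWN_ORGS:
            # Context rule: a gazetteer token right after another capitalised
            # token is treated as part of a PERSON name, mimicking how real
            # NER uses surrounding words to disambiguate identical strings.
            if i > 0 and tokens[i - 1][:1].isupper():
                entities.append((tokens[i - 1] + " " + tok, "PERSON"))
            else:
                entities.append((tok, "ORG"))
    return entities

tokens = "Daniel McDonald's son went to McDonald's and ordered a Happy Meal".split()
print(tag_entities(tokens))
# The first McDonald's joins "Daniel" as a PERSON; the second is tagged ORG.
```

A single hand-written rule is obviously brittle; the point is only that the same string receives different labels depending on context, which is exactly what trained NER models learn to do at scale.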
Co-reference resolution
Co-reference resolution is the task of identifying if and when two or more words or phrases refer to the same entity.
The most common example is determining the person or object to which a certain pronoun refers (for example, ‘she’ = ‘Mary’). According to IBM, "it can also involve identifying a metaphor or an idiom in the text (for example, an instance where 'bear' isn't an animal but a large hairy person)."
(IBM, 2020)
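The pronoun case ('she' = 'Mary') can be sketched with a deliberately naive heuristic: link each pronoun to the most recent name-like token. This is an assumption-laden toy, not how production co-reference systems work, but it makes the input/output of the task concrete.

```python
# Toy co-reference sketch: resolve pronouns to the nearest preceding
# capitalised token. Real resolvers use trained models over full parses.
PRONOUNS = {"she", "he", "it", "her", "him", "they"}

def resolve_pronouns(tokens):
    """Pair each pronoun with the most recent name-like (capitalised) token."""
    antecedent = None
    resolved = []
    for tok in tokens:
        if tok.lower() in PRONOUNS and antecedent:
            resolved.append((tok, antecedent))
        elif tok[:1].isupper():
            antecedent = tok  # remember the latest candidate name
    return resolved

print(resolve_pronouns("Mary said she would arrive at noon".split()))
# → [('she', 'Mary')]
```

The heuristic fails on anything non-trivial (multiple names, sentence-initial capitals, idioms like the 'bear' example), which is precisely why co-reference is a hard, model-driven task.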
Word sense disambiguation
Word sense disambiguation is the "selection of the meaning of a word with multiple meanings."
In NLP, text goes through a process of semantic analysis that determines which meaning of a word makes the most sense in the given context.
IBM provides the example of how "word sense disambiguation helps distinguish the meaning of the verb 'make' in ‘make the grade’ (achieve) vs. ‘make a bet’ (place)."
(IBM, 2020)
What does "semantic" mean?
Semantics studies word meanings and how words combine to form sentences with new meanings.
(Mousa, H. M, 2019)
Automated question answering (QA)
Automated question-answering (QA) systems retrieve information in response to questions asked by humans in natural language.
The fundamental use of this task is to assist human-machine interaction. In the context of a virtual assistant, you could ask Google:
“Hey, Google, who won the most individual medals in the Olympics in 2012?”
An excellent QA system would give you the answer,
"Michael Phelps won the most medals in the 2012 Olympics."
However, less sophisticated QA systems would instead provide you with a list of relevant documents to explore in search of an accurate answer.
(Dwivedi, S. K., & Singh, V., 2013)
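The gap between the two behaviours can be illustrated with a toy retrieval step: rank stored sentences by word overlap with the question and return the best one. The document store and scoring below are invented for this sketch; real QA combines retrieval with reading-comprehension models.

```python
# Toy QA by keyword overlap over a tiny invented "document store".
DOCUMENTS = [
    "Michael Phelps won the most medals in the 2012 Olympics.",
    "The 2012 Olympics were held in London.",
    "Usain Bolt won the 100m final in 2012.",
]

def answer(question):
    """Return the stored sentence sharing the most words with the question."""
    q_terms = set(question.lower().replace("?", "").split())
    scored = [(len(q_terms & set(d.lower().rstrip(".").split())), d)
              for d in DOCUMENTS]
    return max(scored)[1]

print(answer("Who won the most medals in the 2012 Olympics?"))
```

Returning the single best-matching sentence mimics the "excellent" QA system; returning the whole ranked `scored` list would mimic the document-list fallback the text describes.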
Text summarisation
Text summarisation uses NLP techniques to digest huge volumes of digital text, creating summaries and synopses for indexes, research databases, or busy readers who don't have time to read the full text.
The ideal text summarisation applications use semantic reasoning and natural language generation to add useful context and conclusions to summaries.
(IBM, 2020)
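A minimal extractive approach, sketched under simplifying assumptions, scores each sentence by the frequency of its content words across the whole text and keeps the top-scoring sentences. The stopword list is invented for this toy; real summarisers use far richer semantic models, as the text notes.

```python
# Frequency-based extractive summarisation sketch (toy, stdlib only).
from collections import Counter

STOPWORDS = {"the", "a", "is", "and", "of", "to", "in", "was"}

def summarise(text, n=1):
    """Keep the n sentences whose content words are most frequent overall."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    words = [w for w in text.lower().replace(".", " ").split()
             if w not in STOPWORDS]
    freq = Counter(words)
    ranked = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in s.lower().split()
                          if w not in STOPWORDS),
        reverse=True,
    )
    return ". ".join(ranked[:n]) + "."

text = ("NLP helps computers process text. Computers process text quickly. "
        "The weather was pleasant.")
print(summarise(text))
# The off-topic weather sentence scores lowest and is dropped.
```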
Text search
Search engines are the most common application of text search as an NLP task.
Search engines use NLP to surface relevant results based on similar search behaviours or user intent. The average person can then find what they need without being a search-term wizard.
For example, Google predicts what popular searches may apply to your query as you start typing. It also looks at the whole picture and recognises what you’re trying to say rather than the exact search words.
(Tableau, n.d.)
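Underneath the query understanding that engines like Google layer on top, text search rests on an inverted index: a map from each term to the documents containing it. The sketch below shows only that core structure with an AND query; everything else (ranking, stemming, intent prediction) is omitted.

```python
# Minimal inverted-index search sketch (the core text-search data structure).
def build_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every query term (AND search)."""
    results = None
    for term in query.lower().split():
        hits = index.get(term, set())
        results = hits if results is None else results & hits
    return sorted(results or [])

docs = ["how to fold a paper plane",
        "plane tickets to Paris",
        "paper recycling guide"]
index = build_index(docs)
print(search(index, "paper plane"))  # only document 0 has both terms
```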
Part-of-speech (POS) tagging
POS tagging, also known as grammatical tagging, uses language context to determine the grammatical functions of individual words or phrases in a section of text.
For example, in the phrase 'I can make a paper plane', POS tagging identifies the word 'make' as a verb.
However, it will identify the word 'make' as a noun in 'What make of car do you own?'.
(IBM, 2020)
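The 'make' example can be sketched with one hand-written context rule. Real POS taggers use statistical sequence models trained on annotated corpora; this toy only shows that the tag depends on the neighbouring words, not on the word alone. The determiner list is an invented simplification.

```python
# Toy POS disambiguation of 'make' via its left-hand context.
DETERMINERS = {"what", "which", "the", "a", "this"}

def tag_make(tokens):
    """Tag each token; 'make' after a determiner is a noun, else a verb."""
    tags = []
    for i, tok in enumerate(tokens):
        if tok.lower() == "make":
            prev = tokens[i - 1].lower() if i > 0 else ""
            # 'What make of car' -> noun; 'I can make a plane' -> verb.
            tags.append("NOUN" if prev in DETERMINERS else "VERB")
        else:
            tags.append("OTHER")
    return tags

print(tag_make("I can make a paper plane".split()))
print(tag_make("What make of car do you own".split()))
```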
Sentiment analysis
Sentiment analysis is an NLP task that attempts to extract subjective qualities such as attitudes, emotions, sarcasm, confusion, or suspicion from the text.
For example, sentiment analysis would allow computers to identify the emotion behind the sentence 'Awful experience. I would never buy this product again!' as anger.
(IBM, 2020; Roldós, I., 2021)
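The simplest sentiment approach counts cue words from positive and negative lexicons. The word lists below are invented for illustration; trained sentiment models additionally handle negation, sarcasm, and context, which is why the subtler qualities the text mentions are hard.

```python
# Lexicon-based sentiment sketch with invented cue-word lists.
NEGATIVE = {"awful", "never", "terrible", "worst"}
POSITIVE = {"great", "love", "excellent", "happy"}

def sentiment(text):
    """Classify text by counting positive vs negative cue words."""
    words = set(text.lower().replace("!", "").replace(".", "").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Awful experience. I would never buy this product again!"))
# → negative
```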
Natural language generation
Natural language generation is the task of putting structured information into human language.
Often described as the opposite of speech recognition or speech-to-text, natural language generation produces a human language text response based on some data input. This text can also be converted into a speech format through text-to-speech services.
NLG also encompasses text summarisation capabilities that generate summaries from input documents while maintaining the integrity of the information.
(IBM, 2020; Kavlakoglu, E., 2020)
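The most basic form of turning structured data into human language is template filling, sketched below with an invented weather record. Modern NLG uses neural language models, but templates still illustrate the data-in, sentence-out shape of the task.

```python
# Template-based natural language generation sketch.
def describe_weather(record):
    """Render a structured weather record as an English sentence."""
    return ("In {city}, it is {temp} degrees and {condition}."
            .format(**record))

record = {"city": "Singapore", "temp": 31, "condition": "humid"}
print(describe_weather(record))
# → In Singapore, it is 31 degrees and humid.
```

Passed through a text-to-speech service, the same generated sentence becomes the spoken response of a virtual assistant, matching the pipeline the text describes.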