Natural Language Processing (NLP) has already penetrated many areas of daily life and work, and its use will continue to increase. All of us are familiar with voice assistants such as Alexa or Cortana, we dictate emails and letters, and we sometimes deal with conversational AI when we call a hotline. NLP is even used for therapeutic purposes: the app Woebot uses chatbot technology to interact with people suffering from depression, anxiety or burn-out.

Lawyers use NLP to sift through large portfolios of contracts for legal risks; translation programs such as DeepL or Google Translate have become an integral part of everyday life in internationally operating companies; in HR departments, NLP is used to pre-select applications. In some companies, the application process even starts with a telephone interview conducted by an artificial intelligence. And marketing experts use sentiment analysis to evaluate how companies are perceived on social media (positive or negative).

Natural Language Processing can be described as the use of AI (mostly: Deep Learning) to process natural language or generate speech: Speech-to-Text (the conversion of spoken language into text), Text-to-Speech, translation, meaning analysis and so on.

We look at the following: What are the basics of NLP? Where do we stand? What are the recent trends and breakthroughs? What are the current challenges?

Overview of the Basics of Computational Linguistics

Let’s start with a stunning statement: the computer has no understanding of what is being said, and this won’t change in the foreseeable future. Natural Language Processing is based (like Deep Learning in general) on statistical correlations: from training data, probabilities for the use of certain vocabulary in particular linguistic contexts are determined, and sentences are generated from these correlations (in the simplest case: subject-predicate-object).

Computational linguistics wasn’t always based on deep learning methods; classical computational linguistics (before the era of artificial neural networks) mainly relied on rule-based methods for analysing and generating language. Today, however, deep learning methods are increasingly replacing these conventional approaches.

Computational linguistics can basically be divided into several sub-areas or, put differently, language processing can be broken down into successive processing steps. Let us consider this for the analysis of spoken language:

The first step is speech recognition itself. Spoken language (as received by voice assistants like Siri) is converted into analyzable text (Speech-to-Text). Today, speech assistants are in fact very good at this, provided that only a single person is speaking: speech recognition works reliably even if you are talking to the assistant from a certain distance or are not turned perfectly towards the device.
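To make this step concrete, here is a minimal sketch using the open-source Python package SpeechRecognition with Google's free web API as the recognition backend; the audio file name is a hypothetical example:

```python
# Speech-to-Text sketch with the SpeechRecognition package
# (pip install SpeechRecognition); "dictation.wav" is a hypothetical file.
import speech_recognition as sr

recognizer = sr.Recognizer()

with sr.AudioFile("dictation.wav") as source:
    audio = recognizer.record(source)   # read the whole audio file

# send the audio to Google's free web recognizer and print the transcript
print(recognizer.recognize_google(audio, language="en-US"))
```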

In the second step, the grammatical and syntactic analysis is performed. For example, it is determined whether a word appears in the singular or plural, or whether we are dealing with the genitive or accusative case of a particular word. The sentence structure is recognised (main clause, subordinate clause, subject-predicate-object), and much more.
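Such an analysis can be sketched in a few lines with the open-source library spaCy; this assumes the small English model has been installed via python -m spacy download en_core_web_sm:

```python
# Part-of-speech tagging and dependency parsing with spaCy
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The lawyers reviewed the contracts carefully.")

for token in doc:
    # word, coarse part of speech, detailed tag (number, case, tense ...),
    # syntactic role and the word it depends on
    print(token.text, token.pos_, token.tag_, token.dep_, token.head.text)
```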

You can see such analysis at work by using the search engine www.wolframalpha.com. If you enter a question, it gets analysed with the help of such linguistic methods. The recognised elements are displayed next to the actual search result:

Fig.: Search Engine Wolfram Alpha

In the third step, the semantic analysis takes place. The aim here is to “understand” the meaning of individual sentences or the statement of a paragraph. There are different methods to approach the meaning.

In the simplest case, you merely want to know whether a speaker has a positive or negative attitude towards a company or a product. This so-called Sentiment Analysis can be done with comparatively simple means: take a text corpus and clean it up, that is, remove so-called stop words; these are conjunctions or articles (“and”, “or”, “the”, “a”, “that”); you will find numerous such stop word lists for download on the internet. Next step: you compare the remaining text corpus with annotated texts that were labelled as “positive” or “negative”. Machine learning completes the task and assigns a “sentiment” (positive or negative) to the elements of your text corpus.
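As a rough illustration of this recipe, here is a minimal sketch with scikit-learn; the tiny labelled corpus is purely hypothetical and stands in for the annotated texts mentioned above:

```python
# Minimal sentiment-analysis sketch with scikit-learn;
# the training texts and labels are hypothetical examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I love this product, it works great",
    "Excellent service and friendly staff",
    "Terrible experience, the device broke immediately",
    "I hate the new update, everything is slow",
]
train_labels = ["positive", "positive", "negative", "negative"]

# stop_words="english" removes conjunctions, articles etc. before counting words
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    LogisticRegression(),
)
model.fit(train_texts, train_labels)

print(model.predict(["The staff was friendly and the product works"]))
```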

Another important method is entity analysis: it is about identifying referenced objects, such as companies, historical events, cities and so on. The first step is to identify the type of entity (company, city or person). There are also possibilities to query additional, machine-readable information about identified entities. Such information can be found on Wikidata or DBpedia (in short: a “Wikipedia” that can be understood by machines).
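Entity analysis, too, can be tried out with spaCy's pre-trained models; a minimal sketch (the example sentence is made up):

```python
# Named entity recognition with spaCy (assumes en_core_web_sm is installed)
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Deutsche Telekom presented its Conversational AI strategy in Cologne in 2019.")

for ent in doc.ents:
    # entity text and its type, e.g. ORG (company), GPE (city/country), DATE
    print(ent.text, ent.label_)
```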

Another technique that contributes to the semantic classification of texts for computers is so-called Word Embedding. Words are mapped into multidimensional vector spaces (with up to several hundred dimensions!). A basic rule applies: words with similar meanings (i.e. synonyms) are close to each other in this multidimensional space. Word Embeddings are also quite powerful because you can apply vector arithmetic to them: adding the vectors for “kayak” and “big” gets you to something like “passenger ship”. Here’s yet another example: “boss” + “woman” becomes “female boss”. However, research on Word Embeddings is still in its infancy.
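The vector arithmetic can be tried out with gensim and pre-trained GloVe vectors; a minimal sketch using one of gensim's standard downloadable models and the classic king/queen analogy rather than the exact examples above:

```python
# Word-embedding sketch with gensim and pre-trained GloVe vectors
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads the model on first use

# words with similar meanings are close together in the vector space
print(vectors.most_similar("boat", topn=3))

# vector arithmetic: king - man + woman is close to queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```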

I would also like to mention the knowledge graph: comparable to a mind map, relations are established between the different elements in a text; as far as possible “directed” relations (e.g. company + supplier) or relations with “polarity” (Eva “likes” a product, or: Peter “hates” the subject XYZ).
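A tiny knowledge graph of this kind could be sketched with networkx; the entities and relations below are hypothetical examples mirroring the text above:

```python
# Minimal knowledge-graph sketch with networkx; all entities are made up.
import networkx as nx

graph = nx.DiGraph()  # directed graph: relations have a direction

graph.add_edge("ACME Corp", "Bolt GmbH", relation="has_supplier")
graph.add_edge("Eva", "Product X", relation="likes", polarity="positive")
graph.add_edge("Peter", "Subject XYZ", relation="hates", polarity="negative")

for source, target, data in graph.edges(data=True):
    print(source, "--" + data["relation"] + "-->", target)
```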

The fourth and last step is the (very demanding) dialogue and discourse analysis. Here, a superordinate structure is analysed, for example the relations between sentences or the structure of a paragraph, which may consist of a statement or hypothesis at the beginning that is then substantiated.

Status Quo & Important Trends

At Digital X 2019 in Cologne, a trade fair for digital movers and makers, I heard an excellent lecture by Jan Morgenthal, at that time Chief Product Owner Artificial Intelligence at Deutsche Telekom AG, and talked to him in person after the event. He shared Deutsche Telekom’s market analysis on Conversational AI with the audience and made it clear that many solutions on the market labelled “AI inside” do not always make use of AI. Many solutions were rule-based (“Fake it until you make it”). According to his analysis, this was even the case for the first versions of Alexa, which, however, became increasingly more “intelligent”. Morgenthal estimated (as of October 2019) the number of startups working with “real” AI in this area to be in the order of 30 – across all of Europe.

Now, let’s take a look at the trends and hypes. One important trend is Transfer Learning: in the early days of Conversational AI, models had to be trained from scratch with extensive annotated text corpora. Today, pre-trained models are used, and the desired results can be achieved much faster and with smaller domain-specific data sets. The idea behind this is simple: a model is pre-trained on a large text corpus (Wikipedia, for example, is such a large corpus, and thanks to its cross-references even machines can get at the meanings); this pre-trained model is then fine-tuned for use in a specific area with more specific data sets.
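A minimal sketch of how such a pre-trained model is used in practice, here via the Hugging Face transformers library; the fine-tuning step on domain-specific data is omitted and the example sentence is made up:

```python
# Using a pre-trained transformer model via the transformers library;
# pipeline() downloads a default model that was pre-trained on a large
# generic corpus and already fine-tuned for sentiment classification.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

print(classifier("The new voice assistant understands me surprisingly well."))
```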

Today, users can access various pre-trained NLP models and frameworks, including, for example, Google’s Word2Vec, PyText (an open-source NLP framework from Facebook), the Generative Pre-trained Transformer (GPT, a pre-trained model from OpenAI) and Bidirectional Encoder Representations from Transformers (BERT, a pre-trained model from Google).

Another interesting development: bi-directional neural networks. Since language is rarely unambiguous, context plays an important role in the analysis of meaning (i.e. semantic analysis – compare the third step above). The context of a word (or a sentence) can be deduced from what was said before – or from what is said AFTER that word. And this is precisely the idea of bi-directionality: both the text preceding a sentence and the text following it are taken into account, which improves the understanding of the text. Long short-term memory (LSTM) networks, applied in both directions, are typically used for this purpose.
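A minimal sketch of such a bi-directional network, here a bidirectional LSTM text classifier in Keras; vocabulary size, sequence length and layer sizes are arbitrary example values:

```python
# Bidirectional LSTM sketch in Keras; all sizes are arbitrary examples.
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    # the Bidirectional wrapper reads each sequence left-to-right AND right-to-left
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # e.g. positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# a dummy batch of 2 already-tokenised sequences of length 8 (hypothetical data)
dummy_batch = np.random.randint(0, 10000, size=(2, 8))
print(model.predict(dummy_batch).shape)  # -> (2, 1)
```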

And finally: computer-generated speech, even if it is grammatically correct, has so far been characterized by a treacherous monotony. More natural-sounding speech is now created by incorporating slight delays, arrhythmic passages or artificial pauses for thought.

Outlook: Challenges & Research Areas

What are today’s challenges of NLP in practice?

Let’s start with the so-called cocktail party effect. Speech assistants work very well nowadays as long as only one speaker interacts with Alexa or Magenta. But if you try to dictate an email in the subway or on the bus with ambient noise in the background, the recognition rate immediately drops to values that de facto prevent practical use. In other words: the ability to filter out the voice of an interlocutor from the babble of voices in the subway or suburban train is an easy task for us humans – for the speech assistant, however, it is a huge challenge.

Let’s come to another point: if you observe the behavior of users interacting with speech assistants, you will notice something striking: “Alexa, play my favorite song”; “Alexa, dim the lights in the living room”; “Alexa, when did Charles the Fifth reign?” These are nothing but instructions; the interactions could not be called dialogues or conversations. They are unilateral speech acts. The ability to conduct a real dialogue is far more demanding; this is another area for improvement and research.

A general challenge for computational linguistics is the fundamental ambiguity of language; in many cases only the context of a sentence makes the meaning unambiguous. Let’s take a look at an example: the sentence “Bring me some ice” typically does not mean that the person I am speaking to should go to the North Pole to collect Arctic ice. It rather means that the person should (a) bring the ice cream for dessert, (b) get ice cubes for cocktails, or (c) bring an ice bag to cool a laceration. In addition, the meaning of a sentence cannot be deduced exclusively from what is said; tone and facial expressions can change the meaning of speech acts even further. Irony and sarcasm are difficult for computers to understand.

Most people also rarely speak “Oxford English” or “Duden High German”: words are omitted, dialect is spoken, sentences are repeated and so on.

Author

The author is a manager in the software industry with international expertise: Authorized officer at one of the large consulting firms - Responsible for setting up an IT development center at the Bangalore offshore location - Director M&A at a software company in Berlin.