The Appanion AI Guide
Artificial intelligence is not a clearly defined term; it serves as an umbrella term for a variety of methods that solve tasks through data-based analysis.
For this reason, our AI Guide provides comprehensive and understandable definitions and summaries of the major concepts in the context of artificial intelligence.
Content Overview: What is...?
What is Artificial Intelligence (AI)?
Artificial Intelligence (AI) is the ability of a computer-controlled entity to perform cognitive tasks and react flexibly to its environment in order to maximize the probability of achieving a particular goal. The system can learn from experience data and can mimic behaviors associated with humans, but it does not necessarily use methods that are biologically observable.
Artificial Intelligence describes the ability of computers to perform tasks in order to successfully achieve desired goals. This often means that human tasks can be efficiently augmented or even fully substituted by a machine.
Well-known examples of machine intelligence playing chess, Go or Dota show that machines even outperform human capabilities in some areas. Currently, the most recognized fields of AI are natural language processing (NLP) and machine learning.
Why does it matter?
The internet enabled global communication for everyone and profoundly changed how we work, live and interact with each other. Artificial Intelligence is expected to do the same for process automation.
This will impact the consumer sphere, but even more so business processes that are repetitive or follow simple decision rules. One of the most cited impacts is autonomous driving, but research, customer service, controlling, legal and administration are also areas where AI will significantly increase productivity.
This cross-industry impact makes it relevant for almost anybody to get involved with AI.
Where are we today?
Although AI has been around for over 50 years, we are still in the starting phase. Enabling factors like processing power, global networking and cloud technologies have just started to unleash the possibilities of artificial intelligence.
The development of artificial intelligence can be segmented into three major phases. The first phase, pattern recognition, came with the availability of huge enterprise data sets – the big data hype. The second, current phase is the commercialization of deep learning algorithms, which enable actual learning systems via neural networks. The third phase, an abstracting and reasoning intelligence, is still quite a few years away, but its development is starting right now.
First experiments are already conducted, but commercial applications currently focus on phase two - giving the technology a lot of space to grow in performance and adoption.
What is Machine Learning?
"Machine Learning (ML) is the science of programming computers so they can learn from data and/or information, and improve their learning autonomously."
The terms Machine Learning (ML) and Artificial Intelligence (AI) are not to be used interchangeably. ML is a subset and in this sense an application of AI that focuses on teaching computers how to learn without the need to be programmed for specific tasks.
Supervised vs. Unsupervised Learning
ML is commonly divided into the sub-domains of supervised and unsupervised learning.
They differ in that supervised learning is provided with both input and output variables, whereas unsupervised learning is given only the input data. Some therefore argue that unsupervised learning is "real" ML, as the process itself requires no human interaction.
Choosing to use either a supervised or unsupervised machine learning algorithm typically depends on factors related to the structure and volume of your data and the use case of the issue at hand.
Supervised learning is often used in image recognition, speech recognition, forecasting, financial analysis, and the training of e.g. neural networks. Most of us experience supervised learning in our everyday lives through weather forecasts and sports outcome predictions, as well as when interacting with Siri or Facebook. Unsupervised learning, on the other hand, is used during exploratory analysis, to pre-process data or to pre-train supervised learning algorithms.
From a technical point of view, the main approaches are classification and regression for supervised learning, and clustering and dimensionality reduction for unsupervised learning.
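To make the distinction concrete, here is a minimal, self-contained sketch: a nearest-centroid classifier stands in for supervised classification, and a single k-means assignment step for unsupervised clustering. The data points and groupings are invented purely for illustration.

```python
# Supervised vs. unsupervised learning in miniature (toy data).

def nearest_centroid_fit(points, labels):
    """Supervised: learn one centroid per class from LABELED examples."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: (sum(x for x, _ in ps) / len(ps), sum(z for _, z in ps) / len(ps))
            for y, ps in groups.items()}

def nearest_centroid_predict(centroids, p):
    """Classify a new point by its closest learned centroid."""
    return min(centroids, key=lambda y: (p[0] - centroids[y][0]) ** 2
                                        + (p[1] - centroids[y][1]) ** 2)

def kmeans_assign(points, centers):
    """Unsupervised: group UNLABELED points by nearest center (one k-means step)."""
    return [min(range(len(centers)),
                key=lambda i: (p[0] - centers[i][0]) ** 2
                              + (p[1] - centers[i][1]) ** 2)
            for p in points]

# Supervised: inputs AND output labels are provided.
X = [(1, 1), (1, 2), (8, 8), (9, 8)]
y = ["small", "small", "large", "large"]
model = nearest_centroid_fit(X, y)
print(nearest_centroid_predict(model, (2, 1)))   # -> "small"

# Unsupervised: only the inputs are given; structure is discovered.
print(kmeans_assign(X, [(1, 1.5), (8.5, 8)]))    # -> [0, 0, 1, 1]
```

The same data appears in both halves: the supervised half needs the labels `y`, the unsupervised half deliberately ignores them.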
From a business perspective, it is often quite difficult to decide where to start applying machine learning. The following five business applications are considered the "low-hanging fruits" of machine learning:
Big Data & Analytics
It becomes apparent that those applications are mostly located in areas of an organization where, on the one hand, a lot of data is already collected and where, on the other hand, straightforward use cases can be found that can be tested on a small scale and deliver visible output relatively fast.
So while machine learning isn’t an easy topic to start with, it’s also not one that any future-minded business can leave off the table for too long.
What is Deep Learning?
"Deep Learning is an artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for use in decision making."
"Deep Learning" is a new term for something that has existed since the McCulloch & Pitts model of 1943. In this sense it is similar to other non-academic terms like "Big Data".
Deep Learning is a machine learning (ML) method that allows us to train an AI to predict outputs, given a set of inputs. It is a subfield of ML dealing with algorithms inspired by the structure and function of the brain – called artificial neural networks. The foundation of deep learning is the modeling of non-linear systems. In comparison to general ML, deep learning uses a neural network with n hidden layers, where n is large.
Investing in a robust data and analytics foundation is the first step of a deep learning project. In fact, the project’s success is dependent on the data – so the majority of a deep learning project’s work should be done there.
The most common use cases in an enterprise environment include:
Prediction Models (Pricing, Maintenance, ...)
Voice- and Image-Recognition
Deploying deep learning potentially involves the automation of decision-making at scale. It is thereby different from adopting other kinds of software and will most likely require the rethinking of processes that were engineered long before.
What is Reinforcement Learning?
"Reinforcement Learning (RL) is a type of dynamic programming that trains algorithms interacting with an environment using a system of reward and punishment."
Reinforcement Learning is a subset of Machine Learning (ML), where an agent – an "action-taker" in the form of an algorithm – learns how to behave in an at least partly unknown environment. The agent performs an action, which always creates a corresponding outcome. Reinforcement Learning simply connects the outcome with a reward or punishment. The agent therefore doesn't need to understand the complex causal relationships between its actions and the outcome. It is in some ways similar to teaching a dog how to fetch.
And of course, variations of RL exist yet again: if the agent is, for example, a neural network, Reinforcement Learning becomes Deep Reinforcement Learning.
To get a basic understanding of RL, here are definitions of some key terms that you will run into again and again as you dive deeper into the topic:
Key Definitions for understanding Reinforcement Learning
Action: An agent chooses from a set of actions. A list of all possible distinct actions represents the complete option space the agent can choose from to move forward. Taking the stock market as an example, possible actions of an agent would be to buy, sell or hold.
State: The state represents the situation an agent is in. It is a snapshot of the agent's position after its last move and therefore changes with every action the agent takes. A specific state can also limit the set of actions an agent is able to perform. Sticking with the stock market example: if the agent owns 10 shares, it can sell n shares (in this case n < 11), buy m shares (m depending on its credit and the share value) or simply hold its shares.
Environment: The environment contains the set of rules an agent can move within. It takes the current state of the agent and its action and returns the next state. In our stock market example, the stock market itself, in combination with the agent's bank account and the trading platform, would be the environment.
Reward: A reward is feedback that measures the success (or failure) of an agent's actions. In our stock market example, the change in value of the agent's portfolio could be the basis for a reward system. Rewards must be designed carefully, as they are the foundation of the agent's learning process.
Policy: The policy is the underlying strategy an agent uses to determine its next action. Based on the current state, it maps states to actions in order to find the action that promises the highest reward. The strategy of the stock-trader agent could, for example, be to buy stocks of company A as soon as the stock price of company B falls. It can then take all other factors of the environment into account and derive a decision.
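Putting these five terms together, the agent-environment loop can be sketched with tabular Q-learning on a deliberately toy "market". The two states, the action rewards and all hyperparameters below are invented purely for illustration; this is not a trading strategy.

```python
# Toy agent-environment loop: tabular Q-learning on a two-state "market".
import random

random.seed(0)

ACTIONS = ["buy", "sell", "hold"]

def step(state, action):
    """Environment: takes state + action, returns (next_state, reward).
    Made-up rewards: buying low and selling high pay off, the opposite hurts."""
    if state == "low":
        reward = {"buy": +1, "sell": -1, "hold": 0}[action]
        return "high", reward
    reward = {"buy": -1, "sell": +1, "hold": 0}[action]
    return "low", reward

# The learned state-action values back the policy.
Q = {(s, a): 0.0 for s in ("low", "high") for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

state = "low"
for _ in range(500):
    # Policy with epsilon-greedy exploration: mostly exploit, sometimes try actions
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    # Q-learning update: nudge the value toward reward + discounted future value
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print(max(ACTIONS, key=lambda a: Q[("low", a)]))   # learned policy in "low"
print(max(ACTIONS, key=lambda a: Q[("high", a)]))  # learned policy in "high"
```

After a few hundred interactions the agent's policy settles on buying in the "low" state and selling in the "high" state, without ever being told the environment's rules.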
Reinforcement Learning is suitable when information about the outside world is limited. In such situations, learning from actions – by interacting with the environment – is most promising. Typical applications therefore include:
To decide whether Reinforcement Learning is the right approach to be taken, the following criteria are good indicators to be evaluated. If one or more of them apply to your use case, you should probably consider trying a RL approach:
One goal is the augmentation of human analysts and domain experts by optimizing efficiency and providing decision support
The system or process is too complex for manually teaching machines through trial and error
Large state spaces are to be considered
What are Artificial Neural Networks?
"Artificial Neural Networks (ANN) are frameworks inspired by biological brains to process complex data inputs by using various machine learning algorithms. The key benefit of neural networks is that they don’t need detailed programming but learn from training."
Artificial Neural Networks are one of the key tools in the machine learning space. The basic concept is that the learning processes mimic the brain of humans or animals – in other words, perceive a lot of information, recognize patterns and derive knowledge from experience and feedback of the environment.
Of course, it is a bit more complex than it sounds, but let's try to get to the fundamentals of Artificial Neural Networks step by step.
Where do Artificial Neural Networks come from?
The concept of ANNs has been around for decades, but recent advancements in AI research allow ANNs to become far more complex than ever before. One of the most complex systems that we know is the human brain, and trying to even vaguely replicate its function is extremely challenging. The first successful results therefore came with the ability to conceptualize systems of higher complexity and to process data efficiently. The three enabling factors in this context are:
Architecture of Artificial Neural Networks
Don't worry, we will come back to these aspects soon. But to connect the dots, it is important to understand the architecture and components of ANNs first.
Artificial Neural Networks work based on connected artificial neurons (nodes) that process signals by receiving input, changing their state (activation) and transmitting output. The neurons are organised in layers. There is always an input layer and an output layer.
In case of more complex requirements, e.g. image recognition, there are one or multiple hidden layers in between – sequentially processing specific characteristics. In the image recognition example this would mean shapes, textures, brightness, contrast etc.
Simplified Architecture of an Artificial Neural Network
To sum it up: A data signal enters the ANN at the input layer, is processed through one or multiple layers of neurons depending on the complexity of the network architecture and finally reaches an output layer, where a goal function gives back a result. This result can for example be a simple classification into group A or B or a more sophisticated facial recognition system that fits together a number of image elements (facial structures) and concludes a match or no-match.
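The flow just described can be sketched in a few lines of plain Python: a signal enters the input layer, passes through one hidden layer and reaches a single output neuron. The weights below are hand-picked (a common textbook construction, not learned from data) so that the tiny network computes XOR, a non-linear problem that a network without a hidden layer cannot solve.

```python
# Minimal forward pass: input layer -> hidden layer -> output layer.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden neuron: weighted sum of inputs + bias, then activation.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1])
              for ws in hidden_weights]
    # Output neuron: weighted sum of hidden activations + bias.
    ws = output_weights
    return sigmoid(sum(w * h for w, h in zip(ws[:-1], hidden)) + ws[-1])

# Hand-set weights (last entry of each row is the bias) implementing XOR.
HIDDEN = [[20, 20, -10],    # fires if either input is 1 (acts like OR)
          [-20, -20, 30]]   # fires unless both inputs are 1 (acts like NAND)
OUTPUT = [20, 20, -30]      # fires if both hidden neurons fire (acts like AND)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", round(forward([a, b], HIDDEN, OUTPUT)))
```

In a real network these weights would be learned from training data rather than set by hand; the forward pass itself is identical.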
Types of Artificial Neural Networks
There are different types of ANNs with a vast amount of different characteristics depending on the area of application. So the first step when confronted with Artificial Neural Networks is to make the right choice of which type suits your use case best.
One of the most common terms in this context is the deep learning neural network. This is not really a "type" but describes the fact that the network uses multiple hidden layers to process data.
Some of the most common ANN types are:
feedforward neural networks
(data travels only in one direction from input to output layer)
recurrent neural networks
(data flows in sequences without a predefined amount of computational steps)
convolutional neural networks
(similar to feedforward but data is encoded to certain properties in order to reduce the number of parameters and increase computational efficiency)
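To illustrate the data-flow difference between the first two types, here is a rough sketch: a feedforward unit maps each input independently, while a recurrent unit carries a hidden state across a sequence. The weight values are arbitrary illustration choices, not trained parameters.

```python
# Feedforward vs. recurrent data flow in miniature.
import math

def feedforward_unit(x, w=0.5, b=0.0):
    """Output depends on the current input only."""
    return math.tanh(w * x + b)

def recurrent_unit(sequence, w_in=0.5, w_rec=0.8):
    """Output depends on the current input AND the carried hidden state."""
    h = 0.0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)   # state feeds back into itself
    return h

print(feedforward_unit(1.0))             # the same input always gives the same output
print(recurrent_unit([1.0, 0.0, 0.0]))   # the early input still echoes two steps later
```

Note how the recurrent unit produces a non-zero output for the final zero input: the earlier signal survives in the hidden state, which is exactly what makes recurrent networks suitable for sequences.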
What is Robotics?
"Robotics is a technology branch that aims to develop machines that can replicate human actions – partly even with enhanced capabilities."
Robotics and Artificial Intelligence are connected, but neither are they exactly the same nor is Robotics simply a subset of AI. Robotics deals with the mechanical, electronic and information engineering of programmable machines (Robots). It includes design, construction, and operation as well as the underlying computer systems for control functions and data processing.
Robots capture environmental data via sensors, act on their surroundings via actuators, and are able to perform tasks or series of actions autonomously or semi-autonomously. Due to the extremely broad range of appearances, a clear definition of 'robot' is quite tricky. Usually, a robot is considered to be any device that:
performs (complex series of) tasks (semi-) autonomously
interacts with the (physical or virtual) world based on its captured environment data
Social Robots (Pepper), Bionic Robots (Boston Dynamics), Humanoid Robots (Sophia), Unmanned Vehicles (Drones), Transport Robots (Fraunhofer), Industrial Robots (Kuka)
Exceptions are telerobots or vehicles that can be controlled remotely by humans. Other exceptions are software robots or smart assistants, which start to blur the line between AI and Robotics. The most prominent ones are Siri, Alexa, and Cortana, which do not (yet) physically interact with their environment but are considered intelligent robots in most modern definitions.
Which brings us to the intersection of AI and robotics - the field of intelligent robotic systems (smart robots). It's important to understand that automation or carrying out tasks autonomously has per se nothing to do with intelligence. Every autonomous task can be programmed and based on a set of predefined rules.
The majority of robots so far were non-intelligent robots, that performed pre-programmed repetitive tasks in well-known environments (like assembly lines). Intelligent Robots, on the other hand, make choices based on their input data, react flexibly to their environment and learn from experience. Today, a number of factors allow the intelligent robot segment to grow rapidly:
Increasing sensor density and ability to capture precise real-time data
Miniaturization and performance of computational (processing) power
The increasing mobility of devices due to wireless connectivity
Through cloud computing, and especially low-latency edge cloud computing, the trend of putting ever more intelligence into the robot itself has partly reversed. In predefined local areas such as factory floors, it is becoming more efficient to relocate the computing – and therefore the intelligence unit – to the cloud in order to have more scalable and cost-efficient hardware devices. The precondition, however, is a sufficient and stable internet connection.
The way robots contribute to value creation is quite obvious: they substitute parts of human labor; boost productivity through reliability, precision, and endurance; and automate jobs that are often dirty, dangerous or exhausting.
There are two major value drivers for the future of intelligent robots:
Improved capabilities of robots to collaborate with humans seamlessly (human-machine interaction), due to advances in collision detection and real-time reactivity
Lower unit costs due to computational "off-loading" of the intelligent control unit to the cloud
The World Economic Forum estimates that the adoption of AI-enabled robots could boost productivity in many industries by 30% while cutting labor costs by 18-33%. The resulting global economic impact ranges between $600 billion and $1.2 trillion by 2025.
What is Machine Perception?
"Machine perception encompasses methods and technologies to simulate human senses and therefore the perception and subconscious interpretation of the environment."
Major areas of machine perception are computer vision, computer audition and machine touch, whereas tasting and smelling are in a very early stage of experimental research.
Advances in machine perception research are extremely valuable for many different areas of artificial intelligence. It enables – or at least significantly reduces the effort of – data capturing and pre-processing for further applications such as natural language processing, robotics or machine learning.
Computer vision mimics and automates the human visual system. It therefore includes all tasks of acquiring, processing and analyzing digital images or videos in order to derive an understanding of the environment. Machine vision is a related discipline but focuses on low-cost, robust hardware and software systems for industrial applications. Computer vision, on the other hand, can be used in a much broader sense.
Computer vision systems depend on the application but some typical functions are relevant for most use cases, such as:
Image acquisition by one or multiple image sensors (cameras, lidar, 3D scanner, magnetic resonance image etc.) resulting in a 2D / 3D image or video
Pre-processing is necessary to extract specific information before the actual image analysis starts (e.g. re-sampling, noise reduction, contrast enhancement)
Feature extraction (e.g. lines, edges, interest points, texture or motion)
Detection (selection of image points or regions that are of interest for further analysis)
High-level processing (based on the extracted image sample – including verification, estimation of object pose and size, image recognition and classification as well as comparison or combination of different views)
Decision making (depending on the goal function, e.g. match / no-match, flagging for further inspection, pass/fail for any given category)
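The steps above can be walked through on a synthetic image. The "image", the edge filter and the threshold below are toy stand-ins, but the pipeline shape (acquisition, pre-processing, feature extraction, detection, decision) follows the list.

```python
# Toy computer-vision pipeline on a synthetic 5x5 grayscale image.

# 1. Image acquisition: a fake image containing one bright vertical bar.
image = [[0, 0, 9, 0, 0] for _ in range(5)]

# 2. Pre-processing: trivial cleanup, clamping pixel values to a valid range.
image = [[max(0, min(255, px)) for px in row] for row in image]

# 3. Feature extraction: horizontal gradient via a simple [-1, 0, 1] edge filter.
def horizontal_gradient(img):
    return [[img[r][c + 1] - img[r][c - 1] for c in range(1, len(img[r]) - 1)]
            for r in range(len(img))]

grad = horizontal_gradient(image)

# 4. Detection: keep the points whose gradient magnitude exceeds a threshold.
edges = [(r, c) for r, row in enumerate(grad)
         for c, g in enumerate(row) if abs(g) > 5]

# 5. Decision making: "match" if the image contains any strong edges.
decision = "match" if edges else "no-match"
print(decision, len(edges))   # -> match 10  (two edge columns, five rows each)
```

Real systems replace each of these steps with far more sophisticated components (e.g. learned convolutional filters instead of a fixed gradient kernel), but the stage-by-stage structure is the same.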
The use of convolutional neural networks (CNN) led to significant improvements in visual data analysis. In essence, this special form of deep learning neural networks can learn patterns of features to detect and identify objects with higher accuracy compared to traditional models. A very short high-level introduction to CNNs is given in this video.
Computer audition (also machine hearing or listening) deals with the understanding of audio signals apart from natural language (see NLP). It includes representation, reasoning, and grouping of audio signals based on general sound semantics. Detecting, locating, monitoring and classifying objects and persons by their emitted audio signals are typical areas of application.
Similar to image data, computer audition requires a combination of different use case specific functions to achieve a particular goal. Those include:
Filtering and channeling
Classifier / decision making
Machine touch (also tactile sensing) processes haptic or tactile information that arises from the physical interaction with the environment. Tactile sensors detect external stimuli from physical contact (e.g. pressure or temperature) and are primarily used in robotics, security systems or touch-sensitive screens or control units.
In a first step, tactile sensors provide a 3D 'image' of the contact surface and can combine this information with data from force or temperature sensors. The overall process pattern of data acquisition, filtering or pre-processing, feature extraction and decision making is thereby analogous to computer vision or computer audition.
As of today, smelling and tasting are sense functions that are in an extremely early stage of research so that their commercial value is yet negligible.
Machine perception developed to the level of human capabilities would allow information to be transmitted extremely efficiently. It would solve a bandwidth problem that currently exists in the communication between computers and humans. In the past, human-machine interaction was limited to one-dimensional input via mouse, keyboard or touch screens and one-dimensional output in the form of text in natural or programming language.
With voice, pictures, and videos, this communication path is already getting multi-dimensional but the authentic simulation of human senses would allow humans to virtually access any given real or fantasy environment without actual physical presence. This strongly connects to applications of virtual and augmented reality and opens up completely new ways of learning, communicating and many other perception related tasks.
Currently evolving fields of application are:
Automatic inspection (e.g., in manufacturing)
Assisting humans in identification tasks (e.g., medical diagnosis)
Controlling processes (e.g., in robotics)
Detecting events (e.g., video surveillance)
Human-computer interaction (e.g., touch screen)
Modeling objects (e.g., topographical land modeling)
Navigation (e.g., by an autonomous vehicle)
Organizing information (e.g., for indexing databases)
Google even provides large-scale, publicly available data sets to start with machine perception related application development. Examples of such projects are:
What is Natural Language Processing (NLP)?
"Natural language processing (NLP) is a sub-dimension of artificial intelligence that deals with the written and verbal communication between computers and humans in natural languages. NLP is used to recognize, understand and interpret natural language input data and generate contextually meaningful output data."
Before we dive deeper into applied NLP, let’s clarify some important terms in the context of natural language processing:
Rule-based vs. statistical NLP: When talking about modern, AI-related NLP methods, it is all about statistical NLP. Today, models do not rely solely on pre-coded rules; instead, the systems learn the correct use of language by themselves and make soft, probabilistic decisions, which offer far more flexibility and naturalness.
Speech recognition (also speech-to-text) describes methodologies and technologies that allow recognition of natural language out of recorded audio data and translation into text.
How does that work? For a long time, speech recognition systems were mostly based on Hidden Markov Models (HMM) – efficient and trainable statistical models that segment speech into so-called phonemes (the smallest units of sound, 10 to 20 ms long) and match them against pre-trained knowledge of different words.
However, HMMs require a significant amount of local memory and are therefore impractical for small and mobile devices. Modern systems therefore apply end-to-end automatic speech recognition (ASR) systems that rely on a deep learning method called long short-term memory (LSTM) and deploy the solutions in the cloud.
Natural language understanding (text interpretation) starts either directly with text input or uses the output of speech recognition to understand the language and context. This is a real challenge, as natural language can be very messy. Humans skip words, create neologisms or build sentences that are ambiguous without a common-sense understanding of the context.
The process of understanding natural language generally requires four major components:
A lexicon (vocabulary and meaning of words)
A parser (syntax analysis and structuring)
Grammar (structural rules for the composition of words)
A semantic theory (for the derivation of a logical, contextual meaning)
There are different methods in language understanding that are applied to the input text data in order to achieve the desired goal. Some of the most common terms are listed below, just to give you a clue about the variety of analytical steps a system can perform to derive results.
Tokenization: splitting text into smaller pieces, e.g. words
Normalization: removing punctuation, converting letter case, converting numbers to text etc.
Stemming: elimination of affixes to obtain the word stem
Part-of-speech (POS) tagging: identifying verbs, nouns, etc.
Stop word removal: removing words not contributing to the meaning, e.g. "the", "a", "and"
Syntax analysis: analyzing text strings with regard to correctly applied grammar rules
Semantic analysis: deriving the meaning of word collections according to the applied semantic theory
Sentiment analysis: evaluating the emotional context (e.g. positive, negative or neutral)
String matching: analyzing the similarity of text strings, e.g. for look-ups or matching purposes
N-gram analysis: an analysis that returns sequences of n specific strings (e.g. words) from a text
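Several of these steps can be chained into a toy pipeline. The stop-word list and the suffix rules below are tiny illustrative stand-ins for the real linguistic resources an NLP library would provide.

```python
# Toy NLP pipeline: tokenization -> stemming -> stop word removal -> n-grams.
import re

STOP_WORDS = {"the", "a", "and", "is"}   # tiny illustrative list

def tokenize(text):
    """Split into word tokens and normalize case in one pass."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Crude affix stripping; real stemmers use far richer rule sets."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words([stem(t) for t in tokenize("The cats are chasing the mice!")])
print(tokens)              # -> ['cat', 'are', 'chas', 'mice']
print(ngrams(tokens, 2))   # -> [('cat', 'are'), ('are', 'chas'), ('chas', 'mice')]
```

Notice how the crude stemmer maps "chasing" to the non-word "chas" – stems need not be dictionary words, only consistent keys for grouping related word forms.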
Of course, there are already many algorithms and tools that can be used, even open source, to get started with NLP (we'll get to those at the end of this post).
Natural language generation (NLG) is the translation of data represented by a machine into natural language. NLG systems make decisions on how to choose and compose the right words in order to effectively deliver a message to a human recipient.
Sophisticated NLG systems, therefore, require a series of planning and information merging:
Content determination = WHAT information is included
Document structuring = WHERE to place content parts
Aggregation = MERGING of redundancies
Lexical choice = HOW to formulate meaningfully
Referring expressions = WHICH references make contextually sense
Realization = CREATION with respect to syntax, grammar, orthography
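A very small template-based sketch of these stages might look as follows; the weather data, the rain threshold and the phrasing rules are all invented for illustration.

```python
# Toy NLG: from structured data to a natural-language sentence.

data = {"city": "Berlin", "temp": 21, "rain_prob": 0.1, "wind": "light"}

# Content determination: decide WHAT to report (skip the unlikely rain).
facts = {k: v for k, v in data.items()
         if not (k == "rain_prob" and data[k] < 0.2)}

# Lexical choice: HOW to phrase a raw number as a meaningful word.
def temp_word(t):
    return "warm" if t >= 20 else "cool"

# Aggregation + realization: merge the remaining facts into one sentence.
sentence = (f"In {facts['city']} it will be {temp_word(facts['temp'])} "
            f"({facts['temp']} degrees) with {facts['wind']} wind.")
print(sentence)
```

Sophisticated NLG systems replace these hard-coded templates with learned models, but the planning stages listed above remain the conceptual backbone.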
There is an incredibly wide range of applications for NLP – practically everything that involves spoken or written language. And in many cases, it is not even recognized by the end user. Every suggestion of words while typing a message on your phone or a search term into the Google search bar uses NLP. E-mails are automatically filtered and moved to your spam folder based on NLP, and websites are automatically translated into the language of your device's operating system or browser.
But the most prominent examples are of course voice assistants like Siri, Alexa or Cortana, which enable us to interact with machines in a very natural way, without any knowledge of input commands or the underlying programming language.
The value of natural language processing is therefore independent of the industry and applies to consumer as well as business-to-business markets. NLP allows deriving insights from huge amounts of spoken or written data at a scale that no human being would be able to process. On the other hand, it allows us to communicate seamlessly with machines and reduces the training effort needed to interact productively with computers, whether smartphones, production machines, cleaning robots or business applications.
Areas of application
automatically summarizing text for previews
generating keyword tags from blog posts
social media analytics e.g. through sentiment analysis
categorization and indexing of emails
topic discovery from search terms or discussions
extraction of structured data from news articles
customer service chatbot
hands-free control of smart home functions
How to get started?
Some practical recommendations at the end. If you want to start a chatbot project, build a smart assistant, structure your text data flow or want to improve your brand perception online, here are three rules of thumb:
Set a very narrow focus in the beginning to reduce complexity
Start where you already have good and unique internal data, enrich this data with other sources in a second step
Use existing frameworks and libraries to build your applications
Many modules for analyzing text or recognizing speech are provided as toolkits by open source platforms. We recommend checking out the following NLP libraries:
Alternatively, all the large tech players offer user-friendly services in the NLP space: