Machine learning (ML) has become a very popular topic in recent years, but despite technological advances, many companies have a hard time putting it to work. I’ve found that this is primarily because they don’t know how to implement ML to meet strategic business goals. The buzz around ML also fosters uncertainty about what exactly ML is, how it works, and what it can do for a company. Before a company can leverage ML, it needs to understand the basic components. In this blog post, I will explain my take on ML and the elements that companies can put to best use today. This is the first in a series of ML-related blog posts covering the full spectrum of how businesses can best use ML.
what is machine learning?
We see ML applied every day whether we think about it or not. Web search engines, chatbots, product recommendations, and spam filters are all examples that impact our daily lives. Ultimately, if a person surfs the web or uses Siri, Cortana, or Alexa, they are benefitting from ML.
ML is a subset of artificial intelligence (AI) in which systems learn from data and predict outcomes without being explicitly programmed for each task. ML stands out within the AI field because it has made the largest real impact for businesses to date. Because of that impact, ML is often discussed separately from the broader field of AI, which is more generally about building systems that do intelligent things.
The following displays the overarching field of AI and where ML fits into the spectrum:
Source: Legal Executive Institute
The key to ML is that rather than a person writing the algorithms or rules that drive outcomes directly, a computer learns those outcomes from data, at a scale and level of automation that a human could not match. With that said, inherently rule-based systems can become brittle when they must address the complexity of the world in general. As a result, the best ML models tend to focus on generalizing patterns in data without adhering to overly strict rules. Those models can then be used to assess new information.
ML breaks down into two general types of learning models: classification and regression. Classification is used to determine a Boolean (true or false) outcome, like whether a person will buy or not buy an item, or more generally to sort data into categories. Regression aims to estimate a numeric level such as a price point, stock price, profit, or any other continuous measure. To build a classification or regression model, a statistical algorithm must be chosen and applied to the data.
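To make the distinction concrete, here is a minimal sketch in plain Python. The data, the variable names, and the choice of methods (a least-squares line for regression, a nearest-average rule for classification) are my own assumptions for illustration; real projects would normally rely on an ML library and far more data.

```python
# Regression: fit y = slope * x + intercept by ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Classification: remember the average value seen for each label,
# then assign a new value to the label with the closest average.
def fit_centroids(values, labels):
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, value):
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical data: ad spend (in $1000s) versus units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0]
units_sold = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(ad_spend, units_sold)
predicted_units = slope * 5.0 + intercept  # a level: estimated units at $5k spend

# Hypothetical data: minutes spent on a product page versus purchase.
minutes_on_page = [1.0, 2.0, 8.0, 9.0]
bought = [False, False, True, True]
centroids = fit_centroids(minutes_on_page, bought)
will_buy = classify(centroids, 7.0)  # a Boolean: buy or not buy
```

The regression model returns a number on a continuous scale, while the classifier returns one of a fixed set of labels.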
This leads us to the core of ML: algorithms. Algorithms, in their simplest form, are like a mathematical recipe for predicting outcomes based on the underlying data. Some algorithms, such as regressions, k-means clustering, and support vector machines, have been in use for quite some time. K-means clustering, for example, organizes the data around a set of cluster center points and assigns each data point to the nearest center, grouping similar data together. In the end, choosing the right algorithm is a central tenet of ML. Careful consideration and testing should go into selecting an algorithm. The following is an example of k-means clustering:
Source: Wikimedia Commons
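For readers who want to see the mechanics, the following is a rough sketch of k-means in plain Python. The points and the starting centers are made up for illustration, and the fixed iteration count is a simplifying assumption; production implementations add smarter initialization and convergence checks.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate two steps: assign points to the nearest center, then move each center."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: each center moves to the mean of its assigned points.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

# Hypothetical two-dimensional data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
starting_centers = [(0.0, 0.0), (10.0, 10.0)]
centers, clusters = kmeans(points, starting_centers)
```

After a few rounds the centers settle in the middle of each group, and every point belongs to the cluster whose center is closest.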
Shifting gears to advancements in algorithms, we arrive at the neural network. A neural network is an ML algorithm built on a network of interconnected nodes that works well for tasks like recognizing patterns in data. Neural networks aren’t a new algorithm, but the availability of large data sets and more powerful processing (especially GPUs, which can handle large streams of data in parallel) has only recently made them useful in practice. Despite the name, neural networks are only roughly based on biological neurons. Each node in a neural network has connections to other nodes and is activated by the inputs it receives from them. When activated, each node applies a weight to each of its inputs and combines the results into an output that signals how strongly the input matches the pattern that node has learned. The nodes are organized in fixed layers that the data flows through, unlike the brain, which creates, removes, and reorganizes synapse connections regularly. The following exemplifies how an ML neural network is organized:
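The layered flow of weighted inputs can be sketched in a few lines of plain Python. The weights below are hand-picked, hypothetical values and the sigmoid activation is my own choice for the sketch; in a real network, training would learn the weights from data.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each node takes a weighted sum of its inputs, adds a bias, and applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the data through each fixed layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer weights and biases
    ([[1.0, -1.0]], [0.0]),                    # output layer weights and biases
]
output = network_forward([1.0, 1.0], layers)
```

The single output value lands between 0 and 1 and can be read as how strongly the input matches the pattern the network encodes.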
Next, we look at deep learning, which is a subset of ML based on deep neural networks. Deep neural networks have many layers for performing learning in multiple steps. Convolutional deep neural networks often perform image recognition by processing a hierarchy of features, where each layer looks for more complicated objects than the prior layer. For instance, the first layer of a deep network that recognizes an employee’s face for security purposes might be trained to find the shape of a human face, the second layer might look at attributes like the spacing of eyes and ears, with other layers recognizing eye color, skin tone, nose structure, and other characteristics, and the final layer distinguishing the actual employee authorized to access the system. Recurrent deep neural networks are used for speech recognition and natural language processing, where the sequence and context are important. The following depicts a deep learning model:
Source: McKinsey Analytics
nature of learning
Now we turn our attention to the nature of learning in ML. There are two kinds: supervised and unsupervised.
The majority of ML applications use supervised learning, in which a function is derived from labeled training data. Developers choose and label a set of training data, set aside a proportion of that data for testing, and score the results from the ML system to help it improve. The training process can be complex, and results are often probabilities, with a system being, for example, 35% confident that it has recognized a dog in an image, 85% confident it’s found a cat, and maybe even 5% certain it’s found a turtle. The feedback developers give the system is likely a score between zero and one indicating how close the answer is to correct.
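The workflow of holding back test data and scoring the system can be sketched as follows. The animal measurements are invented, and the nearest-neighbor rule is only a stand-in I chose for "the ML system"; the point is the split and the score between zero and one.

```python
import math
import random

def train_test_split(examples, test_fraction, seed=0):
    """Shuffle the labeled examples and set aside a proportion for testing."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_nearest(train, features):
    """Label a new example with the label of the closest training example."""
    _, label = min(train, key=lambda example: math.dist(example[0], features))
    return label

def accuracy(train, test):
    """Fraction of held-out examples labeled correctly, from 0.0 to 1.0."""
    correct = sum(predict_nearest(train, features) == label for features, label in test)
    return correct / len(test)

# Hypothetical labeled data: (weight in kg, height in cm) -> animal.
labeled = [
    ((4.0, 25.0), "cat"), ((5.0, 23.0), "cat"), ((3.5, 24.0), "cat"), ((4.5, 26.0), "cat"),
    ((30.0, 60.0), "dog"), ((25.0, 55.0), "dog"), ((35.0, 65.0), "dog"), ((28.0, 58.0), "dog"),
]
train, test = train_test_split(labeled, test_fraction=0.25)
score = accuracy(train, test)
```

Because the test examples were never shown to the model during training, the score gives an honest estimate of how it will handle new data.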
Another important ML principle is not to fit the system too closely to the training data. Doing so is called overfitting, and it means the system won’t be able to generalize well to handle new inputs. If the data changes substantively over time, developers will need to retrain the system due to what some practitioners refer to as “ML rot.”
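A small, admittedly contrived sketch shows the effect. The numbers are made up: the underlying relationship is y = 2x, and the training values carry some noise. One model memorizes the training data exactly; the other fits a simple straight line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mean_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical noisy training data around the true relationship y = 2x.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [2.5, 3.5, 6.5, 7.5, 10.5, 11.5]
# New inputs the models have not seen, with their true values.
test_x = [1.5, 2.5, 3.5, 4.5, 5.5]
test_y = [3.0, 5.0, 7.0, 9.0, 11.0]

def memorizer(x):
    """Overfit model: repeat the answer of the closest training example, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

slope, intercept = fit_line(train_x, train_y)

def line(x):
    """Simpler model: a straight line that smooths over the noise."""
    return slope * x + intercept

memorizer_train_error = mean_squared_error(memorizer, train_x, train_y)
memorizer_test_error = mean_squared_error(memorizer, test_x, test_y)
line_train_error = mean_squared_error(line, train_x, train_y)
line_test_error = mean_squared_error(line, test_x, test_y)
```

The memorizer looks perfect on the training data but does worse than the straight line on the new inputs, which is the signature of overfitting.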
Unsupervised learning works without labeled data. When the labels are unknown, no one scores individual answers; instead, the ML system looks for structure in the data on its own, such as groups of similar records. Such cases are becoming more common as Internet of Things (IoT) devices collect real-time, unstructured data at increasing rates. The most common unsupervised learning technique is clustering, which derives the structure of data by looking at relationships between variables in the data. For instance, a product recommendation system like Amazon’s, which shows what people who bought an item also bought, can be built from unlabeled purchase data. To better understand an unsupervised learning outcome, the following displays how ML might align a series of data points:
Source: Stanford University
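As a loose illustration of finding structure in unlabeled data, the sketch below counts which items appear together in the same purchase. The baskets are invented, and this simple co-occurrence count is my own stand-in; it is not a description of how Amazon’s system actually works.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each pair of items shows up in the same basket."""
    counts = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_bought(counts, item, top_n=2):
    """Return the items most often bought together with the given item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]

# Hypothetical purchase history; nobody has labeled these baskets.
baskets = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["tent", "sleeping bag", "stove"],
    ["novel", "bookmark"],
]
counts = co_purchase_counts(baskets)
recommendations = also_bought(counts, "tent")
```

No one told the system that camping gear belongs together; the relationship emerges from the purchase data itself.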
Often, it takes more than one ML method to get the best result; ensemble learning systems use multiple ML procedures in combination. For instance, Google DeepMind’s AlphaGo system, which beat expert human players at the game Go, combined reinforcement learning (improving by playing against itself) with supervised deep learning on thousands of recorded Go matches between human players. A related idea, semi-supervised learning, trains a system on a mix of labeled and unlabeled data.
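One of the simplest ways to combine models is a majority vote, sketched below. The three "models" are hypothetical hand-written rules standing in for trained models, and the customer fields are invented for the example.

```python
from collections import Counter

def majority_vote(models, features):
    """Ask every model for a prediction and return the most common answer."""
    votes = [model(features) for model in models]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for three separately trained churn models.
def by_tenure(customer):
    return "leave" if customer["months_as_customer"] < 6 else "stay"

def by_support(customer):
    return "leave" if customer["support_tickets"] > 3 else "stay"

def by_usage(customer):
    return "leave" if customer["logins_last_month"] < 2 else "stay"

customer = {"months_as_customer": 3, "support_tickets": 1, "logins_last_month": 0}
prediction = majority_vote([by_tenure, by_support, by_usage], customer)
```

Using an odd number of models with two possible answers avoids ties, and a single model’s mistake can be outvoted by the others.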
Another example application, predictive analytics, often combines different ML and statistical approaches. One model might score how likely a group of customers is to leave, with another model predicting which marketing channel to use to contact each person with an offer that might keep them as a customer.
Since ML systems aren’t explicitly programmed to solve problems, it’s difficult to know how a system arrived at its results. This is known as a “black box” problem, and it can have consequences, especially for regulated businesses.
explaining ml decisions
As ML becomes more widely used, it will be important to explain why an ML-powered system behaves the way it does. Some markets (real estate, financial, and medical, for example) already have regulations requiring explanations for ML-based decisions. Additionally, there may be a requirement for algorithmic transparency so there is an audit trail for ML performance. Details of the training data and the algorithms in use might not be enough. There are many layers of non-linear processing going on inside a deep network, making it very difficult to understand why a deep network is arriving at a particular outcome. A common approach is to use another ML system to describe the behavior of the first.
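The idea of one model describing another can be sketched with a so-called surrogate: a very simple, readable rule fitted to the opaque model’s outputs. Everything here is an assumption for illustration, including the `black_box` function, the loan-style data, and the choice of a one-rule surrogate.

```python
def black_box(income, debt):
    """Hypothetical opaque model; pretend we cannot read its internals."""
    return income - debt > 35

def fit_stump(rows, predictions):
    """Find the single feature and threshold that best reproduce the predictions."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for approve_above in (True, False):
                guesses = [(row[feature] > threshold) == approve_above for row in rows]
                agreement = sum(g == p for g, p in zip(guesses, predictions)) / len(rows)
                if best is None or agreement > best[0]:
                    best = (agreement, feature, threshold, approve_above)
    return best

# Hypothetical applicants: (income, debt), both in $1000s.
rows = [(income, debt) for income in (20, 40, 60, 80) for debt in (0, 10, 20)]
predictions = [black_box(income, debt) for income, debt in rows]

agreement, feature, threshold, approve_above = fit_stump(rows, predictions)
feature_name = ("income", "debt")[feature]
```

The surrogate yields a statement a person can audit, such as "approve when income is above a given threshold," along with an agreement rate showing how faithfully that statement mirrors the opaque model.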
ML can only be as good as the data it trains on to build its model and the data it processes, so it’s important to investigate the data being used. ML also doesn’t understand the data or the concepts behind it the way a person might. For example, scientists can create images that look like random static but get recognized as specific objects by ML systems.
So what does this mean to businesses? For starters, given the explosive advancements in affordable computing power, there is now an accessible toolkit that companies can use to explore and operationalize their data as never before. Some of the toughest decisions facing companies can now be tackled with probabilities and data science supporting a given path rather than solely relying on intuition and partially baked data results. Furthermore, some areas of business can be automated via ML to reduce the emphasis on manual activity and solve problems more quickly and efficiently than people. Yet, for the foreseeable future, I think ML is best thought of as a set of tools to support workers rather than replace them.
If your company is wondering how to better leverage ML, feel free to reach out to us at firstname.lastname@example.org. We’d love to help as you think through how your company can gain a competitive advantage through machine learning.