An Introduction to Unsupervised Learning

This article defines unsupervised learning, discusses its most well-known algorithms, and provides examples of unsupervised learning in computer vision and natural language processing.
Mostafa Ibrahim
Created on January 4|Last edited on May 19
Introduction
In this article, we explore unsupervised learning. Unlike its sibling, supervised learning, unsupervised learning is a type of machine learning that learns patterns from unlabeled data. We'll define unsupervised learning, walk through its most well-known algorithms, and provide examples from computer vision and natural language processing (NLP).
Here's what we'll be covering: 
Table Of Contents
Introduction
What Is Unsupervised Learning?
How Does an Unsupervised Machine Learning Model Work?
Types of Unsupervised Learning
Popular Unsupervised Learning Algorithms
What is Unsupervised Learning Used For
Conclusion
Let's dive in! 
What Is Unsupervised Learning?
Unsupervised learning is a subfield of machine learning in which a model is trained on unlabeled (or untagged) data. The main idea is for the model to detect hidden insights and patterns in a data set without first being told what to look for.
In unsupervised learning, we feed the model data without specifying what output values we want it to produce. The model is free to organize the data according to whatever structure it finds.
To better understand unsupervised learning, consider a mother teaching her child to distinguish between two animals: a dog and a cat. The mother shows her child a number of images of both animals without labeling them, so the child has no idea which animal is which. Instead, the child observes features such as nose shape, tail length, and size, and uses them to sort the images into two groups, say category 1 (dogs) and category 2 (cats). Once those groups are formed, the child can place new images into one category or the other.
How Does an Unsupervised Machine Learning Model Work?
An unsupervised machine learning model works in three stages: collecting the data that's needed, training the model to make sense of the unlabeled data, and then evaluating the model to see how it performs for a given set of inputs.
Let's look at each step in isolation: 
Step 1: Collection of Necessary Data
In general, the data collected for an unsupervised machine learning model is unstructured and close to its raw format. Even though unsupervised data sets are usually much bigger than labeled data sets, they are usually cheaper to collect, as they require no labeling before they can be used.
Step 2: Training of the Model
Unlike supervised algorithms, unsupervised algorithms take in unlabeled data and try to make sense of it. This can be done by grouping the data points into clusters or by discovering hidden patterns and trends.
Step 3: Model Evaluation
To check that the model is returning useful results, we test its output on a variety of inputs. We can then tune the model's parameters to improve the final result.
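As a rough illustration of the evaluation step, the sketch below scores a clustering by its within-cluster sum of squared distances, where lower means tighter clusters. The article does not name a specific metric, so this choice, the function name `within_cluster_ss`, and the toy data are all assumptions made for the example.

```python
def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The cluster center is the per-coordinate mean of its members.
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


# Hypothetical toy data: two tight groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
good_labels = [0, 0, 1, 1]  # groups nearby points together
poor_labels = [0, 1, 0, 1]  # mixes the two groups

good_score = within_cluster_ss(points, good_labels)
poor_score = within_cluster_ss(points, poor_labels)
print(good_score, poor_score)
```

Comparing scores like this across parameter settings is one simple way to guide the tuning mentioned above, though it should not be read as the only way to evaluate an unsupervised model.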
Types of Unsupervised Learning
Clustering
Clustering is the task of grouping unlabeled data into multiple groups (or 'clusters') based on similarities and differences. Data points with the most similar features are clustered together. Two of the most well-known clustering algorithms are K-Means clustering and hierarchical clustering.
Association
Association is an unsupervised machine learning technique used for discovering relations between variables. It is commonly used in market basket analysis, in which the algorithm tries to find relationships between products. For example, it might find that 90 percent of customers who buy product A also buy product B. Such hidden patterns are useful for marketing and can help boost a company's sales.
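A statement such as "90 percent of customers who buy A also buy B" corresponds to what association rule mining usually calls the confidence of the rule A -> B. The sketch below computes support and confidence for a single rule. The basket data and the function name `rule_stats` are invented for illustration.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n_total = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent in t)
    n_both = sum(
        1 for t in transactions if antecedent in t and consequent in t
    )
    # Support: share of all baskets that contain both items.
    support = n_both / n_total
    # Confidence: share of baskets with the antecedent that also hold the consequent.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
    {"bread", "butter", "jam"},
]

support, confidence = rule_stats(baskets, "bread", "butter")
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Here three of the four baskets containing bread also contain butter, so the rule bread -> butter has a confidence of 0.75.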
Popular Unsupervised Learning Algorithms
K-Means Clustering
K-Means clustering is an unsupervised machine learning algorithm that groups data into K clusters. The value of the K parameter determines how many clusters are created. For example, when K=2, the data is divided into exactly two clusters.
The K-Means algorithm can be divided into three main steps. In the first step, the algorithm selects a random position for each of the K cluster centers. In the second step, it assigns every data point to its nearest center, forming K clusters. In the third step, it tries to reduce the total distance between each point and its cluster's center by computing a new center for each cluster that is more accurately located than the previous one.
The algorithm then repeats the second and third steps until a pass produces no further improvement in any cluster. Its primary goal is to identify the K center points that best match the observed data points.
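The three steps can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation. It assumes the initial centers are drawn from the data points themselves (one common way to pick "random positions"), and the function names, the fixed seed, and the toy data are choices made for this example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: assign every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the clusters have stabilized
        labels = new_labels
        # Step 3: move each center to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = k_means(points, k=2)
print(centers, labels)
```

With K=2, the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialization.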
K-means clustering loss function (Source)
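The formula image is not reproduced in this text. A standard way to write the K-means objective, consistent with the symbols defined just below, is given here. The notation is an assumption: N is the number of data points, x_i is data point i, \mu_j is the center of cluster j, and \rho_{ij} is the indicator that point i belongs to cluster j.

```latex
J = \sum_{i=1}^{N} \sum_{j=1}^{K} \rho_{ij} \, \lVert x_i - \mu_j \rVert^{2}
```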
Where µ_j is the center of cluster j and ρ_ij is a boolean indicator variable that marks whether data point i belongs to cluster j (Source).
One problem with such clustering algorithms is that it is difficult to choose a correct value for the K parameter. A poorly chosen K leads the model either to group dissimilar points together or to separate similar points into different clusters.
Hierarchical Clustering
Hierarchical clustering is another unsupervised machine learning algorithm that groups data points into similar clusters. So how does it work? First, every data point starts as its own cluster. Next, the algorithm searches for the two nearest data points (or clusters). It then merges them, forming a bigger cluster.
The algorithm keeps looping until all the data points are merged into a single cluster. After that, the algorithm draws a dendrogram (shown on the left of the image below), which shows which points form a given cluster. The distance between any two clusters can be measured to check how similar they are: the nearer the clusters, the more similar they are.
[Figure: hierarchical clustering example, with the dendrogram on the left. Source]
There are two types of hierarchical clustering. The first is bottom-up (agglomerative) clustering, which is the algorithm described above. It starts from the bottom of the tree and moves upward, merging data points together. The second is top-down (divisive) clustering. It starts from the top of the tree, with all data points grouped into a single cluster, and moves down the tree, splitting clusters into smaller ones.
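The bottom-up variant can be sketched in a few lines. The article does not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between the closest pair of points). The function names and the one-dimensional toy data are also assumptions. The returned merge history is the information a dendrogram visualizes.

```python
def single_linkage(cluster_a, cluster_b):
    """Euclidean distance between the closest pair of points in two clusters."""
    return min(
        sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
        for p in cluster_a
        for q in cluster_b
    )


def agglomerative(points):
    """Merge the two nearest clusters until one remains; return the merges."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            (
                (a, b)
                for a in range(len(clusters))
                for b in range(a + 1, len(clusters))
            ),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical one-dimensional data points.
points = [(0.0,), (1.0,), (5.0,), (6.5,)]
history = agglomerative(points)
distances = [round(d, 6) for _, _, d in history]
print(distances)
```

Each entry in `history` records which two clusters were merged and at what distance. Later merges happen at larger distances, which is why the branches near the top of a dendrogram join less similar groups.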
Autoencoders (Neural Networks)
Neural networks, like those used in supervised learning, can also be trained on unlabeled data. An autoencoder consists of two main parts: the encoder and the decoder. The encoder takes a given input and learns how to compress and encode it efficiently. The decoder then reconstructs the input from that encoded representation, and the reconstruction is the model's output.
"The main motivation for the encoding-decoding process is that if the decoder is able to reconstruct the image from the internal code with minimal error, then this indicates that only this internal representation (code) is necessary to represent the image, i.e, it represents the most significant features of the image. This can be seen as a feature extraction method." — Generalized Categorisation of Digital Pathology Whole Image Slides using Unsupervised Learning.
Layers of an autoencoder neural network (Source)
In the encoder, the convolutional layers get progressively smaller until we reach the bottleneck. The bottleneck is the most compressed layer and sits in the middle, between the encoder and the decoder. Among other uses, autoencoders are good at removing noise from an image.
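To make the encoder, bottleneck, and decoder concrete, here is a deliberately tiny sketch. It is not the convolutional architecture described above. Instead it assumes a linear autoencoder that compresses two numbers into a one-number code and back, trained with plain gradient descent on the squared reconstruction error. The function name, initial weights, learning rate, and toy data are all assumptions for illustration.

```python
def train_linear_autoencoder(data, lr=0.05, steps=500):
    """Train a 2 -> 1 -> 2 linear autoencoder with plain gradient descent."""
    dim = len(data[0])
    enc = [0.5] * dim  # encoder weights: input -> one-number code (bottleneck)
    dec = [0.5] * dim  # decoder weights: code -> reconstructed input
    history = []
    n = len(data)
    for _ in range(steps):
        grad_enc = [0.0] * dim
        grad_dec = [0.0] * dim
        loss = 0.0
        for x in data:
            code = sum(w * xi for w, xi in zip(enc, x))  # encode
            recon = [v * code for v in dec]              # decode
            err = [r - xi for r, xi in zip(recon, x)]
            loss += sum(e * e for e in err)
            # Gradient of the squared error with respect to the code.
            d_code = sum(2 * e * v for e, v in zip(err, dec))
            for d in range(dim):
                grad_dec[d] += 2 * err[d] * code
                grad_enc[d] += d_code * x[d]
        history.append(loss / n)  # mean reconstruction error before this update
        enc = [w - lr * g / n for w, g in zip(enc, grad_enc)]
        dec = [v - lr * g / n for v, g in zip(dec, grad_dec)]
    return enc, dec, history


# Hypothetical data lying on the line y = 2x, so one number can describe a point.
data = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]
enc, dec, history = train_linear_autoencoder(data)
print(history[0], history[-1])
```

Because every point lies on a single line, a one-number code is enough to rebuild it, and the reconstruction error falls toward zero during training. Real autoencoders stack nonlinear layers on both sides of the bottleneck, but the compress-then-reconstruct idea is the same.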
There are multiple autoencoder loss functions. The first is the mean squared error (MSE), which averages the squared differences between the reconstructed and original images.
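The formula image is not reproduced in this text. In its usual form, and with notation assumed here (x_i is pixel i of the original image, \hat{x}_i is the same pixel in the reconstruction, and n is the number of pixels), the MSE loss can be written as:

```latex
\mathrm{MSE}(x, \hat{x}) = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^{2}
```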
The second is the Structural Similarity Index Measure (SSIM), which is more complicated than MSE: rather than comparing pixels one by one, it compares the luminance, contrast, and structure of the two images.
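The sketch below computes a simplified SSIM over whole images treated as flat lists of pixel values. It assumes the commonly used constants k1 = 0.01 and k2 = 0.03 and an 8-bit pixel range. Practical SSIM implementations evaluate the same expression over small local windows and average the results, so treat this single-window version and its function name as illustrative only.

```python
def global_ssim(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized lists of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilize the division when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical pixel values: an image, and the same values in reverse order.
original = [10.0, 50.0, 90.0, 130.0, 170.0, 210.0]
reversed_image = list(reversed(original))

same_score = global_ssim(original, original)
different_score = global_ssim(original, reversed_image)
print(same_score, different_score)
```

An image compared with itself scores 1. The reversed image has the same mean and variance but an inverted structure, so its score drops well below 1. That structural difference is the kind of thing SSIM is designed to register.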
What is Unsupervised Learning Used For
Unsupervised learning has myriad uses. These include finding meaningful groupings and patterns in data, extracting features, and a host of exploratory purposes. Its applications range from cancer diagnosis to assessing loan eligibility and beyond.
Let's consider how it is used in NLP and computer vision: 
Unsupervised Learning in Natural Language Processing
“Natural language processing (NLP) refers to the branch of computer science, and more specifically, the branch of artificial intelligence or AI, concerned with giving computers the ability to understand text and the spoken words in much the same way human beings can.” [IBM, Natural language processing (NLP)]
In supervised NLP, a set of predefined keywords is provided as input to the model, while in unsupervised NLP, no keywords are given. Instead, the model should be able to extract such words on its own, without help from the user. So what are some applications of unsupervised NLP?
1. Text Sentiment Analysis
Sentiment analysis is the process of grouping sentences according to the sentiment they express. A sentence can be labeled as positive, neutral, or negative depending on the writer's attitude toward a certain topic. For example, take the sentence, “I love the rain”. It shows that the writer holds a positive attitude toward the topic, which is the rain. The sentiment of a given sentence can be identified using a list of keywords such as love, hate, like, dislike, etc. [TowardsDataScience, Unsupervised sentiment analysis]
Sentiment analysis is very useful in real-world applications. Social media apps, for instance, rely on it heavily to detect and remove hate speech.
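The keyword idea mentioned above can be sketched as a small lexicon-based scorer. The four keywords come from the article. The function name, the scoring rule (positive hits minus negative hits), and the example sentences are assumptions, and the sketch ignores negation such as "do not like".

```python
import re

POSITIVE = {"love", "like"}
NEGATIVE = {"hate", "dislike"}


def keyword_sentiment(sentence):
    """Label a sentence by counting positive and negative keywords."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in ["I love the rain", "I hate traffic", "It is raining"]:
    print(sentence, "->", keyword_sentiment(sentence))
```

No labeled training sentences are needed for this approach, only the keyword lists. It is a simple baseline, though, and more capable systems learn such associations from the text itself.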
2. Speech Recognition
Speech recognition is the ability of a machine learning model to extract meaning from human speech. Such models take an audio recording of human speech as input and decode it, extracting the relevant information. While supervised speech recognition models offer great precision, unsupervised speech recognition aims to make precise predictions on never-before-seen data sets. Apple's Siri and Amazon's Alexa are two of the most popular speech recognition applications.
3. Artificial Intelligence Chatbots
AI chatbots are now heavily used across business and government sectors. Such chatbots can provide users with human-like interactions, answer questions, offer assistance, and more. Companies that build chatbots into their services include Lyft, Spotify, and Starbucks.
There are three generations of chatbots. The first generation was based on written rules: the programmer provided a specific list of answers for a specific list of questions. In the second generation, chatbots were infused with artificial intelligence, starting with supervised learning. Such chatbots were trained on a massive amount of labeled user chats, which gave them a far more dynamic answering mechanism. As you may have guessed, the third and final generation uses unsupervised learning to train its models. These models are trained on even bigger, unlabeled data sets. Third-generation chatbots offer all the advantages of the first generation while also having room to handle trickier and more complex situations. [Rulia, The 3 Different Generations Of Chatbot Technology]
Unsupervised Learning in Computer Vision
Computer vision is a subfield of machine learning in which computers extract useful information from visual data such as images and video recordings. The goal of the field is to allow computers to view the world in a manner similar to human vision.
In contrast to supervised image analysis, an unsupervised computer vision model is trained on unlabeled images. It is up to the model to detect anomalies in an image on its own. A supervised learning model will typically produce superior results, except in situations where the sought-after anomaly is difficult to identify. What are some uses for unsupervised computer vision, then?
1. Cancer Diagnosis
Using unsupervised learning in computer vision, computers can recognize anomalies in a particular medical scan. The model is initially provided with a massive quantity of unlabeled images that include both healthy and cancer-positive scans. It is then able to examine and contrast the images, detecting variations between them. As a result, the model can later determine whether a particular scan contains a tumor. Even where there are several different tumor types, the model will be able to distinguish each type on its own. Note that because we never labeled the tumor types, the model identifies each type by an assigned number rather than by name.
2. X-ray Diagnosis
Similar to the cancer diagnosis model, the X-ray diagnosis model is fed a multitude of X-ray scans. The model can then detect irregularities in an image. These models are good at spotting minute anomalies that doctors might overlook. We can use AI to find anomalies in the heart, lungs, pleura, mediastinum, bones, and diaphragm.
It is important to keep in mind that this kind of model is still in its early stages of development. X-ray imaging will therefore still require a doctor to check the results.
Conclusion
Which is superior: supervised or unsupervised learning? The answer is that it depends.
Some machine learning practitioners prefer supervised learning algorithms because, in most cases, they are easier to use and return more accurate results. Unsupervised learning has its own advantages, however, such as being more resistant to overfitting and being better suited to complex and unstructured data.
In some cases, the user may be unsure where to start looking for hidden insights in a given data set, which makes unsupervised approaches extremely useful. Furthermore, while supervised learning data sets are much smaller than unsupervised data sets, they are far more difficult to collect and maintain, because each data point must be manually checked and labeled. A process like this could take months or even years to complete. Unsupervised data, on the other hand, has no definite structure and does not require labeling.
Thus, whether supervised or unsupervised learning is used is highly dependent on the problem at hand.