Machine Learning Glossary
A glossary of machine learning terms, frameworks, tasks and technologies.
Created on October 20 | Last edited on November 22
This glossary is a work in progress. It is limited at present, but is being actively worked on.
A
B
Batch Normalization
Batch normalization in machine learning is a technique that standardizes (or normalizes) the inputs to layers deep in a neural network, to reduce internal covariate shift.
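As a rough illustration of the standardization step, the sketch below normalizes one feature across a batch to zero mean and unit variance. The function name and the `eps` value are assumptions made for this example, and the learnable scale and shift that batch normalization layers usually apply afterwards are left out.

```python
import math

def batch_normalize(values, eps=1e-5):
    """Standardize one feature across a batch to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps (an assumed small constant) avoids division by zero for constant batches.
    return [(v - mean) / math.sqrt(variance + eps) for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
normalized = batch_normalize(batch)
```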
BERT
Bidirectional Encoder Representations from Transformers, better known as BERT, is a language model introduced in a paper by Google that raised state-of-the-art performance on various NLP tasks and was the stepping stone for many later architectures.
C
Catastrophic Forgetting
The tendency of neural networks to abruptly forget how to do one task when they are trained on another.
Classification
Classification is a supervised learning task concerned with predicting or categorizing data. It involves the systematic grouping of data into categories. Classification algorithms are especially useful in cases where there is a large volume of historical data to be categorized.
Cross Entropy Loss
Cross entropy loss is a metric used to measure how well a classification model in machine learning performs. The loss (or error) is a non-negative number, with 0 corresponding to a perfect model and larger values to worse predictions; it is not capped at 1. The goal is generally to get your model's loss as close to 0 as possible.
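A minimal sketch of the calculation for a single example, assuming the model outputs a probability for each class: the loss is the negative log of the probability given to the correct class. The function name and the probabilities are made up for illustration.

```python
import math

def cross_entropy(predicted_probs, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(predicted_probs[true_class])

probs = [0.05, 0.90, 0.05]  # hypothetical model output over three classes

# The model puts 0.90 on class 1, so the loss is small (roughly 0.105).
confident_right = cross_entropy(probs, true_class=1)

# If the true class is 0, the model only gave it 0.05, so the loss is large (roughly 3.0).
confident_wrong = cross_entropy(probs, true_class=0)
```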
Cross-Validation
D
DataLoader
The DataLoader in PyTorch is a class that fetches data from a Dataset and serves the data in batches to the model. Generally, one DataLoader is created for training and another for testing.
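The batching idea can be sketched in plain Python. This is not PyTorch's DataLoader class, only a simplified stand-in with a hypothetical name (`iterate_batches`) that shows what "serving data in batches" means; it leaves out shuffling and parallel loading.

```python
def iterate_batches(dataset, batch_size):
    """Yield consecutive batches (lists of samples) from an indexable dataset."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

samples = list(range(10))  # stand-in for a dataset of ten samples
batches = list(iterate_batches(samples, batch_size=4))
# The last batch is smaller because 10 is not a multiple of 4.
```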
Dropout
Dropout is a machine learning technique where you randomly remove (or "drop out") units in a neural net during training to simulate training large numbers of architectures simultaneously. Importantly, dropout can substantially reduce the chance of overfitting.
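One common way to apply this is sketched below: each unit is zeroed with some probability and the surviving units are rescaled so the expected activation stays the same. The rescaling variant ("inverted dropout"), the function name, and the drop probability are assumptions for this example.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each unit with probability drop_prob and rescale the survivors."""
    keep_prob = 1.0 - drop_prob  # drop_prob must be below 1.0
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)  # seeded so the example is repeatable
activations = [1.0] * 8
dropped = dropout(activations, drop_prob=0.5, rng=rng)
# Each entry is now either 0.0 (dropped) or 2.0 (kept and rescaled).
```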
Decision trees
Decision trees are supervised learning models used for classification and regression problems. A decision tree learns rules that “branch” off into different predictions based on the features of a data point, in order to predict some value of a new data point.
E
F
FinBERT
FinBERT is a BERT-based language model adapted to financial text, used for financial NLP tasks such as sentiment analysis.
G
H
I
Image Classification
In image classification, a system assigns a label to each image as a whole. In the binary case, the output is a Boolean true or false, answering whether a particular image belongs to a certain class or not.
Image Segmentation

Figure 1.2: Image segmentation classifying each pixel in an image
In image segmentation, the system assigns a label to every pixel in an image. A class is assigned to each pixel, defining what the system believes it to be and which object it belongs to.
J
K
K-Fold Cross Validation
K-fold cross-validation is a procedure in which a dataset is divided into k subsets (folds). The model is trained k times, each time holding out a different fold for validation and training on the rest, which helps safeguard the model against the random bias caused by selecting only one training and validation set.
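The splitting itself can be sketched as follows. The helper name `k_fold_indices` is hypothetical, and the sketch uses contiguous folds without shuffling, which is a simplification.

```python
def k_fold_indices(num_samples, k):
    """Return (train_indices, validation_indices) pairs, one per fold."""
    indices = list(range(num_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, validation))
        start += size
    return splits

splits = k_fold_indices(num_samples=6, k=3)
# Every sample appears in exactly one validation fold.
```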
K-Means Clustering
k-means clustering is an unsupervised learning algorithm used for clustering problems. The goal is to partition data points into a pre-specified number k of clusters, with each data point belonging to the cluster with the nearest center.
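A small one-dimensional sketch of the usual iterative procedure: assign each point to the nearest center, then move each center to the mean of its points. The data, the starting centers, and the fixed iteration count are assumptions for illustration.

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-centering."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
# The centers settle at 2.0 and 11.0, the means of the two groups.
```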
K-Nearest Neighbors
k-nearest neighbors, or knn, is a supervised learning algorithm used primarily for classification problems. The goal is to predict the probability that a data point belongs to a certain class, based on which class(es) the data points near it belong to.
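As a hedged sketch, the example below classifies a one-dimensional query point by majority vote among its k nearest labeled points. The data, labels, and function name are invented for illustration.

```python
from collections import Counter

def knn_predict(training_data, query, k):
    """Classify query by majority vote among its k nearest training points."""
    by_distance = sorted(training_data, key=lambda item: abs(item[0] - query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (value, label) pairs.
training_data = [(1.0, "small"), (1.5, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
prediction = knn_predict(training_data, query=2.5, k=3)
```

The share of votes for each class among the k neighbors can also be read as a rough class probability.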
L
Linear Regression
Linear regression is a supervised learning algorithm used for regression problems. The goal is to identify the hyperplane that best fits the relationship between one or more features and a target value within a specified dataset, in order to predict new values.
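With a single feature the hyperplane is just a line, and the least-squares fit has a closed form. The sketch below assumes ordinary least squares as the fitting criterion; the data are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
predicted = slope * 5.0 + intercept  # prediction for a new value x = 5
```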
Logistic regression
Logistic regression is a supervised learning algorithm used primarily for classification. The goal is to identify the logistic curve that best predicts the probability that an input belongs to some class, which is then used to map the input to an actual class.
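The two stages, probability and then class, can be sketched for one feature as below. The weight, bias, and 0.5 threshold are hypothetical values chosen for the example rather than fitted parameters.

```python
import math

def predict_probability(x, weight, bias):
    """Logistic (sigmoid) curve applied to a linear function of the input."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))

def predict_class(x, weight, bias, threshold=0.5):
    """Map the probability to a class label using a threshold."""
    return 1 if predict_probability(x, weight, bias) >= threshold else 0

weight, bias = 1.5, -3.0  # assumed parameters, not learned here
low = predict_class(0.0, weight, bias)
high = predict_class(4.0, weight, bias)
```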
M
Meta Learning
Meta-learning in neural networks refers to the approach of using a reward and/or error system to teach said system to solve problems outside its trained domain. Rather than looking directly at the data, however, the system looks at the output of the algorithm and trains on making predictions based on that.
N
Naive Bayes
Naive Bayes classifiers are supervised learning models used for classification problems. A naive Bayes model uses Bayes’ theorem to calculate the probability that a data point belongs to each possible class, in order to identify the most probable class.
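A minimal sketch of the calculation, assuming the class priors and per-feature likelihoods are already known. All probabilities below are invented for illustration; the "naive" part is multiplying the feature likelihoods as if they were independent.

```python
def most_probable_class(priors, likelihoods, features):
    """Apply Bayes' theorem with the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in features:
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    posteriors = {label: score / total for label, score in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Hypothetical probabilities for a toy spam filter.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}
label, posteriors = most_probable_class(priors, likelihoods, ["free"])
```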
Neural Network Pruning
One popular approach for reducing the resource requirements at test time is Neural Network Pruning. This means systematically removing parameters (neurons, connections, etc.) from an existing network to try to reduce its size.
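One simple criterion, assumed here for illustration, is magnitude pruning: the weights with the smallest absolute values are set to zero. The function name and the weights are hypothetical.

```python
def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    num_to_prune = int(len(weights) * fraction)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:num_to_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned

weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.1]
pruned = magnitude_prune(weights, fraction=0.5)
```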
O
Object Detection
Object Detection is a computer vision technique in which software learns to identify and locate objects in a video or digital image. Once an object has been identified and localized, an Object Detection algorithm can also label it.
Optical Character Recognition (OCR)
Optical Character Recognition (OCR) is a computer vision and machine learning technique that extracts the text from images, generally to make it usable by other systems including image search and software-based receipt processing.
Optimizer
In deep learning, an optimizer is a function or algorithm that adjusts a neural network's weights and biases. The optimizer modifies these parameters with the goal of reducing the loss in as few steps as possible.
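Plain gradient descent is perhaps the simplest example of an optimizer. The sketch below assumes a toy one-parameter loss with a known gradient; the loss function, learning rate, and step count are all made up for illustration.

```python
def sgd_step(params, grads, learning_rate):
    """Move each parameter a small step against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

def loss(params):
    """Hypothetical loss with its minimum at params[0] == 3."""
    return (params[0] - 3.0) ** 2

def gradient(params):
    """Derivative of the hypothetical loss above."""
    return [2.0 * (params[0] - 3.0)]

params = [0.0]
for _ in range(50):
    params = sgd_step(params, gradient(params), learning_rate=0.1)
# After repeated steps the parameter approaches 3 and the loss approaches 0.
```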
P
Permutation-Invariance
Permutation-invariance in machine learning refers to a system in which reordering the inputs does not impact the output.
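A quick way to see the property is to compare a function that ignores input order with one that does not. Both function names are hypothetical.

```python
def pooled_sum(inputs):
    """Permutation-invariant: the sum does not depend on input order."""
    return sum(inputs)

def first_minus_last(inputs):
    """Not permutation-invariant: the result depends on input positions."""
    return inputs[0] - inputs[-1]

inputs = [3, 1, 2]
reordered = [1, 2, 3]  # the same values in a different order

sum_unchanged = pooled_sum(inputs) == pooled_sum(reordered)
difference_unchanged = first_minus_last(inputs) == first_minus_last(reordered)
```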
Policy
In reinforcement learning, a policy is a rule that maps the current state of the environment to an action (or to a probability for each possible action). A good policy takes into account the possible set of actions, the probability that an action will result in a state change, and the reward function. The policy is used to steer an agent to the highest reward.
Principal component analysis (PCA)
Principal component analysis (PCA) is an unsupervised dimensionality reduction algorithm. The goal is to compute a dataset’s principal components (PCs), new features derived from the original features. Typically, only the first two to three PCs are kept, allowing the dataset to be remapped into two or three dimensions.
Q
R
S
Self-Organization
Self-organization in neural networks describes the ability of a self-supervised system to take local interactions between disorganized parts of itself and create from them a coherent policy.
Sensory Neuron
In a neural network, a sensory neuron (or sensory input neuron) is a node which takes input from "the outside world" and after processing it through the activation function, passes the resulting value along.
Support Vector Machines (SVMs)
Support vector machines (SVMs) are supervised learning models used for classification. An SVM finds the hyperplane that best separates different classes within a dataset, in order to classify a new data point by identifying which side of the hyperplane (i.e., which class) it falls on.
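Once a hyperplane is known, classification reduces to checking the sign of a linear score. The sketch below covers only that prediction step for a linear SVM, with hypothetical weights and bias; finding the best-separating hyperplane (training) is not shown.

```python
def svm_classify(point, weights, bias):
    """Return +1 or -1 depending on which side of the hyperplane the point lies."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1

weights, bias = [1.0, 1.0], -5.0  # assumed hyperplane: x1 + x2 = 5
above = svm_classify([4.0, 4.0], weights, bias)
below = svm_classify([1.0, 1.0], weights, bias)
```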
T
Tokens
A token in Natural Language Processing is a representation of a word, word segment (subword) or character. When text is being processed, a tokenizer breaks that text into tokens so they can be processed by the system, which has historically been more efficient than processing the same text character-by-character.
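The difference between word-level and character-level tokens can be sketched with two toy tokenizers. These are simplified stand-ins written for this example; tokenizers used in practice, especially subword tokenizers, are more involved.

```python
import re

def word_tokenize(text):
    """Split text into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokenize(text):
    """Treat every character, including spaces, as its own token."""
    return list(text)

text = "Tokens help models read."
word_tokens = word_tokenize(text)
char_tokens = char_tokenize(text)
# Word-level tokenization yields far fewer tokens than character-level.
```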
U
V
W
Weight Initialization
Weight Initialization was first discussed as a "trick" to prevent certain undesirable behaviours during neural network training. The initial values of the weights can have a significant impact on the training process.
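As one example of an initialization scheme, the sketch below draws weights uniformly from a range that shrinks as the layer gets wider (often called Xavier or Glorot uniform initialization). The choice of this scheme and the layer sizes are assumptions for illustration; other schemes exist.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(0)  # seeded so the example is repeatable
layer_weights = xavier_uniform(fan_in=4, fan_out=2, rng=rng)
# With fan_in + fan_out = 6, the limit is sqrt(6 / 6) = 1.0.
```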
X
Y
YOLO
YOLO stands for You Only Look Once and is an extremely fast object detection framework using a single convolutional network. YOLO is frequently faster than other object detection systems because it looks at the entire image at once, as opposed to scanning it region by region.
Z
Zero-Shot Learning
Zero-shot learning is a machine learning term that describes the ability of a model to be applied to a task when it has received no training on that task, but has been trained on tasks of other types.