Machine Learning Glossary

A glossary of machine learning terms, frameworks, tasks and technologies.
Created on October 20|Last edited on November 22
This glossary is a work in progress. It is limited at present, but is being actively worked on.

A



B

Batch Normalization

Batch normalization in machine learning is a technique that standardizes (or normalizes) the inputs to layers deep in a neural network, in order to reduce internal covariate shift.
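As a rough illustration of the standardization step, the sketch below normalizes each feature across a small batch using only the Python standard library. The helper name `batch_normalize` and the `eps` value are assumptions made for this example, and the learnable scale and shift parameters used in practice are left out.

```python
import math

def batch_normalize(batch, eps=1e-5):
    """Standardize each feature (column) across the batch (rows)."""
    n = len(batch)
    num_features = len(batch[0])
    normalized = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # biased batch variance
        for i in range(n):
            # eps (assumed value) guards against division by zero
            normalized[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return normalized

# Two features on very different scales end up on a comparable scale.
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_normalize(batch)
```

After normalization, each column has a mean of roughly zero and a variance of roughly one, regardless of its original scale.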

BERT

Bidirectional Encoder Representations from Transformers, better known as BERT, is a language model introduced in a paper by Google that raised state-of-the-art performance on various NLP tasks and served as a stepping stone for many later architectures.


C

Catastrophic Forgetting

The tendency of neural networks to completely forget how to do one task when they are trained on another.

Classification

Classification is a supervised learning task concerned with assigning data points to categories. A model learns from labeled examples and then predicts the category of new data. Classification algorithms are especially useful in cases where there is a large volume of historical data to be categorized.

Cross Entropy Loss

Cross entropy loss is a metric used to measure how well a classification model in machine learning performs. The loss (or error) is a non-negative number with no upper bound, with 0 being a perfect model. The goal is generally to get the loss as close to 0 as possible.
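A minimal sketch of the loss for a single example is shown below, assuming the model outputs a list of class probabilities. The function name `cross_entropy` and the example probabilities are made up for illustration.

```python
import math

def cross_entropy(true_class, predicted_probs):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(predicted_probs[true_class])

# The true class is index 0 in both cases (illustrative probabilities).
confident_correct = cross_entropy(0, [0.9, 0.05, 0.05])
confident_wrong = cross_entropy(0, [0.01, 0.98, 0.01])
```

The loss grows as the probability assigned to the true class shrinks, so a confident wrong prediction is penalized far more heavily than a confident correct one.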

Cross-Validation

D

DataLoader

The DataLoader in PyTorch is a class that fetches data from a Dataset and serves it to the model in batches. Generally, one DataLoader is created for training and another for testing.
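To illustrate the batching idea without depending on PyTorch, here is a plain-Python stand-in. It is not PyTorch's implementation: `iterate_batches` is a hypothetical helper, and the toy dataset of `(value, label)` pairs is invented for this sketch.

```python
import random

def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of samples of at most batch_size items (hypothetical helper)."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded so runs are repeatable
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

# Toy dataset of (value, label) pairs.
dataset = [(x, x % 2) for x in range(10)]

# One iterator for training (shuffled) and one for testing (in order).
train_batches = list(iterate_batches(dataset, batch_size=4, shuffle=True))
test_batches = list(iterate_batches(dataset, batch_size=4))
```

With ten samples and a batch size of four, each pass produces two full batches and one final batch of two samples.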

Dropout

Dropout is a machine learning technique where you randomly remove (or "drop out") units in a neural net during training to simulate training large numbers of architectures simultaneously. Importantly, dropout can drastically reduce the chance of overfitting during training.
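The sketch below shows one common variant, often called inverted dropout, in which the surviving units are rescaled so the expected activation stays the same. The rescaling, the function name `dropout`, and the drop probability of 0.5 are assumptions for this example rather than part of the definition above.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each unit with probability drop_prob and rescale the survivors."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)  # seeded generator so the example is repeatable
activations = [1.0] * 10
dropped = dropout(activations, drop_prob=0.5, rng=rng)
```

Each call drops a different random subset of units, which is what makes training resemble sampling many thinned versions of the network.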

Decision trees

Decision trees are supervised learning models used for classification and regression problems. A decision tree learns rules that “branch” off into different predictions based on the features of a data point, in order to predict some value of a new data point.


E



F

FinBERT

FinBERT is a BERT-based language model adapted to financial text, used mainly for tasks such as financial sentiment analysis.


G



H



I

Image Classification

In image classification, a system assigns a label to an entire image. In the simplest (binary) case, it produces a Boolean true or false, answering whether a particular image belongs to a certain class or not.

Image Segmentation

Figure 1.2: Image segmentation classifying each pixel in an image
Image segmentation sees the system assign a label to every pixel in an image. A class is assigned to each pixel, defining what the system believes it to be and which object it belongs to.

J



K

K-Fold Cross Validation

K-fold cross-validation is a procedure where a dataset is divided into k subsets (folds). The model is trained and evaluated k times, each time using a different fold as the validation set and the remaining folds as the training set. This helps safeguard the model against random bias caused by the selection of only one training and validation set.
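A minimal sketch of how the index splits could be generated is given below. The helper `k_fold_splits` is hypothetical and does no shuffling, which a real pipeline would usually add.

```python
def k_fold_splits(num_samples, k):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(num_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

splits = list(k_fold_splits(num_samples=10, k=3))
```

Every sample appears in exactly one validation set, so each data point is used for both training and validation across the k runs.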

K-Means Clustering

k-means clustering is an unsupervised learning algorithm used for clustering problems. The goal is to partition data points into a pre-specified number k of clusters, with each data point belonging to the cluster with the nearest center.
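The following sketch runs the usual assign-then-update loop on one-dimensional data. The function name `k_means_1d`, the fixed iteration count, and the starting centers are assumptions chosen to keep the example short.

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
```

With these inputs the centers settle at 2.0 and 11.0 after a couple of iterations, splitting the points into the two obvious groups.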

K-Nearest Neighbors

k-nearest neighbors, or knn, is a supervised learning algorithm used primarily for classification problems. The goal is to predict the probability that a data point belongs to a certain class, based on which class(es) the data points near it belong to.
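As a small sketch of the idea, the code below finds the k closest training points to a query and reports the majority class along with the share of neighbors that voted for it. The function name `knn_predict`, the Euclidean distance, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority class among the k nearest points and its vote share."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    label, count = Counter(nearest_labels).most_common(1)[0]
    return label, count / k

train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_points, train_labels, query=(0.5, 0.5), k=3)
mixed = knn_predict(train_points, train_labels, query=(0.5, 0.5), k=5)
```

With k=3 all neighbors belong to class "a", while with k=5 two points of class "b" join the vote and the estimated probability drops accordingly.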

L

Linear Regression

Linear regression is a supervised learning algorithm used for regression problems. The goal is to identify the hyperplane that best fits the relationship between one or more input features and a target value within a specified dataset, in order to predict new values.
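With a single input feature the hyperplane is simply a line. The sketch below fits that line by ordinary least squares; the function name `fit_line` and the toy data are assumptions for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict a new value at x = 5
```

Because the toy data lies exactly on a line, the fit recovers a slope of 2 and an intercept of 1.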

Logistic regression

Logistic regression is a supervised learning algorithm used primarily for classification. The goal is to identify the logistic curve that best predicts the probability that an input belongs to some class, which is then used to map the input to an actual class.
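The mapping from an input to a probability and then to a class can be sketched as follows. The `weight`, `bias`, and `threshold` values are hypothetical and chosen by hand here; in practice the weight and bias would be learned from data.

```python
import math

def sigmoid(z):
    """Logistic curve: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weight, bias, threshold=0.5):
    """Return (probability of the positive class, predicted class as 0 or 1)."""
    probability = sigmoid(weight * x + bias)
    return probability, int(probability >= threshold)

# Hand-picked parameters for illustration only.
prob_high, class_high = predict(3.0, weight=2.0, bias=-4.0)
prob_low, class_low = predict(1.0, weight=2.0, bias=-4.0)
```

Inputs on one side of the decision boundary (here x = 2) get a probability above the threshold and are mapped to class 1; inputs on the other side are mapped to class 0.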

M

Meta Learning

Meta-learning in neural networks refers to the approach of using a reward and/or error signal to teach a system to solve problems outside its trained domain. Rather than looking directly at the data, however, the system looks at the output of a learning algorithm and trains on making predictions based on that.


N

Naive Bayes

Naive Bayes classifiers are supervised learning models used for classification problems. A naive Bayes model uses Bayes’ theorem to calculate the probability that a data point belongs to each possible class, in order to identify the most probable class.

Neural Network Pruning

One popular approach for reducing resource requirements at test time is neural network pruning. This means systematically removing parameters (neurons, connections, etc.) from an existing network to reduce its size.
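One simple criterion, used here purely as an illustration, is to remove the weights with the smallest absolute values. The helper `magnitude_prune`, the flat list of weights, and the 50% pruning fraction are assumptions for this sketch; other pruning criteria exist.

```python
def magnitude_prune(weights, fraction):
    """Set the given fraction of smallest-magnitude weights to zero."""
    num_to_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]

weights = [0.5, -0.01, 0.3, 0.02, -0.8, 0.001]
pruned = magnitude_prune(weights, fraction=0.5)
```

The three weights closest to zero are removed, while the larger weights that contribute most to the output are kept unchanged.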


O

Object Detection

Object Detection is a computer vision technique in which software learns to identify and locate objects in a video or digital image. Once an object has been identified and localized, an Object Detection algorithm can also label it.

Optical Character Recognition (OCR)

Optical Character Recognition (OCR) is a computer vision and machine learning technique that extracts the text from images, generally to make it usable by other systems including image search and software-based receipt processing.

Optimizer

In deep learning, an optimizer is a function or algorithm that adjusts a neural network's parameters, such as its weights and biases. The optimizer modifies these parameters with the goal of reducing the loss as efficiently as possible.
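Plain gradient descent is one of the simplest optimizers. The sketch below applies it to a single parameter and a made-up quadratic loss; the loss function, learning rate, and step count are assumptions for illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=50):
    """Repeatedly step the parameter against the gradient of the loss."""
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w

initial_w = 0.0
final_w = gradient_descent(initial_w)
```

Each step moves the parameter a little closer to the value that minimizes the loss, so the final loss is far smaller than the initial one.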


P

Permutation-Invariance

Permutation-invariance in machine learning refers to a system in which reordering the inputs does not impact the output.
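A quick way to see the property is to compare a function that ignores input order with one that does not. Both functions below are made up for this sketch.

```python
def sum_pool(inputs):
    """Permutation-invariant: the sum does not depend on input order."""
    return sum(inputs)

def weighted_by_position(inputs):
    """Not permutation-invariant: each input is weighted by its position."""
    return sum(i * x for i, x in enumerate(inputs))

original = [1, 2, 3]
reordered = [3, 1, 2]

invariant = sum_pool(original) == sum_pool(reordered)
order_sensitive = weighted_by_position(original) != weighted_by_position(reordered)
```

Reordering the inputs leaves the sum unchanged, while the position-weighted function returns a different value for the same set of numbers.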

Policy

In reinforcement learning, a policy is a rule that maps the current state of the environment to an action, or to a probability distribution over the possible actions. It is learned with respect to the state transitions those actions cause and the reward function, and it is used to steer a model to the highest reward.

Principal component analysis (PCA)

Principal component analysis (PCA) is an unsupervised dimensionality reduction algorithm. The goal is to compute a dataset’s principal components (PCs), new features derived as linear combinations of the original features and ordered by how much variance they capture. For visualization, typically only the first two to three PCs are kept, allowing the dataset to be remapped into two or three dimensions.

Q



R



S

Self-Organization

Self-organization in neural networks describes the ability of a self-supervised system to take local interactions between disorganized parts of itself and create from them a coherent policy.

Sensory Neuron

In a neural network, a sensory neuron (or sensory input neuron) is a node which takes input from "the outside world" and after processing it through the activation function, passes the resulting value along.

Support Vector Machines (SVMs)

Support vector machines (SVMs) are supervised learning models used for classification. An SVM finds the hyperplane that best separates different classes within a dataset, typically the one with the largest margin to the nearest points of each class. A new data point is then classified by identifying which side of the hyperplane (that is, which class) it falls on.

T

Tokens

A token in Natural Language Processing is a representation of a word, word segment (subword) or character. When text is being processed, a tokenizer breaks that text into tokens, so that those tokens can be processed by the system, historically with higher efficiency than processing the same text character-by-character.
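The difference in sequence length can be seen with a deliberately naive sketch that splits on whitespace. Real tokenizers usually rely on learned subword vocabularies, which this example does not attempt to reproduce.

```python
text = "machine learning"

# Word-level tokens: split on whitespace (a naive stand-in for a real tokenizer).
word_tokens = text.split()

# Character-level tokens: every character, including the space, is its own token.
char_tokens = list(text)

lengths = (len(word_tokens), len(char_tokens))
```

The same text becomes 2 word-level tokens but 16 character-level tokens, so the system has far fewer items to process in the first case.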

U



V



W

Weight Initialization

Weight Initialization was first discussed as a "trick" to prevent certain undesirable behaviours during neural network training. The initial values of the weights can have a significant impact on the training process.


X



Y

YOLO

YOLO stands for You Only Look Once and is an extremely fast object detection framework using a single convolutional network. YOLO is frequently faster than other object detection systems because it processes the entire image in a single pass, as opposed to evaluating many candidate regions one at a time.


Z

Zero-Shot Learning

Zero-shot learning is a machine learning term that describes the ability of a model to be applied to a task when it has received no training on that task, but has been trained on tasks of other types.