Text to Motion with GPT!
T2M-GPT generates human motion from text descriptions.
T2M-GPT presents a unique approach to generating human motion from text descriptions using discrete motion representations! The approach is composed of two modules: a Motion VQ-VAE and a T2M-GPT. If you don't know, an autoencoder is simply a neural network with two parts: an encoder and a decoder. Its job is to compress its input into a vector of a fixed dimension and then reconstruct the input from that vector. A VAE (Variational Auto-Encoder) is a flavor of autoencoder where the encoder outputs a distribution rather than a single point, which lets you sample latent vectors and generate new data points through the decoder. And GPT stands for Generative Pre-trained Transformer: a type of large language model (recently in the news thanks to ChatGPT) specializing in, roughly speaking, autoregressive text generation.
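To make the autoencoder idea concrete, here's a minimal sketch in PyTorch. The layer sizes and input dimension are illustrative assumptions, not the architecture from the T2M-GPT paper:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=263, latent_dim=32):  # dims are hypothetical
        super().__init__()
        # Encoder: compress the input into a fixed-size latent vector.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the input from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = AutoEncoder()
x = torch.randn(4, 263)                    # a batch of 4 pose feature vectors
recon = model(x)                           # reconstruction, same shape as x
loss = nn.functional.mse_loss(recon, x)    # train by minimizing reconstruction error
```

Training the network to reproduce its own input forces the small latent vector to capture the essential structure of the data, which is exactly the property the Motion VQ-VAE exploits below.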
The Motion VQ-VAE, or Vector Quantized Variational Auto-Encoder, is tasked with taking in a sequence of human motion features and encoding it into a lower-dimensional latent space. Additionally, the VQ-VAE has something called a codebook, which can be thought of as a lookup table of learned motion snippets (hence the "discrete representations" part of the title): each encoded vector is snapped to its nearest codebook entry, so a motion becomes a sequence of codebook indices. From there, the T2M-GPT is in charge of processing a text description and autoregressively generating a sequence of indices into this codebook. In short, given a set of encoded representations for motion, T2M-GPT selects these representations in a specific order to best capture the motion described in the text input!
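Here's a rough sketch of that quantization step in PyTorch. The codebook size and latent dimension are made-up numbers for illustration, not the paper's exact configuration:

```python
import torch

codebook = torch.randn(512, 32)   # hypothetical: 512 codes, each 32-dimensional
z = torch.randn(16, 32)           # 16 encoded motion frames from the encoder

# Pairwise distances between latents and codes: shape (16, 512)
dists = torch.cdist(z, codebook)
indices = dists.argmin(dim=1)     # nearest codebook index per frame: the "motion tokens"
z_q = codebook[indices]           # quantized latents, fed to the VQ-VAE decoder

# The GPT operates on `indices`: conditioned on a text prompt, it predicts
# this token sequence one index at a time, and the VQ-VAE decoder then maps
# the selected codes back into continuous motion.
print(indices.shape, z_q.shape)   # torch.Size([16]) torch.Size([16, 32])
```

In practice the quantization step isn't differentiable, so VQ-VAE training relies on additional machinery (such as a straight-through gradient estimator and codebook-update tricks) that this sketch omits.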
Tags: ML News