Text to Motion with GPT!
T2M-GPT generates human motion from text descriptions.
T2M-GPT presents a unique approach to generating human motion from text descriptions using discrete motion representations! The approach is composed of two modules: a Motion VQ-VAE and a T2M-GPT. If you don't know, an autoencoder is simply a neural network with two parts: an encoder and a decoder. Its job is to compress its input into a vector of a fixed dimension and then reconstruct the input from that vector. A VAE (Variational Auto-Encoder) is a flavor of autoencoder where the encoder outputs a distribution rather than a single point, which lets you sample latent vectors and generate new data points through the decoder. And GPT stands for Generative Pre-trained Transformer: a type of large language model (recently in the news thanks to ChatGPT) specializing in, roughly speaking, autoregressive text generation.
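To make the autoencoder idea concrete, here's a minimal sketch in PyTorch. The layer sizes and input dimension are illustrative assumptions, not the architecture from the T2M-GPT paper:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=263, latent_dim=32):  # dims are hypothetical
        super().__init__()
        # Encoder: compress the input into a fixed-size latent vector.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: reconstruct the input from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = AutoEncoder()
x = torch.randn(4, 263)                    # a batch of 4 pose feature vectors
recon = model(x)                           # reconstruction, same shape as x
loss = nn.functional.mse_loss(recon, x)    # train by minimizing reconstruction error
```

Training the network to reproduce its own input forces the small latent vector to capture the essential structure of the data, which is exactly the property the Motion VQ-VAE exploits below.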
The Motion VQ-VAE, or Vector Quantized Variational Auto-Encoder, is tasked with taking in a sequence of human motion features and encoding it into a lower-dimensional latent space. Additionally, the VQ-VAE has something called a codebook, which can be thought of as a lookup table of learned motion snippets (hence the "discrete representations" part of the title): each encoded vector is snapped to its nearest codebook entry, so a motion becomes a sequence of codebook indices. From there, the T2M-GPT is in charge of processing a text description and autoregressively generating a sequence of indices into this codebook. In short, given a set of encoded representations for motion, T2M-GPT selects these representations in a specific order to best capture the motion described in the text input!
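Here's a rough sketch of that quantization step in PyTorch. The codebook size and latent dimension are made-up numbers for illustration, not the paper's exact configuration:

```python
import torch

codebook = torch.randn(512, 32)   # hypothetical: 512 codes, each 32-dimensional
z = torch.randn(16, 32)           # 16 encoded motion frames from the encoder

# Pairwise distances between latents and codes: shape (16, 512)
dists = torch.cdist(z, codebook)
indices = dists.argmin(dim=1)     # nearest codebook index per frame: the "motion tokens"
z_q = codebook[indices]           # quantized latents, fed to the VQ-VAE decoder

# The GPT operates on `indices`: conditioned on a text prompt, it predicts
# this token sequence one index at a time, and the VQ-VAE decoder then maps
# the selected codes back into continuous motion.
print(indices.shape, z_q.shape)   # torch.Size([16]) torch.Size([16, 32])
```

In practice the quantization step isn't differentiable, so VQ-VAE training relies on additional machinery (such as a straight-through gradient estimator and codebook-update tricks) that this sketch omits.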
Tags: ML News