
HuggingFace's Newest LLM!

GLM-130B is a bilingual LLM trained on Chinese and English, now available on HuggingFace.
GLM-130B, a bilingual (Chinese and English) pre-trained LLM, is now in HuggingFace Spaces! Check it out here! If you'd like to see the repo, check here.


There have been tons of popular LLMs recently (BLOOM comes to mind for me!). What makes this one unique?
In their paper, they describe the limitations of autoencoding models (BERT), autoregressive models (GPT), and encoder-decoder models (T5). Instead of maintaining a different framework for each family of natural language tasks, GLM-130B takes a General Language Model (GLM) approach that handles all of them with a single pre-training objective. The authors report performance improvements over BERT, GPT, and T5 on their respective tasks!
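For instance, the paper recasts classification tasks as blank infilling via cloze-style templates, so the same objective covers understanding as well as generation. A tiny illustration of the idea (the template and label words below are my own, not the exact ones from the paper):

```python
# Hedged illustration: recasting sentiment classification as blank infilling.
# The model fills in [MASK]; verbalizer words like "good"/"bad" map back to
# the positive/negative labels. The template wording is an assumption.
def to_cloze(sentence: str) -> str:
    return f"{sentence} It was really [MASK]."

print(to_cloze("The bilingual demo handles Chinese prompts well."))
# -> "The bilingual demo handles Chinese prompts well. It was really [MASK]."
```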
As outlined in the paper, they take a sequence x of n tokens and sample m spans of consecutive tokens of random length from it. Each sampled span is then replaced in x with a single [MASK] token. The masked spans are predicted autoregressively: each span is generated conditioned on the corrupted sequence and on the spans predicted before it.
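To make that corruption step concrete, here is a minimal sketch of the blank-infilling setup in plain Python, written from the description above rather than the authors' code. The token names ([MASK], [S], [E]), the span-length distribution, and the function name are illustrative assumptions, and the paper additionally shuffles the order of the spans in the target, which is omitted here:

```python
import random

def corrupt_for_blank_infilling(tokens, num_spans=2, max_span_len=4, seed=0):
    """Replace random consecutive spans with [MASK] and build the target part.

    Returns (source, target): `source` is the corrupted sequence the model
    conditions on, `target` is the concatenation of the masked spans, each
    framed by [S]/[E] so they can be predicted autoregressively in turn.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    n = len(tokens)

    # Sample non-overlapping spans of random length.
    spans, taken = [], set()
    while len(spans) < num_spans:
        length = rng.randint(1, max_span_len)
        start = rng.randint(0, n - length)
        if any(i in taken for i in range(start, start + length)):
            continue
        taken.update(range(start, start + length))
        spans.append((start, start + length))
    spans.sort()

    # Part A: the original sequence with each sampled span collapsed to one [MASK].
    source, cursor = [], 0
    for start, end in spans:
        source.extend(tokens[cursor:start])
        source.append("[MASK]")
        cursor = end
    source.extend(tokens[cursor:])

    # Part B: the masked spans themselves; the model predicts these tokens
    # left to right, conditioned on Part A and on previously generated spans.
    target = []
    for start, end in spans:
        target.append("[S]")
        target.extend(tokens[start:end])
        target.append("[E]")
    return source, target

src, tgt = corrupt_for_blank_infilling("the quick brown fox jumps over the lazy dog".split())
print(src)  # e.g. ['the', '[MASK]', 'fox', ...]
print(tgt)  # e.g. ['[S]', 'quick', 'brown', '[E]', ...]
```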
They also introduce a scheme for encoding both the inter-span and intra-span positions of these masked tokens. The authors argue this 2D positional encoding is more robust because it never tells the model how long a masked span will be, which matters since common downstream text generation tasks produce outputs of variable length.
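Here is a companion sketch of how those 2D position ids could be built for the source/target pair above, following the paper's scheme: the first id locates a token in the corrupted sequence, the second counts positions inside a masked span. The function name and token conventions are again my own:

```python
def build_2d_position_ids(source, target):
    """source: corrupted tokens (Part A, with [MASK] placeholders); target: span tokens (Part B)."""
    # Part A: intra-span position is 0; inter position is just the token index.
    pos_inter = list(range(len(source)))
    pos_intra = [0] * len(source)

    # Map each [MASK] in Part A to its index, in order of appearance.
    mask_positions = [i for i, tok in enumerate(source) if tok == "[MASK]"]

    # Part B: every token of a span shares the position of its [MASK] in Part A,
    # while the second id counts 1, 2, 3, ... within the span, so span length
    # is never revealed up front.
    span_idx, intra = -1, 0
    for tok in target:
        if tok == "[S]":          # start of a new span
            span_idx += 1
            intra = 0
        intra += 1
        pos_inter.append(mask_positions[span_idx])
        pos_intra.append(intra)
    return pos_inter, pos_intra

pos_inter, pos_intra = build_2d_position_ids(src, tgt)
```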
I'd highly recommend reading the paper. Currently, a lot of AI systems are designed and tuned to tackle one specific natural language task, and GLM proposes a more general, unifying approach to these tasks. Perhaps in the future, language modeling will converge on a universal best approach, kind of like how nearly all LLMs use Transformers now.
Tags: ML News