Explosion Releases spaCy v3.3

The most recent update for Explosion's spaCy natural language processing library has been released. The update offers new functionality, language support, and performance increases.
Created on April 29|Last edited on April 29
Explosion has released a new update for spaCy, their open-source natural language processing library. spaCy is designed for production environments and is easy to use in NLP projects written in Python. With support for over 60 languages, spaCy provides a number of pretrained models and utilities to help you process natural language with machine learning, and it also offers customizable pipelines for your specific training needs.

What's new in spaCy v3.3?

A cornerstone of this update is the new trainable lemmatizer component, a part of spaCy's NLP pipeline that converts words to their base (dictionary) form by stripping inflection such as verb conjugation. The non-trainable, rule-based lemmatizer component from previous releases is still available.
Three new trained pipelines have been added for Finnish, Korean, and Swedish. These new pipelines use the new trainable lemmatizer and come in a variety of model sizes. In addition, 10 other languages have had their pipelines updated to use the new trainable lemmatizer.
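As a rough illustration of what the new component looks like in code, here is a minimal sketch that adds it to an empty pipeline. It assumes spaCy v3.3 or later is installed; the blank English pipeline and the component name "lemmatizer" are choices made for this example, not something the release prescribes.

```python
import spacy

# Assumption: spaCy v3.3 or later is installed (pip install "spacy>=3.3").
# A blank pipeline contains only a tokenizer and no trained components.
nlp = spacy.blank("en")

# "trainable_lemmatizer" is the factory name of the new component.
# Naming it "lemmatizer" here is an illustrative choice.
lemmatizer = nlp.add_pipe("trainable_lemmatizer", name="lemmatizer")

# The component is now part of the pipeline, but it is still untrained.
print(nlp.pipe_names)  # ['lemmatizer']
```

A component added this way has to be trained on annotated data before it can predict anything. The trained pipelines mentioned above already ship with a trained lemmatizer, and the predicted lemma for each token is exposed as token.lemma_.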
Other changes include a performance increase of up to 15% on longer texts, support in the displaCy visualizer for highlighting overlapping spans in a text, and a variety of new plugins, extensions, pipelines, and tutorials added since v3.2.
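The overlapping-span support in displaCy can be tried with a short sketch like the one below. It assumes spaCy v3.3 or later; the example sentence, the labels, and the use of a blank English pipeline are illustrative choices, and "sc" is the span group key that displaCy reads by default.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Assumption: spaCy v3.3 or later is installed. A blank pipeline is enough
# here because the spans are set by hand rather than predicted.
nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

# Two overlapping spans: "Bank of China" (tokens 3 to 5) and "China" (token 5).
doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),
    Span(doc, 5, 6, "GPE"),
]

# style="span" selects the span visualizer; jupyter=False makes render()
# return the HTML markup as a string instead of displaying it.
html = displacy.render(doc, style="span", jupyter=False)
```

The returned markup can be written to an .html file and opened in a browser, or displayed inline in a notebook by leaving out jupyter=False.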
You can find the detailed changelog in the spaCy GitHub repo here: https://github.com/explosion/spaCy/releases/tag/v3.3.0
