Using version tags and artifacts to organize your most valuable ML assets
A primer on how to track and organize models, code, datasets, and artifacts in Weights & Biases
Unlabeled artifacts scattered across multiple environments prevent a busy ML platform engineer from easily identifying and retrieving the models and datasets they need. A properly organized and curated central artifact repository, by contrast, lets ML platform engineers, practitioners, and programmatic workflows quickly search for and find the artifacts they need. As the system of record to train, fine-tune, and manage AI models, Weights & Biases offers Registry, the ideal location to publish, share, and easily find all of your ML models and datasets.

In the context of machine learning artifacts, discoverability refers to the ease with which users—and applications—can locate, access, and understand published models and datasets. Weights & Biases has recently made a number of enhancements that improve artifact discoverability and make life much easier for your ML team. Support for artifact version tags, along with the ability to more easily search, filter, and group artifact versions using these tags in both the GUI and the SDK, makes for more seamless and efficient ML workflows and a smoother journey from training to production deployment.
With an eye towards CI/CD pipeline optimization and the post-experiment tracking stages of the ML lifecycle, Weights & Biases has invested in promoting discoverability and improving model management. In this blog post, we'll:
- Explore why discoverability is essential to an effective MLOps workflow,
- Dive into Registry, aliases, and tags, the winning combination for managing and easily identifying your machine learning artifacts,
- Take a look at how to use the new artifact version tag features,
- And discuss a couple of business use cases where aliases and the new artifact version tags facilitate ML workflows.
Let's start with why this is important.
Why is discoverability important?
Improved discoverability benefits many stages of the ML lifecycle and provides a boost in both model quality and speed-to-deployment. Before we focus on the Weights & Biases discoverability features, let’s explore why artifact discoverability is so important to an ML organization.
Efficiency
Easy discoverability of models saves time and resources. Teams can quickly find and reuse existing models, rather than spending time tracking them down or creating new ones. This accelerates the development, evaluation, and deployment process and allows teams to build on and take advantage of previous work.
Collaboration
An artifact registry that supports discoverability fosters better collaboration among team members. ML engineers and platform engineers throughout an organization can easily share models, insights, and documentation, promoting teamwork and knowledge sharing. Aliases, tags and metadata can all help users and teams quickly find the artifacts and information they need in the moment.
Accessibility to this type of information can be especially helpful during onboarding when new team members are exploring and coming up to speed on a repository. Another example of collaboration resulting from discoverability is an ML practitioner checking the artifact registry prior to beginning work on a new project to see whether other users or teams have published models or artifacts that might be helpful or related. When similar or identical work already exists and can be easily found, there is less likelihood of duplicative work.
Consistency and standardization
Discoverability helps ensure that standardized models and practices are used across the entire organization. Without it, multiple users or teams with multiple ways of doing things tend to drift into a "wild west" situation where each user or team is left on their own to make sense of artifacts strewn across multiple environments.
Governance and compliance
Many industries have strict regulatory and compliance requirements regarding the use of machine learning models. A discoverable registry makes it easier to audit and ensure that all models and other artifacts meet necessary compliance and governance standards.
For example, active curation by an ML Platform engineer combined with the necessary filtering functionality in an artifact registry can be used to proactively enforce that ML engineers can access only base models from external sources with valid licenses. Models lacking a proper license for usage can be either excluded altogether or identified via artifact tags. And beyond just licensing issues, ensuring that only the right artifacts are discoverable by the right users is of the utmost importance in maintaining a secure environment.
Improving model performance
By making models discoverable, it's easier to compare different models and choose the best performing one for a given task. This also allows teams to learn from past successes and failures, leading to continuous improvement in model performance.
Knowledge retention
Teams often experience turnover, and valuable knowledge can be lost when key members leave. A discoverable registry helps retain knowledge by providing a centralized place to store and share models, their metadata, and their performance metrics.
Scalability
As organizations grow and the number of artifacts increases, having a system with good discoverability ensures that the artifact repository scales effectively. Without discoverability, managing a large number of models can become chaotic and inefficient.

How Weights & Biases helps
As the central repository that stores and provides versioning, aliases, lineage tracking, and governance of models and datasets, W&B Registry offers a number of features to elevate discoverability across an ML organization. There are, of course, standardized naming conventions and metadata usage techniques that improve discoverability, but those often are not as elegant and involve more coding work—and potential for problems.
Let's take a quick look at our primary tools for identifying and referencing artifacts: Registry, aliases, and tags. We'll dig into how they help make artifacts both more discoverable and more easily available to users and automated workflows.
Registry
It can be helpful to think of Registry as a catalog of bookmarked artifacts. Typically, these might be the best performing models and most effective training datasets that are instrumental to CI/CD workflows, assets that provide the most value when shared among multiple users and teams. Whereas other model registries limit organizations to a rigid, models-only repository, Weights & Biases allows all artifact types and provides the ML platform engineer with the flexibility to create a Registry structure that is the best fit for an organization.
Within a governed and secure environment, Registry allows users across an enterprise to share models and datasets regardless of whether they work on the same team or same project. Users with the proper permissions can search Registry by team, artifact name, and tags to locate models, datasets, and other artifacts. Just as a library uses the Dewey Decimal System to make books easy to find, Registry uses default and custom registries, artifact collections, aliases, and tags to make all machine learning artifacts easy to find and easy to use if you have permission to do so.
Registry has an intuitive interface that makes managing registries and collections simple. That means it's easy to find detailed artifact information, robust lineage tracking, and dependency graphs that help you better understand and visualize your ML pipeline. As we mentioned, governance and compliance benefit greatly from improved discoverability which not only makes artifacts easier to find, but also makes it easy to figure out which runs used or produced which artifacts. The lineage graphs (and supporting data available in Weights & Biases) provide the exact recipe for any given artifact, allowing you to reproduce any model, address audit inquiries, and adhere to industry and company policies.
Aliases
Aliases are unique identifiers that allow you to identify or reference a specific artifact version within a collection. Every artifact version generated by a W&B run automatically receives a version alias (v0, v1, and so on) based on the order in which it was generated. It's often helpful to quickly identify the most recent version of an artifact, and W&B makes this easy by automatically applying the "latest" alias to the most recent artifact version in a collection.
Aliases can be added to artifacts either using the SDK or through the UI. Another common use case is programmatically applying an alias such as "best" to the best-performing model checkpoint generated during an experiment. When multiple model checkpoints are generated, it's helpful to mark the best performer during the run itself rather than sifting through metadata after the run has completed.
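As a minimal sketch of the programmatic route (the project, file, and artifact names below are placeholders), you might log a checkpoint with the "best" alias in one call, or append the alias to an already-logged version afterward:

import wandb

# Sketch: log a model checkpoint and apply the "best" alias in the same call.
run = wandb.init(project="alias-demo")  # placeholder project
checkpoint = wandb.Artifact("demo-model", type="model")
checkpoint.add_file("model.ckpt")  # assumes this checkpoint file exists locally
run.log_artifact(checkpoint, aliases=["best"])
run.finish()

# Sketch: add the alias to an existing version after the fact.
api = wandb.Api()
version = api.artifact("my-team/alias-demo/demo-model:v0", type="model")
version.aliases.append("best")
version.save()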
The enforced uniqueness of aliases makes them extremely valuable when searching for and retrieving a specific artifact version. But there are times when you want to track down and retrieve multiple artifact versions based on a search filter. That's when tags are the answer.
Tags
When multiple artifact versions share a common property that enables grouping and filtering, tags are the way to go.
Like aliases, they can be added to an artifact version either programmatically or using the UI. Unlike aliases, tags are always created by users, not automatically by Weights & Biases. Tags are not exclusive to artifact versions and can also be used for other objects inside of W&B Models such as artifact collections or experiment runs. They're particularly helpful at the granular version level when dealing with model and dataset versions.
Tags are simple on the surface, but a thoughtful and comprehensive tagging strategy translates to increased discoverability throughout the ML lifecycle. And increased discoverability means increased productivity.
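As a rough sketch of the programmatic path (the artifact path and tag values here are hypothetical), adding version tags to an existing artifact version might look like this:

import wandb

# Sketch: attach version tags to an already-logged artifact version.
api = wandb.Api()
version = api.artifact("my-team/tag-demo/demo-model:v3", type="model")
version.tags = version.tags + ["us-west-2", "production"]  # tags are plain strings
version.save()  # persist the updated tags back to W&B
print(version.tags)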
We'll get to some business use cases for artifact version tags below, but first, let's look at how to find the right artifacts.
How to find the right artifacts
There are multiple areas of W&B Models where aliases and tags are helpful, but let's home in on Registry, the center of our system of record.
Ease of search in the UI
Inside of every default or custom registry, there is a search bar that reads “Search collection names, tags, and version tags.”
Yes, it really is this easy.
Just enter the label associated with the objects that you are hoping to find and they will be front and center in the search results. UI search is great for an ML platform engineer manually curating Registry artifacts or hunting down specific ones, and for an ML engineer looking for helpful work from others or standardized datasets to use in model training. Users for whom the web browser is the primary W&B interface will get a ton of value from how easily this intuitive search surfaces results from across the artifact hierarchy.

Search and retrieval with the SDK
Programmatic workflows—such as continuous integration and continuous deployment (CI/CD)—use the SDK to find desired models and datasets.
The SDK, of course, permits identifying and downloading any individual artifact using the name and alias. With minimal additional effort, it is now possible to filter artifact versions by a tag or tag combination and retrieve them using a for loop. Just check out this quick code snippet below that shows how to filter all artifact versions in a given collection (“Artifact Demo Models”) with the tag “us-west-2.”
import wandb

def main():
    api = wandb.Api()
    production_versions = api.artifacts(
        type_name="model",
        name="wandb-registry-model/Artifact Demo Models",
        tags=['us-west-2'],
    )
    for pv in production_versions:
        print("Artifact Version: " + str(pv.name) + "\nTags: " + str(pv.tags) + "\n")

if __name__ == "__main__":
    main()
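For comparison, fetching a single version by collection name and alias (rather than filtering by tag) is a one-liner. The path below mirrors the snippet above; depending on your organization settings, it may need an org prefix:

import wandb

api = wandb.Api()
# Fetch one specific version by name and alias, then download its files locally.
model = api.artifact("wandb-registry-model/Artifact Demo Models:latest", type="model")
local_dir = model.download()
print(model.name, local_dir)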
Business use cases
We’ve discussed why discoverability of models and datasets is important and how Weights & Biases provides ways to maximize discoverability using Artifacts and Registry. Now let’s examine how organizations use these features to achieve business goals.
Aliases
One of the most important purposes of aliases in W&B Models is to trigger automated workflows—or Automations—often used for model testing and deployment as part of a CI/CD pipeline.
A user can choose whether an Automation should be triggered when a new model version is added to a collection or when a specific alias, such as "candidate" or "production," is applied to an artifact version either manually or programmatically. An Automation can trigger either a webhook call with a user-specified payload or a W&B Launch job. You can read more about kicking off CI/CD workflows with Automations here.
When a specific model version within a collection has passed through an evaluation process and been labeled "production" or "champion," that generally indicates a scenario where a single artifact version is deployed to a single production environment. When one version is enough, aliases are generally the way to go.
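As a hedged sketch (the collection name and version are made up), promoting an evaluated version could be as simple as moving the "production" alias onto it, which would also fire any Automation configured on that alias:

import wandb

# Sketch: promote an evaluated model version by applying the "production" alias.
api = wandb.Api()
candidate = api.artifact("wandb-registry-model/Churn Models:v7", type="model")
candidate.aliases.append("production")  # aliases are unique within a collection
candidate.save()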
Ensuring that teams are using the latest version of a shared dataset also guarantees consistency. For retail companies, for example, granular transaction datasets are extremely valuable to multiple teams and support multiple machine learning use cases: purchase data can be used to train both seasonal forecast models and customer churn models.
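A minimal sketch of that pattern, assuming a dataset collection named "transactions" in the dataset registry:

import wandb

# Sketch: every training run pulls whichever version currently holds the "latest" alias.
run = wandb.init(project="seasonal-forecasting")  # placeholder project
dataset = run.use_artifact("wandb-registry-dataset/transactions:latest")
data_dir = dataset.download()
# ... train forecast or churn models on the files under data_dir ...
run.finish()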
But, as we have discussed, when multiple artifact versions within a single collection are relevant to production deployments or important internal workflows, version tags open up a world of possibilities.
Tags
Imagine that you have generated multiple models that all belong in the same collection, but the version numbers and the order in which the models were added to Registry are inconsequential in deciding which models are deployed. Instead, multiple models are deployed into production environments, with each production model deployed to a different region.
Referencing our code snippet above, tags can be used to label the appropriate region for each model, such as "north_america," "south_america," "emea," "apac," and "africa," while an additional tag can be used to label them as “production” or “candidate” models. When multiple production models exist, retrieval using tags offers a more elegant solution than aliases. And to access only the production model for the North America region, just include both tags, “north_america” and “production,” in the search.
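Building on the earlier snippet (collection and tag names are illustrative, and the tags argument is treated here as matching versions that carry all listed tags), the North America production filter might look like this:

import wandb

api = wandb.Api()
# Sketch: retrieve only versions tagged with both the region and "production".
na_production = api.artifacts(
    type_name="model",
    name="wandb-registry-model/Artifact Demo Models",
    tags=["north_america", "production"],
)
for version in na_production:
    print(version.name, version.tags)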
It should be easy to extrapolate from this example how tags can be used in place of aliases when multiple “production” models exist. Other examples include models that were trained or are intended to be used in specific compute environments or models built for the same purpose, but for different customers. Any instance where it is useful to filter multiple artifacts by non-unique values is a good use case for tags.
Fine-tuning a large language model (LLM) tailors it for specific tasks or domains, improving its accuracy, performance, and/or relevance by training on a smaller, specialized dataset. Fine-tuning requires selecting both a base model, such as Llama or Mistral, and training data, such as Alpaca or an internal dataset. A given project may use multiple base models and multiple datasets in an effort to achieve the best results.
While all of the fine-tuned models associated with this project may use a different combination of input artifacts, they are all associated with the same project and may be added to the same collection. In order to differentiate the models, tags can be applied denoting both the base model and the training dataset. This allows both easy searching by tag from within the W&B interface, and also easy retrieval and comparison of models by specifying different base models and training datasets from the SDK.
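As a hedged sketch of that workflow (project, file, and tag names are hypothetical, and the tag-after-logging step assumes the version can be updated in place), each fine-tuned checkpoint gets tagged with its base model and dataset, and a later query filters on that combination:

import wandb

# Sketch: tag each fine-tuned checkpoint with its base model and training dataset.
run = wandb.init(project="llm-finetuning")  # placeholder project
checkpoint = wandb.Artifact("support-bot", type="model")
checkpoint.add_file("adapter.bin")  # assumes the fine-tuned weights exist locally
logged = run.log_artifact(checkpoint)
logged.wait()  # make sure the version is committed before editing it
logged.tags = ["llama-3-8b", "alpaca"]
logged.save()
run.finish()

# Later: pull every version fine-tuned from that base model on that dataset.
api = wandb.Api()
for match in api.artifacts(type_name="model", name="llm-finetuning/support-bot",
                           tags=["llama-3-8b", "alpaca"]):
    print(match.name, match.tags)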
One of the key benefits of W&B Registry is that it can store not just models, but also datasets and any other type of artifact. Training datasets come in all shapes and sizes, and just as models may be similar enough to store in a single collection yet require different tags to aid discoverability, the same is true for datasets. Many fine-tuning datasets are generated by LLMs. Version tags can identify datasets generated using the same prompt and serving the exact same purpose, but produced by different LLMs. Another example of using version tags for multiple datasets in the same collection is language: the datasets may serve the exact same purpose, but tags can identify whether they are in English or Japanese.
Conclusion
Artifact discoverability is critical to the success of any enterprise-scale machine learning organization. Making artifacts as easy to find as possible benefits both the freshly hired ML engineer looking for the right dataset to use for model training and the automated CI/CD workflow evaluating the most recent candidate model published to Registry.
Aliases serve a very important purpose, identifying artifacts within a collection and triggering Automations, but the value of uniqueness is also a limitation. When a more complex strategy is required to organize and retrieve artifacts, version tags open up a world of possibilities. W&B Registry lays the foundation for a more efficient, better structured artifact repository. Artifact version tags are the latest in a series of enhancements that significantly improve discoverability and make Weights & Biases the ideal system of record to train, fine-tune, and manage AI models.
Getting Started
If you are interested in seeing the Python code and experimenting with the new SDK artifact filtering, please visit our product docs.