
Wandb integration with NIM (internal or partner only)



What is NIM?

"NVIDIA NIM is designed to bridge the gap between the complex world of AI development and the operational needs of enterprise environments, enabling 10-100X more enterprise application developers to contribute to AI transformations of their companies. " (NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale)



Why is NIM needed?

Pain points in LLM-based application development include:
  • Prediction interfaces and output formats differ from model to model, which makes it difficult to compare many models quickly.
  • Optimizing GPU usage during inference requires specialized knowledge.
A standardized interface and easy, GPU-optimized deployment on any device are therefore important for LLM-based application development, as illustrated in the sketch below.
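
As a rough illustration of that standardized interface, here is a minimal sketch of querying a deployed NIM container, assuming it exposes an OpenAI-compatible chat completions endpoint; the port, path, and model name are placeholders to adjust for the container you actually deploy.

# Minimal sketch: call a running NIM container through an
# OpenAI-compatible API (port, path, and model name are assumptions).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'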

How does NIM work?

Before NIM: (material link)

What is the difference between NIM and TensorRT-LLM / Triton Inference Server?


WandB integration with NIM

Now you can kick off a NIM deployment with W&B Launch by picking up a model from the W&B Model Registry.


💡 Only A100-class GPUs are compatible.

Demo



Get started

A step-by-step guide is available in the README of the wandb/launch-jobs repository.

Step 2: Clone the launch-jobs repository
git clone https://github.com/wandb/launch-jobs.git
Step 3: Install the required libraries
pip install -r requirements.txt
Step 4: Create a launch job
# Create a launch job named "deploy-to-nvidia-nemo-inference-microservice"
# in the wandb-japan entity and nimtest project; the deploy script is the
# entrypoint and the launch-jobs Git repository is the job source.
wandb job create \
  -n "deploy-to-nvidia-nemo-inference-microservice" \
  -e wandb-japan \
  -p nimtest \
  -E jobs/deploy_to_nvidia_nemo_inference_microservice/job.py \
  git https://github.com/wandb/launch-jobs

Step 5: Create and run a launch agent
Configuration (Docker options used when running the container):

net: host
gpus: all
volume:
- /mnt/batch/tasks/shared/LS_root/mounts/clusters/nim/code/launch-jobs/jobs/deploy_to_nvidia_nemo_inference_microservice/artifacts:/launch/artifacts/
- /mnt/batch/tasks/shared/LS_root/mounts/clusters/nim/code/launch-jobs/jobs/deploy_to_nvidia_nemo_inference_microservice/model-store:/model-store/
runtime: nvidia
env-file: /home/nvidia/launch-jobs/jobs/deploy_to_nvidia_nemo_inference_microservice/.env
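
With the configuration above in place, the agent can be started on the GPU machine. A minimal sketch, assuming a queue named nim-queue (a placeholder) exists in the wandb-japan entity:

# Start a W&B launch agent that polls the queue and runs queued jobs.
# "nim-queue" is a placeholder; replace it with the queue you create.
wandb launch-agent -q nim-queue -e wandb-japan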
Step 6: Create a queue and kick off the deployment
Run configuration (JSON) for the queued job:
{
  "args": [],
  "entry_point": [],
  "run_config": {
    "artifact": "vanpelt/support-llama/merged:v0",
    "artifact_model_type": "llama"
  }
}
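
The job from Step 4 can then be pushed to the queue from the W&B UI, or from the CLI as sketched below; the job path, alias, and queue name are placeholders (the exact job name is printed by wandb job create), and the run configuration above is assumed to be saved as config.json.

# Submit the deploy job to the queue with the run configuration above.
# Job path, alias, and queue name are placeholders.
wandb launch \
  -j wandb-japan/nimtest/deploy-to-nvidia-nemo-inference-microservice:latest \
  -q nim-queue \
  -c config.json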


  1. Model registry (wandb only):