
Wandb integration with NIM (internal or partner only)



What is NIM?

"NVIDIA NIM is designed to bridge the gap between the complex world of AI development and the operational needs of enterprise environments, enabling 10-100X more enterprise application developers to contribute to AI transformations of their companies. " (NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale)



Why is NIM needed?

Pain points in LLM-based application development include:
  • Prediction interfaces and output formats differ from model to model, which makes it difficult to compare many models quickly.
  • Optimizing GPU usage during inference requires specialized knowledge.
A standardized interface and easy, GPU-optimized deployment on any device are therefore important for LLM-based application development, as illustrated in the sketch below.
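
As a rough illustration of that standardized interface, here is a minimal sketch of querying a deployed NIM container, assuming it exposes an OpenAI-compatible chat completions endpoint; the port, path, and model name are placeholders to adjust for the container you actually deploy.

# Minimal sketch: call a running NIM container through an
# OpenAI-compatible API (port, path, and model name are assumptions).
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64
      }'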

How does NIM work?

Before NIM: (material link)

What is the difference between NIM and TensorRT-LLM / Triton Inference Server?


WandB integration with NIM

Now you can kick off a NIM deployment with W&B Launch by picking up a model from the W&B Model Registry.


💡 Only A100-class GPUs are compatible.

Demo



Get started

A step-by-step guide is available in the README of the wandb/launch-jobs repository.

Step 2: Clone the launch-jobs repository
git clone https://github.com/wandb/launch-jobs.git
Step 3: Install the required libraries
pip install -r requirements.txt
Step 4: Create a launch job
# Create a launch job named "deploy-to-nvidia-nemo-inference-microservice"
# in the wandb-japan entity and nimtest project; the deploy script is the
# entrypoint and the launch-jobs Git repository is the job source.
wandb job create \
  -n "deploy-to-nvidia-nemo-inference-microservice" \
  -e wandb-japan \
  -p nimtest \
  -E jobs/deploy_to_nvidia_nemo_inference_microservice/job.py \
  git https://github.com/wandb/launch-jobs

Step 5: Create and run a launch agent
Configuration (Docker options used when running the container):

net: host
gpus: all
volume:
- /mnt/batch/tasks/shared/LS_root/mounts/clusters/nim/code/launch-jobs/jobs/deploy_to_nvidia_nemo_inference_microservice/artifacts:/launch/artifacts/
- /mnt/batch/tasks/shared/LS_root/mounts/clusters/nim/code/launch-jobs/jobs/deploy_to_nvidia_nemo_inference_microservice/model-store:/model-store/
runtime: nvidia
env-file: /home/nvidia/launch-jobs/jobs/deploy_to_nvidia_nemo_inference_microservice/.env
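
With the configuration above in place, the agent can be started on the GPU machine. A minimal sketch, assuming a queue named nim-queue (a placeholder) exists in the wandb-japan entity:

# Start a W&B launch agent that polls the queue and runs queued jobs.
# "nim-queue" is a placeholder; replace it with the queue you create.
wandb launch-agent -q nim-queue -e wandb-japan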
Step 6: Create a queue and kick off the deployment
Run configuration (JSON) for the queued job:
{
  "args": [],
  "entry_point": [],
  "run_config": {
    "artifact": "vanpelt/support-llama/merged:v0",
    "artifact_model_type": "llama"
  }
}
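
The job from Step 4 can then be pushed to the queue from the W&B UI, or from the CLI as sketched below; the job path, alias, and queue name are placeholders (the exact job name is printed by wandb job create), and the run configuration above is assumed to be saved as config.json.

# Submit the deploy job to the queue with the run configuration above.
# Job path, alias, and queue name are placeholders.
wandb launch \
  -j wandb-japan/nimtest/deploy-to-nvidia-nemo-inference-microservice:latest \
  -q nim-queue \
  -c config.json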


  1. Model registry (wandb only):