Generating a Weights & Biases advertisement using diffusion
Using open-source diffusion models with HuggingFace Diffusers to generate an advertisement video for Weights & Biases
Created on June 3 | Last edited on August 1
It was a warm Wednesday afternoon. I was furiously coding (to maintain the best tools for machine learning practitioners, naturally) when my colleagues Darek and Morgan reached out to me on Slack regarding a fun project: to generate a 10-second advertisement video for Weights & Biases.

Over the next few weeks, that's exactly what I did, working closely with our video production team to generate a video advertising Weights & Biases. In this report, I'll share how it all came together. But first, the finished product:
Table of contents:
- 🚀 Weave for LLMs and Diffusion Models
- 💭 Initial ideation stage
  - SDXL → Stable Video Diffusion
  - AnimateDiff
  - Observations and Conclusions
- ⚗️ Interpolation with Stable Diffusion
  - Interpolating between two sets of prompt embeddings
  - Interpolating between two sets of latent vectors
  - Observations and conclusion
- 🧬 Controlled interpolation with Stable Diffusion
  - Observations and conclusion
- 🧶 Weaving the final product
- 🚀 Weave for faster development cycles
🚀 Weave for LLMs and Diffusion Models
Let's talk briefly about the product for which we worked so hard to create a marketing video.
Weave is a lightweight toolkit for tracking and evaluating LLM applications built by Weights & Biases. It aims to bring rigor, best practices, and composability to the inherently experimental process of developing AI applications without introducing cognitive overhead. You can use Weave to:
- Log and debug language model inputs, outputs, and traces
- Build rigorous, apples-to-apples evaluations for language model use cases
- Organize all the information generated across the LLM workflow, from experimentation to evaluations to production
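If you just want a feel for the workflow, here's a minimal sketch of tracing with Weave; the project name and the traced function are placeholders rather than anything from this project:

```python
import weave

# Initialize a Weave project (hypothetical project name).
weave.init("my-weave-project")

# Any function decorated with @weave.op has its inputs, outputs,
# and call trace logged automatically.
@weave.op()
def generate_tagline(product: str) -> str:
    # Stand-in for the LLM or diffusion-model call you want to trace.
    return f"Ship {product} to production faster."

generate_tagline("your LLM app")
```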

💭 Initial ideation stage
At the very beginning of the project, we didn't have a clear idea of how exactly we wanted the video to look, beyond two things:
- It had to have something to do with Weights & Biases (obvious in retrospect 😅)
- We should use open-source diffusion models (because open-source rocks)
We eventually settled on two approaches I want to highlight. The first:
SDXL → Stable Video Diffusion
The idea is simple: generate an image using a powerful image generation model like Stable Diffusion XL and animate it with Stable Video Diffusion, an image-to-video generation model that can generate 2-4 seconds of high-resolution footage.
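In code, the two-stage idea looks roughly like the sketch below using 🤗 Diffusers; the prompt, resolution handling, and generation settings are illustrative rather than the exact values we used:

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableVideoDiffusionPipeline
from diffusers.utils import export_to_video
from wandb.integration.diffusers import autolog

# Two lines to log prompts, configs, and generated media to W&B (hypothetical project name).
autolog(init=dict(project="wandb-advertisement"))

# Stage 1: generate a still image with SDXL.
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
image = sdxl(prompt="a ball of yellow yarn unraveling on a desk, studio lighting").images[0]

# Stage 2: animate the still with Stable Video Diffusion (image-to-video).
svd = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
image = image.resize((1024, 576))  # SVD expects a 1024x576 conditioning image
frames = svd(image, decode_chunk_size=4, motion_bucket_id=127).frames[0]
export_to_video(frames, "svd_clip.mp4", fps=7)
```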
If you want to know more about using Stable Diffusion XL with 🤗 Diffusers, check out the following reports. You'll also learn how to use the W&B autolog integration for Diffusers, which enables you to log all the prompts, negative prompts, generated media, and configs associated with your image/video/audio generation experiments by adding just two lines of code to your program.
A Guide to Using Stable Diffusion XL with HuggingFace Diffusers and W&B
A comprehensive guide to using Stable Diffusion XL (SDXL) for generating high-quality images using HuggingFace Diffusers and managing experiments with Weights & Biases
A Guide to Prompt Engineering for Stable Diffusion
A comprehensive guide to prompt engineering for generating images using Stable Diffusion, HuggingFace Diffusers and Weights & Biases.
[Media panel: SDXL → Stable Video Diffusion results]
AnimateDiff
We also looked at AnimateDiff, a framework that enables us to animate an existing text-to-image model for generating videos from text prompts.
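As a rough sketch of how AnimateDiff is driven through Diffusers (the motion-adapter checkpoint, scheduler settings, and prompt below follow the library's documented example rather than our exact configuration):

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Load the motion adapter and attach it to a text-to-image backbone.
adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", motion_adapter=adapter, torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipe.enable_model_cpu_offload()

# Generate a short clip from a text prompt (illustrative prompt).
output = pipe(
    prompt="a cat playing with a ball of yellow yarn, cinematic lighting",
    negative_prompt="bad quality, blurry",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
export_to_gif(output.frames[0], "animatediff_clip.gif")
```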
[Media panel: AnimateDiff results]
Observations and Conclusions
- One of the things we achieved in the initial experiments was making the videos' color palette visually consistent with the official W&B brand guide.
- Unsure of how to visually align the videos with the idea of Weave, we decided to take it somewhat literally and use visual cues such as threads, strings, woolen balls, etc.
- Stable Video Diffusion doesn't seem good at producing complex interactions between objects in the image across frames. Most of the time, it adds panning and zooming effects, which don't add much dynamism to the footage. When it does try to render complex movements (such as turning the llama's head), it generates creepy distortions.
- While videos generated by AnimateDiff are far from static, they still struggle to create complex movements that are also realistic (as with the cat playing with balls of string and the mutated llamas). Although these videos look better than the SDXL + SVD results, they still might not be a good candidate for a video promoting our brand.
⚗️ Interpolation with Stable Diffusion
Next, we tried different techniques for interpolating between images using Stable Diffusion (for example, creating intermediate images to smooth out transitions). To achieve this, we have to navigate the high-dimensional latent space of Stable Diffusion corresponding to the images, where each dimension represents a specific feature the model has learned.
We attempted several different latent interpolation techniques and created a GitHub repository called weave-diffusion, which implements all of them as Diffusers pipelines.
Let's check the results produced by some of these techniques.
Interpolating between two sets of prompt embeddings
In this technique, we interpolate between the positive and negative prompt embeddings, exploring the latent space between two conceptual points defined by the prompts.
Concretely, we take consecutive pairs of prompt and negative-prompt embeddings and perform spherical linear interpolation (slerp) between them, creating a series of interpolated embeddings that are then used to generate images with smooth transitions between the different states.
This technique is implemented as StableDiffusionMultiPromptInterpolationPipeline in the weave-diffusion repository. You can reproduce the results using this script.
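The weave-diffusion pipeline itself isn't reproduced here, but the core idea can be sketched with a vanilla StableDiffusionPipeline: encode each prompt (and negative prompt), slerp between the embedding pairs, and render one frame per interpolation step while holding the initial noise fixed. The prompts, seed, and frame count are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def slerp(v0: torch.Tensor, v1: torch.Tensor, t: float, dot_threshold: float = 0.9995):
    """Spherical linear interpolation between two embedding tensors."""
    dot = torch.sum(v0 * v1) / (torch.norm(v0) * torch.norm(v1))
    if torch.abs(dot) > dot_threshold:  # nearly parallel: plain lerp is numerically safer
        return torch.lerp(v0, v1, t)
    theta_0 = torch.acos(dot)
    sin_theta_0 = torch.sin(theta_0)
    theta_t = theta_0 * t
    return (torch.sin(theta_0 - theta_t) / sin_theta_0) * v0 + (torch.sin(theta_t) / sin_theta_0) * v1

prompts = ["a ball of yellow yarn, studio lighting", "a ball of blue yarn, studio lighting"]
negative_prompt = "blurry, low quality"

# Encode both prompts (and the shared negative prompt) into embeddings.
embeds, neg_embeds = [], []
for p in prompts:
    e, ne = pipe.encode_prompt(
        p,
        device="cuda",
        num_images_per_prompt=1,
        do_classifier_free_guidance=True,
        negative_prompt=negative_prompt,
    )
    embeds.append(e)
    neg_embeds.append(ne)

# Keep the initial noise fixed so only the prompt embeddings vary between frames.
generator = torch.Generator("cuda").manual_seed(42)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),
    generator=generator, device="cuda", dtype=torch.float16,
)

frames = []
for t in torch.linspace(0, 1, steps=30):
    frame = pipe(
        prompt_embeds=slerp(embeds[0], embeds[1], t.item()),
        negative_prompt_embeds=slerp(neg_embeds[0], neg_embeds[1], t.item()),
        latents=latents,
    ).images[0]
    frames.append(frame)
```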
[Media panel: Interpolating between two sets of prompt embeddings]
Interpolating between two sets of latent vectors
In this technique, we perform spherical linear interpolation between two latent vectors of the diffusion model itself instead of the prompt embeddings.
This technique is implemented as StableDiffusionMultiLatentInterpolationPipeline in the weave-diffusion repository. You can reproduce the results using this script.
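Reusing `pipe` and the `slerp` helper from the previous sketch, the latent variant fixes the prompt and instead interpolates between two initial noise tensors (again just a sketch; the seeds and prompt are arbitrary):

```python
import torch

# Two different initial noise tensors, drawn from different seeds.
shape = (1, pipe.unet.config.in_channels, 64, 64)
latents_a = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(0),
                        device="cuda", dtype=torch.float16)
latents_b = torch.randn(shape, generator=torch.Generator("cuda").manual_seed(1),
                        device="cuda", dtype=torch.float16)

frames = []
for t in torch.linspace(0, 1, steps=30):
    frame = pipe(
        prompt="a ball of yellow yarn, studio lighting",   # prompt held fixed
        latents=slerp(latents_a, latents_b, t.item()),     # interpolate the noise instead
    ).images[0]
    frames.append(frame)
```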
[Media panel: Interpolating between two sets of latent vectors]
Observations and conclusion
While the latent interpolation techniques were a definite improvement over the initial results obtained from the SDXL + SVD and AnimateDiff pipelines, we still needed to include some visuals in the video that would be instantly recognizable to someone familiar with the brand, beyond just the brand colors.
🧬 Controlled interpolation with Stable Diffusion
A ControlNet is an auxiliary network trained on top of a Stable Diffusion model that enables us to use an additional control image to condition and steer generation. For example, we could use an image of the Weights & Biases logo to control the interpolation process and generate the interpolated frames.
To that end, we implemented ControlnetInterpolationPipeline in the weave-diffusion repository. This script allows you to reproduce the results.
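As before, here's a rough sketch of the idea rather than the actual pipeline: a canny-edge ControlNet conditioned on an edge map of the logo, written with standard Diffusers components (the logo file name and prompt are placeholders):

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Build a canny-edge condition image from the logo (hypothetical file name).
logo = np.array(load_image("wandb_logo.png"))
edges = cv2.Canny(logo, 100, 200)
condition = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel condition image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="colorful threads and balls of yarn, yellow and black palette",  # illustrative prompt
    image=condition,                        # the control image steers the layout of each frame
    controlnet_conditioning_scale=1.0,
).images[0]
```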
[Media panel: The issue with conditioning and how we fixed it]
Observations and conclusion
- Using the vanilla canny-edge-detected version of the W&B logo as the condition image did not provide sufficiently strong conditioning on the overall structure of the logo in the generated frames, and it often led to too many circles being generated, making the W&B logo almost unrecognizable. This is probably because canny-edge conditioning for ControlNets depends on the arrangement of white pixels in the condition image: since the proportion of white pixels in the original condition image was very small compared to black ones, the conditioning was weak.
- To provide stronger conditioning, we increased the number of white pixels in the condition image by flood-filling the edges of the circles from the W&B logo; this ensured a better distribution of white pixels in the condition image and thus stronger conditioning for the generated frames (a rough sketch of this preprocessing follows this list).
- The condition image only constrains the overall structure of the foreground in the generated frames, while the prompts whose embeddings we interpolate provide the overall context and theme.
- The total number of interpolated frames is the product of the specified number of interpolated frames in each step and the number of interpolating prompts. Therefore, increasing one of these values can increase the length of the resultant video.
- The number of interpolated frames per step enables us to control the pace of individual interpolations, where increasing the value ensures smoother interpolations.
- The best way to make the video look diverse is to provide more prompts.
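Here's a rough sketch of the condition-image preprocessing described above; the logo path is a placeholder, and the flood-fill seed points are illustrative and would need to be chosen inside the logo's circles:

```python
import cv2
import numpy as np

logo = cv2.imread("wandb_logo.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
edges = cv2.Canny(logo, 100, 200)  # vanilla canny edges: mostly black, hence weak conditioning

# Flood-fill from seed points inside each circle so the enclosed regions turn white,
# increasing the share of white pixels in the condition image.
filled = edges.copy()
h, w = filled.shape
mask = np.zeros((h + 2, w + 2), np.uint8)  # floodFill requires a mask two pixels larger
for seed in [(w // 4, h // 2), (w // 2, h // 2), (3 * w // 4, h // 2)]:  # illustrative seeds
    cv2.floodFill(filled, mask, seed, 255)

cv2.imwrite("wandb_logo_condition.png", filled)
```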
🧶 Weaving the final product
Armed with the insights from our previous experiments, we generated a bunch of videos, which our awesome video production team could use to put together the final 10-second video.
[Run set: generated videos]
And with that, we had the building blocks to create our final video:
🚀 Weave for faster development cycles
Thanks for reading this far. We're proud of Weave and excited about its potential to help developers working with all types of generative models ship to production faster. If you try Weave and have any feedback or feature requests, we'd love to hear from you on GitHub.
[Embedded video: how Weave can be used to analyze LLM-driven multi-modal evaluation strategies for diffusion models]