Join us for our first-ever GenAI Salon, featuring Shreya Shankar, at our offices on August 15th from 5pm to 8pm. Shreya will talk about scaling up your “vibe checks” and evaluating your GenAI models. We’ll follow her talk with a discussion of the challenges we’re facing as we deploy GenAI models in production, and share the best practices we’re learning along the way.
Feel free to invite your friends. Space is limited, though, so please register soon if you plan to attend.
400 Alabama St, San Francisco, CA 94110
Shreya Shankar is a PhD student in computer science at UC Berkeley. Her research focuses on addressing data challenges in production AI and ML pipelines through a human-centered approach. Her work has appeared in top database and human-computer interaction venues like VLDB, SIGMOD, CIDR, and CSCW. She is a recipient of the NDSEG Fellowship and co-organizes the DEEM workshop at SIGMOD, which focuses on data management in end-to-end machine learning.
Large language models (LLMs) are increasingly used in custom pipelines that repeatedly process or generate data. Despite their usefulness, LLM pipelines often produce errors, which developers typically catch through manual “vibe checks.” This talk explores how to automate that process with evaluation assistants, presenting a method for automatically generating assertions and an interface that helps developers iterate on assertion sets. We share takeaways from a deployment with LangChain, where we auto-generated assertions for 2,000+ real-world LLM pipelines. Finally, we discuss insights from a qualitative study of how nine engineers use evaluation assistants, highlighting the subjective nature of “good” assertions and how assertion sets evolve as prompts, data, LLMs, and pipeline components change.