Data Validation in Production ML Pipelines

Gain expertise in data validation to build robust production ML pipelines, detect data drift, and manage data quality using cutting-edge automated toolkits.
2 Hours
Free

Sign up for this course to:

  • Understand why data validation matters and how it strengthens machine learning pipelines by handling data drift, schema violations, and data corruption.
  • Dive into hands-on examples and analyze real-world datasets with techniques such as schema validation, drift detection, and continual retraining.
  • Use tools like TensorFlow Data Validation (TFDV) and the GATE method to detect data drift and maintain data quality.
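
To give a feel for two of the techniques named above, here is a minimal sketch of schema validation and drift detection in plain Python. It is not taken from the course materials and does not use the TFDV API: the `SCHEMA` dictionary, the function names `validate_schema` and `linf_drift`, and the sample rows are all hypothetical, and the drift score is assumed to be a simple L-infinity distance between categorical value frequencies.

```python
from collections import Counter

# Hypothetical schema for illustration: column name -> expected Python type.
SCHEMA = {"age": int, "country": str}


def validate_schema(rows, schema):
    """Return a list of anomaly messages; an empty list means the batch conforms."""
    anomalies = []
    for i, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                anomalies.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                anomalies.append(
                    f"row {i}: column '{column}' is not {expected_type.__name__}"
                )
    return anomalies


def linf_drift(reference, current):
    """L-infinity distance between the value frequencies of two categorical samples."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref_counts, cur_counts = Counter(reference), Counter(current)
    values = set(ref_counts) | set(cur_counts)
    # Largest absolute difference in relative frequency across all observed values.
    return max(
        abs(ref_counts[v] / len(reference) - cur_counts[v] / len(current))
        for v in values
    )


# Assumed sample data: a clean training batch and a corrupted serving batch.
training_rows = [{"age": 31, "country": "US"}, {"age": 45, "country": "DE"}]
serving_rows = [{"age": "31", "country": "US"}, {"country": "US"}]

print(validate_schema(training_rows, SCHEMA))
print(validate_schema(serving_rows, SCHEMA))
print(linf_drift(["US", "DE"], ["US", "US"]))
```

In a real pipeline a drift score like this would be compared against a threshold chosen per feature, and a batch that exceeds it would be flagged for review or would trigger retraining.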

Curriculum

  • Introduction
  • Schema (Data Structure) Validation
  • Skew Detection
  • Automatic Data Validation System for ML Pipelines
  • Course Assessment
  • Conclusions
Course Reviews
Fantastic course. Crisp and clearly explained.
Using TFX and Schema Validation. Even though the whole course is wonderful, the fact that pyarrow raises an error when installed caused the modal to crash. I had to install tfdv locally, but that is OK, as I learnt more about the whole ML ecosystem in use, and the tfdv module is spot on for my current needs at work. I think the requirements.txt might need an update!
Effective MLOps: Data Validation for ML. Good datasets and very detailed code. This helped me a lot to understand more about datasets and training ML pipelines with wandb. Thank you so much.
Preventing Data Drifts in Production ML: Insights & Techniques. I got wonderful insights on predicting and preventing data drift in production ML pipelines, with real-world use cases and datasets. Tracking the ML pipeline with Weights & Biases and generating the report was interesting to learn. Thank you.
Course instructor

Shreya Shankar

Shreya Shankar is doing her PhD in databases at UC Berkeley. She is broadly interested in data management for machine learning (ML), with an emphasis on helping non-ML experts build and productionize ML pipelines. She is currently working on a new framework for building ML pipelines with automatic data validation, model retraining, and observability. Outside of research, she enjoys making ice creams, hiking, and sampling Bay Area coffee roasters.
Researcher, PhD student @ UC Berkeley