Data validation in production ML pipelines

Gain expertise in data validation to build robust production ML pipelines, detect data drift, and manage data quality using cutting-edge automated toolkits.

2 Hours

Free

Sign up for this course to:

Understand why data validation matters and how it strengthens machine learning pipelines by managing data drift, validating schemas, and handling data corruption
Dive into hands-on examples and analyze real-world datasets with techniques such as schema validation, drift detection, and continual retraining
Utilize powerful tools like TensorFlow Data Validation (TFDV) and the GATE method to effectively detect data drifts and maintain data quality
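To give a feel for the two core ideas named above, here is a minimal sketch of schema validation and drift detection in plain Python. It is not course material and does not use the TFDV API. The helper names (`infer_schema`, `validate_schema`, `linf_drift`) and the toy taxi-style rows are hypothetical, chosen only for illustration. The drift score is the L-infinity distance between two categorical distributions, one simple choice among many.

```python
from collections import Counter


def infer_schema(rows):
    """Infer a tiny schema: column name -> set of observed Python type names."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            schema.setdefault(col, set()).add(type(val).__name__)
    return schema


def validate_schema(rows, schema):
    """Return human-readable anomalies for rows that do not match the schema."""
    anomalies = []
    for i, row in enumerate(rows):
        for col in sorted(set(schema) - set(row)):
            anomalies.append(f"row {i}: missing column '{col}'")
        for col in sorted(set(row) - set(schema)):
            anomalies.append(f"row {i}: unexpected column '{col}'")
        for col, val in row.items():
            if col in schema and type(val).__name__ not in schema[col]:
                anomalies.append(
                    f"row {i}: column '{col}' has type {type(val).__name__}"
                )
    return anomalies


def linf_drift(train_values, serving_values):
    """L-infinity distance between two empirical categorical distributions."""
    p, q = Counter(train_values), Counter(serving_values)
    n_p, n_q = len(train_values), len(serving_values)
    return max(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))


# Hypothetical toy data: training rows define the schema, serving rows are checked.
train = [
    {"city": "SF", "fare": 12.5},
    {"city": "NYC", "fare": 30.0},
    {"city": "SF", "fare": 8.0},
    {"city": "SF", "fare": 9.5},
]
serving = [
    {"city": "NYC", "fare": "30"},              # wrong type for 'fare'
    {"city": "NYC"},                            # missing 'fare'
    {"city": "SF", "fare": 7.0, "tip": 1.0},    # unexpected 'tip'
    {"city": "NYC", "fare": 11.0},
]

schema = infer_schema(train)
anomalies = validate_schema(serving, schema)
drift = linf_drift([r["city"] for r in train], [r["city"] for r in serving])

for message in anomalies:
    print(message)
print(f"city drift (L-infinity): {drift:.2f}")
```

In a pipeline, a check like this would typically run before training or serving, with an alert or a blocked deployment when anomalies appear or the drift score crosses a threshold chosen for the feature.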

Curriculum

Introduction
Schema (Data Structure) validation
Skew detection
Automatic data validation systems for ML pipelines
Course assessment
Conclusions

Course Reviews

Fantastic course

Crisp and clearly explained

Using TFX and Schema Validation.

Even though the whole course is wonderful, the fact that pyarrow raises an error when installed caused the modal to crash. I had to install tfdv locally, but it is ok as I learnt more about the whole ML ecosystems in use and also the tfdv module spot on to my current needs in the work. I think the requirements txt might need an update!

Effective MLOps: Data Validation for ML.

Good datasets and very detailed codes. This helped me a lot to understand more on datasets, ML pipelines training with wandb. Thank you so much.

Preventing Data Drifts in Production ML: Insights & Techniques.

I got wonderful insights on predicting and preventing data drifts in production ML Pipeline with real world use cases and dataset. Tracking the ML Pipeline with weights and biases was interesting to learn and generate the report. Thank you

Course instructor

Shreya Shankar

Researcher and PhD student @ UC Berkeley

Shreya Shankar is doing her PhD in databases at UC Berkeley. She is broadly interested in data management for machine learning (ML), with an emphasis on helping non-ML experts build and productionize ML pipelines. She is currently working on a new framework for building ML pipelines with automatic data validation, model retraining, and observability. Outside of research, she enjoys making ice creams, hiking, and sampling Bay Area coffee roasters.

Explore our other courses

MLOPS

Model CI/CD

Overcome model chaos, automate key workflows, ensure governance, and streamline the end-to-end model lifecycle. This course will provide you with the concepts, best practices, and tools to level up your model management and drive success.

MLOPS

Effective MLOps: Model development

Bringing machine learning models to production is challenging, with a continuous iterative lifecycle that consists of many complex components. Having a disciplined, flexible and collaborative process - an effective MLOps system - is crucial to enabling velocity and rigor, and building an end-to-end machine learning pipeline that continually delivers production-ready ML models and services.

MLOPS

CI/CD for machine learning (GitOps)

Streamline your ML workflows and save valuable time by automating your pipelines and deploying models with confidence. Learn how to use GitHub Actions and integrate W&B experiment tracking in this practical, hands-on learning experience.

MLOPS

Machine learning for business decision optimization

Learn to optimize decision rules, translating machine learning predictions into actionable insights. Discover how to achieve practical value and business impact by measuring performance using business metrics, and deploy ML models successfully.

MLOPS

W&B PLATFORM

Weights & Biases 201: Registry

This compact course, led by ML Success Engineer Ken Lee, dives into advanced model management utilizing Weights and Biases for logging, registering, and managing ML models.

MLOPS

W&B PLATFORM

Weights & Biases 101

This course is a gentle introduction to Weights & Biases with a focus on experiment tracking. Learn to track, visualize, and optimize your ML experiments, streamline collaboration with your team, and make your projects efficient and reproducible.
