Statistical Types for Pandas DataFrames and Friends: Parse, Validate, and Synthesize DataFrames with Generative Schemas
This webinar is presented by machine learning engineer Niels Bantilan. As well as working as an ML engineer at Union.ai, Niels is the creator of UnionML and Pandera and an open-source maintainer of Flyte.
DataFrames are one of the key data structures that data practitioners use to manipulate tabular data. The pandas library, for example, is flexible and powerful, but working with DataFrames for complex use cases often leads to unexpected data types, invalid values, and overall opacity in the contents of a particular DataFrame as it’s transformed from its raw form to one that’s ready for analysis.
In this workshop, you will learn how to ensure data quality with Pandera, a statistical data testing tool for pandas-like DataFrames, so that you can be more confident in the correctness of your code.
• Why should I validate data?
• What’s data testing, and how can I put it into practice?
• Pandera quickstart: create statistical types for your DataFrames
• Example 1: Validate your data analysis
• Example 2: Validate your machine learning pipeline
• Conclusion: How can I start using Pandera in my work?
The Python Community
Python Live is a series of free, live events designed to connect the tech community and educate Python, AI, and machine learning professionals worldwide. By bringing successful people together, we aim to add value for tech professionals who want to develop their skills and for tech leaders who want to help their teams develop.
The Live Series Webinars
How Can I Attend Webinars?
Simply check out all our upcoming events via our Eventbrite page. Head there now to register as spaces can fill up quickly!
This webinar will be broadcast live and for free on Thursday 28th July.