MLOps using AWS SageMaker Session Slides - Session 3 - Data Wrangler and Feature Store
Meetup Link
https://www.meetup.com/dataopslabs/events/290943425/
Summary
This presentation covers Amazon SageMaker Data Wrangler and Feature Store: what each does and how they fit into the machine learning pipeline. In a live demonstration, attendees will see Data Wrangler transform data and publish it to the Feature Store for ML pipelines, drawing on sources such as S3, Athena, Redshift, EMR, Snowflake, Databricks, and Salesforce, with Amazon AppFlow extending coverage to more than 40 data sources. Key points include creating a data flow for ML data preparation, using the standard transforms to clean and transform datasets, and exporting data preparation workflows to destinations including S3, SageMaker Pipelines, Feature Store, and a Python script or notebook.
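As a rough illustration of the kind of data preparation described above, the sketch below cleans a small dataset in plain Python and shapes it into records that carry a record identifier and an event time, the two fields Feature Store expects on every record. It does not call Data Wrangler or any AWS API; the column names (`customer_id`, `age`, `city`), the sample rows, and the helper `prepare_records` are hypothetical and exist only for this example.

```python
from datetime import datetime, timezone


def prepare_records(rows, event_time=None):
    """Drop incomplete rows, deduplicate on customer_id, and stamp an event time.

    Hypothetical helper: it mimics a few common cleaning transforms by hand,
    it is not how Data Wrangler is implemented.
    """
    # Default to the current UTC time as an ISO 8601 string.
    event_time = event_time or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    seen_ids = set()
    records = []
    for row in rows:
        # Cleaning: skip rows that have any missing value.
        if any(value in (None, "") for value in row.values()):
            continue
        # Deduplication: keep only the first row seen for each identifier.
        if row["customer_id"] in seen_ids:
            continue
        seen_ids.add(row["customer_id"])
        records.append({
            "customer_id": str(row["customer_id"]),   # record identifier
            "age": int(row["age"]),                   # cast text to a number
            "city": row["city"].strip().title(),      # normalize formatting
            "event_time": event_time,                 # event time for the record
        })
    return records


raw = [
    {"customer_id": 1, "age": "34", "city": " chennai "},
    {"customer_id": 2, "age": None, "city": "Mumbai"},      # missing age
    {"customer_id": 1, "age": "34", "city": "Chennai"},     # duplicate id
    {"customer_id": 3, "age": "51", "city": "bengaluru"},
]
records = prepare_records(raw, event_time="2023-01-15T00:00:00Z")
```

In Data Wrangler these steps would be configured as transforms in a data flow rather than written by hand; the sketch only shows the shape of the resulting records.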
Integration with the AWS Glue Data Catalog keeps feature storage consistent, lets users query features with SQL tools such as Amazon Athena, and makes existing features easy to discover and reuse. The Feature Store supports both offline and online storage and keeps the two synchronized, which matters for model accuracy because training and inference then read the same feature values. Lineage tracking and point-in-time queries add confidence in feature reuse and support requirements such as training models on past feature values. Within the MLOps lifecycle, SageMaker Feature Store manages datasets and feature pipelines so that data science teams spend less time on routine data work and avoid creating the same features twice.
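To make the offline/online distinction and the idea of reading past feature values more concrete, here is a minimal conceptual sketch in plain Python. It models the offline store as an append-only history and the online store as the latest record per identifier, then looks up the value that was current at a given time. This is an assumption-laden toy model, not the Feature Store API: the field names (`record_id`, `event_time`, `avg_spend`) and the helpers `build_online_view` and `point_in_time_lookup` are hypothetical.

```python
from bisect import bisect_right


def build_online_view(history):
    """Keep only the most recent record per record_id (toy 'online store')."""
    latest = {}
    for record in history:
        current = latest.get(record["record_id"])
        # ISO 8601 strings in one fixed format compare correctly as text.
        if current is None or record["event_time"] > current["event_time"]:
            latest[record["record_id"]] = record
    return latest


def point_in_time_lookup(history, record_id, as_of):
    """Return the newest record for record_id with event_time <= as_of, or None."""
    versions = sorted(
        (r for r in history if r["record_id"] == record_id),
        key=lambda r: r["event_time"],
    )
    times = [r["event_time"] for r in versions]
    # Index of the first version strictly later than as_of.
    idx = bisect_right(times, as_of)
    return versions[idx - 1] if idx > 0 else None


# Toy 'offline store': every version of every record is retained.
offline_store = [
    {"record_id": "c1", "event_time": "2023-01-01T00:00:00Z", "avg_spend": 120.0},
    {"record_id": "c1", "event_time": "2023-02-01T00:00:00Z", "avg_spend": 150.0},
    {"record_id": "c2", "event_time": "2023-01-15T00:00:00Z", "avg_spend": 80.0},
]

online_store = build_online_view(offline_store)
past = point_in_time_lookup(offline_store, "c1", "2023-01-20T00:00:00Z")
```

Here `online_store["c1"]` holds the February value that low-latency inference would read, while `past` holds the January value that a training example dated 20 January should see, which avoids leaking later information into training data.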