Data engineering practice, including building data pipelines (ELT) from a variety of sources.
Building a hybrid data pipeline architecture that combines Microsoft Fabric, Azure, and Power BI, covering real-time data ingestion, multi-layered processing, and analytics for business-critical insights.
Building an ETL pipeline that extracts data from S3 and stages it in Redshift.
Summary notes on the Snowflake cloud data warehouse. (Complete ✅)
This project builds a cloud-based ETL pipeline for Sparkify to move data to a cloud data warehouse. It extracts song and user activity data from AWS S3, stages it in Redshift, and transforms it into a star-schema data model with fact and dimension tables, enabling efficient querying to answer business questions.
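A minimal sketch of the stage-then-transform pattern this describes; the bucket, IAM role, and table/column names are placeholder assumptions, not the project's actual schema:

-- Stage raw JSON event data from S3 into a Redshift staging table
COPY staging_events
FROM 's3://sparkify-bucket/log_data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
REGION 'us-west-2'
FORMAT AS JSON 'auto';

-- Transform staged rows into a star-schema fact table
INSERT INTO songplays (start_time, user_id, song_id, artist_id, session_id)
SELECT TIMESTAMP 'epoch' + e.ts / 1000 * INTERVAL '1 second',
       e.user_id, s.song_id, s.artist_id, e.session_id
FROM staging_events e
JOIN staging_songs s
  ON e.song = s.title AND e.artist = s.artist_name
WHERE e.page = 'NextSong';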
Automating Data Workflows in Snowflake with Task Scheduling & Management.
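For readers new to Snowflake tasks, a minimal sketch of a scheduled task; the warehouse, tables, and schedule are illustrative assumptions:

-- Hourly task that refreshes a summary table
CREATE OR REPLACE TASK refresh_daily_sales
  WAREHOUSE = compute_wh
  SCHEDULE = 'USING CRON 0 * * * * UTC'
AS
  INSERT INTO daily_sales_summary
  SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK refresh_daily_sales RESUME;

-- Inspect recent executions
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
ORDER BY scheduled_time DESC;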
This project demonstrates Snowflake Streams for change data capture. It covers creating streams to track INSERT, UPDATE, and DELETE operations on tables, loading data from S3, querying captured changes, and managing stream objects for real-time data monitoring.
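A short sketch of that stream workflow, with table names assumed for illustration:

-- Create a stream to capture DML changes on a base table
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- Changes to the base table are recorded by the stream
INSERT INTO orders (order_id, amount) VALUES (1, 100.00);
UPDATE orders SET amount = 120.00 WHERE order_id = 1;

-- METADATA$ACTION and METADATA$ISUPDATE describe each captured change
SELECT order_id, amount, METADATA$ACTION, METADATA$ISUPDATE
FROM orders_stream;

-- Consuming the stream in a DML statement advances its offset
INSERT INTO orders_audit
SELECT order_id, amount, METADATA$ACTION FROM orders_stream;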
The objective of this task is to create and configure a new virtual warehouse in Snowflake. Warehouses are crucial for query execution and data processing, as they provide the compute resources required to run SQL statements.
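A minimal example of such a warehouse definition; the name, size, and timeout are illustrative choices:

-- Create a small warehouse that suspends itself when idle
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 300          -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- Point the current session at the new warehouse
USE WAREHOUSE analytics_wh;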
End-to-end pipeline analysing Yelp reviews using AWS S3, Snowflake, Python UDFs, and advanced SQL sentiment analysis.
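As a rough illustration of a Snowflake Python UDF for scoring review text (the function, word lists, and table/column names here are hypothetical, not the project's actual model):

-- Naive word-count sentiment score as a Python UDF
CREATE OR REPLACE FUNCTION simple_sentiment(review STRING)
RETURNS FLOAT
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
HANDLER = 'score'
AS
$$
POSITIVE = {'good', 'great', 'excellent', 'love'}
NEGATIVE = {'bad', 'poor', 'terrible', 'hate'}

def score(review):
    words = (review or '').lower().split()
    return float(sum((w in POSITIVE) - (w in NEGATIVE) for w in words))
$$;

-- Average sentiment per business
SELECT business_id, AVG(simple_sentiment(text)) AS avg_sentiment
FROM yelp_reviews
GROUP BY business_id;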
Hands-on project covering Snowflake data loading with custom file formats, validation modes, error handling, string length limits, TRUNCATECOLUMNS, and analyzing load history using account_usage.load_history.
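For context, a sketch of the statements involved; the stage, table, and format names are assumptions:

-- Custom file format for CSV files with a header row
CREATE OR REPLACE FILE FORMAT csv_ff
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- Dry run: report problems without loading any rows
COPY INTO customers FROM @my_s3_stage/customers/
  FILE_FORMAT = (FORMAT_NAME = 'csv_ff')
  VALIDATION_MODE = RETURN_ERRORS;

-- Real load; truncate strings that exceed the target column length
COPY INTO customers FROM @my_s3_stage/customers/
  FILE_FORMAT = (FORMAT_NAME = 'csv_ff')
  TRUNCATECOLUMNS = TRUE;

-- Review load outcomes across the account
SELECT table_name, status, row_count, error_count, last_load_time
FROM snowflake.account_usage.load_history
ORDER BY last_load_time DESC;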
This project demonstrates data sampling techniques in Snowflake. It covers loading datasets from S3, performing RANDOM and SYSTEM sampling to extract subsets, validating sampled data, and optimizing analysis of large datasets.
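A few illustrative sampling queries (table name assumed); BERNOULLI/ROW is Snowflake's row-level random method, while SYSTEM/BLOCK samples whole micro-partitions:

-- Row-level sampling: each row has a 10% chance of being selected
SELECT * FROM web_events SAMPLE BERNOULLI (10);

-- Block-level sampling: faster on large tables, less uniform
SELECT * FROM web_events SAMPLE SYSTEM (5);

-- Repeatable sample for validating downstream logic
SELECT * FROM web_events SAMPLE (10) SEED (42);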
Hands-on project showcasing error handling during Snowflake data loading, using VALIDATION_MODE, ON_ERROR = CONTINUE, ON_ERROR = SKIP_FILE, and the SKIP_FILE_<num> / SKIP_FILE_<num>% threshold variants while ingesting CSV files from AWS S3.
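A sketch of the three ON_ERROR behaviours, assuming a stage named @my_s3_stage and an orders table:

-- Skip bad rows but keep loading the rest of each file
COPY INTO orders FROM @my_s3_stage/orders/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'CONTINUE';

-- Skip any file that contains at least one bad row
COPY INTO orders FROM @my_s3_stage/orders/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'SKIP_FILE';

-- Skip a file only once it has 3 or more bad rows
COPY INTO orders FROM @my_s3_stage/orders/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'SKIP_FILE_3';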
Moved, cleaned, and transformed JSON data stored in S3 into Redshift.
This project demonstrates how to use Snowflake stages for loading data from Amazon S3 into Snowflake tables. It also covers applying transformations during loading and selecting only specific columns from the source data.
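A compact sketch of that pattern; the bucket URL, stage name, and column positions are placeholders:

-- External stage pointing at an S3 bucket
CREATE OR REPLACE STAGE customer_stage
  URL = 's3://my-bucket/customer_data/'
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Load only selected source columns and transform them during the COPY
COPY INTO customers (customer_id, full_name, signup_date)
FROM (
  SELECT $1, UPPER($2), TO_DATE($4, 'YYYY-MM-DD')
  FROM @customer_stage
);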
This project explores Snowflake’s table types, including Permanent, Temporary, Transient, and External tables. It demonstrates creating tables, loading data from S3 stages, querying and validating data, and understanding differences in persistence, retention, and Time Travel support.
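For reference, the four table types side by side; the column definitions and the stage are illustrative:

-- Permanent: full Time Travel plus Fail-safe
CREATE OR REPLACE TABLE customers_perm (id INT, name STRING);

-- Transient: no Fail-safe, cheaper storage, limited Time Travel
CREATE OR REPLACE TRANSIENT TABLE customers_transient (id INT, name STRING);

-- Temporary: visible only to the current session
CREATE OR REPLACE TEMPORARY TABLE customers_temp (id INT, name STRING);

-- External: queries files in a stage without ingesting them
CREATE OR REPLACE EXTERNAL TABLE customers_ext (
  id INT AS (VALUE:c1::INT),
  name STRING AS (VALUE:c2::STRING)
)
LOCATION = @customer_stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);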
This project explores Snowflake’s Time Travel feature, including querying historical data using offsets, retention periods, and query IDs. It demonstrates restoring previous table states after updates, managing retention settings, and recovering data efficiently.
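Illustrative Time Travel queries; the table name, offsets, and the statement ID are placeholders:

-- Query the table as it looked 5 minutes ago
SELECT * FROM orders AT (OFFSET => -60 * 5);

-- Query the state just before a specific statement ran
SELECT * FROM orders BEFORE (STATEMENT => '01b2c3d4-0000-1111-2222-333344445555');

-- Restore a previous state by cloning it under a new name
CREATE OR REPLACE TABLE orders_restored CLONE orders AT (OFFSET => -60 * 5);

-- Extend the retention window
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Recover a dropped table within the retention window
DROP TABLE orders;
UNDROP TABLE orders;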
Building a cloud data warehouse with AWS Redshift.
This project demonstrates Snowflake table cloning and swapping techniques. It covers creating original and cloned tables, loading data from S3, verifying cloned data, and performing table swaps to efficiently exchange data between staging and production tables.
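The core statements behind that workflow, with staging and production table names assumed:

-- Zero-copy clone of the production table into a staging copy
CREATE OR REPLACE TABLE orders_staging CLONE orders;

-- ...load and validate new data in orders_staging, then swap...

-- Atomically exchange the staging and production tables
ALTER TABLE orders_staging SWAP WITH orders;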
🌨️ Load and transform data from Amazon S3 into Snowflake efficiently using stages, enhancing your data ingestion practices without altering source files.