DE Onboarding - Foundations
This curriculum onboards programmers into data engineering, with a specific focus on PostgreSQL, BigQuery, SQLite, and Kubernetes. Each section is designed to take approximately 52 minutes (one lunch break), so it fits into your daily schedule. It is part of the Data Engineering curriculum at Hijra Group. To follow along, see this onboarding project’s source code.
Phase 1: Python Foundations
- 01 - Python Core Language Essentials
- 02 - Python Data Handling and Error Management
- 03 - Essential Data Libraries: NumPy and Pandas Basics
- 04 - Web Integration and APIs
- 05 - Object-Oriented Programming for Data Engineering
- 06 - Checkpoint 1: Python Foundation Review
Phase 2: Python Code Quality
- 07 - Static Typing with Python
- 08 - Python Annotations and Decorators
- 09 - Introduction to Testing in Python
- 10 - Data Engineering Code Quality
- 11 - Checkpoint 2: Python Code Quality Review
Phase 3A: Database Fundamentals I
- 12 - SQL Fundamentals with SQLite
- 13 - Python and SQLite Integration
- 14 - Advanced Database Operations with SQLite
- 15 - Type-Safe Database Programming
- 16 - PostgreSQL Fundamentals
- 17 - Python and PostgreSQL Integration
- 18 - Checkpoint 3A: Database Fundamentals I Review
Phase 3B: Database Fundamentals II
- 19 - Advanced SQL Querying with SQLite
- 20 - SQLite Indexing and Optimization
- 21 - Advanced PostgreSQL Querying
- 22 - PostgreSQL Indexing and Optimization
- 23 - Type-Safe Database Integration
- 24 - Checkpoint 3B: Database Fundamentals II Review
Phase 4: Cloud Analytics
- 25 - BigQuery Fundamentals
- 26 - Python and BigQuery Integration
- 27 - BigQuery Advanced Querying
- 28 - BigQuery Data Warehousing
- 29 - BigQuery Optimization Techniques
- 30 - Checkpoint 4: Cloud Analytics Review
Phase 5: Analytical Storage
- 31 - Data Lakes with Google Cloud Storage
- 32 - Data Marts with BigQuery
- 33 - BigQuery and Google Sheets Integration
- 34 - Python for Data Lake Processing: Foundations
- 35 - Google Cloud Storage Advanced Features
- 36 - Python for Data Lake Processing: Optimization
- 37 - Checkpoint 5: Analytical Storage Review
Phase 6: Advanced Data Processing
- 38 - Advanced NumPy
- 39 - Advanced Pandas
- 40 - Concurrency in Python
- 41 - Type-Safe Data Processing
- 42 - Testing Data Pipelines
- 43 - Advanced Testing Techniques
- 44 - Checkpoint 6: Advanced Data Processing Review
Phase 7: Web and Database Integration
- 45 - Jupyter Notebooks for Data Development
- 46 - Data Access Patterns for Applications
- 47 - Advanced PostgreSQL Features
- 48 - PostgreSQL Performance Optimization
- 49 - BigQuery Advanced Optimization
- 50 - Data Visualization and BI Tools
- 51 - Checkpoint 7: Web and Database Integration Review
Phase 8: Pipeline Orchestration
- 52 - Introduction to Django
- 53 - Introduction to FastAPI
- 54 - dbt for Data Transformation
- 55 - Simple Scheduling with Python
- 56 - Airflow Fundamentals
- 57 - Airflow in Docker
- 58 - Building Complex Airflow Workflows
- 59 - Checkpoint 8: Pipeline Orchestration Review
Phase 9: Production Deployment
- 60 - Docker for Data Applications
- 61 - Kubernetes Fundamentals
- 62 - Deploying Data Applications to Kubernetes
- 63 - PostgreSQL in Kubernetes
- 64 - Airflow in Kubernetes
- 65 - Security Best Practices for Data Pipelines
- 66 - Checkpoint 9: Production Deployment Review
Phase 10: Capstone Project
- 67 - Capstone Project Planning
- 68 - Capstone Project Implementation Part 1
- 69 - Capstone Project Implementation Part 2
- 70 - Capstone Project Implementation Part 3