07 - Static Typing with Python
Complexity: Moderate (M)
7.0 Introduction: Why This Matters for Data Engineering
Static typing in Python enhances code reliability and maintainability, critical for Hijra Group’s Sharia-compliant fintech analytics, where data pipelines process millions of financial transactions. Think of static typing like a grocery list with specific items (e.g., ‘2 apples’) versus a vague note (‘some fruit’). Just as the list ensures you buy exactly what’s needed, type annotations ensure your code uses the right data types, preventing errors in Hijra Group’s financial pipelines. By enforcing type safety with tools like Pyright, errors like passing a string to a numeric function are caught before runtime, reducing bugs in production. For example, a pipeline handling sales.csv can ensure price is a float, preventing type-related failures. This chapter builds on Chapters 1–6 (Python basics, data handling, NumPy, Pandas, OOP) and introduces type annotations, Pyright, and Generics, ensuring robust pipelines. All Python code from this chapter onward includes type annotations verified by Pyright, per the curriculum’s guidelines, using 4-space indentation per PEP 8 (spaces, not tabs) to avoid IndentationError.
Data Engineering Workflow Context
Static typing fits into the pipeline as follows:
flowchart TD
A["Raw Data (CSV)"] --> B["Type-Annotated Python Scripts"]
B --> C{"Type Checking"}
C -->|Pyright| D["Validated Code"]
D --> E["Process Data (NumPy/Pandas)"]
E --> F["Output (CSV/JSON)"]
F --> G["Storage/Analysis"]
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
classDef tool fill:#ffddff,stroke:#636,stroke-width:1px
classDef storage fill:#ddffdd,stroke:#363,stroke-width:1px
class A,F data
class B,D,E process
class C tool
class G storageBuilding On and Preparing For
- Building On:
- Chapter 1: Extends Python syntax with type annotations for variables and functions.
- Chapter 2: Enhances file handling and validation with typed inputs/outputs.
- Chapter 3: Applies type safety to Pandas/NumPy operations on
sales.csv. - Chapter 5: Uses OOP to structure typed classes for data processing.
- Chapter 6: Integrates type annotations into consolidated Python skills.
- Preparing For:
- Chapter 8: Adds annotations and decorators for logging/monitoring.
- Chapter 9: Enables type-safe testing with
unittest/pytest. - Chapter 13: Supports type-safe SQLite integration with Pydantic.
- Chapter 41: Prepares for type-safe data processing with advanced Pydantic.
What You’ll Learn
This chapter covers:
- Type Annotations: Syntax for variables, functions, and classes, including the
Anytype. - Pyright: Setup and usage for type checking.
- Generics: Flexible typing for reusable code.
- Basic Exception Handling: Typed exception handling (introduced here, expanded in Chapter 8).
- Micro-Project: A type-safe sales data processor with Pyright verification.
By the end, you’ll build a type-safe processor for data/sales.csv, validating data with type annotations, checking with Pyright, and handling basic exceptions, all with 4-space indentation per PEP 8. The project uses sales.csv and config.yaml from Appendix 1, preparing for advanced typing and testing.
Follow-Along Tips:
- Ensure
de-onboarding/data/containssales.csvandconfig.yamlper Appendix 1. - Install libraries:
pip install numpy pandas pyyaml pyright. - Configure editor for 4-space indentation (VS Code: “Editor: Tab Size” = 4, “Editor: Insert Spaces” = true, “Editor: Detect Indentation” = false).
- Run
python -tt script.pyto detect tab/space mixing. - Use print statements (e.g.,
print(df.dtypes)) to debug DataFrames. - Verify Pyright setup with
pyright --version. - Use UTF-8 encoding to avoid
UnicodeDecodeError.
7.1 Type Annotations Basics
Type annotations, introduced in Python 3.5 via PEP 484, allow specifying variable, function, and class types, checked statically by tools like Pyright. They don’t affect runtime but catch errors during development, reducing bugs in data pipelines. For example, annotating price: float ensures a pipeline processes financial data correctly.
7.1.1 Variable and Function Annotations
Annotate variables and functions using : type and -> return_type. Common types include built-in (int, float, str) and typing module types (List, Dict, Optional).
# File: de-onboarding/type_basics.py
from typing import List, Dict, Any, Optional # Import typing module
# Variable annotations
price: float = 999.99 # Annotate price as float
quantity: int = 2 # Annotate quantity as integer
products: List[str] = ["Halal Laptop", "Halal Mouse"] # List of strings
config: Dict[str, Any] = {"min_price": 10.0} # Dictionary with string keys
# Optional[str] is useful for pipeline configs, e.g., optional file paths in data ingestion
config_path: Optional[str] = None # Optional type allows None; covered fully in Chapter 13
# Function annotations
def calculate_total(prices: List[float], quantities: List[int]) -> float:
"""Calculate total sales with type annotations."""
if len(prices) != len(quantities): # Validate input lengths
print("Error: Mismatched lengths") # Debug
return 0.0
total: float = sum(p * q for p, q in zip(prices, quantities)) # Compute total
print(f"Total: {total}") # Debug
return total
# Test function
result: float = calculate_total([999.99, 24.99], [2, 10]) # Call with valid types
print(f"Result: {result}") # Print result
# Expected Output:
# Total: 2249.88
# Result: 2249.88The Any Type
The Any type allows any value, offering flexibility when types are unknown, like in config: Dict[str, Any] for mixed YAML data. However, use it as a last resort, as it bypasses type checking, risking errors in pipelines. Using Any for transaction amounts could lead to errors, like treating a string as a number, disrupting Hijra Group’s analytics. Prefer specific types (e.g., float, str) for reliability.
from typing import Any
risky: Any = "1000.00" # Could be str, float, etc., risking misprocessing transactions
safe: float = 1000.00 # Enforces float, safer for financial calculationsThis diagram shows how Any skips checks, risking errors, while float ensures safety:
flowchart TD
A["Input: Any"] --> B["Function
No Type Checking"]
B --> C["Risky Output
Possible Errors"]
D["Input: float"] --> E["Function
Type Checking"]
E --> F["Safe Output
Reliable"]
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
classDef error fill:#ffdddd,stroke:#933,stroke-width:1px
classDef safe fill:#ddffdd,stroke:#363,stroke-width:1px
class A,D data
class B,E process
class C error
class F safeFollow-Along Instructions:
- Ensure
de-onboarding/exists. - Save as
de-onboarding/type_basics.py. - Configure editor for 4-space indentation per PEP 8.
- Install Pyright:
pip install pyright. - Run Pyright:
pyright type_basics.py. - Run script:
python type_basics.py. - Verify output matches comments.
- Common Errors:
- Pyright Error: If
reportGeneralTypeIssues, ensure correct types (e.g.,List[float]notlist[float]). Check withpyright --verbose. - ModuleNotFoundError: Install
pyrightwithpip install pyright. - IndentationError: Use 4 spaces. Run
python -tt type_basics.py.
- Pyright Error: If
Key Points:
- Syntax:
: typefor variables/arguments,-> typefor return values. - Types: Built-in (
int,float,str) andtypingmodule (List,Dict,Any,Optional). - Any Type: Use sparingly to avoid unchecked types; prefer specific types for pipeline safety.
- Time Complexity: Annotations are checked statically, adding no runtime cost (O(1)).
- Space Complexity: Minimal memory for type metadata during static analysis.
- Implication: Ensures type safety for pipeline inputs, e.g.,
priceasfloatin sales data.
7.1.2 Class Annotations
Annotate class attributes and methods for structured data processing.
# File: de-onboarding/class_types.py
from typing import List
class Sale:
product: str # Class attribute annotation
price: float
quantity: int
def __init__(self, product: str, price: float, quantity: int) -> None:
"""Initialize Sale with type annotations."""
self.product = product
self.price = price
self.quantity = quantity
def get_amount(self) -> float:
"""Calculate sale amount."""
amount: float = self.price * self.quantity
print(f"Amount for {self.product}: {amount}") # Debug
return amount
# Test class
sale: Sale = Sale("Halal Laptop", 999.99, 2) # Create instance
amount: float = sale.get_amount() # Call method
print(f"Total Amount: {amount}") # Print result
# Expected Output:
# Amount for Halal Laptop: 1999.98
# Total Amount: 1999.98Follow-Along Instructions:
- Save as
de-onboarding/class_types.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright class_types.py. - Run script:
python class_types.py. - Verify output matches comments.
- Common Errors:
- Pyright Error: Ensure method return types match (e.g.,
-> float). Checkpyright --verbose. - IndentationError: Use 4 spaces. Run
python -tt class_types.py.
- Pyright Error: Ensure method return types match (e.g.,
Key Points:
- Class Annotations: Attributes use
: typeat class level or in__init__. - Methods: Annotate arguments and return types like functions.
- Implication: Structures data for pipelines, e.g.,
Saleclass for sales records.
7.2 Pyright Setup and Usage
Pyright is a static type checker that verifies annotations, catching errors like passing a str to a float parameter. It’s faster than mypy, as you explored in our April 19, 2025 discussion, and integrates with VS Code.
7.2.1 Setting Up Pyright
Pyright is like a spell-checker for types, scanning your code before it runs. Setting it up involves a configuration file to tell Pyright what to check.
flowchart TD
A["Python Code"] --> B["Pyright Config
pyproject.toml"]
B --> C["Pyright Checks"]
C --> D["Error Report"]
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
class A,D data
class B,C process# File: de-onboarding/pyproject.toml
[tool.pyright]
include = ["*.py"] # Check all Python files
pythonVersion = "3.10" # Specify Python version
reportMissingImports = true # Report missing imports
reportGeneralTypeIssues = true # Report type errorsFollow-Along Instructions:
- Create
de-onboarding/pyproject.toml. - Copy content above.
- Install Pyright:
pip install pyright. - Verify:
pyright --version. - Common Errors:
- Pyright Not Found: Install with
pip install pyright. - Config Error: Ensure
pyproject.tomlis inde-onboarding/. Printcat pyproject.toml(Unix) ortype pyproject.toml(Windows).
- Pyright Not Found: Install with
7.2.2 Running Pyright
Check a script with Pyright, demonstrating common type errors.
# File: de-onboarding/pyright_example.py
from typing import List
def sum_sales(prices: List[float], quantities: List[int]) -> float:
"""Sum sales with type annotations."""
total: float = sum(p * q for p, q in zip(prices, quantities))
return total
# Example 1: Incorrect return type
result: int = sum_sales([999.99, 24.99], [2, 10]) # Error: int instead of float
print(result)
# Example 2: Incorrect input type (uncomment to test Pyright error)
# prices: List[str] = [999.99, 24.99] # Error: List[str] instead of List[float]
# result: float = sum_sales(prices, [2, 10])
# print(result)
# Fix for Example 2 (comment out erroneous lines above and use this)
prices: List[float] = [999.99, 24.99] # Correct type
result: float = sum_sales(prices, [2, 10]) # Correct return type
print(result)Follow-Along Instructions:
- Save as
de-onboarding/pyright_example.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright pyright_example.py. - Expected Pyright Output for Example 1:
/de-onboarding/pyright_example.py /de-onboarding/pyright_example.py:7:10 - error: Type of "result" is incompatible with declared type Expected: "int" Actual: "float" (reportGeneralTypeIssues) - Test Example 2 by uncommenting erroneous lines, running Pyright, and observing:
/de-onboarding/pyright_example.py /de-onboarding/pyright_example.py:11:10 - error: Argument of type "List[str]" cannot be assigned to parameter "prices" of type "List[float]" (reportGeneralTypeIssues) - Fix by using the corrected lines, re-run Pyright to confirm no errors, and run script:
python pyright_example.py. - Expected Output:
2249.88 - Common Errors:
- Pyright Error: Fix type mismatches. Use
pyright --verbosefor details. If output is unclear, usepyright --verbose | grep error(Unix) orpyright --verbose | findstr error(Windows) to filter errors. These commands use basic command-line skills from prerequisites; review LinuxCommand.org if needed. - IndentationError: Use 4 spaces. Run
python -tt pyright_example.py.
- Pyright Error: Fix type mismatches. Use
If Pyright errors confuse you, start by checking variable types with print(type(var)) in your code. For example, if processing transaction prices, ensure price is float to avoid miscalculations in Hijra Group’s reports. This simple step helps debug financial data errors.
Key Points:
- Pyright: Checks types statically, reporting errors before runtime.
- Time Complexity: O(n) for checking n lines of code, typically fast (<1s for small projects).
- Space Complexity: O(n) for storing type metadata.
- Implication: Catches type errors in pipelines, e.g., ensuring
priceisfloat.
7.3 Generics for Reusable Code
Generics, from the typing module, allow flexible typing for reusable functions/classes, e.g., processing lists of any numeric type in financial transactions.
flowchart TD
A["Input: List[T]"] --> B["Generic Function"] --> C["Output: T"]
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
class A,C data
class B process# File: de-onboarding/generics_example.py
from typing import TypeVar, List
T = TypeVar("T", int, float) # Define generic type for numbers
def sum_transactions(values: List[T]) -> T:
"""Sum transaction amounts with generics."""
total: T = sum(values) # Type-safe sum
print(f"Total Transactions: {total}") # Debug
return total
# Test with different types
sales: float = sum_transactions([999.99, 24.99]) # Sales amounts
quantities: int = sum_transactions([2, 10]) # Quantities
print(f"Sales Sum: {sales}") # Print sales sum
print(f"Quantities Sum: {quantities}") # Print quantities sum
# Expected Output:
# Total Transactions: 1024.98
# Total Transactions: 12
# Sales Sum: 1024.98
# Quantities Sum: 12Follow-Along Instructions:
- Save as
de-onboarding/generics_example.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright generics_example.py. - Run script:
python generics_example.py. - Verify output matches comments.
- Common Errors:
- Pyright Error: Ensure
TypeVarconstraints match (e.g.,int,float). Checkpyright --verbose. - IndentationError: Use 4 spaces. Run
python -tt generics_example.py.
- Pyright Error: Ensure
Key Points:
- TypeVar: Defines generic types with constraints (e.g.,
int,float). - Beginner Tip: Think of Generics as a template for functions, letting you sum sales (
float) or quantities (int) with one function, saving time in pipeline coding. - Use Case: Reusable functions for pipeline components, e.g., summing transaction amounts.
- Time/Space Complexity: No runtime impact; static analysis only. For large pipelines, Pyright’s static analysis may take O(n log n) for complex Generic types due to type inference, but this is negligible for typical projects.
- Implication: Enhances code flexibility for Hijra Group’s varied data types.
7.4 Basic Exception Handling with Types
Introduce basic exception handling with typed exceptions, expanded in Chapter 8. Avoid advanced constructs like context managers.
# File: de-onboarding/exception_types.py
from typing import List
def divide_sales(total: float, count: int) -> float:
"""Divide total sales by count with typed exception."""
try:
result: float = total / count
print(f"Result: {result}") # Debug
return result
except ZeroDivisionError as e: # Typed exception
print(f"Error: {e}") # Log error
return 0.0
# Test
result: float = divide_sales(1000.0, 0) # Test division by zero
print(f"Final Result: {result}") # Print result
# Expected Output:
# Error: division by zero
# Final Result: 0.0Follow-Along Instructions:
- Save as
de-onboarding/exception_types.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright exception_types.py. - Run script:
python exception_types.py. - Verify output matches comments.
- Common Errors:
- Pyright Error: Ensure return type matches (e.g.,
-> float). Checkpyright --verbose. - IndentationError: Use 4 spaces. Run
python -tt exception_types.py.
- Pyright Error: Ensure return type matches (e.g.,
Key Points:
- Syntax:
try/exceptwith specific exceptions (e.g.,ZeroDivisionError). - Type Safety: Annotate return types consistently, even in
exceptblocks. - Time Complexity: O(1) for exception handling (negligible unless frequent).
- Space Complexity: O(1) for exception objects.
- Implication: Prevents pipeline crashes, e.g., handling invalid sales counts.
- Minimal Handling: Minimal exception handling ensures simplicity for beginners, with comprehensive error management introduced in Chapter 8.
7.5 Micro-Project: Type-Safe Sales Data Processor
Project Requirements
Build a type-safe sales data processor for data/sales.csv, enhancing Chapter 3’s processor with type annotations and Pyright verification. The processor validates sales data, computes metrics, and exports results, ensuring type safety for Hijra Group’s analytics pipelines.
- Load
data/sales.csvwithpandas.read_csv. - Read
data/config.yamlwith PyYAML. - Validate records using type-annotated functions and Pandas.
- Compute total sales and top 3 products using NumPy/Pandas.
- Export to
data/sales_results.json. - Log steps and invalid records with print statements.
- Use try/except for basic file and parsing errors.
- Use 4-space indentation per PEP 8 (spaces, not tabs).
- Verify with Pyright to ensure type safety.
Sample Input Files
data/sales.csv (from Appendix 1):
product,price,quantity
Halal Laptop,999.99,2
Halal Mouse,24.99,10
Halal Keyboard,49.99,5
,29.99,3
Monitor,invalid,2
Headphones,5.00,150data/config.yaml (from Appendix 1):
min_price: 10.0
max_quantity: 100
required_fields:
- product
- price
- quantity
product_prefix: 'Halal'
max_decimals: 2Data Processing Flow
flowchart TD
A["Input CSV
sales.csv"] --> B["Load CSV
pandas.read_csv"]
B --> C["Pandas DataFrame"]
C --> D["Read YAML
config.yaml"]
D --> E["Validate DataFrame
Type-Annotated utils.py"]
E -->|Invalid| F["Log Warning
Skip Record"]
E -->|Valid| G["Compute Metrics
Pandas/NumPy"]
G --> H["Export JSON
sales_results.json"]
F --> J["End Processing"]
H --> J
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
classDef error fill:#ffdddd,stroke:#933,stroke-width:1px
classDef endpoint fill:#ddffdd,stroke:#363,stroke-width:1px
class A,C data
class B,D,E,G,H process
class F error
class J endpointAcceptance Criteria
- Go Criteria:
- Loads
sales.csvandconfig.yamlwith type-annotated functions. - Validates records for required fields, Halal prefix, numeric price/quantity, positive prices, and config rules.
- Computes total sales and top 3 products.
- Exports to
data/sales_results.json. - Logs steps and invalid records.
- Uses try/except for file and YAML errors.
- Passes Pyright checks with no type errors.
- Uses 4-space indentation per PEP 8.
- Loads
- No-Go Criteria:
- Fails to load files or validate data.
- Incorrect calculations or JSON export.
- Fails Pyright checks.
- Uses tabs or inconsistent indentation.
Common Pitfalls to Avoid
- Pyright Type Errors:
- Problem: Incorrect annotations (e.g.,
List[str]instead oflist[str]for Python 3.9+). - Solution: Use
pyright --verboseto debug. Print variable types withprint(type(var)).
- Problem: Incorrect annotations (e.g.,
- FileNotFoundError:
- Problem: Missing
sales.csvorconfig.yaml. - Solution: Print paths with
print(csv_path). Ensure files exist per Appendix 1.
- Problem: Missing
- YAML Parsing Errors:
- Problem: Invalid YAML syntax.
- Solution: Print
open(config_path).read()to inspect. Use YAML linter.
- Pandas Type Mismatches:
- Problem: Non-numeric
pricecauses errors. - Solution: Validate with
is_numeric_value. Printdf.dtypes.
- Problem: Non-numeric
- Type Errors in Financial Data:
- Problem: Using
strforpricein calculations causes errors. - Solution: Ensure
floatwithis_numeric_value. Printdf["price"].apply(type).
- Problem: Using
- Overusing
Any:- Problem: Using
Anyforconfigrisks unchecked types in financial data. - Solution: Use specific types where known, and limit
Anyto mixed data like YAML.
- Problem: Using
- IndentationError:
- Problem: Mixed spaces/tabs.
- Solution: Use 4 spaces. Run
python -tt sales_processor.py.
How This Differs from Production
In production, this solution would include:
- Advanced Error Handling: Context managers and custom exceptions (Chapter 8).
- Testing: Unit tests with
pytest(Chapter 9). - Logging: File-based logging (Chapter 52).
- Scalability: Chunked processing for large CSVs (Chapter 40).
- CI/CD: Pre-commit hooks with Pyright (Chapter 10).
Implementation
# File: de-onboarding/utils.py
from typing import Dict, Any, Union, List
def is_numeric(s: str, max_decimals: int = 2) -> bool:
"""Check if string is a decimal number with up to max_decimals."""
parts: List[str] = s.split(".") # Split on decimal point
if len(parts) != 2 or not parts[0].isdigit() or not parts[1].isdigit():
return False # Invalid format
return len(parts[1]) <= max_decimals # Check decimal places
def clean_string(s: str) -> str:
"""Strip whitespace from string."""
return s.strip()
def is_numeric_value(x: Any) -> bool:
"""Check if value is an integer or float."""
return isinstance(x, (int, float)) # Return True for numeric types
def has_valid_decimals(x: Union[int, float], max_decimals: int) -> bool:
"""Check if value has valid decimal places."""
return is_numeric(str(x), max_decimals) # Use is_numeric for validation
def apply_valid_decimals(x: Union[int, float], max_decimals: int) -> bool:
"""Apply has_valid_decimals to a value."""
return has_valid_decimals(x, max_decimals)
def is_integer(x: Any) -> bool:
"""Check if value is an integer when converted to string."""
return str(x).isdigit() # Return True for integer strings
def validate_sale(sale: Dict[str, str], config: Dict[str, Any]) -> bool:
"""
Validate sale based on config rules.
Note: Uses `Dict[str, str]` for CSV string parsing; advanced typing
(e.g., `Union`) is covered in Chapter 13.
"""
required_fields: List[str] = config["required_fields"] # Get required fields
min_price: float = config["min_price"] # Get minimum price
max_quantity: int = config["max_quantity"] # Get maximum quantity
prefix: str = config["product_prefix"] # Get product prefix
max_decimals: int = config["max_decimals"] # Get max decimal places
print(f"Validating sale: {sale}") # Debug: print sale
# Check for missing or empty fields
for field in required_fields: # Loop through required fields
if not sale.get(field) or clean_string(sale[field]) == "":
print(f"Invalid sale: missing {field}: {sale}") # Log invalid
return False
# Validate product: non-empty and matches prefix
product: str = clean_string(sale["product"]) # Clean product string
if not product.startswith(prefix): # Check prefix
print(f"Invalid sale: product lacks '{prefix}' prefix: {sale}") # Log invalid
return False
# Validate price: numeric, meets minimum, and positive
price: str = clean_string(sale["price"]) # Clean price string
if not is_numeric(price, max_decimals) or float(price) < min_price or float(price) <= 0:
print(f"Invalid sale: invalid price: {sale}") # Log invalid
return False
# Validate quantity: integer and within limit
quantity: str = clean_string(sale["quantity"]) # Clean quantity string
if not quantity.isdigit() or int(quantity) > max_quantity:
print(f"Invalid sale: invalid quantity: {sale}") # Log invalid
return False
return True # Return True if all checks pass
# File: de-onboarding/sales_processor.py
import pandas as pd
import numpy as np
import yaml
import json
from typing import Dict, Any, Tuple, List, Optional
import utils
import os
# Uses `Any` for YAML flexibility; prefer specific types (e.g., `Dict[str, float]`) when possible to ensure type safety
def read_config(config_path: Optional[str]) -> Dict[str, Any]:
"""Read YAML configuration."""
try:
if config_path is None:
raise ValueError("Config path cannot be None")
print(f"Opening config: {config_path}") # Debug
with open(config_path, "r") as file:
config: Dict[str, Any] = yaml.safe_load(file)
print(f"Loaded config: {config}") # Debug
return config
except (FileNotFoundError, yaml.YAMLError, ValueError) as e:
# Other possible errors (e.g., PermissionError) are handled in Chapter 8
print(f"Error loading config: {e}") # Log error
return {}
def load_and_validate_sales(csv_path: str, config: Dict[str, Any]) -> Tuple[pd.DataFrame, int, int]:
"""Load sales CSV and validate using Pandas."""
try:
print(f"Loading CSV: {csv_path}") # Debug
df: pd.DataFrame = pd.read_csv(csv_path) # Load CSV
print("Initial DataFrame:") # Debug
print(df.head()) # Show first 5 rows
# Validate required fields
required_fields: List[str] = config["required_fields"]
missing_fields: List[str] = [f for f in required_fields if f not in df.columns]
if missing_fields:
print(f"Missing columns: {missing_fields}") # Log error
return pd.DataFrame(), 0, len(df)
# Clean and filter DataFrame
df = df.dropna(subset=["product"]) # Drop missing product
df = df[df["product"].str.startswith(config["product_prefix"])] # Filter Halal
df = df[df["quantity"].apply(utils.is_integer)] # Ensure integer quantity
df["quantity"] = df["quantity"].astype(int) # Convert to int
df = df[df["quantity"] <= config["max_quantity"]] # Filter quantity
df = df[df["price"].apply(utils.is_numeric_value)] # Ensure numeric price
df = df[df["price"] > 0] # Filter positive prices
df = df[df["price"] >= config["min_price"]] # Filter min price
df = df[df["price"].apply(lambda x: utils.apply_valid_decimals(x, config["max_decimals"]))] # Check decimals
total_records: int = len(df)
print("Validated DataFrame:") # Debug
print(df)
return df, len(df), total_records
except FileNotFoundError as e:
# Other possible errors (e.g., PermissionError, TypeError) are handled in Chapter 8
print(f"Error loading CSV: {e}") # Log error
return pd.DataFrame(), 0, 0
def process_sales(df: pd.DataFrame, config: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
"""Process sales: compute total and top products."""
if df.empty:
print("No valid sales data") # Log empty
return {"total_sales": 0.0, "unique_products": [], "top_products": {}}, 0
# Compute amount
df["amount"] = df["price"] * df["quantity"] # Price * quantity
print("DataFrame with Amount:") # Debug
print(df)
# Compute metrics
total_sales: float = float(np.sum(df["amount"].values)) # Total sales
unique_products: List[str] = df["product"].unique().tolist() # Unique products
sales_by_product: pd.Series = df.groupby("product")["amount"].sum() # Group by product
top_products: Dict[str, float] = sales_by_product.sort_values(ascending=False).head(3).to_dict()
valid_sales: int = len(df)
print(f"Valid sales: {valid_sales} records") # Log count
return {
"total_sales": total_sales,
"unique_products": unique_products,
"top_products": top_products
}, valid_sales
def export_results(results: Dict[str, Any], json_path: str) -> None:
"""Export results to JSON."""
try:
print(f"Writing to: {json_path}") # Debug
print(f"Results: {results}") # Debug
with open(json_path, "w") as file:
json.dump(results, file, indent=2)
print(f"Exported results to {json_path}") # Confirm
except (FileNotFoundError, PermissionError) as e:
# Other possible errors (e.g., TypeError) are handled in Chapter 8
print(f"Error exporting JSON: {e}") # Log error
def main() -> None:
"""Main function to process sales data."""
csv_path: str = "data/sales.csv"
config_path: str = "data/config.yaml"
json_path: str = "data/sales_results.json"
config: Dict[str, Any] = read_config(config_path)
if not config:
print("Failed to load config, exiting")
return
df: pd.DataFrame
valid_sales: int
total_records: int
df, valid_sales, total_records = load_and_validate_sales(csv_path, config)
results: Dict[str, Any]
results, valid_sales = process_sales(df, config)
export_results(results, json_path)
# Output report
print("\nSales Report:")
print(f"Total Records Processed: {total_records}")
print(f"Valid Sales: {valid_sales}")
print(f"Invalid Sales: {total_records - valid_sales}")
print(f"Total Sales: ${round(results['total_sales'], 2)}")
print(f"Unique Products: {results['unique_products']}")
print(f"Top Products: {results['top_products']}")
print("Processing completed")
if __name__ == "__main__":
main()Expected Outputs
data/sales_results.json:
{
"total_sales": 2499.83,
"unique_products": ["Halal Laptop", "Halal Mouse", "Halal Keyboard"],
"top_products": {
"Halal Laptop": 1999.98,
"Halal Mouse": 249.9,
"Halal Keyboard": 249.95
}
}Console Output (abridged):
Opening config: data/config.yaml
Loaded config: {'min_price': 10.0, 'max_quantity': 100, 'required_fields': ['product', 'price', 'quantity'], 'product_prefix': 'Halal', 'max_decimals': 2}
Loading CSV: data/sales.csv
Initial DataFrame:
product price quantity
0 Halal Laptop 999.99 2
1 Halal Mouse 24.99 10
2 Halal Keyboard 49.99 5
3 NaN 29.99 3
4 Monitor NaN 2
Validated DataFrame:
product price quantity
0 Halal Laptop 999.99 2
1 Halal Mouse 24.99 10
2 Halal Keyboard 49.99 5
DataFrame with Amount:
product price quantity amount
0 Halal Laptop 999.99 2 1999.98
1 Halal Mouse 24.99 10 249.90
2 Halal Keyboard 49.99 5 249.95
Valid sales: 3 records
Writing to: data/sales_results.json
Exported results to data/sales_results.json
Sales Report:
Total Records Processed: 3
Valid Sales: 3
Invalid Sales: 0
Total Sales: $2499.83
Unique Products: ['Halal Laptop', 'Halal Mouse', 'Halal Keyboard']
Top Products: {'Halal Laptop': 1999.98, 'Halal Mouse': 249.9, 'Halal Keyboard': 249.95}
Processing completedHow to Run and Test
Setup:
- Create
de-onboarding/data/and savesales.csv,config.yamlper Appendix 1. - Install libraries:
pip install numpy pandas pyyaml pyright. - Create
pyproject.tomlinde-onboarding/. - Save
utils.pyandsales_processor.py. - Configure editor for 4-space indentation per PEP 8.
- Create
Run:
- Run Pyright:
pyright sales_processor.py. - Run script:
python sales_processor.py. - Verify outputs:
sales_results.jsonand console logs. - Verify JSON content:
cat data/sales_results.json(Unix) ortype data\sales_results.json(Windows).
- Run Pyright:
Test Scenarios:
- Valid Data: Check
sales_results.jsonfor correct metrics. - Empty Config: Modify
read_configto return{}and verify graceful exit. - Invalid CSV: Test with non-existent CSV to trigger
FileNotFoundError. - Type Error: Modify
process_salesto annotatetotal_sales: intinstead offloat, runpyright sales_processor.py, verify error, and revert tofloat. - Optional Type: Call
read_config(None)without modifying the function, runpython sales_processor.py, and verify it raises aValueError. Check Pyright to confirm - Expected Output: confirmOptional[str]allowsNonebut the function rejects it.
- Valid Data: Check
7.6 Practice Exercises
Exercise 1: Type-Annotated Sales Calculator
Write a type-annotated function to compute total sales, verified by Pyright.
Sample Input:
prices: List[float] = [999.99, 24.99, 49.99]
quantities: List[int] = [2, 10, 5]Expected Output:
2499.83Follow-Along Instructions:
- Save as
de-onboarding/ex1_sales.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright ex1_sales.py. - Run:
python ex1_sales.py. - How to Test:
- Add:
print(calculate_sales([999.99, 24.99, 49.99], [2, 10, 5])). - Verify output:
2499.83. - Test with mismatched lengths: Should return
0.0. - Introduce a type error (e.g.,
result: int), run Pyright, and fix.
- Add:
Exercise 2: Type-Annotated Data Loader
Write a type-annotated function to load and filter data/sample.csv for Halal products.
Sample Input (data/sample.csv):
product,price,quantity
Halal Laptop,999.99,2
Halal Mouse,24.99,10Expected Output:
product price quantity
0 Halal Laptop 999.99 2
1 Halal Mouse 24.99 10
Count: 2Follow-Along Instructions:
- Save as
de-onboarding/ex2_loader.py. - Ensure
data/sample.csvexists per Appendix 1. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright ex2_loader.py. - Run:
python ex2_loader.py. - How to Test:
- Add:
df, count = load_halal_sales("data/sample.csv"); print(df); print(f"Count: {count}"). - Verify output matches expected.
- Test with non-existent CSV to trigger
FileNotFoundError.
- Add:
Exercise 3: Generic Sum Function
Write a generic function to sum a list of numbers, verified by Pyright.
Sample Input:
int_values: List[int] = [1, 2, 3]
float_values: List[float] = [1.5, 2.5, 3.5]Expected Output:
Int Sum: 6
Float Sum: 7.5Follow-Along Instructions:
- Save as
de-onboarding/ex3_generic.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright ex3_generic.py. - Run:
python ex3_generic.py. - How to Test:
- Add:
print(f"Int Sum: {sum_transactions([1, 2, 3])}"); print(f"Float Sum: {sum_transactions([1.5, 2.5, 3.5])}"). - Verify output matches expected.
- Test with incorrect type (e.g.,
List[str]) to trigger Pyright error.
- Add:
Exercise 4: Conceptual Analysis of Generics
Explain how Generics in the sum_transactions function (Section 7.3) enable code reuse across int and float lists, and why this benefits pipeline scalability. Write a 50–100 word explanation, addressing code reuse, type safety, and scalability, using bullet points. Save to de-onboarding/ex4_concepts.txt.
Expected Output (ex4_concepts.txt):
- Code Reuse: Generics in sum_transactions allow one function to process List[int] and List[float], reducing duplication.
- Type Safety: Pyright ensures correct types, preventing errors in pipelines.
- Scalability: Reusability supports diverse numeric data (e.g., sales, quantities) for Hijra Group’s analytics, minimizing maintenance and enabling modular pipeline design.Follow-Along Instructions:
- Write explanation in
de-onboarding/ex4_concepts.txt. - Verify content addresses code reuse, type safety, and scalability within 50–100 words using bullet points.
- How to Test:
- Check
ex4_concepts.txtmatches expected themes. - Discuss with peers to ensure clarity.
- Check
Exercise 5: Typed Exception Handling
Write a function to divide sales with typed exception handling.
Sample Input:
total: float = 1000.0
count: int = 0Expected Output:
Error: division by zero
Result: 0.0Follow-Along Instructions:
- Save as
de-onboarding/ex5_exception.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright ex5_exception.py. - Run:
python ex5_exception.py. - How to Test:
- Add:
print(divide_sales(1000.0, 0)). - Verify output matches expected.
- Test with valid input (e.g.,
count=2) to ensure correct division.
- Add:
Exercise 6: Debug a Pyright Error
Fix this code with a Pyright type error.
Buggy Code:
from typing import List
def sum_sales(prices: List[float], quantities: List[int]) -> float:
"""Sum sales with type annotations."""
total: float = sum(p * q for p, q in zip(prices, quantities))
return total
result: int = sum_sales([999.99, 24.99], [2, 10]) # Error: int instead of float
print(result)Expected Output:
2249.88Follow-Along Instructions:
- Save as
de-onboarding/ex6_debug.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright ex6_debug.pyto see error. - Fix and re-run.
- How to Test:
- Verify output matches expected.
- Test with additional inputs to ensure correct summation.
Exercise 7: Refactor Any to Specific Types
Refactor a function using Any to use specific types, verified by Pyright. Given def process_data(data: Any) -> Any, which accepts any input and returns any output, refactor it to def process_data(prices: List[float]) -> float to sum sales prices. Write a 50-word explanation in de-onboarding/ex7_explanation.txt explaining why specific types are safer for analytics. Verify with Pyright.
Sample Input:
data: Any = [999.99, 24.99] # Unsafe
prices: List[float] = [999.99, 24.99] # SafeExpected Output:
1024.98Expected Explanation (ex7_explanation.txt):
The refactored process_data uses List[float] and float, ensuring Pyright checks for correct types, preventing errors like passing strings. This type safety is crucial for Hijra Group’s analytics, reducing bugs in financial pipelines and ensuring reliable sales calculations compared to Any’s unchecked flexibility.Follow-Along Instructions:
- Save as
de-onboarding/ex7_refactor.py. - Configure editor for 4-space indentation per PEP 8.
- Run Pyright:
pyright ex7_refactor.py. - Run:
python ex7_refactor.py. - Save explanation in
de-onboarding/ex7_explanation.txt. - How to Test:
- Add:
print(process_data([999.99, 24.99])). - Verify output:
1024.98. - Test with incorrect type (e.g.,
List[str]) to trigger Pyright error. - Check
ex7_explanation.txtfor 50-word explanation.
- Add:
7.6 Exercise Solutions
Solution to Exercise 1: Type-Annotated Sales Calculator
from typing import List
def calculate_sales(prices: List[float], quantities: List[int]) -> float:
"""Compute total sales with type annotations."""
if len(prices) != len(quantities):
print("Error: Mismatched lengths")
return 0.0
total: float = sum(p * q for p, q in zip(prices, quantities))
return total
# Test
result: float = calculate_sales([999.99, 24.99, 49.99], [2, 10, 5])
print(result) # 2499.83Solution to Exercise 2: Type-Annotated Data Loader
import pandas as pd
from typing import Tuple
def load_halal_sales(csv_path: str) -> Tuple[pd.DataFrame, int]:
"""Load and filter CSV for Halal products."""
try:
df: pd.DataFrame = pd.read_csv(csv_path)
df = df[df["product"].str.startswith("Halal")]
return df, len(df)
except FileNotFoundError as e:
print(f"Error: {e}")
return pd.DataFrame(), 0
# Test
df, count = load_halal_sales("data/sample.csv")
print(df)
print(f"Count: {count}")Solution to Exercise 3: Generic Sum Function
from typing import TypeVar, List
T = TypeVar("T", int, float)
def sum_transactions(values: List[T]) -> T:
"""Sum transaction amounts with generics."""
return sum(values)
# Test
int_sum: int = sum_transactions([1, 2, 3])
float_sum: float = sum_transactions([1.5, 2.5, 3.5])
print(f"Int Sum: {int_sum}")
print(f"Float Sum: {float_sum}")Solution to Exercise 4: Conceptual Analysis of Generics
File: de-onboarding/ex4_concepts.txt
- Code Reuse: Generics in sum_transactions allow one function to process List[int] and List[float], reducing duplication.
- Type Safety: Pyright ensures correct types, preventing errors in pipelines.
- Scalability: Reusability supports diverse numeric data (e.g., sales, quantities) for Hijra Group’s analytics, minimizing maintenance and enabling modular pipeline design.
(56 words)Solution to Exercise 5: Typed Exception Handling
def divide_sales(total: float, count: int) -> float:
"""Divide total sales by count with typed exception."""
try:
result: float = total / count
return result
except ZeroDivisionError as e:
print(f"Error: {e}")
return 0.0
# Test
result: float = divide_sales(1000.0, 0)
print(f"Result: {result}")Solution to Exercise 6: Debug a Pyright Error
from typing import List
def sum_sales(prices: List[float], quantities: List[int]) -> float:
"""Sum sales with type annotations."""
total: float = sum(p * q for p, q in zip(prices, quantities))
return total
result: float = sum_sales([999.99, 24.99], [2, 10]) # Fixed: float
print(result) # 2249.88Explanation:
- Bug:
result: intmismatches thefloatreturn type ofsum_sales. - Fix: Change to
result: float.
Solution to Exercise 7: Refactor Any to Specific Types
from typing import List, Any
# Original function using Any (unsafe)
def process_data(data: Any) -> Any:
"""Sum sales prices (unsafe with Any)."""
return sum(data)
# Refactored function with specific types
def process_data(prices: List[float]) -> float:
"""Sum sales prices with type safety."""
return sum(prices)
# Test
result: float = process_data([999.99, 24.99])
print(result) # 1024.98Explanation (ex7_explanation.txt):
The refactored process_data uses List[float] and float, ensuring Pyright checks for correct types, preventing errors like passing strings. This type safety is crucial for Hijra Group’s analytics, reducing bugs in financial pipelines and ensuring reliable sales calculations compared to Any’s unchecked flexibility.
(49 words)7.7 Chapter Summary and Connection to Chapter 8
You’ve mastered:
- Type Annotations: For variables, functions, and classes, including cautious use of
Any. - Pyright: Setup and type checking for pipeline reliability.
- Generics: Flexible typing for reusable code.
- Exception Handling: Basic typed try/except for robustness.
- White-Space Sensitivity and PEP 8: 4-space indentation, spaces over tabs.
The micro-project built a type-safe processor for sales.csv, verified by Pyright, ensuring robust data handling for Hijra Group’s analytics. This type-safe processor lays the foundation for capstone projects, where type safety ensures robust ETL pipelines for transaction data, such as validating Sharia-compliant sales amounts across millions of records. This prepares for Chapter 8, which adds annotations and decorators for logging and monitoring, enhancing pipeline observability while maintaining type safety.