15 - Type Safe Database Programming
Complexity: Moderate (M)
15.0 Introduction: Why This Matters for Data Engineering
In data engineering, ensuring robust and reliable database interactions is critical for Hijra Group’s Sharia-compliant fintech analytics, where data integrity directly impacts financial reporting and compliance with Islamic Financial Services Board (IFSB) standards. Type-safe programming with Python, using tools like Pyright, Pydantic, and sqlite3, minimizes runtime errors by enforcing strict type checks at development time, reducing issues like TypeError or AttributeError in database operations. For example, type annotations ensure that a price field is always a float and not mistakenly a str, which could corrupt financial calculations. Building on prior chapters, this chapter integrates type-safe Python with SQLite, leveraging concepts from Chapter 7 (Static Typing), Chapter 9 (Testing), Chapter 12 (SQL Fundamentals), Chapter 13 (Python-SQLite Integration), and Chapter 14 (Advanced SQLite Operations) to create reliable, testable database pipelines.
This chapter focuses on using Pydantic for data validation, Pyright for type checking, and pytest for testing, ensuring that database interactions are robust and maintainable. All code uses 4-space indentation per PEP 8, preferring spaces over tabs to avoid IndentationError, aligning with Hijra Group’s pipeline scripts. The micro-project builds a type-safe SQLite client to process sales data, preparing for PostgreSQL integration (Chapter 17) and FastAPI development (Chapter 53).
Data Engineering Workflow Context
This diagram illustrates how type-safe database programming fits into a data engineering pipeline:
flowchart TD
A["Raw Data (CSV)"] --> B["Type-Safe Python Client"]
B --> C{"Database Operations"}
C -->|Load/Validate| D["Pydantic Models"]
C -->|Query| E["SQLite Database"]
D --> F["Validated Data"]
E --> F
F --> G["Output (JSON/Reports)"]
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
classDef storage fill:#ddffdd,stroke:#363,stroke-width:1px
class A,D,F,G data
class B,C process
class E storageBuilding On and Preparing For
- Building On:
- Chapter 7: Uses Pyright for static typing, extended here with Generics and Pydantic for database operations.
- Chapter 9: Applies
pytestfor testing type-safe functions, ensuring robust database interactions. - Chapter 12: Leverages SQL basics for SQLite queries.
- Chapter 13: Builds on Python-SQLite integration with
sqlite3and YAML configs. - Chapter 14: Uses advanced SQLite operations (e.g., transactions) for efficient data handling.
- Preparing For:
- Chapter 17: Prepares for type-safe PostgreSQL integration with
psycopg2. - Chapter 23: Extends to integrated SQLite/PostgreSQL pipelines.
- Chapter 53: Lays groundwork for type-safe FastAPI endpoints for database access.
- Chapter 65: Supports secure database interactions with PII protection.
- Chapter 17: Prepares for type-safe PostgreSQL integration with
What You’ll Learn
This chapter covers:
- Type Annotations for Database Operations: Using
Union,List, and Generics with Pyright. - Pydantic for Validation: Creating models to validate sales data.
- Type-Safe SQLite Interactions: Safe queries and data loading with
sqlite3. - Testing with pytest: Unit and integration tests for database operations.
- Performance Considerations: Time/space complexity of database operations.
By the end, you’ll build a type-safe SQLite client that loads, validates, and queries sales data from data/sales.db, exports results to JSON, and includes comprehensive tests, all with 4-space indentation per PEP 8. The micro-project uses data/sales.csv and data/config.yaml for seeding and validation, per Appendix 1.
Follow-Along Tips:
- Create
de-onboarding/data/and populate with files from Appendix 1 (sales.csv,config.yaml,sales.db). - Install libraries:
pip install pyyaml pydantic sqlite3 pytest pyright. - Ensure Pyright is installed globally (
pip install pyright --user) or in your virtual environment. If Pyright fails, verify Python 3.10+ withpython --version. - Configure editor for 4-space indentation per PEP 8 (VS Code: “Editor: Tab Size” = 4, “Editor: Insert Spaces” = true, “Editor: Detect Indentation” = false).
- Use print statements (e.g.,
print(model.dict())) to debug Pydantic models. - Verify file paths with
ls data/(Unix/macOS) ordir data\(Windows). - Use UTF-8 encoding for all files to avoid
UnicodeDecodeError. - Run
pyright script.pyto check types before execution.
15.1 Type Annotations for Database Operations
Type annotations, enforced by Pyright, ensure that database operations use correct data types, reducing errors in pipelines. For example, annotating a function to return List[Dict[str, Any]] ensures query results are properly structured. This section introduces Generics and Union types for flexible, type-safe database interactions.
15.1.1 Using Generics and Union Types
Generics allow reusable type-safe code, while Union supports multiple possible types (e.g., float | str for prices before validation). While Dict[str, Any] is flexible for query results, prefer specific types (e.g., Dict[str, Union[str, float, int]]) when possible to enhance type safety, as explored in Chapter 23.
# File: de-onboarding/type_example.py
from typing import List, Dict, Any, Union # Import typing utilities
import sqlite3 # For database operations
def fetch_sales(db_path: str) -> List[Dict[str, Any]]:
"""Fetch sales records from SQLite with type annotations."""
conn: sqlite3.Connection = sqlite3.connect(db_path) # Connect with type hint
cursor: sqlite3.Cursor = conn.cursor() # Create cursor with type hint
cursor.execute("SELECT product, price, quantity FROM sales") # Query
results: List[Dict[str, Any]] = [
{"product": row[0], "price": row[1], "quantity": row[2]}
for row in cursor.fetchall()
] # Fetch and structure results
print(f"Fetched {len(results)} records") # Debug
conn.close() # Close connection
return results # Return typed results
# Test
if __name__ == "__main__":
sales: List[Dict[str, Any]] = fetch_sales("data/sales.db")
print(sales)
# Expected Output:
# Fetched 3 records
# [{'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2},
# {'product': 'Halal Mouse', 'price': 24.99, 'quantity': 10},
# {'product': 'Halal Keyboard', 'price': 49.99, 'quantity': 5}]Follow-Along Instructions:
- Ensure
data/sales.dbexists per Appendix 1. - Save as
de-onboarding/type_example.py. - Configure editor for 4-space indentation per PEP 8.
- Install Pyright:
pip install pyright. - Run:
pyright type_example.pyto check types, thenpython type_example.py. - Verify output matches expected.
- Common Errors:
- DatabaseError: Ensure
sales.dbhassalestable. Runsqlite3 data/sales.db "SELECT * FROM sales;". - TypeError: Run
pyright type_example.pyto catch type mismatches. - IndentationError: Use 4 spaces (not tabs). Run
python -tt type_example.py.
- DatabaseError: Ensure
Key Points:
- Type Annotations:
str,List[Dict[str, Any]],sqlite3.Connectionensure type safety. - Generics:
List[Dict[str, Any]]allows flexible query results. - Time Complexity: O(n) for querying n rows.
- Space Complexity: O(n) for n rows in results.
- Implication: Type-safe queries prevent errors in Hijra Group’s financial data pipelines.
15.2 Pydantic for Validation
Pydantic provides runtime data validation through models, ensuring data conforms to expected types and rules (e.g., price is a positive float). Pydantic validates data at runtime (when the code runs), while Pyright checks types statically (before running), ensuring both development-time and runtime reliability. Pydantic integrates with Pyright for static type checking, enhancing reliability. Pydantic’s runtime validation adds slight overhead compared to manual checks but ensures robust type safety, critical for financial data.
15.2.1 Creating Pydantic Models
Define a Sale model to validate sales data. The following diagram illustrates the Pydantic validation flow:
flowchart TD
A["Input Dict"] --> B["Pydantic Model"]
B -->|Valid| C["Valid Sale"]
B -->|Invalid| D["ValidationError"]
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
classDef error fill:#ffdddd,stroke:#933,stroke-width:1px
class A,C data
class B process
class D error# File: de-onboarding/pydantic_example.py
from pydantic import BaseModel, PositiveFloat, PositiveInt # Import Pydantic
from typing import List # For type annotations
class Sale(BaseModel): # Define Sale model
product: str # Product name
price: PositiveFloat # Positive float for price
quantity: PositiveInt # Positive integer for quantity
def validate_sales(sales: List[dict]) -> List[Sale]:
"""Validate sales data using Pydantic."""
valid_sales: List[Sale] = []
for sale in sales:
try:
valid_sale: Sale = Sale(**sale) # Validate with Pydantic
valid_sales.append(valid_sale)
print(f"Validated: {valid_sale.dict()}") # Debug
except ValueError as e:
print(f"Invalid sale: {sale}, Error: {e}") # Log invalid
return valid_sales
# Test
if __name__ == "__main__":
sales_data: List[dict] = [
{"product": "Halal Laptop", "price": 999.99, "quantity": 2},
{"product": "Halal Mouse", "price": -24.99, "quantity": 10}, # Invalid price
{"product": "Halal Keyboard", "price": 49.99, "quantity": 5}
]
valid_sales: List[Sale] = validate_sales(sales_data)
print([sale.dict() for sale in valid_sales])
# Expected Output:
# Validated: {'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2}
# Invalid sale: {'product': 'Halal Mouse', 'price': -24.99, 'quantity': 10}, Error: ...
# Validated: {'product': 'Halal Keyboard', 'price': 49.99, 'quantity': 5}
# [{'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2},
# {'product': 'Halal Keyboard', 'price': 49.99, 'quantity': 5}]Follow-Along Instructions:
- Install Pydantic:
pip install pydantic. - Save as
de-onboarding/pydantic_example.py. - Configure editor for 4-space indentation per PEP 8.
- Run:
pyright pydantic_example.py, thenpython pydantic_example.py. - Verify output shows only valid sales.
- Common Errors:
- ValidationError: Print
saleand checktype(sale['price'])to debug invalid data. - TypeError: Run
pyright pydantic_example.pyto catch type issues. - IndentationError: Use 4 spaces (not tabs). Run
python -tt pydantic_example.py.
- ValidationError: Print
Key Points:
- Pydantic Models: Enforce types and constraints (e.g.,
PositiveFloat). - Validation: Catches invalid data at runtime.
- Time Complexity: O(n) for validating n sales.
- Space Complexity: O(n) for n valid sales.
- Implication: Ensures data integrity for financial reporting.
15.3 Type-Safe SQLite Interactions
Combine type annotations, Pydantic, and sqlite3 for safe database operations, ensuring queries and data loading are robust.
15.3.1 Type-Safe Query Execution
Execute queries with type-safe parameters. SQLite is ideal for small-scale applications but supports only single-user access, unlike PostgreSQL’s multi-user capabilities (Chapter 16).
# File: de-onboarding/sqlite_example.py
from typing import List, Dict, Any
import sqlite3
def query_sales(db_path: str, min_price: float) -> List[Dict[str, Any]]:
"""Query sales with type-safe parameters."""
conn: sqlite3.Connection = sqlite3.connect(db_path)
cursor: sqlite3.Cursor = conn.cursor()
cursor.execute(
"SELECT product, price, quantity FROM sales WHERE price >= ?",
(min_price,) # Type-safe parameter
)
results: List[Dict[str, Any]] = [
{"product": row[0], "price": row[1], "quantity": row[2]}
for row in cursor.fetchall()
]
print(f"Queried {len(results)} sales with price >= {min_price}") # Debug
conn.close()
return results
# Test
if __name__ == "__main__":
sales: List[Dict[str, Any]] = query_sales("data/sales.db", 50.0)
print(sales)
# Expected Output:
# Queried 1 sales with price >= 50.0
# [{'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2}]Follow-Along Instructions:
- Ensure
data/sales.dbexists per Appendix 1. - Save as
de-onboarding/sqlite_example.py. - Run:
pyright sqlite_example.py, thenpython sqlite_example.py. - Verify output shows sales with price >= 50.0.
- Common Errors:
- DatabaseError: Verify table schema with
sqlite3 data/sales.db ".schema sales". - TypeError: Run
pyright sqlite_example.py.
- DatabaseError: Verify table schema with
Key Points:
- Parameterized Queries: Prevent SQL injection.
- Type Safety: Annotations ensure correct parameter types.
- Time Complexity: O(n) for querying n rows.
- Space Complexity: O(n) for n rows.
- Implication: Safe queries ensure reliable analytics.
15.4 Testing with pytest
Use pytest to test type-safe database operations, ensuring reliability through unit and integration tests. The following diagram illustrates the complexity of key operations in the micro-project:
flowchart TD
A["Validate Sales
O(n)"] --> B["Query Database
O(n)"]
B --> C["Process Metrics
O(n)"]
C --> D["Export JSON
O(1)"]
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
class A,B,C,D process15.4.1 Unit and Integration Tests
Test a database client with pytest.
# File: de-onboarding/test_db_client.py
from typing import List, Dict, Any
import sqlite3
import pytest
from pydantic_example import Sale, validate_sales # Import from earlier
def setup_test_db(db_path: str) -> None:
"""Setup test database."""
conn: sqlite3.Connection = sqlite3.connect(db_path)
cursor: sqlite3.Cursor = conn.cursor()
cursor.execute("CREATE TABLE sales (product TEXT, price REAL, quantity INTEGER)")
cursor.executemany(
"INSERT INTO sales (product, price, quantity) VALUES (?, ?, ?)",
[("Halal Laptop", 999.99, 2), ("Halal Mouse", 24.99, 10)]
)
conn.commit()
conn.close()
def test_validate_sales() -> None:
"""Test sales validation."""
sales_data: List[Dict[str, Any]] = [
{"product": "Halal Laptop", "price": 999.99, "quantity": 2},
{"product": "Invalid", "price": -24.99, "quantity": 10}
]
valid_sales: List[Sale] = validate_sales(sales_data)
assert len(valid_sales) == 1
assert valid_sales[0].product == "Halal Laptop"
@pytest.fixture
def test_db(tmp_path) -> str:
"""Create temporary test database."""
db_path: str = str(tmp_path / "test.db")
setup_test_db(db_path)
return db_path
def test_query_sales(test_db: str) -> None:
"""Test querying sales."""
sales: List[Dict[str, Any]] = query_sales(test_db, 50.0)
assert len(sales) == 1
assert sales[0]["product"] == "Halal Laptop"
# Run with: pytest test_db_client.py -vFollow-Along Instructions:
- Install pytest:
pip install pytest. - Save as
de-onboarding/test_db_client.py. - Ensure
pydantic_example.pyandsqlite_example.pyare inde-onboarding/. - Run:
pytest test_db_client.py -v. - Verify all tests pass.
- Common Errors:
- FixtureError: Ensure pytest is installed (
pip install pytest) and the test file is in the correct directory. Runpytest --versionto verify. Printtmp_pathto debug. - DatabaseError: Print
db_pathto debug.
- FixtureError: Ensure pytest is installed (
Key Points:
- Unit Tests: Test validation logic.
- Integration Tests: Test database queries.
- Time Complexity: O(1) for small test datasets.
- Space Complexity: O(1) for test data.
- Implication: Ensures pipeline reliability.
15.5 Micro-Project: Type-Safe SQLite Client
Project Requirements
Build a type-safe SQLite client to process sales data from data/sales.db, validate with Pydantic, and export results to data/sales_report.json. The client supports Hijra Group’s transaction reporting, ensuring data integrity for IFSB compliance. Type-safe programming prevents errors like passing a string price (e.g., ‘999.99’) to financial calculations, which could misreport revenue and violate IFSB standards.
- Load
data/config.yamlwith PyYAML. - Validate sales data using Pydantic models.
- Query
data/sales.dbwith type-safesqlite3operations. - Compute total sales and top products.
- Export results to
data/sales_report.json. - Include pytest tests for validation and queries.
- Use 4-space indentation per PEP 8, preferring spaces over tabs.
- Test edge cases (e.g., empty database).
Sample Input Files
data/sales.db (from Appendix 1):
CREATE TABLE sales (product TEXT, price REAL, quantity INTEGER);
INSERT INTO sales (product, price, quantity) VALUES
('Halal Laptop', 999.99, 2),
('Halal Mouse', 24.99, 10),
('Halal Keyboard', 49.99, 5);data/config.yaml (from Appendix 1):
min_price: 10.0
max_quantity: 100
required_fields:
- product
- price
- quantity
product_prefix: 'Halal'
max_decimals: 2Data Processing Flow
flowchart TD
A["Sales DB
sales.db"] --> B["Load Config
config.yaml"]
B --> C["Type-Safe Client"]
C --> D["Pydantic Validation"]
D -->|Invalid| E["Log Warning"]
D -->|Valid| F["Query Database"]
F --> G["Compute Metrics"]
G --> H["Export JSON
sales_report.json"]
E --> I["End Processing"]
H --> I
classDef data fill:#f9f9f9,stroke:#333,stroke-width:2px
classDef process fill:#d0e0ff,stroke:#336,stroke-width:1px
classDef error fill:#ffdddd,stroke:#933,stroke-width:1px
classDef endpoint fill:#ddffdd,stroke:#363,stroke-width:1px
class A,G,H data
class B,C,D,F process
class E error
class I endpointAcceptance Criteria
- Go Criteria:
- Loads
config.yamland queriessales.dbcorrectly. - Validates data with Pydantic (Halal prefix, positive price/quantity).
- Computes total sales and top 3 products.
- Exports to
data/sales_report.json. - Includes pytest tests (unit, integration).
- Uses type annotations verified by Pyright.
- Uses 4-space indentation per PEP 8.
- Handles edge cases (e.g., empty database).
- Loads
- No-Go Criteria:
- Fails to load config or database.
- Incorrect validation or calculations.
- Missing JSON export or tests.
- Type errors detected by Pyright.
- Inconsistent indentation.
Common Pitfalls to Avoid
- Database Connection Issues:
- Problem:
DatabaseErrordue to missingsales.db. - Solution: Verify with
sqlite3 data/sales.db "SELECT * FROM sales;". - Debug: Run
ls data/sales.db(Unix/macOS) ordir data\sales.db(Windows) to check file existence.
- Problem:
- Pydantic Validation Errors:
- Problem:
ValidationErrorfor invalid data. For example, aValidationErrormay report:price: Value error, value must be > 0if a negative price is provided. - Solution: Print
saleand checktype(sale['price'])to debug. - Debug: Add
print(sale)beforeSale(**sale)invalidate_sales.
- Problem:
- Type Mismatches:
- Problem: Pyright detects incorrect types. For example, Pyright may report:
Expected List[Dict[str, Any]] but got List[str]if the return type is incorrect. - Solution: Run
pyright db_client.py. - Debug: Check
pyrightoutput for specific type errors (e.g.,List[str]vs.List[Dict[str, Any]]).
- Problem: Pyright detects incorrect types. For example, Pyright may report:
- IndentationError:
- Problem: Mixed spaces/tabs.
- Solution: Use 4 spaces. Run
python -tt db_client.py. - Debug: Use VS Code’s “Show Whitespace” to inspect indentation.
- Test Failures:
- Problem: Tests fail due to schema mismatches.
- Solution: Print
cursor.execute('.schema sales')insetup_test_db. - Debug: Verify test database schema with
sqlite3 test.db ".schema sales".
- YAML Parsing Errors:
- Problem:
yaml.YAMLErrordue to incorrectconfig.yamlsyntax. - Solution: Verify YAML with
python -c "import yaml; yaml.safe_load(open('data/config.yaml'))". - Debug: Print
open(config_path).read()to inspect syntax.
- Problem:
How This Differs from Production
In production, this solution would include:
- Error Handling: Robust try/except (Chapter 17).
- Connection Pooling: For scalability (Chapter 63).
- Security: Encrypted connections and PII masking (Chapter 65).
- Monitoring: Logging and observability (Chapter 66).
- CI/CD: Automated tests with GitHub Actions (Chapter 66).
Implementation
# File: de-onboarding/db_client.py
from typing import List, Dict, Any, Tuple
import sqlite3
import yaml
import json
from pydantic import BaseModel, PositiveFloat, PositiveInt, validator
import os
class Sale(BaseModel):
"""Pydantic model for sale data."""
product: str
price: PositiveFloat
quantity: PositiveInt
@validator("product")
def check_product_prefix(cls, value: str) -> str:
"""Validate product starts with Halal."""
if not value.startswith("Halal"):
raise ValueError("Product must start with 'Halal'")
return value
def read_config(config_path: str) -> Dict[str, Any]:
"""Read YAML configuration."""
print(f"Opening config: {config_path}")
with open(config_path, "r") as file:
config: Dict[str, Any] = yaml.safe_load(file)
print(f"Loaded config: {config}")
return config
def validate_sales(sales: List[Dict[str, Any]], config: Dict[str, Any]) -> Tuple[List[Sale], List[Dict[str, Any]]]:
"""Validate sales data with Pydantic."""
valid_sales: List[Sale] = []
invalid_sales: List[Dict[str, Any]] = []
max_decimals: int = config["max_decimals"]
for sale in sales:
try:
# Check decimal places
price_str: str = str(sale["price"])
if len(price_str.split(".")[1]) > max_decimals:
raise ValueError(f"Price exceeds {max_decimals} decimals")
valid_sale: Sale = Sale(**sale)
valid_sales.append(valid_sale)
print(f"Validated: {valid_sale.dict()}")
except (ValueError, TypeError) as e:
print(f"Invalid sale: {sale}, Error: {e}")
invalid_sales.append(sale)
return valid_sales, invalid_sales
def query_sales(db_path: str, min_price: float) -> List[Dict[str, Any]]:
"""Query sales from SQLite."""
conn: sqlite3.Connection = sqlite3.connect(db_path)
cursor: sqlite3.Cursor = conn.cursor()
cursor.execute(
"SELECT product, price, quantity FROM sales WHERE price >= ?",
(min_price,)
)
results: List[Dict[str, Any]] = [
{"product": row[0], "price": row[1], "quantity": row[2]}
for row in cursor.fetchall()
]
print(f"Queried {len(results)} sales with price >= {min_price}")
conn.close()
return results
def process_sales(sales: List[Sale]) -> Dict[str, Any]:
"""Process validated sales."""
if not sales:
print("No valid sales data")
return {"total_sales": 0.0, "unique_products": [], "top_products": {}}
amounts: List[float] = [sale.price * sale.quantity for sale in sales]
total_sales: float = sum(amounts)
unique_products: List[str] = list({sale.product for sale in sales})
# Group by product
product_sums: Dict[str, float] = {}
for sale in sales:
amount: float = sale.price * sale.quantity
product_sums[sale.product] = product_sums.get(sale.product, 0.0) + amount
top_products: Dict[str, float] = dict(
sorted(product_sums.items(), key=lambda x: x[1], reverse=True)[:3]
)
return {
"total_sales": total_sales,
"unique_products": unique_products,
"top_products": top_products
}
def export_results(results: Dict[str, Any], json_path: str) -> None:
"""Export results to JSON."""
print(f"Writing to: {json_path}")
with open(json_path, "w") as file:
json.dump(results, file, indent=2)
print(f"Exported to {json_path}")
def main() -> None:
"""Main function."""
config_path: str = "data/config.yaml"
db_path: str = "data/sales.db"
json_path: str = "data/sales_report.json"
config: Dict[str, Any] = read_config(config_path)
sales_data: List[Dict[str, Any]] = query_sales(db_path, config["min_price"])
valid_sales, invalid_sales = validate_sales(sales_data, config)
results: Dict[str, Any] = process_sales(valid_sales)
export_results(results, json_path)
print("\nSales Report:")
print(f"Total Records: {len(sales_data)}")
print(f"Valid Sales: {len(valid_sales)}")
print(f"Invalid Sales: {len(invalid_sales)}")
print(f"Total Sales: ${round(results['total_sales'], 2)}")
print(f"Unique Products: {results['unique_products']}")
print(f"Top Products: {results['top_products']}")
if __name__ == "__main__":
main()# File: de-onboarding/test_db_client.py
from typing import List, Dict, Any
import sqlite3
import pytest
from db_client import Sale, validate_sales, query_sales, process_sales
def setup_test_db(db_path: str) -> None:
"""Setup test database."""
conn: sqlite3.Connection = sqlite3.connect(db_path)
cursor: sqlite3.Cursor = conn.cursor()
cursor.execute("CREATE TABLE sales (product TEXT, price REAL, quantity INTEGER)")
cursor.executemany(
"INSERT INTO sales (product, price, quantity) VALUES (?, ?, ?)",
[
("Halal Laptop", 999.99, 2),
("Halal Mouse", 24.99, 10),
("Non-Halal Item", 49.99, 5)
]
)
conn.commit()
conn.close()
@pytest.fixture
def test_db(tmp_path) -> str:
"""Create temporary test database."""
db_path: str = str(tmp_path / "test.db")
setup_test_db(db_path)
return db_path
@pytest.fixture
def config() -> Dict[str, Any]:
"""Sample config."""
return {
"min_price": 10.0,
"max_quantity": 100,
"required_fields": ["product", "price", "quantity"],
"product_prefix": "Halal",
"max_decimals": 2
}
def test_validate_sales(config: Dict[str, Any]) -> None:
"""Test sales validation."""
sales_data: List[Dict[str, Any]] = [
{"product": "Halal Laptop", "price": 999.99, "quantity": 2},
{"product": "Non-Halal Item", "price": 49.99, "quantity": 5},
{"product": "Halal Mouse", "price": 24.999, "quantity": 10} # Too many decimals
]
valid_sales, invalid_sales = validate_sales(sales_data, config)
assert len(valid_sales) == 1
assert valid_sales[0].product == "Halal Laptop"
assert len(invalid_sales) == 2
def test_query_sales(test_db: str) -> None:
"""Test querying sales."""
sales: List[Dict[str, Any]] = query_sales(test_db, 50.0)
assert len(sales) == 1
assert sales[0]["product"] == "Halal Laptop"
def test_process_sales() -> None:
"""Test processing sales."""
sales: List[Sale] = [
Sale(product="Halal Laptop", price=999.99, quantity=2),
Sale(product="Halal Mouse", price=24.99, quantity=10)
]
results: Dict[str, Any] = process_sales(sales)
assert results["total_sales"] == 2249.88
assert len(results["unique_products"]) == 2
assert results["top_products"]["Halal Laptop"] == 1999.98
def test_empty_db(tmp_path) -> None:
"""Test empty database."""
db_path: str = str(tmp_path / "empty.db")
conn: sqlite3.Connection = sqlite3.connect(db_path)
conn.execute("CREATE TABLE sales (product TEXT, price REAL, quantity INTEGER)")
conn.close()
sales: List[Dict[str, Any]] = query_sales(db_path, 10.0)
assert len(sales) == 0Expected Outputs
data/sales_report.json:
{
"total_sales": 2499.83,
"unique_products": ["Halal Laptop", "Halal Mouse", "Halal Keyboard"],
"top_products": {
"Halal Laptop": 1999.98,
"Halal Mouse": 249.9,
"Halal Keyboard": 249.95
}
}Console Output (abridged):
Opening config: data/config.yaml
Loaded config: {'min_price': 10.0, 'max_quantity': 100, ...}
Queried 3 sales with price >= 10.0
Validated: {'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2}
Validated: {'product': 'Halal Mouse', 'price': 24.99, 'quantity': 10}
Validated: {'product': 'Halal Keyboard', 'price': 49.99, 'quantity': 5}
Writing to: data/sales_report.json
Exported to data/sales_report.json
Sales Report:
Total Records: 3
Valid Sales: 3
Invalid Sales: 0
Total Sales: $2499.83
Unique Products: ['Halal Laptop', 'Halal Mouse', 'Halal Keyboard']
Top Products: {'Halal Laptop': 1999.98, 'Halal Mouse': 249.9, 'Halal Keyboard': 249.95}Test Output:
collected 4 items
test_db_client.py::test_validate_sales PASSED
test_db_client.py::test_query_sales PASSED
test_db_client.py::test_process_sales PASSED
test_db_client.py::test_empty_db PASSEDHow to Run and Test
Setup:
- Create
de-onboarding/data/and populate withsales.db,config.yamlper Appendix 1. - Install libraries:
pip install pyyaml pydantic pytest pyright sqlite3. - Configure editor for 4-space indentation per PEP 8.
- Save
db_client.pyandtest_db_client.py.
- Create
Run:
- Run:
pyright db_client.py, thenpython db_client.py. - Outputs:
data/sales_report.json, console logs.
- Run:
Test:
- Run:
pytest test_db_client.py -v. - Verify all tests pass.
- Test empty database by modifying
setup_test_dbto insert no data.
- Run:
15.6 Practice Exercises
Exercise Setup Checklist:
- Ensure
data/sales.dbexists per Appendix 1. - Install
pydantic,pytest,pyright(pip install pydantic pytest pyright). - Verify 4-space indentation in your editor.
Exercise 1: Type-Safe Query Function
Write a type-safe function to query sales by quantity, with 4-space indentation per PEP 8.
Expected Output:
[{'product': 'Halal Mouse', 'price': 24.99, 'quantity': 10}]Follow-Along Instructions:
- Save as
de-onboarding/ex1_query.py. - Run:
pyright ex1_query.py, thenpython ex1_query.py. - How to Test:
- Add:
print(query_by_quantity("data/sales.db", 10)). - Verify output matches expected.
- Test with
min_quantity=100: Should return empty list. - Test with
min_quantity=-1: Should return empty list, as negative quantities are invalid. - Common Errors:
- DatabaseError: Run
sqlite3 data/sales.db "SELECT * FROM sales;". - IndentationError: Use 4 spaces (not tabs). Run
python -tt ex1_query.py.
- DatabaseError: Run
- Add:
Exercise 2: Pydantic Model with Custom Validation
Create a Pydantic model with validators for decimal places and the Halal prefix, ensuring compliance with Hijra Group’s Sharia rules, with 4-space indentation per PEP 8.
Sample Input:
sales_data = [
{"product": "Halal Laptop", "price": 999.99, "quantity": 2},
{"product": "Non-Halal Mouse", "price": 24.999, "quantity": 10}
]Expected Output:
[{'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2}]Follow-Along Instructions:
- Save as
de-onboarding/ex2_pydantic.py. - Run:
pyright ex2_pydantic.py, thenpython ex2_pydantic.py. - How to Test:
- Add:
print([sale.dict() for sale in validate_sales(sales_data)]). - Verify output matches expected.
- Test with valid prices and prefixes: Should include all valid sales.
- Common Errors:
- ValidationError: Print
saleandtype(sale['price']). - IndentationError: Use 4 spaces (not tabs). Run
python -tt ex2_pydantic.py.
- ValidationError: Print
- Add:
Exercise 3: Pytest for Database Query
Write a pytest test for a query function, with 4-space indentation per PEP 8. The test_db fixture creates a temporary database for testing, ensuring isolation from sales.db. Print test_db to inspect its path.
Expected Output:
test_query_by_quantity PASSEDFollow-Along Instructions:
- Save as
de-onboarding/ex3_test.py. - Ensure
ex1_query.pyis inde-onboarding/. - Run:
pytest ex3_test.py -v. - How to Test:
- Verify test passes.
- Test with
min_quantity=100: Should pass with empty results. - Common Errors:
- FixtureError: Print
test_dbto debug. - IndentationError: Use 4 spaces (not tabs). Run
python -tt ex3_test.py.
- FixtureError: Print
Exercise 4: Debug Type Mismatch
Fix a type mismatch in a query function, with 4-space indentation per PEP 8.
Buggy Code:
def query_sales(db_path: str) -> List[str]:
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
cursor.execute("SELECT product, price, quantity FROM sales")
results = [{"product": row[0], "price": row[1], "quantity": row[2]} for row in cursor.fetchall()]
conn.close()
return resultsExpected Output:
[{'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2}, ...]Follow-Along Instructions:
- Save as
de-onboarding/ex4_debug.py. - Run:
pyright ex4_debug.pyto identify error, fix, and re-run. - How to Test:
- Add:
print(query_sales("data/sales.db")). - Verify output matches expected.
- Common Errors:
- TypeError: Check
pyrightoutput for type mismatch. - IndentationError: Use 4 spaces (not tabs). Run
python -tt ex4_debug.py.
- TypeError: Check
- Add:
Exercise 5: Conceptual Analysis of Pydantic Benefits
Explain the benefits of Pydantic vs. manual validation in a Hijra Group pipeline processing sales data, considering revenue misreporting and performance trade-offs (e.g., Pydantic’s runtime overhead). Save the answer to de-onboarding/ex5_concepts.txt, with 4-space indentation per PEP 8.
Expected Output (in ex5_concepts.txt):
Pydantic simplifies validation in Hijra Group’s sales data pipeline by enforcing type constraints (e.g., PositiveFloat for prices), preventing errors like passing a string price (‘999.99’) that could misreport revenue and violate IFSB standards. Unlike manual validation, which requires extensive if-statements and is error-prone, Pydantic catches invalid data at runtime. Its integration with Pyright ensures static type checking, enhancing pipeline reliability. Pydantic’s runtime overhead is higher than manual checks but ensures robust validation.Follow-Along Instructions:
- Save as
de-onboarding/ex5_concepts.py. - Run:
python ex5_concepts.py. - How to Test:
- Verify
ex5_concepts.txtcontains the expected text. - Check file with
cat ex5_concepts.txt(Unix/macOS) ortype ex5_concepts.txt(Windows). - Common Errors:
- FileNotFoundError: Ensure write permissions in
de-onboarding/. - IndentationError: Use 4 spaces (not tabs). Run
python -tt ex5_concepts.py.
- FileNotFoundError: Ensure write permissions in
- Verify
15.7 Exercise Solutions
Solution to Exercise 1
from typing import List, Dict, Any
import sqlite3
def query_by_quantity(db_path: str, min_quantity: int) -> List[Dict[str, Any]]:
"""Query sales by minimum quantity."""
conn: sqlite3.Connection = sqlite3.connect(db_path)
cursor: sqlite3.Cursor = conn.cursor()
cursor.execute(
"SELECT product, price, quantity FROM sales WHERE quantity >= ?",
(min_quantity,)
)
results: List[Dict[str, Any]] = [
{"product": row[0], "price": row[1], "quantity": row[2]}
for row in cursor.fetchall()
]
conn.close()
return results
if __name__ == "__main__":
print(query_by_quantity("data/sales.db", 10)) # Valid case
print(query_by_quantity("data/sales.db", 100)) # Edge case: no results
print(query_by_quantity("data/sales.db", -1)) # Edge case: invalid quantity
# Output:
# [{'product': 'Halal Mouse', 'price': 24.99, 'quantity': 10}]
# []
# []Solution to Exercise 2
from pydantic import BaseModel, PositiveFloat, PositiveInt, validator
from typing import List
class Sale(BaseModel):
product: str
price: PositiveFloat
quantity: PositiveInt
@validator("price")
def check_decimals(cls, value: float) -> float:
if len(str(value).split(".")[1]) > 2:
raise ValueError("Price must have <= 2 decimals")
return value
@validator("product")
def check_product_prefix(cls, value: str) -> str:
if not value.startswith("Halal"):
raise ValueError("Product must start with 'Halal'")
return value
def validate_sales(sales: List[dict]) -> List[Sale]:
valid_sales: List[Sale] = []
for sale in sales:
try:
valid_sale: Sale = Sale(**sale)
valid_sales.append(valid_sale)
except ValueError as e:
print(f"Invalid: {sale}, Error: {e}")
return valid_sales
if __name__ == "__main__":
sales_data: List[dict] = [
{"product": "Halal Laptop", "price": 999.99, "quantity": 2},
{"product": "Non-Halal Mouse", "price": 24.999, "quantity": 10}
]
print([sale.dict() for sale in validate_sales(sales_data)])
# Output:
# [{'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2}]Solution to Exercise 3
import pytest
from typing import List, Dict, Any
import sqlite3
from ex1_query import query_by_quantity
@pytest.fixture
def test_db(tmp_path) -> str:
db_path: str = str(tmp_path / "test.db")
conn: sqlite3.Connection = sqlite3.connect(db_path)
cursor: sqlite3.Cursor = conn.cursor()
cursor.execute("CREATE TABLE sales (product TEXT, price REAL, quantity INTEGER)")
cursor.execute("INSERT INTO sales VALUES ('Halal Mouse', 24.99, 10)")
conn.commit()
conn.close()
return db_path
def test_query_by_quantity(test_db: str) -> None:
sales: List[Dict[str, Any]] = query_by_quantity(test_db, 10)
assert len(sales) == 1
assert sales[0]["product"] == "Halal Mouse"
# Run with: pytest ex3_test.py -vSolution to Exercise 4
from typing import List, Dict, Any
import sqlite3
def query_sales(db_path: str) -> List[Dict[str, Any]]:
"""Query sales with correct type annotation."""
conn: sqlite3.Connection = sqlite3.connect(db_path)
cursor: sqlite3.Cursor = conn.cursor()
cursor.execute("SELECT product, price, quantity FROM sales")
results: List[Dict[str, Any]] = [
{"product": row[0], "price": row[1], "quantity": row[2]}
for row in cursor.fetchall()
]
conn.close()
return results
if __name__ == "__main__":
print(query_sales("data/sales.db"))
# Output:
# [{'product': 'Halal Laptop', 'price': 999.99, 'quantity': 2}, ...]Solution to Exercise 5
def explain_pydantic_benefits() -> None:
"""Explain Pydantic vs. manual validation benefits."""
explanation: str = (
"Pydantic simplifies validation in Hijra Group’s sales data pipeline by "
"enforcing type constraints (e.g., PositiveFloat for prices), preventing "
"errors like passing a string price (‘999.99’) that could misreport revenue "
"and violate IFSB standards. Unlike manual validation, which requires extensive "
"if-statements and is error-prone, Pydantic catches invalid data at runtime. "
"Its integration with Pyright ensures static type checking, enhancing pipeline "
"reliability. Pydantic’s runtime overhead is higher than manual checks but ensures "
"robust validation."
)
with open("ex5_concepts.txt", "w") as file:
file.write(explanation)
print("Explanation saved to ex5_concepts.txt")
if __name__ == "__main__":
explain_pydantic_benefits()
# Output:
# Explanation saved to ex5_concepts.txt
# (Creates ex5_concepts.txt with explanation)15.8 Chapter Summary and Connection to Chapter 16
You’ve mastered:
- Type Annotations: Using Generics and Union for database operations, with best practices for specific types.
- Pydantic: Validating data with models, ensuring runtime and static reliability.
- SQLite: Type-safe queries with
sqlite3, suitable for small-scale applications. - Testing: Unit and integration tests with pytest, with robust debugging for fixtures.
- Performance: O(n) time/space complexity for queries and validation, visualized for clarity.
The micro-project built a type-safe SQLite client, ensuring robust data handling for Hijra Group’s analytics by preventing errors like incorrect price types that could violate IFSB standards. With comprehensive tests and 4-space indentation per PEP 8, it prepares for Chapter 16: PostgreSQL Fundamentals, which transitions to production-grade databases using psycopg2, building on type-safe principles and extending to enterprise-scale analytics with multi-user support.