# Demo 03: Refactoring Exercise - CSV Data Processor

## Overview

This demo shows how to approach refactoring messy but functional code. The `employee_data_processor.py` script works correctly but contains numerous code smells and anti-patterns that make it difficult to maintain, test, or extend.

## The Current State (Before)

`employee_data_processor.py` is a functional CSV processing tool that:

- Reads employee data from `employees.csv`
- Validates records (email, salary, department, hire_date)
- Transforms data (salary to annual, department codes to full names)
- Outputs to `report.json`, `report.html`, and console

### Code Smells Present

🚨 **God Function:** `process_employee_data()` does everything in one 169-line function

🚨 **Global Variables:** 4 globals (`processed_records`, `skipped_records`, `total_salary`, `dept_count`)

🚨 **Hardcoded Values:** File paths, department mappings, validation rules scattered throughout

🚨 **Mixed Concerns:** Validation logic mixed with file I/O mixed with output generation

🚨 **Copy-Paste Code:** Validation blocks repeated unnecessarily

🚨 **Poor Naming:** Variables like `d`, `dt`, `sal`, `f`, `jf`, `hf`

🚨 **Nested Conditionals:** 4-5 levels of nested if-else statements

🚨 **String Concatenation:** Building HTML strings in loops (inefficient)

🚨 **Limited Error Handling:** Generic `try`/`except` that doesn't provide actionable feedback

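To make the nested-conditional and copy-paste smells concrete, here is a minimal sketch of how deeply nested validation could be flattened into independent checks. It is not the script's actual code: the function name `validate_record`, the email pattern, the ISO date format, and the department codes are all assumptions chosen for illustration.

```python
import re
from datetime import date

# Assumed rules for illustration only; the real script's rules may differ.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
KNOWN_DEPARTMENTS = {"ENG", "HR", "FIN"}  # hypothetical department codes


def validate_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record is valid."""
    errors: list[str] = []

    # Each check is independent, so no check is nested inside another.
    if not EMAIL_PATTERN.match(record.get("email", "")):
        errors.append("invalid email")

    try:
        if float(record.get("salary", "")) <= 0:
            errors.append("salary must be positive")
    except ValueError:
        errors.append("salary is not a number")

    if record.get("department", "") not in KNOWN_DEPARTMENTS:
        errors.append("unknown department")

    try:
        date.fromisoformat(record.get("hire_date", ""))
    except ValueError:
        errors.append("invalid hire_date")

    return errors
```

Returning every problem at once, rather than stopping at the first failure, also addresses the error-handling smell: a skipped record can be reported with specific reasons.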
## The Goal (After)

The refactored version should demonstrate:

✅ **Single Responsibility:** Each function/class has one clear purpose

✅ **Separation of Concerns:** Validation, transformation, and output are independent

✅ **Configuration Management:** Constants and config objects replace magic values

✅ **Testable Design:** Pure functions, dependency injection, no globals

✅ **Clear Naming:** Descriptive variable and function names

✅ **Error Handling:** Proper exception handling and logging

✅ **Extensibility:** Easy to add new validation rules or output formats

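One possible way to meet the configuration and no-globals goals is sketched below. The file names and the four counter names come from this README; the class names `ProcessorConfig` and `Summary`, the function `summarize`, and the `annual_salary` key are hypothetical and only illustrate the idea.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessorConfig:
    """Collects the file paths that were previously hardcoded."""
    input_path: str = "employees.csv"
    json_path: str = "report.json"
    html_path: str = "report.html"


@dataclass
class Summary:
    """Replaces the four module-level globals with one returned value."""
    processed_records: int = 0
    skipped_records: int = 0
    total_salary: float = 0.0
    dept_count: Counter = field(default_factory=Counter)


def summarize(valid: list[dict], skipped: int) -> Summary:
    """Pure function: the result depends only on the arguments."""
    summary = Summary(skipped_records=skipped)
    for record in valid:
        summary.processed_records += 1
        summary.total_salary += record["annual_salary"]  # hypothetical key
        summary.dept_count[record["department"]] += 1
    return summary
```

Because `summarize` touches no module state, a test can call it repeatedly with different inputs and never needs to reset anything between runs.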
## Sample Data

`employees.csv` contains 10 employee records with intentional issues:

- 6 valid records
- 4 invalid records (negative salary, bad email, invalid department, bad date)

## Running the Script

```bash
python3 employee_data_processor.py
```

This will:

1. Read and validate `employees.csv`
2. Print a summary to console
3. Generate `report.json` with structured data
4. Generate `report.html` with a formatted table

## Refactoring Path

Recommended refactoring steps:

1. Extract constants and configuration
2. Separate validation logic into validators
3. Create data classes/structures for employee records
4. Extract output generators (JSON, HTML, console)
5. Implement proper error handling and logging
6. Write tests to verify behavior is preserved
7. Consider using `dataclasses`, pydantic, or a similar library for record modeling and validation

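Steps 3 and 4 above could look roughly like the following sketch. The `Employee` fields (including `name`, which this README does not list), the renderer function names, and the `RENDERERS` registry are assumptions for illustration, not the exercise's reference solution.

```python
import html
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class Employee:
    # Field names are assumptions; "name" in particular is hypothetical.
    name: str
    email: str
    department: str
    annual_salary: float


def render_json(employees: list[Employee]) -> str:
    """Serialize the records as a JSON array of objects."""
    return json.dumps([asdict(e) for e in employees], indent=2)


def render_html(employees: list[Employee]) -> str:
    """Build table rows in a list and join once, instead of += in a loop."""
    rows = [
        "<tr><td>{}</td><td>{}</td><td>{:.2f}</td></tr>".format(
            html.escape(e.name), html.escape(e.department), e.annual_salary
        )
        for e in employees
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Adding an output format means adding one function and one entry here.
RENDERERS: dict[str, Callable[[list[Employee]], str]] = {
    "json": render_json,
    "html": render_html,
}
```

Each renderer returns a string rather than writing a file, so the caller decides where the output goes and tests can inspect the result directly.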
## Learning Objectives

This exercise demonstrates:

- Identifying code smells and anti-patterns
- Planning a refactoring strategy
- Applying SOLID principles
- Maintaining functionality while improving code quality
- Testing refactored code to ensure no regressions

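The last two objectives can be practiced with a characterization test: run the old and new code on the same inputs and assert that the results match. The sketch below is an illustration only. Both functions are stand-ins rather than code from `employee_data_processor.py`, and the `period` argument is a hypothetical detail of how the salary-to-annual transformation might work.

```python
MONTHS_PER_YEAR = 12


def legacy_to_annual(sal, t):
    # Stand-in for a messy legacy branch; NOT the real script's code.
    if t == "monthly":
        r = sal * 12
    else:
        if t == "annual":
            r = sal
        else:
            r = None
    return r


def to_annual(amount, period):
    """Refactored version using guard clauses and a named constant."""
    if period == "monthly":
        return amount * MONTHS_PER_YEAR
    if period == "annual":
        return amount
    return None


def test_refactor_preserves_behavior():
    # Include edge cases: an unknown period and a zero amount.
    cases = [(5000, "monthly"), (60000, "annual"), (100, "weekly"), (0, "monthly")]
    for amount, period in cases:
        assert to_annual(amount, period) == legacy_to_annual(amount, period)
```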
---

*Ready to turn this mess into maintainable, professional code! 🛠️*