Demo 03: Refactoring Exercise - CSV Data Processor
Overview
This demo shows how to approach refactoring code that is messy but functional. The employee_data_processor.py script works correctly, but it contains numerous code smells and anti-patterns that make it hard to maintain, test, or extend.
The Current State (Before)
employee_data_processor.py is a functional CSV processing tool that:
- Reads employee data from employees.csv
- Validates records (email, salary, department, hire_date)
- Transforms data (salary to annual, department codes to full names)
- Outputs to report.json, report.html, and the console
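Reading the CSV is the first place where error handling matters. The sketch below shows a reader that fails with actionable messages. It assumes the four validated fields are the CSV column names. `read_rows`, `read_employee_file`, and `InputError` are hypothetical names and are not taken from employee_data_processor.py.

```python
import csv
import logging
from pathlib import Path
from typing import Iterable, Iterator

logger = logging.getLogger(__name__)

# Assumption: the fields the README says are validated are also the CSV column names.
REQUIRED_COLUMNS = {"email", "salary", "department", "hire_date"}


class InputError(Exception):
    """Hypothetical error type: raised when the input cannot be used; the message says what to fix."""


def read_rows(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield CSV rows as dicts after checking the header for required columns."""
    reader = csv.DictReader(lines)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise InputError(f"CSV header is missing columns: {', '.join(sorted(missing))}")
    yield from reader


def read_employee_file(path: Path) -> list[dict[str, str]]:
    """Read the whole file, translating low-level errors into InputError."""
    try:
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(read_rows(f))
    except FileNotFoundError as exc:
        raise InputError(f"Input file not found: {path.resolve()}") from exc
    except UnicodeDecodeError as exc:
        raise InputError(f"{path} is not valid UTF-8: {exc}") from exc
    logger.info("Read %d rows from %s", len(rows), path)
    return rows
```

Because `read_rows` accepts any iterable of lines, a test can pass a list of strings instead of creating a file.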
Code Smells Present
🚨 God Function: process_employee_data() does everything in one 169-line function
🚨 Global Variables: 4 globals (processed_records, skipped_records, total_salary, dept_count)
🚨 Hardcoded Values: File paths, department mappings, validation rules scattered throughout
🚨 Mixed Concerns: Validation logic mixed with file I/O mixed with output generation
🚨 Copy-Paste Code: Validation blocks repeated unnecessarily
🚨 Poor Naming: Variables like d, dt, sal, f, jf, hf
🚨 Nested Conditionals: 4-5 levels of nested if-else statements
🚨 String Concatenation: Building HTML strings in loops (inefficient)
🚨 Limited Error Handling: A generic try/except that doesn't provide actionable feedback
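One common fix for the string-concatenation smell is to collect the pieces in a list and join them once. The sketch below does this. `render_html_table` and the column names are hypothetical. The sketch also escapes cell values with `html.escape`, a step that hand-built HTML strings often skip.

```python
import html
from collections.abc import Iterable, Mapping, Sequence


def render_html_table(rows: Iterable[Mapping[str, object]], columns: Sequence[str]) -> str:
    """Render rows as an HTML table by building parts in a list and joining once."""
    header = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    parts = ["<table>", f"<tr>{header}</tr>"]
    for row in rows:
        # Escape every cell so values such as "R&D" render as text, not markup.
        cells = "".join(f"<td>{html.escape(str(row.get(col, '')))}</td>" for col in columns)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    return "\n".join(parts)
```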
The Goal (After)
The refactored version should demonstrate:
✅ Single Responsibility: Each function/class has one clear purpose
✅ Separation of Concerns: Validation, transformation, and output are independent
✅ Configuration Management: Constants and config objects replace magic values
✅ Testable Design: Pure functions, dependency injection, no globals
✅ Clear Naming: Descriptive variable and function names
✅ Error Handling: Specific exceptions with actionable messages, plus logging
✅ Extensibility: Easy to add new validation rules or output formats
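Configuration management and the removal of globals can be sketched together. In this sketch, a frozen config object holds the former magic values. A stats object returned from a function replaces `processed_records`, `skipped_records`, `total_salary`, and `dept_count`. The class names and department codes are hypothetical placeholders. Only the file names come from this README.

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class ProcessorConfig:
    """Former magic values in one place; tests can inject a different instance."""
    input_path: str = "employees.csv"
    json_path: str = "report.json"
    html_path: str = "report.html"
    # Placeholder department codes (assumption); the real mapping lives in the original script.
    departments: Mapping[str, str] = field(
        default_factory=lambda: MappingProxyType({"ENG": "Engineering", "SAL": "Sales"})
    )


@dataclass
class ProcessingStats:
    """Replaces the four module-level globals with one returned value."""
    processed: int = 0
    skipped: int = 0
    total_salary: float = 0.0
    dept_count: dict[str, int] = field(default_factory=dict)


def summarize(records: Iterable[Mapping[str, str]], config: ProcessorConfig) -> ProcessingStats:
    """Count and total records without touching any module-level state."""
    stats = ProcessingStats()
    for record in records:
        department = config.departments.get(record["department"])
        if department is None:
            stats.skipped += 1
            continue
        stats.processed += 1
        stats.total_salary += float(record["salary"])
        stats.dept_count[department] = stats.dept_count.get(department, 0) + 1
    return stats
```

`summarize` takes the config as a parameter and returns its results, so a test can supply its own department mapping and inspect the output directly.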
Sample Data
employees.csv contains 10 employee records with intentional issues:
- 6 valid records
- 4 invalid records (negative salary, bad email, invalid department, bad date)
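The four kinds of bad record suggest one small validator per rule. This sketch makes three assumptions: hire dates are in ISO `YYYY-MM-DD` format, emails are checked with a loose pattern, and the department codes are placeholders. The real script's rules may differ. Each validator returns an error message or `None`, so adding a rule means adding one more function to the list.

```python
import re
from datetime import date
from typing import Callable, Optional, Sequence

Row = dict[str, str]
# A validator inspects one raw CSV row and returns an error message, or None if it passes.
Validator = Callable[[Row], Optional[str]]

# Placeholder rules (assumptions): the real script's checks may be stricter or looser.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
VALID_DEPARTMENTS = frozenset({"ENG", "SAL"})


def check_email(row: Row) -> Optional[str]:
    email = row.get("email") or ""
    if EMAIL_PATTERN.fullmatch(email) is None:
        return f"invalid email: {email!r}"
    return None


def check_salary(row: Row) -> Optional[str]:
    raw = row.get("salary") or ""
    try:
        salary = float(raw)
    except ValueError:
        return f"salary is not a number: {raw!r}"
    if salary < 0:
        return f"negative salary: {raw}"
    return None


def check_department(row: Row) -> Optional[str]:
    code = row.get("department") or ""
    if code not in VALID_DEPARTMENTS:
        return f"unknown department: {code!r}"
    return None


def check_hire_date(row: Row) -> Optional[str]:
    raw = row.get("hire_date") or ""
    try:
        date.fromisoformat(raw)
    except ValueError:
        return f"invalid hire_date: {raw!r}"
    return None


DEFAULT_VALIDATORS: tuple[Validator, ...] = (check_email, check_salary, check_department, check_hire_date)


def validate(row: Row, validators: Sequence[Validator] = DEFAULT_VALIDATORS) -> list[str]:
    """Run every validator and collect all errors instead of stopping at the first one."""
    errors = []
    for check in validators:
        message = check(row)
        if message is not None:
            errors.append(message)
    return errors
```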
Running the Script
python3 employee_data_processor.py
This will:
- Read and validate employees.csv
- Print a summary to the console
- Generate report.json with structured data
- Generate report.html with a formatted table
Refactoring Path
Recommended refactoring steps:
- Extract constants and configuration
- Separate validation logic into validators
- Create data classes/structures for employee records
- Extract output generators (JSON, HTML, console)
- Implement proper error handling and logging
- Write tests to verify behavior is preserved
- Consider using dataclasses, pydantic, or similar for validation
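The record-structure and output-generator steps can be sketched with a frozen dataclass for a transformed record and interchangeable report writers. `Employee`, `ReportWriter`, and the writer classes are hypothetical names. The sketch assumes the annual salary and full department name have already been computed. The writers return strings rather than writing files, which keeps them pure and easy to test.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Protocol


@dataclass(frozen=True)
class Employee:
    # Hypothetical field set mirroring the columns this README mentions.
    email: str
    department: str        # full department name after transformation
    annual_salary: float
    hire_date: date


class ReportWriter(Protocol):
    def render(self, employees: list[Employee]) -> str: ...


class JsonReportWriter:
    def render(self, employees: list[Employee]) -> str:
        # asdict() keeps date objects, so default=str serializes them as ISO strings.
        return json.dumps([asdict(e) for e in employees], default=str, indent=2)


class ConsoleReportWriter:
    def render(self, employees: list[Employee]) -> str:
        total = sum(e.annual_salary for e in employees)
        return f"Processed {len(employees)} employee(s); total annual salary {total:,.2f}"


def write_reports(employees: list[Employee], writers: dict[str, ReportWriter]) -> dict[str, str]:
    """Render every report; the caller decides where each string goes (file, stdout, ...)."""
    return {name: writer.render(employees) for name, writer in writers.items()}
```

Adding an HTML format would require only one more class with a `render` method. `write_reports` would not need to change.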
Learning Objectives
This exercise demonstrates:
- Identifying code smells and anti-patterns
- Planning a refactoring strategy
- Applying SOLID principles
- Maintaining functionality while improving code quality
- Testing refactored code to ensure no regressions
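A characterization test is one way to check that refactored code still behaves like the original: run both versions on the same inputs and assert identical results. The functions below are hypothetical stand-ins, a nested-if legacy lookup and its dictionary-based replacement. In practice, the legacy side would be output captured from the original script.

```python
import unittest
from typing import Optional


def legacy_dept_name(d):
    # Hypothetical stand-in for legacy code: nested conditionals, terse naming.
    if d == "ENG":
        return "Engineering"
    else:
        if d == "SAL":
            return "Sales"
        else:
            return None


# Placeholder mapping (assumption) used by the refactored version.
DEPARTMENTS = {"ENG": "Engineering", "SAL": "Sales"}


def dept_name(code: str) -> Optional[str]:
    """Refactored version: a table lookup instead of nested if/else."""
    return DEPARTMENTS.get(code)


class CharacterizationTest(unittest.TestCase):
    """Pins the refactored function to the legacy behavior on the same inputs."""

    CASES = ["ENG", "SAL", "HR", "", "eng"]

    def test_matches_legacy(self):
        for code in self.CASES:
            with self.subTest(code=code):
                self.assertEqual(dept_name(code), legacy_dept_name(code))
```

Run the test with `python -m unittest` followed by the name of the module that contains it. The inputs should include both valid and invalid codes, because edge cases are where a refactor is most likely to change behavior silently.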
Ready to turn this mess into maintainable, professional code! 🛠️