- Raised minimum PyTorch requirement to `>=1.11.0`.
- New Function: `analyze_neuron_bias()`
  - Analyzes per-neuron bias contributions across multiple demographic prompt pairs
  - Computes activation-based bias scores for individual neurons
  - Supports multiple aggregation methods (mean, max) across sequence positions
  - Works with GLU-architecture MLP layers (`gate_proj`, `up_proj`)
- New Function: `compute_fairness_pruning_scores()`
  - Combines bias and importance scores for balanced pruning
  - Configurable `bias_weight` parameter (0.0 to 1.0) to adjust the fairness vs. performance trade-off
  - Returns fairness pruning scores for each layer
  - Enables fairness-aware neuron selection strategies
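As a rough illustration of how bias and importance scores might be combined under a `bias_weight`, here is a hypothetical sketch (not the library's actual `compute_fairness_pruning_scores()` formula; it assumes both score tensors are min-max normalized first):

```python
import torch

def combine_scores(importance, bias, bias_weight=0.5):
    """Hypothetical fairness-aware score: higher means 'keep this neuron'.
    bias_weight=0.0 -> pure importance; 1.0 -> pure bias mitigation."""
    def minmax(t):
        rng = t.max() - t.min()
        return (t - t.min()) / rng if rng > 0 else torch.zeros_like(t)
    # important neurons are rewarded, biased neurons penalized
    return (1.0 - bias_weight) * minmax(importance) - bias_weight * minmax(bias)
```

Neurons with the lowest combined score would be the first candidates for pruning.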
- Modified: `prune_model_mlp_glu()`
  - Improved compatibility with fairness-aware workflows
- Documentation: Added a comprehensive fairness-aware pruning guide covering:
  - Step-by-step tutorials for `analyze_neuron_bias()` and `compute_fairness_pruning_scores()`
  - Understanding the `bias_weight` parameter, with recommended configurations
  - A complete end-to-end example combining bias analysis with pruning
  - Common patterns for fairness-aware analysis
- New example notebook: `fairness_aware_pruning_demo.ipynb`
- Added `analyze_neuron_bias()` to the API reference
- Added `compute_fairness_pruning_scores()` to the API reference
- Enhanced usage guide with fairness workflows
- Compatible with existing pruning functionality
- No breaking changes to existing API
- All existing tests continue to pass
- Multi-Format Batch Handling: `analyze_layer_importance` now automatically detects and handles multiple DataLoader batch formats without requiring HuggingFace dataset utilities
- Supported Formats:
  - Dictionary: HuggingFace-style `{'input_ids': tensor, 'attention_mask': tensor}`
  - Tuple/List: PyTorch `TensorDataset` format `(input_ids, attention_mask, ...)`
  - Single Tensor: Direct tensor input treated as `input_ids`
- Positional Mapping: Tuple/list elements automatically map to standard transformer arguments: `[0]=input_ids`, `[1]=attention_mask`, `[2]=token_type_ids`, etc.
- Internal Utility: New `_prepare_batch_inputs()` function normalizes all formats transparently
- Debug Logging: Optional DEBUG-level logging shows format detection and the positional mapping
- Zero Breaking Changes: Existing code with dict-based DataLoaders works exactly as before
Closes Issues: #12, #17, #18
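The detection logic can be sketched as follows (an illustrative reimplementation, not the library's internal `_prepare_batch_inputs()`):

```python
import torch

def prepare_batch_inputs(batch, device):
    """Normalize a DataLoader batch into a kwargs dict for a transformer
    forward pass. Positional names below follow the mapping described
    in the changelog entry; the real internal code may differ."""
    positional = ("input_ids", "attention_mask", "token_type_ids")
    if isinstance(batch, dict):
        # HuggingFace-style dict batch: move tensors to the target device
        return {k: v.to(device) if torch.is_tensor(v) else v for k, v in batch.items()}
    if isinstance(batch, (tuple, list)):
        # TensorDataset-style batch: map elements positionally
        return {name: t.to(device) for name, t in zip(positional, batch)}
    if torch.is_tensor(batch):
        # bare tensor: treat as input_ids
        return {"input_ids": batch.to(device)}
    raise ValueError(f"Unsupported batch type: {type(batch).__name__}")
```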
- New Nomenclature: The neuron selection method previously known as "MAW (Maximum Absolute Weight)" is now officially documented as PPM (Peak-to-Peak Magnitude), which more accurately describes the calculation method (max + |min|).
- Backward Compatibility: The parameter value `"MAW"` is maintained for full backward compatibility and maps to the PPM method.
- Research Foundation: PPM is formally described in: Martra, P. (2025). Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2. arXiv. https://arxiv.org/abs/2512.22671
- Updated Documentation: All documentation files now reference PPM as the primary name with MAW noted as the legacy parameter value.
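For reference, the PPM calculation (max + |min| per neuron) can be expressed as a one-liner over each neuron's weight row (an illustrative sketch; the function name is hypothetical):

```python
import torch

def ppm_scores(weight):
    """Peak-to-Peak Magnitude per output neuron: max(w) + |min(w)|
    over each neuron's weight row."""
    return weight.max(dim=1).values + weight.min(dim=1).values.abs()
```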
- Existing Feature: The L2 norm method (`neuron_selection_method="L2"`) has been available since early versions
- How It Works: Calculates neuron importance using L2 (Euclidean) norms of weight values: `||gate_weight||₂ + ||up_weight||₂`
- Static Only: Supports weight-only (static) pruning exclusively; not compatible with data-driven mode (`dataloader`)
- Documentation Enhancement: Added explicit warnings in usage guides about L2 limitations vs. PPM/MAW data-driven capabilities
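A minimal sketch of the static L2 scoring (illustrative; the function name is hypothetical):

```python
import torch

def l2_scores(gate_weight, up_weight):
    """Static L2 importance per neuron pair: ||gate_row||2 + ||up_row||2."""
    return gate_weight.norm(p=2, dim=1) + up_weight.norm(p=2, dim=1)
```

Because the score depends only on weights, it requires no forward passes, which is also why L2 is static-only.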
- Unit Tests: 11 new tests for `_prepare_batch_inputs()` covering all format variations
- Integration Tests: 5 new tests for `analyze_layer_importance()` with different DataLoader types
- Test Coverage: Dict batches, 2-element tuples, 3+ element tuples, lists, single tensors, None handling, device placement
- All Tests Pass: 16 new tests + 95 existing tests = 111 total passing tests
- File: `optipfair/pruning/utils.py`
- New Function: `_prepare_batch_inputs(batch, device)`, an internal utility with underscore prefix
- Modified Function: `analyze_layer_importance()` in `optipfair/pruning/depth.py` now uses normalized batch handling
- Device Handling: All tensors are automatically moved to the model's device regardless of input format
- Error Handling: Clear `ValueError` with format hints for unsupported batch types
- `layer_importance_analysis.ipynb`: Added a section demonstrating `TensorDataset` (tuple format) usage
- `docs/usage.md`: New examples showing `analyze_layer_importance` with various DataLoader formats
- Fully Backward Compatible: All existing code continues to work without modification
- No API Changes: Function signatures unchanged, new functionality is transparent
- Python: Requires Python >=3.8 (unchanged)
- Dependencies: No new dependencies added
- `compute_neuron_pair_importance_maw_hybrid()`: Simplified and fixed the formula for the hybrid importance calculation
- Improved Accuracy: Now correctly combines static weight magnitudes with dynamic activation statistics
- Better Performance: Reduced unnecessary calculations while maintaining correctness
- Consistent Methodology: Uses MAW (Maximum Absolute Weight) consistently across all MLP components
- No API Changes: Fully backward compatible, internal optimization only
- `compute_neuron_pair_importance_maw_hybrid()`: Corrected the importance score calculation:
  - Static Component: Uses MAW (max + |min|) for `gate_proj` and `up_proj` layers
  - Normalization: Scales each component to [0, 1] for balanced weighting
  - Hybrid Fusion: Multiplies structural potential by activation norms
- Validation: All tests pass, no breaking changes
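The three steps above can be sketched as follows (illustrative, not the library's exact implementation):

```python
import torch

def hybrid_importance(gate_w, up_w, activation_norms):
    """Illustrative hybrid fusion: static MAW scores (max + |min| per row),
    min-max normalized to [0, 1], then scaled by activation L2 norms."""
    maw = lambda w: w.max(dim=1).values + w.min(dim=1).values.abs()
    static = maw(gate_w) + maw(up_w)            # static component
    rng = static.max() - static.min()
    static = (static - static.min()) / rng if rng > 0 else torch.ones_like(static)
    return static * activation_norms            # hybrid fusion
```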
- Fully backward compatible with v0.2.2 and earlier
- No changes to public API
- No changes to function signatures
- Internal optimization only
- `layer_indices` for MLP_GLU: Extended the `layer_indices` parameter to support selective neuron pruning in specific layers
- Contextual Usage: For DEPTH pruning it specifies layers to remove; for MLP_GLU, layers to prune
- Preservation Strategy: Allows preserving critical layers (e.g., first/last) at full capacity while pruning others
- Full Compatibility: Works seamlessly with all MLP_GLU features (`expansion_rate`, `expansion_divisor`, `dataloader`, all selection methods)
- Optimized MAW Hybrid: Simplified `compute_neuron_pair_importance_maw_hybrid()` to use simple MAW for `gate_proj` and `up_proj`
- Focused Complexity: Maintains the complex activation-weighted calculation only for `down_proj`, where it has the most impact
- Better Performance: Faster execution by reducing unnecessary calculations
- Consistent Formula: Uses same MAW method (max + |min|) as static pruning for gate/up components
- Extended API: The `layer_indices` parameter now works for both DEPTH and MLP_GLU pruning types
- Smart Validation: Comprehensive error checking for layer indices (range, duplicates, empty lists, types)
- Enhanced Statistics: `get_pruning_statistics()` now reports selective pruning info (`pruned_layers`, `total_layers`)
- Selective Calibration: Hooks are only registered on the selected layers when using data-driven pruning with `layer_indices`
- CLI Support: Updated `--layer-indices` help text to mention both pruning types
- Backward Compatible: `layer_indices=None` maintains the default behavior (prunes all layers)
- `prune_model()`: Updated docstring and passes `layer_indices` to `prune_model_mlp_glu()`
- `prune_model_mlp_glu()`: Added `layer_indices` parameter with full validation and filtering logic
- `setup_mlp_hooks_for_importance()`: Now accepts `layer_indices` to register hooks only on selected layers
- `compute_neuron_pair_importance_maw_hybrid()`: Simplified to use MAW for gate/up; complex calculation only for down
- `get_pruning_statistics()`: Detects and reports selective pruning information
- CLI `commands.py`: Removed the restriction blocking `layer_indices` for MLP_GLU, added parsing logic
- README.md: New "Selective Layer Width Pruning" section with examples and use cases
- Reference Manual: Comprehensive section with 4+ usage examples and best practices
- New Example File: `examples/selective_layer_width_pruning.py` with 5 complete examples
- Updated Roadmap: Marked selective pruning as completed in v0.2.2
- API Documentation: Updated parameter descriptions for contextual meaning
- Complete test suite in `tests/test_selective_layer_pruning.py`
- 12 comprehensive test cases covering:
- Basic selective pruning (single and multiple layers)
- All neuron selection methods (MAW, VOW, PON)
- Compatibility with expansion_rate and expansion_divisor
- Data-driven pruning with layer_indices
- Invalid input handling and validation
- Statistics reporting
- Weight preservation in unpruned layers
- Result consistency and reproducibility
- Preserve Critical Layers: Keep first and last layers at full capacity
- Importance-Based: Target least important layers identified by analysis
- Domain Adaptation: Implement asymmetric pruning strategies
- Experimental: Test different layer-wise pruning patterns
- Fully backward compatible with v0.2.1
- Works with all neuron selection methods (MAW, VOW, PON)
- Compatible with both static and data-driven pruning
- Integrates with expansion_rate and expansion_divisor
- `layer_indices` validation ensures indices are valid, unique integers within the model's layer range
- Empty lists raise `ValueError`
- Selective pruning with a dataloader only calibrates on the specified layers (more efficient)
- Statistics include `pruned_layers` and `total_layers` when selective pruning is detected
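The validation rules above can be sketched as a hypothetical helper (illustrative, not the library's internal code):

```python
def validate_layer_indices(layer_indices, num_layers):
    """Validate layer_indices: None means all layers; otherwise a
    non-empty list of unique ints within [0, num_layers)."""
    if layer_indices is None:
        return list(range(num_layers))  # default: prune all layers
    if not isinstance(layer_indices, list) or not layer_indices:
        raise ValueError("layer_indices must be a non-empty list or None")
    if not all(isinstance(i, int) for i in layer_indices):
        raise ValueError("layer_indices must contain integers only")
    if len(set(layer_indices)) != len(layer_indices):
        raise ValueError("layer_indices must not contain duplicates")
    if any(i < 0 or i >= num_layers for i in layer_indices):
        raise ValueError(f"layer_indices must be in [0, {num_layers})")
    return sorted(layer_indices)
```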
- `expansion_divisor` Parameter: New parameter to round intermediate layer sizes to specific multiples (32, 64, 128, 256)
- GPU Optimization: Ensures tensor dimensions are optimized for modern GPU/TPU architectures
- Flexible Integration: Works seamlessly with both the `pruning_percentage` and `expansion_rate` parameters
- Automatic Rounding: Intelligently rounds to the nearest multiple after the pruning calculation
- Extended API: New `expansion_divisor` parameter in `prune_model()` and `prune_model_mlp_glu()`
- Hardware Alignment: Better memory access patterns for tensor cores and SIMD operations
- Validation System: Comprehensive error checking for valid divisor values and parameter combinations
- Utility Function: New `round_to_divisor()` function for precise rounding logic
- `round_to_divisor()`: Rounds values to the nearest multiple of the specified divisor
- `prune_model()`: Added `expansion_divisor` parameter with validation
- `prune_model_mlp_glu()`: Integrated `expansion_divisor` validation and propagation
- `prune_neuron_pairs()`: Added rounding logic after the initial pruning calculation
- Updated API reference with expansion_divisor examples
- Added comprehensive usage guide for hardware optimization
- Created Jupyter notebook example: `examples/expansion_divisor_example.ipynb`
- Updated README.md with a hardware-optimized pruning section
- Updated examples/README.md with new tutorial link
- Enhanced LLM reference manual with expansion_divisor documentation
- Complete test suite in `tests/test_expansion_divisor.py`
- Validation tests for all allowed values
- Rounding behavior tests
- Integration tests with different pruning methods
- Edge case testing
- `expansion_divisor` cannot be used alone; it requires either `pruning_percentage` or `expansion_rate`
- Valid values: `None` (default), `32`, `64`, `128`, `256`
- Rounding maintains bounds: the result is always ≥ 1 and ≤ the original size
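A sketch of the rounding behavior, including the bounds (the real `round_to_divisor()` signature may differ):

```python
def round_to_divisor(value, divisor, max_value=None):
    """Snap `value` to the nearest multiple of `divisor`, keeping the
    result >= 1 and, if max_value is given, <= max_value."""
    if divisor is None:
        return value
    rounded = max(divisor, round(value / divisor) * divisor)
    if max_value is not None:
        rounded = min(rounded, max_value)
    return max(1, rounded)
```

For example, an intermediate size of 1000 with `divisor=128` rounds up to 1024, while 900 rounds down to 896.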
- Fully backward compatible with v0.2.0
- Works with all neuron selection methods (MAW, VOW, PON)
- Compatible with both static and data-driven pruning
All notable changes to OptiPFair will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Hybrid Importance Calculation: Implemented data-driven neuron selection combining static weights with activation statistics
- Activation Capture System: PyTorch hooks infrastructure to collect neuron activations during calibration
- CFSP Method Integration: Implementation based on "CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information" (arXiv:2409.13199v2)
- Extended API: New `dataloader` parameter in `prune_model()` for calibration data
- Automatic Method Selection: Intelligent switching between static and hybrid pruning based on the presence of a dataloader
- Memory Optimization: CPU-based activation storage during calibration to minimize VRAM usage
- Better Error Messages: Comprehensive validation with clear error messages for incompatible configurations
- `compute_neuron_pair_importance_maw_hybrid()`: Hybrid importance calculation using Equation 8 from the CFSP paper
- `setup_mlp_hooks_for_importance()`: Registers forward hooks for activation capture
- `get_activation_norms()`: Retrieves accumulated L2 norms from calibration
- `run_calibration_forward_passes()`: Executes calibration with progress tracking
- `prune_model()`: Added `dataloader` parameter
- `prune_model_mlp_glu()`: Integrated calibration workflow and hybrid pruning logic
- `prune_neuron_pairs()`: Extended to support both static and hybrid importance calculation
- Updated API reference with data-driven pruning examples
- Added comprehensive usage guide for hybrid pruning
- Created Jupyter notebook example: `examples/data_driven_pruning.ipynb`
- Updated README with a quick start guide for data-driven pruning
- Validated on Gemma, LLaMA, and Mistral model families
- Confirmed backward compatibility with existing static pruning code
- Added validation for dataloader compatibility with pruning methods
None; this release is fully backward compatible with v0.1.x
- Only `neuron_selection_method="MAW"` supports data-driven pruning
- VOW and PON methods remain static-only (they raise `ValueError` if used with a dataloader)
- Supports PyTorch dataloaders with dict or tuple batch formats
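The activation-capture idea behind the hooks can be sketched as follows (illustrative; the library's `setup_mlp_hooks_for_importance()` and `get_activation_norms()` may differ in detail):

```python
import torch

def register_norm_hooks(named_modules):
    """Register forward hooks that accumulate per-neuron sums of squared
    activations on CPU; calling .sqrt() afterwards yields L2 norms."""
    norms, handles = {}, []
    def make_hook(name):
        def hook(module, inputs, output):
            # sum of squares over all but the last (neuron) dimension
            sq = output.detach().float().pow(2).sum(dim=tuple(range(output.dim() - 1)))
            norms[name] = norms.get(name, 0.0) + sq.cpu()  # store on CPU to save VRAM
        return hook
    for name, module in named_modules:
        handles.append(module.register_forward_hook(make_hook(name)))
    return norms, handles
```

After the calibration forward passes, `norms[name].sqrt()` gives the accumulated activation L2 norms, and each handle's `.remove()` detaches the hook.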
- Layer importance analysis
- Depth pruning functionality
- Various bug fixes and improvements
- Bias visualization tools
- Initial release
- MLP GLU pruning support
- MAW, VOW, PON neuron selection methods
- CLI interface