Changelog

All notable changes to OptiPFair will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

🔧 Compatibility

  • Raised minimum PyTorch requirement to >1.11.0.

[0.3.0] - 2026-03-02

🎉 New Features

Fairness-Aware Pruning

  • New Function: analyze_neuron_bias() - Analyze per-neuron bias contributions across multiple demographic prompt pairs
    • Computes activation-based bias scores for individual neurons
    • Supports multiple aggregation methods (mean, max) across sequence positions
    • Works with GLU architecture MLP layers (gate_proj, up_proj)
  • New Function: compute_fairness_pruning_scores() - Combine bias and importance scores for balanced pruning
    • Configurable bias_weight parameter (0.0 to 1.0) to adjust fairness vs. performance trade-offs
    • Returns fairness pruning scores for each layer
    • Enables fairness-aware neuron selection strategies
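The combination of bias and importance scores can be illustrated with a small sketch. This is not OptiPFair's actual implementation: the per-neuron bias metric (mean absolute activation gap across demographic prompt pairs) and the convex combination below are plausible readings of the description above, and every name except `bias_weight` is hypothetical.

```python
# Illustrative sketch only -- the real analyze_neuron_bias() /
# compute_fairness_pruning_scores() may use a different formula.

def neuron_bias_scores(acts_group_a, acts_group_b):
    """Mean absolute activation gap per neuron across paired prompts."""
    n_pairs = len(acts_group_a)
    return [
        sum(abs(a[i] - b[i]) for a, b in zip(acts_group_a, acts_group_b)) / n_pairs
        for i in range(len(acts_group_a[0]))
    ]

def fairness_pruning_scores(importance, bias, bias_weight=0.5):
    """Convex combination of normalized scores; high score = keep the neuron.

    bias_weight=0.0 ranks purely by importance; 1.0 purely by low bias.
    """
    def normalize(xs):
        lo, hi = min(xs), max(xs)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

    imp_n, bias_n = normalize(importance), normalize(bias)
    return [
        (1.0 - bias_weight) * i + bias_weight * (1.0 - b)
        for i, b in zip(imp_n, bias_n)
    ]
```

With `bias_weight=0.0` the ranking reduces to plain importance; raising it toward 1.0 increasingly favors low-bias neurons, matching the documented fairness vs. performance trade-off.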

Enhanced Pruning Integration

  • Modified: prune_model_mlp_glu() - Improved compatibility with fairness-aware workflows
  • Documentation: Added comprehensive fairness-aware pruning guide with examples

📚 Documentation Enhancements

New Fairness-Aware Pruning Section

  • Complete guide to fairness-aware pruning workflow with:
    • Step-by-step tutorial for analyze_neuron_bias()
    • Step-by-step tutorial for compute_fairness_pruning_scores()
    • Understanding the bias_weight parameter with recommended configurations
    • Complete end-to-end example combining bias analysis with pruning
    • Common patterns for fairness-aware analysis
  • New example notebook: fairness_aware_pruning_demo.ipynb

Updated API Documentation

  • Added analyze_neuron_bias() to API reference
  • Added compute_fairness_pruning_scores() to API reference
  • Enhanced usage guide with fairness workflows

🧪 Testing & Quality

  • Compatible with existing pruning functionality
  • No breaking changes to existing API
  • All existing tests remain passing

[0.2.4] - 2026-01-10

🎉 New Features

Universal DataLoader Format Support for analyze_layer_importance

  • Multi-Format Batch Handling: analyze_layer_importance now automatically detects and handles multiple DataLoader batch formats without requiring HuggingFace dataset utilities
  • Supported Formats:
    • Dictionary: HuggingFace-style {'input_ids': tensor, 'attention_mask': tensor}
    • Tuple/List: PyTorch TensorDataset format (input_ids, attention_mask, ...)
    • Single Tensor: Direct tensor input treated as input_ids
  • Positional Mapping: Tuple/list elements automatically map to standard transformer arguments: [0]=input_ids, [1]=attention_mask, [2]=token_type_ids, etc.
  • Internal Utility: New _prepare_batch_inputs() function normalizes all formats transparently
  • Debug Logging: Optional DEBUG-level logging shows format detection and positional mapping
  • Zero Breaking Changes: Existing code with dict-based DataLoaders works exactly as before
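The format detection can be sketched roughly as follows. This is a simplified stand-in for the internal `_prepare_batch_inputs(batch, device)` utility: it omits device placement and the documented `ValueError` for unsupported types, `prepare_batch_inputs` is a hypothetical name, and entries beyond the first three positional keys are simply ignored here.

```python
# Positional names follow the documented mapping:
# [0]=input_ids, [1]=attention_mask, [2]=token_type_ids.
_POSITIONAL_KEYS = ("input_ids", "attention_mask", "token_type_ids")

def prepare_batch_inputs(batch):
    """Normalize a DataLoader batch into keyword arguments for model(**kwargs)."""
    if isinstance(batch, dict):
        return dict(batch)                     # HuggingFace-style dict batch
    if isinstance(batch, (tuple, list)):
        # Map positionally; None entries are skipped in this sketch.
        return {k: v for k, v in zip(_POSITIONAL_KEYS, batch) if v is not None}
    return {"input_ids": batch}                # bare tensor treated as input_ids
```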

Closes Issues: #12, #17, #18

πŸ“ Documentation Updates

Terminology Update: MAW → PPM

  • New Nomenclature: The neuron selection method previously known as "MAW (Maximum Absolute Weight)" is now officially documented as PPM (Peak-to-Peak Magnitude), which more accurately describes the calculation method (max + |min|).
  • Backward Compatibility: The parameter value "MAW" is maintained for full backward compatibility and maps to the PPM method.
  • Research Foundation: PPM is formally described in: Martra, P. (2025). Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2. arXiv. https://arxiv.org/abs/2512.22671
  • Updated Documentation: All documentation files now reference PPM as the primary name with MAW noted as the legacy parameter value.
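The renamed metric is a restatement of the documented `max + |min|` calculation, which fits in a one-line sketch (illustrative, not the library's code):

```python
def ppm_score(weights):
    """Peak-to-Peak Magnitude: max(w) + |min(w)| over a neuron's weights."""
    return max(weights) + abs(min(weights))
```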

Clarification: L2 Norm Neuron Selection Method

  • Existing Feature: The L2 norm method (neuron_selection_method="L2") has been available since early versions
  • How It Works: Calculates neuron importance using L2 (Euclidean) norms of weight values: ||gate_weight||₂ + ||up_weight||₂
  • Static Only: Supports weight-only (static) pruning exclusively - not compatible with data-driven mode (dataloader)
  • Documentation Enhancement: Added explicit warnings in usage guides about L2 limitations vs PPM/MAW data-driven capabilities
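A sketch of the L2 score described above: the Euclidean norm of a neuron's `gate_proj` row plus that of its `up_proj` row, computed from weights alone so no calibration data is needed (function and argument names are illustrative).

```python
import math

def l2_score(gate_row, up_row):
    """Weight-only importance: ||gate_row||_2 + ||up_row||_2."""
    return math.hypot(*gate_row) + math.hypot(*up_row)
```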

🧪 Testing

Comprehensive Test Coverage for Batch Format Support

  • Unit Tests: 11 new tests for _prepare_batch_inputs() covering all format variations
  • Integration Tests: 5 new tests for analyze_layer_importance() with different DataLoader types
  • Test Coverage: Dict batches, 2-element tuples, 3+ element tuples, lists, single tensors, None handling, device placement
  • All Tests Pass: 16 new tests + 95 existing tests = 111 total passing tests

🔧 Technical Details

Implementation

  • File: optipfair/pruning/utils.py
  • New Function: _prepare_batch_inputs(batch, device) - internal utility with underscore prefix
  • Modified Function: analyze_layer_importance() in optipfair/pruning/depth.py now uses normalized batch handling
  • Device Handling: All tensors automatically moved to model device regardless of input format
  • Error Handling: Clear ValueError with format hints for unsupported batch types

Enhanced Examples

  • layer_importance_analysis.ipynb: Added section demonstrating TensorDataset (tuple format) usage
  • docs/usage.md: New examples showing analyze_layer_importance with various DataLoader formats

🔒 Compatibility

  • Fully Backward Compatible: All existing code continues to work without modification
  • No API Changes: Function signatures unchanged, new functionality is transparent
  • Python: Requires Python >=3.8 (unchanged)
  • Dependencies: No new dependencies added

[0.2.3] - 2025-12-04

πŸ› Bug Fixes

Fixed Hybrid Importance Calculation

  • compute_neuron_pair_importance_maw_hybrid(): Simplified and fixed the formula for hybrid importance calculation
  • Improved Accuracy: Now correctly combines static weight magnitudes with dynamic activation statistics
  • Better Performance: Reduced unnecessary calculations while maintaining correctness
  • Consistent Methodology: Uses MAW (Maximum Absolute Weight) consistently across all MLP components
  • No API Changes: Fully backward compatible, internal optimization only

🔧 Technical Details

Fixed Functions

  • compute_neuron_pair_importance_maw_hybrid(): Corrected importance score calculation:
    • Static Component: Uses MAW (max + |min|) for gate_proj and up_proj layers
    • Normalization: Scales each component to [0,1] for balanced weighting
    • Hybrid Fusion: Multiplies structural potential by activation norms
    • Validation: All tests pass, no breaking changes
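The corrected calculation can be sketched as below. The exact fusion follows Equation 8 of the CFSP paper, so treat this as an illustrative approximation: static PPM per neuron, min-max normalized to [0, 1], scaled by that neuron's activation norm. All names are hypothetical.

```python
def maw_hybrid_importance(gate_rows, up_rows, act_norms):
    """Sketch of the hybrid score: normalized static PPM x activation norm."""
    def ppm(row):                               # static component: max + |min|
        return max(row) + abs(min(row))

    def norm01(xs):                             # scale to [0, 1]
        lo, hi = min(xs), max(xs)
        return [1.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]

    static = [ppm(g) + ppm(u) for g, u in zip(gate_rows, up_rows)]
    return [s * a for s, a in zip(norm01(static), act_norms)]
```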

🔒 Compatibility

  • Fully backward compatible with v0.2.2 and earlier
  • No changes to public API
  • No changes to function signatures
  • Internal optimization only

[0.2.2] - 2025-11-26

🎉 New Features

Selective Layer Width Pruning

  • layer_indices for MLP_GLU: Extended layer_indices parameter to support selective neuron pruning in specific layers
  • Contextual Usage: For DEPTH pruning, specifies layers to remove; for MLP_GLU, specifies layers to prune
  • Preservation Strategy: Allows preserving critical layers (e.g., first/last) at full capacity while pruning others
  • Full Compatibility: Works seamlessly with all MLP_GLU features (expansion_rate, expansion_divisor, dataloader, all methods)

Simplified Hybrid Importance Calculation

  • Optimized MAW Hybrid: Simplified compute_neuron_pair_importance_maw_hybrid() to use simple MAW for gate_proj and up_proj
  • Focused Complexity: Maintains complex activation-weighted calculation only for down_proj where it has most impact
  • Better Performance: Faster execution by reducing unnecessary calculations
  • Consistent Formula: Uses same MAW method (max + |min|) as static pruning for gate/up components

✨ Enhancements

  • Extended API: layer_indices parameter now works for both DEPTH and MLP_GLU pruning types
  • Smart Validation: Comprehensive error checking for layer indices (range, duplicates, empty lists, types)
  • Enhanced Statistics: get_pruning_statistics() now reports selective pruning info (pruned_layers, total_layers)
  • Selective Calibration: Hooks only registered on selected layers when using data-driven pruning with layer_indices
  • CLI Support: Updated --layer-indices help text to mention both pruning types
  • Backward Compatible: layer_indices=None maintains default behavior (prunes all layers)

🔧 Technical Details

Modified Functions

  • prune_model(): Updated docstring and passes layer_indices to prune_model_mlp_glu()
  • prune_model_mlp_glu(): Added layer_indices parameter with full validation and filtering logic
  • setup_mlp_hooks_for_importance(): Now accepts layer_indices to register hooks only on selected layers
  • compute_neuron_pair_importance_maw_hybrid(): Simplified to use MAW for gate/up, complex calculation only for down
  • get_pruning_statistics(): Detects and reports selective pruning information
  • CLI commands.py: Removed restriction blocking layer_indices for MLP_GLU, added parsing logic

📚 Documentation

  • README.md: New "Selective Layer Width Pruning" section with examples and use cases
  • Reference Manual: Comprehensive section with 4+ usage examples and best practices
  • New Example File: examples/selective_layer_width_pruning.py with 5 complete examples
  • Updated Roadmap: Marked selective pruning as completed in v0.2.2
  • API Documentation: Updated parameter descriptions for contextual meaning

🧪 Testing

  • Complete test suite in tests/test_selective_layer_pruning.py
  • 12 comprehensive test cases covering:
    • Basic selective pruning (single and multiple layers)
    • All neuron selection methods (MAW, VOW, PON)
    • Compatibility with expansion_rate and expansion_divisor
    • Data-driven pruning with layer_indices
    • Invalid input handling and validation
    • Statistics reporting
    • Weight preservation in unpruned layers
    • Result consistency and reproducibility

💡 Use Cases

  1. Preserve Critical Layers: Keep first and last layers at full capacity
  2. Importance-Based: Target least important layers identified by analysis
  3. Domain Adaptation: Implement asymmetric pruning strategies
  4. Experimental: Test different layer-wise pruning patterns

🔒 Compatibility

  • Fully backward compatible with v0.2.1
  • Works with all neuron selection methods (MAW, VOW, PON)
  • Compatible with both static and data-driven pruning
  • Integrates with expansion_rate and expansion_divisor

⚠️ Important Notes

  • layer_indices validation ensures indices are valid, unique integers within model range
  • Empty lists raise ValueError
  • Selective pruning with dataloader only calibrates on specified layers (more efficient)
  • Statistics include pruned_layers and total_layers when selective pruning is detected
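The validation rules above can be condensed into a standalone check. This is a sketch, not the library's code: the function name, argument names, error messages, and check order are all illustrative.

```python
def validate_layer_indices(layer_indices, num_layers):
    """Enforce the documented rules: valid, unique integers in model range."""
    if layer_indices is None:
        return                                  # default: prune all layers
    if not isinstance(layer_indices, list) or not layer_indices:
        raise ValueError("layer_indices must be a non-empty list or None")
    if not all(isinstance(i, int) for i in layer_indices):
        raise ValueError("layer_indices must contain integers")
    if len(set(layer_indices)) != len(layer_indices):
        raise ValueError("layer_indices must not contain duplicates")
    if any(i < 0 or i >= num_layers for i in layer_indices):
        raise ValueError(f"layer_indices must be in [0, {num_layers - 1}]")
```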

[0.2.1] - 2025-11-24

🎉 New Features

Hardware-Optimized Pruning with expansion_divisor

  • expansion_divisor Parameter: New parameter to round intermediate layer sizes to specific multiples (32, 64, 128, 256)
  • GPU Optimization: Ensures tensor dimensions are optimized for modern GPU/TPU architectures
  • Flexible Integration: Works seamlessly with both pruning_percentage and expansion_rate parameters
  • Automatic Rounding: Intelligently rounds to the nearest multiple after pruning calculation

✨ Enhancements

  • Extended API: New expansion_divisor parameter in prune_model() and prune_model_mlp_glu()
  • Hardware Alignment: Better memory access patterns for tensor cores and SIMD operations
  • Validation System: Comprehensive error checking for valid divisor values and parameter combinations
  • Utility Function: New round_to_divisor() function for precise rounding logic

🔧 Technical Details

New Functions

  • round_to_divisor(): Rounds values to nearest multiple of specified divisor

Modified Functions

  • prune_model(): Added expansion_divisor parameter with validation
  • prune_model_mlp_glu(): Integrated expansion_divisor validation and propagation
  • prune_neuron_pairs(): Added rounding logic after initial pruning calculation

📚 Documentation

  • Updated API reference with expansion_divisor examples
  • Added comprehensive usage guide for hardware optimization
  • Created Jupyter notebook example: examples/expansion_divisor_example.ipynb
  • Updated README.md with hardware-optimized pruning section
  • Updated examples/README.md with new tutorial link
  • Enhanced LLM reference manual with expansion_divisor documentation

🧪 Testing

  • Complete test suite in tests/test_expansion_divisor.py
  • Validation tests for all allowed values
  • Rounding behavior tests
  • Integration tests with different pruning methods
  • Edge case testing

⚠️ Important Notes

  • expansion_divisor cannot be used alone - requires either pruning_percentage or expansion_rate
  • Valid values: None (default), 32, 64, 128, 256
  • Rounding maintains bounds: result is always ≥ 1 and ≤ the original size
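The documented rounding behavior can be sketched as follows. Only the name `round_to_divisor` comes from this changelog; the argument names and exact signature are assumptions.

```python
def round_to_divisor(size, divisor, original_size):
    """Snap a post-pruning intermediate size to the nearest multiple of
    `divisor`, clamped so the result stays >= 1 and <= the original size."""
    if divisor is None:
        return size                              # default: no rounding
    rounded = round(size / divisor) * divisor    # nearest multiple
    return max(1, min(rounded, original_size))   # enforce documented bounds
```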

🔒 Compatibility

  • Fully backward compatible with v0.2.0
  • Works with all neuron selection methods (MAW, VOW, PON)
  • Compatible with both static and data-driven pruning


[0.2.0] - 2025-10-27

🎉 Major Features

Data-Driven Width Pruning

  • Hybrid Importance Calculation: Implemented data-driven neuron selection combining static weights with activation statistics
  • Activation Capture System: PyTorch hooks infrastructure to collect neuron activations during calibration
  • CFSP Method Integration: Implementation based on "CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information" (arXiv:2409.13199v2)

✨ Enhancements

  • Extended API: New dataloader parameter in prune_model() for calibration data
  • Automatic Method Selection: Intelligent switching between static and hybrid pruning based on dataloader presence
  • Memory Optimization: CPU-based activation storage during calibration to minimize VRAM usage
  • Better Error Messages: Comprehensive validation with clear error messages for incompatible configurations

🔧 Technical Details

New Functions

  • compute_neuron_pair_importance_maw_hybrid(): Hybrid importance calculation using Equation 8 from CFSP paper
  • setup_mlp_hooks_for_importance(): Register forward hooks for activation capture
  • get_activation_norms(): Retrieve accumulated L2 norms from calibration
  • run_calibration_forward_passes(): Execute calibration with progress tracking
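The capture machinery can be sketched with a plain PyTorch forward hook that accumulates squared activations on CPU, mirroring the documented memory optimization. Function and variable names here are illustrative, not OptiPFair's internals.

```python
import torch
from torch import nn

def attach_activation_norm_hook(module, store, key):
    """Accumulate the sum of squared activations per output neuron, on CPU."""
    def hook(_module, _inputs, output):
        sq = output.detach().float().pow(2)
        sq = sq.reshape(-1, sq.shape[-1]).sum(dim=0).cpu()   # per-neuron sums
        store[key] = store.get(key, torch.zeros_like(sq)) + sq
    return module.register_forward_hook(hook)

def activation_norms(store, key):
    """L2 norm per neuron, accumulated over all calibration batches."""
    return store[key].sqrt()
```

In use: attach a hook per selected MLP layer, run the calibration batches through the model, read the norms, then call `.remove()` on each handle so the hooks do not outlive calibration.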

Modified Functions

  • prune_model(): Added dataloader parameter
  • prune_model_mlp_glu(): Integrated calibration workflow and hybrid pruning logic
  • prune_neuron_pairs(): Extended to support both static and hybrid importance calculation

📚 Documentation

  • Updated API reference with data-driven pruning examples
  • Added comprehensive usage guide for hybrid pruning
  • Created Jupyter notebook example: examples/data_driven_pruning.ipynb
  • Updated README with quick start guide for data-driven pruning

🧪 Testing

  • Validated on Gemma, LLaMA, and Mistral model families
  • Confirmed backward compatibility with existing static pruning code
  • Added validation for dataloader compatibility with pruning methods

⚠️ Breaking Changes

None - This release is fully backward compatible with v0.1.x

🔒 Compatibility

  • Only neuron_selection_method="MAW" supports data-driven pruning
  • VOW and PON methods remain static-only (will raise ValueError if used with dataloader)
  • Supports PyTorch dataloaders with dict or tuple batch formats

[0.1.5] - 2024-XX-XX

Added

  • Layer importance analysis
  • Depth pruning functionality

Fixed

  • Various bug fixes and improvements

[0.1.0] - 2024-XX-XX

Added

  • Bias visualization tools
  • Initial release
  • MLP GLU pruning support
  • MAW, VOW, PON neuron selection methods
  • CLI interface