Permit decimation_factors other than 2.
Permit tap counts != 64 (but still must be multiple of 8).
Half the amount of tap memory required.
Performance is significantly degraded due to greater flexibility -- most likely due to separate sample buffer shift phase, instead of performing shift during output sample calculation.