Permutation Test on imbalanced group sets

February 18, 2026

While working on a lab dataset, I was trying to determine whether multiple groups share the same chemical properties. I tried several statistical tests to show that my groups were equivalent across multiple features, but those tests broke down because some features were heteroscedastic or undersampled. A permutation test is a nearly assumption-free alternative. It only requires that the labels be exchangeable under the null hypothesis, and because it builds the null distribution from the data itself, it is more reliable on small datasets than tests that rely on asymptotic approximations.

Permutation Test

(Definition) Permutation Test

Let $ S(\mathbf{X}, \mathbf{y}) $ be a scoring function (e.g., balanced accuracy, which averages per-class recall and is therefore not inflated by imbalanced group sizes) for a model trained on feature matrix $ \mathbf{X} \in \mathbb{R}^{n \times p} $ and label vector $ \mathbf{y} \in \{0, 1\}^n $. The permutation p-value is defined as:

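$$ p = \frac{1 + \sum_{b=1}^{B} \mathbf{1}\left[ S(\mathbf{X}, \pi_b(\mathbf{y})) \geq S(\mathbf{X}, \mathbf{y}) \right]}{B + 1} $$

where $ \pi_b $ is the $ b $-th random permutation of the label vector, and $ B $ is the total number of permutations. The $ +1 $ terms count the observed labelling as one of the permutations, which keeps $ p $ strictly positive and makes the test valid when the permutations are drawn at random rather than exhaustively enumerated.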
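The definition can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the method used on the lab dataset. The helper names (`balanced_accuracy`, `nearest_centroid_loo_score`, `permutation_p_value`) are hypothetical. The scoring function $ S $ is assumed to be the leave-one-out balanced accuracy of a simple nearest-centroid classifier, standing in for whatever model is actually used. The p-value uses the $ (1 + \text{count}) / (B + 1) $ form, which counts the observed labelling as one permutation so that $ p $ is never exactly zero. The synthetic data (30 vs. 10 samples) is made up purely to mimic imbalanced groups.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so a small group weighs as much as a large one."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))


def nearest_centroid_loo_score(X, y):
    """Hypothetical S(X, y): leave-one-out balanced accuracy of a nearest-centroid classifier."""
    n = len(y)
    y_pred = np.empty(n, dtype=y.dtype)
    for i in range(n):
        mask = np.arange(n) != i            # hold out sample i
        X_train, y_train = X[mask], y[mask]
        classes = np.unique(y_train)
        centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(centroids - X[i], axis=1)
        y_pred[i] = classes[np.argmin(dists)]  # assign to the closest group centroid
    return balanced_accuracy(y, y_pred)


def permutation_p_value(X, y, score_fn, n_permutations=1000, seed=0):
    """Compare the observed score against scores obtained with shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    # Shuffling y keeps the group sizes fixed but breaks any link between X and y.
    null_scores = np.array(
        [score_fn(X, rng.permutation(y)) for _ in range(n_permutations)]
    )
    # +1 in numerator and denominator counts the observed labelling itself.
    p = (1 + np.sum(null_scores >= observed)) / (n_permutations + 1)
    return observed, null_scores, float(p)


# Synthetic, imbalanced example: 30 samples in group 0, 10 in group 1, 3 features.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(30, 3)),
               rng.normal(1.5, 1.0, size=(10, 3))])
y = np.array([0] * 30 + [1] * 10)

observed, null_scores, p = permutation_p_value(
    X, y, nearest_centroid_loo_score, n_permutations=200
)
print(f"S = {observed:.3f}, p = {p:.4f}")
```

Because permutations preserve the group sizes, the null distribution already reflects the 30/10 imbalance. Balanced accuracy keeps the majority group from dominating the score: a classifier that always predicts group 0 would score 0.5, not 0.75. With $ B = 200 $, the smallest attainable p-value is $ 1/201 $, so $ B $ should be raised when very small p-values need to be resolved.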