Permutation Test on imbalanced group sets | Tudorita Zaharia

Tudorita Zaharia


MSc. Artificial Intelligence in Medicine, University of Bern.

Schaffhausen, Switzerland

Permutation Tests on imbalanced group sets

February 18, 2026


While working on a lab dataset, I wanted to determine whether several groups share the same chemical properties. I tried different statistical tests to show that my groups were equivalent across multiple features. These tests broke down because some features suffered from heteroscedasticity or were undersampled. A permutation test makes very few assumptions (essentially only that the labels are exchangeable under the null hypothesis) and is more reliable on small datasets than asymptotic tests.



Permutation Test

(Definition) Permutation Test

Let \( S(\mathbf{X}, \mathbf{y}) \) be a scoring function (e.g. balanced accuracy) for a model trained on feature matrix \( \mathbf{X} \in \mathbb{R}^{n \times p} \) and label vector \( \mathbf{y} \in \{0, 1\}^n \). The permutation p-value is defined as:

$$ p = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\left[ S(\mathbf{X}, \pi_b(\mathbf{y})) \geq S(\mathbf{X}, \mathbf{y}) \right] $$

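where \( \pi_b \) is the \( b \)-th random permutation of the label vector, \( B \) is the total number of permutations, and \( \mathbf{1}[\cdot] \) is the indicator function. In words, \( p \) is the fraction of permutations whose score is at least as high as the score obtained with the true labels.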
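
To make the definition concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code used on the lab dataset: the scoring function `nearest_centroid_score` is a hypothetical stand-in for \( S \) (in-sample balanced accuracy of a nearest-centroid rule), and the toy data, `B = 1000`, and the seed are made up for the example.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls for binary labels in {0, 1}."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        if idx:
            recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def nearest_centroid_score(X, y):
    """Hypothetical S(X, y): fit one centroid per class, then score in-sample."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    preds = []
    for x in X:
        dist = {cls: sum((a - b) ** 2 for a, b in zip(x, c))
                for cls, c in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return balanced_accuracy(y, preds)

def permutation_p_value(X, y, score, B=1000, seed=0):
    """Fraction of label permutations scoring at least as high as the true labels."""
    rng = random.Random(seed)
    observed = score(X, y)
    y_perm = list(y)
    count = 0
    for _ in range(B):
        rng.shuffle(y_perm)  # pi_b(y): class sizes stay the same, only the assignment changes
        if score(X, y_perm) >= observed:
            count += 1
    return count / B

# Toy imbalanced data (assumed for illustration): 12 samples in group 0, 4 in group 1.
X = [[0.1 * i] for i in range(12)] + [[5.0 + 0.1 * i] for i in range(4)]
y = [0] * 12 + [1] * 4

p = permutation_p_value(X, y, nearest_centroid_score, B=1000, seed=0)
print(f"observed score = {nearest_centroid_score(X, y):.2f}, p = {p:.3f}")
```

Because shuffling keeps the group sizes fixed, the null distribution automatically reflects the imbalance between groups. Note that the formula as written can return exactly \( p = 0 \) when no permutation reaches the observed score; a common variant, used for example by scikit-learn's `permutation_test_score`, reports \( (C + 1) / (B + 1) \) instead, where \( C \) is the number of permutations scoring at least as high as the observed score.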