TL;DR
- Current mechanistic interpretability focuses on understanding models post-hoc (e.g., sparse autoencoders). We investigated whether restricting activations to the simplex could yield models that are both practical and more interpretable by design.
- Points on the simplex have a privileged basis and a natural interpretation as probability distributions, so the hope was that vertices would correspond to relevant features and activations could be read as feature probabilities.
- We explored several MLP variants constrained to the simplex: stochastic weight matrices, rescaled ReLU (apply ReLU, then normalize activations to sum to 1), dimension-rescaled ReLU (additionally scale by the layer dimension), and a decaying variant that exponentially anneals that scale factor during training.
- We analyzed training obstacles and interpretability metrics for each variant. This project was done as part of SPAR (Feb-May 2025) and was later extended into work on simplex-constrained transformers and manifold neural blocks.
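To make the variants above concrete, here is a minimal numpy sketch of the constrained activations. The function names, the `eps` stabilizer, and the exact decay schedule (annealing the scale from the layer dimension `d` toward 1) are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def rescaled_relu(x, eps=1e-8):
    """Rescaled ReLU: apply ReLU, then normalize so each activation
    vector sums to 1, i.e. lies on the probability simplex."""
    z = np.maximum(x, 0.0)
    return z / (z.sum(axis=-1, keepdims=True) + eps)

def dimension_rescaled_relu(x, eps=1e-8):
    """Dimension-rescaled ReLU: same as rescaled ReLU, but scaled by
    the layer dimension d, so activations sum to d instead of 1."""
    d = x.shape[-1]
    return d * rescaled_relu(x, eps)

def decaying_rescaled_relu(x, step, total_steps, eps=1e-8):
    """Decaying variant (hypothetical schedule): exponentially anneal
    the scale factor from d at step 0 down to 1 at the final step."""
    d = x.shape[-1]
    scale = d ** (1.0 - step / total_steps)
    return scale * rescaled_relu(x, eps)

def row_stochastic(W):
    """One simple way to get a stochastic weight matrix: softmax each
    row so every row is a probability distribution."""
    e = np.exp(W - W.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Reading activations as feature probabilities then amounts to inspecting the normalized vector: each coordinate is nonnegative and (for the rescaled ReLU) the coordinates sum to 1.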