Discussion (28 Comments)
Anisotropy and the cone ideas may explain why PCA underperforms, but they do not uniquely justify this particular quadratic decoder. The geometric story is not doing explanatory work beyond “data is nonlinear,” and the real substance is simply that second-order reconstruction empirically helps.
The normal PCA encoding:
1) Given a mean-centered and scaled X matrix, get the latent variable matrix T with X = T * P’ + e, where P = loadings and e = residuals. The P is your model, so for a new vector x_new, you can calculate t_new = x_new * P (because P’ * P = I).
This is the encoder; nothing changes here. The original matrix is dimensionally reduced, with the residuals e discarded. This is why PCA is lossy.
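Roughly, in numpy (a minimal sketch; the variable names and the random stand-in data are mine, not from the post):

    import numpy as np

    rng = np.random.default_rng(0)
    X_raw = rng.normal(size=(200, 10))            # stand-in training corpus, rows = samples

    mu, sigma = X_raw.mean(axis=0), X_raw.std(axis=0)
    X = (X_raw - mu) / sigma                      # mean-centered and scaled

    k = 3                                         # number of retained components
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:k].T                                  # loadings, shape (10, k); P' * P = I

    T = X @ P                                     # latent scores for the whole corpus

    x_new = (rng.normal(size=10) - mu) / sigma    # a new vector, same preprocessing
    t_new = x_new @ P                             # t_new = x_new * P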
The decoder is where things diverge.
The usual PCA decoder reconstructs a given latent variable t_any by using the trained P loadings, like this: x_reconstructed = t_any * P’. This reconstructed data lies on a linear hyperplane, so if the original data did not lie on that hyperplane, reconstruction errors are potentially high.
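Continuing the sketch above, the linear decoder is just the transpose of the loadings:

    x_lin = t_new @ P.T                           # x_reconstructed = t_any * P'
    x_lin_orig = x_lin * sigma + mu               # undo the scaling/centering if needed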
In your proposal, instead of a linear decoder, you train a quadratic decoder (essentially classic ridge regression on quadratic features of the latent scores) against the original X. So for your reconstruction, you have x_reconstructed = poly(t_new).
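I’d guess it looks something like this scikit-learn sketch (degree=2 and alpha=1.0 are placeholder hyperparameters, and the pipeline is my assumption of what “ridge regression using a quadratic” means here):

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    # ridge regression from quadratic features of the latent scores T back to X
    quad_decoder = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
    quad_decoder.fit(T, X)                        # trained on the original corpus

    x_quad = quad_decoder.predict(t_new.reshape(1, -1))[0]   # x_reconstructed = poly(t_new)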
This achieves lower reconstruction error in-sample (naturally, because quadratic is higher order than linear), but your poly function is trained on a particular corpus. That means when you’re in-distribution relative to that corpus you’re fine, but when you’re not, you can be very wrong in biased ways that PCA’s linear reconstruction is not.
So this is not a better technique than PCA in a general sense. It’s a better reconstruction machine when your data is mostly in-sample. It’s a kind of computationally cheap “specialization” on a particular distribution of data, which can be useful if you’re mostly in-distribution but introduces new risks when out-of-distribution.
Whereas PCA just drops the residual and makes modest claims, a quadratic decoder is trying to predict the residual, and on out-of-sample data it can be wrong in biased ways that PCA is not. In other words, it can hallucinate.
But with a large enough training corpus, chances are we’re going to be in-distribution most of the time, so maybe this could generalize well.
By representing data as multivectors, these models encode translational and rotational symmetries natively, which allows them to handle geometric hierarchies with massive efficiency gains (reports of up to 78x speedups and 200x parameter reductions) compared to standard Transformers.
> A novel sequence architecture is introduced, Versor, which uses Conformal Geometric Algebra (CGA) in place of traditional linear operations to achieve structural generalization and significant performance improvements on a variety of tasks, while offering improved interpretability and efficiency. By embedding states in the manifold and evolving them via geometric transformations (rotors), Versor natively represents -equivariant relationships without requiring explicit structural encoding. Versor is validated on chaotic N-body dynamics, topological reasoning, and standard multimodal benchmarks (CIFAR-10, WikiText-103), consistently outperforming Transformers, Graph Networks, and geometric baselines (GATr, EGNN).
https://arxiv.org/abs/2602.10195
But for RAG that might be too much work per vector?
Polynomial Regression As an Alternative to Neural Nets (2018) https://arxiv.org/abs/1806.06850
Π-nets: Deep Polynomial Neural Networks (2020) https://arxiv.org/abs/2003.03828
None of this stuff is as difficult to understand as people claim it is once you work with it.
That said, I do think it's a good habit to either write out abbreviations in full or link to, say, Wikipedia, e.g. for PCA [1]. It's a well-known tool, but if you come from a slightly different field it might not ring a bell.
[1]: https://en.wikipedia.org/wiki/Principal_component_analysis