InfoMamba’s linear filtering layer cuts Transformer memory use by 40% but admits exactly where it falls short of attention.