
Federated MLLMs: A Pre-Training Workaround for Siloed Data
- ★Fed-MA freezes vision encoders, trains only the projector
- ★Privacy-sensitive silos remain the real bottleneck
- ★No benchmarks yet—just a lightweight pre-training pitch
A new arXiv preprint, Fed-MA, introduces a federated pre-training method for multimodal LLMs (MLLMs) that sidesteps the usual hype: it doesn't claim breakthroughs, just a pragmatic workaround. The core idea, Federated MLLM Alignment (Fed-MA), freezes the vision encoder and LLM backbone and trains only the cross-modal projector in a federated setting. That's a narrow but deliberate choice: it avoids the computational chaos of full-model aggregation while still tapping into siloed data.
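To make the "freeze everything except the projector" idea concrete, here is a minimal numpy sketch of one client's local training step. Everything here is illustrative, not from the paper: the projector is reduced to a single linear map `W`, the frozen encoder to a fixed feature map, and the alignment objective to a least-squares loss. The point is only that gradients touch `W` and nothing else.

```python
import numpy as np

rng = np.random.default_rng(0)
D_VISION, D_LLM = 8, 4  # toy feature dimensions (assumed, not from the paper)

def frozen_vision_encoder(x):
    # Stand-in for a frozen pretrained encoder: a fixed nonlinearity,
    # never updated during training.
    return np.tanh(x)

def local_projector_step(W, images, targets, lr=0.1):
    """One local SGD step on the projector only (MSE alignment loss)."""
    feats = frozen_vision_encoder(images)             # (n, D_VISION), no grad
    preds = feats @ W                                  # (n, D_LLM)
    grad = feats.T @ (preds - targets) / len(images)   # dL/dW for MSE loss
    return W - lr * grad                               # only W is updated

# Toy client data: random "images" and target LLM-space embeddings.
images = rng.normal(size=(16, D_VISION))
targets = rng.normal(size=(16, D_LLM))

W = np.zeros((D_VISION, D_LLM))
for _ in range(50):
    W = local_projector_step(W, images, targets)
```

In a federated round, each silo would run steps like this locally and ship only `W` (a small matrix, not encoder or LLM weights) to the server, which is where the "lightweight" claim comes from.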
The paper’s framing is refreshingly honest: existing federated learning for MLLMs has focused on fine-tuning, leaving pre-training as the untouched elephant in the room. Fed-MA’s lightweight approach suggests a path forward, but it’s still just that—a suggestion. No benchmarks, no deployment metrics, just a conceptual scaffold.
Privacy-sensitive data silos remain the actual bottleneck, not model architecture. Fed-MA doesn’t solve access; it only offers a way to use data you already have (but can’t centralize).

The gap between federated fine-tuning and foundational training
The two acknowledged challenges, parameter interference during aggregation and the cross-modal projector's role, hint at the real tension here: federated pre-training isn't just a technical problem, but a coordination one. Local updates from disparate datasets can clash when averaged, and the projector's design (still underspecified) could become the single point of failure.
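The interference concern can be shown in a few lines. The sketch below uses standard FedAvg-style weighted averaging of client projector matrices (the paper does not specify its aggregation rule; FedAvg is assumed here for illustration): two clients whose local data push the projector in opposite directions produce a global update that cancels out entirely.

```python
import numpy as np

def fedavg(projectors, weights):
    """Weighted average of client projector matrices (FedAvg-style)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * P for w, P in zip(weights, projectors))

# Two clients with equal data volume whose local training moved the
# projector in exactly opposite directions -- the worst case of the
# "parameter interference" the paper flags.
P_a = np.array([[1.0, 0.0],
                [0.0, 1.0]])
P_b = -P_a

global_P = fedavg([P_a, P_b], weights=[10, 10])
# Equal-and-opposite updates average to the zero matrix: neither
# client's alignment survives aggregation.
```

Real silos rarely disagree this perfectly, but the toy case illustrates why aggregation over heterogeneous multimodal data is a coordination problem, not just an engineering one.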
Industry players should note who benefits: cloud providers with federated infrastructure (looking at you, Google's FL frameworks) and enterprises sitting on untapped multimodal data. Open-source reaction? Early but skeptical: developers want numbers, not just architecture diagrams.
The paper’s silence on performance is telling. Fed-MA might be a step, but it’s a small one on a very long staircase.