56 articles
arXiv’s new penalty targets papers where hallucinated references, AI meta-comments, or similar traces show that authors did not verify the text before submission.
arXiv is not banning AI tools, but it is making the author’s name mean something again when a paper shows obvious signs of unchecked model output.
A new analysis of LFBOT host galaxies supports a compact-object collision with a Wolf-Rayet star.
Researchers at the National University of Singapore and RoboScience have built FingerEye, a compact sensor that keeps visual and tactile signals together from approach to contact.
FORTE, a University of Texas at Austin robotic hand, reached 91.9% single-trial grasping success on 31 objects by using compliant fingers that measure force and slip.
Peñarrubia and Nadler propose that dwarf spheroidal galaxies evolve toward an attractor linking stellar radius and velocity dispersion.
Paper arXiv:2604.07467 shows that discrete speech units encode lexical tone less reliably than segmental speech structure.
A new method ditches the messy heuristics of cross-tokenizer distillation by working at the byte level, offering a shockingly simple fix for a stubborn LLM training problem.
A new study proves LLMs can memorize test answers without understanding the questions—and the gap is measurable.
A new arXiv paper treats LLM hallucinations as a classification error—and builds a gate to block them before they escape.
An arXiv paper compares fine-tuning, RAG, and a hybrid LLM approach for building an RCA knowledge base from support tickets.
A new arXiv study reveals language models refuse to help users bypass rules—even unjust ones—95% of the time.
Researchers from the Institute of Science and Technology Austria have identified a new class of stars known as Merger Remnants.
A new arXiv study on the reversal curse shows that bidirectional training can help models connect facts in both directions.
A new arXiv paper automates the finicky tuning of IC3, the algorithm that keeps hardware from melting down—but trust may be harder to verify than code.
SoLA is interesting because it does not promise another smaller model trained from scratch, but tries to compress an existing LLM without extra training or special hardware.
A new arXiv paper details an approach for teaching large language models to generate consistently correct code.
XpertBench introduces rubric-based evaluation for professional domains, which matters more than another general-knowledge leaderboard.
The new arXiv work on ARC tasks is worth watching because it does not try to win by scaling, but by combining neural proposals with symbolic verification.
SIEVE uses SIEVE-GEN to create synthetic queries from decomposed context and then distills them into model weights.
Automated evaluation can scale safety checks, but it must not pretend to be diagnosis.
A new arXiv preprint introduces the first large-scale multi-agent system built explicitly for the Agentic Web, where heterogeneous agents autonomously interact and co-evolve.
Top AI models’ accuracy plunges from 85.8% to 61.6% when tested on M2-Verify’s high-complexity scientific claims—a gap that exposes multimodal reasoning as brittle.
Sven’s authors claim their pseudoinverse-based optimizer cuts natural gradient costs to *k*× stochastic overhead—without defining *k* for real-world models.
The Habitable Worlds Observatory may image an Earth-like planet, but without a precise mass measurement that discovery remains scientifically unfinished.
Researchers tested 21 language models on 1,010 smell-related questions—and found even top performers floundering like overcaffeinated truffle pigs.
NGC 1052-DF9’s stars move at speeds implying virtually no dark matter—yet the galaxy remains intact, defying a core tenet of astrophysics.
The arXiv paper 2604.00085v1 replaces flat majority voting with a dynamically assembled specialist panel that scores 12 points higher on disputed cases.
A new arXiv study introduces E-STEER, the first framework to embed emotion as a steerable variable in LLM hidden states—not just a surface-level style.
Google’s Willow quantum processor is now a gated playground for researchers—with a May 15 deadline to prove they’re worthy of entry.
Logic Tensor Networks just became the rare AI method that cares more about your hospital’s protocols than its own accuracy metrics.
Cross-dataset EEG emotion recognition just got a prototype-driven upgrade—on paper, at least, with PAA-L’s local alignment outpacing global adversarial methods in early arXiv tests.
The KGWAS framework has been upgraded to incorporate contextual information, aiming to improve detection power and provide mechanistic insights.
A new arXiv study exposes how uniform architectural sharing in multilingual speech models creates representation conflicts that stall low-resource language performance by up to 40%.
The arXiv paper’s authors admit what KG vendors won’t: 90% of the world’s textual data is still *unstructured noise*—and no one’s cracked the cost-efficient way to turn it into actionable graphs.
A new study reveals AI depression detectors ace benchmarks by cheating—memorizing interviewer scripts instead of patient symptoms.
Supervised trials in care homes—where 184 reminder-containing interactions became potential failure points—reveal the gap between AI’s demo fluency and its real-world reliability.
A new paper dismantles accuracy as a meaningful AI benchmark by scoring models on *how* they fail, not just whether they do.
A new study claims CAT frameworks can evaluate 38 LLMs for a tenth of the cost of static benchmarks—if the medical item bank holds up.
A new paper argues AI self-improvement will stall when human-written data runs out.
LATENT achieved a 96.5% success rate on a Unitree G1 returning tennis balls within 2.5 meters of the target.
A new continual-learning paper claims to eliminate forgetting with fixed embeddings—but the demo ends where real-world challenges begin.
Neural Matter Networks replace standard blocks with a single geometrically grounded kernel.
Researchers have long been puzzled by the paradox of tabular machine learning, where high-dimensional, collinear, and error-prone data yield state-of-the-art performance.
P-GRPO tries to keep personalized gradients intact instead of flattening feedback into one global average.
A new reasoning-based LLM unlearning method cuts model bias by 40% by surgically removing unsafe knowledge, without full retraining.
A new arXiv study shows NLLB-200 partly tracks language phylogeny, suggesting deeper linguistic patterns.
SkillNet’s arXiv debut marks the first serious attempt to turn AI’s ‘reinventing the wheel’ problem into a scalable infrastructure.
Cheaper in AI often means dumber. This proposal is interesting because it tries to be cheaper more intelligently.
Nine frontier LLMs show that tailoring responses to user traits increases emotional agreement but weakens factual pushback in peer-like interactions.