Decoder and Encoder LLM Models

XDA Developers on MSN

I tested Google's new Gemma 4 12B on my 8GB GPU, and now I don't want to go back to smaller models

Not bad for limited hardware ...

Context compression finally works in production: new research cuts LLM input 16x without the accuracy hit

Context windows are becoming a computational bottleneck. The longer an agent runs, the more tokens accumulate from retrieved documents, reasoning traces and conversation history, and the more memory ...

IEEE

Visual Evidence-aware for Object Hallucinations Rectification in LLM-based Video Captioning

Abstract: Recent neural models for video captioning are typically built using a framework that combines a pre-trained visual encoder with a large language model (LLM) decoder. However, large language ...

note

Gemma 4 12B In-Depth: A New Model Bringing Full-Scale Multimodality to Laptops with an Encoder-Free Design

Gemma 4 12B is a new model in the Gemma 4 family announced by Google on June 3, 2026. It is positioned as an "encoder-free unified multimodal model optimized for laptops." The official blog (Google ...

GitHub

OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation

We introduce OneCAT, a unified multimodal model that seamlessly integrates understanding, generation, and editing within a novel, pure decoder-only transformer architecture. Our framework uniquely ...

Nature

POLYT5: an encoder-decoder foundation chemical language model for generative polymer design

Traditional machine learning has advanced polymer discovery, yet direct generation of chemically valid and synthesizable polymers without exhaustive enumeration remains a challenge. Here we present ...

Phys.org

AI model learns yeast DNA 'language' to boost protein drug output

Industrial yeasts are a powerhouse of protein production, used to manufacture vaccines, biopharmaceuticals, and other useful compounds. In a new study, MIT chemical engineers have harnessed artificial ...

the-decoder

Deepseek OCR 2 cuts visual tokens by 80% and outperforms Gemini 3 Pro on document parsing

Chinese AI company Deepseek has unveiled a new vision encoder that rearranges image information based on meaning rather than processing it in a rigid top-to-bottom, left-to-right pattern. Traditional ...

EurekAlert!

Insilico Medicine launches science MMAI gym to train frontier LLMs into pharmaceutical-grade scientific engines

New “AI GYM for Science” dramatically boosts the biological and chemical intelligence of any causal or frontier LLM, delivering up to 10x performance gains on key drug discovery benchmarks and ...

AppleInsider

Apple AI research shows how MLLMs understand, generate, search for images

Apple's researchers continue to focus on multimodal LLMs, with studies exploring their use for image generation, understanding, and multi-turn web searches with cropped images. Now, the company is ...

9to5Mac

New Apple model combines vision understanding and image generation with impressive results

In the study titled MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, a team of nearly 30 Apple researchers details a novel unified approach that enables both ...
