Coding Decoding Mains Level

DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85%

DSpark can make decoding faster, but acceptance quality still determines how much speed the system actually realizes.

DeepSeek Releases DSpark: Speculative Decoding Makes V4 Up to 85 Percent Faster

DeepSeek speculative decoding framework DSpark went live June 27 on V4-Flash and V4-Pro, reporting up to 85 percent faster ...

IEEE

DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving

Abstract: To meet strict Service-Level Objectives (SLO), contemporary Large Language Models (LLMs) decouple the prefill and decoding stages and place them on separate GPUs to mitigate the distinct ...

IEEE

SCDP: Systematic Rateless Coding for Efficient Data Transport in Data Centers

Abstract: In this paper we propose SCDP, a general-purpose data transport protocol for data centres that, in contrast to all other protocols proposed to date, supports efficient one-to-many and ...

GitHub

Whisper.cpp: Port of OpenAI's Whisper model in C/C++

The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. Having such a lightweight implementation ...

GitHub

TileRT: Tile-Based Runtime for

🎉 2026-02-14 · v0.1.3 Released. The v0.1.3 release introduces full support for the latest GLM-5 model, achieving up to 500 tokens/s on GLM-5-FP8 and up to 600 tokens/s on DeepSeek-V3.2. TileRT is a ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results