
DeepSeek may have found a way to solve the RAM crisis by eliminating the need for expensive HBM for AI inference and training — yes, the very reason why DRAM prices went up by 5X in 10 weeks


DeepSeek’s Engram decouples memory from computation, enabling AI models to scale efficiently while alleviating costly HBM constraints globally.

By Efosa Udinmwen, published 17 January 2026

DeepSeek's Engram separates static storage from computation


(Image: A person's hand using DeepSeek on their mobile phone. Image credit: Adobe Stock)
  • DeepSeek’s Engram separates static memory from computation, increasing efficiency in large AI models
  • The method reduces high-speed memory needs by letting DeepSeek models fetch static knowledge through table lookups
  • Engram supports asynchronous prefetching across multiple GPUs with minimal performance overhead

DeepSeek, in collaboration with Peking University, introduced a new training method called Engram, designed to decouple memory storage from computational processes.

Traditional large language models require high-bandwidth memory for knowledge retrieval and basic computation, creating a bottleneck in both performance and cost.

This HBM bottleneck is widely cited as a key reason DRAM prices rose fivefold in just 10 weeks, as hardware demand spiked to support large AI models.


Validation and technical approach

The researchers said existing models waste sequential depth on trivial operations, depth that could otherwise be spent on higher-level reasoning.

Engram allows models to efficiently “look up” essential information without overloading GPU memory, freeing capacity for more complex reasoning tasks.

The system was tested on a 27-billion-parameter model and showed measurable improvements across standard industry benchmarks.

By performing knowledge retrieval through hashed N-grams, Engram provides static memory access independent of the current context.


The retrieved information is then adjusted using a context-aware gating mechanism to align with the model’s hidden state.
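To make the mechanism concrete, here is a minimal sketch of how a hashed N-gram lookup combined with context-aware gating could work. Everything in it — the rolling hash, the table size, and the layer names — is an illustrative assumption, not DeepSeek's published code.

```python
import torch
import torch.nn as nn

class EngramSketch(nn.Module):
    """Illustrative sketch: hashed N-gram lookup plus context-aware gating.
    Table sizes, the hash function, and layer names are assumptions,
    not DeepSeek's actual implementation."""

    def __init__(self, num_slots=1_000_000, d_model=512, ngram=2):
        super().__init__()
        self.ngram = ngram
        self.num_slots = num_slots
        # Static memory table: access is a plain lookup, so in principle it
        # can live in cheaper DRAM instead of HBM and be prefetched early.
        self.table = nn.Embedding(num_slots, d_model)
        self.gate_proj = nn.Linear(d_model, d_model)

    def hash_ngrams(self, token_ids):
        # Deterministic rolling hash of each position's trailing N-gram.
        # It depends only on the tokens, never on the hidden state.
        pad = token_ids.new_zeros(*token_ids.shape[:-1], self.ngram - 1)
        padded = torch.cat([pad, token_ids], dim=-1)
        h = token_ids.new_zeros(token_ids.shape)
        for k in range(self.ngram):
            window = padded[..., k : k + token_ids.shape[-1]]
            h = (h * 1000003 + window) % self.num_slots
        return h

    def forward(self, token_ids, hidden):
        slots = self.hash_ngrams(token_ids)            # static addressing
        retrieved = self.table(slots)                  # cheap table lookup
        gate = torch.sigmoid(self.gate_proj(hidden))   # context-aware gate
        return hidden + gate * retrieved               # blend into hidden state
```

Because the hash depends only on the token IDs, the slot indices for an entire batch can be computed as soon as the tokens are known, which is what makes the system-level prefetching described below possible.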

This design allows models to handle long context inputs more efficiently and supports system-level prefetching with minimal performance overhead.

The Engram method complements other hardware-efficient approaches, such as Phison's AI inference accelerators.


Engram minimizes the amount of high-speed memory required by using lookups for static information, making memory usage more efficient.

Phison's accelerators offer a cost-effective way to expand total memory capacity using SSDs, which suits memory-hungry designs such as Engram-equipped or Mixture-of-Experts models.

Combined, these approaches allow AI systems to optimize fast-memory usage while affordably increasing overall memory capacity.

Engram also works alongside emerging CXL (Compute Express Link) standards, which aim to overcome GPU memory bottlenecks in large-scale AI workloads.

The method separates static pattern storage from dynamic computation, enhancing the Transformer backbone without increasing FLOPs or parameter counts.

DeepSeek formalized a U-shaped expansion rule to optimize the allocation of parameters between the MoE conditional computation module and the Engram memory module.

Tests show that reallocating around 20–25% of the sparse parameter budget to Engram yields better performance than pure MoE models, maintaining stable gains across different scales.
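As a hedged back-of-envelope, here is what that split looks like for a hypothetical sparse budget; only the 20–25% reallocation range comes from the reported results, the total is invented for illustration.

```python
# Back-of-envelope split of a sparse parameter budget between MoE experts
# and Engram memory. The total budget is hypothetical; only the 20-25%
# reallocation range comes from the reported results.
sparse_budget = 100e9                    # e.g. 100B sparse parameters (assumed)
engram_fraction = 0.22                   # inside the reported 20-25% range

engram_params = engram_fraction * sparse_budget    # ~22B in memory slots
moe_params = sparse_budget - engram_params         # ~78B in MoE experts
print(f"Engram: {engram_params / 1e9:.0f}B, MoE: {moe_params / 1e9:.0f}B")
```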

Memory slot expansion provides predictable improvements without additional computational cost.

This confirms the scalability of conditional memory as an independent axis for sparse models.

Engram’s deterministic retrieval mechanism allows memory capacity to scale linearly across multiple GPUs while supporting asynchronous prefetching during inference.
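Since the slot indices are a deterministic function of the input tokens, a runtime can start copying the relevant rows from host memory before the layer that consumes them runs. A rough PyTorch sketch, assuming a pinned host-resident table and CUDA streams (again, an illustration rather than DeepSeek's implementation):

```python
import torch

# Rough sketch of asynchronous prefetching: the static table sits in pinned
# host DRAM, and rows are copied to the GPU on a side stream so the transfer
# overlaps with ongoing compute. Requires a CUDA device; names and sizes
# are illustrative assumptions.
table_cpu = torch.randn(1_000_000, 512).pin_memory()
copy_stream = torch.cuda.Stream()

def prefetch(slots):
    # Gather the needed rows on the host, pin them, then launch an
    # asynchronous host-to-device copy on the side stream.
    rows = table_cpu[slots].pin_memory()
    with torch.cuda.stream(copy_stream):
        return rows.to("cuda", non_blocking=True)

def consume(prefetched):
    # The compute stream must wait for the copy before touching the rows.
    torch.cuda.current_stream().wait_stream(copy_stream)
    return prefetched
```

In a multi-GPU deployment, each device would hold or fetch its own shard of the table, which is why capacity can scale roughly linearly with the number of GPUs.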

It offloads static knowledge reconstruction from lower layers, freeing attention mechanisms to focus on global context.

Hierarchical caching of frequently used embeddings enhances efficiency, and the module works with existing GPU and system memory architectures, potentially avoiding costly HBM upgrades.
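One simple way to realize such hierarchical caching is an LRU cache of hot rows on the GPU, backed by the full table in system RAM; the sketch below is an assumption about the shape of the idea, not the published design.

```python
from collections import OrderedDict
import torch

class TwoLevelEmbeddingCache:
    """Illustrative two-level cache: frequently used rows stay GPU-resident,
    everything else lives in the full host-DRAM table. Not DeepSeek's code."""

    def __init__(self, table_cpu, capacity=4096):
        self.table_cpu = table_cpu       # full static table in system RAM
        self.capacity = capacity         # how many hot rows fit on the GPU
        self.hot = OrderedDict()         # slot id -> GPU-resident row

    def get(self, slot: int) -> torch.Tensor:
        if slot in self.hot:
            self.hot.move_to_end(slot)   # refresh LRU position on a hit
            return self.hot[slot]
        row = self.table_cpu[slot].to("cuda")   # cold miss: DRAM -> GPU copy
        self.hot[slot] = row
        if len(self.hot) > self.capacity:
            self.hot.popitem(last=False)        # evict least recently used
        return row
```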

This technique may relieve pressure on expensive memory hardware, particularly in regions such as China, where access to HBM from leading suppliers such as Samsung, SK Hynix, and Micron is constrained.

Early validation of Engram suggests models can expand parameter scale and reasoning capacity while managing memory demands more efficiently.

This approach may help ease memory constraints across AI infrastructure, potentially reducing sharp DDR5 DRAM price swings.

Via SCMP


Efosa Udinmwen, Freelance Journalist

Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking.
