AMD and Nexa AI Enhance DeepSeek R1 with NexaQuant’s 4-bit Innovations


February 18, 2025 by our News Team

Nexa AI introduces NexaQuants, a quantization method that promises to restore reasoning accuracy at 4-bit precision without sacrificing speed for large language models.

  • Enhanced accuracy without sacrificing speed
  • Improved performance in llama.cpp or GGUF-based applications
  • Potential for developers and enthusiasts to explore new possibilities


Nexa AI introduces NexaQuants: A Leap in Quantization

Exciting news from the world of artificial intelligence! Today, Nexa AI rolled out its latest innovation: NexaQuants, featuring two new DeepSeek R1 Distills: the DeepSeek R1 Distill Qwen 1.5B and the DeepSeek R1 Distill Llama 8B. If you’ve been keeping an eye on quantization methods, you know that approaches like llama.cpp’s Q4_K_M can significantly trim the memory footprint of large language models. But there’s a catch: while they incur low perplexity loss on dense models, that can come at the cost of reasoning capability, especially for models that rely on Chain of Thought traces.

But here’s where Nexa AI claims to have cracked the code. They assert that the NexaQuants can recover that reasoning capability loss—compared to the full 16-bit precision models—while still maintaining the efficiency of 4-bit quantization. It’s like having your cake and eating it too, right?
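To make the idea concrete, here is a minimal sketch of block-wise 4-bit quantization in plain Python. It is loosely in the spirit of llama.cpp’s Q4 formats, but it is not Nexa AI’s method nor llama.cpp’s actual layout (Q4_K_M uses super-blocks with per-block scales and minimums); every name below is illustrative.

```python
# Toy block-wise 4-bit quantizer: one float scale per block, 4-bit ints per weight.
# Illustrative only; real GGUF Q4 formats pack bits and store extra metadata.

def quantize_block_4bit(weights, block_size=32):
    """Quantize a flat list of floats to 4-bit integers, one scale per block."""
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) / 7 or 1.0  # map values into [-7, 7]
        q = [max(-8, min(7, round(w / scale))) for w in block]  # clamp to 4-bit range
        blocks.append((scale, q))
    return blocks

def dequantize(blocks):
    """Reconstruct approximate floats from (scale, 4-bit ints) blocks."""
    return [scale * qi for scale, q in blocks for qi in q]

weights = [0.12, -0.5, 0.33, 0.01, -0.27, 0.44, -0.09, 0.2]
restored = dequantize(quantize_block_4bit(weights, block_size=8))
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each block stores one full-precision scale plus a 4-bit integer per weight, which is roughly how 4-bit formats get close to a 4x memory reduction over FP16 while keeping reconstruction error bounded by the per-block scale.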

Performance Insights: What the Benchmarks Reveal

So, what do the benchmarks say? According to Nexa AI, the Q4_K_M-quantized DeepSeek R1 Distills score slightly lower on LLM benchmarks like GPQA and AIME24 than their full 16-bit counterparts. The exception is the AIME24 benchmark on the Llama 8B distill, which takes a notable hit. You might wonder whether moving to Q6 or Q8 quantization would solve the issue. It would help, but it would also slow inference and demand more memory, definitely a trade-off to consider.
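The memory side of that trade-off is easy to estimate from bits per weight. The bpw figures below are approximate community numbers for llama.cpp’s quantization formats (metadata pushes them above the nominal bit width), not official specifications:

```python
# Back-of-the-envelope model file sizes for an 8B-parameter model at
# different quantization levels. Bits-per-weight values are approximate.

def model_size_gb(n_params, bits_per_weight):
    """Rough file size in GB: parameters x bits per weight, converted to bytes."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 8e9  # e.g. the DeepSeek R1 Distill Llama 8B
for label, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8)]:
    print(f"{label:7s} ~{model_size_gb(n_params, bpw):.1f} GB")
```

On these rough numbers, stepping up from Q4_K_M to Q8_0 nearly doubles the memory footprint, which is exactly why recovering accuracy while staying at 4 bits is attractive.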

Nexa AI, however, is touting a proprietary quantization method that aims to recover lost capabilities while keeping the quantization at a lean 4 bits. This means that users could theoretically enjoy the best of both worlds: enhanced accuracy without sacrificing speed. Sounds promising, doesn’t it?
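The accuracy gap that a 4-bit method has to close can be seen in miniature with a toy round-to-nearest quantizer: reconstruction error shrinks as the bit width grows. Real LLM quantizers, NexaQuant included, are far more sophisticated than this; the sketch only illustrates the direction of the effect.

```python
# Compare worst-case reconstruction error of naive round-to-nearest
# quantization at 4 bits vs 8 bits. Illustrative only.

def quant_error(weights, bits):
    """Max absolute reconstruction error for symmetric round-to-nearest."""
    levels = 2 ** (bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(w) for w in weights) / levels or 1.0
    restored = [round(w / scale) * scale for w in weights]
    return max(abs(a - b) for a, b in zip(weights, restored))

weights = [0.8, -0.31, 0.07, 0.55, -0.96, 0.12]
err4 = quant_error(weights, bits=4)
err8 = quant_error(weights, bits=8)
```

With only 15 levels at 4 bits, the quantization step is roughly 16x coarser than at 8 bits; a method that recovers the resulting accuracy loss without leaving 4-bit storage would keep the memory and speed advantages intact.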

Getting Started with NexaQuants

If you’re eager to dive into the world of NexaQuants, you’re in luck! Here’s how you can run them on your AMD Ryzen processors or Radeon graphics card. We recommend using LM Studio for all your LLM needs. Just follow these simple steps:

1. Download and install LM Studio from lmstudio.ai/ryzenai.
2. Head over to the discover tab and paste the Hugging Face link of one of the NexaQuants mentioned earlier.
3. Sit tight while the model downloads.
4. Once that’s done, switch back to the chat tab and select your model from the drop-down menu—just make sure to choose “manually choose parameters.”
5. Set the GPU offload layers to MAX.
6. Load the model and start chatting away!

According to the data shared by Nexa AI, developers can expect to see generally improved performance with the NexaQuant versions of the DeepSeek R1 Distills, especially in llama.cpp or GGUF-based applications.

Final Thoughts

With the launch of NexaQuants, Nexa AI is making a compelling case for the future of quantization in AI. By balancing performance and efficiency, they’re opening doors for developers and enthusiasts alike. So, what do you think? Are you ready to explore the new possibilities that NexaQuants bring to the table?


About Our Team

Our team comprises industry insiders with extensive experience in computers, semiconductors, games, and consumer electronics. With decades of collective experience, we’re committed to delivering timely, accurate, and engaging news content to our readers.

Background Information


About AMD:

AMD, a major player in the semiconductor industry, is known for its powerful processors and graphics solutions, and has consistently pushed the boundaries of performance, efficiency, and user experience. With a customer-centric approach, the company has cultivated a reputation for delivering high-performance solutions that cater to the needs of gamers, professionals, and general users. AMD's Ryzen series of processors have redefined the landscape of desktop and laptop computing, offering impressive multi-core performance and competitive pricing that has challenged the dominance of its competitors. Complementing its processor expertise, AMD's Radeon graphics cards have also earned accolades for their efficiency and exceptional graphical capabilities, making them a favored choice among gamers and content creators. The company's commitment to innovation and technology continues to shape the client computing landscape, providing users with powerful tools to fuel their digital endeavors.


Technology Explained


GPU: GPU stands for Graphics Processing Unit, a specialized processor designed for graphics-intensive tasks such as rendering images, video, and 3D graphics. GPUs power gaming consoles, PCs, and mobile devices, and they also see use in medicine (3D models of organs and tissues), automotive design (virtual vehicle prototypes), and artificial intelligence, where their ability to process large amounts of data quickly and efficiently makes them increasingly important across the computer industry.


LLM: A Large Language Model (LLM) is a highly advanced artificial intelligence system, typically built on transformer architectures such as those behind the GPT family, designed to comprehend and produce human-like text at scale. LLMs possess exceptional capabilities across natural language understanding and generation tasks, including answering questions, generating creative content, and delivering context-aware responses to textual inputs. These models undergo extensive training on vast datasets to grasp the nuances of language, making them invaluable tools for applications like chatbots, content generation, and language translation.


Radeon: AMD Radeon, a product line by Advanced Micro Devices (AMD), consists of graphics processing units (GPUs) recognized for their strong performance in gaming, content creation, and professional applications. Powered by innovative technologies like the RDNA architecture, Radeon GPUs deliver efficient and powerful graphics processing. The brand also supports features like FreeSync, enhancing visual fluidity and reducing screen tearing during gaming. Moreover, AMD Radeon GPUs embrace real-time ray tracing for heightened realism in lighting and reflections. With a balance between price and performance, Radeon competes with NVIDIA's GeForce graphics cards and remains a popular choice for a wide range of users.




