With quantized LLMs now available on Hugging Face, and AI ecosystems such as H2O, Text Generation WebUI, and GPT4All letting you load LLM weights on your own computer, you now have an option for free, flexible, and secure AI. Here are nine of the best local/offline LLMs you can try right now!
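If you want to see how simple local inference can be, here is a minimal sketch using the GPT4All Python bindings; the model filename is just an example from the GPT4All catalog, and any listed model works the same way:

```python
# Minimal local-inference sketch with the gpt4all package (pip install gpt4all).
# The filename is illustrative; the first call downloads the weights if needed.
from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    reply = model.generate("Explain 4-bit quantization in one sentence.", max_tokens=100)
    print(reply)
```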
1. Hermes 2 Pro GPTQ
Hermes 2 Pro is a state-of-the-art language model fine-tuned by Nous Research. It uses an updated and cleaned version of the OpenHermes 2.5 dataset, along with a newly introduced Function Calling and JSON Mode dataset developed in-house. This model is based on the Mistral 7B architecture and has been trained on 1,000,000 instructions/chats of GPT-4 quality or better, primarily synthetic data.
| Model | Hermes 2 Pro GPTQ |
|---|---|
| Model Size | 7.26 GB |
| Parameters | 7 billion |
| Quantization | 4-bit |
| Type | Mistral |
| License | Apache 2.0 |
Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes model, offering improved performance across various benchmarks, including AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA. Its enhanced capabilities make it suitable for a wide range of natural language processing (NLP) tasks, such as code generation, content creation, and conversational AI applications.
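As a rough sketch of how a GPTQ build like this is loaded, the standard transformers API works once the optimum and auto-gptq packages are installed; the repo id below is an assumption, so substitute the exact Hermes 2 Pro GPTQ repo you download:

```python
# Hedged sketch: loading a 4-bit GPTQ checkpoint with transformers.
# Requires: pip install transformers optimum auto-gptq
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TheBloke/Hermes-2-Pro-Mistral-7B-GPTQ"  # assumed repo id; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Return a JSON object describing a book.", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```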
Download: Hermes 2 Pro GPTQ via Hugging Face
2. Zephyr 7B Beta
Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-Beta is the second model in the series, fine-tuned from Mistral-7B-v0.1 using Direct Preference Optimization (DPO) on a mix of publicly available, synthetic datasets.
| Model | Zephyr 7B Beta |
|---|---|
| Model Size | 7.26 GB |
| Parameters | 7 billion |
| Quantization | 4-bit |
| Type | Mistral |
| License | MIT |
By removing the in-built alignment of the training datasets, Zephyr-7B-Beta demonstrates improved performance on benchmarks like MT-Bench, enhancing its helpfulness in various tasks. However, this adjustment may lead to the generation of problematic text when prompted in certain ways.
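Zephyr expects its own <|system|>/<|user|>/<|assistant|> chat format, which the tokenizer's chat template can build for you; a quick sketch using the unquantized checkpoint's tokenizer:

```python
# Sketch: building a Zephyr-format prompt via the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "Summarize Direct Preference Optimization in two sentences."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # the formatted string to feed to your local inference backend
```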
Download: Zephyr 7B Beta via Hugging Face
3. Falcon Instruct GPTQ
This quantized version of Falcon is based on the decoder-only architecture fine-tuned on top of TII's raw Falcon-7B model. The base Falcon model was trained on an impressive 1.5 trillion tokens sourced from the public internet. As an instruction-tuned, decoder-only model licensed under Apache 2.0, Falcon Instruct is perfect for small businesses looking for a model to use for language translation and data entry.
| Model | Falcon-7B-Instruct |
|---|---|
| Model Size | 7.58 GB |
| Parameters | 7 billion |
| Quantization | 4-bit |
| Type | Falcon |
| License | Apache 2.0 |
However, this version of Falcon is not ideal for fine-tuning and is meant for inference only. If you want to fine-tune Falcon, you will have to use the raw model, which can require access to enterprise-grade training hardware such as NVIDIA DGX or AMD Instinct AI accelerators.
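For inference, though, a plain text-generation pipeline is enough; a minimal sketch against the base instruct repo (older transformers releases needed trust_remote_code=True for Falcon; recent ones support the architecture natively):

```python
# Inference-only sketch for Falcon-7B-Instruct with a transformers pipeline.
from transformers import pipeline

generate = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    device_map="auto",  # place weights on GPU automatically if available
)
result = generate("Translate to French: The meeting is at noon.", max_new_tokens=40)
print(result[0]["generated_text"])
```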
Download: Falcon-7B-Instruct via Hugging Face
4. GPT4All-J Groovy
GPT4All-J Groovy is a decoder-only model fine-tuned by Nomic AI and licensed under Apache 2.0. It is based on the original GPT-J model, which is known to be great at text generation from prompts, and has been fine-tuned as a chat model, which makes it great for fast and creative text generation applications. This makes GPT4All-J Groovy ideal for content creators, assisting them with writing and creative work, whether it be poetry, music, or stories.
| Model | GPT4All-J Groovy |
|---|---|
| Model Size | 3.53 GB |
| Parameters | 6 billion |
| Quantization | 4-bit |
| Type | GPT-J |
| License | Apache 2.0 |
Unfortunately, the base GPT-J model was trained on an English-only dataset, which means even this fine-tuned GPT4All-J model can only chat and generate text in English.
Download: GPT4ALL-J Groovy via Hugging Face
5. DeepSeek Coder V2 Instruct
DeepSeek Coder V2 is an advanced language model that enhances coding and mathematical reasoning capabilities. It supports an extensive range of programming languages and offers an extended context length, making it a versatile tool for developers.
| Model | DeepSeek Coder V2 Instruct |
|---|---|
| Model Size | 13 GB |
| Parameters | 33 billion |
| Quantization | 4-bit |
| Type | DeepSeek |
| License | DeepSeek Model License |
Compared to its predecessor, DeepSeek Coder V2 shows significant advancements in code-related tasks, reasoning, and general capabilities. It expands support for programming languages from 86 to 338 and extends the context length from 16K to 128K tokens. In standard benchmark evaluations, it outperforms models like GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks.
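Here is a hedged sketch of running a local build for code generation with llama-cpp-python; the GGUF filename is an assumption (quantized conversions vary by uploader), and n_ctx can be raised toward the 128K limit if your memory allows:

```python
# Sketch: local code generation with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="deepseek-coder-v2-lite-instruct.Q4_K_M.gguf",  # assumed filename
    n_ctx=16384,  # raise toward 128K if memory permits
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a singly linked list."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```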
Download: DeepSeek Coder V2 Instruct via Hugging Face
6. Mixtral-8x7B
Mixtral-8x7B is a sparse mixture-of-experts (MoE) model developed by Mistral AI. It features eight experts per feedforward layer, totaling around 46.7 billion parameters. However, only two experts are activated per token during inference, making it computationally efficient and comparable in speed and cost to a dense model of roughly 13 billion parameters.
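A toy illustration of why that works (not Mixtral's actual implementation): a router scores all eight experts for each token, but only the top two are evaluated, so per-token compute stays close to that of a much smaller dense model:

```python
# Toy top-2 mixture-of-experts routing in PyTorch; shapes are simplified.
import torch

def moe_layer(x, router, experts, k=2):
    # x: (tokens, dim); router: nn.Linear(dim, num_experts); experts: list of MLPs
    weights, idx = torch.topk(router(x).softmax(dim=-1), k)  # top-2 experts per token
    out = torch.zeros_like(x)
    for t in range(x.size(0)):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[e](x[t])  # only k of the 8 experts ever run
    return out
```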
| Model | Mixtral-8x7B |
|---|---|
| Model Size | 12 GB |
| Parameters | 46.7 billion (8 experts) |
| Quantization | 4-bit |
| Type | Mistral MoE |
| License | Apache 2.0 |
Mixtral supports a context length of 32k tokens and outperforms Llama 2 70B on most benchmarks, matching or exceeding GPT-3.5 performance. It is proficient in multiple languages, including English, French, German, Spanish, and Italian, making it a versatile choice for various NLP tasks.
Download: Mixtral-8x7B via Hugging Face
7. Wizard Vicuna Uncensored-GPTQ
Wizard-Vicuna GPTQ is a quantized version of Wizard Vicuna, based on the LLaMA model. Unlike most LLMs released to the public, Wizard-Vicuna is an uncensored model with its alignment removed. This means the model doesn't have the same safety and moral constraints as most models.
| Model | Wizard-Vicuna-30B-Uncensored-GPTQ |
|---|---|
| Model Size | 16.94 GB |
| Parameters | 30 billion |
| Quantization | 4-bit |
| Type | LLaMA |
| License | GPL 3 |
Although an uncensored LLM poses an alignment and control problem, it also brings out the best in the model by allowing it to answer without constraints. It also lets users add their own custom alignment, defining how the AI should act or answer for a given prompt.
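In practice, that custom alignment usually lives in the prompt itself. Vicuna-style models expect a system preamble followed by USER:/ASSISTANT: turns, so the preamble is where your own rules go; a sketch (exact formatting varies by release, so check the model card):

```python
# Sketch: imposing your own behavioral rules on an uncensored Vicuna-style model.
system = (
    "A chat between a user and an assistant. "
    "The assistant answers directly but always refuses to write malware."
)
user_msg = "How do I harden an SSH server?"
prompt = f"{system}\nUSER: {user_msg}\nASSISTANT:"
print(prompt)  # feed this string to your local inference runtime
```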
Download: Wizard-Vicuna-30B-Uncensored-GPTQ via Hugging Face
8. Orca Mini-GPTQ
Looking to experiment with a model trained on a unique learning method? Orca Mini is an unofficial implementation of Microsoft's Orca research paper. It was trained using the teacher-student learning method, where the dataset is full of explanations instead of just prompts and responses. In theory, this should produce a smarter student: a model that can understand the problem rather than just matching input and output pairs, as typical LLMs do.
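To see the idea in action, compare a bare instruction with an Orca-style prompt that asks for the reasoning the model was tuned on; the header format below is illustrative, so check the model card for the exact template:

```python
# Sketch: an explanation-seeking, Orca-style prompt versus a bare instruction.
system = "You are a teacher. Think step by step and justify your answer."
user = "Why does ice float on water?"

bare_prompt = user
orca_prompt = f"### System:\n{system}\n\n### User:\n{user}\n\n### Response:\n"
print(orca_prompt)
```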
| Model | Orca Mini-GPTQ |
|---|---|
| Model Size | 8.11 GB |
| Parameters | 3 billion |
| Quantization | 4-bit |
| Type | LLaMA |
| License | MIT |
With only three billion parameters, Orca Mini GPTQ is easy to run even on less powerful systems. However, this model should not be used for anything professional, as it can generate false information and biased or offensive responses. It is best used for learning about and experimenting with Orca and its methods.
Download: Orca Mini-GPTQ via Hugging Face
9. Llama 2 13B Chat GPTQ
Llama 2 is the successor to the original Llama LLM, offering improved performance and versatility. The 13B Chat GPTQ variant is fine-tuned for conversational AI and optimized for English dialogue.
| Model | Llama 2 13B Chat GPTQ |
|---|---|
| Model Size | 7.26 GB |
| Parameters | 13 billion |
| Quantization | 4-bit |
| Type | Llama 2 |
| License | Meta License |
Llama 2 is intended for commercial and research use. Its licensing terms allow companies with fewer than 700 million monthly active users to use it without additional fees. This model is ideal for organizations seeking a robust chatbot solution that requires minimal additional training.
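Llama 2's chat variants were trained on the [INST] format with an optional <<SYS>> system block; here is a sketch of assembling a single turn by hand, although most local runtimes apply this template for you:

```python
# Sketch: the Llama 2 chat prompt format for one user turn.
system = "You are a support bot for a small business. Keep answers brief."
user_msg = "What are your opening hours?"
prompt = f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user_msg} [/INST]"
print(prompt)
```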
Download: Llama 2 13B Chat GPTQ via Hugging Face
Some of the models listed above come in several versions with different parameter counts. In general, higher-parameter versions yield better results but require more powerful hardware, while lower-parameter versions generate lower-quality results but can run on lower-end hardware. If you're unsure whether your PC can run a model, start with the lowest-parameter version, then work your way up until the slowdown is no longer acceptable.
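As a rough rule of thumb, a quantized model needs about (parameters x bits per weight / 8) bytes of memory, plus 10 to 20 percent overhead for activations and the KV cache; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope memory estimate for quantized models; approximate
# figures that ignore backend-specific overhead.
def est_gb(params_billion: float, bits: int = 4, overhead: float = 1.15) -> float:
    return params_billion * bits / 8 * overhead  # billions of params -> GB

for p in (3, 7, 13, 30):
    print(f"{p}B @ 4-bit ~ {est_gb(p):.1f} GB")
# e.g. 30B at 4-bit lands near the 16.94 GB listed for Wizard-Vicuna above
```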