AirgapAI Model Downloads – 10 Models Available

Q-3.5

Qwen 3.5

Latest instruct line with thinking, from 4B to 35B-A3B

Chat

INTELLIGENCE

Advanced

SPEED

Fast

SIZE

Small–Medium

Qwen 3.5 instruct family with built-in step-by-step thinking for stronger reasoning. Use 4B for quick, smooth performance on more machines, or 9B when you want richer answers and have a stronger device.

Thinking Vision Function calling Structured output

Q4_K_M (GPU), 2.8 GB
Q4_K_M (CPU), 2.8 GB
INT4 (GPU)

Q4_K_M (GPU), 5.9 GB

27B

Q4_K_M (GPU), 17 GB

35B-A3B

Q4_K_M (GPU), 23 GB

Q4 (Apple Silicon), 2.8 GB

Q4 (Apple Silicon), 5.9 GB

27B

Q4 (Apple Silicon), 17 GB

35B-A3B

Q4 (Apple Silicon), 23 GB

4B is the easy default; 9B when answer quality matters most
Choose a size, then the download row under it
A second 9B Windows row offers uncensored weights (fewer refusals); use only where that trade-off is appropriate.
Learn more (4B)
Learn more (9B)

GEMMA

Gemma 4

Google's efficient open models, E4B to 31B

Chat

INTELLIGENCE

Advanced

SPEED

Fast

SIZE

Small–Large

Google's Gemma 4 instruct family, packaged for AirgapAI. Start with the E4B build for fast, smooth responses on modest hardware, or step up to 12B / 26B / 31B when you want richer answers and have a stronger GPU or Apple Silicon machine.

Thinking Vision Function calling Structured output

E4B

Q4_K_M (GPU), 5.0 GB
Q4_K_M (CPU), 5.0 GB

12B

Q4_K_M (GPU), 7.1 GB

26B-A4B

Q4_K_M (GPU), 17 GB

31B

Q4_K_M (GPU), 18 GB

E4B

Q4 (Apple Silicon), 5.0 GB

12B

Q4 (Apple Silicon), 7.1 GB

26B-A4B

Q4 (Apple Silicon), 17 GB

31B

Q4 (Apple Silicon), 18 GB

4-bit quantisation across all variants
E4B runs on most modern GPUs / iGPUs; larger builds want a 2022+ discrete GPU or Apple Silicon
Choose a size, then the download row under it
Model details
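
As a rough illustration of "choose a size, then the download row under it", the sketch below picks the largest Gemma 4 build whose download fits a given budget. The sizes are copied from the rows above; the function name and the budget values are hypothetical, and download size is only a proxy here, since the memory a build needs at runtime may be higher.

```python
# Download sizes in GB, copied from the Gemma 4 rows on this page.
GEMMA4_DOWNLOAD_GB = {
    "E4B": 5.0,
    "12B": 7.1,
    "26B-A4B": 17.0,
    "31B": 18.0,
}


def largest_build_within(budget_gb, sizes=GEMMA4_DOWNLOAD_GB):
    """Return the name of the largest build whose download fits budget_gb.

    Returns None when even the smallest build exceeds the budget.
    Hypothetical helper: download size is used as a stand-in for fit,
    which is an assumption, not a documented AirgapAI rule.
    """
    fitting = [(gb, name) for name, gb in sizes.items() if gb <= budget_gb]
    if not fitting:
        return None
    # Tuples compare by size first, so max() selects the biggest download.
    return max(fitting)[1]


if __name__ == "__main__":
    for budget in (4, 8, 17.5, 24):
        print(budget, "GB ->", largest_build_within(budget))
```

Treat the result as a starting point: the hardware notes on each card (for example, a 2022+ discrete GPU for the larger builds) still apply.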

L-3.2

Llama 3.2

Meta's compact instruct models (1B & 3B)

Chat

INTELLIGENCE

Strong

SPEED

Very Fast

SIZE

Small

Meta's Llama 3.2 instruct family in 1B and 3B sizes. Available as universal MLC builds that run on any GPU, or as OpenVINO builds optimised for Intel GPU / NPU. Includes a 32K-context 3B build for long-document workflows.

INT4 (NPU)

Q4_K_M (GPU), 2.0 GB
Q4_K_M (CPU), 2.0 GB
INT4 (GPU)
INT4 (NPU)

Q4 (Apple Silicon), 2.0 GB

4-bit quantisation across all variants
1B runs on older GPUs / iGPUs; 3B benefits from a 2025 Intel iGPU or a 2022+ discrete GPU
Model details (1B)
Model details (3B)

L-8B

Llama 3.1 8B

Meta's 8B instruct, Intel-optimised

Chat

INTELLIGENCE

Advanced

SPEED

Fast

SIZE

Medium

Llama 3.1 8B Instruct compiled for Intel hardware via OpenVINO. Pick GPU for Arc / iGPU setups or NPU on Core Ultra laptops.

Q4_K_M (GPU), 4.9 GB
INT4 (GPU)
INT4 (NPU)

Q4 (Apple Silicon), 4.9 GB

INT4 quantisation
Lunar Lake or later recommended for best results
Model details

SAUL

Saul 7B Instruct

Legal-domain expert, Mistral 7B base

Chat

INTELLIGENCE

Advanced

SPEED

Fast

SIZE

Medium

Saul 7B Instruct is a legal-domain specialist fine-tuned from Mistral 7B on a large corpus of legal texts. Ideal for contract review, case summarisation, and legal Q&A where domain accuracy matters.

Q4_K_M (GPU), 4.4 GB

Q4 (Apple Silicon), 4.4 GB

5 GB+ vRAM
Runs via llama.cpp runtime (GPU/CPU hybrid supported)
Specialised for legal reasoning and drafting
Model details

AFM

Arcee AFM 4.5B

Intel-optimised small reasoning model

Chat

INTELLIGENCE

Flagship

SPEED

Fast

SIZE

Small

Arcee's AFM 4.5B, optimised for Intel hardware via the OpenVINO runtime.

4.5B

INT8 (GPU)

5 GB+ vRAM
Optimised for 2024+ Intel Arc / iGPU
Model details

IBM-G

IBM Granite 4.1

Chat

IBM Granite 4.1 packaged for AirgapAI.

Q4_K_M (GPU), 5.3 GB

Q4 (Apple Silicon), 5.3 GB

BLK

By Iternal Technologies

Blockify

Local document → IdeaBlocks ingestion

Document Processing

SPEED

Very Fast

SIZE

Small

MODALITY

Text → IdeaBlocks

Iternal's proprietary fine-tuned ingestion model. Transforms unstructured documents into clean, searchable IdeaBlocks so retrieval-augmented search runs on signal instead of noise. Pick the size that fits your hardware; accuracy scales with parameter count. Because of local hardware limitations, some information may be lost, so human review is required.

INT4 (GPU)

INT4 (GPU)
INT4 (NPU)
INT4 (CPU)
INT8 (GPU)
INT8 (NPU)

Processes documents into optimised IdeaBlocks
1B runs on ≥ 2 GB vRAM; 3B on ≥ 3 GB; 8B on ≥ 5 GB
Universal (MLC) builds run on any GPU; OpenVINO builds are optimised for Intel
Read more about Blockify here
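
The vRAM thresholds above can be turned into a simple size check. The following is a minimal sketch, assuming you already know how much vRAM is free; the function name is hypothetical and the thresholds are the ones listed on this card (1B at 2 GB, 3B at 3 GB, 8B at 5 GB). It prefers the largest size that fits, since accuracy is described as scaling with parameter count.

```python
# Minimum vRAM per Blockify size, in GB, largest model first.
# Values are taken from the card above; nothing else is assumed about Blockify.
BLOCKIFY_MIN_VRAM_GB = [
    ("8B", 5.0),
    ("3B", 3.0),
    ("1B", 2.0),
]


def pick_blockify_size(vram_gb):
    """Return the largest Blockify size whose vRAM minimum is met.

    Returns None when less than 2 GB is available. Hypothetical helper
    for illustration; it is not part of AirgapAI or Blockify.
    """
    for name, min_gb in BLOCKIFY_MIN_VRAM_GB:
        if vram_gb >= min_gb:
            return name
    return None


if __name__ == "__main__":
    for vram in (1.5, 2, 4, 8):
        print(vram, "GB vRAM ->", pick_blockify_size(vram))
```

Whichever size is chosen, the human-review requirement stated in the description still applies.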

Jina Embeddings V2

Vector-search embeddings

Vector Search

SPEED

Instant

MEMORY

2 GB+

Lightweight model for high-quality text embeddings.

CPU

Runs on the CPU using system RAM
Model details

NLLB

NLLB Text Translation

Multi-language text translation

Translation

SPEED

Very Fast

SIZE

Medium

MODALITY

Text → Text

No Language Left Behind (NLLB) model for high-quality text translation across 200+ languages.

CPU

Supports 200+ languages
Runs on the CPU using system RAM
Model details
