Oasis is the local AI engine built into webAI. It runs large language models directly on your hardware using WebGPU, MLX, or llama.cpp, so your conversations, data, and context never leave your machine.

Why Oasis?

  • Private by design. Your conversations and data never leave your machine. This isn’t a policy promise; it’s an architectural guarantee. There’s nothing to breach because there’s nothing sent.
  • Works without the internet. Oasis runs AI models locally using your device’s GPU (via WebGPU), so it works offline, on a plane, or anywhere else.
  • Nothing to install. Oasis is browser-native. If your browser supports WebGPU, you’re ready. No additional downloads or dependencies.
  • Adapts to you. Switch between personas — research, creative, builder — without reloading. One runtime, many contexts.
  • Can take action. Oasis isn’t just a chatbot. Its agent system can plan multi-step tasks, use tools, automate browser actions, and execute workflows — all locally.
The big idea: intelligence should be a property of your device and a resource of your network — not a service rented from the cloud. Oasis makes that real.
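The agent behavior described above can be pictured as a plan/act/observe loop. The actual Oasis agent API is not shown in this overview, so the `Tool` shape and `runAgent` function below are illustrative assumptions, not the real implementation:

```typescript
// Minimal sketch of an agent executing a multi-step plan with tools.
// Every tool runs locally, matching the "all locally" guarantee above.
interface Tool {
  name: string;
  run(input: string): string;
}

function runAgent(
  steps: { tool: string; input: string }[],
  tools: Tool[]
): string[] {
  const byName = new Map(tools.map((t) => [t.name, t] as [string, Tool]));
  const observations: string[] = [];
  for (const step of steps) {
    // Execute the plan step by step; each observation can inform the next.
    const tool = byName.get(step.tool);
    if (!tool) throw new Error(`No such tool: ${step.tool}`);
    observations.push(tool.run(step.input));
  }
  return observations;
}
```

A real agent would re-plan between steps based on observations; this sketch only shows the tool-dispatch skeleton.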

What you can do with Oasis

Have a private conversation

Chat with an AI that runs entirely on your hardware. Stream responses in real time, see reasoning traces, and adjust how the model thinks (temperature, sampling) without any data leaving your machine.
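For intuition on what the temperature control does, here is a standalone sketch of temperature-scaled softmax. This is just the underlying math, not Oasis's actual sampler:

```typescript
// Convert raw model logits to probabilities, scaled by temperature.
// Lower temperature (< 1) sharpens the distribution toward the top
// token; higher temperature (> 1) flattens it, increasing variety.
function softmaxWithTemperature(
  logits: number[],
  temperature: number // must be > 0
): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```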

Create and switch personas

Define custom AI behaviors — a research assistant, a writing partner, a math tutor — each with its own system prompt, tool permissions, and fine-tuned adapters. Switch between them mid-conversation. See Personas for details.
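A persona can be thought of as a small record bundling prompt, permissions, and adapters. The shapes below are illustrative assumptions, not the actual Oasis persona format:

```typescript
// Hypothetical persona record: what "custom AI behavior" bundles together.
interface Persona {
  name: string;
  systemPrompt: string;
  toolPermissions: string[];
  adapter?: string; // optional fine-tuned adapter id
}

const personas: Record<string, Persona> = {
  research: {
    name: "Research Assistant",
    systemPrompt: "Answer with sources and explicit caveats.",
    toolPermissions: ["web-search", "file-read"],
  },
  creative: {
    name: "Writing Partner",
    systemPrompt: "Brainstorm freely and refine prose.",
    toolPermissions: [],
  },
};

// Switching mid-conversation swaps the active prompt and permissions
// while the conversation history itself is retained.
function switchPersona(active: { persona: Persona }, key: string): void {
  const next = personas[key];
  if (!next) throw new Error(`Unknown persona: ${key}`);
  active.persona = next;
}
```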

Get rich, structured output

Oasis doesn’t just return text. It can render charts, comparison tables, code blocks, data cards, stat grids, and more — directly in the conversation.
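One way to model such structured blocks is as a tagged union. The `StructuredOutput` type below is an assumed shape for illustration, not the real Oasis render schema:

```typescript
// Each response block carries a "kind" tag telling the UI how to render it.
type StructuredOutput =
  | { kind: "text"; content: string }
  | { kind: "code"; language: string; content: string }
  | { kind: "table"; headers: string[]; rows: string[][] }
  | { kind: "chart"; series: number[] };

// Toy renderer: a real UI would draw components, not strings.
function render(block: StructuredOutput): string {
  switch (block.kind) {
    case "text":
      return block.content;
    case "code":
      return `[${block.language}] ${block.content}`;
    case "table":
      return [
        block.headers.join(" | "),
        ...block.rows.map((r) => r.join(" | ")),
      ].join("\n");
    case "chart":
      return `chart(${block.series.length} points)`;
  }
}
```

The discriminated union makes the renderer exhaustive: adding a new block kind forces every consumer to handle it.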

Choose your model and backend

Load models from HuggingFace (0.5B to 32B parameters), pick your inference backend (WebGPU, MLX, or llama.cpp), and attach adapters — all from a settings panel, no command line required.
The WebGPU backend offers browser-native GPU inference and works on any platform with a supported browser (Chrome 113+, Edge 113+). It supports models up to 32B parameters depending on available GPU memory.
The system automatically selects the best backend for your hardware, or you can choose manually. See On-Device AI for the full breakdown of supported models, tiers, and hardware routing.
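The automatic selection can be pictured as a simple routing function. The heuristic below is an assumption about how such routing might work, not Oasis's actual logic:

```typescript
type Backend = "webgpu" | "mlx" | "llama.cpp";

interface HardwareInfo {
  hasWebGPU: boolean; // in a browser, roughly navigator.gpu !== undefined
  isAppleSilicon: boolean;
}

// Assumed preference order: browser-native GPU first, then the
// Apple-optimized MLX path, with llama.cpp as the portable fallback.
function pickBackend(hw: HardwareInfo): Backend {
  if (hw.hasWebGPU) return "webgpu";
  if (hw.isAppleSilicon) return "mlx";
  return "llama.cpp";
}
```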

Supported models

Oasis supports models ranging from 0.5B to 32B parameters across all backends. A few highlights:
Model        Size     Best for
Qwen3 1.7B   Small    Quick answers, everyday use on most devices
Qwen3 4B     Medium   Balanced performance and capability
Qwen3 8B     Medium   Research, writing, and code
Qwen3 14B    Large    Detailed reasoning and analysis
Qwen3 32B    Large    Complex reasoning, multi-step tasks

Choosing a model

Not sure which model to pick? See the practical guide to choosing the right model for your hardware.
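As a rough rule of thumb for matching model size to available GPU memory, weight memory is roughly parameters times bytes per parameter. The 20% overhead factor below (for KV cache and activations) is an assumption, not an Oasis-published figure:

```typescript
// Estimate GPU memory needed to run a model, in GB.
// paramsBillions: model size (e.g. 8 for an 8B model)
// bytesPerParam: precision (2 for fp16, ~0.5 for 4-bit quantization)
function estimateVramGB(paramsBillions: number, bytesPerParam: number): number {
  const weightsGB = paramsBillions * bytesPerParam; // 1B params × 1 byte ≈ 1 GB
  return weightsGB * 1.2; // assumed ~20% overhead for KV cache / activations
}
```

For example, an 8B model at 4-bit quantization needs roughly `estimateVramGB(8, 0.5)` ≈ 4.8 GB, while 32B at the same precision needs around 19 GB, which is why larger models require more GPU memory headroom.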