OasisHost is the interface to webAI’s local AI runtime. It lets your app run language model inference directly on the user’s hardware — no cloud, no API keys, no network requests. Your app acquires the runtime, sends prompts, and receives streamed token responses.

Quick start

Here’s the minimal flow to get AI working in your app:
import { getOasisHost } from './webai';

async function askAI(prompt) {
  const host = getOasisHost();
  if (!host) throw new Error('AI not available outside webAI.');

  const release = await host.acquire({ warmRuntime: true });
  try {
    const response = await host.request(prompt, {
      systemPrompt: 'You are a helpful assistant.',
      maxTokens: 2048,
      temperature: 0.7,
      onToken: (token) => process.stdout.write(token),
    });
    return response;
  } finally {
    if (release) release();
  }
}

Checking runtime status

Before sending requests, check whether a model is loaded and ready. OasisHost exposes a getStatus() method that returns the current state of the AI runtime.
function getOasisState() {
  const host = getOasisHost();
  if (!host?.getStatus) return 'waiting';

  const status = host.getStatus();
  if (status?.lastModel) return 'ready';
  if (status?.loadingModel || status?.isGenerating) return 'loading';
  return 'waiting';
}
The three states your app should handle:
| State | Meaning | Recommended UX |
|---|---|---|
| ready | A model is loaded and idle | Enable AI features, show a green indicator |
| loading | A model is loading or actively generating | Disable new requests, show a loading state |
| waiting | No model loaded or OasisHost unavailable | Show a “no model” notice, disable AI features |

Full status object

The getStatus() return object includes additional fields for advanced use cases:
| Field | Type | Description |
|---|---|---|
| hasRuntime | boolean | Whether the inference runtime is initialized |
| lastModel | string \| null | ID of the currently loaded model |
| loadingModel | string \| null | ID of a model currently being loaded |
| isGenerating | boolean | Whether the model is actively generating tokens |
| refCount | number | Number of active runtime consumers |
| modulesLoaded | boolean | Whether all required modules are loaded |
| deviceProfile | object \| null | Hardware capabilities (GPU, memory, backend support) |
| backendSelection | object \| null | Which inference backend was selected and why |
| modelSelection | object \| null | Which model was selected for the current backend |
| bootstrapPhase | string | Current boot phase of the runtime |
| nativeServer | object \| null | Status of the native inference server (MLX/llama.cpp) |
For most apps, checking lastModel, loadingModel, and isGenerating is sufficient. The additional fields are useful for diagnostic tools or apps that need to display detailed runtime info.
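For diagnostic tooling, the extra fields can be condensed into a one-line readout. A minimal sketch — summarizeStatus is a hypothetical helper, not part of OasisHost; it only reads the fields documented above:

```javascript
// Condense a getStatus() result into a single diagnostic line.
// summarizeStatus is a hypothetical helper; the fields it reads
// (lastModel, loadingModel, isGenerating, refCount, bootstrapPhase)
// are the documented status fields.
function summarizeStatus(status) {
  if (!status) return 'no status available';
  const model = status.lastModel ?? status.loadingModel ?? 'none';
  const activity = status.isGenerating
    ? 'generating'
    : status.loadingModel
      ? 'loading'
      : 'idle';
  return `model=${model} state=${activity} refs=${status.refCount ?? 0} phase=${status.bootstrapPhase ?? 'unknown'}`;
}
```

Call it as summarizeStatus(host.getStatus()) and render the result in a debug panel.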

Polling for status changes

The AI runtime state can change at any time — a user might load a new model, or a generation might finish. Poll getStatus() on an interval to keep your UI in sync.
// Inside a React component (useState/useEffect imported from 'react')
const [oasisState, setOasisState] = useState('waiting');

useEffect(() => {
  const probe = () => {
    const host = window.OasisHost ?? window.parent?.OasisHost;
    if (!host?.getStatus) return 'waiting';
    const s = host.getStatus();
    if (s?.lastModel) return 'ready';
    if (s?.loadingModel || s?.isGenerating) return 'loading';
    return 'waiting';
  };
  setOasisState(probe());
  const id = setInterval(() => setOasisState(probe()), 1200);
  return () => clearInterval(id);
}, []);
A 1200ms polling interval strikes a good balance between responsiveness and performance. Avoid polling faster than 500ms.

Acquiring the runtime

Before sending a request, your app must acquire exclusive access to the AI runtime. This prevents multiple apps from competing for the same GPU resources.
const release = await host.acquire({ warmRuntime: true });
Parameters:
| Parameter | Type | Description |
|---|---|---|
| warmRuntime | boolean | When true, pre-warms the inference runtime for faster first response |
Returns: A release function. Call it when you’re done with the runtime to let other apps use it.
Always call the release function when your request completes — even if it fails. Use a try/finally block to guarantee cleanup.
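The acquire → work → release pattern can be factored into a small wrapper so no call site forgets the finally block. A sketch — withRuntime is a hypothetical convenience helper built on the documented acquire() API, not part of OasisHost:

```javascript
// Acquire the runtime, run a callback, and guarantee release.
// withRuntime is a hypothetical wrapper; fn receives the host
// and returns a promise (typically a host.request(...) call).
async function withRuntime(host, fn) {
  const release = await host.acquire({ warmRuntime: true });
  try {
    return await fn(host);
  } finally {
    if (release) release(); // always release, even if fn threw
  }
}
```

Usage: await withRuntime(host, (h) => h.request(prompt, { onToken })).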

Streaming completions

The core of OasisHost is the request() method. It sends a prompt to the loaded model and streams the response back token-by-token.
const fullText = await host.request(prompt, {
  systemPrompt: 'You are a helpful assistant.',
  maxTokens: 2048,
  temperature: 0.7,
  onToken: (token) => {
    // Called for each token as it's generated
    // Accumulate tokens yourself for UI updates
  },
});

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | — | The user’s input prompt |
| systemPrompt | string | '' | Instructions that guide the model’s behavior |
| maxTokens | number | 2048 | Maximum number of tokens to generate |
| temperature | number | 0.7 | Controls randomness. Lower = more deterministic, higher = more creative |
| onToken | (token: string) => void | — | Callback invoked with each generated token for real-time streaming |
| persona | string | — | Optional persona type to use for this request |
| appId | string | — | Your app’s identifier, used for persona permission checks |
| onPersonaStart | (name: string) => void | — | Called when a persona begins generating (useful in multi-persona mode) |
| onPersonaEnd | (name: string) => void | — | Called when a persona finishes generating |

Return value

request() returns a Promise<string> that resolves to the full accumulated response text once generation is complete.
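In other words, the concatenation of all onToken tokens equals the resolved string, so you can stream for live UI updates and still use the return value as the final text. A sketch against a stand-in host — fakeHost below simulates the documented contract and is not the real runtime:

```javascript
// Simulated host: streams each token through onToken, then resolves
// to the accumulated text, mirroring the documented request() contract.
const fakeHost = {
  async request(prompt, { onToken }) {
    const tokens = ['Hello', ', ', 'world'];
    let full = '';
    for (const t of tokens) {
      onToken?.(t); // streamed callback, one token at a time
      full += t;
    }
    return full;    // resolved value is the complete response
  },
};

async function demo() {
  let streamed = '';
  const resolved = await fakeHost.request('hi', {
    onToken: (t) => { streamed += t; },
  });
  return { streamed, resolved }; // streamed === resolved
}
```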

Complete example

Here’s a reusable helper module that wraps the full acquire → request → release lifecycle:
// src/webai.js
export const getOasisHost = () =>
  window.OasisHost ?? window.parent?.OasisHost ?? null;

export async function streamCompletion(prompt, systemPrompt, onToken) {
  const host = getOasisHost();
  if (!host) throw new Error('Oasis AI is not available in this environment.');

  const release = await host.acquire({ warmRuntime: true });
  try {
    return await host.request(prompt, {
      systemPrompt: systemPrompt ?? '',
      maxTokens: 2048,
      temperature: 0.7,
      onToken,
    });
  } finally {
    if (release) release();
  }
}
And a React component that uses it:
import { useState, useEffect } from 'react';
import { getOasisState, streamCompletion } from './webai';

function AIChat() {
  const [oasisState, setOasisState] = useState('waiting');
  const [prompt, setPrompt] = useState('');
  const [output, setOutput] = useState('');
  const [isGenerating, setIsGenerating] = useState(false);

  useEffect(() => {
    setOasisState(getOasisState());
    const id = setInterval(() => setOasisState(getOasisState()), 1200);
    return () => clearInterval(id);
  }, []);

  async function handleRun() {
    if (!prompt.trim() || isGenerating) return;
    setIsGenerating(true);
    setOutput('');
    try {
      await streamCompletion(
        prompt,
        'You are a helpful assistant.',
        (token) => setOutput((prev) => prev + token)
      );
    } catch (err) {
      setOutput('Error: ' + err.message);
    } finally {
      setIsGenerating(false);
    }
  }

  return (
    <div>
      <textarea
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        placeholder="Ask something..."
      />
      <button onClick={handleRun} disabled={isGenerating || oasisState !== 'ready'}>
        {isGenerating ? 'Generating...' : 'Run'}
      </button>
      {output && <pre>{output}</pre>}
    </div>
  );
}

Chat memory

OasisHost provides per-app chat memory that persists across sessions. Memory is stored locally and can be injected into prompts automatically.

loadAppChatHistory(appId)

Load the persisted chat memory for an app. Returns a structured object containing a rolling summary, learned preferences, recent conversation turns, and tool outcomes.

clearAppChatHistory(appId)

Clear all chat memory for an app.
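A typical use is a “clear conversation” control. A minimal sketch, assuming clearAppChatHistory may return a promise (resetChat is a hypothetical helper; the guard covers environments where the memory API is absent):

```javascript
// Hypothetical "clear conversation" handler built on the documented
// chat-memory method. We assume the call may be async and await it.
async function resetChat(host, appId) {
  if (!host?.clearAppChatHistory) return false; // memory API unavailable
  await host.clearAppChatHistory(appId);
  return true;
}
```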

Memory context in requests

When calling request(), two options control memory behavior:
| Option | Type | Default | Description |
|---|---|---|---|
| memoryContext | boolean \| 'auto' | 'auto' | 'auto' follows the shell’s Memory toggle. true always injects memory. false skips memory for this request. |
| chatSession | string | — | Session ID for isolating memory between independent conversations in the same app. |
Chat history is always auto-saved when appId is present in the request, regardless of the memoryContext setting.
Use chatSession when your app has multiple independent chat threads. Without it, memory is filtered by persona ID.
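Putting the two options together, a per-thread request helper might look like this. A sketch — requestInThread is hypothetical; memoryContext, chatSession, and appId are the documented options above:

```javascript
// Send a prompt scoped to one chat thread. Each thread passes its own
// chatSession ID so its memory stays isolated from other threads.
function requestInThread(host, { appId, threadId, prompt, onToken }) {
  return host.request(prompt, {
    appId,                 // presence of appId enables auto-saved history
    chatSession: threadId, // isolates memory per thread
    memoryContext: 'auto', // follow the shell's Memory toggle
    onToken,
  });
}
```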

Personas

OasisHost exposes persona management methods for apps that need to work with custom AI behaviors.

getPersonas()

Returns all available personas as an array.

getPersonasWithPermissions(appId?)

Returns personas that the specified app has permission to use, keyed by specialty type.
const personas = host.getPersonasWithPermissions('my-app');
// { "research": { id: "...", name: "Research Assistant", type: "research" } }

getActivePersona(appId?, personaType?)

Returns the currently active persona for a given app and type, or null.

loadPersona(personaType, options?)

Loads a persona’s model into memory. Throws if no persona is found for the given type.
await host.loadPersona('coding', {
  appId: 'my-app',
  onProgress: (p) => console.log(`${p}% loaded`)
});

requestPersonaAccess(appId?, personaType?)

Prompts the user to grant your app permission to use a persona. Returns true if granted.

removePersonaPermission(appId?, personaType?)

Revokes persona access for your app.

getLoadedPersonas()

Returns a map of persona types currently loaded in memory.
Persona methods are optional and only available when running inside the Apogee shell with the persona manager active. Always check for null before calling.
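The methods above compose into a typical “ensure persona” flow: check permissions, request access if missing, then load. A sketch — ensurePersona is a hypothetical helper built only from the documented methods, with every call guarded because persona support may be absent:

```javascript
// Ensure the app can use a persona of the given type, then load it.
// Returns true on success, false if persona support is unavailable
// or the user declines access.
async function ensurePersona(host, appId, personaType) {
  if (!host?.getPersonasWithPermissions || !host.loadPersona) return false;

  const allowed = host.getPersonasWithPermissions(appId);
  if (!allowed?.[personaType]) {
    // Not yet permitted: prompt the user for access.
    const granted = await host.requestPersonaAccess?.(appId, personaType);
    if (!granted) return false;
  }

  await host.loadPersona(personaType, { appId });
  return true;
}
```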

Runtime constants

These are the default values the AI runtime uses. You can override maxTokens and temperature per-request, but the others are system-level.
| Constant | Value |
|---|---|
| Default max tokens | 2048 |
| Default temperature | 0.7 |
| Request timeout | 150,000 ms (2.5 minutes) |
| Tool turn max tokens | 800 |
| Max tool turns per message | 3 |
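The 150-second request timeout is enforced by the runtime, but an app can impose its own, shorter deadline with Promise.race. A sketch — requestWithDeadline is hypothetical, and note that it only unblocks the caller: no cancellation API is documented here, so the underlying generation is not stopped:

```javascript
// Race request() against a local deadline. On timeout the returned
// promise rejects, but the underlying generation is NOT cancelled --
// this only frees the caller to show an error or retry.
function requestWithDeadline(host, prompt, options, ms = 150_000) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`AI request timed out after ${ms}ms`)),
      ms,
    );
  });
  return Promise.race([host.request(prompt, options), deadline])
    .finally(() => clearTimeout(timer)); // don't leak the timer
}
```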

Next steps