OasisHost is the interface to webAI’s local AI runtime. It lets your app run language model inference directly on the user’s hardware — no cloud, no API keys, no network requests. Your app acquires the runtime, sends prompts, and receives streamed token responses.
Quick start
Here’s the minimal flow to get AI working in your app:
```js
import { getOasisHost } from './webai';

async function askAI(prompt) {
  const host = getOasisHost();
  if (!host) throw new Error('AI not available outside webAI.');

  const release = await host.acquire({ warmRuntime: true });
  try {
    const response = await host.request(prompt, {
      systemPrompt: 'You are a helpful assistant.',
      maxTokens: 2048,
      temperature: 0.7,
      onToken: (token) => process.stdout.write(token),
    });
    return response;
  } finally {
    if (release) release();
  }
}
```
Checking runtime status
Before sending requests, check whether a model is loaded and ready. OasisHost exposes a getStatus() method that returns the current state of the AI runtime.
```js
function getOasisState() {
  const host = getOasisHost();
  if (!host?.getStatus) return 'waiting';

  const status = host.getStatus();
  // Check loading/generating first: lastModel stays set while the model
  // is generating, so testing it first would report 'ready' mid-generation.
  if (status?.loadingModel || status?.isGenerating) return 'loading';
  if (status?.lastModel) return 'ready';
  return 'waiting';
}
```
The three states your app should handle:
| State | Meaning | Recommended UX |
| --- | --- | --- |
| `ready` | A model is loaded and idle | Enable AI features, show a green indicator |
| `loading` | A model is loading or actively generating | Disable new requests, show a loading state |
| `waiting` | No model loaded or OasisHost unavailable | Show a "no model" notice, disable AI features |
Full status object
The getStatus() return object includes additional fields for advanced use cases:
| Field | Type | Description |
| --- | --- | --- |
| `hasRuntime` | `boolean` | Whether the inference runtime is initialized |
| `lastModel` | `string \| null` | ID of the currently loaded model |
| `loadingModel` | `string \| null` | ID of a model currently being loaded |
| `isGenerating` | `boolean` | Whether the model is actively generating tokens |
| `refCount` | `number` | Number of active runtime consumers |
| `modulesLoaded` | `boolean` | Whether all required modules are loaded |
| `deviceProfile` | `object \| null` | Hardware capabilities (GPU, memory, backend support) |
| `backendSelection` | `object \| null` | Which inference backend was selected and why |
| `modelSelection` | `object \| null` | Which model was selected for the current backend |
| `bootstrapPhase` | `string` | Current boot phase of the runtime |
| `nativeServer` | `object \| null` | Status of the native inference server (MLX/llama.cpp) |
For most apps, checking `lastModel`, `loadingModel`, and `isGenerating` is sufficient. The additional fields are useful for diagnostic tools or apps that need to display detailed runtime info.
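For a diagnostics panel, the extra fields can be collapsed into a one-line summary. The sketch below assumes only the field names from the table above; the internal shapes of `deviceProfile`, `backendSelection`, and the other object-valued fields are not specified here.

```javascript
// Sketch: condense a getStatus() result into a short diagnostic string.
// Uses only top-level fields documented above; null-safe for a missing status.
function summarizeStatus(status) {
  if (!status) return 'OasisHost unavailable';
  const parts = [
    `runtime: ${status.hasRuntime ? 'up' : 'down'}`,
    `model: ${status.lastModel ?? 'none'}`,
    `consumers: ${status.refCount ?? 0}`,
  ];
  if (status.loadingModel) parts.push(`loading: ${status.loadingModel}`);
  if (status.isGenerating) parts.push('generating');
  return parts.join(', ');
}
```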
Polling for status changes
The AI runtime state can change at any time — a user might load a new model, or a generation might finish. Poll getStatus() on an interval to keep your UI in sync.
React:

```js
const [oasisState, setOasisState] = useState('waiting');

useEffect(() => {
  const probe = () => {
    const host = window.OasisHost ?? window.parent?.OasisHost;
    if (!host?.getStatus) return 'waiting';
    const s = host.getStatus();
    if (s?.loadingModel || s?.isGenerating) return 'loading';
    if (s?.lastModel) return 'ready';
    return 'waiting';
  };
  setOasisState(probe());
  const id = setInterval(() => setOasisState(probe()), 1200);
  return () => clearInterval(id);
}, []);
```
Vue:

```js
const oasisState = ref('waiting');
let oasisInterval = null;

onMounted(() => {
  const probe = () => {
    const host = window.OasisHost ?? window.parent?.OasisHost;
    if (!host?.getStatus) return 'waiting';
    const s = host.getStatus();
    if (s?.loadingModel || s?.isGenerating) return 'loading';
    if (s?.lastModel) return 'ready';
    return 'waiting';
  };
  oasisState.value = probe();
  oasisInterval = setInterval(() => { oasisState.value = probe(); }, 1200);
});

onUnmounted(() => clearInterval(oasisInterval));
```
A 1200ms polling interval strikes a good balance between responsiveness and performance. Avoid polling faster than 500ms.
Acquiring the runtime
Before sending a request, your app must acquire exclusive access to the AI runtime. This prevents multiple apps from competing for the same GPU resources.
```js
const release = await host.acquire({ warmRuntime: true });
```
Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| `warmRuntime` | `boolean` | When `true`, pre-warms the inference runtime for faster first response |
Returns: A release function. Call it when you’re done with the runtime to let other apps use it.
Always call the release function when your request completes — even if it fails. Use a try/finally block to guarantee cleanup.
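The acquire/release pairing can be factored into a small wrapper so no call site can forget the cleanup. `withRuntime` below is a hypothetical helper, not part of the OasisHost API; it only assumes the `acquire()` contract described above.

```javascript
// Sketch: run a callback with the runtime acquired, guaranteeing release
// even when the callback throws. `host` is an OasisHost-like object.
async function withRuntime(host, fn) {
  const release = await host.acquire({ warmRuntime: true });
  try {
    return await fn(host);
  } finally {
    if (release) release();
  }
}
```

Usage: `await withRuntime(host, (h) => h.request(prompt, options))`.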
Streaming completions
The core of OasisHost is the request() method. It sends a prompt to the loaded model and streams the response back token-by-token.
```js
const fullText = await host.request(prompt, {
  systemPrompt: 'You are a helpful assistant.',
  maxTokens: 2048,
  temperature: 0.7,
  onToken: (token) => {
    // Called for each token as it's generated.
    // Accumulate tokens yourself for UI updates.
  },
});
```
Parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `prompt` | `string` | — | The user's input prompt |
| `systemPrompt` | `string` | `''` | Instructions that guide the model's behavior |
| `maxTokens` | `number` | `2048` | Maximum number of tokens to generate |
| `temperature` | `number` | `0.7` | Controls randomness. Lower = more deterministic, higher = more creative |
| `onToken` | `(token: string) => void` | — | Callback invoked with each generated token for real-time streaming |
| `persona` | `string` | — | Optional persona type to use for this request |
| `appId` | `string` | — | Your app's identifier, used for persona permission checks |
| `onPersonaStart` | `(name: string) => void` | — | Called when a persona begins generating (useful in multi-persona mode) |
| `onPersonaEnd` | `(name: string) => void` | — | Called when a persona finishes generating |
Return value
request() returns a Promise<string> that resolves to the full accumulated response text once generation is complete.
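In other words, the resolved string is the same text the `onToken` callback streamed piece by piece, so you can use the stream for live UI updates and the return value for the final result. A minimal sketch of that contract, assuming the `request()` behavior described above:

```javascript
// Sketch: collect both the streamed tokens and the resolved full text.
// If request() behaves as documented, the two should match.
async function collect(host, prompt) {
  let streamed = '';
  const full = await host.request(prompt, {
    maxTokens: 2048,
    onToken: (t) => { streamed += t; },
  });
  return { full, streamed };
}
```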
Complete example
Here’s a reusable helper module that wraps the full acquire → request → release lifecycle:
```js
// src/webai.js
export const getOasisHost = () =>
  window.OasisHost ?? window.parent?.OasisHost ?? null;

// Status helper from "Checking runtime status", exported here so UI
// components can import it alongside streamCompletion.
export function getOasisState() {
  const host = getOasisHost();
  if (!host?.getStatus) return 'waiting';
  const status = host.getStatus();
  if (status?.loadingModel || status?.isGenerating) return 'loading';
  if (status?.lastModel) return 'ready';
  return 'waiting';
}

export async function streamCompletion(prompt, systemPrompt, onToken) {
  const host = getOasisHost();
  if (!host) throw new Error('Oasis AI is not available in this environment.');

  const release = await host.acquire({ warmRuntime: true });
  try {
    return await host.request(prompt, {
      systemPrompt: systemPrompt ?? '',
      maxTokens: 2048,
      temperature: 0.7,
      onToken,
    });
  } finally {
    if (release) release();
  }
}
```
And a React component that uses it:
```jsx
import { useState, useEffect } from 'react';
import { getOasisState, streamCompletion } from './webai';

function AIChat() {
  const [oasisState, setOasisState] = useState('waiting');
  const [prompt, setPrompt] = useState('');
  const [output, setOutput] = useState('');
  const [isGenerating, setIsGenerating] = useState(false);

  useEffect(() => {
    setOasisState(getOasisState());
    const id = setInterval(() => setOasisState(getOasisState()), 1200);
    return () => clearInterval(id);
  }, []);

  async function handleRun() {
    if (!prompt.trim() || isGenerating) return;
    setIsGenerating(true);
    setOutput('');
    try {
      await streamCompletion(
        prompt,
        'You are a helpful assistant.',
        (token) => setOutput((prev) => prev + token)
      );
    } catch (err) {
      setOutput('Error: ' + err.message);
    } finally {
      setIsGenerating(false);
    }
  }

  return (
    <div>
      <textarea
        value={prompt}
        onChange={(e) => setPrompt(e.target.value)}
        placeholder="Ask something..."
      />
      <button onClick={handleRun} disabled={isGenerating || oasisState !== 'ready'}>
        {isGenerating ? 'Generating...' : 'Run'}
      </button>
      {output && <pre>{output}</pre>}
    </div>
  );
}
```
Chat memory
OasisHost provides per-app chat memory that persists across sessions. Memory is stored locally and can be injected into prompts automatically.
loadAppChatHistory(appId)
Load the persisted chat memory for an app. Returns a structured object containing a rolling summary, learned preferences, recent conversation turns, and tool outcomes.
clearAppChatHistory(appId)
Clear all chat memory for an app.
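A simple maintenance pattern is to inspect the persisted history and reset it when it grows stale. This sketch uses the two methods above; the `recentTurns` field name is an assumption based on the description of the history object ("recent conversation turns") and may differ in the real return shape.

```javascript
// Sketch: clear an app's memory once its recent-turn history exceeds a cap.
// `recentTurns` is an assumed field name for the recent conversation turns.
async function resetIfStale(host, appId, maxTurns) {
  const history = await host.loadAppChatHistory(appId);
  const turns = history?.recentTurns ?? [];
  if (turns.length > maxTurns) {
    await host.clearAppChatHistory(appId);
    return true; // memory was cleared
  }
  return false;
}
```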
Memory context in requests
When calling request(), two options control memory behavior:
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `memoryContext` | `boolean \| 'auto'` | `'auto'` | `'auto'` follows the shell's Memory toggle. `true` always injects memory. `false` skips memory for this request. |
| `chatSession` | `string` | — | Session ID for isolating memory between independent conversations in the same app. |
Chat history is always auto-saved when `appId` is present in the request, regardless of the `memoryContext` setting.
Use `chatSession` when your app has multiple independent chat threads. Without it, memory is filtered by persona ID.
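Putting both options together, a multi-thread app might tag each request with its thread's session ID. In this sketch the `'my-app'` app ID and the `thread-*` session naming are illustrative choices, not API requirements.

```javascript
// Sketch: keep memory separate per chat thread via chatSession.
function threadRequest(host, threadId, prompt) {
  return host.request(prompt, {
    appId: 'my-app',                   // hypothetical app ID; enables auto-save
    memoryContext: 'auto',             // follow the shell's Memory toggle
    chatSession: `thread-${threadId}`, // isolate this thread's memory
  });
}
```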
Personas
OasisHost exposes persona management methods for apps that need to work with custom AI behaviors.
getPersonas()
Returns all available personas as an array.
getPersonasWithPermissions(appId?)
Returns personas that the specified app has permission to use, keyed by specialty type.
```js
const personas = host.getPersonasWithPermissions('my-app');
// { "research": { id: "...", name: "Research Assistant", type: "research" } }
```
getActivePersona(appId?, personaType?)
Returns the currently active persona for a given app and type, or null.
loadPersona(personaType, options?)
Loads a persona’s model into memory. Throws if no persona is found for the given type.
```js
await host.loadPersona('coding', {
  appId: 'my-app',
  onProgress: (p) => console.log(`${p}% loaded`),
});
```
requestPersonaAccess(appId?, personaType?)
Prompts the user to grant your app permission to use a persona. Returns true if granted.
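The permission methods compose into a check-then-prompt-then-load flow. `ensurePersona` below is a hypothetical helper built only from the methods documented on this page; it assumes `getPersonasWithPermissions` returns an object keyed by specialty type, as shown above.

```javascript
// Sketch: make sure the app can use a persona type, prompting the user
// for access if needed, then load it.
async function ensurePersona(host, appId, type) {
  let allowed = Boolean(host.getPersonasWithPermissions(appId)?.[type]);
  if (!allowed) allowed = await host.requestPersonaAccess(appId, type);
  if (!allowed) return false; // user declined

  await host.loadPersona(type, { appId });
  return true;
}
```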
removePersonaPermission(appId?, personaType?)
Revokes persona access for your app.
getLoadedPersonas()
Returns a map of persona types currently loaded in memory.
Persona methods are optional and only available when running inside the Apogee shell with the persona manager active. Always check for null before calling.
Runtime constants
These are the default values the AI runtime uses. You can override maxTokens and temperature per-request, but the others are system-level.
| Constant | Value |
| --- | --- |
| Default max tokens | 2048 |
| Default temperature | 0.7 |
| Request timeout | 150,000 ms (2.5 minutes) |
| Tool turn max tokens | 800 |
| Max tool turns per message | 3 |
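If your UI needs to recover when a request stalls, you can race the request promise against the documented 150,000 ms timeout on the client side. This is a generic sketch using standard `Promise.race`, not an OasisHost feature; the runtime enforces its own timeout independently.

```javascript
// Sketch: client-side timeout guard around a long-running promise,
// defaulting to the documented 150,000 ms request timeout.
function withTimeout(promise, ms = 150_000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('AI request timed out')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage: `await withTimeout(host.request(prompt, options))`.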
Next steps
Navigation API Control shell navigation from your app.
Collaboration API Add real-time multiplayer features.