This is a hand-curated reference for every symbol exposed at vrcpilot.<name>. For runnable examples see usage.md; for the equivalent CLI see cli.md. Function signatures match the source as of 0.2.0rc1.
- All vrcpilot.<name> symbols are listed in src/vrcpilot/__init__.py::__all__.
- Module attributes vrcpilot.keyboard, vrcpilot.mouse, and vrcpilot.clipboard are also part of the public surface.
- Most call sites that send synthetic input or interact with the VRChat window expect VRChat to be running and focused. That requirement is enforced by ensure_target(), and the high-level helpers call it for you. The relevant exceptions (VRChatNotRunningError, VRChatNotFocusedError) are re-raised so callers can recover.
- Coordinate-bearing types (OCRWord, OCRResult, Detection, DetectResult) expose only window-local coordinates (polygon / bbox, origin at the VRChat window's top-left). mouse.move(x, y) consumes the same window-local frame, so OCR / detect bboxes feed in directly without translation; see coordinate system.
- Code blocks use ... as the body of every signature so they paste back cleanly into a Python REPL or stub file.
The package version is resolved from installed distribution metadata via importlib.metadata, so it stays in sync with the version declared in pyproject.toml.
def launch(
    *,
    app_id: int = 438100,
    steam_path: Path | None = None,
    no_vr: bool = False,
    screen_width: int | None = None,
    screen_height: int | None = None,
    osc: OscConfig | None = None,
    extra_args: list[str] | None = None,
    wait_timeout: float = 30.0,
    wait_interval: float = 1.0,
) -> int | None: ...

Start VRChat through Steam. The new process is detached from the calling process group. After spawning Steam, launch() polls find_pids() for up to wait_timeout seconds (default 30 s) and returns the observed PID. Pass wait_timeout=0 (or any non-positive value) to skip the wait and return immediately. This is useful for "fire and forget" launches where you intend to poll later yourself.
Returns: the PID once VRChat is observed, or None if wait_timeout <= 0 or the timeout is exceeded. A None return on a positive timeout is not an exception — branch on the return value if you need a stricter signal.
app_id defaults to VRChat's Steam app id. If you need to reference the constant directly, for example when building a custom launch wrapper, import it from the implementation module: from vrcpilot.process import VRCHAT_STEAM_APP_ID.
Raises: SteamNotFoundError when no Steam executable is found.
def terminate(*, timeout: float = 5.0) -> list[int]: ...

Force-kill every running VRChat process and wait up to timeout seconds for them to exit. Idempotent: returns an empty list when nothing is running.
Returns: the PIDs that were signalled.

def find_pids() -> list[int]: ...

Returns: PIDs of every running VRChat process, sorted newest-first by psutil.Process.create_time(). The list is empty when no VRChat process is running.
@dataclass(frozen=True)
class OscConfig:
    in_port: int = 9000
    out_ip: str = "127.0.0.1"
    out_port: int = 9001

    def to_launch_arg(self) -> str: ...

Structured form of VRChat's --osc=<in>:<ip>:<out> flag. to_launch_arg() renders the single CLI token used at launch.
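To make the flag shape concrete, here is a small stand-in that mirrors the documented fields and renders the --osc=<in>:<ip>:<out> token. OscConfigSketch is a hypothetical name used so the example runs without vrcpilot installed; the exact rendering inside the real OscConfig is assumed from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OscConfigSketch:
    """Stand-in mirroring the documented fields; not the vrcpilot class itself."""

    in_port: int = 9000
    out_ip: str = "127.0.0.1"
    out_port: int = 9001

    def to_launch_arg(self) -> str:
        # Assumed rendering of the --osc=<in>:<ip>:<out> shape.
        return f"--osc={self.in_port}:{self.out_ip}:{self.out_port}"


default_arg = OscConfigSketch().to_launch_arg()
custom_arg = OscConfigSketch(in_port=9100, out_port=9101).to_launch_arg()
print(default_arg)  # --osc=9000:127.0.0.1:9001
print(custom_arg)   # --osc=9100:127.0.0.1:9101
```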
SteamNotFoundError is raised by launch() (and the Steam discovery helpers) when no Steam executable can be located.
def focus() -> bool: ...

Bring the VRChat window to the foreground (and de-minimize it if needed).
Returns: True on success, False when VRChat is not running, the window is not mapped, the platform call fails, or the session is Wayland-native.
Raises: NotImplementedError on platforms other than Windows / Linux.

def unfocus() -> bool: ...

Send the VRChat window to the bottom of the z-order without raising any other window. Same return / raise contract as focus().

def is_foreground() -> bool: ...

Returns: True iff the VRChat window is currently in the foreground.
class Capture:
    def __init__(self, *, frame_timeout: float = 2.0) -> None: ...
    def read(self) -> np.ndarray: ...
    def close(self) -> None: ...

Streaming capture session for the VRChat window. Captures without focus. The internal buffer keeps only the most recent frame, so read() always returns "now".
- read() returns (H, W, 3) uint8 RGB.
- close() is idempotent and never raises.
- Supports with (context manager).
- frame_timeout is the per-frame wait in seconds; must be > 0.
Raises:
- NotImplementedError on platforms other than Windows / Linux.
- RuntimeError when the backend cannot start (VRChat not running, window not mapped, X11 unavailable, WGC session failure, Wayland-native).
- ValueError when frame_timeout <= 0.
class CaptureLoop:
    def __init__(
        self,
        callback: Callable[[np.ndarray], None],
        *,
        fps: float,
        frame_timeout: float = 2.0,
    ) -> None: ...
    @property
    def is_running(self) -> bool: ...
    def start(self) -> None: ...
    def stop(self) -> None: ...
    def close(self) -> None: ...

Drives a Capture on a background thread at a fixed fps. Each frame is delivered to callback as (H, W, 3) uint8 RGB. Supports with.
Raises: ValueError when fps or frame_timeout is non-positive; RuntimeError when the inner Capture cannot start; NotImplementedError on unsupported platforms.
The CLI vrcpilot record command composes CaptureLoop (for video) and SpeakerLoop (for audio) with internal PyAV-backed muxers (vrcpilot.cli.record.muxer, not part of the public surface). Build your own writer by passing a callback that consumes (H, W, 3) uint8 RGB frames — for example wrap PyAV or a ffmpeg subprocess to mux into whatever container you need.
Process-isolated audio capture for VRChat. On Linux the backend is a native PipeWire pipeline (virtual null-sink + pw-link + pw-record); on Windows it is proc-tap, a native extension that taps a single PID's audio rather than the whole system mix. Either way the stream contains only VRChat's output — Discord, OBS, and other applications are not mixed in. import vrcpilot raises ImportError on any other sys.platform, so no other Speaker backend is reachable.
The backend produces float32 (N, CHANNELS) chunks at 48 kHz stereo. The CLI vrcpilot record command muxes these via internal PyAV-backed writers (vrcpilot.cli.record.muxer, not part of the public surface); to persist audio from your own code, feed the chunks into a writer of your choice — for example PyAV (WAV, MP4, MKV, ...) or a ffmpeg subprocess. The (N, 2) float32 layout maps cleanly onto PyAV's planar/packed float frames.
class Speaker:
    def __init__(self, *, read_timeout: float = 2.0) -> None: ...
    def read(self) -> NDArray[np.float32]: ...
    def close(self) -> None: ...

Context-managed capture session. VRChat must already be running when the constructor is called; otherwise RuntimeError is raised. Each read() returns every sample buffered since the previous call as an (N, 2) float32 ndarray. N == 0 is a valid "no new audio" signal (returned when read_timeout expires on a quiet stream). close() is idempotent and never raises. Supports with.
Raises:
- RuntimeError when VRChat is not running or the Speaker backend cannot start (the PipeWire pipeline on Linux, proc-tap on Windows).
- ValueError when read_timeout <= 0.
class SpeakerLoop:
    def __init__(
        self,
        callback: AudioCallback,
        *,
        chunk_seconds: float = 0.05,
        read_timeout: float = 2.0,
    ) -> None: ...
    @property
    def is_running(self) -> bool: ...
    def start(self) -> None: ...
    def stop(self) -> None: ...
    def close(self) -> None: ...

Background-thread driver around Speaker. Constructs and owns its own Speaker, so VRChat must already be running when the loop is instantiated. Each tick drains one chunk and forwards it to callback; the worker sleeps chunk_seconds between drains (default 50 ms, chosen to match the backend buffer cadence). Empty chunks are forwarded verbatim so consumers can treat them as a "silence tick". Exceptions raised by the callback or by Speaker.read() are captured and re-raised on the next stop() / close() so worker-thread failures are never lost. Supports with.
Raises: ValueError when chunk_seconds or read_timeout is non-positive; RuntimeError from the inner Speaker.
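The capture-and-re-raise contract described for SpeakerLoop is a common pattern for background workers. The following is a minimal sketch of that pattern using only the standard library; WorkerLoopSketch and its tick parameter are hypothetical names, and nothing here is taken from vrcpilot's actual implementation.

```python
import threading
from collections.abc import Callable


class WorkerLoopSketch:
    """Hypothetical stand-in showing capture-and-re-raise; not SpeakerLoop itself."""

    def __init__(self, tick: Callable[[], None], interval: float = 0.05) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self._tick = tick
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread: threading.Thread | None = None
        self._error: Exception | None = None

    @property
    def is_running(self) -> bool:
        return self._thread is not None and self._thread.is_alive()

    def start(self) -> None:
        if self._thread is not None:
            return  # already started: nothing to do (a choice made for this sketch)
        self._stop_event.clear()
        self._error = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        try:
            while not self._stop_event.is_set():
                self._tick()
                # Sleep between ticks, but wake early when stop() is called.
                self._stop_event.wait(self._interval)
        except Exception as exc:
            self._error = exc  # remember the failure for the caller's thread

    def stop(self) -> None:
        thread, self._thread = self._thread, None
        if thread is None:
            return
        self._stop_event.set()
        thread.join()
        error, self._error = self._error, None
        if error is not None:
            raise error  # surface the worker-thread failure to the caller
```

With this shape, a failing tick ends the worker quietly and the exception only becomes visible when the owner calls stop(), which matches the "never lost, but deferred" behaviour described above.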
type AudioCallback = Callable[[NDArray[np.float32]], None]

The chunk-callback signature accepted by SpeakerLoop. Each callback invocation receives one (N, 2) float32 chunk; an N == 0 chunk is a silence tick.
SpeakerLoop accepts any callable that consumes (N, 2) float32 chunks. The example below collects everything into a single ndarray; in real code you would instead feed each chunk to a streaming writer (PyAV, a ffmpeg subprocess, a network socket, etc.).
import time

import numpy as np
from numpy.typing import NDArray

from vrcpilot.speaker import SpeakerLoop

chunks: list[NDArray[np.float32]] = []

# VRChat must already be running; SpeakerLoop raises RuntimeError otherwise.
with SpeakerLoop(chunks.append, chunk_seconds=0.05) as loop:
    loop.start()
    time.sleep(5.0)

audio = np.concatenate(chunks, axis=0) if chunks else np.empty((0, 2), np.float32)

To persist the recording, write audio (or each incoming chunk) with PyAV, the standard wave module, or the vrcpilot record CLI command; see cli.md record.
Forward the audio of a single VRChat PID to a chosen OS output device by pairing the existing PID-scoped SpeakerLoop capture with a soundcard output player. Cross-platform — the same module is used on Windows (proc-tap capture) and Linux (PipeWire-native capture). The PID-scoped capture above feeds two downstream consumers: vrcpilot record persists it to a file, while this module relays it live to another device. Because the relay happens in user space rather than through OS audio policy (IAudioPolicyConfig / EarTrumpet), per-PID isolation holds even when several VRChat.exe instances are running. See virtual-audio.md for the user-facing playbook (virtual-cable setup, latency tuning, mic feedback avoidance) and cli.md for the matching vrcpilot speaker CLI.
@dataclass(frozen=True, slots=True)
class AudioDevice:
    id: str
    name: str
    is_default: bool

Immutable handle to an OS output speaker. id is the opaque backend identifier from soundcard (Windows endpoint GUID, PipeWire node identifier); treat it as a black box and only pass it back into find_device / Router. name is the user-visible friendly name (Windows FriendlyName, PipeWire node.description). is_default is True iff this is the OS default output; at most one device per list_devices() result has the flag set. The dataclass is declared frozen=True so that a resolved device is safe to share between threads.
def list_devices() -> list[AudioDevice]: ...

Enumerate every visible output device. The OS default (if any) is first; the remaining devices follow in name-ascending order (Python's default codepoint comparison, case-sensitive). Returns [] when no output device exists; this is not an error.
Raises: ImportError when soundcard is not installed; OSError when soundcard cannot load libpulse / WASAPI.
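The ordering rule for list_devices() (default first, then case-sensitive name order) can be expressed as a single sort key. This is a sketch under the assumption that the rule is exactly as stated above; AudioDeviceSketch and order_devices are hypothetical names, and the device names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioDeviceSketch:
    """Stand-in with the documented fields; not the vrcpilot class."""

    id: str
    name: str
    is_default: bool


def order_devices(devices: list[AudioDeviceSketch]) -> list[AudioDeviceSketch]:
    # False sorts before True, so "not is_default" puts the default device first.
    # Plain str comparison is by codepoint, so uppercase sorts before lowercase.
    return sorted(devices, key=lambda d: (not d.is_default, d.name))


devices = [
    AudioDeviceSketch("3", "headphones", False),
    AudioDeviceSketch("1", "Zoom Output", False),
    AudioDeviceSketch("2", "Speakers", True),
    AudioDeviceSketch("4", "CABLE Input", False),
]
ordered_names = [d.name for d in order_devices(devices)]
print(ordered_names)  # ['Speakers', 'CABLE Input', 'Zoom Output', 'headphones']
```

Note how "Zoom Output" lands before "headphones": codepoint comparison is case-sensitive, which is why the reference spells that detail out.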
def default_device() -> AudioDevice: ...

Returns: the OS default output device.
Raises: DeviceNotFoundError when the host has no output device at all; ImportError / OSError as for list_devices().
def find_device(query: str) -> AudioDevice: ...

Resolve query to a single output device through three strict stages, each of which stops on a unique hit and raises immediately (without falling through) on two-or-more hits:
1. query == device.id (exact).
2. query == device.name (exact, case-sensitive).
3. query.lower() in device.name.lower() (substring, case-insensitive).
This means an ambiguous substring query fails loudly rather than silently picking one device. Use the full id or the exact name to disambiguate.
Raises: AudioRoutingError when any stage matched two or more devices; DeviceNotFoundError when all three stages matched zero; ImportError / OSError as for list_devices().
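The three-stage resolution can be sketched in a few lines. The code below is an illustration of the documented rules, not vrcpilot's source: DeviceSketch, resolve_device, and the two exception classes are hypothetical stand-ins named after the real ones.

```python
from dataclasses import dataclass


class AudioRoutingErrorSketch(RuntimeError):
    """Stand-in for the ambiguous-match failure."""


class DeviceNotFoundErrorSketch(AudioRoutingErrorSketch):
    """Stand-in for the zero-match failure."""


@dataclass(frozen=True)
class DeviceSketch:
    id: str
    name: str


def resolve_device(query: str, devices: list[DeviceSketch]) -> DeviceSketch:
    stages = (
        lambda d: query == d.id,                    # stage 1: exact id
        lambda d: query == d.name,                  # stage 2: exact name
        lambda d: query.lower() in d.name.lower(),  # stage 3: substring, any case
    )
    for matches in stages:
        hits = [d for d in devices if matches(d)]
        if len(hits) == 1:
            return hits[0]  # unique hit: stop here
        if len(hits) > 1:
            # Ambiguity fails loudly instead of falling through to a later stage.
            names = ", ".join(d.name for d in hits)
            raise AudioRoutingErrorSketch(f"{query!r} is ambiguous: {names}")
    raise DeviceNotFoundErrorSketch(f"no output device matches {query!r}")


devices = [
    DeviceSketch("id-1", "CABLE Input"),
    DeviceSketch("id-2", "CABLE Output"),
    DeviceSketch("id-3", "Speakers"),
]
```

With these sample devices, "speak" resolves through the substring stage, while "cable" matches two names in that stage and raises the ambiguity error rather than picking one.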
class Router:
    def __init__(
        self,
        pid: int,
        device: str | AudioDevice | None = None,
        *,
        chunk_seconds: float = 0.02,
        blocksize: int | None = None,
    ) -> None: ...
    @property
    def device(self) -> AudioDevice: ...
    @property
    def is_running(self) -> bool: ...
    def start(self) -> None: ...
    def stop(self) -> None: ...
    def close(self) -> None: ...
    def __enter__(self) -> Self: ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> None: ...

Relay one VRChat PID's audio to a chosen output device. Construction resolves device (None → default_device(), str → find_device(...), AudioDevice → used directly) but does not open any audio stream; start() is what acquires resources. Use the explicit start() / stop() pair, the close() alias, or the with block; all three are interchangeable.
pid is the target VRChat PID and is forwarded to the inner SpeakerLoop; None is intentionally not accepted because the whole point of this module is per-PID separation under multi-instance VRChat. chunk_seconds and blocksize are the capture-side and output-side buffer knobs respectively (see virtual-audio.md for tuning guidance); chunk_seconds=0.02 keeps latency low at the cost of some underrun risk, and blocksize=None lets soundcard pick its backend default.
Lifecycle: start() opens the output player first, then spawns the inner SpeakerLoop; if the capture side fails, the already-opened player is rolled back before the original exception propagates, so a partial start never leaks a resource. stop() tears both down in the reverse order with the player release in a finally block, so a worker-thread exception (e.g. VRChat died mid-relay) still releases the player before being re-raised. Double-start() / double-stop() are intentional no-ops. After a successful stop() the instance is re-startable: a fresh SpeakerLoop and player pair are created on the next start(), which is what makes re-entering the with block well-defined.
VRChat process death does not interrupt start() itself — the worker-thread exception is captured by SpeakerLoop and surfaces on the next stop() / close() / __exit__, mirroring SpeakerLoop's contract.
Thread safety: start() / stop() are expected to be called from the main thread. Audio frames arrive on the SpeakerLoop worker thread; a callback racing with stop() sees a None player snapshot and skips its play() call without locking.
Raises (from start()):
- RuntimeError when SpeakerLoop / Speaker fails to start (e.g. VRChat process not running).
- ValueError when chunk_seconds <= 0 (surfaced by SpeakerLoop.__init__).
- OSError when soundcard fails to open the player (device disappeared between resolution and start(), libpulse / WASAPI runtime error, etc.).
- NotImplementedError when the host is neither Windows nor Linux (surfaced by the Speaker backend dispatch).
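The Router lifecycle rules (player first, rollback on a failed capture start, player release in a finally block, no-op double calls, restartable after stop) can be sketched with two injected factories. RelaySketch, open_player, and start_capture are hypothetical names; the objects they return only need close() and stop() methods respectively. This illustrates the ordering, not the real Router code.

```python
from collections.abc import Callable
from typing import Any


class RelaySketch:
    """Hypothetical stand-in for the start/stop ordering; not the real Router."""

    def __init__(
        self,
        open_player: Callable[[], Any],
        start_capture: Callable[[], Any],
    ) -> None:
        self._open_player = open_player
        self._start_capture = start_capture
        self._player: Any = None
        self._capture: Any = None

    @property
    def is_running(self) -> bool:
        return self._capture is not None

    def start(self) -> None:
        if self.is_running:
            return  # double-start is a no-op
        player = self._open_player()  # output side first
        try:
            capture = self._start_capture()  # then the capture side
        except BaseException:
            player.close()  # roll back so a partial start leaks nothing
            raise
        self._player, self._capture = player, capture

    def stop(self) -> None:
        if not self.is_running:
            return  # double-stop is a no-op
        capture, player = self._capture, self._player
        self._capture = self._player = None
        try:
            capture.stop()  # may re-raise a worker-thread failure
        finally:
            player.close()  # released even when capture.stop() raises
```

Because stop() clears both references before tearing down, a later start() builds a fresh pair, which is the property that makes re-entering a with block well-defined.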
def route(
    pid: int,
    device: str | AudioDevice | None = None,
    *,
    chunk_seconds: float = 0.02,
    blocksize: int | None = None,
) -> Router: ...

Construct a Router and call start() in one step. The returned Router is already running; the caller owns the lifecycle from there (call stop() / close(), or wrap the result in with). If start() raises, the exception propagates and no Router is returned; start() itself rolled back the partial player, so there is nothing for the caller to clean up. Arguments mean exactly what they do on Router.__init__.
AudioRoutingError is a RuntimeError subclass and the package-wide base for routing failures; catch it to cover both the ambiguous-resolution and device-not-found cases in a single except clause. Direct (non-subclass) instances are raised when find_device matches two or more devices in the same stage.
DeviceNotFoundError is a subclass of AudioRoutingError. It is raised by find_device when all three resolution stages return zero hits, and by default_device when the host has no output device at all.
import time

from vrcpilot.speaker.routing import Router, find_device, list_devices

# Pick a target speaker. find_device("CABLE") would also work if the
# substring is unambiguous on this host.
device = next(d for d in list_devices() if "CABLE" in d.name)

# VRChat must already be running; Router.start raises RuntimeError otherwise.
with Router(pid=12345, device=device) as router:
    assert router.is_running
    time.sleep(5.0)
# Leaving the `with` block stops the relay and releases the output player.

For the matching CLI (vrcpilot speaker list / vrcpilot speaker route --pid N), see cli.md.
Stream float32 PCM into a virtual-cable output device so it appears to VRChat as live microphone input. The primary use case is piping an LLM agent's TTS chunks directly into VRChat without ever touching a real microphone or an intermediate audio file. The session opens a soundcard player in __init__ and keeps it alive until the instance is closed; play(chunk) writes a single chunk per call so callers drive the cadence themselves (for chunk in tts.stream(): mic.play(chunk)). On Windows the default device is VB-Audio Virtual Cable's "CABLE Input"; on Linux the default is the "VRCPilotMic" PipeWire sink created by vrcpilot.mic.linux.register_virtual_mic (or by running vrcpilot linux-mic register) — when running multiple AI agent instances in parallel, register a dedicated VRCPilotMic_<suffix> for each via vrcpilot.mic.linux.register_virtual_mic(suffix=...) so each instance's TTS gets its own isolated input path into VRChat.
class Mic:
    def __init__(
        self,
        device: str | None = None,
        *,
        sample_rate: int = 48000,
        channels: int = 1,
    ) -> None: ...
    @property
    def device_name(self) -> str: ...
    @property
    def device_id(self) -> str: ...
    @property
    def sample_rate(self) -> int: ...
    @property
    def channels(self) -> int: ...
    def play(self, chunk: NDArray[np.float32]) -> None: ...
    def close(self) -> None: ...
    def __enter__(self) -> Self: ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> None: ...

device is matched as a case-insensitive substring against the names soundcard reports (matching covers both Speaker.name and Speaker.id, with fuzzy id matching). None defers to $VRCPILOT_MIC_DEVICE, then to the OS default returned by default_device_name(). The constructor resolves the device, opens a soundcard player for (sample_rate, channels), and enters it; those values are baked in for the lifetime of the session, so reconfiguring means constructing a new Mic.
device_id exposes the underlying soundcard Speaker.id as a string. On Linux this is the PulseAudio sink name (e.g. "VRCPilotMic"); on Windows it is the WASAPI device id string surfaced by soundcard.
play(chunk) writes one float32 array per call. The chunk shape must match the configured channel count ((N,) for mono, (N, channels) for multi-channel) or ValueError is raised. The call blocks if the backend's internal buffer is full, giving the caller natural back-pressure for live TTS streams.
The stream is released by close(), by leaving the with block, or as a best-effort fallback in __del__. Prefer the context manager — __del__ runs at GC time and cannot be relied on for prompt resource release on every interpreter.
Raises:
- MicDeviceNotFoundError when no output device matches the resolved name, or no default is configured for this platform.
- ImportError when soundcard is not installed (the lazy import happens during construction).
- RuntimeError from the soundcard backend (libpulse on Linux, WASAPI on Windows) when it cannot open the player, or from play() after the Mic has been closed.
- OSError when the native backend shared library cannot be loaded (e.g. libpulse0 is missing on Linux).
- ValueError when sample_rate / channels is not strictly positive, or when play() receives a non-float32 chunk, a chunk with ndim outside {1, 2}, or a chunk whose channel count disagrees with the constructor.
MicDeviceNotFoundError is a RuntimeError subclass raised when soundcard cannot locate an output device matching the resolved name. The message lists every output device soundcard currently sees and includes an OS-specific setup hint (vrcpilot linux-mic register on Linux, VB-Cable install link on Windows), which makes mis-named installations easy to diagnose.

def default_device_name() -> str | None: ...

The OS-specific default output-device substring. Returns "CABLE Input" on Windows and "VRCPilotMic" on Linux (after vrcpilot linux-mic register). Returns None on other platforms. When you register additional sinks via vrcpilot linux-mic register --suffix <name> (or vrcpilot.mic.linux.register_virtual_mic(suffix="<name>")), target them explicitly by sink name (e.g. Mic("VRCPilotMic_<name>")); default_device_name() only ever returns the empty-suffix "VRCPilotMic".
$VRCPILOT_MIC_DEVICE is the environment variable consulted between the constructor argument and default_device_name(). It is useful for keeping device names out of source code, or for overriding the Windows default without code changes.
Play a single preloaded buffer:
import numpy as np

import vrcpilot

samples = np.zeros(48000, dtype=np.float32)  # 1 second of silence
with vrcpilot.Mic(sample_rate=48000, channels=1) as mic:
    mic.play(samples)

Stream chunks from a generator (the shape an LLM agent's incremental TTS typically produces):
from collections.abc import Iterator

import numpy as np
from numpy.typing import NDArray

import vrcpilot

def tts_chunks() -> Iterator[NDArray[np.float32]]:
    # Replace with the agent's actual chunk iterator.
    for _ in range(10):
        yield np.zeros(4800, dtype=np.float32)  # 100 ms of silence per chunk

with vrcpilot.Mic(sample_rate=48000, channels=1) as mic:
    for chunk in tts_chunks():
        mic.play(chunk)

Linux-only helpers that manage the persistent VRCPilotMic virtual mic in PipeWire. This is the programmatic counterpart of the vrcpilot linux-mic CLI; both write the same config fragment and call the same PulseAudio module_load path.
Every public function takes a suffix keyword so the same machinery can manage multiple sinks side-by-side — the primary use case being running multiple AI agent instances in parallel, each with its own dedicated virtual mic. An empty suffix ("", the default) targets the legacy VRCPilotMic and preserves backward compatibility; a non-empty suffix (e.g. "alt") targets the VRCPilotMic_<suffix> derived sink. suffix may contain only [A-Za-z0-9_-]; anything else raises ValueError.
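The suffix rules above boil down to one character-class check and one naming rule. The sketch below shows both; sink_name is a hypothetical helper, and the real module may validate or normalise differently.

```python
import re

# Only ASCII letters, digits, underscore, and hyphen are allowed; empty is valid.
_SUFFIX_RE = re.compile(r"[A-Za-z0-9_-]*")


def sink_name(suffix: str = "") -> str:
    """Map a suffix to its sink name, rejecting illegal characters."""
    if _SUFFIX_RE.fullmatch(suffix) is None:
        raise ValueError(f"illegal characters in suffix: {suffix!r}")
    return "VRCPilotMic" if suffix == "" else f"VRCPilotMic_{suffix}"


print(sink_name())           # VRCPilotMic
print(sink_name("alt"))      # VRCPilotMic_alt
print(sink_name("agent-2"))  # VRCPilotMic_agent-2
```

Using fullmatch rather than match matters here: match would accept a string such as "bad name" because the legal prefix "bad" satisfies the pattern.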
Importing this submodule on non-Linux platforms raises ImportError at import time (raise ImportError("vrcpilot.mic.linux is Linux-only")), so guard accesses with sys.platform == "linux" (or import lazily) when writing cross-platform code.
def register_virtual_mic(
    *,
    suffix: str = "",
    runtime_load: bool = True,
) -> RegisterResult: ...

Persist the VRCPilotMic (or VRCPilotMic_<suffix>) module-null-sink to $XDG_CONFIG_HOME/pipewire/pipewire.conf.d/vrcpilot-mic[-<suffix>].conf (falling back to ~/.config/... when the variable is unset) and, when runtime_load=True, additionally call pulsectl.Pulse.module_load("module-null-sink", ...) so the sink is usable immediately. The runtime step is best-effort: failures (missing pulsectl, control-plane error) are surfaced via RegisterResult.runtime_warning rather than raised, because the persistent config is the source of truth and will be picked up after the next PipeWire restart / re-login.
suffix selects which sink to act on. An empty suffix (the default) targets the existing VRCPilotMic; a non-empty suffix targets VRCPilotMic_<suffix>. Re-calling with the same suffix is idempotent — any pre-existing runtime module is unloaded before the fresh load, so re-registration never double-stacks.
Returns: a RegisterResult describing what was done.
Raises: OSError when the persistent config cannot be written (permission errors, filesystem failures); ValueError when suffix contains illegal characters.
def unregister_virtual_mic(*, suffix: str = "") -> bool: ...

Remove the persistent config fragment for suffix and unload any currently loaded module-null-sink with the matching name. An empty suffix (the default) targets VRCPilotMic; a non-empty suffix targets VRCPilotMic_<suffix>. Returns True if anything was actually removed (config file deleted, runtime module unloaded, or both); False when neither artefact existed. Idempotent: safe to call repeatedly.
Raises: ValueError when suffix contains illegal characters.

def is_registered(*, suffix: str = "") -> bool: ...

Return whether the persistent config fragment for suffix exists. An empty suffix checks the default VRCPilotMic; a non-empty suffix checks VRCPilotMic_<suffix>. Does not consult PulseAudio; use the vrcpilot linux-mic status CLI or call open_pulse_control() directly to inspect the runtime module list.
Raises: ValueError when suffix contains illegal characters.
def config_path(*, suffix: str = "") -> Path: ...

Absolute path of the PipeWire config fragment for suffix (empty suffix → vrcpilot-mic.conf, non-empty → vrcpilot-mic-<suffix>.conf). Honours $XDG_CONFIG_HOME with ~/.config as the XDG fallback. Public so vrcpilot linux-mic status can surface the location.
Raises: ValueError when suffix contains illegal characters.
def iter_registered_suffixes() -> list[str]: ...

Scan pipewire.conf.d/ for vrcpilot-mic*.conf fragments and recover the suffix from each filename. The empty suffix (the default sink) sorts first; the remainder is in lexicographic order so callers (CLI listings, e2e enumerations) get a stable ordering. Returns [] if the config directory does not exist yet.
@dataclass(frozen=True)
class RegisterResult:
    config_path: Path
    created_config: bool
    runtime_loaded: bool
    runtime_warning: str | None
    suffix: str

Outcome of register_virtual_mic:
- config_path: absolute path to the persistent config fragment.
- created_config: True iff the call wrote the file (False when it already existed with the expected contents).
- runtime_loaded: True iff the immediate pulsectl module_load succeeded. False when skipped via runtime_load=False or when the runtime step failed (in which case runtime_warning is populated).
- runtime_warning: human-readable description of the runtime-load failure, or None when no failure occurred.
- suffix: the normalised suffix that was registered (empty string for the default sink). The value passed to register_virtual_mic(suffix=...) round-trips here unchanged.
@dataclass(frozen=True, eq=False)
class Screenshot:
    image: NDArray[np.uint8]  # (H, W, 3) uint8 RGB
    x: int                    # window top-left, desktop-absolute
    y: int
    width: int
    height: int
    monitor_index: int        # mss.MSS().monitors index
    captured_at: datetime     # UTC

    def save(self, png_path: Path | None = None) -> str: ...
    @classmethod
    def load(cls, text: str) -> Screenshot: ...

Pixel data plus the window's on-screen geometry (x / y are the window's top-left in desktop-absolute pixels; monitor_index records the mss monitor the capture came from). OCR / detect results are window-local, so this geometry is informational rather than required for clicking. eq=False is set because a generated __eq__ would have to compare the numpy image array, and an element-wise array comparison does not reduce to a single bool.
save() returns a YAML string. When png_path is provided the PNG is written there and the YAML stores path:; otherwise the YAML embeds the PNG as base64 under image:. load() restores either form.
def take_screenshot(*, settle_seconds: float = 0.05) -> Screenshot | None: ...

Focus VRChat, sleep settle_seconds, and grab a one-shot capture of the VRChat window only.
Returns: a Screenshot, or None on a recoverable failure (Wayland-native, focus refused, window unmapped, mss error).
Raises: NotImplementedError on unsupported platforms; ValueError when settle_seconds < 0.
@dataclass(frozen=True)
class OCRWord:
    text: str
    polygon: Polygon   # (TL, TR, BR, BL), image-local
    confidence: float  # 0.0–1.0

    @property
    def bbox(self) -> tuple[int, int, int, int]: ...  # (x, y, w, h), axis-aligned
    @property
    def center(self) -> tuple[float, float]: ...

@dataclass(frozen=True, eq=False)
class OCRResult:
    screenshot: Screenshot
    words: tuple[OCRWord, ...]

Bundles a Screenshot with the words detected on it. All OCRWord.polygon / OCRWord.bbox values are window-local (origin at the VRChat window's top-left), which is the same frame mouse.move() consumes, so no translation step is required.
class OCREngine(ABC):
    @abstractmethod
    def recognize(self, image: NDArray[np.uint8]) -> Sequence[OCRWord]: ...

Swap in your own backend by implementing this ABC.

class RapidOCREngine(OCREngine):
    def __init__(self, *, params: dict[str, Any] | None = None) -> None: ...

Default backend (PP-OCRv4 via rapidocr). It lazy-imports rapidocr in the constructor, so the rest of the package remains usable without the ocr extra installed.
Raises: ImportError when rapidocr is not installed.
def ocr(
    screenshot: Screenshot,
    *,
    engine: OCREngine | None = None,
) -> OCRResult: ...

Run OCR on screenshot. When engine is None, a process-cached RapidOCREngine instance is used.
vrcpilot.ocr is callable directly (vrcpilot.ocr(shot)). The submodule vrcpilot.ocr is still accessible via from vrcpilot.ocr import OCREngine and similar import-from forms; Python's import machinery resolves these through sys.modules, so the function binding does not break submodule access.
@dataclass(frozen=True)
class Detection:
    polygon: Polygon
    confidence: float
    scale: float     # 1.0 = same size as the query
    rotation: float  # radians, counter-clockwise positive

    @property
    def bbox(self) -> tuple[int, int, int, int]: ...
    @property
    def center(self) -> tuple[float, float]: ...

@dataclass(frozen=True, eq=False)
class DetectResult:
    screenshot: Screenshot
    query: NDArray[np.uint8]  # (h, w, 3) uint8 RGB
    detections: tuple[Detection, ...]

All Detection.polygon / Detection.bbox values are window-local, matching OCRResult and the frame mouse.move() accepts.
class DetectEngine(ABC):
    @abstractmethod
    def detect(
        self,
        image: NDArray[np.uint8],
        query: NDArray[np.uint8],
    ) -> Sequence[Detection]: ...

class TemplateDetectEngine(DetectEngine):
    def __init__(
        self,
        *,
        threshold: float = 0.85,
        scales: Sequence[float] = (
            0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.75,
            0.9, 1.0, 1.25, 1.5, 1.8, 2.2, 2.6, 3.0,
        ),
        rotations_deg: Sequence[float] = (0.0,),
        nms_iou: float = 0.3,
        max_results: int = 32,
    ) -> None: ...

Multi-scale (and optionally multi-rotation) cv2.matchTemplate(..., TM_CCOEFF_NORMED) runner with non-maximum suppression.
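For readers unfamiliar with non-maximum suppression, the sketch below shows the generic greedy form on (x, y, w, h) boxes, with parameters named after nms_iou and max_results. It is a textbook version written for illustration; how TemplateDetectEngine actually scores overlaps or breaks ties is not stated in this reference, so treat the details as assumptions.

```python
Box = tuple[int, int, int, int]  # (x, y, w, h), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    candidates: list[tuple[float, Box]],
    nms_iou: float = 0.3,
    max_results: int = 32,
) -> list[tuple[float, Box]]:
    """Keep the highest-confidence boxes, dropping any that overlap a kept one."""
    kept: list[tuple[float, Box]] = []
    for confidence, box in sorted(candidates, key=lambda c: c[0], reverse=True):
        if all(iou(box, kept_box) <= nms_iou for _, kept_box in kept):
            kept.append((confidence, box))
            if len(kept) == max_results:
                break
    return kept


candidates = [
    (0.90, (12, 11, 20, 20)),    # near-duplicate of the 0.95 hit
    (0.95, (10, 10, 20, 20)),
    (0.88, (100, 100, 20, 20)),  # a separate match elsewhere
]
survivors = greedy_nms(candidates)
```

In this sample the 0.90 candidate overlaps the 0.95 one far beyond the 0.3 threshold and is dropped, while the distant 0.88 candidate survives. A multi-scale matcher tends to produce many such near-duplicates, which is why a suppression step is useful.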
def detect(
    screenshot: Screenshot,
    query: NDArray[np.uint8],
    *,
    engine: DetectEngine | None = None,
) -> DetectResult: ...

Run engine.detect(screenshot.image, query). When engine is None, a process-cached TemplateDetectEngine is used.
The keyboard and mouse modules expose thin singleton objects rather than classes. Call methods on them directly. All methods accept focus: bool = True; leave it True unless you deliberately want to bypass the VRChat focus guard. The signatures below are written as defs for paste-friendliness; in practice you call them as vrcpilot.keyboard.press(...) and so on.
Key is a StrEnum of every supported key name. Members:
- Letters: A–Z
- Digits: NUM_0–NUM_9
- Function keys: F1–F12
- Modifiers: SHIFT, SHIFT_LEFT, SHIFT_RIGHT, CTRL, CTRL_LEFT, CTRL_RIGHT, ALT, ALT_LEFT, ALT_RIGHT, WIN, WIN_LEFT, WIN_RIGHT
- Navigation: UP, DOWN, LEFT, RIGHT, HOME, END, PAGE_UP, PAGE_DOWN
- Editing: BACKSPACE, DELETE, INSERT, TAB, ENTER, ESCAPE, SPACE
- Punctuation: MINUS, EQUALS, LBRACKET, RBRACKET, BACKSLASH, SEMICOLON, QUOTE, COMMA, PERIOD, SLASH, BACKTICK
def press(*keys: Key, duration: float = 0.1, focus: bool = True) -> None: ...
def down(*keys: Key, focus: bool = True) -> None: ...
def up(*keys: Key, focus: bool = True) -> None: ...

press is a chord-tap: keys are pressed left-to-right, held for duration seconds, then released right-to-left. Do not lower duration below 0.1; VRChat / Unity drops shorter taps.
down and up are paired half-actions. They are intentionally useful only within a single Python process; the synthetic input device is released by the kernel when the process exits, so down/up cannot be paired across CLI invocations.
Raises: TypeError when keys is empty; VRChatNotRunningError / VRChatNotFocusedError from the focus guard.
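The chord-tap ordering (press left-to-right, hold, release right-to-left) is easy to get wrong when reimplementing it on top of down / up. The sketch below spells out the sequence against injected key_down / key_up callables. press_chord and both callables are hypothetical, and the focus guard is deliberately left out.

```python
import time
from collections.abc import Callable, Sequence


def press_chord(
    keys: Sequence[str],
    key_down: Callable[[str], None],
    key_up: Callable[[str], None],
    duration: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Press keys in order, hold, then release them in reverse order."""
    if not keys:
        raise TypeError("press_chord() needs at least one key")
    for key in keys:
        key_down(key)  # modifiers first when listed first
    sleep(duration)  # hold long enough for the game to register the tap
    for key in reversed(keys):
        key_up(key)  # last pressed is first released


events: list[tuple[str, str]] = []
press_chord(
    ("CTRL", "SHIFT", "A"),
    key_down=lambda k: events.append(("down", k)),
    key_up=lambda k: events.append(("up", k)),
    duration=0.0,  # no real hold needed for this dry run
)
```

Releasing in reverse order keeps modifiers held for the whole time the final key is down, which is what a physical chord looks like to the receiving application.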
MouseButton is a StrEnum with members LEFT, RIGHT, MIDDLE.
def move(x: int, y: int, *, relative: bool = False, focus: bool = True) -> None: ...
def click(*buttons: MouseButton, count: int = 1, duration: float = 0.0, focus: bool = True) -> None: ...
def scroll(amount: int, *, focus: bool = True) -> None: ...
def press(*buttons: MouseButton, focus: bool = True) -> None: ...
def release(*buttons: MouseButton, focus: bool = True) -> None: ...

move(x, y) interprets (x, y) as VRChat window-local pixels: (0, 0) is the top-left of the VRChat window. This is the same frame OCRWord.bbox / Detection.bbox use, so OCR / detect results feed in directly. Coordinates outside the window are not rejected; they are translated to the desktop and passed to the OS as-is. With relative=True, (x, y) is a delta added to the current cursor position (the window-local interpretation does not apply in that branch).
click() falls back to LEFT when called with no buttons. count > 1 repeats the press/release pair. duration > 0 holds each click for that many seconds.
press / release are paired half-actions for chord clicks. As with keyboard.down / up, they are meaningful only within a single Python process.
def ensure_target() -> None: ...

Verify VRChat is running and currently focused, focusing it if necessary. Idempotent. The high-level keyboard / mouse / clipboard.paste calls invoke this for you when focus=True (the default).
Raises: NotImplementedError on Wayland-native; VRChatNotRunningError; VRChatNotFocusedError.
VRChatNotRunningError and VRChatNotFocusedError are raised by ensure_target() and the input helpers.

def paste(text: str, *, focus: bool = True) -> None: ...

Copy text to the OS clipboard, then send Ctrl+V to VRChat. Use this for non-ASCII content (Japanese, emoji, etc.), which scancode-based keyboard.press cannot type directly.
Raises: pyperclip.PyperclipException when no clipboard backend is available (e.g. Linux without xclip or xsel installed); the focus-guard exceptions when focus=True.
type Polygon = tuple[
    tuple[float, float],  # TL
    tuple[float, float],  # TR
    tuple[float, float],  # BR
    tuple[float, float],  # BL
]

The 4-corner polygon shape used by OCRWord.polygon and Detection.polygon. Coordinates are image-local pixels.
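To show how a 4-corner polygon relates to the (x, y, w, h) bbox and the center exposed by OCRWord and Detection, here is a small sketch. polygon_bbox and polygon_center are hypothetical helpers; rounding outward with floor / ceil and averaging the corners are assumptions, since the reference does not say how the library derives these values.

```python
import math

Point = tuple[float, float]
PolygonSketch = tuple[Point, Point, Point, Point]  # (TL, TR, BR, BL)


def polygon_bbox(polygon: PolygonSketch) -> tuple[int, int, int, int]:
    """Smallest integer axis-aligned (x, y, w, h) box containing all four corners."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    left, top = math.floor(min(xs)), math.floor(min(ys))
    right, bottom = math.ceil(max(xs)), math.ceil(max(ys))
    return left, top, right - left, bottom - top


def polygon_center(polygon: PolygonSketch) -> tuple[float, float]:
    """Mean of the four corners (assumed definition of the center)."""
    n = len(polygon)
    return sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n


upright = ((10.0, 20.0), (50.0, 20.0), (50.0, 40.0), (10.0, 40.0))
tilted = ((10.5, 0.0), (20.0, 5.5), (15.0, 15.2), (5.2, 10.0))
upright_bbox = polygon_bbox(upright)
upright_center = polygon_center(upright)
tilted_bbox = polygon_bbox(tilted)
```

For an upright rectangle the bbox simply restates the polygon; for a tilted one the bbox grows to cover the rotated corners, which is why a click target is usually better taken from the center than from a bbox corner.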
from time import sleep

import vrcpilot

# launch() waits up to wait_timeout seconds for VRChat's PID.
# None means the timeout expired before VRChat appeared.
pid = vrcpilot.launch(no_vr=True, screen_width=1280, screen_height=720)
if pid is None:
    raise RuntimeError("VRChat did not start before launch() timed out")
sleep(45)  # extra warm-up wait: shaders / avatar loading / network sync

try:
    shot = vrcpilot.take_screenshot()
    if shot is None:
        raise RuntimeError("could not capture the VRChat screen")
    result = vrcpilot.ocr(shot)
    for word in result.words:
        print(word.text, word.bbox, word.confidence)
    if result.words:
        first = result.words[0]
        x, y, w, h = first.bbox
        vrcpilot.mouse.move(int(x + w / 2), int(y + h / 2))
        vrcpilot.mouse.click(vrcpilot.MouseButton.LEFT)
    vrcpilot.keyboard.press(vrcpilot.Key.W, duration=1.0)
    vrcpilot.clipboard.paste("こんにちは、VRChat!")
finally:
    vrcpilot.terminate()