Python API Reference

English | 日本語

This is a hand-curated reference for every symbol exposed at vrcpilot.<name>. For runnable examples see usage.md; for the equivalent CLI see cli.md. Function signatures match the source as of 0.2.0rc1.

Conventions

  • All vrcpilot.<name> symbols are listed in src/vrcpilot/__init__.py::__all__.
  • Module attributes vrcpilot.keyboard, vrcpilot.mouse, and vrcpilot.clipboard are also part of the public surface.
  • Most call sites that send synthetic input or interact with the VRChat window expect VRChat to be running and focused. That requirement is enforced by ensure_target(), and the high-level helpers call it for you. The relevant exceptions (VRChatNotRunningError, VRChatNotFocusedError) are re-raised so callers can recover.
  • Coordinate-bearing types (OCRWord, OCRResult, Detection, DetectResult) expose only window-local coordinates (polygon / bbox, origin at the VRChat window's top-left). mouse.move(x, y) consumes the same window-local frame, so OCR / detect bboxes feed in directly without translation — see coordinate system.
  • Code blocks use ... as the body of every signature so they paste back cleanly into a Python REPL or stub file.

Package metadata

vrcpilot.__version__

Resolved from installed distribution metadata via importlib.metadata, so it stays in sync with the package version in pyproject.toml.


Process control

vrcpilot.launch

def launch(
    *,
    app_id: int = 438100,
    steam_path: Path | None = None,
    no_vr: bool = False,
    screen_width: int | None = None,
    screen_height: int | None = None,
    osc: OscConfig | None = None,
    extra_args: list[str] | None = None,
    wait_timeout: float = 30.0,
    wait_interval: float = 1.0,
) -> int | None: ...

Start VRChat through Steam. The new process is detached from the calling process group. After spawning Steam, launch() polls find_pids() for up to wait_timeout seconds (default 30s) and returns the observed PID. Pass wait_timeout=0 (or any non-positive value) to skip the wait and return immediately. This is useful for "fire and forget" launches where you intend to poll later yourself.

Returns: the PID once VRChat is observed, or None if wait_timeout <= 0 or the timeout is exceeded. A None return on a positive timeout is not an exception — branch on the return value if you need a stricter signal.

app_id defaults to VRChat's Steam app id. If you need to reference the constant directly, for example when building a custom launch wrapper, import it from the implementation module: from vrcpilot.process import VRCHAT_STEAM_APP_ID.

Raises: SteamNotFoundError when no Steam executable is found.
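
Because a timed-out wait is reported as None rather than as an exception, callers who want a hard failure can wrap the call. The sketch below is one possible wrapper; launch_or_raise is a hypothetical helper, not part of vrcpilot, and it takes the launch callable as a parameter so it can be exercised without Steam installed.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Any


def launch_or_raise(
    launch: Callable[..., int | None],
    *,
    wait_timeout: float = 30.0,
    **kwargs: Any,
) -> int:
    """Call ``launch`` and turn a ``None`` result into ``TimeoutError``.

    Hypothetical helper: ``launch`` is expected to behave like the
    documented ``vrcpilot.launch`` (a PID, or ``None`` on timeout).
    """
    if wait_timeout <= 0:
        # With a non-positive timeout, None is the documented result,
        # so a strict wrapper has nothing meaningful to check.
        raise ValueError("wait_timeout must be positive for a strict launch")
    pid = launch(wait_timeout=wait_timeout, **kwargs)
    if pid is None:
        raise TimeoutError(f"VRChat not observed within {wait_timeout:.1f}s")
    return pid
```

With vrcpilot installed, usage would look like launch_or_raise(vrcpilot.launch, no_vr=True).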

vrcpilot.terminate

def terminate(*, timeout: float = 5.0) -> list[int]: ...

Force-kill every running VRChat process and wait up to timeout seconds for them to exit. Idempotent — returns an empty list when nothing is running.

Returns: the PIDs that were signalled.

vrcpilot.find_pids

def find_pids() -> list[int]: ...

Returns: PIDs of every running VRChat process, sorted newest-first by psutil.Process.create_time(). The list is empty when no VRChat is running.

vrcpilot.OscConfig

@dataclass(frozen=True)
class OscConfig:
    in_port: int = 9000
    out_ip: str = "127.0.0.1"
    out_port: int = 9001

    def to_launch_arg(self) -> str: ...

Structured form of VRChat's --osc=<in>:<ip>:<out> flag. to_launch_arg() renders the single CLI token used at launch.
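
As an illustration of the token shape, the following stand-in mirrors the documented fields and renders them in the --osc=<in>:<ip>:<out> layout. OscConfigSketch is a local, hypothetical class; the exact string produced by the real to_launch_arg() is assumed from the flag format above rather than taken from the source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OscConfigSketch:
    """Local stand-in mirroring the documented OscConfig fields."""

    in_port: int = 9000
    out_ip: str = "127.0.0.1"
    out_port: int = 9001

    def to_launch_arg(self) -> str:
        # Assumed rendering of --osc=<in>:<ip>:<out>.
        return f"--osc={self.in_port}:{self.out_ip}:{self.out_port}"


print(OscConfigSketch().to_launch_arg())
print(OscConfigSketch(in_port=9100, out_port=9101).to_launch_arg())
```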

vrcpilot.SteamNotFoundError

Raised by launch() (and the Steam discovery helpers) when no Steam executable can be located.


Window control

vrcpilot.focus

def focus() -> bool: ...

Bring the VRChat window to the foreground (and de-minimize it if needed).

Returns: True on success, False when VRChat is not running, the window is not mapped, the platform call fails, or the session is Wayland-native.

Raises: NotImplementedError on platforms other than Windows / Linux.

vrcpilot.unfocus

def unfocus() -> bool: ...

Send the VRChat window to the bottom of the z-order without raising any other window. Same return / raise contract as focus().

vrcpilot.is_foreground

def is_foreground() -> bool: ...

Returns: True iff the VRChat window is currently in the foreground.


Screen capture

vrcpilot.capture.Capture

class Capture:
    def __init__(self, *, frame_timeout: float = 2.0) -> None: ...
    def read(self) -> np.ndarray: ...
    def close(self) -> None: ...

Streaming capture session for the VRChat window. Captures without focus. The internal buffer keeps only the most recent frame, so read() always returns "now".

  • read() returns (H, W, 3) uint8 RGB.
  • close() is idempotent and never raises.
  • Supports with (context manager).
  • frame_timeout is the per-frame wait in seconds; must be > 0.

Raises:

  • NotImplementedError on platforms other than Windows / Linux.
  • RuntimeError when the backend cannot start (VRChat not running, window not mapped, X11 unavailable, WGC session failure, Wayland-native).
  • ValueError when frame_timeout <= 0.

vrcpilot.CaptureLoop

class CaptureLoop:
    def __init__(
        self,
        callback: Callable[[np.ndarray], None],
        *,
        fps: float,
        frame_timeout: float = 2.0,
    ) -> None: ...

    @property
    def is_running(self) -> bool: ...

    def start(self) -> None: ...
    def stop(self) -> None: ...
    def close(self) -> None: ...

Drives a Capture on a background thread at a fixed fps. Each frame is delivered to callback as (H, W, 3) uint8 RGB. Supports with.

Raises: ValueError when fps or frame_timeout is non-positive; RuntimeError when the inner Capture cannot start; NotImplementedError on unsupported platforms.

The CLI vrcpilot record command composes CaptureLoop (for video) and SpeakerLoop (for audio) with internal PyAV-backed muxers (vrcpilot.cli.record.muxer, not part of the public surface). Build your own writer by passing a callback that consumes (H, W, 3) uint8 RGB frames; for example, wrap PyAV or an ffmpeg subprocess to mux into whatever container you need.


Speaker (audio capture)

Process-isolated audio capture for VRChat. On Linux the backend is a native PipeWire pipeline (virtual null-sink + pw-link + pw-record); on Windows it is proc-tap, a native extension that taps a single PID's audio rather than the whole system mix. Either way the stream contains only VRChat's output — Discord, OBS, and other applications are not mixed in. import vrcpilot raises ImportError on any other sys.platform, so no other Speaker backend is reachable.

The backend produces float32 (N, CHANNELS) chunks at 48 kHz stereo. The CLI vrcpilot record command muxes these via internal PyAV-backed writers (vrcpilot.cli.record.muxer, not part of the public surface). To persist audio from your own code, feed the chunks into a writer of your choice, for example PyAV (WAV, MP4, MKV, ...) or an ffmpeg subprocess. The (N, 2) float32 layout maps cleanly onto PyAV's planar/packed float frames.

vrcpilot.speaker.Speaker

class Speaker:
    def __init__(self, *, read_timeout: float = 2.0) -> None: ...
    def read(self) -> NDArray[np.float32]: ...
    def close(self) -> None: ...

Context-managed capture session. VRChat must already be running when the constructor is called; otherwise RuntimeError is raised. Each read() returns every sample buffered since the previous call as a (N, 2) float32 ndarray. N == 0 is a valid "no new audio" signal (returned when read_timeout expires on a quiet stream). close() is idempotent and never raises. Supports with.

Raises:

  • RuntimeError when VRChat is not running or the Speaker backend cannot start (the PipeWire pipeline on Linux, proc-tap on Windows).
  • ValueError when read_timeout <= 0.

vrcpilot.speaker.SpeakerLoop

class SpeakerLoop:
    def __init__(
        self,
        callback: AudioCallback,
        *,
        chunk_seconds: float = 0.05,
        read_timeout: float = 2.0,
    ) -> None: ...

    @property
    def is_running(self) -> bool: ...

    def start(self) -> None: ...
    def stop(self) -> None: ...
    def close(self) -> None: ...

Background-thread driver around Speaker. Constructs and owns its own Speaker, so VRChat must already be running when the loop is instantiated. Each tick drains one chunk and forwards it to callback; the worker sleeps chunk_seconds between drains (default 50 ms, chosen to match the backend buffer cadence). Empty chunks are forwarded verbatim so consumers can treat them as a "silence tick". Exceptions raised by the callback or by Speaker.read() are captured and re-raised on the next stop() / close() so worker-thread failures are never lost. Supports with.

Raises: ValueError when chunk_seconds or read_timeout is non-positive; RuntimeError from the inner Speaker.

vrcpilot.speaker.AudioCallback

type AudioCallback = Callable[[NDArray[np.float32]], None]

The chunk-callback signature accepted by SpeakerLoop. Each callback invocation receives one (N, 2) float32 chunk; an N == 0 chunk is a silence tick.

End-to-end snippet

SpeakerLoop accepts any callable that consumes (N, 2) float32 chunks. The example below collects everything into a single ndarray; in real code you would instead feed each chunk to a streaming writer (PyAV, an ffmpeg subprocess, a network socket, etc.).

import time

import numpy as np
from numpy.typing import NDArray

from vrcpilot.speaker import SpeakerLoop

chunks: list[NDArray[np.float32]] = []

# VRChat must already be running; SpeakerLoop raises RuntimeError otherwise.
with SpeakerLoop(chunks.append, chunk_seconds=0.05) as loop:
    loop.start()
    time.sleep(5.0)

audio = np.concatenate(chunks, axis=0) if chunks else np.empty((0, 2), np.float32)

To persist the recording, write audio (or each incoming chunk) with PyAV, the standard wave module, or the vrcpilot record CLI command — see cli.md record.
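
For the standard wave module route, a streaming callback might look like the sketch below. WavSink is a hypothetical name, the 16-bit PCM target is a choice made for this example, and the conversion is written in pure Python so the block has no third-party dependency; with NumPy available, a vectorized clip-and-cast would be much faster.

```python
import struct
import wave
from typing import Any


class WavSink:
    """Append (N, 2) float chunks to a 16-bit PCM WAV file (hypothetical helper)."""

    def __init__(self, path: str, *, sample_rate: int = 48000, channels: int = 2) -> None:
        self._wav = wave.open(path, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        self._wav.setframerate(sample_rate)

    def __call__(self, chunk: Any) -> None:
        # Accept an ndarray (via .tolist()) or a plain list of [left, right] rows.
        rows = chunk.tolist() if hasattr(chunk, "tolist") else chunk
        flat = [sample for row in rows for sample in row]
        if not flat:
            return  # N == 0 chunk: nothing to write
        # Clamp to [-1.0, 1.0] and scale to the signed 16-bit range.
        ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in flat]
        self._wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))

    def close(self) -> None:
        self._wav.close()
```

An instance can be passed wherever a chunk callback is accepted, for example SpeakerLoop(sink) with sink = WavSink("vrchat.wav"), followed by sink.close() once the loop has stopped.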


Speaker routing (audio output relay)

Forward the audio of a single VRChat PID to a chosen OS output device by pairing the existing PID-scoped SpeakerLoop capture with a soundcard output player. Cross-platform — the same module is used on Windows (proc-tap capture) and Linux (PipeWire-native capture). The PID-scoped capture above feeds two downstream consumers: vrcpilot record persists it to a file, while this module relays it live to another device. Because the relay happens in user space rather than through OS audio policy (IAudioPolicyConfig / EarTrumpet), per-PID isolation holds even when several VRChat.exe instances are running. See virtual-audio.md for the user-facing playbook (virtual-cable setup, latency tuning, mic feedback avoidance) and cli.md for the matching vrcpilot speaker CLI.

vrcpilot.speaker.routing.AudioDevice

@dataclass(frozen=True, slots=True)
class AudioDevice:
    id: str
    name: str
    is_default: bool

Immutable handle to an OS output speaker. id is the opaque backend identifier from soundcard (Windows endpoint GUID, PipeWire node identifier) — treat it as a black box and only pass it back into find_device / Router. name is the user-visible friendly name (Windows FriendlyName, PipeWire node.description). is_default is True iff this is the OS default output; at most one device per list_devices() result has the flag set. frozen=True so a resolved device is safe to share between threads.

vrcpilot.speaker.routing.list_devices

def list_devices() -> list[AudioDevice]: ...

Enumerate every visible output device. The OS default (if any) is first; the remaining devices follow in name-ascending order (Python's default codepoint comparison, case-sensitive). Returns [] cleanly when no output device exists — not an error.

Raises: ImportError when soundcard is not installed; OSError when soundcard cannot load libpulse / WASAPI.

vrcpilot.speaker.routing.default_device

def default_device() -> AudioDevice: ...

Returns: the OS default output device.

Raises: DeviceNotFoundError when the host has no output device at all; ImportError / OSError as for list_devices().

vrcpilot.speaker.routing.find_device

def find_device(query: str) -> AudioDevice: ...

Resolve query to a single output device through three strict stages, each of which stops on a unique hit and raises immediately (without falling through) on two-or-more hits:

  1. query == device.id (exact).
  2. query == device.name (exact, case-sensitive).
  3. query.lower() in device.name.lower() (substring, case-insensitive).

This means an ambiguous substring query fails loudly rather than silently picking one device. Use the full id or the exact name to disambiguate.

Raises: AudioRoutingError when any stage matched two or more devices; DeviceNotFoundError when all three stages matched zero; ImportError / OSError as for list_devices().
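
The three-stage resolution can be sketched as follows. Every name here (DeviceSketch, find_device_sketch, and the two exception classes) is a local stand-in written for illustration; only the stage order and the unique-hit / ambiguous-hit behaviour are taken from the description above.

```python
from dataclasses import dataclass


class AudioRoutingErrorSketch(RuntimeError):
    """Stand-in for AudioRoutingError (ambiguous match)."""


class DeviceNotFoundErrorSketch(AudioRoutingErrorSketch):
    """Stand-in for DeviceNotFoundError (no match in any stage)."""


@dataclass(frozen=True)
class DeviceSketch:
    id: str
    name: str


def find_device_sketch(query: str, devices: list[DeviceSketch]) -> DeviceSketch:
    stages = (
        lambda d: query == d.id,                    # 1. exact id
        lambda d: query == d.name,                  # 2. exact name
        lambda d: query.lower() in d.name.lower(),  # 3. substring, case-insensitive
    )
    for matches in stages:
        hits = [d for d in devices if matches(d)]
        if len(hits) == 1:
            return hits[0]  # unique hit: stop here
        if len(hits) > 1:
            # Ambiguity is an error; later stages are not consulted.
            names = ", ".join(d.name for d in hits)
            raise AudioRoutingErrorSketch(f"{query!r} is ambiguous: {names}")
    raise DeviceNotFoundErrorSketch(f"no output device matches {query!r}")
```

Because the not-found class derives from the ambiguity class, a single except clause on the base covers both outcomes, matching the exception hierarchy this module documents.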

vrcpilot.speaker.routing.Router

class Router:
    def __init__(
        self,
        pid: int,
        device: str | AudioDevice | None = None,
        *,
        chunk_seconds: float = 0.02,
        blocksize: int | None = None,
    ) -> None: ...

    @property
    def device(self) -> AudioDevice: ...
    @property
    def is_running(self) -> bool: ...

    def start(self) -> None: ...
    def stop(self) -> None: ...
    def close(self) -> None: ...
    def __enter__(self) -> Self: ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> None: ...

Relay one VRChat PID's audio to a chosen output device. Construction resolves device (None → default_device(), str → find_device(...), AudioDevice → used directly) but does not open any audio stream; start() is what acquires resources. Use the explicit start() / stop() pair, the close() alias, or the with block; all three are interchangeable.

pid is the target VRChat PID and is forwarded to the inner SpeakerLoop; None is intentionally not accepted because the whole point of this module is per-PID separation under multi-instance VRChat. chunk_seconds and blocksize are the capture-side and output-side buffer knobs respectively (see virtual-audio.md for tuning guidance); chunk_seconds=0.02 keeps latency low at the cost of some underrun risk, and blocksize=None lets soundcard pick its backend default.

Lifecycle: start() opens the output player first, then spawns the inner SpeakerLoop; if the capture side fails, the already-opened player is rolled back before the original exception propagates, so a partial start never leaks a resource. stop() tears both down in the reverse order with the player release in a finally block, so a worker-thread exception (e.g. VRChat died mid-relay) still releases the player before being re-raised. Double-start() / double-stop() are intentional no-ops. After a successful stop() the instance is re-startable: a fresh SpeakerLoop and player pair are created on the next start(), which is what makes re-entering the with block well-defined.
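
The lifecycle described above follows a common two-resource pattern, sketched below with generic stand-ins. RelaySketch, Closable, and the two factory parameters are hypothetical; the real Router wires in a soundcard player and a SpeakerLoop, which this example deliberately does not import.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Protocol


class Closable(Protocol):
    def close(self) -> None: ...


class RelaySketch:
    """Two-resource lifecycle: player first, capture second."""

    def __init__(
        self,
        open_player: Callable[[], Closable],
        start_capture: Callable[[], Closable],
    ) -> None:
        self._open_player = open_player
        self._start_capture = start_capture
        self._player: Closable | None = None
        self._capture: Closable | None = None

    @property
    def is_running(self) -> bool:
        return self._capture is not None

    def start(self) -> None:
        if self.is_running:
            return  # double-start is a no-op
        player = self._open_player()
        try:
            capture = self._start_capture()
        except BaseException:
            player.close()  # roll back the half-started state
            raise
        self._player, self._capture = player, capture

    def stop(self) -> None:
        if not self.is_running:
            return  # double-stop is a no-op
        capture, player = self._capture, self._player
        self._capture = self._player = None
        try:
            capture.close()  # may re-raise an error captured on the worker
        finally:
            player.close()  # always released, even if capture.close() raised
```

Clearing both attributes before closing is what leaves the object restartable: the next start() builds a fresh pair through the factories.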

VRChat process death does not interrupt start() itself — the worker-thread exception is captured by SpeakerLoop and surfaces on the next stop() / close() / __exit__, mirroring SpeakerLoop's contract.

Thread safety: start() / stop() are expected to be called from the main thread. Audio frames arrive on the SpeakerLoop worker thread; a callback racing with stop() sees a None player snapshot and skips its play() call without locking.

Raises (from start()):

  • RuntimeError when SpeakerLoop / Speaker fails to start (e.g. VRChat process not running).
  • ValueError when chunk_seconds <= 0 (surfaced by SpeakerLoop.__init__).
  • OSError when soundcard fails to open the player (device disappeared between resolution and start(), libpulse / WASAPI runtime error, etc.).
  • NotImplementedError when the host is neither Windows nor Linux (surfaced by the Speaker backend dispatch).

vrcpilot.speaker.routing.route

def route(
    pid: int,
    device: str | AudioDevice | None = None,
    *,
    chunk_seconds: float = 0.02,
    blocksize: int | None = None,
) -> Router: ...

Construct a Router and call start() in one step. The returned Router is already running; the caller owns the lifecycle from there (call stop() / close(), or wrap the result in with). If start() raises, the exception propagates and no Router is returned — start() itself rolled back the partial player, so there is nothing for the caller to clean up. Arguments mean exactly what they do on Router.__init__.

vrcpilot.speaker.routing.AudioRoutingError

RuntimeError subclass and the package-wide base for routing failures — catch this for a single except clause covering both ambiguous-resolution and device-not-found cases. Direct (non-subclass) instances are raised when find_device matches two or more devices in the same stage.

vrcpilot.speaker.routing.DeviceNotFoundError

Subclass of AudioRoutingError. Raised by find_device when all three resolution stages return zero hits, and by default_device when the host has no output device at all.

End-to-end snippet

import time

from vrcpilot.speaker.routing import Router, find_device, list_devices

# Pick a target speaker. find_device("CABLE") would also work if the
# substring is unambiguous on this host.
device = next(d for d in list_devices() if "CABLE" in d.name)

# VRChat must already be running; Router.start raises RuntimeError otherwise.
with Router(pid=12345, device=device) as router:
    assert router.is_running
    time.sleep(5.0)
# Leaving the `with` block stops the relay and releases the output player.

For the matching CLI (vrcpilot speaker list / vrcpilot speaker route --pid N), see cli.md.


Mic (audio playback)

Stream float32 PCM into a virtual-cable output device so it appears to VRChat as live microphone input. The primary use case is piping an LLM agent's TTS chunks directly into VRChat without ever touching a real microphone or an intermediate audio file. The session opens a soundcard player in __init__ and keeps it alive until the instance is closed; play(chunk) writes a single chunk per call so callers drive the cadence themselves (for chunk in tts.stream(): mic.play(chunk)). On Windows the default device is VB-Audio Virtual Cable's "CABLE Input"; on Linux the default is the "VRCPilotMic" PipeWire sink created by vrcpilot.mic.linux.register_virtual_mic (or by running vrcpilot linux-mic register) — when running multiple AI agent instances in parallel, register a dedicated VRCPilotMic_<suffix> for each via vrcpilot.mic.linux.register_virtual_mic(suffix=...) so each instance's TTS gets its own isolated input path into VRChat.

vrcpilot.Mic

class Mic:
    def __init__(
        self,
        device: str | None = None,
        *,
        sample_rate: int = 48000,
        channels: int = 1,
    ) -> None: ...

    @property
    def device_name(self) -> str: ...
    @property
    def device_id(self) -> str: ...
    @property
    def sample_rate(self) -> int: ...
    @property
    def channels(self) -> int: ...

    def play(self, chunk: NDArray[np.float32]) -> None: ...
    def close(self) -> None: ...
    def __enter__(self) -> Self: ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> None: ...

device is matched as a case-insensitive substring against the names soundcard reports (matching covers both Speaker.name and Speaker.id, with fuzzy id matching). None defers to $VRCPILOT_MIC_DEVICE, then to the OS default returned by default_device_name(). The constructor resolves the device, opens a soundcard player for (sample_rate, channels), and enters it — those values are baked in for the lifetime of the session, so reconfiguring means constructing a new Mic.

device_id exposes the underlying soundcard Speaker.id as a string. On Linux this is the PulseAudio sink name (e.g. "VRCPilotMic"); on Windows it is the WASAPI device id string surfaced by soundcard.

play(chunk) writes one float32 array per call. The chunk shape must match the configured channel count ((N,) for mono, (N, channels) for multi-channel) or ValueError is raised. The call blocks if the backend's internal buffer is full, giving the caller natural back-pressure for live TTS streams.

The stream is released by close(), by leaving the with block, or as a best-effort fallback in __del__. Prefer the context manager — __del__ runs at GC time and cannot be relied on for prompt resource release on every interpreter.

Raises:

  • MicDeviceNotFoundError when no output device matches the resolved name, or no default is configured for this platform.
  • ImportError when soundcard is not installed (the lazy import happens during construction).
  • RuntimeError from the soundcard backend (libpulse on Linux, WASAPI on Windows) when it cannot open the player, or from play() after the Mic has been closed.
  • OSError when the native backend shared library cannot be loaded (e.g. libpulse0 is missing on Linux).
  • ValueError when sample_rate / channels is not strictly positive, or when play() receives a non-float32 chunk, a chunk with ndim outside {1, 2}, or a chunk whose channel count disagrees with the constructor.
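
The chunk checks listed for play() can be approximated by the validator below. check_chunk is a hypothetical helper, duck-typed on dtype / ndim / shape so that it works on a NumPy array without importing NumPy; treating a 1-D chunk as mono and a 2-D chunk as (N, channels) is an assumption based on the shapes documented above.

```python
from typing import Any


def check_chunk(chunk: Any, channels: int) -> None:
    """Raise ValueError unless ``chunk`` fits a session with ``channels`` channels."""
    if str(chunk.dtype) != "float32":
        raise ValueError(f"expected float32, got {chunk.dtype}")
    if chunk.ndim not in (1, 2):
        raise ValueError(f"expected a 1-D or 2-D chunk, got ndim={chunk.ndim}")
    # Assumption: a 1-D chunk counts as mono, a 2-D chunk as (N, channels).
    found = 1 if chunk.ndim == 1 else chunk.shape[1]
    if found != channels:
        raise ValueError(f"expected {channels} channel(s), got {found}")
```

Running such a check before handing audio to a live session lets a TTS pipeline reject a malformed buffer early instead of failing mid-stream.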

vrcpilot.MicDeviceNotFoundError

RuntimeError subclass raised when soundcard cannot locate an output device matching the resolved name. The message lists every output device soundcard currently sees and includes an OS-specific setup hint (vrcpilot linux-mic register on Linux, VB-Cable install link on Windows), which makes mis-named installations easy to diagnose.

vrcpilot.mic.default_device_name

def default_device_name() -> str | None: ...

The OS-specific default output-device substring. Returns "CABLE Input" on Windows and "VRCPilotMic" on Linux (after vrcpilot linux-mic register). Returns None on other platforms. When you register additional sinks via vrcpilot linux-mic register --suffix <name> (or vrcpilot.mic.linux.register_virtual_mic(suffix="<name>")), target them explicitly by sink name (e.g. Mic("VRCPilotMic_<name>")) — default_device_name() only ever returns the empty-suffix "VRCPilotMic".

VRCPILOT_MIC_DEVICE

Environment variable consulted between the constructor argument and default_device_name(). Useful for keeping device names out of source code, or for overriding the Windows default without code changes.
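
The precedence (constructor argument, then the environment variable, then the platform default) can be sketched as a small pure function. resolve_mic_device is a hypothetical helper; treating an empty environment value as unset is an assumption of this example, not documented behaviour.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_mic_device(
    device: str | None,
    default: str | None,
    env: Mapping[str, str] = os.environ,
) -> str | None:
    """Pick the device query: explicit argument, then env var, then default."""
    if device is not None:
        return device
    from_env = env.get("VRCPILOT_MIC_DEVICE")
    if from_env:  # assumption: an empty value is treated as unset
        return from_env
    return default
```

Passing env explicitly keeps the function easy to test; in normal use the default os.environ mapping is read at call time.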

End-to-end snippets

Play a single preloaded buffer:

import numpy as np
import vrcpilot

samples = np.zeros(48000, dtype=np.float32)  # 1 second of silence
with vrcpilot.Mic(sample_rate=48000, channels=1) as mic:
    mic.play(samples)

Stream chunks from a generator (the shape an LLM agent's incremental TTS typically produces):

from collections.abc import Iterator

import numpy as np
from numpy.typing import NDArray

import vrcpilot

def tts_chunks() -> Iterator[NDArray[np.float32]]:
    # Replace with the agent's actual chunk iterator.
    for _ in range(10):
        yield np.zeros(4800, dtype=np.float32)  # 100 ms of silence per chunk

with vrcpilot.Mic(sample_rate=48000, channels=1) as mic:
    for chunk in tts_chunks():
        mic.play(chunk)

vrcpilot.mic.linux

Linux-only helpers that manage the persistent VRCPilotMic virtual mic in PipeWire. This is the programmatic counterpart of the vrcpilot linux-mic CLI; both write the same config fragment and call the same PulseAudio module_load path.

Every public function takes a suffix keyword so the same machinery can manage multiple sinks side-by-side — the primary use case being running multiple AI agent instances in parallel, each with its own dedicated virtual mic. An empty suffix ("", the default) targets the legacy VRCPilotMic and preserves backward compatibility; a non-empty suffix (e.g. "alt") targets the VRCPilotMic_<suffix> derived sink. suffix may contain only [A-Za-z0-9_-]; anything else raises ValueError.

Importing this submodule on non-Linux platforms raises ImportError at import time (raise ImportError("vrcpilot.mic.linux is Linux-only")), so guard accesses with sys.platform == "linux" (or import lazily) when writing cross-platform code.
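
One way to write that guard is a lazy loader that returns None when the helpers are unavailable. load_linux_mic is a hypothetical function name; the sketch relies only on the documented ImportError behaviour and on is_registered() from this module.

```python
from __future__ import annotations

import importlib
import sys
from types import ModuleType


def load_linux_mic() -> ModuleType | None:
    """Return vrcpilot.mic.linux on Linux, or None when it is unavailable."""
    if sys.platform != "linux":
        return None
    try:
        return importlib.import_module("vrcpilot.mic.linux")
    except ImportError:
        # vrcpilot is not installed, or the submodule refused the import.
        return None


linux_mic = load_linux_mic()
if linux_mic is not None:
    print("registered:", linux_mic.is_registered())
else:
    print("virtual mic helpers unavailable on this host")
```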

vrcpilot.mic.linux.register_virtual_mic

def register_virtual_mic(
    *,
    suffix: str = "",
    runtime_load: bool = True,
) -> RegisterResult: ...

Persist the VRCPilotMic (or VRCPilotMic_<suffix>) module-null-sink to $XDG_CONFIG_HOME/pipewire/pipewire.conf.d/vrcpilot-mic[-<suffix>].conf (falling back to ~/.config/... when the variable is unset) and, when runtime_load=True, additionally call pulsectl.Pulse.module_load("module-null-sink", ...) so the sink is usable immediately. The runtime step is best-effort — failures (missing pulsectl, control-plane error) are surfaced via RegisterResult.runtime_warning rather than raised, because the persistent config is the source of truth and will be picked up after the next PipeWire restart / re-login.

suffix selects which sink to act on. An empty suffix (the default) targets the existing VRCPilotMic; a non-empty suffix targets VRCPilotMic_<suffix>. Re-calling with the same suffix is idempotent — any pre-existing runtime module is unloaded before the fresh load, so re-registration never double-stacks.

Returns: a RegisterResult describing what was done.

Raises: OSError when the persistent config cannot be written (permission errors, filesystem failures); ValueError when suffix contains illegal characters.

vrcpilot.mic.linux.unregister_virtual_mic

def unregister_virtual_mic(*, suffix: str = "") -> bool: ...

Remove the persistent config fragment for suffix and unload any currently loaded module-null-sink with the matching name. An empty suffix (the default) targets VRCPilotMic; a non-empty suffix targets VRCPilotMic_<suffix>. Returns True if anything was actually removed (config file deleted, runtime module unloaded, or both); False when neither artefact existed. Idempotent — safe to call repeatedly.

Raises: ValueError when suffix contains illegal characters.

vrcpilot.mic.linux.is_registered

def is_registered(*, suffix: str = "") -> bool: ...

Return whether the persistent config fragment for suffix exists. An empty suffix checks the default VRCPilotMic; a non-empty suffix checks VRCPilotMic_<suffix>. Does not consult PulseAudio — use the vrcpilot linux-mic status CLI or call open_pulse_control() directly to inspect the runtime module list.

Raises: ValueError when suffix contains illegal characters.

vrcpilot.mic.linux.config_path

def config_path(*, suffix: str = "") -> Path: ...

Absolute path of the PipeWire config fragment for suffix (empty suffix → vrcpilot-mic.conf, non-empty → vrcpilot-mic-<suffix>.conf). Honours $XDG_CONFIG_HOME with ~/.config as the XDG fallback. Public so vrcpilot linux-mic status can surface the location.

Raises: ValueError when suffix contains illegal characters.
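
The naming scheme for sinks and config fragments can be reproduced in a few lines. The functions below (sink_name, config_path_sketch) are hypothetical re-implementations based on the paths and the suffix rule stated in this section; treating an empty XDG_CONFIG_HOME as unset is an assumption.

```python
import os
import re
from collections.abc import Mapping
from pathlib import Path

_SUFFIX_RE = re.compile(r"[A-Za-z0-9_-]*")


def _check_suffix(suffix: str) -> None:
    # Only [A-Za-z0-9_-] is allowed; the empty suffix means the default sink.
    if not _SUFFIX_RE.fullmatch(suffix):
        raise ValueError(f"illegal suffix: {suffix!r}")


def sink_name(suffix: str = "") -> str:
    _check_suffix(suffix)
    return f"VRCPilotMic_{suffix}" if suffix else "VRCPilotMic"


def config_path_sketch(suffix: str = "", env: Mapping[str, str] = os.environ) -> Path:
    _check_suffix(suffix)
    # Assumption: an empty XDG_CONFIG_HOME falls back to ~/.config as well.
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    stem = f"vrcpilot-mic-{suffix}" if suffix else "vrcpilot-mic"
    return Path(base) / "pipewire" / "pipewire.conf.d" / f"{stem}.conf"
```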

vrcpilot.mic.linux.iter_registered_suffixes

def iter_registered_suffixes() -> list[str]: ...

Scan pipewire.conf.d/ for vrcpilot-mic*.conf fragments and recover the suffix from each filename. The empty suffix (the default sink) sorts first; the remainder is in lexicographic order so callers — CLI listings, e2e enumerations — get a stable ordering. Returns [] if the config directory does not exist yet.
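
Recovering suffixes from filenames might look like the sketch below. suffixes_from_filenames is a hypothetical pure function that takes the directory listing as input, so it can be tried without touching a real PipeWire config directory; the parsing rules are inferred from the vrcpilot-mic[-<suffix>].conf naming described in this section.

```python
def suffixes_from_filenames(filenames: list[str]) -> list[str]:
    """Extract registered suffixes from config-fragment filenames."""
    prefix, ext = "vrcpilot-mic", ".conf"
    found: set[str] = set()
    for name in filenames:
        if not (name.startswith(prefix) and name.endswith(ext)):
            continue
        middle = name[len(prefix):-len(ext)]
        if middle == "":
            found.add("")  # vrcpilot-mic.conf is the default sink
        elif middle.startswith("-") and len(middle) > 1:
            found.add(middle[1:])  # vrcpilot-mic-<suffix>.conf
    # The empty string sorts before any non-empty string, so the
    # default sink comes first and the rest is lexicographic.
    return sorted(found)
```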

vrcpilot.mic.linux.RegisterResult

@dataclass(frozen=True)
class RegisterResult:
    config_path: Path
    created_config: bool
    runtime_loaded: bool
    runtime_warning: str | None
    suffix: str

Outcome of register_virtual_mic:

  • config_path: absolute path to the persistent config fragment.
  • created_config: True iff the call wrote the file (False when it already existed with the expected contents).
  • runtime_loaded: True iff the immediate pulsectl module_load succeeded. False when skipped via runtime_load=False or when the runtime step failed (in which case runtime_warning is populated).
  • runtime_warning: human-readable description of the runtime-load failure, or None when no failure occurred.
  • suffix: the normalised suffix that was registered (empty string for the default sink). The value passed to register_virtual_mic(suffix=...) round-trips here unchanged.

Screenshot

vrcpilot.Screenshot

@dataclass(frozen=True, eq=False)
class Screenshot:
    image: NDArray[np.uint8]   # (H, W, 3) uint8 RGB
    x: int                     # window top-left, desktop-absolute
    y: int
    width: int
    height: int
    monitor_index: int         # mss.MSS().monitors index
    captured_at: datetime      # UTC

    def save(self, png_path: Path | None = None) -> str: ...
    @classmethod
    def load(cls, text: str) -> Screenshot: ...

Pixel data plus the window's on-screen geometry (x / y are the window's top-left in desktop-absolute pixels; monitor_index records the mss monitor the capture came from). OCR / detect results are window-local, so this geometry is informational rather than required for clicking. eq=False because == on numpy arrays returns an element-wise array rather than a single bool, so a generated field-by-field __eq__ would not be reliable.

save() returns a YAML string. When png_path is provided the PNG is written there and the YAML stores path:; otherwise the YAML embeds the PNG as base64 under image:. load() restores either form.

vrcpilot.take_screenshot

def take_screenshot(*, settle_seconds: float = 0.05) -> Screenshot | None: ...

Focus VRChat, sleep settle_seconds, and grab a one-shot capture of the VRChat window only.

Returns: a Screenshot, or None on a recoverable failure (Wayland-native, focus refused, window unmapped, mss error).

Raises: NotImplementedError on unsupported platforms; ValueError when settle_seconds < 0.


OCR

vrcpilot.ocr.OCRWord

@dataclass(frozen=True)
class OCRWord:
    text: str
    polygon: Polygon          # (TL, TR, BR, BL), image-local
    confidence: float         # 0.0–1.0

    @property
    def bbox(self) -> tuple[int, int, int, int]: ...   # (x, y, w, h), axis-aligned
    @property
    def center(self) -> tuple[float, float]: ...

vrcpilot.ocr.OCRResult

@dataclass(frozen=True, eq=False)
class OCRResult:
    screenshot: Screenshot
    words: tuple[OCRWord, ...]

Bundles a Screenshot with the words detected on it. All OCRWord.polygon / OCRWord.bbox values are window-local (origin at the VRChat window's top-left), which is the same frame mouse.move() consumes — no translation step is required.
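
To make the relationship between polygon, bbox, and center concrete, here is a minimal sketch. The Polygon alias, bbox_of, and center_of are local stand-ins; the four-corner layout follows the (TL, TR, BR, BL) comment on OCRWord, while computing the center as the mean of the corners is an assumption about how the property is derived.

```python
Point = tuple[float, float]
Polygon = tuple[Point, Point, Point, Point]  # assumed layout: (TL, TR, BR, BL)


def bbox_of(polygon: Polygon) -> tuple[int, int, int, int]:
    """Axis-aligned (x, y, w, h) box enclosing the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    x0, y0 = int(min(xs)), int(min(ys))
    return x0, y0, int(max(xs)) - x0, int(max(ys)) - y0


def center_of(polygon: Polygon) -> tuple[float, float]:
    """Mean of the four corners (assumed definition of the center)."""
    return (
        sum(p[0] for p in polygon) / len(polygon),
        sum(p[1] for p in polygon) / len(polygon),
    )


word_polygon: Polygon = ((10, 20), (110, 20), (110, 60), (10, 60))
print(bbox_of(word_polygon), center_of(word_polygon))
```

Since these values are already window-local, a center computed this way could be rounded and passed to mouse.move(x, y) without any offset arithmetic.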

vrcpilot.ocr.OCREngine

class OCREngine(ABC):
    @abstractmethod
    def recognize(self, image: NDArray[np.uint8]) -> Sequence[OCRWord]: ...

Swap in your own backend by implementing this ABC.

vrcpilot.ocr.RapidOCREngine

class RapidOCREngine(OCREngine):
    def __init__(self, *, params: dict[str, Any] | None = None) -> None: ...

Default backend (PP-OCRv4 via rapidocr). It lazy-imports rapidocr in the constructor, so the rest of the package remains usable without the ocr extra installed.

Raises: ImportError when rapidocr is not installed.

vrcpilot.ocr

def ocr(
    screenshot: Screenshot,
    *,
    engine: OCREngine | None = None,
) -> OCRResult: ...

Run OCR on screenshot. When engine is None, a process-cached RapidOCREngine instance is used.

vrcpilot.ocr is callable directly (vrcpilot.ocr(shot)). The submodule vrcpilot.ocr is still accessible via from vrcpilot.ocr import OCREngine and similar import-from forms — Python's import machinery resolves these through sys.modules, so the function binding does not break submodule access.


Image-template detection

vrcpilot.detect.Detection

@dataclass(frozen=True)
class Detection:
    polygon: Polygon
    confidence: float
    scale: float        # 1.0 = same size as the query
    rotation: float     # radians, counter-clockwise positive

    @property
    def bbox(self) -> tuple[int, int, int, int]: ...
    @property
    def center(self) -> tuple[float, float]: ...

vrcpilot.detect.DetectResult

@dataclass(frozen=True, eq=False)
class DetectResult:
    screenshot: Screenshot
    query: NDArray[np.uint8]    # (h, w, 3) uint8 RGB
    detections: tuple[Detection, ...]

All Detection.polygon / Detection.bbox values are window-local, matching OCRResult and the frame mouse.move() accepts.

vrcpilot.detect.DetectEngine

class DetectEngine(ABC):
    @abstractmethod
    def detect(
        self,
        image: NDArray[np.uint8],
        query: NDArray[np.uint8],
    ) -> Sequence[Detection]: ...

vrcpilot.detect.TemplateDetectEngine

class TemplateDetectEngine(DetectEngine):
    def __init__(
        self,
        *,
        threshold: float = 0.85,
        scales: Sequence[float] = (
            0.25, 0.3, 0.35, 0.4, 0.5, 0.6, 0.75,
            0.9, 1.0, 1.25, 1.5, 1.8, 2.2, 2.6, 3.0,
        ),
        rotations_deg: Sequence[float] = (0.0,),
        nms_iou: float = 0.3,
        max_results: int = 32,
    ) -> None: ...

Multi-scale (and optionally multi-rotation) cv2.matchTemplate(..., TM_CCOEFF_NORMED) runner with non-maximum suppression.
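
For readers unfamiliar with the suppression step, here is a minimal sketch of greedy IoU-based non-maximum suppression in pure Python. It is a generic version of the technique, not TemplateDetectEngine's implementation; the `(x, y, w, h)` box layout and the helper names `iou` and `nms` are assumptions made for the example, while the `threshold`, `nms_iou`, and `max_results` names mirror the constructor parameters above.

```python
from __future__ import annotations

Box = tuple[int, int, int, int]  # (x, y, w, h); an assumed layout for this sketch


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def nms(
    candidates: list[tuple[Box, float]],
    *,
    threshold: float = 0.85,
    nms_iou: float = 0.3,
    max_results: int = 32,
) -> list[tuple[Box, float]]:
    """Keep the best-scoring boxes, dropping any that overlap a kept box."""
    kept: list[tuple[Box, float]] = []
    # Visit candidates from highest to lowest score.
    for box, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if score < threshold:
            break  # everything after this point scores lower still
        if all(iou(box, other) <= nms_iou for other, _ in kept):
            kept.append((box, score))
            if len(kept) == max_results:
                break
    return kept
```

In this sketch a lower `nms_iou` suppresses more aggressively: two boxes survive together only when their overlap is at most that fraction of their union.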

vrcpilot.detect

def detect(
    screenshot: Screenshot,
    query: NDArray[np.uint8],
    *,
    engine: DetectEngine | None = None,
) -> DetectResult: ...

Run engine.detect(screenshot.image, query). When engine is None, a process-cached TemplateDetectEngine is used.
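
One common way to get a process-cached default is a memoized factory. The sketch below shows that pattern under stated assumptions: `SlowEngine` and `default_engine` are hypothetical, the image and query are plain strings instead of arrays, and `functools.lru_cache` is only one of several ways to hold a process-wide instance, so it may not be what vrcpilot does internally.

```python
from __future__ import annotations

from functools import lru_cache


class SlowEngine:
    """Hypothetical engine whose construction is expensive."""

    instances = 0  # counts constructions so the caching is observable

    def __init__(self) -> None:
        SlowEngine.instances += 1

    def detect(self, image: str, query: str) -> list[str]:
        return [f"{query} in {image}"]


@lru_cache(maxsize=1)
def default_engine() -> SlowEngine:
    """Build the default engine once per process and reuse it afterwards."""
    return SlowEngine()


def detect(image: str, query: str, *, engine: SlowEngine | None = None) -> list[str]:
    # An explicit engine always wins; otherwise fall back to the cached default.
    active = engine if engine is not None else default_engine()
    return active.detect(image, query)
```

Passing an explicit engine is the natural escape hatch when two call sites need different thresholds or scale sets.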


Synthetic input

The keyboard and mouse modules expose thin singleton objects rather than classes. Call methods on them directly. All methods accept focus: bool = True; leave it True unless you deliberately want to bypass the VRChat focus guard. The signatures below are written as defs for paste-friendliness; in practice you call them as vrcpilot.keyboard.press(...) and so on.

vrcpilot.Key

StrEnum of every supported key name. Members:

  • Letters: A to Z
  • Digits: NUM_0 to NUM_9
  • Function keys: F1 to F12
  • Modifiers: SHIFT, SHIFT_LEFT, SHIFT_RIGHT, CTRL, CTRL_LEFT, CTRL_RIGHT, ALT, ALT_LEFT, ALT_RIGHT, WIN, WIN_LEFT, WIN_RIGHT
  • Navigation: UP, DOWN, LEFT, RIGHT, HOME, END, PAGE_UP, PAGE_DOWN
  • Editing: BACKSPACE, DELETE, INSERT, TAB, ENTER, ESCAPE, SPACE
  • Punctuation: MINUS, EQUALS, LBRACKET, RBRACKET, BACKSLASH, SEMICOLON, QUOTE, COMMA, PERIOD, SLASH, BACKTICK

vrcpilot.keyboard

def press(*keys: Key, duration: float = 0.1, focus: bool = True) -> None: ...
def down(*keys: Key, focus: bool = True) -> None: ...
def up(*keys: Key, focus: bool = True) -> None: ...

press is a chord-tap: keys are pressed left-to-right, held for duration seconds, then released right-to-left. Do not lower duration below 0.1 — VRChat / Unity drops shorter taps.
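
The ordering described above can be sketched with injected callbacks. This is an illustration of the press / hold / release sequence, not the library's code; `chord_tap`, `key_down`, `key_up`, and `sleep` are hypothetical names, and keys are plain strings instead of `Key` members.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence


def chord_tap(
    keys: Sequence[str],
    *,
    duration: float = 0.1,
    key_down: Callable[[str], None],
    key_up: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Press keys left-to-right, hold, then release right-to-left."""
    if not keys:
        raise TypeError("at least one key is required")
    for key in keys:
        key_down(key)
    sleep(duration)  # hold the whole chord for `duration` seconds
    for key in reversed(keys):
        key_up(key)


# Record the event order instead of sending real input.
events: list[tuple[str, str]] = []
chord_tap(
    ["CTRL", "SHIFT", "A"],
    duration=0.0,
    key_down=lambda k: events.append(("down", k)),
    key_up=lambda k: events.append(("up", k)),
    sleep=lambda _seconds: None,
)
```

Releasing in reverse order keeps modifiers held until the non-modifier key is up, which is how a person typically releases a chord.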

down and up are paired half-actions. They are intentionally useful only within a single Python process; the synthetic input device is released by the kernel when the process exits, so down/up cannot be paired across CLI invocations.
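
Because the two halves must be paired inside one process, a context manager is a convenient way to guarantee the release. The helper below is a sketch, not part of vrcpilot: `held` is a hypothetical name, and the `down` / `up` callables are injected so the example runs without VRChat.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def held(
    *keys: str,
    down: Callable[..., None],
    up: Callable[..., None],
) -> Iterator[None]:
    """Hold `keys` for the duration of the with-block, releasing on exit."""
    down(*keys)
    try:
        yield
    finally:
        # Runs even if the body raises, so keys are never left stuck down.
        up(*keys)


# Record calls instead of sending real input.
log: list[str] = []
with held(
    "W",
    "SHIFT",
    down=lambda *k: log.append("down:" + "+".join(k)),
    up=lambda *k: log.append("up:" + "+".join(k)),
):
    log.append("body")
```

With the real modules, the same helper could presumably be called as `held(vrcpilot.Key.W, down=vrcpilot.keyboard.down, up=vrcpilot.keyboard.up)`, since both functions accept `*keys`.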

Raises: TypeError when keys is empty; VRChatNotRunningError / VRChatNotFocusedError from the focus guard.

vrcpilot.MouseButton

StrEnum with members LEFT, RIGHT, MIDDLE.

vrcpilot.mouse

def move(x: int, y: int, *, relative: bool = False, focus: bool = True) -> None: ...
def click(*buttons: MouseButton, count: int = 1, duration: float = 0.0, focus: bool = True) -> None: ...
def scroll(amount: int, *, focus: bool = True) -> None: ...
def press(*buttons: MouseButton, focus: bool = True) -> None: ...
def release(*buttons: MouseButton, focus: bool = True) -> None: ...

move(x, y) interprets (x, y) as VRChat window-local pixels: (0, 0) is the top-left of the VRChat window. This is the same frame OCRWord.bbox / Detection.bbox use, so OCR / detect results feed in directly. Coordinates outside the window are not rejected; they are translated to the desktop and passed to the OS as-is. With relative=True, (x, y) is a delta added to the current cursor position (the window-local interpretation does not apply in that branch).
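
The two interpretations of (x, y) amount to simple arithmetic, sketched below. `to_desktop`, `resolve_move`, `window_origin`, and `cursor` are hypothetical names; how vrcpilot actually obtains the window origin and cursor position is not shown here.

```python
from __future__ import annotations


def to_desktop(x: int, y: int, *, window_origin: tuple[int, int]) -> tuple[int, int]:
    """Translate window-local pixels to desktop pixels."""
    ox, oy = window_origin
    return ox + x, oy + y


def resolve_move(
    x: int,
    y: int,
    *,
    relative: bool,
    window_origin: tuple[int, int],
    cursor: tuple[int, int],
) -> tuple[int, int]:
    """Return the desktop position a move would target in this sketch."""
    if relative:
        # Relative moves are deltas from the current cursor; the window
        # origin plays no part in this branch.
        cx, cy = cursor
        return cx + x, cy + y
    # Absolute moves are window-local; out-of-window values pass through.
    return to_desktop(x, y, window_origin=window_origin)
```

For a window whose top-left sits at (100, 50) on the desktop, an absolute move to (10, 20) targets desktop (110, 70), and a negative coordinate such as (-30, -10) lands left of and above the window at (70, 40).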

click() falls back to LEFT when called with no buttons. count > 1 repeats the press/release pair. duration > 0 holds each click for that many seconds.
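
As a rough model of those rules, the sketch below expands a click call into an event list. It is illustrative only: `click_events` is a hypothetical helper, buttons are plain strings, and the release order for multi-button clicks is an assumption, since the text above does not specify it.

```python
from __future__ import annotations


def click_events(
    *buttons: str,
    count: int = 1,
    duration: float = 0.0,
) -> list[tuple[str, str | float]]:
    """Expand a click into press / hold / release events."""
    active = buttons or ("LEFT",)  # no buttons given: fall back to LEFT
    events: list[tuple[str, str | float]] = []
    for _ in range(count):
        events += [("press", b) for b in active]
        if duration > 0:
            events.append(("hold", duration))
        # Assumption: release in reverse press order, mirroring keyboard.press.
        events += [("release", b) for b in reversed(active)]
    return events
```

A double-click is then just `click_events("LEFT", count=2)`, which yields two back-to-back press/release pairs.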

press / release are paired half-actions for chord clicks. As with keyboard.down / up, they are meaningful only within a single Python process.

vrcpilot.controls.ensure_target

def ensure_target() -> None: ...

Verify VRChat is running and currently focused, focusing it if necessary. Idempotent. The high-level keyboard / mouse / clipboard.paste calls invoke this for you when focus=True (the default).

Raises: NotImplementedError on native Wayland sessions; VRChatNotRunningError; VRChatNotFocusedError.

vrcpilot.VRChatNotRunningError, vrcpilot.VRChatNotFocusedError

Raised by ensure_target() and the input helpers.


Clipboard

vrcpilot.clipboard.paste

def paste(text: str, *, focus: bool = True) -> None: ...

Copy text to the OS clipboard, then send Ctrl+V to VRChat. Use this for non-ASCII content (Japanese, emoji, etc.) — scancode-based keyboard.press cannot type those directly.

Raises: pyperclip.PyperclipException when no clipboard backend is available (e.g. Linux without xclip or xsel installed); the focus-guard exceptions when focus=True.


Type aliases

vrcpilot.types.Polygon

type Polygon = tuple[
    tuple[float, float],  # TL
    tuple[float, float],  # TR
    tuple[float, float],  # BR
    tuple[float, float],  # BL
]

The 4-corner polygon shape used by OCRWord.polygon and Detection.polygon. Coordinates are image-local pixels.
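
To show how a 4-corner polygon relates to an axis-aligned box and a centre point, here is a small sketch. It assumes the `(x, y, width, height)` bbox layout suggested by the unpacking in the end-to-end snippet, and rounds outward to whole pixels; `bbox_of` and `center_of` are hypothetical helpers, not the library's property implementations. A plain alias is used instead of the `type` statement so the block also runs on Python versions before 3.12.

```python
from __future__ import annotations

import math

Point = tuple[float, float]
Polygon = tuple[Point, Point, Point, Point]  # TL, TR, BR, BL


def bbox_of(polygon: Polygon) -> tuple[int, int, int, int]:
    """Axis-aligned (x, y, width, height) box enclosing the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    left, top = math.floor(min(xs)), math.floor(min(ys))
    right, bottom = math.ceil(max(xs)), math.ceil(max(ys))
    return left, top, right - left, bottom - top


def center_of(polygon: Polygon) -> Point:
    """Mean of the four corners."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return sum(xs) / 4, sum(ys) / 4
```

For an unrotated rectangle the centre coincides with the middle of the bbox; for a rotated polygon the bbox grows to enclose all four corners while the centre stays put.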


End-to-end snippet

from time import sleep

import vrcpilot

# launch() waits up to wait_timeout seconds for VRChat's PID.
# None means the timeout expired before VRChat appeared.
pid = vrcpilot.launch(no_vr=True, screen_width=1280, screen_height=720)
if pid is None:
    raise RuntimeError("VRChat did not start before launch() timed out")
sleep(45)  # extra warm-up wait: shaders / avatar loading / network sync

try:
    shot = vrcpilot.take_screenshot()
    if shot is None:
        raise RuntimeError("could not capture the VRChat screen")

    result = vrcpilot.ocr(shot)
    for word in result.words:
        print(word.text, word.bbox, word.confidence)

    if result.words:
        first = result.words[0]
        x, y, w, h = first.bbox
        vrcpilot.mouse.move(int(x + w / 2), int(y + h / 2))
        vrcpilot.mouse.click(vrcpilot.MouseButton.LEFT)

    vrcpilot.keyboard.press(vrcpilot.Key.W, duration=1.0)
    vrcpilot.clipboard.paste("こんにちは、VRChat!")
finally:
    vrcpilot.terminate()