Open-vocabulary object detection on iOS using YOLO-World + CLIP.
Type any text — "person", "red car", "coffee cup" — and detect it in real time from the camera, photos, or videos. No fixed class list.
```
Text Input ───→ CLIP Text Encoder ──→ txt_feats [1,80,512]
                                           │
Camera/Image ─→ YOLO-World Detector ───────┼──→ boxes  [1,4,8400]
                                           └──→ scores [1,80,8400] (sigmoid-calibrated)
                                                    │
                                           NMS + Filter ──→ Bounding Boxes
```
The CoreML detector includes the full BNContrastiveHead scoring pipeline internally. Scores are pre-computed — no external parameter files needed.
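Since the model already emits sigmoid-calibrated scores, the app-side work reduces to thresholding and NMS. Below is an illustrative NumPy sketch of that decode step — not the app's actual Swift code — using the diagram's shapes with the batch dimension dropped (`boxes` as `[4, N]` xywh, `scores` as `[C, N]`); the thresholds are typical defaults, not values from this project:

```python
import numpy as np

def iou(a, b):
    """IoU of one xyxy box `a` against an [M, 4] array of xyxy boxes `b`."""
    tl = np.maximum(a[:2], b[:, :2])
    br = np.minimum(a[2:], b[:, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=1)
    area_a = np.prod(a[2:] - a[:2])
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return inter / (area_a + area_b - inter + 1e-9)

def decode(boxes, scores, conf_thres=0.25, iou_thres=0.5):
    """boxes: [4, N] xywh; scores: [C, N], already sigmoid-calibrated to [0, 1]."""
    cls = scores.argmax(axis=0)           # best-matching text query per anchor
    conf = scores.max(axis=0)
    keep = conf > conf_thres              # confidence filter
    boxes, cls, conf = boxes[:, keep].T, cls[keep], conf[keep]
    xy, wh = boxes[:, :2], boxes[:, 2:]   # xywh -> xyxy corners
    xyxy = np.concatenate([xy - wh / 2, xy + wh / 2], axis=1)
    order, kept = conf.argsort()[::-1], []
    while order.size:                     # greedy class-wise NMS
        i = order[0]
        kept.append(i)
        rest = order[1:]
        suppress = (cls[rest] == cls[i]) & (iou(xyxy[i], xyxy[rest]) > iou_thres)
        order = rest[~suppress]
    return xyxy[kept], cls[kept], conf[kept]
```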
| Model | Size | Description |
|---|---|---|
| `yoloworld_detector.mlpackage` | 25 MB | YOLO-World V2-S visual detector |
| `clip_text_encoder.mlpackage` | 121 MB | CLIP ViT-B/32 text encoder |
| `clip_vocab.json` | 1.6 MB | BPE vocabulary for the tokenizer |
- Camera: Real-time open-vocabulary detection
- Photo: Pick from library, detect with any text query
- Video: Pick a video, detect frame-by-frame with overlay
- Open-vocabulary: Up to 80 simultaneous queries, any text
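The 80-query cap above comes from the detector's fixed `txt_feats` input shape `[1,80,512]`: shorter query lists must be padded out to 80 slots. A minimal sketch of that padding (padding with empty strings is an assumption for illustration, not necessarily what the app does):

```python
MAX_SLOTS = 80  # fixed by the detector's txt_feats input shape [1, 80, 512]

def pad_queries(queries, max_slots=MAX_SLOTS):
    """Truncate to the slot limit, pad the rest with empty strings,
    and return the number of real (non-padding) queries."""
    queries = queries[:max_slots]
    return queries + [""] * (max_slots - len(queries)), len(queries)
```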
- iOS 16.0+
- Xcode 15.0+
- Physical device (camera + Neural Engine)
- Open `YOLOWorldDemo.xcodeproj` in Xcode
- Select your development team
- Build and run on a physical device
Models are pre-bundled. No additional setup required.
To convert with a different model size (m/l/x):
```bash
pip install ultralytics open_clip_torch coremltools torch==2.7.0
python convert_models.py --size l
```

Then replace the `.mlpackage` files in the Xcode project.
- Enter comma-separated object names in the text field (e.g., `person, dog, car`)
- Tap the search button or press return
- Switch between Camera / Photo / Video modes with the bottom buttons
- In Photo/Video mode, tap the green (+) button to pick from library
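The comma-separated text field maps directly to the detector's query list. A minimal sketch of that parsing step (a hypothetical helper, not the app's actual code), trimming whitespace, dropping blanks, and capping at the 80-slot limit:

```python
def parse_queries(text, max_slots=80):
    """Split a comma-separated query string into clean query names."""
    queries = [q.strip() for q in text.split(",") if q.strip()]
    return queries[:max_slots]
```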