Converted Core ML Model Zoo.
Core ML is a machine learning framework by Apple. If you are iOS developer, you can easly use machine learning models in your Xcode project.
Take a look this model zoo, and if you found the CoreML model you want, download the model from google drive link and bundle it in your project. Or if the model have sample project link, try it and see how to use the model in the project. You are free to do or not.
If you like this repository, please give me a star so I can do my best.
-
Stable Diffusion :text2image
You can get the model converted to CoreML format from the link of Google drive. See the section below for how to use it in Xcode. The license for each model conforms to the license for the original project.
| Google Drive Link | Size | Dataset | Original Project | License |
|---|---|---|---|---|
| Efficientnetb0 | 22.7 MB | ImageNet | TensorFlowHub | Apache2.0 |
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| Efficientnetv2 | 85.8 MB | ImageNet | Google/autoML | Apache2.0 | 2021 |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| VisionTransformer-B16 | 347.5 MB | ImageNet | google-research/vision_transformer | Apache2.0 | 2021 |
Local Features Coupling Global Representations for Visual Recognition.
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| Conformer-tiny-p16 | 94.1 MB | ImageNet | pengzhiliang/Conformer | Apache2.0 | 2021 |
Data-efficient Image Transformers
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| DeiT-base384 | 350.5 MB | ImageNet | facebookresearch/deit | Apache2.0 | 2021 |
Making VGG-style ConvNets Great Again
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| RepVGG-A0 | 33.3 MB | ImageNet | DingXiaoH/RepVGG | MIT | 2021 |
Designing Network Design Spaces
| Google Drive Link | Size | Dataset | Original Project | License | Year |
|---|---|---|---|---|---|
| regnet_y_400mf | 16.5 MB | ImageNet | TORCHVISION.MODELS | MIT | 2020 |
CVNets: A library for training computer vision networks
| Google Drive Link | Size | Dataset | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| MobileViTv2 | 18.8 MB | ImageNet | apple/ml-cvnets | apple | 2022 |
| Download Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| dfine-n-coco | 13MB | Confidence(MultiArray (Float32 300 × 80)), Coordinates (MultiArray (Float32 300 × 4)) | Peterande/D-FINE | Apache 2.0 | Input 640×640. Coordinates are normalized cxcywh. No NMS — filter by confidence threshold. | peaceofcake DFINEDemo |
| Download Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| rfdetr-n-coco | 95MB | Confidence(MultiArray (Float32 300 × 91)), Coordinates (MultiArray (Float32 300 × 4)) | roboflow/rf-detr | Apache 2.0 | Input 384×384. 91 classes (index 0 = background, 1-90 = COCO category IDs). Coordinates are normalized cxcywh. No NMS. | peaceofcake DFINEDemo |
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| YOLOv5s | 29.3MB | Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | ultralytics/yolov5 | GNU | Non Maximum Suppression has been added. | CoreML-YOLOv5 |
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|
| YOLOv7 | 147.9MB | Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | WongKinYiu/yolov7 | GNU | Non Maximum Suppression has been added. | CoreML-YOLOv5 |
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| YOLOv8s | 45.1MB | Confidence(MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | ultralytics/ultralytics | GNU | Non Maximum Suppression has been added. | CoreML-YOLOv5 |
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. Uses PGI and GELAN architecture for efficient object detection.
| Download Link | Size | Output | Original Project | License | Year | Note | Sample Project |
|---|---|---|---|---|---|---|---|
| yolov9s.mlpackage.zip | 14 MB | Confidence (MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | WongKinYiu/yolov9 | GPL-3.0 | 2024 | Non Maximum Suppression has been added. | YOLOv9Demo |
YOLOv10: Real-Time End-to-End Object Detection. NMS-free architecture using consistent dual assignments — no post-processing needed.
| Download Link | Size | Output | Original Project | License | Year | Note | Sample Project |
|---|---|---|---|---|---|---|---|
| yolov10s.mlpackage.zip | 14 MB | MultiArray (1 × 300 × 6) | THU-MIG/yolov10 | AGPL-3.0 | 2024 | NMS-free end-to-end detection. | YOLO26Demo |
YOLO11: Ultralytics latest YOLO with improved backbone and neck architecture. 22% fewer parameters than YOLOv8 with higher mAP.
| Download Link | Size | Output | Original Project | License | Year | Note | Sample Project |
|---|---|---|---|---|---|---|---|
| yolo11s.mlpackage.zip | 18 MB | Confidence (MultiArray (Double 0 × 80)), Coordinates (MultiArray (Double 0 × 4)) | ultralytics/ultralytics | AGPL-3.0 | 2024 | Non Maximum Suppression has been added. | YOLOv9Demo |
YOLO26: Edge-first vision AI with NMS-free end-to-end detection. Up to 43% faster CPU inference vs YOLO11 with DFL removal and ProgLoss.
| Download Link | Size | Output | Original Project | License | Year | Note | Sample Project |
|---|---|---|---|---|---|---|---|
| yolo26s.mlpackage.zip | 18 MB | MultiArray (1 × 300 × 6) | ultralytics/ultralytics | AGPL-3.0 | 2026 | NMS-free end-to-end detection. | YOLO26Demo |
YOLO-World: Real-Time Open-Vocabulary Object Detection. Type any text query and detect it — no fixed class list. Uses CLIP text encoder for open-vocabulary matching.
| Download Link | Size | Description | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| yoloworld_detector.mlpackage.zip | 25 MB | YOLO-World V2-S visual detector | AILab-CVC/YOLO-World | GPL-3.0 | 2024 | YOLOWorldDemo |
| clip_text_encoder.mlpackage.zip | 121 MB | CLIP ViT-B/32 text encoder | openai/CLIP | MIT | 2021 | — |
| clip_vocab.json.zip | 1.6 MB | BPE vocabulary for tokenizer | — | — | — | — |
| Google Drive Link | Size | Output | Original Project | License |
|---|---|---|---|---|
| U2Net | 175.9 MB | Image(GRAYSCALE 320 × 320) | xuebinqin/U-2-Net | Apache |
| U2Netp | 4.6 MB | Image(GRAYSCALE 320 × 320) | xuebinqin/U-2-Net | Apache |
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| IS-Net | 176.1 MB | Image(GRAYSCALE 1024 × 1024) | xuebinqin/DIS | Apache | 2022 | |
| IS-Net-General-Use | 176.1 MB | Image(GRAYSCALE 1024 × 1024) | xuebinqin/DIS | Apache | 2022 |
RMBG1.4 - The IS-Net enhanced with our unique training scheme and proprietary dataset.
| Download Link | Size | Output | Original Project | License | year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|
| RMBG_1_4.mlpackage.zip | 42 MB (INT8) | Alpha mask 1024x1024 | briaai/RMBG-1.4 | Creative Commons | 2024 | RMBGDemo | convert_rmbg.py |
| Google Drive Link | Size | Output | Original Project | License | Sample Project |
|---|---|---|---|---|---|
| face-Parsing | 53.2 MB | MultiArray(1 x 512 × 512) | zllrunning/face-parsing.PyTorch | MIT | CoreML-face-parsing |
Simple and Efficient Design for Semantic Segmentation with Transformers
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| SegFormer_mit-b0_1024x1024_cityscapes | 14.9 MB | MultiArray(512 × 1024) | NVlabs/SegFormer | NVIDIA | 2021 |
Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| BiSeNetV2_1024x1024_cityscapes | 12.8 MB | MultiArray | ycszen/BiSeNet | Apache2.0 | 2021 |
Disentangled Non-Local Neural Networks
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| dnl_r50-d8_512x512_80k_ade20k | 190.8 MB | MultiArray[512x512] | ADE20K | yinmh17/DNL-Semantic-Segmentation | Apache2.0 | 2020 |
Interlaced Sparse Self-Attention for Semantic Segmentation
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| isanet_r50-d8_512x512_80k_ade20k | 141.5 MB | MultiArray[512x512] | ADE20K | openseg-group/openseg.pytorch | MIT | ArXiv'2019/IJCV'2021 |
Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| fastfcn_r50-d32_jpu_aspp_512x512_80k_ade20k | 326.2 MB | MultiArray[512x512] | ADE20K | wuhuikai/FastFCN | MIT | ArXiv'2019 |
Non-local Networks Meet Squeeze-Excitation Networks and Beyond
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| gcnet_r50-d8_512x512_20k_voc12aug | 189 MB | MultiArray[512x512] | PascalVOC | xvjiarui/GCNet | Apache License 2.0 | ICCVW'2019/TPAMI'2020 |
Dual Attention Network for Scene Segmentation(CVPR2019)
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| danet_r50-d8_512x1024_40k_cityscapes | 189.7 MB | MultiArray[512x1024] | CityScapes | junfu1115/DANet | MIT | CVPR2019 |
Panoptic Feature Pyramid Networks
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| fpn_r50_512x1024_80k_cityscapes | 108.6 MB | MultiArray[512x1024] | CityScapes | facebookresearch/detectron2 | Apache License 2.0 | 2019 |
Code for binary segmentation of various cloths.
| Google Drive Link | Size | Output | Dataset | Original Project | License | year |
|---|---|---|---|---|---|---|
| clothSegmentation | 50.1 MB | Image(GrayScale 640x960) | fashion-2019-FGVC6 | facebookresearch/detectron2 | MIT | 2020 |
EasyPortrait - Face Parsing and Portrait Segmentation Dataset.
| Google Drive Link | Size | Output | Original Project | License | year | Swift sample | Conversion Script |
|---|---|---|---|---|---|---|---|
| easyportrait-segformer512-fp | 7.6 MB | Image(GrayScale 512x512) * 9 | hukenovs/easyportrait | Creative Commons | 2023 | easyportrait-coreml |
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications. MobileSAM replaces the heavy ViT-H image encoder with a lightweight ViT-Tiny encoder via decoupled knowledge distillation, making it ~60x smaller and ~40x faster than the original SAM.

| Download Link | Size | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| MobileSAM.zip | 23 MB (Encoder 13 MB + Decoder 9.8 MB) | Segmentation Mask | ChaoningZhang/MobileSAM | Apache 2.0 | 2023 | SamKit |
SAM 2: Segment Anything in Images and Videos. SAM 2 extends promptable segmentation from images to videos using a streaming architecture with memory. The Tiny variant uses a Hiera-T backbone for efficient on-device inference.
| Download Link | Size | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|
| SAM2Tiny.zip | 76 MB (ImageEncoder 64 MB + PromptEncoder 2 MB + MaskDecoder 9.8 MB) | Segmentation Mask | facebookresearch/sam2 | Apache 2.0 | 2024 | SamKit |
pq-yang/MatAnyone (CVPR 2025) — temporally consistent video matting with object-level memory propagation. Given a first-frame mask the network tracks and refines an alpha matte across the whole clip, holding sharp edges (hair, semitransparent regions) much better than per-frame matting baselines. Built on the Cutie video object segmentation backbone with a dedicated mask decoder for matting.
The CoreML port splits the network into 5 stateless modules so the per-frame memory state machine can live in Swift while CoreML handles the heavy compute. End-to-end alpha matte parity vs the official PyTorch reference: MAE < 2e-4, correlation 0.9999+ across 18 frames including 3 memory cycles.
The sample app uses Vision's VNGeneratePersonSegmentationRequest to bootstrap the first-frame mask automatically — pick a video, tap "Remove BG", and it composites the foreground over the chosen background colour.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| MatAnyone (5 mlpackages, ~111 MB FP16 total) | 111 MB | image [1,3,432,768] (per-frame state in Swift) | alpha matte [1,1,432,768] | pq-yang/MatAnyone | NTU S-Lab 1.0 | 2025 | MatAnyoneDemo | convert_matanyone.py |
See sample_apps/MatAnyoneDemo/README.md for the per-frame state machine, the 5-module split, and conversion details.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| Real ESRGAN4x | 66.9 MB | Image(RGB 2048x2048) | xinntao/Real-ESRGAN | BSD 3-Clause License | 2021 |
| Real ESRGAN Anime4x | 66.9 MB | Image(RGB 2048x2048) | xinntao/Real-ESRGAN | BSD 3-Clause License | 2021 |
Towards Real-World Blind Face Restoration with Generative Facial Prior
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| GFPGAN | 337.4 MB | Image(RGB 512x512) | TencentARC/GFPGAN | Apache2.0 | 2021 |
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| BSRGAN | 66.9 MB | Image(RGB 2048x2048) | cszn/BSRGAN | 2021 |
| Google Drive Link | Size | Output | Original Project | License | year | Conversion Script |
|---|---|---|---|---|---|---|
| A-ESRGAN | 63.8 MB | Image(RGB 1024x1024) | aesrgan/A-ESRGANN | BSD 3-Clause License | 2021 |
Best-Buddy GANs for Highly Detailed Image Super-Resolution
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| Beby-GAN | 66.9 MB | Image(RGB 2048x2048) | dvlab-research/Simple-SR | MIT | 2021 |
The Residual in Residual Dense Network for image super-scaling.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| RRDN | 16.8 MB | Image(RGB 2048x2048) | idealo/image-super-resolution | Apache2.0 | 2018 |
Fast-SRGAN.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| Fast-SRGAN | 628 KB | Image(RGB 1024x1024) | HasnainRaz/Fast-SRGAN | MIT | 2019 |
Enhanced-SRGAN.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| ESRGAN | 66.9 MB | Image(RGB 2048x2048) | xinntao/ESRGAN | Apache 2.0 | 2018 |
Pretrained: 4xESRGAN
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| UltraSharp | 34 MB | Image(RGB 1024x1024) | Kim2019/ | CC-BY-NC-SA-4.0 | 2021 |
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| SRGAN | 6.1 MB | Image(RGB 2048x2048) | dongheehand/SRGAN-PyTorch | 2017 |
Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| SRResNet | 6.1 MB | Image(RGB 2048x2048) | dongheehand/SRGAN-PyTorch | 2017 |
Lightweight Image Super-Resolution with Enhanced CNN.
| Google Drive Link | Size | Output | Original Project | License | year | Conversion Script |
|---|---|---|---|---|---|---|
| LESRCNN | 4.3 MB | Image(RGB 512x512) | hellloxiaotian/LESRCNN | 2020 |
Metric Learning based Interactive Modulation for Real-World Super-Resolution
| Google Drive Link | Size | Output | Original Project | License | year | Conversion Script |
|---|---|---|---|---|---|---|
| MMRealSRGAN | 104.6 MB | Image(RGB 1024x1024) | TencentARC/MM-RealSR | BSD 3-Clause | 2022 | |
| MMRealSRNet | 104.6 MB | Image(RGB 1024x1024) | TencentARC/MM-RealSR | BSD 3-Clause | 2022 |
Pytorch implementation of "Unsupervised Degradation Representation Learning for Blind Super-Resolution", CVPR 2021
| Google Drive Link | Size | Output | Original Project | License | year |
|---|---|---|---|---|---|
| DASR | 12.1 MB | Image(RGB 1024x1024) | The-Learning-And-Vision-Atelier-LAVA/DASR | MIT | 2022 |
wyf0912/SinSR — single-step diffusion-based super-resolution (CVPR 2024, ~113M params). Distilled from ResShift for one-step 4x upscaling. Uses a Swin Transformer UNet with VQ-VAE latent space.
Left: bicubic 4x upscale, Right: SinSR single-step diffusion SR (128x128 → 512x512)
3 CoreML models: VQ-VAE encoder, Swin-UNet denoiser (single step), and VQ-VAE decoder with vector quantization.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| SinSR_Encoder.mlpackage.zip | 39 MB | image [1,3,1024,1024] | latent [1,3,256,256] | wyf0912/SinSR | S-Lab | 2024 | SinSRDemo | convert_sinsr.py |
| SinSR_Denoiser.mlpackage.zip | 420 MB | input [1,6,256,256] | predicted_latent [1,3,256,256] | |||||
| SinSR_Decoder.mlpackage.zip | 58 MB | latent [1,3,256,256] | image [1,3,1024,1024] |
See sample_apps/SinSRDemo/README.md for the inference pipeline and conversion details.
Learning Temporal Consistency for Low Light Video Enhancement from Single Images.
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| StableLLVE | 17.3 MB | Image(RGB 512x512) | zkawfanx/StableLLVE | MIT | 2021 |
Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| Zero-DCE | 320KB | Image(RGB 512x512) | Li-Chongyi/Zero-DCE | See Repo | 2021 |
Retinexformer: One-stage Retinex-based Transformer for Low-light Image Enhancement
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| ZRetinexformer FiveK | 3.4MB | Image(RGB 512x512) | caiyuanhao1998/Retinexformer | MIT | 2023 | |
| ZRetinexformer NTIRE | 3.4MB | Image(RGB 512x512) | caiyuanhao1998/Retinexformer | MIT | 2023 |
Multi-Stage Progressive Image Restoration.
Debluring
Denoising
Deraining
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| MPRNetDebluring | 137.1 MB | Image(RGB 512x512) | swz30/MPRNet | MIT | 2021 |
| MPRNetDeNoising | 108 MB | Image(RGB 512x512) | swz30/MPRNet | MIT | 2021 |
| MPRNetDeraining | 24.5 MB | Image(RGB 512x512) | swz30/MPRNet | MIT | 2021 |
Learning Enriched Features for Fast Image Restoration and Enhancement.
Denoising
Super Resolution
Contrast Enhancement
Low Light Enhancement
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| MIRNetv2Denoising | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 | |
| MIRNetv2SuperResolution | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 | |
| MIRNetv2ContrastEnhancement | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 | |
| MIRNetv2LowLightEnhancement | 42.5 MB | Image(RGB 512x512) | swz30/MIRNetv2 | ACADEMIC PUBLIC LICENSE | 2022 |
| Google Drive Link | Size | Output | Original Project | License | Sample Project |
|---|---|---|---|---|---|
| MobileStyleGAN | 38.6MB | Image(Color 1024 × 1024) | bes-dev/MobileStyleGAN.pytorch | Nvidia Source Code License-NC | CoreML-StyleGAN |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| DCGAN | 9.2MB | MultiArray | TensorFlowCore |
| Google Drive Link | Size | Output | Original Project | License | Usage |
|---|---|---|---|---|---|
| Anime2Sketch | 217.7MB | Image(Color 512 × 512) | Mukosame/Anime2Sketch | MIT | Drop an image to preview |
| Google Drive Link | Size | Output | Original Project | Conversion Script |
|---|---|---|---|---|
| AnimeGAN2Face_Paint_512_v2 | 8.6MB | Image(Color 512 × 512) | bryandlee/animegan2-pytorch |
| Google Drive Link | Size | Output | Original Project | License | Note |
|---|---|---|---|---|---|
| Photo2Cartoon | 15.2 MB | Image(Color 256 × 256) | minivision-ai/photo2cartoon | MIT | The output is little bit different from the original model. It cause some operations were converted replaced manually. |
| Google Drive Link | Size | Output | Original Project | Sample |
|---|---|---|---|---|
| AnimeGANv2_Hayao | 8.7MB | Image(256 x 256) | TachibanaYoshino/AnimeGANv2 | AnimeGANv2-iOS |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| AnimeGANv2_Paprika | 8.7MB | Image(256 x 256) | TachibanaYoshino/AnimeGANv2 |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| WarpGAN Caricature | 35.5MB | Image(256 x 256) | seasonSH/WarpGAN |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| UGATIT_selfie2anime | 266.2MB(quantized) | Image(256x256) | taki0112/UGATIT |
| Google Drive Link | Size | Output | Original Project |
|---|---|---|---|
| CartoonGAN_Shinkai | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| CartoonGAN_Hayao | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| CartoonGAN_Hosoda | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| CartoonGAN_Paprika | 44.6MB | MultiArray | mnicnc404/CartoonGan-tensorflow |
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| fast-neural-style-transfer-cuphead | 6.4MB | Image(RGB 960x640) | eriklindernoren/Fast-Neural-Style-Transfer | MIT | 2019 |
| fast-neural-style-transfer-starry-night | 6.4MB | Image(RGB 960x640) | eriklindernoren/Fast-Neural-Style-Transfer | MIT | 2019 |
| fast-neural-style-transfer-mosaic | 6.4MB | Image(RGB 960x640) | eriklindernoren/Fast-Neural-Style-Transfer | MIT | 2019 |
Learning to Cartoonize Using White-box Cartoon Representations
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| White_box_Cartoonization | 5.9MB | Image(1536x1536) | SystemErrorWang/White-box-Cartoonization | creativecommons | CVPR2020 |
White-box facial image cartoonizaiton
| Google Drive Link | Size | Output | Original Project | License | Year |
|---|---|---|---|---|---|
| FacialCartoonization | 8.4MB | Image(256x256) | SystemErrorWang/FacialCartoonization | creativecommons | 2020 |
| Google Drive Link | Size | Output | Original Project | License | Note | Sample Project |
|---|---|---|---|---|---|---|
| AOT-GAN-for-Inpainting | 60.8MB | MLMultiArray(3,512,512) | researchmm/AOT-GAN-for-Inpainting | Apache2.0 | To use see sample. | john-rocky/Inpainting-CoreML |
| Google Drive Link | Size | Input | Output | Original Project | License | Note | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| Lama | 216.6MB | Image (Color 800 × 800), Image (GrayScale 800 × 800) | Image (Color 800 × 800) | advimman/lama | Apache2.0 | To use see sample. | john-rocky/lama-cleaner-iOS | mallman/CoreMLaMa |
microsoft/MoGe (CVPR 2025 Oral) — open-domain monocular 3D geometry from a single image. Predicts a metric depth map, surface normals, and a confidence mask in one forward pass on a DINOv2 ViT-B backbone with three task heads. The successor to MiDaS-style relative depth: depth comes out in real meters.
| Module | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| MoGe-2 ViT-B + normal | ~200 MB FP16 | Image (RGB 504 × 504) | depth + normal + mask + metric_scale | microsoft/MoGe | MIT | 2025 | MoGe2Demo | convert_moge2.py |
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
| Google Drive Link | Size | Output | Original Project | License | Year | Conversion Script |
|---|---|---|---|---|---|---|
| MiDaS_Small | 66.3MB | MultiArray(1x256x256) | isl-org/MiDaS | MIT | 2022 |
ByteDance/Hyper-SD — single-step text-to-image distilled from SD1.5 via Trajectory Segmented Consistency Distillation. ByteDance reports user preference 2x over SD-Turbo at 1 step. Combined with Apple's ml-stable-diffusion (Split-Einsum attention, chunked UNet, 6-bit palettization), runs at acceptable speed and quality on iPhone 15+.
1-step generations on iPhone, 512×512. Prompts: cat with sunglasses, cyberpunk city, japanese garden, astronaut on horse.
4 CoreML models (~947 MB total): CLIP text encoder + Swin-style chunked UNet (6-bit palettized) + VAE decoder. Uses TCD scheduler for single-step inference.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| HyperSDTextEncoder.mlpackage.zip | 235 MB | input_ids [1,77] | encoder_hidden_states [1,77,768] | ByteDance/Hyper-SD | OpenRAIL++ | 2024 | HyperSDDemo | convert_hypersd.py |
| HyperSDUnetChunk1.mlpackage.zip | 318 MB | latent + encoder_hs + timestep | first half intermediates | |||||
| HyperSDUnetChunk2.mlpackage.zip | 299 MB | first half outputs + skip connections | noise_pred [2,4,64,64] | |||||
| HyperSDVAEDecoder.mlpackage.zip | 95 MB | latent [1,4,64,64] | image [1,3,512,512] |
See sample_apps/HyperSDDemo/README.md for the LoRA fusion, chunked-UNet palettization, and TCD scheduler details.
| Google Drive Link | Original Model | Original Project | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|---|
| stable-diffusion-v1-5 | runwayml/stable-diffusion-v1-5 | runwayml/stable-diffusion | Open RAIL M license | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2022 |
Pastel Mix - a stylized latent diffusion model.This model is intended to produce high-quality, highly detailed anime style with just a few prompts.
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| pastelMixStylizedAnime_pastelMixPrunedFP16 | andite/pastel-mix | Fantasy.ai | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| AOM3_orangemixs | WarriorMama777/OrangeMixs | CreativeML OpenRAIL-M | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| Counterfeit-V2.5 | gsdf/Counterfeit-V2.5 | - | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| anything-v4.5 | andite/anything-v4.0 | Fantasy.ai | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| Openjourney | prompthero/openjourney | - | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
| Google Drive Link | Original Model | License | Run on mac | Conversion Script | Year |
|---|---|---|---|---|---|
| dreamlike-photoreal-2.0 | dreamlike-art/dreamlike-photoreal-2.0 | CreativeML OpenRAIL-M | godly-devotion/MochiDiffusion | godly-devotion/MochiDiffusion | 2023 |
DDColor — AI image colorization for grayscale/B&W photos using dual decoders (ICCV 2023).
| Input | Output |
|---|---|
![]() |
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| DDColor_Tiny.mlpackage.zip | 242 MB | 512×512 RGB | AB channels (LAB) | piddnad/DDColor | Apache-2.0 | 2023 | DDColorDemo | convert_ddcolor.py |
AdaFace — Quality-adaptive face recognition. Outputs 512-dim embedding for face verification and identification.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| AdaFace_IR18.mlpackage.zip | 48 MB | Image (112×112 face) | 512-dim L2-normalized embedding | mk-minchul/AdaFace | MIT | 2022 | AdaFaceDemo | convert_adaface.py |
3DDFA_V2 — 3D face reconstruction and head pose estimation (yaw, pitch, roll) from a single face image.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| 3DDFA_V2.mlpackage.zip | 6.3 MB | Image (120×120 RGB) | 62 params (12 pose + 40 shape + 10 expression) | cleardusk/3DDFA_V2 | MIT | 2020 | Face3DDemo |
pyannote segmentation — Speaker diarization with up to 3 simultaneous speakers. Identifies who speaks when, with overlap detection and per-speaker transcription.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| SpeakerSegmentation.mlpackage.zip | 5.8 MB | 10s mono 16kHz [1,1,160000] | [1, 589, 7] speaker logits | pyannote/segmentation-3.0 | MIT | 2023 | DiarizationDemo | convert_diarization.py |
OpenVoice — Zero-shot voice conversion. Record source and target voice, convert on-device.
openvoice.mp4
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| OpenVoice_SpeakerEncoder.mlpackage.zip | 1.7 MB | Spectrogram [1, T, 513] | 256-dim speaker embedding | myshell-ai/OpenVoice | MIT | 2024 | OpenVoiceDemo | convert_openvoice.py |
| OpenVoice_VoiceConverter.mlpackage.zip | 64 MB | Spectrogram + speaker embeddings | Waveform audio (22050 Hz) |
Hybrid Transformer Demucs — separates music into 4 stems: drums, bass, vocals, and other instruments.
demucs.mp4
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| HTDemucs_SourceSeparation_F32.mlpackage.zip | 80 MB | Audio Waveform [1, 2, 343980] at 44.1kHz | 4 stems (drums, bass, other, vocals) stereo | facebookresearch/demucs | MIT | 2022 | DemucsDemo | convert_htdemucs.py |
Microsoft Florence-2 — a unified vision-language model supporting image captioning, OCR, and object detection from a single model. Converted as 3 CoreML models (INT8): Vision Encoder (DaViT), Text Encoder (BART), and Decoder with autoregressive generation.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| Florence2VisionEncoder / TextEncoder / Decoder | 260 MB (INT8, 3 models total) | 768x768 RGB image + task prompt | Generated text (caption, OCR, etc.) | microsoft/Florence-2-base | MIT | 2024 | Florence2Demo | convert_florence2.py |
john-rocky/CoreML-LLM — Companion repository for running LLMs on the Apple Neural Engine. Unlike MLX Swift (GPU-only), CoreML-LLM targets ANE for ~10x lower power draw, making always-on on-device LLMs practical on iPhone.
Google Gemma 4 E2B (2B params) running on iPhone 15 Pro at ~11 tok/s decode and ~175 tok/s effective prefill throughput (seq=64 batched prefill → 188 ms TTFT for a 33-token prompt). INT4 palettized, 2048 context length, Sliding Window Attention (28/35 layers are O(W)), Per-Layer Embedding computed inside the ANE graph.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Swift Package |
|---|---|---|---|---|---|---|---|---|
| mlboydaisuke/gemma-4-E2B-coreml | 2.7 GB (INT4, 4 decode + 4 prefill chunks) | Text prompt (up to 2048 tokens) | Generated text (streaming) | google/gemma-3n-E2B-it | Gemma ToU | 2025 | CoreMLLLMChat | CoreML-LLM |
See CoreML-LLM for the full conversion pipeline, ANE optimization techniques (cat-trick RMSNorm, Conv2d Linear, pre-computed RoPE, stateless KV with explicit I/O), and the Swift sample app.
Google SigLIP — sigmoid-based contrastive image-text model for zero-shot classification. Type any labels (e.g. "cat, dog, car") and get per-label probabilities. Converted as 2 CoreML models (INT8): Image Encoder and Text Encoder.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| SigLIP_ImageEncoder / TextEncoder | 386 MB (FP16, 2 models total) | 224x224 RGB image + text labels | Per-label similarity scores (softmax) | google/siglip-base-patch16-224 | Apache-2.0 | 2024 | SigLIPDemo | convert_siglip.py |
hexgrad/Kokoro-82M — open-weight 82M-parameter TTS by hexgrad. Style-conditioned StyleTTS2 architecture (BERT + duration predictor + iSTFTNet vocoder) producing 24kHz speech in 9 languages from per-voice style embeddings. The first CoreML port with on-device bilingual (English + Japanese) free-text input — no MLX, no MeCab, no IPADic, no Python G2P at runtime.
ScreenRecording_04-07-2026.12-30-44_1.mov
2 CoreML models: a flexible-length Predictor (BERT + LSTM duration head + text encoder) and 3 fixed-shape Decoder buckets (128 / 256 / 512 frames). The Swift pipeline picks the smallest bucket that fits the predicted total duration, pads input features with zeros, and trims the output audio.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| Kokoro_Predictor.mlpackage.zip | 75 MB | input_ids [1, T≤256] (int32) + ref_s_style [1, 128] | duration [1, T] + d_for_align [1, 640, T] + t_en [1, 512, T] | hexgrad/Kokoro-82M | Apache-2.0 | 2025 | KokoroDemo | convert_kokoro.py |
| Kokoro_Decoder_128.mlpackage.zip | 238 MB | en_aligned [1, 640, 128] + asr_aligned [1, 512, 128] + ref_s [1, 256] | audio [1, 76800] @ 24kHz | |||||
| Kokoro_Decoder_256.mlpackage.zip | 241 MB | en_aligned [1, 640, 256] + asr_aligned [1, 512, 256] + ref_s [1, 256] | audio [1, 153600] @ 24kHz | |||||
| Kokoro_Decoder_512.mlpackage.zip | 246 MB | en_aligned [1, 640, 512] + asr_aligned [1, 512, 512] + ref_s [1, 256] | audio [1, 307200] @ 24kHz |
See sample_apps/KokoroDemo/README.md for the on-device G2P (English + Japanese), bucketed decoder strategy, and conversion details.
EfficientAD (PDN-Small) — lightweight unsupervised anomaly detection for industrial inspection. Wraps teacher, student, and autoencoder networks into a single model that outputs a per-pixel anomaly heatmap and image-level anomaly score. Pretrained on MVTec AD bottle category.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| EfficientAD_Bottle.mlpackage.zip | 15 MB (FP16) | 256x256 RGB image | anomaly_map [1,1,256,256] + anomaly_score [0-1] | nelson1425/EfficientAD | MIT | 2023 | EfficientADDemo | convert_efficientad.py |
spotify/basic-pitch — polyphonic Automatic Music Transcription. Converts any audio (any instrument, any voice) into MIDI notes with pitch bend detection. Just 17K parameters / 272 KB — runs in real time on iPhone with full ANE acceleration.
ScreenRecording_04-08-2026.02-14-51_1.mov
The first open-source iOS implementation. Loads any audio file, runs the CoreML model in 2-second sliding windows, then runs the full Python note_creation.py pipeline natively in Swift (onset inference, greedy backwards-in-time tracking, melodia trick, pitch bend extraction). Detected notes are visualized as a piano roll, exported as a Standard MIDI File, and played back through a built-in additive sine synth so you can A/B compare with the original audio.
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project |
|---|---|---|---|---|---|---|---|
| BasicPitch_nmp.mlpackage.zip | 272 KB | audio waveform [1, 43844, 1] @ 22050 Hz mono | note [1,172,88] + onset [1,172,88] + contour [1,172,264] | spotify/basic-pitch | Apache-2.0 | 2022 | BasicPitchDemo |
See sample_apps/BasicPitchDemo/README.md for the sliding-window inference, post-processing port, and iOS-specific gotchas.
stabilityai/stable-audio-open-small — text-to-music generation (497M params). Generates up to 11.9 seconds of stereo 44.1kHz audio from text prompts using rectified flow diffusion.
ScreenRecording_04-04-2026.13-54-08_1.mov
4 CoreML models: T5 text encoder, NumberEmbedder (seconds conditioning), DiT (diffusion transformer), and VAE decoder (Oobleck).
| Download Link | Size | Input | Output | Original Project | License | Year | Sample Project | Conversion Script |
|---|---|---|---|---|---|---|---|---|
| StableAudioT5Encoder.mlpackage.zip | 105 MB | input_ids [1, 64] | text_embeddings [1, 64, 768] | stabilityai/stable-audio-open-small | Stability AI Community | 2024 | StableAudioDemo | convert_stable_audio.py |
| StableAudioNumberEmbedder.mlpackage.zip | 396 KB | normalized_seconds [1] | seconds_embedding [1, 768] | |||||
| StableAudioDiT.mlpackage.zip | 326 MB | latent [1,64,256] + timestep + conditioning | velocity [1,64,256] | |||||
| StableAudioDiT_FP32.mlpackage.zip | 1.3 GB | latent [1,64,256] + timestep + conditioning | velocity [1,64,256] | |||||
| StableAudioVAEDecoder.mlpackage.zip | 149 MB | latent [1, 64, 256] | stereo audio [1, 2, 524288] at 44.1kHz |
See sample_apps/StableAudioDemo/README.md for INT8 vs FP32 DiT selection and conversion details.
import Vision
lazy var coreMLRequest:VNCoreMLRequest = {
let model = try! VNCoreMLModel(for: modelname().model)
let request = VNCoreMLRequest(model: model, completionHandler: self.coreMLCompletionHandler)
return request
}()
let handler = VNImageRequestHandler(ciImage: ciimage,options: [:])
DispatchQueue.global(qos: .userInitiated).async {
try? handler.perform([coreMLRequest])
}
If the model has Image type output:
let result = request?.results?.first as! VNPixelBufferObservation
let uiimage = UIImage(ciImage: CIImage(cvPixelBuffer: result.pixelBuffer))Else the model has Multiarray type output:
For visualizing multiArray as image, Mr. Hollance’s “CoreML Helpers” are very convenient. CoreML Helpers
Converting from MultiArray to Image with CoreML Helpers.
func coreMLCompletionHandler(request:VNRequest?、error:Error?){
let = coreMLRequest.results?.first as!VNCoreMLFeatureValueObservation
let multiArray = result.featureValue.multiArrayValue
let cgimage = multiArray?.cgImage(min:-1、max:1、channel:nil)
Option 2,Use CoreGANContainer. You can use models with dragging&dropping into the container project.
You can make the model size lighter with Quantization if you want. https://coremltools.readme.io/docs/quantization
The lower the number of bits, more the chances of degrading the model accuracy. The loss in accuracy varies with the model.
import coremltools as ct
from coremltools.models.neural_network import quantization_utils
# load full precision model
model_fp32 = ct.models.MLModel('model.mlmodel')
model_fp16 = quantization_utils.quantize_weights(model_fp32, nbits=16)
# nbits can be 16(half size model), 8(1/4), 4(1/8), 2, 1Cover image was taken from Ghibli free images.
On YOLOv5 convertion, dbsystel/yolov5-coreml-tools give me the super inteligent convert script.
And all of original projects
Daisuke Majima Freelance engineer. iOS/MachineLearning/AR I can work on mobile ML projects and AR project. Feel free to contact: rockyshikoku@gmail.com

















































































































