InferrLM is a mobile application that brings LLMs and SLMs directly to your Android or iOS device and lets your device act as a local server. Cloud-based models such as Claude, Gemini, and ChatGPT are also supported, and local models support file attachments with RAG.
If you want to support me and the development of this project, you can donate to me through Ko-fi.
Demos of using the Apple Foundation Model, and of downloading MLX models from Hugging Face and running them on-device.
- Core local inference through llama.cpp with support for GGUF models on both Android and iOS.
- Local MLX inference for Apple Silicon devices.
- Seamless integration with cloud-based models from OpenAI, Google (Gemini), and Anthropic. Remote models are optional and require your own API keys along with a registered InferrLM account.
- Customizable base URLs for OpenAI-compatible providers such as OpenRouter, Groq, Ollama, LM Studio, and Together AI, so you can point the app at alternative API endpoints.
- Apple Foundation Model support on devices that support Apple Intelligence.
- Vision support through multimodal models with their corresponding projector (mmproj) files. You can read more about them here.
- Built-in camera lets you capture pictures directly within the app and send them to models.
- RAG (Retrieval-Augmented Generation) support for enhanced document understanding and context-aware responses.
- File attachment support with a built-in document extractor that runs OCR locally on every page of your documents and sends the extracted text to local models.
- Document ingestion system that processes and indexes your files for efficient retrieval during conversations.
- Native file upload support for the remote models.
- Built-in HTTP server that exposes REST APIs for accessing your models from any device on your local network. The server can be started from the Server tab. Share your InferrLM chat interface with computers, tablets, or other devices through a URL.
- Full API documentation is available HERE and at the server homepage.
- A command-line interface tool is available at github.com/sbhjt-gr/InferrLM-CLI that demonstrates how to build applications on top of the InferrLM API.
- Download manager that fetches models directly from Hugging Face. A curated list of models optimized for edge devices is available in the Models -> "Download Models" tab.
- Downloaded models appear in the chat screen model selector and in the "Stored Models" tab inside the "Models" tab.
- Import models from local storage or download directly from URLs.
- Messages support editing, regeneration, copy functionality and markdown rendering.
- Fast native markdown rendering with math support, powered by react-native-nitro-markdown, a C++-based renderer built on the Nitro Modules bridge.
- Dedicated branching support on each chat bubble lets you fork the conversation from any message, preserving the original thread so you can explore alternate directions without losing prior context.
- Code generated by the models is rendered in code blocks with copy-to-clipboard support.
- Chat history management with the ability to pin conversations.
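The per-message branching described above can be sketched as a simple message tree in which each message keeps a reference to its parent. This is an illustrative data structure only, not InferrLM's actual schema; all names here are hypothetical.

```typescript
// Illustrative sketch of per-message conversation branching.
// Not InferrLM's implementation; the shape and names are hypothetical.
interface ChatMessage {
  id: number;
  parentId: number | null; // null for the root message of a conversation
  role: "user" | "assistant";
  content: string;
}

class ConversationTree {
  private messages = new Map<number, ChatMessage>();
  private nextId = 1;

  // Append a message under a parent (or start a new root thread).
  add(parentId: number | null, role: ChatMessage["role"], content: string): number {
    const id = this.nextId++;
    this.messages.set(id, { id, parentId, role, content });
    return id;
  }

  // Walk parent links to reconstruct the thread ending at `leafId`.
  thread(leafId: number): ChatMessage[] {
    const out: ChatMessage[] = [];
    let cur = this.messages.get(leafId);
    while (cur) {
      out.unshift(cur);
      cur = cur.parentId === null ? undefined : this.messages.get(cur.parentId);
    }
    return out;
  }
}
```

Forking is just calling `add()` with an earlier message's id: the original leaf keeps its own chain of parent links, so neither branch loses context.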
If you want to contribute, or just want to run it locally, follow the guide below. Your work and modifications must adhere to our LICENSE.
- Node.js (>= 16.0.0, < 23.0.0)
- npm or yarn
- Expo CLI
- Android Studio (for Android development)
- Xcode (for iOS development)
1. Clone the repository

   ```bash
   git clone https://github.com/sbhjt-gr/InferrLM
   cd InferrLM
   ```

2. Install dependencies

   ```bash
   yarn install
   ```

3. Set up environment variables

   Configure your API keys and Firebase settings. The list of variables is available in app.config.json.

4. Run on device or emulator

   ```bash
   # For Android
   npx expo run:android

   # For iOS
   npx expo run:ios
   ```
InferrLM includes a built-in HTTP server that exposes your models through an OpenAI-compatible API, so any device on your local network can access them. This lets you integrate InferrLM with other applications, scripts, or services.
- Open the InferrLM app
- Navigate to the Server tab
- Toggle the server switch to start it
- The server URL will be displayed (typically http://YOUR_DEVICE_IP:8889)
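Once the server is up, any OpenAI-compatible client on the LAN can talk to it. The sketch below assumes the standard OpenAI-style `/v1/chat/completions` route and no authentication on the local network; the server's own documentation (at its homepage) is authoritative for the exact routes, and the model name is a placeholder.

```typescript
// Minimal sketch of calling an InferrLM server from another device on the LAN.
// Assumption: the chat route follows the OpenAI convention (/v1/chat/completions).
const BASE_URL = "http://YOUR_DEVICE_IP:8889";

function buildChatRequest(model: string, prompt: string) {
  return {
    url: `${BASE_URL}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

async function chat(model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(model, prompt);
  const res = await fetch(url, init);
  const data = await res.json();
  // OpenAI-style responses carry the reply in choices[0].message.content.
  return data.choices[0].message.content;
}
```

Replace `YOUR_DEVICE_IP` with the address shown in the Server tab, and pass the name of a model loaded on the device.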
The InferrLM-CLI tool is a terminal-based client that connects to your InferrLM server and provides an interactive chat interface directly from your command line. This serves as both a functional tool and a reference implementation for developers who want to build applications using the InferrLM REST API.
The CLI is built with React and Ink to provide a basic terminal UI with features like streaming responses, conversation history, and an interactive setup flow. You can find the complete source code and installation instructions at github.com/sbhjt-gr/InferrLM-CLI.
To get started with the CLI, make sure your InferrLM server is running on your mobile device, then install the CLI tool and follow the setup instructions provided in its repository.
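The streaming responses the CLI displays typically arrive as server-sent events in the OpenAI style, i.e. `data: <json>` lines ending with `data: [DONE]`. A minimal parser sketch under that assumption (the server's documentation is authoritative for the actual wire format):

```typescript
// Sketch: extract text deltas from an OpenAI-style SSE stream chunk.
// Assumes "data: <json>" lines terminated by "data: [DONE]", as used by
// OpenAI-compatible streaming endpoints.
function parseSseChunk(chunk: string): string {
  let text = "";
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break;
    const json = JSON.parse(payload);
    // Each streamed chunk carries an incremental delta for choice 0.
    text += json.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}
```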
Once the server is running, you can access the complete API documentation by opening the server URL in any web browser. The documentation includes:
- Chat and completion endpoints
- Model management operations
- RAG and embeddings APIs
- Server configuration and status
For detailed API reference, see the REST API Documentation.
This project is distributed under the AGPL-3.0 License. Please read it here. Any modifications must adhere to the rules of this LICENSE.
Contributions are welcome! You can find issues in the issues tab or raise new ones and start your work.
Read our Contributing Guide for detailed contribution guidelines, code standards, and best practices.
- Framework: React Native 0.81 with Expo 54 (New Architecture)
- App language: TypeScript, JavaScript
- iOS native modules: Swift
- Android native modules: Kotlin
- Inference engine: C, C++
- Navigation: React Navigation
- Database: OP-SQLite, Expo SQLite
- llama.cpp - The underlying engine for running local GGUF models on both Android and iOS.
- mlx-swift-lm - Swift library for running MLX language models on Apple Silicon, powering the MLX inference backend on iOS.
- inferrlm-llama.rn - The customized React Native adapter that bridges llama.cpp. Forked from llama.rn and self-hosted so that llama.cpp can be updated more frequently.
- @inferrlm/react-native-mlx - Apple Silicon MLX inference engine for iOS, providing optimized on-device performance via the Nitro Modules bridge. Forked and maintained from react-native-nitro-mlx.
- react-native-nitro-markdown - Native C++ markdown renderer for React Native, used for fast chat message rendering.
- react-native-rag + @langchain/textsplitters - RAG implementation for React Native that powers the document retrieval and ingestion features using LangChain.
- react-native-ai - The adapter that provides access to the Apple Foundation Model through its Swift API.
- If you think another project deserves a mention here, please let me know.
Star this repository if you find it useful!


