
Conversation

@arsenkylyshbek commented Oct 12, 2025

This PR introduces a floating control bar for the macOS app with quick-access AI chat functionality, including vision support for image attachments.

Frontend changes

Floating Control Bar

  • New always-on-top floating window for instant AI access
  • Glass-effect design with native macOS blur and transparency
  • Draggable, compact interface that stays accessible while working
  • Quick-access menu bar

Components

  • FloatingControlBar.swift: Main floating window implementation
  • AskAIInputView.swift: Text input interface with streaming responses
  • AIResponseView.swift: Response display with audio playback support
  • AudioResponseManager.swift: Native audio playback handling
  • audio_response_service.dart: Cross-platform Flutter audio service

Backend changes (Vision API Integration)

Why these backend changes?

The floating bar needed to support image attachments in chat (drag & drop screenshots, photos, etc.), which required implementing OpenAI's Vision API for image understanding. The existing file chat feature only supported documents (PDFs, text files) via the Assistants API.

Key backend changes

backend/routers/chat.py

  • Fixed file attachment handling to work without requiring a chat session (critical for the floating bar's quick-chat UX)
  • Ensured current message with files is included in the context for AI processing
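The session-less fix described above can be sketched as follows. This is a minimal illustration, not the actual chat.py code; `Message` and `build_context` are hypothetical names. The idea is that a missing chat session no longer drops the current message or its attachments:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Message:
    text: str
    file_ids: list = field(default_factory=list)

def build_context(session_messages: Optional[list], current: Message) -> list:
    """Assemble the message context sent to the model.

    Before the fix, a missing chat session meant the current message's
    files were never forwarded; here the current message (with its
    attachments) is always appended, session or not.
    """
    context = list(session_messages) if session_messages else []
    context.append({"text": current.text, "files": current.file_ids})
    return context
```

With no session, `build_context(None, Message("what is this?", ["img1"]))` still yields a one-entry context carrying the attachment.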

backend/utils/other/chat_file.py

  • Images are now stored locally (_chat_files/) instead of being uploaded to the OpenAI Files API
    • The Vision API requires base64-encoded images, not file IDs
    • Faster and more reliable for image processing
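Vision-capable chat endpoints take images as base64 data URLs embedded in the message rather than Files API IDs. A minimal sketch of that encoding step (the helper name `to_data_url` is illustrative, not from the PR):

```python
import base64
import mimetypes
from pathlib import Path

def to_data_url(path: str) -> str:
    """Read a locally stored chat image and encode it as a base64
    data URL, the form a vision-model message part expects."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    b64 = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"
```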

backend/utils/llm/chat.py

  • Made qa_rag_stream async to support the vision API
  • Added image detection in messages
  • When images are detected:
    • Reads images from local storage
    • Base64-encodes them for the vision API
    • Constructs vision-compatible prompts
    • Uses OpenAI's GPT-4 vision model
  • Extensive logging for debugging the vision flow
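The prompt-construction step above might look like this sketch. The message shape follows OpenAI's multimodal chat-completions format (text and `image_url` content parts); the function name is an assumption, not the PR's actual helper:

```python
def build_vision_messages(question: str, image_data_urls: list) -> list:
    """Construct a chat-completions payload mixing text and images.

    Each image goes in as an `image_url` content part alongside the
    user's text, which is the structure vision-capable models expect.
    """
    content = [{"type": "text", "text": question}]
    for url in image_data_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]
```

The resulting list can be passed as the `messages` argument of a chat-completions call against a vision-capable model.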

backend/utils/retrieval/graph.py

  • Smart routing: Images → Vision API, Documents → Assistants API
  • Prevents images from being sent to the wrong handler
  • Forces context-dependent conversation path for vision
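A sketch of such a routing predicate, keyed on attachment MIME types (names are hypothetical; the real graph.py logic may differ):

```python
IMAGE_MIMES = {"image/png", "image/jpeg", "image/gif", "image/webp"}

def route_attachments(attachments: list) -> str:
    """Decide which handler a message's attachments should go to.

    Images take the vision path; everything else (PDFs, text files)
    stays on the Assistants API path. A mixed batch routes to vision
    so images are never dropped into the document handler.
    """
    mimes = {a["mime_type"] for a in attachments}
    if mimes & IMAGE_MIMES:
        return "vision"
    return "assistants"
```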

@aaravgarg requested a review from beastoin on October 12, 2025 22:17
@kodjima33

@thinhx can you merge this asap? need to launch it on twitter


beastoin commented Oct 16, 2025

hey man, your PR is good on the UI side, but the backend is not quite there yet

1/ the flow of chat with images is off. chat with images is basically a new node of the graph, if you've read about the graph. it should not sit at the end of the graph but be routed to at some point within it, and the current implementation is not quite right[1]. you should keep the chat session logic at the top to prevent further breakage of the current chat system.

2/ local images: our deployment runs on multiple instances, so you cannot save an image locally on one instance and read it back from another. it's a distributed system.

my suggestion: define a new node for chat with images and feel free to roll out your implementation there. use Google Cloud Storage to store the image files[2].

[1]: use your mobile/desktop app and chat with omi from your backend a bit more; try switching between image and non-image messages and you will see the issue.
[2]: note that if you cannot achieve encryption in this ticket, let's create a new one, since we need to encrypt images before storing them in Google Storage.
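A sketch of the suggested Cloud Storage approach. The bucket name, path scheme, and helper names are assumptions; the `blob`/`upload_from_string` calls mirror the `google-cloud-storage` Python client. Taking the bucket as a parameter keeps the function instance-agnostic and testable:

```python
import uuid

def chat_image_path(uid: str, filename: str) -> str:
    """Namespace chat images per user so any instance can resolve them."""
    return f"chat_images/{uid}/{uuid.uuid4().hex}_{filename}"

def upload_chat_image(bucket, uid: str, data: bytes, filename: str,
                      content_type: str = "image/png") -> str:
    """Upload image bytes to a shared GCS bucket and return the blob path.

    `bucket` is a google.cloud.storage Bucket (or anything with the same
    blob/upload interface), e.g.:
        bucket = storage.Client().bucket("omi-chat-files")  # name assumed
    """
    path = chat_image_path(uid, filename)
    blob = bucket.blob(path)
    blob.upload_from_string(data, content_type=content_type)
    return path
```

Because storage is shared, whichever instance handles the follow-up request can fetch the same blob path; per-comment [2], encryption before upload would still need its own ticket.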


beastoin commented Nov 4, 2025

to move faster, i have reverted the backend changes and merged your UI changes into #3368

@arsenkylyshbek fyi ~

next: please pull main and test it carefully, then ask Mohsin to help deploy it to the App Store
