
AMD GPU (DirectML) Optimization for Live Mode (No README changes)#1726

Open
ozp3 wants to merge 4 commits into hacksider:main from ozp3:amd-dml-optimization-v2

Conversation

ozp3 (Contributor) commented Apr 1, 2026

As requested, this PR contains the exact same code optimizations from #1710, but excludes any modifications to the README.md file.

Summary by Sourcery

Optimize live webcam face swapping for DirectML/AMD GPUs and improve responsiveness and stability in live mode.

Bug Fixes:

  • Ensure face analysis and swapping operations are serialized via a global DirectML lock to avoid concurrent execution issues in live mode.

Enhancements:

  • Reduce default live preview resolution for improved performance, especially on constrained GPUs.
  • Warm up face analyser and face swapper models when starting webcam preview to reduce initial latency.
  • Move face detection from a dedicated thread into the processing loop with cached results updated every few frames to reduce overhead.
  • Increase the idle sleep interval when no frame is available and duplicate frames before GPU color conversion to improve stability and resource usage.
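The serialization described above can be sketched as follows. This is a minimal illustration of the PR's approach, not its exact code: `dml_lock` mirrors the lock the PR adds to `modules.globals`, while `_dml_inference` and the function bodies are stand-ins for the real analyser and swapper calls.

```python
import threading

# Mirrors the PR's modules.globals.dml_lock; bodies below are stand-ins.
dml_lock = threading.Lock()

def _dml_inference(x):
    # Placeholder for an ONNX Runtime / DirectML session call.
    return x * 2

def get_one_face(frame):
    # Analysis and swapping share the one lock, so DML calls never overlap.
    with dml_lock:
        return _dml_inference(frame)

def swap_face(source_face, target_face, temp_frame):
    with dml_lock:
        return _dml_inference(temp_frame)
```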

ozp3 and others added 3 commits April 1, 2026 18:21
remove lnk and bat files as requested
remove lnk and bat files as requested
sourcery-ai bot (Contributor) commented Apr 1, 2026

Reviewer's Guide

Introduces AMD GPU / DirectML-friendly live mode optimizations by serializing DirectML calls with a global lock, adjusting live preview behavior, and simplifying the live webcam detection pipeline while keeping existing functionality intact.

Sequence diagram for live webcam processing with DirectML lock and inline detection

sequenceDiagram
    actor User
    participant UI as ui_webcam_preview
    participant CaptureThread
    participant ProcessingThread
    participant FaceAnalyser
    participant FaceSwapper
    participant DMLLock as modules_globals_dml_lock

    User->>UI: click_start_webcam_preview(camera_index)
    UI->>FaceAnalyser: get_face_analyser()
    UI->>FaceSwapper: get_face_swapper()
    UI->>CaptureThread: start_capture_thread()
    UI->>ProcessingThread: start_processing_thread()
    Note over CaptureThread,ProcessingThread: Detection thread is not started

    loop capture_frames
        CaptureThread->>CaptureThread: read_frame_from_camera()
        CaptureThread-->>ProcessingThread: push_frame_to_capture_queue(frame)
    end

    loop process_frames
        ProcessingThread->>ProcessingThread: pop_frame_from_capture_queue()
        alt every_third_frame
            ProcessingThread->>FaceAnalyser: get_one_face_or_get_many_faces(temp_frame)
            activate FaceAnalyser
            FaceAnalyser->>DMLLock: acquire()
            FaceAnalyser-->>FaceAnalyser: DirectML_inference()
            FaceAnalyser->>DMLLock: release()
            deactivate FaceAnalyser
            ProcessingThread-->>ProcessingThread: update_detection_result_cache()
        else reuse_cached_detection
            ProcessingThread-->>ProcessingThread: read_detection_result_cache()
        end

        ProcessingThread-->>FaceSwapper: swap_face(source_face,target_face,temp_frame)
        activate FaceSwapper
        FaceSwapper->>DMLLock: acquire()
        FaceSwapper-->>FaceSwapper: DirectML_inference()
        FaceSwapper->>DMLLock: release()
        deactivate FaceSwapper

        ProcessingThread-->>UI: push_processed_frame_to_display_queue()
        UI-->>User: show_live_preview_frame()
    end

Class diagram for modules using DirectML lock and live processing changes

classDiagram

    class modules_globals {
        +dml_lock Lock
    }

    class face_analyser {
        +get_face_analyser()
        +get_one_face(frame)
        +get_many_faces(frame)
    }

    class face_swapper {
        +get_face_swapper()
        +swap_face(source_face,target_face,temp_frame)
    }

    class ui_live_webcam {
        +webcam_preview(root,camera_index)
        +create_webcam_preview(camera_index)
        +_processing_thread_func(capture_queue,processed_queue,stop_event,latest_frame_holder,detection_result,detection_lock)
        +_detection_thread_func(latest_frame_holder,detection_result,detection_lock,stop_event)
    }

    modules_globals <.. face_analyser : uses_dml_lock
    modules_globals <.. face_swapper : uses_dml_lock

    face_analyser <.. ui_live_webcam : detection_calls
    face_swapper <.. ui_live_webcam : swapping_calls

    face_analyser : +get_one_face(frame) uses dml_lock
    face_analyser : +get_many_faces(frame) uses dml_lock
    face_swapper : +swap_face(source_face,target_face,temp_frame) uses dml_lock

    ui_live_webcam : +webcam_preview preloads_face_analyser_and_face_swapper
    ui_live_webcam : +_processing_thread_func inlines_detection_every_third_frame
    ui_live_webcam : -_detection_thread_func disabled_in_live_mode

File-Level Changes

Change Details Files
Serialize DirectML-related face analysis and swapping operations to avoid concurrent access issues on AMD GPUs.
  • Wrap face analysis calls in get_one_face and get_many_faces with a global dml_lock to ensure exclusive access to the analyser
  • Wrap face_swapper.get in swap_face with the same global dml_lock to serialize DirectML inference execution
  • Introduce a global dml_lock in modules.globals backed by threading.Lock
modules/face_analyser.py
modules/processors/frame/face_swapper.py
modules/globals.py
Refactor live webcam detection/processing pipeline to run detection inline on the processing thread with frame-skipping and cached results for performance and stability.
  • Disable the separate detection thread and move detection logic into the processing thread
  • Run detection every 3 frames and reuse cached detection results on intermediate frames to reduce DirectML load
  • Adjust handling of target and many_faces detection results to work with the new inline detection flow
  • Increase the wait time when no frame is available to reduce busy-waiting
modules/ui.py
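The frame-skipping logic described in these bullets can be sketched as below. The PR stores its counter differently (as a function attribute on `_processing_thread_func`); here the cadence is shown with a plain loop, and `process_frames`/`DETECT_EVERY` are illustrative names, not the project's API.

```python
DETECT_EVERY = 3  # the PR runs detection on every third frame

def process_frames(frames, detect):
    """Run detect() every DETECT_EVERY frames; reuse the cached result otherwise."""
    cached = None
    out = []
    for i, frame in enumerate(frames):
        if i % DETECT_EVERY == 0 or cached is None:
            cached = detect(frame)   # expensive DirectML inference
        out.append((frame, cached))  # intermediate frames reuse the cache
    return out
```

On intermediate frames the swapper still runs per frame; only the detection result is reused, which is why faces can lag slightly during fast motion but overall DML load drops.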
Adjust live preview and initialization behavior for better stability and performance in live mode.
  • Reduce default preview resolution from 960x540 to 640x360 to lighten GPU/CPU load during live preview
  • Ensure frames are copied before color conversion in the display path to avoid side effects on shared arrays
  • Eagerly initialize face_analyser and face_swapper when starting webcam preview to avoid first-frame stalls
  • Leave a commented-out hook in core.run for preloading face_analyser on GUI startup (no behavioral change)
modules/ui.py
modules/core.py
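The warm-up and copy-before-conversion behaviors above can be illustrated with the following sketch. Function names are hypothetical; in the real code the copy happens before `cv2.cvtColor` on a NumPy frame, and the eager initialization calls the project's `get_face_analyser`/`get_face_swapper`.

```python
def warm_up(get_analyser, get_swapper):
    # Build both models eagerly so the first live frame doesn't stall on lazy init.
    return get_analyser(), get_swapper()

def to_display(frame):
    # Copy before converting so the shared capture buffer isn't mutated in place.
    safe = frame.copy()
    safe[0] = 255  # stand-in for an in-place color-conversion step
    return safe
```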


sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • Now that the detection thread is effectively disabled (det_thread.start() commented out) but _detection_thread_func and detection_lock are still wired up, consider either removing or clearly gating this unused threading code to avoid confusion and accidental re‑activation later.
  • Using a function attribute (_processing_thread_func._det_count) to track detection cadence hides mutable state on the function object; consider moving this counter into a small helper class or a closure-local dict to keep state management more explicit and testable.
  • The new dml_lock is a global lock shared by both face analysis and swapping; if future work adds more DirectML consumers, it may be worth encapsulating this in a dedicated DML/ORT execution manager to avoid ad‑hoc locking scattered across modules.
## Individual Comments

### Comment 1
<location path="modules/globals.py" line_range="75-76" />
<code_context>

 # --- END OF FILE globals.py ---
+
+import threading
+dml_lock = threading.Lock()
</code_context>
<issue_to_address>
**suggestion (performance):** Using a single global DML lock for both analysis and swapping may cause unnecessary serialization

`dml_lock` is now taken for both `get_one_face`/`get_many_faces` and `swap_face`, which serializes all ONNX/DML work and removes parallelism between analysis and swapping. This may become a throughput bottleneck on capable hardware. If the root issue is driver/ORT instability under concurrency, consider narrowing the lock scope (e.g., per-session/per-ORT instance) or clearly documenting where/when this global lock must be used so future changes don’t over‑serialize work.
</issue_to_address>
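The narrower, per-session locking the reviewer suggests could look like the sketch below: each ORT/DML session gets its own lock, so analysis and swapping can overlap while any single session is still used by one thread at a time. The `DMLSession` class and its stand-in inference callables are hypothetical, not part of the PR.

```python
import threading

class DMLSession:
    """Hypothetical wrapper: one lock per ORT/DML session instead of one
    global lock, so distinct sessions can run concurrently."""
    def __init__(self, run_fn):
        self._run = run_fn           # stand-in for session.run(...)
        self._lock = threading.Lock()

    def run(self, *args):
        with self._lock:
            return self._run(*args)

analyser_session = DMLSession(lambda frame: frame)
swapper_session = DMLSession(lambda src, frame: frame)
```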


ozp3 (Author) commented Apr 1, 2026

lmk if you want something else

# single thread doubles cuda performance - needs to be set before torch import
if any(arg.startswith('--execution-provider') for arg in sys.argv):
-    os.environ['OMP_NUM_THREADS'] = '1'
+    os.environ['OMP_NUM_THREADS'] = '6'
A reviewer commented:

Why? The comment above this literally tells you why it was set at 1

ozp3 (Author) replied:

No cuda for AMD gpus

The reviewer replied:

Yes I know. But I'm pretty sure that threading change affects every type of computer doesn't it? There's no hardware specific threading change
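One way to resolve this disagreement would be to gate the thread count on which execution provider was actually selected, keeping the single-thread CUDA speedup while relaxing the limit only for non-CUDA providers. The helper below is a hedged sketch of that idea; `configure_omp_threads` and the specific values are hypothetical, not code from the PR.

```python
import os

def configure_omp_threads(argv):
    # Parse --execution-provider from argv; supports "--flag value"
    # and "--flag=value" forms. Helper name and policy are hypothetical.
    provider = None
    for i, arg in enumerate(argv):
        if arg.startswith('--execution-provider'):
            if '=' in arg:
                provider = arg.split('=', 1)[1]
            elif i + 1 < len(argv):
                provider = argv[i + 1]
    if provider == 'cuda':
        os.environ['OMP_NUM_THREADS'] = '1'  # single thread doubles CUDA perf
    elif provider is not None:
        os.environ['OMP_NUM_THREADS'] = '6'  # looser limit for e.g. DirectML
```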
