-
Couldn't load subscription status.
- Fork 648
Remove texture-based video processing in NvDecoder #6076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors the legacy NvDecoder video reader by removing texture-based frame processing and replacing it with direct VideoColorSpaceConversion calls. The TextureObject class and all texture caching infrastructure have been eliminated, simplifying the codebase and aligning the legacy reader with the experimental video reader implementation. The convert_frame method has been inlined into receive_frames, eliminating the indirection through process_frame and the textures_ map that cached CUDA texture objects for luma and chroma planes. This change maintains functionality while reducing code complexity by approximately 100 lines across the header and implementation files.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 2/5 | Removed TextureObject class, texture management structs, and get_textures method; orphaned TexID and tex_hash remain unused |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Inlined color space conversion into receive_frames with hardcoded stride calculation that breaks non-RGB modes |
Confidence score: 2/5
- This PR contains a critical bug in the stride calculation and incomplete cleanup that will cause failures in production
- Score reflects hardcoded stride
sequence.width * 3that assumes 3 channels (RGB only) instead of usingsequence.channels, breaking YUV and other non-RGB output modes; unused TexID/tex_hash definitions suggest incomplete refactoring; missing full-range handling for non-RGB conversions in lines 389-392 - Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc lines 398 and 389-392, and verify whether TexID/tex_hash in the header file can be safely removed
Additional Comments (1)
-
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 179-200 (link)style: TexID and tex_hash are now unused after texture removal but remain in the header. Are TexID and tex_hash still needed for any other purpose, or should they be removed along with the texture-related code?
2 files reviewed, 4 comments
| TYPE_SWITCH(dtype_, type2id, OutType, NVDECODER_SUPPORTED_TYPES, ( | ||
| auto* tensor_out = sequence.sequence.mutable_data<OutType>() + frame_stride; | ||
| VideoColorSpaceConversion( | ||
| reinterpret_cast<OutType *>(tensor_out), sequence.width * 3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Stride calculation uses sequence.width * 3 but should use sequence.width * sequence.channels to match the frame_stride calculation pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| auto conversion_type = rgb_ ? | ||
| is_full_range ? VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB_FULL_RANGE : | ||
| VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB : | ||
| VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_UPSAMPLE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: The conversion_type ternary doesn't cover the YUV case when !rgb_ with full_range=true. Should full_range affect YUV_UPSAMPLE? Should VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_UPSAMPLE have a full-range variant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is intended. is_full_range is applicable only to the RGB onversion.
| stream_); | ||
| ), DALI_FAIL(make_string("Unsupported type: ", dtype_))); | ||
| // synchronize before MappedFrame is destroyed and cuvidUnmapVideoFrame is called | ||
| CUDA_CALL(cudaStreamSynchronize(stream_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Stream synchronization happens inside the loop for each frame. Consider moving to line 421 after all frames are processed to improve performance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended, as frames are mapped separately for each iteration and the synchronization should happen before we call cuvidUnmapVideoFrame at the end of it.
|
CI MESSAGE: [37191286]: BUILD STARTED |
8e4ea64 to
2406771
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors the legacy video decoder by eliminating the texture-based frame processing pipeline in favor of direct color space conversion. The change removes the TextureObject class (which wrapped CUDA texture management), the convert_frame method that performed texture-based conversions, and the textures_ map that cached texture objects keyed by input pointer and dimensions. Instead, receive_frames() now calls VideoColorSpaceConversion directly on the raw decoded frame data. This simplifies the codebase by removing an intermediate caching layer and aligns the legacy reader with the experimental video reader, where texture usage provided minimal performance benefit given the single-use nature of video frames.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 4/5 | Removed TextureObject class, convert_frame/get_textures methods, and textures_ member; left behind unused TexID typedef, tex_hash struct, and std::hash specialization that should be cleaned up |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 3/5 | Replaced texture-based conversion with direct VideoColorSpaceConversion calls; stride calculation uses hardcoded * 3 instead of * sequence.channels, and stream synchronization moved inside frame loop instead of after all frames |
Confidence score: 3/5
- This PR simplifies the video decoder but introduces potential memory corruption and performance issues that may surface in production
- Score reflects three critical issues: (1) hardcoded stride multiplier causing buffer mismatch for non-RGB formats, (2) missing full-range handling for YUV upsample conversion type, (3) redundant per-frame stream synchronization
- Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc lines 398 and 407-408 where stride calculation and synchronization logic need correction
Sequence Diagram
sequenceDiagram
participant User
participant NvDecoder
participant Parser as CUVideoParser
participant Decoder as CUVideoDecoder
participant ColorSpace as VideoColorSpaceConversion
participant FrameQueue as ThreadSafeQueue
User->>NvDecoder: push_req(FrameReq)
activate NvDecoder
NvDecoder->>FrameQueue: push(req)
deactivate NvDecoder
User->>NvDecoder: decode_packet(AVPacket)
activate NvDecoder
NvDecoder->>NvDecoder: decode_av_packet()
NvDecoder->>Parser: cuvidParseVideoData()
activate Parser
Parser-->>NvDecoder: handle_sequence(CUVIDEOFORMAT)
NvDecoder->>Decoder: initialize(format)
Parser-->>NvDecoder: handle_decode(CUVIDPICPARAMS)
NvDecoder->>Decoder: cuvidDecodePicture()
Parser-->>NvDecoder: handle_display(CUVIDPARSERDISPINFO)
NvDecoder->>FrameQueue: push(disp_info)
deactivate Parser
deactivate NvDecoder
User->>NvDecoder: receive_frames(SequenceWrapper)
activate NvDecoder
loop for each frame
NvDecoder->>FrameQueue: pop()
FrameQueue-->>NvDecoder: disp_info
NvDecoder->>Decoder: cuvidMapVideoFrame()
Decoder-->>NvDecoder: MappedFrame
NvDecoder->>ColorSpace: VideoColorSpaceConversion()
activate ColorSpace
ColorSpace->>ColorSpace: YUV to RGB / Upsample
ColorSpace-->>NvDecoder: converted frame
deactivate ColorSpace
NvDecoder->>NvDecoder: cudaStreamSynchronize()
NvDecoder->>Decoder: cuvidUnmapVideoFrame()
end
NvDecoder->>NvDecoder: record_sequence_event_()
deactivate NvDecoder
NvDecoder-->>User: SequenceWrapper with frames
Additional Comments (1)
-
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 179-200 (link)style: TexID, tex_hash, and the hash template specialization (lines 179-220) are now unused after removing textures. Consider removing this dead code.
2 files reviewed, 1 comment
2406771 to
3b1876e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR removes texture-based video frame processing from the legacy NvDecoder implementation, replacing it with direct VideoColorSpaceConversion calls. The change eliminates the TextureObject class, texture caching (textures_ member), and associated methods (get_textures, convert_frame), simplifying the video decode pipeline to align with the experimental video reader implementation. The refactor preserves existing color space conversion functionality (YUV to RGB/upsampling with full-range support) while removing an intermediate texture layer that the developers determined provided insufficient performance benefit. This unification reduces code complexity in the legacy video reader path within the DALI data loading library.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 1/5 | Removed TextureObject class, texture management methods, and textures_ member, but left dangling tex_hash struct that references undefined TexID type causing compilation errors |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Replaced texture-based conversion with direct VideoColorSpaceConversion call in receive_frames, but stride parameter may be incorrect (elements vs bytes), and synchronization happens inefficiently inside the frame loop |
Confidence score: 2/5
- This PR contains critical compilation errors and potential runtime issues that make it unsafe to merge in its current state
- Score reduced because (1) the header leaves orphaned
tex_hashstruct referencing removedTexIDtype breaking compilation, (2) stride calculation may pass wrong units toVideoColorSpaceConversion, (3) stream synchronization inefficiently placed inside loop, and (4) full-range YUV upsample conversion path may be incomplete - Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h lines 179-198 (dangling tex_hash struct), nvdecoder.cc lines 394-395 (stride parameter calculation), and lines 387-392 (conversion_type logic for YUV full-range case)
Additional Comments (2)
-
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 179-198 (link)style: The entire
tex_hashstruct and itshash_combineimplementation are now unused. They were only needed for the deletedstd::unordered_map<TexID, TextureObjects, tex_hash> textures_member. Remove this dead code. -
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 188 (link)logic:
TexIDtype is referenced here but is no longer defined anywhere in the file. This will cause a compilation error.
2 files reviewed, 2 comments
3b1876e to
4e4f262
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors the NvDecoder video reader by eliminating texture-based frame conversion in favor of direct VideoColorSpaceConversion calls. The TextureObject class, texture caching map (textures_), and the convert_frame method have been removed. The new implementation directly calls VideoColorSpaceConversion with raw device pointers in the receive_frames loop, passing full-range metadata stored in the frame_full_range_ vector. This change unifies the legacy video reader with the experimental reader by removing a layer of indirection that provided minimal performance benefit.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 4/5 | Removed TextureObject class, texture map member, and get_textures method; cleaned up texture-related includes and hash specialization |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Replaced texture-based convert_frame with inline VideoColorSpaceConversion calls; contains stride calculation bug and incomplete full-range logic |
Confidence score: 2/5
- This PR has critical bugs that will likely cause runtime errors and incorrect output for certain configurations
- The stride calculation error (line 398) uses hardcoded
3instead ofsequence.channels, which will break non-RGB output modes. The incomplete full-range logic (lines 389-392) means YUV output with full-range color won't be handled correctly. Stream synchronization inside the frame loop (line 413) is inefficient compared to synchronizing once after all frames - Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc lines 386-421, especially the stride calculation, conversion_type ternary, and stream synchronization placement
Additional Comments (1)
-
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 190-199 (link)style: ScaleMethod enum and its hash specialization are now unused after texture removal and can be deleted
2 files reviewed, 1 comment
ce64217 to
f02e194
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR removes the texture-based video frame processing layer from the NvDecoder implementation in DALI's legacy video reader. The change eliminates approximately 90 lines of texture management infrastructure—including the TextureObject class, texture caching (textures_ map), and the convert_frame method—replacing it with direct calls to VideoColorSpaceConversion. This unification aligns the legacy reader with the experimental one by removing an abstraction layer that profiling showed provided minimal performance benefit. The core color space conversion logic is preserved but moved inline into the receive_frames method, where conversion type (full-range YUV-to-RGB, standard YUV-to-RGB, or YUV upsampling) is now determined directly based on the rgb_ flag and per-frame full_range metadata.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 3/5 | Removes TextureObject/TextureObjects classes, texture caching infrastructure (TexID, tex_hash), textures_ map, get_textures and convert_frame methods; leaves unused includes and ScaleMethod enum |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Replaces texture-based conversion with direct VideoColorSpaceConversion calls; moves conversion type logic inline; retains per-frame cudaStreamSynchronize inside loop |
Confidence score: 2/5
- This PR has significant correctness and performance concerns that should be addressed before merging
- Score reduced due to: (1) stride parameter mismatch between comment and code (line 398uses
sequence.width * sequence.channelsbut historical comment suggestssequence.width * 3), (2) incomplete conversion_type logic that doesn't handle full-range YUV_UPSAMPLE case when!rgb_andis_full_range=true, (3) per-frame synchronization in loop (line 407) causing performance overhead, (4) unused artifacts (ScaleMethod enum, tuple/unordered_map includes) left in header suggesting incomplete cleanup - Critical attention needed: stride calculation at line 398 in nvdecoder.cc, conversion_type ternary logic at lines 392-395, and synchronization placement at line 407
2 files reviewed, no comments
|
!build |
|
CI MESSAGE: [37212149]: BUILD STARTED |
|
CI MESSAGE: [37212149]: BUILD FAILED |
3960235 to
68b46b6
Compare
|
!build |
1 similar comment
|
!build |
|
CI MESSAGE: [37259971]: BUILD STARTED |
|
CI MESSAGE: [37259971]: BUILD FAILED |
68b46b6 to
10e9397
Compare
|
!build |
|
CI MESSAGE: [37290525]: BUILD STARTED |
|
CI MESSAGE: [37290525]: BUILD FAILED |
10e9397 to
8df03e0
Compare
|
!build |
|
CI MESSAGE: [37301989]: BUILD STARTED |
|
CI MESSAGE: [37301989]: BUILD FAILED |
8df03e0 to
7ff531f
Compare
|
CI MESSAGE: [37306863]: BUILD PASSED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors the NvDecoder video reader by removing texture-based frame processing in favor of direct VideoColorSpaceConversion calls. The texture-based approach used CUDA texture objects with a caching layer (textures_ map) to abstract GPU memory access, but added complexity without meaningful performance gains. The refactoring deletes ~400 lines of texture infrastructure including the TextureObject RAII wrapper class (nvdecoder.cc), the imgproc.h/cu kernel files containing texture-based YCbCr conversion kernels, and the convert_frame/get_textures methods. Frame conversion now happens inline in receive_frames() by calling VideoColorSpaceConversion directly on mapped frame buffers, determining the conversion type based on rgb_ and full_range flags. The test suite is updated to tolerate slightly higher numerical differences between CPU and GPU paths due to implementation variations. This change unifies the legacy video reader with the experimental implementation by simplifying the code path while maintaining equivalent functionality.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/imgproc.h | 5/5 | Removed entire header file containing process_frame texture-based conversion interface |
| dali/operators/video/legacy/reader/nvdecoder/imgproc.cu | 5/5 | Deleted ~245 lines of texture-based YCbCr-to-RGB CUDA kernels with template specializations |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 5/5 | Removed TextureObject class, ScaleMethod enum, texture management infrastructure and related methods |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 3/5 | Replaced texture-based conversion with inline VideoColorSpaceConversion call; potential synchronization inefficiency |
| dali/test/python/test_video_reader.py | 4/5 | Added configurable threshold for frame comparison to handle numerical differences; typo in variable name |
Confidence score: 3/5
- This PR requires careful review due to logic gaps and performance concerns despite being mostly a simplification
- Score reflects three significant issues: (1) per-frame synchronization in a loop creates performance bottleneck, (2) conversion_type logic doesn't handle full-range YUV case (missing branch when
!rgb_ && is_full_range), and (3) typoadditonal_argsshould beadditional_argsin test file. These are straightforward fixes but affect correctness and performance. - Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc lines385-405 for the conversion logic and synchronization pattern
Sequence Diagram
sequenceDiagram
participant User
participant NvDecoder
participant FrameQueue
participant MappedFrame
participant VideoColorSpaceConversion
participant SequenceWrapper
participant CUDAStream
User->>NvDecoder: receive_frames(sequence)
loop for each frame in sequence
NvDecoder->>FrameQueue: pop()
FrameQueue-->>NvDecoder: frame_disp_info
NvDecoder->>MappedFrame: create(frame_disp_info, decoder_, stream_)
MappedFrame->>MappedFrame: cuvidMapVideoFrame()
NvDecoder->>NvDecoder: determine conversion_type (YUV to RGB or YUV upsample)
NvDecoder->>VideoColorSpaceConversion: convert(tensor_out, frame.get_ptr(), width, height, conversion_type, normalized_, stream_)
VideoColorSpaceConversion-->>NvDecoder: conversion complete
NvDecoder->>CUDAStream: cudaStreamSynchronize(stream_)
NvDecoder->>MappedFrame: destroy
MappedFrame->>MappedFrame: cuvidUnmapVideoFrame()
NvDecoder->>SequenceWrapper: add timestamp
NvDecoder->>NvDecoder: mark frame_in_use_ as false
end
NvDecoder->>SequenceWrapper: set_started(stream_)
NvDecoder-->>User: sequence populated
Additional Comments (2)
-
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 28 (link)style: std::tuple include appears unused after removing TexID type definition
-
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 29 (link)style: std::unordered_map include appears unused after removing textures_ member
5 files reviewed, 3 comments
| if device == "cpu": | ||
| additonal_args = {"threshold": 0.06} | ||
| else: | ||
| additonal_args = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax: Typo: additonal_args should be additional_args
| if device == "cpu": | |
| additonal_args = {"threshold": 0.06} | |
| else: | |
| additonal_args = {} | |
| additional_args = {"threshold": 0.06} | |
| else: | |
| additional_args = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
ab8247e to
cc0a927
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors the NvDecoder video processing pipeline by removing texture-based color space conversion in favor of direct VideoColorSpaceConversion calls. The changes eliminate approximately 300 lines of texture management infrastructure—including the TextureObject RAII wrapper, texture caching via textures_ map, and the imgproc.h/.cu kernel implementation—simplifying the legacy video reader to align with the experimental implementation. The core conversion now happens in receive_frames by mapping decoded frames directly, performing YUV-to-RGB or YUV upsampling via VideoColorSpaceConversion, and unmapping after synchronization. This unification reduces code complexity while maintaining functional parity with the previous texture-based approach.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/imgproc.h | 5/5 | Deleted header declaring texture-based process_frame function |
| dali/operators/video/legacy/reader/nvdecoder/imgproc.cu | 5/5 | Deleted CUDA kernels for YCbCr-to-RGB conversion via texture objects |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 5/5 | Removed TextureObject class, textures_ map, get_textures method, and ScaleMethod enum |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 3/5 | Replaced texture-based conversion with direct VideoColorSpaceConversion calls; per-frame synchronization may impact performance |
| dali/test/python/test_video_reader.py | 2/5 | Added device-specific comparison thresholds (6% CPU, 3% GPU) with syntax error in additional_args assignment |
Confidence score: 3/5
- This PR is relatively safe to merge but contains a syntax error and performance/correctness concerns that require resolution before merging.
- Score reflects: (1) syntax error in test file with extra spaces before
=, (2) per-frame synchronization in the conversion loop that could degrade performance compared to batch synchronization after all frames are processed, (3) increased CPU test threshold (6%) suggesting potential regression that needs validation, and (4) unclear handling of full-range YUV upsampling case in the conversion_type logic. - Pay close attention to
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc(conversion loop synchronization and conversion_type logic) anddali/test/python/test_video_reader.py(syntax error and threshold justification).
Sequence Diagram
sequenceDiagram
participant User
participant NvDecoder
participant CUVideoParser
participant CUVideoDecoder
participant VideoColorSpaceConversion
participant CUDA
User->>NvDecoder: "decode_av_packet(AVPacket, start_time, stream_base)"
NvDecoder->>CUVideoParser: "cuvidParseVideoData(parser_, &cupkt)"
Note over CUVideoParser: Parser triggers callbacks
CUVideoParser->>NvDecoder: "handle_sequence(user_data, format)"
NvDecoder->>NvDecoder: "handle_sequence_(format)"
NvDecoder->>CUVideoDecoder: "initialize(format)"
NvDecoder-->>CUVideoParser: "return success/failure"
CUVideoParser->>NvDecoder: "handle_decode(user_data, pic_params)"
NvDecoder->>NvDecoder: "handle_decode_(pic_params)"
NvDecoder->>CUVideoDecoder: "cuvidDecodePicture(decoder_, pic_params)"
NvDecoder-->>CUVideoParser: "return success/failure"
CUVideoParser->>NvDecoder: "handle_display(user_data, disp_info)"
NvDecoder->>NvDecoder: "handle_display_(disp_info)"
NvDecoder->>NvDecoder: "frame_queue_.push(disp_info)"
NvDecoder-->>CUVideoParser: "return success/failure"
NvDecoder-->>User: "return VidReqStatus"
User->>NvDecoder: "receive_frames(SequenceWrapper)"
loop For each frame
NvDecoder->>NvDecoder: "frame_queue_.pop()"
NvDecoder->>NvDecoder: "MappedFrame(disp_info, decoder_, stream_)"
NvDecoder->>CUVideoDecoder: "cuvidMapVideoFrame(decoder_, picture_index)"
CUVideoDecoder-->>NvDecoder: "ptr_, pitch_"
NvDecoder->>VideoColorSpaceConversion: "VideoColorSpaceConversion(tensor_out, width, frame_ptr, pitch, height, conversion_type, normalized_, stream_)"
VideoColorSpaceConversion->>CUDA: "Launch conversion kernel"
CUDA-->>VideoColorSpaceConversion: "Complete"
VideoColorSpaceConversion-->>NvDecoder: "return"
NvDecoder->>CUDA: "cudaStreamSynchronize(stream_)"
NvDecoder->>CUVideoDecoder: "cuvidUnmapVideoFrame(decoder_, ptr_)"
end
NvDecoder-->>User: "return"
5 files reviewed, no comments
cc0a927 to
661c83c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors the legacy video decoder by removing the texture-based color space conversion infrastructure and replacing it with direct VideoColorSpaceConversion calls. The changes eliminate the TextureObject class, the textures_ cache (an unordered_map storing texture resources), and the convert_frame method. Instead of creating CUDA texture objects as intermediates and calling process_frame() with texture handles, the decoder now invokes VideoColorSpaceConversion directly with raw CUDA surface pointers from the decoder's frame queue. This simplifies the codebase by removing ~200 lines of texture management code (creation, caching, lifetime tracking) and unifies the legacy and experimental video reader implementations. The conversion now happens inline within receive_frames(), and the output stride calculation switches from ALIGN32(decoder_.width()) (used for texture input) to sequence.width * sequence.channels (matching the direct buffer layout). The test suite adjusts the CPU threshold to 0.06 to account for precision differences in the new direct color space conversion path.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc |
4/5 | Replaces texture-based conversion with direct VideoColorSpaceConversion calls, removes texture caching and TextureObject logic |
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h |
5/5 | Removes TextureObject class, ScaleMethod enum, texture cache member, and related method declarations |
dali/operators/video/legacy/reader/nvdecoder/imgproc.cu |
5/5 | Deletes entire file containing texture-based YCbCr-to-RGB CUDA kernels and process_frame template |
dali/operators/video/legacy/reader/nvdecoder/imgproc.h |
5/5 | Deletes header exposing process_frame function accepting texture objects |
dali/test/python/test_video_reader.py |
4/5 | Adds conditional threshold for CPU tests to handle precision differences in new color space conversion path |
Confidence score: 3/5
- This PR requires careful review due to the architectural change in video processing and unresolved logic concerns
- Score reflects critical open questions: the conversion_type ternary does not handle full_range YUV upsample cases (when
!rgb_ && is_full_range), the stream synchronization remains inside the loop instead of after all frames, and the stride calculation in line 395 may need verification against the actual VideoColorSpaceConversion implementation's stride expectations - Pay close attention to
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cclines 385-402 (conversion type logic and VideoColorSpaceConversion arguments), line 421 (stream sync placement), and verify that VideoColorSpaceConversion correctly handles all color space cases previously supported by the removed imgproc.cu kernels
Sequence Diagram
sequenceDiagram
participant User
participant NvDecoder
participant CUVideoParser
participant CUVideoDecoder
participant VideoColorSpaceConversion
participant CUDA
User->>NvDecoder: "decode_av_packet(AVPacket*, start_time, stream_base)"
NvDecoder->>CUVideoParser: "cuvidParseVideoData(parser_, &cupkt)"
Note over CUVideoParser: Parser triggers callbacks
CUVideoParser->>NvDecoder: "handle_sequence(CUVIDEOFORMAT*)"
NvDecoder->>CUVideoDecoder: "initialize(format)"
CUVideoDecoder-->>NvDecoder: "success/failure"
NvDecoder-->>CUVideoParser: "kNvcuvid_success/failure"
CUVideoParser->>NvDecoder: "handle_decode(CUVIDPICPARAMS*)"
NvDecoder->>CUVideoDecoder: "cuvidDecodePicture(decoder_, pic_params)"
CUVideoDecoder-->>NvDecoder: "success/failure"
NvDecoder-->>CUVideoParser: "kNvcuvid_success/failure"
CUVideoParser->>NvDecoder: "handle_display(CUVIDPARSERDISPINFO*)"
NvDecoder->>NvDecoder: "frame_queue_.push(disp_info)"
NvDecoder-->>CUVideoParser: "kNvcuvid_success/failure"
CUVideoParser-->>NvDecoder: "return status"
NvDecoder-->>User: "VidReqStatus"
User->>NvDecoder: "receive_frames(SequenceWrapper&)"
loop For each frame in sequence
NvDecoder->>NvDecoder: "frame_queue_.pop()"
NvDecoder->>CUVideoDecoder: "cuvidMapVideoFrame(decoder_, picture_index)"
CUVideoDecoder-->>NvDecoder: "ptr, pitch"
NvDecoder->>VideoColorSpaceConversion: "VideoColorSpaceConversion(tensor_out, frame_ptr, pitch, width, height, conversion_type)"
VideoColorSpaceConversion->>CUDA: "Launch kernel on stream_"
CUDA-->>VideoColorSpaceConversion: "complete"
VideoColorSpaceConversion-->>NvDecoder: "complete"
NvDecoder->>CUDA: "cudaStreamSynchronize(stream_)"
CUDA-->>NvDecoder: "sync complete"
NvDecoder->>CUVideoDecoder: "cuvidUnmapVideoFrame(decoder_, ptr)"
CUVideoDecoder-->>NvDecoder: "unmapped"
end
NvDecoder->>NvDecoder: "record_sequence_event_(sequence)"
NvDecoder-->>User: "frames received"
5 files reviewed, no comments
|
|
||
| auto* frame_disp_info = frame_queue_.pop(); | ||
| if (stop_) break; | ||
| auto frame = MappedFrame{frame_disp_info, decoder_, stream_}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| MappedFrame frame{frame_disp_info, decoder_, stream_}; |
This avoids the unnecessary move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the suggestion is displayed weirdly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| for k in range(num_frames): | ||
| compare_frames(sample_experimental[k], sample_legacy[k], i, j, k) | ||
| if device == "cpu": | ||
| additional_args = {"threshold": 0.06} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this PR lower change the precision this much for CPU? What are we comparing against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are comparing to a FFmpeg decoding and frame conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The latest change is a small refactor in dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc at line 381, switching from pointer-style initialization (auto* frame = MappedFrame{...}) to direct brace initialization (MappedFrame frame{...}). This change improves the code by making the RAII ownership semantics of MappedFrame clearer and avoiding an unnecessary move operation. Since MappedFrame is a move-only RAII type that manages the lifetime of a video frame (automatically calling cuvidUnmapVideoFrame on destruction), direct initialization communicates the ownership intent more explicitly and prevents accidental pointer reassignment. This aligns with the broader PR goal of simplifying video decoder code by removing texture-based processing and making frame lifetime management more straightforward.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 5/5 | Changed frame initialization from pointer to direct brace initialization for clearer RAII semantics |
Confidence score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects a simple, well-understood RAII refactoring that removes an unnecessary move operation while improving code clarity
- No files require special attention; the change is a straightforward stylistic improvement
1 file reviewed, no comments
- Removes TextureObject class and related texture management code - Replaces texture-based convert_frame with direct VideoColorSpaceConversion - Removes textures_ member and get_textures method - Unifies the video reader with the experimental one by removing texture usage which don't provide much gain in this use case Signed-off-by: Janusz Lisiecki <[email protected]>
Signed-off-by: Janusz Lisiecki <[email protected]>
…b video conversion. Signed-off-by: Michal Zientkiewicz <[email protected]>
2382e9f to
6b42199
Compare
|
As a co-author, I'm removing myself from review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR removes texture-based video processing from the NvDecoder implementation, replacing it with direct VideoColorSpaceConversion calls. The refactoring deletes the TextureObject class and all associated texture management infrastructure (caching, hash maps, get_textures method), simplifying the video reader by eliminating ~300 lines of texture-specific code. The legacy reader now directly passes raw frame pointers to VideoColorSpaceConversion, matching the experimental reader's approach. The chroma sampling offset in color_space.cu was adjusted from 0.25 to 0.5, and the CPU decoder path received additional SwsContext quality flags (SWS_FULL_CHR_H_INT, SWS_ACCURATE_RND) to compensate for visual differences after texture removal. Test utilities were also updated with clearer error messages.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 4/5 | Removed texture caching and convert_frame method; replaced with inline VideoColorSpaceConversion call using raw frame pointers |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 5/5 | Deleted TextureObject class, hash specializations, and texture-related members/methods |
| dali/operators/video/color_space.cu | 3/5 | Changed chroma sampling offset from 0.25 to 0.5 and refactored to use float intermediates with normalized input/output paths |
| dali/operators/video/legacy/reader/nvdecoder/imgproc.cu | 5/5 | Entire file deleted (245 lines of texture-based YCbCr-to-RGB conversion kernels) |
| dali/operators/video/legacy/reader/nvdecoder/imgproc.h | 5/5 | Entire file deleted (header for texture-based process_frame function) |
| dali/operators/video/frames_decoder_cpu.cc | 4/5 | Added SWS_FULL_CHR_H_INT and SWS_ACCURATE_RND flags to improve CPU color conversion quality |
| dali/test/python/test_video_reader.py | 5/5 | Updated comments and error messages for better readability; no functional changes |
Confidence score: 3/5
- This PR requires careful review due to semantic changes in the color space conversion path and potential visual output differences
- Score reflects concerns about the chroma sampling offset change (0.25→0.5) which could affect video quality if it doesn't match codec chroma siting conventions, plus the incomplete handling of full-range YUV conversion mentioned in previous reviews, and the performance impact of per-frame stream synchronization inside the loop
- Pay close attention to
dali/operators/video/color_space.cu(verify chroma sampling correctness and visual output),dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc(confirm conversion_type logic handles all cases and stream sync placement is optimal), and validate that existing tests thoroughly cover all color space conversion paths with both normalized/non-normalized and full-range/limited-range inputs
Sequence Diagram
sequenceDiagram
participant User
participant NvDecoder
participant CUVideoParser
participant CUVideoDecoder
participant MappedFrame
participant VideoColorSpaceConversion
participant CUDA
User->>NvDecoder: "decode_av_packet(AVPacket)"
NvDecoder->>CUVideoParser: "cuvidParseVideoData(CUVIDSOURCEDATAPACKET)"
CUVideoParser->>NvDecoder: "handle_sequence_(CUVIDEOFORMAT)"
NvDecoder->>CUVideoDecoder: "initialize(format)"
CUVideoParser->>NvDecoder: "handle_decode_(CUVIDPICPARAMS)"
NvDecoder->>CUVideoDecoder: "cuvidDecodePicture(pic_params)"
CUVideoParser->>NvDecoder: "handle_display_(CUVIDPARSERDISPINFO)"
NvDecoder->>NvDecoder: "frame_queue_.push(disp_info)"
User->>NvDecoder: "receive_frames(SequenceWrapper)"
NvDecoder->>NvDecoder: "frame_queue_.pop()"
NvDecoder->>MappedFrame: "MappedFrame(disp_info, decoder, stream)"
MappedFrame->>CUVideoDecoder: "cuvidMapVideoFrame(decoder, picture_index)"
NvDecoder->>VideoColorSpaceConversion: "VideoColorSpaceConversion(output, yuv_data, params)"
VideoColorSpaceConversion->>CUDA: "VideoColorSpaceConversionKernel<<<...>>>()"
CUDA->>CUDA: "YUV to RGB conversion"
NvDecoder->>CUDA: "cudaStreamSynchronize(stream)"
MappedFrame->>CUVideoDecoder: "cuvidUnmapVideoFrame(decoder, ptr)"
NvDecoder->>SequenceWrapper: "return decoded frames"
7 files reviewed, no comments
Signed-off-by: Michal Zientkiewicz <[email protected]>
| Height(), | ||
| sws_output_format, | ||
| SWS_BILINEAR, | ||
| SWS_BILINEAR|SWS_FULL_CHR_H_INT|SWS_ACCURATE_RND, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This caused inaccurate results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This review covers only the changes made since the last review, not the entire PR. The most recent change is a minor formatting adjustment to test_video_reader.py. The developer collapsed a multi-line compare_frames function call into a single line, bringing it into compliance with the repository's 100-character line limit. This is purely a stylistic change with no functional impact, likely triggered by an automated formatter like Black (as configured in pyproject.toml). The change aligns with DALI's coding standards and supports better readability of the test file.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/test/python/test_video_reader.py | 5/5 | Collapsed multi-line function call to single line for style compliance |
Confidence score: 5/5
- This change is safe to merge with no risk of issues
- Score reflects a trivial formatting change with zero functional impact and full alignment with project style guidelines
- No files require special attention; this is a simple whitespace adjustment
1 file reviewed, no comments
|
CI MESSAGE: [37437527]: BUILD STARTED |
Signed-off-by: Michal Zientkiewicz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This summary covers only the changes made since the last review. The PR modifies a single line in the CUDA color space conversion kernel (dali/operators/video/color_space.cu), adjusting the chroma sampling offset from 0.5f to 0.25f for both the X and Y axes. This change corrects chroma siting alignment when upsampling YUV 4:2:0 to 4:4:4 during color space conversion. The offset adjustment repositions where the bilinear sampler reads chroma values, likely switching from one chroma siting convention (e.g., MPEG-1 center-positioned) to another (e.g., MPEG-2 co-sited or JPEG left-positioned). This is part of the broader effort to remove texture-based video processing and switch to direct VideoColorSpaceConversion—the texture path may have handled chroma positioning differently, necessitating this correction to maintain visual quality and avoid color fringing artifacts in the new direct-conversion path.
Important Files Changed
| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/color_space.cu | 3/5 | Modified chroma sampling offset from 0.5f to 0.25f (lines 50, 53) to correct chroma siting alignment during YUV to RGB conversion |
Confidence score: 3/5
- This PR contains a targeted chroma siting adjustment that is likely safe but requires validation against video codec standards and visual quality tests
- Score reflects uncertainty around whether 0.25f is the correct offset for all YUV input formats and chroma siting conventions, and whether this change inadvertently introduces artifacts or requires corresponding updates elsewhere in the color conversion pipeline
- Pay close attention to the chroma offset calculation logic (lines 50-53) and verify this change against test videos with various chroma siting standards (MPEG-1, MPEG-2, JPEG); also confirm the change doesn't conflict with the previously identified stride bug and missing full-range conversion case
1 file reviewed, no comments
|
CI MESSAGE: [37442117]: BUILD STARTED |
|
CI MESSAGE: [37442117]: BUILD FAILED |
which don't provide much gain in this use case
Category:
Other (e.g. Documentation, Tests, Configuration)
Description:
which don't provide much gain in this use case
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-4474