
Conversation


@uruemu uruemu commented Apr 16, 2025

Which issue does this PR close?

Addresses #5702: Split service logic to backend and core modules

Part of #5702.

Rationale for this change

What changes are included in this PR?

This pull request refactors the GridFsBackend by introducing a new GridFsBackendCore struct to host connection functionality and state. GridFsCore implements the kv::Adapter trait, allowing the removal of the Adapter struct.
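
As a rough illustration of the direction (names and fields below are illustrative placeholders, not the exact OpenDAL definitions), the split looks roughly like this:

use std::sync::Arc;

// Connection state and low-level operations live in the core.
#[derive(Clone, Debug)]
pub struct ServiceCore {
    pub root: String,
    pub client: Arc<Client>,
}

// The backend only holds an Arc to the core and delegates the actual work to it.
#[derive(Clone, Debug)]
pub struct ServiceBackend {
    pub core: Arc<ServiceCore>,
}

// Placeholder for whichever client the service wraps (an hdrs::Client, a
// database handle, ...); only here to keep the sketch self-contained.
#[derive(Debug)]
pub struct Client;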

Are there any user-facing changes?

@uruemu uruemu requested a review from Xuanwo as a code owner April 16, 2025 01:10
@dosubot dosubot bot added the size:L, core, releases-note/refactor, and services/hdfs labels Apr 16, 2025
@uruemu uruemu marked this pull request as draft April 17, 2025 03:05

Xuanwo commented Apr 18, 2025

@uruemu uruemu marked this pull request as ready for review April 21, 2025 01:16
@dosubot dosubot bot added the releases-note/feat label Apr 21, 2025

uruemu commented Apr 21, 2025

@Xuanwo Yes, it just took some time to fix that. The PR is ready for review.

@erickguan erickguan left a comment

Thanks for helping with HdfsCore. Hope you had a great Easter.

One comment about how the backend references the core.

#[derive(Clone, Debug)]
pub struct HdfsCore {
    pub info: Arc<AccessorInfo>,
    pub client: Arc<hdrs::Client>,

Member

Minor: If we push this further, we should wrap Client's methods instead of exposing the client. I doubt this will be trouble in the future, so it's either another maintainer's call or you can decide the direction.
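
For instance, a thin wrapper could look something like this (the method name is hypothetical; it simply forwards to the existing hdrs call):

impl HdfsCore {
    // Hypothetical wrapper: callers go through the core instead of reaching
    // into core.client directly.
    pub fn metadata(&self, path: &str) -> std::io::Result<hdrs::Metadata> {
        self.client.metadata(path)
    }
}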

Contributor Author

Good call. I originally started by abstracting the client calls but then decided against it, since the wrappers would be quite thin (mostly wrapping the exact call) and the core is only used internally within this module. I'm leaning toward letting future changes drive this, if there turns out to be a need to abstract more than just the nested client call.

Contributor Author

Alternatively, we can get a clone of the client and reduce the call chain. I'll add that in a new commit and you can let me know what you think.

Contributor Author

I have applied this change here: fb2e349. Let me know if it makes sense. Thanks.

@uruemu uruemu requested a review from erickguan April 23, 2025 01:16
        Ok((
            RpDelete::default(),
            oio::OneShotDeleter::new(HdfsDeleter::new(Arc::new(self.clone()))),
            oio::OneShotDeleter::new(HdfsDeleter::new(Arc::clone(&self.core), self.root.clone())),

Member

👍

@uruemu uruemu requested a review from erickguan April 25, 2025 02:15

@erickguan erickguan left a comment

Thank you, @uruemu. We now have a base to start adding code to HdfsCore. We’ll continue iterating to build out this core. Some implementation details can be extracted into functions within the core itself. This will make the backend easier to read and understand, as it will delegate more of the implementation to either the core or the reader/writer.


    fn info(&self) -> Arc<AccessorInfo> {
        self.info.clone()
        self.core.info()

Member

Use self.core.info.
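
That is, something like (sketch only):

    fn info(&self) -> Arc<AccessorInfo> {
        self.core.info.clone()
    }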

pub struct HdfsBackend {
    pub info: Arc<AccessorInfo>,
    pub root: String,
    root: String,

Member

Move this into HdfsCore. When using the root in the HdfsBackend, use self.core.root.
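
A rough sketch of the suggested layout (the field set is illustrative):

#[derive(Clone, Debug)]
pub struct HdfsCore {
    pub info: Arc<AccessorInfo>,
    pub root: String,
    pub client: Arc<hdrs::Client>,
}

#[derive(Clone, Debug)]
pub struct HdfsBackend {
    pub core: Arc<HdfsCore>,
}

The backend would then read the root as self.core.root.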

    }

    async fn stat(&self, path: &str, _: OpStat) -> Result<RpStat> {
        let p = build_rooted_abs_path(&self.root, path);

Member

From this line down to line 272 (m.set_last_modified(meta.modified().into());), this can be a convenience function, maybe HdfsCore::hdfs_get_metadata. Then we can use hdfs_get_metadata in these functions:

  • blocking_stat
  • stat

self.core.client.metadata is a good function; we will keep using it.
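
A possible shape for that helper (a sketch only: it assumes root has been moved onto HdfsCore and that the usual crate::raw helpers such as build_rooted_abs_path and new_std_io_error are in scope, as they already are in the backend):

impl HdfsCore {
    // Hypothetical convenience function shared by stat and blocking_stat.
    pub fn hdfs_get_metadata(&self, path: &str) -> Result<Metadata> {
        let p = build_rooted_abs_path(&self.root, path);
        let meta = self.client.metadata(&p).map_err(new_std_io_error)?;

        let mode = if meta.is_dir() {
            EntryMode::DIR
        } else {
            EntryMode::FILE
        };
        let mut m = Metadata::new(mode);
        m.set_content_length(meta.len());
        m.set_last_modified(meta.modified().into());
        Ok(m)
    }
}

stat could then just return Ok(RpStat::new(self.core.hdfs_get_metadata(path)?)).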

    }

    async fn read(&self, path: &str, args: OpRead) -> Result<(RpRead, Self::Reader)> {
        let p = build_rooted_abs_path(&self.root, path);

Member

Maybe we can move parts of this implementation to core as HdfsCore::hdfs_read_from_path. Once we have done that, the read function will become:

  1. read the file
  2. return results (HdfsReader is what OpenDAL returns for users to read data from)
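
A sketch of what that could look like, under the same assumptions as the hdfs_get_metadata sketch above; range handling and other OpRead details are elided, and async_open assumes hdrs's async file support that this service already relies on:

impl HdfsCore {
    // Hypothetical helper: open the file and hand back the handle that
    // HdfsReader wraps.
    pub async fn hdfs_read_from_path(&self, path: &str) -> Result<hdrs::AsyncFile> {
        let p = build_rooted_abs_path(&self.root, path);
        self.client
            .open_file()
            .read(true)
            .async_open(&p)
            .await
            .map_err(new_std_io_error)
    }
}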

    }

    async fn write(&self, path: &str, op: OpWrite) -> Result<(RpWrite, Self::Writer)> {
        let target_path = build_rooted_abs_path(&self.root, path);

Member

Similarly, we can move parts of this implementation to core as HdfsCore::hdfs_open_path_for_writing. Once we have done that, the write function will become:

  • open the file for writing
  • return results (HdfsWriter is what OpenDAL returns for users to write data to)
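
Roughly, such a helper might look like this (a sketch only, same assumptions as above; parent-directory creation, append mode, and the atomic-write handling from OpWrite are elided):

impl HdfsCore {
    // Hypothetical helper: open (and create) the target path for writing and
    // return the handle that HdfsWriter wraps.
    pub async fn hdfs_open_path_for_writing(&self, path: &str) -> Result<hdrs::AsyncFile> {
        let p = build_rooted_abs_path(&self.root, path);
        self.client
            .open_file()
            .create(true)
            .write(true)
            .async_open(&p)
            .await
            .map_err(new_std_io_error)
    }
}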

        Ok((RpList::default(), Some(rd)))
    }

    async fn rename(&self, from: &str, to: &str, _args: OpRename) -> Result<RpRename> {

Member

I didn't read the details of the implementation. At a glance, this looks similar to blocking_rename. Maybe we can build HdfsCore::hdfs_rename. This will benefit:

  • rename
  • blocking_rename
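
For example (a sketch only, same assumptions as the earlier sketches; the destination's parent-directory handling is elided, and rename_file is assumed to be the underlying hdrs call):

impl HdfsCore {
    // Hypothetical shared helper for rename and blocking_rename.
    pub fn hdfs_rename(&self, from: &str, to: &str) -> Result<RpRename> {
        let from_path = build_rooted_abs_path(&self.root, from);
        let to_path = build_rooted_abs_path(&self.root, to);
        self.client
            .rename_file(&from_path, &to_path)
            .map_err(new_std_io_error)?;
        Ok(RpRename::default())
    }
}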

    }

    fn blocking_write(&self, path: &str, op: OpWrite) -> Result<(RpWrite, Self::BlockingWriter)> {
        let target_path = build_rooted_abs_path(&self.root, path);

Member

We should be able to extract this implementation to core as well.

    }

    fn blocking_read(&self, path: &str, args: OpRead) -> Result<(RpRead, Self::BlockingReader)> {
        let p = build_rooted_abs_path(&self.root, path);

Member

We should be able to extract this to core too.

}

impl HdfsCore {
    pub fn info(&self) -> Arc<AccessorInfo> {

Member

We don't need this function. pub info: Arc<AccessorInfo> is enough.


uruemu commented Apr 30, 2025

@erickguan I will come back to this PR this week. Sorry, I've been a bit busy.

@erickguan
Member

@uruemu Thanks for working on this, and for the update on your schedule!
