-
Notifications
You must be signed in to change notification settings - Fork 128
Added delete_namespace
and delete_project
functions
#1318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideIntroduces delete_namespace and delete_project functions with rule-based validations, extends the metastore interface to support removal and dataset counting, defines new delete-related errors and exports, and adds comprehensive unit tests. Sequence diagram for deleting a projectsequenceDiagram
actor User
participant "delete_project()"
participant "Session"
participant "Metastore"
participant "Project"
participant "Error"
User->>"delete_project()": Request to delete project
"delete_project()"->>"Session": Get session
"delete_project()"->>"Metastore": Get project by name/namespace
"delete_project()"->>"Metastore": Check if listing project
alt Is listing project
"delete_project()"->>"Error": Raise ProjectDeleteNotAllowedError
else Not listing project
"delete_project()"->>"Metastore": Check if default project
alt Is default project
"delete_project()"->>"Error": Raise ProjectDeleteNotAllowedError
else Not default project
"delete_project()"->>"Metastore": Count datasets in project
alt Project has datasets
"delete_project()"->>"Error": Raise ProjectDeleteNotAllowedError
else Project is empty
"delete_project()"->>"Metastore": Remove project
end
end
end
Sequence diagram for deleting a namespacesequenceDiagram
actor User
participant "delete_namespace()"
participant "Session"
participant "Metastore"
participant "Namespace"
participant "Error"
User->>"delete_namespace()": Request to delete namespace
"delete_namespace()"->>"Session": Get session
"delete_namespace()"->>"Metastore": Get namespace by name
"delete_namespace()"->>"Metastore": Check if system namespace
alt Is system namespace
"delete_namespace()"->>"Error": Raise NamespaceDeleteNotAllowedError
else Not system namespace
"delete_namespace()"->>"Metastore": Check if default namespace
alt Is default namespace
"delete_namespace()"->>"Error": Raise NamespaceDeleteNotAllowedError
else Not default namespace
"delete_namespace()"->>"Metastore": List projects in namespace
alt Namespace has projects
"delete_namespace()"->>"Error": Raise NamespaceDeleteNotAllowedError
else Namespace is empty
"delete_namespace()"->>"Metastore": Remove namespace
end
end
end
Class diagram for new and updated error typesclassDiagram
class NotAllowedError
class ProjectDeleteNotAllowedError
class NamespaceDeleteNotAllowedError
NotAllowedError <|-- ProjectDeleteNotAllowedError
NotAllowedError <|-- NamespaceDeleteNotAllowedError
Class diagram for Metastore interface changesclassDiagram
class Metastore {
+remove_namespace(namespace_id: int, conn=None)
+remove_project(project_id: int, conn=None)
+count_datasets(project_id: Optional[int] = None): int
+is_default_project(project_name: str, namespace_name: str): bool
+is_listing_project(project_name: str, namespace_name: str): bool
}
Class diagram for new delete functions in lib.projects and lib.namespacesclassDiagram
class ProjectsLib {
+delete(name: str, namespace: str, session: Optional[Session])
}
class NamespacesLib {
+delete(name: str, session: Optional[Session])
}
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Metastore.count_datasets uses the base list_datasets query (which joins versions) and will count versions rather than distinct datasets—switch to COUNT(DISTINCT dataset_id) or a simpler count query to get the correct number of datasets.
- The public functions in datachain.lib.projects and datachain.lib.namespaces are both named
delete
, which can be ambiguous or shadow built‐ins; consider renaming them todelete_project
anddelete_namespace
for clarity. - The rules for reserved namespaces/projects (default, system, listing) are duplicated in multiple places—consider centralizing those checks in a shared helper to keep validation logic DRY.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Metastore.count_datasets uses the base list_datasets query (which joins versions) and will count versions rather than distinct datasets—switch to COUNT(DISTINCT dataset_id) or a simpler count query to get the correct number of datasets.
- The public functions in datachain.lib.projects and datachain.lib.namespaces are both named `delete`, which can be ambiguous or shadow built‐ins; consider renaming them to `delete_project` and `delete_namespace` for clarity.
- The rules for reserved namespaces/projects (default, system, listing) are duplicated in multiple places—consider centralizing those checks in a shared helper to keep validation logic DRY.
## Individual Comments
### Comment 1
<location> `src/datachain/data_storage/metastore.py:761` </location>
<code_context>
+ def remove_namespace(self, namespace_id: int, conn=None) -> None:
+ n = self._namespaces
+ with self.db.transaction():
+ self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id))
+
</code_context>
<issue_to_address>
Consider handling the case where the namespace or project does not exist before attempting deletion.
If no rows are deleted, raise a NotFoundError to inform callers that the namespace or project does not exist.
Suggested implementation:
```python
def remove_namespace(self, namespace_id: int, conn=None) -> None:
n = self._namespaces
with self.db.transaction():
result = self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id))
if result.rowcount == 0:
raise NotFoundError(f"Namespace or project with id {namespace_id} does not exist.")
```
Make sure that `NotFoundError` is imported or defined in this file. If it is not, you should add:
```python
from datachain.data_storage.exceptions import NotFoundError
```
or wherever `NotFoundError` is defined in your codebase.
</issue_to_address>
### Comment 2
<location> `src/datachain/lib/namespaces.py:115` </location>
<code_context>
+ f"Namespace {metastore.default_namespace_name} cannot be removed"
+ )
+
+ projects = metastore.list_projects(namespace.id)
+ if len(projects) > 0:
+ raise NamespaceDeleteNotAllowedError(
</code_context>
<issue_to_address>
Consider using a more efficient existence check for projects in a namespace.
Listing all projects and counting them may be inefficient for large namespaces. Use a count query or an existence check to avoid loading all project objects.
Suggested implementation:
```python
if metastore.has_projects(namespace.id):
raise NamespaceDeleteNotAllowedError(
```
If `metastore.has_projects(namespace.id)` does not exist, you will need to implement it in the metastore class. This method should efficiently check for the existence of projects in the given namespace, ideally using a count query or an existence check at the database level.
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
||
def remove_namespace(self, namespace_id: int, conn=None) -> None: | ||
n = self._namespaces | ||
with self.db.transaction(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Consider handling the case where the namespace or project does not exist before attempting deletion.
If no rows are deleted, raise a NotFoundError to inform callers that the namespace or project does not exist.
Suggested implementation:
def remove_namespace(self, namespace_id: int, conn=None) -> None:
n = self._namespaces
with self.db.transaction():
result = self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id))
if result.rowcount == 0:
raise NotFoundError(f"Namespace or project with id {namespace_id} does not exist.")
Make sure that NotFoundError
is imported or defined in this file. If it is not, you should add:
from datachain.data_storage.exceptions import NotFoundError
or wherever NotFoundError
is defined in your codebase.
src/datachain/lib/namespaces.py
Outdated
f"Namespace {metastore.default_namespace_name} cannot be removed" | ||
) | ||
|
||
projects = metastore.list_projects(namespace.id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (performance): Consider using a more efficient existence check for projects in a namespace.
Listing all projects and counting them may be inefficient for large namespaces. Use a count query or an existence check to avoid loading all project objects.
Suggested implementation:
if metastore.has_projects(namespace.id):
raise NamespaceDeleteNotAllowedError(
If metastore.has_projects(namespace.id)
does not exist, you will need to implement it in the metastore class. This method should efficiently check for the existence of projects in the given namespace, ideally using a count query or an existence check at the database level.
Deploying datachain-documentation with
|
Latest commit: |
6b86ba2
|
Status: | ✅ Deploy successful! |
Preview URL: | https://acc524e7.datachain-documentation.pages.dev |
Branch Preview URL: | https://ilongin-1244-delete-project.datachain-documentation.pages.dev |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1318 +/- ##
==========================================
+ Coverage 87.51% 87.56% +0.05%
==========================================
Files 156 156
Lines 14416 14493 +77
Branches 2068 2081 +13
==========================================
+ Hits 12616 12691 +75
+ Misses 1332 1331 -1
- Partials 468 471 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Ci fails? |
"""Checks if some project in namespace is default""" | ||
|
||
@abstractmethod | ||
def is_listing_project(self, project_name: str, namespace_name: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we prevent it in the db level as well? Have a flag for all special ovjects. To make it more reliable and not easy to hack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, yes, but I would like to do it in a followup as it requires more changes, DB migration etc.
|
||
project = metastore.get_project(name, namespace) | ||
|
||
if metastore.is_listing_project(name, namespace): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checks should be also happening on the API side + on the db level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG but there is a request to change public API
src/datachain/__init__.py
Outdated
"datasets", | ||
"delete_dataset", | ||
"delete_namespace", | ||
"delete_project", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's don't expose project in user's api. Instead you can do something like:
def delete_namespace(name: str):
parts = name.split(".")
if len(parts) == 1:
remove_namespace(parts[0])
elif len(parts) == 2:
delete_project(parts[1])
else:
raise ValueError(f"Invalid namespace format: {name}. Expected 'namespace' or 'ns1.ns2'.")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to do this here as there is an open issue about this to change it everywhere (and be consistent) #1302 ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow 🙂
We’re exposing an API that’ll be removed soon anyway. Skipping it would just mean ~5 extra lines of code.
Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to be consistent in all of our APIs and change them all together, but ok, we can do this one in correct way right away. I changed it, please approve if you think this is ok now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple comments below, with one important about Studio related PR and one highly recommended about count_dataset
performance 🙏
def remove_namespace(self, namespace_id: int, conn=None) -> None: | ||
n = self._namespaces | ||
with self.db.transaction(): | ||
self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure, just wonder, should we add another check here (so no datasets exists within this namespace), or is it enough to have this check on a higher level? May be consider to move the check here and raise an exceptions (like "default namespace can not be deleted" or "namespace with datasets can not be deleted) to be on the safe side? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that check should be added here. It's a higher level business logic check and metastore, IMO should be de-coupled of that .. it should be just simple DB layer for other parts to use and that's it. For example, maybe we will have some management command to actually remove those default namespaces at some point and we want to be able to use metastore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it not a business logic, but DB level constraints.
Since we don't have foreign keys at the DB level to prevent deletion of relations and if someone will use this remove_namespace
metastore method to delete namespace, db will be broken — we will have records (datasets) with namespace IDs and no namespace exists in the DB.
Also if we'll want to use remove_namespace
metastore method from somewhere else, we should implement these same sanity checks in multiple places. It is more practical to move it on a metastore level to do not duplicate the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I thought you were referring to removing default / system namespaces and checks for that .. sorry. Anyway, yes, I added additional check for empty namespace / project on removal, that makes sense to me.
def remove_project(self, project_id: int, conn=None) -> None: | ||
p = self._projects | ||
with self.db.transaction(): | ||
self.db.execute(self._projects_delete().where(p.c.id == project_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here with checks on this level 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same answer as above.... I personally wouldn't put these kind of things here. It's business logic
@ilongin what is the progress here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! 👍 Thank you 🤗
Couple comments left (below), also we have come cases in metastore not covered by tests, would be really great to have these covered.
num_projects = metastore.count_projects(namespace.id) | ||
if num_projects > 0: | ||
raise NamespaceDeleteNotAllowedError( | ||
f"Namespace cannot be removed. It contains {num_projects} project(s). " | ||
"Please remove the project(s) first." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can now remove this check from here.
num_datasets = metastore.count_datasets(project.id) | ||
if num_datasets > 0: | ||
raise ProjectDeleteNotAllowedError( | ||
f"Project cannot be removed. It contains {num_datasets} dataset(s). " | ||
"Please remove the dataset(s) first." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can now remove this check from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Up until now we didn't have the way to remove namespace or project. This PR adds 2 functions:
delete_project
anddelete_namespace
There are some rules:
Summary by Sourcery
Enable removal of projects and namespaces with corresponding metastore support, error handling, and comprehensive tests.
New Features:
Enhancements:
Tests:
Chores: