Skip to content

Conversation

ilongin
Copy link
Contributor

@ilongin ilongin commented Sep 5, 2025

Up until now we didn't have the way to remove namespace or project. This PR adds 2 functions: delete_project and delete_namespace

There are some rules:

  1. Cannot delete default namespace or project
  2. Cannot delete system namespace
  3. Cannot delete listing project
  4. Cannot delete non empty project (has datasets in it)
  5. Cannot delete non empty namespace (has projects in it)

Summary by Sourcery

Enable removal of projects and namespaces with corresponding metastore support, error handling, and comprehensive tests.

New Features:

  • Add delete_project function to remove a project with safety checks
  • Add delete_namespace function to remove a namespace with safety checks

Enhancements:

  • Extend metastore with remove_project, remove_namespace, count_datasets, is_default_project, and is_listing_project methods

Tests:

  • Add unit tests for delete_project covering success, not found, listing/default, and non-empty scenarios
  • Add unit tests for delete_namespace covering success, not found, system/default, and non-empty scenarios

Chores:

  • Define ProjectDeleteNotAllowedError and NamespaceDeleteNotAllowedError error classes
  • Export delete_project in the public API

Copy link
Contributor

sourcery-ai bot commented Sep 5, 2025

Reviewer's Guide

Introduces delete_namespace and delete_project functions with rule-based validations, extends the metastore interface to support removal and dataset counting, defines new delete-related errors and exports, and adds comprehensive unit tests.

Sequence diagram for deleting a project

sequenceDiagram
  actor User
  participant "delete_project()"
  participant "Session"
  participant "Metastore"
  participant "Project"
  participant "Error"
  User->>"delete_project()": Request to delete project
  "delete_project()"->>"Session": Get session
  "delete_project()"->>"Metastore": Get project by name/namespace
  "delete_project()"->>"Metastore": Check if listing project
  alt Is listing project
    "delete_project()"->>"Error": Raise ProjectDeleteNotAllowedError
  else Not listing project
    "delete_project()"->>"Metastore": Check if default project
    alt Is default project
      "delete_project()"->>"Error": Raise ProjectDeleteNotAllowedError
    else Not default project
      "delete_project()"->>"Metastore": Count datasets in project
      alt Project has datasets
        "delete_project()"->>"Error": Raise ProjectDeleteNotAllowedError
      else Project is empty
        "delete_project()"->>"Metastore": Remove project
      end
    end
  end
Loading

Sequence diagram for deleting a namespace

sequenceDiagram
  actor User
  participant "delete_namespace()"
  participant "Session"
  participant "Metastore"
  participant "Namespace"
  participant "Error"
  User->>"delete_namespace()": Request to delete namespace
  "delete_namespace()"->>"Session": Get session
  "delete_namespace()"->>"Metastore": Get namespace by name
  "delete_namespace()"->>"Metastore": Check if system namespace
  alt Is system namespace
    "delete_namespace()"->>"Error": Raise NamespaceDeleteNotAllowedError
  else Not system namespace
    "delete_namespace()"->>"Metastore": Check if default namespace
    alt Is default namespace
      "delete_namespace()"->>"Error": Raise NamespaceDeleteNotAllowedError
    else Not default namespace
      "delete_namespace()"->>"Metastore": List projects in namespace
      alt Namespace has projects
        "delete_namespace()"->>"Error": Raise NamespaceDeleteNotAllowedError
      else Namespace is empty
        "delete_namespace()"->>"Metastore": Remove namespace
      end
    end
  end
Loading

Class diagram for new and updated error types

classDiagram
  class NotAllowedError
  class ProjectDeleteNotAllowedError
  class NamespaceDeleteNotAllowedError
  NotAllowedError <|-- ProjectDeleteNotAllowedError
  NotAllowedError <|-- NamespaceDeleteNotAllowedError
Loading

Class diagram for Metastore interface changes

classDiagram
  class Metastore {
    +remove_namespace(namespace_id: int, conn=None)
    +remove_project(project_id: int, conn=None)
    +count_datasets(project_id: Optional[int] = None): int
    +is_default_project(project_name: str, namespace_name: str): bool
    +is_listing_project(project_name: str, namespace_name: str): bool
  }
Loading

Class diagram for new delete functions in lib.projects and lib.namespaces

classDiagram
  class ProjectsLib {
    +delete(name: str, namespace: str, session: Optional[Session])
  }
  class NamespacesLib {
    +delete(name: str, session: Optional[Session])
  }
Loading

File-Level Changes

Change Details Files
Add delete_project and delete_namespace in library layers
  • Implement delete(name, namespace) with validation rules for default, system, listing, and non-empty
  • Fetch session and metastore, then call remove_project/remove_namespace
  • Raise ProjectDeleteNotAllowedError or NamespaceDeleteNotAllowedError on violations
src/datachain/lib/projects.py
src/datachain/lib/namespaces.py
Extend metastore API and backend implementations
  • Add abstract methods: remove_project, remove_namespace, is_default_project, is_listing_project, count_datasets
  • Implement concrete remove_* methods using DB transactions
  • Implement count_datasets via SQL count on dataset records
src/datachain/data_storage/metastore.py
Define new delete-related exceptions and update exports
  • Add ProjectDeleteNotAllowedError and NamespaceDeleteNotAllowedError in error module
  • Update init.py to include delete_project in public API
src/datachain/error.py
src/datachain/__init__.py
Add unit tests for delete functions and rules
  • Test successful deletion of empty project/namespace
  • Test errors for missing, default, system, listing and non-empty cases
  • Verify error messages match expected strings
tests/unit/lib/test_project.py
tests/unit/lib/test_namespace.py

Possibly linked issues

  • Initial DataChain Commit #1: The PR implements the delete_project method and includes the rule to prevent deleting non-empty projects as requested by the issue.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@ilongin ilongin linked an issue Sep 5, 2025 that may be closed by this pull request
@ilongin ilongin requested a review from dreadatour September 5, 2025 12:08
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Metastore.count_datasets uses the base list_datasets query (which joins versions) and will count versions rather than distinct datasets—switch to COUNT(DISTINCT dataset_id) or a simpler count query to get the correct number of datasets.
  • The public functions in datachain.lib.projects and datachain.lib.namespaces are both named delete, which can be ambiguous or shadow built‐ins; consider renaming them to delete_project and delete_namespace for clarity.
  • The rules for reserved namespaces/projects (default, system, listing) are duplicated in multiple places—consider centralizing those checks in a shared helper to keep validation logic DRY.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Metastore.count_datasets uses the base list_datasets query (which joins versions) and will count versions rather than distinct datasets—switch to COUNT(DISTINCT dataset_id) or a simpler count query to get the correct number of datasets.
- The public functions in datachain.lib.projects and datachain.lib.namespaces are both named `delete`, which can be ambiguous or shadow built‐ins; consider renaming them to `delete_project` and `delete_namespace` for clarity.
- The rules for reserved namespaces/projects (default, system, listing) are duplicated in multiple places—consider centralizing those checks in a shared helper to keep validation logic DRY.

## Individual Comments

### Comment 1
<location> `src/datachain/data_storage/metastore.py:761` </location>
<code_context>

+    def remove_namespace(self, namespace_id: int, conn=None) -> None:
+        n = self._namespaces
+        with self.db.transaction():
+            self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id))
+
</code_context>

<issue_to_address>
Consider handling the case where the namespace or project does not exist before attempting deletion.

If no rows are deleted, raise a NotFoundError to inform callers that the namespace or project does not exist.

Suggested implementation:

```python
    def remove_namespace(self, namespace_id: int, conn=None) -> None:
        n = self._namespaces
        with self.db.transaction():
            result = self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id))
            if result.rowcount == 0:
                raise NotFoundError(f"Namespace or project with id {namespace_id} does not exist.")

```

Make sure that `NotFoundError` is imported or defined in this file. If it is not, you should add:
```python
from datachain.data_storage.exceptions import NotFoundError
```
or wherever `NotFoundError` is defined in your codebase.
</issue_to_address>

### Comment 2
<location> `src/datachain/lib/namespaces.py:115` </location>
<code_context>
+            f"Namespace {metastore.default_namespace_name} cannot be removed"
+        )
+
+    projects = metastore.list_projects(namespace.id)
+    if len(projects) > 0:
+        raise NamespaceDeleteNotAllowedError(
</code_context>

<issue_to_address>
Consider using a more efficient existence check for projects in a namespace.

Listing all projects and counting them may be inefficient for large namespaces. Use a count query or an existence check to avoid loading all project objects.

Suggested implementation:

```python
    if metastore.has_projects(namespace.id):
        raise NamespaceDeleteNotAllowedError(

```

If `metastore.has_projects(namespace.id)` does not exist, you will need to implement it in the metastore class. This method should efficiently check for the existence of projects in the given namespace, ideally using a count query or an existence check at the database level.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


def remove_namespace(self, namespace_id: int, conn=None) -> None:
n = self._namespaces
with self.db.transaction():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Consider handling the case where the namespace or project does not exist before attempting deletion.

If no rows are deleted, raise a NotFoundError to inform callers that the namespace or project does not exist.

Suggested implementation:

    def remove_namespace(self, namespace_id: int, conn=None) -> None:
        n = self._namespaces
        with self.db.transaction():
            result = self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id))
            if result.rowcount == 0:
                raise NotFoundError(f"Namespace or project with id {namespace_id} does not exist.")

Make sure that NotFoundError is imported or defined in this file. If it is not, you should add:

from datachain.data_storage.exceptions import NotFoundError

or wherever NotFoundError is defined in your codebase.

f"Namespace {metastore.default_namespace_name} cannot be removed"
)

projects = metastore.list_projects(namespace.id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): Consider using a more efficient existence check for projects in a namespace.

Listing all projects and counting them may be inefficient for large namespaces. Use a count query or an existence check to avoid loading all project objects.

Suggested implementation:

    if metastore.has_projects(namespace.id):
        raise NamespaceDeleteNotAllowedError(

If metastore.has_projects(namespace.id) does not exist, you will need to implement it in the metastore class. This method should efficiently check for the existence of projects in the given namespace, ideally using a count query or an existence check at the database level.

Copy link

cloudflare-workers-and-pages bot commented Sep 5, 2025

Deploying datachain-documentation with  Cloudflare Pages  Cloudflare Pages

Latest commit: 6b86ba2
Status: ✅  Deploy successful!
Preview URL: https://acc524e7.datachain-documentation.pages.dev
Branch Preview URL: https://ilongin-1244-delete-project.datachain-documentation.pages.dev

View logs

Copy link

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 90.69767% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.56%. Comparing base (8c9b666) to head (6b86ba2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/datachain/data_storage/metastore.py 86.04% 2 Missing and 4 partials ⚠️
src/datachain/namespace.py 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1318      +/-   ##
==========================================
+ Coverage   87.51%   87.56%   +0.05%     
==========================================
  Files         156      156              
  Lines       14416    14493      +77     
  Branches     2068     2081      +13     
==========================================
+ Hits        12616    12691      +75     
+ Misses       1332     1331       -1     
- Partials      468      471       +3     
Flag Coverage Δ
datachain 87.50% <90.69%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/datachain/__init__.py 100.00% <100.00%> (ø)
src/datachain/error.py 100.00% <100.00%> (ø)
src/datachain/lib/namespaces.py 100.00% <100.00%> (ø)
src/datachain/lib/projects.py 100.00% <100.00%> (ø)
src/datachain/namespace.py 93.47% <71.42%> (-3.96%) ⬇️
src/datachain/data_storage/metastore.py 94.12% <86.04%> (+0.37%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@shcheklein
Copy link
Member

Ci fails?

"""Checks if some project in namespace is default"""

@abstractmethod
def is_listing_project(self, project_name: str, namespace_name: str) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we prevent it in the db level as well? Have a flag for all special ovjects. To make it more reliable and not easy to hack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, yes, but I would like to do it in a followup as it requires more changes, DB migration etc.


project = metastore.get_project(name, namespace)

if metastore.is_listing_project(name, namespace):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks should be also happening on the API side + on the db level.

Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG but there is a request to change public API

"datasets",
"delete_dataset",
"delete_namespace",
"delete_project",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's don't expose project in user's api. Instead you can do something like:

def delete_namespace(name: str):
    parts = name.split(".")
    if len(parts) == 1:
        remove_namespace(parts[0])
    elif len(parts) == 2:
        delete_project(parts[1])
    else:
        raise ValueError(f"Invalid namespace format: {name}. Expected 'namespace' or 'ns1.ns2'.")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to do this here as there is an open issue about this to change it everywhere (and be consistent) #1302 ..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow 🙂

We’re exposing an API that’ll be removed soon anyway. Skipping it would just mean ~5 extra lines of code.

Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to be consistent in all of our APIs and change them all together, but ok, we can do this one in correct way right away. I changed it, please approve if you think this is ok now.

Copy link
Contributor

@dreadatour dreadatour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments below, with one important about Studio related PR and one highly recommended about count_dataset performance 🙏

def remove_namespace(self, namespace_id: int, conn=None) -> None:
n = self._namespaces
with self.db.transaction():
self.db.execute(self._namespaces_delete().where(n.c.id == namespace_id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, just wonder, should we add another check here (so no datasets exists within this namespace), or is it enough to have this check on a higher level? May be consider to move the check here and raise an exceptions (like "default namespace can not be deleted" or "namespace with datasets can not be deleted) to be on the safe side? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that check should be added here. It's a higher level business logic check and metastore, IMO should be de-coupled of that .. it should be just simple DB layer for other parts to use and that's it. For example, maybe we will have some management command to actually remove those default namespaces at some point and we want to be able to use metastore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it not a business logic, but DB level constraints.

Since we don't have foreign keys at the DB level to prevent deletion of relations and if someone will use this remove_namespace metastore method to delete namespace, db will be broken — we will have records (datasets) with namespace IDs and no namespace exists in the DB.

Also if we'll want to use remove_namespace metastore method from somewhere else, we should implement these same sanity checks in multiple places. It is more practical to move it on a metastore level to do not duplicate the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I thought you were referring to removing default / system namespaces and checks for that .. sorry. Anyway, yes, I added additional check for empty namespace / project on removal, that makes sense to me.

def remove_project(self, project_id: int, conn=None) -> None:
p = self._projects
with self.db.transaction():
self.db.execute(self._projects_delete().where(p.c.id == project_id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here with checks on this level 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above.... I personally wouldn't put these kind of things here. It's business logic

@shcheklein
Copy link
Member

@ilongin what is the progress here?

Copy link
Contributor

@dreadatour dreadatour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 👍 Thank you 🤗

Couple comments left (below), also we have come cases in metastore not covered by tests, would be really great to have these covered.

Comment on lines +115 to +120
num_projects = metastore.count_projects(namespace.id)
if num_projects > 0:
raise NamespaceDeleteNotAllowedError(
f"Namespace cannot be removed. It contains {num_projects} project(s). "
"Please remove the project(s) first."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can now remove this check from here.

Comment on lines +127 to +132
num_datasets = metastore.count_datasets(project.id)
if num_datasets > 0:
raise ProjectDeleteNotAllowedError(
f"Project cannot be removed. It contains {num_datasets} dataset(s). "
"Please remove the dataset(s) first."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can now remove this check from here.

Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@ilongin ilongin merged commit 80eb643 into main Sep 11, 2025
58 of 63 checks passed
@ilongin ilongin deleted the ilongin/1244-delete-project branch September 11, 2025 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add delete_project() method
4 participants