
feat: use seaweedfs as nodestore backend #3842

Merged: aldy505 merged 11 commits into byk/feat/s3-nodestore from aldy505/feat/seaweed-nodestore on Aug 6, 2025

aldy505 (Collaborator) commented Jul 31, 2025

WIP

aldy505 changed the base branch from master to byk/feat/s3-nodestore on July 31, 2025 13:57
aldy505 marked this pull request as draft on July 31, 2025 13:58

codecov bot commented Jul 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.45%. Comparing base (0318e84) to head (d25faa3).
⚠️ Report is 1 commit behind head on byk/feat/s3-nodestore.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@                  Coverage Diff                   @@
##           byk/feat/s3-nodestore    #3842   +/-   ##
======================================================
  Coverage                  99.45%   99.45%           
======================================================
  Files                          3        3           
  Lines                        183      183           
======================================================
  Hits                         182      182           
  Misses                         1        1           

aldy505 (Collaborator, Author) commented Aug 1, 2025

  worker-1                                        | Traceback (most recent call last):
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/celery/app/trace.py", line 453, in trace_task
  worker-1                                        |     R = retval = fun(*args, **kwargs)
  worker-1                                        |                  ~~~^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_sdk/utils.py", line 1811, in runner
  worker-1                                        |     return sentry_patched_function(*args, **kwargs)
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_sdk/integrations/celery/__init__.py", line 415, in _inner
  worker-1                                        |     reraise(*exc_info)
  worker-1                                        |     ~~~~~~~^^^^^^^^^^^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_sdk/utils.py", line 1746, in reraise
  worker-1                                        |     raise value
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_sdk/integrations/celery/__init__.py", line 410, in _inner
  worker-1                                        |     return f(*args, **kwargs)
  worker-1                                        |   File "/usr/src/sentry/src/sentry/celery.py", line 104, in __call__
  worker-1                                        |     return super().__call__(*args, **kwargs)
  worker-1                                        |            ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/celery/app/trace.py", line 736, in __protected_call__
  worker-1                                        |     return self.run(*args, **kwargs)
  worker-1                                        |            ~~~~~~~~^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/usr/src/sentry/src/sentry/silo/base.py", line 158, in override
  worker-1                                        |     return original_method(*args, **kwargs)
  worker-1                                        |   File "/usr/src/sentry/src/sentry/tasks/base.py", line 187, in _wrapped
  worker-1                                        |     result = func(*args, **kwargs)
  worker-1                                        |   File "/usr/src/sentry/src/sentry/tasks/store.py", line 641, in save_event
  worker-1                                        |     _do_save_event(
  worker-1                                        |     ~~~~~~~~~~~~~~^
  worker-1                                        |         cache_key,
  worker-1                                        |         ^^^^^^^^^^
  worker-1                                        |     ...<5 lines>...
  worker-1                                        |         **kwargs,
  worker-1                                        |         ^^^^^^^^^
  worker-1                                        |     )
  worker-1                                        |     ^
  worker-1                                        |   File "/usr/src/sentry/src/sentry/tasks/store.py", line 567, in _do_save_event
  worker-1                                        |     manager.save(
  worker-1                                        |     ~~~~~~~~~~~~^
  worker-1                                        |         project_id,
  worker-1                                        |         ^^^^^^^^^^^
  worker-1                                        |     ...<3 lines>...
  worker-1                                        |         has_attachments=has_attachments,
  worker-1                                        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |     )
  worker-1                                        |     ^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_sdk/tracing_utils.py", line 829, in func_with_tracing
  worker-1                                        |     return func(*args, **kwargs)
  worker-1                                        |   File "/usr/src/sentry/src/sentry/event_manager.py", line 496, in save
  worker-1                                        |     return self.save_error_events(
  worker-1                                        |            ~~~~~~~~~~~~~~~~~~~~~~^
  worker-1                                        |         project,
  worker-1                                        |         ^^^^^^^^
  worker-1                                        |     ...<5 lines>...
  worker-1                                        |         has_attachments=has_attachments,
  worker-1                                        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |     )
  worker-1                                        |     ^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_sdk/tracing_utils.py", line 829, in func_with_tracing
  worker-1                                        |     return func(*args, **kwargs)
  worker-1                                        |   File "/usr/src/sentry/src/sentry/event_manager.py", line 580, in save_error_events
  worker-1                                        |     _nodestore_save_many(jobs=jobs, app_feature="errors")
  worker-1                                        |     ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/usr/src/sentry/src/sentry/event_manager.py", line 1104, in _nodestore_save_many
  worker-1                                        |     job["event"].data.save(subkeys=subkeys)
  worker-1                                        |     ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/usr/src/sentry/src/sentry/db/models/fields/node.py", line 140, in save
  worker-1                                        |     nodestore.backend.set_subkeys(self.id, subkeys)
  worker-1                                        |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/usr/src/sentry/src/sentry/utils/metrics.py", line 234, in inner
  worker-1                                        |     return f(*args, **kwargs)
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_sdk/tracing_utils.py", line 829, in func_with_tracing
  worker-1                                        |     return func(*args, **kwargs)
  worker-1                                        |   File "/usr/src/sentry/src/sentry/nodestore/base.py", line 269, in set_subkeys
  worker-1                                        |     self.set_bytes(item_id, bytes_data, ttl=ttl)
  worker-1                                        |     ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/usr/src/sentry/src/sentry/nodestore/base.py", line 236, in set_bytes
  worker-1                                        |     return self._set_bytes(item_id, data, ttl)
  worker-1                                        |            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_nodestore_s3/backend.py", line 83, in _set_bytes
  worker-1                                        |     self.__write_to_bucket(id, data)
  worker-1                                        |     ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/sentry_nodestore_s3/backend.py", line 121, in __write_to_bucket
  worker-1                                        |     self.client.put_object(
  worker-1                                        |     ~~~~~~~~~~~~~~~~~~~~~~^
  worker-1                                        |         Key=self.__get_key_for_id(id),
  worker-1                                        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |     ...<2 lines>...
  worker-1                                        |         ContentEncoding=content_encoding,
  worker-1                                        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |     )
  worker-1                                        |     ^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/botocore/client.py", line 565, in _api_call
  worker-1                                        |     return self._make_api_call(operation_name, kwargs)
  worker-1                                        |            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
  worker-1                                        |   File "/.venv/lib/python3.13/site-packages/botocore/client.py", line 1021, in _make_api_call
  worker-1                                        |     raise error_class(parsed_response, operation_name)
  worker-1                                        | botocore.exceptions.ClientError: An error occurred (InternalError) when calling the PutObject operation (reached max retries: 3): We encountered an internal error, please try again.
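The last line shows the S3 client giving up on PutObject once its retry budget was used up ("reached max retries: 3"), so the server-side InternalError surfaced as a failed save_event task. For readers unfamiliar with bounded retries, here is a stdlib-only Python sketch of the general pattern: one initial call plus up to max_retries retries with exponential backoff. It is not botocore's retry implementation, which may count attempts and compute delays differently; call_with_retries, TransientError and MaxRetriesExceeded are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable server error such as InternalError."""


class MaxRetriesExceeded(Exception):
    """Raised once every retry has failed, wrapping the last error seen."""

    def __init__(self, retries: int, last_error: Exception) -> None:
        super().__init__(f"reached max retries: {retries}: {last_error}")
        self.retries = retries
        self.last_error = last_error


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to max_retries times on TransientError.

    Delays between attempts grow exponentially: base_delay, 2*base_delay, 4*base_delay, ...
    """
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError as exc:
            if attempt >= max_retries:
                # Retry budget exhausted: surface the failure to the caller.
                raise MaxRetriesExceeded(max_retries, exc) from exc
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Injecting sleep keeps the sketch testable: a function that always raises TransientError is called four times (one call plus three retries) before MaxRetriesExceeded is raised.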

aldy505 (Collaborator, Author) commented Aug 1, 2025

@doc-sheet I'm seeing this in the logs. Have you encountered it before? Is it expected?

I0731 14:24:56.246129 disk_location.go:463 dir /tmp disk free 19.26 GiB < required 50.00 GiB

doc-sheet (Contributor)

@aldy505 looks like -volume.minFreeSpace=50GiB is working.
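The log line above reports the volume server comparing the free space of its directory (/tmp) with the required minimum set by -volume.minFreeSpace. A minimal Python sketch of that comparison, assuming sizes are written with binary suffixes such as 50GiB; parse_size, has_min_free_space and directory_has_min_free_space are hypothetical helpers built on shutil.disk_usage, not SeaweedFS code (which is written in Go and may accept formats this sketch does not handle).

```python
import re
import shutil

# Binary size suffixes only; this is an assumption of the sketch.
_UNITS = {"": 1, "B": 1, "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3, "TIB": 1024**4}


def parse_size(text: str) -> int:
    """Parse sizes like '50GiB' or '512MiB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    factor = _UNITS.get(unit.upper())
    if factor is None:
        raise ValueError(f"unsupported unit: {unit!r}")
    return int(float(number) * factor)


def has_min_free_space(free_bytes: int, min_free: str) -> bool:
    """True when free_bytes meets the minimum (the log prints 'free < required' otherwise)."""
    return free_bytes >= parse_size(min_free)


def directory_has_min_free_space(path: str, min_free: str) -> bool:
    """Run the same check against a real directory using shutil.disk_usage."""
    return has_min_free_space(shutil.disk_usage(path).free, min_free)
```

With the numbers from the log, has_min_free_space(parse_size("19.26GiB"), "50GiB") is False, matching the "disk free 19.26 GiB < required 50.00 GiB" message.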

-volume=true
-volume.dir.idx=/data/idx
-volume.index=leveldbLarge
-volume.max=30

Contributor

and -master.volumeSizeLimitMB too

It is better to have several volumes running on one machine, so that if one volume is compacting, the other volumes can still serve read and write requests. The default volume size is 30GB. So if your server does not have multiple 30GB empty spaces, you need to reduce the volume size.

https://github.com/seaweedfs/seaweedfs/wiki/Production-Setup#for-single-node-setup
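The sizing advice above is simple arithmetic. Assuming the usable space is the disk's free space minus the -volume.minFreeSpace reserve, the largest -master.volumeSizeLimitMB that still leaves room for N volumes is roughly that usable space divided by N, capped at the 30GB default (taken here as 30000 MB). suggest_volume_size_limit_mb is a hypothetical helper, and it assumes 1 MB = 1024 * 1024 bytes.

```python
MIB = 1024**2
GIB = 1024**3
DEFAULT_VOLUME_SIZE_LIMIT_MB = 30000  # the ~30GB default quoted above


def suggest_volume_size_limit_mb(free_bytes: int, reserve_bytes: int, volumes: int) -> int:
    """Largest per-volume size (in MB) that lets `volumes` volumes fit beside the reserve.

    Returns 0 when the reserve alone exceeds the free space, and never
    suggests more than the default volume size.
    """
    if volumes <= 0:
        raise ValueError("volumes must be positive")
    usable = max(0, free_bytes - reserve_bytes)
    per_volume_mb = usable // volumes // MIB
    return min(per_volume_mb, DEFAULT_VOLUME_SIZE_LIMIT_MB)
```

For example, 100 GiB free with a 50 GiB reserve and room for 5 volumes gives 10240; the 19.26 GiB from the earlier log line gives 0, meaning the reserve alone exceeds the free space.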

aldy505 changed the title from "aldy505/feat/seaweed nodestore" to "feat: use seaweedfs as nodestore backend" on Aug 5, 2025
aldy505 marked this pull request as ready for review on August 6, 2025 04:10
aldy505 requested review from doc-sheet and BYK and removed request for doc-sheet on August 6, 2025 04:10
aldy505 (Collaborator, Author) commented Aug 6, 2025

@BYK @doc-sheet heya, CI is green :)

BYK (Member) left a comment

Looks pretty good to me. I think S3PassthroughDjangoNodeStorage had a fallback mode. Is it enabled automatically so people can easily switch over?
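One way a fallback mode can let people switch over without losing access to old events is a read-through: write new nodes only to the new store, and on a read miss fall back to the previous (Django/Postgres) nodestore. The Python sketch below shows that pattern with in-memory stores. FallbackNodeStore, DictStore and ByteStore are hypothetical, and this is not a description of how S3PassthroughDjangoNodeStorage actually behaves.

```python
from typing import Dict, Optional, Protocol


class ByteStore(Protocol):
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...


class DictStore:
    """In-memory stand-in for a nodestore backend."""

    def __init__(self, data: Optional[Dict[str, bytes]] = None) -> None:
        self.data: Dict[str, bytes] = dict(data or {})

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self.data[key] = value


class FallbackNodeStore:
    """Write to the primary store; read from it first, then fall back to the legacy store."""

    def __init__(self, primary: ByteStore, legacy: ByteStore) -> None:
        self.primary = primary
        self.legacy = legacy

    def set(self, key: str, value: bytes) -> None:
        # New writes only go to the new backend.
        self.primary.set(key, value)

    def get(self, key: str) -> Optional[bytes]:
        value = self.primary.get(key)
        if value is None:
            # Miss in the new backend: the node may predate the switch.
            value = self.legacy.get(key)
        return value
```

Existing events keep resolving through the legacy store, while nothing new is written to it.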

@@ -17,5 +17,6 @@ echo "Created $(create_volume sentry-kafka)."
echo "Created $(create_volume sentry-postgres)."
echo "Created $(create_volume sentry-redis)."
echo "Created $(create_volume sentry-symbolicator)."
echo "Created $(create_volume sentry-seaweedfs)."

BYK (Member)
Not sure we should make this a global volume, btw. I'm not sure that pattern is good at all (I carried the old ones over for compatibility reasons).

aldy505 (Collaborator, Author)

A global volume would survive docker compose down -v. I mostly use docker compose down -v to destroy the other volumes that don't hold any important data (like the log volumes).

$dc exec seaweedfs mkdir -p /data/idx/
$s3cmd --access_key=sentry --secret_key=sentry --no-ssl --region=us-east-1 --host=localhost:8333 --host-bucket='localhost:8333/%(bucket)' mb s3://nodestore
else
echo "Node store already exists, skipping..."

BYK (Member)

What if we actually migrated all the data at this stage?
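Migrating at this stage would mean copying every existing node from the old nodestore into the new bucket during install. A minimal Python sketch of an idempotent copy loop, assuming the old store can be iterated as (id, bytes) pairs and the new one supports membership checks; migrate_nodes and the dict-based stores are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def migrate_nodes(source: Iterable[Tuple[str, bytes]], target: Dict[str, bytes]) -> Tuple[int, int]:
    """Copy nodes into target, skipping ids that are already present.

    Returns (copied, skipped) so an interrupted run can simply be re-run.
    """
    copied = skipped = 0
    for node_id, data in source:
        if node_id in target:
            skipped += 1
            continue
        target[node_id] = data
        copied += 1
    return copied, skipped
```

Skipping ids that already exist lets the step be re-run after an interrupted install without rewriting data that was already copied.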

BYK (Member) commented Aug 6, 2025

FYI your PR is targeting my branch. Not sure if that was intentional or not.

aldy505 (Collaborator, Author) commented Aug 6, 2025

> FYI your PR is targeting my branch. Not sure if that was intentional or not.

It is intentional.

aldy505 merged commit 3ab2d81 into byk/feat/s3-nodestore on Aug 6, 2025
9 checks passed
aldy505 deleted the aldy505/feat/seaweed-nodestore branch on August 6, 2025 09:58
github-actions bot locked and limited conversation to collaborators on Aug 22, 2025
Labels: None yet
Projects: Archived in project
3 participants