Skip to content

Conversation

@blackboxsw
Copy link
Collaborator

@blackboxsw blackboxsw commented Dec 2, 2025

  • I have signed the CLA: https://ubuntu.com/legal/contributors
  • I have included a comprehensive commit message using the guide below
  • I have added unit tests to cover the new behavior under tests/unittests/
    • Test files should map to source files i.e. a source file cloudinit/example.py should be tested by tests/unittests/test_example.py
    • Run unit tests with tox -e py3
  • I have kept the change small, avoiding unnecessary whitespace or non-functional changes.
  • I have added a reference to issues that this PR relates to in the PR message (Refs integration: do not LXD bind mount /etc/cloud/cloud.cfg.d #1234, Fixes integration: do not LXD bind mount /etc/cloud/cloud.cfg.d #1234)
  • I have updated the documentation with the changed behavior.
    • If the change doesn't change the user interface and is trivial, this step may be skipped.
    • Cloud-config documentation is generated from the jsonschema.
    • Generate docs with tox -e doc.

Proposed Commit Message

feat(lxd): add s390x virtio-ports detection for LXD

Support LXD detection on IBM's s390x architecture using virtio-ports
serial device detection. Add a new detection method which checks
LXD serial devices in /sys/class/virtio-ports/ for both current
(com.canonical.lxd) and legacy (org.linuxcontainers.lxd) names.
Update public LXD datasource documentation for this detection method.

Drop the now unnecessary DMI board name check from ds-identify and
DataSourceLXD because virtio-ports is sufficient for both KVM/QEMU
and s390x.

The following mechinisms for LXD detection are retained:
1. /dev/lxd/sock socket file (primary method)
2. virtio-ports serial device presence if socket is not yet active

Implements: PL073

Additional Context

Test Steps

git checkout upstream/ubuntu/questing -B ubuntu/questing
new_upstream_snapshot -c 5ff9034ffdac4a59a871384c1bd082c2589a1c4e
build-package
sbuild ....
CLOUD_INIT_OS_IMAGE=questing CLOUD_INIT_CLOUD_INIT_SOURCE=cloud-init_25.4~2gdef5afa3-0ubuntu1_all.deb CLOUD_INIT_PLATFORM=lxd_vm .tox/integration-tests/bin/pytest tests/integration_tests/datasources/test_lxd_discovery.py::test_lxd_kvm_datasource_discovery_without_lxd_socket

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

@github-actions github-actions bot added the documentation This Pull Request changes documentation label Dec 2, 2025
@blackboxsw
Copy link
Collaborator Author

Additionally, we should add resolute support to pycloudlib per canonical/pycloudlib#494

@blackboxsw blackboxsw requested a review from holmanb December 3, 2025 03:58
@blackboxsw blackboxsw force-pushed the pl073-s390x-lxd branch 3 times, most recently from fe83d28 to 413e95b Compare December 3, 2025 04:04
Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @blackboxsw, this is a good start. I left some comments inline, also note that the commit message:

Drop the now unnecessary DMI board name check from ds-identify and
DataSourceLXD because virtio-ports is sufficient in bot KVM/QEMU and
s390x.

The following mechinisms for LXD detection are retained:
1. /dev/lxd/sock socket file (primary method)
2. virtio-ports serial device (KVM/QEMU and new for s390x)

This reads as if s390x is a new platform alternative to QEMU, when in fact it is a different cpu instruction set (like x86_64 or amd64).

Support LXD detection on IBM's s390x architecture using virtio-ports
serial device detection. Add a new detection method that checks for
LXD serial devices in /sys/class/virtio-ports/ for both current
(com.canonical.lxd) and legacy (org.linuxcontainers.lxd) names.
Update public LXD datasource documentation for this detection method.

Drop the now unnecessary DMI board name check from ds-identify and
DataSourceLXD because virtio-ports is sufficient in bot KVM/QEMU and
non-x86_64 platforms without DMI information.

The following mechinisms for LXD detection are retained:
1. /dev/lxd/sock socket file (primary method)
2. virtio-ports serial device presence if socket file isn't yet present

Implements: PL073
Copy link
Collaborator Author

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review @holmanb. I think I have resolved all comments.

@blackboxsw blackboxsw requested a review from holmanb December 17, 2025 19:43
Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we still have a failing ruff check.

Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this @blackboxsw. See my comments.

@blackboxsw blackboxsw requested a review from holmanb December 19, 2025 17:34
Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @blackboxsw, I think this is getting closer. See my latest round of comments.

@github-actions
Copy link

github-actions bot commented Jan 4, 2026

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging blackboxsw, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag blackboxsw to reopen it.)

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Jan 4, 2026
@holmanb holmanb removed the stale-pr Pull request is stale; will be auto-closed soon label Jan 5, 2026
@holmanb holmanb self-assigned this Jan 5, 2026
@blackboxsw blackboxsw force-pushed the pl073-s390x-lxd branch 3 times, most recently from 67190b4 to df409f7 Compare January 5, 2026 22:25
- comment fixes in tests and code paths regarding lxd socket file.
- fix integration test behavior to validate ds-identify discovery of
  lxd despite security.devlxd set False
- fix unit tests to drop trailing newline
Copy link
Collaborator Author

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Brett. I believe I have addressed all review comments now.

@blackboxsw blackboxsw requested a review from holmanb January 5, 2026 22:33
Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @blackboxsw. Besides the linter failures it looks like there were also a couple of comments from before that weren't addressed.

Comment on lines 88 to 95
# Expect fallback to NoCloud because python DataSourceLXD cannot
# get any information from /dev/lxd/sock due to security.devlxd above.
cloud_id = client.execute("cloud-id").stdout
if "nocloud" != cloud_id:
raise AssertionError(
"cloud-init did not discover 'nocloud' datasource."
f" Found '{cloud_id}'"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests a specific failure mode that occurs when the socket is disabled, which is more of a test of the platform than of cloud-init, isn't it? This doesn't actually "Test DataSourceLXD on KVM detected by ds-identify using virtio-ports."

If the LXD project stopped providing the nocloud device then this test would start to fail unexpectedly due to changes in an external project. Checking for nocloud here could provide a "canary signal" to warn that something is broken on the platform, but the signal getting to the right place would mean work for the developers to look at failing integration tests and doing something about the failure. But if this is known and expected by LXD developers then what is the point?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was maybe misleading here or didn't provide enough relevant context.

I've updated the comment to:

+        # Expect NoCloud because python DataSourceLXD cannot read metadata from
+        # /dev/lxd/sock due to security.devlxd above and falls back to
+        # the nocloud seed files written by _customize_environment.
         cloud_id = client.execute("cloud-id").stdout

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what is happening. I'm saying I don't necessarily think that this is important behavior for us to test. Is the fact that NoCloud is detected when the socket is not available a core feature of cloud-init? Or is this just a failure mode that happens to still resemble success simply because that is how LXD originally exposed the IMDS?

My comment earlier was about the test maintenance burden this kind of test will have introduced if the LXD project later stops providing 2 duplicate datasources. I don't want to have to come back later and rewrite this test when the LXD project decides that they will no longer provide duplicate datasources.

},
# /dev/lxd/sock does not exist and KVM virt-type
"mocks": [{"name": "is_socket_file", "ret": 1}, MOCK_VIRT_IS_KVM],
"mocks": [{"name": "is_socket_file", "ret": 1}, MOCK_VIRT_IS_KVM_QEMU],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forget to push?

@blackboxsw blackboxsw requested a review from holmanb January 6, 2026 20:55
@blackboxsw
Copy link
Collaborator Author

blackboxsw commented Jan 6, 2026

Thank you for this additional round @holmanb. Getting nearer to the finish line here and avoiding stray test changes that are unrelated to this PR.

Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @blackboxsw! I think we're getting close. See my comments below.

Comment on lines 20 to 24
Disabling the ``security.devlxd`` configuration setting at initial launch will
ensure that ``cloud-init`` uses the :ref:`datasource_nocloud` datasource.
Disabling ``security.devlxd`` over the life of the container will result in
warnings from ``cloud-init``, and ``cloud-init`` will keep the
originally-detected LXD datasource.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now that this is documented.

This seems like something that users shouldn't need to do. I would rather we didn't document this behavior unless we have a really good reason to do so.

Comment on lines 9 to 16
performs HTTP GETs against the `LXD socket device`_ which is provided to each
running LXD container and VM as ``/dev/lxd/sock`` and represents all
running LXD container and VM as :file:`/dev/lxd/sock` and represents all
instance-meta-data as versioned HTTP routes such as:

- 1.0/meta-data
- 1.0/config/cloud-init.vendor-data
- 1.0/config/cloud-init.user-data
- 1.0/config/user.<any-custom-key>
Copy link
Member

@holmanb holmanb Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are implementation details of LXD - a user shouldn't need them.

If anywhere, they belong in dev docs, but even then it doesn't seem important.

I would drop and represents all.*$ of this highlighted section (including the bullet list).

Comment on lines 26 to +28
The LXD datasource is detected as viable by ``ds-identify`` during the
:ref:`detect stage<boot-Detect>` when either ``/dev/lxd/sock`` exists or
``/sys/class/dmi/id/board_name`` matches "LXD".
:ref:`detect stage<boot-Detect>` when either :file:`/dev/lxd/sock` exists
or an LXD serial device is present in :file:`/sys/class/virtio-ports`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also seem like implementation details the user shouldn't need.

Comment on lines +536 to +538
pytest.param("LXD-kvm-not-MAAS-1", True, id="mass_not_detected_1"),
pytest.param("LXD-kvm-not-MAAS-2", True, id="mass_not_detected_2"),
pytest.param("LXD-kvm-not-MAAS-3", True, id="mass_not_detected_3"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I just miss adding these tests to the parametrized list?

The structure of these tests is error-prone: it requires modification in multiple places which has a high likelyhood of mistakes. The networking tests are like that too, iirc, but in both cases when I considered refactoring them before it seemed like too much work to be worth it.

Maybe something to point an agent at someday.

set +f; set -- "${virtio_ports_path}/"*; set -f;
for port_dir in "$@"; do
local name_file="${port_dir}/name"
if [ -f "${name_file}" ]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Besides the nesting argument, cognitive load can be reduced if we instead continue if the file doesn't exist. The difference is subtle, but real. Filtering out values with early continues (or in other cases early returns) reduces the range of possible values in scope - so there is less to mentally store in your head. An early continue when the conditions don't match the expected condition ensures that the condition that was checked for has been completely dealt with. Compare two cases:

  1. As written:
    if [ -f "${name_file}"]; then
        # do something
    fi
    # The reader knows nothing about the path in $name_file at this point!
  1. Filter out unwanted conditions with an early continue:
    if [ ! -f "${name_file}"]; then
        continue
    fi
    # The case where "not a regular file or doesn't exist" has been
    # completely handled at this point. The reader knows that the path in
    # $name_file exists and is a real file for the remainder of the scope. 

Obviously this isn't a hard rule to always follow - but when filtering for a single condition, simply following this rule tends to produce code that is less nested and requires less cognitive load simply because there is less "state" to keep track of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation This Pull Request changes documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants