-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(lxd): add s390x virtio-ports detection for LXD #6597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Additionally, we should add resolute support to pycloudlib per canonical/pycloudlib#494 |
fe83d28 to
413e95b
Compare
holmanb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @blackboxsw, this is a good start. I left some comments inline, also note that the commit message:
Drop the now unnecessary DMI board name check from ds-identify and
DataSourceLXD because virtio-ports is sufficient in bot KVM/QEMU and
s390x.
The following mechinisms for LXD detection are retained:
1. /dev/lxd/sock socket file (primary method)
2. virtio-ports serial device (KVM/QEMU and new for s390x)
This reads as if s390x is a new platform alternative to QEMU, when in fact it is a different cpu instruction set (like x86_64 or amd64).
Support LXD detection on IBM's s390x architecture using virtio-ports serial device detection. Add a new detection method that checks for LXD serial devices in /sys/class/virtio-ports/ for both current (com.canonical.lxd) and legacy (org.linuxcontainers.lxd) names. Update public LXD datasource documentation for this detection method. Drop the now unnecessary DMI board name check from ds-identify and DataSourceLXD because virtio-ports is sufficient in bot KVM/QEMU and non-x86_64 platforms without DMI information. The following mechinisms for LXD detection are retained: 1. /dev/lxd/sock socket file (primary method) 2. virtio-ports serial device presence if socket file isn't yet present Implements: PL073
blackboxsw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review @holmanb. I think I have resolved all comments.
413e95b to
cff1ba6
Compare
holmanb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we still have a failing ruff check.
holmanb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work on this @blackboxsw. See my comments.
holmanb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @blackboxsw, I think this is getting closer. See my latest round of comments.
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging blackboxsw, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag blackboxsw to reopen it.) |
67190b4 to
df409f7
Compare
- comment fixes in tests and code paths regarding lxd socket file. - fix integration test behavior to validate ds-identify discovery of lxd despite security.devlxd set False - fix unit tests to drop trailing newline
df409f7 to
35c274a
Compare
blackboxsw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Brett. I believe I have addressed all review comments now.
35c274a to
042e234
Compare
holmanb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @blackboxsw. Besides the linter failures it looks like there were also a couple of comments from before that weren't addressed.
| # Expect fallback to NoCloud because python DataSourceLXD cannot | ||
| # get any information from /dev/lxd/sock due to security.devlxd above. | ||
| cloud_id = client.execute("cloud-id").stdout | ||
| if "nocloud" != cloud_id: | ||
| raise AssertionError( | ||
| "cloud-init did not discover 'nocloud' datasource." | ||
| f" Found '{cloud_id}'" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests a specific failure mode that occurs when the socket is disabled, which is more of a test of the platform than of cloud-init, isn't it? This doesn't actually "Test DataSourceLXD on KVM detected by ds-identify using virtio-ports."
If the LXD project stopped providing the nocloud device then this test would start to fail unexpectedly due to changes in an external project. Checking for nocloud here could provide a "canary signal" to warn that something is broken on the platform, but the signal getting to the right place would mean work for the developers to look at failing integration tests and doing something about the failure. But if this is known and expected by LXD developers then what is the point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was maybe misleading here or didn't provide enough relevant context.
I've updated the comment to:
+ # Expect NoCloud because python DataSourceLXD cannot read metadata from
+ # /dev/lxd/sock due to security.devlxd above and falls back to
+ # the nocloud seed files written by _customize_environment.
cloud_id = client.execute("cloud-id").stdout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what is happening. I'm saying I don't necessarily think that this is important behavior for us to test. Is the fact that NoCloud is detected when the socket is not available a core feature of cloud-init? Or is this just a failure mode that happens to still resemble success simply because that is how LXD originally exposed the IMDS?
My comment earlier was about the test maintenance burden this kind of test will have introduced if the LXD project later stops providing 2 duplicate datasources. I don't want to have to come back later and rewrite this test when the LXD project decides that they will no longer provide duplicate datasources.
tests/unittests/test_ds_identify.py
Outdated
| }, | ||
| # /dev/lxd/sock does not exist and KVM virt-type | ||
| "mocks": [{"name": "is_socket_file", "ret": 1}, MOCK_VIRT_IS_KVM], | ||
| "mocks": [{"name": "is_socket_file", "ret": 1}, MOCK_VIRT_IS_KVM_QEMU], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forget to push?
|
Thank you for this additional round @holmanb. Getting nearer to the finish line here and avoiding stray test changes that are unrelated to this PR. |
holmanb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @blackboxsw! I think we're getting close. See my comments below.
| Disabling the ``security.devlxd`` configuration setting at initial launch will | ||
| ensure that ``cloud-init`` uses the :ref:`datasource_nocloud` datasource. | ||
| Disabling ``security.devlxd`` over the life of the container will result in | ||
| warnings from ``cloud-init``, and ``cloud-init`` will keep the | ||
| originally-detected LXD datasource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that this is documented.
This seems like something that users shouldn't need to do. I would rather we didn't document this behavior unless we have a really good reason to do so.
| performs HTTP GETs against the `LXD socket device`_ which is provided to each | ||
| running LXD container and VM as ``/dev/lxd/sock`` and represents all | ||
| running LXD container and VM as :file:`/dev/lxd/sock` and represents all | ||
| instance-meta-data as versioned HTTP routes such as: | ||
|
|
||
| - 1.0/meta-data | ||
| - 1.0/config/cloud-init.vendor-data | ||
| - 1.0/config/cloud-init.user-data | ||
| - 1.0/config/user.<any-custom-key> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are implementation details of LXD - a user shouldn't need them.
If anywhere, they belong in dev docs, but even then it doesn't seem important.
I would drop and represents all.*$ of this highlighted section (including the bullet list).
| The LXD datasource is detected as viable by ``ds-identify`` during the | ||
| :ref:`detect stage<boot-Detect>` when either ``/dev/lxd/sock`` exists or | ||
| ``/sys/class/dmi/id/board_name`` matches "LXD". | ||
| :ref:`detect stage<boot-Detect>` when either :file:`/dev/lxd/sock` exists | ||
| or an LXD serial device is present in :file:`/sys/class/virtio-ports`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also seem like implementation details the user shouldn't need.
| pytest.param("LXD-kvm-not-MAAS-1", True, id="mass_not_detected_1"), | ||
| pytest.param("LXD-kvm-not-MAAS-2", True, id="mass_not_detected_2"), | ||
| pytest.param("LXD-kvm-not-MAAS-3", True, id="mass_not_detected_3"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I just miss adding these tests to the parametrized list?
The structure of these tests is error-prone: it requires modification in multiple places which has a high likelyhood of mistakes. The networking tests are like that too, iirc, but in both cases when I considered refactoring them before it seemed like too much work to be worth it.
Maybe something to point an agent at someday.
tools/ds-identify
Outdated
| set +f; set -- "${virtio_ports_path}/"*; set -f; | ||
| for port_dir in "$@"; do | ||
| local name_file="${port_dir}/name" | ||
| if [ -f "${name_file}" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Besides the nesting argument, cognitive load can be reduced if we instead continue if the file doesn't exist. The difference is subtle, but real. Filtering out values with early continues (or in other cases early returns) reduces the range of possible values in scope - so there is less to mentally store in your head. An early continue when the conditions don't match the expected condition ensures that the condition that was checked for has been completely dealt with. Compare two cases:
- As written:
if [ -f "${name_file}"]; then
# do something
fi
# The reader knows nothing about the path in $name_file at this point!- Filter out unwanted conditions with an early continue:
if [ ! -f "${name_file}"]; then
continue
fi
# The case where "not a regular file or doesn't exist" has been
# completely handled at this point. The reader knows that the path in
# $name_file exists and is a real file for the remainder of the scope. Obviously this isn't a hard rule to always follow - but when filtering for a single condition, simply following this rule tends to produce code that is less nested and requires less cognitive load simply because there is less "state" to keep track of.
tests/unittests/cloudinit/example.pyshould be tested bytests/unittests/test_example.pytox -e py3tox -e doc.Proposed Commit Message
Additional Context
Test Steps
Merge type