
Conversation

@elias-ba (Contributor) commented Nov 12, 2025

Description

This PR fixes a security vulnerability in the mix lightning.merge_projects task where malicious JSON input with arbitrary keys could cause atom exhaustion and crash the VM.

The fix uses String.to_existing_atom/1 to safely convert JSON keys to atoms, only allowing keys that already exist in the system. This prevents creation of unlimited atoms from malicious input while maintaining compatibility with the merge algorithm.
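
For reference, a minimal sketch of the conversion described above - the atomize_keys/1 name matches the PR, but the exact body here is illustrative rather than the committed diff:

    defp atomize_keys(data) when is_map(data) do
      Map.new(data, fn {key, value} ->
        # Only keys that already exist as atoms are converted; an unknown key
        # raises ArgumentError instead of allocating a new atom.
        atom_key = if is_binary(key), do: String.to_existing_atom(key), else: key
        {atom_key, atomize_keys(value)}
      end)
    end

    defp atomize_keys(data) when is_list(data), do: Enum.map(data, &atomize_keys/1)
    defp atomize_keys(data), do: data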

Closes #3956

Validation steps

  1. Test basic merge functionality:

    mix test test/mix/tasks/merge_projects_test.exs

    All tests should pass.

  2. Test with non-UUID IDs (Joe's requirement):

    • Run merge with projects using simple IDs like "1", "2", "test-source-1"
    • Verify merge works without requiring valid UUIDs
  3. Test security:

    • Try merging a project with unknown JSON keys
    • Should raise clear error about unknown fields
  4. Test offline operation:

    • Run mix task without database connection
    • Should work without any database access

Additional notes for the reviewer

  1. Implementation approach: Uses an atomize_keys/1 function that recursively converts only map keys to atoms (not values). UUIDs and other string values remain as strings.

  2. No Provisioner coupling: Intentionally does not use Provisioner.parse_document/1 to avoid coupling to validation rules and usage limiting checks (per Joe's feedback).

  3. Lets merge fail naturally: If project structure is truly invalid, the merge algorithm itself will fail with appropriate errors rather than blocking upfront.

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

Pre-submission checklist

  • I have performed a self-review of my code.
  • I have implemented and tested all related authorization policies. (e.g., :owner, :admin, :editor, :viewer)
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

codecov bot commented Nov 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.67%. Comparing base (cc02dd3) to head (172e011).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3973      +/-   ##
==========================================
+ Coverage   88.65%   88.67%   +0.02%     
==========================================
  Files         422      422              
  Lines       18913    18913              
==========================================
+ Hits        16767    16771       +4     
+ Misses       2146     2142       -4     


@elias-ba (Contributor, Author) commented Nov 12, 2025

Hey @rorymckinley and @josephjclark 👋

This is a new PR that supersedes #3956. Based on Joe's feedback about not coupling the merge task to Provisioner validation or database requirements, I've implemented a much simpler approach.

What changed from #3956

The new implementation:

  • No Provisioner dependency (avoids coupling to validation rules and usage limits)
  • No database access (works completely offline as Joe needed)
  • Simple String.to_existing_atom/1 for security (prevents atom exhaustion)
  • Supports non-UUID IDs like "1", "test-source-1" (Joe's requirement for testing)
  • Lets the merge algorithm handle validation naturally

How it works

  1. Takes JSON files as input
  2. Converts string keys to existing atoms using String.to_existing_atom/1
  3. Passes atom-keyed maps to MergeProjects.merge_project/2
  4. Returns the merged result

This approach keeps the Mix task decoupled from Lightning's validation constraints while still preventing the atom exhaustion security vulnerability.
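
Concretely, the flow boils down to something like this (the function names come from the diff; the body is a simplified sketch, not the exact implementation):

    defp perform_merge(source_data, target_data) do
      # Convert string keys to already-existing atoms before handing the maps
      # to the merge algorithm.
      source = atomize_keys(source_data)
      target = atomize_keys(target_data)
      MergeProjects.merge_project(source, target)
    end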

@josephjclark - As discussed on Slack, this should satisfy your testing needs. When you have time, please test this branch thoroughly and let me know if you see any issues. I've added you as a reviewer.

@rorymckinley - When you get a chance, would you be able to help review this and get it merged? I believe this approach is cleaner and aligns better with the Mix task's purpose of being a simple, offline utility.

Thanks both! 🙏

@josephjclark (Collaborator) left a comment

Not going to pretend to 100% understand what's happening here - but this passes against my test suite, so I'm all for it!

@rorymckinley (Collaborator) left a comment

@elias-ba I have not had a chance to really step through the test changes, will do that tomorrow, but I am shutting down now and don't want to take the chance that GitHub forgets my current comments - so please consider this to be part 1 :).

I have 'Requested Changes' primarily because of my confusion re: Jason.decode - is it still in the execution path, or are my eyes just tired?

This may indicate incompatible project structures or corrupted data.
Please verify both files are valid Lightning project exports.
""")
defp atomize_keys(data) when is_list(data) do

Collaborator:

@elias-ba If I comment out this method, no tests fail - is it still required?

defp atomize_keys(data) when is_map(data) do
  Map.new(data, fn {key, value} ->
    atom_key = if is_binary(key), do: String.to_existing_atom(key), else: key
    {atom_key, atomize_keys(value)}

Collaborator:

@elias-ba If you are going to be atomizing things that are not keys - maybe this should be called just atomize, or maybe_atomize?

MergeProjects.merge_project(source_project, target_project)
rescue
e in KeyError ->
ArgumentError ->

Collaborator:

@elias-ba This does not appear to have any test coverage?

end

defp perform_merge(source_project, target_project) do
defp perform_merge(source_data, target_data) do

Collaborator:

@elias-ba I can unfortunately not comment on the line I want to as it is not part of the diff, but:

read_state_file still contains Jason.decode(content, keys: :atoms), and read_state_file provides the input to perform_merge, so it gets executed before we call atomize_keys. Are we not still vulnerable to a DoS via JSON parsing?
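
If that reading is right, one option (a sketch only - read_state_file and atomize_keys are the names used in this PR, but this body is an assumption, not the actual code) is to decode with string keys and only atomize afterwards:

    defp read_state_file(path) do
      with {:ok, content} <- File.read(path),
           # Decoding with string keys means untrusted JSON cannot mint new atoms.
           {:ok, data} <- Jason.decode(content) do
        {:ok, atomize_keys(data)}
      end
    end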

#{Exception.message(e)}
defp atomize_keys(data) when is_map(data) do
  Map.new(data, fn {key, value} ->
    atom_key = if is_binary(key), do: String.to_existing_atom(key), else: key

Collaborator:

@elias-ba Are we sure that all the string keys will already exist as atoms at the time this task runs? I can imagine that it will be quite a frustrating user experience if this is not the case.
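
For context on the failure mode (plain Elixir behaviour, shown purely as an illustration):

    # Only succeeds if the atom was already created, e.g. by a compiled struct
    # or schema that defines the field.
    String.to_existing_atom("name")

    # Raises ArgumentError if no code in the VM has ever created this atom.
    String.to_existing_atom("some_key_nothing_has_loaded_yet")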

end

describe "run/1 - flexibility for testing (Joe's requirements)" do
test "allows non-UUID IDs for testing purposes", %{tmp_dir: tmp_dir} do

Collaborator:

@elias-ba Would passing in non-UUID IDs be a valid option for users other than @josephjclark?

If not, I would suggest that we perhaps disable the UUID checking via an ENV variable to give Joe the flexibility he requires, but keep it enabled otherwise, which would allow other users tighter validation?
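
Something along these lines, purely as an illustration of the suggestion - the environment variable name and the Ecto.UUID.cast/1 check are assumptions, not anything in this PR:

    defp validate_id!(id) do
      # Hypothetical escape hatch: skip strict UUID validation when the env var is set.
      if System.get_env("MERGE_PROJECTS_SKIP_UUID_CHECK") == "true" do
        :ok
      else
        case Ecto.UUID.cast(id) do
          {:ok, _uuid} -> :ok
          :error -> raise ArgumentError, "expected a UUID, got: #{inspect(id)}"
        end
      end
    end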

Collaborator:

@elias-ba This test looks as if it may be a duplicate of the test on line 9?

Collaborator:

Happy with a strict mode in prod and chill mode in dev! I don't need this stuff to run on the platform

github-project-automation bot moved this from New Issues to In review in v2, Nov 12, 2025
assert result["name"] == "target-project"
end

test "successfully merges projects with deeply nested structures", %{

Collaborator:

@elias-ba This test also looks as if it may be a duplicate of the test on line 9?

assert job["name"] == "Job 1"
end

test "works offline without database access", %{tmp_dir: tmp_dir} do

Collaborator:

@elias-ba For this test are we simulating 'without database access' by just not inserting any fixtures into the database?

@rorymckinley (Collaborator) commented:

@elias-ba Ok, done. If my assessment re: the continued presence of Jason.decode with atoms is correct, that is the only blocker - the rest are nice-to-haves (depending on how you view test coverage :) ).
