stabilize ELF analysis and prevent crashes/hangs #2914

Closed
akshat4703 wants to merge 4 commits into mandiant:master from akshat4703:akshat/elf-binaries

Conversation

@akshat4703

Summary

This PR improves the stability of ELF analysis in capa and resolves several
issues reported in #2780 where ELF binaries could cause crashes or hangs.

Problems addressed

  1. Unsupported architectures (e.g., an aarch64 build of yara-x) triggered
    fatal exceptions during analysis.
  2. vivisect section-symbol parsing could stall the ELF workspace loader.
  3. Certain viv analysis modules exhibit pathological behavior on ELF files.
  4. Large ELF binaries could cause long analysis times or hangs.

Changes

Clean handling of unsupported architectures

When encountering unsupported ELF architectures, capa now exits gracefully
with E_INVALID_FILE_ARCH instead of raising a fatal exception.
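
A minimal sketch of this pattern, assuming the backend signals a missing architecture by raising ModuleNotFoundError (as the bot summary further down describes). The exception class, the helper names, and the numeric exit code here are stand-ins for illustration, not capa's actual definitions:

```python
import sys

# Placeholder value for illustration only; capa defines its own exit codes.
E_INVALID_FILE_ARCH = 1


class UnsupportedArchError(Exception):
    """Stand-in for an 'unsupported architecture' error type."""


def load_workspace(path, loader):
    """Call the (hypothetical) loader and translate a missing-module error."""
    try:
        return loader(path)
    except ModuleNotFoundError as e:
        # Assumption: a missing backend module means the architecture
        # is not supported, so surface it as a domain-specific error.
        raise UnsupportedArchError(str(e)) from e


def run(path, loader):
    """Return a process exit code instead of letting a traceback escape."""
    try:
        load_workspace(path, loader)
    except UnsupportedArchError as e:
        print(f"unsupported architecture: {e}", file=sys.stderr)
        return E_INVALID_FILE_ARCH
    return 0
```

Translating the low-level error at the loader boundary keeps the top-level entry point free of backend-specific exception types.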

Prevent viv loader stalls

Adjusted ELF workspace loading to avoid cases where vivisect section-symbol
parsing causes the loader to get stuck.
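
One way to bypass a slow parsing step only for the duration of a load is a context manager that swaps the method out and always restores it afterwards. The sketch below uses a made-up `FakeElfParser` class; it does not reproduce vivisect's real class or method names:

```python
import contextlib


@contextlib.contextmanager
def temporarily_replace(obj, name, replacement):
    """Swap obj.<name> for `replacement`, restoring the original on exit."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield
    finally:
        # Restore even if the body raised, so the patch never leaks.
        setattr(obj, name, original)


class FakeElfParser:
    """Hypothetical parser used only to demonstrate the patching."""

    def get_section_symbols(self):
        return ["expensive", "result"]


def load_without_section_symbols():
    """Run a load step while the slow method is stubbed out."""
    with temporarily_replace(FakeElfParser, "get_section_symbols", lambda self: []):
        return FakeElfParser().get_section_symbols()
```

The `finally` clause is the important part: the original behavior comes back whether the load succeeds or fails.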

Disable problematic viv modules for ELF

Viv modules known to trigger unstable behavior during ELF analysis are
disabled for ELF workspaces.
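
As a rough illustration, disabling modules can be modeled as filtering a list of module names before analysis runs. The short names below come from the bot summary later in this thread; the dotted paths in the usage line and the function itself are hypothetical and do not call any vivisect API:

```python
# Short module names as listed in the bot summary of this PR.
DISABLED_FOR_ELF = ("symswitchcase", "elfplt", "emulation", "emucode", "noret")


def filter_modules(module_names, disabled=DISABLED_FOR_ELF):
    """Return module names whose last dotted component is not disabled."""
    kept = []
    for name in module_names:
        leaf = name.rsplit(".", 1)[-1]
        if leaf in disabled:
            continue
        kept.append(name)
    return kept


# Illustrative module paths only.
remaining = filter_modules(["pkg.noret", "pkg.entrypoints", "pkg.elfplt"])
```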

Bound analysis scope for large ELF binaries

Introduced a safety bound for viv function analysis:

CAPA_ELF_MAX_FUNCTIONS (default: 1000)

This prevents excessive analysis in very large ELF binaries and avoids
hangs similar to those observed with /usr/bin/gimp.
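
A sketch of how such a bound might be read and applied. Only the variable name and the default of 1000 come from this description; the fallback on invalid values and the choice to keep the lowest addresses are assumptions made for the example:

```python
import os

DEFAULT_MAX_FUNCTIONS = 1000


def get_elf_max_functions(environ=None):
    """Read CAPA_ELF_MAX_FUNCTIONS, falling back to the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CAPA_ELF_MAX_FUNCTIONS", "")
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unset or malformed values use the default.
        return DEFAULT_MAX_FUNCTIONS
    return value if value > 0 else DEFAULT_MAX_FUNCTIONS


def bound_functions(addresses, limit):
    """Keep at most `limit` function addresses (lowest first, by assumption)."""
    return sorted(addresses)[:limit]
```

For example, `bound_functions(all_functions, get_elf_max_functions())` would cap the set handed to the extractor. Note that any such cap trades completeness for run time: capabilities in the dropped functions go unreported.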

Testing

Added tests:

tests/test_loader_segfault.py

Results:

  • pytest tests/test_loader_segfault.py -q
    6 tests passed

Manual verification:

capa /bin/ls -> RC 0
capa /usr/bin/gimp -> RC 0 (completes successfully)
capa --debug /usr/bin/gimp -> RC 0

Additionally verified behavior on an aarch64 sample (yara-x):

  • exits cleanly with E_INVALID_FILE_ARCH
  • no fatal traceback

Files changed

capa/loader.py
capa/main.py
tests/test_loader_segfault.py

Result

ELF analysis is now more robust and avoids:

  • fatal exceptions
  • loader stalls
  • hangs on large binaries


@github-actions github-actions bot left a comment

Please add bug fixes, new features, breaking changes and anything else you think is worthwhile mentioning to the master (unreleased) section of CHANGELOG.md. If no CHANGELOG update is needed add the following to the PR description: [x] No CHANGELOG update needed

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and robustness of ELF file analysis within capa. It addresses several long-standing issues that led to crashes, hangs, or excessive resource consumption when processing ELF binaries, particularly those with unsupported architectures or large symbol tables. By implementing graceful error handling, optimizing vivisect workspace loading, and introducing analysis timeouts and scope limits, the changes ensure a more reliable and predictable analysis experience for ELF files.

Highlights

  • Improved ELF Architecture Handling: Unsupported ELF architectures now result in a graceful exit with E_INVALID_FILE_ARCH instead of fatal exceptions, enhancing stability.
  • Vivisect Loader Stall Prevention: ELF workspace loading has been adjusted to temporarily disable vivisect's inefficient section-symbol parsing, preventing stalls on large binaries.
  • Problematic Vivisect Modules Disabled: Several vivisect analysis modules known to cause unstable or pathological behavior during ELF analysis are now disabled for ELF workspaces.
  • Analysis Scope Bounding and Timeouts: Introduced CAPA_ELF_MAX_FUNCTIONS to limit function analysis and CAPA_ELF_ANALYSIS_TIMEOUT_SECONDS for vivisect analysis, along with CAPA_ELF_TOTAL_ANALYSIS_TIMEOUT_SECONDS for overall capability matching, preventing hangs on large ELF binaries.
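
For reference, a SIGALRM-based time limit of the kind summarized above could look roughly like the following. This is a sketch under stated assumptions, not the PR's code: the names are illustrative, and it silently becomes a no-op where SIGALRM is unavailable (e.g. Windows) or when called outside the main thread, since Python only allows signal handlers to be installed there.

```python
import contextlib
import signal
import threading


class AnalysisTimeoutError(Exception):
    """Raised when the timeboxed block runs past its limit."""


@contextlib.contextmanager
def timebox(seconds):
    """Raise AnalysisTimeoutError if the body runs longer than `seconds`."""
    usable = (
        seconds > 0
        and hasattr(signal, "SIGALRM")
        and threading.current_thread() is threading.main_thread()
    )
    if not usable:
        # No enforcement possible here; run the body without a limit.
        yield
        return

    def on_alarm(signum, frame):
        raise AnalysisTimeoutError(f"exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer first, then restore the earlier handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        if previous is not None:
            signal.signal(signal.SIGALRM, previous)
```

A caller would wrap the long-running step in `with timebox(limit):` and catch `AnalysisTimeoutError` to map it to whatever error or exit code is appropriate. One caveat of this approach is that the handler only runs between Python bytecodes, so a long call inside native code may not be interrupted promptly.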

Changelog
  • capa/loader.py
    • Added imports for threading and signal modules.
    • Introduced _AnalysisTimeoutError class for handling analysis timeouts.
    • Implemented _get_elf_analysis_timeout_seconds and _get_elf_max_functions to retrieve timeout and function limit values from environment variables.
    • Created _timebox context manager to apply time limits to code blocks using SIGALRM.
    • Developed _temporarily_disable_viv_elf_section_symbols context manager to skip vivisect's inefficient ELF section-symbol parsing during workspace loading.
    • Modified get_workspace to detect ELF input, apply the section-symbol parsing bypass, and convert ModuleNotFoundError exceptions related to vivisect's impapi modules into UnsupportedArchError.
    • Updated get_workspace to apply a timeout to vivisect's analyze() call for ELF files, raising CorruptFile on timeout.
    • Adjusted get_workspace to disable specific vivisect analysis modules (e.g., symswitchcase, elfplt, emulation, emucode, noret) for ELF files to prevent pathological behavior.
    • Modified get_extractor to apply a FunctionFilter based on CAPA_ELF_MAX_FUNCTIONS for ELF files, limiting the scope of analysis for very large binaries.
  • capa/main.py
    • Added imports for signal and threading modules.
    • Introduced _AnalysisTimeoutError class (similar to capa/loader.py).
    • Implemented _get_elf_total_analysis_timeout_seconds to retrieve a total analysis timeout for ELF capability matching.
    • Created _timebox context manager (similar to capa/loader.py) to apply time limits.
    • Modified the main function to apply a total analysis timeout to the find_capabilities call for ELF files, returning E_FILE_LIMITATION on timeout.
  • tests/test_loader_segfault.py
    • Added imports for capa.loader and UnsupportedArchError.
    • Added test_elf_workspace_temporarily_disables_section_symbol_parsing to verify the temporary disabling and restoration of vivisect's ELF section-symbol parsing.
    • Added test_viv_module_not_found_maps_to_unsupported_arch to confirm that vivisect ModuleNotFoundError during analysis is correctly converted to UnsupportedArchError.
    • Added test_viv_workspace_module_not_found_maps_to_unsupported_arch to confirm that vivisect ModuleNotFoundError during workspace creation is correctly converted to UnsupportedArchError.
    • Added test_elf_analysis_timeout_maps_to_corrupt_file to verify that an ELF analysis timeout is correctly converted to a CorruptFile exception.
Activity
  • New tests were added in tests/test_loader_segfault.py to cover the new stability improvements.
  • All 6 new tests in tests/test_loader_segfault.py passed successfully.
  • Manual verification was performed on /bin/ls and /usr/bin/gimp, confirming successful completion (RC 0).
  • Manual verification on an aarch64 sample (yara-x) confirmed clean exit with E_INVALID_FILE_ARCH and no fatal traceback.

@github-actions github-actions bot dismissed their stale review March 13, 2026 17:01

CHANGELOG updated or no update needed, thanks! 😄

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the stability of ELF file analysis by introducing timeouts, gracefully handling unsupported architectures, and implementing workarounds for performance issues in vivisect. The changes are logical and well-supported by new tests. My review includes a few suggestions to address code duplication and improve readability, which will enhance the long-term maintainability of this new functionality.

@williballenthin
Collaborator

@akshat4703 in #2913 I asked you to discuss ideas with the maintainers before opening PRs. Less than 12h later you've opened this PR with multiple unexpected changes. We are already spread thin, and these PRs make it even harder to maintain capa.

Please discuss ideas and proposals with us first; otherwise, we will close PRs without discussion.

@akshat4703
Author

I am sorry for being so naive, understood.

Since #2780 already exists for the stability issues, would you prefer that I post the proposed approaches and discussion there, or open a separate issue to outline the ideas before any PR? Or should I open a discussion?

Happy to follow whichever workflow you prefer.
