Skip to content

Analysis of utils.py #33

@albatrosary

Description

@albatrosary

This is the utility belt — security, editor integration, console helpers, string manipulation. It's generally well-structured but has a few significant issues.


Architecture

utils.py
├── Security:     secure_resolve_path()
├── Editor:       open_editor_for_prompt()
├── Console:      safe_print(), clear_thinking_line(), print_welcome_banner()
├── Config:       _get_cfg_int()
├── AI helpers:   _make_continue_prompt(), _tail_of()
└── Extraction:   extract_code_block()

Critical Issues

1. secure_resolve_path has a path traversal bypass on certain platforms

if not os.path.commonpath([abs_base, target_path]) == abs_base:
    raise PermissionError(...)

os.path.commonpath raises ValueError on Windows if paths are on different drives:

os.path.commonpath(["C:\\data", "D:\\evil"])  # ValueError

This uncaught exception would crash the caller instead of raising PermissionError. More critically, on case-insensitive filesystems (macOS HFS+, Windows NTFS), os.path.commonpath does case-sensitive comparison, so:

abs_base = "/Users/data/Work_Data"
target   = "/Users/data/work_data/../../../etc/passwd"
# After abspath: "/etc/passwd"
# commonpath correctly catches this

That specific case works because abspath resolves ... But the real issue is symlinks:

# If /Users/data/work_data/link -> /etc
filename = "link/passwd"
target_path = os.path.abspath("/Users/data/work_data/link/passwd")
# = "/Users/data/work_data/link/passwd"  (abspath doesn't resolve symlinks!)
# commonpath check PASSES
# But the actual file is /etc/passwd

Fix: Use os.path.realpath instead of os.path.abspath:

abs_base = os.path.realpath(base_dir)
target_path = os.path.realpath(os.path.join(abs_base, filename))

if not target_path.startswith(abs_base + os.sep) and target_path != abs_base:
    raise PermissionError(...)

The startswith check with os.sep appended prevents /data_evil matching base /data.

2. extract_code_block misparses nested or indented code fences

if line.startswith("```"):
    if not in_block:
        in_block = True
    else:
        if line.strip() == "```":
            in_block = False
            ...
        else:
            current_block.append(line)

The logic is:

  • Opening: any line starting with ```
  • Closing: only lines that are exactly ``` after stripping

This means:

```python
def foo():
    pass
```python   ← this does NOT close the block (strip gives "```python")

The block never closes. More importantly, this line:

code here

Would be treated as an opening fence (starts with ```), not a 4-backtick fence. The parser doesn't handle variable-length fences per the CommonMark spec.

Also, the opening fence check skips the language identifier line, which is correct, but a line inside a code block that happens to start with ``` (e.g., showing markdown syntax) would be misinterpreted:

```markdown
Here's how to write a code block:
```python   ← parser thinks this is inside, treated as content
print("hello")
```          ← parser closes here, losing the real structure

Practical fix:

import re

def extract_code_block(text: str) -> str:
    pattern = re.compile(r"^(`{3,})(\w*)\s*\n(.*?)^\1\s*$", re.MULTILINE | re.DOTALL)
    blocks = [m.group(3) for m in pattern.finditer(text)]
    return "\n\n".join(blocks) if blocks else text

This handles variable-length fences and matches opening/closing fence lengths per CommonMark.


Significant Issues

3. open_editor_for_prompt has a TOCTOU vulnerability with temp files

tmp_fd, tmp_path = tempfile.mkstemp(suffix=".md", prefix="ai_prompt_")
with os.fdopen(tmp_fd, "w", encoding="utf-8") as f:
    tmp_fd = None
    f.write(header)

result = subprocess.run(editor_cmd + [tmp_path], check=False)

with open(tmp_path, encoding="utf-8") as f:
    raw_content = f.read()

Between the editor closing and the file being read back, another process could modify the temp file. This is a minor concern for a CLI tool, but the temp file is created with mkstemp defaults which on Unix gives 0o600 permissions — that's good.

More concerning: The tmp_fd = None trick to prevent double-close is clever but fragile. If os.fdopen raises an exception after taking ownership of the fd but before the assignment, the finally block would try to close an already-transferred fd. In practice, os.fdopen is unlikely to fail at that point, but the pattern is unusual.

4. clear_thinking_line doesn't account for multi-byte characters or ANSI codes

cols = shutil.get_terminal_size(fallback=(80, 20)).columns
print(" " * (cols - 1), end="\r", flush=True)

If the "thinking" line contained ANSI escape codes (which it doesn't currently, but could in the future), this wouldn't fully clear it. The standard approach is:

print(f"\r\033[K", end="", flush=True)  # ANSI: carriage return + clear to end of line

5. _get_cfg_int swallows all exceptions silently

def _get_cfg_int(config, section, key, fallback):
    try:
        return config.getint(section, key, fallback=fallback)
    except Exception:
        return fallback

This catches everything — including KeyboardInterrupt via the broad Exception (actually Exception doesn't catch KeyboardInterrupt, but it catches AttributeError if config is None, TypeError if fallback is wrong type, etc.). A malformed config value like max_tokens = "abc" would silently fall back with no warning. At minimum, log when this happens:

except (ValueError, configparser.Error) as e:
    logger.warning(f"Config error for [{section}]{key}: {e}. Using fallback={fallback}")
    return fallback

Moderate Issues

6. print_welcome_banner hardcodes the command list

print("[*] Commands: @model, @efficient, @scrub, @sequence, @sh, exit")

This duplicates the command registry in handlers.py and parsers.py. If commands change, three places need updating.

7. _make_continue_prompt is hardcoded English with no configurability

return (
    "The output was truncated due to an output limit.\n"
    "Continue EXACTLY from where you stopped.\n"
    ...
)

For a multi-engine tool, the continuation prompt could benefit from being engine-specific or configurable via the INI file. Some models respond better to different phrasing.

8. open_editor_for_prompt accepts logger parameter but other functions use the global logger

def open_editor_for_prompt(logger=None) -> str | None:
    if logger is None:
        logger = logging.getLogger("MultiAI")

This is the only utility function that accepts a logger parameter. Every other function in the codebase imports and uses the global logger from config.py. This inconsistency suggests an incomplete refactoring.

9. Comment noise continues

if not text:
    return ""  # Return empty string if input is empty
return text[-n:] if len(text) > n else text  # Return last n characters
extracted_blocks = []  # List to hold extracted code blocks
current_block = []  # List for the current block being constructed
in_block = False  # Flag to track if currently inside a code block

Summary Table

Severity Issue Location
🔴 Critical Symlink bypass in path traversal check secure_resolve_path()
🔴 Critical Code fence parser mishandles nested/language fences extract_code_block()
🟠 High commonpath raises ValueError on Windows cross-drive secure_resolve_path()
🟡 Medium _get_cfg_int silently swallows config errors _get_cfg_int()
🟡 Medium Clearing line without ANSI escape codes clear_thinking_line()
🟡 Medium Hardcoded command list in banner print_welcome_banner()
🟡 Medium Inconsistent logger parameter pattern open_editor_for_prompt()
🟢 Low Continue prompt not configurable _make_continue_prompt()
🟢 Low TOCTOU on temp file (minimal risk) open_editor_for_prompt()
🟢 Low Excessive comments Throughout

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions