[MEDIUM] inject-hashed-filenames.py: tag-aware HTML rewrite instead of substring replace #3

Closed
opened 2026-04-25 13:39:54 +02:00 by uwe.admin · 3 comments
Owner

From audit #1.

Problem

bin/inject-hashed-filenames.py:50-52 does a plain html.replace(old_src, new_src). If old_src ever appears outside a <script src="…"> attribute (e.g. inside a comment or a JSON literal embedded in the HTML), the rewrite is wrong.

Remediation

Tag-aware rewrite: regex limited to <script src="…"> (or a tiny HTML parser).

Acceptance

  • A test asserts that occurrences of old_src inside HTML comments / non-script-src contexts are NOT rewritten.

Effort: small.

From audit #1. ## Problem `bin/inject-hashed-filenames.py:50-52` does a plain `html.replace(old_src, new_src)`. If `old_src` ever appears outside a `<script src="…">` attribute (e.g. inside a comment or a JSON literal embedded in the HTML), the rewrite is wrong. ## Remediation Tag-aware rewrite: regex limited to `<script src="…">` (or a tiny HTML parser). ## Acceptance - A test asserts that occurrences of `old_src` inside HTML comments / non-script-src contexts are NOT rewritten. Effort: small.
uwe.admin added the
new
label 2026-04-25 13:39:54 +02:00
Author
Owner

Agent Evaluation

Feasibility: Trivial. The substring replace is exactly one line (bin/inject-hashed-filenames.py:50); a regex anchored to <script ... src="OLD" (and <link ... href="OLD" once stylesheets land) plus a small unit test covers it.

Impact: Low–Medium today. The placeholder filenames (/static/dist/app.js, /guest/dist/guest-app.js) are unique enough that a stray match in a comment or JSON literal is unlikely, but the failure mode is silent corruption — production HTML emits a broken script src and the SPA breaks at runtime. Worth fixing because the build pipeline is exactly the place where silent-substring bugs are most expensive.

Effort: Small.

Recommendation: Accept.

Implementation plan

  1. Replace html.replace(old_src, new_src) with a regex that matches only src / href attribute values (single or double quoted) on <script> / <link> tags. Pattern: re.compile(r'(<(?:script|link)[^>]*\b(?:src|href)\s*=\s*["\'])' + re.escape(old_src) + r'(["\']\s*/?>)'), replacement r'\g<1>' + new_src + r'\g<2>'.
  2. Refuse to write the file unchanged when zero matches occurred (turn the current print("skip") into an explicit warning so a typo in old_src is loud).
  3. Add tests/test_inject_hashed_filenames.py (pytest):
    • happy path: <script src="/static/dist/app.js"> → rewritten.
    • inert occurrence: <!-- /static/dist/app.js --> and <pre>{ "src": "/static/dist/app.js" }</pre> → NOT rewritten.
    • both attribute quote styles handled.
    • the script's existing skip paths (no manifest / no html / no entry) still exit gracefully.
  4. Wire the test into the scaffold's existing test runner (currently none — add a one-line pytest step to package.json scripts.test or to the parent project's prepare hook).

No decision checkboxes — the change is mechanical and the regex+test pair is the only sensible shape.

Note on stylesheets

The scaffold doesn't currently rewrite hashed CSS, but it's adjacent and easy to fold in. Recommend doing both <script src> and <link href> in the same regex so we don't have to revisit when CSS hashing lands.

## Agent Evaluation **Feasibility:** Trivial. The substring replace is exactly one line (`bin/inject-hashed-filenames.py:50`); a regex anchored to `<script ... src="OLD"` (and `<link ... href="OLD"` once stylesheets land) plus a small unit test covers it. **Impact:** Low–Medium today. The placeholder filenames (`/static/dist/app.js`, `/guest/dist/guest-app.js`) are unique enough that a stray match in a comment or JSON literal is unlikely, but the failure mode is silent corruption — production HTML emits a broken script src and the SPA breaks at runtime. Worth fixing because the build pipeline is exactly the place where silent-substring bugs are most expensive. **Effort:** Small. **Recommendation:** Accept. ### Implementation plan 1. Replace `html.replace(old_src, new_src)` with a regex that matches only `src` / `href` attribute values (single or double quoted) on `<script>` / `<link>` tags. Pattern: `re.compile(r'(<(?:script|link)[^>]*\b(?:src|href)\s*=\s*["\'])' + re.escape(old_src) + r'(["\']\s*/?>)')`, replacement `r'\g<1>' + new_src + r'\g<2>'`. 2. Refuse to write the file unchanged when zero matches occurred (turn the current `print("skip")` into an explicit warning so a typo in `old_src` is loud). 3. Add `tests/test_inject_hashed_filenames.py` (pytest): - happy path: `<script src="/static/dist/app.js">` → rewritten. - inert occurrence: `<!-- /static/dist/app.js -->` and `<pre>{ "src": "/static/dist/app.js" }</pre>` → NOT rewritten. - both attribute quote styles handled. - the script's existing skip paths (no manifest / no html / no entry) still exit gracefully. 4. Wire the test into the scaffold's existing test runner (currently none — add a one-line `pytest` step to `package.json` `scripts.test` or to the parent project's `prepare` hook). No decision checkboxes — the change is mechanical and the regex+test pair is the only sensible shape. ### Note on stylesheets The scaffold doesn't currently rewrite hashed CSS, but it's adjacent and easy to fold in. Recommend doing both `<script src>` and `<link href>` in the same regex so we don't have to revisit when CSS hashing lands.
uwe.admin added the
effort:medium
evaluated
labels 2026-04-25 13:44:11 +02:00
Author
Owner

Evaluated webapp-scaffold#3 — Small/Medium, recommend accept; tag-aware regex + pytest test, no decision needed.

Evaluated webapp-scaffold#3 — Small/Medium, recommend accept; tag-aware regex + pytest test, no decision needed.
uwe.admin added the
accepted
label 2026-04-25 21:32:07 +02:00
Author
Owner

Implemented #3 → commit b3b2903 (tag-aware regex on

Implemented #3 → commit b3b2903 (tag-aware regex on <script src>/<link href>; 8 tests covering positive + inert-substring cases).
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: uwe.admin/webapp-scaffold#3
No description provided.