fix: close @mention neutralization bypass via U+200E/200F/00AD/034F invisible chars #23735

Merged
pelikhan merged 2 commits into main from copilot/fix-harden-unicode-text-bypass-again
Mar 31, 2026

Contributor
Copilot AI commented Mar 31, 2026

hardenUnicodeText stripped strong BiDi overrides and zero-width chars but left four invisible characters unhandled. Inserting any of them between @ and a username breaks the neutralizeAllMentions regex match, leaving the mention un-neutralized in safe-output content.

Affected characters:

  • U+200E — LEFT-TO-RIGHT MARK
  • U+200F — RIGHT-TO-LEFT MARK
  • U+00AD — SOFT HYPHEN
  • U+034F — COMBINING GRAPHEME JOINER

Example bypass:

// Before fix
sanitizeContent("@\u200Fadmin please review")
// → "@\u200Fadmin please review"  ← mention NOT neutralized

// After fix
sanitizeContent("@\u200Fadmin please review")
// → "`@admin` please review"  ← invisible char stripped, mention wrapped
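
The bypass can be reproduced in isolation with a minimal sketch. The pattern and the `wrapMentions` helper below are hypothetical stand-ins, since the real regex inside neutralizeAllMentions is not shown in this PR; the assumption is only that it requires a username character immediately after `@`.

```javascript
// Hypothetical stand-in for the mention pattern; the real regex in
// neutralizeAllMentions is not shown in this PR.
const MENTION = /@([A-Za-z0-9][A-Za-z0-9-]*)/g;

// The four code points this PR adds to the strip list.
const INVISIBLE = ["\u200E", "\u200F", "\u00AD", "\u034F"];

// Wrap every matched mention in backticks (illustrative helper).
function wrapMentions(text) {
  return text.replace(MENTION, (_, name) => "`@" + name + "`");
}

for (const ch of INVISIBLE) {
  const input = "@" + ch + "admin please review";
  const output = wrapMentions(input);
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  // The invisible character is not a username character, so the
  // pattern finds nothing to wrap and the text passes through unchanged.
  console.log("U+" + hex + ": neutralized = " + (output !== input));
}
// each line reports neutralized = false
```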

Changes:

  • sanitize_content_core.cjs — Extended Step 3 of hardenUnicodeText to strip the four characters; sanitize_label_content.cjs inherits the fix since it delegates to the same function
  • sanitize_content.test.cjs — Added per-character stripping tests and @mention bypass regression tests for all four characters
  • sanitize_label_content.test.cjs — Added equivalent bypass regression tests

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/github/gh-aw/contents/.github%2Fworkflows%2Faudit-workflows.md
    • Triggering command: /opt/hostedtoolcache/node/24.14.0/x64/bin/node /opt/hostedtoolcache/node/24.14.0/x64/bin/node --experimental-import-meta-resolve --require /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/suppress-warnings.cjs --conditions node --conditions development /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/dist/workers/forks.js (http block)
    • Triggering command: /opt/hostedtoolcache/node/24.14.0/x64/bin/node /opt/hostedtoolcache/node/24.14.0/x64/bin/node --experimental-import-meta-resolve --require /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/suppress-warnings.cjs --conditions node --conditions development /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/dist/workers/forks.js a227600b10686702-V=full de/node/bin/git ode_modules/vitest/dist/workers/forks.js /hom�� 4e27fb756447f59d91480344..HEAD .cfg 64/pkg/tool/linux_amd64/vet * origin _modules/.bin/gitest 64/pkg/tool/linux_amd64/vet (http block)
    • Triggering command: /opt/hostedtoolcache/node/24.14.0/x64/bin/node /opt/hostedtoolcache/node/24.14.0/x64/bin/node --experimental-import-meta-resolve --require /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/suppress-warnings.cjs --conditions node --conditions development /home/REDACTED/work/gh-aw/gh-aw/actions/setup/js/node_modules/vitest/dist/workers/forks.js 78ab18e2bedbf4f9--experimental-import-meta-resolve tions/setup/node--require git chec�� r/work/gh-aw/gh-node 904d832f 0/x64/lib/node_mdevelopment -u (http block)
  • invalid.example.invalid
    • Triggering command: /usr/lib/git-core/git-remote-https /usr/lib/git-core/git-remote-https origin https://invalid.example.invalid/nonexistent-repo.git git conf�� user.name lure tions/setup/js/node_modules/.bin/git -M main /usr/sbin/git git init�� --bare --initial-branch=main k/gh-aw/gh-aw/actions/setup/js/node_modules/.bin/git '/tmp/bare-incregit '/tmp/bare-increadd cal/bin/git git (dns block)
    • Triggering command: /usr/lib/git-core/git-remote-https /usr/lib/git-core/git-remote-https origin https://invalid.example.invalid/nonexistent-repo.git git conf�� user.name lure tions/setup/js/node_modules/.bin/git -M main git git init�� --bare --initial-branch=main k/gh-aw/gh-aw/actions/setup/js/node_modules/.bin/git user.name Test User _modules/.bin/giagent-change.txt git (dns block)
    • Triggering command: /usr/lib/git-core/git-remote-https /usr/lib/git-core/git-remote-https origin https://invalid.example.invalid/nonexistent-repo.git git conf�� user.name lure k/gh-aw/gh-aw/node_modules/.bin/git -M main -branch git bran�� -M main t user.name Test User 86_64/git git (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

…se @mention bypass

Extends the Step 3 invisible character stripping in hardenUnicodeText to
include four characters that could be inserted between '@' and a username
to bypass the neutralizeAllMentions regex:

- U+200E (LEFT-TO-RIGHT MARK)
- U+200F (RIGHT-TO-LEFT MARK)
- U+00AD (SOFT HYPHEN)
- U+034F (COMBINING GRAPHEME JOINER)

Since hardenUnicodeText runs before neutralizeAllMentions, stripping these
characters ensures '@<invisible>username' is collapsed to '@username' and
then properly backtick-wrapped.

Adds regression tests in sanitize_content.test.cjs and
sanitize_label_content.test.cjs for each character individually and for
the complete @mention bypass scenario.
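
The ordering argument above can be sketched as follows. The character class is copied from the diff in this PR; `stripInvisible`, `wrapMentions`, `sanitizeSketch`, and `wrongOrder` are hypothetical simplifications, and the real hardenUnicodeText performs additional steps that are omitted here.

```javascript
// Character class taken from the Step 3 diff in this PR.
const INVISIBLE_CHARS = /[\u00AD\u034F\u200B\u200C\u200D\u200E\u200F\u2060\uFEFF]/g;

// Simplified stand-in for Step 3 of hardenUnicodeText.
function stripInvisible(text) {
  return text.replace(INVISIBLE_CHARS, "");
}

// Hypothetical stand-in for neutralizeAllMentions.
function wrapMentions(text) {
  return text.replace(/@([A-Za-z0-9][A-Za-z0-9-]*)/g, (_, name) => "`@" + name + "`");
}

// Strip first, then wrap: the order described in the commit message.
function sanitizeSketch(text) {
  return wrapMentions(stripInvisible(text));
}

// Reversed order, shown only to illustrate why the order matters.
function wrongOrder(text) {
  return stripInvisible(wrapMentions(text));
}

console.log(sanitizeSketch("@\u200Fadmin please review"));
// → `@admin` please review
console.log(wrongOrder("@\u200Fadmin please review"));
// → @admin please review   (a live mention survives)
```

With the reversed order, the mention pattern runs while the invisible character is still present, finds nothing to wrap, and the later strip step then produces a working mention.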

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/48cee184-8589-4a49-ac3d-30d0aa58b70c

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix hardenUnicodeText to strip U+200E, U+200F, U+00AD, and U+034F" to "fix: close @mention neutralization bypass via U+200E/200F/00AD/034F invisible chars" Mar 31, 2026
Copilot AI requested a review from pelikhan March 31, 2026 14:18
pelikhan marked this pull request as ready for review March 31, 2026 14:19
Copilot AI review requested due to automatic review settings March 31, 2026 14:19
Contributor
Copilot AI left a comment

Pull request overview

This PR hardens the sanitization pipeline to prevent @mention neutralization bypasses that rely on inserting certain invisible Unicode characters between @ and a username.

Changes:

  • Extended hardenUnicodeText to strip U+200E, U+200F, U+00AD, and U+034F.
  • Added regression tests ensuring these characters are removed and can’t bypass @mention neutralization in both content and label sanitizers.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| actions/setup/js/sanitize_content_core.cjs | Expands invisible-character stripping in hardenUnicodeText to close the reported bypass. |
| actions/setup/js/sanitize_content.test.cjs | Adds per-character stripping tests + mention-bypass regression coverage. |
| actions/setup/js/sanitize_label_content.test.cjs | Adds equivalent label-focused stripping and mention-bypass regression tests. |

Comment on lines 932 to +937
// Step 3: Strip invisible zero-width characters that can hide content
// These include: zero-width space, zero-width non-joiner, zero-width joiner,
// left-to-right mark (U+200E), right-to-left mark (U+200F),
// soft hyphen (U+00AD), combining grapheme joiner (U+034F),
// word joiner, and byte order mark
- result = result.replace(/[\u200B\u200C\u200D\u2060\uFEFF]/g, "");
+ result = result.replace(/[\u00AD\u034F\u200B\u200C\u200D\u200E\u200F\u2060\uFEFF]/g, "");
Copilot AI Mar 31, 2026

hardenUnicodeText now strips the literal U+200E/U+200F/U+00AD/U+034F characters, but decodeHtmlEntities() does not decode the corresponding named HTML entities (e.g. &lrm;, &rlm;, &shy;). That means inputs like @&rlm;admin will still bypass neutralizeAllMentions() because the regex sees @&... and won’t match, while GitHub’s renderer may decode the entity back into an invisible char. Consider decoding these named entities (and adding regression tests) so entity-encoded forms are also neutralized.
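
One way the suggestion above might look is sketched below. The implementation of decodeHtmlEntities() is not shown in this PR, so `decodeInvisibleEntities` and `decodeThenStrip` are hypothetical helpers, and the sketch covers only the three named entities mentioned in the comment.

```javascript
// Named entities called out in the review comment, mapped to their code points.
const NAMED_INVISIBLE_ENTITIES = {
  "&lrm;": "\u200E", // LEFT-TO-RIGHT MARK
  "&rlm;": "\u200F", // RIGHT-TO-LEFT MARK
  "&shy;": "\u00AD", // SOFT HYPHEN
};

// Hypothetical helper: decode only the entities listed above.
function decodeInvisibleEntities(text) {
  return text.replace(/&(?:lrm|rlm|shy);/g, entity => NAMED_INVISIBLE_ENTITIES[entity]);
}

// Character class taken from the Step 3 diff in this PR.
const INVISIBLE_CHARS = /[\u00AD\u034F\u200B\u200C\u200D\u200E\u200F\u2060\uFEFF]/g;

// Decode first so the existing strip step sees the literal characters.
function decodeThenStrip(text) {
  return decodeInvisibleEntities(text).replace(INVISIBLE_CHARS, "");
}

console.log(JSON.stringify(decodeThenStrip("@&rlm;admin")));
// → "@admin"
```

Decoding before stripping keeps a single strip list as the source of truth; numeric references such as `&#x200F;` would need equivalent handling if the existing decoder does not already cover them.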
pelikhan merged commit bca8c30 into main Mar 31, 2026
173 of 177 checks passed
pelikhan deleted the copilot/fix-harden-unicode-text-bypass-again branch March 31, 2026 14:25
Development

Successfully merging this pull request may close these issues.

hardenUnicodeText must strip U+200E, U+200F, U+00AD, and U+034F to close residual @mention neutralization bypass

3 participants