8000
Skip to content

duplicate ID handling#5464

Open
jesboat wants to merge 4 commits intoracket:masterfrom
jesboat:pr-draft-dup-id-functions-and-uses
Open

duplicate ID handling#5464
jesboat wants to merge 4 commits intoracket:masterfrom
jesboat:pr-draft-dup-id-functions-and-uses

Conversation

@jesboat
Copy link
Copy Markdown
Contributor
@jesboat jesboat commented Mar 7, 2026

PR Notes

This PR introduces two new functions in racket/private/stx,
stx-find-duplicate-ids and raise-if-duplicate-identifiers. Then,
throughout the codebase, places where duplicate identifier syntax errors
are raised are changed to use one of the two new functions. This causes
those syntax errors to highlight all copies of the duplicate identifier,
instead of just one.

It also centralizes two previous chunks of code which did duplicate ID
checking: the implementation of check-duplicate-identifier and an
implementation in the contract code. The implementation is written in
pure #%kernel in the intent that it will also soon replace a third
duplicate-identifier check in the implementation of let.

The new implementation uses straightforward loops instead of a let/ec
previously used in check-duplicate-identifier. I have not benchmarked
this, but would be quite surprised if the let/ec version wins.

Unlike the implementation in let which uses a list-based loop for
small input sizes, we use a hashtable for all input sizes (except for 0
and 1, for which there can be no duplicates). This is based on some
benchmarking done on lists of various sizes which have no duplicates:
https://gist.github.com/jesboat/a47f090dca2094b5cd36598ebb5e6406

There is an ad-hoc test suite which tests almost all the changes:
https://gist.github.com/jesboat/83e728cd39a818d9a977b7bb1bf3c9f9
It checks that duplicate identifiers are detected and that the
exn:fail:syntax contains the expected number of identifiers all of
which have distinct source locations. It is missing tests for three
changes:

  • "duplicate identifier" in racket/collects/racket/private/unit/unit-infer.rkt
  • 2nd "duplicate declared identifier" in racket/collects/racket/private/class-internal.rkt
  • "duplicate identifier" in racket/collects/racket/private/class-internal.rkt

There is no fundamental reason those aren't tested; I'm just not
familiar enough with units and classes that writing a test which tickles
those particular places would have been straightforward.

Commit log

duplicate id handling: new helper functions

Introduce new helper functions:

stx-find-duplicate-ids is like check-duplicate-identifier, except it
returns all copies of the duplicated identifier, in a form which makes
it easy to pass them to raise-syntax-error and get a syntax error which
highlights all uses. (This commit does not introduce a public variant,
although it might be desirable to add one in the future.)

raise-if-duplicate-identifiers does exactly what it says on the tin.

These functions are tested in the rest of the PR series.

duplicate id handling: use new helpers in various places

Use the new helpers in various places across the codebase, where
the changes are unobtrusive. There are ~29 uses total, of which this
PR changes 28.

An uncommitted ad-hoc test suite tests all but 3 (one in racket/unit, two in
racket/class). I don't know how to integrate it into the Racket test infra,
but it's at:
https://gist.github.com/jesboat/83e728cd39a818d9a977b7bb1bf3c9f9

duplicate id handling: use new helpers in contract-in, contract-out

The change to use the new duplicate ID helper functions in the code for
(require (contract-in ...)) and (provide (contract-out ...)) is
not trivial, and is presented here as its own commit.

It is tested by the uncommitted test suite (along with most of the
changes in the previous commit) at:
https://gist.github.com/jesboat/83e728cd39a818d9a977b7bb1bf3c9f9

duplicate id handling: replace check-duplicate-identifier implementation

Replace the old implementation (which was written using an, um, odd style
using an escape continuation) with a simple delegate to the new function.

Add some basic test cases.

@jesboat jesboat marked this pull request as draft March 7, 2026 16:33
@jesboat jesboat force-pushed the pr-draft-dup-id-functions-and-uses branch 4 times, most recently from a20ce9e to a03cef3 Compare March 13, 2026 14:36
@jesboat jesboat force-pushed the pr-draft-dup-id-functions-and-uses branch from a03cef3 to e07bf3c Compare March 16, 2026 14:28
(define (signal-dup-syntax-error!)
(define-values (dup origs) (stx-find-duplicate-identifiers (reverse (unbox bound-id-list))))
(when dup
(raise-syntax-error who "duplicate identifiers" stx dup origs)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is a semantics-preserving change. But I am pretty sure we don't need the box (also the comment is out of sync with the box)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I think this is okay. Probably okay to use set!, instead of set-box!, tho?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior would change from highlighting all copies of the duplicate identifier to only highlighting an arbitrary two. It felt like that's an acceptable compromise for using the common stx-find-duplicate-identifiers function rather than custom code here.

Modifying stx-find-duplicate-identifiers to find all the duplicates would actually be pretty straightforward though. I can see that resulting in unwieldy errors if somebody has, say, 50 copies of the same identifier. But perhaps we say that if somebody has 50 copies of the same identifier, something has already gone wrong enough that we're okay with giving them an unwieldy error. :)

I'll modify to use set! instead of boxes. (I suspect the only reason I used a box was a vague tendency to prefer value mutation over variable mutation.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying stx-find-duplicate-identifiers to find all the duplicates would actually be pretty straightforward though. I can see that resulting in unwieldy errors if somebody has, say, 50 copies of the same identifier. But perhaps we say that if somebody has 50 copies of the same identifier, something has already gone wrong enough that we're okay with giving them an unwieldy error. :)

I think it would be great to include them all. We can let the code that renders the error message decide on some cutoff if one is needed, I'd say (since this won't be the only place that can generate big lists). In the case of DrRacket, you hit cmd-. to cycle between them so 50 seems totally fine. 10^6 may be a problem if there's some non-linear algorithm somewhere, tho. Not a new problem!

I'll modify to use set! instead of boxes. (I suspect the only reason I used a box was a vague tendency to prefer value mutation over variable mutation.)

Does it help to know that a set!d variable (probably) has a different representation at runtime that's more "value" like? From my perspective, I prefer the set! because then there's a little bit more information that's "obvious" to the compiler about the fact that the assignment doesn't escape a certain scope. Probably doesn't actually matter in this case, of course!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current version of the PR includes all copies of the identifier, and uses set! here.

(I think the reason I lean towards boxes is because the mutation becomes an explicit part of the type and becomes explicit at the read sites. After a couple years reading spaghetti pushed me pretty far into the "make things explicit" camp. But I see your point about set! being easier for compilers to analyze.)

Copy link
Copy Markdown
Member
@rfindler rfindler Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think the reason I lean towards boxes is because the mutation becomes an explicit part of the type and becomes explicit at the read sites. After a couple years reading spaghetti pushed me pretty far into the "make things explicit" camp. But I see your point about set! being easier for compilers to analyze.)

Oh, I'm totally on the same page in that sense! I just see a set! as explicit; although the syntax written by the programmer doesn't show it at the binding site of a set!d variable it is definitely a property of the syntax whether a variable is mutated or not. Check syntax shows it, for example, and that shows only things evident about the syntax. A box, in contrast can flow elsewhere and that requires more than just information about the syntax; it requires information about what happens at runtime.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current version of the PR includes all copies of the identifier

Yay, thank you!

@rfindler
Copy link
Copy Markdown
Member

I'm loving the better error messages that're going to come out of this! DrRacket's -. keyboard shortcut is going to get a lot more useful!

@jesboat jesboat force-pushed the pr-draft-dup-id-functions-and-uses branch 2 times, most recently from a378125 to a9455f7 Compare March 16, 2026 17:13
jesboat added 4 commits March 16, 2026 13:14
Introduce new helper functions:

`stx-find-duplicate-ids` is like check-duplicate-identifier, except it
returns all copies of the duplicated identifier, in a form which makes
it easy to pass them to `raise-syntax-error` and get a syntax error which
highlights all uses. (This commit does not introduce a public variant,
although it might be desirable to add one in the future.)

`raise-if-duplicate-identifiers` does exactly what it says on the tin.

These functions are tested in the rest of the PR series.
Use the new helpers in various places across the codebase, where
the changes are unobtrusive. There are ~29 uses total, of which this
PR changes 28.

An uncommitted ad-hoc test suite tests all but 3 (one in racket/unit, two in
racket/class). I don't know how to integrate it into the Racket test infra,
but it's at:
https://gist.github.com/jesboat/83e728cd39a818d9a977b7bb1bf3c9f9
The change to use the new duplicate ID helper functions in the code for
`(require (contract-in ...))` and `(provide (contract-out ...))` is
not trivial, and is presented here as its own commit.

It is tested by the uncommitted test suite (along with most of the
changes in the previous commit) at:
https://gist.github.com/jesboat/83e728cd39a818d9a977b7bb1bf3c9f9
Replace the old implementation (which was written using an, um, odd style
using an escape continuation) with a simple delegate to the new function.

Add some basic test cases.
@jesboat jesboat force-pushed the pr-draft-dup-id-functions-and-uses branch from a9455f7 to f4f1fb4 Compare March 16, 2026 17:33
@jesboat jesboat changed the title draft: duplicate ID handling duplicate ID handling Mar 16, 2026
@jesboat jesboat marked this pull request as ready for review March 16, 2026 17:56
@jesboat
Copy link
Copy Markdown
Contributor Author
jesboat commented Mar 16, 2026

Opportunity for discussion: which IDs (first occurrence of the duplicated identifier, second occurrence i.e. first duplicate, other occurrences) should end up where (sub-expr argument or extra-sources argument) in the raise-syntax-error calls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

0