add best+key+index, best, best-key and best-index#3076
add best+key+index, best, best-key and best-index#3076Metaxal wants to merge 4 commits intoracket:masterfrom
Conversation
|
It looks like the |
|
I would prefer to avoid this option because collections-lib is not a built-in package, which comes with several issue:
Furthermore, a stress test suggests that Code: #lang racket
(require (prefix-in c: data/collection))
(define N 1000000)
(define l (range 1 (+ N 1)))
(displayln "best")
(collect-garbage)
(define n 0)
(time
(for ([i (in-range 10)])
(set! n (+ n (best l <)))))
(displayln n)
(displayln "data/collections find-best")
(collect-garbage)
(set! n 0)
(time
(for ([i (in-range 10)])
(set! n (+ n (c:find-best l <)))))
(displayln n) |
|
"discoverability" As much as I agree -- having written these functions myself more than once -- I also think that repeating functionality across packages is less than ideal. I'd rather see Ryan's package incorporated into the core. |
|
I suspect the performance problem can be fixed by adding specialized forms for lists and vectors and by weakening the contract slightly (turning the get-key and compare contracts into flat arity checks). I agree with @Metaxal about dependencies; that's more than I would want to transitively depend on. I think that's fixable too, but that would depend on Alexis. |
|
@rmculpepper adding specialized forms is precisely what this PR (and the next) is doing. I'm very happy for @mfelleisen I dislike code duplication too—and I like what data/collection does. But I also dislike slow functions when I know how to make them faster. So if we go with |
This PR adds a second, alternative API, it doesn't add a fast path to a single unified API. That's a big difference. Keeping specialized implementations of an API hidden from the user is a valuable thing to do. |
|
@jackfirth That remark would be true if the first API was part of the core library, but currently it's not. Also, racket built-ins are all using such type-specific functions (such as If really there is so much reluctance (which I find astonishing), would you all prefer I write a separate package with |
|
Hi, so what do you propose? It's okay if you reject the PR, but I'd prefer if it doesn't stay on hold indefinitely so I can at least put it somewhere. It's also okay if you want to request more from this PR before being accepted. Maybe you prefer a single Or something else? |
|
(Haven't gotten around to replying until now, due to COVID-19 and some other stuff I had to put Racket on the back burner for a while.) @Metaxal I think this could be useful, so no rejection from me. Here are my thoughts:
> (sequence-best (list 15 30 10 20 10) < #:key -)
(selection #:element 30 #:key -30 #:position 1)
(sequence-max [seq (sequence/c any/c)]
[less-than? (-> any/c any/c boolean?) <] ; default to < for argmax behavior
[#:key key-function (-> any/c any/c) values]) -> any/c
> (sequence-max (list 1 2 3 4 5))
5
> (sequence-max (list 1 2 3 4 5) >)
1
> (sequence-max (list 1 2 3 4 5) #:key -)
1
(sequence-select-max [seq (sequence/c any/c)]
[less-than? (-> any/c any/c boolean?) <] ; default to < for argmax behavior
[#:key key-function (-> any/c any/c) values]) -> selection?
> (sequence-select-max (list 1 2 3 4 5))
(selection #:element 5 #:key 5 #:position 4)
> (sequence-select-max (list 1 2 3 4 5) >)
(selection #:element 1 #:key 1 #:position 0)
> (sequence-select-max (list 1 2 3 4 5) #:key -)
(selection #:element 1 #:key -1 #:position 0)What do you think? |
|
Thanks a lot for your comments @jackfirth, I have a clearer view of the situation now.
How does this sound? Side question: |
Was there any function that returns a transparent struct in the way you described? If not, returning values might be better for the sake of consistency of API. |
|
I mostly disagree with @jackfirth about the design here. I wouldn't want to switch to a generic API without a more wholesale move to that in Racket (which is one of the goals for Rhombus). Returning multiple values is, as @sorawee says, more idiomatic than a new struct. And the single-result versions seem quite useful to me. |
|
Exposing list and vector based versions for the sake of consistency with current Racket style is reasonable, but please don't omit the sequence version. There's little point to having a generic sequence interface at all if there's no way to port type-specialized code to use it. I don't think multiple values is an idiom we want to keep introducing in new Racket APIs. At the very least, doing so should require more justification than merely being consistent with existing APIs. They are awkward to use, hard to document, give confusing error messages, don't compose well with many other Racket constructs (e.g. how do you put them in a data structure?), and make code harder to read than it would be with named field accessors. Their downsides are significant. The only thing they've got going for them is avoiding allocation, and I don't think that matters enough in this case. Rhombus could improve the situation on both these points, but I don't think that means we should avoid improving new Racket APIs. We should be less radical, but we shouldn't avoid trying anything new at all. |
@Metaxal Would |
|
@jackfirth Put it this way: If I was teaching again in front of 15 pairs of first-year students and they add to find the minimum of a list knowing that either Actually, |
|
Hey everyone, so what's the next step then? |
|
Let me also offer you (yet) another name: Re: values vs struct, I agree with Sam here. As a person who initiates racket/rhombus#78, I understand the pain of IMO, what you can do (and in fact, what you already did) is to make a package that is a wrapper library with a better interface. It won't help with discoverability of your library, unfortunately, but the consistency of the Racket library won't be compromised. |
|
My accumulated two cents: I'm satisfied that it's reasonable to add this to To me, the I personally don't find the word "best" intuitive to describe these functions. It conflicts with the following two intuitions I do have: (1) a partial (or total) order is usually defined by its <= relation; and (2) the "best" numeric score is usually the greatest, not the least. But It should be straightforward to extend So I think my preference would be to extend I also don't think discoverability is a good argument for the name "best". I think it would be more useful to write a little example section and index words like "minimum", "minimize", "maximum", "maximize", "extreme", etc. On the other hand, I'm not the main prospective user of these functions. Feel free to argue the points. The implementation has a leftover |
|
@sorawee @rmculpepper over the years, I've been redefining this function over and over, and the least confusing names I tried are Regarding numeric scores, there are still many cases where
(define (better? s1 s2)
(or (< (height s1)
(height s2))
(and (= (height s1)
(height s2))
(> (width s1)
(width s2)))))Here the objects can't be compared with a single scalar, so Also, fixing two functions to make them do the same thing but in opposite directions when one function would do both jobs just fine sounds wrong to me. That said, if the consensus among core devs is for changing both (As I said before, I'm okay with discarding I'm perfectly fine with adding keywords in the docs. |
|
Don't know if we're interested in more bells/whistles suggestions but one thing that I run into a bunch with |
|
@rfindler This is indeed extremely handy, but it's probably better left as a whole different topic. I have something for this, called 'comparators', and so does @jackfirth :) If there's a desire for that I'm happy to make a proposal, but I also worry this will lead to a very lengthy discussion :) I think it would make a very useful addition in any case. |
|
@Metaxal lets definitely not go that route here! I was just thinking of something simple where I could put in a list of comparators and the library would do the right thing (it would assume they were "strict" comparison functions that treat |
|
@rfindler that is indeed useful, but probably should be implemented as a separate combinator? It then can be used with both |
|
@sorawee Let's continue this discussion on a different place maybe (slack?) |
|
Agreed! |
|
ping! Can we look at this again so it is not forgotten? There's an approved review by @bennn but no merge. |
|
@mfelleisen @rmculpepper @samth @jackfirth @sorawee Here's a more 'generic' implementation based on It's only a first sketch though and an actual implementation would use syntax-parse and would have more checks. Is it worth pursuing this road instead of the original PR? #lang racket
(define no-result (gensym))
;; 'Generic' macro
(define-syntax-rule (define-find-best name elts? elts idx elt idx-elt-binder ...)
(define (name elts better? #:value [extract-val values])
(unless (elts? elts)
(raise-argument-error 'name (symbol->string 'elts?) elts))
(for/fold ([best-elt #f]
[best-idx no-result]
[best-val #f])
(idx-elt-binder
...)
(define val (extract-val elt))
(if (or (eq? best-idx no-result)
(better? val best-val))
(values elt idx val)
(values best-elt best-idx best-val)))))
;;; Instantiations for some types
(define-find-best vector-best+index+value vector? elts idx elt
[idx (in-naturals)]
[elt (in-vec
992E
tor elts)])
(define-find-best best+index+value list? elts idx elt
[idx (in-naturals)]
[elt (in-list elts)])
(define-find-best dict-best+key+value dict? elts idx elt
[(idx elt) (in-dict elts)])
; and some more?
;::::::::::::::;
;:: Examples ::;
;::::::::::::::;
(best+index+value '(1 2 3 2 8 4) >)
(define vec (vector "a" "aa" "bcb" "b" "bbbb" "aaaa"))
(vector-best+index+value vec < #:value string-length)
(vector-best+index+value vec >= #:value string-length)
(dict-best+key+value vec >= #:value string-length) ; same as vector-best
(define dic '((a . 5) (b . 2) (c . 3)))
(dict-best+key+value dic <)
(vector-best+index+value dic <) ; error with good messageEdit: |
|
Forgot to say that I changed from |
|
Huhh, I don't know how to describe this, but it feels kinda awkward to provide the macro Also, it should be named |
|
@sorawee All good points. Indeed, I was thinking of exposing only the generated xxx-best that make sense, but I wouldn't be against exposing the definer too, but only if it becomes clean enough in the end. My point was more that this new approach would avoid most code duplication. |
|
I think providing the list, vector, and sequence versions makes sense. For dictionaries, I don't think we want to conflate "keys" and "indices" as being the same thing. Dictionary keys are usually unordered and sparse. If a user wanted to apply the |
|
@jackfirth What about this: where |
|
The dict version's expected semantics aren't clear to me. Does |
|
Keys in dicts play the same role as indices in vectors or lists. So This is departing from (I just cleaned the code above a little.) |
|
I've encountered cases where I need the value corresponding to the max or min key of a dictionary-like collection. I don't think we should pick between comparing keys and comparing values, they both have many valid use cases. |
|
Then what do you think of having the |
|
@jackfirth I'm also okay with not providing the dict form in this PR if we can't get a consensus of course. |
|
Ok, at this point it may just be best for me to make another PR with the list, vector and maybe sequence or dict variants to start afresh. I don't have much free time so it may take a while, sorry about that. |
|
Do we ever expect to use |
|
it's either 1 (like for |
After a discussion on Slack, it appears that a few people (myself included) often have a need for a more flexible version of
argmin.This PR introduces a new function
best+key+indexthat has a similar interface tosortand returns 3 values. Three additional functions are provided for convenience, which return only one of the three values. Examples from the docs:If this PR is accepted, I will make another one for
vector, which was my original intention.Tests are added, docs are updated and all tests pass AFAICT.