jackfirth · 2020-11-29T03:06:52Z

This pull request adds the following functions:

sequence->bytes added to racket/bytes
sequence->string added to racket/string
sequence->vector added to racket/vector
sequence->set added to racket/set

Each of these functions is equivalent to the corresponding list-specific version, such as list->vector, except they accept arbitrary single-element sequences instead of lists. The functions include two important optimizations: fast paths for specific kinds of sequences and a fast path for already-converted inputs.

The functions are meant to serve as drop-in replacements for their monomorphic equivalents, with the exception that they always produce immutable values instead of mutable ones. To simplify the APIs of these collection types, all other forms of converting between these collections are soft-deprecated in favor of the unified sequence conversion APIs.

This change is backwards compatible, but it does represent a shift from some of the current Racket API design choices. I think it's a good shift worth making in standard Racket, but I'm open to discussing alternatives.

This pull request is missing tests because I'm not sure where to add them. Pointers to the appropriate test files would be appreciated.

This commit adds the following functions: - sequence->bytes - sequence->string - sequence->vector - sequence->set Each of these functions is equivalent to the corresponding list-specific version, such as list->vector, except they accept arbitrary single-element sequences instead of lists. The functions include two important optimizations: fast paths for specific kinds of sequences and a fast path for already-converted inputs. The functions are meant to serve as drop-in replacements for their monomorphic equivalents, with the exception that they always produce immutable values instead of mutable ones. To simplify the APIs of these collection types, all other forms of converting between these collections are soft-deprecated in favor of the unified sequence conversion APIs.

sorawee · 2020-11-29T03:30:59Z

I like the overall PR, but I disagree with producing immutable values like vector. IIUC, it's generally assumed that vector is mutable (e.g. for/vector), so this interface makes Racket libraries self-inconsistent.

mflatt · 2020-11-29T03:31:05Z

A drawback of distributing these functions among racket/string, etc., is that it may substantially increase the footprint of each of those. Currently, the difference between

#lang racket/base
(require racket/bytes)

and

#lang racket/base
(require racket/bytes racket/sequence)

appeaars to be 55MB vs. 69MB with Racket CS on x86_64 according to (current-memory-use). It looks like adding them to racket/sequence would not have that drawback.

jackfirth · 2020-11-29T04:06:36Z

Adding to racket/sequence makes sense. It also fixes a cyclic dependency problem. Will do that next, except for sequence->set since that one depends on racket/sequence directly already.

rmculpepper · 2020-11-29T14:31:09Z

I think this would be a reasonable addition to racket/sequence. I also agree with Jack that for new APIs, producing immutable data is preferable as a default.

I object to "prefer" and "soft deprecated". I would be happy with "see also" for links to the new functions.

sequence->bytes and sequence->string raise errors referring to list->bytes and list->string when given bad sequences, like (sequence->string '(#\a #\b 3)).

For the custom errors for bytes and strings, consider phrasing those as normal contract errors with additional explanation. You can do that by inserting "contract violation;\n " at the beginning of the error message. (I think it would also be fine to omit the custom errors, though.)

…bytes

jackfirth · 2020-11-30T02:16:46Z

Update: I was able to remove the dependency on racket/sequence from all of the current modules, making the move to racket/sequence unnecessary.

I object to "prefer" and "soft deprecated". I would be happy with "see also" for links to the new functions.

This change introduces a lot more ways to convert between different collection types, so I think it's important we include some soft guidance on which way to use. How about "consider using" instead of "prefer using"?

sequence->bytes and sequence->string raise errors referring to list->bytes and list->string when given bad sequences, like (sequence->string '(#\a #\b 3)).

What's a good way to fix this without introducing terrible performance? sequence/c would be expensive right? Is there something better that I can do?

For the custom errors for bytes and strings, consider phrasing those as normal contract errors with additional explanation. You can do that by inserting "contract violation;\n " at the beginning of the error message. (I think it would also be fine to omit the custom errors, though.)

Done!

rmculpepper · 2020-12-03T13:35:00Z

This change introduces a lot more ways to convert between different collection types, so I think it's important we include some soft guidance on which way to use. How about "consider using" instead of "prefer using"?

I think "consider" and "prefer" should have a stated reason, and some reasons are fine for the reference docs and some reasons belong elsewhere. "Consider using sequence->list if you need to handle other inputs in addition to vectors" would probably be fine, because it doesn't push me towards something else if vector->list is already suitable for my program. "Consider using sequence->list because all Racket code should be written as generically as possible" (forgive the hyperbole) would not, IMO.

What's a good way to fix this without introducing terrible performance? sequence/c would be expensive right? Is there something better that I can do?

I think you just have to check the list yourself using (andmap char? _) or equivalent. For example:

(define (sequence->string seq)
  (define (bad)
    (raise-argument-error 'sequence->string "(sequence/c char?)" seq))
  (define (handle-list lst)
    (if (andmap char? lst)
        (string->immutable-string (list->string lst))
        (bad)))
  (cond
    ....
    [(list? seq) (handle-list seq)]
    [else (handle-list (sequence->list seq))]))

gus-massa · 2020-12-14T14:21:35Z

In spite I really like the idea of immutable-by-default and do-not-make-unnecessary copies, I think an consistent standard library is more important. All the other functions in the standard libraries make the mutable version, unless immutability is explicit.

Related to racket/rhombus#22

jackfirth · 2020-12-15T03:22:16Z

I don't think it's fair to call the standard library consistent about this, because none of the contracts or documentation actually say that the returned strings are mutable, and occasionally they aren't. For example, (string-trim "hello" #:left? #false #:right? #false) returns an immutable string. If the documentation for the existing functions promised to return (and/c string? (not/c immutable?)) I might agree with you. But as it is I can't really rely on them returning mutable or immutable either way without checking first. I wouldn't be surprised if there's plenty of other corner cases where the "returned strings, bytes, and vectors are mutable unless stated otherwise" convention is silently broken, since we don't check it with contracts and we don't document it.

All that said, we should not continue this convention. It is a mistake with wide ranging consequences. Backwards compatibility is a good reason not to change existing functions, but I don't think consistency with an undocumented, unchecked convention is a good reason to perpetuate this mistake in new APIs. We should instead write new APIs that allow both mutable and immutable inputs but which produce immutable outputs by default, and we should document that behavior.

Instead of recommending the generic conversion functions, we just include "see also" links. The generic functions already document that there shouldn't be a performance reason to prefer the specialized versions. This avoids pushing people to learn about generic sequences if their code already works fine without generic sequences.

sorawee · 2020-12-15T11:53:21Z

none of the contracts or documentation actually say that the returned strings are mutable

That might be fair for strings. What about vectors?

we should not continue this convention

Why do I feel like we are repeating the discussion in #3076... I definitely agree that immutability is a good idea. But they are inconsistent with the current convention, which could cause confusion to users. In Rhombus, we should definitely make immutability the default. But in Racket, I hope that they should follow the convention, and if you wish to add the immutable variants, perhaps suffix them with * or something like that (e.g. sequence->list*, sequence->vector*).

jackfirth · 2020-12-20T08:23:46Z

Hmm. I agree it's confusing when users can't tell which functions return immutable data and which don't. In line with the existing symbol->immutable-string and keyword->immutable-string functions, what do folks think about using these names instead:

sequence->immutable-bytes
sequence->immutable-string
sequence->immutable-vector
sequence->set (sets are kind of an oddball since the existing racket/set API produces immutable sets everywhere by default already)

jackfirth added the api design Related to the design of core libraries and language features label Nov 29, 2020

jackfirth marked this pull request as draft November 29, 2020 03:07

jackfirth added 2 commits November 29, 2020 18:08

Adjust error messages of string/bytes sequence conversion functions

95398fe

Remove dependencies on racket/sequence from racket/string and racket/…

08263e4

…bytes

jackfirth added 3 commits December 14, 2020 19:31

Fix error messages from sequence->bytes and sequence->string

2bc78ff

Add tests for sequence->string and sequence->bytes

1d07887

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add sequence conversion functions for built-in collection types#3525

Add sequence conversion functions for built-in collection types#3525
jackfirth wants to merge 6 commits intoracket:masterfrom
jackfirth:sequence-conversion

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants