Add sequence conversion functions for built-in collection types#3525
Add sequence conversion functions for built-in collection types#3525jackfirth wants to merge 6 commits intoracket:masterfrom
Conversation
This commit adds the following functions: - sequence->bytes - sequence->string - sequence->vector - sequence->set Each of these functions is equivalent to the corresponding list-specific version, such as list->vector, except they accept arbitrary single-element sequences instead of lists. The functions include two important optimizations: fast paths for specific kinds of sequences and a fast path for already-converted inputs. The functions are meant to serve as drop-in replacements for their monomorphic equivalents, with the exception that they always produce immutable values instead of mutable ones. To simplify the APIs of these collection types, all other forms of converting between these collections are soft-deprecated in favor of the unified sequence conversion APIs.
|
I like the overall PR, but I disagree with producing immutable values like |
|
A drawback of distributing these functions among and appeaars to be 55MB vs. 69MB with Racket CS on x86_64 according to |
|
Adding to |
|
I think this would be a reasonable addition to I object to "prefer" and "soft deprecated". I would be happy with "see also" for links to the new functions.
For the custom errors for bytes and strings, consider phrasing those as normal contract errors with additional explanation. You can do that by inserting "contract violation;\n " at the beginning of the error message. (I think it would also be fine to omit the custom errors, though.) |
|
Update: I was able to remove the dependency on
This change introduces a lot more ways to convert between different collection types, so I think it's important we include some soft guidance on which way to use. How about "consider using" instead of "prefer using"?
What's a good way to fix this without introducing terrible performance?
Done! |
I think "consider" and "prefer" should have a stated reason, and some reasons are fine for the reference docs and some reasons belong elsewhere. "Consider using sequence->list if you need to handle other inputs in addition to vectors" would probably be fine, because it doesn't push me towards something else if
I think you just have to check the list yourself using |
|
In spite I really like the idea of immutable-by-default and do-not-make-unnecessary copies, I think an consistent standard library is more important. All the other functions in the standard libraries make the mutable version, unless immutability is explicit. Related to racket/rhombus#22 |
|
I don't think it's fair to call the standard library consistent about this, because none of the contracts or documentation actually say that the returned strings are mutable, and occasionally they aren't. For example, All that said, we should not continue this convention. It is a mistake with wide ranging consequences. Backwards compatibility is a good reason not to change existing functions, but I don't think consistency with an undocumented, unchecked convention is a good reason to perpetuate this mistake in new APIs. We should instead write new APIs that allow both mutable and immutable inputs but which produce immutable outputs by default, and we should document that behavior. |
Instead of recommending the generic conversion functions, we just include "see also" links. The generic functions already document that there shouldn't be a performance reason to prefer the specialized versions. This avoids pushing people to learn about generic sequences if their code already works fine without generic sequences.
That might be fair for strings. What about vectors?
Why do I feel like we are repeating the discussion in #3076... I definitely agree that immutability is a good idea. But they are inconsistent with the current convention, which could cause confusion to users. In Rhombus, we should definitely make immutability the default. But in Racket, I hope that they should follow the convention, and if you wish to add the immutable variants, perhaps suffix them with |
|
Hmm. I agree it's confusing when users can't tell which functions return immutable data and which don't. In line with the existing
|
This pull request adds the following functions:
sequence->bytesadded toracket/bytessequence->stringadded toracket/stringsequence->vectoradded toracket/vectorsequence->setadded toracket/setEach of these functions is equivalent to the corresponding list-specific version, such as
list->vector, except they accept arbitrary single-element sequences instead of lists. The functions include two important optimizations: fast paths for specific kinds of sequences and a fast path for already-converted inputs.The functions are meant to serve as drop-in replacements for their monomorphic equivalents, with the exception that they always produce immutable values instead of mutable ones. To simplify the APIs of these collection types, all other forms of converting between these collections are soft-deprecated in favor of the unified sequence conversion APIs.
This change is backwards compatible, but it does represent a shift from some of the current Racket API design choices. I think it's a good shift worth making in standard Racket, but I'm open to discussing alternatives.
This pull request is missing tests because I'm not sure where to add them. Pointers to the appropriate test files would be appreciated.