
New Compression Method for 7-Zip called ZStandard

2016-06-27
2022-06-15
1 2 3 > >> (Page 1 of 3)
  • Tino Reichardt

    Tino Reichardt - 2016-06-27

    Hello Igor,

    Zstd, short for Zstandard, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.

    It is provided as a BSD-licensed package, hosted on GitHub. ZStd homepage: https://github.com/Cyan4973/zstd

    I have been adding this new codec for a while now, and everything seems to work very well. Is it possible to include this new method in your mainline 7-Zip version?

    Here is the link to my 7-Zip ZStd homepage: https://mcmilk.de/projects/7-Zip-ZStd/ ...

    Thank you a lot for 7-Zip; it is fast and stable, and with ZStd it is very, very fast for making backups to USB 3.0 disks at USB 3.0 speed :-)

     
  • Igor Pavlov

    Igor Pavlov - 2016-06-27

    Currently I don't add new external methods to the 7z code.
    And if you use the 0x4F711xx ID, you should probably request it or notify me.

     
    • Tino Reichardt

      Tino Reichardt - 2016-06-27

      The ID is from Rich; he made the first versions of the plugin and got an ID from you.

      So this should be fine ;)

       
      • Igor Pavlov

        Igor Pavlov - 2016-06-27

        As I remember, I suggested 0x4F710xx for his codecs (LZHAM).
        But you use 0x4F711xx, which is another range.
        It's not a problem, but I must know about this range and update methods.txt with these IDs.

         
  • Tino Reichardt

    Tino Reichardt - 2016-06-27

    Hello Igor,

    sorry for this. I thought that 0x4F71101 was already registered and agreed with you :(
    It would be fine if I could leave the define at its current value ;-)

    These are the two external codecs I currently know of:
    1) 0x4F71001 - LZHAM (http://richg42.blogspot.de/2015/11/lzham-custom-codec-plugin-for-7-zip.html)
    2) 0x4F71101 - Zstd (http://www.zstd.net)

    Is there any way that ZStandard could get into the mainline? The source is BSD-licensed, and Yann Collet would be happy with including v1.0, which will be released when the beta period is over; this will be in a few months, I think.

    With best regards, Tino

     
  • Igor Pavlov

    Igor Pavlov - 2016-06-28

    Yes, you can use 0x4F71101, and I'll update the methods.txt list with the 0x4F711xx range.

    Currently I don't plan to include any new external codecs in 7-Zip.

     
    • Dec

      Dec - 2016-06-29

      Perhaps it would make sense to add an extra API to the 7z.dll interface that would let developers who use 7z.dll plug in their own codecs without recompiling 7z.dll? Something like a new exported function RegisterCodecFactory(DWORD ACodecID, ICodecFactory AFactory)? The ICodecFactory object would then create the required codec whenever 7z.dll requests it. This solution has an additional perk: codecs could be written in practically anything, from C to Python :)
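The proposal above can be sketched as a simple registry keyed by method ID. This is a minimal C sketch under stated assumptions: `RegisterCodecFactory` follows the signature suggested in the post, while `CodecFactory` (a plain function pointer standing in for the COM-style `ICodecFactory`), `FindCodecFactory` and the fixed-size table are hypothetical. None of this is part of the real 7z.dll API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical factory type: creates a codec instance for a method ID.
   A plain function pointer stands in for the COM-style ICodecFactory. */
typedef void *(*CodecFactory)(uint32_t method_id, int encode);

#define MAX_FACTORIES 32

typedef struct {
    uint32_t method_id;
    CodecFactory factory;
} FactoryEntry;

static FactoryEntry g_factories[MAX_FACTORIES];
static size_t g_factory_count = 0;

/* Hypothetical exported function: registers (or replaces) a factory.
   Returns 0 on success, -1 if the factory is NULL or the table is full. */
int RegisterCodecFactory(uint32_t method_id, CodecFactory factory)
{
    if (factory == NULL)
        return -1;
    for (size_t i = 0; i < g_factory_count; i++) {
        if (g_factories[i].method_id == method_id) {
            g_factories[i].factory = factory;  /* replace existing entry */
            return 0;
        }
    }
    if (g_factory_count == MAX_FACTORIES)
        return -1;
    g_factories[g_factory_count].method_id = method_id;
    g_factories[g_factory_count].factory = factory;
    g_factory_count++;
    return 0;
}

/* Lookup the archive code would do for a method ID it does not implement. */
CodecFactory FindCodecFactory(uint32_t method_id)
{
    for (size_t i = 0; i < g_factory_count; i++)
        if (g_factories[i].method_id == method_id)
            return g_factories[i].factory;
    return NULL;
}
```

In this design, the archive code would fall back to `FindCodecFactory` for any method ID it does not implement itself; the factory could wrap code written in any language that can export a C-callable function.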

       
      • Igor Pavlov

        Igor Pavlov - 2016-06-29

        7-Zip already supports external codecs for 7z.dll. You just need to place the DLL in the "Codecs" folder. It works for both extraction and compression.
        But they probably want some additional parameter support in 7z.dll and the GUI, so they recompile it.

         
      • Robert Pollak

        Robert Pollak - 2016-10-12

        Google translation:

        Perhaps it makes sense to add additional api in 7z.dll interface that enables developers using 7z.dll, use their own codecs without recompiling 7z.dll? Something like a new exported function RegisterCodecFactory (DWORD ACodecID, ICodecFactory AFactory)? And ICodecFactory object will 7z.dll request to create the required codec. This solution has an additional feature - the codecs can write almost anything, since with C, Python ending :)

         
  • Tino Reichardt

    Tino Reichardt - 2016-06-29

    Hello Igor,

    thanks a lot for adding the ID to methods.txt. I will try to keep the 7-Zip Zstd version up to date, because the speed of Zstandard (around 100 MiB/s) is needed for my GPL USB backup program.

    with best regards, Tino

     
  • Igor Pavlov

    Igor Pavlov - 2016-06-30

    Note that if a new version of your codec's decoder changes (and is not compatible with old data), you must change the ID.

     
  • Tino Reichardt

    Tino Reichardt - 2016-06-30

    The decoder will handle older versions correctly; the version is saved within the ZStd stream.
    There is no need to change the ID for every new release.

    I have been using ZStd for the backup program since version 0.5, and all versions since then (0.5.x, 0.6.x, 0.7.x) can be decoded with that 7za.dll, which is currently around 450 KB statically compiled, including all these methods: ppmd, deflate, bzip2, zstd and lzma.

     
  • Tino Reichardt

    Tino Reichardt - 2016-12-26

    Hello Igor,

    you gave me the method ID range 0x4F711xx. I have now added two more compression algorithms.

    This is my current list; is this okay for you?
    0x xx - reserved
    10 xx - reserved (LZHAM, https://github.com/richgel999/lzham_codec)
    11 xx - reserved (Tino Reichardt)
    11 01 - reserved (ZStandard, https://facebook.github.io/zstd/)
    11 04 - reserved (LZ4, https://lz4.github.io/lz4/)
    11 05 - reserved (LZ5, https://github.com/inikep/lz5)
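To make the table concrete: assuming each ID is read as the byte string 04 F7 <sub-range> <codec> (so 0x4F71101 is 04 F7 11 01) and that "0x xx" means sub-ranges 00 to 0F, a small C sketch can classify an ID by its third byte. The enum and function names are hypothetical illustrations, not 7-Zip code.

```c
#include <stdint.h>

/* IDs from this thread, written as integers. Assumption: 7-Zip stores
   method IDs as byte strings, so 0x4F71101 is the sequence 04 F7 11 01. */
#define METHOD_ID_LZHAM 0x4F71001u
#define METHOD_ID_ZSTD  0x4F71101u
#define METHOD_ID_LZ4   0x4F71104u
#define METHOD_ID_LZ5   0x4F71105u

/* Hypothetical classification of the sub-ranges in the table above. */
typedef enum {
    RANGE_OUTSIDE = 0,   /* not under the 04 F7 prefix */
    RANGE_RESERVED,      /* "0x xx", read here as 00..0F */
    RANGE_LZHAM,         /* "10 xx" */
    RANGE_REICHARDT,     /* "11 xx" */
    RANGE_UNASSIGNED     /* anything else under 04 F7 */
} ExternalRange;

ExternalRange ClassifyExternalId(uint32_t id)
{
    if ((id >> 16) != 0x4F7u)          /* top bytes must be 04 F7 */
        return RANGE_OUTSIDE;
    uint32_t sub = (id >> 8) & 0xFFu;  /* third byte selects the sub-range */
    if (sub <= 0x0Fu)
        return RANGE_RESERVED;
    if (sub == 0x10u)
        return RANGE_LZHAM;
    if (sub == 0x11u)
        return RANGE_REICHARDT;
    return RANGE_UNASSIGNED;
}

/* The last byte identifies the codec inside its sub-range (01, 04, 05 ...). */
uint8_t MethodIndexInRange(uint32_t id)
{
    return (uint8_t)(id & 0xFFu);
}
```

With this reading, ZStandard, LZ4 and LZ5 all share the 11 sub-range, which is exactly what the following reply questions when it asks for separate ranges per algorithm.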

    with best regards, Tino

     
    • Igor Pavlov

      Igor Pavlov - 2016-12-27

      If the algorithms are different, then in some cases we may need different ID ranges for these codecs, so that each one can be extended with new versions in the future.
      So you must write full details for each new method:
      Author
      Date of creation
      If new versions are possible in the future, which numbers you will use in those cases.

      For example, you add ZStandard, but you are not the author of the original codec.
      So there are two possibilities:
      11 01 - is the ID for ZStandard
      or
      11 01 - is the ID for ZStandard from Reichardt

      What about any future versions of ZStandard?
      You must also write how your version is related to the original ZStandard code. For example, is it possible to write another implementation that will still be ZStandard?
      Probably
      1) you support some subset (maybe all) of the features of ZStandard.
      2) you selected some way to encode ZStandard properties as 7-Zip properties.
      So actually it's not ZStandard, but something like ZStandard-Reichardt.
      And you must describe in detail all these additions to the original ZStandard code.

      The same applies to LZ4 / LZ5.

       

      Last edit: Igor Pavlov 2016-12-27
      • Tino Reichardt

        Tino Reichardt - 2016-12-28

        Hello Igor,

        thanks for your fast reply.

        The algorithms are different, yes. Could you assign new ID ranges for LZ4 and LZ5?

        The authors of the different codecs are as follows:
        LZ4 and ZStandard: Yann Collet
        LZ5: Przemyslaw Skibinski

        The original streams of these 3 codecs are wrapped directly in the open 7-Zip container format. They use a 5-byte header that holds version numbers and compression level information, so that these can be shown in the 7zFM GUI.

        I am very sorry, I forgot to mention that I added direct Lzip, LZ4, LZ5 and ZStandard archive support. So using tar files compressed with these 4 codecs is also possible, like this:

        7z x -so test.tar.zstd | 7z l -si -ttar
        REM -> show contents of zstd compressed tar archive test.tar.zstd
        
        7z x -so test.tar.lz | 7z l -si -ttar
        REM -> show contents of lzip compressed tar archive test.tar.lz
        

        I have currently added these handler GUIDs to my GUID.txt file:
        0E Zstd
        0F Lz4
        10 Lz5
        C6 Lzip

        Could you assign them as well, or give me some other IDs I should use for them?

         
        • Igor Pavlov

          Igor Pavlov - 2016-12-28

          So try to describe all these codecs in a txt file:
          1) what exact code is used (exact version information and version history)
          2) which modes of the original code are supported
          3) an exact description of the encoding headers, and the memory requirements for different values of the property ranges
          4) what you expect from possible new versions of these codecs
          That information can help to select a good ID range.

          If these codecs can be used as an external archive format, describe that too. Does it use an additional header?

          About the ID for archive formats: it's not as important as codec IDs.
          We can change an archive ID at any time.
          Currently 7-Zip supports 1-byte IDs in the macros in RegisterArc.h.
          But the full ID can be longer, so you can try to change the macro source code,
          or use IDs from the 50-5F range.

           

          Last edit: Igor Pavlov 2016-12-28
          • Tino Reichardt

            Tino Reichardt - 2016-12-28

            I created such a txt file; you can view it here:
            https://github.com/mcmilk/7-Zip-zstd/blob/master/DOC/Methods-Extern.txt

            I will also maintain it, if you want.

             
          • Tino Reichardt

            Tino Reichardt - 2016-12-28

            (deleted - double post)

             

            Last edit: Tino Reichardt 2016-12-28
            • Igor Pavlov

              Igor Pavlov - 2016-12-28

              Please update it with details about the version.

               Byte _ver_major;
               Byte _ver_minor;
               Byte _level;
               Byte _reserved[2];
              

              What is ver? Is it the decoding or the encoding version?
              How must the decoder treat that header?
              Does the decoder just ignore all these fields (5 bytes)?
              Are all zstd streams compatible across all versions?
              And why do we need all these properties?
              What do you show to the user in the "Method" column?

              - threading is supported through skippable frame id 0x184D2A50U
              

              What does it mean?
              Is it your addition to ZStandard?
              Or is it an original zstd feature?

              - the codec is also used as an archive handler, see ZstdHandler.cpp
                  - when compiled with ZSTD_LEGACY_SUPPORT, support is extended to these
                  additional zstd versions: v0.1 up to v0.7
              

              Also write how you support old versions as a codec (in 7z), if you support them.

               
              • Tino Reichardt

                Tino Reichardt - 2016-12-28

                What is ver? Is it the decoding or the encoding version?

                The version that was used for compression.

                How must the decoder treat that header?

                This header is informational only. It's not used for decompressing the data.

                Does decoder just ignore all these fields (5 bytes)?

                Yes.

                Are all zstd streams compatible across all versions?

                All zstd versions <= 0.8 are considered legacy and are only supported when ZSTD_LEGACY_SUPPORT is defined at compile time. 7-Zip zstd has it enabled and can decompress all old versions.
                ZStandard reached version 1.0 in August 2016; the format was then considered stable and will not be changed in the future.

                And why do we need all these properties?

                They are just generic information, to be used for

                What do you show to the user in the "Method" column?

                The zstd version and the level that were used to compress that file.
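A minimal C sketch of reading the 5 informational property bytes and turning them into a "Method" column string. It assumes the bytes are stored in the order of the fields quoted earlier (_ver_major, _ver_minor, _level, _reserved[2]); the `ZSTD:1.1:L3`-style display format and all function names are hypothetical. Decompression itself never looks at these bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout of the 5 informational property bytes. */
typedef struct {
    uint8_t ver_major;    /* zstd version used for compression */
    uint8_t ver_minor;
    uint8_t level;        /* compression level */
    uint8_t reserved[2];
} ZstdProps;

/* Copies the property blob into the struct.
   Returns 0 on success, -1 if fewer than 5 bytes are present. */
int ParseZstdProps(const uint8_t *data, size_t size, ZstdProps *out)
{
    if (size < 5)
        return -1;
    out->ver_major = data[0];
    out->ver_minor = data[1];
    out->level = data[2];
    out->reserved[0] = data[3];
    out->reserved[1] = data[4];
    return 0;
}

/* Builds a display string for the "Method" column (hypothetical format).
   Returns the snprintf result: the length of the full string. */
int FormatMethodColumn(const ZstdProps *p, char *buf, size_t bufSize)
{
    return snprintf(buf, bufSize, "ZSTD:%u.%u:L%u",
                    (unsigned)p->ver_major, (unsigned)p->ver_minor,
                    (unsigned)p->level);
}
```

Because a decoder only uses these bytes for display, a damaged or unexpected value can at worst produce a wrong label, not a failed extraction.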

                - threading is supported through skippable frame id 0x184D2A50U

                What does it mean?
                Is it your addition to ZStandard?
                Or is it an original zstd feature?

                It marks the data so that it can be decompressed in a multithreaded way. This is currently an optional addition to the zstd stream; older versions ignore it, so the stream remains compatible. Yann is currently adding multithreading support in the same way, so the upcoming release 1.2.0 will have it as well ;-) But I think the feature will remain optional, which is why it is handled by skippable frames within zstd.

                See the zstdmt branch for more details about it: https://github.com/facebook/zstd/tree/zstdmt

                And also the zstd compression format description: https://github.com/facebook/zstd/blob/zstdmt/doc/zstd_compression_format.md
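For reference, the zstd format description defines a skippable frame as a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, followed by a 4-byte little-endian payload size and then the payload. A decoder that does not understand the payload simply jumps over it, which is why older versions stay compatible. The following C sketch (hypothetical function names) computes how many bytes to skip; it does not interpret what zstdmt stores inside the frame.

```c
#include <stddef.h>
#include <stdint.h>

#define SKIPPABLE_MAGIC_MIN 0x184D2A50u
#define SKIPPABLE_MAGIC_MAX 0x184D2A5Fu

static uint32_t ReadLE32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* If buf starts with a complete skippable frame, returns its total size
   (8-byte header + payload) so the caller can jump over it.
   Returns 0 if buf does not start with a skippable frame or it is truncated. */
size_t SkippableFrameSize(const uint8_t *buf, size_t size)
{
    if (size < 8)
        return 0;
    uint32_t magic = ReadLE32(buf);
    if (magic < SKIPPABLE_MAGIC_MIN || magic > SKIPPABLE_MAGIC_MAX)
        return 0;
    uint32_t payload = ReadLE32(buf + 4);   /* Frame_Size field */
    if ((size_t)payload > size - 8)
        return 0;                           /* truncated frame */
    return 8 + (size_t)payload;
}
```

A regular zstd frame (magic 0xFD2FB528) yields 0 here, so a loop over a stream can alternate between skipping these frames and handing the rest to the normal decoder.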

                 
  • Igor Pavlov

    Igor Pavlov - 2016-12-28

    There is ZStandard, and there is 7z-ZStandard.
    If you write a ZStandard stream without any properties, as, for example, bzip2 is stored in 7z, then there is no question at all.
    But you have created a new entity (7z-ZStandard with 5 bytes of properties).
    So everything about these 5 property bytes is now YOUR problem.
    So you must describe every aspect of these 5 bytes.
    For example, you must write that the decoder MUST ignore all fields of these properties and try to decode the stream with the default ZStandard code.
    Is it really so?
    Even if the version contains 0.1?

    I'm still not sure about the 0x184D2A50 thing. I don't know in which specification it should be placed.

     
    • Tino Reichardt

      Tino Reichardt - 2016-12-28

      Is it really so?
      Even if the version contains 0.1?

      Yes. ZStandard detects the version itself from the zstd header, which follows the 5 Byte 7z Container header.
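This is how the version can come from the stream itself: every zstd frame starts with a 4-byte magic number that differs per format version. The C sketch maps those magic numbers as we understand them from zstd's legacy decoders (v0.1 to v0.7) and the current frame format; the function name and the `ZSTD_FMT_CURRENT` value are hypothetical. Note that v0.1 stored its magic number 0xFD2FB51E byte-swapped, so it appears as 0x1EB52FFD when read little-endian.

```c
#include <stddef.h>
#include <stdint.h>

#define ZSTD_FMT_UNKNOWN 0
#define ZSTD_FMT_CURRENT 100   /* hypothetical marker for the 1.x frame format */

static uint32_t ReadLE32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1..7 for legacy formats v0.1..v0.7, ZSTD_FMT_CURRENT for the
   current frame format, or ZSTD_FMT_UNKNOWN otherwise. */
int ZstdFormatFromMagic(const uint8_t *buf, size_t size)
{
    if (size < 4)
        return ZSTD_FMT_UNKNOWN;
    switch (ReadLE32(buf)) {
    case 0x1EB52FFDu: return 1;   /* v0.1: 0xFD2FB51E written big-endian */
    case 0xFD2FB522u: return 2;
    case 0xFD2FB523u: return 3;
    case 0xFD2FB524u: return 4;
    case 0xFD2FB525u: return 5;
    case 0xFD2FB526u: return 6;
    case 0xFD2FB527u: return 7;
    case 0xFD2FB528u: return ZSTD_FMT_CURRENT;
    default:          return ZSTD_FMT_UNKNOWN;
    }
}
```

A decoder built without ZSTD_LEGACY_SUPPORT would accept only the current format and reject frames that map to 1 through 7, regardless of what the 5 property bytes say.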

       
      • Igor Pavlov

        Igor Pavlov - 2016-12-28

        What do you mean?
        The "5 Byte 7z Container header" is probably stored in the 7z header at the end of the archive,
        and the "zstd header" is probably stored in the data stream at the start of the 7z archive, at offset 32 from the start of the 7z file.
        So we can't say "zstd header, which follows the 5 Byte 7z Container header". These headers are stored in different places.
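Igor's point about placement follows from the 7z layout: a fixed 32-byte signature header is followed by the packed streams, and that header stores the offset of the main 7z header (which holds the coder properties) relative to byte 32. A minimal C sketch, assuming the signature-header layout described in 7-Zip's 7zFormat.txt (6-byte signature, 2 version bytes, StartHeaderCRC, then NextHeaderOffset, NextHeaderSize and NextHeaderCRC, all little-endian). CRC checks are skipped, a compressed (encoded) main header is not handled, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t ReadLE64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Reads the 32-byte 7z signature header. On success stores the absolute
   file offset and size of the main 7z header and returns 0; returns -1
   if the buffer is too short or the signature does not match. */
int Locate7zHeader(const uint8_t *buf, size_t size,
                   uint64_t *headerPos, uint64_t *headerSize)
{
    static const uint8_t sig[6] = {'7', 'z', 0xBC, 0xAF, 0x27, 0x1C};
    if (size < 32)
        return -1;
    for (int i = 0; i < 6; i++)
        if (buf[i] != sig[i])
            return -1;
    /* bytes 6-7: format version, bytes 8-11: StartHeaderCRC (not checked) */
    uint64_t nextOffset = ReadLE64(buf + 12);
    *headerSize = ReadLE64(buf + 20);
    *headerPos = 32 + nextOffset;  /* offset is relative to the end of these 32 bytes */
    return 0;
}
```

Since the main header is normally written after all packed data, `headerPos` usually points near the end of the file, while the first packed stream (here, the zstd frames) begins at offset 32.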

         
        • Tino Reichardt

          Tino Reichardt - 2016-12-28

          Yes, you are right. The informational 5-byte extra header is stored at the end of the archive.

           

