snappy-framed-0.1.0.0: Decompression for framed Snappy streams

Safe Haskell: None
Language: Haskell2010

Codec.Compression.Snappy.Framed

Description

This module provides facilities for decoding framed Snappy streams.

Snappy is a block compression format, meaning that the whole compressed stream must be kept in memory until it is fully decoded [1]. Splitting data into a stream of independently decodable chunks is the job of a framing format. Framing formats often also carry checksums of the uncompressed data.

Unfortunately, for a long time Snappy had no official framing format, and so a number of improvised formats appeared. While there is now a standard format, many of the historical formats are still in common use. The good news is that these formats begin with distinct magic byte sequences, and so can be easily distinguished.

The list of formats, and the names given to them, come from the snzip application (https://github.com/kubo/snzip).

[1]: In Snappy, the offsets used by back-references may be as large as a 32-bit word. As a result, a byte in the uncompressed stream can't be discarded until 4GB of uncompressed data following it has been decoded. This effectively makes Snappy a block compression format.

TODO (asayers): Tests

Documentation

decompress :: Monad m => ByteString -> Producer ByteString m (Either String ())

Decompress a framed Snappy stream, reporting errors.

decompress_ :: Monad m => ByteString -> Producer ByteString m ()

Decompress a framed Snappy stream, raising an exception on bad input. TODO (asayers): better names

decompress__ :: Monad m => ByteString -> Producer ByteString m (Either (ParsingError, Producer ByteString m ()) ())

Decompress a framed Snappy stream, returning unconsumed input in the case of an error. TODO (asayers): we can do better in terms of streaming the input.

Internals

data FramingFormat

Snappy unfortunately has a variety of historical framing formats. While the community has now accepted "framing2" as the default, Kafka still uses the "snappy-java" framing format.

parseHeader :: Parser FramingFormat

Attempt to parse the header of each format in turn. This tells us which format we're using. If we don't see a header we recognise, we assume that we've been given an unframed Snappy stream.
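
The try-each-header-in-turn idea can be sketched with plain prefix checks. This is an illustration only, not the module's implementation: DetectedFormat and detectFormat are hypothetical names (the real FramingFormat constructors are not shown here), and only two magic sequences are included. The first is the stream identifier chunk of the standard framing format; the second is assumed to be the snappy-java header (0x82, "SNAPPY", 0x00).

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)

-- | Hypothetical stand-in for the module's FramingFormat type.
data DetectedFormat
  = StandardFraming   -- standard framing format ("framing2" in snzip)
  | SnappyJava        -- snappy-java header (assumed magic, see below)
  | Unframed          -- no recognised magic: assume a raw Snappy stream
  deriving (Eq, Show)

-- | Stream identifier chunk: type 0xff, 3-byte little-endian length 6,
-- followed by the ASCII bytes "sNaPpY".
standardMagic :: ByteString
standardMagic =
  B.pack [0xff, 0x06, 0x00, 0x00, 0x73, 0x4e, 0x61, 0x50, 0x70, 0x59]

-- | Assumed snappy-java magic: 0x82, then ASCII "SNAPPY", then 0x00.
snappyJavaMagic :: ByteString
snappyJavaMagic =
  B.pack [0x82, 0x53, 0x4e, 0x41, 0x50, 0x50, 0x59, 0x00]

-- | Try each known magic in turn, falling back to 'Unframed'.
detectFormat :: ByteString -> DetectedFormat
detectFormat input
  | standardMagic `B.isPrefixOf` input   = StandardFraming
  | snappyJavaMagic `B.isPrefixOf` input = SnappyJava
  | otherwise                            = Unframed
```

Because the magic sequences differ in their first byte, the order of the checks does not matter in this sketch.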

parseBlock :: FramingFormat -> Parser ByteString

Parse a single block of the compressed bytestream, returning a segment of the uncompressed stream.
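
To make the shape of a block concrete, here is a minimal sketch of splitting one chunk off a stream in the standard framing format, where each chunk is a one-byte type followed by a three-byte little-endian length and then that many bytes of data. It is written as a plain function over strict ByteStrings rather than as a Parser, it does not decompress anything, and the names Chunk and splitChunk are hypothetical.

```haskell
import qualified Data.ByteString as B
import Data.ByteString (ByteString)
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8)

-- | One framed chunk: its type byte and its (still encoded) body.
data Chunk = Chunk
  { chunkType :: Word8
  , chunkBody :: ByteString
  } deriving (Eq, Show)

-- | Split a single chunk off the front of the input, returning the
-- chunk and the remaining bytes, or an error if the input is too short.
splitChunk :: ByteString -> Either String (Chunk, ByteString)
splitChunk input
  | B.length input < 4  = Left "truncated chunk header"
  | B.length rest < len = Left "truncated chunk body"
  | otherwise           = Right (Chunk ty body, rest')
  where
    ty = B.index input 0
    -- Three-byte little-endian length of the chunk body.
    len :: Int
    len = fromIntegral (B.index input 1)
      .|. (fromIntegral (B.index input 2) `shiftL` 8)
      .|. (fromIntegral (B.index input 3) `shiftL` 16)
    rest = B.drop 4 input
    (body, rest') = B.splitAt len rest
```

In the standard format, the body of a compressed (type 0x00) or uncompressed (type 0x01) data chunk starts with a four-byte masked CRC-32C of the uncompressed data, so a full decoder would strip and verify that before handing the remainder to the Snappy block decoder.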