I am working on an Inflate decompression implementation.
It works pretty well with GZIP-compressed files, but I am not sure I can test it thoroughly enough this way.
Is there some reference set of test files compressed with the different variants of the Deflate algorithm?
I mean files with fixed and dynamic Huffman trees, with uncompressed (stored) blocks, different window sizes, and all possible combinations of edge and corner cases. A set of incorrectly encoded files would also be useful, in order to test the error checking.
You can find some error and edge cases in infcover.c, though many of them are specific to zlib's inflate code, conceived to cover all of the branches therein.
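If no ready-made corpus turns up, you can generate at least some of the variants yourself. Below is a small sketch using Java's java.util.zip.Deflater (chosen only because it is widely available; the class and file names are made up): level 0 forces stored blocks, normal compression of repetitive data usually yields dynamic Huffman blocks, very short inputs tend to come out with the fixed Huffman tables, and flipping a byte gives you a corrupt stream for the error paths. Note that Deflater does not expose zlib's Z_FIXED strategy, so fixed-Huffman output is only obtained indirectly, and these are raw deflate streams; wrap them in gzip yourself if your inflater expects the envelope.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Deflater;

// Generates a few raw deflate streams with different block types, plus a
// deliberately corrupted copy for error-path testing. Names are arbitrary.
public class MakeDeflateTestData {

    static byte[] deflate(byte[] input, int level) {
        Deflater d = new Deflater(level, true); // true = raw deflate, no zlib header
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "to be or not to be, that is the question".repeat(1000).getBytes();
        byte[] tiny = "abc".getBytes();

        // Level 0 forces stored (uncompressed) blocks.
        Files.write(Path.of("stored.deflate"), deflate(text, Deflater.NO_COMPRESSION));
        // Normal compression of repetitive data usually produces dynamic Huffman blocks.
        Files.write(Path.of("dynamic.deflate"), deflate(text, Deflater.BEST_COMPRESSION));
        // Very short inputs are typically emitted with the fixed Huffman tables,
        // because a dynamic tree description would cost more than it saves.
        Files.write(Path.of("fixed-ish.deflate"), deflate(tiny, Deflater.BEST_COMPRESSION));

        // A corrupted stream for exercising error handling.
        byte[] bad = deflate(text, Deflater.BEST_COMPRESSION);
        bad[bad.length / 2] ^= 0xFF; // flip bits somewhere in the middle
        Files.write(Path.of("corrupt.deflate"), bad);
    }
}
```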
Is there any safe way to automate this process for multiple files? By safe I mean that it will not break the code or introduce some kind of weird side effect that will manifest exactly when you don't want it, in production.
I know about http://man.cx/expand. Is this method truly safe?
expand is pretty good, but I seem to recall it can get tricked in some conditions / for some languages, so for safety I'd have to assume "not truly".
Hopefully, however, your source code has plenty of tests before it goes to Production to demonstrate its full functionality and correctness.
Alternatively / additionally, if you're compiling or producing bytecode (e.g. Java), you could probably do a binary comparison of the artefacts to prove equivalence between those built from the original and from the de-tabbed source code.
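As a minimal sketch of that comparison (assuming Java 12+ for Files.mismatch; the paths are placeholders): note that comparing individual .class files is more reliable than comparing jars, whose entries carry timestamps that can differ between builds even when the sources are equivalent.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Compares two build artefacts byte by byte. Paths are placeholders.
public class ArtefactDiff {
    public static void main(String[] args) throws IOException {
        Path original = Path.of("build-original/Foo.class");
        Path detabbed = Path.of("build-detabbed/Foo.class");

        long firstDifference = Files.mismatch(original, detabbed); // -1 means identical
        if (firstDifference == -1) {
            System.out.println("Artefacts are byte-for-byte identical");
        } else {
            System.out.println("Artefacts differ at byte offset " + firstDifference);
        }
    }
}
```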
I'm trying to:
load a jpeg through FileReference
write the result to a bytearray
extract the pixel data (and nothing else) directly from the bytearray
I've spent many hours looking for an AS3 class that can decode a jpeg object from raw binary data (such as from a bytearray), but to no avail (there is one here but it relies on Alchemy and a SWC, which isn't suitable).
Put simply, once I have the raw data in the byte array, I want to know how to discern the pixel data from the rest of the file.
I'm not interested in using the Loader class, or Bitmap's 'getPixels' function.
You will notice that steganography relies on using a PNG file. The reason that you can't (easily) use a JPG file is that the encoding process destroys the reliability of the pixel data. JPG files can be encoded in several colour spaces, including CMYK and RGB, but most often YCbCr. JPG compression relies on the discrete cosine transform (a Fourier-related transform) followed by quantization, which eliminates pixel-level detail. Therefore you will not be able to use the same process on JPG as on PNG, GIF, BMP, etc.
This is not to say that you cannot do it in a JPG file, but you need to change the approach, or account for the loss of data at the compression stage (or save uncompressed).
Well, you could manipulate the compressed data directly to include your message, but you'd have to read up on how you're able to do it without totally corrupting the image.
But if you're thinking to encode the message in the pixels to do a per-pixel diff when decoding your message I'm afraid your assumption (from the comment on Daniel's answer) is wrong.
JPEG compression is lossy - this means that when you put the amended pixel data back into the image file, the exact pixel values are lost (since the data has to be re-encoded). Instead of pixel data, the only information saved in the file is how to reassemble an image that looks very similar to the original to the human eye, but whose pixel data is not the same.
Even if you decode the image, save it as a JPEG file, then transform the original image and save that as a second JPEG with the exact same compression settings, you still cannot rely on a per-pixel comparison.
However, as I seem to remember, JPEG compresses the image data in 8*8 pixel blocks, so you might be able to manipulate and compare the image data on a per-block basis.
extract the pixel data (and nothing else) directly from the bytearray
To do this you need to decode the JPEG first (apart from some optional metadata, there is nothing in a typical JPEG file other than the encoded image data), and the way to do that is precisely by using Loader.loadBytes and then BitmapData.getPixels. You can probably make your own decoder (like the one you posted), but I don't see any benefit in doing so.
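Purely to illustrate the decode-then-read-pixels principle, this is what the same idea looks like in Java with ImageIO (not an AS3 solution; the class and method names below are mine, chosen for the example):

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Decodes a JPEG held in a byte array and then reads its pixel values.
// The byte[] is assumed to come from wherever you loaded the file.
public class JpegPixels {
    public static int[] pixelsFromJpegBytes(byte[] jpegBytes) throws IOException {
        BufferedImage img = ImageIO.read(new ByteArrayInputStream(jpegBytes));
        if (img == null) {
            throw new IOException("No registered reader could decode the data");
        }
        int w = img.getWidth(), h = img.getHeight();
        // getRGB returns packed ARGB ints, one per pixel, row by row.
        return img.getRGB(0, 0, w, h, null, 0, w);
    }
}
```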
A guy named Thibault Imbert at ByteArray.org adapted the libjpeg library for ActionScript 3. I have not tested this, but other folks seem to like it, judging by the comments at bytearray.org.
http://code.google.com/p/as3-jpeg-decoder/downloads/list
I'm working on a project where I'll be sending lots of binary data (several images in one message) over HTTP POST to a RESTful interface.
I've looked into alternatives such as JSON, protobuf and Thrift, but found no conclusive comparisons of the overhead introduced by these formats. Which one would you prefer to use in this case?
If you really need to do that all as part of a single HTTP POST, then I would first be more concerned about reliability and functionality. Efficiency is all going to be relative to what you are sending. If it is images in an already compressed format/container, then it is very likely you are not going to see a good percentage difference in efficiency without sacrificing something else. So in my opinion, probably the most effective thing to look into would be to use MIME encoding of your content in the POST, which would mean encoding the binaries using Base64. Using this you have the benefit that almost any development platform these days will either have this functionality built in or have it easily available in external libraries for MIME / Base64 handling. Sticking with highly used standards like these can make it easy to support a wide user base. Some links for reference:
http://en.wikipedia.org/wiki/MIME
http://en.wikipedia.org/wiki/Base64
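As a minimal sketch of the Base64 part in Java (the file names and the hand-rolled JSON shape are just placeholder assumptions; a real client would use a JSON library and whatever HTTP client it prefers):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Base64-encodes a couple of image files so they can be embedded in a
// text-based (e.g. JSON or MIME) POST body. File names are placeholders.
public class EncodeImagesForPost {
    public static void main(String[] args) throws IOException {
        String img1 = Base64.getEncoder()
                .encodeToString(Files.readAllBytes(Path.of("photo1.jpg")));
        String img2 = Base64.getEncoder()
                .encodeToString(Files.readAllBytes(Path.of("photo2.jpg")));

        // Hand-rolled JSON purely for illustration.
        String body = "{ \"images\": [\"" + img1 + "\", \"" + img2 + "\"] }";
        System.out.println("POST body length (Base64 adds roughly 33% overhead): "
                + body.length());
    }
}
```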
I'd like to be able to do random access into a gzipped file.
I can afford to do some preprocessing on it (say, build some kind of index), provided that the result of the preprocessing is much smaller than the file itself.
Any advice?
My thoughts were:
Hack on an existing gzip implementation and serialize its decompressor state every, say, 1 megabyte of compressed data. Then to do random access, deserialize the decompressor state and read from the megabyte boundary. This seems hard, especially since I'm working with Java and I couldn't find a pure-java gzip implementation :(
Re-compress the file in chunks of 1 MB and do the same as above. This has the disadvantage of doubling the required disk space.
Write a simple parser of the gzip format that doesn't do any decompressing and only detects and indexes block boundaries (if there even are any blocks: I haven't yet read the gzip format description)
Have a look at this link (C code example).
/* zran.c -- example of zlib/gzip stream indexing and random access
...
Gzip is just zlib with an envelope.
The BGZF file format, compatible with GZIP, was developed by bioinformaticians.
(...) The advantage of BGZF over conventional gzip is that BGZF allows for seeking without having to scan through the entire file up to the position being sought.
In http://picard.svn.sourceforge.net/viewvc/picard/trunk/src/java/net/sf/samtools/util/ , have a look at BlockCompressedOutputStream and BlockCompressedInputStream.java
FWIW: I've developed a command line tool built upon zlib's zran.c source code which can do random access to gzip files by creating indexes for them: https://github.com/circulosmeos/gztool
It can even create an index for a still-growing gzip file (for example a log created by rsyslog directly in gzip format), thus reducing index-creation time to practically zero. See the -S (Supervise) option.
Interesting question. I don't understand why your 2nd option (recompress the file in chunks) would double the disk space. It seems to me it would be about the same, apart from a small amount of overhead. If you have control over the compression piece, then that seems like the right idea.
Maybe what you mean is that you don't have control over the input, and therefore it would double.
If you can do it, I'm imagining modelling it as a CompressedFileStream class that uses, as its backing store, a series of 1 MB gzip'd blobs. When reading, a Seek() on the stream would move to the appropriate blob and decompress. A Read() past the end of a blob would cause the stream to open the next blob.
PS: GZIP is described in IETF RFC 1952, but it uses DEFLATE for the compression format. There'd be no reason to use the GZIP elaboration if you implemented this CompressedFileStream class as I've imagined it.
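A rough Java sketch of that idea (not a drop-in class): compress the input in fixed-size chunks, keep an implicit index of which blob covers which offsets, and on a random read decompress only the blob containing the requested position. The class name, the 1 MB chunk size, and keeping everything in memory are assumptions for brevity; it uses the GZIP classes for simplicity, though per the PS above raw DEFLATE via Deflater/Inflater would work just as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of the "series of gzip'd blobs" idea: each chunk of the original
// data is compressed independently, so a read only has to decompress the
// chunk(s) covering the requested offset. A real implementation would write
// the blobs and their offsets to disk instead of holding them in memory.
public class ChunkedGzipStore {
    private static final int CHUNK = 1 << 20;   // 1 MB of uncompressed data per blob
    private final List<byte[]> blobs = new ArrayList<>();

    public ChunkedGzipStore(byte[] data) throws IOException {
        for (int off = 0; off < data.length; off += CHUNK) {
            int len = Math.min(CHUNK, data.length - off);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(data, off, len);
            }
            blobs.add(buf.toByteArray());       // index is implicit: blob i covers [i*CHUNK, ...)
        }
    }

    // Random-access read: decompress only the blob(s) covering [pos, pos+len).
    public byte[] read(long pos, int len) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (len > 0) {
            int blobIndex = (int) (pos / CHUNK);
            int within = (int) (pos % CHUNK);
            byte[] chunk = decompress(blobs.get(blobIndex));
            int n = Math.min(len, chunk.length - within);
            if (n <= 0) break;                  // reading past the end of the data
            out.write(chunk, within, n);
            pos += n;
            len -= n;
        }
        return out.toByteArray();
    }

    private static byte[] decompress(byte[] blob) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(blob))) {
            return gz.readAllBytes();
        }
    }
}
```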
Is there any plugin (gem) that can clean and reformat a page after rendering it? By cleaning I mean removing unnecessary newlines and whitespace.
Apologies if this is too orthogonal an answer: you should consider just making sure gzip compression is enabled. This makes it easier to view your source pages for debugging, requires less fiddling, and is a bigger win than simply removing unnecessary whitespace. If you have Apache as the front end, you could use mod_deflate (e.g., How do I gzip webpage output with Rails?) and other servers have similar gzip support. Most modern browsers support gzip, so you'll get the biggest bang for your buck.
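For reference, a minimal mod_deflate setup might look like the following (a sketch, assuming Apache 2.x with the module available, e.g. via "a2enmod deflate" on Debian/Ubuntu; adjust the MIME types to taste):

```apache
<IfModule mod_deflate.c>
    # Compress the usual text-based responses before sending them to the client.
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript application/json
</IfModule>
```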
Perhaps you are looking for http://www.railslodge.com/plugins/455-rails-tidy. Jason's point about ensuring gzip is enabled is super important as well.