Writing TIFF pixel data last and making 8 GB TIFF files - tiff

Is there any reason not to write TIFF pixel data last? Typically a simple TIFF file starts with a header that describes endianness and contains the offset to the first IFD, then the pixel data, followed at the end of the file by the IFD and then the extra data that the IFD tags point to. All TIFF files I've seen are written in this order. However, the TIFF standard says nothing about mandating such an order; in fact it says that "Compressed or uncompressed image data can be stored almost anywhere in a TIFF file", and I can't imagine that any TIFF parser would mind, since parsers must follow the offset to the IFD and then follow the StripOffsets (tag 273) values anyway. I think it makes more sense to put the pixel data last so that one goes through a TIFF file more sequentially, without jumping from the top to the bottom and back to the top for no good reason, yet I don't see anyone else doing this, which perplexes me slightly.
Part of the reason I'm asking is that a client of mine tries to create TIFF files slightly over 4 GB, which doesn't work because the IFD offsets overflow. Even though the TIFF standard says TIFF files cannot exceed 2^32 bytes, I'm thinking there might be a way to create 8 GB TIFF files that would be accepted by most TIFF parsers: put everything that isn't pixel data first, so that it all has very small offsets, and then point to two strips of pixel data, a first strip followed by a second that starts before offset 2^32 and is itself no larger than 2^32-1 bytes. That would give us TIFF files that can be up to 2*(2^32-1) bytes and still, in theory, be readable despite the 32-bit offsets and sizes.
Please note that this is a question about the TIFF format; I'm not asking what any third-party library would accept to write, as I wrote my own TIFF-writing code, nor am I asking about BigTIFF.

It's OK to write pixel data after the IFD structure and tag data. Much TIFF-writing software and many libraries do that; most notably, libtiff-based software writes image data first. As for writing image data in huge strips, possibly pushing the file size past 4 GB, check with the software/libraries you intend to read the files with: 32-bit builds or implementation details might prevent them from reading such files. I found, for example, that modern libtiff-based software, Photoshop, and tifffile can read such files, while ImageJ, BioFormats, and Paint.NET cannot.
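To illustrate the layout described in the question, here is a minimal sketch in Python of such a file: header, IFD and all tag data first, then two large uncompressed strips at the end. The function name, the choice of tags and the 8-bit grayscale layout are my own assumptions for illustration, not anyone's production writer; the point is only that the strip start offsets and byte counts have to fit in 32 bits, not the total file size.
# Hypothetical sketch: classic (non-Big) little-endian TIFF, IFD and tag data
# at the start of the file, pixel data in two huge strips at the end.
import struct
def write_big_classic_tiff(path, width, rows_per_strip):
    le = "<"                                  # "II" little-endian byte order
    strip_bytes = width * rows_per_strip      # one uncompressed 8-bit grayscale strip
    num_tags = 8
    ifd_offset = 8                            # IFD directly after the 8-byte header
    ifd_size = 2 + num_tags * 12 + 4          # entry count + entries + next-IFD offset
    extra_offset = ifd_offset + ifd_size      # out-of-line values for the 2-element arrays
    strip1_offset = extra_offset + 16         # after 2x LONG StripOffsets + 2x LONG StripByteCounts
    strip2_offset = strip1_offset + strip_bytes   # the only strip offset that must stay below 2**32
    LONG, SHORT = 4, 3
    def entry(tag, typ, count, value):        # one 12-byte IFD entry
        return struct.pack(le + "HHII", tag, typ, count, value)
    ifd = struct.pack(le + "H", num_tags)
    ifd += entry(256, LONG, 1, width)                   # ImageWidth
    ifd += entry(257, LONG, 1, 2 * rows_per_strip)      # ImageLength
    ifd += entry(258, SHORT, 1, 8)                      # BitsPerSample
    ifd += entry(259, SHORT, 1, 1)                      # Compression = none
    ifd += entry(262, SHORT, 1, 1)                      # PhotometricInterpretation = BlackIsZero
    ifd += entry(273, LONG, 2, extra_offset)            # StripOffsets -> out-of-line array
    ifd += entry(278, LONG, 1, rows_per_strip)          # RowsPerStrip
    ifd += entry(279, LONG, 2, extra_offset + 8)        # StripByteCounts -> out-of-line array
    ifd += struct.pack(le + "I", 0)                     # no further IFDs
    with open(path, "wb") as f:
        f.write(struct.pack(le + "2sHI", b"II", 42, ifd_offset))        # TIFF header
        f.write(ifd)
        f.write(struct.pack(le + "II", strip1_offset, strip2_offset))   # StripOffsets values
        f.write(struct.pack(le + "II", strip_bytes, strip_bytes))       # StripByteCounts values
        for _ in range(2):                              # pixel data last, written in chunks
            remaining = strip_bytes
            while remaining:
                chunk = min(remaining, 64 * 1024 * 1024)
                f.write(b"\0" * chunk)
                remaining -= chunk
With width = 65500 and rows_per_strip = 65500, for example, each strip is about 4.29 GB, the second strip still starts below 2^32, and the resulting file is roughly 8.6 GB; whether a given reader accepts it is exactly the compatibility question discussed above.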

Related

How can I open a big HTML file?

I am using a Python script to compare data, and the differences are saved to an HTML file. Because there are so many differences, the result file grows to 158 MB, which I am unable to open in any browser.
How can I open it, or should I convert it to PDF or some other format that I can open?
HTML is for rendering content for consumption. A 158 MB web page is too large; a human cannot be expected to process that amount of information in a single viewing.
Alter your script to restrict the number of displayed differences to a more manageable amount, and include a count of the total number of additional differences.
e.g.:
<p>[X] differences identified between [file1] and [file2], the first 5 differences are listed below:</p>
<ul>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
</ul>
If you need to have all the differences available, consider a different file format. A plain text file will allow you to use the various large-file viewing tools (less, more, etc.) available.
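If the report is generated by something like a Python diff, capping the output could look roughly like this; the file names, the cap of 5 and the use of difflib are assumptions about the original script, not a drop-in replacement for it.
# Hypothetical sketch: write only the first few differences plus a total count.
import difflib
import html
MAX_SHOWN = 5
def write_diff_report(path1, path2, out_path, max_shown=MAX_SHOWN):
    with open(path1) as f1, open(path2) as f2:
        lines1, lines2 = f1.readlines(), f2.readlines()
    # keep only added/removed lines, not the file headers or unchanged context
    diffs = [d for d in difflib.unified_diff(lines1, lines2, lineterm="")
             if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))]
    with open(out_path, "w") as out:
        out.write("<p>%d differences identified between %s and %s, "
                  "the first %d are listed below:</p>\n<ul>\n"
                  % (len(diffs), path1, path2, max_shown))
        for d in diffs[:max_shown]:
            out.write("<li>%s</li>\n" % html.escape(d))
        out.write("</ul>\n")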

How to check whether a PDF already exists (or is 80% the same) in MySQL?

Users want to upload PDFs, but the problem is re-uploads of the same document.
My idea is to convert the PDF to binary, so that I have a string "X" (the binary of that PDF) to save in MySQL, and then do a SELECT ... LIKE on a slice of it (from 1/3 of length(X) to 2/3 of length(X)).
Could that work? I'm using Laravel. Thanks for reading.
This cannot be done reasonably in MySQL. Since you are also using a PHP environment, it may be possible to do in PHP, but a general solution will require substantial effort.
PDF files are composed of (possibly compressed) streams of images and text. Several libraries can attempt to extract the text, and they will work reasonably well if the PDF was generated in a straightforward way; however, they will typically fail if some text was rendered as images of its characters, or if other obfuscation has been applied. In those cases you will need to use OCR to recover the text as it appears when the PDF is displayed. Note also that tables and images are out of scope for these tools.
Once you have two text files, finding overlaps becomes much easier, although there are several techniques. "Same 80%" can be interpreted in several ways, but let us assume that copying a contiguous 79% of the text from a file and saving it again should not trigger alarms, while copying 81% of that same text should trigger them. Any diff tool can provide information on duplicate chunks, and may be enough for your purposes. A more sophisticated approach, which however does not provide exact percentages, is to use the normalized compression distance.
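For the last suggestion, here is a rough sketch of the normalized compression distance in Python; using zlib as the compressor and the 0.3 threshold are my assumptions, and the text is assumed to have already been extracted from the two PDFs.
# Hypothetical sketch: normalized compression distance between two extracted texts.
import zlib
def ncd(text_a, text_b):
    a, b = text_a.encode("utf-8"), text_b.encode("utf-8")
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    # close to 0.0 means "essentially the same text", close to 1.0 means unrelated
    return (cab - min(ca, cb)) / max(ca, cb)
# e.g. treat the upload as a likely re-upload when ncd(new_text, stored_text) < 0.3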

How should I store the images for my website?

What is the correct way of storing image files in the database?
I am using file-paths to store the images.
But here is the problem: I basically have to show 3 different sizes of one image on my website. One would be used as a thumbnail, the second would be around 290px*240px, and the third would be full size (approx. 500px*500px). Since it is not considered good to scale images down using HTML img elements, what should the solution be?
Currently I am storing 3 different-sized images for each item. Is there a better way?
Frankly the correct way to store images in a database is not to store them in a database. Store them in the file system, and keep the DB restricted to the path.
At the end of the day, you're talking about storage space. That's the real issue. If you're storing multiple copies of the same file at different resolutions, it will take more space than storing just a single copy of the file.
On the other hand, if you only keep one copy of the file and scale it dynamically, you don't need the storage space, but it does take more processing power instead.
And as you already stated in the question, sending the full-size image every time is costly in terms of bandwidth.
So that's the trade-off; storage space on your server vs processor work vs bandwidth costs.
The simple fact is that the cheapest of those three things is storage space. Therefore, you should store the multiple copies of the files on your server.
This is also the solution that will give you the best performance, which in many ways is an even more important point than direct cost.
In terms of storing the file paths, my advice is to give the scaled versions predictable names with a standard prefix or suffix compared to the original file. This means you only need to have the single filename on the database; you can simply add the prefix for the relevant version of the image that has been requested.
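A minimal sketch of that naming scheme; the suffixes and the helper name are made up for illustration.
# Hypothetical sketch: derive the scaled-version paths from the single
# filename stored in the database. The suffix names are assumptions.
from pathlib import Path
SUFFIXES = {"thumb": "_thumb", "medium": "_290x240", "full": ""}
def image_path(stored_name, size):
    p = Path(stored_name)
    return str(p.with_name(p.stem + SUFFIXES[size] + p.suffix))
# image_path("gallery/photo.jpg", "medium") -> "gallery/photo_290x240.jpg"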
Nothing wrong with storing multiple versions of the same image.
Ideally you want even more – the @2x ones for retina screens.
You can use a server side script to generate the smaller ones dynamically, but depending on traffic levels this may be a bad idea.
You are really trading storage space and speed for higher CPU and RAM usage to generate them on the fly – depending on your situation that might be a good trade off, or it might not.
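If you do go the dynamic route, a common compromise is to resize on first request and cache the result on disk. A rough Python/Pillow sketch of that idea follows; the cache directory and the fixed sizes are assumptions.
# Hypothetical sketch: resize once on first request, then serve the cached copy.
from pathlib import Path
from PIL import Image
CACHE_DIR = Path("cache")
SIZES = {"thumb": (120, 120), "medium": (290, 240)}
def scaled_image(original, size):
    src = Path(original)
    out = CACHE_DIR / ("%s_%s%s" % (src.stem, size, src.suffix))
    if not out.exists():
        CACHE_DIR.mkdir(exist_ok=True)
        with Image.open(src) as im:
            im.thumbnail(SIZES[size])   # shrinks in place, preserving aspect ratio
            im.save(out)
    return out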
I agree with rick: you can store multiple sizes of the picture, as your business requires. You should store the image in a folder on the server and store its location in the database. Make a hierarchy of folders and store the low-res images inside them so that you can always refer to them with only one address.
You can do it like this in your web.config:
<add key="WebResources" value="~/Assets/WebResources/" />
<add key="ImageRoot" value="Images\Web" />
Make .233240 and .540540 folders and store the pictures with the same name inside them so you can easily access them.

Large text file manipulation

Seeking help on manipulating large text files. I'm not a programmer and apparently have some serious trouble with regex. The text files in question are the output of a tool that logs an enormous amount of information about the server it's running on. The produced output needs to be adjusted to meet defined requirements. Each text file is between 4 and 10 MB in size. The file is broken into sections that look like this:
http://imageshack.us/photo/my-images/853/79515724.jpg/
What I need is to somehow remove some of the sections while leaving SOME of them intact.
I have around 200 of those files.
Thank you.

Why do people always encourage a single js file for a website?

I have read some website development materials on the web, and every time someone asks about the organization of a website's js, css, html and php files, people suggest a single js file for the whole website. The argument is speed.
I clearly understand that the fewer requests there are, the faster the page responds. But I have never understood the single-js argument. Suppose you have 10 webpages and each webpage needs a js function to manipulate the DOM objects on it. If you put all 10 functions in a single js file and let that file execute on every single webpage, 9 out of 10 functions are doing useless work, and CPU time is wasted searching for non-existent DOM objects.
I know that CPU time on an individual client machine is trivial compared to bandwidth on a single server machine. I am not saying that you should have many js files on a single webpage. But I don't see anything wrong if every webpage refers to 1 to 3 js files and those js files are cached on the client machine. There are many good ways to do caching; for example, you can use an expiry date or include a version number in the js file name. Compared to cramming the functionality for every webpage of a website into one big js file, I far prefer splitting the js code into smaller files.
Any criticism/agreement on my argument? Am I wrong? Thank you for your suggestion.
A function does 0 work unless it is called, so 9 uncalled functions do 0 work and only take up a little extra space.
A client only has to make 1 request to download 1 big JS file, which is then cached for every other page load. That is less work than making a small request on every single page.
I'll give you the answer I always give: it depends.
Combining everything into one file has many great benefits, including:
less network traffic - you might be retrieving one file, but you're sending/receiving multiple packets and each transaction has a series of SYN, SYN-ACK, and ACK messages sent across TCP. A large majority of the transfer time is establishing the session and there is a lot of overhead in the packet headers.
one location/manageability - although you may only have a few files, it's easy for functions (and class objects) to grow between versions. With the multiple-file approach, functions from one file sometimes call functions/objects from another file (e.g. ajax in one file, then arithmetic functions in another - your arithmetic functions might grow to need to call the ajax and have a certain variable type returned). What ends up happening is that your set of files needs to be seen as one version, rather than each file being its own version. Things get hairy down the road if you don't have good management in place, and it's easy to fall out of line with Javascript files, which are always changing. Having one file makes it easy to manage the version between each of your pages across your (1 to many) websites.
Other topics to consider:
dormant code - you might think that the uncalled functions are potentially reducing performance by taking up space in memory, and you'd be right, but the impact is so minuscule that it doesn't matter. Functions are indexed in memory, and while the index table may grow, it's trivial when dealing with small projects, especially given today's hardware.
memory leaks - this is probably the largest reason why you wouldn't want to combine all the code, but it is a small issue given the amount of memory in systems today and the better garbage collection browsers have. Also, this is something that you, as a programmer, have the ability to control. Quality code leads to fewer problems like this.
Why does it depend?
While it's easy to say "throw all your code into one file", that would be wrong. It depends on how large your code is, how many functions there are, who maintains it, etc. Surely you wouldn't pack your locally written functions into the jQuery package, and you may have different programmers maintaining different blocks of code - it depends on your setup.
It also depends on size. Some programmers embed base64-encoded images in their files to reduce the number of requests, and these can bloat the files. Surely you don't want to package everything into one 50 MB file, especially if there are core functions that are needed for the page to load.
So to bring my response to a close, we'd need more information about your setup, because it depends. Surely 3 files is acceptable regardless of size, combining where you see fit. It probably wouldn't really hurt network traffic, but 50 files is unreasonable. I use the hand rule (no more than 5), but you'll surely see a benefit from combining those five 1 KB files into one 5 KB file.
Two reasons that I can think of:
Less network latency. Each .js requires another request/response to the server it's downloaded from.
Fewer bytes on the wire and less memory. If it's a single file you can strip out unnecessary characters and minify the whole thing.
The Javascript should be designed so that the extra functions don't execute at all unless they're needed.
For example, you can define a set of functions in your script but only call them in (very short) inline <script> blocks in the pages themselves.
My line of thought is that you have fewer requests. When you make a request in the header of the page, it stalls the output of the rest of the page; the user agent cannot render the rest of the page until the javascript files have been obtained. Also, javascript files download synchronously - they queue up instead of being pulled all at once (at least that is the theory).