I have a TIFF file with several hundred pages in it. My goal is to break it into many files, each containing two pages, i.e. save a new file for every two pages. It is basically a stack of single-sheet documents scanned front and back.
I'm working on a C# Forms (Visual Studio 2008) application to automate the process.
My initial thought is to use GraphicsMagick to split every page into a separate file, then step through the files to join them back together again two pages at a time.
I have the split process working by calling a command like this:
gm convert largeinputfile.tif +adjoin singlepageoutput%d.tif
When I try to join just two of the pages back together again, I use a command like this:
gm convert -page A4 -append singlepageoutput0.tif singlepageoutput1.tif New2pageImage.tif
This creates one long document containing both pages, but with no page break.
I have tried several things with the -page option, but I'm just guessing and it's not having much effect.
I'm very close to a working solution but stuck on the last bit.
Any ideas?
Thanks in advance
David
Thought I would answer this as I had a similar problem and this was the first SO question I came across.
This will take the first two pages and create a new multi-page TIFF file.
gm convert -page A4 page0.tif page1.tif -adjoin output.tif
Then you take the file you have created and add a new page to it.
gm convert output.tif page2.tif -adjoin output.tif
gm convert output.tif page3.tif -adjoin output.tif
.... and so on ....
This also has the added benefit of not eating through CPU and RAM: GraphicsMagick will try to do the whole thing in memory, and a 10 MB, 500-page TIFF will take about 24 GB of RAM if done in one go.
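For the original goal of producing many two-page files (rather than one big multi-page file), the C# side can simply loop over the split pages and run gm once per pair. This is only a rough sketch; the class, method, and file names are made up for illustration, and it assumes gm is on the PATH:

using System.Diagnostics;
using System.IO;

// Rough sketch: pair up the pages produced by
//   gm convert largeinputfile.tif +adjoin singlepageoutput%d.tif
// and join each front/back pair into its own two-page TIFF.
// Note -adjoin keeps two pages; -append stacks the pixels into one long page.
static class TiffPairJoiner
{
    public static void JoinPairs(string workDir, int pageCount)
    {
        for (int i = 0; i + 1 < pageCount; i += 2)   // ignores a trailing unpaired page
        {
            string front  = Path.Combine(workDir, string.Format("singlepageoutput{0}.tif", i));
            string back   = Path.Combine(workDir, string.Format("singlepageoutput{0}.tif", i + 1));
            string output = Path.Combine(workDir, string.Format("sheet{0}.tif", i / 2));

            var psi = new ProcessStartInfo
            {
                FileName = "gm",
                Arguments = string.Format("convert \"{0}\" \"{1}\" -adjoin \"{2}\"", front, back, output),
                UseShellExecute = false,
                CreateNoWindow = true
            };

            using (var process = Process.Start(psi))
            {
                process.WaitForExit();   // one small gm call per pair keeps memory use low
            }
        }
    }
}

Calling gm once per pair also sidesteps the memory problem above, since each invocation only ever touches two pages.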
Is there any reason not to write TIFF pixel data last? Typically a simple TIFF file starts with a header that describes endianness and contains the offset to the first IFD, then the pixel data, followed at the end of the file by the IFD and then the extra data that the IFD tags point to. All TIFF files I've seen are written in this order; however, the TIFF standard says nothing about mandating such an order. In fact, it says that "Compressed or uncompressed image data can be stored almost anywhere in a TIFF file", and I can't imagine that any TIFF parser would mind the order, as parsers must follow the IFD offset and then follow the StripOffsets (tag 273) tags. I think it makes more sense to put the pixel data last, so that one goes through a TIFF file more sequentially without jumping from the top to the bottom and back to the top for no good reason, yet I don't see anyone else doing this, which perplexes me slightly.
Part of the reason I'm asking is that a client of mine tries to create TIFF files slightly over 4 GB, which doesn't work because the IFD offsets overflow. I'm thinking that even though the TIFF standard claims TIFF files cannot exceed 2^32 bytes, there might be a way to create 8 GB TIFF files that would be accepted by most TIFF parsers: put everything that isn't pixel data first, so that it all has very small offsets, and then point to two strips of pixel data, a first strip followed by a second that starts before offset 2^32 and that is itself no larger than 2^32-1 bytes. That would give us TIFF files that can be up to 2*(2^32-1) bytes and still, in theory, be readable despite being limited to 32-bit offsets and sizes.
Please note that this is a question about the TIFF format. I'm not talking about what any third-party library would accept to write, as I wrote my own TIFF-writing code, nor am I asking about BigTIFF.
It's OK to write the pixel data after the IFD structure and tag data; many TIFF-writing programs and libraries do that. Most notably, libtiff-based software writes the image data first. As for writing image data in huge strips, possibly pushing past the 4 GB file size, check with the software/libraries you intend to read the files with: software compiled for 32-bit, or other implementation details, might prevent reading such files. I found that, for example, modern libtiff-based software, Photoshop, and tifffile can read such files, while ImageJ, BioFormats, and Paint.NET cannot.
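To make the layout concrete, here is a minimal sketch of my own (illustrative only, not anyone's production writer) of an uncompressed 8-bit grayscale TIFF written with the header and IFD first and the single pixel-data strip at the very end. A reader locates the strip purely through StripOffsets (tag 273), so where the pixel data physically sits in the file makes no difference:

using System.IO;

// Illustrative sketch: header + IFD at the start of the file, pixel data last.
static class MinimalTiffWriter
{
    public static void Write(string path, byte[] pixels, uint width, uint height)
    {
        const ushort entryCount = 8;
        const uint ifdOffset = 8;                                     // IFD right after the 8-byte header
        const uint pixelOffset = ifdOffset + 2 + entryCount * 12 + 4; // count + entries + next-IFD pointer

        using (var bw = new BinaryWriter(File.Create(path)))
        {
            // Header: "II" (little-endian), magic number 42, offset of the first IFD.
            bw.Write((byte)'I'); bw.Write((byte)'I');
            bw.Write((ushort)42);
            bw.Write(ifdOffset);

            // IFD: entry count, entries in ascending tag order, then next-IFD offset (0 = none).
            bw.Write(entryCount);
            WriteEntry(bw, 256, 4, width);                // ImageWidth
            WriteEntry(bw, 257, 4, height);               // ImageLength
            WriteEntry(bw, 258, 3, 8);                    // BitsPerSample
            WriteEntry(bw, 259, 3, 1);                    // Compression = none
            WriteEntry(bw, 262, 3, 1);                    // PhotometricInterpretation = BlackIsZero
            WriteEntry(bw, 273, 4, pixelOffset);          // StripOffsets -> strip written below
            WriteEntry(bw, 278, 4, height);               // RowsPerStrip = whole image in one strip
            WriteEntry(bw, 279, 4, (uint)pixels.Length);  // StripByteCounts
            bw.Write(0u);                                 // no further IFDs

            // Pixel data written last, exactly where StripOffsets says it is.
            bw.Write(pixels);
        }
    }

    // One 12-byte IFD entry: tag, field type (3 = SHORT, 4 = LONG), count 1, inline value.
    static void WriteEntry(BinaryWriter bw, ushort tag, ushort type, uint value)
    {
        bw.Write(tag);
        bw.Write(type);
        bw.Write(1u);
        if (type == 3) { bw.Write((ushort)value); bw.Write((ushort)0); }
        else bw.Write(value);
    }
}

The two-strip idea from the question works on the same principle: as long as every offset actually stored in the file, including the second StripOffsets entry, stays below 2^32, the file remains addressable with 32-bit offsets even if its total size is not.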
Using the latest version of Visio, I created various flowcharts of the processes of a small but complex company (not done yet, but there will be approximately 8 Visio files, each with 3-6 sheets).
I am currently looking for a way to present the final result. My idea is to save those files as a website (VML). The problem, however, is that I want one single file; hence my question: how can I merge those files?
I tried to use my very limited HTML knowledge, but then the site didn't open anymore. I tried "Microsoft Expression Web 4" and just copied two test files in there, but it was not usable. My goal is to have a table of contents on the left side, linked to the actual Visio drawings (think: Visio file 1: sheets 1.1-1.5; Visio file 2: sheets 2.1-2.3, ...).
Thanks a lot for any help (or other ideas), I am going crazy over this!
It would be much easier to merge the drawings in Visio itself, before exporting to HTML. Just open the files side by side and drag the pages from one document to the other. You may need to hold down the Ctrl key to copy the shapes rather than move them.
I have a 96 MB .json file.
It has been filtered to only the content needed
There is no index
Binaries have been created where possible
The file needs to be served all at one time to calculate summary statistics from the start.
The site: https://3milychu.github.io/met-erials/
How could I improve performance and speed, and/or convert the .json file to a compressed file that can be read client-side in JavaScript?
Most visitors will not hang around for the page to load -- I thought that the demo was broken when I first visited the site. A few ideas:
JSON is not a compact data format, as the key names get repeated in every datum. CSV/TSV is much better in that respect, as the headers appear only once, at the top of the file.
On the other hand, repetitive data compresses well, so you could set up your server to compress your JSON data (e.g. using mod_deflate on Apache or the gzip module on nginx) and serve it as a gzipped file that will be decompressed by the user's browser. You can experiment to see which combination of file format and compression works best.
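As a concrete illustration of the nginx side of that suggestion (a minimal sketch; the extra MIME types and size threshold are just reasonable defaults to adjust):

# inside the http or server block of nginx.conf
gzip            on;                          # compress responses on the fly
gzip_types      application/json text/csv;   # in addition to text/html, which is always compressed
gzip_min_length 1024;                        # skip very small responses

Apache's mod_deflate achieves the same thing with AddOutputFilterByType DEFLATE application/json.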
Do the summary stats need to be calculated every single time the page loads? When I've worked with huge datasets in the past, summary data was generated by a daily cron job so users didn't have to wait for the queries to run. From user feedback, and my own experience as a user, summary stats are only of passing interest, and you are likely to lose more users by making them wait for an interface to load than by omitting the summary stats or serving stats that are slightly out of date.
Depending on how your interface / app is structured, it might also make sense to split your massive file into segments for each category / material type, and load the categories on demand, rather than making the user wait for the whole lot to download.
There are numerous other ways to improve the load time and (perceived) performance of the page -- e.g. bundle up your CSS and your JS files and serve them each as a single file; consider using image sprites to reduce the number of separate requests that the page makes; serve your resources compressed wherever possible; move the JS loading out of the document head and to the foot of the HTML page so it isn't blocking the page contents from loading; lazy-load JS libraries as required; etc., etc.
I created a Bill of Materials printout for a manufacturing company using Advanced PDF. One of the requirements is to print the detailed manufacturing process, which is stored in a custom field (long text) on the assembly item record; this is done because each item has its own set of processes. The problem is that only about a third of the manufacturing process appears in the printout: the instructions are normally around 4,000 characters, but the PDF printout contains only around 1,000. Is there a way to resolve this?
You may be experiencing a built-in NetSuite issue.
One possible workaround: if your instructions are consistent, you could pull them from a library of files stored in the file cabinet. Make sure the files are marked "Available Without Login".
Then you'd include them as:
<#include "https://system....." parse=false>
I want to know whether I should have one HTML file per URL (home, register, login, contact; I've got more than 50 of them), or whether I should combine them into, say, 5 files and load them through a query string like ?id=1,2,3,4,5,6, etc.
I want to know which method is more convenient. As I understand it, the second method would have to load the whole file, which would be slower than loading a single page.
On the other hand, having one file per page means more requests to and from the server, and the HTML files as a whole will be heavier, since I have to write a head and include all the assets in each one of them.
In my experience, I make sure that any component with distinct functionality is placed in its own file, and I would consider the examples you listed above (home, register, login, contact, etc.) distinct functionality. On the other hand, if you are managing blog posts (or something similar), I would definitely use GET requests (i.e. ?page=1,2,3).
I have also maintained websites with about 50-100 different pages, but they used a content management system. If you feel overwhelmed, that could also be a possibility to explore.
If you choose not to use a CMS, I would recommend using partial files. A good example of a partial would be a header or footer. By using partials, you no longer need to replicate the same code on multiple pages (say goodbye to creating 50 navbars).