How can I open a big HTML file?

I am using a Python script to compare data, and the differences are written to an HTML file; because there are so many differences, the result file is 158 MB, which I am unable to open in any browser.
How can I open it, or should I convert it to PDF or some other format that I can open?

HTML is for rendering content for consumption. A 158 MB web page is too large; a human cannot be expected to process this amount of information in a single viewing.
Alter your script to restrict the number of displayed differences between files to a more manageable amount, and include a count of the total number of additional differences.
e.g.:
<p>[X] differences identified between [file1] and [file2], the first 5 differences are listed below:</p>
<ul>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
<li>[line]:[difference]</li>
</ul>
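In the Python script itself, the cap might look something like this. This is only a minimal sketch: the file names, the naive line-by-line comparison, and the report wording are placeholders standing in for whatever your real diff logic does.

import html

MAX_SHOWN = 5   # cap on differences rendered in the report

# Hypothetical input files; a simple pairwise comparison stands in for
# the actual comparison logic.
with open("file1.txt") as a, open("file2.txt") as b:
    pairs = list(zip(a.readlines(), b.readlines()))

diffs = [(n, la.rstrip(), lb.rstrip())
         for n, (la, lb) in enumerate(pairs, start=1) if la != lb]

with open("report.html", "w") as out:
    out.write(f"<p>{len(diffs)} differences identified between file1 and "
              f"file2, the first {MAX_SHOWN} are listed below:</p>\n<ul>\n")
    for n, la, lb in diffs[:MAX_SHOWN]:
        out.write(f"<li>{n}: {html.escape(la)} vs {html.escape(lb)}</li>\n")
    out.write("</ul>\n")

However many differences exist, the report stays a few kilobytes: the full count is reported, but only the first handful is rendered.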
If you need to have all the differences available, consider a different file format. A plain text file will allow you to use the various large-file viewing tools (less, more, etc.) available.

Related

Writing TIFF pixel data last and making 8 GB TIFF files

Is there any reason not to write TIFF pixel data last? Typically a simple TIFF file starts with a header that describes endianness and contains the offset to the first IFD, then the pixel data, followed at the end of the file by the IFD and then the extra data that the IFD tags point to. All TIFF files I've seen are written in this order; however, the TIFF standard says nothing about mandating such an order. In fact it says that "Compressed or uncompressed image data can be stored almost anywhere in a TIFF file", and I can't imagine that any TIFF parser would mind the order, since parsers must follow the IFD offset and then follow the StripOffsets (tag 273) tags anyway. I think it makes more sense to put the pixel data last, so that one goes through a TIFF file sequentially without jumping from the top to the bottom and back to the top for no good reason, yet I don't see anyone else doing this, which perplexes me slightly.
Part of the reason I'm asking is that a client of mine is trying to create TIFF files slightly over 4 GB, which doesn't work because the IFD offsets overflow. I'm thinking that even though the TIFF standard claims TIFF files cannot exceed 2^32 bytes, there might be a way to create 8 GB TIFF files that would be accepted by most TIFF parsers: put everything that isn't pixel data first, so that it all has very small offsets, then point to two strips of pixel data, the first strip followed by a second one that starts before offset 2^32 and is itself no larger than 2^32-1. That would give us TIFF files of up to 2*(2^32-1) bytes that are still in theory readable despite being limited to 32-bit offsets and sizes.
Please note that this is a question about the TIFF format, I'm not talking about what any third-party library would accept to write as I wrote my own TIFF writing code, nor am I asking about BigTIFF.
It's OK to write pixel data after the IFD structure and tag data. Many TIFF-writing programs and libraries do that; most notably, libtiff-based software writes image data first. As for writing image data in huge strips, possibly pushing the file size past 4 GB, check with the software/libraries you intend to read the files with: software compiled for 32-bit, or implementation details, might prevent reading such files. I found that, for example, modern libtiff-based software, Photoshop, and tifffile can read such files, while ImageJ, BioFormats, and Paint.NET cannot.
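For illustration, here is a toy sketch in Python of that layout: header, IFD, and offset arrays first, pixel data last in two strips. The dimensions are tiny and every value (little-endian, uncompressed 8-bit grayscale, the output file name) is assumed purely for the example; the same arrangement is what would let a second strip begin just below the 2^32 boundary in the 8 GB case.

import struct

width, height = 4, 4
rows_per_strip = 2
strip_size = width * rows_per_strip      # bytes per strip at 8 bits/pixel

def entry(tag, typ, count, value):
    # One 12-byte IFD entry: tag, type, count, value-or-offset.
    return struct.pack("<HHII", tag, typ, count, value)

num_entries = 8
ifd_offset = 8                            # IFD immediately after the header
ifd_size = 2 + num_entries * 12 + 4
arrays_offset = ifd_offset + ifd_size     # StripOffsets/StripByteCounts arrays
pixel_offset = arrays_offset + 2 * 8      # two LONG[2] arrays precede pixels

with open("two_strips.tif", "wb") as f:
    f.write(struct.pack("<2sHI", b"II", 42, ifd_offset))  # TIFF header
    f.write(struct.pack("<H", num_entries))
    f.write(entry(256, 3, 1, width))                 # ImageWidth
    f.write(entry(257, 3, 1, height))                # ImageLength
    f.write(entry(258, 3, 1, 8))                     # BitsPerSample
    f.write(entry(259, 3, 1, 1))                     # Compression: none
    f.write(entry(262, 3, 1, 1))                     # Photometric: BlackIsZero
    f.write(entry(273, 4, 2, arrays_offset))         # StripOffsets -> array
    f.write(entry(278, 3, 1, rows_per_strip))        # RowsPerStrip
    f.write(entry(279, 4, 2, arrays_offset + 8))     # StripByteCounts -> array
    f.write(struct.pack("<I", 0))                    # no next IFD
    f.write(struct.pack("<2I", pixel_offset, pixel_offset + strip_size))
    f.write(struct.pack("<2I", strip_size, strip_size))
    f.write(bytes(range(strip_size)) * 2)            # pixel data written last

Since every offset except the strip offsets points into the small region at the start of the file, only the StripOffsets array values grow large as the pixel data grows.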

d3.json() to load large file

I have a 96 MB .json file
It has been filtered to only the content needed
There is no index
Binaries have been created where possible
The file needs to be served all at one time to calculate summary statistics from the start.
The site: https://3milychu.github.io/met-erials/
How could I improve performance and speed and/or convert the .json file to a compressed file that can be read client-side in javascript?
Most visitors will not hang around for the page to load -- I thought that the demo was broken when I first visited the site. A few ideas:
JSON is not a compact data format as the tag names get repeated in every datum. CSV/TSV is much better in that respect as the headers only appear once, at the top of the file.
On the other hand, repetitive data compresses well, so you could set up your server to compress your JSON data (e.g. using mod_deflate on Apache or compression on nginx) and serve it as a gzipped file that will be decompressed by the user's browser. You can experiment to see what combination of file formats and compression works best.
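A quick way to run that experiment locally, before touching server config, is a sketch like the following. The records and field names here are made up; substitute your own data.

import csv, gzip, io, json

# Hypothetical sample data: the same records serialized as JSON and as CSV,
# measured raw and gzip-compressed.
records = [{"id": i, "material": "bronze", "year": 1900 + i % 100}
           for i in range(10000)]

as_json = json.dumps(records).encode()

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "material", "year"])
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue().encode()

for label, blob in [("json", as_json), ("csv", as_csv)]:
    print(f"{label}: {len(blob):>8} bytes raw, "
          f"{len(gzip.compress(blob)):>8} bytes gzipped")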
Do the summary stats need to be calculated every single time the page loads? When I worked with huge datasets in the past, summary data was generated by a daily cron job so users didn't have to wait for the queries to be performed. From user feedback, and my own experience as a user, summary stats are only of passing interest, and you are likely to lose more users by making them wait for an interface to load than through not providing summary stats, or through sending stats that are very slightly out of date.
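That offline job can be very small. A sketch, assuming a list-of-objects JSON file and a made-up "year" field, run from a daily cron entry:

import json, statistics

# Precompute summary stats offline and write a tiny summary.json for the
# page to fetch, instead of computing them client-side on every load.
with open("data.json") as f:
    records = json.load(f)

years = [r["year"] for r in records if isinstance(r.get("year"), int)]
summary = {
    "record_count": len(records),
    "median_year": statistics.median(years) if years else None,
}

with open("summary.json", "w") as f:
    json.dump(summary, f)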
Depending on how your interface / app is structured, it might also make sense to split your massive file into segments for each category / material type, and load the categories on demand, rather than making the user wait for the whole lot to download.
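Splitting can also be done offline in a few lines. Here "material" is a guessed category field, not necessarily what your dataset uses:

import json
from collections import defaultdict

# Break one big file into per-category files the page can fetch on demand.
with open("data.json") as f:
    records = json.load(f)

groups = defaultdict(list)
for record in records:
    groups[record.get("material", "other")].append(record)

for name, group in groups.items():
    with open(f"data-{name}.json", "w") as f:   # e.g. data-bronze.json
        json.dump(group, f)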
There are numerous other ways to improve the load time and (perceived) performance of the page -- e.g. bundle up your CSS and your JS files and serve them each as a single file; consider using image sprites to reduce the number of separate requests that the page makes; serve your resources compressed wherever possible; move the JS loading out of the document head and to the foot of the HTML page so it isn't blocking the page contents from loading; lazy-load JS libraries as required; etc., etc.

How can I check whether a PDF already exists, or is 80% the same, in MySQL?

How can I check whether a PDF already exists, or is 80% the same as an existing one, in MySQL?
Users want to upload PDFs, but the problem is re-uploads of the same document.
My idea is to convert the PDF to binary,
so I would have a string "X" (the binary of that PDF) to save in MySQL,
and then SELECT with LIKE on a slice of it (from 1/3 of length(X) to 2/3 of length(X)).
Would that work?
I'm using Laravel.
Thanks for reading.
This cannot reasonably be done in MySQL. Since you are also using a PHP environment, it may be possible to do it in PHP, but a general solution will require substantial effort.
PDF files are composed of (possibly compressed) streams of images and text. Several libraries can attempt to extract the text, and they work reasonably well if the PDF was generated in a straightforward way; however, they will typically fail if some text was rendered as images of its characters, or if other obfuscation has been applied. In those cases, you will need to use OCR to recover the text as it appears when the PDF is displayed. Note also that tables and images are out of scope for these tools.
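One such library is pdfminer.six (this is just one option among several, and the file name is a placeholder):

# Assumes: pip install pdfminer.six
from pdfminer.high_level import extract_text

text = extract_text("uploaded.pdf")
print(text[:500])   # empty or near-empty output usually means the text
                    # was rendered as images, and OCR is needed instead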
Once you have two text files, finding overlaps becomes much easier, and there are several techniques. "Same 80%" can be interpreted in several ways, but let us assume that copying a contiguous 79% of the text from a file and saving it again should not trigger alarms, while copying 81% of that same text should. Any diff tool can provide information on duplicate chunks, and may be enough for your purposes. A more sophisticated approach, which however does not provide exact percentages, is to use the normalized compression distance.
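The normalized compression distance is simple to sketch with a standard compressor; the file names below are placeholders, and the alarm threshold is something you would have to tune empirically against known duplicates:

import zlib

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: near 0.0 for near-identical inputs,
    # approaching 1.0 for unrelated ones.
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Compare the extracted text of the new upload against each stored document
# and flag pairs whose distance falls below your tuned threshold.
a = open("doc_a.txt", "rb").read()
b = open("doc_b.txt", "rb").read()
print(f"NCD: {ncd(a, b):.3f}")

Note that zlib's 32 KB window limits how well this works on very long texts; a stronger compressor gives a better distance at the cost of speed.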

Long text is cut when printed using NetSuite advanced pdf

I created a Bill of Materials printout for a manufacturing company using advanced PDF. One of the requirements is to print the detailed manufacturing process, which is stored in a custom field (long text) on the assembly item record; this is done because each item has a different set of processes. The problem is that only a third of the manufacturing process appears in the printout: the instructions normally run around 4k characters, but the PDF printout contains only around 1k characters. Is there a way to resolve this?
You may be running into a built-in NetSuite limitation.
One possible workaround, if your instructions are consistent, is to pull them from a library of files stored in the NetSuite File Cabinet. Make sure the files are "available without login".
Then you'd include them as:
<#include "https://system....." parse=false>

Large text file manipulation

Seeking help with manipulating large text files. I'm not a programmer and apparently have some serious trouble with regex. The text files in question are the output of a tool that logs an enormous amount of information about the server it's running on. The produced output needs to be adjusted to meet defined requirements. Each text file is between 4 and 10 MB in size. The file is broken into sections that look like this:
http://imageshack.us/photo/my-images/853/79515724.jpg/
What I need is to somehow remove some of the sections while leaving SOME of them intact.
I have around 200 of those files.
Thank you.