Rails plugin to clean up (reformat) html output? - html

Is there any plugin (gem) that can clean up and reformat a page after it is rendered? By cleaning I mean removing unnecessary newlines and whitespace.

Apologies if this is too orthogonal an answer: you should consider just making sure gzip compression is enabled. This makes it easier to view your source pages for debugging, requires less fiddling, and is a bigger win than simply removing unnecessary whitespace. If you have Apache as the front end, you could use mod_deflate (e.g., How do I gzip webpage output with Rails?), and other servers have similar gzip support. Most modern browsers support gzip, so you'll get the biggest bang for your buck.
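To put rough numbers on that, a quick comparison is easy to script. A minimal, illustrative Python sketch (the sample HTML is fabricated; run it against one of your own rendered pages for meaningful numbers):

    import gzip
    import re

    # Fabricated sample page; substitute the HTML your app actually renders.
    html = "<html><body>\n" + "    <p>Some   rendered   content</p>\n" * 500 + "</body></html>\n"

    stripped = re.sub(r"\s+", " ", html)           # whitespace removal only
    gzipped = gzip.compress(html.encode("utf-8"))  # what the server sends with gzip enabled

    print(len(html), len(stripped), len(gzipped))
    # On repetitive markup like this, gzip alone typically saves far more bytes
    # than whitespace stripping alone.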

Perhaps you are looking for http://www.railslodge.com/plugins/455-rails-tidy. Jason's point about ensuring gzip is enabled is very important as well.

Related

Best way to minify html in an asp.net mvc 5 application

I came across several articles on this topic, but most of them are outdated. So what is the best way to minify/get rid of the whitespace when outputting my views' HTML?
I built a very trivial minifier called RazorHtmlMinifier.Mvc5.
It operates at compile time, when the .cshtml Razor files are converted to C# classes, so it has no performance overhead at runtime.
The minification is very trivial, basically just replacing multiple spaces with one (because sometimes a space is still significant, e.g. <span>Hello</span> <span>World</span> is different from <span>Hello</span><span>World</span>).
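The rule itself is easy to illustrate outside of Razor. A tiny Python sketch of the same whitespace-collapsing idea (not the library's actual code, which runs during the Razor-to-C# conversion):

    import re

    def collapse_whitespace(html: str) -> str:
        # Collapse any run of whitespace (spaces, tabs, newlines) into a single space,
        # so "<span>Hello</span> <span>World</span>" keeps its significant space.
        return re.sub(r"\s+", " ", html)

    print(collapse_whitespace("<span>Hello</span>\n    <span>World</span>"))
    # -> "<span>Hello</span> <span>World</span>"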
The source code is very recent and very simple (just one file with fewer than 100 lines of code), and installation involves just a NuGet package and changing one line in the Web.config file.
And all of this is built for the latest version of ASP.NET MVC 5.
Usually it's recommended to use gzip encoding to compress HTTP responses, but I found that if you minify the HTML before gzipping, you still get around 11% smaller responses on average. In my opinion, it's still worth it.
Use WebMarkupMin: ASP.NET 4.X MVC. Install the NuGet package and then apply the MinifyHtmlAttribute to your action method or controller, or register it in RegisterGlobalFilters in FilterConfig. You can also try the CompressContentAttribute. Here's the wiki: https://github.com/Taritsyn/WebMarkupMin/wiki/WebMarkupMin:-ASP.NET-4.X-MVC
If you use CompressContentAttribute, you'll see a Content-Encoding: deflate header rather than the Content-Encoding: gzip header you would have seen if gzip was enabled before applying this attribute.
Some test numbers:
- No minification or compression: Content-Length: 21594
- Minification only: Content-Length: 19869
- Minification and compression: Content-Length: 15539
You'll have to test to see if you're getting speed improvements overall from your changes.
EDIT:
After exhaustive testing locally and on the live site, I've concluded that minifying and compressing HTML with WebMarkupMin in my case slowed the page load time by about 10%. Just compressing (using CompressContentAttribute) or just minifying also slowed it down. So I've decided not to compress or minify my HTML at all.

Performance of wkhtmltopdf

We are intending to use wkhtmltopdf to convert HTML to PDF, but we are concerned about the scalability of wkhtmltopdf. Does anyone have any idea how it scales? Our web app could potentially attempt to convert hundreds of thousands of (relatively complex) HTML documents, so it's important for us to have some idea. Has anyone got any information on this?
First of all, your question is quite general; there are many variables to consider when asking about the scalability of any project. Obviously there is a difference between converting "hundreds of thousands" of HTML files over a week and expecting to do that in a day, or an hour. On top of that, "relatively complex" HTML can mean different things to different people.
That being said, since I have done something similar, converting approximately 450,000 HTML files with wkhtmltopdf, I figured I'd share my experience.
Here was my scenario:
- 450,000 HTML files
  - 95% of the files were one page in length
  - generally containing 2 images (relative paths, local file system)
  - tabular data (sometimes with nested tables)
  - simple markup elsewhere (strong, italic, underline, etc.)
- A spare desktop PC
  - 8GB RAM
  - 2.4GHz dual-core processor
  - 7200RPM HD
I used a simple single-threaded PHP script to iterate over the folders and pass each HTML file path to wkhtmltopdf. The process took about 2.5 days to convert all the files, with very few errors.
I hope this gives you insight into what you can expect from using wkhtmltopdf in your web application. Some obvious improvements would come from running this on better hardware, but mainly from using a multi-threaded application to process files simultaneously (a sketch of that approach follows below).
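For what it's worth, the multi-threaded version is only a few lines in most languages. This is not the original PHP script, just an illustrative Python sketch; it assumes wkhtmltopdf is on the PATH and that the directory names are adapted to your layout:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor
    from pathlib import Path

    INPUT_DIR = Path("html")   # assumed layout: one .html file per document
    OUTPUT_DIR = Path("pdf")
    OUTPUT_DIR.mkdir(exist_ok=True)

    def convert(html_path: Path) -> int:
        pdf_path = OUTPUT_DIR / (html_path.stem + ".pdf")
        # Each call spawns a separate wkhtmltopdf process, so running several
        # at once simply keeps more cores busy.
        return subprocess.call(["wkhtmltopdf", "--quiet", str(html_path), str(pdf_path)])

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(convert, INPUT_DIR.glob("*.html")))

    print(f"{results.count(0)} of {len(results)} files converted successfully")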
In my experience, performance depends a lot on your images. If there are lots of large images, it can slow down significantly. If at all possible, I would try to stage a test with an estimate of what the load on your servers would be. Some people do use it for intensive operations, but I have never heard of hundreds of thousands. I guess, like everything, it depends on your content and resources.
The following quote is straight off the wkhtmltopdf mailing list:
I'm using wkHtmlToPDF to convert about 6000 e-mails a day to PDF. It's all done on a quad-core server with 4GB memory... it's even more than enough for that.
There are a few performance tips, but I would suggest finding out where your bottlenecks are before optimizing. For instance, I remember someone saying that, if possible, loading images directly from disk instead of going through a web server can speed things up considerably.
Edit:
Adding to this, I just had some fun playing with wkhtmltopdf. On an Intel Centrino 2 with 4GB of memory, generating a PDF with 57 pages of content (mixed p, ul, table), ~100 images and a table of contents consistently takes < 7 seconds. I'm also running Visual Studio, a browser, an HTTP server and various other software that might slow it down. I use stdin and stdout directly instead of files.
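If you want to avoid temporary files the same way, wkhtmltopdf accepts "-" for both the input and the output path. A minimal Python sketch (the HTML string is just a placeholder):

    import subprocess

    html = b"<html><body><h1>Hello</h1><p>Rendered entirely in memory.</p></body></html>"

    # "-" as input reads the HTML from stdin; "-" as output writes the PDF to stdout,
    # so no temporary files are involved on either side.
    proc = subprocess.run(
        ["wkhtmltopdf", "--quiet", "-", "-"],
        input=html,
        stdout=subprocess.PIPE,
        check=True,
    )

    with open("out.pdf", "wb") as f:
        f.write(proc.stdout)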
Edit:
I have not tried this, but if you have linked CSS, try embedding it in the HTML file (remember to do a before and after test to see the effect properly!). The improvement here most likely depends on things like caching and where the CSS is served from - if it's read from disk every time or, god forbid, regenerated from SCSS, it could be pretty slow, but if the result is cached by the web server (I don't think wkhtmltopdf caches anything between instances) it might not have a big effect. YMMV.
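If you do want to test that, the embedding step can be a small preprocessing pass before calling wkhtmltopdf. A rough Python sketch, assuming a single local stylesheet referenced by a simple <link> tag (a real implementation should parse the HTML properly instead of using a regex):

    import re
    from pathlib import Path

    def inline_local_css(html: str, base_dir: Path) -> str:
        # Replace <link rel="stylesheet" href="..."> tags that point at local files
        # with an inline <style> block so wkhtmltopdf does not fetch them separately.
        def replace(match):
            css_path = base_dir / match.group(1)
            if css_path.is_file():
                return "<style>\n" + css_path.read_text() + "\n</style>"
            return match.group(0)  # leave remote or missing stylesheets untouched

        return re.sub(r'<link[^>]*rel="stylesheet"[^>]*href="([^"]+)"[^>]*>', replace, html)

    page_dir = Path("page")  # hypothetical directory containing index.html and style.css
    html = inline_local_css((page_dir / "index.html").read_text(), page_dir)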
We have tried to use wkhtmltopdf in several implementations. My documents are huge tables of generated coordinate points; a typical PDF is about 500 pages.
We tried the .NET ports of wkhtmltopdf. The results were:
- Pechkin - Pro: no separate application needed. Con: slow; 500 pages took about 5 minutes to generate.
- PdfCodaxy - Cons only: slow (slower than pure wkhtmltopdf), requires wkhtmltopdf to be installed, and has problems with non-Unicode text.
- NReco - Cons only: slow (slower than pure wkhtmltopdf), requires wkhtmltopdf to be installed, and (for me) did not release its native libraries correctly after use.
We also tried the wkhtmltopdf binary invoked directly from C# code.
Pro: easy to use, faster than the libraries.
Con: needs temporary files (you cannot use Stream objects), and it breaks on very large (100MB+) HTML files, just like the other libraries.
wkhtmltopdf --print-media-type is blazing fast, but you lose your normal screen CSS styling with it (print media styles are applied instead).
This may NOT be an ideal solution for exporting complex HTML pages, but it worked for me because my HTML content is pretty simple and tabular.
Tested on wkhtmltopdf version 0.12.2.1.
You can create your own pool of wkhtmltopdf engines. I did it for a simple use case by invoking the API directly instead of starting a wkhtmltopdf.exe process every time. The wkhtmltopdf API is not thread-safe, so it's not easy to do. Also, you should not forget about the implications of sharing native code between AppDomains.

What is the most efficient way of encoding binary data over HTTP POST

I'm working on a project where I'll be sending lots of binary data (several images in one message) over HTTP POST to a RESTful interface.
I've looked into alternatives such as JSON, protobuf and Thrift, but found no conclusive comparisons of the overhead introduced by these formats. Which one would you prefer to use in this case?
If you really need to do all of that as part of a single HTTP POST, then I would first be more concerned about reliability and functionality. Efficiency is going to be relative to what you are sending. If it is images in an already compressed format/container, then it is very likely you are not going to see a meaningful difference in efficiency without sacrificing something else. So in my opinion, the most effective thing to look into would be MIME encoding of your content in the POST, which means encoding the binaries using Base64. This has the benefit that almost any development platform these days either has this functionality built in or makes it easily available in external libraries. Sticking with widely used standards like these makes it easy to support a wide user base. Some links for reference, with a small sketch of the Base64 approach after them:
http://en.wikipedia.org/wiki/MIME
http://en.wikipedia.org/wiki/Base64
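As a concrete illustration of that suggestion, here is a small Python sketch that Base64-encodes two image files and POSTs them in a single JSON body. The URL and field names are hypothetical; adapt them to your REST interface:

    import base64
    import json
    import urllib.request

    URL = "https://example.com/api/images"      # hypothetical endpoint
    paths = ["photo1.jpg", "photo2.jpg"]        # hypothetical local files

    payload = {
        "images": [
            {
                "filename": p,
                # Base64 turns raw bytes into ASCII text that is safe to embed in JSON,
                # at the cost of roughly 33% size overhead before any compression.
                "data": base64.b64encode(open(p, "rb").read()).decode("ascii"),
            }
            for p in paths
        ]
    }

    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.status)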

any FAST tex to html program?

(I'm using Debian squeeze.)
I tried catdvi, but it's unacceptable - just a lot of '?'s.
Now I am using tex4ht, but it's awfully slow.
For example, generating HTML for this:
takes ~2 seconds (that's 4+ times slower than generating the image!).
Is there something wrong with my config, or is tex4ht really that slow?
(I doubt there's something wrong with my config.) Are there any other (FAST), reliable tex2html converters?
As already suggested, if you want equations in a web page, MathJax will process TeX math code into proper math display.
What about latex2html? It seems to be the only hit on Google that provides this kind of functionality. Keep in mind that LaTeX is inherently slow, and it may be better to rely on something MathML- or MathJax-related. I have not tested the above for performance.
On Debian squeeze, just do
apt-get install latex2html

When did it become OK to use UTF-8?

When did people start using UTF-8 as a file encoding and in HTTP Content-Type headers? Since all web servers, OSes, text editors and browsers support it today, when did it become "compatible" between all of these?
UTF-8 has always been backwards compatible with ASCII.
So basically, it's been OK to use UTF-8 since it's been OK to use ASCII (which is quite a long time).
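That backward compatibility is easy to verify; a tiny Python check, for illustration:

    text = "plain ASCII"
    # Every ASCII character encodes to exactly the same single byte in UTF-8,
    # and any valid ASCII byte sequence is also valid UTF-8.
    assert text.encode("ascii") == text.encode("utf-8")
    assert b"plain ASCII".decode("utf-8") == "plain ASCII"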