I came across several articles about this topic, but most of them are outdated. So what is the best way to minify (i.e. get rid of the whitespace in) my views' HTML output?
I built a very trivial minifier called RazorHtmlMinifier.Mvc5.
It operates at compile time, when the cshtml Razor files are converted to C# classes, so it has no performance overhead at runtime.
The minification is very trivial, basically just replacing multiple whitespace characters with one (because sometimes a space is still significant, e.g. <span>Hello</span> <span>World</span> is different from <span>Hello</span><span>World</span>).
The source code is very recent and very simple (just one file with less than 100 lines of code) and installation involves just a NuGet package and changing one line in Web.config file.
And all of this is built for the latest version of ASP.NET MVC 5.
Usually, it's recommended to gzip-compress HTTP responses, but I found that if you minify the HTML before gzipping, you still get responses around 11% smaller on average. In my opinion, it's still worth it.
Use WebMarkupMin: ASP.NET 4.X MVC. Install the NuGet package and then use the MinifyHtmlAttribute on your action method, controller, or register it in RegisterGlobalFilters in FilterConfig. You can also try the CompressContentAttribute. Here's the wiki: https://github.com/Taritsyn/WebMarkupMin/wiki/WebMarkupMin:-ASP.NET-4.X-MVC
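For example, a minimal global registration might look like the following sketch (the WebMarkupMin.AspNet4.Mvc namespace is my assumption based on the package name; check the wiki above for the exact setup):

using System.Web.Mvc;
using WebMarkupMin.AspNet4.Mvc;

public class FilterConfig
{
    public static void RegisterGlobalFilters(GlobalFilterCollection filters)
    {
        filters.Add(new HandleErrorAttribute());

        // Minify the HTML of every response. Alternatively, apply the
        // attribute per-controller or per-action for finer control.
        filters.Add(new MinifyHtmlAttribute());

        // Optionally also compress responses.
        filters.Add(new CompressContentAttribute());
    }
}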
If you use the CompressContentAttribute, you'll see a Content-Encoding: deflate header rather than the Content-Encoding: gzip header you'd see if you were using gzip before applying this attribute.
Some test numbers:
- No minification or compression: Content-Length: 21594
- Minification only: Content-Length: 19869
- Minification and compression: Content-Length: 15539
You'll have to test to see if you're getting speed improvements overall from your changes.
EDIT:
After exhaustive testing locally and on the live site, I've concluded that minifying and compressing HTML with WebMarkupMin in my case slowed the page load time by about 10%. Just compressing (using CompressContentAttribute) or just minifying also slowed it down. So I've decided not to compress (using CompressContentAttribute) or minify my HTML at all.
Related
I'm thinking about using the jQuery Ajax load method. In some cases, the HTML I want to load is quite large. I'm wondering if the browser already streamlines the process behind the scenes, or should I minify and/or compress the HTML before calling .load() from jQuery? If so, which one? Or both? Is there a standard way to perform minification and/or compression in this scenario?
UPDATE
Does this make any sense:
The data I'm going to retrieve from the server is static. Let's say I have data for apples, oranges, kumquats, and papayas, and none of it changes "on the fly" (only when I update the site).
So is it preferable that I get the data as JSON via jQuery this way:
$.getJSON('kumquats')
(...and then, of course, process the results that come back)... OR ...simply send back the HTML with no need for massaging, as "kumquats" will always send back the exact same HTML, "oranges" will always be the same HTML, etc.
In the latter option, then, I would do something like this (jQuery pseudocode) instead:
$('#MainContent').load('/Content/Kumquat.htm');
In summation, I can send all the HTML fully-formed across the wire, and clog up the pipes with some extra bits for a bit, OR I can send a less verbose representation of the data (JSON), and then massage it in the .getJSON() callback function, transforming it into HTML. Performance-wise, does it make much difference? BTW, this is not "sensitive" data - it doesn't matter who sees it as it zips by through the ether.
I'm wondering if the browser already streamlines the process behind the scenes
The browser can't control how much data the server sends in its response.
or should I minify and/or compress the html before calling .load() from jQuery?
You call load on the client. The server has to do any minification or compression of the HTML.
Is there a standard way to perform minification and/or compressing in this scenario?
Compression is usually handled by gzip encoding. How you set that up depends on your server and/or the server side programming language that is generating the content.
I'm not aware of any standard way to perform minification. I used HTML Tidy to do that once.
The browser can't minify HTML before downloading it first. The only reason to minify is to reduce download time by decreasing the file size of the download, so minifying on the client is counterproductive.
Your server needs to minify and/or compress. It is probably already compressing by default (mod_deflate on Apache, for example). Minification of the HTML can be done in a variety of ways depending upon the server-side technology you are using. There may be a library for it, or you could use a third-party CDN to minify and serve the content for you.
We are intending to use wkhtmltopdf to convert HTML to PDF, but we are concerned about its scalability. Does anyone have any idea how it scales? Our web app could potentially attempt to convert hundreds of thousands of (relatively complex) HTML documents, so it's important for us to have some idea. Has anyone got any information on this?
First of all, your question is quite general; there are many variables to consider when asking about the scalability of any project. Obviously there is a difference between converting "hundreds of thousands" of HTML files over a week and expecting to do that in a day, or an hour. On top of that, "relatively complex" HTML can mean different things to different people.
That being said, since I have done something similar (converting approximately 450,000 HTML files using wkhtmltopdf), I'll share my experience.
Here was my scenario:
- 450,000 HTML files
  - 95% of the files were one page in length
  - generally containing 2 images (relative paths, local filesystem)
  - tabular data (sometimes containing nested tables)
  - simple markup elsewhere (strong, italic, underline, etc.)
- A spare desktop PC
  - 8 GB RAM
  - 2.4 GHz dual-core processor
  - 7200 RPM hard drive
I used a simple single-threaded script written in PHP to iterate over the folders and pass each HTML file path to wkhtmltopdf. The process took about 2.5 days to convert all the files, with very minimal errors.
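For illustration, the same single-threaded loop might look like this in C# rather than the poster's PHP (the input folder and output naming here are assumptions):

using System.Diagnostics;
using System.IO;

class BatchConvert
{
    static void Main()
    {
        // Walk every HTML file under the input folder and convert it one at a time.
        foreach (string htmlPath in Directory.EnumerateFiles(@"C:\input", "*.html", SearchOption.AllDirectories))
        {
            // Write the PDF next to the source file, swapping the extension.
            string pdfPath = Path.ChangeExtension(htmlPath, ".pdf");
            using (var process = Process.Start("wkhtmltopdf", $"\"{htmlPath}\" \"{pdfPath}\""))
            {
                process.WaitForExit(); // single-threaded: one conversion at a time
            }
        }
    }
}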
I hope this gives you insight into what you can expect from utilizing wkhtmltopdf in your web application. Some obvious improvements would come from running this on better hardware, but mainly from using a multi-threaded application to process files simultaneously.
In my experience, performance depends a lot on your pictures. If there are lots of large pictures, it can slow down significantly. If at all possible, I would try to stage a test with an estimate of what the load would be for your servers. Some people do use it for intensive operations, but I have never heard of hundreds of thousands. I guess, like everything, it depends on your content and resources.
The following quote is straight off the wkhtmltopdf mailing list:
I'm using wkHtmlToPDF to convert about 6000 e-mails a day to PDF. It's all done on a quad-core server with 4GB memory... it's even more than enough for that.
There are a few performance tips, but I would suggest finding out what your bottlenecks are before optimizing for performance. For instance, I remember someone saying that, if possible, loading images directly from disk instead of having a web server in between can speed it up considerably.
Edit:
Adding to this, I just had some fun playing with wkhtmltopdf. Currently, on an Intel Centrino 2 with 4 GB of memory, generating a PDF with 57 pages of content (mixed p, ul, and table elements), ~100 images, and a TOC consistently takes under 7 seconds. I'm also running Visual Studio, a browser, an HTTP server, and various other software that might slow it down. I use stdin and stdout directly instead of files.
Edit:
I have not tried this, but if you have linked CSS, try embedding it in the HTML file (remember to do a before-and-after test to see the effects properly!). The improvement here most likely depends on things like caching and where the CSS is served from: if it's read from disk every time, or god forbid regenerated from SCSS, it could be pretty slow, but if the result is cached by the webserver (I don't think wkhtmltopdf caches anything between instances) it might not have a big effect. YMMV.
We have tried to use wkhtmltopdf in several implementations. My documents are huge tables of generated coordinate points; a typical PDF runs about 500 pages.
We tried various .NET ports of wkhtmltopdf. The results:
- Pechkin - Pro: no external application needed. Con: slow; 500 pages took about 5 minutes to generate.
- PdfCodaxy - only cons: slow (slower than the plain wkhtmltopdf binary), requires wkhtmltopdf to be installed, and has problems with non-Unicode text.
- NReco - only cons: slow (slower than the plain wkhtmltopdf binary), requires wkhtmltopdf to be installed, and (for me) failed to unlock its native libraries after use.
We also tried invoking the wkhtmltopdf binary directly from C# code.
Pro: easy to use, faster than the libraries.
Con: needs temporary files (you cannot use Stream objects), and it breaks on very large (100 MB+) HTML files, just like the other libraries.
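A minimal sketch of that binary-invocation approach, assuming wkhtmltopdf is on the PATH (the error handling here is illustrative, not the poster's exact code):

using System;
using System.Diagnostics;

class WkHtmlToPdfRunner
{
    // Converts an HTML file to a PDF by shelling out to the wkhtmltopdf binary.
    public static void Convert(string htmlPath, string pdfPath)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "wkhtmltopdf",
            Arguments = $"\"{htmlPath}\" \"{pdfPath}\"",
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(psi))
        {
            string stderr = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    $"wkhtmltopdf failed ({process.ExitCode}): {stderr}");
        }
    }
}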
wkhtmltopdf --print-media-type is blazing fast, but you lose normal (screen) CSS styling with it.
This may NOT be an ideal solution for exporting complex HTML pages, but it worked for me because my HTML content is pretty simple and tabular.
Tested on wkhtmltopdf version 0.12.2.1.
You can create your own pool of wkhtmltopdf engines. I did it for a simple use case by invoking the API directly instead of starting the wkhtmltopdf.exe process every time. The wkhtmltopdf API is not thread-safe, so it's not easy to do. Also, you should not forget about sharing native code between AppDomains.
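To illustrate the pooling idea in a simpler form (bounding concurrent conversions around the external binary, rather than the in-process API the poster describes), a sketch along these lines is possible; the concurrency limit is an arbitrary assumption, and WkHtmlToPdfRunner refers to the sketch above:

using System.Threading;
using System.Threading.Tasks;

class PdfConversionPool
{
    // Allow at most N wkhtmltopdf processes to run at once.
    private readonly SemaphoreSlim _slots;

    public PdfConversionPool(int maxConcurrent = 4) // 4 is a guess; tune for your hardware
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    public async Task ConvertAsync(string htmlPath, string pdfPath)
    {
        await _slots.WaitAsync();
        try
        {
            // Each conversion runs in its own process, sidestepping the
            // thread-safety issues of the in-process wkhtmltopdf API.
            await Task.Run(() => WkHtmlToPdfRunner.Convert(htmlPath, pdfPath));
        }
        finally
        {
            _slots.Release();
        }
    }
}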
Some friends and I have been working on a set of scripts that make it easier to do work on the machines at uni. One of these tools currently uses Nokogiri, but in order for these tools to run on all machines with as little setup as possible, we've been trying to find a 'native' HTML parser, instead of requiring users to install RVM and custom gems (due to disk space limitations for most users).
Are we pretty much restricted to Nokogiri/Hpricot? Should we look at just writing our own custom parser that fits our needs?
Cheers.
EDIT: If there's posts on here that I've missed in my searches, let me know! S.O. is sometimes just too large to find things effectively...
- There is no HTML parser in the Ruby stdlib.
- HTML parsers have to be more forgiving of bad markup than XML parsers.
- You could run the HTML through Tidy (http://tidy.sourceforge.net) to clean it up and produce valid markup; the result can then be read via REXML :-) which is in the stdlib.
- REXML is much slower than Nokogiri (last checked in 2009), though Sam Ruby had been working on making REXML faster.
- A better way would be to have a better deployment: take a look at http://gembundler.com/bundle_package.html and use Capistrano (or some such) to provision servers.
Is it possible to reuse HTML snippets (headers and footers, for example) across multiple files? Placing them in separate files adds an extra HTTP request, which I'd like to avoid.
I don't want to replicate minor changes in headers and footers across every html file every time a change request comes along.
HTML is not a programming language - it's a markup language. You don't do object-oriented HTML because it isn't object-based. This is the whole purpose of a server-side language: it lets you create include files and use them in your server-side application.
If you have Apache however, you can use server-side includes which don't require a programming language such as PHP, but it's less flexible:
<!--#include virtual="/footer.html" -->
First, HTML isn't even a programming language, so it's impossible to have "Object-oriented" HTML.
Placing them in separate files adds an extra HTTP request, that I'd like to avoid.
If this is the reason for your "without server-side code" requirement, then you are mistaken - the client does not fetch the templates that make up a page separately; the server-side code will return a single HTML page to the client.
If, on the other hand, you don't have the option to run any server-side code at all and have to make do with static HTML pages, then there are only two options I can think of: iframes (which do result in separate HTTP requests, of course), or some sort of tool that basically runs the equivalent of server-side code to embed your reused templates everywhere and spits out the result to be uploaded to the server. You can get this effect by running a PHP/Apache-with-SSI/JSP/whatever server on your development machine and using wget to make a static snapshot of the pages.
What I want to do is this. The files can be scattered during development. But when I'm ready to release, a toolkit should compile the included files into a single HTML file.
You can use a template language/engine, such as jinja2.
You can lay out files in a certain hierarchy, have templates inherit from other templates, include other templates, and define reusable macros (the closest thing to what you referred to as "reusable tags").
What I want to do is this. The files can be scattered during development. But when I'm ready to release, a toolkit should compile the included files into a single HTML file.
I know this is late, but CodeKit's .kit language lets you do exactly what you were saying.
http://incident57.com/codekit/help.php
I think the language you've chosen in your question (object oriented HTML) is actually masking the real issue you have here...
What I want to do is this. The files can be scattered during development. But when I'm ready to release, a toolkit should compile the included files into a single HTML file.
This sounds like a job for a preprocessor. I don't believe it has anything to do with your web server or server-side technology, as this is a step that would happen before deployment.
There are a number of text preprocessors available, e.g. M4; hell, you could even use the C preprocessor if you wanted. A quick Google search reveals that there are specialised preprocessors for HTML as well...
Automatic file inclusion, automatic escaping, and whatnot that can be done with automatically inserted headers and footers, chosen based on path patterns.
Seems to fit the bill?
Sure. But these would have to be separate Ajax calls from the client. There are a lot of JavaScript MVC frameworks that do that.
If you want to have include files during development, then compile them into free-standing HTML files, you could do that by spidering your development server with wget: whatever server-side technology you use will combine the files and return the HTML, which wget will save as one file.
Everything is ultimately an object in the underlying technology, but you interact with those objects indirectly rather than directly, at different levels, depending on the security implementation. You can do this.
I just released a mature framework called Hypertag that is, in fact, Object Oriented HTML. It is entirely client-side, in continuous development, and allows for very interesting, yet HTML-compatible, advanced solutions for logic and layout.
See http://hypertag.io for more.
Being lazy (and liking DRY code), I'm the kind of guy who's going to write a few little wrappers for recurring HTML markup. Those provided by Rails are good already, but sometimes I have something a little more specific that I know I'm going to repeat over and over.
In some situations a partial can be the solution, but sometimes I'm just going to call the snippet way too often to justify the overhead of using partials.
Right now I create a helpers/html_helper.rb file and stick them in there. The problem is that helpers are not reloaded dynamically per request during development. So each time I tweak my snippet or the code around it, I have to kill the server and restart it.
Granted, it's just a 5 seconds process, but I love Rails' convenience of just developing and then refreshing the browser. So I'd love to have that for my markup snippets as well.
Note: Just sticking 'unloadable' inside the helper module doesn't work.
Good question! This is a technique I should abuse more frequently.
# Goes in config/environment.rb (presumably it will work in one of the per-environment files, too.)
Dependencies.explicitly_unloadable_constants << 'NameOfHelperToReloadHere'
That array starts out empty, incidentally, at least in my install. (Checked via console.)
I tested this locally and it works for me, at least on Rails 2.0.2. Major credit for the solution belongs to this gentleman.
If you stick them in application_helper.rb they'll be available for all of your views, and the file is reloaded on every request in development mode (or at least I haven't encountered any issues).
I typically will create little helpers that I use throughout the site (sortable table headers for instance) that use the same logic.
This should reload ALL helpers on every request (assuming you've stuck to the default naming conventions)
#Put this in config/environments/development.rb
ActiveSupport::Dependencies.explicitly_unloadable_constants.concat(Dir.glob("#{RAILS_ROOT}/app/helpers/**/*.rb").map {|file| File.basename(file, '.rb').camelize})
Or if you are using an older version of Rails (2.0.2 or earlier I think)
#Put this in config/environments/development.rb
Dependencies.explicitly_unloadable_constants.concat(Dir.glob("#{RAILS_ROOT}/app/helpers/**/*.rb").map {|file| File.basename(file, '.rb').camelize})
Works for me in RoR 2.1.1
Update: modified top snippet to include 'ActiveSupport::', must have copied / pasted incorrectly from my code.
It's not a real solution, but you could use tests (Test::Unit, RSpec, or whatever) to make sure your helpers work as expected. That way, you wouldn't rely on automatic reloading of your helpers so much.