How to remove white space in a web page before serving it to the client?

I am deploying a web application. I am able to compress CSS and JS using the PageSpeed module in nginx/Apache, but I couldn't remove white space from the HTML.
Has anyone done this before? I have seen it implemented on major websites such as LinkedIn, Facebook, and Google.
Does removing white space in HTML give a performance boost? As I understand it, removing white space saves some extra bytes.
Here is an example of a condensed version of an HTML page from Google.

Does removing white space in HTML give a performance boost?
You are unlikely to benefit much, given that gzip is enabled for your site. The more you save during such a stripping phase, the less benefit you gain from gzipping, and vice versa.
BTW, mod_pagespeed has the collapse_whitespace filter, which does what you're asking.
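To check the gzip trade-off on your own markup, you can compare compressed sizes before and after stripping. The following Python sketch uses only the standard library; collapse_whitespace is a deliberately naive stand-in for a real minifier (it would mangle <pre> blocks and inline scripts), and the sample page is made up for illustration.

```python
import gzip
import re


def collapse_whitespace(html: str) -> str:
    """Naively collapse every run of white space to a single space.

    Assumption: the page has no <pre>, <textarea> or inline scripts whose
    white space matters; a real minifier must preserve those.
    """
    return re.sub(r"\s+", " ", html).strip()


def sizes(html: str) -> dict:
    """Return raw and gzipped byte counts for the original and collapsed markup."""
    raw = html.encode("utf-8")
    small = collapse_whitespace(html).encode("utf-8")
    return {
        "raw": len(raw),
        "raw_gzip": len(gzip.compress(raw)),
        "collapsed": len(small),
        "collapsed_gzip": len(gzip.compress(small)),
    }


if __name__ == "__main__":
    # Hypothetical, indentation-heavy sample page used only for illustration.
    rows = "\n".join(f"        <li>\n            Item {i}\n        </li>" for i in range(200))
    page = f"<html>\n  <body>\n    <ul>\n{rows}\n    </ul>\n  </body>\n</html>\n"
    for name, value in sizes(page).items():
        print(f"{name:>15}: {value} bytes")
```

On real pages the gap between raw_gzip and collapsed_gzip is typically far smaller than the gap between raw and collapsed, because gzip already compresses repeated indentation well.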
If mod_pagespeed doesn't meet your requirements, there are multiple options:
if you have static HTML pages that just need to be returned to the user, it's quite easy to minify them offline
you can also do it at the backend level using your framework's batteries, e.g. if you use a Python framework, this module could be used:
django-html is an HTML minifier for Python, with full support for HTML 5. It supports Django, Flask and many other Python web frameworks. It also provides a command line tool, that can be used for static websites or deployment scripts.
if none of the options above are viable, use a third-party module that does it at the nginx level
depending on your needs, you might also consider services like Cloudflare
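For the offline case, a minification pass can run once at deploy time over the whole folder of static pages. A minimal Python sketch, assuming your pages are UTF-8 *.html files; it only collapses runs of white space and copies <pre>, <textarea>, <script> and <style> blocks verbatim, so it is more conservative (and saves less) than dedicated minifiers.

```python
import re
import sys
from pathlib import Path

# Blocks whose white space must survive untouched (assumes they are not nested).
PRESERVE = re.compile(r"<(pre|textarea|script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)


def minify_html(html: str) -> str:
    """Collapse runs of white space to one space, outside whitespace-sensitive blocks."""
    out, last = [], 0
    for m in PRESERVE.finditer(html):
        out.append(re.sub(r"\s+", " ", html[last:m.start()]))
        out.append(m.group(0))  # copied verbatim
        last = m.end()
    out.append(re.sub(r"\s+", " ", html[last:]))
    return "".join(out).strip()


def minify_tree(src: Path, dst: Path) -> int:
    """Write a minified copy of every *.html file under src into dst; return the count."""
    count = 0
    for path in src.rglob("*.html"):
        target = dst / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(minify_html(path.read_text(encoding="utf-8")), encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__":
    # Usage: python minify_site.py SOURCE_DIR OUTPUT_DIR
    n = minify_tree(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"minified {n} file(s)")
```

Collapsing to a single space rather than deleting white space keeps text such as "<b>a</b> <i>b</i>" rendering the same, since browsers collapse white space runs in normal flow anyway (pages that use CSS white-space: pre on other elements are an exception).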

You can use tools like https://www.textfixer.com/html/compress-html-compression.php
If you use a text editor like Atom or VS Code, you might also look for a plugin that does that for you.
It doesn't affect performance much. As you said, it will save some bytes, but I suspect you're not building something as big as Amazon, so it doesn't really matter; moreover, the minified code will be a pain to read.

Related

HTML - reduce byte size

I'm testing a website speed using PageSpeed Insights tool.
In the result page, one of the warnings suggested me to reduce byte size of css, html and js files.
At first I tried to remove comments, but nothing changed.
How can I do that?
Should I remove spaces and tabs?
It seems like a very long operation; is it worth it?
The process of removing spaces, tabs and useless characters is called minification.
You don't need to do it by hand; there are a lot of services that can minify files for you.
for example:
http://www.willpeavy.com/minifier/
Be careful if you have jQuery code: sometimes it removes spaces in the wrong place.
You have two things to do to reduce page size:
Minify CSS & JS files
On the server side, if your website runs PHP on Apache, you can install APC (Alternative PHP Cache) for page caching. You'll get better performance.
In addition to CSS minifier/prettifier tools above, I recommend using proCSSor for optimizing CSS files. It offers variety of advanced options.
Never found those tools to be much use beyond giving some tips for what might be slowing it down. Minifying is unlikely to achieve much. If you want to speed up your site, save the page and see what the largest files are. Generally they will be the image files rather than the code, and see if you can reduce these.
Also, try and test it on two servers - is your host slow?
If your html file is massive, that suggests a problem with the site's structure - it is rare that a page needs to be large.
Finally, large JavaScript files are most likely to be libraries like jQuery. If Google hosts these, use the hosted version. That way, it will probably already be in the user's cache and won't add to your loading time.
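Saving a page with the browser's "Save Page As" (complete web page) typically writes the HTML plus a folder of its resources; a few lines of Python can then list the biggest files. A minimal sketch; the folder name in the usage comment is only an example.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def largest_files(folder: Path, top: int = 10) -> List[Tuple[int, Path]]:
    """Return (size in bytes, path) for the `top` largest files under folder."""
    files = [(p.stat().st_size, p) for p in folder.rglob("*") if p.is_file()]
    files.sort(key=lambda item: item[0], reverse=True)  # biggest first
    return files[:top]


if __name__ == "__main__":
    # Usage: python largest.py "My Page_files"
    for size, path in largest_files(Path(sys.argv[1])):
        print(f"{size / 1024:8.1f} kB  {path}")
```

If images dominate the list, that is where optimization will pay off first.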
EDIT, after further testing and incorporating the issues discussed in the comments below:
PageSpeed Insights is an utterly amateurish tool, and there are much more effective ways to speed up the rendering time than minifying the code.
As a matter of standard, it advises reducing HTML, CSS and JS file sizes if they are not minified. A much, much better tool is Pingdom Website Speed Test. It compares rendering speed to the average of the sites it is asked to test, and gives the download times of the site's components.
Just test www.gezondezorg.org on both and see the enormous difference in test results. For that site the Google tool is dead wrong. It advises reducing the CSS and JS files, while its own figures (click the respective headers) show that doing so would reduce their sizes by 3.8 and 7.9 kB, respectively. That comes down to less than 1 millisecond of download time difference! (1 millisecond = 1/1000 of a second; presumed broadband internet.)
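How much time such a saving buys depends heavily on the connection speed. A small Python calculation, assuming 1 kB = 1000 bytes and ignoring latency, TCP slow start and protocol overhead (so the results are lower bounds):

```python
def transfer_ms(kilobytes: float, mbit_per_s: float) -> float:
    """Milliseconds needed to download `kilobytes` (1 kB = 1000 bytes) at a given line speed.

    Latency, TCP slow start and protocol overhead are ignored, so this is a lower bound.
    """
    bits = kilobytes * 1000 * 8
    return bits / (mbit_per_s * 1_000_000) * 1000


saving_kb = 3.8 + 7.9  # CSS + JS savings quoted above
for speed in (10, 50, 100):
    print(f"{speed:>4} Mbit/s: {transfer_ms(saving_kb, speed):.2f} ms")
```

At about 100 Mbit/s the combined 11.7 kB saving is just under 1 ms; at 10 Mbit/s it is about 9.4 ms, which is still small compared with a typical page load.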
Also, it says that I did do a good thing: enable caching. That is BS as well, because my .htaccess file tells browsers to check for newly updated files at every visit, and refresh cached files whenever updated. Tests confirm that all browsers heed that command.
Furthermore, that site is not intended to be viewed on mobile phones; there is just way too much text on it for that. Nevertheless, PageSpeed Insights opens by default with the results of testing against mobile-phone criteria.
More effective ways to speed up the rendering
So, minifying hardly does anything to speed up the rendering time. What does help is the following:
Put your CSS and JavaScript code in as few files as possible, ideally one of each. That saves browser-to-server (BTS) requests. (Do keep in mind that quite a number of scripts need the DOM to be fully loaded first, so in practice it comes down to putting the scripts in 2 files as much as possible: a pre-body and a post-body file.)
Optimize large images for the web. Photoshop and the likes even have a special function for that, reducing the file size while keeping the quality good enough for use on the web.
In case of images that serve as full-size background for containers: use image sprites. That saves BTS requests as well.
Code the HTML and JS files so that there is no rendering dependency on files from external domains, such as from Twitter, Facebook, Google Analytics, advertisement agencies, etc.
Make sure to get a web-host that will respond swiftly, has a sufficient processing capacity, and has a(n almost) 100% up-time.
Use vanilla/native JS as much as possible. Use jQuery or other libraries only for tasks that would otherwise be too difficult or too time-consuming. jQuery not only is an extra file to download, it is also processed slower than native JS.
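Combining files does not need special tooling; a short build step can concatenate them in a fixed order. A minimal Python sketch; the file names and the head/body-end split are hypothetical examples of the pre-body and post-body files described above. The /* name */ markers are valid comments in both CSS and JavaScript, and the "\n;\n" separator for scripts guards against a file that ends without a semicolon.

```python
import sys
from pathlib import Path
from typing import Sequence


def bundle(sources: Sequence[Path], target: Path, separator: str = "\n") -> None:
    """Concatenate text files, in order, into a single bundle file."""
    parts = [f"/* {src.name} */\n{src.read_text(encoding='utf-8')}" for src in sources]
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(separator.join(parts), encoding="utf-8")


# Hypothetical file layout; order matters, because later files may depend on earlier ones.
CSS = ["css/reset.css", "css/site.css"]
HEAD_JS = ["js/config.js"]                     # scripts that must run before the body renders
BODY_END_JS = ["js/menu.js", "js/gallery.js"]  # scripts that need the DOM to be loaded

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    build = root / "build"
    bundle([root / p for p in CSS], build / "site.css")
    bundle([root / p for p in HEAD_JS], build / "head.js", separator="\n;\n")
    bundle([root / p for p in BODY_END_JS], build / "body-end.js", separator="\n;\n")
```

The page would then load site.css and head.js in the <head>, and body-end.js just before </body>.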
Lastly, you should realize that:
having the server minify the code on the fly generally results in a much slower response from the server;
minifying code makes it unreadable;
de-minifying tools are notorious for their poor performance.
Minifying resources refers to eliminating unnecessary bytes, such as extra spaces, line breaks, and indentation. Compacting HTML, CSS, and JavaScript can speed up downloading, parsing, and execution time. In addition, for CSS and JavaScript, it is possible to further reduce the file size by renaming variable names as long as the HTML is updated appropriately to ensure the selectors continue working.
You can find plenty of online tools for this purpose; a few of them are below.
HTML Minify
CSS Minify
JS Minify
good luck!

Can I "pre-generate" all possible static-html pages of my "dynamic" website?

Some websites like for example http://www.idealo.co.uk seem to serve only static html, although their content is dynamic.
For example if I navigate through a certain category, I get a link to a static html page:
http://www.idealo.co.uk/cat/5666/electric-guitars.html
Now if I apply a custom filter, again I get a link to something that seems to be static html:
http://www.idealo.co.uk/cat/5666F456496-735760-1502100/electric-guitars.html
How is this achieved? Are there any frameworks out there that help to "pre-generate" all possible dynamic pages, in such way that whenever a new input is given, the page already exists (i.e. the static html is already available)?
Background: we run a small search engine for real estate offers. Offers are updated by our scraper once a day (the content is static through the day). The content is searchable on a Ruby-on-Rails website.
As the traffic increases, performance is becoming an issue. I'm wondering if there is any framework / tool that could batch-generate all our searches so that we could serve static html.
Their site isn't dynamic. They're using URL rewriting (e.g. mod_rewrite) to translate the input URLs into a request that can be satisfied by a script.
For example:
/cat/5666/electric-guitars.html
Might be rewritten to:
/cat.php?id=5666
A quick trick to test this is to go to /cat/5666/foo.html
The use of .html in this case is probably to hide what kind of scripting is used on their site, as a weak security-through-obscurity measure.
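In Apache this mapping is usually a mod_rewrite rule; the Python sketch below only illustrates what such a rule does. The pattern, the cat.php target and the filters parameter are assumptions modelled on the question's URLs, not idealo's actual scheme.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical rule: a numeric category id, an optional filter suffix
# starting with "F", and a descriptive slug that is ignored.
PRETTY_URL = re.compile(r"^/cat/(?P<id>\d+)(?:F(?P<filters>[\d-]+))?/[^/]+\.html$")


def rewrite(path: str) -> Optional[str]:
    """Map a static-looking URL to the script URL that actually builds the page."""
    m = PRETTY_URL.match(path)
    if m is None:
        return None  # not a category URL; serve it normally
    params = {"id": m["id"]}
    if m["filters"]:
        params["filters"] = m["filters"]
    return "/cat.php?" + urlencode(params)


if __name__ == "__main__":
    for url in ("/cat/5666/electric-guitars.html",
                "/cat/5666/foo.html",
                "/cat/5666F456496-735760-1502100/electric-guitars.html"):
        print(url, "->", rewrite(url))
```

Because the slug is ignored, /cat/5666/foo.html maps to the same script call as the real URL, which is why that trick reveals the rewriting.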
In response to your problem - no, there's no (easy) way to generate all possible results into static HTML files. You're looking at potentially billions of permutations. If you're having performance issues, look into performance profiling, caching, query optimisation, etc.
What you're describing is, in a sense, caching. With caching your application will generate pages (and even parts of pages) only when their content has changed. Rails has a lot of cache functionality built in, which you can tune to fit your needs. Start by reading the Rails Guide on caching which describes the Rails' capabilities as well as common add-ons. Google around for "Rails 3 caching"—there's tons of information out there. Finally, you can add software to your server stack that does additional caching, such as Squid and Varnish. With the right tools (and research) you can get 95% of the benefit of a static site without the effort of turning your site into a quasi-static Frankenapp by hand.
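Because the data in the question changes only once a day, the cache key can include the date of the current data set, so each distinct search is rendered on its first request and then reused until the next scrape. The Python sketch below illustrates that idea only; it is not Rails code, render is a hypothetical function that produces a page's HTML, and a real deployment would bound memory use or rely on Rails' cache stores, Squid or Varnish instead.

```python
from datetime import date
from typing import Callable, Dict, Tuple


class DailyPageCache:
    """Render each search page at most once per data refresh.

    Assumption taken from the question: the scraper refreshes the data once a
    day, so the date of the current data set serves as the cache version.
    """

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render  # hypothetical: query string -> HTML
        self._pages: Dict[Tuple[date, str], str] = {}

    def get(self, query: str, data_version: date) -> str:
        key = (data_version, query)
        if key not in self._pages:
            # Forget pages built from an older data set before adding the new one.
            self._pages = {k: v for k, v in self._pages.items() if k[0] == data_version}
            self._pages[key] = self._render(query)
        return self._pages[key]


if __name__ == "__main__":
    cache = DailyPageCache(lambda q: f"<h1>Results for {q}</h1>")
    today = date.today()
    print(cache.get("2-room flats", today))  # rendered now
    print(cache.get("2-room flats", today))  # served from memory
```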
I finally found this blog post, which points to a few tools that do what I was looking for. I'm adding it here just for future reference:
Hyde
"Hyde is a static website generator powered by Python & Django. Hyde supports all the Django template tags & filters and even has a few of its own. The built-in web server + auto-generator provide instant refresh and unlimited flexibility..."
Jekyll
"Jekyll is a simple, blog-aware, static site generator. It takes a template directory containing raw text files in various formats, runs it through Markdown (or Textile) and Liquid converters, and spits out a complete, ready-to-publish static website suitable for serving with your favorite web server..."
blatter
"Blatter is a tiny tool for creating and publishing static web sites built from dynamic templates..."

How to create DRY HTML?

I have a small static website and every page of this site has a menu and a footer.
What is the best way to make sure changes in the menu and the footer only need to be done in one place and enable me to easily update all my pages which consist of them.
I am looking for some kind of simple template system that enables me to combine files together.
I have looked a bit into ruby .erb files but they seem too complicated for what I want to achieve as I would have to install rails and enable my webserver to use that.
For a simple site, there's nothing wrong with doing server side includes. Simply create the HTML snippets (they don't even have to be fully formed HTML) for your menu and footer. Then on each page, add the appropriate
<!--#include virtual="/footer.html" -->
statement in the proper location. Since you're on a Debian server, Apache's mod_include handles this; if it isn't already enabled, a2enmod include turns it on.
It may seem antiquated, but my wife works for a company that does a lot of maintenance for small websites and they still take this approach and it works just fine.
If your site goes above 10 pages, then I'd say look into some of the templating systems, just to alleviate the need to remember to add your SSI on each new page you create.
You could have a look at some web templating systems and decide based on the language/platform you are familiar with.
I use Octopress. It's a static site generator built on top of Jekyll, which uses Markdown for content markup and the Liquid template language for constructing pages. So if you only need a site with a few pages, you should try Jekyll.
It requires Ruby on your system, since all site generation is done locally; afterwards the site is deployed via rsync.
Try searching the internet for "static site generator". You'll find dozens of solutions in all sorts of languages: Python, Ruby, PHP, Haskell, Sh, Bash…
Do you need to combine those on the server side?
For a small static site I simply created a little local script (I used PowerShell, but feel free to use whatever you want or have at your disposal) that does deployment from the local source files which represent the templates. While maybe not as flexible on the template side as full-blown templating engines it's easy, fast and works well for quite a while. Also it runs locally and doesn't need anything except a simple web server on the server side, cutting down on potential vulnerabilities.
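A minimal sketch of such a local build script, written in Python rather than PowerShell. It assumes the shared menu and footer live in a partials/ subfolder and that pages reference them with SSI-style directives, so the same source files also work on a server with SSI enabled; only the virtual="..." form with site-root paths is handled.

```python
import re
import sys
from pathlib import Path

# Apache-style SSI directive, e.g. <!--#include virtual="/partials/footer.html" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand(html: str, root: Path, depth: int = 0) -> str:
    """Replace every include directive with the (recursively expanded) file it names."""
    if depth > 10:
        raise RecursionError("includes nested too deeply; is there a cycle?")

    def load(match):
        # Paths are resolved relative to the site root, like SSI's "virtual".
        part = root / match.group(1).lstrip("/")
        return expand(part.read_text(encoding="utf-8"), root, depth + 1)

    return INCLUDE.sub(load, html)


def build(src: Path, out: Path) -> None:
    """Expand includes in each top-level page of src and write the result to out."""
    out.mkdir(parents=True, exist_ok=True)
    for page in src.glob("*.html"):  # partials live in a subfolder, so they are skipped
        text = expand(page.read_text(encoding="utf-8"), src)
        (out / page.name).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Usage: python build.py SOURCE_DIR OUTPUT_DIR, then upload OUTPUT_DIR.
    build(Path(sys.argv[1]), Path(sys.argv[2]))
```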
I've used WML ("Website Meta Language"; NB nothing to do with the WML associated with mobile and WAP!) on Debian for years to maintain consistent templated header/sidebar/footer boilerplate for pages on my ISP's static page hosting.

HTML minification? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Is there a online tool that we can input the HTML source of a page into and will minify the code?
I would do that for aspx files as it is not a good idea to make the webserver gzip them...
Perhaps try HTML Compressor, here's a before and after table showing what it can do (including for Stack Overflow itself):
It features many selections for optimizing your pages up to and including script minimizing (ompressor, Google Closure Compiler, your own compressor) where it would be safe. The default option set is quite conservative, so you can start with that and experiment with enabling more aggressive options.
The project is extremely well documented and supported.
Don't do this. Or rather, if you insist on it, do it after any more significant site optimizations are complete. Chances are very high that the cost/benefit for this effort is negligible, especially if you were planning to manually use online tools to deal with each page.
Use YSlow or Page Speed to determine what you really need to do to optimize your pages. My guess is that reducing bytes of HTML will not be your site's biggest problem. It's much more likely that compression, cache management, image optimization, etc will make a bigger difference to the performance of your site overall. Those tools will show you what the biggest problems are -- if you've dealt with them all and still find that HTML minification makes a significant difference, go for it.
(If you're sure you want to go for it, and you use Apache httpd, you might consider using mod_pagespeed and turning on some of the options to reduce whitespace, etc., but be aware of the risks.)
Here is a short answer to your question: you should minify your HTML, CSS and JS. There is an easy-to-use tool called Grunt. It lets you automate a lot of tasks, among them JS, CSS and HTML minification, file concatenation and many others.
The answers written here are extremely outdated and sometimes don't even make sense. A lot has changed since 2009, so I will try to answer this properly.
Short answer: you should definitely minify HTML. It is trivial today and gives approximately a 5% speedup. For the longer answer, read on.
Back in the old days, people minified CSS/JS manually (by running it through a dedicated minification tool). The process was hard to automate and definitely required some skill. Given that many high-profile sites still don't use gzip (which is trivial to enable), it is understandable that people were reluctant to minify HTML.
So why did people minify JS, but not HTML? When you minify JS, you do the following things:
remove comments
remove blanks (tabs, spaces, newlines)
change long names to short ones (var isUserLoggedIn to var a)
That gave a big improvement even in the old days. But in HTML you could not shorten names, and there were usually few comments to strip at that time. So the only thing left was to remove spaces and newlines, which gives only a small improvement.
One wrong argument made here is that because content is served with gzip, minification makes no sense. This is totally wrong. Yes, gzip reduces the benefit of minification, but why gzip comments and white space when you can properly trim them and gzip only the important part? It is like having a folder to archive that contains junk you will never use, and deciding to just zip it instead of cleaning it up first and then zipping it.
Another argument for why minification is pointless is that it is tedious. Maybe this was true in 2009, but new tools have appeared since then. Today you do not need to minify your markup manually. With Grunt, it is trivial to install grunt-contrib-htmlmin (which relies on HTMLMinifier by @kangax) and configure it to minify your HTML. All you need is about 2 hours to learn Grunt and configure everything; after that, everything is done automatically in less than a second. That 1 second (which you can even automate away with grunt-contrib-watch) is not so bad for approximately a 5% improvement (even with gzip).
One more argument is that CSS and JS are static, while HTML is generated by the server, so you cannot pre-minify it. This was also true in 2009, but now more and more sites look like single-page apps, where the server is thin and the client does all the routing, templating and other logic. The server only sends JSON and the client renders it. Here you have a lot of HTML for the page and its templates.
So to finish my thoughts:
Google minifies its HTML.
PageSpeed asks you to minify HTML
it is trivial to do
it gives a ~5% improvement
it is not the same as gzip
I wrote a web tool to minify HTML. http://prettydiff.com/?m=minify&html
This tool operates using these rules:
All HTML comments are removed
Runs of white space characters are converted to single space characters
Unnecessary white space characters inside tags are removed
White space characters between two tags, where one of these two tags is not a singleton, are removed
All content inside a style tag is presumed to be CSS and is minified as such
All content inside a script tag is presumed to be JavaScript, unless provided a different media type, and then minified as such
The CSS and JavaScript minification uses a heavily forked form of JSMin. This fork is extended to support CSS natively and also support SCSS syntax. Automatic semicolon insertion is supported for JavaScript minification, however automatic curly brace insertion is not yet supported.
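A rough Python approximation of the first four rules above, not Pretty Diff's actual code. Unlike the tool described, it copies <style>, <script>, <pre> and <textarea> blocks verbatim instead of minifying them, and its between-tags rule is simplified: it removes all white space between adjacent tags, which can glue inline elements such as two adjacent links together.

```python
import re

# Whitespace-sensitive or non-HTML content that this sketch leaves untouched.
PRESERVE = re.compile(r"<(pre|textarea|script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
COMMENT = re.compile(r"<!--(?!\[if).*?-->", re.DOTALL)  # keeps IE conditional comments
SPACE_BEFORE_CLOSE = re.compile(r"(<[^>]*?)\s+>")     # '<li >' -> '<li>'


def _minify_markup(chunk: str) -> str:
    chunk = COMMENT.sub("", chunk)                 # drop comments
    chunk = re.sub(r"\s+", " ", chunk)             # runs of white space -> one space
    chunk = SPACE_BEFORE_CLOSE.sub(r"\1>", chunk)  # white space just before '>' inside a tag
    return re.sub(r">\s+<", "><", chunk)           # white space between two tags (simplified)


def minify(html: str) -> str:
    """Apply the rules above outside <pre>/<textarea>/<script>/<style> blocks."""
    out, last = [], 0
    for m in PRESERVE.finditer(html):
        out.append(_minify_markup(html[last:m.start()]))
        out.append(m.group(0))  # copied verbatim
        last = m.end()
    out.append(_minify_markup(html[last:]))
    return "".join(out).strip()
```

The "<br />" form is deliberately left alone: turning "<input value=foo />" into "<input value=foo/>" would change the attribute value.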
This worked for me:
http://minify.googlecode.com/git/min/lib/Minify/HTML.php
It's not an online tool, but since it's a simple PHP include, it's easy enough to run it yourself.
I would not save compressed files, though; do this dynamically if you really have to, and it's always a better idea to enable gzip compression on the server.
I don't know how involved that is in IIS/.Net, but in PHP it's as trivial as adding one line to the global include file
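The same idea (compress on the fly, and only when the browser sends Accept-Encoding: gzip) sketched as Python WSGI middleware, since the one-line approach mentioned is PHP-specific. This is an illustration under simplifying assumptions: the Accept-Encoding check ignores q-values, the wrapped app must not use write() or compress its own output, and in production server-level compression (nginx's gzip directive, Apache's mod_deflate) is usually the simpler choice.

```python
import gzip


def gzip_middleware(app):
    """Wrap a WSGI app so its response is gzipped when the client accepts gzip."""

    def wrapped(environ, start_response):
        # Naive check: a real implementation should parse q-values ("gzip;q=0").
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            return app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, headers

            def write(data):
                raise NotImplementedError("this sketch does not support write()")

            return write

        result = app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        compressed = gzip.compress(body)
        # Assumes the app did not set Content-Encoding itself.
        headers = [(k, v) for k, v in captured["headers"] if k.lower() != "content-length"]
        headers += [
            ("Content-Encoding", "gzip"),
            ("Content-Length", str(len(compressed))),
            ("Vary", "Accept-Encoding"),
        ]
        start_response(captured["status"], headers)
        return [compressed]

    return wrapped
```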
CodeProject has a published sample project (http://www.codeproject.com/KB/aspnet/AspNetOptimizer.aspx?fid=1528916&df=90&mpp=25&noise=3&sort=Position&view=Quick&select=2794900) to handle some of the following situations...
Combining ScriptResource.axd calls into a single call
Compress all client side scripts based on the browser capability including gzip/deflate
A ScriptMinifier to remove comments, indentations, and line breaks.
An HTML compressor to compress all html markup based on the browser capability including gzip/deflate.
And - most importantly - an HTML Minifier to write complete html into single line and minify it at possible level (under construction).
For the Microsoft .NET platform there is a library called WebMarkupMin, which minifies HTML code.
In addition, there is a module for integrating this library into ASP.NET MVC: WebMarkupMin.Mvc.
Try http://code.mini-tips.com/html-minifier.html; it is a .NET library for HTML minification.
HtmlCompressor is a small, fast and very easy to use .NET library that minifies given HTML or XML source by removing extra whitespaces, comments and other unneeded characters without breaking the content structure. As a result pages become smaller in size and load faster. A command-line version of the compressor is also available.

REALLY Simple Website--How Basic Can You Go?

Although I've done programming, I'm not a programmer. I've recently agreed to coordinate getting a Website up for a club. The resources are--me, who has done Web content maintenance (putting content into HTML and ColdFusion templates via a gatekeeper to the site itself; doing simple HTML and XML coding); a serious Web developer who does database programming, ColdFusion, etc., and talks way over the heads of the rest of us; two designers who use Dreamweaver; the guy who created the original (and now badly broken) site in Front Page and wants to use Expression Web; and assorted other club members who are even less technically inclined.
What we need up first is some text and graphics (a gorgeous design has been created in Dreamweaver), some links (including to existing PDF newsletters for download), and maybe hooking up an existing Blogspot blog. Later (or earlier if it's not hard), we may add mouseover menus to the links, a gallery, a calendar, a few Mapquest hotlinks, and so on.
My question--First, is there any real problem with sticking with HTML and jpegs for the initial site? Second, for the "later" part of the site development, what's the simplest we can go with? Third, are there costs in doing this the simple way that will make us regret it down the road? Also, is there a good site/resource where I can learn more about this from a newbie perspective?
If you don't require any dynamic content, heck, if you don't plan on editing the content more than once a week, I'd say stick to basic HTML.
Later, you'd probably want a basic, no-fuss and easily installable CMS. The brand really depends on the platform (most likely PHP/Rails/ASP), but most of them can be found by typing " CMS" into Google. Try prefixing it with "free" or "open source" if you want.
I'm pretty sure you can do all this for absolutely free. Most PHP and Ruby CMSs are free, and web hosting is free or extremely cheap if you're not demanding.
And last/best tip: Find someone who has done this before, preferably more than once. He'll probably set you up so you never have to look at anything more complicated than a WYSIWYG editor.
Plain old HTML is fine, just as long as you don't use tags like blink and marquee.
I personally love tools like CityDesk.
And I'm not just plugging Joel. (There are others out there in this class I'm sure.) The point is they make making a static website very easy:
The structure is just a filesystem structure
pages have templates to consolidate formatting
all resources are contained in one file
easy and fast Preview and Publish functions
For a dynamic collaborative site, I would just install one of many open source CMSs available on shared hosting sites.
If you're familiar with html/javascript basics I'd look into a CMS - wordpress, drupal, joomla, nuke, etc. All of these are free. Very often your web hosting company will install one of these by default which takes all of the hard part out of your hands. Next is just learning to customize the system and there's tons of docs out there for any of those systems.
All that being said, there is nothing wrong with good old-fashioned HTML.
In addition to some of the great content management systems already mentioned, consider CMS Made Simple.
It makes it very easy to turn a static site into a content managed site (which sounds like exactly what you might need to do in the future), and the admin area is very easy to use. Our clients have found it much simpler to use than the likes of Joomla.
It's also free and open source.
Good luck!
There's no reason to not go with plain old HTML and JPGs if you don't know any server side scripting languages. Also, once you want to get more advanced, most cheap hosting services have tools that can be installed with one click, and provide things like blogs, photo galleries, bulletin boards (PHPBB), and even content management tools like Joomla.
I had the same problem myself, I was just looking for something really easy to smash together a website quickly. First I went with just plain old HTML, but then I realised a simple CMS would be better.
I went for WordPress. WordPress is mostly known as a blogging platform, but in my opinion it is really great as a dead simple CMS as well.
Why not simply use Google Pages?
Here is an example of a website I did: it took about 2 hours, it's easy to maintain (not that I do (-: ) and it's FREE.
I think that suggesting you mess with HTML for what you need is crazy!
Plain HTML is great and gives you the most control. If you want to make updating a bit easier, though, you could use SSI. Most servers have it enabled. It basically lets you include one file in many pages.
For example, you could keep your menu in navigation.html and have every page include this file. That way you wouldn't have to update the menu on every page each time it changes.
<!--#include virtual="navigation.html" -->
I agree with the other commenters that a CMS might be useful to you, however as I see it, probably a solution like Webby might do it for you. It generates plain HTML pages based on Templates. Think about it as a "webpage preprocessor" which outputs plain HTML files. It has most of the advantages of using a server-based CMS, but without a lot of load on the server, and making it easy for you to change stuff on any of the templates you might use.
1. It's fine
2. Rails (or purchase / use a CMS)
3. Not unless you start becoming crazy-popular
It really depends on what you go with for 2. Rails has a plethora of tutorials on the net and any product you go with will have its own community etc.
To be perfectly honest, though, if the dynamic part is someone else's blog and you move the gallery out onto Flickr, you may find that you can actually live with large parts of it being static HTML for a very long time.
If you want to implement a website with user profiles/logins, extensions, galleries etc. as a newbie, then a CMS like Joomla is a good choice. But if you presently have only static content, it's fine to go with good old HTML. About JPEG, I think it's presently better to use PNG or GIF, as they're less bulky.
Also, about your question on moving to server scripts: when you have database-driven material, or other things that require more advanced programming languages, just put PHP code inside your HTML and rename the file to .php. That's it; no loss to your HTML data.
Do go ahead and launch your site!
Dude, you're talking about HTML; obviously you'll be styling your content with CSS. Wait till you run into IE issues, and god forbid your client wants IE6 compatibility.
Go with the HTML for now, I'm sure you guys will hack it through. Our prayers are with you.
Personally, I'd never use JPEG images on a website, mainly for three reasons:
JPEGs often contain artifacts.
Quality is often proportional to file size.
JPEG does not support alpha transparency.
That said, I'd recommend using PNGs for images, since PNG is lossless and supports 24-bit color plus an alpha channel (full colors + alpha transparency). The only quirk is that IE6 and below do not natively support alpha transparency in PNGs; however, this can be worked around by running a JavaScript fix.
As for designing a website, there are both pros and cons. I suggest you read through:
37signals' Why We Skip Photoshop
Jeff Croft's Why We Don't Skip Photoshop
As for newbie resources, I'd recommend flipping through the pages at W3Schools.