Related
I have a big performance issue in IE6 even with javascript turned-off.
It's strange because sometimes when the page is loaded, the header is floated next to the footer, or slideshow is over the the content.
I wonder if someone had same or similiar issues in IE6 and if i minify a generated source code into a one-line, will it help somehow to gain loadspeed in that browser ?
-Want to mention that it should be compatible with ie6 so please, do not post a messages like - use modern browsers.
The problem was in MS png fixer inside css. Even if i turned off a javascript, it still works, so when i removed css lines with ms filter for png transparency, it starts working like it should.
Thanks for any submits.
I doubt that removing newlines would increase the speed in any noticeable fashion.
That is, the performance issues are likely not caused by line count but rather the size/number/type/cost of the elements/operations after the parsing.
The actual lexer that handles the newlines should see them no differently in the stream than any other character. Depending on exactly what context "source" means newlines have some effect semantically at the parser:
CSS: none (ignoring embedded newlines)
HTML: possible new text-nodes or different content
JavaScript: automatic semicolon insertion (or embedded newlines)
However, there is no reason not to try a minified version quickly to see if it makes a difference and, more importantly, to satisfy your curiosity ;-) I would definitely heed the other suggestions as well, such as checking the page (everything) for validity.
Happy coding.
You haven't specified what your page consists of, but my guess would be you're outputting the mother of all HTML tables?
The reason I guess that is because IE6 is known to be extremely slow when rendering large tables, particularly where the column widths are variable. (later IEs are also slow with this, but not as bad as IE6)
The reason is that the browser attempts to render the entire table before displaying anything, and performs a lot of calculations to work out how to render it.
The answers on this question may also help you: Are large html tables slow?
I've been writing a source-to-display converter for a small project. Basically, it takes an input and transforms the input into an output that is displayable by the browser (think Wikipedia-like).
The idea is there, but it isn't like the MediaWiki style, nor is like the MarkDown style. It has a few innovations by itself. For example, when the user types in a chain of spaces, I would presume he wants the spaces preserved. Since html ignores spaces by default, I was thinking of converting these chain of spaces into respective s (for example 3 spaces in a row converted to 1 )
So what happens is that I can foresee a possibility of a ton of tags per post (and a single page may have multiple posts).
I've been hearing alot of anti- s in the web, but most of it boils down to readability headaches (in this case, the input is supplied by the user. if he decides to make his post unreadable he can do so with any of the other formatting actions supplied) or maintenance headaches (which in this case is not, since it's a converted output).
I'm wondering what are the disadvantages of having tons of tags on a webpage?
You are rendering every space as ?
Besides wasting so much bandwidth, this will not allow dynamic line breaking as "nbsp" means "*n*on *b*reaking *sp*ace". This will most probably cause much trouble.
If it's just being dumped to a client, it's just a matter of size, and if it's gzipped, it barely matters in terms of network traffic.
It'll slow down rendering, I'm sure, and take up DOM space, but whether or not that matters depends on stuff I don't know about your use case(s). You might be able to achieve the same result in other ways, too; not sure.
s aren't tags, but are character entities like ©, <, >, etc.
I'd say that the disadvantages would be readability. When I see a word, I expect the spacing to be constant (unless it is in a block of justified text).
Can you show me a case where you'd need s?
Have you considered trying to figure out what the user, by inserting those spaces, is really trying to achieve? Rather than the how (they want to insert the spaces), the what (if the spaces are at the beginning of a line, they want to indent the text in question).
An example of this is many programming sites convert 4 spaces at the start of a line to a pre+code block.
For your purposes, maybe it should be a <block> block.
The end goal being that of converting the spaces not to what the user (with their limited resources) intended to show up there but, rather, what they meant to convey with it.
Minimizing html is the only section on Google's Page Speed where there is still room for improvement.
My site is all dynamic and the HTML is already Deflated so there is no reason to put any more pressure on the server (I don't want to minimize pages real time before sending).
What I could do was to minimize the template files. My templates files are a mix of PHP and HTML so I've come up with some code that I think is pretty safe but would like to be community revised.
// this will loop trough all template files
// php is cleaned first so that line-comments will not interfere with the regex
$original = file_get_contents($dir.'/'.$file);
$php_clean = php_strip_whitespace($dir.'/'.$file);
$minimized = preg_replace('/\s+/', ' ', $php_clean);
This will make my template files as a single very long file alternated with some places where DB content is inserted. Google's homepage source looks more or less like what I get so I wonder if they follow a similar approach.
Question 1: Do you antecipate potencial problems?
Question 2: Is there anyway better (more efficient to do this)?
And please remember that I'm not trying to validate HTML as the templates are not valid HTML (header and footer are includes, for example).
Edit: Do take into consideration that the template files will be minimized on deploy. As CSS and Javascript files are minimized and compressed using YUI Compressure and Closure, the template files would be minimized like-wise, on deploy. Not on client-request.
Thank you.
Google's own Closure Templates (Soy) strips whitespace at the end of the line by default, and the template designer explicitly inserts a space using {sp}. This probably isn't a good enough reason to switch away from PHP, but I just wanted to bring it to your attention.
In addition, realize that HTML 4 allows you to exclude some tags, as recommended by the Page Speed documentation on minifying HTML (http://code.google.com/p/page-speed/wiki/MinifyHtml). You can exclude </p>, </td>, </tr>, etc. For a complete list of elements for which you can omit the end tag, search for "- O" in the HTML 4 DTD (http://www.w3.org/TR/REC-html40/sgml/dtd.html). You can even omit the <html>, <head>, <body>, and <tbody> tags entirely, as both start and end tags are optional ("O O" in the DTD).
You can also omit the quotes around attributes (http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2) such as id, class (with a single class name), and type that have simple content (i.e., matches /^[-A-Za-z0-9._:]+$/). For attributes that have a single possible value, you can exclude the value (e.g., say simply checked rather than checked=checked).
Some people may find these tips repulsive because we've been conditioned for so many years to prepare for the upcoming world of simple LALR parsers for XHTML. Thus, tools like Dave Raggett's HTML Tidy generate HTML with proper closing tags and quotes around attribute values. But let's face it, all the browsers already have parsers that understand HTML 4, any new browser will use the HTML 5 parser rather than XHTML, and we should get comfortable writing HTML that is optimized for size.
That being said, besides a couple large companies like Google and Facebook, my guess is that page size is a negligible component of latency, so if you're optimizing your own site it's probably because of your own obsessive tendencies rather than performance.
White space can be significant (e.g. in pre elements).
When I had a particularly large page (i.e. large enough that there was a benefit in minifying the HTML) I used HTML Tidy and cached the results.
tidy -c -n -omit -ashtml -utf8 --doctype strict \
--drop-proprietary-attributes yes --output-bom no \
--wrap 0
I think you'll end up running into issues with load time with this approach, as the get contents, strip whitespace, and preg replace calls are going to take a lot longer to do than whatever bandwidth the minified HTML is saving you.
I've been running tests on all my sites for a couple of weeks and I can say that this method is pretty consistent. It will only affect template content, so there is little risk of messing up with unknown <pre> or similar.
It is run before deploy so there is no impact on server - actually there should be a little speed up as the file becomes smaller.
Do remember that all content that comes from the database will not suffer any influence as, like said before, this runs before deploy and on template files only.
The method seams solid enough to pass it into production.
If anything goes wrong I'll post it here.
Simple question hopefully.
We have a style sheet that is over 3000 lines long and there is a noticeable lag when the page is rendering as a result.
Here's the question: Is it better to have one massive style sheet that covers everything, or lots of little style sheets that cover different parts of the page? (eg one for layout, one for maybe the drop down menu, one for colours etc?)
This is for performance only, not really 'which is easier'
3,000 lines? You may want to first go in and look for redundancy, unnecessarily-verbose selectors, and other formatting/content issues. You can opt to create a text stylesheet, a colors stylesheet, and a layout stylesheet, but it's likely not going to improve performance. That is generally done to give you more organization. Once you've tightened up your rules, you could also minify it by removing all formatting, which might shave off a little bit more, but likely not much.
Well, if you split those 3k lines into multiple files the overall rendering time won't decrease because
All 3000 lines will still need to be parsed
Multiple requests are needed to get the CSS files which slows down the whole issue on another level
According to this probably reliable source one file, to satisfy rule 1 (minimize http requests). And don't forget rule 10, minify js and css, especially with a 3000 line monster.
It'll be worse if you split them up because of the overhead of the extra HTTP requests andnew connections for each one (I believe it is Apache's default behaviour to have keep-alive off)
Either way, it all needs to be downloaded and parsed before anything can happen.
Separating that monster file into smaller once (layout, format and so on) would make development more efficient. Before deployment you should merge and minify them to avoid multiple http requests. Giving the file a new number (style-x.css) for each new deployment will allow you to configure your http server to set an expire date far into the future and by that saving some additional http requests.
It sounds like you are using CSS in a very inefficient way. I usually have a style sheet with between 400 and 700 lines and some of the sites that I have designed are very intricate. I don't think you should ever need more than 1500 lines, ever.
3000 lines of code is far to many to maintain properly. My advice would be to find things that share the same properties and make them sub-categories. For example, if you want to have one font throughout the page, define it once in the body and forget about it. If you need multiple fonts or multiple backgrounds you can put a div font1 and wrap anything that needs that font style with that div.
Always keep CSS in one file, unless you have completely different styles on each page.
the effort of loading multiple css files stands against the complexity (and hence speed) of parsing as well as maintenance aspects
If certain subsets of the monster file can be related to certain html pages (and only those certain pages) then a separation into smaller units would make sense.
example:
you have a family homepage and your all.css contains all the formats for your own range of pages, your spouse's, your kids' and your pet's pages - all together 3000 lines.
./my/*.html call ./css/all.css
./spouse/*.html call ./css/all.css
./kid/*.html call ./css/all.css
./pet/*.html call ./css/all.css
in this case it's rather easy to migrate to
./my/*.html call ./css/my.css
./spouse/*.html call ./css/spouse.css
./kid/*.html call ./css/kid.css
./pet/*.html call ./css/pet.css
better to maintain, easier to transfer responsibilities, better to protect yourself from lousy code crunchers :-)
If all (or most) of your pages are sooo complex that they absolutely need the majority of the 3000 lines, then don't split. You may consider to check for "overcoding"
Good luck
MikeD
The only gain in dividing your CSS would be to download each part in parallel. If you host each CSS on different server, it could in some case gain a bit of speed.
But in most case, having a single 3000 lines of code CSS should be (a bit) faster.
Check out the Yahoo performance rules. They are backed by lots of empirical research.
Rule #1 is minimize HTTP requests (don't split the file--you could for maintenance purposes but for performance you should concat them back together as part of a build process). #5 is place CSS references at the top (in < head>). You can also use the YUI compressor to reduce the file size of CSS by stripping whitespace etc.
More stuff (CDNs, gzipping, cache-control, etc.) in the rules.
I have a site css file that controls the styles for the overall site (layout mainly).
Then I have smaller css files for page specific stuff.
I even sometimes have more than one if I am planning on ripping out an entire section at a later date.
Better to download and parse 2 files at 20kb than 1 file at 200kb.
Update: Besides, isn't this a moot point? It only has to be downloaded once. If the pause is that big a deal, have a 'loading' screen like what GMail has.
3000 lines is not a big deal. If you split them into multiple chunks, it still needs to be downloaded for rendering. the main concern is the file size. i have over 11000 lines in one of our master css file and the size is about 150 kb.
And we gzipped the static contents and the size is drastically reduced to about 20 kb.. and we didnt face any performance issues.
I've googled around but can't find any HTML minification scripts.
It occurred to me that maybe there is nothing more to HTML minification than removing all unneeded whitespace.
Am I missing something or has my Google Fu been lost?
You have to be careful when removing stuff from HTML as it's a fragile language. Depending on how your pages are coded some of that whitespace might be more significant; also if you have CSS styles such as white-space: pre then you may need to keep the whitespace. Plus there are numerous browser bugs, etc, and basically every character in an HTML file might be there to satisfy some requirement or appease some browser.
In my opinion your best bet is to design the pages well with CSS techniques (I was recently able to take an important page on the site I work for and reduce it's size by 50% just by recoding it using CSS instead of tables and nested style="..." attributes). Then, use GZip to reduce the size of your pages for browsers that understand gzip. This will save bandwidth while preserving the structure of the html.
Sometimes, depending on the enclosing tags and/or on the CSS, whitespace may be significant.
Outside of HTML Tidy/removing white space as the other answers mentioned, there isn't much.
This is more of a manual task pulling out style attributes into CSS (hopefully you're not using FONT tags, etc.), using fewer tags and attributes where possible (like not embedding <strong> tags in an element but using CSS to make the whole element font-weight: bold, unless of course it makes semantic sense to use >strong<), etc.
Yes I guess it's pretty much removing whitespace and comments. You cannot replace identifiers with shorter ones like in javascript, since chances are that CSS classes or javascript will depend on those identifiers.
Also, you should be careful when removing whitespace and make sure that there is always at least whitespace character left, otherwise allyourtextwilllooklikethis.
There's a pretty lengthy discussion on this Wordpress blog about this topic. You can find a very lengthy proposed solution using PHP and HTML Tidy there.
You can find some good references here to things like HTML tidy and others.
If you don't want to use one of those options, Prototype has a means to clean the whitespace in the DOM. You could do that on your own and copy it via 'View Generated Source' in the Firefox extension Web Developer Toolbar. Then you can replace the original html with prototype's fix. Sorry for not making that apparent nickf.
(I recommend the first link)
I haven’t tried it yet, but htmlcompressor is an HTML minifier, if you fancy giving one a try.
If you have installed node.js and you are a windows user you can create this .bat
It will minify all html in your folder in the min subfolder.
The output will be in min folder
open the console. run--> npm install html-minifier -g
create the .bat. don't forget to change the route in cd command. It's easier to change the folder in the bat file than copy and paste.
go in console into the .bat folder and run it.
cd the_destination_folder
dir /b *.HTML > list1.txt
for /f "tokens=*" %%A in (list1.txt) do html-minifier --collapse-whitespace --remove-comments --remove-optional-tags %%~nxA -o min\%%~nxA
pause
Couldn't JavaScript be used as a decompresser for a compressed HTML string, for instance have a DEV build for the uncompressed format, run a 'publish' script to compress the DEV build to production and attach a JavaScript to the HTML source (with the whitespace and such removed as before)?
The bandwidth would be reduced on the server, but the downside is there is a lot more client strain for decompressing the string to HTML. Also JavaScript would need to be enabled and be able to parse the decompressed string to HTML.
I am not saying its a definite solution, but something that might work - it all depends on if your looking in regards to bandwidth without the users JavaScript permissions/systems spec, or such.
Otherwise look for obfuscation scripts, a simple google search produced http://tinyurl.com/phpob - dependent on what your looking for there should be a software package available.
If I am on the wrong lines, please shout and I will see what else I can do.
Good Luck!
I recently found a PHP based script that minify your sites HTML - Inline css - Inline javascript on the fly it is called as
Dynamic website compressor
I've used this regexp for years, without any problems: s/>\s*</></g
In Python re.sub(r'>\s*<', '><', html)
Or in PHP preg_replace('/>\s*</', '><', $html);
This removed all whitespace between tags, but not anywhere, this is fairly safe (but not perfect, there are situations where this will break, but they're rare).
My main reason for doing this isn't speed/file size, but because the whitespace often introduces a, well, space. This would be okay, but when you start mucking about in your DOM with Javascript, spaces are often lost, creating (minor) layout differences.
Consider:
<div>
<a>link1</a>
<a>link2</a>
</div>
There's a space between the links, but now I do something like:
$('div').append('<a>link3</a>')
And there's no space ... I need to manually add the space in my JS, which is fairly ugly & error-prone IMHO.
Here is a minifier for HTML5 written in PHP.
<?PHP
$in=file_get_contents('path/to/source.html');
//Strips spaces if there are more than one.
$in=preg_replace('/\s{2,}/m',' ',$in);
//trim
$in=preg_replace('/^\s+|\s+$/m','',$in);
/*Strips spaces between tags.
Use ( or or better) padding or margin if necessary, otherwise the html
parser appends a one space textnode.*/
$in=preg_replace('/ ?> < ?/','><',$in);
//Removes tag end slash.
$in=preg_replace('# ?/>#','>',$in);
//Removes HTML comments except conditional IE comments.
$in=preg_replace('/<!--[^\[]*?-->/','',$in);
//Removes quotes where possible.
$in=preg_replace('/="([^ \'"\=><]+)"/','=$1',$in);
$in=preg_replace("/='([^ '\"\=><]+)'/",'=$1',$in);
file_put_contents('path/to/min.html',$in);
?>
After that you have a one line, shorter html code.
Better you make an array from the regular expressions, but aware to escape the back slashes.