Multiple css files or one big css file? - html

Which one is better and faster? why?
using only one file for styling:
css/style.css
or
using several files for styling:
css/header.css
css/contact.css
css/footer.css
css/tooltip.css
The reason Im asking it is that im developing a site for users who have very low internet speed. country uganda. So I want to make it as fast as possible.

Using a single file is faster because it requires less HTTP requests (assuming the amount of styles loaded is still the same).
So it's better to keep it in just one file.
Separating CSS should only be done if you want to keep for example IE specific classes separate.

As per Yahoo's Performance Rules [source], It is VERY IMPORTANT to minimize HTTP requests
From the source
Combined files are a way to reduce the number of HTTP requests by combining all scripts into a single script, and similarly combining
all CSS into a single stylesheet. Combining files is more challenging
when the scripts and stylesheets vary from page to page, but making
this part of your release process improves response times.
It is quite uneasy to develop using combined files, so stick to developing with multiple files but you should combine the files once you are deploying the system on the web.
I really recommend using boilerplate's ant build script. You can find it here.
It Combines and minifies CSS

One css file is better than multiple css files because of the overhead involved by the end user's browser to make multiple requests for each file. Other things you can do yo improve the performance include:
Enable gzip impression on your webserver e.g. on Apache so that the files are compressed before downloading
where possible host your files geographically as close to the majority of your end users as possible
use a CDN network for your static content such as css files
Use CSS sprites
Cache your content
Note that there are tools available to help you do this. See 15 ways to optimise css for more information

This is always a better solution to bundle or combine multiple CSS or JavaScript files into fewer HTTP requests. This causes the browser to request a lot fewer files and in turn reduces the time it takes to fetch them.
With a proper caching, you can gain extra bandwidth and even fewer HTTP request.
Update:
There's a new Bundling feature in ASP.Net 4.5 which you might be interested in.
This allows you to have css files separated at compile-time, and in runtime gain benefit of combined resources into one resource

One resource file is always the fastest approach since you reduce the number of HTTP requests made to fetch those files.
I would suggest to use Yslow which is a great extension for firebug that analyzes web pages and suggests ways to improve their performance.

Related

Is higher number of different folders affecting website load speed?

I have one question about page load speed. Consider a case when I have different kinds of images on my web page, such as icons, logos, images inside content etc....
I want to know whether having separate folders for each media category may affect the page load speed:
/logos
/icons
/images
Will the webpage load faster if the images of all categories were located in a single folder rather in multiple ones?
Thanks in advance for your advice.
Even though performance-related questions often get closed due to them not being answerable without benchmarks on the machine, this one is worth an answer since unless you run a potato-based computer, you won't have any performance impact.
Directories are not actually physical folders like you would have in real life.
They are simply registers of pointers to disk spaces where your files are stored. (Of course this is massively over-simplified as it involves file-systems and more low-level stuff, but that's not needed at that point).
To come back to your question, the difference between loading two files from two directories:
/var/foobar/dir1/image1.jpeg
/var/foobar/dir2/image2.jpeg
or one directory:
/var/foobar/dir1/image1.jpeg
/var/foobar/dir1/image2.jpeg
...is that your file system will have to look-up two different directories tables. With modern file-systems and moderate (even low-end) hardware, this causes no issues.
As #AjitZero mentioned, here your performance impact will come from the size of the files, the number of distinct HTTP requests (i.e.: How many images, CSS, scripts, etc...) and the way you cache data on the user's computer.
No, the number of folders doesn't affect the speed of page-load.
Lesser number of HTTP requests matter, however, so you can use sprite sheets.

Inroducing service-workers on mobile site

I'm new to service workers. I want to integrate service workers in my site.My motive is to improve the performance of my website not making the website offline.Its a real estate website.
So what i have done till now is create modular templates of my site and store them in the cache.
for e.g. template1
<div>
<p>#data</p>
<div>
whenever a fetch occurs on my page i first call an api through ajax get the data and replace the #data variable in the cache response by actual api response and then i returned the new response to the browser.
Question 1:- So i want to know is that the right approach. for html template caching?
In the above approach i'm getting challenges like loops and conditional statements in my html.
Question 2:- Is there any way that i can cache the templates with loops and change them at run time?.
Question 3:- Say if i show the cached app-shell to the user initially, so is that going to effect my site's SEO ranking?.
Question 4:- I have to write new templates of the existing code, which means i have to maintain two codes one for service-workers and other for normal browsers which dont support service workers.Any solution to this?
Regards
That's a lot of questions and I don't think service workers are necessarily your best bet.
Question 1: Personally I recommend using a framework such as KnockoutJS, Angular, Polymer...etc for your HTML templates. These often have template caching built-in.
Question 2: Instead of the approach you are using whereby you are replacing the variables before 'sending them to the browser' most frameworks would use some form of data bindings which would take care of iterations and conditions within the browser.
Question 3: Caching the app-shell would have no effect on SEO and Google has been parsing JavaScript for a while; however personally I would recommend that the website's content loads without JavaScript and then JavaScript is only used to enhance the experience. This would be the same whether using the app-shell model or not.
Question 4: I do not understand your current setup and this would not be an ordinary scenario so you might have something wrong.
Service workers and the Cache API are ordinarily used to cache your static assets, usually fonts, css, JavaScript and HTML templates and should result in improved performance as there are less HTTP requests; but there are other ways to improve performance that will address all of your questions without the use of service workers.

Under what circumstances would loading images individually with HTTP/2 be slower than loading all images at once with a sprite a la HTTP/1.1?

HTTP/2 makes it possible to multiplex connections, eliminating the need for more than one connection to a server. Over a single connection, many individual images can be sent down to the client. This obviates the old image sprite pattern of combining many images into one and using CSS to cut it apart.
I'm curious if sprites would still actually be faster in an HTTP/2 world. If so, under what circumstances?
Sprites, as you will know, are used to prevent multiple requests being queued, so with one payload you could get all the sprites for the site.
But with sprites you tend to get lots of additional icons that are used throughout the website that aren't all needed on any single page.
So with http/2 multiplexing, queuing resources is no longer an issue. You get the speed benefit when you only download the files needed for each page.
However you may get better compression by combining some images into a single file, making the overall size of file transfers smaller.
Speed tests run by Benoît Béraud and Alexandre Masselot have given an example of a sprite sheet loading faster than individual sprites. They concluded that sprite sets can still be used to optimise site performance when using http/2 http://blog.octo.com/en/http2-arrives-but-sprite-sets-aint-no-dead/
Extended write up about http/2 by Rachel Andrew can be found here:
https://www.smashingmagazine.com/2016/02/getting-ready-for-http2/
With HTTP/2 multiplexing, the server will be reading a lot of small files instead of reading a single big file. If the server is resource-limited (some Internet-of-Things contraption, for example), then you might be able to find a situation where it's better to have it making the single, big read instead of a lot of small ones, since each read causes the server OS to potentially do a lot of file-access-related operations.
On the client side, the browser will be managing a lot of small files, instead of a big one. I can imagine the code path used for the current sprite workflow being well massaged and optimized, since it's so commonly used. So it might happen that the new case of having a lot of small files could be slower, at least for a time.

Why people always encourage single js for a website?

I read some website development materials on the Web and every time a person is asking for the organization of a website's js, css, html and php files, people suggest single js for the whole website. And the argument is the speed.
I clearly understand the fewer request there is, the faster the page is responded. But I never understand the single js argument. Suppose you have 10 webpages and each webpage needs a js function to manipulate the dom objects on it. Putting 10 functions in a single js and let that js execute on every single webpage, 9 out of 10 functions are doing useless work. There is CPU time wasting on searching for non-existing dom objects.
I know that CPU time on individual client machine is very trivial comparing to bandwidth on single server machine. I am not saying that you should have many js files on a single webpage. But I don't see anything go wrong if every webpage refers to 1 to 3 js files and those js files are cached in client machine. There are many good ways to do caching. For example, you can use expire date or you can include version number in your js file name. Comparing to mess the functionality in a big js file for all needs of many webpages of a website, I far more prefer split js code into smaller files.
Any criticism/agreement on my argument? Am I wrong? Thank you for your suggestion.
A function does 0 work unless called. So 9 empty functions are 0 work, just a little exact space.
A client only has to make 1 request to download 1 big JS file, then it is cached on every other page load. Less work than making a small request on every single page.
I'll give you the answer I always give: it depends.
Combining everything into one file has many great benefits, including:
less network traffic - you might be retrieving one file, but you're sending/receiving multiple packets and each transaction has a series of SYN, SYN-ACK, and ACK messages sent across TCP. A large majority of the transfer time is establishing the session and there is a lot of overhead in the packet headers.
one location/manageability - although you may only have a few files, it's easy for functions (and class objects) to grow between versions. When you do the multiple file approach sometimes functions from one file call functions/objects from another file (ex. ajax in one file, then arithmetic functions in another - your arithmetic functions might grow to need to call the ajax and have a certain variable type returned). What ends up happening is that your set of files needs to be seen as one version, rather than each file being it's own version. Things get hairy down the road if you don't have good management in place and it's easy to fall out of line with Javascript files, which are always changing. Having one file makes it easy to manage the version between each of your pages across your (1 to many) websites.
Other topics to consider:
dormant code - you might think that the uncalled functions are potentially reducing performance by taking up space in memory and you'd be right, however this performance is so so so so minuscule, that it doesn't matter. Functions are indexed in memory and while the index table may increase, it's super trivial when dealing with small projects, especially given the hardware today.
memory leaks - this is probably the largest reason why you wouldn't want to combine all the code, however this is such a small issue given the amount of memory in systems today and the better garbage collection browsers have. Also, this is something that you, as a programmer, have the ability to control. Quality code leads to less problems like this.
Why it depends?
While it's easy to say throw all your code into one file, that would be wrong. It depends on how large your code is, how many functions, who maintains it, etc. Surely you wouldn't pack your locally written functions into the JQuery package and you may have different programmers that maintain different blocks of code - it depends on your setup.
It also depends on size. Some programmers embed the encoded images as ASCII in their files to reduce the number of files sent. These can bloat files. Surely you don't want to package everything into 1 50MB file. Especially if there are core functions that are needed for the page to load.
So to bring my response to a close, we'd need more information about your setup because it depends. Surely 3 files is acceptable regardless of size, combining where you would see fit. It probably wouldn't really hurt network traffic, but 50 files is more unreasonable. I use the hand rule (no more than 5), but surely you'll see a benefit combining those 5 1KB files into 1 5KB file.
Two reasons that I can think of:
Less network latency. Each .js requires another request/response to the server it's downloaded from.
More bytes on the wire and more memory. If it's a single file you can strip out unnecessary characters and minify the whole thing.
The Javascript should be designed so that the extra functions don't execute at all unless they're needed.
For example, you can define a set of functions in your script but only call them in (very short) inline <script> blocks in the pages themselves.
My line of thought is that you have less requests. When you make request in the header of the page it stalls the output of the rest of the page. The user agent cannot render the rest of the page until the javascript files have been obtained. Also javascript files download sycronously, they queue up instead of pull at once (at least that is the theory).

How do web spiders differ from Wget's spider?

The next sentence caught my eye in Wget's manual
wget --spider --force-html -i bookmarks.html
This feature needs much more work for Wget to get close to the functionality of real web spiders.
I find the following lines of code relevant for the spider option in wget.
src/ftp.c
780: /* If we're in spider mode, don't really retrieve anything. The
784: if (opt.spider)
889: if (!(cmd & (DO_LIST | DO_RETR)) || (opt.spider && !(cmd & DO_LIST)))
1227: if (!opt.spider)
1239: if (!opt.spider)
1268: else if (!opt.spider)
1827: if (opt.htmlify && !opt.spider)
src/http.c
64:#include "spider.h"
2405: /* Skip preliminary HEAD request if we're not in spider mode AND
2407: if (!opt.spider
2428: if (opt.spider && !got_head)
2456: /* Default document type is empty. However, if spider mode is
2570: * spider mode. */
2571: else if (opt.spider)
2661: if (opt.spider)
src/res.c
543: int saved_sp_val = opt.spider;
548: opt.spider = false;
551: opt.spider = saved_sp_val;
src/spider.c
1:/* Keep track of visited URLs in spider mode.
37:#include "spider.h"
49:spider_cleanup (void)
src/spider.h
1:/* Declarations for spider.c
src/recur.c
52:#include "spider.h"
279: if (opt.spider)
366: || opt.spider /* opt.recursive is implicitely true */
370: (otherwise unneeded because of --spider or rejected by -R)
375: (opt.spider ? "--spider" :
378: (opt.delete_after || opt.spider
440: if (opt.spider)
src/options.h
62: bool spider; /* Is Wget in spider mode? */
src/init.c
238: { "spider", &opt.spider, cmd_boolean },
src/main.c
56:#include "spider.h"
238: { "spider", 0, OPT_BOOLEAN, "spider", -1 },
435: --spider don't download anything.\n"),
1045: if (opt.recursive && opt.spider)
I would like to see the differences in code, not abstractly. I love code examples.
How do web spiders differ from Wget's spider in code?
A real spider is a lot of work
Writing a spider for the whole WWW is quite a task --- you have to take care about many "little details" such as:
Each spider computer should receive data from a few thousand servers in parallel in order to make efficient use of the connection bandwidth. (asynchronous socket i/o).
You need several computers that spider in parallel in order to cover the vast amount of information on the WWW (clustering; partitioning the work)
You need to be polite to the spidered web sites:
Respect the robots.txt files.
Don't fetch a lot of information too quickly: this overloads the servers.
Don't fetch files that you really don't need (e.g. iso disk images; tgz packages for software download...).
You have to deal with cookies/session ids: Many sites attach unique session ids to URLs to identify client sessions. Each time you arrive at the site, you get a new session id and a new virtual world of pages (with the same content). Because of such problems, early search engines ignored dynamic content. Modern search engines have learned what the problems are and how to deal with them.
You have to detect and ignore troublesome data: connections that provide a seemingly infinite amount of data or connections that are too slow to finish.
Besides following links, you may want to parse sitemaps to get URLs of pages.
You may want to evaluate which information is important for you and changes frequently to be refreshed more frequently than other pages. Note: A spider for the whole WWW receives a lot of data --- you pay for that bandwidth. You may want to use HTTP HEAD requests to guess whether a page has changed or not.
Besides receiving, you want to process the information and store it. Google builds indices that list for each word the pages that contain it. You may need separate storage computers and an infrastructure to connect them. Traditional relational data bases don't keep up with the data volume and performance requirements of storing/indexing the whole WWW.
This is a lot of work. But if your target is more modest than reading the whole WWW, you may skip some of the parts. If you just want to download a copy of a wiki etc. you get down to the specs of wget.
Note: If you don't believe that it's so much work, you may want to read up on how Google re-invented most of the computing wheels (on top of the basic Linux kernel) to build good spiders. Even if you cut a lot of corners, it's a lot of work.
Let me add a few more technical remarks on three points
Parallel connections / asynchronous socket communication
You could run several spider programs in parallel processes or threads. But you need about 5000-10000 parallel connections in order to make good use of your network connection. And this amount of parallel processes/threads produces too much overhead.
A better solution is asynchronous input/output: process about 1000 parallel connections in one single thread by opening the sockets in non-blocking mode and use epoll or select to process just those connections that have received data. Since Linux kernel 2.4, Linux has excellent support for scalability (I also recommend that you study memory-mapped files) continuously improved in later versions.
Note: Using asynchronous i/o helps much more than using a "fast language": It's better to write an epoll-driven process for 1000 connections written in Perl than to run 1000 processes written in C. If you do it right, you can saturate a 100Mb connection with processes written in perl.
From the original answer:
The down side of this approach is that you will have to implement the HTTP specification yourself in an asynchronous form (I am not aware of a re-usable library that does this). It's much easier to do this with the simpler HTTP/1.0 protocol than the modern HTTP/1.1 protocol. You probably would not benefit from the advantages of HTTP/1.1 for normal browsers anyhow, so this may be a good place to save some development costs.
Edit five years later:
Today, there is a lot of free/open source technology available to help you with this work. I personally like the asynchronous http implementation of node.js --- it saves you all the work mentioned in the above original paragraph. Of course, today there are also a lot of modules readily available for the other components that you need in your spider. Note, however, that the quality of third-party modules may vary considerably. You have to check out whatever you use. [Aging info:] Recently, I wrote a spider using node.js and I found the reliability of npm modules for HTML processing for link and data extraction insufficient. For this job, I "outsourced" this processing to a process written in another programming language. But things are changing quickly and by the time you read this comment, this problem may already a thing of the past...
Partitioning the work over several servers
One computer can't keep up with spidering the whole WWW. You need to distribute your work over several servers and exchange information between them. I suggest to assign certain "ranges of domain names" to each server: keep a central data base of domain names with a reference to a spider computer.
Extract URLs from received web pages in batches: sort them according to their domain names; remove duplicates and send them to the responsible spider computer. On that computer, keep an index of URLs that already are fetched and fetch the remaining URLs.
If you keep a queue of URLs waiting to be fetched on each spider computer, you will have no performance bottlenecks. But it's quite a lot of programming to implement this.
Read the standards
I mentioned several standards (HTTP/1.x, Robots.txt, Cookies). Take your time to read them and implement them. If you just follow examples of sites that you know, you will make mistakes (forget parts of the standard that are not relevant to your samples) and cause trouble for those sites that use these additional features.
It's a pain to read the HTTP/1.1 standard document. But all the little details got added to it because somebody really needs that little detail and now uses it.
I am not sure exactly what the original author of the comment was referring to, but I can guess that wget is slow as a spider, since it appears to only use a single thread of execution (at least by what you have shown).
"Real" spiders such as heritrix use a lot of parallelism and tricks to optimize their crawling speed, while simultaneously being nice to the website they are crawling. This typically means limiting hits to one site at a rate of 1 per second (or so), and crawling multiple websites at the same time.
Again this is all just a guess based on what I know of spiders in general, and what you posted here.
Unfortunately, many of the more well-known 'real' web spiders are closed-source, and indeed closed-binary. However there are a number of basic techniques wget is missing:
Parallelism; you're never going to be able to keep up with the entire web without retrieving multiple pages at a time
Prioritization; some pages are more important to spider than others
Rate limiting; you'll be banned quickly if you keep pulling down pages as quickly as you can
Saving to something other than a local filesystem; the Web is big enough that it's not going to fit in a single directory tree
Rechecking pages periodically without restarting the entire process; in practice, with a real spider you'll want to recheck 'important' pages for updates frequently, while less interesting pages can go for months.
There are also various other inputs that can be used such as sitemaps and the like. Point is, wget isn't designed to spider the entire web, and it's not really a thing that can be captured in a small code sample, as it's a problem of the whole overall technique being used, rather than any single small subroutine being wrong for the task.
I'm not going to go into details of how to spider the internet, I think that wget comment is regarding to spidering one website which is still a serious challenge.
As a spider you need to figure out when to stop, not go into recursive crawls just because the URL changed like date=1/1/1900 to 1/2/1900 and so
Even bigger challenge to sort out URL Rewrite (I have no clue what so ever how google or any other handles this). It's pretty big challenge to crawl enough but not too much. And how one can automatically recognise URL Rewrite with some random parameters and random changes in the content?
You need to parse Flash / Javascript at least up to some level
You need to consider some crazy HTTP issues like base tag. Even parsing the HTML is not easy, considering most of the websites are not XHTML and browsers are so flexible in the syntax.
I don't know how much of these implemented or considered in wget but you might want to take a look at httrack to understand the challenges of this task.
I'd love to give you some code examples but this is big tasks and a decent spider will be about 5000 loc without 3rd party libraries.
+ Some of them already explained by #yaakov-belch so I'm not going to type them again