The codebases for modern web browsers like Chrome, Firefox, and Safari (WebKit) are quite large. I am curious about what specifically makes their implementations so non-trivial that they require vast amounts of code.
As a corollary question, if a hypothetical browser only supported strict HTML5 and JavaScript, to avoid compatibility hacks, would the codebase be significantly smaller?
For your first question, consider the things a modern browser needs to implement (some browsers push some of this work out to operating system services):
Several parsers: XML, HTML, JavaScript, CSS, at least.
At least four separate layout systems (CSS box model, flexbox, SVG, MathML).
At least one graphics library; for cross-platform browsers this needs per-platform backends (IE9+ just uses the system Direct2D library; Safari on Mac just uses Quartz as far as I know).
A high-performance virtual machine with a JIT, a garbage collector, a bit of a standard library (growing all the time; see typed arrays and various other recent JavaScript features).
A DOM implementation, including various things like the HTML-specific and SVG-specific DOM interfaces and so forth.
Audio and video processing facilities (again Safari on Mac and IE offload these to the operating system).
Image processing facilities, with support for at least JPG/GIF/PNG. Again, some browsers may be able to offload parts of this to the operating system.
A library for converting byte streams to Unicode characters. Again, sometimes this can be offloaded to the operating system and sometimes not.
For cross-platform browsers, some sort of portability layer that abstracts away the platform-specific bits.
An HTML editor with transactions and a programmable API; think contenteditable.
A plaintext editor for textareas. Some of this can be shared with the HTML editor, maybe.
A spellchecker, which may or may not be offloaded to the OS.
A network library supporting HTTP, maybe SPDY, probably FTP, and maybe a few other protocols. Again, this may or may not be offloaded to the OS.
A cryptographic library to handle SSL and various other cryptography needs. Again, this may or may not be offloaded to the OS.
At least one database implementation (sqlite seems to be popular).
Various code for the actual user interface and whatnot.
Glue code to handle interactions between all these: code that manages calls back and forth between JavaScript and the DOM, code that manages recomputing style and layout information when the DOM changes, code that handles things like document.write injecting strings from JavaScript into the parser's input stream, and so forth. Note that the amount of glue code is generally quadratic in the number of interacting modules.
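To make that last point concrete, here is a minimal sketch of the document.write case, run from an inline script while the page is still being parsed:

    // While the HTML parser is partway through the document, this script
    // injects new markup into the parser's input stream, and the parser
    // must resume as if the injected text had been there all along.
    document.write("<p>Injected mid-parse</p>");

    // By the next statement the injected paragraph is already in the DOM:
    console.log(document.getElementsByTagName("p").length);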
I'm probably missing a few things, but that's off the top of my head.
In addition to this at least Gecko and WebKit have template libraries for things like strings and arrays (because the C++ standard library ones have various drawbacks).
For the rest... at this point a lot of the "compatibility hacks" are actually part of web standards. So you can't exactly avoid them. Your scenario talks about JavaScript and HTML but not SVG or MathML or CSS. If you really just mean HTML and JavaScript but not CSS or the rest, then you could obviously cut out a bunch of code. If you include all of those, plus the audio and video capabilities of HTML5 and want your browser to perform well, then I doubt you can make it much smaller.
I think modern web browsers are complicated apps. Mainly, they have rendering engines that have to handle different kinds of HTML, the ability to deal with non-HTML formats (like XML, RSS, etc.), CSS handling, and JavaScript engines, sometimes with a JIT.
Apart from that, they have plugin architectures and APIs, parts that abstract away differences between platforms, and they are usually built using components that other apps use too.
This makes them quite non-trivial. As for your corollary, I think so: Lynx is quite small, and it doesn't support JavaScript or fancy HTML.
Related
In the field of RIAs, I've read tons of comments stating that the discussion about Silverlight vs JavaFX vs Flash vs HTML5 is outdated and the winner is HTML5.
Since I am a programmer (not a designer) who has never used any of the technologies above, and I have no time to learn them all just to compare them, I want to ask the following:
1) With HTML5, do we still only have interpreted JavaScript, or can we use more powerful languages that generate compiled code (some kind of MSIL or bytecode inside <object> ... </object> tags)?
2) Does HTML5 hide portions of our code from unwanted viewers (like Java applets and ActiveX did in the past), or does "View Source" continue to show all of our work?
3) Does HTML5 need some kind of runtime, or is all the work done by the browser?
There is a bit of a fundamental problem with your question in that HTML5 is not really a single thing. It's hard to compare it to Java or Flash, which are programming platforms. It is possible to create interactive applications using HTML and JavaScript without using any of the features of HTML5. This seems to happen a lot, but for simplicity these are often referred to as HTML5 applications -- especially by non-technical people.
1) I would say that JavaScript can be as powerful as the other compiled languages you mention, even on a webpage -- especially with the power of <canvas>. The comments you've read declare HTML5 the winner for a reason: in my opinion, it can do what the others can do and is simpler to implement.
There is nothing that forbids you from including Flash objects or applets alongside JavaScript, though, and they can even interact.
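For a sense of what <canvas> offers, here is a minimal sketch (the element id is hypothetical), assuming the page contains a <canvas id="stage" width="200" height="120"> element:

    // The 2D context is an immediate-mode drawing API.
    var ctx = document.getElementById("stage").getContext("2d");
    ctx.fillStyle = "#336699";
    ctx.fillRect(10, 10, 120, 60);              // a filled rectangle
    ctx.fillStyle = "#000000";
    ctx.fillText("Hello from canvas", 10, 100); // text drawn as pixels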
2) There is nothing that prevents all of your HTML/JavaScript from being downloaded by the browser and viewed in plain text, although it can be obfuscated with tools such as UglifyJS.
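As a rough sketch of what that looks like (the uglify-js API shown is the 3.x style and may differ between versions; the sample function is hypothetical):

    // Minification/obfuscation with UglifyJS under Node.js
    var UglifyJS = require("uglify-js");

    var source = "function add(first, second) { return first + second; }";
    var result = UglifyJS.minify(source, { compress: true, mangle: true });

    if (result.error) throw result.error;
    console.log(result.code); // something like: function add(n,r){return n+r}

This makes the code harder to read, but not impossible; a determined viewer can still study it.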
It's also possible to download and decompile SWFs and JARs from a web interface, so compiling does not necessarily offer you a lot of protection anyway. Your code would be protected by copyright (at least in the US), and you could use a license such as the MIT license too.
3) All the work is done by the browser. The client will only need a browser to run your code, but some browsers do not support some features you may want to use. This especially applies to older browsers.
There are a ton of frameworks and libraries out there for creating rich HTML/JavaScript applications, but these are mostly just JavaScript files.
HTML5 is a brand and a trend. In the Silverlight/Flash/HTML battle, HTML had to be the winner, because browsers can't progress in a fragmented way, with some features developed in browser plugins and some in the browsers themselves. Strong, fast progress always needs a direction from the start, and in redefining the web's place in our lives, that direction is HTML5. HTML5 is not a language; it is a set of capabilities -- video, audio, WebGL, geolocation, semantic elements, and more -- available straight from the browser. So we can't speak about HTML5 as a language here.
All your questions are about JavaScript.
Want bytecode inside <object> ... </object>? Use Chrome Native Client.
Want to hide your code? Use any obfuscator.
Modern JavaScript engines have just-in-time compilation. And there are "subsets" of JavaScript, like asm.js, that run only about 2x slower than C++.
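To illustrate the style asm.js imposes, here is a minimal sketch (module and function names hypothetical); it is a strict, statically typed subset of JavaScript that engines can compile ahead of time, and it still runs as plain JavaScript everywhere else:

    function AsmAdder(stdlib, foreign, heap) {
      "use asm";
      function add(a, b) {
        a = a | 0;          // type annotation: a is a 32-bit integer
        b = b | 0;
        return (a + b) | 0; // result coerced back to a 32-bit integer
      }
      return { add: add };
    }

    // Works in any engine; asm.js-aware engines can compile it up front:
    var adder = AsmAdder(window, null, new ArrayBuffer(0x10000));
    console.log(adder.add(2, 3)); // 5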
Every now and again I get asked to install something like this on a customer web server, or we're asked if we support BrowserHawk (which we don't).
I'm wondering if Modernizr is something I can point my customers at and tell them to use instead?
I've not used BrowserHawk (in fact, I'd never heard of it until now), so please don't take my opinion as infallible.
However, I do know about browscap.ini, and having taken a few moments to read the BrowserHawk website, I'm fairly certain it's also a server-side browser detection tool.
If that's the case, then the answer is 'Yes'. Current best practice says to avoid using server-side browser detection, and to use client-side feature detection instead. And this is exactly what Modernizr does.
Feature detection allows you to do much finer-grained tuning of your site according to what the user's browser is capable of, rather than simply blocking users who have (or don't have) a particular browser. It also allows you to implement specific fall-back solutions for specific features, if required.
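To make the distinction concrete, here is a minimal sketch of feature detection (the two handler functions are hypothetical); Modernizr exposes one boolean per feature it tests:

    // Client-side feature detection: ask what the browser can do,
    // not which browser it claims to be.
    if (Modernizr.canvas) {
      drawFancyChart();       // hypothetical <canvas>-based rendering
    } else {
      renderHtmlTableChart(); // hypothetical fallback for older browsers
    }

    // The same idea without a library:
    var hasLocalStorage = (function () {
      try { return "localStorage" in window && window.localStorage !== null; }
      catch (e) { return false; }
    })();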
Detecting the user's browser from the server-side is a problem because of the rapid pace of change in the browser market; you would need to be constantly updating your browser detection script to cope with new versions.
In addition, users of slightly more unusual browsers or browser shells may not be detected properly by a browser detection script, so they may have trouble with sites that use it, even though their browser should be capable of displaying the site. Also, some users may not provide the user-agent string required to correctly detect their browser; it is blocked by some proxies, firewalls, etc, and some browsers also allow it to be modified, so it can be spoofed easily if a user wants to.
But having gone to lengths to promote feature detection over browser detection, I need to point out one exception to all of this, and that's IE.
Older versions of IE have a lot of bugs. This is different to simply having missing features, because you can't actively check for bugs so easily. If you're having specific issues with IE bugs, then it is legitimate to do browser detection to avoid them. (feature detection is still valid if you're only worried about what the browser supports, rather than actual bugs)
But even in this case, a tool such as browscap.ini or BrowserHawk is unnecessary. IE helpfully supports conditional comments, which allow you to add specific code for IE without having to go out of your way to detect it.
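For example, a conditional comment looks like this (the script filename is hypothetical); every browser other than IE treats it as an ordinary HTML comment and ignores it:

    <!--[if lt IE 9]>
      <script src="ie8-workarounds.js"></script>
    <![endif]-->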
I am learning how to program and my goal is to build a simple functional prototype...I'm at the very beginning.
I am not concerned with the visual design at this stage, other than as it relates to being able to demonstrate the functionality.
My question is: do I need to worry about ironing out cross-browser bugs in the HTML/CSS, or can I do development on a single browser? (Perhaps a better way of asking this: does the back-end programming have any effect on which browser is displaying it?)
If you are at the very beginning and only want a functional prototype, do not worry about cross-browser HTML/CSS. In fact, forget the CSS altogether and focus on outputting just standard HTML. Since the visual design will change, focus on the content; styles can always be applied and switched later.
If you need JavaScript/AJAX stuff, I would recommend using a library like jQuery that has already solved many cross-browser problems for you.
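For instance, a sketch of the kind of thing jQuery smooths over (the URL and element ids are hypothetical, and it assumes the server returns a JSON array of strings):

    // jQuery normalizes event handling and XMLHttpRequest quirks
    // across browsers, so one code path works everywhere.
    $(document).ready(function () {
      $("#load-items").click(function () {
        $.get("/api/items", function (items) {
          $("#item-list").html("<li>" + items.join("</li><li>") + "</li>");
        });
      });
    });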
The back-end stuff (Perl, PHP, Python, etc.) shouldn't care about the browser, as it is simply printing text for the browser to render as it will.
The back-end programming will affect the way a given browser displays your page and there might well be two schools of thought on whether you should be picky about the browser compatibility issues.
On the one hand, if you're just finding your feet in web development it might be asking too much to expect to have a standards-perfect, cross-browser site or application every time. It might be better to focus on actually accomplishing a finished result and learning as much syntax and technique as possible.
On the other hand, it might be argued that it's a good idea to get into the habit of adopting good practices now and recognising the sorts of things that are going to give you headaches... probably when you view your page in Internet Explorer. This takes more time to reach a finished product, but it would teach you good habits up-front.
Really it comes down to your own approach and preferences. Do you want to be detail-oriented and turn out a polished result in a longer period of time, or would you prefer to just get to the finish line and identify issues on a case-by-case basis?
Do car prototypes have a working stereo, leather upholstery, chrome rims, dice, and other random stuff which does not demonstrate the functionality of the newly-designed car?
My rule of thumb is that if it takes you more than 10 minutes to make it look acceptable to others (I'm completely fine with a disgusting design when prototyping), you're spending too much time on the aesthetics and less time on the actual clockwork.
What good does a "pretty-looking" site do if it has no functional layout?
This depends on both your audience and on your tooling. If you are trying to support all users on all browsers, then you will certainly need to do testing on those browsers (although actively developing on those browsers may not be necessary), whereas if you only need to support WebKit-based browsers (Chrome, Safari) or WebKit-based browsers and Firefox, that is less testing that you need to do.
It also depends on your tooling. For example, if you are writing directly in HTML and CSS, then you are much more likely to run into browser compatibility issues. However, if you use a tool such as GWT, which can generate browser-specific output automatically, there are fewer such issues to deal with.
Note that you can use Selenium (aka WebDriver), to automatically test your code on multiple different browsers, even if you only actively develop within a single browser environment. That way, you can know if you've broken something, but not have to constantly manually test in multiple browsers.
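A minimal sketch of that approach with the Node.js selenium-webdriver package (the URL and selector are hypothetical):

    const { Builder, By } = require("selenium-webdriver");

    async function smokeTest(browserName) {
      // Launches a real browser and drives it programmatically.
      const driver = await new Builder().forBrowser(browserName).build();
      try {
        await driver.get("http://localhost:8000/");
        const heading = await driver.findElement(By.css("h1")).getText();
        console.log(browserName + ": " + heading);
      } finally {
        await driver.quit();
      }
    }

    // Develop in one browser, but verify in several:
    ["chrome", "firefox"].forEach(function (name) { smokeTest(name); });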
This question has a discussion of progressive enhancement. My question is about the alternative type of web application. If you have a web app in which the UI is constructed almost entirely in Javascript, won't gracefully degrade, has a desktop feel, etc., what is that kind of web application called?
Do you mean this type or the opposite of this type:
"Rich Internet Application" where you could have an application that runs on for example AIR.
to me, what you describe seems to be a JavaScript based fat client ... i see nothing wrong in that ...
the thing that everyone forgets is that HTML means hypertext markup language ... it is a format for describing documents and was never designed to capture the functionality that some HTML-based apps offer nowadays ...
the answer "RIA" seems the best to me ... of course that includes flash and silverlight ... but your choice of HTML+JS is completely arbitrary in this case, because you manipulate the HTML DocumentObjectModel with JavaScript as a flash developer would manipulate the flash DisplayObjectModel with ActionScript ...
there are simply web apps, that are document and form based ... they have a CRUD infrastructure for some type of data, that is accessed in a RESTful, or at least RESTish way ... this type of apps can employ progressive enhancement, using HTML to capture its semantics and plain HTTP for all client<->server communication... i'd tend to simply call this kind of web app a web site ... having a bit of funky AJAX won't change that really ... i mean, from a simple guest book, to a forum, to stackoverflow, the basic idea never changes ... and a guest book does not make a web application, does it?
there are web apps, where the state is fully maintained by a much richer client, because these apps do a lot of granular data manipulation, as opposed to the document based CRUD web apps, and to me, this is the type of web application actually deserving the name, but i'd call them RIAs, to emphasize the difference ... in some cases this solution is faster, more lightweight, scalable, usable, easier and faster to develop/maintain/extend, and simply more natural ... this choice is often based on the type of data they deal with, as well as the functionality exposed for manipulating that data ... for example, if you were to implement a game like tetris, progressive enhancement wouldn't be the way to go ... instead, in order to create such apps, willful misuse of HTML is required ... so what? :-D
so, yeah, RIA is the right word, i'd say ... and opposed to others, i think first of all, it is a great, easy and powerful way of deploying functionality ... i mean i get the whole "inaccessible" and "incrawlable" thing ... but the latter is often pointless, and the first one is a problem you can't address properly, unless for example screen readers read whatever is in the DOM, instead of spitting out the original page ... but that's the problem you face with "real", i.e. desktop like, apps ...
greetz
back2dos
Monolithic?
Well, really the opposite of "progressive enhancement" is "graceful degradation", even though they basically achieve the same thing.
Progressive enhancement means you start off with plain old HTML for older browsers, then enhance it in stages, with cross-browser CSS, additional CSS (e.g. CSS3 styles), Javascript and AJAX.
Graceful degradation means you rush headlong into creating a Rich Internet Experience, then tack on alternatives for people without Javascript/CSS.
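As a sketch of the difference (ids and URL hypothetical): with progressive enhancement, the form below works as a plain HTML form on its own, and JavaScript, when available, upgrades it to fetch results in-page:

    var form = document.getElementById("search-form");
    if (form && window.XMLHttpRequest) {
      form.onsubmit = function (event) {
        event.preventDefault(); // the normal submit remains the fallback
        var xhr = new XMLHttpRequest();
        xhr.open("GET", "/search?q=" +
          encodeURIComponent(document.getElementById("q").value));
        xhr.onload = function () {
          document.getElementById("results").innerHTML = xhr.responseText;
        };
        xhr.send();
      };
    }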
Anyway, to answer your question, I'd probably call it "ungraceful degradation". Alternatives:
Badly designed
Uncrawlable (from search engine perspective)
Inaccessible (credit: Chuck)
Inaccessible.
It just doesn't degrade well.
I'm not sure I'd categorise applications as progressively enhanced, because that infers that there is some sort of baseline. How far back should an app degrade before it's considered 'progressively enhanced'?
At a push, I'd say the app is dependent on certain features of the browser - maybe it is 'edge dependent' or 'modern browser only'?
UI is constructed almost entirely in Javascript, won't gracefully degrade
Arrogant. Presumptive. Illegal (depending on specifics of application and jurisdiction).
Why are HTML/JavaScript/CSS not becoming compiled languages (or maybe even merging into a single compiled language)? What if browsers ran a "Browser Virtual Machine" and HTML/JavaScript/CSS sources could be compiled to a "browser bytecode"? Wouldn't that help developers and users a lot?
I can see a few challenges:
What to do with zillions of existing pages? Make this compilation optional, so if you want you can use plain old HTML. If you want to feed a browser a compiled page, just use .chtml, for example.
How would search providers index pages? Make a decompiler that turns bytecode back into the exact original sources (much as Flash can be decompiled). Or search providers could use the same virtual machine and get the data they need from there.
How to make it compatible with all browsers? Have one centralized developer (let's say the W3C) develop this virtual machine, and then each browser would embed it.
But what about benefits:
Speed.
Size.
No more "loose" and "half-correct" html. It is either correct or won't compile.
Looks the same in every (supported) browser.
If not a bytecode, then at least have some native compression going on; HTML probably is not the most efficient way of storing data. I know there is gzip, but why compress pages every time on the server and decompress them in the browser if we can compress them once and feed them to the browser?
So what stops us from taking this road (well, besides a huge amount of effort to make it all happen)?
Ah, but Javascript IS becoming a compiled language. Check out Firefox 3.5 with TraceMonkey. It's insanely fast compared to um you-know-who's browser. It's true that JS will never be C, but it's a much more dynamic language than C is, and in many ways that makes it more expressive and powerful.
As far as HTML goes, I don't think that the lack of validity of HTML is a huge detriment to speed. I think the engines that put together the visual representation and manipulate the DOM need to get a lot better (um, IE, I'm looking in your general direction...). CSS compliance needs to get better, and CSS itself needs to get more powerful. (Get on the bus with CSS 3 people!)
But I do think that speed is going to get better on Firefox and Chrome to such an extent that people really ARE going to start using it for mainstream application development. It's funny. Adobe seems to be selling Flash as their platform for dynamic web content, MSFT is selling Silverlight for dynamic web content, and Google just wants to really improve HTML and Javascript to display dynamic web content. And Google's doing pretty well at it so far, I must say...
Your ideas have validity when they are applied to JavaScript. As others have noted, to one degree or another several vendors are trying to apply those principles to JS even now. Another big step in this area will likely be the Chrome OS Google has announced. However, when it comes to (X)HTML and CSS I think your ideas may be missing the point.
The world wide web is not a buggy and inconsistent application platform but a massive and unprecedented collection of interconnected documents. The power of the web is in the abstraction of the data from the often rigid (and breakable) visual layouts and increasingly complex in-page functionality largely provided via JavaScript. Encoding these pages in (X)HTML is ideal for making them accessible to the widest possible audience both in terms of browsers and in terms of technical knowledge required to author a page.
More and more the web is being used as an application platform - which is a powerful and exciting use of this technology - but we cannot lose sight of the fact that these Ajax-driven "web 2.0" apps are merely documents with extended functionality. Compilation doesn't make sense for a document and compression is already happening (via gzip and the like).
On a more practical note, the W3C moves at a glacial pace, and browser vendors take turns between jumping the gun to support experimental features in unfinished specs and taking their sweet time supporting other specs which have been on the table and in common usage for years. The whole process is like herding cats. I wouldn't hold my breath for them to make the kind of radical changes you're proposing any time soon.
Since HTML and CSS aren't code, they can't be compiled. Google Chrome's V8 engine does actually convert JS into bytecode, so expect other rendering engines to follow suit!
http://code.google.com/apis/v8/design.html
We recently reworked a PHP template system I've helped create to use Minify to compress multiple JS and CSS files into one file each, seeing our file sizes drop to about 20% of the original combined sizes. Minify also does gzip and caching, so it's really amazing for speeding up websites.
http://code.google.com/p/minify/
In short you can't compile non-code, which HTML and CSS are. JS can be compiled and is starting to be, but all depends on what browsers feel like doing.
Browsers just need to be on the ball regarding supporting web standards. The more browsers do this, the less headache us web developers have. I was quite happy with YouTube's very public drop of support for IE6. We need more action like that for the web to move forward.
The V8 javascript engine (also embedded in Google Chrome, but it's open-source and liberally licensed so you're welcome to use it in the next browser you write!) does compile Javascript to native machine code -- of course, it does it "just in time" (like most modern compilers -- Java, C#, etc!), not "ahead of time" (like Fortran did in 1954 when computers were just too weak to handle compilation in the midst of execution). I'd be surprised if other good JS engines, like those in the very latest Firefox and Safari, didn't do the same.
It looks like you're not advocating "JavaScript as a compiled language" (since it obviously already IS compiled, if you're using a good JS engine), but rather "ahead-of-time" compilation for it (just when most modern languages are essentially abandoning ahead-of-time compilation). Pushing machine code rather than compilable code down the wire sounds like a mostly horrible idea (much larger size, difficulties in supporting one CPU vs another, security nightmares in properly sandboxing it, etc., etc.) with not much in terms of compensating benefits.
That said, if you're really keen on pushing machine code to the client, try out Native Client (as long as the client is an x86 machine - forget every smartphone on the planet, many netbooks, good old Macs, etc.); at least it promises a fix to the security nightmares. If and when you're happy with Native Client, transforming a just-in-time compiler into an ahead-of-time one is a far easier technical challenge (if you want to keep using JavaScript for the sources rather than other languages, of course).
See here for a previous discussion on the matter
Not all of the reasons given are necessarily valid, but one important one is that, unless you're Google, server-side CPU cycles are a lot more valuable than client-side cycles: so it's easier to have the client compile/optimize what is quite often dynamically generated HTML/JavaScript, rather than the server.
Ken
"Speed."
You're assuming that it takes significant time to parse HTML. However, it might be that that time is insignificant compared to the time required for something else, e.g. the time required to lay out the text in the end-user's window.
No more "loose" and "half-correct" html. It is either correct or won't compile.
You already get that, using [X]HTML.
"Looks the same in every (supported) browser."
You seem to be saying that there should only be one browser, or that all browsers would support it equally.
Internet standards don't happen by having a single body (the w3c) implementing something and declaring it a standard. Instead, internet standards happen by having multiple independent bodies creating multiple implementations. A consequence is:
Some people have developed something that isn't standard yet (i.e. they're ahead of the standard)
Some people haven't yet developed something that is standard (i.e. they're behind the standard)
I think your idea is sound; however, there's still no way to enforce a standard. Thus, if there were a non-supported feature, there's a good chance the entire page would simply not display anything. In the current setup, critical information can still be passed.
Google V8, which is one of many new-generation JavaScript engines, 'compiles' JavaScript on the fly, much like .NET 'compiles' C#. Nothing magical here. Expect more of it, especially as web apps get heavier and more demanding.
HTML
HTML is pretty much XML. DTDs exist for various versions, and developers can validate against them at any time.
CSS
CSS is not a programming language, though I do agree that "compiled" CSS could work, seeing as compilation would compress it. However, with the state of CSS support and the number of essential hacks any stylesheet needs to have, you'd never manage to compile it without errors.
JS
As others have mentioned, JS IS becoming a compiled language, except that the browser compiles it for you rather than you doing it yourself.