GWT HTML5 offline web-application dynamic manifest

I have recently converted a GWT web application to work in HTML5 offline mode. So far it seems to work fine, but I'm wondering: is it a good idea to serve different cache.manifest versions to different browsers?
As we know, GWT needs only one permutation per target browser (assuming a single language, to keep it simple), so it's obvious that each target browser only needs to download its own XXXXXX.cache.html.
I can see that it's possible: on the server side I could check the User-Agent HTTP header and return the contents of the appropriate version of my cache.manifest, setting the response headers accordingly so as not to break the offline status-checking behavior. The rest of the resources would be served with no custom filtering.
Is it a good idea to optimize things this way? Is there anything I could be missing?
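Just to illustrate the idea (the real implementation in GWT would live in a servlet, much like the HTML5ManifestServletBase mentioned below), here is a sketch using Node's built-in http module; the manifest file names and the User-Agent matching are made up:

    // Sketch only: pick a permutation-specific manifest by User-Agent.
    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      if (req.url === '/cache.manifest') {
        const ua = (req.headers['user-agent'] || '').toLowerCase();
        // Hypothetical mapping from UA to the per-permutation manifest file.
        const file = ua.includes('webkit') ? 'safari.cache.manifest' : 'gecko.cache.manifest';
        res.writeHead(200, {
          'Content-Type': 'text/cache-manifest', // required MIME type for AppCache
          'Cache-Control': 'no-cache'            // the manifest itself must not be cached
        });
        res.end(fs.readFileSync(file, 'utf8'));
      } else {
        // All other resources would be served normally (static handling omitted).
        res.writeHead(404);
        res.end();
      }
    }).listen(8080);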

By chance, I came across the following project: Mobile GWT. A quick review of the documentation (HTML5 Manifest) and code (HTML5ManifestServletBase) reveals that the manifest is prepared per client, so that only the required resources are sent over the network. A pity, as I was just about to write my own open-source solution...

Related

PWA vs html5 webapp?

I know that the most iconic features of PWAs are:
Service Worker: lets the user use the app offline from cached resources
Add To Home Screen: with this feature, users can add a shortcut to the app on their mobile home screen, to get an experience like interacting with a native app (though there is still a huge difference, in my opinion)
etc.
However, I could do pretty much all of this about 6-7 years ago using the HTML5 technology of that time. I know service workers came recently, but there was also "HTML5 App Cache, as well as the Local Storage, Indexed DB, and the File API specifications", which could do similar things.
Can anyone explain the difference between PWAs and HTML5 web apps? Are they just different terms for the same thing, or a similar concept with different implementations, or is PWA the next generation/extension of the HTML5 web app?
I might have some misunderstanding of PWAs, since I am new to this term. Thanks.
To keep it simple: a PWA is an ordinary site with three additional features.
responsive design - the site should look good on all devices
manifest.json - the site must have a general description of itself stored in manifest.json
caching - the site must work offline
I believe 1 and 2 are easy, and 3 is really what PWA is all about. So the question is: how can we provide offline support?
The first problem is: how can we get our initial .html, .css and .js files without an internet connection? The answer is: use a service worker or App Cache. But App Cache has a lot of problems and will probably be deprecated; service workers, on the other hand, are under active development and get better every month. You can read more about their differences here.
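A minimal service worker sketch of that idea (the cache name and asset paths are placeholders): cache the app shell at install time, then answer requests from the cache first so the site keeps working offline.

    // sw.js -- registered from the page with navigator.serviceWorker.register('/sw.js')
    const CACHE = 'app-shell-v1';                                // placeholder cache name
    const ASSETS = ['/', '/index.html', '/app.css', '/app.js'];  // placeholder asset paths

    self.addEventListener('install', (event) => {
      // Download and store the app shell before the worker becomes active.
      event.waitUntil(caches.open(CACHE).then((cache) => cache.addAll(ASSETS)));
    });

    self.addEventListener('fetch', (event) => {
      // Serve from the cache when possible; fall back to the network otherwise.
      event.respondWith(
        caches.match(event.request).then((cached) => cached || fetch(event.request))
      );
    });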
The second problem is: how can we get server data without an internet connection? We should store the most vital data somewhere. Once we have our .js file from the cache we have access to Local Storage, IndexedDB, etc., so we can store vital data in any of these while we are online and read it back when we are offline. How you handle that is entirely up to you.
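For example (the endpoint and storage key are made up), you could refresh a local copy whenever a request succeeds and fall back to it when the network is gone:

    // Try the network first; on success refresh the local copy,
    // on failure (e.g. offline) fall back to whatever was stored last time.
    function loadVitalData() {
      return fetch('/api/vital-data')                            // hypothetical endpoint
        .then((res) => res.json())
        .then((data) => {
          localStorage.setItem('vitalData', JSON.stringify(data));
          return data;
        })
        .catch(() => JSON.parse(localStorage.getItem('vitalData') || 'null'));
    }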
I believe there is no clear and strict definition of PWA vs. HTML5 web app (at least for now), so you can assume they are the same; it's just that today PWA is the more common term.

Loading external files for a non-hosted HTML5 application

I am currently developing an HTML5 game which loads an external resource. Currently I am using XMLHttpRequest to read the file, but this does not work in Chrome, resulting in:
XMLHttpRequest cannot load file:///E:/game/data.txt
Cross origin requests are only supported for HTTP.
The file is in the same directory as the HTML5 file.
Questions:
Is there any way for an HTML5 application to use XMLHttpRequest (or another method) to load an external file without requiring it to be hosted on a webserver?
If I package the HTML5 code as an application on a tablet/phone which supports HTML5, would XMLHttpRequest be able to load external files?
(a) Yes and no. As a matter of security policy, XHR has traditionally been same-protocol (i.e. http://, rather than file:///) and, on top of that, same-domain as well (as in same subdomain -- http://pages.site.com/index can't get a file from http://scripts.site.com/). Cross-domain requests are now available, but they require a webserver regardless, and the server hosting the file has to explicitly accept the request.
(b) So in a roundabout way the answer is: yes, some implementations might (incorrectly) allow you to grab a file through XHR even when the page is loaded from the file system rather than over HTTP (older browser versions)... but otherwise you're going to need a webserver of one kind or another. The good news is that they're dirt simple to install. EasyPHP would be sufficient -- it's pretty much a 3-click solution -- and there are countless others; it's just the first that comes to mind for a brain-off installation, if all you want is a file server in Apache and you aren't planning on using a server-side scripting language (or if you do plan on using PHP).
XMLHttpRequest would absolutely be able to get external files...
IF they're actually external (i.e. not bundled in a phone-specific cache -- use the phone's built-in file-access API for that, and write a wrapper to handle each platform with the same custom interface), AND the phone currently has reception. Be prepared to handle failure conditions (like having a default-settings object, or error handling, or whatever the best fallback is for whatever is missing).
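A rough sketch of that failure handling (startGame and the default data are placeholders): request the file over HTTP, and fall back to defaults whenever the request fails.

    var DEFAULT_DATA = 'level=1;sound=on';   // placeholder default settings

    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'data.txt');             // same directory, served over HTTP
    xhr.onload = function () {
      startGame(xhr.status === 200 ? xhr.responseText : DEFAULT_DATA);
    };
    xhr.onerror = function () {
      // No reception, or the request was blocked (e.g. a file:// page) -- use defaults.
      startGame(DEFAULT_DATA);
    };
    xhr.send();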
Also, look into Application Cache Manifests. Again, this is an HTML5 solution which different versions of different phones handle differently (early-days versus more standardized formats). DO NOT USE IT DURING DEVELOPMENT, AS IT MAKES TESTING CODE/CONTENT CHANGES MISERABLY SLOW AND PAINFUL, but it's useful when your product is pretty much finished and bug-free, and seconds away from launch: you tell users' browsers to cache all of the content for eternity, so they can play offline and save all kinds of bandwidth by not having to download everything the next time they play.
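For what it's worth (AppCache has since been deprecated, so treat this as legacy), the page opts in with a manifest attribute on the html tag, and a small script can swap in a freshly downloaded cache:

    // Legacy AppCache API: when an updated cache has been downloaded,
    // swap it in and reload so the user gets the new version.
    window.applicationCache.addEventListener('updateready', function () {
      if (window.applicationCache.status === window.applicationCache.UPDATEREADY) {
        window.applicationCache.swapCache();
        window.location.reload();
      }
    });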

Offline webapps in HTML5 - Persist after closing the browser?

With HTML5's offline capabilities is it possible to create an app that will persist after the connection is lost and the browser is closed? Specifically, here's what I'd like to do:
Connect to the app while online. Download the entire app including a small database it runs on.
Close the browser and disconnect.
Open the browser again while offline and load the app from the local cache.
Thanks to Mark Pilgrim's excellent book I believe I have an idea of how to accomplish the first step; I'm mainly wondering if the last step is possible. If it is, I'm guessing it requires some configuration of the browser. Are there any settings I should be aware of that aren't obvious?
Thanks very much for any help offered.
The last step should be possible -- it just depends on to what extent you want to implement it. To my knowledge it shouldn't require any browser settings. You just have to be aware of the limitations of local storage, which I believe is 5 MB max at the moment (for most browsers). Obviously you'd have to perform the permission checks outlined in the Dive Into HTML5 guide you linked.
The quickest and dirtiest way is to simply issue a GET request to your online app. If it responds correctly, then use the online version. If not, use the local cache. Just disguise the timeout/failed response as a 'loading' screen.
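A minimal version of that check (the /ping URL, the timeout, and the two handler names are placeholders):

    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/ping');               // any cheap URL on the online app
    xhr.timeout = 3000;                     // show the 'loading' screen meanwhile
    xhr.onload = function () { useOnlineVersion(); };
    xhr.onerror = xhr.ontimeout = function () { useLocalCache(); };
    xhr.send();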

What are the pros and cons of various ways of analyzing websites?

I'd like to write some code which looks at a website and its assets and creates some stats and a report. Assets would include images. I'd like to be able to trace links, or at least try to identify menus on the page. I'd also like to take a guess at what CMS created the site, based on class names and such.
I'm going to assume that the site is reasonably static, or is driven by a CMS, but is not something like an RIA.
Some ideas about how I might proceed:
1) Load site into an iFrame. This would be nice because I could parse it with jQuery. Or could I? Seems like I'd be hampered by cross-site scripting rules. I've seen suggestions to get around those problems, but I'm assuming browsers will continue to clamp down on such things. Would a bookmarklet help?
2) A Firefox add-on. This would let me get around the cross-site scripting problems, right? Seems doable, because debugging tools for Firefox (and GreaseMonkey, for that matter) let you do all kinds of things.
3) Grab the site on the server side. Use libraries on the server to parse.
4) YQL. Isn't this pretty much built for parsing sites?
My suggestion would be:
a) Choose a scripting language. I suggest Perl or Python; curl+bash would also work, but it has no exception handling, which is bad.
b) Load the home page via a script, using a Python or Perl library.
Try Perl's WWW::Mechanize module.
Python has plenty of built-in modules; also have a look at www.feedparser.org.
c) Inspect the server headers (via an HTTP HEAD request) to find the application server name. If you are lucky you will also find the CMS name (e.g. WordPress); see the sketch after this list.
d) Use the Google XML API to query something like "link:sitedomain.com" to find links pointing to the site; again, you will find Python code examples on Google's pages. Asking Google for the domain's ranking can also be helpful.
e) You can collect the data in an SQLite DB, then post-process it in Excel.
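Here is a quick sketch of step c), written in JavaScript (Node 18+; in a browser, CORS would block most foreign sites) rather than Perl or Python, just to show the idea; the URL is a placeholder and many sites strip these headers:

    fetch('https://example.com/', { method: 'HEAD' }).then((res) => {
      console.log(res.headers.get('server'));        // e.g. "Apache" or "nginx"
      console.log(res.headers.get('x-powered-by'));  // sometimes hints at PHP or the CMS
    });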
You should simply fetch the source (XHTML/HTML) and parse it. You can do that in almost any modern programming language, from your own computer connected to the Internet.
An iframe is a widget for displaying HTML content; it's not a technology for data analysis. You can analyse data without displaying it anywhere -- you don't even need a browser.
Tools in languages like Python, Java and PHP are certainly more powerful for your task than JavaScript or whatever you have in those Firefox extensions.
It also does not matter what technology is behind the website. XHTML/HTML is just a string of characters no matter how a browser renders it. To find your "assets" you will simply look for specific HTML tags like "img", "object" etc.
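For instance, a small sketch in a browser context (DOMParser; in Node you would swap in a DOM library such as jsdom, and the URL is a placeholder):

    fetch('https://example.com/')
      .then((res) => res.text())
      .then((html) => {
        // Parse the raw markup and count asset tags without rendering anything.
        const doc = new DOMParser().parseFromString(html, 'text/html');
        const images = doc.querySelectorAll('img');
        const embeds = doc.querySelectorAll('object, embed');
        console.log(images.length + ' images, ' + embeds.length + ' objects/embeds');
      });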
I think writing an extension for Firebug would probably be one of the easiest ways to do this. For instance, YSlow has been developed on top of Firebug and provides some of the features you're looking for (e.g. image, CSS and JavaScript summaries).
I suggest you try option #4 first (YQL):
The reason is that it looks like it might get you all the data you need, and you could then build your tool as a website (or similar) where you could get info about a site without actually having to visit the page in your browser. If YQL works for what you need, it looks like you'd have the most flexibility with this option.
If YQL doesn't pan out, then I suggest you go with option #2 (a Firefox addon).
I think you should probably try and stay away from Option #1 (the Iframe) because of the cross-site scripting issues you already are aware of.
Also, I have used option #3 (grab the site on the server side), and one problem I've run into in the past is the site being grabbed loading content after the fact using AJAX calls. At the time I didn't find a good way to grab the full content of pages that use AJAX - SO BE WARY OF THAT OBSTACLE! Other people here have run into that also; see this: Scrape a dynamic website
THE AJAX DYNAMIC CONTENT ISSUE:
There may be some solutions to the AJAX issue, such as using AJAX itself to grab the content with the evalScripts:true parameter. See the following articles for more info, and for an issue you might need to be aware of in how JavaScript evaluated from the grabbed content behaves:
Prototype library: http://www.prototypejs.org/api/ajax/updater
Message Board: http://www.crackajax.net/forums/index.php?action=vthread&forum=3&topic=17
Or if you are willing to spend money, take a look at this:
http://aptana.com/jaxer/guide/develop_sandbox.html
Here is an ugly (but maybe useful) example of using a .NET component called WebRobot to scrape content from a dynamic, AJAX-enabled site such as Digg.com.
http://www.vbdotnetheaven.com/UploadFile/fsjr/ajaxwebscraping09072006000229AM/ajaxwebscraping.aspx
Also, here is a general article on using PHP and the cURL library to scrape all the links from a web page. However, I'm not sure whether this article and the cURL library cover the AJAX content issue:
http://www.merchantos.com/makebeta/php/scraping-links-with-php/
One thing I just thought of that might work is:
grab the content and evaluate it using AJAX (a rough sketch follows this list).
send the content to your server.
evaluate the page, links, etc..
[OPTIONAL] save the content as a local page on your server .
return the statistics info back to the page.
[OPTIONAL] display cached local version with highlighting.
^Note: If saving a local version, you will want to use regular expressions to convert relative link paths (for images especially) to be correct.
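A bare-bones sketch of steps 1 and 2 (the /analyze endpoint is hypothetical, and the usual cross-origin rules still apply unless the request goes through your own proxy):

    fetch('https://example.com/')                 // the page being analyzed
      .then((res) => res.text())
      .then((html) => fetch('/analyze', {         // hypothetical backend endpoint
        method: 'POST',
        headers: { 'Content-Type': 'text/html' },
        body: html
      }));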
Good luck!
Just please be aware of the AJAX issue. Many sites nowadays load content dynamically using AJAX. Digg.com does, MSN.com does for its news feeds, etc...
That really depends on the scale of your project. If it’s just casual, not fully automated, I’d strongly suggest a Firefox Addon.
I'm right in the middle of a similar project. It has to analyze the DOM of a page generated using JavaScript. Writing a server-side browser was too difficult, so we turned to some other technologies: Adobe AIR, Firefox addons, userscripts, etc.
A Firefox addon is great if you don't need automation. A script can analyze the page, show you the results, ask you to correct the parts it is uncertain of, and finally post the data to some backend. You have access to all of the DOM, so you don't need to write a JS/CSS/HTML/whatever parser (that would be a hell of a job!).
Another way is Adobe AIR. Here you have more control over the application: you can launch it in the background and do all the parsing and analyzing without your interaction. The downside is that you don't have access to the full DOM of the pages. The only way to get past this is to set up a simple proxy that fetches the target URL and adds some JavaScript (to create a trusted-untrusted sandbox bridge)... It's a dirty hack, but it works.
Edit:
In Adobe AIR, there are two ways to access a foreign website’s DOM:
Load it via Ajax, create an HTMLLoader object, and feed the response into it (the loadString method, IIRC).
Create an iframe and load the site in an untrusted sandbox.
I don't remember why, but the first method failed for me, so I had to use the other one (I think there were some security reasons involved that I couldn't work around). And I had to create a sandbox to access the site's DOM. Here's a bit about dealing with sandbox bridges. The idea is to create a proxy that injects a simple JS snippet that creates childSandboxBridge and exposes some methods to the parent (in this case, the AIR application). The script content is something like:
window.childSandboxBridge = {
    // ... some methods returning data
};
(Be careful: there are limitations on what can be passed via the sandbox bridge. No complex objects for sure; use only primitive types.)
So the proxy basically tampered with all the requests that returned HTML or XHTML; everything else was just passed through unchanged. I did this using Apache + PHP, but it could certainly be done with a real proxy plus some plugins/custom modules. This way I had access to the DOM of any site.
end of edit.
The third way I know of, and the hardest: set up an environment similar to the one at browsershots, and drive Firefox with automation. If you have Mac OS X on a server, you could play with ActionScript to do the automation for you.
So, to sum up:
PHP/server-side script: you have to implement your own browser, JS engine, CSS parser, and so on, but in return it is fully under your control and fully automated.
Firefox addon: has access to the DOM and everything else. Requires a user to operate it (or at least an open Firefox session with some kind of auto-reload). A nice interface for a user to guide the whole process.
Adobe AIR: requires a working desktop computer; more difficult than creating a Firefox addon, but more powerful.
Automated browser: more of a desktop-programming issue than web development. Can be set up on a Linux terminal without a graphical environment. Requires master hacking skills. :)
Being primarily a .NET programmer these days, my advice would be to use C# or some other language with .NET bindings. Use the WebBrowser control to load the page, and then iterate through the elements in the document (via GetElementsByTagName()) to get links, images, etc. With a little extra work (parsing the BASE tag, if available), you can resolve src and href attributes into URLs and use HttpWebRequest to send HEAD requests for the target images to determine their sizes. That should give you an idea of how graphically intensive the page is, if that's something you're interested in. Additional items you might be interested in including in your stats could be backlinks / PageRank (via the Google API), whether the page validates as HTML or XHTML, what percentage of links point to URLs in the same domain versus off-site, and, if possible, Google rankings for the page for various search strings (dunno if that's programmatically available, though).
I would use a script (or a compiled app depending on language of choice) written in a language that has strong support for networking and text parsing/regular expressions.
Perl
Python
.NET language of choice
Java
whatever language you are most comfortable with. A basic stand-alone script/app keeps you from needing to worry too much about browser integration and security issues.

What is the easiest way to validate page references? (css, javascript, etc..)

I've got Firebug (my team does not have Firefox) and the IE Developer Toolbar (IE7), but I cannot seem to figure out how to easily check whether the files referenced in a page are actually loading (I see JavaScript errors, but that doesn't succinctly point me to the exact file in a hierarchy of jQuery - jQuery UI - datepicker files).
Additionally, I'd like to be able to do this remotely, because on our corporate domain some files load fine for me but not for anyone else, because they sometimes get encrypted to my domain user. So it would be nice if this process were either simple enough for my teammates to do very quickly, or, even better, somehow automated from a remote machine or web service request.
I thought I had seen a simple place in Firebug to validate what loaded and what did not, but I can't find it now.
What are my options?
Have you tried JavaScript Lint?
Or the JavaScript plugin for Eclipse.
Do you know YSlow?
It provides a set of excellent tools for web development, and I think it addresses your question.
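Not from the answers above, but in modern browsers a quick console snippet can also list referenced files that fail to load; note that it re-requests each URL, and cross-origin resources may be blocked by CORS:

    var refs = Array.prototype.slice
      .call(document.querySelectorAll('script[src], link[rel="stylesheet"], img[src]'))
      .map(function (el) { return el.src || el.href; });

    refs.forEach(function (url) {
      fetch(url, { method: 'HEAD' })
        .then(function (res) { if (!res.ok) console.warn('FAILED', res.status, url); })
        .catch(function () { console.warn('FAILED (network/CORS)', url); });
    });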