Random Querystring to avoid IE caching - html

It is a well known problem that IE caches too much of html, even when giving a Cache-Control: no-cache or Last-Modified header to everypage.
This behaiviour is really troubling when working with querystrings to get dynamic information, as IE considers it to be the same page (i.e.: http://example.com/?id=10) and serves the cached version.
I've solved it adding either a random number or a timestring to the querystring (as others have done) like this http://example.com/?id=10&t=2009-08-06_13:12:56 that I just ignore serverside.
Is there a better option? Is there another, cleaner way to acomplish this? I'm aware that POST isn't cached, but it is semanticaly correct to use GET here.

Assuming you are using jQuery, instead of using $.get or $.getJson, use the more generic $.ajax and explicitly set the cache value to false. The following is an example:
$.ajax({
url: "/Controller/Action",
cache: false,
type: "GET",
dataType: "json",
success: function(data, textStatus) {
alert("success");
}
});
A little more code required (not much though) than using .getJson or .get but will solve the problem cleanly without appending random numbers.

You could also use the current Unix Time in milliseconds to avoid the problem of many requests in one second (it is much less likely to have multiple requests in one millisecond)
var url = "http://whatever.com/stuff?key=value&ie=" + (new Date()).getTime();

Using a random number (not timestamp) on the querystring, or actually changing the filename are the two methods recommended. Steve Souders and YAHOO!'s performance group has published a ton of useful information and practices they've discovered and developed while optimizing one of the world's most heavily-visited properties.

So, in the end, the only reliable way to do this (thanks to IE6) is using a
random,
or
time bound
querystring.
You could use a
time bound querystring
that only changes every 15 seconds (or any other amount of time), so you'd lower the server hit count, as you'd see locally cached content for those 15 seconds.
If you have
a
standard
compliant
browser, you can get away with only using
ETags.

I have the same problem, but take care, in one second there can be many requests. This is why I use this:
$.getJSON("http://server/example?param=value&dummy=" + Math.random(), ...);

Have you tried adding an ETag header in the response?
You might use a random one, or a checksum of the generated page so that the cached version is served when appropriate.
I'm not sure what's the behaviour of IE, but with recent versions it should work.
See also the HTTP RFC section on ETag

Related

Data Studio connector making multiple calls to API when it should only be making 1

I'm finalizing a Data Studio connector and noticing some odd behavior with the number of API calls.
Where I'm expecting to see a single API call, I'm seeing multiple calls.
In my apps script I'm keeping a simple tally which increments by 1 every url fetch and that is giving me the correct number I expect to see with getData().
However, in my API monitoring logs (using Runscope) I'm seeing multiple API requests for the same endpoint, and varying numbers for different endpoints in a single getData() call (they should all be the same). E.g.
I can't post the code here (client project) but it's substantially the same framework as the Data Connector code on Google's docs. I have caching and backoff implemented.
Looking for any ideas or if anyone has experienced something similar?
Thanks
Per the this reference, GDS will also perform semantic type detection if you aren't explicitly defining this property for your fields. If the query is semantic type detection, the request will feature sampleExtraction: true
When Data Studio executes the getData function of a community connector for the purpose of semantic detection, the incoming request will contain a sampleExtraction property which will be set to true.
If the GDS report includes multiple widgets with different dimensions/metrics configuration then GDS might fire multiple getData calls for each of them.
Kind of a late answer but this might help others who are facing the same problem.
The widgets / search filters attached to a graph issue getData calls of their own. If your custom adapter is built to retrieve data via API calls from third party services, data which is agnostic to the request.fields property sent forward by GDS => then these API calls are multiplied by N+1 (where N = the amout of widgets / search filters your report is implementing).
I could not find an official solution for this either, so I invented a workaround using cache.
The graph's request for getData (typically requesting more fields than the Search Filters) will be the only one allowed to query the API Endpoint. Before starting to do so it will store a key in the cache "cache_{hashOfReportParameters}_building" => true.
if (enableCache) {
cache.putString("cache_{hashOfReportParameters}_building", 'true');
Logger.log("Cache is being built...");
}
It will retrieve API responses, paginating in a look, and buffer the results.
Once it finished it will delete the cache key "cache_{hashOfReportParameters}building", and will cache the final merged results it buffered so far inside "cache{hashOfReportParameters}_final".
When it comes to filters, they also invoke: getData but typically with only up to 3 requested fields. First thing we want to do is make sure they cannot start executing prior to the primary getData call... so we add a little bit of a delay for things that might be the search filters / widgets that are after the same data set:
if (enableCache) {
var countRequestedFields = requestedFields.asArray().length;
Logger.log("Total Requested fields: " + countRequestedFields);
if (countRequestedFields <= 3) {
Logger.log('This seams to be a search filters.');
Utilities.sleep(1000);
}
}
After that we compute a hash on all of the moving parts of the report (date range, plus all of the other parameters you have set up that could influence the data retrieved form your API endpoints):
Now the best part, as long as the main graph is still building the cache, we make these getData calls wait:
while (cache.getString('cache_{hashOfReportParameters}_building') === 'true') {
Logger.log('A similar request is already executing, please wait...');
Utilities.sleep(2000);
}
After this loop we attempt to retrieve the contents of "cache_{hashOfReportParameters}_final" -- and in case we fail, its always a good idea to have a backup plan - which would be to allow it to traverse the API again. We have encountered ~ 2% error rate retrieving data we cached...
With the cached result (or buffered API responses), you just transform your response as per the schema GDS needs (which differs between graphs and filters).
As you start implementing this, you`ll notice yet another problem... Google Cache is limited to max 100KB per key. There is however no limit on the amount of keys you can cache... and fortunately others have encountered similar needs in the past and have come up with a smart solution of splitting up one big chunk you need cached into multiple cache keys, and gluing them back together into one object when retrieving is necessary.
See: https://github.com/lwbuck01/GASs/blob/b5885e34335d531e00f8d45be4205980d91d976a/EnhancedCacheService/EnhancedCache.gs
I cannot share the final solution we have implemented with you as it is too specific to a client - but I hope that this will at least give you a good idea on how to approach the problem.
Caching the full API result is a good idea in general to avoid round trips and server load for no good reason if near-realtime is good enough for your needs.

How to extend AFNetworking 2.0 to perform request combining

I have a UI where the same image URL could be requested by several UIImageViews at varying times. Obviously if a request from one of them has finished then returning the cached version works as expected. However, especially with slower networks, I'd like to be able to piggy-back requests for an image URL onto any currently running/waiting HTTP request for the same URL.
On an HTTP server this called request combining and I'd love to do the same in the client - to combine the different requests for the same URL into a single request and then callback separately to each of the callers). The requests for that URL dont happen to start at the same time.
What's the best way to accomplish this?
I think re-writing UIImageView+AFNetworking might be the easiest way:
check the af_sharedImageRequestOperationQueue to see if it has an operation with the same request
if I do already have an operation in the queue or running then add myself to some list of callbacks/blocks to be called on success/failure
if I don't have the operation, then create it as normal
in the setCompletionBlockWithSuccess to call each of the blocks in turn.
Any simpler alternatives?
I encountered a similar problem and decided that your way was the most straightforward. One added bit of complexity is that these downloads require special credentials and so must go through their own operation queue. Here's the code from my UIImageView category to check whether a particular URL is inflight:
NSUInteger foundOperation = [[ConnectionManager sharedConnectionManager].operationQueue.operations indexOfObjectPassingTest:^BOOL(AFHTTPRequestOperation *obj, NSUInteger idx, BOOL *stop) {
BOOL URLAlreadyInFlight = [obj.request.URL.absoluteString isEqualToString:URL.absoluteString];
if (URLAlreadyInFlight) {
NSBlockOperation *updateUIOperation = [NSBlockOperation blockOperationWithBlock:^{
[[NSOperationQueue mainQueue] addOperationWithBlock:^{
self.image = [[ImageCache sharedImageCache] cachedImageForURL:URL];
}];
}];
//Makes updating the UI dependent on the completion of the matching operation.
[updateUIOperation addDependency:obj];
}
return URLAlreadyInFlight;
}];
Were you able to come up with a better solution?
EDIT: Well, it looks like my method of updating the UI just can't work, as the operation's completion blocks are run asynchronously, so the operation finishes before the blocks are run. However, I was able to modify the image cache to be able to add callbacks for when certain URLs are cached, which seems to work correctly. So this method will properly detect when certain URLs are in flight and be able to take action with that knowledge.

How do I load JSON data synchronously with d3.js?

When my site first initializes, it queries a server to get back some data. I can't lay anything out on the page until this data gets back. With d3.js, I can use d3.json() to get my data, but because it's asynchronous, I need to put the entire page logic in the callback function. How do I request the data and wait for it to come back?
You're basically doing it the only way. The callback function has to be the one initiating the rest of your code. You don't need all your code in the callback function though, you can introduce indirection. So the callback function will call another function inside which would be what is currently in your callback function.
Using synchronous requests in JavaScript is not recommended as it blocks the whole thread and nothing gets done in the meantime. The user can also not interact well with the webpage.
If it is really what you want, you can do the following (using jQuery):
var jsonData;
jQuery.ajax({
dataType: "json",
url: "jsondatafile.json",
async: false
success: function(data){jsonData = data}
});
However it is not recommended, even by jQuery, as explained here the jQuery.ajax() documentation:
The first letter in Ajax stands for "asynchronous," meaning that the operation occurs in parallel and the order of completion is not guaranteed. The async option to $.ajax() defaults to true, indicating that code execution can continue after the request is made. Setting this option to false (and thus making the call no longer asynchronous) is strongly discouraged, as it can cause the browser to become unresponsive.
As a final note, I don't see what prevents you from using whatever function there is in the success attribute in an asynchronous way. Most of the times changing your design to use async requests will be worth it. By experience, debugging a page that uses synchronous requests is a pain (especially when the requests don't get answered...).

How does facebook, gmail send the real time notification?

I have read some posts about this topic and the answers are comet, reverse ajax, http streaming, server push, etc.
How does incoming mail notification on Gmail works?
How is GMail Chat able to make AJAX requests without client interaction?
I would like to know if there are any code references that I can follow to write a very simple example. Many posts or websites just talk about the technology. It is hard to find a complete sample code. Also, it seems many methods can be used to implement the comet, e.g. Hidden IFrame, XMLHttpRequest. In my opinion, using XMLHttpRequest is a better choice. What do you think of the pros and cons of different methods? Which one does Gmail use?
I know it needs to do it both in server side and client side.
Is there any PHP and Javascript sample code?
The way Facebook does this is pretty interesting.
A common method of doing such notifications is to poll a script on the server (using AJAX) on a given interval (perhaps every few seconds), to check if something has happened. However, this can be pretty network intensive, and you often make pointless requests, because nothing has happened.
The way Facebook does it is using the comet approach, rather than polling on an interval, as soon as one poll completes, it issues another one. However, each request to the script on the server has an extremely long timeout, and the server only responds to the request once something has happened. You can see this happening if you bring up Firebug's Console tab while on Facebook, with requests to a script possibly taking minutes. It is quite ingenious really, since this method cuts down immediately on both the number of requests, and how often you have to send them. You effectively now have an event framework that allows the server to 'fire' events.
Behind this, in terms of the actual content returned from those polls, it's a JSON response, with what appears to be a list of events, and info about them. It's minified though, so is a bit hard to read.
In terms of the actual technology, AJAX is the way to go here, because you can control request timeouts, and many other things. I'd recommend (Stack overflow cliche here) using jQuery to do the AJAX, it'll take a lot of the cross-compability problems away. In terms of PHP, you could simply poll an event log database table in your PHP script, and only return to the client when something happens? There are, I expect, many ways of implementing this.
Implementing:
Server Side:
There appear to be a few implementations of comet libraries in PHP, but to be honest, it really is very simple, something perhaps like the following pseudocode:
while(!has_event_happened()) {
sleep(5);
}
echo json_encode(get_events());
The has_event_happened function would just check if anything had happened in an events table or something, and then the get_events function would return a list of the new rows in the table? Depends on the context of the problem really.
Don't forget to change your PHP max execution time, otherwise it will timeout early!
Client Side:
Take a look at the jQuery plugin for doing Comet interaction:
Project homepage: http://plugins.jquery.com/project/Comet
Google Code: https://code.google.com/archive/p/jquerycomet/ - Appears to have some sort of example usage in the subversion repository.
That said, the plugin seems to add a fair bit of complexity, it really is very simple on the client, perhaps (with jQuery) something like:
function doPoll() {
$.get("events.php", {}, function(result) {
$.each(result.events, function(event) { //iterate over the events
//do something with your event
});
doPoll();
//this effectively causes the poll to run again as
//soon as the response comes back
}, 'json');
}
$(document).ready(function() {
$.ajaxSetup({
timeout: 1000*60//set a global AJAX timeout of a minute
});
doPoll(); // do the first poll
});
The whole thing depends a lot on how your existing architecture is put together.
Update
As I continue to recieve upvotes on this, I think it is reasonable to remember that this answer is 4 years old. Web has grown in a really fast pace, so please be mindful about this answer.
I had the same issue recently and researched about the subject.
The solution given is called long polling, and to correctly use it you must be sure that your AJAX request has a "large" timeout and to always make this request after the current ends (timeout, error or success).
Long Polling - Client
Here, to keep code short, I will use jQuery:
function pollTask() {
$.ajax({
url: '/api/Polling',
async: true, // by default, it's async, but...
dataType: 'json', // or the dataType you are working with
timeout: 10000, // IMPORTANT! this is a 10 seconds timeout
cache: false
}).done(function (eventList) {
// Handle your data here
var data;
for (var eventName in eventList) {
data = eventList[eventName];
dispatcher.handle(eventName, data); // handle the `eventName` with `data`
}
}).always(pollTask);
}
It is important to remember that (from jQuery docs):
In jQuery 1.4.x and below, the XMLHttpRequest object will be in an
invalid state if the request times out; accessing any object members
may throw an exception. In Firefox 3.0+ only, script and JSONP
requests cannot be cancelled by a timeout; the script will run even if
it arrives after the timeout period.
Long Polling - Server
It is not in any specific language, but it would be something like this:
function handleRequest () {
while (!anythingHappened() || hasTimedOut()) { sleep(2); }
return events();
}
Here, hasTimedOut will make sure your code does not wait forever, and anythingHappened, will check if any event happend. The sleep is for releasing your thread to do other stuff while nothing happens. The events will return a dictionary of events (or any other data structure you may prefer) in JSON format (or any other you prefer).
It surely solves the problem, but, if you are concerned about scalability and perfomance as I was when researching, you might consider another solution I found.
Solution
Use sockets!
On client side, to avoid any compatibility issues, use socket.io. It tries to use socket directly, and have fallbacks to other solutions when sockets are not available.
On server side, create a server using NodeJS (example here). The client will subscribe to this channel (observer) created with the server. Whenever a notification has to be sent, it is published in this channel and the subscriptor (client) gets notified.
If you don't like this solution, try APE (Ajax Push Engine).
Hope I helped.
According to a slideshow about Facebook's Messaging system, Facebook uses the comet technology to "push" message to web browsers. Facebook's comet server is built on the open sourced Erlang web server mochiweb.
In the picture below, the phrase "channel clusters" means "comet servers".
Many other big web sites build their own comet server, because there are differences between every company's need. But build your own comet server on a open source comet server is a good approach.
You can try icomet, a C1000K C++ comet server built with libevent. icomet also provides a JavaScript library, it is easy to use as simple as:
var comet = new iComet({
sign_url: 'http://' + app_host + '/sign?obj=' + obj,
sub_url: 'http://' + icomet_host + '/sub',
callback: function(msg){
// on server push
alert(msg.content);
}
});
icomet supports a wide range of Browsers and OSes, including Safari(iOS, Mac), IEs(Windows), Firefox, Chrome, etc.
Facebook uses MQTT instead of HTTP. Push is better than polling.
Through HTTP we need to poll the server continuously but via MQTT server pushes the message to clients.
Comparision between MQTT and HTTP: http://www.youtube.com/watch?v=-KNPXPmx88E
Note: my answers best fits for mobile devices.
One important issue with long polling is error handling.
There are two types of errors:
The request might timeout in which case the client should reestablish the connection immediately. This is a normal event in long polling when no messages have arrived.
A network error or an execution error. This is an actual error which the client should gracefully accept and wait for the server to come back on-line.
The main issue is that if your error handler reestablishes the connection immediately also for a type 2 error, the clients would DOS the server.
Both answers with code sample miss this.
function longPoll() {
var shouldDelay = false;
$.ajax({
url: 'poll.php',
async: true, // by default, it's async, but...
dataType: 'json', // or the dataType you are working with
timeout: 10000, // IMPORTANT! this is a 10 seconds timeout
cache: false
}).done(function (data, textStatus, jqXHR) {
// do something with data...
}).fail(function (jqXHR, textStatus, errorThrown ) {
shouldDelay = textStatus !== "timeout";
}).always(function() {
// in case of network error. throttle otherwise we DOS ourselves. If it was a timeout, its normal operation. go again.
var delay = shouldDelay ? 10000: 0;
window.setTimeout(longPoll, delay);
});
}
longPoll(); //fire first handler

Determine user agent timezone offset without using javascript?

Is there a way to determine the timezone for a user agent without javasript?
normally, i would execute this snippet of javascript:
<script type="text/javascript" language="JavaScript">
var offset = new Date();
document.write('<input type="hidden" id="clientTzOffset"
name="clientTzOffset" value="' + offset.getTimezoneOffset() + '"/>');
</script>
However, for a lot of mobiles and clients, JS is either not existant or turned off. Is there something clever I can do or am I reduced to "tell me where you are and what timzeone you are in?"
Use both. Have javascript set the default value of a list or text field. The percentage of people without javascript is so small the burden is tiny.
I was going to say "Use the request Date: header" but that's a response header only it seems.
I thought of another solution but it is a little complex. Set up some fake HEAD requests with a max-age: 1 response header to coerce the browser into re-fetching them. You should then receive an if-modified-since header from any modern browser like so:
If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT
Just be careful not to send any Last-Modified headers with the first response because
To get best results when sending an If-
Modified-Since header field for cache validation, clients are
advised to use the exact date string received in a previous Last-
Modified header field whenever possible.
Note the "whenever possible" disclaimer. This, and other parts of the header description imply that the client will use its own clock when it doesn't know anything of the servers.
With the right combination of headers this could actually work very well.
EDIT: Tried some tests with FF but couldn't find a valid combination of headers to trigger an if-modified-since in client time. FF only sends the header if it got a last-modified header previously and then it just reflects the value back (even if it isn't a valid date).
Maybe with a server side language you could do an IP lookup, and then determine from their country what timezone they could be in.
I'd imagine this could be problematic, though.
If going down this road, this question (and answers) may be of assistance.
Getting the location from an IP address
I've seen a website where instead of asking what timezone the user was in it simply asked "pick the correct time" and it showed them a list of times. It's a subtle difference from "what timezone are you in" but much easier to understand. Unfortunately I can't find the website anymore