Consider a scenario where a browser has two or more tabs pointing to the same origin. Because each tab runs its own event loop, concurrent access to local storage can race, and the tabs can potentially overwrite each other's changes.
I'm writing a web application that would face such race conditions, and so I wanted to know about the different synchronization primitives that could be employed in such a scenario.
My reading of the relevant W3C spec, and the comment from Ian Hickson at the end of this blog post on the topic, suggests that what's supposed to happen is that a browser-global mutex controls access to each domain's localStorage. Each separate "thread" (see below for what I'm fairly confident that means) of JavaScript execution must attempt to acquire the storage mutex (if it doesn't have it already) whenever it examines local storage. Once it gets the mutex, it doesn't give it up until it's completely done.
Now, what's a thread, and what does it mean for a thread to be done? The only thing that makes sense (and the only thing that's really consistent with Hixie's claim that the mutex makes things "completely safe") is that a thread is JavaScript code in some browser context that's been initiated by some event. (Note that one possible event is that a <script> block has just been loaded.) The nature of JavaScript in the browser in general is that code in a <script> block, or code in a handler for any sort of event, runs until it stops; that is, runs to the end of the <script> body, or else runs until the event handler returns.
So, given that, what the storage mutex is supposed to do is to force all shared-domain scripts to block upon attempting to claim the mutex when one of their number already has it. They'll block until the owning thread is done — until the <script> tag code is exhausted, or until the event handler returns. That behavior would achieve this guarantee from the spec:
Thus, the length attribute of a Storage object, and the value of the various properties of that object, cannot change while a script is executing, other than in a way that is predictable by the script itself.
However, it appears that WebKit-based browsers (Chrome and Safari, and probably the Android browser too, and now maybe Opera?) don't bother with the mutex implementation, which leaves you in the situation that drove you to ask the question. If you're concerned with such race conditions (a perfectly reasonable attitude), then you can use either the locking mechanism suggested in the blog post (by someone who does, or did, work for Stackoverflow :) or else implement a version counting system to detect dirty writes. (edit — now that I think about it, an RDBMS-style version mechanism would be problematic, because there'd still be a race condition checking the version!)
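For illustration, here's a minimal sketch of the kind of cooperative lock you can layer on top of localStorage itself. This is just the general idea, not the exact algorithm from the blog post; the key name, delay, and TTL are placeholders, and without the spec's mutex it is only best-effort, not bulletproof:

// Best-effort cooperative lock over localStorage. Each tab writes a random
// token, waits briefly, then checks that its token survived; a lock older
// than ttlMs is treated as abandoned by a crashed tab.
function tryAcquireLock(lockKey, ttlMs, callback) {
    var token = Math.random().toString(36).slice(2);
    var existing = localStorage.getItem(lockKey);
    if (existing) {
        var heldSince = Number(existing.split("|")[1]);
        if (new Date().getTime() - heldSince < ttlMs) {
            callback(false); // another tab holds a fresh lock
            return;
        }
    }
    localStorage.setItem(lockKey, token + "|" + new Date().getTime());
    // Re-check after a short delay: if another tab raced us, its write may
    // have clobbered ours, in which case we lost.
    setTimeout(function () {
        var current = localStorage.getItem(lockKey);
        callback(current !== null && current.split("|")[0] === token);
    }, 30);
}

function releaseLock(lockKey) {
    localStorage.removeItem(lockKey);
}

tryAcquireLock("myapp-lock", 5000, function (acquired) {
    if (acquired) {
        // ... read/modify localStorage here ...
        releaseLock("myapp-lock");
    }
});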
I'm not usually a fan of asking these kinds of open-ended questions, but I can't find any reliable documentation (either independent or from Google) that is very clear, and the tutorials and examples all conflict with each other.
Currently I'm working with chrome.commands.onCommand as well as chrome.tabs.onCreated and chrome.tabs.onActivated, but I'm interested in general guidelines as well (which it seems to me may be impossible). I've found a few resources such as this one and the samples, but the samples are mostly one-liners (why?), and the only useful SO link I found specifically states that the post is specific to that API.
I'm using a persistent background page (since the SO answer says that matters) and really like the quote included from the documentation:
If you need to do some initialization when your extension is installed or upgraded, listen to the runtime.onInstalled event. This is a good place to register for declarativeWebRequest rules, contextMenu entries, and other such one-time initialization.
But I'm currently doing all my registration in runtime.onInstalled and lose keybinds (the tab stuff still seems to work, but it relies on the keybinds, so I can't tell for sure) when the browser crashes and restarts. I'd think keybinds are a one-time initialization thing, but clearly they're not. I could just move the keybinds to onStartup, since I know they don't work in onInstalled, but I'd prefer to know the best practices for extensions (who cares if I don't use the best practice for some random library, but extensions are all about the best code, IMO).
Any help would be appreciated, and if any other info is needed feel free to leave a comment and let me know. I'd prefer not to have to come up with a minimal example if possible, though, and to keep this to guidelines for chrome.commands and chrome.tabs, as well as general guidelines for persistent pages (although event pages would be appreciated too, since there seem to be no good resources and others may find this question in the future).
Given the useful info from @wOxxOm and @err1100, I've decided to self-answer this question. I like using comments on the question for clarification; however, I often see SO users answering the question in the comments instead of posting an answer (maybe because answers in comments tend to be more conversational than a stated answer), but either way I think questions deserve answers. If either of them posts an answer, I'll accept the first to post, or someone else's answer only if it's significantly better (don't go stealing their credit).
Persistent Pages:
Persistent pages apparently have better support across different browsers and so should be preferred when cross-browser support matters (at least as of 11/24/2018). Otherwise, consider using an event page if the extension is just for Chrome, as persistence is rarely needed.
onInstalled:
Context menus and anything with the declarativeXXXX naming scheme should be initialized once in the onInstalled event, but there aren't many other things that require one-time initialization.
onStartup:
With persistent pages, the script is loaded once and never unloaded unless the browser is closed, so onStartup doesn't have much use.
In the Script:
As said above, since the script is loaded only once per browser restart (and also run once if the extension is installed on an already-running browser), all per-browsing-session initialization can be done here, as in the sketch below.
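To make that concrete, here's a rough sketch of how a persistent background script might be laid out under these guidelines (the chrome.* events are real; the menu entry and handler bodies are placeholders):

// One-time setup: things that survive browser restarts on their own, such
// as context menu entries, belong in onInstalled.
chrome.runtime.onInstalled.addListener(function () {
    chrome.contextMenus.create({
        id: "example-entry", // placeholder id and title
        title: "Example entry",
        contexts: ["page"]
    });
});

// Per-session setup: listeners registered at the top level of the script
// are re-registered every time the background page loads (i.e. on every
// browser start), so keybinds keep working after a crash or restart.
chrome.commands.onCommand.addListener(function (command) {
    console.log("Command fired: " + command);
});

chrome.tabs.onCreated.addListener(function (tab) {
    console.log("Tab created: " + tab.id);
});

chrome.tabs.onActivated.addListener(function (activeInfo) {
    console.log("Tab activated: " + activeInfo.tabId);
});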
Event Pages:
I didn't have a clear idea of exactly what non-persistent pages were for (I believe this is what wOxxOm is referring to as an event page, since googling that term takes me to defunct documentation and redirects to documentation without the term) prior to this question, so I'll clear that up for those who may be in the same boat. A non-persistent script is run (at some point; I don't know exactly when and won't be testing this) and registers its event listeners. After that, the script dies, but the listeners remain. This means that initialization can be done in onInstalled and onStartup (since I know for sure when these run, unlike the non-persistent page script itself), and every event you registered for will reactivate that part of your script (really it just runs the callback you provided, but whatever).
onInstalled:
Just like with persistent pages, use this for the things that require one-time initialization.
onStartup:
If creating an event page, I'd leave no code in the script body and put everything in a listener, but I'm not well versed in event pages, so I'll update this if someone comments on it being wrong. Anything requiring initialization on each browser restart should go in this listener.
In the Script:
As said above, I don't have a great understanding of event scripts, but it seems the only code in your script should be for setting up listeners and whatever variables those listeners need (see the sketch below). No significant work should happen at the top level of the script; still, do what you must to make your extension work, of course (or just use a persistent page).
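For an event page the layout is essentially the same, with the extra constraint that listeners must be registered synchronously at the top level so Chrome can re-attach them when it wakes the page (again, the handler bodies are placeholders):

// Event pages are unloaded when idle, and the script body re-runs on every
// wake-up, so nothing but listener registration should happen here.
chrome.runtime.onStartup.addListener(function () {
    // Per-browsing-session initialization goes in onStartup rather than in
    // the script body, which runs more often than once per session.
    console.log("Browser started");
});

chrome.commands.onCommand.addListener(function (command) {
    console.log("Command fired: " + command);
});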
I've been reading through Google's slides on so-called pre-optimisation. (For those interested, or for those who don't know what I'm talking about, this slide kind of summarises it.)
In HTML5 we can prefetch and prerender pages in the link element. Here's an overview. We can use the rel values dns-prefetch, subresource, prefetch and prerender.
The first confusing thing is that apparently only prefetch is in the HTML5 (and 5.1) spec, but none of the others are. (Yet!) The second is that browser support is OK for (dns-)prefetch but quite bad for the others. Especially Firefox's lack of support for prerender is annoying.
Thirdly, the question I ask myself is this: does the prefetching (or any other method) happen as soon as the browser reads the line (and does it then block the current page load), or does it wait to load the resources in the background until the current page has loaded completely?
If it's loaded synchronously in a blocking manner, is there a way to do this asynchronously or after page load? I suppose with a JS solution like the one below, but I'm not sure it will run asynchronously then.
// Build the <link> hint dynamically and inject it into the document head.
var pre = document.createElement("link");
pre.setAttribute("rel", "prerender prefetch");
pre.setAttribute("href", "next-page.php");
document.head.appendChild(pre);
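Alternatively, I suppose I could wait until the current page has finished loading before injecting the hint, something like the following (same placeholder URL as above):

// Inject the hint after the window's load event, so the prefetch/prerender
// cannot compete with resources needed by the current navigation.
window.addEventListener("load", function () {
    var pre = document.createElement("link");
    pre.setAttribute("rel", "prerender prefetch");
    pre.setAttribute("href", "next-page.php");
    document.head.appendChild(pre);
});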
Please answer both questions if applicable!
EDIT 17 September
After reading through the Editor's draft of Resource Hints I found the following (emphasis mine):
Resource fetches that may be required for the next navigation can negatively impact the performance of the current navigation context due to additional contention for the CPU, GPU, memory, and network resources. To address this, the user agent should implement logic to reduce and eliminate such contention:

Resource fetches required for the next navigation should have lower relative priority and should not block or interfere with resource fetches required by the current navigation context.

The optimal time to initiate a resource fetch required for the next navigation is dependent on the negotiated transport protocol, the user's current connectivity profile, available device resources, and other context-specific variables. The user agent is left to determine the optimal time at which to initiate the fetch - e.g. the user agent may decide to wait until all other downloads are finished, or may choose to pipeline requests with low priority if the negotiated protocol supports the necessary primitives. Alternatively, the user agent may opt-out from initiating the fetch due to resource constraints, user preferences, or other factors.
Notice how much the user agent may do this or that. I really fear that this will lead to different implementations by different browsers, which will lead to divergence again.
The question remains, though. It isn't clear to me whether loading an external resource via prefetch (or one of the others) happens synchronously (and thus, when placed in the head, before the content is loaded) or asynchronously (with a lower priority). I'm guessing the latter, though I don't understand how that is possible, because there's nothing in the link spec that would allow asynchronous loading of link elements' content.
Disclaimer: I haven't spent a lot of dedicated time with these specs, so it's quite likely I've missed some important point.
That said, my read agrees with yours: if a resource is fetched as a result of a pre-* optimization, it's likely to be fetched asynchronously, and there's little guarantee about where in the pipeline you should expect it to be fetched.
The intent seems advisory rather than prescriptive, in the same way that the CSS will-change property advises rendering engines that an element should receive special consideration, but doesn't prescribe the behavior, or indeed that there should be any particular behavior.
there's nothing in the link spec that would allow asynchronous loading of link elements' content
Not all links would load content in any case (an author link type wouldn't cause the UA to download the contents of a mailto: URL), and I can't find any mention of fetching resources in the spec apart from the discussion around crossorigin:
The exact behavior for links to external resources depends on the exact relationship, as defined for the relevant link type. Some of the attributes control whether or not the external resource is to be applied (as defined below)... User agents may opt to only try to obtain such resources when they are needed, instead of pro-actively fetching all the external resources that are not applied.
(emphasis mine)
That seems to open the door for resources specified by a link to be fetched asynchronously (or not at all).
This question is a little "meta" for SO, but there doesn't seem to be a better place to ask it...
According to Google, realtime collaborative objects are never deleted from the model. So it makes sense to pool objects where possible and reuse them, rather than not-really-deleting them and subsequently creating new ones, thus preventing an unnecessary increase in file size and overhead.
And here's the problem: in an "undo" scenario, this would mean pulling a deleted object out of the trash pool. But "undo" only applies to operations by the local user, and I can't see how the realtime engine could cope if that "deleted" object had already been claimed by a different user.
My question is, am I missing something or wrong-thinking, and/or is there an alternative to a per-user pool?
(It also occurs to me that as a feature, the API could handle pooling deleted objects, automatically minimizing file-bloat.)
I think you have to be very careful about reusing objects in the way you describe. It's really hard to get right. Are you actually running into size issues? In general, as long as you don't constantly create and throw out objects, it shouldn't be a big deal.
You can delete the contents of the collab object when it's not being used, to free up space. That should generally be enough.
(Note: yes, the API could theoretically handle this object cleanup automatically. It turns out to be a really tricky problem to get right, due to features like undo. It might show up as a future feature if it becomes a real issue for people.)
Adding to Cheryl's answer, the one thing that I see as particularly challenging (actually, impossible) is the pulling-an-object-from-the-pool stuff:
Let's say you have a pool of objects, which (currently) contains a single object O1.
When a client needs a new object, it will first check the pool. If the pool is not empty, it will pull an object from there (the O1 object) and use it, right?
Now, consider the scenario where two clients (a.k.a. editors/collaborators) need a new object at the same time. Each of these clients will run the logic described in the previous paragraph. That is: both clients will check whether the pool is empty, and both clients will pull O1 out of the pool.
So the losing client will "think" for some time that it succeeded. It will grab an object from the pool and do some things with it. Later on it will receive an event (E) telling it that the object was actually pulled by another client. At this point the losing client will need to create another object and re-apply whatever changes it made to the first object to this second object.
Given that you do not know if/when the (E) event is going to fire, this actually means that every client needs to be prepared to replace every collaborative object it uses with a new one. This seems quite difficult. Making it even harder is the fact that you cannot make model changes from inside event handlers (as this will trump the redo/undo stack). So the actual reaction to the (E) event needs to be carried out outside of the (E) event handler. Thus, in the window between receiving the (E) event and fixing up the model, your UI layer will not be able to use the model.
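To make the race concrete, here's a sketch in plain JavaScript (the pool, the owner field, and createFreshObject are hypothetical illustrations, not actual Realtime API calls):

// Both clients run this concurrently against a shared pool; nothing stops
// them from picking the same candidate, which is exactly the race above.
function tryClaim(pool, clientId) {
    if (pool.length === 0) {
        return null; // pool empty: caller must create a brand-new object
    }
    var candidate = pool[0];
    candidate.owner = clientId; // both clients may set this "simultaneously"
    return candidate;
}

// Later, a sync event (E) may reveal that the other client's claim won.
// The loser must then start over with a fresh object and replay its edits.
function onClaimLost(replayChanges) {
    var replacement = createFreshObject(); // hypothetical factory
    replayChanges(replacement);
}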
Okay, so I have this small procedural SVG editor in Clojure.
It has a code pane where the user creates code that generates an SVG document, and a preview pane. The preview pane is updated whenever the code changes.
Right now, on a text change event, the code gets recompiled on the UI thread (Ewwwww!) and the preview pane updated. The compilation step should instead happen asynchronously, and agents seem like a good answer to that problem: ask an agent to recompile the code on an update, and pass the result to the image pane.
I have not yet used agents, and I do not know whether they work with an implicit queue, but I suspect so. In my case, I have zero interest in computing "intermediate" steps (think fast keystrokes: if a keystroke happens before a recompilation has started, I simply want to discard that recompilation); i.e. I want a send to overwrite any pending agent computation.
How do I make that happen? Any hints? Or even a code sample? Is my rambling even making sense?
Thanks!
You describe a problem that has more to do with execution flow control than with shared state management. Hence, you might want to set STM aside for a moment and look into futures: like agents, they're executed on a thread pool, but unlike agents they can be stopped by calling future-cancel, and their status can be inspected with future-cancelled?.
There are no strong guarantees that the thread the future is executing on can be effectively stopped. Still, your code will be able to try to cancel the future and move on to schedule the next recompilation.
Agents do indeed work on a queue: each function gets the current state of the agent and produces the next state of the agent. Agents track an identity over time. This sounds like a little more than you need; atoms are a slightly better fit for your task and are used in a very similar manner.
I was building out a little project that made use of HTML localStorage. While I was nowhere close to the 5MB limit for localStorage, I decided to do a stress test anyway.
Essentially, I loaded data objects into a single localStorage object until it was just slightly under that limit, and made requests to set and get various items.
I then timed the execution of setItem and getItem informally using the JavaScript Date object and event handlers (bound get and set to buttons in HTML and just clicked =P).
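Roughly, the timing harness looked something like this (a simplified sketch, not the exact code; the key name and payload size are placeholders):

// Build a large payload (roughly 1MB of "x") to stress the store, then time
// individual setItem/getItem calls with the Date object.
var payload = new Array(1024 * 1024).join("x");

function timeStorageOp(label, op) {
    var start = new Date().getTime();
    op();
    console.log(label + ": " + (new Date().getTime() - start) + "ms");
}

timeStorageOp("setItem", function () {
    localStorage.setItem("bigKey", payload);
});

timeStorageOp("getItem", function () {
    var value = localStorage.getItem("bigKey");
});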
The performance was horrendous, with requests taking between 600ms and 5,000ms, and memory usage coming close to 200MB in the worse cases. This was in Google Chrome with a single extension (Google Speed Tracer), on Mac OS X.
In Safari, it's basically >4,000ms all the time.
Firefox was a surprise, having pretty much nothing over 150ms.
These were all done with the machine basically idle: no YouTube (Flash) getting in the way, not many tabs (nothing but Gmail), and no applications open other than background processes and the browser. Once a memory-intensive task popped up, localStorage slowed down proportionately as well. FWIW, I'm running a late-2008 Mac: a 2.0GHz dual-core with 2GB DDR3 RAM.
===
So the questions:
Has anyone done any benchmarking of localStorage get and set for various key and value sizes, and on different browsers?
I'm assuming the large variance in latency and memory usage between Firefox and the rest is a Gecko vs. WebKit issue. I know the answer can be found by diving into those code bases, but I'd definitely like to know if anyone can explain relevant details about the implementation of localStorage in these two engines that would account for the massive difference in efficiency and latency across browsers.
Unfortunately, I doubt we'll be able to solve it, but the closest one can get is at least understanding the limitations of the browser in its current state.
Thanks!
Browser and version become a major issue here. The thing is, while there are so-called "WebKit-based" browsers, they add their own patches as well. Sometimes those make it into the main WebKit repository, sometimes they do not. And since browsers are always moving targets, this benchmark could look completely different if you use a beta or nightly build.
Then there is the overall use case. If your use case is not the norm, the issues will not be as apparent, and they're less likely to get noticed and addressed. Even when there are patches, browser vendors have a lot of issues to juggle, so there's a chance a fix is slated for a later build (again, nightly builds might produce different results).
Honestly, the best course of action would be to discuss these results on the appropriate browser mailing list / forum if they haven't been addressed already. People will be more likely to do testing and see if the results match.