localStorage and locking

Since (HTML5) localStorage and its equivalents persist across tabs and windows, I've thought about using it for message passing. The problem is that fetch and store are separate operations, and therefore not atomic. I have models that rely on UUID generation, conflict resolution, and beaconing to do the small subset of what I need, but my real question is this:
Since local storage is a shared memory resource, what locking mechanisms are available for mutually exclusive access?

Benjamin Dumke-von der Ehe recently came up with some (experimental) locking code for localStorage: http://balpha.de/2012/03/javascript-concurrency-and-locking-the-html5-localstorage/
Update: the code link in the article is broken, but the code is here on GitHub, specifically mentioning the article.
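For a flavor of what such locking involves, here is a minimal sketch (my simplification, not Benjamin's actual algorithm): a tab claims a lock key with a random owner token and an expiry, then re-reads after a short delay to see whether its claim survived a competing writer. The key name, delay, and TTL are invented for the example, and the scheme is not bulletproof:

```javascript
// Simplified localStorage lock - for illustration only, not a correct
// mutual-exclusion primitive. The expiry guards against tabs that crash
// while holding the lock.
const myId = Math.random().toString(36).slice(2);

function tryLock(key, ttlMs, onAcquired, onBusy) {
  const now = Date.now();
  const raw = localStorage.getItem(key);
  if (raw && JSON.parse(raw).expires > now) return onBusy(); // live lock held

  // Claim the lock, then verify after a short delay that no other tab
  // overwrote the claim inside the fetch/store race window.
  localStorage.setItem(key, JSON.stringify({ owner: myId, expires: now + ttlMs }));
  setTimeout(() => {
    const check = JSON.parse(localStorage.getItem(key));
    if (check && check.owner === myId) onAcquired();
    else onBusy(); // lost the race to another tab
  }, 30);
}

function unlock(key) {
  const raw = localStorage.getItem(key);
  if (raw && JSON.parse(raw).owner === myId) localStorage.removeItem(key);
}
```

The article linked above goes to much greater lengths to close the remaining races; this sketch only shows why a plain get-then-set is not enough on its own.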

I think what you really need is Channel Messaging, though as far as I'm aware no one has implemented it yet. It allows arbitrary client-side messaging between scripts.
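For reference, a minimal sketch of how Channel Messaging is specified to work, assuming the second script lives in an iframe on the same page:

```javascript
// Create a message channel and hand one end to another browsing context
// (an iframe here; it could equally be a worker or another window).
const channel = new MessageChannel();
const iframe = document.querySelector('iframe');

// Transfer port2 to the other script...
iframe.contentWindow.postMessage('init', '*', [channel.port2]);

// ...and talk over port1 from this one.
channel.port1.onmessage = (e) => console.log('reply:', e.data);
channel.port1.postMessage({ hello: 'from the opener' });

// The script in the iframe would pick up its end like this:
// window.onmessage = (e) => {
//   const port = e.ports[0];
//   port.onmessage = (ev) => port.postMessage({ got: ev.data });
// };
```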

There aren't any built in. You'll have to come up with your own locking mechanism. You can of course use any of the existing methods that other people have come up with for other things (like locking in memcached, for instance).


Why should I prefer HTTP REST over HTTP RPC JSON-Messaging style in conjunction with CQRS?

Every time I read about how web services should communicate, the first thing that comes up is:
Use REST, because it decouples client and server!
I would like to build a web service where each query and command is an HTTP endpoint. With REST I would have fewer endpoints, because of its nature of thinking in resources instead of operations (you typically have more operations than resources).
Why do I get stronger coupling by using RPC instead of REST?
What is the benefit of using REST over an RPC JSON-messaging style?
Additional information: by messaging I mean synchronous messaging (request/response).
Update: I think it would also be possible, or even better, to have only a single HTTP endpoint that can handle a query or command depending on the given HTTP verb.
Before I get to the CQRS part, I'd like to spend a little time talking about the advantages and disadvantages of REST. I don't think it's possible to answer the question before we've established a common understanding of when and why to use REST.
REST
As with most other technology options, REST, too, isn't a silver bullet. It comes with advantages and disadvantages.
I like to use Richardson's Maturity Model, with Martin Fowler's additional level 0, as a thinking tool.
Level 0
Martin Fowler also calls level 0 the swamp of POX, but I think that what really distinguishes this level is simply the use of RPC over HTTP. It doesn't have to be XML; it could be JSON instead.
The primary advantage at this level is interoperability. Most systems can communicate via HTTP, and most programming platforms can handle XML or JSON.
The disadvantage is that systems are difficult to evolve independently of clients (see level 3).
One of the distinguishing traits of this style is that all communication goes through a single endpoint.
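To make that concrete, a level 0 call typically tunnels the operation name through a single endpoint; the endpoint, method name, and payload below are invented for the example:

```javascript
// Level 0: every operation is a POST of an envelope to one endpoint.
// The operation lives in the payload, not in the URL or the HTTP verb.
const response = await fetch('/api', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ method: 'getCustomer', params: { id: 1234 } }),
});
const customer = await response.json();
```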
Level 1
At level 1, you start to treat various parts of your API as separate resources. Each resource is identified by a URL.
One advantage is that you can now start to use off-the-shelf software, such as firewalls and proxy servers, to control access to various distinct parts of the system. You can also use HTTP redirects to point clients to different endpoints, although there are some pitfalls in that regard.
I can't think of any disadvantages, apart from those of level 0.
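For contrast with level 0, the same invented operation at level 1 is addressed to its own resource URL, even though it may still be tunnelled through POST:

```javascript
// Level 1: each customer is a distinct resource with its own URL, so
// firewalls and proxies can treat /customers/* and /invoices/* differently.
await fetch('/customers/1234', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ method: 'getCustomer' }),
});
```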
Level 2
At this level, not only do you have resources, but you also use HTTP verbs, such as GET, POST, DELETE, etc.
One advantage is that you can now begin to take more advantage of HTTP infrastructure. For instance, you can instruct clients to cache responses to GET requests, whereas other requests typically aren't cacheable. Again, you can use standard HTTP firewalls and proxies to implement caching. You can get 'web-scale' caching for free.
The reason that level 2 builds on level 1 is that you need each resource to be separate, because you want to be able to cache resources independently from each other. You can't do this if you can't distinguish various resources from each other, or if you can't distinguish reads from writes.
The disadvantage is that it may involve more programming work to implement this. Also, all the previous disadvantages still apply. Clients are tightly coupled to your published API. If you change your URL structure, clients break. If you change data formats, clients break.
Still, many so-called REST APIs are designed and published at this level, so in practice it seems that many organisations find this a good trade-off between advantages and disadvantages.
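As a sketch of what level 2 buys you (paths and header values are examples only): reads become cacheable GETs and writes use other verbs on the same resource:

```javascript
// Level 2: the HTTP verb carries the intent, so standard HTTP caching
// applies. Any proxy between client and server may cache this GET if the
// server marks the response with, e.g., Cache-Control: max-age=3600.
const res = await fetch('/customers/1234'); // safe, cacheable read

// Writes use non-cacheable verbs on the same resource.
await fetch('/customers/1234', {
  method: 'PUT',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ name: 'New Name' }),
});
```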
Level 3
This is the level of REST design that I consider true REST. It's nothing like the previous levels; it's a completely different way to design APIs. In my mind, there's a hard divide between levels 0-2, and level 3.
One distinguishing feature of level 3 is that you must design content negotiation into the API from the start. Once you have that, though, the reasons to choose this API design style become clearer.
To me, the dominant advantage of level 3 APIs is that you can evolve them independently of clients. If you're careful, you can change the structure, even the navigation graph, of your API without breaking existing clients. If you need to introduce breaking changes, you can use content negotiation to ensure that clients can opt-in to the breaking change, whereas legacy clients will keep working.
Basically, when I'm asked to write an API where I have no control over clients, my default choice is level 3.
Designing a level 3 REST API requires you to design in a way that's unusual and alien to many, so that's a disadvantage. Another disadvantage is that client developers often find this style of API design unfamiliar, so they often try to second-guess, or reverse-engineer, your URL structure. If they do, you'll have to expend some effort to prevent them from doing that as well, since it will destroy your ability to evolve the API.
In other words, level 3 APIs require considerable development effort, particularly on the server-side, but clients also become more complex.
I will, though, reiterate the advantage: you can evolve a level 3 REST API independently of clients. If you don't control clients, backwards compatibility is critical. Level 3 enables you to evolve APIs while still retaining compatibility. I'm not aware of any way to achieve this with any of the other styles.
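A hedged sketch of what that looks like from the client side: the client hard-codes only the entry URL, a media type, and link relation names, and discovers every other URL at run time. The HAL-like link format and media type here are invented for the example:

```javascript
// Level 3: URLs are discovered from responses, so the server can
// restructure them freely without breaking this client.
const root = await (await fetch('/', {
  headers: { Accept: 'application/vnd.example+json' }, // content negotiation
})).json();

// Follow the 'customers' link wherever the server currently puts it.
const customers = await (await fetch(root._links.customers.href)).json();
```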
CQRS
Now that we've identified some advantages and disadvantages of REST, we can start to discuss whether it's applicable to CQRS.
The most fundamental agreement between Greg Young and Udi Dahan concerning CQRS is that it's not a top-level architecture.
In a nutshell, the reason for this is that the messages (commands and events) and queries that make up a CQRS system are sensitive to interpretation. In order to do something, a client must know which command to issue, and the server must know how to interpret it. The command, thus, is part of the system.
The system may be distributed across clients and servers, but the messages and data structures are coupled to each other. If you change how your server interprets a given message, that change will impact your clients. You can't evolve clients and servers independently in a CQRS architecture, which is the reason why it's not a top-level architecture.
So, given that it's not a top-level architecture, the transport architecture becomes fairly irrelevant. In a sense, the only thing you need in order to send messages is a single 'service bus' endpoint, which could easily be a level 0 endpoint, if all you need is interoperability. After all, the only thing you do is to put a message on a queue.
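As an illustration, a command sent to such a single 'service bus' endpoint could be as plain as the following; the endpoint and envelope shape are invented for the example:

```javascript
// CQRS over a level 0 endpoint: the envelope names the command and the
// bus just queues it. How to interpret 'RenameCustomer' is knowledge
// shared by client and server - which is exactly the coupling discussed above.
await fetch('/bus', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    type: 'RenameCustomer',
    payload: { customerId: 1234, newName: 'Contoso Ltd.' },
  }),
});
```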
The final answer, then, is, as always: it depends.
Is speed of delivery the most important criterion? And can you control all clients and servers at the same time? Then, perhaps level 0 is all you need. Perhaps level 2 is fine.
On the other hand, if you have clients out of your control (mobile apps, hardware (IoT), business partners using your public API, etc.), you must consider how to deal with backwards and forwards compatibility, in which case level 3 is (IMO) required. In that case, though, I'd suggest keeping CQRS an implementation detail.
The best answer would probably be "it depends", but one of the big things about real REST is that it is stateless - and being stateless implies all sorts of things.
Basically, if you look at the HATEOAS constraint, you have the reason for the decoupling.[1]
I think that RPC-JSON is not stateful per se, but it definitely is not defined as stateless. So there you have the strongest argument for why the decoupling is so much stronger with REST.
[1] https://en.wikipedia.org/wiki/HATEOAS, http://restfulapi.net/hateoas/

Realtime synchronization of live data over network

How do you sync data between two processes (say client and server) in real time over network?
I have various documents/datasets constructed on the server, which are downloaded and displayed by clients. Once downloaded, the document receives continuous updates in order to remain fresh.
It seems to be a simple and commonly occurring concept, but I cannot find any tools that provide this level of abstraction. I am not even sure what I am looking for. Perhaps there is a similar concept with solid tool support? Perhaps there is a chain of different tools that must be put together? Here's what I have considered so far:
I am required to propagate every change in a single hop (0.5 RTT), which rules out polling (typically >10 RTT) and cache invalidation techniques (1.5 RTT).
Data replication and simple notification broadcasts are not an option, because there is too much data and too many changes. Clients must be able to select specific documents to download and monitor for changes.
I am currently using the message-passing pattern, which does the job, but it is hopelessly unproductive. It works at far too low a level of abstraction. It is laborious, error-prone, and it doesn't scale well with increasing application complexity.
HTTP and other RPC-like techniques are good for the initial fetch, but they encourage polling for subsequent synchronization. When performing reverse requests (from data source to data consumer), change notifications are possible, but it's even more complicated than message passing.
Combining RPC (for the initial fetch) with message passing (for updates) turned out to be a nightmare due to the complexity involved in coordinating communication over the two parallel connections as well as due to the impedance mismatch between the two paradigms. I need something unified.
WebSocket & Comet are popular methods to implement change notification, but they need additional libraries to be productive and I am not aware of any libraries suitable for my application.
Message queues merely put an intermediary on the network while maintaining the basic message-passing pattern. Custom message filters/routers allow me to get closer to the live document concept, but I feel like I am implementing a custom middleware layer on top of the MQ.
I have tons of additional requirements (native observable data structure API on both ends, incremental updates, custom message filters, custom connection routing, cross-platform, robustness & scalability), but before considering those requirements, I need to find some tools that at least attempt to do what I need. I am trying to avoid in-house frameworks for the standard reasons - cost, time to market, long-term maintenance, and keeping developers happy.
My conclusion at the moment is that there is no such live document synchronization framework. An in-house solution is the way to go, but many existing components can be used as part of it.
It is pretty simple to layer live document logic on top of WebSocket or any other message passing platform. The server just sends the document as a separate message when the connection is initiated, and then again after every change. Automated reconnection and some connection monitoring must be added to handle network failures.
Serialization at both ends is a separate problem targeted by many existing libraries. Detecting changes in server-side data structures (needed to initiate push) is yet another separate problem that has its own set of patterns and tools. Incremental updates and many other issues can be solved by intermediaries intercepting the connection.
This approach will work with current technology at the cost of extensive in-house glue code. It can be incrementally substituted with standard components as they become available.
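A minimal client-side sketch of that layering, using only the standard WebSocket API; the message shapes and the fixed reconnect delay are assumptions for the example:

```javascript
// Live document over WebSocket: the first message is a full snapshot,
// every later message is an incremental update; reconnecting re-syncs.
function watchDocument(url, onChange) {
  let doc = null;
  const ws = new WebSocket(url);
  ws.onmessage = (e) => {
    const msg = JSON.parse(e.data);
    if (msg.kind === 'snapshot') doc = msg.document;                  // initial state
    else if (msg.kind === 'update') Object.assign(doc, msg.changes);  // pushed change
    onChange(doc);
  };
  // Naive reconnection; a real implementation would add backoff and
  // connection health monitoring, as noted above.
  ws.onclose = () => setTimeout(() => watchDocument(url, onChange), 1000);
}

watchDocument('wss://example.com/docs/42', (doc) => console.log('fresh copy:', doc));
```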
WebSocket already includes resource URIs, routing, and a few other nice features. Useful intermediaries and libraries will likely emerge in the future. HTTP with text/event-stream MIME type is a possible future alternative to WebSocket. The advantage of HTTP is that existing tools can be reused with little modification.
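For comparison, the text/event-stream route can lean on the browser's built-in EventSource, which even reconnects automatically; the URL and event names below are placeholders:

```javascript
// Server-Sent Events: one long-lived HTTP response with MIME type
// text/event-stream; the browser handles reconnection itself.
const source = new EventSource('/docs/42/stream');

source.addEventListener('snapshot', (e) => console.log('full document:', JSON.parse(e.data)));
source.addEventListener('update', (e) => console.log('incremental change:', JSON.parse(e.data)));
```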
I've completely thrown away the pattern of combining RPC pull with a separate push channel, despite its rich tool support. Pushing everything in 0.5 RTT requires the push channel to use exactly the same technology as the pull channel, i.e. reverse RPC. Reverse RPC is like message passing, except that it introduces redundant return values, throws away useful connection semantics, and makes it hard to insert content-agnostic intermediaries into the stream.

MySQL: which API to use?

I'm just getting started with interfacing to MySQL from a C++ app. The app is pretty simple: it's a Linux web server, and the C++ code retrieves JavaScript from a local database to return to the client via Apache and Ajax. The database will contain no more than a few thousand short JavaScript programs.
Question: any advice on which API I should use? I'm just reading through the docs on dev.mysql.com, and there doesn't seem to be any good reason to choose one or other of libmysql, Connector/C, Connector/C++, MySQL++, or Connector/ODBC. Thanks.
With no more than a few thousand rows, chances are you should pick your API based on your language preferences, not the other way round - so go ahead and choose whatever fits your mood.
If your app's performance stands or falls with the performance differences between the MySQL connectors, you should be quite busy fixing your design elsewhere.
I personally prefer portability, so I tend to use ODBC a lot, accepting the small performance hit, but others might think differently. If you never ever want to use a different RDBMS, stay away from ODBC - without the portability benefit it's quite ugly.
I would just use the raw C API. Seems to be the simplest way with the least overhead.

Is RavenDB just a frontend for Access?

I've started using Raven for my latest project. When my boss learned about it, he mentioned that it's based on Access, and that he has had very bad experiences with multiple users and Access. Now I have to either switch or prove to him that he is wrong.
No, it isn't. The confusion is because RavenDB can use ESENT for data storage and ESENT used to be called Jet Blue. It was called Jet Blue because it was originally developed to replace the Jet Red engine which was/is used in Access. The Wikipedia entry is quite accurate about the history and differences.
Laurion's answer is correct, but I also wanted to point out that in Raven you can swap out the ESENT storage engine for another that Oren developed called Munin.
From Ayende's blog post about Munin.
Raven.Munin is the actual implementation of a low level managed storage for RavenDB. I split it out of the RavenDB project because I intend to make use of it in additional projects.
At its core, Munin provides high performance transactional, non relational, data store written completely in managed code. The main point in writing it was to support the managed storage in RavenDB, but it is going to be used for Raven MQ as well, and probably a bunch of other stuff as well. I’ll post about Raven MQ in the future, so don’t bother asking about it.
Munin is a low level api, not something that you are likely to use directly. And it was explicitly modeled to give me an interface similar in capability to what Esent gives me, but in purely managed code.

What tools do distributed programmers lack?

I have a dream to improve the world of distributed programming :)
In particular, I feel there is a lack of the tools needed for debugging, monitoring, understanding and visualizing the behavior of distributed systems (heck, I had to write my own logger and visualizers to satisfy my requirements), and I'm writing a couple of such tools in my free time.
Community, what tools do you lack in this regard? Please describe one per answer, with a rough idea of what the tool would be supposed to do. Others can point out the existence of such tools, or someone might get inspired and write them.
OK, let me start.
A distributed logger with a high-precision global time axis - allowing one to register events from different machines in a distributed system with high precision, independently of clock offset and drift; with sufficient scalability to handle the load of several hundred machines and several thousand logging processes. Such a logger makes it possible to find transport-level latency bottlenecks in a distributed system by seeing, for example, how many milliseconds it actually takes for a message to travel from the publisher to the subscriber through a message queue, etc.
Syslog is not OK because it's not scalable enough - 50,000 logging events per second would be too much for it, and timestamp precision would suffer greatly under such load.
Facebook's Scribe is not OK because it doesn't provide a global time axis.
In fact, both syslog and Scribe register events under arrival timestamps, not under occurrence timestamps.
Honestly, I don't lack such a tool - I've written one for myself, I'm greatly pleased with it and I'm going to open-source it. But others might.
P.S. I've open-sourced it: http://code.google.com/p/greg
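The hard part is the global time axis. One textbook building block is Cristian-style offset estimation against a reference clock; the sketch below illustrates the idea only and is not how Greg actually works (the time server URL and response shape are invented):

```javascript
// Estimate this machine's clock offset against a time server by timing a
// round trip and assuming the server read its clock halfway through it.
async function estimateOffset(timeServerUrl) {
  const t0 = Date.now();
  const serverTime = (await (await fetch(timeServerUrl)).json()).now;
  const t1 = Date.now();
  return serverTime - (t0 + t1) / 2; // localTime + offset ≈ global time
}

// Tag every event with its corrected occurrence time, not its arrival time.
function logEvent(offset, message) {
  const event = { globalTime: Date.now() + offset, message };
  console.log(JSON.stringify(event)); // a real logger would ship this to a collector
}
```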
Dear Santa, I would like visualizations of the interactions between components in the distributed system.
I would like a visual representation showing:
The interactions among components, either as a UML collaboration diagram or sequence diagram.
Component shutdown and startup times as self-interactions.
On which hosts components are currently running.
Location of those hosts, if available, within a building or geographically.
Host shutdown and startup times.
I would like to be able to:
Filter the components and/or interactions displayed to show only those of interest.
Record interactions.
Display a desired range of time in a static diagram.
Play back the interactions in an animation, with typical video controls for playing, pausing, rewinding, fast-forwarding.
I've been a good developer all year, and would really like this.
Then again, see this question - How to visualize the behavior of many concurrent multi-stage processes?.
(I'm shamelessly referring to my own stuff, but that's because the problems solved by this stuff were important to me, and the current question is precisely about problems that are important to someone.)
You could have a look at some of the tools that come with Erlang/OTP. It doesn't have all the features other people have suggested, but some of them are quite handy and built with a lot of experience. For instance:
A debugger that can debug concurrent processes, also remotely, AFAIR
Introspection tools for Mnesia/ETS tables as well as process heaps
Message tracing
Load monitoring on local and remote nodes
A distributed logging and error report system
A profiler that works for distributed scenarios
A process/task/application manager for distributed systems
These come, of course, in addition to the base features the platform provides, like node discovery, an IPC protocol, RPC protocols and services, transparent distribution, distributed built-in database storage, a global and node-local registry for process names, and all the other underlying stuff that makes the platform tick.
I think this is a great question, and here's my $0.02 on a tool I would find really useful.
One of the challenges I find with distributed programming is in the deployment of code to multiple machines. Quite often these machines may have slightly varying configuration or worse have different application settings.
The tool I have in mind would be one that could, on demand, reach out to all the machines on which the application is deployed and report system information. If one specifies a settings file or a resource like a registry key, it would provide the list for all the machines. It could also look at the access privileges of the users running the application.
A refinement would be to provide indications when settings are not matching a master list provided by the developer. It could also indicate servers that have differing configurations and provide diff functionality.
This would be really useful for .NET applications since there are so many configurations (machine.config, application.config, IIS Settings, user permissions, etc) that the chances of varying configurations are high.
In my opinion, what is missing is a distributed programming platform... a platform that makes application programming over distributed systems as transparent as non-distributed programming is now.
Isn't it a bit early to work on tools when we don't even agree on a platform? We have several flavors of actor models, virtual shared memory, UMA, NUMA, synchronous dataflow, tagged-token dataflow, multi-hierarchical memory vector processors, clusters, message-passing mesh or network-on-a-chip, PGAS, DGAS, etc.
Feel free to add more.
To contribute:
I find myself writing a lot of distributed programs by constructing a DAG, which gets transformed into platform-specific code. Every platform optimization is a different kind of transformation rule on this DAG. You can see the same approach in Microsoft's Accelerator and Dryad, Intel's Concurrent Collections, MIT's StreamIt, etc.
A language-agnostic library that collects all these DAG transformations would save re-inventing the wheel every time.
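A toy sketch of what the core of such a library could look like: a DAG of operation nodes plus rewrite rules applied bottom-up, with the node and rule shapes invented for the example:

```javascript
// A tiny DAG rewrite engine: nodes are { op, inputs }, and each platform
// optimization is a rule that matches a node and returns a replacement.
const fuseMaps = (node) =>
  node.op === 'map' && node.inputs[0]?.op === 'map'
    ? { op: 'map',
        fn: (x) => node.fn(node.inputs[0].fn(x)), // compose the two functions
        inputs: node.inputs[0].inputs }
    : null;

function rewrite(node, rules) {
  const children = (node.inputs || []).map((n) => rewrite(n, rules));
  let current = { ...node, inputs: children };
  for (const rule of rules) current = rule(current) || current;
  return current;
}

// map(+1) over map(*2) fuses into a single map over the source.
const graph = {
  op: 'map', fn: (x) => x + 1,
  inputs: [{ op: 'map', fn: (x) => x * 2, inputs: [{ op: 'source' }] }],
};
console.log(rewrite(graph, [fuseMaps]).inputs[0].op); // -> 'source'
```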
You can also take a look at Akka:
http://akka.io
Let me notify those who've favourited this question by pointing to the Greg logger - http://code.google.com/p/greg . It is the distributed logger with a high-precision global time axis that I talked about in another answer in this thread.
Apart from the mentioned tool for "visualizing the behavior of many concurrent multi-stage processes" (splot), I've also written "tplot" which is appropriate for displaying quantitative patterns in logs.
A large presentation about both tools, with lots of pretty pictures, is here.