Google realtime object pool - google-drive-api

This question is a little "meta" for SO, but there doesn't seem to be a better place to ask it...
According to Google, realtime collaborative objects are never deleted from the model. So it makes sense to pool objects where possible, rather than not-really-delete them and subsequently create new ones, thus preventing an unnecessary increase in file-size and overhead.
And here's the problem: in an "undo" scenario, this would mean pulling a deleted object out of the trash pool. But "undo" only applies to operations by the local user, and I can't see how the realtime engine could cope if that "deleted" object had already been claimed by a different user.
My question is, am I missing something or wrong-thinking, and/or is there an alternative to a per-user pool?
(It also occurs to me that as a feature, the API could handle pooling deleted objects, automatically minimizing file-bloat.)

I think you have to be very careful about reusing objects in the way you describe. Its really hard to get right. Are you actually running into size issues? In general as long as you don't constantly create and throw out objects, it shouldn't be a big deal.
You can delete the contents of the collab object when its not being used to free up space. That should generally be enough.
(Note, yes, the API could theoretically handle this object cleanup automatically. It turns out to be a really tricky problem to get right, do to features like undo. It might show up as a future feature if it becomes a real issue for people.)

Adding to Cheryl's answer, the one thing that I see as particularly challenging (actually, impossible) is the pulling-an-object-from-the-pool stuff:
Let's say you have a pool of objects, which (currently) contains a single object O1.
When a client needs a new object it will first check the pool. if the pool is not empty it will pull an object from there (the O1 object) and use it, right?
Now, consider the scenario where two clients (a.k.a, editors/collaborators) need a new object at the same time. Each of these clients will run the logic described in the previous paragraph. That is: both clients will check whether the pool is empty and both clients will pull O1 off of the pull.
So, the loosing client will "think" for some time that it succeeded. it will grab an object from the pool and will do some things with it. later on it will receive an event (E) that tells it that the object was actually pulled by another client. At this point the "loosing" client will need to create another object and re-apply whatever changes it did to the first object to this second object.
Given that you do not know if/when the (E) event is going to fire it actually means that every client needs to be prepared to replace every collaborative object it uses with a new one. This seems quite difficult. Making it more difficult is the fact that you cannot do model changes from event handlers (as this will trump the redo/undo stack). So the actual reaction to the (E) event need to be carried out outside of the (E) event handler. Thus, in the time between the receiving of the (E) event and the fix to the model, your UI layer will not be able to use the model.

Related

Reagent - methods of storing ui state together with (or separate from?) server-persisted state?

I'm using multiple atoms within a map called app-state and it's working quite well architecturally so far. The state distributed across those atoms is normalised, reflecting as it is stored in datomic, and of course what the client is initialised with is a specific subset of what's in datomic. This is me preparing the way to try out datascript (which is what gave me the aha moment of why client state is so much better fully normalised, even if not using datascript).
I have a question at this point. We all know that some state in reagent is a reflection of what's in the server's database (typically), but there's also state in reagent concerning solely the current condition of the ui. That state will vanish when the page is re-loaded and there's (typically) no need to store that on the server.
So, I'm looking at my list of atoms and realising that I have some atoms which hold database-record-like maps, i.e. they contain exact reflections of datomic entities, (which arrive by transit), which is great.
But now I notice I also want some ui state per datomic entity.
So the question arises whether to (this seems wrong to me) add some keys to what came from datomic, of the ui state that is irrelevant to datomic, but that the client needs (i.e., dump it into the same nested map). That is entirely possible, but seems wrong, and so suggests.... (this being my idea as of now), How about a parallel atom per "entity", like #<entity-name>-ui, containing a map (or even a vector of maps, if multiple entities), with a set of keys for ui state.
That seems an improvement on what I have ended up with by default as of now, which is separate atom for every piece of ui state (I've avoided component local state up to now). (Currently the ui only holds ui state for one record at a time, so these ui atoms need only be concerned with a single current entity).
But if, say, I made a parallel atom (to avoid mixing ephemeral ui and server state), then ui state could perhaps manageably extend deeper. We could hold, say, ui state per entity so switching current-entity back and forth would remember ui state.
Since this is Stack Overflow, I have to ask a specific question, rather than this just be discussion, so: given what I've described, what are some sensible architectural choices in this case, to store state in reagent?
If you are storing your app state in several component-independent reagent atoms already - you can check https://github.com/day8/re-frame which is a widely-adopted reagent-powered framework exactly for your case. Essentially it stores all the application state in a reagent atom but has a well-developed infrastructure to support coordinated storage and updates. They have brilliant documentation with a great high-level explanation of the idea.
Regarding your initial question about server/ui state separation - I think you should definitely go this way. It'll give you a better way of separating concerns and give you an easier way to update server/ui data separately. It is very easy to achieve this with re-frame by storing both parts of the state under separate top-level keys in re-frame application db. E.g.
{:server {:entity-name ...}
:ui {:entity-name ...}}
and then create suitable subscriptions(see re-frame docs) to retrieve it.

Replacing items in message queue

Our system requirements say that we need to build a slightly unusual producer-consumer processing system. Imagine we have multiple data streams and we take a snapshot each X seconds and put it into the queue for processing. The streams count is not constant. The more clients we have, the more streams we need to process. At the same time, we don't need to process ALL taken snapshots. If we have too many clients and we are not able to process all items in real-time, we would prefer to skip old snapshots and process only the latest ones.
So as I see, the requirements can be met by keeping only one item in a queue for each stream. If there is a new snapshot, while the previous is still there, we need to REPLACE it using stream id as a key.
Is it possible to implement such behavior by Service Bus queue or something similar? Or maybe it makes sense to look into some other solutions like Redis?
So as I see, the requirements can be met by keeping only one item in a
queue for each stream. If there is a new snapshot, while the previous
is still there, we need to REPLACE it using stream id as a key. Is it
possible to implement such behavior by Service Bus queue or something
similar?
To the best of my knowledge, Azure Service Bus does not support this scenario. Through it's duplicate detection functionality, in fact it supports exact opposite of that. You would need to use some other mechanism (like Redis Cache you mentioned) to accomplish this.

CQRS / communication between contexts / eventstore / push or pull?

Communications between bounded context in CQRS/ES architecture is achieved through events; context A generates events as response to commands, and these events is then forwarded to context B through event bus (message queue).
Or... you can store the events in eventstore (that belongs to context A).
Or... both (store and forward).
My question is: from context B, should I pull the events from the context store? or simply consume the events pushed through the event bus?
I'm leaning toward the pulling approach. Because then we can do some catching up in context B. In contrast, in the push approach, context B might be unaware of events that were delivered while B is experiencing downtime.
So... does it mean... when we have eventstore, we can simply forget about the message queue (seems redundant)?
Or am I missing something here?
You'll want to review Consume event stream without Pub/Sub
At the DDD Europe conference, I realized that the speakers I talked with where (sic) avoiding Pub/Sub whenever possible.
The discussion that follows may have value. TL;DR: not many fans of pub/sub there.
Konrad Garus on Push or Pull?, describing the Pull design:
In the latter (and simpler) design, they only spread the information that a new event has been saved, along with its sequential ID (so that all projections can estimate how much behind they are). When awakened, the executor can continue along its normal path, starting with querying the event store.
Why? Because handling events coming from a single source is easier, but more importantly because a DB-backed event store trivially guarantees ordering and has no issues with lost or duplicate messages. Querying the database is very fast, given that we’re reading a single table sequentially by primary key, and most of the time the data is in RAM cache anyway. The bottleneck is in the projection thread updating its read model database.
In the large, it comes down to this: when people are thinking about event sourcing, they are really thinking about histories, rather than events in isolation. If what you really want is an ordered sequence of events with no gaps, querying the authority for that sequence is much better than trying to reconstruct if from a bunch of disjoint event messages.
But - once you decide to do that, then suddenly the history, and all of the events that appear within it, becomes part of the api of context A. What happens when team A decides that a different event store implementation is more suitable? Can they just roll out a new version of their own services, or do we need a grand outage because every consumer also has to get updated?
Similarly, what happens if we decide to refactor context A into context C and context D? Again, do we have to screw around in context B to get the data we need?
Maybe the real problem is that context B is coupled to the histories in context A, and those histories should really be private? Should context B be accessing context A's data, or should it instead be delegating that work to context A's capabilities?
Udi Dahan essays on SOA may jump start your thinking in that direction.

Synchronize Changes To A Textfield

I'm experimenting with P2P on Flash, and I've come across a little hurdle that I'd like to clarify before moving forward. The technology itself (Flash) doesn't matter for this problem, as I think this problem occurs in other languages.
I'm trying to create a document that can be edited "live" by multiple people. Just like Google Docs pretty much. But I'm wondering, how would you suggest synchronizing everyone's text? I mean, should I message everyone with all the text in the text field every time someone makes a change? That seems very inefficient.
I'm thinking there has to be a design pattern that I can learn and implement, but I'm not sure where to start.
Optimally, the application should send the connected clients only the changes that have occurred to the document, and have some sort of buffer or error correction that can be used for retrieving earlier changes that may have been missed. Is there any established design pattern that deals with this type of issue?
Thanks,
Sandro
I think your "Optimally" solution is actually the one you should go for.
each textfield has a model, the model has a history (a FILO storing last, let's say, 10 values).
every time you edit that textfield you push the whole text into the model and send the delta to other connected clients.
as other clients receive the data they just pick the last value from the model and merge it to the received data.
you can refine the mechanism by putting an idle timer in the middle: as a user types something in the textfield you flag that model as "toBeSentThroughTheNet" and you start a timer. as the timer "ticks" (TimerEvent.TIMER) you stop it, collect the flagged data and send it to other clients. just remember to reset the timer everytime the user is actually typing (a semplification coul be keydown = reset, keyup = start).
one more optimization could be send the data packed in a compressed bytearray, but this requires you write your own protocol and may be not so an easy and quick path :)
If the requirement is that everyone can edit the document at the same time and the changes should be propagated to everyone and no changes should be lost, then it is a non-trivial problem. There are few different approaches out there, but one that is quite robust is Operational Transformation. This is the same algorithm that Google Docs uses for collaborative editing.
Understanding and Applying Operational Transformation and the attendant hacker news discussion are probably other good places to start.
The Wave Protocol was released as open source so you can take a look on how it is implemented.
You could of course forgo the tricky synchronization and just allow people to take turns and only one person can edit the document at a time and this person just pushes the changes to the remainder of the group.

Best use pattern for a DataContext

What's the best lifetime model for a DataContext? Should I just create a new one whenever I need it (aka, function level), should I keep one available in each class that would use it (class level), or should I create a static class with a static DataContext (app-domain level)? Are there any considered best practices on this?
You pretty much need to keep the same data context available throughout the lifetime of the operations you want to perform if you're ever going to be storing changes which are to be .SubmitChanges()'d later, as otherwise you will lose those changes.
If you're just querying stuff then it's fine to create them as needed, but then if later you want to .SubmitChanges() you'll have to refactor your code a lot, so you may as well adopt the pattern of effectively keeping the datacontext global throughout your app from the beginning.
Note the data context is disconnected. The connection is only made when the query data is enumerated (not when you first run the query, it's a 'lazy' data type so only provides data when it's needed), and then closed immediately afterwards. On .SubmitChanges() the connection is opened to submit the changes then closed immediately afterwards. So don't think keeping the datacontext around keeps a connection open, it doesn't (you can hook the StateChange event of the connection to confirm this for yourself, that's how I'm sure).
There is a great article over at Rick Strahl's Blog which covers this topic in depth, far more than my answer here provides!!
I think Jeff Atwood talked about this in the Herding Code podcast, when he was questioned about the exact same thing. Listen to it towards the last 15-20 minutes or so.
I think in SO, the datacontext is created in the Controller class. Not sure about a lot of details here. But that's what it looked like.