What can I do to prevent write-write conflicts on a wiki-style website?

What can I do to prevent write-write conflicts on a wiki-style website? - usability

On a wiki-style website, what can I do to prevent or mitigate write-write conflicts while still allowing the site to run quickly and keeping the site easy to use?
The problem I foresee is this:
User A begins editing a file
User B begins editing the file
User A finishes editing the file
User B finishes editing the file, accidentally overwriting all of User A's edits
Here were some approaches I came up with:
Have some sort of check-out / check-in / locking system (although I don't know how to prevent people from keeping a file checked out "too long", and I don't want users to be frustrated by not being allowed to make an edit)
Have some sort of diff system that shows an other changes made when a user commits their changes and allows some sort of merge (but I'm worried this will hard to create and would make the site "too hard" to use)
Notify users of concurrent edits while they are making their changes (some sort of AJAX?)
Any other ways to go at this? Any examples of sites that implement this well?

Remember the version number (or ID) of the last change. Then read the entry before writing it and compare if this version is still the same.
In case of a conflict inform the user who was trying to write the entry which was changed in the meantime. Support him with a diff.
Most wikis do it this way. MediaWiki, Usemod, etc.

Three-way merging: The first thing to point out is that most concurrent edits, particularly on longer documents, are to different sections of the text. As a result, by noting which revision Users A and B acquired, we can do a three-way merge, as detailed by Bill Ritcher of Guiffy Software. A three-way merge can identify where the edits have been made from the original, and unless they clash it can silently merge both edits into a new article. Ideally, at this point carry out the merge and show User B the new document so that she can choose to further revise it.
Collision resolution:
This leaves you with the scenario when both editors have edited the same section. In this case, merge everything else and offer the text of the three versions to User B - that is, include the original - with either User A's version in the textbox or User B's. That choice depends on whether you think the default should be to accept the latest (the user just clicks Save to retain their version) or force the editor to edit twice to get their changes in (they have to re-apply their changes to editor A's version of the section).
Using three-way merging like this avoids lock-outs, which are very difficult to handle well on the web (how long do you let them have the lock?), and the aggravating 'you might want to look again' scenario, which only works well for forum-style responses. It also retains the post-respond style of the web.
If you want to Ajax it up a bit, dynamically 3-way merge User A's version into User B's version while they are editing it, and notify them. Now that would be impressive.

In Mediawiki, the server accepts the first change, and then when the second edit is saved a conflicts page comes up, and then the second person merges the two changes together. See Wikipedia: Help:Edit Conflicts

Using a locking mechanism will probably be the easiest to implement. Each article could have a lock field associated with it and a lock time. If the lock time exceeded some set value you'd consider the lock to be invalid and remove it when checking out the article for edit. You could also keep track of open locks and remove them on session close. You'd also need to implement some concurrency control in the database (autogenerated timestamps, perhaps) so that you could make sure that you are checking in an update to the version that you checked out, just in case two people were able to edit the article at the same time. Only the one with the correct version would be able successfully check in an edit.
You might also be able to find a difference engine that you could just use to construct differences, though displaying them in a wiki editor may be problematic -- actually displaying the differences is probably harder than constructing the diff. You'd rely on the versioning system to detect when you needed to reject an edit and perform a diff.

In Gmail, if we are writing a reply to a mail and someone else sends a reply while we are still typing it, a popup appears indicating that there is a new update and the update itself appears as another post without a page reload. This approach would suit your needs and if you can use Ajax to show the exact post with a link to diff of what was just updated while User B is still busy typing his entry that would be great.

As Ravi (and others) have said, you could use an AJAX approach and inform the user when another change is in progress. When an edit is submitted, just indicate the textual differences and let the second user work out how to merge the two versions.
However, I'd like to add on with something new you could try in addition to that: Open a chat dialog between the editors while they're doing their edits. You could use something like embedded Gabbly for that, for instance.
The best conflict resolution is direct dialog, I say.

Your problem (lost update) is solved best using Optimistic Concurrency Control.
One implementation is to add a version column in each editable entity of the system. On user edit you load the row and display the html form on the user. A hidden field gives the version, let's say 3. The update query needs to look something like:
update articles set ..., version=4 where id=14 and version=3;
If rows returned is 0 then someone has already updated article 14. All you need to do then is how to deal with the situation. Some common solutions:
last commit wins
first commit wins
merge conflicting updates
let the user decide
Instead of an incrementing version int/long you can use a timestamp but it's not suggested because:
retrieving the current time from the JVM isn't necessarily safe in a clustered environment, where nodes may not be time synchronized.
(quote from Java Persistence with Hibernate)
Some more info at the hibernate documentation.

At my office, we have a policy that all data tables contain 4 fields:
CreatedBy
CreatedDate
LastUpdateBy
LastUpdateDate
That way there is a nice audit trail on who has done what to the records, at least most recently.
But most importantly, it becomes easy enough to compare the LastUpdateDate of the current or edited record on the screen (requires you to store it on the page, in a cookie, whatever, with the value in the database. If the values don't match, you can decide what to do from there.

Related

How does Chrome update URL bar completions?

I really enjoy using Chrome's URL bar because it remembers commonly-visited sites and often suggests a good completion based on what I've typed and/or visited before. So, for example, I can type t in the URL bar and Chrome will automatically fill it in with twitter.com, or I can type maps and Chrome will fill in the .google.com. This gives me the convenience of data-driven domain name shortcuts without having to maintain an explicit list.
What I'm wondering, though, is how Chrome determines that an old shortcut should be replaced with a new one. For example, if I visit twitter.com often, then that becomes the completion when I type t. But if I then start visiting twilio.com often enough, then, after some time, Chrome will start to fill that in as the default completion for t. What I can't figure out is how or when that transition takes place. It also seems that there are (at least) two cases involved : one for domain names, and another for path strings, because if I visit a certain full URL often, and then want to get to the root of the same domain, I end up having to type the entire domain name out to get Chrome to ignore the full-URL completion.
If I had to guess, I'd imagine that Chrome stores the things that I type in the URL bar in a trie whose values are the number of times that a particular string has been typed (and/or visited ?). Then I'd imagine it has some sort of exponential decay model for the "counts" in the trie. But this is just a guess. Does anyone know how this updating process happens ?

Well, I ended up finding some answers by having a look at the Chromium source code ; I'd imagine that Chrome itself uses this code without too much modification.
When you type something into the search/URL bar (which is apparently called the "Omnibox"), Chrome starts looking for suggestions and completions that match what you've typed. To do this, there are several "providers" registered with the browser, each of which knows how to make a particular type of suggestion. The URL history provider is one of these.
The querying process is pretty cool, actually. It all happens asynchronously, with particular attention paid to which activity happens in which thread (the main thread being especially important not to block). When the providers find suggestions, they call back to the omnibox, which appears to merge and sort things before updating the UI widget.
History provider
It turns out that URLs in Chrome are stored in at least one, and probably two, sqlite databases (one is on disk, and the second, which I know less about, seems to be in memory).
This comment at the top of HistoryURLProvider explains the lookup process, complete with multithreaded ASCII art !
Sqlite lookup
Basically, typing in the omnibox causes sqlite to run this SQL query for looking up URLs by prefix. The suggestions are ordered by the number of visits to the URL, as well as by the number of times that a URL has been typed.
Interestingly, this is not a trie ! The lookup is indeed based on prefix, but the scoring of those lookups does not appear to be aggregated by prefix, like I'd imagined.
I had a little less success in determining how the scores in the database are updated. This part of the code updates a URL after a visit, but I haven't yet run across where the counts are decremented (if at all ?).
Updating suggestions
What I think is happening regarding the updating of suggestions -- and this is still just a guess right now -- is that the in-memory sqlite database essentially has priority over the on-disk DB, and then whenever Chrome restarts or otherwise flushes the contents of the in-memory DB to disk, the visit and typed counts for each URL get updated at that time. Again, just a guess, but I'll keep looking as I get time.
The code is really nice to read through, actually. I definitely recommend it if you have similar questions about Chrome.

How best to handle a many-to-many concurrency conflict?

We have an application feature similar to gmail's labels - you can 'tag' the items. Now this is a concurrent application i.e., this so called 'whiteboard' is editable by multiple users - which means that many users can choose to re/group the items. Basically tag multiple items at the same time.
There will definitely be conflicts but the question is how best to handle it? The only strategy that comes to mind is similar to the famous ALOHA protocol i.e., check before commit if any thing has changed - if so, abort and inform user; else commit. This is quite inefficient IMO.
Here are two similar ideas - one difficult and the other easier by comparison:
Easier one first :) - overwrite changes i.e., the duplicates would just be updated but new ones would be tagged too.
Difficult one: Check for which are to be 'removed' i.e., there could be some that doesn't belong to the categorization (by user 2 say. i.e., user 1 made a change and user 2 also made it at the same time. Basically finding the set of tagged items {user1 - user2}). This is going to be extremely hard and really not worth the effort IMHO.
I was wondering what's the best practice solution to use in such a case which doesn't hinder the user experience and doesn't confuse them either.
(This is a J2EE/Restlet app with a MySQL backend and a Jquery/ajax front end)

The answer to this question really depends on who your users are and what they expect. If it were me, I think I would anticipate being informed of a user creating changes before mine are done (stackoverflow does this), and allow me to commit changes anyway or roll back. All of the solutions you've presented seem acceptable .. it depends purely on what you want to do. If you're asking for how to do this with code, you are going to have to post some code first so we can see what you are dealing with.
Another possible solution would be similar to #2 (just overwrite changes as they occur), but keep a revision of each change to allow for easy reversions, and to make it easy to tell if changes were made on top of others.

How to Determine If Webpage Has Been Modified?

I know I can check the response header's 'last-modified' value to determine when the web page was last modified, but in many instances that header is NOT provided. Also, in many instances the content itself hasn't changed, but the current time/date is displayed on the page, thus giving the appearance of a modification.
Any suggestions on how to overcome the above issues and determine if a web page has been (truly) modified?
Thanks.

Sure. Define for yourself what counts as a "modification" (for example, only things in the "content" div) and only look at that.
If you can't find a way to decide whether something's been changed, then you can't expect a computer to…

You are asking two question here:
When was it modified?
Was it modified?
To answer question #1, you'd have to check the page every so often to meet your granularity requirements e.g. every hour, every day, every week, etc. This could be quite resource intensive. This will depend on if you really need to know this.
To answer question #2, you need to compare something. You could do what #Paul Rosnia suggested, but if they as much as added a comma, it will be considered modified.
Then, you might also want to see what has been modifed. Then you you'd have to save the content and compare them to each other in order to highlight the changes.
You could use http://php.net/manual/en/function.file-get-contents.php and a CRON job to cache the page on your server and then perdiodically compare your cache. The comparing part will be the tricky part, since you have to write specific code to ignore the things that don't matter to you e.g. date/time stamp, header changes, menu changes, etc.

The sure-fire way to detect page changes is to download and checksum it. If the checksum changes, the page has been edited (with extremely high certainty).
Here's an example that works on the command line:
curl -s news.ycombinator.com | md5 #=> d86582bec138c051b0d8322f7823a23c
That was a few minutes ago. If you run it now you'll get a different answer!

Is there a way to create a custom edit flag in MediaWiki?

I moderate a wiki where many users use the AutoWikiBrowser to rapidly edit. This is fine but it makes it harder to locate and deal with vandalism via the recent changes. Is there any way that I can create a custom edit flag to mark edits as semi-automated and allow users to hide them from the recent changes? Ideally this would come with the ability to mark edits as semi-automated by default, which would allow the functionality I seek without needing a change to the AWB source code.
The ability to mark one's edits as semi-automated shouldn't be open to anyone, so it would need to be restricted to certain usergroups (probably rollback and up). I realise that there is the ability to mark edits as bot edits, but this is inaccurate as they are not truly bots, and inconvenient, since it requires a bureaucrat to mark the user as a bot, then unmark them when their editing is finished. I realise its a lot to request, and I certainly understand if its not possible, but I was hoping that it was.

Why not have users use two accounts - one for manual edits and one for bot edits? Or is that too much overhead for the users?
As you say, if your bots have their own accounts you can add them to the bot group. Then users can, in Recent Changes, decide themselves to show / hide bot edits.
Then your admins can patrol the changes as usual.

The solution is very simple: add a user group whose users are able to add and remove themselves (via Special:userrights) to the default "bot" group or to another group "flood" having only the "bot" and perhaps "noratelimit" permission; then add those users to this group.

Change config values on a specific time

I just got a mail saying that I have to change a config value at 2009-09-01 (new taxes). Our normal approach for this would be to to awake at 2009-08-31 at 23:59 and then just change the value manually. Which not is a big problem since this don't happens to often. But it makes me wonder how other people handle issues like this.
So! How do you handle date specific config changes?
(We are working in asp.net but I don't think this has to be language specific)
Br
Carl Bergquist

I'd normally store this kind of data in a database table like this
Key, Value, EffectiveFrom, EffectiveTo
-----------------------------------------
VAT, 15.0, 20081201, 20091231
VAT, 17.5, 20100101, NULL
I'd then use the EffectiveFrom and EffectiveTo dates to chose the value that is effective at the given time. If the rate is open ended then the effecive to could either by NULL or 99991231.
This also allows you to go back without having to change the config. E.g. if someone asks you to recalculate the tax for the previous month before the rate change.

In linux, there is a command "at" for batch execution.
See "man at" for details.

To be honest, waking up near the time and changing it seems to be the simplest and cheapest approach. All of the technical solutions are fine, but it depends where you work.
In our environment it would be cheaper and simpler to get someone to wake up and make the change than to redevelop the functionality of a piece of software that already works. It certainly involves less testing, development overhead and costs which means we would tend to solve the problem as you do, manually.

That depends totally on the situation and the technology.
pjp's idea is good, if you get your config from a database, or as metadata to define the valid time for whole config sets/files.
Another might be: just prepare a new configfile with the new entries and swap them at midnight (probably with a restart of the service/program) whatever.
Swapping them would be possible with at (as given bei Neeraj) ...
If timing is a problem you should handle the change, or at least the timing of the change on the running server (to avoid time out of synch problems).

We got same kind of problem some time before and handled using the following approach.
this is suitable if you are well known to the source that orginates the configuration changes..
In our case, the source exposed a webservice (actualy a third party) which will return a modified config details. And there is a windows service running on our server which keeps on polling the webservice and will update the configuration file if there is any change.
this works perfectly in our case..
You can make use of this approach by changing the polling webservice part to your source of config change (say reading changes from some disk path). But am not sure how this is possible reading config changes from email.

Why not just make a shell script to swap out the files. run it in cron and switch the files out a minute before and send an alert text if NOT successful and an email if successful.
This is an example on a Linux box but I think you get the point and can do this on a Windows box.
Script:
cp /path/to/old/config /path/to/backup/dir/config.timestamp
cp /path/to/new/config
if(/path/to/new/config exsits) {
sendSuccessEmail();
} else {
sendPanicTextAlert();
}
cron:
59 23 31 8 * /path/to/script.sh
you could test this as well before hand just point to some dummy directories and file

I've seen the hybrid approach. Instead of actually changing the data model to include EffectiveDate/EndDate or manually changing the values yourself, schedule a script to change the values automatically. Also, be sure to have a solid test plan that will validate all changes.
However, this type of manual change can have a dramatic impact on reporting. If previous transactions join directly to the tables being changed, numbers in historical reports could change in a very bad way. There really is no "right" answer.

If I'm not able to do something like pjp's solution, I'd use either a scheduled task or a server job to update it automatically at the right time.
But...I'd probably still be awake checking it had worked.

Look the best solution would be to parameterise your config file and add things like when a certain entry should be used from. This would negate the need for any copying or swapping of files and your application would simply deal with it. (That goes for a config file approach or a database)
If you cannot change the current systems and you have to go with swapping the config files, then you also have two options:
Use a scheduled task to kick off a batch job or even a VBScript or PowerShell script (which ever you feel comfortable with) Make sure you set up the correct credentials to be able to do this at the middle of the night and you could also add some checking and mitigation into this approach.
Write a windows Service that does this for you. Here you have all the flexibility you need. Code it to do whatever it needs to do, do all the checks you need to (so that you can keep sleeping rather than making sure it actually worked) etc, etc. You service would then even take care of the scheduling aspect and all will be good. Here you could use xml DOM object and xPath and not replace the file, but simply update the specific entries as required.
Remember that any change to the config file would cause your site to restart, so make sure you take care of all the other housekeeping stuff that this could cause. (Although this would be exactly the same if you where sitting there in the middle of the night copying file around)

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008