Soft Delete vs. DB Archive - relational-database

Suggested Reading
Similar: Are soft deletes a good idea?
Good Article: http://weblogs.asp.net/fbouma/archive/2009/02/19/soft-deletes-are-bad-m-kay.aspx
How I ended up here
I strongly believe that when making software, anything done up front to minimize work later on pays off in truckloads. As such, I am trying to make sure, when approaching my database schema and maintenance, that it can maintain relational integrity while not being archaic or overly complex.
This resulted in a sort of shudder when looking at the typical delete approach, CASCADE. Yikes, a little over the top for my current situation. I wanted to maintain relational graph integrity, but I didn't want to remove every graph just because one part of the chain was irrelevant. Therefore I chose to go the way of soft deleting to make sure data integrity would remain while records could be removed from relevance. I accomplished this by adding a "DateDeleted" field to every, sigh, table in the database.
Turning Point
However, this is clearly starting to add too much complexity and work to be worth it. I am including logic where it should not go and do not feel like perpetuating these bad practices throughout my whole application. In short, I am going to roll back this implementation.
When looking up whether or not people like soft-deleting, it seems there is a lot of support for it. In fact, the linked "Similar" post up top sports a top-voted answer of "I always soft-delete". Moreover, the majority of answers there and around SO include some sort of "isDeleted" or "isActive" type of approach.
New Implementation Idea
The "Good Article" linked covers some of the issues I actually began encountering. It also suggests an alternative to soft-deleting which I found spot on from a best practices standpoint. The suggestion is to use an "Archiving Database", which I had actually considered when looking at soft deleting. The reason I decided against it was the point I made earlier about CASCADE deleting: I am wary of removing entire graphs from the database because one part of the chain is removed. However, the graph could at least be retained in the archive, so I am not sure it would really be that terrible.
Crossroads
So, should I just keep adding logic, logic, logic....logic? Or, should I consider making the archival database where most of the logic would simply sit in a very complex graph management class to store / restore relational object graphs? The latter seems to be best practice to me.

Soft deleting is definitely an easy approach in theory. However, not really much attention is paid to what to do with the data that wasn't deleted. In fact, it is glossed over.
In my opinion this is because the wrong issue is in focus. Not just "what does deleting mean", but what IS being deleted. When a record is to be removed, what is really being removed is a node in a graph - not just a single record. Preserving the integrity of that whole graph is the reason people band-aid over the issue with "soft deletes". These band-aid solutions tend to hide the gangrene underneath - a festering problem which only gets worse with time.
What's worse is that, in order to accommodate the soft delete, logic must be included all over (often breaking various conventions and implementing anti-patterns) to account for the possible breaks in the object graph. Moreover, what kind of business logic is "isDeleted"?!
I believe a very strong solution to this problem, the problem of removing an object while retaining the referential integrity of the object graph, is to use an archival pattern. On delete of an object, the object is archived then deleted. The archive database, a mirror database with metadata (temporal database design can be used and is very relevant here), would then receive the object to be archived, and the object could be restored from it if necessary.
This makes it very direct to avoid listing or including a deleted object as the relevant database will no longer hold it. Now, the same logic which was applied looking for "isDeleted" "isActive" or "DeletedDate" can be applied in the correct place (Not all over the place) to foreign keys of retrieved objects. When a foreign key is present, but the object is not, then there is now a logical explanation and a logical set of options. Display that the containing object was deleted and some course of action: "Restore, Delete Current Containing Object, View Deleted". These options can be either chosen by the user, or explicitly defined in code in a logical manner. Depending on how advanced the archival database is, perhaps more options exist such as who deleted it, when, why, etc. etc.

Related

Downside of having string properties in service contracts that can contain a full json model

We are working with a DDD framework in our company. We are changing a lot of core things in our API because we are still growing and we are still in our infant phase when it comes to designing a good API.
The problem is that there are already a lot of flows in the same API, and they are not compatible with each other.
We have an order service and a product service.
Normally when the product model radically changes, we have a major impact in the order model.
Now I'm here listing all kinds of red flags which should never happen, but I simply don't have control over how it needs to be done. That is pretty much management pushing for a fast solution, and it leads to bad shortcuts...
Here is what has been decided to stop Order from having to adapt constantly: they made a property on the order line called productConfiguration. It is part of the service contract and is stored directly, as is, in the DB tables. It contains the product model, which can change, in JSON format.
For me it's very clear that this is dangerous, because in the end you need to turn this JSON into an actual object. So you just move the restrictions from the service contract into code logic, which makes it worse because it will only cause an issue at run time...
Are there other major downsides I should know about, so I can bring them to the table to avoid this way of working?
Using strings that are directly converted into DB tables is not just in your opinion a bad design. It's an opinion shared by a lot of us.
What do you do when an object changes? For example, the new one requires an attribute that the old one didn't have. How do you manage this situation? I suppose that you have to change everything, including the objects stored before, or build a kind of transformation layer where you translate objects from the old design to the new one. That is a lot of extra work.
Anyway, given that the two domains are separated, what is the information that changes so much and requires such a design? I mean, for most things you could know at the beginning what you need for your part of the domain. For the rest, I would prefer to have a kind of service that, given an Id, gives you the information from the other domain. You can change this service (it could also return a JSON object, if nothing more than display is required) and adapt it to your/their needs. But it's just a solution that comes from my limited knowledge of your processes.
Other ways are also possible, as long as you can always tell which version of the design you are using.
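A minimal sketch of that "know which version you are using" idea, in Python. Everything here is an assumption for illustration: the `schema_version` field, the `color` attribute, and the function names are hypothetical and not part of the asker's contract. It shows the transformation layer mentioned above, where stored JSON is upgraded step by step to the current design when it is read.

```python
import json

CURRENT_VERSION = 2


def _v1_to_v2(config):
    # Hypothetical change: version 2 requires "color", which version 1 lacked.
    upgraded = dict(config)
    upgraded.setdefault("color", "unspecified")
    upgraded["schema_version"] = 2
    return upgraded


# Maps a stored version to the function that upgrades it by one step.
UPGRADES = {1: _v1_to_v2}


def load_product_configuration(raw):
    """Parse stored JSON and upgrade it to the current version."""
    config = json.loads(raw)
    # Payloads written before versioning existed are treated as version 1.
    version = config.get("schema_version", 1)
    while version < CURRENT_VERSION:
        config = UPGRADES[version](config)
        version = config["schema_version"]
    if version != CURRENT_VERSION:
        raise ValueError(f"unsupported schema_version: {version}")
    return config
```

Without an explicit version in the payload, the reading code has to guess the shape of every stored object, which is exactly the run-time failure the question is worried about.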

How best to handle a many-to-many concurrency conflict?

We have an application feature similar to gmail's labels - you can 'tag' the items. Now this is a concurrent application i.e., this so called 'whiteboard' is editable by multiple users - which means that many users can choose to re/group the items. Basically tag multiple items at the same time.
There will definitely be conflicts but the question is how best to handle it? The only strategy that comes to mind is similar to the famous ALOHA protocol i.e., check before commit if any thing has changed - if so, abort and inform user; else commit. This is quite inefficient IMO.
Here are two similar ideas - one difficult and the other easier by comparison:
Easier one first :) - overwrite changes i.e., the duplicates would just be updated but new ones would be tagged too.
Difficult one: Check which ones are to be 'removed', i.e., there could be some that don't belong to the categorization (by user 2, say; i.e., user 1 made a change and user 2 also made one at the same time. Basically finding the set of tagged items {user1 - user2}). This is going to be extremely hard and really not worth the effort IMHO.
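For what it's worth, the set arithmetic in the difficult option can be sketched in a few lines of Python, assuming each client remembers the set of tagged items it originally loaded (the `base`). The function and parameter names are hypothetical, and this only shows the merge itself, not the server plumbing around it.

```python
def merge_tags(base, theirs, mine):
    """Three-way merge of tagged-item sets.

    base   -- the items tagged when I loaded the whiteboard
    theirs -- the items tagged now, after the other user's commit
    mine   -- the items I want tagged
    """
    added = mine - base      # items I tagged
    removed = base - mine    # items I untagged
    # Replay only my own changes on top of the other user's result.
    return (theirs | added) - removed
```

For example, with `base = {"a", "b"}`, another user adding `"c"` and me removing `"b"` and adding `"d"`, the merge yields `{"a", "c", "d"}`.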
I was wondering what's the best practice solution to use in such a case which doesn't hinder the user experience and doesn't confuse them either.
(This is a J2EE/Restlet app with a MySQL backend and a jQuery/Ajax front end)
The answer to this question really depends on who your users are and what they expect. If it were me, I think I would anticipate being informed of a user creating changes before mine are done (stackoverflow does this), and allow me to commit changes anyway or roll back. All of the solutions you've presented seem acceptable .. it depends purely on what you want to do. If you're asking for how to do this with code, you are going to have to post some code first so we can see what you are dealing with.
Another possible solution would be similar to #2 (just overwrite changes as they occur), but keep a revision of each change to allow for easy reversions, and to make it easy to tell if changes were made on top of others.
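A minimal in-memory sketch of the "inform me, let me commit anyway, and keep revisions" approach, in Python. The `TagStore` and `ConflictError` names are hypothetical; in the real app the revision list would presumably live in MySQL, with the version number sent to the client along with the data.

```python
class ConflictError(Exception):
    """Raised when a commit is based on a stale version."""

    def __init__(self, current_version):
        super().__init__(f"store is now at version {current_version}")
        self.current_version = current_version


class TagStore:
    def __init__(self):
        # Revision 0 is the empty set; every commit appends a new revision.
        self.revisions = [frozenset()]

    @property
    def version(self):
        return len(self.revisions) - 1

    @property
    def tags(self):
        return self.revisions[-1]

    def commit(self, new_tags, based_on, force=False):
        # Reject stale commits unless the user chose to overwrite anyway.
        if based_on != self.version and not force:
            raise ConflictError(self.version)
        self.revisions.append(frozenset(new_tags))
        return self.version

    def revert_to(self, version):
        # Reverting is itself a new revision, so history is never lost.
        self.revisions.append(self.revisions[version])
        return self.version
```

A client that gets a `ConflictError` can show the user what changed and then either retry with `force=True` or reload and redo the change.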

Write programs that do one thing and do it well

I can grasp the part "do one thing" via encapsulation, Dependency Injection, Principle of Least Knowledge, and You Ain't Gonna Need It; but how do I understand the second part "do it well?"
An example given was the notion of completeness, given in the same YAGNI article:
for example, among features which allow adding items, deleting items, or modifying items, completeness could be used to also recommend "renaming items".
However, I found reasoning like that could easily be abused into feature creep, thus violating the "do one thing" part.
So, what is a litmus test for seeing whether a feature belongs to the "do it well" category (hence, include it in the function/class/program) or to the other "do one thing" category (hence, exclude it)?
The first part, "do one thing," is best understood via UNIX's ls command as a counterexample, for its inclusion of an excessive number of flags for formatting its output, which should have been completely delegated to another external program. But I don't have a good example to see the second part, "do it well."
What is a good example where removing any further feature would make it not "do it well?"
I see "Do It Well" as being as much about the quality of implementation of a function as about the completeness of a set of functions (in your example, having rename as well as create and delete).
Do It Well manifests in many ways, some ways of thinking:
Behaviour in response to "special" inputs. Example, calculating the mean of some integers:
int mean(int[] values) { ... }
what does this do if the array has zero elements? If the items total more than MAX_INT?
Performance Characteristics. Has sufficient attention been given to behaviour as the data volumes increase?
Dependency Failures. If our implementation depends upon other modules or infrastructure what happens when these fail. Example: File System Full, Database Down?
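As a minimal sketch of the "special inputs" point, here is one way a mean function might handle those two cases, written in Python. This is an illustration, not the only reasonable design: raising on empty input is one choice among several, and since Python integers do not overflow, the running-mean update is shown only as the technique that would avoid summing past MAX_INT in a fixed-width language.

```python
def mean(values):
    """Arithmetic mean that never builds a large running total."""
    count = 0
    running = 0.0
    for value in values:
        count += 1
        # Incremental update: the accumulator stays near the data's magnitude.
        running += (value - running) / count
    if count == 0:
        # Make the empty case explicit instead of dividing by zero.
        raise ValueError("mean() of an empty sequence is undefined")
    return running
```

Calling `mean([])` raises `ValueError`, so the caller has to decide what an empty input means rather than receiving an arbitrary number.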
Concerning feature creep itself, I think you're correct to identify a tension here. One thing you might consider: you don't need to implement every feature, provided that it's pretty obvious that a feature can be added easily without a complete rewrite.
The whole purpose of this advice is to make you favor quality over quantity.
The concept of one thing is subjective and depends on granularity. Would you say that a spreadsheet application does more than one thing if it can also print, or is that part of that one thing?
The point is that you should make sure that any feature, and the application itself, is done and will delight customers before you scramble to add new features.
I think your question points out the fundamentally organic nature of feature creep, and in understanding that nature, you will be empowered to meditate on the larger question.
Think of it like a garden: If you plant one thing and plant it well, say, a chrysanthemum, you aren't done at simply planting the seed. In fact you'll need to ensure that the soil is well tended, that the area is sufficiently protected, that the season is right, etc.
As your chrysanthemum (your one thing) grows, so too will other competitive plants - some that need to be weeded out and others that may actually complement the original one thing. In fact, these other organisms may in some cases prove vital for the survival of your one thing.
Like those features that YAGN, a bit of vigilance is required to determine which weeds represent feature creep and which represent vital and complementary functions.
Regardless, having done it well means simply that your chrysanthemum is hearty, healthy, and on-time. :-)
I would say an email program without the ability to add attachments would be a good example.
This may sound like an odd example, but I'd say dropbox is a good, albeit complex example.
It has managed to beat off a swathe of similar competing apps through a dedication to simplification and a lack of the feature creep that, as you mentioned, would violate the 'do one thing' principle. The app lets you store documents in a folder that you can access anywhere, and that's about the limit of it. They drilled down to the core problem, and solved it in a way that works perfectly well in 90+% of cases.
It's hard to put a hard and fast rule to it, but I'd say that catering to around the 90% majority of use cases and ignoring 'fringe requirements' is the best way to stick to this rule.
I'd guess 90+% of ls use is with no arguments or maybe two or three of the most popular. The 'do it well' principle should focus on what the majority of users need, instead of catering for power users or fringe cases, as ls does with its plethora of options.
This is what dropbox does successfully and why it is pretty well agreed upon as an example of good application design.

Is there any reason to use one DataContext instance, instead of several?

For example, I have 2 methods that use one DataContext (Linq to sql).
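// connectionString is a placeholder; DataContext has no parameterless constructor
using (DataContext data = new DataContext(connectionString)) {
    // doing something
    another_datamethod(data);
}
void another_datamethod(DataContext data) {
    // doing
}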
Should I use this style? Or, with the same result, I could create a separate "using DataContext" in each method. What benefits would I get from using one DataContext? Maybe some caching possibilities?
Recently, I've read numerous articles and blogs that "highly recommend" that you use multiple DataContexts for your applications, due to multiple issues including the creation of records associated with lookup tables. When I was learning LINQ-to-SQL, one of the most attractive qualities of it for me was the ability to import my complete database schema into one "big" DataContext. So, that's what I did...but a few months in, along comes the contradictory information saying that what I did was a bad thing. What to do, what to do...
Nine months later, here's where I stand. My single large DataContext is still my single large DataContext. I have over thirty data repository classes accessing the sixty-plus tables contained within, and I still haven't seen a valid reason to break up my existing data-dom, or to not handle the next project using a single DataContext. The problems that the article and blog writers experienced were valid problems. However, like most things technical, there's never just one way to do things. The best investment of my time and energy was to learn and truly understand how LINQ-to-SQL does what it does. The best book that I found to help me do exactly that is Pro LINQ: Language Integrated Query in C# 2008 by Joseph C. Rattz, Jr. The LINQ-to-SQL coverage is detailed and clear, and there are plenty of examples to clarify the mystery.
So, in your case, create one big DataContext or create many smaller ones...the choice is up to you. Smaller ones clearly give better opportunity for reuse, while one big one helps increase the time you can focus on business logic and presentation code.
DataContexts track changes and do caching, so yes, caching is a possibility depending on what work you are performing.

How to partition a problem into smaller understandable portions?

I'm not sure if it's possible to give general advice on this topic, but please try. It's hard to explain my case because it's too complex to explain. And that's exactly the problem.
I seem to constantly stumble on a situation where I try to design some part of my project, but it has so many things to take into consideration that I'm unable to get a grasp of it.
Are there any general tips or advice on how to look at my system in smaller pieces at a time? How to find smaller portions that could be designed separately on their own?
Create a glossary.
In other words, identify the terms that are meaningful to the project domain — not from the programmer's point of view, but from a user's, who is familiar with the subject matter.
Then define the terms as precisely and discretely as you can. A good definition in this form can serve as a kind of pseudocode.
Since you have not identified even the domain of your problem, I'll choose a random example. In a civilian personnel system, you might have terms like:
billet: a term of service (from start date to end date) at a particular grade and step
employee: a series of billets associated with a particular SSN
grade and step: row and column in the federal general schedule
And so on. This isn't to identify functional units, as it sounds like you are trying to do, but it's a good preparatory step before doing so, so that you can express your functional steps in well-defined terms.
Your key goals are:
High cohesion: Code (methods, fields, classes) within one piece/module/partition should interact intensively; it should make sense for these elements to know about each other. If you find that some of them don't interact much with the rest, they probably belong somewhere else or should form their own partition. If you find code outside interacting intensively with the partition and knowing too much about its inner workings, it probably belongs inside. The typical example is found in OO code written in procedural style, with "dumb" data objects and "manager" code that operates on them but should really be part of the data objects.
Loose coupling: Interaction between pieces/modules/partitions should only happen through narrow, well-defined, well-documented APIs. Try to identify such APIs and see what code is needed to implement them and what code will use them.
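A small Python sketch of those two goals, using a hypothetical `Order` class invented for illustration. Instead of a separate "manager" function reaching into an order's line items, the logic that needs those internals lives on the object, and outside code only uses the two public methods.

```python
class Order:
    """Keeps its line items private; callers use add_line() and total()."""

    def __init__(self):
        self._lines = []  # internal detail, not part of the public API

    def add_line(self, unit_price, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._lines.append((unit_price, quantity))

    def total(self):
        # The code that needs the line items lives next to them (cohesion).
        return sum(price * quantity for price, quantity in self._lines)


# Outside code depends only on the narrow API (loose coupling).
order = Order()
order.add_line(2.5, 4)
order.add_line(1.0, 3)
```

If the way lines are stored changes later, only `Order` has to change, which is a reasonable sign that the partition boundary is in the right place.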
It's useful to approach problem decomposition both top-down and bottom-up.
If you're having trouble splitting a big problem into two or more smaller problems, try to think of the smallest possible problems that will need to be solved. Once those are handled, you may start to see ways to combine them into larger problems as you approach your original large problem.
When I find myself copying and pasting chunks of code with minimal adjustments I realize that's a "partition" and then create a class, method, function, or whatever.
Actually, the whole object oriented approach is what it's all about. Try thinking of your application as tangible things that do stuff. Write pseudo code describing what the things are and what they do, I find lots of "partitions" this way.
Here's a try, kind of wild guess.
People usually underestimate how long it will take them to do the work. If your project is large, then most likely you'll need several people to work on it, so you can try planning with that in mind. Now a person can be expected to hold just one area in the head, so you'll need to explain to him exactly what kind of task he's supposed to do.
So I'd say you should try to write a job description that should encompass as much as possible for one person to seriously concentrate on. Repeat, until you have broken your project into parts you wanted to. As a benefit, you're ready to assemble your team. But if you find out the parts are small, maybe you'll still be able to do it yourself.