Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I was wondering what are the advantages of using Triple Stores over a relational database?
The viewpoint of the CTO of a company that extensively uses RDF Triplestores commercially:
Schema flexibility - it's possible to do the equivalent of a schema change to an RDF store live, and without any downtime, or redesign - it's not a free lunch, you need to be careful with how your software works, but it's a pretty easy thing to do.
More modern - RDF stores are typically queried over HTTP it's very easy to fit them into Service Architectures without hacky bridging solutions, or performance penalties. Also they handle internationalised content better than typical SQL databases - e.g. you can have multiple values in different languages.
Standardisation - the level of standardisation of implementations using RDF and SPARQL is much higher than SQL. It's possible to swap out one triplestore for another, though you have to be careful you're not stepping outside the standards. Moving data between stores is easy, as they all speak the same language.
Expressivity - it's much easier to model complex data in RDF than in SQL, and the query language makes it easier to do things like LEFT JOINs (called OPTIONAL in SPARQL). Conversely though, if your data is very tabular, then SQL is much easier.
Provenance - SPARQL lets you track where each piece of information came from, and you can store metadata about it, letting you easily do sophisticated queries, only taking into account data from certain sources, or with a certain trust level, on from some date range etc.
There are downsides though. SQL databases are generally much more mature, and have more features than typical RDF databases. Things like transactions are often much more crude, or non existent. Also, the cost per unit information stored in RDF v's SQL is noticeably higher. It's hard to generalise, but it can be significant if you have a lot of data - though at least in our case it's an overall benefit financially given the flexibility and power.
Both commenters are correct, especially since Semantic Web is not a database, it's a bit more general than that.
But I guess you might mean triple store, rather than Semantic Web in general, as triple store v. relational database is a somewhat more meaningful comparison. I'll preface the rest of my answer by noting that I'm not an expert in relational database systems, but I have a little bit of knowledge about triple stores.
Triple (or quad) stores are basically databases for data on the semantic web, particularly RDF. That's kind of where the similarity between triples stores & relational databases end. Both store data, both have query languages, both can be used to build applications on top of; so I guess if you squint your eyes, they're pretty similar. But the type of data each stores is quite different, so the two technologies optimize for different use cases and data structures, so they're not really interchangeable.
A lot of people have done work in overlaying a triples view of the world on top of a relational database, and that can work, and also will be slower than a system dedicated for storing and retrieving triples. Part of the problems is that SPARQL, the standard query language used by triple stores, can require a lot of self joins, something relational databases are not optimized for. If you look at benchmarks, such as SP2B, you can see that Oracle, which just overlays SPARQL support on its relational system, runs in the middle or at the back of the pack when compared with systems that more natively support RDF.
Of course, the RDF systems would probably get crushed by Oracle if they were doing SQL queries over relational data. But that's kind of the point, you pick the tool that's well suited for the application you want to build.
So if you're thinking about building a semantic web application, or just trying to get some familiarity in the area, I'd recommend ultimately going with a dedicated triple store.
I won't delve into reasoning and how that plays into query answering in triple stores, as that's yet another discussion, but it's another important distinction between relational systems and triple stores that do reasoning.
Some triplestores (Virtuoso, Jena SDB) are based on relational databases and simply provide an RDF / SPARQL interface. So to rephrase the question slighty, are triplestores built from the ground up as a triplestore more performant than those that aren't - #steve-harris definitely knows the answer to that ;) but I wager a yes.
Secondly, what features do triplestores have that RDBMS don't. The simple answer is support for SPARQL, RDF, OWL etc. (i.e the Semantic Web Technology stack) and to make it a fair fight, its better to define the value of SPARQL based on SPARQL 1.1 (it has considerably more features than 1.0). This provides support for federation (so so cool), property path expressions and entailment regimes along with an standards set of update protocols, graph management protocols (that SPARQL 1.0 didn't have and sorely lacked). Also #steve-harris points out that transactions are not part of the standard (can of worms) although many vendors provide non-standardised mechanisms for transactions (Virtuoso supports JDBC and Hibernate compliant connection pooling and management along with all the transactional features of Hibernate)
The big drawback in my mind is that not many triplestores support all of SPARQL 1.1 (since it is still not in recommendation) and this is where the real benefits lie.
Having said that, I am and always have been an advocate of substituting RDBMS with triplestores and platforms I deliver run entirely off triplestores (Volkswagen in my last role was an example of this), deprecating the need for RDBMS. An additional advantage is that Object to RDF mapping is more flexible and provides more options and flexibility than traditional ORM (also known as putting a square peg in a round hole).
Also you can still use a database but use RDF as a data exchange format which is very flexible.
When you say "thin data access layer", does this mainly mean you are talking about writing your SQL manually as opposed to relying on an ORM tool to generate it for you?
That probably depends on who says it, but for me a thin data access layer would imply that there is little to no additional logic in the layer (i.e. data storage abstractions), probably no support for targeting multiple RDBMS, no layer-specific caching, no advanced error handling (retry, failover), etc.
Since ORM tools tend to supply many of those things, a solution with an ORM would probably not be considered "thin". Many home-grown data access layers would also not be considered "thin" if they provide features such as the ones listed above.
Depends on how we define the word "thin". It's one of the most abused terms I hear, rivaled only by "lightweight".
That's one way to define it, but perhaps not the best. An ORM layer does a lot besides just generate SQL for you (e.g., caching, marking "dirty" fields, etc.) That "thin" layer written in lovingly crafted SQL can become pretty bloated by the time you implement all the features an ORM is providing.
I think "thin" in this context means:
It is lightweight;
It has a low performance overhead; and
You write minimal code.
Writing SQL certainly fits this bill but there's no reason it couldn't be an ORM either although most ORMs that spring to mind don't strike me as lightweight.
I think it depends on the context.
It could very well mean that, or it may simply mean that your business objects map directly a simple underlying relational table structure: one table per class, one column per class attribute, so that the translation of business object structure to database table structure is "thin" (i.e. not complex). This could still be handled by an ORM of course.
It may mean that there is no or minimal logic employed on the database such as avoiding the use of stored procedures. As other people have mentioned it depends on the statement's context as to the most likely meaning.
I thought data access were always supposed to be thin... DALs aren't really the place to have logic.
Maybe the person you talked to is talking about a combination of a business layer and a data access layer; where the business layer is non-existent (e.g. a very simple app, or perhaps all of the business rules are in the database, etc).
Obviously, "Hello World" doesn't require a separated, modular front-end and back-end. But any sort of Enterprise-grade project does.
Assuming some sort of spectrum between these points, at which stage should an application be (conceptually, or at a design level) multi-layered? When a database, or some external resource is introduced? When you find that the you're anticipating spaghetti code in your methods/functions?
when a database, or some external resource is introduced.
but also:
always (except for the most trivial of apps) separate AT LEAST presentation tier and application tier
see:
http://en.wikipedia.org/wiki/Multitier_architecture
Layers are a mean to keep a design loosely coupled and highly cohesive.
When you start to have a few classes (either implemented or just sketched with UML), they can be grouped logically, into layers - or more generally packages, or modules. This is called the art of separating the concerns.
The sooner the better: if you do not start layering early enough, then you risk to have never do it as the effort can be too important.
Here are some criteria of when to...
Any time you anticipate the need to
replace one part of it with a
different part.
Any time you find
yourself need to divide work amongst
parallel team.
There is no real answer to this question. It depends largely on your application's needs, and numerous other factors. I'd suggest reading some books on design patterns and enterprise application architecture. These two are invaluable:
Design Patterns: Elements of Reusable Object-Oriented Software
Patterns of Enterprise Application Architecture
Some other books that I highly recommend are:
The Pragmatic Programmer: From Journeyman to Master
Refactoring: Improving the Design of Existing Code
No matter your skill level, reading these will really open your eyes to a world of possibilities.
I'd say in most cases dealing with multiple distinct levels of abstraction in the concepts your code deals with would be a strong signal to mirror this with levels of abstraction in your implementation.
This does not override the scenarios that others have highlighted already though.
I think once you ask yourself "hmm should I layer this" the answer is yes.
I've worked on too many projects that probably started off as proof of concept/prototype that ended up being full projects used in production, which are horribly written and just wreak of "get it done quick, we'll fix it later." Trust me, you wont fix it later.
The Pragmatic Programmer lists this as the Broken Window Theory.
Try and always do it right from the start. Separate your concerns. Build it with modularity in mind.
And of course try and think of the poor maintenance programmer who might take over when you're done!
Thinking of it in terms of layers is a little limiting. It's what you see in whitepapers about a product, but it's not how products really work. They have "boxes" that depend on each other in various ways, and you can make it look like they fit into layers but you can do this in several different configurations, depending on what information you're leaving out of the diagram.
And in a really well-designed application, the boxes get very small. They are down to the level of individual interfaces and classes.
This is important because whenever you change a line of code, you need to have some understanding of the impact your change will have, which means you have to understand exactly what the code currently does, what its responsibilities are, which means it has to be a small chunk that has a single responsibility, implementing an interface that doesn't cause clients to be dependent on things they don't need (the S and the I of SOLID).
You may find that your application can look like it has two or three simple layers, if you narrow your eyes, but it may not. That isn't really a problem. Of course, a disastrously badly designed application can look like it has layers tiers if you squint as hard as you can. So those "high level" diagrams of an "architecture" can hide a multitude of sins.
My generic rule of thumb is to at least to separate the problem into a model and view layer, and throw in a controller if there is a possibility of more than one ways of handling the model or piping data to the view.
(Or as the first answer, at least the presentation tier and the application tier).
Loose coupling is all about minimising dependencies, so I would say 'layer' when a dependency is introduced. i.e. a database, third party application, etc.
Although 'layer' is probably the wrong term these days. Most of the time I use Dependency Injection (DI) through an Inversion of Control container such as Castle Windsor. This means that I can code on one part of my system without worrying about the rest. It has the side effect of ensuring loose coupling.
I would recommend DI as a general programming principle all of the time so that you have the choice on how to 'layer' your application later.
Give it a look.
R
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
One thing I struggle with is planning an application's architecture before writing any code.
I don't mean gathering requirements to narrow in on what the application needs to do, but rather effectively thinking about a good way to lay out the overall class, data and flow structures, and iterating those thoughts so that I have a credible plan of action in mind before even opening the IDE. At the moment it is all to easy to just open the IDE, create a blank project, start writing bits and bobs and let the design 'grow out' from there.
I gather UML is one way to do this but I have no experience with it so it seems kind of nebulous.
How do you plan an application's architecture before writing any code? If UML is the way to go, can you recommend a concise and practical introduction for a developer of smallish applications?
I appreciate your input.
I consider the following:
what the system is supposed to do, that is, what is the problem that the system is trying to solve
who is the customer and what are their wishes
what the system has to integrate with
are there any legacy aspects that need to be considered
what are the user interractions
etc...
Then I start looking at the system as a black box and:
what are the interactions that need to happen with that black box
what are the behaviours that need to happen inside the black box, i.e. what needs to happen to those interactions for the black box to exhibit the desired behaviour at a higher level, e.g. receive and process incoming messages from a reservation system, update a database etc.
Then this will start to give you a view of the system that consists of various internal black boxes, each of which can be broken down further in the same manner.
UML is very good to represent such behaviour. You can describe most systems just using two of the many components of UML, namely:
class diagrams, and
sequence diagrams.
You may need activity diagrams as well if there is any parallelism in the behaviour that needs to be described.
A good resource for learning UML is Martin Fowler's excellent book "UML Distilled" (Amazon link - sanitised for the script kiddie link nazis out there (-: ). This book gives you a quick look at the essential parts of each of the components of UML.
Oh. What I've described is pretty much Ivar Jacobson's approach. Jacobson is one of the Three Amigos of OO. In fact UML was initially developed by the other two persons that form the Three Amigos, Grady Booch and Jim Rumbaugh
I really find that a first-off of writing on paper or whiteboard is really crucial. Then move to UML if you want, but nothing beats the flexibility of just drawing it by hand at first.
You should definitely take a look at Steve McConnell's Code Complete-
and especially at his giveaway chapter on "Design in Construction"
You can download it from his website:
http://cc2e.com/File.ashx?cid=336
If you're developing for .NET, Microsoft have just published (as a free e-book!) the Application Architecture Guide 2.0b1. It provides loads of really good information about planning your architecture before writing any code.
If you were desperate I expect you could use large chunks of it for non-.NET-based architectures.
I'll preface this by saying that I do mostly web development where much of the architecture is already decided in advance (WebForms, now MVC) and most of my projects are reasonably small, one-person efforts that take less than a year. I also know going in that I'll have an ORM and DAL to handle my business object and data interaction, respectively. Recently, I've switched to using LINQ for this, so much of the "design" becomes database design and mapping via the DBML designer.
Typically, I work in a TDD (test driven development) manner. I don't spend a lot of time up front working on architectural or design details. I do gather the overall interaction of the user with the application via stories. I use the stories to work out the interaction design and discover the major components of the application. I do a lot of whiteboarding during this process with the customer -- sometimes capturing details with a digital camera if they seem important enough to keep in diagram form. Mainly my stories get captured in story form in a wiki. Eventually, the stories get organized into releases and iterations.
By this time I usually have a pretty good idea of the architecture. If it's complicated or there are unusual bits -- things that differ from my normal practices -- or I'm working with someone else (not typical), I'll diagram things (again on a whiteboard). The same is true of complicated interactions -- I may design the page layout and flow on a whiteboard, keeping it (or capturing via camera) until I'm done with that section. Once I have a general idea of where I'm going and what needs to be done first, I'll start writing tests for the first stories. Usually, this goes like: "Okay, to do that I'll need these classes. I'll start with this one and it needs to do this." Then I start merrily TDDing along and the architecture/design grows from the needs of the application.
Periodically, I'll find myself wanting to write some bits of code over again or think "this really smells" and I'll refactor my design to remove duplication or replace the smelly bits with something more elegant. Mostly, I'm concerned with getting the functionality down while following good design principles. I find that using known patterns and paying attention to good principles as you go along works out pretty well.
http://dn.codegear.com/article/31863
I use UML, and find that guide pretty useful and easy to read. Let me know if you need something different.
UML is a notation. It is a way of recording your design, but not (in my opinion) of doing a design. If you need to write things down, I would recommend UML, though, not because it's the "best" but because it is a standard which others probably already know how to read, and it beats inventing your own "standard".
I think the best introduction to UML is still UML Distilled, by Martin Fowler, because it's concise, gives pratical guidance on where to use it, and makes it clear you don't have to buy into the whole UML/RUP story for it to be useful
Doing design is hard.It can't really be captured in one StackOverflow answer. Unfortunately, my design skills, such as they are, have evolved over the years and so I don't have one source I can refer you to.
However, one model I have found useful is robustness analysis (google for it, but there's an intro here). If you have your use-cases for what the system should do, a domain model of what things are involved, then I've found robustness analysis a useful tool in connecting the two and working out what the key components of the system need to be.
But the best advice is read widely, think hard, and practice. It's not a purely teachable skill, you've got to actually do it.
I'm not smart enough to plan ahead more than a little. When I do plan ahead, my plans always come out wrong, but now I've spend n days on bad plans. My limit seems to be about 15 minutes on the whiteboard.
Basically, I do as little work as I can to find out whether I'm headed in the right direction.
I look at my design for critical questions: when A does B to C, will it be fast enough for D? If not, we need a different design. Each of these questions can be answer with a spike. If the spikes look good, then we have the design and it's time to expand on it.
I code in the direction of getting some real customer value as soon as possible, so a customer can tell me where I should be going.
Because I always get things wrong, I rely on refactoring to help me get them right. Refactoring is risky, so I have to write unit tests as I go. Writing unit tests after the fact is hard because of coupling, so I write my tests first. Staying disciplined about this stuff is hard, and a different brain sees things differently, so I like to have a buddy coding with me. My coding buddy has a nose, so I shower regularly.
Let's call it "Extreme Programming".
"White boards, sketches and Post-it notes are excellent design
tools. Complicated modeling tools have a tendency to be more
distracting than illuminating." From Practices of an Agile Developer
by Venkat Subramaniam and Andy Hunt.
I'm not convinced anything can be planned in advance before implementation. I've got 10 years experience, but that's only been at 4 companies (including 2 sites at one company, that were almost polar opposites), and almost all of my experience has been in terms of watching colossal cluster********s occur. I'm starting to think that stuff like refactoring is really the best way to do things, but at the same time I realize that my experience is limited, and I might just be reacting to what I've seen. What I'd really like to know is how to gain the best experience so I'm able to arrive at proper conclusions, but it seems like there's no shortcut and it just involves a lot of time seeing people doing things wrong :(. I'd really like to give a go at working at a company where people do things right (as evidenced by successful product deployments), to know whether I'm a just a contrarian bastard, or if I'm really as smart as I think I am.
I beg to differ: UML can be used for application architecture, but is more often used for technical architecture (frameworks, class or sequence diagrams, ...), because this is where those diagrams can most easily been kept in sync with the development.
Application Architecture occurs when you take some functional specifications (which describe the nature and flows of operations without making any assumptions about a future implementation), and you transform them into technical specifications.
Those specifications represent the applications you need for implementing some business and functional needs.
So if you need to process several large financial portfolios (functional specification), you may determine that you need to divide that large specification into:
a dispatcher to assign those heavy calculations to different servers
a launcher to make sure all calculation servers are up and running before starting to process those portfolios.
a GUI to be able to show what is going on.
a "common" component to develop the specific portfolio algorithms, independently of the rest of the application architecture, in order to facilitate unit testing, but also some functional and regression testing.
So basically, to think about application architecture is to decide what "group of files" you need to develop in a coherent way (you can not develop in the same group of files a launcher, a GUI, a dispatcher, ...: they would not be able to evolve at the same pace)
When an application architecture is well defined, each of its components is usually a good candidate for a configuration component, that is a group of file which can be versionned as a all into a VCS (Version Control System), meaning all its files will be labeled together every time you need to record a snapshot of that application (again, it would be hard to label all your system, each of its application can not be in a stable state at the same time)
I have been doing architecture for a while. I use BPML to first refine the business process and then use UML to capture various details! Third step generally is ERD! By the time you are done with BPML and UML your ERD will be fairly stable! No plan is perfect and no abstraction is going to be 100%. Plan on refactoring, goal is to minimize refactoring as much as possible!
I try to break my thinking down into two areas: a representation of the things I'm trying to manipulate, and what I intend to do with them.
When I'm trying to model the stuff I'm trying to manipulate, I come up with a series of discrete item definitions- an ecommerce site will have a SKU, a product, a customer, and so forth. I'll also have some non-material things that I'm working with- an order, or a category. Once I have all of the "nouns" in the system, I'll make a domain model that shows how these objects are related to each other- an order has a customer and multiple SKUs, many skus are grouped into a product, and so on.
These domain models can be represented as UML domain models, class diagrams, and SQL ERD's.
Once I have the nouns of the system figured out, I move on to the verbs- for instance, the operations that each of these items go through to commit an order. These usually map pretty well to use cases from my functional requirements- the easiest way to express these that I've found is UML sequence, activity, or collaboration diagrams or swimlane diagrams.
It's important to think of this as an iterative process; I'll do a little corner of the domain, and then work on the actions, and then go back. Ideally I'll have time to write code to try stuff out as I'm going along- you never want the design to get too far ahead of the application. This process is usually terrible if you think that you are building the complete and final architecture for everything; really, all you're trying to do is establish the basic foundations that the team will be sharing in common as they move through development. You're mostly creating a shared vocabulary for team members to use as they describe the system, not laying down the law for how it's gotta be done.
I find myself having trouble fully thinking a system out before coding it. It's just too easy to only bring a cursory glance to some components which you only later realize are much more complicated than you thought they were.
One solution is to just try really hard. Write UML everywhere. Go through every class. Think how it will interact with your other classes. This is difficult to do.
What I like doing is to make a general overview at first. I don't like UML, but I do like drawing diagrams which get the point across. Then I begin to implement it. Even while I'm just writing out the class structure with empty methods, I often see things that I missed earlier, so then I update my design. As I'm coding, I'll realize I need to do something differently, so I'll update my design. It's an iterative process. The concept of "design everything first, and then implement it all" is known as the waterfall model, and I think others have shown it's a bad way of doing software.
Try Archimate.
I create business applications with heavy database use. Most of the programming work is just to connect components to the database and modifying components to adapt to general interface behaviour. I mostly use Delphi with its rich VCL library, and generally buy components needed. I keep most of the business logic in the database. I rarely get the chance to build a nice class hierarchy from the bottom up as there really is no need. Anyone else have this experience?
For me, occasionally a problem is clearer or easier with subclassing, but not often.
This also changes quite a bit in a given design as it's refactored.
My biggest problem is that programming courses and texts give so much weight to inheritance, hierarchies, and polymorphism through base classes (vs. interfaces or dynamic typing). This helps create legions of programmers that subclass everything and their mother.
The answer to this question is not totally language-agnostic;
Some languages like Java have a fairly limited set of language features available, meaning that subclassing is fairly often used because it's a convenient method for re-use, technical inheritance.
Closures and lambdas of C# make inheritance for technical reasons much less relevant. So normally inheritance is used for semantic reasons (like cat extends animal).
The last C# project I worked on, we more or less made all of the class hierarchies within a few weeks. After that it was more or less over.
On my current java project we create new class hierarchies all of the time.
Other languages will have other features that similarly affect this composition (mixins come to mind)
I put on my architecting/class design hat probably once or twice a month. It's probably the best hat I have and is the most fun to wear.
Depends what stage of the lifecycle your project is in though.
When your tackling problem domains you are well familiar with and already have a common code base to work from, you often have no need to create a new class hierarchy. It's when you stumble upon problems you have no ready solutions for, that you start building your own.
It's also very dependant on the type of applications you develop. If your domain already has well accepted conventions and libraries to work from, there probably isn't any need to reinvent the wheel (other than personal / academic interests). Some areas have inherently less available resources to work with, and in those you'll find yourself building everything from scratch most of the time.
A majority of applications, especially business applications, contains at least some kind of business logic in it. I would contend that business should not be in the database, but should rather be in the application. You can put referential integrity in the database as I think this is a good choice, but business logic should be only in the application.
By class hierarchy, I suppose you mean do you always have to end up with some inheritance in your object model, then the answer is no. But chances are you can often find some common code, factor it out and create a base class to contain the common code.
If you agree with me on the point that business logic should not be in the database, but should be in the application, then I recommend you look into the MVC Design Pattern to guide your design. You will find your design contain classes or objects. Your VCLs will represent your View, and you can have your Model classes map directly to the database table, i.e. each member in the class in the model corresponds to a field in a database table (again, this is the norm but there will be exception, where this simplicity fails to apply). Then you'll need a layer to handle the CRUD (Create, Read, Update, Delete) of the Model classes to the database tables. You will end up with an "layered" application that is easier to maintain and enhance.
It depends on what you mean by hierarchy - inheritance or layering?
When object oriented languages first came out, inheritance was overused. Complicated hierarchies were common. Now, interfaces (as in Java and C#) provide a simpler way to get the benefit of polymorphism without the complications of inheritance. I rarely use inheritance anymore.
Layering, however, is vital when creating a large application. Layering prevents general low-level classes (like lists) from directly referencing specific high-level classes (like web browser windows). As far as I know, there isn't a formal way to describe layering, but there are general guidelines (model-view-controller (MVC), separate GUI logic from business logic, separate data from presentation, etc.).
It really depends on the types/phases of the projects you're working on. I happen to do that everyday because I'm working on database internals for a new database, creating related libraries/frameworks. I'd imagine doing that a lot less if I'm working within a mature framework using other people's libraries.
I'm doing Infrastructure for our companys' product, so I'm writing a lot of code that will be used later by guys in other teams. So I end up writing lots of abstract classes, interfaces, hierarchies and so on. Mostly it's just a pattern of "default behaviour in an abstract/virtual class, which other programmers may override".
Very challenging, I must say.
The time that I find class hierarchies most beneficial is when the relationship between objects actually does match a true "is-a" relationship in the domain.
However if I can avoid large hierarchies I will due to the fact that they are often a little more tricky to map to relational databases and can really complicate your database designs. Since you say most of your applications make heavy use of databases this would be something to take into consideration.