In what way do Object-Relational databases provide limited inferencing, when compared to the use of Ontologies?

I'm currently working on modeling context for a context-aware application.
The better choice seems to be ontologies; however, object-oriented models and relational databases seem to have some advantages too.
Two authors in particular (Jagdev Bhogal and Philip Moore) used object-relational databases to model context and claim that:
The ORDBMS approach provides limited inferencing. A subtype definition has access to the representation of all of its direct supertypes (but only within the ADT definition that defines the subtype of that supertype), but it has no access to the representation of its sibling types. […] Such functionality would need to be manually programmed when developing the application interface.
I'm new to this subject, and it seems to me that you could write very complex queries against a database to retrieve almost any information, but I've never used ontologies before.
So: in what way do object-relational databases, or even regular relational databases, provide limited inferencing, when compared to the use of ontologies (e.g. OWL Description Logic)?
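To make the quoted claim concrete, here is a minimal sketch (in Scala, with invented class names; it does not use any OWL library) of the kind of "inferencing" you would have to program by hand against a relational or object-relational schema. An OWL DL reasoner derives subclass membership, and richer facts such as classification from property restrictions, automatically from the ontology's axioms; against a subclass table you write the traversal yourself.

```scala
// Hand-coded "inferencing" over a subclass table, as you might store it in an
// (object-)relational schema. All type names are invented for illustration.
object ManualInference {
  // child -> parent, the kind of row an (O)RDBMS would hold
  val subClassOf: Map[String, String] = Map(
    "Bedroom"  -> "Room",
    "Kitchen"  -> "Room",
    "Room"     -> "Space",
    "Corridor" -> "Space"
  )

  // Transitive closure of the supertype relation, computed manually.
  def allSupertypes(tpe: String): List[String] =
    subClassOf.get(tpe) match {
      case Some(parent) => parent :: allSupertypes(parent)
      case None         => Nil
    }

  // "Is this instance a Space?" becomes a hand-written walk up the hierarchy.
  def isKindOf(instanceType: String, queried: String): Boolean =
    instanceType == queried || allSupertypes(instanceType).contains(queried)

  def main(args: Array[String]): Unit = {
    println(allSupertypes("Bedroom"))        // List(Room, Space)
    println(isKindOf("Bedroom", "Space"))    // true
  }
}
```

With an ontology, the same question is answered by the reasoner from the subclass axioms alone, and consistency checking and classification come along for free; that is roughly the gap the quoted passage is describing.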

Related

Build system that is not file-centric

We have a software infrastructure which works pretty much like a software build system: Information is gathered from different sources and used to generate some outputs. Like in traditional software builds we have different types of output, dependency trees, etc.
The main difference is that our sources, intermediate results and outputs are not inherently file-based. Rather, they're (uniquely addressable) data objects.
Right now we're mapping our data structure to files and directories, in combination with a traditional build system (SCons), but that does not scale, both in terms of performance and (more importantly) maintainability. Hence I'm looking for an infrastructure that's built for this purpose from the ground up.
As an illustration, assume you have 3 XML documents A, B and C. Let's say that B/foo/bar is to be calculated from A/x/y and A/x/z, and that similarly C/a/b is calculated from A/x/y. I need an infrastructure to
Implement these relationships (i.e. the transformations and their dependencies)
Automatically re-build the relevant parts after changes are made
One major problem with using files is that, if I map A, B and C to some files A.xml, B.xml and C.xml and use a traditional build system, then any change to A.xml will trigger a rebuild of B.xml and C.xml, even if A/x/y and A/x/z (the original dependencies of B) are not modified. For fine-grained dependency resolution I would therefore need to map each of A, B and C not to a file, but to a directory where each sub-directory represents an element, files represent attributes, etc. As I said, this does not scale for us.
(Please note that our system is not actually based on XML)
Right now I'm looking for any existing software, infrastructure or concept which points into this direction, regardless of implementation language and underlying data structures.
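The fine-grained invalidation described above can be sketched roughly as follows (Scala, with all names invented; the transformations are stubbed as strings). Dependencies are declared per addressable element rather than per file, so a change to one element only rebuilds the outputs that actually read it.

```scala
// Rough sketch (hypothetical names): dependencies are tracked per addressable
// element (object + path), not per file, so changing A/x/z rebuilds B/foo/bar
// but leaves C/a/b alone.
final case class ElementRef(obj: String, path: String)   // e.g. ElementRef("A", "x/y")

final case class Rule(
  target: ElementRef,
  inputs: Set[ElementRef],
  compute: Map[ElementRef, String] => String              // transformation, stubbed
)

final class FineGrainedBuild(rules: Seq[Rule]) {
  private var values = Map.empty[ElementRef, String]

  def update(changed: ElementRef, newValue: String): Unit = {
    values += (changed -> newValue)
    // Only rules whose declared inputs include the changed element are re-run.
    rules.filter(_.inputs.contains(changed)).foreach { rule =>
      val out = rule.compute(rule.inputs.map(in => in -> values.getOrElse(in, "")).toMap)
      update(rule.target, out)                             // propagate transitively
    }
  }
}

object BuildExample extends App {
  val bFooBar = Rule(ElementRef("B", "foo/bar"),
    Set(ElementRef("A", "x/y"), ElementRef("A", "x/z")),
    ins => ins.values.mkString("+"))
  val cAB = Rule(ElementRef("C", "a/b"),
    Set(ElementRef("A", "x/y")),
    ins => ins.values.mkString("*"))

  val build = new FineGrainedBuild(Seq(bFooBar, cAB))
  build.update(ElementRef("A", "x/z"), "42")   // rebuilds only B/foo/bar, not C/a/b
}
```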
It sounds like you need an active object database management system (ODBMS) like GemStone/S. ODBMSs provide the traditional persistence services without the old cost of mapping data structures to files, and with the well-known benefits of object technology. Since you've mentioned dependency trees and addressable objects: in ODBMSs, navigational references are stored as part of the data, allowing arbitrarily complex interaction patterns among objects to be represented and accessed. This is especially true when you anticipate a system that makes use of inheritance, object nesting and cross-referencing.
Although an object engine may seem oversized for your requirements, it is common for large-scale production business systems to store and execute methods using OODBMSs, within a concurrent, multi-user environment. It doesn't come for free, because you have to invest in the human part of the equation (education and experience), but once the initial fear is overcome it will pay a return on the investment.
For re-building (subscribed) parts after changes (notifications from announcers) are made, you may use the Observer design pattern, or one of its variants (SASE or the Announcements framework), to implement your announce/subscribe architecture. This type of event framework handles intrinsic problems that are hard to solve with traditional file-based solutions, as you have noticed already. For example, it is typical for a dependency mechanism to manage the replacement of one object (in your example, an XML document) by another. Any modern events framework should ensure that when an object is replaced, all dependents attached to the old object are updated to the new reference.
Finally, there is a free GemStone/S stack which includes an object dependency framework, so you can experiment with a real object database.
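A minimal sketch of the announce/subscribe idea (Scala, with invented names; this is not the GemStone/S or Announcements API, just the shape of the pattern): dependents register with an announcer and are re-run when the object they depend on changes, and they re-point themselves when it is replaced.

```scala
// Minimal Observer/announcement sketch (names are invented, not a real API).
trait Announcement
final case class Changed(key: String)                     extends Announcement
final case class Replaced(oldKey: String, newKey: String) extends Announcement

final class Announcer {
  private var subscribers = List.empty[Announcement => Unit]
  def subscribe(handler: Announcement => Unit): Unit = subscribers ::= handler
  def announce(a: Announcement): Unit = subscribers.foreach(_(a))
}

object ObserverExample extends App {
  val announcer = new Announcer

  // A derived artifact that must be rebuilt when its source changes or is replaced.
  var source = "A"
  announcer.subscribe {
    case Changed(k) if k == source => println(s"rebuilding output derived from $k")
    case Replaced(old, nw) if old == source =>
      source = nw                                 // re-point the dependency at the new object
      println(s"now depending on $nw, rebuilding")
    case _ => ()                                  // announcements we don't care about
  }

  announcer.announce(Changed("A"))                // triggers a rebuild
  announcer.announce(Replaced("A", "A2"))         // dependency re-targeted automatically
}
```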
So nothing comes to mind that solves exactly your problem, but there are a few tools that might get you a little closer than you are now:
1) You might be able to throw something together using FUSE, which would give you better control over how your data objects are mapped to files. FUSE basically allows you to construct arbitrary file systems from whatever backing data you want. (The Python bindings are pretty friendly, but there are a number of other language interfaces available as well.) Then you could use a traditional build tool and take advantage of file-like objects that are better associated with your data.
2) CMake has a fairly extensible language for writing custom targets that you might be able to press into service. Unfortunately its language is rather idiosyncratic and has something of a steep learning curve, so it wouldn't be my first choice.

Is there a name for the concept of a type such as this?

I have a type that is constructed using information from various domain entities.
The type itself is present because within some contexts in the system it is useful and meaningful to abstract away from the large and complex legacy types that supply the information for the type. It exposes a subset of the fields of the types used to instantiate it, plus it contains some functionality of its own.
The type has its own service, providing a creation method, that under the hood, coordinates the creation and persistence of the domain entities that make up instances of the type.
Is there a name for the concept of such a type?
It is certainly an aggregate of some kind, and certainly a kind of domain model, but one that acts as a facade onto other domain models.
In a greenfield system I suspect the need for such a type would be limited, but I have found it to be useful when dealing with inflexible legacy codebases.
Simply the Adapter pattern, I think.
Or, since we're talking about the legacy code it wraps: I recall something about the "big ball of mud" in Martin Fowler's "Refactoring", which says that sometimes it's better just to wrap it in a pretty API and keep the mud inside.
I will invent a new term for your object - ActiveFacade - you heard it here first ;)
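Whatever you call it, the shape is roughly this (a Scala sketch; the legacy types, field names and persistence calls are invented stand-ins for whatever the real codebase has): a small type exposing a subset of fields plus some behaviour of its own, with a service that coordinates creation and persistence of the underlying entities.

```scala
// Sketch only: LegacyCustomer, LegacyAccount and persist() are invented
// stand-ins for the "large and complex legacy types" in the question.
final case class LegacyCustomer(id: Long, name: String, internalFlags: Int /* ...many more fields */)
final case class LegacyAccount(id: Long, customerId: Long, balance: BigDecimal /* ... */)

// The facade type: exposes only the fields that matter in this context,
// plus a little behaviour of its own.
final case class CustomerSummary(name: String, balance: BigDecimal) {
  def isInCredit: Boolean = balance >= 0
}

// Its service coordinates creation and persistence of the underlying entities.
object CustomerSummaryService {
  def create(name: String, openingBalance: BigDecimal): CustomerSummary = {
    val customer = LegacyCustomer(id = nextId(), name = name, internalFlags = 0)
    val account  = LegacyAccount(id = nextId(), customerId = customer.id, balance = openingBalance)
    persist(customer); persist(account)               // stubbed persistence
    CustomerSummary(customer.name, account.balance)
  }

  private var counter = 0L
  private def nextId(): Long = { counter += 1; counter }
  private def persist(entity: Any): Unit = ()         // placeholder for legacy persistence
}
```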

Specification Pattern defined in Domain

Using LINQ to SQL and a DDD-style domain layer with de-coupled repositories, does anyone have any good ideas on how to implement the Specification pattern without bleeding L2S concerns up into the domain layer, while keeping it understandable? :)
We have complex business logic surrounding the selection of a set of transaction data, and we would like those rules/specifications to be owned by the domain. We've also done a good job of keeping our domain persistence-ignorant.
This presents a problem, because in order to implement a Specification, the domain (as far as I can tell) needs to see the types being queried (L2S types).
Any ideas?
Also, NHibernate is out of the question for reasons I don't want to explain... :)
Have you considered mapping your generic Specifications into an Expression tree that would translate into proper L2S syntax? It seems that is the main problem you are hitting here. The Specification pattern isn't the problem, but the mapping to L2S is.
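The idea can be illustrated in a language-neutral way (the thread is about C#/L2S; the Scala sketch below, with invented names, only shows the shape). The domain owns specifications as plain data; a persistence-layer translator turns them into whatever the query technology needs, so no L2S types leak into the domain. In the real system the translation target would be a LINQ expression tree; here it just renders a WHERE-clause fragment.

```scala
// Domain-owned specification as data, translated by the persistence layer.
// All names are invented; the rendering step stands in for an expression-tree mapping.
sealed trait Spec
final case class FieldEquals(field: String, value: String) extends Spec
final case class And(left: Spec, right: Spec)              extends Spec
final case class Or(left: Spec, right: Spec)               extends Spec

object Domain {
  // Business rule expressed entirely in domain terms.
  val highValueUkTransactions: Spec =
    And(FieldEquals("country", "UK"), FieldEquals("band", "HIGH"))
}

object Persistence {
  // The translation step that would target the query technology in the real system.
  def toWhereClause(spec: Spec): String = spec match {
    case FieldEquals(f, v) => s"$f = '$v'"
    case And(l, r)         => s"(${toWhereClause(l)} AND ${toWhereClause(r)})"
    case Or(l, r)          => s"(${toWhereClause(l)} OR ${toWhereClause(r)})"
  }
}

object SpecExample extends App {
  println(Persistence.toWhereClause(Domain.highValueUkTransactions))
  // (country = 'UK' AND band = 'HIGH')
}
```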
Linq-To-Sql classes can be partial. This means that you can extend them by writing a partial class that implements a common interface. That interface can be shared between layers without the "bleeding" problem you are describing. The rest is just the details of your "IsSatisfiedBy", which should be easy to encapsulate.
I recently had the same issue. Different pattern, but still LINQ to SQL (L2S). I tried two different ways to avoid the leakage.
First we tried using DTOs and a mapping layer. We wrote super-simple objects that had a one-to-one mapping to the tables; they were all decorated with L2S attributes. We then wrote a mapping layer to map the DTOs to our business objects. All of this was hidden behind the Repository pattern from Domain-Driven Design, so consumers of the business objects had no idea that L2S was under the hood.
Next, mostly for variety, we tried using the XML mapping features of L2S, so the objects themselves needed no attributes. For collections we exposed IEnumerable instead of any of the L2S collection types. If you looked at the internals of the business classes you could still detect some usage of L2S (EntitySet or Ref), but consumers of the classes had no idea. So there were some bits of leakage, but nothing drastic.
In the end we stuck with the first pattern. The second worked, and we could have replaced L2S without changing the interface of the business layer, but I was never happy with the XML mapping. The first pattern had a much cleaner separation between the database and the business objects, though it took more code. It also worked better for us because it allowed us to evolve the business objects independently of the tables. In the early days of the project the XML mapping worked because our objects were pretty much one-to-one with the tables.
So in the end we put a layer between L2S and the domain. It worked. It took more code, but it was really simple stuff. And it was all very testable.
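In outline, that first pattern looks roughly like this (sketched in Scala with invented names; in the real system the DTO would carry the L2S mapping attributes and the repository would issue actual queries):

```scala
// Outline of the DTO + mapping-layer approach (names invented, persistence stubbed).

// 1. Persistence-shaped DTO: one-to-one with the table, nothing else.
final case class TransactionDto(id: Long, amountPence: Long, isoCountry: String)

// 2. Domain object: shaped for the business, free to evolve away from the table.
final case class Transaction(id: Long, amount: BigDecimal, country: String)

// 3. The mapping layer between the two.
object TransactionMapper {
  def toDomain(dto: TransactionDto): Transaction =
    Transaction(dto.id, BigDecimal(dto.amountPence) / 100, dto.isoCountry)
}

// 4. Repository: consumers see only domain objects, never the DTOs.
final class TransactionRepository(loadDtos: () => Seq[TransactionDto]) {
  def all(): Seq[Transaction] = loadDtos().map(TransactionMapper.toDomain)
}

object RepoExample extends App {
  val repo = new TransactionRepository(() => Seq(TransactionDto(1, 12345, "GB")))
  println(repo.all())   // List(Transaction(1,123.45,GB))
}
```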
If you want to avoid referencing Linq2Sql from your domain layer, you must work against interfaces that represent your entities instead of working with the actual entities themselves. You then need a mapping layer between your interfaces and your entities.
I've worked this way and found it to be a severe hindrance. I switched to NHibernate for new projects and for the older projects I simply stopped worrying about the domain referencing Linq2Sql entities directly. Overcoming that restriction is simply too much of a time-cost in my opinion.

Does Model Driven Architecture play nice with LINQ-to-SQL or Entity Framework?

My newly created system was created using the Model Driven Architecture approach, so all I have is the model (let's say comprehensive 'Order' and 'Product' classes). These are fully tested classes that support the business of my application. Now it's time to persist these classes as objects on the hard drive and at some later time retrieve them in the same state (thinking very abstractly here). Typically I'd create an IOrderRepository interface and eventually an ADO.NET-driven OrderRepository class with methods such as GetAll(), GetById(), Save(), etc... or at some point a BinaryFormatter-driven OrderRepository class that serves a similar purpose through this same common interface.
Is this approach just not conducive to LINQ-to-SQL or the Entity Framework? Something that attempts to build my model from a pre-existing DB structure just seems wrong. Could I take advantage of these technologies while retaining this MDA approach to software engineering?
... notice I did not mention that this was a Web App. It may or may not be -- and shouldn't matter.
In general, I think that you should not make types implementing business methods and types used for O/R mapping the same type. I think this violates the single responsibility principle. The point of your entity types is to bridge the gap between relational space and object space. The point of your business types is to have collections of testable behavior. Instead, I would suggest that you project from your entity types onto your business types when materializing objects from the database. Separating these two allows your business methods and data mappings to evolve independently, which is very important, especially if you cannot always control the schema of the database. I explain this idea more fully in this presentation.
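A sketch of that projection (Scala, with invented names; the entity type stands in for whatever the O/R mapper generates): the entity mirrors the relational schema, the business type carries testable behaviour, and materialisation maps one onto the other.

```scala
// Sketch (invented names). The entity type mirrors the schema and belongs to the
// mapping layer; the business type owns behaviour; a small projection bridges them.
final case class OrderEntity(orderId: Long, totalCents: Long, status: String)   // O/R-mapped shape

final class Order(val id: Long, val total: BigDecimal, private val status: String) {
  def isOpen: Boolean = status == "OPEN"                       // business behaviour lives here
  def totalWithVat(rate: BigDecimal): BigDecimal = total * (rate + 1)
}

object OrderProjection {
  def fromEntity(e: OrderEntity): Order =
    new Order(e.orderId, BigDecimal(e.totalCents) / 100, e.status)
}
```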

Is CouchDB best suited for dynamic languages?

I'm familiar with CouchDB, and the idea of mapping its results to Scala objects, as well as finding some natural way to interact with it, came to me immediately.
But I see that dynamic languages such as Ruby and JavaScript work very well with the JSON / document-centric / schema-free approach of CouchDB.
Is there any good approach to doing things with Couch in static languages?
I understand that CouchDB works purely with JSON objects. Since JSON is untyped, it's tempting to believe that it's more naturally suited for dynamic languages. However, XML is generally untyped too, and Scala has very good library support for creating and manipulating XML. For an exploration of Scala's XML features, see: http://www.ibm.com/developerworks/library/x-scalaxml/
Likewise with JSON. With the proper library support, dealing with JSON can feel natural even in static languages. For one approach to dealing with JSON data in Scala, see this article: http://technically.us/code/x/weaving-tweed-with-scala-and-json/
With object databases in general, sometimes it's convenient to define a "model" (using, for example, a class in the language) and use JSON or XML or some other untyped document language to be a serialized representation of the class. Proper library support can then translate between the serialized form (like JSON) and the in-memory data structures, with static typing and all the goodies that come with it. For one example of this approach, see Lift's Record which has added conversions to and from JSON: http://groups.google.com/group/liftweb/msg/63bb390a820d11ba
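That translation can be sketched without committing to any particular library (Scala; a Map[String, Any] stands in for a JSON document, and the class is invented for illustration). A real project would use a JSON library, but the shape of the conversion between the untyped document and the typed model is the same.

```scala
// Library-free sketch: Map[String, Any] stands in for a JSON document.
final case class User(name: String, age: Int)

object UserJson {
  def toDoc(u: User): Map[String, Any] =
    Map("type" -> "user", "name" -> u.name, "age" -> u.age)

  def fromDoc(doc: Map[String, Any]): Option[User] =
    for {
      name <- doc.get("name").collect { case s: String => s }
      age  <- doc.get("age").collect { case i: Int => i }
    } yield User(name, age)
}

object JsonExample extends App {
  val doc = UserJson.toDoc(User("Ada", 36))
  println(UserJson.fromDoc(doc))   // Some(User(Ada,36)) -- statically typed from here on
}
```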
I wonder if you asked the right question. Why are you using Scala, and not a dynamic language? Probably because of some goodness that Scala provides that is important to you and, I assume, to your code quality. Then why aren't you using a "statically typed" (i.e. schema-based) database? Once again I'm just assuming, but the ability to respond to change comes to mind. Production SQL databases have a horrible tendency to be very difficult to change and refactor.
So, your data is weakly typed and your code is strongly typed. But somewhere you'll need to make the transition. This means that somewhere you'll have a "schema" for your data, even though the database has none. That schema is defined by the classes you're mapping Couch documents onto. This makes perfect sense; most uses of Couch that I've seen have a key such as "type", and for each type at least some common set of keys. Whether to hand-map the JSON to these Scala classes, to use e.g. fancy reflection tools (slower but pretty), or to use some even fancier Scala feature that I'm still new to is a detail. Start with the easy-but-slow option, then see if it's fast enough.
The big thing occurs when your classes, i.e. your schema, change. Instead of ALTER'ing your tables, you can just change the class, ensure that you do something smart if, for some document, a key you expect is missing (because the document was based on an older version of the class), and off you go. Responding to change has never been easier, and still your code is as statically typed as it can get.
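Concretely, the "do something smart when a key is missing" step can be as simple as a defaulting read (sketched again with a Map[String, Any] stand-in for a JSON document; class and field names are invented):

```scala
// "nickname" was added in a newer version of the class, so older documents simply
// don't have it. Reading with a default keeps old and new documents loadable.
final case class Profile(name: String, nickname: String)

object ProfileJson {
  def fromDoc(doc: Map[String, Any]): Option[Profile] =
    doc.get("name").collect { case s: String => s }.map { name =>
      val nickname = doc.get("nickname").collect { case s: String => s }.getOrElse(name)
      Profile(name, nickname)
    }
}

object EvolutionExample extends App {
  println(ProfileJson.fromDoc(Map("name" -> "Ada")))                       // old doc: Some(Profile(Ada,Ada))
  println(ProfileJson.fromDoc(Map("name" -> "Ada", "nickname" -> "Lady"))) // new doc: Some(Profile(Ada,Lady))
}
```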
If this is not good enough for you, and you want no schema at all, then you're effectively saying that you don't want to use classes to define and manipulate your data. That's fine too (though I can't imagine a use), but then the question is not about dynamic vs static languages, but about whether to use class-based OO languages at all.