I read the following in a book:
The difference between containers and collections lies in the fact that containers are always open (i.e., new members may be added through additional RDF statements) and collections may be closed.
I don't understand this difference clearly. It says that no new members can be added to a collection. What if I change the value of the last rdf:rest property from rdf:nil to _:xyz and add
_:xyz rdf:first <ex:aaa> .
_:xyz rdf:rest rdf:nil .
I am thus able to add a new member _:xyz. Why does it then say that collections are closed?
The key difference is that in a Container, you can simply continue to add new items, by only asserting new RDF triples. In a Collection, you first have to remove a statement before you can add new items.
This is an important difference, particularly for RDF reasoning, because RDF reasoning employs an Open World Assumption (OWA), which, put simply, states that just because a certain fact is not known, we cannot assume that fact to be untrue.
If you apply this principle to a container and ask the question "how many items does the container have?", the answer must always be "I don't know", simply because there is no way to determine how many unknown items might be in the container. However, if we have a collection, we have an explicit marker for the last item, so we can say with certainty how many items the collection contains - there can be no unknown additional items.
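To make the "closed" part concrete, here is a minimal sketch in Python using rdflib (the http://example.org/ namespace and node names are just stand-ins for the ex: prefix in the question): because the list ends in rdf:nil, a reader can walk the rdf:rest links and count the members with certainty. A container has no such terminator, so under the OWA the same question has no definite answer.

    # Build the two-member collection from the question and count its members.
    from rdflib import Graph, Namespace, BNode
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/")
    g = Graph()
    n1, n2 = BNode(), BNode()
    g.add((EX.subject, EX.hasList, n1))
    g.add((n1, RDF.first, EX.aaa))
    g.add((n1, RDF.rest, n2))
    g.add((n2, RDF.first, EX.bbb))
    g.add((n2, RDF.rest, RDF.nil))

    def count_members(graph, head):
        # Follow rdf:rest links until rdf:nil; the terminator guarantees
        # there are no unknown additional members.
        count, node = 0, head
        while node != RDF.nil:
            count += 1
            node = graph.value(node, RDF.rest)
        return count

    print(count_members(g, n1))  # 2

Of course, as you observed, another document can still rewrite the final rdf:rest to extend the list - "closed" only means that, as stated, the description is complete.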
This design problem is turning out to be a bit more "interesting" than I'd expected....
For context, I'll be implementing whatever solution I derive in Access 2007 (not much choice--customer requirement. I might be able to talk them into a different back end, but the front end has to be Access (and therefore VBA & Access SQL)). The two major activities that I anticipate around these tables are batch importing new structures from flat files and reporting on the structures (with full recursion of the entire structure). Virtually no deletes or updates (aside from entire trees getting marked as inactive when a new version is created).
I'm dealing with two main tables, and wondering if I really have a handle on how to relate them: Products and Parts (there are some others, but they're quite straightforward by comparison).
Products are made up of Parts. A Part can be used in more than one Product, and most Products employ more than one Part. I think that a normal many-to-many resolution table can satisfy this requirement (mostly--I'll revisit this in a minute). I'll call this Product-Part.
The "fun" part is that many Parts are also made up of Parts. Once again, a given Part may be used in more than one parent Part (even within a single Product). Not only that, I think that I have to treat the number of recursion levels as effectively arbitrary.
I can capture the relations with a m-to-m resolution from Parts back to Parts, relating each non-root Part to its immediate parent part, but I have the sneaking suspicion that I may be setting myself up for grief if I stop there. I'll call this Part-Part. Several questions occur to me:
Am I borrowing trouble by wondering about this? In other words, should I just implement the two resolution tables as outlined above, and stop worrying?
Should I also create Part-Part rows for all the ancestors of each non-root Part, with an extra column in the table to store the number of generations?
Should Product-Part contain rows for every Part in the Product, or just the root Parts? If it's all Parts, would a generation indicator be useful?
I have (just today, from the Related Questions) taken a look at the Nested Set design approach. It looks like it could simplify some of the requirements (particularly on the reporting side), but thinking about generating the tree during the import of hundreds (occasionally thousands) of Parts in a Product import is giving me nightmares before I even get to sleep. Am I better off biting that bullet and going forward this way?
In addition to the specific questions above, I'd appreciate any other commentary on the structural design, as well as hints on how to process this, either inbound or outbound (though I'm afraid I can't entertain suggestions of changing the language/DBMS environment).
Bills of materials and exploded parts lists are always so much fun. I would implement Parts as your main table, with a Boolean field to say a part is "sellable". This removes the first-level recursion difference and the redundancy of Parts that are themselves Products. Then, implement Products as a view of Parts that are sellable.
You're on the right track with the PartPart cross-ref table. Implement a constraint on that table that says the parent Part and the child Part cannot be the same Part ID, to save yourself some headaches with infinite recursion.
Generational differences between BOMs can be maintained by creating a new Part at the level of the actual change, and in any higher levels in which the change must be accommodated (if you want to say that this new Part, as part of its parent hierarchy, results in a new Product). Then update the reference tree of any Part levels that weren't revised in this generational change (to maintain Parts and Products that should not change generationally if a child does). To avoid orphans (unreferenced Parts records that are unreachable from the top level), Parts can reference their predecessor directly, creating a linked list of ancestors.
This is a very complex web, to be sure; persisted tree-like structures of similarly represented objects usually are. But if you're smart about implementing constraints to enforce referential integrity and avoid infinite recursion, I think it'll be manageable.
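To make the recursion concern concrete, here's a rough sketch (Python rather than VBA/Access SQL, with a hypothetical part_children dictionary standing in for a query of the Part-Part table by parent ID) of the kind of full explosion the reporting side needs, including a guard against exactly the cycles that the parent/child constraint is meant to prevent:

    # Recursively explode a product/part into (indent level, part id) rows.
    def explode(part_id, part_children, level=0, path=frozenset()):
        # A part may legitimately appear in several branches, but it must
        # never appear among its own ancestors - that would be the infinite
        # recursion the parent <> child constraint guards against.
        if part_id in path:
            raise ValueError("cycle detected at part %r" % (part_id,))
        rows = [(level, part_id)]
        for child_id in part_children.get(part_id, ()):
            rows.extend(explode(child_id, part_children, level + 1, path | {part_id}))
        return rows

    # Hypothetical data: product P1 uses parts A and B; C is shared by A and B.
    children = {"P1": ["A", "B"], "A": ["C"], "B": ["C", "D"]}
    for level, part in explode("P1", children):
        print("  " * level + part)

In Access you'd drive the same walk from VBA against the Part-Part table, but the shape of the recursion and the ancestor check stay the same.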
I would have one part table for atomic parts, then a superpart table with a superpartID and its related subparts. Then you can have a product/superpart table.
If a part is also a superpart, then you just have one row for the superpartID with the same partID.
Maybe 'component' is a better term than superpart. Components could be reused in larger components, for example.
You can find sample Bill of Materials database schemas at
http://www.databaseanswers.org/data_models/
The website offers Access applications for some of the models. Check with the author of the website.
I have a GUI tool that manages state sequences. One component is a class that contains a set of states, your typical DFA state machine. For now, I'll call this a StateSet (I have a more specific name in mind for the actual class that makes sense, but I think this name will suffice for the purpose of this question.)
However, I have another class that has a collection (possibly partially unordered) of those state sets and lists them in a particular order, and I'm trying to come up with a good name for it - not just for internal code, but for customers to refer to it.
The role of this particular second collection is to encapsulate the entire currently used/available collection of StateSets that the user has created. All of the StateSets will be used eventually in the application. A good analogy would be a hand of cards versus the entire table: The 'table' contains all of the currently available hands, while the 'hand' contains a particular collection of cards.
I've got these as starter ideas I could throw out for the class name; I'm not comfortable with either at the moment:
Sequence (maybe...with something else tacked on to the name)
StateSetSet (reasonable for code, but not for customers)
And as ewernli mentions, these are really technical terms, which don't really convey the idea well. Any other suggestions or ideas?
Sequence - Definitely NOT. It's too generic, and doesn't have any real semantic meaning.
StateSetSet - While more semantically correct, this is confusing. You have a sequence, which implies order, which is different from a set, which does not.
That being said, the best option, IMO, is StateSetSequence, as it implies you have a sequence of StateSet instances.
What is the role/function of your StateSetSet?
StateSetSet or Sequence are technical terms.
Prefer a term that conveys the role/function of the class.
That could well be something like History, Timeline, WorldSnapshot,...
EDIT
According to your updated description, StateSet looks to me like StateSpace (the space of all possible states). If the user can then interactively create something, it might be appropriate to speak of a Workspace. If the user creates various state spaces of interest, I would then go for StateSpaceWorkspace. Isn't that a cool name :)
"StateSets" may be sufficient.
Others:
StateSetList
StateSetLister
StateSetListing
StateSetSequencer
I like StateSetArrangement, implying an ordering without implying anything about the underlying storage mechanisms.
I have a couple of questions about searching in graphs/trees:
Let's assume I have an empty chess board and I want to move a pawn around from point A to B.
A. When using depth first search or breadth first search, must we use open and closed lists? That is, one list with all the elements still to check, and another with all the elements that have already been checked? Is it even possible to do it without those lists? What about A*, does it need it?
B. When using lists, after having found a solution, how can you get the sequence of states from A to B? I assume that when you have items in the open and closed lists, instead of just having the (x, y) states, you have an "extended state" formed with (x, y, parent_of_this_node)?
C. State A has 4 possible moves (right, left, up, down). If my first move is left, should I allow the next state to move back to the original state, that is, make the "right" move? If not, must I traverse the search tree every time to check which states I've already been to?
D. When I see a state in the tree where I've already been, should I just ignore it, as I know it's a dead end? I guess to do this I'd have to always keep the list of visited states, right?
E. Is there any difference between search trees and graphs? Are they just different ways to look at the same thing?
A. When using depth first search or breadth first search, must we use open and closed lists?
With DFS you definitely need to store at least the current path; otherwise you would not be able to backtrack. If you decide to maintain a list of all visited (closed) nodes, you are able to detect and avoid cycles (expanding the same node more than once). On the other hand, you lose the space efficiency of DFS: DFS without a closed list only needs space proportional to the depth of the search space.
With BFS you need to maintain an open list (sometimes called the fringe); otherwise the algorithm simply can't work. If you additionally maintain a closed list, you will (again) be able to detect/avoid cycles. With BFS the additional space for the closed list might not be that bad, since you have to maintain the fringe anyway. The relation between fringe size and closed list size strongly depends on the structure of the search space, so this has to be considered here. E.g. for a branching factor of 2, both lists are equal in size, and the impact of keeping the closed list doesn't seem very bad compared to its benefits.
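To illustrate, here's a minimal sketch (Python, on an 8x8 board like the pawn example) with a deque as the open list/fringe and a set as the closed list:

    # BFS over board positions; `fringe` is the open list, `closed` holds
    # the states that have already been expanded.
    from collections import deque

    def bfs(start, goal):
        fringe = deque([start])
        closed = set()
        while fringe:
            x, y = fringe.popleft()
            if (x, y) == goal:
                return True
            if (x, y) in closed:
                continue          # already expanded via another path
            closed.add((x, y))
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < 8 and 0 <= ny < 8 and (nx, ny) not in closed:
                    fringe.append((nx, ny))
        return False

    print(bfs((0, 0), (7, 7)))  # True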
What about A*, does it need it?
A*, which can be seen as a special (informed) type of BFS, needs the open list. Omitting the closed list is more delicate than with BFS, as is deciding whether to update costs of nodes already in the closed list. Depending on those decisions and on the type of heuristic used, the algorithm can stop being optimal and/or complete. I won't go into details here.
B.
Yup, the closed list should form some kind of inverse tree (pointers going towards the root node), so you can extract the solution path. You usually need the closed list to do this. For DFS, your current stack is exactly the solution path (no need for a closed list there). Also note that sometimes you are not interested in the path, but only in the solution or in its mere existence.
C.
Read the previous answers and look for the parts that talk about cycle detection.
D.
To avoid cycles with a closed list: don't expand nodes that are already inside the closed list. Note: with path-costs coming into play (remember A*), things might get more tricky.
E. Is there any difference between search trees and graphs?
You could consider searches that maintain a closed list to avoid cycles as graph-searches, and those without one as tree-searches.
A) It's possible to avoid the open/closed lists - you could try all possible paths, but that would take a VERY long time.
B) Once you've reached the goal, you use the parent_of_this_node information to "walk backwards" from the goal. Start with the goal, get its parent, get the parent's parent, etc. until you reach the start.
C) I think it doesn't matter - there's no way that the step you describe would result in a shorter path (unless your steps have negative weight, in which case you can't use Dijkstra/A*). In my A* code, I check for this case and ignore it, but do whatever is easiest to code up.
D) It depends - I believe Dijkstra can never reopen the same node (can someone correct me on that?). A* definitely can revisit a node - if you find a shorter path to the same node, you keep that path, otherwise you ignore it (there's a sketch of this after this answer).
E) Not sure, I've never done anything specifically for trees myself.
There's a good introduction to A* here:
http://theory.stanford.edu/~amitp/GameProgramming/
that covers a lot of details about how to implement the open set, pick a heuristic, etc.
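To tie the B) and D) points above together, here's a small sketch (Python; the 8x8 board, unit step costs and Manhattan heuristic are just stand-ins for the pawn example): every reached state records its parent so the path can be walked backwards from the goal, and when a shorter path to an already-seen state turns up, the cheaper cost and new parent simply replace the old ones, while stale heap entries are skipped when popped.

    # A*-style search with parent pointers and backwards path reconstruction.
    import heapq

    def astar(start, goal, neighbors, cost, heuristic):
        best_g = {start: 0}
        parents = {start: None}
        heap = [(heuristic(start, goal), 0, start)]
        while heap:
            f, g, node = heapq.heappop(heap)
            if node == goal:
                path = []                     # walk backwards from the goal
                while node is not None:
                    path.append(node)
                    node = parents[node]
                return list(reversed(path))
            if g > best_g.get(node, float("inf")):
                continue                      # stale entry; a cheaper path exists
            for nxt in neighbors(node):
                ng = g + cost(node, nxt)
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng          # keep the shorter path
                    parents[nxt] = node
                    heapq.heappush(heap, (ng + heuristic(nxt, goal), ng, nxt))
        return None

    def neighbors(p):
        x, y = p
        return [(nx, ny) for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                if 0 <= nx < 8 and 0 <= ny < 8]

    path = astar((0, 0), (7, 7), neighbors,
                 lambda a, b: 1,
                 lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1]))
    print(len(path) - 1)  # 14 moves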
A. Open and Closed lists are common implementation details, not part of the algorithm as such. It's common to do a depth-first tree search without either of these for example, the canonical way being a recursive traversal of the tree.
B. It is typical to ensure that nodes refer back to previous nodes allowing for a plan to be reconstructed by following the back-links. Alternatively you just store the entire solution so far in each candidate, though it would then be misleading to call it a node really.
C. I'm assuming that moving left and then moving right brings you to an equivalent state - in this case, you would have already explored the original state, it would be on the closed list, and therefore it should not have been put back onto the open list. You don't traverse the search tree each time, because you keep a closed list - often implemented as an O(1) structure - precisely so you know which states have already been fully examined. Note that you cannot always assume that being in the same position is the same as being in the same state - for most game path-finding purposes it is, but for general-purpose search it is not.
D. Yes, the list of visited states is what you're calling the closed list. You also want to check the open list to ensure you're not planning to examine a given state twice. You don't need to search any tree as such, since you typically store these things in linear structures. The algorithm as a whole is searching a tree (or a graph), and it generates a tree (of nodes representing the state space) but you don't explicitly search through a tree structure at any point within the algorithm.
E. A tree is a type of graph with no cycles/loops in it. Therefore you use the same graph search procedure to search either. It's common to generate a tree structure that represents your search through the graph, which is represented implicitly by the backwards links from each node to the node that preceded/generated it in the search. (Although if you go down the route of holding the entire plan in each state, there will be no tree, just a list of partial solutions.)
I have only used 3 functional languages -- Scala, Erlang, and Haskell -- but in all 3 of these, the correct way to build lists is to prepend the new data to the front and then reverse the list, instead of just appending to the end. Of course, you could append to the lists, but that results in an entirely new list being constructed.
Why is this? I could imagine it's because lists are implemented internally as linked lists, but why couldn't they just be implemented as doubly linked lists so you could append to the end with no penalty? Is there some reason all functional languages have this limitation?
Lists in functional languages are immutable / persistent.
Appending to the front of an immutable list is cheap because you only have to allocate a single node whose next pointer points to the head of the previous list. There is no need to change the original list, since it is a singly linked list and anything holding a pointer to the previous head cannot see the update.
Adding a node to the end of the list requires modifying the last node to point to the newly created node. But this is not possible because the node is immutable. The only option is to create a new node that has the same value as the last node and points to the newly created tail. This process must repeat itself all the way up to the front of the list, resulting in a brand new list which is a copy of the first list except for the tail node. Hence it is more expensive.
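A tiny sketch of that (Python tuples as immutable cons cells, purely illustrative): prepending allocates one cell and shares the rest, while appending has to rebuild every cell on the way to the end.

    NIL = None

    def prepend(x, lst):
        return (x, lst)                  # one new cell; the old list is shared

    def append(lst, x):
        if lst is NIL:
            return (x, NIL)
        head, tail = lst
        return (head, append(tail, x))   # every cell along the way is rebuilt

    xs = prepend(1, prepend(2, prepend(3, NIL)))   # (1, (2, (3, None)))
    ys = prepend(0, xs)   # shares all of xs
    zs = append(xs, 4)    # copies all three cells of xs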
Because there is no way to append to a list in O(1) without modifying the original (which you don't do in functional languages).
Because it's faster
They certainly could support appending, but it's so much faster to prepend that they limit the API. It's also kind of non-functional to append, as you must then modify the last element or create a whole new list. Prepend works in an immutable, functional, style by its nature.
That is the way in which lists are defined. A list is defined as a linked list terminated by a nil; this is not just an implementation detail. This, coupled with the fact that these languages have immutable data (at least Erlang and Haskell do), means that you cannot implement them as doubly linked lists. Adding an element would then modify the list, which is illegal.
By restricting list construction to prepending, anybody else holding a reference to some part of the list further down will not see it unexpectedly change behind their back. This allows for efficient list construction while retaining the property of immutable data.
Something keeps showing up in my programming: two things are the same from one viewpoint, but different from another. For example, imagine you build a graph of rail stations connected by trains; then the classes Vertex and RailStation are sometimes the same, other times not.
So, imagine I have a graph that very much represents rail stations and trains. Then I hand this graph to another object, which deletes some vertices, and then I want the corresponding rail stations to be gone.
I don't want to make rail stations "properties" of vertices; they're not. Also, the problem is symmetrical: if I erase a rail station, I want the corresponding vertex to be gone. What is the proper OO way to model such correspondences? I'm willing to go a few extra miles by writing some support methods or classes, if in the end the overall usage is simple and easy.
I'm currently using the Smalltalk programming language, but the question isn't really smalltalk-specific, I think. I just mention it because in Smalltalk, you can do cool tricks like examining the call stack, which might be helpful in this context.
Update:
Well, RailStations aren't Vertices! Are they?
Ok, let us consider real code, as demanded in the answers. Let me model a person with children. That's the easiest thing, right? Children should also know their parents, so we have like a doubly linked tree. To make disbanding parents from children easier, I model the link between parent and child as a Relationship, with properties parent and child.
So, I could implement parent>>removeChild: perhaps like this
removeChild: aChild
    (self relationshipWith: aChild) disband.
So, a parent has a collection of relationships, not of children. But each relationship corresponds to a child. Now I want to do things like this:
parent children removeAllSuchThat: [:e | e age < 12]
which should remove the relationship and the child.
Here, relationships and children correspond in some sense. So, what do I do now? Don't get me wrong, I'm fully aware that I could solve the problem without introducing Relationship classes. But indeed, parents and children actually do share a relationship, so why not model that and use it to help disband the double links less imperatively?
In your problem domain, aren't stations a kind of vertex? In which case, why not derive Station from Vertex?
Notice the use of the phrase "in your problem domain". Your problem appears to be about the use as railway stations appearing in a graph. So yes, in that domain, stations are vertexes. If it was a different problem domain, say a database on railway station architecture, they may well not be. Most modern languages support some idea of namespaces to allow you to have different kinds of entity with the same names in different domains.
Regarding your parent/child problem, once again you are being too general. If I were modelling mathematical expressions and subexpressions, when I removed a parent I would want to remove and delete/free all of its subexpressions. OTOH, if I were modelling legal responsibility relationships in the UK population, then when a responsibility is dissolved (say because of a divorce), I only want to remove the relationship, and NOT delete/free the child, which has its own independent existence.
It seems like you just want RailStation to inherit from Vertex (is-a relationship). See this smalltalk tutorial on inheritance. That way, if you have a graph of RailStations, an object used to dealing (generically) with graphs of Vertexes would handle things right naturally.
If this approach won't work, be more specific (preferably with real code).
From your description of the problem, you have a one-to-one correspondence of stations to vertices and deleting a station should automatically delete the corresponding vertex (and vice-versa). You also mentioned building "a graph of rail stations, connected by trains", by which you apparently mean a graph in which stations are vertices and trains are edges.
So, in what way is a station not a vertex? If the station does not exist except as a vertex, and if a vertex does not exist except as a station, then what benefit do you see in maintaining them as two distinct-but-linked entities?
As I understand your situation, station-isa-vertex and inheritance is the way to model that.
Having a Relationship object is a good idea.
I think the appropriate question here is "which use should be made of it?".
Probably Parent and Child classes are extending the same Person superclass, so they'll have some attributes in common, age for example.
The way I see it, Parent and Child objects have to know each other, so both classes have to keep a link to the same Relationship.
The Relationship object keeps a one-to-many relation between a single parent and a certain number of children, and it'll keep a reference to each Person object.
This way you can implement the whole disbanding logic within the Relationship object, as sophisticated as you wish. You can query the Relationship object to know which members of the family match your requirements to do something. You can make the relationship disband (and be destroyed) safely, since it knows all of its members and can ask them to break their references before it is destroyed; or you can ask a single member to leave the family while keeping the Relationship object alive.
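A rough sketch of that idea (in Python rather than Smalltalk, with made-up names): the Relationship knows both ends and detaches itself from each side when it disbands, so neither end is left with a dangling link.

    class Relationship:
        def __init__(self, parent, child):
            self.parent, self.child = parent, child
            parent.relationships.append(self)
            child.relationships.append(self)

        def disband(self):
            # Remove the link from both ends in one place.
            self.parent.relationships.remove(self)
            self.child.relationships.remove(self)

    class Person:
        def __init__(self, name, age=0):
            self.name, self.age = name, age
            self.relationships = []

        def children(self):
            return [r.child for r in self.relationships if r.parent is self]

    dad, kid = Person("dad", 40), Person("kid", 8)
    Relationship(dad, kid)
    for r in [r for r in dad.relationships if r.parent is dad and r.child.age < 12]:
        r.disband()   # both dad and kid lose the link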
But that's not all. Relationship should really be a superclass, extended by HierarchicalRelationship and PeerRelationship (or FriendRelationship).
This specialization lets you link Parent(s) and Child(ren) into other hierarchies in a completely transversal way.
The true concept behind this is that your Relationship objects are the key to query and organize the whole bunch of Person objects (or Vertex objects) in a scalable and structured way, so the whole data domain you end up with is usable in any sense you like, whether you want to disband groups or walk a certain path (or railroad) between them.
Sorry for the huge amount of metaphors.
Take a look at Fame, see http://www.squeaksource.com/Fame.html
We use a specialized subclass of Collection that updates the opposite end when you add or remove elements. Also, you can use pragmas to annotate relations on your classes. These pragmas are used by the Fame framework to do all kinds of nice stuff.