For a school project, I need to find a way to build personalized queries based on end-user choices.
Since the user can choose basically any fields from any combination of tables, I need to find a way to map the tables in order to make a join and not have extraneous data (This may lead to incoherent reports, but we're willing to live with that).
For up to two tables, I already managed to design an algorithm that works fine. However, when I add another table, I can't work out a path through my database. All tables available for the personalized reports can be linked together, so it really all comes down to finding which path to use.
You might be able to try some form of an A* algorithm. Basically this looks at each of the possible next options to choose and applies a heuristic to it, a function that determines roughly how far it is between this node and your goal. It then chooses the one that is closer and repeats. The hardest part of implementing A* is designing a good heuristic.
Without more information on how the tables fit together, or what you mean by a 'path' through the tables, it's hard to recommend something though.
Looks like it didn't like my link, probably the * in it, try:
http://en.wikipedia.org/wiki/A*_search_algorithm
Edit:
If that is the whole database, I'd go with a depth-first exhaustive search.
I thought about using A* or a similar algorithm, but as you said, the hardest part is about designing the heuristic.
My tables are centered around somewhat of a backbone with quite a few branches, each leading to at most a single leaf node. Here is the actual map (table names removed because I'm paranoid). Assuming I want to view data from the A, B and C tables, I need an algorithm to find the blue path.
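A rough sketch of the depth-first search suggested above, assuming the schema can be described as an undirected graph of foreign-key links. The table names (`T1` to `T4`, `D`) and the helpers `find_path` and `tables_to_join` are made up for illustration and are not the real schema. Since the layout described is tree-shaped, taking the union of the paths from one requested table to each of the others should give exactly the tables that need joining.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical schema: a backbone T1-T2-T3-T4 with leaf tables hanging off it.
# Each key lists the tables it shares a foreign-key link with (undirected).
SCHEMA: Dict[str, List[str]] = {
    "T1": ["T2", "A"],
    "T2": ["T1", "T3", "D"],
    "T3": ["T2", "T4", "B"],
    "T4": ["T3", "C"],
    "A": ["T1"],
    "B": ["T3"],
    "C": ["T4"],
    "D": ["T2"],
}


def find_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Depth-first search; returns one path from start to goal, or None."""
    stack = [(start, [start])]
    visited = {start}
    while stack:
        table, path = stack.pop()
        if table == goal:
            return path
        for neighbour in graph[table]:
            if neighbour not in visited:
                visited.add(neighbour)
                stack.append((neighbour, path + [neighbour]))
    return None


def tables_to_join(graph: Dict[str, List[str]], wanted: Sequence[str]) -> Set[str]:
    """Union of the paths from the first wanted table to every other one."""
    first, *rest = wanted
    needed = {first}
    for target in rest:
        path = find_path(graph, first, target)
        if path is None:
            raise ValueError(f"no link between {first} and {target}")
        needed.update(path)
    return needed


print(sorted(tables_to_join(SCHEMA, ["A", "B", "C"])))
# ['A', 'B', 'C', 'T1', 'T2', 'T3', 'T4']
```

The leaf table `D` is left out because it is not on any path between the requested tables. If the real schema contains cycles, this still returns a connected set of tables, but not necessarily the smallest one.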
I'm struggling with a database design issue, and it's kind of a long winded one:
My website will have an unlimited number of organizations that users can join, subgroups under those organizations, and finally specific profiles for those subgroups. Subgroups within the same organization will be able to borrow and make changes to profiles from each other. Users will generate the organizations, the subgroups, and the profiles.
I can draw it out and make the flow sensible on paper. When it comes to actually putting it into SQL, I'm lost. Most of the help guides out there assume static groups, so a simple primary and foreign key setup can refer back to the right information. As I understand it, mine has too much dynamic information for most of these to work outright.
Most writers say to stay away from dynamically generated tables, but that's where my instinct takes me. Another idea I had was three massive tables, one each for all Organizations, Groups, and Profiles.
So is there a better way to go about this? Or are there any good documents I should read up on to help me translate from drawing to actual code?
I have some experience with both SQL and MongoDB if that helps explain things.
I don't know about MongoDB(NoSQL), but from the SQL standpoint, here is my opinion.
As far as your schema goes: most of the time, when your "instinct" tells you that a "dynamic tables" solution is your best bet for some problem you are working on, remember that there is a high chance that the very same problem can be solved by multiple static tables with different relationships. (By static I mean the ones that you have created yourself as a developer.)
I'd also like to mention that in my early days I always approached problems the same way, but then I started to understand the principles and how exactly databases work.
Back To Your Problem:-
If your organisation hierarchy consists of three major types of objects/levels, viz. Organizations, Groups, and Profiles, then I'd suggest that you go with the 3 tables with correct relationships, which any SQL engine is quite efficient at handling, in comparison to creating tables at runtime.
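To make the three-table idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for whatever SQL engine you end up with. The table and column names (`organizations`, `subgroups`, `profiles`, and so on) and the sample rows are assumptions for illustration. The point is that every user-created organization, subgroup, or profile is just a new row, never a new table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE organizations (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE subgroups (
    id              INTEGER PRIMARY KEY,
    organization_id INTEGER NOT NULL REFERENCES organizations(id),
    name            TEXT NOT NULL
);
CREATE TABLE profiles (
    id          INTEGER PRIMARY KEY,
    subgroup_id INTEGER NOT NULL REFERENCES subgroups(id),
    name        TEXT NOT NULL
);
""")

# User-generated content becomes rows in the three static tables.
conn.execute("INSERT INTO organizations (id, name) VALUES (1, 'Chess Club')")
conn.execute(
    "INSERT INTO subgroups (id, organization_id, name) VALUES (10, 1, 'Beginners')"
)
conn.execute(
    "INSERT INTO profiles (id, subgroup_id, name) VALUES (100, 10, 'Opening drills')"
)

# One query walks the whole hierarchy through the foreign keys.
rows = conn.execute("""
    SELECT o.name, s.name, p.name
    FROM profiles AS p
    JOIN subgroups AS s ON s.id = p.subgroup_id
    JOIN organizations AS o ON o.id = s.organization_id
""").fetchall()
print(rows)
```

Borrowing profiles between subgroups of the same organization would probably need one more linking table between `subgroups` and `profiles`; that is left out here to keep the sketch short.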
Now, if the hierarchy is dynamic, say an organisation can contain many groups, which in turn contain profiles, which again can contain other organisations, and so on, then you may want to look at a recursive structure in SQL (recursion). (Just do a Google search; there are a lot of articles about that.)
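One common way to get such a recursive structure is a single self-referencing table queried with a recursive common table expression. The sketch below again uses Python's `sqlite3` as a stand-in; the `nodes` table, its columns, and the sample rows are invented for illustration, and other engines may spell the recursive query slightly differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES nodes(id),  -- NULL for a top-level node
    kind      TEXT NOT NULL,                 -- 'organization', 'group', 'profile'
    name      TEXT NOT NULL
);
INSERT INTO nodes VALUES
    (1, NULL, 'organization', 'Acme'),
    (2, 1,    'group',        'Sales'),
    (3, 2,    'profile',      'Default'),
    (4, 3,    'organization', 'Acme Subsidiary'),
    (5, NULL, 'organization', 'Unrelated');
""")

# Start at one node and repeatedly pull in the children of what we have so far.
descendants = conn.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM nodes WHERE id = ?
        UNION ALL
        SELECT n.id, n.name, t.depth + 1
        FROM nodes AS n
        JOIN tree AS t ON n.parent_id = t.id
    )
    SELECT id, name, depth FROM tree ORDER BY depth
""", (1,)).fetchall()

for node_id, name, depth in descendants:
    print("  " * depth + name)
```

Node 5 never shows up because nothing links it to node 1, however deep the nesting goes.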
I'm currently working on Blog-Software which should offer support for content in multiple languages.
I'm thinking of a way to design my database (MySQL). My first thought was the following:
Every entry is stored in a table (let's call it entries). This table holds information which doesn't change (like the unique ID, if it's published or not and the post-type).
Another table (let's call it content) contains the strings (like the content, the headline, the date, and author of the specific language).
They are then joined by the unique entry-id.
The idea of this is that one article can be translated into multiple other languages, but it doesn't need to be. If there is no translation in the native language of the user (determined by his IP or something), he sees the standard language (which would be English).
For me this sounds like a simple multilingual database and I'm sure there is a design pattern for this. Sadly, I didn't find any.
If there is no pattern, how would you go about realizing this? Any input is greatly appreciated.
Your approach is what I've seen in most applications with this kind of capability. The only changing piece is that some places will put the "default" values into the base table (Entry) while others will treat it as just another Content row.
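For what it's worth, here is a small sketch of that layout with the fall-back-to-English behaviour from the question, using Python's `sqlite3` rather than MySQL so it runs anywhere. The column names, the `headlines` helper, and the sample rows are assumptions; it also treats English as just another `content` row rather than storing it in the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (
    id        INTEGER PRIMARY KEY,
    published INTEGER NOT NULL,
    post_type TEXT NOT NULL
);
CREATE TABLE content (
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    lang     TEXT NOT NULL,
    headline TEXT NOT NULL,
    body     TEXT NOT NULL,
    PRIMARY KEY (entry_id, lang)       -- at most one translation per language
);
INSERT INTO entries VALUES (1, 1, 'post'), (2, 1, 'post');
INSERT INTO content VALUES
    (1, 'en', 'Hello world', 'English text'),
    (1, 'de', 'Hallo Welt',  'Deutscher Text'),
    (2, 'en', 'Second post', 'Only in English');
""")


def headlines(conn, lang, default_lang="en"):
    """Headline per published entry in `lang`, else in the default language."""
    return conn.execute("""
        SELECT e.id, COALESCE(t.headline, d.headline) AS headline
        FROM entries AS e
        JOIN content AS d                 -- default-language row (required)
          ON d.entry_id = e.id AND d.lang = ?
        LEFT JOIN content AS t            -- requested translation (optional)
          ON t.entry_id = e.id AND t.lang = ?
        WHERE e.published = 1
        ORDER BY e.id
    """, (default_lang, lang)).fetchall()


print(headlines(conn, "de"))
print(headlines(conn, "fr"))
```

Note that the inner join on the default language means an entry with no English row would not be listed at all, so this sketch assumes every entry has one.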
That design will also give you the ability to search (or restrict search) in all languages easily. From a DB design perspective, it's IMHO the best design you can use.
With small amounts of text and a simple application this would work. At scale, you might be bitten by the extra joins needed, especially when your database is larger than RAM. Presenting things in the right order (sorting) might also need solving.
This design problem is turning out to be a bit more "interesting" than I'd expected....
For context, I'll be implementing whatever solution I derive in Access 2007 (not much choice--customer requirement. I might be able to talk them into a different back end, but the front end has to be Access (and therefore VBA & Access SQL)). The two major activities that I anticipate around these tables are batch importing new structures from flat files and reporting on the structures (with full recursion of the entire structure). Virtually no deletes or updates (aside from entire trees getting marked as inactive when a new version is created).
I'm dealing with two main tables, and wondering if I really have a handle on how to relate them: Products and Parts (there are some others, but they're quite straightforward by comparison).
Products are made up of Parts. A Part can be used in more than one Product, and most Products employ more than one Part. I think that a normal many-to-many resolution table can satisfy this requirement (mostly--I'll revisit this in a minute). I'll call this Product-Part.
The "fun" part is that many Parts are also made up of Parts. Once again, a given Part may be used in more than one parent Part (even within a single Product). Not only that, I think that I have to treat the number of recursion levels as effectively arbitrary.
I can capture the relations with a m-to-m resolution from Parts back to Parts, relating each non-root Part to its immediate parent part, but I have the sneaking suspicion that I may be setting myself up for grief if I stop there. I'll call this Part-Part. Several questions occur to me:
Am I borrowing trouble by wondering about this? In other words, should I just implement the two resolution tables as outlined above, and stop worrying?
Should I also create Part-Part rows for all the ancestors of each non-root Part, with an extra column in the table to store the number of generations?
Should Product-Part contain rows for every Part in the Product, or just the root Parts? If it's all Parts, would a generation indicator be useful?
I have (just today, from the Related Questions), taken a look at the Nested Set design approach. It looks like it could simplify some of the requirements (particularly on the reporting side), but thinking about generating the tree during the import of hundreds (occasionally thousands) of Parts in a Product import is giving me nightmares before I even get to sleep. Am I better off biting that bullet and going forward this way?
In addition to the specific questions above, I'd appreciate any other commentary on the structural design, as well as hints on how to process this, either inbound or outbound (though I'm afraid I can't entertain suggestions of changing the language/DBMS environment).
Bills of materials and exploded parts lists are always so much fun. I would implement Parts as your main table, with a Boolean field to say a part is "sellable". This removes the first-level recursion difference and the redundancy of Parts that are themselves Products. Then, implement Products as a view of Parts that are sellable.
You're on the right track with the PartPart cross-ref table. Implement a constraint on that table that says the parent Part and the child Part cannot be the same Part ID, to save yourself some headaches with infinite recursion.
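A minimal sketch of that layout follows, written with Python's `sqlite3` as a stand-in since it is easy to run; it is not Access SQL or VBA, and constraint syntax in Access will differ. The `quantity` column, the `explode` helper, and the bicycle data are assumptions added for illustration. The CHECK constraint only blocks a part being its own direct parent, so the traversal also tracks ancestors to catch longer loops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    sellable INTEGER NOT NULL DEFAULT 0      -- 1 = this part is a product
);
CREATE TABLE part_part (
    parent_id INTEGER NOT NULL REFERENCES parts(id),
    child_id  INTEGER NOT NULL REFERENCES parts(id),
    quantity  INTEGER NOT NULL DEFAULT 1,    -- assumed column, not in the post
    PRIMARY KEY (parent_id, child_id),
    CHECK (parent_id <> child_id)            -- a part cannot contain itself
);
CREATE VIEW products AS
    SELECT id, name FROM parts WHERE sellable = 1;

INSERT INTO parts VALUES (1, 'Bicycle', 1), (2, 'Wheel', 0), (3, 'Spoke', 0);
INSERT INTO part_part VALUES (1, 2, 2), (2, 3, 32);
""")

# The constraint rejects a direct self-reference.
try:
    conn.execute("INSERT INTO part_part VALUES (3, 3, 1)")
    self_reference_rejected = False
except sqlite3.IntegrityError:
    self_reference_rejected = True


def explode(conn, part_id, multiplier=1, ancestors=()):
    """Yield (child_id, total_quantity) for every part below part_id."""
    if part_id in ancestors:
        raise ValueError(f"cycle detected at part {part_id}")
    rows = conn.execute(
        "SELECT child_id, quantity FROM part_part "
        "WHERE parent_id = ? ORDER BY child_id",
        (part_id,),
    ).fetchall()
    for child_id, quantity in rows:
        total = multiplier * quantity
        yield child_id, total
        yield from explode(conn, child_id, total, ancestors + (part_id,))


print(self_reference_rejected)
print(list(explode(conn, 1)))
```

The same walk-the-children loop is the sort of thing a VBA routine over a recordset would do for the reporting side.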
Generational differences between BOMs can be maintained by creating a new Part at the level of the actual change, and in any higher levels in which the change must be accommodated (if you want to say that this new Part, as part of its parent hierarchy, results in a new Product). Then update the reference tree of any Part levels that weren't revised in this generational change (to maintain Parts and Products that should not change generationally if a child does). To avoid orphans (unreferenced Parts records that are unreachable from the top level), Parts can reference their predecessor directly, creating a linked list of ancestors.
This is a very complex web, to be sure; persisted tree-like structures of similarly-represented objects usually are. But, if you're smart about implementing constraints to enforce referential integrity and avoid infinite recursion, I think it'll be manageable.
I would have one part table for atomic parts, then a superpart table with a superpartID and its related subparts. Then you can have a product/superpart table.
If a part is also a superpart, then you just have one row for the superpartID with the same partID.
Maybe 'component' is a better term than superpart. Components could be reused in larger components, for example.
You can find sample Bill of Materials database schemas at
http://www.databaseanswers.org/data_models/
The website offers Access applications for some of the models. Check with the author of the website.
I'm working on a forum-like webapp where I'd like to allow users to favourite an item so that they can keep track of it, and also so that others can see how many times an item's been favourited.
The problem is, I'm unsure on the best practices for databases, which includes this situation.
I have two ideas in my head on how to do this:
Add an extra column to the user table and store things like so: "|2|5|73|"
Add an extra table with at least two columns, one for referencing an item, the other for referencing a user.
I feel uncomfortable about going for the second method as it involves an extra table, and potentially more queries would be required. Perhaps these beliefs aren't an issue, as I have little understanding of databases beyond simply working with table layouts and basic queries.
The second method, commonly called a junction or join table, is fairly standard practice and is going to be far more efficient than adding a column like the one you describe to the user table. Through the magic of JOINs you won't be making any extra queries.
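Here is a small sketch of such a junction table, using Python's `sqlite3` so it runs without a MySQL server. The table names (`users`, `items`, `favourites`) and sample rows are made up for illustration. Each question from the original post (what has this user favourited, how often has each item been favourited) is answered by one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE favourites (
    user_id INTEGER NOT NULL REFERENCES users(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    PRIMARY KEY (user_id, item_id)   -- a user can favourite an item only once
);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO items VALUES (2, 'First thread'), (5, 'Second thread'), (73, 'Third thread');
INSERT INTO favourites VALUES (1, 2), (1, 5), (1, 73), (2, 5);
""")

# Everything one user has favourited, in a single query.
anns_favourites = conn.execute("""
    SELECT i.id, i.title
    FROM favourites AS f
    JOIN items AS i ON i.id = f.item_id
    WHERE f.user_id = ?
    ORDER BY i.id
""", (1,)).fetchall()

# How many times each item has been favourited, also a single query.
favourite_counts = conn.execute("""
    SELECT i.id, COUNT(f.user_id)
    FROM items AS i
    LEFT JOIN favourites AS f ON f.item_id = i.id
    GROUP BY i.id
    ORDER BY i.id
""").fetchall()

print(anns_favourites)
print(favourite_counts)
```

With the `"|2|5|73|"` string approach, the second query would mean scanning and parsing every user's column, which is where the efficiency difference comes from.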
Since it sounds like your app is starting to get a little complex, I highly recommend picking up a MySQL database book at your local library or book store (check for reviews on Amazon to find a good one) and expanding your knowledge.
Well, I'd have +1-ed the other response, but I'm too much of a newb apparently. But yes, I recommend a join table for this type of thing.
How do you know when to create a new table for very similar object types?
Example:
To learn mysql I'm building a model solar system. For the purposes of my project, planets have many similar attributes to dwarf planets, centaurs, and comets. Dwarf planets are almost completely identical to planets. Centaurs and comets are only different from planets because their orbital path has more variation. Should I have a separate table for each type of object, or should they share tables?
The example is probably too simple, but I'm also interested in best practices. For instance, should I use separate tables just in case I want to make planets and dwarf planets different in the future, or are there any efficiency reasons for keeping them in the same table?
Normal forms are what you should be interested in. They are pretty much the convention for building tables.
Any design that doesn't break the first, second or third normal form is fine by me. That's a pretty long list of requirements though, so I suggest you go read up on them via the Wikipedia links above.
It depends on what type of information you want to store about the objects. If the information for all of them is the same, say orbit radius, mass and name, then you can use the same table. However, if there are different properties for each (say atmosphere composition for planets, etc.) then you can either use separate tables for each (not very normalized) or have one table for basic properties like orbit, mass and name and a second table for just the properties that are unique to planets (and a similar table for comets, etc. if needed). All objects would be in the first table but only planets would be in the second table and linked through a foreign key to the first table.
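As a rough illustration of the base-table-plus-extension layout, here is a sketch using Python's `sqlite3` in place of MySQL. The table names (`bodies`, `planet_details`), the columns, and the approximate sample values are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Properties every object has.
CREATE TABLE bodies (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    mass_kg         REAL,
    orbit_radius_au REAL
);
-- Properties only planets have; one row per planet, keyed by the body id.
CREATE TABLE planet_details (
    body_id    INTEGER PRIMARY KEY REFERENCES bodies(id),
    atmosphere TEXT
);
-- Values below are rough and only there to make the example runnable.
INSERT INTO bodies VALUES
    (1, 'Earth',  5.97e24, 1.0),
    (2, 'Halley', 2.2e14,  17.8);
INSERT INTO planet_details VALUES (1, 'nitrogen, oxygen');
""")

# All objects in one list; planet-only columns are NULL for everything else.
rows = conn.execute("""
    SELECT b.name, p.atmosphere
    FROM bodies AS b
    LEFT JOIN planet_details AS p ON p.body_id = b.id
    ORDER BY b.name
""").fetchall()
print(rows)

# Only planets: an inner join keeps just the bodies that have a detail row.
planets = conn.execute("""
    SELECT b.name
    FROM bodies AS b
    JOIN planet_details AS p ON p.body_id = b.id
""").fetchall()
print(planets)
```

A `comet_details` table could be added the same way if comets ever need their own columns.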
It's called Database Normalization
There are many normal forms. By applying normalization you will go through the metadata (tables) and study the relationships between the data more clearly. By using normalization techniques you will optimize the tables to prevent redundancy. This process will help you understand which entities to create based on the relationships between the different fields.
You should most likely split the data about a planet etc so that the shared (common) information is in another table.
E.g.
Common (Table)
    Diameter (Column)
    Mass (Column)
Planet (Table)
    Population (Column)
Comet (Table)
    Speed (Column)
Poor columns I know. Have the Planet and Comet tables link to the Common data with a key.
This is definitely a subjective question. It sounds like you are already on the right lines of thinking. I would ask:
Do these objects share many attributes? If so, it's probably worth considering at the very least a base table to list them all in.
Does one object "extend" another - it has all the attributes of the other, plus some extras? If so, it might be worth adding another table with the extra attributes and a one-to-one mapping back to the base object.
Do both objects have many shared attributes and unshared attributes? If this is the case, maybe you need a single table plus a "data extension" system where each object can have a type or category that specifies any amount of extra attributes that may be associated with it.
Do the objects only share one or two attributes? In this case, they are probably dissimilar enough to separate into multiple tables.
You may also ask yourself how you are going to query the data. Will you ever want to get them all in the same list? It's always a good idea to combine data into tables with other data they will commonly be queried with. For example, an "attachments" table where the file can be an image or a video, instead of images and video tables, if you commonly want to query for all attachments. Don't split into multiple tables unless there is a really good reason.
If you will ever want to get planets and comets in one single query, they will pretty much have to be in the same table if you want the database to work efficiently. Inheritance should be handled inside your app itself :)
Here's my answer to a similar question, which I think applies here as well:
How do you store business activities in a SQL database?
There are many different ways to express inheritance in your relational model. For example, you can try to squish everything into one table and have a field that allows you to distinguish between the different types, or have one table for the shared attributes with relationships to a child table with the specific attributes, etc. With either choice you're still storing the same information. When going from a domain model to a relational model, this is what is called an impedance mismatch. Both choices have different trade-offs: for example, one table will be easier to query, but multiple tables will have higher data density.
In my experience it's best not to try to answer these questions from a database perspective, but let your domain model, and sometimes your application framework of choice, drive the table structure. Of course this isn't always a viable choice, especially when performance is concerned.
I recommend you start by drawing on paper the relationships you want to express and then go from there. Does the table structure you've chosen represent the domain accurately? Is it possible to query to extract the information you want to report on? Are the queries you've written complicated or slow? Answering these questions and others like them will hopefully guide you towards creating a good relational model.
I'd also suggest reading up on database normalization if you're serious about learning good relational modeling principles.
I'd probably have a table called [HeavenlyBodies] or some such thing. Then have a lookup table with the type of body, i.e. planet, comet, asteroid, star, etc. All will share similar things such as name, size, weight. Most of the answers I've read so far have good advice. Normalization is good, but I feel you can take it too far sometimes. Third normal form is a good goal.
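A quick sketch of that single-table-plus-lookup idea, again with Python's `sqlite3` standing in for MySQL. The names `heavenly_bodies` and `body_types`, the `diameter_km` column, and the approximate sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE body_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE heavenly_bodies (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    diameter_km  REAL,
    body_type_id INTEGER NOT NULL REFERENCES body_types(id)
);
INSERT INTO body_types VALUES (1, 'planet'), (2, 'dwarf planet'), (3, 'comet');
-- Diameters are rough figures, included only so the example has data.
INSERT INTO heavenly_bodies VALUES
    (1, 'Earth',  12742, 1),
    (2, 'Pluto',  2377,  2),
    (3, 'Halley', 11,    3);
""")

# Every object in one list, whatever its type.
all_bodies = conn.execute("""
    SELECT b.name, t.name
    FROM heavenly_bodies AS b
    JOIN body_types AS t ON t.id = b.body_type_id
    ORDER BY b.name
""").fetchall()

# Or just one kind, by filtering on the lookup value.
dwarf_planets = conn.execute("""
    SELECT b.name
    FROM heavenly_bodies AS b
    JOIN body_types AS t ON t.id = b.body_type_id
    WHERE t.name = ?
""", ("dwarf planet",)).fetchall()

print(all_bodies)
print(dwarf_planets)
```

Reclassifying an object later is then a one-column update rather than a move between tables.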