What database (and schema, if applicable) would be most appropriate for storing and retrieving data (location, timestamp) that can be placed at any node of an arbitrarily defined tree? For instance: the location of a book you own:
Book
| |
Home Work
| | | | |
Bedroom Bathroom Den Office Conf room
| | | | |
Closet Underbed EntCtr Closet Desk
| |
Top Shelf Bottom Shelf
XXXX
For each item record, the item's position could conceivably look different: it would likely share the same root and primary nodes, but beyond those it could follow different branches and leaves to wherever the item is actually located. And with each added item, the tree itself could grow (you could eventually add specificity to that "top shelf in the bedroom closet" node, placing newer items in one of 2-3 sub-locations).
I'm thinking a SQL db might not be ideal since the tree could expand arbitrarily and could be entirely different depending on user, but not sure how a NoSQL db like Mongo could handle any updating/expansion (like if the example book is moved from an existing node to a new one a level or two deeper). Maybe the depth/breadth of tree levels could be constrained if using a SQL db, but the column labels could vary, and on the other hand Mongo could simply create a new document for an item if it is moved to a new location.
Any insights from database experts very much appreciated!
Locations, especially those managed by different organizations, are not necessarily hierarchical. For example, Russia is in Europe and Asia. Texarkana is in Texas and Arkansas. US ZIP 42223 is in Kentucky and Tennessee. Geopolitical locations form a graph (a network), not a strict tree.
That being said, you can easily model hierarchical data in a SQL database by using an adjacency list:
create table locations (
location_id int primary key,
name text not null,
parent_id int null references locations(location_id)
);
You can then query such a table using recursive Common Table Expressions (CTEs), which are available in every major database, including MySQL since version 8.0. Older MySQL versions lack them, but it sounds like switching databases is an option for you.
Here's an example: http://blog.databasepatterns.com/2014/02/trees-paths-recursive-cte-postgresql.html
You don't need Nested Set, Materialized Path or Closure Table if your DB supports RCTE.
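To make the adjacency list plus recursive CTE idea concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite supports `with recursive`). The sample rows and the `subtree` helper are made up for illustration; only the table definition comes from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table locations (
    location_id integer primary key,
    name text not null,
    parent_id integer null references locations(location_id)
);
-- Hypothetical sample data, loosely based on the tree in the question.
insert into locations values
    (1, 'Home', null),
    (2, 'Bedroom', 1),
    (3, 'Closet', 2),
    (4, 'Top Shelf', 3),
    (5, 'Work', null),
    (6, 'Office', 5);
""")

def subtree(conn, root_id):
    """Return (location_id, path) for root_id and all of its descendants."""
    return conn.execute("""
        with recursive tree(location_id, path) as (
            -- anchor row: the starting node
            select location_id, name from locations where location_id = ?
            union all
            -- recursive step: children of rows already in the tree
            select l.location_id, tree.path || ' > ' || l.name
            from locations l join tree on l.parent_id = tree.location_id
        )
        select location_id, path from tree order by path
    """, (root_id,)).fetchall()

for location_id, path in subtree(conn, 1):
    print(location_id, path)
```

Adding a deeper level later is just another insert with the right parent_id; no schema change is needed.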
When you say "SQL DB" I think you are referring to a relational database. For this you seem to want a hierarchical database. You can get such a structure in a relational DB; one way to do it is called the Nested Set Model. See: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Related
I am creating a database for a community to store details of all their members along with those members' relations with each other.
For Instance: There is a family of 4. Mother, Father, Son and Daughter. The Son gets married to a girl from another family in the same community (Their data is also in the same database). The newly married couple has a new member soon. Also they need to add their grand parents to the database at a later stage (Parents of both the Mother and Father in this case).
What would be the best way to create a schema for such a database?
I have a schema called member_details that'll store all community members' data in a single table something like this.
member_details: ID | Name | Birthdate | Gender | Father | Mother | Spouse | Child
All members would have relations mapped to Father,Mother,Spouse,Child referenced in the same table.
Is this schema workable from a technical pov?
I just need to know if my solution is correct or if there is a better way to do this. I know there are a lot of websites storing this kind of data, so I'd appreciate it if someone could point me in the right direction.
I'd advise you to use two tables: one for members of the community and one for relations between them. Something like this:
Members:
ID | Name | Birth | Gender
Relations:
First Member ID | Second Member ID | Relation
Where you use IDs from the first table as foreign keys in the second. That way you'll be able to add more relation types when you need them. By the way, I'd add a third table to store relation types, so it can work as a dictionary. Same thing for genders.
As usual, "it depends".
The first question is "how will you use this data?". What sort of questions do you expect the database to answer? If you want to show a person's profile with their relationships, that's pretty easy. If you want to find out how many children a person has, or who is the grandfather of a person, or the age of someone's youngest child, that could be a little harder.
The second question is "how sure are you these are the only relationships you want to store?" Perhaps you also want to store "neighbour", "team member", "engaged_to" - or maybe you need to store that information later on. Maybe you need to take account of people getting divorced, or remarrying.
The schema you suggest works fine for most scenarios, but adding a new type of relationship means you have to add a new column. There are no hard and fast rules, but in general it's better to add rows than columns when faced with events in the problem domain. Asking "who is this person's grandfather" requires a couple of self joins, and that's okay.
@ba3a suggests splitting the information about people from the information about relationships. This is much "cleaner" - and less likely to require new columns as you store more types of relationship. Showing a person's profile requires a query with lots of outer joins. Finding a grandparent requires self joins on the "relations" table.
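As a rough illustration of the two-table layout and the grandparent self join, here is a sketch in Python with sqlite3. The column names, the `parent_of` / `spouse_of` relation names, the sample people, and the convention that the first member is the parent of the second are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table relation_types (id integer primary key, name text not null unique);
create table members (
    id integer primary key, name text not null, birth text, gender text
);
create table relations (
    first_member_id integer not null references members(id),
    second_member_id integer not null references members(id),
    relation_type_id integer not null references relation_types(id),
    primary key (first_member_id, second_member_id, relation_type_id)
);
-- Assumed convention: first member is the parent (or spouse) of the second.
insert into relation_types values (1, 'parent_of'), (2, 'spouse_of');
insert into members (id, name) values
    (1, 'Grandfather'), (2, 'Father'), (3, 'Mother'), (4, 'Son'), (5, 'Daughter');
insert into relations values
    (1, 2, 1), (2, 4, 1), (2, 5, 1), (3, 4, 1), (3, 5, 1), (2, 3, 2);
""")

def grandparents(conn, member_id):
    """Names of the member's grandparents, found by joining relations to itself."""
    rows = conn.execute("""
        select distinct gp.name
        from relations p                       -- p: parent -> member
        join relations g                       -- g: grandparent -> parent
          on g.second_member_id = p.first_member_id
         and g.relation_type_id = p.relation_type_id
        join relation_types t on t.id = p.relation_type_id
        join members gp on gp.id = g.first_member_id
        where p.second_member_id = ? and t.name = 'parent_of'
        order by gp.name
    """, (member_id,)).fetchall()
    return [name for (name,) in rows]

print(grandparents(conn, 4))
```

A new kind of relationship (say, "neighbour") is then one extra row in relation_types rather than a new column.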
I am working with a client in manufacturing whose products are configurations of the same bunch of parts. I am creating a database that holds all valid products and their Bill of Materials. I need help on deciding a Bill Of Material schedule to implement.
The obvious solution is a many-to-many relationship with a junction table:
Table 1: Products
Table 2: Parts
Junction Table: products, parts, part quantities
However, there are multiple levels in my client's product;
-Assembly
-Sub-Assembly
-Component
-Part
and items from lower levels are allowed to be associated with any upper level item;
Assembly |Sub-assembly
Assembly |Component
Assembly |Part
Sub-Assembly |Component
Sub-Assembly |Part
Component |Part
and I suspect the client will want to add more levels in the future when new product lines are added.
Correct me if I am wrong, but I believe the above relation schedule would demand a growing integer sequence of junction tables and queries (0+1+1+2+3...) to display and export the full Bill of Materials which may eventually affect performance.
Someone suggested to put everything in one table:
Table 1: Assemblies, sub-assemblies, components, parts, etc...
Junction table: Children and Parents
This only requires one junction table to create infinite levels of many-to-many relationships. I don't know if I trust this solution, but I can't think of any issues other than accidentally making an item its own parent and creating an infinite loop and that it sounds disorganized.
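The infinite-loop worry can be handled with a check before each insert. Below is a minimal sketch in Python with sqlite3; the table names `items` and `item_parents` and both helper functions are hypothetical, chosen only to mirror the single-table-plus-junction idea described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table items (id integer primary key, name text not null);
create table item_parents (
    parent_id integer not null references items(id),
    child_id integer not null references items(id),
    primary key (parent_id, child_id),
    check (parent_id <> child_id)          -- an item cannot be its own parent
);
insert into items values (1, 'Assembly'), (2, 'Sub-assembly'), (3, 'Part');
insert into item_parents values (1, 2), (2, 3);
""")

def would_create_cycle(conn, parent_id, child_id):
    """True if child_id is parent_id itself or already one of its ancestors."""
    if parent_id == child_id:
        return True
    row = conn.execute("""
        with recursive ancestors(id) as (
            select parent_id from item_parents where child_id = ?
            union          -- 'union' (not 'union all') drops repeats, so this terminates
            select ip.parent_id
            from item_parents ip join ancestors a on ip.child_id = a.id
        )
        select 1 from ancestors where id = ? limit 1
    """, (parent_id, child_id)).fetchone()
    return row is not None

def add_child(conn, parent_id, child_id):
    """Insert a parent/child link, refusing links that would close a loop."""
    if would_create_cycle(conn, parent_id, child_id):
        raise ValueError("link would make an item its own ancestor")
    conn.execute("insert into item_parents values (?, ?)", (parent_id, child_id))

add_child(conn, 1, 3)      # fine: Part directly under Assembly
```

Calling `add_child(conn, 3, 1)` here would raise, because Assembly is already an ancestor of Part.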
I lack the experience to determine whether either or neither of these models will work for my client. I am sketching these models in MS Access, but I am open to moving this project to a more powerful platform if necessary. Any input is appreciated. Thank you.
-M
What you are describing is a hierarchy. As such it should take the form:
part_hierarchy:
part_id | parent_part_id | other | attributes | of | this | relationship
So part_id 1 may have a parent part_id 10 "component", which may have a parent_part_id (when looked up itself in this table) of 12 "Assembly". It would look like:
part_id | parent_part_id
1 | 10
10 | 12
and parts table:
part_id | description
1 | widget
10 | widget component
12 | aircraft carrier
That's a little simplified since it doesn't take into account your product/part relationship, but it will all fit together using this methodology.
Nice and simple. Now it doesn't matter how deep the hierarchy goes. It's still just two columns (and any extra columns needed for attributes of this relationship, like create_date, last_changed_by_user, etc.).
I would suggest something more powerful than Access, though, since it lacks the ability to pick apart a hierarchy using a recursive CTE, something that comes with SQL Server, Postgres, Oracle, and the like.
I would 100% avoid any schema that requires you to add more fields or tables as the hierarchy becomes deeper and more complex. That is a path that leads towards pain and regret.
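Here is a sketch of how a recursive CTE could explode the bill of materials from the part_hierarchy table, written in Python with sqlite3 so it can be run as-is. The `quantity` column is an assumption (it stands in for the "other attributes of this relationship"), as are the quantities themselves and the `bill_of_materials` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table parts (part_id integer primary key, description text not null);
create table part_hierarchy (
    part_id integer not null references parts(part_id),
    parent_part_id integer not null references parts(part_id),
    quantity integer not null default 1,     -- assumed relationship attribute
    primary key (part_id, parent_part_id)
);
insert into parts values
    (1, 'widget'), (10, 'widget component'), (12, 'aircraft carrier');
-- 4 widgets per component, 2 components per carrier (made-up numbers)
insert into part_hierarchy values (1, 10, 4), (10, 12, 2);
""")

def bill_of_materials(conn, top_part_id):
    """Return (description, depth, total quantity) for everything under a part."""
    return conn.execute("""
        with recursive bom(part_id, depth, total_qty) as (
            select part_id, 1, quantity
            from part_hierarchy where parent_part_id = ?
            union all
            -- multiply quantities down each level of the hierarchy
            select h.part_id, bom.depth + 1, bom.total_qty * h.quantity
            from part_hierarchy h join bom on h.parent_part_id = bom.part_id
        )
        select p.description, bom.depth, bom.total_qty
        from bom join parts p on p.part_id = bom.part_id
        order by bom.depth, p.description
    """, (top_part_id,)).fetchall()

for row in bill_of_materials(conn, 12):
    print(row)
```

The same query works unchanged whether the hierarchy is two levels deep or twenty.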
Since the level of nesting is arbitrary, use one table with a self-referencing parent_id foreign key.
While this is technically correct, navigating it requires a recursive query, which some databases (MS Access among them) don't support. However, a simple and effective way to make nested parts easy to access is to store a "path" to each component, which looks like a path in a file system.
For example, say part id 1 is a top level part that has a child whose id is 2, and part id 2 has a child part with id 3, the paths would be:
id parent_id path
1 null /1
2 1 /1/2
3 2 /1/2/3
Doing this means finding the tree of subparts for any part is simply:
select b.id
from parts a
-- the '/' in the pattern keeps /1 from also matching /10, /11, ...
join parts b on b.path = a.path
             or b.path like concat(a.path, '/%')
where a.id = ?
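A runnable version of the path lookup, sketched in Python with sqlite3 (SQLite concatenates with `||` rather than `concat`). The extra row with id 10 is added on purpose: the pattern includes a trailing separator so that `/1` does not accidentally match `/10`. The `subtree_ids` helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table parts (id integer primary key, parent_id integer, path text not null);
insert into parts values
    (1, null, '/1'),
    (2, 1, '/1/2'),
    (3, 2, '/1/2/3'),
    (10, null, '/10');     -- unrelated top-level part, must not match '/1'
""")

def subtree_ids(conn, part_id):
    """Ids of the part itself plus every part stored beneath its path."""
    rows = conn.execute("""
        select b.id
        from parts a
        join parts b on b.path = a.path or b.path like a.path || '/%'
        where a.id = ?
        order by b.path
    """, (part_id,)).fetchall()
    return [i for (i,) in rows]

print(subtree_ids(conn, 1))
print(subtree_ids(conn, 2))
```

The trade-off is that moving a part means rewriting the path of every row beneath it.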
Hi I've got a small internal project I am working on. Currently it only serves my company, but I'd like to scale it so that it could serve multiple companies. The tables I have at the moment are USERS and PROJECTS. I want to start storing company specific information and relate it to the USERS table. For each user, I will have a new column that is the company they belong to.
Now I also need to store that company's templates in the database. The templates are stored as strings like this:
"divider","events","freeform" etc.
Initially I was thinking each word should go in as a separate row, but as I write this I'm thinking perhaps I should store all templates in one entry separated by commas (as written above).
Bottom line, I'm new to database design and I have no idea how to best set this up. How many tables, what columns etc. For right now, my table structure looks like this:
PROJECTS
Project Number | Title | exacttarget_id | Author | Body | Date
USERS
Name | Email | Date Created | Password
Thanks in advance for any insights you can offer.
What I would do is create 2 tables:
I would create one table for the different companies, let's call it COMPANY:
Company_id | Title | Logo | (Whatever other data you want)
I would also create one table for the settings listed above, let's call it COMPANY_SETTINGS:
Company_id | Key | Value
This gives you the flexibility in the future to add additional settings without compromising your existing code. A simple query gets all the settings, regardless of how many your current version uses.
SELECT Key, Value FROM COMPANY_SETTINGS WHERE Company_id = :companyId
The results can then be put into an associative array for easy use throughout the project.
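For example, in Python with sqlite3 the settings could be loaded into a dict roughly like this. The columns are renamed to `setting_key` and `setting_value` here because KEY is a reserved word in MySQL; the sample settings and the `company_settings` function name are invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table company (company_id integer primary key, title text not null);
create table company_settings (
    company_id integer not null references company(company_id),
    setting_key text not null,
    setting_value text not null,
    primary key (company_id, setting_key)
);
insert into company values (1, 'Acme');
-- Hypothetical settings, just to have something to read back.
insert into company_settings values
    (1, 'default_template', 'divider'),
    (1, 'timezone', 'UTC');
""")

def company_settings(conn, company_id):
    """Load every setting for one company into a dict (associative array)."""
    rows = conn.execute(
        "select setting_key, setting_value from company_settings "
        "where company_id = ?",
        (company_id,),
    ).fetchall()
    return dict(rows)

settings = company_settings(conn, 1)
print(settings["default_template"])
```

Adding a setting later is a new row, so code that only reads the keys it knows about keeps working.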
I'm working on a comic book database project, and I need to be able to include the various locations within a particular comic issue. There are a couple issues I have to work with:
Locations are more often than not inside other locations (the "Daily Bugle building" is on "The corner of 39th street and 2nd Avenue" is in "New York City" is in "New York", etc.)
While the hierarchy of locations is pretty standard (Universe->Dimension->Galaxy->System->Planet->Continent->Country->State->City->Street->Building->Room), not all the parent locations are necessarily known for every location (a comic might involve a named building in an unnamed country in Africa for instance).
There are a few locations that don't fit into that nice hierarchy but branch off at some point (for instance, "The Savage Land" is a giant jungle in Antarctica, so while its parent is a Continent, it is not a country).
My main goal is to be able to run a search for any location and get all issues that have that location or any locations within that location. A secondary goal is to be able, on the administration side of the application, to autocomplete full locations (i.e., I type in a new building for an issue and specify that it is in New York City, and it pulls all "New York City" instances -- yes, there is more than one :P -- in the database and lets me choose the one in Earth-616 or the one in Earth-1610, or I can just add a new New York City under different parent locations). All that front-end stuff I can do and figure out when the time comes; I'm just unsure of the database setup at this point.
Any help would be appreciated!
Update:
After a lot of brainstorming with a couple peers, I think I have come up with a solution that is a bit simpler than the nested model that has been suggested.
The location table would look like this:
ID
Name
Type (enum list of the previously mentioned categories, including an 'other' option)
Uni_ID (ID of the parent universe, null if not applicable)
Dim_ID (ID of the parent Dimension, null if not applicable)
Gal_ID (ID of the parent Galaxy, null if not applicable)
...and so on through all the categories...
Bui_ID (ID of the parent Building, null if not applicable)
So while there are a lot of fields, searching and autocomplete work really easily. All the parents of any given location are right there in the row, all the children of any location can be found with a single query, and as soon as a type is defined for a new location, autocomplete would work easily. At this point, I'm leaning towards this approach instead of the nested model, unless anyone can point out any problems with this setup that I haven't seen.
For hierarchical data, I always prefer using a nested set model to a parent->child (adjacency) model. Look here for a good explanation and example queries. It's a more complicated data model, but it makes querying and searching the data much easier.
I really like what @King Isaac linked earlier about the nested set model. The only argument I have with what the link said is scalability. If you're defining your lft and rgt boundaries, you have to know how many elements you have, or you have to set arbitrarily large numbers and just hope that you never reach them. I don't know how big this database will be and how many entries you'll have, but it's good to implement a model that doesn't require re-indexing and the like. Here's my modified version:
create table #locations (id varchar(100),
name varchar(300),
descriptn varchar(500),
depthLevelId int)
create table #depthLevel(id int,
levelName varchar(300))
***Id level structuring***
10--level 1
100 101-- level 2
1000 1001 1010 1011 --level 3
10000 10001 10010 10011 10100 10101 10110 10111 --level 4
Essentially this makes for super simple queries. The important part is the child id is comprised of the parent id plus whatever random id you want to give it. It doesn't even have to be sequential, just unique. You want everything in the universe?
SELECT *
FROM #locations
WHERE id like '10%'
You want something down the 4th level?
SELECT *
FROM #locations
WHERE id like '10000%'
The id's might get a little long when you get down so many levels but does that really matter when you're writing simple queries? And since it's just a string you can have a very large amount of expandability without ever having to reindex.
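A small sketch of the prefix-id idea in Python with sqlite3. One assumption goes beyond the answer: each level appends a fixed-width, two-digit segment, so that one node's id can never be mistaken for a prefix of an unrelated node (this caps each parent at 99 children). The `add_location` and `descendants` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table locations (id text primary key, name text not null)")

def add_location(conn, name, parent_id=""):
    """Insert a location whose id is the parent's id plus a two-digit suffix."""
    (last,) = conn.execute(
        "select max(id) from locations where id like ? and length(id) = ?",
        (parent_id + "%", len(parent_id) + 2),
    ).fetchone()
    # Next free suffix under this parent; the first child gets '01'.
    next_number = int(last[-2:]) + 1 if last else 1
    if next_number > 99:
        raise ValueError("this sketch allows at most 99 children per parent")
    new_id = f"{parent_id}{next_number:02d}"
    conn.execute("insert into locations values (?, ?)", (new_id, name))
    return new_id

def descendants(conn, location_id):
    """Names of the location and everything below it, via one prefix match."""
    rows = conn.execute(
        "select name from locations where id like ? order by id",
        (location_id + "%",),
    ).fetchall()
    return [name for (name,) in rows]

universe = add_location(conn, "Earth-616")
planet = add_location(conn, "Earth", universe)
city = add_location(conn, "New York City", planet)
other = add_location(conn, "Earth-1610")
print(descendants(conn, universe))
```

As with any path-style key, moving a location to a different parent means rewriting the ids of its whole subtree.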
I have a categories table in MySQL, something like this:
categoryId | categoryTitle | definedField | parentId
1          | Title         | 123          | NULL
2          | AnotherTitle  | 234          | 1
3          | AndAnotherOne | NULL         | 1
What I need to do is find the closest definedField value by going up through the parents, like this:
Since category 2 has a definedField, return its value.
Since category 3 does not have a definedField, search up to its parent. It has a definedField, so return it. If it didn't have one, keep searching up until one is found.
There will ALWAYS be a topmost category that has definedField set. I only need a good algorithm to search for this in a MySQL InnoDB table.
Before version 8.0, MySQL had no direct way of retrieving hierarchical data (unlike, for example, Postgres's WITH RECURSIVE queries); MySQL 8.0 and later support recursive CTEs as well. There is a good article summarizing different ways of implementing nested data sets in MySQL: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
Most users at one time or another have dealt with hierarchical data in
a SQL database and no doubt learned that the management of
hierarchical data is not what a relational database is intended for.
The tables of a relational database are not hierarchical (like XML),
but are simply a flat list. Hierarchical data has a parent-child
relationship that is not naturally represented in a relational
database table.
The article covers two models: Adjacency List and Nested Set.
The Adjacency List Model
In the adjacency list model, each item in the table contains a pointer
to its parent. The topmost element, in this case electronics, has a
NULL value for its parent. The adjacency list model has the advantage
of being quite simple, it is easy to see that FLASH is a child of mp3
players, which is a child of portable electronics, which is a child of
electronics. While the adjacency list model can be dealt with fairly
easily in client-side code, working with the model can be more
problematic in pure SQL.
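For the categories table from the question, the client-side approach could be as simple as the loop below. This is only a sketch: it uses Python's sqlite3 as a stand-in for a MySQL connection, the function name is invented, and it issues one small query per level rather than a single recursive query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table categories (
    categoryId integer primary key,
    categoryTitle text not null,
    definedField integer,
    parentId integer references categories(categoryId)
);
insert into categories values
    (1, 'Title', 123, null),
    (2, 'AnotherTitle', 234, 1),
    (3, 'AndAnotherOne', null, 1);
""")

def closest_defined_field(conn, category_id):
    """Walk up the parent chain until a row with a non-null definedField is found."""
    seen = set()                      # guards against accidental parent loops
    current = category_id
    while current is not None and current not in seen:
        seen.add(current)
        row = conn.execute(
            "select definedField, parentId from categories where categoryId = ?",
            (current,),
        ).fetchone()
        if row is None:               # unknown id
            return None
        defined_field, parent_id = row
        if defined_field is not None:
            return defined_field
        current = parent_id
    return None

print(closest_defined_field(conn, 2))
print(closest_defined_field(conn, 3))
```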
The Nested Set Model
In the Nested Set Model, we can look at our hierarchy in a new way,
not as nodes and lines, but as nested containers.
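To show what "nested containers" means in practice, here is a small sketch in Python with sqlite3. Each row stores a left and right boundary, and a node's descendants are the rows whose `lft` falls between its own `lft` and `rgt`. The table name `nested_category`, the `televisions` row, and the boundary numbers are assumptions worked out by hand for this example, not copied from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table nested_category (
    name text primary key,
    lft integer not null,
    rgt integer not null
);
-- Boundaries numbered by walking the tree depth-first:
-- electronics(1..10) contains televisions(2..3) and portable electronics(4..9), etc.
insert into nested_category values
    ('electronics', 1, 10),
    ('televisions', 2, 3),
    ('portable electronics', 4, 9),
    ('mp3 players', 5, 8),
    ('flash', 6, 7);
""")

def subtree(conn, name):
    """The named node and all of its descendants, in tree order."""
    rows = conn.execute("""
        select node.name
        from nested_category as node
        join nested_category as parent
          on node.lft between parent.lft and parent.rgt
        where parent.name = ?
        order by node.lft
    """, (name,)).fetchall()
    return [n for (n,) in rows]

def path_to(conn, name):
    """All ancestors of the named node, from the root down to the node itself."""
    rows = conn.execute("""
        select parent.name
        from nested_category as node
        join nested_category as parent
          on node.lft between parent.lft and parent.rgt
        where node.name = ?
        order by parent.lft
    """, (name,)).fetchall()
    return [n for (n,) in rows]

print(subtree(conn, 'portable electronics'))
print(path_to(conn, 'flash'))
```

Reads need no recursion at all, but inserting a node means renumbering the boundaries to its right, which is the maintenance cost of this model.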