I'm working on a comic book database project, and I need to be able to include the various locations within a particular comic issue. There are a couple issues I have to work with:
Locations are more often than not inside other locations (the "Daily
Bugle building" is on "The corner of 39th street and 2nd Avenue" is
in "New York City" is in "New York", etc.)
While the hierarchy of locations is pretty standard
(Universe->Dimension->Galaxy->System->Planet->Continent->Country->State->City->Street->Building->Room),
not all the parent locations are necessarily known for every location
(a comic might involve a named building in an unnamed country in
Africa for instance).
There are a few locations that don't fit into that nice hierarchy but
branch off at some point (for instance, "The Savage Land" is a giant
jungle in Antarctica, so while its parent is a Continent, it is not a
country).
My main goal is to be able to run a search for any location and get all issues that have that location or any locations within that location. A secondary goal is to be able, on the administration side of the application, to autocomplete full locations (i.e. I type in a new building for an issue and specify that it is in New York City, and it pulls all "New York City" instances -- yes, there is more than one :P -- in the database and lets me choose the one in Earth-616 or the one in Earth-1610, or I can just add a new New York City under different parent locations). All that front-end stuff I can do and figure out when the time comes; I'm just unsure of the database setup at this point.
Any help would be appreciated!
Update:
After a lot of brainstorming with a couple peers, I think I have come up with a solution that is a bit simpler than the nested model that has been suggested.
The location table would look like this:
ID
Name
Type (enum list of the previously mentioned categories, including an
'other' option)
Uni_ID (ID of the parent universe, null if not applicable)
Dim_ID (ID of the parent Dimension, null if not applicable)
Gal_ID (ID of the parent Galaxy, null if not applicable)
...and so on through all the categories...
Bui_ID (ID of the parent Building, null if not applicable)
So while there are a lot of fields, searching and autocomplete work really easily. All the parents of any given location are right there in the row, all the children of any location can be found with a single query, and as soon as a type is defined for a new location, autocomplete would work easily. At this point, I'm leaning towards this approach instead of the nested model, unless anyone can point out any problems with this setup that I haven't seen.
For hierarchical data, I always prefer using a nested set model to a parent->child (adjacency) model. Look here for a good explanation and example queries. It's a more complicated data model, but it makes querying and searching the data much easier.
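As a rough illustration of the nested set idea, here is a minimal sketch using Python's built-in sqlite3 module. The table name `location`, the sample rows, and the helper `subtree` are assumptions made up for this example; only the `lft`/`rgt` convention comes from the nested set model itself. A node's descendants are exactly the rows whose `lft` falls between the node's own `lft` and `rgt`.

```python
import sqlite3

def make_nested_set_db():
    # Hypothetical sample data; lft/rgt values were assigned by hand
    # with a depth-first walk of the tree.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE location ("
        " id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
        " lft INTEGER NOT NULL, rgt INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO location (id, name, lft, rgt) VALUES (?, ?, ?, ?)",
        [
            (1, "Earth-616", 1, 12),
            (2, "New York", 2, 7),
            (3, "New York City", 3, 6),
            (4, "Daily Bugle building", 4, 5),
            (5, "Antarctica", 8, 11),
            (6, "The Savage Land", 9, 10),
        ],
    )
    return conn

def subtree(conn, location_id):
    # Returns the location itself plus everything nested inside it.
    rows = conn.execute(
        "SELECT child.name"
        " FROM location AS parent"
        " JOIN location AS child"
        "   ON child.lft BETWEEN parent.lft AND parent.rgt"
        " WHERE parent.id = ?"
        " ORDER BY child.lft",
        (location_id,),
    )
    return [name for (name,) in rows]
```

Joining the result to an issue-to-location link table would then give every issue set in a location or anywhere inside it.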
I really like what @King Isaac linked earlier about the nested set model. The only argument I have with what the link said is scalability. If you're defining your lft and rgt boundaries, you have to know how many elements you have, or you have to set arbitrarily large numbers and just hope that you never reach them. I don't know how big this database will be or how many entries you'll have, but it's good to implement a model that doesn't require re-indexing and the like. Here's my modified version:
create table #locations (id varchar(100),
name varchar(300),
descriptn varchar(500),
depthLevelId int)
create table #depthLevel(id int,
levelName varchar(300))
***Id level structuring***
10--level 1
100 101-- level 2
1000 1001 1010 1011 --level 3
10000 10001 10010 10011 10100 10101 10110 10111 --level 4
Essentially this makes for super simple queries. The important part is the child id is comprised of the parent id plus whatever random id you want to give it. It doesn't even have to be sequential, just unique. You want everything in the universe?
SELECT *
FROM #locations
WHERE id like '10%'
You want something down the 4th level?
SELECT *
FROM #locations
WHERE id like '10000%'
The id's might get a little long when you get down so many levels but does that really matter when you're writing simple queries? And since it's just a string you can have a very large amount of expandability without ever having to reindex.
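Here is a small sketch of the same prefix idea, written with Python's sqlite3 module so it can be run as-is. It departs from the scheme above in one assumed detail: each level of the id is terminated with a "/" so that a prefix such as "1/1/" cannot accidentally match a sibling like "1/12/". The table name `locations`, the `path` column, and the sample rows are hypothetical.

```python
import sqlite3

def build_locations():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE locations (path TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    # Each child path is the parent path plus its own segment and a "/".
    conn.executemany(
        "INSERT INTO locations (path, name) VALUES (?, ?)",
        [
            ("1/", "Earth-616"),
            ("1/1/", "New York"),
            ("1/1/1/", "New York City"),
            ("1/1/1/1/", "Daily Bugle building"),
            ("1/12/", "Antarctica"),
            ("1/12/1/", "The Savage Land"),
        ],
    )
    return conn

def descendants(conn, path):
    # Prefix match: the location itself plus everything below it.
    rows = conn.execute(
        "SELECT name FROM locations WHERE path LIKE ? ORDER BY path",
        (path + "%",),
    )
    return [name for (name,) in rows]
```

With plain concatenated digits and segments of varying length, the prefix "11" would also match an unrelated "112"; the delimiter is one way to avoid that.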
Related
I am trying to get my head around creating a ClassPass-like database design. I'm new to database design, and there are a few things that I can't quite figure out how to implement.
You can check the classpass example:
https://classpass.com/classes
https://classpass.com/studios
EDIT 1: So here is the idea: Each city has multiple neighbourhoods, each having multiple studios/venues.
After reading spencer7593's comment, here is what I came up with and the things that are still not quite clear:
So what I am not quite sure about is:
I am not sure how to store the venue/studio address and geolocation. Is it better to have a Region table which defines id | name | parent_id and stores the cities and the neighborhoods recursively? Or add a foreign key constraint to city and neighborhoods? Should I store the lat/lon in the venue table, in the address, or even in a separate locations table? I would like to be able to perform searches like:
show me venues in that neighborhood or city
show me venues which are in radius XX from position
Each class should have a schedule and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like
to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
@Dimitar,
Even though @rhavendc is correct that this question should be placed in Database Administrators, I will answer your questions in order, to the best of my knowledge.
I am not sure how to store the venue/studio address and geolocation. [...]
You can easily find geolocations by searching on the web. Take MyGeoPosition, for example.
I would like to be able to perform searches like
show me venues in that neighborhood or city.
You can do this easily. There are a few ways to do it, and each way will require a bit of tweaking with your ERD design. With the example I attached below, you can run a query to list all the venues with the address_id followed by the city id. The yellow entities are the one I added to ensure integrity.
For example:
-- venue.name is using the "[table].[field]" format to help
-- the engine recognize where the field is coming from.
-- This is useful if you are pulling the fields of the
-- same name from different tables.
select venue.name, city.name
from venue join
address using (address_id) join
city using (city_id);
NOTE: You don't have to include the city_name. I just threw it in there so you can try it out to see all the venues matching it.
If you would like to do it by neighborhood, you would have to tweak the ERD I gave you by adding neighbor_id to the ADDRESS table; I have attached the example below. You would also have to add neighborhood_id. From there, you can run a query like this:
Using this ERD:
-- Remember the format from the previously mentioned code.
select venue.name, neighborhood.name
from venue join
address using (address_id) join
neighborhood using (neighbor_id);
show me venues which are in radius XX from position
You can calculate the distance in miles, kilometers, etc. from longitude and latitude using the haversine formula.
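For reference, a minimal Python sketch of the haversine calculation and a radius filter built on it. The function names, the `(name, lat, lon)` tuple layout, and the mean Earth radius of 6371 km are assumptions for this example; a real application would more likely push the filter into the database or use a spatial index.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two points given in decimal degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def venues_within(venues, lat, lon, radius_km):
    # venues is an iterable of (name, lat, lon) tuples (hypothetical layout).
    return [name for name, vlat, vlon in venues
            if haversine_km(lat, lon, vlat, vlon) <= radius_km]
```

For miles, multiply the result by roughly 0.621 or swap in the Earth's radius in miles.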
Each class should have a schedule and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
This can be easily derived from either of the ERDs I attached here. In the CLASS table, I added a field called parent_class_id which gets the class_id from the same table. This uses recursion, and I know this is a bit of a headache to understand. This recursion will allow the classes with assigned parent class to show that the classes are also offered at different times.
You can get this result by doing so:
-- Remember the format from the previously mentioned code.
select class1.name, class1.class_id, class2.class_id
from class as class1,
class as class2
where class1.parent_class_id = class2.class_id;
or even show me venues offering spinning classes
This may be a tricky one... If you are wondering which venues are offering spinning classes, where spinning is either part of or the name of the class, not a category, it's simple.
Try this...
-- Remember the format from the previously mentioned code.
select venue_id
from venue join
class using (venue_id)
where class_name = 'spinning';
NOTE: Keep in mind that, depending on the engine and collation, string comparisons may be case-sensitive. You could try using where UPPER(class_name) = 'SPINNING'.
If the class name may include words other than "spinning" in its name, use this instead: where UPPER(class_name) like '%SPINNING%'.
If you are wondering which classes are offering spinning classes where spinning is a category, that's where the tricky bit comes in. I believe you would have to use a subquery for this.
Try this:
-- Remember the format from the previously mentioned code.
select class_id
from class join
class_category using (class_id)
where cat_id = (select cat_id
from category
where name = 'spinning');
Again, string comparisons may be case-sensitive depending on the engine and collation. Make sure the literal uses the correct upper or lower case.
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
Yes and no. You could, but if you can understand recursion in database systems, you don't have to.
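If you do go with an extra table, one possible shape is a row per class per weekday with a start and end time. The sketch below uses Python's sqlite3 module; the table `class_schedule`, its columns, the weekday numbering (1 = Monday), and the sample data are all assumptions for illustration and are not taken from the attached ERDs.

```python
import sqlite3

def make_schedule_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE venue (venue_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE class (
            class_id INTEGER PRIMARY KEY,
            venue_id INTEGER NOT NULL REFERENCES venue (venue_id),
            name     TEXT NOT NULL
        );
        -- One row per class per weekday; times stored as 'HH:MM' text.
        CREATE TABLE class_schedule (
            class_id    INTEGER NOT NULL REFERENCES class (class_id),
            day_of_week INTEGER NOT NULL CHECK (day_of_week BETWEEN 1 AND 7),
            start_time  TEXT NOT NULL,
            end_time    TEXT NOT NULL,
            PRIMARY KEY (class_id, day_of_week, start_time),
            CHECK (start_time < end_time)
        );
        INSERT INTO venue VALUES (1, 'Downtown Studio'), (2, 'Uptown Gym');
        INSERT INTO class VALUES (1, 1, 'Spinning'), (2, 2, 'Spinning'),
                                 (3, 2, 'Boxing');
        INSERT INTO class_schedule VALUES
            (1, 1, '09:00', '10:00'), (1, 3, '09:00', '10:00'),
            (1, 5, '09:00', '10:00'), (2, 2, '18:00', '19:00'),
            (3, 1, '18:00', '19:00');
    """)
    return conn

def venues_with_class_on(conn, class_name, day_of_week):
    # "Show me venues which have <class_name> classes on <day_of_week>".
    rows = conn.execute(
        "SELECT DISTINCT v.name"
        " FROM venue v"
        " JOIN class c ON c.venue_id = v.venue_id"
        " JOIN class_schedule s ON s.class_id = c.class_id"
        " WHERE c.name = ? AND s.day_of_week = ?"
        " ORDER BY v.name",
        (class_name, day_of_week),
    )
    return [name for (name,) in rows]
```

A "Mo, We, Fr from 9 AM till 10 AM" class then becomes three schedule rows, which keeps the per-day queries simple joins.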
Hope this helps. :)
Entity Relationship Modeling.
An entity is a person, place, thing, concept or event that can be uniquely identified, is important to the business, and we can store information about.
Based on information in the question, some candidates to consider as entities might be:
studio
class
rating
neighborhood
city
For each entity, what uniquely identifies it? Figure out the candidate keys.
And figure out the relationships between the entities, and the cardinalities. (What is related to what, and how many, required or optional?)
Is a studio related to a class?
Can a studio have more than one class?
Can a studio have zero classes?
Can a class be related to more than one studio?
Is a neighborhood related to zero, one or more city?
Can a studio be related to more than one neighborhood?
Once you've got the entities and relationships, getting the attributes assigned to each entity is pretty straightforward. Just make sure every attribute is dependent on the key, the whole key, and nothing but the key.
FIRST
Your question is not suited to be posted here on Stack Overflow; I guess it's best posted on Database Administrators.
SECOND
Here is some info for reading, just to give you a good start on building your database:
Data Modeling (It's kinda broad but it's for the better)
Logical Data Model (Short but comprehensive one)
THIRD
Basically, when designing your database you should first know all the data that would be needed in your system and group them (if needed) to make it small. Normalize it to reduce data redundancy.
EXAMPLE
Let's assume that the venue table is your main table, the center of all the transactions in your system. venue may then have subdata, for example branch, which holds the different branch locations... and branch may have subdata too, for example schedule, teacher and/or class, which may also be related to each other (subdata gets data from other subdata)... and so on with dependent tables.
Then you can also create independent tables that still have connections with others. For example, the neighborhood table may contain the neighbor location and the main venue location (so it should get the id of the selected venue from the venue table)... and so on with related and independent tables.
NOTE
Just remember the "one-to-one, one-to-many" relationship. If a piece of data will hold many kinds of subdata, split them into different tables. If it will hold only one (1) kind of subdata, then put it all in one table.
I have following requirements for item management.
Item can be moved from location 'A' to 'B'. And later on it can also be moved from 'B' to 'C' location.
History should be maintained for each item, so that items can be displayed location-wise for a specific period and history can be displayed item-wise.
Also I need to display items 'in transit' on particular date.
Given below is the database design:
item_master
-----------
- ItemId
- Item name
- etc...
item_location_history
------------------
- ItemId
- LocationId (foreign key of location_master)
- Date
While item is being transported I want to insert data in following way:
1. At the time of transport I want to record the item as moved from location 'A' to 'In Transit' on a particular date, as there is a possibility that the item remains in the 'In Transit' state for several days.
2. At the time of receipt at location 'B' I want to record the item as moved from 'In Transit' to location 'B' on a particular date, and so on.
This way I will have track of both 'In Transit' state and item location.
What is the best way to achieve this? What changes I need to apply to the above schema? Thanks.
Initial Response
What is the best way to achieve this?
This is a simple and common Data Modelling Problem, and the answer (at least in the Relational Database context) is simple. I would say, every database has at least a few of these. Unfortunately, because the authors who write books about the Relational Model, are in fact completely ignorant of it, they do not write about this sort of simple straight-forward issue, or the simple solution.
What you are looking for is an OR gate. In this instance, because the Item is in a Location XOR it is InTransit, you need an XOR gate.
In Relational terms, this is a Basetype::Subtype structure. If it is implemented properly, it provides full integrity, and eliminates Nulls.
As far as I know, it is the only Relational method. Beware, the methods provided by famous writers are non-relational, monstrous, massively inefficient, and they don't work.
Record ID
But first ... I would not be serving you if I didn't mention that your files have no integrity right now, you have a Record Filing System. This is probably not your fault, in that the famous writers know only pre-1970's Record Filing Systems, so that is all that they can teach, but the problem is, they badge it "relational", and that is untrue. They also have various myths about the RM, such as it doesn't support hierarchies, etc.
By starting with an ID stamped on every table, the data modelling process is crippled.
You have no Row Uniqueness, as is required for RDBs.
An ID is not a Key.
If you do not understand that, please read this answer.
I have partially corrected those errors:
In Item, I have given a more useful PK. I have never heard any user discuss an Item RecordId, they always use Codes.
Often those codes are made up of components, if so, you need to record those components in separate columns (otherwise you break 1NF).
Item needs an Alternate Key on Name, otherwise you will allow duplicate Names.
In Location, I have proposed a Key, which identifies an unique physical location. Please modify to suit.
If Location has a Name, that needs to be an AK.
I have not given you the Predicates. These are very important, for many reasons. The main reason here, is that it will prove the insanity of Record IDs. If you want them, please ask.
If you would like more information on Predicates, visit this Answer, scroll down (way down!) to Predicate, and read that section. Also check the ERD for them.
Solution
What changes [do] I need to apply to the above schema?
Try this:
Item History Data Model
(Obsolete, refer below for the updated model, in the context of the progression)
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation for a full explanation, or Model Anatomy.
If you have not encountered Subtypes implemented properly before, please read this Subtype Overview
That is a self-contained document, with links to code examples
There is also an SO discussion re How to implement referential integrity in subtypes.
When contemplating a Subtype cluster, consider each Basetype::Subtype pair as a single unit, do not perceive them as two fragments, or two halves. Each pair is one fact.
ItemHistory is an event (a fact) in the history of an Item.
Each ItemHistory fact is either a Location fact XOR an InTransit fact.
Each of those facts has different attributes.
Notice that the model represents the simple, honest, truth about the real world that you are engaging. In addition to the integrity, etc, as discussed above, the result is simple straight-forward code: every other "solution" makes the code complex, in order to handle exception cases. And some "solutions" are more horrendous than others.
Dr E F Codd gave this to us in 1970. It was implemented as a modelling method in 1984, named IDEF1X. That became the standard for Relational Databases in 1993. I have used it exclusively since 1987.
But the authors who write books, allegedly on the Relational Model, have no knowledge whatsoever, about any of these items. They know only pre-1970's ISAM Record Filing Systems. They do not even know that they do not have the Integrity, Power, or Speed of Relational Databases, let alone why they don't have it.
Date, Darwen, Fagin, Zaniolo, Ambler, Fowler, Kimball, are all promoting an incorrect view of the RM.
Response to Comments
1) ItemHistory, contains Discriminator column 'InTransit'.
Correct. And all the connotations that go with that: it is a control element; its values better be constrained; etc.
Shall it be enum with the value Y / N?
First, understand that the value-stored has meaning. That meaning can be expressed any way you like. In English it means {Location|InTransit}.
For the storage, I think of it as the values for the proposition InTransit, which are {True|False}, ...
In SQL (if you want the real article, which is portable), I intended it as a BIT or BOOLEAN. Think about what you want to show up in the reports. In this case it is a control element, so it won't be present in the user reports. There I would stick to InTransit={0|1}.
But if you prefer {Y|N}, that is fine. Just keep that consistent across the database (do not use {0|1} in one place and {Y|N} in another).
For values that do show up in reports, or columns such as EventType, I would use {InTransit|Location}.
In SQL, for implementation, if it is BOOLEAN, the domain (range-of-values) is already constrained. Nothing further is required.
If the column were other than BOOLEAN, you have two choices:
CHECK Constraint
CHECK ( InTransit IN ( 'Y', 'N' ) )
Reference or Lookup Table
Implement a table that contains only the valid domain. The requirement is a single column, the Code itself. And you can add a column for short Descriptor that shows up in reports. CHAR(12) works nicely for me.
ENUM
There is no ENUM in SQL. Some of the non-SQL databases have it. Basically it implements option [2] above, with a Lookup table, under the covers. It doesn't realise that the rows are unique, and so it Enumerates the rows, hence the name, but it adds a column for the number, which is of course an ID replete with AUTOINCREMENT, so MySQL falls into the category of Stupid Thing to Do as described in this answer (scroll down to the Lookup Table section).
So no, do not use ENUM unless you wish to be glued at the hip to a home-grown, stupid, non-SQL platform, and suffer a rewrite when the database is ported to a real SQL platform. The platform might be stupid, but that is not a good reason to go down the same path. Even if MySQL is all you have, use one of the two SQL facilities given above, do not use ENUM.
2) Why is 'ItemHistoryTransit' needed as 'Date' column
(DATETIME, not DATE, but I don't think that matters.)
[It] is there in ItemHistory?
The standard method of constraining (everything in the database is constrained) the nature of the Basetype::Subtype relationship is to implement the exact same PK of the Basetype in the Subtype. The Basetype PK is (ItemCode, DateTime).
[Why] will only Discriminator not work?
It is wrong, because it doesn't follow the standard requirement, and thus allows weird and wonderful values. I can't think of an instance where that could be justified, even if a replacement constraint was provided.
Second, there can well be more than two occurrences of ItemEvents that are InTransit per ItemCode, which that does not allow.
Third, it does not match the Basetype PK value.
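To make the Basetype::Subtype shape concrete, here is a rough sketch using Python's sqlite3 module. It is not the implementation from the linked documents: it swaps in a commonly used declarative technique in which the Discriminator is carried into each Subtype and pinned there with a CHECK, so a Subtype row can only attach to a Basetype row of the matching kind. The column names LocationCode and Carrier, and the helper `try_insert`, are hypothetical.

```python
import sqlite3

def make_item_history_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript("""
        CREATE TABLE ItemHistory (            -- Basetype
            ItemCode  TEXT    NOT NULL,
            DateTime  TEXT    NOT NULL,
            InTransit INTEGER NOT NULL CHECK (InTransit IN (0, 1)),
            PRIMARY KEY (ItemCode, DateTime),
            UNIQUE (ItemCode, DateTime, InTransit)
        );
        CREATE TABLE ItemHistoryLocation (    -- Subtype: item is at a Location
            ItemCode     TEXT    NOT NULL,
            DateTime     TEXT    NOT NULL,
            InTransit    INTEGER NOT NULL CHECK (InTransit = 0),
            LocationCode TEXT    NOT NULL,
            PRIMARY KEY (ItemCode, DateTime),
            FOREIGN KEY (ItemCode, DateTime, InTransit)
                REFERENCES ItemHistory (ItemCode, DateTime, InTransit)
        );
        CREATE TABLE ItemHistoryTransit (     -- Subtype: item is InTransit
            ItemCode  TEXT    NOT NULL,
            DateTime  TEXT    NOT NULL,
            InTransit INTEGER NOT NULL CHECK (InTransit = 1),
            Carrier   TEXT    NOT NULL,
            PRIMARY KEY (ItemCode, DateTime),
            FOREIGN KEY (ItemCode, DateTime, InTransit)
                REFERENCES ItemHistory (ItemCode, DateTime, InTransit)
        );
    """)
    return conn

def try_insert(conn, sql):
    # Returns True if the row was accepted, False if a constraint rejected it.
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False
```

Each Subtype carries the same (ItemCode, DateTime) PK as the Basetype, as described above. What this sketch does not enforce is that every Basetype row has its Subtype row; that part needs transactional code or a further constraint.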
Solution
Actually, a better name for the table would be ItemEvent. Labels are keys to understanding.
I have given the Predicates, please review carefully.
Data model updated.
Item Event Data Model
You could add a boolean field for in_transit to item_location_history so when it is getting moved from Location A to Location B, you set the LocationId to Location B (so you know where it is going) but then when it actually arrives you log another row with LocationId as LocationB but with in_transit as false. That way you know when it arrived also.
If you don't need to know where it is headed when it is "in transit" then you could just add "In Transit" as a location and keep your schema the same. In the past with an inventory application, I went as far as making each truck a location so that we knew what specific truck the item was in.
One of the techniques I've adopted over the years is to normalize transitional attributes (qty, status, location, etc.) from the entity table. If you also want to track the history, just version (versionize?) the subsequent status table.
create table ItemLocation(
ItemID int,
Effective date,
LocationID int,
Remarks varchar( 256 ),
constraint PK_ItemLocation primary key( ItemID, Effective ),
constraint FK_ItemLocation_Item foreign key( ItemID )
references Items( ID ),
constraint FK_ItemLocation_Location foreign key( LocationID )
references Locations( ID )
);
There are several good design options, I've shown the simplest, where "In transit" is implied. Consider the following data:
ItemID Effective LocationID Remarks
====== ========= ========== ===============================
1001 2015-04-01 15 In location 15
1001 2015-04-02 NULL In Transit [to location xx]
1001 2015-04-05 17 In location 17
Item 1001 appears in the database when it arrives at location 15, where it spends one whole day. The next day it is removed and shipped. Three days later it arrives at location 17 where it remains to this day.
Implied meanings are generally frowned upon and are indeed easy to overdo. If desired, you can add an actual status field to contain "In location" and "In Transit" values. You may well consider such a course if you think there could be other status values added later (QA Testing, Receiving, On Hold, etc.). But for just two possible values, In Location or In Transit, implied meaning works.
At any rate, you know the current whereabouts of any item by fetching the LocationID with the latest Effective date. You also have a history of where the item is at any date -- and both can be had with the same query.
declare AsOf date = sysdate;
select i.*, il.Effective, IfNull( l.LocationName, 'In Transit' ) as Location
from Items i
join ItemLocation il
on il.ItemID = i.ID
and il.Effective =(
select Max( Effective )
from ItemLocation
where ItemID = il.ItemID
and Effective <= AsOf )
left join Locations l
on l.ID = il.LocationID;
Set the AsOf value to "today" to get the most recent location or set it to any date to see the location as of that date. Since the current location will be far and away the most common query, define a view that generates just the current location and use that in the join.
join CurrentItemLocation cil
on cil.ItemID = i.ID
left join Locations l
on l.ID = cil.LocationID;
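The view itself is not spelled out above, so here is one way it might look, wrapped in Python's sqlite3 module so it can be run. The view body, the LocationName values, and the helper functions are assumptions for this sketch; the table layout and sample rows follow the example data. The view assumes no future-dated rows exist, since it simply takes the latest Effective date per item.

```python
import sqlite3

def make_item_location_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Items (ID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Locations (ID INTEGER PRIMARY KEY, LocationName TEXT);
        CREATE TABLE ItemLocation (
            ItemID     INTEGER NOT NULL REFERENCES Items (ID),
            Effective  TEXT    NOT NULL,   -- ISO date, sorts correctly as text
            LocationID INTEGER REFERENCES Locations (ID),
            Remarks    TEXT,
            PRIMARY KEY (ItemID, Effective)
        );
        -- Latest row per item; NULL LocationID means "In Transit".
        CREATE VIEW CurrentItemLocation AS
        SELECT il.ItemID, il.Effective, il.LocationID
        FROM ItemLocation il
        WHERE il.Effective = (SELECT MAX(Effective)
                              FROM ItemLocation
                              WHERE ItemID = il.ItemID);
        INSERT INTO Items VALUES (1001, 'Widget');
        INSERT INTO Locations VALUES (15, 'Location 15'), (17, 'Location 17');
        INSERT INTO ItemLocation VALUES
            (1001, '2015-04-01', 15,   'In location 15'),
            (1001, '2015-04-02', NULL, 'In Transit'),
            (1001, '2015-04-05', 17,   'In location 17');
    """)
    return conn

def location_as_of(conn, item_id, as_of):
    # Where was the item on the given ISO date? None if not yet recorded.
    row = conn.execute(
        "SELECT IFNULL(l.LocationName, 'In Transit')"
        " FROM ItemLocation il"
        " LEFT JOIN Locations l ON l.ID = il.LocationID"
        " WHERE il.ItemID = ?"
        "   AND il.Effective = (SELECT MAX(Effective) FROM ItemLocation"
        "                       WHERE ItemID = il.ItemID AND Effective <= ?)",
        (item_id, as_of),
    ).fetchone()
    return row[0] if row else None

def current_location(conn, item_id):
    row = conn.execute(
        "SELECT IFNULL(l.LocationName, 'In Transit')"
        " FROM CurrentItemLocation cil"
        " LEFT JOIN Locations l ON l.ID = cil.LocationID"
        " WHERE cil.ItemID = ?",
        (item_id,),
    ).fetchone()
    return row[0] if row else None
```

Listing everything in transit on a given date is then the same as-of query filtered to rows whose LocationID is NULL.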
I have a question about how to change a database model:
For now we have predefined table Categories
and let's say tables Places and People which can be assigned to categories so it looks like this:
People <=> PeopleCategories <=> Categories <=> PlaceCategories <=> Places
(People can have many categories, categories can have many people, places can have many categories, categories can have many places)
But now there is a new requirement:
On person profile show all corresponding places based on categories (so far no problem) and add a tick box modeling some attribute (for example show on front-end as favorite place). The same from the other side on Place profile mark people assigned to at least one same category with a tick box.
I wonder whether there is some nice way to model this - the only thing which came to my mind is to add a new PeoplePlaces table but then I have to manually control whether people or places did not change their categories and they are still assigned and so on - There will be quite a problem with consistency of data which I will have to manage on application layer.
The second thing I could probably do is to delete categories totally and make it only on PeoplePlaces level but I will lose some simplicity for user: there are like 10 predefined categories which user can select so the linking between People and Places is quite automatic on front-end and only admin should see which places are assigned to which people and manage that tick box I was talking about
What would you suggest for this architecture? Thanks in advance! (It is a MySQL db if it is important for some kind of solution but this is more a general architecture thing)
If I understood your question correctly, you need to ensure that a person can only favor a place that is connected to the same category as the person herself?
If so, take a look at the following model:
We don't link the "endpoints" directly, and instead "link the links". This allows us to migrate PERSON_CATEGORY.CATEGORY_ID and PLACE_CATEGORY.CATEGORY_ID into the FAVORED_PLACE table, and "merge" them there, producing a single FAVORED_PLACE.CATEGORY_ID field (note FK1,FK2 in the diagram above).
As a consequence, if a person is connected to a place, that must be done through a common category.
Furthermore, since CATEGORY_ID is outside FAVORED_PLACE's PK, a particular combination of person and place can be used only once, even if they match through multiple categories. Effectively, you pick one common category as "special". If a place (or person) is removed from the special category, you'll need to pick another common category to serve as special. If there are no common categories left, the corresponding row in FAVORED_PLACE will not be allowed to exist anymore.
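Since the diagram is not reproduced here, the following is a sketch of how the "link the links" model might be declared, using Python's sqlite3 module. The column names PERSON_ID and PLACE_ID, the choice of (PERSON_ID, PLACE_ID) as the FAVORED_PLACE key, and the helper `allowed` are assumptions; the single shared CATEGORY_ID taking part in both foreign keys is the idea described above.

```python
import sqlite3

def make_favored_place_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript("""
        CREATE TABLE PERSON_CATEGORY (
            PERSON_ID   INTEGER NOT NULL,
            CATEGORY_ID INTEGER NOT NULL,
            PRIMARY KEY (PERSON_ID, CATEGORY_ID)
        );
        CREATE TABLE PLACE_CATEGORY (
            PLACE_ID    INTEGER NOT NULL,
            CATEGORY_ID INTEGER NOT NULL,
            PRIMARY KEY (PLACE_ID, CATEGORY_ID)
        );
        -- One CATEGORY_ID column takes part in both FKs, so the person and
        -- the place must share that category.
        CREATE TABLE FAVORED_PLACE (
            PERSON_ID   INTEGER NOT NULL,
            PLACE_ID    INTEGER NOT NULL,
            CATEGORY_ID INTEGER NOT NULL,
            PRIMARY KEY (PERSON_ID, PLACE_ID),
            FOREIGN KEY (PERSON_ID, CATEGORY_ID)
                REFERENCES PERSON_CATEGORY (PERSON_ID, CATEGORY_ID),
            FOREIGN KEY (PLACE_ID, CATEGORY_ID)
                REFERENCES PLACE_CATEGORY (PLACE_ID, CATEGORY_ID)
        );
        INSERT INTO PERSON_CATEGORY VALUES (1, 10);
        INSERT INTO PLACE_CATEGORY  VALUES (5, 10), (6, 20);
    """)
    return conn

def allowed(conn, sql, params=()):
    # True if the statement succeeds, False if a constraint rejects it.
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this declaration the database itself refuses a favorite without a common category, and refuses to drop a person's category while a favorite still depends on it, so the consistency checks do not have to live in the application layer.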
I don't think deleting Categories is a good idea.
What you are doing is introducing a new entity - PersonsFavouritePlaces - which relates People and Place directly rather than via a Category. It is sensible that a PersonsFavouritePlace be limited to a Person and a Place linked by Category, so it should probably reference PeopleCategories and PlaceCategories rather than the People and Category tables.
The table would look like:
create table PeopleFavouritePlace
(
ID int not null, -- Primary key
PeopleCategoriesId int not null, -- FK to PK of PeopleCategories
PlaceCategoriesId int not null -- FK to PK of PlaceCategories
)
MySQL supports cascading deletes on InnoDB tables (ON DELETE CASCADE), so the two FKs should have that turned on; then, when someone deselects a category (deleting the PeopleCategories row), any favourite place linked through that category gets deleted too.
However, if a person links to a place via multiple categories then it gets complicated....
Okay, so I'm working on a project with hierarchical data that I'm using for a book-writing app like so:
top-level parent (Act) - contains Act name, position (first Act), description, and text (intro text)
mid-level parent / child (Chapter) - contains Chapter name, position (first Chapter), description, and text (intro text)
bottom-level (Section) - contains Section name, position (first Section), description, and text (actual content text)
Now, I want to have a variable number of levels (for example include sub-sections that will have the full text), but I'm not entirely sure how to create a table like this efficiently. My initial thought was to have them connected by parentId with top-level having a null for parentId.
For example, if I want to call up a top-level parent in the first position, it's not a big deal. Right now, I can search for nulled parent fields and for a position of 1.
To call up a "chapter" (mid-level), I do the same thing but get the id of the result and use it as a parentid.
The problem is that for a section, I'd have to have several sub-queries to get to the final result. If I want to have a variable number of levels, I'd require numerous sub-queries, which will be a performance hit.
I saw a similar question but did not really understand the answer or how I could use it.
I already considered some kind of taxonomy table or carrying over all parent ids into the children (ie. a section will have both chapter and act ids listed under parentid) but I'm not set on it. Feedbooks.com uses a similar hierarchy when submitting a book but they don't store data in a database, they just take an input and convert it to an output (a pdf, epub or whatever).
Oh and, I plan on building this in MySQL.
Ideas?
EDIT
An easier way of imagining this scenario is with a family. Let's say you have 3 grandfathers (Bill, Bernard, Boe), who have 5 children between them (John, Joey, Josh, Jeremy, Jackson), who, in turn, each have 2 children of their own (examples: Donald, Duey, Donnie). And let's say the database does not store THAT relationship but rather the relationship of what child is staying where and we don't care about the biology here.
So let's say Donnie started out living with John who lives with Bill. Donnie is John's first child and John is Bill's second child. How do you query Bill's first grandchild of the second son?
Let's say they move around and now Donald is staying with John instead. Donald is John's second child (the first child position was filled with someone else) and John is still Bill's second child. How do you query Bill's second grandchild of the second son?
What if John moves with all his children to Boe's house. Boe is the second grandfather. How do you query this information now? How would you store this type of information?
What if you throw great-grandchildren into the mix now?
This is a good example to use closure tables, as on the answer you are referencing. You may want to take a look here and here.
Quoting from the first link:
The Closure Table is a design for representing trees in a relational
database by storing all the paths between tree nodes.
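A minimal closure-table sketch, using Python's sqlite3 module, might look like this. The table names `node` and `node_path`, the `depth` column, and the helper functions are assumptions for illustration; the linked articles may lay things out differently. Every node gets a path row to itself (depth 0) plus one row for each of its ancestors.

```python
import sqlite3

def make_tree_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE node_path (
            ancestor   INTEGER NOT NULL REFERENCES node (id),
            descendant INTEGER NOT NULL REFERENCES node (id),
            depth      INTEGER NOT NULL,   -- 0 = the node itself
            PRIMARY KEY (ancestor, descendant)
        );
    """)
    return conn

def add_node(conn, name, parent_id=None):
    node_id = conn.execute(
        "INSERT INTO node (name) VALUES (?)", (name,)
    ).lastrowid
    # Self-reference row.
    conn.execute("INSERT INTO node_path VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # Copy every path that ends at the parent, extended by one step.
        conn.execute(
            "INSERT INTO node_path (ancestor, descendant, depth)"
            " SELECT ancestor, ?, depth + 1 FROM node_path"
            " WHERE descendant = ?",
            (node_id, parent_id),
        )
    return node_id

def descendants(conn, node_id):
    # Everything below the node, at any depth, in a single query.
    rows = conn.execute(
        "SELECT n.name FROM node_path p JOIN node n ON n.id = p.descendant"
        " WHERE p.ancestor = ? AND p.depth > 0 ORDER BY p.depth, n.id",
        (node_id,),
    )
    return [name for (name,) in rows]

def ancestors(conn, node_id):
    # Everything above the node, nearest parent first.
    rows = conn.execute(
        "SELECT n.name FROM node_path p JOIN node n ON n.id = p.ancestor"
        " WHERE p.descendant = ? AND p.depth > 0 ORDER BY p.depth",
        (node_id,),
    )
    return [name for (name,) in rows]
```

Adding a level (a sub-section under a section, say) needs no schema change, and no query grows with the number of levels. A position column on `node` would cover the "first chapter of the first act" style of lookup.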
I am working on a reviews website. Basically you can choose a location and business type and optionally filter your search results by various business attributes. There are five tables at play here:
Businesses
ID
Name
LocationID
Locations
LocationID
LocationName
State
Attributes
AttributeID
AttributeName
AttributeValues
AttributeValueID
ParentAttributeID
AttributeValue
BusinessAttributes
ID
AttributeID
AttributeValueID
So what I need is to work out the query to use (joins?) to get a business in a particular location based on attribute values.
For example, I want to find a barber in Santa Monica with these attributes:
Price: Cheap
Open Weekends: Yes
Cuts Womens Hair: Yes
These attributes are stored in the Attributes and AttributeValues tables and are linked to the business in the BusinessAttributes table.
So let's say I have these details from the search form:
LocationID=5&Price=Cheap&Open_Weekends=Yes&Customs_Womens_Hair=Yes
I need to build the query to return the businesses that match this location and attributes.
Thank you in advance for your help and I think StackOverflow is awesome.
Thinking about your data needs, you may be a perfect candidate for a schema-free document oriented database. On a recent episode of .Net Rocks (link to show), Michael Dirolf talked about his project MongoDB.
From what I understand, you could take each Business entity and store it in the database with all its associated attributes (LocationID, Price, Open_Weekends, Customs_Womens_Hair, Etc.). Each entity stored in the store can have different combinations of attributes because there is no schema. This natively accomplishes what you are trying to do with an Attribute and Attribute_Value table.
To search the database, just ask it for all entities that have the particular set of keys and values you need. No complex joins and no loss of performance. What you are doing is exactly what schema-free, document based databases are designed for.
Michael Dirolf: Yes, I think that a lot of the people who are switching are people who have sort of got themselves into corners where they are using relational database the way that we use MongoDB.
Richard Campbell: Right.
Michael Dirolf: So having columns that, a column key and a separate column value and inserting stuff that way so that they get done in schema and all sorts of crazy stuff like that…
Richard Campbell: Yeah, now in reflection I suddenly realized I just describe your perfect customer, a guy who has taken, you know, abusing SQL Server as they say. We’re going down this funny path and you just shouldn’t be here in the first place.
If you keep going down the path of building a relational attribute/value store, your performance will suffer with the combinatorial explosion that results.
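For comparison, if the relational attribute/value tables from the question are kept, the filter can be written as one grouped query rather than one join per attribute. The sketch below uses Python's sqlite3 module. It assumes BusinessAttributes has a BusinessID column pointing at Businesses.ID (the question lists only ID, AttributeID and AttributeValueID, so this is a guess), and the sample rows and the helper `find_businesses` are made up for the example.

```python
import sqlite3

def make_reviews_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Businesses (ID INTEGER PRIMARY KEY, Name TEXT,
                                 LocationID INTEGER);
        CREATE TABLE Attributes (AttributeID INTEGER PRIMARY KEY,
                                 AttributeName TEXT);
        CREATE TABLE AttributeValues (AttributeValueID INTEGER PRIMARY KEY,
                                      ParentAttributeID INTEGER,
                                      AttributeValue TEXT);
        -- BusinessID is an assumed column, see the note above.
        CREATE TABLE BusinessAttributes (BusinessID INTEGER,
                                         AttributeID INTEGER,
                                         AttributeValueID INTEGER);
        INSERT INTO Businesses VALUES (1, 'Cheap Cuts', 5),
                                      (2, 'Fancy Cuts', 5),
                                      (3, 'Cheap Cuts East', 6);
        INSERT INTO Attributes VALUES (1, 'Price'), (2, 'Open Weekends');
        INSERT INTO AttributeValues VALUES (1, 1, 'Cheap'), (2, 1, 'Expensive'),
                                           (3, 2, 'Yes'), (4, 2, 'No');
        INSERT INTO BusinessAttributes VALUES (1, 1, 1), (1, 2, 3),
                                              (2, 1, 2), (2, 2, 3),
                                              (3, 1, 1), (3, 2, 3);
    """)
    return conn

def find_businesses(conn, location_id, wanted):
    # wanted maps attribute name -> required value; must not be empty.
    pairs = list(wanted.items())
    match = " OR ".join(
        "(a.AttributeName = ? AND v.AttributeValue = ?)" for _ in pairs
    )
    sql = (
        "SELECT b.Name FROM Businesses b"
        " JOIN BusinessAttributes ba ON ba.BusinessID = b.ID"
        " JOIN Attributes a ON a.AttributeID = ba.AttributeID"
        " JOIN AttributeValues v ON v.AttributeValueID = ba.AttributeValueID"
        f" WHERE b.LocationID = ? AND ({match})"
        " GROUP BY b.ID, b.Name"
        # Keep only businesses that matched every requested attribute.
        " HAVING COUNT(DISTINCT a.AttributeID) = ?"
        " ORDER BY b.Name"
    )
    params = [location_id] + [x for pair in pairs for x in pair] + [len(pairs)]
    return [name for (name,) in conn.execute(sql, params)]
```

Whether this performs acceptably depends on data volume and indexing, which is the concern raised above; the sketch only shows that the lookup itself is expressible.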