Database Architecture Many-to-Many-to-Many - mysql

I have an issue with how to change our database model:
For now we have a predefined table, Categories,
and, let's say, tables Places and People, which can be assigned to categories, so it looks like this:
People <=> PeopleCategories <=> Categories <=> PlaceCategories <=> Places
(People can have many categories, categories can have many people, places can have many categories, categories can have many places)
But now there is a new requirement:
On a person's profile, show all corresponding places based on categories (so far no problem) and add a tick box modeling some attribute (for example, to show it on the front end as a favorite place). The same from the other side: on a Place profile, mark the people assigned to at least one shared category with a tick box.
I wonder whether there is some nice way to model this. The only thing that came to my mind is to add a new PeoplePlaces table, but then I have to check manually whether people or places have changed their categories and are still linked, and so on. Keeping that data consistent would be quite a problem, and I would have to manage it in the application layer.
The second option is probably to delete categories entirely and handle everything at the PeoplePlaces level, but I would lose some simplicity for users: there are about 10 predefined categories that a user can select, so the linking between People and Places is quite automatic on the front end, and only an admin should see which places are assigned to which people and manage the tick box I mentioned.
What would you suggest for this architecture? Thanks in advance! (It is a MySQL db if it is important for some kind of solution but this is more a general architecture thing)

If I understood your question correctly, you need to ensure that a person can only favor a place that is connected to the same category as the person herself?
If so, take a look at the following model:
We don't link the "endpoints" directly, and instead "link the links". This allows us to migrate PERSON_CATEGORY.CATEGORY_ID and PLACE_CATEGORY.CATEGORY_ID into the FAVORED_PLACE table, and "merge" them there, producing a single FAVORED_PLACE.CATEGORY_ID field (note FK1, FK2 in the diagram above).
As a consequence, if a person is connected to a place, that must be done through a common category.
Furthermore, since CATEGORY_ID is outside FAVORED_PLACE's PK, a particular combination of person and place can be used only once, even if they match through multiple categories. Effectively, you pick one common category as "special". If a place (or person) is removed from the special category, you'll need to pick another common category to serve as special. If there are no common categories left, the corresponding row in FAVORED_PLACE will not be allowed to exist anymore.
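Here is a minimal sketch of this model. It uses Python's built-in sqlite3 as a stand-in for MySQL, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. Table and column names follow the answer; the sample ids are made up. The composite FK1/FK2 let the database itself reject a favorite that does not go through a shared category.

```python
import sqlite3

def build_schema():
    """Create PERSON/PLACE/CATEGORY plus FAVORED_PLACE, which 'links the links'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE PERSON   (PERSON_ID   INTEGER PRIMARY KEY);
        CREATE TABLE PLACE    (PLACE_ID    INTEGER PRIMARY KEY);
        CREATE TABLE CATEGORY (CATEGORY_ID INTEGER PRIMARY KEY);
        CREATE TABLE PERSON_CATEGORY (
            PERSON_ID   INTEGER NOT NULL REFERENCES PERSON,
            CATEGORY_ID INTEGER NOT NULL REFERENCES CATEGORY,
            PRIMARY KEY (PERSON_ID, CATEGORY_ID));
        CREATE TABLE PLACE_CATEGORY (
            PLACE_ID    INTEGER NOT NULL REFERENCES PLACE,
            CATEGORY_ID INTEGER NOT NULL REFERENCES CATEGORY,
            PRIMARY KEY (PLACE_ID, CATEGORY_ID));
        -- CATEGORY_ID is not part of the PK, so each (person, place) pair appears once.
        CREATE TABLE FAVORED_PLACE (
            PERSON_ID   INTEGER NOT NULL,
            PLACE_ID    INTEGER NOT NULL,
            CATEGORY_ID INTEGER NOT NULL,  -- the single "merged" category column
            PRIMARY KEY (PERSON_ID, PLACE_ID),
            FOREIGN KEY (PERSON_ID, CATEGORY_ID)  -- FK1
                REFERENCES PERSON_CATEGORY (PERSON_ID, CATEGORY_ID),
            FOREIGN KEY (PLACE_ID, CATEGORY_ID)   -- FK2
                REFERENCES PLACE_CATEGORY (PLACE_ID, CATEGORY_ID));
    """)
    return conn

def try_exec(conn, sql, params):
    """Run one statement; return False instead of raising if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_schema()
conn.executescript("""
    INSERT INTO PERSON VALUES (1);
    INSERT INTO PLACE VALUES (100);
    INSERT INTO CATEGORY VALUES (10), (20), (30);
    INSERT INTO PERSON_CATEGORY VALUES (1, 10), (1, 20);    -- person 1: categories 10, 20
    INSERT INTO PLACE_CATEGORY VALUES (100, 20), (100, 30); -- place 100: categories 20, 30
""")
fav = "INSERT INTO FAVORED_PLACE VALUES (?, ?, ?)"
ok_uncommon = try_exec(conn, fav, (1, 100, 10))   # place 100 is not in 10: FK2 rejects it
ok_common = try_exec(conn, fav, (1, 100, 20))     # 20 is shared: accepted
ok_duplicate = try_exec(conn, fav, (1, 100, 20))  # same (person, place) again: PK rejects it
# Removing the "special" category from the person is blocked while the favorite exists.
ok_unlink = try_exec(conn,
    "DELETE FROM PERSON_CATEGORY WHERE PERSON_ID = ? AND CATEGORY_ID = ?", (1, 20))
```

The blocked delete reflects the last point above: you must first re-point the favorite to another common category, or remove it.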

I don't think deleting Categories is a good idea.
What you are doing is introducing a new entity, PersonsFavouritePlaces, which relates People and Place directly rather than via a Category. It is sensible that a PersonsFavouritePlace be limited to a Person and a Place linked by Category, so it should probably reference PeopleCategories and PlaceCategories rather than the People and Places tables.
The table would look like:
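create table PeopleFavouritePlace
(
ID int not null primary key,
PeopleCategoriesId int not null, -- FK to PK of PeopleCategories
PlaceCategoriesId int not null, -- FK to PK of PlaceCategories
-- assumes both link tables have a surrogate primary key named ID
foreign key (PeopleCategoriesId) references PeopleCategories (ID) on delete cascade,
foreign key (PlaceCategoriesId) references PlaceCategories (ID) on delete cascade
);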
MySQL supports cascading deletes (with the InnoDB storage engine), so the two FKs should have ON DELETE CASCADE turned on. Then, when someone deselects a category (deleting the PeopleCategories row), any favourite place linked through it is deleted too.
However, if a person links to a place via multiple categories then it gets complicated....
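Here is a small sketch of the cascading delete, again with SQLite standing in for MySQL. It assumes, hypothetically, that PeopleCategories and PlaceCategories have surrogate ID keys and columns named PeopleId, PlaceId and CategoryId. This design alone does not check that both referenced rows share the same category, nor does it stop the same person/place pair from being stored once per shared category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE in SQLite
conn.executescript("""
    -- Link tables with hypothetical surrogate ID keys.
    CREATE TABLE PeopleCategories (
        ID INTEGER PRIMARY KEY, PeopleId INTEGER NOT NULL, CategoryId INTEGER NOT NULL,
        UNIQUE (PeopleId, CategoryId));
    CREATE TABLE PlaceCategories (
        ID INTEGER PRIMARY KEY, PlaceId INTEGER NOT NULL, CategoryId INTEGER NOT NULL,
        UNIQUE (PlaceId, CategoryId));
    CREATE TABLE PeopleFavouritePlace (
        ID INTEGER PRIMARY KEY,
        PeopleCategoriesId INTEGER NOT NULL
            REFERENCES PeopleCategories (ID) ON DELETE CASCADE,
        PlaceCategoriesId INTEGER NOT NULL
            REFERENCES PlaceCategories (ID) ON DELETE CASCADE);

    INSERT INTO PeopleCategories VALUES (1, 1, 10);   -- person 1 is in category 10
    INSERT INTO PlaceCategories  VALUES (1, 100, 10); -- place 100 is in category 10
    INSERT INTO PeopleFavouritePlace (PeopleCategoriesId, PlaceCategoriesId) VALUES (1, 1);
""")

def favourite_count():
    return conn.execute("SELECT COUNT(*) FROM PeopleFavouritePlace").fetchone()[0]

before = favourite_count()
# Person 1 deselects category 10: the cascade removes the dependent favourite row.
conn.execute("DELETE FROM PeopleCategories WHERE PeopleId = 1 AND CategoryId = 10")
after = favourite_count()
```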


Classpass.com like database design

I am trying to get my head around creating a Classpass-like database design. I'm new to database design, and there are a few things I'm not sure how to implement.
You can check the classpass example:
https://classpass.com/classes
https://classpass.com/studios
EDIT 1: So here is the idea: each city has multiple neighbourhoods, each with multiple studios/venues.
After reading spencer7593's comment, here is what I came up with, and the things that are still not quite clear:
So what I am not quite sure about is:
I am not sure how to store the venue/studio address and geolocation. Is it better to have a Region table, defined as id | name | parent_id, that stores the cities and neighborhoods recursively? Or should I add foreign key constraints to city and neighborhoods? Should I store the lat/lon in the venue table, in the address table, or even in a separate locations table? I would like to be able to perform searches like:
show me venues in that neighborhood or city
show me venues which are in radius XX from position
Each class should have a schedule, and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
@Dimitar,
Even though @rhavendc is correct that this question belongs on Database Administrators, I will answer your questions in order, to the best of my knowledge.
I am not sure how to store the venue/studio address and geolocation. [...]
You can easily find geolocations by searching on the web; take MyGeoPosition, for example.
I would like to be able to perform searches like
show me venues in that neighborhood or city.
You can do this easily. There are a few ways to do it, and each way will require a bit of tweaking of your ERD design. With the example I attached below, you can run a query to list all the venues with the address_id followed by the city id. The yellow entities are the ones I added to ensure integrity.
For example:
-- venue.name is using the "[table].[field]" format to help
-- the engine recognize where the field is coming from.
-- This is useful if you are pulling the fields of the
-- same name from different tables.
select venue.name, city.name
from venue join
address using (address_id) join
city using (city_id);
NOTE: You don't have to include city.name. I just threw it in there so you can try it out and see all the venues matching each city.
If you would like to do it by neighborhood, you would have to tweak the ERD I gave you by adding a neighbor_id column to the ADDRESS table (I have attached the example below). The NEIGHBORHOOD table's key must also be named neighbor_id for the USING join to work. From there, you can run a query like this:
Using this ERD:
-- Remember the format from the previously mentioned code.
select venue.name, neighborhood.name
from venue join
address using (address_id) join
neighborhood using (neighbor_id);
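Both lookups can be tried end to end in a runnable sketch. SQLite stands in for MySQL, the ADDRESS table carries both city_id and neighbor_id as in the tweaked ERD, and the city, neighborhood and venue names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city         (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE neighborhood (neighbor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- ADDRESS carries both keys, as in the tweaked ERD.
    CREATE TABLE address (
        address_id  INTEGER PRIMARY KEY,
        city_id     INTEGER NOT NULL REFERENCES city,
        neighbor_id INTEGER REFERENCES neighborhood);
    CREATE TABLE venue (
        venue_id   INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address_id INTEGER NOT NULL REFERENCES address);

    INSERT INTO city VALUES (1, 'Berlin'), (2, 'Munich');
    INSERT INTO neighborhood VALUES (1, 'Mitte'), (2, 'Kreuzberg'), (3, 'Schwabing');
    INSERT INTO address VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3);
    INSERT INTO venue VALUES (1, 'Spin Studio', 1), (2, 'Box Club', 2), (3, 'Yoga Loft', 3);
""")

def venues_in_city(conn, city_name):
    """Venue names whose address lies in the given city."""
    rows = conn.execute("""
        SELECT venue.name
        FROM venue JOIN address USING (address_id) JOIN city USING (city_id)
        WHERE city.name = ?
        ORDER BY venue.name""", (city_name,))
    return [name for (name,) in rows]

def venues_in_neighborhood(conn, neighborhood_name):
    """Venue names whose address lies in the given neighborhood."""
    rows = conn.execute("""
        SELECT venue.name
        FROM venue JOIN address USING (address_id) JOIN neighborhood USING (neighbor_id)
        WHERE neighborhood.name = ?
        ORDER BY venue.name""", (neighborhood_name,))
    return [name for (name,) in rows]
```

Passing the name as a parameter (?) instead of pasting it into the SQL string also protects against SQL injection.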
show me venues which are in radius XX from position
You can calculate the distance in miles, kilometers, etc. from latitude and longitude using the haversine formula.
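Here is a sketch of the haversine formula in Python, assuming a spherical Earth with a mean radius of 6371 km. The venue names and coordinates are made up. With many venues you would usually narrow the candidates in SQL first, for example with a latitude/longitude bounding box, and only then apply the exact formula. Recent MySQL versions also provide ST_Distance_Sphere() for this.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the formula assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def venues_within(venues, lat, lon, radius_km):
    """Names of venues whose stored (lat, lon) lies within radius_km of (lat, lon)."""
    return [name for name, vlat, vlon in venues
            if haversine_km(lat, lon, vlat, vlon) <= radius_km]

# Hypothetical venues: (name, latitude, longitude)
venues = [("Spin Studio", 52.5200, 13.4050),
          ("Box Club", 52.4990, 13.4030),
          ("Yoga Loft", 48.1351, 11.5820)]
nearby = venues_within(venues, 52.5200, 13.4050, 5.0)  # within 5 km of central Berlin
```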
Each class should have a schedule and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
This can be derived from either of the ERDs I attached here. In the CLASS table, I added a field called parent_class_id, which holds a class_id from the same table. This is a self-referencing (recursive) relationship, and I know it can be a bit of a headache to understand. It lets a class that has a parent class represent the same class offered at a different time.
You can get this result by doing so:
-- Remember the format from the previously mentioned code.
select class1.name, class1.class_id, class2.class_id
from class as class1,
class as class2
where class1.parent_class_id = class2.class_id;
or even show me venues offering spinning classes
This may be a tricky one... If you are wondering which venues are offering spinning classes, where spinning is either part of or the name of the class, not a category, it's simple.
Try this...
-- Remember the format from the previously mentioned code.
select distinct venue_id
from venue join
class using (venue_id)
where class_name = 'spinning';
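NOTE: Depending on the database and the column's collation, comparisons with string literals may be case-sensitive (MySQL's default collations are case-insensitive, but many other engines compare case-sensitively). To be safe, you could use where UPPER(class_name) = 'SPINNING'.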
If the class name may include words other than "spinning" in its name, use this instead: where UPPER(class_name) like '%SPINNING%'.
If you are wondering which classes are spinning classes where spinning is a category, that's where the tricky bit comes in. One way to do it is with a subquery.
Try this:
-- Remember the format from the previously mentioned code.
select class_id
from class join
class_category using (class_id)
where cat_id = (select cat_id
from category
where name = 'spinning');
Again, depending on the collation, the comparison may be case-sensitive, so make sure the literal matches the case stored in the table (or compare with UPPER()).
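Here is a sketch of both lookups against a tiny made-up dataset, with SQLite standing in for MySQL. The category lookup uses a join instead of the subquery; either works. IN also covers the "Spinning, Boxing" case from the question. SQLite concatenates with ||, whereas MySQL would normally use CONCAT('%', ?, '%').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venue    (venue_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE class    (class_id INTEGER PRIMARY KEY,
                           venue_id INTEGER NOT NULL REFERENCES venue,
                           class_name TEXT NOT NULL);
    CREATE TABLE category (cat_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE class_category (class_id INTEGER REFERENCES class,
                                 cat_id   INTEGER REFERENCES category,
                                 PRIMARY KEY (class_id, cat_id));

    INSERT INTO venue VALUES (1, 'Spin Studio'), (2, 'Box Club');
    INSERT INTO class VALUES (1, 1, 'Morning Spinning'), (2, 1, 'Spinning Pro'),
                             (3, 2, 'Boxing Basics');
    INSERT INTO category VALUES (1, 'Spinning'), (2, 'Boxing');
    INSERT INTO class_category VALUES (1, 1), (2, 1), (3, 2);
""")

def venues_with_class_named(conn, word):
    """Venues with at least one class whose name contains `word` (case-insensitive)."""
    rows = conn.execute("""
        SELECT DISTINCT venue_id
        FROM venue JOIN class USING (venue_id)
        WHERE UPPER(class_name) LIKE '%' || UPPER(?) || '%'
        ORDER BY venue_id""", (word,))
    return [venue_id for (venue_id,) in rows]

def classes_in_categories(conn, names):
    """Classes tagged with any of the given category names (case-insensitive)."""
    placeholders = ", ".join("?" for _ in names)  # one ? per name, values bound safely
    rows = conn.execute(f"""
        SELECT DISTINCT class_id
        FROM class_category JOIN category USING (cat_id)
        WHERE UPPER(category.name) IN ({placeholders})
        ORDER BY class_id""", [n.upper() for n in names])
    return [class_id for (class_id,) in rows]
```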
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
Yes and no. You could, but if you can understand recursion in database systems, you don't have to.
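If you do go the extra-table route the question asks about, one row per class per weekday is a common way to express "Mo, We, Fr from 9 AM till 10 AM". This is a sketch under that assumption. The class_schedule table, its columns and the day numbering (1 = Monday) are hypothetical, and SQLite again stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venue (venue_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE class (class_id INTEGER PRIMARY KEY,
                        venue_id INTEGER NOT NULL REFERENCES venue,
                        class_name TEXT NOT NULL);
    -- One row per weekly occurrence: day_of_week 1 = Monday ... 7 = Sunday.
    -- Times are 'HH:MM' text (SQLite has no TIME type); MySQL could use TIME columns.
    CREATE TABLE class_schedule (
        class_id    INTEGER NOT NULL REFERENCES class,
        day_of_week INTEGER NOT NULL CHECK (day_of_week BETWEEN 1 AND 7),
        start_time  TEXT NOT NULL,
        end_time    TEXT NOT NULL,
        CHECK (start_time < end_time),
        PRIMARY KEY (class_id, day_of_week, start_time));

    INSERT INTO venue VALUES (1, 'Spin Studio'), (2, 'Box Club');
    INSERT INTO class VALUES (1, 1, 'Spinning'), (2, 2, 'Boxing');
    -- "Spinning class, Mo, We, Fr from 9 AM till 10 AM"
    INSERT INTO class_schedule VALUES (1, 1, '09:00', '10:00'),
                                      (1, 3, '09:00', '10:00'),
                                      (1, 5, '09:00', '10:00'),
                                      (2, 1, '18:00', '19:00');
""")

def venues_with_class_on(conn, class_name, day_of_week):
    """Names of venues offering `class_name` on the given weekday (1 = Monday)."""
    rows = conn.execute("""
        SELECT DISTINCT venue.name
        FROM venue
        JOIN class USING (venue_id)
        JOIN class_schedule USING (class_id)
        WHERE class.class_name = ? AND class_schedule.day_of_week = ?
        ORDER BY venue.name""", (class_name, day_of_week))
    return [name for (name,) in rows]
```

Storing each occurrence as its own row keeps "which venues have spinning on Monday" a plain join, rather than parsing a day list stored in one column.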
Hope this helps. :)
Entity Relationship Modeling.
An entity is a person, place, thing, concept or event that can be uniquely identified, is important to the business, and we can store information about.
Based on information in the question, some candidates to consider as entities might be:
studio
class
rating
neighborhood
city
For each entity, what uniquely identifies it? Figure out the candidate keys.
And figure out the relationships between the entities, and the cardinalities. (What is related to what, and how many, required or optional?)
Is a studio related to a class?
Can a studio have more than one class?
Can a studio have zero classes?
Can a class be related to more than one studio?
Is a neighborhood related to zero, one or more city?
Can a studio be related to more than one neighborhood?
Once you've got the entities and relationships, getting the attributes assigned to each entity is pretty straightforward. Just make sure every attribute is dependent on the key, the whole key, and nothing but the key.
FIRST
Your question is not well suited to Stack Overflow; I think it would be better posted on Database Administrators.
SECOND
Here is some reading, just to give you a good start on building your database:
Data Modeling (It's kinda broad but it's for the better)
Logical Data Model (Short but comprehensive one)
THIRD
Basically, when designing your database, you should first know all the data your system will need and group it into entities (where it makes sense) so that each one stays small. Then normalize it to reduce data redundancy.
EXAMPLE
Let's assume that the venue table would be your main table, the center of all the transactions in your system. venue may then have subdata, for example a branch table that holds different branch locations, and branch may have subdata too, for example schedule, teacher and/or class, which may also be related to each other (subdata getting data from other subdata), and so on with dependent tables.
You can also create independent tables that still have connections to others. For example, the neighborhood table may contain the neighborhood location and the main venue location (so it should hold the id of the selected venue from the venue table), and so on with related and independent tables.
NOTE
Just remember the "one-to-one, one-to-many" relationships. If a piece of data will hold many kinds of subdata, split them into different tables. If it will hold only one kind of subdata, then put it all in one table.

Access query is duplicating unique records / Linked table issues

I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities, which the organisation wants to track, hence all registered clients go into one list and individual tables spin off that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by 'Name')
Social Work Register (list of social work clients with full details of each case)
Social Work Follow-up Table (used when staff call social work clients to see how their issue is progressing; the register has too many columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of which workshops they attended and why they were absent if they missed a session)
Individual workshop tables x14 (each workshop includes a test and these tables record the clients' answers and their score for each individual test; there will be more than 20 of these when the database is finished; all joined to the 'Participants Register' by 'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client Register' to the 'AllList' for data entry so that when a new client is registered it creates a new record in both tables, with the records matched together)
Participant Query (not yet attempted; as above, intended to link the 'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
people change name, for instance in many cultures, women change their family name after they get married.
There could also have been a typo when entering the name, and it can be hard to correct it if that data is used as a Foreign Key in many different tables.
as your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to make changes to that name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not meant to be edited, changed or even displayed to the user. Its sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that holds that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't well thought out; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In the ClientCase table, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.ID, Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClientCase.ClientID
GROUP BY Client.ID, Client.Name
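Here is a runnable sketch of the Client/ClientCase design, using SQLite as a stand-in for Access (its INTEGER PRIMARY KEY plays the role of an AutoNumber here). The name 'Sophoan' comes from the question; the rest of the data is made up. Grouping by Client.ID, not just Client.Name, keeps the two 'Sophoan' clients apart, and the LEFT JOIN keeps a client with no cases in the result with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- INTEGER PRIMARY KEY plays the role of Access's AutoNumber surrogate key.
    CREATE TABLE Client (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE ClientCase (
        ID       INTEGER PRIMARY KEY,
        ClientID INTEGER NOT NULL REFERENCES Client (ID));  -- foreign key to Client

    -- Two different clients share the name 'Sophoan'; their IDs keep them apart.
    INSERT INTO Client (Name) VALUES ('Sophoan'), ('Sophoan'), ('John Smith');
    INSERT INTO ClientCase (ClientID) VALUES (1), (1), (2);
""")

rows = conn.execute("""
    SELECT Client.ID, Client.Name, COUNT(ClientCase.ID) AS CountOfCases
    FROM Client
    LEFT JOIN ClientCase ON Client.ID = ClientCase.ClientID
    GROUP BY Client.ID, Client.Name
    ORDER BY Client.ID""").fetchall()
# rows == [(1, 'Sophoan', 2), (2, 'Sophoan', 1), (3, 'John Smith', 0)]
```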
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.

How to build database for variant management in a webshop

I am searching for a guideline on how to set up my database for an auction site.
My problem is that there are a lot of different product types, let's say paintings, clothes, computers etc. They have different specifications, and it should be possible to put just Product A in size L up for auction, or the whole stock of Product B, for example.
How should I build my database for optimal performance - and coding - in this case?
I would suggest the following database/object structure:
[Auction] n..1 [Category] 1..n [Variation Attribute] 1..n [Attribute Value]
An auction then has a category and several attribute values, each also referring to its variation attribute:
[Auction] = [Category], [Name], [Description]
[Auction_AttrVal] = [AuctionID], [VarAttrID], [AttrValID]
First of all you can have some kind of category table, which holds items like "Paintings", "Clothes", "Computers". An auction / product is assigned to one category.
Each category then defines variation attributes for this specific category. An example would be "Size" for the category "Clothes" or "CPU" for the category "Computers". You can also add predefined values for the variation attributes to limit the number of variations and avoid differentiations like "3GhZ" vs "3 GhZ".
This mechanism also allows for easy filtering of search results. You select a category and simply load all variation attributes as filters (or add a flag to an attribute to declare it as such) and offer the values for filtering to the end-user.
Furthermore you can make variation attributes for a category mandatory to force users who create the auctions (I'm assuming it's Consumer-to-Consumer) to provide sufficient information for their auction.
The code will probably be quite generic and simple. The database structure is highly flexible and extensible. Performance is much better than having all in one table. You probably should create an index (for the field AuctionID) for the Auction_AttrVal table. Please let me know if the database structure is not explained properly.
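Here is a sketch of that structure in SQLite, standing in for MySQL. Table names follow the answer; the column names, the Mandatory flag and the sample data are assumptions for illustration. The primary key on Auction_AttrVal starts with AuctionID, so it already serves as the suggested AuctionID index. The extra index on AttrValID is an assumption aimed at filtering by value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE VariationAttribute (
        ID INTEGER PRIMARY KEY,
        CategoryID INTEGER NOT NULL REFERENCES Category,
        Name TEXT NOT NULL,
        Mandatory INTEGER NOT NULL DEFAULT 0);  -- 1 = sellers must fill it in
    CREATE TABLE AttributeValue (
        ID INTEGER PRIMARY KEY,
        VarAttrID INTEGER NOT NULL REFERENCES VariationAttribute,
        Value TEXT NOT NULL);                    -- predefined values only
    CREATE TABLE Auction (
        ID INTEGER PRIMARY KEY,
        CategoryID INTEGER NOT NULL REFERENCES Category,
        Name TEXT NOT NULL, Description TEXT);
    CREATE TABLE Auction_AttrVal (
        AuctionID INTEGER NOT NULL REFERENCES Auction,
        VarAttrID INTEGER NOT NULL REFERENCES VariationAttribute,
        AttrValID INTEGER NOT NULL REFERENCES AttributeValue,
        PRIMARY KEY (AuctionID, VarAttrID));     -- one value per attribute per auction
    CREATE INDEX ix_auction_attrval_value ON Auction_AttrVal (AttrValID);

    INSERT INTO Category VALUES (1, 'Clothes'), (2, 'Computers');
    INSERT INTO VariationAttribute VALUES (1, 1, 'Size', 1), (2, 2, 'CPU', 0);
    INSERT INTO AttributeValue VALUES (1, 1, 'M'), (2, 1, 'L'), (3, 2, '3 GHz');
    INSERT INTO Auction VALUES (1, 1, 'Jacket', NULL), (2, 1, 'Shirt', NULL),
                               (3, 2, 'Laptop', NULL);
    INSERT INTO Auction_AttrVal VALUES (1, 1, 2), (2, 1, 1), (3, 2, 3);
""")

def filter_options(conn, category):
    """Map each variation attribute of a category to its allowed values (search filters)."""
    options = {}
    for attr, value in conn.execute("""
            SELECT va.Name, v.Value
            FROM VariationAttribute va
            JOIN Category c ON c.ID = va.CategoryID
            JOIN AttributeValue v ON v.VarAttrID = va.ID
            WHERE c.Name = ?
            ORDER BY va.Name, v.Value""", (category,)):
        options.setdefault(attr, []).append(value)
    return options

def auctions_matching(conn, category, attribute, value):
    """Names of auctions in `category` whose `attribute` is set to `value`."""
    rows = conn.execute("""
        SELECT a.Name
        FROM Auction a
        JOIN Category c ON c.ID = a.CategoryID
        JOIN Auction_AttrVal av ON av.AuctionID = a.ID
        JOIN VariationAttribute va ON va.ID = av.VarAttrID
        JOIN AttributeValue v ON v.ID = av.AttrValID
        WHERE c.Name = ? AND va.Name = ? AND v.Value = ?
        ORDER BY a.Name""", (category, attribute, value))
    return [name for (name,) in rows]
```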

MySQL: how to do row-level security (like Oracle's Virtual Private Database)?

Say that I have vendors selling various products. So, at a basic level, I will have the following tables: vendor, product, vendor_product.
If vendor-1 adds Widget 1 to the product table, I want only vendor-1 to see that information (because that information is "owned" by vendor-1). Same goes for vendor-2. Say vendor-2 adds Widget 2, only vendor-2 should see that information.
If vendor-1 tries to add Widget 2, which was already entered by vendor-2, a duplicate entry for Widget 2 should not be made in the product table. This means that, somehow, I need to know that vendor-2 now also "owns" Widget 2.
A problem with having multiple "owners" of a piece of information is how to deal with owners editing/deleting the data. Perhaps vendor-1 no longer wants Widget 2 to be available to him/her, but that doesn't necessarily apply to vendor-2.
Finally, I want the ability to flag(?) certain records as "yes, I have reviewed this data and it is correct" such that it then becomes available to all the vendors. Say I flag Widget 1 as good data, that product should now be seen by all vendors.
It seems that the solution is row level security. The problem is that I'm not too familiar with its concepts or how to implement it in MySQL. Any help is highly appreciated. Thanks.
NOTE: this problem is somewhat discussed here: Database Design: use composite key as FK, flag data for sharing?. When I asked the question, I wasn't sure how to phrase the question very well. Hopefully, I explained my problem better this time.
MySQL doesn't natively support row-level security on tables. However, you can sort of implement it with views: create a view on your table that exposes only the rows you want a given client to see, then give that client access only to those views, not to the underlying tables.
See http://www.sqlmaestro.com/resources/all/row_level_security_mysql/
You already suggested a vendor, product and vendor_product mapping table. You want vendors to share the same product if they both want to use it, but you don't want duplicate products. Right?
If so, then define a unique index/constraint on the natural key that identifies a product (product name?).
If a vendor adds a product, and it doesn't exist, insert it into the product table, and map it to that vendor via the vendor_product table.
If the product already exists, but is mapped to another vendor, do not insert anything into the product table, and add another mapping row mapping the new vendor to the existing product (so that now the product is mapped to two vendors).
When a vendor removes a product, instead of actually removing it, just delete the vendor_product row mapping the two. If no other vendors still reference the product, you can then remove the product itself. Alternatively, you could run a script periodically that deletes all products that no longer have vendors referencing them.
Next, have a flag on the product table that says that you've reviewed the product, and then use something like this to query for products viewable by a given vendor (we'll say vendor id 7):
select product.*
from product
left join vendor_product
on vendor_product.product_id = product.product_id
and vendor_product.vendor_id = 7
where vendor_product.vendor_id is not null
or product.reviewed = 1;
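The whole flow (no duplicate products, shared ownership through vendor_product, cleanup of orphaned products, and visibility of reviewed rows) can be sketched in SQLite standing in for MySQL. The product name is assumed to be the natural key. SQLite's INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE. The query puts the vendor filter in the ON clause, so each product appears at most once even when several vendors own it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE,         -- natural key: no duplicate products
        reviewed   INTEGER NOT NULL DEFAULT 0);  -- 1 = checked, visible to every vendor
    CREATE TABLE vendor_product (
        vendor_id  INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES product,
        PRIMARY KEY (vendor_id, product_id));
""")

def product_id_for(conn, name):
    (product_id,) = conn.execute(
        "SELECT product_id FROM product WHERE name = ?", (name,)).fetchone()
    return product_id

def add_product(conn, vendor_id, name):
    """Create the product only if it is new, then map it to the vendor."""
    conn.execute("INSERT OR IGNORE INTO product (name) VALUES (?)", (name,))
    conn.execute("INSERT OR IGNORE INTO vendor_product VALUES (?, ?)",
                 (vendor_id, product_id_for(conn, name)))

def remove_product(conn, vendor_id, name):
    """Drop the vendor's mapping; delete the product once no vendor references it."""
    product_id = product_id_for(conn, name)
    conn.execute("DELETE FROM vendor_product WHERE vendor_id = ? AND product_id = ?",
                 (vendor_id, product_id))
    conn.execute("""DELETE FROM product WHERE product_id = ? AND NOT EXISTS
                    (SELECT 1 FROM vendor_product WHERE product_id = ?)""",
                 (product_id, product_id))

def visible_products(conn, vendor_id):
    """Products the vendor owns, plus every reviewed product."""
    rows = conn.execute("""
        SELECT product.name
        FROM product
        LEFT JOIN vendor_product
          ON vendor_product.product_id = product.product_id
         AND vendor_product.vendor_id = ?
        WHERE vendor_product.vendor_id IS NOT NULL OR product.reviewed = 1
        ORDER BY product.name""", (vendor_id,))
    return [name for (name,) in rows]

add_product(conn, 1, "Widget 1")
add_product(conn, 2, "Widget 2")
add_product(conn, 1, "Widget 2")  # product already exists: only a new mapping is added
product_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
before_review = visible_products(conn, 2)
conn.execute("UPDATE product SET reviewed = 1 WHERE name = 'Widget 1'")
after_review = visible_products(conn, 2)
remove_product(conn, 1, "Widget 2")  # vendor 2 still maps it, so the product stays
widget2_still_exists = conn.execute(
    "SELECT COUNT(*) FROM product WHERE name = 'Widget 2'").fetchone()[0] == 1
```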
Finally, if a product is owned by multiple vendors, then you can either disallow edits or perhaps "split" the single product into a new unique product when one of the owning vendors tries to edit it, and allow them to edit their own copy of the product. They would likely need to modify the product name though, unless you come up with some other natural key to base your unique constraint on.
It sounds to me like you want to normalize your data. What you have is a 1 (product) to many (vendors) relationship. That the relationship is 1:1 for most cases and only 1:n for some doesn't really matter, I would say; in general terms it's still 1:n, and therefore you should design your database this way. The basic layout would probably be this:
Vendor Table
VendorId VendorName OtherVendorRelatedInformation
WidgetTable
WidgetId WidgetName WidgetFlag CreatorVendor OtherWidgetInformation
WidgetOwnerships
VendorId WidgetId OwnershipStatus OtherInformation
Update: The question of who is allowed to do what is a business problem so you need to have all the rules laid out. In the above structure you can flag which vendor created the widget. And in the ownership you can flag what the status of the ownership is, for example
CreatorFullOwnership
SharedOwnership
...
You would have to make up the flags based on your business rules and then design the business logic and data access part accordingly.

Organizational chart represented in a table

I have an Access application, in which I have an employee table. The employees are part of several different levels in the organization. The organization has 1 GM, 5 department heads, and under each department head are several supervisors, and under those supervisors are the workers.
Depending on the position of the employee, they will only have access to records of those under them.
I wanted to represent the organization in a table with some sort of level system. The problem I saw with that was that there are many people on the same level (for example supervisors), but they shouldn't have access to the records of a supervisor in another department. How should I approach this problem?
One common way of keeping this kind of hierarchical data in a database uses only a single table, with fields something like this:
userId (primary key)
userName
supervisorId (self-referential "foreign key", refers to another userId in this same table)
positionCode (could be simple like 1=lackey, 2=supervisor; or a foreign key pointing to another table of positions and such)
...whatever else you need to store for each employee...
Then your app uses SQL queries to figure out permissions. To figure out the employees that supervisor 'X' (whose userId is '3', for example) is allowed to see, you query for all employees where supervisorId=3.
If you want higher-up bosses to be able to see everyone underneath them, the easiest way is just to do a recursive search. I.e. query for everyone that reports to this big boss, and for each of them query who reports to them, all the way down the tree.
Does that make sense? You let the database do the work of sorting through all the users, because computers are good at that kind of thing.
I put the positionCode in this example in case you wanted some people to have different permissions... for example, you might have a code '99' for HR employees which have the right to see the list of all employees.
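Access SQL has no recursive queries, so in Access the tree walk usually happens in VBA or through the techniques in the articles below. The same "everyone under this boss, all the way down" search can be sketched with SQLite's WITH RECURSIVE (MySQL 8.0 and later support it too). The employee rows and position codes here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        userId       INTEGER PRIMARY KEY,
        userName     TEXT NOT NULL,
        supervisorId INTEGER REFERENCES employee (userId),  -- NULL for the GM
        positionCode INTEGER NOT NULL);
    -- Hypothetical codes: 4 = GM, 3 = department head, 2 = supervisor, 1 = worker
    INSERT INTO employee VALUES
        (1, 'GM',           NULL, 4),
        (2, 'Head A',       1,    3),
        (3, 'Head B',       1,    3),
        (4, 'Supervisor A', 2,    2),
        (5, 'Supervisor B', 3,    2),
        (6, 'Worker A',     4,    1),
        (7, 'Worker B',     5,    1);
""")

def subordinates(conn, boss_id):
    """userIds of everyone below boss_id, at any depth of the tree."""
    rows = conn.execute("""
        WITH RECURSIVE chain(userId) AS (
            SELECT userId FROM employee WHERE supervisorId = ?   -- direct reports
            UNION ALL
            SELECT e.userId FROM employee e
            JOIN chain c ON e.supervisorId = c.userId            -- their reports, and so on
        )
        SELECT userId FROM chain ORDER BY userId""", (boss_id,))
    return [user_id for (user_id,) in rows]
```

A supervisor in one department (for example Supervisor A) only reaches the workers under them, which is the per-department restriction the question asks for.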
Maybe I'll let some other people try to explain it better...
Here's an article from Microsoft's Access Cookbook that explains these queries rather well.
And here is a somewhat chunky explanation of the same.
Here's a write-up of a completely different method, contrasted with the "adjacency list model" (the single-table supervisorId design above), that you might find useful, and his explanation is pretty good. He also points out some difficulties with both methods (when he talks about the tables being "denormalized").