MySQL 1:N Data Mapping - mysql

Something really bugs me and I'm not sure what the "correct" approach is.
When I select contacts from my database, a decent number of joins are involved.
It looks something like this (around 60-70 columns):
SELECT *
FROM contacts
LEFT JOIN company
LEFT JOIN person
LEFT JOIN address
LEFT JOIN person_communication
LEFT JOIN company_communication
LEFT JOIN categories
LEFT JOIN notes
company and person are 1:1 cardinality, so that part is straightforward.
But "address", "communication" and "categories" are 1:n cardinality.
So depending on the number of rows in the 1:n tables, I get a lot of "double" rows (I don't know the real term for that; I know the rows are not true duplicates, because the address or phone number etc. is different). For myself as a fairly filled-in contact, I get 85 rows back.
How do you guys work with that?
In my PHP application I always wrote some kind of "Data-Mapper" where the array key was the contact.ID (the primary key); I checked whether that key existed and then pushed the additional data into it. PHP is also not really type-strict, which makes this easy.
Now I'm learning Go (golang) and I thought: screw that LOOOONG select and the data mapping, just write separate selects for all the 1:n tables... yeah no, there were not enough connections to load a table full of contacts. I know that I can increase the number of connections, but the error seems to imply that this would be the wrong way.
I use the following driver: https://github.com/go-sql-driver/mysql
I also tried GROUP_CONCAT, but then I ran into trouble parsing the result back.
Do I have to use my mapping approach again, or is there some nice solution out there? I found it quite dirty at points, though.

The solution is simple: you need to execute more than one query!
The cause of all the "duplicate" rows is that you're generating a result called a Cartesian product. You are trying to join to several tables with 1:n relationships, but each of these has no relationship to the other, so there's no join condition restricting them with respect to each other.
Therefore you get a result with every combination of all the 1:n relationships. If you have 3 matches in address, 5 matches in communication, and 5 matches in categories, you'd get 3x5x5 = 75 rows.
So you need to run a separate SQL query for each of your 1:n relationships. Don't be afraid: MySQL can handle a few queries. You need them.
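To make the difference concrete, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The schema is made up for illustration (the real tables are not shown in the question): one contact with 3 addresses, 5 communication rows and 5 categories. It runs the single big join first, then one query per 1:n table with the rows grouped by contact id in the application.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: each 1:n table points back at contacts.id.
conn.executescript("""
    CREATE TABLE contacts      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address       (contact_id INTEGER, value TEXT);
    CREATE TABLE communication (contact_id INTEGER, value TEXT);
    CREATE TABLE categories    (contact_id INTEGER, value TEXT);
    INSERT INTO contacts VALUES (1, 'Alice');
""")
conn.executemany("INSERT INTO address VALUES (1, ?)", [(f"street {i}",) for i in range(3)])
conn.executemany("INSERT INTO communication VALUES (1, ?)", [(f"phone {i}",) for i in range(5)])
conn.executemany("INSERT INTO categories VALUES (1, ?)", [(f"cat {i}",) for i in range(5)])

# One big join: every address is paired with every communication row
# and every category, so a single contact comes back as 3 * 5 * 5 rows.
joined = conn.execute("""
    SELECT * FROM contacts c
    LEFT JOIN address a       ON a.contact_id = c.id
    LEFT JOIN communication m ON m.contact_id = c.id
    LEFT JOIN categories g    ON g.contact_id = c.id
""").fetchall()
print(len(joined))

# Separate queries: one for contacts, then one per 1:n table.
contacts = {
    cid: {"name": name, "address": [], "communication": [], "categories": []}
    for cid, name in conn.execute("SELECT id, name FROM contacts")
}
# Table names come from this fixed tuple, never from user input.
for table in ("address", "communication", "categories"):
    grouped = defaultdict(list)
    for cid, value in conn.execute(f"SELECT contact_id, value FROM {table}"):
        grouped[cid].append(value)
    for cid, values in grouped.items():
        contacts[cid][table] = values

print({k: len(v) for k, v in contacts[1].items() if k != "name"})
```

In this sketch the number of queries stays at four no matter how many contacts are loaded, because each 1:n table is read once for all contacts rather than once per contact.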

Related

SQL Query relationship between 3 tables

I have a scenario where I have the following tables:
Inventories
delivery_items
deliveries
I am looking for a query where, given an inventory id, I get the matching delivery_item (via fk_inventory),
and from that the delivery (via fk_delivery).
Manually, I go to the delivery_items table, search for the fk_inventory that matches the id of the inventory I'm looking for,
then take the fk_delivery and look up the delivery.
But I need to run a report on 15k+ items.
How to write a query where from a list of inventory ids I can get to the delivery following the relationship that I mentioned above?
There are many sites on writing SQL queries that explain the difference between a normal (inner) join and an outer join, left join, right join, subqueries, etc. For what you are looking to do (covering all the inventory items in question), simple joins are probably best.
Try to think of it this way, and maybe even do it this way: take a sheet of paper for each table and write the table's columns on it.
Now, looking at the tables, put the sheets next to each other based on how they are related. Note the column in table A that is the foreign key to the next table. Then do the same from the second to the third.
Once you have this done (or even if just mentally), you can SEE how they are related. This is the basis of the FROM clause:
select *
from
YourFirstTable yft
JOIN YourSecondTable yst
on yft.WhateverKey = yst.MatchingKeyColumn
JOIN YourThirdTable ytt
on yst.KeyToThirdTable = ytt.KeyInThisTable
Now that you have all your relationships established, you can list the individual columns you want from the respective tables. This is easier with aliases, such as the yft, yst and ytt I used here for the first, second and third tables. Use aliases appropriate to your tables, such as i = inventories, di = delivery_items, d = deliveries.
Then add whatever FILTERING conditions you want. If the condition is based on the FIRST table, such as yft above, it goes into the WHERE clause, such as
where
yft.SomeColumn = 'blah'
If the filtering criteria are specific to your second or third table, just add them to the JOIN / ON condition so they stay with the table and you know from context that they belong HERE. This makes things easier once you get into LEFT JOINs.
from
YourFirstTable yft
JOIN YourSecondTable yst
on yft.WhateverKey = yst.MatchingKeyColumn
AND yst.SecondTableColumn = 'someOtherValue'
AND yst.SomeOtherColumn = 'somethingElse'
So now the engine can go through all inventory items, to the corresponding delivery items, to the actual deliveries, without you having to do an individual search each time, which would be painful to trace and run and bad for performance.
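Applied to the tables from the question, the pattern could look roughly like this. It is a sketch run against Python's built-in sqlite3 rather than MySQL; fk_inventory and fk_delivery come from the question, while the name and delivered_on columns and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventories    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE deliveries     (id INTEGER PRIMARY KEY, delivered_on TEXT);
    CREATE TABLE delivery_items (id INTEGER PRIMARY KEY,
                                 fk_inventory INTEGER,
                                 fk_delivery INTEGER);
    INSERT INTO inventories    VALUES (1, 'bolt'), (2, 'nut'), (3, 'washer');
    INSERT INTO deliveries     VALUES (10, '2020-01-05'), (11, '2020-02-09');
    INSERT INTO delivery_items VALUES (100, 1, 10), (101, 2, 10), (102, 3, 11);
""")

# The list of inventory ids the report should cover.
inventory_ids = [1, 3]
# One "?" placeholder per id, so the values are bound rather than pasted in.
placeholders = ", ".join("?" for _ in inventory_ids)

rows = conn.execute(f"""
    SELECT i.id, i.name, d.id, d.delivered_on
    FROM inventories i
    JOIN delivery_items di ON di.fk_inventory = i.id
    JOIN deliveries d      ON d.id = di.fk_delivery
    WHERE i.id IN ({placeholders})
    ORDER BY i.id
""", inventory_ids).fetchall()

for row in rows:
    print(row)
```

For a very long id list, loading the ids into a temporary table and joining to it is a common alternative to a huge IN (...) list.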

Modelling ownership in MySQL

I have a table Things and I want to add ownership relations to a table Users. I need to be able to quickly query the owners of a thing and the things a user owns. If I know that there will be at most 50 owners, and the pdf for the number of owners will probably look like this, should I rather
add 50 columns to the Things table, like CoOwner1Id, CoOwner2Id, …, CoOwner50Id, or
should I model this with a Ownerships table which has UserId and ThingId columns, or
would it better to create a table for each thing, for example Thing8321Owners with a row for each owner, or
perhaps a combination of these?
The second choice is the correct one; you should create an intermediate table between the table Things and the table Owners (which contains the details of each owner).
This table should have the thing_id and the owner_id together as its primary key.
So finally, you will have 3 tables:
Things (the things' details and data)
Owners (the owners' details and data)
Ownerships (the assignment of each thing_id to an owner_id)
This is because a relational DB should not hold any redundant data.
You should definitely go with option 2, because what you are trying to model is a many-to-many relationship. (Many owners can relate to a thing. Many things can relate to an owner.) This is commonly accomplished using what I call a bridging table (which is exactly what option 2 is). It is a standard technique in a normalized database.
The other two options are going to give you nightmares trying to query or maintain.
With option 1 you'll need to join the User table to the Thing table on 50 columns to get all of your results. And what happens when you have a really popular thing that 51 people want to own?
Option 3 is even worse. The only way to easily query the data is to use dynamic sql or write a new query each time because you don't know which Thing*Owners table to join on until you know the ID value of the thing you're looking for. Or you're going to need to join the User table to every single Thing*Owners table. Adding a new thing means creating a whole new table. But at least a thing doesn't have a limit on the number of owners it could possibly have.
Now isn't this:
SELECT Users.Name, Things.Name
FROM Users
INNER JOIN Ownerships ON Users.UserId=Ownerships.UserId
INNER JOIN Things ON Things.ThingId=Ownerships.ThingId
much easier than any of those other scenarios?
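For completeness, a small sketch of the bridging table itself, run with Python's built-in sqlite3 as a stand-in for MySQL. The composite primary key, the extra index for lookups by user, and the sample rows are my assumptions about how one might set it up, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users  (UserId  INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Things (ThingId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Ownerships (
        UserId  INTEGER NOT NULL REFERENCES Users(UserId),
        ThingId INTEGER NOT NULL REFERENCES Things(ThingId),
        PRIMARY KEY (ThingId, UserId)   -- one row per (thing, owner) pair
    );
    -- Second index so "things of a user" is as direct as "owners of a thing".
    CREATE INDEX idx_ownerships_user ON Ownerships (UserId, ThingId);
    INSERT INTO Users  VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Things VALUES (7, 'Boat'), (8, 'Car');
    INSERT INTO Ownerships (UserId, ThingId) VALUES (1, 7), (2, 7), (1, 8);
""")

# Owners of a thing.
owners_of_boat = [name for (name,) in conn.execute("""
    SELECT u.Name FROM Ownerships o
    JOIN Users u ON u.UserId = o.UserId
    WHERE o.ThingId = ? ORDER BY u.Name""", (7,))]

# Things a user owns.
things_of_ann = [name for (name,) in conn.execute("""
    SELECT t.Name FROM Ownerships o
    JOIN Things t ON t.ThingId = o.ThingId
    WHERE o.UserId = ? ORDER BY t.Name""", (1,))]

# The composite primary key rejects a duplicate ownership row.
try:
    conn.execute("INSERT INTO Ownerships (UserId, ThingId) VALUES (1, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(owners_of_boat, things_of_ann, duplicate_rejected)
```

A 51st owner is just one more row here, with no schema change.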

Correlate and Sum Table Data as Summary

I want to summarize rows from one end of a relationship tree with a table on the other side. Is "correlate" the correct term? Really just knowing the terms would help me solve this problem.
I am using MySQL and am extending an existing DB structure - though would have the liberty to rearrange data if needed. I'm getting better at creating "filtering" queries using JOINs, and I'm sure this next piece will be straight-forward once I understand it (without performing tons of queries : )
I made a simplified schema (and theme!) for this example, but the idea is the same.
Say there are many DietPlans, each of which is related to a bunch of MenuItems, and each MenuItem has an ItemType (such as 'Healthy', 'Fast', 'Normal', etc.). On the other side of DietPlan there are Persons, who each store how many DailyCalories they consume, and another table, MenuAllocations, where a Person stores what percentage of their daily intake comes from which MenuItem.
As examples of scale, there could be 1000 MenuItems, with 50 of those associated with each of 200 DietPlans. Also, each DietPlan might have 10,000 Persons, who will each have 5-10 MenuAllocations of various types.
What I'd like to do feels complex to me. I want to create a dashboard for each DietPlan (there could be many), gathering data from the Persons of that DietPlan, and tabulating the number of calories for each item type.
The math is simple: tblPerson.dailyCalories * tblMenuAllocations.percent. But I want to do that for each Person in the DietPlan, for each ItemType.
I understand the JOINs required to 'filter' from tblItemType around to tblMenuAllocation and think it would be similar to this:
SELECT *
FROM tblMenuAllocation
INNER JOIN tblPerson
on tblMenuAllocation.personId = tblPerson.personId
INNER JOIN tblDietPlan
on tblPerson.dietPlanId = tblDietPlan.DietPlanId
INNER JOIN tblMenuItem
on tblMenuItem.dietPlanId = tblDietPlan.DietPlanId
INNER JOIN tblItemType
on tblMenuItem.itemTypeId = tblItemType.itemTypeId
WHERE tblItemType.itemTypeId = 2
It feels like one query for each tblItemType, which could be a LOT of Person and MenuAllocation data to sort through, and doing that many consecutive queries feels like I'm missing something. Also, I think math can be handled in the query to sum values, but I've never done that. Where can I begin?
EDIT: The final results would be something like this:
----------------------------------------------
ItemId | ItemDesc | TotalCalories
----------------------------------------------
1 Healthy 450,876
2 Fast 1,987,948
3 Vegan 349,123
etc.
I would be willing to accept some manipulation of data outside the query, but the Person's specific dailyCalories is very important to the tblMenuAllocation.percent calculation. Some tblMenuAllocation rows might be of the same ItemType!
I think you are looking for these topics:
Aggregate Functions and
Group By Modifiers
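As a rough illustration of what those topics give you, here is a sketch with SUM and GROUP BY that produces one total per item type in a single query. It runs on Python's built-in sqlite3 as a stand-in for MySQL. The column names, the assumption that tblMenuAllocation points at a menu item via menuItemId, and the assumption that percent is stored as 0-100 are all guesses, since the question does not show the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data for one diet plan (id 1).
conn.executescript("""
    CREATE TABLE tblItemType (itemTypeId INTEGER PRIMARY KEY, itemDesc TEXT);
    CREATE TABLE tblMenuItem (menuItemId INTEGER PRIMARY KEY,
                              itemTypeId INTEGER, dietPlanId INTEGER);
    CREATE TABLE tblPerson   (personId INTEGER PRIMARY KEY,
                              dietPlanId INTEGER, dailyCalories INTEGER);
    CREATE TABLE tblMenuAllocation (personId INTEGER, menuItemId INTEGER,
                                    percent INTEGER);
    INSERT INTO tblItemType VALUES (1, 'Healthy'), (2, 'Fast');
    INSERT INTO tblMenuItem VALUES (1, 1, 1), (2, 2, 1), (3, 1, 1);
    INSERT INTO tblPerson   VALUES (1, 1, 2000), (2, 1, 3000);
    INSERT INTO tblMenuAllocation VALUES (1, 1, 50), (1, 2, 50),
                                         (2, 3, 20), (2, 2, 80);
""")

# Each allocation row contributes dailyCalories * percent / 100;
# GROUP BY collapses those contributions into one row per item type.
totals = conn.execute("""
    SELECT t.itemTypeId, t.itemDesc,
           SUM(p.dailyCalories * a.percent / 100.0) AS totalCalories
    FROM tblMenuAllocation a
    JOIN tblPerson   p ON p.personId   = a.personId
    JOIN tblMenuItem m ON m.menuItemId = a.menuItemId
    JOIN tblItemType t ON t.itemTypeId = m.itemTypeId
    WHERE p.dietPlanId = ?
    GROUP BY t.itemTypeId, t.itemDesc
    ORDER BY t.itemTypeId
""", (1,)).fetchall()

for row in totals:
    print(row)
```

Several allocation rows of the same item type simply add into the same group, so no per-type query is needed.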

Best way to do a query with a large number of possible joins

On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having 100's of fields, most of which are empty and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table led by the ActivityType field. Cluster said index unless there is an overwhelming need for some other index to be clustered (or performance benchmarking shows some other clustering selection to be more performant).
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing in a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc.), or a bit string (has_activities_of_type, where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
I don't think enum is the way to go, because as you say there might be 1000's of activities, then altering your activity table would become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are:
See this. The first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing the fields activity_type_id, activity_name and activity_details_table_name. First, query in the following way:
SELECT *
FROM activity
INNER JOIN activity_types
USING (activity_type_id)
This query gives you the table name on which to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.
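One way this could be combined with batching, so that there is one details query per activity type rather than one per activity, is sketched below with Python's built-in sqlite3 as a stand-in for MySQL. The activity_id column, the detail columns and the sample rows are invented, and grouping the ids per table is my addition to the idea above.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity_types (activity_type_id INTEGER PRIMARY KEY,
                                 activity_name TEXT,
                                 activity_details_table_name TEXT);
    CREATE TABLE activity (activity_id INTEGER PRIMARY KEY,
                           activity_type_id INTEGER);
    CREATE TABLE activity_details_work      (activity_id INTEGER, project TEXT);
    CREATE TABLE activity_details_sickleave (activity_id INTEGER, reason TEXT);
    INSERT INTO activity_types VALUES
        (1, 'work', 'activity_details_work'),
        (2, 'sick leave', 'activity_details_sickleave');
    INSERT INTO activity VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO activity_details_work      VALUES (1, 'alpha'), (3, 'beta');
    INSERT INTO activity_details_sickleave VALUES (2, 'flu');
""")

# First query: every activity together with the name of its details table.
ids_by_table = defaultdict(list)
for activity_id, table in conn.execute("""
        SELECT a.activity_id, t.activity_details_table_name
        FROM activity a
        INNER JOIN activity_types t USING (activity_type_id)"""):
    ids_by_table[table].append(activity_id)

# Table names cannot be bound as parameters, so they are checked against
# a known set before being placed into the SQL text.
KNOWN_TABLES = {"activity_details_work", "activity_details_sickleave"}

details = {}
for table, ids in ids_by_table.items():
    if table not in KNOWN_TABLES:
        raise ValueError(f"unexpected details table: {table}")
    marks = ", ".join("?" for _ in ids)
    query = f"SELECT * FROM {table} WHERE activity_id IN ({marks})"
    for row in conn.execute(query, ids):
        details[row[0]] = row[1:]   # activity_id -> remaining detail columns

print(details)
```

With 20 activity types this is at most 21 queries, however many activities are loaded.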

Empty Rows in MySQL SELECT with LEFT JOIN

I have three tables called: users, facilities, and staff_facilities.
users contains average user data, the most important fields in my case being users.id, users.first, and users.last.
facilities also contains a fair amount of data, but none of it is necessarily pertinent to this example except facilities.id.
staff_facilities consists of staff_facilities.id (int, auto_inc, NOT NULL), staff_facilities.users_id (int, NOT NULL), and staff_facilities.facilities_id (int, NOT NULL). (That's a mouthful!)
staff_facilities references the ids for the other two tables, and we are calling this table to look up users' facilities and facilities' users.
This is my select query in PHP:
SELECT users.id, users.first, users.last FROM staff_facilities LEFT JOIN users ON staff_facilities.users_id=users.id WHERE staff_facilities.facilities_id=$id ORDER BY users.last
This query works great on our development server, but when I drop it into the client's production environment often times blank rows appear in the results set. Our development server is using the replicated tables and data that already exist on the client's production server, but the hardware and software vary quite a bit.
These rows are devoid of any information, including the three id fields that require NOT NULL values to be entered into the database. Running the query through the MySQL management tools on the backend returns the same results. Searching the table for NULL fields has not turned up anything.
The other strange thing is that the number of empty rows is changing based on the varying results caused by the WHERE clause id check. It's usually around one to three empty rows, but they are consistent when using the same parameter.
I've many times dealt with the returning of nearly duplicate rows due to LEFT JOINS, but I've never had this happen before. As far as displaying the information goes, I can easily hide it from the end user. My concern is primarily that this problem will be compounded as time passes and the number of records grows larger. As it sits, this system has just been installed, and we already have 2000+ records in the staff_facilities table.
Any insight or direction would be appreciated. I can provide further, more detailed examples and information as well.
You are only selecting columns from the table on the right side of the join. Of course some of them are completely NULL: you did a LEFT JOIN. Those staff_facilities records (the left side of the join) have a users_id that matches no row in users (the right side), so the join fills the users columns with NULL. Since you aren't returning any columns from the left table, you see no data.
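A small reproduction of the effect, using Python's built-in sqlite3 as a stand-in for MySQL. The sample rows are invented, and the orphaned users_id 99 is an assumption about what the production data might contain. The sketch shows the blank row from the LEFT JOIN, the same query with an inner JOIN, and a query that lists the staff_facilities rows with no matching user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    CREATE TABLE staff_facilities (id INTEGER PRIMARY KEY,
                                   users_id INTEGER NOT NULL,
                                   facilities_id INTEGER NOT NULL);
    INSERT INTO users VALUES (1, 'Ada', 'Lovelace');
    -- users_id 99 has no row in users (for example, the user was deleted).
    INSERT INTO staff_facilities VALUES (1, 1, 5), (2, 99, 5);
""")

# The query from the question: the orphaned row survives the LEFT JOIN,
# but every selected column comes from users, so it shows up as all NULLs.
left_rows = conn.execute("""
    SELECT users.id, users.first, users.last
    FROM staff_facilities
    LEFT JOIN users ON staff_facilities.users_id = users.id
    WHERE staff_facilities.facilities_id = ?
    ORDER BY users.last""", (5,)).fetchall()

# An inner join keeps only staff_facilities rows that have a matching user.
inner_rows = conn.execute("""
    SELECT users.id, users.first, users.last
    FROM staff_facilities
    JOIN users ON staff_facilities.users_id = users.id
    WHERE staff_facilities.facilities_id = ?
    ORDER BY users.last""", (5,)).fetchall()

# Finding the orphaned rows themselves, to clean up the data.
orphans = conn.execute("""
    SELECT sf.id, sf.users_id
    FROM staff_facilities sf
    LEFT JOIN users u ON u.id = sf.users_id
    WHERE u.id IS NULL""").fetchall()

print(left_rows)
print(inner_rows)
print(orphans)
```

If the third query returns rows on the production server but not on the development copy, that would explain why the blank rows only appear there.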