Database better inner join for performance in this case - mysql

I have 2 tables that manage the time spent doing various things:
#times(id, time_in_minutes)
#times_intervals(id, times_id, time_in_minutes, start, end)
Then the #times might relate to different things:
#tasks(id, description)
#products(id, description, serial_number, year)
What is the best practice for reusing the same #times and #times_intervals for both #tasks and #products?
I would think about:
#times(+task_id, +product_id)
// add task_id and product_id to the original #times table
But if I do so, joining the #times table with the #tasks and #products tables would be slower, because each join has to choose between the two columns (task_id or product_id): when task_id is not null, join on #tasks, and vice versa.
(I'm using MySQL6)
Thanks a lot

I would drop the time_in_minutes column from the times table. This information is redundant if it is just the sum of the detail and is a premature optimization.
I would add a product_time table containing product_id and times_id, and a task_time table containing task_id and times_id.
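A minimal sketch of those two link tables, assuming InnoDB and that products.id, tasks.id and times.id are INT primary keys (the question does not state the column types, so adjust them to match):

```sql
-- Link table: which #times headers belong to which product.
CREATE TABLE product_time (
    product_id INT NOT NULL,
    times_id   INT NOT NULL,
    PRIMARY KEY (product_id, times_id),
    FOREIGN KEY (product_id) REFERENCES products (id),
    FOREIGN KEY (times_id)   REFERENCES times (id)
) ENGINE=InnoDB;

-- Same arrangement for tasks.
CREATE TABLE task_time (
    task_id  INT NOT NULL,
    times_id INT NOT NULL,
    PRIMARY KEY (task_id, times_id),
    FOREIGN KEY (task_id)  REFERENCES tasks (id),
    FOREIGN KEY (times_id) REFERENCES times (id)
) ENGINE=InnoDB;
```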
Then to get the total time with a product:
SELECT *
FROM products p
INNER JOIN product_time pt
    ON pt.product_id = p.id
INNER JOIN (
    SELECT times_id, SUM(time_in_minutes) AS time_in_minutes
    FROM times_intervals
    GROUP BY times_id
) AS t
    ON t.times_id = pt.times_id
Typically, to make this perform well, you would have a non-clustered covering index on times_intervals with columns times_id and time_in_minutes. Note that the times table is simply a data-less header table at this point: its only purpose is to group the times_intervals, and it is only necessary because you have this very similar arrangement for tasks.
If there were not two (or more) entities using the times_intervals, you might simply put product_id in the times_intervals and treat it as your header/master id.
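The covering index mentioned above could look like the following sketch. The index name is made up for illustration; in InnoDB every secondary index is non-clustered, so a plain composite index is enough.

```sql
-- Both columns used by the GROUP BY subquery live in the index,
-- so MySQL can answer it from the index without reading table rows.
CREATE INDEX idx_times_intervals_cover
    ON times_intervals (times_id, time_in_minutes);

-- "Using index" in the Extra column of EXPLAIN indicates the index is covering.
EXPLAIN
SELECT times_id, SUM(time_in_minutes) AS time_in_minutes
FROM times_intervals
GROUP BY times_id;
```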

I would suggest against adding an id column to times for every table you might join it to. It would break normalization and make joins much more complicated.
If you only have one time (or time interval) for a task or a product, make a column in that table that references the times table. Otherwise, you could make a separate table like
#multitimes(multi_id, time_id)
where the two columns together are a primary key, and then have products and tasks reference multi_id. Then each record in each of those tables can be related to any number of times without any conflicts.

Related

MySQL trigger, view, separate table, or on-the-fly calculation for loyalty points?

What is the least resource-intensive way to calculate a sum of points from two tables? The total point tally is calculated by adding points from a table points and subtracting points from a table points_redeemed.
points:
CREATE TABLE IF NOT EXISTS points(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
user__id INT,
tx__id INT,
points INT
) ENGINE=MyISAM;
points_redeemed:
CREATE TABLE IF NOT EXISTS points_redeemed(
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
user__id INT,
points INT
) ENGINE=MyISAM;
(Both tables above are heavily simplified.)
points is populated upon a transaction (recorded in a different table). When transaction values are changed or voided, the corresponding row in points is updated as well.
points_redeemed is populated when user redeems their accumulated points for a reward.
Use cases:
show stats to user and admin: total, redeemed, and unredeemed points
check unredeemed points upon user-initiated redeem request
The options I've come up with are:
a) Triggers.
Create a table points_sum with one row per user.id and add three triggers:
on insert into points
on update of points
on insert into points_redeemed
I've heard that MySQL triggers are not that performant though, so I'm simply not sure if this is a good idea.
b) View.
Create a view that calculates points.points - points_redeemed.points. Not sure if this is any better than just doing it on the fly.
c) Sum table.
Create a table points_sum and update it with a separate query each time a row in points or points_redeemed is inserted or updated. This feels like the least effective way, but then again I could be wrong and it might be the best way.
d) On the fly.
Query points from both tables on the fly and calculate the difference. This is the easiest and probably the most accurate way, but it can potentially clog up the pipes a lot when the tables grow in size. Then again, are any of the other options any better in that regard?
Edit: These are the current on-the-fly queries.
First, a very straightforward query from points_redeemed:
SELECT *
FROM points_redeemed
WHERE user__id = 1
Second, the points table is queried:
(
SELECT p.*,
tx.*
FROM points p
INNER JOIN tx ON p.tx__id = tx.id
WHERE p.user__id = '1'
AND p.tx_is_external IS NULL
ORDER BY p.date DESC
)
UNION
(
SELECT p.*,
tx.*
FROM points p
INNER JOIN tx_external tx ON p.tx__id = tx.id
WHERE p.user__id = '1'
AND p.tx_is_external = '1'
ORDER BY p.date DESC
)
(There are several named columns SELECTed that I abbreviated as * here. In the second query, about 40 columns are fetched per row.)
After this, I'm looping through both result sets and adding/subtracting points on the app layer.
My worry is that the two separate queries, and the joins in the second query, might "clog the pipes" when the tx tables grow in size (and the points table too). That's why I'm trying to figure out a better way that will save resources at runtime.
The more I think about it though... transactions and points inserts will probably happen a lot more frequently than a user looking up their current point status. In that scenario, a trigger would probably have the opposite effect.
I'd appreciate any kind of insight. Thank you!

WHERE user__id = 1 needs INDEX(user__id) on the table.
( SELECT ... ORDER BY ... ) UNION ( SELECT ... ORDER BY ... ) will not have a particular order; do you need to move the ORDER BY outside?
tx and tx_external need id to be indexed (PRIMARY KEY?)
Did you really want UNION DISTINCT? That's the default. UNION ALL is faster.
Fix those, then see if you still need to discuss triggers, etc.
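As a sketch, those fixes could look like this. The index names are invented, the three selected columns stand in for the roughly 40 real ones, and tx.id and tx_external.id are assumed to already be primary keys.

```sql
-- Index the column used in WHERE user__id = ...
ALTER TABLE points          ADD INDEX idx_points_user (user__id);
ALTER TABLE points_redeemed ADD INDEX idx_points_redeemed_user (user__id);

-- UNION ALL skips the de-duplication pass; a single ORDER BY placed
-- after the last parenthesis sorts the combined result.
(
    SELECT p.id, p.points, p.`date`
    FROM points p
    INNER JOIN tx ON p.tx__id = tx.id
    WHERE p.user__id = 1
      AND p.tx_is_external IS NULL
)
UNION ALL
(
    SELECT p.id, p.points, p.`date`
    FROM points p
    INNER JOIN tx_external tx ON p.tx__id = tx.id
    WHERE p.user__id = 1
      AND p.tx_is_external = 1
)
ORDER BY `date` DESC;
```

The two branches cannot return the same points row, because tx_is_external is NULL in one and 1 in the other, so UNION ALL should give the same rows as UNION here.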

MySQL multiple tables relationship (code opinion)

I have 4 tables: rooms(id, name, description), clients(id, name, email), cards(id, card_number, exp_date, client_id) and orders(id, client_id, room_id, card_id, start_date, end_date).
The tables are all InnoDB and are pretty simple. What I need is to add relationships between them. What I did was to assign cards.client_id as a Foreign Key to db.clients and orders.client_id, orders.room_id and orders.card_id as Foreign Keys to the other tables.
My question: is this way correct and reliable? I never had the need to use Foreign Key before now and this is my first try. All the Foreign Keys are also indexes.
Also, what's the easiest way to retrieve all the information I need for db.orders ?
I need a query to output: who the client is, what his card details are, what room(s) he ordered and what period he's checked in for.
Can I accomplish this query based on the structure I created?

You must create the FKs on all columns that relate to other tables. In your case, create them on: cards.client_id, orders.client_id, orders.room_id, orders.card_id
In the case of MySQL it automatically creates indexes for these FK's.
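For illustration, the four constraints could be declared as below. The constraint names are made up, and the sketch assumes each referenced id column is the primary key of its table.

```sql
ALTER TABLE cards
    ADD CONSTRAINT fk_cards_client
        FOREIGN KEY (client_id) REFERENCES clients (id);

ALTER TABLE orders
    ADD CONSTRAINT fk_orders_client
        FOREIGN KEY (client_id) REFERENCES clients (id),
    ADD CONSTRAINT fk_orders_room
        FOREIGN KEY (room_id) REFERENCES rooms (id),
    ADD CONSTRAINT fk_orders_card
        FOREIGN KEY (card_id) REFERENCES cards (id);
```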
For your select, I believe the following works:
SELECT * FROM orders
INNER JOIN clients ON clients.id = orders.client_id
INNER JOIN cards ON cards.id = orders.card_id
INNER JOIN rooms ON rooms.id = orders.room_id
I do not know which columns you need; just replace the * with the columns you actually need, so the query is faster.

A MySQL query addressing three tables: How many from A are not in B or C?

I have a problem formulating a MySQL query to do the following task, although I have seen similar queries discussed here, they are sufficiently different from this one to snooker my attempts to transpose them. The problem is (fairly) simple to state. I have three tables, 'members', 'dog_shareoffered' and 'dog_sharewanted'. Members may have zero, one or more adverts for things they want to sell or want to buy, and the details are stored in the corresponding offered or wanted table, together with the id of the member who placed the ad. The column 'id' is unique to the member, and common to all three tables. The query I want is to ask how many members have NOT placed an ad in either table.
I have tried several ways of asking this. The closest I can get is a query that doesn't crash! (I am not a MySQL expert by any means). The following I have put together from what I gleaned from other examples, but it returns zero rows, where I know the result should be greater than zero.
SELECT id
FROM members
WHERE id IN (SELECT id
FROM dog_sharewanted
WHERE id IS NULL)
AND id IN (SELECT id
FROM dog_shareoffered
WHERE id IS NULL)
This query looks pleasingly simple to understand, unlike the JOINs I've seen, but I am guessing that maybe I need some sort of join. How would that look in this case?

If you want no ads in either table, then the sort of query you are after is:
SELECT id
FROM members
WHERE id NOT IN ( any id from any other table )
To select ids from other tables:
SELECT id
FROM <othertable>
Hence:
SELECT id
FROM members
WHERE id NOT IN (SELECT id FROM dog_shareoffered)
AND id NOT IN (SELECT id FROM dog_sharewanted)
An earlier version of this answer used SELECT DISTINCT in the subqueries, because one member may place many ads but has only one id. As the comments pointed out, this is not necessary.
If you wanted to avoid a sub-query (a possible performance increase, depending..) you could use some LEFT JOINs:
SELECT members.id
FROM members
LEFT JOIN dog_shareoffered
ON dog_shareoffered.id = members.id
LEFT JOIN dog_sharewanted
ON dog_sharewanted.id = members.id
WHERE dog_shareoffered.id IS NULL
AND dog_sharewanted.id IS NULL
Why this works:
It takes the table members and joins it to the other two tables on the id column.
The LEFT JOIN means that if a member exists in the members table but not the table we're joining to (e.g. dog_shareoffered), then the corresponding dog_shareoffered columns will have NULL in them.
So, the WHERE condition picks out rows where there's a NULL id in both dog_shareoffered and dog_sharewanted, meaning we've found ids in members with no corresponding id in the other two tables.
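Since the question asks how many members have no ads, a count-based variant may be closer to what is wanted. The sketch below uses NOT EXISTS rather than NOT IN: if either ad table ever contains a NULL in its id column, NOT IN matches nothing at all, while NOT EXISTS is unaffected. The aliases m, o and w are only for illustration.

```sql
SELECT COUNT(*) AS members_without_ads
FROM members m
WHERE NOT EXISTS (SELECT 1 FROM dog_shareoffered o WHERE o.id = m.id)
  AND NOT EXISTS (SELECT 1 FROM dog_sharewanted  w WHERE w.id = m.id);
```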

MySQL: translate id's to values

I have a table that contains a lot of columns with ids (keys) corresponding to other tables.
For example, I have a table of cars that were sold
[table of cars that were sold]
(
car_make_id
, car_engine_id
, car_model_id
, car_radio_id
, buyer_id
, seller_id
, car_title_id
, sale_price
)
with each one of the id fields having another table containing the id and name like:
[another table]
(
car_make_id
, car_engine_id
, car_model_id
, car_radio_id
, buyer_id
, seller_id
, car_title_id
, sale_price
)
[and another table]
(
car_make
, car_make_id
)
[and another table]
(
car_title
, car_title_id
)
etc,...with each table named car_lookup, car_model_lookup,...
Is there any way to join all these simply, without writing a million subqueries? There are millions of entries in this table, and each additional join costs a lot in terms of time. I am looking for a fast and efficient way of comparing this data against another table that doesn't have ids, just the names. Let's say I have a list of compatible radios that would have (make, model, engine, radio), and I want a list of the names of all sellers who sold cars with incompatible radios, and how many incompatible sales they made.
I have been doing stuff like this in Perl, but it can take hours to run, so I am looking for something that can be done in MySQL.
PS: the car stuff is just an example. I don't actually work with cars, but it illustrates the problem I am having. I cannot change the way the database is set up either, due to the large amount of code that already queries the data.
Thanks

You need some way of telling the database which table to pull the name from for each ID, which is what a JOIN per lookup table does:
SELECT car_make, car_engine, car_model, car_radio,
       buyer, seller, car_title, sale_price
FROM cars_sold
JOIN car_make_lookup USING (car_make_id)
JOIN car_engine_lookup USING (car_engine_id)
JOIN car_model_lookup USING (car_model_id)
JOIN car_radio_lookup USING (car_radio_id)
JOIN buyer_lookup USING (buyer_id)
JOIN seller_lookup USING (seller_id)
JOIN car_title_lookup USING (car_title_id)
If this kind of query is too slow, perhaps you can optimize your database or MySQL server so it can perform these JOINs faster. Try increasing cache sizes (especially if your server has plenty of RAM) and make sure the id columns of those lookup tables are indexed, ideally as primary keys.
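For the incompatible-radio report described in the question, one possible shape is an anti-join against the list of compatible combinations. This is only a sketch: compatible_radios(make, model, engine, radio) is a hypothetical name for the name-based table, and the name columns of the lookup tables (car_model, car_engine, car_radio, seller) are assumed to follow the same pattern as car_make.

```sql
SELECT sl.seller, COUNT(*) AS incompatible_sales
FROM cars_sold cs
JOIN car_make_lookup   mk ON mk.car_make_id   = cs.car_make_id
JOIN car_model_lookup  md ON md.car_model_id  = cs.car_model_id
JOIN car_engine_lookup en ON en.car_engine_id = cs.car_engine_id
JOIN car_radio_lookup  rd ON rd.car_radio_id  = cs.car_radio_id
JOIN seller_lookup     sl ON sl.seller_id     = cs.seller_id
-- Try to find the sold combination in the compatibility list...
LEFT JOIN compatible_radios cr
       ON cr.make   = mk.car_make
      AND cr.model  = md.car_model
      AND cr.engine = en.car_engine
      AND cr.radio  = rd.car_radio
-- ...and keep only the sales where no match was found.
WHERE cr.radio IS NULL
GROUP BY sl.seller_id, sl.seller;
```

Only the lookup tables needed for the comparison are joined, which keeps the number of joins down on a table with millions of rows.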

Should I relate all of my MySQL tables to each other?

I'm working on a personal project for timekeeping on various projects, but I'm not sure of the best way to structure my database.
A simplified breakdown of the structure is as follows:
Each client can have multiple reports.
Each report can have multiple line items.
Each line item can have multiple time records.
There will ultimately be more relationships, but that's the basis of the application. As you can see, each item is related to the item beneath it in a one-to-many relationship.
My question is, should I relate each table to each "parent" table above it? Something like this:
clients
  id
reports
  id
  client_id
line_items
  id
  report_id
  client_id
time_records
  id
  report_id
  line_item_id
  client_id
And as it cascaded down, there would be more and more foreign keys added to each new table.
My initial reaction is that this is not the correct way to do it, but I would love to get some second (and third!) opinions.

The advantage of the way you're doing it is that you could check all time records for, say, a specific client id without needing a join. But really, it isn't necessary. All you need is to store a reference back up one "level" so to speak. Here are some examples from the "client" perspective:
To get a specific client's reports: (simple; same as current schema you suggest)
SELECT * FROM `reports`
WHERE `client_id` = ?;
To get a specific client's line items: (new schema; don't need "client_id" in table)
SELECT `line_items`.* FROM `line_items`
JOIN `reports` ON `reports`.`id` = `line_items`.`report_id`
JOIN `clients` ON `clients`.`id` = `reports`.`client_id`
WHERE `clients`.`id` = ?;
To get a specific client's time entries: (new schema; don't need "client_id" or "report_id" in table)
SELECT `time_records`.* FROM `time_records`
JOIN `line_items` ON `line_items`.`id` = `time_records`.`line_item_id`
JOIN `reports` ON `reports`.`id` = `line_items`.`report_id`
JOIN `clients` ON `clients`.`id` = `reports`.`client_id`
WHERE `clients`.`id` = ?;
So, the revised schema would be:
clients
  id
reports
  id
  client_id
line_items
  id
  report_id
time_records
  id
  line_item_id
EDIT:
Additionally, I would consider using views to simplify the queries (I assume you'll use them often), definitely creating indexes on the join columns, and using foreign key constraints to enforce referential integrity (InnoDB only).
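A rough sketch of what that could look like on the revised schema. The view and index names are invented for illustration; if the foreign keys are declared on InnoDB tables, MySQL creates indexes on the referencing columns itself, so the explicit CREATE INDEX statements may be redundant.

```sql
-- Indexes on the join columns.
CREATE INDEX idx_reports_client         ON reports (client_id);
CREATE INDEX idx_line_items_report      ON line_items (report_id);
CREATE INDEX idx_time_records_line_item ON time_records (line_item_id);

-- A view that walks the hierarchy once, so callers can filter by client
-- without repeating the joins.
CREATE VIEW client_time_records AS
SELECT r.client_id,
       r.id  AS report_id,
       li.id AS line_item_id,
       tr.id AS time_record_id
FROM time_records tr
JOIN line_items li ON li.id = tr.line_item_id
JOIN reports r     ON r.id  = li.report_id;

-- Usage: all time records for one client.
SELECT * FROM client_time_records WHERE client_id = 42;
```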

No. If there is no direct relation between the elements of the model, then there should not be a direct relation between the corresponding tables. Otherwise your data will contain redundancies and you will run into problems when updating.
This is the right way:
clients
  id
reports
  id
  client_id
line_items
  id
  report_id
time_records
  id
  line_id
You don't need to create client_id on the line_items table if you never join line items directly to clients, because you can get it through the reports table. The same applies to the other FKs.
I recommend you think about the reports and queries you need over this data before creating redundant foreign keys, which can complicate your development.
Adding redundant FKs later is not difficult if you need them in the future; a few ALTER and UPDATE ... SELECT statements will solve the problem.
If you do not have much information in line_items, you can denormalize and add this info to time_records.

Anywhere there is a direct relationship between two tables, you should use foreign keys to keep the data integrity. Personally, I would look at a structure like this:
Client
  ClientId
Report
  ReportId
  ClientId
LineItem
  LineItemId
  ReportId
TimeRecord
  TimeRecordId
  LineItemId
In this example, you do not need ClientId in LineItem because you have that relationship through the Report table. The major disadvantage of having ClientId in all of your tables is that if the business logic does not enforce consistency of these values (for example, because of a bug in the code), you can run into situations where you get different values depending on which table your search goes through:
Report:
  ReportId = 3
  ClientId = 2
LineItem:
  LineItemId = 1
  ReportId = 3
  ClientId = 3
In the above situation, you would be looking at ClientId = 2 if your query went through Report and ClientId = 3 if your query went through LineItem. Once this happens, it is difficult to determine which relationship is correct, and where the bug is.
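If the redundant ClientId column were kept on LineItem anyway, a query along these lines could at least surface the mismatches. It is a sketch that assumes the redundant column exists, which the structure recommended above deliberately avoids.

```sql
-- Line items whose own ClientId disagrees with their report's ClientId.
SELECT li.LineItemId,
       li.ClientId AS LineItemClientId,
       r.ClientId  AS ReportClientId
FROM LineItem AS li
INNER JOIN Report AS r ON r.ReportId = li.ReportId
WHERE li.ClientId <> r.ClientId;
```

With the sample rows above, this would return LineItemId = 1 with the two conflicting client ids.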
Also, I would advocate against plain id columns in favor of more explicit names that describe what the id is used for, such as ReportId or ClientId. In my opinion, this makes joins easier to read. As an example:
SELECT COUNT(1) AS NumberOfLineItems
FROM Client AS c
INNER JOIN Report AS r ON c.ClientId = r.ClientId
INNER JOIN LineItem AS li ON r.ReportId = li.ReportId
WHERE c.ClientId = 12

As a personal opinion, I would have:
clients
id
time_records
id
client_id
report
line_item
report_id
That way all of your fields are over in the time_records table. You can then do something like:
SELECT *
FROM `time_records`
WHERE `time_records`.`client_id` = 16542
AND `time_records`.`report` = 164652
ORDER BY `time_records`.`id` ASC