Optimizing sql join query, comparing query effectiveness


I'm a student working on a module for the Moodle CMS (course management system) at my college. I have to write some join queries for my module. I cannot make changes to the table structures; they are pretty much set in stone (I didn't make them, they were given to me).
I have no experience with writing queries for large databases. I've created a working prototype of my module and now I'm trying to organize the code/optimize queries etc.
Tasks:
| id | task |
--------------------
| 1 | task1 |
| 2 | task3 |
| 3 | task3 |
| 4 | task4 |
| ... | ... |
Assets:
| id | asset |
--------------------
| 1 | task1 |
| 2 | task3 |
| 3 | task3 |
| 4 | task4 |
| ... | ... |
TaskAsset:
| id | taskid | assetid | coefficient |
-----------------------------------------------
| 1 | 2 | 33 | coefficient1 |
| 2 | 5 | 35 | coefficient2 |
| 3 | 6 | 36 | coefficient3 |
| 4 | 8 | 37 | coefficient4 |
| 5 | ... | ... | ... |
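$query = "SELECT TaskAsset.id AS id, Assets.asset AS asset, Tasks.task AS task,
          TaskAsset.coefficient AS coefficient
          FROM Tasks, Assets, TaskAsset
          WHERE TaskAsset.taskid = Tasks.id AND TaskAsset.assetid = Assets.id";
// $link is assumed to be an open mysqli connection (the connection code isn't shown);
// the old mysql_* functions were removed in PHP 7.
$result = mysqli_query($link, $query) or die(mysqli_error($link));
while ($row = mysqli_fetch_assoc($result))
{
    echo $row['id'] . " - " . $row['asset'] . " - " . $row['task'] . " - " . $row['coefficient'];
    echo "<br />";
}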
Questions:
1.) So, if the table structures are like these, is my query efficient?
If it is, is a simple join still efficient when I have to join more tables, say 4 or 5?
2.) How do I rate the efficiency of queries? In phpMyAdmin, I can see how long the query took to run. I've never used anything else for this because my tables had very few records, so it did not matter.

The only thing that I would do differently is explicitly specify the joins.
$query = "SELECT ta.id as id, a.asset AS asset, t.task AS task
, coefficient
FROM TaskAsset ta
JOIN Tasks t ON ta.taskId = t.id
JOIN Assets a ON ta.assetId = a.id";
This does the same thing, but I personally like it a lot better. That said, you should run EXPLAIN on your query; that is where you'll see the pressure points.
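To check the claim that both forms return the same rows, here is a small sketch in Python. It uses the built-in sqlite3 module as a stand-in for MySQL, which is an assumption because the two engines differ in places. The sample rows are made up for illustration.

```python
import sqlite3

def build_db():
    # In-memory SQLite database standing in for MySQL; schema mirrors the question.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Tasks (id INTEGER PRIMARY KEY, task TEXT);
        CREATE TABLE Assets (id INTEGER PRIMARY KEY, asset TEXT);
        CREATE TABLE TaskAsset (id INTEGER PRIMARY KEY, taskid INTEGER,
                                assetid INTEGER, coefficient TEXT);
        INSERT INTO Tasks VALUES (2, 'task2'), (5, 'task5');
        INSERT INTO Assets VALUES (33, 'asset33'), (35, 'asset35');
        INSERT INTO TaskAsset VALUES (1, 2, 33, 'coefficient1'), (2, 5, 35, 'coefficient2');
    """)
    return conn

# Old-style comma join: the join conditions live in the WHERE clause.
COMMA_JOIN = """
    SELECT TaskAsset.id, Assets.asset, Tasks.task, coefficient
    FROM Tasks, Assets, TaskAsset
    WHERE TaskAsset.taskid = Tasks.id AND TaskAsset.assetid = Assets.id
    ORDER BY TaskAsset.id
"""

# Explicit ANSI join: each condition sits next to the table it joins.
ANSI_JOIN = """
    SELECT ta.id, a.asset, t.task, coefficient
    FROM TaskAsset ta
    JOIN Tasks t ON ta.taskid = t.id
    JOIN Assets a ON ta.assetid = a.id
    ORDER BY ta.id
"""

conn = build_db()
comma_rows = conn.execute(COMMA_JOIN).fetchall()
ansi_rows = conn.execute(ANSI_JOIN).fetchall()
print(comma_rows == ansi_rows)  # True
```

For inner joins, the two spellings are logically equivalent. The difference is readability, and the ANSI form makes it harder to forget a join condition.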

Your query is fine as is from an optimality standpoint, assuming indexes are present on the id fields of the tables. With the right indexes, you can join many more tables and the performance will still be good.
You should get familiar with the ANSI join syntax: it is much easier to read than the old FROM x, y, z ... style joins, and it's also more difficult to get wrong!

This query is appropriate for the results that you want.
TaskAsset is a mapping table that links rows of Tasks and Assets together through foreign keys. You need columns from all three tables in your result set, so this is the most efficient way to do it.

What might be even more important than the query itself is the indexing of the tables.
You are doing
SELECT ta.id AS id, a.asset AS asset, t.task AS task, coefficient
FROM TaskAsset ta
JOIN Tasks t ON ta.taskId = t.id     -- equi-join here
JOIN Assets a ON ta.assetId = a.id   -- another equi-join
This query has two equi-joins.
Always put indexes on fields involved in an equi-join.
Consider putting indexes on fields involved in a WHERE clause (this query doesn't have one, but that's beside the point).
Strongly consider putting an index on a field used in a GROUP BY clause.
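One way to see whether an index on an equi-join column is actually picked up is to look at the query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. The index name idx_taskasset_taskid is hypothetical, and the exact wording of the plan differs between SQLite versions and from MySQL's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tasks (id INTEGER PRIMARY KEY, task TEXT);
    CREATE TABLE TaskAsset (id INTEGER PRIMARY KEY, taskid INTEGER,
                            assetid INTEGER, coefficient TEXT);
    -- Index the TaskAsset side of the equi-join ta.taskid = t.id;
    -- t.id is already indexed because it is the primary key.
    CREATE INDEX idx_taskasset_taskid ON TaskAsset(taskid);
""")

def query_plan(sql):
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan = query_plan("""
    SELECT t.task, ta.coefficient
    FROM Tasks t
    JOIN TaskAsset ta ON ta.taskid = t.id
    WHERE t.id = 2
""")
for step in plan:
    print(step)  # expect a SEARCH step on ta that mentions idx_taskasset_taskid
```

On the MySQL side, the equivalent check is the key column of EXPLAIN for the TaskAsset row.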

Related

Left Join takes very long time on 150 000 rows

I am having some difficulty accomplishing a task.
Here is some data from the orders table:
+----+---------+
| id | bill_id |
+----+---------+
| 3 | 1 |
| 9 | 3 |
| 10 | 4 |
| 15 | 6 |
+----+---------+
And here is some data from a bills table:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+----+
I want to list all the bills that have no order associated with them.
To achieve that, I thought that a LEFT JOIN was appropriate, so I wrote this query:
SELECT * FROM bills
LEFT JOIN orders
ON bills.id = orders.bill_id
WHERE orders.bill_id IS NULL;
I thought that I would have the following result:
+----------+-----------+----------------+
| bills.id | orders.id | orders.bill_id |
+----------+-----------+----------------+
| 2 | NULL | NULL |
| 5 | NULL | NULL |
+----------+-----------+----------------+
But the query never finishes: it ran for more than 5 minutes without returning a result, and I stopped it because that can't be an acceptable production time anyway.
My real dataset has more than 150 000 orders and 100 000 bills. Is the dataset too big?
Is my query wrong somewhere?
Thank you very much for your tips!
EDIT: side note, the tables have no foreign keys defined... *flies away*

Your query is fine. I would use table aliases in writing it:
SELECT b.*
FROM bills b LEFT JOIN
orders o
ON b.id = o.bill_id
WHERE o.bill_id IS NULL;
You probably don't need the NULL columns from orders.
You need an index on orders(bill_id):
create index idx_orders_billid on orders(bill_id);
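Here is a small sketch of the answer's query on the question's sample rows, with SQLite standing in for MySQL (an assumption). It also shows a NOT EXISTS formulation. That is a commonly used alternative for anti-joins and is not part of the answer above. The plan check confirms that the orders side is probed through idx_orders_billid instead of being scanned once per bill.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bills (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, bill_id INTEGER);
    INSERT INTO bills VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO orders VALUES (3, 1), (9, 3), (10, 4), (15, 6);
    -- The index from the answer: each bill becomes an index lookup into orders.
    CREATE INDEX idx_orders_billid ON orders(bill_id);
""")

ANTI_JOIN = """
    SELECT b.id FROM bills b
    LEFT JOIN orders o ON b.id = o.bill_id
    WHERE o.bill_id IS NULL
    ORDER BY b.id
"""
left_join = conn.execute(ANTI_JOIN).fetchall()

# Same result expressed with NOT EXISTS.
not_exists = conn.execute("""
    SELECT b.id FROM bills b
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.bill_id = b.id)
    ORDER BY b.id
""").fetchall()

plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + ANTI_JOIN)]

print(left_join)                # [(2,), (5,)]
print(left_join == not_exists)  # True
```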

Judging by your WHERE clause, I assume you're looking for orders that have no bills.
If that's the case, you don't need to join to the bills table at all, since those bills by definition don't exist.
You will find
SELECT * FROM orders
WHERE orders.bill_id IS NULL;
to be a much better performing query.
Edit:
Sorry, I missed your "I want to list all the bills that have no order associated with them." when reading the question. As @Gordon pointed out, an index would certainly help. However, if changing the schema is feasible, I would rather have a nullable bills.order_id column instead of orders.bill_id. Then you wouldn't need a left join, and an inner join would suffice to get the bills that have orders, which would also be a quicker query for your other presumed requirements.

MySQL Intermediate-Level Table Relationship

Each row in Table_1 needs to have a relationship with one or more rows that might come from any number of other tables in the database (Table_X). So I set up an intermediate table (Table_2) where each row contains an id from Table_1 and the id from Table_X. It also has its own auto-increment id, since none of the relationships will be exclusive and therefore neither of the other ids will be unique in the table.
My problem now is that when I retrieve a row from Table_1 and want to see the information from each related row in Table_X, I don't know how to get it. At first I thought I could add a column holding the exact name of Table_X to each row in Table_2 and run a second SELECT statement using that information, but I've been seeing hints about things such as foreign keys and join statements that I think I need to get into. I'm just having trouble sorting it all out. Do I even need Table_2?
This probably isn't overly complicated, but I'm just getting into MySQL and this is the first real challenge I've encountered.
Edit to include requested information: If I understand correctly, I think I'm dealing with a many to many relationship. Table_3 has games; Table_1 has articles. An article can be about multiple games, and a game can also have multiple articles written about it. The only other possibly pertinent information I can see is that when a new article is made, every game that will be related to it is decided all at once. But the list of articles related to a given game can grow over time as more articles are written. That's probably not especially important, however.

If I understood correctly, you are talking about a one-to-many relationship in a database (for example, one person can have multiple phone numbers). You can store the data in two separate tables, persons and phones:
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Phones:
|phone_id |person_id |phone_number|
| 1 | 1 | 31337 |
| 2 | 1 | 370 |
Then you can run a query with a JOIN:
SELECT * FROM `persons`
LEFT JOIN `phones` ON `persons`.`person_id` = `phones`.`person_id`
WHERE `persons`.`person_id` = 1;
It will return the person together with each of their phone numbers:
|person_id|person_name |person_age |phone_id |person_id |phone_number|
| 1 | Bodan Kustan| 28 | 1 | 1 | 31337 |
| 1 | Bodan Kustan| 28 | 2 | 1 | 370 |
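Here is a runnable sketch of the one-to-many example above, with SQLite standing in for MySQL (an assumption). Phone numbers are stored as TEXT here, which is a design choice not taken from the answer, so that leading zeros would survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT, person_age INTEGER);
    -- Each phone row points back to exactly one person (the "many" side).
    CREATE TABLE phones (phone_id INTEGER PRIMARY KEY,
                         person_id INTEGER REFERENCES persons(person_id),
                         phone_number TEXT);
    INSERT INTO persons VALUES (1, 'Bodan Kustan', 28);
    INSERT INTO phones VALUES (1, 1, '31337'), (2, 1, '370');
""")

rows = conn.execute("""
    SELECT p.person_id, p.person_name, ph.phone_number
    FROM persons p
    LEFT JOIN phones ph ON p.person_id = ph.person_id
    WHERE p.person_id = 1
    ORDER BY ph.phone_id
""").fetchall()
for row in rows:
    print(row)
# (1, 'Bodan Kustan', '31337')
# (1, 'Bodan Kustan', '370')
```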
Another possibility is a many-to-many relationship (for example, any person can love pizza, and pizza is not unique to that person). Then you need a third table, person_food, to join the tables together:
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Food:
|food_id |food_name |
| 1 | meat |
| 2 | pizza |
Person_Food
|person_id |food_id |
| 1 | 2 |
Then you can run a query with a JOIN:
SELECT * FROM `persons`
LEFT JOIN `person_food` ON `persons`.`person_id` = `person_food`.`person_id`
LEFT JOIN `food` ON `food`.`food_id` = `person_food`.`food_id`
WHERE `persons`.`person_id` = 1;
And it will return data from all tables:
|person_id|person_name |person_age |person_id |food_id |food_name |
| 1 | Bodan Kustan| 28 | 1 | 2 | pizza |
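Here is a sketch of the many-to-many case, which matches the articles/games situation in the question. It uses SQLite as a stand-in for MySQL (an assumption). One design choice is not from the answer: the junction table uses a composite primary key on (person_id, food_id). That prevents duplicate pairs, so the separate auto-increment id described in the question isn't strictly needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT, person_age INTEGER);
    CREATE TABLE food (food_id INTEGER PRIMARY KEY, food_name TEXT);
    -- Junction table: one row per (person, food) pair.
    CREATE TABLE person_food (
        person_id INTEGER REFERENCES persons(person_id),
        food_id   INTEGER REFERENCES food(food_id),
        PRIMARY KEY (person_id, food_id)
    );
    INSERT INTO persons VALUES (1, 'Bodan Kustan', 28);
    INSERT INTO food VALUES (1, 'meat'), (2, 'pizza');
    INSERT INTO person_food VALUES (1, 2);
""")

rows = conn.execute("""
    SELECT p.person_name, f.food_name
    FROM persons p
    LEFT JOIN person_food pf ON p.person_id = pf.person_id
    LEFT JOIN food f ON f.food_id = pf.food_id
    WHERE p.person_id = 1
""").fetchall()
print(rows)  # [('Bodan Kustan', 'pizza')]

# The composite key rejects storing the same pair twice.
try:
    conn.execute("INSERT INTO person_food VALUES (1, 2)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```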
Sometimes, however, you need to join an arbitrary number of tables. Then you could use a separate table to hold information about each relation. My approach (I don't think it's the best) would be to store the table name next to the relation (for example, with mobile phones and home phones split into two separate tables):
Persons:
|person_id|person_name |person_age |
| 1 | Bodan Kustan| 28 |
Mobile_Phone:
|mobile_phone_id |mobile_phone_number |
| 1 | 31337 |
Home_Phone:
|home_phone_id |home_phone_number |
| 1 | 370 |
Person_Phone:
|person_id |related_id |related_column |related_table |
| 1 | 1 | mobile_phone_id | mobile_phone |
| 1 | 1 | home_phone_id | home_phone |
Then query the middle table to get all the relations:
SELECT * FROM person_phone WHERE person_id = 1
Then build the dynamic query (pseudo code, not tested -- might not work):
final_sql = "SELECT * FROM `persons` "
          + "LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id` "
foreach (distinct (related_table, related_column) in results)
    final_sql += "LEFT JOIN `{related_table}` "
               + "ON `{related_table}`.`{related_column}` = `person_phone`.`related_id` "
               + "AND `person_phone`.`related_table` = '{related_table}' "
final_sql += "WHERE `persons`.`person_id` = 1"
So your final SQL would be:
SELECT * FROM `persons`
LEFT JOIN `person_phone` ON `person_phone`.`person_id` = `persons`.`person_id`
LEFT JOIN `mobile_phone` ON `mobile_phone`.`mobile_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'mobile_phone'
LEFT JOIN `home_phone` ON `home_phone`.`home_phone_id` = `person_phone`.`related_id` AND `person_phone`.`related_table` = 'home_phone'
WHERE `persons`.`person_id` = 1
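The dynamic-query step can be sketched in Python with SQLite standing in for MySQL (an assumption). Table and column names cannot be passed as bound parameters, so this sketch checks every name read from person_phone against a whitelist before pasting it into the SQL. The whitelist ALLOWED_RELATIONS and the helper build_person_query are hypothetical names for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (person_id INTEGER PRIMARY KEY, person_name TEXT, person_age INTEGER);
    CREATE TABLE mobile_phone (mobile_phone_id INTEGER PRIMARY KEY, mobile_phone_number TEXT);
    CREATE TABLE home_phone (home_phone_id INTEGER PRIMARY KEY, home_phone_number TEXT);
    CREATE TABLE person_phone (person_id INTEGER, related_id INTEGER,
                               related_column TEXT, related_table TEXT);
    INSERT INTO persons VALUES (1, 'Bodan Kustan', 28);
    INSERT INTO mobile_phone VALUES (1, '31337');
    INSERT INTO home_phone VALUES (1, '370');
    INSERT INTO person_phone VALUES (1, 1, 'mobile_phone_id', 'mobile_phone'),
                                    (1, 1, 'home_phone_id', 'home_phone');
""")

# Hypothetical whitelist of (table, key column) pairs person_phone may point at.
ALLOWED_RELATIONS = {("mobile_phone", "mobile_phone_id"),
                     ("home_phone", "home_phone_id")}

def build_person_query(conn, person_id):
    # Step 1: read which tables this person has relations in.
    relations = conn.execute(
        "SELECT DISTINCT related_table, related_column FROM person_phone "
        "WHERE person_id = ? ORDER BY related_table",
        (person_id,),
    ).fetchall()
    parts = ["SELECT * FROM persons",
             "LEFT JOIN person_phone ON person_phone.person_id = persons.person_id"]
    # Step 2: add one LEFT JOIN per related table, after validating the names.
    for table, column in relations:
        if (table, column) not in ALLOWED_RELATIONS:
            raise ValueError(f"unexpected relation: {table}.{column}")
        parts.append(f"LEFT JOIN {table} ON {table}.{column} = person_phone.related_id "
                     f"AND person_phone.related_table = '{table}'")
    parts.append("WHERE persons.person_id = ? ORDER BY person_phone.related_table")
    return "\n".join(parts)

query = build_person_query(conn, 1)
conn.row_factory = sqlite3.Row  # access result columns by name
rows = conn.execute(query, (1,)).fetchall()
for row in rows:
    print(row["related_table"], row["home_phone_number"], row["mobile_phone_number"])
# home_phone 370 None
# mobile_phone None 31337
```

Each person_phone row matches at most one of the joined phone tables. The other phone columns in that row come back as NULL.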

You only need Table2 if entries in Table_X can be related to multiple rows in Table1; otherwise, a simple key for Table1 (stored in Table_X) will suffice.
Look into joins: they are very powerful, flexible and fast.
SELECT * FROM Table1 LEFT JOIN Table2 ON Table1.id = Table2.table_1_id
LEFT JOIN Table_X ON Table_X.id = Table2.table_x_id
Look at the output and you'll see that it returns all table_x rows with copies of the Table1 and Table2 fields.

Switching Raw greatest-n-per-group MySQL query to Laravel query builder

I want to move a raw mysql query into Laravel 4's query builder, or preferably Eloquent.
The Setup
A database for storing discount keys for games.
Discount keys are stored in key sets where each key set is associated with one game (a game can have multiple keysets).
The following query is intended to return a table of key sets and relevant data, for viewing on an admin page.
The 'keys used so far' value is calculated by a scheduled event and periodically stored/updated in log entries in the table keySetLogs (it's smart enough to log data only when the count changes).
We want to show the most up-to-date value of 'keys used', which is a 'greatest-n-per-group' problem.
The Raw Query
SELECT
`logs`.`id_keySet`,
`games`.`name`,
`kset`.`discount`,
`kset`.`keys_total`,
`logs`.`keys_used`
FROM `keySets` AS `kset`
INNER JOIN
(
SELECT
`ksl1`.*
FROM `keySetLogs` AS `ksl1`
LEFT OUTER JOIN `keySetLogs` AS `ksl2`
ON (`ksl1`.`id_keySet` = `ksl2`.`id_keySet` AND `ksl1`.`set_at` < `ksl2`.`set_at`)
WHERE `ksl2`.`id_keySet` IS NULL
ORDER BY `id_keySet`
)
AS `logs`
ON `logs`.`id_keySet` = `kset`.`id`
INNER JOIN `games`
ON `games`.`id` = `kset`.`id_game`
ORDER BY `kset`.`id_game` ASC, `kset`.`discount` DESC
Note: the nested query gets the most up-to-date keys_used value from the logs. This greatest-n-per-group code is used as discussed in this question.
Example Output:
+-----------+-------------+----------+------------+-----------+
| id_keySet | name | discount | keys_total | keys_used |
+-----------+-------------+----------+------------+-----------+
| 5 | Test_Game_1 | 100.00 | 10 | 4 |
| 6 | Test_Game_1 | 50.00 | 100 | 20 |
| 3 | Test_Game_2 | 100.00 | 10 | 8 |
| 4 | Test_Game_2 | 50.00 | 100 | 14 |
| 1 | Test_Game_3 | 100.00 | 10 | 1 |
| 2 | Test_Game_3 | 50.00 | 100 | 5 |
...
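The core of the nested query is the self-join greatest-n-per-group pattern. Here is a minimal sketch of it on made-up log rows, with SQLite standing in for MySQL (an assumption). Timestamps are ISO-formatted text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keySetLogs (id INTEGER PRIMARY KEY, id_keySet INTEGER,
                             keys_used INTEGER, set_at TEXT);
    INSERT INTO keySetLogs (id_keySet, keys_used, set_at) VALUES
        (5, 1, '2014-01-01 10:00'), (5, 4, '2014-01-02 10:00'),
        (6, 12, '2014-01-01 10:00'), (6, 20, '2014-01-03 10:00');
""")

# For each log row ksl1, try to find a newer row ksl2 for the same key set.
# Rows with no newer sibling (ksl2 IS NULL) are the latest per key set.
latest = conn.execute("""
    SELECT ksl1.id_keySet, ksl1.keys_used
    FROM keySetLogs AS ksl1
    LEFT OUTER JOIN keySetLogs AS ksl2
      ON ksl1.id_keySet = ksl2.id_keySet AND ksl1.set_at < ksl2.set_at
    WHERE ksl2.id_keySet IS NULL
    ORDER BY ksl1.id_keySet
""").fetchall()
print(latest)  # [(5, 4), (6, 20)]
```

If two logs of the same key set share the same set_at, both survive the filter. An index on keySetLogs(id_keySet, set_at) is the usual way to keep the self-join cheap.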
The Question(s)
I have KeySet, KeySetLog and Game Eloquent Models created with relationship functions set up.
How would I write the nested query in query builder?
Is it possible to write the query entirely with eloquent (no manually writing joins)?

I don't know Laravel or Eloquent so I probably shouldn't comment, but if performance isn't at stake then it seems to me that this query could be rewritten something like this:
SELECT ksl1.id_keySet
, g.name
, k.discount
, k.keys_total
, ksl1.keys_used
FROM keySetLogs ksl1
LEFT
JOIN keySetLogs ksl2
ON ksl1.id_keySet = ksl2.id_keySet
AND ksl1.set_at < ksl2.set_at
LEFT
JOIN keySets k
ON k.id = ksl1.id_keySet
LEFT
JOIN games g
ON g.id = k.id_game
WHERE ksl2.id_keySet IS NULL
ORDER
BY k.id_game ASC
, k.discount DESC
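To sanity-check that the flattened rewrite matches the original nested query, here is a sketch that runs both on made-up sample rows. SQLite stands in for MySQL (an assumption). The pointless ORDER BY inside the derived table is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keySets (id INTEGER PRIMARY KEY, id_game INTEGER,
                          discount REAL, keys_total INTEGER);
    CREATE TABLE keySetLogs (id INTEGER PRIMARY KEY, id_keySet INTEGER,
                             keys_used INTEGER, set_at TEXT);
    INSERT INTO games VALUES (1, 'Test_Game_1'), (2, 'Test_Game_2');
    INSERT INTO keySets VALUES (5, 1, 100.00, 10), (6, 1, 50.00, 100),
                               (3, 2, 100.00, 10), (4, 2, 50.00, 100);
    INSERT INTO keySetLogs (id_keySet, keys_used, set_at) VALUES
        (5, 2, '2014-01-01'), (5, 4, '2014-01-02'), (6, 20, '2014-01-01'),
        (3, 6, '2014-01-02'), (3, 8, '2014-01-03'), (4, 14, '2014-01-01');
""")

# Original shape: latest logs in a derived table, then INNER JOINs.
NESTED = """
    SELECT logs.id_keySet, games.name, kset.discount, kset.keys_total, logs.keys_used
    FROM keySets AS kset
    INNER JOIN (
        SELECT ksl1.* FROM keySetLogs AS ksl1
        LEFT OUTER JOIN keySetLogs AS ksl2
          ON ksl1.id_keySet = ksl2.id_keySet AND ksl1.set_at < ksl2.set_at
        WHERE ksl2.id_keySet IS NULL
    ) AS logs ON logs.id_keySet = kset.id
    INNER JOIN games ON games.id = kset.id_game
    ORDER BY kset.id_game ASC, kset.discount DESC
"""

# Flattened rewrite from the answer (with the alias fixed to ksl1).
FLAT = """
    SELECT ksl1.id_keySet, g.name, k.discount, k.keys_total, ksl1.keys_used
    FROM keySetLogs ksl1
    LEFT JOIN keySetLogs ksl2
      ON ksl1.id_keySet = ksl2.id_keySet AND ksl1.set_at < ksl2.set_at
    LEFT JOIN keySets k ON k.id = ksl1.id_keySet
    LEFT JOIN games g ON g.id = k.id_game
    WHERE ksl2.id_keySet IS NULL
    ORDER BY k.id_game ASC, k.discount DESC
"""

nested_rows = conn.execute(NESTED).fetchall()
flat_rows = conn.execute(FLAT).fetchall()
print(nested_rows == flat_rows)  # True
```

The two can differ if a log row points at a missing key set or game. The rewrite's LEFT JOINs keep such rows, while the original INNER JOINs drop them.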

Is this good Database Normalization?

I am a beginner at using mysql and I am trying to learn the best practices. I have setup a similar structure as seen below.
(main table that contains all unique entries) TABLE = 'main_content'
+------------+---------------+------------------------------+-----------+
| content_id | (deleted) | title | member_id |
+------------+---------------+------------------------------+-----------+
| 6 | | This is a very spe?cal t|_st | 1 |
+------------+---------------+------------------------------+-----------+
(Provides the total of each difficulty and joins id --> actual name) TABLE = 'difficulty'
+---------------+-------------------+------------------+
| difficulty_id | difficulty_name | difficulty_total |
+---------------+-------------------+------------------+
| 1 | Absolute Beginner | 1 |
| 2 | Beginner | 1 |
| 3 | Intermediate | 0 |
| 4 | Advanced | 0 |
| 5 | Expert | 0 |
+---------------+-------------------+------------------+
(This table ensures that multiple values can be inserted for each entry. For example,
this specific entry indicates that there are 2 difficulties associated with the submission)
TABLE = 'lookup_difficulty'
+------------+---------------+
| content_id | difficulty_id |
+------------+---------------+
| 6 | 1 |
| 6 | 2 |
+------------+---------------+
I am joining all of this into a readable query:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN difficulty ON difficulty.difficulty_id
IN (SELECT difficulty_id FROM main_content, lookup_difficulty WHERE lookup_difficulty.content_id = main_content.content_id )
INNER JOIN member ON member.member_id = main_content.member_id
The above works fine, but I am wondering if this is good practice. I practically followed the structure laid out in Wikipedia's Database Normalization example.
When I run the above query using EXPLAIN, it says 'Using where; Using join buffer' and also that I am using two DEPENDENT SUBQUERY entries. I don't see any way to achieve the same effect without subqueries, but then again I'm a noob, so perhaps there is a better way...

The DB design looks fine. Regarding your query, you could rewrite it using only joins, like this:
SELECT group_concat(difficulty.difficulty_name) as difficulty, member.member_name
FROM main_content
INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
INNER JOIN member ON member.member_id = main_content.member_id
GROUP BY main_content.content_id, member.member_name
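Here is a runnable sketch of the join-only version, with SQLite standing in for MySQL (an assumption). The question references a member table without showing it, so its columns and the member_name value here are hypothetical. SQLite's group_concat does not guarantee element order, so the result is sorted before comparing. In MySQL you could write GROUP_CONCAT(... ORDER BY ...) instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (member_id INTEGER PRIMARY KEY, member_name TEXT);
    CREATE TABLE main_content (content_id INTEGER PRIMARY KEY, title TEXT, member_id INTEGER);
    CREATE TABLE difficulty (difficulty_id INTEGER PRIMARY KEY, difficulty_name TEXT,
                             difficulty_total INTEGER);
    CREATE TABLE lookup_difficulty (content_id INTEGER, difficulty_id INTEGER,
                                    PRIMARY KEY (content_id, difficulty_id));
    INSERT INTO member VALUES (1, 'some_member');
    INSERT INTO main_content VALUES (6, 'A test title', 1);
    INSERT INTO difficulty VALUES (1, 'Absolute Beginner', 1), (2, 'Beginner', 1),
                                  (3, 'Intermediate', 0);
    INSERT INTO lookup_difficulty VALUES (6, 1), (6, 2);
""")

# Plain joins through the lookup table; GROUP BY gives one row per content item.
rows = conn.execute("""
    SELECT main_content.content_id,
           group_concat(difficulty.difficulty_name) AS difficulty,
           member.member_name
    FROM main_content
    INNER JOIN lookup_difficulty ON main_content.content_id = lookup_difficulty.content_id
    INNER JOIN difficulty ON difficulty.difficulty_id = lookup_difficulty.difficulty_id
    INNER JOIN member ON member.member_id = main_content.member_id
    GROUP BY main_content.content_id, member.member_name
""").fetchall()
content_id, difficulties, member_name = rows[0]
print(content_id, sorted(difficulties.split(",")), member_name)
# 6 ['Absolute Beginner', 'Beginner'] some_member
```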

If lookup_difficulty provides the link between content and difficulty, I would suggest you take the difficulty_id column out of your main_content table. Since you can have multiple lookups for each content_id, you would need some extra business logic to determine which difficulty_id to put in your main_content table (for example, the biggest, the smallest or a random value). The alternative is multiple entries in the main_content table, one for each difficulty_id, but that goes against normalization practices. In either case, it does not make much sense.
Other than that the table looks fine.
Update
Saw you updated the table :)
Just as a side note: using IN with a subquery can slow down your query, because it can cause a table scan. At least it used to be that way; I expect the optimizer handles it pretty well these days.

How can I use rows in a lookup table as columns in a MySQL query?

I'm trying to build a MySQL query that uses the rows in a lookup table as the columns in my result set.
LookupTable
id | AnalysisString
1 | color
2 | size
3 | weight
4 | speed
ScoreTable
id | lookupID | score | customerID
1 | 1 | A | 1
2 | 2 | C | 1
3 | 4 | B | 1
4 | 2 | A | 2
5 | 3 | A | 2
6 | 1 | A | 3
7 | 2 | F | 3
I'd like a query that would use the relevant lookupTable rows as columns in a query so that I can get a result like this:
customerID | color | size | weight | speed
1          | A     | C    |        | B
2          |       | A    | A      |
3          | A     | F    |        |
The kicker of the problem is that there may be additional rows added to the LookupTable and the query should be dynamic and not have the Lookup IDs hardcoded. That is, this will work:
SELECT st.customerID,
(SELECT st1.score FROM ScoreTable st1 WHERE lookupID=1 AND st.customerID = st1.customerID) AS color,
(SELECT st1.score FROM ScoreTable st1 WHERE lookupID=2 AND st.customerID = st1.customerID) AS size,
(SELECT st1.score FROM ScoreTable st1 WHERE lookupID=3 AND st.customerID = st1.customerID) AS weight,
(SELECT st1.score FROM ScoreTable st1 WHERE lookupID=4 AND st.customerID = st1.customerID) AS speed
FROM ScoreTable st
GROUP BY st.customerID
Until there is a fifth row added to the LookupTable . . .
Perhaps I'm breaking the whole relational model and will have to resolve this in the backend PHP code?
Thanks for pointers/guidance.
tom

You have architected an EAV database. Prepare for a lot of pain when it comes to maintainability, efficiency and correctness. "This is one of the design anomalies in data modeling." (http://decipherinfosys.wordpress.com/2007/01/29/name-value-pair-design/)
The best solution would be to redesign the database into something more normal.

What you are trying to do is generally referred to as a cross-tabulation, or cross-tab, query. Some DBMSs support cross-tabs directly, but MySQL isn't one of them, AFAIK (there's a blog entry here depicting the arduous process of simulating the effect).
Three options come to mind for dealing with this:
Don't cross-tab at all. Instead, sort the output by customerID, then AnalysisString, and generate the tabular output in your programming language.
Generate code on the fly in your programming language to emit the appropriate query.
Follow the blog I mention above to implement a server-side solution.
Also consider @Marek's answer, which suggests that you might be better off restructuring your schema. The advice is not a given, however. Sometimes a key-value model is appropriate for the problem at hand.
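The second option (generating the query on the fly) can be sketched like this. The code reads LookupTable and emits one MAX(CASE ...) column per row, so a fifth lookup row just adds a column. SQLite stands in for MySQL here (an assumption; MySQL would quote identifiers with backticks rather than double quotes). The helpers quote_ident and build_crosstab_sql are hypothetical names. The sketch also assumes each customer has at most one score per lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LookupTable (id INTEGER PRIMARY KEY, AnalysisString TEXT);
    CREATE TABLE ScoreTable (id INTEGER PRIMARY KEY, lookupID INTEGER,
                             score TEXT, customerID INTEGER);
    INSERT INTO LookupTable VALUES (1, 'color'), (2, 'size'), (3, 'weight'), (4, 'speed');
    INSERT INTO ScoreTable VALUES (1, 1, 'A', 1), (2, 2, 'C', 1), (3, 4, 'B', 1),
                                  (4, 2, 'A', 2), (5, 3, 'A', 2),
                                  (6, 1, 'A', 3), (7, 2, 'F', 3);
""")

def quote_ident(name):
    # Double embedded quotes so an arbitrary AnalysisString can't break the SQL.
    return '"' + name.replace('"', '""') + '"'

def build_crosstab_sql(conn):
    lookups = conn.execute("SELECT id, AnalysisString FROM LookupTable ORDER BY id").fetchall()
    # One column per lookup row; MAX picks the single non-NULL score in each group.
    columns = [f"MAX(CASE WHEN lookupID = {int(lookup_id)} THEN score END) AS {quote_ident(name)}"
               for lookup_id, name in lookups]
    return ("SELECT customerID, " + ", ".join(columns) +
            " FROM ScoreTable GROUP BY customerID ORDER BY customerID")

cursor = conn.execute(build_crosstab_sql(conn))
header = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(header)  # ['customerID', 'color', 'size', 'weight', 'speed']
for row in rows:
    print(row)
# (1, 'A', 'C', None, 'B')
# (2, None, 'A', 'A', None)
# (3, 'A', 'F', None, None)

# A new lookup row shows up as a new column without touching the code.
conn.execute("INSERT INTO LookupTable VALUES (5, 'height')")
extended_header = [d[0] for d in conn.execute(build_crosstab_sql(conn)).description]
print(extended_header[-1])  # height
```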
