Return null values in conditional join - mysql

I have a settings table and a settings_values table that cross matches the value for every user. It's important that I return a NULL value in cases where the setting has not been turned on for a user. It's also important that the table is highly efficient as it will be called often.
Ultimately, I have about 50 settings and thousands of users, so every time I run the query I should get 50 result similar to this:
setting_id | setting_name | value | user_id
1 blue true 35
2 red NULL NULL
3 yellow false 35
4 brown true 35
5 black NULL NULL
I have a working solution below that's running, but it's creating a bottleneck in my database requests which is strange given that settings only has 50 rows and settings_values only has about 20,000.
My tables are structured similar to this:
settings:
setting_id | setting_name
With setting_id as the only index (Primary Key)
settings_values:
id | setting_id | value | user_id
With id being the Primary Key, setting_id being a foregin key reference to the previous table, value is varchar and user_id also has an index on it.
My current working code is the following:
SELECT
s.setting_id,
s.setting_name,
sv.value,
sv.user_id
FROM
cmssps_settings s
LEFT JOIN cmssps_settings_values sv ON
s.setting_id = sg.setting_id
AND sv.user_id = '35';
That seems to work OK apart from the bottleneck (I get my 5 rows returned), and I suspect its the JOIN ON AND that's causing the slow down. I've been trying to achieve the same result with a JOIN ON WHERE like this, but I don't get my required NULL results (i.e. it only returns the 3 rows that satisfy the WHERE):
SELECT
s.setting_id,
s.setting_name,
sv.value,
sv.user_id
FROM
cmssps_settings s
LEFT JOIN cmssps_settings_values sv ON
s.setting_id = sg.setting_id
WHERE sv.user_id = '35';
When I EXPLAIN both I get this for the former:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | s | ALL | [NULL] | [NULL]| [NULL] | [NULL]| 49 | [NULL]
1 | SIMPLE | sv | ref | user_id | branch_id | 5 | const | 24 | Using where
When I do it for the latter I get a key value for my first table which I assume indicates that it will excecute faster:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | s | index | [NULL] | PRIMARY | 4 | [NULL]| 49 | [NULL]
1 | SIMPLE | sv | ref | user_id | branch_id | 5 | const | 24 | Using where
I've also attempted nesting the queries and using UNION but that seems like a regression.
Is there anything obvious as to why my initial approach is having such a negative impact on performance?
Is there a more efficient way to achieve the same outcome?

Related

MySQL query randomly hangs on 'sending data' status?

I am using a MySQL 5.5.25 server, and InnoDB for my databases.
Quite often the CPU of the server is working at 100% due to a mysqld process for roughly a minute. Using SHOW PROCESSLIST:
Command | Time | State | Info
Query | 100 | Sending data | SELECT a.prefix, a...
Query | 107 | Sending data | SELECT a.prefix, a...
Query | 50 | Sending data | SELECT a.prefix, a...
The problematic query is:
SELECT a.prefix, a.test_id, a.method_id, b.test_id
FROM a
LEFT JOIN b ON b.test_id = a.test_id
AND user_id = ?
AND year = ?
All these columns are INDEXED, thus this ain't the problem. Also, when I run the query in phpMyAdmin (with a sufficient LIMIT), it takes 0.05 seconds to complete. Also, it is quite frustrating that I find it to be impossible to reproduce this problem myself, even when executing this query twice simultaneously and spamming it only gives me spikes to 40% CPU.
Prefixing the query with EXPLAIN results in:
Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 1169 |
1 | SIMPLE | b | ref | user_id,year,test_id | test_id | 4 | db.a.test_id | 58 |
Unless I cannot see what is right in front of me, I am looking for ways to discover how to debug this kind of problem. I do already log all queries with their time to execute, and their values etc. But since I cannot reproduce it, I am stuck. What should I do to figure out what this problem is about?
Wow, thanks to Masood Alam's comment I found and solved the problem.
Using SHOW ENGINE INNODB STATUS turns out to be very helpful in debugging.
It turns out that with some values for user_id the database handles things differently, hence its unpredictable behaviour and not being able to reproduce the problem. Through the status command I was able to copy and paste the exact query that was running at the moment.
This are the results of EXPLAIN of that particular value for user_id:
Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 841 |
1 | SIMPLE | b | index_merge | user_id,year,test_id | user_id,year | 4,4 | NULL | 13 | Using intersect(user_id,year); Using where
A follow-up question would be now be how this behaviour can be explained. However, to fix the problem for me was just to change the query to:
SELECT a.prefix, a.test_id, a.method_id, b.test_id
FROM a
LEFT JOIN b ON b.test_id = a.test_id
WHERE
b.id IS NULL
OR user_id = ?
AND year = ?
Of which EXPLAIN results in:
Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | a | ALL | NULL | NULL | NULL | NULL | 671 |
1 | SIMPLE | b | ref | test_id | test_id | 4 | db.a.test_id | 49 | Using where
Now I know, that InnoDB can have different approaches for executing queries given different input values. Why? I still don't know.
SENDING DATA is an indication for a long-running query no matter which engine you use and as such is very bad.
In your case no index is used on table a in both cases and you are lucky that this table only has a couple hundred records. No index or ALL (full table scan) is something to be avoided completely. Remember, things are different when you have thousands or even millions of records in your table.
Create an index modeled after the query in case. Use EXPLAIN again to see if the engine uses this index. If not, you may use a hint like USE INDEX(myindex) or FORCE INDEX (myindex). The result should be stunning.
If not, your index might not be as good as you think. Reiterate the whole process.
Remember, though, if you change the query you work with this index might no longer be appropriate, so again you have to reiterate.

Optimizing / improving a slow mysql query - indexing? reorganizing?

First off, I've looked at several other questions about optimizing sql queries, but I'm still unclear for my situation what is causing my problem. I read a few articles on the topic as well and have tried implementing a couple possible solutions, as I'll describe below, but nothing has yet worked or even made an appreciable dent in the problem.
The application is a nutrition tracking system - users enter the foods they eat and based on an imported USDA database the application breaks down the foods to the individual nutrients and gives the user a breakdown of the nutrient quantities on a (for now) daily basis.
here's
A PDF of the abbreviated database schema
and here it is as a (perhaps poor quality) JPG. I made this in open office - if there are suggestions for better ways to visualize a database, I'm open to suggestions on that front as well! The blue tables are directly from the USDA, and the green and black tables are ones I've made. I've omitted a lot of data in order to not clutter things up unnecessarily.
Here's the query I'm trying to run that takes a very long time:
SELECT listing.date_time,listing.nutrdesc,data.total_nutr_mass,listing.units
FROM
(SELECT nutrdesc, nutr_no, date_time, units
FROM meals, nutr_def
WHERE meals.users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
AND (nutr_no <100000
OR nutr_no IN
(SELECT nutr_def_nutr_no
FROM nutr_rights
WHERE nutr_rights.users_userid = '2'))
) as listing
LEFT JOIN
(SELECT nutrdesc, date_time, nut_data.nutr_no, sum(ingred_gram_mass*entry_qty_num*nutr_val/100) AS total_nutr_mass
FROM nut_data, recipe_ingredients, food_entries, meals, nutr_def
WHERE nut_data.nutr_no = nutr_def.nutr_no
AND ndb_no = ingred_ndb_no
AND foods_food_id = entry_ident
AND meals_meal_id = meal_id
AND users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
GROUP BY date_time,nut_data.nutr_no ) as data
ON data.date_time = listing.date_time
AND listing.nutr_no = data.nutr_no
ORDER BY listing.date_time,listing.nutrdesc,listing.units
So I know that's rather complex - The first select gets a listing of all the nutrients that the user consumed within the given date range, and the second fills in all the quantities.
When I implement them separately, the first query is really fast, but the second is slow and gets very slow when the date ranges get large. The join makes the whole thing ridiculously slow. I know that the 'main' problem is the join between these two derived tables, and I can get rid of that and do the join by hand basically in php much faster, but I'm not convinced that's the whole story.
For example: for 1 month of data, the query takes about 8 seconds, which is slow, but not completely terrible. Separately, each query takes ~.01 and ~2 seconds respectively. 2 seconds still seems high to me.
If I try to retrieve a year's worth of data, it takes several (>10) minutes to run the whole query, which is problematic - the client-server connection sometimes times out, and in any case we don't want I don't want to sit there with a spinning 'please wait' icon. Mainly, I feel like there's a problem because it takes more than 12x as long to retrieve 12x more information, when it should take less than 12x as long, if I were doing things right.
Here's the 'explain' for each of the slow queries: (the whole thing, and just the second half).
Whole thing:
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 5053 | Using temporary; Using filesort |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 4341 | |
| 4 | DERIVED | meals | range | PRIMARY,day_ind | day_ind | 9 | NULL | 30 | Using where; Using temporary; Using filesort |
| 4 | DERIVED | food_entries | ref | meals_meal_id | meals_meal_id | 5 | nutrition.meals.meal_id | 15 | Using where |
| 4 | DERIVED | recipe_ingredients | ref | foods_food_id,ingred_ndb_no | foods_food_id | 4 | nutrition.food_entries.entry_ident | 2 | |
| 4 | DERIVED | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | |
| 4 | DERIVED | nut_data | ref | PRIMARY | PRIMARY | 36 | nutrition.nutr_def.nutr_no,nutrition.recipe_ingredients.ingred_ndb_no | 1 | |
| 2 | DERIVED | meals | range | day_ind | day_ind | 9 | NULL | 30 | Using where |
| 2 | DERIVED | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | Using where |
| 3 | DEPENDENT SUBQUERY | nutr_rights | index_subquery | users_userid,nutr_def_nutr_no | nutr_def_nutr_no | 19 | func | 1 | Using index; Using where |
+----+--------------------+--------------------+----------------+-------------------------------+------------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
10 rows in set (2.82 sec)
Second chunk (data):
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | meals | range | PRIMARY,day_ind | day_ind | 9 | NULL | 30 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | food_entries | ref | meals_meal_id | meals_meal_id | 5 | nutrition.meals.meal_id | 15 | Using where |
| 1 | SIMPLE | recipe_ingredients | ref | foods_food_id,ingred_ndb_no | foods_food_id | 4 | nutrition.food_entries.entry_ident | 2 | |
| 1 | SIMPLE | nutr_def | ALL | PRIMARY | NULL | NULL | NULL | 174 | |
| 1 | SIMPLE | nut_data | ref | PRIMARY | PRIMARY | 36 | nutrition.nutr_def.nutr_no,nutrition.recipe_ingredients.ingred_ndb_no | 1 | |
+----+-------------+--------------------+-------+-----------------------------+---------------+---------+-----------------------------------------------------------------------+------+----------------------------------------------+
5 rows in set (0.00 sec)
I've 'analyzed' all the tables involved in the query, and added an index on the datetime field that is joining meals and food entries. I called it 'day_ind'. I hoped that would accelerate things, but it didn't seem to make a difference. I also tried removing the 'sum' function, as I understand that having a function in the query will frequently mean a full table scan, which is obviously much slower. Unfortunately removing the 'sum' didn't seem to make a difference either (well, about 3-5% or so, but not the order magnitude that I'm looking for).
I would love any suggestions and will be happy to provide any more information you need to help diagnose and improve this problem. Thanks in advance!
There are a few type All in your explain suggest full table scan. and hence create temp table. You could re-index if it is not there already.
Sort and Group By are usually the performance killer, you can adjust Mysql memory settings to avoid physical i/o to tmp table if you have extra memory available.
Lastly, try to make sure the data type of the join attributes matches. Ie data.date_time = listing.date_time has same data format.
Hope that helps.
Okay, so I eventually figured out what I'm gonna end up doing. I couldn't make the 'data' query any faster - that's still the bottleneck. But now I've made it so the total query process is pretty close to linear, not exponential.
I split the query into two parts and made each one into a temporary table. Then I added an index for each of those temp tables and did the join separately afterwards. This made the total execution time for 1 month of data drop from 8 to 2 seconds, and for 1 year of data from ~10 minutes to ~30 seconds. Good enough for now, I think. I can work with that.
Thanks for the suggestions. Here's what I ended up doing:
create table listing (
SELECT nutrdesc, nutr_no, date_time, units
FROM meals, nutr_def
WHERE meals.users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
AND (
nutr_no <100000 OR nutr_no IN (
SELECT nutr_def_nutr_no
FROM nutr_rights
WHERE nutr_rights.users_userid = '2'
)
)
);
create table data (
SELECT nutrdesc, date_time, nut_data.nutr_no, sum(ingred_gram_mass*entry_qty_num*nutr_val/100) AS total_nutr_mass
FROM nut_data, recipe_ingredients, food_entries, meals, nutr_def
WHERE nut_data.nutr_no = nutr_def.nutr_no
AND ndb_no = ingred_ndb_no
AND foods_food_id = entry_ident
AND meals_meal_id = meal_id
AND users_userid = '2'
AND date_time BETWEEN '2009-8-12' AND '2009-9-12'
GROUP BY date_time,nut_data.nutr_no
);
create index joiner on data(nutr_no, date_time);
create index joiner on listing(nutr_no, date_time);
SELECT listing.date_time,listing.nutrdesc,data.total_nutr_mass,listing.units
FROM listing
LEFT JOIN data
ON data.date_time = listing.date_time
AND listing.nutr_no = data.nutr_no
ORDER BY listing.date_time,listing.nutrdesc,listing.units;

Cascading WHERE clause inside a VIEW with a UNION

This isn't solved, but I found out why: MySQL View containing UNION does not optimize well...In other words SLOW!
Original post:
I'm working with a database for a game. There are two identical tables equipment and safety_dep_box. To check if a player has a piece of equipment I'd like to check both tables.
Instead of doing two queries, I want to take advantage of the UNION functionality in MySQL. I've recently learned that I can create a VIEW. Here's my view:
CREATE VIEW vAllEquip AS SELECT * FROM equipment UNION SELECT * FROM safety_dep_box;
The view created just fine. However when I run
SELECT * FROM vAllEquip WHERE owner=<id>
The query takes forever, while independent select queries are quick. I think I know why, but I don't know how to fix it.
Thanks!
P.S. with Additional Information:
The two tables are identical in structure, but split because they are multi-100-million row tables.
The structure includes primary key on int id, multiple index on int owner.
What I don't understand is the speed difference between the following:
SELECT COUNT(*) FROM (SELECT * FROM equipment WHERE owner=1 UNION ALL SELECT * FROM safety_dep_box WHERE owner=1) AS uES;
0.42 sec
SELECT COUNT(*) FROM (SELECT * FROM equipment WHERE owner=1 UNION SELECT * FROM safety_dep_box WHERE owner=1) AS uES;
0.37 sec
SELECT COUNT(*) FROM vAllEquip WHERE owner=1;
aborted after 60 seconds
Version: 5.1.51
mysql> explain SELECT * FROM equipment UNION SELECT * FROM safety_dep_box;
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
| 1 | PRIMARY | equipment | ALL | NULL | NULL | NULL | NULL | 1499148 | |
| 2 | UNION | safety_dep_box | ALL | NULL | NULL | NULL | NULL | 867321 | |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
with a WHERE clause
mysql> explain SELECT * FROM equipment WHERE owner=1 UNION ALL SELECT * FROM safety_dep_box WHERE owner=1
-> ;
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
| 1 | PRIMARY | equipment | ref | owner,owner_2,owner_3 | owner | 4 | const | 1 | |
| 2 | UNION | safety_dep_box | ref | owner,owner_3 | owner | 4 | const | 1 | |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
First off, you should probably be using UNION ALL instead of plain UNION. With plain UNION, the engine will try to de-duplicate your result set. That is likely the source of your problem.
Secondly, you'll need indexes on owner in both tables, not just one. And, ideally, they'll be integer columns.
Thirdly, Randolph is right that you should not be using "*" in your SELECT statement. List out all the columns you want included. That is especially important in a UNION because the columns must match up exactly and, if there's a disagreement in the column order in your two tables you may be forcing some type conversion to go on that is costing you some time.
Finally, the phrase "There are two identical tables" is almost always a tip-off that your database is not optimally designed. These should probably be a single table. To indicate ownership of an item, your safety_dep_box table should contain only the ownerID and itemID of the item (to relate equipment and players), and possibly an additional autonumbered integer key column.
First off, don't use SELECT * in views ever. It's lazy code. Secondly, without knowing what the base tables look like, we're even less likely to be able to help you.
The reason it takes forever is because it has to build the full result and then filter it. You'll want indexes on your owner fields, whatever they may be.

MySQL query optimization is driving me nuts! Almost same, but horribly different

I have the following two queries (*), which only differ in the field being restricted in the WHERE clause (name1 vs name2):
SELECT A.third_id, COUNT(DISTINCT B.fourth_id) AS num
FROM first A
JOIN second B ON A.third_id = B.third_id
WHERE A.name1 LIKE 'term%'
SELECT A.third_id, COUNT(DISTINCT B.fourth_id) AS num
FROM first A
JOIN second B ON A.third_id = B.third_id
WHERE A.name2 LIKE 'term%'
Both of the name fields have a single-column index on them. There is also an index on both third_id columns as well as fourth_id (which are all foreign keys into other tables, but it is not relevant here).
According to EXPLAIN, the first one behaves like this - which is what I want:
+----+-------------+-------+-------+---------------+----------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+---------------+------+-------------+
| 1 | SIMPLE | A | range | third_id,name | name | 767 | NULL | 3491 | Using where |
| 1 | SIMPLE | B | ref | third_id | third_id | 4 | db.A.third_id | 16 | |
+----+-------------+-------+-------+---------------+----------+---------+---------------+------+-------------+
The second one does this, which I definitely do not want:
+----+-------------+-------+------+----------------+----------+---------+---------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------+----------+---------+---------------+--------+-------------+
| 1 | SIMPLE | B | ALL | third_id | NULL | NULL | NULL | 507539 | |
| 1 | SIMPLE | A | ref | third_id,name2 | third_id | 4 | db.B.third_id | 1 | Using where |
+----+-------------+-------+------+----------------+----------+---------+---------------+--------+-------------+
What the heck is happening here? How do I make the second one behave properly (i.e. like the first one)?
(*) Actually, I don't. I have a bit more complex queries; I have eliminated extras for this post, and distilled them to the minimal queries that still exhibit the problematic behaviour. Also, names were changed to protect the guilty.
Add CREATE TABLE statements to your post.
A real SELECT statement would be helpful too.
1 possible reason is that name2 has a much higher percentage of values starting with "term%".
Try enforcing order of tables in query by using STRAIGHT_JOIN.
SELECT A.third_id, COUNT(DISTINCT B.fourth_id) AS num
FROM first A
STRAIGHT_JOIN second B ON A.third_id = B.third_id
WHERE A.name2 LIKE 'term%'
How many records is in those tables ? Check cardinality/slectivity in name2 column.
If selectivity is low try Naktibalda "STRAIGHT_JOIN" or hints http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

How to select in between?

I have a table, called Level.
id | level | points(minimum)
-------------------------
1 | 1 | 0
2 | 2 | 100
3 | 3 | 200
Let say I have 189 points, how do i check which level the user in?
EDIT:
Best answer chosen. Now I am comparing the request by adding EXPLAIN before the SELECT query, i have this result:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
-------------------------------------------------------------------------------------------------------------
1 | SIMPLE | level | ALL | NULL | NULL | NULL | NULL | 8 | Using where
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
-------------------------------------------------------------------------------------------------------------
1 | SIMPLE | level | ALL | NULL | NULL | NULL | NULL | 8 | Using where; Using filesort
How do i know which one is better or faster?
If you're looking for the level that the player is currently in, you want to select the maximum level with a points requirement less than the points the player currently has:
select max(level) from level where points <= 189;
This may work better if each level has a min_points and max_points amount:
id | level | min_points | max_points
------------------------------------
1 | 1 | 0 | 99
2 | 2 | 100 | 199
3 | 3 | 200 | 299
Then your query wouldn't need to aggregate:
select * from level where min_points <= 189 && max_points > 189;
Edit: ugh, I keep messing up my SQL tonight, LOL.
This wouldn't require a 'between' or any kind of aggregate function at all.. you could just select all rows that are less than the points, sort descending, and then first row should be the correct one.
select level from Level where points <= 189 order by points desc limit 1
(Assuming MySQL .. if you do not have MySQL, 'limit' may not work)
I'm not quite sure what you mean but I think this is what you want:
SELECT `level` FROM `Level` WHERE `points`=189
select max(level) from level where points <= 189
This assumes the 'points' field in the table is the mininum points ot achieve that level.