I need to select all movies that a user has not watched yet.
My SQL query to grab the last 20 movies looks like this:
SELECT movies.* FROM movies, hdd WHERE hdd.id=movies.hdd_id and hdd.status='1' and movies.skip!='1' order by id desc limit 20
The movie table looks like this:
id int(11) Auto Increment
hdd_id int(20)
tmdb_id int(20) NULL
imdb_id text NULL
file_path text
ftp_path text NULL
file_name text
resolution text NULL
timestamp int(11) NULL
skip int(2)
credits int(2)
title varchar(255) NULL
original_title varchar(255) NULL
adult int(2) NULL
categ text NULL
collection text NULL
companies text NULL
language text NULL
lang text NULL
rating text NULL
mpaa text NULL
tagline text NULL
overview text NULL
budget text NULL
homepage text NULL
popularity text NULL
runtime varchar(255) NULL
revenue varchar(255) NULL
release_date date NULL
vote_average varchar(255) NULL
vote_count varchar(255) NULL
movie_poster_path varchar(255) NULL
movie_poster varchar(255) NULL
movie_backdrop_path varchar(255) NULL
movie_backdrop varchar(255) NULL
This selects the movies only if the HDD status is online and the crawler did not skip it.
Now the problem is that the watch log is in a separate table:
The table looks like this:
id int(11) Auto Increment
type varchar(255)
ref int(9) NULL
membre int(9) NULL
counter int(9) NULL
duration varchar(255)
currentTime varchar(255)
membre is the user id and ref is the movie's tmdb_id.
This is what I have tried so far:
SELECT movies.* FROM movies, hdd, watch WHERE hdd.id=movies.hdd_id and hdd.status='1' and movies.skip!='1' and (watch.membre='$_SESSION[id]' and watch.ref=movies.tmdb_id) order by id desc limit 20
But of course this is not working. I think the output is backwards. Instead of returning the unwatched stuff, it's returning the watched movies.
You need to filter out the movies that have already been watched from the whole list. A LEFT JOIN can do that: join watch on both the movie and the user, then keep only the rows where no watch entry matched. Try this:
SELECT movies.* FROM movies left join watch on watch.ref=movies.tmdb_id and watch.membre = '$_SESSION[id]', hdd
WHERE hdd.id=movies.hdd_id and hdd.status='1' and movies.skip!='1' and
watch.id IS NULL
order by movies.id desc limit 20
What you need is to LEFT JOIN watch and then check that there is no matching entry in watch. I think this will work:
SELECT movies.*
FROM movies
JOIN hdd ON hdd.id = movies.hdd_id AND hdd.status = '1' AND movies.skip != 1
LEFT JOIN watch ON watch.ref = movies.tmdb_id AND watch.membre='$_SESSION[id]'
WHERE watch.id IS NULL
ORDER BY id DESC
LIMIT 20
Alternatively, to keep it in the same style without JOINs (if that is your preference for whatever reason), you can also do this:
SELECT movies.*
FROM movies, hdd
WHERE hdd.id = movies.hdd_id and hdd.status = '1' and movies.skip != '1' and
NOT EXISTS(SELECT 1 FROM watch WHERE watch.membre = '$_SESSION[id]' and watch.ref = movies.tmdb_id)
ORDER BY id desc
LIMIT 20
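For anyone who wants to try the anti-join pattern outside the app, here is a minimal sketch in Python. It makes several assumptions that are not in the original posts: SQLite (via the standard sqlite3 module) stands in for MySQL, the tables are cut down to the columns the query touches, and the sample rows and the function names unwatched_movies and demo_connection are made up for illustration. It also passes the member id as a bound parameter rather than interpolating '$_SESSION[id]' into the SQL string, which is generally the safer habit.

```python
import sqlite3

def unwatched_movies(conn, member_id, limit=20):
    """Return titles of online, non-skipped movies the member has no watch row for."""
    sql = """
        SELECT movies.title
        FROM movies
        JOIN hdd ON hdd.id = movies.hdd_id AND hdd.status = 1
        LEFT JOIN watch
               ON watch.ref = movies.tmdb_id
              AND watch.membre = ?      -- bound parameter, not string interpolation
        WHERE movies.skip != 1
          AND watch.id IS NULL          -- keep only movies with no matching watch row
        ORDER BY movies.id DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (member_id, limit))]

def demo_connection():
    """Build a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hdd (id INTEGER PRIMARY KEY, status INTEGER);
        CREATE TABLE movies (id INTEGER PRIMARY KEY, hdd_id INTEGER, tmdb_id INTEGER,
                             skip INTEGER, title TEXT);
        CREATE TABLE watch (id INTEGER PRIMARY KEY, ref INTEGER, membre INTEGER);
        INSERT INTO hdd VALUES (1, 1), (2, 0);
        INSERT INTO movies VALUES
            (1, 1, 100, 0, 'Alpha'),
            (2, 1, 200, 0, 'Beta'),
            (3, 1, 300, 1, 'Skipped'),
            (4, 2, 400, 0, 'Offline');
        INSERT INTO watch VALUES (1, 100, 7), (2, 200, 8);
    """)
    return conn

if __name__ == "__main__":
    print(unwatched_movies(demo_connection(), member_id=7))
```

The important detail is that the membre condition lives in the ON clause. If it is placed in WHERE instead, the rows where watch is NULL are filtered out again and the query falls back to returning only watched movies.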
My app needs to run this query pretty often; it gets a list of user data for the app to display. The problem is that the subquery on user_quiz is resource heavy, and calculating the rankings is also very CPU intensive.
Benchmark: ~.5 second each run
When it will be run:
When the user wants to see their ranking
When the user wants to see other people's ranking
Getting a list of user's friends
Half a second is a really long time considering how often this query will be run. Is there anything I could do to optimize this query?
Table for user:
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`firstname` varchar(100) DEFAULT NULL,
`lastname` varchar(100) DEFAULT NULL,
`password` varchar(20) NOT NULL,
`email` varchar(300) NOT NULL,
`verified` tinyint(10) DEFAULT NULL,
`avatar` varchar(300) DEFAULT NULL,
`points_total` int(11) unsigned NOT NULL DEFAULT '0',
`points_today` int(11) unsigned NOT NULL DEFAULT '0',
`number_correctanswer` int(11) unsigned NOT NULL DEFAULT '0',
`number_watchedvideo` int(11) unsigned NOT NULL DEFAULT '0',
`create_time` datetime NOT NULL,
`type` tinyint(1) unsigned NOT NULL DEFAULT '1',
`number_win` int(11) unsigned NOT NULL DEFAULT '0',
`number_lost` int(11) unsigned NOT NULL DEFAULT '0',
`number_tie` int(11) unsigned NOT NULL DEFAULT '0',
`level` int(1) unsigned NOT NULL DEFAULT '0',
`islogined` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=230 DEFAULT CHARSET=utf8;
Table for user_quiz:
CREATE TABLE `user_quiz` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`question_id` int(11) NOT NULL,
`is_answercorrect` int(11) unsigned NOT NULL DEFAULT '0',
`question_answer_datetime` datetime NOT NULL,
`score` int(1) DEFAULT NULL,
`quarter` int(1) DEFAULT NULL,
`game_type` int(1) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=9816 DEFAULT CHARSET=utf8;
Table for user_starter:
CREATE TABLE `user_starter` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`result` int(1) DEFAULT NULL,
`created_date` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=456 DEFAULT CHARSET=utf8mb4;
My indexes:
Table: user
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user 0 PRIMARY 1 id A 32 BTREE
Table: user_quiz
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_quiz 0 PRIMARY 1 id A 9462 BTREE
user_quiz 1 user_id 1 user_id A 270 BTREE
Table: user_starter
Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment Index_comment
user_starter 0 PRIMARY 1 id A 454 BTREE
user_starter 1 user_id 1 user_id A 227 YES BTREE
Query:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,u.firstname,u.lastname,u.email,u.avatar,u.type,u.points_total,u.number_win,u.number_lost,u.number_tie,u.verified,
COALESCE(SUM(uq.score),0) as points_week,
COALESCE(us.number_lost,0) as number_week_lost,
COALESCE(us.number_win,0) as number_week_win,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 1) as lastFrdFight,
(select MAX(question_answer_datetime) from user_quiz WHERE user_id = u.id and game_type = 2) as lastBotFight
FROM `user` u
LEFT JOIN (SELECT user_id,
count(case when result=1 then 1 else null end) as number_win,
count(case when result=-1 then 1 else null end) as number_lost
from user_starter where created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27' ) us ON u.id = us.user_id
LEFT JOIN (SELECT * FROM user_quiz WHERE question_answer_datetime BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00') uq on u.id = uq.user_id
GROUP BY u.id ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
EXPLAIN:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 3027 100
2 DERIVED u ALL PRIMARY 32 100 Using temporary; Using filesort
2 DERIVED <derived5> ALL 1 100 Using where; Using join buffer (Block Nested Loop)
2 DERIVED <derived6> ref <auto_key0> <auto_key0> 4 fancard.u.id 94 100
6 DERIVED user_quiz ALL 9461 100 Using where
5 DERIVED user_starter ALL 454 100 Using where
4 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
3 DEPENDENT SUBQUERY user_quiz ref user_id user_id 4 func 35 100 Using where
Example output and expected output:
Bench mark: around .5 second
The following index should make the subquery to user_quiz ultra fast.
ALTER TABLE user_quiz
ADD INDEX (`user_id`,`game_type`,`question_answer_datetime`)
Please provide SHOW CREATE TABLE tablename statements for all tables, as that will help with additional optimizations.
Update #1
Alright, I've had some time to look things over, and fortunately there appears to be a lot of relatively low-hanging fruit in terms of optimization.
Here are all the indexes to add:
ALTER TABLE user_quiz
ADD INDEX `userGametypeAnswerDatetimes` (`user_id`,`game_type`,`question_answer_datetime`);
ALTER TABLE user_quiz
ADD INDEX `userAnswerScores` (`user_id`,`question_answer_datetime`,`score`);
ALTER TABLE user_starter
ADD INDEX `userResultDates` (`user_id`,`result`,`created_date`);
Note that the names (such as userGametypeAnswerDatetimes) are optional, and you can name them to whatever makes the most sense to you. But, in general, it's good to put specific names on your custom indexes (simply for organization purposes.)
Now, here is your query that should work well with those new indexes:
SET @curRank = 0;
SET @lastPlayerPoints = 0;
SELECT
sub.*,
@curRank := IF(@lastPlayerPoints!=points_week, @curRank + 1, @curRank) AS rank,
@lastPlayerPoints := points_week AS db_PPW
FROM (
SELECT u.id,
u.firstname,
u.lastname,
u.email,
u.avatar,
u.type,
u.points_total,
u.number_win,
u.number_lost,
u.number_tie,
u.verified,
COALESCE(user_scores.score,0) as points_week,
COALESCE(user_losses.number_lost,0) as number_week_lost,
COALESCE(user_wins.number_win,0) as number_week_win,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight,
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id
and game_type = 2
) as lastBotFight
FROM `user` u
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_win
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = 1
GROUP BY user_id
) user_wins
ON user_wins.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id,
COUNT(*) AS number_lost
from user_starter
WHERE created_date BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 05:10:27'
AND result = -1
GROUP BY user_id
) user_losses
ON user_losses.user_id = u.id
LEFT OUTER JOIN (
SELECT user_id,
SUM(score) AS score
FROM user_quiz
WHERE question_answer_datetime
BETWEEN '2016-01-11 00:00:00' AND '2016-05-12 00:00:00'
GROUP BY user_id
) user_scores
ON u.id = user_scores.user_id
ORDER BY points_week DESC, u.lastname ASC, u.firstname ASC
) as sub
Note: This is not necessarily the best result. It depends a LOT on your data set, as to whether this is necessarily the best, and sometimes you need to do a bit of trial and error.
A hint as to what you can use trial and error on is the structure of how we query lastFrdFight and lastBotFight versus how we query points_week, number_week_lost, and number_week_win. All of these could either be done in the SELECT clause (like the first two are in my query) or by joining to a subquery result (like the last three are in my query).
Mix and match to see what works best. In general, I've found the joining to a subquery to be fastest when you have a large number of rows in the outer query (in this case, querying the user table.) This is because it only needs to get the results once, and then can just match them up on a user by user basis. Other times, it can be better to have the query just in the SELECT clause - this will run MUCH faster, since there are more constants (the user_id is already known), but has to run for each row. So it's a trade off, and why you sometimes need to use trial and error.
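As an aside on the rank column in the outer SELECT: the two user variables implement what is usually called a dense rank over rows that are already sorted by points_week. The sketch below restates that logic in Python purely as an illustration. The function name dense_rank and the sample rows are hypothetical, and it uses None as the starting sentinel where the SQL starts from 0, so a top score of 0 still receives rank 1 here.

```python
def dense_rank(rows, key="points_week"):
    """Assign ranks to rows already sorted by `key` descending; ties share a rank."""
    ranked = []
    cur_rank = 0
    last_points = None  # sentinel; the SQL version initialises this to 0 instead
    for row in rows:
        # A new rank is handed out only when the score changes.
        if row[key] != last_points:
            cur_rank += 1
        last_points = row[key]
        ranked.append({**row, "rank": cur_rank})
    return ranked

if __name__ == "__main__":
    sample = [
        {"id": 1, "points_week": 30},
        {"id": 2, "points_week": 30},
        {"id": 3, "points_week": 10},
    ]
    for row in dense_rank(sample):
        print(row)
```

If the ranking itself turns out to be a noticeable part of the cost, doing this pass in application code over the already-sorted result set is one option to benchmark against the user-variable version.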
Why do the indexes work?
So, you may be wondering why I made the indexes as I did. If you are familiar with phone books (in this age of smartphones, that's no longer a valid assumption I can make) then we can use that as an analogy:
If you had a composite index of phonebookIndex (lastname,firstname,email) on your user table (example here! you don't actually need to add that index!) you would have a result similar to what a phone book provides. (Using email instead of phone number.)
Each index is an internal copy of the data in the overall table. With this phonebookIndex there would internally be stored a list of all users with their lastname, then their first name, and then their email, and each of these would be ordered, just like a phone book.
Why is that useful? Consider when you know someone's first and last name. You can quickly flip to where their last name is, then quickly go through that list of everyone with their last name, finding the first name you want, so obtaining the email.
Indexes work in exactly the same way, in terms of how the database looks at them.
Consider the userGametypeAnswerDatetimes index I defined above, and how we query that index in the lastFrdFight SELECT subquery.
(
select MAX(question_answer_datetime)
from user_quiz
WHERE user_id = u.id and game_type = 1
) as lastFrdFight
Notice how we have both the user_id (from the outer query) and the game_type as constants. That is exactly like our example earlier, with having the first and last name, and wanting to look up an email/phone number. In this case, we are looking for the MAX of the 3rd value in the index. Still easy: All the values are ordered, so if this index was sitting in front of us, we could just flip to the specific user_id, then look at the section with all game_type=1 and then just pick the last value to find the maximum. Very very fast. Same for the database. It can find this value extremely fast, which is why you saw an 80%+ reduction in your overall query time.
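To make the "flip to the section, then take the last value" idea concrete, here is a small Python sketch. It is only an analogy, not how MySQL is implemented internally: a sorted list of (user_id, game_type, question_answer_datetime) tuples plays the role of the composite index, and bisect plays the role of the B-tree lookup. The function name latest_answer and the sample data are assumptions made for illustration.

```python
from bisect import bisect_left

def latest_answer(index, user_id, game_type):
    """Return the largest datetime stored for (user_id, game_type), or None."""
    # First entry whose prefix is (user_id, game_type): a shorter tuple sorts
    # before any longer tuple that starts with the same values.
    lo = bisect_left(index, (user_id, game_type))
    # First entry past the section: everything for game_type sorts before
    # (user_id, game_type + 1).
    hi = bisect_left(index, (user_id, game_type + 1))
    if hi == lo:
        return None  # no rows for this user and game type
    # The section is ordered, so its last entry holds the MAX datetime.
    return index[hi - 1][2]

if __name__ == "__main__":
    index = sorted([
        (1, 1, "2016-01-12 10:00:00"),
        (1, 1, "2016-03-01 09:30:00"),
        (1, 2, "2016-02-02 08:00:00"),
        (2, 1, "2016-04-04 12:00:00"),
    ])
    print(latest_answer(index, 1, 1))
```

Two binary searches locate the section and the answer is read straight off its end, without scanning the rows in between. That is the same reason the column order (user_id, game_type, question_answer_datetime) matters: the two equality columns come first and the column being maximised comes last.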
So, that's how indexes work, and why I chose these indexes as I did.
Be aware, that the more indexes you have, the more you'll see slowdowns when doing inserts and updates. But, if you are reading a lot more from your tables than you are writing, this is usually a more than acceptable trade off.
So, give these changes a shot, and let me know how it performs. Please provide the new EXPLAIN plan if you want further optimization help. Also, this should give you quite a few tools for using trial and error to see what works and what doesn't. All my changes are fairly independent of each other, so you can swap them in and out with your original query pieces to see how each one works.
I am joining three tables, 'customer', 'customer_address' and 'country', using a left join because I'm allowing a customer to have either one address or none.
At the moment I have 13k+ customers and the query takes about 40 sec. I tried inner join but in that case I'm not getting the customers with no address.
All columns in 'ON' are indexed but it doesn't make much of a difference.
Here is my query:
SELECT DISTINCT *,
CASE
WHEN customer_address.customerid is NULL THEN customer.customerid
ELSE customer_address.customerid
END as customerid,
CASE
WHEN address1 = '' THEN 'NA'
ELSE address1
END as address1
FROM customer
LEFT JOIN customer_address ON customer.customerid = customer_address.customerid
LEFT JOIN country ON country.id = customer_address.country
WHERE deleted='0'
ORDER BY customer.customerid
DESC
LIMIT 0, 10
Any help would be appreciated
EDIT:
Here is 'explain' for the three tables:
customer
Field Type Null Key Default Extra
customerid int(12) NO PRI NULL auto_increment
forename varchar(128) YES NULL
surname varchar(128) YES NULL
company varchar(64) YES NULL
tel varchar(32) YES NULL
tel2 varchar(32) YES NULL
fax varchar(32) YES NULL
mob varchar(32) YES NULL
email varchar(255) YES NULL
date_reg date YES NULL
last_update datetime YES NULL
deleted int NO
customer_address
Field Type Null Key Default Extra
addressid varchar(12) NO PRI
customerid varchar(12) YES MUL NULL
address1 varchar(128) YES NULL
address2 varchar(128) YES NULL
town varchar(128) YES NULL
county varchar(128) YES MUL NULL
postcode varchar(12) YES NULL
country int(12) YES NULL
address_date datetime YES NULL
isprimary int NO not
country
Field Type Null Key Default Extra
id int(12) NO PRI 0
country varchar(255) YES NULL
At the moment there are no rows with deleted != '0'.
EDIT 2:
Query Explain:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE customer NULL ALL deleted NULL NULL NULL 13082 99.98 Using where; Using temporary; Using filesort
1 SIMPLE customer_address NULL ALL NULL NULL NULL NULL 9983 100.00 Using where; Using join buffer (Block Nested Loop)
1 SIMPLE country NULL eq_ref PRIMARY,id PRIMARY 4 db_name.customer_address.country 1 100.00 NULL
EDIT 3:
1 SIMPLE customer NULL index NULL customerid 4 NULL 1 10.00 Using where; Using temporary
1 SIMPLE customer_address NULL ALL NULL NULL NULL NULL 9983 100.00 Using where
1 SIMPLE country NULL eq_ref PRIMARY,id PRIMARY 4 db_name.customer_address.country 1 100.00 NULL
Well, you have two ALL-type table accesses that do not use any indexes. One of them even has the dreaded filesort, which is a very expensive operation.
Add an index on customer_address.customerid field. This will be used to match the records from customer_address table to the main customer table.
List the columns you want to return from the query, do not use *. For example, I do not see why you need to return customerid from both the customer and the address tables.
Get rid of the 1st case statement. customer.customerid field will always be populated.
Add an index hint after the customer table to make mysql think about using the customerid index for sorting:
...
FROM customer FORCE INDEX (index_name_forcustomerid_field)
...
You may want to consider increasing join_buffer_size server variable, however, adding the index in the first place should help a lot.
You can try this. You do not need the first CASE statement, as you are never going to receive customerid as NULL. I have also moved the ORDER BY clause into the derived table, as I am assuming it will improve query performance (customerid is the primary key, which determines how records are physically arranged in the database; the default order is ascending).
SELECT DISTINCT *, C.customerid as customerid,
CASE
WHEN customer_address.address1 = '' THEN 'NA'
ELSE customer_address.address1
END as address1 from (select *
FROM customer where deleted='0' order by customerid DESC) AS C
LEFT JOIN customer_address ON C.customerid = customer_address.customerid
LEFT JOIN country ON country.id = customer_address.country
LIMIT 0, 10
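A variation worth benchmarking is to apply the LIMIT inside the derived table, so that only the ten customers on the page are joined to the address and country tables. The sketch below shows the shape of that query using Python's sqlite3 module as a stand-in for MySQL. The cut-down tables, sample rows, index name, and the function names latest_customers and demo_customers are all assumptions for illustration. It also assumes each customer has at most one address, as stated in the question; with several addresses per customer the page could contain more rows than page_size. Note that the demo declares customerid as an integer in both tables, whereas the posted schema has int(12) in customer and varchar(12) in customer_address. A type mismatch like that may prevent MySQL from using the index for the join, so it is worth checking.

```python
import sqlite3

def latest_customers(conn, page_size=10):
    """Pick the newest non-deleted customers first, then join their details."""
    sql = """
        SELECT c.customerid,
               CASE WHEN a.address1 = '' THEN 'NA' ELSE a.address1 END AS address1,
               co.country
        FROM (SELECT customerid
              FROM customer
              WHERE deleted = 0
              ORDER BY customerid DESC
              LIMIT ?) AS c                 -- limit before joining
        LEFT JOIN customer_address AS a ON a.customerid = c.customerid
        LEFT JOIN country AS co ON co.id = a.country
        ORDER BY c.customerid DESC
    """
    return conn.execute(sql, (page_size,)).fetchall()

def demo_customers():
    """Build a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (customerid INTEGER PRIMARY KEY, deleted INTEGER NOT NULL);
        CREATE TABLE country (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE customer_address (addressid INTEGER PRIMARY KEY, customerid INTEGER,
                                       address1 TEXT, country INTEGER);
        CREATE INDEX idx_address_customer ON customer_address (customerid);
        INSERT INTO customer VALUES (1, 0), (2, 0), (3, 0), (4, 1);
        INSERT INTO country VALUES (1, 'UK');
        INSERT INTO customer_address VALUES (1, 1, '1 High St', 1), (2, 3, '', 1);
    """)
    return conn

if __name__ == "__main__":
    for row in latest_customers(demo_customers(), page_size=2):
        print(row)
```

Customers without an address still appear, with NULL in the address and country columns, because the joins to customer_address and country remain LEFT JOINs.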
In one query I would like to select information from table X.
However, if table X doesn't return any information, I would like to retrieve data from table Y.
Apart from each other the queries would look like this:
SELECT * FROM tableY WHERE user_id=1
SELECT * FROM tableX WHERE id=1
I tried the following to combine this, but it doesn't seem to work
SELECT * FROM tableY WHERE user_id=
IF (EXISTS (SELECT * FROM tableX WHERE id=1), 1, 0)
and of course the other way around
SELECT * FROM tableX WHERE id=
IF (EXISTS (SELECT * FROM tableY WHERE user_id=1), 1, 0)
Both versions will only execute the first query, but not the second.
So I am kinda stuck here.
I also tried this, but as the tables do not have the same columns this shouldn't work, and indeed it doesn't:
SELECT *
FROM orbib.billing_address
WHERE user_id=1
UNION ALL
SELECT *
FROM orbib.users
WHERE id=1
AND NOT EXISTS
(SELECT *
FROM orbib.billing_address
WHERE user_id=1
)
Also tried doing this with a procedure as explained here:
However, this didn't help either. Besides, it looks like the procedure is saved, causing the user id to always be 1, and this of course varies.
Maybe anybody has an idea how to create a query which does do what I want?
EDITS:
Here are table descriptions:
tableX:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
username varchar(30) NO UNI NULL
firstname varchar(45) YES NULL
lastname varchar(45) YES NULL
street varchar(45) YES NULL
street_nr varchar(10) YES NULL
zipcode varchar(10) YES NULL
city varchar(45) YES NULL
password varchar(255) NO NULL
salt varchar(255) NO UNI NULL
email varchar(255) NO NULL
create_time datetime NO CURRENT_TIMESTAMP
company varchar(45) YES NULL
branche varchar(45) YES NULL
tableY:
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
user_id int(11) NO NULL
company varchar(45) YES NULL
contact_name varchar(100) YES NULL
street varchar(45) YES NULL
street_nr varchar(10) YES NULL
zipcode varchar(45) YES NULL
city varchar(45) YES NULL
terms_ok tinyint(1) YES NULL
billing_ok tinyint(1) YES NULL
So from the idea from @kickstart I tried to do this:
SELECT
IFNULL(tableY.company, tableX.company) company,
IFNULL(tableY.contact_name, tableX.lastname) contact,
IFNULL(tableY.street, tableX.street) street,
IFNULL(tableY.street_nr, tableX.street_nr) street_nr,
IFNULL(tableY.zipcode, tableX.zipcode) zipcode,
IFNULL(tableY.city, tableX.city) city
FROM (SELECT * FROM tableX) x
LEFT OUTER JOIN tableY ON tableY.user_id=1
LEFT OUTER JOIN tableX ON tableX.id=1
This gave me the error: 1248 Every derived table must have its own alias.
But I found the cause: I had forgotten the x alias after the FROM (SELECT ...).
After changing this it worked, although it returns two rows, so I need to change it a bit.
Thanks @kickstarter
Making a major assumption that this is to return a single row, then possibly have a sub query to generate a single row and then LEFT OUTER JOIN the other 2 tables to that row.
Then you can use a load of IF statements to decide which tables values to return.
Efficiency is not likely to be its strong point!
SELECT IF(tableY.user_id IS NULL, tableX.id, tableY.user_id) AS id,
IF(tableY.user_id IS NULL, tableX.field2, tableY.other_field2) AS field2
-- etc. (one IF per remaining column)
FROM (SELECT 1 AS dummy) a
LEFT OUTER JOIN tableY ON tableY.user_id = 1
LEFT OUTER JOIN tableX ON tableX.id = 1
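Here is a runnable sketch of that single-row approach, with the user id passed in as a parameter instead of being hard-coded to 1. It rests on a few assumptions: SQLite (Python's sqlite3 module) stands in for MySQL, so CASE is used where the MySQL version would use IF; the tables are cut down to a few of the columns from the question; and the sample rows and the function names billing_details and demo_tables are invented for illustration. It also assumes there is at most one tableY row per user.

```python
import sqlite3

def billing_details(conn, user_id):
    """Return the tableY row for the user if one exists, else fall back to tableX."""
    sql = """
        SELECT CASE WHEN y.user_id IS NULL THEN x.company  ELSE y.company      END AS company,
               CASE WHEN y.user_id IS NULL THEN x.lastname ELSE y.contact_name END AS contact,
               CASE WHEN y.user_id IS NULL THEN x.city     ELSE y.city         END AS city
        FROM (SELECT 1 AS dummy) AS a       -- guarantees exactly one driving row
        LEFT OUTER JOIN tableY AS y ON y.user_id = ?
        LEFT OUTER JOIN tableX AS x ON x.id = ?
    """
    return conn.execute(sql, (user_id, user_id)).fetchone()

def demo_tables():
    """Build a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableX (id INTEGER PRIMARY KEY, company TEXT, lastname TEXT, city TEXT);
        CREATE TABLE tableY (id INTEGER PRIMARY KEY, user_id INTEGER, company TEXT,
                             contact_name TEXT, city TEXT);
        INSERT INTO tableX VALUES (1, 'Acme', 'Smith', 'Utrecht'),
                                  (2, 'Globex', 'Jones', 'Leiden');
        INSERT INTO tableY VALUES (1, 1, 'Acme Billing', 'A. Smith', 'Amsterdam');
    """)
    return conn

if __name__ == "__main__":
    conn = demo_tables()
    print(billing_details(conn, 1))  # user with a tableY row
    print(billing_details(conn, 2))  # user without one, falls back to tableX
```

Because the driving row comes from the one-row dummy subquery rather than from tableX itself, the result stays at one row as long as each join matches at most one row.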
I have a MySQL query to select all product id's with certain filters applied to the products. This query
works but I want to learn to improve this query. Alternatives for this query are welcome with explanation.
SELECT kkx_products.id from kkx_products WHERE display = 'yes' AND id in
(SELECT product_id FROM `kkx_filters_products` WHERE `filter_id` in
(SELECT id FROM `kkx_filters` WHERE kkx_filters.urlname = "comics" OR kkx_filters.urlname = "comicsgraphicnovels")
group by product_id having count(*) = 2)
ORDER BY kkx_products.id desc LIMIT 0, 24
I've included the structure of the tables being used in the query.
EXPLAIN kkx_filters;
Field Type Null Key Default Extra
id int(11) unsigned NO PRI NULL auto_increment
name varchar(50) NO
filtergroup_id int(11) YES MUL NULL
urlname varchar(50) NO MUL NULL
date_modified timestamp NO CURRENT_TIMESTAMP
orderid float(11,2) NO NULL
EXPLAIN kkx_filters_products;
Field Type Null Key Default Extra
filter_id int(11) NO PRI 0
product_id int(11) NO PRI 0
EXPLAIN kkx_products;
Field Type Null Key Default Extra
id int(11) NO PRI NULL auto_increment
title varchar(255) NO
urlname varchar(50) NO MUL
description longtext NO NULL
price float(11,2) NO NULL
orderid float(11,2) NO NULL
imageurl varchar(255) NO
date_created datetime NO NULL
date_modified timestamp NO CURRENT_TIMESTAMP
created_by varchar(11) NO NULL
modified_by varchar(11) NO NULL
productnumber varchar(32) NO
instock enum('yes','no') NO yes
display enum('yes','no') NO yes
Instead of using inline queries in your criteria statements, try using the EXISTS block...
http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.html
You will be able to see the difference in your explain plan. Before, you had a query executing for each and every record in your result set, and every result in that inline view result set had its own query executing too.
You see how nested inline views can create an exponential increase in cost. EXISTS doesn't work that way.
Example of the use of EXISTS:
Consider tbl1 has columns id and data. tbl2 has columns id, parentid, and data.
SELECT a.*
FROM tbl1 a
WHERE 1 = 1
AND EXISTS (
SELECT NULL
FROM tbl2 b
WHERE b.parentid = a.id
AND b.data = 'SOME CONDITIONAL DATA TO CONSTRAIN ON'
)
1) We can assume the 1 = 1 is some condition that equates to true for every record
2) It doesn't really matter what we select in the EXISTS statement; NULL is fine.
3) It is important to look at b.parentid = a.id, this links our exist statement to the result set
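Applied to the tables in the question, one possible shape is a correlated EXISTS per required filter, so a product qualifies only if every filter matches. The sketch below is an assumption-laden illustration rather than a drop-in replacement: it uses Python's sqlite3 module as a stand-in for MySQL, the tables are reduced to the columns involved, and the sample rows and the function names products_with_all_filters and demo_shop are made up. Only placeholders are repeated when the SQL is assembled; the filter names themselves are passed as bound parameters.

```python
import sqlite3

def products_with_all_filters(conn, urlnames, limit=24):
    """Return ids of displayed products that carry every filter in `urlnames`."""
    # One correlated EXISTS per required filter; all of them must match.
    exists_clause = """
        AND EXISTS (SELECT NULL
                    FROM kkx_filters_products fp
                    JOIN kkx_filters f ON f.id = fp.filter_id
                    WHERE fp.product_id = p.id
                      AND f.urlname = ?)
    """
    sql = ("SELECT p.id FROM kkx_products p WHERE p.display = 'yes'"
           + exists_clause * len(urlnames)
           + " ORDER BY p.id DESC LIMIT ?")
    return [row[0] for row in conn.execute(sql, (*urlnames, limit))]

def demo_shop():
    """Build a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE kkx_filters (id INTEGER PRIMARY KEY, urlname TEXT);
        CREATE TABLE kkx_products (id INTEGER PRIMARY KEY, display TEXT);
        CREATE TABLE kkx_filters_products (filter_id INTEGER, product_id INTEGER,
                                           PRIMARY KEY (filter_id, product_id));
        INSERT INTO kkx_filters VALUES (1, 'comics'), (2, 'comicsgraphicnovels'), (3, 'manga');
        INSERT INTO kkx_products VALUES (1, 'yes'), (2, 'yes'), (3, 'no'), (4, 'yes');
        INSERT INTO kkx_filters_products VALUES
            (1, 1), (2, 1),          -- product 1: both filters
            (1, 2),                  -- product 2: comics only
            (1, 3), (2, 3),          -- product 3: both filters, but not displayed
            (1, 4), (2, 4), (3, 4);  -- product 4: all three filters
    """)
    return conn

if __name__ == "__main__":
    print(products_with_all_filters(demo_shop(), ["comics", "comicsgraphicnovels"]))
```

Whether this beats the original IN ... GROUP BY ... HAVING form depends on the data and the MySQL version, so compare the EXPLAIN output of both before settling on one.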