An example table, sample data, and the query can be found at http://sqlfiddle.com/#!9/2e65dd/3
I'm interested in finding all distinct user_ids that don't have a certain record_type.
In my actual case, the table is huge: it has several million records and an index on the user_id column. I'm planning to retrieve the results in batches by limiting the output to 1000 rows at a time.
select distinct user_id from
records o where
not exists (
select *
from records i
where i.user_id=o.user_id and i.record_type=3)
limit 0, 1000
Is there a better approach to achieve this?
I would do it this way:
SELECT u.user_id
FROM (SELECT DISTINCT user_id FROM records) AS u
LEFT OUTER JOIN records as r
ON u.user_id = r.user_id AND r.record_type = 3
WHERE r.user_id IS NULL
That avoids the correlated subquery in your NOT EXISTS solution.
Alternatively, you should have another table that just lists users, so you don't have to do the subquery:
SELECT u.user_id
FROM users AS u
LEFT OUTER JOIN records as r
ON u.user_id = r.user_id AND r.record_type = 3
WHERE r.user_id IS NULL
In either case, it would help optimize the JOIN to add a compound index on the pair of columns:
ALTER TABLE records ADD KEY (user_id, record_type)
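To check that the anti-join returns the same users as the original NOT EXISTS query, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL, and the sample rows and the index name idx_user_type are invented for illustration. It says nothing about relative speed on MySQL; that still needs testing on the real data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER PRIMARY KEY, user_id INT, record_type INT);
    INSERT INTO records (user_id, record_type) VALUES
        (1, 1), (1, 3), (2, 1), (2, 2), (3, 3), (4, 2);
    -- Compound index on the pair of columns used by the join (name is hypothetical).
    CREATE INDEX idx_user_type ON records (user_id, record_type);
""")

# Anti-join: keep users for which no record_type = 3 row was matched.
anti_join = """
    SELECT u.user_id
    FROM (SELECT DISTINCT user_id FROM records) AS u
    LEFT OUTER JOIN records AS r
      ON u.user_id = r.user_id AND r.record_type = 3
    WHERE r.user_id IS NULL
    ORDER BY u.user_id
"""
# The correlated NOT EXISTS form from the question, for comparison.
not_exists = """
    SELECT DISTINCT o.user_id
    FROM records AS o
    WHERE NOT EXISTS (
        SELECT 1 FROM records AS i
        WHERE i.user_id = o.user_id AND i.record_type = 3)
    ORDER BY o.user_id
"""

anti_join_users = [row[0] for row in conn.execute(anti_join)]
not_exists_users = [row[0] for row in conn.execute(not_exists)]
print(anti_join_users)    # users 2 and 4 have no record_type = 3 row
print(not_exists_users)
```

Both lists should contain the same users, so the choice between the two forms comes down to the execution plan MySQL picks for your data.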
I'd suggest a join as well, but mine would differ from Bill K's like so:
SELECT DISTINCT r.user_id
FROM records AS r
LEFT JOIN (SELECT DISTINCT user_id FROM records WHERE record_type = 3) AS rt3users
ON r.user_id = rt3users.user_id
WHERE rt3users.user_id IS NULL
;
However, here is an alternative. I would not expect better performance from it, but it is worth checking, since performance can vary with the size and content of the data:
SELECT DISTINCT r.user_id
FROM records AS r
WHERE r.user_id NOT IN (
SELECT DISTINCT user_id
FROM records
WHERE record_type = 3
)
;
Note, this one is more similar to your original but does away with the correlated nature of the original subquery.
You could create a temporary table holding the users that have record type 3, like this:
CREATE TEMPORARY TABLE tmp_users AS
SELECT DISTINCT user_id
FROM records
WHERE record_type = 3
Then create a unique index (or primary key) on this table, so that your query can use indexes on both tables.
I can't say whether the performance would be better; you'd have to test it on your data.
Here's the format of my MySQL query:
select a,b,c
from table1
left join table2 on x=y
left join table3 on m=n
limit 100000, 10
I know how to optimize LIMIT when I have a large offset. But I couldn't find a solution for a query with multiple tables. Is there any way to make my query faster?
First of all, offsets and limits are unpredictable unless you include ORDER BY clauses in your query. Without ORDER BY, your SQL server is allowed to return result rows in any order it chooses.
Second, large offsets and small limits are a notorious query-performance antipattern. There's not much you can do to make the problem go away.
To get decent performance, it's helpful to rethink why you want to use this kind of access pattern, and then try to use WHERE filters on some indexed column value.
For example, let's say you're doing this kind of thing.
select a.user_id, b.user_email, c.user_account
from table1 a
left join table2 b on a.user_id = b.user_id
left join table3 c on b.account_id = c.account_id
limit whatever
Let's say you're paginating the query so you get fifty users at a time. Then you can start with a last_seen_user_id variable in your program, initialized to -1.
Your query looks like this:
select a.user_id, b.user_email, c.user_account
from (
select user_id
from table1
where user_id > ?last_seen_user_id?
order by user_id
limit 50
) u
join table1 a on u.user_id = a.user_id
left join table2 b on a.user_id = b.user_id
left join table3 c on b.account_id = c.account_id
order by a.user_id
Then, when you retrieve that result, set your last_seen_user_id to the value from the last row in the result.
Run the query again to get the next fifty users. If table1.user_id is a primary key or a unique index, this will be fast.
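The loop around that query can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database with 120 invented users, the helper name fetch_page is hypothetical, and it runs only the inner keyset query on table1 (the joins to table2 and table3 are left out to keep it short).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table1 (user_id) VALUES (?)",
                 [(i,) for i in range(1, 121)])  # 120 invented users

PAGE_SIZE = 50

def fetch_page(last_seen_user_id):
    """Return the next page of user ids after last_seen_user_id."""
    rows = conn.execute(
        "SELECT user_id FROM table1 WHERE user_id > ? "
        "ORDER BY user_id LIMIT ?",
        (last_seen_user_id, PAGE_SIZE),
    ).fetchall()
    return [row[0] for row in rows]

pages = []
last_seen_user_id = -1            # start before the smallest id
while True:
    page = fetch_page(last_seen_user_id)
    if not page:
        break                     # no rows left
    pages.append(page)
    last_seen_user_id = page[-1]  # remember the last row of this page

print([len(p) for p in pages])
```

Each call filters on the primary key instead of skipping rows with an offset, so the cost of fetching a page does not grow with how far into the result you are.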
I have two related tables as follows:
USERS
user_id <PK>
USERACTIONS
user_action_id <PK>
user_id <FK>
user_action <int>
Whenever a user performs an action, a new row is inserted into the "useractions" table. I need a query to fetch those USERACTIONS rows where the user performed only a particular set of actions, say (1,2), but not (3,4).
So I have a query like this:
select * from USERACTIONS where (1,2) in(select user_action from USERACTIONS where user_id=100) and user_id=100;
The problem is that the above query doesn't work: supplying (1,2) makes the database expect the subquery to return two columns as well, which is understandable. This is the error I get:
ERROR: subquery has too few columns
Giving a single value, say (1) or (2), works perfectly. I want to know if there is any way I can use the same query and compare the subquery's result with multiple values. I prefer the same query because the case demonstrated here is just part of a larger query.
Please note the query should not list users who performed (1,2,3,4); only those who performed just (1,2) should be listed. Also, user_action values can be any integer.
Any alternative queries are welcome, but I would prefer changes to the same query. Thanks in advance.
Try this:
SELECT USERS.user_id, USERACTIONS.user_action
FROM USERACTIONS
LEFT JOIN USERS ON USERS.user_id = USERACTIONS.user_id where USERACTIONS.user_action in (1,2);
This works for your query.
You add the numbers to the IN clause.
SELECT a.user_id
FROM
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
IN (1,2)) a
INNER JOIN
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
NOT IN (1,2)) b
ON a.user_id <> b.user_id
;
CREATE TABLE USERACTIONS (id INT NOT NULL AUTO_INCREMENT
, PRIMARY KEY(id)
, user_action INT
, user_id INT
);
INSERT USERACTIONS VALUES (NULL,1,100),(NULL,2,100),(NULL,3,100), (NULL,1,101),(NULL,2,101);
SELECT a.user_id
FROM
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
IN (1,2)) a
INNER JOIN
(SELECT DISTINCT user_id
from
USERACTIONS
WHERE user_action
NOT IN (1,2)) b
ON a.user_id <> b.user_id
;
| user_id |
| ------: |
| 101 |
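One thing worth checking before relying on the <> join: on the sample data only one user (100) has an action outside (1,2), so the inequality join happens to leave just 101. The sketch below adds a hypothetical third user (102, with actions 1 and 4) and compares that join with a GROUP BY / HAVING alternative. It runs on in-memory SQLite rather than MySQL, and the alternative is a suggestion, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USERACTIONS (id INTEGER PRIMARY KEY, user_action INT, user_id INT);
    INSERT INTO USERACTIONS (user_action, user_id) VALUES
        (1, 100), (2, 100), (3, 100),   -- performed 1, 2 and 3
        (1, 101), (2, 101),             -- performed only 1 and 2
        (1, 102), (4, 102);             -- hypothetical extra user: 1 and 4
""")

# The inequality join from the answer above, unchanged apart from ORDER BY.
inequality_join = """
    SELECT a.user_id
    FROM (SELECT DISTINCT user_id FROM USERACTIONS
          WHERE user_action IN (1, 2)) a
    INNER JOIN (SELECT DISTINCT user_id FROM USERACTIONS
                WHERE user_action NOT IN (1, 2)) b
      ON a.user_id <> b.user_id
    ORDER BY a.user_id
"""
# Alternative: one pass per user, counting actions outside the allowed set.
only_1_and_2 = """
    SELECT user_id
    FROM USERACTIONS
    GROUP BY user_id
    HAVING SUM(user_action NOT IN (1, 2)) = 0      -- nothing outside the set
       AND COUNT(DISTINCT user_action) = 2         -- both 1 and 2 present
    ORDER BY user_id
"""

join_result = [row[0] for row in conn.execute(inequality_join)]
grouped_result = [row[0] for row in conn.execute(only_1_and_2)]
print(join_result)      # every user in a pairs with some different user in b
print(grouped_result)   # only user 101 performed exactly actions 1 and 2
```

With two users in subquery b, every user in a finds at least one different user_id to pair with, so the inequality join no longer filters anything out.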
I see typical SO answers that aren't answering OP's question, but rather trying to steer them in a different direction. I know this is old, but if anyone stumbles upon this, I believe this will be more helpful.
I too have a large, enterprise solution where the WHERE check is MUCH more performant in a subquery than using a JOIN.
You can set a variable in your WHERE clause and use it afterwards. I am currently trying to find a better way to do this without setting a variable, but something like this works:
SELECT * FROM USERACTIONS
WHERE
( (@useraction :=
(select user_action from USERACTIONS where user_id=100 LIMIT 1))
= 1
OR @useraction = 2)
AND user_id=100;
What you are doing is creating a variable in your WHERE clause, setting that variable, then using it later. This is encapsulated, so it can match either one of the conditions.
I need a fresh pair of eyes on this. I have two tables, one of which has users and the second which contains login records, multiple records for each user. What I'm trying to do is select all entries from the first table, and the most recent record from the second table, e.g., a list of all users but only show the most recent activity. Both tables have auto increment in the ID column.
My code currently is thus:
SELECT u.user_id, u.name, u.email, r.rid, r.user_id
FROM users AS u
LEFT JOIN login_records AS r ON r.user_id = u.user_id
WHERE
r.rid = (
SELECT MAX( rid )
FROM login_records
WHERE user_id = u.user_id
)
I've scoured answers to similar questions on SO and tried all of them, but results have been either returning nothing or only getting odd results (not necessarily the newest one). ID in both tables is auto-increment, so I thought it should be a relatively simple matter to get the only or highest ID for a particular user, but it either returns nothing or a completely different selection each time.
It's my first time using JOIN - do I have the wrong JOIN? Do I need to ORDER or GROUP things differently?
Thanks for your help. It's got to be something simple, since Danny Coulombe's answer appearing here seems to work for other users.
You will need a subquery I believe:
https://www.db-fiddle.com/f/2wudMDVxReYJz4FEyG19Va/0
CREATE TABLE users (
user_id INT UNSIGNED NOT NULL
AUTO_INCREMENT PRIMARY KEY
);
CREATE TABLE users_logins (
user_login_id INT UNSIGNED NOT NULL
AUTO_INCREMENT PRIMARY KEY,
user_id INT UNSIGNED NOT NULL
);
INSERT INTO users SELECT 1;
INSERT INTO users SELECT 2;
INSERT INTO users_logins SELECT 1,1;
INSERT INTO users_logins SELECT 2,1;
INSERT INTO users_logins SELECT 3,1;
INSERT INTO users_logins SELECT 4,1;
INSERT INTO users_logins SELECT 5,2;
INSERT INTO users_logins SELECT 6,2;
And the query:
SELECT
u.user_id, ul.latest_login_id
FROM users u
LEFT JOIN
(
SELECT user_id, MAX(user_login_id) latest_login_id
FROM users_logins
GROUP BY user_id
) ul ON u.user_id = ul.user_id
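Here is a runnable sketch of the same query, using in-memory SQLite in place of MySQL. The schema and rows mirror the ones above, plus one hypothetical user (3) who never logged in, to show that the LEFT JOIN on the grouped subquery still keeps that user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE users_logins (
        user_login_id INTEGER PRIMARY KEY,
        user_id INT NOT NULL
    );
    INSERT INTO users (user_id) VALUES (1), (2), (3);  -- user 3 never logged in
    INSERT INTO users_logins (user_login_id, user_id) VALUES
        (1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2);
""")

# One row per user: the subquery reduces logins to the newest id per user,
# and the LEFT JOIN keeps users that have no logins at all.
latest_logins = conn.execute("""
    SELECT u.user_id, ul.latest_login_id
    FROM users u
    LEFT JOIN (
        SELECT user_id, MAX(user_login_id) AS latest_login_id
        FROM users_logins
        GROUP BY user_id
    ) ul ON u.user_id = ul.user_id
    ORDER BY u.user_id
""").fetchall()
print(latest_logins)
```

The user without logins comes back with a NULL (None in Python) login id. A WHERE condition on the joined table's id, as in the question, would drop that row because the comparison with NULL is never true.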
You have to ORDER BY the column you want to sort on, in descending order, for example ORDER BY last_login DESC.
Replace last_login with the column you want to order by, but you must first list that column after SELECT.
How about replacing every rid in the WHERE clause and correlated subquery with record_id?
SELECT u.user_id, u.name, u.email, r.rid, r.record_id, r.user_id
FROM test_users AS u
LEFT JOIN test_login_records AS r ON r.user_id = u.user_id
WHERE
(r.record_id = (
SELECT MAX(record_id)
FROM test_login_records
WHERE user_id = u.user_id
) OR r.record_id is null);
I am building a SQL query over a large set of data, but the query is too slow.
I've got 3 tables: movies, movie_categories, skipped_movies
The movies table is normalized, and I am trying to query a movie based on a category while excluding ids from the skipped_movies table.
I am trying to use WHERE IN and WHERE NOT IN in my query.
movies table has approx. 2 million rows (id, name, score)
movie_categories approx. 5 million (id, movie_id, category_id)
skipped_movies has approx. 1k rows (id, movie_id, user_id)
When the skipped_movies table is very small (10 - 20 rows) the query is quite fast (about 40 - 50 ms), but when the table grows to around 1k rows the query takes 7 to 8 seconds.
This is the query I'm using.
SELECT SQL_NO_CACHE * FROM `movies` WHERE `id` IN (SELECT `movie_id` FROM `movie_categories` WHERE `category_id` = 1) AND `id` NOT IN (SELECT `movie_id` FROM `skipped_movies` WHERE `user_id` = 1) AND `score` <= 9 ORDER BY `score` DESC LIMIT 1;
I've tried many ways that came to mind, but this was the fastest one. I even tried the EXISTS method, to no avail.
I'm using SQL_NO_CACHE just for testing.
And I guess that the ORDER BY clause is running very slowly.
Assuming that (movie_id,category_id) is unique in the movie_categories table, I'd get the specified result using join operations, rather than subqueries.
To exclude "skipped" movies, an anti-join pattern would suffice... that's a left outer join to find matching rows in skipped_movies, and then a predicate in the WHERE clause to exclude any matches found, leaving only rows that didn't have a match.
SELECT SQL_NO_CACHE m.*
FROM movies m
JOIN movie_categories c
ON c.movie_id = m.id
AND c.category_id = 1
LEFT
JOIN skipped_movies s
ON s.movie_id = m.id
AND s.user_id = 1
WHERE s.movie_id IS NULL
AND m.score <= 9
ORDER
BY m.score DESC
LIMIT 1
And appropriate indexes will likely improve performance...
... ON movie_categories (category_id, movie_id)
... ON skipped_movies (user_id, movie_id)
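A small sketch of the anti-join with those two indexes in place. It assumes in-memory SQLite instead of MySQL, so SQL_NO_CACHE is dropped; the index names and the four sample movies are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT, score INT);
    CREATE TABLE movie_categories (id INTEGER PRIMARY KEY, movie_id INT, category_id INT);
    CREATE TABLE skipped_movies (id INTEGER PRIMARY KEY, movie_id INT, user_id INT);
    -- Index names are placeholders; the column order follows the suggestion above.
    CREATE INDEX idx_cat_movie ON movie_categories (category_id, movie_id);
    CREATE INDEX idx_user_movie ON skipped_movies (user_id, movie_id);

    INSERT INTO movies (id, name, score) VALUES
        (1, 'A', 9), (2, 'B', 8), (3, 'C', 10), (4, 'D', 7);
    INSERT INTO movie_categories (movie_id, category_id) VALUES
        (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO skipped_movies (movie_id, user_id) VALUES (1, 1), (2, 2);
""")

# Highest-scoring movie in the category that this user has not skipped.
best_unskipped = conn.execute("""
    SELECT m.id, m.name, m.score
    FROM movies m
    JOIN movie_categories c
      ON c.movie_id = m.id AND c.category_id = ?
    LEFT JOIN skipped_movies s
      ON s.movie_id = m.id AND s.user_id = ?
    WHERE s.movie_id IS NULL
      AND m.score <= 9
    ORDER BY m.score DESC
    LIMIT 1
""", (1, 1)).fetchone()
print(best_unskipped)
```

Movie 1 is skipped by user 1 and movie 3 scores above 9, so movie 2 is the row returned for category 1. Whether the indexes help on the real 2-million-row table is something to confirm with EXPLAIN in MySQL.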
Most IN/NOT IN queries can be expressed using JOIN/LEFT JOIN, which usually gives the best performance.
Convert your query to use joins:
SELECT m.*
FROM movies m
JOIN movie_categories mc ON m.id = mc.movie_id AND mc.category_id = 1
LEFT JOIN skipped_movies sm ON m.id = sm.movie_id AND sm.user_id = 1
WHERE sm.movie_id IS NULL
AND score <= 9
ORDER BY score DESC
LIMIT 1
Your query seems to be all right. It just needs a small tweak: replace * with the names of the columns you actually need from the table. This can make the query faster, since SELECT * fetches every column.
I was wondering which is better in MySQL. I have a SELECT query that excludes every entry associated with a banned userID.
Currently I have a subquery in the WHERE clause that goes like this:
AND (SELECT COUNT(*)
FROM TheBlackListTable
WHERE userID = userList.ID
AND blackListedID = :userID2 ) = 0
which accepts every userID not present in TheBlackListTable.
Would it be faster to first retrieve all banned IDs in a previous request and replace the clause above with
AND creatorID NOT IN listOfBannedID
LEFT JOIN / IS NULL and NOT IN are fastest:
SELECT *
FROM mytable
WHERE id NOT IN
(
SELECT userId
FROM blacklist
WHERE blackListedID = :userID2
)
or
SELECT m.*
FROM mytable m
LEFT JOIN
blacklist b
ON b.userId = m.id
AND b.blackListedID = :userID2
WHERE b.userId IS NULL
NOT EXISTS yields the same plan but due to implementation flaws is marginally less efficient:
SELECT *
FROM mytable m
WHERE NOT EXISTS
(
SELECT NULL
FROM blacklist b
WHERE b.userId = m.id
AND b.blacklistedId = :userID2
)
All these queries stop on the first match in blacklist (hence performing a semi-join)
The COUNT(*) solution is the least efficient, since MySQL will calculate the actual COUNT(*) rather than stopping on the first match.
However, if you have a UNIQUE index on (userId, blacklistedId), this is not much of a problem, as there cannot be more than one match anyway.
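To confirm that the four formulations select the same rows, here is a minimal sketch on in-memory SQLite (an assumption; it shows equivalence of results only, not MySQL's plans or timings). The sample ids and the blacklisting user 42 are invented, and userId is declared NOT NULL because NOT IN returns no rows at all if the subquery yields a NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY);
    CREATE TABLE blacklist (userId INT NOT NULL, blackListedID INT NOT NULL,
                            UNIQUE (userId, blackListedID));
    INSERT INTO mytable (id) VALUES (1), (2), (3), (4);
    INSERT INTO blacklist (userId, blackListedID) VALUES (2, 42), (4, 42), (3, 7);
""")

variants = {
    "not_in": """
        SELECT id FROM mytable
        WHERE id NOT IN (SELECT userId FROM blacklist WHERE blackListedID = :uid)
        ORDER BY id""",
    "left_join": """
        SELECT m.id FROM mytable m
        LEFT JOIN blacklist b ON b.userId = m.id AND b.blackListedID = :uid
        WHERE b.userId IS NULL
        ORDER BY m.id""",
    "not_exists": """
        SELECT m.id FROM mytable m
        WHERE NOT EXISTS (SELECT NULL FROM blacklist b
                          WHERE b.userId = m.id AND b.blackListedID = :uid)
        ORDER BY m.id""",
    "count_zero": """
        SELECT m.id FROM mytable m
        WHERE (SELECT COUNT(*) FROM blacklist b
               WHERE b.userId = m.id AND b.blackListedID = :uid) = 0
        ORDER BY m.id""",
}

# Run every variant with the same bound parameter and collect the ids.
results = {name: [row[0] for row in conn.execute(sql, {"uid": 42})]
           for name, sql in variants.items()}
print(results)
```

All four should list ids 1 and 3, so the choice between them is about the plan MySQL produces, which is what the comparison above is describing.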
Use a NOT EXISTS clause to check that the user is not in the blacklist.
Sample query:
Select * from userList
where not exists( Select 1 from TheBlackListTable where userID = userList.ID)
An IN clause is best used when there are fixed values or a small number of values.