I have this dataset:
id uid follows_uid status
1 1 2 ACTIVE
2 1 3 ACTIVE
3 3 1 ACTIVE
4 4 1 ACTIVE
5 2 1 ACTIVE
Given a uid, I want to count how many users that user is following and how many users follow that user.
Result set will be:
following followers
2 3
and here is the query which does the work:
SELECT COUNT(*) as following,
(SELECT COUNT(*) FROM user_followers WHERE follows_uid = 1 AND `status` = 'ACTIVE') as followers
FROM user_followers
WHERE uid = 1 and `status` = 'ACTIVE'
Now the question is: isn't there any other way to get this done? Or is this the best way to achieve it?
If you have separate indexes on uid and follows_uid, then I believe using subqueries as you did is the fastest way to retrieve the separate counts because each query will take advantage of an index to retrieve the count.
Here's another way of achieving it.
select following.*, followers.* from
(select count(uid) from user_followers where uid = 1) following,
(select count(follows_uid) from user_followers where follows_uid = 1) followers;
And, to answer your question, your subquery approach is, in fact, the best way to achieve it. As pointed out by @FuzzyTree, you could use indexes to optimise your performance.
SELECT
IFNULL(SUM(IF(uid = 1, 1, 0)), 0) as following,
IFNULL(SUM(IF(follows_uid = 1, 1, 0)), 0) as followers
FROM user_followers
WHERE (uid = 1 OR follows_uid = 1)
AND `status` = 'ACTIVE';
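A minimal sketch of the single-pass conditional aggregation above, run against the sample rows from the question. It uses Python's built-in sqlite3 as a stand-in for MySQL, so `IFNULL(SUM(IF(...)))` is written with the portable `COALESCE` and `CASE`; the helper names `make_follow_db` and `follow_counts` are made up for this illustration.

```python
import sqlite3


def make_follow_db():
    """Build an in-memory table holding the sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_followers ("
        "id INTEGER PRIMARY KEY, uid INTEGER, follows_uid INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO user_followers VALUES (?, ?, ?, ?)",
        [
            (1, 1, 2, "ACTIVE"),
            (2, 1, 3, "ACTIVE"),
            (3, 3, 1, "ACTIVE"),
            (4, 4, 1, "ACTIVE"),
            (5, 2, 1, "ACTIVE"),
        ],
    )
    return conn


def follow_counts(conn, uid):
    """Return (following, followers) for uid with one scan of the matching rows."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(CASE WHEN uid = :uid THEN 1 ELSE 0 END), 0) AS following,
               COALESCE(SUM(CASE WHEN follows_uid = :uid THEN 1 ELSE 0 END), 0) AS followers
        FROM user_followers
        WHERE (uid = :uid OR follows_uid = :uid)
          AND status = 'ACTIVE'
        """,
        {"uid": uid},
    ).fetchone()
    return tuple(row)


conn = make_follow_db()
print(follow_counts(conn, 1))  # (2, 3), matching the expected result set
```

The `COALESCE` wrapper matters for a uid with no matching rows: `SUM` over zero rows yields NULL, which would otherwise be returned instead of 0.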
Fairly new to MySQL and I'm struggling with a query for table of data I'm trying to filter through. What I'd like to be able to do is identify the user_id's only where a set of conditions is met on the record column.
Example
Return a SINGLE user_id of each of the users that hold ALL of the records 1, 2 & 3.
user_id record
---------------------
1000 1
1001 1
1002 1
1003 1
1004 1
1000 2
1000 3
1002 2
1002 3
The ideal output in this example would be...
user_id
-------
1000
1002
I've tried quite a few variants using HAVING, COUNT and IN but I never seem to get the correct output and I think I'm starting to confuse myself. Anyone that could help would be greatly appreciated.
Use aggregation:
select user_id
from t
where record in (1, 2, 3)
group by user_id
having count(*) = 3; -- Use distinct inside function in case of duplicate records
If you don't know in advance what the records are, you can compare against the number of distinct records in the table:
select user_id
from t
group by user_id
having count(*) = (select count(distinct record) from t);
You can use a HAVING clause with a subquery that counts the distinct record values in the table:
SELECT user_id
FROM t
GROUP BY user_id
HAVING COUNT(*) = (SELECT COUNT(distinct record) FROM t );
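As a hedged illustration of the comment about duplicates, the sketch below runs the aggregation with `COUNT(DISTINCT record)` in sqlite3 (standing in for MySQL). The table contents are the question's sample rows plus two duplicate rows for user 1001 that were added here purely to show why `DISTINCT` is the safer choice; `users_holding_all` is a hypothetical helper name.

```python
import sqlite3


def make_records_db():
    """Sample rows from the question, plus two added duplicates for user 1001."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (user_id INTEGER, record INTEGER)")
    rows = [
        (1000, 1), (1001, 1), (1002, 1), (1003, 1), (1004, 1),
        (1000, 2), (1000, 3), (1002, 2), (1002, 3),
        (1001, 1), (1001, 1),  # duplicates (not in the original data)
    ]
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    return conn


def users_holding_all(conn, records):
    """Return the user_ids that hold every record in `records`."""
    wanted = sorted(set(records))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT user_id
        FROM t
        WHERE record IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(DISTINCT record) = ?   -- COUNT(*) would be fooled by duplicates
        ORDER BY user_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = make_records_db()
print(users_holding_all(conn, [1, 2, 3]))  # [1000, 1002]
```

With plain `COUNT(*) = 3`, user 1001 would also be returned here, because its three rows all carry record 1.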
I have a table like
uid programid
1 3
1 4
2 5
2 6
3 3
...
but imagine it with a million rows. What I would like to get is something like
Is it possible to do that using MySQL? The percentage is not that important, but I would really appreciate getting the 'cluster' part.
Thanks for your help :)
You can pre-compute data using table expressions as shown below. If using MySQL 8.x you can use CTEs (that are friendlier to use). For example:
select
favorites,
users,
case when users = 0 then 0 else mod(users - 1, 5) end as cluster
from (
select
favorites,
count(*) as users
from (
select uid, count(*) as favorites
from t
group by uid
) x
group by favorites
) y
order by favorites
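For reference, here is a rough sketch of the same two-level aggregation written with a CTE, as mentioned for MySQL 8.x. It runs on sqlite3 against the five visible sample rows from the question and leaves out the `cluster` column; `favorites_histogram` and `make_favorites_db` are names invented for this example.

```python
import sqlite3


def make_favorites_db():
    """The five visible sample rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (uid INTEGER, programid INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(1, 3), (1, 4), (2, 5), (2, 6), (3, 3)],
    )
    return conn


def favorites_histogram(conn):
    """Return (favorites, users): how many users have each number of favorites."""
    return conn.execute(
        """
        WITH per_user AS (
            -- step 1: number of favorites per user
            SELECT uid, COUNT(*) AS favorites
            FROM t
            GROUP BY uid
        )
        -- step 2: number of users per favorites count
        SELECT favorites, COUNT(*) AS users
        FROM per_user
        GROUP BY favorites
        ORDER BY favorites
        """
    ).fetchall()


conn = make_favorites_db()
print(favorites_histogram(conn))  # [(1, 1), (2, 2)]
```

On the sample rows, one user (uid 3) has a single favorite and two users (uid 1 and 2) have two each.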
I'm using MySQL.
I can't change the DB structure, so that's not an option, sadly.
THE ISSUE:
When I use GROUP BY with CASE (as I need to in my situation), MySQL uses
filesort and the delay is huge (approx. 2-3 minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when I don't use CASE and just GROUP BY group_id, MySQL easily uses
the index and the result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenario: DETAILED
Table msgs, containing records of sent messages, with fields:
id,
user_id (the guy who sent the message),
type (0 means it's a group msg. All msgs sent to a group are marked with its group_id. So if group_id = 5 was sent 5 msgs, the table will have 5 records with group_id = 5 and type = 0. For type > 0, group_id will be NULL, because all other types are individual msgs sent to a single recipient),
group_id (if type = 0, contains the group_id, else NULL)
The table contains approx. 10 million records for user id 50001, with different types (i.e. group as well as individual msgs).
Now the QUERY:
SELECT
msgs.*
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.user_id IN (50111)
AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get the summary in a single QUERY,
so msgs sent to a group, let's say group 5 (which has 5 records in this table), will be shown as 1 record in the summary (I may show a COUNT later, but that's not an issue).
The individual msgs have NULL as group_id, so I can't just put 'GROUP BY group_id', because that would group all individual msgs into a single record, which is not acceptable.
Sample output can be something like:
id owner_id, type group_id COUNT
1 50001 0 2 5
1 50001 1 NULL 1
1 50001 4 NULL 1
1 50001 0 7 5
1 50001 5 NULL 1
1 50001 5 NULL 1
1 50001 5 NULL 1
1 50001 0 10 5
Now the problem is that the GROUP BY condition using CASE (which I currently think I need, because I only want to group by group_id if type = 0) causes a lot of delay because it's not using the indexes, which it does if I don't use CASE (i.e. just GROUP BY group_id). Please view the SQL Fiddles above to see the EXPLAIN results.
Can anyone please advise how to get this optimized?
UPDATE
I tried a workaround that more or less works (it drops the INITIAL queries to 1 sec). It uses a UNION to keep the result set small, since a huge result set forces MySQL to write to disk for the filesort: it limits the result set of the group msgs and of the individual msgs separately (see the query below).
-- The first part of the union retrieves group msgs (those with type 0, which need to be grouped by group_id). It applies the limit to cap the out-of-control result set
-- The second query retrieves individual msgs (those with type != 0, grouped by msgs.id; not necessary, but just to be safe from duplicate entries due to joins). It applies the limit to cap the out-of-control result set
-- The two are then combined to retrieve the desired result set
Here's the query:
SELECT
*
FROM
(
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (msgs.user_id = accounts.id)
WHERE 1
AND accounts.id IN (50111 ) AND type = 0
GROUP BY msgs.group_id
ORDER BY msgs.id DESC
LIMIT 40
)
UNION
ALL
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.type != 0
AND accounts.id IN (50111)
GROUP BY msgs.id
ORDER BY msgs.id
LIMIT 40
)
) AS temp
ORDER BY reference_id
LIMIT 20,20
But it has a lot of caveats:
-I need to handle the limit in the inner queries as well. Let's say 20 recs per page, and I'm on page 4. For the inner queries I need to apply limit 0,80, since I'm not sure how many records each of the two parts contributed to the previous 3 pages. So, as the records per page and the number of pages grow, my query gets heavier. Let's say 1k recs per page and I'm on page 100 or 1K: the load gets heavier and the time keeps increasing
-I need to handle ordering in the inner queries and then apply it again on the result set prepared by the union; conditions need to be applied to both inner queries separately (but that's not much of an issue)
-I can't use calc_found_rows, so I will need to get the count using separate queries
The main issue is the first one. The higher I go with the pagination, the heavier it gets.
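The limit arithmetic behind the first caveat can be sketched as below. This is only an illustration of the numbers quoted above (page 4 at 20 records per page needs an inner limit of 80); the function name `union_page_limits` is hypothetical.

```python
def union_page_limits(page, per_page):
    """Limits for paginating a UNION ALL of two independently limited queries.

    Each inner query must return every row that could appear on pages
    1..page, because we do not know how the earlier pages were split
    between the two parts. The outer query then skips the earlier pages.
    Returns (inner_limit, outer_offset, outer_count).
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    inner_limit = page * per_page          # e.g. page 4 x 20 rows -> LIMIT 0,80
    outer_offset = (page - 1) * per_page   # rows belonging to earlier pages
    return inner_limit, outer_offset, per_page


print(union_page_limits(4, 20))  # (80, 60, 20)
print(union_page_limits(2, 20))  # (40, 20, 20)
```

The second call reproduces the limits used in the query above: `LIMIT 40` in each inner query and `LIMIT 20,20` on the outer one. The inner limit grows linearly with the page number, which is why deep pages get progressively heavier.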
Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
) AS x  -- MySQL requires an alias on every derived table
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)
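A small sketch of the `IFNULL(group_id, id)` grouping idea, run on sqlite3 with a handful of made-up rows (the ids, group ids and counts below are hypothetical, not taken from the question's table). `COALESCE` is used as the portable spelling of `IFNULL`. One thing to watch with this key: an individual message whose id happens to equal some group_id would land in that group's bucket, so the sample ids are chosen to avoid that collision.

```python
import sqlite3


def make_msgs_db():
    """Hypothetical rows: two group conversations and two individual messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msgs ("
        "id INTEGER PRIMARY KEY, user_id INTEGER, type INTEGER, group_id INTEGER)"
    )
    # Index suggested in the answer above.
    conn.execute("CREATE INDEX idx_user_type ON msgs (user_id, type)")
    conn.executemany(
        "INSERT INTO msgs VALUES (?, ?, ?, ?)",
        [
            (101, 50111, 0, 2), (102, 50111, 0, 2),                      # group 2
            (103, 50111, 1, None), (104, 50111, 5, None),                # individual
            (105, 50111, 0, 7), (106, 50111, 0, 7), (107, 50111, 0, 7),  # group 7
        ],
    )
    return conn


def summarize_msgs(conn, user_id):
    """One row per group conversation plus one row per individual message."""
    return conn.execute(
        """
        SELECT COALESCE(group_id, id) AS bucket,  -- IFNULL(group_id, id) in MySQL
               COUNT(*) AS msg_count
        FROM msgs
        WHERE user_id = ?
          AND type IN (0, 1, 5, 7)
        GROUP BY COALESCE(group_id, id)
        ORDER BY bucket
        """,
        (user_id,),
    ).fetchall()


conn = make_msgs_db()
print(summarize_msgs(conn, 50111))  # [(2, 2), (7, 3), (103, 1), (104, 1)]
```

Whether this is actually faster than the CASE version on 10 million rows would need to be checked with EXPLAIN on the real table; the sketch only shows that the grouping key produces the intended buckets.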
I have the table solution with: id, user_id, problem_id, correct, date, tries
where correct can be true or false,
date is the date the solution was saved,
and tries is the number of times the user has submitted a solution
user_id problem_id tries correct
------- ---------- ----- --------
1 1 1 true
1 2 1 false
1 2 2 false
1 2 3 false
1 3 1 false
1 3 2 false
1 3 3 true
1 3 4 false
I need to get the user's number of tries before the first correct solution,
So I've tried this:
SELECT problem_id, tries FROM solution
where user_id = ? and correct = true
group by problem_id order by date;
This gives me the number of tries until the first correct solution, but only for the solutions that were at least once correct.
problem_id tries
---------- -----
1 1
3 3
I also need to see the number of tries even if the user has never had a correct solution.
How can I get these two results together?
problem_id tries
---------- -----
1 1
2 3
3 3
Possibly using a sub query (not tested):-
SELECT a.problem_id, IF(b.user_id IS NULL, 0, COUNT(*))
FROM solution a
LEFT OUTER JOIN
(
SELECT user_id, problem_id, MIN(date) AS min_date
FROM solution
WHERE correct = true
GROUP BY user_id, problem_id
) b
ON a.problem_id = b.problem_id
AND a.user_id = b.user_id
AND a.date < b.min_date
WHERE a.user_id = ?
GROUP BY a.problem_id
EDIT - Having played with the test data I think I may have a solution. Not sure if there are any edge cases it fails on though:-
SELECT a.user_id, a.problem_id, SUM(IF(b.user_id IS NULL OR a.date <= b.min_date, 1, 0))
FROM solution a
LEFT OUTER JOIN
(
SELECT user_id, problem_id, MIN(date) AS min_date
FROM solution
WHERE correct = 'true'
GROUP BY user_id, problem_id
) b
ON a.problem_id = b.problem_id
AND a.user_id = b.user_id
GROUP BY a.user_id, a.problem_id
This uses a subquery to find the earliest date with a correct solution for each user and problem, and joins that against the list of solutions. It then does a SUM of 1 or 0, with a row counting as 1 if there is no correct solution, or if there is a correct solution and its date is greater than or equal to this solution's date.
SQL fiddle for it here:-
http://www.sqlfiddle.com/#!2/f48e11/1
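To make the join-and-sum logic concrete, here is a sketch that runs it on sqlite3 with the sample rows from the question. The question does not show the date column, so the dates below are invented (one day per submission, in the order of the tries), and `correct` is stored as 1/0; `tries_until_first_correct` is a hypothetical helper name.

```python
import sqlite3


def make_solution_db():
    """Sample rows from the question; the date values are assumed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE solution ("
        "user_id INTEGER, problem_id INTEGER, tries INTEGER, correct INTEGER, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO solution VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 1, "2024-01-01"),
            (1, 2, 1, 0, "2024-01-02"),
            (1, 2, 2, 0, "2024-01-03"),
            (1, 2, 3, 0, "2024-01-04"),
            (1, 3, 1, 0, "2024-01-05"),
            (1, 3, 2, 0, "2024-01-06"),
            (1, 3, 3, 1, "2024-01-07"),
            (1, 3, 4, 0, "2024-01-08"),
        ],
    )
    return conn


def tries_until_first_correct(conn, user_id):
    """Per problem: tries up to the first correct one, or all tries if none is correct."""
    return conn.execute(
        """
        SELECT a.problem_id,
               SUM(CASE WHEN b.user_id IS NULL OR a.date <= b.min_date
                        THEN 1 ELSE 0 END) AS tries
        FROM solution a
        LEFT JOIN (
            -- earliest correct submission per user and problem
            SELECT user_id, problem_id, MIN(date) AS min_date
            FROM solution
            WHERE correct = 1
            GROUP BY user_id, problem_id
        ) b ON a.problem_id = b.problem_id AND a.user_id = b.user_id
        WHERE a.user_id = ?
        GROUP BY a.problem_id
        ORDER BY a.problem_id
        """,
        (user_id,),
    ).fetchall()


conn = make_solution_db()
print(tries_until_first_correct(conn, 1))  # [(1, 1), (2, 3), (3, 3)]
```

The result matches the desired output in the question: problem 2 has no correct solution, so all three of its rows are counted, while the fourth try on problem 3 is ignored because it comes after the first correct one.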
If the query you mention gets you what you want for "Correct" answers, to get "Incorrect" numbers, just UNION your query with an effective duplicate of itself adjusting the correct = true to a correct = false predicate.
I have two tables:
Members:
id username
Trips:
id member_id flag_status created
("YES" or "NO")
I can do a query like this:
SELECT
Trip.id, Trip.member_id, Trip.flag_status
FROM
trips Trip
WHERE
Trip.member_id = 1711
ORDER BY
Trip.created DESC
LIMIT
3
Which CAN give results like this:
id member_id flag_status
8 1711 YES
9 1711 YES
10 1711 YES
My goal is to know if the member's last three trips all had a flag_status = "YES", if any of the three != "YES", then I don't want it to count.
I also want to be able to remove the WHERE Trip.member_id = 1711 clause, and have it run for all my members, and give me the total number of members whose last 3 trips all have flag_status = "YES"
Any ideas?
Thanks!
http://sqlfiddle.com/#!2/28b2d
In that sqlfiddle, when the correct query I'm seeking runs, I should see results such as:
COUNT(Member.id)
2
The two members that should qualify are members 1 and 3. Member 5 fails because one of his trips has flag_status = "NO"
You could use the GROUP_CONCAT function to obtain a list of all of the statuses ordered by id in descending order:
SELECT
member_id,
GROUP_CONCAT(flag_status ORDER BY id DESC) as status
FROM
trips
GROUP BY
member_id
HAVING
SUBSTRING_INDEX(status, ',', 3) NOT LIKE '%NO%'
and then, using SUBSTRING_INDEX, you can extract only the last three status flags and exclude the members whose list contains a NO. Please see fiddle here. I'm assuming that all of your rows are ordered by ID, but if you have a created date you would be better off using:
GROUP_CONCAT(flag_status ORDER BY created DESC) as status
as Raymond suggested. Then, you could also return just the count of the rows returned using something like:
SELECT COUNT(*)
FROM (
...the query above...
) as q
Although I like the simplicity of fthiella's solution, I'm not comfortable with a solution that depends so much on data representation. To avoid depending on it, you can do something like this:
SELECT COUNT(*) FROM (
SELECT member_id FROM (
SELECT
flag_status,
@flag_index := IF(member_id = @member, @flag_index + 1, 1) flag_index,
@member := member_id member_id
FROM trips, (SELECT @member := 0, @flag_index := 1) init
ORDER BY member_id, id DESC
) x
WHERE flag_index <= 3
GROUP BY member_id
HAVING SUM(flag_status = 'NO') = 0
) x
Fiddle here. Note I've slightly modified the fiddle to remove one of the users.
The process basically ranks the trips of each member by id in descending order and then keeps only the last 3 of them. Then it makes sure that none of the fetched trips has a NO in flag_status. Finally, all the matching members are counted.
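The same rank-then-filter idea can also be expressed with the `ROW_NUMBER()` window function instead of user variables. This is a different technique from the query above, available in MySQL 8.0+ and, for this sketch, in SQLite 3.25+. The trips below are made up to mirror the description in the question (members 1 and 3 qualify, member 5 does not), and `count_members_last_three_yes` is a hypothetical helper name.

```python
import sqlite3


def make_trips_db():
    """Hypothetical trips: members 1 and 3 qualify, member 5 has a recent NO."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (id INTEGER PRIMARY KEY, member_id INTEGER, flag_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO trips VALUES (?, ?, ?)",
        [
            (1, 1, "NO"), (2, 1, "YES"), (3, 1, "YES"), (4, 1, "YES"),
            (5, 3, "YES"), (6, 3, "YES"), (7, 3, "YES"),
            (8, 5, "YES"), (9, 5, "NO"), (10, 5, "YES"),
        ],
    )
    return conn


def count_members_last_three_yes(conn):
    """Count members whose three most recent trips (by id) are all 'YES'."""
    return conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT member_id
            FROM (
                -- rank each member's trips, newest first
                SELECT member_id, flag_status,
                       ROW_NUMBER() OVER (PARTITION BY member_id ORDER BY id DESC) AS rn
                FROM trips
            ) AS ranked
            WHERE rn <= 3
            GROUP BY member_id
            -- like the query above, members with fewer than 3 trips still count
            HAVING SUM(CASE WHEN flag_status <> 'YES' THEN 1 ELSE 0 END) = 0
        ) AS q
        """
    ).fetchone()[0]


conn = make_trips_db()
print(count_members_last_three_yes(conn))  # 2
```

Member 1 still qualifies even though an older trip was flagged NO, because only the three newest rows survive the `rn <= 3` filter. If members with fewer than three trips should be excluded, adding `AND COUNT(*) = 3` to the HAVING clause would be one way to do it.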