MySQL: merging two queries, one with GROUP BY

I have two tables: one holds user info (id, name, etc.) and the other holds user tickets and ticket status (ticket_id, user_id, ticket_status, etc.).
I want to produce a list of ALL the users, for example: ( SELECT * FROM user_table )
For each user I also need a count of their tickets, for example:
(SELECT t1.user_id, COUNT(*) FROM user_tickets t1 WHERE t1.ticket_status = 15 GROUP BY t1.ticket_status, t1.user_id )
I can get what I'm looking for with the following query, but it takes 5 seconds to run on 50,000 tickets, while each query run separately takes only a fraction of a second.
SELECT t1.user_id, COUNT(*)
FROM user_tickets t1
LEFT JOIN user_table t2 ON t1.user_id = t2.id
WHERE t2.group_id = 20 AND t1.status_id = 15
GROUP BY t1.status_id, user_id
Any idea how to write the query so that it performs as well as the two separate queries?

Adding an index on the columns used in the WHERE clause fixed the problem.
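As a rough illustration of that fix, the sketch below runs the question's query through Python's built-in sqlite3 module. SQLite only stands in for MySQL here, the index names (idx_user_group, idx_tickets_status_user) and sample rows are made up, and the best index for a real MySQL table may differ; EXPLAIN on the actual server is the thing to check.

```python
import sqlite3

# SQLite is a stand-in for MySQL; index names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_table (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    CREATE TABLE user_tickets (ticket_id INTEGER PRIMARY KEY, user_id INTEGER, status_id INTEGER);

    -- Index the columns that the WHERE clause and the join condition use.
    CREATE INDEX idx_user_group ON user_table (group_id);
    CREATE INDEX idx_tickets_status_user ON user_tickets (status_id, user_id);
""")
conn.executemany("INSERT INTO user_table VALUES (?, ?, ?)",
                 [(1, "ann", 20), (2, "bob", 20), (3, "cy", 30)])
conn.executemany("INSERT INTO user_tickets (user_id, status_id) VALUES (?, ?)",
                 [(1, 15), (1, 15), (1, 7), (2, 15), (3, 15)])

query = """
    SELECT t1.user_id, COUNT(*)
    FROM user_tickets t1
    JOIN user_table t2 ON t1.user_id = t2.id
    WHERE t2.group_id = 20 AND t1.status_id = 15
    GROUP BY t1.user_id
    ORDER BY t1.user_id
"""
ticket_counts = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row describes one step of the plan.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(ticket_counts)
for step in plan:
    print(step)
```

Printing the plan before and after creating the indexes shows whether the engine switches from a full scan to an index lookup.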

Related

Selecting Counts from Different Tables with a Subquery

I'm new to MySQL, and I'd like some help in setting up a MySQL query to pull some data from a few tables (~100,000 rows) in a particular output format.
This problem involves three SQL tables:
allusers : This one contains user information. The columns of interest are userid and vip
table1 and table2 contain data, but they also have a userid column, which matches the userid column in allusers.
What I'd like to do:
I'd like to create a query which searches through allusers, finds the userid of those that are VIP, and then count the number of records in each of table1 and table2 grouped by the userid. So, my desired output is:
userid | Count in Table1 | Count in Table2
1 | 5 | 21
5 | 16 | 31
8 | 21 | 12
What I've done so far:
I've created this statement:
SELECT userid, count(1)
FROM table1
WHERE userid IN (SELECT userid FROM allusers WHERE vip IS NOT NULL)
GROUP BY userid
This gets me close to what I want. But now, I want to add another column with the respective counts from table2
I also tried using joins like this:
select A.userid, count(T1.userid), count(T2.userid) from allusers A
left join table1 T1 on T1.userid = A.userid
left join table2 T2 on T2.userid = A.userid
where A.vip is not null
group by A.userid
However, this query took a very long time and I had to kill the query. I'm assuming this is because using Joins for such large tables is very inefficient.
Similar Questions
This one is looking for a similar result as I am, but doesn't need nearly as much filtering with subqueries
This one sums up the counts across tables, while I need the counts separated into columns
Could someone help me set up the query to generate the data I need?
Thanks!
You need to pre-aggregate first, then join, otherwise the results will not be what you expect if a user has several rows in both table1 and table2. Besides, pre-aggregation is usually more efficient than outer aggregation in a situation such as yours.
Consider:
select a.userid, t1.cnt cnt1, t2.cnt cnt2
from allusers a
left join (select userid, count(*) cnt from table1 group by userid) t1
on t1.userid = a.userid
left join (select userid, count(*) cnt from table2 group by userid) t2
on t2.userid = a.userid
where a.vip is not null
This is a case where I would recommend correlated subqueries:
select a.userid,
(select count(*) from table1 t1 where t1.userid = a.userid) as cnt1,
(select count(*) from table2 t2 where t2.userid = a.userid) as cnt2
from allusers a
where a.vip is not null;
I recommend this approach because you are filtering the allusers table, which means the pre-aggregation approach may do additional, unnecessary work for users that the filter discards.
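To make the difference between the approaches concrete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented: user 1 has 2 rows in table1 and 3 rows in table2, and user 2 has none. It compares the double LEFT JOIN from the question with the pre-aggregated and correlated-subquery versions.

```python
import sqlite3

# SQLite stands in for MySQL; the sample data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE allusers (userid INTEGER PRIMARY KEY, vip INTEGER);
    CREATE TABLE table1 (userid INTEGER);
    CREATE TABLE table2 (userid INTEGER);
    INSERT INTO allusers VALUES (1, 1), (2, 1), (3, NULL);
    INSERT INTO table1 VALUES (1), (1);
    INSERT INTO table2 VALUES (1), (1), (1);
""")

# Joining both detail tables first pairs every table1 row with every table2 row.
naive = conn.execute("""
    SELECT a.userid, COUNT(t1.userid), COUNT(t2.userid)
    FROM allusers a
    LEFT JOIN table1 t1 ON t1.userid = a.userid
    LEFT JOIN table2 t2 ON t2.userid = a.userid
    WHERE a.vip IS NOT NULL
    GROUP BY a.userid ORDER BY a.userid
""").fetchall()

# Aggregating each table on its own, then joining, keeps the counts separate.
pre_aggregated = conn.execute("""
    SELECT a.userid, t1.cnt, t2.cnt
    FROM allusers a
    LEFT JOIN (SELECT userid, COUNT(*) cnt FROM table1 GROUP BY userid) t1
           ON t1.userid = a.userid
    LEFT JOIN (SELECT userid, COUNT(*) cnt FROM table2 GROUP BY userid) t2
           ON t2.userid = a.userid
    WHERE a.vip IS NOT NULL
    ORDER BY a.userid
""").fetchall()

# Correlated subqueries count only for the users that survive the filter.
correlated = conn.execute("""
    SELECT a.userid,
           (SELECT COUNT(*) FROM table1 t1 WHERE t1.userid = a.userid),
           (SELECT COUNT(*) FROM table2 t2 WHERE t2.userid = a.userid)
    FROM allusers a
    WHERE a.vip IS NOT NULL
    ORDER BY a.userid
""").fetchall()

print(naive)           # [(1, 6, 6), (2, 0, 0)]
print(pre_aggregated)  # [(1, 2, 3), (2, None, None)]
print(correlated)      # [(1, 2, 3), (2, 0, 0)]
```

Note one behavioral difference: for a user with no rows, the pre-aggregated join yields NULL while the correlated subquery yields 0. Wrapping the joined counts in COALESCE(cnt, 0) would align the two.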

Joining two tables in mysql - One to many relationship

I have 2 tables in MySQL: User (user_id, first_name, ...) and login_history (user_id, login_time).
Every time a user logs in, the system records the time in login_history.
I want to run a query to fetch all the fields from the User table and the latest login time from login_history. Can anyone help, please?
You have to use a join:
SELECT *, login_history.login_time
FROM User
INNER JOIN login_history
ON User.user_id=login_history.user_id;
This query gives you all the columns of User plus login_time, with one row per login rather than only the latest one.
SELECT t1.col1
,t1.col2
,[...repeat for all columns in User table]
,max(t2.login_time)
FROM user t1
INNER JOIN login_history t2 ON t1.user_id = t2.user_id
GROUP BY t1.col1
,t1.col2
,[..repeat for all columns in User table]
This should work, assuming login_time is stored in a sane data type and/or format.
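A minimal sketch of the GROUP BY / MAX approach, run through Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented, and the LEFT JOIN is a variation on the INNER JOIN above, chosen here so that users who have never logged in still appear (with a NULL login time).

```python
import sqlite3

# SQLite stands in for MySQL; the sample users and timestamps are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (user_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE login_history (user_id INTEGER, login_time TEXT);
    INSERT INTO User VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO login_history VALUES
        (1, '2024-01-01 08:00:00'),
        (1, '2024-03-05 09:30:00'),
        (2, '2024-02-10 12:00:00');
""")

# One row per user: every selected User column is grouped on,
# and MAX() picks that user's latest login_time.
latest_logins = conn.execute("""
    SELECT u.user_id, u.first_name, MAX(h.login_time) AS last_login
    FROM User u
    LEFT JOIN login_history h ON h.user_id = u.user_id
    GROUP BY u.user_id, u.first_name
    ORDER BY u.user_id
""").fetchall()

for row in latest_logins:
    print(row)
```

With an INNER JOIN instead, the third user (no logins) would be dropped from the result.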
The following query selects the single most recent login overall, together with that user's details (one row in total, not one row per user):
SELECT * FROM User C,login_history O where C.user_id=O.user_id order by O.login_time desc limit 1
or, in Oracle syntax (MySQL has no ROWNUM):
SELECT * FROM User C,login_history O where C.user_id=O.user_id and ROWNUM <=1 order by O.login_time desc

MySQL Select From Multiple Tables + Last row from somewhere else

Hello I have these 4 tables with the structure
Table: Users [ id, username, password, bouquet_id ]
Table: Bouquets [ id, bouquet_name, stream_ids = serialized array ]
Table: Streams [ id, channel_name ]
Table: activity [ id, user_id, stream_id ]
I want to select ALL users but with their info as well from other tables + THE LAST ROW from table activity per user
For example the following query:
SELECT t1.*,t2.`bouquet_name`
FROM `users` t1,`bouquets` t2
WHERE t1.`bouquet_id` = t2.`id`
ORDER BY t1.id DESC
This takes the data from the first 2 tables and maps each bouquet_id to its bouquet name.
Now I want the query to also return the last ROW from the activity table WITH its stream name [based on stream_id].
The following query does the job I want [PER USER]:
SELECT t1.channel_name
FROM `streams` t1,`activity` t2
WHERE t2.user_id = '%d' AND t1.id = t2.stream_id
ORDER BY t2.id DESC
LIMIT 1
But it is kind of slow, since I run 2 queries for every user in the table "users".
I want the 2 queries above combined into one, so that I can select the data from the first two tables together WITH the last row from table activity for each user_id.
I hope that makes sense.
Thank you.
You should not depend on "last row" = "row with greatest id". MySQL allows master-master replication setups, in which auto_increment values can be assigned in more or less arbitrary order, so this assumption is not always true. An additional timestamp column would be better.
You can select the most recent activity per user for example with:
SELECT user_id, MAX(id) FROM activity GROUP BY user_id
You then have to join this into your query and join in the base activity table to retrieve the data you actually want. You may want to replace your joins with left joins too, so that you always retrieve all users regardless of whether matching rows exist in bouquets or activity:
SELECT t1.*, t2.`bouquet_name`, activity.*
FROM `users` t1
LEFT JOIN `bouquets` t2
ON t1.`bouquet_id` = t2.`id`
LEFT JOIN (
SELECT user_id, MAX(id) AS maxid FROM activity GROUP BY user_id
) mostrecent
ON t1.id = mostrecent.user_id
LEFT JOIN activity
ON t1.id = activity.user_id AND activity.id = mostrecent.maxid
ORDER BY t1.id DESC
And then you can join in the stream data too, like this:
SELECT t1.*, t2.`bouquet_name`, activity.stream_id, streams.channel_name
FROM `users` t1
LEFT JOIN `bouquets` t2
ON t1.`bouquet_id` = t2.`id`
LEFT JOIN (
SELECT user_id, MAX(id) AS maxid FROM activity GROUP BY user_id
) mostrecent
ON t1.id = mostrecent.user_id
LEFT JOIN activity
ON t1.id = activity.user_id AND activity.id = mostrecent.maxid
LEFT JOIN streams
ON activity.stream_id = streams.id
ORDER BY t1.id DESC
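The combined query can be tried out with a small sketch like the one below, which uses Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented, the serialized stream_ids column is left out, and the join uses users.id (the users table in the question has no user_id column). It still relies on MAX(id) as "most recent", which carries the caveat mentioned above.

```python
import sqlite3

# SQLite stands in for MySQL; all sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, bouquet_id INTEGER);
    CREATE TABLE bouquets (id INTEGER PRIMARY KEY, bouquet_name TEXT);
    CREATE TABLE streams (id INTEGER PRIMARY KEY, channel_name TEXT);
    CREATE TABLE activity (id INTEGER PRIMARY KEY, user_id INTEGER, stream_id INTEGER);
    INSERT INTO bouquets VALUES (1, 'Basic');
    INSERT INTO streams VALUES (10, 'News'), (11, 'Sport');
    INSERT INTO users VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cy', NULL);
    INSERT INTO activity VALUES (1, 1, 10), (2, 1, 11), (3, 2, 10);
""")

# One row per user, with the bouquet name and the stream of the
# activity row that has the greatest id for that user (if any).
last_activity = conn.execute("""
    SELECT t1.id, t1.username, t2.bouquet_name,
           activity.stream_id, streams.channel_name
    FROM users t1
    LEFT JOIN bouquets t2 ON t1.bouquet_id = t2.id
    LEFT JOIN (
        SELECT user_id, MAX(id) AS maxid FROM activity GROUP BY user_id
    ) mostrecent ON t1.id = mostrecent.user_id
    LEFT JOIN activity
           ON t1.id = activity.user_id AND activity.id = mostrecent.maxid
    LEFT JOIN streams ON activity.stream_id = streams.id
    ORDER BY t1.id DESC
""").fetchall()

for row in last_activity:
    print(row)
```

The user without a bouquet or any activity still appears, with NULLs in the joined columns, because every join is a LEFT JOIN.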

Slow MySQL query using LEFT JOIN

I'm using a simple left join query to fetch data from two separate tables. Both have a column named domain, and I join them on this column to calculate a value based on one table's visits and the other table's earnings.
SELECT t1.`domain` AS `domain`,
(SUM(earnings)/SUM(visits)) AS `rpv`
FROM hat_adsense_stats t1
LEFT JOIN hat_analytics_stats t4 ON t4.`domain`=t1.`domain`
WHERE(t1.`hat_analytics_id`='91' OR t1.`hat_analytics_id`='92')
AND t1.`date`>='2013-02-18'
AND t4.`date`>='2013-02-18'
GROUP BY t1.`domain`
ORDER BY rpv DESC
LIMIT 10;
This is the query I run, and it takes 9.060 sec to execute.
The hat_adsense_stats table contains 60887 records
The hat_analytics_stats table contains 190780 records
After grouping by domain, it returns 186 rows of data that need comparing.
Any suggestions about inefficient code or a better way to solve this would be appreciated!
Thanks, raheel, for opening the door. This is what worked in the end, with an execution time of 0.051 sec. :)
SELECT
t1.`domain` AS `domain`,
SUM(earnings)/visits AS `rpv`
FROM hat_adsense_stats t1
INNER JOIN (SELECT
domain,
SUM(visits) AS visits
FROM hat_analytics_stats
WHERE `date` >= "2013-02-18"
GROUP BY domain) AS t4
ON t4.domain = t1.domain
WHERE t1.`hat_analytics_id` IN('91','92')
AND t1.`date`>='2013-02-18'
GROUP BY t1.`domain`
ORDER BY rpv DESC
LIMIT 10
Change your query like this
SELECT
t1.`domain` AS `domain`,
t2.earnings/t2.visits AS `rpv`
FROM hat_adsense_stats t1
INNER JOIN (SELECT
domain,
sum(earnings) AS earnings,
SUM(visits) AS visits
FROM hat_adsense_stats
GROUP BY domain) AS t2
on t2.domain = t1.domain
LEFT JOIN hat_analytics_stats t4
ON t4.`domain` = t1.`domain`
WHERE t1.`hat_analytics_id` IN('91','92')
AND t1.`date` >= '2013-02-18'
AND t4.`date` >= '2013-02-18'
GROUP BY t1.`domain`
ORDER BY rpv DESC
LIMIT 10;
The LEFT JOIN is unnecessary, because the WHERE clause checks a column from the right side of the join (t4.`date`), which discards the NULL-extended rows anyway. An INNER JOIN would work just as well here and might well be quicker.
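The point about the LEFT JOIN can be checked with a small sketch, here using Python's sqlite3 module as a stand-in for MySQL and two invented rows per table. It compares a LEFT JOIN filtered in WHERE, an INNER JOIN, and a LEFT JOIN with the filter moved into the ON clause.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hat_adsense_stats (domain TEXT, date TEXT, earnings REAL);
    CREATE TABLE hat_analytics_stats (domain TEXT, date TEXT, visits INTEGER);
    INSERT INTO hat_adsense_stats VALUES
        ('a.com', '2013-02-20', 5.0), ('b.com', '2013-02-20', 3.0);
    INSERT INTO hat_analytics_stats VALUES
        ('a.com', '2013-02-19', 100), ('b.com', '2013-01-01', 50);
""")

def run(sql):
    return conn.execute(sql).fetchall()

# Filter on the right-hand table in WHERE: rows without a qualifying match are dropped.
left_with_where = run("""
    SELECT t1.domain, t4.visits
    FROM hat_adsense_stats t1
    LEFT JOIN hat_analytics_stats t4 ON t4.domain = t1.domain
    WHERE t4.date >= '2013-02-18'
    ORDER BY t1.domain
""")

# The same query written as an INNER JOIN.
inner = run("""
    SELECT t1.domain, t4.visits
    FROM hat_adsense_stats t1
    INNER JOIN hat_analytics_stats t4 ON t4.domain = t1.domain
    WHERE t4.date >= '2013-02-18'
    ORDER BY t1.domain
""")

# Filter moved into ON: unmatched left rows are kept with NULLs.
left_with_on = run("""
    SELECT t1.domain, t4.visits
    FROM hat_adsense_stats t1
    LEFT JOIN hat_analytics_stats t4
           ON t4.domain = t1.domain AND t4.date >= '2013-02-18'
    ORDER BY t1.domain
""")

print(left_with_where)  # [('a.com', 100)]
print(inner)            # [('a.com', 100)]
print(left_with_on)     # [('a.com', 100), ('b.com', None)]
```

If the unmatched domains are actually wanted, the date condition belongs in ON; if not, INNER JOIN states the intent directly.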

How do I write this kind of query (returning the latest available data for each row)

I have a table defined like this:
CREATE TABLE mytable (id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(id),
user_id INT REFERENCES user(id) ON UPDATE CASCADE ON DELETE RESTRICT,
amount REAL NOT NULL CHECK (amount > 0),
record_date DATE NOT NULL
);
CREATE UNIQUE INDEX idxu_mybl_key ON mytable (user_id, amount, record_date);
I want to write a query that will have two columns:
user_id
amount
There should be only ONE entry in the returned result set for a given user. Furthermore, the amount figure returned should be the last recorded amount for the user (i.e. the one with MAX(record_date)).
The complication arises because amounts are recorded on different dates for different users, so there is no single LAST record_date for all users.
How may I write a query (preferably ANSI SQL) that returns the columns mentioned previously, while ensuring that only the last recorded amount for each user is returned?
As an aside, it is probably a good idea to return the 'record_date' column as well in the query, so that it is eas(ier) to verify that the query is working as required.
I am using MySQL as my backend db, but ideally the query should be db agnostic (i.e. ANSI SQL) if possible.
First you need the last record_date for each user:
select user_id, max(record_date) as last_record_date
from mytable
group by user_id
Now you can join the previous query with mytable itself to get the amount for this record_date:
select
t1.user_id, last_record_date, amount
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
A problem appears because a user can have several rows for the same last_record_date (with different amounts). In that case you should pick one of them, for example the max of the different amounts:
select
t1.user_id, t1.record_date as last_record_date, max(t1.amount)
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
group by t1.user_id, t1.record_date
I do not know about MySQL, but in general SQL you need a subquery for that. You must join the query that calculates the greatest record_date with the original table to get the corresponding amount. Roughly like this:
SELECT B.*
FROM
(select user_id, max(record_date) max_date from mytable group by user_id) A
join
mytable B
on A.user_id = B.user_id and A.max_date = B.record_date
SELECT datatable.* FROM
mytable AS datatable
INNER JOIN (
SELECT user_id,max(record_date) AS max_record_date FROM mytable GROUP BY user_id
) AS selectortable ON
selectortable.user_id=datatable.user_id
AND
selectortable.max_record_date=datatable.record_date
In some SQL dialects you might need
SELECT MAX(user_id), ...
in the selectortable view instead of simply SELECT user_id,...
The definition of maximum: there is no larger(or: "more recent") value than this one. This naturally leads to a NOT EXISTS query, which should be available in any DBMS.
SELECT user_id, amount
FROM mytable mt
WHERE mt.user_id = $user
AND NOT EXISTS ( SELECT *
FROM mytable nx
WHERE nx.user_id = mt.user_id
AND nx.record_date > mt.record_date
)
;
BTW: your table definition allows more than one record to exist for a given {user_id, record_date}, but with different amounts. This query will return them all.
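The approaches from the answers can be compared side by side with a sketch like this one, which uses Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented (user 2 deliberately has two amounts on the same latest date), and the NOT EXISTS query is run here without the per-user $user filter so that it covers all users.

```python
import sqlite3

# SQLite stands in for MySQL; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        amount REAL NOT NULL CHECK (amount > 0),
        record_date TEXT NOT NULL
    );
    CREATE UNIQUE INDEX idxu_mybl_key ON mytable (user_id, amount, record_date);
    INSERT INTO mytable (user_id, amount, record_date) VALUES
        (1, 10.0, '2024-01-01'),
        (1, 12.5, '2024-02-01'),
        (2, 7.0,  '2024-01-15'),
        (2, 8.0,  '2024-01-15');
""")

# Join each row against its user's MAX(record_date).
via_join = conn.execute("""
    SELECT t1.user_id, t1.record_date, t1.amount
    FROM mytable t1
    JOIN (SELECT user_id, MAX(record_date) AS last_record_date
          FROM mytable GROUP BY user_id) t2
      ON t1.user_id = t2.user_id AND t1.record_date = t2.last_record_date
    ORDER BY t1.user_id, t1.amount
""").fetchall()

# Keep a row only if no later row exists for the same user.
via_not_exists = conn.execute("""
    SELECT mt.user_id, mt.record_date, mt.amount
    FROM mytable mt
    WHERE NOT EXISTS (SELECT 1 FROM mytable nx
                      WHERE nx.user_id = mt.user_id
                        AND nx.record_date > mt.record_date)
    ORDER BY mt.user_id, mt.amount
""").fetchall()

# Break ties on the latest date by taking the largest amount.
tie_broken = conn.execute("""
    SELECT t1.user_id, t1.record_date, MAX(t1.amount)
    FROM mytable t1
    JOIN (SELECT user_id, MAX(record_date) AS last_record_date
          FROM mytable GROUP BY user_id) t2
      ON t1.user_id = t2.user_id AND t1.record_date = t2.last_record_date
    GROUP BY t1.user_id, t1.record_date
    ORDER BY t1.user_id
""").fetchall()

print(via_join)
print(via_not_exists)
print(tie_broken)
```

The join and NOT EXISTS versions return the same rows, including both tied rows for user 2; only the version with the extra GROUP BY guarantees one row per user.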