Given the schema
The following query
SELECT a.user_id,
a.id,
a.date_created,
avg(ai.level) level
FROM assessment a
JOIN assessment_item ai ON a.id = ai.assessment_id
GROUP BY a.user_id, a.id;
Returns these results
user_id, a.id, a.date_created, level
1, 99, "2015-07-13 18:26:00", 4.0000
1, 98, "2015-07-13 19:04:58", 6.0000
13, 9, "2015-07-13 18:26:00", 2.0000
13, 11, "2015-07-13 19:04:58", 3.0000
I would like to change the query so that only the earliest result is returned for each user. In other words, the following should be returned instead
user_id, a.id, a.date_created, level
1, 99, "2015-07-13 18:26:00", 4.0000
13, 9, "2015-07-13 18:26:00", 2.0000
I think I need to add a HAVING clause, but I'm struggling to figure out the exact syntax.
I have done something similar, with one small difference: I wanted the first 5 per group. The use case was reporting, so the time taken to run the query or to create a temp table was not a constraint.
The solution I had:
1. Create a new table (tbl1) with a single column id, a reference to the original table; id can be unique/primary.
2. INSERT IGNORE INTO tbl1 (id) select min(id) from original_tbl where id not in (select id from tbl1) group by user_id
3. Repeat step 2 as many times as required (in my case, 5 times). The new table will then hold only the ids you want to show.
4. Join tbl1 to the original table to get the required result.
Note: This might not be the best solution, but it worked for me when I had to share the report within 2-3 hours on a weekend. The data size I had was around 1M records.
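For anyone who wants to try the repeated-insert idea on a small scale, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. SQLite's INSERT OR IGNORE plays the role of INSERT IGNORE; the rows in original_tbl and the helper name first_n_ids_per_user are made up for illustration.

```python
import sqlite3

def first_n_ids_per_user(conn, n):
    """Collect the n smallest ids per user_id by repeating the 'min id not yet picked' insert."""
    conn.execute("CREATE TABLE tbl1 (id INTEGER PRIMARY KEY)")
    for _ in range(n):
        # Each pass adds, for every user, the smallest id that is not in tbl1 yet.
        conn.execute("""
            INSERT OR IGNORE INTO tbl1 (id)
            SELECT MIN(id) FROM original_tbl
            WHERE id NOT IN (SELECT id FROM tbl1)
            GROUP BY user_id
        """)
    # Final step: join the picked ids back to the original table.
    return conn.execute("""
        SELECT o.user_id, o.id
        FROM original_tbl AS o
        JOIN tbl1 AS t ON t.id = o.id
        ORDER BY o.user_id, o.id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE original_tbl (id INTEGER PRIMARY KEY, user_id INTEGER);
    -- Hypothetical data: user 1 has three rows, user 2 has two.
    INSERT INTO original_tbl VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
""")
picked = first_n_ids_per_user(conn, 2)
print(picked)  # [(1, 1), (1, 2), (2, 4), (2, 5)]
```

A user with fewer than n rows simply contributes nothing in the later passes.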
Disclaimer: I am in a bit of a hurry, and have not tested this fully
-- Create a CTE that holds the first and last date for each user_id.
with first_and_last as (
-- Get the first date (min) for each user_id
select a.[user_id], min(a.date_created) as date_created
from assessment as a
group by a.[user_id]
-- Combine the first and last, so each user_id should have two entries, even if they are the same one.
union all
-- Get the last date (max) for each user_id
select a.[user_id], max(a.date_created)
from assessment as a
group by a.[user_id]
)
select a.[user_id],
a.id,
a.date_created,
avg(ai.[level]) as [level]
from assessment as a
inner join assessment_item as ai on a.id = ai.assessment_id
-- Join with the CTE to only keep records that have either the min or max date_created for each user_id.
inner join first_and_last as fnl on a.[user_id] = fnl.[user_id] and a.date_created = fnl.date_created
group by a.[user_id], a.id, a.date_created;
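If only the earliest assessment per user is wanted, as in the question, the same join idea works with just the min(date_created) branch. Below is a minimal sketch run through Python's sqlite3 module. The schema was not shown in the question, so the column definitions here are guesses based on the query, and the assessment_item levels are invented so that the averages match the posted output.

```python
import sqlite3

def earliest_per_user(conn):
    """One averaged row per user: the assessment with the earliest date_created."""
    return conn.execute("""
        SELECT a.user_id, a.id, a.date_created, AVG(ai.level) AS level
        FROM assessment AS a
        JOIN assessment_item AS ai ON ai.assessment_id = a.id
        JOIN (SELECT user_id, MIN(date_created) AS first_created
              FROM assessment
              GROUP BY user_id) AS f
          ON f.user_id = a.user_id AND f.first_created = a.date_created
        GROUP BY a.user_id, a.id, a.date_created
        ORDER BY a.user_id
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema, inferred from the columns used in the question's query.
    CREATE TABLE assessment (id INTEGER PRIMARY KEY, user_id INTEGER, date_created TEXT);
    CREATE TABLE assessment_item (assessment_id INTEGER, level INTEGER);
    INSERT INTO assessment VALUES
        (99, 1, '2015-07-13 18:26:00'), (98, 1, '2015-07-13 19:04:58'),
        (9, 13, '2015-07-13 18:26:00'), (11, 13, '2015-07-13 19:04:58');
    -- Invented item levels that average to 4, 6, 2 and 3.
    INSERT INTO assessment_item VALUES (99, 3), (99, 5), (98, 6), (9, 2), (11, 3);
""")
rows = earliest_per_user(conn)
print(rows)
```

Note that if a user has two assessments with exactly the same earliest date_created, both are returned.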
SQL Query I Am Working With
Result from the table
What I am trying to accomplish is that instead of just having values for places where num_opens is actually counted, I would want to have it show all potential num_opens values between the minimum and maximum value, and their total to be 0. For example, in the photo we see a jump between
num_opens: 7 Total: 1
num_opens: 10 Total: 1
But I would like it to be
num_opens: 7 Total: 1
num_opens: 8 Total: 0
num_opens: 9 Total: 0
num_opens: 10 Total: 1
and similarly for all potential num_opens values between the minimum and maximum (11 - 15, 15 - 31, 31 - 48). It is tricky because every day the maximum value could be different (today the max is 48, but tomorrow it could be 37), so I would need to pull the max value somehow.
Thank you!
You can use generate_array() and unnest():
select n as num_opens, count(t.num_opens) as total
from (select min(num_opens) as min_no, max(num_opens) as max_no
from t
) x cross join
unnest(generate_array(x.min_no, x.max_no)) n left join
t
on t.num_opens = n
group by n
order by n;
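The same gap-filling idea can be tried outside BigQuery. Here is a minimal sketch using Python's sqlite3 module, where a recursive CTE stands in for generate_array() (SQLite has no such function). The table t and its three rows are made up; only the 7-to-10 gap comes from the question.

```python
import sqlite3

def totals_with_gaps(conn):
    """Count rows per num_opens, including 0 for values missing between min and max."""
    return conn.execute("""
        WITH RECURSIVE bounds AS (
            SELECT MIN(num_opens) AS lo, MAX(num_opens) AS hi FROM t
        ),
        series(n) AS (
            -- Start at the minimum and step by 1 up to the maximum.
            SELECT lo FROM bounds
            UNION ALL
            SELECT n + 1 FROM series, bounds WHERE n < hi
        )
        SELECT s.n AS num_opens, COUNT(t.num_opens) AS total
        FROM series AS s
        LEFT JOIN t ON t.num_opens = s.n
        GROUP BY s.n
        ORDER BY s.n
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (num_opens INTEGER);
    INSERT INTO t VALUES (7), (10), (10);
""")
filled = totals_with_gaps(conn)
print(filled)  # [(7, 1), (8, 0), (9, 0), (10, 2)]
```

Because the bounds are read from the table on every run, a different maximum tomorrow needs no change to the query.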
You need a reference table to start with. From your picture you have something called users, but really any (big enough) table will do.
So to start, you'll build the reference table using the rank() or row_number() window function. Or, if your users.id has no gaps, it's even easier to use that directly.
SELECT *, rank() OVER (ORDER BY id) as reference_value FROM users
This will generate a table 1....n for users.
Now you join onto that, but count from the joined in table:
SELECT
a.reference_value, count(b.num_opens) as total
FROM
(SELECT rank() OVER (ORDER BY id) as reference_value from users) a
LEFT JOIN
[whatever table] b ON a.reference_value = b.num_opens
GROUP BY
a.reference_value
But this is too many rows! You definitely have more users than these event counts. So throw a quick filter in there.
SELECT
a.reference_value, count(b.num_opens) as total
FROM
(SELECT rank() OVER (ORDER BY id) as reference_value from users) a
LEFT JOIN
[whatever table] b ON a.reference_value = b.num_opens
WHERE
a.reference_value <= (SELECT max(num_opens) FROM [whatever table])
GROUP BY
a.reference_value
I am trying to select the max transaction_num from my table tbl_loan, grouped by c_id, to avoid duplicate c_id values.
Here is my query:
SELECT * FROM `tbl_loan` WHERE transaction_num IN (SELECT max(transaction_num) max_trans FROM tbl_loan GROUP BY c_id)
and my output still has duplicate c_id values.
MySQL MAX with GROUP BY clause
To find the maximum value for every group, you use the MAX function with the GROUP BY clause in a SELECT statement.
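You can use the following query. It selects only c_id and the aggregate, because SELECT * with GROUP BY c_id either fails under ONLY_FULL_GROUP_BY or returns the other columns from an arbitrary row of each group:
SELECT
c_id, MAX(transaction_num) AS max_trans
FROM
tbl_loan
GROUP BY c_id
ORDER BY max_trans;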
From the looks of it (and correct me if I'm wrong), the transaction number appears to be sequential per C_ID, incremented whenever a new transaction happens. There is also the "I_ID" column, which appears to be an auto-incrementing column with no duplicates. It appears your transaction number runs 1, 2, 3, etc. per C_ID for simple counting purposes, so everyone starts with a 1, and those with more transactions have a 2nd and possibly a 3rd and more...
So, if this is accurate and you want the most recent row per C_ID, you really want the max "I_ID" per C_ID, because multiple records will exist with a transaction number of 2, 3, etc...
Try this.
SELECT
TL.*
FROM
tbl_loan TL
JOIN ( SELECT C_ID, max(I_ID) maxI_ID
FROM tbl_loan
GROUP BY c_id) MaxPer
on TL.I_ID = MaxPer.MaxI_ID
So, from your data for C_ID = 55, you have I_ID = 61 (trans num = 1) and 62 (trans num = 2). So for ID = 55, you want the transaction I_ID = 62 which represents the second transaction.
For C_ID = 70, it has I_IDs of 77 & 78, of which this will grab I_ID = 78.
The rest only have a single trans num and will get their only single entry id.
HTH
Think about it like this
Your query:
SELECT * FROM `tbl_loan` WHERE transaction_num IN (SELECT max(transaction_num) max_trans FROM tbl_loan GROUP BY c_id)
Let's say your subquery returns a transaction_num of 20 as the maximum for one c_id. Rows for other c_id's can also have a transaction_num of 20, even when it is not their own maximum.
So your outer query is then running
SELECT * FROM `tbl_loan` WHERE transaction_num IN (20)
and returning all those results.
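To see the difference on a tiny data set, here is a minimal sketch using Python's sqlite3 module. The I_ID values for C_ID 55 and 70 follow the ones mentioned above; the transaction numbers for 77 and 78 and the extra C_ID 90 row are made up. The first query is the IN version from the question; the second matches on c_id and the maximum together, which is one way to keep the pairing intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_loan (i_id INTEGER PRIMARY KEY, c_id INTEGER, transaction_num INTEGER);
    INSERT INTO tbl_loan VALUES (61, 55, 1), (62, 55, 2), (77, 70, 1), (78, 70, 2), (80, 90, 1);
""")

# IN only checks the value, so a 1 that is the max for c_id 90 also matches
# the first transactions of c_id 55 and 70.
in_rows = [r[0] for r in conn.execute("""
    SELECT i_id FROM tbl_loan
    WHERE transaction_num IN (SELECT MAX(transaction_num) FROM tbl_loan GROUP BY c_id)
    ORDER BY i_id
""")]

# Joining on both c_id and the per-group max keeps one row per c_id.
paired_rows = [r[0] for r in conn.execute("""
    SELECT tl.i_id
    FROM tbl_loan AS tl
    JOIN (SELECT c_id, MAX(transaction_num) AS max_trans
          FROM tbl_loan
          GROUP BY c_id) AS m
      ON m.c_id = tl.c_id AND m.max_trans = tl.transaction_num
    ORDER BY tl.i_id
""")]

print(in_rows)      # [61, 62, 77, 78, 80]
print(paired_rows)  # [62, 78, 80]
```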
How do I count database records having 2 criteria.
Here is the pic:
I want to count posts which have condition 12 & 18 only. So after post 11 I should have the counter at value 1. Post 5, 6, 7, 9, 10 should not be counted since condition 12 & 18 don't show up.
To get the idea here is a query:
SELECT COUNT(post) from post_condition
where `condition`=18 AND `condition`=12
But this is wrong, I tried with some sub-queries but without success.
If you want to count the number that have both, then you can generate the matching posts using a subquery:
SELECT COUNT(*)
FROM (SELECT pc.post
FROM post_condition pc
WHERE pc.condition IN (12, 18)
GROUP BY pc.post
HAVING COUNT(pc.condition) = 2 -- assumes no duplicates
) t;
You can do this without a subquery, if you have a table of posts (which I suspect is true):
select count(*)
from posts p
where exists (select 1 from post_condition pc where pc.post = p.post and pc.condition = 12) and
exists (select 1 from post_condition pc where pc.post = p.post and pc.condition = 18);
With an index on post_condition(post, condition), this is likely to be the fastest method. This would be slower only if many, many posts had no rows at all in post_condition.
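Try this. The grouped query finds the posts that have both conditions, and the outer COUNT(*) counts them (without the outer query you would get one row per matching post instead of a single total):
SELECT COUNT(*)
FROM (SELECT post
FROM post_condition
WHERE `condition` IN (12, 18)
GROUP BY post
HAVING COUNT(DISTINCT `condition`) = 2) t;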
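A quick way to check the GROUP BY / HAVING approach is to run it on a few rows. This is a minimal sketch using Python's sqlite3 module; the picture from the question is not available, so the rows below are invented, with post 11 as the only post that has both condition 12 and condition 18.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_condition (post INTEGER, "condition" INTEGER);
    -- Invented rows: only post 11 has both 12 and 18.
    INSERT INTO post_condition VALUES (5, 12), (6, 18), (7, 12), (7, 3), (11, 12), (11, 18);
""")

# Grouped query alone: one row per matching post, holding that post's row count.
per_post = conn.execute("""
    SELECT COUNT(post) FROM post_condition
    WHERE "condition" IN (12, 18)
    GROUP BY post
    HAVING COUNT(DISTINCT "condition") = 2
""").fetchall()

# Wrapped in an outer COUNT(*): the number of posts that have both conditions.
both = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT post FROM post_condition
        WHERE "condition" IN (12, 18)
        GROUP BY post
        HAVING COUNT(DISTINCT "condition") = 2
    )
""").fetchone()[0]

print(per_post)  # [(2,)]
print(both)      # 1
```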
I'm having a mental block with this query. I'm trying to return the max date and the maximum time for each identity, ordered by identity. I would greatly appreciate it if someone could lend a pair of eyes to this type of query:
Data Set
Identity, Date, Time, Website
10, 5/10/15, 1, google.com
10, 5/10/15, 3, google.com
10, 5/10/15, 10, google.com
25, 5/11/15, 1, yahoo.com
25, 5/11/15, 15, yahoo.com
Expected Result
10, 5/10/15, 10, google.com
25, 5/11/15, 15, yahoo.com
Current Query
SELECT DISTINCT *, MAX(datetime) as maxdate, MAX(time), identity
FROM identity_track
GROUP BY identity
ORDER BY maxdate DESC
Something like this?
select identity, max(date), max(time), website
from identity_track
group by website;
Demo here: http://sqlfiddle.com/#!9/5cadf/1
You can order by any of the fields you want.
Also, the expected output you posted doesn't line up with what it seems like you're attempting to do.
edit
Updated query based on additional information.
select t.identity, t.date, max(t.time), t.website
from t
inner join
(select identity, website, max(date) d
from t
group by identity, website) q
on t.identity = q.identity
and t.website = q.website
and q.d = t.date
group by t.identity, t.website, t.date
This one should give you the user's identity, the pages they visited, the last date they visited each page, and the longest time they spent in any visit on that last date.
Don't assume that all records for an identity are on the same day, e.g. if the identity has times of 1/1/15 5pm and 1/2/15 2pm, taking max(date) and max(time) separately gives 1/2/15 5pm, which is wrong.
I'd always merge the time and date into one column, but if you can't, try this:
select t.identity, t.website, MAX(t.time)
FROM t
INNER JOIN
(
select identity, max(date) as max_date
from t
group by identity
) x
ON t.identity = x.identity
AND t.date = x.max_date
group by t.identity, t.website
First we get the maximum date for each identity. Then, for that day, we get the maximum time.
Hope this helps.
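Here is a minimal sketch of the two-step approach (latest date per identity first, then the maximum time on that date), run through Python's sqlite3 module. The dates are written in ISO format so that MAX() on text sorts correctly, which is an assumption about how the column is stored. The extra 2015-05-09 row is made up to show why taking max(date) and max(time) independently would give the wrong answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE identity_track (identity INTEGER, date TEXT, time INTEGER, website TEXT);
    INSERT INTO identity_track VALUES
        (10, '2015-05-09', 20, 'google.com'),  -- invented older row with a larger time
        (10, '2015-05-10', 1,  'google.com'),
        (10, '2015-05-10', 3,  'google.com'),
        (10, '2015-05-10', 10, 'google.com'),
        (25, '2015-05-11', 1,  'yahoo.com'),
        (25, '2015-05-11', 15, 'yahoo.com');
""")

latest = conn.execute("""
    SELECT t.identity, t.date, MAX(t.time) AS max_time, t.website
    FROM identity_track AS t
    JOIN (SELECT identity, MAX(date) AS max_date
          FROM identity_track
          GROUP BY identity) AS x
      ON x.identity = t.identity AND x.max_date = t.date
    GROUP BY t.identity, t.date, t.website
    ORDER BY t.identity
""").fetchall()

print(latest)
# [(10, '2015-05-10', 10, 'google.com'), (25, '2015-05-11', 15, 'yahoo.com')]
```

The time of 20 from the older day is ignored because only rows on each identity's latest date survive the join.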
I have a table called log_payment that has a series of payment records like:
log_user_id, log_date, log_payment_id
13, 2013-01-01 01:13:00, TRIAL
13, 2013-01-02 01:18:00, 1
13, 2013-01-03 01:05:00, 2
What I want to get is the payment id and date of each user's last record. For example, this user's last transaction was on 01/03 and has a payment id of 2. So I wrote this query:
select max(log_date) as max_date,log_user_id,log_payment_id from log_payment group by log_user_id
but it returns 13, 2013-01-03 01:05:00, TRIAL
So based on some data I found somewhere else, I tried this:
select log_user_id, max_date, log_payment_id from (select log_user_id,max(log_date) as max_date from log_payment group by _log_user_id) payment_table inner join log_payment on payment_table.log_user_id = log_payment.log_user_id and payment_table.max_date = log_payment.log_date
But this goes on for several minutes until I finally just cancel it. What am I missing?
Your query, which I have reformatted, looks good, except for the _log_user_id in the group by. It should be log_user_id. I have also qualified the selected columns, since log_user_id exists in both tables:
select payment_table.log_user_id,
payment_table.max_date,
log_payment.log_payment_id from
(select log_user_id,max(log_date) as max_date from log_payment group by log_user_id)
payment_table
inner join
log_payment
on payment_table.log_user_id = log_payment.log_user_id and
payment_table.max_date = log_payment.log_date
Depending on the size of your tables the query might be slow. Try adding a LIMIT 10 at the end of the query to see if that gets you the desired result for the first 10 tuples.
--dmg
A good solution for the GROUP BY ordering problem is to use a correlated subquery to do the ORDER BY for you (this assumes the table has a unique id column):
SELECT t1.*
FROM `log_payment` t1
WHERE `id` = (
SELECT `id`
FROM `log_payment` `t2`
WHERE `t2`.`log_user_id` = `t1`.`log_user_id`
ORDER BY `t2`.`log_date` DESC
LIMIT 1
)
It should also be really fast, although that always depends on how your indexes are set up.
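A minimal sketch of the correlated-subquery approach, run through Python's sqlite3 module. The id column and the index name idx_user_date are assumptions (the table in the question does not show an id), and the row for user 14 is made up. The composite index on (log_user_id, log_date) is the kind of index that lets the inner ORDER BY ... LIMIT 1 be answered cheaply per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: an auto-increment style id plus the columns from the question.
    CREATE TABLE log_payment (
        id INTEGER PRIMARY KEY,
        log_user_id INTEGER,
        log_date TEXT,
        log_payment_id TEXT
    );
    -- Hypothetical index name; supports "latest row per user" lookups.
    CREATE INDEX idx_user_date ON log_payment (log_user_id, log_date);
    INSERT INTO log_payment VALUES
        (1, 13, '2013-01-01 01:13:00', 'TRIAL'),
        (2, 13, '2013-01-02 01:18:00', '1'),
        (3, 13, '2013-01-03 01:05:00', '2'),
        (4, 14, '2013-01-05 09:00:00', 'TRIAL');
""")

last_payments = conn.execute("""
    SELECT t1.log_user_id, t1.log_date, t1.log_payment_id
    FROM log_payment AS t1
    WHERE t1.id = (SELECT t2.id
                   FROM log_payment AS t2
                   WHERE t2.log_user_id = t1.log_user_id
                   ORDER BY t2.log_date DESC
                   LIMIT 1)
    ORDER BY t1.log_user_id
""").fetchall()

print(last_payments)
# [(13, '2013-01-03 01:05:00', '2'), (14, '2013-01-05 09:00:00', 'TRIAL')]
```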