I am working with a table of items with expiration dates. These items are assigned to users.
For each user, I want to get the highest expiration date. The issue is that default items are initialized with a '3000/01/01' expiration date, which should be ignored if another item exists for that user.
I have a query that does this:
SELECT
user_id as UserId,
CASE WHEN (YEAR(MAX(date_expiration)) = 3000)
THEN (
SELECT MAX(temp.date_expiration)
FROM user_items temp
WHERE YEAR(temp.date_expiration) <> 3000 and temp.user_id = UserId
)
ELSE MAX(date_expiration)
END as date_expiration
FROM user_items GROUP BY user_id
This works, but the subquery inside the THEN block hurts performance, and it is a huge table.
So, is there a better way to exclude the default date from the MAX operation when entering the CASE condition?
SELECT user_id,
COALESCE(
MAX(CASE WHEN YEAR(date_expiration) = 3000 THEN NULL ELSE date_expiration END),
MAX(date_expiration)
)
FROM user_items
GROUP BY
user_id
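The conditional-aggregate idea above can be tried on a small in-memory table. This is only a sketch: it runs on SQLite rather than MySQL, so YEAR() is replaced by a comparison against '3000-01-01' on ISO date strings, and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample rows; dates are ISO strings so text order matches date order.
ROWS = [
    (1, "2020-05-01"),
    (1, "3000-01-01"),  # default item, ignored because user 1 has real items
    (1, "2021-03-15"),
    (2, "3000-01-01"),  # user 2 only has the default item
]


def max_expiration_per_user(rows):
    """Return (user_id, highest non-default expiration, or the default if nothing else)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER, date_expiration TEXT)")
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT user_id,
               COALESCE(
                   -- CASE without ELSE yields NULL, and MAX skips NULLs
                   MAX(CASE WHEN date_expiration < '3000-01-01' THEN date_expiration END),
                   MAX(date_expiration)
               ) AS date_expiration
        FROM user_items
        GROUP BY user_id
        ORDER BY user_id
        """
    ).fetchall()
    conn.close()
    return result


print(max_expiration_per_user(ROWS))
```

The table is read once per group, so no correlated subquery is needed for users that only have the default item.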
If there are few users but many entries per user in your table, you can try improving the query a little further:
SELECT user_id,
COALESCE(
(
SELECT date_expiration
FROM user_items uii
WHERE uii.user_id = uid.user_id
AND date_expiration < '3000-01-01'
ORDER BY
user_id DESC, date_expiration DESC
LIMIT 1
),
(
SELECT date_expiration
FROM user_items uii
WHERE uii.user_id = uid.user_id
ORDER BY
user_id DESC, date_expiration DESC
LIMIT 1
)
)
FROM (
SELECT DISTINCT
user_id
FROM user_items
) uid
You need an index on (user_id, date_expiration) for this to work fast.
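As a rough illustration of the index-backed variant, the sketch below creates the composite index and runs one "latest row per user" lookup per distinct user. It uses SQLite instead of MySQL, the index name `ix_user_items_user_date` is made up, and the rows are invented; it checks only the results, not the query plan.

```python
import sqlite3


def latest_per_user(rows):
    """Pick each user's latest non-default date with one LIMIT 1 lookup per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_items (user_id INTEGER, date_expiration TEXT)")
    # Composite index: each subquery can seek to one user and read the highest date.
    conn.execute(
        "CREATE INDEX ix_user_items_user_date ON user_items (user_id, date_expiration)"
    )
    conn.executemany("INSERT INTO user_items VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT uid.user_id,
               COALESCE(
                   (SELECT uii.date_expiration
                    FROM user_items uii
                    WHERE uii.user_id = uid.user_id
                      AND uii.date_expiration < '3000-01-01'
                    ORDER BY uii.date_expiration DESC
                    LIMIT 1),
                   (SELECT uii.date_expiration
                    FROM user_items uii
                    WHERE uii.user_id = uid.user_id
                    ORDER BY uii.date_expiration DESC
                    LIMIT 1)
               ) AS date_expiration
        FROM (SELECT DISTINCT user_id FROM user_items) uid
        ORDER BY uid.user_id
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical rows: user 1 has real items, user 2 only the default one.
SAMPLE = [(1, "2020-05-01"), (1, "3000-01-01"), (1, "2021-03-15"), (2, "3000-01-01")]
print(latest_per_user(SAMPLE))
```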
I have a very slow MySQL query that has become basically unusable since the table grew to over 5000 entries. It takes more than 30 seconds, so the server returns an error code and quits.
The query is:
SELECT
id,
user_id,
date
FROM
table
WHERE
id IN (
SELECT
MAX(id)
FROM
table
GROUP BY date
)
AND
company_id = '1'
AND
date > '1473700785'
AND
complete = '1'
AND
name = "random string"
ORDER BY id ASC
Structure:
id - int(11)
user_id - int(10)
company_id - int(11)
date - varchar(20)
complete - varchar(2)
name - varchar(75)
Do you have any idea what could be slowing it down? It used to work as expected with a much smaller table (under 1000 entries).
Apart from rewriting the IN clause as a joined subquery (as below), the best method is indexing, as most people here have suggested.
SELECT id, user_id, date
FROM table min
-- a joined subquery sometimes runs faster than IN / NOT IN
JOIN (
SELECT MAX(id) AS id
FROM table
GROUP BY date
)
max ON max.id = min.id
WHERE min.company_id = '1'
AND min.date > '1473700785'
AND min.complete = '1'
AND min.name = "random string"
ORDER BY min.id ASC
First, you need an index on the date field.
You should also store the date as an integer rather than varchar, because you use this expression:
date > '1473700785'
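To see why the column type matters for that comparison, here is a small sketch outside the database. Strings are compared character by character, so a varchar timestamp with fewer digits can compare as "later" than a longer one. The helper name `later_than` is made up for the example.

```python
def later_than(value, threshold):
    """Return True when value compares strictly greater than threshold."""
    return value > threshold


# Text comparison: '9' sorts after '1', so the shorter string wins.
print(later_than("999", "1473700785"))  # True

# Numeric comparison gives the intended answer.
print(later_than(999, 1473700785))  # False
```

With timestamps that all have the same number of digits, both comparisons agree, which is probably why the original query appears to work.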
Indexing is good, but I don't see the need for a SUB-SELECT
SELECT
MAX(t.id) as id,
u.user_id,
t.date
FROM table t
JOIN table u ON u.id=MAX(t.id )
WHERE
t.company_id = '1'
AND
t.date > '1473700785'
AND
t.complete = '1'
AND
t.name = "random string"
GROUP BY t.date
ORDER BY t.id ASC
Lets say we have a table named record with 4 fields
id (INT 11 AUTO_INC)
email (VAR 50)
timestamp (INT 11)
status (INT 1)
And the table contains following data
Now we can see that the email address test#example.com was duplicated 4 times (the record with the lowest timestamp is the original one, and all copies after that are duplicates). I can easily count the number of unique records using
SELECT COUNT(DISTINCT email) FROM record
I can also easily find out which email address was duplicated how many times using
SELECT email, count(id) FROM record GROUP BY email HAVING COUNT(id)>1
But now the business question is
How many times STATUS was 1 on all the Duplicate Records?
For example:
For test#example.com there was no duplicate record having status 1
For second#example.com there was 1 duplicate record having status 1
For third#example.com there was 1 duplicate record having status 1
For four#example.com there was no duplicate record having status 1
For five#example.com there were 2 duplicate record having status 1
So the sum of all the numbers is 0 + 1 + 1 + 0 + 2 = 4
Which means there were 4 Duplicate records which had status = 1 In table
Question
How many Duplicate records have status = 1 ?
This is a new solution that works better. It removes the first entry for each email and then counts the rest. It's not easy to read, and if possible I would write it as a stored procedure, but it works.
select sum(status)
from dude d1
join (select email,
min(ts) as ts
from dude
group by email) mins
using (email)
where d1.ts != mins.ts;
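A quick way to check this query is to run it on a small table. The sketch below uses SQLite and the same `dude` / `ts` names as the query; the rows are invented so that the per-email counts match the example in the question (0 + 1 + 1 + 0 + 2).

```python
import sqlite3

# Hypothetical rows: (email, ts, status). The lowest ts per email is the original.
ROWS = [
    ("test#example.com", 1, 1), ("test#example.com", 2, 0),
    ("test#example.com", 3, 0), ("test#example.com", 4, 0),
    ("second#example.com", 1, 0), ("second#example.com", 2, 1),
    ("third#example.com", 1, 1), ("third#example.com", 2, 1),
    ("four#example.com", 1, 0), ("four#example.com", 2, 0),
    ("five#example.com", 1, 0), ("five#example.com", 2, 1),
    ("five#example.com", 3, 1),
]


def duplicate_status_sum(rows):
    """Sum status over every row except the earliest row of each email."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dude (email TEXT, ts INTEGER, status INTEGER)")
    conn.executemany("INSERT INTO dude VALUES (?, ?, ?)", rows)
    (total,) = conn.execute(
        """
        SELECT SUM(status)
        FROM dude d1
        JOIN (SELECT email, MIN(ts) AS ts
              FROM dude
              GROUP BY email) mins
        USING (email)
        WHERE d1.ts != mins.ts
        """
    ).fetchone()
    conn.close()
    return total


print(duplicate_status_sum(ROWS))
```

This relies on timestamps being unique within an email; if two rows share the minimum timestamp, both are treated as the original.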
sqlfiddle
original answer below
Your own query to find "which email address was duplicated how many times using"
SELECT email,
count(id) as duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
can easily be modified to answer "How many Duplicate records have status = 1"
SELECT email,
count(id) as duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
Both of these queries count the original row as well, so the result is really "duplicates including the original one". You can subtract 1 from the counts if the original one always has status 1.
SELECT email,
count(id) -1 as true_duplicates
FROM record
GROUP BY email
HAVING COUNT(id)>1
SELECT email,
count(id) -1 as true_duplicates_status_sum
FROM record
WHERE status = 1
GROUP BY email
HAVING COUNT(id)>1
If I have understood correctly, your query should be
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
First we need to get the minimum timestamp and then find duplicate records that are inserted after this timestamp and having status 1.
If you want the total sum then the query is
SELECT SUM( `tot` ) AS `duplicatesWithStatus1`
FROM (
SELECT `email` , COUNT( `id` ) AS `tot`
FROM `record` , (
SELECT `email` AS `emt` , MIN( `timestamp` ) AS `mtm`
FROM `record`
GROUP BY `email`
) AS `temp`
WHERE `email` = `emt`
AND `timestamp` > `mtm`
AND `status` =1
GROUP BY `email`
HAVING COUNT( `id` ) >=1
) AS t
Hope this is what you want
You can get the count of duplicate records that have status = 1 with
select count(*) as Duplicate_Record_Count
from (select *
from record r
where r.status=1
group by r.email,r.status
having count(r.email)>1 ) t1
The following query will return the duplicate email with status 1 count and timestamp
select r.email,count(*)-1 as Duplicate_Count,min(r.timestamp) as timestamp
from record r
where r.status=1
group by r.email
having count(r.email)>1
I have the following query, which searches a table of sports results for the last 20 matches that involved a team and returns the goals conceded in each of these matches.
SELECT *, `against` AS `goalsF` , `for` AS `goalsA`
FROM `matches` , `teams` , `outcomes`
WHERE (
`home_team_id`=7 AND `matches`.away_team_id = `teams`.team_id
OR
`away_team_id`=7 AND `matches`.home_team_id = `teams`.team_id
)
AND `matches`.score_id = `outcomes`.outcome_id
ORDER BY `against`, `date` DESC
LIMIT 0 , 20
I want to sort the results by goals conceded, and then by date within each group of goals conceded. For example:
the first 4 results where goals conceded = 1, in date order
then the next 3 might be results where goals conceded = 2, in date order
I have tried ORDER BY date, against: this gives me a strict date order.
I have tried ORDER BY against, date: this gives me matches beyond the last 20.
Is it possible to do what I want to do?
Thanks everyone, I found this worked. This solution was posted by another user but then was removed, not sure why?
SELECT * FROM (
SELECT *, `against` AS `goalsF` , `for` AS `goalsA`
FROM `matches` , `teams` , `outcomes`
WHERE (
`home_team_id`=7 AND `matches`.away_team_id = `teams`.team_id
OR
`away_team_id`=7 AND `matches`.home_team_id = `teams`.team_id
)
AND `matches`.score_id = `outcomes`.outcome_id
ORDER by `goalsF`
LIMIT 0 , 20
) res
ORDER BY `date` DESC
If you want to limit by date, add the date range you are looking for into your WHERE clause and then order by the number of goals conceded.
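If the requirement is read as "take the most recent N matches first, then sort only those by goals conceded", one possible arrangement is to apply the date ordering and LIMIT in a derived table and the final ordering outside it. Note that this is the opposite nesting from the query quoted above. The sketch uses SQLite and a simplified, hypothetical single-table schema (`match_date`, `against`) rather than the three tables from the question.

```python
import sqlite3

# Hypothetical rows: (match_date, goals conceded).
MATCHES = [
    ("2024-01-01", 0),
    ("2024-01-02", 2),
    ("2024-01-03", 1),
    ("2024-01-04", 2),
    ("2024-01-05", 1),
]


def recent_sorted_by_conceded(rows, limit):
    """Keep the `limit` most recent matches, then sort them by goals conceded and date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE matches (match_date TEXT, against INTEGER)")
    conn.executemany("INSERT INTO matches VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT match_date, against
        FROM (
            -- inner query decides WHICH rows survive: the latest ones
            SELECT match_date, against
            FROM matches
            ORDER BY match_date DESC
            LIMIT ?
        ) AS recent
        -- outer query decides HOW the surviving rows are displayed
        ORDER BY against, match_date DESC
        """,
        (limit,),
    ).fetchall()
    conn.close()
    return result


print(recent_sorted_by_conceded(MATCHES, 3))
```

With a limit of 3, the match on 2024-01-01 is left out even though it has the fewest goals conceded, because it is not among the three most recent.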
I have the following tables in my game's database:
rankedUp (image_id, user_id, created_at)
globalRank (image_id, rank )
matchups (user_id, image_id1, image_id2)
All image_ids in globalRank table are assigned a rank which is a float from 0 to 1
Assuming I have the current logged in user's "user_id" value, I'm looking for a query that will return a pair of image ids (imageid1, imageid2) such that:
imageid1 has lower rank than imageid2 but is also the next highest rank less than imageid2
matchups table doesn't have (userid,imageid1,imageid2) or (userid,imageid2,imageid1)
rankedup table doesn't have (userid,imageid1) or if it does, the createdat column is older than X hours
What I have so far for requirement 1 is this:
SELECT lowerImages.image_id AS lower_image, higherImages.image_id AS higher_image
FROM global_rank AS lowerImages, global_rank AS higherImages
WHERE lowerImages.rank < higherImages.rank
AND lowerImages.image_id = (
SELECT image_id
FROM (
SELECT image_id
FROM global_rank
WHERE rank < higherImages.rank
ORDER BY rank DESC
LIMIT 1 , 1
) AS tmp
)
but it doesn't work, because I can't reference higherImages.rank inside the nested subquery.
Does anyone know how I could satisfy all of those requirements in one query?
Thanks for your help
EDIT:
I now have this query but I don't know about the efficiency and I need to test it for correctness:
SELECT lowerImages.image_id AS lower_image,
max(higherImages.image_id) AS higher_image
FROM global_rank AS lowerImages, global_rank AS higherImages
WHERE lowerImages.rank < higherImages.rank
AND 1 NOT IN (select 1 from ranked_up where
lowerImages.image_id = ranked_up.image_id
AND ranked_up.user_id = $user_id
AND ranked_up.created_at > DATE_SUB(NOW(), INTERVAL 1 DAY))
AND 1 NOT IN (
SELECT 1 from matchups where user_id = $userId
AND lower_image_id = lowerImages.image_id
AND higher_image_id = higherImages.image_id
UNION
SELECT 1 from matchups where user_id = $user_id
AND lower_image_id = higherImages.image_id
AND higher_image_id = lowerImages.image_id
)
GROUP BY 1
The columns used in the "not in" subqueries are all indexed, so those should run fast. The efficiency problem I have is with the GROUP BY and the self-join on the global_rank table.
This question is a revision of Pretty Complex SQL Query, which should no longer be answered.
select
(
select image_id, rank from
rankedup inner join globalRank
on rankedup.image_id = globalRank .image_id
where user_id = XXX
limit 1, 1
) as highest,
(
select image_id, rank from
rankedup inner join globalRank
on rankedup.image_id = globalRank .image_id
where user_id = XXX
limit 2, 1
) as secondhighest
I normally use SQL Server, but I think this is the translation for MySQL :)
This should do the trick:
SELECT lowerImages.*, higherImages.*
FROM globalrank AS lowerImages, globalrank AS higherImages
WHERE lowerImages.rank < higherImages.rank
AND lowerImages.image_id = (
SELECT image_id
FROM (
SELECT image_id
FROM globalrank
WHERE rank < higherImages.rank
ORDER BY rank DESC
LIMIT 1,1
) AS tmp
)
AND NOT EXISTS (
SELECT * FROM matchups
WHERE user_id = $user_id
AND ((image_id1 = lowerImages.image_id AND image_id2 = higherImages.image_id)
OR (image_id2 = lowerImages.image_id AND image_id1 = higherImages.image_id))
)
AND higherImages.image_id NOT IN (
SELECT image_id FROM rankedup
WHERE created_at < DATE_ADD(NOW(), INTERVAL 1 DAY)
AND USER_ID <> $user_id
)
ORDER BY higherImages.rank
I'm assuming the PKs of matchups and rankedup include all columns in those tables. This would allow the second 2 sub-queries to utilize the PK indexes. You would probably want an ordered index on globalrank.rank to speed up the first sub-query.
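The first two requirements can be sketched in a simpler form: join each image to the image whose rank is the highest one below its own, then drop pairs the user has already seen in matchups. This is a swapped-in formulation using MAX in a correlated subquery instead of the nested LIMIT query above; it runs on SQLite, assumes ranks are unique, leaves out the rankedup condition, and uses invented rows.

```python
import sqlite3

# Hypothetical data: (image_id, rank) and (user_id, image_id1, image_id2).
RANKS = [(1, 0.1), (2, 0.4), (3, 0.7), (4, 0.9)]
MATCHUPS = [(7, 3, 2)]  # user 7 has already seen images 2 and 3 together


def candidate_pairs(user_id):
    """Return (lower_image, higher_image) pairs of adjacent ranks not yet shown to the user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE globalRank (image_id INTEGER, rank REAL)")
    conn.execute(
        "CREATE TABLE matchups (user_id INTEGER, image_id1 INTEGER, image_id2 INTEGER)"
    )
    conn.executemany("INSERT INTO globalRank VALUES (?, ?)", RANKS)
    conn.executemany("INSERT INTO matchups VALUES (?, ?, ?)", MATCHUPS)
    result = conn.execute(
        """
        SELECT lo.image_id AS lower_image, hi.image_id AS higher_image
        FROM globalRank AS hi
        JOIN globalRank AS lo
          -- the next rank below hi.rank (assumes ranks are unique)
          ON lo.rank = (SELECT MAX(g.rank) FROM globalRank g WHERE g.rank < hi.rank)
        WHERE NOT EXISTS (
            SELECT 1
            FROM matchups m
            WHERE m.user_id = ?
              AND ((m.image_id1 = lo.image_id AND m.image_id2 = hi.image_id)
                OR (m.image_id1 = hi.image_id AND m.image_id2 = lo.image_id))
        )
        ORDER BY hi.rank
        """,
        (user_id,),
    ).fetchall()
    conn.close()
    return result


print(candidate_pairs(7))
print(candidate_pairs(8))
```

The image with the lowest rank never appears as a higher image, because the subquery returns NULL for it and the join condition fails.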
I have a query
select user_id,sum(hours),date, task_id from table where used_id = 'x' and date >='' and date<= '' group by user_id, date, task_id with rollup
The query works fine. But I also need a second sum(hours) where the GROUP BY columns are different.
select user_id,sum(hours),date, task_id from table where used_id = 'x' group by user_id,task_id
(The actual WHERE condition is much longer.)
Is it possible to get both sums in a single query, since the WHERE condition is almost the same?
SELECT * FROM (
SELECT 1 AS list_id
, user_id
, sum(hours) AS total_hours
, `date`
, task_id
FROM table WHERE used_id = 'x' AND `date` BETWEEN @thisdate AND @thatdate
GROUP BY user_id, `date`, task_id /*WITH ROLLUP*/
UNION ALL
SELECT 2 AS list_id
, user_id
, sum(hours) AS total_hours
, `date`
, task_id
FROM table
WHERE used_id = 'x'
GROUP BY user_id,task_id WITH ROLLUP ) q
/*ORDER BY q.list_id, q.user_id, q.`date`, q.task_id*/
Depending on your needs, you should only need one with rollup, or two.
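The UNION ALL shape can be tried on a small table. The sketch below is an approximation only: it runs on SQLite, which has no WITH ROLLUP, so the rollup rows are left out; the table name `timesheet`, the column `work_date`, and the rows are invented; and the second branch returns NULL for the date because that list is not grouped by date.

```python
import sqlite3

# Hypothetical rows: (user_id, work_date, task_id, hours).
ROWS = [
    ("x", "2024-01-01", "a", 2),
    ("x", "2024-01-01", "a", 1),
    ("x", "2024-01-02", "a", 4),
    ("x", "2024-01-02", "b", 3),
    ("y", "2024-01-01", "a", 8),
]


def both_summaries(user_id):
    """Return per-date-and-task sums (list 1) and per-task sums (list 2) in one result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE timesheet (user_id TEXT, work_date TEXT, task_id TEXT, hours INTEGER)"
    )
    conn.executemany("INSERT INTO timesheet VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(
        """
        SELECT 1 AS list_id, user_id, work_date, task_id, SUM(hours) AS total_hours
        FROM timesheet
        WHERE user_id = ?
        GROUP BY user_id, work_date, task_id
        UNION ALL
        SELECT 2 AS list_id, user_id, NULL, task_id, SUM(hours)
        FROM timesheet
        WHERE user_id = ?
        GROUP BY user_id, task_id
        ORDER BY 1, 3, 4
        """,
        (user_id, user_id),
    ).fetchall()
    conn.close()
    return result


for row in both_summaries("x"):
    print(row)
```

The list_id column lets the caller split the combined result back into the two summaries.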