SELECT newest rows, ignore old duplicates - mysql

In my table, I have the following columns:
CRMID | user | ticket_id | description | date | hour
What I am trying to do is select all the rows from the table, but when two (or more) rows have the same ticket_id, I want only the newest one to appear in the results, that is, the row with the newest date and hour.
The problem is that I have to handle two cases: if the values in the date column are the same, I compare the hour column; otherwise it is simple, because I only compare the date column.

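-- Newest date per ticket, then the newest hour within that date.
-- `table`, `date` and `hour` are quoted because they clash with MySQL keywords.
SELECT
n.*
FROM
`table` n JOIN (
SELECT
ticket_id,
MAX(`date`) AS max_date
FROM
`table`
GROUP BY
ticket_id
) m ON n.ticket_id = m.ticket_id AND n.`date` = m.max_date
WHERE
n.`hour` = (
SELECT MAX(h.`hour`)
FROM `table` h
WHERE h.ticket_id = n.ticket_id AND h.`date` = n.`date`
)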

You may want to combine your date and hour columns, then perform the comparison:
SELECT foo.*
FROM foo
JOIN (SELECT ticket_id, MAX(ADDTIME(`date`,`hour`)) as mostrecent
FROM foo
GROUP BY ticket_id) AS bar
ON bar.ticket_id = foo.ticket_id
and bar.mostrecent = ADDTIME(foo.`date`,foo.`hour`);
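
As a rough illustration of the combined-timestamp idea, here is a small Python sketch that runs the same join with the standard-library sqlite3 module instead of MySQL. The table name foo comes from the answer; the sample rows and the newest_per_ticket helper are made up for this example. SQLite has no ADDTIME, so the sketch assumes ISO-formatted date and hour strings and concatenates them, which sorts the same way chronologically.

```python
import sqlite3

def newest_per_ticket(rows):
    """Return (ticket_id, date, hour, description) for the newest row of each ticket."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE foo (ticket_id INTEGER, date TEXT, hour TEXT, description TEXT)"
    )
    con.executemany("INSERT INTO foo VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT foo.ticket_id, foo.date, foo.hour, foo.description
        FROM foo
        JOIN (SELECT ticket_id, MAX(date || ' ' || hour) AS mostrecent
              FROM foo
              GROUP BY ticket_id) AS bar
          ON bar.ticket_id = foo.ticket_id
         AND bar.mostrecent = foo.date || ' ' || foo.hour
        ORDER BY foo.ticket_id
        """
    ).fetchall()
    con.close()
    return result

# Hypothetical sample data: ticket 1 ties on date, ticket 2 differs on date.
sample = [
    (1, "2015-03-01", "09:00:00", "old"),
    (1, "2015-03-01", "17:30:00", "same day, later hour"),
    (2, "2015-02-27", "23:59:00", "old"),
    (2, "2015-02-28", "08:00:00", "newer date, earlier hour"),
    (3, "2015-03-02", "12:00:00", "only row"),
]

for row in newest_per_ticket(sample):
    print(row)
```

Because the comparison is done on a single combined value, the "same date, compare hour" case from the question needs no special handling.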

Related

Mysql multi column count having minimum occurrences values in columns

I have a table t1 with 5 columns and 80000 rows :
+---+--------+-------+--------+------------+
|id |category|groupe |subject | description|
+---+--------+-------+--------+------------+
|1 |categ1 |group1 |subject1| desc1 |
|2 |categ1 |group2 |subject2| desc2 |
|3 |categ1 |group2 |subject5| desc3 |
|4 |categ2 |group1 |subject5| desc4 |
|5 |categ2 |group3 |subject1| desc5 |
|6 |categ2 |group3 |subject2| desc6 |
|7 |categ3 |group1 |subject1| desc7 |
|8 |categ3 |group1 |subject4| desc8 |
+---+--------+-------+--------+------------+
I need to extract the rows that have at least 30 occurrences of their category value AND 30 occurrences of their groupe value AND 30 occurrences of their subject value.
This means that if "categ3" appears at least 30 times, I need the rows with categ3,
and the same goes for groupe and subject.
But when I use the query below, the final result can contain fewer than 30 rows with categ3, because the filters on groupe or subject remove some of the ids that have categ3.
You can see an example on db<>fiddle; with a threshold of 10 occurrences, the correct query has to return 118 rows.
select
*
from
t1
where
category in (
SELECT
category
FROM
t1
GROUP BY
category
HAVING
COUNT(category) >= 30
)
and
groupe in (
SELECT
groupe
FROM
t1
GROUP BY
groupe
HAVING
COUNT(groupe) >= 30
)
and
subject in (
SELECT
subject
FROM
t1
GROUP BY
subject
HAVING
COUNT(subject) >= 30
)
This query returns the intersection of the ids whose category, groupe and subject values each occur at least 30 times, but taking the intersection reduces the counts...
This means the count for a given category value can drop below 30 in the result.
To sum up, I need at least 30 occurrences within the intersection result itself.
I think I need a recursive filter that repeats until the number of input rows equals the number of output rows, but I don't know how to do that. Any ideas?
Thanks 😊

Add some DISTINCTs while grouping on the 3 columns.
select *
from dataset t
where t.category in (SELECT distinct category FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.groupe in (SELECT distinct groupe FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.subject in (SELECT distinct subject FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
A test on db<>fiddle here
For reference, this query selects only the rows whose (category, groupe, subject) tuple occurs 30 times or more,
which will naturally return fewer rows than the query above.
SELECT *
FROM dataset
WHERE (category, groupe, subject) IN (
SELECT category, groupe, subject
FROM dataset
GROUP BY category, groupe, subject
HAVING COUNT(*) >= 30
)

Pro tip: This is a case where describing your requirement takes a lot of thought. As you think about it, think of SQL as a processor of sets of rows. It is always worthwhile to describe the requirement as carefully as you can, especially when it is as tricky as this one. Often it's helpful to describe the problem domain, rather than just talking about columns and values.
I guess you need the sets of rows meeting your three different criteria (more than x duplicates). You can use a set of id values for those rows because they are apparently a primary key (unique).
Here's one set of IDs
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5)
I believe you need all the rows lying in the intersection of those three sets. That is, you want any rows having all three items recurring frequently. You can get that with
id IN set1 AND id IN set2 AND id IN set3
If you need the union of those sets you can use this instead. This gives you the rows with any of the three items recurring frequently.
id IN set1 OR id IN set2 OR id IN set3
So here's the query.
SELECT *
FROM dataset
WHERE id IN (
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE groupe IN (
SELECT groupe FROM dataset GROUP BY groupe HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE subject IN (
SELECT subject FROM dataset GROUP BY subject HAVING COUNT(*) >= 5))
I used 5 for the repeat threshold. You can use another number.
If you want your result set to contain only those rows with at least ten items in the result set, rather than in the dataset, you would use this query.
select d.*
from dataset d
join (
select count(*), groupe, category, subject
from dataset
group by groupe, category, subject
having count(*) >= 10
) e ON d.groupe=e.groupe AND d.category = e.category AND d.subject = e.subject
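
The question's own idea, repeating the filter until the row count stops changing, can be sketched as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module rather than MySQL, the prune_until_stable helper, the work table and the five sample rows are hypothetical, and the threshold is 2 so the effect is visible on tiny data. Note that MySQL does not allow a DELETE whose subquery reads the table being deleted from, so there the rare values would have to be collected into a temporary table first.

```python
import sqlite3

def prune_until_stable(rows, threshold):
    """Repeatedly drop rows whose category, groupe or subject value occurs
    fewer than `threshold` times among the rows that are still left."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE work (id INTEGER PRIMARY KEY, category TEXT, groupe TEXT, subject TEXT)"
    )
    con.executemany("INSERT INTO work VALUES (?, ?, ?, ?)", rows)
    delete_sql = """
        DELETE FROM work
        WHERE category IN (SELECT category FROM work GROUP BY category HAVING COUNT(*) < :n)
           OR groupe   IN (SELECT groupe   FROM work GROUP BY groupe   HAVING COUNT(*) < :n)
           OR subject  IN (SELECT subject  FROM work GROUP BY subject  HAVING COUNT(*) < :n)
    """
    while True:
        removed = con.execute(delete_sql, {"n": threshold}).rowcount
        if removed == 0:  # input rows == output rows, so the result is stable
            break
    result = con.execute(
        "SELECT id, category, groupe, subject FROM work ORDER BY id"
    ).fetchall()
    con.close()
    return result

# Hypothetical sample: a single pass would keep ids 1-4, leaving "c2" only once.
sample = [
    (1, "c1", "g1", "s1"),
    (2, "c1", "g1", "s1"),
    (3, "c1", "g2", "s2"),
    (4, "c2", "g2", "s2"),
    (5, "c2", "g3", "s1"),
]

for row in prune_until_stable(sample, threshold=2):
    print(row)
```

Each pass can push another value below the threshold, which is why a single query with three IN filters is not enough for this requirement.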

MySQL - Group By Latest and Join First Instance

I've tried a few things but I've ended up confusing myself.
What I am trying to do is find the most recent records from a table and left join the first after a certain date.
An example might be
id | acct_no | created_at | some_other_column
1 | A0001 | 2017-05-21 00:00:00 | x
2 | A0001 | 2017-05-22 00:00:00 | y
3 | A0001 | 2017-05-22 00:00:00 | z
Ideally, I'd like to find the latest record for each acct_no, sorted by created_at DESC, so that the results contain one row per unique account number. For the records above that would be id 3, but there would obviously be many different account numbers with records on different days.
Then, what I am trying to achieve is to join on the same table and find the first record with the same account number after a certain date.
For example, record 1 would be returned for a query joining on acct_no A0001 with a date after or equal to 2017-05-21 00:00:00, because it is the first result on or after that date. So these are sorted by created_at ASC with created_at >= "2017-05-21 00:00:00" (and possibly AND id != latest.id).
It seems quite straightforward, but I just can't get it to work.
I only have my most recent attempt after discarding multiple different queries.
Here I am trying to solve the first part which is to select the most recent of each account number:
SELECT latest.* FROM my_table latest
JOIN (SELECT acct_no, MAX(created_at) FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no
but that still returns all rows rather than the most recent of each.
I did have something using a join on a subquery, but it took so long to run that I quit it before it finished, even though I have indexes on acct_no and created_at. I've also run into other problems where columns in the select are not in the group by. I know this check can be turned off, but I'm trying to find a way to perform the query that doesn't require that.

Just try a little edit to your initial query:
SELECT latest.* FROM my_table latest
join (SELECT acct_no, MAX(created_at) as max_time FROM my_table GROUP
BY acct_no) latest2
ON latest.acct_no = latest2.acct_no AND latest.created_at = latest2.max_time
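
One thing to watch with this join: in the question's sample, ids 2 and 3 share the same latest created_at, so matching on the timestamp alone returns both. The sketch below shows that, plus one possible tie-break (highest id among the tied rows). It is an illustration only, run with Python's sqlite3 module rather than MySQL; the account B0002 row is a hypothetical addition to the sample.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, acct_no TEXT, "
    "created_at TEXT, some_other_column TEXT)"
)
con.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        (1, "A0001", "2017-05-21 00:00:00", "x"),
        (2, "A0001", "2017-05-22 00:00:00", "y"),
        (3, "A0001", "2017-05-22 00:00:00", "z"),
        (4, "B0002", "2017-05-20 00:00:00", "w"),  # hypothetical second account
    ],
)

# Join on the latest timestamp only: tied rows all come back.
by_time = [r[0] for r in con.execute(
    """
    SELECT latest.id
    FROM my_table latest
    JOIN (SELECT acct_no, MAX(created_at) AS max_time
          FROM my_table GROUP BY acct_no) latest2
      ON latest.acct_no = latest2.acct_no
     AND latest.created_at = latest2.max_time
    ORDER BY latest.id
    """
)]

# Tie-break: among the rows sharing the latest timestamp, keep the highest id.
tie_broken = [r[0] for r in con.execute(
    """
    SELECT MAX(latest.id) AS latest_id
    FROM my_table latest
    JOIN (SELECT acct_no, MAX(created_at) AS max_time
          FROM my_table GROUP BY acct_no) latest2
      ON latest.acct_no = latest2.acct_no
     AND latest.created_at = latest2.max_time
    GROUP BY latest.acct_no
    ORDER BY latest_id
    """
)]

print(by_time)
print(tie_broken)
```

The ids from the second query can then be joined back to my_table to fetch the full rows.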

Trying a different approach. I'm not sure about the performance impact, but I'm hoping that avoiding the self join and GROUP BY would perform better.
SELECT * FROM (
SELECT mytable1.*, IF(@temp <> acct_no, 1, 0) selector, @temp := acct_no FROM `mytable1`
JOIN (SELECT @temp := '') a
ORDER BY acct_no, created_at DESC , id DESC
) b WHERE selector = 1
Sql Fiddle

You need to get the id of the row that has the latest created_at for each account.
SELECT latest.* FROM my_table latest
JOIN (SELECT MAX(id) AS id FROM my_table t
WHERE created_at = (SELECT MAX(created_at) FROM my_table WHERE acct_no = t.acct_no)
GROUP BY acct_no) latest2
ON latest.id = latest2.id

mysql distinct does not filter unique ids

Using MySQL, I'm attempting to display a list of devices (serialno) that have not appeared for a specific time (last_seen), and then only display unique devices with the max(last_seen). The last_seen column is an int value that increments (think minutes) while the device has not been seen. Imagine a table that has a row with serialno "L123" and last_seen "1", then after another minute, serialno "L123" with last_seen "2", and so forth. Using max(last_seen), the results should display the highest number, that is, the last time the device was seen.
This works so far, but I'm noticing that a device such as serialno L123 is displayed twice. How can I filter the results to display only the highest last_seen? I've tried two scenarios using DISTINCT, but neither of them seems to work.
As an example of what I get (not working):
email | serialno | Last seen (min)
abc@example.com | L123 | 30
abc1@example.com | K900 | 20
abc2@example.com | L123 | 1 <--yes the email is different but same serialno
As an example of what I want to see:
email | serialno | Last seen (min)
abc@example.com | L123 | 30
abc1@example.com | K900 | 20
Scenario 1: select distinct in a where sub-query
SELECT
email,
serialno,
max(last_seen)
FROM
my_table
WHERE
last_seen IN (SELECT last_seen FROM my_table WHERE last_seen > 0)
AND
serialno IN (SELECT distinct serialno FROM my_table)
GROUP BY
2,1
ORDER BY
3 DESC
Scenario 2: using having, after group by
SELECT
email,
serialno,
max(last_seen)
FROM
my_table
WHERE
last_seen IN (SELECT last_seen FROM my_table WHERE last_seen > 0)
GROUP BY
2,1
HAVING
serialno in (SELECT distinct serialno FROM my_table)
ORDER BY
3 DESC

JOIN with a sub-query which is used to find each serialno's max last_seen value:
select t1.*
from my_table t1
join (select serialno, max(Last_seen) Last_seen
from my_table
group by serialno) t2
on t1.serialno = t2.serialno and t1.Last_seen = t2.Last_seen
order by t1.Last_seen desc

DISTINCT works on the values you actually output. If you use it in a subselect, it only makes the values unique within that subselect.
In the first query, group only by serialno, since you just want one grouped result per serialno. You don't need the IN clause or the HAVING.
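
To make the difference concrete, here is a small sketch using the question's sample values. It is run with Python's sqlite3 module rather than MySQL, and the extra (L123, 29) row is a hypothetical addition. Grouping by both email and serialno yields one row per pair, so L123 shows up twice; grouping by serialno alone and joining back for the email gives one row per device.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (email TEXT, serialno TEXT, last_seen INTEGER)")
con.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("abc@example.com", "L123", 29),  # hypothetical earlier reading
        ("abc@example.com", "L123", 30),
        ("abc1@example.com", "K900", 20),
        ("abc2@example.com", "L123", 1),
    ],
)

# Grouping by (serialno, email): one row per distinct pair, so L123 repeats.
per_pair = con.execute(
    """
    SELECT email, serialno, MAX(last_seen)
    FROM my_table
    WHERE last_seen > 0
    GROUP BY serialno, email
    ORDER BY 3 DESC
    """
).fetchall()

# Grouping by serialno only, then joining back to pick up the matching email.
per_device = con.execute(
    """
    SELECT t1.email, t1.serialno, t1.last_seen
    FROM my_table t1
    JOIN (SELECT serialno, MAX(last_seen) AS last_seen
          FROM my_table
          WHERE last_seen > 0
          GROUP BY serialno) t2
      ON t1.serialno = t2.serialno AND t1.last_seen = t2.last_seen
    ORDER BY t1.last_seen DESC
    """
).fetchall()

print(per_pair)
print(per_device)
```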

SQL Group by day from timestamp with two tables

I have two tables with timestamp columns.
Table #1 contains clicks, timestamp and Table #2 contains userid, timestamp. I want the counts of clicks and users by date, for example:
Date clicks_count users_count
2015-07-24 10 15
2015-07-24 04 06

I think this SQL will be useful to you.
select a.date1,clicks_count,users_count from
(select date(Table1.timestamp)as date1, count(clicks) as clicks_count
from Table1
group by date(Table1.timestamp)) as a
join
(
select date(Table2.timestamp) date2, count(userid) as users_count
from Table2
group by date(Table2.timestamp)) b on a.date1 = b.date2
Thank you.

select date(timestamp),
sum(is_click) as clicks,
sum(is_click = 0) as user_count
from
(
select timestamp, 1 as is_click from table1
union all
select timestamp, 0 from table2
) tmp
group by date(timestamp)
You can select the timestamps from both tables together and add a calculated column that indicates which table each timestamp came from.
Then you take that subquery result, group it by the date, and count the users and clicks.
sum(is_click = 0) counts how many times the timestamp came from the users table.
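
Here is a minimal sketch of the UNION ALL approach, run with Python's sqlite3 module rather than MySQL (SQLite also evaluates is_click = 0 to 0 or 1, so the same SUM trick applies). The table names follow the answer; the sample timestamps are made up. One property worth noting: dates that appear in only one of the two tables still show up, with a zero for the other count, whereas an inner join of two per-day subqueries would drop them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (clicks INTEGER, timestamp TEXT)")
con.execute("CREATE TABLE table2 (userid INTEGER, timestamp TEXT)")
# Hypothetical sample data.
con.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "2015-07-24 09:15:00"),
        (1, "2015-07-24 18:40:00"),
        (1, "2015-07-25 07:05:00"),
    ],
)
con.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [
        (101, "2015-07-24 10:00:00"),
        (102, "2015-07-26 12:30:00"),
    ],
)

daily = con.execute(
    """
    SELECT date(timestamp) AS day,
           SUM(is_click)     AS clicks_count,
           SUM(is_click = 0) AS users_count
    FROM (
        SELECT timestamp, 1 AS is_click FROM table1
        UNION ALL
        SELECT timestamp, 0 FROM table2
    ) tmp
    GROUP BY date(timestamp)
    ORDER BY day
    """
).fetchall()

for day, clicks_count, users_count in daily:
    print(day, clicks_count, users_count)
```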

How to Obtain First and Last record ? One Step Solution?

I have the following data table.
Record Date Price
A 3/1/2015 5
A 3/2/2015 6
A 3/3/2015 7
A 3/4/2015 10
B 2/1/2015 4
B 2/2/2015 6
B 2/3/2015 15
B 2/4/2015 2
How can I output a table that shows only the first price and the last price for each record, that is, the prices on that record's first date and last date in the table? Output columns would be Record, First Price, Last Price. I am looking for a one-step solution that is easy to implement in order to create a custom view.
The output desired would be:
Record FirstPrice LastPrice
A 5 10
B 4 2

Perhaps something like this is what you are looking for?
select R.Record, FD.Price as FirstPrice, LD.Price as LastPrice
from Records R
join (
select Price, R1.Record
from Records R1
where Date = (select MIN(DATE) from Records R2 where R2.Record = R1.Record)
) FD on FD.Record = R.Record
join (
select Price, R1.Record
from Records R1
where Date = (select MAX(DATE) from Records R2 where R2.Record = R1.Record)
) LD on LD.Record = R.Record
group by R.Record
http://sqlfiddle.com/#!9/d047b/26

Get the min and max aggregate dates grouped by the record field and join back to the root data. If you can have multiple records for the same record field on the same date, you will have to use min, max or avg to get just one value for that date.
SQLFiddle: http://sqlfiddle.com/#!9/1158b/3
SELECT anchorData.Record
, firstRecord.Price AS FirstPrice
, lastRecord.Price AS LastPrice
FROM (
SELECT Record
, MIN(Date) AS FirstDate
, MAX(Date) AS LastDate
FROM Table1
GROUP BY Record
) AS anchorData
JOIN Table1 AS firstRecord
ON firstRecord.Record = anchorData.Record
AND firstRecord.Date = anchorData.FirstDate
JOIN Table1 AS lastRecord
ON lastRecord.Record = anchorData.Record
AND lastRecord.Date = anchorData.LastDate
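
For a quick check against the question's data, the same anchor-and-join query can be run through Python's sqlite3 module (used here instead of MySQL). The sketch assumes the dates are stored in ISO format (YYYY-MM-DD) so that MIN and MAX order them chronologically; the FirstPrice and LastPrice aliases match the desired output columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Table1 (Record TEXT, Date TEXT, Price INTEGER)")
con.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("A", "2015-03-01", 5),
        ("A", "2015-03-02", 6),
        ("A", "2015-03-03", 7),
        ("A", "2015-03-04", 10),
        ("B", "2015-02-01", 4),
        ("B", "2015-02-02", 6),
        ("B", "2015-02-03", 15),
        ("B", "2015-02-04", 2),
    ],
)

first_last = con.execute(
    """
    SELECT anchorData.Record,
           firstRecord.Price AS FirstPrice,
           lastRecord.Price  AS LastPrice
    FROM (
        SELECT Record, MIN(Date) AS FirstDate, MAX(Date) AS LastDate
        FROM Table1
        GROUP BY Record
    ) AS anchorData
    JOIN Table1 AS firstRecord
      ON firstRecord.Record = anchorData.Record
     AND firstRecord.Date = anchorData.FirstDate
    JOIN Table1 AS lastRecord
      ON lastRecord.Record = anchorData.Record
     AND lastRecord.Date = anchorData.LastDate
    ORDER BY anchorData.Record
    """
).fetchall()

for record, first_price, last_price in first_last:
    print(record, first_price, last_price)
```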

"in order to create a custom view."...are you looking to do this in Oracle/MySql as a CREATE VIEW or just a query/select statement?
