Find the first transaction after the create date and add it as a column (MySQL)

I am using MySQL version 8.0
MRE:
create table users(
user varchar(5),
work_type varchar(20),
time datetime
);
insert into users(user, work_type, time)
Values ("A", "create", "2020-01-01 11:11:11")
, ("A", "bought", "2020-01-04 16:11:11")
, ("A", "bought", "2020-01-07 18:10:10")
, ("A", "bought", "2020-01-08 12:00:11")
, ("A", "create", "2020-02-02 15:17:11")
, ("A", "bought", "2020-02-02 16:11:11");
In my table for each user there is a "work_type" column which specifies what user does.
user work_type time
A create 2020-01-01 11:11:11
A bought 2020-01-04 16:11:11
A bought 2020-01-07 18:10:10
A bought 2020-01-08 12:00:11
A create 2020-02-02 15:17:11
A bought 2020-02-02 16:11:11
After user A creates their account ("create"), I want to find only the first bought time that follows and add it as a new column:
user work_type time bought_time
A create 2020-01-01 11:11:11 2020-01-04 16:11:11
A create 2020-02-02 15:17:11 2020-02-02 16:11:11
Notice that user A can have multiple create rows. The above is the desired output; however, there will be multiple users as well.

A correlated subquery in the select list can retrieve a single value. I use the order by time asc limit 1 clauses to limit the number of returned rows to 1:
select t.*, (select t2.`time` from yourtable t2 where t2.user=t.user and t2.`time` > t.`time` and t2.work_type='bought' order by t2.`time` asc limit 1) as bought_time
from yourtable t
where work_type='create'
The above query is fine, as long as you have at least 1 bought record after each create one. If you cannot guarantee this and you have no other fields to link a create with the subsequent bought, then you have to complicate things to check for the type of the next record after the create. Note: I do not filter on the work_type field in the subquery any longer:
select t.*, (select if(t2.work_type='bought',t2.`time`,null) from yourtable t2 where t2.user=t.user and t2.`time` > t.`time` order by t2.`time` asc limit 1) as bought_time
from yourtable t
where work_type='create'
If the create and subsequent bought records form part of a set, then I would definitely create a field that links them together, meaning that this field would have the same value for all records belonging to the same set. This way it would be really easy to identify which records form part of the set.
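As a rough illustration of that linking idea on MySQL 8.0, the set identifier can also be derived at query time instead of being stored. This is only a sketch, assuming the users table from the question; set_id, create_time and the CTE name grouped are names made up for the example. A running count of create rows per user labels every row with the create it follows, and the earliest bought row in each set becomes bought_time (NULL when a create has no bought before the next create):

```sql
-- Sketch only: set_id, create_time and "grouped" are illustrative names.
WITH grouped AS (
    SELECT `user`,
           work_type,
           `time`,
           -- running count of 'create' rows: every row gets the number
           -- of the most recent create for that user
           SUM(work_type = 'create')
               OVER (PARTITION BY `user` ORDER BY `time`) AS set_id
    FROM users
)
SELECT `user`,
       MIN(CASE WHEN work_type = 'create' THEN `time` END) AS create_time,
       MIN(CASE WHEN work_type = 'bought' THEN `time` END) AS bought_time
FROM grouped
GROUP BY `user`, set_id
-- rows that precede a user's first create land in set 0 and are dropped
HAVING create_time IS NOT NULL
ORDER BY `user`, create_time;
```

If the link were stored as a real column, as suggested above, the CTE would not be needed and the outer query could group on that column directly.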

Solution for your problem:
SELECT * FROM
(
SELECT
user
,work_type
,CASE WHEN UPPER(work_type) = 'CREATE' THEN time END time
,CASE WHEN UPPER(work_type) = 'CREATE'
THEN LEAD(time) OVER(PARTITION BY user ORDER BY time) END bought_time
FROM
Table1) A
WHERE UPPER(work_type) = 'CREATE';
Link for demo:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ac0cf9375025b964769fd28514db0ce1

Related

How do I find duplicate records in MySQL database

I have a table with 60k records. I need to find records that are duplicates based on the column Crime ID. So far I have come up with this:
SELECT * FROM crimedata GROUP BY `Crime ID` HAVING COUNT(`Crime ID`) > 1
This query returns the Crime IDs that occur more than once. Since most Crime IDs appear twice, it worked, but I also have 10k records where Crime ID is empty (it's not NULL), and the query can't distinguish those. I need a query that returns every record whose Crime ID is a duplicate, leaving out the first occurrence as the unique one.
Crime ID | column2 | column3 | row
----------------------------------
abc      | a       | b       | 1
abc      | a       | a       | 2
a        | b       | b       | 3
b        | b       | b       | 4
a        | a       | a       | 5
b        | a       | a       | 6
abc      | b       | a       | 7
For this example, the query would return records 2, 5, 6 and 7.
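You can use a ranking window function (MySQL 8.0 or later). ROW_NUMBER() is used here because RANK() ordered by the partitioning column would give every row in a partition the same rank of 1.
SELECT * FROM (
SELECT `Crime ID`, `column2`, `column3`, ROW_NUMBER() OVER (PARTITION BY `Crime ID` ORDER BY `Crime ID`) AS myrank
FROM crimedata
) rankedlist
WHERE myrank = 1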
If you want to select the duplicate records and ignore the originals, you can select WHERE myrank > 1
If you order by something meaningful, like the filing date (if you keep track of it), you will be able to select the entry which was there before others.
Selecting the duplicate records will let you aggregate data from them and merge them into the original record, if this is your intent.
You need to treat the empty values as NULL; then it will give you the desired output. The query below was originally tested on MSSQL.
Which database are you using? If you understand the idea, adapt it to your server; otherwise let us know.
I tried to convert the query to MySQL, please check:
SELECT `Crime ID` ,COUNT(NULLIF(`Crime ID`,'')) FROM crimedata GROUP BY `Crime ID` HAVING COUNT(NULLIF(`Crime ID`,'')) > 1
I have found out how to get what I wanted.
SELECT *
FROM
`table`
GROUP BY
`column1`,
`column2`,
`column3`
HAVING COUNT(`column1`) > 1
AND COUNT(`column2`) > 1
AND COUNT(`column3`) > 1
This returns every record that appears more than once in the database.

Complicated MySQL query to find each time a user appears more than once on the same day

I am trying to query a table. There are 3 important fields: attendant_id, client_id, and date.
Each time an attendant works with a client, they add an entry which includes their id, the client's id, and the date. Occasionally, an attendant will work with more than one client on the same day. I would like to capture when this happens. Here is what I have so far:
SELECT *
FROM timesheet_lines tsl1
WHERE EXISTS
(
SELECT *
FROM timesheet_lines tsl2
WHERE tsl1.date = tsl2.date
AND tsl1.attendant_id = tsl2.attendant_id
AND tsl1.client_id <> tsl2.client_id
AND tsl1.date between '2014-04-01' AND '2014-06-30'
LIMIT 2,5
)
I only want to display results where an attendant worked with at least 2 different clients. I don't expect it to be possible to have more than 5 on a single day. This is why I am using LIMIT 2,5.
I am also only interested in April through June of this year.
I think I may have the right syntax, but the query seems to be taking forever to run. Is there a faster query? There should be only about 42000+ entries all together for this particular date range. I am not expecting to get more than about 500-600 results that meet the criteria.
I ended up using the following:
create TEMPORARY table tempTSL1
(date1 date, start1 time, end1 time, attend1 varchar(50), client1 varchar(50), type1 tinyint);
insert into tempTSL1(date1, start1, end1, attend1, client1, type1)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
create TEMPORARY table tempTSL2
(date2 date, start2 time, end2 time, attend2 varchar(50), client2 varchar(50), type2 tinyint);
insert into tempTSL2(date2, start2, end2, attend2, client2, type2)
select date, start_time, end_time, attendant_id, client_id, type
from timesheet_lines
WHERE
timesheet_lines.date BETWEEN '2014-04-01' AND '2014-06-30'
and timesheet_lines.type IN (1,2,5,6);
SELECT *
FROM tempTSL1
WHERE (attend1,date1) IN (
SELECT attend2
,date2
FROM tempTSL2 tsl2
GROUP BY attend2
,date2
HAVING COUNT(date2) > 1
)
GROUP BY attend1
,client1
,date1
HAVING COUNT(client1) = 1
ORDER BY date1,attend1,start1
You are likely making it much more complex than it needs to be. Try something like this:
SELECT attendant_id
,client_id
,date
FROM timesheet_lines
WHERE (attendant_id,date) IN (
SELECT attendant_id
,date
FROM timesheet_lines tsl1
GROUP BY attendant_id
,date
HAVING COUNT(date) > 1
)
GROUP BY attendant_id
,client_id
,date
HAVING COUNT(client_id) = 1
The subquery returns results only of attendants performing multiple activities on the same date. The top query will pull from the same table, matching the attendant and dates of activity, and filter the result set to items where there is only 1 client in the grouping. Example:
attendant_id client_id date
1 A 2014-01-01
1 B 2014-01-01
2 C 2014-01-01
2 D 2014-01-02
Will return:
attendant_id client_id date
1 A 2014-01-01
1 B 2014-01-01
Untested, but I think it should be in line with what you are looking for, assuming the following two statements are true:
You are not trying to capture two different attendants working the same client on the same day
An attendant can only perform one activity per client per day
If the second point is not true, then you will need to incorporate additional fields into the subquery (such as an activity_id or something).
Hope this helps.
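If the second assumption does not hold, one possible variation is to count distinct clients per attendant and day instead of counting rows. This is an untested sketch against the timesheet_lines table from the question, using the April to June range mentioned there; the alias multi is just a name chosen for the example:

```sql
-- Sketch only: attendants with at least 2 different clients on one day.
SELECT DISTINCT tsl.attendant_id,
       tsl.client_id,
       tsl.date
FROM timesheet_lines AS tsl
JOIN (
    SELECT attendant_id, date
    FROM timesheet_lines
    WHERE date BETWEEN '2014-04-01' AND '2014-06-30'
    GROUP BY attendant_id, date
    -- repeated entries for the same client count only once
    HAVING COUNT(DISTINCT client_id) >= 2
) AS multi
  ON multi.attendant_id = tsl.attendant_id
 AND multi.date = tsl.date
ORDER BY tsl.date, tsl.attendant_id;
```

An index with leading columns (date, attendant_id, client_id) would probably help the inner query, though that is a guess without seeing the EXPLAIN output.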

How do I get the sum of a column across multiple keys?

I have data that looks like this:
id int (11) primary key auto_increment
key int (2)
type int (2)
data int (4)
timestamp datetime
There are 5 different keys - 1,2,3,4,5 and three types - 1,2,3
Data is put in continuously against a key and of a particular type.
What I need to extract is a sum of the data for a particular type (say, type 1) across all 5 keys (1,2,3,4,5), so it is a sum of exactly 5 records. I only want to sum the latest (max(timestamp)) value of data for each key (there are 5 of them), but they may all have different timestamps.
Something like this....
SELECT sum(data) FROM table WHERE type='1' AND timestamp=(SELECT max(timestamp FROM table WHERE type='1' GROUP BY key)
Or something like that. That isn't even close of course. I am completely lost on this one. it feels like I need to group by key but the syntax eludes me. Any suggestions are appreciated.
EDIT: additional info:
if: 'data' is temperature. 'key' is day of the week. 'type' is morning, noon or night
So the data might look like
morning mon 70 (timestamp)
noon tue 78 (timestamp)
morning wed 72 (timestamp)
night tue 74 (timestamp)
morning thu 76 (timestamp)
noon wed 77 (timestamp)
night fri 78 (timestamp)
noon tue 79 (timestamp)
If these are in timestamp order (desc) and I want the sum of most recent noon temps for all five days, the result would be: 155 in this case since the last noon was also tuesday and it was earlier and thus, not included. Make sense? I want sum of 'data' for any key, specific type, latest timestamp only. In this example, I would be summing at most 7 pieces of data.
If the timestamp column is guaranteed to be unique for each (key,type) (that is, there's a UNIQUE constraint ON (key,type,timestamp)), then this query will return the specified resultset. (This isn't the only approach, but it is a familiar pattern.)
SELECT SUM(t.data) AS latest_total
FROM mytable t
JOIN ( SELECT h.type
, h.key
, MAX(h.timestamp) AS max_ts
FROM mytable h
WHERE h.type='1'
GROUP
BY h.type
, h.key
) m
ON m.type = t.type
AND m.key = t.key
AND m.max_ts = t.timestamp
The inline view assigned an alias of m returns the "latest" timestamp for type=1 for all 5 key values (if at least one row exists)
That is joined to the original table, to retrieve the row that has that "latest" timestamp.
A suitable index with leading columns of type,key,timestamp will likely improve performance.
(That's based on my understanding of the specification; I may not be totally clear on it. What this query is doing is getting the latest timestamp for the type=1 rows. If there happen to be two (or more) rows with the same latest timestamp value for a given key and type, this query will retrieve both (or all) of those rows, and include them in the sum.)
We could add a GROUP BY t.type on that query, and that wouldn't change the result, since we are guaranteed that the t.type will be equal to the constant 1 (specified in the predicate in the WHERE clause of the inline view query.)
But we would need to add the GROUP BY if we wanted to get totals for all three types in the same query:
SELECT t.type
, SUM(t.data) AS latest_total
FROM mytable t
JOIN ( SELECT h.type
, h.key
, MAX(h.timestamp) AS max_ts
FROM mytable h
WHERE h.type IN ('1','2','3')
GROUP
BY h.type
, h.key
) m
ON m.type = t.type
AND m.key = t.key
AND m.max_ts = t.timestamp
GROUP
BY t.type
NOTE:
Using reserved words as identifiers (e.g. TIMESTAMP and KEY) isn't illegal, but those identifiers (usually) need to be enclosed in backticks. Changing the names of these columns so that they aren't reserved words is best practice.
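On MySQL 8.0 or later, a window function is another way to express "latest row per key". The following is a hedged sketch rather than a tested solution: it assumes the same mytable as above, and the alias ranked and the column rn are names invented for the example. Unlike the join on MAX(timestamp), it picks exactly one row per key even when two rows share the latest timestamp (which of the tied rows wins is arbitrary).

```sql
-- Sketch only: sum the most recent type-1 value of each key.
SELECT SUM(ranked.data) AS latest_total
FROM (
    SELECT t.data,
           -- rn = 1 marks the newest row within each key
           ROW_NUMBER() OVER (PARTITION BY t.`key`
                              ORDER BY t.`timestamp` DESC) AS rn
    FROM mytable t
    WHERE t.type = 1
) AS ranked
WHERE ranked.rn = 1;
```

To get one total per type, drop the WHERE on type, add t.type to the PARTITION BY and the inner select list, and group the outer query by it.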
SELECT SUM(data)
FROM ( SELECT CONCAT(MAX(timestamp), '_', type) AS customId
FROM `table`
WHERE type = '1'
GROUP BY `key` ) a
JOIN `table` b ON a.customId = CONCAT(b.timestamp, '_', type)
GROUP BY type;
This would probably do the trick...
I would for simplicity and maintainability use a temp-table and fill it with several statements. The solution with "union-subselect" looks a bit long for me.
So
drop temporary table if exists tmp_data;
create temporary table tmp_data (type int, value int);
insert into tmp_data select 1, value from data_table where type=1 order by timestamp desc limit 5;
insert into tmp_data select 2, value from data_table where type=2 order by timestamp desc limit 5;
insert into tmp_data select 3, value from data_table where type=3 order by timestamp desc limit 5;
select type, sum(value) as total from tmp_data group by type;
EDIT:
The subselect-solution would be similar, and since there are only 3 types not too bad
select type, sum(value) as total from
((select 1 as type, value from data_table where type=1 order by timestamp desc limit 5)
union all
(select 2 as type, value from data_table where type=2 order by timestamp desc limit 5)
union all
(select 3 as type, value from data_table where type=3 order by timestamp desc limit 5)) as subtab group by type;
Hope that helps.

mysql query to efficiently remove duplicates

Hi folks and thanks for reading
I have a quiz feature on my site which stores a score, username and ip address as the most important columns. I currently have a horrible series of views bringing back the high scores based on the criteria I need which are...
Lowest score first but...only the lowest score for each Quiz user.
The complexity lies if the user has changed ip, i.e. keeps the same username but has a different ip OR if the user keeps the same IP address but changes user name.
It's easier to explain with an example.
First visitor has 4 entries but from 3 different IP Addresses
Second user from 2 IP Addresses
Third user using one IP Address but using 3 Usernames
Table with VALUES(UserID, IPA, Score)
User 1, IP1, 13
User 1, IP1, 20
User 1, IP2, 30
User 1, IP3, 10
User 2, IP4, 20
User 2, IP5, 22
User 2, IP5, 15
User 3, IP6, 12
User 3, IP6, 20
User 4, IP6, 15
User 5, IP6, 11
The highscore query would present you with
User 1, IP3, 10
User 5, IP6, 11
User 2, IP5, 15
The score value is highly unlikely to be duplicated but I guess it is possible. The figures above are simplified to explain my conundrum!
Can anyone suggest an efficient way of removing these duplicates as my table is now over 15,000 records and the views are creaking!
Many thanks.
To identify occurrences of duplicate (UserID,IPA) tuples is pretty straightforward:
SELECT s.UserID
, s.IPA
FROM mytable s
GROUP
BY s.UserID
, s.IPA
HAVING COUNT(1) > 1
To get the lowest score, you could add MIN(s.Score) to the select list.
Deleting duplicates is a little more difficult, in that you don't seem to have any guarantee of uniqueness. Some will recommend that you copy the rows you want to keep out to a separate table, and then either swap the tables with renames, or truncate the original table and reload from the new table. (That usually turns out to be the most efficient approach.)
CREATE TABLE newtable LIKE mytable ;
INSERT INTO newtable (UserID,IPA,Score)
SELECT s.UserID
, s.IPA
, MIN(Score) AS Score
FROM mytable s
GROUP
BY s.UserID
, s.IPA ;
If you want to identify duplicates by just UserID, the same approach can work. If it isn't important that the IPA value comes from the row with the lowest score, it's a little easier. I can put together the query that gets the row that has the lowest score for the user.
If you want to delete rows from the existing table, without adding a unique identifier (like an AUTO_INCREMENT id column) on each row, that can be done too.
This will get you partway, deleting all rows for a given (UserID,IPA) that have a score higher than the lowest score:
DELETE t.*
FROM mytable t
JOIN ( SELECT s.UserID
, s.IPA
, MIN(s.Score) AS Score
FROM mytable s
GROUP
BY s.Userid
, s.IPA
) k
ON k.UserID = t.UserID
AND k.IPA = t.IPA
AND k.Score < t.Score
But that will still leave duplicate occurrences of (UserID,IPA,Score) tuples. Without some other column on the table that makes the row unique, it's a little more difficult to remove duplicates. (Again, a common technique is to copy the rows you want to keep to another table, and either swap tables or reload the original table from the saved rows.)
FOLLOWUP
Note that views (both stored and inline) can be expensive performancewise, with MySQL, since the views get materialized as temporary MyISAM tables (MySQL calls them "derived tables").
But correlated subqueries can be even more problematic on large sets.
So, choose your poison.
If the table has an index ON (UserID, Score, IPA), here's how I would get the resultset:
SELECT IF(@prev_user=t.UserID,@i:=@i+1,@i:=1) AS seq
, @prev_user := t.UserID AS UserID
, t.IPA
, t.Score
FROM mytable t
JOIN (SELECT @i := NULL, @prev_user := NULL) i
GROUP
BY t.UserID ASC
, t.Score ASC
, t.IPA ASC
HAVING seq = 1
This is taking advantage of some MySQL-specific features: user_variables and the guarantee that the GROUP BY will return a sorted resultset. (The EXPLAIN output will show "Using index", which means we avoid a sort operation, but the query will still create a derived table.) We use the user_variables to identify the "first" row for each UserID, and the HAVING clause eliminates all but that first row.
test case:
create table mytable (UserID VARCHAR(6), IPA varchar(3), Score INT);
create index mytable_IX ON mytable (UserID, Score, IPA);
insert into mytable values ('User 1','IP1',13)
,('User 1','IP1',20)
,('User 1','IP2',30)
,('User 1','IP3',10)
,('User 2','IP4',20)
,('User 2','IP5',22)
,('User 2','IP5',15)
,('User 3','IP6',12)
,('User 3','IP6',20)
,('User 4','IP6',15)
,('User 5','IP6',11);
Another followup
To eliminate 'User 4' and 'User 5' from the resultset (it's not at all clear why you would want or need to do that): if it's because those users have only one row in the table, then you could add a JOIN to a subquery (inline view) that gets a list of UserID values where there is more than one row, like this:
SELECT IF(@prev_user=t.UserID,@i:=@i+1,@i:=1) AS seq
, @prev_user := t.UserID AS UserID
, t.IPA
, t.Score
FROM mytable t
JOIN ( SELECT d.UserID
FROM mytable d
GROUP
BY d.UserID
HAVING COUNT(1) > 1
) m
ON m.UserID = t.UserID
CROSS
JOIN (SELECT @i := NULL, @prev_user := NULL) i
GROUP
BY t.UserID ASC
, t.Score ASC
, t.IPA ASC
HAVING seq = 1
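For readers on MySQL 8.0 or later: as far as I know, the GROUP BY ... ASC syntax was removed in 8.0 and assigning user variables inside expressions is deprecated there, so the queries above may not run unchanged. A rough equivalent using ROW_NUMBER() could look like the sketch below. It assumes the mytable test case above; the alias ranked is a name chosen for the example, and seq plays the same role as in the user-variable version.

```sql
-- Sketch only: lowest score per UserID, lowest scores first.
SELECT ranked.UserID,
       ranked.IPA,
       ranked.Score
FROM (
    SELECT t.UserID,
           t.IPA,
           t.Score,
           -- seq = 1 marks each user's lowest score (ties broken by IPA)
           ROW_NUMBER() OVER (PARTITION BY t.UserID
                              ORDER BY t.Score, t.IPA) AS seq
    FROM mytable t
) AS ranked
WHERE ranked.seq = 1
ORDER BY ranked.Score;
```

This only collapses rows by UserID. Treating several usernames on one IP address as the same visitor, as in the question's expected output, would need an extra rule on top of this.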

MS Access 2003 - Combine last record of multiple tables into one query or table?

I have a couple of tables that are transaction tables, and I would like to make a simple pivot chart for comparative balances....which happen to be the last record of each of these tables in a field called "balance".
I know how to populate this on a form using a SQL statement and rs.MoveLast, but I do not know how to get to the pivot chart without first having this in a table.
thanks!
EDIT:
This is what I used! Thanks Remou!
(SELECT TOP 1 TransactionID, Balance
FROM tblTransaction01
ORDER BY TransactionID DESC)
UNION
(SELECT TOP 1 TransactionID, Balance
FROM tblTransaction02
ORDER BY TransactionID DESC)
UNION
(SELECT TOP 1 TransactionID, Balance
FROM tblTransaction03
ORDER BY TransactionID DESC)
Now I just need to find a way to insert a text string into the corresponding fields that identifies what table the value is coming from.
for example, the above query returns
TransID Balance
123 $1000.00
234 $20000.00
345 $300000.00
and I need:
TransID Balance Table/Account
123 $1000.00 tblTransaction01
234 $20000.00 tblTransaction02
345 $300000.00 tblTransaction03
thanks!
How do you define the last record? Let us say it is the one with the latest creation date and that the creation date is unique; then you could use the SQL below. Note that the parentheses are important.
(SELECT TOP 1 CrDate , Balance , "TranA" As FromTable
FROM TransactionsA
ORDER BY CrDate DESC)
UNION
(SELECT TOP 1 CrDate , Balance , "TranB" As FromTable
FROM TransactionsB
ORDER BY CrDate DESC)