How to get the latest row for each distinct cardnumber? - mysql

I want to get the remaining (latest) balance for each cardnumber. Below is a sample of the table.
trans_id | cardnumber | trans_date | balance
---------------------------------------------------------------
1 | 1000005240000008 | 2009-07-03 04:54:27 | 88
2 | 1000005120000008 | 2009-07-04 05:00:07 | 2
3 | 1000005110000008 | 2009-07-05 13:18:39 | 3
4 | 1000005110000008 | 2009-07-06 13:18:39 | 4
5 | 1000005110000008 | 2009-07-07 14:25:32 | 4.5
6 | 1000005120000002 | 2009-07-08 16:50:51 | -1
7 | 1000005240000002 | 2009-07-09 17:03:17 | 1
The result should look like this:
trans_id | cardnumber | trans_date | balance
---------------------------------------------------------------
1 | 1000005110000008 | 2009-07-07 14:25:32 | 4.5
2 | 1000005120000002 | 2009-07-08 16:50:51 | -1
3 | 1000005240000002 | 2009-07-09 17:03:17 | 1
I already have a query, but it looks something like this:
SELECT cardnumber, MAX(balance), trans_date
FROM transactions
GROUP BY cardnumber
I really need help with this; I'm having a hard time. :(
Thanks in advance.
Mark

I don't have MySQL in front of me at the moment, but something like this should work:
SELECT latest.cardnumber, latest.max_trans_date, t2.balance
FROM
(
SELECT t1.cardnumber, MAX(t1.trans_date) AS max_trans_date
FROM transactions t1
GROUP BY t1.cardnumber
) latest
JOIN transactions t2 ON (
latest.cardnumber = t2.cardnumber AND
latest.max_trans_date = t2.trans_date
)
The derived table (subquery in FROM) requires MySQL 4.1 or later. There may be a better way. It's 3AM :-D
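For a quick check without a MySQL server at hand, here is a minimal sketch that runs the same derived-table query against an in-memory SQLite database through Python's standard sqlite3 module, loaded with the sample rows from the question. SQLite is a stand-in assumption (the query only uses a plain GROUP BY and JOIN), and the helper name `latest_balances` is hypothetical. The dates are stored as ISO-style text, which sorts chronologically, so MAX() picks the latest one.

```python
import sqlite3

# Sample rows from the question: (trans_id, cardnumber, trans_date, balance).
ROWS = [
    (1, "1000005240000008", "2009-07-03 04:54:27", 88),
    (2, "1000005120000008", "2009-07-04 05:00:07", 2),
    (3, "1000005110000008", "2009-07-05 13:18:39", 3),
    (4, "1000005110000008", "2009-07-06 13:18:39", 4),
    (5, "1000005110000008", "2009-07-07 14:25:32", 4.5),
    (6, "1000005120000002", "2009-07-08 16:50:51", -1),
    (7, "1000005240000002", "2009-07-09 17:03:17", 1),
]


def latest_balances(conn):
    """Return (cardnumber, max_trans_date, balance) for each card's latest transaction."""
    # Derived table: latest date per card, joined back to fetch that row's balance.
    sql = """
        SELECT latest.cardnumber, latest.max_trans_date, t2.balance
        FROM (
            SELECT t1.cardnumber, MAX(t1.trans_date) AS max_trans_date
            FROM transactions t1
            GROUP BY t1.cardnumber
        ) latest
        JOIN transactions t2
          ON latest.cardnumber = t2.cardnumber
         AND latest.max_trans_date = t2.trans_date
        ORDER BY latest.cardnumber
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(trans_id INTEGER PRIMARY KEY, cardnumber TEXT, trans_date TEXT, balance REAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", ROWS)
result = latest_balances(conn)
```

With the sample data this returns one row per card number, five in total, because even a card with a single transaction has a latest transaction.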

Almost the same as derobert's, but the other way around. The idea is to write a subquery that finds the latest (max) transaction date for each cardnumber and then join it with the original table on both cardnumber and date. This of course assumes that no card has two transactions at exactly the same time; if it does, both rows are returned.
SELECT t1.trans_id, t1.cardnumber, t1.trans_date, t1.balance
FROM transactions AS t1
JOIN (SELECT cardnumber, MAX(trans_date) AS max_trans_date FROM transactions GROUP BY cardnumber) AS t2
ON t2.cardnumber = t1.cardnumber AND t2.max_trans_date = t1.trans_date

SELECT * FROM transactions WHERE (cardnumber,trans_date) in (SELECT cardnumber, MAX(trans_date) FROM transactions GROUP BY cardnumber);
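Here is a small sketch of the row-value IN query, run in SQLite from Python as a stand-in for MySQL (row values need SQLite 3.15 or later). The four rows are made up for illustration: card `A` has two transactions at the same latest timestamp, which is exactly the case the no-ties assumption above excludes, and both of them come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (trans_id INTEGER, cardnumber TEXT, trans_date TEXT, balance REAL)"
)
# Hypothetical data: card 'A' has two transactions at the same latest timestamp.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "A", "2009-07-01 10:00:00", 5),
        (2, "A", "2009-07-02 10:00:00", 7),
        (3, "A", "2009-07-02 10:00:00", 8),  # same timestamp as trans_id 2
        (4, "B", "2009-07-03 09:00:00", 1),
    ],
)

# Row-value IN: keep rows whose (cardnumber, trans_date) pair is a per-card maximum.
rows = conn.execute(
    """
    SELECT trans_id FROM transactions
    WHERE (cardnumber, trans_date) IN (
        SELECT cardnumber, MAX(trans_date) FROM transactions GROUP BY cardnumber
    )
    ORDER BY trans_id
    """
).fetchall()
latest_ids = [r[0] for r in rows]
```

Breaking such a tie needs an additional, unique column in the comparison, such as the highest trans_id.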

Related

MariaDB: Select val_1, max(value_2), group by value_3

I have the following columns:
| order_id | client_id | order_timestamp | buyer_id | (all INTs)
It started with the easy-sounding task "Show me the buyer of the last order for each client", so basically
SELECT
client_id,
max(order_timestamp),
buyer_id
FROM table t
GROUP BY client_id;
if GROUP BY worked as one would expect/wish. I know this is a common problem, but I've never seen this particular case, where you need another column in addition to the one you're grouping by. I guess window functions could help, but we're using MariaDB 10.0, so that's not really an option. I tried different subselects and joins, but I always end up with the problem that I can't join on order_id, since I have to group by client_id. I also thought of joining on client_id AND order_timestamp, but that combination is not unique in the table: a client (or client/buyer combination) can have orders with the exact same (Unix) timestamp. In that edge case I would need the buyer of the order with the higher order_id, but that's a problem for another day, I guess.
If the table was filled like
| order_id | client_id | order_timestamp | buyer_id |
| 1 | 123 | 9876543 | 2 |
| 2 | 123 | 9876654 | 3 |
| 3 | 234 | 9945634 | 2 |
| 4 | 234 | 9735534 | 1 |
I would like to get
| client_id | buyer_id |
|-----------|----------|
| 123 | 3 |
| 234 | 2 |
Hopefully, somebody can help me, so I can go to sleep in peace tonight.
If your MariaDB version supports window functions (10.2 and later), you can use ROW_NUMBER():
select t.client_id, t.buyer_id
from (
select *,
row_number() over (partition by client_id order by order_timestamp desc, order_id desc) rn
from tablename
) t
where t.rn = 1
Results:
| client_id | buyer_id |
| --------- | -------- |
| 123 | 3 |
| 234 | 2 |
Without window functions, use NOT EXISTS (the order_id comparison breaks ties on order_timestamp):
select t.client_id, t.buyer_id
from tablename t
where not exists (
select 1 from tablename
where client_id = t.client_id
and (
order_timestamp > t.order_timestamp
or (order_timestamp = t.order_timestamp and order_id > t.order_id)
)
)
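Both queries can be checked with Python's built-in sqlite3 module, used here as a stand-in for MariaDB. This sketch assumes SQLite 3.25 or later (for ROW_NUMBER), names the table `orders` (the question's `table` is a reserved word), and adds one hypothetical row, order 5, with the same timestamp as order 2, so that the order_id tie-breaker is exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, client_id INTEGER, order_timestamp INTEGER, buyer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 123, 9876543, 2),
        (2, 123, 9876654, 3),
        (3, 234, 9945634, 2),
        (4, 234, 9735534, 1),
        (5, 123, 9876654, 4),  # hypothetical tie with order 2; the higher order_id should win
    ],
)

# Window-function version: number each client's orders newest first, keep row 1.
window_sql = """
    SELECT client_id, buyer_id FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY client_id ORDER BY order_timestamp DESC, order_id DESC
        ) AS rn
        FROM orders
    ) t
    WHERE rn = 1
    ORDER BY client_id
"""

# NOT EXISTS version: keep an order only if no newer (or same-time, higher-id) order exists.
not_exists_sql = """
    SELECT t.client_id, t.buyer_id FROM orders t
    WHERE NOT EXISTS (
        SELECT 1 FROM orders o
        WHERE o.client_id = t.client_id
          AND (o.order_timestamp > t.order_timestamp
               OR (o.order_timestamp = t.order_timestamp AND o.order_id > t.order_id))
    )
    ORDER BY t.client_id
"""

by_window = conn.execute(window_sql).fetchall()
by_not_exists = conn.execute(not_exists_sql).fetchall()
```

Both return buyer 4 for client 123 (order 5 wins the tie) and buyer 2 for client 234.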
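If you select max(field) together with non-aggregated columns, MariaDB (like MySQL without ONLY_FULL_GROUP_BY) takes those other columns from an arbitrary row of each group, usually the first row it encounters for each client_id, which is not necessarily the row holding the maximum. That is not what you want.
Try this instead, which keeps only the rows whose order_timestamp equals their client's latest timestamp:
select client_id, order_timestamp, buyer_id from t
where order_timestamp =
(select max(tcopy.order_timestamp) from t as tcopy where tcopy.client_id = t.client_id)
group by client_id;
The final GROUP BY keeps a single, unspecified row per client when several orders share the latest timestamp, and it is rejected when ONLY_FULL_GROUP_BY is enabled.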

Seek rows with incorrect dates in historic data

I have a table that is a historic log. I recently fixed a bug that was writing incorrect dates into that table: the dates should be sequential (each one later than the previous one for the same entity), but in some cases a row has a date much older than the previous date.
How can I get all the rows that are out of sequence for each entity_id? In the example below I should get rows 5 and 10.
The table has millions of rows and thousands of different entities. I was thinking of comparing the results of ordering by date and by id, but that is a lot of manual work.
| id | entity_id | time_stamp |
|--------|-------------|---------------|
| 1 | 7 | 2019-01-22 |
| 2 | 9 | 2019-01-05 |
| 3 | 6 | 2019-03-14 |
| 4 | 9 | 2019-04-20 |
| 5 | 6 | 2015-10-04 | WRONG
| 6 | 9 | 2019-07-15 |
| 7 | 3 | 2019-07-04 |
| 8 | 7 | 2019-06-01 |
| 9 | 6 | 2019-11-04 |
| 10 | 7 | 2019-03-04 | WRONG
Is there a function to compare each date with the previous date for the same entity_id? I'm completely lost here and not sure how to clean the data. The database is MySQL, by the way.
If you are running MySQL 8.0, you can use lag(); the idea is to order records by id within groups having the same entity_id, and then to filter on records where the current timestamp is smaller than the previous one:
select t.*
from (
select t.*, lag(time_stamp) over(partition by entity_id order by id) lag_time_stamp
from mytable t
) t
where time_stamp < lag_time_stamp
In earlier versions, one option is to use a correlated subquery to get the previous timestamp:
select t.*
from mytable t
where time_stamp < (
select time_stamp
from mytable t1
where t1.entity_id = t.entity_id and t1.id < t.id
order by id desc
limit 1
)
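Here is a sketch running both queries against the sample rows, using SQLite from Python as a stand-in for MySQL; it assumes SQLite 3.25 or later for LAG(). The dates are stored as ISO `YYYY-MM-DD` text, which compares in chronological order, so the `<` comparison works on plain strings.

```python
import sqlite3

# Sample rows from the question: (id, entity_id, time_stamp).
ROWS = [
    (1, 7, "2019-01-22"), (2, 9, "2019-01-05"), (3, 6, "2019-03-14"),
    (4, 9, "2019-04-20"), (5, 6, "2015-10-04"), (6, 9, "2019-07-15"),
    (7, 3, "2019-07-04"), (8, 7, "2019-06-01"), (9, 6, "2019-11-04"),
    (10, 7, "2019-03-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, entity_id INTEGER, time_stamp TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# LAG(): previous time_stamp for the same entity_id, ordered by id.
# The first row of each entity gets NULL, and "x < NULL" is never true.
lag_sql = """
    SELECT id FROM (
        SELECT t.*, LAG(time_stamp) OVER (PARTITION BY entity_id ORDER BY id) AS lag_time_stamp
        FROM mytable t
    ) t
    WHERE time_stamp < lag_time_stamp
    ORDER BY id
"""

# Correlated subquery: fetch the previous row's time_stamp without window functions.
subquery_sql = """
    SELECT t.id FROM mytable t
    WHERE t.time_stamp < (
        SELECT t1.time_stamp FROM mytable t1
        WHERE t1.entity_id = t.entity_id AND t1.id < t.id
        ORDER BY t1.id DESC
        LIMIT 1
    )
    ORDER BY t.id
"""

wrong_by_lag = [r[0] for r in conn.execute(lag_sql)]
wrong_by_subquery = [r[0] for r in conn.execute(subquery_sql)]
```

Both flag rows 5 and 10, the two marked WRONG in the question.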
SELECT s1.*
FROM sourcetable s1
WHERE EXISTS ( SELECT NULL
               FROM sourcetable s2
               WHERE s2.id < s1.id
                 AND s2.entity_id = s1.entity_id
                 AND s2.time_stamp > s1.time_stamp )
This flags each row for which an earlier row (lower id) of the same entity_id has a later date. An index on (entity_id, id, time_stamp) or (entity_id, time_stamp, id) will improve performance.
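Here is a quick check of the EXISTS approach on the sample data, with SQLite from Python standing in for MySQL. Which row gets flagged depends on the direction of the comparisons: testing each row against earlier rows (lower id) finds rows 5 and 10, while the reversed inequalities (s1.id < s2.id, s1.time_stamp > s2.time_stamp) flag the row just before each bad one, 3 and 8. Unlike LAG, this compares against every earlier row of the entity, not only the immediately preceding one. The index mirrors the (entity_id, id, time_stamp) suggestion; `idx_entity_id_ts` is a made-up name.

```python
import sqlite3

ROWS = [
    (1, 7, "2019-01-22"), (2, 9, "2019-01-05"), (3, 6, "2019-03-14"),
    (4, 9, "2019-04-20"), (5, 6, "2015-10-04"), (6, 9, "2019-07-15"),
    (7, 3, "2019-07-04"), (8, 7, "2019-06-01"), (9, 6, "2019-11-04"),
    (10, 7, "2019-03-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sourcetable (id INTEGER PRIMARY KEY, entity_id INTEGER, time_stamp TEXT)")
conn.executemany("INSERT INTO sourcetable VALUES (?, ?, ?)", ROWS)
conn.execute("CREATE INDEX idx_entity_id_ts ON sourcetable (entity_id, id, time_stamp)")

# Flag s1 when some earlier row (lower id) of the same entity has a later date.
flag_later_row = """
    SELECT s1.id FROM sourcetable s1
    WHERE EXISTS (
        SELECT NULL FROM sourcetable s2
        WHERE s2.id < s1.id
          AND s2.entity_id = s1.entity_id
          AND s2.time_stamp > s1.time_stamp
    )
    ORDER BY s1.id
"""

# Reversed inequalities: flags the earlier row of each out-of-order pair instead.
flag_earlier_row = """
    SELECT s1.id FROM sourcetable s1
    WHERE EXISTS (
        SELECT NULL FROM sourcetable s2
        WHERE s1.id < s2.id
          AND s1.entity_id = s2.entity_id
          AND s1.time_stamp > s2.time_stamp
    )
    ORDER BY s1.id
"""

later = [r[0] for r in conn.execute(flag_later_row)]
earlier = [r[0] for r in conn.execute(flag_earlier_row)]
```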

Calculate average, minimum, maximum interval between date

I am trying to do this with SQL. I have a transaction table which contains transaction_date. After grouping by date, I got this list:
| transaction_date |
| 2019-03-01 |
| 2019-03-04 |
| 2019-03-05 |
| ... |
From these 3 transaction dates, I want to get:
Average = ((4-1) + (5-4)) / 2 = 2 days (DATEDIFF between each pair of consecutive dates)
Minimum = 1 day
Maximum = 3 days
Is there a good way to write this in SQL, before I resort to iterating over all of them with a WHILE loop?
Thanks in advance
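To make the arithmetic concrete, here is a client-side sketch in Python of the calculation described above: the gap in days between each date and the next, then the average, minimum and maximum of those gaps. It uses only the three sample dates and is meant for checking the numbers, not as a replacement for the SQL.

```python
from datetime import date

# The three grouped transaction dates from the example, already sorted.
dates = [date(2019, 3, 1), date(2019, 3, 4), date(2019, 3, 5)]

# Gap in days between each date and the next one (like DATEDIFF(next, current)).
gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

average_gap = sum(gaps) / len(gaps)
min_gap = min(gaps)
max_gap = max(gaps)
```

Pairing each date with its successor is what the LEAD window function and the "next date" subquery do inside the database.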
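If your MySQL version doesn't support the LAG or LEAD functions, you can use a correlated subquery to get the next date for each row, then use DATEDIFF on it to get the gap, all inside a derived table.
The last date has no next date, so its gap is NULL, and AVG, MIN and MAX ignore NULLs.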
Query 1:
SELECT avg(diffDt),min(diffDt),MAX(diffDt)
FROM (
SELECT DATEDIFF((SELECT transaction_date
FROM T tt
WHERE tt.transaction_date > t1.transaction_date
ORDER BY tt.transaction_date
LIMIT 1
),transaction_date) diffDt
FROM T t1
) t1
Results:
| avg(diffDt) | min(diffDt) | MAX(diffDt) |
|-------------|-------------|-------------|
| 2 | 1 | 3 |
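A sketch of the subquery approach in SQLite, driven from Python as a stand-in for MySQL. SQLite has no DATEDIFF, so this swaps in `julianday(next) - julianday(current)`, which gives the same day count for plain dates; the table name `T` follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (transaction_date TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?)",
    [("2019-03-01",), ("2019-03-04",), ("2019-03-05",)],
)

# julianday(next) - julianday(current) replaces DATEDIFF(next, current).
# The last date has no successor, so its gap is NULL and the aggregates skip it.
sql = """
    SELECT AVG(diffDt), MIN(diffDt), MAX(diffDt)
    FROM (
        SELECT julianday((SELECT tt.transaction_date
                          FROM T tt
                          WHERE tt.transaction_date > t1.transaction_date
                          ORDER BY tt.transaction_date
                          LIMIT 1))
               - julianday(t1.transaction_date) AS diffDt
        FROM T t1
    ) gaps
"""
avg_gap, min_gap, max_gap = conn.execute(sql).fetchone()
```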
If your MySQL version is 8.0 or later, you can use the LEAD window function instead of the subquery.
Query #1
SELECT avg(diffDt),min(diffDt),MAX(diffDt)
FROM (
SELECT DATEDIFF(LEAD(transaction_date) OVER(ORDER BY transaction_date),transaction_date) diffDt
FROM T t1
) t1;
| avg(diffDt) | min(diffDt) | MAX(diffDt) |
| ----------- | ----------- | ----------- |
| 2 | 1 | 3 |

MySQL - Return Latest Date and Total Sum from two rows in a column for multiple entries

For every ID_Number there are several bill dates, and on each date two types of bills (A and B). I want to return the latest date (max date) for each ID_Number and the sum of the two bill amounts on that date. So, based on the table below, it should return:
| 1 | 201604 | 10.00 |
| 2 | 201701 | 28.00 |
tbl_charges
+-----------+-----------+-----------+--------+
| ID_Number | Bill_Date | Bill_Type | Amount |
+-----------+-----------+-----------+--------+
| 1 | 201601 | A | 5.00 |
| 1 | 201601 | B | 7.00 |
| 1 | 201604 | A | 4.00 |
| 1 | 201604 | B | 6.00 |
| 2 | 201701 | A | 15.00 |
| 2 | 201701 | B | 13.00 |
+-----------+-----------+-----------+--------+
Then, if possible, I want to be able to do this in a join in another query, using ID_Number as the column for the join. Would that change the query here?
Note: initially I only want to run the query for about 200 distinct ID_Numbers out of about 10 million, so I will be adding an 'IN' clause for those IDs. When I do the join for the final product, I will need to pick out those latest dates from all the other join possibilities (i.e., how do I get ID_Number 1 to join with 201604 and not 201601?).
I would use NOT EXISTS and GROUP BY
select t1.id_number, max(t1.bill_date), sum(t1.amount)
from tbl_charges t1
where not exists (
select 1
from tbl_charges t2
where t1.id_number = t2.id_number and
t1.bill_date < t2.bill_date
)
group by t1.id_number
NOT EXISTS filters out every row that is not from the latest bill_date of its id_number, and GROUP BY does the sum.
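A sketch that runs the NOT EXISTS + GROUP BY query on the sample charges, using SQLite from Python as a stand-in for MySQL. The column types are assumptions: bill_date is stored as an integer like 201604 (which still orders correctly) and amount as REAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_charges (id_number INTEGER, bill_date INTEGER, bill_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO tbl_charges VALUES (?, ?, ?, ?)",
    [
        (1, 201601, "A", 5.00), (1, 201601, "B", 7.00),
        (1, 201604, "A", 4.00), (1, 201604, "B", 6.00),
        (2, 201701, "A", 15.00), (2, 201701, "B", 13.00),
    ],
)

# NOT EXISTS keeps only rows with no later bill_date for the same id_number;
# GROUP BY then adds up the A and B amounts on that latest date.
latest_totals = conn.execute(
    """
    SELECT t1.id_number, MAX(t1.bill_date), SUM(t1.amount)
    FROM tbl_charges t1
    WHERE NOT EXISTS (
        SELECT 1 FROM tbl_charges t2
        WHERE t2.id_number = t1.id_number AND t2.bill_date > t1.bill_date
    )
    GROUP BY t1.id_number
    ORDER BY t1.id_number
    """
).fetchall()
```

To restrict the query to the roughly 200 IDs mentioned in the question, add `AND t1.id_number IN (...)` to the outer WHERE.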
I would be inclined to filter in the WHERE clause:
select id_number, sum(c.amount)
from tbl_charges c
where c.bill_date = (select max(c2.bill_date)
from tbl_charges c2
where c2.id_number = c.id_number and c2.bill_type = c.bill_type
)
group by id_number;
Or, another fun way is to use IN with tuples:
select id_number, sum(c.amount)
from tbl_charges c
where (c.id_number, c.bill_type, c.bill_date) in
(select c2.id_number, c2.bill_type, max(c2.bill_date)
from tbl_charges c2
group by c2.id_number, c2.bill_type
)
group by id_number;
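A sketch of the tuple IN variant on the sample charges, with SQLite (3.15 or later, for row values) from Python standing in for MySQL. It picks the latest date per (id_number, bill_type) pair: here that gives the same totals, because bill types A and B always share a date, but if one type's latest bill were older than the other's, the sum would combine amounts from different dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_charges (id_number INTEGER, bill_date INTEGER, bill_type TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO tbl_charges VALUES (?, ?, ?, ?)",
    [
        (1, 201601, "A", 5.00), (1, 201601, "B", 7.00),
        (1, 201604, "A", 4.00), (1, 201604, "B", 6.00),
        (2, 201701, "A", 15.00), (2, 201701, "B", 13.00),
    ],
)

# Keep each row whose (id_number, bill_type, bill_date) is the latest for that id/type pair.
totals = conn.execute(
    """
    SELECT id_number, SUM(amount)
    FROM tbl_charges
    WHERE (id_number, bill_type, bill_date) IN (
        SELECT id_number, bill_type, MAX(bill_date)
        FROM tbl_charges
        GROUP BY id_number, bill_type
    )
    GROUP BY id_number
    ORDER BY id_number
    """
).fetchall()
```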

Select the most current records from multiple identical rows in the MySQL database

I am working on a product sample inventory system where I track the movement of the products. Each product can have a status of "IN", "OUT" or "REMOVED". Each row of the table represents a new entry with its own ID, status and date. Each product also has a serial number.
I need help with a SQL query that returns all products that are currently "OUT". If I simply run SELECT * FROM table WHERE status = "IN", it returns every product that has ever had status IN.
Every time a product comes in or goes out, I duplicate the last row for that product, change the status and update the date, and the new row gets a new ID automatically.
Here is the table that I have:
id | serial_number | product | color | date | status
------------------------------------------------------------
1 | K0T4N | XYZ | silver | 2016-07-01 | IN
2 | X56Z7 | ABC | silver | 2016-07-01 | IN
3 | 96T4F | PQR | silver | 2016-07-01 | IN
4 | K0T4N | XYZ | silver | 2016-07-02 | OUT
5 | 96T4F | PQR | silver | 2016-07-03 | OUT
6 | F0P22 | DEF | silver | 2016-07-04 | OUT
7 | X56Z7 | ABC | silver | 2016-07-05 | OUT
8 | F0P22 | DEF | silver | 2016-07-06 | IN
9 | K0T4N | XYZ | silver | 2016-07-07 | IN
10 | X56Z7 | ABC | silver | 2016-07-08 | IN
11 | X56Z7 | ABC | silver | 2016-07-09 | REMOVED
12 | K0T4N | XYZ | silver | 2016-07-10 | OUT
13 | 96T4F | PQR | silver | 2016-07-11 | IN
14 | F0P22 | DEF | silver | 2016-07-12 | OUT
This query gives you the latest record for each serial_number:
SELECT a.* FROM your_table a
LEFT JOIN your_table b ON a.serial_number = b.serial_number AND a.id < b.id
WHERE b.serial_number IS NULL
The query below gives your expected result (the latest record for each serial_number, if its status is OUT):
SELECT a.* FROM your_table a
LEFT JOIN your_table b ON a.serial_number = b.serial_number AND a.id < b.id
WHERE b.serial_number IS NULL AND a.status = 'OUT'
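A sketch of the LEFT JOIN query on the sample inventory, with SQLite from Python standing in for MySQL. It relies on ids increasing over time, which holds here because each new movement gets a new auto-generated ID.

```python
import sqlite3

# Sample rows from the question: (id, serial_number, product, color, date, status).
ROWS = [
    (1, "K0T4N", "XYZ", "silver", "2016-07-01", "IN"),
    (2, "X56Z7", "ABC", "silver", "2016-07-01", "IN"),
    (3, "96T4F", "PQR", "silver", "2016-07-01", "IN"),
    (4, "K0T4N", "XYZ", "silver", "2016-07-02", "OUT"),
    (5, "96T4F", "PQR", "silver", "2016-07-03", "OUT"),
    (6, "F0P22", "DEF", "silver", "2016-07-04", "OUT"),
    (7, "X56Z7", "ABC", "silver", "2016-07-05", "OUT"),
    (8, "F0P22", "DEF", "silver", "2016-07-06", "IN"),
    (9, "K0T4N", "XYZ", "silver", "2016-07-07", "IN"),
    (10, "X56Z7", "ABC", "silver", "2016-07-08", "IN"),
    (11, "X56Z7", "ABC", "silver", "2016-07-09", "REMOVED"),
    (12, "K0T4N", "XYZ", "silver", "2016-07-10", "OUT"),
    (13, "96T4F", "PQR", "silver", "2016-07-11", "IN"),
    (14, "F0P22", "DEF", "silver", "2016-07-12", "OUT"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER PRIMARY KEY, serial_number TEXT, "
    "product TEXT, color TEXT, date TEXT, status TEXT)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Anti-join: a row is the latest for its serial_number when no row with a higher id matches.
currently_out = conn.execute(
    """
    SELECT a.id, a.serial_number FROM your_table a
    LEFT JOIN your_table b ON a.serial_number = b.serial_number AND a.id < b.id
    WHERE b.serial_number IS NULL AND a.status = 'OUT'
    ORDER BY a.id
    """
).fetchall()
```

Only K0T4N and F0P22 are currently OUT: X56Z7's latest entry is REMOVED and 96T4F's is IN.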
There are two good ways to do this. Which one performs best can depend on various factors, so try both.
SELECT
t.*
FROM table t
LEFT OUTER JOIN table later_t
ON later_t.serial_number = t.serial_number
AND later_t.date > t.date
WHERE later_t.id IS NULL
AND t.status = "OUT"
Which column you check from later_t for IS NULL does not matter, so long as that column is declared NOT NULL in the table definition.
The other logically equivalent method is:
SELECT
t.*
FROM table t
INNER JOIN (
SELECT
serial_number,
MAX(date) AS date
FROM table
GROUP BY serial_number
) latest_t
ON latest_t.serial_number = t.serial_number
AND latest_t.date = t.date
WHERE t.status = "OUT"
For each of these queries, I strongly suggest the following index:
ALTER TABLE table
ADD INDEX `LatestSerialStatus` (serial_number,date)
I use this type of query a lot in my own work, with the above index as the primary key on those tables. In such cases these queries are extremely fast.
See also the documentation on this query type.