SQL query to aggregate amount from product transaction table - mysql

Need to aggregate latest product prices of all products from batchTransaction Table, relevant Columns:
id - Unique
productId - Not unique
transactionValue - Value of that transaction
transactionDate - date of that transaction
A product can have multiple transactions but only latest needs to be considered for aggregation. Need to aggregate total transactionValue across plant at a provided date, for all products.
SELECT SUM(transactionQuantity)
FROM batchTransaction
WHERE (id, dateCreated) IN (
SELECT id, MAX(dateCreated)
FROM batchTransaction
WHERE AND transactionDate < 1675189800000
GROUP BY productId
);
The above query would have worked, but it fails with an error saying it is incompatible with sql_mode=only_full_group_by.

The only way this makes sense with your description of getting the latest transaction per product is to group by productId in the subquery and select productId in its result, then compare that to the productId in the outer query.
SELECT SUM(transactionQuantity)
FROM batchTransaction
WHERE (productId, dateCreated) IN (
SELECT productId, MAX(dateCreated)
FROM batchTransaction
WHERE transactionDate < 1675189800000
GROUP BY productId
);
I also removed a superfluous AND keyword from your subquery.
I assume from this that transactionDate is stored as a BIGINT representing the UNIX timestamp in milliseconds, not as a DATETIME type.
A more modern way to write this sort of query is to use a window function ROW_NUMBER() and select only those that are the first (latest) row in each partition by productId.
SELECT SUM(transactionQuantity)
FROM (
SELECT transactionQuantity,
ROW_NUMBER() OVER (PARTITION BY productId ORDER BY dateCreated DESC) AS rownum
FROM batchTransaction
WHERE transactionDate < 1675189800000
) AS t
WHERE t.rownum = 1;
This syntax requires MySQL 8.0 for the window function.
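As a rough check that the two queries above agree, here is a minimal sketch that runs both against a tiny in-memory table. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; it needs a bundled SQLite recent enough for window functions), and the sample rows and the small cutoff value are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; column names follow the queries above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batchTransaction ("
    " id INTEGER PRIMARY KEY, productId INTEGER, transactionQuantity INTEGER,"
    " dateCreated INTEGER, transactionDate INTEGER)"
)
conn.executemany(
    "INSERT INTO batchTransaction VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 5, 1000, 1000),  # product 100, older row
        (2, 100, 7, 2000, 2000),  # product 100, latest row before the cutoff
        (3, 200, 3, 1500, 1500),  # product 200, only row before the cutoff
        (4, 200, 9, 9000, 9000),  # product 200, after the cutoff, so ignored
    ],
)

cutoff = 5000  # small stand-in for 1675189800000

in_query = """
SELECT SUM(transactionQuantity) FROM batchTransaction
WHERE (productId, dateCreated) IN (
  SELECT productId, MAX(dateCreated) FROM batchTransaction
  WHERE transactionDate < ? GROUP BY productId)
"""
window_query = """
SELECT SUM(transactionQuantity) FROM (
  SELECT transactionQuantity,
         ROW_NUMBER() OVER (PARTITION BY productId ORDER BY dateCreated DESC) AS rownum
  FROM batchTransaction WHERE transactionDate < ?) AS t
WHERE t.rownum = 1
"""
total_in = conn.execute(in_query, (cutoff,)).fetchone()[0]
total_window = conn.execute(window_query, (cutoff,)).fetchone()[0]
print(total_in, total_window)
```

Both totals count only row 2 and row 3, the latest row per product before the cutoff.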

An IN clause with a subquery can perform poorly on larger datasets; a join is often a better option in such scenarios.
The id returned by the subquery in your IN clause is not guaranteed to belong to the latest transaction: when you group by productId, rows with the same productId are collapsed into one group, and the order of the rows within that group is not preserved as you are assuming.
Query to achieve the right results:
select sum(t.transactionQuantity)
from
(select
cast(substring_index(group_concat(
transactionQuantity
order by transactionDate desc separator ','
), ',', 1) as unsigned) as transactionQuantity
from
batchTransaction
group by productId
) as t;
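The join mentioned above is not spelled out in this answer, so here is one possible shape for it: join the table to a derived table holding the latest transactionDate per productId. This is a minimal sketch, assuming Python's sqlite3 as a stand-in for MySQL and made-up sample rows; the alias names bt, m and latest are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batchTransaction ("
    " id INTEGER PRIMARY KEY, productId INTEGER,"
    " transactionQuantity INTEGER, transactionDate INTEGER)"
)
# Hypothetical rows: two transactions for product 100, one for product 200.
conn.executemany(
    "INSERT INTO batchTransaction VALUES (?, ?, ?, ?)",
    [(1, 100, 5, 1000), (2, 100, 7, 2000), (3, 200, 3, 1500)],
)

join_query = """
SELECT SUM(bt.transactionQuantity)
FROM batchTransaction AS bt
JOIN (
  -- one row per product with its latest transaction date
  SELECT productId, MAX(transactionDate) AS latest
  FROM batchTransaction
  GROUP BY productId
) AS m
  ON m.productId = bt.productId AND m.latest = bt.transactionDate
"""
join_total = conn.execute(join_query).fetchone()[0]
print(join_total)
```

If two rows of one product share the same latest transactionDate, both are summed, so a tie-breaker such as id would be needed in that case.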

Related

Max(created_at) showing the right column but not the rest of the data SQL

I want to fetch the latest entry to the database
I have this data
When I run this query
select id, parent_id, amount, max(created_at) from table group by parent_id
it correctly returns the latest created_at, but the other columns do not come from that same row
what I want is
how do I achieve that?
Sorry that I posted image instead of table, the table won't work for some reason
You can fetch the desired output using subquery. In the subquery fetch the max created_at of each parent_id which will return the row with max created_at for each parent_id. Please try the below query.
SELECT * FROM yourtable t WHERE t.created_at =
(SELECT MAX(created_at) FROM yourtable WHERE parent_id = t.parent_id);
If the id column in your table is an AUTO_INCREMENT field, then you can fetch the latest entry with the help of the id column too.
SELECT * FROM yourtable t WHERE t.id =
(SELECT MAX(id) FROM yourtable WHERE parent_id = t.parent_id);
That's a good use case for a window function like RANK as a subquery:
SELECT id, parent_id, amount, created_at
FROM (
SELECT id, parent_id, amount, created_at,
RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC) parentID_rank
FROM yourtable) groupedData
WHERE parentID_rank = 1;
or with ORDER BY clause for the outer query if necessary:
SELECT id, parent_id, amount, created_at
FROM (
SELECT id, parent_id, amount, created_at,
RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC) parentID_rank
FROM yourtable) groupedData
WHERE parentID_rank = 1
ORDER BY id;
To explain the intention:
The PARTITION BY clause groups your data by the parent_id.
The ORDER BY clause sorts it starting with the latest date.
The WHERE clause just takes the entry with the latest date per parent id only.
The main point here is that your query is invalid. The DBMS should raise an error, but you work in a cheat mode that MySQL offers that allows you to write such queries without being warned.
My advice: when working in MySQL, always make sure you have
SET sql_mode = 'ONLY_FULL_GROUP_BY';
As to the query: You are using MAX. Thus you aggregate your data. In your GROUP BY clause you say you want one result row per parent_id. You select the parent_id's maximum created_at. You also select the parent_id's ID, the parent_id itself, and the parent_id's amount. The parent_id's ID??? Is there only one ID per parent_id in your table? The amount? Is there only one amount per parent_id in the table? You must tell the DBMS which ID to show and which amount. You haven't done so, and this makes your query invalid according to standard SQL.
You are running MySQL in cheat mode, however, and so MySQL silently applies ANY_VALUE to all non-aggregated columns. This is what your query is turned into internally:
select
any_value(id),
parent_id,
any_value(amount),
max(created_at)
from table
group by parent_id;
ANY_VALUE means the DBMS is free to pick the attribute from whatever row it likes; you don't care.
What you want instead is not to aggregate your rows, but to filter them. You want to select only those rows with the maximum created_at per parent_id.
There exist several ways to get this result. Here are some options.
Get the maximum created_at per parent_id. Then select the matching rows:
select *
from table
where (parent_id, created_at) in
(
select parent_id, max(created_at)
from table
group by parent_id
);
Select the rows for which no newer created_at exists for the parent_id:
select *
from table t
where not exists
(
select null
from table newer
where newer.parent_id = t.parent_id
and newer.created_at > t.created_at
);
Get the maximum created_at on-the-fly. Then compare the dates:
select id, parent_id, amount, created_at
from
(
select t.*, max(created_at) over (partition by parent_id) as max_created_at
from table t
) with_max_created_at
where created_at = max_created_at;
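To see that the three filtering options above pick the same rows, here is a minimal sketch that runs them side by side. It assumes Python's sqlite3 as a stand-in for MySQL, a hypothetical table name entries (since table itself is a reserved word), and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries ("
    " id INTEGER PRIMARY KEY, parent_id INTEGER, amount INTEGER, created_at TEXT)"
)
# Hypothetical rows: ids 2 and 3 are the latest entries of their parents.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [
        (1, 10, 50, "2021-01-01"),
        (2, 10, 70, "2021-03-01"),
        (3, 20, 30, "2021-02-01"),
        (4, 20, 40, "2021-01-15"),
    ],
)

queries = {
    "in": """
        SELECT id FROM entries
        WHERE (parent_id, created_at) IN (
          SELECT parent_id, MAX(created_at) FROM entries GROUP BY parent_id)
        ORDER BY id""",
    "not_exists": """
        SELECT id FROM entries t
        WHERE NOT EXISTS (
          SELECT NULL FROM entries newer
          WHERE newer.parent_id = t.parent_id
            AND newer.created_at > t.created_at)
        ORDER BY id""",
    "window": """
        SELECT id FROM (
          SELECT t.*, MAX(created_at) OVER (PARTITION BY parent_id) AS max_created_at
          FROM entries t) AS w
        WHERE created_at = max_created_at
        ORDER BY id""",
}
# Map each option name to the list of ids it selects.
latest_ids = {
    name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()
}
print(latest_ids)
```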
select id, parent_id, amount, max(created_at)
from table
group by parent_id
order by max(created_at) desc
limit 1

SQL to find local count max per primary key

I have a table with PK CustomerId + type. Each customer has a few types.
For each customer I want to get type which repeated the most for this customer.
I've tried to create a column "count" but I want to get the local maxs, and not a global max for the whole col.
Is there a native way to do so?
"to get type which repeated the most for this customer"
You need to group by CustomerId,type. With row_number you can partition by CustomerId and order by the COUNT(type).
Try:
WITH cte AS (
SELECT CustomerId ,
type,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY COUNT(type) DESC ) as row_num
FROM test
GROUP BY CustomerId,type
) SELECT CustomerId, type
FROM cte
WHERE row_num = 1 ;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8e8657dfa08ff170ed3eaf5e335b3582
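For a quick local check without the fiddle, the same CTE can be run against a few made-up rows. This is a minimal sketch, assuming Python's sqlite3 as a stand-in for MySQL 8.0 and assuming the table allows repeated (CustomerId, type) pairs, since otherwise every count would be 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (CustomerId INTEGER, type TEXT)")
# Hypothetical rows: customer 1 mostly has type 'a', customer 2 mostly 'd'.
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "a"), (1, "a"), (1, "b"), (2, "c"), (2, "d"), (2, "d"), (2, "d")],
)

top_type_query = """
WITH cte AS (
  SELECT CustomerId,
         type,
         ROW_NUMBER() OVER (PARTITION BY CustomerId
                            ORDER BY COUNT(type) DESC) AS row_num
  FROM test
  GROUP BY CustomerId, type
)
SELECT CustomerId, type
FROM cte
WHERE row_num = 1
ORDER BY CustomerId
"""
top_types = conn.execute(top_type_query).fetchall()
print(top_types)
```

ROW_NUMBER picks a single type per customer even when two types are tied for the highest count; RANK would return all tied types instead.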

Filtering values according to a different value for each user

I am trying to understand how to do in mySQL what I usually do in python.
I have a sales table, with sale_date, user_id and price_USD as columns. Each row is an order made by the user.
I want to get a new table that has all of the orders which cost more than the last order the user made (so in this picture, just the yellow rows).
I know how to get a table with the last order for each user, but I cannot save it on the database.
How do I compare each row's price to a different value by the user_id and get just the larger ones in one 'swoop'?
Thanks
If you are running MySQL 8.0, you can do this with window functions:
select t.*
from (
select
t.*,
first_value(price_usd)
over(partition by user_id order by sale_date desc) last_price_usd
from mytable t
) t
where price_usd > last_price_usd
In earlier versions, you could use a correlated subquery:
select t.*
from mytable t
where t.price_usd > (
select price_usd
from mytable t1
where t1.user_id = t.user_id
order by sale_date desc
limit 1
)
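A minimal sketch comparing the two approaches on made-up rows, assuming Python's sqlite3 as a stand-in for MySQL. The window version here filters on price_usd > last_price_usd, which is how I read the intent of the first query; the sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (sale_date TEXT, user_id INTEGER, price_usd INTEGER)")
# Hypothetical orders: user 1's last order costs 20, user 2's last order costs 8.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2021-01-01", 1, 10),
        ("2021-02-01", 1, 30),  # more than user 1's last order
        ("2021-03-01", 1, 20),  # user 1's last order
        ("2021-01-05", 2, 5),
        ("2021-01-10", 2, 8),   # user 2's last order
    ],
)

window_query = """
SELECT user_id, sale_date, price_usd FROM (
  SELECT t.*,
         FIRST_VALUE(price_usd)
           OVER (PARTITION BY user_id ORDER BY sale_date DESC) AS last_price_usd
  FROM mytable t) AS w
WHERE price_usd > last_price_usd
ORDER BY user_id, sale_date
"""
subquery_query = """
SELECT user_id, sale_date, price_usd FROM mytable t
WHERE t.price_usd > (
  SELECT price_usd FROM mytable t1
  WHERE t1.user_id = t.user_id
  ORDER BY sale_date DESC LIMIT 1)
ORDER BY user_id, sale_date
"""
window_rows = conn.execute(window_query).fetchall()
subquery_rows = conn.execute(subquery_query).fetchall()
print(window_rows, subquery_rows)
```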

Get top item for each year

I have a datatable with some records. Using mysql I am able to get a result grouped by a specific period (year) and users and ordered (in descending order) by number of species.
SELECT YEAR(entry_date) AS period, uid AS user, COUNT(DISTINCT pid) AS species
FROM records
WHERE YEAR(entry_date)<YEAR(CURDATE())
GROUP BY period, uid
ORDER by period, species DESC
Please see the attached picture of the result. But what if I only want to get the TOP USER (and number of species) for EACH year (the rows marked in red)? How can I achieve that?
I am able to handle this later in my PHP code, but it would be nice to have this sorted out already in the MySQL query.
Thanks for your help!
If you are running MySQL 8.0, you can use RANK() to rank records in years partitions by their count of species, and then filter on the top record per group:
SELECT *
FROM (
SELECT
YEAR(entry_date) AS period,
uid AS user,
COUNT(DISTINCT pid) AS species,
RANK() OVER(PARTITION BY YEAR(entry_date) ORDER BY COUNT(DISTINCT pid) DESC) rn
FROM records
WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
GROUP BY period, uid
) t
WHERE rn = 1
ORDER by period
This preserves top ties, if any. Note that this uses an index-friendly filter on the dates in the WHERE clause.
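To illustrate how RANK keeps tied top users, here is a minimal sketch on made-up records. It assumes Python's sqlite3 as a stand-in for MySQL, so strftime('%Y', ...) replaces YEAR(), which SQLite lacks, and the date filter is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (entry_date TEXT, uid INTEGER, pid INTEGER)")
# Hypothetical records: users 1 and 2 tie in 2020, user 2 leads in 2021.
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("2020-03-01", 1, 1), ("2020-04-01", 1, 1), ("2020-05-01", 1, 2),
        ("2020-03-02", 2, 3), ("2020-06-01", 2, 4),
        ("2020-07-01", 3, 1),
        ("2021-01-10", 1, 1),
        ("2021-02-10", 2, 1), ("2021-03-10", 2, 2), ("2021-04-10", 2, 3),
    ],
)

top_query = """
SELECT period, uid, species FROM (
  SELECT strftime('%Y', entry_date) AS period,
         uid,
         COUNT(DISTINCT pid) AS species,
         RANK() OVER (PARTITION BY strftime('%Y', entry_date)
                      ORDER BY COUNT(DISTINCT pid) DESC) AS rn
  FROM records
  GROUP BY strftime('%Y', entry_date), uid) AS t
WHERE rn = 1
ORDER BY period, uid
"""
top_users = conn.execute(top_query).fetchall()
print(top_users)
```

Both tied users appear for 2020; swapping RANK for ROW_NUMBER would keep only one of them, chosen arbitrarily.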
In earlier versions, an equivalent option is to filter with a HAVING clause and a correlated subquery:
SELECT
YEAR(entry_date) AS period,
uid AS user,
COUNT(DISTINCT pid) AS species
FROM records r
WHERE entry_date < DATE_FORMAT(CURRENT_DATE, '%Y-01-01')
GROUP BY period, uid
HAVING COUNT(DISTINCT pid) = (
SELECT COUNT(DISTINCT r1.pid) species1
FROM records r1
WHERE YEAR(r1.entry_date) = period
GROUP BY r1.uid
ORDER BY species1 DESC
LIMIT 1
)
ORDER by period

Retrieving last row inserted in table for each "parameter"

I have a table, currently about 1.3M rows, which stores measured data points for a number of different parameters. There are about 30 parameters.
Table:
* id
* station_id (int)
* comp_id (int)
* unit_id (int)
* p_id (int)
* timestamp
* value
I have a UNIQUE index on: (station_id, comp_id, unit_id, p_id, timestamp)
Because the timestamps differ for every parameter, I have difficulty sorting by the timestamp (I have to use a GROUP BY).
So today I select the last value for each parameter by this query:
select p_id, timestamp, value
from (select p_id, timestamp, value
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
order by timestamp desc
) table_x
group by p_id;
This query takes about 3 seconds to execute.
Even though I have the index mentioned above, the optimizer uses a filesort to find the values.
Querying for only 1 specific parameter:
select p_id, timestamp, value from table where station_id = 3 and comp_id = 9112 and unit_id = 1 and p_id =1 order by timestamp desc limit 1;
Takes no time (0.00).
I've also tried joining against a table in which I store the parameter IDs, without luck.
So, is there a simple (and fast) way to ask for the latest values of several different parameters at once?
Running a procedure that loops and asks for each parameter individually seems much faster than asking for all of them at once, which I don't think is the way a database should be used.
Your query is incorrect. You are aggregating by p_id, but including other columns. These come from indeterminate rows, and the documentation is quite clear:
MySQL extends the use of GROUP BY so that the select list can refer to
nonaggregated columns not named in the GROUP BY clause. This means
that the preceding query is legal in MySQL. You can use this feature
to get better performance by avoiding unnecessary column sorting and
grouping. However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Furthermore, the selection of values from each group cannot be
influenced by adding an ORDER BY clause.
The following should work:
select t.p_id, t.timestamp, t.value
from table t join
(select p_id, max(timestamp) as maxts
from table
where station_id = 3 and comp_id = 9112 and unit_id = 1 and
p_id in (1,2,3,4,5,6,7,8,9,10)
group by p_id
) tt
on tt.p_id = t.p_id and tt.maxts = t.timestamp
-- restrict the outer rows to the same station, component and unit
where t.station_id = 3 and t.comp_id = 9112 and t.unit_id = 1;
The best index for this query is a composite index on table(station_id, comp_id, unit_id, p_id, timestamp).
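A minimal sketch of the join-on-maximum approach, written with a GROUP BY p_id in the derived table and the station filter on both sides of the join. It assumes Python's sqlite3 as a stand-in for MySQL, a hypothetical table name measurements (table itself is a reserved word), and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    " station_id INTEGER, comp_id INTEGER, unit_id INTEGER,"
    " p_id INTEGER, timestamp INTEGER, value REAL,"
    " UNIQUE (station_id, comp_id, unit_id, p_id, timestamp))"
)
# Hypothetical rows; the last one belongs to another station but shares
# p_id and timestamp with the latest row of parameter 1.
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3, 9112, 1, 1, 100, 1.0),
        (3, 9112, 1, 1, 200, 1.5),  # latest for p_id 1
        (3, 9112, 1, 2, 150, 2.0),  # latest for p_id 2
        (3, 9112, 1, 2, 120, 2.5),
        (4, 9112, 1, 1, 200, 9.9),  # other station, must not match
    ],
)

latest_query = """
SELECT t.p_id, t.timestamp, t.value
FROM measurements t
JOIN (
  SELECT p_id, MAX(timestamp) AS maxts
  FROM measurements
  WHERE station_id = 3 AND comp_id = 9112 AND unit_id = 1
    AND p_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
  GROUP BY p_id
) tt ON tt.p_id = t.p_id AND tt.maxts = t.timestamp
WHERE t.station_id = 3 AND t.comp_id = 9112 AND t.unit_id = 1
ORDER BY t.p_id
"""
latest_values = conn.execute(latest_query).fetchall()
print(latest_values)
```

Without the outer WHERE, the row from station 4 would also match the join condition, because it shares p_id and timestamp with the latest row of parameter 1.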