SQL to find local count max per primary key - mysql

I have a table with PK CustomerId + type. Each customer has a few types.
For each customer I want to get the type that is repeated the most for that customer.
I've tried creating a "count" column, but I want the local maximum per customer, not a global maximum over the whole column.
Is there a native way to do so?

to get type which repeated the most for this customer
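You need to group by CustomerId and type. With ROW_NUMBER() you can partition by CustomerId and order by COUNT(type) descending, so the most frequent type in each partition gets row number 1. If two types tie for a customer, which one is returned is arbitrary.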
Try:
WITH cte AS (
SELECT CustomerId ,
type,
ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY COUNT(type) DESC ) as row_num
FROM test
GROUP BY CustomerId,type
) SELECT CustomerId, type
FROM cte
WHERE row_num = 1 ;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8e8657dfa08ff170ed3eaf5e335b3582
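For anyone who wants to try the idea without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window functions). It is not the exact query from the answer: the counting is split into its own CTE, and a tie-breaker on type is added so the result is deterministic. The sample rows are made up for illustration.

```python
import sqlite3


def most_frequent_type(rows):
    """Return one (CustomerId, type) pair per customer: the most frequent type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (CustomerId INTEGER, type TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    query = """
        WITH counts AS (
            -- one row per (customer, type) with its frequency
            SELECT CustomerId, type, COUNT(*) AS cnt
            FROM test
            GROUP BY CustomerId, type
        ),
        ranked AS (
            -- highest count first; ties broken alphabetically by type (an assumption)
            SELECT CustomerId, type,
                   ROW_NUMBER() OVER (
                       PARTITION BY CustomerId
                       ORDER BY cnt DESC, type
                   ) AS row_num
            FROM counts
        )
        SELECT CustomerId, type
        FROM ranked
        WHERE row_num = 1
        ORDER BY CustomerId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data: customer 1 has "a" twice, customer 2 has "d" twice.
sample = [(1, "a"), (1, "a"), (1, "b"), (2, "c"), (2, "d"), (2, "d")]
print(most_frequent_type(sample))  # prints [(1, 'a'), (2, 'd')]
```

The same two-step shape (aggregate first, then number the rows) should carry over to MySQL 8.0 unchanged.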

Related

SQL query to aggregate amount from product transaction table

Need to aggregate latest product prices of all products from batchTransaction Table, relevant Columns:
id - Unique
productId - Not unique
transactionValue - Value of that transaction
transactionDate - date of that transaction
A product can have multiple transactions but only latest needs to be considered for aggregation. Need to aggregate total transactionValue across plant at a provided date, for all products.
SELECT SUM(transactionQuantity)
FROM batchTransaction
WHERE (id, dateCreated) IN (
SELECT id, MAX(dateCreated)
FROM batchTransaction
WHERE AND transactionDate < 1675189800000
GROUP BY productId
);
Above query would have worked, but it gives error - this is incompatible with sql_mode=only_full_group_by
The only way this makes sense with your description of getting the latest transaction per product is to group by productId in the subquery and select productId in its result. Then compare that to the productId in the outer query.
SELECT SUM(transactionQuantity)
FROM batchTransaction
WHERE (productId, dateCreated) IN (
SELECT productId, MAX(dateCreated)
FROM batchTransaction
WHERE transactionDate < 1675189800000
GROUP BY productId
);
I also removed a superfluous AND keyword from your subquery.
I assume from this that transactionDate is stored as a BIGINT representing the UNIX timestamp in milliseconds, not as a DATETIME type.
A more modern way to write this sort of query is to use a window function ROW_NUMBER() and select only those that are the first (latest) row in each partition by productId.
SELECT SUM(transactionQuantity)
FROM (
SELECT transactionQuantity,
ROW_NUMBER() OVER (PARTITION BY productId ORDER BY dateCreated DESC) AS rownum
FROM batchTransaction
WHERE transactionDate < 1675189800000
) AS t
WHERE t.rownum = 1;
This syntax requires MySQL 8.0 for the window function.
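A rough way to check the ROW_NUMBER() version is to run it against a few rows in SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed). The table layout and sample values below are guesses based on the columns named in the question, with timestamps shortened to small integers for readability.

```python
import sqlite3


def sum_latest_quantities(rows, cutoff):
    """Sum transactionQuantity over the latest row per productId before cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE batchTransaction (
               id INTEGER PRIMARY KEY,
               productId INTEGER,
               transactionQuantity INTEGER,
               dateCreated INTEGER,
               transactionDate INTEGER
           )"""
    )
    conn.executemany(
        "INSERT INTO batchTransaction "
        "(productId, transactionQuantity, dateCreated, transactionDate) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    query = """
        SELECT SUM(transactionQuantity)
        FROM (
            SELECT transactionQuantity,
                   ROW_NUMBER() OVER (
                       PARTITION BY productId
                       ORDER BY dateCreated DESC
                   ) AS rownum
            FROM batchTransaction
            WHERE transactionDate < ?
        ) AS t
        WHERE t.rownum = 1
    """
    (total,) = conn.execute(query, (cutoff,)).fetchone()
    conn.close()
    return total


# Hypothetical rows: (productId, transactionQuantity, dateCreated, transactionDate)
sample = [
    (1, 5, 100, 100),
    (1, 7, 200, 200),    # latest for product 1
    (2, 3, 150, 150),    # latest for product 2 before the cutoff
    (2, 100, 900, 900),  # after the cutoff of 500, so ignored
]
print(sum_latest_quantities(sample, 500))  # prints 10
```

Note that SUM returns NULL (None in Python) when no row passes the filter, which the caller may want to handle.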
An IN clause with a subquery can perform poorly on larger datasets, depending on the MySQL version and the available indexes; a join is often a better option in such scenarios.
The id in the IN clause is not guaranteed to be that of the latest transaction: when you group by productId, rows with the same productId are collapsed into one group, and the order of those rows is not maintained as you are assuming.
A query that achieves the right result:
select sum(t.transactionQuantity)
from
(select
cast(substring_index(group_concat(
transactionQuantity
order by transactionDate desc separator ','
), ',', 1) as unsigned) as transactionQuantity
from
batchTransaction
group by productId
) as t;

Not getting this SQL query

Print all details of the 16th order placed by each customer if any.
How to print exact 16th Order?
SELECT COUNT(orderId)
FROM orders
GROUP BY CustomerID
ORDER BY CustomerID;
We can use a CTE and RANK to create a list of all orderId's, customerID's and their "order" as you named it.
Then we fetch those entries from the entire result whose order is 16.
WITH result AS
(
SELECT orderId, customerID,
RANK() OVER
(PARTITION BY customerID
ORDER BY orderId) AS rnk
FROM orders
)
SELECT orderId, customerID
FROM result
WHERE rnk=16
GROUP BY orderId, customerID
ORDER BY customerID;
For customerIDs with fewer than 16 orders, nothing will be selected.
We can also use ROW_NUMBER instead of RANK in the above query. Assuming orderId is unique, this makes no difference in your use case.
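To see the window-function approach on a small scale, the sketch below runs the ROW_NUMBER variant in SQLite via Python's sqlite3 module (SQLite 3.25 or later assumed). The position is a parameter, so the demo can ask for the 2nd order instead of the 16th. The orders table is reduced to two columns and the data is invented.

```python
import sqlite3


def nth_order_per_customer(orders, n):
    """Return (orderId, customerID) for the n-th order of each customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (orderId INTEGER PRIMARY KEY, customerID INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    query = """
        WITH result AS (
            SELECT orderId, customerID,
                   ROW_NUMBER() OVER (
                       PARTITION BY customerID
                       ORDER BY orderId
                   ) AS rn
            FROM orders
        )
        SELECT orderId, customerID
        FROM result
        WHERE rn = ?
        ORDER BY customerID
    """
    rows = conn.execute(query, (n,)).fetchall()
    conn.close()
    return rows


# Hypothetical data: customer 1 has three orders, customer 2 has one.
sample = [(10, 1), (11, 1), (12, 1), (20, 2)]
print(nth_order_per_customer(sample, 2))   # prints [(11, 1)]
print(nth_order_per_customer(sample, 16))  # prints []
```

Customers with fewer than n orders simply do not appear in the result.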
Select * from
(
SELECT *,
DENSE_RANK()
OVER(
PARTITION BY customerID
ORDER BY orderID
) my_rank
FROM orders
) as myTable
where my_rank = 16
order by CustomerID;
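For a single customer you can just use OFFSET:
SELECT *
FROM orders
WHERE CustomerID = 1 -- placeholder: the customer to look up
ORDER BY orderId
LIMIT 1 OFFSET 15;
Setting OFFSET to 15 skips the first 15 orders, and LIMIT 1 returns only the 16th. Note that LIMIT and OFFSET apply to the whole result set, so this handles one customer per query; to get the 16th order of every customer at once, use a window function as in the other answers.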

Max(created_at) showing the right column but not the rest of the data SQL

I want to fetch the latest entry to the database
I have this data
When I run this query
select id, parent_id, amount, max(created_at) from table group by parent_id
it correctly returns the latest created_at but not the rest of the columns from that row
what I want is
how do I achieve that?
Sorry that I posted image instead of table, the table won't work for some reason
You can fetch the desired output using a correlated subquery. The subquery finds the max created_at for the current row's parent_id, and the outer query keeps only the rows that match it. Please try the query below.
SELECT * FROM yourtable t WHERE t.created_at =
(SELECT MAX(created_at) FROM yourtable WHERE parent_id = t.parent_id);
If the id column in your table is an AUTO_INCREMENT field, and rows are inserted in created_at order, then you can fetch the latest entry with the help of the id column too.
SELECT * FROM yourtable t WHERE t.id =
(SELECT MAX(id) FROM yourtable WHERE parent_id = t.parent_id);
That's a good use case for a window function like RANK as a subquery:
SELECT id, parent_id, amount, created_at
FROM (
SELECT id, parent_id, amount, created_at,
RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC) parentID_rank
FROM yourtable) groupedData
WHERE parentID_rank = 1;
or with ORDER BY clause for the outer query if necessary:
SELECT id, parent_id, amount, created_at
FROM (
SELECT id, parent_id, amount, created_at,
RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC) parentID_rank
FROM yourtable) groupedData
WHERE parentID_rank = 1
ORDER BY id;
To explain the intention:
The PARTITION BY clause groups your data by the parent_id.
The ORDER BY clause sorts it starting with the latest date.
The WHERE clause just takes the entry with the latest date per parent id only.
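One detail worth checking is what happens when two rows of the same parent share the latest created_at. The sketch below (Python's sqlite3 module, SQLite 3.25 or later assumed, invented data) suggests that RANK keeps every tied row, while ROW_NUMBER with an extra tie-breaker keeps exactly one. The tie-breaker on id is an assumption, not something taken from the answer.

```python
import sqlite3


def latest_per_parent(rows, keep_ties=True):
    """Return the latest row(s) per parent_id, ordered by id."""
    if keep_ties:
        # RANK gives tied rows the same rank, so all of them pass the filter.
        window = "RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC)"
    else:
        # ROW_NUMBER with an id tie-breaker keeps exactly one row per parent_id.
        window = (
            "ROW_NUMBER() OVER "
            "(PARTITION BY parent_id ORDER BY created_at DESC, id DESC)"
        )
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourtable ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "amount INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", rows)
    query = f"""
        SELECT id, parent_id, amount, created_at
        FROM (
            SELECT id, parent_id, amount, created_at,
                   {window} AS parentID_rank
            FROM yourtable
        ) AS groupedData
        WHERE parentID_rank = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data: ids 2 and 3 tie for the latest date of parent 1.
sample = [
    (1, 1, 10, "2023-01-01"),
    (2, 1, 20, "2023-02-01"),
    (3, 1, 30, "2023-02-01"),
    (4, 2, 40, "2023-01-15"),
]
print([row[0] for row in latest_per_parent(sample)])                   # prints [2, 3, 4]
print([row[0] for row in latest_per_parent(sample, keep_ties=False)])  # prints [3, 4]
```

Which behaviour is wanted depends on whether duplicate timestamps can occur and whether all tied rows should be reported.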
The main point here is that your query is invalid. The DBMS should raise an error, but you work in a cheat mode that MySQL offers that allows you to write such queries without being warned.
My advice: When working in MySQL make sure you have always
SET sql_mode = 'ONLY_FULL_GROUP_BY';
As to the query: You are using MAX. Thus you aggregate your data. In your GROUP BY clause you say you want one result row per parent_id. You select the parent_id's maximum created_at. You also select the parent_id's ID, the parent_id itself, and the parent_id's amount. The parent_id's ID??? Is there only one ID per parent_id in your table? The amount? Is there only one amount per parent_id in the table? You must tell the DBMS which ID to show and which amount. You haven't done so, and this makes your query invalid according to standard SQL.
You are running MySQL in cheat mode, however, and so MySQL silently applies ANY_VALUE to all non-aggregated columns. This is effectively what your query is turned into:
select
any_value(id),
parent_id,
any_value(amount),
max(created_at)
from table
group by parent_id;
ANY_VALUE means the DBMS is free to pick the attribute from whatever row it likes; you don't care.
What you want instead is not to aggregate your rows, but to filter them. You want to select only those rows with the maximum created_at per parent_id.
There exist several ways to get this result. Here are some options.
Get the maximum created_at per parent_id. Then select the matching rows:
select *
from table
where (parent_id, created_at) in
(
select parent_id, max(created_at)
from table
group by parent_id
);
Select the rows for which no newer created_at exists for the parent_id:
select *
from table t
where not exists
(
select null
from table newer
where newer.parent_id = t.parent_id
and newer.created_at > t.created_at
);
Get the maximum created_at on-the-fly. Then compare the dates:
select id, parent_id, amount, created_at
from
(
select t.*, max(created_at) over (partition by parent_id) as max_created_at
from table t
) with_max_created_at
where created_at = max_created_at;
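As a sanity check, the three filtering approaches above can be run side by side on the same rows. The sketch below does that in SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed). The table is named entries here, a made-up name, because table itself is a reserved word; the data is invented.

```python
import sqlite3

# The three approaches, each reduced to returning ids for easy comparison.
QUERIES = {
    "in_tuple": """
        SELECT id FROM entries
        WHERE (parent_id, created_at) IN (
            SELECT parent_id, MAX(created_at)
            FROM entries
            GROUP BY parent_id
        )
        ORDER BY id
    """,
    "not_exists": """
        SELECT id FROM entries t
        WHERE NOT EXISTS (
            SELECT NULL
            FROM entries newer
            WHERE newer.parent_id = t.parent_id
              AND newer.created_at > t.created_at
        )
        ORDER BY id
    """,
    "window_max": """
        SELECT id
        FROM (
            SELECT t.*,
                   MAX(created_at) OVER (PARTITION BY parent_id) AS max_created_at
            FROM entries t
        ) AS with_max_created_at
        WHERE created_at = max_created_at
        ORDER BY id
    """,
}


def run_all(rows):
    """Run every query on the same data and return {name: [ids]}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "amount INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?)", rows)
    results = {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in QUERIES.items()
    }
    conn.close()
    return results


# Hypothetical data: ids 2 and 3 share the latest date for parent 1.
sample = [
    (1, 1, 10, "2023-01-01"),
    (2, 1, 20, "2023-02-01"),
    (3, 1, 30, "2023-02-01"),
    (4, 2, 40, "2023-01-15"),
]
print(run_all(sample))
```

On this sample all three return ids 2, 3 and 4, so they also agree on ties: every row sharing the maximum created_at of its parent_id is kept.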
select id, parent_id, amount, max(created_at)
from table
group by parent_id
order by max(created_at) desc
limit 1

MySQL Select, unique in one column

Let's say I have a table with the following rows/values:
I need a way to select the values in amount, but only once if they're duplicated. So in this example I'd want to select the amount for A, B and C only once each. The SQL result should then look like this:
Use the LAG() function to compare the previous row's amount with the current row's amount for each name.
-- MySQL (v8.0)
SELECT t.name
, CASE WHEN t.amount = t.prev_val THEN '' ELSE amount END amount
FROM (SELECT *
, LAG(amount) OVER (PARTITION BY name ORDER BY name) prev_val
FROM test) t
See the fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=8c1af9afcadf2849a85ad045df7ed580
You can handle situations like this with different functions, depending on what you need:
Case 1: You have the same value on every row for a name:
select distinct name, amount from [table name]
Case 2: You have duplicates with different values for each name and you want to pick the one with the highest value. Use min() if you need the minimum one to show up.
select name, max(amount) from [table name] group by 1
Case 3: The one you need, with blanks for the rest of the duplicates.
ROW_NUMBER() numbers the rows within each name in order of amount; because the values are the same, the numbering simply increments. You can then use IF to build a new column that is blank wherever RANK_ > 1. This also covers the case where you want to show just the minimum value and blanks for the remaining rows of that name.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
Case 4: You need to select the maximum and blank out the other rows for that name.
Just change the ORDER BY clause of ROW_NUMBER() to DESC. This assigns rank 1 to the highest amount per name, and the remaining rows are filled with blanks.
With ranking as (
select
*,
ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY AMOUNT DESC) AS RANK_
from [table]
)
SELECT
*,
IF(RANK_ > 1,"",AMOUNT) AS NEW_AMOUNT
FROM ranking
If you are using mysql 8 you can use row_number for this:
with x as (
select *, row_number() over(partition by name order by amount) rn
from t
)
select name, case when rn=1 then amount else '' end amount
from x
See example Fiddle
The other answers are missing a really important point: A SQL query returns an unordered set unless there is an explicit ORDER BY.
The data that you have provided has rows that are exact duplicates. For this reason, I think the best approach uses row_number() and an order by in the outer query:
select name, (case when seqnum = 1 then amount end) as amount
from (select t.*,
row_number() over (partition by name, amount) as seqnum
from t
) t
order by name, seqnum;
Note that MySQL does not require an order by argument for row_number().
More commonly, though, you would have some other column (say a date or id) that would be used for ordering. I should also emphasize that this type of formatting is often handled at the application layer and not in the database.
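For a quick experiment, the row_number() approach with an outer order by can be run in SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed). The sample rows are a guess at the kind of data in the question, which was posted as an image; blanked amounts come back as NULL (None in Python) rather than an empty string.

```python
import sqlite3


def blank_repeated_amounts(rows):
    """Show each (name, amount) once; repeats of the same pair get a NULL amount."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    query = """
        SELECT name,
               CASE WHEN seqnum = 1 THEN amount END AS amount
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY name, amount) AS seqnum
            FROM t
        ) AS numbered
        ORDER BY name, seqnum
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical data with exact duplicate rows for A and C.
sample = [("A", 10), ("A", 10), ("B", 20), ("C", 30), ("C", 30)]
print(blank_repeated_amounts(sample))
# prints [('A', 10), ('A', None), ('B', 20), ('C', 30), ('C', None)]
```

The outer ORDER BY name, seqnum is what guarantees the filled-in row comes first within each name.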

SQL join two tables with no key column but need to join with date filter column

I want to join two tables. Tables are following below,
Table A:
Batch_ID INT,
Start_Dt DATE,
Expiry_Dt DATE
Table B:
Purchase_Dt DATE
I need to get the two oldest batch codes for each purchase date. Purchase date should be greater than or equal to start_dt and expiry_dt should be less than or equal to purchase date.
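You can try using row_number(). Order each purchase date's matching batches oldest first (here by Start_Dt with Batch_ID as a tie-breaker, assuming "oldest" means the earliest start date) and keep the first two rows:
select Purchase_Dt, Batch_ID from
(
select Purchase_Dt, Batch_ID, row_number() over(partition by Purchase_Dt order by Start_Dt, Batch_ID) as rn
from B join A on Purchase_Dt >= Start_Dt and Purchase_Dt <= Expiry_Dt
) f where rn <= 2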
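Here is a small sketch of the requirement stated in the question, the two oldest batches per purchase date, run in SQLite through Python's sqlite3 module (SQLite 3.25 or later assumed). It assumes that "oldest" means the earliest Start_Dt and that a batch is valid when the purchase date falls between Start_Dt and Expiry_Dt; both are interpretations, and the dates are invented ISO strings.

```python
import sqlite3


def oldest_batches(batches, purchase_dates, k=2):
    """Return up to k oldest valid (Purchase_Dt, Batch_ID) pairs per purchase date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (Batch_ID INTEGER, Start_Dt TEXT, Expiry_Dt TEXT)")
    conn.execute("CREATE TABLE B (Purchase_Dt TEXT)")
    conn.executemany("INSERT INTO A VALUES (?, ?, ?)", batches)
    conn.executemany("INSERT INTO B VALUES (?)", [(d,) for d in purchase_dates])
    query = """
        SELECT Purchase_Dt, Batch_ID
        FROM (
            SELECT B.Purchase_Dt, A.Batch_ID,
                   ROW_NUMBER() OVER (
                       PARTITION BY B.Purchase_Dt
                       ORDER BY A.Start_Dt, A.Batch_ID  -- oldest first (assumption)
                   ) AS rn
            FROM B
            JOIN A ON B.Purchase_Dt >= A.Start_Dt
                  AND B.Purchase_Dt <= A.Expiry_Dt
        ) AS f
        WHERE rn <= ?
        ORDER BY Purchase_Dt, rn
    """
    result = conn.execute(query, (k,)).fetchall()
    conn.close()
    return result


# Hypothetical data: three batches cover mid-2020, one covers 2021.
batches = [
    (1, "2020-01-01", "2020-12-31"),
    (2, "2020-02-01", "2020-12-31"),
    (3, "2020-03-01", "2020-12-31"),
    (4, "2021-01-01", "2021-12-31"),
]
purchases = ["2020-06-01", "2021-06-01"]
print(oldest_batches(batches, purchases))
# prints [('2020-06-01', 1), ('2020-06-01', 2), ('2021-06-01', 4)]
```

A purchase date with fewer than two valid batches returns whatever is available, and one with no valid batch returns nothing.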