Select most recent row based on distinct combination of two columns - mysql

I'm writing a cronjob that runs analysis on a flags table in my database, structured as such:
| id | item | def | time_flagged | time_resolved | status |
+----+------+-----+--------------+---------------+---------+
| 1 | 1 | foo | 1519338608 | 1519620669 | MISSED |
| 2 | 1 | bar | 1519338608 | (NULL) | OPEN |
| 3 | 2 | bar | 1519338608 | 1519620669 | IGNORED |
| 4 | 1 | foo | 1519620700 | (NULL) | OPEN |
For each distinct def, for each unique item, I want to get the "latest" row (IFNULL(`time_resolved`, `time_flagged`) AS `time`). If no such row exists for a given def-item combination, that's okay; I just don't want any duplicates for a given def-item combination.
For the above data set, I would like to select:
| def | item | time | status |
+-----+------+------------+---------+
| foo | 1 | 1519620700 | OPEN |
| bar | 1 | 1519338608 | OPEN |
| bar | 2 | 1519620669 | IGNORED |
Row 1 is not included because it's "overridden" by row 4, as both rows have the same def-item combination, and the latter has a more recent time.
The data set will have a few dozen distinct defs, a few hundred distinct items, and a very large number of flags that will only increase over time.
How can I go about doing this? I see the greatest-n-per-group tag is rife with similar questions, but I don't see any that involve my specific circumstance of needing "nested grouping" across two columns.

You could try:
select distinct def, item, IFNULL(time_resolved, time_flagged) AS time, status
from flags A
where IFNULL(time_resolved, time_flagged) = (
    select MAX(IFNULL(time_resolved, time_flagged))
    from flags B
    where A.item = B.item and A.def = B.def
)
I know it's not the best approach but it might work for you

Do you mean 'for each unique Def and each unique Item'? If so, a group by of multiple columns seems like it would work (shown as a temp table t) joined back to the original table to grab the rest of the data:
select
    f.def,
    f.item,
    t.time,
    f.status
from
    flags f
    join (select
            def,
            item,
            max(ifnull(time_resolved, time_flagged)) as time
          from flags
          group by def, item) t
    on
        f.def = t.def and
        f.item = t.item and
        ifnull(f.time_resolved, f.time_flagged) = t.time
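To check the join-back approach against the sample rows from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That substitution is an assumption made so the example is runnable anywhere; the query itself only relies on COALESCE, MAX and GROUP BY. The table name flags is taken from the question's description.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flags (
        id INTEGER PRIMARY KEY,
        item INTEGER,
        def TEXT,
        time_flagged INTEGER,
        time_resolved INTEGER,
        status TEXT
    )
""")
# The four sample rows from the question.
conn.executemany(
    "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "foo", 1519338608, 1519620669, "MISSED"),
        (2, 1, "bar", 1519338608, None, "OPEN"),
        (3, 2, "bar", 1519338608, 1519620669, "IGNORED"),
        (4, 1, "foo", 1519620700, None, "OPEN"),
    ],
)

# Group by both columns to find the latest time per (def, item),
# then join back to pick up the status of the matching row.
LATEST_PER_PAIR = """
    SELECT f.def, f.item, t.time, f.status
    FROM flags AS f
    JOIN (
        SELECT def, item, MAX(COALESCE(time_resolved, time_flagged)) AS time
        FROM flags
        GROUP BY def, item
    ) AS t
      ON f.def = t.def
     AND f.item = t.item
     AND COALESCE(f.time_resolved, f.time_flagged) = t.time
    ORDER BY f.def, f.item
"""

latest_rows = conn.execute(LATEST_PER_PAIR).fetchall()
for row in latest_rows:
    print(row)
```

Row 1 drops out because row 4 has a later time for the same (foo, 1) pair. If two rows of one pair share the same latest time, this join returns both of them.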

Depending on your version of MySQL (window functions require 8.0 or later), you can use a window function:
SELECT def, item, time, status
FROM (
    SELECT
        def,
        item,
        COALESCE(time_resolved, time_flagged) AS time,
        status,
        RANK() OVER (PARTITION BY def, item ORDER BY COALESCE(time_resolved, time_flagged) DESC) AS MyRank -- Rank the rows within each (def, item) combination by "time"
    FROM flags
) src
WHERE MyRank = 1 -- Only return top-ranked (i.e. most recent) rows per (def, item) grouping
If a (def, item) combination can have several rows with the same "time" value, RANK() returns all of the tied rows. In that case change RANK() to ROW_NUMBER(), which guarantees you only get one row per grouping.
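The difference between RANK() and ROW_NUMBER() only shows up on ties, so here is a small sketch with a deliberately tied pair. It uses Python's sqlite3 module as a stand-in for MySQL 8.0 (an assumption; the bundled SQLite must be 3.25 or later for window functions). The three rows and the id DESC tie-breaker are hypothetical additions for illustration and are not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flags (id INTEGER PRIMARY KEY, item INTEGER, def TEXT,"
    " time_flagged INTEGER, time_resolved INTEGER, status TEXT)"
)
# Hypothetical data: ids 1 and 2 share (foo, 1) and the same effective time 100.
conn.executemany(
    "INSERT INTO flags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "foo", 100, None, "OPEN"),
        (2, 1, "foo", 50, 100, "MISSED"),
        (3, 1, "bar", 70, None, "OPEN"),
    ],
)

EFFECTIVE_TIME = "COALESCE(time_resolved, time_flagged)"


def top_rows(ranking_call, order_by):
    """Return the rows ranked first within each (def, item) partition."""
    query = f"""
        SELECT def, item, time, status
        FROM (
            SELECT def, item, status,
                   {EFFECTIVE_TIME} AS time,
                   {ranking_call} OVER (
                       PARTITION BY def, item ORDER BY {order_by}
                   ) AS rnk
            FROM flags
        ) AS src
        WHERE rnk = 1
        ORDER BY def, item, status
    """
    return conn.execute(query).fetchall()


# RANK() gives both tied rows rank 1, so (foo, 1) appears twice.
rank_rows = top_rows("RANK()", f"{EFFECTIVE_TIME} DESC")
# ROW_NUMBER() numbers rows uniquely; id DESC is an assumed tie-breaker.
row_number_rows = top_rows("ROW_NUMBER()", f"{EFFECTIVE_TIME} DESC, id DESC")

print(len(rank_rows), rank_rows)
print(len(row_number_rows), row_number_rows)
```

Without an explicit tie-breaker in the ORDER BY, ROW_NUMBER() still returns one row per pair, but which of the tied rows it keeps is not defined.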

select f.def, f.item, a.time, f.status
from flags f
join (select
        def, item, MAX(COALESCE(time_resolved, time_flagged)) as time
      from flags
      group by def, item) a
on f.def = a.def and
   f.item = a.item and
   COALESCE(f.time_resolved, f.time_flagged) = a.time

Related

How to select both sum value of all rows and values in some specific rows?

I have a record table and its comment table, like:
| commentId | relatedRecordId | isRead |
|-----------+-----------------+--------|
| 1 | 1 | TRUE |
| 2 | 1 | FALSE |
| 3 | 1 | FALSE |
Now I want to select newCommentCount and allCommentCount as a server response to the browser. Is there any way to select these two fields in one SQL?
I've tried this:
SELECT `isRead`, count(*) AS cnt FROM comment WHERE relatedRecordId=1 GROUP BY `isRead`
| isRead | cnt |
| FALSE | 2 |
| TRUE | 1 |
But, I have to use a special data structure to map it and sum the cnt fields in two rows to get allCommentCount by using an upper-layer programming language. I want to know if I could get the following format of data by SQL only and in one step:
| newCommentCount | allCommentCount |
|-----------------+-----------------|
| 2 | 3 |
I don't even know how to describe the question, so I couldn't find any results on Google or Stack Overflow (maybe because of my poor English).
Use conditional aggregation:
SELECT SUM(NOT isRead) AS newCommentCount, COUNT(*) AS allCommentCount
FROM comment
WHERE relatedRecordId = 1;
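As a quick check of conditional aggregation on the question's three comments, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption). It uses the CASE form of the condition so the query does not depend on how the engine treats booleans; the fourth row, for a different record, is hypothetical and only shows that the WHERE filter is respected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (commentId INTEGER PRIMARY KEY,"
    " relatedRecordId INTEGER, isRead INTEGER)"
)
conn.executemany(
    "INSERT INTO comment VALUES (?, ?, ?)",
    [
        (1, 1, 1),  # read
        (2, 1, 0),  # unread
        (3, 1, 0),  # unread
        (4, 2, 0),  # hypothetical comment on another record
    ],
)

# One pass over the rows: count everything, and add 1 only for unread rows.
COUNTS = """
    SELECT SUM(CASE WHEN isRead = 0 THEN 1 ELSE 0 END) AS newCommentCount,
           COUNT(*) AS allCommentCount
    FROM comment
    WHERE relatedRecordId = ?
"""

counts = conn.execute(COUNTS, (1,)).fetchone()
print(counts)
```

When no rows match the filter, SUM yields NULL rather than 0, so wrapping it in COALESCE(..., 0) may be worth considering.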
If I understand correctly, you want both the number of new comments and the number of all comments, so you can do it like this:
SELECT SUM(CASE WHEN isRead = false THEN 1 ELSE 0 END) AS newCommentCount,
COUNT(*) AS allCommentCount FROM comment WHERE relatedRecordId = 1;
You could also wrap it in a stored procedure.
To place two results side by side, you can simply use a subquery as an expression in the SELECT clause, as long as the subquery returns exactly one row:
select (select count(*) from comment where isRead=false and relatedRecordId=1) as newCommentCount,
count(*) as allCommentCount
from comment where relatedRecordId=1;

MySQL Inner Join with No Rows

I have a MySQL database that I normalized and the idea is to allow for a business to select zero or more marketing sequences, but the kicker is that a handful of marketing sequences are required (right now I have 4, but the list can grow). So what I've done is structured my tables as such:
sequence
+-------------+------------------+-------+-------------+----------+
| sequence_id | customer_type_id | title | description | required |
| 1 | 1 | ... | ... | true |
| 2 | 1 | ... | ... | true |
| 3 | 1 | ... | ... | false |
| 4 | 2 | ... | ... | true |
| 5 | 3 | ... | ... | true |
| 6 | 4 | ... | ... | false |
+-------------+------------------+-------+-------------+----------+
business_sequence
+----------------------+-------------+-------------+
| business_sequence_id | business_id | sequence_id |
+----------------------+-------------+-------------+
customer_type_id and business_id are foreign key fields that link to tables that describe the type of customer (customer, former customer, etc.) and the business's information (name, address, etc.) respectively.
The reason why I have the required column in my sequence table is so that if a business decides not to allow for any of the non-required sequences, then that business would not need a row. After all, there's no need to have duplicate rows in the business_sequence table if the only piece of data that is different is the business_id field.
Now what I'm trying to do is get all the rows and all the fields from the sequence table where the business_id in the business_sequence table matches a parameterized value (say 1 for the example that I'm going to show in a second). The query that I tried to use is:
SELECT
s.*
FROM
`sequence` AS s
INNER JOIN `business_sequence` AS b ON b.`sequence_id` = s.`sequence_id`
WHERE
b.`business_id` = 1 AND
s.`required` = true;
But this returned no results if the business had no rows in the business_sequence table. What I expected was for it to return the 0 rows matching b.business_id = 1, but also the 4 "required" rows (ids: 1, 2, 4, and 5) matching s.required = true.
Whenever I took out the INNER JOIN and the business_id portion of the WHERE clause, it did in fact return the 4 "required" rows. This leads me to believe that my original query returns nothing because there are no rows for that particular business_id in the business_sequence table.
With all of this being said, how do I accomplish retrieving the zero or more rows when the business_id field matches the parameterized value and retrieve all of the rows when the required field is true?
How about using an OR condition instead of AND?
SELECT
s.*
FROM
`sequence` AS s
INNER JOIN `business_sequence` AS b ON b.`sequence_id` = s.`sequence_id`
WHERE
b.`business_id` = 1 OR
s.`required` = true;
I was able to resolve my problem by performing a UNION as such:
SELECT * FROM `sequence` WHERE `required` = true
UNION
SELECT
s.*
FROM
`sequence` AS s
INNER JOIN `business_sequence` AS b ON b.`sequence_id` = s.`sequence_id`
WHERE
b.`business_id` = 1
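An INNER JOIN discards every sequence that has no matching business_sequence row before the WHERE clause is evaluated, which is why the required rows can disappear regardless of how the WHERE conditions are combined. As an alternative to the UNION, a LEFT JOIN with the business filter moved into the ON clause keeps them. The sketch below compares the two using Python's sqlite3 module as a stand-in for MySQL (an assumption); the single business_sequence row is hypothetical, and the title and description columns are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequence (
        sequence_id INTEGER PRIMARY KEY,
        customer_type_id INTEGER,
        required INTEGER
    );
    CREATE TABLE business_sequence (
        business_sequence_id INTEGER PRIMARY KEY,
        business_id INTEGER,
        sequence_id INTEGER REFERENCES sequence(sequence_id)
    );
    -- The six sequences from the question (1, 2, 4 and 5 are required).
    INSERT INTO sequence VALUES (1,1,1),(2,1,1),(3,1,0),(4,2,1),(5,3,1),(6,4,0);
    -- Hypothetical selection: business 1 opted into optional sequence 3.
    INSERT INTO business_sequence VALUES (1, 1, 3);
""")

# INNER JOIN with OR: only sequences that have some business_sequence row survive.
INNER_JOIN_OR = """
    SELECT s.sequence_id
    FROM sequence AS s
    INNER JOIN business_sequence AS b ON b.sequence_id = s.sequence_id
    WHERE b.business_id = ? OR s.required = 1
    ORDER BY s.sequence_id
"""

# LEFT JOIN: every sequence survives the join; keep it if required or selected.
LEFT_JOIN = """
    SELECT s.sequence_id
    FROM sequence AS s
    LEFT JOIN business_sequence AS b
           ON b.sequence_id = s.sequence_id AND b.business_id = ?
    WHERE s.required = 1 OR b.business_id IS NOT NULL
    ORDER BY s.sequence_id
"""


def sequence_ids(query, business_id):
    """Run the query for one business and return the matching sequence ids."""
    return [row[0] for row in conn.execute(query, (business_id,))]


print(sequence_ids(INNER_JOIN_OR, 2))  # business 2 has no rows at all
print(sequence_ids(LEFT_JOIN, 2))
print(sequence_ids(LEFT_JOIN, 1))
```

If a business could hold duplicate rows for the same sequence, adding DISTINCT to the LEFT JOIN query would keep each sequence from appearing more than once.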

SQL - SELECT with double GROUP BY as two-dimensional array

I have the following tables with the described relations:
alpha_codes
id
code
description
beta_codes
id
code
description
entities
id
quantity
alpha_code_id
beta_codes_entities
id
entity_id
beta_code_id
What I would like to ask is whether I can SELECT the SUM of quantities GROUPed BY alpha_code and beta_code, resulting in the following two-dimensional array:
+------------+---------------+---------------+---------------+---------------+
| | betacode1 | betacode2 | ... | betacodeN |
+------------+---------------+---------------+---------------+---------------+
| alphacode1 | SUM(quantity) | SUM(quantity) | ... | SUM(quantity) |
+------------+---------------+---------------+---------------+---------------+
| alphacode2 | SUM(quantity) | SUM(quantity) | ... | SUM(quantity) |
+------------+---------------+---------------+---------------+---------------+
| ... | ... | ... | ... | SUM(quantity) |
+------------+---------------+---------------+---------------+---------------+
| alphacodeN | SUM(quantity) | SUM(quantity) | SUM(quantity) | SUM(quantity) |
+------------+---------------+---------------+---------------+---------------+
So far I have the following simple query, which although it returns all the required data, the result set format is different and requires manual reform to build the two-dimensional array shown in the above table.
SELECT
`alpha_codes`.`code` as alphacode,
`beta_codes`.`code` as betacode,
SUM(`entities`.`quantity`)
FROM
`entities`
INNER JOIN
`alpha_codes` ON `alpha_codes`.`id` = `entities`.`alpha_code_id`
INNER JOIN
`beta_code_entity` ON `beta_code_entity`.`entity_id` = `entities`.`id`
INNER JOIN
`beta_codes` ON `beta_codes`.`id` = `beta_code_entity`.`beta_code_id`
GROUP BY `alpha_codes`.`id` , `beta_codes`.`id`
OUTPUT
+------------+---------------+------------------------+
| alphacode | betacode | SUM(entities.quantity) |
+------------+---------------+------------------------+
| alphacode1 | betacode1 | SUM(entities.quantity) |
+------------+---------------+------------------------+
| alphacode1 | betacode2 | SUM(entities.quantity) |
+------------+---------------+------------------------+
| alphacode2 | betacode1 | SUM(entities.quantity) |
+------------+---------------+------------------------+
| ... | ... | ... |
+------------+---------------+------------------------+
| alphacodeN | betacodeN | SUM(entities.quantity) |
+------------+---------------+------------------------+
In case of a small fixed number of betacodes, I was thinking of using multiple CASE statements for each one of them. However the real-world scenario includes about 850 alphacodes, 1000 betacodes and over 500k entities, so I'm really worried about performance issues too...
In the solution below:
The SQL statement has to be generated dynamically.
The list of SUM expressions (the portion marked with .....) has to be generated by looping through the betacode values. You could put the CASE expression into a function that takes the two values as parameters.
Since you are already in a situation where you have to build N columns, with N known only at runtime, building the SUM expressions dynamically from the betacode values shouldn't be much more difficult.
Select `alpha_codes`.`code` as alphacode,
Sum(Case `beta_codes`.`id` When betacode1_value_goes_here Then `entities`.`quantity` Else 0 End) as betacode1,
Sum(Case `beta_codes`.`id` When betacode2_value_goes_here Then `entities`.`quantity` Else 0 End) as betacode2,
Sum(Case `beta_codes`.`id` When betacode3_value_goes_here Then `entities`.`quantity` Else 0 End) as betacode3,
..............
FROM
`entities`
INNER JOIN
`alpha_codes` ON `alpha_codes`.`id` = `entities`.`alpha_code_id`
INNER JOIN
`beta_code_entity` ON `beta_code_entity`.`entity_id` = `entities`.`id`
INNER JOIN
`beta_codes` ON `beta_codes`.`id` = `beta_code_entity`.`beta_code_id`
GROUP BY `alpha_codes`.`id`
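One way to generate such a statement is to read the betacode ids first and emit one SUM(CASE ...) column per id. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in for MySQL, the helper name build_pivot_sql and the beta_<id> column aliases are hypothetical, the link table is named beta_code_entity as in the queries above, and the handful of rows is made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alpha_codes (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE beta_codes (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE entities (id INTEGER PRIMARY KEY, quantity INTEGER, alpha_code_id INTEGER);
    CREATE TABLE beta_code_entity (id INTEGER PRIMARY KEY, entity_id INTEGER, beta_code_id INTEGER);
    -- Made-up rows for illustration only.
    INSERT INTO alpha_codes VALUES (1, 'A1'), (2, 'A2');
    INSERT INTO beta_codes VALUES (1, 'B1'), (2, 'B2');
    INSERT INTO entities VALUES (1, 5, 1), (2, 7, 1), (3, 3, 2);
    INSERT INTO beta_code_entity VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2), (4, 3, 2);
""")


def build_pivot_sql(connection):
    """Build a pivot query with one SUM(CASE ...) column per betacode id.

    Assumes at least one betacode exists; ids are cast with int() before
    being placed into the SQL text, so no untrusted strings are interpolated.
    """
    beta_ids = [row[0] for row in connection.execute("SELECT id FROM beta_codes ORDER BY id")]
    sum_columns = ",\n".join(
        f"    SUM(CASE WHEN bc.id = {int(beta_id)} THEN e.quantity ELSE 0 END) AS beta_{int(beta_id)}"
        for beta_id in beta_ids
    )
    return f"""
SELECT ac.code AS alphacode,
{sum_columns}
FROM entities AS e
INNER JOIN alpha_codes AS ac ON ac.id = e.alpha_code_id
INNER JOIN beta_code_entity AS bce ON bce.entity_id = e.id
INNER JOIN beta_codes AS bc ON bc.id = bce.beta_code_id
GROUP BY ac.id, ac.code
ORDER BY ac.code
"""


pivot_sql = build_pivot_sql(conn)
pivot_rows = conn.execute(pivot_sql).fetchall()
print(pivot_sql)
print(pivot_rows)
```

An entity linked to several betacodes contributes its quantity to each of those columns, which matches the behaviour of the grouped query in the question.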
Another solution would be:
a. Create a temporary table that houses the output columns. If a single table would have too many columns, create more than one, much like partitioning a table; you can join them later.
b. Loop through the records, starting either with the entities table or with the betacodes. Inside the loop, write UPDATE statements that draw their values from the query you already posted. If you loop through the betacode columns, you will be updating one column at a time for all entities.

List Last record of each item in mysql

Each item (identified by Serial) in my table has many records, and I need to get the last record of each item, so I run the code below:
SELECT ID,Calendar,Serial,MAX(ID)
FROM store
GROUP BY Serial DESC
This should show one row per item, with every column taken from that item's last record, but the result looks like this:
-------------------------------------------------------------+
ID | Calendar | Serial | MAX(ID) |
-------------------------------------------------------------|
7031053 | 2016-05-14 14:05:14 79.5 | N10088 | 7031056 |
7053346 | 2016-05-14 15:17:28 79.8 | N10078 | 7053346 |
7051349 | 2016-05-14 15:21:29 86.1 | J20368 | 7051349 |
7059144 | 2016-05-14 15:50:27 89.6 | J20367 | 7059144 |
7045551 | 2016-05-14 15:15:15 89.2 | J20366 | 7045551 |
7056243 | 2016-05-14 15:25:34 85.2 | J20358 | 7056245 |
7042652 | 2016-05-14 15:18:33 83.9 | J20160 | 7042652 |
7039753 | 2016-05-14 11:48:16 87 | J20158 | 7039753 |
7036854 | 2016-05-14 15:18:35 87.5 | J20128 | 7036854 |
7033955 | 2016-05-14 15:20:45 83.4 | 9662 | 7033955 |
-------------------------------------------------------------+
The problem: why, for example, is the ID "7031053" in the row for Serial N10088 while MAX(ID) is "7031056"? The same happens for J20358.
Each row should show the last record of its item, but in my output that is not the case!
If you want the row with the max value, then you need a join or some other mechanism.
Here is a simple way using a correlated subquery:
select s.*
from store s
where s.id = (
select max(s2.id)
from store s2
where s2.serial = s.serial
);
Your query uses a (mis)feature of MySQL that generates lots of confusion and is not particularly helpful: you have columns in the select that are not in the group by. What value do these get?
Well, in most databases the answer is simple: the query generates an error as ANSI specifies. MySQL pulls the values for the additional columns from indeterminate matching rows. That is rarely what the writer of the query intends.
For performance, add an index on store(serial, id).
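The correlated subquery and the suggested index can be tried out together in a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), the index name idx_store_serial_id is hypothetical, and the rows reuse a few ID and Serial values from the question with made-up Calendar timestamps where the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (ID INTEGER PRIMARY KEY, Calendar TEXT, Serial TEXT)")
# Composite index so the MAX(ID) lookup per Serial can be answered from the index.
conn.execute("CREATE INDEX idx_store_serial_id ON store (Serial, ID)")
conn.executemany(
    "INSERT INTO store VALUES (?, ?, ?)",
    [
        (7031053, "2016-05-14 14:05:14", "N10088"),
        (7031056, "2016-05-14 14:06:00", "N10088"),  # made-up timestamp
        (7053346, "2016-05-14 15:17:28", "N10078"),
        (7056243, "2016-05-14 15:25:34", "J20358"),
        (7056245, "2016-05-14 15:26:00", "J20358"),  # made-up timestamp
    ],
)

# Keep a row only if its ID is the largest ID among rows with the same Serial.
LAST_PER_SERIAL = """
    SELECT s.ID, s.Calendar, s.Serial
    FROM store AS s
    WHERE s.ID = (
        SELECT MAX(s2.ID)
        FROM store AS s2
        WHERE s2.Serial = s.Serial
    )
    ORDER BY s.Serial
"""

last_rows = conn.execute(LAST_PER_SERIAL).fetchall()
for row in last_rows:
    print(row)
```

Every selected column now comes from the same row as the maximum ID, which is what the GROUP BY version in the question could not guarantee.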
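Try this one, which joins each row to the maximum ID of its Serial:
SELECT tbl.*
FROM store tbl
JOIN (SELECT Serial, MAX(ID) AS max_id FROM store GROUP BY Serial) m
ON m.Serial = tbl.Serial AND m.max_id = tbl.ID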
You can try with this also...
SELECT ID,Calendar,Serial
FROM store s0
where ID = (
SELECT MAX(id)
FROM store s1
WHERE s1.serial = s0.serial
);

Order the rows of a MySQL result based on a "next_id" field

I'm currently working with a database table that is structured as follows:
______________________________
| id | content | next_id |
|------|-----------|-----------|
| 1 | (value) | 4 |
| 2 | (value) | 1 |
| 3 | (value) | (NULL) |
| 4 | (value) | 3 |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
The value of the next_id field defines the id of the row of data that should follow it. A value of NULL means that no row follows it.
Is there a way I can query the database in such a way that in the resulting rows will be ordered using this method? For example, in the case I gave above, the rows should be returned ordered so that the ids are in this order: 2, 1, 4, 3. I'm looking for a solution that can do this regardless of the number of rows in this sequence.
I know that it is possible to reorder the results after retrieving them from the database (using the programming language I'm working with), but I'm hoping that there is a way that I can do it in SQL.
I can't see a solution without as many self-joins as you have rows. Instead I would build a nested set out of it in a temp table using push down stack algorithm and then retrieve a full tree.
I've got something that's close.
/* one select to initialize the @next variable to the first row */
select @next:= id from table1 order by isnull(next_id) asc, next_id asc limit 1;
select distinct a.id, a.next_id from table1 b
inner join
(
select @rank:= id as id, @next:= next_id as next_id from table1
where id = @next
) a
on (b.id = b.id);
This outputs
+----+---------+
| id | next_id |
+----+---------+
| 2 | 1 |
| 1 | 4 |
And then stops. If only I could find a way for it to continue....
Anyway this sort of force feeding values into a query is dodgy enough when doing ranking, let alone this sort of stuff, so maybe I'm going down a dead end.
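A different technique from the user-variable attempt above is a recursive common table expression, which MySQL supports from version 8.0. The following is only a sketch, run through Python's sqlite3 module as a stand-in for MySQL (an assumption). It assumes the rows form a single chain with no cycles, and the content values and the pos column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, content TEXT, next_id INTEGER)")
# The four rows from the question; content values are placeholders.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [(1, "a", 4), (2, "b", 1), (3, "c", None), (4, "d", 3)],
)

ORDERED_CHAIN = """
    WITH RECURSIVE chain (id, content, next_id, pos) AS (
        -- Anchor: the head is the row that no other row points to.
        SELECT id, content, next_id, 1
        FROM table1
        WHERE id NOT IN (SELECT next_id FROM table1 WHERE next_id IS NOT NULL)
        UNION ALL
        -- Step: follow next_id to the following row and bump the position.
        SELECT t.id, t.content, t.next_id, c.pos + 1
        FROM table1 AS t
        JOIN chain AS c ON t.id = c.next_id
    )
    SELECT id, content FROM chain ORDER BY pos
"""

ordered_ids = [row[0] for row in conn.execute(ORDERED_CHAIN)]
print(ordered_ids)
```

The recursion ends when it reaches the row whose next_id is NULL. In MySQL the number of recursion steps is capped by the cte_max_recursion_depth setting (1000 by default), so longer chains would need that limit raised.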