Group By ignores sorting in subquery - mysql

There is a TLDR version at the bottom.
Note: I have based my current solution on the proposed solution in this question here (proposed in the question text itself), however it does not work for me even if it works for that person. So I'm not sure how to handle this, because the question seems like a duplicate but the answer given there doesn't work for me. So I guess something must be different for me. If someone can tell me how to correctly handle this, I'm open to hearing.
I have a table like this one here:
scope_id key_id value
0 0 0_0
0 1 0_1
1 0 1_0
2 0 2_0
2 1 2_1
The scopes have a hierarchy where scope 0 is the parent of scope 2 and scope 2 is the parent of scope 1. (The IDs are deliberately not in hierarchy order; the real IDs are UUIDs, and numbers are used here only for readability.)
My use case is that I want the values of multiple keys in a specific scope (scope 1). If a key has no value defined in scope 1, I would accept the value from its parent (scope 2), and if there is none in scope 2 either, the value from that scope's parent, scope 0. So the lookup order is scope 1, then scope 2, then scope 0. (The scopes form a tree, so each scope has at most one parent, but a parent can have multiple children.)
So in the example above, if I want the value of key 0 in scope 1, I'd like to get 1_0 as the key is defined in the scope. If I want the value of key 1 in scope 1, I'd like to get 2_1 as there is no value defined in the scope 1 but in its parent scope 2 there is. And lastly if I want the value of keys 0 and 1 in scope 1, I want to get 1_0 and 2_1.
Currently it is solved by making 3 separate SQL requests and merging it in code. That works fine and fast enough, but I want to see if it would be faster with a single SQL query. I came up with the following query (based on the update in the question text here):
SELECT *
FROM (
SELECT *
FROM test
WHERE key_id IN (0, 1)
AND scope_id IN (1 , 2, 0)
ORDER BY FIELD(scope_id, 1 , 2, 0)
) t1
GROUP BY t1.key_id;
The inner subquery first finds all keys that I want to look at and makes sure they are in the scope that I want to look at or in one of its ancestor scopes. Then I order the rows by scope, so that the child comes first, then the parent, then the grandparent. I expect GROUP BY to keep the first row it finds for each key, so hopefully the child (scope 1). However, this doesn't work. Instead, the first row in the order of the underlying table is used.
TLDR
When grouping with GROUP BY in the query above, why is the order defined by the ORDER BY query ignored? Instead the first value based on the original table is taken when grouping.
Using this code you can try for yourself:
# this group by doesn't work with strict mode
SET sql_mode = '';
CREATE TABLE IF NOT EXISTS test(
scope_id int,
key_id int,
`value` varchar(20),
PRIMARY KEY (scope_id, key_id)
);
INSERT IGNORE INTO test values
(0, 0, "0_0"),
(1, 0, "1_0"),
(2, 0, "2_0"),
(2, 1, "2_1"),
(0, 1, "0_1");
SELECT *
FROM (
SELECT *
FROM test
WHERE key_id IN (0, 1)
AND scope_id IN (1 , 2, 0)
ORDER BY FIELD(scope_id, 1 , 2, 0)
) t1
GROUP BY t1.key_id;
# expected result are the rows that contain value 1_0 and 2_1

I understand your question as a greatest-n-per-group variant.
In this situation, you should not think aggregation, but filtering.
You could solve it with a correlated subquery that selects the first available scope_id per key_id:
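select t.*
from test t
where t.key_id in (0, 1)
  and t.scope_id = (
    select t1.scope_id
    from test t1
    where t1.key_id = t.key_id
      -- restrict to the scope chain: field() returns 0 for any other
      -- scope_id, which would otherwise sort first
      and t1.scope_id in (1, 2, 0)
    order by field(t1.scope_id, 1, 2, 0)
    limit 1
)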
For performance, you want an index on (key_id, scope_id).
Demo on DB Fiddle:
scope_id | key_id | value
-------: | -----: | :----
1 | 0 | 1_0
2 | 1 | 2_1
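As a way to try the filtering approach without a MySQL server, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. SQLite is only a stand-in: it has no FIELD(), so the scope priority is written as a CASE expression, and the function name resolve_values is made up for this example.

```python
import sqlite3

# Correlated-subquery filter: for each key, keep the row whose scope comes
# first in the chain 1 -> 2 -> 0. CASE replaces MySQL's FIELD() here.
QUERY = """
SELECT t.scope_id, t.key_id, t.value
FROM test t
WHERE t.key_id IN (0, 1)
  AND t.scope_id = (
        SELECT t1.scope_id
        FROM test t1
        WHERE t1.key_id = t.key_id
          AND t1.scope_id IN (1, 2, 0)
        ORDER BY CASE t1.scope_id WHEN 1 THEN 1 WHEN 2 THEN 2 ELSE 3 END
        LIMIT 1
      )
ORDER BY t.key_id
"""


def resolve_values():
    """Load the sample rows from the question and run the filter query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (scope_id INT, key_id INT, value TEXT, "
        "PRIMARY KEY (scope_id, key_id))"
    )
    conn.executemany(
        "INSERT INTO test VALUES (?, ?, ?)",
        [(0, 0, "0_0"), (1, 0, "1_0"), (2, 0, "2_0"), (2, 1, "2_1"), (0, 1, "0_1")],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in resolve_values():
        print(row)
```

Because each key is resolved by a filter rather than by GROUP BY, the result does not depend on which row the server happens to keep per group.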

This will get what you want. Use a row number to effectively "save" your order for the next section of the query.
MySQL 8.0 or newer:
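-- ROW_NUMBER() needs an OVER clause, and RANK is a reserved word in
-- MySQL 8.0, so the alias is rn. Filtering on rn = 1 replaces GROUP BY.
SELECT scope_id, key_id, `value`
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY key_id
                              ORDER BY FIELD(scope_id, 1, 2, 0)) AS rn
    FROM test t
    WHERE key_id IN (0, 1)
      AND scope_id IN (1, 2, 0)
) t1
WHERE rn = 1
ORDER BY key_id;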
MySQL 5.7 or older:
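-- Needs ONLY_FULL_GROUP_BY disabled. MySQL does not guarantee which row
-- GROUP BY keeps per key_id, so verify the output on your own data.
SET @row_num = 0;
SELECT *
FROM (
    SELECT *, @row_num := @row_num + 1 AS row_rank
    FROM test
    WHERE key_id IN (0, 1)
      AND scope_id IN (1, 2, 0)
    ORDER BY FIELD(scope_id, 1, 2, 0)
) t1
GROUP BY t1.key_id
ORDER BY row_rank;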
Soap Box: MySQL results are, in general, unreliable for any query that selects columns which are neither listed in the GROUP BY nor wrapped in an aggregate. With ONLY_FULL_GROUP_BY disabled, the server is free to return the value from any row of the group for such columns.
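The row-number idea can also be sketched with a small priority table instead of FIELD(), which may be easier to maintain when the scope IDs are UUIDs. This is only an illustration run through Python's sqlite3 module (SQLite 3.25 or newer is assumed for window functions); the chain table and the function name first_value_per_key are hypothetical and not part of the original schema.

```python
import sqlite3


def first_value_per_key(chain, keys):
    """Return one row per key, taken from the first scope in `chain` that has it.

    `chain` lists scope ids from most to least specific, e.g. [1, 2, 0].
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE test (scope_id INT, key_id INT, value TEXT,
                           PRIMARY KEY (scope_id, key_id));
        INSERT INTO test VALUES
            (0, 0, '0_0'), (1, 0, '1_0'), (2, 0, '2_0'),
            (2, 1, '2_1'), (0, 1, '0_1');
        -- hypothetical helper table: position of each scope in the chain
        CREATE TABLE chain (scope_id INT PRIMARY KEY, prio INT);
    """)
    conn.executemany(
        "INSERT INTO chain VALUES (?, ?)",
        [(scope, prio) for prio, scope in enumerate(chain)],
    )
    marks = ", ".join("?" for _ in keys)
    rows = conn.execute(
        f"""
        SELECT scope_id, key_id, value
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY t.key_id
                                      ORDER BY c.prio) AS rn
            FROM test t
            JOIN chain c ON c.scope_id = t.scope_id
            WHERE t.key_id IN ({marks})
        ) ranked
        WHERE rn = 1
        ORDER BY key_id
        """,
        list(keys),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(first_value_per_key([1, 2, 0], [0, 1]))
```

The inner join against chain doubles as the scope filter, so scopes outside the chain never take part in the ranking.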

Related

Keep the newest one field value until it changes then keeping its newest field value

I have a few tables that have millions of records where a sensor was sending multiple 0 and 1 values and this data was logged to the table even though we only needed it to keep the very first 1 or 0 per each 1 to 0 or 0 to 1 change.
Adjustments have been made so we only now get the 1 and 0 values on each change and not every one second or whatever but I need to cleanup the unnecessary records from the tables.
I've done some research and testing and I'm having trouble figuring out what method to use here to delete the records not needed. I was trying to figure out how to retain the previous value record using variables and also created row numbers but it's not working as I need it to.
I created an SQLFiddle here and tried some logic per the example post MySQL - How To Select Rows Depending on Value in Previous Row (Remove Duplicates in Each Sequence). I keep getting back no results from this, and when I tried running it on a large local MySQL table I got an error, so I had to increase the MySQL Workbench read query timeout to 600 or it lost the connection.
I also found the "MySql - How get value in previous row and value in next row?" post and tried some variations of it and also "How to get next/previous record in MySQL?" and I've come up with total failure getting the expected results.
The Data
The data in the tables has a TimeStr column and a Value column just as in the screen shot and on the SQLFiddle link I posted with a small sample of the data.
Each record will never have the same TimeStr value but I really only need to keep the very first record time wise when the sensor either turned ON or OFF if that clarifies.
I'm not sure if the records will need an incremental row number added to get the expected results since it only has the TimeStr and the Value records otherwise.
My Question
Can anyone help me determine a method that I can use on a couple large tables to delete the records from a table where there are subsequent and duplicate Value values so the tables only has the very first 1 or 0 records where those actually change from a 1 to 0 or 0 to 1?
I will also accept an answer that just selects the records needed, but solutions that perform fast would be even more appreciated.
I can easily put those into a temp table, drop the original table, and then create and insert the needed records only into the original table.
Expected Results
| TimeStr | Value |
|----------------------|-------|
| 2018-02-13T00:00:00Z | 0 |
| 2018-02-13T00:00:17Z | 1 |
| 2018-02-13T00:00:24Z | 0 |
| 2018-02-13T00:00:28Z | 1 |
select t.timestr, t.value from (
    select s.*, @pv x1, (@pv := s.value) x2
    from sensor s, (select @pv := -1) x
    order by TimeStr ) t
where t.x1 != t.x2
See http://sqlfiddle.com/#!9/8d0774/122
Try this :
SET @rownum = 0;
SET @rownum_x = 0;
SELECT b.rownum, b.TimeStr, b.Value
FROM
(
    SELECT @rownum := @rownum+1 as rownum, TimeStr, Value
    FROM sensor
    ORDER BY TimeStr
) b
LEFT JOIN (
    SELECT @rownum_x := @rownum_x+1 as rownum_x, TimeStr as TimeStr_x, Value as Value_x
    FROM sensor
    ORDER BY TimeStr
) x ON b.rownum = x.rownum_x + 1
where b.Value <> x.Value_x or x.Value_x is null
order by b.TimeStr
The result I got is
You want the first record for each value when it appears. This suggests variables. Here is one way that only involves sorting and no joining:
select t.*
from (select t.*,
             (case when value = @prev_value then value
                   when (@save_prev := @prev_value) = NULL then NULL
                   when (@prev_value := value) = NULL then NULL
                   else @save_prev
              end) as prev_value
      from (select t.*
            from sensor t
            order by timestr
           ) t cross join
           (select @prev_value := -1) params
     ) t
where prev_value <> value;
Notes:
The subquery for ordering only seems to be needed since MySQL 5.7.
The case is just a way to introduce serialized code. A variable should be assigned and read within a single expression, because MySQL does not guarantee the evaluation order across separate expressions.
This only requires one sort -- and if you have an index, that doesn't even need to be a sort.
Here is a SQL Fiddle.
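On MySQL 8.0 or newer, the same "keep only the changes" logic could probably be written with the LAG() window function instead of user variables. The sketch below shows the idea as a DELETE, run through Python's sqlite3 module as a stand-in (SQLite 3.25+ assumed). The sample rows are invented to match the expected results in the question, and keep_only_changes is a hypothetical name.

```python
import sqlite3

# Invented sample rows; the real table has millions of them.
SAMPLE = [
    ("2018-02-13T00:00:00Z", 0),
    ("2018-02-13T00:00:05Z", 0),
    ("2018-02-13T00:00:17Z", 1),
    ("2018-02-13T00:00:20Z", 1),
    ("2018-02-13T00:00:24Z", 0),
    ("2018-02-13T00:00:28Z", 1),
    ("2018-02-13T00:00:30Z", 1),
]


def keep_only_changes(rows):
    """Delete every row whose Value equals the Value of the previous row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor (TimeStr TEXT PRIMARY KEY, Value INT)")
    conn.executemany("INSERT INTO sensor VALUES (?, ?)", rows)
    conn.execute("""
        DELETE FROM sensor
        WHERE TimeStr IN (
            SELECT TimeStr
            FROM (
                SELECT TimeStr, Value,
                       LAG(Value) OVER (ORDER BY TimeStr) AS prev_value
                FROM sensor
            ) AS tagged
            WHERE Value = prev_value  -- first row has prev_value NULL, so it stays
        )
    """)
    remaining = conn.execute(
        "SELECT TimeStr, Value FROM sensor ORDER BY TimeStr"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in keep_only_changes(SAMPLE):
        print(row)
```

On a table with millions of rows it may be cheaper to select the rows to keep (Value differs from prev_value, or prev_value is NULL) into a new table and swap it in, as the question already suggests.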

searching for records in mysql using or - and - not in query

I think I am getting turned around when looking at this. I am trying to get all patron records relating to transactions that have a transaction item with one of a number of ids (1 or 2) as well as transaction items with other ids (3 or 4) but not with transaction items with other ids (5 or 6)
The structure is:
=patron=
id
fname
lname
email
phone
=trans=
id
id_org
id_patron
=trans_item=
id
id_trans
id_perf
I was trying the following:
SELECT
patron.email,
patron.fname,
patron.lname,
patron.phone
FROM
trans_item,
trans,
patron
WHERE
trans_item.id_perf IN (1,2)
AND
trans_item.id_perf IN (3,4)
AND
trans_item.id_perf NOT IN (5,6)
AND
trans_item.id_trans = trans.id
AND
trans.id_org = 1
AND
trans.id_patron = patron.id
GROUP BY
patron.id
ORDER BY
patron.email DESC,
patron.phone DESC
I'm aware that saying the id needs to be 2 AND 4 is always going to return nothing but I need to have it as if id is in (1,2) AND (3,4) so it can be 1 or 2 but also needs to be in 3 or 4
For Clarity:
I am trying to get patrons who have gone to performance 1 OR 2 and 3 OR 4 but NOT 5 OR 6
You can do this with group by and having. The basic idea is:
select ti.id_trans
from trans_item ti
group by ti.id_trans
having sum(ti.id_perf in (1, 2)) > 0 and
sum(ti.id_perf in (3, 4)) > 0 and
sum(ti.id_perf in (5, 6)) = 0;
Each condition in the having clause counts the rows of a transaction that match a particular set of ids. The > 0 means at least one such row exists for the transaction. The = 0 means none does.
If you want additional columns from other tables, you can join back to this result set.
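Joining back to patron might look like the sketch below. It assumes the conditions should hold per patron across all of that patron's transactions (the question's "For Clarity" wording suggests this), so it groups by patron rather than by transaction. It runs through Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented, the patron table is cut down to id and email, and patrons_matching is a hypothetical name.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE patron (id INT PRIMARY KEY, email TEXT);
CREATE TABLE trans (id INT PRIMARY KEY, id_org INT, id_patron INT);
CREATE TABLE trans_item (id INT PRIMARY KEY, id_trans INT, id_perf INT);

INSERT INTO patron VALUES
    (1, 'a@example.com'), (2, 'b@example.com'),
    (3, 'c@example.com'), (4, 'd@example.com');
INSERT INTO trans VALUES
    (10, 1, 1), (20, 1, 2), (30, 1, 3), (40, 1, 4), (41, 1, 4);
INSERT INTO trans_item VALUES
    (1, 10, 1), (2, 10, 3),              -- patron 1: perfs 1 and 3
    (3, 20, 2), (4, 20, 4), (5, 20, 5),  -- patron 2: perfs 2, 4 and 5
    (6, 30, 1),                          -- patron 3: perf 1 only
    (7, 40, 2), (8, 41, 3);              -- patron 4: perfs 2 and 3, two transactions
"""

QUERY = """
SELECT p.id, p.email
FROM patron p
JOIN trans t ON t.id_patron = p.id
JOIN trans_item ti ON ti.id_trans = t.id
WHERE t.id_org = 1
GROUP BY p.id, p.email
HAVING SUM(ti.id_perf IN (1, 2)) > 0   -- attended 1 or 2
   AND SUM(ti.id_perf IN (3, 4)) > 0   -- and 3 or 4
   AND SUM(ti.id_perf IN (5, 6)) = 0   -- but never 5 or 6
ORDER BY p.id
"""


def patrons_matching():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(patrons_matching())
```

Patron 2 is dropped because of performance 5, and patron 3 because nothing from (3, 4) was attended.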
I think I have a solution. If I combine the ids for all perfs and group all results by the trans_item.id I can get a list that has duplicates. I then convert them into a php multidimensional array and exclude / include based on the ids for each requirement finding the duplicates that way. Any other suggestions are welcome

MySQL multi-step GROUP BY without subquery

I'm working on improving some queries I inherited, and was curious if it was possible to do the following - given a table the_table that looks like this:
id uri
---+-------------------------
1 /foo/bar/x
1 /foo/bar/y
1 /foo/boo
2 /alpha/beta/carotine
2 /alpha/delic/ipa
3 /plastik/man/spastik
3 /plastik/man/krakpot
3 /plastik/man/helikopter
As an implicit intermediate step I'd like to group these by the 1st + 2nd tuple of uri. The results of that step would look like:
id base
---+---------------
1 /foo/bar
1 /foo/boo
2 /alpha/beta
2 /alpha/delic
3 /plastik/man
And the final result would reflect the number of unique tuple1 + tuple2 values, per unique id:
id cnt
---+-----
1 2
2 2
3 1
I can achieve these results, but not without doing a subquery (to get the results of the implicit step mentioned above), and then select/grouping out of that. Something like:
SELECT
    id,
    count(base) cnt
FROM (
    SELECT
        id,
        substring_index(uri, '/', 3) AS base
    FROM the_table
    GROUP BY id, base
) AS bases  -- MySQL requires an alias on every derived table
GROUP BY id;
My reason for wanting to avoid the subquery is that I'm working with a fairly large (20M rows) data set, and the subquery gets very expensive. Gut tells me it's not doable, but figured I'd ask SO...
There's no need for a subquery -- you can use count with distinct to achieve the same result:
SELECT
    id,
    count(distinct substring_index(uri, '/', 3)) AS cnt
FROM the_table
GROUP BY id
SQL Fiddle Demo
BTW -- this returns count of 1 for id 3 -- I assume that was a typo in your posting.
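To check the COUNT(DISTINCT ...) version against the sample rows from the question, here is a small sketch using Python's sqlite3 module. SQLite has no SUBSTRING_INDEX, so a Python function with the same behaviour for positive counts is registered under that name; that helper is an assumption of this example, not MySQL's implementation.

```python
import sqlite3


def substring_index(text, delim, count):
    """Stand-in for MySQL's SUBSTRING_INDEX, positive counts only:
    everything to the left of the count-th occurrence of delim."""
    return delim.join(text.split(delim)[:count])


def count_bases():
    conn = sqlite3.connect(":memory:")
    conn.create_function("substring_index", 3, substring_index)
    conn.execute("CREATE TABLE the_table (id INT, uri TEXT)")
    conn.executemany(
        "INSERT INTO the_table VALUES (?, ?)",
        [
            (1, "/foo/bar/x"), (1, "/foo/bar/y"), (1, "/foo/boo"),
            (2, "/alpha/beta/carotine"), (2, "/alpha/delic/ipa"),
            (3, "/plastik/man/spastik"), (3, "/plastik/man/krakpot"),
            (3, "/plastik/man/helikopter"),
        ],
    )
    rows = conn.execute("""
        SELECT id, COUNT(DISTINCT substring_index(uri, '/', 3)) AS cnt
        FROM the_table
        GROUP BY id
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(count_bases())
```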

how can I tell if the last x rows of 'state' = 1

I need help with a SQL query.
I have a table with a 'state' column. 0 means closed and 1 means opened.
Different users want to be notified after there have been x consecutive 1 events.
With an SQL query, how can I tell if the last x rows of 'state' = 1?
If, for example, you want to check whether the last 5 consecutive rows have a state equal to 1, here is how you could do it:
SELECT IF(SUM(x.state) = 5, 1, 0) AS is_consecutive
FROM (
SELECT state
FROM YourTable
WHERE Processor = 3
ORDER BY Status_datetime DESC
LIMIT 5
) as x
If is_consecutive = 1, then yes, the last 5 rows all have state = 1.
Edit: As suggested in the comments, you have to use ORDER BY in your query to get the last n rows.
And for more accuracy, since you have a timestamp column, you should use Status_datetime to order the rows.
You should be able to use something like this (replace the number in the HAVING with the value of x you want to check for):
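-- MySQL syntax: LIMIT instead of TOP, and the sum is taken over the
-- derived table that holds only the last 10 rows
SELECT Processor, SUM(Status) AS OpenCount
FROM
(
    SELECT Processor, Status
    FROM YourTable
    WHERE Processor = 3
    ORDER BY `DateTime` DESC
    LIMIT 10
) AS last_rows
GROUP BY Processor
HAVING OpenCount >= 10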
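Both answers sum the state over the last x rows. A slightly stricter variant also checks that x rows actually exist, so a processor with fewer rows is not reported by mistake. The sketch below illustrates this through Python's sqlite3 module as a stand-in for MySQL; the events table, its column names and both function names are assumptions made for this example.

```python
import sqlite3


def make_demo_db():
    """Hypothetical table: one row per state change report."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (processor INT, status_datetime TEXT, state INT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            (3, "2020-01-01 10:00:00", 0),
            (3, "2020-01-01 10:01:00", 1),
            (3, "2020-01-01 10:02:00", 1),
            (3, "2020-01-01 10:03:00", 1),
            (7, "2020-01-01 10:00:00", 1),
        ],
    )
    return conn


def last_x_all_open(conn, processor, x):
    """True only if the newest x rows for `processor` exist and all have state 1."""
    if x < 1:
        raise ValueError("x must be at least 1")
    row = conn.execute(
        """
        SELECT COUNT(*) = ? AND MIN(state) = 1
        FROM (
            SELECT state
            FROM events
            WHERE processor = ?
            ORDER BY status_datetime DESC
            LIMIT ?
        ) AS newest
        """,
        (x, processor, x),
    ).fetchone()
    return bool(row[0])


if __name__ == "__main__":
    db = make_demo_db()
    print(last_x_all_open(db, 3, 3))
    print(last_x_all_open(db, 3, 4))
```

MIN(state) = 1 works here because state only takes the values 0 and 1; SUM(state) = x would do the same job.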

MySQL best practice: matching prefixes

I have a table with codes and another table with prefixes. I need to match the (longest) prefix for each code.
There is also a secondary scope in which I have to restrict prefixes (this involves bringing in other tables). I don't think this would matter in most cases, but here is a simplified (normalized) scheme (I have to set item.prefix_id):
group (id)
subgroup (id, group_id)
prefix (id, subgroup_id, prefix)
item (id, group_id, code, prefix_id)
It is all right to cache the length of the prefix in a new field and index it. It is all right to cache the group_id in the prefix table (although the group tables are fairly small, so in most cases I don't think it gains any performance). The item table contains a few hundred thousand records; prefix contains at most 500.
Edit:
Sorry if the question was not defined well enough. When I use the word "prefix" I mean it literally, so the codes have to start with the actual prefix.
subgroup
id group_id
-------------
1 1
2 1
3 1
4 2
prefix
id subgroup_id prefix
------------------------
1 1 a
2 2 abc
3 2 123
4 4 abcdef
item
id group_id code prefix_id
-----------------------------------
1 1 abc123 NULL
2 1 abcdef NULL
3 1 a123 NULL
4 2 abc123 NULL
The expected result for the prefix column is (item.id, item.prefix_id):
(1, 2) Because: subgroups 1, 2, 3 are under group 1, the code abc123 starts with the prefix a and with the prefix abc, and abc is the longest of the two, so we take the id of abc, which is 2, and put it into item.prefix_id.
(2, 2) Because: even though prefix {4} (which is abcdef) is the longest matching prefix, its subgroup (which is 4) is under group 2 but the item is under group 1, so we can only choose from subgroups 1, 2, 3, and abc is still the longest match out of the three possible prefixes.
(3, 1) Because: a is the longest match.
(4, NULL) Because: item 4 is under group 2 and the only prefix under group 2 is abcdef, which does not match abc123 (because abc123 does not start with abcdef).
But as I said, the whole grouping thing is not an essential part of the question. My main concern is how to match a table of possible prefixes to a table of strings, and how to do it the best way. (Best meaning an optimal tradeoff between readability, maintainability and performance, hence the 'best practice' in the title.)
Currently I'm doing something like:
UPDATE item USE INDEX (code3)
LEFT JOIN prefix ON prefix.length=3 AND LEFT(item.code,3)=prefix.prefix
LEFT JOIN subgroup ON subgroup.id=prefix.subgroup_id
WHERE subgroup.group_id = item.group_id AND
item.segment_id IS NULL
Where code3 is a KEY code3 (segment_id, group_id, code(3)). The same logic is repeated with 1, 2, 3 and 4 as the length. It seems pretty efficient, but I don't like the duplication in it (4 queries for a single operation). Of course, this is for the case when the maximum length of the prefixes is 4.
Thanks to everyone for sharing your ideas so far.
It is all right to cache the group_id in the prefix table.
So let's create column group_id in table prefix and fill the column with the appropriate values. I assume that you know how to do this, so let's go to the next step.
The biggest performance benefit we will get from this composite index:
ALTER TABLE `prefix` ADD INDEX `c_index` (
`group_id` ASC,
`prefix` ASC
);
And the UPDATE statement:
UPDATE item i
SET
prefix_id = (
SELECT p.id
FROM prefix p USE INDEX (`c_index`)
WHERE
p.group_id = i.group_id AND
p.prefix IN (
LEFT(i.code, 4),
LEFT(i.code, 3),
LEFT(i.code, 2),
LEFT(i.code, 1)
)
ORDER BY LENGTH(p.prefix) DESC
LIMIT 1
)
In this example I assume that the prefix has a variable length of 1 to 4 characters. I also decided to use an IN clause instead of LIKE in order to get the full benefit of c_index.
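If the prefix length is not capped at 4, the same "longest match wins" UPDATE can be written by comparing the start of the code with each candidate prefix. The sketch below uses the sample data from the question and goes through subgroup instead of a cached group_id. It runs in Python's sqlite3 module as a stand-in: in MySQL the comparison would be something like LEFT(code, CHAR_LENGTH(prefix)) = prefix, and whether such a query can still use c_index is not something this example shows. The function name assign_prefixes is made up.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE subgroup (id INT PRIMARY KEY, group_id INT);
CREATE TABLE prefix (id INT PRIMARY KEY, subgroup_id INT, prefix TEXT);
CREATE TABLE item (id INT PRIMARY KEY, group_id INT, code TEXT, prefix_id INT);

INSERT INTO subgroup VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO prefix VALUES (1, 1, 'a'), (2, 2, 'abc'), (3, 2, '123'), (4, 4, 'abcdef');
INSERT INTO item VALUES
    (1, 1, 'abc123', NULL), (2, 1, 'abcdef', NULL),
    (3, 1, 'a123', NULL),   (4, 2, 'abc123', NULL);
"""

# For each item, pick the longest prefix from the item's own group that the
# code starts with; items without any match keep prefix_id = NULL.
UPDATE = """
UPDATE item
SET prefix_id = (
    SELECT p.id
    FROM prefix p
    JOIN subgroup sg ON sg.id = p.subgroup_id
    WHERE sg.group_id = item.group_id
      AND substr(item.code, 1, length(p.prefix)) = p.prefix
    ORDER BY length(p.prefix) DESC
    LIMIT 1
)
"""


def assign_prefixes():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    conn.execute(UPDATE)
    rows = conn.execute("SELECT id, prefix_id FROM item ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(assign_prefixes())
```

With at most 500 prefixes, scanning the candidates of one group per item may well be acceptable, but that would need measuring on the real tables.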
Unless I'm overly simplifying, should be as simple as... Start an inner pre-query to get the longest prefix (regardless of if multiple have the same length per code)
select
PreQuery.Code,
P2.ID,
P2.SubGroup_ID,
P2.Prefix
From
( select
i.code,
max( length( trim( p.Prefix ))) as LongestPrefix
from
item i
join prefix p
on i.prefix_id = p.id
group by
i.code ) PreQuery
Join item i2
on PreQuery.Code = i2.Code
Join Prefix P2
on i2.Prefix_ID = P2.ID
AND PreQuery.LongestPrefix = length( trim( P2.Prefix ))
Now, if you want to do something special about those where there are multiple with the same prefix length, it will need some adjusting, but this should get it for you.
To re-answer since you are trying to UPDATE elements, try the following update query. Now here's the catch around this... The "PreQuery" will actually return ALL matching prefixes for a given item... However, since the order is based on the Prefix Length, for those entries that have more than one matching "prefix", it will first be updated with the shortest prefix, then hit the record with the next longer prefix, and finally end with whichever has the longest for the match. So at the end, it SHOULD get you what you need.
That being said (and I can't specifically test now), if it is only updating based on the FIRST entry found for a given ID, then just make the order in DESCENDING order of the prefix length.
update Item,
( SELECT
I.ID,
P.ID Prefix_ID,
P.Prefix,
I.Code,
LENGTH( TRIM( P.Prefix )) as PrefixLen
FROM
Item I
JOIN SubGroup SG
ON I.Group_ID = SG.Group_ID
JOIN Prefix P
ON SG.ID = P.SubGroup_ID
AND LEFT( P.Prefix, LENGTH( TRIM( P.Prefix )))
= LEFT( I.Code, LENGTH( TRIM( P.Prefix )))
ORDER BY
I.ID,
LENGTH( TRIM( P.Prefix )) ) PreQuery
set
Item.Prefix_ID = PreQuery.Prefix_ID
where
Item.ID = PreQuery.ID