Grouped Index in MySQL

I'm building a table in MySQL and need to build a "grouped index" column that increments, but resets for each new value in another column. Like this:
1 Apple
2 Apple
3 Apple
1 Banana
2 Banana
1 Pear
2 Pear
3 Pear
4 Pear
Any ideas how I can do this?

If you are running MySQL 8.0, just use row_number():
select
row_number() over(partition by fruit order by ?) rn,
fruit
from mytable
Note that, for the query to produce consistent results, you need another column that can be used to order the records within each fruit. I represented that as ? in the query.
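As a minimal sketch of the query above, assuming the table has an auto-increment id column to stand in for the ? placeholder (the id column is hypothetical and not part of the question):

```sql
-- Assumes MySQL 8.0+; `id` is a hypothetical ordering column.
CREATE TABLE mytable (
  id INT AUTO_INCREMENT PRIMARY KEY,
  fruit VARCHAR(6)
);
INSERT INTO mytable (fruit) VALUES
  ('Apple'), ('Apple'), ('Apple'),
  ('Banana'), ('Banana'),
  ('Pear'), ('Pear'), ('Pear'), ('Pear');

-- Number the rows within each fruit, in insertion (id) order.
SELECT
  ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id) AS rn,
  fruit
FROM mytable
ORDER BY fruit, rn;
```

Any column that orders the rows deterministically within a fruit (a timestamp, for example) would serve the same purpose as id here.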

If you use MySQL 5.x, you can use this query:
CREATE TABLE fruit (
`fruit` VARCHAR(6)
);
INSERT INTO fruit
( `fruit`)
VALUES
( 'Apple'),
( 'Apple'),
( 'Apple'),
( 'Banana'),
( 'Banana'),
( 'Pear'),
( 'Pear'),
( 'Pear'),
( 'Pear');
SELECT
IF(fruit = @fruit, @row_number := @row_number +1,@row_number := 1) rownumber
,@fruit := fruit
FROM
(SELECT * From fruit ORDER BY fruit ASC) t, (SELECT @row_number := 0) a,(SELECT @fruit := '') b ;
rownumber | @fruit := fruit
--------: | :--------------
1 | Apple
2 | Apple
3 | Apple
1 | Banana
2 | Banana
1 | Pear
2 | Pear
3 | Pear
4 | Pear
The columns must appear in this order for the algorithm to work, because @fruit has to be compared with the current row before it is reassigned. If you need a different column order in the output, wrap the query in an outer SELECT.
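A rough sketch of that outer SELECT, assuming the fruit table created above. The aliases fruit_name and t2 are made up for illustration, and the inner query relies on the same left-to-right evaluation of user variables as the original, which MySQL does not formally guarantee:

```sql
-- The inner query keeps the column order the variable logic depends on;
-- the outer SELECT only rearranges the output columns.
SELECT t2.fruit_name, t2.rownumber
FROM (
  SELECT
    IF(fruit = @fruit, @row_number := @row_number + 1, @row_number := 1) AS rownumber,
    @fruit := fruit AS fruit_name
  FROM
    (SELECT * FROM fruit ORDER BY fruit ASC) t,
    (SELECT @row_number := 0) a,
    (SELECT @fruit := '') b
) t2;
```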

Related

How to ORDER a list where the same value doesn't appear twice in a row?

I'm returning a list of results from a database but because of a design feature I need a specific order.
The results should return randomly. The only criteria is that one of the values should not appear twice in a row.
Here's the example data:
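+----+-----------+--------+
| id | animals   | color  |
+----+-----------+--------+
| 1  | hamster   | brown  |
| 2  | dog       | brown  |
| 3  | horse     | white  |
| 4  | mouse     | gray   |
| 5  | cat       | black  |
| 6  | bird      | orange |
| 7  | snake     | green  |
| 8  | monkey    | orange |
| 9  | chameleon | green  |
+----+-----------+--------+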
So I have a list of animals and their individual colours in the table. I want to return a list of 5 of these animals, randomly ordered, but without the same colour showing up twice in a row. So the dog can't show up right after the hamster, and the chameleon can't show up right after the snake, etc.
I have solved this with PHP in the past. But I'm looking for a faster and smarter solution and hopefully in MySQL only.
Let me know :-)
Well, if you're using a recent version of MySQL (8.0+), you can do something like this.
The first CTE term provides the data. You can replace that with any list of data you wish, directly from some table or the result of a complex query expression.
rn0 is the order of the randomly ordered data.
@Zakaria is correct. Here's the adjusted SQL to handle just the requirement that consecutive rows should not have the same color, after randomly ordering the data.
Basically, this randomly orders the data and then takes just the first edge of each color island, and limits the result to 5 islands.
WITH data (id,animals,color) AS (
SELECT 1 AS id, 'hamster' AS animals , 'brown' AS color UNION
SELECT 2, 'dog' , 'brown' UNION
SELECT 3, 'horse' , 'white' UNION
SELECT 4, 'mouse' , 'gray' UNION
SELECT 5, 'cat' , 'black' UNION
SELECT 6, 'bird' , 'orange' UNION
SELECT 7, 'snake' , 'green' UNION
SELECT 8, 'monkey' , 'orange' UNION
SELECT 9, 'chameleon' , 'green'
)
, list1 AS (
SELECT id, animals, color, ROW_NUMBER() OVER (ORDER BY rand()) AS rn0 FROM data
)
, list AS (
SELECT *, CASE WHEN color = LAG(color) OVER (ORDER BY rn0) THEN 0 ELSE 1 END AS good
FROM list1
)
SELECT *
FROM list
WHERE good = 1
ORDER BY rn0
LIMIT 5
;
An example result:
+----+-----------+--------+-----+------+
| id | animals | color | rn0 | good |
+----+-----------+--------+-----+------+
| 9 | chameleon | green | 1 | 1 |
| 2 | dog | brown | 3 | 1 |
| 6 | bird | orange | 4 | 1 |
| 1 | hamster | brown | 5 | 1 |
| 3 | horse | white | 6 | 1 |
+----+-----------+--------+-----+------+
The original SQL is below. It does more than was requested: it requires every color in the result to be distinct.
WITH data (id,animals,color) AS (
SELECT 1, 'hamster' , 'brown' UNION
SELECT 2, 'dog' , 'brown' UNION
SELECT 3, 'horse' , 'white' UNION
SELECT 4, 'mouse' , 'gray' UNION
SELECT 5, 'cat' , 'black' UNION
SELECT 6, 'bird' , 'orange' UNION
SELECT 7, 'snake' , 'green' UNION
SELECT 8, 'monkey' , 'orange' UNION
SELECT 9, 'chameleon' , 'green'
)
, list AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY rand()) AS rn0 FROM data
)
, step1 AS (
SELECT list.*, ROW_NUMBER() OVER (PARTITION BY color ORDER BY rn0) AS rn
FROM list
)
SELECT *
FROM step1
WHERE rn = 1
ORDER BY rn0
LIMIT 5
;
Sample result:
+----+---------+--------+-----+----+
| id | animals | color | rn0 | rn |
+----+---------+--------+-----+----+
| 7 | snake | green | 1 | 1 |
| 6 | bird | orange | 2 | 1 |
| 3 | horse | white | 3 | 1 |
| 1 | hamster | brown | 5 | 1 |
| 5 | cat | black | 6 | 1 |
+----+---------+--------+-----+----+
Do you mean something like this?
select any_value(name), color from animals group by color order by rand() limit 5;
Fiddle here: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=ff1854c9698e2c60deaf9131ea87774c

MySQL: Sequentially number a column based on change in a different column

If I have a table with the following columns and values, ordered by parent_id:
id parent_id line_no
-- --------- -------
1 2
2 2
3 2
4 3
5 4
6 4
And I want to populate line_no with a sequential number that starts over at 1 every time the value of parent_id changes:
id parent_id line_no
-- --------- -------
1 2 1
2 2 2
3 2 3
4 3 1
5 4 1
6 4 2
What would the query or sproc look like?
NOTE: I should point out that I only need to do this once. There's a new function in my PHP code that automatically creates the line_no every time a new record is added. I just need to update the records that already exist.
Versions of MySQL before 8.0 do not support row_number(), so you can do this using variables. But you have to be very careful: MySQL does not guarantee the order of evaluation of expressions in the select, so a variable should not be assigned and referenced in different expressions.
So:
select t.*,
(@rn := if(@p = parent_id, @rn + 1,
if(@p := parent_id, 1, 1)
)
) as line_no
from (select t.* from t order by id) t cross join
(select @p := 0, @rn := 0) params;
The subquery that sorts the table may not be necessary in older versions. Somewhere around version 5.7, it became necessary when using variables.
EDIT:
Updating with variables is fun. In this case, I would just use subqueries with the above:
update t join
(select t.*,
(@rn := if(@p = parent_id, @rn + 1,
if(@p := parent_id, 1, 1)
)
) as new_line_no
from (select t.* from t order by id) t cross join
(select @p := 0, @rn := 0) params
) tt
on t.id = tt.id
set t.line_no = tt.new_line_no;
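On MySQL 8.0 or later, the same one-off update can be sketched with ROW_NUMBER() instead of variables. This assumes the table t (id, parent_id, line_no) from the question and is an illustration, not part of the original answer:

```sql
-- Assumes MySQL 8.0+ and the table t(id, parent_id, line_no) from the question.
UPDATE t
JOIN (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY id) AS new_line_no
  FROM t
) tt ON t.id = tt.id
SET t.line_no = tt.new_line_no;
```

Because the numbering comes from a window function, it does not depend on the order in which variables are evaluated.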
Or, a little more old school...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,parent_id INT NOT NULL
);
INSERT INTO my_table VALUES
(1, 2),
(2 , 2),
(3 , 2),
(4 , 3),
(5 , 4),
(6 , 4);
SELECT x.*
, CASE WHEN @prev = parent_id THEN @i := @i+1 ELSE @i := 1 END i
, @prev := parent_id prev
FROM my_table x
, (SELECT @prev:=null,@i:=0) vars
ORDER
BY parent_id,id;
+----+-----------+------+------+
| id | parent_id | i | prev |
+----+-----------+------+------+
| 1 | 2 | 1 | 2 |
| 2 | 2 | 2 | 2 |
| 3 | 2 | 3 | 2 |
| 4 | 3 | 1 | 3 |
| 5 | 4 | 1 | 4 |
| 6 | 4 | 2 | 4 |
+----+-----------+------+------+
You can use a correlated subquery if row_number() is not available:
select t.*,
(select count(*)
from my_table t1
where t1.parent_id = t.parent_id and t1.id <= t.id
) as line_no
from my_table t;

Outliers of data by groups

I want to analyse outliers of grouped data. Let's say I have this data:
+--------+---------+-------+
| fruit | country | price |
+--------+---------+-------+
| apple | UK | 1 |
| apple | USA | 3 |
| apple | LT | 2 |
| apple | LV | 5 |
| apple | EE | 4 |
| pear | SW | 6 |
| pear | NO | 2 |
| pear | FI | 3 |
| pear | PL | 7 |
+--------+---------+-------+
Let's take pears. If my method of finding outliers were to take the highest 25% and the lowest 25% of pear prices, the outliers for pears would be
+--------+---------+-------+
| pear | NO | 2 |
| pear | PL | 7 |
+--------+---------+-------+
As for apples:
+--------+---------+-------+
| apple | UK | 1 |
| apple | LV | 5 |
+--------+---------+-------+
What I want is to create a view that shows the union of the outliers of all fruits. With this view I could analyse only the tails, and also compare the view against the main table to get a table without outliers. That's my goal. A solution to this would be:
(SELECT * FROM fruits f WHERE f.fruit = 'pear' ORDER BY f.price ASC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'pear')
)
union all
(SELECT * FROM fruits f WHERE f.fruit = 'pear' ORDER BY f.price DESC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'pear')
)
union all
(SELECT * FROM fruits f WHERE f.fruit = 'apple' ORDER BY f.price ASC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'apple')
)
union all
(SELECT * FROM fruits f WHERE f.fruit = 'apple' ORDER BY f.price DESC
LIMIT (SELECT ROUND(COUNT(*) * 0.25,0)
FROM fruits f2
WHERE f2.fruit = 'apple')
)
This would give me the table I want, but the code after LIMIT doesn't seem to be correct. Another problem is the number of groups. In this example there are only two groups (pears, apples), but in my actual data there are around 100. So the 'union all' should somehow go through all unique fruits automatically, without code written for each one: find the number of outliers for each unique fruit, take only that number of rows, and show it all in another table (view).
You can't supply LIMIT with a value from a subquery, in any RDBMS I'm aware of. Some dbs don't even allow host variables/parameters in their versions of the clause (I'm thinking of iSeries DB2).
This is essentially a greatest-n-per-group problem. Similar queries in most other RDBMSs are solved with what are called Windowing functions - essentially, you're looking at a movable selection of data.
MySQL (before 8.0) doesn't have this functionality, so we have to counterfeit it. The actual mechanics of the query will depend on the actual data you need, so I can only speak to what you're attempting here. The techniques should be generally adaptable, but may require rather more creativity than otherwise.
To start with, you want a function that returns a number indicating each row's position. I'm assuming duplicate prices should be given the same rank (ties), and that doing so won't create a gap in the numbering. This is essentially the DENSE_RANK() windowing function. We can get these results by doing the following:
SELECT fruit, country, price,
@Rnk := IF(@last_fruit <> fruit, 1,
IF(@last_price = price, @Rnk, @Rnk + 1)) AS Rnk,
@last_fruit := fruit,
@last_price := price
FROM Fruits
JOIN (SELECT @Rnk := 0) n
ORDER BY fruit, price
... Which generates the following for the 'apple' group:
fruit country price rank
=============================
apple UK 1 1
apple LT 2 2
apple USA 3 3
apple EE 4 4
apple LV 5 5
Now, you're trying to get the top/bottom 25% of rows. In this case, you need a count of distinct prices:
SELECT fruit, COUNT(DISTINCT price)
FROM Fruits
GROUP BY fruit
... And now we just need to join this to the previous statement to limit the top/bottom:
SELECT RankedFruit.fruit, RankedFruit.country, RankedFruit.price
FROM (SELECT fruit, COUNT(DISTINCT price) AS priceCount
FROM Fruits
GROUP BY fruit) CountedFruit
JOIN (SELECT fruit, country, price,
@Rnk := IF(@last_fruit <> fruit, 1,
IF(@last_price = price, @Rnk, @Rnk + 1)) AS rnk,
@last_fruit := fruit,
@last_price := price
FROM Fruits
JOIN (SELECT @Rnk := 0) n
ORDER BY fruit, price) RankedFruit
ON RankedFruit.fruit = CountedFruit.fruit
AND (RankedFruit.rnk > ROUND(CountedFruit.priceCount * .75)
OR RankedFruit.rnk <= ROUND(CountedFruit.priceCount * .25))
...which yields the following:
fruit country price
=======================
apple UK 1
apple LV 5
pear NN 2
pear NO 2
pear PL 7
(I duplicated a pear row to show "tied" prices.)
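On MySQL 8.0 or later, the same top/bottom 25% filter can be sketched with real window functions. This is an illustration under assumptions, not part of the original answer: it assumes the Fruits(fruit, country, price) table used above and that price is never NULL. The CTE names ranked and counted are made up for the example.

```sql
-- Assumes MySQL 8.0+ and a Fruits(fruit, country, price) table with non-NULL prices.
WITH ranked AS (
  SELECT fruit, country, price,
         DENSE_RANK() OVER (PARTITION BY fruit ORDER BY price) AS rnk
  FROM Fruits
),
counted AS (
  SELECT ranked.*,
         -- The highest dense rank in a group equals its number of distinct prices.
         MAX(rnk) OVER (PARTITION BY fruit) AS priceCount
  FROM ranked
)
SELECT fruit, country, price
FROM counted
WHERE rnk > ROUND(priceCount * .75)
   OR rnk <= ROUND(priceCount * .25)
ORDER BY fruit, price;
```

MAX(rnk) is used for the per-group count because MySQL does not accept DISTINCT inside an aggregate that is used as a window function.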
Does round not need 2 / 3 arguments? I.e. do you not need to put in, to what decimal place you wish to round?
so
...
LIMIT (SELECT ROUND(COUNT(*) * 0.25)
FROM #fruits f2
WHERE f2.fruit = 'apple')
becomes
...
LIMIT (SELECT ROUND(COUNT(*) * 0.25,2)
FROM #fruits f2
WHERE f2.fruit = 'apple')
also, just having a quick look at lunch, but it looks like you're just expecting the min / max values. Could you not just use those functions instead?

Update of MySQL table column to sequential digit based on another column

The current table looks something like this:
[id | section | order | thing]
[1 | fruits | 0 | apple]
[2 | fruits | 0 | banana]
[3 | fruits | 0 | avocado]
[4 | veggies | 0 | tomato]
[5 | veggies | 0 | potato]
[6 | veggies | 0 | spinach]
I'm wondering how to make the table look more like this:
[id | section | order | thing]
[1 | fruits | 1 | apple]
[2 | fruits | 2 | banana]
[3 | fruits | 3 | avocado]
[4 | veggies | 1 | tomato]
[5 | veggies | 2 | potato]
[6 | veggies | 3 | spinach]
"order" column updated to a sequential number, starting at 1, based on "section" column and "id" column.
You can do this with an update by using a join. The second table to the join calculates the ordering, which is then used for the update:
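update t join
(select t.*,
@rn := if(@prev = t.section, @rn + 1, 1) as rn,
@prev := t.section as prev -- remember this row's section for the next comparison
from (select * from t order by section, id) t cross join
(select @rn := 0, @prev := '') const
) tsum
on t.id = tsum.id
set t.ordering = tsum.rn -- ordering stands in for the question's `order` column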
You don't want to do this as an UPDATE, as that will be really slow.
Instead, do this on INSERT. Here's a simple one-line INSERT that grabs the next order number and inserts a record called 'kiwi' in the section 'fruits'.
INSERT INTO `table_name` (`section`, `order`, `thing`)
SELECT 'fruits', MAX(`order`) + 1, 'kiwi'
FROM `table_name`
WHERE `section` = 'fruits'
EDIT: You could also do this using an insert trigger, e.g.:
DELIMITER $$
CREATE TRIGGER `trigger_name`
BEFORE INSERT ON `table_name`
FOR EACH ROW
BEGIN
SET NEW.`order` = (SELECT MAX(`order`) + 1 FROM `table_name` WHERE `section` = NEW.`section`);
END$$
DELIMITER ;
Then you could just insert your records as usual, and they will auto-update the order value.
INSERT INTO `table_name` (`section`, `thing`)
VALUES ('fruits', 'kiwi')
Rather than storing the ordering, you could derive it:
SELECT t.id
,t.section
,@row_num := IF (@prev_section = t.section, @row_num+1, 1) AS ordering
,t.thing
,@prev_section := t.section
FROM myTable t
,(SELECT @row_num := 1) x
,(SELECT @prev_section := '') y
ORDER BY t.section, t.id
Note that order is a keyword and is therefore not the greatest for a column name. You could quote the column name or give it a different name...
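On MySQL 8.0 or later, the derived ordering can be sketched without variables. This assumes the myTable(id, section, thing) table from the query above and is an illustration only:

```sql
-- Assumes MySQL 8.0+ and the table myTable(id, section, thing) from the query above.
SELECT id,
       section,
       ROW_NUMBER() OVER (PARTITION BY section ORDER BY id) AS ordering,
       thing
FROM myTable
ORDER BY section, id;
```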

how to condense consecutive duplicate rows in Mysql?

In a mysql database, how can I condense all consecutive duplicates into 1 while maintaining order in a select output?
data:
id fruit
----------
1 Apple
2 Banana
3 Banana
4 Banana
5 Apple
6 Mango
7 Mango
8 Apple
Output I want:
fruit
-------
Apple
Banana
Apple
Mango
Apple
This is a very easy thing to do in unix with the uniq command, but 'distinct' is not as flexible.
IDs are not sequential, and gaps are possible. I was oversimplifying in my example.
Select could be like this:
data:
id fruit
----------
100 Apple
2 Banana
30 Banana
11 Banana
50 Apple
62 Mango
7 Mango
4 Apple
Try this - assuming consecutive IDs with no gaps.
SELECT T.fruit
FROM YOURTABLE T
LEFT JOIN YOURTABLE T2
ON T2.ID = T.ID + 1
WHERE T2.fruit <> T.fruit
OR T2.ID IS NULL
ORDER BY T.ID
Use:
SELECT x.fruit
FROM (SELECT t.fruit,
CASE
WHEN @fruit <> t.fruit THEN @rownum := @rownum + 1
ELSE @rownum
END as grp,
@fruit := t.fruit as prev_fruit -- remember this row's fruit for the next comparison
FROM YOUR_TABLE t
JOIN (SELECT @rownum := 0, @fruit := '') r
ORDER BY t.id) x
GROUP BY x.fruit, x.grp
ORDER BY x.grp
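On MySQL 8.0 or later, a LAG()-based sketch avoids user variables and does not need consecutive IDs. It assumes a table YOUR_TABLE(id, fruit) with non-NULL fruit values, and that ascending id defines the row sequence. The alias prev_fruit is made up for the example.

```sql
-- Assumes MySQL 8.0+ and a table YOUR_TABLE(id, fruit); rows are taken in id order.
SELECT fruit
FROM (
  SELECT id, fruit,
         LAG(fruit) OVER (ORDER BY id) AS prev_fruit
  FROM YOUR_TABLE
) x
WHERE prev_fruit IS NULL      -- the first row has no predecessor
   OR prev_fruit <> fruit     -- the value changed from the previous row
ORDER BY id;
```

Each row is kept only if it starts a new run of identical values, which is the behavior of the unix uniq command mentioned in the question.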