I have a table that can have some valid duplicated values so I need an additional column with the sequence number of appearance of said duplicate for future use.
A sample might be
ROW | COLUMN_A | COLUMN_B | COLUMN_C | SEQ_NUM <= Want this column
1   | A        | B        | 1        | 1
2   | A        | B        | 1        | 2
3   | A        | B        | 2        | 1
4   | A        | B        | 2        | 2
5   | A        | B        | 2        | 3
The values are supposed to be unique like (COLUMN_A, COLUMN_B, COLUMN_C), but I cannot use a unique index because I need those duplicated values as well; I just need to keep track of the order of appearance. So I added a column SEQ_NUM to keep track of those repetitions.
And I fill it like this:
begin
declare done boolean default false;
declare _A varchar(1);
declare _B varchar(1);
declare _C integer unsigned;
declare cur cursor for
select COLUMN_A , COLUMN_B , COLUMN_C
from tmp_horario
group by COLUMN_A , COLUMN_B , COLUMN_C
having count(*) > 1; -- Here I loop through the repeated values
declare continue handler for not found set done := true;
open cur;
loop_dup: loop
fetch cur into _A, _B, _C;
if done then
leave loop_dup;
end if;
set @_seq = 0; -- I initialize my sequence in 0 to start
update tmp_horario h
set h.SEQ_NUM = (@_seq := @_seq + 1) -- Set the next sequential to the repeated values
where h.COLUMN_A = _A
and h.COLUMN_B = _B
and h.COLUMN_C = _C;
end loop loop_dup;
close cur;
end;
Note: The table has way more columns making the cursor (fetch into) a bigger pain.
As you can see, that works like a charm, except that it takes my stored procedure from 20 s to 80 s, which I find a little disappointing (I already checked the indexes and they are being used properly). I believe the problem lies in the use of the cursor.
My question then is: is there a way of setting that sequence number in a single query, without the cursor?
Assuming you want this to happen when you insert a value to the table you could do this as such:
INSERT INTO tmp_horario(COLUMN_A, COLUMN_B, COLUMN_C, SEQ_NUM)
VALUE(A_VAL, B_VAL, C_VAL, (IFNULL((
SELECT MAX(SEQ_NUM)
FROM tmp_horario AS a
WHERE a.COLUMN_A = A_VAL AND a.COLUMN_B = B_VAL AND a.COLUMN_C = C_VAL), 0)+1));
The basic premise is that you look for rows with the same values, get the maximum sequence value if one exists, and add one to it for the new row. If no match is found, the inserted value is one. The IFNULL expression is really all you need to get the SEQ_NUM, should you need to adapt this query.
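To try the idea without touching a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module instead of MySQL. The helper name insert_with_seq and the lower-case column names are made up for the illustration. It only shows the "maximum plus one, or one if nothing matches" logic; if several sessions insert the same triple concurrently, they could read the same maximum unless the insert is serialized somehow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tmp_horario "
    "(column_a TEXT, column_b TEXT, column_c INTEGER, seq_num INTEGER)"
)


def insert_with_seq(conn, a, b, c):
    """Insert (a, b, c) with seq_num = highest seq_num of that triple + 1."""
    conn.execute(
        """
        INSERT INTO tmp_horario (column_a, column_b, column_c, seq_num)
        VALUES (?, ?, ?, (
            SELECT IFNULL(MAX(seq_num), 0) + 1
            FROM tmp_horario
            WHERE column_a = ? AND column_b = ? AND column_c = ?
        ))
        """,
        (a, b, c, a, b, c),
    )


# Same data as the sample table in the question.
for triple in [("A", "B", 1), ("A", "B", 1), ("A", "B", 2), ("A", "B", 2), ("A", "B", 2)]:
    insert_with_seq(conn, *triple)

rows = conn.execute(
    "SELECT column_a, column_b, column_c, seq_num FROM tmp_horario ORDER BY rowid"
).fetchall()
print(rows)
```

The last column of the printed rows should follow the SEQ_NUM column of the sample in the question.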
Yes pretty much like your cursor
DROP TABLE IF EXISTS T;
CREATE TABLE T(ROW INT, COLUMN_A VARCHAR(1), COLUMN_B VARCHAR(1), COLUMN_C VARCHAR(1), SEQ_NUM INT);
INSERT INTO T VALUES
(1 , 'A' , 'B' , 1,NULL),
(2 , 'A' , 'B' , 1,NULL),
(3 , 'A' , 'B' , 2,NULL),
(4 , 'A' , 'B' , 2,NULL),
(5 , 'A' , 'B' , 2,NULL);
UPDATE T
JOIN (
SELECT T.ROW,
IF(CONCAT(T.COLUMN_A,T.COLUMN_B,T.COLUMN_C) <> @P , @RN:=1,@RN:=@RN+1) RN,
@P:=CONCAT(T.COLUMN_A,T.COLUMN_B,T.COLUMN_C) P
FROM T , (SELECT @RN:=0,@P:=0) R
ORDER BY ROW
) S ON S.ROW = T.ROW
SET SEQ_NUM = S.RN
WHERE 1 = 1;
MariaDB [sandbox]> SELECT * FROM T;
+------+----------+----------+----------+---------+
| ROW | COLUMN_A | COLUMN_B | COLUMN_C | SEQ_NUM |
+------+----------+----------+----------+---------+
| 1 | A | B | 1 | 1 |
| 2 | A | B | 1 | 2 |
| 3 | A | B | 2 | 1 |
| 4 | A | B | 2 | 2 |
| 5 | A | B | 2 | 3 |
+------+----------+----------+----------+---------+
5 rows in set (0.00 sec)
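The variable trick compares each row with the previous one, so it relies on rows of the same group being next to each other in ROW order. As a cross-check of what number each row should get, here is a small sketch in Python with the built-in sqlite3 module (not MySQL) that counts, for every row, the rows of the same group up to and including it. The column name row_id is an assumption standing in for ROW. MySQL itself rejects a subquery that reads the table being updated, so treat this as an illustration of the logic rather than a drop-in replacement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (row_id INTEGER, column_a TEXT, column_b TEXT, "
    "column_c INTEGER, seq_num INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, NULL)",
    [(1, "A", "B", 1), (2, "A", "B", 1), (3, "A", "B", 2), (4, "A", "B", 2), (5, "A", "B", 2)],
)

# seq_num = number of rows in the same (a, b, c) group with row_id <= this row's.
conn.execute(
    """
    UPDATE t
    SET seq_num = (
        SELECT COUNT(*)
        FROM t AS earlier
        WHERE earlier.column_a = t.column_a
          AND earlier.column_b = t.column_b
          AND earlier.column_c = t.column_c
          AND earlier.row_id <= t.row_id
    )
    """
)

seq = conn.execute("SELECT row_id, seq_num FROM t ORDER BY row_id").fetchall()
print(seq)
```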
I guess this is going to be a very easy one for anyone who knows SQL programming...
SQLFiddle
Here is the code I am executing against the database:
DECLARE @maxCounter int
-- Used to get the maximum bound number for my loop, basically what is the highest number of records. Tested, seems to work as expected.
SET @maxCounter = (SELECT TOP 1 COUNT(SN)
FROM TestResults
WHERE Type = 'EX'
GROUP BY SN
ORDER BY COUNT(SN) DESC)
CREATE TABLE #Info
(
DLoc VARCHAR(500),
DCode VARCHAR(500),
Dobs VARCHAR(500)
)
DECLARE @counter INT
SET @counter = 1
WHILE @counter <= @maxCounter
BEGIN
INSERT INTO #Info (DLoc)
VALUES ('Location_' + CAST(@counter AS VARCHAR(16)))
INSERT INTO #Info (DCode)
VALUES ('Code_' + CAST(@counter AS VARCHAR(16)))
INSERT INTO #Info (Dobs)
VALUES ('Observation_' + CAST(@counter AS VARCHAR(16)))
SET @counter = @counter + 1
END
SELECT * FROM #Info;
DROP TABLE #Info;
If you see some weird things in the code, then that is because I am a total beginner and do not know any better.
Expected output of the while loop:
+------------+---------+---------------+
| defLoc | defCode | obs |
+------------+---------+---------------+
| Location_1 | Code_1 | Observation_1 |
| Location_2 | Code_2 | Observation_2 |
| Location_3 | Code_3 | Observation_3 |
+------------+---------+---------------+
Unexpected output result:
defLoc | defCode | obs |
-----------------+-------------+---------------|
Location_1 | | |
| Code_1 | |
| | Observation_1 |
Location_2 | | |
| Code_2 | |
| | Observation_2 |
Location_3 | | |
| Code_3 | |
| | Observation_3 |
I have no clue where the empty cells come from...
You need to use ONE INSERT for each iteration, and specify all three columns and their values in one go:
WHILE @counter <= @maxCounter
BEGIN
INSERT INTO #Info (DLoc, DCode, Dobs)
VALUES ('Location_' + CAST(@counter AS VARCHAR(16)),
'Code_' + CAST(@counter AS VARCHAR(16)),
'Observation_' + CAST(@counter AS VARCHAR(16))
)
SET @counter = @counter + 1
END
Each INSERT will insert a whole row - the value you provided is inserted into the column you specified, but the other columns will all remain NULL.
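If you want to see the difference in isolation, here is a minimal sketch using Python's built-in sqlite3 module (not SQL Server). The table name info and the lower-case column names are stand-ins for #Info and its columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE info (dloc TEXT, dcode TEXT, dobs TEXT)")

# Three single-column INSERTs: three rows, each with two NULL columns.
conn.execute("INSERT INTO info (dloc) VALUES ('Location_1')")
conn.execute("INSERT INTO info (dcode) VALUES ('Code_1')")
conn.execute("INSERT INTO info (dobs) VALUES ('Observation_1')")
split_rows = conn.execute("SELECT * FROM info ORDER BY rowid").fetchall()

conn.execute("DELETE FROM info")

# One INSERT that names all three columns: a single complete row.
conn.execute(
    "INSERT INTO info (dloc, dcode, dobs) VALUES (?, ?, ?)",
    ("Location_1", "Code_1", "Observation_1"),
)
single_row = conn.execute("SELECT * FROM info ORDER BY rowid").fetchall()

print(split_rows)  # NULLs show up as None
print(single_row)
```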
You can insert each row with 3 different column values with a single insert statement in a loop:
WHILE @counter <= @maxCounter
BEGIN
INSERT INTO #Info (DLoc,DCode,Dobs)
VALUES (
'Location_' + CAST(@counter AS VARCHAR(16))
, 'Code_' + CAST(@counter AS VARCHAR(16))
, 'Observation_' + CAST(@counter AS VARCHAR(16))
);
SET @counter = @counter + 1;
END;
Alternatively, you could insert all rows at once in a single INSERT...SELECT statement using a tally table as the source. If you have no tally table, you can use a CTE to generate the set of numbers. The example below uses a CTE with ROW_NUMBER() that can be extended as needed.
WITH
t10 AS (SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) t(n))
,t1k AS (SELECT 0 AS n FROM t10 AS a CROSS JOIN t10 AS b CROSS JOIN t10 AS c)
,t1m AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS num FROM t1k AS a CROSS JOIN t1k AS b CROSS JOIN t1k AS c)
INSERT INTO #Info (DLoc,DCode,Dobs)
SELECT
'Location_' + CAST(num AS VARCHAR(16))
, 'Code_' + CAST(num AS VARCHAR(16))
, 'Observation_' + CAST(num AS VARCHAR(16))
FROM t1m
WHERE num <= @maxCounter;
In SQL Server, set-based operations generally outperform procedural looping constructs by orders of magnitude.
Consider the following table:
CREATE TABLE foo (
id INT PRIMARY KEY,
effective_date DATETIME NOT NULL UNIQUE
)
Given a set of dates D, how do you fetch all rows from foo whose effective_date is the greatest value less than each date in D in a single query?
For simplicity, assume that each date will have exactly one matching row.
Suppose foo has the following rows.
---------------------
| id |effective_date|
---------------------
| 0 | 2013-01-07|
---------------------
| 1 | 2013-02-03|
---------------------
| 2 | 2013-04-19|
---------------------
| 3 | 2013-04-20|
---------------------
| 4 | 2013-05-11|
---------------------
| 5 | 2013-06-30|
---------------------
| 6 | 2013-12-08|
---------------------
If you were given D = {2013-02-20, 2013-06-30, 2013-12-19}, the query should return the following:
---------------------
| id |effective_date|
---------------------
| 1 | 2013-02-03|
| 4 | 2013-05-11|
| 6 | 2013-12-08|
If D had only one element, say D = {2013-06-30}, you could just do:
SELECT *
FROM foo
WHERE effective_date = (SELECT MAX(effective_date) FROM foo WHERE effective_date < '2013-06-30')
How do you generalize this query when the size of D is greater than 1, assuming D will be specified in an IN clause?
Actually, your problem is that you have a list of values, which MySQL will in most cases treat as a row and not as a set. One possible solution is to generate the set properly in the application, so that it looks like:
SELECT '2013-02-20'
UNION ALL
SELECT '2013-06-30'
UNION ALL
SELECT '2013-12-19'
and then use the resulting set inside a JOIN. It would be great if MySQL accepted a static list in ANY subqueries, as it does for the IN keyword, but it can't. ANY also expects a row set, not a list (which would be treated as a single row with N columns, where N is the number of items in your list).
Fortunately, your particular case has an important restriction: there can be no more items in the list than there are rows in your foo table (it makes no sense otherwise). So you can build that list dynamically and then use it like this:
SELECT
foo.*,
final.period
FROM
(SELECT
period,
MAX(foo.effective_date) AS max_date
FROM
(SELECT
period
FROM
(SELECT
ELT(@i:=@i+1, '2013-02-20', '2013-06-30', '2013-12-19') AS period
FROM
foo
CROSS JOIN (SELECT @i:=0) AS init) AS dates
WHERE period IS NOT NULL) AS list
LEFT JOIN foo
ON foo.effective_date<list.period
GROUP BY period) AS final
LEFT JOIN foo
ON final.max_date=foo.effective_date
Your list will be iterated automatically via ELT(), so you can pass it directly to the query without any additional restructuring. Note, however, that this method iterates through all foo records to produce the row set; it works, but doing this in the application may be better in terms of performance.
The demo for your table can be found here.
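For the "do it in the application" route, a minimal Python sketch might look like the following. It assumes the rows of foo have already been fetched sorted by effective_date; the function name latest_before is made up for the illustration. A binary search finds, for each date in D, the latest effective_date strictly before it.

```python
from bisect import bisect_left
from datetime import date

# (effective_date, id) pairs from foo, sorted by effective_date ascending.
foo = [
    (date(2013, 1, 7), 0),
    (date(2013, 2, 3), 1),
    (date(2013, 4, 19), 2),
    (date(2013, 4, 20), 3),
    (date(2013, 5, 11), 4),
    (date(2013, 6, 30), 5),
    (date(2013, 12, 8), 6),
]
effective_dates = [d for d, _ in foo]


def latest_before(target):
    """Return (id, effective_date) of the latest row strictly before target, or None."""
    i = bisect_left(effective_dates, target)  # count of dates strictly less than target
    if i == 0:
        return None  # nothing earlier than target
    d, row_id = foo[i - 1]
    return row_id, d


D = [date(2013, 2, 20), date(2013, 6, 30), date(2013, 12, 19)]
matches = [latest_before(d) for d in D]
print(matches)
```

With the sample data this picks ids 1, 4 and 6, the same rows as the expected result in the question.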
perhaps this can help :
SELECT *
FROM foo
WHERE effective_date IN
(
(SELECT MAX(effective_date) FROM foo WHERE effective_date < '2013-02-20'),
(SELECT MAX(effective_date) FROM foo WHERE effective_date < '2013-06-30'),
(SELECT MAX(effective_date) FROM foo WHERE effective_date < '2013-12-19')
)
result :
---------------------
| id |effective_date|
---------------------
| 1 | 2013-02-03| -- different
| 4 | 2013-05-11|
| 6 | 2013-12-08|
UPDATE - 06 December
create procedure :
DELIMITER $$
USE `test`$$ /*change database name*/
DROP PROCEDURE IF EXISTS `myList`$$
CREATE PROCEDURE `myList`(ilist VARCHAR(100))
BEGIN
/*var*/
/*DECLARE ilist VARCHAR(100) DEFAULT '2013-02-20,2013-06-30,2013-12-19';*/
DECLARE delimeter VARCHAR(10) DEFAULT ',';
DECLARE pos INT DEFAULT 0;
DECLARE item VARCHAR(100) DEFAULT '';
/*drop temporary table*/
DROP TABLE IF EXISTS tmpList;
/*loop*/
loop_item: LOOP
SET pos = pos + 1;
/*split*/
SET item =
REPLACE(
SUBSTRING(SUBSTRING_INDEX(ilist, delimeter, pos),
LENGTH(SUBSTRING_INDEX(ilist, delimeter, pos -1)) + 1),
delimeter, '');
/*break*/
IF item = '' THEN
LEAVE loop_item;
ELSE
/*create temporary table on first use, then add the item as a row*/
CREATE TEMPORARY TABLE IF NOT EXISTS tmpList (sdate VARCHAR(100));
INSERT INTO tmpList (sdate) VALUES (item);
END IF;
END LOOP loop_item;
/*view*/
SELECT * FROM tmpList;
END$$
DELIMITER ;
call procedure :
CALL myList('2013-02-20,2013-06-30,2013-12-19');
query :
SELECT
*,
(SELECT MAX(effective_date) FROM foo WHERE effective_date < sdate) AS effective_date
FROM tmpList
result :
------------------------------
| sdate |effective_date|
------------------------------
| 2013-02-20 | 2013-02-03 |
| 2013-06-30 | 2013-05-11 |
| 2013-12-19 | 2013-12-08 |
The bad way first (without ordered analytical functions, or rank/row_number)
sel tmp.min_effective_date, for_id.id
from
(
Sel crossed.effective_date,max(SRC.effective_date) as min_effective_date
from
foo as src
cross join
foo as crossed
where
src.effective_date < crossed.effective_date
and crossed.effective_date in
(given dates here)
group by 1
) tmp inner join foo as for_id on
tmp.effective_date =for_id.effective_date
Next, with row_number
SEL TGT.id, TGT.effective_date
from
(Sel id, effective_date, row_number() over(order by effective_date asc) as ordered
from foo
) SRC
INNER JOIN
(Sel id, effective_date, row_number() over(order by effective_date asc) as ordered
from foo ) TGT
on
src.ordered-1=TGT.ordered
where src.effective_date in (given dates)
with ordered analytical functions:
sel f.id, tmp.eff
from foo as f inner join
(SEL ID, max(effective_date) over(order by effective_date asc ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as eff
from foo
) TMP
on f.id = tmp.id
where f.effective_date in (given dates)
and tmp.eff is not null
The queries above assume id needs to be selected and that the ids in the source don't follow the same sequence (e.g. ascending) as the dates. Otherwise, you can use the ordered analytical function directly.
I seem to come against this problem a lot, where I have data that's formatted like this:
+----+----------------------+
| id | colors |
+----+----------------------+
| 1 | Red,Green,Blue |
| 2 | Orangered,Periwinkle |
+----+----------------------+
but I want it formatted like this:
+----+------------+
| id | colors |
+----+------------+
| 1 | Red |
| 1 | Green |
| 1 | Blue |
| 2 | Orangered |
| 2 | Periwinkle |
+----+------------+
Is there a good way to do this? What is this kind of operation even called?
You could use a query like this:
SELECT
id,
SUBSTRING_INDEX(SUBSTRING_INDEX(colors, ',', n.digit+1), ',', -1) color
FROM
colors
INNER JOIN
(SELECT 0 digit UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) n
ON LENGTH(REPLACE(colors, ',' , '')) <= LENGTH(colors)-n.digit
ORDER BY
id,
n.digit
Please see the fiddle here. Note that this query supports up to 4 colors per row; to handle more, extend the subquery to return more numbers (or use a table that contains 10 or 100 numbers).
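In case the nested SUBSTRING_INDEX calls look opaque, here is a small Python sketch of what the query computes. The function substring_index is a hand-written imitation of the MySQL function for illustration only, not a library call: the inner call keeps the first digit+1 items, the outer call with -1 keeps the last of those, and the join condition simply means "this row has at least digit commas".

```python
def substring_index(s, delim, count):
    """Imitate MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])  # everything right of the count-th from the end
    return ""


rows = [(1, "Red,Green,Blue"), (2, "Orangered,Periwinkle")]
digits = [0, 1, 2, 3]  # the numbers subquery n

result = []
for row_id, colors in rows:
    for digit in digits:
        # Join condition: only keep digits that point at an existing item.
        if colors.count(",") >= digit:
            head = substring_index(colors, ",", digit + 1)  # first digit+1 items
            result.append((row_id, substring_index(head, ",", -1)))  # last of those

print(result)
```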
I think it is what you need (stored procedure) : Mysql split column string into rows
DELIMITER $$
DROP PROCEDURE IF EXISTS explode_table $$
CREATE PROCEDURE explode_table(bound VARCHAR(255))
BEGIN
DECLARE id INT DEFAULT 0;
DECLARE value TEXT;
DECLARE occurance INT DEFAULT 0;
DECLARE i INT DEFAULT 0;
DECLARE splitted_value VARCHAR(255);
DECLARE done INT DEFAULT 0;
DECLARE cur1 CURSOR FOR SELECT table1.id, table1.value
FROM table1
WHERE table1.value != '';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
DROP TEMPORARY TABLE IF EXISTS table2;
CREATE TEMPORARY TABLE table2(
`id` INT NOT NULL,
`value` VARCHAR(255) NOT NULL
) ENGINE=Memory;
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO id, value;
IF done THEN
LEAVE read_loop;
END IF;
SET occurance = (SELECT LENGTH(value)
- LENGTH(REPLACE(value, bound, ''))
+1);
SET i=1;
WHILE i <= occurance DO
SET splitted_value =
(SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(value, bound, i),
LENGTH(SUBSTRING_INDEX(value, bound, i - 1)) + 1), bound, ''));
INSERT INTO table2 VALUES (id, splitted_value);
SET i = i + 1;
END WHILE;
END LOOP;
SELECT * FROM table2;
CLOSE cur1;
END; $$
This saved me many hours! Taking it a step further: in a typical implementation there would in all likelihood be a table, color_list, that enumerates the colours against an identifying key. A new colour can then be added without having to modify the query, and the potentially endless UNION clause can be avoided altogether by changing the query to this:
SELECT id,
SUBSTRING_INDEX(SUBSTRING_INDEX(colors, ',', n.digit+1), ',', -1) color
FROM
colors
INNER JOIN
(select id as digit from color_list) n
ON LENGTH(REPLACE(colors, ',' , '')) <= LENGTH(colors)-n.digit
ORDER BY id, n.digit;
It is important that the Ids in table color_list remain sequential, however.
No need for a stored procedure. A CTE is enough:
CREATE TABLE colors(id INT,colors TEXT);
INSERT INTO colors VALUES (1, 'Red,Green,Blue'), (2, 'Orangered,Periwinkle');
WITH RECURSIVE
unwound AS (
SELECT *
FROM colors
UNION ALL
SELECT id, regexp_replace(colors, '^[^,]*,', '') colors
FROM unwound
WHERE colors LIKE '%,%'
)
SELECT id, regexp_replace(colors, ',.*', '') colors
FROM unwound
ORDER BY id
;
+------+------------+
| id | colors |
+------+------------+
| 1 | Red |
| 1 | Green |
| 1 | Blue |
| 2 | Orangered |
| 2 | Periwinkle |
+------+------------+
Notice that this can be done without creating a temporary table:
select id, substring_index(substring_index(genre, ',', n), ',', -1) as genre
from my_table
join
(SELECT @row := @row + 1 as n FROM
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t,
(SELECT @row:=0) r) as numbers
on char_length(genre)
- char_length(replace(genre, ',', '')) >= n - 1
If the delimiter is part of the data but enclosed in double quotes, how can we split it?
Example
first,"second,s",third
it should come as
first
second,s
third
I have a table:
Type | Value
1 | '1test1'
2 | '2test1'
2 | '2test2'
2 | '2test3'
I want to get a result containing a pair where each entry from each type is used at least once, but not more than required.
From the example table above, I want the following result:
1test1 - 2test1
1test1 - 2test2
1test1 - 2test3
If the table is:
Type | Value
1 | '1test1'
1 | '1test2'
1 | '1test3'
2 | '2test1'
2 | '2test2'
2 | '2test3'
I want the following result:
1test1 - 2test1
1test2 - 2test2
1test3 - 2test3
If the table is:
Type | Value
1 | '1test1'
1 | '1test2'
2 | '2test1'
2 | '2test2'
2 | '2test3'
I want the following result:
'1test1' - '2test1'
'1test2' - '2test2'
'1test1' - '2test3'
I want each value to be repeated as often as the other values of the same type. No value of a type should be repeated more often than the other values of that type.
What is the most elegant way to do it with SQL or stored procedure, or with a series of SQL statements?
It is somewhat easy to do when you don't have the same number of rows for each type, but once you do, it gets somewhat tricky.
So I came up with this:
CREATE PROCEDURE test()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE type1, type2 INT;
DECLARE value1, value2 VARCHAR(12);
DECLARE cur1 CURSOR FOR SELECT type,value FROM testtable WHERE Type = 1;
DECLARE cur2 CURSOR FOR SELECT type,value FROM testtable WHERE Type = 2;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
if (SELECT COUNT(Value) FROM testtable WHERE Type = 1)
= (SELECT COUNT(Value) FROM testtable WHERE Type = 2)
then
OPEN cur1;
OPEN cur2;
CREATE TEMPORARY TABLE test1 (
Value1 varchar(12),
Value2 varchar(12)
);
read_loop: LOOP
FETCH cur1 INTO type1, value1;
FETCH cur2 INTO type2, value2;
IF done THEN
LEAVE read_loop;
END IF;
INSERT INTO test1 VALUES(value1, value2);
END LOOP;
CLOSE cur1;
CLOSE cur2;
SELECT * FROM test1;
DROP TABLE test1;
ELSE
SELECT t1.Value, t2.Value
FROM testtable t1
LEFT JOIN testtable t2 ON t2.Type = 2
WHERE t1.Type = 1
UNION SELECT t1.Value, t2.Value
FROM testtable t1
RIGHT JOIN testtable t2 ON t2.Type = 2
WHERE t1.Type = 1;
END IF;
END;
It's hideous, but it works for your three examples. Somewhat.
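For comparison, the pairing rule from the question can be written compactly outside SQL. The following Python sketch is only an illustration of the logic, with balanced_pairs being a made-up name: walk through the longer list once and cycle through the shorter one, so that no value of the shorter type is used more than one time more often than any other.

```python
def balanced_pairs(type1, type2):
    """Pair values so the longer list is used once each and the shorter one cycles."""
    if not type1 or not type2:
        return []  # nothing to pair
    n = max(len(type1), len(type2))
    return [(type1[i % len(type1)], type2[i % len(type2)]) for i in range(n)]


# The three examples from the question.
one_vs_three = balanced_pairs(["1test1"], ["2test1", "2test2", "2test3"])
three_vs_three = balanced_pairs(
    ["1test1", "1test2", "1test3"], ["2test1", "2test2", "2test3"]
)
two_vs_three = balanced_pairs(["1test1", "1test2"], ["2test1", "2test2", "2test3"])

print(one_vs_three)
print(three_vs_three)
print(two_vs_three)
```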
This is a somewhat contrived answer but I guess it fits the question:
create table stuff( idx tinyint unsigned, val varchar(50));
insert into stuff( idx, val ) values ( 1, '1val1'), (1, '1val2'), (2,'2val1'),
(2,'2val2'), (2, '2val3');
SELECT s0.val v0, s1.val v1 FROM stuff s0
JOIN stuff s1 ON s0.idx != s1.idx
where s0.idx = 1;
Here's a fiddle.
Is this what you want?
SELECT
s.val AS One,
r.val AS Second
FROM stuff AS s
LEFT OUTER JOIN (SELECT * FROM stuff WHERE idx = 2) AS r ON r.idx <> s.idx
WHERE s.idx = 1
SQL Fiddle Demo
OUTPUT :
One | Second
--------------------
1val1 | 2val1
1val1 | 2val2
1val1 | 2val3
1val2 | 2val1
1val2 | 2val2
1val2 | 2val3
I have a MySQL table like this (simplified):
Id*| Text | Pos (integer)
-----------
A | foo | 0
B | bar | 1
C | baz | 2
D | qux | 3
Now, after I delete a row, I want to update the Pos value of the remaining rows so that no "holes" or gaps are left.
For example, if the row with Id='C' is deleted, the remaining table should be:
Id*| Text | Pos (integer)
-----------
A | foo | 0
B | bar | 1
D | qux | 2
Is this possible in a single query?
UPDATE
Based on the accepted answer this was the solution to my problem:
START TRANSACTION;
SELECT @A:=pos FROM table_name WHERE Id= 'C';
DELETE FROM table_name WHERE Id = 'C';
UPDATE table_name SET Pos = Pos - 1 WHERE Pos > @A;
COMMIT;
You can achieve this by creating an AFTER DELETE trigger on the table,
or by using transactions:
START TRANSACTION;
SELECT Pos
INTO @var_pos
FROM table_name
WHERE id = 'C';
DELETE
FROM table_name
WHERE id = 'C';
UPDATE table_name
SET Pos = Pos - 1
WHERE Pos > @var_pos;
COMMIT;
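If the delete is issued from application code, the same three steps can be wrapped in one transaction there. Below is a minimal sketch with Python's built-in sqlite3 module standing in for MySQL; the function name delete_and_close_gap and the column name txt (used in place of Text) are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id TEXT PRIMARY KEY, txt TEXT, pos INTEGER)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?)",
    [("A", "foo", 0), ("B", "bar", 1), ("C", "baz", 2), ("D", "qux", 3)],
)
conn.commit()


def delete_and_close_gap(conn, row_id):
    """Delete one row and shift every later position down by one, in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        found = conn.execute(
            "SELECT pos FROM table_name WHERE id = ?", (row_id,)
        ).fetchone()
        if found is None:
            return  # nothing to delete, nothing to renumber
        conn.execute("DELETE FROM table_name WHERE id = ?", (row_id,))
        conn.execute("UPDATE table_name SET pos = pos - 1 WHERE pos > ?", (found[0],))


delete_and_close_gap(conn, "C")
remaining = conn.execute("SELECT id, txt, pos FROM table_name ORDER BY pos").fetchall()
print(remaining)
```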
I think this should work (I haven't tested it);
you can run this statement after any delete:
UPDATE tablename t
JOIN (
SELECT @i := @i + 1 AS iterator, s.id
FROM tablename s, (SELECT @i := -1) r -- start at -1 so the first Pos is 0
ORDER BY s.Pos
) a ON a.id = t.id
SET t.Pos = a.iterator