I have a Postgres table called sales with a JSON column, data, containing around 100 top-level keys; let's call them k1, k2, k3, ..., k100.
I want to write a query
select * from sales some_function(data)
which simply returns something like
k1 | k2 | .. | k100
--------------------
"foo" | "bar" | .. | 2
"fizz"| "buzz"| .. | 10
i.e. it just unpacks the keys as columns and their values as rows.
Note that k1, k2, ..., k100 are not the real key names, so I can't loop over
data ->> key
That's not possible. One restriction of the SQL language is that all columns (and their data types) must be known to the database when the statement is parsed, that is, before it is actually run.
You will have to write each one separately:
select data ->> 'k1' as k1, data ->> 'k2' as k2, ...
from sales
One way to make this easier is to generate a view dynamically: extract all JSON keys from the column, then use dynamic SQL to create the view. You will, however, need to re-create that view each time the set of keys changes.
Something along these lines (not tested!):
do
$$
declare
l_columns text;
l_sql text;
begin
select string_agg(distinct format('data ->> %L as %I', t.key, t.key), ', ')
into l_columns
from sales s
cross join jsonb_each(s.data) as t(key, value);
-- l_columns now contains something like:
-- data ->> 'k1' as k1, data ->> 'k2' as k2
-- now create a view from that
l_sql := 'create view sales_keys as select '||l_columns||' from sales';
execute l_sql;
end;
$$
;
You probably want to add e.g. the primary key column(s) to the view, so that you can match the JSON values back to the original row(s).
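The string that the DO block assembles can also be sketched outside the database. The Python below is only an illustration of the same idea, including a key column in the view: quote_literal and quote_ident are hypothetical helpers that roughly imitate format()'s %L and %I, and the id column name is an assumption, not something taken from the question.

```python
def quote_literal(value: str) -> str:
    """Quote a string as an SQL literal (roughly what format('%L') does)."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Always double-quote an identifier (format('%I') quotes only when needed)."""
    return '"' + name.replace('"', '""') + '"'


def build_view_sql(keys, id_column="id"):
    """Return a CREATE VIEW statement exposing each JSON key as a column.

    id_column is a hypothetical primary key column on sales.
    """
    columns = [quote_ident(id_column)]
    # De-duplicate and sort so the column order is stable between runs.
    for key in sorted(set(keys)):
        columns.append(f"data ->> {quote_literal(key)} as {quote_ident(key)}")
    return "create view sales_keys as select " + ", ".join(columns) + " from sales"


print(build_view_sql(["k2", "k1", "k2"]))
```

Because the identifiers are always double-quoted here, the view's column names are case-sensitive, which matters if the real keys contain upper-case letters.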
I have a table someTable with a column bin of type VARCHAR(4). Whenever I insert to this table, bin should be a unique combination of characters and numbers. Unique in this sense meaning has not appeared before in the table in another row.
bin is in the form of AA00, where A is a character A-F and 0 is a number 0-9.
Say I insert to this table once: it should come up with a bin value which doesn't appear before. Assuming the table was empty, the first bin could be AA11. On second insertion, it should be AA12, and then AA13, etc.
AA00, AA01, ... AA09, AA10, AA11, ... AA99, AB00, AB01, ... AF99, BA00, BA01, ... FF99
It doesn't matter that this table can contain only 3,600 possible rows. How do I write this code, specifically the part that finds a bin which doesn't already exist in someTable? The bins can be generated in the order I've described or at random, as long as no bin appears twice.
CREATE TABLE someTable (
bin VARCHAR(4),
someText VARCHAR(32),
PRIMARY KEY(bin)
);
INSERT INTO someTable
VALUES('?', 'a');
INSERT INTO someTable
VALUES('?', 'b');
INSERT INTO someTable
VALUES('?', 'c');
INSERT INTO someTable
VALUES('?', 'd');
Alternatively, I can use the below procedure to insert instead:
CREATE PROCEDURE insert_someTable(tsomeText VARCHAR(32))
BEGIN
DECLARE var VARCHAR(4) DEFAULT (
-- some code to find unique bin
);
INSERT INTO someTable
VALUES(var, tsomeText);
END
A possible outcome is:
+------+----------+
| bin | someText |
+------+----------+
| AB31 | a |
| FC10 | b |
| BB22 | c |
| AF92 | d |
+------+----------+
As Gordon said, you will have to use a trigger because it is too complex to do as a simple formula in a default. Should be fairly simple, you just get the last value (order by descending, limit 1) and increment it. Writing the incrementor will be somewhat complicated because of the alpha characters. It would be much easier in an application language, but then you run into issues of table locking and the possibility of two users creating the same value.
A better method would be to use a normal auto-increment primary key and translate it to your bin value. Think of the bin value as two base-6 characters followed by two base-10 digits. You take the id generated by MySQL, which is guaranteed to be unique, convert it to this special number system, and store the result in the bin column.
To calculate the bin:
Step one would be to get the lower 100 value of the decimal number (mod 100) - that gives you the last two digits. Convert to varchar with a leading zero.
Subtract that from the id, and divide by 100 to get the value for the first two digits.
Get the mod 6 value to determine the 3rd (from the right) digit. Convert to A-F by index.
Subtract this from what's left of the ID, and divide by 6 to get the 4th (from the right) digit. Convert to A-F by index.
Concat the three results together to form the value for the bin.
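The steps above can be sketched in a few lines of Python to check the arithmetic before writing any SQL. The function name id_to_bin is hypothetical, and the code is only a minimal illustration of the conversion as described, not a replacement for doing it in the database.

```python
LETTERS = "ABCDEF"


def id_to_bin(row_id: int) -> str:
    """Convert an auto-increment id to the AA00-style bin described above."""
    part1 = row_id % 100                # last two (decimal) digits
    temp1 = (row_id - part1) // 100     # what is left for the two letters
    part2 = temp1 % 6                   # 3rd character from the right
    temp2 = (temp1 - part2) // 6
    part3 = temp2 % 6                   # 4th character from the right
    return LETTERS[part3] + LETTERS[part2] + f"{part1:02d}"


for sample in (1, 99, 100, 3599, 3600):
    print(sample, id_to_bin(sample))
```

Because of the final mod 6, the mapping wraps around: id 3600 maps to AA00 and id 3601 repeats AA01.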
You may need to edit the following to match your table and column names, but it should do what you are asking. One possible improvement would be to have it cancel any inserts past the 3600 limit: the mapping wraps around, so id 3600 becomes AA00 and every id after that duplicates an earlier bin value. Also, the first row does not get AA00 (id=1 gives 'AA01'), so it's not perfect. Lastly, you could put a unique index on bin, and that would prevent duplicates.
DELIMITER $$
CREATE TRIGGER `fix_bin`
BEFORE INSERT ON `so_temp`
FOR EACH ROW
BEGIN
DECLARE next_id INT;
SET next_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA=DATABASE() AND TABLE_NAME='so_temp');
SET @id = next_id;
SET @Part1 = MOD(@id,100);
SET @Temp1 = FLOOR((@id - @Part1) / 100);
SET @Part2 = MOD(@Temp1,6);
SET @Temp2 = FLOOR((@Temp1 - @Part2) / 6);
SET @Part3 = MOD(@Temp2,6);
SET @DIGIT12 = RIGHT(CONCAT("00",@Part1),2);
SET @DIGIT3 = SUBSTR("ABCDEF",@Part2 + 1,1);
SET @DIGIT4 = SUBSTR("ABCDEF",@Part3 + 1,1);
SET NEW.`bin` = CONCAT(@DIGIT4,@DIGIT3,@DIGIT12);
END;
$$
DELIMITER ;
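The trigger has two limitations: the sequence starts at AA01 rather than AA00, and nothing stops ids beyond the 3600 available bins. A hedged sketch of a mapping without them, written in Python only to make the arithmetic easy to test, could look like this. The name bin_for_id and the choice to raise an error are assumptions; in MySQL the equivalent would be extra logic inside the trigger.

```python
LETTERS = "ABCDEF"
MAX_BINS = 6 * 6 * 100  # 3600 distinct values from AA00 to FF99


def bin_for_id(row_id: int) -> str:
    """Map ids 1..3600 onto AA00..FF99 and reject anything outside that range."""
    if not 1 <= row_id <= MAX_BINS:
        raise ValueError(f"id {row_id} is outside the {MAX_BINS} available bins")
    n = row_id - 1                       # shift so that id 1 becomes AA00
    hundreds, digits = divmod(n, 100)    # digits: the two decimal positions
    high, low = divmod(hundreds, 6)      # two base-6 letter positions
    return f"{LETTERS[high]}{LETTERS[low]}{digits:02d}"


all_bins = [bin_for_id(i) for i in range(1, MAX_BINS + 1)]
print(all_bins[0], all_bins[-1], len(set(all_bins)))
```

Within the allowed range every id gets a different bin, and the bins come out in the same order as the sequence listed in the question.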
I am hopelessly stuck on the following, and so far none of my programming speed-dial buddies has been able to help (most of them are not MySQL experts):
I have several tables whose column names and data types were auto-generated by the 'import table data wizard' from a CSV file; the tables do not contain an AUTO INCREMENT column (yet). This particular table has approximately 30,000 rows.
I am trying to correct the comma-delimited values in one column using a 'corrections' table. To do this I am writing a stored procedure containing a WHILE loop that iterates through the corrections table row by row and checks whether or not an alias is found in the imported table. The corrections table starts at id 1 and looks like this:
| id | material     | alias01     | alias02   | alias03      | ... (up to alias12)
| 1  | Katoen       | Cotton      | Supima    | Pima         |
| 2  | Polyester    | Polyster    |           |              |
| 3  | Lyocell      | Lycocell    | Lyocel    |              |
| 4  | Linnen       | Linen       |           |              |
| 5  | Viscose      | Visose      | Viskose   | Viscoe       | Voscose
| 6  | Scheerwol    |             |           |              |
| 7  | Polyamide    |             |           |              |
| 8  | Nylon        |             |           |              |
| 9  | Leer         | Leder       | Lamsleder | Varkensleder |
| 10 | Polyurethaan | Polyurethan | PU        | Polyuretaan  |
For testing purposes I am only using alias01 for now (it needs to check alias01, then alias02, and so on, but I'll try to solve that later).
It needs to compare lengths (alias_string_length = found_string_length) to make sure that a string such as 'wo' is not found inside 'wool' or 'wol'.
The values in the column that needs correcting look like this (the commas don't need to be there; it's just what I was given to work with):
| material |
,Katoen,Elastaan,Voering,Acetaat,Polyester
,Nylon,Polyester,Elastaan
,Katoen
,Leder,in,Leder,Loopzool,Leder
,Polyester
,Polyester,Elastaan,Voering,Polyester
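The requirement that 'wo' must not match inside 'wool' amounts to comparing whole comma-separated tokens instead of substrings. The Python below is only a sketch of that matching rule, not of the stored procedure itself: correct_materials and the small ALIASES mapping are hypothetical, with a few entries borrowed from the corrections table above and a 'Wo' alias added for illustration.

```python
def correct_materials(value: str, alias_to_material: dict) -> str:
    """Replace whole comma-separated tokens that equal a known alias."""
    # Lower-case the aliases once so the comparison is case-insensitive.
    lookup = {alias.lower(): material for alias, material in alias_to_material.items()}
    tokens = value.split(",")
    # A token is replaced only if the *entire* token equals an alias.
    return ",".join(lookup.get(token.strip().lower(), token) for token in tokens)


# Illustrative subset of the corrections table; the 'Wo' entry is an assumption.
ALIASES = {"Leder": "Leer", "Cotton": "Katoen", "Polyster": "Polyester", "Wo": "Wol"}

print(correct_materials(",Leder,in,Leder,Loopzool,Leder", ALIASES))
print(correct_materials(",Wool,Wo", ALIASES))
```

Comparing whole tokens makes the explicit length check unnecessary, because a token either equals an alias or it does not.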
Update
Thanks to Drew's tip I changed the procedure. I added a tmp table that holds the materials AND a unique id for each row, and I iterate through each row with alias01. It takes around 11 seconds to process 9000 rows, but the result is 0 row(s) affected. Any tips on increasing speed are most welcome, but insight into what the issue might be would help a lot more.
CREATE DEFINER=`root`@`localhost` PROCEDURE `replace_materials`()
BEGIN
set @rownumber = 1;
set @totalrows = 28;
set @um ='';
set @cm ='';
set @corrected ='';
set @correctme ='';
TRUNCATE TABLE tmp;
INSERT INTO tmp (material) SELECT material FROM vantilburgonline.productinfo;
WHILE (@rownumber < @totalrows) DO
  SET @um = (SELECT alias01 FROM vantilburgonline.materials WHERE id=@rownumber);
  -- gives 'um' value from column alias01, from table materials, row(X)
  SET @cm = (SELECT material FROM vantilburgonline.materials WHERE id=@rownumber);
  -- gives 'cm' value from column material, from table materials, row(X)
  set @tmprow = 1;
  set @totaltmprow =9000;
  WHILE (@tmprow < @totaltmprow) DO
    SET @correctme = (SELECT material FROM vantilburgonline.tmp WHERE id = @tmprow);
    -- gives the value from column material from table tmp to correctme(X).
    SET @correctme = REPLACE(@correctme,@um,@cm);
    -- should run through column material from table productinfo and replace 'alias01' with correct 'material'.
    SET @tmprow = @tmprow +1;
  END WHILE;
  SET @rownumber = @rownumber +1;
END WHILE;
END
The procedure reports 0 row(s) affected even though I'm certain alias01 contains strings that it should have found in the materials. Workbench was also using 9GB of memory at this point, and I was only able to counter that by restarting.
I would recommend moving away from your materials table, which is unwieldy with its multiple columns (alias01 .. alias12), to a normalized, extensible design with a materials table and a materials_alias table. Since they sit alongside the table you already created, I named them with a 2 suffix.
Schema
drop table if exists materials2;
create table materials2
( material varchar(100) primary key, -- let's go with a natural key
active bool not null -- turn it LIVE and ON for string replacement of alias back to material name
-- so active is TRUE for ones to do replacement, or FALSE for skip
-- facilitates your testing of your synonyms, translations, slangs, etc
)engine=INNODB;
insert materials2 (material,active) values
('KARTON',true),
('Polyester',false),
('Lyocell',false),
('Linnen',true),
('Viscose',true),
('Scheerwol',false),
('Nylon',false),
('Leer',true),
('Polyurethaan',true),
('Polyacryl',true),
('Acryl',false),
('Modal',true),
('Acetaat',true),
('Papier',false),
('Wol',true),
('Zijde',true),
('Temcal',false),
('Polyamide',true),
('Wol-Merino',true),
('Elastan',true),
('Elastomultiester',true);
-- 21 rows
-- a few rows were skipped. The intent of them read as gibberish to me. Please review.
-- we need to restructure the materials2_alias table (after the first attempt)
-- 1. it might need special handling when `alias` is a legitimate substring of `material` (those 2 columns)
-- 2. it needs a unique composite index
drop table if exists materials2_alias;
create table materials2_alias
( id int auto_increment primary key,
material varchar(100) not null,
alias varchar(100) not null,
ais bool not null, -- Alias is Substring (alias is a legitimate substring of material, like Wo and Wol, respectively)
unique key(material,alias), -- Composite Index, do not allow dupe combos (only 1 row per combo)
foreign key `m2alias_m2` (material) references materials2(material)
)engine=INNODB;
insert materials2_alias (material,alias,ais) values
('KARTON','Cotton',false),('KARTON','Katoen',false),('KARTON','Pima',false),
('Polyester','Polyster',false),
('Lyocell','Lycocell',false),('Lyocell','Lyocel',false),
('Linnen','Linen',false),
('Viscose','Visose',false),('Viscose','Viskose',false),('Viscose','Viscoe',false),('Viscose','Voscose',false),
('Leer','Leder',false),('Leer','Lamsleder',false),('Leer','Varkensleder',false),('Leer','Schapenleder',false),('Leer','Geitenleder',false),
('Polyurethaan','Polyurethan',false),('Polyurethaan','PU',false),('Polyurethaan','Polyuretaan',false),('Polyurethaan','Polyurathane',false),('Polyurethaan','Polyurtaan',false),('Polyurethaan','Polyueretaan',false),
('Polyacryl','Polyacrylic',false),
('Acetaat','Leder',false),('Acetaat','Lamsleder',false),
('Wol','Schuurwol',false),('Wol','Wool',false),('Wol','WO',false),('Wol','Scheerwol',false),
('Zijde','Silk',false),('Zijde','Sede',false),
('Polyamide','Polyamie',false),('Polyamide','Polyamid',false),('Polyamide','Poliamide',false),
('Wol-Merino','Merino',false),
('Elastan','Elastaan',false),('Elastan','Spandex',false),('Elastan','Elataan',false),('Elastan','Elastane',false),
('Elastomultiester','elastomutltiester',false),('Elastomultiester','Elasomultiester',false);
-- this cleans up the above, where false should have been true
update materials2_alias
set ais=true
where instr(material,alias)>0;
-- 4 rows
There are several alter table statements and other supporting pieces that I will try to document or link to. I am only capturing the essentials here, since your code runs to several hundred lines, whereas mine comes down to a simple chunk of code that you put in a loop.
The Update put in a loop:
UPDATE productinfo pi
join materials2_alias ma
on instr( pi.material, concat(',',ma.alias,',') )>0
join materials2 m
on m.material=ma.material and m.active=true
set pi.material=replace(lower(pi.material),lower(ma.alias),lower(ma.material)),
pi.touchCount=pi.touchCount+1;
A few notes on the update:
-- Note, pi.material starts and ends with a comma.
-- I forced that during the ETL. But `ma.alias` does not contain commas.
-- So add the commas with a concat() within the "Update with a Join" pattern shown
--
-- Note that the commas solved the problem with the Wol - Wo
Well, the following 4 in particular.
select * from materials2_alias
where ais=true
order by material,alias;
+----+------------+----------+-----+
| id | material | alias | ais |
+----+------------+----------+-----+
| 6 | Lyocell | Lyocel | 1 |
| 33 | Polyamide | Polyamid | 1 |
| 28 | Wol | WO | 1 |
| 35 | Wol-Merino | Merino | 1 |
+----+------------+----------+-----+
-- instr() is not case sensitive except for binary strings
-- REPLACE(str,from_str,to_str); -- case sensitive
-- http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_replace
--
-- so the update uses lower() or this won't work due to replace() case sensitivity
--
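To make the two notes concrete (comma delimiters for matching, lower() for replacing), here is a small Python imitation of what the UPDATE does to one value for one alias/material pair. The function apply_alias is hypothetical, and Python string methods only approximate instr() and replace(), so treat it as a sketch of the logic rather than of MySQL's exact behaviour.

```python
def apply_alias(value: str, alias: str, material: str) -> str:
    """Imitate the UPDATE for a single value and a single alias/material pair."""
    haystack = value.lower()
    needle = "," + alias.lower() + ","
    # Join condition: instr(pi.material, concat(',', ma.alias, ',')) > 0,
    # treated as case-insensitive here.
    if needle not in haystack:
        return value
    # set pi.material = replace(lower(pi.material), lower(ma.alias), lower(ma.material))
    return haystack.replace(alias.lower(), material.lower())


print(apply_alias(",Wol,Katoen,", "WO", "Wol"))   # no ',wo,' token, so unchanged
print(apply_alias(",WO,Katoen,", "WO", "Wol"))    # whole token matches
```

In this sketch, as in the statement it imitates, only the matching step is delimiter-aware; once a row has matched, the replacement is a plain substring replacement on the lower-cased value.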
Stored Procedure:
DROP PROCEDURE if exists touchCounts;
DELIMITER $$
CREATE PROCEDURE touchCounts()
BEGIN
select touchCount,count(*) as rowCount
from productinfo
group by touchCount
order by touchCount;
END $$
DELIMITER ;
When that stored procedure returns the same count of rows on a successive call (the next call), you are done modifying the material column via the update.
That stored procedure could naturally return an out parameter for the rowcount. But it is late and time to sleep.
For your last data set from your side, the update statement would need to be called 4 times. That is like 13 seconds on my mediocre laptop. The idea is naturally flexible, for hundreds of aliases per material if you want.
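The "call it until nothing changes" idea can be sketched as a loop in Python. This is an illustration under stated assumptions, not the code from the repository: one_pass and run_until_stable are hypothetical names, each pass is assumed to change a row at most once (which is why several calls are needed), and the alias list is a small sample based on the question's corrections table.

```python
def one_pass(rows, aliases):
    """Apply at most one alias replacement per row, like one call of the UPDATE."""
    changed = 0
    result = []
    for value in rows:
        for alias, material in aliases:
            needle = f",{alias.lower()},"
            if needle in value.lower():
                value = value.lower().replace(needle, f",{material.lower()},")
                changed += 1
                break  # this row is done for the current pass
        result.append(value)
    return result, changed


def run_until_stable(rows, aliases, max_passes=50):
    """Repeat passes until a pass changes nothing; return the rows and the pass count."""
    passes = 0
    while passes < max_passes:
        rows, changed = one_pass(rows, aliases)
        if changed == 0:
            break
        passes += 1
    return rows, passes


ALIASES = [("Cotton", "Katoen"), ("Leder", "Leer"), ("Polyster", "Polyester")]
ROWS = [",Leder,Cotton,Polyster,", ",Nylon,"]
final_rows, passes = run_until_stable(ROWS, ALIASES)
print(final_rows, passes)
```

A row containing three different aliases needs three passes here, which corresponds to calling the update repeatedly until the touchCount distribution stops changing.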
I parked it up on github as it is too much otherwise.