How can I update data using REPLACE or a regex-like method, going from
id | jdata
---------------
01 | {"name1":["number","2"]}
02 | {"val1":["number","12"],"val2":["number","22"]}
to
id | jdata
---------------
01 | {"name1":2 }
02 | {"val1": 12,"val2":22 }
I need to make a proper JSON entry for the numbers, replacing each array with the number taken from that array. The "jdata" column can have any number of attributes similar to those in the example. Something similar to this would do:
UPDATE table SET jdata = REPLACE(jdata, '["number","%d"]', %d);
Two ways:
The longer, clumsier way, using JSON_OBJECT:
UPDATE table1,
(
SELECT
id,
JSON_EXTRACT(jdata, "$.name1[0]") as A,
JSON_EXTRACT(jdata, "$.name1[1]") as B,
JSON_EXTRACT(jdata, "$.val1[0]") as C,
JSON_EXTRACT(jdata, "$.val1[1]") as D,
JSON_EXTRACT(jdata, "$.val2[0]") as E,
JSON_EXTRACT(jdata, "$.val2[1]") as F
FROM table1
) x
SET jdata = CASE WHEN table1.id=1 THEN JSON_OBJECT("name1", x.B)
                 ELSE JSON_OBJECT("val1", x.D, "val2", x.F) END
WHERE x.id=table1.id;
Or using JSON_REPLACE:
update table1
set jdata = JSON_REPLACE(jdata, "$.name1",JSON_EXTRACT(jdata,"$.name1[1]"))
where id=1;
update table1
set jdata = JSON_REPLACE(jdata, "$.val1",JSON_EXTRACT(jdata,"$.val1[1]"),
"$.val2",JSON_EXTRACT(jdata,"$.val2[1]"))
where id=2;
See: DBFIDDLE for both options.
EDIT: To handle more keys generically, you can start with the query below and build a new JSON message from its output, leaving out the "number" element:
WITH RECURSIVE cte1 as (
select 0 as x
union all
select x+1 from cte1 where x<10
)
select
id,
x,
JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))) j,
JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))))) v,
JSON_UNQUOTE(JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[0]"))) v1,
JSON_UNQUOTE(JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[1]"))) v2
from table1
cross join cte1
where x<JSON_DEPTH(jdata)
and not JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]")) is null
order by id,x;
output:
id  x  j      v                 v1      v2
1   0  name1  ["number", "2"]   number  2
2   0  val1   ["number", "12"]  number  12
2   1  val2   ["number", "22"]  number  22
This should take care of JSON messages that also contain values like val3, val4, etc., up to a maximum depth, which is currently fixed at 10 in cte1.
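For illustration, here is a sketch (my addition, not part of the original answer) of how the exploded rows above could be folded back into the desired JSON object per id. It assumes MySQL 5.7.22+ for JSON_OBJECTAGG, and the alias new_jdata is made up; the CAST turns the quoted number into a plain number:
WITH RECURSIVE cte1 AS (
  SELECT 0 AS x
  UNION ALL
  SELECT x+1 FROM cte1 WHERE x<10
)
SELECT id,
       JSON_OBJECTAGG(
         JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata), CONCAT("$[",x,"]"))),
         CAST(JSON_UNQUOTE(JSON_EXTRACT(jdata,
              CONCAT("$.", JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata), CONCAT("$[",x,"]"))), "[1]"))) AS UNSIGNED)
       ) AS new_jdata
FROM table1
CROSS JOIN cte1
WHERE x < JSON_DEPTH(jdata)
  AND JSON_EXTRACT(JSON_KEYS(jdata), CONCAT("$[",x,"]")) IS NOT NULL
GROUP BY id;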
EDIT2: When you just need to remove the "number" element from the JSON message, you can also repeat this UPDATE until all "number" tags are removed (you can repeat it in a stored procedure; I am not going to write the stored procedure for you 😉):
update
table1,
( WITH RECURSIVE cte1 as (
select 0 as x
union all
select x+1 from cte1 where x<10
) select * from cte1 )x
set jdata = JSON_REMOVE(table1.jdata, CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[0]"))
where JSON_UNQUOTE(JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[0]"))) = "number"
An example where I run the update twice is in this DBFIDDLE.
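For anyone who does want to wrap that repetition up, a minimal sketch of such a loop follows. The procedure name strip_number_tags is made up, MySQL 8+ is assumed for the CTE inside the UPDATE, and this is only a sketch of the idea, not a tested implementation:
DELIMITER //
CREATE PROCEDURE strip_number_tags()
BEGIN
  DECLARE n INT DEFAULT 1;
  WHILE n > 0 DO
    -- same UPDATE as above
    UPDATE table1,
      ( WITH RECURSIVE cte1 AS (
          SELECT 0 AS x UNION ALL SELECT x+1 FROM cte1 WHERE x<10
        ) SELECT * FROM cte1 ) x
    SET jdata = JSON_REMOVE(table1.jdata, CONCAT("$.", JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata), CONCAT("$[",x,"]"))), "[0]"))
    WHERE JSON_UNQUOTE(JSON_EXTRACT(jdata, CONCAT("$.", JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata), CONCAT("$[",x,"]"))), "[0]"))) = "number";
    SET n = ROW_COUNT();  -- stop once the UPDATE no longer changes any row
  END WHILE;
END//
DELIMITER ;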
I found some options to run a MySQL query to migrate/split one table into two new tables, but one of the fields I would like to create in the first new table has to be the same as in the other new table.
So from the "From_payments" table, I would like to create two new tables called "To_paymenttransaction" and "To_paymentinfo". A lot of the fields are predefined and some come from the old table. The only problem is that the predefined field "paymentinfoid" has to be the same in both new tables.
**To_paymenttransaction** --- < --- **From_payments**
paymenttransactionid ------- < --- (Generate next available number in this column)
paymentinfoid -------------- < --- (Generate next available number in this column)
transactionid --------------- < --- txn_id
state ---------------------- < --- (Set all to "1")
amount -------------------- < --- mc_gross
currency ------------------- < --- mc_currency
dateline -------------------- < --- payment_date
paymentapiid --------------- < --- (Set all to "1")
request -------------------- < --- (Set all to "NULL")
reversed ------------------- < --- (Set all to "0")
**To_paymentinfo** -----------<
paymentinfoid -------------- < --- (Same generated number that goes to the paymentinfoid field in the To_paymenttransaction table)
hash ----------------------- < --- (Set all to "Imported")
subscriptionid --------------- < --- (Set all to "1")
subscriptionsubid ------------ < --- ("2" IF above field mc_gross is 4, "1" IF above field mc_gross is 6, "0" IF above field mc_gross is 10)
userid ---------------------- < --- userid
completed ------------------ < --- (Set all to "0")
Any ideas?
All help is greatly appreciated.
Maybe generate the paymentinfoid on the original table before you split it. Then you can copy that single entry into each of the new tables.
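A rough sketch of that idea, with plenty of assumptions on my part: the user-variable numbering, its starting value, and the assumption that paymenttransactionid is auto_increment (so it is omitted from the insert) all need to be adjusted to your actual schema.
-- 1) Generate the shared paymentinfoid on the original table first.
ALTER TABLE From_payments ADD COLUMN paymentinfoid INT;

SET @n := 0;  -- assumed starting point; use the next free paymentinfoid in your system
UPDATE From_payments SET paymentinfoid = (@n := @n + 1);

-- 2) Copy that same id into both new tables.
INSERT INTO To_paymentinfo (paymentinfoid, hash, subscriptionid, subscriptionsubid, userid, completed)
SELECT paymentinfoid,
       'Imported',
       1,
       CASE mc_gross WHEN 4 THEN 2 WHEN 6 THEN 1 WHEN 10 THEN 0 END,
       userid,
       0
FROM From_payments;

INSERT INTO To_paymenttransaction (paymentinfoid, transactionid, state, amount, currency, dateline, paymentapiid, request, reversed)
SELECT paymentinfoid, txn_id, 1, mc_gross, mc_currency, payment_date, 1, NULL, 0
FROM From_payments;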
I have this query:
select col_str, getVal, another_str, resultVal_str
from tablename
Getting results like this:
col_str getVal another_str
'11,12,33,54,1,44' '12' '9,5,4,8,7'
'11,12,33,54,1,44,10,12,11,12,12' '44' '9,5,4,8,7,6,3,5,2,4,2'
'11,12,33,54,1,44' '999' '9,5,4,8,7,4'
'11,12,33' '0' '9,5,4'
----- ---- -----
----- ---- -----
----- ---- -----
The columns col_str, getVal, and another_str come from the table, and the column resultVal_str should be calculated from those three columns.
Logic for resultVal_str:
Take the first record: getVal has the value 12, and col_str contains 12 at position 2; position 2 in another_str is 5, so resultVal_str is 5, and so on. See below:
col_str getVal another_str resultVal_str
'11,12,33,54,1,44' '12' '9,5,4,8,7' 5
'11,12,33,54,1,44,10,12,11,12,12' '44' '9,5,4,8,7,6,3,5,2,4,2' 6
'11,12,33,54,1,44' '999' '9,5,4,8,7,4' 0
'11,12,33' '0' '9,5,4' 0
----- ---- ----- ---
----- ---- ----- ---
----- ---- ----- ---
How can I add the next column resultVal_str to get results like the above?
First you need to find the position of getVal in col_str using the FIND_IN_SET function.
Once you have the position, you can pick the value at the same location in another_str using the SUBSTRING_INDEX function:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(another_str,
",", (FIND_IN_SET(getVal, col_str))),
",", - 1) AS resultVal_str
FROM tablename;
test:
SET @getVal = '12', @col_str = '11,12,33,54,1,44', @another_str = '9,5,4,8,7';
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@another_str, ",", FIND_IN_SET(@getVal, @col_str)), ",", -1) AS resultVal_str;
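One caveat, not covered above: when getVal is not present in col_str, FIND_IN_SET returns 0 and the expression yields an empty string rather than the 0 shown in the expected output. A small variation (my own addition, not part of the original answer) that handles that case:
SELECT IF(FIND_IN_SET(getVal, col_str) = 0,
          0,
          SUBSTRING_INDEX(SUBSTRING_INDEX(another_str, ",", FIND_IN_SET(getVal, col_str)), ",", -1)) AS resultVal_str
FROM tablename;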
I have a text file that looks like this:
gene1 gene2 gene3
a d c
b e d
c f g
d g
h
i
(Each column header is a human gene, and each column contains a variable number of proteins (strings, shown as letters here) that can bind to that gene.)
What I want to do is count how many columns each string is represented in, output that number and all the column headers, like this:
a 1 gene1
b 1 gene1
c 2 gene1 gene3
d 3 gene1 gene2 gene3
e 1 gene2
f 1 gene2
g 2 gene2 gene3
h 1 gene2
i 1 gene2
I have been trying to figure out how to do this in Perl and R, but without success so far. Thanks for any help.
This solution seems like a bit of a hack, but it gives the desired output. It relies on using both plyr and reshape packages, though I'm sure you could find base R alternatives. The trick is that function melt lets us flatten the data out into a long format, which allows for easy(ish) manipulation from that point forward.
library(reshape)
library(plyr)
#Recreate your data
dat <- data.frame(gene1 = c(letters[1:4], NA, NA),
                  gene2 = letters[4:9],
                  gene3 = c("c", "d", "g", NA, NA, NA))
#Melt the data. You'll need to update this if you have more columns
dat.m <- melt(dat, measure.vars = 1:3)
#Tabulate counts
counts <- as.data.frame(table(dat.m$value))
#I'm not sure what to call this column since it's a smooshing of column names
otherColumn <- ddply(dat.m, "value", function(x) paste(x$variable, collapse = " "))
#Merge the two together. You could fix the column names above, or just deal with it here
merge(counts, otherColumn, by.x = "Var1", by.y = "value")
Gives:
> merge(counts, otherColumn, by.x = "Var1", by.y = "value")
Var1 Freq V1
1 a 1 gene1
2 b 1 gene1
3 c 2 gene1 gene3
4 d 3 gene1 gene2 gene3
....
In Perl, assuming the proteins in each column don't have duplicates that need to be removed. (If they do, a hash of hashes should be used instead.)
use strict;
use warnings;

my $header = <>;
my %column_genes;
while ($header =~ /(\S+)/g) {
    $column_genes{$-[1]} = $1;
}

my %proteins;
while (my $line = <>) {
    while ($line =~ /(\S+)/g) {
        if (exists $column_genes{$-[1]}) {
            push @{ $proteins{$1} }, $column_genes{$-[1]};
        }
        else {
            warn "line $. column $-[1] unexpected protein $1 ignored\n";
        }
    }
}

for my $protein (sort keys %proteins) {
    print join("\t",
               $protein,
               scalar @{ $proteins{$protein} },
               join(' ', sort @{ $proteins{$protein} })
    ), "\n";
}
Reads from stdin, writes to stdout.
A one-liner (or rather a three-liner):
ddply(na.omit(melt(dat, m = 1:3)), .(value), summarize,
len = length(variable),
var = paste(variable, collapse = " "))
If there aren't a lot of columns, you can do something like this in SQL. You basically flatten the data out into a two-column derived table of protein/gene and then summarize it as needed.
;with cte as (
select gene1 as protein, 'gene1' as gene
union select gene2 as protein, 'gene2' as gene
union select gene3 as protein, 'gene3' as gene
)
select protein, count(*) as cnt, group_concat(gene) as gene
from cte
group by protein
In MySQL, like so:
select protein, count(*), group_concat(gene order by gene separator ' ') from gene_protein group by protein;
assuming data like:
create table gene_protein (gene varchar(255) not null, protein varchar(255) not null);
insert into gene_protein values ('gene1','a'),('gene1','b'),('gene1','c'),('gene1','d');
insert into gene_protein values ('gene2','d'),('gene2','e'),('gene2','f'),('gene2','g'),('gene2','h'),('gene2','i');
insert into gene_protein values ('gene3','c'),('gene3','d'),('gene3','g');
I need to concatenate values from two different tables.
Compare s.panelid (result like "AA") to b.modulecodes and return number_of_strings, then put s.panelid (the "AA" part) and number_of_strings together.
select concat(Mid(s.panelid, 5, 2), ' - ' , '??') as `Module Type-Strings`
from r2rtool.stringtopanel s, be.modulecodes b
where s.insertts > '2011-07-15' and s.insertts < '2011-07-26' and Mid(s.panelid, 5, 2) != 99
group by date(insertts), `Module Type-Strings`
order by `Module Type-Strings`;
Be (Table): modulecodes, number_of_strings
AA - 12
AB - 4
AD - 3
AE - 12
When I run the above query it returns things like: Module Type-Strings = 'AA-??' and "AB-??" of course.
I am looking for: Module Type-Strings = 'AA-12'
Just in case you haven't tried it already...
Have you tried this?
select concat(Mid(s.panelid, 5, 2), ' - ' , b.number_of_string) as `Module Type-Strings`
from r2rtool.stringtopanel s, be.modulecodes b
where s.insertts > '2011-07-15' and s.insertts < '2011-07-26' and Mid(s.panelid, 5, 2) != 99
group by date(insertts), `Module Type-Strings`
order by `Module Type-Strings`;
There I'm basically replacing the '??' with the column you are asking about, number_of_string in the be.modulecodes table (aliased as b in the from clause).
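Note that the comma join above has no condition matching the two tables, so every stringtopanel row pairs with every modulecodes row. If the intent is to match on the module code, something along these lines may be needed; the column name modulecodes in the ON clause is a guess at the code column, and I kept the answer's b.number_of_string even though the question spells it number_of_strings, so use whichever your schema actually has:
select concat(Mid(s.panelid, 5, 2), ' - ', b.number_of_string) as `Module Type-Strings`
from r2rtool.stringtopanel s
join be.modulecodes b on b.modulecodes = Mid(s.panelid, 5, 2)
where s.insertts > '2011-07-15' and s.insertts < '2011-07-26' and Mid(s.panelid, 5, 2) != 99
group by date(insertts), `Module Type-Strings`
order by `Module Type-Strings`;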