Compare JSON values in MariaDB - json

How can I compare two JSON values in MariaDB? Two values such as {"b": 1, "a": 2} and {"a": 2, "b": 1} should be equal. Does MariaDB contain a function to reorder the elements of a JSON value?

If you expect to need this (uncommon) kind of comparison, build the JSON in some canonical way before storing it. The obvious way for a simple JSON like yours is to alphabetize the keys. How to do that will depend on the "encode" library you are using for JSON.
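As a minimal sketch of the "canonical form before storing" idea, assuming the application encodes its JSON with Python's standard json module (the helper name canonical_json is hypothetical): sorting the keys and fixing the separators makes two logically equal objects serialize to the same string, so a plain string comparison in the database works.

```python
import json

def canonical_json(text: str) -> str:
    """Re-encode a JSON document with sorted keys and fixed separators."""
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))

# Two orderings of the same object yield one canonical string.
a = canonical_json('{"b": 1, "a": 2}')
b = canonical_json('{"a": 2, "b": 1}')
print(a == b)
```

sort_keys also applies to nested objects, but array order is left untouched, which is usually what you want because array order is significant in JSON.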

Just use JSON_EXTRACT. JSON_EXTRACT doesn't care where a key sits within the JSON string.
Query
SELECT
JSON_EXTRACT(@json_string_1, '$.a') AS a1
, JSON_EXTRACT(@json_string_2, '$.a') AS a2
, JSON_EXTRACT(@json_string_1, '$.b') AS b1
, JSON_EXTRACT(@json_string_2, '$.b') AS b2
FROM (
SELECT
@json_string_1 := '{"b":1,"a":2}'
, @json_string_2 := '{"a":2,"b":1}'
)
AS
json_strings
Result
a1 a2 b1 b2
------ ------ ------ --------
2 2 1 1
Now use this result as a derived table so we can check whether a1 is equal to a2 and b1 is equal to b2.
Query
SELECT
1 AS json_equal
FROM (
SELECT
JSON_EXTRACT(@json_string_1, '$.a') AS a1
, JSON_EXTRACT(@json_string_2, '$.a') AS a2
, JSON_EXTRACT(@json_string_1, '$.b') AS b1
, JSON_EXTRACT(@json_string_2, '$.b') AS b2
FROM (
SELECT
@json_string_1 := '{"b":1,"a":2}'
, @json_string_2 := '{"a":2,"b":1}'
)
AS
json_strings
)
AS json_data
WHERE
json_data.a1 = json_data.a2
AND
json_data.b1 = json_data.b2
Result
json_equal
------------
1

Disclaimer: I work for MariaDB
See my answer at https://dba.stackexchange.com/a/300235/208895 for an example how to use JSON_EQUALS available as of 10.7.

Related

MySQL 8 update with Replace() or Regex

How can i update data using replace or regex-like method from
id | jdata
---------------
01 | {"name1":["number","2"]}
02 | {"val1":["number","12"],"val2":["number","22"]}
to
id | jdata
---------------
01 | {"name1":2 }
02 | {"val1": 12,"val2":22 }
I need to make a proper json entry for numbers and replace an array with a number from that array. Column "jdata" can have any number of similar attributes from the example. Something similar to this would do:
UPDATE table SET jdata = REPLACE(jdata, '["number","%d"]', %d);
Two ways:
The long, more clumsy way, using JSON_ARRAY:
UPDATE table1,
(
SELECT
id,
JSON_EXTRACT(jdata, "$.name1[0]") as A,
JSON_EXTRACT(jdata, "$.name1[1]") as B,
JSON_EXTRACT(jdata, "$.val1[0]") as C,
JSON_EXTRACT(jdata, "$.val1[1]") as D,
JSON_EXTRACT(jdata, "$.val2[0]") as E,
JSON_EXTRACT(jdata, "$.val2[1]") as F
FROM table1
) x
SET jdata = CASE WHEN table1.id=1 THEN JSON_ARRAY("name1",x.B)
ELSE JSON_ARRAY("val1",x.D,"val2",F) END
WHERE x.id=table1.id;
Or using JSON_REPLACE:
update table1
set jdata = JSON_REPLACE(jdata, "$.name1",JSON_EXTRACT(jdata,"$.name1[1]"))
where id=1;
update table1
set jdata = JSON_REPLACE(jdata, "$.val1",JSON_EXTRACT(jdata,"$.val1[1]"),
"$.val2",JSON_EXTRACT(jdata,"$.val2[1]"))
where id=2;
see: DBFIDDLE for both options
EDIT: To get more depth in the query, you can start with below, and create a new JSON message from this stuff without the number:
WITH RECURSIVE cte1 as (
select 0 as x
union all
select x+1 from cte1 where x<10
)
select
id,
x,
JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))) j,
JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))))) v,
JSON_UNQUOTE(JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[0]"))) v1,
JSON_UNQUOTE(JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[1]"))) v2
from table1
cross join cte1
where x<JSON_DEPTH(jdata)
and not JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]")) is null
order by id,x;
output:
id | x | j     | v                | v1     | v2
-----------------------------------------------
1  | 0 | name1 | ["number", "2"]  | number | 2
2  | 0 | val1  | ["number", "12"] | number | 12
2  | 1 | val2  | ["number", "22"] | number | 22
This should also handle JSON messages that contain additional keys such as val3, val4, etc., up to the limit of 10 that is currently hard-coded in cte1.
EDIT2: If you only need to remove the "number" element from the JSON message, you can also repeat this UPDATE until all "number" tags are gone (you can do the repetition in a stored procedure; I am not going to write the stored procedure for you 😉)
update
table1,
( WITH RECURSIVE cte1 as (
select 0 as x
union all
select x+1 from cte1 where x<10
) select * from cte1 )x
set jdata = JSON_REMOVE(table1.jdata, CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[0]"))
where JSON_UNQUOTE(JSON_EXTRACT(jdata,CONCAT("$.",JSON_UNQUOTE(JSON_EXTRACT(JSON_KEYS(jdata),CONCAT("$[",x,"]"))),"[0]"))) = "number"
An example, where I do run the update 2 times, is in this DBFIDDLE
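For comparison, the target transformation itself is small when done outside SQL. The following is only a sketch, assuming the rows can be read and rewritten by application code; the function name unwrap_numbers is hypothetical, and it only rewrites values that are exactly a two-element array starting with "number".

```python
import json

def unwrap_numbers(doc: str) -> str:
    """Replace every ["number", "<digits>"] value with the integer it carries."""
    data = json.loads(doc)
    result = {}
    for key, value in data.items():
        is_pair = isinstance(value, list) and len(value) == 2
        if is_pair and value[0] == "number":
            result[key] = int(value[1])   # "12" -> 12
        else:
            result[key] = value           # leave anything else untouched
    return json.dumps(result)

print(unwrap_numbers('{"val1":["number","12"],"val2":["number","22"]}'))
```

Because it loops over whatever keys are present, it does not need to know the attribute names or how many there are.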

Postgres select value by key from json in a list

Given the following:
create table test (
id int,
status text
);
insert into test values
(1,'[]'),
(2,'[{"A":"d","B":"c"}]'),
(3,'[{"A":"g","B":"f"}]');
Is it possible to return?
id A B
1 null null
2 d c
3 g f
I am attempting something like this:
select id,
status::json ->> 0 #> "A" from test
Try this to address your specific example :
SELECT id, (status :: json)#>>'{0,A}' AS A, (status :: json)#>>'{0,B}' AS B
FROM test
see the result
see the manual :
jsonb #>> text[] → text
Extracts JSON sub-object at the specified path as text.
'{"a": {"b": ["foo","bar"]}}'::json #>> '{a,b,1}' → bar
This does it:
SELECT id,
(status::json->0)->>'A' as A,
(status::json->0)->>'B' as B
FROM test;
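To make the path semantics concrete, here is a rough Python imitation of what #>> '{0,A}' does: walk the path step by step and yield NULL (None here) as soon as a step is missing. This is only an illustration of the behaviour, not how PostgreSQL implements it, and extract_path_text is a hypothetical helper name.

```python
import json

def extract_path_text(doc: str, path: list):
    """Follow path through a parsed JSON document; return None if any step is missing."""
    current = json.loads(doc)
    for step in path:
        try:
            current = current[step]
        except (IndexError, KeyError, TypeError):
            return None
    if current is None or isinstance(current, str):
        return current
    return json.dumps(current)  # non-string results are returned as JSON text

for status in ('[]', '[{"A":"d","B":"c"}]', '[{"A":"g","B":"f"}]'):
    print(extract_path_text(status, [0, "A"]), extract_path_text(status, [0, "B"]))
```

The empty array in row 1 has no element 0, which is why that row comes back as null for both columns.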

How to truncate double precision value in PostgreSQL by keeping exactly first two decimals?

I'm trying to truncate a double precision value when I build JSON using the json_build_object() function in PostgreSQL 11.8, but with no luck. To be more precise, I'm trying to truncate the number 19.9899999999999984 to ONLY two decimals while making sure it is NOT rounded to 20.00 (which is what happens), but kept at 19.98.
BTW, what I've tried so far was to use:
1) TRUNC(found_book.price::numeric, 2) and I get value 20.00
2) ROUND(found_book.price::numeric, 2) and I get value 19.99 -> so far this is the closest value but not what I need
3) ROUND(found_book.price::double precision, 2) and I get
[42883] ERROR: function round(double precision, integer) does not exist
Also here is whole code I'm using:
create or replace function public.get_book_by_book_id8(b_id bigint) returns json as
$BODY$
declare
found_book book;
book_authors json;
book_categories json;
book_price double precision;
begin
-- Load book data:
select * into found_book
from book b2
where b2.book_id = b_id;
-- Get assigned authors
select case when count(x) = 0 then '[]' else json_agg(x) end into book_authors
from (select aut.*
from book b
inner join author_book as ab on b.book_id = ab.book_id
inner join author as aut on ab.author_id = aut.author_id
where b.book_id = b_id) x;
-- Get assigned categories
select case when count(y) = 0 then '[]' else json_agg(y) end into book_categories
from (select cat.*
from book b
inner join category_book as cb on b.book_id = cb.book_id
inner join category as cat on cb.category_id = cat.category_id
where b.book_id = b_id) y;
book_price = trunc(found_book.price, 2);
-- Build the JSON response:
return (select json_build_object(
'book_id', found_book.book_id,
'title', found_book.title,
'price', book_price,
'amount', found_book.amount,
'is_deleted', found_book.is_deleted,
'authors', book_authors,
'categories', book_categories
));
end
$BODY$
language 'plpgsql';
select get_book_by_book_id8(186);
How do I keep EXACTLY the FIRST two decimal digits, 19.98 (any suggestion/help is greatly appreciated)?
P.S. PostgreSQL version is 11.8
In PostgreSQL 11.8 or 12.3 I cannot reproduce:
# select trunc('19.9899999999999984'::numeric, 2);
trunc
-------
19.98
(1 row)
# select trunc(19.9899999999999984::numeric, 2);
trunc
-------
19.98
(1 row)
# select trunc(19.9899999999999984, 2);
trunc
-------
19.98
(1 row)
Actually I can reproduce with the right type and a special setting:
# set extra_float_digits=0;
SET
# select trunc(19.9899999999999984::double precision::text::numeric, 2);
trunc
-------
19.99
(1 row)
And a possible solution:
# show extra_float_digits;
extra_float_digits
--------------------
3
(1 row)
# select trunc(19.9899999999999984::double precision::text::numeric, 2);
trunc
-------
19.98
(1 row)
But note that:
Note: The extra_float_digits setting controls the number of extra
significant digits included when a floating point value is converted
to text for output. With the default value of 0, the output is the
same on every platform supported by PostgreSQL. Increasing it will
produce output that more accurately represents the stored value, but
may be unportable.
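The same effect can be reproduced outside PostgreSQL. This is a sketch in Python, not a description of what the server does internally: the result of truncating depends on whether the double is first turned into a short text form ("19.99") or into its full stored value (19.98999...). The function names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

def trunc_stored_value(value: float, places: int = 2) -> Decimal:
    """Truncate using the exact binary value held in the float."""
    quantum = Decimal(1).scaleb(-places)          # 0.01 for places=2
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)

def trunc_short_text(value: float, places: int = 2) -> Decimal:
    """Truncate after converting the float to its shortest text form first."""
    quantum = Decimal(1).scaleb(-places)
    return Decimal(repr(value)).quantize(quantum, rounding=ROUND_DOWN)

price = float("19.9899999999999984")
print(trunc_stored_value(price))  # works on 19.98999...
print(trunc_short_text(price))    # works on the text "19.99"
```

The short text form plays the role of extra_float_digits=0 here, and the exact conversion plays the role of the higher setting.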
As @pifor suggested I've managed to get it done by directly passing trunc(found_book.price::double precision::text::numeric, 2) as value in json_build_object like this:
json_build_object(
'book_id', found_book.book_id,
'title', found_book.title,
'price', trunc(found_book.price::double precision::text::numeric, 2),
'amount', found_book.amount,
'is_deleted', found_book.is_deleted,
'authors', book_authors,
'categories', book_categories
)
Using book_price = trunc(found_book.price::double precision::text::numeric, 2); and passing it as value for 'price' key didn't work.
Thank you for your help. :)

Sort values that contain letters and symbols in a custom order

Can you change the MySQL sort by function? I am trying to sort my values according to an arbitrary order.
Currently looking for ways to inject a function that might help me out here short of adding a column and modifying the import.
This is the order I want:
AAA
AA+
AA
AA-
A+
A
A-
BBB+
BBB
BBB-
BB+
BB
BB-
B+
B
B-
CCC+
CCC
CCC-
CC
This is my result using sort by:
A
A+
A-
AA
AA+
AA-
AAA
B
B+
B-
BB
BB+
BB-
BBB
BBB+
BBB-
C
CC
CCC
CCC+
CCC-
EDIT:
Attempting but getting syntax errors:
CREATE FUNCTION sortRating (s CHAR(20))
RETURNS INT(2)
DECLARE var INT
CASE s
WHEN 'AAA' THEN SET var = 1
WHEN 'AA+' THEN SET var = 2
ELSE
SET VAR = 3
END CASE
RETURN var
END;
This is possible using the following syntax:
ORDER BY FIELD(<field_name>, comma-separated-custom-order)
for instance, if the expression you want to order by is called rating, then your ORDER BY clause would read:
ORDER BY FIELD(rating, 'AAA', 'AA+', 'AA', 'AA-', 'A+', 'A', 'A-',
'BBB+', 'BBB', 'BBB-', 'BB+', 'BB', 'BB-',
'B+', 'B', 'B-', 'CCC+', 'CCC', 'CCC-', 'CC')
Here's documentation on the FIELD FUNCTION
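The idea behind FIELD is simply "rank each value by its position in an explicit list". A small Python sketch of that idea, for illustration only (field_rank is a hypothetical helper that imitates FIELD by returning 0 for values not in the list, so they sort first):

```python
CUSTOM_ORDER = ['AAA', 'AA+', 'AA', 'AA-', 'A+', 'A', 'A-',
                'BBB+', 'BBB', 'BBB-', 'BB+', 'BB', 'BB-',
                'B+', 'B', 'B-', 'CCC+', 'CCC', 'CCC-', 'CC']

# Position lookup, 1-based like FIELD().
RANK = {rating: position for position, rating in enumerate(CUSTOM_ORDER, start=1)}

def field_rank(rating: str) -> int:
    """Return the 1-based position in CUSTOM_ORDER, or 0 if the rating is not listed."""
    return RANK.get(rating, 0)

ratings = ['A', 'BB+', 'AAA', 'AA-', 'CCC']
print(sorted(ratings, key=field_rank))
```

Values missing from the list get rank 0, so check for unlisted ratings if that placement matters.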
I see a pattern here:
BBB+
BBB
BBB-
BB+
BB
BB-
B+
B
B-
Think of each character as a column and sort each column in this order:
Letters
+
empty string
-
SELECT rating
FROM test
ORDER BY
MID(rating, 1, 1),
CASE MID(rating, 2, 1) WHEN '+' THEN 2 WHEN '' THEN 3 WHEN '-' THEN 4 ELSE 1 END,
CASE MID(rating, 3, 1) WHEN '+' THEN 2 WHEN '' THEN 3 WHEN '-' THEN 4 ELSE 1 END,
CASE MID(rating, 4, 1) WHEN '+' THEN 2 WHEN '' THEN 3 WHEN '-' THEN 4 ELSE 1 END
SQL Fiddle
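The column-by-column rule can be checked quickly outside the database. The sketch below mirrors the ORDER BY above in Python (first character as is, then a weight for each of positions 2 to 4); the names weight and rating_key are hypothetical.

```python
def weight(char: str) -> int:
    """Letters sort first, then '+', then 'no character', then '-'."""
    return {'+': 2, '': 3, '-': 4}.get(char, 1)

def rating_key(rating: str):
    # rating[i:i+1] yields '' past the end of the string, like MID() does.
    return (rating[0:1],) + tuple(weight(rating[i:i + 1]) for i in range(1, 4))

wanted = ['AAA', 'AA+', 'AA', 'AA-', 'A+', 'A', 'A-',
          'BBB+', 'BBB', 'BBB-', 'BB+', 'BB', 'BB-',
          'B+', 'B', 'B-', 'CCC+', 'CCC', 'CCC-', 'CC']

# Start from the reverse order and confirm the key restores the wanted order.
print(sorted(reversed(wanted), key=rating_key) == wanted)
```

Unlike the FIELD approach, this keeps working if new ratings following the same pattern appear later.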

Perl (or R, or SQL): Count how often string appears across columns

I have a text file that looks like this:
gene1 gene2 gene3
a     d     c
b     e     d
c     f     g
d     g
      h
      i
(Each column is a human gene, and each contains a variable number of proteins (strings, shown as letters here) that can bind to those genes).
What I want to do is count how many columns each string is represented in, output that number and all the column headers, like this:
a 1 gene1
b 1 gene1
c 2 gene1 gene3
d 3 gene1 gene2 gene3
e 1 gene2
f 1 gene2
g 2 gene2 gene3
h 1 gene2
i 1 gene2
I have been trying to figure out how to do this in Perl and R, but without success so far. Thanks for any help.
This solution seems like a bit of a hack, but it gives the desired output. It relies on using both plyr and reshape packages, though I'm sure you could find base R alternatives. The trick is that function melt lets us flatten the data out into a long format, which allows for easy(ish) manipulation from that point forward.
library(reshape)
library(plyr)
#Recreate your data
dat <- data.frame(gene1 = c(letters[1:4], NA, NA),
gene2 = letters[4:9],
gene3 = c("c", "d", "g", NA, NA, NA)
)
#Melt the data. You'll need to update this if you have more columns
dat.m <- melt(dat, measure.vars = 1:3)
#Tabulate counts
counts <- as.data.frame(table(dat.m$value))
#I'm not sure what to call this column since it's a smooshing of column names
otherColumn <- ddply(dat.m, "value", function(x) paste(x$variable, collapse = " "))
#Merge the two together. You could fix the column names above, or just deal with it here
merge(counts, otherColumn, by.x = "Var1", by.y = "value")
Gives:
> merge(counts, otherColumn, by.x = "Var1", by.y = "value")
Var1 Freq V1
1 a 1 gene1
2 b 1 gene1
3 c 2 gene1 gene3
4 d 3 gene1 gene2 gene3
....
In perl, assuming the proteins in each column don't have duplicates that need to be removed. (If they do, a hash of hashes should be used instead.)
use strict;
use warnings;
my $header = <>;
my %column_genes;
while ($header =~ /(\S+)/g) {
$column_genes{$-[1]} = "$1";
}
my %proteins;
while (my $line = <>) {
while ($line =~ /(\S+)/g) {
if (exists $column_genes{$-[1]}) {
push @{ $proteins{$1} }, $column_genes{$-[1]};
}
else {
warn "line $. column $-[1] unexpected protein $1 ignored\n";
}
}
}
for my $protein (sort keys %proteins) {
print join("\t",
$protein,
scalar @{ $proteins{$protein} },
join(' ', sort @{ $proteins{$protein} } )
), "\n";
}
Reads from stdin, writes to stdout.
A one liner (or rather 3 liner)
ddply(na.omit(melt(dat, m = 1:3)), .(value), summarize,
len = length(variable),
var = paste(variable, collapse = " "))
If it's not a lot of columns, you can do something like this in sql. You basically flatten out the data into a 2 column derived table of protein/gene and then summarize it as needed.
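with cte as (
-- "genes" is a placeholder name for the table that holds the gene1..gene3 columns
select gene1 as protein, 'gene1' as gene from genes
union select gene2 as protein, 'gene2' as gene from genes
union select gene3 as protein, 'gene3' as gene from genes
)
select protein, count(*) as cnt, group_concat(gene) as gene
from cte
where protein is not null -- shorter columns leave NULLs behind
group by protein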
In mysql, like so:
select protein, count(*), group_concat(gene order by gene separator ' ') from gene_protein group by protein;
assuming data like:
create table gene_protein (gene varchar(255) not null, protein varchar(255) not null);
insert into gene_protein values ('gene1','a'),('gene1','b'),('gene1','c'),('gene1','d');
insert into gene_protein values ('gene2','d'),('gene2','e'),('gene2','f'),('gene2','g'),('gene2','h'),('gene2','i');
insert into gene_protein values ('gene3','c'),('gene3','d'),('gene3','g');
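All of the approaches above do the same two things: flatten the data into (protein, gene) pairs, then group by protein. For readers who want to see that shape without a database, here is a minimal Python sketch; it assumes the columns have already been read into a dict (parsing the file is left out), and genes_per_protein is a hypothetical name.

```python
from collections import defaultdict

def genes_per_protein(columns: dict) -> dict:
    """Map each protein to the sorted list of genes (columns) it appears in."""
    found = defaultdict(set)              # a set ignores duplicates within a column
    for gene, proteins in columns.items():
        for protein in proteins:
            found[protein].add(gene)
    return {protein: sorted(genes) for protein, genes in sorted(found.items())}

columns = {
    "gene1": ["a", "b", "c", "d"],
    "gene2": ["d", "e", "f", "g", "h", "i"],
    "gene3": ["c", "d", "g"],
}

for protein, genes in genes_per_protein(columns).items():
    print(protein, len(genes), " ".join(genes), sep="\t")
```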