Query Postgres for number of items in JSON

I am running Postgres 9.3 and have a problem with a query involving a JSON column that I cannot seem to crack.
Let's assume this is the table:
# CREATE TABLE aa (a int, b json);
# INSERT INTO aa VALUES (1, '{"f1":1,"f2":true}');
# INSERT INTO aa VALUES (2, '{"f1":2,"f2":false,"f3":"Hi I''m \"Dave\""}');
# INSERT INTO aa VALUES (3, '{"f1":3,"f2":true,"f3":"Hi I''m \"Popo\""}');
I now want to create a query that returns all rows that have exactly three items/keys in the root node of the JSON column (i.e., rows 2 and 3). Whether the JSON is nested doesn't matter.
I tried to use json_object_keys and json_each but couldn't get it to work.

json_each(json) should do the job. Counting only root elements:
SELECT aa.*
FROM aa, json_each(aa.b) elem
GROUP BY aa.a -- possible because a is unique per row (the de-facto PK)
HAVING count(*) = 3;
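For comparison, a correlated subquery over json_object_keys() (which likewise returns only the keys of the outermost object) gives the same result; this is just a sketch of an alternative, not part of the original answer:
-- count the root-level keys per row
SELECT *
FROM aa
WHERE (SELECT count(*) FROM json_object_keys(aa.b)) = 3;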

Related

MySQL json reverse search

I have a MySQL table with a column of type json. The values of this column are JSON arrays, not JSON objects. I need to find the records of this table where at least one value in the JSON array is a substring of a given string/phrase.
Let's suppose the table looks like this:
create table if not exists test(id int, col json);
insert into test values (1, '["ab", "cd"]');
insert into test values (2, '["ef", "gh", "ij"]');
insert into test values (3, '["xyz"]');
If the input string/phrase is "acf ghi z", the second row must be returned as the result, because "gh" is a substring of the input. I read a lot about json_contains, json_extract, json_search and even json_overlaps but couldn't manage to solve this problem.
What is the correct SQL syntax to retrieve the matching rows?
MySQL version is 8.0.20
You can use json_table() to extract the JSON array as rows in a table. Then just filter:
select *
from test t cross join
json_table(t.col, '$[*]' columns (str varchar(255) path '$')) j
where 'acf ghi z' like concat('%', j.str, '%');
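If more than one array element matches the input, the cross join will return the same row once per match. A small hedge against that, assuming duplicates are unwanted, is to add distinct:
select distinct t.*
from test t cross join
json_table(t.col, '$[*]' columns (str varchar(255) path '$')) j
where 'acf ghi z' like concat('%', j.str, '%');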

How to do Where clause on simple Json Array in SQL Server 2017?

Say I have a column in my database called attributes which has this value as an example:
{"pages":["Page1"]}
How can I write a where clause to filter down to the rows that have "Page1" in it?
select JSON_QUERY(Attributes, '$.pages')
from Table
where JSON_QUERY(Attributes, '$.pages') in ('Page1')
Edit:
From the docs it seems like this might work, though it seems overly complicated for what it is doing.
select count(*)
from T c
cross apply Openjson(c.Attributes)
with (pages nvarchar(max) '$.pages' as json)
outer apply openjson(pages)
with ([page] nvarchar(100) '$')
where [page] = 'Page1'
JSON_QUERY returns the JSON fragment (here ["Page1"]) rather than a scalar value, so comparing its result with IN ('Page1') never matches; OPENJSON is the right tool to shred the array into rows. Something like this:
use tempdb
create table T(id int, Attributes nvarchar(max))
insert into T(id,Attributes) values (1, '{"pages":["Page1"]}')
insert into T(id,Attributes) values (2, '{"pages":["Page3","Page4"]}')
insert into T(id,Attributes) values (3, '{"pages":["Page3","Page1"]}')
select *
from T
where exists
(
select *
from openjson(T.Attributes,'$.pages')
where value = 'Page1'
)
returns
id Attributes
----------- ---------------------------
1 {"pages":["Page1"]}
3 {"pages":["Page3","Page1"]}
(2 rows affected)
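The same filter can also be phrased with CROSS APPLY instead of EXISTS; this is an equivalent sketch against the sample table above (note it emits one row per matching array element, so duplicates are possible if a page is listed twice):
select T.id, T.Attributes
from T
cross apply openjson(T.Attributes, '$.pages') j
where j.[value] = 'Page1'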

How to use comparison operators with json in postgres

I am using postgres 9.4. How can I use regular comparison operators such as <, >, <= etc. with a json column in Postgres, where each key is numeric and each value is text, selecting entries until a limit on the numeric key is reached?
This is my table:
create table foo (
id numeric,
x json
);
The values for the json are as follows:
id | x
----+--------------------
1 | '{"1":"A","2":"B"}'
2 | '{"3":"C","4":"A"}'
3 | '{"5":"B","6":"C"}'
and so on, with random keys up to 100.
I am trying to get all the id, keys, values of the json key where key is <= 20.
I have tried:
select *
from foo
where x->>'key' <='5';
The above query ran, and should have given me 20 rows of output; instead it gave me 0. The query below ran and gave me 20 rows, but it took over 30 minutes!
select
id
, key::bigint as key
, value::text as value
from foo
, jsonb_each(x::jsonb)
where key::numeric <= 100;
Is there a way to use a for loop or a do-while loop until x = 20 for json? Is there a way the run time can be reduced?
Any help appreciated!
Your first attempt returns nothing because x->>'key' looks up the literal key named 'key'. The only operator which can query JSON keys and use indexes on jsonb (but not on json) is the ? operator. But unfortunately, you cannot use it in conjunction with <=.
However, you can use generate_series() if your queried range is relatively small:
-- use `jsonb` instead of `json`
create table foo (
id numeric,
x jsonb
);
-- sample data
insert into foo
values (1, '{"1":"A","2":"B"}'),
(2, '{"3":"C","4":"A"}'),
(3, '{"5":"B","6":"C"}'),
(4, '{"7":"A","8":"B"}'),
(5, '{"9":"C","10":"A"}'),
(6, '{"11":"B","12":"C"}');
-- optionally an index to speed up `?` queries
create index foo_x_idx on foo using gin (x);
select distinct foo.*
from generate_series(1, 5) s
join foo on x ? s::text;
To work with larger ranges, you may need to extract all numeric keys of x into an integer array (int[]) & index that.
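A sketch of that idea, with the keys column, the backfill, and the && overlap query all illustrative choices rather than part of the original answer:
-- materialize the numeric keys into an int[] column and index it
alter table foo add column keys int[];
update foo
set keys = (select array_agg(k::int) from jsonb_object_keys(x) k);
create index foo_keys_idx on foo using gin (keys);
-- the array-overlap operator && can use the GIN index
select *
from foo
where keys && (select array_agg(i) from generate_series(1, 20) i);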

SSIS amount of insert operations based on record value

I'm migrating data from an old database to a new one in SSIS (2008 R2 Enterprise Edition). In the old database, I have a table called [Financial] with a column named [Installments]. This column holds a numeric value: 1, 2, 3 or 4, the number of payment installments. The old database only stores this count and does not provide any more information about the individual installments. The new database, however, describes each installment, with columns like [InstallmentPaid] (whether the customer paid the installment), [DateInstallmentPaid] (when the customer paid it), [InstallmentNumber] (which installment it is: if the customer wants to pay in 4 installments, 4 records are created, with InstallmentNumber 1 through 4) and of course [InstallmentPrice].
So the old database has the table [Financial] with the column [Installments]. The new database has the same [Financial] table, but instead of an [Installments] column it has a related table [CustInstallments], which has the FK FinancialID (a 1-to-many relationship).
So now that I'm migrating the data from the old database to the new one, I don't want to lose the information about the amount of installments. The following logic should be executed in SSIS in order to prevent information loss:
For each [Installments] in [Financial] from the old database, insert a
new [CustInstallment] referencing the corresponding [FinancialID]
within the new database
So if in the old database the numeric value within [Installments] is 3, then I need three inserts of INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?). The second ? should be 1 at the first insert, 2 at the 2nd and 3 at the 3rd. So I need some kind of loop here? Is that even possible within the data flow of SSIS?
Below is a description of my flow so far.
I select the old database source [Financial]
I convert the data so it matches the current database data types
Since I already migrated the old [Financial] database data to the new one, I can use the lookup on the FinancialID's in the new database, so the first variable ? of the INSERT query can be linked to the lookup output.
I split all the possibilities, like when the Installment contains NULL, 1, 2, 3 or 4.
The 5th step is what I'm looking for: some clue, some direction towards something useful. When NumberOfInstallments is 1, I need to INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (?, ?) with the second ? variable as 1. When NumberOfInstallments is 2, I need two inserts, one with InstallmentNumber 1 and one with InstallmentNumber 2. When it is 3, three inserts with an incrementing InstallmentNumber. When 4, four.
Is there any smart way to achieve this? Is there any built-in function available of SSIS that I am not aware of, and could be used here?
I appreciate any input here!
Thank you.
EDIT 10/02/2014
I have tried the following code:
INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, X);
WITH nums AS(select top 4 row_number() over (order by (select 1)) AS id
from sys.columns
) SELECT f.* FROM CustInstallments f
JOIN nums n on f.InstallmentNumber>= n.id
But this query doesn't create X amount of new records; instead, the JOIN to nums just replicates each existing row X times, so I still can't track every installment individually.
I have written my own code - it took me a while since I never worked with TSQL before - and this works like a charm in SQL Server:
DECLARE @MyCounter tinyint;
SET @MyCounter = 1;
WHILE (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) < 4
BEGIN
INSERT INTO CustInstallments (FinancialID, InstallmentNumber) VALUES (@ID, @MyCounter)
IF (SELECT COUNT(*) FROM CustInstallments WHERE FinancialID = @ID) > 4 -- filter on @ID, or rows from other FinancialIDs skew the count
BREAK
ELSE
SET @MyCounter = @MyCounter + 1;
CONTINUE
END
Now in SSIS, I cannot change the @ID to a ?-variable and use the lookup FinancialID, because as soon as I do, I get an error.
Could anyone explain to me why SSIS doesn't like this?
EDIT 10/02/2014
My last and least preferable option would be to use a multicast to run the insert query X times, where each branch is an OLE DB Command. For example, when there are 3 [Installments] in the old column, I would create a multicast with 3 OLE DB Commands, with their SqlCommand:
OLE DB Command 1: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 1);
OLE DB Command 2: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 2);
OLE DB Command 3: INSERT INTO CustInstallments (FinancialID, InstallmentNumber) values (?, 3);
This is an ugly approach, but with the small amount of data I am using, perhaps it's not a big deal.
I would try to resolve this with TSQL in your source query, by joining to some kind of numbers table like this:
create table #financial (id int not null identity(1,1), investments int);
go
insert into #financial (investments) values (1),(2);
GO
with nums as (select top 5 row_number() over (order by (select 1)) as id
from sys.columns
)
select f.* from #financial f
JOIN nums n on f.investments >= n.id
EDIT:
The above example is unclear - sorry about that. I was only presenting the concept of replicating the rows, but not completing the thought of how you will apply it. Try this out:
create table #financial (financialid int not null, investments int);
go
insert into #financial (financialid, investments) values (123, 1),(456, 2);
GO
with nums as (select top 5 row_number() over (order by (select 1)) as id
from sys.columns
)
select f.financialid, n.id as investments from #financial f
JOIN nums n on n.id <= f.investments
So for each financialid you will get multiple rows, one per installment, each with a different installment number. This is a set-based way to handle the operation, which will perform better than a procedural method and will require less effort in SSIS. Does that make more sense?
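Applied to the tables from the question (FinancialID and Installments are assumed column names here; adjust them to the real schema), the source query for the data flow could be sketched as:
with nums as (select top 4 row_number() over (order by (select 1)) as id
from sys.columns
)
select f.FinancialID, n.id as InstallmentNumber
from Financial f
join nums n on n.id <= f.Installments
where f.Installments is not null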

MySQL string search between commas

My db is built as follows:
value1,value2,value3 | 1
value4,value5,val"u6 | 2
value 5, value 6, value 8 | 3
(Two columns: one with comma-separated values in a normal varchar, the other a numeric key)
I'm looking for the most reliable way to find an exact value between the commas, and I'm getting kinda lost here.
I'm using the word boundaries for that:
SELECT * FROM ABC WHERE content REGEXP '[[:<:]]value 5[[:>:]]'
The problem is when I'm doing this query:
SELECT * FROM ABC WHERE content REGEXP '[[:<:]]5[[:>:]]'
It will also return the "value 5" row, which is not what I'm looking for. Another problem is that the word-boundary markers treat the quote character (as in val"u6) as a boundary.
How can I solve this and create a simple query that only fetches rows containing the exact, full value between the commas?
BTW
I don't have an option to change the DB structure...
As #MarcB commented, you really should try to normalise your schema:
CREATE TABLE ABC_values (
id INT,
content VARCHAR(10),
FOREIGN KEY (id) REFERENCES ABC (id)
);
INSERT INTO ABC_values
(id, content)
VALUES
(1, 'value1'), (1, 'value2'), (1, 'value3'),
(2, 'value4'), (2, 'value5'), (2, 'val"u6'),
(3, 'value 5'), (3, 'value 6'), (3, 'value 8')
;
ALTER TABLE ABC DROP content;
Then, as required, you can perform a SQL join between your tables and group the results:
SELECT id, GROUP_CONCAT(ABC_values.content) AS content
FROM ABC LEFT JOIN ABC_values USING (id) NATURAL JOIN (
SELECT id FROM ABC_values WHERE content = 'value 5'
) t
GROUP BY id
If it is completely impossible to change the schema, you can try FIND_IN_SET():
SELECT * FROM ABC WHERE FIND_IN_SET('value 5', content)
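One caveat, since the sample rows contain spaces after the commas: FIND_IN_SET() does no trimming, so 'value 6' will not match an item stored as ' value 6'. A workaround sketch is to normalize the separators on the fly:
SELECT * FROM ABC WHERE FIND_IN_SET('value 6', REPLACE(content, ', ', ',')) > 0;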
Another workaround is to use LIKE with the delimiters of the items in your list:
WHERE content LIKE '%,5,%'
But the item you're looking for may be at the start or end of the list. So you have to modify the list on the fly to include the delimiters at the start and end.
WHERE CONCAT(',', content, ',') LIKE '%,5,%'
This works, and in some sense it's no worse than any other search that you do for an item in a comma-separated list. That's because such a search is bound to do a table-scan and therefore it's very inefficient. As the data in your table grows, you'll find it can't perform well enough to be useful.
See also my answer to Is storing a delimited list in a database column really that bad?