Constructing nested JSON arrays in T-SQL

Given the table:
C1  C2    C3
----------------
1   'v1'  1.1
2   'v2'  2.2
3   'v3'  3.3
Is there any "easy" way to return JSON in this format:
{
  "columns": [ "C1", "C2", "C3" ],
  "rows": [
    [ 1, "v1", 1.1 ],
    [ 2, "v2", 2.2 ],
    [ 3, "v3", 3.3 ]
  ]
}
To generate an array with single values from a table there is a neat trick like this:
SELECT JSON_QUERY(REPLACE(REPLACE(
    (
        SELECT id
        FROM table a
        WHERE pk in (1,2)
        FOR JSON PATH
    ), '{"id":',''),'}','')) 'ids'
Which generates
"ids": [1,2]
But to construct the nested array above the replacing gets really tedious, anyone know a good way to achieve this?

Well, you ask for an easy way, but the following will not be easy :-)
The tricky part is to know which values need to be quoted and which can remain naked.
This needs generic type analysis to find out which values are strings.
The only way I know to get at metadata (besides building dynamic SQL using meta views like INFORMATION_SCHEMA.COLUMNS) is XML together with an AUTO-schema.
This XML is actually very close to your needs: there is a list of columns at the beginning, followed by a list of rows. But it is not JSON, of course...
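For reference, the dynamic-SQL alternative mentioned above would start from the catalog views; a minimal sketch of reading the column metadata (the table name here is a placeholder):
--Sketch: read column names and types from the catalog,
--as a starting point for a dynamic-SQL variant
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'YourTable'   --placeholder table name
ORDER BY ORDINAL_POSITION;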
Now to the XML approach - try this out:
--This is a mockup table with the values you provided.
DECLARE @mockup TABLE(C1 INT,C2 VARCHAR(100),C3 DECIMAL(4,2));
INSERT INTO @mockup VALUES
 (1,'v1',1.1)
,(2,'v2',2.2)
,(3,'v3',3.3);
--Now we create an XML out of this
DECLARE @xml XML =
(
    SELECT *
    FROM @mockup t
    FOR XML RAW,XMLSCHEMA,TYPE
);
--Check the XML's content with SELECT @xml to see how it is looking internally
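Abridged, that XML looks roughly like the sketch below: an inline schema describing each column's type, followed by the rows as attribute-centric elements. The real output carries full namespace declarations and more type detail; what matters for the query is that varchar columns get an xsd:restriction with base="sqltypes:varchar":
<xsd:schema ...>
  <xsd:element name="row">
    <xsd:complexType>
      <xsd:attribute name="C1" type="sqltypes:int"/>
      <xsd:attribute name="C2">
        <xsd:simpleType>
          <xsd:restriction base="sqltypes:varchar" .../>
        </xsd:simpleType>
      </xsd:attribute>
      <xsd:attribute name="C3" type="sqltypes:decimal"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<row C1="1" C2="v1" C3="1.10"/>
<row C1="2" C2="v2" C3="2.20"/>
<row C1="3" C2="v3" C3="3.30"/>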
--Now the real query can start:
SELECT '{"columns":[' +
STUFF(@xml.query('declare namespace xsd="http://www.w3.org/2001/XMLSchema";
                  for $col in /xsd:schema/xsd:element//xsd:attribute
                  return
                  <x>,{concat("""",xs:string($col/@name),"""")}</x>
                 ').value('.','nvarchar(max)'),1,1,'') +
'],"rows":[' +
STUFF(
(
    SELECT
    ',[' + STUFF(b.query('declare namespace xsd="http://www.w3.org/2001/XMLSchema";
                          for $attr in ./@*
                          return
                          <x>,{if(/xsd:schema/xsd:element//xsd:attribute[@name=local-name($attr)]//xsd:restriction/@base="sqltypes:varchar") then
                                  concat("""",$attr,"""")
                               else
                                  xs:string($attr)
                              }
                          </x>
                         ').value('.','nvarchar(max)'),1,1,'') + ']'
    FROM @xml.nodes('/*:row') B(b)
    FOR XML PATH(''),TYPE
).value('.','nvarchar(max)'),1,1,'') +
']}';
The result
{"columns":["C1","C2","C3"],"rows":[[3,"v3",3.30],[1,"v1",1.10],[2,"v2",2.20]]}
Some explanation:
The first part uses XQuery to find all columns (xsd:attribute nodes within the XML schema) and builds the array of column names.
The second part again uses XQuery to run through all rows and write their column values into a concatenated string. Each value can refer to its type within the schema: whenever this type is sqltypes:varchar the value is quoted; all other values remain naked.
This will not solve each and every case generically...
To be honest, this was more for my own curiosity :-) I wanted to find out how one can solve this.
Quite probably the best answer is: use another tool. SQL Server is not the best choice here ;-)
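That said, if the column list is fixed and known up front, the generic type analysis is unnecessary. A minimal sketch on SQL Server 2017+ (STRING_AGG, STRING_ESCAPE), reusing the @mockup table from above and hard-coding the three columns:
--Sketch for a known, fixed column list (needs SQL Server 2017+ for STRING_AGG)
SELECT '{"columns":["C1","C2","C3"],"rows":['
     + STRING_AGG('[' + CONVERT(varchar(20),C1)
                + ',"' + STRING_ESCAPE(C2,'json')
                + '",' + CONVERT(varchar(20),C3) + ']', ',')
     + ']}'
FROM @mockup;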

Related

I have a MySQL database where I am trying to retrieve JSON data that stores URLs, but the results keep coming back empty

The data in the column looks like the below:
{
  "activity": {
    "token": "e7b64be4-74d4-7a6d-a74b-xxxxxxx",
    "route": "http://example.com/enroll/confirmation",
    "url_parameters": {
      "Success": "True",
      "ContractNumber": "003992314W",
      "Barcode": "1908Y10Z",
      "price": "8.99"
    },
    "server_info": {
      "cookie": [
        "_ga=xxxx; _fbp=xxx; _hjid=xxx; XDEBUG_SESSION=XDEBUG_ECLIPSE;"
      ],
      "upgrade-insecure-requests": [
        "1"
      ],
    },
    "campaign": "Unknown/None",
    "ip": "192.168.10.1",
    "entity": "App\\Models\\User",
    "entity_id": "1d9f3066-13ce-4659-b10d-xxxxx",
  },
  "time": "2021-05-21 20:15:02"
}
The code I am using is below:
SELECT *
FROM websote.stored_events
WHERE JSON_EXTRACT(event_properties, '$.route') = 'http://example.com/enroll/confirmation'
ORDER BY created_at DESC LIMIT 500;
The code works on the other JSON values, just not the URL ones. I've tried escaping the values in MySQL like the below:
SELECT *
FROM websote.stored_events
WHERE JSON_EXTRACT(event_properties, '$.route') = 'http:///example.com//enroll//confirmation'
ORDER BY created_at DESC LIMIT 500;
But still no luck. Any help on this would be appreciated!
Route is a nested property; I would have expected the path to be
JSON_EXTRACT(event_properties, '$.activity.route')
Your example data isn't valid JSON. You can't have a comma after the last element in an object or array:
"entity_id": "1d9f3066-13ce-4659-b10d-xxxxx",
},
^ here
If I remove that and other similar cases, I can confirm that your data inserts into a JSON column and that I can extract the object element you described:
mysql> select json_extract(event_properties, '$.activity.route') as route from stored_events;
+------------------------------------------+
| route                                    |
+------------------------------------------+
| "http://example.com/enroll/confirmation" |
+------------------------------------------+
Note the value is returned with double-quotes. This is because it's returned as a JSON document, a scalar string. If you want the raw value, you have to unquote it:
mysql> select json_unquote(json_extract(event_properties, '$.activity.route')) as route from stored_events;
+----------------------------------------+
| route                                  |
+----------------------------------------+
| http://example.com/enroll/confirmation |
+----------------------------------------+
If you want to search for that value, you would have to do a similar expression:
select * from stored_events
where json_unquote(json_extract(event_properties, '$.activity.route'))
= 'http://example.com/enroll/confirmation'
Searching based on object properties stored in JSON has disadvantages.
It requires complex expressions that force you (and anyone else who needs to maintain your code) to learn a lot of details about how JSON works.
It cannot be optimized with an index, so this query will run a table-scan. You can add virtual columns with indexes, but that adds complexity, and if you need to ALTER TABLE to add virtual columns, it misses the point of using JSON to store semi-structured data.
The bottom line is that if you find yourself using JSON functions in the WHERE clause of a query, it's a sign that you should be storing the column you want to search as a normal column, not as part of a JSON document.
Then you can write code that is easy to read, easy for your colleagues to maintain, and can be optimized easily with indexes:
SELECT * FROM stored_events
WHERE route = 'http://example.com/enroll/confirmation';
You can still store other properties in the JSON document, but the ones you want to be searchable should be stored in normal columns.
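For completeness, the virtual-column idea mentioned above would look roughly like this sketch (MySQL 5.7+ generated columns; the column and index names here are made up):
-- Sketch: expose the nested route as an indexed generated column (MySQL 5.7+)
ALTER TABLE stored_events
  ADD COLUMN route VARCHAR(255)
    GENERATED ALWAYS AS (json_unquote(json_extract(event_properties, '$.activity.route'))) STORED,
  ADD INDEX idx_route (route);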
You might like to view my presentation How to Use JSON in MySQL Wrong.

Bracket notation for SQL Server json_value?

This works:
select json_value('{ "a": "b" }', '$.a')
This doesn't work:
select json_value('{ "a": "b" }', '$["a"]')
and neither does this:
select json_value('{ "a": "b" }', '$[''a'']')
In JavaScript, these are the same:
foo = { "a": "b" }
console.log(foo.a)
console.log(foo["a"])
What am I missing? I get an error trying to use bracket notation in SQL Server:
JSON path is not properly formatted. Unexpected character '"' is found at position 2
No sooner do I ask than I stumble on an answer. I couldn't find this in any documentation anywhere, but select json_value('{ "a": "b" }', '$."a"') works. Bracket notation is not supported, but otherwise-invalid keys can be escaped with quotation marks, e.g. select json_value('{ "I-m so invalid][": "b" }', '$."I-m so invalid]["'), where in JavaScript that would be foo["I-m so invalid]["].
MsSql reserves bracket notation for array indexes. SQL Server parses all JSON as a string literal, rather than as an object (JSON or ARRAY) with any hidden keys.
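For instance, a quick sketch of brackets addressing an array position rather than a key:
--Brackets select array positions, not object keys:
select json_value('["x","y"]', '$[1]')  --returns y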
Some of what SQL can do will vary with version. Here's a crash course on the annoying (but also really powerful, and fast once in place) requirements. I'm posting more than you need because a lot of the documentation for JSON through MsSql is lacking, and doesn't do justice to how strong it is with JSON.
MsDoc here: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver15
In this example, we are working with a JSON "object" to separate the data into columns. Note how calling a position inside of an array is weird.
declare @data nvarchar(max) = N'[{"a":"b","c":[{"some":"random","array":"value"},{"another":"random","array":"value"}]},{"e":"f","c":[{"some":"random","array":"value"},{"another":"random","array":"value"}]}]'
--make sure SQL is happy. It will not accept partial snippets
select ISJSON(@data)
--let's look at the data in tabular form
select
    json1.*
    , json2.*
from openjson(@data)
with (
    a varchar --note there is no "path" specified here, as "a" is a key in the first layer of the object
    , c nvarchar(max) as JSON --must use "nvarchar(max)" and "as JSON" or SQL freaks out
    , c0 nvarchar(max) N'$.c[0]' as JSON
) as json1
cross apply openjson(json1.c) as json2
You can also pull out the individual values, if needed
select oj.value from openjson(@data) as oj where oj.[key] = 1;
select
    oj.value
    , JSON_VALUE(oj.value,N'$.e')
    , JSON_VALUE(oj.value,N'$.c[0].some')
    , JSON_VALUE(@data,N'$[1].c[0].some') --Similar to your first example, but uses index position instead of key value. Works because SQL views the "[]" brackets as an array while trying to parse.
from openjson(@data) as oj
where oj.[key] = 1

T-SQL - search in filtered JSON array

SQL Server 2017.
Table OrderData has column DataProperties where JSON is stored. JSON example stored there:
{
  "Input": {
    "OrderId": "abc",
    "Data": [
      {
        "Key": "Files",
        "Value": [
          "test.txt",
          "whatever.jpg"
        ]
      },
      {
        "Key": "Other",
        "Value": [
          "a"
        ]
      }
    ]
  }
}
So, it's an object with an Input object, which has a Data array full of KVP objects: a Key string and a Value array of strings.
And my problem: I need to query for rows based on the values in Files in the example JSON - a simple LIKE that matches %text%.
This query works:
SELECT TOP 10 *
FROM OrderData CROSS APPLY OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0
AND JSON_QUERY(dat.value, '$.Value') LIKE '%2%'
Problem is that this query is very slow, unsurprisingly.
How to make it faster?
I cannot create computed column with JSON_VALUE, because I need to filter in an array.
I cannot create computed column with JSON_QUERY on "$.Input.Data" or "$.Input.Data[0].Values" - because I need specific array item in this array with Key == "Files".
I've searched, but it seems that you cannot create computed column that also filters data, like with this attempt:
ALTER TABLE OrderData
ADD aaaTest AS (SELECT JSON_QUERY(dat.value, '$.Value')
                FROM OPENJSON(DataProperties,'$.Input.Data') dat
                WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0);
Error: Subqueries are not allowed in this context. Only scalar expressions are allowed.
What are my options?
1. Add a Files column with an index and use INSERT/UPDATE triggers that populate this column on inserts/updates? (A rough sketch follows below.)
2. Create a view that "computes" this column? I can't add an index, so it will still be slow.
So far only option 1 has some merit, but I don't like triggers - maybe there's another option?
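For illustration, option 1 could look roughly like this sketch (it assumes OrderData has an ID primary key; the trigger and column names are made up):
--Sketch of option 1: keep a searchable Files column in sync via trigger
--(assumes OrderData has an ID primary key; names are made up)
ALTER TABLE OrderData ADD Files NVARCHAR(MAX) NULL;
GO
CREATE TRIGGER trg_OrderData_Files ON OrderData
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE o
       SET Files = (SELECT TOP 1 JSON_QUERY(dat.value, '$.Value')
                    FROM OPENJSON(o.DataProperties,'$.Input.Data') dat
                    WHERE JSON_VALUE(dat.value, '$.Key') = 'Files')
    FROM OrderData o
    JOIN inserted i ON i.ID = o.ID;
END;
--Searches then become a plain LIKE on the Files column instead of parsing JSON per row.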
You might try something along these lines:
Attention: I've added a 2 (test2.txt) to fulfill your filter, and I renamed both keys to the plural "Values":
DECLARE @mockupTable TABLE(ID INT IDENTITY, DataProperties NVARCHAR(MAX));
INSERT INTO @mockupTable VALUES
(N'{
  "Input": {
    "OrderId": "abc",
    "Data": [
      {
        "Key": "Files",
        "Values": [
          "test2.txt",
          "whatever.jpg"
        ]
      },
      {
        "Key": "Other",
        "Values": [
          "a"
        ]
      }
    ]
  }
}');
The query
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
     WITH([Key] NVARCHAR(100)
         ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
  AND dat.[Values] LIKE '%2%';
The main difference is the WITH-clause, which is used to return the properties inside an object in a typed way and side-by-side (similar to a naked OPENJSON with a PIVOT for all columns - but much better). This avoids expensive JSON methods in your WHERE...
Hint: As we return the Value with NVARCHAR(MAX) AS JSON we can continue with the nested array and might proceed with something like this:
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
     WITH([Key] NVARCHAR(100)
         ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
--we read the array again with `OPENJSON`:
  AND 'test2.txt' IN(SELECT [Value] FROM OPENJSON(dat.[Values]));
You might use one more CROSS APPLY to add the array's values and filter this at the WHERE directly.
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
     WITH([Key] NVARCHAR(100)
         ,[Values] NVARCHAR(MAX) AS JSON) dat
CROSS APPLY OPENJSON(dat.[Values]) vals
WHERE dat.[Key] = 'Files'
  AND vals.[Value]='test2.txt'
Just check it out...
This is an old question, but I would like to revisit it. There isn't any mention of how the source table is actually constructed in terms of indexing. If the original author is still around, can you confirm/deny what indexing strategy you used? For performant JSON document queries, I've found that a table using the COLUMNSTORE indexing strategy yields very performant JSON queries even with large amounts of data.
https://learn.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15 has an example of different indexing techniques. For my personal solution I've been using COLUMNSTORE, albeit on a limited NVARCHAR document size. It's fast enough for any purpose I have, even with millions of rows of decently sized JSON documents.
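As a sketch of that idea, following the pattern in the linked docs (table and index names here are placeholders):
--Sketch: JSON documents in a table with a clustered columnstore index
--(SQL Server 2017+ allows NVARCHAR(MAX) columns in a columnstore)
CREATE TABLE OrderDataCS
(
    ID INT IDENTITY PRIMARY KEY NONCLUSTERED,
    DataProperties NVARCHAR(MAX),
    INDEX cci CLUSTERED COLUMNSTORE
);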

Mapping JSON in Hive where Nested Fields Have Underscores

We are attempting to create a schema to load a massive JSON structure into Hive. We are having a problem, however, in that some fields have leading underscores for names--at the root level, this is fine, but we have not found a way to make this work for nested fields.
Sample JSON:
{
  "_id" : "319FFE15FF908EDD86B7FDEADBEEFBD8D7284128841B14AA6A966923C268DF39",
  "SomeThing" :
  {
    "_SomeField" : 22,
    "AnotherField" : 2112,
    "YetAnotherField": 1
  }
  . . . etc . . . .
Using a schema as follows:
create table testSample
(
    id string,
    something struct
    <
        somefield:int,
        anotherfield:bigint,
        yetanotherfield:int
    >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties
(
    "mapping.id" = "_id",
    "mapping.somefield" = "_somefield"
);
This schema builds OK--however, after loading the above sample, the value of "somefield" (the nested one with the leading underscore) is always null (all the other values exist and are correct).
We've tried a lot of syntax combinations, but to no avail.
Does anyone know the trick to map a nested field with a leading underscore in its name?
Cheers!
Answering my own question here: there is no trick because you can't.
However, there's an easy work-around: you can tell Hive to treat the names as literals upon creating the schema. If you do this, you will also need to query using the same literal syntax. In the above example, it would look like:
`_something` struct<rest_of_definitions>
without any special serde properties for it.
Then use again in query:
select stuff.`_something` from sometable;
e.g., schema:
create table testSample
(
    id string,
    something struct
    <
        `_somefield`:int,
        anotherfield:bigint,
        yetanotherfield:int
    >
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties("mapping.id" = "_id");
for an input JSON like:
{
  "_id": "someuid",
  "something":
  {
    "_somefield": 1,
    "anotherfield": 2,
    "yetanotherfield": 3
  }
}
with a query like:
select something.`_somefield`
from testSample
where something.anotherfield = 2;

SQL Server dynamic JSON using within Analysis Services?

I am trying to get my head around which direction to even start with the following...
Imagine a dynamic form (JSON) that I store in SQL Server 2016+. So far, I have seen/tried a couple of dynamic queries to take the dynamic JSON and flatten it out as columns.
Given the "dynamic" nature, it is hard to "store" that flattened-out data. I have been looking at temporary/temporal/in-memory tables to store that dynamic flattened data for a "relatively short period" of time (say an hour or two).
I have also been asked if it is possible to use the dynamic JSON data in building a cube within Analysis Services. Again, given the dynamic nature of this, would something like this even be possible?
I guess my question is two-fold:
1. Pointers to flatten out dynamic JSON within SQL Server
2. Is it possible to take dynamic JSON, flatten it out to columns, and somehow use it within Analysis Services, i.e. ultimately within a cube?
I realise the above is a bit vague, but any pointers to get me going in the correct direction would be appreciated!
Many thanks.
Dynamically converting JSON into columns can get tricky. Especially if you are NOT certain of the structure. That said, have you considered converting the JSON into a hierarchy via a Recursive CTE?
Example
declare @json varchar(max)='
[
  {
    "url": "https://www.google.com",
    "image-url": "https://www.google.com/imghp",
    "labels": [
      {
        "source": "Bob, Inc",
        "name": "Whips",
        "info": "Ouch"
      },
      {
        "source": "Weezles of Oregon",
        "name": "Chains",
        "info": "Let me go"
      }
    ],
    "Fact": "Fictional"
  }
]';
;with cte0 as (
    Select *
          ,[Level]=1
          ,[Path]=convert(varchar(max),row_number() over(order by (select null)))
    From OpenJSON(@json,'$')
    Union All
    Select R.*
          ,[Level]=p.[Level]+1
          ,[Path]=concat(P.[Path],'\',row_number() over(order by (select null)))
    From cte0 p
    Cross Apply OpenJSON(p.value,'$') R
    Where P.[Type]>3
)
Select [Level]
      ,[Path]
      ,Title = replicate('|---',[Level]-1)+[Key]
      ,Item  = [Key]
      ,Value = case when [type]<4 then Value else null end
From cte0
Order By [Path]
Returns one row per JSON node: its Level, its Path, an indented Title, the Item (the key) and, for scalar nodes, the Value.