Select and insert JSON file into SQL Server table

I'm trying to import the entirety of a JSON file into a table of mine in SQL Server.
The JSON data looks like this:
{
  "category": "General Knowledge",
  "type": "multiple",
  "difficulty": "hard",
  "question": "Electronic music producer Kygo's popularity skyrocketed after a certain remix. Which song did he remix?",
  "correct_answer": "Ed Sheeran - I See Fire",
  "incorrect_answers": [
    "Marvin Gaye - Sexual Healing",
    "Coldplay - Midnight",
    "a-ha - Take On Me"
  ]
},
With multiple entries like this.
I'm attempting to use OPENROWSET and OPENJSON to accomplish this using the following query:
SELECT value
FROM OPENROWSET (BULK 'C:\Users\USERNAME\Desktop\general_questions.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn)
However, the output I'm getting only shows the first question object in the file. I have a two-part question:
How can I get my query to select ALL of the objects in the file and then insert all of those objects into a table in my SQL Server db?

If I understood you right, is it this:
SELECT value
FROM OPENROWSET (BULK 'C:\Users\USERNAME\Desktop\general_questions.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn)
WITH
(
    category NVARCHAR(MAX),
    ...
) AS JSON_TABLE
Also, I'm not sure what you mean by "the output I'm getting only shows the first question object in the file". Do you mean the question object has multiple attributes?
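If the goal is to load every object, note that OPENJSON returns one row per element only when the document it is handed is a single top-level JSON array; a file that is just comma-separated objects (as in the snippet above) would need to be wrapped in [ ... ] first. Under that assumption, a minimal sketch of the full import could look like this (dbo.Questions is an illustrative target table, not from the original post):

CREATE TABLE dbo.Questions
(
    category          NVARCHAR(200),
    [type]            NVARCHAR(50),
    difficulty        NVARCHAR(50),
    question          NVARCHAR(MAX),
    correct_answer    NVARCHAR(400),
    incorrect_answers NVARCHAR(MAX)   -- nested array kept as raw JSON text
);

INSERT INTO dbo.Questions (category, [type], difficulty, question, correct_answer, incorrect_answers)
SELECT j.category, j.[type], j.difficulty, j.question, j.correct_answer, j.incorrect_answers
FROM OPENROWSET (BULK 'C:\Users\USERNAME\Desktop\general_questions.json', SINGLE_CLOB) AS src
CROSS APPLY OPENJSON(src.BulkColumn)
WITH
(
    category          NVARCHAR(200),
    [type]            NVARCHAR(50),
    difficulty        NVARCHAR(50),
    question          NVARCHAR(MAX),
    correct_answer    NVARCHAR(400),
    incorrect_answers NVARCHAR(MAX) AS JSON   -- AS JSON keeps the nested array as-is
) AS j;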

Related

How to query JSON data in Athena with an # symbol in the key name and duplicate keys

The data I have been tasked to query is structured like this:
{
  "#timestamp": "2022-11-17T21:00:19.191+00:00",
  "#version": 1,
  "message": "log message",
  "logger_name": "com.logger.name",
  "thread_name": "tomcat-thread-13",
  "level": "INFO",
  "level_value": 20000,
  "application_name": "app_name",
  "vpc": "vpc_name",
  "region": "eu-west-1",
  "aid": "ffffffff-ffff-ffff-ffff-ffffffffffff",
  "account": "prod",
  "rq": "ffffffff-ffff-ffff-ffff-ffffffffffff",
  "log_shipper": "firehose",
  "application_name": "app_name",
  "account": "prod",
  "region": "eu-west-1"
}
As you can see, there are some duplicate keys in here, so both the Hive and OpenX JSON SerDes throw an error and won't query it at all.
I've created a table using the Ion SerDe, which can read the data, but the #timestamp and #version fields are always blank; all the other fields are read correctly.
The initial table definition I had was this...
CREATE EXTERNAL TABLE firehose_logs_pe (
`#timestamp` STRING,
`#version` STRING,
<other columns>
)
ROW FORMAT SERDE
'com.amazon.ionhiveserde.IonHiveSerDe'
STORED AS ION
LOCATION 's3://s3-bucket-name/folder/'
I also tried to rename the fields and use a path extractor to get the values, like this...
CREATE EXTERNAL TABLE firehose_logs_pe (
ts STRING,
version STRING,
<other columns>
)
ROW FORMAT SERDE
'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES (
'ion.ts.path_extractor' = '(`#timestamp`)',
'ion.version.path_extractor' = '(`#version`)'
)
STORED AS ION
LOCATION 's3://s3-bucket-name/folder/'
However, the values of the ts and version fields are still empty. The query also seems to run slower using the path extractors.
Is there any way to query this data in this format with Athena? As a test I did a find and replace on one of the JSON files and removed the #, at which point everything worked as it should; however, this is not a practical solution when I have about 20 TB of data to query across hundreds of millions of files.

SQL Server 2014 JSON into table

I'm using SQL Server 2014 and am aware that out of the box it does not support JSON.
We are receiving data from a 3rd party supplier that will look like the below:
{
  "PersonID": "1",
  "MarketingPreference": "Allow",
  "AllowPhone": "No",
  "AllowEmail": "Yes",
  "AllowTxt": "Yes",
  "AllowMob": "Yes"
}
However, we may sometimes also receive the below:
{
  "PersonID": "2",
  "MarketingPreference": "DoNotAllow"
}
I need to insert these values into a table - what is the best way to do this if SQL Server 2014 does not support JSON?
If I convert the JSON to XML it looks like the below:
<PersonID>1</PersonID>
<MarketingPreference>Allow</MarketingPreference>
<AllowPhone>No</AllowPhone>
<AllowEmail>Yes</AllowEmail>
<AllowTxt>Yes</AllowTxt>
<AllowMob>Yes</AllowMob>
How do I then extract the values from the XML?
DECLARE @xml XML
SET @xml = N'
<PersonID>1</PersonID>
<MarketingPreference>Allow</MarketingPreference>
<AllowPhone>No</AllowPhone>
<AllowEmail>Yes</AllowEmail>
<AllowTxt>Yes</AllowTxt>
<AllowMob>Yes</AllowMob>'
SELECT
    Tab.Col.value('@PersonID','int') AS ContactID,
    Tab.Col.value('@MarketingPreference','varchar(20)') AS Pref,
    Tab.Col.value('@AllowPhone','varchar(20)') AS Phone,
    Tab.Col.value('@AllowEmail','varchar(20)') AS Email,
    Tab.Col.value('@AllowTxt','varchar(20)') AS Txt,
    Tab.Col.value('@AllowMob','varchar(20)') AS Mob
FROM
    @xml.nodes('/root/') Tab(Col)
GO;
But now I get this error:
Incorrect syntax near 'GO'.
Is there an easier way to select the values from JSON?
You don't need a GO here (never mind GO;, which is not valid), and your XML syntax seems to have been plucked from your first search result: it uses attribute (@...) paths against element nodes. Try:
SELECT PersonID = x.p.value('(PersonID)[1]', 'int'),
MarkPref = x.p.value('(MarketingPreference)[1]', 'varchar(20)'),
AllowPhone = x.p.value('(AllowPhone)[1]','varchar(20)'),
AllowEmail = x.p.value('(AllowEmail)[1]','varchar(20)'),
AllowTxt = x.p.value('(AllowTxt)[1]', 'varchar(20)'),
AllowMob = x.p.value('(AllowMob)[1]', 'varchar(20)')
FROM @xml.nodes('.') AS x(p);
Output:

PersonID  MarkPref  AllowPhone  AllowEmail  AllowTxt  AllowMob
1         Allow     No          Yes         Yes       Yes
Example db<>fiddle
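The same pattern copes with the shorter second payload, because .value() simply returns NULL for elements that are not present. A minimal sketch of the insert step (dbo.MarketingPreferences is an illustrative table name, not from the original post):

DECLARE @xml XML = N'
<PersonID>2</PersonID>
<MarketingPreference>DoNotAllow</MarketingPreference>';

INSERT INTO dbo.MarketingPreferences (PersonID, MarkPref, AllowPhone, AllowEmail, AllowTxt, AllowMob)
SELECT PersonID   = x.p.value('(PersonID)[1]', 'int'),
       MarkPref   = x.p.value('(MarketingPreference)[1]', 'varchar(20)'),
       AllowPhone = x.p.value('(AllowPhone)[1]', 'varchar(20)'),  -- NULL: element absent
       AllowEmail = x.p.value('(AllowEmail)[1]', 'varchar(20)'),  -- NULL: element absent
       AllowTxt   = x.p.value('(AllowTxt)[1]',   'varchar(20)'),  -- NULL: element absent
       AllowMob   = x.p.value('(AllowMob)[1]',   'varchar(20)')   -- NULL: element absent
FROM @xml.nodes('.') AS x(p);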

T-SQL - search in filtered JSON array

SQL Server 2017.
Table OrderData has column DataProperties where JSON is stored. JSON example stored there:
{
  "Input": {
    "OrderId": "abc",
    "Data": [
      {
        "Key": "Files",
        "Value": [
          "test.txt",
          "whatever.jpg"
        ]
      },
      {
        "Key": "Other",
        "Value": [
          "a"
        ]
      }
    ]
  }
}
So it's an object with an Input object, which has a Data array of key/value pairs: objects with a Key string and a Value array of strings.
And my problem: I need to query for rows based on the values in Files in the example JSON, with a simple LIKE that matches %text%.
This query works:
SELECT TOP 10 *
FROM OrderData CROSS APPLY OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0
AND JSON_QUERY(dat.value, '$.Value') LIKE '%2%'
Problem is that this query is very slow, unsurprisingly.
How to make it faster?
I cannot create computed column with JSON_VALUE, because I need to filter in an array.
I cannot create computed column with JSON_QUERY on "$.Input.Data" or "$.Input.Data[0].Values" - because I need specific array item in this array with Key == "Files".
I've searched, but it seems that you cannot create computed column that also filters data, like with this attempt:
ALTER TABLE OrderData
ADD aaaTest AS (SELECT JSON_QUERY(dat.value, '$.Value')
FROM OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0);
Error: Subqueries are not allowed in this context. Only scalar expressions are allowed.
What are my options?
Add Files column with an index and use INSERT/UPDATE triggers that populate this column on inserts/updates?
Create a view that "computes" this column? Can't add index, will still be slow
So far only option 1. has some merit, but I don't like triggers and maybe there's another option?
You might try something along these lines:
Attention: I've added a 2 to the file name (test2.txt) to fulfil your filter, and I renamed the Value property to the plural "Values" in both entries:
DECLARE @mockupTable TABLE(ID INT IDENTITY, DataProperties NVARCHAR(MAX));
INSERT INTO @mockupTable VALUES
(N'{
  "Input": {
    "OrderId": "abc",
    "Data": [
      {
        "Key": "Files",
        "Values": [
          "test2.txt",
          "whatever.jpg"
        ]
      },
      {
        "Key": "Other",
        "Values": [
          "a"
        ]
      }
    ]
  }
}');
The query
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
     WITH([Key] NVARCHAR(100)
         ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
  AND dat.[Values] LIKE '%2%';
The main difference is the WITH clause, which is used to return the properties inside an object in a typed way and side-by-side (similar to a naked OPENJSON with a PIVOT for all columns - but much better). This avoids expensive JSON methods in your WHERE clause...
Hint: As we return the Values column typed as NVARCHAR(MAX) AS JSON, we can continue with the nested array and might proceed with something like this:
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
     WITH([Key] NVARCHAR(100)
         ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
--we read the array again with OPENJSON:
  AND 'test2.txt' IN(SELECT [Value] FROM OPENJSON(dat.[Values]));
You might use one more CROSS APPLY to add the array's values and filter this at the WHERE directly.
SELECT TOP 10 *
FROM @mockupTable t
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
     WITH([Key] NVARCHAR(100)
         ,[Values] NVARCHAR(MAX) AS JSON) dat
CROSS APPLY OPENJSON(dat.[Values]) vals
WHERE dat.[Key] = 'Files'
  AND vals.[Value] = 'test2.txt'
Just check it out...
This is an old question, but I would like to revisit it. There isn't any mention of how the source table is actually constructed in terms of indexing. If the original author is still around, can you confirm/deny what indexing strategy you used? For performant JSON document queries, I've found that a table using the COLUMNSTORE indexing strategy yields very performant JSON queries even with large amounts of data.
https://learn.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15 has examples of different indexing techniques. For my personal solution I've been using COLUMNSTORE, albeit with a limited NVARCHAR document size. It's fast enough for any purpose I have, even with millions of rows of decently sized JSON documents.
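For reference, the pattern from that article boils down to storing the JSON documents in an NVARCHAR column on a table with a clustered columnstore index. A minimal sketch with illustrative names (adjust the document size to your data; NVARCHAR(MAX) columns in a columnstore index additionally require SQL Server 2017+):

CREATE TABLE dbo.OrderDataLog
(
    ID             INT IDENTITY PRIMARY KEY NONCLUSTERED,
    DataProperties NVARCHAR(4000),          -- the raw JSON document
    INDEX cci CLUSTERED COLUMNSTORE         -- compresses well and speeds up scan-style filters
);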

U-SQL - Extract data from complex json object

So I have a lot of json files structured like this:
{
  "Id": "2551faee-20e5-41e4-a7e6-57bd20b02a22",
  "Timestamp": "2016-12-06T08:09:57.5541438+01:00",
  "EventEntry": {
    "EventId": 1,
    "Payload": [
      "1a3e0c9e-ef69-4c6a-ac8c-9b2de2fbc701",
      "DHS.PlanCare.Business.BusinessLogic.VisionModels.VisionModelServiceWithoutUnitOfWork.FetchVisionModelsForClientOnReferenceDateAsync(System.Int64 clientId, System.DateTime referenceDate, System.Threading.CancellationToken cancellationToken)",
      25,
      "DHS.PlanCare.Business.BusinessLogic.VisionModels.VisionModelServiceWithoutUnitOfWork+<FetchVisionModelsForClientOnReferenceDateAsync>d__11.MoveNext\r\nDHS.PlanCare.Core.Extensions.IQueryableExtensions+<ExecuteAndThrowTaskCancelledWhenRequestedAsync>d__16`1.MoveNext\r\n",
      false,
      "2197, 6-12-2016 0:00:00, System.Threading.CancellationToken"
    ],
    "EventName": "Duration",
    "KeyWordsDescription": "Duration",
    "PayloadSchema": [
      "instanceSessionId",
      "member",
      "durationInMilliseconds",
      "minimalStacktrace",
      "hasFailed",
      "parameters"
    ]
  },
  "Session": {
    "SessionId": "0016e54b-6c4a-48bd-9813-39bb040f7736",
    "EnvironmentId": "C15E535B8D0BD9EF63E39045F1859C98FEDD47F2",
    "OrganisationId": "AC6752D4-883D-42EE-9FEA-F9AE26978E54"
  }
}
How can I create a U-SQL query that outputs the
Id,
Timestamp,
EventEntry.EventId and
EventEntry.Payload[2] (value 25 in the example above)?
I can't figure out how to extend my query
@extract =
    EXTRACT
        Timestamp DateTime
    FROM @"wasb://xxx/2016/12/06/0016e54b-6c4a-48bd-9813-39bb040f7736/yyy/{*}/{*}.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@res =
    SELECT Timestamp
    FROM @extract;
OUTPUT @res TO "/output/result.csv" USING Outputters.Csv();
I have seen some examples like:
U-SQL Unable to extract data from JSON file => this only queries one level of the document, I need data from multiple levels.
U-SQL - Extract data from json-array => this only queries one level of the document, I need data from multiple levels.
JsonTuple supports multiple JSON paths in one go.
@extract =
    EXTRACT
        Id String,
        Timestamp DateTime,
        EventEntry String
    FROM @"..."
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
@res =
    SELECT Id, Timestamp, EventEntry,
           Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(EventEntry,
               "EventId", "Payload[2]") AS Event
    FROM @extract;
@res =
    SELECT Id,
           Timestamp,
           Event["EventId"] AS EventId,
           Event["Payload[2]"] AS Something
    FROM @res;
You may want to look at this GitHub example: https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql
It takes two disparate data elements and combines them, much like your Payload and PayloadSchema. If you create key/value pairs using the "Donut" or "Cake and Batter" examples, you may be able to match the schema up to the payload and use CROSS APPLY EXPLODE.

SQL Server dynamic JSON using within Analysis Services?

I am trying to get my head around which direction to even start with the following...
Imagine a dynamic form (JSON) that I store in SQL Server 2016+. So far, I have seen/tried a couple of dynamic queries to take the dynamic JSON and flatten it out as columns.
Given the "dynamic" nature, it is hard to "store" that flattened-out data. I have been looking at temporary/temporal/memory tables to store that dynamic flattened data for a "relatively short period" of time (say an hour or two).
I have also been asked if it is possible to use the dynamic JSON data in building a cube within Analysis Services... again, given the dynamic nature of this, would something like this even be possible?
I guess my question is two-fold:
Pointers to flatten out dynamic JSON within SQL Server
Is it possible to take dynamic JSON, flatten out to columns and somehow use within Analysis Services? i.e. ultimately to use within a cube?
I realise the above is a bit vague, but any pointers to get me going in the right direction would be appreciated!
Many thanks.
Dynamically converting JSON into columns can get tricky, especially if you are NOT certain of the structure. That said, have you considered converting the JSON into a hierarchy via a recursive CTE?
Example
declare @json varchar(max)='
[
  {
    "url": "https://www.google.com",
    "image-url": "https://www.google.com/imghp",
    "labels": [
      {
        "source": "Bob, Inc",
        "name": "Whips",
        "info": "Ouch"
      },
      {
        "source": "Weezles of Oregon",
        "name": "Chains",
        "info": "Let me go"
      }
    ],
    "Fact": "Fictional"
  }
]';
;with cte0 as (
    Select *
          ,[Level]=1
          ,[Path]=convert(varchar(max),row_number() over(order by (select null)))
    From OpenJSON(@json,'$')
    Union All
    Select R.*
          ,[Level]=p.[Level]+1
          ,[Path]=concat(p.[Path],'\',row_number() over(order by (select null)))
    From cte0 p
    Cross Apply OpenJSON(p.value,'$') R
    Where p.[Type]>3
)
Select [Level]
      ,[Path]
      ,Title = replicate('|---',[Level]-1)+[Key]
      ,Item  = [Key]
      ,Value = case when [type]<4 then Value else null end
From cte0
Order By [Path]
Returns one row per JSON node, with its Level, Path, an indented Title, the Item (key) and the scalar Value (NULL for containers).
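Since the question also asks about keeping that flattened data around for a short period, the final SELECT of the CTE can write into a table instead of returning to the client. A minimal, self-contained sketch under that assumption (the tiny sample document and the #Flattened temp table are illustrative, not from the original post):

declare @json varchar(max)='[{"url":"https://www.google.com","labels":[{"name":"Whips"},{"name":"Chains"}]}]';

;with cte0 as (
    Select *
          ,[Level]=1
          ,[Path]=convert(varchar(max),row_number() over(order by (select null)))
    From OpenJSON(@json,'$')
    Union All
    Select R.*
          ,[Level]=p.[Level]+1
          ,[Path]=concat(p.[Path],'\',row_number() over(order by (select null)))
    From cte0 p
    Cross Apply OpenJSON(p.value,'$') R
    Where p.[Type]>3
)
Select [Level]
      ,[Path]
      ,Title = replicate('|---',[Level]-1)+[Key]
      ,Item  = [Key]
      ,Value = case when [type]<4 then Value else null end
Into   #Flattened                 -- short-lived copy; a staging or temporal table would work the same way
From   cte0;

Select * From #Flattened Order By [Path];   -- query/shape as needed, e.g. to feed a cube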