Parse JSON data in T-SQL [duplicate] - json

This is driving me nuts, and I don't understand what's wrong with my approach.
I generate a JSON object in SQL like this:
select #output = (
select distinct lngEmpNo, txtFullName
from tblSecret
for json path, root('result'), include_null_values
)
I get a result like this:
{"result":[{"lngEmpNo":696969,"txtFullName":"Clinton, Bill"}]}
ISJSON() confirms that it's valid JSON, and JSON_QUERY(#OUTPUT, '$.result') will return the array [] portion of the JSON object... cool!
BUT, I'm trying to use JSON_QUERY to extract a specific value:
This gets me a NULL value. Why??????? I've tried it with the [0], without the [0], and of course, txtFullName[0]
SELECT JSON_QUERY(#jsonResponse, '$.result[0].txtFullName');
I prefixed with strict, SELECT JSON_QUERY(#jsonResponse, 'strict $.result[0].txtFullName');, and it tells me this:
Msg 13607, Level 16, State 4, Line 29
JSON path is not properly formatted. Unexpected character 't' is found at
position 18.
What am I doing wrong? What is wrong with my structure?

JSON_QUERY will only extract an object or an array. You are trying to extract a single value so, you need to use JSON_VALUE. For example:
SELECT JSON_VALUE(#jsonResponse, '$.result[0].txtFullName');

Related

Redshift JSON Parsing

I have some JSON data in Redshift table of type character varying. An example entry is:
[{"value":["*"], "key":"testData"}, {"value":"["GGG"], key: "differentData"}]
I want to return vales based on keys, how can i do this? I'm attempting to do something like
json_extract_path_text(column, 'value') but unfortunately it errors out. Any ideas?
So the first issue is that your string isn't valid JSON. There are mismatched and missing quotes. I think you mean:
[{"value":["*"], "key":"testData"}, {"value":["GGG"], "key": "differentData"}]
I don't know if this is a data issue or a transcription error but these functions won't work unless the json text is valid.
The next thing to consider is that at the top level this json is an array so you will need to use json_extract_array_element_text() function to pick up an element of the array. For example:
json_extract_array_element_text('json string', 0)
So putting this together we can extract the first "value" with (untested):
json_extract_path_text(
json_extract_array_element_text(
'[{"value":["*"], "key":"testData"}, {"value":["GGG"], "key": "differentData"}]', 0
), 'value'
)
Should return the string ["*"].

How do we modify an json array object regardless of its position?

The problem
Each entity owns an id and a json field. That json field simply stores a json list of objects.
Entity{ id, json }
"1, '[{"tag": "Player"}, {"position": {"x": 20, "y": 20}}]'"
The order of those json objects is not always the same and i want to update the json object inside the array where "tag" :"Player". I basically wanna change the tag.
I tried to use json_replace, but it didnt worked because it seems like that function does not accept the $** wildcard. But i cant use $[0] because that json object is not always at the first position. Thats what i tried.
UPDATE entity
SET jsonComponents = JSON_REPLACE(
jsonComponents ,
'$**.tag' ,
'NewTag'
)
WHERE
entity.id = 1
The Question
How are we supposed to modify/remove an json object inside an pure json list, if we dont know where its located at ? How can we modify/remove a json object inside a list regardless of its position inside the list ?
Im actually very glad for any help on this topic, couldnt find anything about it...
The solution
If we dont know the path of the json object we seek to modify... we simply query for the path using json_search
update entity
set jsonComponents = JSON_REPLACE(
jsonComponents,
JSON_UNQUOTE(json_search(jsonComponents, 'one', 'Player')),
'NewTag'
)
where entity.id = 0

JSON_QUERY unable to handle NULL

I have a table. Few columns are of String, Integer type. And few are JSON type. I am writing a query to form each row as a json object. I have issues with JSON_QUERY(jsondataColumnName). If the column is populated NULL JSON_QUERY fails.
I have already written query below.
select
(
SELECT [customerReferenceNumber] as customerReferenceNumber
,[customerType] as customerType
,[personReferenceNumber] as personReferenceNumber
,[organisationReferenceNumber] as organisationReferenceNumber
,json_query(isnull(product,'')) as product
,json_query(isnull([address],'')) as address
FROM [dbo].[customer]
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS customer
from [dbo].[customer] P
Msg 13609, Level 16, State 1, Line 2
JSON text is not properly formatted. Unexpected character '.' is found at position 0.
By default, for JSON don't work with NULL VALUES. Use INCLUDE_NULL_VALUES to handle.
As Example: JSON PATH, WITHOUT_ARRAY_WRAPPER, INCLUDE_NULL_VALUES) AS customer
As reference:
https://learn.microsoft.com/en-us/sql/relational-databases/json/include-null-values-in-json-include-null-values-option?view=sql-server-ver15

Parse JSON into U-SQL then convert to csv

I'm trying to convert some telemetry data that is in JSON format into CSV format, then write it out to a file, using U-SQL.
The problem is that some of the JSON key values have periods in them, and so when I'm doing the SELECT operation, U-SQL is not recognizing them. When I check the output file, all that I am seeing is the values for "p1". How can I represent the names of the JSON key names in the script so that they are recognized. Thanks in advance for any help!
Code:
REFERENCE ASSEMBLY MATSDevDB.[Newtonsoft.Json];
REFERENCE ASSEMBLY MATSDevDB.[Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
#jsonDocuments =
EXTRACT jsonString string
FROM #"adl://xxxx.azuredatalakestore.net/xxxx/{*}/{*}/{*}/telemetry_{*}.json"
USING Extractors.Tsv(quoting:false);
#jsonify =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS json
FROM #jsonDocuments;
#columnized = SELECT
json["EventInfo.Source"] AS EventInfoSource,
json["EventInfo.InitId"] AS EventInfoInitId,
json["EventInfo.Sequence"] AS EventInfoSequence,
json["EventInfo.Name"] AS EventInfoName,
json["EventInfo.Time"] AS EventInfoTime,
json["EventInfo.SdkVersion"] AS EventInfoSdkVersion,
json["AppInfo.Language"] AS AppInfoLanguage,
json["UserInfo.Language"] AS UserInfoLanguage,
json["DeviceInfo.BrowserName"] AS DeviceInfoBrowswerName,
json["DeviceInfo.BrowserVersion"] AS BrowswerVersion,
json["DeviceInfo.OsName"] AS DeviceInfoOsName,
json["DeviceInfo.OsVersion"] AS DeviceInfoOsVersion,
json["DeviceInfo.Id"] AS DeviceInfoId,
json["p1"] AS p1,
json["PipelineInfo.AccountId"] AS PipelineInfoAccountId,
json["PipelineInfo.IngestionTime"] AS PipelineInfoIngestionTime,
json["PipelineInfo.ClientIp"] AS PipelineInfoClientIp,
json["PipelineInfo.ClientCountry"] AS PipelineInfoClientCountry,
json["PipelineInfo.IngestionPath"] AS PipelineInfoIngestionPath,
json["AppInfo.Id"] AS AppInfoId,
json["EventInfo.Id"] AS EventInfoId,
json["EventInfo.BaseType"] AS EventInfoBaseType,
json["EventINfo.IngestionTime"] AS EventINfoIngestionTime
FROM #jsonify;
OUTPUT #columnized
TO "adl://xxxx.azuredatalakestore.net/poc/TestResult.csv"
USING Outputters.Csv(quoting : false);
JSON:
{"EventInfo.Source":"JS_default_source","EventInfo.Sequence":"1","EventInfo.Name":"daysofweek","EventInfo.Time":"2018-01-25T21:09:36.779Z","EventInfo.SdkVersion":"ACT-Web-JS-2.6.0","AppInfo.Language":"en","UserInfo.Language":"en-US","UserInfo.TimeZone":"-08:00","DeviceInfo.BrowserName":"Chrome","DeviceInfo.BrowserVersion":"63.0.3239.132","DeviceInfo.OsName":"Mac OS X","DeviceInfo.OsVersion":"10","p1":"V1","PipelineInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z","PipelineInfo.ClientCountry":"CA","PipelineInfo.IngestionPath":"FastPath","EventInfo.BaseType":"custom","EventInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z"}
I got this to work with single quotes and single square brackets, eg
#columnized = SELECT
json["['EventInfo.Source']"] AS EventInfoSource,
...
Full code:
#columnized = SELECT
json["['EventInfo.Source']"] AS EventInfoSource,
json["['EventInfo.InitId']"] AS EventInfoInitId,
json["['EventInfo.Sequence']"] AS EventInfoSequence,
json["['EventInfo.Name']"] AS EventInfoName,
json["['EventInfo.Time']"] AS EventInfoTime,
json["['EventInfo.SdkVersion']"] AS EventInfoSdkVersion,
json["['AppInfo.Language']"] AS AppInfoLanguage,
json["['UserInfo.Language']"] AS UserInfoLanguage,
json["['DeviceInfo.BrowserName']"] AS DeviceInfoBrowswerName,
json["['DeviceInfo.BrowserVersion']"] AS BrowswerVersion,
json["['DeviceInfo.OsName']"] AS DeviceInfoOsName,
json["['DeviceInfo.OsVersion']"] AS DeviceInfoOsVersion,
json["['DeviceInfo.Id']"] AS DeviceInfoId,
json["p1"] AS p1,
json["['PipelineInfo.AccountId']"] AS PipelineInfoAccountId,
json["['PipelineInfo.IngestionTime']"] AS PipelineInfoIngestionTime,
json["['PipelineInfo.ClientIp']"] AS PipelineInfoClientIp,
json["['PipelineInfo.ClientCountry']"] AS PipelineInfoClientCountry,
json["['PipelineInfo.IngestionPath']"] AS PipelineInfoIngestionPath,
json["['AppInfo.Id']"] AS AppInfoId,
json["['EventInfo.Id']"] AS EventInfoId,
json["['EventInfo.BaseType']"] AS EventInfoBaseType,
json["['EventINfo.IngestionTime']"] AS EventINfoIngestionTime
FROM #jsonify;
My results:

Check and print occurrences of an array of string in a dataset in Python

I want to check if an array of strings occur in a dataset and print those rows where the string array elements occur.
rareTitles = {"Capt", "Col", "Countess", "Don", "Dr", "Jonkheer", "Lady",
"Major", "Mlle", "Mme", "Ms", "Rev", "Sir"}
dataset[rareTitles in (dataset['Title'])]
I am getting following error:
TypeError: unhashable type: 'set'
First of all, I think the comparison should go the other way around - you look for a dataset['Title'], that contains string from rareTitles.
You can use str attribute of a pandas DataSeries, which allows as to use string methods, like contains. As this method accepts also a pattern as a regular expression, you can put as an argument something like 'Capt|Col...'. To join all elements of a set you can use str.join() method.
So the solution would be
dataset[dataset['Title'].str.contains('|'.join(rareTitles))]
Link to documentation: pandas.Series.str.contains