I am attempting to extract values from my JSON object:
hits = [{“title”: “Facebook”,
“domain”: “facebook.com”},
{“title”: “Linkedin”,
“domain”: “linkedin.com”}]
When I use:
json_extract(hits,'$.title') as title,
nothing is returned. I would like the result to be: [Facebook, Linkedin].
However, when I extract a single scalar value, e.g.:
json_extract_scalar(hits,'$[0].title') as title,
it works and Facebook is returned.
hits contains a lot of values, so I can't extract each scalar individually; I need json_extract to return all of them at once. Any suggestions to fix this would be greatly appreciated.
I get INVALID_FUNCTION_ARGUMENT: Invalid JSON path: '$.title' as the error for the double-star variant of $.title. When I try unnest, I get INVALID_FUNCTION_ARGUMENT: Cannot unnest type: varchar and INVALID_FUNCTION_ARGUMENT: Cannot unnest type: json. When I try double quotes around the path, I get SYNTAX_ERROR: line 26:19: Column '$.title' cannot be resolved.
The correct JSON path to extract all titles is $.[*].title (or $.*.title), though Athena does not support it. One option is to cast your JSON to an array of JSON and use transform on it:
WITH dataset AS (
SELECT * FROM (VALUES
(JSON '[{"title": "Facebook",
"domain": "facebook.com"},
{"title": "Linkedin",
"domain": "linkedin.com"}]')
) AS t (json_string))
SELECT transform(cast(json_string as ARRAY(JSON)), js -> json_extract_scalar(js, '$.title'))
FROM dataset
Output:
_col0
[Facebook, Linkedin]
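For readers who want to check the idea outside Athena, here is a minimal Python sketch of the same two steps: parse the text into an array of objects, then map over the elements and pull one field from each. The function name extract_titles is only illustrative, and the sample uses straight quotes so that it parses.

```python
import json

# Sample payload from the question, written with straight double quotes.
hits = (
    '[{"title": "Facebook", "domain": "facebook.com"},'
    ' {"title": "Linkedin", "domain": "linkedin.com"}]'
)


def extract_titles(raw: str) -> list:
    """Parse a JSON array and return the "title" of every element."""
    # Roughly analogous to cast(json_string as ARRAY(JSON)).
    records = json.loads(raw)
    # Roughly analogous to transform(..., js -> json_extract_scalar(js, '$.title')).
    return [record.get("title") for record in records]


titles = extract_titles(hits)
print(titles)
```

The point is the same as in the SQL: the top-level value is an array, so a field has to be read from each element rather than from the root.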
First, you have an array, so $.title doesn't exist; see below.
Second, your JSON is not valid: it must use straight double quotes ("), as the example shows.
SET @a := '[{
"title": "Facebook",
"domain": "facebook.com"
},
{
"title": "Linkedin",
"domain": "linkedin.com"
}
]';
SELECT json_extract(@a,'$[0]') as title;
| title |
| :---------------------------------------------- |
| {"title": "Facebook", "domain": "facebook.com"} |
SELECT JSON_EXTRACT(@a, "$[0].title") AS 'from';
| from |
| :--------- |
| "Facebook" |
SELECT @a;
| @a |
| :-- |
| [{<br> "title": "Facebook",<br> "domain": "facebook.com"<br> },<br> {<br><br> "title": "Linkedin",<br> "domain": "linkedin.com"<br> }<br>] |
I am working with a nested JSON file. The issue is that the keys of the nested JSON are dates, and their values are not known beforehand, so I am unable to apply the expandRecordColumn method to it.
Each row has a unique refId and looks like this:
{
"refId" : "XYZ",
"snapshotIndexes" : {
"19-07-2021" : {
"url": "abc1",
"value": "123"
},
"20-07-2021" : {
"url": "abc2",
"value": "567"
}
}
}
I finally want a table with these columns,
refid | date | url | value
XYZ | 19-7-2021 | abc1 | 123
XYZ | 20-7-2021 | abc2 | 567
PQR | 7-5-2021 | srt | 999
In the new table, refId and date will together make a unique entry.
I was able to solve it by using Record.ToTable on each row to convert the record to a table, and then applying ExpandTableColumn:
let
Source = DocumentDB.Contents("sourceurl"),
// Step names in a let expression must be unique, so the second "Source" is renamed
Collections = Source{[id="dbid"]}[Collections],
SourceTable = Collections{[db_id="dbid",id="PartnerOfferSnapshots"]}[Documents],
ExpandedDocument = Table.ExpandRecordColumn(SourceTable, "Document", {"refId", "snapshotIndexes"}, {"Document.refId", "Document.snapshotIndexes"}),
// Record.ToTable turns each record into a Name/Value table; Name holds the date key
TransformColumns = Table.TransformColumns(ExpandedDocument,{"Document.snapshotIndexes", each Table.ExpandRecordColumn(Record.ToTable(_), "Value", {"url","id","images"}, {"url","id","images"})}),
ExpandedTable = Table.ExpandTableColumn(TransformColumns, "Document.snapshotIndexes", {"Name","url","id","images"}, {"Document.dates","Document.url","Document.id","Document.images"})
in
ExpandedTable
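The same reshaping can be sketched in Python to make the logic explicit: iterate over the date keys without knowing them in advance and emit one row per (refId, date) pair. This is only an illustration of the idea, not Power Query code; the function name flatten_snapshots is hypothetical, and the sample document uses the url/value fields from the question.

```python
from typing import Any, Dict, List


def flatten_snapshots(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row per date key found under "snapshotIndexes"."""
    rows = []
    # The date keys are not known beforehand, so iterate over whatever is present.
    for date, fields in doc.get("snapshotIndexes", {}).items():
        row = {"refId": doc["refId"], "date": date}
        row.update(fields)  # copy the nested fields (url, value, ...) into the row
        rows.append(row)
    return rows


doc = {
    "refId": "XYZ",
    "snapshotIndexes": {
        "19-07-2021": {"url": "abc1", "value": "123"},
        "20-07-2021": {"url": "abc2", "value": "567"},
    },
}

rows = flatten_snapshots(doc)
for row in rows:
    print(row)
```

Iterating over the key/value pairs plays the role of Record.ToTable, and copying the nested fields into each row corresponds to the expand steps.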
I want to call an API using parameters taken from a dataframe, get the JSON response body, and pull out all the distinct values of a particular key from it.
I then need to add these values as a new column in the first dataframe.
Suppose I have a dataframe like the one below:
df1:
+-----+-------+--------+
| DB | User | UserID |
+-----+-------+--------+
| db1 | user1 | 123 |
| db2 | user2 | 456 |
+-----+-------+--------+
I want to hit a REST API, passing column values from df1 as parameters.
If the URL parameters are db=db1 and User=user1 (the first record of df1), the response will be JSON in the following format:
{
"data":[
{
"db": "db1",
"User": "User1",
"UserID": 123,
"Query": "Select * from A",
"Application": "App1"
},
{
"db": "db1",
"User": "User1",
"UserID": 123,
"Query": "Select * from B",
"Application": "App2"
}
]
}
From this JSON, I want to get the distinct values of the Application key as an array or list and attach it as a new column to df1.
My output would look similar to this:
Final df:
+-----+-------+--------+-------------+
| DB | User | UserID | Apps |
+-----+-------+--------+-------------+
| db1 | user1 | 123 | {App1,App2} |
| db2 | user2 | 456 | {App3,App3} |
+-----+-------+--------+-------------+
I have come up with a high-level plan to achieve this:
1. Add a new column called response URL, built from multiple columns in the input.
2. Define a Scala function that takes a URL and returns an array of applications, and convert it to a UDF.
3. Create another column by applying the UDF to the response URL column.
Since I am pretty new to Scala/Spark and have never worked with REST APIs, could someone please help me achieve this result?
Any other ideas or suggestions are welcome.
I am using Spark 1.6.
Check the code below. You will need to write the logic to invoke the REST API; once you get the result, the rest of the process is simple.
scala> val df = Seq(("db1","user1",123),("db2","user2",456)).toDF("db","user","userid")
df: org.apache.spark.sql.DataFrame = [db: string, user: string, userid: int]
scala> df.show(false)
+---+-----+------+
|db |user |userid|
+---+-----+------+
|db1|user1|123 |
|db2|user2|456 |
+---+-----+------+
scala> :paste
// Entering paste mode (ctrl-D to finish)
def invokeRestAPI(db:String,user: String) = {
import org.json4s._
import org.json4s.jackson.JsonMethods._
implicit val formats = DefaultFormats
// Write your invoke logic & for now I am hardcoding your sample json here.
val json_data = parse("""{"data":[ {"db": "db1","User": "User1","UserID": 123,"Query": "Select * from A","Application": "App1"},{"db": "db1","User": "User1","UserID": 123,"Query": "Select * from B","Application": "App2"}]}""")
(json_data \\ "data" \ "Application").extract[Set[String]].toList
}
// Exiting paste mode, now interpreting.
invokeRestAPI: (db: String, user: String)List[String]
scala> val fetch = udf(invokeRestAPI _)
fetch: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function2>,ArrayType(StringType,true),List(StringType, StringType))
scala> df.withColumn("apps",fetch($"db",$"user")).show(false)
+---+-----+------+------------+
|db |user |userid|apps |
+---+-----+------+------------+
|db1|user1|123 |[App1, App2]|
|db2|user2|456 |[App1, App2]|
+---+-----+------+------------+
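Outside Spark, the two pieces of logic that the UDF would need (building the request URL from row values and reducing the response body to the distinct Application values) can be sketched in plain Python. This is a sketch under stated assumptions: BASE_URL is a hypothetical endpoint, the parameter names db and User come from the question, and the HTTP call itself is left out, so the sample body is hardcoded just as in the Scala answer.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the real API address.
BASE_URL = "https://example.invalid/api/queries"


def build_url(db: str, user: str) -> str:
    """Build the request URL from the values of one dataframe row."""
    return BASE_URL + "?" + urlencode({"db": db, "User": user})


def distinct_applications(body: str) -> list:
    """Return the sorted distinct "Application" values found under "data"."""
    payload = json.loads(body)
    apps = {item["Application"] for item in payload.get("data", []) if "Application" in item}
    return sorted(apps)


# Sample response body from the question, hardcoded in place of a real HTTP call.
sample_body = """{"data": [
  {"db": "db1", "User": "User1", "UserID": 123, "Query": "Select * from A", "Application": "App1"},
  {"db": "db1", "User": "User1", "UserID": 123, "Query": "Select * from B", "Application": "App2"}
]}"""

url = build_url("db1", "user1")
apps = distinct_applications(sample_body)
print(url)
print(apps)
```

Because the response is hardcoded in the answer above, every row gets the same apps value there; with a real call, each row would be resolved from its own URL.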
I have below API response sample
{
"items": [
{
"id":11,
"name": "SMITH",
"prefix": "SAM",
"code": "SSO"
},
{
"id":10,
"name": "James",
"prefix": "JAM",
"code": "BBC"
}
]
}
Per the response above, my test expects that whenever I hit the API, ID 11 belongs to SMITH and ID 10 belongs to James.
So I thought I would store this in a table and assert it against the actual response:
* table person
| id | name |
| 11 | SMITH |
| 10 | James |
| 9 | RIO |
Now, how would I match these one by one? That is, take the first id and name from the API response and match them against the first id and name in the table, and so on.
Please share a convenient way of doing this in Karate.
There are a few possible ways, here is one:
* def lookup = { 11: 'SMITH', 10: 'James' }
* def items =
"""
[
{
"id":11,
"name":"SMITH",
"prefix":"SAM",
"code":"SSO"
},
{
"id":10,
"name":"James",
"prefix":"JAM",
"code":"BBC"
}
]
"""
* match each items contains { name: "#(lookup[_$.id+''])" }
And you already know how to use a table instead of JSON.
Please read the docs and other Stack Overflow answers for more ideas.
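To spell out what the lookup-based match checks, here is a rough Python equivalent: for each item in the response, look up the expected name by id and compare it. This only illustrates the logic and is not Karate syntax; the helper name mismatches is hypothetical, and the expected data is taken from the table in the question.

```python
# Expected id -> name pairs, taken from the table in the question.
expected = {11: "SMITH", 10: "James", 9: "RIO"}

# Sample API response from the question.
response = {
    "items": [
        {"id": 11, "name": "SMITH", "prefix": "SAM", "code": "SSO"},
        {"id": 10, "name": "James", "prefix": "JAM", "code": "BBC"},
    ]
}


def mismatches(items, expected_names):
    """Return (id, actual, expected) for every item whose name differs from the lookup."""
    problems = []
    for item in items:
        want = expected_names.get(item["id"])
        if want != item["name"]:
            problems.append((item["id"], item["name"], want))
    return problems


problems = mismatches(response["items"], expected)
print(problems)
```

Matching by id rather than by position means the check does not depend on the order in which the API returns the items.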
I am trying to parse JSON data from a table in SQL Server 2017. I have a view that returns this data:
| Debrief Name | Version | Answer Question | Answer Options |
+-------------------+-----------+--------------------------+--------------------------------------------------------------------------------------------------------------------------+
| Observer Report | 7 | Division: | {"Options":[{"Display":"Domestic","Value":"Domestic"},{"Display":"International","Value":"International"}]} |
| Observer Report | 7 | Are you on reserve? | {"Options":[{"Display":"Yes - Long Call Line","Value":"Yes"},{"Display":"No","Value":"No"}]} |
| Observer Report | 11 | Crew Position: | {"Options":[{"Display":"CA","Value":"CA"},{"Display":"RC","Value":"RC"},{"Display":"FO","Value":"FO"}]} |
| Observer Report | 11 | Domicile: | {"VisibleLines":2,"Options":[{"Display":"BOS","Value":"BOS"},{"Display":"CLT","Value":"CLT"}]} |
| Training Debrief | 12 | TRAINING CREW POSITION | {"VisibleLines":2,"Options":[{"Display":"CA","Value":"CA"},{"Display":"FO","Value":"FO"}]} |
| Training Debrief | 12 | AIRCRAFT | {"VisibleLines":2,"Options":[{"Display":"777","Value":"777"},{"Display":"767","Value":"767"}]} |
| Security Debrief | 9 | Aircraft Type | {"Options":[{"Display":"MD-80","Value":"MD-80"},{"Display":"777","Value":"777"},{"Display":"767/757","Value":"767/757"}]}|
| News Digest | 2 | Do you read Digest? | {"Options":[{"Display":"Yes","Value":"Yes"},{"Display":"No","Value":"No"}]} |
The view can contain multiple records for the same Debrief Name and Version, and each debrief has multiple versions. For each combination of Debrief Name and Version, there is a set of Answer Questions with related Answer Options. The Answer Options column contains a JSON record, which I need to parse.
My initial query is something like this:
SELECT *
FROM [dbo].<MY VIEW>
WHERE [Debrief Name] = 'Observer Report' AND Version = 11
which would return below data:
| Debrief Name | Version | Answer Question | Answer Options |
+---------------------+--------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+
| Observer Report | 11 | Crew Position: | {"Options":[{"Display":"CA","Value":"CA"},{"Display":"RC","Value":"RC"}]} |
| Observer Report | 11 | Domicile: | {"VisibleLines":2,"Options":[{"Display":"BOS","Value":"BOS"},{"Display":"CLT","Value":"CLT"}]} |
| Observer Report | 11 | Fleet: | {"Options":[{"Display":"330","Value":"330"},{"Display":"320","Value":"320"}]} |
| Observer Report | 11 | Division: | {"Options":[{"Display":"Domestic","Value":"Domestic"},{"Display":"International","Value":"International"}]} |
| Observer Report | 11 | Are you on reserve? | {"Options":[{"Display":"Yes - Long Call Line","Value":"Yes - Long Call Line"},{"Display":"No","Value":"No"}]} |
From this result, for each Answer Question I need to parse the related Answer Options JSON and extract the Value field of every option. For example, the JSON string in Answer Options for the question "Are you on reserve?" looks like this:
"Options":[
{
"Display":"330",
"Value":"330",
"Selected":false
},
{
"Display":"320",
"Value":"320",
"Selected":false
},
{
"Display":"S80",
"Value":"S80",
"Selected":false
}
]
So I need to extract the "Value" fields and return something like an array with the values {330, 320, 195}.
In conclusion, I want to construct a query that, given the Debrief Name and Version, returns each Answer Question with all of its Answer Option values.
I am thinking of using a stored procedure like the one below:
CREATE PROCEDURE myProc
@DebriefName NVARCHAR(255),
@Version INT
AS
SELECT *
FROM [dbo].[myView]
WHERE [Debrief Name] = @DebriefName
AND Version = @Version
GO
And then have another stored procedure that will capture this result from myProc and then do the JSON parsing:
CREATE PROCEDURE parseJSON
@DebriefName NVARCHAR(255),
@Version INT
AS
EXEC myProc @DebriefName, @Version; -- Need to capture the result data in a temp table or something
-- Parse the JSON data for each question item in temp table
GO
I am not an expert in SQL, so I am not sure how to do this. I have read about JSON parsing in SQL Server and feel I could use it, but I am not sure how to apply it in my context.
If you want to parse JSON data in Answer Options column and extract the Value field, you may try with the following approach, using OPENJSON() and STRING_AGG():
DECLARE @json nvarchar(max)
SET @json = N'{
"Options": [
{
"Display": "330",
"Value": "330",
"Selected": false
},
{
"Display": "320",
"Value": "320",
"Selected": false
},
{
"Display": "195",
"Value": "195",
"Selected": false
}
]
}'
SELECT STRING_AGG(x.[value], ', ') AS [Values]
FROM OPENJSON(@json, '$.Options') j
CROSS APPLY (SELECT * FROM OPENJSON(j.[value])) x
WHERE x.[key] = 'Value'
Output:
Values
330, 320, 195
If you want to build your statement using stored procedure, use this approach:
CREATE TABLE myTable (
DebriefName nvarchar(100),
Version int,
AnswerQuestion nvarchar(1000),
AnswerOptions nvarchar(max)
)
INSERT INTO myTable
(DebriefName, Version, AnswerQuestion, AnswerOptions)
VALUES
(N'Observer Report', 7, N'Division:' , N'{"Options":[{"Display":"Domestic","Value":"Domestic"},{"Display":"International","Value":"International"}]}'),
(N'Observer Report', 7, N'Are you on reserve?' , N'{"Options":[{"Display":"Yes - Long Call Line","Value":"Yes"},{"Display":"No","Value":"No"}]}'),
(N'Observer Report', 11, N'Crew Position:' , N'{"Options":[{"Display":"CA","Value":"CA"},{"Display":"RC","Value":"RC"},{"Display":"FO","Value":"FO"}]}'),
(N'Observer Report', 11, N'Domicile:' , N'{"VisibleLines":2,"Options":[{"Display":"BOS","Value":"BOS"},{"Display":"CLT","Value":"CLT"}]}'),
(N'Training Debrief', 12, N'TRAINING CREW POSITION', N'{"VisibleLines":2,"Options":[{"Display":"CA","Value":"CA"},{"Display":"FO","Value":"FO"}]}'),
(N'Training Debrief', 12, N'AIRCRAFT' , N'{"VisibleLines":2,"Options":[{"Display":"777","Value":"777"},{"Display":"767","Value":"767"}]}'),
(N'Security Debrief', 9, N'Aircraft Type' , N'{"Options":[{"Display":"MD-80","Value":"MD-80"},{"Display":"777","Value":"777"},{"Display":"767/757","Value":"767/757"}]}'),
(N'News Digest', 2, N'Do you read Digest?' , N'{"Options":[{"Display":"Yes","Value":"Yes"},{"Display":"No","Value":"No"}]}')
SELECT
t.AnswerQuestion,
STRING_AGG(x.[value], ', ') AS [Values]
FROM myTable t
CROSS APPLY (SELECT * FROM OPENJSON(t.AnswerOptions, '$.Options')) j
CROSS APPLY (SELECT * FROM OPENJSON(j.[value])) x
WHERE
DebriefName = N'Observer Report' AND
t.Version = 11 AND
x.[key] = 'Value'
GROUP BY
t.DebriefName,
t.Version,
t.AnswerQuestion
Output:
AnswerQuestion Values
Crew Position: CA, RC, FO
Domicile: BOS, CLT
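As a cross-check of what the query computes, the same extraction can be sketched in Python: open the Options array, keep each Value, and join the values with ', ' for every question. This is only an illustrative equivalent of the OPENJSON() and STRING_AGG() steps; the function name option_values is hypothetical, and the two sample rows are the version 11 rows of Observer Report from the table above.

```python
import json


def option_values(answer_options: str) -> str:
    """Join the "Value" of every entry in "Options" with ', '."""
    options = json.loads(answer_options).get("Options", [])
    return ", ".join(str(option["Value"]) for option in options if "Value" in option)


# (AnswerQuestion, AnswerOptions) pairs for Observer Report, version 11.
sample_rows = [
    (
        "Crew Position:",
        '{"Options":[{"Display":"CA","Value":"CA"},{"Display":"RC","Value":"RC"},'
        '{"Display":"FO","Value":"FO"}]}',
    ),
    (
        "Domicile:",
        '{"VisibleLines":2,"Options":[{"Display":"BOS","Value":"BOS"},'
        '{"Display":"CLT","Value":"CLT"}]}',
    ),
]

result = {question: option_values(options) for question, options in sample_rows}
for question, values in result.items():
    print(question, values)
```

Extra keys such as VisibleLines are simply ignored, which mirrors the '$.Options' path in the SQL.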