I have this JSON file in a data lake that looks like this:
{
"id":"398507",
"contenttype":"POST",
"posttype":"post",
"uri":"http://twitter.com/etc",
"title":null,
"profile":{
"@class":"PublisherV2_0",
"name":"Company",
"id":"2163171",
"profileIcon":"https://pbs.twimg.com/image",
"profileLocation":{
"@class":"DocumentLocation",
"locality":"Toronto",
"adminDistrict":"ON",
"countryRegion":"Canada",
"coordinates":{
"latitude":43.7217,
"longitude":-31.432},
"quadKey":"000000000000000"},
"displayName":"Name",
"externalId":"00000000000"},
"source":{
"name":"blogs",
"id":"18",
"param":"Twitter"},
"content":{
"text":"Description of post"},
"language":{
"name":"English",
"code":"en"},
"abstracttext":"More Text and links",
"score":{}
}
In order to load the data into my application, I have to turn the JSON into a string using this code:
DECLARE @input string = @"/MSEStream/{*}.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
@allposts =
    EXTRACT
        jsonString string
    FROM @input
    USING Extractors.Text(delimiter:'\b', quoting:true);
@extractedrows =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er
    FROM @allposts;
@result =
    SELECT er["id"] AS postID,
        er["contenttype"] AS contentType,
        er["posttype"] AS postType,
        er["uri"] AS uri,
        er["title"] AS Title,
        er["acquisitiondate"] AS acquisitionDate,
        er["modificationdate"] AS modificationDate,
        er["publicationdate"] AS publicationDate,
        er["profile"] AS profile
    FROM @extractedrows;
OUTPUT @result
TO "/ProcessedQueries/all_posts.csv"
USING Outputters.Csv();
This outputs the JSON to a .csv file that is readable, and when I download the file all the data is displayed properly. My problem is getting at the data inside profile. Because the JSON is now a string, I can't seem to extract any of that data and put it into a variable to use. Is there a way to do this, or do I need to look into other options for reading the data?
You can use JsonTuple on the profile string to further extract the specific properties you want. An example of U-SQL code that processes nested JSON is available at https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql.
For example, use JsonTuple to get all the child nodes of the profile node, then extract specific values the same way you did in your code:
@childnodesofprofile =
    SELECT
        Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile) AS childnodes_map
    FROM @result;
@values =
    SELECT
        childnodes_map["name"] AS name,
        childnodes_map["id"] AS id
    FROM @childnodesofprofile;
Alternatively, if you are interested in specific values, you can pass parameters to the JsonTuple function to get just the nodes you want. The code below gets the locality node from the nested nodes (the "$.." prefix searches recursively, so "$..locality" matches locality at any depth):
@locality =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile, "$..locality").Values AS locality
    FROM @result;
Other constructs supported by JsonTuple:
JsonTuple(json, "id", "name") // field names
JsonTuple(json, "$.address.zip") // nested fields
JsonTuple(json, "$..address") // recursive children
JsonTuple(json, "$[?(@.id > 1)].id") // path expression
JsonTuple(json) // all children
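To make the recursive construct more concrete, here is a minimal TypeScript sketch of what a path like "$..locality" selects. It is only an illustration of the idea, not how JsonTuple is implemented; findAll and the Json type are hypothetical names made up for this example.

```typescript
// Hypothetical helper (not part of JsonTuple): collect every value stored
// under `key` at any depth, which is what a "$..key" path selects.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

function findAll(node: Json, key: string): Json[] {
  const hits: Json[] = [];
  // Scalars and null have no children to search.
  if (node === null || typeof node !== "object") return hits;
  if (Array.isArray(node)) {
    for (const child of node) hits.push(...findAll(child, key));
    return hits;
  }
  for (const k of Object.keys(node)) {
    if (k === key) hits.push(node[k]);
    hits.push(...findAll(node[k], key)); // keep descending into nested values
  }
  return hits;
}

// A trimmed-down copy of the profile node from the question.
const profileNode: Json = JSON.parse(`{
  "name": "Company",
  "profileLocation": {
    "locality": "Toronto",
    "coordinates": { "latitude": 43.7217, "longitude": -31.432 }
  }
}`);

const localities = findAll(profileNode, "locality"); // ["Toronto"]
```

The result is a list because a recursive search can match more than one node, which is also why the U-SQL example reads .Values from the returned map.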
Hope this helps.
I want to retrieve data from an external nested JSON file in my seeds.rb.
The JSON looks like this:
{"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
I saw a solution on GitHub but it only works on non-nested JSON.
Let's say you have a JSON file db/seeds.json:
{"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
You can use it like this in your db/seeds.rb:
seeds = JSON.parse(File.read(Rails.root.join("db", "seeds.json")), symbolize_names: true)
User.create(seeds[:people])
In this case seeds[:people] is an array of hashes with user attributes.
If you have:
json_data = {"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
and you do:
json_data[:people]
you'll get an array:
[{:name=>"John", :age=>"23"}, {:name=>"Jack", :age=>"25"}]
If you want to use this array to populate a model, you can do:
People.create(json_data[:people])
If you want to read each item's values, you can iterate through your data, like this:
json_data[:people].each {|p| puts p[:name], p[:age]}
I have written the code below to combine two arrays and save them as a JSON file.
In this code, seg is an array of numbers that is produced elsewhere in my code. info is also an array, containing some data that is followed by the seg array.
Defining variable types:
seg: Array<any> = [];
info: Array<any>=[];
final: Array<{info:any, Seg:any}>=[];
Pushing values into the arrays and combining them:
this.info.push({date_created: 25 , description: 'aaa', year:'2015'});
this.final.push({info: this.info ,Seg:this.seg});
this.file.writeFile(this.file.externalApplicationStorageDirectory, 'test.json', JSON.stringify(this.final));
The produced file is something like this:
[{"info":[{"date_created":25,"description":"aaa","year":"2015"}],"Seg":[2,3,4,5]}]
As you can see, the info data is placed between square brackets, so the JSON file treats it as a list, not a record.
Does anyone know how I can remove these brackets from around the info data?
Should I change the type of the variable from array to something else?
You can declare final as a single object instead of an array, so that it is stored as a record:
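seg: Array<any> = [];
info: Array<any> = [];
// A single object instead of an array; initialized so its fields can be assigned below.
final: {info: any, Seg: any} = {info: null, Seg: null};
this.final.Seg = this.seg;
this.final.info = this.info;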
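If info itself should also come out as a record rather than a list, one option is to make it a plain object instead of an array. The standalone sketch below compares the two serialized shapes; the Info interface and the asRecord/asList names are assumptions made for this example, and the values are the ones from the question.

```typescript
// Hypothetical shape for the info record, based on the fields in the question.
interface Info {
  date_created: number;
  description: string;
  year: string;
}

const seg: number[] = [2, 3, 4, 5];
const info: Info = { date_created: 25, description: "aaa", year: "2015" };

// Objects all the way down: neither `final` nor `info` is wrapped in [].
const final: { info: Info; Seg: number[] } = { info: info, Seg: seg };
const asRecord = JSON.stringify(final);
// {"info":{"date_created":25,"description":"aaa","year":"2015"},"Seg":[2,3,4,5]}

// For comparison, the shape from the question: arrays add the brackets.
const asList = JSON.stringify([{ info: [info], Seg: seg }]);
// [{"info":[{"date_created":25,"description":"aaa","year":"2015"}],"Seg":[2,3,4,5]}]
```

JSON.stringify writes arrays with [] and plain objects with {}, so the brackets in the file follow directly from the declared types.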
I need to pass values from my features to JSON files.
For example, an item is created in the test feature and its id is returned in the response.
I would like to put this id in a JSON file where I have something like:
{"item":
["string1", string2 etc..]
}
so that the id is added to the list alongside string1, string2.
I saw an example here, but it didn't help me with JSON files:
https://github.com/intuit/karate/blob/master/karate-junit4/src/test/java/com/intuit/karate/junit4/demos/type-conv.feature
Thanks for any help.
Use the set keyword:
* def json = read('some.json')
* set json.item[] = 'string3'
I'm trying to convert some telemetry data that is in JSON format into CSV format, then write it out to a file, using U-SQL.
The problem is that some of the JSON keys have periods in their names, so when I do the SELECT operation, U-SQL does not recognize them. When I check the output file, all I see are the values for "p1". How can I write the JSON key names in the script so that they are recognized? Thanks in advance for any help!
Code:
REFERENCE ASSEMBLY MATSDevDB.[Newtonsoft.Json];
REFERENCE ASSEMBLY MATSDevDB.[Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
@jsonDocuments =
    EXTRACT jsonString string
    FROM @"adl://xxxx.azuredatalakestore.net/xxxx/{*}/{*}/{*}/telemetry_{*}.json"
    USING Extractors.Tsv(quoting:false);
@jsonify =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS json
    FROM @jsonDocuments;
@columnized = SELECT
    json["EventInfo.Source"] AS EventInfoSource,
    json["EventInfo.InitId"] AS EventInfoInitId,
    json["EventInfo.Sequence"] AS EventInfoSequence,
    json["EventInfo.Name"] AS EventInfoName,
    json["EventInfo.Time"] AS EventInfoTime,
    json["EventInfo.SdkVersion"] AS EventInfoSdkVersion,
    json["AppInfo.Language"] AS AppInfoLanguage,
    json["UserInfo.Language"] AS UserInfoLanguage,
    json["DeviceInfo.BrowserName"] AS DeviceInfoBrowserName,
    json["DeviceInfo.BrowserVersion"] AS BrowserVersion,
    json["DeviceInfo.OsName"] AS DeviceInfoOsName,
    json["DeviceInfo.OsVersion"] AS DeviceInfoOsVersion,
    json["DeviceInfo.Id"] AS DeviceInfoId,
    json["p1"] AS p1,
    json["PipelineInfo.AccountId"] AS PipelineInfoAccountId,
    json["PipelineInfo.IngestionTime"] AS PipelineInfoIngestionTime,
    json["PipelineInfo.ClientIp"] AS PipelineInfoClientIp,
    json["PipelineInfo.ClientCountry"] AS PipelineInfoClientCountry,
    json["PipelineInfo.IngestionPath"] AS PipelineInfoIngestionPath,
    json["AppInfo.Id"] AS AppInfoId,
    json["EventInfo.Id"] AS EventInfoId,
    json["EventInfo.BaseType"] AS EventInfoBaseType,
    json["EventInfo.IngestionTime"] AS EventInfoIngestionTime
FROM @jsonify;
OUTPUT @columnized
TO "adl://xxxx.azuredatalakestore.net/poc/TestResult.csv"
USING Outputters.Csv(quoting : false);
JSON:
{"EventInfo.Source":"JS_default_source","EventInfo.Sequence":"1","EventInfo.Name":"daysofweek","EventInfo.Time":"2018-01-25T21:09:36.779Z","EventInfo.SdkVersion":"ACT-Web-JS-2.6.0","AppInfo.Language":"en","UserInfo.Language":"en-US","UserInfo.TimeZone":"-08:00","DeviceInfo.BrowserName":"Chrome","DeviceInfo.BrowserVersion":"63.0.3239.132","DeviceInfo.OsName":"Mac OS X","DeviceInfo.OsVersion":"10","p1":"V1","PipelineInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z","PipelineInfo.ClientCountry":"CA","PipelineInfo.IngestionPath":"FastPath","EventInfo.BaseType":"custom","EventInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z"}
I got this to work by wrapping each dotted key in single quotes and single square brackets, e.g.:
@columnized = SELECT
    json["['EventInfo.Source']"] AS EventInfoSource,
    ...
Full code:
@columnized = SELECT
    json["['EventInfo.Source']"] AS EventInfoSource,
    json["['EventInfo.InitId']"] AS EventInfoInitId,
    json["['EventInfo.Sequence']"] AS EventInfoSequence,
    json["['EventInfo.Name']"] AS EventInfoName,
    json["['EventInfo.Time']"] AS EventInfoTime,
    json["['EventInfo.SdkVersion']"] AS EventInfoSdkVersion,
    json["['AppInfo.Language']"] AS AppInfoLanguage,
    json["['UserInfo.Language']"] AS UserInfoLanguage,
    json["['DeviceInfo.BrowserName']"] AS DeviceInfoBrowserName,
    json["['DeviceInfo.BrowserVersion']"] AS BrowserVersion,
    json["['DeviceInfo.OsName']"] AS DeviceInfoOsName,
    json["['DeviceInfo.OsVersion']"] AS DeviceInfoOsVersion,
    json["['DeviceInfo.Id']"] AS DeviceInfoId,
    json["p1"] AS p1,
    json["['PipelineInfo.AccountId']"] AS PipelineInfoAccountId,
    json["['PipelineInfo.IngestionTime']"] AS PipelineInfoIngestionTime,
    json["['PipelineInfo.ClientIp']"] AS PipelineInfoClientIp,
    json["['PipelineInfo.ClientCountry']"] AS PipelineInfoClientCountry,
    json["['PipelineInfo.IngestionPath']"] AS PipelineInfoIngestionPath,
    json["['AppInfo.Id']"] AS AppInfoId,
    json["['EventInfo.Id']"] AS EventInfoId,
    json["['EventInfo.BaseType']"] AS EventInfoBaseType,
    json["['EventInfo.IngestionTime']"] AS EventInfoIngestionTime
FROM @jsonify;
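One possible explanation for why the bracketed form works, stated as an assumption rather than as documented behaviour: the returned map may be keyed by a path-style rendering of each property name, in which names containing a "." are written as ['name'] so the dot is not read as a level separator. The TypeScript sketch below only models that idea; pathKey, tupleKeys and the SPECIAL character set are hypothetical and are not the library's code.

```typescript
// Assumption: characters that would force bracket notation in a path key.
// The real set used by the JSON library may differ.
const SPECIAL = /[.\s'"\[\]()]/;

// Render a property name as a path key: plain names stay as-is,
// names with special characters become ['name'].
function pathKey(propertyName: string): string {
  return SPECIAL.test(propertyName) ? "['" + propertyName + "']" : propertyName;
}

// Hypothetical model of a JsonTuple-like call with no path arguments:
// every top-level property ends up in a map keyed by its path key.
function tupleKeys(jsonText: string): { [key: string]: string } {
  const parsed: { [k: string]: unknown } = JSON.parse(jsonText);
  const out: { [key: string]: string } = {};
  for (const k of Object.keys(parsed)) {
    out[pathKey(k)] = String(parsed[k]);
  }
  return out;
}

const row = tupleKeys('{"EventInfo.Source":"JS_default_source","p1":"V1"}');
// row["p1"] is "V1"
// row["['EventInfo.Source']"] is "JS_default_source"
// row["EventInfo.Source"] is undefined
```

Under that assumption, a plain key such as p1 is found directly while every dotted key needs the bracketed spelling, which matches the symptom in the question where only p1 produced values.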
I have a document in my bucket containing a list of IDs (childList).
I would like to query over this list and keep the results in the same order as in my JSON. My query looks like this (using the Java SDK):
String query = new StringBuilder().append("SELECT B.name, META(B).id AS id ")
    .append("FROM " + bucket.name() + " A ")   // spaces keep the keyword, bucket name and alias apart
    .append("USE KEYS $id ")
    .append("JOIN " + bucket.name() + " B ON KEYS ARRAY i FOR i IN A.childList END;").toString();
This query will return rows that I will transform into my domain object and create a list like this :
n1qlQueryResult.allRows().forEach(n1qlQueryRow -> (add to return list ) ...);
The problem is that the output order is important: the rows need to come back in the same order as childList.
Any ideas?
Thank you.
Here is a rough idea of a solution without N1QL, provided you always start from a single A document:
List<JsonDocument> listOfBs = bucket
    .async()
    .get(idOfA)
    .flatMap(doc -> Observable.from(doc.content().getArray("childList")))
    // the array elements are streamed as Object, so cast each id to String
    .concatMapEager(id -> bucket.async().get((String) id))
    .toList()
    .toBlocking().first();
You might want another map before the toList to extract the name and id, or perhaps even to perform your domain object transformation.
The steps are:
use the async API
get the A document
extract the list of children and stream these ids
asynchronously fetch each child document and stream them while keeping the original order
collect all into a List<JsonDocument>
block until the list is ready and return that List.
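The same steps can be sketched outside the Couchbase SDK. The TypeScript below is only an analogy, assuming an in-memory fakeBucket and a hypothetical getDoc function in place of the real bucket: all child fetches start at once, and Promise.all returns the results in input order even when they finish in a different order.

```typescript
interface Doc {
  id: string;
  name: string;
  childList?: string[];
}

// Hypothetical in-memory stand-in for the bucket; not the Couchbase SDK.
const fakeBucket: { [id: string]: Doc } = {
  A: { id: "A", name: "parent", childList: ["c3", "c1", "c2"] },
  c1: { id: "c1", name: "one" },
  c2: { id: "c2", name: "two" },
  c3: { id: "c3", name: "three" },
};

// Hypothetical async getter. "c3" is made deliberately slow so that the
// first child in the list is the last one to finish.
function getDoc(id: string): Promise<Doc> {
  const delayMs = id === "c3" ? 15 : 1;
  return new Promise<Doc>((resolve) => {
    setTimeout(() => resolve(fakeBucket[id]), delayMs);
  });
}

async function childrenInOrder(parentId: string): Promise<Doc[]> {
  const parent = await getDoc(parentId);   // get the A document
  const ids = parent.childList || [];      // extract the list of child ids
  // Start every fetch immediately; results come back in the order of `ids`,
  // not in the order the fetches complete.
  return Promise.all(ids.map((id) => getDoc(id)));
}
```

Calling childrenInOrder("A") resolves to the documents for c3, c1, c2 in that order, mirroring what concatMapEager does in the Java version: eager fetching with ordered output.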