Set values in json file (using karate intuit framework) - json

I need to pass values from my features to json files.
Ex: an item is created in the test feature and id is returned in response;
I would like to put this id in a json file where I have something as:
{"item":
["string1", string2 etc..]
}
to concatenate the id in string1, string2
I saw an example here but it didn't help me for json files:
https://github.com/intuit/karate/blob/master/karate-junit4/src/test/java/com/intuit/karate/junit4/demos/type-conv.feature
Thanks for help

Use the set keyword:
* def json = read('some.json')
* set json.item[] = 'string3'

Related

How to retrieve data from an external nested json file on seed.rb

I want to retrieve data from an external nested JSON file on my seed.rb
The JSON looks like this:
{"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
I saw a solution on GitHub but it only works on non-nested JSON.
Let's say you have JSON file db/seeds.json:
{"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
You can use it like this in your db/seeds.rb:
seeds = JSON.parse(Rails.root.join("db", "seeds.json"), symbolize_names: true)
User.create(seeds[:people])
seeds[:people] in this case is array of hashes with user attributes
if you have:
json_data = {"people":[{"name":"John", "age":"23"}, {"name":"Jack", "age":"25"}]}
when you do:
json_data[:people]
you'll get an array:
[{:name=>"John", :age=>"23"}, {:name=>"Jack", :age=>"25"}]
if you want to use this array to populate a model, you can do:
People.create(json_data[:people])
if you want to read each item values, you can iterate through your data, like:
json_data[:people].each {|p| puts p[:name], p[:age]}

Insert JSON to Hadoop using Spark (Java)

I'm very new in Hadoop,
I'm using Spark with Java.
I have dynamic JSON, exmaple:
{
"sourceCode":"1234",
"uuid":"df123-....",
"title":"my title"
}{
"myMetaDataEvent": {
"date":"10/10/2010",
},
"myDataEvent": {
"field1": {
"field1Format":"fieldFormat",
"type":"Text",
"value":"field text"
}
}
}
Sometimes I can see only field1 and sometimes I can see field1...field50
And maybe the user can add fields/remove fields from this JSON.
I want to insert this dynamic JSON to hadoop (to hive table) from Spark Java code,
How can I do it?
I want that the user can after make HIVE query, i.e: select * from MyTable where type="Text
I have around 100B JSON records per day that I need to insert to Hadoop,
So what is the recommanded way to do that?
*I'm looked on the following: SO Question but this is known JSON scheme where it isnt my case.
Thanks
I had encountered kind of similar problem, I was able to resolve my problem using this. ( So this might help if you create the schema before you parse the json ).
For a field having a string data type you could create the schema :-
StructField field = DataTypes.createStructField(<name of the field>, DataTypes.StringType, true);
For a field having a int data type you could create the schema :-
StructField field = DataTypes.createStructField(<name of the field>, DataTypes.IntegerType, true);
After you have added all the fields in a List<StructField>,
Eg:-
List<StructField> innerField = new ArrayList<StructField>();
.... Field adding logic ....
Eg:-
innerField.add(field1);
innerField.add(field2);
// One instance can come, or multiple instance of value comes in an array, then it needs to be put in Array Type.
ArrayType getArrayInnerType = DataTypes.createArrayType(DataTypes.createStructType(innerField));
StructField getArrayField = DataTypes.createStructField(<name of field>, getArrayInnerType,true);
You can then create the schema :-
StructType structuredSchema = DataTypes.createStructType(getArrayField);
Then I read the json using the schema generated using the Dataset API.
Dataset<Row> dataRead = sqlContext.read().schema(structuredSchema).json(fileName);

Parse JSON into U-SQL then convert to csv

I'm trying to convert some telemetry data that is in JSON format into CSV format, then write it out to a file, using U-SQL.
The problem is that some of the JSON key values have periods in them, and so when I'm doing the SELECT operation, U-SQL is not recognizing them. When I check the output file, all that I am seeing is the values for "p1". How can I represent the names of the JSON key names in the script so that they are recognized. Thanks in advance for any help!
Code:
REFERENCE ASSEMBLY MATSDevDB.[Newtonsoft.Json];
REFERENCE ASSEMBLY MATSDevDB.[Microsoft.Analytics.Samples.Formats];
USING Microsoft.Analytics.Samples.Formats.Json;
#jsonDocuments =
EXTRACT jsonString string
FROM #"adl://xxxx.azuredatalakestore.net/xxxx/{*}/{*}/{*}/telemetry_{*}.json"
USING Extractors.Tsv(quoting:false);
#jsonify =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS json
FROM #jsonDocuments;
#columnized = SELECT
json["EventInfo.Source"] AS EventInfoSource,
json["EventInfo.InitId"] AS EventInfoInitId,
json["EventInfo.Sequence"] AS EventInfoSequence,
json["EventInfo.Name"] AS EventInfoName,
json["EventInfo.Time"] AS EventInfoTime,
json["EventInfo.SdkVersion"] AS EventInfoSdkVersion,
json["AppInfo.Language"] AS AppInfoLanguage,
json["UserInfo.Language"] AS UserInfoLanguage,
json["DeviceInfo.BrowserName"] AS DeviceInfoBrowswerName,
json["DeviceInfo.BrowserVersion"] AS BrowswerVersion,
json["DeviceInfo.OsName"] AS DeviceInfoOsName,
json["DeviceInfo.OsVersion"] AS DeviceInfoOsVersion,
json["DeviceInfo.Id"] AS DeviceInfoId,
json["p1"] AS p1,
json["PipelineInfo.AccountId"] AS PipelineInfoAccountId,
json["PipelineInfo.IngestionTime"] AS PipelineInfoIngestionTime,
json["PipelineInfo.ClientIp"] AS PipelineInfoClientIp,
json["PipelineInfo.ClientCountry"] AS PipelineInfoClientCountry,
json["PipelineInfo.IngestionPath"] AS PipelineInfoIngestionPath,
json["AppInfo.Id"] AS AppInfoId,
json["EventInfo.Id"] AS EventInfoId,
json["EventInfo.BaseType"] AS EventInfoBaseType,
json["EventINfo.IngestionTime"] AS EventINfoIngestionTime
FROM #jsonify;
OUTPUT #columnized
TO "adl://xxxx.azuredatalakestore.net/poc/TestResult.csv"
USING Outputters.Csv(quoting : false);
JSON:
{"EventInfo.Source":"JS_default_source","EventInfo.Sequence":"1","EventInfo.Name":"daysofweek","EventInfo.Time":"2018-01-25T21:09:36.779Z","EventInfo.SdkVersion":"ACT-Web-JS-2.6.0","AppInfo.Language":"en","UserInfo.Language":"en-US","UserInfo.TimeZone":"-08:00","DeviceInfo.BrowserName":"Chrome","DeviceInfo.BrowserVersion":"63.0.3239.132","DeviceInfo.OsName":"Mac OS X","DeviceInfo.OsVersion":"10","p1":"V1","PipelineInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z","PipelineInfo.ClientCountry":"CA","PipelineInfo.IngestionPath":"FastPath","EventInfo.BaseType":"custom","EventInfo.IngestionTime":"2018-01-25T21:09:33.9930000Z"}
I got this to work with single quotes and single square brackets, eg
#columnized = SELECT
json["['EventInfo.Source']"] AS EventInfoSource,
...
Full code:
#columnized = SELECT
json["['EventInfo.Source']"] AS EventInfoSource,
json["['EventInfo.InitId']"] AS EventInfoInitId,
json["['EventInfo.Sequence']"] AS EventInfoSequence,
json["['EventInfo.Name']"] AS EventInfoName,
json["['EventInfo.Time']"] AS EventInfoTime,
json["['EventInfo.SdkVersion']"] AS EventInfoSdkVersion,
json["['AppInfo.Language']"] AS AppInfoLanguage,
json["['UserInfo.Language']"] AS UserInfoLanguage,
json["['DeviceInfo.BrowserName']"] AS DeviceInfoBrowswerName,
json["['DeviceInfo.BrowserVersion']"] AS BrowswerVersion,
json["['DeviceInfo.OsName']"] AS DeviceInfoOsName,
json["['DeviceInfo.OsVersion']"] AS DeviceInfoOsVersion,
json["['DeviceInfo.Id']"] AS DeviceInfoId,
json["p1"] AS p1,
json["['PipelineInfo.AccountId']"] AS PipelineInfoAccountId,
json["['PipelineInfo.IngestionTime']"] AS PipelineInfoIngestionTime,
json["['PipelineInfo.ClientIp']"] AS PipelineInfoClientIp,
json["['PipelineInfo.ClientCountry']"] AS PipelineInfoClientCountry,
json["['PipelineInfo.IngestionPath']"] AS PipelineInfoIngestionPath,
json["['AppInfo.Id']"] AS AppInfoId,
json["['EventInfo.Id']"] AS EventInfoId,
json["['EventInfo.BaseType']"] AS EventInfoBaseType,
json["['EventINfo.IngestionTime']"] AS EventINfoIngestionTime
FROM #jsonify;
My results:

Converting mongoengine objects to JSON

i tried to fetch data from mongodb using mongoengine with flask. query is work perfect the problem is when i convert query result into json its show only fields name.
here is my code
view.py
from model import Users
result = Users.objects()
print(dumps(result))
model.py
class Users(DynamicDocument):
meta = {'collection' : 'users'}
user_name = StringField()
phone = StringField()
output
[["id", "user_name", "phone"], ["id", "user_name", "phone"]]
why its show only fields name ?
Your query returns a queryset. Use the .to_json() method to convert it.
Depending on what you need from there, you may want to use something like json.loads() to get a python dictionary.
For example:
from model import Users
# This returns <class 'mongoengine.queryset.queryset.QuerySet'>
q_set = Users.objects()
json_data = q_set.to_json()
# You might also find it useful to create python dictionaries
import json
dicts = json.loads(json_data)

USql Call data in multidimensional JSON array

I have this JSON file in a data lake that looks like this:
{
"id":"398507",
"contenttype":"POST",
"posttype":"post",
"uri":"http://twitter.com/etc",
"title":null,
"profile":{
"#class":"PublisherV2_0",
"name":"Company",
"id":"2163171",
"profileIcon":"https://pbs.twimg.com/image",
"profileLocation":{
"#class":"DocumentLocation",
"locality":"Toronto",
"adminDistrict":"ON",
"countryRegion":"Canada",
"coordinates":{
"latitude":43.7217,
"longitude":-31.432},
"quadKey":"000000000000000"},
"displayName":"Name",
"externalId":"00000000000"},
"source":{
"name":"blogs",
"id":"18",
"param":"Twitter"},
"content":{
"text":"Description of post"},
"language":{
"name":"English",
"code":"en"},
"abstracttext":"More Text and links",
"score":{}
}
}
in order to call the data into my application, I have to turn the JSON into a string using this code:
DECLARE #input string = #"/MSEStream/{*}.json";
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];
#allposts =
EXTRACT
jsonString string
FROM #input
USING Extractors.Text(delimiter:'\b', quoting:true);
#extractedrows = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS er FROM #allposts;
#result =
SELECT er["id"] AS postID,
er["contenttype"] AS contentType,
er["posttype"] AS postType,
er["uri"] AS uri,
er["title"] AS Title,
er["acquisitiondate"] AS acquisitionDate,
er["modificationdate"] AS modificationDate,
er["publicationdate"] AS publicationDate,
er["profile"] AS profile
FROM #extractedrows;
OUTPUT #result
TO "/ProcessedQueries/all_posts.csv"
USING Outputters.Csv();
This output the JSON into a .csv file that is readable and when I download the file all data is displayed properly. My problem is when I need to get the data inside profile. Because the JSON is now a string I can't seem to extract any of that data and put it into a variable to use. Is there any way to do this? or do I need to look into other options for reading the data?
You can use JsonTuple on the profile string to further extract the specific properties you want. An example of U-SQL code to process nested Json is provided in this link - https://github.com/Azure/usql/blob/master/Examples/JsonSample/JsonSample/NestedJsonParsing.usql.
You can use JsonTuple on the profile column to further extract specific nodes
E.g. use JsonTuple to get all the child nodes of the profile node and extract specific values like how you did in your code.
#childnodesofprofile =
SELECT
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile) AS childnodes_map
FROM #result;
#values =
SELECT
childnodes_map["name"] AS name,
childnodes_map["id"] AS id
FROM #result;
Alternatively, if you are interested in specific values, you can also pass paramters to the JsonTuple function to get the specific nodes you want. The code below gets the locality node from the recursively nested nodes (as described by the "$..value" construct.
#locality =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(profile, "$..locality").Values AS locality
FROM #result;
Other supported constructs by JsonTuple
JsonTuple(json, "id", "name") // field names
JsonTuple(json, "$.address.zip") // nested fields
JsonTuple(json, "$..address") // recursive children
JsonTuple(json, "$[?(#.id > 1)].id") // path expression
JsonTuple(json) // all children
Hope this helps.