I am trying to load a JSON file from S3 into Redshift using COPY with a JSONPaths file. The file contains N records.
Loading the entire set in one go throws an error:
Invalid operation: Invalid JSONPath format. Supported notations are 'dot-notation' and 'bracket-notation'
The JSONPaths file:
{"jsonpaths":
[
"$.item[:].col1",
"$.item[:].col2",
"$.item[:].col3"
]
}
sample file:
{"item":
[
{
"col1":"A",
"col2":"b",
"col3":"d"
},
{
"col1": "123",
"col2": "red",
"col3": "456"
}
]
}
Working JSONPaths file:
{"jsonpaths":
[
"$.item[0].col1",
"$.item[0].col2",
"$.item[0].col3"
]
}
What am I doing wrong to cause this error?
As per the documentation, there are two ways of specifying JSONPaths: dot notation and bracket notation.
In this example, dot notation is used, but the array is indexed with a colon (:). Redshift does not support JSONPath elements such as wildcards or filter expressions, so an array element must be referenced by a zero-based integer index. That is why the second JSONPaths file works.
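If the goal is to load every element of item rather than only the first, one common workaround is to reshape the file before COPY so that each record is its own top-level JSON object; the JSONPaths can then drop the array index (for example "$.col1"). A minimal Python sketch of that reshaping, assuming the structure of the sample file above. The function name explode_items is hypothetical, and uploading the result to S3 is left out:

```python
import json


def explode_items(document, key="item"):
    """Return one JSON object per line for every element of document[key]."""
    return "\n".join(json.dumps(record) for record in document[key])


sample = {
    "item": [
        {"col1": "A", "col2": "b", "col3": "d"},
        {"col1": "123", "col2": "red", "col3": "456"},
    ]
}

# Each printed line is a standalone object, which COPY loads as one row.
print(explode_items(sample))
```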
I am new to Apache Spark and I am trying to compare two json files.
My requirement is to find out which key/value was added, removed, or modified, and what its path is.
To explain my problem, I am sharing the code which I have tried with a small json sample here.
Sample Json 1 is:
{
"employee": {
"name": "sonoo",
"salary": 57000,
"married": true
} }
Sample Json 2 is:
{
"employee": {
"name": "sonoo",
"salary": 58000,
"married": true
} }
My code is:
//Compare two multiline json files
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//Load first json file
val jsonData_1 = sqlContext.read.json(sc.wholeTextFiles("D:\\File_1.json").values)
//Load second json file
val jsonData_2 = sqlContext.read.json(sc.wholeTextFiles("D:\\File_2.json").values)
//Compare both json files
jsonData_2.except(jsonData_1).show(false)
The output which I get on executing this code is:
+--------------------+
|employee |
+--------------------+
|{true, sonoo, 58000}|
+--------------------+
But here only one field, i.e. salary, was modified, so the output should contain only the updated field with its path.
Below is the expected output:
[
{
"op" : "replace",
"path" : "/employee/salary",
"value" : 58000
}
]
Can anyone point me in the right direction?
Assuming each JSON document has an identifier, and that you have two groups of documents (e.g. two folders) to compare against each other:
Load the documents from each group into a DataFrame, providing a schema that matches the structure of the JSON. You then have two DataFrames.
Compare the documents (by now rows in a DataFrame) by joining on the identifier and looking for mismatched values.
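The expected output in the question resembles a JSON Patch. As an illustration of the comparison step only, here is a minimal sketch in plain Python rather than Spark: it walks two already-parsed documents and emits op/path/value entries. The function name diff_json is hypothetical, arrays are compared as whole values, and keys containing "/" or "~" are not escaped.

```python
def diff_json(old, new, path=""):
    """Return a list of add/remove/replace operations turning old into new."""
    ops = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{key}"
            if key not in new:
                ops.append({"op": "remove", "path": child})
            elif key not in old:
                ops.append({"op": "add", "path": child, "value": new[key]})
            else:
                # Key exists on both sides: recurse into the values.
                ops.extend(diff_json(old[key], new[key], child))
    elif old != new:
        ops.append({"op": "replace", "path": path, "value": new})
    return ops


json_1 = {"employee": {"name": "sonoo", "salary": 57000, "married": True}}
json_2 = {"employee": {"name": "sonoo", "salary": 58000, "married": True}}

print(diff_json(json_1, json_2))
```

With Spark, the same function could in principle be applied to each pair of documents after the join on the identifier.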
I have a JSON file arranged in this pattern:
[
{
"Title ID": "4224031",
"Overtime Status": "Non-Exempt",
"Shift rates": "No Shift rates",
"On call rates": "No On call rates"
},
[
{
"Step: 1.0": [
"$38.87",
"(38.870000)"
]
}
]
][
{
"Title ID": "4225031",
"Overtime Status": "Non-Exempt",
"Shift rates": "No Shift rates",
"On call rates": "No On call rates"
},
[
{
"Step: 1.0": [
"$38.87",
"(38.870000)"
]
}
]
]
I am trying to get it into a pandas DataFrame. I have tried opening the JSON file and running json.load(s). Unfortunately, I get JSON decode errors like: "JSONDecodeError: Extra data: line 16 column 2 (char 182)". Running the JSON through a linter suggests there is an issue with the way the JSON is laid out in the file: the parts between the brackets are valid, but they become invalid once wrapped in the brackets. I have tried to get at the dictionaries inside the wrapping brackets but have not made much progress. Does anyone have tips on how I can successfully access this JSON data and get it into a pandas DataFrame?
The JSON is invalid because it has more than one root value in this representation.
It would have to look like this:
jsonObject = [{"1":"3"}], [{"4":"5"}]
One hack I can think of is to replace each ][ with ],[ using find and replace in an editor, and then wrap the whole file in one more pair of brackets. The file is then a single list and you can create a DataFrame from it.
Second, if this is not a one-time job, you need to write a regex that does this for you in a text-cleaning pipeline (or in code). I'm not good at writing working regexes, sorry.
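As an alternative to find-and-replace or a regex, back-to-back JSON documents can be split with the standard library's json.JSONDecoder.raw_decode, which parses one value and reports where it stopped. A minimal sketch, assuming the file content has been read into a string; the function name iter_json_documents and the shortened sample text are illustrative:

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON value found in text, one after another."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


raw = '''[
  {"Title ID": "4224031"},
  [{"Step: 1.0": ["$38.87", "(38.870000)"]}]
][
  {"Title ID": "4225031"},
  [{"Step: 1.0": ["$38.87", "(38.870000)"]}]
]
'''

documents = list(iter_json_documents(raw))
print(len(documents))
```

Each element of documents is one of the original bracketed blocks, so its first item (the dictionary with "Title ID") can be collected into a list of rows for pandas.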
I found a solution.
First, after examining the JSON data in a linter, I found extra brackets and braces at various points, so I run the data through a regex that removes them.
Next, I run each line, which now looks like a dictionary in string form, through json.loads.
Finally, I call pd.DataFrame(pd.json_normalize(data)) to get my desired pandas DataFrame (pd.json_normalize already returns a DataFrame, so the outer call is optional).
Thanks for the help from commenters.
I am trying to extract a node from a json file for a json element that matches another node in same element.
To be more specific, I want the names of all students in the sample JSON below who have "certified":"false".
Example JSON
{
"Students": [
{
"name": "John",
"Rank": "1",
"certified":"false"
},
{
"name": "Ashley",
"Rank": "5",
"certified":"true"
}
]
}
The code I am using (it gives me empty output):
Library    JSONLibrary
JSON_Verification
    [Documentation]    Testing JSON load logic
    ${metadataJson_object}=    Load JSON From File    ../TestData/sample.json
    Log    ${metadataJson_object}
    @{studentName}=    Get Value From Json    ${metadataJson_object}    "$..[?(@.certified=='false')]@.name"
    Log    @{studentName}
You were very close: the @ is not needed after the filter. Just change the JSONPath to:
$.Students[?(@.certified=='false')].name
Here:
$ -> root element
. -> child operator, used to access a property
?() -> filter expression
@ -> current node
${json}=    Convert String to JSON    ${Getjson}
${name}=    Get Value From Json    ${json}    $.Students[?(@.certified=='false')].name
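To make explicit what the filter expression selects, here is the same selection written as a plain Python list comprehension over the sample data. This only illustrates the semantics of the path; it is not how JSONLibrary evaluates it.

```python
document = {
    "Students": [
        {"name": "John", "Rank": "1", "certified": "false"},
        {"name": "Ashley", "Rank": "5", "certified": "true"},
    ]
}

# $.Students      -> document["Students"]
# [?(...)]        -> keep only the elements for which the condition holds
# @.certified     -> the "certified" property of the element being tested
# .name           -> take the "name" property of each kept element
names = [
    student["name"]
    for student in document["Students"]
    if student["certified"] == "false"
]

print(names)
```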
I need to get the count of cards from a JSON file. For this I've used $.storedCards.cards.length()
in the JSON Extractor, but it doesn't work. There is an error message:
Options AS_PATH_LIST and ALWAYS_RETURN_LIST are not allowed when using path functions!
After that I tried a JSR223 PostProcessor with the following Groovy script:
def jsonText = '''${AllCards}''' //${AllCards} has json value
def json = new JsonSlurper().parseText(jsonText)
log.info( "Json length---------->"+json.resource.size())
${CardsCount} = props.get("4") //vars.put(json.resource.size.toString())
But there is a problem setting the value of my variable, and when I created a variable in Groovy, I could not use it outside the script.
My JSON file:
"storedCards":
{
"cards":
[
{
"CardId":"123",
"cardBrand":"Visa",
"lastFourDigits":"2968",
},
{
"CardId":"321",
"cardBrand":"Visa",
"lastFourDigits":"2968",
},
..........
],
How can I get the count of cards and store it in a variable? What should I use for this?
Your JSON data seems to be invalid. Assuming you have the valid JSON like below, I'm answering your question.
{
"storedCards": {
"cards": [
{
"CardId": "123",
"cardBrand": "Visa",
"lastFourDigits": "2968"
},
{
"CardId": "321",
"cardBrand": "Visa",
"lastFourDigits": "2968"
}
]
}
}
You don't need to write Groovy code; you can solve this with the JSON Extractor. Instead of using the length function, use a wildcard JSONPath expression like this:
$.storedCards.cards[*]
The variable you name in the JSON Extractor won't hold the count directly, but the multi-value variables that the extractor creates do. They are described in the documentation of another JMeter function, __RandomFromMultipleVars.
Excerpt from the documentation:
The RandomFromMultipleVars function returns a random value based on the variable values provided by Source Variables.
The variables can be simple or multi-valued as they can be generated by the following extractors:
Boundary Extractor
Regular Expression Extractor
CSS Selector Extractor
JSON Extractor
XPath Extractor
XPath2 Extractor
Multi-value vars are the ones that are extracted when you set -1 for
Match Numbers. This leads to creation of match number variable called
varName_matchNr and for each value to the creation of variable
varName_n where n = 1, 2, 3 etc.
So once you use the wildcard expression with Match No. set to -1, you will get the count in yourVariableName_matchNr.
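The naming scheme from the excerpt can be sketched in Python. This is only a simulation of which variables end up existing, not JMeter code; the function name extract_all_matches and the variable name card are hypothetical.

```python
import json


def extract_all_matches(matches, var_name):
    """Mimic the variables described in the excerpt for Match No. = -1."""
    variables = {f"{var_name}_matchNr": str(len(matches))}
    for n, match in enumerate(matches, start=1):
        variables[f"{var_name}_{n}"] = json.dumps(match)
    return variables


response = {
    "storedCards": {
        "cards": [
            {"CardId": "123", "cardBrand": "Visa", "lastFourDigits": "2968"},
            {"CardId": "321", "cardBrand": "Visa", "lastFourDigits": "2968"},
        ]
    }
}

# $.storedCards.cards[*] selects every element of the cards array.
cards = response["storedCards"]["cards"]
simulated_vars = extract_all_matches(cards, "card")

print(simulated_vars["card_matchNr"])
```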
Hope this helps.
I have a JSON response like below
{
"queryStartDate": "20170523134739822",
"queryEndDate": "20170623134739822",
"Rows": [
{
"hasScdHistoryOnly": false,
"Values": [
"1",
"53265",
"CO"
]
},
{
"hasScdHistoryOnly": false,
"Values": [
"1",
"137382",
"CO"
]
},
{
"hasScdHistoryOnly": false,
"Values": [
"1",
"310824",
"CO"
]
}
]
}
I am using JMeter's JSON Extractor post-processor to extract the second-to-last value from each 'Values' list, i.e. 53265, 137382, 310824.
I've tried $.Rows[*].Values[-2:-1] and $.Rows[*].Values[(@.length-2)], following Stefan Goessner's introduction: http://goessner.net/articles/JsonPath/index.html#e2, but neither of them works. Would you please help me out?
I believe JMeter uses the JayWay JSON Path library, so you should be looking at its documentation instead.
In general I would recommend using the JSR223 PostProcessor as an alternative to the JSON Path Extractors. The extractors are fine for basic scenarios, but when it comes to advanced queries and operators their behaviour is flaky.
Add JSR223 PostProcessor as a child of the request which returns above JSON
Make sure you have "groovy" selected in the "Language" drop down and "Cache compiled script if available" box is ticked
Put the following code into "Script" area
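// Collect every 'Values' array from the response
def values = com.jayway.jsonpath.JsonPath.parse(prev.getResponseDataAsString()).read('$..Values')
values.eachWithIndex { val, idx ->
    // Store the second-to-last element as yourVar_1, yourVar_2, ...
    vars.put('yourVar_' + (idx + 1), val.get(val.size() - 2))
}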
It should generate the following JMeter Variables:
yourVar_1=53265
yourVar_2=137382
yourVar_3=310824
which seems to be what you're looking for.
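The indexing logic of the script can be checked outside JMeter. Here is a minimal Python sketch of the same idea, using the response from the question; the function name second_from_last is hypothetical and the variable names simply mirror the ones above.

```python
def second_from_last(response, var_prefix="yourVar"):
    """Map yourVar_1, yourVar_2, ... to the second-to-last item of each Values list."""
    variables = {}
    for idx, row in enumerate(response["Rows"], start=1):
        # Index -2 is the second-to-last element, like val.size() - 2 in Groovy.
        variables[f"{var_prefix}_{idx}"] = row["Values"][-2]
    return variables


response = {
    "queryStartDate": "20170523134739822",
    "queryEndDate": "20170623134739822",
    "Rows": [
        {"hasScdHistoryOnly": False, "Values": ["1", "53265", "CO"]},
        {"hasScdHistoryOnly": False, "Values": ["1", "137382", "CO"]},
        {"hasScdHistoryOnly": False, "Values": ["1", "310824", "CO"]},
    ],
}

print(second_from_last(response))
```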
References:
Groovy: Parsing and producing JSON
Apache Groovy - Why and How You Should Use It
Using the JSON Path Tester in the View Results Tree, I could see that the expressions you used to extract the values do not work in JMeter (they are correct in the online JSONPath Online Evaluator, but not in JMeter).
Used Expression: $.Rows[*].Values[-2:-1]
Output from JSON Path Tester: No Match Found.
Used Expression: $.Rows[*].Values[(@.length-2)]
Output from JSON Path Tester: Exception: Could not parse token starting at position 16. Expected ?, ', 0-9, *
If the expression $.Rows[*].Values[1] is used, it extracts the desired values.
Used Expression: $.Rows[*].Values[1]
Output from JSon Path Tester:
Result[0]=53265
Result[1]=137382
Result[2]=310824