Spark Dataframe to JSON error - json

I have tried the df.write.mode("append").json(targetPath)method for writing multiple rows into a JSON file. Unfortunately, the output always seem to be missing commas between objects. For example, I am getting the following output
{"time_stamp":"2016-12-08 01:45:00","Temperature":0.8,"Energy":111111.5,"Net_Energy":1111.3}
{"time_stamp":"2016-12-08 02:00:00","Temperature":21.9,"Energy":222222.5,"Net_Energy":222.0}
While I was expecting
{"time_stamp":"2016-12-08 01:45:00","Temperature":0.8,"Energy":111111.5,"Net_Energy":1111.3},
{"time_stamp":"2016-12-08 02:00:00","Temperature":21.9,"Energy":222222.5,"Net_Energy":222.0}
As you can see that there is no comma between two objects in the output.
Am I missing anything?
Also, is there a method to save each row in a separate JSON file without going to through the expensive filter() method?
Any help would be highly appreciated.

Related

Extract JSON data with Data Flow - Flatten Transformation

Following a previous question I started working on a Data Flow, with the purpose of flattening a JSON file, created as a result of an Application Insights REST query. You can find an anonymised version here.
My goal is to extract the data in the "rows" array of arrays, but I end up with the data duplicated in a cartezian manner (I got an original number of 18 rows and I end up with 324, 18*18).
I cannot understand what I am doing wrong or if it is an issue with the JSON "rows" array of arrays.
Here is my Data Flow - Source has the "Document per line" JSON option, "Single documents" raises a [unexpected character "] error, probably due to the strange formatting in the JSON:
This is the Data Preview in the Source - as you can see, it is only one "tables" node, with 18 elements in the "rows" array:
rows:
I tried to Flatten it, but I cannot map "rows" data to a column, I cannot use something like table.rows[0]:
Also, the rows data gets duplicated - 18 rows for each of the 18 rows outputted:
I am not sure how to get to the bottom of this, if it's the JSON format or if I am doing something wrong. From my experience it's probably the latter.
I think this is caused by your special format.
Please try this:
add a rowdata property
flatten your data
#Steve Zhao thank you! But that solution duplicated the data similar to the original situation:
I did not manage to treat these data as JSON so I ended up thinking about it as text that can be manipulated into an array.
So I split the text by "rows:" and retrieve the second part of the split (arrays in Expression Builder start from 1):
Then I split that text as an array:
Which then I can flatten (at last):
From here on I keep splitting these values to get the data I need - I was interested in the first two and fourth column.

Cannot identify proper format for a json request body stored and used in csv file for use in a karate scenario

Am having trouble identifying the propert format to store a json request body in csv format, then use the csv file value in a scenario.
This works properly within a scenario:
And request '{"contextURN":"urn:com.myco.here:env:booking:reservation:0987654321","individuals":[{"individualURN":"urn:com.myco.here:env:booking:reservation:0987654321:individual:12345678","name":{"firstName":"NUNYA","lastName":"BIDNESS"},"dateOfBirth":"1980-03-01","address":{"streetAddressLine1":"1 Myplace","streetAddressLine2":"","city":"LANDBRANCH","countrySubdivisionCode":"WV","postalCode":"25506","countryCode":"USA"},"objectType":"INDIVIDUAL"},{"individualURN":"urn:com.myco.here:env:booking:reservation:0987654321:individual:23456789","name":{"firstName":"NUNYA","lastName":"BIZNESS"},"dateOfBirth":"1985-03-01","address":{"streetAddressLine1":"1 Myplace","streetAddressLine2":"","city":"BRANCHLAND","countrySubdivisionCode":"WV","postalCode":"25506","countryCode":"USA"},"objectType":"INDIVIDUAL"}]}'
However, when stored in csv file as follows (I've tried quite a number other formatting variations)
'{"contextURN":"urn:com.myco.here:env:booking:reservation:0987654321","individuals":[{"individualURN":"urn:com.myco.here:env:booking:reservation:0987654321:individual:12345678","name":{"firstName":"NUNYA","lastName":"BIDNESS"},"dateOfBirth":"1980-03-01","address":{"streetAddressLine1":"1 Myplace","streetAddressLine2":"","city":"LANDBRANCH","countrySubdivisionCode":"WV","postalCode":"25506","countryCode":"USA"},"objectType":"INDIVIDUAL"},{"individualURN":"urn:com.myco.here:env:booking:reservation:0987654321:individual:23456789","name":{"firstName":"NUNYA","lastName":"BIZNESS"},"dateOfBirth":"1985-03-01","address":{"streetAddressLine1":"1 Myplace","streetAddressLine2":"","city":"BRANCHLAND","countrySubdivisionCode":"WV","postalCode":"25506","countryCode":"USA"},"objectType":"INDIVIDUAL"}]}',
and used in scenario as:
And request requestBody
my test returns an "javascript evaluation failed: " & the json above & :1:63 Missing close quote ^ in at line number 1 at column number 63
Can you please identify correct formatting or the usage errors I am missing? Thanks
We just use a basic CSV library behind the scenes. I suggest you roll your own Java helper class that does whatever processing / pre-processing you need.
Do read this answer as well: https://stackoverflow.com/a/54593057/143475
I can't make sense of your JSON but if you are trying to fit JSON into CSV, sorry - that's not a good idea. See this answer: https://stackoverflow.com/a/62449166/143475

Way to extract columns from a CSV and place them into a dictionary

So basically I'm at a wall with an assignment and it's beginning to really frustrate me. Essentially I have a CSV file and my goal is to count how an the amount of times a string is called. So like column 1 would have a string and column 2 would have a integer connected to it. I ultimately need this to be formatted into a dictionary. Where I am stuck is how the heck do I do this without using imported libraries. I am only allowed to iterate through the file using for loops. Would my best bet be indexing each line and creating that into a string and count how many times that string is called? Any insight would be appreciated.
If you don't want to you any library (and assuming you are using python) you can use a dict comprehension, like this:
with open("data.csv") as file:
csv_as_dict = {line[0]: line[1] for line in file.readlines()}
Note: The question is possibly a duplicate of Creating a dictionary from a csv file?.

Python: Dump JSON Data Following Custom Format

I'm working on some Python code for my local billiard hall and I'm running into problems with JSON encoding. When I dump my data into a file I obviously get all the data in a single line. However, I want my data to be dumped into the file following the format that I want. For example (Had to do picture to get point across),
My custom JSON format
. I've looked up questions on custom JSONEncoders but it seems they all have to do with datatypes that aren't JSON serializable. I never found a solution for my specific need which is having everything laid out in the manner that I want. Basically, I want all of the list elements to on a separate row but all of the dict items to be in the same row. Do I need to write my own custom encoder or is there some other approach I need to take? Thanks!

find the syntax error in JSON

I am trying to find the syntax error in the JSON data since an hour!
Unable to attach the full code, hence uploading in GoogleDocs
Json.data
JSON is supposed to represent a single JavaScript Object, yet what you linked to includes two objects : the first object ends at line 20 and the parser would expect to reach the end of the file here, yet it finds a comma and some other data.
You might want to include these multiple objects in an array or another wrapper object.
Additionnaly, you have an extra comma after your "VPC" object, which might upset some parsers.