How to read non-delimited JSON using Pig? - json

I have a json file where raw text looks like this:
{a:1,b:2,c:3}{a:3,b:3,c:5}{a:3,b:3,c:9}
Doing
raw = LOAD 'jsonfile.text' USING JsonLoader('a:chararray,b:chararray,c:chararray');
dump raw;
only returns 1 record.
Actual excerpt from log:
Input(s): Successfully read 1 records (630644858 bytes) from:
"s3n://logstash/ls.s3.ip-10-45-56-56.2016-03-02T23.10.part42.txt"
Output(s): Successfully stored 1 records (1900 bytes) in:
"hdfs://nameservice1/tmp/temp-1489272670/tmp-1959659634"
It looks like only the first record of the JSON is being read. The JSON file is not delimited.
Anyone have any tips?

I would suggest doing a first pass that replaces }{ with }\n{. Then you will have one valid JSON object per line, and the JSON parsing should work.
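A minimal sketch of that pre-pass in Python (the file names are just placeholders, not from the question), streaming in chunks so the ~600 MB input doesn't have to fit in memory:

# Pre-pass: insert a newline between back-to-back JSON objects ("}{" -> "}\n{").
# Note: this naive replace would also split a literal "}{" inside string values,
# which is usually acceptable for log-style records like these.
CHUNK = 64 * 1024
with open('jsonfile.text') as src, open('jsonfile_lines.text', 'w') as dst:
    carry = ''
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            dst.write(carry)
            break
        buf = (carry + chunk).replace('}{', '}\n{')
        # Hold back the last character in case a "}{" pair straddles two chunks.
        carry, buf = buf[-1], buf[:-1]
        dst.write(buf)

After that, LOAD the rewritten file with the same JsonLoader schema.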

Check out Twitter's Elephant Bird jar; it can be used to work with virtually any kind of JSON data.
Check this for reference - a sample Pig script working on JSON data similar to yours:
https://gist.github.com/neilkod/2898455
Hope this helps!! <><

Related

wso2 convert json/xml to csv and write to a csv file

I'm trying to create tab-delimited CSV data from JSON/XML data. While I can do this using the Payload Factory mediator in an Iterate loop, the data gets appended to the same line in the file on every iteration, creating one long line of data. I want it to be appended to the next line, but I've been unable to find a way. Any suggestions? Thanks.
(I do not want a solution which uses a csv connector or module)
Edit: I solved it; you just need to use an XSLT and a newline line-break character.

Error when importing GeoJson into BigQuery

I'm trying to load GeoJson data [1] into BigQuery via Cloud Shell but I'm getting the following error:
Failed to parse JSON: Top-level GeoJson 'type' member should have value 'Feature', but was 'FeatureCollection'.; ParsedString returned false; Could not parse value; Parser terminated before end of string
It feels like the GeoJson file is not formatted properly for BQ but I have no idea if that's true or how to fix it.
[1] https://github.com/tonywr71/GeoJson-Data/blob/master/australian-suburbs.geojson
Expounding on @scespinoza's answer, I was able to convert to newline-delimited GeoJSON and load it into BigQuery with the following steps:
geojson2ndjson geodata.txt > geodata_converted.txt
Using this command, I encountered an error, but was able to create a workaround by splitting the data into 2 tables and applying the same command to each.
The table then loaded in BigQuery.
Your file is in standard GeoJSON format, but BigQuery only accepts newline-delimited GeoJSON, i.e. individual GeoJSON objects, one per line (see the documentation: https://cloud.google.com/bigquery/docs/geospatial-data#geojson-files). So you should first convert the dataset to the appropriate format. Here is a good and simple explanation of how it works: https://stevage.github.io/ndgeojson/.
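If installing the geojson2ndjson helper isn't an option, a minimal Python sketch of the same conversion (the file names are placeholders) would be:

import json

# Convert a standard GeoJSON FeatureCollection into newline-delimited GeoJSON:
# one Feature object per line, which is what BigQuery expects.
with open('geodata.txt') as src:
    collection = json.load(src)

with open('geodata_converted.txt', 'w') as dst:
    for feature in collection['features']:
        dst.write(json.dumps(feature) + '\n')

The converted file can then be loaded into BigQuery.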

JMeter - Save complete JSON response of all the request to CSV file for test data preparation

I need to create a test data preparation script and capture the JSON response data to a CSV file.
In the actual test, I need to read parameters from the CSV file.
Is there any possibility of saving the entire JSON data as a field in the CSV file, or do I need to extract each field and save it to the CSV file?
The main issue is that JSON contains commas. You can work around it by saving the JSON to a file and using a different delimiter instead of a comma, for example #.
Then read the file using CSV Data Set Config with # as the Delimiter:
Delimiter to be used to split the records in the file. If there are fewer values on the line than there are variables the remaining variables are not updated - so they will retain their previous value (if any).
You can also save the JSON in every row and then read the data using a different delimiter such as #.
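To illustrate the idea outside of JMeter, here is a small Python sketch (with made-up data) that writes one JSON response per row using # as the delimiter, so the commas inside the JSON don't break the row structure:

import json

# Made-up responses standing in for the captured JSON.
responses = [{'a': 1, 'b': 2, 'c': 3}, {'a': 3, 'b': 3, 'c': 5}]
with open('testdata.csv', 'w') as f:
    for i, resp in enumerate(responses, start=1):
        # One row per response: <id>#<entire JSON payload>
        f.write(f'{i}#{json.dumps(resp)}\n')

This only works as long as the JSON itself never contains the # character.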
You can save entire JSON response into a JMeter Variable by adding a Regular Expression Extractor as a child of the HTTP Request sampler which returns JSON and configuring it like:
Name of created variables: anything meaningful, i.e. response
Regular Expression: (?s)(^.*)
Template: $1$
Then you need to declare this response as a Sample Variable by adding the next line to user.properties file:
sample_variables=response
And finally you can use the Flexible File Writer plugin to store the response variable into a file; if you don't have any other Sample Variables, you should use variable#0.

how to save json data into csv using python

I have JSON data like this [1].
I want to save the JSON into CSV.
The output will look like this: each title will hold the information in that title.
I hope this gets converted to a comment, but look at Pandas; it can probably do what you want (Pandas json to csv).
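A minimal sketch of that Pandas approach, assuming the JSON is a list of records (the field names here are made up, since the sample data was posted as an image):

import pandas as pd

# Hypothetical records standing in for the JSON from the question.
records = [
    {'title': 'first', 'info': {'x': 1, 'y': 2}},
    {'title': 'second', 'info': {'x': 3, 'y': 4}},
]

# json_normalize flattens nested objects into columns such as info.x, info.y.
df = pd.json_normalize(records)
df.to_csv('output.csv', index=False)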

How to load OSM (GeoJSON) data to ArangoDB?

How can I load OSM data into ArangoDB?
I downloaded a data set named luxembourg-latest.osm.pbf from OSM, then converted it to JSON with osmtogeojson. After that I tried to load the resulting GeoJSON into ArangoDB with the following command: arangoimp --file out.json --collection lux1 --server.database geodb, and got a huge list of errors:
...
2017-03-17T12:44:28Z [7712] WARNING at position 719386: invalid JSON type (expecting object, probably parse error), offending context: ],
2017-03-17T12:44:28Z [7712] WARNING at position 719387: invalid JSON type (expecting object, probably parse error), offending context: [
2017-03-17T12:44:28Z [7712] WARNING at position 719388: invalid JSON type (expecting object, probably parse error), offending context: 5.867441,
...
What I am doing wrong?
Update: it seems the converter should be run with the --ndjson option (osmtogeojson --ndjson), which produces the items not as a single JSON document but in line-by-line mode.
As @dmitry-bubnenkov already found out, --ndjson is required to produce the right input for arangoimp.
One has to know here that arangoimp expects a JSON subset dubbed JSONL, since it doesn't parse the JSON on its own.
Thus, each line of the JSON file is expected to become one JSON document in the collection after the import. To maximize performance and simplify the implementation, the JSON is not completely parsed before it is sent to the server.
The tool tries to chop the JSON into chunks of the maximum request size that the server permits, and it relies on the JSONL line endings to isolate possible chunks.
However, the server expects valid JSON. Sending a chopped part with possibly incomplete JSON documents to the server leads to parse errors, which are the error messages you saw in your output.
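A rough sketch of that chunking idea in Python (an illustration only, not ArangoDB's actual implementation): split an NDJSON file into batches at line boundaries, so every batch sent to the server contains only complete JSON documents.

def chunk_ndjson(path, max_bytes=1_000_000):
    # Yield batches of complete NDJSON lines, each batch at most max_bytes
    # (max_bytes stands in for the server's maximum request size).
    # Splitting only at line endings guarantees no document is cut in half.
    batch, size = [], 0
    with open(path) as f:
        for line in f:
            if batch and size + len(line) > max_bytes:
                yield ''.join(batch)
                batch, size = [], 0
            batch.append(line)
            size += len(line)
    if batch:
        yield ''.join(batch)

With the --ndjson output each line is one document, so every batch imports cleanly; with a single pretty-printed JSON document the cut points fall mid-object, which is exactly the kind of parse error shown above.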