Inserting a JSON file into a Cassandra table

I am currently using the Cassandra-Ruby driver to insert data from a JSON file into an existing table in my database.
The JSON file looks like this:
[
{
"id": "123",
"destination": "234",
"type": "equipment",
"support": "type 1",
"test": "test1"
},
{
"id": "234",
"destination": "123",
"type": "equipment",
"support": "type 1",
"test": "test1"
}
]
I am reading in the file like this:
file = File.read('itemType.json')
data_hash = JSON.parse(file) #return an array of hashes
Then I iterate through the array and insert each hash into the table:
data_hash.each do |has|
#check the type of each object
#puts has.class #return hash
insert_statement = session.prepare('INSERT INTO keyspace.table JSON ?')
session.execute(insert_statement, [has]) #error occurs here
end
After running this code, I get this error message:
in `assert_instance_of': options must be a Hash
I checked that each object being inserted into the table is a hash, so I'm not sure why I'm getting this error.

You say you are inserting JSON, but you are not: you are trying to insert a Ruby object. See this example from the documentation:
INSERT INTO cycling.cyclist_category JSON '{
"category" : "Sprint",
"points" : 700,
"id" : "829aa84a-4bba-411f-a4fb-38167a987cda"
}';
If you insert it that way, you have to pass the row as a JSON-formatted string (for example by calling to_json on the hash), not as the hash itself. Note also that the Ruby driver expects bound values to be passed via the arguments: option of execute rather than as a bare array, which is what the "options must be a Hash" error is complaining about.
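For illustration only, here is the same idea sketched with the DataStax Python driver (the question uses the Ruby driver, but the principle is identical: bind a JSON string, not a hash/dict; my_keyspace and my_table are placeholder names):
import json
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect()

with open('itemType.json') as f:
    rows = json.load(f)                    # list of dicts, like data_hash in the question

insert_statement = session.prepare('INSERT INTO my_keyspace.my_table JSON ?')
for row in rows:
    session.execute(insert_statement, [json.dumps(row)])   # bind a JSON string, not a dict
In Ruby, the equivalent is binding has.to_json.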

Using .to_json added \ escape characters, which gave me an error:
INSERT INTO organization_metadata JSON '{\"id\":9150,\"destroyed\":false,\"name\":\"ABC\",\"timestamp\":1510541801000000000}';
while the following worked:
INSERT INTO organization_metadata JSON '{"id":9150,"destroyed":false,"name":"ABC","timestamp":1510541801000000000}';

Related

Talend- Need to extract data from JSON (JSON array) and load it to Oracle DB

I have a Talend job that receives JSON (format below) from a route. I need to extract data from the JSON and load it into an Oracle DB table.
JSON format:
{
"data": [
{
"name": "FRSC-01",
"recordnum": "01",
"Expense1": "100",
"Expense2": "7265",
"Expense3": "9000"
},
{
"name": "FRSC-02",
"recordnum": "",
"Expense1": "200",
"Expense2": "6000",
"Expense3": "9000"
},
{
"name": "FRSC-03",
"recordnum": "03",
"Expense1": "200",
"Expense2": "7000",
"Expense3": "8000"
}
]
}
You can use the tExtractJsonFields component to extract data from your JSON.
Define a schema with the columns you want from the JSON (name, recordNum, Expense1, Expense2, Expense3), set the loop JSONPath query to "$.data[*]", and then for each column set the JSONPath expression like so:
name => "name"
recordNum => "recordnum"
...
And then just use a tMap to map the columns to your target table in the tOracleOutput component.
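Outside of Talend, a small Python sketch (input file name hypothetical) just to show which records the loop query walks over and which fields the per-column expressions pick up:
import json

with open('input.json') as f:          # hypothetical file holding the payload shown above
    payload = json.load(f)

# "$.data[*]" loops over each object in the "data" array; the column
# expressions then read one field per column, relative to that object.
for row in payload['data']:
    print(row['name'], row['recordnum'], row['Expense1'], row['Expense2'], row['Expense3'])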

Need to relationalise a nested JSON string using Pyspark

I am new to PySpark and need guidance to perform the following task.
Sample data in the form of a JSON string has been given:
{
"id": "1234",
"location": "znd",
"contact": "{\"phone\": [{\"number\":\"12345\",\"code\":\"111\",\"altno\":\"No\"},{\"number\":\"55656\",\"code\":\"222\",\"altno\":\"Yes\"}]}"
}
This needs to be relationalised as follows; as seen below, one row of input gets translated into two rows.
{id: "1234", "location": "znd","number": "12345", "code": "111","altno":"No"}
{id: "1234", "location": "znd","number": "55656", "code": "222","altno":"No"}
I have tried to use the explode function, but as this is a JSON string, explode does not work on it.
I have read the data into a DataFrame and tried to enforce a struct type so I could use explode later, but that does not work either.
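A minimal sketch of that approach, assuming Spark 2.1+ (for from_json) and the field names from the sample: parse the contact string into a struct first, so explode has a real array to work on.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, explode, col
from pyspark.sql.types import StructType, StructField, ArrayType, StringType

spark = SparkSession.builder.getOrCreate()

# Build a one-row DataFrame from the sample record for demonstration
sample = [("1234", "znd",
           '{"phone": [{"number":"12345","code":"111","altno":"No"},'
           '{"number":"55656","code":"222","altno":"Yes"}]}')]
df = spark.createDataFrame(sample, ["id", "location", "contact"])

# Schema for the JSON string stored in the "contact" column
contact_schema = StructType([
    StructField("phone", ArrayType(StructType([
        StructField("number", StringType()),
        StructField("code", StringType()),
        StructField("altno", StringType()),
    ])))
])

result = (df
    .withColumn("contact", from_json(col("contact"), contact_schema))
    .withColumn("phone", explode(col("contact.phone")))
    .select("id", "location", "phone.number", "phone.code", "phone.altno"))

result.show()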

SOLR post json file Default fieldtype

I have a POSTAL_CODE field in my JSON file. If I try importing that data into Solr using solr/post, the field type is set to 'plongs', which is not suitable for data like "108-0023". Because of that, the data import throws an error. Is there any workaround for this kind of issue?
Edit:
Sample data which you might use to check it.
{
"id": "1",
"POSTAL_CODE": "1982"
},
{
"id": "2",
"POSTAL_CODE": "1947"
},
{
"id": "3",
"POSTAL_CODE": "19473"
},
{
"id": "4",
"POSTAL_CODE": "19471"
},
{
"id": "5",
"POSTAL_CODE": "1947-123"
}
In the above sample, I don't understand why 'id' is not being considered as 'plongs' or 'pints', but only 'POSTAL_CODE' has that issue. If the first element had POSTAL_CODE as, say, "1947-145", then the field type would be taken as 'text_general'. Generally, if the value is in double quotes (i.e., "Data": "123"), shouldn't it be considered a string value?
Remove the collection, create it anew, and before you index anything, define a field POSTAL_CODE in your schema with type string. Solr will then index any incoming data for this field without guessing, and instead use the string type, which means the value is indexed as-is.
Copied and adapted from https://lucene.apache.org/solr/guide/7_0/schema-api.html, but untested:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
"name":"POSTAL_CODE",
"type":"string",
"stored":true }
}' http://localhost:8983/solr/yourcollectionhere/schema
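Untested, but you can verify the field definition afterwards through the same Schema API; sketched here with Python requests (collection name as in the curl example above):
import requests

# Should show POSTAL_CODE with "type": "string" once the add-field call has succeeded
resp = requests.get("http://localhost:8983/solr/yourcollectionhere/schema/fields/POSTAL_CODE")
print(resp.json())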
I tried to import the data by creating a raw JSON document with the field POSTAL_CODE. Below is my JSON; my Solr version is 7.2.1:
{"array": [1,2,3],"boolean": true,"color": "#82b92c","null": null,"number": 123,"POSTAL_CODE": "108-0023"}
It is indexed as a text field in Solr. The command I triggered to index the data is as below:
bin/post -c gettingstarted test.json
Could you please provide the sample data and the version of Solr on which you are facing this issue?

How to edit a json dictionary in Robot Framework

I am currently implementing some test automation that uses a JSON POST to a REST API to initialize the test data in the SUT. Most of the fields I have no issue editing, using information I found in another thread: Json handling in ROBOT.
However, one of the sets of information I am editing is a dictionary of metadata.
{
"title": "Test Auotmation Post 2018-03-06T16:12:02Z",
"content": "dummy text",
"excerpt": "Post made by automation for testing purposes.",
"name": "QA User",
"status": "publish",
"date": "2018-03-06T16:12:02Z",
"primary_section": "Entertainment",
"taxonomy": {
"section": [
"Entertainment"
]
},
"coauthors": [
{
"name": "QA User - CoAuthor",
"meta": {
"Title": "QA Engineer",
"Organization": "That One Place"
}
}
],
"post_meta": [
{
"key": "credit",
"value": "QA Engineer"
},
{
"key": "pub_date",
"value": "2018-03-06T16:12:02Z"
},
{
"key": "last_update",
"value": "2018-03-06T16:12:02Z"
},
{
"key": "source",
"value": "wordpress"
}
]
}
Is it possible to use the Set to Dictionary Keyword on a dictionary inside a dictionary? I would like to be able to edit the value of the pub_date and last_update inside of post_meta, specifically.
The most straightforward way would be to use the Evaluate keyword, and set the sub-dict value in it. Presuming you are working with a dictionary that's called ${value}:
Evaluate    $value['post_meta'][1].update({'value': 'your new value here'})
I won't get into how to find the index of the post_meta list that has the 'key' with value 'pub_date', as that's not part of your question.
Is it possible to use the Set to Dictionary Keyword on a dictionary inside a dictionary?
Yes, it's possible.
However, because post_meta is a list rather than a dictionary, you will have to write some code to iterate over all of the values of post_meta until you find one with the key you want to update.
You could do this in python quite simply. You could also write a keyword in robot to do that for you. Here's an example:
*** Keywords ***
Set list element by key
    [Arguments]    ${data}    ${target_key}    ${new_value}
    :FOR    ${item}    IN    @{data}
    \    run keyword if    '''${item['key']}''' == '''${target_key}'''
    \    ...    set to dictionary    ${item}    value=${new_value}
    [Return]    ${data}
Assuming you have a variable named ${data} that contains the original JSON data as a string, you could call this keyword like the following:
${JSON}=    evaluate    json.loads('''${data}''')    json
set list element by key    ${JSON['post_meta']}    pub_date    yesterday
set list element by key    ${JSON['post_meta']}    last_update    today
You will then have a python object in ${JSON} with the modified values.
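Since the answer notes this could also be done in Python quite simply, here is the equivalent logic as a custom keyword in a (hypothetical) Python library file, e.g. JsonHelpers.py, which you could import with the Library setting and call just like the Robot keyword above:
def set_list_element_by_key(data, target_key, new_value):
    """Set the 'value' entry of every dict in data whose 'key' equals target_key."""
    for item in data:
        if item.get('key') == target_key:
            item['value'] = new_value
    return data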

Map Reduce to parse JSON data in hadoop 2.2

Hello, I have JSON in the following format. I need to parse this in the map function to get the gender information of all the records.
[
{
"SeasonTicket" : false,
"name" : "Vinson Foreman",
"gender" : "male",
"age" : 50,
"email" : "vinsonforeman#cyclonica.com",
"annualSalary" : "$98,501.00",
"id" : 0
},
{
"SeasonTicket": true,
"name": "Genevieve Compton",
"gender": "female",
"age": 28,
"email": "genevievecompton#cyclonica.com",
"annualSalary": "$46,881.00",
"id": 1
},
{
"SeasonTicket": false,
"name": "Christian Crawford",
"gender": "male",
"age": 53,
"email": "christiancrawford#cyclonica.com",
"annualSalary": "$53,488.00",
"id": 2
}
]
I have tried using a JSON parser but am not able to get through the JSON structure. I have been advised to use JAQL and Pig but cannot do so.
Any help would be appreciated.
What I understand is that you have a huge file containing an array of JSON objects, and you need to read it in a mapper and emit, say, <id : gender>. The challenge is that each JSON object spans multiple lines.
If that is the case, I would suggest changing the default record delimiter from "\n" to "}".
That way, each piece of the JSON arrives in the map method as the value. You can discard the key (the byte offset) and do a slight refactor on the value: remove the unwanted [ ] or , characters, append the "}" that was consumed as the delimiter, and then parse the remaining string.
This works because there is no nesting within the JSON and } is a valid record-end delimiter for the given example.
To change the default delimiter, just set the property textinputformat.record.delimiter to "}".
Please check out this example.
Also check this jira.
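For illustration only, the cleanup described above sketched in Python (a real job would do the same inside the Java map method; field names follow the sample data):
import json

def record_to_kv(chunk):
    # chunk is one value produced by splitting the input on "}"
    fragment = chunk.strip().lstrip('[,').strip()
    if fragment in ('', ']'):
        return None                       # the trailing ']' after the last record carries no data
    record = json.loads(fragment + '}')   # restore the '}' consumed as the delimiter
    return record['id'], record['gender']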