Here I have a JSON input which I want to import into Cassandra, so I am using json2sstable as below:
./json2sstable -K yelp -c business /home/srinath/Desktop/test.json /home/srinath/Desktop/CD/Cassandra/cassandra/data/yelp/business/Standard1-e-1-Data.db
Output:
ERROR 15:03:02,594 Unable to initialize MemoryMeter (jamm not specified as javaagent). This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
org.codehaus.jackson.map.JsonMappingException: Can not deserialize instance of java.lang.Object[] out of START_OBJECT token
at [Source: /home/srinath/Desktop/test.json; line: 1, column: 1]
at org.codehaus.jackson.map.JsonMappingException.from(JsonMappingException.java:163)
at org.codehaus.jackson.map.deser.StdDeserializationContext.mappingException(StdDeserializationContext.java:219)
at org.codehaus.jackson.map.deser.StdDeserializationContext.mappingException(StdDeserializationContext.java:212)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.handleNonArray(ObjectArrayDeserializer.java:177)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:88)
at org.codehaus.jackson.map.deser.std.ObjectArrayDeserializer.deserialize(ObjectArrayDeserializer.java:18)
at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2695)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1294)
at org.codehaus.jackson.JsonParser.readValueAs(JsonParser.java:1368)
at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:351)
at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:335)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:559)
ERROR: Can not deserialize instance of java.lang.Object[] out of START_OBJECT token
at [Source: /home/srinath/Desktop/test.json; line: 1, column: 1]
================================================================================================================================================
Sample JSON:
{
"business_id": "qarobAbxGSHI7ygf1f7a_Q",
"full_address": "891 E Baseline Rd\nSuite 102\nGilbert, AZ 85233",
"open": true,
"categories": [
"Sandwiches",
"Restaurants"
],
"city": "Gilbert",
"review_count": 10,
"name": "Jersey Mike's Subs",
"neighborhoods": [],
"longitude": -111.8120071,
"state": "AZ",
"stars": 3.5,
"latitude": 33.3788385,
"type": "business"
}
cid | key | ts
-----+------+-----
101 | ramu | 999
[{
"columns":[["cid",101],["key","ramu"],["ts",687]]
}]
The JSON format above is based on the table above.
In the same way, you can prepare your JSON based on your own table's format and columns.
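As a rough illustration, reshaping a plain record into the row format shown above can be sketched in Python (the column names and values here are just the ones from the example table; adjust them to your own schema):

```python
def to_sstable_rows(records, columns):
    """Wrap plain dict records into the [{"columns": [[name, value], ...]}]
    shape shown in the example above."""
    return [
        {"columns": [[col, rec[col]] for col in columns]}
        for rec in records
    ]

rows = to_sstable_rows(
    [{"cid": 101, "key": "ramu", "ts": 687}],
    ["cid", "key", "ts"],
)
print(rows)
# [{'columns': [['cid', 101], ['key', 'ramu'], ['ts', 687]]}]
```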
How do I convert the following JSON into the relational rows that follow it? The part that I am stuck on is the fact that the pyspark explode() function throws an exception due to a type mismatch. I have not found a way to coerce the data into a suitable format so that I can create rows out of each object within the source key within the sample_json object.
JSON INPUT
sample_json = """
{
"dc_id": "dc-101",
"source": {
"sensor-igauge": {
"id": 10,
"ip": "68.28.91.22",
"description": "Sensor attached to the container ceilings",
"temp":35,
"c02_level": 1475,
"geo": {"lat":38.00, "long":97.00}
},
"sensor-ipad": {
"id": 13,
"ip": "67.185.72.1",
"description": "Sensor ipad attached to carbon cylinders",
"temp": 34,
"c02_level": 1370,
"geo": {"lat":47.41, "long":-122.00}
},
"sensor-inest": {
"id": 8,
"ip": "208.109.163.218",
"description": "Sensor attached to the factory ceilings",
"temp": 40,
"c02_level": 1346,
"geo": {"lat":33.61, "long":-111.89}
},
"sensor-istick": {
"id": 5,
"ip": "204.116.105.67",
"description": "Sensor embedded in exhaust pipes in the ceilings",
"temp": 40,
"c02_level": 1574,
"geo": {"lat":35.93, "long":-85.46}
}
}
}"""
DESIRED OUTPUT
dc_id source_name id description
-------------------------------------------------------------------------------
dc-101 sensor-gauge 10 Sensor attached to the container ceilings
dc-101 sensor-ipad 13 Sensor ipad attached to carbon cylinders
dc-101 sensor-inest 8 Sensor attached to the factory ceilings
dc-101 sensor-istick 5 Sensor embedded in exhaust pipes in the ceilings
PYSPARK CODE
from pyspark.sql.functions import *
df_sample_data = spark.read.json(sc.parallelize([sample_json]))
df_expanded = df_sample_data.withColumn("one_source",explode_outer(col("source")))
display(df_expanded)
ERROR
AnalysisException: cannot resolve 'explode(source)' due to data type
mismatch: input to function explode should be array or map type, not
struct....
I put together this Databricks notebook to further demonstrate the challenge and clearly show the error. I will be able to use this notebook to test any recommendations provided herein.
You can't use explode on a struct, but you can get the column names inside the struct source (with df.select("source.*").columns), build an array of the fields you want from each nested struct using a list comprehension, and then explode that array to get the desired result:
from pyspark.sql import functions as F

df1 = df.select(
    "dc_id",
    F.explode(
        F.array(*[
            F.struct(
                F.lit(s).alias("source_name"),
                F.col(f"source.{s}.id").alias("id"),
                F.col(f"source.{s}.description").alias("description")
            )
            for s in df.select("source.*").columns
        ])
    ).alias("sources")
).select("dc_id", "sources.*")

df1.show(truncate=False)
#+------+-------------+---+------------------------------------------------+
#|dc_id |source_name |id |description |
#+------+-------------+---+------------------------------------------------+
#|dc-101|sensor-igauge|10 |Sensor attached to the container ceilings |
#|dc-101|sensor-inest |8 |Sensor attached to the factory ceilings |
#|dc-101|sensor-ipad |13 |Sensor ipad attached to carbon cylinders |
#|dc-101|sensor-istick|5 |Sensor embedded in exhaust pipes in the ceilings|
#+------+-------------+---+------------------------------------------------+
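For reference, the same flattening can be sketched in plain Python outside Spark (using a trimmed copy of the sample_json above; the Spark answer is the one to use on a real DataFrame):

```python
import json

# Trimmed copy of the sample_json above, keeping only the fields
# needed for the desired output.
sample_json = """
{
  "dc_id": "dc-101",
  "source": {
    "sensor-igauge": {"id": 10, "description": "Sensor attached to the container ceilings"},
    "sensor-ipad":   {"id": 13, "description": "Sensor ipad attached to carbon cylinders"}
  }
}"""

data = json.loads(sample_json)

# One row per nested struct: carry dc_id down, lift the key up as source_name.
rows = [
    {"dc_id": data["dc_id"], "source_name": name,
     "id": sensor["id"], "description": sensor["description"]}
    for name, sensor in data["source"].items()
]
for row in rows:
    print(row)
```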
Say I have a JSON entry as follows (the JSON file was generated by fetching data from a Firebase DB):
[{"goal_savings": 0.0, "social_id": "", "score": 0, "country": "BR", "photo": "http://graph.facebook", "id": "", "plates": 3, "rcu": null, "name": "", "email": ".", "provider": "facebook", "phone": "", "savings": [], "privacyPolicyAccepted": true, "currentRole": "RoleType.PERSONAL", "empty_lives_date": null, "userId": "", "authentication_token": "-------", "onboard_status": "ONBOARDING_WIZARD", "fcmToken": ----------", "level": 1, "dni": "", "social_token": "", "lives": 10, "bills": [{"date": "2020-12-10", "role": "RoleType.PERSONAL", "name": "Supermercado", "category": "feeding", "periodicity": "PeriodicityType.NONE", "value": 100.0"}], "payments": [], "goals": [], "goalTransactions": [], "incomes": [], "achievements": [{"created_at":", "name": ""}]}]
How do I extract the content corresponding to 'value', which is present inside the 'bills' column? Is there any way to do this?
My Python code is as follows. With this I was only able to get the data within the bills column, but I need only the entry corresponding to 'value' inside bills.
import json

filedata = open('firebase-dataset.json', 'r')
data = json.load(filedata)

listoffields = []  # to collect the 'bills' field of each record
for dic in data:
    try:
        listoffields.append(dic['bills'])  # only non-essential bill categories.
    except KeyError:
        pass
print(listoffields)
The JSON you posted contains misplaced quotes.
I think you are trying to extract the value of the 'value' field within bills.
Try this:
print(listoffields[0][0]['value'])
which will print 100.0 as a str; use float() to use it in calculations.
---edit---
Say the JSON you have contains many JSON objects separated by commas, as in:
[{ first-entry },{ second-entry },{ third.. }, ....and so on]
..and you want to find the value of each bill in each JSON object.
The code below may work:
bill_value_list = []  # to store the 'value' of each bill
for bill_list in listoffields:
    bill_value_list.append(float(bill_list[0]['value']))  # bill_list[0] contains the complete bill dictionary
print(bill_value_list)
print(sum(bill_value_list))  # do something useful
Paste it after the code you posted (no changes to your code, since it always works :-)).
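A more defensive version can be sketched as follows (the sample record here is a hypothetical, cleaned-up stand-in, since the JSON as posted is not valid):

```python
import json

# Hypothetical, cleaned-up stand-in for the posted data
# (the original is not valid JSON).
raw = '''
[{"bills": [{"date": "2020-12-10", "name": "Supermercado", "value": 100.0}]},
 {"bills": []}]
'''

data = json.loads(raw)

# Collect every 'value' across all records, skipping records
# whose bills list is empty or whose bills lack a 'value' key.
bill_values = [
    float(bill["value"])
    for record in data
    for bill in record.get("bills", [])
    if "value" in bill
]
print(bill_values)       # [100.0]
print(sum(bill_values))  # 100.0
```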
I've been struggling with this one for the whole day: a JSON document that I want to turn into a CSV.
It represents the officers attached to company whose number is "OC418979" in the UK Company House API.
I've already truncated the json to contain just 2 objects inside "items".
What I would like to get is a csv like this
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
OC418979, country_of_residence, officer_role, appointed_on
...
There are 2 extra complications: there are 2 types of "officers", some are people and some are companies, so not all keys present in one type are present in the other and vice versa. I'd like the missing entries to be 'null'. The second complication is nested objects like "name", which contains a comma, or "address", which contains several sub-objects (which I guess I could flatten in pandas, though).
{
"total_results": 13,
"resigned_count": 9,
"links": {
"self": "/company/OC418979/officers"
},
"items_per_page": 35,
"etag": "bc7955679916b089445c9dfb4bc597aa0daaf17d",
"kind": "officer-list",
"active_count": 4,
"inactive_count": 0,
"start_index": 0,
"items": [
{
"officer_role": "llp-designated-member",
"name": "BARRICK, David James",
"date_of_birth": {
"year": 1984,
"month": 1
},
"appointed_on": "2017-09-15",
"country_of_residence": "England",
"address": {
"country": "United Kingdom",
"address_line_1": "Old Gloucester Street",
"locality": "London",
"premises": "27",
"postal_code": "WC1N 3AX"
},
"links": {
"officer": {
"appointments": "/officers/d_PT9xVxze6rpzYwkN_6b7og9-k/appointments"
}
}
},
{
"links": {
"officer": {
"appointments": "/officers/M2Ndc7ZjpyrjzCXdFZyFsykJn-U/appointments"
}
},
"address": {
"locality": "Tadcaster",
"country": "United Kingdom",
"address_line_1": "Westgate",
"postal_code": "LS24 9AB",
"premises": "5a"
},
"identification": {
"legal_authority": "UK",
"identification_type": "non-eea",
"legal_form": "UK"
},
"name": "PREMIER DRIVER LIMITED",
"officer_role": "corporate-llp-designated-member",
"appointed_on": "2017-09-15"
}
]
}
What I've been doing is creating new JSON objects, extracting the fields I needed, like this:
{officer_address:.items[]?.address, appointed_on:.items[]?.appointed_on, country_of_residence:.items[]?.country_of_residence, officer_role:.items[]?.officer_role, officer_dob:items.date_of_birth, officer_nationality:.items[]?.nationality, officer_occupation:.items[]?.occupation}
But the query runs for hours - and I am sure there is a quicker way.
Right now I am trying this new approach - creating a json whose root is the company number and as argument a list of its officers.
{(.links.self | split("/")[2]): .items[]}
Using jq, it's easiest to extract the shared values from the top-level object first, then generate the desired rows. You'll want to limit the number of times you go through the items to at most once.
$ jq -r '(.links.self | split("/")[2]) as $companyCode
  | .items[]
  | [ $companyCode, .country_of_residence, .officer_role, .appointed_on ]
  | @csv
  ' input.json
OK, you want to scan the list of officers, extract some fields from each one if they are present, and write that out in CSV format.
The first part is extracting the data from the JSON. Assuming you loaded it into a Python object named data, you have:
print(data['items'][0]['officer_role'], data['items'][0]['appointed_on'],
data['items'][0]['country_of_residence'])
gives:
llp-designated-member 2017-09-15 England
Time to put everything together with the csv module:
import csv
...
with open('output.csv', 'w', newline='') as fd:
    wr = csv.writer(fd)
    for officer in data['items']:
        _ = wr.writerow(('OC418979',
                         officer.get('country_of_residence', ''),
                         officer.get('officer_role', ''),
                         officer.get('appointed_on', '')
                         ))
The get method on a dictionary lets you supply a default value (here the empty string) if the key is not present, and the csv module ensures that if a field contains a comma, it will be enclosed in quotation marks.
With your example input, it gives:
OC418979,England,llp-designated-member,2017-09-15
OC418979,,corporate-llp-designated-member,2017-09-15
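A self-contained version of the same idea can be sketched with io.StringIO instead of a file on disk (the two-record items list here is a trimmed stand-in for the API response above):

```python
import csv
import io

# Trimmed stand-in for the Companies House response shown above.
data = {
    "items": [
        {"officer_role": "llp-designated-member",
         "appointed_on": "2017-09-15",
         "country_of_residence": "England"},
        {"officer_role": "corporate-llp-designated-member",
         "appointed_on": "2017-09-15"},  # corporate officer: no country_of_residence
    ]
}

buf = io.StringIO()
wr = csv.writer(buf, lineterminator="\n")
for officer in data["items"]:
    # .get() falls back to "" when a key is missing (e.g. for corporate officers).
    wr.writerow(("OC418979",
                 officer.get("country_of_residence", ""),
                 officer.get("officer_role", ""),
                 officer.get("appointed_on", "")))

print(buf.getvalue(), end="")
# OC418979,England,llp-designated-member,2017-09-15
# OC418979,,corporate-llp-designated-member,2017-09-15
```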
{
"gameId": 32,
"participantIdentities": [
{
"player": {
"id": "123",
"name": "xxx",
},
"participantId": 1
},
{
"player": {
"id": "123",
"name": "yyyy",
},
"participantId": 2
}
]
"gameDuration": 143,
}
I am trying to print the names in this JSON file in Python 3:
list_id = []
for info in matchinfo['participantIdentities']['player']['name']:
list_id.append(info)
But I get the following error below
TypeError: list indices must be integers or slices, not str
How do I get the content of 'name'?
There are several issues:
You provided invalid JSON: there are trailing commas after the "name" entries, and a comma is missing after the ] that closes participantIdentities, before "gameDuration", so the document can't be parsed as posted.
matchinfo['participantIdentities'] is a list, so you should either provide an index (matchinfo['participantIdentities'][0]['player']['summonerId'] for example) or iterate over all matchinfo['participantIdentities'] entries.
You are trying to access a key that doesn't even exist (at least in the JSON you provided). There is no 'summonerId' key anywhere.
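Once the JSON itself is fixed, the names can be collected like this (a minimal sketch with the repaired structure inlined as a Python dict):

```python
# Repaired version of the JSON above (trailing commas removed, comma added
# before "gameDuration"), inlined as a Python dict.
matchinfo = {
    "gameId": 32,
    "participantIdentities": [
        {"player": {"id": "123", "name": "xxx"}, "participantId": 1},
        {"player": {"id": "123", "name": "yyyy"}, "participantId": 2},
    ],
    "gameDuration": 143,
}

# Iterate over the list, then index into each entry's nested dicts.
names = [entry["player"]["name"] for entry in matchinfo["participantIdentities"]]
print(names)  # ['xxx', 'yyyy']
```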
I'm trying to generate AsciiDoc snippets for a Grails REST API using REST Assured, for this JSON response:
{
"instanceList": [
{
"firstName": "Coy",
"lastName": "T",
"pictureUrl": null,
"email": "bootstrap#cc.com",
"bio": null,
"skills": [],
"interestAreas": []
},
{
"firstName": "Jane",
"lastName": "D",
"pictureUrl": null,
"email": "jane#cc.com",
"bio": null,
"skills": [],
"interestAreas": []
},
{
"firstName": "Cause",
"lastName": "C",
"pictureUrl": "https://cc-picture.com",
"email": "cc-user#code.com",
"bio": "cc beyond infinity",
"skills": [],
"interestAreas": []
},
{
"firstName": "sachidanand",
"lastName": "v",
"pictureUrl": null,
"email": "cc#cc.com",
"bio": null,
"skills": [],
"interestAreas": []
}
],
"totalCount": 4
}
and the code snippet of UserDocumentationApiSpec (an integration test):
void 'test and document get request for /user'() {
    expect:
    given(documentationSpec)
        .header("AuthToken", "TokenValue")
        .accept(MediaType.APPLICATION_JSON.toString())
        .filter(document('user-list-v1',
            preprocessRequest(modifyUris()
                .host('127.0.0.1')
                .removePort()),
            preprocessResponse(prettyPrint()),
            responseFields(
                fieldWithPath("[].firstName").description("First name of user"),
                fieldWithPath("[].lastName").description("Last name of user"),
                fieldWithPath("[].pictureUrl").type(JsonFieldType.STRING).description("Picture Url of user"),
                fieldWithPath("[].email").description("Email address of user"),
                fieldWithPath("[].bio").description("Bio data of user"),
                fieldWithPath("totalCount").description("Count of instanceList field"),
                fieldWithPath("type").description("Type of result")
            )))
        .when()
        .port(8080)
        .get('/api/v1/user')
        .then()
        .assertThat()
        .statusCode(is(200))
}
This part of the code gives me the following error trace:
org.springframework.restdocs.payload.PayloadHandlingException: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
at [Source: [B#2c6adbe3; line: 1, column: 1]
at org.springframework.restdocs.payload.JsonContentHandler.readContent(JsonContentHandler.java:84)
at org.springframework.restdocs.payload.JsonContentHandler.findMissingFields(JsonContentHandler.java:50)
at org.springframework.restdocs.payload.AbstractFieldsSnippet.validateFieldDocumentation(AbstractFieldsSnippet.java:113)
at org.springframework.restdocs.payload.AbstractFieldsSnippet.createModel(AbstractFieldsSnippet.java:74)
at org.springframework.restdocs.snippet.TemplatedSnippet.document(TemplatedSnippet.java:64)
at org.springframework.restdocs.generate.RestDocumentationGenerator.handle(RestDocumentationGenerator.java:192)
at org.springframework.restdocs.restassured.RestDocumentationFilter.filter(RestDocumentationFilter.java:63)
at com.jayway.restassured.internal.filter.FilterContextImpl.next(FilterContextImpl.groovy:73)
at com.jayway.restassured.filter.session.SessionFilter.filter(SessionFilter.java:60)
at com.jayway.restassured.internal.filter.FilterContextImpl.next(FilterContextImpl.groovy:73)
at org.springframework.restdocs.restassured.RestAssuredRestDocumentationConfigurer.filter(RestAssuredRestDocumentationConfigurer.java:65)
at com.jayway.restassured.internal.filter.FilterContextImpl.next(FilterContextImpl.groovy:73)
at com.jayway.restassured.internal.RequestSpecificationImpl.applyPathParamsAndSendRequest(RequestSpecificationImpl.groovy:1574)
at com.jayway.restassured.internal.RequestSpecificationImpl.get(RequestSpecificationImpl.groovy:159)
at com.converge.docs.UserApiDocumentationSpec.$tt__$spock_feature_0_0(UserApiDocumentationSpec.groovy:73)
at com.converge.docs.UserApiDocumentationSpec.test and document get request for /user_closure2(UserApiDocumentationSpec.groovy)
at groovy.lang.Closure.call(Closure.java:426)
at groovy.lang.Closure.call(Closure.java:442)
at grails.transaction.GrailsTransactionTemplate$1.doInTransaction(GrailsTransactionTemplate.groovy:70)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133)
at grails.transaction.GrailsTransactionTemplate.executeAndRollback(GrailsTransactionTemplate.groovy:67)
at com.converge.docs.UserApiDocumentationSpec.test and document get request for /user(UserApiDocumentationSpec.groovy)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input
at [Source: [B#2c6adbe3; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3781)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2819)
at org.springframework.restdocs.payload.JsonContentHandler.readContent(JsonContentHandler.java:81)
... 21 more
Please correct me where I'm going wrong...
The only reason I see for this failing is that you are not getting the expected response from the /api/v1/user endpoint.
Follow these steps:
Check that instances are there in the database.
Check that you are sending the correct token.
Write a test case using the RestBuilder and see if you actually get the expected response.
Your code and JSON look fine.
Also, make sure you follow this issue and mark your empty array field as optional and explicitly provide a type for its contents.
I hope this helps.
I frequently see the error org.springframework.restdocs.payload.PayloadHandlingException: com.fasterxml.jackson.databind.JsonMappingException: No content to map due to end-of-input when there is no response body. I see this is a GET request. These integration tests work a little differently in Grails than a RestClient call. Have you set up a sample data point in this test or in the Bootstrap.groovy file? I can't see the rest of the code to tell how you are running this as an integration test. In my sample Grails example, I set up some test data in the Bootstrap.groovy file.
Please let me know if I can help.