Converting the dictionaries inside a list into a nested dictionary - mysql

I have a list of dictionaries, called employee_data, where each list element is a dictionary holding one employee's details. For example,
employee_data = [{
    "name": "asd",
    "lastname": "abc",
    "birthday": "15/15/2021",
    "birthplace": "CA",
    "live_place": "USA",
    "email": "sss.com",
    "website": "sss.com",
    "Phone_number": "12345678901",
    "work_number": "abc",
    "save_date": "15/15/2021",
    "start_date": "15/15/2021",
    "leave_date": "15/15/2021",
    "project": "End-Of-Support",
    "age_in_months": 256,
    "Age_in_Years": 15.3,
    "Computer_name": "pc1",
    "computer_cpu": 8,
    "computer_ram": 12,
    "computer_ssd": 256,
},
{
    "name": "asd",
    "lastname": "abc",
    "birthday": "16/15/2021",
    "birthplace": "CA",
    "live_place": "USA",
    "email": "sss.com",
    "website": "sss.com",
    "Phone_number": "12345678901",
    "work_number": "abc",
    "save_date": "15/15/2021",
    "start_date": "15/15/2021",
    "leave_date": "15/15/2021",
    "project": "End-Of-Support",
    "age_in_months": 256,
    "Age_in_Years": 15.3,
    "Computer_name": "pc1",
    "computer_cpu": 8,
    "computer_ram": 12,
    "computer_ssd": 256,
}]
Nested Dict ID  name  lastname  birthday    birthplace  live_place  email
0               asd   abc       15/15/2021  USA         CAD         asd#mail.com
1               asd2  abc       16/15/2021  CAD         USA         abc#mail.com
I tried a few functions but couldn't fix the error.
My issue is that these dictionaries are inside a list. I want to create a nested dict, keyed by an ID, for mapping the data, like this:
[{0{"name": "asd",
"lastname": "abc",
"birthday": 15/15/2021,
"birthplace": "CA",
"live_place": "USA",
"email": "sss.com",}
1{"name": "asd2",
"lastname": "abc",
"birthday": 16/15/2021,
"birthplace": "CA",
"live_place": "USA",
"email": "sss.com",}]
If two employees have the same last name, I want to get those employees' nested dict IDs; then I'll import all their information into a different list and table.
d = { x['lastname']: x['abc'] for x in employee_data}
KeyError: 'abc'

You can create a list of tuples with the required data extracted from the JSON, which you can then export to SQL tables. Refer to the code below:
import pandas as pd
data = [(item.get('name'), item.get('lastname'),item.get('birthday'),item.get('birthplace'), item.get('live_place'), item.get('email')) for item in employee_data]
print(data)
df = pd.DataFrame.from_records(data, columns=['name', 'lastname', 'birthday', 'birthplace', 'live_place', 'email'])
And then you can use df.to_sql from pandas.
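As a rough sketch (not from the original answer), assuming the employee_data list from the question and a throwaway SQLite database via SQLAlchemy (replace the connection string with your MySQL one), you can also build the nested dict keyed by 0, 1, ... and group those IDs by last name before writing the table:
import pandas as pd
from sqlalchemy import create_engine

# nested dict keyed by position: {0: {...}, 1: {...}}
nested = dict(enumerate(employee_data))

# group the nested-dict IDs by last name, e.g. {"abc": [0, 1]}
ids_by_lastname = {}
for emp_id, emp in nested.items():
    ids_by_lastname.setdefault(emp["lastname"], []).append(emp_id)

df = pd.DataFrame.from_records(
    [(e["name"], e["lastname"], e["birthday"], e["birthplace"], e["live_place"], e["email"])
     for e in employee_data],
    columns=["name", "lastname", "birthday", "birthplace", "live_place", "email"],
)

# "sqlite:///employees.db" is a placeholder; for MySQL use e.g. "mysql+pymysql://user:pass@host/dbname"
engine = create_engine("sqlite:///employees.db")
df.to_sql("employees", engine, if_exists="replace", index=False)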


Fetching the value from the key in Python's dict

I am fetching a value from a key in JSON, but I do not know why I cannot fetch the target information. Code below:
import json
import requests
#Person's ID
id=1194452
#Url
info=requests.get(f'https://api.brokercheck.finra.org/search/individual/{id}?hl=true&includePrevious=true&sort=bc_lastname_sort+asc,bc_firstname_sort+asc,bc_middlename_sort+asc,score+desc&wt=json')
#convert to JSON
x=info.json()
#Value
print(x["firstName"])
The following is the JSON for this example. (The information in the link is available to the public; I therefore do not redact the details of 1194452.)
{'hits': {'total': 1, 'hits': [{'_type': '_doc', '_source': {'content': '{"basicInformation": {"individualId": 1194452, "firstName": "STEPHEN", "middleName": "MICHAEL", "lastName": "SCHULTZ", "otherNames": [], "bcScope": "Active", "iaScope": "NotInScope", "daysInIndustryCalculatedDate": "4/11/1984"}, "currentEmployments": [{"firmId": 5685, "firmName": "PRUCO SECURITIES, LLC.", "iaOnly": "N", "registrationBeginDate": "4/12/1984", "firmBCScope": "ACTIVE", "firmIAScope": "ACTIVE", "iaSECNumber": "52208", "iaSECNumberType": "801", "bdSECNumber": "16402", "branchOfficeLocations": [{"locatedAtFlag": "Y", "supervisedFromFlag": "N", "privateResidenceFlag": "N", "branchOfficeId": "737412", "street1": "445 Broadhollow Road", "street2": "Suite 405", "city": "Melville", "cityAlias": ["DIX HILLS", "HUNTINGTN STA", "HUNTINGTON STATION", "MELVILLE"], "state": "NY", "country": "United States", "zipCode": "11747", "latitude": "40.785118", "longitude": "-73.404965", "geoLocation": "40.785118,-73.404965", "nonRegisteredOfficeFlag": "N", "elaBeginDate": "10/15/2021"}]}], "currentIAEmployments": [], "previousEmployments": [], "previousIAEmployments": [], "disclosureFlag": "N", "iaDisclosureFlag": "N", "disclosures": [], "examsCount": {"stateExamCount": 0, "principalExamCount": 0, "productExamCount": 2}, "stateExamCategory": [], "principalExamCategory": [], "productExamCategory": [{"examCategory": "SIE", "examName": "Securities Industry Essentials Examination", "examTakenDate": "10/1/2018", "examScope": "BC"}, {"examCategory": "Series 6", "examName": "Investment Company Products/Variable Contracts Representative Examination", "examTakenDate": "4/2/1984", "examScope": "BC"}], "registrationCount": {"approvedSRORegistrationCount": 1, "approvedFinraRegistrationCount": 1, "approvedStateRegistrationCount": 1, "approvedIAStateRegistrationCount": 0}, "registeredStates": [{"state": "New York", "regScope": "BC", "status": "APPROVED", "regDate": "4/12/1984"}], "registeredSROs": [{"sro": "FINRA", "status": "APPROVED"}], "brokerDetails": {"hasBCComments": "N", "hasIAComments": "N", "legacyReportStatusDescription": "Not Requested"}}'}}]}}
Questions:
1. Since type(x) is 'dict' in Python, why can't I fetch the value of the key?
2. What is 'hits' in JSON? I googled 'hits' and found that it is 'Hyperlink Induced Topic Search'.
Thank you.
In this case, hits is the number of results of your query. It's just a key this API uses to structure its response.
In order to get firstName, you have to navigate along the nested dictionaries (accessing elements with a key) and lists (accessing elements with an index). Since you only have one hit in your case (giving a person's ID and receiving a response with total=1), you'll access the unique hit with index 0.
One thing to look out for, though: the value associated with content is a JSON string, so you should parse it as JSON.
In order to print all first names:
import json
x = {'hits': {'total': 1, 'hits': [{'_type': '_doc', '_source': {'content': '{"basicInformation": {"individualId": 1194452, "firstName": "STEPHEN", "middleName": "MICHAEL", "lastName": "SCHULTZ", "otherNames": [], "bcScope": "Active", "iaScope": "NotInScope", "daysInIndustryCalculatedDate": "4/11/1984"}, "currentEmployments": [{"firmId": 5685, "firmName": "PRUCO SECURITIES, LLC.", "iaOnly": "N", "registrationBeginDate": "4/12/1984", "firmBCScope": "ACTIVE", "firmIAScope": "ACTIVE", "iaSECNumber": "52208", "iaSECNumberType": "801", "bdSECNumber": "16402", "branchOfficeLocations": [{"locatedAtFlag": "Y", "supervisedFromFlag": "N", "privateResidenceFlag": "N", "branchOfficeId": "737412", "street1": "445 Broadhollow Road", "street2": "Suite 405", "city": "Melville", "cityAlias": ["DIX HILLS", "HUNTINGTN STA", "HUNTINGTON STATION", "MELVILLE"], "state": "NY", "country": "United States", "zipCode": "11747", "latitude": "40.785118", "longitude": "-73.404965", "geoLocation": "40.785118,-73.404965", "nonRegisteredOfficeFlag": "N", "elaBeginDate": "10/15/2021"}]}], "currentIAEmployments": [], "previousEmployments": [], "previousIAEmployments": [], "disclosureFlag": "N", "iaDisclosureFlag": "N", "disclosures": [], "examsCount": {"stateExamCount": 0, "principalExamCount": 0, "productExamCount": 2}, "stateExamCategory": [], "principalExamCategory": [], "productExamCategory": [{"examCategory": "SIE", "examName": "Securities Industry Essentials Examination", "examTakenDate": "10/1/2018", "examScope": "BC"}, {"examCategory": "Series 6", "examName": "Investment Company Products/Variable Contracts Representative Examination", "examTakenDate": "4/2/1984", "examScope": "BC"}], "registrationCount": {"approvedSRORegistrationCount": 1, "approvedFinraRegistrationCount": 1, "approvedStateRegistrationCount": 1, "approvedIAStateRegistrationCount": 0}, "registeredStates": [{"state": "New York", "regScope": "BC", "status": "APPROVED", "regDate": "4/12/1984"}], "registeredSROs": [{"sro": "FINRA", "status": "APPROVED"}], "brokerDetails": {"hasBCComments": "N", "hasIAComments": "N", "legacyReportStatusDescription": "Not Requested"}}'}}]}}
for hit in x['hits']['hits']:
    person_json = json.loads(hit['_source']['content'])
    print(person_json['basicInformation']['firstName'])
Or get the first name of the first (and in this case only) hit:
json.loads(x['hits']['hits'][0]['_source']['content'])['basicInformation']['firstName']
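As a small aside (not part of the original answer): if you want to guard against an empty result, for example when an unknown ID returns no hits, a sketch using dict.get with defaults avoids the KeyError the question ran into:
import json

hits = x.get('hits', {}).get('hits', [])
if hits:
    content = json.loads(hits[0]['_source']['content'])
    print(content.get('basicInformation', {}).get('firstName'))
else:
    print('no results for this ID')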
The API responds with a JSON object, but the required data is stored as a JSON string inside that object, so you have to call json.loads() on it to turn it into a dict. hits is just the list of matching result objects.
Script:
import requests
import json
id = 1194452
url = f'https://api.brokercheck.finra.org/search/individual/{id}?hl=true&includePrevious=true&sort=bc_lastname_sort+asc,bc_firstname_sort+asc,bc_middlename_sort+asc,score+desc&wt=json'
response = requests.get(url)
data = response.json()
for hit in data['hits']['hits']:
    content_str = hit['_source']['content']
    content = json.loads(content_str)
    first_name = content['basicInformation']['firstName']
    print(first_name)
Output:
STEPHEN

How to extract JSON list column from JSON file and convert it into dataframe

I have a JSON file that consists of about 1M records. I want to extract the skills per id, so that each id has its own skills. Can anyone suggest how to extract the skills column from the JSON file and convert it into a data frame? I want only the skill field out of the three columns present in the skills list.
I am attaching a few rows from the JSON file.
{
"id": "3d86309e-64f6-4df8-ba60-cce431870bfb",
"location": {
"city": "Assen",
"country": "Netherlands",
"longitude": 6.564228534698486,
"latitude": 52.99275207519531
},
"educations": [
{
"title": "Bachelor Bedrijfskundige Informatica",
"institution": "Hanzehogeschool Groningen / Hanze University of Applied Sciences Groningen",
"start_date": "2002-01-01",
"end_date": "2005-12-31",
"ongoing": false,
"edu_type_id": 16870377,
"edu_cat_id": 951006,
"level": 11
},
{
"title": "Bachelor Public Relations, Marketing, Communication",
"institution": "NHL Hogeschool",
"start_date": "1994-01-01",
"end_date": "1997-12-31",
"ongoing": false,
"edu_type_id": 953096,
"edu_cat_id": 951099,
"level": 11
},
{
"title": " ",
"institution": "Gomarus College",
"start_date": null,
"end_date": null,
"ongoing": false,
"edu_type_id": null,
"edu_cat_id": null,
"level": null
}
],
"work_experiences": [
{
"title": "medewerker ICT business development",
"company_name": "Woningcorporatie Actium",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2014-10-01",
"end_date": null,
"classification": {
"function_type": "Sales, business development and key account managers",
"function_type_id": 1568086,
"function_cat": "Sales, account and business development managers and representatives",
"function_cat_id": 1567386,
"level": 5
}
},
{
"title": "Functioneel Applicatiebeheerder",
"company_name": "Actium Assen",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2013-02-01",
"end_date": "2014-09-30",
"classification": {
"function_type": "Ict service and information managers",
"function_type_id": 1567953,
"function_cat": "Ict service and information managers",
"function_cat_id": 1567269,
"level": 5
}
},
{
"title": "Change Coördinator",
"company_name": "KPN Consulting",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2006-08-01",
"end_date": "2012-09-30",
"classification": {
"function_type": "Securities and finance dealers and brokers",
"function_type_id": 1567651,
"function_cat": "Finance, securities and investment staff",
"function_cat_id": 1567434,
"level": 6
}
},
{
"title": "Coordinator Automatisering",
"company_name": "Spinder Products",
"location": {
"city": null,
"country": null,
"longitude": null,
"latitude": null
},
"start_date": "2000-01-01",
"end_date": "2006-12-31",
"classification": {
"function_type": "Ict service and information managers",
"function_type_id": 1567953,
"function_cat": "Ict service and information managers",
"function_cat_id": 1567269,
"level": 6
}
}
],
"skills": [
{
"skill": "business development",
"skill_id": 972528,
"skill_type_id": 34097811
},
{
"skill": "Automatisering",
"skill_id": 1588585,
"skill_type_id": 954000
}
],
"languages": [
{
"language": "Dutch",
"proficiency": "native or bilingual proficiency"
},
{
"language": "English",
"proficiency": "professional working proficiency"
},
{
"language": "German",
"proficiency": "elementary proficiency"
}
],
"certificates": [],
"working_years": 20
}
I want my output to be in the format:
skill
business development, Automatisering
I take it from your question that you only want the names of each skill in the data frame.
The following code will get that if your JSON is in the file "data.json"
from pandas import DataFrame
import json
with open('data.json') as file:
    data = DataFrame([skill["skill"] for skill in json.loads(file.read())["skills"]])
print(data)
This will print the following from the DataFrame "data":
0
0 business development
1 Automatisering
The 0th column in the data frame is the "skill" column from your JSON, but if you wanted a different column, such as "skill_id" just replace skill["skill"] with skill["skill_id"] in the code above.
If you just want all three columns and don't want to filter any out, the code is even shorter
from pandas import DataFrame
import json
with open('data.json') as file:
    data = DataFrame(json.loads(file.read())["skills"])
print(data)
I'm not sure why you need a DataFrame rather than just a normal list, especially considering you are just getting a list of strings.
In the case that you just want a list of the names of the skills you can run
with open('data.json') as file:
    data = [skill["skill"] for skill in json.loads(file.read())["skills"]]
print(data)
I just removed the parts relating to the DataFrame.
json.loads takes the string contents of the file after you've read it (json.load would take the open file object directly).
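If the real file holds a JSON array of many such profiles and every profile has a skills key (both are assumptions, since only one record is shown), a hedged pandas sketch can produce one row per (id, skill) pair and then collapse the skills per id, which matches the desired output:
import json
import pandas as pd

with open('data.json') as file:
    records = json.load(file)  # assumes the file contains a JSON array of profile objects

# one row per (id, skill) pair; profiles with an empty skills list contribute no rows
skills = pd.json_normalize(records, record_path='skills', meta='id')[['id', 'skill']]

# collapse to one comma-separated string per id, e.g. "business development, Automatisering"
per_id = skills.groupby('id')['skill'].apply(', '.join)
print(per_id)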

Json parsing and mapping keys

I'm trying to map JSON to send it to another application, which expects the data in its own format. I'm using AWS Lambda: when an event is triggered it GETs the JSON below, which then needs to be parsed and mapped into what the other application expects. But the key space is very large; for example, for "rateCode" inside "ratePlan" inside "Details" there are almost 20000 rate codes like "abc", "xyz", ..., so it is not a great idea to map like
if "rateCode" == "abc":
application_two_dict["rate_code"] = 123
There are many more keys like this, each with a large set of values. What is the best way to map those keys? Also, this needs to happen in both directions: when we get data from application two, we need to parse the JSON and map the keys the other way around, into the form application one understands, and vice versa.
{
"customer": {
"firstName": "john",
"lastName": "doe",
"email": "john.doe#test.com",
"mailingAddress": {
"address1": "123 N 1st st",
"address2": "789",
"countryCode": "USA",
"stateCode": "AZ",
"city": "Phoenix",
"postalCode": "34567"
},
"telephoneNumber": {
"telephoneNumber": "1235456789"
}
},
"paymentAccount": {
"firstName": "john",
"lastName": "doe",
"paymentAccountType": "VA",
"expirationDate": "2021-05-31",
"billingAddress": {
"address1": "1234 N 1st st",
"address2": "435",
"city": "Phoenix",
"countryCode": "USA",
"postalCode": "213445",
"stateCode": "AZ"
}
},
"Details": {
"123": [{
"quantity": 1,
"ratePlan": {
"rateCode": "abc",
"DetailsList": [{
"CategoryCode": "1234",
}]
}
}
}
I still don't have the exact format of the app2 JSON, but for example:
app1 json
{
"Details": {
"123": [{
"quantity": 1,
"ratePlan": {
"rateCode": "abc",
"DetailsList": [{
"CategoryCode": "1234",
}]
}
}
}
}
app 2 json
{
user_details_code : 123,
quantity : [1],
rate_plan : {
rate_code: "xyz",
category_code : "US_SAN"
}
}
I would try the following ways:
- Use two static maps with rateCode as keys, { "abc": "123", ...} and { "123": "abc", ...}, and use them to look up the corresponding rateCode value for the other app (a minimal sketch follows below).
- Use a database to fetch the rateCode for app2 based on the app1 value. DynamoDB has very low latency and can be very effective.
Maybe you could describe the JSON structure of the two apps more precisely.
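A minimal sketch of the first suggestion; the codes and mapped values here are invented for illustration, and in practice the tables would be loaded from a config file, an S3 object, or a DynamoDB table rather than hard-coded:
# forward and reverse lookup tables built once at module load
APP1_TO_APP2_RATE_CODE = {"abc": "123", "xyz": "456"}  # hypothetical entries
APP2_TO_APP1_RATE_CODE = {v: k for k, v in APP1_TO_APP2_RATE_CODE.items()}

def translate_rate_code(code, to_app2=True):
    """Map a rate code in either direction, failing loudly on unknown codes."""
    table = APP1_TO_APP2_RATE_CODE if to_app2 else APP2_TO_APP1_RATE_CODE
    try:
        return table[code]
    except KeyError:
        raise ValueError(f"unknown rate code: {code!r}")

# example: translate_rate_code("abc") -> "123"; translate_rate_code("123", to_app2=False) -> "abc"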

How to join an element which is in an array which is also a part of another array for best performance in couchbase?

Below is the sample document for organization
{
"org": {
"id": "org_2_1084",
"organizationId": 1084,
"organizationName": "ABC",
"organizationRoles": [
{
"addressAssociations": [
{
"activeDate": "2019-08-03T18:52:00.857Z",
"addressAssocTypeId": -2,
"addressId": 100,
"ownershipStatus": 1,
"srvAddressStatus": 1
},
{
"activeDate": "2019-08-03T18:52:00.857Z",
"addressAssocTypeId": -2,
"addressId": 105,
"ownershipStatus": 1,
"srvAddressStatus": 1
}
],
"name": "NLUZ",
"organizationRoleId": 893,
"roleSpecId": -104,
"statusId": 1,
"statusLastChangedDate": "2019-08-04T13:14:44.616Z"
},
{
"addressAssociations": [
{
"activeDate": "2019-08-03T18:52:00.857Z",
"addressAssocTypeId": -2,
"addressId": 582,
"ownershipStatus": 1,
"srvAddressStatus": 1
},
{
"activeDate": "2019-08-03T18:52:00.857Z",
"addressAssocTypeId": -2,
"addressId": 603,
"ownershipStatus": 1,
"srvAddressStatus": 1
}
],
"name": "TXR",
"organizationRoleId": 894,
"partyRoleAssocs": [
{
"partyRoleAssocId": "512"
}
],
"roleSpecId": -103,
"statusId": 1,
"statusLastChangedDate": "2019-08-04T13:14:44.616Z"
},
}
and below is the sample document for address
{
"address": {
"address1": "string",
"address2": "string",
"addressId": "1531",
"changeWho": "string",
"city": "string",
"fxGeocode": "string",
"houseNumber": "string",
"id": "1531",
"isActive": true,
"postalCode": "string",
"state": "string",
"streetName": "string",
"tenantId": "2",
"type": "address",
"zip": "string"
}
}
In an organization there are multiple organizationRoles, and in an organizationRole there are multiple addressAssociations. Each addressAssociation contains an addressId, and the address corresponding to that addressId is stored in an address document.
Now I have to get the organizationRole name, organizationRole id, city, and zip from the two documents.
What is the best way to approach this situation for the best performance in Couchbase?
I am thinking about using a join but am not able to come up with an exact query for this scenario.
I have tried the query below but it's not working.
select *
from 'contact' As A UNNEST 'contact'.organizationRoles as Roles
UNNEST Roles.addressAssociations address
Join 'contact' As B
on address.addressID=B.addressID
where A.type="organization" and B.type="address";
You are heading in the right direction.
In addressAssociations the addressId is a number, while in the address document addressId is a string. A string and a number are never equal, and there is no implicit type casting, so you must either fix the data or do explicit type casting using TOSTRING(), TONUMBER(), etc.
Also, N1QL field names are case-sensitive: your query uses addressID whereas the documents have addressId.
SELECT r.name AS organizationRoleName, r.organizationRoleId, a.city, a.zip
FROM contact AS c
UNNEST c.organizationRoles AS r
UNNEST r.addressAssociations AS aa
JOIN contact AS a
ON aa.addressId = a.addressId
WHERE c.type = "organization" AND a.type = "address";
CREATE INDEX ix1 ON contact(addressId, city, zip) WHERE type = "address";
Check out https://blog.couchbase.com/ansi-join-support-n1ql/

How to convert a JSON file whose objects do not all have the same number of fields into CSV

I have a json file and I want to convert it to csv format.
The problem I face is that every json object in the file has not the same length of the converted columns I have. For example the one object have 49 columnns and the next have 50.
I provide here an example of 2 data from which the first one has not the creator.slug but the next has it is and so there is the problem with data. The problem is that the process create all 50 columns but for the object which don't have the value creator.slug it takes the next price.
{
"id": 301852363,
"name": "Song of the Sea",
"blurb": "One evening, two shows: SIRENS and The Girl From Bare Cove. Building a community. Giving voice to survivors of sexual violence.",
"goal": 5000,
"pledged": 671,
"state": "live",
"slug": "song-of-the-sea",
"disable_communication": false,
"country": "US",
"currency": "USD",
"currency_symbol": "$",
"currency_trailing_code": true,
"deadline": 1399293386,
"state_changed_at": 1397133386,
"created_at": 1396672480,
"launched_at": 1397133386,
"backers_count": 20,
"photo": {
"full": "https://s3.amazonaws.com/ksr/projects/939387/photo-full.jpg?1397874930",
"ed": "https://s3.amazonaws.com/ksr/projects/939387/photo-ed.jpg?1397874930",
"med": "https://s3.amazonaws.com/ksr/projects/939387/photo-med.jpg?1397874930",
"little": "https://s3.amazonaws.com/ksr/projects/939387/photo-little.jpg?1397874930",
"small": "https://s3.amazonaws.com/ksr/projects/939387/photo-small.jpg?1397874930",
"thumb": "https://s3.amazonaws.com/ksr/projects/939387/photo-thumb.jpg?1397874930",
"1024x768": "https://s3.amazonaws.com/ksr/projects/939387/photo-1024x768.jpg?1397874930",
"1536x1152": "https://s3.amazonaws.com/ksr/projects/939387/photo-1536x1152.jpg?1397874930"
},
"creator": {
"id": 1714048992,
"name": "Maridee Slater",
"slug": "maridee",
"avatar": {
"thumb": "https://s3.amazonaws.com/ksr/avatars/996153/DSC_0310.thumb.jpg?1337713264",
"small": "https://s3.amazonaws.com/ksr/avatars/996153/DSC_0310.small.jpg?1337713264",
"medium": "https://s3.amazonaws.com/ksr/avatars/996153/DSC_0310.medium.jpg?1337713264"
},
"urls": {
"web": {
"user": "https://www.kickstarter.com/profile/maridee"
},
"api": {
"user": "https://api.kickstarter.com/v1/users/1714048992?signature=1398256877.e6d63adcca055cd041a5920368b197d40459f748"
}
}
},
"location": {
"id": 2459115,
"name": "New York",
"slug": "new-york-ny",
"short_name": "New York, NY",
"displayable_name": "New York, NY",
"country": "US",
"state": "NY",
"urls": {
"web": {
"discover": "https://www.kickstarter.com/discover/places/new-york-ny",
"location": "https://www.kickstarter.com/locations/new-york-ny"
},
"api": {
"nearby_projects": "https://api.kickstarter.com/v1/discover?signature=1398256786.89b2c4539aeab4ad25982694dd7e659e8c12028f&woe_id=2459115"
}
}
},
"category": {
"id": 17,
"name": "Theater",
"slug": "theater",
"position": 14,
"urls": {
"web": {
"discover": "http://www.kickstarter.com/discover/categories/theater"
}
}
},
"urls": {
"web": {
"project": "https://www.kickstarter.com/projects/maridee/song-of-the-sea"
}
}
},
{
"id": 967108708,
"name": "Good Bread Alley",
"blurb": "A play by April Yvette Thompson. A Gullah Healer Woman and an Afro-Cuban Priest forge a new world of magic & dreams in Jim Crow Miami.",
"goal": 100000,
"pledged": 33242,
"state": "live",
"slug": "good-bread-alley",
"disable_communication": false,
"country": "US",
"currency": "USD",
"currency_symbol": "$",
"currency_trailing_code": true,
"deadline": 1399271911,
"state_changed_at": 1396334313,
"created_at": 1393278556,
"launched_at": 1396334311,
"backers_count": 261,
"photo": {
"full": "https://s3.amazonaws.com/ksr/projects/883489/photo-full.jpg?1397869394",
"ed": "https://s3.amazonaws.com/ksr/projects/883489/photo-ed.jpg?1397869394",
"med": "https://s3.amazonaws.com/ksr/projects/883489/photo-med.jpg?1397869394",
"little": "https://s3.amazonaws.com/ksr/projects/883489/photo-little.jpg?1397869394",
"small": "https://s3.amazonaws.com/ksr/projects/883489/photo-small.jpg?1397869394",
"thumb": "https://s3.amazonaws.com/ksr/projects/883489/photo-thumb.jpg?1397869394",
"1024x768": "https://s3.amazonaws.com/ksr/projects/883489/photo-1024x768.jpg?1397869394",
"1536x1152": "https://s3.amazonaws.com/ksr/projects/883489/photo-1536x1152.jpg?1397869394"
},
"creator": {
"id": 749318998,
"name": "April Yvette Thompson",
"avatar": {
"thumb": "https://s3.amazonaws.com/ksr/avatars/9751919/kick_thumb.thumb.jpg?1396128151",
"small": "https://s3.amazonaws.com/ksr/avatars/9751919/kick_thumb.small.jpg?1396128151",
"medium": "https://s3.amazonaws.com/ksr/avatars/9751919/kick_thumb.medium.jpg?1396128151"
},
"urls": {
"web": {
"user": "https://www.kickstarter.com/profile/749318998"
},
"api": {
"user": "https://api.kickstarter.com/v1/users/749318998?signature=1398256877.af4db50c53f93339b05c7813f4534e833eaca270"
}
}
},
"location": {
"id": 2459115,
"name": "New York",
"slug": "new-york-ny",
"short_name": "New York, NY",
"displayable_name": "New York, NY",
"country": "US",
"state": "NY",
"urls": {
"web": {
"discover": "https://www.kickstarter.com/discover/places/new-york-ny",
"location": "https://www.kickstarter.com/locations/new-york-ny"
},
"api": {
"nearby_projects": "https://api.kickstarter.com/v1/discover?signature=1398256786.89b2c4539aeab4ad25982694dd7e659e8c12028f&woe_id=2459115"
}
}
},
"category": {
"id": 17,
"name": "Theater",
"slug": "theater",
"position": 14,
"urls": {
"web": {
"discover": "http://www.kickstarter.com/discover/categories/theater"
}
}
},
"urls": {
"web": {
"project": "https://www.kickstarter.com/projects/749318998/good-bread-alley"
}
}
}
Here is the code I run
#open the json file
require(RJSONIO)
require(rjson)
library("rjson")
filename2 <- "C:/Users/Desktop/in.json"
json_data <- fromJSON(file = filename2)
#unlist the json because it has a problem
unlisted <- unlist(unlist(json_data,recursive=FALSE),recursive=FALSE)
# used to fill in NA, but as I understand now this only handles nulls that already exist:
# http://stackoverflow.com/questions/16947643/getting-imported-json-data-into-a-data-frame-in-r/16948174#16948174
unlisted <- lapply(unlisted, function(x) {
    x[sapply(x, is.null)] <- NA
    unlist(x)
})
json <- do.call("rbind", unlisted)
Here is the full list of columns in the output CSV, and after that I give the smaller set of columns I would like to keep from every JSON object:
id
name
blurb
goal
pledged
state
slug
disable_communication
country
currency
currency_symbol
currency_trailing_code
deadline
state_changed_at
created_at
launched_at
backers_count
photo.full
photo.ed
photo.med
photo.little
photo.small
photo.thumb
photo.1024x768
photo.1536x1152
creator.id
creator.name
creator.slug
creator.avatar.thumb
creator.avatar.small
creator.avatar.medium
creator.urls.web.user
creator.urls.api.user
location.id
location.name
location.slug
location.short_name
location.displayable_name
location.country
location.state
location.urls.web.discover
location.urls.web.location
location.urls.api.nearby_projects
category.id
category.name
category.slug
category.position
category.urls.web.discover
category.urls.web.project
category.urls.web.rewards
Here is the list of columns I would like to have in the output CSV:
id
name
blurb
goal
pledged
state
slug
disable_communication
country
currency
currency_symbol
currency_trailing_code
deadline
state_changed_at
created_at
launched_at
backers_count
creator.id
creator.name
creator.slug
location.id
location.name
location.slug
location.short_name
location.displayable_name
location.country
location.state
category.id
category.name
category.slug
category.position
Looks like there's a very similar question (with answer, though not pure R) here: convert json to csv format
However, since you do seem to want most, if not all, of the JSON in a "wide CSV" format, you can use fromJSON from jsonlite, rbindlist from data.table (which gets you the fill=TRUE parameter to handle uneven lists nicely), and unlist:
library(jsonlite)
library(data.table)
# tell fromJSON we want a list back
json_data <- fromJSON("in.json", simplifyDataFrame=FALSE)
# iterate over the list we have so we can "flatten" it then
# covert it back to a data.frame-like object
dat <- rbindlist(lapply(json_data, function(x) {
    as.list(unlist(x))
}), fill=TRUE)
You may need to tweak column names, but I think this gets you what you're looking for.
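For comparison only, and not part of the answer above: if you ever do this step in Python instead of R, pandas.json_normalize handles the uneven records directly, since a missing key such as creator.slug simply becomes NaN instead of shifting the neighbouring values. This sketch assumes, like the R code, that in.json holds a JSON array of project records:
import json
import pandas as pd

with open("in.json") as fh:
    records = json.load(fh)  # assumes a JSON array of project records

# sep="." reproduces column names like "creator.slug"; missing keys become NaN
flat = pd.json_normalize(records, sep=".")

wanted = ["id", "name", "blurb", "goal", "pledged", "state", "slug",
          "disable_communication", "country", "currency", "deadline",
          "created_at", "launched_at", "backers_count",
          "creator.id", "creator.name", "creator.slug",
          "location.id", "location.name", "location.slug",
          "category.id", "category.name", "category.slug", "category.position"]
flat[[c for c in wanted if c in flat.columns]].to_csv("out.csv", index=False)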