Multiple JSON objects into R from one txt file

I am very new to JSON files. I scraped a txt file with a few million JSON objects such as:
{
"created_at":"Mon Oct 14 21:04:25 +0000 2013",
"default_profile":true,
"default_profile_image":true,
"description":"...",
"followers_count":5,
"friends_count":560,
"geo_enabled":true,
"id":1961287134,
"lang":"de",
"name":"Peter Schmitz",
"profile_background_color":"C0DEED",
"profile_background_image_url":"http://abs.twimg.com/images/themes",
"utc_offset":-28800,
...
}
{
"created_at":"Fri Oct 17 20:04:25 +0000 2015",
...
}
I want to extract the columns into a data frame in R:
Variable Value
created_at X
default_profile Y
…
In general, this is similar to what is done here (multiple Json objects in one file extract by python) in Python. If anyone has an idea or a suggestion, help would be much appreciated! Thank you!

Here is an example of how you could approach it with two objects. I assume you were able to read the JSON from a file; otherwise, see here.
myjson = '{"created_at": "Mon Oct 14 21:04:25 +0000 2013", "default_profile": true,
"default_profile_image": true, "description": "...", "followers_count":
5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":
"de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
{"created_at": "Mon Oct 15 21:04:25 +0000 2013", "default_profile": true,
"default_profile_image": true, "description": "...", "followers_count":
5, "friends_count": 560, "geo_enabled": true, "id": 1961287134, "lang":
"de", "name": "Peter Schmitz", "profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes", "utc_offset": -28800}
'
library("rjson")
# Split the text into a list of all JSON objects. I chose '!x!x!' pretty randomly; there may be better ways of keeping the brackets while splitting.
my_json_objects = head(strsplit(gsub('\\}','\\}!x!x!', myjson),'!x!x!')[[1]],-1)
# read the text as JSON objects
json_data <- lapply(my_json_objects, function(x) {fromJSON(x)})
# Transform to dataframes
json_data <- lapply(json_data, function(x) {data.frame(val=unlist(x))})
Output:
[[1]]
val
created_at Mon Oct 14 21:04:25 +0000 2013
default_profile TRUE
default_profile_image TRUE
description ...
followers_count 5
friends_count 560
geo_enabled TRUE
id 1961287134
lang de
name Peter Schmitz
profile_background_color C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset -28800
[[2]]
val
created_at Mon Oct 15 21:04:25 +0000 2013
default_profile TRUE
default_profile_image TRUE
description ...
followers_count 5
friends_count 560
geo_enabled TRUE
id 1961287134
lang de
name Peter Schmitz
profile_background_color C0DEED
profile_background_image_url http://abs.twimg.com/images/themes
utc_offset -28800
Hope this helps!

Related

Any way to reformat "json" file so I can extract the data?

A colleague of mine sent this file over to me asking me to extract the data into a database... the issue is, it is not formatted correctly. (It is supposed to be a JSON file... it was even sent as orders.json.)
Here is the file contents (The other files are much larger, this is just a snippet):
[
"Buyer: Jane Doe ",
[
"jane Doe",
"Street Here",
"State, ZIP",
"Phone : 8888888"
],
"Ship By: Wed, Oct 11, 2017 to Thu, Oct 12, 2017",
"Deliver By: Mon, Oct 16, 2017 to Thu, Oct 19, 2017",
"Shipping Service: Standard",
[
[
"Product name",
"SKU: 99999999999",
"ASIN: 999999999",
"Condition: New",
"Listing ID: 99999999999",
"Order Item ID: 9999999999",
"Customizations:",
"Size: 16 Inches",
"Monogram Letters: BSA",
"Color: Unpainted (goes out in 48 hrs)"
]
],
"A",
"Wed, Oct 4, 2017, 8:03 PM PT",
"114-7275553-4341048",
[
"1"
]
]
I believe I will have to go through this slowly and use unnecessary methods to get this to look nice and work... unless I am missing something?
As of right now I can't really access the data in an efficient way.
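For what it's worth, the snippet shown above is actually valid JSON: it is a top-level array mixing plain strings and nested arrays, so it can be parsed directly; the awkward part is that the fields are positional rather than named. A minimal sketch in Python, assuming the file is named orders.json and that every order follows the same positional layout as this snippet:
import json

# Hypothetical filename; the real file was sent as orders.json.
with open("orders.json", encoding="utf-8") as f:
    order = json.load(f)  # the snippet above is a valid JSON array, so this parses as-is

buyer = order[0]    # "Buyer: Jane Doe "
address = order[1]  # ["jane Doe", "Street Here", "State, ZIP", "Phone : 8888888"]
items = order[5]    # list of item blocks, each a list of "Key: value" strings

# Turn the "Key: value" strings of the first item into a dict for easier lookups.
first_item = dict(line.split(":", 1) for line in items[0] if ":" in line)
print(buyer, first_item.get("SKU", "").strip())
Whether that positional layout holds across the larger files is something only the full data can confirm.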

How do I parse info from a txt file? Python 3.0

This is just a sample of the data:
{
"created_at": "Fri Jan 31 05:51:59 +0000 2014",
"favorited": false,
"lang": "en",
"place": {
"country_code": "US",
"url": "https://api.twitter.com/1.1/geo/id/cf44347a08102884.json"
},
"retweeted": false,
"source": "Tweetbot for Mac",
"text": "Active crime scene on I-59/20 near Jeff/Tusc Co line. One dead, one injured; shooting involved. Police search in the area; traffic stopped",
"truncated": false
}
How do I parse this in Python so that I can get the information in text or lang?
I'm assuming this fragment is incomplete, as it looks like JSON but is currently invalid. Assuming a valid JSON document, you can use the json module:
>>> import json
>>> s = """{"lang": "en", "favorited": false, "truncated": false, ... }"""
>>> data = json.loads(s)
>>> data['lang']
'en'
>>> data['text']
'Active crime scene on I-59/20 near Jeff/Tusc Co line. One dead, one injured; shooting involved. Police search in the area; traffic stopped'
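If the goal is to read straight from the txt file rather than from a string literal, here is a minimal sketch, assuming the file is called tweets.txt (a hypothetical name) and holds exactly one JSON object:
import json

# Hypothetical filename; point this at the actual txt file.
with open("tweets.txt", encoding="utf-8") as f:
    tweet = json.load(f)  # parses the whole file as one JSON object

print(tweet["text"])
print(tweet["lang"])

# If the file instead holds one JSON object per line, decode each line separately:
# tweets = [json.loads(line) for line in open("tweets.txt", encoding="utf-8") if line.strip()]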

How to import a JSON file into MATLAB programmatically?

I am trying to load my data, which is delimited with commas, from a JSON file into MATLAB.
The format of my data is as follows:
{"created_at": "Mon Oct 27 20:35:47 +0000 2014", "tweet": "Silver Finished Up, Gold, Copper, Crude Oil, Nat Gas Down - Live Trading News http://t.co/jNLTUIgHwA", "id": 526834668759285761, "sentiment": "negative"}
{"created_at": "Mon Oct 27 20:36:21 +0000 2014", "tweet": "Gold, Silver slips on lacklustre demand- The Economic Times http://t.co/Jd5Tn9ctfX", "id": 526834810300289024, "sentiment": "negative"}
How would I do so?
As of version R2016b, MATLAB has integrated JSON support.
See:
https://www.mathworks.com/help/matlab/ref/jsondecode.html
In short, you do:
jsonData = jsondecode(fileread('file.json'));
Note that jsondecode expects a single valid JSON value, so if your file holds one object per line (as in the sample above), you may need to read it line by line and call jsondecode on each line.
Use JSONLab
One line to read:
tweet_info = loadjson('~/Desktop/test.json')
Here's what's stored in tweet_info{1}
created_at: 'Mon Oct 27 20:35:47 +0000 2014'
tweet: 'Silver Finished Up, Gold, Copper, Crude Oil, Nat Gas Down - Live Trading News http://t.co/jNLTUIgHwA'
id: 5.2683e+17
sentiment: 'negative'
Here's what's stored in the test.json file:
{"created_at": "Mon Oct 27 20:35:47 +0000 2014", "tweet": "Silver Finished Up, Gold, Copper, Crude Oil, Nat Gas Down - Live Trading News http://t.co/jNLTUIgHwA", "id": 526834668759285761, "sentiment": "negative"}
{"created_at": "Mon Oct 27 20:36:21 +0000 2014", "tweet": "Gold, Silver slips on lacklustre demand- The Economic Times http://t.co/Jd5Tn9ctfX", "id": 526834810300289024, "sentiment": "negative"}

JSON to CSV: How to add filters (columns) in the final Excel table?

First, I apologize if my description is not accurate enough; I am a total newbie and I don't know a thing about programming, so don't hesitate to tell me if you need more detailed info. I will try to be as precise as possible.
So I have downloaded a bunch of tweets thanks to Twitter's API and the Terminal (through Twurl). All the tweets are in a .json file (which I open with TextWrangler; I'm on a Mac). The thing is that when I export my .json file to a .csv file in order to process and analyze the data more easily in Excel (or at least the Excel equivalent in LibreOffice), I don't have all the parameters I would require for my study: I lack the "bio" part of each tweet's info that is present in the .json file. In other words, in my final table I have a column for the tweet ID, one for the tweet author, one for the text of the tweet itself and so on... but I don't have a column for the bio of the tweet author, even though this information is displayed in the .json file itself. So my question is: is there any code or tool which would enable me to have one more column in my final .csv table displaying this extra info from the basic .json file?
Again, this may not be clear, so don't hesitate to tell me if you need me to highlight a specific point.
Thanks in advance for any insight. I really need help on this one; it is for a research project I am carrying out for my PhD, so any help would be more than welcome!
EDIT: As an example, here is a sample of the data I have for one tweet in my original .json file:
{
"created_at": "Mon Apr 28 09:00:40 +0000 2014",
"id": 460705144846712800,
"id_str": "460705144846712832",
"text": "Work can suck a dick today",
"source": "Twitter for iPhone",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 253350311,
"id_str": "253350311",
"name": "JEEEZUS",
"screen_name": "Maxi_Flex",
"location": "Southchestershire",
"url": "http://www.soundcloud.com/maxi_flex",
"description": "Jazz Personality.G Mentality.",
"protected": false,
"followers_count": 457,
"friends_count": 400,
"listed_count": 1,
"created_at": "Thu Feb 17 02:08:57 +0000 2011",
"favourites_count": 1229,
"utc_offset": null,
"time_zone": null,
"geo_enabled": true,
"verified": false,
"statuses_count": 13661,
"lang": "en",
"contributors_enabled": false,
"is_translator": false,
"is_translation_enabled": false,
"profile_background_color": "08ABFC",
"profile_background_image_url": "http://pbs.twimg.com/profile_background_images/444297891977244672/Z1BkfCFB.jpeg",
"profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/444297891977244672/Z1BkfCFB.jpeg",
"profile_background_tile": true,
"profile_image_url": "http://pbs.twimg.com/profile_images/454073282778902529/gCGicDBH_normal.jpeg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/454073282778902529/gCGicDBH_normal.jpeg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/253350311/1392339276",
"profile_link_color": "FA05F2",
"profile_sidebar_border_color": "FFFFFF",
"profile_sidebar_fill_color": "DDEEF6",
"profile_text_color": "333333",
"profile_use_background_image": true,
"default_profile": false,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": null,
"contributors": null,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [],
"symbols": [],
"urls": [],
"user_mentions": []
},
"favorited": false,
"retweeted": false,
"filter_level": "medium",
"lang": "en"
}
So in the final .csv file I have some of the info I mentioned above, but what I would need to add to the .csv file is the "description" part of each entry. Any help would be appreciated!
The problem is probably that JSON is hierarchical and CSV is not. I'm guessing that you are only getting the top-level JSON elements and not the nested objects. For example, if your JSON is:
{
"name": "test",
"author": {
"id": 123,
"created": ""
}
}
you are only getting 'name' and not 'author.id'? If this is the case, check out other questions on SO related to flattening JSON out for CSV, e.g. flattening json to csv format.
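To make the flattening idea concrete, here is a minimal Python sketch. The file names are hypothetical, and it assumes the .json file holds one tweet object per line (adjust the reading step if it is a single JSON array instead); it lifts the nested user.description field into its own CSV column:
import csv
import json

# Hypothetical file names; the input is assumed to hold one tweet object per line.
with open("tweets.json", encoding="utf-8") as src, \
     open("tweets.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    writer.writerow(["id", "screen_name", "text", "description"])  # "description" is the bio
    for line in src:
        if not line.strip():
            continue
        tweet = json.loads(line)
        user = tweet.get("user", {})
        writer.writerow([
            tweet.get("id_str", ""),
            user.get("screen_name", ""),
            tweet.get("text", ""),
            user.get("description", ""),  # nested user field lifted to a top-level column
        ])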
Any good JSON to CSV converter will work; try this one. If there is something funky in the JSON, we need an example of the input JSON and what is getting spit out.
If you just need that one field, enter the following command on the command line:
cat test.json | sed -n 's/.*description\":\"\([^"]*\)\".*/Description, \1/p' > result.csv
Where test.json is the file with all the JSON entries in it.
Here is the output from an example I ran:
cat test.json | sed -n 's/.*description\":\"\([^"]*\)\".*/\1/p'
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
Jazz Personality.G Mentality.
If the file is very large you may need to split it into parts:
split -l N test.json part
Where N is the number of lines per part.

What does Dropbox response (JSON) data look like?

Good day, does anyone know what typical JSON response data looks like when accessing a file? I am mostly interested in whether or not one can check if a response object is a file or a directory!
Thanx mates
This is a typical JSON response when you try to get information about a file/folder.
You can see more information about the request here.
If is_dir is true, it's a folder.
{
"size": "225.4KB",
"rev": "35e97029684fe",
"thumb_exists": false,
"bytes": 230783,
"modified": "Tue, 19 Jul 2011 21:55:38 +0000",
"client_mtime": "Mon, 18 Jul 2011 18:04:35 +0000",
"path": "/Getting_Started.pdf",
"is_dir": false,
"icon": "page_white_acrobat",
"root": "dropbox",
"mime_type": "application/pdf",
"revision": 220823
}
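So a client only needs to look at that one flag. A minimal sketch in Python, using a trimmed copy of the example response above as a stand-in for the body returned by the metadata call:
import json

# Stand-in for the JSON body returned by the metadata call (trimmed example from above).
response_text = '{"size": "225.4KB", "bytes": 230783, "path": "/Getting_Started.pdf", "is_dir": false}'

metadata = json.loads(response_text)
if metadata.get("is_dir"):
    print(metadata["path"], "is a folder")
else:
    print(metadata["path"], "is a file of", metadata["bytes"], "bytes")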