Writing JSON children in R

I have a data set that I would like to group in JSON.
                    address         city.x state.x latitude.x longitude.x
1 5601 W. Slauson Ave. #200    Culver City      CA   33.99718  -118.40145
2                 PO 163005         Austin      TX   30.31622   -97.85877
3 10215 W. Jamesburg Street        Wichita      KS   37.70063   -97.43430
4         14556 Newport Ave         Tustin      CA   33.74165  -117.82127
5      2496 Falcon Crescent Virginia Beach      VA   36.83840   -76.02862
6   1306 Wilshire Boulevard   Santa Monica      CA   34.03216  -118.49022
I would like to group together address and lat/long and put it all under the category of company.
I would like it to look like this:
{company: {address: {address: "5601 W. Slauson Ave. #200" ,
city.x: "Culver City" ,
state.x: "CA"}},
{geo: {latitude: "33.99718",
longitude: "-118.40145"}}},
{company: {address: {address: "PO 163005" ,
city.x: "Austin" ,
state.x: "TX"}},
{geo: {latitude: "30.31622",
longitude: "-97.85877"}}},
structure(list(address = c("5601 W. Slauson Ave. #200", "PO 163005",
"10215 W. Jamesburg Street", "14556 Newport Ave", "2496 Falcon Crescent",
"1306 Wilshire Boulevard"), city.x = c("Culver City", "Austin",
"Wichita", "Tustin", "Virginia Beach", "Santa Monica"), state.x = c("CA",
"TX", "KS", "CA", "VA", "CA"), latitude.x = c(33.997179, 30.316223,
37.700632, 33.741651, 36.838398, 34.032159), longitude.x = c(-118.40145,
-97.85877, -97.4343, -117.82127, -76.02862, -118.49022)), .Names = c("address",
"city.x", "state.x", "latitude.x", "longitude.x"), class = "data.frame", row.names = c(NA,
6L))
Any help would be appreciated!

The following code should output what you want:
for (i in 1:nrow(df)) {
  cat("{company:{address:{address:\t\"", df$address[i],
      "\",\n\t\tcity.x:\t\"", df$city.x[i],
      "\",\n\t\tstate.x:\t\"", df$state.x[i],
      "\"}}\n\t{geo:{\tlatitude: \"", df$latitude.x[i],
      "\",\n\t\tlongitude: \"", df$longitude.x[i],
      "\"}}},\n", sep = "")
}
with df as your data frame.

Another option is to use the rjson package.
require(rjson)
# This is necessary to avoid duplication of labels in the JSON output
names(df) <- NULL
reshaped <- apply(df, 1, FUN = function(x) {
  list(address = list(address = x[1],
                      city    = x[2],
                      state   = x[3]),
       coords  = list(latitude  = x[4],
                      longitude = x[5]))
})
result <- toJSON(reshaped)
The only difference from what you requested is that instead of "company" as the root, each element will be keyed by a sequential number. You could change that by changing the row names of your data (using rownames), but R does not support duplicate row names... the closest I got was using
rownames(df) <- paste("company", 1:nrow(df), sep="")
and with a little regex magic you could strip the numbers from the output string...

Related

Read JSON to pandas dataframe - Getting ValueError: Mixing dicts with non-Series may lead to ambiguous ordering

I am trying to read the JSON structure below into a pandas dataframe, but it throws this error message:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
Json data:
'''
{
"Name": "Bob",
"Mobile": 12345678,
"Boolean": true,
"Pets": ["Dog", "cat"],
"Address": {
"Permanent Address": "USA",
"Current Address": "UK"
},
"Favorite Books": {
"Non-fiction": "Outliers",
"Fiction": {"Classic Literature": "The Old Man and the Sea"}
}
}
'''
How do I get this right? I have tried the script below...
'''
j_df = pd.read_json('json_file.json')
j_df
with open(j_file) as jsonfile:
    data = json.load(jsonfile)
'''
Read the JSON from the file first, then pass it to json_normalize and explode the Pets list:
import json
import pandas as pd

with open('json_file.json') as data_file:
    data = json.load(data_file)

df = pd.json_normalize(data).explode('Pets').reset_index(drop=True)
print(df)
  Name    Mobile  Boolean Pets Address.Permanent Address  \
0  Bob  12345678     True  Dog                       USA
1  Bob  12345678     True  cat                       USA

  Address.Current Address Favorite Books.Non-fiction  \
0                      UK                   Outliers
1                      UK                   Outliers

  Favorite Books.Fiction.Classic Literature
0                   The Old Man and the Sea
1                   The Old Man and the Sea
EDIT: To write the values into a sentence, select the needed columns, drop duplicates, convert to a NumPy array and loop:
for x, y in df[['Name','Favorite Books.Fiction.Classic Literature']].drop_duplicates().to_numpy():
    print(f"{x}’s favorite classic literature book is {y}.")
Bob’s favorite classic literature book is The Old Man and the Sea.

How to convert multiple json objects interpreted as string into Json dictionary

I have my data as below, which looks like multiple JSON dictionaries, but it is of type string. Can someone please help me convert it into JSON dictionaries?
{"id": "1305857561179152385", "tweet": "If you like vintage coke machines and guys who look like Fred Flintstone you'll love the short we've riffed: Coke R\u2026 ", "ts": "Tue Sep 15 13:14:38 +0000 2020"}{"id": "1305858267067883521", "tweet": "Chinese unicorn Genki Forest plots own beverage hits #China #Chinese #Brands #GoingGlobal\u2026 ", "ts": "Tue Sep 15 13:17:27 +0000 2020"}{"id": "1305858731293507585", "tweet": "RT #CinemaCheezy: If you like vintage coke machines and guys who look like Fred Flintstone you'll love the short we've riffed: Coke Refresh\u2026", "ts": "Tue Sep 15 13:19:17 +0000 2020"}
Try this:
let = "{'a': 'b', 'c': 'd'}{'e':'f', 'g':'h'}"
let_list = let.split('}')
d = []
for i in let_list[:-1]:
    # put back the closing brace that split() removed, then evaluate the literal
    val = eval(i + '}')
    d.append(val)
The output will be two dictionaries:
print(d)
# Will print as shown
[{'a': 'b', 'c': 'd'}, {'e': 'f', 'g': 'h'}]
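If you would rather not run eval on text you did not produce yourself, ast.literal_eval is a safer drop-in here, since the fragments are plain Python literals. A minimal sketch based on the snippet above:
from ast import literal_eval

let = "{'a': 'b', 'c': 'd'}{'e':'f', 'g':'h'}"
d = []
for i in let.split('}')[:-1]:
    # literal_eval only accepts Python literals, so arbitrary code in the string cannot run
    d.append(literal_eval(i + '}'))

print(d)  # [{'a': 'b', 'c': 'd'}, {'e': 'f', 'g': 'h'}]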
Another option is the json module. Note that json.loads parses a single JSON document, so calling it directly on the concatenated string raises an "Extra data" error:
import json
json.loads(json_str)  # fine for a single object, fails on the concatenated string
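One way around that, still without eval, is json.JSONDecoder.raw_decode, which parses one object at a time and tells you where it stopped. A minimal sketch, assuming json_str holds the concatenated string from the question:
import json

def split_json_objects(s):
    """Decode back-to-back JSON objects from one string into a list of dicts."""
    decoder = json.JSONDecoder()
    objects, pos = [], 0
    while pos < len(s):
        # skip any whitespace between objects
        while pos < len(s) and s[pos].isspace():
            pos += 1
        if pos >= len(s):
            break
        obj, pos = decoder.raw_decode(s, pos)
        objects.append(obj)
    return objects

tweets = split_json_objects(json_str)   # json_str is the concatenated string above
print(tweets[0]["id"])                  # 1305857561179152385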

Ruby: Extract from deeply nested JSON structure based on multiple criteria

I want to select every marketId whose marketName == 'Moneyline', but only those where countryCode is 'US' or 'GB', or where eventName.include?(' # ') (space before and after the #). I tried different combinations of map and select, but some nodes don't have a countryCode, which complicates things for me. This is a sample of what the source might look like:
{"currencyCode"=>"GBP",
"eventTypes"=>[
{"eventTypeId"=>7522,
"eventNodes"=>[
{"eventId"=>28024331,
"event"=>
{"eventName"=>"EWE Baskets Oldenburg v PAOK Thessaloniki BC"
},
"marketNodes"=>[
{"marketId"=>"1.128376755",
"description"=>
{"marketName"=>"Moneyline"}
},
{"marketId"=>"1.128377853",
"description"=>
{"marketName"=>"Start Lublin +7.5"}
}}}]},
{"eventId"=>28023434,
"event"=>
{"eventName"=>"Asseco Gdynia v Start Lublin",
"countryCode"=>"PL",
},
"marketNodes"=>
[{"marketId"=>"1.128377853", ETC...
Based on this previous answer, you just need to add a select on eventNodes:
require 'json'

json = File.read('data.json')
hash = JSON.parse(json)

moneyline_market_ids = hash["eventTypes"].map{|type|
  type["eventNodes"].select{|event_node|
    ['US', 'GB'].include?(event_node["event"]["countryCode"]) || event_node["event"]["eventName"].include?(' # ')
  }.map{|event|
    event["marketNodes"].select{|market|
      market["description"]["marketName"] == 'Moneyline'
    }.map{|market|
      market["marketId"]
    }
  }
}.flatten

puts moneyline_market_ids.join(', ')
#=> 1.128255531, 1.128272164, 1.128255516, 1.128272159, 1.128278718, 1.128272176, 1.128272174, 1.128272169, 1.128272148, 1.128272146, 1.128255464, 1.128255448, 1.128272157, 1.128272155, 1.128255499, 1.128272153, 1.128255484, 1.128272150, 1.128255748, 1.128272185, 1.128278720, 1.128272183, 1.128272178, 1.128255729, 1.128360712, 1.128255371, 1.128255433, 1.128255418, 1.128255403, 1.128255387
If you want to keep the country code and name information with the id:
moneyline_market_ids = hash["eventTypes"].map{|type|
  type["eventNodes"].map{|event_node|
    [event_node, event_node["event"]["countryCode"], event_node["event"]["eventName"]]
  }.select{|_, country, event_name|
    ['US', 'GB'].include?(country) || event_name.include?(' # ')
  }.map{|event, country, event_name|
    event["marketNodes"].select{|market|
      market["description"]["marketName"] == 'Moneyline'
    }.map{|market|
      [market["marketId"], country, event_name]
    }
  }
}.flatten(2)

require 'pp'
pp moneyline_market_ids
#=> [["1.128255531", "US", "Philadelphia # Seattle"],
# ["1.128272164", "US", "Arkansas # Mississippi State"],
# ["1.128255516", "US", "New England # San Francisco"],
# ["1.128272159", "US", "Indiana # Michigan"],
# ["1.128278718", "CA", "Edmonton # Ottawa"],
# ["1.128272176", "US", "Arizona State # Washington"],
# ["1.128272174", "US", "Alabama A&M # Auburn"],
# ...

How to scrape the text by categories and make a json file?

We are scraping the website www.theft-alerts.com. At the moment we get all of the text:
import json
import urllib2
from bs4 import BeautifulSoup

connection = urllib2.urlopen('http://www.theft-alerts.com')
soup = BeautifulSoup(connection.read().replace("<br>", "\n"), "html.parser")

theftalerts = []
for sp in soup.select("table div.itemspacingmodified"):
    for wd in sp.select("div.itemindentmodified"):
        text = wd.text
        if not text.startswith("Images :"):
            print(text)

with open("theft-alerts.json", 'w') as outFile:
    json.dump(theftalerts, outFile, indent=2)
Output:
STOLEN : A LARGE TAYLORS OF LOUGHBOROUGH BELL
Stolen from Bromyard on 7 August 2014
Item : The bell has a diameter of 37 1/2" is approx 3' tall weighs just shy of half a ton and was made by Taylor's of Loughborough in 1902. It is stamped with the numbers 232 and 11.
The bell had come from Co-operative Wholesale Society's Crumpsall Biscuit Works in Manchester.
Any info to : PC 2361. Tel 0300 333 3000
Messages : Send a message
Crime Ref : 22EJ / 50213D-14
No of items stolen : 1
Location : UK > Hereford & Worcs
Category : Shop, Pub, Church, Telephone Boxes & Bygones
ID : 84377
User : 1 ; Antique/Reclamation/Salvage Trade ; (Administrator)
Date Created : 11 Aug 2014 15:27:57
Date Modified : 11 Aug 2014 15:37:21;
How can we categorize the text for the JSON file? The JSON file is currently empty.
Output JSON:
[]
You can define a list and append each dictionary object that you create to the list, e.g.:
import json

theftalerts = []

atheftobject = {}
atheftobject['location'] = 'UK > Hereford & Worcs'
atheftobject['category'] = 'Shop, Pub, Church, Telephone Boxes & Bygones'
theftalerts.append(atheftobject)

# start a new dict for the next record; reusing the same dict would make
# both list entries point at the same object
atheftobject = {}
atheftobject['location'] = 'UK'
atheftobject['category'] = 'Shop'
theftalerts.append(atheftobject)

with open("theft-alerts.json", 'w') as outFile:
    json.dump(theftalerts, outFile, indent=2)
After this run, theft-alerts.json will contain this JSON array:
[
  {
    "category": "Shop, Pub, Church, Telephone Boxes & Bygones",
    "location": "UK > Hereford & Worcs"
  },
  {
    "category": "Shop",
    "location": "UK"
  }
]
You can play with this to generate your own JSON objects. Check out the json module for details.
Your JSON output remains empty because your loop doesn't append to the list.
Here's how I would extract the category name:
theftalerts = []
for sp in soup.select("table div.itemspacingmodified"):
    item_text = "\n".join(
        [wd.text for wd in sp.select("div.itemindentmodified")
         if not wd.text.startswith("Images :")])
    category = sp.find(
        'span', {'class': 'itemsmall'}).text.split('\n')[1][11:]
    theftalerts.append({'text': item_text, 'category': category})
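If you also want each record split into labeled fields rather than stored as one text blob, you could additionally parse the "Label : value" lines visible in the output above. This is only a sketch: the field layout is assumed from that sample output, and it reuses the theftalerts list built in the previous snippet:
import json

def parse_item(item_text):
    """Split one item's text into a field dict based on 'Label : value' lines."""
    record = {}
    for line in item_text.splitlines():
        if " : " in line:
            key, value = line.split(" : ", 1)
            record[key.strip()] = value.strip()
        elif line.strip():
            # lines without a separator (e.g. free-text descriptions) are collected
            record.setdefault("description", []).append(line.strip())
    return record

structured = [dict(parse_item(item["text"]), category=item["category"])
              for item in theftalerts]

with open("theft-alerts.json", "w") as out_file:
    json.dump(structured, out_file, indent=2)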

Parse JSON object in SAS macro

Here is the input JSON file. It has to be parsed into a SAS dataset.
"results":
[
{
"acct_nbr": 1234,
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021"
}
}
,
{
"acct_nbr": 3456,
"firstName": "Sam",
"lastName": "Jones",
"age": 32,
"address": {
"streetAddress": "25 2nd Street",
"city": "New Jersy",
"state": "NJ",
"postalCode": "10081"
}
}
]
And I want the output for only the address fields in a SAS dataset, like this:
ACCT_NBR   FIELD_NAME      FIELD_VALUE
1234       streetAddress   21 2nd Street
1234       city            New York
1234       state           NY
1234       postalCode      10021
3456       streetAddress   25 2nd Street
3456       city            New Jersy
3456       state           NJ
3456       postalCode      10081
I have tried a separate approach, but got nothing like the desired output.
I even tried scanover, following a PDF example, but could not get the desired output either.
Here is my code and output:
LIBNAME src '/home/user/read_JSON';
filename data '/home/user/read_JSON/test2.json';

data src.testdata2;
  infile data lrecl = 32000 truncover scanover;
  input #'"streetAddress": "' streetAddress $255.
        #'"city": "' city $255.
        #'"state": "' state $2.
        #'"postalCode": "' postalCode $255.;
  streetAddress = substr(streetAddress,1,index(streetAddress,'",')-2);
  city = substr(city,1,index(city,'",')-2);
  state = substr(state,1,index(state,'",')-2);
  postalCode = substr(postalCode,1,index(postalCode,'",')-2);
run;

proc print data=src.testdata2;
run;
My output in the .lst file:
                     The SAS System      09:44 Tuesday, January 14, 2014   1

                   street                     postal
Obs               Address       city   state  Code

  1          21 2nd Stree    New Yor      NY  10021"
  2          25 2nd Stree   New Jers      NJ  10081"
To answer your question with a SAS-only solution, your problems are twofold:
Use SCAN instead of SUBSTR to get the portion without the trailing comma/quotation mark.
acct_nbr is a number, so you need to drop the final quotation mark from its search string in the input statement.
Here's the corrected code (I changed the directories, so you'll need to change them back):
filename data 'c:\temp\json.txt';

data testdata2;
  infile data lrecl = 32000 truncover scanover;
  input
    #'"acct_nbr": ' acct_nbr $255.
    #'"streetAddress": "' streetAddress $255.
    #'"city": "' city $255.
    #'"state": "' state $2.
    #'"postalCode": "' postalCode $255.;
  acct_nbr = scan(acct_nbr,1,',"');
  streetAddress = scan(streetAddress,1,',"');
  city = scan(city,1,',"');
  state = scan(state,1,',"');
  postalCode = scan(postalCode,1,',"');
run;

proc print data=testdata2;
run;
You can use proc groovy to parse JSON pretty easily (assuming you know Groovy). This SAS blog post on authenticating to Twitter shows a detailed example of how to do it; here are some of the highlights.
This assumes you have the Groovy JAR files (http://groovy.codehaus.org/Download) and a way to output the files (the example uses OpenCSV).
The below is my attempt at it; I don't think it quite works, but I don't know Groovy either. The general concept should be correct. If you want to try this approach but can't figure out the specifics, you might either retag your question groovy or ask a new question with that tag.
%let groovydir=C:\Program Files\SASHome_9.4\SASFoundation\9.4\groovy; *the location of the groovy JARs;
%let sourcefile=c:\temp\json.txt;
%let outfile=c:\temp\json.csv;

proc groovy classpath="&groovydir.\groovy-all-2.2.0.jar;&groovydir.\opencsv-2.3.jar";
  submit "&sourcefile" "&outfile";
    import groovy.json.*
    import au.com.bytecode.opencsv.CSVWriter

    def input = new File(args[0]).text
    def output = new JsonSlurper().parseText(input)
    def csvoutput = new FileWriter(args[1])
    CSVWriter writer = new CSVWriter(csvoutput);

    String[] header = new String[8];
    header[0] = "results.acct_nbr";
    header[1] = "results.firstName";
    header[2] = "results.lastName";
    header[3] = "results.age";
    header[4] = "results.address.streetAddress";
    header[5] = "results.address.city";
    header[6] = "results.address.state";
    header[7] = "results.address.postalCode";
    writer.writeNext(header);

    // iterate over the top-level "results" array; each element carries the account fields directly
    output.results.each {
      String[] content = new String[8];
      content[0] = it.acct_nbr.toString();
      content[1] = it.firstName.toString();
      content[2] = it.lastName.toString();
      content[3] = it.age.toString();
      content[4] = it.address.streetAddress.toString();
      content[5] = it.address.city.toString();
      content[6] = it.address.state.toString();
      content[7] = it.address.postalCode.toString();
      writer.writeNext(content)
    }
    writer.close();
  endsubmit;
quit;
I used this JSON file and the above code as an example in a thread on sas.com. One of the expert programmers there was extremely generous and came up with a solution. Note that the JSON file should be wrapped in "{}".
Link: https://communities.sas.com/thread/72163
Code:
filename cp temp;

proc groovy classpath=cp;
  add classpath="C:\Program Files\Java\groovy-2.3.4\embeddable\groovy-all-2.3.4.jar";
  /* or */
  /*
  add classpath="C:\Program Files\Java\groovy-2.3.4\lib\groovy-2.3.4.jar";
  add classpath="C:\Program Files\Java\groovy-2.3.4\lib\groovy-json-2.3.4.jar";
  */

  submit parseonly;
    import groovy.json.JsonSlurper

    class MyJsonParser {
      def parseFile(path) {
        def jsonFile = new File(path)
        def jsonText = jsonFile.getText()
        def InputJSON = new JsonSlurper().parseText(jsonText)
        def accounts = []
        InputJSON.results.each {
          accounts << [
            acct_nbr      : it.acct_nbr.toString(),
            firstName     : it.firstName,
            lastName      : it.lastName,
            age           : it.age.toString(),
            streetAddress : it.address.streetAddress,
            city          : it.address.city,
            state         : it.address.state,
            postalCode    : it.address.postalCode
          ]
        }
        return accounts
      }
    }
  endsubmit;

  submit parseonly;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.LinkedHashMap;

    public class MyJsonParser4Sas {
      public String filename = "";

      public void init() {
        MyJsonParser myParser = new MyJsonParser();
        accounts = myParser.parseFile(filename);
        iter = accounts.iterator();
      }

      public boolean hasNext() {
        return iter.hasNext();
      }

      public void getNext() {
        account = ((LinkedHashMap) (iter.next()));
      }

      public String getString(String k) {
        return account.get(k);
      }

      protected ArrayList accounts;
      protected Iterator iter;
      protected LinkedHashMap account;
    }
  endsubmit;
quit;

options set=classpath "%sysfunc(pathname(cp,f))";

data accounts;
  attrib id            label="Account Index"  length=  8
         acct_nbr      label="Account Number" length=$ 10
         firstName     label="First Name"     length=$ 20
         lastName      label="Last Name"      length=$ 30
         age           label="Age"            length=$ 3
         streetAddress label="Street Address" length=$ 128
         city          label="City"           length=$ 40
         state         label="State"          length=$ 2
         postalCode    label="Postal Code"    length=$ 5;

  dcl javaobj accounts("MyJsonParser4Sas");
  accounts.exceptiondescribe(1);
  accounts.setStringField("filename", "C:\\foo.json");
  accounts.callVoidMethod("init");
  accounts.callBooleanMethod("hasNext", rc);

  do id=1 by 1 while(rc);
    accounts.callVoidMethod("getNext");
    accounts.callStringMethod("getString", "acct_nbr", acct_nbr);
    accounts.callStringMethod("getString", "firstName", firstName);
    accounts.callStringMethod("getString", "lastName", lastName);
    accounts.callStringMethod("getString", "age", age);
    accounts.callStringMethod("getString", "streetAddress", streetAddress);
    accounts.callStringMethod("getString", "city", city);
    accounts.callStringMethod("getString", "state", state);
    accounts.callStringMethod("getString", "postalCode", postalCode);
    output;
    accounts.callBooleanMethod("hasNext", rc);
  end;

  drop rc;
run;