Parse JSON object in SAS macro - json

Here is the input JSON file. It has to be parsed into a SAS dataset.
"results":
[
{
"acct_nbr": 1234,
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021"
}
}
,
{
"acct_nbr": 3456,
"firstName": "Sam",
"lastName": "Jones",
"age": 32,
"address": {
"streetAddress": "25 2nd Street",
"city": "New Jersy",
"state": "NJ",
"postalCode": "10081"
}
}
]
I want the output to contain only the address fields, in a SAS dataset like this:
ACCT_NBR  FIELD_NAME     FIELD_VALUE
1234      streetAddress  21 2nd Street
1234      city           New York
1234      state          NY
1234      postalCode     10021
3456      streetAddress  25 2nd Street
3456      city           New Jersy
3456      state          NJ
3456      postalCode     10081
I have tried several approaches, but none of them produced similar output.
I even tried SCANOVER (from a PDF ...), but I still cannot get the desired output.
Here is my code and its output:
LIBNAME src '/home/user/read_JSON';
filename data '/home/user/read_JSON/test2.json';
data src.testdata2;
infile data lrecl = 32000 truncover scanover;
input @'"streetAddress": "' streetAddress $255. @'"city": "' city $255. @'"state": "' state $2. @'"postalCode": "' postalCode $255.;
streetAddress = substr(streetAddress,1,index(streetAddress,'",')-2);
city = substr( city,1,index( city,'",')-2);
state = substr(state,1,index(state,'",')-2);
postalCode = substr(postalCode,1,index(postalCode,'",')-2);
run;
proc print data=src.testdata2;
RUN;
My OUTPUT in .lst file
The SAS System 09:44 Tuesday, January 14, 2014 1
       street                          postal
Obs    Address         city      state  Code
  1    21 2nd Stree    New Yor    NY    10021"
  2    25 2nd Stree    New Jers   NJ    10081"

To answer your question with a SAS-only solution, there are two things to fix:
Use SCAN instead of SUBSTR to extract the value without the trailing comma and quotation mark.
acct_nbr is a number, so its search string in the INPUT statement must leave out the final quotation mark.
Here's the corrected code (I changed the directories; you'll need to change them back):
filename data 'c:\temp\json.txt';
data testdata2;
infile data lrecl = 32000 truncover scanover;
input
@'"acct_nbr": ' acct_nbr $255.
@'"streetAddress": "' streetAddress $255.
@'"city": "' city $255.
@'"state": "' state $2.
@'"postalCode": "' postalCode $255.;
acct_nbr=scan(acct_nbr,1,',"');
streetAddress = scan(streetAddress,1,',"');
city = scan(city,1,',"');
state = scan(state,1,',"');
postalCode = scan(postalCode,1,',"');
run;
proc print data=testdata2;
RUN;

You can use proc groovy to parse JSON pretty easily (assuming you know Groovy). This SAS blog on authenticating to Twitter shows a detailed example of how to do it; here are some of the highlights.
This assumes you have the Groovy JAR files (http://groovy.codehaus.org/Download) and a way to output the files (the example uses OpenCSV).
The below is my attempt at it; I don't think it quite works, but I don't know Groovy, either. The general concept should be correct. If you want to try this approach, but can't figure out the specifics of this, you might either retag your question groovy or ask a new question with that tag.
%let groovydir=C:\Program Files\SASHome_9.4\SASFoundation\9.4\groovy; *the location the groovy JARs are located at;
%let sourcefile=c:\temp\json.txt;
%let outfile=c:\temp\json.csv;
proc groovy classpath="&groovydir.\groovy-all-2.2.0.jar;&groovydir.\opencsv-2.3.jar";
submit "&sourcefile" "&outfile";
import groovy.json.*
import au.com.bytecode.opencsv.CSVWriter
def input = new File(args[0]).text
def output = new JsonSlurper().parseText(input)
def csvoutput = new FileWriter(args[1])
CSVWriter writer = new CSVWriter(csvoutput);
String[] header = new String[8];
header[0] = "results.acct_nbr";
header[1] = "results.firstName";
header[2] = "results.lastName";
header[3] = "results.age";
header[4] = "results.address.streetAddress";
header[5] = "results.address.city";
header[6] = "results.address.state";
header[7] = "results.address.postalCode";
writer.writeNext(header);
output.results.each {
String[] content = new String[8];
content[0] = it.acct_nbr.toString();
content[1] = it.firstName.toString();
content[2] = it.lastName.toString();
content[3] = it.age.toString();
content[4] = it.address.streetAddress.toString();
content[5] = it.address.city.toString();
content[6] = it.address.state.toString();
content[7] = it.address.postalCode.toString();
writer.writeNext(content)
}
writer.close();
endsubmit;
quit;

I used this json file and above code as an example in a thread on sas.com. One of the expert programmers on there was extremely generous and came up with a solution. Note the json file should be wrapped in "{}".
Link: https://communities.sas.com/thread/72163
Code:
filename cp temp;
proc groovy classpath=cp;
add classpath="C:\Program Files\Java\groovy-2.3.4\embeddable\groovy-all-2.3.4.jar";
/*or*/
/*
add classpath="C:\Program Files\Java\groovy-2.3.4\lib\groovy-2.3.4.jar";
add classpath="C:\Program Files\Java\groovy-2.3.4\lib\groovy-json-2.3.4.jar";
*/
submit parseonly;
import groovy.json.JsonSlurper
class MyJsonParser {
def parseFile(path) {
def jsonFile = new File(path)
def jsonText = jsonFile.getText()
def InputJSON = new JsonSlurper().parseText(jsonText)
def accounts = []
InputJSON.results.each{
accounts << [
acct_nbr : it.acct_nbr.toString(),
firstName : it.firstName,
lastName : it.lastName,
age : it.age.toString(),
streetAddress : it.address.streetAddress,
city : it.address.city,
state : it.address.state,
postalCode : it.address.postalCode
]
}
return accounts
}
}
endsubmit;
submit parseonly;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
public class MyJsonParser4Sas {
public String filename = "";
public void init() {
MyJsonParser myParser = new MyJsonParser();
accounts = myParser.parseFile(filename);
iter = accounts.iterator();
}
public boolean hasNext() {
return iter.hasNext();
}
public void getNext() {
account = ((LinkedHashMap) (iter.next()));
}
public String getString(String k) {
return account.get(k);
}
protected ArrayList accounts;
protected Iterator iter;
protected LinkedHashMap account;
}
endsubmit;
quit;
options set=classpath "%sysfunc(pathname(cp,f))";
data accounts;
attrib id label="Account Index" length= 8
acct_nbr label="Account Number" length=$ 10
firstName label="First Name" length=$ 20
lastName label="Last Name" length=$ 30
age label="Age" length=$ 3
streetAddress label="Street Address" length=$ 128
city label="City" length=$ 40
state label="State" length=$ 2
postalCode label="Postal Code" length=$ 5;
dcl javaobj accounts("MyJsonParser4Sas");
accounts.exceptiondescribe(1);
accounts.setStringField("filename", "C:\\foo.json");
accounts.callVoidMethod("init");
accounts.callBooleanMethod("hasNext",rc);
do id=1 by 1 while(rc);
accounts.callVoidMethod("getNext");
accounts.callStringMethod("getString", "acct_nbr", acct_nbr);
accounts.callStringMethod("getString", "firstName", firstName);
accounts.callStringMethod("getString", "lastName", lastName);
accounts.callStringMethod("getString", "age", age);
accounts.callStringMethod("getString", "streetAddress", streetAddress);
accounts.callStringMethod("getString", "city", city);
accounts.callStringMethod("getString", "state", state);
accounts.callStringMethod("getString", "postalCode", postalCode);
output;
accounts.callBooleanMethod("hasNext",rc);
end;
drop rc;
run;
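For comparison outside SAS, the reshaping asked for in the question (one row per account number and address field) can be sketched in Python with only the standard library. This is an illustrative sketch, not part of the answers above: the function name `address_rows` is hypothetical, and it assumes the JSON has been wrapped in "{}" as noted, so that "results" is a top-level key.

```python
import json

# Sample input from the question, wrapped in {} so that it is valid JSON (assumption).
SAMPLE = """
{"results": [
  {"acct_nbr": 1234, "firstName": "John", "lastName": "Smith", "age": 25,
   "address": {"streetAddress": "21 2nd Street", "city": "New York",
               "state": "NY", "postalCode": "10021"}},
  {"acct_nbr": 3456, "firstName": "Sam", "lastName": "Jones", "age": 32,
   "address": {"streetAddress": "25 2nd Street", "city": "New Jersy",
               "state": "NJ", "postalCode": "10081"}}
]}
"""


def address_rows(json_text):
    """Return (acct_nbr, field_name, field_value) tuples, one per address field."""
    rows = []
    for account in json.loads(json_text)["results"]:
        # Only the nested "address" object is unpacked; other fields are ignored.
        for field_name, field_value in account["address"].items():
            rows.append((account["acct_nbr"], field_name, field_value))
    return rows


if __name__ == "__main__":
    for row in address_rows(SAMPLE):
        print(*row)
```

Each tuple corresponds to one line of the ACCT_NBR / FIELD_NAME / FIELD_VALUE layout, so the result could be written to a delimited file and read into SAS with a simple INPUT statement.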

Related

How do I iterate variables to pass into a http header params for further iteration

I am calling two JSON APIs: search and extended profile.
The first one gives me search results for profiles. Each profile found has a "memberid" number, ps['id'].
I want to iterate over those member IDs and pass each one to the second API to get the extended profile information for each member. The member IDs have to be passed in profile_params. As it is now, only one memberid is stored and passed, so I get only the first extended profile instead of all the profiles from the search.
My code is like this:
# Search for profiles
search_response = requests.post('https://api_search_for_profiles', headers=search_headers, data=search_params)
search_json = json.dumps(search_response.json(), indent=2)
search_data = json.loads(search_json)
memberid = []
for ps in (search_data['data']['content']):
    memberid = str(ps['id']) # These memberid's I want to pass all found to the profile_params.
    print('UserID: ' + str(ps['roomNo']))
    print('MemberID: ' + str(ps['id']))
    print('Username: ' + ps['nickName'])
# Extended profile info
profile_headers = {
'x-auth-token': f'{token}',
'Content-Type': 'application/x-www-form-urlencoded',
'User-Agent': 'okhttp/3.11.0',
}
profile_params = {
'id': '',
'token': f'{token}',
'memberId': f'{memberid}', # where I want each memberid from the search to go
'roomNo': ''
}
profile_response = requests.post('https://api_extended_profile_information', headers=profile_headers, data=profile_params)
profile_json = json.dumps(profile_response.json(), indent=2)
profile_data = json.loads(profile_json)
pfd = profile_data['data'] # main data
userid = str(pfd['roomNo'])
username = pfd['nickName']
gender = str(pfd['gender'])
level = str(pfd['memberLevel'])
# Here I will iterate through each profiles with the corresponding memberid and print.
The json output for search is like this, snippet:
{
"code": 0,
"data": {
"content": [
{
"id": 1359924,
"memberLevel": 1,
"nickName": "akuntesting dgt",
"roomNo": 1820031
},
{
"id": 2607179,
"memberLevel": 1,
"nickName": "testingsyth",
"roomNo": 3299390
},
# ... and so on
Assuming the post request takes only one memberid, the following is a simplified version of your code designed to handle only the issue of multiple memberids. Starting here:
memberids = []
for ps in (search_data['data']['content']):
    memberid = str(ps['id'])
    memberids.append(memberid)
for memberid in memberids:
    profile_params = {'memberId': memberid}
    profile_response = requests.post('https://api_extended_profile_information', headers=profile_headers, data=profile_params)
    # the rest of your code goes here inside the loop
Try it and let me know if it works.
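The pattern in this answer, one request payload per member ID, can be sketched without any network calls as shown here. The helper name `build_profile_params` and the sample token are hypothetical; the field names (`id`, `token`, `memberId`, `roomNo`) are taken from the question's `profile_params`.

```python
def build_profile_params(search_data, token):
    """Build one extended-profile payload per member found by the search."""
    payloads = []
    for ps in search_data["data"]["content"]:
        payloads.append({
            "id": "",
            "token": token,
            "memberId": str(ps["id"]),  # a different member ID in every payload
            "roomNo": "",
        })
    return payloads


# Sample shaped like the search response snippet in the question.
search_data = {
    "code": 0,
    "data": {
        "content": [
            {"id": 1359924, "memberLevel": 1,
             "nickName": "akuntesting dgt", "roomNo": 1820031},
            {"id": 2607179, "memberLevel": 1,
             "nickName": "testingsyth", "roomNo": 3299390},
        ]
    },
}

payloads = build_profile_params(search_data, token="example-token")
for params in payloads:
    # In the real script, each payload would be sent with its own
    # requests.post(...) call inside this loop.
    print(params["memberId"])
```

Building the payloads first keeps the ID handling separate from the HTTP calls, which makes it easy to check that every member from the search is covered before any request is sent.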

How to format json in a file using Groovy

I have a question in regards to formatting a file so that it displays a Json output to the correct format.
At the moment the code I have below imports a json into a file but when I open the file, it displays the json in a single line (word wrap unticked) like so:
{"products":[{"type":null,"information":{"description":"Hotel Parque La Paz (One Bedroom apartment) (Half Board) [23/05/2017 00:00:00] 7 nights","items":{"provider Company":"Juniper","provider Hotel ID":"245","provider Hotel Room ID":"200"}},"costGroups":[{"name":null,"costLines":[{"name":"Hotel Cost","search":null,"quote":234.43,"quotePerAdult":null,"quotePerChild":null}
I want to format the json in the file so that it looks like actual json formatting like so:
{
"products": [
{
"type": null,
"information": {
"description": "Hotel Parque La Paz (One Bedroom apartment) (Half Board) [23/05/2017 00:00:00] 7 nights",
"items": {
"provider Company": "Juniper",
"provider Hotel ID": "245",
"provider Hotel Room ID": "200"
}
},
"costGroups": [
{
"name": null,
"costLines": [
{
"name": "Hotel Cost",
"search": null,
"quote": 234.43,
"quotePerAdult": null,
"quotePerChild": null
}
In other words, each key should sit on its own line together with its value.
What is the best way to get correctly formatted JSON in the file?
Below is the code:
def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context)
def dataFolder = groovyUtils.projectPath +"//Log Data//"
def response = testRunner.testCase.getTestStepByName("GET_Pricing{id}").getProperty("Response").getValue();
def jsonFormat = (response).toString()
def fileName = "Logged At - D" +date+ " T" +time+ ".txt"
def logFile = new File(dataFolder + fileName)
// checks if a current log file exists if not then prints to logfile
if(logFile.exists())
{
log.info("Error: a file named " + fileName + " already exists")
}
else
{
logFile.write "Date Stamp: " +date+ " " + time + "\n" + jsonFormat //response
If you have a modern version of groovy, you can do:
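// JsonOutput lives in the groovy.json package; write prettyJson to the file instead of jsonFormat
def prettyJson = groovy.json.JsonOutput.prettyPrint(jsonFormat)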

How to scrape the text by categories and make a json file?

We scrape the website www.theft-alerts.com. Now we get all the text.
connection = urllib2.urlopen('http://www.theft-alerts.com')
soup = BeautifulSoup(connection.read().replace("<br>","\n"), "html.parser")
theftalerts = []
for sp in soup.select("table div.itemspacingmodified"):
    for wd in sp.select("div.itemindentmodified"):
        text = wd.text
        if not text.startswith("Images :"):
            print(text)
with open("theft-alerts.json", 'w') as outFile:
    json.dump(theftalerts, outFile, indent=2)
Output:
STOLEN : A LARGE TAYLORS OF LOUGHBOROUGH BELL
Stolen from Bromyard on 7 August 2014
Item : The bell has a diameter of 37 1/2" is approx 3' tall weighs just shy of half a ton and was made by Taylor's of Loughborough in 1902. It is stamped with the numbers 232 and 11.
The bell had come from Co-operative Wholesale Society's Crumpsall Biscuit Works in Manchester.
Any info to : PC 2361. Tel 0300 333 3000
Messages : Send a message
Crime Ref : 22EJ / 50213D-14
No of items stolen : 1
Location : UK > Hereford & Worcs
Category : Shop, Pub, Church, Telephone Boxes & Bygones
ID : 84377
User : 1 ; Antique/Reclamation/Salvage Trade ; (Administrator)
Date Created : 11 Aug 2014 15:27:57
Date Modified : 11 Aug 2014 15:37:21;
How can we categorize the text for the JSON file? The JSON file is currently empty.
Output JSON:
[]
You can define a list and append all dictionary objects that you create to the list. e.g:
import json
theftalerts = []
# Create a new dictionary for each alert; reusing one dictionary would make
# every list entry point to the same object.
atheftobject = {}
atheftobject['location'] = 'UK > Hereford & Worcs'
atheftobject['category'] = 'Shop, Pub, Church, Telephone Boxes & Bygones'
theftalerts.append(atheftobject)
atheftobject = {}
atheftobject['location'] = 'UK'
atheftobject['category'] = 'Shop'
theftalerts.append(atheftobject)
with open("theft-alerts.json", 'w') as outFile:
    json.dump(theftalerts, outFile, indent=2, sort_keys=True)
After this run, theft-alerts.json will contain this JSON array:
[
  {
    "category": "Shop, Pub, Church, Telephone Boxes & Bygones",
    "location": "UK > Hereford & Worcs"
  },
  {
    "category": "Shop",
    "location": "UK"
  }
]
You can adapt this to generate your own JSON object.
Check out the json module.
Your JSON output remains empty because your loop doesn't append to the list.
Here's how I would extract the category name:
theftalerts = []
for sp in soup.select("table div.itemspacingmodified"):
    item_text = "\n".join(
        [wd.text for wd in sp.select("div.itemindentmodified")
         if not wd.text.startswith("Images :")])
    category = sp.find(
        'span', {'class': 'itemsmall'}).text.split('\n')[1][11:]
    theftalerts.append({'text': item_text, 'category': category})
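If the goal is one JSON object per alert with a key for each labelled line, the "Label : value" lines in the printed output can be split into a dictionary. The following is a sketch under assumptions: the helper name `parse_alert` is hypothetical, the label set is taken only from the sample output in the question, and lines without a known label are collected under a `text` key.

```python
import json

# Labels seen in the sample output; extend as needed (assumption).
LABELS = ("Location", "Category", "ID", "Crime Ref", "Date Created")


def parse_alert(text):
    """Split the text of one alert into labelled fields plus free text."""
    alert = {"text": []}
    for line in text.splitlines():
        label, sep, value = line.partition(" : ")
        if sep and label.strip() in LABELS:
            alert[label.strip()] = value.strip()
        elif line.strip():
            # Unlabelled or unknown lines are kept as free text.
            alert["text"].append(line.strip())
    return alert


sample = """STOLEN : A LARGE TAYLORS OF LOUGHBOROUGH BELL
Stolen from Bromyard on 7 August 2014
Crime Ref : 22EJ / 50213D-14
Location : UK > Hereford & Worcs
Category : Shop, Pub, Church, Telephone Boxes & Bygones
ID : 84377"""

theftalerts = [parse_alert(sample)]
print(json.dumps(theftalerts, indent=2))
```

In the scraper, `parse_alert` would be called on the joined text of each `div.itemspacingmodified` block and the result appended to `theftalerts` before the list is dumped.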

Parse nested JSON in Flask

I have a REST API endpoint in which I need to parse incoming nested JSON of the format:
site: {
id: '37251',
site_name: 'TestSite',
address: {
'address': '1234 Blaisdell Ave',
'city': 'Minneapolis',
'state': 'MN',
'zip': '55456',
'neighborhood': 'Kingfield',
'county': 'Hennepin',
},
geolocation: {
latitude : '41.6544',
longitude : '73.3322',
accuracy: '45'
}
}
into the following SQLAlchemy classes:
Site:
class Site(db.Model):
    __tablename__ = 'site'
    id = Column(Integer, primary_key=True, autoincrement=True)
    site_name = Column(String(80))  # does site have a formal name
    address_id = Column(Integer, ForeignKey('address.id'))
    address = relationship("Address", backref=backref("site", uselist=False))
    geoposition_id = Column(Integer, ForeignKey('geoposition.id'))
    geoposition = relationship("Geoposition", backref=backref("site", uselist=False))
    evaluations = relationship("Evaluation", backref="site")
    site_maintainers = relationship("SiteMaintainer", backref="site")
Address (a Site has one Address):
class Address(db.Model):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True, autoincrement=True)
    address = Column(String(80))
    city = Column(String(80))
    state = Column(String(2))
    zip = Column(String(5))
    neighborhood = Column(String(80))
    county = Column(String(80))
and Geoposition (a Site has one Geoposition):
class Geoposition(db.Model):
    __tablename__ = 'geoposition'
    id = Column(Integer, primary_key=True, autoincrement=True)
    site_id = Column(Integer)
    latitude = Column(Float(20))
    longitude = Column(Float(20))
    accuracy = Column(Float(20))
    timestamp = Column(DateTime)
Getting the SQLAlchemy data into JSON is easy, but I need to parse the JSON from my request so that I can append/update data that is sent via POST to the RESTful API. I know how to handle non-nested JSON, but I will be the first to admit that I am clueless in dealing with the nested JSON for records that belong to multiple tables in a relational structure.
I've tried searching high and low for this without any luck. Closest I could find is here "Nested validation with the flask-restful RequestParser", but this is not clicking for what I need to do based on my nested structure.
Cool! Looks like Flask handles this through the request handler:
With this JSON:
site: {
"id": "37251",
"site_name": "TestSite",
"address": {
"address": "1234 Blaisdell Ave",
"city": "Minneapolis",
"state": "MN",
"zip": "55456",
"neighborhood": "Kingfield",
"county": "Hennepin"
},
geolocation: {
latitude : "41.6544",
longitude : "73.3322",
accuracy: "45"
}
}
Sent to this endpoint:
@app.route('/api/resource', methods=['GET','POST','OPTIONS'])
@cross_origin() # allow all origins all methods
@auth.login_required
def get_resource():
    # Get the parsed contents of the form data
    json = request.json
    print(json)
    # Render template
    return jsonify(json)
I get the following object:
{u'site': {u'geolocation': {u'latitude': u'41.6544', u'longitude': u'73.3322', u'accuracy': u'45'}, u'site_name': u'TestSite', u'id': u'37251', u'address': {u'city': u'Minneapolis', u'neighborhood': u'Kingfield', u'zip': u'55456', u'county': u'Hennepin', u'state': u'MN', u'address': u'1234 Blaisdell Ave'}}}
UPDATE:
Am able to access all my dictionary items just fine using this code for a test in my endpoint:
# print entire object
print json['site']
# define dictionary item for entire object
site = json['site']
print site["site_name"]
# print address object
print site['address']
# define address dictionary object
address = json['site']['address']
print address["address"]
# define geolocation dictionary object
geolocation = json['site']['geolocation']
print geolocation["accuracy"]
In retrospect, this seems rather trivial now. I hope this helps someone in the future.
Do you have access and can edit the JSON?
A few edits would help make it a valid JSON:
Use double quotes for keys and values
Open and close JSON with { and }
Delete the trailing , in the county line
If you can do so, your JSON will look like this:
{
"site": {
"id": "37251",
"site_name": "TestSite",
"address": {
"address": "1234BlaisdellAve",
"city": "Minneapolis",
"state": "MN",
"zip": "55456",
"neighborhood": "Kingfield",
"county": "Hennepin"
},
"geolocation": {
"latitude": "41.6544",
"longitude": "73.3322",
"accuracy": "45"
}
}
}
Sorting that out, use Python's json to deal with it:
import json
# Open the file in a with block so it is closed after reading
with open('test.json', 'r') as file_handler:
    parsed_data = json.loads(file_handler.read())
print(parsed_data)
The output is a dictionary that you can easily iterate over to validate your data:
{u'site': {u'geolocation': {u'latitude': u'41.6544', u'longitude': u'73.3322', u'accuracy': u'45'}, u'site_name': u'TestSite', u'id': u'37251', u'address': {u'city': u'Minneapolis', u'neighborhood': u'Kingfield', u'zip': u'55456', u'county': u'Hennepin', u'state': u'MN', u'address': u'1234BlaisdellAve'}}}
But if you can't edit your JSON to make its syntax better, json.loads would not parse it…
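Once the payload is valid JSON, splitting it into per-table pieces is ordinary dictionary access. The sketch below is an illustration under assumptions, not the poster's code: the function name `split_site_payload` is hypothetical, and the plain dictionaries it returns stand in for the values one might hand to the `Site`, `Address`, and `Geoposition` models from the question.

```python
import json

# Valid-JSON version of the payload from the question.
PAYLOAD = """
{"site": {"id": "37251", "site_name": "TestSite",
  "address": {"address": "1234 Blaisdell Ave", "city": "Minneapolis",
              "state": "MN", "zip": "55456",
              "neighborhood": "Kingfield", "county": "Hennepin"},
  "geolocation": {"latitude": "41.6544", "longitude": "73.3322",
                  "accuracy": "45"}}}
"""


def split_site_payload(json_text):
    """Return (site_fields, address_fields, geoposition_fields) dictionaries."""
    site = json.loads(json_text)["site"]
    address_fields = dict(site.get("address", {}))
    # The Geoposition columns are Float, so convert the string values.
    geoposition_fields = {
        key: float(value) for key, value in site.get("geolocation", {}).items()
    }
    site_fields = {"site_name": site["site_name"]}
    return site_fields, address_fields, geoposition_fields


site_fields, address_fields, geoposition_fields = split_site_payload(PAYLOAD)
print(site_fields, address_fields, geoposition_fields)
```

With SQLAlchemy, these dictionaries could then be unpacked into the model constructors, for example `Address(**address_fields)`; that step and the session handling are not shown here.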

Parsing JSON from Google Distance Matrix API with Corona SDK

So I'm trying to pull data from a JSON string (as seen below). When I decode the JSON using the code below, and then attempt to index the duration text, I get a nil return. I have tried everything and nothing seems to work.
Here is the Google Distance Matrix API JSON:
{
"destination_addresses" : [ "San Francisco, CA, USA" ],
"origin_addresses" : [ "Seattle, WA, USA" ],
"rows" : [
{
"elements" : [
{
"distance" : {
"text" : "1,299 km",
"value" : 1299026
},
"duration" : {
"text" : "12 hours 18 mins",
"value" : 44303
},
"status" : "OK"
}]
}],
"status" : "OK"
}
And here is my code:
local json = require ("json")
local http = require("socket.http")
local myNewData1 = {}
local SaveData1 = function (event)
distanceReturn = ""
distance = ""
local URL1 = "http://maps.googleapis.com/maps/api/distancematrix/json?origins=Seattle&destinations=San+Francisco&mode=driving&&sensor=false"
local response1 = http.request(URL1)
local data2 = json.decode(response1)
if response1 == nil then
native.showAlert( "Data is nill", { "OK"})
print("Error1")
distanceReturn = "Error1"
elseif data2 == nill then
distanceReturn = "Error2"
native.showAlert( "Data is nill", { "OK"})
print("Error2")
else
for i = 1, #data2 do
print("Working")
print(data2[i].rows)
for j = 1, #data2[i].rows, 1 do
print("\t" .. data2[i].rows[j])
for k = 1, #data2[i].rows[k].elements, 1 do
print("\t" .. data2[i].rows[j].elements[k])
for g = 1, #data2[i].rows[k].elements[k].duration, 1 do
print("\t" .. data2[i].rows[k].elements[k].duration[g])
for f = 1, #data2[i].rows[k].elements[k].duration[g].text, 1 do
print("\t" .. data2[i].rows[k].elements[k].duration[g].text)
distance = data2[i].rows[k].elements[k].duration[g].text
distanceReturn = data2[i].rows[k].elements[k].duration[g].text
end
end
end
end
end
end
timer.performWithDelay (100, SaveData1, 999999)
Your loops are not correct. Try this shorter solution.
Replace your entire "for i = 1, #data2 do" loop with the one below:
print("Working")
for i,row in ipairs(data2.rows) do
    for j,element in ipairs(row.elements) do
        print(element.duration.text)
    end
end
This question was solved on Corona Forums by Rob Miracle (http://forums.coronalabs.com/topic/47319-parsing-json-from-google-distance-matrix-api/?hl=print_r#entry244400). The solution is simple:
"JSON and Lua tables are almost identical data structures. In this case your table data2 has top level entries:
data2.destination_addresses
data2.origin_addresses
data2.rows
data2.status
Now data2.rows is another table that is indexed by numbers (the [] brackets). Here there is only one of them, but it's still an array entry:
data2.rows[1]
Then inside of it is another numerically indexed table called elements.
So to get to the element (again, there is only one of them):
data2.rows[1].elements[1]
then it's just accessing the remaining elements:
data2.rows[1].elements[1].distance.text
data2.rows[1].elements[1].distance.value
data2.rows[1].elements[1].duration.text
data2.rows[1].elements[1].duration.value
There is a great table printing function called print_r which can be found in the community code which is great for dumping tables like this to see their structure."