Ruby: Parsing a JSON array with OpenStruct

I'm trying to parse a JSON file with OpenStruct. The JSON file has an array for Skills. When I parse it I get some extra "garbage" in the output. How do I get rid of it?
Here is the JSON file (1.json):
{
  "Job": "My Job 1",
  "Skills": [
    { "Name": "Name 1", "ClusterName": "Cluster Name 1 Skills" },
    { "Name": "Name 2", "ClusterName": "Cluster Name 2 Skills" }
  ]
}
require 'ostruct'
require 'json'

json = File.read('1.json')
job = JSON.parse(json, object_class: OpenStruct)
puts job.Skills
# Output:
#<OpenStruct Name="Name 1", ClusterName="Cluster Name 1 Skills">
#<OpenStruct Name="Name 2", ClusterName="Cluster Name 2 Skills">

If by garbage you mean #<OpenStruct and ">, it is just the way Ruby represents objects when they are printed with puts. It is useful for development and debugging, and it makes it easier to tell the difference between a String, an Array, a Hash and an OpenStruct.
If you just want to display the name and cluster name, and nothing else:
puts job.Job
job.Skills.each do |skill|
  puts skill.Name
  puts skill.ClusterName
end
It outputs:
My Job 1
Name 1
Cluster Name 1 Skills
Name 2
Cluster Name 2 Skills
EDIT:
When you use job = JSON.parse(json, object_class: OpenStruct), your job variable becomes an OpenStruct Ruby object, which has been created from a JSON file.
It doesn't have anything to do with JSON anymore, though: it is not a JSON object, so you cannot just write it back to a .json file and expect it to have the correct syntax.
OpenStruct doesn't seem to work well with to_json, so it might be better to remove object_class: OpenStruct and just work with hashes and arrays.
This code reads 1.json, converts it to a Ruby object, adds a skill, modifies the job name, writes the object to 2.json, and reads 2.json back as JSON to check that everything worked fine:
require 'json'

json = File.read('1.json')
job = JSON.parse(json)
job["Skills"] << {"Name" => "Name 3", "ClusterName" => "Cluster Name 3 Skills"}
job["Job"] += " (modified version)"
File.open('2.json', 'w') { |out| out.puts job.to_json }

require 'pp'
pp JSON.parse(File.read('2.json'))
# {"Job"=>"My Job 1 (modified version)",
# "Skills"=>
# [{"Name"=>"Name 1", "ClusterName"=>"Cluster Name 1 Skills"},
# {"Name"=>"Name 2", "ClusterName"=>"Cluster Name 2 Skills"},
# {"Name"=>"Name 3", "ClusterName"=>"Cluster Name 3 Skills"}]}

Related

Dask how to open json with list of dicts

I'm trying to open a bunch of JSON files using read_json, in order to get a DataFrame like the following:
ddf.compute()
   id      owner   pet_id
0   1  "Charlie"  "pet_1"
1   2  "Charlie"  "pet_2"
3   4    "Buddy"  "pet_3"
but I'm getting the following error
_meta = pd.DataFrame(
    columns=["id", "owner", "pet_id"]
).astype({
    "id": int,
    "owner": "object",
    "pet_id": "object"
})
ddf = dd.read_json("mypets/*.json", meta=_meta)
ddf.compute()
*** ValueError: Metadata mismatch found in `from_delayed`.
My JSON files look like this:
[
  {
    "id": 1,
    "owner": "Charlie",
    "pet_id": "pet_1"
  },
  {
    "id": 2,
    "owner": "Charlie",
    "pet_id": "pet_2"
  }
]
As far as I understand, the problem is that I'm passing a list of dicts, so I'm looking for the right way to specify it in the meta= argument.
PS:
I also tried doing it in the following way:
{
  "id": [1, 2],
  "owner": ["Charlie", "Charlie"],
  "pet_id": ["pet_1", "pet_2"]
}
but Dask is wrongly interpreting the data:
ddf.compute()
       id                   owner              pet_id
0  [1, 2]  ["Charlie", "Charlie"]  ["pet_1", "pet_2"]
1     [4]               ["Buddy"]           ["pet_3"]
The invocation you want is the following:
dd.read_json("data.json", meta=meta,
blocksize=None, orient="records",
lines=False)
which can be largely gleaned from the docstring.
- meta looks OK from your code
- blocksize must be None, since you have a whole JSON object per file and cannot split the file
- orient="records" means a list of objects
- lines=False means this is not a line-delimited JSON file, which is the more common case for Dask (you are not assuming that a newline character means a new record)
So why the error? Probably Dask split your file on some newline character, and so a partial record got parsed, which therefore did not match your given meta.
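To make the distinction concrete, here is a minimal sketch of reading both layouts (the line-delimited path mypets_ldjson is hypothetical); the key point is that a whole-file JSON array cannot be chunked, while line-delimited JSON can:

import dask.dataframe as dd

# One whole JSON array per file: the file cannot be split into blocks.
ddf_array = dd.read_json("mypets/*.json", meta=_meta,
                         blocksize=None, orient="records", lines=False)

# Line-delimited JSON, one object per line (e.g. files written with
# df.to_json(..., orient="records", lines=True)), can be chunked.
ddf_lines = dd.read_json("mypets_ldjson/*.json", meta=_meta,
                         orient="records", lines=True)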

JSON Response printing issue for Python

I am hitting an API to get an ID based on parameters passed from a database; the script below shows only the API part. I am passing 2 columns. If both columns have data in the database, it hits API-1. If only the 2nd column has data, it hits API-2 via API-1. The problem is printing the response, because the two APIs have different structures.
Pretty Structure of API-1:
"body": {
    "type1": {
        "id": id_data,
        "col1": "col1_data",
        "col2": "col2_data"
    }
}
Pretty Structure of API-2:
"body": {
    "id": id_data,
    "col2": "col2_data"
}
Python Code:
print (resp['body']['type1']['id'], resp['body']['type1']['col1'], resp['body']['type1']['col2'])
As you can see, the structure is different: the print works when both parameters are sent, but it fails when only the 2nd column is sent as a parameter.
Create a printer that handles both cases gracefully:
id_data = "whatever"
data1 = {"body": { "type1": { "id": id_data, "col1": "col1_data", "col2": "col2_data"} }}
data2 = { "body": { "id": id_data, "col2": "col2_data"}}
def printit(d):
# try to acces d["body"]["type1"] - fallback to d["body"] - store it
myd = d["body"].get("type1",d["body"])
did = myd.get("id") # get id
col1 = myd.get("col1") # mabye get col1
col2 = myd.get("col2") # get col2
# print what is surely there, check for those that might miss
print(did)
if col1:
print(col1)
print(col2 + "\n")
printit(data1)
printit(data2)
Output:
# data1
whatever
col1_data
col2_data
# data2
whatever
col2_data
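The same fallback trick, dict.get("type1", ...) with the enclosing dict as the default, also works when you need the values rather than printed output. A small sketch building on the data above (the function name extract_fields is illustrative):

def extract_fields(resp):
    # Fall back to resp["body"] itself when there is no "type1" wrapper.
    body = resp["body"].get("type1", resp["body"])
    return body.get("id"), body.get("col1"), body.get("col2")

print(extract_fields(data1))  # ('whatever', 'col1_data', 'col2_data')
print(extract_fields(data2))  # ('whatever', None, 'col2_data')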

How do you print multiple key values from sub keys in a .json file?

I'm pulling a list of AMI IDs from my AWS account, and they are being written to a JSON file.
The JSON looks basically like this:
{
  "Images": [
    {
      "CreationDate": "2017-11-24T11:05:32.000Z",
      "ImageId": "ami-XXXXXXXX"
    },
    {
      "CreationDate": "2017-11-24T11:05:32.000Z",
      "ImageId": "ami-aaaaaaaa"
    },
    {
      "CreationDate": "2017-10-24T11:05:32.000Z",
      "ImageId": "ami-bbbbbbb"
    },
    {
      "CreationDate": "2017-10-24T11:05:32.000Z",
      "ImageId": "ami-cccccccc"
    },
    {
      "CreationDate": "2017-12-24T11:05:32.000Z",
      "ImageId": "ami-ddddddd"
    },
    {
      "CreationDate": "2017-12-24T11:05:32.000Z",
      "ImageId": "ami-eeeeeeee"
    }
  ]
}
My code looks like this so far, after gathering the info and writing it to a .json file locally:
import json

# writes json output to file...
print('writing to response.json...')
with open('response.json', 'w') as outfile:
    json.dump(response, outfile, ensure_ascii=False, indent=4,
              sort_keys=True, separators=(',', ': '))

# searches file...
print('opening response.json...')
with open("response.json") as f:
    file_parsed = json.load(f)
The next part I'm stuck on is how to iterate through the file and print only the CreationDate and ImageId values:
print('printing CreationDate and ImageId...')
for ami in file_parsed['Images']:
    #print ami['CreationDate'] #THIS WORKS
    #print ami['ImageId'] #THIS WORKS
    #print ami['CreationDate']['ImageId']
The last line there gives me this no matter how I have tried it: TypeError: string indices must be integers
My desired output is something like this:
2017-11-24T11:05:32.000Z ami-XXXXXXXX
Ultimately what I'm looking to do is then iterate through entries that are a certain date or older and deregister those AMIs. So would I be converting these to a list or a dict?
Pretty much not a programmer here, so don't drown me.
TIA
You have almost parsed the JSON; for the desired output you just need to concatenate the 'CreationDate' and 'ImageId', like this:
for ami in file_parsed['Images']:
    print(ami['CreationDate'] + " " + ami['ImageId'])
ami['CreationDate'] evaluates to a string, and string indices must be integers, which is why chaining ['CreationDate']['ImageId'] leads to a TypeError. Your other two commented lines, however, were correct.
To check whether a date is older, you can make use of the datetime module. For instance, you can take the CreationDate (which is a string), convert it to a datetime object, create another datetime object for whatever your cutoff date is, and compare the two.
Something to this effect:
from datetime import datetime

def checkIfOlder(isoformat, targetDate):
    parsedDate = datetime.strptime(isoformat, '%Y-%m-%dT%H:%M:%S.%fZ')
    return parsedDate <= targetDate

certainDate = datetime(2017, 11, 30)  # Or whichever date you want
So in your for loop:
for ami in file_parsed['Images']:
    creationDate = ami['CreationDate']
    if checkIfOlder(creationDate, certainDate):
        pass  # write code to deregister AMIs here
Resources that would help are Python's datetime documentation and, in particular, the strftime/strptime directives. HTH!
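Putting the pieces together, a minimal end-to-end sketch could look like the following. It assumes the AMIs are managed with boto3 (whose EC2 client exposes deregister_image); the cutoff date is arbitrary:

import json
from datetime import datetime

import boto3  # assumption: AMIs are deregistered through boto3

cutoff = datetime(2017, 11, 30)  # arbitrary cutoff date
ec2 = boto3.client('ec2')

with open('response.json') as f:
    file_parsed = json.load(f)

for ami in file_parsed['Images']:
    created = datetime.strptime(ami['CreationDate'], '%Y-%m-%dT%H:%M:%S.%fZ')
    if created <= cutoff:
        print('deregistering ' + ami['ImageId'])
        ec2.deregister_image(ImageId=ami['ImageId'])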

How to Change a value in a Dataframe based on a lookup from a json file

I want to practice building models, and I figured I'd do it with something I am familiar with: League of Legends. I'm having trouble replacing an integer in a dataframe with a value from a JSON file.
The datasets I'm using come from Kaggle. You can grab them and run this yourself:
https://www.kaggle.com/datasnaek/league-of-legends
I have a JSON file of the form (it's actually much bigger, but I shortened it):
{
  "type": "champion",
  "version": "7.17.2",
  "data": {
    "1": {
      "title": "the Dark Child",
      "id": 1,
      "key": "Annie",
      "name": "Annie"
    },
    "2": {
      "title": "the Berserker",
      "id": 2,
      "key": "Olaf",
      "name": "Olaf"
    }
  }
}
and a dataframe of the form:
print df
   gameDuration  t1_champ1id
0          1949            1
1          1851            2
2          1493            1
3          1758            1
4          2094            2
I want to replace the ID in t1_champ1id with the lookup value from the JSON.
If both of these were dataframes, I could use merge.
This is what I've tried. I don't know if this is the best way to read in the json file.
import pandas
df = pandas.read_csv("lol_file.csv", header=0)
champ = pandas.read_json("champion_info.json", typ='series')
for i in champ.data[0]:
    for j in df:
        if df.loc[j, ('t1_champ1id')] == i:
            df.loc[j, ('t1_champ1id')] = champ[0][i]['name']
I get the below error:
the label [gameDuration] is not in the [index]
I'm not sure that this is the most efficient way to do this, but I'm not sure how to do it at all either.
What do y'all think?
Thanks!
for j in df: iterates over the column names in df, which is not what you want here, since you're only looking to match against the column 't1_champ1id'. A better use of pandas functionality is to condense the id:name pairs from your JSON file into a dictionary, and then map it onto df['t1_champ1id']:
import json

with open('champion_info.json') as f:
    champ_json = json.load(f)

player_names = {v['id']: v['name'] for v in champ_json['data'].itervalues()}
df.loc[:, 't1_champ1id'] = df['t1_champ1id'].map(player_names)

#    gameDuration t1_champ1id
# 0          1949       Annie
# 1          1851        Olaf
# 2          1493       Annie
# 3          1758       Annie
# 4          2094        Olaf
Alternatively, create a dataframe from the 'data' in the json file (transpose the resulting dataframe, then set the index to what you want to map on, the id) and map that onto the original df:
import json
import pandas as pd

with open('champion_info.json') as data_file:
    champ_json = json.load(data_file)

champs = pd.DataFrame(champ_json['data']).T
champs.set_index('id', inplace=True)
df['champ_name'] = df.t1_champ1id.map(champs['name'])
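The Kaggle file has several champion-id columns, not just t1_champ1id. Assuming they follow the same naming pattern (the column list below is illustrative), either mapping can be applied in a loop:

champ_cols = ['t1_champ1id', 't1_champ2id', 't2_champ1id']  # illustrative subset
for col in champ_cols:
    df[col] = df[col].map(player_names)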

Transform JSON to hashmap in Ruby

I am trying to convert a JSON file, which contains objects and arrays, into a flat hashmap.
Below is the JSON file
{
  "localbusiness": {
    "name": "toto",
    "phone": "+11234567890"
  },
  "date": "05/02/2016",
  "time": "5:00pm",
  "count": "4",
  "userInfo": {
    "name": "John Doe",
    "phone": "+10987654321",
    "email": "john.doe@unknown.com",
    "userId": "user1234333"
  }
}
My goal is to save this in a database such as MongoDB. I would like to use map to get something like:
localbusiness_name => "toto",
localbusiness_phone => "+11234567890",
date => "05/02/2016",
...
userInfo_name => "John Doe"
...
I have tried map, but it's not splitting the array of localbusiness or userInfo:
def format_entry
  ps = @params.map do |h|
    ps.merge!(h)
    # @logger.info("entry #{h}")
  end
  # @logger.info("formatting the data #{ps}")
  ps
end
I do not really know how to parse each entry and rebuild the names.
It looks to me like you are trying to "flatten" the inner hashes into one big hash. "Flatten" is not quite the word, because you want to prepend the hash's key to the sub-hash's keys. This requires looping through the hash, and then looping again through each sub-hash. The code example below only works if the nesting is 1 layer deep; if you have multiple layers, I would suggest making two methods, or a recursive method (a recursive sketch follows at the end of this answer).
@business = { # This is a hash, not a json blob, but you can take json and call JSON.parse(blob) to turn it into a hash.
  "localbusiness": {
    "name": "toto",
    "phone": "+11234567890"
  },
  "date": "05/02/2016",
  "time": "5:00pm",
  "count": "4",
  "userInfo": {
    "name": "John Doe",
    "phone": "+10987654321",
    "email": "john.doe@unknown.com",
    "userId": "user1234333"
  }
}

@squashed_business = Hash.new
@business.each do |k, v|
  if v.is_a? Hash
    v.each do |key, value|
      @squashed_business.merge!((k.to_s + "_" + key.to_s) => value)
    end
  else
    @squashed_business.merge!(k => v)
  end
end
I noticed that you are getting "unexpected" outcomes when enumerating over a hash with @params.each { |h| ... }, because each entry yields both a key and a value. Instead you want @params.each { |key, value| ... }, as I did in the above code example.
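The recursive method mentioned above follows the same pattern: recurse whenever a value is itself a hash, threading the accumulated key prefix through. Here is a sketch of the algorithm (written in Python purely for illustration; a Ruby version would have the same shape):

def flatten(d, prefix=""):
    # Recursively squash nested dicts, joining keys with "_".
    out = {}
    for k, v in d.items():
        key = prefix + "_" + str(k) if prefix else str(k)
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out

# flatten({"userInfo": {"name": "John Doe"}}) returns {"userInfo_name": "John Doe"}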