Parsing, Extracting & Returning JSON as Hash

I am trying to make a localized version of this app: SMS Broadcast Ruby App
I have been able to get the JSON data from a local file, open it, and sanitize the numbers. However, I have been unable to extract the values and pair them as a scrubbed hash. Here's what I have so far.
def data_from_spreadsheet
  file = open(spreadsheet_url).read
  JSON.parse(file)
end
def contacts_from_spreadsheet
  contacts = {}
  data_from_spreadsheet.each do |entry|
    puts entry['name']['number']
    contacts[sanitize(number)] = name
  end
  contacts
end
Here's the JSON data sample I'm working with.
[
  {
    "name": "Michael",
    "number": 9045555555
  },
  {
    "name": "Natalie",
    "number": 7865555555
  }
]
Here's how I would like the data to be expressed after the contacts_from_spreadsheet method runs.
{
  '19045555555' => 'Michael',
  '17865555555' => 'Natalie'
}
Any help would be much appreciated.

You could create an array of single-pair hashes using map and then call reduce to merge them into a single hash.
data = [
  {
    "name": "Michael",
    "number": 9045555555
  },
  {
    "name": "Natalie",
    "number": 7865555555
  }
]
data.map { |e| { e[:number] => e[:name] } }.reduce(Hash.new, :merge)
Result: {9045555555=>"Michael", 7865555555=>"Natalie"}
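If you also want the keys as sanitized strings, as in your desired output, here is a minimal sketch (assuming your existing sanitize helper prefixes the country code and returns a string):
contacts = data.each_with_object({}) do |e, acc|
  # build the hash in one pass instead of merging intermediate hashes
  acc[sanitize(e[:number].to_s)] = e[:name]
end
#=> {'19045555555' => 'Michael', '17865555555' => 'Natalie'}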

You don't seem to have number or name extracted in any way, so first you'll need to update your code to get those details. That is, if entry is a parsed JSON object, you can do the following:
def contacts_from_spreadsheet
  contacts = {}
  data_from_spreadsheet.each do |entry|
    contacts[sanitize(entry['number'])] = entry['name']
  end
  contacts
end
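Given the sample data above, and again assuming sanitize prefixes the country code and returns a string, this yields:
{'19045555555' => 'Michael', '17865555555' => 'Natalie'}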

Not really keeping this function within JSON, but I have solved the problem. (YAML is, for practical purposes, a superset of JSON, which is why YAML.load can read the file directly.) Here's what I used:
def data_from_spreadsheet
  file = open(spreadsheet_url).read
  YAML.load(file)
end

def contacts_from_spreadsheet
  contacts = {}
  data_from_spreadsheet.each do |entry|
    name = entry['name']
    number = entry['phone_number'].to_s
    contacts[sanitize(number)] = name
  end
  contacts
end
This returned a clean hash:
{"+19045555555"=>"Michael", "+17865555555"=>"Natalie"}
Thanks everyone who added input!

Related

Iteration of a JSON array to display in a Flask API - single entry displayed?

I'd appreciate some help with the following Python problem. I have JSON that looks like this:
"gene": {
"entrezGeneId": 7157,
"entrezGeneSymbol": "TP53"
},
"termAssoc": [
{
"ontologyId": "HP:0100592",
"name": "Peritoneal abscess",
"definition": "The presence of an abscess of the peritoneum."
},
{
"ontologyId": "HP:0002756",
"name": "Pathologic fracture",
"definition": "A pathologic fracture occurs when a bone breaks in an area that is weakened secondarily to another disease process such as tumor, infection, and certain inherited bone disorders. A pathologic fracture can occur without a degree of trauma required to cause fracture in healthy bone."
},
{
"ontologyId": "HP:0005513",
"name": "Increased megakaryocyte count",
"definition": "Increased megakaryocyte number, i.e., of platelet precursor cells, present in the bone marrow."
},
{
"ontologyId": "HP:0004396",
"name": "Poor appetite",
"definition": ""
},
I am building a simple Flask API where an input is submitted and specific entries are extracted. I want to extract the ontologyId values from the termAssoc array. I tried iterating:
class HPO_Class(Resource):
    def get(self, Entrez_ID):
        result = url_func(Entrez_ID)
        for ID in result['termAssoc']:
            data = ID['ontologyId']
        return data
But that returns only one entry (see the screenshot of the API response). I understand that this is a JSON array and that I am missing something, but I am stumped. I tried the following:
class HPO_Class(Resource):
    def get(self, Entrez_ID):
        result = url_func(Entrez_ID)
        for ID in result['termAssoc']:
            data = ID
        return data
But that returns
{
  "ontologyId": "HP:0003010",
  "name": "Prolonged bleeding time",
  "definition": "Prolongation of the time taken for a standardized skin cut of fixed depth and length to stop bleeding."
}
Only a single entry appears. I have no idea what to do and am hoping for some insight into my lack of understanding of this problem :) Thank you
BTW url_func is just a function to build the URL and send a GET request to the API. The code is below if it helps:
import requests, sys, json
from pprint import pprint
import re

# Function to create the url for the HPO API; gene_id = Entrez ID
def url_func(gene_id):
    base_url = 'https://hpo.jax.org/api/hpo/gene'
    ext = gene_id
    url = '/'.join([base_url, ext])  # url is joined at the / and consists of base_url and ext
    # GET request using the get function from requests; url is the argument
    r = requests.get(url, headers={"Content-Type": "application/json"})
    # request check; if not valid the function will exit from python
    if not r.ok:
        r.raise_for_status()
        sys.exit()
    return r.json()  # returns results as json
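For example (a sketch; it assumes the HPO API is reachable and uses the Entrez ID from the sample above):
pprint(url_func("7157"))  # prints the gene/termAssoc JSON for TP53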
Thank you for all your help!! :)
For anyone keeping track: I found the answer in the end. The original code kept only a single value and returned it, instead of accumulating the entries in a list. Here is how I wrote the function:
def get(self, Entrez_ID):
    result = url_func(Entrez_ID)
    data = []
    for ID in result['termAssoc']:
        data.append(ID.get('ontologyId'))
    return data
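As an aside, the same loop can be written as a list comprehension (a sketch, equivalent to the loop above):
data = [term.get('ontologyId') for term in result['termAssoc']]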
That solved it, and it returns a list of the entries needed:
[
"HP:0000822",
"HP:0002894",
"HP:0006491",
"HP:0045040",
"HP:0100787",
"HP:0002910",
"HP:0030392",
"HP:0002254",
"HP:0002861",
"HP:0004808",
"HP:0002863",
"HP:0200063",
"HP:0030448",
"HP:0100526",
"HP:0100576",
"HP:0001433",
"HP:0002756",
"HP:0100592",
"HP:0006744",
"HP:0004389",
"HP:0000952",
"HP:0009919",
"HP:0002891",
"HP:0002013",
"HP:0003010",
"HP:0012639",
"HP:0002797",
"HP:0002027",
"HP:0100743",
"HP:0004322",
"HP:0006572",
"HP:0003003",
"HP:0000739",
"HP:0100768",
"HP:0001738",
"HP:0004374",
"HP:0030406",
"HP:0005584",
"HP:0001065",
"HP:0001962",
"HP:0025269",
"HP:0100659",
"HP:0002488",
"HP:0000975",
"HP:0011875",
"HP:0002859",
"HP:0003002",
"HP:0003118",
"HP:0200022",
"HP:0002890",
"HP:0100605",
"HP:0001824",
"HP:0000238",
"HP:0002039",
"HP:0004324",
"HP:0001263",
"HP:0006753",
"HP:0001428",
"HP:0001909",
"HP:0006489",
"HP:0000819",
"HP:0011027",
"HP:0004936",
"HP:0002750",
"HP:0012125",
"HP:0006725",
"HP:0011974",
"HP:0012410",
"HP:0000135",
"HP:0012030",
"HP:0001413",
"HP:0005561",
"HP:0025435",
"HP:0100006",
"HP:0000252",
"HP:0100543",
"HP:0001945",
"HP:0003401",
"HP:0012288",
"HP:0002017",
"HP:0000029",
"HP:0000737",
"HP:0012539",
"HP:0001402",
"HP:0002669",
"HP:0000006",
"HP:0001276",
"HP:0002018",
"HP:0003155",
"HP:0410067",
"HP:0001250",
"HP:0001658",
"HP:0005249",
"HP:0002667",
"HP:0001872",
"HP:0025380",
"HP:0500022",
"HP:0009726",
"HP:0002315",
"HP:0004420",
"HP:0005513",
"HP:0003110",
"HP:0004313",
"HP:0002888",
"HP:0012126",
"HP:0000080",
"HP:0001085",
"HP:0000505",
"HP:0012334",
"HP:0010982",
"HP:0030078",
"HP:0001425",
"HP:0001744",
"HP:0002665",
"HP:0030348",
"HP:0002664",
"HP:0000859",
"HP:0003466",
"HP:0004396",
"HP:0012531",
"HP:0006716",
"HP:0000944",
"HP:0003418",
"HP:0001903",
"HP:0012174",
"HP:0012189",
"HP:0025134",
"HP:0001386",
"HP:0025436",
"HP:0002326",
"HP:0002900",
"HP:0100630",
"HP:0001939",
"HP:0011748",
"HP:0010788",
"HP:0007378",
"HP:0100749",
"HP:0002896",
"HP:0001324",
"HP:0002716",
"HP:0006740",
"HP:0006721",
"HP:0009592",
"HP:0012740",
"HP:0012432",
"HP:0025318",
"HP:0100615",
"HP:0002885",
"HP:0000998",
"HP:0030070"
]
Cheers

Python glom with a list of records: group common client_ids together as keys

I just discovered glom, and the tutorial makes sense, but I can't figure out the right spec to use on Chrome BrowserHistory.json entries to create a data structure grouped by client_id, or whether this is even the right use of glom. I think I could accomplish this with other methods by looping over the JSON, but I was hoping to learn more about glom and its capabilities.
The JSON has a Browser_History key with a list entry for each history item, as follows:
{
  "Browser_History": [
    {
      "favicon_url": "https://www.google.com/favicon.ico",
      "page_transition": "LINK",
      "title": "Google Takeout",
      "url": "https://takeout.google.com",
      "client_id": "abcd1234",
      "time_usec": 1424794867875291
    },
...
I'd like a data structure where everything is grouped by client_id, with each client_id as the key to a list of dicts, something like:
{ 'client_ids': {
    'abcd1234': [ {
        "title": "Google Takeout",
        "url": "https://takeout.google.com",
        ...
      },
      ...
    ],
    'wxyz9876': [ {
        "title": "Google",
        "url": "https://www.google.com",
        ...
      },
      ...
    ]
  }
}
Is this something glom is suited for? I've been playing around with it and reading, but I can't seem to get the spec correct to accomplish what I need. Best I've got without error is:
with open(history_json) as f:
    history_list = json.load(f)['Browser_History']

spec = {
    'client_ids': ['client_id']
}
pprint(glom(history_list, spec))
which gets me a list of all the client_ids, but I can't figure out how to group them together as keys rather than have them in one big list. Any help would be appreciated, thanks!
This should do the trick, although I'm not sure it is the most "glom"-ic way to achieve it. Note that the key inside each entry (client_id) differs from the key you want in the output (client_ids), so the two are kept separate below.
import glom

ENTRY_KEY = "client_id"    # key present in each history entry
OUTPUT_KEY = "client_ids"  # key for the grouped result

def group_combine(existing, incoming):
    # existing is a dictionary used for accumulating the data;
    # incoming is each item in the list (your input)
    if ENTRY_KEY in incoming:
        existing.setdefault(incoming[ENTRY_KEY], []).append(incoming)
    return existing

data = {'Browser_History': [{}]}  # your data structure
fold_spec = glom.Fold(glom.T, init=dict, op=group_combine)
results = glom.glom(data["Browser_History"], {OUTPUT_KEY: fold_spec})
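For comparison, here is a sketch of the same grouping in plain Python (no glom; it assumes history_list as loaded in the question):
from collections import defaultdict

grouped = defaultdict(list)
for entry in history_list:
    # group each history entry under its client_id
    grouped[entry['client_id']].append(entry)
results = {'client_ids': dict(grouped)}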

Convert an array of hashes to a simple hash with Logstash

I am trying to use Logstash to transform this:
"attributes": [
{
"value": "ValueA",
"name": "NameA"
},
{
"value": "ValueB",
"name": "NameB"
},
{
"value": "ValueC",
"name": "NameC"
}
],
To this:
"attributes": {
"NameA": "ValueA",
"NameB": "ValueB",
"NameC": "ValueC"
}
Any recommendations?
I don't want to split this list into more records...
I have found the solution. For anyone dealing with a similar problem, here is the solution and a short story...
In the beginning, I tried this:
ruby {
  code => '
    xx = event.get("[path][to][data_source]")
    event.set(["_my_destination"], Hash[xx.collect { |p| [p[:name], p[:value]] }])
  '
}
But it returned an error, because the set method only accepts a string (a field reference) as its first argument.
So I tried to do it this way:
ruby {
  code => '
    event.get("[path][to][data_source]").each do |item|
      k = item[:name]
      event.set("[_my_destination][#{k}]", item[:value])
    end
  '
}
I spent a lot of time debugging it because it works everywhere except in Logstash :-D (most likely because Logstash event data is keyed by strings rather than Ruby symbols, so item[:name] comes back nil). After some grumbling, I finally fixed it. The solution with commentary is as follows.
ruby {
  code => '
    i = 0  # need an index to address each hash in the array
    event.get("[json_msg][mail][headers]").each do |item|
      # need to use the event.get function to get the values
      k = event.get("[json_msg][mail][headers][#{i}][name]")
      v = event.get("[json_msg][mail][headers][#{i}][value]")
      # now it is simple
      event.set("[json_msg][headers][#{k}]", v)
      i += 1
    end
  '
}
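A shorter variant of the same filter (an untested sketch; it relies on event data being string-keyed, which would make the per-index event.get calls unnecessary):
ruby {
  code => '
    event.get("[json_msg][mail][headers]").each do |item|
      # item is a string-keyed hash, so item["name"] works where item[:name] did not
      event.set("[json_msg][headers][#{item["name"]}]", item["value"])
    end
  '
}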
I think you should be able to do it with a custom Ruby script - see the ruby filter. You'll use the Event API to manipulate the event.
Maybe the aggregate filter could also be used, but a Ruby script with one loop seems more straightforward to me.

How do I access this JSON API data in Ruby?

I am writing a short Ruby program that is going to take a zipcode and return the names of cities within 2 miles of that zipcode. I successfully called an API and was able to parse the JSON data, but I'm unsure how to access the 'city' key.
url = API call (not going to replicate here since it requires a key)
uri = URI(url)
response = Net::HTTP.get(uri)
JSON.parse(response)
Here's what my JSON looks like.
{
  "results": [
    {
      "zip": "08225",
      "city": "Northfield",
      "county": "Atlantic",
      "state": "NJ",
      "distance": "0.0"
    },
    {
      "zip": "08221",
      "city": "Linwood",
      "county": "Atlantic",
      "state": "NJ",
      "distance": "1.8"
    }
  ]
}
I've been trying to access 'city' like this:
response['result'][0]['city']
This appears to be incorrect. Also tried
response[0][0]['city']
And a couple of other permutations of the same code.
How can I get the value 'Northfield' out of the JSON data?
You're almost there: use results instead of result, and index into the return value of JSON.parse(response) rather than into the raw response string:
JSON.parse(response)["results"][0]["city"]
#=> "Northfield"
JSON.parse will create a hash; then you can target results, which is an array of hashes, like so:
hash = JSON.parse(response)
hash['results'].select{|h| h['city'] == 'Northfield'}
Or if you only care about the results:
array = JSON.parse(response)['results']
array.select { |a| a['city'] == 'Northfield' }
To get just a single data point from the data, you might select one item in the array and then the key of the value you want:
array[0]['city']
For all the cities:
cities = array.map { |h| h['city'] }
You have a typo: it should be response['results'] (note the s), not response['result']. Also note that JSON.parse returns string keys by default; you only get symbol keys like :results and :city if you parse with symbolize_names: true:
parsed = JSON.parse(response, symbolize_names: true)
After parsing you will get a hash whose :results key holds an array of hashes, i.e.
[{:zip=>"08225", :city=>"Northfield", :county=>"Atlantic", :state=>"NJ", :distance=>"0.0"}, {:zip=>"08221", :city=>"Linwood", :county=>"Atlantic", :state=>"NJ", :distance=>"1.8"}]
And if you want to fetch the values of the :city key from every hash, you can try
parsed[:results].map{|x| x[:city]}
which will give the result
["Northfield", "Linwood"]

Parsing Google Custom Search API for Elasticsearch Documents

After retrieving results from the Google Custom Search API and writing them to JSON, I want to parse that JSON into valid Elasticsearch documents. You can configure a parent-child relationship for nested results; however, this relationship does not seem to be inferred by the data structure itself. I've tried loading it automatically, but with no results.
Below is some example input that doesn't include things like id or index; I'm trying to focus on creating the correct data structure. I've tried modifying graph algorithms like depth-first search but am running into problems with the different data structures.
Here's some example input:
# mock data structure
google = {"content": "foo",
          "results": {"result_one": {"persona": "phone",
                                     "personb": "phone",
                                     "personc": "phone"},
                      "result_two": ["thing1",
                                     "thing2",
                                     "thing3"],
                      "result_three": "none"},
          "query": ["Taylor Swift", "Bob Dole", "Rocketman"]}
# correctly formatted documents for _source of elasticsearch entry
correct_documents = [
    {"content": "foo"},
    {"results": ["result_one", "result_two", "result_three"]},
    {"result_one": ["persona", "personb", "personc"]},
    {"persona": "phone"},
    {"personb": "phone"},
    {"personc": "phone"},
    {"result_two": ["thing1", "thing2", "thing3"]},
    {"result_three": "none"},
    {"query": ["Taylor Swift", "Bob Dole", "Rocketman"]}
]
Here is my current approach; it is still a work in progress:
def recursive_dfs(graph, start, path=[]):
    '''recursive depth first search from start'''
    path = path + [start]
    for node in graph[start]:
        if node not in path:
            path = recursive_dfs(graph, node, path)
    return path

def branching(google):
    """Get branches as a starting point for dfs"""
    branch = 0
    keys = list(google.keys())  # dict_keys is not subscriptable in Python 3
    while branch < len(google):
        if isinstance(google[keys[branch]], dict):
            # recursive_dfs(google, google[keys[branch]])
            pass
        else:
            print("branch {}: result {}\n".format(branch, google[keys[branch]]))
        branch += 1

branching(google)
You can see that recursive_dfs() still needs to be modified to handle string and list data structures.
I'll keep going at this but if you have thoughts, suggestions, or solutions then I would very much appreciate it. Thanks for your time.
Here is a possible answer to your problem.
def myfunk(inHole, outHole):
    for key in inHole.keys():
        value = inHole[key]
        if isinstance(value, list):
            outHole.append({key: value})
        elif isinstance(value, dict):
            # record the nested keys, then recurse into the nested dict
            outHole.append({key: list(value.keys())})
            myfunk(value, outHole)
        else:
            # plain scalar value
            outHole.append({key: value})
    return outHole  # note: dicts are not orderable in Python 3, so no sort here
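Usage (a sketch against the mock google structure above):
documents = myfunk(google, [])
# documents now holds one flat dict per node, matching correct_documents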