I'm using Ruby and the geocodio gem to do some reverse geocoding. The reverse geocoder returns an object of type Geocodio::Address, which per their website is JSON. I'm trying to use Ruby's JSON.parse to convert it to a hash that I can then map as needed.
JSON.parse returns this error...
C:/Ruby23-x64/lib/ruby/2.3.0/json/common.rb:156:in `parse': 784: unexpected token at '"Saipan, MP 96950"' (JSON::ParserError)
        from C:/Ruby23-x64/lib/ruby/2.3.0/json/common.rb:156:in `parse'
Here's my whole script:
require 'geocodio'
require 'csv'
require 'json'

filename = './for_fips.csv'
#fipslist = []
geocodio = Geocodio::Client.new('not real...d5a1557e2175d8ce265')
i = 1
CSV.foreach(filename) do |row|
  lat = row[1]
  long = row[2]
  coord = lat + "," + long
  #puts coord
  add = geocodio.reverse_geocode([lat + "," + long], fields: %w[cd stateleg school timezone]).best
  add_parsed = JSON.parse(add)
  pp add_parsed
  #puts add.each { |k,v| "#{k}=#{v}"}.join('~~')
  i += 1
  if i > 2 then break end
  #fipslist << fcc.district_fips
end
I was able to take a text file, read each line, create a dictionary per line, update (append) each line, and store the JSON file. The issue is that when reading the JSON file back, it will not read correctly. Does the error point to an issue with how the file is stored?
The text file looks like:
84.txt; Frankenstein, or the Modern Prometheus; Mary Wollstonecraft (Godwin) Shelley
98.txt; A Tale of Two Cities; Charles Dickens
...
import json
import re

path = "C:\\...\\data\\"
books = {}
books_json = {}
final_book_json = {}

file = open(path + 'books\\set_of_books.txt', 'r')
json_list = file.readlines()
open(path + 'books\\books_json.json', 'w').close()  # used to clean each test
json_create = []
i = 0
for line in json_list:
    line = line.replace('#', '')
    line = line.replace('.txt', '')
    line = line.replace('\n', '')
    line = line.split(';', 4)
    BookNumber = line[0]
    BookTitle = line[1]
    AuthorName = line[-1]
    if BookNumber == ' 2701':
        BookNumber = line[0]
        BookTitle1 = line[1]
        BookTitle2 = line[2]
        AuthorName = line[3]
        BookTitle = BookTitle1 + ';' + BookTitle2  # needed to combine title into one to fit dict format
    books = json.dumps({'AuthorName': AuthorName, 'BookNumber': BookNumber, 'BookTitle': BookTitle})
    books_json = json.loads(books)
    final_book_json.update(books_json)
    with open(path + 'books\\books_json.json', 'a') as out_put:
        json.dump(books_json, out_put)

with open(path + 'books\\books_json.json', 'r') as out_put:
    print(json.load(out_put))
The reported error is: JSONDecodeError: Extra data: line 1 column 133 (char 132). That position falls right between the first "}{". I'm not sure how JSON should look in a flat-file format. The output file, as seen in an editor, looks like: {"AuthorName": " Mary Wollstonecraft (Godwin) Shelley", "BookNumber": " 84", "BookTitle": " Frankenstein, or the Modern Prometheus"}{"AuthorName": " Charles Dickens", "BookNumber": " 98", "BookTitle": " A Tale of Two Cities"}...
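The "Extra data" error arises because the file contains several JSON objects concatenated back to back, while json.load expects exactly one top-level value. A minimal sketch of one common fix (the sample records and the file name books_json.jsonl are made up for illustration): write one JSON object per line, then parse each line independently.

```python
import json

# Hypothetical records, standing in for the parsed book lines.
records = [
    {"BookNumber": "84", "BookTitle": "Frankenstein, or the Modern Prometheus"},
    {"BookNumber": "98", "BookTitle": "A Tale of Two Cities"},
]

# Write one JSON object per line instead of concatenating them.
with open("books_json.jsonl", "w") as out:
    for rec in records:
        out.write(json.dumps(rec) + "\n")

# Read them back: each line is an independent, valid JSON document.
with open("books_json.jsonl") as f:
    loaded = [json.loads(line) for line in f]

print(loaded[1]["BookTitle"])  # -> A Tale of Two Cities
```

Alternatively, collect everything in one list and json.dump that list once; then a single json.load reads it back.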
I ended up changing the approach: I used pandas to read the text and then split the single-cell input.
books = pd.read_csv(path + 'books\\set_of_books.txt', sep='\t', names=('r', 't', 'a'))
#print(books.head(10))

# Function to clean the 'raw' (r) input data
def clean_line(cell):
    ...
    return cell

books['r'] = books['r'].apply(clean_line)
books = books['r'].str.split(';', expand=True)
Using a regex, I extract a substring from a sentence and then translate it multiple times, but sometimes I get an error.
My code:
import os
import re
import sys
from googletrans import Translator

try:
    translator = Translator()
    # Substring sentence: Direccion Referencia </tr>
    # I want to translate this substring sentence
    # Address = re.search(r'(?<=Direccion).*?(?=</tr>)',
    #                     new_get_htmlSource).group(0)
    # Address substring from the website
    Address = re.search(r'(?<=<td>).*?(?=</td>)', Address).group(0)
    translated = translator.translate(str(Address))
    Address = translated.text
except Exception as e:
    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
    print("Error ON : ", sys._getframe().f_code.co_name + "--> " + str(e),
          "\n", exc_type, "\n", fname, "\n", exc_tb.tb_lineno)
I expect the output to be: Reference # Translated Text
Instead I get this error:
scraping_data--> Expecting value: line 1 column 1 (char 0)
class: json.decoder.JSONDecodeError
Scraping_Things.py
29
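Note that the JSONDecodeError is raised inside googletrans, whose HTTP response was not valid JSON (which commonly happens when Google rate-limits or blocks the client), not by the regex itself. For what it's worth, the lookbehind pattern does work on a well-formed cell; a minimal, self-contained check (the sample HTML row is made up) that isolates the regex from the translation step:

```python
import re

# Hypothetical table row, standing in for the scraped HTML.
row = "<tr><td>Direccion Referencia 123</td></tr>"

match = re.search(r'(?<=<td>).*?(?=</td>)', row)
# Guard against a missing cell: match.group(0) on None raises AttributeError.
address = match.group(0) if match else None
print(address)  # -> Direccion Referencia 123
```

If this extraction succeeds but translation still fails, the problem is on the googletrans side; retrying after a delay, or switching to an official translation API, are the usual workarounds.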
I've got a Ruby script which checks each microservice's version via its API. I try to run this with Bamboo and return the result as an HTML table.
...
h = {}
threads = []
service = data_hash.keys
service.each do |microservice|
  threads << Thread.new do
    thread_id = Thread.current.object_id.to_s(36)
    begin
      h[thread_id] = ""
      port = data_hash["#{microservice}"]['port']
      nodes = "knife search 'chef_environment:#{env} AND recipe:#{microservice}' -i 2>&1 | tail -n 2"
      node = %x[ #{nodes} ].split
      node.each do |n|
        h[thread_id] << "\n<html><body><h4> Node: #{n} </h4></body></html>\n"
        uri = URI("http://#{n}:#{port}/service-version")
        res = Net::HTTP.get_response(uri)
        status = Net::HTTP.get(uri)
        data_hash = JSON.parse(status)
        name = data_hash['name']
        version = data_hash['version']
        h[thread_id] << "<table><tr><th> #{name}:#{version} </th></tr></table>"
      end
    rescue => e
      h[thread_id] << "ReadTimeout Error"
      next
    end
  end
end
threads.each do |thread|
  thread.join
end
ThreadsWait.all_waits(*threads)
puts h.to_a
The issue is that I want to output the name and version in an HTML table, but with threads it generates some random characters between each line:
<table><tr><th> microservice1:2.10.3 </th></tr></table>
bjfsw
<table><tr><th> microservice2:2.10.8 </th></tr></table>
The random characters are the keys of the hash, generated with to_s(36).
Replace the puts h.to_a with something like
puts h.values.join("\n")
and you will only see the data, not the keys.
You can use Kernel#p (p h) or puts h.inspect to see this.
I am trying to map a list of (~320+) Texas addresses to (lat, lng).
I started using geopy (simple example) and it worked for some addresses, but it failed on a set of addresses.
So I integrated a backup with the googlemaps geocoder... but it too failed. Below is the code; see address_to_geoPt.
Yet when I submit the failed addresses via the browser, it finds the address. Any tips on how to get more reliable hits? Which Google API should I use (see address_to_geoPt_googlemaps())?
class GeoMap(dbXLS):
    def __init__(self, **kwargs):
        super(GeoMap, self).__init__(**kwargs)
        # Geo locator
        self.gl = Nominatim()
        self.gmaps = googlemaps.Client(key='mykeyISworking')
        shName = self.xl.sheet_names[0] if 'sheet' not in kwargs else kwargs['sheet']
        self.df = self.xl.parse(shName)

    def address_to_geoPt(self, addr):
        l = self.geoLocation(addr)
        if l: return (l.latitude, l.longitude)
        return (np.nan, np.nan)

    def address_to_geoPt_googlemaps(self, addr):
        # Geocoding an address
        geocode = self.gmaps.geocode(addr)
        if not geocode: return (np.nan, np.nan)
        locDict = geocode[0]['geometry']['location']
        return (locDict['lat'], locDict['lng'])

    def address(self, church):
        return (church.Address1 + " "
                + church.City + " "
                + church.State + " "
                + church.ZipCode + " "
                + church.Country)

    def church_to_geoPt(self, church):
        a = (church.Address1 + " "
             + church.City + " "
             + church.State)
        if pd.isnull(church.geoPt):
            (lat, lng) = self.address_to_geoPt(a)
        else:
            (lat, lng) = church.geoPt
        if not pd.isnull(lat):
            print("DEBUG to_geoPt 1", lat, lng, a)
            return (lat, lng)
        (lat, lng) = self.address_to_geoPt_googlemaps(a)
        print("DEBUG to_geoPt 2", lat, lng, a)
        return (lat, lng)
The following shows a set of addresses that are not mapped by geocoders.
4 3000 Bee Creek Rd Spicewood TX 78669-5109 USA
6 P O BOX 197 BERTRAM TX 78605-0197 USA
10 2833 Petuma Dr Kempher TX 78639 USA
#geocodezip provided the answer... and the code worked the next day.
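The primary-then-backup pattern in church_to_geoPt can be tested without any network access by injecting stand-in geocoder functions. A minimal sketch, where both stub geocoders and their coordinates are made up for illustration:

```python
import math

NAN = float("nan")

def fallback_geocode(addr, primary, backup):
    """Try the primary geocoder; fall back to the backup when it returns nothing."""
    result = primary(addr)
    if result is not None:
        return result
    result = backup(addr)
    return result if result is not None else (NAN, NAN)

# Stub geocoders standing in for Nominatim and the Google Maps client.
def stub_nominatim(addr):
    known = {"3000 Bee Creek Rd Spicewood TX": (30.47, -98.15)}
    return known.get(addr)

def stub_google(addr):
    known = {"P O BOX 197 BERTRAM TX": (30.74, -98.05)}
    return known.get(addr)

print(fallback_geocode("3000 Bee Creek Rd Spicewood TX", stub_nominatim, stub_google))
print(fallback_geocode("P O BOX 197 BERTRAM TX", stub_nominatim, stub_google))
lat, lng = fallback_geocode("nowhere", stub_nominatim, stub_google)
print(math.isnan(lat))  # -> True
```

Returning (nan, nan) for total misses matches the original code's convention, which is what lets church_to_geoPt use pd.isnull(lat) to decide whether to try the backup.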
I am using R to extract tweets and analyse their sentiment; however, when I get to the lines below I get the error "Object of type 'closure' is not subsettable".
scores$drink = factor(rep(c("east"), nd))
scores$very.pos = as.numeric(scores$score >= 2)
scores$very.neg = as.numeric(scores$score <= -2)
Full code pasted below
load("twitCred.Rdata")
east_tweets <- filterStream("tweetselnd.json", locations = c(-0.10444, 51.408699, 0.33403, 51.64661),timeout = 120, oauth = twitCred)
tweets.df <- parseTweets("tweetselnd.json", verbose = FALSE)
##function score.sentiment
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  # Parameters
  # sentences: vector of text to score
  # pos.words: vector of words of positive sentiment
  # neg.words: vector of words of negative sentiment
  # .progress: passed to laply() to control the progress bar
  scores = laply(sentences,
    function(sentence, pos.words, neg.words)
    {
      # remove punctuation
      sentence = gsub("[[:punct:]]", "", sentence)
      # remove control characters
      sentence = gsub("[[:cntrl:]]", "", sentence)
      # remove digits
      sentence = gsub('\\d+', '', sentence)
      # define error handling function for tolower
      tryTolower = function(x)
      {
        # create missing value
        y = NA
        # tryCatch error
        try_error = tryCatch(tolower(x), error=function(e) e)
        # if not an error
        if (!inherits(try_error, "error"))
          y = tolower(x)
        # result
        return(y)
      }
      # use tryTolower with sapply
      sentence = sapply(sentence, tryTolower)
      # split sentence into words with str_split (stringr package)
      word.list = str_split(sentence, "\\s+")
      words = unlist(word.list)
      # compare words to the dictionaries of positive & negative terms
      pos.matches = match(words, pos.words)
      neg.matches = match(words, neg.words)
      # match() returns the position of the matched term or NA;
      # we just want a TRUE/FALSE
      pos.matches = !is.na(pos.matches)
      neg.matches = !is.na(neg.matches)
      # final score
      score = sum(pos.matches) - sum(neg.matches)
      return(score)
    }, pos.words, neg.words, .progress=.progress)
  # data frame with scores for each sentence
  scores.df = data.frame(text=sentences, score=scores)
  return(scores.df)
}
pos = readLines(file.choose())
neg = readLines(file.choose())
east_text = sapply(east_tweets, function(x) x$getText())
scores = score.sentiment(tweetseldn.json, pos, neg, .progress='text')
scores()$drink = factor(rep(c("east"), nd))
scores()$very.pos = as.numeric(scores()$score >= 2)
scores$very.neg = as.numeric(scores$score <= -2)
# how many very positives and very negatives
numpos = sum(scores$very.pos)
numneg = sum(scores$very.neg)
# global score
global_score = round( 100 * numpos / (numpos + numneg) )
If anyone could help with why I'm getting this error, it would be much appreciated. I've also seen other answers suggesting adding '()' when referring to the variable 'scores', such as scores()$..., but that hasn't worked for me. Thank you.
The changes below got rid of the error:
x <- scores
x$drink = factor(rep(c("east"), nd))
x$very.pos = as.numeric(x$score >= 2)
x$very.neg = as.numeric(x$score <= -2)