I have 2 databases in MySQL:
1) An input latitude-longitude database ('latlong_db' henceforth): it holds the latitude and longitude of each reading from a GPS tracking device.
2) A Weather_db: I read the input latlongs from the first database and calculate 'current' weather data (e.g. humidity, cloud_coverage) for each latlong pair. This weather data is written into the Weather_db.
The issue: I need to keep track of which record (which 'input latlong') was read last, so that I don't recalculate weather data for the latlongs I've already covered. How do I keep track of the last-read input latlong?
Thank you so much.
Edit:
1) For those who have been asking about the 'database vs. table' question, the answer is that I am reading from one database and writing into a second database. The 'config.json' used to connect to the two databases is as follows:
{
    "Tracker_ds_locallatlongdb": {
        "database": "ds_testdb1",
        "host": "XXXXXXXXXXX",
        "port": XXXX,
        "user": "XXXX",
        "password": "XXXXX"
    },
    "Tracker_ds_localweatherdb": {
        "database": "ds_testdb2",
        "host": "XXXXXXX",
        "port": XXXX,
        "user": "XXXX",
        "password": "XXXXX"
    }
}
2) My Python script to read from the input_latlong_db and write into the weather_db is outlined as follows. I am using the OpenWeatherMap API to calculate weather data for given latitudes and longitudes:
from pyowm import OWM
import json
import time
import pprint
import pandas as pd
import mysql.connector
from mysql.connector import Error
api_key = 'your api key'
def get_weather_data(my_lat, my_long):
    owm = OWM(api_key)
    obs = owm.weather_at_coords(my_lat.item(), my_long.item())  # Use: <numpy.ndarray>.item
    w = obs.get_weather()
    l = obs.get_location()
    city = l.get_name()
    cloud_coverage = w.get_clouds()
    # ...
    w_datatoinsert = [my_lat, my_long, w_latitude, w_longitude, city, weather_time_gmt, call_time_torontotime,
                      short_status, detailed_status,
                      temp_celsius, cloud_coverage, humidity, wind_deg, wind_speed,
                      snow, rain, atm_pressure, sea_level_pressure, sunset_time_gmt]  # 17 computed weather fields + the input latitude and longitude = 19 values
    return w_datatoinsert
# ------------------------------------------------------------------------------------------------------------------------------------
spec_creds_1= {}
spec_creds_2= {}
def operation():
    with open('C:/Users/config.json') as config_file:
        creds_dict = json.load(config_file)
    spec_creds_1 = creds_dict['Tracker_ds_locallatlongdb']
    spec_creds_2 = creds_dict['Tracker_ds_localweatherdb']
    try:
        my_conn_1 = mysql.connector.connect(**spec_creds_1)
        if my_conn_1.is_connected():
            info_1 = my_conn_1.get_server_info()
            print("Connected ..now reading the local input_latlong_db: ", info_1)
        try:
            my_conn_2 = mysql.connector.connect(**spec_creds_2)
            if my_conn_2.is_connected():
                info_2 = my_conn_2.get_server_info()
                print('Connected to write into the local weather_db: ', info_2)
                cursor_2 = my_conn_2.cursor()
                readings_df = pd.read_sql("SELECT latitude, longitude FROM readings_table_19cols;", con=my_conn_1)
                for index, row in readings_df.iterrows():
                    gwd = get_weather_data(row['latitude'], row['longitude'])
                    q = "INSERT INTO weather_table_19cols VALUES(" + ",".join(["%s"] * len(gwd)) + ");"
                    cursor_2.execute(q, gwd)
                    my_conn_2.commit()
        except Error as e:
            print("Error while connecting to write into the local weather_db: ", e)
        finally:
            if my_conn_2.is_connected():
                cursor_2.close()
                my_conn_2.close()
                print("Wrote 1 record to the local weather_db.")
    except Error as e:
        print("Error connecting to the local input latlong_db: ", e)
    finally:
        if my_conn_1.is_connected():
            my_conn_1.close()  # no cursor present for 'my_conn_1'
            print("Finished reading all the input latlongs ...and finished inserting ALL the weather data.")
# ------------------------------------------------------------------------------
if __name__ == "__main__":
    operation()
In the input latlong table (readings_table_19cols) I created an auto-incremented reading_id as the primary key, and a column called read_flag whose default value is 0.
In the weather_table_19cols I created an auto-incremented weather_id as the primary key.
Since my method reads an input latlong record and immediately writes its weather data into the weather table, I compared the indexes of readings_table_19cols and weather_table_19cols: if they matched, the input record had been read, and I set its read_flag to 1.
..
for index_1, row_1 in readings_df_1.iterrows():
    gwd = get_weather_data(row_1['imei'], row_1['reading_id'], row_1['send_time'], row_1['latitude'], row_1['longitude'])
    q_2 = "INSERT INTO weather_table_23cols(my_imei, my_reading_id, actual_latitude, actual_longitude, w_latitude, w_longitude, city, weather_time_gmt, OBD_send_time_gmt, call_time_torontotime, \
           short_status, detailed_status, \
           temp_celsius, cloud_coverage, humidity, wind_deg, wind_speed, \
           snow, rain, atm_pressure, sea_level_pressure, sunset_time_gmt) VALUES(" + ",".join(["%s"] * len(gwd)) + ");"
    q_1b = "UPDATE ds_testdb1.readings_table_22cols re, ds_testdb2.weather_table_23cols we \
            SET re.read_flag=1 WHERE (re.reading_id= we.weather_id);"  # use the prefix 'db_name.table_name' if 1 cursor is being used for 2 different db's
    cursor_2.execute(q_2, gwd)
    my_conn_2.commit()
    cursor_1.execute(q_1b)  # use Cursor_1 for 1st query
    my_conn_1.commit()
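A simpler variant of the same idea (just a sketch, assuming the reading_id and read_flag columns described above, and q_2, cursor_1, cursor_2 as in the snippet) is to select only the unread rows and flag each row by its own reading_id right after its weather record is written, instead of joining on weather_id:
readings_df_1 = pd.read_sql(
    "SELECT imei, reading_id, send_time, latitude, longitude "
    "FROM readings_table_22cols WHERE read_flag = 0;",
    con=my_conn_1)
for index_1, row_1 in readings_df_1.iterrows():
    gwd = get_weather_data(row_1['imei'], row_1['reading_id'], row_1['send_time'],
                           row_1['latitude'], row_1['longitude'])
    cursor_2.execute(q_2, gwd)   # q_2 is the INSERT statement shown in the snippet above
    my_conn_2.commit()
    # mark exactly this input row as processed
    cursor_1.execute("UPDATE readings_table_22cols SET read_flag = 1 WHERE reading_id = %s;",
                     (int(row_1['reading_id']),))
    my_conn_1.commit()
This way a restarted run only picks up rows with read_flag = 0, so no latlong is processed twice.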
Related
I am experiencing a strange issue with my Python code. Its objective is the following:
Retrieve a .csv from S3
Convert that .csv into JSON (it's an array of objects)
Add a few key-value pairs to each object in the array, and rename the original keys
Validate the JSON
Send the JSON to an /output S3 bucket
Load the JSON into Dynamo
Here's what the .csv looks like:
Prefix,Provider
ABCDE,Provider A
QWERT,Provider B
ASDFG,Provider C
ZXCVB,Provider D
POIUY,Provider E
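After the conversion and enrichment in steps 2 and 3 above, each object in the array ends up looking roughly like this (the values shown are illustrative):
{
    "providerCode": "ABCDE",
    "providerName": "Provider A",
    "activeFrom": "2021-06-28T22:12:27.00-00:00",
    "activeTo": "2021-09-27T22:12:27.00-00:00",
    "apiActiveFrom": " ",
    "apiActiveTo": " ",
    "countThreshold": "3",
    "updateDate": "2021/06/29/22:12:27"
}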
And here's my Python script:
import json
import boto3
import ast
import csv
import os
import datetime as dt
from datetime import datetime
import jsonschema
from jsonschema import validate
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
providerCodesSchema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "providerCode": {"type": "string", "maxLength": 5},
            "providerName": {"type": "string"},
            "activeFrom": {"type": "string", "format": "date"},
            "activeTo": {"type": "string"},
            "apiActiveFrom": {"type": "string"},
            "apiActiveTo": {"type": "string"},
            "countThreshold": {"type": "string"}
        },
        "required": ["providerCode", "providerName"]
    }
}
datestamp = dt.datetime.now().strftime("%Y/%m/%d")
timestamp = dt.datetime.now().strftime("%s")
updateTime = dt.datetime.now().strftime("%Y/%m/%d/%H:%M:%S")
nowdatetime = dt.datetime.now()
yesterday = nowdatetime - dt.timedelta(days=1)
nintydaysfromnow = nowdatetime + dt.timedelta(days=90)
def lambda_handler(event, context):
    filename_json = "/tmp/file_{ts}.json".format(ts=timestamp)
    filename_csv = "/tmp/file_{ts}.csv".format(ts=timestamp)
    keyname_s3 = "newloader-ptv/output/{ds}/{ts}.json".format(ds=datestamp, ts=timestamp)
    json_data = []
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        key_name = record['s3']['object']['key']
        s3_object = s3.get_object(Bucket=bucket_name, Key=key_name)
        data = s3_object['Body'].read()
        contents = data.decode('latin')
        with open(filename_csv, 'a', encoding='utf-8') as csv_data:
            csv_data.write(contents)
        with open(filename_csv, encoding='utf-8-sig') as csv_data:
            csv_reader = csv.DictReader(csv_data)
            for csv_row in csv_reader:
                json_data.append(csv_row)
    for elem in json_data:
        elem['providerCode'] = elem.pop('Prefix')
        elem['providerName'] = elem.pop('Provider')
    for element in json_data:
        element['activeFrom'] = yesterday.strftime("%Y-%m-%dT%H:%M:%S.00-00:00")
        element['activeTo'] = nintydaysfromnow.strftime("%Y-%m-%dT%H:%M:%S.00-00:00")
        element['apiActiveFrom'] = " "
        element['apiActiveTo'] = " "
        element['countThreshold'] = "3"
        element['updateDate'] = updateTime
    try:
        validate(instance=json_data, schema=providerCodesSchema)
    except jsonschema.exceptions.ValidationError as err:
        print(err)
        err = "Given JSON data is InValid"
        return None
    with open(filename_json, 'w', encoding='utf-8-sig') as json_file:
        json_file.write(json.dumps(json_data, default=str))
    with open(filename_json, 'r', encoding='utf-8-sig') as json_file_contents:
        response = s3.put_object(Bucket=bucket_name, Key=keyname_s3, Body=json_file_contents.read())
    for jsonElement in json_data:
        table = dynamodb.Table('privateProviders-loader')
        table.put_item(Item=jsonElement)
    print("finished enriching JSON")
    os.remove(filename_csv)
    os.remove(filename_json)
    return None
I'm new to Python, so please forgive any amateur mistakes in the code.
Here's my issue:
When I deploy the code, and add a valid .csv into my S3 bucket, everything works.
When I then add an invalid .csv into my S3 bucket, it again works as intended: the import fails because the validation kicks in and tells me the problem.
However, when I add the valid .csv back into the S3 bucket, I get the same CloudWatch log as I did for the invalid .csv, my Dynamo isn't updated, and no output JSON file is sent to /output in S3.
With some troubleshooting I've noticed the following behaviour:
When I first deploy the code, the first .csv loads as expected (Dynamo table updated + JSON file sent to S3 + CloudWatch logs documenting the process).
If I put the same valid .csv into the S3 bucket again, I get the same nice-looking CloudWatch logs, but none of the other actions take place (Dynamo not updated, etc.).
If I add the invalid .csv, that seems to break the cycle: I get a nice CloudWatch log showing the validation has kicked in, but if I reload the valid .csv, which previously produced good CloudWatch logs (but no actual outputs), I now get a repeat of the validation error log.
In short, the first time the function is invoked it seems to work; the second time it doesn't.
It seems as though the Python function is caching something or not closing out properly when finished. I've played about with the return statement etc., but nothing I've tried works. I've sunk many hours into moving parts of the code around, thinking the structure or order of events is the problem, and the code above gives me the behaviour closest to what I expect, given that it seems to work completely the first (and only) time I load the .csv into S3.
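To illustrate what I mean by "caching" (this is only my working theory, so treat it as an assumption): Lambda can reuse the same Python process for consecutive invocations, so anything computed at module level, and anything left behind in /tmp, can survive into the next run. A minimal sketch of that behaviour, with hypothetical names:
import time
# Module-level code runs once per container start, not once per invocation.
STARTUP_STAMP = str(int(time.time()))   # frozen for the lifetime of the warm container
def lambda_handler(event, context):
    # On a warm start this is the same value as in the previous invocation,
    # so a /tmp filename derived from it points at the previous run's file,
    # and opening that file in append mode ('a') adds to it instead of replacing it.
    return {"stamp": STARTUP_STAMP, "path": "/tmp/file_{}.csv".format(STARTUP_STAMP)}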
Any help or general pointers would be massively appreciated.
Thanks
P.S. Here's an example of the CloudWatch log when validation kicks in and stops an invalid .csv from being processed. If I then add a valid .csv to S3, the function is triggered, but I get this same error, even though the file is actually good.
2021-06-29T22:12:27.709+01:00 'ABCDEE' is too long
2021-06-29T22:12:27.709+01:00 Failed validating 'maxLength' in schema['items']['properties']['providerCode']:
2021-06-29T22:12:27.709+01:00 {'maxLength': 5, 'type': 'string'}
2021-06-29T22:12:27.709+01:00 On instance[2]['providerCode']:
2021-06-29T22:12:27.709+01:00 'ABCDEE'
2021-06-29T22:12:27.710+01:00 END RequestId: 81dd6a2d-130b-4c8f-ad08-39307841adf9
2021-06-29T22:12:27.710+01:00 REPORT RequestId: 81dd6a2d-130b-4c8f-ad08-39307841adf9 Duration: 482.43 ms Billed Duration: 483
I'm writing a script that will check the CVS COVID vaccine availability for cities in my state of VA. I have been successful getting the data I'm looking for, but my code is hard-coded in some areas. I'm specifically asking for help improving my code in areas 1 and 2 below:
The JSON file can be found here:
https://www.cvs.com//immunizations/covid-19-vaccine.vaccine-status.VA.json?vaccineinfo
I'm trying to access the data in the responsePayloadData key. The only way I could figure out how to do this is to make it the only key. For that reason, I deleted the other key responseMetaData:
#remove the key that we don't need
del obj['responseMetaData']
I'm also not sure how to dynamically loop through the VA items without hard coding the number of cities I know are there in the data:
for x, y in obj.items():
for a in range(34):
Here's the full code:
import requests
import json
import time
from datetime import datetime
import urllib2
try:
    import indigo
except:
    pass
strAvail = "False"
strAvailCity = "None"
try:
    # download raw json object from CVS Virginia Website
    url = "https://www.cvs.com//immunizations/covid-19-vaccine.vaccine-status.VA.json?vaccineinfo"
    data = urllib2.urlopen(url).read().decode()
except urllib2.HTTPError, err:
    return {"error": err.reason, "error_code": err.code}
# parse json object
obj = json.loads(data)
# remove the key that we don't need
del obj['responseMetaData']
# loop through the JSON dictionary and check availability
# status options: {"Fully Booked", "Available"}
for x, y in obj.items():
    for a in range(34):
        # print('City: ' + y['data']['VA'][a]['city'])
        # print('Total Available: ' + y['data']['VA'][a]['totalAvailable'])
        # print('Percent Available: ' + y['data']['VA'][a]['pctAvailable'])
        # print('Status: ' + y['data']['VA'][a]['status'])
        # print("------------------------------")
        # If there is availability anywhere in the state, take some action.
        if y['data']['VA'][a]['status'] == "Available":
            strAvail = True
            strAvailCity = y['data']['VA'][a]['city']
# Log timestamp for this check to the JSON
now = datetime.now()
strDateTime = now.strftime("%m/%d/%Y %I:%M %p")
EDIT: Since the JSON is not available outside the US, I've pasted it below:
{"responsePayloadData":{"currentTime":"2021-02-11T14:55:00.470","data":{"VA":[{"totalAvailable":"1","city":"ABINGDON","state":"VA","pctAvailable":"0.19%","status":"Fully Booked"},{"totalAvailable":"0","city":"ALEXANDRIA","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"ARLINGTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"BEDFORD","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"BLACKSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHARLOTTESVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHATHAM","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHESAPEAKE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"1","city":"DANVILLE","state":"VA","pctAvailable":"0.19%","status":"Fully Booked"},{"totalAvailable":"2","city":"DUBLIN","state":"VA","pctAvailable":"0.39%","status":"Fully Booked"},{"totalAvailable":"0","city":"FAIRFAX","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"FREDERICKSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"GAINESVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"HAMPTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"HARRISONBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"LEESBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"LYNCHBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MARTINSVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MECHANICSVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MIDLOTHIAN","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"NEWPORT NEWS","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"NORFOLK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"PETERSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"PORTSMOUTH","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"RICHMOND","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"ROANOKE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"ROCKY MOUNT","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"STAFFORD","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"SUFFOLK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"VIRGINIA BEACH","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WARRENTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WILLIAMSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WINCHESTER","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WOODSTOCK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"}]}},"responseMetaData":{"statusDesc":"Success","conversationId":"Id-beb5f68730b34e6aa3bbc1fd927ea12b","refId":"Id-b4a7256078789eb59b8912b4","operation":"getInventorybyCity","statusCode":"0000"}}
Regarding problem 1, you can just access the data by key. You don't need to delete the other key:
payload = obj['responsePayloadData']
For the second problem, you can just iterate over the items in the list at payload['data']['VA']:
for city in payload['data']['VA']:
print(city)
{'city': 'ABINGDON',
'pctAvailable': '0.19%',
'state': 'VA',
'status': 'Fully Booked',
'totalAvailable': '1'}
{'city': 'ALEXANDRIA',
'pctAvailable': '0.00%',
'state': 'VA',
'status': 'Fully Booked',
'totalAvailable': '0'}
...
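Applied to the availability check in the question, that loop might look like this (a sketch, using the same keys as in the pasted JSON):
payload = obj['responsePayloadData']
strAvail = False
strAvailCity = "None"
for city in payload['data']['VA']:
    # status is either "Fully Booked" or "Available" in this feed
    if city['status'] == "Available":
        strAvail = True
        strAvailCity = city['city']
No hard-coded range(34) is needed, because the loop runs once per city object actually present in the list.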
I am new to Apache Spark and trying out a few POCs around this. I am trying to read JSON logs which are structured, but a few fields are not always guaranteed to be present, for example:
{
    "item": "A",
    "customerId": 123,
    "hasCustomerId": true,
    .
    .
    .
},
{
    "item": "B",
    "hasCustomerId": false,
    .
    .
    .
}
Assuming I want to transform these JSON logs into CSV: I was trying out Spark SQL to get hold of all the fields with simple SELECT statements, but since the second JSON record is missing a field (although it does have an identifier), I am not sure how I can handle this.
I want to transform the above JSON logs to:
item, customerId, ....
A , 123 , ....
B , null/0 , ....
You should use SQLContext to read the JSON file directly: sqlContext.read.json("file/path"). But if you want to convert it into CSV first and then read the CSV with missing values, your CSV file should look like:
item,customerId,hasCustomerId, ....
A,123,, .... // hasCustomerId is null
B,,888, .... // customerId is null
i.e. an empty field where the value is missing. Then you have to read it like this:
val df = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true") // Use first line of all files as header
.option("inferSchema", "true") // Automatically infer data types
.load("file/path")
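For reference, a rough PySpark equivalent of reading the JSON directly (a sketch; it assumes Spark 2.x+, one JSON record per line, and placeholder paths) shows the missing customerId simply coming back as null in the resulting CSV:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("json-logs-to-csv").getOrCreate()
# Spark infers the union of all fields; records without "customerId" get null.
df = spark.read.json("file/path")
df.select("item", "customerId", "hasCustomerId") \
  .write.option("header", "true").csv("output/path")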
I am trying to get erlang-mysql-driver working. I managed to set it up and make queries, but there are two things I cannot do. (https://code.google.com/archive/p/erlang-mysql-driver/issues)
(BTW, I am new to Erlang)
So here is my code to connect to MySQL.
<erl>
out(Arg) ->
mysql:start_link(p1, "127.0.0.1", "root", "azzkikr", "MyDB"),
{data, Result} = mysql:fetch(p1, "SELECT * FROM messages").
</erl>
1. I cannot get data from the table.
mysql.erl doesn't contain any specific information on how to get table data, but this is the farthest I could go:
{A,B} = mysql:get_result_rows(Result),
B.
And the result was this:
ERROR erlang code threw an uncaught exception:
File: /Users/{username}/Sites/Yaws/index.yaws:1
Class: error
Exception: {badmatch,[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]}
Req: {http_request,'GET',{abs_path,"/"},{1,1}}
Stack: [{m181,out,1,
[{file,"/Users/{username}/.yaws/yaws/default/m181.erl"},
{line,18}]},
{yaws_server,deliver_dyn_part,8,
[{file,"yaws_server.erl"},{line,2818}]},
{yaws_server,aloop,4,[{file,"yaws_server.erl"},{line,1232}]},
{yaws_server,acceptor0,2,[{file,"yaws_server.erl"},{line,1068}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]
I understand that somehow I need to get the second element and use a foreach to walk over each row, but strings are returned in a different format: the stored string is Success but the returned string is <<"Success">>.
{badmatch,[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]}
First question: how do I get data from the table?
2. How do I insert values into the table using variables?
I can insert data into the table using this method:
Msg = "Hello World",
mysql:prepare(add_message,<<"INSERT INTO messages (`message`) VALUES (?)">>),
mysql:execute(p1, add_message, [Msg]).
But there are two things I am having trouble with:
1. I am inserting data without the << and >> markers, because when I do Msg = << ++ "Hello World" >>, Erlang throws an exception (I think I am doing something wrong). I don't know whether they are required, but without them I am able to insert data into the table, except this error bothers me after execution:
yaws code at /Users/{username}/Yaws/index.yaws:1 crashed or ret bad val:{updated,
{mysql_result,
[],
[],
1,
[]}}
Req: {http_request,'GET',{abs_path,"/"},{1,1}}
The returned atom is updated even though I asked it to insert data.
Question 2: how do I insert data into the table in a proper way?
Error:
{badmatch,[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]}
tells you that the returned value is:
[[4,0,<<"This is done baby!">>,19238],
[5,0,<<"Success">>,19238],
[6,0,<<"Hello">>,19238]]
which obviously can't match either {data, Data} or {A, B}. You can obtain your data as:
<erl>
out(Arg) ->
mysql:start_link(p1, "127.0.0.1", "root", "azzkikr", "MyDB"),
{ehtml,
[{table, [{border, "1"}],
[{tr, [],
[{td, [],
case Val of
_ when is_binary(Val) -> yaws_api:htmlize(Val);
_ when is_integer(Val) -> integer_to_binary(Val)
end}
|| Val <- Row
]}
|| Row <- mysql:fetch(p1, "SELECT * FROM messages")
]}
]
}.
</erl>
I have a query against a table in a MySQL database, and I want to check whether a given number is in the table: if it is not, println that the number is not in the database; if it exists, println that it exists. How can I do that, using exceptions or an if/else construction?
Assuming you're using an in-memory HSQLDB:
def sql = Sql.newInstance( 'jdbc:hsqldb:mem:testDB', // JDBC Url
'sa', // Username
'', // Password
'org.hsqldb.jdbc.JDBCDriver') // Driver classname
def tim = 'tim'
def name = sql.firstRow( "SELECT name FROM users WHERE userid=$tim" )?.name
if( name ) {
println "Name was $name"
}
else {
println "Name not found"
}