CSV Reader works, Trouble with CSV writer - csv

I am writing a very simple python script to READ a CSV (no problem) and to write to another CSV (issue):
System info:
Windows 10
Powershell
Python 3.6.5 :: Anaconda, Inc.
Sample Data: Office Events
The purpose is to filter events based on criteria, and to write to another CSV with desired criteria.
For Example:
I would like to read from this CSV and write the events where Registrations (or column 4) is Greater than 0 (remove rows with registrations = 0)
# SCRIPT TO FILTER EVENTS TO BE PROCESSED
import os
import time
import shutil
import os.path
import fnmatch
import csv
import glob
import pandas
# Location of file containing ALL events
path = r'allEvents.csv'
# Writes to writer
writer = csv.writer(open(r'RegisteredEvents' + time.strftime("%m_%d_%Y-%I_%M_%S") + '.csv', "wb"))
writer.writerow(["Event Name", "Start Date", "End Date", "Registrations", "Total Revenue", "ID", "Status"])
#writer.writerow([r'Event Name', r'Start Date', r'End Date', r'Registrations', r'Total Revenue', r'ID', r'Status'])
#writer.writerow([b'Event Name', b'Start Date', b'End Date', b'Registrations', b'Total Revenue', b'ID', b'Status'])
def checkRegistrations(file):
reader = csv.reader(file)
data = list(reader)
for row in data:
#if row[3] > str(0):
if row[3] > int(0):
writer.writerow(([data]))
The Error I continue to get is:
writer.writerow(["Event Name", "Start Date", "End Date", "Registrations", "Total Revenue", "ID", "Status"])
TypeError: a bytes-like object is required, not 'str'
I have tried using the various commented out statements
For Example:
"" vs r"" vs r'' vs b''
if row[3] > int(0) **vs** if row[3] > str(0)
Every time I execute my script, It creates the file.. so the first csv writer line works (create and open the file)... the second line (to write the headers) is when the error appears...
Perhaps I am getting mixed up with syntax due to python versions, or perhaps I am misusing the CSV library, or (more than likely) I have endless to learn about data type IO and conversion... someone please help!!
I am aware of the excess of import libraries -- script came from another basic script to move files from one location to another based on filename and output a rowcounter for each file being moved.
With that being said, I may be unaware of any missing/ needed libraries
Please let me know if you have any questions, concerns or clarifications
Thanks in advance!

It looks like you are calling:
writer = csv.writer(open('file.csv', 'wb'))
The 'wb' argument is the file mode. The 'b' means that you are opening the file that you are writing to in binary mode. You are then trying to write a string which isn't what it is expecting.
Try getting rid of the 'b' in the 'wb'.
writer = csv.writer(open('file.csv', 'w'))
Let me know if that works for you.

Related

how to import variables from a json file to attributes in BUILD.bazel?

I would like to import variables defined in a json file(my_info.json) as attibutes for bazel rules.
I tried this (https://docs.bazel.build/versions/5.3.1/skylark/tutorial-sharing-variables.html) and works but do not want to use a .bzl file and import variables directly to attributes to BUILD.bazel.
I want to use those variables imported from my_info.json as attributes for other BUILD.bazel files.
projects/python_web/BUILD.bazel
load("//projects/tools/parser:config.bzl", "MY_REPO","MY_IMAGE")
container_push(
name = "publish",
format = "Docker",
registry = "registry.hub.docker.com",
repository = MY_REPO,
image = MY_IMAGE,
tag = "1",
)
Asking the similar in Bazel slack I was informed the is not possible to import variables directly to Bazel and it is needed to parse the json variables and write them into a .bzl file.
I tried also this code but nothing is written in config.bzl file.
my_info.json
{
"MYREPO" : "registry.hub.docker.com",
"MYIMAGE" : "michael/monorepo-python-web"
}
WORKSPACE.bazel
load("//projects/tools/parser:jsonparser.bzl", "load_my_json")
load_my_json(
name = "myjson"
)
projects/tools/parser/jsonparser.bzl
def _load_my_json_impl(repository_ctx):
json_data = json.decode(repository_ctx.read(repository_ctx.path(Label(":my_info.json"))))
config_lines = ["%s = %s" % (key, repr(val)) for key, val in json_data.items()]
repository_ctx.file("config.bzl", "\n".join(config_lines))
load_my_json = repository_rule(
implementation = _load_my_json_impl,
attrs = {},
)
projects/tools/parser/BUILD.bazel
load("#aspect_bazel_lib//lib:yq.bzl", "yq")
load(":config.bzl", "MYREPO", "MY_IMAGE")
yq(
name = "convert",
srcs = ["my_info2.json"],
args = ["-P"],
outs = ["bar.yaml"],
)
Executing:
% bazel build projects/tools/parser:convert
ERROR: Traceback (most recent call last):
File "/Users/michael.taquia/Documents/Personal/Projects/bazel/bazel-projects/multi-language-bazel-monorepo/projects/tools/parser/BUILD.bazel", line 2, column 22, in <toplevel>
load(":config.bzl", "MYREPO", "MY_IMAGE")
Error: file ':config.bzl' does not contain symbol 'MYREPO'
When making troubleshooting I see the execution calls the jsonparser.bzl but never enters to _load_my_json_impl function (based in print statements) and does not write anything to config.bzl.
Notes: Tested on macOS 12.6 (21G115 ) Darwin Kernel Version 21.6.0
There is a better way to do that? A code snippet will be very useful.

Python Lambda actioning CSV file once but not the second time

I am experiencing a strange issue with my Python code. It's objective is the following:
Retrieve a .csv from S3
Convert that .csv into JSON (Its an array of objects)
Add a few key value pairs to each object in the Array, and change the original key values
Validate the JSON
Sent the JSON to a /output S3 bucket
Load the JSON into Dynamo
Here's what the .csv looks like:
Prefix,Provider
ABCDE,Provider A
QWERT,Provider B
ASDFG,Provider C
ZXCVB,Provider D
POIUY,Provider E
And here's my python script:
import json
import boto3
import ast
import csv
import os
import datetime as dt
from datetime import datetime
import jsonschema
from jsonschema import validate
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
providerCodesSchema = {
"type": "array",
"items": {
"type": "object",
"properties": {
"providerCode": {"type": "string", "maxLength": 5},
"providerName": {"type": "string"},
"activeFrom": {"type": "string", "format": "date"},
"activeTo": {"type": "string"},
"apiActiveFrom": {"type": "string"},
"apiActiveTo": {"type": "string"},
"countThreshold": {"type": "string"}
},
"required": ["providerCode", "providerName"]
}
}
datestamp = dt.datetime.now().strftime("%Y/%m/%d")
timestamp = dt.datetime.now().strftime("%s")
updateTime = dt.datetime.now().strftime("%Y/%m/%d/%H:%M:%S")
nowdatetime = dt.datetime.now()
yesterday = nowdatetime - dt.timedelta(days=1)
nintydaysfromnow = nowdatetime + dt.timedelta(days=90)
def lambda_handler(event, context):
filename_json = "/tmp/file_{ts}.json".format(ts=timestamp)
filename_csv = "/tmp/file_{ts}.csv".format(ts=timestamp)
keyname_s3 = "newloader-ptv/output/{ds}/{ts}.json".format(ds=datestamp, ts=timestamp)
json_data = []
for record in event['Records']:
bucket_name = record['s3']['bucket']['name']
key_name = record['s3']['object']['key']
s3_object = s3.get_object(Bucket=bucket_name, Key=key_name)
data = s3_object['Body'].read()
contents = data.decode('latin')
with open(filename_csv, 'a', encoding='utf-8') as csv_data:
csv_data.write(contents)
with open(filename_csv, encoding='utf-8-sig') as csv_data:
csv_reader = csv.DictReader(csv_data)
for csv_row in csv_reader:
json_data.append(csv_row)
for elem in json_data:
elem['providerCode'] = elem.pop('Prefix')
elem['providerName'] = elem.pop('Provider')
for element in json_data:
element['activeFrom'] = yesterday.strftime("%Y-%m-%dT%H:%M:%S.00-00:00")
element['activeTo'] = nintydaysfromnow.strftime("%Y-%m-%dT%H:%M:%S.00-00:00")
element['apiActiveFrom'] = " "
element['apiActiveTo'] = " "
element['countThreshold'] = "3"
element['updateDate'] = updateTime
try:
validate(instance=json_data, schema=providerCodesSchema)
except jsonschema.exceptions.ValidationError as err:
print(err)
err = "Given JSON data is InValid"
return None
with open(filename_json, 'w', encoding='utf-8-sig') as json_file:
json_file.write(json.dumps(json_data, default=str))
with open(filename_json, 'r', encoding='utf-8-sig') as json_file_contents:
response = s3.put_object(Bucket=bucket_name, Key=keyname_s3, Body=json_file_contents.read())
for jsonElement in json_data:
table = dynamodb.Table('privateProviders-loader')
table.put_item(Item=jsonElement)
print("finished enriching JSON")
os.remove(filename_csv)
os.remove(filename_json)
return None
I'm new to Python, so please forgive any amateur mistakes in the code.
Here's my issue:
When I deploy the code, and add a valid .csv into my S3 bucket, everything works.
When I then add an invalid .csv into my S3 buck, again it work, the import fails as the validation kicks in and tells me the problem.
However, when I add the valid .csv back into the S3 bucket, I get the same cloudwatch log as I did for the invalid .csv, and my Dynamo isn't updated, nor is an output JSON file sent to /output in S3.
With some troubleshooting I've noticed the following behavour:
When I first deploy the code, the first .csv loads as expected (dynamo table updated + JSON file sent to S3 + cloudwatch logs documenting the process)
If I enter the same valid .csv into the S3 bucket, it gives me the same nice looking cloudwatch logs, but none of the other actions take place (Dynamo not updated etc)
If I add the invalid .csv, that seems to break the cycle, and I get a nice Cloudwatch log showing the validation has kicked in, but if I reload the valid .csv, which just previously resulted in good cloudwatch logs (but no actual real outputs), I now get a repeat of the validation error log.
In short, the first time the function is invoked, it seems to work, the second time it doesn't.
It seems as though the python function is caching something or not closing out the function when finished, and I've played about with the return command etc, but nothing I've tried works. I've sunk many hours into trying to move parts of the code around etc. thinking the structure or order of events is the problem, and I've the code above gives me the closest behaviour to expected, given that it seems to work completely the first and only time I load the .csv into S3.
Any help or general pointers would be massively appreciated.
Thanks
P.s. Here's an example of the Cloudwatch log when validation kicks in a and stops an invalid .csv from being processed. If I then add a valid .csv to S£, the function is triggered, but I get this same error, even though the file is actually good.
2021-06-29T22:12:27.709+01:00 'ABCDEE' is too long
2021-06-29T22:12:27.709+01:00 Failed validating 'maxLength' in schema['items']['properties']['providerCode']:
2021-06-29T22:12:27.709+01:00 {'maxLength': 5, 'type': 'string'}
2021-06-29T22:12:27.709+01:00 On instance[2]['providerCode']:
2021-06-29T22:12:27.709+01:00 'ABCDEE'
2021-06-29T22:12:27.710+01:00 END RequestId: 81dd6a2d-130b-4c8f-ad08-39307841adf9
2021-06-29T22:12:27.710+01:00 REPORT RequestId: 81dd6a2d-130b-4c8f-ad08-39307841adf9 Duration: 482.43 ms Billed Duration: 483

How to efficiently parse JSON data with multiple keys in Python 2.7?

I'm writing a script that will check the CVS COVID vaccine availability for cities in my state of VA. I have been successful getting the data I'm looking for, but my code is hard coded in some areas. I'm specifically asking for help improving my code in the areas number 1 & 2 below:
The JSON file can be found here:
https://www.cvs.com//immunizations/covid-19-vaccine.vaccine-status.VA.json?vaccineinfo
I'm trying to access the data in the responsePayloadData key. The only way I could figure out how to do this is to make it the only key. For that reason, I deleted the other key responseMetaData:
#remove the key that we don't need
del obj['responseMetaData']
I'm also not sure how to dynamically loop through the VA items without hard coding the number of cities I know are there in the data:
for x, y in obj.items():
for a in range(34):
Here's the full code:
import requests
import json
import time
from datetime import datetime
import urllib2
try:
import indigo
except:
pass
strAvail = "False"
strAvailCity = "None"
try:
# download raw json object from CVS Virginia Website
url = "https://www.cvs.com//immunizations/covid-19-vaccine.vaccine-status.VA.json?vaccineinfo"
data = urllib2.urlopen(url).read().decode()
except urllib2.HTTPError, err:
return {"error": err.reason, "error_code": err.code}
# parse json object
obj = json.loads(data)
# remove the key that we don't need
del obj['responseMetaData']
# loop through the JSON dictionary and check availability
# status options: {"Fully Booked", "Available"}
for x, y in obj.items():
for a in range(34):
# print('City: ' + y['data']['VA'][a]['city'])
# print('Total Available: ' + y['data']['VA'][a]['totalAvailable'])
# print('Percent Available: ' + y['data']['VA'][a]['pctAvailable'])
# print('Status: ' + y['data']['VA'][a]['status'])
# print("------------------------------")
# If there is availability anywhere in the state, take some action.
if y['data']['VA'][a]['status'] == "Available":
strAvail = True
strAvailCity = y['data']['VA'][a]['city']
# Log timestamp for this check to the JSON
now = datetime.now()
strDateTime = now.strftime("%m/%d/%Y %I:%M %p")
EDIT: Since the JSON is not available outside the US. I've pasted it below:
{"responsePayloadData":{"currentTime":"2021-02-11T14:55:00.470","data":{"VA":[{"totalAvailable":"1","city":"ABINGDON","state":"VA","pctAvailable":"0.19%","status":"Fully Booked"},{"totalAvailable":"0","city":"ALEXANDRIA","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"ARLINGTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"BEDFORD","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"BLACKSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHARLOTTESVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHATHAM","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"CHESAPEAKE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"1","city":"DANVILLE","state":"VA","pctAvailable":"0.19%","status":"Fully Booked"},{"totalAvailable":"2","city":"DUBLIN","state":"VA","pctAvailable":"0.39%","status":"Fully Booked"},{"totalAvailable":"0","city":"FAIRFAX","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"FREDERICKSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"GAINESVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"HAMPTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"HARRISONBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"LEESBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"LYNCHBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MARTINSVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MECHANICSVILLE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"MIDLOTHIAN","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"NEWPORT NEWS","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"NORFOLK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"PETERSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"PORTSMOUTH","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"RICHMOND","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"ROANOKE","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"ROCKY MOUNT","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"STAFFORD","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"SUFFOLK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},
{"totalAvailable":"0","city":"VIRGINIA BEACH","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WARRENTON","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WILLIAMSBURG","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WINCHESTER","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"},{"totalAvailable":"0","city":"WOODSTOCK","state":"VA","pctAvailable":"0.00%","status":"Fully Booked"}]}},"responseMetaData":{"statusDesc":"Success","conversationId":"Id-beb5f68730b34e6aa3bbc1fd927ea12b","refId":"Id-b4a7256078789eb59b8912b4","operation":"getInventorybyCity","statusCode":"0000"}}
Regarding problem 1, you can just access the data by key. You don't need to delete the other key:
payload = obj['responsePayloadData']
For the second problem, you can just iterate over the items in the list associated with obj['data']['VA']:
for city in payload['data']['VA']:
print(city)
{'city': 'ABINGDON',
'pctAvailable': '0.19%',
'state': 'VA',
'status': 'Fully Booked',
'totalAvailable': '1'}
{'city': 'ALEXANDRIA',
'pctAvailable': '0.00%',
'state': 'VA',
'status': 'Fully Booked',
'totalAvailable': '0'}
...

Ruby: Handling different JSON response that is not what is expected

Searched online and read through the documents, but have not been able to find an answer. I am fairly new and part of learning Ruby I wanted to make the script below.
The Script essentially does a Carrier Lookup on a list of numbers that are provided through a CSV file. The CSV file has just one row with the column header "number".
Everything runs fine UNTIL the API gives me an output that is different from the others. In this example, it tells me that one of the numbers in my file is not a valid US number. This then causes my script to stop running.
I am looking to see if there is a way to either ignore it (I read about Begin and End, but was not able to get it to work) or ideally either create a separate file with those errors or just put the data into the main file.
Any help would be much appreciated. Thank you.
Ruby Code:
require 'csv'
require 'uri'
require 'net/http'
require 'json'
number = 0
CSV.foreach('data1.csv',headers: true) do |row|
number = row['number'].to_i
uri = URI("https://api.message360.com/api/v3/carrier/lookup.json?PhoneNumber=#{number}")
req = Net::HTTP::Post.new(uri)
req.basic_auth 'XXX' , 'XXX'
res = Net::HTTP.start(uri.hostname, uri.port, :use_ssl => true) {|http|
http.request(req)
}
json = JSON.parse(res.body)
new = json["Message360"]["Carrier"].values
CSV.open("new.csv", "ab") do |csv|
csv << new
end
end
File Data:
number
5556667777
9998887777
Good Response example in JSON:
{"Message360"=>{"ResponseStatus"=>1, "Carrier"=>{"ApiVersion"=>"3", "CarrierSid"=>"XXX", "AccountSid"=>"XXX", "PhoneNumber"=>"+19495554444", "Network"=>"Cellco Partnership dba Verizon Wireless - CA", "Wireless"=>"true", "ZipCode"=>"92604", "City"=>"Irvine", "Price"=>0.0003, "Status"=>"success", "DateCreated"=>"2018-05-15 23:05:15"}}}
The response that causes Script to stop:
{
"Message360": {
"ResponseStatus": 0,
"Errors": {
"Error": [
{
"Code": "ER-M360-CAR-111",
"Message": "Allowed Only Valid E164 North American Numbers.",
"MoreInfo": []
}
]
}
}
}
It would appear you can just check json["Message360"]["ResponseStatus"] first for a 0 or 1 to indicate failure or success.
I'd probably add a rescue to help catch any other errors (malformed JSON, network issue, etc.)
CSV.foreach('data1.csv',headers: true) do |row|
number = row['number'].to_i
...
json = JSON.parse(res.body)
if json["Message360"]["ResponseStatus"] == 1
new = json["Message360"]["Carrier"].values
CSV.open("new.csv", "ab") do |csv|
csv << new
end
else
# handle bad response
end
rescue StandardError => e
# request failed for some reason, log e and the number?
end

How to write all the returned data into JSON file using Python?

How to write the returned dict into a json file.
until now I am able to returned the correct data and print it, but when I tried to write it into a JSON file it just write the last record.
I will appreciate any help to fix this
example :
printed data:
[{"file Name": "test1.txt", "searched Word": "th", "number of occurence": 1}]
[{"file Name": "test2.txt", "searched Word": "th", "number of occurence": 1}]
json file
[{"file Name": "test2.txt", "searched Word": "th", "number of occurence": 1}]
code:
import re
import json
import os.path
import datetime
for counter, myLine in enumerate(textList):
thematch=re.sub(searchedSTR,RepX,myLine)
matches = re.findall(searchedSTR, myLine, re.MULTILINE | re.IGNORECASE)
if len(matches) > 0:
# add one record for the match (add one because line numbers start with 1)
d[matches[0]].append(counter + 1)
self.textEdit_PDFpreview.insertHtml(str(thematch))
'''
loop over the selected file and extract 3 values:
==> name of file
==> searched expression
==> number of occurence
'''
for match, positions in d.items():
listMetaData = [{"file Name":fileName,"searched Word":match,"number of occurence":len(positions)}]
jsondata = json.dumps(listMetaData)
print("in the for loop ==>jsondata: \n{0}".format(jsondata))
'''
create a folder called 'search_result' that includes all the result of the searching as JSON file
where the system check if the folder exist will continue if not the system create the folder
insode the folder the file will be created as ==> today_searchResult.js
'''
if not(os.path.exists("./search_result")):
try:
#print(os.mkdir("./search_result"))
#print(searchResultFoder)
today = datetime.date.today()
fileName = "{}_searchResult.json".format(today)
#fpJ = os.path.join(os.mkdir("./search_result"),fileName)
#print(fpJ)
with open(fileName,"w") as jsf:
jsf.write(jsondata)
print("finish writing")
except Exception as e:
print(e)