Convert CSV column format from string to datetime - csv

Python 3.4 ... I am trying to convert a specific column in a CSV from string to datetime and write it back to the file. I am able to read and convert it, but I need help writing the date format change to the file (the last line of code needs to be modified). I got the following error for the last line of code: Error: sequence expected.
import csv
import datetime

with open('/path/test.date.conversion.csv') as csvfile:
    readablefile = csv.reader(csvfile)
    writablefile = csv.writer(csvfile)
    for row in readablefile:
        date_format = datetime.datetime.strptime(row[13], '%m/%d/%Y')
        writablefile.writerow(date_format)

So date_format is an instance of the datetime class, but writerow() expects a sequence (a list of column values), which is why you get the "sequence expected" error. You need to format the datetime as a string and put it back into a row before writing. You can go with the standard:
date_format.isoformat()
Or customize your own format:
date_format.strftime('%Y-%m-%d %H:%M:%S')
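For example, a quick sketch with a sample date showing what each call produces:
from datetime import datetime

dt = datetime.strptime('09/23/2015', '%m/%d/%Y')
print(dt.isoformat())                    # 2015-09-23T00:00:00
print(dt.strftime('%Y-%m-%d %H:%M:%S'))  # 2015-09-23 00:00:00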
Also, a couple more things to note. You can't write to the same file you're reading from like that: the reader and the writer share a single file handle, and interleaving reads and writes will clobber your data. If you want to overwrite that file, write to a temp file and replace the original when done, but it's better to just create another file. The best solution IMHO is to make good use of stdin and stdout, like this:
import csv
import sys
from datetime import datetime

FIELD = 13
FMT = '%m/%d/%Y'

readablefile = csv.reader(sys.stdin)
writablefile = csv.writer(sys.stdout)
for row in readablefile:
    dt = datetime.strptime(row[FIELD], FMT)
    row[FIELD] = dt.isoformat()
    writablefile.writerow(row)
Now you can call the script like so:
cat /path/test.date.conversion.csv | python fixdate.py > another_file.csv
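If you'd rather keep everything inside the script instead of using shell redirection, a minimal file-to-file variant looks like this (the input and output file names are placeholders):
import csv
from datetime import datetime

FIELD = 13
FMT = '%m/%d/%Y'

# 'in.csv' and 'out.csv' are placeholder names; substitute your own paths.
with open('in.csv', newline='') as src, open('out.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        dt = datetime.strptime(row[FIELD], FMT)
        row[FIELD] = dt.isoformat()
        writer.writerow(row)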
At this point, you can make use of the argparse module to provide all kinds of nice parameters to your script (see the sketch after this list), like:
Which field to format?
What input and/or output format?
etc.
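For instance, a rough sketch of the argparse idea; the flag names here are made up:
import argparse
import csv
import sys
from datetime import datetime

parser = argparse.ArgumentParser(description='Reformat a date column in a CSV stream.')
parser.add_argument('--field', type=int, default=13, help='zero-based index of the date column')
parser.add_argument('--in-fmt', default='%m/%d/%Y', help='strptime format of the input dates')
parser.add_argument('--out-fmt', default='%Y-%m-%d', help='strftime format for the output dates')
args = parser.parse_args()

writer = csv.writer(sys.stdout)
for row in csv.reader(sys.stdin):
    dt = datetime.strptime(row[args.field], args.in_fmt)
    row[args.field] = dt.strftime(args.out_fmt)
    writer.writerow(row)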

Related

CSV not importing JSON with correct format into database

Just like the title says, here is my code:
require 'csv'
require 'json'

def import_csv
  path = Rails.root.join('folder1', 'folder2', 'file.csv')
  counter = 0
  puts "inserts on table started..."
  CSV.foreach(path, headers: true) do |row|
    next if row.to_hash['deleted_at'] != nil
    counter += 1
    puts row.to_json # shows correct format
    some_model = SomeModel.new(row.to_hash) # imports incorrect format of json with backslash in db
    # some_model = SomeModel.new(row.to_json) # ArgumentError: When assigning attributes, you must pass a hash as an argument.
    some_model.skip_callbacks = true
    some_model.save!
  end
  puts "#{counter} inserts on table apps complete"
end

import_csv
I cannot import the CSV file in the correct format. The import works, but the structure is wrong.
EXPECTED
{"data":{"someData":72}}
GETTING
"{\"data\":{\"someData\":72}}"
How can I import it with the correct JSON format?
If all the headers match the column names of the model, maybe you can try:
JSON.parse(row.to_json)

Error parsing JSON: more than one document in the input (Redshift to Snowflake SQL)

I'm trying to convert a query from Redshift to Snowflake SQL.
The Redshift query looks like this:
SELECT
    cr.creatives as creatives
    , JSON_ARRAY_LENGTH(cr.creatives) as creatives_length
    , JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(cr.creatives, 0), 'previewUrl') as preview_url
FROM campaign_revisions cr
The Snowflake query looks like this:
SELECT
    cr.creatives as creatives
    , ARRAY_SIZE(TO_ARRAY(ARRAY_CONSTRUCT(cr.creatives))) as creatives_length
    , PARSE_JSON(PARSE_JSON(cr.creatives)[0]):previewUrl as preview_url
FROM campaign_revisions cr
It seems like JSON_EXTRACT_PATH_TEXT isn't converted correctly, as the Snowflake query results in error:
Error parsing JSON: more than one document in the input
cr.creatives is formatted like this:
"[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]"
It seems to me that you are not working with valid JSON data inside Snowflake.
Please review the file format used for your COPY INTO command.
If you open the "JSON" text provided in a text editor, note that the information is not parsed or formatted as JSON because of the quoting you have. Once your issue with the double/escaped quotes is handled, you should be able to make good progress.
If you are not inclined to reload your data, see if you can create a Javascript User Defined Function to remove the quotes from your string, then you can use Snowflake to process the variant column.
The following code is a working proof of concept that removes the double quotes for you:
var textOriginal = '[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop"",""splitId"":null,""splitType"":null},{""previewUrl"":""https://someurl.com/preview2.png"",""device"":""mobile"",""splitId"":null,""splitType"":null}]';

function parseText(input) {
    // Collapse each doubled quote into a single one, then parse.
    // replace() with a global regex is more portable than String.replaceAll,
    // which older JavaScript engines do not provide.
    var a = input.replace(/""/g, '"');
    return JSON.parse(a);
}

var x = parseText(textOriginal);
console.log(x);
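If you would rather clean the data before loading it at all, the same cleanup is simple in plain Python; here is a sketch, using a shortened, hypothetical sample in the same shape as cr.creatives:
import json

# A shortened sample with the doubled quotes produced by CSV-style escaping.
text_original = '[{""previewUrl"":""https://someurl.com/preview1.png"",""device"":""desktop""}]'

# Collapse the doubled quotes, then parse as JSON.
creatives = json.loads(text_original.replace('""', '"'))
print(creatives[0]['previewUrl'])  # https://someurl.com/preview1.png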
For anyone else seeing this double double quote issue in JSON fields coming from CSV files in a Snowflake external stage (slightly different issue than the original question posted):
The issue is likely that you need to use the FIELD_OPTIONALLY_ENCLOSED_BY setting. Specifically, FIELD_OPTIONALLY_ENCLOSED_BY = '"' when setting up your fileformat.
(docs)
Example of creating such a file format:
create or replace file format mydb.myschema.my_tsv_file_format
    type = CSV
    field_delimiter = '\t'
    FIELD_OPTIONALLY_ENCLOSED_BY = '"';
And example of querying from a stage using this file format:
select
    $1 field_one,
    $2 field_two
    -- ...and so on
from '@my_s3_stage/path/to/file/my_tab_separated_file.csv' (file_format => 'my_tsv_file_format')

Python 2.7, MySQL, JSON, & DateTime - How can I remove the T between the Date and Time in the timestamp field?

I have a MySQL database with a few tables. I wrote a python script to dump some of the tables data into a JSON file. I am a bit confused with dumping the date and time stamp.
Here is the code sample, conversion.py:
import MySQLdb
import json
import collections
from datetime import date, datetime

# connect to database
conn = MySQLdb.connect(host="localhost", user="root", passwd="root", db="testdb")

# fetch rows
sql = "SELECT * from offices"
cursor = conn.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
data = []

def json_serial(obj):
    """JSON serializer for objects not serializable by default json code"""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError("Type %s not serializable" % type(obj))

for row in rows:
    d = collections.OrderedDict()
    d['officeCode'] = row[0]
    d['city'] = row[1]
    d['phone'] = row[2]
    d['eff_date'] = row[3]
    d['lastupdatedby'] = row[4]
    d['state'] = row[5]
    d['country'] = row[6]
    d['postalcode'] = row[7]
    d['territory'] = row[8]
    data.append(d)

with open('data.json', 'w') as outfile:
    json.dump(data, outfile, default=json_serial)

conn.close()
When I execute this code, a JSON file is created, which is fine. I have a problem with two fields: eff_date, which is a date type in the database, and lastupdatedby, which is a timestamp type.
"eff_date": "2015-09-23"
"lastupdatedby": "2016-08019T08:13:53"
So, in my JSON file, eff_time is created fine but lastupdatedby is getting a T in middle of date and time as shown above. But, in my actual database there is no T between the date and time. I would like to get rid of that T because I am planning to dump this file into a different database and I don't think it will accept that format.
Any help will be much appreciated.
The T between the date and time is per the ISO 8601 format.
And that's the format returned by the datetime.isoformat method, found in the code here:
return obj.isoformat()
(That happens to be the format that JavaScript expects.)
If we want to return a string in a different format, we need to use a different function, e.g. strftime in place of isoformat.
If isoformat is working for the date objects, leave that alone and use strftime only for datetime objects.
The format string "%Y-%m-%d %H:%M:%S" might suit your needs.
https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior
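A minimal sketch of that change to json_serial (note the datetime test has to come before the date test, because datetime is a subclass of date):
from datetime import date, datetime

def json_serial(obj):
    """JSON serializer that writes datetimes without the ISO 8601 'T'."""
    if isinstance(obj, datetime):  # must come first: datetime subclasses date
        return obj.strftime('%Y-%m-%d %H:%M:%S')
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError("Type %s not serializable" % type(obj))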

Export Datatables from Spotfire to CSV using IronPython Script

I have an IronPython script I use to export all the data tables from a Spotfire project.
Currently it works perfectly: it loops through all the data tables and exports them as ".xlsx". Now I need to export the files as ".csv", which I thought would be as simple as changing ".xlsx" to ".csv".
This script still exports the files and names them all .csv, but what is inside each file is still a .xlsx. I'm not sure how or why; the code just changes the file extension without converting the file to CSV.
Here is the code I am currently using:
I have posted the full code at the bottom, and the code I believe is relevant to my question in a separate code block at the top.
if dialogResult == DialogResult.Yes:
    for d in tableList:  # cycles through the table list elements defined above
        writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
        table = Document.Data.Tables[d[0]]  # d[0] is the Data Table name in the Spotfire project (defined above)
        filtered = Document.ActiveFilteringSelectionReference.GetSelection(table).AsIndexSet()  # OR pass the filter
        stream = File.OpenWrite(savePath + '\\' + d[1] + ".csv")  # d[1] is the Excel alias name. You could also use d.Name to export with the Data Table name
        names = []
        for col in table.Columns:
            names.append(col.Name)
        writer.Write(stream, table, filtered, names)
        stream.Close()
I think it may have to do with the ExcelXlsDataWriter.
I tried with ExcelXlsxDataWriter as well. Is there a csv writer I could use for this? I believe csv and txt files have a different writer.
Any help is appreciated.
Full script shown below:
import System
import clr
import sys
clr.AddReference("System.Windows.Forms")
from sys import exit
from System.Windows.Forms import FolderBrowserDialog, MessageBox, MessageBoxButtons, DialogResult
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers
from System.IO import File, FileStream, FileMode

# This is a list of Data Tables and their Excel file names. You can see each
# referenced below as d[0] and d[1] respectively.
tableList = [
    ["TestTable1", "TestTable1"],
    ["TestTable2", "TestTable2"],
]

# Use the location of the analysis file as the default place to put the exports.
from Spotfire.Dxp.Application import DocumentMetadata
dmd = Application.DocumentMetadata  # get metadata
path = str(dmd.LoadedFromFileName)  # get path
savePath = '\\'.join(path.split('\\')[0:-1]) + "\\DataExports\\"

dialogResult = MessageBox.Show("The files will be saved to " + savePath
                               + ". Do you want to change location?",
                               "Select the save location", MessageBoxButtons.YesNo)
if dialogResult == DialogResult.Yes:
    # get the save path from the user through a folder dialog instead of using the file location
    SaveFile = FolderBrowserDialog()
    SaveFile.ShowDialog()
    savePath = SaveFile.SelectedPath

# make sure that the user wants to export the files
dialogResult = MessageBox.Show("Export Files.", "Are you sure?", MessageBoxButtons.YesNo)
if dialogResult == DialogResult.Yes:
    for d in tableList:  # cycles through the table list elements defined above
        writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.ExcelXlsDataWriter)
        table = Document.Data.Tables[d[0]]  # d[0] is the Data Table name in the Spotfire project (defined above)
        filtered = Document.ActiveFilteringSelectionReference.GetSelection(table).AsIndexSet()  # OR pass the filter
        stream = File.OpenWrite(savePath + '\\' + d[1] + ".csv")  # d[1] is the Excel alias name. You could also use d.Name to export with the Data Table name
        names = []
        for col in table.Columns:
            names.append(col.Name)
        writer.Write(stream, table, filtered, names)
        stream.Close()
# if the user doesn't want to export, they just get a message
else:
    dialogResult = MessageBox.Show("ok.")
For some reason the Spotfire IronPython implementation does not support the csv package from the Python standard library.
The workaround I found for your implementation is to use StdfDataWriter instead of ExcelXlsDataWriter. STDF is the Spotfire Text Data Format. The DataWriter class in Spotfire supports only Excel and STDF (see here), and STDF comes closest to CSV.
from System.IO import File
from Spotfire.Dxp.Data.Export import DataWriterTypeIdentifiers

writer = Document.Data.CreateDataWriter(DataWriterTypeIdentifiers.StdfDataWriter)
table = Document.Data.Tables['DropDownSelectors']
filtered = Document.ActiveFilteringSelectionReference.GetSelection(table).AsIndexSet()
stream = File.OpenWrite("C:\\Users\\KLM68651\\Documents\\dropdownexport.stdf")
names = []
for col in table.Columns:
    names.append(col.Name)
writer.Write(stream, table, filtered, names)
stream.Close()
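If you need a true .csv at the end, one option is to post-process the exported file outside Spotfire with regular CPython, where the csv module is available. This is only a rough sketch: it assumes the STDF export is tab-delimited text (STDF also carries extra metadata header lines that you may need to skip), and the file names are placeholders:
import csv

# Hypothetical paths; point these at your actual export.
with open('dropdownexport.stdf', newline='') as src, open('dropdownexport.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src, delimiter='\t'):
        writer.writerow(row)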
Hope this helps

JMeter - Specify CSV row failure

Within JMeter, I am running a script which uses a .CSV file to enter data as well as verify results. It is working correctly, but I cannot figure out how to tell which row/line of the .CSV caused the individual failures. Is there a way to do this?
Somewhat of an example scenario (not specific to what I'm doing, but similar):
Each row of the .CSV file contains a mathematical equation as well as the expected result.
On page 1, enter the equation (2+2)
Then on Page 2, you get the response: 3.
That test would obviously be a failure.
Say there are 1,000 tests being ran, some that pass and some that do not. How can I tell which .CSV row/line didn't pass?
Do you have any columns in your CSV file which help you to uniquely identify a row?
Let me assume you have a column called 'TestCaseNo' with values TC001, TC002, TC003, etc.
Add a Beanshell Post Processor to store the result for each iteration.
Add the code below. I assume you have the PASS or FAIL result stored in the 'Result' variable.
import java.io.FileOutputStream;
import java.io.PrintStream;

FileOutputStream f = new FileOutputStream("somepath/tcstatus.csv", true);
PrintStream p = new PrintStream(f);
p.println(vars.get("TestCaseNo") + "," + vars.get("Result"));
p.close();
f.close();
The above code creates a CSV file with the results for each testcase.
EDIT:
Do the assertion yourself in the Beanshell post processor.
import java.io.FileOutputStream;
import java.io.PrintStream;

String Result = "FAIL";
String Response = prev.getResponseDataAsString();
if (Response.contains("value")) { // replace "value" with the expected text
    Result = "PASS";
}

FileOutputStream f = new FileOutputStream("somepath/tcstatus.csv", true);
PrintStream p = new PrintStream(f);
p.println(vars.get("TestCaseNo") + "," + Result);
p.close();
f.close();
I would use the following approach:
__CSVRead() function - to get data from the .csv file.
__counter() function - to track the CSV file position. You can include the counter variable name in the Sampler's label so the current .csv file line is reported with each result.
For more information on the aforementioned and other useful JMeter functions, see the How to Use JMeter Functions post series.