Using Python, unable to load CSV file into MySQL

I was trying to load trans_dt into the cmdels table, but it throws a syntax error. How do I add a local variable into the table? The local variable's name is newDate.
import mysql.connector

config = {
    'user': 'root',
    'password': 'password',
    'host': '127.0.0.1',
    'database': 'nse'
}
conn = mysql.connector.connect(**config)
c = conn.cursor()

def insertRows(fileName, c):
    delimiter = r','
    dateString = r'%d-%b-%Y'
    file = fileName.split("/")[-1]
    if file.startswith("MTO"):
        newDate = new_Date(file)
    c.execute("Load data local infile %s into table cmdels fields terminated by %s
        ignore 4 lines(recno,srno,symbol,series,qtytrd,qtydel,qtyper,trans_dt)
        SET trans_dt=(#trans_dt,%s)", (fileName, delimiter, newDate))

localExtractFilePath = "/Users/data/nse"
import os

def new_Date(s):
    return s[4:-4]

for file in os.listdir(localExtractFilePath):
    if file.endswith(".DAT"):
        insertRows(localExtractFilePath + "/" + file, c)
        print "Loaded file " + file + " into database"

conn.commit()
c.close()
conn.close()

Single-quoted string literals can't span newlines. You can use triple-quoted strings ("""Load data...""") but you end up with extra whitespace and newlines in the string, which can cause their own problems. A good option is to end each line with a closing quote and let Python concatenate the adjacent literals at compile time:
c.execute("Load data local infile %s into table cmdels fields terminated by %s"
" ignore 4 lines(recno,srno,symbol,series,qtytrd,qtydel,qtyper,trans_dt)"
" SET trans_dt=(#trans_dt,%s)", (fileName, delimiter,newDate))

Related

Load data to Snowflake using COPY INTO

I have been trying to load CSV data into Snowflake using the COPY INTO command.
This is the sample data:
4513194677~"DELL - ULTRASHARP 32\" MONITOR 4K U3223QE"~""~""
I have tried using the below COPY INTO syntax:
file_format = (
    type = 'csv'
    field_delimiter = '~'
    skip_header = 1
    record_delimiter = '\\n'
    field_optionally_enclosed_by = '"'
    ESCAPE = 'NONE'
    ESCAPE_UNENCLOSED_FIELD = 'NONE'
)
However, I am getting this error: "Found character 'M' instead of field delimiter '~'"
How can I escape the " and load the column data as DELL - ULTRASHARP 32 " MONITOR 4K U3223QE?
If I try to use ESCAPE, I get the below error when running the COPY command:
[ERROR] ProgrammingError: 001003 (42000): 01a8e01d-3201-36a9-0050-4502537cfc7f: SQL compilation error:
syntax error line 15 at position 43 unexpected '''.
syntax error line 20 at position 20 unexpected ')'.
file_format = (
    type = 'csv'
    field_delimiter = '~'
    skip_header = 1
    record_delimiter = '\\n'
    field_optionally_enclosed_by = '"'
    ESCAPE = '\\'
    ESCAPE_UNENCLOSED_FIELD = '\\'
)
Try using two double quotes in the data instead of one, rather than trying to escape the double quote. For data similar to:
Data similar to "sample"
you can have your CSV formatted like below:
"Data similar to ""sample"""

Exporting data from R to MySQL server

df <- data.frame(category = c("A","B","A","D","E"),
                 date = c("5/10/2005","6/10/2005","7/10/2005","8/10/2005","9/10/2005"),
                 col1 = c(1,NA,2,NA,3),
                 col2 = c(1,2,NA,4,5),
                 col3 = c(2,3,NA,NA,4))
I have to insert a data frame that is created in R into a MySQL server.
I have tried these methods (Efficient way to insert data frame from R to SQL). However, my data also has NA values, which make the whole export process fail.
Is there a faster way to upload the data?
dbWriteTable(cn,name ="table_name",value = df,overwrite=TRUE, row.names = FALSE)
The above works but is very slow to upload.
The method that I have to use is this:
before = Sys.time()
chunksize = 1000000  # arbitrary chunk size
for (i in 1:ceiling(nrow(df)/chunksize)) {
  query = paste0('INSERT INTO dashboard_file_new_rohan_testing (',
                 paste0(colnames(df), collapse = ','), ') VALUES ')
  vals = NULL
  for (j in 1:chunksize) {
    k = (i-1)*chunksize + j
    if (k <= nrow(df)) {
      vals[j] = paste0('(', paste0(df[k,], collapse = ','), ')')
    }
  }
  query = paste0(query, paste0(vals, collapse = ','))
  dbExecute(cn, query)
}
time_chunked = Sys.time() - before
Error Encountered:
Error in .local(conn, statement, ...) :
could not run statement: Unknown column 'NA' in 'field list'
One of the fastest ways to load data into MySQL is its LOAD DATA statement. You may try first writing your R data frame to a CSV file, then using MySQL's LOAD DATA to load it:
write.csv(df, "output.csv", row.names=FALSE)
Then, from the mysql client, run:
LOAD DATA INFILE 'output.csv' INTO TABLE table_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
Note that this assumes the CSV file is already on the same machine as MySQL. If not, and you still have it locally, then use LOAD DATA LOCAL INFILE instead.
You may read "MYSQL import data from csv using LOAD DATA INFILE" for more help using LOAD DATA.
Edit:
To deal with the issue of NA values, which should represent NULL in MySQL, you may take the approach of first casting the entire data frame to text, and then replacing the NA values with empty strings. LOAD DATA will interpret a missing value in a CSV column as NULL. Consider this:
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
df[is.na(df)] <- ""
Then, use write.csv along with LOAD DATA as described above.
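For comparison, here is a minimal Python sketch of the same CSV-then-LOAD-DATA approach, assuming pandas and mysql-connector-python are available, that the server permits LOCAL INFILE, and that a matching table already exists (table_name is a hypothetical name):

import pandas as pd
import mysql.connector

df = pd.DataFrame({"category": ["A", "B", "A"], "col1": [1, None, 2]})

# na_rep="" writes missing values as empty fields, mirroring the R step above.
df.to_csv("output.csv", index=False, na_rep="")

# allow_local_infile=True enables LOAD DATA LOCAL INFILE on the client side.
conn = mysql.connector.connect(user="root", password="password",
                               host="127.0.0.1", database="nse",
                               allow_local_infile=True)
c = conn.cursor()
c.execute("LOAD DATA LOCAL INFILE 'output.csv' INTO TABLE table_name "
          "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
          "LINES TERMINATED BY '\\n' IGNORE 1 LINES")
conn.commit()
c.close()
conn.close()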

Julia - Rewriting a CSV

Complete Julia newbie here.
I'd like to do some processing on a CSV. Something along the lines of:
using CSV

in_file = CSV.Source("/dir/in.csv")
out_file = CSV.Sink("/dir/out.csv")
for line in CSV.eachline(in_file)
    replace!(line, "None", "")
    CSV.writeline(out_file, line)
end
This is in pseudocode, those aren't existing functions.
Idiomatically, should I iterate on 1:CSV.countlines(in_file)? Do a while and check something?
If all you want to do is replace a string in the line, you do not need any CSV parsing utilities. All you do is read the file line by line, replace, and write. So:
infile = "/path/to/input.csv"
outfile = "/path/to/output.csv"

out = open(outfile, "w+")
for line in readlines(infile)
    newline = replace(line, "a", "b")
    write(out, newline)
end
close(out)
This will replicate the pseudocode you have in your question.
If you need to parse and read the CSV field by field, use the readcsv function in Base.
data = readcsv(infile)
typeof(data)  # Array{Any,2}
This will return the data in the file as a 2 dimensional array. You can process this data any way you want, and write it back using the writecsv function.
for i in 1:size(data, 1)                  # iterate by rows
    data[i, 1] = "This is " * data[i, 1]  # add text to the first column
end
writecsv(outfile, data)
Documentation for these functions:
http://docs.julialang.org/en/release-0.5/stdlib/io-network/?highlight=readcsv#Base.readcsv
http://docs.julialang.org/en/release-0.5/stdlib/io-network/?highlight=readcsv#Base.writecsv
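For readers coming from Python, the same line-by-line replacement is a one-pass loop there too (a minimal sketch; the paths are hypothetical):

# Read, replace, and write line by line without any CSV parsing,
# mirroring the Julia version above.
with open("/path/to/input.csv") as src, open("/path/to/output.csv", "w") as dst:
    for line in src:
        dst.write(line.replace("None", ""))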

blank file while copying a file in python

I have a function that takes a file as input, prints certain statistics, and also copies the file into a file name provided by the user. Here is my current code:
def copy_file(option):
    infile_name = input("Please enter the name of the file to copy: ")
    infile = open(infile_name, 'r')
    outfile_name = input("Please enter the name of the new copy: ")
    outfile = open(outfile_name, 'w')
    slist = infile.readlines()
    if option == 'statistics':
        for line in infile:
            outfile.write(line)
        infile.close()
        outfile.close()
        result = []
        blank_count = slist.count('\n')
        for item in slist:
            result.append(len(item))
        print('\n{0:<5d} lines in the list\n{1:>5d} empty lines\n{2:>7.1f} average character per line\n{3:>7.1f} average character per non-empty line'.format(
            len(slist), blank_count, sum(result)/len(slist), (sum(result)-blank_count)/(len(slist)-blank_count)))

copy_file('statistics')
It prints the statistics of the file correctly; however, the copy it makes of the file is empty. If I remove the readlines() part and the statistics part, the function seems to make a copy of the file correctly. How can I correct my code so that it does both? It's a minor problem but I can't seem to get it.
The reason the file is blank is that
slist = infile.readlines()
is reading the entire contents of the file, so when it gets to
for line in infile:
there is nothing left to read, and it just closes the newly truncated (mode w) file, leaving you with a blank file.
I think the answer here is to change your for line in infile: to for line in slist:
def copy_file(option):
    infile_name = input("Please enter the name of the file to copy: ")
    infile = open(infile_name, 'r')
    outfile_name = input("Please enter the name of the new copy: ")
    outfile = open(outfile_name, 'w')
    slist = infile.readlines()
    if option == 'statistics':
        for line in slist:
            outfile.write(line)
        infile.close()
        outfile.close()
        result = []
        blank_count = slist.count('\n')
        for item in slist:
            result.append(len(item))
        print('\n{0:<5d} lines in the list\n{1:>5d} empty lines\n{2:>7.1f} average character per line\n{3:>7.1f} average character per non-empty line'.format(
            len(slist), blank_count, sum(result)/len(slist), (sum(result)-blank_count)/(len(slist)-blank_count)))

copy_file('statistics')
Having said all that, consider whether it's worth using your own copy routine rather than shutil.copy - it is always better to delegate the task to your OS, as it will be quicker and probably safer (thanks to NightShadeQueen for the reminder)!
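A minimal sketch of that delegation (the file names are hypothetical):

import shutil

# Copies the file contents (and permission bits) in one call,
# leaving the statistics code free to work on slist alone.
shutil.copy("source.txt", "copy_of_source.txt")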

Encapsulate complex MySQL inserts with backquotes

I have exported some SQL statements and now need to replay them. The issue arises when I replay this into MySQL where actual Java code is included in the VALUES. This code includes backslashes, quotes, and double quotes, sometimes causing MySQL to misinterpret escaping and quoting:
INSERT INTO sonar.snapshot_sources(ID, SNAPSHOT_ID, DATA) VALUES (267, 420,
'package com.company.gateway.dl.util
"((\''|\")*)(stuff|moreStuff)((\''|\")*):((\''|\")*)([0-9]+)((\''|\")*)";
');
The above is not the full text, but the quoted bit here causes the insert to fail. A workaround is to enclose the complex text value in backquotes, e.g.
INSERT INTO sonar.snapshot_sources(ID, SNAPSHOT_ID, DATA) VALUES (267, 420,
`'package com.company.gateway.dl.util
"((\''|\")*)(stuff|moreStuff)((\''|\")*):((\''|\")*)([0-9]+)((\''|\")*)";
'`);
The question I have is: how can I sed precisely these statements to add backquotes where required? I want to replace 'package with `'package, and '); with '`);. The closing bracket seems the most complicated, since many more statements will match it.
The below Python script can help you. The script reads the SQL dump file, adds backquotes around the matched regex, and prints the result to an output file. The input file and output file should be given as command-line arguments.
#!/usr/bin/env python3
import re
import os
import sys

if len(sys.argv) < 3:
    print('Provide input and output file to begin processing')
    sys.exit(1)
else:
    if not os.path.isfile(sys.argv[1]):
        print('Input file does not exist')
        sys.exit(1)

ifile = sys.argv[1]
ofile = sys.argv[2]

# Compile the regex: capture the INSERT prefix, the quoted 'package ...' text,
# and the closing ');' so backquotes can be wrapped around the middle group.
rex = re.compile(r"""(INSERT[\s\S]+?)('package[\s\S]+?')(\);)""")

# Read the whole dump file as a single string
dataFile = open(ifile, 'r').read()
fp = open(ofile, 'w')

# Substitute, wrapping the quoted value in backquotes, and write it out
print(re.sub(rex, r'\1`\2`\3', dataFile), end='', file=fp)
fp.close()
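Assuming the script is saved as add_backquotes.py (a hypothetical name), it would be invoked as:

python3 add_backquotes.py dump.sql fixed_dump.sql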
Output
INSERT INTO sonar.snapshot_sources(ID, SNAPSHOT_ID, DATA) VALUES (267, 420,
`'package com.company.gateway.dl.util
"((\''|\")*)(stuff|moreStuff)((\''|\")*):((\''|\")*)([0-9]+)((\''|\")*)";
'`);