How to create a master script file to run MySQL scripts - mysql

I want to create the DB structure for my application in MySQL. I have some 100 scripts which create tables, stored procedures, and functions in different schemas.
Please suggest how I can run the scripts one after the other, and how I can stop if the previous script failed. I am using MySQL 5.6.
I am currently running them using a text file.
mysql> source /mypath/CreateDB.sql
which contains
tee /logout/session.txt
source /mypath/00-CreateSchema.sql
source /mypath/01-CreateTable1.sql
source /mypath/01-CreateTable2.sql
source /mypath/01-CreateTable3.sql
But they seem to be running simultaneously, and I have foreign keys in the tables below, due to which it is giving errors.

The scripts are not running simultaneously. The mysql client does not execute in a multi-threaded manner.
But it's possible that you are sourcing the scripts in an order that causes foreign keys to reference tables that you haven't defined yet, and this is a problem.
You have two possible fixes for this problem:
Create the tables in a dependency order that avoids this problem (each referenced table before the tables that reference it).
Create all the tables without their foreign keys, then run another script that contains ALTER TABLE ADD FOREIGN KEY... statements.
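Neither option changes how the master file is driven, so if you also want the run to abort as soon as one script fails (as the question asks), one approach is to drive the mysql client from a small wrapper instead of a single source file. A minimal sketch, not from the answers above, assuming the mysql command-line client is on PATH; host, user, password, database, and path are placeholders:
import glob
import subprocess
import sys

# Run the numbered scripts strictly one after another, stopping at the first failure.
for script in sorted(glob.glob("/mypath/*.sql")):
    print("Running", script)
    with open(script, "rb") as f:
        result = subprocess.run(
            ["mysql", "-h", "127.0.0.1", "-u", "root", "--password=secret", "mydb"],
            stdin=f)
    if result.returncode != 0:
        sys.exit("Stopping: %s failed with exit code %d" % (script, result.returncode))
In non-interactive (batch) mode the client already stops at the first SQL error inside a file and exits non-zero, which is what lets the wrapper skip the remaining scripts.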

I wrote a Python function to execute SQL files:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Download it at http://sourceforge.net/projects/mysql-python/?source=dlp
# Tutorials: http://mysql-python.sourceforge.net/MySQLdb.html
#            http://zetcode.com/db/mysqlpython/
import MySQLdb as mdb
import datetime, time

def run_sql_file(filename, connection):
    '''
    The function takes a filename and a connection as input
    and will run the SQL query on the given connection
    '''
    start = time.time()
    file = open(filename, 'r')
    sql = " ".join(file.readlines())
    print "Start executing: " + filename + " at " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M")) + "\n" + sql
    cursor = connection.cursor()
    cursor.execute(sql)
    connection.commit()
    end = time.time()
    print "Time elapsed to run the query:"
    print str((end - start)*1000) + ' ms'

def main():
    connection = mdb.connect('127.0.0.1', 'root', 'password', 'database_name')
    run_sql_file("my_query_file.sql", connection)
    connection.close()

if __name__ == "__main__":
    main()
I haven't tried it with stored procedures or large SQL statements. Also, if you have SQL files containing several SQL statements, you might have to split(";") to extract each statement and call cursor.execute(sql) for each one. Feel free to edit this answer to incorporate these improvements.
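A rough sketch of that split(";") idea, assuming the files contain plain statements with no semicolons inside string literals or stored-procedure bodies (custom delimiters would need special handling):
def run_sql_file_multi(filename, connection):
    '''
    Naive variant for files with several statements: split on ";" and
    execute each non-empty statement on the given connection.
    '''
    with open(filename, 'r') as f:
        statements = f.read().split(';')
    cursor = connection.cursor()
    for statement in statements:
        if statement.strip():
            cursor.execute(statement)
    connection.commit()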

Related

Python script to copy table from one server to another - but row in results is truncating the ddl script

I am writing a Python script that copies the DDL structure and data from one server to another. I have tried SQLAlchemy/Pandas to write (to_sql) to the target server to create the structure, but it doesn't copy the proper datatypes. So I have decided to store the DDL structure in a table on the source server. In the script I am executing a SQL statement to get the DDL structure, and then I would like to execute that structure on the target server. However, when I look at the results, the DDL script is truncated and therefore I am not able to execute it on the target server. I am very new to Python, so I don't understand a lot of things, but here is what I am doing:
import sqlalchemy as sa
import turbodbc as tb
import urllib.parse

srcparams = 'DRIVER={ODBC Driver 17 for SQL Server}' + ';' \
            'SERVER=' + SourceServer + ';' \
            'DATABASE=' + SourceDatabase + ';' \
            'Trusted_Connection=yes' + ';'
srcparams = urllib.parse.quote_plus(srcparams)  # urllib.parse.quote_plus("DRIVER={ODBC Driver 17 for SQL Server};server=srcserver;database=srcdb;Trusted_Connection=yes")
srcengine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % srcparams)
srcconn = srcengine.connect()
results = srcconn.execute("SELECT script FROM util.GeneratedDDLScript WHERE ObjectName = '" + table + "'")
for row in results:
    print(row.script)
This prints only the last line of the script
, [RowLoadDateTime] [DATETIME] NULL CONSTRAINT [DF__DimAddres_r] DEFAULT (getdate())
); , CONSTRAINT [DimAddress_PK] PRIMARY KEY CLUSTERED ([DimAddressKey])
My intention is to execute row.script as SQL on the target server to build the table structure. I would appreciate any help.
Thanks
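Not part of the question, but for reference, a minimal sketch of the intended last step (running the retrieved DDL on the target server), reusing srcconn, table, and the imports from the snippet above and assuming the full script text comes back in a single row; the target server and database names are placeholders, and the raw-string execute mirrors the SQLAlchemy 1.x style used in the question:
# Hypothetical target connection; server/database names are placeholders.
tgtparams = urllib.parse.quote_plus(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=TargetServer;DATABASE=TargetDatabase;Trusted_Connection=yes')
tgtengine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s" % tgtparams)

with tgtengine.connect() as tgtconn:
    for row in srcconn.execute(
            "SELECT script FROM util.GeneratedDDLScript WHERE ObjectName = '" + table + "'"):
        tgtconn.execute(row.script)  # run the stored CREATE TABLE text as-is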

Is opening multiple cursors in Mysql a costly operation using Python?

I have a few tables in MySQL which are to be loaded into Teradata. I am going with a file-based approach here, meaning I export the MySQL tables into delimited files and then try to load those files into Teradata. The clarification I am looking for is this: we maintain a MySQL stored procedure to extract the data from the tables, and I call this stored procedure from a Python script to fetch the table data. Is it good/optimal to use the stored procedure? To get the list of tables, retention period, database and other details, I create one cursor to fetch data from one table, and then I have to create another cursor to call the stored procedure.
Is creating a cursor a costly operation in MySQL?
Instead of using a table to fetch the list of tables, retention period, database and other details, would it be better to maintain them in a flat file?
Please share your thoughts.
import sys
import mysql.connector
from mysql.connector import MySQLConnection, Error
import csv

output_file_path = '/home/XXXXXXX/'
sys.path.insert(0, '/home/XXXXXXX/')
from mysql_config import *

def stored_proc_call(tbl):
    print('SP call:', tbl)
    conn_sp = mysql.connector.connect(host=dsn, database=database, user=username, passwd=password, allow_local_infile=True)
    conn_sp_cursor = conn_sp.cursor(buffered=True)
    conn_sp_cursor.callproc('mysql_stored_proc', [tbl])
    output_file = output_file_path + tbl + '.txt'
    print('output_file:', output_file)
    with open(output_file, 'w') as filehandle:
        writer = csv.writer(filehandle, delimiter='\x10')
        for result in conn_sp_cursor.stored_results():
            print('Stored proc cursor:{}, value:{}'.format(type(result), result))
            for row in result:
                writer.writerow(row)
                #print('cursor row', row)

# Allow loading client-side files using the LOAD DATA LOCAL INFILE statement.
con = mysql.connector.connect(host=dsn, database=database, user=username, passwd=password, allow_local_infile=True)
cursor = con.cursor(buffered=True)
cursor.execute("select * from table")
for row in cursor:
    print('Archive table cursor:{}, value:{}'.format(type(row), row))
    (db, table, col, orgid, *allvalues) = row
    stored_proc_call(table)
    #print('db:{}, table:{}, col:{}, orgid:{}, ret_period:{}, allvalues:{}'.format(db,table,col,orgid,ret_period,allvalues))
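For comparison, a hedged sketch (not from the question) of the same flow reusing the single connection con for the stored-procedure call. Cursors in Connector/Python are lightweight client-side objects; most of the cost sits in opening a new connection per table, which means a fresh network handshake and authentication each time. The helper name below is hypothetical and it reuses cursor, con, output_file_path, and csv from the script above:
def stored_proc_call_shared(tbl, conn):
    # Same extraction as stored_proc_call above, but using a cursor taken from
    # the connection that is passed in instead of opening a new connection.
    sp_cursor = conn.cursor(buffered=True)
    sp_cursor.callproc('mysql_stored_proc', [tbl])
    output_file = output_file_path + tbl + '.txt'
    with open(output_file, 'w') as filehandle:
        writer = csv.writer(filehandle, delimiter='\x10')
        for result in sp_cursor.stored_results():
            for row in result:
                writer.writerow(row)
    sp_cursor.close()

# The driving cursor is buffered, so its rows are already in memory and the same
# connection can serve the stored-procedure cursor between iterations.
for row in cursor:
    (db, table, col, orgid, *allvalues) = row
    stored_proc_call_shared(table, con)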

ETL script in Python to load data from another server .csv file into mysql

I work as a Business Analyst and am new to Python.
In one of my projects, I want to extract data from a .csv file and load that data into my MySQL DB (staging).
Can anyone guide me with sample code and the frameworks I should use?
Here is a simple program that creates an SQLite database. You can read the CSV file and use dynamic_data_entry to insert into your desired target table.
import sqlite3
import time
import datetime
import random

conn = sqlite3.connect('test.db')
c = conn.cursor()

def create_table():
    c.execute('create table if not exists stuffToPlot(unix REAL, datestamp TEXT, keyword TEXT, value REAL)')

def data_entry():
    c.execute("INSERT INTO stuffToPlot VALUES(1452549219,'2016-01-11 13:53:39','Python',6)")
    conn.commit()
    c.close()
    conn.close()

def dynamic_data_entry():
    unix = time.time()
    date = str(datetime.datetime.fromtimestamp(unix).strftime('%Y-%m-%d %H:%M:%S'))
    keyword = 'python'
    value = random.randrange(0, 10)
    c.execute("INSERT INTO stuffToPlot(unix,datestamp,keyword,value) values(?,?,?,?)",
              (unix, date, keyword, value))
    conn.commit()

def read_from_db():
    c.execute('select * from stuffToPlot')
    #data = c.fetchall()
    #print(data)
    for row in c.fetchall():
        print(row)

read_from_db()
c.close()
conn.close()
You can iterate through the data in the CSV and load it into SQLite3, as sketched below. Please refer to the link below as well.
Quick easy way to migrate SQLite3 to MySQL?
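A hedged sketch of that iteration step, following the same pattern as the program above; the CSV file name and the assumption that its columns line up with stuffToPlot are mine:
import csv
import sqlite3

conn = sqlite3.connect('test.db')
c = conn.cursor()

# Assumes input.csv has no header and its columns match unix, datestamp, keyword, value.
with open('input.csv', newline='') as f:
    for unix, datestamp, keyword, value in csv.reader(f):
        c.execute("INSERT INTO stuffToPlot(unix,datestamp,keyword,value) values(?,?,?,?)",
                  (unix, datestamp, keyword, value))

conn.commit()
conn.close()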
If that's a properly formatted CSV file, you can use the LOAD DATA INFILE MySQL command and you won't need any Python. Then, after it is loaded into the staging area (without processing), you can continue transforming it using the SQL/ETL tool of your choice.
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
One problem with that is that you need to add all the columns; but even if you have data you don't need, you might still prefer to load everything into staging.
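A rough sketch of that route driven from Python (my assumption, not from the answer), using mysql.connector to issue the LOCAL variant of the statement for a client-side file; the file path, staging table, and column settings are placeholders, and the server must have local_infile enabled:
import mysql.connector

# allow_local_infile=True is needed for LOAD DATA LOCAL INFILE.
conn = mysql.connector.connect(host='localhost', database='staging',
                               user='username', passwd='password',
                               allow_local_infile=True)
cursor = conn.cursor()
cursor.execute(
    "LOAD DATA LOCAL INFILE '/path/to/input.csv' "
    "INTO TABLE staging_table "
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n' "
    "IGNORE 1 LINES")
conn.commit()
conn.close()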

RMySQL - dbWriteTable() - The used command is not allowed with this MySQL version

I am trying to read a few Excel files into a data frame and then write it to a MySQL database. The following program is able to read the files and create the data frame, but when it tries to write to the db using the dbWriteTable command, I get an error message -
Error in .local(conn, statement, ...) :
could not run statement: The used command is not allowed with this MySQL version
library(readxl)
library(RMySQL)
library(DBI)
mydb = dbConnect(RMySQL::MySQL(), host='<ip>', user='username', password='password', dbname="db",port=3306)
setwd("<directory path>")
file.list <- list.files(pattern='*.xlsx')
print(file.list)
dat = lapply(file.list, function(i){
  print(i)
  x = read_xlsx(i, sheet=NULL, range=cell_cols("A:D"), col_names=TRUE, skip=1, trim_ws=TRUE, guess_max=1000)
  x$file = i
  x
})
df = do.call("rbind.data.frame", dat)
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
dbDisconnect(mydb)
I checked the definition of the dbWriteTable function and it looks like it is using LOAD DATA LOCAL INFILE to store the data in the database. As per some other answered questions on Stack Overflow, I understand that the word LOCAL could be the cause for concern, but since it is already in the function definition, I don't know what I can do. Also, this statement uses "," as the separator, but my data has "," in some of the values, which is why I was interested in using data frames, hoping they would preserve the source structure. But now I am not so sure.
Is there any other way/function to write the data frame to the MySQL tables?
I solved this on my system by adding the following line to the my.cnf file on the server (you may need to use root and vi to edit!). In my case this is just below the '[mysqld]' line:
local-infile=1
Then restart the server.
Good luck!
You may need to change
dbWriteTable(mydb, name="table_name", value=df, append=TRUE )
to
dbWriteTable(mydb, name="table_name", value=df,field.types = c(artist="varchar(50)", song.title="varchar(50)"), row.names=FALSE, append=TRUE)
That way, you specify the field types in R and append data to your MySQL table.
Source: Unknown column in field list error Rmysql

PostgreSQL multiple CSV import and add filename to each column

I've got 200k csv files and I need to import them all into a single PostgreSQL table. It's a list of parameters from various devices, and each csv's file name contains the device's serial number, which I need in one of the columns for each row.
So to simplify: I've got a few columns of data (no headers); let's say the columns in each csv file are Date, Variable, Value, and the file name looks like SERIALNUMBER_and_someOtherStuffIDontNeed.csv.
I'm trying to use Cygwin to write a bash script to iterate over the files and do it for me, however for some reason it won't work, showing 'syntax error at or near "as"'.
Here's my code:
#!/bin/bash
FILELIST=/cygdrive/c/devices/files/*
for INPUT_FILE in $FILELIST
do
psql -U postgres -d devices -c "copy devicelist
(
Date,
Variable,
Value,
SN as CURRENT_LOAD_SOURCE(),
)
from '$INPUT_FILE
delimiter ',' ;"
done
I'm learning SQL so it might be an obvious mistake, but I can't see it.
Also, I know that in that form I will get the full file name, not just the serial number bit I want, but I can probably handle that somehow later.
Please advise.
Thanks.
I don't think there is a CURRENT_LOAD_SOURCE() function in Postgres. A workaround is to leave the name column NULL on COPY, and patch it to the desired value just after the copy. I prefer a shell here-document because that makes quoting inside the SQL body easier. (BTW: for 10K files, the globbing needed to obtain FILELIST might exceed ARG_MAX for the shell ...)
#!/bin/bash
FILELIST="`ls /tmp/*.c`"
for INPUT_FILE in $FILELIST
do
echo "File:" $INPUT_FILE
psql -U postgres -d devices <<OMG
-- I have a schema "tmp" for testing purposes
CREATE TABLE IF NOT EXISTS tmp.filelist(name text, content text);
COPY tmp.filelist ( content)
from '/$INPUT_FILE' delimiter ',' ;
UPDATE tmp.filelist SET name = '$INPUT_FILE'
WHERE name IS NULL;
OMG
done
For anyone interested in an answer: I've used a Python script to change the file names, and then another script using psycopg2 to connect to the database, doing everything in one connection. It took 10 minutes instead of 10 hours.
Here's the code:
Renaming the files (also, apparently to import from CSV you need all the rows to be filled, and the information I needed was in the first 4 columns anyway, so I've put together a solution that generates whole new CSVs instead of just renaming them):
import os
import csv

path = 'C:/devices/files'
os.chdir(path)
i = 0
for file in os.listdir(path):
    try:
        i += 1
        if i % 10000 == 0:
            # just to see the progress
            print(i)
        serial_number = (file[:8])
        creader = csv.reader(open(file))
        cwriter = csv.writer(open('processed_' + file, 'w'))
        for cline in creader:
            new_line = [val for col, val in enumerate(cline) if col not in (4, 5, 6, 7)]
            new_line.insert(0, serial_number)
            #print(new_line)
            cwriter.writerow(new_line)
    except:
        print('problem with file: ' + file)
        pass
Updating database:
import os
import psycopg2

path = "C:\\devices\\files"
directory_listing = os.listdir(path)

conn = psycopg2.connect("dbname='devices' user='postgres' host='localhost'")
cursor = conn.cursor()
print(len(directory_listing))

i = 100001
while i < 218792:
    current_file = (directory_listing[i])
    i += 1
    full_path = "C:/devices/files/" + current_file
    with open(full_path) as f:
        cursor.copy_from(file=f, table='devicelistlive', sep=",")
    conn.commit()
conn.close()
Don't mind the while loop and the weird numbers; it's just because I was doing it in portions for testing purposes. It can easily be replaced with a for loop, as sketched below.
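A hedged sketch of what that for-loop version might look like, keeping the same placeholder paths and table name and reusing the names from the script above:
# Hypothetical for-loop version covering every file in the directory.
for current_file in directory_listing:
    full_path = os.path.join(path, current_file)
    with open(full_path) as f:
        cursor.copy_from(file=f, table='devicelistlive', sep=",")
    conn.commit()
conn.close()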