I am writing a Python script that connects to a MySQL server. It creates databases based on the folder paths and, in each of those databases, tables based on the files present in the corresponding folders.
The following code does everything fine, but I want to optimize it so that it loads data into a table only if that table is empty.
The problem with the code below is that it correctly skips table creation when a table already exists, but when it comes to loading data, every time I run the script it loads the data into the tables from the files all over again.
I want to tweak it so that it loads data into a table only if the data is not already present in it. If the data is present, the code should move on.
I tried something like CREATE TABLE IF NOT EXISTS, but that did not help with the loading step.
import os
import pathlib

import mysql.connector
from mysql.connector.constants import ClientFlag
from termcolor import colored

hostname = 'hostname'
username = 'usrname'
password = '12345'
database = 'd1'
portname = '12345'

print(colored('\nConnecting SQL database using host = '+hostname+' , username = '+username+' , port = '+portname+' , database = '+database+'.','cyan',attrs=['reverse','blink']))
print('\n')

myConnection = mysql.connector.connect(user=username, passwd=password, host=hostname, port=portname, database=database, client_flags=[ClientFlag.LOCAL_FILES])
myCursor = myConnection.cursor()
rootDir35 = '/mnt/Wdrive/pc35/SK/E13'
filenames35 = os.listdir(rootDir35)
root35 = pathlib.Path(rootDir35)
non_empty_dirs35 = {str(p35.parent) for p35 in root35.rglob('*') if p35.is_file()}
#35
try:
    print(colored('**** Starting Performing SQL-Queries for pc35 **** \n','green',attrs=['reverse','blink']))
    for f35 in non_empty_dirs35:
        # Build a database name from the folder path: both '/' and '-' become '_'
        dB35 = f35.replace('/', '_').replace('-', '_')
        for dirName35, subdirList35, fileList35 in os.walk(rootDir35):
            if dirName35 == f35:
                print(colored('Current Working Directory is: %s ' % f35, 'cyan'))
                createDB35 = 'CREATE DATABASE IF NOT EXISTS %s' % dB35
                myCursor.execute(createDB35)
                print(colored('Database of pc35 Created : %s' % dB35, 'cyan'))
                useDB35 = 'USE %s' % dB35
                myCursor.execute(useDB35)
                myConnection.commit()
                print(colored('Database in use : %s' % dB35, 'cyan'))
                print(' ')
                for fname35 in fileList35:
                    completePath35 = '%s/%s' % (dirName35, fname35)
                    tblname35 = os.path.basename(fname35).split('.')[0]
                    if '-' not in tblname35 and '.' not in tblname35:
                        sql35 = ('CREATE TABLE IF NOT EXISTS %s (Datum varchar(50), Uhrzeit varchar(13), '
                                 'UpsACT_V varchar(6), UpsPRE_V varchar(6), IpsACT_A varchar(6), IpsPRE_A varchar(6), '
                                 'PpsACT_W varchar(6), PpsPRE_W varchar(10), UelACT_V varchar(6), UelPRE_V varchar(6), '
                                 'IelACT_A varchar(8), IelPRE_A varchar(8), PelACT_W varchar(8), PelPRE_W varchar(8), '
                                 'Qlad_Ah varchar(10), Qlast_Ah varchar(10))') % tblname35
                        myCursor.execute(sql35)
                        myConnection.commit()
                        print(colored('The Table %s in database %s is created' % (tblname35, dB35), 'yellow'))
                        loadData35 = "LOAD DATA LOCAL INFILE '%s' INTO TABLE %s" % (completePath35, tblname35)
                        myCursor.execute(loadData35)
                        myConnection.commit()
                        print(colored('Data loaded from file %s into table %s' % (fname35, tblname35), 'green'))
                        print(' ')
    print(colored('**** SQL-Queries for pc35 successfully executed **** \n','green',attrs=['reverse','blink']))
except mysql.connector.Error as err:
    print(' ')
    print(colored('**** SQL queries for pc35 were not executed. Please refer to the report or user manual for more details ****','red',attrs=['reverse','blink']))
    print(err)
    print(' ')
What I want is something like "LOAD DATA if not exists" for a table.
What do you think: is this possible, or what should I do to achieve it?
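One idea, sketched as an assumption rather than tested code (the load_if_empty helper below is hypothetical): probe the table for a single row first, and only issue the LOAD DATA when nothing comes back.

# Hypothetical helper: load the file only when the table is still empty.
def load_if_empty(cursor, connection, table, path):
    cursor.execute('SELECT 1 FROM %s LIMIT 1' % table)  # cheap emptiness probe
    if cursor.fetchone() is None:  # no rows yet -> safe to load
        cursor.execute("LOAD DATA LOCAL INFILE '%s' INTO TABLE %s" % (path, table))
        connection.commit()
        return True
    return False  # table already has data, move on

Inside the inner loop, the unconditional LOAD DATA block would then become load_if_empty(myCursor, myConnection, tblname35, completePath35).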
I am trying to create a database with one table. The database has been created but the table is not. I am using the database, so I am not sure why the table is not showing. In fact, I managed to create the table earlier today with the same code, but deleted it by mistake and had to rebuild. Any help would be appreciated.
import mysql.connector
from mysql.connector import errorcode
from database import cursor
DB_NAME = 'stations'
TABLES = {}
sql ='''CREATE TABLE INCOMING_ENTRIES(
ID INT(11) NOT NULL PRIMARY KEY,
START_DATE VARCHAR(45),
START_TIME VARCHAR(45),
END_DATE VARCHAR(45),
BEGIN_LOCATION VARCHAR(45),
END_LOCATION VARCHAR(45),
HEADCODE VARCHAR(45),
VEHICLE_MAKEUP VARCHAR(100)
)'''
def create_database():
    cursor.execute(sql)
    print("Database {} created!".format(DB_NAME))

def create_tables():
    cursor.execute("USE {}".format(DB_NAME))
    for table_name in TABLES:
        table_description = TABLES[table_name]
        try:
            print("Creating table ({})".format(table_name), end="")
            cursor.execute(table_description)
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_TABLE_EXISTS_ERROR:
                print("Already exists")
            else:
                print(err.msg)

create_database()
create_tables()
And below is the database connection setup:
import os
import mysql.connector
DB_USERNAME = os.getenv("DB_USERNAME")
DB_PASSWORD = os.getenv("DB_PASSWORD")
db = mysql.connector.connect(
    host="localhost",
    user=DB_USERNAME,
    password=DB_PASSWORD
)
cursor = db.cursor()
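For what it's worth, a sketch of one likely fix, under the assumption that the stations database is supposed to be created by this script: as written, create_database() executes the CREATE TABLE statement (with no database selected), and TABLES is empty, so create_tables() has nothing to loop over.

# Sketch, assuming the database itself should be created first:
def create_database():
    cursor.execute("CREATE DATABASE IF NOT EXISTS {}".format(DB_NAME))
    print("Database {} created!".format(DB_NAME))

# Register the table DDL so the create_tables() loop actually sees it.
TABLES['INCOMING_ENTRIES'] = sql

With those two changes, create_tables() would USE the stations database and issue the CREATE TABLE, reporting "Already exists" on later runs.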
I have several CSV files in one folder with the same structure and would like to import them into a database for further analysis.
The CSV columns are the following:
Company_Code , Company_Name , Year , Account_number , Account_Description, Value
My intention is to use these CSVs to populate 3 tables in my DB in MySQL Workbench. These tables are responsible for organizing:
company_data: id, code and name.
Accounts: id, account number and account description.
Values: companyID, accountID, year and value.
I want to always keep the relationship between company values and accounts, but in a way that makes queries and analysis easier.
I have tried Python but I can't figure out how to attach foreign keys (see the sketch after the code below). I don't know if it's better to import to a temporary table and then populate each table directly in MySQL Workbench.
Edit:
For my Python approach I was trying to retrieve the data from an XML file that has the same information, but it started getting too technical because I would need to implement several validations that are not necessary with the CSV.
import xml.etree.ElementTree as ET
import csv
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="########",
    database="cvmsanepar"
)
mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE CVM (conta VARCHAR(255), descricao VARCHAR(255), valor1 INTEGER(10), valor2 INTEGER(10), valor3 INTEGER(10))")

dfpTree = ET.parse("InfoFinaDFin.xml")
dfpRoot = dfpTree.getroot()

conta1 = []     # CVM chart-of-accounts code (accounts)
descricao = []  # account description
valor1 = []     # value for the document year (value year 1)
valor2 = []     # value for the previous year (value year 2)
valor3 = []     # value for two years back (value year 3)

for numeroDeConta in dfpRoot.iter('NumeroConta'):
    conta1.append(numeroDeConta.text)
for descricaoConta in dfpRoot.iter('DescricaoConta1'):
    descricao.append(descricaoConta.text)
for valor in dfpRoot.findall('InfoFinaDFin'):
    valor1.append(valor.find('ValorConta1').text)
    valor2.append(valor.find('ValorConta2').text)
    valor3.append(valor.find('ValorConta3').text)

def merge(conta1, descricao, valor1, valor2, valor3):
    mergedList = [(conta1[i], descricao[i], valor1[i], valor2[i], valor3[i]) for i in range(len(conta1))]
    return mergedList

sqlFormula = "INSERT INTO CVM (conta, descricao, valor1, valor2, valor3) VALUES (%s, %s, %s, %s, %s)"
mycursor.executemany(sqlFormula, merge(conta1, descricao, valor1, valor2, valor3))
mydb.commit()
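A minimal sketch of the three-table CSV load, under assumed table and column names (company_data(id, code, name), accounts(id, number, description), `values`(company_id, account_id, year, value)) and a hypothetical file name; the key point is that cursor.lastrowid after each parent INSERT yields the id to store as the foreign key:

# Sketch under an assumed schema; adapt the names to the real tables.
import csv
import mysql.connector

db = mysql.connector.connect(host="localhost", user="root", passwd="########", database="cvmsanepar")
cur = db.cursor()

company_ids = {}   # Company_Code -> company_data.id, insert each company once
account_ids = {}   # Account_number -> accounts.id, insert each account once

with open('report.csv', newline='') as f:  # hypothetical file name
    reader = csv.reader(f)
    next(reader, None)  # skip the header row, if there is one
    for code, name, year, acc_no, acc_desc, value in reader:
        if code not in company_ids:
            cur.execute("INSERT INTO company_data (code, name) VALUES (%s, %s)", (code, name))
            company_ids[code] = cur.lastrowid  # primary key just generated
        if acc_no not in account_ids:
            cur.execute("INSERT INTO accounts (number, description) VALUES (%s, %s)", (acc_no, acc_desc))
            account_ids[acc_no] = cur.lastrowid
        cur.execute(
            "INSERT INTO `values` (company_id, account_id, year, value) VALUES (%s, %s, %s, %s)",
            (company_ids[code], account_ids[acc_no], year, value))
db.commit()

Caching the generated ids in dicts keeps each company and account row unique while every value row still points back to its parents; note that values is a reserved word in MySQL, hence the backticks.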
I made a Python program which takes arguments from a function call to update a table. The arguments are passed successfully, but it does not update the table.
import mysql.connector

mydb = mysql.connector.connect(host="localhost", user='root', passwd="", database='student')
print(mydb)
mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE IF NOT EXISTS testtable (num INT NOT NULL AUTO_INCREMENT, issue varchar(30), status varchar(30), PRIMARY KEY (num))")

def dev(y, z):
    values = (y, z)
    print(values)
    print(mydb)
    sql = "UPDATE form SET status = %s WHERE num = %s"
    mycursor.execute(sql, values)
    mydb.commit()
    print(mycursor.rowcount, "record(s) affected")

values = ('goog', 1)
sql = "UPDATE form SET status = %s WHERE num = %s"
mycursor.execute(sql, values)
mydb.commit()
print(mycursor.rowcount, "record(s) affected")

dev('goog', 2)
A similar query outside the function works properly. For some reason mycursor.execute() won't take effect inside it.
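One thing worth ruling out, sketched below as an assumption rather than a diagnosis: UPDATE reports 0 record(s) affected when the WHERE clause matches no row, so a num that does not exist in form looks exactly like a silent failure.

# Hypothetical check: confirm the target row exists before blaming execute().
def dev(y, z):
    mycursor.execute("SELECT num, status FROM form WHERE num = %s", (z,))
    if mycursor.fetchone() is None:
        print("no row with num =", z, "- an UPDATE would affect 0 records")
        return
    mycursor.execute("UPDATE form SET status = %s WHERE num = %s", (y, z))
    mydb.commit()
    print(mycursor.rowcount, "record(s) affected")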
What is the most effective way to insert data dumped from database A into database B? Normally I would use mysqldump for a task like this, but because of the complex query I had to take a different approach. At present I have the following inefficient solution:
from sqlalchemy import create_engine, Column, INTEGER, CHAR, VARCHAR
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()
SessFactory = sessionmaker()

print('## Configure database connections')
db_one = create_engine('mysql://root:pwd1@127.0.0.1/db_one', echo=True).connect()
sess_one = SessFactory(bind=db_one)
db_two = create_engine('mysql://root:pwd2@127.0.0.2/db_two', echo=True).connect()
sess_two = SessFactory(bind=db_two)

## Declare query to dump data
dump_query = (
    'SELECT A.id, A.name, B.address '
    'FROM table_a A JOIN table_b B '
    'ON A.id = B.id_c WHERE '
    'A.deleted = 0'
)

print('## Fetch data on db_one')
data = db_one.execute(dump_query).fetchall()

## Declare table on db_two
class cstm_table(Base):
    __tablename__ = 'cstm_table'
    pk = Column(INTEGER, primary_key=True)
    id = Column(CHAR(36), nullable=False)
    name = Column(VARCHAR(150), default=None)
    address = Column(VARCHAR(150), default=None)

print('## Recreate "cstm_table" on db_two')
cstm_table.__table__.drop(bind=db_two, checkfirst=True)
cstm_table.__table__.create(bind=db_two)

print('## Insert dumped data into the "cstm_table" on db_two')
for row in data:
    insert = cstm_table.__table__.insert().values(row)
    db_two.execute(insert)
This executes sequentially over 100K inserts (horrible).
I also tried:
with db_two.connect() as conn:
    with conn.begin() as trans:
        row_as_dict = [dict(row.items()) for row in data]
        try:
            conn.execute(cstm_table.__table__.insert(), row_as_dict)
        except:
            trans.rollback()
            raise
        else:
            trans.commit()
But then, after inserting ~20 rows, I get this error:
OperationalError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
The following also does the job, but I'm not so sure it's the most efficient:
sess_two.add_all([cstm_table(**dict(row.items())) for row in data])
sess_two.flush()
sess_two.commit()
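A chunked variant of the bulk insert might avoid both the row-by-row slowness and the dropped connection; this is a sketch, assuming the single huge executemany is what trips the server's max_allowed_packet or timeout:

# Sketch: send the rows in batches so no single statement grows unbounded.
CHUNK = 1000  # hypothetical batch size; tune against the server settings

rows_as_dicts = [dict(row.items()) for row in data]
with db_two.begin() as trans:  # one transaction wrapped around all batches
    for i in range(0, len(rows_as_dicts), CHUNK):
        db_two.execute(cstm_table.__table__.insert(), rows_as_dicts[i:i + CHUNK])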
import csv
import MySQLdb

conn = MySQLdb.connect('localhost', 'tekno', 'poop', 'media')
cursor = conn.cursor()
txt = csv.reader(file('movies.csv'))
for row in txt:
    cursor.execute('insert into shows_and_tv (watched_on, title, score_rating) '
                   'values ("%s","%s","%s")', row)
conn.close()
When I run this I get:
TypeError: not enough arguments for format string
but it matches up. The CSV is formatted like
dd-mm-yyyy,string,tinyint
which matches the fields in the database.
I do not have a MySQL database to play with, so I did what you need in SQLite instead. It should be quite easy to adapt this to your needs.
import csv
import sqlite3
from collections import namedtuple

conn = sqlite3.connect('statictest.db')
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS movies (ID INTEGER PRIMARY KEY AUTOINCREMENT, 'watched_on', 'title', 'score_rating')''')

record = namedtuple('record', ['watched_on', 'title', 'score_rating'])

SQL = '''
INSERT INTO movies ("watched_on", "title", "score_rating") VALUES (?, ?, ?)
'''

with open('statictest.csv', 'r') as file:
    read_data = csv.reader(file)
    for row in read_data:
        watched_on, title, score_rating = row
        data = record(watched_on, title, score_rating)
        c.execute(SQL, data)
conn.commit()
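Adapting this back to the MySQLdb code in the question is mostly a matter of placeholder style; a sketch, assuming the original table and CSV: MySQLdb uses %s placeholders, and they must not be wrapped in quotes, since the driver does the quoting and escaping itself.

# Sketch of the MySQLdb version: bare %s placeholders, plus the commit the
# original code was missing.
import csv
import MySQLdb

conn = MySQLdb.connect('localhost', 'tekno', 'poop', 'media')
cursor = conn.cursor()
with open('movies.csv') as f:
    for row in csv.reader(f):
        cursor.execute(
            'INSERT INTO shows_and_tv (watched_on, title, score_rating) '
            'VALUES (%s, %s, %s)', row)
conn.commit()
conn.close()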