Quick easy way to migrate SQLite3 to MySQL? [closed]

Anyone know a quick easy way to migrate a SQLite3 database to MySQL?

Everyone seems to start off with a few greps and Perl expressions, and you sorta kinda get something that works for your particular dataset, but you have no idea whether it imported the data correctly or not. I'm seriously surprised nobody's built a solid library that can convert between the two.
Here is a list of ALL the differences in SQL syntax that I know about between the two dump formats:
The lines starting with:
BEGIN TRANSACTION
COMMIT
sqlite_sequence
CREATE UNIQUE INDEX
are not used in MySQL
SQLite uses CREATE TABLE/INSERT INTO "table_name" and MySQL uses CREATE TABLE/INSERT INTO table_name
MySQL doesn't use quotes inside the schema definition
MySQL uses single quotes for strings inside the INSERT INTO clauses
SQLite and MySQL have different ways of escaping strings inside INSERT INTO clauses
SQLite uses 't' and 'f' for booleans, MySQL uses 1 and 0 (a simple regex for this can fail when you have a string like: 'I do, you don't' inside your INSERT INTO)
SQLite uses AUTOINCREMENT, MySQL uses AUTO_INCREMENT
Here is a very basic hacked-up Perl script which works for my dataset and checks for more of these conditions than other Perl scripts I found on the web. No guarantees that it will work for your data, but feel free to modify and post back here.
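To make those concrete, here is a toy Python illustration (table and values invented) of a few of the rewrites, in the same regex style as the scripts below:
import re

line = """INSERT INTO "users" VALUES(1,'t');"""  # SQLite-style dump line
line = re.sub(r'INSERT INTO "(\w+)"', r'INSERT INTO \1', line)  # unquote the table name
line = re.sub(r"(?<!')'t'(?=.)", '1', line)  # boolean 't' -> 1
line = re.sub(r"(?<!')'f'(?=.)", '0', line)  # boolean 'f' -> 0
line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')
print(line)  # INSERT INTO users VALUES(1,1);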
#! /usr/bin/perl
while ($line = <>){
    # skip statements that MySQL doesn't use
    if (($line !~ /BEGIN TRANSACTION/) && ($line !~ /COMMIT/) && ($line !~ /sqlite_sequence/) && ($line !~ /CREATE UNIQUE INDEX/)){
        if ($line =~ /CREATE TABLE \"([a-z_]*)\"(.*)/i){
            # strip quotes from the table definition and make the CREATE idempotent
            $name = $1;
            $sub = $2;
            $sub =~ s/\"//g;
            $line = "DROP TABLE IF EXISTS $name;\nCREATE TABLE IF NOT EXISTS $name$sub\n";
        }
        elsif ($line =~ /INSERT INTO \"([a-z_]*)\"(.*)/i){
            # unquote the table name; escape double quotes, then turn them into single quotes
            $line = "INSERT INTO $1$2\n";
            $line =~ s/\"/\\\"/g;
            $line =~ s/\"/\'/g;
        }else{
            # convert SQLite's doubled single-quote escaping ('') to backslash form (\')
            $line =~ s/\'\'/\\\'/g;
        }
        # convert SQLite's 't'/'f' booleans to 1/0, and AUTOINCREMENT to AUTO_INCREMENT
        $line =~ s/([^\\'])\'t\'(.)/$1THIS_IS_TRUE$2/g;
        $line =~ s/THIS_IS_TRUE/1/g;
        $line =~ s/([^\\'])\'f\'(.)/$1THIS_IS_FALSE$2/g;
        $line =~ s/THIS_IS_FALSE/0/g;
        $line =~ s/AUTOINCREMENT/AUTO_INCREMENT/g;
        print $line;
    }
}
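Usage mirrors the Python version below; assuming you save the script as dump_for_mysql.pl (the name is arbitrary):
sqlite3 sample.db .dump | perl dump_for_mysql.pl > dump.sql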

Here is a list of converters (not updated since 2011):
https://www2.sqlite.org/cvstrac/wiki?p=ConverterTools (or snapshot at archive.org)
An alternative method that would work nicely but is rarely mentioned: use an ORM class that abstracts the specific database differences away for you. E.g. you get these in PHP (RedBean), Python (Django's ORM layer, Storm, SQLAlchemy), Ruby on Rails (ActiveRecord), and Cocoa (CoreData).
i.e. you could do this (see the sketch after these steps):
Load data from source database using the ORM class.
Store data in memory or serialize to disk.
Store data into destination database using the ORM class.
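A minimal sketch of that idea using SQLAlchemy — an assumption-laden sketch, not from any answer here: it assumes SQLAlchemy 2.x, the PyMySQL driver, and placeholder file and connection names.
from sqlalchemy import create_engine, MetaData

source = create_engine("sqlite:///sample.db")
dest = create_engine("mysql+pymysql://user:secret@localhost/targetdb")

metadata = MetaData()
metadata.reflect(bind=source)   # step 1: discover the SQLite schema
metadata.create_all(bind=dest)  # recreate the same tables on MySQL

with source.connect() as src, dest.connect() as dst:
    # steps 2 and 3: read each table into memory, then write it out
    for table in metadata.sorted_tables:
        rows = [dict(r._mapping) for r in src.execute(table.select())]
        if rows:
            dst.execute(table.insert(), rows)
    dst.commit()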

Here is a Python script, built off of Shalmanese's answer and some help from Alex Martelli over at Translating Perl to Python.
I'm making it community wiki, so please feel free to edit and refactor, as long as it doesn't break the functionality (thankfully we can just roll back). It's pretty ugly, but it works.
Use it like so (assuming the script is called dump_for_mysql.py):
sqlite3 sample.db .dump | python dump_for_mysql.py > dump.sql
which you can then import into MySQL.
Note: you need to add foreign key constraints manually, since SQLite doesn't actually enforce them.
Here is the script:
#!/usr/bin/env python
import re
import fileinput

def this_line_is_useless(line):
    useless_es = [
        'BEGIN TRANSACTION',
        'COMMIT',
        'sqlite_sequence',
        'CREATE UNIQUE INDEX',
        'PRAGMA foreign_keys=OFF',
    ]
    for useless in useless_es:
        if re.search(useless, line):
            return True

def has_primary_key(line):
    return bool(re.search(r'PRIMARY KEY', line))

searching_for_end = False
for line in fileinput.input():
    if this_line_is_useless(line):
        continue

    # this line was necessary because '');
    # would be converted to \'); which isn't appropriate
    if re.match(r".*, ''\);", line):
        line = re.sub(r"''\);", r'``);', line)

    if re.match(r'^CREATE TABLE.*', line):
        searching_for_end = True

    m = re.search('CREATE TABLE "?(\w*)"?(.*)', line)
    if m:
        name, sub = m.groups()
        line = "DROP TABLE IF EXISTS %(name)s;\nCREATE TABLE IF NOT EXISTS `%(name)s`%(sub)s\n"
        line = line % dict(name=name, sub=sub)
    else:
        m = re.search('INSERT INTO "(\w*)"(.*)', line)
        if m:
            line = 'INSERT INTO %s%s\n' % m.groups()
            line = line.replace('"', r'\"')
            line = line.replace('"', "'")
        line = re.sub(r"([^'])'t'(.)", "\1THIS_IS_TRUE\2", line)
        line = line.replace('THIS_IS_TRUE', '1')
        line = re.sub(r"([^'])'f'(.)", "\1THIS_IS_FALSE\2", line)
        line = line.replace('THIS_IS_FALSE', '0')

    # Add auto_increment if it is not there since sqlite auto_increments ALL
    # primary keys
    if searching_for_end:
        if re.search(r"integer(?:\s+\w+)*\s*PRIMARY KEY(?:\s+\w+)*\s*,", line):
            line = line.replace("PRIMARY KEY", "PRIMARY KEY AUTO_INCREMENT")
        # replace " and ' with ` because mysql doesn't like quotes in CREATE commands
        if line.find('DEFAULT') == -1:
            line = line.replace(r'"', r'`').replace(r"'", r'`')
        else:
            parts = line.split('DEFAULT')
            parts[0] = parts[0].replace(r'"', r'`').replace(r"'", r'`')
            line = 'DEFAULT'.join(parts)

    # And now we convert it back (see above)
    if re.match(r".*, ``\);", line):
        line = re.sub(r'``\);', r"'');", line)

    if searching_for_end and re.match(r'.*\);', line):
        searching_for_end = False

    if re.match(r"CREATE INDEX", line):
        line = re.sub('"', '`', line)
    if re.match(r"AUTOINCREMENT", line):
        line = re.sub("AUTOINCREMENT", "AUTO_INCREMENT", line)
    print line,

I usually use the Export/import tables feature of IntelliJ DataGrip.
You can see the progress in the bottom right corner.

If you are using Python/Django, it's pretty easy:
configure two database connections in settings.py (as described at https://docs.djangoproject.com/en/1.11/topics/db/multi-db/)
then just do this:
objlist = ModelObject.objects.using('sqlite').all()
for obj in objlist:
    obj.save(using='mysql')
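For reference, the DATABASES entry in settings.py might look something like this (the aliases must match the using() calls above; paths and credentials here are placeholders, not from the answer):
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': '/path/to/legacy.sqlite3',
    },
    'sqlite': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': '/path/to/legacy.sqlite3',
    },
    'mysql': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'targetdb',
        'USER': 'dbuser',
        'PASSWORD': 'secret',
        'HOST': 'localhost',
        'PORT': '3306',
    },
}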

Probably the quickest and easiest way is to use the sqlite .dump command, in this case creating a dump of the sample database:
sqlite3 sample.db .dump > dump.sql
You can then (in theory) import this into the mysql database, in this case the test database on the database server 127.0.0.1, using user root.
mysql -p -u root -h 127.0.0.1 test < dump.sql
I say in theory, as there are a few differences between the grammars.
In SQLite, transactions look like this:
BEGIN TRANSACTION;
...
COMMIT;
MySQL uses just
BEGIN;
...
COMMIT;
There are other similar problems (varchars and double quotes spring to mind), but nothing find-and-replace couldn't fix.
Perhaps you should ask why you are migrating. If performance or database size is the issue, perhaps look at reorganising the schema; if the system is moving to a more powerful product, this might be the ideal time to plan for the future of your data.
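For example, a single sed pass along these lines (an untested sketch; adapt the pattern to your dump) covers the transaction difference above:
sed -e 's/^BEGIN TRANSACTION;/BEGIN;/' dump.sql > dump_mysql.sql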

I've just gone through this process, and there's a lot of very good help and information in this Q/A, but I found I had to pull together various elements (plus some from other Q/As) to get a working solution and migrate successfully.
However, even after combining the existing answers, I found that the Python script did not fully work for me; it failed where there were multiple boolean occurrences in an INSERT. See here for why that was the case.
So, I thought I'd post up my merged answer here. Credit goes to those that have contributed elsewhere, of course. But I wanted to give something back, and save others time that follow.
I'll post the script below. But firstly, here are the instructions for a conversion...
I ran the script on OS X 10.7.5 Lion. Python worked out of the box.
To generate the MySQL input file from your existing SQLite3 database, run the script on your own files as follows:
Snips$ sqlite3 original_database.sqlite3 .dump | python ~/scripts/dump_for_mysql.py > dumped_data.sql
I then copied the resulting dumped_data.sql file over to a Linux box running Ubuntu 10.04.4 LTS where my MySQL database was to reside.
Another issue I had when importing the MySQL file was that some unicode UTF-8 characters (specifically single quotes) were not being imported correctly, so I had to add a switch to the command to specify UTF-8.
The resulting command to input the data into a spanking new empty MySQL database is as follows:
Snips$ mysql -p -u root -h 127.0.0.1 test_import --default-character-set=utf8 < dumped_data.sql
Let it cook, and that should be it! Don't forget to scrutinise your data, before and after.
So, as the OP requested, it's quick and easy, when you know how! :-)
As an aside, one thing I wasn't sure about before I looked into this migration, was whether created_at and updated_at field values would be preserved - the good news for me is that they are, so I could migrate my existing production data.
Good luck!
UPDATE
Since making this switch, I've noticed a problem that I hadn't noticed before. In my Rails application, my text fields are defined as 'string', and this carries through to the database schema. The process outlined here results in these being defined as VARCHAR(255) in the MySQL database. This places a 255 character limit on these field sizes - and anything beyond this was silently truncated during the import. To support text length greater than 255, the MySQL schema would need to use 'TEXT' rather than VARCHAR(255), I believe. The process defined here does not include this conversion.
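If you hit this, a post-import ALTER along these lines should widen the affected columns (the table and column names here are hypothetical):
ALTER TABLE posts MODIFY body TEXT;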
Here's the merged and revised Python script that worked for my data:
#!/usr/bin/env python
import re
import fileinput

def this_line_is_useless(line):
    useless_es = [
        'BEGIN TRANSACTION',
        'COMMIT',
        'sqlite_sequence',
        'CREATE UNIQUE INDEX',
        'PRAGMA foreign_keys=OFF'
    ]
    for useless in useless_es:
        if re.search(useless, line):
            return True

def has_primary_key(line):
    return bool(re.search(r'PRIMARY KEY', line))

searching_for_end = False
for line in fileinput.input():
    if this_line_is_useless(line):
        continue

    # this line was necessary because ''); was getting
    # converted (inappropriately) to \');
    if re.match(r".*, ''\);", line):
        line = re.sub(r"''\);", r'``);', line)

    if re.match(r'^CREATE TABLE.*', line):
        searching_for_end = True

    m = re.search('CREATE TABLE "?([A-Za-z_]*)"?(.*)', line)
    if m:
        name, sub = m.groups()
        line = "DROP TABLE IF EXISTS %(name)s;\nCREATE TABLE IF NOT EXISTS `%(name)s`%(sub)s\n"
        line = line % dict(name=name, sub=sub)
        line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')
        line = line.replace('UNIQUE', '')
        line = line.replace('"', '')
    else:
        m = re.search('INSERT INTO "([A-Za-z_]*)"(.*)', line)
        if m:
            line = 'INSERT INTO %s%s\n' % m.groups()
            line = line.replace('"', r'\"')
            line = line.replace('"', "'")
        # negative lookbehind/lookahead, so multiple booleans in one INSERT all convert
        line = re.sub(r"(?<!')'t'(?=.)", r"1", line)
        line = re.sub(r"(?<!')'f'(?=.)", r"0", line)

    # Add auto_increment if it's not there since sqlite auto_increments ALL
    # primary keys
    if searching_for_end:
        if re.search(r"integer(?:\s+\w+)*\s*PRIMARY KEY(?:\s+\w+)*\s*,", line):
            line = line.replace("PRIMARY KEY", "PRIMARY KEY AUTO_INCREMENT")

    # And now we convert it back (see above)
    if re.match(r".*, ``\);", line):
        line = re.sub(r'``\);', r"'');", line)

    if searching_for_end and re.match(r'.*\);', line):
        searching_for_end = False

    if re.match(r"CREATE INDEX", line):
        line = re.sub('"', '`', line)

    print line,

http://sqlfairy.sourceforge.net/
http://search.cpan.org/dist/SQL-Translator/
aptitude install sqlfairy libdbd-sqlite3-perl
sqlt -f DBI --dsn dbi:SQLite:../.open-tran/ten-sq.db -t MySQL --add-drop-table > mysql-ten-sq.sql
sqlt -f DBI --dsn dbi:SQLite:../.open-tran/ten-sq.db -t Dumper --use-same-auth > sqlite2mysql-dumper.pl
chmod +x sqlite2mysql-dumper.pl
./sqlite2mysql-dumper.pl --help
./sqlite2mysql-dumper.pl --add-truncate --mysql-loadfile > mysql-dump.sql
sed -e 's/LOAD DATA INFILE/LOAD DATA LOCAL INFILE/' -i mysql-dump.sql
echo 'drop database `ten-sq`' | mysql -p -u root
echo 'create database `ten-sq` charset utf8' | mysql -p -u root
mysql -p -u root -D ten-sq < mysql-ten-sq.sql
mysql -p -u root -D ten-sq < mysql-dump.sql

I wrote this simple script in Python3. It can be used as an included class or as a standalone script invoked via a terminal shell. By default it creates all integer columns as int(11) and all string columns as varchar(300), but that can be adjusted in the constructor or script arguments respectively.
NOTE: It requires MySQL Connector/Python 2.0.4 or higher
Here's a link to the source on GitHub if you find the code below hard to read: https://github.com/techouse/sqlite3-to-mysql
#!/usr/bin/env python3
__author__ = "Klemen Tušar"
__email__ = "techouse@gmail.com"
__copyright__ = "GPL"
__version__ = "1.0.1"
__date__ = "2015-09-12"
__status__ = "Production"

import os.path, sqlite3, mysql.connector
from mysql.connector import errorcode


class SQLite3toMySQL:
    """
    Use this class to transfer an SQLite 3 database to MySQL.

    NOTE: Requires MySQL Connector/Python 2.0.4 or higher (https://dev.mysql.com/downloads/connector/python/)
    """
    def __init__(self, **kwargs):
        self._properties = kwargs
        self._sqlite_file = self._properties.get('sqlite_file', None)
        if not os.path.isfile(self._sqlite_file):
            print('SQLite file does not exist!')
            exit(1)

        self._mysql_user = self._properties.get('mysql_user', None)
        if self._mysql_user is None:
            print('Please provide a MySQL user!')
            exit(1)

        self._mysql_password = self._properties.get('mysql_password', None)
        if self._mysql_password is None:
            print('Please provide a MySQL password')
            exit(1)

        self._mysql_database = self._properties.get('mysql_database', 'transfer')
        self._mysql_host = self._properties.get('mysql_host', 'localhost')

        self._mysql_integer_type = self._properties.get('mysql_integer_type', 'int(11)')
        self._mysql_string_type = self._properties.get('mysql_string_type', 'varchar(300)')

        self._sqlite = sqlite3.connect(self._sqlite_file)
        self._sqlite.row_factory = sqlite3.Row
        self._sqlite_cur = self._sqlite.cursor()

        self._mysql = mysql.connector.connect(
            user=self._mysql_user,
            password=self._mysql_password,
            host=self._mysql_host
        )
        self._mysql_cur = self._mysql.cursor(prepared=True)
        try:
            self._mysql.database = self._mysql_database
        except mysql.connector.Error as err:
            if err.errno == errorcode.ER_BAD_DB_ERROR:
                self._create_database()
            else:
                print(err)
                exit(1)

    def _create_database(self):
        try:
            self._mysql_cur.execute("CREATE DATABASE IF NOT EXISTS `{}` DEFAULT CHARACTER SET 'utf8'".format(self._mysql_database))
            self._mysql_cur.close()
            self._mysql.commit()
            self._mysql.database = self._mysql_database
            self._mysql_cur = self._mysql.cursor(prepared=True)
        except mysql.connector.Error as err:
            print('_create_database failed creating database {}: {}'.format(self._mysql_database, err))
            exit(1)

    def _create_table(self, table_name):
        primary_key = ''
        sql = 'CREATE TABLE IF NOT EXISTS `{}` ( '.format(table_name)
        self._sqlite_cur.execute('PRAGMA table_info("{}")'.format(table_name))
        for row in self._sqlite_cur.fetchall():
            column = dict(row)
            sql += ' `{name}` {type} {notnull} {auto_increment}, '.format(
                name=column['name'],
                type=self._mysql_string_type if column['type'].upper() == 'TEXT' else self._mysql_integer_type,
                notnull='NOT NULL' if column['notnull'] else 'NULL',
                auto_increment='AUTO_INCREMENT' if column['pk'] else ''
            )
            if column['pk']:
                primary_key = column['name']
        sql += ' PRIMARY KEY (`{}`) ) ENGINE = InnoDB CHARACTER SET utf8'.format(primary_key)
        try:
            self._mysql_cur.execute(sql)
            self._mysql.commit()
        except mysql.connector.Error as err:
            print('_create_table failed creating table {}: {}'.format(table_name, err))
            exit(1)

    def transfer(self):
        self._sqlite_cur.execute("SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'")
        for row in self._sqlite_cur.fetchall():
            table = dict(row)
            # create the table
            self._create_table(table['name'])
            # populate it
            print('Transferring table {}'.format(table['name']))
            self._sqlite_cur.execute('SELECT * FROM "{}"'.format(table['name']))
            columns = [column[0] for column in self._sqlite_cur.description]
            try:
                self._mysql_cur.executemany("INSERT IGNORE INTO `{table}` ({fields}) VALUES ({placeholders})".format(
                    table=table['name'],
                    fields=('`{}`, ' * len(columns)).rstrip(' ,').format(*columns),
                    placeholders=('%s, ' * len(columns)).rstrip(' ,')
                ), (tuple(data) for data in self._sqlite_cur.fetchall()))
                self._mysql.commit()
            except mysql.connector.Error as err:
                print('_insert_table_data failed inserting data into table {}: {}'.format(table['name'], err))
                exit(1)
        print('Done!')


def main():
    """ For use in standalone terminal form """
    import sys, argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('--sqlite-file', dest='sqlite_file', default=None, help='SQLite3 db file')
    parser.add_argument('--mysql-user', dest='mysql_user', default=None, help='MySQL user')
    parser.add_argument('--mysql-password', dest='mysql_password', default=None, help='MySQL password')
    parser.add_argument('--mysql-database', dest='mysql_database', default=None, help='MySQL database name')
    parser.add_argument('--mysql-host', dest='mysql_host', default='localhost', help='MySQL host')
    parser.add_argument('--mysql-integer-type', dest='mysql_integer_type', default='int(11)', help='MySQL default integer field type')
    parser.add_argument('--mysql-string-type', dest='mysql_string_type', default='varchar(300)', help='MySQL default string field type')
    args = parser.parse_args()

    if len(sys.argv) == 1:
        parser.print_help()
        exit(1)

    converter = SQLite3toMySQL(
        sqlite_file=args.sqlite_file,
        mysql_user=args.mysql_user,
        mysql_password=args.mysql_password,
        mysql_database=args.mysql_database,
        mysql_host=args.mysql_host,
        mysql_integer_type=args.mysql_integer_type,
        mysql_string_type=args.mysql_string_type
    )
    converter.transfer()

if __name__ == '__main__':
    main()
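Invoked standalone from a shell, usage would look something like this (the script filename and database names are placeholders):
python3 sqlite3_to_mysql.py --sqlite-file sample.db --mysql-user root --mysql-password secret --mysql-database transfer_test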

I recently had to migrate from MySQL to JavaDB for a project that our team is working on. I found a Java library written by Apache called DdlUtils that made this pretty easy. It provides an API that lets you do the following:
Discover a database's schema and export it as an XML file.
Modify a DB based upon this schema.
Import records from one DB to another, assuming they have the same schema.
The tools that we ended up with weren't completely automated, but they worked pretty well. Even if your application is not in Java, it shouldn't be too difficult to whip up a few small tools to do a one-time migration. I think I was able to pull off our migration with less than 150 lines of code.

Get a SQL dump
moose@pc08$ sqlite3 mySqliteDatabase.db .dump > myTemporarySQLFile.sql
Import dump to MySQL
For small imports:
moose@pc08$ mysql -u <username> -p
Enter password:
....
mysql> use somedb;
Database changed
mysql> source myTemporarySQLFile.sql;
or
mysql -u root -p somedb < myTemporarySQLFile.sql
This will prompt you for a password. Please note: If you want to enter your password directly, you have to do it WITHOUT space, directly after -p:
mysql -u root -pYOURPASS somedb < myTemporarySQLFile.sql
For larger dumps:
mysqlimport or other import tools like BigDump.
BigDump gives you a progress bar.

Based on Jim's solution:
Quick easy way to migrate SQLite3 to MySQL?
sqlite3 your_sql3_database.db .dump | python ./dump.py > your_dump_name.sql
cat your_dump_name.sql | sed '1d' | mysql --user=your_mysql_user --default-character-set=utf8 your_mysql_db -p
This works for me. I use sed just to throw away the first line, which is not mysql-like, but you might as well modify the dump.py script to throw that line away.

There is no need for any script, command, etc...
You only have to export your SQLite database as a .csv file and then import it into MySQL using phpMyAdmin.
I used it and it worked amazingly.
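For the export step, the sqlite3 shell's CSV mode can produce the .csv files; a sketch with placeholder table and file names:
sqlite3 sample.db
sqlite> .headers on
sqlite> .mode csv
sqlite> .output users.csv
sqlite> SELECT * FROM users;
sqlite> .quit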

Ha... I wish I had found this first! My response was to this post... script to convert mysql dump sql file into format that can be imported into sqlite3 db
Combining the two would be exactly what I needed:
When the sqlite3 database is going to be used with Ruby, you may want to change:
tinyint([0-9]*)
to:
sed 's/ tinyint(1*) / boolean/g ' |
sed 's/ tinyint([0|2-9]*) / integer /g' |
Alas, this only half works: even though you are inserting 1's and 0's into a field marked boolean, SQLite3 stores them as 1's and 0's, so you have to go through and do something like:
Table.find(:all, :conditions => {:column => 1 }).each { |t| t.column = true }.each(&:save)
Table.find(:all, :conditions => {:column => 0 }).each { |t| t.column = false}.each(&:save)
but it was helpful to have the sql file to look at to find all the booleans.

This script is OK except for this case which, of course, I've met:
INSERT INTO "requestcomparison_stopword" VALUES(149,'f');
INSERT INTO "requestcomparison_stopword" VALUES(420,'t');
The script should give this output:
INSERT INTO requestcomparison_stopword VALUES(149,'f');
INSERT INTO requestcomparison_stopword VALUES(420,'t');
But instead it gives this output:
INSERT INTO requestcomparison_stopword VALUES(1490;
INSERT INTO requestcomparison_stopword VALUES(4201;
with some strange non-ascii characters around the last 0 and 1.
This didn't show up anymore when I commented out the following lines of the code (43-46), but other problems appeared:
line = re.sub(r"([^'])'t'(.)", "\1THIS_IS_TRUE\2", line)
line = line.replace('THIS_IS_TRUE', '1')
line = re.sub(r"([^'])'f'(.)", "\1THIS_IS_FALSE\2", line)
line = line.replace('THIS_IS_FALSE', '0')
This is just a special case, when we want to add a value that is 'f' or 't'. I'm not really comfortable with regular expressions; I just wanted to spot this case so it can be corrected by someone.
Anyway thanks a lot for that handy script !!!
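For the record, the strange characters come from Python interpreting \1 and \2 in the non-raw replacement string as the control characters \x01 and \x02. The merged script earlier in this thread sidesteps this with raw strings and negative lookbehinds, e.g.:
import re

line = "INSERT INTO requestcomparison_stopword VALUES(149,'f');"
# raw replacement strings plus lookarounds leave the neighbouring characters intact
line = re.sub(r"(?<!')'t'(?=.)", r"1", line)
line = re.sub(r"(?<!')'f'(?=.)", r"0", line)
print(line)  # INSERT INTO requestcomparison_stopword VALUES(149,0);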

This simple solution worked for me:
<?php
$sq = new SQLite3( 'sqlite3.db' );

$tables = $sq->query( 'SELECT name FROM sqlite_master WHERE type="table"' );
while ( $table = $tables->fetchArray() ) {
    $table = current( $table );
    $result = $sq->query( sprintf( 'SELECT * FROM %s', $table ) );

    if ( strpos( $table, 'sqlite' ) !== false )
        continue;

    printf( "-- %s\n", $table );
    while ( $row = $result->fetchArray( SQLITE3_ASSOC ) ) {
        $values = array_map( function( $value ) {
            return sprintf( "'%s'", mysql_real_escape_string( $value ) );
        }, array_values( $row ) );
        printf( "INSERT INTO `%s` VALUES( %s );\n", $table, implode( ', ', $values ) );
    }
}

echo ".dump" | sqlite3 /tmp/db.sqlite > db.sql
watch out for CREATE statements

Related

Python 3 psycopg2 COPY from stdin failed: error in .read()

I am trying to apply the code found on this page, in particular the part 'Copy Data from String Iterator' of the Table of Contents, but I run into an issue with my code.
Since not all lines coming from the generator (here log_lines) can be imported into the PostgreSQL database, I try to filter the correct lines (here row) using itertools.filterfalse like in the codeblock below:
def copy_string_iterator(connection, log_lines) -> None:
    with connection.cursor() as cursor:
        create_staging_table(cursor)
        log_string_iterator = StringIteratorIO((
            '|'.join(map(clean_csv_value, (
                row['date'],
                row['time'],
                row['cs_uri_query'],
                row['s_contentpath'],
                row['sc_status'],
                row['s_computername'],
                ...
                row['sc_substates'],
                row['s_port'],
                row['cs_version'],
                row['c_protocol'],
                row.update({'cs_cookie':'x'}),
                row['timetakenms'],
                row['cs_uri_stem'],
            ))) + '\n'
            for row in filterfalse(lambda line: "#" in line.get('date'), log_lines)
        ))
        cursor.copy_from(log_string_iterator, 'log_table', sep='|')
When I run this, cursor.copy_from() gives me the following error:
QueryCanceled: COPY from stdin failed: error in .read() call
CONTEXT: COPY log_table, line 112910
I understand why this error happens: in the test file I use, there are only 112909 lines that meet the filterfalse condition. But why does it try to copy line 112910 and throw the error instead of just stopping?
Since Python doesn't have a coalescing operator, add something like:
(map(clean_csv_value, (
    row['date'] if 'date' in row else None,
    ...
    row['cs_uri_stem'] if 'cs_uri_stem' in row else None,
))) + '\n')
for each of your fields, so you can handle any missing fields in the JSON file. Of course, the fields should be nullable in the db if you use None; otherwise replace None with some default value for that field.
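As an aside, dict.get expresses the same fallback more compactly, since it already returns None for missing keys; a tiny demonstration using one of the question's field names:
row = {'date': '2020-01-01'}   # hypothetical log record
print(row.get('date'))         # '2020-01-01'
print(row.get('cs_uri_stem'))  # None, because the key is absent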

Values are not inserted into MySQL table using pool.apply_async in python2.7

I am trying to run the following code to populate a table in parallel for a certain application. First, the following function is defined, which is supposed to connect to my db and execute the sql command with the given values (to insert into the table).
def dbWriter(sql, rows) :
    # load cnf file
    MYSQL_CNF = os.path.abspath('.') + '/mysql.cnf'
    conn = MySQLdb.connect(db='dedupe',
                           charset='utf8',
                           read_default_file = MYSQL_CNF)
    cursor = conn.cursor()
    cursor.executemany(sql, rows)
    conn.commit()
    cursor.close()
    conn.close()
And then there is this piece:
pool = dedupe.backport.Pool(processes=2)
done = False
while not done :
    chunks = (list(itertools.islice(b_data, step)) for step in
              [step_size]*100)
    results = []
    for chunk in chunks :
        print len(chunk)
        results.append(pool.apply_async(dbWriter,
                                        ("INSERT INTO blocking_map VALUES (%s, %s)",
                                         chunk)))
    for r in results :
        r.wait()
    if len(chunk) < step_size :
        done = True
pool.close()
Everything works and there are no errors. But at the end, my table is empty, meaning somehow the insertions were not successful. I have tried so many things to fix this (including adding column names for insertion) after many Google searches and have not been successful. Any suggestions would be appreciated. (Running the code in python2.7 on gcloud (Ubuntu); note that indents may be a bit messed up after pasting here.)
Please also note that "chunk" follows exactly the required data format.
Note: this is part of this example.
Please note that the only thing I am changing in the above (linked) example is that I am separating the steps of creating the tables and inserting into them, since I am running my code on the gcloud platform and it enforces GTID standards.
The solution was changing the dbWriter function to:
conn = MySQLdb.connect(host = # host ip,
                       user = # username,
                       passwd = # password,
                       db = 'dedupe')
cursor = conn.cursor()
cursor.executemany(sql, rows)
cursor.close()
conn.commit()
conn.close()

multiprocessing.Pool to execute concurrent subprocesses to pipe data from mysqldump

I am using Python to pipe data from one mysql database to another. Here is a lightly abstracted version of code which I have been using for several months, which has worked fairly well:
def copy_table(mytable):
    raw_mysqldump = "mysqldump -h source_host -u source_user --password='secret' --lock-tables=FALSE myschema mytable"
    raw_mysql = "mysql -h destination_host -u destination_user --password='secret' myschema"

    mysqldump = shlex.split(raw_mysqldump)
    mysql = shlex.split(raw_mysql)

    ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
    subprocess.check_output(mysql, stdin=ps.stdout)
    ps.stdout.close()
    retcode = ps.wait()

    if retcode == 0:
        return mytable, 1
    else:
        return mytable, 0
The size of the data has grown, and it currently takes about an hour to copy something like 30 tables. In order to speed things along, I would like to utilize multiprocessing. I am attempting to execute the following code on an Ubuntu server, which is a t2.micro (AWS EC2).
def copy_tables(tables):
    with multiprocessing.Pool(processes=4) as pool:
        params = [(arg, table) for table in sorted(tables)]
        results = pool.starmap(copy_table, params)
    failed_tables = [table for table, success in results if success == 0]
    all_tables_processed = False if failed_tables else True
    return all_tables_processed
The problem is: almost all of the tables will copy, but there are always a couple of child processes left over that will not complete - they just hang, and I can see from monitoring the database that no data is being transferred. It feels like they have somehow disconnected from the parent process, or that data is not being returned properly.
This is my first question, and I've tried to be both specific and concise - thanks in advance for any help, and please let me know if I can provide more information.
I think the following code
ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
subprocess.check_output(mysql, stdin=ps.stdout)
ps.stdout.close()
retcode = ps.wait()
should be
ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
sps = subprocess.Popen(mysql, stdin=ps.stdout)
retcode = ps.wait()
ps.stdout.close()
sps.wait()
You should not close the pipe until the mysqldump process has finished. And check_output is blocking: it will hang until its stdin reaches the end.

Ruby Mysql2 Client not taking backslash while insert

We are using production and staging databases in our application.
Our requirement is to insert all records into the staging database whenever a record is added to the production database, so that both servers stay consistent and hold the same data.
I have used a Mysql2 client pool to connect to the staging server and insert the record that is added to production.
Here is my code:
def create
  @aperson = Person.new
  @person = @aperson.save
  if @person && Rails.env == "production"
    # add_new_person_to_staging
    client = Mysql2::Client.new(:host => dbconfig[:host], :username => dbconfig[:username], :password => dbconfig[:password], :database => dbconfig[:database])
    @person_result = client.query('INSERT INTO user_types(user_name, regex, code) Values ("myname" , "\.myregex\." , "ns" );')
  end
end
Here "#person_result" record is inserted to mysql table but the "regex" column eliminates "\" slashes.
like : user_name = myname, regex = .myregex., code = ns
when I manually execute the "Insert" query in mysql command line it inserts as it is along with \ slash. but not through "client.query"
Why does \ slash is eliminated. please help me here.
Thanks.
\ is likely being removed by the MySQL2 client as part of a SQL injection protection preprocessor.
Have you looked at trying either a double backslash or using the escape method to properly escape the string?
Try using this
#person_result = client.query('INSERT INTO user_types(user_name, regex, code) Values (myname , "\."+myregx+".\" , ns )')

Is there a tool to check database integrity in Django?

The MySQL database powering our Django site has developed some integrity problems; e.g. foreign keys that refer to nonexistent rows. I won't go into how we got into this mess, but I'm now looking at how to fix it.
Basically, I'm looking for a script that scans all models in the Django site, and checks whether all foreign keys and other constraints are correct. Hopefully, the number of problems will be small enough so they can be fixed by hand.
I could code this up myself but I'm hoping that somebody here has a better idea.
I found django-check-constraints but it doesn't quite fit the bill: right now, I don't need something to prevent these problems, but to find them so they can be fixed manually before taking other steps.
Other constraints:
Django 1.1.1; upgrading has been determined to break things
MySQL 5.0.51 (Debian Lenny), currently with MyISAM tables
Python 2.5, might be upgradable but I'd rather not right now
(Later, we will convert to InnoDB for proper transaction support, and maybe foreign key constraints on the database level, to prevent similar problems in the future. But that's not the topic of this question.)
I whipped up something myself. The management script below should be saved in myapp/management/commands/checkdb.py. Make sure that intermediate directories have an __init__.py file.
Usage: ./manage.py checkdb for a full check; use --exclude app.Model or -e app.Model to exclude the model Model in the app app.
from django.core.management.base import BaseCommand, CommandError
from django.core.management.base import NoArgsCommand
from django.core.exceptions import ObjectDoesNotExist
from django.db import models
from optparse import make_option

from lib.progress import with_progress_meter

def model_name(model):
    return '%s.%s' % (model._meta.app_label, model._meta.object_name)

class Command(BaseCommand):
    args = '[-e|--exclude app_name.ModelName]'
    help = 'Checks constraints in the database and reports violations on stdout'

    option_list = NoArgsCommand.option_list + (
        make_option('-e', '--exclude', action='append', type='string', dest='exclude'),
    )

    def handle(self, *args, **options):
        # TODO once we're on Django 1.2, write to self.stdout and self.stderr instead of plain print
        exclude = options.get('exclude', None) or []

        failed_instance_count = 0
        failed_model_count = 0
        for app in models.get_apps():
            for model in models.get_models(app):
                if model_name(model) in exclude:
                    print 'Skipping model %s' % model_name(model)
                    continue
                fail_count = self.check_model(app, model)
                if fail_count > 0:
                    failed_model_count += 1
                    failed_instance_count += fail_count
        print 'Detected %d errors in %d models' % (failed_instance_count, failed_model_count)

    def check_model(self, app, model):
        meta = model._meta
        if meta.proxy:
            print 'WARNING: proxy models not currently supported; ignored'
            return

        # Define all the checks we can do; they return True if they are ok,
        # False if not (and print a message to stdout)
        def check_foreign_key(model, field):
            foreign_model = field.related.parent_model
            def check_instance(instance):
                try:
                    # name: name of the attribute containing the model instance (e.g. 'user')
                    # attname: name of the attribute containing the id (e.g. 'user_id')
                    getattr(instance, field.name)
                    return True
                except ObjectDoesNotExist:
                    print '%s with pk %s refers via field %s to nonexistent %s with pk %s' % \
                        (model_name(model), str(instance.pk), field.name, model_name(foreign_model), getattr(instance, field.attname))
            return check_instance

        # Make a list of checks to run on each model instance
        checks = []
        for field in meta.local_fields + meta.local_many_to_many + meta.virtual_fields:
            if isinstance(field, models.ForeignKey):
                checks.append(check_foreign_key(model, field))

        # Run all checks
        fail_count = 0
        if checks:
            for instance in with_progress_meter(model.objects.all(), model.objects.count(), 'Checking model %s ...' % model_name(model)):
                for check in checks:
                    if not check(instance):
                        fail_count += 1
        return fail_count
I'm making this a community wiki because I welcome any and all improvements to my code!
Thomas' answer is great but is now a bit out of date.
I have updated it as a gist to support Django 1.8+.