[Django] Import data into MySQL from a file

I'm trying to import data from a file (3 columns) into my local MySQL database. It contains 27442 rows.
def import(file):
    base_dir = '/home/xxxxx/Desktop/'
    with open(os.path.join(base_dir, file), 'r') as f:
        for line in f.readlines():
            line = line.split()  # remove all useless space
            if len(line) >= 3:
                host.objects.create(
                    hostaddress=line[0],
                    hostname=line[1],
                    comment=line[2]  # ' ' by default
                )
            else:
                raise('Invalid format')
Everything seems to work properly, but when I check the table after the import I get fewer than half of the total rows (about 11,000).
Django model:
class host(models.Model):
    hostname = models.SlugField(max_length=50)
    hostaddress = models.GenericIPAddressField(primary_key=True)
    comment = models.CharField(max_length=100)

I suspect the problem is:
hostaddress= models.GenericIPAddressField(primary_key=True)
primary_key=True automatically sets unique=True. If you have duplicate hostaddress values in the raw data, then only the unique values will be imported.
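A quick way to confirm this is to count how many lines in the file share the same address before importing. A minimal sketch, assuming the same whitespace-separated three-column format as in the question (the function name is just for illustration):
from collections import Counter

def count_duplicate_addresses(path):
    """Count lines that share the same first column (the host address)."""
    counts = Counter()
    with open(path, 'r') as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:
                counts[fields[0]] += 1
    duplicates = {addr: n for addr, n in counts.items() if n > 1}
    print('%d distinct addresses, %d of them duplicated' % (len(counts), len(duplicates)))
    return duplicates
If the number of distinct addresses comes out close to 11,000, the primary key is indeed collapsing the duplicates.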

Related

how to upload whole text of a file in a row in mysql through terminal

How can I upload the whole text of a text file into a single row of a database? At the moment the text gets divided and stored across subsequent rows.
This is the code of my SQL file; the database is named info and contains a table named info with two VARCHAR(3000) columns, des1 and des2:
USE info;
INSERT INTO info (des1) VALUES (LOAD_FILE('eng.txt'));
SELECT * FROM info;
I am getting the following output:
des1    des2
NULL    NULL
I also attached an image showing the output.
I expect all of the text in the file to end up in a single row of the database, and this has to be done in the terminal.
There might be a problem with how you form the string that is to be inserted. You can do it easily with the help of a programming language.
Here is a simple solution that would work in Python.
import mysql.connector

mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    passwd="password",
    database="db"
)

def fileReadToString(filename):
    result = ""
    with open(filename, "r") as ins:
        for line in ins:
            result += line
    return result

file = fileReadToString('eng.txt')
mycursor = mydb.cursor()

sql = "INSERT INTO info (des1) VALUES (%s)"  # name the column, since the table has two
val = (file,)                                # parameters must be passed as a tuple
mycursor.execute(sql, val)

mydb.commit()
print(mycursor.rowcount, "done")
LOAD DATA INFILE is for structured data. For unstructured data use LOAD_FILE():
INSERT INTO info (des1) VALUES (LOAD_FILE('eng.txt'))
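Note that LOAD_FILE() returns NULL (which matches the NULL output above) whenever the server cannot read the file: the path must be absolute, readable by the MySQL server process, smaller than max_allowed_packet, and located under the directory named by secure_file_priv if that variable is set. A minimal check from Python, reusing mysql.connector as above (the /var/lib/mysql-files/ path is only an example):
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               passwd="password", database="info")
cur = conn.cursor()

# Where is the server allowed to read files from?
cur.execute("SHOW VARIABLES LIKE 'secure_file_priv'")
print(cur.fetchone())   # e.g. ('secure_file_priv', '/var/lib/mysql-files/')

# Use an absolute path inside that directory (example location)
cur.execute("INSERT INTO info (des1) VALUES (LOAD_FILE('/var/lib/mysql-files/eng.txt'))")
conn.commit()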

Updating one table of MYSQL with multiple processes via pymysql

I am trying to update one table with multiple processes via pymysql, and each process reads a CSV file split from a huge one in order to speed things up. But I get the Lock wait timeout exceeded; try restarting transaction exception when I run the script. After searching the posts on this site, I found one which mentioned setting up the built-in LOAD DATA INFILE, but it gave no details. How can I do this with pymysql to reach my aim?
First edit: here is the job method:
import csv
import codecs
import time

import pymysql

def importprogram(path, name):
    begin = time.time()
    print('begin to import program' + name + ' info.')
    # "c:\\sometest.csv"
    file = open(path, mode='rb')
    csvfile = csv.reader(codecs.iterdecode(file, 'utf-8'))
    connection = None
    try:
        connection = pymysql.connect(host='a host', user='someuser', password='somepsd', db='mydb',
                                     cursorclass=pymysql.cursors.DictCursor)
        count = 1
        with connection.cursor() as cursor:
            sql = '''update sometable set Acolumn='{guid}' where someid='{pid}';'''
            next(csvfile, None)
            for line in csvfile:
                try:
                    count = count + 1
                    if ''.join(line).strip():
                        command = sql.format(guid=line[2], pid=line[1])
                        cursor.execute(command)
                    if count % 1000 == 0:
                        print('program' + name + ' cursor execute', count)
                except csv.Error:
                    print('program csv.Error:', count)
                    continue
                except IndexError:
                    print('program IndexError:', count)
                    continue
                except StopIteration:
                    break
    except Exception as e:
        print('program' + name, str(e))
    finally:
        connection.commit()
        connection.close()
        file.close()
        print('program' + name + ' info done. time cost:', time.time() - begin)
And the multi-processing method:
import multiprocessing as mp

def multiproccess():
    pool = mp.Pool(3)
    results = []
    paths = ['C:\\testfile01.csv', 'C:\\testfile02.csv', 'C:\\testfile03.csv']
    name = 1
    for path in paths:
        results.append(pool.apply_async(importprogram, args=(path, str(name))))
        name = name + 1
    print(result.get() for result in results)
    pool.close()
    pool.join()
And the main method:
if __name__ == '__main__':
    multiproccess()
I am new to Python. What is wrong with my code, or with the approach itself? Should I use only a single process to read and import the data?
Your issue is that you are exceeding the time allowed for a response to be fetched from the server, so the client is automatically timing out.
In my experience, I would adjust the wait timeout to something like 6000 seconds, combine the files into one CSV, and just leave the data to import. I would also recommend running the query directly from MySQL rather than from Python.
The way I usually import CSV data from Python to MySQL is through the INSERT ... VALUES ... method, and I only do so when some kind of manipulation of the data is required (i.e. inserting different rows into different tables).
I like your approach and understand your thinking but in reality there is no need. The benefit to the INSERT ... VALUES ... method is that you won't run into any timeout issue.
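If the goal were a plain import rather than the per-row updates above, a batched INSERT ... VALUES sketch with pymysql might look like this (table name, column names, and connection details are placeholders, not the asker's schema):
import csv
import pymysql

def import_csv(path, batch_size=1000):
    # Sketch only: adjust table/column names and credentials to your schema.
    connection = pymysql.connect(host='a host', user='someuser',
                                 password='somepsd', db='mydb')
    sql = "INSERT INTO sometable (someid, Acolumn) VALUES (%s, %s)"
    try:
        with connection.cursor() as cursor, open(path, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            next(reader, None)          # skip the header row
            batch = []
            for line in reader:
                if len(line) >= 3 and ''.join(line).strip():
                    batch.append((line[1], line[2]))
                if len(batch) >= batch_size:
                    cursor.executemany(sql, batch)   # one round trip per batch
                    connection.commit()
                    batch = []
            if batch:
                cursor.executemany(sql, batch)
                connection.commit()
    finally:
        connection.close()
Parameterized executemany also avoids the string formatting used in the original update statement, which is fragile with respect to quoting.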

writer.writerow() doesn't write to the correct column

I have three DynamoDB tables. Two tables hold instance IDs that are part of an application, and the third is a master table of all instances across all of my accounts together with their tag metadata. I scan the two tables to get the instance IDs and then query the master table for the tag metadata. However, when I write this to the CSV file, I want two separate header sections, one for each DynamoDB table's output. Once the first iteration is done, the second write continues from the last row where the first iteration left off instead of starting over at the top of the second header section. Below is my code and an output example to make it clear.
CODE:
import boto3
import csv
import json
from boto3.dynamodb.conditions import Key, Attr

dynamo = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')
s3 = boto3.resource('s3')

# Required resource and client calls
all_instances_table = dynamodb.Table('Master')
missing_response = dynamo.scan(TableName='T1')
installed_response = dynamo.scan(TableName='T2')

# Creates CSV DictWriter object and fieldnames
with open('file.csv', 'w') as csvfile:
    fieldnames = ['Agent Not Installed', 'Not Installed Account', 'Not Installed Tags', 'Not Installed Environment', " ", 'Agent Installed', 'Installed Account', 'Installed Tags', 'Installed Environment']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    # Find instances IDs from the missing table in the master table to pull tag metadata
    for instances in missing_response['Items']:
        instance_missing = instances['missing_instances']['S']
        #print("Missing:" + instance_missing)
        query_missing = all_instances_table.query(KeyConditionExpression=Key('ID').eq(instance_missing))
        for item_missing in query_missing['Items']:
            missing_id = item_missing['ID']
            missing_account = item_missing['Account']
            missing_tags = item_missing['Tags']
            missing_env = item_missing['Environment']
            # Write the data to the CSV file
            writer.writerow({'Agent Not Installed': missing_id, 'Not Installed Account': missing_account, 'Not Installed Tags': missing_tags, 'Not Installed Environment': missing_env})

    # Find instances IDs from the installed table in the master table to pull tag metadata
    for instances in installed_response['Items']:
        instance_installed = instances['installed_instances']['S']
        #print("Installed:" + instance_installed)
        query_installed = all_instances_table.query(KeyConditionExpression=Key('ID').eq(instance_installed))
        for item_installed in query_installed['Items']:
            installed_id = item_installed['ID']
            print(installed_id)
            installed_account = item_installed['Account']
            installed_tags = item_installed['Tags']
            installed_env = item_installed['Environment']
            # Write the data to the CSV file
            writer.writerow({'Agent Installed': installed_id, 'Installed Account': installed_account, 'Installed Tags': installed_tags, 'Installed Environment': installed_env})
OUTPUT:
This is what the columns/rows look like in the file.
I need all of the output to be on the same line for each header section.
DATA:
Here is a sample of what both tables look like.
SAMPLE OUTPUT:
Here is what the for loops print out and append to the lists.
Missing:
i-0xxxxxx 333333333 foo#bar.com int
i-0yyyyyy 333333333 foo1#bar.com int
Installed:
i-0zzzzzz 44444444 foo2#bar.com int
i-0aaaaaa 44444444 foo3#bar.com int
You want to collect related rows together into a single list to write on a single row, something like:
missing = []    # collection for missing_response rows
installed = []  # collection for installed_response rows

# Find instance IDs from the missing table in the master table to pull tag metadata
for instances in missing_response['Items']:
    instance_missing = instances['missing_instances']['S']
    #print("Missing:" + instance_missing)
    query_missing = all_instances_table.query(KeyConditionExpression=Key('ID').eq(instance_missing))
    for item_missing in query_missing['Items']:
        missing_id = item_missing['ID']
        missing_account = item_missing['Account']
        missing_tags = item_missing['Tags']
        missing_env = item_missing['Environment']
        # Collect the first half of the row in the missing list
        missing.append([missing_id, missing_account, missing_tags, missing_env])

# Find instance IDs from the installed table in the master table to pull tag metadata
for instances in installed_response['Items']:
    instance_installed = instances['installed_instances']['S']
    #print("Installed:" + instance_installed)
    query_installed = all_instances_table.query(KeyConditionExpression=Key('ID').eq(instance_installed))
    for item_installed in query_installed['Items']:
        installed_id = item_installed['ID']
        print(installed_id)
        installed_account = item_installed['Account']
        installed_tags = item_installed['Tags']
        installed_env = item_installed['Environment']
        # Collect the second half of the row in the installed list
        installed.append([installed_id, installed_account, installed_tags, installed_env])

# Combine the two lists outside the loops, one output row per pair.
# This assumes a plain csv.writer rather than the DictWriter from the question,
# since the rows are lists; the empty spacer column between the halves is optional.
for m, i in zip(missing, installed):
    writer.writerow(m + [''] + i)
This will work if your installed and missing tables operate on a relatable field - like a timestamp or an account ID, something that you can ensure keeps the rows being concatenated in the same order. A data sample would be useful to really answer the question.
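If the two lists can end up with different lengths, itertools.zip_longest pads the shorter side so the rows still line up. A small sketch building on the lists above (the blank half-row is just a placeholder value):
from itertools import zip_longest

blank = ['', '', '', '']        # placeholder half-row for the shorter list
for m, i in zip_longest(missing, installed, fillvalue=blank):
    writer.writerow(m + [''] + i)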

blank file while copying a file in python

I have a function that takes a file as input, prints certain statistics, and also copies the file into a file name provided by the user. Here is my current code:
def copy_file(option):
    infile_name = input("Please enter the name of the file to copy: ")
    infile = open(infile_name, 'r')
    outfile_name = input("Please enter the name of the new copy: ")
    outfile = open(outfile_name, 'w')
    slist = infile.readlines()
    if option == 'statistics':
        for line in infile:
            outfile.write(line)
        infile.close()
        outfile.close()
        result = []
        blank_count = slist.count('\n')
        for item in slist:
            result.append(len(item))
        print('\n{0:<5d} lines in the list\n{1:>5d} empty lines\n{2:>7.1f} average character per line\n{3:>7.1f} average character per non-empty line'.format(
            len(slist), blank_count, sum(result)/len(slist), (sum(result)-blank_count)/(len(slist)-blank_count)))

copy_file('statistics')
It prints the statistics of the file correctly, but the copy it makes of the file is empty. If I remove the readlines() part and the statistics part, the function seems to copy the file correctly. How can I correct my code so that it does both? It's a minor problem, but I can't seem to get it.
The reason the file is blank is that
slist = infile.readlines()
is reading the entire contents of the file, so when it gets to
for line in infile:
there is nothing left to read, so it just closes the newly truncated (mode w) file, leaving you with a blank file.
I think the answer here is to change your for line in infile: to for line in slist:
def copy_file(option):
    infile_name = input("Please enter the name of the file to copy: ")
    infile = open(infile_name, 'r')
    outfile_name = input("Please enter the name of the new copy: ")
    outfile = open(outfile_name, 'w')
    slist = infile.readlines()
    if option == 'statistics':
        for line in slist:
            outfile.write(line)
        infile.close()
        outfile.close()
        result = []
        blank_count = slist.count('\n')
        for item in slist:
            result.append(len(item))
        print('\n{0:<5d} lines in the list\n{1:>5d} empty lines\n{2:>7.1f} average character per line\n{3:>7.1f} average character per non-empty line'.format(
            len(slist), blank_count, sum(result)/len(slist), (sum(result)-blank_count)/(len(slist)-blank_count)))

copy_file('statistics')
Having said all that, consider whether it's worth using your own copy routine rather than shutil.copy. It is always better to delegate the task to your OS, as it will be quicker and probably safer (thanks to NightShadeQueen for the reminder)!
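For reference, a minimal sketch of that approach: copy with shutil and read the lines once for the statistics (the statistics output is abbreviated here):
import shutil

def copy_file(option):
    infile_name = input("Please enter the name of the file to copy: ")
    outfile_name = input("Please enter the name of the new copy: ")
    shutil.copy(infile_name, outfile_name)      # delegate the copy to the OS
    if option == 'statistics':
        with open(infile_name, 'r') as infile:
            slist = infile.readlines()
        blank_count = slist.count('\n')
        lengths = [len(line) for line in slist]
        print('{0} lines, {1} empty lines, {2:.1f} average characters per line'.format(
            len(slist), blank_count, sum(lengths) / len(slist)))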

Is there a tool to check database integrity in Django?

The MySQL database powering our Django site has developed some integrity problems; e.g. foreign keys that refer to nonexistent rows. I won't go into how we got into this mess, but I'm now looking at how to fix it.
Basically, I'm looking for a script that scans all models in the Django site, and checks whether all foreign keys and other constraints are correct. Hopefully, the number of problems will be small enough so they can be fixed by hand.
I could code this up myself but I'm hoping that somebody here has a better idea.
I found django-check-constraints but it doesn't quite fit the bill: right now, I don't need something to prevent these problems, but to find them so they can be fixed manually before taking other steps.
Other constraints:
Django 1.1.1 and upgrading has been determined to break things
MySQL 5.0.51 (Debian Lenny), currently with MyISAM tables
Python 2.5, might be upgradable but I'd rather not right now
(Later, we will convert to InnoDB for proper transaction support, and maybe foreign key constraints on the database level, to prevent similar problems in the future. But that's not the topic of this question.)
I whipped up something myself. The management script below should be saved in myapp/management/commands/checkdb.py. Make sure that intermediate directories have an __init__.py file.
Usage: ./manage.py checkdb for a full check; use --exclude app.Model or -e app.Model to exclude the model Model in the app app.
from django.core.management.base import BaseCommand, CommandError
from django.core.management.base import NoArgsCommand
from django.core.exceptions import ObjectDoesNotExist
from django.db import models
from optparse import make_option

from lib.progress import with_progress_meter

def model_name(model):
    return '%s.%s' % (model._meta.app_label, model._meta.object_name)

class Command(BaseCommand):
    args = '[-e|--exclude app_name.ModelName]'
    help = 'Checks constraints in the database and reports violations on stdout'

    option_list = NoArgsCommand.option_list + (
        make_option('-e', '--exclude', action='append', type='string', dest='exclude'),
    )

    def handle(self, *args, **options):
        # TODO once we're on Django 1.2, write to self.stdout and self.stderr instead of plain print
        exclude = options.get('exclude', None) or []

        failed_instance_count = 0
        failed_model_count = 0
        for app in models.get_apps():
            for model in models.get_models(app):
                if model_name(model) in exclude:
                    print 'Skipping model %s' % model_name(model)
                    continue
                fail_count = self.check_model(app, model)
                if fail_count > 0:
                    failed_model_count += 1
                    failed_instance_count += fail_count
        print 'Detected %d errors in %d models' % (failed_instance_count, failed_model_count)

    def check_model(self, app, model):
        meta = model._meta
        if meta.proxy:
            print 'WARNING: proxy models not currently supported; ignored'
            return

        # Define all the checks we can do; they return True if they are ok,
        # False if not (and print a message to stdout)
        def check_foreign_key(model, field):
            foreign_model = field.related.parent_model
            def check_instance(instance):
                try:
                    # name: name of the attribute containing the model instance (e.g. 'user')
                    # attname: name of the attribute containing the id (e.g. 'user_id')
                    getattr(instance, field.name)
                    return True
                except ObjectDoesNotExist:
                    print '%s with pk %s refers via field %s to nonexistent %s with pk %s' % \
                        (model_name(model), str(instance.pk), field.name, model_name(foreign_model), getattr(instance, field.attname))
            return check_instance

        # Make a list of checks to run on each model instance
        checks = []
        for field in meta.local_fields + meta.local_many_to_many + meta.virtual_fields:
            if isinstance(field, models.ForeignKey):
                checks.append(check_foreign_key(model, field))

        # Run all checks
        fail_count = 0
        if checks:
            for instance in with_progress_meter(model.objects.all(), model.objects.count(), 'Checking model %s ...' % model_name(model)):
                for check in checks:
                    if not check(instance):
                        fail_count += 1
        return fail_count
I'm making this a community wiki because I welcome any and all improvements to my code!
Thomas' answer is great but is now a bit out of date.
I have updated it as a gist to support Django 1.8+.
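For anyone adapting the original command by hand rather than using the gist, the main changes in newer Django versions are app/model discovery and option parsing. A rough sketch of the updated skeleton, assuming Django 1.8+ (only the parts that changed are shown):
from django.apps import apps
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Checks constraints in the database and reports violations on stdout'

    def add_arguments(self, parser):
        # argparse replaces optparse's option_list in Django 1.8+
        parser.add_argument('-e', '--exclude', action='append', default=[])

    def handle(self, *args, **options):
        exclude = options['exclude']
        for model in apps.get_models():   # replaces models.get_apps()/get_models()
            label = '%s.%s' % (model._meta.app_label, model._meta.object_name)
            if label in exclude:
                self.stdout.write('Skipping model %s' % label)
                continue
            # ... run the same per-model foreign key checks as above,
            # using field.related_model instead of field.related.parent_model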