I want to generate test data in MySQL.
Assuming that there is a table like this,
create table person (
id int auto_increment primary key,
name text,
age int,
birth_day date
);
let me know how to create test data in a simple way.
BTW I know some ways using stdin like
repeat 5 echo 'insert into person (name, age, birth_day) select concat("user", ceil(rand() * 100)), floor(rand()*100), date_add(date("1900/01/01"), interval floor(rand()*100) year);'
or
repeat 5 perl -M"Data::Random qw(:all)" -E 'say sprintf qq#insert into person (name, age, birth_day) values ("user%s", %s,"%s");#, (int rand(100)), (int rand(100)), rand_date(min => "1900-01-01", max=>"1999-12-31")'
I think the latter may be better because it doesn't use MySQL functions.
This is the easiest way to generate dummy data for MySQL:
http://www.generatedata.com/
See also:
https://dba.stackexchange.com/questions/449/tool-to-generate-large-datasets-of-test-data
Personally, as a sysadmin, I use the Faker library, which lets you generate data on the fly.
For your person table, you could do the following:
#!/usr/bin/env python
import random

import mysql.connector
from faker import Faker

Faker.seed(33422)
fake = Faker()

# Fill in your own connection settings here
conn = mysql.connector.connect(host=db_host, database=db_name,
                               user=db_user, password=db_pass)
cursor = conn.cursor()

# One fake row: first name, age between 0 and 99, date of birth
row = (fake.first_name(), random.randint(0, 99), fake.date_of_birth())

# Use a parameterized query so values are quoted and escaped properly
cursor.execute(
    "INSERT INTO person (name, age, birth_day) VALUES (%s, %s, %s)", row)
conn.commit()
Then you can improve the script by looping (see the sketch below); at each iteration Faker will create a new random name and birth date.
Faker has an extended list of data types (called "providers") it can generate; the full list is available at https://faker.readthedocs.io/en/master/providers.html.
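For example, here is a minimal sketch of that looping idea, inserting a batch of rows with executemany(); the connection settings and the row count of 1000 are placeholders you would adapt:
import random

import mysql.connector
from faker import Faker

fake = Faker()

# Placeholder connection settings; replace with your own
conn = mysql.connector.connect(host="localhost", database="test",
                               user="root", password="secret")
cursor = conn.cursor()

# Build a batch of fake rows and insert them in a single round trip
rows = [(fake.first_name(), random.randint(0, 99), fake.date_of_birth())
        for _ in range(1000)]
cursor.executemany(
    "INSERT INTO person (name, age, birth_day) VALUES (%s, %s, %s)", rows)

conn.commit()
cursor.close()
conn.close()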
You can try http://paulthedutchman.nl/datagenerator-online. It is also available as an offline version to use in your local development environment, and it has many more options than other data generators out there.
It is not suitable for mobile devices because it uses ExtJS; use it on your computer.
The offline version automatically scans your database and table structure.
This random data generator for MySQL is based on MySQL routines, and you don't really need to provide anything other than the database and table names.
To use it:
Download the routines from GitHub
mysql < populate.sql
mysql> call populate('database','table',1000,'N');
I am importing data into my Python3 environment and then writing it to a MySQL database. However, there are a lot of different data tables, so writing out each INSERT statement isn't really pragmatic, plus some have 50+ columns.
Is there a good way to create a table in MySQL directly from a dataframe, and then send insert commands to that same table using a dataframe of the same format, without having to actually type out all the col names? I started trying to call column names and format it and concat everything as a string, but it is extremely messy.
Ideally there is a function out there to directly handle this. For example:
apiconn.request("GET", url, headers=datheaders)
#pull in some JSON data from an API
eventres = apiconn.getresponse()
eventjson = json.loads(eventres.read().decode("utf-8"))
#create a dataframe from the data
eventtable = json_normalize(eventjson)
dbconn = pymysql.connect(host='hostval',
                         user='userval',
                         passwd='passval',
                         db='dbval')
cursor = dbconn.cursor()
sql = sqltranslate(table = 'eventtable', fun = 'append')
#where sqltranslate() is some magic function that takes a dataframe and
#creates SQL commands that pymysql can execute.
cursor.execute(sql)
What you want is a way to abstract the generation of the SQL statements.
A library like SQLAlchemy will do a good job, including a powerful way to construct DDL, DML, and DQL statements without needing to directly write any SQL.
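For example, if you're open to pandas on top of SQLAlchemy, DataFrame.to_sql will create the table from the frame's columns and append rows without you typing any column names. A rough sketch (the connection string and the stand-in frame are placeholders):
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string; adjust driver, credentials and database name
engine = create_engine("mysql+pymysql://user:password@localhost/dbval")

# Stand-in for the frame you built with json_normalize()
eventtable = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Creates the table if needed and appends the rows; no hand-written INSERTs
eventtable.to_sql("eventtable", con=engine, if_exists="append", index=False)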
I work as a Business Analyst and new to Python.
In one of my projects, I want to extract data from a .csv file and load that data into my MySQL DB (staging).
Can anyone guide me with a sample code and frameworks I should use?
A simple program to create an SQLite database. You can read the CSV file and use dynamic_data_entry() to insert rows into your desired target table.
import sqlite3
import time
import datetime
import random

conn = sqlite3.connect('test.db')
c = conn.cursor()

def create_table():
    c.execute('create table if not exists stuffToPlot(unix REAL, datestamp TEXT, keyword TEXT, value REAL)')

def data_entry():
    # Insert one hard-coded example row
    c.execute("INSERT INTO stuffToPlot VALUES(1452549219,'2016-01-11 13:53:39','Python',6)")
    conn.commit()

def dynamic_data_entry():
    # Build one row from the current time and a random value, then insert it
    unix = time.time()
    date = str(datetime.datetime.fromtimestamp(unix).strftime('%Y-%m-%d %H:%M:%S'))
    keyword = 'python'
    value = random.randrange(0, 10)
    c.execute("INSERT INTO stuffToPlot(unix,datestamp,keyword,value) values(?,?,?,?)",
              (unix, date, keyword, value))
    conn.commit()

def read_from_db():
    c.execute('select * from stuffToPlot')
    for row in c.fetchall():
        print(row)

create_table()
dynamic_data_entry()
read_from_db()
c.close()
conn.close()
You can iterate through the data in the CSV and load it into sqlite3. Please refer to the link below as well:
Quick easy way to migrate SQLite3 to MySQL?
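As a rough sketch of that iteration, assuming a three-column CSV called input.csv and a matching staging table (both names are placeholders):
import csv
import sqlite3

conn = sqlite3.connect('test.db')
c = conn.cursor()
c.execute('create table if not exists staging(col1 TEXT, col2 TEXT, col3 TEXT)')

# Read the CSV and insert every row with a parameterized query
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    c.executemany('INSERT INTO staging(col1, col2, col3) VALUES (?, ?, ?)', reader)

conn.commit()
conn.close()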
If that's a properly formatted CSV file, you can use the LOAD DATA INFILE MySQL command and you won't need any Python. Then, after it is loaded into the staging area (without processing), you can continue transforming it using the SQL/ETL tool of your choice.
https://dev.mysql.com/doc/refman/8.0/en/load-data.html
One problem with that is that you need to map all the columns, but even if the file contains data you don't need, you might still prefer to load everything into staging.
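If you do want to drive it from Python anyway, here is a minimal sketch using pymysql with local_infile enabled; the file path, table name, and credentials are placeholders, and the MySQL server must also allow local_infile:
import pymysql

# Placeholder credentials; local_infile must be enabled client- and server-side
conn = pymysql.connect(host='localhost', user='userval', passwd='passval',
                       db='dbval', local_infile=True)
cursor = conn.cursor()
cursor.execute(
    "LOAD DATA LOCAL INFILE '/path/to/staging.csv' "
    "INTO TABLE staging FIELDS TERMINATED BY ','")
conn.commit()
conn.close()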
I've got 200k CSV files and I need to import them all into a single PostgreSQL table. It's a list of parameters from various devices, and each CSV's file name contains the device's serial number, which I need to be in one of the columns for each row.
So to simplify, I've got a few columns of data (no headers); let's say the columns in each CSV file are: Date, Variable, Value, and the file name contains SERIALNUMBER_and_someOtherStuffIDontNeed.csv
I'm trying to use cygwin to write a bash script to iterate over files and do it for me, however for some reason it won't work, showing 'syntax error at or near "as" '
Here's my code:
#!/bin/bash
FILELIST=/cygdrive/c/devices/files/*
for INPUT_FILE in $FILELIST
do
psql -U postgres -d devices -c "copy devicelist
(
Date,
Variable,
Value,
SN as CURRENT_LOAD_SOURCE(),
)
from '$INPUT_FILE'
delimiter ',' ;"
done
I'm learning SQL so it might be an obvious mistake, but I can't see it.
Also I know that in that form I will get full file name, not just the serial number bit I want but I can probably handle that somehow later.
Please advise.
Thanks.
I don't think there is a CURRENT_LOAD_SOURCE() function in Postgres. A work-around is to leave the name column NULL on copy, and patch it to the desired value just after the copy. I prefer a shell here-document because that makes quoting inside the SQL body easier. (BTW: for 10K files, the globbing needed to obtain FILELIST might exceed ARG_MAX for the shell ...)
#!/bin/bash
FILELIST="`ls /tmp/*.c`"
for INPUT_FILE in $FILELIST
do
echo "File:" $INPUT_FILE
psql -U postgres -d devices <<OMG
-- I have a schema "tmp" for testing purposes
CREATE TABLE IF NOT EXISTS tmp.filelist(name text, content text);
COPY tmp.filelist ( content)
from '/$INPUT_FILE' delimiter ',' ;
UPDATE tmp.filelist SET name = '$INPUT_FILE'
WHERE name IS NULL;
OMG
done
For anyone interested in an answer: I used a Python script to rename the files and then another script using psycopg2 to connect to the database, doing everything in one connection. It took 10 minutes instead of 10 hours.
Here's the code:
Renaming the files (apparently, to import from CSV you need all the rows to be filled, and the information I needed was in the first 4 columns anyway, so I put together a solution to generate whole new CSVs instead of just renaming them):
import os
import csv

path = 'C:/devices/files'
os.chdir(path)
i = 0
for file in os.listdir(path):
    try:
        i += 1
        if i % 10000 == 0:
            # just to see the progress
            print(i)
        serial_number = file[:8]
        creader = csv.reader(open(file))
        cwriter = csv.writer(open('processed_' + file, 'w'))
        for cline in creader:
            new_line = [val for col, val in enumerate(cline) if col not in (4, 5, 6, 7)]
            new_line.insert(0, serial_number)
            #print(new_line)
            cwriter.writerow(new_line)
    except:
        print('problem with file: ' + file)
        pass
Updating database:
import os
import psycopg2

path = "C:\\devices\\files"
directory_listing = os.listdir(path)
conn = psycopg2.connect("dbname='devices' user='postgres' host='localhost'")
cursor = conn.cursor()
print(len(directory_listing))
i = 100001
while i < 218792:
    current_file = directory_listing[i]
    i += 1
    full_path = "C:/devices/files/" + current_file
    with open(full_path) as f:
        cursor.copy_from(file=f, table='devicelistlive', sep=",")
    conn.commit()
conn.close()
Don't mind the while loop and the odd numbers; I was just processing the files in batches for testing purposes. It can easily be replaced with a for loop, as in the sketch below.
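For reference, a sketch of the same load written with a for loop instead (paths and table name as in the script above):
import os
import psycopg2

path = "C:/devices/files"
conn = psycopg2.connect("dbname='devices' user='postgres' host='localhost'")
cursor = conn.cursor()

# Iterate the directory listing directly instead of indexing with a counter
for current_file in os.listdir(path):
    with open(os.path.join(path, current_file)) as f:
        cursor.copy_from(file=f, table='devicelistlive', sep=",")
    conn.commit()

conn.close()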
I am able to do a MySQL data insert using the following,
from twisted.enterprise.adbapi import ConnectionPool
.
.
self.factory.pool.runOperation ('insert into table ....')
But somehow I am unable to figure out how to do a simple SELECT from an adbapi call to MySQL, like the following,
self.factory.pool.runOperation('SELECT id FROM table WHERE name = (%s)',customer)
How do I retrieve the id value from this particular call? I was working OK with plain Python but am somehow really fuzzed up with the Twisted framework.
Thanks.
runOperation isn't for SELECT statements. It is for statements that do not produce rows, eg INSERT and DELETE.
Statements that produce rows are supported by runQuery. For example:
pool = ...
d = pool.runQuery("SELECT id FROM table WHERE name = (%s)", (customer,))
def gotRows(rows):
    # rows is a sequence of row tuples, e.g. ((42,),)
    print('The user id is', rows[0][0] if rows else None)

def queryError(reason):
    print('Problem with the query:', reason)

d.addCallbacks(gotRows, queryError)
In this example, d is an instance of Deferred. If you haven't encountered Deferreds before, you definitely want to read up about them: http://twistedmatrix.com/documents/current/core/howto/defer.html
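If you prefer to read it inline, here is a sketch using inlineCallbacks (the function name and fallback value are just illustrative):
from twisted.internet import defer

@defer.inlineCallbacks
def get_customer_id(pool, customer):
    # runQuery fires the Deferred with a sequence of row tuples, e.g. ((42,),)
    rows = yield pool.runQuery(
        "SELECT id FROM table WHERE name = (%s)", (customer,))
    return rows[0][0] if rows else None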
This is my first attempt to throw data back and forth between a local MySQL database and R. That said, I have a table created in the database and want to insert data into it. Currently, it is a blank table (created with MySQL Query Browser) and has a PK set.
I am using the RODBC package (RMySQL gives me errors) and prefer to stick with this library.
How should I go about inserting the data from a data frame into this table? Is there a quick solution or do I need to:
Create a new temp table from my dataframe
Insert the data
Drop the temp table
With separate commands? Any help much appreciated!
See help(sqlSave) in the package documentation; the example shows
channel <- odbcConnect("test")
sqlSave(channel, USArrests, rownames = "state", addPK=TRUE)
sqlFetch(channel, "USArrests", rownames = "state") # get the lot
foo <- cbind(state=row.names(USArrests), USArrests)[1:3, c(1,3)]
foo[1,2] <- 222
sqlUpdate(channel, foo, "USArrests")
sqlFetch(channel, "USArrests", rownames = "state", max = 5)
sqlDrop(channel, "USArrests")
close(channel)
which hopefully should be enough to get you going.