This is my first attempt at passing data back and forth between a local MySQL database and R. I have a table created in the database and want to insert data into it. Currently it is an empty table (created with MySQL Query Browser) and has a primary key set.
I am using the RODBC package (RMySQL gives me errors) and prefer to stick with this library.
How should I go about inserting the data from a data frame into this table? Is there a quick solution or do I need to:
Create a new temp table from my dataframe
Insert the data
Drop the temp table
With separate commands? Any help much appreciated!
See help(sqlSave) in the package documentation; the example shows
channel <- odbcConnect("test")
sqlSave(channel, USArrests, rownames = "state", addPK = TRUE)  # create the table and insert the data
sqlFetch(channel, "USArrests", rownames = "state")  # get the lot
foo <- cbind(state = row.names(USArrests), USArrests)[1:3, c(1, 3)]
foo[1, 2] <- 222
sqlUpdate(channel, foo, "USArrests")  # push the changed rows back
sqlFetch(channel, "USArrests", rownames = "state", max = 5)
sqlDrop(channel, "USArrests")
close(channel)
which hopefully should be enough to get you going.
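To answer the question directly: no temp table is needed, since sqlSave can append a data frame to an existing table in one call. A minimal sketch, where the DSN "test", the data frame mydf, and the table name "mytable" are placeholders:

library(RODBC)

channel <- odbcConnect("test")  # DSN pointing at the local MySQL server
# append = TRUE inserts into the existing table instead of creating one;
# rownames = FALSE keeps R's row names from becoming an extra column
sqlSave(channel, mydf, tablename = "mytable", append = TRUE, rownames = FALSE)
close(channel)

Since the table already has a PK, make sure the data frame's column names line up with the table's columns so the insert maps correctly.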
I have a data frame made up of 3 columns named INTERNAL_ID, NT_CLONOTYPE and SAMPLE_ID. I need to write a script in R that will transfer this data into the appropriate 3 columns with the exact names in a MySQL table. However, the table has more than 3 columns, say 5 (INTERNAL_ID, COUNT, NT_CLONOTYPE, AA_CLONOTYPE, and SAMPLE_ID). The MySQL table already exists and may or may not include preexisting rows of data.
I'm using the dbx and RMariaDB libraries in R. I've been able to connect to the MySQL database with dbxConnect(). Here is what I run when I try dbxUpsert():
conx <- dbxConnect(adapter = "mysql", dbname = "TCR_DB", host = "127.0.0.1", user = "xxxxx", password = "xxxxxxx")
table <- "TCR"
records <- newdf  # data frame previously created with the update data
dbxUpsert(conx, table, records, where_cols = c("INTERNAL_ID"))
dbxDisconnect(conx)
I expect to obtain an updated MySQL table with the new rows, which may or may not have NULL entries in the columns not contained in the data frame. For example:

INTERNAL_ID   COUNT   NT_CLONOTYPE   AA_CLONOTYPE   SAMPLE_ID
Pxxxxxx.01    NULL    CTTGGAACTG     NULL           PMA.01
The connection and disconnection both run fine, but instead of the expected result I obtain the following error:
Error in .local(conn, statement, ...) :
could not run statement: Field 'COUNT' doesn't have a default value
I suspect it's because the data frame and the table don't have the same number of columns, but I'm not sure. If so, how can I get around this?
I figured it out. I changed the column definition for COUNT so that it defaults to NULL. This allowed the upsert to proceed, leaving COUNT unset for the inserted rows.
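For reference, a minimal sketch of that fix run from R; it assumes COUNT was declared as an INT column (a dbx connection is a regular DBI connection, so DBI::dbExecute should work on conx):

library(DBI)

# Give COUNT a NULL default so upserts that omit the column can succeed
dbExecute(conx, "ALTER TABLE TCR MODIFY `COUNT` INT DEFAULT NULL")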
Does anyone know how I can use RMySQL (or another library) to update a row in a table, rather than having to pull out the full table and push it back in? I don't want to read such a huge table into memory just to update one row.
What I am trying to do is pull out a row, change some of the values in there within R and push the same row object back into the table.
However, dbWriteTable seems to replace the entire table rather than just the row I specify.
The easiest way is to construct a string within R containing the appropriate SQL UPDATE statement and use dbSendQuery to push your data back into the table.
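A minimal sketch of that approach with RMySQL; the table scores with columns id and value is hypothetical, and for character values you should escape with DBI::dbQuoteString rather than pasting them in raw:

library(RMySQL)

con <- dbConnect(MySQL(), dbname = "mydb", host = "127.0.0.1",
                 user = "user", password = "pass")

# Build an UPDATE statement for just the one row changed in R
sql <- sprintf("UPDATE scores SET value = %d WHERE id = %d", 42L, 7L)
dbSendQuery(con, sql)

dbDisconnect(con)

This touches only the targeted row, so nothing needs to be read into memory.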
Using the sqldf package:
library(sqldf)

table_name <- data.frame(a = 1:10, b = 4)

# Open a persistent connection
sqldf()
fn$sqldf("update table_name set b = 1")  # updates the copy loaded into SQLite
ans <- sqldf("select * from main.table_name")  # fetch the updated copy
# Close the connection
sqldf()

print(table_name)  # note: the local data frame itself is unchanged
I want to generate test data in MySQL.
Assuming that there is a table like this,
create table person (
  id int auto_increment primary key,
  name text,
  age int,
  birth_day date
);
let me know how to create test data in a simple way.
BTW, I know some ways using stdin, like:
repeat 5 echo 'insert into person (name, age, birth_day) select concat("user", ceil(rand() * 100)), floor(rand()*100), date_add(date("1900/01/01"), interval floor(rand()*100) year);'
or
repeat 5 perl -M"Data::Random qw(:all)" -E 'say sprintf qq#insert into person (name, age, birth_day) values ("user%s", %s,"%s");#, (int rand(100)), (int rand(100)), rand_date(min => "1900-01-01", max=>"1999-12-31")'
I think the latter may be better because it doesn't use MySQL functions.
This is the easiest way to generate dummy data for MySQL:
http://www.generatedata.com/
See also:
https://dba.stackexchange.com/questions/449/tool-to-generate-large-datasets-of-test-data
Personally, as a sysadmin, I use the Faker library, which can generate data on the fly. For your person table, you could do the following:
#!/usr/bin/env python
import random

import mysql.connector
from faker import Faker

Faker.seed(33422)
fake = Faker()

# Fill in the connection details for your own server
db_host, db_name = "127.0.0.1", "mydb"
db_user, db_pass = "user", "pass"

conn = mysql.connector.connect(host=db_host, database=db_name,
                               user=db_user, password=db_pass)
cursor = conn.cursor()

# One fake row: name, age, birth date
row = (fake.first_name(), random.randint(0, 99), fake.date_of_birth())

# Parameterized query: the connector quotes and escapes the values itself
cursor.execute("INSERT INTO person (name, age, birth_day) VALUES (%s, %s, %s)",
               row)
conn.commit()
conn.close()
You can then improve the script by adding a loop; at each iteration Faker will create a new random name and birth date.
Faker has an extended list of data types (called "providers") it can generate; the full list is available at https://faker.readthedocs.io/en/master/providers.html.
You can try http://paulthedutchman.nl/datagenerator-online. It is also available as an offline version to use in your local development environment, and it has many more options than other data generators out there.
It is not suitable for mobile devices because it uses ExtJS, so use it on your computer.
The offline version automatically scans your database and table structure.
This random data generator for MySQL is based on MySQL routines, and you don't really need to provide anything other than the database and table names.
To use it:
Download the routines from GitHub
mysql < populate.sql
mysql> call populate('database','table',1000,'N');
I am using R to manage a MySQL database and ran into the following question: Can I use multiple cores to run an update on a table, or does SQL get confused by that? The R code I use works like this (pseudocode):
d_ply(.data = my_local_table, .variables = ~ ID, .fun = function(x){
    # check if ID exists in my_DB_table
    # if it exists: update the record
    # else: create a new record
})
Is it problematic to use the .parallel option of the plyr package in this case?
I have a MySQL table that I am reading with the RMySQL package in R. I would like to be able to refer directly to the data frame stored in the table so I can interact with it seamlessly, rather than having to execute an RMySQL statement every time I want to do something. Is there a way to accomplish this? I tried:
data <- dbReadTable(conn = con, name = 'tablename')
For example, if I now want to check how many rows I have in this table I would run:
nrow(data)
Does this go through the database connection, or am I now storing the object "data" locally, defeating the whole purpose of using an external database?
data <- dbReadTable(conn = con, name = 'tablename')
This command downloads all the data into a local R dataframe (assuming you have enough RAM). Any operations with data from that point forward do not require the SQL connection.
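If all you need is an aggregate such as a row count, you can instead let the server compute it and fetch only the result; a minimal sketch, with the table name taken from the question:

# Count rows server-side; only a single number crosses the connection
n <- dbGetQuery(con, "SELECT COUNT(*) FROM tablename")[1, 1]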