I am using R to manage a MySQL database and ran into the following question: Can I use multiple cores to run an update on a table, or does SQL get confused by that? The R code I use works like this (pseudocode):
d_ply(.data = my_local_table, .variables = ~ID, .fun = function(x) {
  # check whether x$ID already exists in my_DB_table
  if (id_exists) {
    # update the existing record
  } else {
    # create a new record
  }
})
Is it problematic to use the .parallel option of the plyr package in this case?
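For what it's worth, the exists-check plus write can be collapsed into one atomic statement, which sidesteps the question of workers racing each other. A minimal sketch, assuming my_DB_table has a primary key on ID, a hypothetical val column, and a DBI backend with parameterised-query support such as RMariaDB:
library(DBI)
# con: an open connection, e.g. dbConnect(RMariaDB::MariaDB(), ...)
upsert_row <- function(con, id, val) {
  # insert the row, or update it in place if the ID already exists
  dbExecute(con,
    "INSERT INTO my_DB_table (ID, val) VALUES (?, ?)
     ON DUPLICATE KEY UPDATE val = VALUES(val)",
    params = list(id, val))
}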
I've connected to a MySQL database using the RMariaDB package and, thanks to the dbplyr package, I can manipulate the data with dplyr commands directly from RStudio. However, there are some basic things I want to do that require base R functions (to my knowledge there are no dplyr equivalents). Is there a way to clean this data using base R commands? Thanks in advance.
The answer to this arises from how the dbplyr package works. dbplyr translates certain dplyr commands into SQL. For example:
library(dplyr)
library(dbplyr)
data(mtcars)
# set up a simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# fetch first 5 records
first_five = df_postgre %>% head(5)
# view the SQL translation
first_five %>% show_query()
# resulting SQL translation
<SQL>
SELECT *
FROM `df`
LIMIT 5
The major constraint of this approach is that dbplyr can only translate certain commands into SQL, so something like the following will fail:
# set up a simulated database connection
df_postgre = tbl_lazy(mtcars, con = simulate_postgres())
# attempt to fetch the first 5 records (fails: `[` has no SQL translation)
first_five = df_postgre[1:5,]
While head(df, 5) and df[1:5,] produce identical output for data.frames in local R memory, dbplyr cannot translate developer intention, only specific dplyr commands. Hence these two commands behave very differently when working with database tables.
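If you do need base R idioms, one workaround (a sketch, assuming a live DBI connection con and a hypothetical table my_table with a column old_var, rather than the simulated connection above) is to push whatever filtering you can to the database and then collect() the result into local memory, where base R subsetting works as usual:
library(dplyr)
# push as much work as possible to the database, then pull the rest locally
local_df = tbl(con, "my_table") %>%
  filter(old_var > 0) %>%
  collect()
# base R commands now work on the local data.frame
local_df[1:5, ]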
The other element to consider is that, from dbplyr's point of view, database tables are effectively read-only. In R we can do
df = df %>%
  mutate(new_var = 2 * old_var)
and this changes the data held in memory. In a database, however, the original data stays as stored and is transformed according to your instructions each time it is requested. There are ways to write completely new database tables from existing ones (a sketch follows), and there are already several Q&As on this under the dbplyr tag.
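As a sketch of that idea, assuming a live DBI connection con and an existing table my_table (both hypothetical), dplyr::compute() can materialise a transformed query as a new table on the database side:
library(dplyr)
library(dbplyr)
# con: an open DBI connection, e.g. dbConnect(RMariaDB::MariaDB(), ...)
remote_tbl = tbl(con, "my_table")
transformed = remote_tbl %>%
  mutate(new_var = 2 * old_var)
# materialise the query result as a new, permanent table in the database
compute(transformed, name = "my_table_transformed", temporary = FALSE)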
I am trying to use dbWriteTable to write from R to MySQL. When connecting to MySQL I created an ODBC connection, so I can simply use the command:
abc <- DBI::dbConnect(odbc::odbc(),
                      dsn = "mysql_conn")
With this I can see all the schemas in my MySQL instance. This is great when I want to read in data, such as:
test_query <- dbSendQuery(abc, "SELECT * FROM test_schema.test_file")
test_query <- dbFetch(test_query)
The problem comes when I want to create a new table in one of the schemas: how do I declare the target schema in
dbWriteTable(abc, value = new_file, name = "new_file", overwrite=T)
I imagine I have to specify test_schema somewhere in the dbWriteTable call but haven't been able to get it to work. Thoughts?
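One approach, sketched here on the assumption that the installed DBI and odbc versions support qualified identifiers, is to pass the schema through DBI::Id() rather than a bare table name:
library(DBI)
# write new_file into test_schema rather than the default schema
dbWriteTable(abc,
             name = Id(schema = "test_schema", table = "new_file"),
             value = new_file,
             overwrite = TRUE)
If the driver rejects that, writing the qualified name directly with DBI::SQL("test_schema.new_file") is a common fallback.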
Does anyone know how I can use RMySQL (or another library) to update a row in a table, rather than having to pull out the full table and push it back in? I don't want to read such a huge table into memory just to update one row.
What I am trying to do is pull out a row, change some of the values in there within R and push the same row object back into the table.
However, dbWriteTable seems to replace the entire table rather than just the row I specify.
The easiest way is to construct a string within R containing the appropriate SQL UPDATE statement and use dbSendQuery to push it back to the table.
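A minimal sketch of that idea, using a hypothetical my_table with id and score columns:
library(DBI)
library(RMySQL)
con <- dbConnect(MySQL(), dbname = "my_db")  # connection details assumed
# build the UPDATE statement for a single row; for user-supplied input,
# escape the values with dbQuoteLiteral() rather than pasting them in raw
query <- sprintf("UPDATE my_table SET score = %d WHERE id = %d", 42L, 7L)
res <- dbSendQuery(con, query)
dbClearResult(res)
dbDisconnect(con)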
Using the sqldf package:
library(sqldf)
table_name = data.frame(a = 1:10, b = 4)
# Open a persistent connection
sqldf()
# Run the update against the copy of table_name loaded into the database
fn$sqldf("update table_name set b = 1")
# Read the updated table back; ans holds the modified rows
ans = sqldf("select * from main.table_name")
# Close connection
sqldf()
print(ans)
This should be the simplest thing but for some reason it's eluding me completely.
I have a Sequel connection to a database named DB. It's using the Mysql2 engine if that's important.
I'm trying to update a single record in a table in the database. The short loop I'm using looks like this:
dataset = DB["SELECT post_id, message FROM xf_post WHERE message LIKE '%#{match}%'"]
dataset.each do |row|
new_message = process_message(row[:message])
# HERE IS WHERE I WANT TO UPDATE THE ROW IN THE DATABASE!
end
I've tried:
dataset.where('post_id = ?', row[:post_id]).update(message: new_message)
Which is what the Sequel cheat sheet recommends.
And:
DB["UPDATE xf_post SET message = ? WHERE post_id = ?", new_message, row[:post_id]]
Which should be raw SQL executed by the Sequel connector. Neither throws an error nor outputs any error message (I'm using a logger with the Sequel connection), but both calls fail to update the records in the database. The data is unchanged when I query the database after running the code.
How can I make the update call function properly here?
Your problem is that you are using a raw-SQL dataset, so the where call isn't going to change the SQL, and update is just going to execute the raw SQL. Here's what you want to do:
dataset = DB[:xf_post].select(:post_id, :message).
where(Sequel.like(:message, "%#{match}%"))
That will make the where/update combination work.
Note that your original code has a trivial SQL injection vulnerability if match depends on user input, which this new code avoids. You may also want to use Dataset#escape_like to escape LIKE metacharacters inside match; otherwise, users can supply complex matching syntax that the database may execute slowly or not handle properly.
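For example, a small sketch of escaping user-supplied input before building the pattern:
# escape LIKE metacharacters (%, _, \) in user-supplied text
safe_match = DB[:xf_post].escape_like(match)
dataset = DB[:xf_post].select(:post_id, :message).
  where(Sequel.like(:message, "%#{safe_match}%"))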
Note that the reason that
DB["UPDATE xf_post SET message = ? WHERE post_id = ?", new_message, row[:post_id]]
doesn't work is because it only creates a dataset, it doesn't execute it. You can actually call update on that dataset to run the query and return number of affected rows.
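Putting it together, a sketch of the corrected loop using the names from the question (process_message is assumed to be defined elsewhere):
dataset = DB[:xf_post].select(:post_id, :message).
  where(Sequel.like(:message, "%#{match}%"))
dataset.each do |row|
  new_message = process_message(row[:message])
  # filtered dataset update; returns the number of affected rows
  DB[:xf_post].where(post_id: row[:post_id]).update(message: new_message)
  # equivalently, execute the raw-SQL dataset by calling update on it:
  # DB["UPDATE xf_post SET message = ? WHERE post_id = ?",
  #    new_message, row[:post_id]].update
end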
This is my first attempt at moving data back and forth between a local MySQL database and R. I have a table created in the database and want to insert data into it. Currently it is a blank table (created with MySQL Query Browser) with a PK set.
I am using the RODBC package (RMySQL gives me errors) and prefer to stick with this library.
How should I go about inserting the data from a data frame into this table? Is there a quick solution or do I need to:
Create a new temp table from my dataframe
Insert the data
Drop the temp table
With separate commands? Any help much appreciated!
See help(sqlSave) in the package documentation; the example shows
channel <- odbcConnect("test")
sqlSave(channel, USArrests, rownames = "state", addPK = TRUE)  # create the table and insert the data
sqlFetch(channel, "USArrests", rownames = "state")  # get the lot
foo <- cbind(state = row.names(USArrests), USArrests)[1:3, c(1, 3)]
foo[1, 2] <- 222  # change one value locally
sqlUpdate(channel, foo, "USArrests")  # push the changed rows back
sqlFetch(channel, "USArrests", rownames = "state", max = 5)  # verify the update
sqlDrop(channel, "USArrests")
close(channel)
which hopefully should be enough to get you going.