Empty result for SQL query in RStudio Server - MySQL

I'm trying to get data from a MySQL DB into RStudio Server. My code looks like this:
library(RMySQL)
library(stringi)

mydb <- dbConnect(MySQL(), user = 'user', password = 'password',
                  dbname = 'dbname', host = 'localhost')
query <- stri_paste(
  'SELECT sellings.updated_at AS Up_Date, CONCAT(item_parameters.title, " ", ad_attributes.int_value) AS Class, ',
  'CONCAT(geos.name, " ", geos.kind) AS place, geos.lon, geos.lat, sellings.price AS price, ',
  '((geo_routes.distance*2/1000 + 100)) AS delivery_cost ',
  'FROM sellings, users, item_parameters, ad_attributes, geos, geo_routes ',
  'WHERE users.encrypted_password != "" && item_parameters.title = "Класс" && sellings.price IS NOT NULL && ad_attributes.int_value IS NOT NULL ',
  'AND users.id = sellings.user_id AND item_parameters.id = ad_attributes.item_parameter_id AND sellings.id = ad_attributes.ad_id ',
  'AND sellings.geo_guid = geos.guid AND geos.routable_guid = geo_routes.src_guid ',
  'AND geo_routes.distance = (SELECT geo_routes.distance FROM geo_routes, geos WHERE geos.guid = sellings.geo_guid AND geo_routes.src_guid = geos.routable_guid ',
  'AND geo_routes.dst_guid = (SELECT geos.routable_guid FROM geos WHERE geos.name = "Воронеж" && geos.kind = "г")) ',
  'ORDER BY Up_Date;')
rs <- dbGetQuery(mydb, query)
And I get an empty data frame. But when I run the same code against my local DB, everything is OK, although the query takes a pretty long time there (about 3 minutes). Moreover, the exact same query works fine from the MySQL command line on the server, where it takes about 4 seconds. The server runs Debian 7; the local machine runs Windows 8. Any ideas?

Sometimes when querying from the command line, the default schema has been set by a previous command. That setting doesn't carry over to R, so the exact same query that works from the command line might fail in an R session. Maybe check the dbname.
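For example, a minimal sketch of making the schema explicit (the table name here just echoes the question's query):

# name the schema in the connection itself...
mydb <- dbConnect(MySQL(), user = 'user', password = 'password',
                  dbname = 'dbname', host = 'localhost')
# ...and/or qualify tables in the SQL so no default schema is assumed
dbGetQuery(mydb, 'SELECT COUNT(*) FROM dbname.sellings')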

Insert the statements below at the top of your SQL query:
SET NOCOUNT ON
SET ANSI_WARNINGS OFF
It worked for me.

Related

Saving dbplyr query (tbl_sql object) to MySQL without saving data locally

This question expands on an earlier question.
Here, I'm using the custom function created by @Simon.S.A. shown in the answer to that question. I'm attempting to save a tbl_sql object in R to MySQL as a new table without first saving it locally. The database and schema in my MySQL instance are both named "test". The tbl_sql object in R is my_data, and I want to save it as a new table in MySQL labeled "car_data".
library(DBI)
library(tidyverse)
library(dbplyr)

# establish connection and import data from MySQL
con <- DBI::dbConnect(RMariaDB::MariaDB(),
                      dbname = "test",
                      host = "127.0.0.1",
                      user = "user",
                      password = "password")
my_data <- tbl(con, "mtcars")
my_data <- my_data %>% filter(mpg >= 22)
# write function to save tbl_sql as a new table in SQL
write_to_database <- function(input_tbl, db, schema, tbl_name){
  # connection
  tbl_connection <- input_tbl$src$con
  # SQL query
  sql_query <- glue::glue(
    "SELECT *\n",
    "INTO {db}.{schema}.{tbl_name}\n",
    "FROM (\n",
    dbplyr::sql_render(input_tbl),
    "\n) AS sub_query"
  )
  result <- dbExecute(tbl_connection, as.character(sql_query))
}
# execute function
write_to_database(my_data, "test", "test", "car_data")
After running the final line, I get the following error. I'm not sure how I can fix this.
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '.test.car_data
FROM (
SELECT *
FROM `mtcars`
WHERE (`mpg` >= 22.0)
) AS sub_quer' at line 2 [1064]
12.
stop(structure(list(message = "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '.test.car_data\nFROM (\nSELECT *\nFROM `mtcars`\nWHERE (`mpg` >= 22.0)\n) AS sub_quer' at line 2 [1064]",
call = NULL, cppstack = NULL), class = c("Rcpp::exception",
"C++Error", "error", "condition")))
11.
result_create(conn@ptr, statement, is_statement)
10.
initialize(value, ...)
9.
initialize(value, ...)
8.
new("MariaDBResult", sql = statement, ptr = result_create(conn#ptr,
statement, is_statement), bigint = conn#bigint, conn = conn)
7.
dbSend(conn, statement, params, is_statement = TRUE)
6.
.local(conn, statement, ...)
5.
dbSendStatement(conn, statement, ...)
4.
dbSendStatement(conn, statement, ...)
3.
dbExecute(tbl_connection, as.character(sql_query))
2.
dbExecute(tbl_connection, as.character(sql_query))
1.
write_to_database(my_data, "test", "test", "car_data")
Creating a table with an INTO clause is SQL Server (even MS Access) specific syntax and is not supported in MySQL. Instead, consider the counterpart statement: CREATE TABLE ... SELECT. Also, schema differs between RDBMSs: in MySQL, database is synonymous with schema.
Therefore, consider this adjusted version of the SQL build:
sql_query <- glue::glue(
  "CREATE TABLE {db}.{tbl_name}\n AS \n",
  "SELECT * \n",
  "FROM (\n",
  dbplyr::sql_render(input_tbl),
  "\n) AS sub_query"
)
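Dropped into the original function, the whole thing becomes the sketch below. The schema argument is kept only so the existing call still works; since MySQL treats database and schema as synonyms, it is simply ignored.

write_to_database <- function(input_tbl, db, schema, tbl_name){
  # connection taken from the tbl_sql object
  tbl_connection <- input_tbl$src$con
  # MySQL counterpart of SELECT ... INTO
  sql_query <- glue::glue(
    "CREATE TABLE {db}.{tbl_name}\n AS \n",
    "SELECT * \n",
    "FROM (\n",
    dbplyr::sql_render(input_tbl),
    "\n) AS sub_query"
  )
  dbExecute(tbl_connection, as.character(sql_query))
}
# same call as before
write_to_database(my_data, "test", "test", "car_data")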

Values are not inserted into a MySQL table using pool.apply_async in Python 2.7

I am trying to run the following code to populate a table in parallel for a certain application. First, the following function is defined; it is supposed to connect to my DB and execute the SQL command with the given values (to insert into the table).
import os

import MySQLdb

def dbWriter(sql, rows):
    # load cnf file
    MYSQL_CNF = os.path.abspath('.') + '/mysql.cnf'
    conn = MySQLdb.connect(db='dedupe',
                           charset='utf8',
                           read_default_file=MYSQL_CNF)
    cursor = conn.cursor()
    cursor.executemany(sql, rows)
    conn.commit()
    cursor.close()
    conn.close()
And then there is this piece:
pool = dedupe.backport.Pool(processes=2)
done = False
while not done:
    chunks = (list(itertools.islice(b_data, step)) for step in
              [step_size]*100)
    results = []
    for chunk in chunks:
        print len(chunk)
        results.append(pool.apply_async(dbWriter,
                                        ("INSERT INTO blocking_map VALUES (%s, %s)",
                                         chunk)))
    for r in results:
        r.wait()
    if len(chunk) < step_size:
        done = True
pool.close()
Everything runs and there are no errors. But at the end, my table is empty, meaning the insertions were somehow not successful. I have tried many things to fix this (including adding column names for the insertion) after many Google searches and have not been successful. Any suggestions would be appreciated. (Running the code on Python 2.7, gcloud (Ubuntu); note that the indents may be a bit messed up after pasting here.)
Please also note that "chunk" follows exactly the required data format.
Note: this is part of this example.
The only thing I am changing in the linked example is that I am separating the steps for creating the tables and inserting into them, since I am running my code on the gcloud platform and it enforces GTID standards.
The solution was changing the dbWriter function to:
conn = MySQLdb.connect(host = # host ip,
                       user = # username,
                       passwd = # password,
                       db = 'dedupe')
cursor = conn.cursor()
cursor.executemany(sql, rows)
cursor.close()
conn.commit()
conn.close()

multiprocessing.Pool to execute concurrent subprocesses to pipe data from mysqldump

I am using Python to pipe data from one mysql database to another. Here is a lightly abstracted version of code which I have been using for several months, which has worked fairly well:
import shlex
import subprocess

def copy_table(mytable):
    raw_mysqldump = "mysqldump -h source_host -u source_user --password='secret' --lock-tables=FALSE myschema mytable"
    raw_mysql = "mysql -h destination_host -u destination_user --password='secret' myschema"
    mysqldump = shlex.split(raw_mysqldump)
    mysql = shlex.split(raw_mysql)
    ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
    subprocess.check_output(mysql, stdin=ps.stdout)
    ps.stdout.close()
    retcode = ps.wait()
    if retcode == 0:
        return mytable, 1
    else:
        return mytable, 0
The size of the data has grown, and it currently takes about an hour to copy something like 30 tables. In order to speed things along, I would like to use multiprocessing. I am attempting to execute the following code on an Ubuntu server, a t2.micro (AWS EC2).
def copy_tables(tables):
    with multiprocessing.Pool(processes=4) as pool:
        params = [(arg, table) for table in sorted(tables)]
        results = pool.starmap(copy_table, params)
    failed_tables = [table for table, success in results if success == 0]
    all_tables_processed = False if failed_tables else True
    return all_tables_processed
The problem is that almost all of the tables will copy, but there are always a couple of child processes left over that will not complete. They just hang, and I can see from monitoring the database that no data is being transferred. It feels like they have somehow disconnected from the parent process, or that data is not being returned properly.
This is my first question, and I've tried to be both specific and concise; thanks in advance for any help, and please let me know if I can provide more information.
I think the following code
ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
subprocess.check_output(mysql, stdin=ps.stdout)
ps.stdout.close()
retcode = ps.wait()
should be
ps = subprocess.Popen(mysqldump, stdout=subprocess.PIPE)
sps = subprocess.Popen(mysql, stdin=ps.stdout)
retcode = ps.wait()
ps.stdout.close()
sps.wait()
You should not close the pipe until the mysqldump process has finished. And check_output is blocking: it will hang until stdin reaches end-of-file.

Is there a faster way to upload data from R to MySQL?

I am using the following code to upload a new table into a MySQL database.
library(RMySQL)
library(RODBC)

con <- dbConnect(MySQL(),
                 user = 'user',
                 password = 'pw',
                 host = 'amazonaws.com',
                 dbname = 'db_name')

dbSendQuery(con, "CREATE TABLE table_1 (
  var_1 VARCHAR(50),
  var_2 VARCHAR(50),
  var_3 DOUBLE,
  var_4 DOUBLE);
")

channel <- odbcConnect("db name")
sqlSave(channel, dat = df, tablename = "tb_name", rownames = FALSE, append = TRUE)
The full data set is 68 variables and 5 million rows. It is taking over 90 minutes to upload just 50 thousand rows to MySQL. Is there a more efficient way to upload the data? I originally tried dbWriteTable(), but it would fail with an error message saying the connection to the database was lost.
Consider a CSV export from R for an import into MySQL with LOAD DATA INFILE:
...
write.csv(df, "/path/to/filename.csv", row.names = FALSE)
# note the escaped quote: write.csv wraps character fields in double quotes,
# and it also emits a header row, hence IGNORE 1 LINES
dbSendQuery(con, "LOAD DATA LOCAL INFILE '/path/to/filename.csv'
                  INTO TABLE mytable
                  FIELDS TERMINATED BY ','
                  ENCLOSED BY '\"'
                  LINES TERMINATED BY '\\n'
                  IGNORE 1 LINES")
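(One caveat: LOAD DATA LOCAL INFILE must be permitted on both sides, e.g. local_infile enabled on the server, or MySQL will refuse the statement.)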
You could also try to disable the MySQL query log:
dbSendQuery(con, "SET GLOBAL general_log = 'off'")
I can't tell whether your MySQL user account has the appropriate permissions to do that, or whether it conflicts with your business needs.
Off the top of my head: otherwise, you could try to send the data in, say, 1000-row batches using a for loop in your R script, perhaps with the option verbose = TRUE in your call to sqlSave; a sketch follows below.
If you send the data in a single batch, MySQL might try to run the INSERT as a single transaction ("all-or-nothing"), and if it fails it goes into recovery or just fails after inserting some random number of rows.
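A minimal sketch of that batching idea, reusing the channel and table name from the question (the chunk size is arbitrary):

batch_size <- 1000
starts <- seq(1, nrow(df), by = batch_size)
for (s in starts) {
  chunk <- df[s:min(s + batch_size - 1, nrow(df)), ]
  # append each chunk; verbose = TRUE prints per-batch progress
  sqlSave(channel, dat = chunk, tablename = "tb_name",
          rownames = FALSE, append = TRUE, verbose = TRUE)
}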

Error in RMySQL package

I am using the RMySQL package to write (append) data to an existing table.
I am using R, version 3.3.2.
My code looks like this:
library(RMySQL)

df_final <- some_data
m <- dbDriver("MySQL")
mydb <- dbConnect(m, user = 'odvjet12_mislav',
                  password = 'my_pass',
                  host = '91.234.46.219',
                  dbname = 'odvjet12_fina_pn')
dbWriteTable(mydb, value = df_final, name = "fina_pn", append = TRUE, row.names = FALSE)
This code worked fine for some time, but for the last ten days it has always returned an error:
Error in .local(conn, statement, ...) :
could not run statement: The used command is not allowed with this MySQL version
I don't understand how it is possible for the code to work for some time and then start returning an error.
I kindly ask for feedback on this issue.
Best,
Mislav Šagovac
You could also use dbGetQuery from the RMySQL package and iterate over the rows, which was my solution when I hit a similar error for a data frame I wanted to write to a MySQL DB:
mydb <- dbConnect(MySQL(), user = 'user', password = 'password', dbname = 'databasename', host = 'hostname')
for (i in 1:nrow(df)) {
  dbGetQuery(mydb, paste0("INSERT INTO MYTABLE (COL1, COL2) VALUES (", df$col1[i], ",", df$col2[i], ")"))
}
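Note that pasting values into the SQL like this only works cleanly for numeric columns; character values would need quoting and escaping. A safer sketch using bound parameters, assuming an RMariaDB connection (RMySQL's support for parameterised statements is limited):

library(RMariaDB)
mydb <- dbConnect(MariaDB(), user = 'user', password = 'password', dbname = 'databasename', host = 'hostname')
# one INSERT per row, with values bound as parameters rather than pasted in
for (i in 1:nrow(df)) {
  dbExecute(mydb, "INSERT INTO MYTABLE (COL1, COL2) VALUES (?, ?)",
            params = list(df$col1[i], df$col2[i]))
}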