I have been trying to write an R script to query an Impala database. Here is the query:
select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA
When I run this query manually (read: outside the Rscript via impala-shell), I am able to get the table contents. However, when the same is tried via the R script, I get the following error:
[1] "HY000 140 [Cloudera][ImpalaODBC] (140) Unsupported query."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA'"
closing unused RODBC handle 1
Why does the query fail when run from R, and how do I fix this? Thanks in advance :)
Edit 1:
The connection script looks as below:
library("RODBC");
connection <- odbcConnect("Impala");
query <- "select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA";
data <- sqlQuery(connection,query);
You need to install the relevant drivers; please look at the following link.
I had the same issue; all I had to do was update the ODBC drivers.
Also, try updating your odbcConnect call to include the username and password, changing
connection <- odbcConnect("Impala");
to
connection <- odbcConnect("Impala", uid="root", pwd="password")
The RODBC package is quirky: if there's no row updated/deleted in the query execution it will throw an error.
So before using sqlDelete to delete rows, or using sqlUpdate to update values, first check if there's at least one row that will be deleted/updated by querying COUNT(*).
I've had no problem after implementing the check, for Oracle SQL 12g.
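The check-before-write pattern described above is not specific to RODBC. Here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for the Oracle connection. The table `events` and the helper `delete_if_any` are hypothetical names chosen for illustration, not part of RODBC.

```python
import sqlite3

def delete_if_any(conn, status):
    """Delete rows with the given status, but only if at least one matches."""
    # Count matching rows first, so the DELETE is never issued for zero rows.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM events WHERE status = ?", (status,)
    ).fetchone()
    if count == 0:
        return 0  # nothing to delete; skip the statement entirely
    conn.execute("DELETE FROM events WHERE status = ?", (status,))
    conn.commit()
    return count

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(1, "old"), (2, "old"), (3, "new")]
)

deleted_old = delete_if_any(conn, "old")          # two rows match
deleted_missing = delete_if_any(conn, "missing")  # no rows match, DELETE is skipped
```

The same shape should carry over to R: run the COUNT(*) with sqlQuery first and call sqlDelete or sqlUpdate only when the count is positive.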
An alternative would be to use a staging table for the new batch of data and use sqlQuery to execute a MERGE command. RODBC won't complain if zero rows are merged.
This might also be due to an error in your SQL query itself. For example, I got this error when I left out an 'in' in the following generalized statement:
stringstuff <- someDT$columnyouwanttouse
somestring <- toString(sprintf("'%s'", stringstuff))
RESULTS <- sqlQuery(con, paste0("select
fling as flam
and toot in (",somestring,")
limit 30
;"))
I got the error you did when I left out the 'in', so double check your syntax.
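One way to make a missing keyword or a quoting slip less likely is to generate the IN list from the values instead of pasting quoted strings together. The following is a rough sketch in Python with sqlite3 as a stand-in database. The table `things` and the helper `in_placeholders` are made up for this example, and whether your R driver supports placeholders is something to check separately.

```python
import sqlite3

def in_placeholders(values):
    """Return a parenthesised placeholder list such as '(?, ?, ?)'."""
    if not values:
        # Many databases reject an empty IN list, so fail early.
        raise ValueError("cannot build an IN list from zero values")
    return "(" + ", ".join("?" for _ in values) + ")"

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (fling TEXT, toot TEXT)")
conn.executemany(
    "INSERT INTO things VALUES (?, ?)", [("a", "x"), ("b", "y"), ("c", "z")]
)

wanted = ["x", "z"]
sql = (
    "SELECT fling AS flam FROM things WHERE toot IN "
    + in_placeholders(wanted)
    + " LIMIT 30"
)
# The values travel separately from the SQL text.
rows = conn.execute(sql, wanted).fetchall()
```

Printing `sql` before executing it is also a cheap way to spot a missing keyword.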
This error message can arise if the table doesn't exist in the database.
A few sensible checks:
Check for typos in the table name in your query
See if you can run the same query on the same database via another SQL client
Talk to your database administrator to confirm that the table does exist
Re-installing the RODBC package did the trick for me!
I had a similar problem. After uninstalling R version 4.2.1 and installing R version 4.1.3, the problem was solved.
I am trying to convert Oracle table data into JSON files. I have three databases and the below code gives output as JSON file in one DB but the other two databases throw ORA-00907: missing right parenthesis error.
Syntactically it is correct, as it gave output in one DB. Don't understand what is going wrong.
This is an Oracle DB. How do I find out which version of Oracle is installed on those DBs and, if they are 12.2 or above, is there a way to fix this issue? All I want is to convert the output of a select statement to a JSON file. Thanks in advance.
code:
SELECT JSON_OBJECT ( 'empid' value eid , 'name' value ename , 'add' value eaddr )
FROM abc.emp
JSON_OBJECT is available from Oracle version 12.2 onward.
Run the query SELECT * FROM v$version to check your Oracle version.
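If you need to compare several databases, the banner returned by that query can be checked programmatically. Below is a small sketch in Python; the helper `supports_json_object` is hypothetical, and the two banner strings are illustrative samples of the usual format, not output from the databases in the question.

```python
import re

def supports_json_object(banner):
    """Return True if the release number in a v$version banner is 12.2 or later."""
    match = re.search(r"Release (\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no release number found in banner: " + repr(banner))
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles both the major and the minor number.
    return (major, minor) >= (12, 2)

# Illustrative banner strings (assumed format).
old_db = "Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production"
new_db = "Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production"

old_ok = supports_json_object(old_db)
new_ok = supports_json_object(new_db)
```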
I want to use Spark to process some data from a JDBC source. But to begin with, instead of reading original tables from JDBC, I want to run some queries on the JDBC side to filter columns and join tables, and load the query result as a table in Spark SQL.
The following syntax to load raw JDBC table works for me:
df_table1 = sqlContext.read.format('jdbc').options(
url="jdbc:mysql://foo.com:3306",
dbtable="mydb.table1",
user="me",
password="******",
driver="com.mysql.jdbc.Driver" # mysql JDBC driver 5.1.41
).load()
df_table1.show() # succeeded
According to Spark documentation (I'm using PySpark 1.6.3):
dbtable: The JDBC table that should be read. Note that anything that is valid
in a FROM clause of a SQL query can be used. For example, instead of a
full table you could also use a subquery in parentheses.
So just for experiment, I tried something simple like this:
df_table1 = sqlContext.read.format('jdbc').options(
url="jdbc:mysql://foo.com:3306",
dbtable="(SELECT * FROM mydb.table1) AS table1",
user="me",
password="******",
driver="com.mysql.jdbc.Driver"
).load() # failed
It threw the following exception:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table1 WHERE 1=0' at line 1
I also tried a few other variations of the syntax (add / remove parentheses, remove 'as' clause, switch case, etc) without any luck. So what would be the correct syntax? Where can I find more detailed documentation for the syntax? Besides, where does this weird "WHERE 1=0" in error message come from? Thanks!
For reading data from JDBC source using sql query in Spark SQL, you can try something like this:
val df_table1 = sqlContext.read.format("jdbc").options(Map(
("url" -> "jdbc:postgresql://localhost:5432/mydb"),
("dbtable" -> "(select * from table1) as table1"),
("user" -> "me"),
("password" -> "******"),
("driver" -> "org.postgresql.Driver"))
).load()
I tried it using PostgreSQL. You can modify it according to MySQL.
table = "(SELECT id, person, manager, CAST(tdate AS CHAR) AS tdate, CAST(start AS CHAR) AS start, CAST(end AS CHAR) as end, CAST(duration AS CHAR) AS duration FROM EmployeeTimes) AS EmployeeTimes"
spark = get_spark_session()
df = spark.read.format("jdbc"). \
options(url=ip,
driver='com.mysql.jdbc.Driver',
dbtable=table,
user=username,
password=password).load()
return df
I had heaps of trouble with Spark JDBC incompatibility with MySQL timestamps. The trick is to convert all your timestamp or duration values to strings before the JDBC driver touches them. Simply cast your values as strings and it will work.
Note: You will also have to use AS to give the query an alias for it to work.
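When several columns need the cast, the subquery string can be generated instead of typed by hand. This is only a sketch: `build_dbtable` is a hypothetical helper, and the column names are borrowed from the example above. It does no identifier quoting, so it assumes the names are safe to embed as-is.

```python
def build_dbtable(table, plain_cols, cast_cols, alias):
    """Build a parenthesised, aliased subquery suitable for the dbtable option."""
    select_items = list(plain_cols)
    # Cast each temporal column to CHAR and keep its original name.
    select_items += [f"CAST({col} AS CHAR) AS {col}" for col in cast_cols]
    # The trailing alias is what makes the subquery usable in a FROM clause.
    return f"(SELECT {', '.join(select_items)} FROM {table}) AS {alias}"

dbtable = build_dbtable(
    table="EmployeeTimes",
    plain_cols=["id", "person", "manager"],
    cast_cols=["tdate", "duration"],
    alias="EmployeeTimes",
)
```

The resulting string would then be passed as the `dbtable` option of the JDBC reader.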
With Spark 2.2 on Python connecting to a MySQL (5.7.19) I'm able to run the following when I use table="(SELECT * FROM a_table) AS my_table".
from pyspark.sql import SparkSession
my_spark = SparkSession \
.builder.appName("myApp") \
.config("spark.jars", "/usr/local/spark-2.2.2-bin-hadoop2.7/jars/mysql-connector-java-5.1.45-bin.jar") \
.getOrCreate()
my_df = my_spark.read.jdbc(
url="jdbc:mysql://my_host:3306/my_db",
table="(SELECT * FROM a_table) AS my_table",
properties={'user': 'my_username', 'password': 'my_password'}
)
my_df.head(20)
I think it may be a bug in Spark SQL.
It seems that either this or this line gives you the error. Both use a Scala string interpolation to replace table with dbtable.
s"SELECT * FROM $table WHERE 1=0"
That's where you can find table1 WHERE 1=0 from the error you've faced since the above pattern would become:
SELECT * FROM (select * from table1) as table1 WHERE 1=0
which looks incorrect.
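To see what that interpolation produces, the pattern can be reproduced outside Spark. The sketch below is in Python, with sqlite3 used only as a stand-in engine (MySQL may parse the statement differently); `schema_probe` is a hypothetical helper that mirrors the Scala string above. Note that the fragment 'table1 WHERE 1=0' from the error message is exactly the tail of the composed string.

```python
import sqlite3

def schema_probe(dbtable):
    """Mimic the Scala pattern: SELECT * FROM $table WHERE 1=0."""
    return f"SELECT * FROM {dbtable} WHERE 1=0"

probe = schema_probe("(SELECT * FROM table1) AS table1")

# SQLite is only a stand-in here, with a made-up two-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.execute("INSERT INTO table1 VALUES (1, 'a')")

cursor = conn.execute(probe)
rows = cursor.fetchall()                          # empty, because 1=0 is never true
columns = [col[0] for col in cursor.description]  # column names are still reported
```

A query of this shape returns no rows but still exposes the column metadata, which is presumably why it is used to resolve the schema before any data is read.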
There is indeed a MySQL-specific dialect - MySQLDialect - that overrides getTableExistsQuery with its own:
override def getTableExistsQuery(table: String): String = {
s"SELECT 1 FROM $table LIMIT 1"
}
so my bet is that the other method, getSchemaQuery, is the source of the error. That's highly unlikely though, given that you use Spark 1.6.3 while the method has an @Since("2.1.0") marker.
I'd highly recommend checking the MySQL database logs to see which query is executed that leads to the error message.
I have a MySQL server as a linked server in Microsoft SQL Server 2008. For the link I use MySQL ODBC Connector version 5.1.8. When invoking queries using OPENQUERY (the only way I found of performing queries), problems occur. Simple queries, such as
SELECT * FROM OPENQUERY(MYSQL, 'SHOW TABLES')
work fine. Selection of individual columns, e.g.,
SELECT * FROM OPENQUERY(MYSQL, 'SELECT nr FROM letter')
works fine as well, but SELECT * syntax does not work. The query:
SELECT * FROM OPENQUERY(MYSQL, 'SELECT * FROM mytable')
raises an error:
Msg 7347, Level 16, State 1, Line 6
OLE DB provider 'MSDASQL' for linked
server 'MYSQL' returned data that does
not match expected data length for
column '[MSDASQL].let_nr'. The
(maximum) expected data length is 40,
while the returned data length is 0.
How can I make the SELECT * syntax work?
This problem happens if you are querying a MySQL linked server and the table you query has a column of datatype char(), which is fixed length, rather than varchar(). The error occurs when a fixed-length field holds a string shorter than the maximum length that SQL Server expects to get from the ODBC driver.
To fix this, go to your MySQL server and change the datatype to varchar(), leaving the length as it is. For example, change char(10) to varchar(10).
Executing the following command before queries seems to help:
DBCC TRACEON(8765)
The error messages go away and queries seem to be working fine.
I'm not sure what it does though; I found it here: http://bugs.mysql.com/bug.php?id=46857
Strangely, SQL Server becomes unstable, stops responding to queries and finally crashes with scary-looking dumps in the logs a few minutes after several queries to the MySQL server. I am not sure if this has anything to do with the DBCC command, so I'm still interested in other possible solutions to this problem.
Since I can't modify the MySQL database structure, what I did to fix this was to create a view with a cast, e.g. CAST(call_history.calltype AS CHAR(8)) AS Calltype,
and select from my view in MSSQL through the linked server.
The reason is that some unusual types (in my case the MySQL enum) don't work well with the linked server.
I found this:
"The problem is that one of the fields being returned is a blank or NULL CHAR field. To resolve this in the Mysql ODBC settings select the option "Pad CHAR to Full Length""
Look at the last post here
An alternative would be to use the trim() function in your SELECT statement within OPENQUERY. The downside is that you have to list each field individually, but what I did was create a view that calls OPENQUERY and then perform select * on the view.
Not ideal, but better than changing data types on tables!
Here is a crappy solution I came up with, because I am unable to change the datatype to varchar: the DB admin for the MySQL server is afraid it will cause issues with his scripts.
In my MySQL select query I run a case statement that checks the character length of the string and adds a filler character in front of the string, "filling it up" to the max (in my case it's a char(6)). Then in the select statement around the OPENQUERY I strip the character back off.
Select replace(gradeid,'0','') as gradeid from openquery(LINKEDTOMYSQL, '
SELECT case when char_length(gradeid) = 0 then concat("000000", gradeID)
when char_length(gradeID) = 1 then concat("00000", gradeID)
when char_length(gradeID) = 2 then concat("0000", gradeID)
when char_length(gradeID) = 3 then concat("000", gradeID)
when char_length(gradeID) = 4 then concat("00", gradeID)
when char_length(gradeID) = 5 then concat("0", gradeID)
else gradeid end as gradeid
FROM sometableofmine')
It works, but it is probably slower.
Maybe you can write a MySQL function that does the same logic, or come up with a more elegant solution.
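One thing to watch with the pad-and-strip approach is that removing every filler character also removes fillers that belong to the data. The Python sketch below models the logic only (it is not SQL, and the helper names are made up) to show the difference between stripping all fillers, as REPLACE does, and stripping only the leading ones.

```python
def pad_left(value, width=6, filler="0"):
    """Pad value on the left with filler up to width, like the CASE expression."""
    return value.rjust(width, filler)

def strip_all(value, filler="0"):
    """Remove every filler character, like REPLACE(gradeid, '0', '')."""
    return value.replace(filler, "")

def strip_leading(value, filler="0"):
    """Remove only the filler characters at the front of the string."""
    return value.lstrip(filler)

padded = pad_left("A0")                 # a hypothetical gradeid that contains a zero
after_replace = strip_all(padded)       # the zero that was part of the data is lost
after_lstrip = strip_leading(padded)    # the original value is recovered
```

Even stripping only leading characters loses data when a real value begins with the filler, so a filler character that can never occur in the column is the safer choice.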
I had a similar problem myself, which I resolved by wrapping the column names in backticks (`).
Instead of...
column_name
...use...
`column_name`
Doing this helps the MySQL query engine should the column name clash with a keyword or reserved word.*
Instead of using SELECT * FROM TABLE_NAME, try to use all column names with quotes:
SELECT `column1`, `column2`, ... FROM TABLE_NAME
Example for normal datatype columns
SELECT * FROM OPENQUERY(MYSQL, 'SELECT `column1`, `column2`,...,`columnN` FROM mytable')
Example for ENUM datatype columns
SELECT * FROM OPENQUERY(MYSQL, 'SELECT `column1`, trim(`column2`) `column2`, `column3`,...,`columnN` FROM mytable')
*For those used to SQL Server, it is the MySQL equivalent of wrapping a name in square brackets, [ and ].
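Listing and quoting every column by hand gets tedious for wide tables, so the inner query can be generated. A minimal sketch in Python follows; `quote_identifier` and `select_columns` are hypothetical helpers, and the column names are examples only. Doubling an embedded backtick is the MySQL rule for escaping it inside a quoted identifier.

```python
def quote_identifier(name):
    """Wrap a MySQL identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"

def select_columns(table, columns):
    """Build a SELECT that lists every column explicitly instead of using *."""
    column_list = ", ".join(quote_identifier(col) for col in columns)
    return f"SELECT {column_list} FROM {quote_identifier(table)}"

# "order" is a reserved word, so it only works when quoted.
inner_query = select_columns("mytable", ["nr", "order", "let_nr"])
```

The generated string is what would go inside the OPENQUERY call as the pass-through query.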
I use RMySQL to import a database. Sometimes when I try to close the connection, I receive the following error:
Error in mysqlCloseConnection(conn, ...) :
connection has pending rows (close open results set first)
I have found no way of correcting this other than restarting the computer. Is there anything I can do to solve this? Thanks!
We can use the method dbClearResult.
Example:
dbClearResult(dbListResults(conn)[[1]])
As Multiplexer noted, you are probably doing it wrong by leaving parts of the result set behind.
DBI and the accessor packages like RMySQL have documentation that is a little challenging at times. I try to remind myself to use dbGetQuery() which grabs the whole result set at once. Here is a short snippet from the CRANberries code:
sql <- paste("select count(*) from packages ",
"where package='", curPkg, "' ",
"and version='", curVer, "';", sep="")
nb <- dbGetQuery(dbcon, sql)
After this I can close without worries (or do other operations).
As explained in previous answers, you get this error because RMySQL didn't return all the results of the query.
I had this problem when the results were over 500 rows, using:
my_result <- fetch( dbSendQuery(con, query))
Looking at the documentation for fetch, I found that you can specify the number of records retrieved:
n = maximum number of records to retrieve per fetch. Use n = -1 or n = Inf to retrieve all pending records.
Solutions:
1- Set the number of records to infinity:
my_result <- fetch( dbSendQuery(con, query), n=Inf)
2- Use dbGetQuery:
my_result <- dbGetQuery(con, query)
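The same distinction between a bounded fetch and a full fetch exists in most database APIs. As a rough analogy only, here it is in Python with the standard-library sqlite3 module; the table name and row counts are invented for the illustration, and sqlite3 is more forgiving about pending rows than RMySQL is.

```python
import sqlite3

# Hypothetical table with 700 rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (id INTEGER)")
conn.executemany("INSERT INTO packages VALUES (?)", [(i,) for i in range(700)])

cursor = conn.execute("SELECT id FROM packages")
first_batch = cursor.fetchmany(500)   # bounded fetch: 200 rows are still pending
remaining = cursor.fetchall()         # drains everything that is left
cursor.close()                        # release the result set before closing
conn.close()
```

Stopping after the first batch is the situation that leaves rows pending; fetching everything, or clearing the result explicitly, avoids it.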
rs <- dbSendQuery(dbcon, sql)
data<-dbFetch(rs)
dbClearResult(rs)
The last line removed the following error when I continued querying:
Error in .local(conn, statement, ...) :
connection with pending rows, close resultSet before continuing
You need to close the resultset before closing the connection.
If you try to close the connection before closing a result set that has pending rows, it can sometimes hang the machine.
I don't know much about RMySQL, but try closing the result set first.
You have to keep track of the result set yourself. The example below shows how to close/clear a result and how to get the number of rows affected. To solve your problem, apply the last line of the code to the variable that holds the result of any statement or query you have sent. :)
statementRes <- DBI::dbSendStatement(conn = db,
"CREATE TABLE IF NOT EXISTS great_dupa_test (
taxonomy_id INTEGER NOT NULL,
scientific_name TEXT);")
DBI::dbGetRowsAffected(statementRes)
DBI::dbClearResult(statementRes)
This worked for me in R:
Just run dbConnect() again. I simply reconnected to the database.
This procedure works from the MySQL commandline both remotely and on localhost and it works when called from PHP. In all cases the grants are adequate:
CREATE PROCEDURE `myDB`.`lee_expout` (IN e int, IN g int)
BEGIN
select lm.groupname, lee.location, starttime, dark,
inadist,smldist,lardist,emptydur,inadur,smldur,lardur,emptyct,entct,inact,smlct,larct
from lee join leegroup_map lm using (location)
where exp_id= e and std_interval!=0 and groupset_id= g
order by starttime,groupname,location;
END
I'm trying to call it from R:
library(DBI)
library(RMySQL)
db <- dbConnect(MySQL(), user="user", password="pswd",
dbname="myDB", host="the.host.com")
#args to pass to the procedure
exp_id<-16
group_id<-2
#the procedure call
p <- paste('CALL lee_expout(', exp_id, ',', group_id,')', sep= ' ')
#the bare query
q <- paste('select lm.groupname, lee.location, starttime, dark,
inadist,smldist,lardist,emptydur,inadur,smldur,lardur,emptyct,entct,inact,smlct,larct
from lee join leegroup_map lm using (location)
where exp_id=',
exp_id,
' and std_interval!=0 and groupset_id=',
group_id,
'order by starttime,groupname,location', sep=' ')
rs_p <- dbSendQuery(db, statement=p) #run procedure and fail
p_data<-fetch(rs_p,n=30)
rs_q <- dbSendQuery(db, statement=q) #or comment out p, run query and succeed
q_data<-fetch(rs_q,n=30)
The bare query runs fine. The procedure call fails with
RApache Warning/Error!!!Error in
mysqlExecStatement(conn, statement,
...) : RS-DBI driver: (could not
run statement: PROCEDURE
myDB.lee_expout can't return a
result set in the given context)
The MySQL docs say
For statements that can be determined
only at runtime to return a result
set, a PROCEDURE %s can't return a
result set in the given context error
occurs.
One would think that if a procedure were going to throw that error, it would be thrown under all circumstances instead of just from R.
Any thoughts on how to fix this?
As far as I know, calling SQL procedures from R (dbCallProc) is not yet formally implemented (see the reference manual of 24 July 2010: http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf).
RMySQL is being migrated from S3 to S4 programming style and is still under development (version 0.7 being the current one). I suggest you ask the same question on the database mailing list for R:
https://stat.ethz.ch/mailman/listinfo/r-sig-db
If it is possible, they'll show you how. If it isn't, they'll tell you why.
Try adding:
client.flag=CLIENT_MULTI_STATEMENTS
to your connection parameters. It may help.
There are some details about this in the RMySQL PDF.
Don't know about R, but this
p <- paste('CALL lee_expout(', exp_id, ',', group_id,')', sep= ' ')
does look a bit ugly, ie like string concatenation. Maybe R's database driver takes that badly. In general, you can use placeholders for variables and pass the values on as separate arguments. Besides various security arguments, this also takes care of any type/apostrophe/whatever issues - maybe here, too?
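For illustration, this is what placeholders look like in Python's standard-library sqlite3 module. It is a sketch only: SQLite has no stored procedures, so a plain SELECT stands in for the CALL, the table contents are invented, and nothing here confirms that RMySQL accepts placeholders in the same way.

```python
import sqlite3

# Hypothetical stand-in table with a few of the columns from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lee (exp_id INTEGER, groupset_id INTEGER, location TEXT)")
conn.executemany(
    "INSERT INTO lee VALUES (?, ?, ?)",
    [(16, 2, "north"), (16, 3, "south"), (17, 2, "east")],
)

exp_id = 16
group_id = 2

# Values are passed separately from the SQL text; the driver handles quoting.
rows = conn.execute(
    "SELECT location FROM lee WHERE exp_id = ? AND groupset_id = ?",
    (exp_id, group_id),
).fetchall()
```

Because the SQL text never changes, spacing, type and apostrophe problems from string pasting cannot occur.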