I have a table (MyTable) in SQL Server with 4 columns: col1, col2, col3, col4.
I want to upload (bulk insert) a data frame into "MyTable" using the RODBC library in R.
I also need to upload this data to MySQL using the RMySQL library.
Please see my example code below.
library(RODBC)
uploaddbconnection=odbcDriverConnect('driver={SQL Server};server=localhost;database=StudentsDB;uid=sa;pwd=sa123;')
outputframe=data.frame(col1=name,col2=age,col3=TotalMarks)
sqlSave(uploaddbconnection, outputframe, tablename ="MyTable",rownames=FALSE, append = TRUE)
But the above code returns an error:
> *Error in sqlSave(uploaddbconnection, outputframe, tablename = TableName, : unable to append to table ‘MyTable’*
Can anyone help me with this? Thanks in advance.
Related
I am trying to convert Oracle table data into JSON files. I have three databases; the code below produces JSON output in one of them, but the other two throw ORA-00907: missing right parenthesis.
The syntax must be correct, since it produced output in one database, so I don't understand what is going wrong.
How do I find out which version of Oracle is installed on those databases? And if they are 12.2 or above, is there a way to fix this issue? All I want is to convert the output of a SELECT statement to a JSON file. Thanks in advance.
Code:
SELECT JSON_OBJECT ( 'empid' value eid , 'name' value ename , 'add' value eaddr )
FROM abc.emp
JSON_OBJECT is available from Oracle version 12.2 onwards.
Run the query SELECT * FROM v$version to check your Oracle version.
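Once you have the banner text back from v$version, the 12.2 check can be automated. Below is a minimal Python sketch, assuming the banner contains a "Release X.Y" fragment; the two banner strings are illustrative samples rather than output from a real server, and the function names are made up for this example.

```python
import re

def parse_oracle_release(banner):
    """Extract (major, minor) from a v$version banner line, or None if absent."""
    match = re.search(r"Release (\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def supports_json_object(banner):
    """True when the release found in the banner is 12.2 or later."""
    release = parse_oracle_release(banner)
    return release is not None and release >= (12, 2)

# Illustrative banner strings (assumed format), not output from a real server.
old_banner = "Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production"
new_banner = "Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production"

print(supports_json_object(old_banner))  # False
print(supports_json_object(new_banner))  # True
```

Comparing tuples rather than strings avoids surprises such as "9.2" sorting after "12.2".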
I am trying to use dbWriteTable to write from R to MySQL. I created an ODBC connection to MySQL, so I can connect with just this command:
abc <- DBI::dbConnect(odbc::odbc(),
dsn = "mysql_conn")
With it I can see all the schemas in the MySQL instance. This is great when I want to read in data, such as:
test_query <- dbSendQuery(abc, "SELECT * FROM test_schema.test_file")
test_query <- dbFetch(test_query)
The problem comes when I want to create a new table in one of the schemas: how do I declare the schema I want to write to in
dbWriteTable(abc, value = new_file, name = "new_file", overwrite=T)
I imagine I have to specify test_schema somewhere in the dbWriteTable call, but I haven't been able to get it to work. Thoughts?
I want to use Spark to process some data from a JDBC source. But to begin with, instead of reading original tables from JDBC, I want to run some queries on the JDBC side to filter columns and join tables, and load the query result as a table in Spark SQL.
The following syntax to load raw JDBC table works for me:
df_table1 = sqlContext.read.format('jdbc').options(
url="jdbc:mysql://foo.com:3306",
dbtable="mydb.table1",
user="me",
password="******",
driver="com.mysql.jdbc.Driver" # mysql JDBC driver 5.1.41
).load()
df_table1.show() # succeeded
According to Spark documentation (I'm using PySpark 1.6.3):
dbtable: The JDBC table that should be read. Note that anything that is valid
in a FROM clause of a SQL query can be used. For example, instead of a
full table you could also use a subquery in parentheses.
So just for experiment, I tried something simple like this:
df_table1 = sqlContext.read.format('jdbc').options(
url="jdbc:mysql://foo.com:3306",
dbtable="(SELECT * FROM mydb.table1) AS table1",
user="me",
password="******",
driver="com.mysql.jdbc.Driver"
).load() # failed
It threw the following exception:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'table1 WHERE 1=0' at line 1
I also tried a few other variations of the syntax (add / remove parentheses, remove 'as' clause, switch case, etc) without any luck. So what would be the correct syntax? Where can I find more detailed documentation for the syntax? Besides, where does this weird "WHERE 1=0" in error message come from? Thanks!
To read data from a JDBC source using a SQL query in Spark SQL, you can try something like this:
val df_table1 = sqlContext.read.format("jdbc").options(Map(
("url" -> "jdbc:postgresql://localhost:5432/mydb"),
("dbtable" -> "(select * from table1) as table1"),
("user" -> "me"),
("password" -> "******"),
("driver" -> "org.postgresql.Driver"))
).load()
I tested this with PostgreSQL; you can adapt it for MySQL.
def load_employee_times(ip, username, password):
    # Cast timestamp/duration columns to strings inside the subquery; the alias is required.
    table = "(SELECT id, person, manager, CAST(tdate AS CHAR) AS tdate, CAST(start AS CHAR) AS start, CAST(end AS CHAR) as end, CAST(duration AS CHAR) AS duration FROM EmployeeTimes) AS EmployeeTimes"
    spark = get_spark_session()  # helper not shown in the original post
    df = spark.read.format("jdbc"). \
        options(url=ip,
                driver='com.mysql.jdbc.Driver',
                dbtable=table,
                user=username,
                password=password).load()
    return df
I had heaps of trouble with Spark JDBC incompatibility with MySQL timestamps. The trick is to convert all your timestamp or duration values to strings before JDBC touches them. Simply cast your values as strings and it will work.
Note: You will also have to use AS to give the query an alias for it to work.
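If there are many columns to cast, the dbtable string can be generated instead of typed by hand. This is only a sketch: build_dbtable is a hypothetical helper (not part of Spark), and it does no identifier escaping, so it should only be fed trusted table and column names.

```python
def build_dbtable(table, plain_columns, cast_columns):
    """Return a parenthesised, aliased subquery for the JDBC dbtable option.

    Columns listed in cast_columns are wrapped in CAST(... AS CHAR) and keep
    their original name; the subquery is aliased with the table name.
    """
    select_items = list(plain_columns)
    select_items += ["CAST({0} AS CHAR) AS {0}".format(c) for c in cast_columns]
    return "(SELECT {0} FROM {1}) AS {1}".format(", ".join(select_items), table)

# Column names taken from the EmployeeTimes example above.
dbtable = build_dbtable(
    "EmployeeTimes",
    plain_columns=["id", "person", "manager"],
    cast_columns=["tdate", "start", "end", "duration"],
)
print(dbtable)
```

The resulting string can be passed as dbtable=... in the options call, exactly like the hand-written one.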
With Spark 2.2 on Python, connecting to MySQL 5.7.19, I'm able to run the following when I use table="(SELECT * FROM a_table) AS my_table".
from pyspark.sql import SparkSession
my_spark = SparkSession \
.builder.appName("myApp") \
.config("jars", "/usr/local/spark-2.2.2-bin-hadoop2.7/jars/mysql-connector-java-5.1.45-bin.jar") \
.getOrCreate()
my_df = my_spark.read.jdbc(
url="jdbc:mysql://my_host:3306/my_db",
table="(SELECT * FROM a_table) AS my_table",
properties={'user': 'my_username', 'password': 'my_password'}
)
my_df.head(20)
I think it may be a bug in Spark SQL.
It seems that either this or this line gives you the error. Both use a Scala string interpolation to replace table with dbtable.
s"SELECT * FROM $table WHERE 1=0"
That's where you can find table1 WHERE 1=0 from the error you've faced since the above pattern would become:
SELECT * FROM (select * from table1) as table1 WHERE 1=0
which looks incorrect.
There is indeed a MySQL-specific dialect - MySQLDialect - that overrides getTableExistsQuery with its own:
override def getTableExistsQuery(table: String): String = {
s"SELECT 1 FROM $table LIMIT 1"
}
so my bet is that the other method, getSchemaQuery, is the source of the error. That's highly unlikely though, given that you use Spark 1.6.3 while the method has a @Since("2.1.0") marker.
I'd highly recommend checking the MySQL database logs to see which query is executed that leads to the error message.
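To see how the "WHERE 1=0" probe interacts with different dbtable values, here is a small Python sketch. It only mimics the string pattern quoted above and runs the result against an in-memory SQLite database, which is a stand-in and not MySQL; schema_probe and probe_works are names made up for this example, and MySQL may accept or reject things differently.

```python
import sqlite3

def schema_probe(dbtable):
    """Mimic the zero-row query pattern quoted above."""
    return "SELECT * FROM {} WHERE 1=0".format(dbtable)

# SQLite stands in for the real database here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")

def probe_works(dbtable):
    """Return True if the database accepts the generated probe query."""
    try:
        conn.execute(schema_probe(dbtable)).fetchall()
        return True
    except sqlite3.OperationalError:
        return False

candidates = [
    "table1",
    "(SELECT * FROM table1) AS table1",
    "SELECT * FROM table1",
]
for dbtable in candidates:
    print(schema_probe(dbtable), "->", probe_works(dbtable))
```

In SQLite the plain table name and the parenthesised, aliased subquery are both accepted, while the bare SELECT is rejected because it ends up directly after FROM. Printing the generated probe like this is a cheap way to see exactly what text the server would receive.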
I have been trying to write an R script to query an Impala database. Here is the query to the database:
select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA
When I run this query manually (read: outside the Rscript via impala-shell), I am able to get the table contents. However, when the same is tried via the R script, I get the following error:
[1] "HY000 140 [Cloudera][ImpalaODBC] (140) Unsupported query."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA'"
closing unused RODBC handle 1
Why does the query fail when run via R, and how do I fix it? Thanks in advance :)
Edit 1:
The connection script looks as below:
library("RODBC");
connection <- odbcConnect("Impala");
query <- "select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from databaseB.tableB ) group by columnA order by columnA";
data <- sqlQuery(connection,query);
You need to install the relevant drivers; please look at the following link.
I had the same issue; all I had to do was update the ODBC drivers.
Also, see if you can update your odbcConnect call to include the username and password, changing
connection <- odbcConnect("Impala");
to
connection <- odbcConnect("Impala", uid="root", pwd="password")
The RODBC package is quirky: if there's no row updated/deleted in the query execution it will throw an error.
So before using sqlDelete to delete rows, or using sqlUpdate to update values, first check if there's at least one row that will be deleted/updated by querying COUNT(*).
I've had no problem after implementing the check, for Oracle SQL 12g.
An alternative would be to use a staging table for the new batch of data, and use sqlQuery to execute a MERGE command. RODBC won't complain if zero rows are merged.
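The count-first check is not specific to RODBC. Here is a minimal Python sketch of the same guard, using an in-memory SQLite database as a stand-in; the students table and the delete_if_any helper are hypothetical and only serve to illustrate the idea.

```python
import sqlite3

def delete_if_any(conn, table, where_clause, params=()):
    """Delete matching rows only when COUNT(*) reports at least one; return rows deleted."""
    count_sql = "SELECT COUNT(*) FROM {} WHERE {}".format(table, where_clause)
    (matches,) = conn.execute(count_sql, params).fetchone()
    if matches == 0:
        return 0  # nothing matches, so the DELETE is never issued
    delete_sql = "DELETE FROM {} WHERE {}".format(table, where_clause)
    cursor = conn.execute(delete_sql, params)
    return cursor.rowcount

# Hypothetical table and rows, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [("Ann", 20), ("Bob", 25)])

print(delete_if_any(conn, "students", "age > ?", (30,)))  # 0
print(delete_if_any(conn, "students", "age > ?", (22,)))  # 1
```

The table name and WHERE clause are pasted into the SQL text, so they must come from trusted code; only the values go through bound parameters.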
This might also be due to an error in your SQL query itself. For example, I got this error when I missed an 'in' in the following generalized statement:
stringstuff <- someDT$columnyouwanttouse
somestring <- toString(sprintf("'%s'", stringstuff))
RESULTS <- sqlQuery(con, paste0("select
fling as flam
and toot in (",somestring,")
limit 30
;"))
I got the error you did when I left out the 'in', so double check your syntax.
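One way to make this kind of slip less likely is to let the driver bind the values instead of pasting a quoted list into the query text. Note this is a different technique from the R snippet above, shown in Python against an in-memory SQLite database as a stand-in; select_in and the things table are hypothetical names for illustration.

```python
import sqlite3

def select_in(conn, table, column, values, limit=30):
    """Run SELECT * FROM table WHERE column IN (...) with one bound parameter per value."""
    values = list(values)
    if not values:
        return []  # an empty IN () list is not valid SQL in many databases
    placeholders = ", ".join("?" for _ in values)
    sql = "SELECT * FROM {} WHERE {} IN ({}) LIMIT {}".format(
        table, column, placeholders, int(limit)
    )
    return conn.execute(sql, values).fetchall()

# Hypothetical table and data, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (fling TEXT, toot TEXT)")
conn.executemany(
    "INSERT INTO things VALUES (?, ?)",
    [("a", "x"), ("b", "y"), ("c", "z")],
)

rows = select_in(conn, "things", "toot", ["x", "z"])
print(sorted(rows))
```

Because the values never become part of the SQL text, quotes inside them cannot break the statement either.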
This error message can arise if the table doesn't exist in the database.
A few sensible checks:
Check for typos in the table name in your query
See if you can run the same query on the same database via another sql client
Talk to your database administrator to confirm that the table does exist
Re-installing the RODBC package did the trick for me!
I had a similar problem. After uninstalling R version 4.2.1 and installing R version 4.1.3, the problem was solved.
I'm trying to connect to my Postgres database and run a SELECT statement. The code is given below.
I get an error saying table1 is undefined, even though Table1 exists in the database. How can I get table1 recognized in my code?
code.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine1 = create_engine('postgresql://username:password@localhost/testdb')
Session1 = sessionmaker(bind=engine1)
session1 = Session1()
for each in session1.query(table1).all():
    print(each)
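Before you can query for table1 you need to define it and its mapping. The following should get you on the right track:
from sqlalchemy import MetaData, Table
metadata = MetaData()
# SQLAlchemy 1.4+ spelling; older releases used MetaData(bind=engine1) and autoload=True
table1 = Table('Table1', metadata, autoload_with=engine1)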
# now you can run the query
Reading the SQL Expression Language Tutorial is highly recommended.