I am researching how to read in data from a server directly to a data frame in R. In the past I have written SQL queries that were over 50 lines long (with all the selects and joins). Any advice on how to write long queries in R? Is there some way to write the query elsewhere in R, then paste it in to the "sqlQuery" part of the code?
Keep long SQL queries in .sql files and read them in using readLines + paste with collapse='\n':
my_query <- paste(readLines('your_query.sql'), collapse='\n')
results <- sqlQuery(con, my_query)
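To see what readLines + paste actually produce, here is a minimal base-R sketch. The temporary file and the three-line query are made up for illustration; they stand in for your_query.sql.

```r
# Write a small multi-line query to a temporary .sql file
# (a hypothetical stand-in for your_query.sql)
sql_file <- tempfile(fileext = ".sql")
writeLines(c("SELECT id, name",
             "FROM customers",
             "WHERE active = 1"), sql_file)

# readLines() returns one character element per line of the file
query_lines <- readLines(sql_file)

# paste(collapse = "\n") joins those elements into a single string
my_query <- paste(query_lines, collapse = "\n")

length(query_lines)  # 3 separate strings
length(my_query)     # 1 string, the form sqlQuery(con, my_query) expects
cat(my_query, "\n")
```

Passing query_lines directly would hand sqlQuery a vector of fragments, which is why the collapse step matters.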
You can paste any SQL query into R as is and then simply replace the newlines + spaces with a single space. For instance:
## Connect to DB
library(RODBC)
con <- odbcConnect('MY_DB')
## Paste the query as is (you can have as many spaces and lines as you want)
query <-
"
SELECT [Serial Number]
,[Series]
,[Customer Name]
,[Zip_Code]
FROM [dbo].[some_db]
where [Is current] = 'Yes' and
[Serial Number] LIKE '5%' and
[Series] = '20'
order by [Serial Number]
"
## Simply replace the newlines + spaces with a space and you're good to go
res <- sqlQuery(con, gsub("\\n\\s+", " ", query))
close(con)
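The effect of the gsub call can be checked without a database. The sketch below is base R only; the shortened query and the looser "\\s+" variant are my own illustration, not part of the answer above. Note that the original pattern only matches a newline followed by at least one whitespace character, so a newline followed directly by text is left alone.

```r
query <- "
  SELECT [Serial Number]
        ,[Series]
  FROM [dbo].[some_db]
  where [Series] = '20'
"

# Pattern from the answer: newline followed by one or more whitespace characters
flat1 <- gsub("\\n\\s+", " ", query)

# Looser variant (an assumption, not from the answer): collapse every run of
# whitespace to one space, then trim the ends
flat2 <- trimws(gsub("\\s+", " ", query))

# A newline followed directly by text is not matched by the original pattern
unindented <- gsub("\\n\\s+", " ", "SELECT 1\nFROM dual")

cat(flat2, "\n")
```

Both variants also rewrite whitespace inside quoted SQL literals, so they are only safe when the literals contain no significant line breaks or repeated spaces.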
Keeping queries in separate .sql files (which works for most SQL or NoSQL engines) can be a nuisance if you prefer to edit all your code in one file.
If you use RStudio (or another tool where code folding can be customized), you can keep things manageable by wrapping the query in brackets. I prefer using {...} and folding the code.
For example:
query <- {'
SELECT
user_id,
format(date,"%Y-%m") month,
product_group,
product,
sum(amount_net) income,
count(*) number
FROM People
WHERE
date > "2015-01-01" and
country = "Canada"
GROUP BY 1,2,3;'}
A query can even be folded inside a function call (folding a long argument), or in other situations where the code grows to an inconvenient size.
I had this issue trying to run a 17-line SQL query through RODBC and tried @arvi1000's solution, but no matter what I did it would produce an error message and not execute more than one line of the .sql file. I tried variations of the value for collapse and different ways of reading in the file, and spent 90 minutes trying to get it to work. I suspect RODBC might behave differently with multi-line queries on different platforms or with different versions of MySQL or ODBC settings.
Anyway, the following loop arrangement may not be as elegant but it works and is possibly more robust:
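library(RODBC)
channel <- odbcConnect("mysql_odbc", uid="username", pwd="password")
## The file holds one complete statement per line
sqlString <- readLines("your_query.sql")
for (i in seq_along(sqlString)) {
print(noquote(sqlString[i]))
## Pass each statement as a plain character string
sqlQuery(channel, sqlString[i])
}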
In my script, all lines except the last were doing joins, creating temporary tables etc.; only the last line had a SELECT statement and produced output. The .sql file was tidy, with only one query per line and no comments or newline characters within a query. The loop runs all the code, but the value returned by sqlQuery inside a for loop is neither assigned nor auto-printed, so the output is lost; the one SELECT statement needs to be repeated outside the loop (or its result assigned to a variable).
I'm trying to practise on BadStore. For those who don't know it, it's a fake online store site that can be run on a VM and offers a lot of security vulnerabilities.
One thing I'm trying to do is to apply SQL injection to the search query.
When searching for "book", for instance, we see this:
So I'm trying to show all the store items by searching for 1=1' --, which results in the query:
SELECT itemnum, sdesc, ldesc, price FROM itemdb WHERE '1=1' --' IN (itemnum,sdesc,ldesc)
However, this does not give the expected outcome; I get the following error:
Any suggestions?
You realize that -- in MySQL acts as a comment for the rest of the line?
If this is what you are trying to do, commenting out the rest of the line, then as per the MySQL documentation, you need a space after the --.
I understand you are trying out MySQL injection, so try to type your query, and then after the query type ; -- Notice that there IS a trailing space.
TL;DR
Change
'1=1' --' IN
TO
'1=1' -- ' IN
I am currently in the process of converting a large amount of ASP classic/VBscript pages from an old database (Unify Dataserver) to MySQL.
Say you have a query like this:
sql = "SELECT c.container_type, c_amount, c_sdate, c_edate, csrt " & _
"FROM containers c, container_chars cc"
objRS.Open sql, objConn, 3, 1
If I want to reference the column "c_edate", I can simply use this and it works fine:
x = objRS("c_edate")
However, when it comes to referencing a column like "c.container_type" (with a . used to differentiate it from another table), like so:
x = objRS("c.container_type")
It will say
ADODB.Recordset error '800a0cc1'
Item cannot be found in the collection corresponding to the requested name or ordinal.
I can fix it by using a number instead:
objRS(0)
This was never an issue until we switched to MySQL. In our old database, using the rs(table.column_name) format worked just fine. But in MySQL, once you add a (.) to the code, it can't find that item unless you switch it to a number.
As you can imagine, this is quite a pain as I go through the 700+ pages of this website manually counting the placement of each column in the corresponding select statement every time something from the query is referenced.
Does anyone know why this is happening or how to make the rs(table.column_name) format work with MySQL like it does with our old database?
In SQL Server, and apparently in MySQL too, the way to reference a field in the result set is to just use the name, without the prefix.
x = objRS("container_type")
The prefix is needed by the database to differentiate between identically-named columns, but once you send the results to a recordset, that recordset doesn't know or care where the columns came from.
The same goes for aliases:
SQL = "SELECT c.container_type AS ctype, [...]"
...
x = objRS("ctype")
Combining these two facts, if you do have identically-named columns in the result set, you must alias at least one of them. If you don't, it won't necessarily give an error, but you will not be able to reference the second column using the rs("name") syntax.
SQL = "SELECT c1.container_type, c2.container_type AS c_type2, ..."
...
x = objRS("container_type")
y = objRS("c_type2")
[Note that while you're at it, you probably should also modify your FROM clauses to use proper FROM table1 INNER JOIN table2 ON table1.fieldA = table2.fieldB type syntax. The FROM table1, table2 WHERE table1.fieldA = table2.fieldB syntax has been deprecated for many years now.]
I'm using the Wordnet SQL database from here: http://wnsqlbuilder.sourceforge.net
It's all built fine and users with appropriate privileges have been set.
I'm trying to find synonyms of words and have tried to use the two example statements at the bottom of this page: http://wnsqlbuilder.sourceforge.net/sql-links.html
SELECT synsetid,dest.lemma,SUBSTRING(src.definition FROM 1 FOR 60) FROM wordsXsensesXsynsets AS src INNER JOIN wordsXsensesXsynsets AS dest USING(synsetid) WHERE src.lemma = 'option' AND dest.lemma <> 'option'
SELECT synsetid,lemma,SUBSTRING(definition FROM 1 FOR 60) FROM wordsXsensesXsynsets WHERE synsetid IN ( SELECT synsetid FROM wordsXsensesXsynsets WHERE lemma = 'option') AND lemma <> 'option' ORDER BY synsetid
However, they never complete, at least not in any reasonable amount of time, and I have had to cancel all of the queries. All other queries seem to work fine, and when I break up the second SQL example, I can get the individual parts to work and complete in reasonable times (about 0.40 seconds).
When I try and run the full statement however, the MySQL command line client just hangs.
Is there a problem with this syntax? What is causing it to take so long?
EDIT:
Output of "EXPLAIN SELECT ..."
Output of "EXPLAIN EXTENDED ...; SHOW WARNINGS;"
I did more digging into the various statements used and found that the problem was the IN clause.
MySQL repeats the inner SELECT for every single row of the table. This is the cause of the hang, as it had to run through hundreds of thousands of records.
My remedy was to split the command into two separate database calls: first getting the synsets, and then dynamically creating a bound SQL string to look for the words in those synsets.
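The answer does not say which language was used for the two calls, so the following is only a sketch of the second step in R. The synset ids are hypothetical values standing in for the result of the first query, and the "?" placeholder style is an assumption that depends on the database driver.

```r
# Hypothetical synset ids, standing in for the result of the first call:
#   SELECT synsetid FROM wordsXsensesXsynsets WHERE lemma = 'option'
synset_ids <- c(100123L, 100456L, 100789L)

# One "?" placeholder per id, so values are bound rather than pasted into the SQL
placeholders <- paste(rep("?", length(synset_ids)), collapse = ", ")

second_query <- paste0(
  "SELECT synsetid, lemma, SUBSTRING(definition FROM 1 FOR 60) ",
  "FROM wordsXsensesXsynsets ",
  "WHERE synsetid IN (", placeholders, ") AND lemma <> ? ",
  "ORDER BY synsetid"
)

# Values to bind, in the order the placeholders appear in the statement
params <- c(as.list(synset_ids), list("option"))

cat(second_query, "\n")
```

With a DBI driver that supports "?" placeholders, the statement and values could then be passed as dbGetQuery(con, second_query, params = params). An empty id vector would produce IN (), which is not valid SQL, so that case needs a guard before the second call.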
I am using VBScript to query a MySQL database that contains only English characters. The query is basically: SELECT * FROM table_name PROCEDURE ANALYSE(1,1)
When I run the query directly on the DB it returns the expected results. However, when the query is run through VBScript it returns gibberish (Chinese?). I know for a fact the DB only contains English as I am the one who built it. I've run numerous other queries against the same table and haven't had any problems. It's only when I run the PROCEDURE ANALYSE query that it returns something unexpected.
The VBScript code is as follows:
strSQLcommand = "SELECT * FROM " & strTempTableName & " PROCEDURE ANALYSE(1,1)"
otRecordset.Open strSQLcommand,Connection
If Not otRecordset.EOF Then
otRecordset.movefirst
Do While NOT otRecordset.EOF
wscript.echo otRecordset(0).value
wscript.echo otRecordset(1).value
otRecordset.Movenext
Loop
End If
I've never had a problem with returning values from any other table in this DB. I've run this query numerous times and always get the same results which has me perplexed.
Any thoughts or ideas are greatly appreciated!
Ok, so, it turns out it has nothing to do with the DB per se.
I decided to start checking the data types that were being returned using the VBScript VarType() function. The fields that were returning "gibberish/Chinese" had a data type of 8209. Basically, a byte array. Sprinkling a little Google fairy dust led me to this function:
Function C8209toStr(body8209)
If VarType(body8209) = 8209 Then
Dim i
ReDim aOut(UBound(body8209))
For i = 1 to UBound(body8209) + 1
aOut(i-1) = chr(ascb(midb(body8209,i,1)))
Next
C8209toStr = Join(aOut, "")
End If
End Function
Hope that helps whoever else comes along!
I use RMySQL to import data from a database. Sometimes, when I try to close the connection, I receive the following error:
Error in mysqlCloseConnection(conn, ...) :
connection has pending rows (close open results set first)
I have no way of correcting this other than restarting the computer. Is there anything I can do to solve this? Thanks!
We can use the method dbClearResult.
Example:
dbClearResult(dbListResults(conn)[[1]])
As Multiplexer noted, you are probably doing it wrong by leaving parts of the result set behind.
DBI and the accessor packages like RMySQL have documentation that is a little challenging at times. I try to remind myself to use dbGetQuery() which grabs the whole result set at once. Here is a short snippet from the CRANberries code:
sql <- paste("select count(*) from packages ",
"where package='", curPkg, "' ",
"and version='", curVer, "';", sep="")
nb <- dbGetQuery(dbcon, sql)
After this I can close without worries (or do other operations).
As explained in previous answers, you get this error because RMySQL didn't return all the results of the query.
I had this problem when the results were over 500 rows, using:
my_result <- fetch( dbSendQuery(con, query))
Looking at the documentation for fetch, I found that you can specify the number of records retrieved:
n = maximum number of records to retrieve per fetch. Use n = -1 or n = Inf to retrieve all pending records.
Solutions:
1- Set the number of records to infinity:
my_result <- fetch( dbSendQuery(con, query), n=Inf)
2- Use dbGetQuery:
my_result <- dbGetQuery(con, query)
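The difference between a partial fetch and a complete one can be reproduced without a MySQL server. The sketch below assumes the DBI and RSQLite packages are installed and uses an in-memory SQLite database as a stand-in for the RMySQL connection; the table name and row count are made up for illustration.

```r
library(DBI)

# In-memory SQLite database as a stand-in for the MySQL connection (assumption)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "numbers", data.frame(x = 1:1000))

res <- dbSendQuery(con, "SELECT x FROM numbers")
first_chunk <- dbFetch(res, n = 500)   # only part of the result set
pending <- !dbHasCompleted(res)        # TRUE: rows are still waiting
rest <- dbFetch(res, n = -1)           # n = -1 retrieves all remaining rows
dbClearResult(res)                     # release the result set before disconnecting

# dbGetQuery() sends the query, fetches everything and clears the result in one call
all_rows <- dbGetQuery(con, "SELECT x FROM numbers")
dbDisconnect(con)
```

Stopping after first_chunk and disconnecting is the situation that leaves a result set with pending rows; either fetching the rest or calling dbClearResult(res) avoids it.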
rs <- dbSendQuery(dbcon, sql)
data <- dbFetch(rs)
dbClearResult(rs)
The last line removed the following error when continuing to query:
Error in .local(conn, statement, ...) :
connection with pending rows, close resultSet before continuing
You need to close the resultset before closing the connection.
If you try to close the connection before closing a resultset that still has pending rows, it can sometimes cause the machine to hang.
I don't know much about RMySQL, but try to close the resultset first.
You have to keep track of the result set yourself. The example below shows how to clear a result and how to get the number of rows affected. To solve your problem, call the last line of the code on the variable that holds the result of any statement or query you have sent. :)
statementRes <- DBI::dbSendStatement(conn = db,
"CREATE TABLE IF NOT EXISTS great_dupa_test (
taxonomy_id INTEGER NOT NULL,
scientific_name TEXT);")
DBI::dbGetRowsAffected(statementRes)
DBI::dbClearResult(statementRes)
This worked for me in R: just run dbConnect() again. I simply reconnected to the database.