Single Quote R for SQL Query - mysql

Hello, I have the following data that I want to paste into an SQL query through an R connection.
UKWinnersID<-c("1W167X6", "QM6VY8", "ZDNZX0", "8J49D8", "RGNSW9",
"BH7D3P1", "W31S84", "NTHDJ4", "H3UA1", "AH9N7",
"DF52B68", "K65C2", "VGT2Q0", "93LR6", "SJAJ0",
"WQBH47", "CP8PW9", "5H2TD5", "TFLKV4", "X42J1" )
The query / code in R is as follows:
UKSQL6<-data.frame(sqlQuery(myConn, paste("SELECT TOP 10000 [AxiomaDate]
,[RiskModelID] ,[AxiomaID],[Factor1],[Factor2],[Factor3],[Factor4],[Factor5]
,[Factor6],[Factor7],[Factor8],[Factor9],[Factor10],[Factor11],[Factor12]
,[Factor13],[Factor14],[Factor15]FROM [PortfolioAnalytics].[Data_Axioma].[SecurityExposures]
Where AxiomaDate IN (
SELECT MAX(AxiomaDate)
FROM [PortfolioAnalytics].[Data_Axioma].[FactorReturns]
GROUP BY MONTH(AxiomaDate), YEAR(AxiomaDate))
AND RiskModelID = 8
AND AxiomaID IN(",paste(UKWinnersID, collapse = ","),")")))
I am pasting UKWinnersID into the last line of the code above, but the IDs need to be formatted as ('1W167X6', 'QM6VY8', 'ZDNZX0', etc.), each wrapped in single quotes, which I just can't get to work.

Consider running a parameterized query using the RODBCext package (an extension of RODBC), assuming this is the API being used. Parameterized queries do more than insulate you from SQL injection: they separate data from code and avoid messy quote enclosure, string interpolation, and concatenation, which gives cleaner, more maintainable code.
The code below replaces your TOP 10000 with TOP 500 for each of the 20 ids:
library(RODBC)
library(RODBCext)
conn <- odbcConnect("DBName", uid="user", pwd="password")
ids_df <- data.frame(UKWinnersID = c("1W167X6", "QM6VY8", "ZDNZX0", "8J49D8", "RGNSW9",
"BH7D3P1", "W31S84", "NTHDJ4", "H3UA1", "AH9N7",
"DF52B68", "K65C2", "VGT2Q0", "93LR6", "SJAJ0",
"WQBH47", "CP8PW9", "5H2TD5", "TFLKV4", "X42J1"))
# SQL STATEMENT (NO DATA)
query <- "SELECT TOP 500 [AxiomaDate], [RiskModelID], [AxiomaID], [Factor1],[Factor2]
, [Factor3], [Factor4], [Factor5], [Factor6], [Factor7], [Factor8]
, [Factor9], [Factor10], [Factor11], [Factor12]
, [Factor13], [Factor14], [Factor15]
FROM [PortfolioAnalytics].[Data_Axioma].[SecurityExposures]
WHERE AxiomaDate IN (
SELECT MAX(AxiomaDate)
FROM [PortfolioAnalytics].[Data_Axioma].[FactorReturns]
GROUP BY MONTH(AxiomaDate), YEAR(AxiomaDate)
)
AND RiskModelID = 8
AND AxiomaID = ?"
# PASS DATAFRAME VALUES TO BIND TO QUERY PARAMETERS
UKSQL6 <- sqlExecute(conn, query, ids_df, fetch=TRUE)
odbcClose(conn)
Alternatively, if you really need to use the IN() clause:
# SQL STATEMENT (NO DATA)
query <- paste("SELECT TOP 10000
...same as above...
AND AxiomaID IN (", paste(rep("?", nrow(ids_df)), collapse=", "), ")")
# TRANSPOSE DATA FRAME FOR COLUMN EQUAL TO ? PLACEHOLDERS
UKSQL6 <- sqlExecute(conn, query, t(ids_df), fetch=TRUE)
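If you would rather keep the original string-pasting approach, the single-quoted list the question asks for can be built with paste0 before collapsing. This is a minimal sketch using only base R; the three IDs are a shortened copy of UKWinnersID, and doubling embedded single quotes is an assumption about how the target database escapes them (it is the standard SQL convention).

```r
# Shortened copy of the IDs from the question
UKWinnersID <- c("1W167X6", "QM6VY8", "ZDNZX0")

# Double any embedded single quote, then wrap each ID in single quotes
escaped_ids <- gsub("'", "''", UKWinnersID, fixed = TRUE)
quoted_ids <- paste0("'", escaped_ids, "'", collapse = ", ")

# Fragment that can replace the last line of the original paste() call
in_clause <- paste0("AND AxiomaID IN (", quoted_ids, ")")
cat(in_clause, "\n")
```

Unlike the parameterized version, this still mixes data into the SQL text, so it is only reasonable when the IDs come from a trusted source.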

Related

Double Quotes in temporary JSON variable on MySQL using R

I have a table in MySQL that contains user interactions with a web page. I need to extract the rows for users where the interaction date is earlier than a certain benchmark date, and that benchmark date is different for each customer (I extract that date from a different database).
My approach was to set a JSON variable in which the key is a user and the value is the benchmark date, and to use it in the query to extract the intended fields.
Example in R:
#MainDF contains the user and the benchmark date from a different database
json_str <- mapply(function(uid, bench_date){
paste0(
'{','"',uid,'"', ':', '"', bench_date, '"','}'
)
}, MainDF[, 'uid'],
MainDF[, 'date']
)
json_str <- paste0("'", '[', paste0(json_str , collapse = ','), ']', "'")
temp_var <- paste('set @test=', json_str)
The intention was for temp_var to look like:
set @test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
but it actually looks like:
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
then create the main query:
main_Q <- "select user_id, date
from interaction
where 1=1
and json_contains(json_keys(@test), concat('\"',user_id,'\"')) = 1
and date <= json_unquote(json_extract(@test,
concat('$.','\"',user_id, '\"')
)
)
"
For the execution, I first set the temporary variable and then execute the main query:
dbSendQuery(connection, temp_var)
resp <- dbSendQuery(connection, main_Q )
target_df <- fetch(resp, n=-1)
dbClearResult(resp )
When I test a fraction of it in a SQL IDE, it works. However, in R it doesn't return anything.
I think the issue is that R escapes the double quotes in temp_var and SQL ends up reading
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
which won't work.
For example if I execute:
set @test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
select json_keys(@test)
it will return an array with the keys, but that is not the case with
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
select json_keys(@test)
I am not sure how to solve the issue, but I need double quotes to specify the JSON. Is there any other approach that I should try or a way to make this work?
First, I think it is generally better to use a well-known library/package for converting to/from JSON, for several reasons.
This gives you a string that you should be able to place just about anywhere.
json_str <- jsonlite::toJSON(setNames(as.list(MainDF$date), MainDF$uid), auto_unbox=TRUE)
json_str
# {"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}
And while looking at the object on the R console will give the escaped-doublequotes,
as.character(json_str)
# [1] "{\"0001\":\"2010-05-05\",\"0012\":\"2015-05-05\",\"0101\":\"2018-07-20\"}"
that is merely R's representation (shows all strings within double-quotes, and therefore needs to escape any double-quotes within the string).
Adding it into some script should be straight-forward:
# q = FALSE forces plain ASCII quotes; the default may produce curly quotes in some locales
cat(paste('set @test=', sQuote(json_str, q = FALSE)), '\n')
# set @test= '{"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}'
I'm assuming that having each on its own row is not critical. If it is, and indentation is important, perhaps this is more your style:
spaces <- strrep(' ', 2+nchar('set @test = '))
cat(paste0('set @test = ', sQuote(gsub(",", paste0(",\n", spaces), json_str), q = FALSE)), '\n')
# set @test = '{"0001":"2010-05-05",
#               "0012":"2015-05-05",
#               "0101":"2018-07-20"}'
Data:
MainDF <- read.csv(stringsAsFactors=FALSE, colClasses='character', text='
uid,date
0001,2010-05-05
0012,2015-05-05
0101,2018-07-20')
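To confirm that the backslashes are only part of how R prints a string, and not part of the string itself, compare print() with cat() and inspect the characters. A small base-R sketch with a made-up one-key JSON string:

```r
json_txt <- '{"0001":"2010-05-05"}'

print(json_txt)       # console representation, with \" escapes
cat(json_txt, "\n")   # the raw characters that would be sent to the database

# The string itself holds no backslash characters at all
has_backslash <- grepl("\\", json_txt, fixed = TRUE)
n_chars <- nchar(json_txt)
```

If has_backslash is FALSE, the escapes seen on the console are not what the database receives, so the cause of an empty result has to be looked for elsewhere.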

r fetch data from mysql db loop

I successfully fetch data from my mysql db using r:
library(RMySQL)
mydb = dbConnect(MySQL(), user='user', password='pass', dbname='fib', host='myhost')
rs = dbSendQuery(mydb, 'SELECT distinct(DATE(date)) as date, open,close FROM stocksng WHERE symbol = "FIB7F";')
data <- fetch(rs, n=-1)
dbHasCompleted(rs)
So now I have an object that is a list:
> print (typeof(data))
[1] "list"
Each element is a tuple(?) like date (chars), open (long), close (long).
OK, now my problem: I want to get a vector of the percentage difference between the close (x) and the next day's open (x+1) until the end, but I can't access the items properly!
Example: ((open / close * 100) - 100)
I tried:
for (item in data){
print (item[2])
}
and all possible combinations like:
for (item in data){
print (item[][2])
}
but I cannot access the right element. Could anyone help?
You have a bigger problem than this in your MySQL query, because you did not specify an ORDER BY clause. Consider using the following query:
SELECT DISTINCT
DATE(date) AS date,
open,
close
FROM stocksng
WHERE
symbol = "FIB7F"
ORDER BY
date
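Here we order the result set by date, so that it makes sense to speak of the current and next open or close. Now, with a proper query in place, if you wanted to get the percentage difference between the current close and the next day's open, you could try:
library(dplyr)
# data is the data frame returned by fetch(); lead() shifts open up by one row
with(data, (lead(open, 1) / close * 100) - 100)
Or using base R:
# open[2:(n + 1)] is the next-day open; the final element is NA (no next day)
with(data, (open[2:(length(open) + 1)] / close * 100) - 100)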
Naive version:
# stop one row early: the last row has no next-day open
for (row in seq_len(max(nrow(data) - 1, 0))){
date <- unname (data[row,"date"])
open <- unname (data[row+1,"open"])
close <- unname (data[row,"close"])
var <- abs((close/open*100)-100)
print (var)
}
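The same calculation can be done without a loop by pairing each close with the following row's open. A minimal sketch in base R; the three-row data frame is invented purely for illustration and assumes the rows are already sorted by date.

```r
# Hypothetical prices, already ordered by date
data <- data.frame(
  date  = c("2017-01-02", "2017-01-03", "2017-01-04"),
  open  = c(100, 110, 121),
  close = c(105, 110, 120)
)

next_open  <- tail(data$open, -1)    # open of day x + 1
prev_close <- head(data$close, -1)   # close of day x

# Percentage change from each close to the next day's open
pct_diff <- (next_open / prev_close * 100) - 100
print(pct_diff)
```

The result has one element fewer than the number of rows, because the last day has no following open to compare against.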

How to stop large float numbers from being output in exponential form (scientific notation) in R?

I'm working with ID numbers such as "868130240945684480", which are stored in a MySQL database.
Whenever I make a query with RMySQL, the output is "8.681302e+17", and R then reads it as "868130200000000000", which is a totally different ID.
Is there any way to avoid this?
Here is the query I make to retrieve my data:
require(RMySQL)
con <- dbConnect(MySQL(), user='xxx', password='xxx', dbname='xxx', host='xxx')
rs <- dbSendQuery(con, "select sprinklr_twitter.Tweet_Id from sprinklr_twitter
inner join twitter
on sprinklr_twitter.Tweet_id=twitter.Tweet_id")
Tweet_id<-fetch(rs, n=-1)
Tweet_id<-as.data.frame(Tweet_id)
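One approach that is often suggested for this kind of problem is to have MySQL return the ID as text, so that R never has to hold it as a floating-point number. A sketch under that assumption: the query reuses the table and column names from the question, CAST(... AS CHAR) is standard MySQL, and the dbSendQuery call is left as a comment because it needs a live connection. The last lines illustrate why large IDs are risky as numbers: an R double carries 53 bits of integer precision.

```r
# Ask MySQL for the ID as a string instead of a number
query <- paste(
  "select cast(sprinklr_twitter.Tweet_Id as char) as Tweet_Id",
  "from sprinklr_twitter",
  "inner join twitter",
  "on sprinklr_twitter.Tweet_id = twitter.Tweet_id"
)
# rs <- dbSendQuery(con, query)   # requires the connection from the question

# Above 2^53, neighbouring integers collapse onto the same double
collides <- (2^53 + 1) == 2^53
print(collides)
```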

transforming / updating / typecasting list values from int & num to strings

I'm retrieving data from a MySQL database using an R script and then writing it to a CSV, but I'm having an issue where two of the columns of data that I want to write out as strings are being written out as integers and numbers (in this case, in scientific notation).
I would like to have these written out as string values instead, but I'm not finding this is a straightforward task, in spite of doing a fair bit of googling and experimentation.
The relevant code:
conn <- dbConnect(MySQL(), host = "127.0.0.1", user="REDACTED", password="REDACTED", dbname="REDACTED", port=8906)
type_data <- dbGetQuery(conn, paste("SELECT * FROM ", arg, " WHERE 1 LIMIT 10", sep=""))
# Problem: "Subscribed" and "TimeUpdated" are coming through as numbers instead of strings
write.csv(type_data, paste("./",arg,".csv", sep=""), row.names=F)
dbDisconnect(conn)
Desired results:
"Id","EntityId","EntityType","CommunicationType","Subscribed","TimeUpdated"
"0002INKRyUolIrjG5DbUa0lDqUjxt","4374484","PERSON","MFS","1","1385297883000000000"
"0004WaXpmvbOh3WG3hd6kQtPINibv","8361929","PERSON","MFS","1","1437798832740631885"
"0005l1fy1TJiFhyiEK2IXRCxfqee5","4197014","PERSON","SURVEYS_AND_POLLS","0","1146917239000000000"
"0008Qb2ra1PoSLgbumc2wmDfvexx8","4155704","PERSON","MFS","1","1345053223000000000"
"000C1IKgHrFaqmlHlKGGhigGyoaw4","4515071","PERSON","PARTNER","1","1215098959000000000"
"000Czw8Gv5w3eNoOmOFVTKLIuc2ti","4372360","PERSON","MFS","1","1384952236000000000"
"000DOsk9xlYKvs11PzZFRgmOpYfiA","4347384","PERSON","SURVEYS_AND_POLLS","1","1177513307000000000"
"000IQ4TKYHAbb334zFYdWVCZZfMYo","4470083","PERSON","PARTNER","1","1446945757133940400"
"000LbifV4rUa2MhxFlVZ52PSek5kG","499194","PERSON","SURVEYS_AND_POLLS","0","1097867573000000000"
Actual results:
"Id","EntityId","EntityType","CommunicationType","Subscribed","TimeUpdated"
"0002INKRyUolIrjG5DbUa0lDqUjxt","4374484","PERSON","MFS",1,1.385297883e+18
"0004WaXpmvbOh3WG3hd6kQtPINibv","8361929","PERSON","MFS",1,1437798832740631808
"0005l1fy1TJiFhyiEK2IXRCxfqee5","4197014","PERSON","SURVEYS_AND_POLLS",0,1.146917239e+18
"0008Qb2ra1PoSLgbumc2wmDfvexx8","4155704","PERSON","MFS",1,1.345053223e+18
"000C1IKgHrFaqmlHlKGGhigGyoaw4","4515071","PERSON","PARTNER",1,1.215098959e+18
"000Czw8Gv5w3eNoOmOFVTKLIuc2ti","4372360","PERSON","MFS",1,1.384952236e+18
"000DOsk9xlYKvs11PzZFRgmOpYfiA","4347384","PERSON","SURVEYS_AND_POLLS",1,1.177513307e+18
"000IQ4TKYHAbb334zFYdWVCZZfMYo","4470083","PERSON","PARTNER",1,1446945757133940480
"000LbifV4rUa2MhxFlVZ52PSek5kG","499194","PERSON","SURVEYS_AND_POLLS",0,1.097867573e+18
"000OWvUHdmjeL34XzuVLmHQBple7X","4176205","PERSON","MFS",1,1.143985154e+18
Assistance would be most appreciated!
Thanks to @Bernhard for the help with this. Here's a working solution.
options(scipen = 999) # so that TimeUpdated isn't outputted using scientific notation
conn <- dbConnect(MySQL(), host = "127.0.0.1", user="REDACTED", password="REDACTED", dbname="REDACTED", port=8906)
type_data <- dbGetQuery(conn, paste("SELECT * FROM ", arg, " WHERE 1", sep=""))
# convert the subscribed and timeupdated columns to strings
type_data$Subscribed <- as.character(type_data$Subscribed)
type_data$TimeUpdated <- as.character(type_data$TimeUpdated)
write.csv(type_data, paste(args[[1]], "/", arg, ".csv", sep=""), row.names=F)
dbDisconnect(conn)
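An alternative to changing the global scipen option is to format the column explicitly. A minimal base-R sketch; the two TimeUpdated values are taken from the desired results in the question, and the data frame is only a stand-in for the query result. Values that have already passed through a double may have lost their lowest digits, so this controls the notation only, not the precision.

```r
# Stand-in for the data returned by dbGetQuery()
type_data <- data.frame(
  Subscribed  = c(1L, 0L),
  TimeUpdated = c(1385297883000000000, 1146917239000000000)
)

type_data$Subscribed <- as.character(type_data$Subscribed)
# scientific = FALSE forces fixed notation; trim = TRUE drops padding spaces
type_data$TimeUpdated <- format(type_data$TimeUpdated,
                                scientific = FALSE, trim = TRUE)
str(type_data)
```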

How to use dynamic variable in RMySQL LIKE Statement [duplicate]

Is there any way to pass a variable defined within R to the sqlQuery function within the RODBC package?
Specifically, I need to pass such a variable to either a scalar/table-valued function, a stored procedure, and/or perhaps the WHERE clause of a SELECT statement.
For example, let:
x <- 1 ## user-defined
Then,
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
Or...
example2 <- sqlQuery(myDB,"SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = x")
Or...
example3 <- sqlQuery(myDB,"EXEC dbo.my_stored_proc (x)")
Obviously, none of these work, but I'm thinking that there's something that enables this sort of functionality.
Build the string you intend to pass. So instead of
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
do
example <- sqlQuery(myDB, paste("SELECT * FROM dbo.my_table_fn (",
x, ")", sep=""))
which will fill in the value of x.
If you use sprintf, you can very easily build the query string using variable substitution. For extra ease of use, if you pre-parse that query string (I'm using stringr), you can write it over multiple lines in your code.
e.g.
library(stringr)
q1 <- sprintf("
SELECT basketid, count(%s)
FROM %s
GROUP BY basketid
"
,"item_barcode"   # column name, passed as a string
,"dbo.sales"      # table name, passed as a string
)
# collapse newlines and runs of whitespace into single spaces
q1 <- str_trim(str_replace_all(q1,"\\s+"," "))
df <- sqlQuery(shopping_database, q1)
Side-note and hat-tip to another R chap
Recently I found I wanted to make the variable substitution even simpler by using something like Python's string.format() function, which lets you reuse and reorder variables within the string
e.g.
$: w = "He{0}{0}{1} W{1}r{0}d".format("l","o")
$: print(w)
"Hello World"
However, this function doesn't appear to exist in R, so I asked around on Twitter, and a very helpful chap, @kevin_ushey, replied with his own custom function to be used in R. Check it out!
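For what it's worth, base R's sprintf does accept positional references of the form %1$s, which lets arguments be reused and reordered in a similar way. A small sketch reproducing the Python example:

```r
# %1$s refers to the first argument, %2$s to the second
w <- sprintf("He%1$s%1$s%2$s W%2$sr%1$sd", "l", "o")
print(w)
```

The same positional syntax works inside a query template, for example when one table name has to appear several times in the SQL text.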
With more variables, do this:
aaa <- "
SELECT ColOne, ColTwo
FROM TheTable
WHERE HpId = AAAA and
VariableId = BBBB and
convert (date,date ) < 'CCCC'
"
# substitute each placeholder token with its value
aaa <- gsub ("AAAA", toString(111),aaa)
aaa <- gsub ("BBBB", toString(2222),aaa)
# the date must be a string: unquoted, 2016-01-01 is evaluated as arithmetic (2014)
aaa <- gsub ("CCCC", "2016-01-01" ,aaa)
Try this, which uses fn$ from the gsubfn package to substitute $x inside the string:
library(gsubfn)
x <- 1
example2 <- fn$sqlQuery(myDB,"SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = '$x'")