How to use dynamic variable in RMySQL LIKE Statement [duplicate] - mysql

This question already has answers here:
Dynamic "string" in R
(4 answers)
Add a dynamic value into RMySQL getQuery [duplicate]
(2 answers)
RSQLite query with user specified variable in the WHERE field [duplicate]
(2 answers)
Closed 5 years ago.
Is there any way to pass a variable defined within R to the sqlQuery function within the RODBC package?
Specifically, I need to pass such a variable to either a scalar/table-valued function, a stored procedure, and/or perhaps the WHERE clause of a SELECT statement.
For example, let:
x <- 1 ## user-defined
Then,
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
Or...
example2 <- sqlQuery(myDB,"SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = x")
Or...
example3 <- sqlQuery(myDB,"EXEC dbo.my_stored_proc (x)")
Obviously, none of these work, but I'm thinking that there's something that enables this sort of functionality.

Build the string you intend to pass. So instead of
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
do
example <- sqlQuery(myDB, paste("SELECT * FROM dbo.my_table_fn (",
                                x, ")", sep = ""))
which will fill in the value of x.
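For example, with x set to 1 as above, the concatenation produces the following string, which is then passed to sqlQuery:
x <- 1
paste("SELECT * FROM dbo.my_table_fn (", x, ")", sep = "")
# [1] "SELECT * FROM dbo.my_table_fn (1)"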

If you use sprintf, you can very easily build the query string using variable substitution. For extra ease-of-use, if you pre-parse that query string (I'm using stringr) you can write it over multiple lines in your code.
e.g.
library(stringr)

# placeholder values for the column and table names used below
item_barcode <- "item_barcode"
sales_table  <- "dbo.sales"

q1 <- sprintf("
    SELECT basketid, count(%s)
    FROM %s
    GROUP BY basketid
    ",
    item_barcode,
    sales_table)
q1 <- str_replace_all(str_replace_all(q1, "\n", ""), "\\s+", " ")
df <- sqlQuery(shopping_database, q1)
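As a quick check, after the clean-up q1 reads roughly as follows (with the placeholder column and table names defined above):
cat(q1)
# SELECT basketid, count(item_barcode) FROM dbo.sales GROUP BY basketid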
Side-note and hat-tip to another R chap
Recently I found I wanted to make the variable substitution even simpler by using something like Python's string.format() function, which lets you reuse and reorder variables within the string
e.g.
>>> w = "He{0}{0}{1} W{1}r{0}d".format("l", "o")
>>> print(w)
Hello World
However, this function doesn't appear to exist in R, so I asked around on Twitter, and a very helpful chap @kevin_ushey replied with his own custom function to be used in R. Check it out!

With more variables do this:
aaa <- "
SELECT ColOne, ColTwo
FROM TheTable
WHERE HpId = AAAA and
VariableId = BBBB and
convert (date,date ) < 'CCCC'
"
aaa <- gsub("AAAA", toString(111), aaa)
aaa <- gsub("BBBB", toString(2222), aaa)
aaa <- gsub("CCCC", "2016-01-01", aaa)  # quote the date: toString(2016-01-01) would evaluate the arithmetic, not keep the date
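After the three substitutions, the query string contains the concrete values, roughly:
cat(aaa)
# SELECT ColOne, ColTwo
# FROM TheTable
# WHERE HpId = 111 and
# VariableId = 2222 and
# convert (date,date ) < '2016-01-01'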

Try this (the fn$ prefix comes from the gsubfn package):
library(gsubfn)
x <- 1
example2 <- fn$sqlQuery(myDB, "SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = '$x'")
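A quick way to see what fn$ would actually send, since fn$identity simply returns the interpolated string:
library(gsubfn)

x <- 1
fn$identity("SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = '$x'")
# [1] "SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = '1'"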

Related

Double Quotes in temporary JSON variable on MySQL using R

I have a table in MySQL that contains user interactions with a web page. I need to extract the rows for users where the date of the interaction is earlier than a certain benchmark date, and that benchmark date is different for each customer (I extract that date from a different database).
My approach was to set a JSON variable in which the key is a user and the value is the benchmark date, and use it in the query to extract the intended fields.
Example in R:
# MainDF contains the user and the benchmark date from a different database
json_str <- mapply(function(uid, bench_date){
  paste0('{', '"', uid, '"', ':', '"', bench_date, '"', '}')
}, MainDF[, 'uid'],
   MainDF[, 'date']
)
json_str <- paste0("'", '[', paste0(json_str, collapse = ','), ']', "'")
temp_var <- paste('set @test=', json_str)
The intention was for temp_var to look like:
set @test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
but it actually looks like:
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
Then create the main query:
main_Q <- "select user_id, date
           from interaction
           where 1=1
             and json_contains(json_keys(@test), concat('\"', user_id, '\"')) = 1
             and date <= json_unquote(json_extract(@test,
                                                   concat('$.', '\"', user_id, '\"')
                                      )
                         )
          "
For the execution, first set the temporary variable and then execute the main query:
dbSendQuery(connection, temp_var)
resp <- dbSendQuery(connection, main_Q )
target_df <- fetch(resp, n=-1)
dbClearResult(resp )
When I test a fraction of it in a SQL IDE it does work. However, in R it doesn't return anything.
I think the issue is that R escapes the double quotes in temp_var, so SQL ends up reading
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
which won't work.
For example, if I execute:
set @test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
select json_keys(@test)
it will return an array with the keys, but that is not the case with
set @test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
select json_keys(@test)
I am not sure how to solve the issue, but I need double quotes to specify the JSON. Is there any other approach that I should try or a way to make this work?
First, I think it is generally better to use a well-known library/package for converting to/from JSON, for several reasons. Using jsonlite gives you a string that you should be able to place just about anywhere:
json_str <- jsonlite::toJSON(setNames(as.list(MainDF$date), MainDF$uid), auto_unbox = TRUE)
json_str
# {"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}
And while printing the object on the R console will show the escaped double-quotes,
as.character(json_str)
# [1] "{\"0001\":\"2010-05-05\",\"0012\":\"2015-05-05\",\"0101\":\"2018-07-20\"}"
that is merely R's representation (shows all strings within double-quotes, and therefore needs to escape any double-quotes within the string).
Adding it into some script should be straightforward:
cat(paste('set @test=', sQuote(json_str)), '\n')
# set @test= '{"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}'
I'm assuming that having each on its own row is not critical. If it is, and indentation is important, perhaps this is more your style:
spaces <- strrep(' ', 2 + nchar('set @test = '))
cat(paste0('set @test = ', sQuote(gsub(",", paste0(",\n", spaces), json_str))), '\n')
# set @test = '{"0001":"2010-05-05",
#              "0012":"2015-05-05",
#              "0101":"2018-07-20"}'
Data:
MainDF <- read.csv(stringsAsFactors=FALSE, colClasses='character', text='
uid,date
0001,2010-05-05
0012,2015-05-05
0101,2018-07-20')
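Putting it together with the original workflow, a minimal sketch (connection, main_Q, and the table are the ones from the question and assumed to be valid):
library(DBI)
library(jsonlite)

json_str <- jsonlite::toJSON(setNames(as.list(MainDF$date), MainDF$uid), auto_unbox = TRUE)

# send the user-variable assignment, then the main query, on the same connection
dbSendQuery(connection, paste0("set @test = '", json_str, "'"))
resp      <- dbSendQuery(connection, main_Q)
target_df <- fetch(resp, n = -1)
dbClearResult(resp)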

Single Quote R for SQL Query

Hello, I have the following data that I want to paste into an SQL query through an R connection.
UKWinnersID<-c("1W167X6", "QM6VY8", "ZDNZX0", "8J49D8", "RGNSW9",
"BH7D3P1", "W31S84", "NTHDJ4", "H3UA1", "AH9N7",
"DF52B68", "K65C2", "VGT2Q0", "93LR6", "SJAJ0",
"WQBH47", "CP8PW9", "5H2TD5", "TFLKV4", "X42J1" )
The query / code in R is as following:
UKSQL6<-data.frame(sqlQuery(myConn, paste("SELECT TOP 10000 [AxiomaDate]
,[RiskModelID] ,[AxiomaID],[Factor1],[Factor2],[Factor3],[Factor4],[Factor5]
,[Factor6],[Factor7],[Factor8],[Factor9],[Factor10],[Factor11],[Factor12]
,[Factor13],[Factor14],[Factor15] FROM [PortfolioAnalytics].[Data_Axioma].[SecurityExposures]
Where AxiomaDate IN (
SELECT MAX(AxiomaDate)
FROM [PortfolioAnalytics].[Data_Axioma].[FactorReturns]
GROUP BY MONTH(AxiomaDate), YEAR(AxiomaDate))
AND RiskModelID = 8
AND AxiomaID IN(",paste(UKWinnersID, collapse = ","),")")))
I am pasting the UKWinnersID in the last line of the code above, but the format of the UKWinnersID needs to be ('1W167X6', 'QM6VY8', 'ZDNZX0', etc.) with single quotes, which I just can't get to work.
Consider running a parameterized query using the RODBCext package (an extension of RODBC), assuming this is the API being used. Parameterized queries do more than insulate you from SQL injection; they also abstract data from code and avoid the messy quote enclosure, string interpolation, and concatenation, for cleaner, maintainable code.
Below replaces your TOP 10000 with TOP 500 for each of the 20 ids:
library(RODBC)
library(RODBCext)
conn <- odbcConnect("DBName", uid="user", pwd="password")
ids_df <- data.frame(UKWinnersID = c("1W167X6", "QM6VY8", "ZDNZX0", "8J49D8", "RGNSW9",
"BH7D3P1", "W31S84", "NTHDJ4", "H3UA1", "AH9N7",
"DF52B68", "K65C2", "VGT2Q0", "93LR6", "SJAJ0",
"WQBH47", "CP8PW9", "5H2TD5", "TFLKV4", "X42J1"))
# SQL STATEMENT (NO DATA)
query <- "SELECT TOP 500 [AxiomaDate], [RiskModelID], [AxiomaID], [Factor1],[Factor2]
, [Factor3], [Factor4], [Factor5], [Factor6], [Factor7], [Factor8]
, [Factor9], [Factor10], [Factor11], [Factor12]
, [Factor13], [Factor14], [Factor15]
FROM [PortfolioAnalytics].[Data_Axioma].[SecurityExposures]
WHERE AxiomaDate IN (
SELECT MAX(AxiomaDate)
FROM [PortfolioAnalytics].[Data_Axioma].[FactorReturns]
GROUP BY MONTH(AxiomaDate), YEAR(AxiomaDate)
)
AND RiskModelID = 8
AND AxiomaID = ?"
# PASS DATAFRAME VALUES TO BIND TO QUERY PARAMETERS
UKSQL6 <- sqlExecute(conn, query, ids_df, fetch=TRUE)
odbcClose(conn)
Alternatively, if you really need to use the IN() clause:
# SQL STATEMENT (NO DATA)
query <- paste("SELECT TOP 10000
...same as above...
AND AxiomaID IN (", paste(rep("?", nrow(ids_df)), collapse=", "), ")")
# TRANSPOSE DATA FRAME FOR COLUMN EQUAL TO ? PLACEHOLDERS
UKSQL6 <- sqlExecute(conn, query, t(ids_df), fetch=TRUE)
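If you do want to build the quoted IN () list directly, as the question literally asks, a quick sketch with paste0 (no parameter binding, so only appropriate for trusted values like these IDs):
# wrap each ID in single quotes and join with commas
id_list <- paste0("'", UKWinnersID, "'", collapse = ", ")
# id_list now looks like: '1W167X6', 'QM6VY8', 'ZDNZX0', ...

# in the original query, use id_list in place of paste(UKWinnersID, collapse = ","):
#   ... AND AxiomaID IN (", id_list, ")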

r fetch data from mysql db loop

I successfully fetch data from my MySQL db using R:
library(RMySQL)
mydb = dbConnect(MySQL(), user='user', password='pass', dbname='fib', host='myhost')
rs = dbSendQuery(mydb, 'SELECT distinct(DATE(date)) as date, open,close FROM stocksng WHERE symbol = "FIB7F";')
data <- fetch(rs, n=-1)
dbHasCompleted(rs)
So now I have an object, a list:
> print (typeof(data))
[1] "list"
Each element is a tuple(?) like date (chars), open (long), close (long).
OK, now my problem: I want to get a vector of the percentage difference between close (x) and the next day's open (x+1) until the end, BUT I can't access the items properly!
Example: (open/close*100)-100
I try:
for (item in data){
print (item[2])
}
and all possible combinations like:
for (item in data){
print (item[][2])
}
but I cannot access the right element! Could anyone help?
You have a bigger problem than this in your MySQL query, because you did not specify an ORDER BY clause. Consider using the following query:
SELECT DISTINCT
DATE(date) AS date,
open,
close
FROM stocksng
WHERE
symbol = "FIB7F"
ORDER BY
date
Here we order the result set by date, so that it makes sense to speak of the current and next open or close. Now, with a proper query in place, if you wanted to get the percentage difference between the current close and the next day's open, you could try:
require(dplyr)
(lead(open, 1) / close * 100) - 100
Or using base R:
(open[2:(length(open)+1)] / close * 100) - 100
Naive version:
for (row in 1:(nrow(data) - 1)) {           # stop one row early: the last row has no "next day"
  date  <- unname(data[row, "date"])        # kept from the original, though not used below
  open  <- unname(data[row + 1, "open"])    # next day's open
  close <- unname(data[row, "close"])       # current day's close
  var   <- abs((close / open * 100) - 100)
  print(var)
}
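For completeness, the dplyr approach applied to the fetched data frame might look like this (a sketch; the column names are the ones returned by the query above):
library(dplyr)

pct_change <- data %>%
  arrange(date) %>%
  mutate(pct = (lead(open) / close * 100) - 100) %>%
  pull(pct)

head(pct_change)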

R convert json to list to data.table

I have a data.table where one of the columns contains JSON. I am trying to extract the content so that each variable is a column.
library(jsonlite)
library(data.table)
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"tag_id":"4","response_id":1,"other":4}',
'{"tag_id":"34"}'),stringsAsFactors=F)
The desired result, which does not refer to the "other" variable:
tag_id response_id
1 "34" 2
2 "4" 1
3 "34" NA
I have tried several versions of:
parseLog <- function(x){
  if (is.na(x))
    e = c(tag_id = NA, response_id = NA)
  else {
    j = fromJSON(x)
    e = c(tag_id = as.integer(j$tag_id), response_id = j$response_id)
  }
  e
}
That seems to work well to retrieve a list of vectors (or lists, if c is replaced by list), but when I try to convert the list to a data.table something doesn't work as expected.
parsed<-lapply(df$a,parseLog)
rparsed<-do.call(rbind.data.frame,parsed)
colnames(rparsed)<-c("tag_id","response_id")
It fails because of the missing value in the third row. How can I solve it in a clean, R-ish way? How can I make my parse method return an NA for the missing value? Alternatively, is there a parameter "fill", like there is for rbind, that can be used in rbind.data.frame or an analogous method?
The dataset I am using has 11M rows so performance is important.
Additionally, is there an equivalent of rbind.data.frame that produces a data.table, and how would it be used? When I check the documentation it refers me to rbindlist, but it complains that the parameter is not used, and if I call it directly (without do.call) it complains about the type of parsed:
rparsed<-do.call(rbindlist,fill=T,parsed)
EDIT: The case I need to cover is more general; in a set of 11M records all the possible circumstances happen:
df<-data.table(a=c('{"tag_id":"34","response_id":2}',
'{"trash":"34","useless":2}',
'{"tag_id":"4","response_id":1,"other":4}',
NA,
'{"response_id":"34"}',
'{"tag_id":"34"}'),stringsAsFactors=F)
and the output should only contain tag_id and response_id columns.
There might be a simpler way but this seems to be working:
library(data.table)
library(jsonlite)
df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]
#or if you need all the columns :
#df[, json := sapply(a, fromJSON)][,
# c('tag_id', 'response_id') := rbindlist(lapply(json, data.frame), fill=TRUE)]
Output:
> df[, json := sapply(a, fromJSON)][, rbindlist(lapply(json, data.frame), fill=TRUE)]
tag_id response_id
1: 34 2
2: 4 1
3: 34 NA
EDIT:
This solution comes after the edit of the question with additional requests.
There are lots of ways to do this, but I find the simplest one is to handle it at the creation of the data.frame, like this:
df[, json := sapply(a, fromJSON)][,
rbindlist(lapply(json, function(x) data.frame(x)[-3]), fill=TRUE)]
# tag_id response_id
#1: 34 2
#2: 4 1
#3: 34 NA
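If the input also contains NA entries and keys that should be ignored (as in the edited example), one possible sketch is to normalise each record to exactly the two target columns before binding; this parseLog is a rewritten helper, not the one from the question:
library(data.table)
library(jsonlite)

keep <- c("tag_id", "response_id")

parseLog <- function(x) {
  vals <- if (is.na(x)) list() else fromJSON(x)
  # keep only the wanted keys, filling missing ones with NA so every record yields one row
  setNames(lapply(keep, function(k) if (is.null(vals[[k]])) NA else vals[[k]]), keep)
}

rparsed <- rbindlist(lapply(df$a, parseLog))
# column types are coerced to the most general type seen across rows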

Trying to append col.name on to a vector

I have several functions that I am trying to implement in R (RStudio). I will show the simplest one. I am trying to append names onto a vector for later use as a column name.
# Initialize
headerA <- vector(mode="character",length=20)
headerA[1]="source";headerA[2]="matches"
# Function - add on new name
h <- function(df, compareA, compareB) {
  new_header <- paste(compareA, "Vs", compareB, sep = "_")
  data.frame(df, new_header)
}
# Comparison 1:
compareA <-"AA"
compareB <-"BB"
headers <- (headerA, compareA, compareB)
But I am getting this error and it is very puzzling. I have googled it but the search is too vague/broad.
When run I get:
headers <- (headerA, compareA, compareB)
Error: unexpected ',' in "headers <- (headerA,"
The second error for the other function is similar...
It looks like you're missing a call to your function h and just have an open ( instead:
headers <- h(headerA, compareA, compareB)
Results in:
df new_header
1 source AA_Vs_BB
2 matches AA_Vs_BB
3 AA_Vs_BB
4 AA_Vs_BB
...
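As a side note, if the intent was literally to append the new name onto the character vector (rather than build a data.frame), a simpler sketch of h could be:
h <- function(vec, compareA, compareB) {
  c(vec, paste(compareA, "Vs", compareB, sep = "_"))
}

headers <- h(headerA, compareA, compareB)
tail(headers, 1)
# [1] "AA_Vs_BB"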