Double Quotes in temporary JSON variable on MySQL using R - mysql

I have a table in MYSQL that contains the user interactions with a Web Page, I needed to extract the rows for the users where the date of that interaction is lower than a certain benchmark date and that benchmark date is different for each customer (I extract that date from a different database).
My approach was to set a json variable in which the key is a user and the value is the benchmark date, and used it in the query to extract the intended fields.
Example in R:
#MainDF contains the user and the benchmark date from a different database
json_str <- mapply(function(uid, bench_date){
paste0(
'{','"',cust,'"', ':', '"', bench_date, '"','}'
)
}, MainDF[, 'uid'],
MainDF[, 'date']
)
json_str <- paste0("'", '[', paste0(json_str , collapse = ','), ']', "'")
temp_var <- paste('set #test=', json_str)
The intention was to make temp_var to be like:
set #test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
but it actually looks like :
set #test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
then create the main query:
main_Q <- "select user_id, date
from interaction
where 1=1
and json_contains(json_keys(#test), concat('\"',user_id,'\"')) = 1
and date <= json_unquote(json_extract(#test,
concat('$.','\"',user_id, '\"')
)
)
"
For the execution, first, set the temporal variable and then execute the main query
dbSendQuery(connection, temp_var)
resp <- dbSendQuery(connection, main_Q )
target_df <- fetch(resp, n=-1)
dbClearResult(resp )
When I test a fraction of it in a SQL IDE it does works. However, in R it doesn't return anything.
I think that the issue is that R escape the double quotes in temp_var and SQL end up reading
set #test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
which is not won't work.
For example if I execute:
set #test= '{"0001":"2010-05-05",
"0012":"2015-05-05",
"0101":"2018-07-20"}'
select json_keys(#test)
it will return an array with the keys, but that is not the case with
set #test= '{\"0001\":\"2010-05-05\",
\"0012\":\"2015-05-05\",
\"0101\":\"2018-07-20\"}'
select json_keys(#test)
I am not sure how to solve the issue, but I need double quotes to specify the JSON. Is there any other approach that I should try or a way to make this work?

First, I think it is generally better to use a well-known library/package for converting to/from JSON, for several reasons.
This gives you a string that you should be able to place just about anywhere.
json_str <- jsonlite::toJSON(setNames(as.list(MainDF$date), MainDF$uid), auto_unbox=TRUE)
json_str
# {"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}
And while looking at the object on the R console will give the escaped-doublequotes,
as.character(json_str)
# [1] "{\"0001\":\"2010-05-05\",\"0012\":\"2015-05-05\",\"0101\":\"2018-07-20\"}"
that is merely R's representation (shows all strings within double-quotes, and therefore needs to escape any double-quotes within the string).
Adding it into some script should be straight-forward:
cat(paste('set #test=', sQuote(json_str)), '\n')
# set #test= '{"0001":"2010-05-05","0012":"2015-05-05","0101":"2018-07-20"}'
I'm assuming that having each on its own row is not critical. If it is, and indentation is important, perhaps this is more your style:
spaces <- strrep(' ', 2+nchar('set #test = '))
cat(paste0('set #test = ', sQuote(gsub(",", paste0(",\n", spaces), json_str))), '\n')
# set #test = '{"0001":"2010-05-05",
# "0012":"2015-05-05",
# "0101":"2018-07-20"}'
Data:
MainDF <- read.csv(stringsAsFactors=FALSE, colClasses='character', text='
uid,date
0001,2010-05-05
0012,2015-05-05
0101,2018-07-20')

Related

Exporting data from R to MYSQL server

df <- data.frame(category = c("A","B","A","D","E"),
date = c("5/10/2005","6/10/2005","7/10/2005","8/10/2005","9/10/2005"),
col1 = c(1,NA,2,NA,3),
col2 = c(1,2,NA,4,5),
col3 = c(2,3,NA,NA,4))
I have to insert a data frame that is created in R to mysql server.
I have tried these methods(Efficient way to insert data frame from R to SQL). However, my data also has NA which are fails the whole process of exporting.
Is there a way around to faster upload to data.
dbWriteTable(cn,name ="table_name",value = df,overwrite=TRUE, row.names = FALSE)
The above works but is very slow to upload
The method that I have to use is this :
before = Sys.time()
chunksize = 1000000 # arbitrary chunk size
for (i in 1:ceiling(nrow(df)/chunksize)) {
query = paste0('INSERT INTO dashboard_file_new_rohan_testing (',paste0(colnames(df),collapse = ','),') VALUES ')
vals = NULL
for (j in 1:chunksize) {
k = (i-1)*chunksize+j
if (k <= nrow(df)) {
vals[j] = paste0('(', paste0(df[k,],collapse = ','), ')')
}
}
query = paste0(query, paste0(vals,collapse=','))
dbExecute(cn, query)
}
time_chunked = Sys.time() - before
Error Encountered:
Error in .local(conn, statement, ...) :
could not run statement: Unknown column 'NA' in 'field list'
One of the fastest ways to load data into MySQL is to use its LOAD DATA command line tool. You may try first writing your R data frame to a CSV file, then using MySQL's LOAD DATA to load it:
write.csv(df, "output.csv", row.names=FALSE)
Then from your command line, use:
LOAD DATA INFILE 'output.csv' INTO TABLE table_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
Note that this assumes the CSV file is already on the same machine as MySQL. If not, and you have it still locally, then use LOAD DATA LOCAL INFILE instead.
You may read MYSQL import data from csv using LOAD DATA INFILE for more help using LOAD DATA.
Edit:
To deal with the issue of NA values, which should represent NULL in MySQL, you may take the approach of first casting the entire data frame to text, and then replacing the NA values with empty string. LOAD DATA will interpret a missing value in a CSV column as being NULL. Consider this:
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
df[is.na(df)] <- ""
Then, use write.csv along with LOAD DATA as described above.

Python 3 psycopg2 COPY from stdin failed: error in .read()

I am trying to apply the code found on this page, in particular part 'Copy Data from String Iterator' of the Table of Contents, but run into an issue with my code.
Since not all lines coming from the generator (here log_lines) can be imported into the PostgreSQL database, I try to filter the correct lines (here row) using itertools.filterfalse like in the codeblock below:
def copy_string_iterator(connection, log_lines) -> None:
with connection.cursor() as cursor:
create_staging_table(cursor)
log_string_iterator = StringIteratorIO((
'|'.join(map(clean_csv_value, (
row['date'],
row['time'],
row['cs_uri_query'],
row['s_contentpath'],
row['sc_status'],
row['s_computername'],
...
row['sc_substates'],
row['s_port'],
row['cs_version'],
row['c_protocol'],
row.update({'cs_cookie':'x'}),
row['timetakenms'],
row['cs_uri_stem'],
))) + '\n')
for row in filterfalse(lambda line: "#" in line.get('date'), log_lines)
)
cursor.copy_from(log_string_iterator, 'log_table', sep = '|')
When I run this, cursor.copy_from() gives me the following error:
QueryCanceled: COPY from stdin failed: error in .read() call
CONTEXT: COPY log_table, line 112910
I understand why this error happens, it is because in the test file I use there are only 112909 lines that meet the filterfalse condition. But why does it try to copy line 112910 and throw the error and not just stop?
Since Python doesn't have a coalescing operator, add something like:
(map(clean_csv_value, (
row['date'] if 'date' in row else None,
:
row['cs_uri_stem'] if 'cs_uri_stem' in row else None,
))) + '\n')
for each of your fields so you can handle any missing fields in the JSON file. Of course the fields should be nullable in the db if you use None otherwise replace with None with some default value for that field.

Single Quote R for SQL Query

Hello I have the following data that I want to paste into an SQL query through a R connection.
UKWinnersID<-c("1W167X6", "QM6VY8", "ZDNZX0", "8J49D8", "RGNSW9",
"BH7D3P1", "W31S84", "NTHDJ4", "H3UA1", "AH9N7",
"DF52B68", "K65C2", "VGT2Q0", "93LR6", "SJAJ0",
"WQBH47", "CP8PW9", "5H2TD5", "TFLKV4", "X42J1" )
The query / code in R is as following:
UKSQL6<-data.frame(sqlQuery(myConn, paste("SELECT TOP 10000 [AxiomaDate]
,[RiskModelID] ,[AxiomaID],[Factor1],[Factor2],[Factor3],[Factor4],[Factor5]
,[Factor6],[Factor7],[Factor8],[Factor9],[Factor10],[Factor11],[Factor12]
,[Factor13],[Factor14],[Factor15]FROM [PortfolioAnalytics].[Data_Axioma].[SecurityExposures]
Where AxiomaDate IN (
SELECT MAX(AxiomaDate)
FROM [PortfolioAnalytics].[Data_Axioma].[FactorReturns]
GROUP BY MONTH(AxiomaDate), YEAR(AxiomaDate))
AND RiskModelID = 8
AND AxiomaID IN(",paste(UKWinnersID, collapse = ","),")")))
I am pasting the UKWinnersID in the last line of the code above but that format of the UKWinnersID needs to be as ('1W167X6', 'QM6VY8', 'ZDNZX0'.. etc) with a single quote which I just cant get to work.
Consider running a parameterized query using the RODBCext package (extension of RODBC), assuming this is the API being used. Parameterized queries do more than insulate from SQL injection but abstracts data from code and avoids the messy quote enclosure and string interpolation and concatenation for cleaner, maintainable code.
Below replaces your TOP 10000 into TOP 500 for each of the 20 ids:
library(RODBC)
library(RODBCext)
conn <- odbcConnect("DBName", uid="user", pwd="password")
ids_df <- data.frame(UKWinnersID = c("1W167X6", "QM6VY8", "ZDNZX0", "8J49D8", "RGNSW9",
"BH7D3P1", "W31S84", "NTHDJ4", "H3UA1", "AH9N7",
"DF52B68", "K65C2", "VGT2Q0", "93LR6", "SJAJ0",
"WQBH47", "CP8PW9", "5H2TD5", "TFLKV4", "X42J1"))
# SQL STATEMENT (NO DATA)
query <- "SELECT TOP 500 [AxiomaDate], [RiskModelID], [AxiomaID], [Factor1],[Factor2]
, [Factor3], [Factor4], [Factor5], [Factor6], [Factor7], [Factor8]
, [Factor9], [Factor10], [Factor11], [Factor12]
, [Factor13], [Factor14], [Factor15]
FROM [PortfolioAnalytics].[Data_Axioma].[SecurityExposures]
WHERE AxiomaDate IN (
SELECT MAX(AxiomaDate)
FROM [PortfolioAnalytics].[Data_Axioma].[FactorReturns]
GROUP BY MONTH(AxiomaDate), YEAR(AxiomaDate)
)
AND RiskModelID = 8
AND AxiomaID = ?"
# PASS DATAFRAME VALUES TO BIND TO QUERY PARAMETERS
UKSQL6 <- sqlExecute(conn, query, ids_df, fetch=TRUE)
odbcClose(conn)
Alternatively, if you really need to use the IN() clause:
# SQL STATEMENT (NO DATA)
query <- paste("SELECT TOP 10000
...same as above...
AND AxiomaID IN (", paste(rep("?", nrow(ids_df)), collapse=", "), ")")
# TRANSPOSE DATA FRAME FOR COLUMN EQUAL TO ? PLACEHOLDERS
UKSQL6 <- sqlExecute(conn, query, t(ids_df), fetch=TRUE)

transforming / updating / typecasting list values from int & num to strings

I'm retrieving data from a MySQL database using an R script and then writing it to a CSV, but I'm having an issue where two of the columns of data that I want to write out as strings are being written out as integers and numbers (in this case, in scientific notation).
I would like to have these written out as string values instead, but I'm not finding this is a straightforward task, in spite of doing a fair bit of googling and experimentation.
The relevant code:
conn <- dbConnect(MySQL(), host = "127.0.0.1", user="REDACTED", password="REDACTED", dbname="REDACTED", port=8906)
type_data <- dbGetQuery(conn, paste("SELECT * FROM ", arg, " WHERE 1 LIMIT 10", sep=""))
# Problem: "Subscribed" and "TimeUpdated" are coming through as numbers instead of strings
write.csv(type_data, paste("./",arg,".csv", sep=""), row.names=F)
dbDisconnect(conn)
Desired results:
"Id","EntityId","EntityType","CommunicationType","Subscribed","TimeUpdated"
"0002INKRyUolIrjG5DbUa0lDqUjxt","4374484","PERSON","MFS","1","1385297883000000000"
"0004WaXpmvbOh3WG3hd6kQtPINibv","8361929","PERSON","MFS","1","1437798832740631885"
"0005l1fy1TJiFhyiEK2IXRCxfqee5","4197014","PERSON","SURVEYS_AND_POLLS","0","1146917239000000000"
"0008Qb2ra1PoSLgbumc2wmDfvexx8","4155704","PERSON","MFS","1","1345053223000000000"
"000C1IKgHrFaqmlHlKGGhigGyoaw4","4515071","PERSON","PARTNER","1","1215098959000000000"
"000Czw8Gv5w3eNoOmOFVTKLIuc2ti","4372360","PERSON","MFS","1","1384952236000000000"
"000DOsk9xlYKvs11PzZFRgmOpYfiA","4347384","PERSON","SURVEYS_AND_POLLS","1","1177513307000000000"
"000IQ4TKYHAbb334zFYdWVCZZfMYo","4470083","PERSON","PARTNER","1","1446945757133940400"
"000LbifV4rUa2MhxFlVZ52PSek5kG","499194","PERSON","SURVEYS_AND_POLLS","0","1097867573000000000"
Actual results:
"Id","EntityId","EntityType","CommunicationType","Subscribed","TimeUpdated"
"0002INKRyUolIrjG5DbUa0lDqUjxt","4374484","PERSON","MFS",1,1.385297883e+18
"0004WaXpmvbOh3WG3hd6kQtPINibv","8361929","PERSON","MFS",1,1437798832740631808
"0005l1fy1TJiFhyiEK2IXRCxfqee5","4197014","PERSON","SURVEYS_AND_POLLS",0,1.146917239e+18
"0008Qb2ra1PoSLgbumc2wmDfvexx8","4155704","PERSON","MFS",1,1.345053223e+18
"000C1IKgHrFaqmlHlKGGhigGyoaw4","4515071","PERSON","PARTNER",1,1.215098959e+18
"000Czw8Gv5w3eNoOmOFVTKLIuc2ti","4372360","PERSON","MFS",1,1.384952236e+18
"000DOsk9xlYKvs11PzZFRgmOpYfiA","4347384","PERSON","SURVEYS_AND_POLLS",1,1.177513307e+18
"000IQ4TKYHAbb334zFYdWVCZZfMYo","4470083","PERSON","PARTNER",1,1446945757133940480
"000LbifV4rUa2MhxFlVZ52PSek5kG","499194","PERSON","SURVEYS_AND_POLLS",0,1.097867573e+18
"000OWvUHdmjeL34XzuVLmHQBple7X","4176205","PERSON","MFS",1,1.143985154e+18
Assistance would be most appreciated!
Thanks to #Bernhard for the help with this - here's a working solution.
options(scipen = 999) # so that TimeUpdated isn't outputted using scientific notation
conn <- dbConnect(MySQL(), host = "127.0.0.1", user="REDACTED", password="REDACTED", dbname="REDACTED", port=8906)
type_data <- dbGetQuery(conn, paste("SELECT * FROM ", arg, " WHERE 1", sep=""))
# convert the subscribed and timeupdated columns to strings
type_data$Subscribed <- as.character(type_data$Subscribed)
type_data$TimeUpdated <- as.character(type_data$TimeUpdated)
write.csv(type_data, paste(args[[1]], "/", arg, ".csv", sep=""), row.names=F)
dbDisconnect(conn)

How to use dynamic variable in RMySQL LIKE Statement [duplicate]

This question already has answers here:
Dynamic "string" in R
(4 answers)
Add a dynamic value into RMySQL getQuery [duplicate]
(2 answers)
RSQLite query with user specified variable in the WHERE field [duplicate]
(2 answers)
Closed 5 years ago.
Is there any way to pass a variable defined within R to the sqlQuery function within the RODBC package?
Specifically, I need to pass such a variable to either a scalar/table-valued function, a stored procedure, and/or perhaps the WHERE clause of a SELECT statement.
For example, let:
x <- 1 ## user-defined
Then,
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
Or...
example2 <- sqlQuery(myDB,"SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = x")
Or...
example3 <- sqlQuery(myDB,"EXEC dbo.my_stored_proc (x)")
Obviously, none of these work, but I'm thinking that there's something that enables this sort of functionality.
Build the string you intend to pass. So instead of
example <- sqlQuery(myDB,"SELECT * FROM dbo.my_table_fn (x)")
do
example <- sqlQuery(myDB, paste("SELECT * FROM dbo.my_table_fn (",
x, ")", sep=""))
which will fill in the value of x.
If you use sprintf, you can very easily build the query string using variable substitution. For extra ease-of-use, if you pre-parse that query string (I'm using stringr) you can write it over multiple lines in your code.
e.g.
q1 <- sprintf("
SELECT basketid, count(%s)
FROM %s
GROUP BY basketid
"
,item_barcode
,dbo.sales
)
q1 <- str_replace_all(str_replace_all(q1,"\n",""),"\\s+"," ")
df <- sqlQuery(shopping_database, q1)
Side-note and hat-tip to another R chap
Recently I found I wanted to make the variable substitution even simpler by using something like Python's string.format() function, which lets you reuse and reorder variables within the string
e.g.
$: w = "He{0}{0}{1} W{1}r{0}d".format("l","o")
$: print(w)
"Hello World"
However, this function doesn't appear to exist in R, so I asked around on Twitter, and a very helpful chap #kevin_ushey replied with his own custom function to be used in R. Check it out!
With more variables do this:
aaa <- "
SELECT ColOne, ColTwo
FROM TheTable
WHERE HpId = AAAA and
VariableId = BBBB and
convert (date,date ) < 'CCCC'
"
--------------------------
aaa <- gsub ("AAAA", toString(111),aaa)
aaa <- gsub ("BBBB", toString(2222),aaa)
aaa <- gsub ("CCCC", toString (2016-01-01) ,aaa)
try with this
x <- 1
example2 <- fn$sqlQuery(myDB,"SELECT * FROM dbo.some_random_table AS foo WHERE foo.ID = '$x'")