How to specify join variables with different names in different MySQL tables

I need to join two tables where the common ID column I want to use has a different name in each table. The two tables also share a "false" common column name, "id", so dplyr's default of joining on the shared "id" column does not work.
Here's some of the code involved in this problem:
library(dplyr)
library(RMySQL)
SDB <- src_mysql(host = "localhost", user = "foo", dbname = "bar", password = getPassword())
# Then reference a tbl within that src
administrators <- tbl(SDB, "administrators")
members <- tbl(SDB, "members")
Here are three attempts, all of which fail, to tell dplyr that the common column is "id" on the members side and "idmember" on the administrators side:
sqlq <- semi_join(members,administrators, by=c("id","idmember"))
sqlq <- inner_join(members,administrators, by= "id.x = idmember.y")
sqlq <- semi_join(members,administrators, by.x = id, by.y = idmember)
Here's an example of the kinds of error messages I'm getting:
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column '_LEFT.idmember' in 'where clause')
The examples I see out there pertain to data tables and data frames on the R side. My question is about how dplyr sends "by" statements to a SQL engine.

In the next version of dplyr, you'll be able to do:
inner_join(members, administrators, by = c("id" = "idmember"))
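The named vector by = c("id" = "idmember") means "match members.id on the left against administrators.idmember on the right". On the database side that is an ON clause naming a different column on each side. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical toy rows, to show the inner join and a semi join (written as an EXISTS subquery). It illustrates the SQL semantics only and is not the exact SQL dplyr generates.

```python
import sqlite3

# In-memory toy database standing in for the MySQL tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER, name TEXT);
    CREATE TABLE administrators (id INTEGER, idmember INTEGER, role TEXT);
    INSERT INTO members VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO administrators VALUES (10, 1, 'owner'), (11, 3, 'editor');
""")

# Inner join: match members.id against administrators.idmember and ignore
# the unrelated "id" column that both tables happen to share.
inner = conn.execute("""
    SELECT m.id, m.name, a.role
    FROM members AS m
    INNER JOIN administrators AS a ON m.id = a.idmember
    ORDER BY m.id
""").fetchall()

# Semi join: members with at least one matching administrators row,
# returning only the members' columns.
semi = conn.execute("""
    SELECT m.id, m.name
    FROM members AS m
    WHERE EXISTS (SELECT 1 FROM administrators AS a WHERE a.idmember = m.id)
    ORDER BY m.id
""").fetchall()

print(inner)  # [(1, 'ann', 'owner'), (3, 'cy', 'editor')]
print(semi)   # [(1, 'ann'), (3, 'cy')]
```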

Looks like this is an unresolved issue:
https://github.com/hadley/dplyr/issues/177
However, you can use merge():
admin <- as.tbl(data.frame(id = c("1","2","3"),false = c(TRUE,FALSE,FALSE)))
members <- as.tbl(data.frame(idmember = c("1","2","4"),false = c(TRUE,TRUE,FALSE)))
merge(admin,members, by.x = "id", by.y = "idmember")
id false.x false.y
1 1 TRUE TRUE
2 2 FALSE TRUE
If you need left or outer joins, you can use the all.x or all arguments to merge(). A thought, though: you've got a SQL database, so why not use it?
con <- dbConnect(MySQL(), host = "localhost", user = "foo", dbname = "bar", password = getPassword())
dbGetQuery(con, "select * from administrators join members on members.id = administrators.idmember")
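The all.x = TRUE option of merge() corresponds to a SQL LEFT JOIN: every row of the left table is kept, and unmatched right-hand columns come back as NULL (NA in R). A sketch with Python's sqlite3 module as a stand-in for MySQL, reusing the toy values from the merge() example; the "false" column is renamed to a hypothetical flag column holding 1/0 for TRUE/FALSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same toy data as the merge() example; "flag" stands in for the "false" column.
conn.executescript("""
    CREATE TABLE admin (id TEXT, flag INTEGER);
    CREATE TABLE members (idmember TEXT, flag INTEGER);
    INSERT INTO admin VALUES ('1', 1), ('2', 0), ('3', 0);
    INSERT INTO members VALUES ('1', 1), ('2', 1), ('4', 0);
""")

# LEFT JOIN keeps every admin row, like merge(..., all.x = TRUE);
# admin '3' has no matching member, so its right-hand flag is NULL (None).
rows = conn.execute("""
    SELECT a.id, a.flag AS flag_x, m.flag AS flag_y
    FROM admin AS a
    LEFT JOIN members AS m ON a.id = m.idmember
    ORDER BY a.id
""").fetchall()
print(rows)  # [('1', 1, 1), ('2', 0, 1), ('3', 0, None)]
```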

Related

Extract DB name and table name from Snowflake query using SnowSQL

Insert into D.d
Select * from A.a join B.b on
A.a.a1=B.b.b1
Join C.c on C.c.c1=B.b.b1
I have complex statements from which I need to extract the source DB names. In the statement above, the source DBs are A, B, and C, the source tables are a, b, and c, the target DB is D, and the target table is d.
I need output like:
SourceDB SourceTbl TargetDB Targettbl
A,B,C a,b,c D d
Alternatively, the values could be returned as JSON for each field. This also needs to handle UPDATE and DELETE statements. Please assist.
Thanks
You can use the sqlparse library to parse the statement. The code below is not optimally or efficiently written, but it has the logic to get the information:
import sqlparse

raw = 'Insert into D.d ' \
      'Select * from A.a join B.b on ' \
      'A.a.a1=B.b.b1 Join C.c on C.c.c1=B.b.b1;'
parsed = sqlparse.parse(raw)[0]
tgt_switch = "N"
src_switch = "N"
src_table = []
tgt_table = ""
for items in parsed.tokens:
    #print(items, items.ttype)
    # The first identifier after INTO is the target table
    if str(items).lower() == "into":
        tgt_switch = "Y"
    if tgt_switch == "Y" and items.ttype is None:
        tgt_switch = "N"
        tgt_table = items
    # The first identifier after each FROM or JOIN is a source table
    if str(items).lower() == "from" or str(items).lower() == "join":
        src_switch = "Y"
    if src_switch == "Y" and items.ttype is None:
        src_switch = "N"
        src_table.append(str(items))

target_db = str(tgt_table).split(".")[0]
target_tbl = str(tgt_table).split(".")[1]
print("Target DB is {} and Target table is {}".format(target_db, target_tbl))
for obj in src_table:
    src_db = str(obj).split(".")[0]
    src_tbl = str(obj).split(".")[1]
    print("Source DB is {} and Source table is {}".format(src_db, src_tbl))
Snowflake does not offer any SQL statement parsing support. You can hack at it with regexes, of course, or use one of the third-party parsing tools on the market.
If the query has already run successfully, you can use the ACCESS_HISTORY view https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html to see which tables (A.a, B.b, C.c, D.d) and columns (A.a.a1, B.b.b1, C.c.c1, D.d.d1) it accessed and how (read or write).
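For the regex route mentioned above, here is a minimal sketch, assuming every table is written as an unquoted two-part db.table name as in the question. The helper name extract_tables and its output keys (mirroring the requested SourceDB/SourceTbl/TargetDB/TargetTbl columns) are hypothetical. It treats INSERT INTO, UPDATE, and DELETE FROM as targets and FROM/JOIN as sources, and returns a dict that can be dumped as JSON. It is not a real SQL parser: aliases, quoted identifiers, comments, CTEs, and three-part db.schema.table names are not handled.

```python
import json
import re

# Hypothetical helper: a regex sketch, not a SQL parser. It only recognises
# unquoted two-part names written as db.table and would misread
# quoted identifiers, comments, CTEs and three-part db.schema.table names.
TARGET_RE = re.compile(r"\b(?:INSERT\s+INTO|UPDATE|DELETE\s+FROM)\s+(\w+)\.(\w+)",
                       re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)\.(\w+)", re.IGNORECASE)


def extract_tables(sql):
    targets = TARGET_RE.findall(sql)
    # Blank out the target clause so "DELETE FROM D.d" is not also read as a source.
    remainder = TARGET_RE.sub(" ", sql)
    sources = SOURCE_RE.findall(remainder)
    return {
        "SourceDB": [db for db, _ in sources],
        "SourceTbl": [tbl for _, tbl in sources],
        "TargetDB": [db for db, _ in targets],
        "TargetTbl": [tbl for _, tbl in targets],
    }


sql = ("Insert into D.d Select * from A.a join B.b on "
       "A.a.a1=B.b.b1 Join C.c on C.c.c1=B.b.b1;")
result = extract_tables(sql)
print(json.dumps(result))
# {"SourceDB": ["A", "B", "C"], "SourceTbl": ["a", "b", "c"], "TargetDB": ["D"], "TargetTbl": ["d"]}
```

The same function handles statements such as "UPDATE D.d SET x = 1 FROM A.a WHERE ..." or "DELETE FROM D.d WHERE k IN (SELECT k FROM A.a)", reporting D.d as the target and A.a as the source.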

R to MySQL query

I have a data frame in R from querying a SQL Server DB. Now I want to loop over each row and insert it into a MySQL DB.
I tried dbWriteTable, but it didn't work.
library(RODBC)
library(odbc)
library(RMySQL)
con <- dbConnect(odbc(),
Driver = "SQL Server",
Server = "XX",
Database = "XX",
UID = "XX",
PWD = "XX",
Port = XX)
mydb = dbConnect(MySQL(), user='XX', password='XX', dbname='YY', host='YYY')
resultset <- dbGetQuery(con, "SET NOCOUNT ON
DECLARE #StartDate DateTime
DECLARE #EndDate DateTime
SET #StartDate = dateadd(d,-1,getdate())
SET #EndDate = getdate()
SET NOCOUNT OFF
SELECT …..
LEFT JOIN ... ON ….
LEFT JOIN …. ON x.Key = y.Key
WHERE temp.StartDateTime >= #StartDate")
nrows <- nrow(resultset)
colnames(resultset) <- c("tagName", "date_inserted","value")
So now I have my result in resultset, but I don't know how to insert resultset into MySQL:
dbWriteTable(mydb, name='data', value=resultset[0,],append=TRUE)
dbReadTable(mydb, "data")
I expect to insert the data, but I don't know whether it should be a for loop (one query per row) or how else it is done.
More details in these images:
This is my data set
This is MySQL DB structure
Try using a parameterized insert with the RODBCext package. I have used the following function in the past.
It appends records to your database table.
library(RODBC)
library(RODBCext)
First we need to connect to the database using RODBC.
sql.driver = "MySQL ODBC 5.3 ANSI Driver" # adjust to the driver version installed on your system
sql.server = "your_server_here"
sql.port = "3306" # or whatever your port number is
sql.user = "your_user_name_here"
sql.pass = "your_password_here"
sql.db = "your_database_name_here"
con.string = paste0("Driver=", sql.driver, ";",
"Server=", sql.server, ";",
"Port=", sql.port, ";",
"Uid=", sql.user, ";",
"Pwd=", sql.pass, ";",
"Database=", sql.db, ";")
ch = odbcDriverConnect(con.string)
Then here is the custom function saveTable(). Run it with your own connection, table name, and data frame, as described in its comments.
saveTable <- function(con, table_name, df) {
  # con        = the ODBC connection (e.g., ch)
  # table_name = the SQL database table to append to
  # df         = the data.frame() to append
  # Build "INSERT INTO table ( col1, col2, ... ) VALUES ( ?,?,... )"
  sql_code = paste("INSERT INTO", table_name, "(", paste(colnames(df), collapse = ", "),
                   ") VALUES (", paste(rep("?", ncol(df)), collapse = ","), ")")
  # sqlExecute() binds the rows of df to the ? placeholders
  sqlExecute(con, sql_code, df)
}
# Example call: saveTable(ch, "data", resultset)
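Two notes on the above. First, in R, resultset[0,] selects zero rows, so the dbWriteTable() call in the question appends nothing; passing value = resultset was probably what was intended. Second, saveTable() avoids a hand-written loop by building one parameterized INSERT and letting the driver bind every row. The same pattern in Python's DB-API looks roughly like this, sketched with the sqlite3 module as a stand-in for MySQL; save_table is a hypothetical helper and the rows are made-up sample data. Note that sqlite3 uses ? placeholders, whereas MySQLdb uses %s.

```python
import sqlite3


def save_table(con, table_name, rows, columns):
    """Append rows with one parameterized INSERT executed for every row.

    table_name and columns are pasted into the SQL text, so they must be
    trusted identifiers; only the values are bound as parameters.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table_name, ", ".join(columns), placeholders)
    con.executemany(sql, rows)  # the driver binds each tuple in turn
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (tagName TEXT, date_inserted TEXT, value REAL)")
rows = [("tag_a", "2019-01-01 00:00:00", 1.5),   # hypothetical sample rows
        ("tag_b", "2019-01-01 00:05:00", 2.0)]
save_table(con, "data", rows, ["tagName", "date_inserted", "value"])
print(con.execute("SELECT COUNT(*) FROM data").fetchone()[0])  # 2
```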

Values in my data frame are not the same as the ones in my database table (R and MySQL)

I am reading tweet_id values from a table in my database and storing them in a data frame in R. The problem is that the tweet_id values in the data frame do not match the ones in the table.
Snapshot of my table:
Snapshot of my data frame in RStudio:
As you can see, there is no tweet_id = '882100387989291008' (the third value in my data frame) in my database table.
My R script:
#connecting with db
#myDB = dbConnect(MySQL(), user = "root", password = "XX", dbname = "dashboard", host= "127.0.0.1", port="8889")
myDB =dbConnect(MySQL(), user = "root", password ="XX", dbname = "dashboard")
options(scipen=10)
options()$scipen
#running a query and retriving data and saving it in a object
rs = dbSendQuery(myDB, "SELECT tweet_id, sentiment, text FROM dashboard.sen_tweets_twitter WHERE text <> '';")
#getting the result. The function fetch() saves the result in a dataframe
datafetd = fetch(rs, n=-1)
#removing extra whitespaces
#new = stripWhitespace(datafetd$text)
#dataafterclean =data.frame(new)
#converts the column into a character vector
review_text = paste(datafetd$text)
review_id = paste(datafetd$tweet_id)
print(review_id)
rm(tm_tdm)
#find the number of data
tweets_num = length(review_text)
#Disconnect connections
dbdisconnect = lapply(dbListConnections( dbDriver( drv = "MySQL")), dbDisconnect)
#checking if all connection has been closed
dbListConnections(MySQL())
The values in my database are the correct ones. How do I solve this problem?
Database tables represent unordered sets of data. In your table snapshot, the records appear to be sorted by tweet_id in ascending order. I suspect that all the data did in fact make it into your data frame, but in a different order from the one shown when you queried the table. To confirm this, try sorting the data frame by tweet_id in ascending order:
datafetd[order(datafetd$tweet_id), ]
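A small sketch of the ordering point, using Python's sqlite3 module as a stand-in for MySQL with hypothetical rows: only an ORDER BY clause guarantees the row order, so adding it to the query makes the result comparable row by row. The second half is a separate check, an assumption not raised in the answer above: tweet IDs are 18-digit integers, and a 64-bit floating-point number represents every integer exactly only up to 2^53 (about 9.0e15). If the driver returns a BIGINT column as an R numeric (double) column, large IDs can come back altered; selecting CAST(tweet_id AS CHAR) in the MySQL query would keep them as exact strings.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sen_tweets_twitter (tweet_id INTEGER, text TEXT)")
con.executemany("INSERT INTO sen_tweets_twitter VALUES (?, ?)",
                [(30, "c"), (10, "a"), (20, "b")])  # hypothetical rows

# Without ORDER BY the row order is whatever the engine returns;
# ORDER BY makes it deterministic.
rows = con.execute(
    "SELECT tweet_id, text FROM sen_tweets_twitter ORDER BY tweet_id").fetchall()
print(rows)  # [(10, 'a'), (20, 'b'), (30, 'c')]

# Separate check: integers above 2**53 do not all survive conversion to a double.
big_id = 2**53 + 1
exact = float(big_id) == big_id
print(exact)  # False
```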

R and R-SQL API to execute SQL query

For an assignment, I need to "use SQL to extract all tweets in twitter message-table under those 3 user ids in the previous step." I am confused about how to fetch the tweets from MySQL using the vector x in R.
I keep getting this error message, "Error in .local(conn, statement, ...) :
unused argument (c(18949452, 34713362, 477583514))."
#use SQL to get a list of unique user id in twitter message table as a
#vector in R.
res <- dbSendQuery(con, statement = "select user_id from
twitter_message")
user_id <- dbFetch(res)
user_id
nrow(user_id)
#randomly selects : use R to randomly generate 3 user id
x <- user_id[sample(nrow(user_id), 3, replace = FALSE, prob = NULL),]
x
res2 = dbSendQuery(con, statement = 'SELECT twitter_message WHERE
user_id =',x)
tweets <- dbFetch(res2)
tweets
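dbSendQuery() expects a single SQL string, so passing x as an extra argument is what triggers the "unused argument" error (the statement is also missing "* FROM": it should read SELECT * FROM twitter_message WHERE ...). Since x is a vector, you could call dbSendQuery() in a loop: for each element of x, paste its value into the statement. Does that make sense?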
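Besides the loop, a single query with an IN list works too: build one placeholder per ID and bind the whole vector at once. A sketch using Python's sqlite3 module as a stand-in for MySQL; the table contents are hypothetical, and the three IDs are the ones from the error message. It runs both the loop and the IN version and gets the same rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE twitter_message (user_id INTEGER, text TEXT)")
con.executemany("INSERT INTO twitter_message VALUES (?, ?)",
                [(18949452, "t1"), (34713362, "t2"),
                 (477583514, "t3"), (1, "other")])  # hypothetical rows

x = [18949452, 34713362, 477583514]  # the three sampled user ids

# Option 1: one query per id, as suggested in the answer.
looped = []
for uid in x:
    looped.extend(con.execute(
        "SELECT * FROM twitter_message WHERE user_id = ?", (uid,)).fetchall())

# Option 2: a single query with one placeholder per id in an IN list.
placeholders = ", ".join("?" for _ in x)
sql = ("SELECT * FROM twitter_message WHERE user_id IN ({}) "
       "ORDER BY user_id").format(placeholders)
tweets = con.execute(sql, x).fetchall()
print(tweets)  # [(18949452, 't1'), (34713362, 't2'), (477583514, 't3')]
```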

For loop in Python using list elements as variables in a MySQL query

Two questions:
I have a list of file names, each of which I would like to feed into a MySQL query.
The first question is how to loop through the file list and pass each element (file name) as a variable to MySQL.
The second question is how to print the results more cleanly, without the parentheses and L suffixes from the returned tuples. The way I do it below works for three columns, but I'd like a flexible way that doesn't require adding sublists (cleaned1, cleaned2, ...) when I fetch more columns.
Any help is highly appreciated!
MyConnection = MySQLdb.connect( host = "localhost", user = "root", \
passwd = "xxxx", db = "xxxx")
MyCursor = MyConnection.cursor()
MyList = (File1, File2, File3, File...., File36)
for i in MyList:
    # do MySQL query
    SQL = """SELECT a.column1, a.column2, b.column2 FROM i in MyList a, table2 b WHERE
    a.column1=b.column1;"""
    SQLLen = MyCursor.execute(SQL) # returns the number of records retrieved
    AllOut = MyCursor.fetchall()
    List = list(AllOut) # this puts all the tuple information into a list
    cleaned = [i[0] for i in List] # this cleans up the tuple characters
    cleaned1 = [i[1] for i in List] # this cleans up the tuple characters
    cleaned2 = [i[2] for i in List] # this cleans up the tuple characters
    NewList = zip(cleaned, cleaned1, cleaned2) # this makes a new list
    print NewList[0:10]
# Close the files
MyCursor.close()
MyConnection.close()
I can figure out saving to a file, but I don't know how to pass a Python variable into MySQL.
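Convert the tuple to a list first (optional, since a tuple can be iterated directly):
MyList = list(MyList)
and you have two options. Note that a table name cannot be passed as a query parameter: MySQLdb quotes parameters as string literals ('File1'), which is invalid in a FROM clause. Both options therefore put the name into the SQL text, so only use trusted table names.
Try this:
for tablename in MyList:
    MyCursor.execute("SELECT a.column1, a.column2, b.column2 FROM {} a, table2 b WHERE a.column1=b.column1".format(tablename))
or:
for tablename in MyList:
    SQL = "SELECT a.column1, a.column2, b.column2 FROM tablevar a, table2 b WHERE a.column1=b.column1"
    SQL = SQL.replace('tablevar', tablename)
    MyCursor.execute(SQL)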
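To print the results without the brackets, for any number of columns, join each fetched row's values into one string after each execute() call:
for row in MyCursor.fetchall():
    print(" ".join(str(col) for col in row))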
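Putting both parts together, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The tables File1, File2, and table2 and their rows are hypothetical. Because the table name has to go into the SQL text, the sketch first checks each name against the tables that actually exist (sqlite_master here; in MySQL, SHOW TABLES or information_schema.tables would play that role). Each row is then printed as plain text regardless of how many columns it has.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Hypothetical stand-ins for File1..File36 and table2.
cur.executescript("""
    CREATE TABLE table2 (column1 INTEGER, column2 TEXT);
    CREATE TABLE File1 (column1 INTEGER, column2 TEXT);
    CREATE TABLE File2 (column1 INTEGER, column2 TEXT);
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
    INSERT INTO File1 VALUES (1, 'a');
    INSERT INTO File2 VALUES (2, 'b');
""")

MyList = ("File1", "File2")
# Identifiers cannot be bound as parameters, so check each name against
# the tables that exist before putting it into the SQL text.
known = {name for (name,) in cur.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}

lines = []
for tablename in MyList:
    if tablename not in known:
        raise ValueError("unexpected table name: %r" % tablename)
    sql = ("SELECT a.column1, a.column2, b.column2 FROM {} a, table2 b "
           "WHERE a.column1 = b.column1").format(tablename)
    for row in cur.execute(sql):
        # Join every column, however many there are, into one plain string.
        lines.append(" ".join(str(col) for col in row))

print("\n".join(lines))
```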