Extract DB name and table name from snowflake query using snowsql - mysql

Insert into D.d
Select * from A.a join B.b on
A.a.a1=B.b.b1
Join C.c on C.c.c1=B.b.b1
I have complex statements from which I need to extract the source and target database and table names. In the statement above, the source DBs are A, B, C and the source tables are a, b, c; the target DB is D and the target table is d.
Need output like
SourceDB SourceTbl TargetDB Targettbl
A,B,C a,b,c D d
Alternatively, the values could be returned in JSON format for each field. This also needs to accommodate UPDATE and DELETE statements. Please assist.
Thanks

You can use the sqlparse library to parse the statement. The code below is not written optimally or efficiently, but it contains the logic to get the information:
import sqlparse

raw = 'Insert into D.d ' \
      'Select * from A.a join B.b on ' \
      'A.a.a1=B.b.b1 Join C.c on C.c.c1=B.b.b1;'
parsed = sqlparse.parse(raw)[0]
tgt_switch = "N"
src_switch = "N"
src_table = []
tgt_table = ""
for items in parsed.tokens:
    #print(items,items.ttype)
    if str(items) == "into":
        tgt_switch = "Y"
    if tgt_switch == "Y" and items.ttype is None:
        tgt_switch = "N"
        tgt_table = items
    if str(items).lower() == "from" or str(items).lower() == "join":
        src_switch = "Y"
    if src_switch == "Y" and items.ttype is None:
        src_switch = "N"
        src_table.append(str(items))
target_db = str(tgt_table).split(".")[0]
target_tbl = str(tgt_table).split(".")[1]
print("Target DB is {} and Target table is {}".format(target_db, target_tbl))
for obj in src_table:
    src_db = str(obj).split(".")[0]
    src_tbl = str(obj).split(".")[1]
    print("Source DB is {} and Source table is {}".format(src_db, src_tbl))

Snowflake does not offer any SQL statement parsing support. You can hack at it with regexes, of course, or use any of the tools on the market.
If this query ran, and ran successfully, you can use the ACCESS_HISTORY view https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html to see which tables (A.a, B.b, C.c, D.d) and columns (A.a.a1, B.b.b1, C.c.c1, D.d.d1) it accessed and how (read or write).
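For the regex route mentioned above, here is a minimal sketch in Python using only the standard library. It assumes two-part db.table names as in the question, unquoted identifiers, and no subqueries, CTEs, comments or string literals that contain SQL keywords. The function name extract_tables and the JSON keys are illustrative choices, not part of any Snowflake tooling. Treat it as a heuristic, not a parser.

```python
import json
import re

# Table reference that follows FROM, JOIN or USING (the read side).
SOURCE_RE = re.compile(
    r"\b(?:from|join|using)\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)", re.IGNORECASE
)
# Statement head that names the table being written.
TARGET_RE = re.compile(
    r"\s*(?:insert\s+into|update|delete\s+from|merge\s+into)\s+"
    r"([A-Za-z_]\w*)\.([A-Za-z_]\w*)",
    re.IGNORECASE,
)


def extract_tables(sql):
    """Return source/target db and table names found in one SQL statement."""
    target = TARGET_RE.match(sql)
    sources = []
    for match in SOURCE_RE.finditer(sql):
        # Skip the FROM that belongs to "DELETE FROM <target>".
        if target and match.start() < target.end():
            continue
        pair = match.groups()
        if pair not in sources:
            sources.append(pair)
    return {
        "SourceDB": [db for db, _ in sources],
        "SourceTbl": [tbl for _, tbl in sources],
        "TargetDB": target.group(1) if target else None,
        "Targettbl": target.group(2) if target else None,
    }


statement = (
    "Insert into D.d Select * from A.a join B.b on "
    "A.a.a1=B.b.b1 Join C.c on C.c.c1=B.b.b1"
)
print(json.dumps(extract_tables(statement)))
```

For the statement in the question this yields A, B, C as source DBs, a, b, c as source tables, and D / d as the target. UPDATE and DELETE statements are handled by the same two patterns, but anything beyond simple statements is better served by a real parser or by ACCESS_HISTORY.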

Related

generate queries for each key in pyspark data frame

I have a data frame in pyspark like below
df = spark.createDataFrame(
[
('2021-10-01','A',25),
('2021-10-02','B',24),
('2021-10-03','C',20),
('2021-10-04','D',21),
('2021-10-05','E',20),
('2021-10-06','F',22),
('2021-10-07','G',23),
('2021-10-08','H',24)],("RUN_DATE", "NAME", "VALUE"))
Now using this data frame I want to update a table in MySql
# query to run should be similar to this
update_query = "UPDATE DB.TABLE SET DATE = '2021-10-01', VALUE = 25 WHERE NAME = 'A'"
# mysql_conn is a function which I use to connect to `MySql` from `pyspark` and run queries
# Invoking the function
mysql_conn(host, user_name, password, update_query)
Now when I invoke the mysql_conn function by passing parameters the query runs successfully and the record gets updated in the MySql table.
Now I want to run the update statement for all the records in the data frame.
For each NAME it has to pick the RUN_DATE and VALUE and replace in update_query and trigger the mysql_conn.
I think we need to use a for loop, but I'm not sure how to proceed.
Instead of iterating through the DataFrame with a for loop, it would be better to distribute the workload across the partitions using foreachPartition. Moreover, since you are writing a custom query, instead of executing one query per row it would be more efficient to execute one batch operation per partition to reduce round trips, latency and concurrent connections. E.g.:
def update_db(rows):
    # Build one derived table (SELECT ... UNION ALL SELECT ...) from the partition's rows
    temp_table_query = ""
    for row in rows:
        if len(temp_table_query) > 0:
            temp_table_query = temp_table_query + " UNION ALL "
        temp_table_query = temp_table_query + " SELECT '%s' as RUNDATE, '%s' as NAME, %d as VALUE " % (row.RUN_DATE, row.NAME, row.VALUE)
    # An empty partition would otherwise produce an invalid UPDATE statement
    if not temp_table_query:
        return
    update_query = """
    UPDATE DBTABLE
    INNER JOIN (
    %s
    ) new_records ON DBTABLE.NAME = new_records.NAME
    SET
    DBTABLE.DATE = new_records.RUNDATE,
    DBTABLE.VALUE = new_records.VALUE
    """ % (temp_table_query)
    mysql_conn(host, user_name, password, update_query)

df.foreachPartition(update_db)
View Demo on how the UPDATE query works
Let me know if this works for you.
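One thing the string-formatted query above leaves open is quoting: a NAME containing a quote character would break the statement. A hedged alternative is to send one parameterized UPDATE per row in a single executemany call per partition. The sketch below uses sqlite3 from the standard library purely as a stand-in so it can run anywhere; the Row namedtuple and the update_partition helper are hypothetical names, and with a MySQL driver such as MySQLdb the placeholder would be %s instead of ?.

```python
import sqlite3
from collections import namedtuple

# Stand-in for the Row objects Spark hands to foreachPartition (assumption).
Row = namedtuple("Row", ["RUN_DATE", "NAME", "VALUE"])


def update_partition(conn, rows):
    """Apply one parameterized UPDATE per row, sent as a single batch."""
    params = [(row.RUN_DATE, row.VALUE, row.NAME) for row in rows]
    if not params:
        # Nothing to do for an empty partition.
        return 0
    conn.executemany(
        "UPDATE DBTABLE SET DATE = ?, VALUE = ? WHERE NAME = ?", params
    )
    conn.commit()
    return len(params)


# Tiny in-memory table that mimics the target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DBTABLE (NAME TEXT PRIMARY KEY, DATE TEXT, VALUE INTEGER)")
conn.executemany("INSERT INTO DBTABLE VALUES (?, NULL, NULL)", [("A",), ("B",)])

partition = iter([Row("2021-10-01", "A", 25), Row("2021-10-02", "B", 24)])
updated = update_partition(conn, partition)
table_rows = conn.execute(
    "SELECT NAME, DATE, VALUE FROM DBTABLE ORDER BY NAME"
).fetchall()
print(updated, table_rows)
```

In Spark, the connection would have to be opened inside the function passed to foreachPartition, since a database connection cannot be shipped from the driver to the executors.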

Sql Alchemy filter / where clause joined by OR not AND

I want to select a bunch of distinct records based on a composite key. In SQL I'd write something like this:
SELECT * FROM security WHERE (
    (exchange_code = 'exchange_code_1' AND code = 'code_1')
    OR (exchange_code = 'exchange_code_2' AND code = 'code_2')
    ...
    OR (exchange_code = 'exchange_code_N' AND code = 'code_N')
)
With SQLAlchemy I'd like to use the filter clause like:
query = sess.query(Security)
[query.filter(
and_(Security.exchange_code == security.exchange_code,
Security.code == security.code)
) for security in securities]
result = query.all()
The problem is that filter and where join their clauses with AND, not OR. Is there some way to use filter with OR?
Or is my only choice to generate a bunch of individual select's and UNION them? Something like:
first = exchanges.pop()
query = reduce(lambda query, exchange: query.union(exchange.pk_query),
first.pk_query())
query.all()
Use or_:
query = sess.query(Security).filter(
or_(*(and_(Security.exchange_code == security.exchange_code,
Security.code == security.code)
for security in securities)))
If your database supports it, you should use tuple_ instead.
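To make explicit what the or_(and_(...)) expression boils down to at the SQL level, the sketch below builds the equivalent WHERE clause by hand with bound parameters. It deliberately uses sqlite3 from the standard library rather than SQLAlchemy so that it runs without extra packages; the helper name select_by_composite_keys and the sample rows are invented for illustration.

```python
import sqlite3


def select_by_composite_keys(conn, keys):
    """Fetch rows whose (exchange_code, code) matches any pair in keys."""
    keys = list(keys)
    if not keys:
        return []
    # One "(... AND ...)" group per key, with the groups joined by OR.
    where = " OR ".join(["(exchange_code = ? AND code = ?)"] * len(keys))
    # Flatten [(e1, c1), (e2, c2)] into [e1, c1, e2, c2] to match the placeholders.
    params = [value for pair in keys for value in pair]
    sql = (
        "SELECT exchange_code, code FROM security WHERE "
        + where
        + " ORDER BY exchange_code, code"
    )
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE security (exchange_code TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO security VALUES (?, ?)",
    [("NYSE", "IBM"), ("NYSE", "GE"), ("LSE", "IBM"), ("LSE", "BP")],
)
matches = select_by_composite_keys(conn, [("NYSE", "IBM"), ("LSE", "BP")])
print(matches)
```

Note that ("LSE", "IBM") is not returned even though both values appear in the key list: the pairs are matched as a unit, which is the point of the composite-key filter.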

Conversion of Foxpro code to Set-Based MySQL Query

I'm trying to convert some Visual FoxPro code to a set-based MySQL query. The following is the code segment from FoxPro:
lnFound=0
IF LnFound = 0 .and. rcResult = "ALL" AND PcOpOrIp = "OP"
SELECT PFile
LcTag = ORDER()
SET ORDER TO TAG PtcntlNm
=SEEK(LcPatientNo)
SCAN WHILE PtcntlNm = LcPatientNo
IF GcMResult <= "0"
GcMResult = "1-7MAT-PTC"
ENDIF
IF MONTH(cSRa.Fromdate) = MONTH(pFile.Fromdate) ;
.AND. pFile.ThruDate >= cSRa.ThruDate
** Check From/Thru Date against pFile
IF (ABS(cSRa.totalchrg) = (pFile.BDeduct+pFile.Deduct+pFile.Coinsur)) .OR. cSRa.Tchrgs = (pFile.BDeduct+pFile.Deduct+pFile.Coinsur) .or. (ABS(cSRa.totalchrg) = pFile.Total .OR. cSRa.Tchrgs = pFile.Total)
IF lnFound = 0
gcRecid = recid
gcmResult=rcResult
ENDIF
lnFound = lnFound + 1
gcUNrECID = gcunRecid + IIF(EMPTY(gCUNreCID),Recid,[,]+recid)
ENDIF
ENDIF
ENDSCAN
SELECT PFile
SET ORDER TO &LcTag
ENDIF
I have a table named pfile which I'm trying to join with another table named csra. The main aim is to set the record id (gcrecid) based on the conditions of three nested IF statements. After gcrecid is set, lnfound is set to one, so the third IF condition is false from the second iteration onwards.
Here is the MySQL stored procedure which I came up with and as you can see I'm not able to completely convert the code in an efficient manner.
UPDATE csra AS cs
JOIN p051331s AS p ON cs.patientno = p.ptcntlnm
SET cs.recid = p.recid
, cs.mcsult = "ALL"
, cs.lnfound = '"1"'
WHERE cs.provider = '051331'
AND cs.lnfound = "0"
AND cs.RECID IS NULL
AND month(cs.fromdate) = month(p.fromdate)
AND p.thrudate >= cs.ThruDate
AND ABS(cs.totalchrg) = (p.bdeduct+p.deduct+p.coinsur)
OR cs.tchrgs = (p.bdeduct+p.deduct+p.coinsur)
OR ABS(cs.totalchrg) = p.total OR cs.tchrgs = p.total;
Any lead in this regard will be much appreciated, as I've been working on this procedure for a couple of days with no noticeable results.
According to this partial VFP code (which is not clear about the variables it uses), there is no code to be converted to set-based SQL at all. The corresponding MySQL, MS SQL or any other SQL backend code would simply be "nothing", i.e. this would be equivalent:
-- Hello to mySQL or MS SQL
PS: In your attempt to convert it to an UPDATE, inner joining with csra is wrong. It is not joined in the VFP code; the csra values are constant (pointing only to the "current row" values in csra), unless there is a relation set on the fields. You would want to turn them into parameters, as with the rest of the memory variables (the code does not make clear which ones are memory variables).

How specify join variables with different names in different MySQL tables

I need to join two tables where the common column-id that I want to use has a different name in each table. The two tables have a "false" common column name that does not work when dplyr takes the default and joins on columns "id".
Here's some of the code involved in this problem
library(dplyr)
library(RMySQL)
SDB <- src_mysql(host = "localhost", user = "foo", dbname = "bar", password = getPassword())
# Then reference a tbl within that src
administrators <- tbl(SDB, "administrators")
members <- tbl(SDB, "members")
Here are 3 attempts, all of which fail, to pass along the information that the common column on the members side is "id" and on the administrators side it's "idmember":
sqlq <- semi_join(members,administrators, by=c("id","idmember"))
sqlq <- inner_join(members,administrators, by= "id.x = idmember.y")
sqlq <- semi_join(members,administrators, by.x = id, by.y = idmember)
Here's an example of the kinds of error messages I'm getting:
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not run statement: Unknown column '_LEFT.idmember' in 'where clause')
The examples I see out there pertain to data tables and data frames on the R side. My question is about how dplyr sends "by" statements to a SQL engine.
In the next version of dplyr, you'll be able to do:
inner_join(members, administrators, by = c("id" = "idmember"))
Looks like this is an unresolved issue:
https://github.com/hadley/dplyr/issues/177
However you can use merge:
❥ admin <- as.tbl(data.frame(id = c("1","2","3"),false = c(TRUE,FALSE,FALSE)))
❥ members <- as.tbl(data.frame(idmember = c("1","2","4"),false = c(TRUE,TRUE,FALSE)))
❥ merge(admin,members, by.x = "id", by.y = "idmember")
id false.x false.y
1 1 TRUE TRUE
2 2 FALSE TRUE
If you need to do left or outer joins, you can always use the all.x or all arguments to merge. A thought though... You've got a SQL db, why not use it?
❥ con <- dbConnect(MySQL(), host = "localhost", user = "foo", dbname = "bar", password = getPassword())
❥ dbGetQuery(con, "select * from admin join members on id = idmember")

For loop in python and use elements of list as variables for MYSQL query

Two questions:
I have a list of filenames, each of which I would like to feed into a MySQL query.
The first question is: how do I loop through the file list and pass the elements (the filenames) as a variable to MySQL?
The second question is: how do I print the results in a more elegant way, without the parentheses and L's from the tuple output that is returned? The way I have below works for three columns, but I'd like a flexible way where I don't have to add sublists (cleaned1, 2, ...) when I fetch more columns.
Any help is highly appreciated!
MyConnection = MySQLdb.connect( host = "localhost", user = "root", \
passwd = "xxxx", db = "xxxx")
MyCursor = MyConnection.cursor()
**MyList= (File1, File2, File3, File...., File36)
For i in Mylist:
do MYSQL query**
SQL = """SELECT a.column1, a.column2, b.column2 FROM **i in MyList** a, table2 b WHERE
a.column1=b.column1;"""
SQLLen = MyCursor.execute(SQL) # returns the number of records retrieved
AllOut = MyCursor.fetchall()
**List = list(AllOut) # this puts all the TUple information into a list
cleaned = [i[0] for i in List] # this cleans up the Tuple characters)
cleaned1 = [i[1] for i in List] # this cleans up the Tuple characters)
cleaned2 = [i[2] for i in List] # this cleans up the Tuple characters)
NewList=zip(cleaned,cleaned1,cleaned2) # This makes a new List
print NewList[0:10]**
# Close the files
MyCursor.close()
MyConnection.close()
I can figure out the saving to file, but I don't know how to pass a python variable into MYSQL.
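Convert the tuple to a list first, using:
MyList = list(MyList)
Then you have two options. Note that a table name cannot be passed as a query parameter to execute (the driver would quote it as a string literal), so it has to be substituted into the SQL string itself; only do this with table names you trust.
Try this:
for tablename in MyList:
    MyCursor.execute("SELECT a.column1, a.column2, b.column2 FROM %s a, table2 b WHERE a.column1=b.column1" % tablename)
or:
for tablename in MyList:
    SQL = "SELECT a.column1, a.column2, b.column2 FROM tablevar a, table2 b WHERE a.column1=b.column1"
    SQL = SQL.replace('tablevar', tablename)
    MyCursor.execute(SQL)
To print the results without the brackets, whatever the number of columns, you can use:
for row in MyCursor.fetchall():
    print ' '.join(str(value) for value in row)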
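Putting both parts together, here is a self-contained sketch in Python 3 syntax. It uses sqlite3 from the standard library as a stand-in for MySQLdb so that it can run without a server; the tables file1, file2 and table2 and their contents are made up, and the helpers fetch_joined and format_rows are hypothetical names. Because the table name is spliced into the SQL text, it is checked against a fixed list first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical tables standing in for File1..File36 and table2.
cur.execute("CREATE TABLE file1 (column1 INTEGER, column2 TEXT)")
cur.execute("CREATE TABLE file2 (column1 INTEGER, column2 TEXT)")
cur.execute("CREATE TABLE table2 (column1 INTEGER, column2 TEXT)")
cur.executemany("INSERT INTO file1 VALUES (?, ?)", [(1, "a"), (2, "b")])
cur.executemany("INSERT INTO file2 VALUES (?, ?)", [(2, "c")])
cur.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "x"), (2, "y")])

ALLOWED_TABLES = ("file1", "file2")


def fetch_joined(cursor, table_name):
    """Run the join for one table; the name must come from the allowed list."""
    if table_name not in ALLOWED_TABLES:
        raise ValueError("unexpected table name: %r" % (table_name,))
    sql = (
        "SELECT a.column1, a.column2, b.column2 "
        "FROM {} a, table2 b WHERE a.column1 = b.column1 "
        "ORDER BY a.column1"
    ).format(table_name)
    cursor.execute(sql)
    return cursor.fetchall()


def format_rows(rows):
    """Turn each result tuple into a tab-separated line, for any column count."""
    return ["\t".join(str(value) for value in row) for row in rows]


for table_name in ALLOWED_TABLES:
    for line in format_rows(fetch_joined(cur, table_name)):
        print(table_name, line, sep="\t")

cur.close()
```

Joining on str(value) avoids one list per column, so adding columns to the SELECT needs no change to the printing code.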