As part of hosting my website, I have an SQL server on pythonanywhere.com with some data collected from the site. I need to aggregate some of that information into a new table stored in the same database. With the code below I can create the new table, as confirmed by the SHOW TABLES query. However, I cannot see that table in the Django online interface provided alongside the SQL server.
Why is that the case? How can I make the new table visible in the Django interface so I can browse and modify its content?
from __future__ import print_function
from mysql.connector import connect as sql_connect
import sshtunnel
from sshtunnel import SSHTunnelForwarder
from copy import deepcopy
sshtunnel.SSH_TIMEOUT = 5.0
sshtunnel.TUNNEL_TIMEOUT = 5.0
def try_query(query):
    try:
        cursor.execute(query)
        connection.commit()
    except Exception:
        connection.rollback()
        raise

if __name__ == '__main__':
    remote_bind_address = ('{}.mysql.pythonanywhere-services.com'.format(SSH_USERNAME), 3306)
    tunnel = SSHTunnelForwarder(('ssh.pythonanywhere.com'),
                                ssh_username=SSH_USERNAME, ssh_password=SSH_PASSWORD,
                                remote_bind_address=remote_bind_address)
    tunnel.start()
    connection = sql_connect(user=SSH_USERNAME, password=DATABASE_PASSWORD,
                             host='127.0.0.1', port=tunnel.local_bind_port,
                             database=DATABASE_NAME)
    print("Connection successful!")
    cursor = connection.cursor()  # get the cursor
    cursor.execute("USE {}".format(DATABASE_NAME))  # select the database
    cursor.execute("SHOW TABLES")
    prev_tables = deepcopy(cursor.fetchall())
    try_query("CREATE TABLE IF NOT EXISTS TestTable(TestName VARCHAR(255) PRIMARY KEY, SupplInfo VARCHAR(255))")
    print("Created table.")
    cursor.execute("SHOW TABLES")
    new_tables = deepcopy(cursor.fetchall())
I have Airflow installed on Ubuntu running under WSL on Windows.
I am trying to load a delimited file stored on my C drive into a MySQL database using the code below:
import datetime  # needed for start_date below
import logging
import os
import csv
from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.mysql_operator import MySqlOperator
from airflow.hooks.mysql_hook import MySqlHook

def bulk_load_sql(table_name, **kwargs):
    local_filepath = 'some c drive path'
    conn = MySqlHook(conn_name_attr='mysql_default')
    conn.bulk_load(table_name, local_filepath)
    return table_name

dag = DAG(
    "dag_name",
    start_date=datetime.datetime.now() - datetime.timedelta(days=1),
    schedule_interval=None)

t1 = PythonOperator(
    task_id='csv_to_stgtbl',
    provide_context=True,
    python_callable=bulk_load_sql,
    op_kwargs={'table_name': 'mysqltablnm'},
    dag=dag
)
It raises the following exception:
MySQLdb._exceptions.OperationalError: (2068, 'LOAD DATA LOCAL INFILE file request rejected due to restrictions on access.')
I have checked the following setting in MySQL and it is ON:
SHOW GLOBAL VARIABLES LIKE 'local_infile'
Could someone please provide some pointers on how to fix this?
Is there any other way to load a delimited file into MySQL using Airflow?
For now, I have implemented a workaround as follows:
def load_staging():
    mysqlHook = MySqlHook(conn_name_attr='mysql_default')
    #cursor = conn.cursor()
    conn = mysqlHook.get_conn()
    cursor = conn.cursor()
    csv_data = csv.reader(open('c drive file path'))
    header = next(csv_data)
    logging.info('Importing the CSV Files')
    for row in csv_data:
        #print(row)
        cursor.execute("INSERT INTO table_name (col1,col2,col3) VALUES (%s, %s, %s)",
                       row)
    conn.commit()
    cursor.close()

t1 = PythonOperator(
    task_id='csv_to_stgtbl',
    python_callable=load_staging,
    dag=dag
)
However, it would be great if LOAD DATA LOCAL INFILE worked.
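Until the bulk load works, the row-by-row workaround can at least send rows in batches through the DB-API `executemany` call. The following is a minimal sketch, not Airflow-specific: `load_csv_in_batches` and `batch_size` are hypothetical names, and the demo uses the standard-library `sqlite3` module (with `?` placeholders) as a stand-in so that it runs without a MySQL server. With a MySQL driver the placeholder would be `%s`, as in the workaround above.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_batches(conn, csv_file, insert_sql, batch_size=1000):
    """Insert CSV rows in batches and return the number of rows inserted.

    conn is any DB-API connection and csv_file an open text file object.
    Hypothetical helper, not part of Airflow.
    """
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    cursor = conn.cursor()
    total = 0
    try:
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            cursor.executemany(insert_sql, batch)
            total += len(batch)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
    return total


# Demo: an in-memory SQLite database stands in for MySQL (assumption).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE table_name (col1 TEXT, col2 TEXT, col3 TEXT)")
demo_csv = io.StringIO("col1,col2,col3\na,b,c\nd,e,f\ng,h,i\n")
inserted = load_csv_in_batches(
    demo_conn,
    demo_csv,
    "INSERT INTO table_name (col1, col2, col3) VALUES (?, ?, ?)",
    batch_size=2,
)
```

Inside the DAG, the connection would presumably come from `mysqlHook.get_conn()` and the file object from `open(...)` on the CSV path.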
I have MySQL running on-prem and would like to migrate it to MySQL running on Cloud SQL (GCP). I first want to export the tables to Cloud Storage as JSON files and then move them from there to MySQL (Cloud SQL) and BigQuery.
Now I wonder how I should do this: export each table as JSON, or just dump the whole database to Cloud Storage? (We might need to change the schemas of some tables, which is why I'm thinking of doing it one table at a time.)
Is there any way of doing this with Python and pandas?
I found this: "Pandas Dataframe to Cloud Storage Bucket",
but I don't understand how to connect it to my GCP Cloud Storage, or how to run mycursor.execute("SELECT * FROM table") for all my tables.
EDIT 1:
I came up with the script below, but it only works for the selected schema and table. How can I do this for all tables in the schema?
#!/usr/bin/env python3
import mysql.connector
import pandas as pd
from google.cloud import storage
from google.oauth2 import service_account
import os
import csv
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/python2/key.json"
#export GOOGLE_APPLICATION_CREDENTIALS="/home/python2/key.json"
#credentials = storage.Client.from_service_account_json('/home/python2/key.json')
#credentials = service_account.Credentials.from_service_account_file('key.json')
mydb = mysql.connector.connect(
    host="localhost", user="root", passwd="pass_word", database="test")
mycursor = mydb.cursor(named_tuple=True)
mycursor.execute("SELECT * FROM test")
myresult = mycursor.fetchall()
df = pd.DataFrame(data=myresult)
storage_client = storage.Client()
bucket = storage_client.get_bucket("my-buckets-1234567")
blob = bucket.blob("file.json")
df = pd.DataFrame(data=myresult).to_json(orient='records')
#df = pd.DataFrame(data=myresult).to_csv(sep=";", index=False, quotechar='"', quoting=csv.QUOTE_ALL, encoding="UTF-8")
blob.upload_from_string(data=df)
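To cover every table in the schema, one option is to list the table names first and repeat the export once per table. The sketch below rests on some assumptions: `export_all_tables` and `upload` are hypothetical names, `upload` stands for whatever stores the string in the bucket (for example a small function wrapping `bucket.blob(name).upload_from_string(data)` from the script above), and the standard `json` module is swapped in for pandas to keep the example self-contained. The demo runs against in-memory SQLite; for MySQL the listing query would be `SHOW TABLES`.

```python
import json
import sqlite3


def export_all_tables(cursor, list_tables_sql, upload):
    """Export every table as a JSON array of records.

    cursor: any DB-API cursor.
    list_tables_sql: query whose first column is the table name
        (for MySQL this would be "SHOW TABLES").
    upload: callable(name, data) that stores the JSON string (hypothetical).
    """
    cursor.execute(list_tables_sql)
    table_names = [row[0] for row in cursor.fetchall()]
    for table in table_names:
        # Table names come from the database itself, not from user input.
        cursor.execute("SELECT * FROM `{}`".format(table))
        columns = [col[0] for col in cursor.description]
        records = [dict(zip(columns, row)) for row in cursor.fetchall()]
        # default=str converts dates, decimals and similar values to strings.
        upload("{}.json".format(table), json.dumps(records, default=str))
    return table_names


# Demo: SQLite stands in for MySQL and a dict stands in for the bucket.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE users (id INTEGER, name TEXT)")
demo.execute("CREATE TABLE orders (id INTEGER, total REAL)")
demo.execute("INSERT INTO users VALUES (1, 'ann')")
uploaded = {}
exported = export_all_tables(
    demo.cursor(),
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name",
    uploaded.__setitem__,
)
```

Writing one file per table keeps each export independent, which fits the plan of changing some schemas table by table.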
I have the code below. I have supplied the account, username, password, etc., but I'm still seeing the following error:
raise error_class( sqlalchemy.exc.ProgrammingError:
(snowflake.connector.errors.ProgrammingError) 251001: Account must be
specified
I've also tried changing the engine variable in my create_db_engine function as shown below, but I see the same error:
engine = snowflake.connector.connect(
    user='USER',
    password='PASSWORD',
    account='ACCOUNT',
    warehouse='WAREHOUSE',
    database='DATABASE',
    schema='SCHEMA'
)
Here is my code:
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas, pd_writer
from pandas import json_normalize
import requests

df = 'my_dataframe'

def create_db_engine(db_name, schema_name):
    engine = URL(
        account="ab12345.us-west-2.snowflakecomputing.com",
        user="my_user",
        password="my_pw",
        database="DB",
        schema="PUBLIC",
        warehouse="WH1",
        role="DEV"
    )
    return engine

def create_table(out_df, table_name, idx=False):
    url = create_db_engine(db_name="db", schema_name="skm")
    engine = create_engine(url)
    connection = engine.connect()
    try:
        out_df.to_sql(
            table_name, connection, if_exists="append", index=idx, method=pd_writer
        )
    except ConnectionError:
        print("Unable to connect to database!")
    finally:
        connection.close()
        engine.dispose()
    return True

print(df.head)
create_table(df, "reporting")
According to the Snowflake documentation for SQLAlchemy, the account parameter should not include snowflakecomputing.com.
So try ab12345.us-west-2 instead; the connector appends the domain part automatically.
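If the account value arrives as a full hostname (from a config file, say), the suffix can be stripped before it is handed to `URL(...)`. This is only a sketch: `normalize_account` and `SNOWFLAKE_DOMAIN` are hypothetical names, not part of the Snowflake connector.

```python
SNOWFLAKE_DOMAIN = ".snowflakecomputing.com"


def normalize_account(account):
    """Strip the domain suffix so only the account identifier remains."""
    if account.endswith(SNOWFLAKE_DOMAIN):
        return account[: -len(SNOWFLAKE_DOMAIN)]
    return account


# Value taken from the question; the result is what would be passed
# as account=... to URL(...).
account = normalize_account("ab12345.us-west-2.snowflakecomputing.com")
```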
I'm using Windows 7 and MySQL 8.0. I tried to edit my.ini after stopping the service. When I tried to save my.ini with secure_file_priv = "", I got an access-denied error. So I saved the file as 'my1.ini', deleted 'my.ini', and renamed 'my1.ini' to 'my.ini'. Now when I try to start the MySQL80 service from Administrative Tools > Services, it no longer starts. I have also tried this from the CLI client, but it raises the secure_file_priv error. How do I do it? I've been able to store the scraped data in a MySQL database using Scrapy, but I am not able to export it to my project directory.
#pipelines.py
from itemadapter import ItemAdapter
import mysql.connector

class QuotewebcrawlerPipeline(object):
    def __init__(self):
        self.create_connection()
        self.create_table()
        #self.dump_database()

    def create_connection(self):
        """
        This method will create the database connection & the cursor object
        """
        self.conn = mysql.connector.connect(host = 'localhost',
                                            user = 'root',
                                            passwd = 'Pxxxx',
                                            database = 'itemcontainer'
                                            )
        self.cursor = self.conn.cursor()

    def create_table(self):
        self.cursor.execute(""" DROP TABLE IF EXISTS my_table""")
        self.cursor.execute(""" CREATE TABLE my_table (
                            Quote text,
                            Author text,
                            Tag text)"""
                            )

    def process_item(self, item, spider):
        #print(item['quote'])
        self.store_db(item)
        return item

    def store_db(self,item):
        """
        This method is used to write the scraped data from item container into the database
        """
        #pass
        self.cursor.execute(""" INSERT INTO my_table VALUES(%s,%s,%s)""",(item['quote'][0],item['author'][0],
                            item['tag'][0])
                            )
        self.conn.commit()
        #self.dump_database()

    # def dump_database(self):
    #     self.cursor.execute("""USE itemcontainer;SELECT * from my_table INTO OUTFILE 'quotes.txt'""",
    #                         multi = True
    #                         )
    #     print("Data saved to output file")
#item_container.py
import scrapy
from ..items import QuotewebcrawlerItem

class ItemContainer(scrapy.Spider):
    name = 'itemcontainer'
    start_urls = [
        "http://quotes.toscrape.com/"
    ]

    def parse(self,response):
        items = QuotewebcrawlerItem()
        all_div_quotes = response.css("div.quote")
        for quotes in all_div_quotes:
            quote = quotes.css(".text::text").extract()
            author = quotes.css(".author::text").extract()
            tag = quotes.css(".tag::text").extract()
            items['quote'] = quote
            items['author'] = author
            items['tag'] = tag
            yield items
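Because `SELECT ... INTO OUTFILE` writes the file on the server side, which is what `secure_file_priv` restricts, one way around the setting is to fetch the rows through the cursor and write the file from Python instead. The sketch below assumes a hypothetical helper named `dump_query_to_csv`. The demo uses the standard-library `sqlite3` module and a temporary directory as stand-ins for the MySQL connection and the project directory.

```python
import csv
import os
import sqlite3
import tempfile


def dump_query_to_csv(cursor, query, path):
    """Run query and write the result, with a header row, to path."""
    cursor.execute(query)
    header = [col[0] for col in cursor.description]
    row_count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor.fetchall():
            writer.writerow(row)
            row_count += 1
    return row_count


# Demo: SQLite stands in for the MySQL connection used by the pipeline.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE my_table (Quote TEXT, Author TEXT, Tag TEXT)")
demo.execute("INSERT INTO my_table VALUES ('q1', 'a1', 't1')")
out_path = os.path.join(tempfile.mkdtemp(), "quotes.csv")
written = dump_query_to_csv(demo.cursor(), "SELECT * FROM my_table", out_path)
```

In the pipeline, the same helper could presumably be called with `self.cursor` once the spider has finished, so no server-side file permission is involved.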
I use the DBI and RMySQL packages to import a whole table from the database. The code works as expected. Is there a faster way to import the same table multiple times? For example, I import the table, do some calculations, close the R session, and then import the same table again tomorrow. Is there a way to cache that table so the repeated import is faster?
The code example (working as expected):
library(RMySQL)
library(DBI)
# connect to database
connection <- function() {
  con <- DBI::dbConnect(RMySQL::MySQL(),
                        host = "91.234.xx.xxx",
                        port = 3306L,
                        dbname = "xxxx",
                        username = "xxxx",
                        password = "xxxx",
                        Trusted_Connection = "True")
}
# import
db <- connection()
vix <- DBI::dbGetQuery(db, 'SELECT * FROM VIX')
invisible(dbDisconnect(db))