I have a simple FastAPI endpoint that connects to a MySQL database using SQLAlchemy (based on the tutorial: https://fastapi.tiangolo.com/tutorial/sql-databases/).
I create a session using:
engine = create_engine(
    SQLALCHEMY_DATABASE_URL
)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
I create the dependency:
def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
In my route I want to execute an arbitrary SQL statement, but I am not sure how to handle the session, connection, cursor, etc. correctly (including closing them), which I learned the hard way is critical for performance.
#app.get("/get_data")
def get_data(db: Session = Depends(get_db)):
???
Ultimately the reason for this is that my table contains machine learning features with columns that are not determined beforehand. If there is a way to define a Base model with "all columns", that would work too, but I couldn't find that either.
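For reference, one way the ??? could be filled in with the injected Session is the minimal sketch below (this assumes SQLAlchemy 1.4+ execution with text(); FEATURE_TABLE is just a placeholder name):

from sqlalchemy import text

@app.get("/get_data")
def get_data(db: Session = Depends(get_db)):
    # The Session checks a connection out of the engine's pool; get_db() closes it afterwards
    result = db.execute(text("SELECT * FROM FEATURE_TABLE"))
    # Convert each row into a plain dict keyed by column name so FastAPI can serialize it
    return [dict(row) for row in result.mappings()]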
I solved this using the https://www.encode.io/databases/ package instead. It handles all connections/sessions, etc. under the hood. Simplified snippet:
database = databases.Database(DATABASE_URL)
#app.get("/read_db")
async def read_db():
data = await database.fetch_all("SELECT * FROM USER_TABLE")
return data
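One detail the snippet leaves out: the databases package needs an explicit connect/disconnect, which is typically hooked into FastAPI's startup and shutdown events, roughly like this:

@app.on_event("startup")
async def startup():
    await database.connect()

@app.on_event("shutdown")
async def shutdown():
    await database.disconnect()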
import pymysql
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://' + uname + ':' + password + '@' + server + ':' + port + '/' + db)
con = engine.connect()
df = pd.read_sql('SELECT schema_name FROM information_schema.schemata', con)
return df
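Since the question is specifically about closing things correctly, the same read can be done with a context manager so the connection is always returned to the pool; a minimal sketch using the same connection string and query:

with engine.connect() as con:
    # The connection is released back to the pool when the block exits
    df = pd.read_sql('SELECT schema_name FROM information_schema.schemata', con)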
I am using FastAPI to build the GUI and RESTful APIs. When one API is called, it triggers pytest as a background task. I want to use a table in the database to monitor the progress of the test cases in pytest. The RESTful API logic can then also refer to that table to update the test progress in the GUI. To do that, I am currently using two DB sessions based on SQLAlchemy to query and update the database.
In the main.py of FastAPI, I implement the db session as below.
def get_db():
    try:
        db = SessionLocal()
        yield db
    finally:
        db.close()
#app.post("/runtest")
async def run_test(test_request: TestRequest, backgroud_tasks: BackgroundTasks, db: Session = Depends(get_db)):
"""
add a test request and trigger background task
"""
test = Test()
test.panel_id = test_request.panel_id
test.device_pos = test_request.device_pos
db.add(test)
db.commit()
backgroud_tasks.add_task(schedule_test, test.id)
In the schedule_test background task, I call pytest.main to start the tests using the pytest framework. On the pytest side, I implemented fixtures to set up another session to talk to the DB.
#pytest.fixture(scope="session")
def connection() -> Any:
SQLALCHEMY_DATABASE_URL = "postgresql+psycopg2://xxxx"
engine = create_engine(SQLALCHEMY_DATABASE_URL)
return engine.connect()
#pytest.fixture(scope="session")
def db_session(connection) -> Session:
"""Returns an sqlalchemy session, and after the test tears down everything properly."""
transaction = connection.begin()
session = sessionmaker(autocommit=False, autoflush=False, bind=connection)
yield session()
Then in my test case, I want to update the table in the database as below.
@pytest.mark.dependency(name="test_uid", scope="session")
@pytest.mark.order(1)
def test_case1(
    db_session: Session
) -> None:
    db_session.query(Test).filter(Test.device_pos == '1').update({'test_progress': 'Started'})
    db_session.commit()
However, I found that the DB session in pytest cannot actually update the table, even after calling commit().
What could be wrong in this implementation? Is there a better way to share the DB session between FastAPI and pytest? Thanks a lot!
I am trying to use the federated learning framework Flower with TensorFlow. My code seems to run fine, but it's not showing the federated loss and accuracy. What am I doing wrong?
Server-side code:
import flwr as fl
import sys
import numpy as np

class SaveModelStrategy(fl.server.strategy.FedAvg):
    def aggregate_fit(
        self,
        rnd,
        results,
        failures
    ):
        aggregated_weights = super().aggregate_fit(rnd, results, failures)
        """if aggregated_weights is not None:
            # Save aggregated_weights
            print(f"Saving round {rnd} aggregated_weights...")
            np.savez(f"round-{rnd}-weights.npz", *aggregated_weights)"""
        return aggregated_weights

# Create strategy and run server
strategy = SaveModelStrategy()

# Start Flower server for two rounds of federated learning
fl.server.start_server(
    server_address='localhost:' + str(sys.argv[1]),
    # server_address="[::]:8080",
    config={"num_rounds": 2},
    grpc_max_message_length=1024 * 1024 * 1024,
    strategy=strategy
)
Server side:
According to the source code of app.py, I realized that we can set force_final_distributed_eval = True, and we need to pass this to fl.server.start_server().
Not sure whether this is intended, but it solved my problem.
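In code, that would amount to something like the sketch below (based on the server snippet above; the keyword argument is the one referred to in this answer, and its availability depends on the Flower version in use):

fl.server.start_server(
    server_address='localhost:' + str(sys.argv[1]),
    config={"num_rounds": 2},
    grpc_max_message_length=1024 * 1024 * 1024,
    strategy=strategy,
    # Force a final round of distributed evaluation so federated loss/accuracy get reported
    force_final_distributed_eval=True,
)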
I am using Celery to achieve async jobs in Python. My code flow is as follows:
1. A Celery task gets some data from a remote API.
2. Celery Beat gets the task result from the Celery backend (which is Redis) and then inserts the result into MySQL.
But in step 2, before I insert the result data into MySQL, I check whether the data already exists. Although I do the check, duplicate data still gets inserted.
My code is as follows:
def get_task_result(logger=None):
    db = MySQLdb.connect(host=MYSQL_HOST, port=MYSQL_PORT, user=MYSQL_USER, passwd=MYSQL_PASSWD, db=MYSQL_DB, cursorclass=MySQLdb.cursors.DictCursor, use_unicode=True, charset='utf8')
    cursor = db.cursor()
    ....
    ....
    store_subdomain_result(db, cursor, asset_id, celery_task_result)
    ....
    ....
    cursor.close()
    db.close()
def store_subdomain_result(db, cursor, top_domain_id, celery_task_result, logger=None):
    subdomain_list = celery_task_result.get('result').get('subdomain_list')
    source = celery_task_result.get('result').get('source')
    for domain in subdomain_list:
        query_subdomain_sql = f'SELECT * FROM nw_asset WHERE domain="{domain}"'
        cursor.execute(query_subdomain_sql)
        sub_domain_result = cursor.fetchone()
        if sub_domain_result:
            asset_id = sub_domain_result.get('id')
            existed_source = sub_domain_result.get('source')
            if source not in existed_source:
                new_source = f'{existed_source},{source}'
                update_domain_sql = f'UPDATE nw_asset SET source="{new_source}" WHERE id={asset_id}'
                cursor.execute(update_domain_sql)
                db.commit()
        else:
            insert_subdomain_sql = f'INSERT INTO nw_asset(domain) values("{domain}")'
            cursor.execute(insert_subdomain_sql)
            db.commit()
I first select to check whether the data exists; if it does not exist, I do the insert. The code is as follows:
query_subdomain_sql = f'SELECT * FROM nw_asset WHERE domain="{domain}"'
cursor.execute(query_subdomain_sql)
sub_domain_result = cursor.fetchone()
I do this, but it still inserts duplicate data, and I can't understand why.
I googled this question and some people say to use INSERT IGNORE, REPLACE INTO, or a unique index, but I want to know why the code does not work as expected.
Also, could there be some cache in MySQL, so that when I do the SELECT the data is not really in MySQL yet (it is only in some flush buffer), and the SELECT therefore returns None?
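For reference, the INSERT IGNORE / unique index suggestion mentioned above would look roughly like the sketch below (only an illustration of that suggestion, using the table and column from the snippet; it does not by itself explain why the existence check fails):

# One-time schema change, assuming domain is supposed to be unique:
cursor.execute('ALTER TABLE nw_asset ADD UNIQUE INDEX uniq_domain (domain)')
# Parameterized insert that silently skips rows whose domain already exists:
cursor.execute('INSERT IGNORE INTO nw_asset(domain) VALUES (%s)', (domain,))
db.commit()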
First of all, sorry for my lack of knowledge regarding databases, this is my first time working with them.
I am having some issues trying to get the data from an Excel file and put it into a database.
Using answers from the site, I managed to kind of connect to the database by doing this.
import pandas as pd
import pyodbc
server = 'XXXXX'
db = 'XXXXXdb'
# create Connection and Cursor objects
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()
# read data from excel
data = pd.read_excel('data.csv')
But I don't really know what to do now.
I have 3 tables, which are connected by a 'productID'. My Excel file mimics the database, meaning that all the columns in the Excel file have a place to go in the DB.
My plan was to read the Excel file, make lists from each column, and then insert each column's values into the DB, but I have no idea how to create a query that can do this.
Once I get the query I think the data insertion can be done like this:
query = "xxxxxxxxxxxxxx"
for row in data:
#The following is not the real code
productID = productID
name = name
url = url
values = (productID, name, url)
cursor.execute(query,values)
conn.commit()
conn.close
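With pyodbc, the query in that sketch would use ? placeholders, for example (just an illustration; my_table is a placeholder, and the column names are taken from the pseudocode above, not from the real schema):

query = "INSERT INTO my_table (productID, name, url) VALUES (?, ?, ?)"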
Database looks like this.
https://prnt.sc/n2d2fm
http://prntscr.com/n2d3sh
http://prntscr.com/n2d3yj
EDIT:
Tried doing something like this, but I'm getting a 'not all arguments converted during string formatting' TypeError.
import pymysql
import pandas as pd

connStr = pymysql.connect(host='xx.xxx.xx.xx', port=xxxx, user='xxxx', password='xxxxxxxxxxx')
df = pd.read_csv('GenericProducts.csv')
cursor = connStr.cursor()

query = "INSERT INTO [Productos]([ItemID],[Nombre])) values (?,?)"
for index, row in df.iterrows():
    #cursor.execute("INSERT INTO dbo.Productos([ItemID],[Nombre])) values (?,?,?)", row['codigoEspecificoProducto'], row['nombreProducto'])
    codigoEspecificoProducto = row['codigoEspecificoProducto']
    nombreProducto = row['nombreProducto']
    values = (codigoEspecificoProducto, nombreProducto)
    cursor.execute(query, values)
connStr.commit()
cursor.close()
connStr.close()
I think my problem is in how I'm defining the query; surely that's not the right way.
Try this: you seem to have changed the library from pyodbc to pymysql, which expects %s instead of ? as the parameter placeholder.
import pymysql
import pandas as pd

connStr = pymysql.connect(host='xx.xxx.xx.xx', port=xxxx, user='xxxx', password='xxxxxxxxxxx')
df = pd.read_csv('GenericProducts.csv')
cursor = connStr.cursor()

query = "INSERT INTO [Productos]([ItemID],[Nombre]) values (%s,%s)"
for index, row in df.iterrows():
    #cursor.execute("INSERT INTO dbo.Productos([ItemID],[Nombre]) values (%s,%s)", row['codigoEspecificoProducto'], row['nombreProducto'])
    codigoEspecificoProducto = row['codigoEspecificoProducto']
    nombreProducto = row['nombreProducto']
    values = (codigoEspecificoProducto, nombreProducto)
    cursor.execute(query, values)
connStr.commit()
cursor.close()
connStr.close()
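One caveat on top of that answer: the [bracket] identifier quoting is SQL Server syntax; MySQL uses backticks (or no quoting at all). Assuming the same table and columns, the insert could also be done in a single round trip with executemany, for example:

query = "INSERT INTO `Productos` (`ItemID`, `Nombre`) VALUES (%s, %s)"
# Build a list of (ItemID, Nombre) tuples from the two DataFrame columns
rows = list(df[['codigoEspecificoProducto', 'nombreProducto']].itertuples(index=False, name=None))
cursor.executemany(query, rows)
connStr.commit()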
Trying to port some SQLAlchemy to Django and I've got this tricky little bit:
version = Column(
    BIGINT,
    default=literal_column(
        'UNIX_TIMESTAMP() * 1000000 + MICROSECOND(CURRENT_TIMESTAMP)'
    ),
    nullable=False)
What's the best option for porting the literal_column bit to Django? The best idea I've got so far is a function, set as the default, that executes the same raw SQL, but I'm not sure if there's an easier way. My google-fu is failing me here.
Edit: the reason we need a timestamp created by MySQL is that we are measuring how out of date something is (so we need to actually know the time), and we want, for correctness, a single time-stamping authority (so that we don't introduce error by using Python functions that look at system clocks, which could differ across servers).
At present I've got:
def get_current_timestamp():
    cursor = connection.cursor()
    cursor.execute("SELECT UNIX_TIMESTAMP() * 1000000 + MICROSECOND(CURRENT_TIMESTAMP)")
    row = cursor.fetchone()
    return row[0]

version = models.BigIntegerField(default=get_current_timestamp)
which, at this point, sounds like my best/only option.
If you don't care about having a central time authority:
import time

version = models.BigIntegerField(
    default=lambda: int(time.time() * 1000000))
To bend the database to your will:
from django.db.models.expressions import ExpressionNode

class NowInt(ExpressionNode):
    """ Pass this in the same manner you would pass Count or F objects """

    def __init__(self):
        super(NowInt, self).__init__(None, None, False)

    def evaluate(self, evaluator, qn, connection):
        return '(UNIX_TIMESTAMP() * 1000000 + MICROSECOND(CURRENT_TIMESTAMP))', []

### Model
version = models.BigIntegerField(default=NowInt())
Because expression nodes are not callables, the expression will be evaluated on the database side.