How to use SQLAlchemy Utils in a SQLAlchemy model

I'm trying to create a user model that uses UUID as primary key:
from src.db import db # SQLAlchemy instance
import sqlalchemy_utils
import uuid
class User(db.Model):
__tablename__ = 'user'
id = db.Column(sqlalchemy_utils.UUIDType(binary=True), primary_key=True, nullable=False)
But when I generate the migrations I receive:
File "/home/pc/Downloads/project/auth/venv/lib/python3.6/site-packages/alembic/runtime/environment.py", line 836, in run_migrations
self.get_context().run_migrations(**kw)
File "/home/pc/Downloads/project/auth/venv/lib/python3.6/site-packages/alembic/runtime/migration.py", line 330, in run_migrations
step.migration_fn(**kw)
File "/home/pc/Downloads/project/auth/migrations/versions/efae4166f832_.py", line 22, in upgrade
sa.Column('id', sqlalchemy_utils.types.uuid.UUIDType(length=16), nullable=False),
NameError: name 'sqlalchemy_utils' is not defined
I have tried to explicitly reference the module I'm using, and also to use an 'internal' implementation that SQLAlchemy provides, but neither worked.
Note: if I manually import sqlalchemy_utils in /migrations/versions/efae4166f832_.py and remove the automatically generated length from sa.Column('id', sqlalchemy_utils.types.uuid.UUIDType(length=16), nullable=False), it works fine.
I generate the migrations using a generate.py script:
from src import create_app
from src.db import db
from flask_migrate import Migrate
# Models
from src.user.models.user import User
app = create_app()
migrate = Migrate(app, db)
Note: MySQL engine.
I expect that when I generate the migration, it creates the user table with a UUID primary key implemented via SQLAlchemy Utils.

You just have to add:
import sqlalchemy_utils
to your script.py.mako inside the migrations folder.

Thanks Marco, but I have already fixed it. I put the import sqlalchemy_utils line inside both env.py and script.py.mako, and I also added the following function:
def render_item(type_, obj, autogen_context):
    """Apply custom rendering for selected items"""
    if type_ == "type" and isinstance(obj, sqlalchemy_utils.types.uuid.UUIDType):
        # Add import for this type
        autogen_context.imports.add("import sqlalchemy_utils")
        autogen_context.imports.add("import uuid")
        return "sqlalchemy_utils.types.uuid.UUIDType(), default=uuid.uuid4"
    # Default rendering for other objects
    return False
inside env.py. In the same file, I set render_item=render_item in the run_migrations_online function:
context.configure(
    ...,
    render_item=render_item,
    ...
)
I researched how to do this automatically, but I couldn't find anything that helped.
The order of the operations matters:
export FLASK_APP=manage.py
flask db init
Apply the changes described above
flask db migrate
flask db upgrade
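For reference, here is a hedged sketch of what a freshly autogenerated migration might contain once the render_item hook is in place; the revision identifiers and table details below are illustrative placeholders, not taken from the original project.
# Illustrative sketch of migrations/versions/<hash>_.py after autogenerate with render_item
from alembic import op
import sqlalchemy as sa
import sqlalchemy_utils
import uuid

revision = 'abcdef123456'  # placeholder
down_revision = None

def upgrade():
    op.create_table(
        'user',
        sa.Column('id', sqlalchemy_utils.types.uuid.UUIDType(), default=uuid.uuid4, nullable=False),
        sa.PrimaryKeyConstraint('id'),
    )

def downgrade():
    op.drop_table('user')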

Background
It would be ideal if you didn't have to go and manually edit each migration file with an import sqlalchemy_utils statement.
Looking at the Alembic documentation, script.py.mako is "a Mako template file which is used to generate new migration scripts." Therefore, you'll need to re-generate your migration files, with Mako already importing sqlalchemy_utils as part of the migration file generation.
Fix
If possible, remove your old migrations (they're probably broken anyway), and add the import sqlalchemy_utils line to your script.py.mako file like so:
from alembic import op
import sqlalchemy as sa
import sqlalchemy_utils #<-- line you add
${imports if imports else ""}
Then just re-run your alembic migrations:
alembic revision --autogenerate -m "create initial tables"
When you go to look at your migration file, you should see sqlalchemy_utils already imported via the Mako script.
Hope that helps.

Adding import sqlalchemy_utils to the script.py.mako file will automatically include this import in all generated migration files and resolve the issue.
from alembic import op
import sqlalchemy as sa
import sqlalchemy_utils
${imports if imports else ""}

Add the import sqlalchemy_utils line to the newly-created migrations/versions/{hash}_my_comment.py file. However, this will only fix the problem for that specific step of the migration. If you expect that you'll be making lots of changes to columns which reference sqlalchemy_utils, you should probably do something more robust like Walter's suggestion. Even then, though, it looks like you may need to add code to properly deal with each column type you end up using.
NB: Despite seeing the suggestion in multiple places of just adding the import line to the script.py.mako file, that did not work for me.

Related

Palantir Foundry: how to allow a dynamic number of inputs in compute (Code repository)

I have a folder where I will upload one file every month. The file will have the same format every month.
First problem
The idea is to concatenate all the files in this folder into one file. Currently I am hardcoding the filenames (filename[0], filename[1], filename[2], ...), but imagine that later I will have 50 files: should I explicitly add them all to the transform_df decorator? Is there any other method to handle this?
Second problem:
Currently I have, let's say, 4 files (2021_07, 2021_08, 2021_09, 2021_10), and whenever I add the file representing the 2021_12 data I want to avoid changing the code.
If I add input_5 = Input(path_to_2021_12_do_not_exists), the code will not run and will give an error.
How can I implement the code for future files and let it ignore an input that does not exist, without manually adding a new value to my code each month?
Thank you
# from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output
from pyspark.sql.functions import to_date, year, col
from pyspark.sql.types import StringType
from myproject.datasets import utils
from pyspark.sql import DataFrame
from functools import reduce
input_dir = '/Company/Project_name/'
prefix_filename = 'DataInput1_'
suffixes = ['2021_07', '2021_08', '2021_09', '2021_10', '2021_11', '2021_12']
filenames = [input_dir + prefix_filename + suffixe for suffixe in suffixes]
@transform_df(
    Output("/Company/Project_name/Data/clean/File_concat"),
    input_1=Input(filenames[0]),
    input_2=Input(filenames[1]),
    input_3=Input(filenames[2]),
    input_4=Input(filenames[3]),
)
def compute(input_1, input_2, input_3, input_4):
    input_dfs = [input_1, input_2, input_3, input_4]
    dfs = []

    def transformation_input(df):
        # some transformation
        return df

    for input_df in input_dfs:
        dfs.append(transformation_input(input_df))
    dfs = reduce(DataFrame.unionByName, dfs)
    return dfs
This question comes up a lot; the simple answer is that you don't. Defining datasets and executing a build on them are two different steps executed at different stages.
Whenever you commit your code and run the checks, your overall Python code is executed during the renderSchrinkwrap stage, except for the compute part. This allows Foundry to discover what datasets exist and to publish them.
Publishing involves creating your dataset; whatever is inside your compute function is published into the jobspec of the dataset, so Foundry knows what code to execute whenever you run a build.
Once you hit build on the dataset, Foundry will only pick up whatever is on the jobspec and execute it. Any other code has already run during your checks, and it has run just once.
So any dynamic input/output would require you to re-run checks on your repo, which means that some code change would have had to happen, since Checks are part of the CI process, not part of the build.
Taking a step back, assuming each of your input files has the same schema, Foundry would expect you to have all of those files in the same dataset as append transactions.
This might not be possible though, if for instance, the only indication of the "year" of the data is embedded in the filename, but your sample code would indicate that you expect all these datasets to have the same schema and easily union together.
You can do this manually through the Dataset Preview - just use the Upload File button or drag-and-drop the new file into the Preview window - or, if it's an "end user" workflow, with a File Upload Widget in a Workshop app. You may need to coordinate with your Foundry support team if this widget isn't available.
A bit late to the post, but for anyone who is interested in an answer to most of the question: dynamically determining file names from within a folder is not doable, although having some level of dynamic input is possible, as follows:
# from pyspark.sql import functions as F
from transforms.api import transform, Input, Output
from pyspark.sql.functions import to_date, year, col
from pyspark.sql.types import StringType
from myproject.datasets import utils
from pyspark.sql import DataFrame
# from functools import reduce
from transforms.verbs.dataframes import union_many # use this instead of reduce
input_dir = '/Company/Project_name/'
prefix_filename = 'DataInput1_'
suffixes = ['2021_07', '2021_08', '2021_09', '2021_10', '2021_11', '2021_12']
filenames = [input_dir + prefix_filename + suffixe for suffixe in suffixes]
inputs = {'input{}'.format(index): Input(filename) for index, filename in enumerate(filenames)}

@transform(
    output=Output("/Company/Project_name/Data/clean/File_concat"),
    **inputs
)
def compute(output, **kwargs):
    # Extract dataframes from input datasets
    input_dfs = [dataset_df.dataframe() for dataset_name, dataset_df in kwargs.items()]
    dfs = []

    def transformation_input(df):
        # some transformation
        return df

    for input_df in input_dfs:
        dfs.append(transformation_input(input_df))

    # dfs = reduce(DataFrame.unionByName, dfs)
    unioned_dfs = union_many(*dfs)
    # with @transform (unlike @transform_df) the result is written to the Output
    output.write_dataframe(unioned_dfs)
A couple of points:
Created a dynamic input dict (see the sketch after this list).
That dict is read into the transform using **kwargs.
Using the transform decorator rather than transform_df, we can extract the dataframes ourselves.
(Not in the question) Combine multiple dataframes using the union_many function from the transforms.verbs library.
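To make the **inputs expansion concrete, here is a small, purely illustrative sketch using only two of the suffixes from the example above, with the prefix folded into the paths:
# Purely illustrative: what the dynamic 'inputs' dict evaluates to for two sample paths.
from transforms.api import Input

filenames = [
    '/Company/Project_name/DataInput1_2021_07',
    '/Company/Project_name/DataInput1_2021_08',
]
inputs = {'input{}'.format(index): Input(filename) for index, filename in enumerate(filenames)}
# inputs == {'input0': Input('/Company/Project_name/DataInput1_2021_07'),
#            'input1': Input('/Company/Project_name/DataInput1_2021_08')}
# @transform(output=..., **inputs) therefore declares keyword arguments input0, input1, ...,
# which arrive in compute(output, **kwargs) as input objects, each exposing .dataframe().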

fastapi-users with database adapter for SQLModel: users table is not created

I was trying to use the fastapi-users package to quickly add a registration and authentication system to my FastAPI project, which uses a PostgreSQL database. I am using asyncio to be able to create asynchronous functions.
In the beginning I used only SQLAlchemy and tried their example here. I added these lines of code to my app/app.py to create the database at server startup, and everything worked like a charm: the users table was created in my database.
@app.on_event("startup")
async def on_startup():
    await create_db_and_tables()
Since I am using SQLModel, I added the FastAPI Users database adapter for SQLModel to my virtual environment's packages. And I added these lines to fastapi_users/db/__init__.py to be able to use the SQLModel database adapter:
try:
    from fastapi_users_db_sqlmodel import (  # noqa: F401
        SQLModelBaseOAuthAccount,
        SQLModelBaseUserDB,
        SQLModelUserDatabase,
    )
except ImportError:  # pragma: no cover
    pass
I have also modified app/users.py to use SQLModelUserDatabase instead of the SQLAlchemy one:
async def get_user_manager(user_db: SQLModelUserDatabase = Depends(get_user_db)):
    yield UserManager(user_db)
and app/db.py to use SQLModelUserDatabase and SQLModelBaseUserDB. Here is the full code of app/db.py:
import os
from typing import AsyncGenerator
from fastapi import Depends
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker
from fastapi_users.db import SQLModelUserDatabase, SQLModelBaseUserDB
from sqlmodel import SQLModel
from app.models import UserDB
DATABASE_URL = os.environ.get("DATABASE_URL")
engine = create_async_engine(DATABASE_URL)
async_session_maker = sessionmaker(
    engine, class_=AsyncSession, expire_on_commit=False)

async def create_db_and_tables():
    async with engine.begin() as conn:
        await conn.run_sync(SQLModel.metadata.create_all)

async def get_async_session() -> AsyncSession:
    async_session = sessionmaker(
        engine, class_=AsyncSession, expire_on_commit=False
    )
    async with async_session() as session:
        yield session

async def get_user_db(session: AsyncSession = Depends(get_async_session)):
    yield SQLModelUserDatabase(UserDB, session, SQLModelBaseUserDB)
Once I run the code, the table is not created at all. I wonder what the issue could be. Any ideas?
I had the same problem, but managed to make it work with a couple of changes (the code is based on the full example in the documentation):
In models.py, make UserDB inherit from SQLModelBaseUserDB, User, and add table=True for sqlmodel to create the table:
class UserDB(SQLModelBaseUserDB, User, table=True):
    pass
It's important that SQLModelBaseUserDB is inherited from first, because otherwise User.id trumps SQLModelBaseUserDB.id and SQLModel cannot find the primary_key column.
Use SQLModelUserDatabaseAsync in get_user_db, like this (as far as I understand, you don't need to pass SQLModelBaseUserDB to SQLModelUserDatabase; the third argument is for the OAuth account model):
async def get_user_db(session: AsyncSession = Depends(get_async_session)):
    yield SQLModelUserDatabaseAsync(UserDB, session)
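For context, here is a minimal sketch of how app/models.py could look with both changes applied; the Pydantic bases (models.BaseUser and friends) are assumptions based on the fastapi-users full example of that era, not code from the original post.
# Hedged sketch of app/models.py; the fastapi_users.models bases are assumed from the
# v8/v9-style full example and may differ in newer releases.
from fastapi_users import models
from fastapi_users_db_sqlmodel import SQLModelBaseUserDB

class User(models.BaseUser):
    pass

class UserCreate(models.BaseUserCreate):
    pass

class UserUpdate(models.BaseUserUpdate):
    pass

# SQLModelBaseUserDB comes first so its primary-key 'id' column wins,
# and table=True makes SQLModel actually create the users table.
class UserDB(SQLModelBaseUserDB, User, table=True):
    pass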
When I posted this question, the reply below is what I received from one of the maintainers of fastapi-users, and it made me switch to SQLAlchemy at the time. I actually do not know whether they have officially released the SQLModel DB adapter or not.
My guess is that you didn't change the UserDB model so that it inherits from the SQLModelBaseUserDB one. It's necessary in order to let SQLModel detect all your models and create them.
You can have an idea of what it should look like in fastapi-users-db-sqlmodel tests: https://github.com/fastapi-users/fastapi-users-db-sqlmodel/blob/3a46b80399f129aa07a834a1b40bf49d08c37be1/tests/conftest.py#L25-L27
Bear in mind though that we didn't officially release this DB adapter, as there are some problems with SQLModel regarding UUID (tiangolo/sqlmodel#25). So you'll probably run into issues.
And here is the GitHub link to the discussion: https://github.com/fastapi-users/fastapi-users/discussions/861

PySpark 2.4: issue faced while passing a properties file in spark-submit

I have a PySpark program which successfully connects to a MySQL database and reads a table. Now I am trying to pass the database credentials from a properties file instead of embedding them in the code, but I am not able to make it work.
from pyspark.sql import SparkSession
from pyspark.sql.types import *
# spark-submit --packages mysql:mysql-connector-java:8.0.13 workWithMySQL.py
spark = SparkSession.builder.appName("MySQL connection").getOrCreate()
# create spark context from spark session
sc = spark.sparkContext
# read from mysql
# configuration details
hostname = "localhost"
jdbcport = 3306
dbname = "TEST"
username = "kanchan#localhost"
password = "password"
mysql_url = "jdbc:mysql://{0}:{1}/{2}?user={3}&password={4}".format(hostname, jdbcport, dbname, username, password)
mysql_driver = "com.mysql.jdbc.Driver"
query = "(select * from cats) t1_alias"
df4 = spark.read.format("jdbc").options(driver=mysql_driver, url=mysql_url, dbtable=query).load()
df4.show()
Now, I have created a properties file jdbc.properties at $SPARK_HOME/conf
spark.mysql.user kanchan#localhost
spark.mysql.password password
And added it to the spark-submit call:
spark-submit --packages mysql:mysql-connector-java:8.0.13 --files $SPARK_HOME/conf/jdbc.properties workWithMySQL.py
and replaced the assignments:
username=sc.getConf.getOption("spark.mysql.user")
password=sc.getConf.getOption("spark.mysql.user")
When run, it throws an error saying the object has no attribute 'getOption'. I could not locate the appropriate documentation for it. Can anyone help?
Further, is it possible to encrypt the credentials or ensure data security by any other means?
The method getOption should be replaced with the method get:
username=sc.getConf().get("spark.mysql.user")
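As a hedged side note (an assumption on my part, not part of the original answer): --files only ships the file to the executors' working directories; for the spark.mysql.* keys to end up in the Spark conf, they need to be loaded as configuration, for example via --properties-file. A minimal sketch:
# Sketch, assuming the properties were loaded into the Spark conf, e.g. with:
#   spark-submit --packages mysql:mysql-connector-java:8.0.13 \
#       --properties-file $SPARK_HOME/conf/jdbc.properties workWithMySQL.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MySQL connection").getOrCreate()
conf = spark.sparkContext.getConf()

username = conf.get("spark.mysql.user")
password = conf.get("spark.mysql.password")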

Handling multiple models.py with alembic

Our web app is based on SQLAlchemy in the Pyramid framework, and we are looking to use Alembic for managing database migrations. The web application consists of various packages that operate on one database. This consequently means we have multiple models.py files that need to be migrated. I am confused as to how to handle this. I could progress somewhat using the following in my env.py:
from pkg_a.app.models import Base as pkg_a_base
from pkg_b.app.models import Base as pkg_b_base
from pkg_c.app.models import Base as pkg_c_base
def combine_metadata(*args):
    m = MetaData()
    for metadata in args:
        for t in metadata.tables.values():
            t.tometadata(m)
    return m
target_metadata = combine_metadata(pkg_a_base, pkg_b_base, pkg_c_base)
This works great the first time. However, if I add one more model later, just adding that to this list doesn't do much. I was expecting that running
alembic revision -m "added a new model pkg_d.models" --version-path=migrations/versions --autogenerate
would generate a new version file that would have the code for adding the tables from pkg_d.models. But it isn't so. What am I doing wrong here?
If your packages are completely independent and separate, then each of them should have a separate migration history, either stored inside each package (pkg_a.migrations, pkg_b.migrations, etc.) or at least stored in a separate top-level migrations directory by having a separate section in alembic.ini and using the -n parameter of the alembic command to specify which section to use:
[pkg_a]
# path to migration scripts
script_location = migrations_a
sqlalchemy.url = xxx
[pkg_b]
script_location = migrations_b
sqlalchemy.url = xxx
[pkg_c]
script_location = migrations_c
sqlalchemy.url = xxx
And then you'll be able to use alembic revision -n pkg_a -m "added a new model pkg_a.models"
If, however, your models are dependent in any way, then they should use a common Base - you do realize you don't have to keep all your SQLAlchemy stuff in a single models.py file, don't you? I would create a separate "base" package containing a common MetaData, Base and other SQLAlchemy configuration, which would then be imported by your other packages.
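A minimal sketch of that layout (the package, module and table names here are illustrative, not from the original post):
# pkg_base/meta.py - shared SQLAlchemy configuration imported by every package
from sqlalchemy import MetaData
from sqlalchemy.ext.declarative import declarative_base

metadata = MetaData()
Base = declarative_base(metadata=metadata)

# pkg_a/app/models.py (and likewise pkg_b, pkg_c, ...)
from sqlalchemy import Column, Integer
from pkg_base.meta import Base

class Widget(Base):
    __tablename__ = 'widget'
    id = Column(Integer, primary_key=True)

# migrations/env.py can then simply use the shared metadata:
# from pkg_base.meta import metadata as target_metadata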

JRuby - Calling Java class - Cannot import class 'com.foo.bar' as 'bar'

I am getting the below error when I import a custom jar.
require 'java'
require '/path/custom.jar'
java_import 'com.foo.bar'
The error reported is:
cannot import class 'com.foo.bar' as 'bar'
I am trying to build a custom Logstash input plugin.
As it says, this is because 'com.foo.bar' is likely not a class name, so you need to specify a (fully qualified) class name, or you can import the whole package with a java_package declaration.
If 'com.foo.bar' really is a "valid" Java class name (which is quite bad practice), java_import simply won't work, since in Ruby constant names start with upper-case; you will be left with doing the import into the current namespace (module) on your own, e.g.: Bar = Java::com.foo.bar