Unique Sequential Number to column - sqlalchemy

I need to create a sequence in a generic way, without using the Sequence class.
USN = Column(Integer, nullable=False, default=nextusn, server_onupdate=nextusn)
The function nextusn needs to generate the func.max(table.USN) value of the rows in the model.
I tried this:
class nextusn(expression.FunctionElement):
    type = Numeric()
    name = 'nextusn'
@compiles(nextusn)
def default_nextusn(element, compiler, **kw):
    return select(func.max(element.table.c.USN)).first()[0] + 1
but in this context the element does not know element.table. Is there a way to resolve this?

This is a little tricky, for these reasons:
Your SELECT MAX() will return NULL if the table is empty; you should use COALESCE to produce a default "seed" value. See below.
The whole approach of inserting rows with SELECT MAX is not safe for concurrent use, so you need to make sure only one INSERT statement at a time runs against the table, or you may get constraint violations (you should definitely have a constraint of some kind on this column).
From the SQLAlchemy perspective, you need your custom element to be aware of the actual Column element. We can achieve this either by assigning the "nextusn()" function to the Column after the fact, or, as I'll show below, with a more sophisticated approach using events.
I don't understand what you're going for with "server_onupdate=nextusn". "server_onupdate" in SQLAlchemy doesn't actually run any SQL for you; it is a placeholder for cases where, for example, you created a trigger. Also, the "SELECT MAX(id) FROM table" thing is an INSERT pattern, so I'm not sure you mean for anything to happen here on an UPDATE.
The @compiles extension needs to return a string, running the select() there through compiler.process(). See below.
Example:
from sqlalchemy import Column, Integer, create_engine, select, func, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql.expression import ColumnElement
from sqlalchemy.schema import ColumnDefault
from sqlalchemy.ext.compiler import compiles
from sqlalchemy import event
class nextusn_default(ColumnDefault):
    "Container for a nextusn() element."
    def __init__(self):
        super(nextusn_default, self).__init__(None)
@event.listens_for(nextusn_default, "after_parent_attach")
def set_nextusn_parent(default_element, parent_column):
    """Listen for when nextusn_default() is associated with a Column,
    assign a nextusn().
    """
    assert isinstance(parent_column, Column)
    default_element.arg = nextusn(parent_column)
class nextusn(ColumnElement):
    """Represent "SELECT MAX(col) + 1 FROM TABLE".
    """
    def __init__(self, column):
        self.column = column
@compiles(nextusn)
def compile_nextusn(element, compiler, **kw):
    return compiler.process(
        select([
            func.coalesce(func.max(element.column), 0) + 1
        ]).as_scalar()
    )
Base = declarative_base()
class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, default=nextusn_default(), primary_key=True)
    data = Column(String)
e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
# will normally pre-execute the default so that we know the PK value
# result.inserted_primary_key will be available
e.execute(A.__table__.insert(), data='single row')
# will run the default expression inline within the INSERT
e.execute(A.__table__.insert(), [{"data": "multirow1"}, {"data": "multirow2"}])
# will also run the default expression inline within the INSERT,
# result.inserted_primary_key will not be available
e.execute(A.__table__.insert(inline=True), data='single inline row')
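The NULL-on-empty-table point can be checked without SQLAlchemy at all. Here is a minimal sketch using the standard-library sqlite3 module; the table name "a" mirrors the example above, and the INSERT statement is hand-written SQL that only approximates what the compiled default renders, so treat it as an illustration rather than the exact output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, data TEXT)")

# On an empty table MAX(id) is NULL, so MAX(id) + 1 would be NULL as well.
empty_max = conn.execute("SELECT MAX(id) FROM a").fetchone()[0]

# COALESCE supplies the seed value 0, so the first row gets id 1.
insert_sql = (
    "INSERT INTO a (id, data) "
    "VALUES ((SELECT COALESCE(MAX(id), 0) + 1 FROM a), ?)"
)
for label in ("first", "second", "third"):
    conn.execute(insert_sql, (label,))

rows = conn.execute("SELECT id, data FROM a ORDER BY id").fetchall()
print(empty_max)
print(rows)
```

This runs in a single connection, so it says nothing about the concurrency problem: two sessions evaluating the subquery at the same time can still compute the same value.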

Related

How to save Scrapy items from pipeline in MySQL table by order from another table (multiple tables)?

This is my first question on Stack Overflow. :P Everything works fine except the crawl order; I added a priority method, but it didn't work correctly. I need to write all the author data first, then all the album and song data, and store them in the DB in that order. I want to order the items in one MySQL table by an item in another one.
Database structure: https://i.postimg.cc/GhF4w32x/db.jpg
Example: first write all author items to the Author table, then order the album items in the Album table by authorId from the Author table.
Github repository: https://github.com/markostalma/discogs/tree/master/discogs
P.S. I have three item classes, for the author, album and song parsers.
I also tried to make another spider flow and put everything in one item class, but with no success. The order was the same. :(
Sorry for my bad English.
You need to set up an item pipeline for this. I would suggest using SQLAlchemy to build the SQL item and connect to the DB. Your SQLAlchemy class will reflect all the table relationships you have in your DB schema. Let me show you. This is a working example of a similar pipeline that I have, except that you would set up your SQLAlchemy class to contain the many-to-many or foreign key relationships you need. You'll have to refer to their documentation [1].
An even more Pythonic way of doing this would be to keep your SQLAlchemy class and item names the same and do something like for k, v in item.items():
This way you can just loop over the item and set whatever is there. The code below is long and violates DRY on purpose, though.
# -*- coding: utf-8 -*-
from scrapy.exceptions import DropItem
from sqlalchemy import create_engine, Column, Integer, String, DateTime, ForeignKey, Boolean, Sequence, Date, Text
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
import datetime
DeclarativeBase = declarative_base()
def db_connect():
    """
    This function connects to the database. Tables will automatically be created if they do not exist.
    See __tablename__ under RateMds class
    MySQL example: engine = create_engine('mysql://scott:tiger@localhost/foo')
    """
    return create_engine('sqlite:///reviews.sqlite', echo=True)
class GoogleReviewItem(DeclarativeBase):
    __tablename__ = 'google_review_item'
    pk = Column('pk', String, primary_key=True)
    query = Column('query', String(500))
    entity_name = Column('entity_name', String(500))
    user = Column('user', String(500))
    review_score = Column('review_score', Integer)
    description = Column('description', String(5000))
    top_words = Column('top_words', String(10000), nullable=True)
    bigrams = Column('bigrams', String(10000), nullable=True)
    trigrams = Column('trigrams', String(10000), nullable=True)
    google_average = Column('google_average', Integer)
    total_reviews = Column('total_reviews', Integer)
    review_date = Column('review_date', DateTime)
    created_on = Column('created_on', DateTime, default=datetime.datetime.now)
engine = db_connect()
Session = sessionmaker(bind=engine)
def create_individual_table(engine):
    # checks for the tables' existence and creates them if they do not already exist
    DeclarativeBase.metadata.create_all(engine)
create_individual_table(engine)
session = Session()
def get_row_by_pk(pk, model):
    review = session.query(model).get(pk)
    return review
class GooglePipeline(object):
    def process_item(self, item, spider):
        review = get_row_by_pk(item['pk'], GoogleReviewItem)
        if review is None:
            # NOTE: the keyword names must match columns defined on the
            # mapped class; adjust them to your own model and item fields.
            googlesite = GoogleReviewItem(
                query=item['query'],
                google_title=item['google_title'],
                review_score=item['review_score'],
                review_count=item['review_count'],
                website=item['website'],
                website_type=item['website_type'],
                top_words=item['top_words'],
                bigrams=item['bigrams'],
                trigrams=item['trigrams'],
                text=item['text'],
                date=item['date']
            )
            session.add(googlesite)
            session.commit()
            return item
        else:
            raise DropItem()
[1]: https://docs.sqlalchemy.org/en/13/core/constraints.html
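The "loop over the item" idea mentioned above could look roughly like this. It is a minimal sketch in plain Python: ReviewRow is a hypothetical stand-in for a mapped class, and populate() is a made-up helper name, not part of Scrapy or SQLAlchemy. It only copies keys that the row class actually defines and reports the rest:

```python
class ReviewRow:
    """Hypothetical stand-in for a mapped class such as GoogleReviewItem."""
    pk = None
    query = None
    review_score = None


def populate(row, item):
    """Copy item fields onto row; return the keys the row class does not define."""
    skipped = []
    for key, value in item.items():
        if hasattr(type(row), key):
            setattr(row, key, value)
        else:
            skipped.append(key)
    return skipped


item = {"pk": "abc", "query": "pizza", "review_score": 4, "website": "example.com"}
row = ReviewRow()
skipped = populate(row, item)
print(row.pk, row.query, row.review_score)
print(skipped)
```

Returning the skipped keys makes a mismatch between item fields and model columns visible instead of failing silently.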

Modify all numbers before insert or update

In SqlAlchemy I use:
price = Column(Numeric(18, 5))
in various places throughout my app. When I get a number formatted the Swedish way, with a comma instead of a dot (0,34 instead of 0.34), and try to change the price column, the number gets set to 0.00000.
To solve this I have this code:
obj.price = price.replace(',','.')
But having this all over the code makes it pretty ugly and the risk is that I forget one place. Would it be possible to have some kind of generic converter function which gets called before a value is converted from a string to a Numeric? And that I have that in one place only.
Check the validates decorator of SQLAlchemy: http://docs.sqlalchemy.org/en/rel_1_0/orm/mapped_attributes.html
A quick way to add a “validation” routine to an attribute is to use
the validates() decorator. An attribute validator can raise an
exception, halting the process of mutating the attribute’s value, or
can change the given value into something different.
In your case the code could look similar to:
from sqlalchemy.orm import validates
class Obj(Base):
    __tablename__ = 'obj'
    id = Column(Integer, primary_key=True)
    price = Column(Numeric(18, 5))
    @validates('price')
    def validate_price(self, key, price):
        if ',' in price:
            return float(price.replace(',','.'))
        else:
            return float(price)
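Since the column is Numeric(18, 5), it may be worth converting to Decimal rather than float, and tolerating values that are not strings. A minimal sketch of such a converter, which a validator like the one above could call; to_decimal is a hypothetical helper name, and it deliberately does not handle thousands separators:

```python
from decimal import Decimal, InvalidOperation


def to_decimal(value):
    """Convert '0,34', '0.34', numbers or None into a Decimal (or None)."""
    if value is None:
        return None
    if isinstance(value, Decimal):
        return value
    # Accept a decimal comma by normalising it to a dot first.
    text = str(value).strip().replace(",", ".")
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError("not a valid price: %r" % (value,))


print(to_decimal("0,34"))
print(to_decimal("0.34"))
print(to_decimal(1.5))
```

Raising ValueError on input such as "1.234,56" keeps a bad value from quietly becoming 0.00000.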

How to use MySQL's standard deviation (STD, STDDEV, STDDEV_POP) function inside SQLAlchemy?

I need to use the STD function of MySQL through SQLAlchemy, but after a couple of minutes of search, it looks like there is no func.<> way of using this one in SQLAlchemy. Is it not supported, or am I missing something?
Found this issue while coding some aggregates on SQLAlchemy.
Citing the docs:
Any name can be given to func. If the function name is unknown to SQLAlchemy, it will be rendered exactly as is. For common SQL functions which SQLAlchemy is aware of, the name may be interpreted as a generic function which will be compiled appropriately to the target database.
Basically, func will generate a function whose name matches the attribute accessed after "func." if it is not one of the common functions SQLAlchemy is aware of (like func.count).
To keep the advantages of the RDBMS abstraction that comes with any ORM, I always suggest using ANSI functions to decouple the code from the DB engine.
For a working sample you can add a connection string and execute the following code:
from sqlalchemy.orm import sessionmaker
from sqlalchemy import func, create_engine, Column
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.types import DateTime, Integer, String
# Add your connection string
engine = create_engine('My Connection String')
Base = declarative_base(engine)
Session = sessionmaker(bind=engine)
db_session = Session()
# Make sure to have a table foo in the db with foo_id, bar, baz columns
class Foo(Base):
    __tablename__ = 'foo'
    __table_args__ = { 'autoload' : True }
query = db_session.query(
    func.count(Foo.bar).label('count_agg'),
    func.avg(Foo.foo_id).label('avg_agg'),
    func.stddev(Foo.foo_id).label('stddev_agg'),
    func.stddev_samp(Foo.foo_id).label('stddev_samp_agg')
)
print(query.statement.compile())
It will generate the following SQL
SELECT count(foo.bar) AS count_agg,
avg(foo.foo_id) AS avg_agg,
stddev(foo.foo_id) AS stddev_agg,
stddev_samp(foo.foo_id) AS stddev_samp_agg
FROM foo
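One thing to keep in mind when picking between these names: in MySQL, STD() and STDDEV() are synonyms for STDDEV_POP() (population standard deviation), while STDDEV_SAMP() is the sample standard deviation. As far as I know, PostgreSQL treats stddev as an alias for stddev_samp instead, so the same func.stddev call can give different numbers on different backends. The difference between the two definitions can be seen with Python's statistics module; the data below is made up for illustration:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divides by N (what STDDEV_POP computes).
population_sd = statistics.pstdev(values)

# Sample standard deviation: divides by N - 1 (what STDDEV_SAMP computes).
sample_sd = statistics.stdev(values)

print(population_sd)
print(sample_sd)
```

Using func.stddev_pop or func.stddev_samp explicitly avoids depending on what a bare stddev means on a given backend.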

Association Proxy SQLAlchemy

This source details how to use association proxies to create views and objects with values of an ORM object.
However, when I append a value that matches an existing object in the database (and said value is either unique or a primary key), it creates a conflicting object, so I cannot commit.
So in my case is this only useful as a view, and I'll need to use ORM queries to retrieve the object to be appended.
Is this my only option or can I use merge (I may only be able to do this if it's a primary key and not a unique constraint), OR set up the constructor such that it will use an existing object in the database if it exists instead of creating a new object?
For example from the docs:
user.keywords.append('cheese inspector')
# Is translated by the association proxy into the operation:
user.kw.append(Keyword('cheese inspector'))
But I'd like it to be translated to something more like this (of course the query could fail):
keyword = session.query(Keyword).filter(Keyword.keyword == 'cheese inspector').one()
user.kw.append(keyword)
OR ideally
user.kw.append(Keyword('cheese inspector'))
session.merge() # retrieves identical object from the database, or keeps new one
session.commit() # success!
I suppose this may not even be a good idea, but it could be in certain use cases :)
The example shown on the documentation page you link to is a composition type of relationship (in OOP terms) and as such represents an "owns" type of relationship rather than a "uses" one, in terms of verbs. Therefore each owner would have its own copy of the same (in terms of value) keyword.
In fact, you can use exactly the suggestion from the documentation you link to in your question to create a custom creator method and hack it to reuse existing object for given key instead of just creating a new one. In this case the sample code of the User class and creator function will look like below:
def _keyword_find_or_create(kw):
    keyword = Keyword.query.filter_by(keyword=kw).first()
    if not keyword:
        keyword = Keyword(keyword=kw)
        # if autoflush=False is used in the session, then uncomment below
        #session.add(keyword)
        #session.flush()
    return keyword
class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String(64))
    kw = relationship("Keyword", secondary=lambda: userkeywords_table)
    keywords = association_proxy('kw', 'keyword',
        creator=_keyword_find_or_create, # #note: this is the
        )
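The find-or-create logic itself is independent of the ORM. Here is a minimal sketch of the same pattern at the SQL level, using the standard-library sqlite3 module; the keyword table layout and the function name are assumptions made up for this illustration:

```python
import sqlite3


def find_or_create_keyword(conn, word):
    """Return the id of the keyword row for word, inserting it if missing."""
    row = conn.execute(
        "SELECT id FROM keyword WHERE keyword = ?", (word,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO keyword (keyword) VALUES (?)", (word,))
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE)"
)

first_id = find_or_create_keyword(conn, "cheese inspector")
second_id = find_or_create_keyword(conn, "cheese inspector")
other_id = find_or_create_keyword(conn, "snack ninja")
count = conn.execute("SELECT COUNT(*) FROM keyword").fetchone()[0]
print(first_id, second_id, other_id, count)
```

As with the ORM version, the lookup and the insert are two separate steps, so the UNIQUE constraint is still what protects you if two sessions race on the same keyword.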
I recently ran into the same problem. Mike Bayer, creator of SQLAlchemy, referred me to the “unique object” recipe but also showed me a variant that uses an event listener. The latter approach modifies the association proxy so that UserKeyword.keyword temporarily points to a plain string and only creates a new Keyword object if the keyword doesn't already exist.
from sqlalchemy import event
# Same User and Keyword classes from documentation
class UserKeyword(Base):
    __tablename__ = 'user_keywords'
    # Columns
    user_id = Column(Integer, ForeignKey(User.id), primary_key=True)
    keyword_id = Column(Integer, ForeignKey(Keyword.id), primary_key=True)
    special_key = Column(String(50))
    # Bidirectional attribute/collection of 'user'/'user_keywords'
    user = relationship(
        User,
        backref=backref(
            'user_keywords',
            cascade='all, delete-orphan'
        )
    )
    # Reference to the 'Keyword' object
    keyword = relationship(Keyword)
    def __init__(self, keyword=None, user=None, special_key=None):
        self._keyword_keyword = keyword  # temporary, will turn into a
                                         # Keyword when we attach to a
                                         # Session
        self.special_key = special_key
    @property
    def keyword_keyword(self):
        if self.keyword is not None:
            return self.keyword.keyword
        else:
            return self._keyword_keyword
@event.listens_for(Session, "after_attach")
def after_attach(session, instance):
    # when UserKeyword objects are attached to a Session, figure out what
    # Keyword in the database it should point to, or create a new one
    if isinstance(instance, UserKeyword):
        with session.no_autoflush:
            keyword = session.query(Keyword).\
                filter_by(keyword=instance._keyword_keyword).\
                first()
            if keyword is None:
                keyword = Keyword(keyword=instance._keyword_keyword)
            instance.keyword = keyword

Coercion in SQLAlchemy from Column annotations

Good day everyone,
I have a file of strings corresponding to the fields of my SQLAlchemy object. Some fields are floats, some are ints, and some are strings.
I'd like to be able to coerce my string into the proper type by interrogating the column definition. Is this possible?
For instance:
class MyClass(Base):
    ...
    my_field = Column(Float)
It feels like one should be able to say something like MyClass.my_field.column.type and either ask the type to coerce the string directly or write some conditions and int(x), float(x) as needed.
I wondered whether this would happen automatically if all the values were strings, but I received Oracle errors because the type was incorrect.
Currently I naively coerce -- if it's float()able, that's my value, else it's a string, and I trust that integral floats will become integers upon inserting because they are represented exactly. But the runtime value is wrong (e.g. 1.0 vs 1) and it just seems sloppy.
Thanks for your input!
SQLAlchemy 0.7.4
You can iterate over columns of the mapped Table:
for col in MyClass.__table__.columns:
    print(col, repr(col.type))
... so you can check the type of each field by its name like this:
def get_col_type(cls_, fld_):
    for col in cls_.__table__.columns:
        if col.name == fld_:
            return col.type # this contains the instance of SA type
assert Float == type(get_col_type(MyClass, 'my_field'))
I would cache the results though if your file is large in order to save the for-loop on every row imported from the file.
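The caching suggestion can be as simple as building a name-to-converter dictionary once and reusing it for every row. A minimal sketch in plain Python; COLUMN_PYTHON_TYPES is written by hand here as a hypothetical stand-in for a map you would derive once from MyClass.__table__.columns, and the field names other than my_field are made up:

```python
# Hypothetical column-name -> Python type map. In a real model it would be
# built once from the mapped table's columns instead of written by hand.
COLUMN_PYTHON_TYPES = {"my_field": float, "my_count": int, "my_name": str}


def coerce_row(raw_row, column_types=COLUMN_PYTHON_TYPES):
    """Convert a dict of strings using the cached per-column converters."""
    coerced = {}
    for name, text in raw_row.items():
        # Unknown columns are left as strings.
        converter = column_types.get(name, str)
        coerced[name] = converter(text)
    return coerced


row = coerce_row({"my_field": "1.0", "my_count": "1", "my_name": "abc"})
print(row)
```

Because the integer column is converted with int, the runtime value is 1 rather than 1.0, which addresses the sloppy-coercion concern from the question.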
Type coercion for sqlalchemy prior to committing to some database.
How can I verify Column data types in the SQLAlchemy ORM?
from sqlalchemy import (
Column,
Integer,
String,
DateTime,
)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import event
import datetime
Base = declarative_base()
type_coercion = {
Integer: int,
String: str,
DateTime: datetime.datetime,
}
# this event is called whenever an attribute
# on a class is instrumented
@event.listens_for(Base, 'attribute_instrument')
def configure_listener(class_, key, inst):
    if not hasattr(inst.property, 'columns'):
        return
    # this event is called whenever a "set"
    # occurs on that instrumented attribute
    @event.listens_for(inst, "set", retval=True)
    def set_(instance, value, oldvalue, initiator):
        desired_type = type_coercion.get(inst.property.columns[0].type.__class__)
        coerced_value = desired_type(value)
        return coerced_value
class MyObject(Base):
    __tablename__ = 'mytable'
    id = Column(Integer, primary_key=True)
    svalue = Column(String)
    ivalue = Column(Integer)
    dvalue = Column(DateTime)
x = MyObject(svalue=50)
assert isinstance(x.svalue, str)
I'm not sure if I'm reading this question correctly, but I would do something like:
class MyClass(Base):
    some_float = Column(Float)
    some_string = Column(String)
    some_int = Column(Integer)
    ...
    def __init__(self, some_float, some_string, some_int, ...):
        if isinstance(some_float, float):
            self.some_float = some_float
        else:
            try:
                self.some_float = float(some_float)
            except ValueError:
                # do something intelligent
                raise
        if isinstance(some_string, str):
            ...
And I would repeat the checking process for each column. I wouldn't trust anything to do it "automatically". I also expect your file of strings to be well structured; otherwise something more complicated would have to be done.
Assuming your file is a CSV (I'm not good with file reads in python, so read this as pseudocode):
while not EOF:
    thisline = readline('thisfile.csv', separator=',') # this line is an ordered list of strings
    thisthing = MyClass(some_float=thisline[0], some_string=thisline[1]...)
    DBSession.add(thisthing)
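For the file-reading part, the standard-library csv module does most of the work. A minimal sketch, assuming a CSV with a header row whose names match the fields above; CONVERTERS and read_rows are hypothetical names, and the sample data is made up. Rows that fail conversion are collected with their line number instead of aborting the import:

```python
import csv
import io

# Hypothetical per-field converters, matching the MyClass fields above.
CONVERTERS = {"some_float": float, "some_string": str, "some_int": int}


def read_rows(handle):
    """Return (good_rows, bad_rows) from a CSV file object with a header."""
    good, bad = [], []
    reader = csv.DictReader(handle)
    for record in reader:
        try:
            good.append({k: CONVERTERS[k](v) for k, v in record.items()})
        except (KeyError, TypeError, ValueError) as exc:
            # reader.line_num is the line of the source file just consumed.
            bad.append((reader.line_num, str(exc)))
    return good, bad


sample = io.StringIO("some_float,some_string,some_int\n1.5,abc,3\noops,def,4\n")
good, bad = read_rows(sample)
print(good)
print(bad)
```

Each dict in the good list could then be passed to the model constructor, for example MyClass(**row), before adding it to the session.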