How can I have two similar relationships between two tables in SQLAlchemy? - sqlalchemy

I have come back to this problem and given up on it several times.
Technically it's flask-sqlalchemy if this makes a difference. I think what I am looking to do is make two many-to-one (or is it one-to-one?) relationships in the Pair table both referring to the Word table. A pair must have two words and a word can be in many pairs.
class Pair(db.Model):
    __tablename__ = "pairs"
    id = db.Column(db.Integer, primary_key=True)
    word_id = db.Column(db.Integer, db.ForeignKey('words.id'), nullable=False)
    partner_id = db.Column(
        db.Integer, db.ForeignKey('words.id'), nullable=False)
    word_sound = db.Column(db.String(), nullable=False)
    partner_sound = db.Column(db.String(), nullable=False)
    # These two relationships are where I'm particularly lost. The words 1 and 2 need to
    # refer to two separate sounds, so I can't just have two words and two sounds in any
    # order. Therefore I need to have two one-to-one links to the same table:
    word1t = db.relationship(
        "Word", foreign_keys=[word_id], primaryjoin="Pair.word_id==Word.id", back_populates="pairs")
    word2t = db.relationship(
        "Word", foreign_keys=[partner_id], primaryjoin="Pair.partner_id==Word.id", back_populates="pairs")
class Word(db.Model):
    __tablename__ = "words"
    id = db.Column(db.Integer, primary_key=True)
    word = db.Column(db.String(), nullable=False)
    # Relationships
    # "partners" refers to all words that this word has a pair (link) with
    partners = db.relationship(
        'Word',
        secondary="pairs",
        primaryjoin=id == Pair.word_id,
        secondaryjoin=id == Pair.partner_id,
        backref=db.backref('words')
    )
    # "pairs" is supposed to refer to all pairs that this word is a part of.
    pairs = db.relationship('Pair', primaryjoin=id ==
                            or_(Pair.word_id, Pair.partner_id))
I have these models. Basically I have words and (word-)pairs, where pairs consist of two words and also hold some information about the type of pair. In a pair I need two different words that each hold their own sound. This sound is defined by the word's relationship to the other word. For example, in a pair of "pat" and "hat", "pat" has "p" and "hat" has "h", because these sounds make up the difference between the words. So which sound belongs with which word is important. But I can't just put the sound in the Word table, because "pat" has a different sound ("t") when linked to "pac", for example.
I have tried a number of things and deleted them again, but now I'm just hoping someone will show me the proper way to do it.
Help me, Stackoverflow. You're my only hope.
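One possible way to set this up, sketched with plain SQLAlchemy (1.4 or later) rather than Flask-SQLAlchemy and not taken from an accepted answer: give each foreign key its own many-to-one relationship via foreign_keys, and expose "all pairs this word is part of" as a read-only relationship whose primaryjoin ORs the two key comparisons. The relationship names word and partner, the in-memory SQLite engine, and the demo() helper are assumptions made for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Word(Base):
    __tablename__ = "words"
    id = Column(Integer, primary_key=True)
    word = Column(String, nullable=False)

    # Read-only: every pair this word takes part in, on either side.
    # viewonly=True because SQLAlchemy cannot know which of the two
    # foreign keys to set when something is appended to this list.
    pairs = relationship(
        "Pair",
        primaryjoin="or_(Word.id == Pair.word_id, Word.id == Pair.partner_id)",
        viewonly=True,
    )


class Pair(Base):
    __tablename__ = "pairs"
    id = Column(Integer, primary_key=True)
    word_id = Column(Integer, ForeignKey("words.id"), nullable=False)
    partner_id = Column(Integer, ForeignKey("words.id"), nullable=False)
    word_sound = Column(String, nullable=False)
    partner_sound = Column(String, nullable=False)

    # Two many-to-one links to the same table; foreign_keys tells
    # SQLAlchemy which column each relationship should join on.
    word = relationship("Word", foreign_keys=[word_id])
    partner = relationship("Word", foreign_keys=[partner_id])


def demo():
    """Hypothetical helper: store the pat/hat and pat/pac pairs and read them back."""
    engine = create_engine("sqlite://")  # in-memory database, for illustration only
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        pat, hat, pac = Word(word="pat"), Word(word="hat"), Word(word="pac")
        session.add_all([
            Pair(word=pat, partner=hat, word_sound="p", partner_sound="h"),
            Pair(word=pat, partner=pac, word_sound="t", partner_sound="c"),
        ])
        session.commit()
        pat_pairs = sorted(
            (p.word.word, p.word_sound, p.partner.word, p.partner_sound)
            for p in pat.pairs
        )
        return pat_pairs, len(hat.pairs)
```

Because each sound column sits next to the foreign key it describes, the sound stays attached to the right word regardless of which side of the pair that word is on. New pairs are created through Pair(word=..., partner=...), not by appending to Word.pairs.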

Related

postgres store reference to field in json

It is possible to store json in postgres using the json data type. Check this tutorial for an introduction: http://www.postgresqltutorial.com/postgresql-json/
Consider I am storing the following json in such a field:
{
    "address": {
        "street1": "123 seasame st"
    }
}
I want to separately store a reference to the street field. For example, I might have another object which is using data from this json structure and wants to store a reference to where it got the data. Maybe something like this:
class Product():
    __tablename__ = 'Address'
    street_1 = Column(String)
    data_source = ?
Now I could make data_source a string and just store namespaces like address.street, but if I did this postgres has no idea what that means. Working with that in queries would mean parsing the string and other inefficient stuff. Does postgres support referring to fields stored inside json data structures?
This question is related to JSON foreign keys in PostgreSQL, but in this case I don't necessarily want a fk relationship. I just want to create a reference, which is not necessarily enforced in the way a fk is.
update:
To be more clear, I want to reference the location of something in the json structure on another attribute and store that reference in a column. In the below code, Address.data_source is a reference to the location of the street data (for example address.street1 in this case)
class Address():
    __tablename__ = 'Address'
    street_1 = Column(String)
    sample_id = Column(Integer, ForeignKey('DataSample.uid'))
    data_source = ?
class DataSample():
    __tablename__ = 'DataSample'
    uid = Column(Integer, primary_key=True)
    data = Column(JSONB)
body = {
    "address": {
        "street1": "123 seasame st"
    }
}
datasample = DataSample(data=body)
address = Address(street_1=datasample.data['address']['street1'],
                  sample_id=datasample.uid,
                  data_source=?)
As clarified, the question is seeking a way to flexibly specify a path within a JSON object of a particular record. Keys are being handled in normal columns. Constraints on JSONB fields are not available, and there is no specific support for specifying paths within JSON objects.
I worked with the following in SQL Fiddle using PostgreSQL 9.6:
CREATE TABLE datasample (
    id integer PRIMARY KEY,
    data jsonb
);
CREATE TABLE address (
    id integer PRIMARY KEY,
    street_1 text,
    sample_id integer REFERENCES datasample (id),
    data_source text
);
INSERT INTO datasample(id, data)
    VALUES (1, '{"address":{"street_1": "123 seasame st"}}');
INSERT INTO address(id, street_1, sample_id, data_source)
    VALUES (1, '123 seasame st', 1, 'datasample.data->''address''->>''street_1''');
A typical lookup of the street address (needed to retrieve street_1) would resemble:
SELECT datasample.data->'address'->>'street_1'
    FROM datasample
    WHERE id=1;
There is no special postgres type for identifying columns. Strings are the closest available, and you will need to retrieve the string (or array of strings, or object containing strings, if one of those simplifies parsing) and use it to build the query. In the first code block, I stored it as the (escaped) fragment of a query: 'datasample.data->''address''->>''street_1'''. Though longer, it would require only retrieval and unescaping to use in a new custom query. I did not find a way to use the string as a fragment within the same SQL statement, though it might be possible to combine it with other bits of text to form a full statement that could be run through EXECUTE.
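A variation on the "array of strings" option mentioned above, offered as a sketch and not as part of the original answer: keep the location as a list of keys and resolve it on the application side. The helper names resolve_path and to_pg_text_array and the sample data_source value are hypothetical. As far as I know, PostgreSQL's #>> operator accepts a text[] path, so a literal such as '{address,street_1}' could also be handed to data #>> ..., but that part is not exercised here.

```python
from functools import reduce


def resolve_path(document, path):
    """Follow a list of keys into a nested dict and return the value found there."""
    return reduce(lambda node, key: node[key], path, document)


def to_pg_text_array(path):
    """Render the key list as a PostgreSQL text[] literal such as {address,street_1}.

    Assumes plain keys without commas, quotes, braces, or whitespace.
    """
    return "{" + ",".join(path) + "}"


data = {"address": {"street_1": "123 seasame st"}}
data_source = ["address", "street_1"]  # hypothetical content of the data_source column

street = resolve_path(data, data_source)
literal = to_pg_text_array(data_source)
```

Storing the keys as a list avoids having to parse or unescape a query fragment, at the cost of still being an unenforced reference: nothing stops the JSON from changing shape underneath it.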

How do I access both a parent and child JSON record, if the parent is not the highest element?

I am trying to load the SQuAD dataset using Pandas. The JSON elements in my dataset are structured like this, where everything that ends in "s" represents a list:
- data
-- title
-- paragraphs
--- context
--- qas
---- id
---- question
---- answers
----- answerStart
----- answerText
I want to create a DataFrame that looks something like this:
question title context answerText
However, I want only one "answerText" value per question, which means only one answer per "qas" field. Since "qas" has an id that is unique to each pair, it may be best to create an "answers" dataframe, then another dataframe that looks like this:
qas_id answer_id
However, I'm not quite sure how to best set this schema up. Here's what I have tried:
with open(filename) as file:
    data = json.load(file)["data"]
questions = pd.io.json.json_normalize(data, record_path=["paragraphs", "qas", "question"], meta=["paragraphs", "qas", "id"])
answers = pd.io.json.json_normalize(data, record_path=["paragraphs", "qas", "answers"], meta=["paragraphs", "qas", "id"])
Since meta apparently only allows access to the children of the top element, how do I create a dataframe with both the "id" element of "qas" and the "answerStart" and "answerText" elements of answers?
I believe I have a working solution:
import json
import pandas as pd
def readFile(filename):
    with open(filename) as file:
        data = json.load(file)["data"]
    # pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0).
    qas = pd.json_normalize(data, record_path=["paragraphs", "qas"], meta=["title"])
    #print(qas["question"])
    # Gather a list of where all answers should be so we can shove them into a DataFrame.
    # Haven't found a more efficient way to do this yet.
    answer_ids = set()
    answerId = 0
    for index, row in qas.iterrows():
        answer_ids.add(answerId)
        answerId = answerId + len(row["answers"])
    print("Finished with answer ids.")
    # Map qas pair IDs to answer IDs.
    # Sets are unordered, so sort to keep the offsets aligned with the question rows.
    answer_ids = pd.DataFrame(sorted(answer_ids))
    print("Finished converting answer_ids to DataFrame.")
    question_answerId = pd.DataFrame(qas["question"]).join(answer_ids, how="outer")
    question_answerId.columns = ["question", "answer_id"]
    #print("Id-answerID columns: ",id_answerId.columns)
    print("finished creating intermediary table.")
    # Load answers into a data frame.
    answers = pd.json_normalize(data, record_path=["paragraphs", "qas", "answers"])
    answers.rename(columns={"text": "answer_text"}, inplace=True)
    # Give each answer an ID.
    answers["id"] = answers.index
    print("Finished creating answers dataframe.")
    qas = qas.drop(labels=["answers"], axis=1)  # Not needed any longer; we have the answers!
    #print("Dropped column 'answers' from qas.")
    # Map qas dataframe to answer table via id_answerId
    qas_answerId = pd.merge(qas, question_answerId, how="inner", on="question")
    # Check that no duplicates exist in qas_answerId
    qas_answerId = qas_answerId.drop_duplicates("question")
    assert not qas_answerId.duplicated("question").any()
    print("Finished joining qas to answer id")
    # Merge qas_answerId with answers.
    returnDataFrame = pd.merge(qas_answerId, answers, how="inner", left_on="answer_id", right_on="id")
    #print("Returned data frame: ",returnDataFrame)
    print("Done!")
    return returnDataFrame
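A shorter alternative, sketched under the assumption that the file uses the standard SQuAD keys ("text" and "answer_start" inside "answers") and that pandas 1.0 or later is available: json_normalize accepts nested meta entries written as lists of keys, as long as each list follows the record_path down to the level it reads from. The function name flatten_squad and the output column names are made up for this example.

```python
import pandas as pd


def flatten_squad(data):
    """Return one row per question, keeping only the first answer of each."""
    answers = pd.json_normalize(
        data,
        record_path=["paragraphs", "qas", "answers"],
        meta=[
            "title",
            ["paragraphs", "context"],
            ["paragraphs", "qas", "id"],
            ["paragraphs", "qas", "question"],
        ],
    )
    # Nested meta columns are named by joining the keys with ".".
    answers = answers.rename(columns={
        "text": "answer_text",
        "paragraphs.context": "context",
        "paragraphs.qas.id": "qas_id",
        "paragraphs.qas.question": "question",
    })
    # One answer per question: keep the first row for each qas id.
    return answers.drop_duplicates(subset="qas_id").reset_index(drop=True)


sample = [{
    "title": "T",
    "paragraphs": [{
        "context": "C",
        "qas": [
            {"id": "q1", "question": "Q1?",
             "answers": [{"answer_start": 0, "text": "a"},
                         {"answer_start": 5, "text": "b"}]},
            {"id": "q2", "question": "Q2?",
             "answers": [{"answer_start": 1, "text": "c"}]},
        ],
    }],
}]
frame = flatten_squad(sample)
```

Questions with an empty "answers" list produce no rows with this approach, so they would need to be handled separately if they matter.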

How to query JSON Array in Postgres with SqlAlchemy?

I have a SqlAlchemy model defined:
from datetime import datetime
from sqlalchemy.dialects.postgresql import JSONB
class User(db.Model):
    __tablename__ = "user"
    id = db.Column(db.Integer, primary_key=True)
    nickname = db.Column(db.String(255), nullable=False)
    city = db.Column(db.String(255))
    contact_list = db.Column(JSONB)
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
def add_user():
    user = User(nickname="Mike")
    user.contact_list = [{"name": "Sam", "phone": ["123456", "654321"]},
                         {"name": "John", "phone": ["159753"]},
                         {"name": "Joe", "phone": ["147889", "98741"]}]
    db.session.add(user)
    db.session.commit()
if __name__ == "__main__":
    add_user()
How can I retrieve the name from my contact_list using phone? For example, I have the 147889, how can I retrieve Joe?
I have tried this
User.query.filter(User.contact_list.contains({"phone": ["147889"]})).all()
But it returns an empty list, [].
How can I do this?
You just forgot that your JSON path should include the outermost array as well:
User.query.filter(User.contact_list.contains([{"phone": ["147889"]}])).all()
will return the user you are looking for. The original query would match, if your JSON contained an object with key "phone" etc. Note that this returns the User object in question, not the specific object/name from the JSON structure. If you want that, as seems to be the end goal, you could expand the array elements of each user, filter based on the resulting records, and select the name:
val = db.column('value', type_=JSONB)
db.session.query(val['name'].astext).\
    select_from(User,
                db.func.jsonb_array_elements(User.contact_list).alias()).\
    filter(val.contains({"phone": ["147889"]})).\
    all()
On the other hand the above query is not as index friendly as the first one can be, because it has to expand all the arrays before filtering, so it might be beneficial to first find the users that contain the phone in their contact list in a subquery or CTE, and then expand and filter.
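To see why the outermost array matters for the containment check at the start of this answer, here is a simplified pure-Python model of jsonb containment. It is an illustration only: the function jsonb_contains is hypothetical, it does not call PostgreSQL, and it ignores the documented special case in which an array may contain a bare primitive value.

```python
def jsonb_contains(left, right):
    """Simplified model of `left @> right` for jsonb-like Python values."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Every key of the right object must exist on the left with a contained value.
        return all(
            key in left and jsonb_contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(left, list) and isinstance(right, list):
        # Every right element must be contained in some left element; order is ignored.
        return all(
            any(jsonb_contains(candidate, wanted) for candidate in left)
            for wanted in right
        )
    if isinstance(left, (dict, list)) or isinstance(right, (dict, list)):
        # Mismatched structure (for example array versus object) never matches.
        return False
    return left == right


contact_list = [
    {"name": "Sam", "phone": ["123456", "654321"]},
    {"name": "John", "phone": ["159753"]},
    {"name": "Joe", "phone": ["147889", "98741"]},
]

without_outer_array = jsonb_contains(contact_list, {"phone": ["147889"]})
with_outer_array = jsonb_contains(contact_list, [{"phone": ["147889"]}])
```

In this model the first check fails because the stored value is an array while the pattern is an object, and the second succeeds because the pattern mirrors the stored structure from the top down.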

Annotate django-taggit tags attached to specific object

I have an object that is tagged using django-taggit. If I wanted to get a list of all the tags attached to this object, I would follow the documentation like so:
apple = Food.objects.create(name="apple")
apple.tags.add("red", "green", "delicious")
apple.tags.all()
If I wanted to know how many Food objects were attached to each tag in existence, I would do the following:
Tag.objects.all().annotate(food_count=Count('food'))
If I wanted to get a count of all of the food items attached only to the tags that are attached to 'apple', I could do the following:
apple = Food.objects.create(name="apple")
apple.tags.add("red", "green", "delicious")
apple.tags.all().annotate(food_count=Count('food'))
Ok, so for my question. Let's say my Food model has a field with a flag:
class Food(models.Model):
    name = models.CharField(max_length=200, unique=True)
    healthy_flag = models.BooleanField(default=False)
How can I get a count of all of the healthy foods attached only to the tags that are attached to 'apple' (where a healthy food is denoted by healthy_flag = 1)? Basically, for every 'apple' tag, how many healthy foods share that tag?
Found the answer here.
apple.tags.all().annotate(food_count=Count('food')).filter(food__healthy_flag=True)

Encoding a binary tree to json

I'm using SQLAlchemy to store binary tree data in the database:
class Distributor(Base):
    __tablename__ = "distributors"
    id = Column(Integer, primary_key=True)
    upline_id = Column(Integer, ForeignKey('distributors.id'))
    left_id = Column(Integer, ForeignKey('distributors.id'))
    right_id = Column(Integer, ForeignKey('distributors.id'))
How can I generate JSON "tree" format data like the following:
{'id': 1, 'children': [{'id': 2, 'children': [{'id': 3}, {'id': 4}]}]}
I'm guessing you're asking to store the data in a JSON format? Or are you trying to construct JSON from the standard relational data?
If the former, why don't you just create entries like:
{id: XX, parentId: XX, left: XX, right: XX, value: "foo"}
For each of the nodes, and then reconstruct the tree manually from the entries? Just start from the head (parentId == null) and then assemble the branches.
You could also add an additional identifier for the tree itself, in case you have multiple trees in the database. Then you would just query where the treeId was XXX, and then construct the tree from the entries.
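The reconstruction described above could look roughly like this. It is a minimal sketch that assumes each entry is a plain dict with the id, parentId, left and right keys from the example entry, with None for a missing link; build_tree and the sample entries are invented for illustration.

```python
import json


def build_tree(entries):
    """Assemble a nested {'id': ..., 'children': [...]} dict from flat entries."""
    by_id = {entry["id"]: entry for entry in entries}
    # The head of the tree is the entry without a parent.
    root = next(entry for entry in entries if entry["parentId"] is None)

    def to_node(entry_id):
        entry = by_id[entry_id]
        children = [
            to_node(child_id)
            for child_id in (entry["left"], entry["right"])
            if child_id is not None
        ]
        return {"id": entry["id"], "children": children}

    return to_node(root["id"])


entries = [
    {"id": 1, "parentId": None, "left": 2, "right": None},
    {"id": 2, "parentId": 1, "left": 3, "right": 4},
    {"id": 3, "parentId": 2, "left": None, "right": None},
    {"id": 4, "parentId": 2, "left": None, "right": None},
]
tree = build_tree(entries)
tree_json = json.dumps(tree)
```

The recursion is fine for trees of modest depth; a very deep tree would need an iterative walk to stay under Python's recursion limit.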
I hesitate to provide this answer, because I'm not sure I really understand the problem you're trying to solve (a binary tree, JSON, sqlalchemy: none of these are problems).
What you can do with this kind of structure is to iterate over each row, adding edges as you go along. You'll start with what is basically a cache of objects; which will eventually become the tree you need.
import collections
import json
# Cache of node dicts keyed by distributor id; entries are created on first access.
idmap = collections.defaultdict(dict)
for distributor in session.query(Distributor):
    dist_dict = idmap[distributor.id]
    dist_dict['id'] = distributor.id
    dist_dict.setdefault('children', [])
    if distributor.left_id:
        dist_dict['children'].append(idmap[distributor.left_id])
    if distributor.right_id:
        dist_dict['children'].append(idmap[distributor.right_id])
So we've got a big collection of linked-up dicts that can represent the tree. We don't know which one is the root, though, so look it up as the row without an upline:
root_dist = session.query(Distributor).filter(Distributor.upline_id.is_(None)).one()
json_data = json.dumps(idmap[root_dist.id])