Django / MYSQL foreign key best practice - mysql

I have three models:
class Agent(models.Model):
    name = models.CharField(max_length=200)

class Phone(models.Model):
    calls = models.CharField(max_length=200)
    agent_phone_name = models.CharField(max_length=200)
    agent = models.ForeignKey(Agent, on_delete=models.CASCADE)

class Chat(models.Model):
    chats = models.CharField(max_length=200)
    agent_chat_name = models.CharField(max_length=200)
    agent = models.ForeignKey(Agent, on_delete=models.CASCADE)
Chat and Phone each link to Agent with a foreign key.
I want to make a table like this:
agent1 | sum(calls) | sum(chats)
agent2 | sum(calls) | sum(chats)
agent3 | sum(calls) | sum(chats)
First question is:
I've read about SQL foreign keys as a way to maintain data and referential integrity. However, in my case there are missed calls and chats (the agent didn't pick them up) that generate rows with an empty agent name. So when I insert the phone/chat data and set the FK, I have to leave the FK empty, and that seems to conflict with the idea of an FK. What's the best way to handle this?
Second question is: is using a foreign key the only way to perform a SQL join query in the Django ORM? Is there any other way around it?
Thank you!
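A minimal sketch of the usual pattern, using stdlib sqlite3 to stand in for MySQL (table and column names simplified from the models above): declare the FK as nullable, so NULL simply means "no agent picked up"; the constraint is still enforced whenever the column is non-NULL, so referential integrity is not compromised. The per-agent totals then come from a LEFT JOIN with COUNT:

```python
import sqlite3

# Nullable FK: a NULL agent_id marks a missed call and does not violate
# the foreign key, which is only checked for non-NULL values.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE agent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE phone (
    id       INTEGER PRIMARY KEY,
    agent_id INTEGER REFERENCES agent(id)   -- nullable FK
);
INSERT INTO agent VALUES (1, 'agent1'), (2, 'agent2');
INSERT INTO phone (agent_id) VALUES (1), (1), (2), (NULL);  -- NULL = missed
""")
# One row per agent with the call count, as in the desired report table.
rows = con.execute("""
    SELECT a.name, COUNT(p.id) AS calls
    FROM agent a
    LEFT JOIN phone p ON p.agent_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # [('agent1', 2), ('agent2', 1)]
```

In Django terms this is roughly `agent = models.ForeignKey(Agent, null=True, blank=True, on_delete=models.SET_NULL)` plus `Agent.objects.annotate(calls=Count('phone'))`, though that exact spelling is an assumption about your setup, not something from the question.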


SqlAlchemy not returning all rows when querying table object, but returns all rows when I query table object column

Update - Solution Below
I am extremely new to SQLAlchemy, so please excuse me if this is an obvious problem.
When I query the Table object I only get one result (the first in the database; there are 600+ matching my filter). When I query by a column on the table, it returns all the data I expect. What am I doing incorrectly?
Only returns 1 result (should be hundreds):
for row in edb_alchemy.session.query(FtSite).filter(FtSite.serial_si == 200134444):
    print(row.s_sequence)
Result looks like:
1
Returns all results
for row in edb_alchemy.session.query(FtSite.s_sequence).filter(FtSite.serial_si == 200134444):
    print(row)
Result looks like:
(1,)
(2,)
(3,)
(4,)...
Returns 1 result for FtSite table and all results for column
for row in edb_alchemy.session.query(FtSite, FtSite.s_sequence).filter(FtSite.serial_si == 200134444):
    print(row.FtSite.s_sequence, row.s_sequence)
Result looks like
(1, 1), (1, 2), (1, 3), (1, 4)...
The SQL that SQLAlchemy says it's using is:
"SELECT ft_site.serial_si AS ft_site_serial_si, ft_site.partition_id AS ft_site_partition_id, ft_site.s_sequence AS ft_site_s_sequence, ft_site.value AS ft_site_value" + \
" FROM ft_site" + \
" WHERE ft_site.serial_si = 200134444"
Which works fine as I'd expect when just using SQL query outside of SQLAlchemy.
Update
Thanks to Ilja in the comments.
For some reason I thought this table had an id primary key.
It does not; I'm just a consumer of this db and should have been more observant.
You were correct. This table has no unique key and lists MUL under FtSite.serial_si.
This is what the table actually looks like.
+------------+---------+------+-----+---------+-------+
| Field      | Type    | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+-------+
| p_id       | int(10) | NO   |     | NULL    |       |
| serial_si  | int(10) | NO   | MUL | NULL    |       |
| s_sequence | int(10) | NO   |     | NULL    |       |
| value      | double  | YES  |     | NULL    |       |
+------------+---------+------+-----+---------+-------+
My original table description was
class FtSite(Base):
    __tablename__ = "ft_site"
    id = Column(INTEGER, primary_key=True)
    serial_si = Column(INTEGER)
    partition_id = Column(INTEGER)
    s_sequence = Column(INTEGER)
    value = Column(DOUBLE)
I changed this to use a composite key in SQLAlchemy, since (s_sequence, serial_si) is unique even though that's not defined in the database. Is this the best way to handle this in SQLAlchemy? It's now returning the expected results.
class FtSite(Base):
    __tablename__ = "ft_site"
    serial_si = Column(INTEGER, primary_key=True)
    partition_id = Column(INTEGER)
    s_sequence = Column(INTEGER, primary_key=True)
    value = Column(DOUBLE)
I ran into a similar situation where the SQLAlchemy query object's .all() didn't return all the rows in the table (it was always missing some), but .count() gave the correct count. After digging into it a bit more, I realized that the model declaration deviated from the actual table schema in that database: the database has a single primary key column, but the model declaration had a composite primary key (the reverse of your case), and I had also missed a 3-column unique constraint that the table schema has.
What happened in my case was that whenever SQLAlchemy queried the database, it did get all the rows behind the scenes, but the incorrect composite primary key in my model declaration prevented some rows from loading into SQLAlchemy's session. Primary keys by definition uniquely identify objects, so the session won't load two objects with the same primary key; it tosses out rows whose declared key columns share the same values, even though in the database they have different PKs.
In conclusion: double-checking that the model declaration and the database schema are in sync should be the first response to this kind of issue.
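The session behavior described above can be sketched in plain Python. This is an illustrative stand-in, not SQLAlchemy's actual internals: the identity map keys loaded objects by primary key, so a declared PK that doesn't actually vary per row collapses distinct rows into one object.

```python
# Rows as the database returns them: same serial_si, different s_sequence.
rows = [
    {"serial_si": 200134444, "s_sequence": 1},
    {"serial_si": 200134444, "s_sequence": 2},
    {"serial_si": 200134444, "s_sequence": 3},
]

def load(rows, pk_cols):
    """Simulate identity-map loading: one object per distinct key."""
    identity_map = {}
    for row in rows:
        key = tuple(row[c] for c in pk_cols)
        # A row whose key is already present is treated as the same
        # object, so it is not added again.
        identity_map.setdefault(key, row)
    return list(identity_map.values())

print(len(load(rows, ["serial_si"])))                # 1: wrong PK collapses rows
print(len(load(rows, ["serial_si", "s_sequence"])))  # 3: composite PK keeps all
```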
It is a schema mismatch: a primary key is specified in the SQLAlchemy model but none exists in the underlying database table.
From the SQLAlchemy ORM docs:
Most ORMs require that objects have some kind of primary key defined because the object in memory must correspond to a uniquely identifiable row in the database table
The solution is to define a composite primary key:
class SomeClass(Base):
    __tablename__ = "some_table_with_no_pk"
    uid = Column(Integer, primary_key=True)
-   bar = Column(String)
+   bar = Column(String, primary_key=True)
A related primary key question: How to define a table without primary key with SQLAlchemy?
Yes, what Devy mentioned is correct. I faced the same issue and resolved it this way. Please check the SQLAlchemy model declaration against the database schema. You can quickly generate models that match the schema with sqlacodegen:
sqlacodegen --outfile models.py mysql+pymysql://<username>:<password>@<hostname>:<dbport>/<databasename>
https://pypi.org/project/sqlacodegen/ (Check this for more information).
Please let me know if you need anything.

Is it acceptable to have NULL foreign keys?

This should be a simple question, I think: is it OK to have NULL foreign keys?
I guess to elaborate, let's say I'm making a database for users and different types of users require different data sets... what would be the best practice to design this database?
This was my thought, as a rough example (am I correct or way off?):
"users":
id | type (i.e. '1' for basic, '2' for advanced) | basic_id (nullable foreign key) | advanced_id (nullable foreign key) | email | name | address | phone (etc etc)
"users_basic":
id | user_id (foreign key) | (other data only required for basic users)
"users_advanced":
id | user_id (foreign key) | (other data only required for advanced users)
I get the feeling it's bad design because there's no way to get all the data in one query without first checking what type of user it is, but I really don't like the idea of having ONE table with a ton of NULL data. What is the best way to design this?
Of course it is fine to have NULL foreign keys.
In your case, though, I'd be inclined to do one of two things. If there really aren't very many columns for the basic and advanced users, you can just include them in the users table. This would be the typical approach.
Otherwise, you can declare user_id as the primary key in all three tables, and still have a foreign key relationship from the secondary tables (users_basic and users_advanced) to the primary (users). Maintaining the distinctiveness of the relationship is tricky in MySQL and probably not worth doing.
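The second approach can be sketched with stdlib sqlite3 (column names here are illustrative, not from the question). One LEFT JOIN pulls everything in a single query, with NULLs where a subtype row is absent, which answers the "can't get all the data in one query" worry:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    type    INTEGER NOT NULL,
    email   TEXT
);
CREATE TABLE users_basic (
    user_id    INTEGER PRIMARY KEY REFERENCES users(user_id),
    basic_info TEXT
);
CREATE TABLE users_advanced (
    user_id       INTEGER PRIMARY KEY REFERENCES users(user_id),
    advanced_info TEXT
);
INSERT INTO users VALUES (1, 1, 'a@x'), (2, 2, 'b@x');
INSERT INTO users_basic VALUES (1, 'basic stuff');
INSERT INTO users_advanced VALUES (2, 'advanced stuff');
""")
# One query for everything; missing subtype rows come back as NULL.
rows = con.execute("""
    SELECT u.user_id, u.type, b.basic_info, a.advanced_info
    FROM users u
    LEFT JOIN users_basic b    ON b.user_id = u.user_id
    LEFT JOIN users_advanced a ON a.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
print(rows)  # [(1, 1, 'basic stuff', None), (2, 2, None, 'advanced stuff')]
```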

What's best data structure for primary key and secondary key in database?

In my Python project, I have an employee map. The key is the employee number, and the value is an object with employee information; the location field in the value is unique. For example,
mapA = {12: {name: 'John Ma', location: 'US'}, 25: {name: 'Richard Yan', location: 'Eng'}, ...}
Also there is another map, the key is location, and value is another objects with something else. such as,
mapB = {'US': objectS1, 'Eng': objectS2, ...}
Now I want to build one map between employee number and objectS which name is mapC.
mapC = {12: objectS1, 25: objectS2 ...}
I must find the location in each object of mapA, then get the key that points to this object, then use that key and the objectS in mapB to build the new mapC. Done this way, the time complexity is O(N^2).
To reduce the time complexity, I first build mapD, whose key is location and whose value is the employee number, by traversing mapA; then I can build mapC easily.
Is there any better solution for that question?
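For reference, no intermediate mapD is needed: since each mapB lookup is an O(1) dict access, the whole build is a single O(N) pass over mapA. A minimal sketch, with strings standing in for the objectS values:

```python
mapA = {12: {"name": "John Ma", "location": "US"},
        25: {"name": "Richard Yan", "location": "Eng"}}
mapB = {"US": "objectS1", "Eng": "objectS2"}  # placeholder objects

# One pass over mapA; mapB[...] is O(1), so this is O(N) overall.
mapC = {emp_no: mapB[info["location"]] for emp_no, info in mapA.items()}
print(mapC)  # {12: 'objectS1', 25: 'objectS2'}
```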
BTW, another question came to mind: what's the best data structure for a primary key and a secondary key in a database?
For example, here is one table,
| Username  | Password  | Email_Addr |
--------------------------------------
| Username1 | Password1 | Email 1    |
| Username2 | Password2 | Email 2    |
| Username3 | Password1 | Email 3    |
and Username is the primary key and Password is the secondary key. What's the best data structure to store them in to get the best performance?
IMO, to simplify, combine username and password into one compound key, and sort those records according to the following compare function,
bool key_compare(string key1, string key2){
    return key1+key2 > key2+key1;
}
and those records are stored in a B-tree.
Edit 2
From this document, we know those two indexes can be implemented with a hash table and an inverted table. In the table above,
the hash table saves the Username and the address of each record,
{Username1: 'record1 address', Username2: 'record2 address'}
and the inverted table holds the Password and a table of record addresses.
{Password1: ['record1 address', 'record3 address'], Password2: ['record2 address']}
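The two index shapes described above can be sketched in plain Python, using list indices as stand-ins for record addresses:

```python
from collections import defaultdict

records = [
    ("Username1", "Password1", "Email 1"),
    ("Username2", "Password2", "Email 2"),
    ("Username3", "Password1", "Email 3"),
]

# Hash table: unique Username -> record address.
primary_index = {rec[0]: addr for addr, rec in enumerate(records)}

# Inverted table: non-unique Password -> list of record addresses.
secondary_index = defaultdict(list)
for addr, rec in enumerate(records):
    secondary_index[rec[1]].append(addr)

print(primary_index["Username2"])    # 1
print(secondary_index["Password1"])  # [0, 2]
```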
Edit 1
I want to know how databases handle them, for example MySQL...
Is there any hash map besides the B-tree? Thanks in advance.
I think a better solution would be to use one map that maps id to employee and to keep all the information about said employee in an Employee object.
So based on your example Employee would have two values
1. name
2. location
Your issue seems to be that location is a key to another map and not the information you'd like to grab. Why not have a 'Location' object that holds all this information? You'd be able to eliminate the need for a second and third map.
Create a "location" object with information such as:
1. Country
2. Latitude\longitude
So now
mapA = {12: employee1, 43: employee2, ...}
employee1 has information such as:
name = 'John'
location = usLocation
usLocation is an object of type location and has all the relevant information encapsulated inside of it.
As for your second question I haven't dealt much with SQL so I'll let someone else tackle that.

Allow/require only one record with common FK to have "primary" flag

Firstly, I apologise if this is a dupe - I suspect it may be but I can't find it.
Say I have a table of companies:
 id | company_name
----+--------------
  1 | Someone
  2 | Someone else
...and a table of contacts:
 id | company_id | contact_name | is_primary
----+------------+--------------+------------
  1 | 1          | Tom          | 1
  2 | 2          | Dick         | 1
  3 | 1          | Harry        | 0
  4 | 1          | Bob          | 0
Is it possible to set up the contacts table in such a way that it requires that one and only one record has the is_primary flag set for each common company_id?
So if I tried to do:
UPDATE contacts
SET is_primary = 1
WHERE id = 4
...the query would fail, because Tom (id = 1) is already flagged as the primary contact for company_id = 1. Or even better, would it be possible to construct a trigger so that the query would succeed, but Tom's is_primary flag would be cleared by the same operation?
I am not too bothered about checking whether company_id exists in the companies table, my PHP code would already have performed this check before I got to this stage (although if there is a way to do this in the same operation it would be nice, I suppose).
When I initially thought about this, I assumed it would be easy: "I'll just add a unique index across the company_id and is_primary columns." But obviously that won't work, as it would restrict me to one primary and one non-primary contact; any attempt to add a third contact would fail. Still, I can't help feeling there would be a way to configure a unique index that gives me the minimum functionality I require: to reject an attempt to add a second primary contact, or to reject an attempt to leave a company with no primary contact.
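The failure mode described above is easy to demonstrate with stdlib sqlite3: the naive UNIQUE (company_id, is_primary) index rejects the second non-primary contact, not just a second primary.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE contacts (
    id         INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL,
    is_primary INTEGER NOT NULL,
    UNIQUE (company_id, is_primary)
);
INSERT INTO contacts VALUES (1, 1, 1), (3, 1, 0);
""")
rejected = False
try:
    # A second non-primary contact for company 1 violates the index.
    con.execute("INSERT INTO contacts VALUES (4, 1, 0)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

A common workaround (not mentioned in the original post, so treat it as an aside): make is_primary nullable, storing 1 for the primary contact and NULL otherwise. Most engines, MySQL included, allow repeated NULLs in a unique index, so UNIQUE (company_id, is_primary) then permits many non-primary contacts but only one primary per company.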
I am aware that I could just add a primary_contact field to the companies table with an FK to the contacts table but it feels messy. I don't like the idea of both tables having an FK to the other - it seems to me that the one table should rely on the other, not both tables relying on each other. I guess I just think that over time there is more chance of something going wrong.
To sum up:
How can I restrict the contacts table so that one and only one record with a given company_id has the is_primary flag set?
Anyone have any thoughts on whether two tables having FKs to each other is a good/bad idea?
Circular references between tables are indeed messy. See this (decade-old) article: SQL By Design: The Circular Reference
The cleanest way to make such a constraint is to add another table:
Company_PrimaryContact
----------------------
company_id
contact_id
PRIMARY KEY (company_id)
FOREIGN KEY (company_id, contact_id)
REFERENCES Contact (company_id, id)
This will also require a UNIQUE constraint in table Contact on (company_id, id)
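The constraint above can be sketched with stdlib sqlite3 (standing in for MySQL): the PRIMARY KEY on company_id guarantees at most one primary contact per company, and the composite foreign key guarantees that the contact really belongs to that company.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE contact (
    id         INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL,
    name       TEXT,
    UNIQUE (company_id, id)          -- needed as the FK target
);
CREATE TABLE company_primarycontact (
    company_id INTEGER PRIMARY KEY,  -- at most one primary per company
    contact_id INTEGER NOT NULL,
    FOREIGN KEY (company_id, contact_id) REFERENCES contact (company_id, id)
);
INSERT INTO contact VALUES (1, 1, 'Tom'), (3, 1, 'Harry');
INSERT INTO company_primarycontact VALUES (1, 1);  -- Tom is primary
""")
ok = False
try:
    # A second primary for company 1 collides with the PRIMARY KEY.
    con.execute("INSERT INTO company_primarycontact VALUES (1, 3)")
except sqlite3.IntegrityError:
    ok = True
# Switching the primary contact is an UPDATE instead of an INSERT:
con.execute("UPDATE company_primarycontact SET contact_id = 3 WHERE company_id = 1")
```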
You could just run a query before that one, setting:
UPDATE contacts SET is_primary = 0 WHERE company_id = .....
or even do it in one statement. Note that MySQL won't let a subquery in an UPDATE read from the table being updated (error 1093), so the subquery needs wrapping in a derived table:
UPDATE contacts
SET is_primary = IF(id = [USERID], 1, 0)
WHERE company_id = (
    SELECT company_id FROM (
        SELECT company_id FROM contacts WHERE id = [USERID]
    ) AS t
);
Just putting an alternative out there; personally I'd probably look to the FK approach instead of this type of workaround, i.e. a primary_user_id field in the companies table.
EDIT: method without relying on a contacts.is_primary field
Alternative method: first, remove is_primary from contacts. Second, add a primary_contact_id INT field to companies. Third, when changing the primary contact, just change that primary_contact_id, preventing any possibility of there being more than one primary contact at any time, all without needing triggers etc. in the background.
This option works fine in any engine, as it's simply updating an INT field; any reliance on FKs etc. could be added or removed as required, but at its simplest it's just changing an INT field's value.
This option is viable as long as you need one, and precisely one, link from companies to contacts flagging a primary contact.

Django Foreign Key

I am trying to generate a report across 2 models/ tables. Here they are:
class Members(models.Model):
    username = models.CharField(max_length=30, null=True, unique=True)
    email = models.CharField(max_length=100, null=True, unique=True)
    name = models.CharField(max_length=30, null=True)
    phone = models.CharField(max_length=30, null=True)
and
class Report(models.Model):
    report_text = models.CharField(max_length=500)
    reporter_id = models.IntegerField(db_index=True)
    reported_id = models.IntegerField(db_index=True)
    date_created = models.DateTimeField(null=True)
    date_read = models.DateTimeField(null=True)
The 2 tables obviously have auto increment IDs as the primary key.
The report will look like this:
Reported Phone | Reported Name | Report | Date Reported | Date Report Read
Everyone reported on will be in the Members table. The reporter_id is the ID of the member who logged the report. The reported_id is the ID of the person the report is on. I need to do a join across the 2 models to get the member's name and phone number. I can't quite work it out from the docs. I believe I should make reported_id and reporter_id both foreign keys to the Members table's primary key ID field. How do I do that, and what code will extract the report for all entries submitted by a specific reporter?
Do I use reported_id = models.ForeignKey(Members) and do the same for reporter_id? It seems odd, as I don't specify the field that the foreign key points to. The ORM is supposed to make it easier (and it usually does!). I could do it with a join in SQL, but this has got me stumped.
I hope the question makes sense.
Thanks in advance
Rich
How do I do that and what code will extract the report for all entries submitted by a specific reporter?
Yes, do reported_id = models.ForeignKey(Members).
The field will reference the target model's primary key, which in your case is id, since you haven't specified one.
You will need to specify a related_name for one of these fields to prevent a name clash for the reverse foreign key accessor.
After setting up the foreign key field, to get all objects related via that foreign key, use the related_name to query the related model.
http://docs.djangoproject.com/en/dev/topics/db/queries/#following-relationships-backward
For example, if you set up your model as:
reporter = models.ForeignKey(Members, related_name="reports_by_me")
reported = models.ForeignKey(Members, related_name="reports_by_others")
You could access all related Report models via that foreign key by
member_instance.reports_by_me.all()
member_instance.reports_by_others.all()
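For the report itself, a hedged sketch (assuming the related_name setup above, with a member_instance already fetched; this is not runnable on its own without the Django project around it): filter Report by reporter and pull the reported member's columns in the same query with select_related.

```
# All reports filed by this member, joining in the reported member's row
# so phone/name don't trigger extra queries.
reports = (Report.objects
           .filter(reporter=member_instance)
           .select_related("reported"))
for r in reports:
    print(r.reported.phone, r.reported.name,
          r.report_text, r.date_created, r.date_read)
```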