SQL Inheritance, get by ID - mysql

This question aims to get the most clean and "best" way to handle this kind of problem.
I've read many questions about how to handle inheritance in SQL and like the Table Per Type model most and would like to use it. The problem with this is that you have to know what type you are going to query to do the proper join.
Let's say we have three tables: Son, Daughter and Child.
This works very well if you for example want to query all daughters. You can simply join the child and get all the information.
What I'm trying to do is to query a Child by ID and get the associated subclass information. What I could do is add a Type column to Child and select the associated data with a second select, but that does not seem very nice. Another way would be to join all sub tables, but that doesn't seem that nice either.
Is there an inheritance model to solve this kind of problem in a clean, nice and performant way?
I'm using MySQL btw

Given your detailed definition in the comment with the use case
The Server gets the http request domain.com/randomID.
it becomes apparent that you have a single ID at hand for which you want to retrieve the attributes of derived entities. For your case, I would recommend using the LEFT JOIN approach:
SELECT age,
       son.id IS NOT NULL AS isSon,
       randomColumn,
       daughter.id IS NOT NULL AS isDaughter,
       whatEver
FROM child
LEFT JOIN son ON child.id = son.id
LEFT JOIN daughter ON child.id = daughter.id
WHERE
child.id = #yourRandomId
This approach, BTW, stays very close to your current database design and thus you would not have to change much. Yet, you are able to benefit from the storage savings that the improved data model provides.
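A minimal sketch of that lookup, run against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained (the question uses MySQL, where the query has the same shape). The column names come from the query above; which table each column lives on, the sample rows, and the helper name fetch_child are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE child    (id INTEGER PRIMARY KEY, age INTEGER);
    -- Assumption: randomColumn lives on son, whatEver on daughter.
    CREATE TABLE son      (id INTEGER PRIMARY KEY REFERENCES child(id), randomColumn TEXT);
    CREATE TABLE daughter (id INTEGER PRIMARY KEY REFERENCES child(id), whatEver TEXT);
    INSERT INTO child VALUES (1, 12), (2, 9);
    INSERT INTO son VALUES (1, 'likes football');
    INSERT INTO daughter VALUES (2, 'likes chess');
""")

def fetch_child(child_id):
    """Return one row for the child plus whichever subtype columns exist."""
    return conn.execute(
        """
        SELECT child.age,
               son.id IS NOT NULL      AS isSon,
               son.randomColumn,
               daughter.id IS NOT NULL AS isDaughter,
               daughter.whatEver
        FROM child
        LEFT JOIN son      ON child.id = son.id
        LEFT JOIN daughter ON child.id = daughter.id
        WHERE child.id = ?
        """,
        (child_id,),
    ).fetchone()

print(fetch_child(1))
print(fetch_child(2))
```

The isSon / isDaughter flags tell the caller which of the nullable columns are meaningful, so no Type column is needed for this lookup.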
Besides that, I do not see many chances to do it differently:
You have different columns with different datatypes (esp. if looking at your use case), so it is not possible to reduce the number of columns by combining some of them.
Introducing a type attribute is already rejected in your question; sending single SELECT statements as well.
In the comment you are stating that you are looking for something like Map<ID, Child> in MySQL. Please note that this java'ish expression is a compile-time expression which gets instantiated during runtime with the corresponding type of the instance. SQL does not know the difference between runtime and compile-time. Thus, there is also no need for such a generic expression. Finally, also please note that in case of your Java program, you also need to analyse (by introspection or usage of instanceof) which type your value instance has -- and that is also a "single-record" activity which you need to perform.

Related

Preserve data integrity in a database structure with two paths of association

I have this situation that is as simple as it is annoying.
The requirements are
Every item must have an associated category.
Every item MAY be included in a set.
Sets must be composed of items of the same category.
There may be several sets of the same category.
The desired logic procedure to insert new data is as following:
Categories are inserted.
Items are inserted. For each new item, a category is assigned.
Sets of items of the same category are created.
I'd like to get a design where data integrity between tables is ensured.
I have come up with the following design, but I can't figure out how to maintain data integrity.
If the relationship highlighted in yellow is not taken into account, everything is very simple and data integrity is forced by design: an item acquires a category only when it is assigned to a set and the category is given by the set itself. However, it would not be possible to have items not associated with a set but linked to a category and this is annoying.
I want to avoid using special "bridging sets" to assign a category to an item since it would feel hacky and there is no way to distinguish between real sets and special ones.
So I introduced the relationship in yellow. But now you can create sets of objects of different categories!
How can I avoid this integrity problem using only plain constraints (index, uniques, FK) in MySQL?
Also I would like to avoid triggers as I don't like them as it seems a fragile and not very reliable way to solve this problem...
I've read similar questions, such as How to preserve data integrity in circular reference database structure?, but I cannot understand how to apply the solution in my case...
Interesting scenario. I don't see a slam-dunk 'best' approach. One consideration here is: what proportion of items are in sets vs attached only to categories?
What you don't want is two fields on items. Because, as you say, there's going to be data anomalies: an item's direct category being different to the category it inherits via its set.
Ideally you'd make a single field on items that is an Algebraic Data Type aka Tagged Union, with a tag saying its payload was a category vs a set. But SQL doesn't support ADTs. So any SQL approach would have to be a bit hacky.
Then I suggest the compromise is to make every item a member of a set, from which it inherits its category. Then data access is consistent: always JOIN items-sets-categories.
To support that, create dummy sets whose only purpose is to link to a category.
To address "there is no way to distinguish between real sets and special ones": put an extra field/indicator on sets: this is a 'real' set vs this is a link-to-category set. (Or a hack: make the set-description as "Category: <category-name>".)
Addit: BTW your "desired logic procedure to insert new data" is just wrong: you must insert sets (Step 3) before items (Step 2).
I think I might have found a solution by looking at the answer from Roger Wolf to a similar situation here:
Ensuring relationship integrity in a database modelling sets and subsets
Essentially, in the items table, I've changed the set_id FK to a composite FK that references both set.id and set.category_id from, respectively, the items.set_id and items.category_id columns.
In this way there is an overlap of the two FKs on items table.
So for each row in items table, once a category_id is chosen, the FK referring to the sets table is forced to point to a set of the same category.
If this condition is not respected, an exception is thrown.
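A rough sketch of the overlapping foreign keys, again using SQLite via Python's sqlite3 so it can be run as-is; the table and column names follow the description above, while the sample data and the helper try_insert are made up for illustration. The composite FK needs a UNIQUE (id, category_id) key on the sets table to point at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sets (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        UNIQUE (id, category_id)          -- target of the composite FK
    );
    CREATE TABLE items (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        set_id      INTEGER,              -- NULL means "not in any set"
        FOREIGN KEY (set_id, category_id) REFERENCES sets (id, category_id)
    );
    INSERT INTO categories VALUES (1, 'books'), (2, 'toys');
    INSERT INTO sets VALUES (10, 1);      -- set 10 belongs to category 1
""")

def try_insert(item_id, category_id, set_id):
    """Insert an item; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO items (id, category_id, set_id) VALUES (?, ?, ?)",
            (item_id, category_id, set_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(1, 1, None))  # item with a category but no set
print(try_insert(2, 1, 10))    # item in a set of its own category
print(try_insert(3, 2, 10))    # item in a set of a different category
```

The third insert is the case the composite FK is meant to reject, because no row (10, 2) exists in sets.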
Now, the original answer came with an advice against the use of this approach.
I am uncertain whether this is a good idea or not.
Surely it works, and I think it is a fairly elegant solution compared to one that uses triggers for such a simple piece of a more complex design.
Maybe the same solution is more difficult to understand and maintain if heavily applied to a large set of tables.
Edit:
As AntC pointed out in the comments below, this technique, although working, can give insidious problems e.g. if you want to change the category_id for a set.
In that case you would have to update the category_id of each item linked to that set.
That needs a transaction (START TRANSACTION ... COMMIT) wrapped around the updates.
So ultimately it's probably not worth it and it's better to investigate the requirements further in order to find a better schema.

Django/SQL - Creating a table view that joins a table with an override table

So I have the following model structure in my Django App:-
class SuperModel(models.Model):
    f1 = models.CharField()
    f2 = models.CharField()

class Model(SuperModel):
    f3 = models.CharField()

class OverrideModel(models.Model):
    fpk = models.OneToOneField(Model, primary_key=True)
    f1 = models.CharField()
    f2 = models.CharField()
Basically, in my application, the fields f1 and f2 in the Model table contain user information that I have entered. The user has the ability to override this information and any changes he/she makes in the data is stored in the OverrideModel table (because I do not want to lose the information that I had entered first). Think of it as me creating user profiles earlier while now I want the user to be able to edit his/her own profile without losing the information that I had entered about them.
Now, since the rest of my application (views, templates, etc.) works with the field names in the Model class, what I want is to create a view of the data that fetches the field f1 from the override table if it exists and otherwise picks up f1 from the table it used earlier, without resorting to a raw queryset.
I will describe everything I have considered so far so that some of the other constraints I am working with become clear:-
1. Model.objects.annotate(f1=Case(When(overridemodel__f1__isnull=True, then=F('f1')), default=F('overridemodel__f1'))).
This throws the error that the annotate alias conflicts with a field already in the table.
2. Model.objects.defer('f1').extra(select={'f1': 'CASE WHEN ... END'}, tables=..., where=...).
This approach cannot be applied because I could not figure out a way to apply an outer join using extra. The override model may not have a row corresponding to each model row. Specifying the override table in the tables clause performs a cross product operation which combined with where can be used to perform an inner join, not an outer join (although I'd be happy to be proved wrong).
EDIT: I have realized that select_related might be able to solve the above problem but if I filter the queryset generated by Model.objects.select_related('overridemodel').defer('f1').extra(select={'f1': 'CASE WHEN ... END'}, tables=..., where=...) on the field f1, say qs.filter(f1='Random stuff') the where clause for the filter query uses the Model.f1 field rather than the f1 field generated in extra. So this approach is also futile.
3. Using Model.objects.raw() to get a raw queryset.
This is a non-starter because the Django ORM becomes useless after using raw and I need to be able to filter / sort the model objects as part of the application.
4. Defining methods/properties on the Model class.
Again, I will not be able to use the same field names here which involves hunting through code for all usages and making changes.
5. Creating a view in the database that gives me what I want and creating an unmanaged model that reads the data from that view.
This is probably the best solution for my problem but having never used an unmanaged model before, I'm not sure how to go about it or what pitfalls I might encounter. One problem that I can think of off the top of my head is that my view always has to be kept in sync with the models but that seems a small price to pay compared to hunting through the codebase and making changes and then testing to see if anything broke.
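For what the database view in the last option might look like, here is a small sketch using SQLite through Python's sqlite3 (the question uses MySQL, but LEFT JOIN and COALESCE behave the same way for this purpose). It is a simplification: SuperModel and Model are collapsed into one table, and the view name model_effective, the id column and the sample rows are all hypothetical. On the Django side, an unmanaged model (managed = False in its Meta) could then be pointed at such a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplification: SuperModel and Model collapsed into a single table.
    CREATE TABLE model (id INTEGER PRIMARY KEY, f1 TEXT, f2 TEXT, f3 TEXT);
    CREATE TABLE overridemodel (
        fpk INTEGER PRIMARY KEY REFERENCES model(id),
        f1  TEXT,
        f2  TEXT
    );
    -- Hypothetical view: override value if present, original value otherwise.
    CREATE VIEW model_effective AS
        SELECT m.id,
               COALESCE(o.f1, m.f1) AS f1,
               COALESCE(o.f2, m.f2) AS f2,
               m.f3
        FROM model AS m
        LEFT JOIN overridemodel AS o ON o.fpk = m.id;
    INSERT INTO model VALUES (1, 'admin name', 'admin bio', 'x'),
                             (2, 'other name', 'other bio', 'y');
    INSERT INTO overridemodel VALUES (1, 'user name', NULL);
""")

def rows_with_f1(value):
    """Filter on the effective f1, i.e. after overrides are applied."""
    return conn.execute(
        "SELECT id, f1, f2, f3 FROM model_effective WHERE f1 = ? ORDER BY id",
        (value,),
    ).fetchall()

print(rows_with_f1("user name"))
print(rows_with_f1("admin name"))
```

Filtering happens on the overridden value here, which is the behaviour the extra() attempt could not deliver.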
So, there you have it. As always, any help / pointers will be greatly appreciated. I have tried to provide as minimal an example as possible; so if any more information is required I'll be happy to provide it.
Also, I am using Django 1.8 with MySQL.
I realized that there is no easy canonical way to solve my problem. Even with using option 5 (creating a view that is ORM manipulated using an unmanaged Model), I would lose the related query names on the original model that are being used in my filtering / sorting.
So, for anyone else with a similar problem, I would recommend the approach I finally went with: instead of keeping an OverrideModel, keep an OverriddenModel that stores the values that are overridden whenever the user makes changes, and update the original Model with the override values, so that the model always contains the values on which filtering / querying is going to occur.

How do I join on the correct one-to-one table (super-type, sub-type model)?

I've been looking into first, second, and third normal forms, and I want to do a better job normalizing my tables. Part of this, I realized, was that I've never understood the purpose of one-to-one tables. From what I understand, "optional" data should be grouped into another table, leaving distinct entities intact, while avoiding the nuances of maintaining several NULL fields in one monolithic table.
So, a real-world scenario. In a CMS, I want to maintain several different types of "pages," making it extendable by additional plugins without affecting the original schema. I have these as sample tables so far:
Pages (title, path, type, etc.)
ContentPages (same as base page, but with keyword/description/content fields)
LinkedPages (same as base page, but contains a reference to another page)
ProductPages (same as base page, but with SKU and other ecomm-related info)
So far, so good. No NULLS. Self-documented design. Super-typing / Sub-typing is consistent between my PHP models and database. Everything's DANDY.
EXCEPT, given any page ID, I don't want to do a first query to get the base page info, figure out what type of page it is, and then get the corresponding sub-type information with another query. Do I have to keep track of this with application state (or URL), or is there a way to know which table to join on, while only knowing the page ID and nothing else?
This is really easy with only one table (obviously), as the NULL fields imply the type, or an ENUM can tell me what it is. Switching back to 1NF isn't an acceptable answer, as I already know how to do it. I want to learn this way ;)
UPDATE: Also wanted to mention that each of the sub-type properties is unique to that type. So, any common property shared by all types will, of course, go into the base page table. Sub-types won't share any other properties. This seemed like a logical way to group the sub-tables, but maybe I'm defeating the purpose of one-to-one tables with this arrangement...
It depends on who's asking the question. If your plugin is driving the query then it can start at its specific subtype and join in the supertype, which it knows must exist.
I don't know what your business requirements are, but it seems to me that if you are trying to keep things modular then you want to drive as many joins from the child side (i.e. the plugin side) as possible.
If you are going to have a query driven from the supertype to the subtype then you can use an outer join and just be ready in your code to handle null columns if the subtype in question isn't present. Obviously that approach is less modular, but I suppose there could be times when that is what you need or want to do.
You could create a view by left outer joining all the subtypes on the main Page table. The view could be queried by a single page_id and would return one row with many null values, the same as you'd get with one big 1st normal form page table.
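A sketch of such a view, using SQLite through Python's sqlite3 so it runs on its own. The table names mirror the question's sample tables, but the exact columns, the view name all_pages and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages         (id INTEGER PRIMARY KEY, title TEXT, path TEXT);
    CREATE TABLE content_pages (page_id INTEGER PRIMARY KEY REFERENCES pages(id), content TEXT);
    CREATE TABLE linked_pages  (page_id INTEGER PRIMARY KEY REFERENCES pages(id), target_page_id INTEGER);
    CREATE TABLE product_pages (page_id INTEGER PRIMARY KEY REFERENCES pages(id), sku TEXT);
    -- Hypothetical view: one row per page, NULLs for the subtypes it is not.
    CREATE VIEW all_pages AS
        SELECT p.id, p.title, p.path, c.content, l.target_page_id, pr.sku
        FROM pages AS p
        LEFT OUTER JOIN content_pages AS c  ON c.page_id  = p.id
        LEFT OUTER JOIN linked_pages  AS l  ON l.page_id  = p.id
        LEFT OUTER JOIN product_pages AS pr ON pr.page_id = p.id;
    INSERT INTO pages VALUES (1, 'About', '/about'), (2, 'Widget', '/widget');
    INSERT INTO content_pages VALUES (1, 'Hello');
    INSERT INTO product_pages VALUES (2, 'SKU-1');
""")

def page_by_id(page_id):
    """Single query by page id; subtype columns that do not apply come back as None."""
    return conn.execute("SELECT * FROM all_pages WHERE id = ?", (page_id,)).fetchone()

print(page_by_id(1))
print(page_by_id(2))
```

A plugin that adds a new page type would have to extend the view as well, which is the modularity cost mentioned earlier.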
"is there a way to know which table to join on, while only knowing the page ID and nothing else?"
Well, in a supertype/subtype structure, you should know more than the page ID. You should also know the subtype.
Usually, a supertype/subtype structure for 'n' subtypes maps to
n + 1 tables, one for each subtype, plus one for the supertype, and
n updatable views, each of which joins the supertype with the appropriate subtype
So your application should usually be working with the views, not with the base tables. (Usually, but not always.)
If you're not using the views, then when you retrieve the page id numbers from the supertype, you should also retrieve the column that identifies the subtype. Don't have such a column? Fix that. And see this other relevant SO database design problem for a supertype/subtype with code, a description of the structure, and the logic behind it.
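To make the "retrieve the subtype column, then use the matching view" idea concrete, here is a sketch in Python with sqlite3. Note that SQLite views are read-only, so this only illustrates the read path; the type values, view names and the helper load_page are assumptions, not part of the linked answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (
        id    INTEGER PRIMARY KEY,
        title TEXT,
        type  TEXT NOT NULL CHECK (type IN ('content', 'product'))  -- subtype discriminator
    );
    CREATE TABLE content_pages (page_id INTEGER PRIMARY KEY REFERENCES pages(id), content TEXT);
    CREATE TABLE product_pages (page_id INTEGER PRIMARY KEY REFERENCES pages(id), sku TEXT);
    -- One view per subtype, joining the supertype with that subtype.
    CREATE VIEW content_pages_v AS
        SELECT p.id, p.title, c.content
        FROM pages AS p JOIN content_pages AS c ON c.page_id = p.id;
    CREATE VIEW product_pages_v AS
        SELECT p.id, p.title, pr.sku
        FROM pages AS p JOIN product_pages AS pr ON pr.page_id = p.id;
    INSERT INTO pages VALUES (1, 'About', 'content'), (2, 'Widget', 'product');
    INSERT INTO content_pages VALUES (1, 'Hello');
    INSERT INTO product_pages VALUES (2, 'SKU-1');
""")

# Fixed whitelist, so interpolating the view name below is safe.
VIEW_FOR_TYPE = {"content": "content_pages_v", "product": "product_pages_v"}

def load_page(page_id):
    """Look up the subtype first, then read the row from that subtype's view."""
    row = conn.execute("SELECT type FROM pages WHERE id = ?", (page_id,)).fetchone()
    if row is None:
        return None
    view = VIEW_FOR_TYPE[row[0]]
    return conn.execute(f"SELECT * FROM {view} WHERE id = ?", (page_id,)).fetchone()

print(load_page(1))
print(load_page(2))
```

This does cost two round trips when only the page ID is known; the trade-off is that each view returns only the columns relevant to its subtype.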

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I got stuck on the database table design for handling this functionality. The solution is trivial if we only need this for one type of thing (e.g. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same goes for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design because of the last part; it smells bad to me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
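SET @typeId = (SELECT idType FROM ElementType WHERE TypeName = 'Photo');
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (@userId, @typeId, @photoId);
and for places:
SET @typeId = (SELECT idType FROM ElementType WHERE TypeName = 'Place');
INSERT INTO LikedElement (idUser, idElementType, idLikedElement) VALUES (@userId, @typeId, @placeId);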
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
Finally, I also wonder where the best place is to store a counter of how many times an element was liked. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
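Since the diagrams did not survive here, a rough sketch of the third approach may help. It uses SQLite through Python's sqlite3 so that it runs standalone; all table names, column names and helper functions are assumptions chosen for illustration, and tags and comments would hang off the entity table in the same way as likes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE users   (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity  (entity_id INTEGER PRIMARY KEY);  -- the "base" table
    -- "Inherited" tables share the base table's key.
    CREATE TABLE photo   (entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id), url TEXT);
    CREATE TABLE article (entity_id INTEGER PRIMARY KEY REFERENCES entity(entity_id), title TEXT);
    -- Likes reference only the base table, so every subtype is likeable.
    CREATE TABLE likes (
        user_id   INTEGER NOT NULL REFERENCES users(user_id),
        entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
        PRIMARY KEY (user_id, entity_id)
    );
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
""")

def add_photo(url):
    """Create the base row first, then the subtype row with the same key."""
    entity_id = conn.execute("INSERT INTO entity DEFAULT VALUES").lastrowid
    conn.execute("INSERT INTO photo VALUES (?, ?)", (entity_id, url))
    return entity_id

def add_article(title):
    entity_id = conn.execute("INSERT INTO entity DEFAULT VALUES").lastrowid
    conn.execute("INSERT INTO article VALUES (?, ?)", (entity_id, title))
    return entity_id

def add_like(user_id, entity_id):
    conn.execute("INSERT INTO likes VALUES (?, ?)", (user_id, entity_id))

def like_count(entity_id):
    return conn.execute(
        "SELECT COUNT(*) FROM likes WHERE entity_id = ?", (entity_id,)
    ).fetchone()[0]

photo_id = add_photo("a.jpg")
article_id = add_article("Hello")
add_like(1, photo_id)
add_like(2, photo_id)
add_like(1, article_id)
print(like_count(photo_id), like_count(article_id))
```

Adding a new kind of entity would mean one new subtype table and one add_* helper; the likes table stays untouched.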
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
Please don't pay much attention to the field name styling, but more to the relations and structure.
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff = 'photo'
AND actions.typeAction = 'comment'
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff = 'photo'
AND actions.typeAction = 'like'
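The two queries above can be tried out with a small sketch like the following, which uses SQLite through Python's sqlite3. Only id_Stuff, typeStuff and typeAction come from the pseudocode; the other columns (id, id_User, body), the sample rows and the helper names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (
        id         INTEGER PRIMARY KEY,
        id_User    INTEGER NOT NULL,   -- assumed: who performed the action
        id_Stuff   INTEGER NOT NULL,   -- id of the photo/article/place
        typeStuff  TEXT NOT NULL,      -- 'photo', 'article', ...
        typeAction TEXT NOT NULL,      -- 'comment', 'like', ...
        body       TEXT                -- assumed: comment text, NULL for likes
    );
    INSERT INTO actions (id_User, id_Stuff, typeStuff, typeAction, body) VALUES
        (1, 5, 'photo',   'comment', 'Nice shot'),
        (2, 5, 'photo',   'like',    NULL),
        (3, 5, 'photo',   'like',    NULL),
        (1, 5, 'article', 'like',    NULL);
""")

def comments_on(type_stuff, id_stuff):
    """All comment bodies for one item of one type."""
    rows = conn.execute(
        "SELECT body FROM actions"
        " WHERE id_Stuff = ? AND typeStuff = ? AND typeAction = 'comment'"
        " ORDER BY id",
        (id_stuff, type_stuff),
    ).fetchall()
    return [body for (body,) in rows]

def count_likes(type_stuff, id_stuff):
    """Number of likes for one item, using COUNT(*) instead of fetching rows."""
    return conn.execute(
        "SELECT COUNT(*) FROM actions"
        " WHERE id_Stuff = ? AND typeStuff = ? AND typeAction = 'like'",
        (id_stuff, type_stuff),
    ).fetchone()[0]

print(comments_on("photo", 5))
print(count_likes("photo", 5))
```

Because id_Stuff is only unique within a type, every query has to filter on typeStuff as well, as the article row with the same id shows.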
As far as I understand, several tables are required, with a many-to-many relation between them:
A table which stores the user data such as name, surname and birth date, with an identity field.
A table which stores data types. These types may be photos, shares, links. Each type must have its own table; therefore, there is a relation between their individual tables and this table.
One table per data type, for example status updates, photos, links.
A last table for the many-to-many relation, storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to be made particularly difficult or inefficient by one design choice or the other?
If not, favour the one that requires fewer tables.
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems a little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using a table per entity for comments, likes and so on. More tables mean better sharding and scaling. Controlling many similar tables is not a problem in any framework I know.
One day you'll need to optimize reads from such a structure. You can easily create aggregating tables over the base ones and lose a bit on writes.
One big table with a dictionary may become uncontrollable one day.
Definitely go with the second approach, where you have one table and store the element type for each row; it will give you a lot more flexibility. Basically, when something can logically be done with fewer tables, it is almost always better to go with fewer tables. One advantage in your particular case: if you want to delete all liked elements of a certain user, the first approach needs one query per element type, while the second needs only one query. Likewise, adding a new element type means creating a new table with the first approach, but with the second approach you don't need to do anything...

Linq to SQL and Gridview Datasource

I have a question related to this one. I don't want to do a calculation (aggregation), but I need to get display values from an association. In my C# code, I can directly reference the value, because the foreign key constraint made Linq generate all the necessary wiring.
When I specify the IQueryable as the Gridview datasource property, and reference something that is not a column of the primary entity in the result set, I get an error that the column does not exist.
As a newbie to Linq, I am guessing the assignment implicitly converts the IQueryable to a list, and the associations are lost.
My question is, what is a good way to do this?
I assume that I can work around this by writing a parallel query returning an anonymous type that contains all the columns that I need for the gridview. It seems that by doing that I would hold data in memory redundantly that I already have. Can I query the in-memory data structures on the fly when assigning the data source? Or is there a more direct solution?
The gridview is supposed to display the physician's medical group associations, and the name of the association is in a lookup table.
IQueryable<Physician> ph =
    from phys in db.Physicians
    //from name in phys.PhysicianNames.DefaultIfEmpty()
    //from lic in phys.PhysicianLicenseNums.DefaultIfEmpty()
    //from addr in phys.PhysicianAddresses.DefaultIfEmpty()
    //from npi in phys.PhysicianNPIs.DefaultIfEmpty()
    //from assoc in phys.PhysicianMedGroups.DefaultIfEmpty()
    where phys.BQID == bqid
    select phys;
(source: heeroz.com)
So, based on Denis' answer, I removed all the unneeded stuff from my query. I figured that I may not be asking the right question to begin with.
Anyways, the page shows a physician's data. I want to display all medical group affiliations in a grid (and let the user insert, edit, and update affiliations). I now realize that I don't need to explicitly join in these other tables - Linq does that for me. I can access the license number, which is in a separate table, by referencing it through the chain of child associations.
I cannot reference the medical group name in the gridview, which brings me back to my question:
AffiliationGrid.DataSource = ph.First().PhysicianMedGroups;
This does not work, because med_group_print_name is not accessible for the GridView:
A field or property with the name 'med_group_print_name' was not found on the
selected data source.
Again, bear with me, if it is all too obvious that I don't understand Linq ... because I don't.
Your query seems strange. You should try to simply display
ph = from phys in db.Physicians
where phys.BQID == bqid
select phys;
in your grid. That should work.
Also, why the calls to Load()? If the DataContext is not disposed when the grid is binding, you should not need it.
If you still have issues, can you please post the error message you get, that would help...
Part 2
The problem is that the name is effectively not in the PhysMedGroup. You need to navigate one level down to the MedGroupLookup to access the name, since it is a property of that class.
Depending on the technology you are using (it seems to be either WinForms or Web Forms), you will need to configure your data-binding to access MedGroupLookup.med_group_print_name.