Polymorphic Associations: Pattern, Antipattern, or Both? - mysql

Multiple questions on this site and others relate to using a MySQL table definition in which one column stores the name of another table.
For instance, for "notes" in a DB I am thinking of using the structure:
CREATE TABLE IF NOT EXISTS `Notes` (
`id` int(10) NOT NULL,
`table` varchar(30) NOT NULL,
`row_id` int(10) NOT NULL,
`note` varchar(500) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I keep reading all over the place that this is poor database design. I have figured out that this is called a polymorphic association, and it is specifically listed as a SQL antipattern, both in print and in slides.
I have seen the drawbacks of the antipattern, but I can't think of any requirement to run those kinds of queries.
For my app, I want to be able to write notes on just about every other row in the database, potentially hundreds of other rows.
It is confusing that while this is listed as an antipattern, it seems to be a fundamental part of the Ruby ActiveRecord concept. Is the ActiveRecord layer doing magic that makes this OK (i.e., it's a polymorphic association at the record level, but not at the DB level)?
Specifically, I would like to understand when/if this SQL design is safe to use.
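For concreteness, here is roughly the lookup I have in mind (a minimal sketch; 'products' and the row id are just placeholders):
-- Fetch the notes attached to row 42 of the products table. The `table`
-- column says which table row_id points into; the database cannot verify
-- that reference, since a foreign key must target one fixed table.
SELECT n.note, n.user_id
FROM Notes AS n
WHERE n.`table` = 'products' AND n.row_id = 42;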
-FT

Related

How to best categorize values in a table

I'm in the process of designing a new database for a project at work. I want to create a table that stores Assignments for a digital classroom. Each Assignment can be one of 2 categories: "Individual" or "Group".
The first implementation that comes to mind is the following:
CREATE TABLE `assignments` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL,
`category` varchar(10) NOT NULL DEFAULT 'individual',
PRIMARY KEY (`id`),
KEY `category_index` (`category`(10))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
I would then select all assignments of a given category with:
SELECT title FROM assignments WHERE category = "individual"
However, because we've had performance issues in the past, I'm trying to optimize the design as much as possible. As such, I'm wondering whether storing the category as a VARCHAR is a good idea, considering the table will get quite large. Would an index on an INT column perform better than one on a VARCHAR?
Aside from just performance, I'm also curious what would be considered a good solution from a design-perspective. Suggestions?
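For comparison, the lookup-table alternative I'm considering would look something like this (just a sketch; the assignment_categories name is illustrative):
-- Small lookup table holding each category name exactly once.
CREATE TABLE `assignment_categories` (
`id` tinyint unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_unique` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- assignments then stores a 1-byte key instead of a repeated string.
CREATE TABLE `assignments` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL,
`category_id` tinyint unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `category_index` (`category_id`),
CONSTRAINT `fk_assignments_category` FOREIGN KEY (`category_id`) REFERENCES `assignment_categories` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

SELECT a.title
FROM assignments a
JOIN assignment_categories c ON c.id = a.category_id
WHERE c.name = 'individual';
An ENUM('individual', 'group') column is the other option I keep seeing suggested; it is as compact as a TINYINT, but adding a category later means an ALTER TABLE.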

Reduce JOIN query response time by combining tables

We have a sports shopping website that recommends products to users. Our query builds recommendations by JOINing three tables, to the following effect: (1) which sports a user is interested in, (2) which products are part of each sport, and (3) eliminating products the user has already bought. We currently have three tables. The response time is 3 seconds.
In an effort to make the query respond faster, we are proposing combining two of the tables into one. The attached image shows the proposed logic. My questions are:
1. Is the proposed query even possible as a single query?
2. If all else is equal, will the proposed logic be faster than the current logic, even if only by a small amount?
We are on AWS MySQL RDS. All indexes have been done correctly. Please don't suggest migrating to Redis, MemSQL, etc.; at this stage I am just interested in understanding whether the proposed logic will be faster.
Thank you for your help!!
CREATEs
CREATE TABLE UserPreferences (
UserPreferenceId int(11) NOT NULL AUTO_INCREMENT,
UserId int(11) NOT NULL,
FamilyId int(11) NOT NULL,
InsertedDate datetime NOT NULL,
PRIMARY KEY (UserPreferenceId),
KEY userID (UserId),
KEY FamilyId (FamilyId),
KEY user (UserId),
KEY fk_UserPreferences_1 (FamilyId)
) ENGINE=InnoDB AUTO_INCREMENT=261 DEFAULT CHARSET=utf8
CREATE TABLE ArticleToFamily (
ArticleToFamilyId int(10) unsigned NOT NULL AUTO_INCREMENT,
ArticleId int(11) DEFAULT NULL,
FamilyId int(11) unsigned NOT NULL,
InsertedDate datetime DEFAULT NULL,
Confidence int(11) NOT NULL DEFAULT '0',
Rank int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (ArticleToFamilyId),
KEY ArticleIdAndFamilyId (ArticleId,FamilyId),
KEY FamilyId (FamilyId)
) ENGINE=InnoDB AUTO_INCREMENT=19795572 DEFAULT CHARSET=latin1
CREATE TABLE ItemsUserHasBought (
ItemsUserHasBoughtId int(11) NOT NULL AUTO_INCREMENT,
UserId int(11) NOT NULL,
ArticleId int(11) NOT NULL,
BuyDate datetime NOT NULL,
InsertedDate datetime NOT NULL,
UpdatedDate datetime NOT NULL,
Status char(1) NOT NULL DEFAULT '1',
PRIMARY KEY (ItemsUserHasBoughtId),
KEY ArticleId (ArticleId)
) ENGINE=InnoDB AUTO_INCREMENT=367 DEFAULT CHARSET=latin1
Don't do it.
Combining tables usually means denormalization of some kind, which is not the direction you want to be moving in a relational database. It's rarely side-effect free and often fails to achieve the desired gains. All in all, something to avoid, to be done only when all other avenues are exhausted.
Instead, check your indexes on the three tables that you have. It's likely that adding a foreign key in the right place can easily make this query run in a fraction of its current time. Unfortunately, until we know which indexes your query is actually using, we can't be any more specific about how to improve it. It's also possible you're doing the right things here, and are really hitting a wall in terms of what your server is able to do... but probably not.
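To illustrate (only a sketch, since the actual query isn't shown; 123 stands in for a user id), the recommendation is expressible as a single statement, and composite indexes like these are the kind of thing to check for:
-- What the three-table recommendation might look like as one query,
-- using an anti-join to drop articles the user has already bought:
SELECT DISTINCT atf.ArticleId
FROM UserPreferences up
JOIN ArticleToFamily atf ON atf.FamilyId = up.FamilyId
LEFT JOIN ItemsUserHasBought b ON b.UserId = up.UserId AND b.ArticleId = atf.ArticleId
WHERE up.UserId = 123
AND b.ItemsUserHasBoughtId IS NULL;

-- Composite indexes so each join step is an index lookup, not a scan:
ALTER TABLE ArticleToFamily ADD KEY FamilyIdArticleId (FamilyId, ArticleId);
ALTER TABLE ItemsUserHasBought ADD KEY UserIdArticleId (UserId, ArticleId);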
If indexes don't help, the next place I'd usually look is a materialized/indexed view. These are supported by SQL Server, Oracle, PostgreSQL, and most other modern database engines. Sadly, like window functions, the APPLY/lateral join operation, and correct NULL handling, indexed views are among the many parts of ANSI SQL where MySQL lags behind other databases. MySQL is sadly becoming more and more of a joke with each passing year... but then that's probably all part of Oracle's plan since the Sun acquisition. If you really want an open-source DB, PostgreSQL has outclassed MySQL for years now in pretty much every category. MySQL is living off its old momentum; it's popular because it's been popular, and is therefore widely available among the low-cost web hosts, but not at all because it's better.
Don't get me wrong: MySQL used to be a great option. PostgreSQL hardly existed, and Oracle and SQL Server weren't any better back then and were priced out of reach for most small businesses. But Oracle, SQL Server, PostgreSQL, and others have all moved on in ways that MySQL hasn't. PostgreSQL, specifically, has gotten easier to manage, while MySQL has lost some of the simplicity that gave it an advantage without picking up enough features that really matter.
But anyone can be an armchair architect, and I've editorialized far too much already. Given that a wholesale database change probably isn't an option for you by now anyway, take a long, close look at your indexes. It's a good bet you'll be able to fix your problem that way. And if you can't, you can always throw more hardware at your server. Because MySQL is cheaper, right?

Is there any performance hit when we reference multiple columns to one table rather than separate tables?

I have a database design like this. I am using MySQL.
I have a vehicle table to store information about a vehicle:
CREATE TABLE `test`.`vehicle` (
`vehicle_id` BIGINT UNSIGNED NOT NULL,
`fuel_type_id_ref` TINYINT UNSIGNED NULL DEFAULT NULL,
`drive_type_id_ref` TINYINT UNSIGNED NULL DEFAULT NULL,
`condition_id_ref` TINYINT UNSIGNED NOT NULL,
`transmission_type_id_ref` TINYINT UNSIGNED NULL DEFAULT NULL,
PRIMARY KEY (`vehicle_id`)
) ENGINE = INNODB CHARSET = latin1 COLLATE = latin1_swedish_ci ;
I used separate tables to store the records for each reference id.
For example, I have a fuel type table to store fuels, a transmission type table, and so on.
But then I noticed that the schemas of those tables are pretty much identical.
So I created a table like this:
CREATE TABLE `test`.`vehicle_feature` (
`veh_feature_id` TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
`feature_type_id_ref` TINYINT UNSIGNED NOT NULL,
`name` VARCHAR (50) NOT NULL,
`is_active` TINYINT (1) NOT NULL DEFAULT TRUE,
PRIMARY KEY (`veh_feature_id`)
) ENGINE = INNODB CHARSET = latin1 COLLATE = latin1_swedish_ci ;
and I put all those fuels and transmission types into this table, with a feature type id to identify the group.
Now I have to join the same table again and again to retrieve the values for my vehicle table.
So my question is:
Shall I keep my separate tables, or shall I go with this new approach? Since I have to write the same joins again and again, there is no reduction in my code. I can join my small tables just as easily as this one table. Also, with separate tables I can use inner joins, but here I have to use left joins. The separate tables also each hold fewer records than the combined table. All this approach does is reduce the number of tables in my DB (only 4 tables, which I don't care about). The sum of all records in these 4 tables will be around 100.
So which is better performance-wise?
This is a bit of a difficult question, because these are both reasonable approaches. The key to deciding is to understand what the application needs from this type of data.
A separate table for the items has one nice advantage because foreign key constraints can actually check the referential integrity of the data. Furthermore, each of the entities is treated as a full-fledged bona-fide entity. This is handy if you have other information about the fuels, drives, and transmissions that is specific to that entity. For instance, the fuel could have an octane rating, which could be in the fuel table but does not need to clutter the other reference tables.
On the other hand, you might end up with lots of similar reference tables. And, for your application, these may not need to be full-fledged entities. In that case, having a single table is quite reasonable. This is actually a bigger advantage if you want to internationalize your application. That is, if you want to provide the names of things in multiple languages.
In an object-oriented language, you would approach this problem using inheritance: the three "types" would all be subclasses of a vehicle-attribute class. Unfortunately, SQL has no such built-in concept.
From a performance perspective, both methods involve relatively small reference tables (I'm guessing at most a few thousand rows) that are accessed via primary keys. There should be very little performance difference between the two approaches. The important concern is how to properly model the data for your application.
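To make the trade-off concrete, here is a hedged sketch of the two join styles (the fuel_type and transmission_type table and column names are assumed from the question's description, not given in it):
-- Separate reference tables: each join targets its own small table,
-- and a real foreign key can guard each *_ref column.
SELECT v.vehicle_id, f.name AS fuel, t.name AS transmission
FROM vehicle v
LEFT JOIN fuel_type f ON f.fuel_type_id = v.fuel_type_id_ref
LEFT JOIN transmission_type t ON t.transmission_type_id = v.transmission_type_id_ref;

-- Single vehicle_feature table: the same table is joined once per
-- attribute, and nothing stops a fuel id from pointing at a transmission row.
SELECT v.vehicle_id, f.name AS fuel, t.name AS transmission
FROM vehicle v
LEFT JOIN vehicle_feature f ON f.veh_feature_id = v.fuel_type_id_ref
LEFT JOIN vehicle_feature t ON t.veh_feature_id = v.transmission_type_id_ref;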

What is the most efficient way to check against a huge MySQL table?

I have a service in which users may "like" content posted by other users. Currently, the system doesn't filter out content that the user has already liked, which is undesirable behavior. I have a table called LikeRecords which stores a userID, a contentID, and a timePlaced timestamp. The idea is to use this table to filter content that a user has already liked when choosing what to display.
The thing is, I'm a MySQL amateur, and don't understand scaling and maintenance well. Even though I only have about 1,500 users, this table already has 45,000 records. I'm worried that as my service grows to tens or hundreds of thousands of users, this table will explode into millions and become slow since the filter operation would be called very frequently.
Is there a better design pattern I could use here, or a maintenance technique I should use?
EDIT: Here is the query for building the table in question:
CREATE TABLE `likerecords` (
`likeID` int(11) NOT NULL AUTO_INCREMENT,
`userID` int(10) unsigned NOT NULL,
`orderID` int(11) NOT NULL,
`timePlaced` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`special` tinyint(1) NOT NULL,
PRIMARY KEY (`likeID`)
) ENGINE=InnoDB AUTO_INCREMENT=44775 DEFAULT CHARSET=latin1
I would be using it to filter results in other tables, such as an "orders" table.
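One common approach, sketched here as an illustration (the orders table and its orderID column are assumed to match the likerecords naming; 123 is a placeholder user), is a composite unique index so the filter stays an index lookup even at millions of rows:
-- Answers "has user X already liked order Y?" from the index alone,
-- and prevents duplicate likes as a side effect.
ALTER TABLE likerecords ADD UNIQUE KEY user_order (userID, orderID);

-- Show orders the user has NOT yet liked (anti-join form).
SELECT o.*
FROM orders o
LEFT JOIN likerecords lr ON lr.userID = 123 AND lr.orderID = o.orderID
WHERE lr.likeID IS NULL;
With an index like that in place, the table growing into the millions of rows should not by itself be a problem; each check stays a single index probe.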

Approach for multiple "item sets" in Database Design

I have a database design where I store image filenames in a table called resource_file.
CREATE TABLE `resource_file` (
`resource_file_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`resource_id` int(11) NOT NULL,
`filename` varchar(200) NOT NULL,
`extension` varchar(5) NOT NULL DEFAULT '',
`display_order` tinyint(4) NOT NULL,
`title` varchar(255) NOT NULL,
`description` text NOT NULL,
`canonical_name` varchar(200) NOT NULL,
PRIMARY KEY (`resource_file_id`)
) ENGINE=InnoDB AUTO_INCREMENT=592 DEFAULT CHARSET=utf8;
These "files" are gathered under another table called resource (which is something like an album):
CREATE TABLE `resource` (
`resource_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`resource_id`)
) ENGINE=InnoDB AUTO_INCREMENT=285 DEFAULT CHARSET=utf8;
The logic behind this design comes in handy if I want to assign a certain type of "resource" (album) to a certain type of "item" (product, user, project, etc.), for example:
CREATE TABLE `resource_relation` (
`resource_relation_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`module_code` varchar(32) NOT NULL DEFAULT '',
`resource_id` int(11) NOT NULL,
`data_id` int(11) NOT NULL,
PRIMARY KEY (`resource_relation_id`)
) ENGINE=InnoDB AUTO_INCREMENT=328 DEFAULT CHARSET=utf8;
This table holds the relationship of a resource to a certain type of item like:
Product
User
Gallery
etc.
I do exactly this by giving the "module_code" a value like "product" or "user" and assigning data_id the corresponding unique id, in this case product_id or user_id.
So at the end of the day, if I want to query the resources assigned to the product with id 123, I query the resource_relation table (very simplified pseudo-query):
SELECT * FROM resource_relation WHERE data_id = 123 AND module_code = 'product'
And this gives me the resources, for which I can then find the corresponding images.
I find this approach very practical, but I don't know whether it is a correct approach to this particular problem.
What is the name of this approach?
Is it a valid design?
Thank you
This one uses a super-type/sub-type model. Note how the primary key propagates from the super-type table into the sub-type tables.
To answer your second question first: the table resource_relation is an implementation of an Entity-attribute-value model.
So the answer to the next question is: it depends. According to relational database theory it is bad design, because we cannot enforce a foreign key relationship between data_id and, say, product_id or user_id. It also obfuscates the data model, and it can make impact analysis harder.
On the other hand, lots of people find, as you do, that EAV is a practical solution to a particular problem, with one table instead of several. Although, if we're talking practicality, EAV doesn't scale well (at least in relational products; there are NoSQL products which do things differently).
From which it follows that the answer to your first question ("is it the correct approach?") is "Strictly, no". But does it matter? Perhaps not.
" I can't see a problem why this would "not" scale. Would you mind
explaining it a little bit further? "
There are two general problems with EAV.
The first is that small result sets (say data_id = user_id) and big result sets (say data_id = product_id) use the same query, which can lead to sub-optimal execution plans.
The second is that adding more attributes to the entity means the query needs to return more rows, whereas a relational solution would return the same number of rows, with more columns. This is the major scaling cost. It also means we end up writing horrible queries like this one.
Now, in your specific case perhaps neither of these concerns are relevant. I'm just explaining the reasons why EAV can cause problems.
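To give a flavor of that second problem, here's a sketch of the pivot-style query EAV forces once you want attributes back as columns (generic entity/attribute names, not your schema):
-- Each new attribute adds rows to scan and another MAX(CASE...) branch,
-- where a relational design would just add a column.
SELECT entity_id,
MAX(CASE WHEN attribute = 'name' THEN value END) AS name,
MAX(CASE WHEN attribute = 'color' THEN value END) AS color,
MAX(CASE WHEN attribute = 'size' THEN value END) AS size
FROM entity_attribute_value
GROUP BY entity_id;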
"How would i be supposed to assign "resources" to for example, my
product table, "the normal way"?"
The more common approach is to have a separate intersection table (AKA junction table) for each relationship, e.g. USER_RESOURCES, PRODUCT_RESOURCES, etc. Each table would consist of a composite primary key, e.g. (USER_ID, RESOURCE_ID), and probably not much else.
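A minimal sketch of one such junction table (assuming a products table keyed by product_id):
CREATE TABLE product_resources (
product_id int(11) NOT NULL,
resource_id int(11) unsigned NOT NULL,
-- Composite primary key; both sides enforced by real foreign keys.
PRIMARY KEY (product_id, resource_id),
FOREIGN KEY (product_id) REFERENCES products (product_id),
FOREIGN KEY (resource_id) REFERENCES resource (resource_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;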
The other approach is to use a generic super-type table with specific sub-type tables. This is the implementation which Damir has modelled. The normal use case for super-types is when we have a bunch of related entities which have some attributes, behaviours and usages in common, plus some distinct features of their own. For instance, PERSON and USER, CUSTOMER, SUPPLIER.
Regarding your scenario, I don't think USER, PRODUCT and GALLERY fit this approach. Sure, they are all consumers of RESOURCE, but that is pretty much all they have in common. So trying to map them to an ITEM super-type is a procrustean solution; gaining a generic ITEM_RESOURCE table is likely to be a small reward for the additional hoops you're going to have to jump through elsewhere.
"I have a database design where I store images in a table called resource_file."
You're not storing images; you're storing filenames. The filename may or may not identify an image. You'll need to keep database and filesystem permissions in sync.
Your resource_file table structure says, "Image filenames are identifiable in the database, but are unidentifiable in the filesystem." It says that because resource_file_id is the primary key, but there are no unique constraints besides that id. I suspect your image files actually are identifiable in the filesystem, and you'd be better off with database constraints that match that reality. Maybe a unique constraint on (filename, extension).
Same idea for the resource table.
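Sketched as ALTER statements (assuming those combinations really are unique in your data):
-- Make the database agree with the filesystem's uniqueness rules.
ALTER TABLE resource_file ADD UNIQUE KEY uq_file (filename, extension);
ALTER TABLE resource ADD UNIQUE KEY uq_resource_name (name);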
For resource_relation, you probably need a unique constraint on either (resource_id, data_id) or (resource_id, data_id, module_code). But . . .
I'll try to give this some more thought later. It's kind of hard to figure out what you're trying to do with resource_relation, which is usually a red flag.