MySQL - A simple database design question

Suppose I have tutors who take online webclasses and create learning packs. Both online webclasses and learning packs can be rated by students, and a tutor's overall rating is the simple average of all the ratings on his classes and packs.
This is the structure of our current Ratings table:
CREATE TABLE IF NOT EXISTS `Ratings` (
`id_rating` int(10) unsigned NOT NULL auto_increment,
`id_teacher` int(10) unsigned default NULL COMMENT 'the teacher who created the class/pack',
`id_lp` int(10) unsigned default NULL COMMENT 'the id of the learning pack',
`id_wc` int(10) unsigned default NULL COMMENT 'the id of the webclass',
`id_user` int(10) unsigned NOT NULL default '0' COMMENT 'the user who has rated',
`rate` int(10) unsigned NOT NULL default '0',
`cdate` timestamp NOT NULL default CURRENT_TIMESTAMP,
`udate` timestamp NULL default NULL,
PRIMARY KEY (`id_rating`),
KEY `Ratings_FKIndex1` (`id_user`),
KEY `id_lp` (`id_lp`),
KEY `id_wc` (`id_wc`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Currently, both class and pack ratings go into the same table: for every rating record, either id_wc or id_lp is filled in and the other is NULL.
So, my question is:
Is this architecture correct, or is it better to keep class and pack ratings separate? Why or why not? I need exactly the same Rating table fields for both class and pack ratings.
I guess that if class and pack ratings had to be looked up separately, separate tables would reduce the number of records to search. But since in our case only tutor ratings are needed (and they involve both classes and packs), all the ratings are put together.

Here is a somewhat more detailed model:
A teacher at a university can take classes too.
One class may have more than one teacher.
There may be several classes on the same subject, taught by different teachers.
Only students who participate in classes get to vote (rate) the class.
Learning packs are on a subject (math, biology).
One learning pack can have several authors.
Technically, a student can author a learning pack too.
Only members who use a learning pack get to rate a pack.
Although authors can vote for packs and teachers can vote for their classes, those votes are ignored.
If you are only interested in the ratings table, you could use: [diagram not available]
Or, combine both models into something like: [diagram not available]
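Since the diagrams are missing, here is a hypothetical sketch of the rules listed above (not a reconstruction of the original diagrams), using SQLite through Python's sqlite3 as a stand-in for MySQL. Participation tables (class_student, pack_user) plus composite foreign keys enforce "only participants get to rate", and the tutor-rating query averages class and pack ratings while skipping votes cast by the class's own teachers or the pack's own authors. All table and column names are assumptions.

```python
import sqlite3

# Hypothetical schema for the rules above; all names are illustrative.
SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE webclass (class_id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
CREATE TABLE learning_pack (pack_id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
-- A class may have several teachers; a pack may have several authors.
CREATE TABLE class_teacher (
    class_id INTEGER REFERENCES webclass, person_id INTEGER REFERENCES person,
    PRIMARY KEY (class_id, person_id));
CREATE TABLE pack_author (
    pack_id INTEGER REFERENCES learning_pack, person_id INTEGER REFERENCES person,
    PRIMARY KEY (pack_id, person_id));
-- Participation: only these people may rate.
CREATE TABLE class_student (
    class_id INTEGER REFERENCES webclass, person_id INTEGER REFERENCES person,
    PRIMARY KEY (class_id, person_id));
CREATE TABLE pack_user (
    pack_id INTEGER REFERENCES learning_pack, person_id INTEGER REFERENCES person,
    PRIMARY KEY (pack_id, person_id));
-- A rating must point at an existing participation row (composite FK).
CREATE TABLE class_rating (
    class_id INTEGER, person_id INTEGER, rate INTEGER NOT NULL,
    PRIMARY KEY (class_id, person_id),
    FOREIGN KEY (class_id, person_id) REFERENCES class_student (class_id, person_id));
CREATE TABLE pack_rating (
    pack_id INTEGER, person_id INTEGER, rate INTEGER NOT NULL,
    PRIMARY KEY (pack_id, person_id),
    FOREIGN KEY (pack_id, person_id) REFERENCES pack_user (pack_id, person_id));
"""

# Simple average over class and pack ratings, ignoring votes from any
# teacher of the rated class or any author of the rated pack.
TUTOR_RATING = """
SELECT AVG(rate) FROM (
    SELECT r.rate FROM class_rating r
    JOIN class_teacher t ON t.class_id = r.class_id
    WHERE t.person_id = :tutor AND r.person_id NOT IN
        (SELECT person_id FROM class_teacher WHERE class_id = r.class_id)
    UNION ALL
    SELECT r.rate FROM pack_rating r
    JOIN pack_author a ON a.pack_id = r.pack_id
    WHERE a.person_id = :tutor AND r.person_id NOT IN
        (SELECT person_id FROM pack_author WHERE pack_id = r.pack_id)
)
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def load_sample(conn):
    """Hypothetical data: person 1 teaches class 10 and authors pack 20."""
    conn.executescript("""
    INSERT INTO person VALUES (1, 'tutor'), (2, 'ann'), (3, 'bob');
    INSERT INTO webclass VALUES (10, 'math');
    INSERT INTO learning_pack VALUES (20, 'biology');
    INSERT INTO class_teacher VALUES (10, 1);
    INSERT INTO pack_author VALUES (20, 1);
    INSERT INTO class_student VALUES (10, 1), (10, 2), (10, 3);
    INSERT INTO pack_user VALUES (20, 2);
    INSERT INTO class_rating VALUES (10, 1, 5), (10, 2, 4), (10, 3, 2);
    INSERT INTO pack_rating VALUES (20, 2, 3);
    """)


def tutor_rating(conn, tutor_id):
    return conn.execute(TUTOR_RATING, {"tutor": tutor_id}).fetchone()[0]


if __name__ == "__main__":
    conn = connect()
    load_sample(conn)
    print(tutor_rating(conn, 1))  # 3.0: the tutor's own vote of 5 is ignored
```

Keeping class and pack ratings in separate tables lets each rating carry a real foreign key to the participation it depends on, while the tutor's combined rating is still a single query.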

If you think you'll end up with more entities that require ratings, then you need to create something more generic (and not very friendly to relational database philosophy).
ratings
-------
id
voterClass
voterId
subjectClass
subjectId
vote
date(s)
With this design you give up FKs and referential integrity. But it's very flexible, and with the right indexes it's very scalable. Also, when entities (subjects) are deleted, the votes remain. This design saves you from duplicating fields and tables.
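A minimal sketch of the generic table above, using SQLite via Python's sqlite3 as a stand-in for MySQL and snake_case versions of the column names. The single index on (subject_class, subject_id) is an assumption about the main lookup path. Note that nothing stops a vote for a subject that does not exist, which is exactly the referential integrity this design gives up.

```python
import sqlite3

# Generic ratings table from the answer above, in snake_case. subject_id
# cannot carry a FOREIGN KEY because the table it points to varies.
SCHEMA = """
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    voter_class TEXT NOT NULL,
    voter_id INTEGER NOT NULL,
    subject_class TEXT NOT NULL,
    subject_id INTEGER NOT NULL,
    vote INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Assumed main access path: "all votes for one subject".
CREATE INDEX ratings_by_subject ON ratings (subject_class, subject_id);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def rate(conn, voter_class, voter_id, subject_class, subject_id, vote):
    conn.execute(
        "INSERT INTO ratings (voter_class, voter_id, subject_class, subject_id, vote)"
        " VALUES (?, ?, ?, ?, ?)",
        (voter_class, voter_id, subject_class, subject_id, vote),
    )


def average(conn, subject_class, subject_id):
    """Average vote for one subject, or None if it has no votes."""
    row = conn.execute(
        "SELECT AVG(vote) FROM ratings WHERE subject_class = ? AND subject_id = ?",
        (subject_class, subject_id),
    ).fetchone()
    return row[0]
```

For example, after rate(conn, "student", 7, "webclass", 3, 4) and rate(conn, "student", 8, "webclass", 3, 2), average(conn, "webclass", 3) returns 3.0.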

Use MySQL Workbench. It's cross-platform and works great.
MySQL Workbench lets you visually see what you are doing with your database: http://diariolinux.com/wp-content/uploads/2008/08/wb51linuxpreview2a.png
MySQL Workbench also meets all of your criteria from the general-purpose-remote-data-backup-and-download-including-innodb-support question on Stack Overflow.
BTW: use Ctrl+G to forward engineer a database.

Related

How to move from MySQL to Cassandra modeling

I am trying to move from MySQL to Cassandra for a music service application I am building.
I have read the following Stack Exchange question: MySQL Data Model to Cassandra Help?
I have also checked out https://wiki.apache.org/cassandra/DataModel and the DataStax Cassandra data modeling example for a music service, but the documentation so far is so small and narrow that I can't get away from MySQL-style queries, so I need help with this.
This is my albums table, which works so far in MySQL:
CREATE TABLE `albums` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(150) NOT NULL,
`description` varchar(300) NOT NULL,
`release_date` int(10) unsigned NOT NULL,
`status` enum('active','inactive','pending') NOT NULL,
`licensor_id` int(11) NOT NULL,
`score` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `status` (`status`),
KEY `licensor_id` (`licensor_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1720100 ;
I also have relationships with the following tables: artist (many artists to one album), genre (many genres to one album) and songs (one album contains many songs).
I have many pivot tables in order to link these together.
Because Cassandra doesn't allow joins, I figured that using set, list and map would help me get to the proper dataset.
At first, I thought I would solve my mapping by just reusing the same table:
CREATE TABLE `albums` (
`id` int(10) ,
`title` varchar(150) ,
`description` varchar(300) ,
`release_date` date ,
`status` enum('active','inactive','pending') ,
`licensor_id` int(11) ,
`data_source_provider_id` int(10) ,
`score` int(10)
`genre` <set>
`artist` <set>
PRIMARY KEY (`id`),
) ;
(Apologies if the above is not correct Cassandra syntax; I've only just begun installing it on a dev system.)
My queries are the following:
1) Give me all albums sorted by score (descending)
2) Give me all albums from a particular genre, sorted by score
3) Give me all albums from a particular artist, sorted by score
4) Give me all albums sorted by release date, then by score
In SQL, all four are easy with joins. Since Cassandra doesn't allow joins, I figured my modeling was adequate, but #4 cannot be satisfied (as far as I can tell, there is no double ORDER BY).
Multiple indexes are slow, and this is a large dataset (there are 1.8M records for now, and I'm planning to load at least triple that amount, which is why Cassandra would be useful).
My questions are:
1) Is my path from MySQL to Cassandra correct despite being stuck on my four queries, or did I do it wrong? (I've done some Active Record work with MongoDB before, where you can have a sub-entity within the document, but Cassandra only has set, list and map.)
2) Suppose I want to expand my model to: "I want to create a list X that contains a predefined number of elements from the albums table." Would tagging each album with a new field "tag" that holds X be the smart way to filter, or would it be better to create a new table that contains all the elements I need and just query that?
The general advice for Cassandra is to design your tables based on your queries. Don't be shy about writing the same data to multiple tables if some of those queries are not compatible with each other. (Twitter, for example, would write each tweet to a table of all the followers of that user.)
That said, looking at your queries, your challenge will be that Cassandra does not inherently have a way of handling some of your sorting needs. You will need to add an analytics engine like Spark or Hadoop MapReduce to sort on a non-unique (constantly changing?) field like score.
Let's look at some table definitions that will be a good start. Then you can determine if you need a full blown distributed analytics engine or whether locally sorting the results of the query will be enough.
CREATE TABLE albums(
id uuid,
title text,
description text,
releasedate timestamp,
status text,
license_id varint,
data_source_provider_id varint,
score int, -- a counter column can't be mixed with regular columns in one table
genre set<text>,
artist set<text>,
PRIMARY KEY (id)
);
This table will store all your albums by id. Based on your use case, selecting all the albums and sorting them by score would definitely be out of the question. You could, potentially, do something clever like modulo-ing the score and putting the albums in buckets, but I'm not convinced that would scale. Any of your queries could be answered using this table plus analytics, but in the interest of completeness, let's look at some other options for putting your data in Cassandra. Each of the following tables could readily reduce the load from any analytics investigations you run that have additional parameters (like a range of dates or set of genres).
CREATE TABLE albums(
id uuid,
title text,
description text,
releasedate timestamp,
status text,
license_id varint,
data_source_provider_id varint,
score int, -- a counter column can't be mixed with regular columns in one table
genre set<text>,
artist text,
PRIMARY KEY (artist, releasedate, title)
);
Cassandra automatically sorts rows within a partition by their clustering columns, which are part of the primary key and therefore immutable. The table above will store each artist's albums in a separate partition (each partition is colocated in your cluster and replicated based on your replication factor). If an album has multiple artists, this record would be duplicated under each artist's entry, and that's OKAY. The second and third keys (releasedate and title) are clustering (sorting) keys. Cassandra will sort the albums first by releasedate and second by title (for the other priority, reverse their order above). Each combination of artist, releasedate and title is logically one row (although on disk they will be stored as one wide row per artist). For one artist, you can probably sort the entries by score locally, without direct intervention from the database.
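To make the per-artist duplication and the "sort by score locally" idea concrete, here is a small plain-Python simulation, not Cassandra driver code: a dict of lists stands in for partitions keyed by artist, each kept in clustering order (release date, then title), and the client re-sorts one partition for score-based queries. The Album fields and sample data are hypothetical, and the direction for query #4 (oldest release first, ties broken by highest score) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Album:
    title: str
    release_date: date
    score: int
    artists: tuple  # an album may have several artists


def albums_by_artist(albums):
    """Mimic PRIMARY KEY (artist, releasedate, title): one partition per
    artist, each album copied into every one of its artists' partitions,
    rows kept in clustering order (release date, then title)."""
    partitions = defaultdict(list)
    for album in albums:
        for artist in album.artists:  # deliberate duplication per artist
            partitions[artist].append(album)
    for rows in partitions.values():
        rows.sort(key=lambda a: (a.release_date, a.title))
    return dict(partitions)


def by_score(rows):
    """Query #3 for one artist: re-sort the partition locally, best score first."""
    return sorted(rows, key=lambda a: a.score, reverse=True)


def by_date_then_score(rows):
    """Query #4 (assumed direction): oldest release first, then highest score."""
    return sorted(rows, key=lambda a: (a.release_date, -a.score))


if __name__ == "__main__":
    albums = [
        Album("A", date(2010, 1, 1), 50, ("x",)),
        Album("B", date(2012, 5, 1), 90, ("x", "y")),
        Album("C", date(2012, 5, 1), 95, ("x",)),
    ]
    parts = albums_by_artist(albums)
    print([a.title for a in parts["x"]])                      # clustering order
    print([a.title for a in by_score(parts["x"])])            # local score sort
    print([a.title for a in by_date_then_score(parts["x"])])  # query #4
```

With a real driver, the same sorted() calls would apply to the rows returned for a single partition; this only stays cheap while one artist's partition is small enough to sort on the client.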
Sorting by release date can easily be accomplished by a similar table, but changing the PRIMARY KEY to: PRIMARY KEY (releasedate, ..?). In this case, however, you probably will face a challenge in sorting (locally) if you have a substantial range of release dates.
Finally, don't try something similar for genre. A single genre holds too many albums to be contained in one partition. Hypothetically, if you had a secondary way of splitting that set up, you could use PRIMARY KEY ((genre, artist)) (double parentheses intentional: genre and artist together form a composite partition key), but I don't think this fits your particular use case well, as both keys are then required to look up an entry.

Polymorphic Associations Pattern or AntiPattern or Both?

Multiple questions on this site and others relate to using a MySQL table definition where the name of a table is stored as a value in a column.
For instance, for "notes" in a DB I am thinking of using the structure:
CREATE TABLE IF NOT EXISTS `Notes` (
`id` int(10) NOT NULL,
`table` varchar(30) NOT NULL,
`row_id` int(10) NOT NULL,
`note` varchar(500) NOT NULL,
`user_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I keep reading all over the place that this is poor database design. I have figured out that this is called polymorphic association, which is specifically listed as an SQL antipattern (also covered in slides).
I have seen the drawbacks of the antipattern, but I can't think of any requirement for the types of queries it makes difficult.
For my app, I want to be able to write notes on just about every other row in the database, potentially for hundreds of other rows.
It is confusing that, while this is listed as an antipattern, it seems to be a fundamental part of Ruby's ActiveRecord. Is the ActiveRecord layer doing some magic that makes this OK (i.e. it's polymorphic association at the record level, but not at the DB level)?
Specifically, I would like to understand when, if ever, this SQL design is safe to use.
-FT
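The core drawback is easy to demonstrate. This sketch (SQLite via Python's sqlite3 as a stand-in for MySQL) builds the Notes table from the question, with the `table` column renamed to table_name, next to one commonly suggested alternative: a common super-table that every notable row registers in, so notes can carry a real foreign key. The invoice and customer tables are hypothetical targets, and the super-table layout is one alternative among several, not the only fix.

```python
import sqlite3

# Variant 1: the Notes table from the question. invoice and customer are
# hypothetical target tables.
POLYMORPHIC = """
CREATE TABLE invoice (id INTEGER PRIMARY KEY);
CREATE TABLE customer (id INTEGER PRIMARY KEY);
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,   -- 'invoice', 'customer', ...
    row_id INTEGER NOT NULL,    -- no FOREIGN KEY possible: target varies
    note TEXT NOT NULL,
    user_id INTEGER NOT NULL
);
"""

# Variant 2: a common super-table. Every notable row first gets a notable
# id, which the concrete tables reuse as their own primary key.
SUPERTABLE = """
CREATE TABLE notable (id INTEGER PRIMARY KEY);
CREATE TABLE invoice (id INTEGER PRIMARY KEY REFERENCES notable (id));
CREATE TABLE customer (id INTEGER PRIMARY KEY REFERENCES notable (id));
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    notable_id INTEGER NOT NULL REFERENCES notable (id),
    note TEXT NOT NULL,
    user_id INTEGER NOT NULL
);
"""


def connect(schema):
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    conn.executescript(schema)
    return conn


def accepts(schema, insert_sql):
    """True if a fresh, empty copy of schema accepts insert_sql."""
    conn = connect(schema)
    try:
        conn.execute(insert_sql)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    # Note on invoice 42, which does not exist: accepted (True).
    print(accepts(POLYMORPHIC, "INSERT INTO notes (table_name, row_id, note, user_id)"
                               " VALUES ('invoice', 42, 'x', 1)"))
    # Same note with a real FK: rejected (False).
    print(accepts(SUPERTABLE, "INSERT INTO notes (notable_id, note, user_id)"
                              " VALUES (42, 'x', 1)"))
```

If dangling or misspelled table_name values are acceptable for your notes (or are policed by application code, as an ORM layer would), the polymorphic version may be a tolerable trade-off; the database itself just cannot help.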

SQL table design to reduce redundancy

I have two designs in mind and want to check which one you think is better.
I have three tables: offer, offer_type and offer_type_filter.
Original design of tables
offer
id int(10) unsigned
code varchar(48)
offer_type_id int(10) unsigned
start_date datetime
exp_date datetime
value int(10)
updated timestamp
created datetime
offer_type
id int(10) unsigned
name varchar(48)
condition varchar(512)
offer_type_filter
id int(10) unsigned
filter_type varchar(20)
filter_value varchar(50)
offer_type_id int(10) unsigned
As you may guess, an offer has a type, and a filter specifies the specific cases in which the offer applies. In case you are wondering, offer_type.condition is mainly for something like $20 off on a purchase of at least $300. offer_type_filter restricts the offer to, say, McDonald's only. An offer can exist without filters.
One problem with the current design is that every time I create a new offer, even if the type is the same, I have to create a duplicate entry in offer_type and then use that type in offer_type_filter (reusing the existing type would mess up existing offers).
So, for the redesign, it is quite obvious that offer_type must not be referenced from offer_type_filter, and I am convinced it has to change to something like this:
Redesign (doing away with offer_type_filter and creating a new table, Filter; it's basically a rename to something more appropriate)
Filter
id int(10) unsigned
filter_type varchar(20)
filter_value varchar(50)
filter_type_set_id int(10) unsigned
For the other tables, I am considering these two options:
Option 1 (offer_type_filter from redesign + other tables same from original design)
offer
id int(10) unsigned
code varchar(48)
offer_type_filter_mapping_id int(10) unsigned
offer_type_filter_mapping
id int(10) unsigned
filter_type_set_id int(10) unsigned > from Filter table
offer_type_id int(10) unsigned
If I choose the first design, I will have redundant entries in offer_type_filter_mapping. For offers that don't have filters, offer_type_filter_mapping will have entries with an offer_type_id and NULL as filter_type_set_id. Also, for each type I create, I will have to put an entry in the mapping table. So I don't like this aspect of the design.
Option 2 (offer_type_filter from redesign + other tables same from original design)
offer
id int(10) unsigned
code varchar(48)
filter_type_set_id int(10) unsigned > from Filter table
I came to Option 2 only because, in this case, there is a redundant filter_type_set_id for each offer, and in my case the offer table is huge.
I'd like your critique on which design you think is the least painful. Frequent use cases: creating lots of offers, with and without filters. We already have close to 40-50 offer types. The types table can't cover every scenario, so we create new types about 10% of the time.
Also, I use Spring and Hibernate, so you can consider from that perspective too what my design constraints would be.
P.S. You might point out that in MySQL it is not convenient to generate two IDs per table, as in offer_type_filter, but I am thinking about it. I'll probably use a dummy table for generation or an externally generated ID.
I see it this way: one offer can have only one offer_type_filter, so it makes a 1:N relationship,
and offer will take the offer_type attributes that you had before.
The cardinality is N:M.
EDIT:
For example, suppose you have the following in offer_type_filter:
offer_type_filter_id = 1 and it's 30% off.
offer_type_filter_id = 2 and it's 10% off.
offer_type_filter_id = 3 and it's 0% off.
...
etc
And in your offer table you can have:
offer_id=1 and offer_type_filter_id=1 //this means that product 1 has 30% off
offer_id=2 and offer_type_filter_id=1 //this means that product 2 has 30% off
offer_id=3 and offer_type_filter_id=2 //this means that product 3 has 10% off
offer_id=4 and offer_type_filter_id=3 //this means that product 4 has 0% off
...
etc
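Here is that example as a runnable sketch, with SQLite via Python's sqlite3 standing in for MySQL. The discount_pct column is a hypothetical stand-in for "30% off" and so on, and offer_type_filter_id is used as the column name in both tables. The point it shows: many offers reference one filter row, so a new offer with an existing discount needs no new filter row.

```python
import sqlite3

SCHEMA = """
CREATE TABLE offer_type_filter (
    offer_type_filter_id INTEGER PRIMARY KEY,
    discount_pct INTEGER NOT NULL  -- hypothetical: 30 means 30% off
);
CREATE TABLE offer (
    offer_id INTEGER PRIMARY KEY,
    offer_type_filter_id INTEGER NOT NULL
        REFERENCES offer_type_filter (offer_type_filter_id)
);
"""


def build_example():
    """Load the four offers from the example; offers 1 and 2 share filter 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO offer_type_filter VALUES (?, ?)",
                     [(1, 30), (2, 10), (3, 0)])
    conn.executemany("INSERT INTO offer VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 2), (4, 3)])
    return conn


def discount_for(conn, offer_id):
    return conn.execute(
        "SELECT f.discount_pct FROM offer o"
        " JOIN offer_type_filter f"
        "   ON f.offer_type_filter_id = o.offer_type_filter_id"
        " WHERE o.offer_id = ?", (offer_id,)).fetchone()[0]


def offers_using(conn, filter_id):
    return [oid for (oid,) in conn.execute(
        "SELECT offer_id FROM offer WHERE offer_type_filter_id = ?"
        " ORDER BY offer_id", (filter_id,))]
```

For instance, discount_for(build_example(), 3) returns 10, and offers_using(conn, 1) lists offers 1 and 2.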
If your cardinality is that one offer can have only one offer type, use the first design.
If one offer can have multiple discounts, and the same discount can apply to multiple products, I recommend the second design.

Approach for multiple "item sets" in Database Design

I have a database design where I store image filenames in a table called resource_file.
CREATE TABLE `resource_file` (
`resource_file_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`resource_id` int(11) NOT NULL,
`filename` varchar(200) NOT NULL,
`extension` varchar(5) NOT NULL DEFAULT '',
`display_order` tinyint(4) NOT NULL,
`title` varchar(255) NOT NULL,
`description` text NOT NULL,
`canonical_name` varchar(200) NOT NULL,
PRIMARY KEY (`resource_file_id`)
) ENGINE=InnoDB AUTO_INCREMENT=592 DEFAULT CHARSET=utf8;
These "files" are gathered under another table called resource (which is something like an album):
CREATE TABLE `resource` (
`resource_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`resource_id`)
) ENGINE=InnoDB AUTO_INCREMENT=285 DEFAULT CHARSET=utf8;
This design comes in handy if I want to assign a certain type of "resource" (album) to a certain type of "item" (product, user, project, etc.), for example:
CREATE TABLE `resource_relation` (
`resource_relation_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`module_code` varchar(32) NOT NULL DEFAULT '',
`resource_id` int(11) NOT NULL,
`data_id` int(11) NOT NULL,
PRIMARY KEY (`resource_relation_id`)
) ENGINE=InnoDB AUTO_INCREMENT=328 DEFAULT CHARSET=utf8;
This table holds the relationship of a resource to a certain type of item like:
Product
User
Gallery
etc.
I do this by giving module_code a value like "product" or "user" and setting data_id to the corresponding unique ID, in this case product_id or user_id.
So, if I want to query the resources assigned to the product with id 123, I query the resource_relation table (very simplified pseudo query):
SELECT * FROM resource_relation WHERE data_id = 123 AND module_code = 'product'
This gives me the resources, from which I can find the corresponding images.
I find this approach very practical, but I don't know if it is a correct approach to this particular problem.
What is the name of this approach?
Is it a valid design?
Thank you
This one uses super-type/sub-type [diagram not available]. Note how the primary key propagates from the super-type table into the sub-type tables.
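Since the diagram is missing, here is a hypothetical sketch of a super-type/sub-type layout for this case (not Damir's actual model), using SQLite via Python's sqlite3 as a stand-in for MySQL. item is the super-type; product and user_account reuse item_id as their own primary key, and a composite foreign key on (item_id, item_type) stops an item registered as a 'user' from also becoming a product. The generic item_resource link table then gets real foreign keys on both sides. All names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL CHECK (item_type IN ('product', 'user', 'gallery')),
    UNIQUE (item_id, item_type)  -- target for the sub-types' composite FK
);
-- Sub-types: the super-type's key propagates as their primary key.
CREATE TABLE product (
    item_id INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL DEFAULT 'product' CHECK (item_type = 'product'),
    name TEXT NOT NULL,
    FOREIGN KEY (item_id, item_type) REFERENCES item (item_id, item_type)
);
CREATE TABLE user_account (
    item_id INTEGER PRIMARY KEY,
    item_type TEXT NOT NULL DEFAULT 'user' CHECK (item_type = 'user'),
    email TEXT NOT NULL,
    FOREIGN KEY (item_id, item_type) REFERENCES item (item_id, item_type)
);
CREATE TABLE resource (resource_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One generic link table, but with real foreign keys on both sides.
CREATE TABLE item_resource (
    item_id INTEGER NOT NULL REFERENCES item (item_id),
    resource_id INTEGER NOT NULL REFERENCES resource (resource_id),
    PRIMARY KEY (item_id, resource_id)
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_item(conn, item_id, item_type):
    conn.execute("INSERT INTO item (item_id, item_type) VALUES (?, ?)",
                 (item_id, item_type))


def resources_for_product(conn, product_id):
    return [name for (name,) in conn.execute(
        "SELECT r.name FROM product p"
        " JOIN item_resource ir ON ir.item_id = p.item_id"
        " JOIN resource r ON r.resource_id = ir.resource_id"
        " WHERE p.item_id = ? ORDER BY r.name", (product_id,))]
```

The price is that every product or user needs an item row first, which is the kind of extra hoop weighed against the generic link table elsewhere in this thread.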
To answer your second question first: the table resource_relation is an implementation of an Entity-attribute-value model.
So the answer to the next question is, it depends. According to relational database theory it is bad design, because we cannot enforce a foreign key relationship between data_id and say product_id, user_id, etc. It also obfuscates the data model, and it can be harder to undertake impact analysis.
On the other hand, lots of people find, as you do, that EAV is a practical solution to a particular problem, with one table instead of several. Although, if we're talking practicality, EAV doesn't scale well (at least in relational products; there are NoSQL products which do things differently).
From which it follows that the answer to your first question, "is it the correct approach?", is "Strictly, no." But does it matter? Perhaps not.
"I can't see why this would "not" scale. Would you mind explaining it a little bit further?"
There are two general problems with EAV.
The first is that small result sets (say DATA_ID=USER_ID) and big result sets (say DATA_ID=PRODUCT_ID) use the same query, which can lead to sub-optimal execution plans.
The second is that adding more attributes to the entity means the query needs to return more rows, whereas a relational solution would return the same number of rows, with more columns. This is the major scaling cost. It also means we end up writing horrible queries like this one.
Now, in your specific case perhaps neither of these concerns is relevant. I'm just explaining the reasons why EAV can cause problems.
"How would I be supposed to assign "resources" to, for example, my product table, "the normal way"?"
The more common approach is to have a different intersection table (AKA junction table) for each relationship, e.g. USER_RESOURCES, PRODUCT_RESOURCES, etc. Each table would consist of a composite primary key, e.g. (USER_ID, RESOURCE_ID), and probably not much else.
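A minimal sketch of the junction-table approach, using SQLite via Python's sqlite3 as a stand-in for MySQL. The product and users tables are hypothetical minimal versions of the asker's tables; each relationship gets its own table whose composite primary key both prevents duplicate links and, together with the foreign keys, rejects links to rows that do not exist.

```python
import sqlite3

SCHEMA = """
CREATE TABLE resource (resource_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
-- One junction table per relationship, each with a composite primary key.
CREATE TABLE product_resources (
    product_id INTEGER NOT NULL REFERENCES product (product_id),
    resource_id INTEGER NOT NULL REFERENCES resource (resource_id),
    PRIMARY KEY (product_id, resource_id)
);
CREATE TABLE user_resources (
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    resource_id INTEGER NOT NULL REFERENCES resource (resource_id),
    PRIMARY KEY (user_id, resource_id)
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def resources_for_product(conn, product_id):
    """Replaces WHERE data_id = ? AND module_code = 'product'."""
    return [name for (name,) in conn.execute(
        "SELECT r.name FROM product_resources pr"
        " JOIN resource r ON r.resource_id = pr.resource_id"
        " WHERE pr.product_id = ? ORDER BY r.name", (product_id,))]
```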
The other approach is to use a generic super-type table with specific sub-type tables. This is the implementation which Damir has modelled. The normal use case for super-types is when we have a bunch of related entities which have some attributes, behaviours and usages in common plus some distinct features of their own. For instance, PERSON and USER, CUSTOMER, SUPPLIER.
Regarding your scenario, I don't think USER, PRODUCT and GALLERY fit this approach. Sure, they are all consumers of RESOURCE, but that is pretty much all they have in common. So trying to map them to an ITEM super-type is a procrustean solution; gaining a generic ITEM_RESOURCE table is likely to be a small reward for the additional hoops you're going to have to jump through elsewhere.
I have a database design where I store images in a table called resource_file.
You're not storing images; you're storing filenames. The filename may or may not identify an image. You'll need to keep database and filesystem permissions in sync.
Your resource_file table structure says, "Image filenames are identifiable in the database, but are unidentifiable in the filesystem." It says that because resource_file_id is the primary key, but there are no unique constraints besides that id. I suspect your image files actually are identifiable in the filesystem, and you'd be better off with database constraints that match that reality. Maybe a unique constraint on (filename, extension).
Same idea for the resource table.
For resource_relation, you probably need a unique constraint on either (resource_id, data_id) or (resource_id, data_id, module_code). But . . .
I'll try to give this some more thought later. It's kind of hard to figure out what you're trying to do with resource_relation, which is usually a red flag.
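The constraints suggested above can be sketched as follows, with SQLite via Python's sqlite3 standing in for MySQL and only the relevant columns kept. Which unique key is right for resource_relation is left open in the answer; this sketch assumes the three-column version (resource_id, data_id, module_code).

```python
import sqlite3

SCHEMA = """
CREATE TABLE resource_file (
    resource_file_id INTEGER PRIMARY KEY,
    resource_id INTEGER NOT NULL,
    filename TEXT NOT NULL,
    extension TEXT NOT NULL DEFAULT '',
    UNIQUE (filename, extension)  -- mirrors how the filesystem identifies a file
);
CREATE TABLE resource_relation (
    resource_relation_id INTEGER PRIMARY KEY,
    module_code TEXT NOT NULL DEFAULT '',
    resource_id INTEGER NOT NULL,
    data_id INTEGER NOT NULL,
    UNIQUE (resource_id, data_id, module_code)  -- assumed choice of key
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def inserted(conn, sql, params):
    """True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With these keys, a second row for lamp.jpg, or a second identical product link, is refused by the database instead of silently duplicating data.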

MySQL db structure help

I'm working on a quiz project and I want to create a MySQL structure such that:
questionID: A unique question identification number(primary key)
testID: A unique test identification number (the question belongs to this test) (primary key)
questionOrder: The order of the question within the quiz, i.e. this question is the n-th question in the quiz. I want this value to come from MySQL, so that when I insert a new question into the db, I don't have to calculate it.
One question can be in multiple different tests.
I have couple of questions:
1) I have the following code but I get:
Incorrect table definition; there can be only one auto column and it must be defined as a key
How can I fix this?
2) This structure doesn't allow a question to belong to multiple quizzes. Any idea to avoid this?
3) Do you think this structure is good/optimum, can you suggest anything better?
CREATE TABLE `quiz_question` (
`questionID` int(11) NOT NULL auto_increment,
`quizID` int(11) NOT NULL default '0',
`questionOrder` int(11) NOT NULL AUTO_INCREMENT,
`question` varchar(256) NOT NULL default '',
`answer` varchar(256) NOT NULL default '',
PRIMARY KEY (`questionID`),
UNIQUE KEY (`quizID`, `questionOrder`),
KEY `par_ind` (`quizID`, `questionOrder`)
) ENGINE=MyISAM;
ALTER TABLE `quiz_question`
ADD CONSTRAINT `0_133` FOREIGN KEY (`quizID`) REFERENCES `quiz_quiz` (`quizID`);
CREATE TABLE `quiz_quiz` (
`quizID` int(11) NOT NULL auto_increment,
`topic` varchar(100) NOT NULL default '',
`information` varchar(100) NOT NULL default '',
PRIMARY KEY (`quizID`)
) ENGINE=MyISAM;
Thanks for reading this.
1) You can only have one AUTO_INCREMENT column per table. It should be a key. Generally, it's part of / is the PK.
2) A 'quiz' would be an entity composed of questions. You should have 3 tables:
1 - quiz_question: quest_id, question, answer
2 - quiz_quiz: quiz_id, topic, info
3 - quiz_fact: quiz_id, quest_id, quest_order
The quiz and question tables hold the per-item (quiz/question) information. The quiz_fact defines how a quiz is composed (this quiz has this question in this order).
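A minimal sketch of the three-table layout, using SQLite via Python's sqlite3 as a stand-in for MySQL (column types simplified). The UNIQUE (quiz_id, quest_order) constraint carries over the question's UNIQUE KEY (quizID, questionOrder). If you keep MySQL, note that MyISAM parses FOREIGN KEY clauses but does not enforce them; InnoDB does.

```python
import sqlite3

SCHEMA = """
CREATE TABLE quiz_quiz (
    quiz_id INTEGER PRIMARY KEY,
    topic TEXT NOT NULL,
    info TEXT NOT NULL DEFAULT ''
);
CREATE TABLE quiz_question (
    quest_id INTEGER PRIMARY KEY,
    question TEXT NOT NULL,
    answer TEXT NOT NULL
);
-- How a quiz is composed: which question, in which position.
CREATE TABLE quiz_fact (
    quiz_id INTEGER NOT NULL REFERENCES quiz_quiz (quiz_id),
    quest_id INTEGER NOT NULL REFERENCES quiz_question (quest_id),
    quest_order INTEGER NOT NULL,
    PRIMARY KEY (quiz_id, quest_id),
    UNIQUE (quiz_id, quest_order)
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def questions_in(conn, quiz_id):
    """Question texts of one quiz, in quiz order."""
    return [q for (q,) in conn.execute(
        "SELECT q.question FROM quiz_fact f"
        " JOIN quiz_question q ON q.quest_id = f.quest_id"
        " WHERE f.quiz_id = ? ORDER BY f.quest_order", (quiz_id,))]
```

Because membership lives in quiz_fact, the same question row can appear in several quizzes, each time with its own position.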
3) My only suggestion would be to use Drizzle instead ;) Seriously though, play with things; 'good enough' often is. If it suits your needs, why tinker? Otherwise, you can ask more detailed questions once you have this up and running (e.g. my queries are too slow on such and such operations).
1) Do the order increment yourself. The DB will only do it if it's part of a PK. You might be able to hack it by making a composite key containing the order column but it's not worth it.
2) Rename quiz_question to question (and quiz_quiz to quiz). Make a new quiz-question join table called quiz_question. It should have a quiz ID and a question ID, linking a quiz to a question. As the same question will have different orders on different quizzes, put the question order on the new quiz_question. You no longer need a quiz ID on the question table.
Remove AUTO_INCREMENT from the questionOrder field.
As for having MySQL set the value of the questionOrder field, do that in a subsequent UPDATE query. Usually, you'd want the administrator of the test, using your admin utility, to be able to adjust the ordering of questions. In that case, you just enter an initial value one higher than the highest existing ordering value (on that test). Then you can let them adjust it, somewhat like adjusting a Netflix queue :)
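The "one higher than the highest existing value" rule can also be folded into the INSERT itself, so the application never computes it. A sketch with SQLite via Python's sqlite3 as a stand-in for MySQL, using a hypothetical version of the renamed join table quiz_question(quiz_id, question_id, question_order); the same INSERT ... SELECT pattern should carry over to MySQL, but treat that as an assumption to verify.

```python
import sqlite3

SCHEMA = """
CREATE TABLE quiz_question (
    quiz_id INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    question_order INTEGER NOT NULL,
    PRIMARY KEY (quiz_id, question_id),
    UNIQUE (quiz_id, question_order)  -- two concurrent appends fail loudly here
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def append_question(conn, quiz_id, question_id):
    """Insert question_id one past the highest order on this quiz (1 for an
    empty quiz): MAX() over no rows yields NULL, hence the COALESCE."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO quiz_question (quiz_id, question_id, question_order)"
            " SELECT ?, ?, COALESCE(MAX(question_order), 0) + 1"
            " FROM quiz_question WHERE quiz_id = ?",
            (quiz_id, question_id, quiz_id),
        )


def order_of(conn, quiz_id):
    return conn.execute(
        "SELECT question_id, question_order FROM quiz_question"
        " WHERE quiz_id = ? ORDER BY question_order", (quiz_id,)).fetchall()
```

Each quiz keeps its own sequence, so appending question 11 to quiz 2 gives it position 1 there even if it is second in quiz 1.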