MySQL db structure help - mysql

I'm working on a quiz project and I want to create a MySQL structure in such a way that:
questionID: a unique question identification number (primary key)
testID: a unique test identification number (the question belongs to this test) (primary key)
questionOrder: the order of the question within the quiz, i.e. this question is the n-th question in the quiz. I want this value to come from MySQL, so that when I insert a new question into the db I don't have to calculate it.
One question can be in multiple different tests.
I have a couple of questions:
1) I have the following code but I get:
Incorrect table definition; there can be only one auto column and it must be defined as a key
How can I fix this?
2) This structure doesn't allow a question to belong to multiple quizzes. Any idea how to avoid this?
3) Do you think this structure is good/optimum, can you suggest anything better?
CREATE TABLE `quiz_question` (
`questionID` int(11) NOT NULL auto_increment,
`quizID` int(11) NOT NULL default '0',
`questionOrder` int(11) NOT NULL AUTO_INCREMENT,
`question` varchar(256) NOT NULL default '',
`answer` varchar(256) NOT NULL default '',
PRIMARY KEY (`questionID`),
UNIQUE KEY (`quizID`, `questionOrder`),
KEY `par_ind` (`quizID`, `questionOrder`)
) ENGINE=MyISAM;
ALTER TABLE `quiz_question`
ADD CONSTRAINT `0_133` FOREIGN KEY (`quizID`) REFERENCES `quiz_quiz` (`quizID`);
CREATE TABLE `quiz_quiz` (
`quizID` int(11) NOT NULL auto_increment,
`topic` varchar(100) NOT NULL default '',
`information` varchar(100) NOT NULL default '',
PRIMARY KEY (`quizID`)
) ENGINE=MyISAM;
Thanks for reading this.

1) You can only have one AUTO_INCREMENT column per table, and it must be a key. Generally it is, or is part of, the primary key.
2) A 'quiz' would be an entity composed of questions. You should have 3 tables:
1 - quiz_question: quest_id, question, answer
2 - quiz_quiz: quiz_id, topic, info
3 - quiz_fact: quiz_id, quest_id, quest_order
The quiz and question tables hold the per-item (quiz/question) information. The quiz_fact defines how a quiz is composed (this quiz has this question in this order).
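A rough sketch of that three-table layout (column names are assumed from the list above; InnoDB is used because MyISAM parses but does not enforce foreign keys):
CREATE TABLE quiz_question (
quest_id INT NOT NULL AUTO_INCREMENT,
question VARCHAR(256) NOT NULL,
answer VARCHAR(256) NOT NULL,
PRIMARY KEY (quest_id)
) ENGINE=InnoDB;
CREATE TABLE quiz_quiz (
quiz_id INT NOT NULL AUTO_INCREMENT,
topic VARCHAR(100) NOT NULL,
info VARCHAR(100) NOT NULL,
PRIMARY KEY (quiz_id)
) ENGINE=InnoDB;
-- Composition table: this quiz has this question in this order.
CREATE TABLE quiz_fact (
quiz_id INT NOT NULL,
quest_id INT NOT NULL,
quest_order INT NOT NULL,
PRIMARY KEY (quiz_id, quest_id),
UNIQUE KEY (quiz_id, quest_order), -- no two questions share an order slot
FOREIGN KEY (quiz_id) REFERENCES quiz_quiz (quiz_id),
FOREIGN KEY (quest_id) REFERENCES quiz_question (quest_id)
) ENGINE=InnoDB;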
3) My only suggestion would be to use Drizzle instead ; ) Seriously though, play with things - 'good enough' often is exactly that. If it suits your needs, why tinker? Otherwise you can ask more detailed questions once you have this up and running (i.e. my queries are too slow on such-and-such operations).

1) Do the order increment yourself. The DB will only do it for an AUTO_INCREMENT column that is part of a key. You might be able to hack it by making a composite key containing the order column, but it's not worth it.
2) Rename quiz_question to question (and quiz_quiz to quiz). Make a new quiz-question join table called quiz_question. It should have a quiz ID and a question ID, linking a quiz to a question. As the same question will have a different order on different quizzes, put the question order on the new quiz_question table. You no longer need a quiz ID on the question table.

Remove AUTO_INCREMENT from the questionOrder field.
As for having MySQL set the value of the questionOrder field: do that in a subsequent UPDATE query. Usually you'd want the administrator of the test, using your admin utility, to be able to adjust the ordering of questions. In that case, just enter an initial value one higher than the highest previous ordering value on that test. Then you can let them adjust it, something like the manner of adjusting a Netflix queue :)
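A sketch of that idea against the question's single-table schema (assuming AUTO_INCREMENT has been removed from questionOrder as suggested above); here the MAX()+1 is folded into the INSERT itself rather than a separate UPDATE:
-- Append as the last question of quiz 5; COALESCE covers the first question.
-- Not concurrency-safe without locking, but fine for an admin utility.
INSERT INTO quiz_question (quizID, questionOrder, question, answer)
SELECT 5, COALESCE(MAX(questionOrder), 0) + 1, 'What is ...?', '42'
FROM quiz_question
WHERE quizID = 5;
-- Later, the admin can reorder a question Netflix-queue style:
UPDATE quiz_question SET questionOrder = 3 WHERE questionID = 42;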

Related

How to efficiently update values without a primary key in MySQL?

I am currently facing an issue with designing a database table and updating/inserting values into it.
The table is used to collect and aggregate statistics that are identified by:
the source
the user
the statistic
an optional material (e.g. item type)
an optional entity (e.g. animal)
My main issue is, that my proposed primary key is too large because of VARCHARs that are used to identify a statistic.
My current table is created like this:
CREATE TABLE `Statistics` (
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL)
In particular, the server_id is configurable, the player_id is a UUID, statistic is the representation of an enumeration that may change, material and entity likewise. The value is then aggregated using SUM() to calculate the overall statistic.
So far it works, but I have to use DELETE and INSERT statements whenever I want to update a value, because I have no primary key and I can't figure out how to create one within the constraints of MySQL.
My main question is: How can I efficiently update values in this table and insert them when they are not currently present without resorting to deleting all the rows and inserting new ones?
The main issue seems to be the restriction MySQL puts on the primary key. I don't think adding an id column would solve this.
Simply add an auto-incremented id:
CREATE TABLE `Statistics` (
statistics_id int auto_increment primary key,
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL
);
Voila! A primary key. But you probably want an index. One that comes to mind:
create index idx_statistics_server_player_statistic on statistics(server_id, player_id, statistic);
Depending on what your code looks like, you might want additional or different keys in the index, or more than one index.
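For the updating problem itself, once a key exists the DELETE-and-INSERT pattern can usually become a single upsert. A sketch under assumptions: the column prefixes (64 here) are assumed long enough to identify rows while keeping the key under MySQL's index-size limit, and note that rows with NULL material/entity are never treated as duplicates by a unique index:
-- Unique index over the identifying columns, using prefixes to stay small.
CREATE UNIQUE INDEX ux_statistics_identity
ON Statistics (server_id(64), player_id, statistic(64), material(64), entity(64));
-- Insert a new row, or add to the value of the existing one.
INSERT INTO Statistics (server_id, player_id, statistic, material, entity, value)
VALUES ('lobby', UNHEX('00112233445566778899AABBCCDDEEFF'), 'mine_block', 'stone', NULL, 10)
ON DUPLICATE KEY UPDATE value = value + VALUES(value);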
Follow the steps below; hopefully they will solve your problem:
- First, add a numeric column to your table; let's call it "detailed".
- In your project, before running an INSERT statement, get the current maximum of detailed (SELECT MAX(detailed)+1 AS maxid FROM table_name) and use that as the row number, which will let you FETCH and DELETE the record.
- You can also UPDATE using this number, but during an UPDATE the maximum of detailed is not required.
Hope you understand this and that it helps.
I have dug a bit more through the internet and optimized my code a lot.
I asked this question because of bad performance, which I assumed was because of the DELETE and INSERT statements following each other.
I was thinking that I could reduce the load by doing INSERT IGNORE statements followed by UPDATE statements, or INSERT ... ON DUPLICATE KEY UPDATE statements. But these require keys to be useful, which I didn't have because of the constraints MySQL puts on key size.
I have fixed the performance issues though:
By reducing the number of statements generated asynchronously (I know JDBC is blocking, but it worked; it just blocked thousands of threads) and disabling auto-commit, I was able to improve performance by a factor of 600 (from 60 seconds down to 0.1 seconds).
Next steps are to improve the connection string and gain even more performance.

Best way with relation tables

I have a question about tables and relations tables ...
Actually, I have these 3 tables
CREATE TABLE USER (
ID int(11) NOT NULL AUTO_INCREMENT,
NAME varchar(14) DEFAULT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE COUNTRY (
ID int(11) NOT NULL AUTO_INCREMENT,
COUNTRY_NAME varchar(14) DEFAULT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE USER_COUNTRY_REL (
ID int(11) NOT NULL AUTO_INCREMENT,
ID_USER int(11) NOT NULL,
ID_COUNTRY int(11) NOT NULL,
PRIMARY KEY (ID)
);
OK, so now one user can have one or more countries, i.e. several entries in the USER_COUNTRY_REL table for one user.
But my USER table contains almost 130,000 rows...
Even at one country per user, that is almost 10 MB for the USER_COUNTRY_REL table.
And I have several related tables in this style...
My question is: is this the fastest, best way to do it?
Wouldn't it be better to put a COUNTRY field directly in the USER table that contains the different IDs (like this: "2, 6, ...")?
Thanks guys ;)
The way you have it is the most optimal as far as time constraints go. Sure, it takes up more space, but that's part of space-time tradeoff - If you want to be faster, you use more space; if you want to use less space, it will run slower (on average).
Also, think of the future. Right now, you're probably selecting the countries for each user, but just wait. Thanks to the magic of scope creep, your application will one day need to select all the users in a given country, at which point scanning each user's "COUNTRY" field to find matches will be incredibly slow, as opposed to just going backwards through the USER_COUNTRY_REL table like you could do now.
In general, for a 1-to-1 or 1-to-many correlation, you can link by foreign key. For a many-to-many correlation, you want to have a relation table in between the two. This scenario is a many-to-many relationship, as each user has multiple countries, and each country has multiple users.
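A sketch of that junction-table shape (this assumes USER and COUNTRY are InnoDB so the foreign keys are enforced; the composite primary key replaces the surrogate ID from the question):
CREATE TABLE USER_COUNTRY_REL (
ID_USER int(11) NOT NULL,
ID_COUNTRY int(11) NOT NULL,
PRIMARY KEY (ID_USER, ID_COUNTRY),
KEY (ID_COUNTRY), -- lets you go backwards from a country to its users
FOREIGN KEY (ID_USER) REFERENCES USER (ID),
FOREIGN KEY (ID_COUNTRY) REFERENCES COUNTRY (ID)
) ENGINE=InnoDB;
-- The scope-creep query stays an index lookup:
SELECT u.*
FROM USER u
JOIN USER_COUNTRY_REL r ON r.ID_USER = u.ID
WHERE r.ID_COUNTRY = 6;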
Why not try it like this? Create the COUNTRY table first:
CREATE TABLE COUNTRY (
CID int(11) NOT NULL AUTO_INCREMENT,
COUNTRY_NAME varchar(14) DEFAULT NULL,
PRIMARY KEY (CID)
);
Then the USER table:
CREATE TABLE USER (
ID int(11) NOT NULL AUTO_INCREMENT,
NAME varchar(14) DEFAULT NULL,
CID int(11) NOT NULL,
PRIMARY KEY (ID),
FOREIGN KEY (CID) REFERENCES COUNTRY (CID)
);
Just create a foreign key relation between them. If you try to put this in an explicit relation table, there will be a lot of redundant data.
This is the better approach. You can also index that foreign key, so that database retrieval becomes fast during search operations.
Hope this helps..
Note: Not sure about the exact syntax of the foreign key

MYSQL Long super-keys

I am currently working on a project, which involves altering data stored in a MYSQL database. Since the table that I am working on does not have a key, I add a key with the following command:
ALTER TABLE deCoupledData ADD COLUMN MY_KEY INT NOT NULL AUTO_INCREMENT KEY
Due to the fact that I want to group my records according to selected fields, I try to create an index for the table deCoupledData that consists of MY_KEY, along with the selected fields. For example, If I want to work with the fields STATED_F and NOT_STATED_F, I type:
ALTER TABLE deCoupledData ADD INDEX (MY_KEY, STATED_F, NOT_STATED_F)
The real issue is that the fields I usually work with number more than 16, and MySQL does not allow indexes (super-keys) over more than 16 columns.
In conclusion: is there another way to do this? Can I somehow make MySQL order the records according to the desired super-key (something like clustering)? I really need to make my script faster; the main overhead is that each group may contain records which are not stored on the same page of the disk, and I assume my machine starts doing random I/O in order to retrieve the records.
Thank you for your time.
Nick Katsipoulakis
CREATE TABLE deCoupledData (
AA double NOT NULL DEFAULT '0',
STATED_F double DEFAULT NULL,
NOT_STATED_F double DEFAULT NULL,
MIN_VALUES varchar(128) NOT NULL DEFAULT '-1,-1',
MY_KEY int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (MY_KEY),
KEY AA (AA) )
ENGINE=InnoDB AUTO_INCREMENT=74358 DEFAULT CHARSET=latin1
Okay, first of all: when you add an index over multiple columns and you don't actually filter on the first column, the index is useless.
Example: You have a query like
SELECT *
FROM deCoupledData
WHERE
stated_f = 5
AND not_stated_f = 10
and an index over (MY_KEY, STATED_F, NOT_STATED_F).
The index can only be used if you also have something like AND my_key = 1 in the WHERE clause.
Imagine you want to look up every person in a telephone book with first name 'John'. Then the knowledge that the book is sorted by last name is useless, you still have to look up every single name.
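To make that concrete: for the sample query above, an index that leads with the columns actually filtered on would be usable, for example:
-- Usable by the sample query: the leading columns match the WHERE clause.
ALTER TABLE deCoupledData ADD INDEX (STATED_F, NOT_STATED_F);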
Also, the primary key does not have to be a surrogate / artificial one. It's nearly always better to have a primary key which is made up of columns which identify each row uniquely anyway.
Also, it's not always good to have many indexes. Not only do indexes slow down INSERTs and UPDATEs, sometimes they just cause an extra lookup: first the index is consulted, then a second lookup finds the actual data.
That's just a few tips. Maybe Jordan's hint is not a bad idea, "You should maybe post a new question that has your actual SQL query, table layout, and performance questions".
UPDATE:
Yes, that is possible. According to the manual:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
which means that the data is practically sorted on disk, yes.
Be aware that it's also possible to define a primary key over multiple columns!
Like
CREATE TABLE deCoupledData (
AA double NOT NULL DEFAULT '0',
STATED_F double DEFAULT NULL,
NOT_STATED_F double DEFAULT NULL,
MIN_VALUES varchar(128) NOT NULL DEFAULT '-1,-1',
MY_KEY int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (NOT_STATED_F, STATED_F, AA),
KEY MY_KEY (MY_KEY),
KEY AA (AA) )
ENGINE=InnoDB AUTO_INCREMENT=74358 DEFAULT CHARSET=latin1
as long as the combination of the columns is unique.

Approach for multiple "item sets" in Database Design

I have a database design where I store image filenames in a table called resource_file.
CREATE TABLE `resource_file` (
`resource_file_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`resource_id` int(11) NOT NULL,
`filename` varchar(200) NOT NULL,
`extension` varchar(5) NOT NULL DEFAULT '',
`display_order` tinyint(4) NOT NULL,
`title` varchar(255) NOT NULL,
`description` text NOT NULL,
`canonical_name` varchar(200) NOT NULL,
PRIMARY KEY (`resource_file_id`)
) ENGINE=InnoDB AUTO_INCREMENT=592 DEFAULT CHARSET=utf8;
These "files" are gathered under another table called resource (which is something like an album):
CREATE TABLE `resource` (
`resource_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`resource_id`)
) ENGINE=InnoDB AUTO_INCREMENT=285 DEFAULT CHARSET=utf8;
The logic behind this design comes in handy if I want to assign a certain type of "resource" (album) to a certain type of "item" (product, user, project, etc.), for example:
CREATE TABLE `resource_relation` (
`resource_relation_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`module_code` varchar(32) NOT NULL DEFAULT '',
`resource_id` int(11) NOT NULL,
`data_id` int(11) NOT NULL,
PRIMARY KEY (`resource_relation_id`)
) ENGINE=InnoDB AUTO_INCREMENT=328 DEFAULT CHARSET=utf8;
This table holds the relationship of a resource to a certain type of item like:
Product
User
Gallery
etc.
I do exactly this by giving module_code a value like "product" or "user" and setting data_id to the corresponding unique id, in this case product_id or user_id.
So at the end of the day, if I want to query the resources assigned to a product with the id of 123, I query the resource_relation table: (very simplified pseudo-query)
SELECT * FROM resource_relation WHERE data_id = 123 AND module_code = 'product'
And this gives me the resources for which I can find the corresponding images.
I find this approach very practical, but I don't know whether it is a correct approach to this particular problem.
What is the name of this approach?
Is it a valid design?
Thank you
This one uses super-type/sub-type. Note how the primary key propagates from the super-type table into the sub-type tables.
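A rough SQL sketch of that model (table and column names here are assumptions; resource is the existing table from the question):
CREATE TABLE item (
item_id int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (item_id)
) ENGINE=InnoDB;
-- Each sub-type reuses the super-type's key as its own primary key.
CREATE TABLE product (
item_id int(11) NOT NULL,
product_name varchar(255) NOT NULL,
PRIMARY KEY (item_id),
FOREIGN KEY (item_id) REFERENCES item (item_id)
) ENGINE=InnoDB;
-- Resources link to the super-type, so the relationship gets a real foreign key.
CREATE TABLE item_resource (
item_id int(11) NOT NULL,
resource_id int(11) unsigned NOT NULL,
PRIMARY KEY (item_id, resource_id),
FOREIGN KEY (item_id) REFERENCES item (item_id),
FOREIGN KEY (resource_id) REFERENCES resource (resource_id)
) ENGINE=InnoDB;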
To answer your second question first: the table resource_relation is an implementation of an Entity-attribute-value model.
So the answer to the next question is, it depends. According to relational database theory it is bad design, because we cannot enforce a foreign key relationship between data_id and say product_id, user_id, etc. It also obfuscates the data model, and it can be harder to undertake impact analysis.
On the other hand, lots of people find, as you do, that EAV is a practical solution to a particular problem, with one table instead of several. Although, if we're talking practicality, EAV doesn't scale well (at least in relational products, there are NoSQL products which do things differently).
From which it follows, the answer to your first question, is it the correct approach?, is "Strictly, no". But does it matter? Perhaps not.
" I can't see a problem why this would "not" scale. Would you mind
explaining it a little bit further? "
There are two general problems with EAV.
The first is that small result sets (say DATA_ID=USER_ID) and big result sets (say DATA_ID=PRODUCT_ID) use the same query, which can lead to sub-optimal execution plans.
The second is that adding more attributes to the entity means the query needs to return more rows, whereas a relational solution would return the same number of rows, with more columns. This is the major scaling cost. It also means we end up writing horrible queries to pivot the rows back into columns.
Now, in your specific case perhaps neither of these concerns are relevant. I'm just explaining the reasons why EAV can cause problems.
"How would i be supposed to assign "resources" to for example, my
product table, "the normal way"?"
The more common approach is to have a different intersection table (AKA junction table) for each relationship, e.g. USER_RESOURCES, PRODUCT_RESOURCES, etc. Each table would consist of a composite primary key, e.g. (USER_ID, RESOURCE_ID), and probably not much else.
The other approach is to use a generic super-type table with specific sub-type tables. This is the implementation which Damir has modelled. The normal use case for super-types is when we have a bunch of related entities which have some attributes, behaviours and usages in common, plus some distinct features of their own. For instance, PERSON and USER, CUSTOMER, SUPPLIER.
Regarding your scenario, I don't think USER, PRODUCT and GALLERY fit this approach. Sure, they are all consumers of RESOURCE, but that is pretty much all they have in common. So trying to map them to an ITEM super-type is a procrustean solution; gaining a generic ITEM_RESOURCE table is likely to be a small reward for the additional hoops you're going to have to jump through elsewhere.
"I have a database design where i store images in a table called resource_file."
You're not storing images; you're storing filenames. The filename may or may not identify an image. You'll need to keep database and filesystem permissions in sync.
Your resource_file table structure says, "Image filenames are identifiable in the database, but are unidentifiable in the filesystem." It says that because resource_file_id is the primary key, but there are no unique constraints besides that id. I suspect your image files actually are identifiable in the filesystem, and you'd be better off with database constraints that match that reality. Maybe a unique constraint on (filename, extension).
Same idea for the resource table.
For resource_relation, you probably need a unique constraint on either (resource_id, data_id) or (resource_id, data_id, module_code). But . . .
I'll try to give this some more thought later. It's kind of hard to figure out what you're trying to do with resource_relation, which is usually a red flag.
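In the meantime, a sketch of what the constraints suggested above might look like (which variant is right for resource_relation depends on whether the same resource may be attached to one item under different module codes):
-- Make filenames as identifiable in the database as they are on disk.
ALTER TABLE resource_file ADD UNIQUE KEY uq_file (filename, extension);
-- One of these, depending on the intended semantics:
ALTER TABLE resource_relation ADD UNIQUE KEY uq_rel (resource_id, data_id, module_code);
-- ALTER TABLE resource_relation ADD UNIQUE KEY uq_rel (resource_id, data_id);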

Should I normalize this table?

I have a table Items which stores fetched book data from Amazon. This Amazon data is inserted into Items as users browse the site, so any INSERT that occurs needs to be efficient.
Here's the table:
CREATE TABLE IF NOT EXISTS `items` (
`Item_ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Item_ISBN` char(13) DEFAULT NULL,
`Title` varchar(255) NOT NULL,
`Edition` varchar(20) DEFAULT NULL,
`Authors` varchar(255) DEFAULT NULL,
`Year` char(4) DEFAULT NULL,
`Publisher` varchar(50) DEFAULT NULL,
PRIMARY KEY (`Item_ID`),
UNIQUE KEY `Item_Data` (`Item_ISBN`,`Title`,`Edition`,`Authors`,`Year`,`Publisher`),
KEY `ISBN` (`Item_ISBN`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT AUTO_INCREMENT=1 ;
Normalizing this table would presumably mean creating tables for Titles, Authors, and Publishers. My concern with doing this is that the insert would become too complex. To insert a single Item, I'd have to:
Check for the Publisher in Publishers to SELECT Publisher_ID, otherwise insert it and use mysql_insert_id() to get Publisher_ID.
Check for the Authors in Authors to SELECT Authors_ID, otherwise insert it and use mysql_insert_id() to get Authors_ID.
Check for the Title in Titles to SELECT Title_ID, otherwise insert it and use mysql_insert_id() to get Title_ID.
Use those IDs to finally insert the Item (which may in fact be a duplicate, so this whole process would have been a waste...)
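For what it's worth, each lookup-or-insert step can be collapsed into a single statement via a documented MySQL idiom, sketched here for a hypothetical Publishers table with a unique key on the name:
CREATE TABLE Publishers (
Publisher_ID int unsigned NOT NULL AUTO_INCREMENT,
Publisher varchar(50) NOT NULL,
PRIMARY KEY (Publisher_ID),
UNIQUE KEY (Publisher)
) ENGINE=InnoDB;
-- Insert if new; if it already exists, make LAST_INSERT_ID() return the old id.
INSERT INTO Publishers (Publisher) VALUES ('Acme Press')
ON DUPLICATE KEY UPDATE Publisher_ID = LAST_INSERT_ID(Publisher_ID);
SELECT LAST_INSERT_ID(); -- the Publisher_ID, whether the row was new or not
That said, this only softens the complexity argument; the final duplicate-Item check remains either way.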
Does that argue against normalization for this table?
Note: The goal of Items is not to create a comprehensive database of books, so that a user would say "Show me all the books by Publisher X." The Items table is just used to cache Items for my users' search results.
Considering your goal, I definitely wouldn't normalize this.
You've answered your own question - don't normalize it!
YES, you should normalize it if you don't think it is already. However, as far as I can tell it's already in 5th Normal Form anyway - at least it seems to be, based on the "obvious" interpretation of those column names and if you ignore the nullable columns. Why do you doubt it? Not sure why you want to allow nulls for some of those columns, though.
"1. Check for the Publisher in Publishers to SELECT Publisher_ID, otherwise insert it and use mysql_insert_id() to get Publisher_ID"
There is no "Publisher_ID" in your table. Normalization has nothing to do with inventing a new "Publisher_ID" attribute. Substituting a "Publisher_ID" in place of Publisher certainly wouldn't make it any more normalized than it already is.
The only place where I can see normalization being useful in your case is if you want to store information about each author.
However -
Where normalization could help you: saving space! Especially if there is a lot of repetition in terms of publishers and authors (that is, if you normalize out an individual authors table).
So if you are dealing with tens of millions of rows, normalization will show an impact in terms of space (and even performance). If you don't face that situation (which I believe should be the case), you don't need to normalize.
PS - Also think of the future... will there ever be a need? DBs are long-term infrastructure... never design them with only the present in mind.