2 independent auto-increment fields in a MySQL table

I am trying to create a MySQL table that has a generic ID column and also a secondary ID column, both of which need some form of auto-incrementing.
Currently my MySQL table looks like this:
`ban_id` mediumint unsigned NOT NULL AUTO_INCREMENT,
`student_uuid` varchar(36) NOT NULL,
`student_ban_id` tinyint unsigned NOT NULL AUTO_INCREMENT,
(a bunch of data irrelevant to this question)
PRIMARY KEY (`student_uuid`, `student_ban_id`),
UNIQUE (`ban_id`)
The desired behavior is that ban_id is just a generic entry ID and that student_ban_id is the ban's number for the given student. (My reasoning is that I want to be able to reference bans by an ID value if the student_uuid is unavailable, but the program spec also requires the ability to take student:banID as a valid means of reference.)
An example row might be BanID:501, {studentUUID}, studentBanID:2 (501st ban overall, 2nd ban against the given student).
I have run into the issue that the MyISAM engine does not support tracking two separate incremental columns at once (I believe it can handle both desired behaviors, but not at the same time).
What might be the best way to achieve such a behavior?
Much appreciated!
-Cryptic
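One possible way to get both numbers, sketched here under stated assumptions: MySQL allows only one AUTO_INCREMENT column per table, so ban_id keeps that role and student_ban_id is computed when the row is inserted. The table name `bans` and the sample UUID are made up for illustration, and the irrelevant columns are left out.

```sql
-- Hypothetical table name; the columns are taken from the question.
CREATE TABLE bans (
  ban_id         MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  student_uuid   VARCHAR(36)        NOT NULL,
  student_ban_id TINYINT UNSIGNED   NOT NULL,
  PRIMARY KEY (student_uuid, student_ban_id),
  UNIQUE KEY (ban_id)
);

-- ban_id is filled in by AUTO_INCREMENT.
-- student_ban_id becomes (highest existing number for this student) + 1,
-- or 1 when the student has no bans yet.
INSERT INTO bans (student_uuid, student_ban_id)
SELECT '123e4567-e89b-12d3-a456-426614174000',
       COALESCE(MAX(student_ban_id), 0) + 1
FROM bans
WHERE student_uuid = '123e4567-e89b-12d3-a456-426614174000';
```

Two inserts for the same student that run at the same moment could compute the same number. The primary key on (student_uuid, student_ban_id) would reject the second one, so the calling code should probably be prepared to retry.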

Related

Will an INSERT or UPDATE query be affected by indexed columns that are not included in the query?

MySQL and SQL users, this question relates to both of you; it is about indexing. I have this table structure for a classifieds website. I have one common table to store the title, description, posting user, etc. I also have this table to store the detailed attributes of a particular ad category.
CREATE TABLE `ad_detail` (
`ad_detail_id_pk` int(10) NOT NULL AUTO_INCREMENT,
`header_id_fk` int(10) NOT NULL,
`brand_id_fk` smallint(5) NULL,
`brand_name` varchar(200) NULL,
`is_brand_new` bool,
.......
`transmission_type_id_fk` tinyint(3) NULL,
`transmission_type_name` varchar(200) NULL,
`body_type_id_fk` tinyint(3) unsigned NULL,
`body_type_name` varchar(200) NULL,
`mileage` double NULL,
`fuel_type_id_fk` tinyint(3) NULL,
......
PRIMARY KEY (`ad_detail_id_pk`)
)
So as you can see, the first part of the attributes belongs to mobile ads and the second part belongs to vehicle ads; likewise, I have other attributes for other categories. header_id_fk holds the relationship to the header table, which has the common information. All of these foreign keys are involved in filtering ads in some way. Someone may want to find all the mobile phones made by Nokia, in which case brand_id_fk will be used. Someone else may want to filter vehicles by fuel type. So I need to index every filtering attribute in this table. Now this is my question.
When a user posts a mobile ad, the INSERT statement will contain only a certain number of fields. As we all know, an index improves data retrieval performance but adds cost to INSERT and UPDATE queries. So if I insert a mobile ad, will that INSERT suffer from the indexes on the attributes that are relevant only to vehicle ads?
Yes. Normal indexes contain one entry for every row in the table (unless you use Oracle, which does not index rows whose indexed columns are all NULL: http://use-the-index-luke.com/sql/where-clause/null). Therefore every index gets a new entry every time you insert a row into the table, with the associated index maintenance costs (page splits, etc.).
You could create a filtered/partial index that excludes NULLs, which would solve the particular issue of INSERT performance being slowed down by indexes on fields into which you're inserting NULL. You would need to test the solution thoroughly to make sure that the indexes are still being used by the queries you expect to use them. Note that MySQL does not support partial indexes, AFAIK; the following is for SQL Server.
Create Index ix_myFilteredIndex On ad_detail (brand_id_fk) Where brand_id_fk Is Not Null;

Approach to Primary Key on Related Tables to Save Storage Space

I have a question regarding primary keys in Relational Databases. Let's assume that I have the following tables:
Box
id
box_name
BoxItems
id
item_name
belongs_to_box_id (foreign key)
Let's also assume that I intend to store millions of items per day. I would probably use bigint or a guid for the BoxItems.Id.
What I was thinking, and I need your advice on this, is to use a sequential TINYINT number instead of a BIGINT id for BoxItems, so that each item is identified by the combination of belongs_to_box_id and the TINYINT value (e.g. item_number).
So now instead of the above we get:
BoxItems
belongs_to_box_id
item_sequence_number [TINYINT]
item_name
Example:
Items.Insert(1,1, "my item 1");
Items.Insert(1,2, "my item 2");
So instead of using bigint or GUID for that matter, I can use tinyint and save a lot of disk space.
I want to know the pros and cons of such an approach. I am developing my app using MySQL and ASP.NET 4.5.
When you think about it, there's really not much difference between the "box/contents" problem and the "order/line item" problem.
create table boxes (
box_id integer primary key,
box_name varchar(35) not null
);
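create table boxed_items (
box_id integer not null,
box_item_num tinyint not null,
item_name varchar(35) not null,
-- an item is identified by its box plus its number within that box
primary key (box_id, box_item_num),
-- declared at table level, since MySQL has historically ignored inline column-level REFERENCES
foreign key (box_id) references boxes (box_id)
);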
For MySQL, you'd probably use unsigned integer and unsigned tinyint. There's no compelling reason for a database to avoid negative numbers, but developers should lean on the Principle of Least Surprise.
Make sure 256 values are enough. Getting that wrong can be expensive to correct in a table that gets millions of rows each day.
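One way to check that, sketched under the assumption that data already exists in the original layout from the question (a BoxItems table with a belongs_to_box_id column): count the items in each box and look at the largest count. The aliases `items_in_box`, `largest_box` and `per_box` are only labels chosen for this example.

```sql
-- Largest number of items currently stored in any single box.
-- An unsigned TINYINT holds 0..255, so this should stay well below 256.
SELECT MAX(items_in_box) AS largest_box
FROM (
  SELECT belongs_to_box_id, COUNT(*) AS items_in_box
  FROM BoxItems
  GROUP BY belongs_to_box_id
) AS per_box;
```

This only describes the data seen so far; it says nothing about how large a box may grow later.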
I would recommend writing a simple test for both approaches, comparing performance, disk space and ease of implementation, and then making a judgement call. Both of your suggestions are reasonable and I doubt there will be much of a difference in performance, but the best way to find out is to just try it, and then you will know for sure.

SQL table design to reduce redundancy

I have two designs in mind and wanted to check which one you think is better.
So I have three tables offer, offer_type and offer_type_filter.
Original Design of tables
offer
id int(10) unsigned
code varchar(48)
offer_type_id int(10) unsigned
start_date datetime
exp_date datetime
value int(10)
updated timestamp
created datetime
offer_type
id int(10) unsigned
name varchar(48)
condition varchar(512)
offer_type_filter
id int(10) unsigned
filter_type varchar(20)
filter_value varchar(50)
offer_type_id int(10) unsigned
As you may guess, an offer has a type, and a filter specifies the cases in which the offer applies. For example, offer_type.condition holds something like $20 off on a minimum purchase of $300, and offer_type_filter restricts the offer to, say, McDonald's only. An offer can exist without filters.
One problem with the current design is that every time I create a new offer, even though the type is the same, I have to create a duplicate entry in offer_type and then use that type in offer_type_filter (reusing the current type would mess up existing offers).
So in terms of database redesign, it is quite obvious that offer_type must not exist in offer_type_filter, so I am convinced it has to change to something like this:
Redesign (Doing away with offer_type_filter and creating new table filter. It's basically renaming to something more appropriate)
Filter
id int(10) unsigned
filter_type varchar(20)
filter_value varchar(50)
filter_type_set_id int(10) unsigned
For other tables I am thinking of these two options
Option 1 (offer_type_filter from redesign + other tables same from original design)
offer
id int(10) unsigned
code varchar(48)
offer_type_filter_mapping_id int(10) unsigned
offer_type_filter_mapping
id int(10) unsigned
filter_type_set_id int(10) unsigned > from Filter table
offer_type_id int(10) unsigned
If I choose first design then I will have redundant entries in offer_type_filter_mapping. For offers which don't have filters, offer_type_filter_mapping will have entries of offer_type_id with null as filter_type_set_id. Also then for each type I create, I will have to put an entry in mapping table. So I don't like this aspect of design.
Option 2 (offer_type_filter from redesign + other tables same from original design)
offer
id int(10) unsigned
code varchar(48)
filter_type_set_id int(10) unsigned > from Filter table
I came to Option 2 only because in this case there is a redundant filter_type_set_id for each offer, and in my case the offer table is huge.
I wanted your critique as to which design you think is the least painful. Frequent use cases: creating lots of offers with and without filters. We already have close to 40-50 offer types. The types table cannot cover every scenario, so we create new types about 10% of the time.
Also, I use Spring and Hibernate, so you can also consider what my design constraints would be from that perspective.
P.S. You might even add that in MySQL it is not convenient to generate two ids per table, as in offer_type_filter, but I am thinking about it. I will probably use a dummy table for generation or an externally generated id.
I see it this way: one offer can have only one offer_type_filter, so it makes a 1:N relationship,
and offer will take the offer_type attributes that you had before.
the cardinality is N:M
EDIT:
for example, if you have in offer_type_filter.
offer_type_filter_id = 1 and it's 30% off.
offer_type_filter_id = 2 and it's 10% off.
offer_type_filter_id = 3 and it's 0% off.
...
etc
and in your offer table you can have:
offer_id=1 and offer_filter_id=1 // this means that product 1 has 30% off
offer_id=2 and offer_filter_id=1 // this means that product 2 has 30% off
offer_id=3 and offer_filter_id=2 // this means that product 3 has 10% off
offer_id=4 and offer_filter_id=3 // this means that product 4 has 0% off
...
etc
If your cardinality is that one offer can have only one offer type, use the first design.
If your cardinality is that one offer can have multiple discounts and the same discount can apply to multiple products, I recommend the second design.

Approach for multiple "item sets" in Database Design

I have a database design where i store image filenames in a table called resource_file.
CREATE TABLE `resource_file` (
`resource_file_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`resource_id` int(11) NOT NULL,
`filename` varchar(200) NOT NULL,
`extension` varchar(5) NOT NULL DEFAULT '',
`display_order` tinyint(4) NOT NULL,
`title` varchar(255) NOT NULL,
`description` text NOT NULL,
`canonical_name` varchar(200) NOT NULL,
PRIMARY KEY (`resource_file_id`)
) ENGINE=InnoDB AUTO_INCREMENT=592 DEFAULT CHARSET=utf8;
These "files" are gathered under another table called resource (which is something like an album):
CREATE TABLE `resource` (
`resource_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text NOT NULL,
PRIMARY KEY (`resource_id`)
) ENGINE=InnoDB AUTO_INCREMENT=285 DEFAULT CHARSET=utf8;
The logic behind this design comes in handy if I want to assign a certain type of "resource" (album) to a certain type of "item" (product, user, project, etc.), for example:
CREATE TABLE `resource_relation` (
`resource_relation_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`module_code` varchar(32) NOT NULL DEFAULT '',
`resource_id` int(11) NOT NULL,
`data_id` int(11) NOT NULL,
PRIMARY KEY (`resource_relation_id`)
) ENGINE=InnoDB AUTO_INCREMENT=328 DEFAULT CHARSET=utf8;
This table holds the relationship of a resource to a certain type of item like:
Product
User
Gallery
& etc.
I do exactly this by giving the "module_code" a value like, "product" or "user" and assigning the data_id to the corresponding unique_id, in this case, product_id or user_id.
So at the end of the day, if i want to query the resources assigned to a product with the id of 123 i query the resource_relation table: (very simplified pseudo query)
SELECT * FROM resource_relation WHERE data_id = 123 AND module_code = 'product'
And this gives me the resources for which I can find the corresponding images.
I find this approach very practical but i don't know if it is a correct approach to this particular problem.
What is the name of this approach?
Is it a valid design?
Thank you
This one uses super-type/sub-type. Note how the primary key propagates from a super-type table into the sub-type tables.
To answer your second question first: the table resource_relation is an implementation of an Entity-attribute-value model.
So the answer to the next question is, it depends. According to relational database theory it is bad design, because we cannot enforce a foreign key relationship between data_id and say product_id, user_id, etc. It also obfuscates the data model, and it can be harder to undertake impact analysis.
On the other hand, lots of people find, as you do, that EAV is a practical solution to a particular problem, with one table instead of several. Although, if we're talking practicality, EAV doesn't scale well (at least in relational products, there are NoSQL products which do things differently).
From which it follows, the answer to your first question, is it the correct approach?, is "Strictly, no". But does it matter? Perhaps not.
"I can't see a problem why this would "not" scale. Would you mind explaining it a little bit further?"
There are two general problems with EAV.
The first is that small result sets (say DATA_ID=USER_ID) and big result sets (say DATA_ID=PRODUCT_ID) use the same query, which can lead to sub-optimal execution plans.
The second is that adding more attributes to the entity means the query needs to return more rows, whereas a relational solution would return the same number of rows, with more columns. This is the major scaling cost. It also means we end up writing horrible queries like this one.
Now, in your specific case perhaps neither of these concerns is relevant. I'm just explaining the reasons why EAV can cause problems.
"How would i be supposed to assign "resources" to for example, my product table, "the normal way"?"
The more common approach is to have a different intersection table (AKA junction table) for each relationship, e.g. USER_RESOURCES, PRODUCT_RESOURCES, etc. Each table would consist of a composite primary key, e.g. (USER_ID, RESOURCE_ID), and probably not much else.
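A minimal sketch of one such junction table, assuming a `product` table with an unsigned integer `product_id` key (that table is not shown in the question, so its definition here is a guess). The `resource` table is the one from the question; its `resource_id` is INT UNSIGNED, so the referencing column has to use the same type.

```sql
-- Hypothetical product table, reduced to its key.
CREATE TABLE product (
  product_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (product_id)
) ENGINE=InnoDB;

-- One row per (product, resource) pair; both columns are real foreign keys.
CREATE TABLE product_resources (
  product_id  INT UNSIGNED NOT NULL,
  resource_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (product_id, resource_id),
  FOREIGN KEY (product_id)  REFERENCES product (product_id),
  FOREIGN KEY (resource_id) REFERENCES resource (resource_id)
) ENGINE=InnoDB;

-- Resources assigned to product 123.
SELECT r.*
FROM resource AS r
JOIN product_resources AS pr ON pr.resource_id = r.resource_id
WHERE pr.product_id = 123;
```

Compared with the module_code/data_id pair, the database can now reject a row that points at a product or resource that does not exist.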
The other approach is to use a generic super-type table with specific sub-type tables. This is the implementation which Damir has modelled. The normal use case for super-types is when we have a bunch of related entities which have some attributes, behaviours and usages in common plus some distinct features of their own. For instance, PERSON and USER, CUSTOMER, SUPPLIER.
Regarding your scenario, I don't think USER, PRODUCT and GALLERY fit this approach. Sure, they are all consumers of RESOURCE, but that is pretty much all they have in common. So trying to map them to an ITEM super-type is a procrustean solution; gaining a generic ITEM_RESOURCE table is likely to be a small reward for the additional hoops you're going to have to jump through elsewhere.
"I have a database design where i store images in a table called resource_file."
You're not storing images; you're storing filenames. The filename may or may not identify an image. You'll need to keep database and filesystem permissions in sync.
Your resource_file table structure says, "Image filenames are identifiable in the database, but are unidentifiable in the filesystem." It says that because resource_file_id is the primary key, but there are no unique constraints besides that id. I suspect your image files actually are identifiable in the filesystem, and you'd be better off with database constraints that match that reality. Maybe a unique constraint on (filename, extension).
Same idea for the resource table.
For resource_relation, you probably need a unique constraint on either (resource_id, data_id) or (resource_id, data_id, module_code). But . . .
I'll try to give this some more thought later. It's kind of hard to figure out what you're trying to do with resource_relation, which is usually a red flag.
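The unique constraints suggested above might look roughly like this. The index names are made up, and the three-column variant for resource_relation is only one of the two candidates mentioned.

```sql
-- A filename plus extension should identify one file.
ALTER TABLE resource_file
  ADD UNIQUE KEY uq_resource_file_name (filename, extension);

-- A resource should be attached to a given item of a given module only once.
ALTER TABLE resource_relation
  ADD UNIQUE KEY uq_resource_relation (resource_id, data_id, module_code);
```

Either statement fails if the table already contains duplicates, so existing rows would need to be cleaned up first.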

MySQL column with various types

I seem to often find myself wanting to store data of more than one type (usually specifically integers and text) in the same column in a MySQL database. I know this is horrible, but the reason it happens is when I'm storing responses that people have made to questions in a questionnaire. Some questions need an integer response, some need a text response and some might be an item selected from a list.
The approaches I've taken in the past have been:
Store everything as text and convert to int (or whatever) when needed later.
Have two columns - one for text and one for int. Then you just fill one in per row per response, and leave the other one as null.
Have two tables - one for text responses and one for integer responses.
I don't really like any of those, though, and I have a feeling there must be a much better way to deal with this kind of situation.
To make it more concrete, here's an example of the tables I might have:
CREATE TABLE question (
id int(11) NOT NULL auto_increment,
text VARCHAR(200) NOT NULL default '',
PRIMARY KEY (id)
);
CREATE TABLE response (
id int(11) NOT NULL auto_increment,
question int(11) NOT NULL,
user int(11) NOT NULL,
response VARCHAR(200) NOT NULL default '',
PRIMARY KEY (id)
);
or, if I went with using option 2 above:
CREATE TABLE response (
id int(11) NOT NULL auto_increment,
question int(11) NOT NULL,
user int(11) NOT NULL,
text_response VARCHAR(200),
numeric_response int(11),
PRIMARY KEY (id)
);
and if I used option 3 there'd be a responseInteger table and a responseText table.
Is any of those the right approach, or am I missing an obvious alternative?
[Option 2 is] NOT the most normalized option [as @Ray claims]. The most normalized would have no nullable fields and obviously option 2 would require a null on every row.
At this point in your design you have to think about the usage, the queries you'll do, the reports you'll write. Will you want to do math on all of the numeric responses at the same time? i.e. WHERE numeric_response IS NOT NULL? Probably unlikely.
More likely would be, What's the average response WHERE Question = 11. In those cases you can either choose the INT table or the INT column and neither would be easier to do than the other.
If you did do two tables, you'd more than likely be constantly unioning them together for questions like, what % of questions have a response etc.
Can you see how the questions you ask your database to answer start to drive the design?
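To illustrate the "average response WHERE Question = 11" case, here is roughly what the query looks like with the INT column (option 2) and with the INT table (option 3). The two statements run against different schemas, and the column names of responseInteger are assumed, since the question does not spell them out.

```sql
-- Option 2: response table with a nullable numeric_response column.
SELECT AVG(numeric_response) AS average_response
FROM response
WHERE question = 11;

-- Option 3: separate responseInteger table
-- (assumed columns: question, user, response).
SELECT AVG(response) AS average_response
FROM responseInteger
WHERE question = 11;
```

The two queries have the same shape, which is the point made above: for this kind of question neither layout is easier than the other.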
I'd opt for Option 1. The answers are always text strings, but sometimes the text string happens to be the representation of an integer. What is less easy is to determine what constraints, if any, should be placed on the answer to a given question. If some answer should only be a sequence of one or more digits, how do you validate that? Most likely, the Questions table should contain information about the possible answers, and that should guide the validation.
I note that the combination of QuestionID and UserID is (or should be) unique (for a given questionnaire). So, you really don't need the auto-increment column in the answer. You should also have a unique constraint (or primary key constraint) on the QuestionID and UserID anyway (regardless of whether you keep the auto-increment column).
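A sketch of how those two suggestions could fit together, assuming a single questionnaire. The `answer_type` column and its values are hypothetical; they stand in for whatever "information about the possible answers" the question table ends up holding.

```sql
CREATE TABLE question (
  id          INT NOT NULL AUTO_INCREMENT,
  text        VARCHAR(200) NOT NULL DEFAULT '',
  -- hypothetical column describing what kind of answer is expected
  answer_type ENUM('integer', 'text', 'choice') NOT NULL DEFAULT 'text',
  PRIMARY KEY (id)
);

CREATE TABLE response (
  question INT NOT NULL,
  user     INT NOT NULL,
  response VARCHAR(200) NOT NULL DEFAULT '',
  -- one answer per user per question, so no auto-increment id is needed
  PRIMARY KEY (question, user),
  FOREIGN KEY (question) REFERENCES question (id)
);

-- Answers to integer questions that are not a sequence of one or more digits.
SELECT r.question, r.user, r.response
FROM response AS r
JOIN question AS q ON q.id = r.question
WHERE q.answer_type = 'integer'
  AND r.response NOT REGEXP '^[0-9]+$';
```

The last query only reports invalid answers after the fact; rejecting them at insert time would still be up to the application or a trigger.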
Option 2 is the correct, most normalized option.