I am new to databases. I am going to share two table designs with you here, and I just want to know which one is the better design and why.
For the first one I created a user table, a subject table, and a user_subject table.
In the user table I save user information, in the subject table I save subjects, and in user_subject I save the user id and the subject id.
CREATE TABLE IF NOT EXISTS `subjects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
-- --------------------------------------------------------
--
-- Table structure for table `users`
--
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
-- --------------------------------------------------------
--
-- Table structure for table `user_subjects`
--
CREATE TABLE IF NOT EXISTS `user_subjects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`subject_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Second one:
CREATE TABLE IF NOT EXISTS `subjects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`subject_name` varchar(2000) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Here I save the user and its subjects in the user table, separated by commas (,), and do not create another table to store the user and subject ids.
I think the second one is better because we do not need to save data in a third table. Please tell me which one is better and will hold up best in the future.
The first version is much, much better. Here are some reasons why you do not want to use comma-delimited strings:
SQL does not have particularly good string functions -- the basics, but not much more.
When you store values in a delimited string, the database cannot validate the data. With a separate table you can use foreign key constraints.
Queries on comma-delimited columns cannot make use of standard indexes (although it might be possible to use full text indexes).
With a comma-delimited string, the database cannot validate that the subjects are unique (see the constraint sketch after this list).
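To make the validation and uniqueness points concrete, here is a sketch (not in the original answer) of the constraints the junction table enables; the constraint and key names are illustrative:
ALTER TABLE `user_subjects`
  ADD CONSTRAINT `fk_us_user` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
  ADD CONSTRAINT `fk_us_subject` FOREIGN KEY (`subject_id`) REFERENCES `subjects` (`id`),
  ADD UNIQUE KEY `uq_user_subject` (`user_id`, `subject_id`);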
The first method uses a junction table and it is the better way to implement this logic in a relational database.
It's OK to use the second way if:
1) subjects have only one value of importance (the name), and
2) that value uniquely identifies a subject (i.e. no two subjects have the same name), or there is no need to distinguish between two subjects with the same name.
Generally speaking the first way is better, because if you suddenly decide to give the subjects a new attribute (for example, age) you don't have to redo your whole table structure.
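For example, adding that hypothetical attribute to the normalized design is a single statement:
ALTER TABLE `subjects` ADD COLUMN `age` INT NULL;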
The second solution is not very good anyway, since you cannot use joins or indexes with it.
Which solution is best depends on the kind of relationship between users and subjects.
If each subject belongs to exactly one user and each user may have an arbitrary number of subjects (a one-to-many relationship), then you should add user_id to the subjects table.
If a subject can belong to more than one user and each user can have many subjects (a many-to-many relationship), you should use your first solution with a third mapping table.
In both cases you can express the following queries very easily and cleanly in SQL using a simple join (sketched after this list):
which subjects belong to a given user
which users have subjects with a name containing a certain expression
which is the user (or the users, in the second case) of a given subject
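Sketches of those three queries against the first (junction-table) design; the ids and the search string are illustrative:
-- subjects belonging to user 1
SELECT s.*
FROM subjects s
JOIN user_subjects us ON us.subject_id = s.id
WHERE us.user_id = 1;

-- users having a subject whose name contains 'math'
SELECT DISTINCT u.*
FROM users u
JOIN user_subjects us ON us.user_id = u.id
JOIN subjects s ON s.id = us.subject_id
WHERE s.name LIKE '%math%';

-- the user (or users) of subject 3
SELECT u.*
FROM users u
JOIN user_subjects us ON us.user_id = u.id
WHERE us.subject_id = 3;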
We have a set of users
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
Each user can have one or many domains, such as
CREATE TABLE `domains` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`domain` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `domain` (`domain`),
CONSTRAINT `domains_user_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
And we have a table that holds some sort of data; for this example it doesn't really matter what it contains:
CREATE TABLE `some_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` TEXT NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
We want certain elements of some_data to be accessible to only certain users or only certain domains (whitelist case).
In other cases we want elements of some_data to be accessible to everyone BUT certain users or certain domains (blacklist case).
Ideally we would like to retrieve, in a single query, the list of domains that a given element of some_data is accessible to, and ideally do the reverse (list all the data a given domain has access to).
Our approach so far is a single table
CREATE TABLE `access_rules` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rule_type` enum('blacklist','whitelist'),
`some_data_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`domain_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `access_rules_some_data_id_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
The problem, however, is the fact that we need to query the DB twice, to figure out whether the given data entry is operating a blacklist or a whitelist (whitelist has higher priority). (EDIT: it can be done in a single query, sketched below.)
Also, since domain_id is nullable (to allow blacklisting or whitelisting an entire user), joining is not easy.
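For reference, a single-query check is possible against the access_rules table above. The following is only a sketch, not from the original post: it assumes that any whitelist row for a data element switches that element to whitelist-only mode, that a NULL domain_id means the rule covers the whole user, and that the ids (user 42, domain 7, data element 1) are illustrative.
SELECT CASE
         WHEN COALESCE(SUM(rule_type = 'whitelist'), 0) > 0 THEN
           -- whitelist mode: access only if this user/domain is listed
           COALESCE(SUM(rule_type = 'whitelist' AND user_id = 42
                        AND (domain_id IS NULL OR domain_id = 7)), 0) > 0
         ELSE
           -- blacklist mode: access unless this user/domain is listed
           COALESCE(SUM(rule_type = 'blacklist' AND user_id = 42
                        AND (domain_id IS NULL OR domain_id = 7)), 0) = 0
       END AS has_access
FROM access_rules
WHERE some_data_id = 1;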
The API that will use this schema is currently hit 4-5k times per second, so performance matters.
The users table is relatively small (50k+ rows) and the domains table is about 1.5 million entries. some_data is also relatively small (sub 100k rows)
EDIT: the question is more around semantics and best practices. With the above structure I'm confident we can make it work, but the schema "feels wrong" and I'm wondering if there is a better way.
There are two issues to consider: normalization and management.
To normalize traditionally you would need 4 tables.
Set up the 3 master tables: USER, DOMAIN, OtherDATA.
Set up a child table with User_Id, Domain_Id, OtherDATA_Id, PermissionLevel.
This provides the least amount of repeated data. It also makes management at the user-domain level easier. You could also add a default whitelist/blacklist field to the user and domain tables. That way a script could auto-populate the child table, and a manager could then just go in and adjust the one value needed.
If you had two different tables, one for the whitelist and one for the blacklist, you could get a user or domain on both lists by accident. Actually it would be 4 tables, 2 for users and 2 for domains. Management would be more complex.
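A minimal sketch of that child table, assuming the users, domains and some_data tables from the question (the table, column and constraint names are illustrative):
CREATE TABLE `access_permissions` (
  `user_id` int(11) NOT NULL,
  `domain_id` int(11) NOT NULL,
  `some_data_id` int(11) NOT NULL,
  `permission` enum('whitelist','blacklist') NOT NULL,
  PRIMARY KEY (`user_id`, `domain_id`, `some_data_id`),
  CONSTRAINT `ap_user_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
  CONSTRAINT `ap_domain_fk` FOREIGN KEY (`domain_id`) REFERENCES `domains` (`id`),
  CONSTRAINT `ap_data_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The composite primary key guarantees that a given user/domain/data combination appears only once, which rules out landing on both lists by accident.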
With the following type of table design:
http://www.martinfowler.com/eaaCatalog/classTableInheritance.html
Let's use the following schema for sake of example:
CREATE TABLE `fruit` (
`id` int(10) UNSIGNED NOT NULL,
`type` tinyint(3) UNSIGNED NOT NULL,
`purchase_date` DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `apple` (
`fruit_id` int(10) UNSIGNED NOT NULL,
`is_macintosh` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `orange` (
`fruit_id` int(10) UNSIGNED NOT NULL,
`peel_thickness_mm` decimal(4,2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `fruit`
ADD PRIMARY KEY (`id`);
ALTER TABLE `apple`
ADD KEY `fruit_id` (`fruit_id`);
ALTER TABLE `orange`
ADD KEY `fruit_id` (`fruit_id`);
ALTER TABLE `fruit`
MODIFY `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE `apple`
ADD CONSTRAINT `apple_ibfk_1` FOREIGN KEY (`fruit_id`) REFERENCES `fruit` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `orange`
ADD CONSTRAINT `orange_ibfk_1` FOREIGN KEY (`fruit_id`) REFERENCES `fruit` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
Here, 'apples' and 'oranges' are types of 'fruit', and have unique properties, which is why they've been segmented out into their own tables.
The question is, from a performance standpoint, when performing a SELECT * FROM fruit query, would it be better to:
a) perform a LEFT OUTER JOIN on each typed table, i.e. apple and orange (in practice, we may be dealing with dozens of fruit types; a sketch follows this list), or
b) skip the joins and perform a separate query later for each fruit row in the application logic, so for a fruit row of type apple, SELECT * FROM apple WHERE fruit_id=...?
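For concreteness, option a) with the two example subtypes might look like the following sketch; with dozens of fruit types, the list of joined tables grows accordingly:
SELECT f.*, a.is_macintosh, o.peel_thickness_mm
FROM fruit f
LEFT OUTER JOIN apple a ON a.fruit_id = f.id
LEFT OUTER JOIN orange o ON o.fruit_id = f.id;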
EDIT:
Regarding the specific scenario, I won't go into excruciating detail, but the actual application here is a notification system which generates notifications when certain events occur. There is a different notification type for each event type, and each notification type stores properties unique to that event type. This is on a site with a lot of user activity, so there will eventually be millions of notification rows.
Have one table with columns for the 'common' attributes (e.g., type='apple', purchase_date=...), plus one TEXT column with JSON containing any other attributes (e.g., subtype='macintosh') appropriate to the row in question.
Or it might make more sense to have subtype as a common attribute, since many fruits have one (think 'navel').
What will you be doing with the "inheritance"? It's great in textbooks, but it sucks in a database. SQL predates inheritance, object orientation, etc.
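A minimal sketch of that single-table approach, shown here as a standalone replacement for the three tables above (the attributes column, the varchar type and the sample values are illustrative; the answer only calls for a TEXT column holding JSON):
CREATE TABLE `fruit` (
  `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `type` varchar(20) NOT NULL,
  `purchase_date` DATETIME NOT NULL,
  `attributes` TEXT NOT NULL, -- JSON holding the type-specific properties
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `fruit` (`type`, `purchase_date`, `attributes`)
VALUES ('apple', NOW(), '{"is_macintosh": true}'),
       ('orange', NOW(), '{"peel_thickness_mm": 2.5}');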
I have a table with an auto-generated primary id and a separate order column (an integer which might be changed by a GUI application). I would like the order column to be auto-incremented when a new row is inserted. It can be the same value as the new car_id, I don't care. Is it possible?
I think I can have more than one auto-increment field, but they would both need to be part of the primary key, and to prevent potentially having two cars with the same car_id I would need a unique index on car_id. Am I correct?
I think I could use triggers, but they are prohibited by my hosting company.
CREATE TABLE IF NOT EXISTS `car` (
`car_id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`description` text,
`order` int(11) DEFAULT NULL,
PRIMARY KEY (`car_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
The reason I would like this to be handled by the database is that I use a complicated system of database handling in my program (using generic types and reflection to make everything automated), and this feature would make my life easier.
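One common trigger-free workaround, sketched here as an assumption rather than taken from the original thread: insert the row, then copy the generated id into the order column in a second statement. MySQL's LAST_INSERT_ID() returns the id generated by the current connection, and `order` must stay backticked because it is a reserved word.
START TRANSACTION;
INSERT INTO `car` (`name`, `description`) VALUES ('Model T', 'First car'); -- sample values
UPDATE `car` SET `order` = `car_id` WHERE `car_id` = LAST_INSERT_ID();
COMMIT;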
Versioning is straightforward with an entry such as a page that has a name. I would have a table page_version that stores every previous value of the row every time page is updated, whether using triggers or application logic.
CREATE TABLE `page` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `page` (`id`, `name`)
VALUES
(1,'Foo');
CREATE TABLE `page_version` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`page_id` int(10) unsigned NOT NULL,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `page_id` (`page_id`),
CONSTRAINT `page_version_ibfk_1` FOREIGN KEY (`page_id`) REFERENCES `page` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `page_version` (`id`, `page_id`, `name`, `entry_timestamp`)
VALUES
(1,1,'foo','2013-09-19 20:27:06');
In this example, I know that page.name was changed from "foo" to "Foo". If it were changed again (e.g., to "Bar"), then the value "Foo" would be added to page_version and the original row's page.name updated to "Bar".
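For the trigger variant mentioned above, a minimal sketch (assuming trigger privileges; the application-logic variant would simply run the same INSERT before its UPDATE):
DELIMITER //
CREATE TRIGGER `page_history` BEFORE UPDATE ON `page`
FOR EACH ROW
BEGIN
  -- record the value that is about to be replaced
  IF NEW.name <> OLD.name THEN
    INSERT INTO page_version (page_id, name) VALUES (OLD.id, OLD.name);
  END IF;
END//
DELIMITER ;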
However, how do you track versions of dependent values that might have a one-to-many relation with the entry? E.g. suppose the above schema were supplemented by adding category and page_category tables.
CREATE TABLE `category` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `category` (`id`, `name`)
VALUES
(1,'One'),
(2,'Two');
CREATE TABLE `page_category` (
`page_id` int(10) unsigned NOT NULL,
`category_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`page_id`,`category_id`),
KEY `category_id` (`category_id`),
CONSTRAINT `page_category_ibfk_2` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE,
CONSTRAINT `page_category_ibfk_1` FOREIGN KEY (`page_id`) REFERENCES `page` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
How do you capture the change (as part of the same change in which "foo" became "Foo") if a user has added a new category ("Two") to the page?
You use the term "version" but, as jeremycole commented, you are not clear on the reason for needing it.
If it is simply to provide a history of changes to the data over time, then an additional table for each table in your database will suffice; it appears this is what you already have with your page_version table.
These history tables will allow you to retrieve the state of your "object" from the database at a point in time, which is why I use the term "history". Calling it a version implies there is a number, or some other identifier, applied to the collection of data that defines the "object". You do not appear to have this in your table structure.
Rebuilding the relational data from a point in time will involve writing your normal queries joining the appropriate tables, but with the addition of matching the row of data at, or before, the point in time you are interested in. While this can be done, it becomes unwieldy when the number of tables in the join increases.
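For example, with the page_version table above, the name of page 1 at a given moment is held by the earliest history row written after that moment, falling back to the current row if the name has not changed since. A sketch, with an illustrative timestamp:
SELECT COALESCE(
  (SELECT v.name
   FROM page_version v
   WHERE v.page_id = 1
     AND v.entry_timestamp > '2013-09-19 00:00:00'
   ORDER BY v.entry_timestamp
   LIMIT 1),
  (SELECT p.name FROM page p WHERE p.id = 1)
) AS name_at_that_moment;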
Another way is to create a version of your object in the application and store it in the database. Use, for example, XML or JSON to encode your object and put the whole thing, as a string, in a table along with the version number and date stamp.
This makes it easy to retrieve an entire object given a version number, although it requires the application to construct the in-memory object from the XML/JSON data before it can be written to the database again (in the event you want to revert to a previous version). This shouldn't be too hard, though, since you're already reading/writing objects to your relational tables; you would just need to add the object streaming code.
Without knowing more about your reasons for storing the history/version it's hard to recommend one method over the other. I use the simple history table, managed by triggers, to record changes to the data in our system but then we don't have the requirement to "roll back" to a previous version. We use the history for the odd occasion someone goofs and we need to undo a single edit, and as a "blame" trail by recording the username of the person who made the change :)
I recommend you read Developing Time-Oriented Database Applications in SQL by Richard Snodgrass (links to the PDF are in the first paragraph of the "Books" section). It's not a short book but it has helped me immensely.
If you want to track versions in the same place (say, using the same entry_timestamp field) you can achieve that with a trigger on the page_category table, as sketched below.
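A sketch of what that could look like; the history table and trigger names are illustrative, not from the original answer. Each added or removed page/category link gets its own timestamped row:
CREATE TABLE `page_category_version` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `page_id` int(10) unsigned NOT NULL,
  `category_id` int(10) unsigned NOT NULL,
  `action` enum('added','removed') NOT NULL,
  `entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TRIGGER `page_category_added` AFTER INSERT ON `page_category`
FOR EACH ROW
  INSERT INTO page_category_version (page_id, category_id, `action`)
  VALUES (NEW.page_id, NEW.category_id, 'added');

CREATE TRIGGER `page_category_removed` AFTER DELETE ON `page_category`
FOR EACH ROW
  INSERT INTO page_category_version (page_id, category_id, `action`)
  VALUES (OLD.page_id, OLD.category_id, 'removed');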
I am wondering if there is a better way to design some MySQL tables than what I have been using in this project. I have a series of numbers which represent a specific time; the number 101 would represent Jan 12, 2012, for example. A number doesn't only represent a time, but time is the most basic part of that information. So I created a lexicon table which holds all the numbers we use and details such as the time and the meaning of each number. I have another table, one per customer, in which, whenever they make a purchase, I check off that the purchase is eligible for a specific time. But the table where I check off each purchase and the lexicon table are not linked. I am wondering if there is a better way, maybe a way to have an SQL statement take all the data from the lexicon table and turn it into columns, while the rows consist of a customer ID and a true/false selector.
Table structure. This is the customer purchases (T/F) table:
CREATE TABLE `group1` (
`CustID` INT(11) NOT NULL, -- implied by the primary key below
`100` TINYINT(4) NULL DEFAULT '0',
`101` TINYINT(4) NULL DEFAULT '0',
`102` TINYINT(4) NULL DEFAULT '0',
-- ... this goes on for 35 columns in each table
PRIMARY KEY (`CustID`)
)
This is the lexicon table:
CREATE TABLE `lexicon` (
`Number` INT(3) NOT NULL DEFAULT '0',
`Date` DATETIME NULL DEFAULT NULL,
`OtherPurtinantInfo` TEXT NULL -- ...etc. (column type assumed for illustration)
)
So I guess, instead of making groups of numbers every season for the customers, I would prefer to be able to use the updated lexicon table to automatically generate a table. My only concern is that we have many, many numbers, so combining everything would make a very large table; but perhaps that could be limited into groups automatically as well, so that the table is not overwhelming.
I am not sure if I am being clear enough so feel free to comment on things that need to be clarified.
Here's a normalized ERD, based on what I understand your business requirements to be:
The classifieds run on certain dates, and a given advertisement can be run for more than one classifieds date.
The SQL statements to make the tables:
CREATE TABLE IF NOT EXISTS `classified_ads` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `classified_dates` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATETIME NOT NULL,
`info` TEXT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `classified_ad_dates` (
`classified_ad_id` INT UNSIGNED NOT NULL,
`classified_date_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`classified_ad_id`, `classified_date_id`),
INDEX `fk_classified_ad_dates_classified_ads1` (`classified_ad_id` ASC),
INDEX `fk_classified_ad_dates_classified_dates1` (`classified_date_id` ASC),
CONSTRAINT `fk_classified_ad_dates_classified_ads1`
FOREIGN KEY (`classified_ad_id`)
REFERENCES `classified_ads` (`id`)
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_classified_ad_dates_classified_dates1`
FOREIGN KEY (`classified_date_id`)
REFERENCES `classified_dates` (`id`)
ON DELETE CASCADE
ON UPDATE CASCADE
);
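Example usage, as a sketch with an illustrative id: listing every date a given ad runs is a single join through the junction table, and swapping the WHERE column gives the reverse lookup (every ad running on a given date).
SELECT d.`date`, d.`info`
FROM `classified_dates` d
JOIN `classified_ad_dates` ad ON ad.`classified_date_id` = d.`id`
WHERE ad.`classified_ad_id` = 1;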