As a simplified example, imagine that I'm selling widgets. I sell them nationwide (in both the U.S. and Canada) but there are some that can only be sold in certain areas (one or more U.S. states or Canadian provinces).
I'd like a good way to store this information, coupled with a fast way to query for the widgets that are available to a given user. "U.S., 50 states and D.C." is the most common value, so I'd rather not insert 51 rows.
MySQL doesn't support bitmap indexes, so that's ruled out.
Here are some combinations:
U.S. 50 states and D.C.
U.S. 50 states, D.C., Canada, but not Quebec.
U.S. 48 contiguous states and D.C.
U.S., D.C., but not Colorado
U.S., D.C., and territories (Puerto Rico, etc).
My user will have given me one value for their state/province and country.
Can you suggest a schema that provides good storage and fast matching?
Thanks!
You should build predefined sets of values and assign one of these sets to each item.
Given a user's value, you first retrieve the matching sets and then the matching items.
CREATE TABLE `valuesets` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `valueset_items` (
`valueset_id` int(11) unsigned NOT NULL,
`value` varchar(20) NOT NULL DEFAULT '',
PRIMARY KEY (`valueset_id`,`value`),
CONSTRAINT `fk_valueset_items_valueset` FOREIGN KEY (`valueset_id`) REFERENCES `valuesets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) NOT NULL DEFAULT '',
`valueset_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `fk_items_valueset` (`valueset_id`),
CONSTRAINT `fk_items_valueset` FOREIGN KEY (`valueset_id`) REFERENCES `valuesets` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
To select all items matching a given value:
SELECT *
FROM items
WHERE valueset_id IN (SELECT valueset_id
                      FROM valueset_items
                      WHERE `value` = 'A');
This is a use case for the MySQL SET type, assuming that you can keep your list down to 64 members (or use multiple sets, split on other conditions).
I thought I would expand on my answer, because I think some people just don't understand the power of the set. Example table:
CREATE TABLE `Test` (
`setid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`setname` varchar(64) NOT NULL,
`setstate` set('AK','AL','AR','AZ','CA','CO','CT','DC','DE','FL','GA','HI','IA','ID','IL','IN','KS','KY','LA','MA','MD','ME','MI','MN','MO','MS','MT','NC','ND','NE','NH','NJ','NM','NV','NY','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VA','VT','WA','WI','WV','WY') NOT NULL,
PRIMARY KEY (`setid`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
insert into `Test` values('1','test','AZ,CA,NJ,NM,NY,VA,VT');
Note that we use a single SET field for the states. More complex uses will likely require multiple sets, but the slightly wider row (at most one 64-bit qword per SET column) may be cheaper than adding a large number of extra join operations against a lookup table that could easily reach a huge number of records on its own.
Below are 3 (functionally) equivalent pulls. Note that the bitmask is very much the fastest way to pull this data:
SELECT * FROM Test WHERE setstate & 8;
For test #1, we use 8 as the bitmask, because AZ is item #4 in our list and the fourth member's bit value is 1 << 3 = 8 (binary 1000; note that the SQL literal must be written in decimal, so writing 1000 would test the wrong bits). This is, by far, the fastest method... and there are few ways to store this data that will give you faster result potential.
SELECT * FROM Test WHERE setstate LIKE '%AZ%';
This method cannot use an index (the leading wildcard forces a scan), and the fuzzy match is also fragile if one member's name is a substring of another's.
SELECT * FROM Test WHERE FIND_IN_SET('AZ',setstate);
This method will be faster than the fuzzy match, but its nature pretty much requires a full scan, and most real-world uses will involve a temporary table.
I'm running MySQL 5.5 and found behaviour I didn't know of before.
Given this create:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UQ` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
With these inserts:
insert into test (name) values ('b');
insert into test (name) values ('a');
And this select:
select * from test;
MySQL does something I wasn't aware of:
2 a
1 b
It sorts automatically.
Given a table with one extra, non-unique column:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
`other_column` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UQ` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And the same inserts (see above), the select (see above) gives this result:
1 b NULL
2 a NULL
Which is kind of expected.
Where is the behaviour of the first query (SQL Fiddle) documented? I'd like to see more of these peculiar things.
MySQL does not sort result sets automatically. The order of a result set is indeterminate unless the query specifies an ORDER BY clause.
You should never rely on any "implicit" ordering just because you observe it in one query (or a hundred). In fact, without an ORDER BY, the same query can return its rows in different orders on subsequent runs (although I'll admit that this regularly occurs in other databases and is less likely in MySQL, it is still allowed). What you are most likely seeing in the first case is the optimizer answering the query from the UNIQUE index on name, which covers both columns of the small table, so the rows come back in index order; once other_column is added, that index no longer covers the query and InnoDB scans the primary key instead.
Instead, add the ORDER BY you actually need. Ordering by a primary key is remarkably efficient, so you don't have to worry about performance.
I'm in the process of designing a new database for a project at work. I want to create a table that stores Assignments for a digital classroom. Each Assignment can be one of 2 categories: "Individual" or "Group".
The first implementation that comes to mind is the following:
CREATE TABLE `assignments` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL,
`category` varchar(10) NOT NULL DEFAULT 'individual',
PRIMARY KEY (`id`),
KEY `category_index` (`category`(10))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
I would then select all assignments of a given category with:
SELECT title FROM assignments WHERE category = "individual"
However, because we've had performance issues in the past, I'm trying to optimize the design as much as possible. As such, I'm wondering whether storing the category as a VARCHAR is a good idea, considering the table will get quite large. Would an index on an INT column perform better than one on a VARCHAR?
Aside from just performance, I'm also curious what would be considered a good solution from a design-perspective. Suggestions?
Researching hierarchical data persistence led me to closure tables, and I pieced together this comment structure from the culmination of said research.
Queries for creating new nodes in the closure table were easy enough for me to grasp and fetching data for descendants via a JOIN on the closure table is simple enough.
However, I would like to expand on that and get results back sorted, and limited both in the number of parents/children returned and in the depth traversed (down through a depth of x).
I'm trying to keep things fast and efficient (I expect the comments table to get very large) by making use of foreign keys and indexes. I'm shooting for an all-in-one query that can do what I ask in the title, but I'm not opposed to breaking it up to increase speed/efficiency.
Current table structures:
CREATE TABLE `comments` (
`comment_id` int(11) UNSIGNED PRIMARY KEY,
`reply_to` int(11) UNSIGNED NOT NULL DEFAULT '0',
`user_id` int(11) UNSIGNED NOT NULL,
`comment_time` int(11) NOT NULL,
`comment` mediumtext NOT NULL,
FOREIGN KEY (`user_id`) REFERENCES users(`user_id`)
) ENGINE=InnoDB;
CREATE TABLE `comments_closure`(
`ancestor_id` int(11) UNSIGNED NOT NULL,
`descendant_id` int(11) UNSIGNED NOT NULL,
`length` tinyint(3) UNSIGNED NOT NULL DEFAULT '0',
PRIMARY KEY(`ancestor_id`, `descendant_id`),
KEY `tree_adl`(`ancestor_id`, `descendant_id`, `length`),
KEY `tree_dl`(`descendant_id`, `length`),
FOREIGN KEY (`ancestor_id`) REFERENCES comments(`comment_id`),
FOREIGN KEY (`descendant_id`) REFERENCES comments(`comment_id`)
) Engine=InnoDB
A clearer summary of what I'm trying to do: fetch 20 comments that share an ancestor_id, sorted by time, while also fetching each one's descendants up to 2 levels deeper (keeping these limited to a much smaller number, say 2 each), also sorted by time.
I'm not looking to always sort by time, however, and would also like to be able to fetch results sorted by comment_id. Is it possible to do all this in a single query? I'm not quite sure where to begin.
I'm trying to implement a way to track changes to a table named user and another named report_to. Below are their definitions:
CREATE TABLE `user`
(
`agent_eid` int(11) NOT NULL,
`agent_id` int(11) DEFAULT NULL,
`agent_pipkin_id` int(11) DEFAULT NULL,
`first_name` varchar(45) NOT NULL,
`last_name` varchar(45) NOT NULL,
`team_id` int(11) NOT NULL,
`hire_date` date NOT NULL,
`active` bit(1) NOT NULL,
`agent_id_req` bit(1) NOT NULL,
`agent_eid_req` bit(1) NOT NULL,
`agent_pipkin_req` bit(1) NOT NULL,
PRIMARY KEY (`agent_eid`),
UNIQUE KEY `agent_eid_UNIQUE` (`agent_eid`),
UNIQUE KEY `agent_id_UNIQUE` (`agent_id`),
UNIQUE KEY `agent_pipkin_id_UNIQUE` (`agent_pipkin_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `report_to`
(
`agent_eid` int(11) NOT NULL,
`report_to_eid` int(11) NOT NULL,
PRIMARY KEY (`agent_eid`),
UNIQUE KEY `agent_eid_UNIQUE` (`agent_eid`),
KEY `report_to_report_fk_idx` (`report_to_eid`),
CONSTRAINT `report_to_agent_fk` FOREIGN KEY (`agent_eid`) REFERENCES `user` (`agent_eid`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `report_to_report_fk` FOREIGN KEY (`report_to_eid`) REFERENCES `user` (`agent_eid`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8
What can change, and needs to be tracked, is user.team_id, user.active, and report_to.report_to_eid. What I currently have implemented is a table, populated via an update trigger on user, that tracks team changes. That table is defined as:
CREATE TABLE `user_team_changes`
(
`agent_id` int(11) NOT NULL,
`date_changed` date NOT NULL,
`old_team_id` int(11) NOT NULL,
`begin_date` date NOT NULL,
PRIMARY KEY (`agent_id`,`date_changed`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
This works fine for just tracking team changes. I'm able to use joins and a union to populate a history view that tracks that change over time for individual users. Complexity becomes an issue when I try to implement tracking for the other two change types.
I have thought about creating additional tables similar to the one tracking changes for teams, but I worry about performance hits due to the joins that will be required.
Another way I have considered is creating a table similar to a view that I have that details the current user state (it joins all necessary user data together from 4 tables), then insert a record on update with a valid until date field added. My concern with that is the amount of space this could take.
We will be using the user change history quite a bit as we will be running YTD, MTD, PMTD and time interval reports with it on an almost daily basis.
Out of the two options I am considering, which would be the best for my given situation?
The options you've presented:
1. using triggers to populate transaction-log tables.
2. adding a new table with an effective-date column to the schema and tracking changes by inserting new rows.
Either one of these will work. You can add logging triggers to other tables without causing any trouble.
What distinguishes these two choices? The first one is straightforward, once you get your triggers debugged.
The second choice seems to me like it will create denormalized, redundant data. That is never good, and I would opt not to do it. It is possible, with judicious combinations of views and effective-date columns, to create history tables that are also viewable as the present state of the system. To learn about this, look at Prof. R. T. Snodgrass's excellent book on developing time-oriented applications: http://www.cs.arizona.edu/~rts/publications.html If you have time to do an excellent engineering (over-engineering?) job on this project, you might consider this approach.
The data volume you've mentioned will not cause intractable performance problems on any modern server hardware platform. If you do get slowdowns on JOIN operations, it's almost certain that the addition of appropriate indexes will completely fix them, as long as you declare all your DATE, DATETIME, and TIMESTAMP fields NOT NULL. (NULL values can mess up indexing and searching).
Hope this helps.
I have a description of a skeletal muscle system; however, I do not know the best approach, as this system has several subsets which in turn have subsets of their own:
Skeletal Muscle System
  Position Of The Animal In Station
  Assessment Of Progress
  Valuation Of Trot
  Probing
    Tip Thoracic
    Region Escapolohumeral
    Elbow And Forearm
    Carpo And Fingers
    Pelvic Limb
    Pelvis
    Knee
    Hock
  Specific Tests
    Drawer Test
    Ortolani Test
  Other
Then I have a patient table:
Patient
  ID
  Name
  L_Name
Then I have a table that contains all systems (Skeletal Muscle System is part of that):
Systems
  Skeletal Muscle System
  Nervous System
  Urinary System
  Respiratory
So I am doing something like this:
DROP TABLE IF EXISTS `tbl_systems`;
CREATE TABLE `tbl_systems` (
id_system INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name_system VARCHAR(25)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
INSERT INTO `tbl_systems` VALUES (1,'Skeletal_Muscle');
INSERT INTO `tbl_systems` VALUES (2,'Nervous');
INSERT INTO `tbl_systems` VALUES (3,'Urinary');
INSERT INTO `tbl_systems` VALUES (4,'Respiratory');
DROP TABLE IF EXISTS `tbl_patient`;
CREATE TABLE `tbl_patient` (
id_patient INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name_patient VARCHAR(25) NOT NULL DEFAULT "not available",
l_name_patient VARCHAR(25) NOT NULL DEFAULT "not available"
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `tbl_patient` VALUES (1,'Joe', 'Doe');
DROP TABLE IF EXISTS `tbl_systems_patient`;
CREATE TABLE `tbl_systems_patient` (
id_patient_system INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
id_system INTEGER NOT NULL,
id_patient INTEGER NOT NULL,
FOREIGN KEY (id_system) REFERENCES `tbl_systems` (id_system),
FOREIGN KEY (id_patient) REFERENCES `tbl_patient` (id_patient)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
INSERT INTO `tbl_systems_patient` VALUES (1,1,1);
DROP TABLE IF EXISTS `tbl_Skeletal_Muscle`;
CREATE TABLE `tbl_Skeletal_Muscle` (
id_system INTEGER NOT NULL,
Position_In_Station VARCHAR(25) NOT NULL DEFAULT "not available",
Assessment_Of_Progress VARCHAR(25) NOT NULL DEFAULT "not available",
Valuation_Of_Trot VARCHAR(25) NOT NULL DEFAULT "not available",
Probing VARCHAR(25) NOT NULL DEFAULT "not available"
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
INSERT INTO `tbl_Skeletal_Muscle`
VALUES (1,'Normal','no progress','no change','failed');
How do I link tbl_Skeletal_Muscle with a patient (id_patient) and get the corresponding results, navigating through those tables?
Is the approach correct or are there better options?
How do I add the missing data of tbl_Skeletal_Muscle? Do I create other tables? How many?
Please take a look at the SQL Fiddle.
Your problem is that you are trying to mix patient-related information with information that describes your system.
Let's assume for now that the hierarchy within the skeletal muscle system is not terribly relevant to your database, but more a presentation issue. Likewise for the other systems.
Then you could create a table for each system, each table having a foreign key to a patient and containing all required columns, irrespective of the hierarchy. In that case the systems information would be contained in the table layout itself, and adding another system would require adding another table. A Systems table would not be needed at all. And indeed, at first glance it looks like the systems belong in the design (the schema) rather than in the data, because very few patients will differ in their systems ;-)
If you want the hierarchy of the skeletal muscle system to be expressed in that table, you could add a parent column, where e.g. Tip Thoracic's parent would be Probing. Still, I believe this is a bad idea, because this parent relationship would be the same for all patients and thus does not convey any patient-related information.
You could add a table which contains the hierarchy of systems and their components (Position Of The Animal In Station, ...). This table would then be more like a configuration table, one that describes your system (and not patients). You could, however, just as well capture this hierarchy in your application code or in another configuration file.