We have a set of users:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
Each user can have one or many domains, such as:
CREATE TABLE `domains` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`domain` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `domain` (`domain`),
CONSTRAINT `domains_user_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
And we have a table that has some sort of data; for this example it doesn't really matter what it contains:
CREATE TABLE `some_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` TEXT NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
We want certain elements of some_data to be accessible to only certain users or only certain domains (whitelist case).
In other cases we want elements of some_data to be accessible to everyone BUT certain users or certain domains (blacklist case).
Ideally, we would like to retrieve, in a single query, the list of domains that a given element of some_data is accessible to, and also do the reverse (list all the data a given domain has access to).
Our approach so far is a single table:
CREATE TABLE `access_rules` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rule_type` enum('blacklist','whitelist') NOT NULL,
`some_data_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`domain_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `access_rules_some_data_id_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;
The problem, however, is that we need to query the DB twice (to figure out whether the given data entry operates a blacklist or a whitelist [whitelist has higher priority]). (EDIT: it can be done in a single query; see the sketch at the end of this question.)
Also, since domain_id is nullable (to allow blacklisting/whitelisting an entire user), joining is not easy.
The API that will use this schema is currently hit 4-5k times per second, so performance matters.
The users table is relatively small (50k+ rows) and the domains table is about 1.5 million entries. some_data is also relatively small (sub 100k rows)
EDIT: the question is more around semantics and best practices. With the above structure I'm confident we can make it work, but the schema "feels wrong" and I'm wondering if there is a better way.
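For reference, the single query mentioned in the edit could look roughly like this (a sketch only, written against the draft access_rules schema above; :data_id is a placeholder for the some_data row being checked):
-- List the domains that may access a given some_data row.
-- If the row has any whitelist rules, only whitelisted domains/users match;
-- otherwise every domain matches except blacklisted ones.
SELECT d.domain
FROM domains d
WHERE (EXISTS (SELECT 1 FROM access_rules r
               WHERE r.some_data_id = :data_id AND r.rule_type = 'whitelist')
       AND EXISTS (SELECT 1 FROM access_rules r
                   WHERE r.some_data_id = :data_id AND r.rule_type = 'whitelist'
                     AND (r.domain_id = d.id
                          OR (r.domain_id IS NULL AND r.user_id = d.user_id))))
   OR (NOT EXISTS (SELECT 1 FROM access_rules r
                   WHERE r.some_data_id = :data_id AND r.rule_type = 'whitelist')
       AND NOT EXISTS (SELECT 1 FROM access_rules r
                       WHERE r.some_data_id = :data_id AND r.rule_type = 'blacklist'
                         AND (r.domain_id = d.id
                              OR (r.domain_id IS NULL AND r.user_id = d.user_id))));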
There are two issues to consider: normalization and management.
To normalize traditionally, you would need four tables.
Set up the three master tables: USER, DOMAIN, OtherDATA.
Set up a child table with User_Id, Domain_Id, OtherDATA_Id, PermissionLevel.
This provides the least amount of repeated data. It also makes management at the user-domain level easier. You could also add a default whitelist/blacklist field to the user and domain tables. This way a script could auto-populate the child table and then a manager could just go in and adjust the one value needed.
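A minimal sketch of that child table, assuming the master tables from the question (the table, column and constraint names are illustrative, not prescriptive):
CREATE TABLE `access_permissions` (
`user_id` int(11) NOT NULL,
`domain_id` int(11) NOT NULL,
`some_data_id` int(11) NOT NULL,
`permission_level` enum('whitelist','blacklist') NOT NULL,
PRIMARY KEY (`user_id`, `domain_id`, `some_data_id`),
CONSTRAINT `ap_user_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
CONSTRAINT `ap_domain_fk` FOREIGN KEY (`domain_id`) REFERENCES `domains` (`id`),
CONSTRAINT `ap_data_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;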
If you had two different tables, one for the whitelist and one for the blacklist, you could get a user or domain on both lists by accident. Actually, it would be four tables: two for users and two for domains. Management would be more complex.
With the following type of table design:
http://www.martinfowler.com/eaaCatalog/classTableInheritance.html
Let's use the following schema for the sake of example:
CREATE TABLE `fruit` (
`id` int(10) UNSIGNED NOT NULL,
`type` tinyint(3) UNSIGNED NOT NULL,
`purchase_date` DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `apple` (
`fruit_id` int(10) UNSIGNED NOT NULL,
`is_macintosh` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `orange` (
`fruit_id` int(10) UNSIGNED NOT NULL,
`peel_thickness_mm` decimal(4,2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `fruit`
ADD PRIMARY KEY (`id`);
ALTER TABLE `apple`
ADD KEY `fruit_id` (`fruit_id`);
ALTER TABLE `orange`
ADD KEY `fruit_id` (`fruit_id`);
ALTER TABLE `fruit`
MODIFY `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE `apple`
ADD CONSTRAINT `apple_ibfk_1` FOREIGN KEY (`fruit_id`) REFERENCES `fruit` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `orange`
ADD CONSTRAINT `orange_ibfk_1` FOREIGN KEY (`fruit_id`) REFERENCES `fruit` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
Here, 'apples' and 'oranges' are types of 'fruit', and have unique properties, which is why they've been segmented out into their own tables.
The question is, from a performance standpoint, when performing a SELECT * FROM fruit query, would it be better to:
a) perform a LEFT OUTER JOIN on each typed table, i.e. apple and orange (in practice, we may be dealing with dozens of fruit types; a sketch of this option follows below), or
b) skip the joins and perform a separate query later for each fruit row in the application logic, so for a fruit row of type apple, SELECT * FROM apple WHERE fruit_id=...?
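For concreteness, option (a) could look something like this against the example schema (a sketch; with dozens of fruit types the join list grows accordingly):
-- One round trip: every fruit row, with each type-specific column
-- populated for rows of that type and NULL for all others.
SELECT f.*,
a.is_macintosh,
o.peel_thickness_mm
FROM fruit f
LEFT OUTER JOIN apple a ON a.fruit_id = f.id
LEFT OUTER JOIN orange o ON o.fruit_id = f.id;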
EDIT:
Regarding the specific scenario, I won't go into excruciating detail, but the actual application here is a notification system which generates notifications when certain events occur. There is a different notification type for each event type, and each notification type stores properties unique to that event type. This is on a site with a lot of user activity, so there will eventually be millions of notification rows.
Have one table with columns for the 'common' attributes (e.g., type='apple', purchase_date=...), plus one TEXT column with JSON containing any other attributes (e.g., subtype='macintosh') appropriate to the row in question.
Or it might make more sense to have subtype as a common attribute, since many fruits have one (think 'navel').
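A minimal sketch of that single-table layout, assuming a MySQL version without a native JSON type (hence a plain TEXT column; the attributes column name is illustrative):
CREATE TABLE `fruit` (
`id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`type` tinyint(3) UNSIGNED NOT NULL,
`purchase_date` DATETIME NOT NULL,
`attributes` TEXT NOT NULL, -- e.g. '{"is_macintosh": true}'
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;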
What will you be doing with the "inheritance"? It's great in the textbook, but it sucks in a database. SQL predates inheritance, object orientation, etc.
I am new to databases. I am going to share two database table designs here; I just want to know which one is the best design and why.
In the first one I have created a users table, a subjects table and a user_subjects table.
In the users table I save user information, and in subjects I save the subjects. In user_subjects I save the user id and subject id.
CREATE TABLE IF NOT EXISTS `subjects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
-- --------------------------------------------------------
--
-- Table structure for table `users`
--
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
-- --------------------------------------------------------
--
-- Table structure for table `user_subjects`
--
CREATE TABLE IF NOT EXISTS `user_subjects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`subject_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
Second one:
CREATE TABLE IF NOT EXISTS `subjects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
CREATE TABLE IF NOT EXISTS `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`subject_name` varchar(2000) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
I save the user and its subjects in the users table, separated by commas (,), and do not create another table to save the user and subject ids.
I think the second one is best because we do not need to save data in a third table. Please tell me which one is best and will last longest into the future.
The first version is much, much better. Here are some reasons why you do not want to use comma delimited strings:
SQL does not have particularly good string functions -- the basics, but not much more.
When you store values in a delimited string, the database cannot validate the data. With a separate table you can use foreign key constraints.
Queries on comma-delimited columns cannot make use of standard indexes (although it might be possible to use full text indexes).
With a comma-delimited string, the database cannot validate that the subjects are unique.
The first method uses a junction table and it is the better way to implement this logic in a relational database.
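For instance, the junction table from the first design could enforce those guarantees with foreign keys and a unique constraint (a sketch; the key and constraint names are illustrative):
CREATE TABLE IF NOT EXISTS `user_subjects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`subject_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_subject` (`user_id`, `subject_id`), -- no duplicate subjects per user
CONSTRAINT `us_user_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
CONSTRAINT `us_subject_fk` FOREIGN KEY (`subject_id`) REFERENCES `subjects` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;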
It's OK to use the second way IF:
1) subjects have only one value of importance (name)
2) that value uniquely identifies subjects (i.e. no two subjects have the same name), OR there is no need to make a distinction between two subjects with the same name
Generally speaking, the first way is better because if you suddenly decide to give subjects a new attribute (for example, age) you don't have to redo your whole table structure.
The second solution is not very good anyway, since you cannot use joins or indexes.
Which solution would be the best, depends on the kind of relationship between users and subjects.
If each subject belongs to exactly one user and each user may have an arbitrary number of subjects, which means you have a one-to-many relationship, then you should add user_id to the table subject.
If any subject can belong to more than one user and each user can have many subjects, you should use your first solution with a third mapping table (that would be a many-to-many relationship).
In both cases you can express the following queries very easily and cleanly in SQL using a simple join (one of them is sketched after this list):
which subjects belong to a given user
which users have subjects with a name containing a certain expression
which is the user (or, in the second case, the users) of a given subject
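For example, with the junction table from the first design, the first of those queries is a single join (a sketch; :user_id is a placeholder):
SELECT s.name
FROM subjects s
JOIN user_subjects us ON us.subject_id = s.id
WHERE us.user_id = :user_id;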
Versioning is straightforward for an entry such as a page that has a name. I would have a table page_version that stores every previous value of the row every time page is updated, whether using triggers or application logic.
CREATE TABLE `page` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `page` (`id`, `name`)
VALUES
(1,'Foo');
CREATE TABLE `page_version` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`page_id` int(10) unsigned NOT NULL,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `page_id` (`page_id`),
CONSTRAINT `page_version_ibfk_1` FOREIGN KEY (`page_id`) REFERENCES `page` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `page_version` (`id`, `page_id`, `name`, `entry_timestamp`)
VALUES
(1,1,'foo','2013-09-19 20:27:06');
In this example, I know that page.name was changed from "foo" to "Foo". If it had been changed again (e.g., to "Bar"), then the "Foo" value would be added to page_version and the original row's page.name updated to "Bar".
However, how do we track versions of dependent values that might have a one-to-many relation with the entry? E.g., suppose the above schema were supplemented by adding category and page_category tables.
CREATE TABLE `category` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `category` (`id`, `name`)
VALUES
(1,'One'),
(2,'Two');
CREATE TABLE `page_category` (
`page_id` int(10) unsigned NOT NULL,
`category_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`page_id`,`category_id`),
KEY `category_id` (`category_id`),
CONSTRAINT `page_category_ibfk_2` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE,
CONSTRAINT `page_category_ibfk_1` FOREIGN KEY (`page_id`) REFERENCES `page` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
How do we capture the change (in the same change in which "foo" became "Foo") if the user has added a new category ("Two") to the page?
You use the term "version" but, as jeremycole commented, you are not clear on the reason for needing it.
If it is simply to provide a history of changes to the data over time, then an additional table for each table in your database will suffice; it appears this is what you already have with your page_version table.
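For the one-to-many case in the question, the same pattern would add something like a page_category_version table (a sketch; recording an 'added'/'removed' action is one possible convention, not the only one):
CREATE TABLE `page_category_version` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`page_id` int(10) unsigned NOT NULL,
`category_id` int(10) unsigned NOT NULL,
`action` enum('added','removed') NOT NULL,
`entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `page_id` (`page_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;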
These history tables will allow you to retrieve the state of your "object" from the database at a point in time, which is why I use the term "history". Calling it a version implies there is a number, or some other identifier, applied to the collection of data that defines the "object". You do not appear to have this in your table structure.
Rebuilding the relational data from a point in time will involve writing your normal queries joining the appropriate tables, but with the addition of matching the row of data at, or before, the point in time you are interested in. While this can be done, it becomes unwieldy when the number of tables in the join increases.
Another way is to create a version of your object in the application and store it in the database. Use, for example, XML or JSON to encode your object and put the whole thing, as a string, in a table along with the version number and date stamp.
This makes it easy to retrieve an entire object given a version number, although it requires the application to construct the in-memory object from the XML/JSON data before it can be written to the database again (in the event you want to revert to a previous version). This shouldn't be too hard, though, since you're already reading/writing objects to your relational tables; you would just need to add the object streaming code.
Without knowing more about your reasons for storing the history/version it's hard to recommend one method over the other. I use the simple history table, managed by triggers, to record changes to the data in our system but then we don't have the requirement to "roll back" to a previous version. We use the history for the odd occasion someone goofs and we need to undo a single edit, and as a "blame" trail by recording the username of the person who made the change :)
I recommend you read Developing Time-Oriented Database Applications in SQL by Richard Snodgrass (links to the PDF are in the first paragraph of the "Books" section). It's not a short book but it has helped me immensely.
If you want to track the version in the same place (say, the same entry_timestamp field), you can achieve that with a trigger on the page_category table.
See more here; there is an example at the bottom of that page.
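As a sketch of the trigger approach (assuming a history table such as a page_category_version table with page_id, category_id and action columns; the table and trigger names are illustrative):
DELIMITER $$
CREATE TRIGGER `page_category_after_insert`
AFTER INSERT ON `page_category`
FOR EACH ROW
BEGIN
-- Record the addition; a matching AFTER DELETE trigger would record removals.
INSERT INTO `page_category_version` (`page_id`, `category_id`, `action`)
VALUES (NEW.page_id, NEW.category_id, 'added');
END$$
DELIMITER ;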
I'm new to designing DBs and MySQL,
so to keep it short, the information I have is: AppId, Last Name, First Name, Gender, Age, DoB, Relation, School Name, School Address, School Phone Number, Teachers, Counselors, and whether they are in childcare.
(AppID is the application ID being submitted.)
Basically, this is how I see everything being determined and what tables should be created to be most normalized:
AppID -> ChildID
ChildID -> Last Name, First Name, Gender, Age, DoB, Relation, School Name, Grade, Child Care
School Name -> School Address, School Phone Number, TeacherID, CounselorID
TeacherID -> First Name, Last Name, Course
CounselorID -> First Name, Last Name, Counselor Type
However, I'm not sure if attempting to completely normalize this is a good idea, since this is a rather small group I'm helping, which may cause joining the tables to take longer than a regular grouped lookup and may take up more space.
Another concern is that MySQL only allows one AUTO_INCREMENT column per table. I could emulate a similar thing in a query, but would rather not have to if possible. The two auto-increments would be TeacherID and CounselorID.
So any input would be very much appreciated.
Edit: here's the basic structure. I will also add modification attributes later; I have dropped courses for now. Thank you.
CREATE TABLE `Client_Child_Info` (
`FirstName` varchar(15) NOT NULL,
`LastName` varchar(15) NOT NULL,
`Gender` tinyint(1) NOT NULL,
`Age` tinyint(4) NOT NULL,
`DoB` date NOT NULL,
`Relation` varchar(15) NOT NULL,
`Grade` varchar(3) NOT NULL default 'NA',
`ChildCare` tinyint(1) NOT NULL default '0',
`ChildID` int(11) NOT NULL auto_increment,
PRIMARY KEY (`ChildID`),
KEY `Age` (`Age`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Basic Child Information' AUTO_INCREMENT=1 ;
CREATE TABLE `Client_Child_Schoolinfo` (
`SchoolID` int(11) NOT NULL,
`SchoolName` varchar(50) NOT NULL,
`SchoolAddress` varchar(50) default NULL,
`SchoolPhone` varchar(15) default NULL,
PRIMARY KEY (`SchoolID`),
KEY `SchoolName` (`SchoolName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='School Information for a given ID ';
CREATE TABLE `Client_child_teacher` (
`TeacherID` int(11) NOT NULL,
`FirstName` varchar(15) NOT NULL,
`LastName` varchar(15) NOT NULL,
`Guidance` tinyint(1) NOT NULL COMMENT 'determines if the person is a guidance counselor or teacher',
PRIMARY KEY (`TeacherID`),
KEY `Guidance` (`Guidance`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Teacher information';
CREATE TABLE `Client_RTchild` (
`AppID` int(11) NOT NULL,
`ChildID` int(11) NOT NULL auto_increment,
PRIMARY KEY (`ChildID`),
KEY `AppID` (`AppID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Reference Table Applicant to Client' AUTO_INCREMENT=1 ;
CREATE TABLE `Client_RTteacher` (
`SchoolID` int(11) NOT NULL,
`TeacherID` int(11) NOT NULL,
PRIMARY KEY (`TeacherID`),
KEY `SchoolID` (`SchoolID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Reference Table for teacher to school ';
CREATE TABLE `Client_RTschool` (
`ChildID` int(11) NOT NULL,
`SchoolID` int(11) NOT NULL auto_increment,
PRIMARY KEY (`SchoolID`),
KEY `ChildID` (`ChildID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Reference Table Child to SchoolID it is attending' AUTO_INCREMENT=1 ;
ALTER TABLE `Client_Child_Info`
ADD CONSTRAINT `Client_Child_Info_ibfk_1` FOREIGN KEY (`ChildID`) REFERENCES `Client_RTchild` (`ChildID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_Child_Schoolinfo`
ADD CONSTRAINT `Client_Child_Schoolinfo_ibfk_1` FOREIGN KEY (`SchoolID`) REFERENCES `Client_RTschool` (`SchoolID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_child_teacher`
ADD CONSTRAINT `Client_child_teacher_ibfk_1` FOREIGN KEY (`TeacherID`) REFERENCES `Client_RTteacher` (`TeacherID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_RTschool`
ADD CONSTRAINT `Client_RTschool_ibfk_1` FOREIGN KEY (`ChildID`) REFERENCES `Client_RTchild` (`ChildID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_RTteacher`
ADD CONSTRAINT `Client_RTteacher_ibfk_1` FOREIGN KEY (`SchoolID`) REFERENCES `Client_RTschool` (`SchoolID`) ON DELETE CASCADE ON UPDATE CASCADE;
...An old question, and the comments are heading in one direction.
Let's put some rationale behind the statements.
The baseline for sharing table-like structured information is MS Excel, or office software from any other vendor. Unfortunately, those client applications are not good at near-simultaneous editing by multiple authors, but they are good enough to distribute information from ONE author to multiple readers. So if the working mode is the latter, there is no point in putting any effort into transferring the data and the data concept to a database system, since MS Excel is literally common sense and the distribution/training effort is low.
However, in case near-simultaneous editing by multiple authors is needed and you can afford server hardware and application development, a database on a server would be the best place to store the data.
OK, given that the decision for a database has been taken:
Whether to go for normalization or denormalization is a design decision. Factors like maintenance, setup effort and hardware support play a role in design decisions. Software function design depends on the data model design, and the software function design must support the business logic, which in your case is dealing with applications. Unfortunately, denormalization often builds on assumptions about the business logic (on top of the basic assumptions that you also have in place for a normalized model) that in future developments are often revised, with the ugly consequence that software functions must then also be changed. Coming from there, normalization has some benefits if the software does not have a time-limited scope.
I am wondering if there is a better way to design some MySQL tables than what I have been using in this project. I have a series of numbers which represent a specific time; for example, the number 101 would represent Jan 12, 2012. It doesn't only represent time, but that is the basis of that information. So I created a lexicon table which has all the numbers we use and details such as the time and the meaning of each number. I have another table, per customer, in which, whenever they make a purchase, I check off that the purchase is eligible for a specific time. But the table where I check off each purchase and the lexicon table are not linked. I am wondering if there is a better way, maybe a way to have an SQL statement take all the data from the lexicon table and turn it into columns, while the rows consist of a customer ID and a true/false selector.
table structure
THIS IS THE CUSTOMER PURCHASES (T/F) TABLE
CREATE TABLE `group1` (
`CustID` INT(11) NOT NULL,
`100` TINYINT(4) NULL DEFAULT '0',
`101` TINYINT(4) NULL DEFAULT '0',
`102` TINYINT(4) NULL DEFAULT '0',
... this goes on 35 times for each table
PRIMARY KEY (`CustID`)
)
THIS IS THE LEXICON TABLE
CREATE TABLE `lexicon` (
`Number` INT(3) NOT NULL DEFAULT '0',
`Date` DATETIME NULL DEFAULT NULL,
`OtherPurtinantInfo` .... etc
)
So I guess, instead of making groups of numbers every season for the customers, I would prefer being able to use the updated lexicon table to automatically generate a table. My only concern is that we have many, many numbers, so that would make a very large table all combined together; but perhaps that could be limited into groups automatically as well, so that it is not an overwhelming table.
I am not sure if I am being clear enough, so feel free to comment on things that need to be clarified.
Here's a normalized ERD, based on what I understand your business requirements to be:
The classifieds run on certain dates, and a given advertisement can be run for more than one classifieds date.
The SQL statements to make the tables:
CREATE TABLE IF NOT EXISTS `classified_ads` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `classified_dates` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`date` DATETIME NOT NULL,
`info` TEXT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `classified_ad_dates` (
`classified_ad_id` INT UNSIGNED NOT NULL,
`classified_date_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`classified_ad_id`, `classified_date_id`),
INDEX `fk_classified_ad_dates_classified_ads1` (`classified_ad_id` ASC),
INDEX `fk_classified_ad_dates_classified_dates1` (`classified_date_id` ASC),
CONSTRAINT `fk_classified_ad_dates_classified_ads1`
FOREIGN KEY (`classified_ad_id`)
REFERENCES `classified_ads` (`id`)
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_classified_ad_dates_classified_dates1`
FOREIGN KEY (`classified_date_id`)
REFERENCES `classified_dates` (`id`)
ON DELETE CASCADE
ON UPDATE CASCADE
);
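With this structure, either direction of lookup is a single join. For example, listing all ads that run on a given classifieds date (a sketch; :date_id is a placeholder):
SELECT a.id
FROM classified_ads a
JOIN classified_ad_dates ad ON ad.classified_ad_id = a.id
WHERE ad.classified_date_id = :date_id;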