How to version relational data? (MySQL)

Versioning is straightforward for an entry such as a page that has a name. I would have a page_version table that stores the previous value of the row every time page is updated, whether via triggers or application logic.
CREATE TABLE `page` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `page` (`id`, `name`)
VALUES
(1,'Foo');
CREATE TABLE `page_version` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`page_id` int(10) unsigned NOT NULL,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `page_id` (`page_id`),
CONSTRAINT `page_version_ibfk_1` FOREIGN KEY (`page_id`) REFERENCES `page` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `page_version` (`id`, `page_id`, `name`, `entry_timestamp`)
VALUES
(1,1,'foo','2013-09-19 20:27:06');
In this example, I know that page.name was changed from "foo" to "Foo". If it had been changed again (e.g., to "Bar"), then the value "Foo" would be added to page_version and the original row's page.name updated to "Bar".
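For reference, a minimal sketch of the trigger variant (the trigger name is made up; application logic would achieve the same):
CREATE TRIGGER `page_bu` BEFORE UPDATE ON `page`
FOR EACH ROW
INSERT INTO page_version (page_id, name)
VALUES (OLD.id, OLD.name);
Note this logs a history row on every update, even if the name is unchanged.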
However, how do I track versions of dependent values that have a many-to-many relation with the entry, e.g. if the above schema were supplemented with the following category and page_category tables?
CREATE TABLE `category` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `category` (`id`, `name`)
VALUES
(1,'One'),
(2,'Two');
CREATE TABLE `page_category` (
`page_id` int(10) unsigned NOT NULL,
`category_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`page_id`,`category_id`),
KEY `category_id` (`category_id`),
CONSTRAINT `page_category_ibfk_2` FOREIGN KEY (`category_id`) REFERENCES `category` (`id`) ON DELETE CASCADE,
CONSTRAINT `page_category_ibfk_1` FOREIGN KEY (`page_id`) REFERENCES `page` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
How do I capture the change (in the same edit in which "foo" was changed to "Foo") if the user has added a new category ("Two") to the page?

You use the term "version" but, as jeremycole commented, you are not clear on the reason for needing it.
If it is simply to provide a history of changes to the data over time, then an additional table for each table in your database will suffice; it appears this is what you already have with your page_version table.
These history tables will allow you to retrieve the state of your "object" from the database at a point in time, which is why I use the term "history". Calling it a version implies there is a number, or some other identifier, applied to the collection of data that defines the "object". You do not appear to have this in your table structure.
Rebuilding the relational data from a point in time will involve writing your normal queries joining the appropriate tables, but with the addition of matching the row of data at, or before, the point in time you are interested in. While this can be done, it becomes unwieldy when the number of tables in the join increases.
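For example, assuming entry_timestamp records the moment the old value was replaced, the name page 1 had at a given time is the value on the first history row written after that time, falling back to the current row when there is none:
SET @t = '2013-09-19 12:00:00';
SELECT COALESCE(
(SELECT pv.name
FROM page_version pv
WHERE pv.page_id = 1
AND pv.entry_timestamp > @t
ORDER BY pv.entry_timestamp ASC
LIMIT 1),
(SELECT p.name FROM page p WHERE p.id = 1)
) AS name_at_t;
Repeat that pattern for every table in the join and you can see how quickly it becomes unwieldy.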
Another way is to create a version of your object in the application and store it in the database. Use, for example, XML or JSON to encode your object and put the whole thing, as a string, in a table along with the version number and date stamp.
This makes it easy to retrieve an entire object given a version number, although it requires the application to construct the in-memory object from the XML/JSON data before it can be written to the database again (in the event you want to revert to a previous version). This shouldn't be too hard, though: since you're already reading/writing objects to your relational tables, you would just need to add the object streaming code.
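A sketch of such a snapshot table, assuming JSON (all names here are illustrative):
CREATE TABLE `page_snapshot` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`page_id` int(10) unsigned NOT NULL,
`version` int(10) unsigned NOT NULL,
`body` text NOT NULL, -- the whole object encoded as JSON (or XML)
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `page_id_version` (`page_id`,`version`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;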
Without knowing more about your reasons for storing the history/version it's hard to recommend one method over the other. I use the simple history table, managed by triggers, to record changes to the data in our system but then we don't have the requirement to "roll back" to a previous version. We use the history for the odd occasion someone goofs and we need to undo a single edit, and as a "blame" trail by recording the username of the person who made the change :)
I recommend you read Developing Time-Oriented Database Applications in SQL by Richard Snodgrass (links to the PDF are in the first paragraph of the "Books" section). It's not a short book but it has helped me immensely.

If you want to track versions in the same place (say, using the same entry_timestamp field), you can achieve that with a trigger on the page_category table.
See more here; there is an example at the bottom of that page.
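A sketch of what that could look like, assuming a hypothetical page_category_version history table that records additions and removals:
CREATE TABLE `page_category_version` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`page_id` int(10) unsigned NOT NULL,
`category_id` int(10) unsigned NOT NULL,
`action` enum('added','removed') NOT NULL,
`entry_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `page_id` (`page_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TRIGGER `page_category_ai` AFTER INSERT ON `page_category`
FOR EACH ROW
INSERT INTO page_category_version (page_id, category_id, `action`)
VALUES (NEW.page_id, NEW.category_id, 'added');
CREATE TRIGGER `page_category_ad` AFTER DELETE ON `page_category`
FOR EACH ROW
INSERT INTO page_category_version (page_id, category_id, `action`)
VALUES (OLD.page_id, OLD.category_id, 'removed');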

Related

MySQL, Foreign Key constraint fails in SP, but not when executing manually

The task I am executing seems so simple, but this behaviour is very puzzling indeed.
I am using MySQL.
I have a table for uploaded files, containing ID, FileName, FileHash, UploadDate, UploadUser, etc.
Basically, these are XML files.
I also have a table containing the contents of the uploaded files: XML files get parsed when uploaded and their contents are written into this table.
This table has a reference, a foreign key, to the file table.
Here is the definition (shortened to the crucial points):
CREATE TABLE `tbl_xmlfiles` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Filename` varchar(1024) NOT NULL,
`FileHash` varchar(45) DEFAULT NULL,
`UploadDate` datetime DEFAULT CURRENT_TIMESTAMP,
`Count` int(11) NOT NULL DEFAULT '0',
`Status` enum('uploaded','closed','error','archived') NOT NULL DEFAULT 'uploaded',
`ScanID` int(11) DEFAULT NULL,
...
PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=986 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores all uploaded xmlfiles data';
CREATE TABLE `tbl_xmlcontents` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`FileID` int(11) NOT NULL,
...
PRIMARY KEY (`ID`),
KEY `FileID` (`FileID`),
CONSTRAINT `fileID` FOREIGN KEY (`FileID`) REFERENCES `tbl_xmlfiles` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=304817 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores all xml content data';
For better understanding, a "Scan" bundles multiple XML Files together.
I understand, of course, that if a file from the file table is to be deleted, we first have to delete the rows from the other table (because this is not a CASCADE constraint, which I do not want).
This is what I do in the SP:
CREATE PROCEDURE `sp_DeleteFilesByScanID`(
IN ScanID INT,
IN username VARCHAR(45)
)
BEGIN
...
DELETE c FROM tbl_xmlcontents c LEFT JOIN tbl_xmlfiles f ON c.FileID=f.ID WHERE f.`ScanID`=ScanID;
DELETE FROM tbl_xmlfiles WHERE `ScanID`=ScanID;
...
END
Running this SP, it errors:
Cannot delete or update a parent row: a foreign key constraint fails (`MYGREATTOOL`.`tbl_xmlcontents`, CONSTRAINT `fileID` FOREIGN KEY (`FileID`) REFERENCES `tbl_xmlfiles` (`ID`))
It errors on the second statement "DELETE FROM tbl_xmlfiles ...". The first one is executed as expected.
But here comes the strange part.
When I execute these two DELETE statements manually in MySQL Workbench (replacing the ScanID parameter with a valid value), all works as expected. How can that be, for heaven's sake?
Can you please help me?
Greetings, xola
Okay, sticky bit's comment pointed me to the right solution. Thank you very much!
When ScanID is qualified with the table name or an alias, it is treated as a field name and not as the SP parameter "ScanID"; unqualified, it resolves to the parameter.
Solution:
DELETE FROM tbl_xmlfiles x WHERE x.`ScanID`=ScanID;
Can someone clarify why the MySQL parser thinks that `ScanID` (in backticks) is the SP parameter? Aren't backticks always meant to denote a MySQL field name?
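As I understand it, backticks only quote an identifier; they do not force it to be read as a column. Inside a stored program, a name that is not qualified with a table name or alias and that happens to match a routine parameter resolves to the parameter, quoted or not. That would explain the error: in `ScanID`=ScanID both sides resolved to the parameter, the condition was always true, and the DELETE tried to remove every row in tbl_xmlfiles. A common defensive convention is to prefix parameter names so they can never collide with column names; a hypothetical rewrite:
DELIMITER //
CREATE PROCEDURE `sp_DeleteFilesByScanID`(
IN p_ScanID INT, -- prefixed so it cannot shadow the ScanID column
IN p_username VARCHAR(45)
)
BEGIN
-- child rows first, then the parent rows; no name collision anywhere
DELETE c FROM tbl_xmlcontents c
JOIN tbl_xmlfiles f ON c.FileID = f.ID
WHERE f.ScanID = p_ScanID;
DELETE FROM tbl_xmlfiles WHERE ScanID = p_ScanID;
END//
DELIMITER ;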

Blacklist / Whitelist Table Design

We have a set of users
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
Each user can have one or many domains, such as
CREATE TABLE `domains` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`domain` varchar(254) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `domain` (`domain`),
CONSTRAINT `domains_user_id_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
And we have a table that has some sort of data, for this example it doesn't really matter what it contains
CREATE TABLE `some_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` TEXT NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
We want certain elements of some_data to be accessible to only certain users or only certain domains (whitelist case).
In other cases we want elements of some_data to be accessible to everyone BUT certain users or certain domains (blacklist case).
Ideally we would like to retrieve the list of domains that the given element of some_data is accessible to in a single query and ideally do the reverse (list all the data the given domain has access to)
Our approach so far is a single table
CREATE TABLE `access_rules` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rule_type` enum('blacklist','whitelist'),
`some_data_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`domain_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
CONSTRAINT `access_rules_some_data_id_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
The problem, however, is that we need to query the DB twice (to figure out whether the given data entry is operating a blacklist or a whitelist [the whitelist has higher priority]). (EDIT: it can be done in a single query.)
Also, since domain_id is nullable (to allow blacklisting/whitelisting an entire user), joining is not easy.
The API that will use this schema is currently hit 4-5k times per second, so performance matters.
The users table is relatively small (50k+ rows) and the domains table is about 1.5 million entries. some_data is also relatively small (sub 100k rows)
EDIT: the question is more around semantics and best practices. With the above structure I'm confident we can make it work, but the schema "feels wrong" and I'm wondering if there is a better way.
There are two issues to consider: normalization and management.
To normalize traditionally, you would need 4 tables.
Set up the 3 master tables USER, DOMAIN, OtherDATA.
Set up a child table with User_Id, Domain_Id, OtherDATA_Id, PermissionLevel.
This provides the least amount of repeated data, and it makes management at the user-domain level easier. You could also add a default whitelist/blacklist field on the user and domain tables; that way a script could auto-populate the child table and a manager could just go in and adjust the one value needed.
If you had two separate tables, one for the whitelist and one for the blacklist, you could get a user or domain onto both lists by accident. Actually it would be 4 tables, 2 for users and 2 for domains, and management would be more complex.
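A sketch of that child table against the tables from the question (the table, column and constraint names are illustrative):
CREATE TABLE `data_permission` (
`some_data_id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`domain_id` int(11) NOT NULL,
`permission` enum('whitelist','blacklist') NOT NULL,
PRIMARY KEY (`some_data_id`,`user_id`,`domain_id`),
KEY `user_id` (`user_id`),
KEY `domain_id` (`domain_id`),
CONSTRAINT `data_permission_data_fk` FOREIGN KEY (`some_data_id`) REFERENCES `some_data` (`id`),
CONSTRAINT `data_permission_user_fk` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`),
CONSTRAINT `data_permission_domain_fk` FOREIGN KEY (`domain_id`) REFERENCES `domains` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;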

MariaDB/MySQL foreign key constraint: possible to request cascade at time of delete?

I have used PHP code to preserve database integrity for years, but now I am switching from MyISAM to InnoDB and thought it might be nice to utilize foreign key constraints, letting the DB carry more of the load. But I want to confirm with the user before doing a cascade, so the constraints would be declared as ON DELETE RESTRICT. When I get the error, I would let the user know that there are dependent records and how many, and if they say, "Sure, delete them," it would be nice to let the database do a cascading delete. Is it possible to tell a specific DELETE statement to go ahead and cascade? I expected an option or something on the DELETE command (e.g. pseudocode DELETE FROM table WHERE ... CASCADE TO child-table), but I didn't see anything.
Example (very standard many-to-many):
CREATE TABLE `person` (
`PersonID` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`FullName` varchar(100) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
<many other fields>,
PRIMARY KEY (`PersonID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `category` (
`CategoryID` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`Category` varchar(60) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`CategoryID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `percat` (
`PersonID` mediumint(8) unsigned NOT NULL DEFAULT 0,
`CategoryID` mediumint(8) unsigned NOT NULL DEFAULT 0,
PRIMARY KEY (`PersonID`,`CategoryID`),
FOREIGN KEY (`PersonID`) REFERENCES `person`(`PersonID`) ON DELETE RESTRICT,
FOREIGN KEY (`CategoryID`) REFERENCES `category`(`CategoryID`) ON DELETE RESTRICT
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin;
I found How to cascade-delete temporarily or on-demand? but: (a) it's for SQL Server, not MySQL (well, technically I'm using MariaDB 10.2.4, if that makes a difference), so I don't know if I have additional options available to me, and (b) such stored procedure code wouldn't be any simpler than the PHP code I already have (and would be less visible when I'm developing), so I don't see the point in swapping one for the other.
Short answer: No.
Longer answer:
The answer is simple-minded: FKs are simple-minded. When you ask for more than trivial actions, you are asking for too much of FKs, and you need to build the "business logic" into your application.
Ditto for Triggers.
MySQL (and MariaDB) have always been "lean and mean" compared to the heavy hitters. FKs exist as a checkbox on a feature list: "yes, we have FKs, too". So anything esoteric in the details of FKs is quite likely missing.
Sometimes the syntax is implemented without any real code behind it -- CHECK; INDEX(x DESC). (The latter is finally being implemented in 8.0, but I would estimate the number of use cases to be somewhere around one in a thousand.)
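In practice, the application-side equivalent of an on-demand cascade is just a confirmed, ordered delete inside one transaction; a sketch, with PersonID 42 as a placeholder:
START TRANSACTION;
-- report this count to the user before going further
SELECT COUNT(*) FROM percat WHERE PersonID = 42;
-- only if the user confirms:
DELETE FROM percat WHERE PersonID = 42;
DELETE FROM person WHERE PersonID = 42;
COMMIT;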

MySQL Performance: Single Object With Multiple Types - JOIN scenario

With the following type of table design:
http://www.martinfowler.com/eaaCatalog/classTableInheritance.html
Let's use the following schema for sake of example:
CREATE TABLE `fruit` (
`id` int(10) UNSIGNED NOT NULL,
`type` tinyint(3) UNSIGNED NOT NULL,
`purchase_date` DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `apple` (
`fruit_id` int(10) UNSIGNED NOT NULL,
`is_macintosh` tinyint(1) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `orange` (
`fruit_id` int(10) UNSIGNED NOT NULL,
`peel_thickness_mm` decimal(4,2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `fruit`
ADD PRIMARY KEY (`id`);
ALTER TABLE `apple`
ADD KEY `fruit_id` (`fruit_id`);
ALTER TABLE `orange`
ADD KEY `fruit_id` (`fruit_id`);
ALTER TABLE `fruit`
MODIFY `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE `apple`
ADD CONSTRAINT `apple_ibfk_1` FOREIGN KEY (`fruit_id`) REFERENCES `fruit` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `orange`
ADD CONSTRAINT `orange_ibfk_1` FOREIGN KEY (`fruit_id`) REFERENCES `fruit` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
Here, 'apples' and 'oranges' are types of 'fruit', and have unique properties, which is why they've been segmented out into their own tables.
The question is, from a performance standpoint, when performing a SELECT * FROM fruit query, would it be better to:
a) perform a LEFT OUTER JOIN on each typed table, i.e. apple and orange (in practice, we may be dealing with dozens of fruit types)
b) skip the joins and perform a separate query later for each fruit row in the application logic, so for a fruit row of type apple, SELECT * FROM apple WHERE fruit_id=...?
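For reference, option (a) would be something like the following sketch; with dozens of fruit types, the list of LEFT JOINs grows accordingly:
SELECT f.*, a.is_macintosh, o.peel_thickness_mm
FROM fruit f
LEFT OUTER JOIN apple a ON a.fruit_id = f.id
LEFT OUTER JOIN orange o ON o.fruit_id = f.id;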
EDIT:
Regarding the specific scenario, I won't go into excruciating detail, but the actual application here is a notification system which generates notifications when certain events occur. There is a different notification type for each event type, and each notification type stores properties unique to that event type. This is on a site with a lot of user activity, so there will eventually be millions of notification rows.
Have one table with columns for the 'common' attributes (eg, type='apple', purchase_date=...), plus one TEXT column with JSON containing any other attributes (eg, subtype='macintosh') appropriate to the row in question.
Or it might make more sense to have subtype as a common attribute, since many fruits have such (think 'navel').
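A sketch of that single table replacing fruit/apple/orange (column names are illustrative; on MySQL 5.7+ the attributes column could be a native JSON type instead of TEXT):
CREATE TABLE `fruit` (
`id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`type` varchar(20) NOT NULL, -- 'apple', 'orange', ...
`subtype` varchar(20) DEFAULT NULL, -- 'macintosh', 'navel', ...
`purchase_date` DATETIME NOT NULL,
`attributes` text NOT NULL, -- JSON, e.g. '{"peel_thickness_mm": 4.5}'
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;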
What will you be doing with the "inheritance"? It's great in the textbook, but it sucks in a database. SQL predates inheritance, object-oriented, etc.

Questionable Normalizing

I'm new to designing DBs and MySQL,
so to keep it short: the information I have is AppID, Last Name, First Name, Gender, Age, DoB, Relation, School Name, School Address, School Phone Number, Teachers, Counselors, and whether they are in childcare
(AppID is the application ID being submitted).
Basically, this is how I see everything being determined, and what tables should be created to be most normalized:
AppID -> Child ID
childID -> Last Name, FirstName, Gender, Age, DoB, Relation, School Name, Grade, Child Care
School Name -> School Address, School Phone Number, teacherID, counselorID
teacherID -> First Name, Last Name, Course
CounselorID -> First Name, Last Name, Counselor type
However, I'm not sure if attempting to completely normalize this is a good idea: this is a rather small group I'm helping, so joining the tables may take longer than a regular grouped lookup and may take up more space.
Another concern is that MySQL only allows one AUTO_INCREMENT column per table; I could emulate a similar thing in a query, but would rather not have to if possible. The two auto-increment columns would be TeacherID and CounselorID.
So any input would be very much appreciated.
Edit: here's the basic structure; I will also add modification attributes later, and I dropped courses for now. Thank you.
CREATE TABLE `Client_Child_Info` (
`FirstName` varchar(15) NOT NULL,
`LastName` varchar(15) NOT NULL,
`Gender` tinyint(1) NOT NULL,
`Age` tinyint(4) NOT NULL,
`DoB` date NOT NULL,
`Relation` varchar(15) NOT NULL,
`Grade` varchar(3) NOT NULL default 'NA',
`ChildCare` tinyint(1) NOT NULL default '0',
`ChildID` int(11) NOT NULL auto_increment,
PRIMARY KEY (`ChildID`),
KEY `Age` (`Age`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Basic Child Information' AUTO_INCREMENT=1 ;
CREATE TABLE `Client_Child_Schoolinfo` (
`SchoolID` int(11) NOT NULL,
`SchoolName` varchar(50) NOT NULL,
`SchoolAddress` varchar(50) default NULL,
`SchoolPhone` varchar(15) default NULL,
PRIMARY KEY (`SchoolID`),
KEY `SchoolName` (`SchoolName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='School Information for a given ID ';
CREATE TABLE `Client_child_teacher` (
`TeacherID` int(11) NOT NULL,
`FirstName` varchar(15) NOT NULL,
`LastName` varchar(15) NOT NULL,
`Guidance` tinyint(1) NOT NULL COMMENT 'determines if the person is a guidance councilor or teacher',
PRIMARY KEY (`TeacherID`),
KEY `Guidance` (`Guidance`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Teacher information';
CREATE TABLE `Client_RTchild` (
`AppID` int(11) NOT NULL,
`ChildID` int(11) NOT NULL auto_increment,
PRIMARY KEY (`ChildID`),
KEY `AppID` (`AppID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Reference Table Applicant to Client' AUTO_INCREMENT=1 ;
CREATE TABLE `Client_RTteacher` (
`SchoolID` int(11) NOT NULL,
`TeacherID` int(11) NOT NULL,
PRIMARY KEY (`TeacherID`),
KEY `SchoolID` (`SchoolID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Reference Table for teacher to school ';
CREATE TABLE `Client_RTschool` (
`ChildID` int(11) NOT NULL,
`SchoolID` int(11) NOT NULL auto_increment,
PRIMARY KEY (`SchoolID`),
KEY `ChildID` (`ChildID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Reference Table Child to SchoolID it is attending' AUTO_INCREMENT=1 ;
ALTER TABLE `Client_Child_Info`
ADD CONSTRAINT `Client_Child_Info_ibfk_1` FOREIGN KEY (`ChildID`) REFERENCES `Client_RTchild` (`ChildID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_Child_Schoolinfo`
ADD CONSTRAINT `Client_Child_Schoolinfo_ibfk_1` FOREIGN KEY (`SchoolID`) REFERENCES `Client_RTschool` (`SchoolID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_child_teacher`
ADD CONSTRAINT `Client_child_teacher_ibfk_1` FOREIGN KEY (`TeacherID`) REFERENCES `Client_RTteacher` (`TeacherID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_RTschool`
ADD CONSTRAINT `Client_RTschool_ibfk_1` FOREIGN KEY (`ChildID`) REFERENCES `Client_RTchild` (`ChildID`) ON DELETE CASCADE ON UPDATE CASCADE;
ALTER TABLE `Client_RTteacher`
ADD CONSTRAINT `Client_RTteacher_ibfk_1` FOREIGN KEY (`SchoolID`) REFERENCES `Client_RTschool` (`SchoolID`) ON DELETE CASCADE ON UPDATE CASCADE;
An old question, and the comments are all heading in one direction.
Let's put some rationale behind those statements:
The baseline for sharing table-like structured information is MS Excel, or office software from any other vendor. Unfortunately, those client applications are not good at near-simultaneous editing by multiple authors, but they are good enough to distribute information from ONE author to multiple readers. If the working mode is the latter, there is no point putting effort into transferring the data and the data concept to a database system, since MS Excel is practically common knowledge and the distribution/training effort is low.
However, if near-simultaneous editing by multiple authors is needed and you can afford server hardware and application development, a database on a server is the best place to store the data.
OK, given that the decision for a database has been taken:
Going for normalization or denormalization is a design decision. Factors like maintenance, setup effort and hardware support play a role in that decision. The software function design depends on the data model design, and it must support the business logic, which in your case is dealing with applications. Unfortunately, denormalization often builds on assumptions about the business logic (on top of the basic assumptions that you also have in place for a normalized model), and in future developments those assumptions are often revised, with the ugly consequence that the software functions must be changed as well. For that reason, normalization has its benefits if the software does not have a time-limited scope.