How do I check the integrity of a relationship in MySQL? - mysql

I do something like this:
SET foreign_key_checks = 0;
-- many different operations on several tables
SET foreign_key_checks = 1;
How can I verify that my entire database is consistent? I want to be sure that all relationships are properly maintained. For example, if I delete a "country" with id 20, I want to make sure that no "city" is left with "country_id" = 20, pointing at a row that no longer exists.

It's easier if you do not SET foreign_key_checks = 0. If you keep the constraint enforcement on, then you can't make inconsistencies or broken references. You get an error if you try. So you should consider not turning off the FK checks if referential integrity is important.
If you do think you have inconsistencies, you must do a query like the following to verify there are no "orphans" that reference a parent that no longer exists:
SELECT cities.city_id
FROM cities
LEFT OUTER JOIN countries
ON cities.country_id = countries.country_id
WHERE countries.country_id IS NULL;
The JOIN condition is an equality on country_id, so in any matched row countries.country_id cannot be NULL. The left outer join returns NULL for all columns of countries when there is no match. So if you search in the WHERE clause for cases where countries.country_id IS NULL, you get only the cities that have no match in the other table.
You must do a separate query for each relationship in your database. This can be quite a chore, and if the tables are very large, it can take a long time.
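Since there is one query per relationship, it can help to generate them instead of writing each by hand. A minimal sketch in Python, assuming you already have a list of (child table, child column, parent table, parent column) tuples; the `orphan_check_sql` helper and the `relationships` list are hypothetical names for illustration, and the extra IS NOT NULL condition is my addition so that rows with no parent set are not reported as orphans:

```python
def orphan_check_sql(child_table, child_col, parent_table, parent_col):
    """Build a LEFT JOIN query returning child rows whose parent row is missing."""
    return (
        f"SELECT c.* FROM `{child_table}` AS c "
        f"LEFT OUTER JOIN `{parent_table}` AS p "
        f"ON c.`{child_col}` = p.`{parent_col}` "
        f"WHERE p.`{parent_col}` IS NULL "
        f"AND c.`{child_col}` IS NOT NULL"
    )

# Hypothetical list of relationships; only the cities/countries pair
# comes from the example above.
relationships = [
    ("cities", "country_id", "countries", "country_id"),
]

queries = [orphan_check_sql(*rel) for rel in relationships]
for query in queries:
    print(query)
```

The table and column names are interpolated directly into the SQL text, so this is only appropriate when the list comes from a trusted source such as your own schema, never from user input.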
I once had to do this many years ago in a buggy application that had no foreign key constraints (it used MyISAM tables). I ran a script to do all these orphan-checks every night, and eventually it grew to dozens of queries, and took hours to run.
Then comes the part that is even harder: once you do find some orphaned records, what do you do about them? Do you delete the orphans? Do you change their foreign key column to reference a parent record that does still exist? Do you restore the parent record? It could be any of these options, and you must have the orphaned records reviewed case by case, by someone with the knowledge and authority to choose how to resolve the issue.
It's far better to keep the constraints enforced so you don't have to do that work.

Related

DB design for M:N table with time interval

I would like to ask you a design question:
I am designing a table that makes me scratch my head. I'm not sure what the best approach is, and I feel like I am missing something:
There are two tables A and B and one M:N relationship table between them. The relationship table has right now these values:
A.ID, B.ID, From, To
Business requirements:
At any time, the A:B relationship can be only 1:1
A:B can repeat in time as defined by From and To datetime values, which specify an interval
example: Car/Driver.
Any car can have only 1 Driver at any time
Any Driver can drive only 1 car at any time (this is NOT topgear, ok? :) )
Driver can change the car after some time, and can return to the same car
Now, i am not sure:
- What PK should I go with? A,B is not enough, and adding From and To doesn't feel right. Maybe an autoincrement PK?
- Is there any way to enforce the business requirements by DB design?
- For business reasons, I would prefer it not to be in a historical table. Why? Well, let's assume the car is rented and I want to know, given a date, who had what car rented at that date. Splitting it into a historical table would require more joins :(
I feel like I am missing something, some kind of general pattern ... or I don't know....
Thankful for any help, so thank you :)
I don't think you are actually missing anything. I think you've got a handle on what the problem is.
I've read a couple of articles about how to handle "temporal" data in a relational database.
The bottom-line consensus is that the traditional relational model doesn't have any built-in mechanism for supporting temporal data.
There are several approaches, some better suited to particular requirements than others, but all of the approaches feel like they are "duct taped" on.
(I was going to say "bolted on", but I thought a tip of the hat to Red Green was in order: "... the handyman's secret weapon, duct tape", and "if the women don't find you handsome, they should at least find you handy.")
As far as a PRIMARY KEY or UNIQUE KEY for the table, you could use the combination of (a_id, b_id, from). That would give the row a unique identifier.
But, that doesn't do anything to prevent overlapping "time" ranges.
There is no declarative constraint for a MySQL table that prevents "overlapping" datetime ranges that are stored as "start","end" or "start","duration", etc. (At least, not in the general case. If you had very well defined ranges, and triggers that rounded the from to an even four hour boundary, and a duration to exactly four hours, you could use a UNIQUE constraint. In the more general case, for any ol' values of from and to, the UNIQUE constraint does not work for us.)
A CHECK constraint is insufficient (since you would need to look at other rows), and even if it were possible, MySQL doesn't actually enforce check constraints (versions before 8.0.16 parse them and then ignore them).
The only way (I know of) to get the database to enforce such a constraint would be a TRIGGER that looks for the existence of another row for which the affected (inserted/updated) row would conflict.
You'd need both a BEFORE INSERT trigger and a BEFORE UPDATE trigger. The trigger would need to query the table, to check for the existence of a row that "overlaps" the new/modified row:
SELECT 1
FROM mytable t
WHERE t.a_id = NEW.a_id
AND t.b_id = NEW.b_id
AND t.from <> OLD.from
AND < (t.from, t.to) overlaps (NEW.from,NEW.to) >
Obviously, that last line is pseudocode for the actual syntax that would be required.
The line before that would only be needed in the BEFORE UPDATE trigger, so we don't find (as a "match") the row being updated. The actual check there would really depend on the selection of the PRIMARY KEY (or UNIQUE KEY)(s).
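For the overlap test itself, the usual formulation is that two ranges overlap when each one starts before the other ends. A small Python sketch of that predicate, assuming half-open ranges (the end instant is not part of the range) and no open-ended rows; the `overlaps` function and the sample times are hypothetical:

```python
from datetime import datetime

def overlaps(start_a, end_a, start_b, end_b):
    """True when half-open ranges [start_a, end_a) and [start_b, end_b) share an instant."""
    return start_a < end_b and start_b < end_a

morning = (datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 12))
midday = (datetime(2024, 1, 1, 11), datetime(2024, 1, 1, 14))
afternoon = (datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 16))

print(overlaps(*morning, *midday))     # True: 11:00 to 12:00 is shared
print(overlaps(*morning, *afternoon))  # False: one ends exactly when the other starts
```

In the trigger, the same test would presumably read something like t.`from` < NEW.`to` AND NEW.`from` < t.`to` (backticks because from and to are reserved words). If a row with a NULL "to" is meant to represent a still-open assignment, that case needs extra handling that this sketch does not cover.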
With MySQL 5.5, we can use the SIGNAL statement to return an error, if we find the new/updated row would violate the constraint. With previous versions of MySQL, we can "throw" an error by doing something that causes an actual error to occur, such as running a query against a table name that we know does not exist.
And finally, this type of functionality doesn't necessarily have to be implemented in a database trigger; this could be handled on the client side.
How about three tables:
TCar, TDriver, TLog
TCar
  pkCarID
  fkDriverID
  name
A unique index on fkDriverID ensures a driver is only ever in one car, turning the foreign key fkDriverID into a 1:1 relationship.
TDriver
  pkDriverID
  name
TLog
  pkLogID (surrogate pk)
  fkCarID
  fkDriverID
  from
  to
With 2 joins you will get any information you describe. If you just need to find car data by driver ID or driver data by car ID, you can do it with one join.
Thank you everyone for your input. So far I am thinking about this approach, and would be thankful for any criticism or pointing out of flaws:
Tables (pseudoSQLcode):
Car (ID pk auto_increment, name)
Driver(ID pk auto_increment, name)
Assignment (CarID unique,DriverID unique,from Datetime), composite PK (CarID,DriverID)
AssignmentHistory (CarID unique,DriverID unique,from Datetime,to Datetime) no pk
of course, CarID is a FK to Car(ID) and DriverID is a FK to Driver(ID)
The next stage is two triggers (and boy oh boy, I hope this can be done in MySQL; it works on MSSQL, but I don't have a MySQL db handy right now to test):
!!! Warning, MSSQL for now
create trigger Assignment_Update on Assignment instead of update as
-- remove current assignments that clash with the new car or driver
delete Assignment
from Assignment
join inserted
on ( inserted.CarID = Assignment.CarID
or inserted.DriverID = Assignment.DriverID)
and ( inserted.CarID <> Assignment.CarID or inserted.DriverID <> Assignment.DriverID)
insert into Assignment
select * from inserted;
-- copy every removed assignment to the history table, stamping the end time
create trigger Assignment_Insert on Assignment after delete as
insert into AssignmentHistory
select CarID, DriverID, [from], GETDATE() from deleted;
I tested it a bit and it seems that for each business case it does what I need it to do.

Designer table relations vs joins

I just have a small beginner's question about relations and joins in MySQL.
What is the difference between them? In phpMyAdmin's designer section I can make relationships between tables, so that the tables are linked with each other through, for example, "id".
But what if I do a join / left join in my PHP code...
for example:
$stmt = $db->prepare ("SELECT * FROM visitor
LEFT JOIN host ON visitor.host_id=host.id
LEFT JOIN reason ON visitor.reason_id=reason.id
WHERE visitor.id = ?");
$stmt->bindParam(1, $lastid);
$stmt->execute();
Isn't that the same thing I have already done? So why do you need to set the relations in phpMyAdmin? What is the benefit of doing that?
I suspect what you're referring to are foreign keys. Your join already uses the foreign key columns (host_id, reason_id), but formally defining them creates an integrity constraint. The advantage is that it becomes impossible to accidentally insert a visitor with an invalid reason_id. You can also specify the delete behaviour to ensure that all related records are cleaned up appropriately. Note that not all storage engines support foreign keys.
See here: https://dev.mysql.com/doc/refman/5.6/en/create-table-foreign-keys.html
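To make the difference concrete, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the PRAGMA line is SQLite-specific, and the tables are cut down to the columns needed; the sample values are made up. It shows the two effects mentioned above: an insert with an invalid reason_id is rejected, and deleting a reason removes the visitors that pointed at it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys after this switch is turned on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE reason (id INTEGER PRIMARY KEY, label TEXT)")
conn.execute(
    "CREATE TABLE visitor ("
    " id INTEGER PRIMARY KEY,"
    " reason_id INTEGER REFERENCES reason(id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO reason (id, label) VALUES (1, 'meeting')")
conn.execute("INSERT INTO visitor (id, reason_id) VALUES (10, 1)")

# A visitor pointing at a reason that does not exist is refused.
try:
    conn.execute("INSERT INTO visitor (id, reason_id) VALUES (11, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the parent row removes the dependent visitor as well.
conn.execute("DELETE FROM reason WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM visitor").fetchone()[0]
print(rejected, remaining)
```

A plain JOIN gives you neither of these guarantees; it only combines whatever rows happen to match at query time.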

Remove duplicate rows in MySQL _BUT_ reassign all referential records to kept record

I'm trying to de-duplicate user accounts in our system and I know there are lots of questions out there about removing/identifying duplicates (such as Remove duplicate rows in MySQL), but I haven't seen any that required maintaining referential records.
I have a users table and a subscriptions table with a foreign key field User_ID common to both and set to CASCADE in subscriptions.
I'd like to remove all duplicates in the users table but in doing so, all of the records corresponding to User_ID in the subscriptions table would be lost due to the CASCADE behavior.
Is it possible to UPDATE the users table, altering the User_ID of the duplicate records to the one I want to keep, without colliding with the unique index, allowing all referential records to be updated accordingly and finally removing the duplicate User record without cascading the delete?
The added complication is that the User_ID field in the users table is obviously indexed with unique.
EDIT: I should add that this is a simplified example, our DB has 100+ tables many of which have foreign keys based on the User_ID.
So in the end, as @MarcB helped me discover above, the correct answer is to have planned better in the beginning ;)
We're going to have to write a programmatic solution to manually join accounts. We're lucky enough to have DAOs/DTOs for every object type, so it shouldn't be too bad dealing with the referential records. It'll just be an intense operation and so will require some good planning.
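For anyone landing here, one way such a merge could look is: first re-point the child rows at the account being kept, then delete the duplicate, all in one transaction so the cascade never sees any children. A rough sketch using Python's sqlite3 module as a stand-in for MySQL; the `merge_user` helper, the email column and the sample rows are hypothetical, and only users, subscriptions and User_ID come from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.execute("CREATE TABLE users (User_ID INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE subscriptions ("
    " id INTEGER PRIMARY KEY,"
    " User_ID INTEGER NOT NULL REFERENCES users(User_ID) ON DELETE CASCADE)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, "a@example.com")],
)
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [(100, 1), (101, 2), (102, 2)],
)

def merge_user(conn, keep_id, duplicate_id, child_tables):
    """Move child rows to keep_id, then delete the duplicate user, atomically."""
    with conn:  # commits on success, rolls back if any statement fails
        for table in child_tables:
            # Table names come from a trusted, hard-coded list.
            conn.execute(
                f"UPDATE {table} SET User_ID = ? WHERE User_ID = ?",
                (keep_id, duplicate_id),
            )
        conn.execute("DELETE FROM users WHERE User_ID = ?", (duplicate_id,))

merge_user(conn, keep_id=1, duplicate_id=2, child_tables=["subscriptions"])
owners = conn.execute("SELECT id, User_ID FROM subscriptions ORDER BY id").fetchall()
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(owners, user_count)
```

With 100+ tables the list of child tables would have to be built from the schema, and any child table with its own unique key that includes User_ID can still collide during the UPDATE, so those cases need a decision about which row wins.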

MySQL find invalid foreign keys

We have a database with a couple hundred tables. Tables using foreign_keys use INNODB.
Sometimes we transfer data (individual tables using mysqldump) between our development, stage, and production databases. mysqldump disables all foreign key checking to make importing the data easy.
So over time some of our non-production databases end up with a few orphaned records.
I was about to write a script that would find and detect any invalid (keys pointing to missing records) foreign keys for an entire MySQL database.
I know I can write a query to check each table and fkey one by one, but was thinking there may be a tool to do this already.
I would check before writing such a script to see if there is one out there already.
Searched google a bit... surprisingly I found nothing.
If the data is already in and you haven't set up fk constraints or cascades for deleting the parent then you just want:
SELECT * FROM children WHERE my_fk_id NOT IN (select id from parents);
These other answers are fine for small tables, but I think they run in O(n^2), which probably isn't ideal for a large db. Instead I used a left join:
SELECT * FROM children c LEFT JOIN parents p ON p.id=c.parent_id WHERE p.id IS NULL AND c.parent_id IS NOT NULL;
Note that you may not need that very last NOT NULL condition. I did, because I wanted to exclude children that didn't have parents (a valid case in my particular scenario).
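To see what that last condition changes, here is a small self-contained sketch using Python's sqlite3 module as a stand-in for MySQL. The table and column names follow the query above; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parents (id INTEGER PRIMARY KEY);
CREATE TABLE children (id INTEGER PRIMARY KEY, parent_id INTEGER);
INSERT INTO parents (id) VALUES (1), (2);
INSERT INTO children (id, parent_id) VALUES
    (10, 1),     -- valid reference
    (11, 7),     -- orphan: parent 7 does not exist
    (12, NULL);  -- no parent at all
""")

anti_join = (
    "SELECT c.id FROM children c "
    "LEFT JOIN parents p ON p.id = c.parent_id "
    "WHERE p.id IS NULL"
)

# Without the extra condition, the parentless child is reported too.
with_nulls = [row[0] for row in conn.execute(anti_join + " ORDER BY c.id")]

# With it, only rows that reference a missing parent remain.
orphans_only = [
    row[0]
    for row in conn.execute(anti_join + " AND c.parent_id IS NOT NULL ORDER BY c.id")
]
print(with_nulls, orphans_only)
```

Whether a NULL parent_id counts as a problem depends on your schema, so pick the variant that matches what the foreign key is supposed to allow.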

Mysql deduce foreign key relationship for random queries

I am an MySQL novice and am looking for the solution to the following problem:
I would like to create a CMS with cppcms which should be capable of having modules. Since I want to reduce the chance of (accidental) access to private data, I want a module which handles data access and rights. Since this module is supposed to be unaware of data structures created by other modules, I would like it to deduce the data owner through foreign key relations. My idea would be to search for a path (over foreign keys) which links a row to a user id.
Sum up:
What I am trying to do
Taking a random query, determine the affected rows
for the affected rows determine a relationship/path (via foreign keys) to a user/userid (a column in an existing table)
return only the rows for which a relationship could be determined and a condition holds (e.g. the userid found in the related query matches a fixed user id, such as the user currently accessing the system)
(As far as I know foreign keys only enforce the existence of a key in another table, however the precondition I assume is, that every row is linked to a user over a path of foreign key relations)
My Problem/Question:
Is there an existing solution/Better approach to the problem? Prepared statements wont do the trick since I don't know all datastructures/queries in advance.
How do I get the foreign key relations? Is there another way besides "SHOW CREATE TABLE" and then parsing the result string?
How can I determine the rows that would be affected, without modifing them? I would like to filter this set afterwards by determining if I can link it to the current user (not the mysql user but system user).
Could I try executing the query, then select the affected rows, and if I determine an access violation simply do a rollback? Problem with this: how do I apply the changes to the subset of rows for which it is legal (e.g. I attempt to change 5 rows, may only change 2; how do I change only those 2)? One idea was to find a way to create a temporary table with the result set; this solution has several drawbacks: foreign key relations are not possible for temporary tables, so they are 'lost'.
P.S.: I am coding in C++, therefore I would prefer cpp-compatible library recommendations, however I am open to other suggestions. While googling I stumbled over Doctrine and I am currently researching it.
P.P.S.: Database engine is InnoDB (has to because of the foreign keys)
UPDATE: Explanation Attempt of Part 2:
I am trying to filter which columns of a table a user is allowed to see. To do so I would like to find a connection in the database over foreign keys (by using foreign keys I ensure that I can get to all data over joins, and they are a hint on which columns I have to join). Since I plan on a more complex system (e.g. a forum), I don't want to join all data in a temporary table and run a user query on it. I would rather evaluate the user query and check, for the result, whether I can map it with a join to the user's id. For example, I could use this to enforce that an edit button is only enabled for the posts created by the user. (I know there are easier ways to do this, but I basically want to allow programmers to write their own queries without giving them the chance to edit or view data that they are not allowed to see. My assumption is that the programmer is not an evildoer but simply forgetting constraints, thus I want to enforce them in software.)
Getting here would be pretty good, but I have a little more complex need.
First a basic example. Let's say its like facebook and all the friends of a person are allowed to see his pictures.
pictures = id **userid** file (bool)visibleForFriends album
friendship = **userid1** **userid2**
users = userid
What I want to happen is:
1. Programmer input "SELECT * FROM pictures WHERE album=2"
2. System gets all matching records (e.g. set of ids)
3. System sees foreign key userid, tries to match current userid against the pictures userid, adds all matching to the returned result part
4. System notices special column visibleForFriends
5. System tries to determine all friends (SELECT userid1 FROM friendship WHERE userid2=currentUserID join (have to read up on joins) SELECT userid2 FROM friendship WHERE userid1=currentUserID)
6. System adds all rows where visibleForFriends is true and pictures.userid is in the result from step 5.
While the friendship part is some extra code (I think doable once I get started on the first bit), I still need to figure out how to automatically follow the foreign keys to see the connection. Ignoring the special friendship case, I would like the system to work on this as well:
pictures = id **albumid** file (bool)visibleForFriends album
albums = id **userid**
users = userid
Now the system should go pictures.albumid ==> albums.id -> albums.userid ==> users.userid.
I hope the examples clarified the question a bit. One problem is that in point one from the example (programmer query input) I don't want to let "DELETE *" take effect on anything not owned by the user. So I have to filter which rows to actually delete.
In response to part of your question (part 1): provided the MySQL user you access the database with has access rights to information_schema, you can use the following query to list the existing foreign key relations within a specific database:
SELECT
TABLE_NAME,
COLUMN_NAME,
REFERENCED_TABLE_NAME,
REFERENCED_COLUMN_NAME
FROM
information_schema.KEY_COLUMN_USAGE
WHERE
TABLE_SCHEMA = 'dbname' AND REFERENCED_COLUMN_NAME IS NOT NULL;
I am slightly confused by the part 2 and am unsure how to give an appropriate response to this section. I hope you find the above query helpful though in your project!
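Once you have those rows, following the foreign keys from one table to another is a graph search. A minimal Python sketch, assuming the rows have the same four columns as the query above and using the pictures/albums/users example from the question as hard-coded sample data; the `fk_path` helper is a hypothetical name:

```python
from collections import deque

# Rows shaped like the query result:
# (TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME)
fk_rows = [
    ("pictures", "albumid", "albums", "id"),
    ("albums", "userid", "users", "userid"),
]

def fk_path(fk_rows, start_table, target_table):
    """Breadth-first search for a chain of foreign keys from start to target.

    Returns the list of foreign key rows to follow, or None if no chain exists.
    """
    graph = {}
    for table, column, ref_table, ref_column in fk_rows:
        graph.setdefault(table, []).append((column, ref_table, ref_column))

    queue = deque([(start_table, [])])
    seen = {start_table}
    while queue:
        table, path = queue.popleft()
        if table == target_table:
            return path
        for column, ref_table, ref_column in graph.get(table, []):
            if ref_table not in seen:
                seen.add(ref_table)
                step = (table, column, ref_table, ref_column)
                queue.append((ref_table, path + [step]))
    return None

path = fk_path(fk_rows, "pictures", "users")
for table, column, ref_table, ref_column in path:
    print(f"{table}.{column} ==> {ref_table}.{ref_column}")
```

Breadth-first search returns a shortest chain, but when several chains lead to the users table this sketch just picks one of them, which may not be the one that expresses ownership.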
Is there an existing solution/Better approach to the problem?
Yes, I think so. You're describing a multi-tenant database. In a multi-tenant database in which the users share tables (also known as "shared everything"), each table should have a column for the user id. In effect, each row knows its owner.
This will vastly simplify your SQL, since you need no joins to determine who a row belongs to. It will probably speed up your SQL a lot, too.
This SO answer has a decent summary of the issues and alternatives.
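As a rough illustration of the owner-column idea, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The owner_id column, the `pictures_for` helper and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures ("
    " id INTEGER PRIMARY KEY,"
    " owner_id INTEGER NOT NULL,"  # every row records its owner directly
    " file TEXT)"
)
conn.executemany(
    "INSERT INTO pictures VALUES (?, ?, ?)",
    [(1, 7, "a.jpg"), (2, 8, "b.jpg"), (3, 7, "c.jpg")],
)

def pictures_for(conn, current_user_id):
    """Return the ids of the pictures owned by the given user, no joins needed."""
    rows = conn.execute(
        "SELECT id FROM pictures WHERE owner_id = ? ORDER BY id",
        (current_user_id,),
    )
    return [row[0] for row in rows]

print(pictures_for(conn, 7))
```

The access layer then only has to make sure every query carries the owner_id filter, rather than working out a join path per table.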