MySQL find invalid foreign keys

We have a database with a couple hundred tables. Tables that use foreign keys use InnoDB.
Sometimes we transfer data (individual tables, using mysqldump) between our development, staging, and production databases. mysqldump disables foreign key checks so the data imports easily.
So over time, some of our non-production databases end up with a few orphaned records.
I was about to write a script that would detect any invalid foreign keys (keys pointing to missing records) across an entire MySQL database.
I know I can write a query to check each table and foreign key one by one, but I thought there might already be a tool for this, so I wanted to check before writing such a script.
I searched Google a bit and, surprisingly, found nothing.

If the data is already loaded and you haven't set up FK constraints or cascading deletes on the parent, then you just want:
SELECT * FROM children WHERE my_fk_id NOT IN (select id from parents);

These other answers are fine for small tables, but I think they run in O(n^2), which probably isn't ideal for a large database. Instead, I used a LEFT JOIN:
SELECT * FROM children c LEFT JOIN parents p ON p.id=c.parent_id WHERE p.id IS NULL AND c.parent_id IS NOT NULL;
Note that you may not need the final IS NOT NULL condition. I needed it because I wanted to exclude children that had no parent at all (a valid case in my particular scenario).
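To compare the two answers side by side, here is a small check. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the anti-join SQL is the same), and the tiny parents/children data set is made up. It also shows a subtlety: the NOT IN form silently skips children whose parent_id is NULL, which is exactly what the extra IS NOT NULL condition achieves in the LEFT JOIN form.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (id INTEGER PRIMARY KEY);
    CREATE TABLE children (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parents (id) VALUES (1), (2);
    -- child 12 points at a missing parent; child 13 has no parent at all
    INSERT INTO children (id, parent_id) VALUES (10, 1), (11, 2), (12, 99), (13, NULL);
""")

# NOT IN form: NULL NOT IN (...) evaluates to NULL, so child 13 is not returned
not_in = [r[0] for r in conn.execute(
    "SELECT id FROM children WHERE parent_id NOT IN (SELECT id FROM parents) ORDER BY id")]

# LEFT JOIN anti-join: keep children whose join found no parent row
left_join = [r[0] for r in conn.execute("""
    SELECT c.id
    FROM children c
    LEFT JOIN parents p ON p.id = c.parent_id
    WHERE p.id IS NULL AND c.parent_id IS NOT NULL
    ORDER BY c.id""")]

print(not_in, left_join)  # [12] [12]
```

Dropping `AND c.parent_id IS NOT NULL` from the second query would also return child 13.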

Related

How do I check the integrity of a relationship in MySQL?

I do something like this:
SET foreign_key_checks = 0;
-- many different operations on several tables
SET foreign_key_checks = 1;
How can I verify that my entire database is consistent? I want to be sure that all relationships are properly maintained. For example, if I delete a "country" with id 20, I want to make sure that no "city" still references the non-existent country_id = 20.
It's easier if you do not SET foreign_key_checks = 0. If you keep constraint enforcement on, you can't create inconsistencies or broken references: you get an error if you try. So you should consider not turning off the FK checks if referential integrity is important.
If you do think you have inconsistencies, you must run a query like the following to verify there are no "orphans" that reference a parent that no longer exists:
SELECT cities.city_id
FROM cities
LEFT OUTER JOIN countries
ON cities.country_id = countries.country_id
WHERE countries.country_id IS NULL;
Because the JOIN condition is an equality on country_id, any matching countries row must have a non-NULL country_id. When there is no match, the left outer join returns NULL for all of the countries columns. So searching in the WHERE clause for countries.country_id IS NULL returns only the cities that have no match in the other table.
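The scenario from the question can be reproduced in a short sketch. Assumptions: Python's sqlite3 module stands in for MySQL, and SQLite's `PRAGMA foreign_keys` plays the role of `SET foreign_key_checks`. With enforcement on, deleting country 20 is rejected; with it off, the delete goes through and the LEFT OUTER JOIN query above finds the orphaned cities. One MySQL detail worth knowing: setting foreign_key_checks back to 1 does not re-scan existing rows, so orphans created while checks were off stay in place until a query like this one finds them.

```python
import sqlite3

# isolation_level=None: autocommit, so the PRAGMA changes take effect immediately
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # roughly SET foreign_key_checks = 1
conn.executescript("""
    CREATE TABLE countries (country_id INTEGER PRIMARY KEY);
    CREATE TABLE cities (
        city_id    INTEGER PRIMARY KEY,
        country_id INTEGER NOT NULL REFERENCES countries(country_id)
    );
    INSERT INTO countries VALUES (10), (20);
    INSERT INTO cities VALUES (1, 10), (2, 20), (3, 20);
""")

# With enforcement on, deleting a referenced parent fails.
try:
    conn.execute("DELETE FROM countries WHERE country_id = 20")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# With enforcement off (roughly SET foreign_key_checks = 0), the delete succeeds.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("DELETE FROM countries WHERE country_id = 20")

# The orphan check from the answer.
orphans = [r[0] for r in conn.execute("""
    SELECT cities.city_id
    FROM cities
    LEFT OUTER JOIN countries ON cities.country_id = countries.country_id
    WHERE countries.country_id IS NULL
    ORDER BY cities.city_id""")]
print(rejected, orphans)  # True [2, 3]
```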
You must do a separate query for each relationship in your database. This can be quite a chore, and if the tables are very large, it can take a long time.
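Much of that chore can be scripted. A minimal sketch, assuming the foreign key metadata has already been fetched as (table, column, referenced table, referenced column) rows, for example with the information_schema.KEY_COLUMN_USAGE query given in a later answer. `ForeignKey`, `quote`, `orphan_check_sql` and `orphan_checks` are hypothetical names, and composite foreign keys (several rows sharing one CONSTRAINT_NAME) are not handled.

```python
from collections.abc import Iterable
from typing import NamedTuple


class ForeignKey(NamedTuple):
    # Mirrors the four columns selected from information_schema.KEY_COLUMN_USAGE
    table: str
    column: str
    ref_table: str
    ref_column: str


def quote(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def orphan_check_sql(fk: ForeignKey) -> str:
    """Build a LEFT JOIN anti-join that counts rows whose non-NULL FK has no parent.
    The aliases c and p also keep self-referencing tables unambiguous."""
    c, p = quote(fk.table), quote(fk.ref_table)
    col, ref = quote(fk.column), quote(fk.ref_column)
    return (f"SELECT COUNT(*) FROM {c} AS c "
            f"LEFT JOIN {p} AS p ON p.{ref} = c.{col} "
            f"WHERE c.{col} IS NOT NULL AND p.{ref} IS NULL")


def orphan_checks(fks: Iterable[ForeignKey]) -> list[str]:
    """One orphan-check query per single-column foreign key."""
    return [orphan_check_sql(fk) for fk in fks]


for sql in orphan_checks([ForeignKey("cities", "country_id", "countries", "country_id")]):
    print(sql)
```

Each generated query returns a single count, so a nightly job only needs to report the relationships whose count is non-zero.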
I once had to do this many years ago in a buggy application that had no foreign key constraints (it used MyISAM tables). I ran a script to do all these orphan-checks every night, and eventually it grew to dozens of queries, and took hours to run.
Then comes the part that is even harder: once you do find some orphaned records, what do you do about them? Do you delete the orphans? Do you change their foreign key column to reference a parent record that does still exist? Do you restore the parent record? It could be any of these options, and you must have the orphaned records reviewed case by case, by someone with the knowledge and authority to choose how to resolve the issue.
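A sketch of keeping that review step explicit: snapshot the orphans first, then apply only the decisions someone has approved. Assumptions: Python's sqlite3 stands in for MySQL, cities.country_id is nullable here, and the reviewer decisions in `approved_detach` and `approved_delete` are hypothetical. Restoring the missing parent would be a third option, not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (country_id INTEGER PRIMARY KEY);
    CREATE TABLE cities (city_id INTEGER PRIMARY KEY, country_id INTEGER);  -- nullable
    INSERT INTO countries VALUES (10);
    INSERT INTO cities VALUES (1, 10), (2, 20), (3, 20);  -- country 20 is gone
""")

# Step 1: snapshot the orphans so they can be reviewed before anything changes.
orphans = conn.execute("""
    SELECT city_id, country_id FROM cities
    WHERE country_id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM countries
                      WHERE countries.country_id = cities.country_id)
    ORDER BY city_id""").fetchall()

# Step 2: apply the (hypothetical) reviewer decisions in one transaction.
approved_detach = [2]  # keep the city, clear its broken reference
approved_delete = [3]  # remove the city entirely
conn.executemany("UPDATE cities SET country_id = NULL WHERE city_id = ?",
                 [(i,) for i in approved_detach])
conn.executemany("DELETE FROM cities WHERE city_id = ?",
                 [(i,) for i in approved_delete])
conn.commit()

remaining = conn.execute("SELECT city_id, country_id FROM cities ORDER BY city_id").fetchall()
print(orphans)    # [(2, 20), (3, 20)]
print(remaining)  # [(1, 10), (2, None)]
```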
It's far better to keep the constraints enforced so you don't have to do that work.

Updating existing rows in MySQL and handling duplicate keys

I have a MySQL database containing data about the users of an application. This application is already in production, but improvements are added every day. The latest improvement I made changed the way data is collected and inserted into the database.
To be clearer: my database consists of 5 tables containing user data and 1 table that relates all of them through foreign keys. Together, these 5 foreign keys form the unique index of this "main table".
The issue is that one of the user-data tables changed its format, and I want to remove all of its data that is older than the change I made to my application (only from this table; the others must stay untouched). However, this data is referenced by foreign keys in the main table, and I can't just drop those rows from the main table because the other information in them is important. I tried changing the value of this table's foreign key specifically, but then, obviously, I get a duplicate-key problem on the unique index.
Reading around online, I found a solution to my problem using "INSERT ... ON DUPLICATE KEY UPDATE ...", but I'm not inserting data, just updating it. I have an idea of how to write a PHP program to update my database, but is there an easier solution? Is it possible to avoid these problems using just MySQL syntax?
It might be worth looking at the link below:
http://www.kavoir.com/2009/05/mysql-insert-if-doesnt-exist-otherwise-update-the-existing-row.html
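The linked technique is an "upsert": insert a row, or update the existing one if the unique key already exists. A minimal sketch, assuming Python's sqlite3 with SQLite 3.24 or newer, whose `INSERT ... ON CONFLICT ... DO UPDATE` stands in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`; `main_table` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE main_table (
        user_id   INTEGER NOT NULL,
        device_id INTEGER NOT NULL,
        hits      INTEGER NOT NULL,
        UNIQUE (user_id, device_id)  -- composite unique index built from the FK columns
    )""")

# SQLite upsert; 'excluded' refers to the row that failed to insert.
UPSERT = """
    INSERT INTO main_table (user_id, device_id, hits) VALUES (?, ?, ?)
    ON CONFLICT (user_id, device_id) DO UPDATE SET hits = hits + excluded.hits
"""
# Rough MySQL equivalent (VALUES() here is deprecated in newer MySQL versions):
#   INSERT INTO main_table (user_id, device_id, hits) VALUES (?, ?, ?)
#   ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)

for row in [(1, 7, 1), (1, 8, 1), (1, 7, 5)]:  # the third row collides with the first
    conn.execute(UPSERT, row)

rows = conn.execute(
    "SELECT user_id, device_id, hits FROM main_table ORDER BY device_id").fetchall()
print(rows)  # [(1, 7, 6), (1, 8, 1)]
```

For the pure-update case in the question, MySQL's `UPDATE IGNORE` (rows that would hit a duplicate key are left unchanged instead of aborting the statement) may also be worth reading up on.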

MySQL database merge of two databases

I set up two WordPress blogs a while ago, each obviously with its own database. More recently I merged these databases into one by changing the table prefixes, so these two 'entities' have the same number of tables with the same names (as they originate from a WordPress install) but with different prefixes, i.e.:
Blog1_tabledata1
Blog1_tabledata2
Blog1_tabledata3
Blog1_tabledata4
Blog2_tabledata1
Blog2_tabledata2
Blog2_tabledata3
Blog2_tabledata4
I have now realised that I need to merge these two databases (where they're both using the same tables) so that they can be used in the same WordPress instance (later separated by tags etc.).
What would be the simplest way of doing this?
(Please note I am asking this from a MySQL standpoint; this is not a WordPress question!)
If you are absolutely not looking for a WordPress solution, that means you are not looking at the domain at all. By this, I mean that you are not looking at what the data means. This could be a problem, but nevertheless:
1. Figure out the foreign keys. If the tables are MyISAM instead of InnoDB, they will be implicit. Figure out which ID points to which field.
2. Select from one DB and insert into the other. This means you add the rows of one table to the equivalent table of the target database. Auto-increment columns will be fine, but the foreign key fields (explicit AND implicit) are where the trouble starts.
3. If you insert, say, a user, the user gets a new ID, so you have to map the old user ID to the new one before you can insert the foreign keys with the right ID. This is tricky, and without making this a WordPress question there is no more help we can give you: you just have to figure out which rows they should be :). It is database/domain specific (by which I mean you can't figure it out just by looking at the fields; you must know something of what they mean).
4. If the DB is correct, this should work, but I'm not sure whether you will run into trouble with duplicates. Everything should go by ID, and you fixed that in step 3 with unique, connected IDs. But if your domain doesn't allow two accounts, two pages, or two of whatever (tags?) to have the same name, you still have a problem. Again, this is domain-specific logic, and you're specifically asking not to go there.
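The old-ID-to-new-ID mapping in step 3 can be sketched as follows: copy the parent rows first, record each new auto-increment ID, then copy the child rows while translating their foreign keys through that map. Assumptions: Python's sqlite3 stands in for MySQL (where `LAST_INSERT_ID()` or the driver's equivalent plays the role of `cursor.lastrowid`), and the `blog1_*`/`blog2_*` tables are hypothetical, heavily simplified stand-ins, not the real WordPress schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog1_users (id INTEGER PRIMARY KEY AUTOINCREMENT, login TEXT);
    CREATE TABLE blog1_posts (id INTEGER PRIMARY KEY AUTOINCREMENT, author_id INTEGER, title TEXT);
    CREATE TABLE blog2_users (id INTEGER PRIMARY KEY AUTOINCREMENT, login TEXT);
    CREATE TABLE blog2_posts (id INTEGER PRIMARY KEY AUTOINCREMENT, author_id INTEGER, title TEXT);
    INSERT INTO blog1_users (login) VALUES ('alice'), ('bob');
    INSERT INTO blog1_posts (author_id, title) VALUES (1, 'a1'), (2, 'b1');
    INSERT INTO blog2_users (login) VALUES ('carol');
    INSERT INTO blog2_posts (author_id, title) VALUES (1, 'c1');  -- 1 means carol in blog2
""")

# Copy parents first, remembering old id -> new id.
user_map = {}
for old_id, login in conn.execute("SELECT id, login FROM blog2_users ORDER BY id").fetchall():
    cur = conn.execute("INSERT INTO blog1_users (login) VALUES (?)", (login,))
    user_map[old_id] = cur.lastrowid

# Then copy children, translating the foreign key through the map.
for author_id, title in conn.execute(
        "SELECT author_id, title FROM blog2_posts ORDER BY id").fetchall():
    conn.execute("INSERT INTO blog1_posts (author_id, title) VALUES (?, ?)",
                 (user_map[author_id], title))
conn.commit()

merged = conn.execute("""
    SELECT p.title, u.login
    FROM blog1_posts p JOIN blog1_users u ON u.id = p.author_id
    ORDER BY p.id""").fetchall()
print(user_map)  # {1: 3}
print(merged)    # [('a1', 'alice'), ('b1', 'bob'), ('c1', 'carol')]
```

Without the map, post c1 would keep author_id 1 and silently become one of alice's posts.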

Nested foreign keys (with multiple tables)

I have three tables:
user: id
folders: id, user_id
words: id, folders_id
(there are more columns in every table, but they are irrelevant to this problem)
A user contains several folders and each folder contains several words.
So far so good. I interconnected these tables with foreign keys with the help of MySQL Workbench.
What MySQL Workbench does for the first connection (folders.user_id -> user.id) is what I expected. But when I add the second relationship (words.folders_id -> folders.id), it automatically generates an index over two columns: the expected folders.id, but also the column of the first foreign key, folders.user_id.
I always thought that duplicate data in a MySQL database is not a good solution. So why does MySQL Workbench propose doing it this way? The only advantage I can think of is that I can select the user's id from a word directly, without a JOIN. Is this the purpose of this index over two columns?
Thanks for explaining this phenomenon to me.
I haven't found a solution yet, but when I search Google Images for diagrams of relationships in MySQL Workbench, these nested foreign keys never appear. So I assume they are not necessary, and I just keep deleting them when MySQL Workbench creates them.
Why don't you write the query as follows:
select user.id from user
inner join folders on folders.user_id = user.id
inner join words on words.folders_id = folders.id
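With one single-column foreign key per relationship, the owner of a word is reachable through two joins, so the extra column is not needed for that lookup. (A likely cause of Workbench's behaviour, stated as an assumption: the folders-to-user relationship was drawn as an "identifying" relationship, which makes user_id part of the folders primary key, so anything referencing folders has to reference both columns.) A minimal runnable sketch, with Python's sqlite3 standing in for MySQL, made-up rows, and a hypothetical `owner_of_word` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user    (id INTEGER PRIMARY KEY);
    CREATE TABLE folders (id INTEGER PRIMARY KEY,
                          user_id INTEGER NOT NULL REFERENCES user(id));
    CREATE TABLE words   (id INTEGER PRIMARY KEY,
                          folders_id INTEGER NOT NULL REFERENCES folders(id));
    -- one single-column index per foreign key is enough for the joins below
    CREATE INDEX idx_folders_user ON folders(user_id);
    CREATE INDEX idx_words_folder ON words(folders_id);
    INSERT INTO user VALUES (1), (2);
    INSERT INTO folders VALUES (100, 1), (200, 2);
    INSERT INTO words VALUES (5000, 100), (5001, 200);
""")

def owner_of_word(word_id):
    """Return the id of the user who owns the given word, or None if not found."""
    row = conn.execute("""
        SELECT user.id
        FROM user
        INNER JOIN folders ON folders.user_id = user.id
        INNER JOIN words   ON words.folders_id = folders.id
        WHERE words.id = ?""", (word_id,)).fetchone()
    return row[0] if row else None

print(owner_of_word(5000), owner_of_word(5001))  # 1 2
```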

MySQL: deduce foreign key relationships for random queries

I am a MySQL novice and am looking for a solution to the following problem:
I would like to create a CMS with cppcms that supports modules. Since I want to reduce the chance of (accidental) access to private data, I want a module that handles data access and rights. Since this module is supposed to be unaware of the data structures created by other modules, I would like it to deduce the data owner through foreign key relations. My idea is to search for a path (over foreign keys) that links a row to a user id.
To sum up:
What I am trying to do
Taking a random query, determine the affected rows
for the affected rows determine a relationship/path (via foreign keys) to a user/userid (a column in an existing table)
return only the rows for which a relationship could be determined and a condition holds (e.g. the userid found in the related query matches a fixed user id, such as the user currently accessing the system)
(As far as I know foreign keys only enforce the existence of a key in another table, however the precondition I assume is, that every row is linked to a user over a path of foreign key relations)
My Problem/Question:
Is there an existing solution or a better approach to the problem? Prepared statements won't do the trick, since I don't know all data structures/queries in advance.
How do I get the foreign key relations? Is there another way besides "SHOW CREATE TABLE" and then parsing the result string?
How can I determine the rows that would be affected, without modifying them? I would like to filter this set afterwards by determining whether I can link it to the current user (not the MySQL user but the system user).
Could I try executing the query, then select the affected rows, and if I detect an access violation simply do a rollback? The problem with this: how do I apply the changes only to the subset of rows for which it is legal (e.g. I attempt to change 5 rows but may only change 2; how do I change only those 2)? One idea was to find a way to create a temporary table with the result set, but this solution has several drawbacks: foreign key relations are not possible for temporary tables, so they are 'lost'.
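One common way to change only the permitted subset is to turn the requested WHERE clause into a read-only preview of primary keys, filter those by ownership, and then modify exactly the surviving ids, all inside one transaction. A minimal sketch under stated assumptions: Python's sqlite3 stands in for MySQL, ownership is a plain `user_id` column (a simplification of the FK-path idea), `guarded_update` is a hypothetical helper, and `where_sql` comes from trusted application code, never from end users.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # transactions are explicit below
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT);
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 1, 'c'), (4, 3, 'd'), (5, 3, 'e');
""")

def guarded_update(where_sql, params, new_body, current_user):
    """Apply 'UPDATE posts SET body = ? WHERE <where_sql>' only to rows owned by current_user."""
    conn.execute("BEGIN")
    try:
        # 1. Preview: reuse the requested WHERE clause, but only read ids and owners.
        #    (In MySQL/InnoDB, SELECT ... FOR UPDATE would also lock these rows.)
        candidates = conn.execute(
            f"SELECT id, user_id FROM posts WHERE {where_sql}", params).fetchall()
        # 2. Filter: keep only the rows the current user owns.
        allowed = [pid for pid, owner in candidates if owner == current_user]
        # 3. Apply: modify exactly the allowed rows, addressed by primary key.
        conn.executemany("UPDATE posts SET body = ? WHERE id = ?",
                         [(new_body, pid) for pid in allowed])
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return len(candidates), allowed

attempted, changed = guarded_update("id <= ?", (5,), "edited", current_user=1)
bodies = conn.execute("SELECT id, body FROM posts ORDER BY id").fetchall()
print(attempted, changed)  # 5 [1, 3]
```

This mirrors the "attempt to change 5 rows, may only change 2" case without needing a temporary table.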
P.S.: I am coding in C++, therefore I would prefer C++-compatible library recommendations, but I am open to other suggestions. While googling I stumbled over Doctrine and I am currently researching it.
P.P.S.: Database engine is InnoDB (has to because of the foreign keys)
UPDATE: Explanation Attempt of Part 2:
I am trying to filter which columns of which tables a user is allowed to see. To do so, I would like to find a connection in the database over foreign keys (foreign keys ensure that I can reach all data via joins, and they hint at which columns I have to join on). Since I plan on a more complex system (e.g. a forum), I don't want to join all data into a temporary table and run the user's query on that. I would rather evaluate the user query and check whether I can map each result row, via a join, to the user's id. For example, I could use this to enforce that an edit button is only enabled for posts created by the user. (I know there are easier ways to do this, but I basically want to allow programmers to write their own queries without giving them the chance to edit or view data they are not allowed to see. My assumption is that the programmer is not an evildoer but simply forgets constraints, so I want to enforce them in software.)
Getting here would be pretty good, but I have a little more complex need.
First, a basic example. Let's say it's like Facebook, and all of a person's friends are allowed to see his pictures.
pictures = id **userid** file (bool)visibleForFriends album
friendship = **userid1** **userid2**
users = userid
What I want to happen is:
1. Programmer inputs "SELECT * FROM pictures WHERE album=2"
2. System gets all matching records (e.g. a set of ids)
3. System sees the foreign key userid, tries to match the current userid against the pictures' userid, and adds all matches to the returned result
4. System notices the special column visibleForFriends
5. System tries to determine all friends (SELECT userid1 FROM friendship WHERE userid2=currentUserID UNION SELECT userid2 FROM friendship WHERE userid1=currentUserID)
6. System adds all rows where visibleForFriends is true and pictures.userid is in the result from step 5.
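Written by hand for this one table, steps 2 to 6 collapse into a single filtered query, with the friends lookup from step 5 as a UNION of both directions of the friendship table. A minimal sketch, assuming Python's sqlite3 as a stand-in for MySQL, the schema above, made-up rows, and a hypothetical `visible_pictures` helper; it does not do the automatic foreign key discovery the question is ultimately after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE friendship (userid1 INTEGER, userid2 INTEGER);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, userid INTEGER, file TEXT,
                           visibleForFriends INTEGER, album INTEGER);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO friendship VALUES (2, 1);  -- users 1 and 2 are friends
    INSERT INTO pictures VALUES
        (1, 1, 'mine.jpg',     0, 2),
        (2, 2, 'friend.jpg',   1, 2),
        (3, 2, 'private.jpg',  0, 2),
        (4, 3, 'stranger.jpg', 1, 2);
""")

VISIBLE_PICTURES = """
    SELECT id FROM pictures
    WHERE album = :album
      AND (userid = :me                           -- step 3: own pictures
           OR (visibleForFriends = 1              -- steps 4 and 6
               AND userid IN (                    -- step 5: friends, both directions
                   SELECT userid1 FROM friendship WHERE userid2 = :me
                   UNION
                   SELECT userid2 FROM friendship WHERE userid1 = :me)))
    ORDER BY id"""

def visible_pictures(me, album):
    return [r[0] for r in conn.execute(VISIBLE_PICTURES, {"me": me, "album": album})]

print(visible_pictures(1, 2))  # [1, 2]
print(visible_pictures(3, 2))  # [4]
```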
While the friendship part is some extra code (I think it's doable once I get started on the first bit), I still need to figure out how to automatically follow the foreign keys to see the connection. Ignoring the friendship special case, I would like the system to work on this as well:
pictures = id **albumid** file (bool)visibleForFriends album
albums = id **userid**
users = userid
Now the system should go pictures.albumid ==> albums.id -> albums.userid ==> users.userid.
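Following foreign keys automatically is a graph search: tables are nodes and each foreign key is a child-to-parent edge. A minimal sketch using breadth-first search, assuming the edges come as (TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME) rows, the shape returned by the information_schema query in the answer below. `FK_ROWS`, `fk_path` and `ownership_join` are hypothetical names, and a path that visits the same table twice would additionally need table aliases.

```python
from collections import deque

# Hypothetical FK metadata for the second example.
FK_ROWS = [
    ("pictures", "albumid", "albums", "id"),
    ("albums", "userid", "users", "userid"),
]

def fk_path(fk_rows, start, target):
    """Breadth-first search along FK edges (child -> parent).
    Returns the shortest list of edges from start to target, or None."""
    edges = {}
    for row in fk_rows:
        edges.setdefault(row[0], []).append(row)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for edge in edges.get(table, []):
            parent = edge[2]
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, path + [edge]))
    return None

def ownership_join(path):
    """Turn an FK path into a JOIN chain that can be appended to a query on the start table."""
    return " ".join(
        f"JOIN {ref_table} ON {ref_table}.{ref_col} = {table}.{col}"
        for table, col, ref_table, ref_col in path)

path = fk_path(FK_ROWS, "pictures", "users")
print(ownership_join(path))
# JOIN albums ON albums.id = pictures.albumid JOIN users ON users.userid = albums.userid
```

A filter such as `WHERE users.userid = ?` can then be added after the generated joins.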
I hope the examples clarify the question a bit. One problem is that in step 1 of the example (programmer query input), I don't want to let "DELETE *" take effect on anything not owned by the user. So I have to filter which rows actually get deleted.
In response to part 1 of your question: provided the MySQL user you access the database with has access rights to information_schema, you can use the following query to list the existing foreign key relations within a specific database:
SELECT
TABLE_NAME,
COLUMN_NAME,
REFERENCED_TABLE_NAME,
REFERENCED_COLUMN_NAME
FROM
information_schema.KEY_COLUMN_USAGE
WHERE
TABLE_SCHEMA = 'dbname' AND REFERENCED_COLUMN_NAME IS NOT NULL;
I am slightly confused by the part 2 and am unsure how to give an appropriate response to this section. I hope you find the above query helpful though in your project!
Is there an existing solution/Better approach to the problem?
Yes, I think so. You're describing a multi-tenant database. In a multi-tenant database in which the users share tables (also known as "shared everything"), each table should have a column for the user id. In effect, each row knows its owner.
This will vastly simplify your SQL, since you need no joins to determine who a row belongs to. It will probably speed up your SQL a lot, too.
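A minimal sketch of the "each row knows its owner" idea, with Python's sqlite3 standing in for MySQL; the pictures table, its rows and the `delete_album` helper are hypothetical. Because the owner predicate is added by the data-access function rather than by the caller, a forgotten condition cannot delete other users' rows, which also covers the DELETE concern from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, album INTEGER);
    CREATE INDEX idx_pictures_owner ON pictures(user_id);  -- owner lookups without joins
    INSERT INTO pictures VALUES (1, 1, 2), (2, 1, 2), (3, 2, 2);
""")

def delete_album(current_user, album):
    """Delete an album's pictures, restricted to rows owned by current_user."""
    cur = conn.execute("DELETE FROM pictures WHERE album = ? AND user_id = ?",
                       (album, current_user))
    conn.commit()
    return cur.rowcount  # number of rows actually deleted

deleted = delete_album(1, 2)
left = conn.execute("SELECT id, user_id FROM pictures ORDER BY id").fetchall()
print(deleted, left)  # 2 [(3, 2)]
```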
This SO answer has a decent summary of the issues and alternatives.