Best-practice advice for deleting tables in a PHP/MySQL framework?

What are some best-practice tips for tinkering, deleting tables, and making reversible changes on a MySQL testing server (not production)? In my case I'm learning a PHP/MySQL framework.
The only general tool in my toolbox is renaming files before I delete them; if there is a problem, I can always restore a file to its original name. I would imagine the same practice can be applied to a database, since clients can lose their connection to a host. Yet how does a web application framework proceed when referential integrity is broken in only one place?

I guess you are referring to transactions. The InnoDB engine in MySQL supports transactions as well as foreign key constraints.
In a transactional design, you can execute a bunch of queries that need to run as a single unit in order to be meaningful and to maintain data integrity. A transaction is started; if something goes wrong, a rollback reverts every change made so far, otherwise the entire set of modifications is committed to the database.
Foreign keys are constraints on referential data. In a master-detail relationship you cannot, for example, refer to a master record that does not exist. If there is a comments table with a user_id column referring to the users.id field, you are not allowed to enter a comment for a non-existent user.
Read more on the InnoDB transaction model:
http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-model.html
and on foreign keys:
http://dev.mysql.com/doc/refman/5.0/en/innodb-foreign-key-constraints.html
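The rollback behaviour is easy to demonstrate in a few lines. Here is a minimal sketch using Python's sqlite3 module as a self-contained stand-in for MySQL/InnoDB (the users/comments schema matches the hypothetical example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce FK constraints
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), body TEXT)"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("INSERT INTO comments (user_id, body) VALUES (1, 'ok')")
        # This violates the FK constraint (no user 99), aborting the batch:
        conn.execute("INSERT INTO comments (user_id, body) VALUES (99, 'bad')")
except sqlite3.IntegrityError:
    pass

# The first insert was rolled back along with the failing one:
count = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
print(count)  # 0
```

With MySQL from PHP the pattern is the same: PDO's beginTransaction(), then the statements, then commit(), or rollBack() in the catch block.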

Related

How to fill for the first time a SQL database with multiple tables

I have a general question about how to fill a database for the first time. At the moment I work on "raw" datasets in R (dataframes I've built to explore the data and get insights quickly), but I now need to structure and load everything into a relational database.
The DB design is done (conceptual, logical, and 3NF). The result is a fairly "complex" (it's all relative) data model with many junction tables and foreign keys between tables.
My question is: what is the easiest way for me to populate this DB?
My approach would be to generate a .csv for each table from my "raw" dataframes in R and then load them table by table into the DB. Is that a good way to do it, or is there an easier method? Another point: how do I avoid struggling with FK constraints while populating?
Thank you very much for the answers. I realize these are very "methodological" questions, but I can't find any related tutorial/thread.
Notes: I work with R (dplyr, etc.) and MySQL
A serious relational database, such as Postgres for example, will offer features for populating a large database.
Bulk loading
Look for commands that read in external data to be loaded into a table with a matching field structure. The data moves directly from a file in the OS's file system into the table. This is vastly faster than loading individual rows with the usual SQL INSERT. Such commands are not standardized, so you must look for the proprietary commands in your particular database engine.
In Postgres that would be the COPY command.
Temporarily disabling referential-integrity
Look for commands that defer enforcing the foreign key relationship rules until after the data is loaded.
In Postgres, use SET CONSTRAINTS … DEFERRED to not check constraints during each statement, and instead wait until the end of the transaction.
Alternatively, if your database lacks such a feature, you could drop your constraints before the mass import and re-establish them afterwards. But beware: this may affect all other transactions in all other database connections. If you know the database has no other users, this may be workable.
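As a rough illustration of the "disable checks, load, re-enable" approach, here is a sketch in Python using SQLite's foreign_keys pragma as a stand-in for MySQL's SET foreign_key_checks = 0 (or Postgres's deferred constraints); the master/detail schema is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE detail (id INTEGER PRIMARY KEY, "
    "master_id INTEGER REFERENCES master(id))"
)

conn.execute("PRAGMA foreign_keys = OFF")  # suspend FK checks for the load
# Rows can now arrive out of dependency order without errors:
conn.execute("INSERT INTO detail (id, master_id) VALUES (1, 10)")
conn.execute("INSERT INTO master (id) VALUES (10)")
conn.commit()
conn.execute("PRAGMA foreign_keys = ON")  # re-enable once the load is done

# Verify the references line up after the load:
count = conn.execute(
    "SELECT COUNT(*) FROM detail JOIN master ON detail.master_id = master.id"
).fetchone()[0]
print(count)  # 1
```

The caveat from above applies: while checks are off, nothing protects you from loading genuinely orphaned rows, so validate the references yourself once the import finishes.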
Other issues
For other issues to consider, see the Populating a Database in the Postgres documentation (whether you use Postgres or not).
Disable Autocommit
Use COPY (for mass import, mentioned above)
Remove Indexes
Remove Foreign Key Constraints (mentioned above)
Increase maintenance_work_mem (changing the memory allocation of your database engine)
Increase max_wal_size (changing the configuration of your database engine’s write-ahead log)
Disable WAL Archival and Streaming Replication (consider moving a copy of your database to replicant server(s) rather than letting replication move the mass data)
Run ANALYZE Afterwards (remind your database engine to survey the new state of the data, for use by its query planner)
Database migration
By the way, you will likely find a database migration tool helpful in creating the tables and columns, and possibly in loading the data. Consider tools such as Flyway or Liquibase.

What do you think about cascading deletions on mysql tables?

The question is in the title!
The database I'm using to store data from my (production) website contains a lot of ON DELETE CASCADE.
I'd just like to know whether that's a good thing, or whether it's better to code all deletions manually.
On the one hand, it's not very explicit: deletions happen by magic. On the other hand, it makes development easier: I don't have to keep the entire schema of my database in mind.
I think maintaining referential integrity is a good thing to be doing. The last thing you'd want is orphaned rows in your database.
See the MySQL documentation on things to consider when not using referential integrity:
MySQL gives database developers the choice of which approach to use. If you don't need foreign keys and want to avoid the overhead associated with enforcing referential integrity, you can choose another storage engine instead, such as MyISAM. (For example, the MyISAM storage engine offers very fast performance for applications that perform only INSERT and SELECT operations. In this case, the table has no holes in the middle and the inserts can be performed concurrently with retrievals. See Section 8.10.3, “Concurrent Inserts”.)
If you choose not to take advantage of referential integrity checks, keep the following considerations in mind:
In the absence of server-side foreign key relationship checking, the application itself must handle relationship issues. For example, it must take care to insert rows into tables in the proper order, and to avoid creating orphaned child records. It must also be able to recover from errors that occur in the middle of multiple-record insert operations.
If ON DELETE is the only referential integrity capability an application needs, you can achieve a similar effect as of MySQL Server 4.0 by using multiple-table DELETE statements to delete rows from many tables with a single statement. See Section 13.2.2, “DELETE Syntax”.
A workaround for the lack of ON DELETE is to add the appropriate DELETE statements to your application when you delete records from a table that has a foreign key. In practice, this is often as quick as using foreign keys and is more portable.
Be aware that the use of foreign keys can sometimes lead to problems:
Foreign key support addresses many referential integrity issues, but it is still necessary to design key relationships carefully to avoid circular rules or incorrect combinations of cascading deletes.
It is not uncommon for a DBA to create a topology of relationships that makes it difficult to restore individual tables from a backup. (MySQL alleviates this difficulty by enabling you to temporarily disable foreign key checks when reloading a table that depends on other tables. See Section 14.3.5.4, “FOREIGN KEY Constraints”. As of MySQL 4.1.1, mysqldump generates dump files that take advantage of this capability automatically when they are reloaded.)
Source: http://dev.mysql.com/doc/refman/5.5/en/ansi-diff-foreign-keys.html
Cascading deletes are a great tool for you to use provided you make sure only to use them where it makes perfect sense to do so.
The main situation in which you would opt for using a cascading delete is when you have a table that models entities that are "owned" by one (and only one) row in another table. For example, if you have a table that models people and a table that models phone numbers. Here your phone numbers table would have a foreign key to your people table. Now if you decide you no longer want your application to keep track of someone - say "Douglas" - it makes perfect sense that you don't want to keep track of Douglas's phone numbers any more, either. There is no sense in having a phone number floating around in your database and not know whose it is.
But at the same time, when you want to delete a person from the "people" table, you don't want to first have to laboriously check whether you have any phone numbers for that person and delete them. Why do that when you can encode into the database structure the rule that when a person is deleted, their phone numbers can all go as well? That is what a cascading delete will do for you. Just make sure you know what cascading deletes you have, and that they all make sense.
NB. If you use triggers, you need to be more careful. MySQL doesn't fire triggers on cascading deletes.
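The people/phone-numbers example can be sketched in a few runnable lines, here in Python with sqlite3 standing in for MySQL/InnoDB (which behaves the same way with ON DELETE CASCADE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE phone_numbers (id INTEGER PRIMARY KEY, number TEXT, "
    "person_id INTEGER REFERENCES people(id) ON DELETE CASCADE)"
)
conn.execute("INSERT INTO people (id, name) VALUES (1, 'Douglas')")
conn.execute("INSERT INTO phone_numbers (number, person_id) VALUES ('555-0100', 1)")
conn.execute("INSERT INTO phone_numbers (number, person_id) VALUES ('555-0101', 1)")

# Deleting Douglas takes his phone numbers with him -- no manual cleanup:
conn.execute("DELETE FROM people WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM phone_numbers").fetchone()[0]
print(remaining)  # 0
```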

Setting MySQL unique key or checking for duplicate in application part?

Which one is more reliable and has better performance? Setting MySQL unique key and using INSERT IGNORE or first checking if data exists on database and act according to the result?
If the answer is the second one, is there any way to make a single SQL query instead of two?
UPDATE: I ask because my colleagues at the company where I work believe that such issues should be handled in the application layer, which is more reliable according to them.
Your application won't catch duplicates.
Two concurrent calls can insert the same data, because neither process sees the other while your application checks for uniqueness. Each process thinks it's OK to INSERT.
You can force some kind of serialisation, but then you have a bottleneck and a performance limit. And you will have other clients writing to the database, even if it is just a release script.
That is why there are such things as unique indexes and constraints generally. Foreign keys, triggers, check constraints, NULL/NOT NULL, and datatype constraints are all there to enforce data integrity.
There is also the arrogance of some code monkey thinking they can do better.
See programmers.se: Constraints in a relational databases - Why not remove them completely? and this Enforcing Database Constraints In Application Code (SO)
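The point about the constraint catching what the application misses can be shown concretely. A sketch in Python using sqlite3, whose INSERT OR IGNORE plays the role of MySQL's INSERT IGNORE (table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, addr TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_emails_addr ON emails (addr)")

# Two writers both decided the address was new (as in the race described
# above) and both attempt the insert; the unique index absorbs the second:
conn.execute("INSERT OR IGNORE INTO emails (addr) VALUES ('a@example.com')")
conn.execute("INSERT OR IGNORE INTO emails (addr) VALUES ('a@example.com')")

total = conn.execute("SELECT COUNT(*) FROM emails").fetchone()[0]
print(total)  # 1 -- the duplicate was silently dropped by the constraint
```

An application-level SELECT-then-INSERT cannot give this guarantee without extra locking, because the duplicate can arrive between the check and the insert.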
Setting a unique key is better. It will reduce the number of round-trips to MySQL you need for a single operation, and item uniqueness is ensured, reducing errors caused by your own logic.
You definitely should set a unique key in your MySQL table, no matter what you decide.
As for the other part of your question: definitely use INSERT IGNORE (or INSERT ... ON DUPLICATE KEY UPDATE) if that is what you intend for your application.
I.e. if you're going to load a bunch of data and you don't care what the old data was, you just want the new data, that is the way to go.
On the other hand, if there is some sort of decision branch that is based on whether the change is an update or a new value, I think you would have to go with option 2.
I.e. If changes to the table are recorded in some other table (e.g. table: change_log with columns: id,table,column,old_val,new_val), then you couldn't just use INSERT IGNORE because you would never be able to tell which values were changed vs. which were newly inserted.

Session / Log tables keys design question

I have almost always heard people say not to use FKs with user-session and log tables, as those are usually high-write tables, and once written the data almost always stays forever without any updates or deletes.
But the question is, I have columns like these:
User_id (link a session or activity log to the user)
activity_id (linking the log activity table to the system activity lookup table)
session_id (linking the user log table with the parent session)
... and there are 4-5 more columns.
So if I don't use FKs, then how will I "relate" these columns? Can I join tables and get the user info without FKs? Can I write correct data without FKs? Is there any performance impact, or do people just talk and say this is a no-no?
Another question I have: if I don't use FKs, can I still connect my data with lookup tables?
In fact, you can build the whole database without real FKs in MySQL. If you're using MyISAM as the storage engine, the FKs aren't real anyway.
You can nevertheless do all the joins you like, as long as the join keys match.
Performance impact depends on how much data you stuff into a referenced table. It takes extra time if you have a FK in a table and insert data into it, or update a FK value. Upon insertion or modification, the FK needs to be looked up in the referenced table to ensure the reference integrity.
On highly used tables which don't really need reference integrity, I'd just stick with loose columns instead of FKs.
AFAIK InnoDB is currently the only one supporting real foreign keys (unless MySQL 5.5 got new or updated storage engines which support them as well). Storage engines like MyISAM do support the syntax, but don't actually validate the referential integrity.
FK's can be detrimental in "history log" tables. This kind of table wants to preserve the exact state of what happened at a point in time.
The problem with FK's is that they don't store the value, just a pointer to it. If the value changes, the history is lost. You DO NOT WANT updates to cascade into your history log. It's OK to have a "fake" foreign key that you can join on, but you also want to intentionally de-normalize the relevant fields to preserve the history.
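A rough sketch of that de-normalization, in Python with sqlite3 (the products/order_log schema is invented for illustration): the log keeps a copy of the value at the time of the event, so a later rename doesn't rewrite history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE order_log (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,   -- 'fake' foreign key: joinable, no constraint
        product_name TEXT     -- snapshot of the name at order time
    )
""")
conn.execute("INSERT INTO products (id, name) VALUES (1, 'Widget v1')")
conn.execute(
    "INSERT INTO order_log (product_id, product_name) "
    "SELECT id, name FROM products WHERE id = 1"
)

# The product is later renamed; the log still shows what was ordered:
conn.execute("UPDATE products SET name = 'Widget v2' WHERE id = 1")
logged = conn.execute("SELECT product_name FROM order_log").fetchone()[0]
print(logged)  # Widget v1
```

Had the log stored only product_id, the rename would have silently changed every historical record it appeared in.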

Drupal MySQL database design question

I was just looking at the MySql database created by drupal after I installed it.
All the tables are in MyISAM.
With a complex software like drupal, wouldn't it make more sense to
use foreign keys and hence InnoDB tables to enforce referential integrity?
Without foreign keys all the constraint checking will happen at the
PHP end.
MySQL offers a variety of database engines for a reason - different engines offer different advantages and disadvantages. InnoDB is a great engine that offers referential integrity as well as transaction safety, but it is poorly optimized for the use case of a web site, where you have an order of magnitude more reads than writes.
MyISAM offers the best performance for a web site where most hits need only read access to the database. In such cases referential integrity can most often be maintained by writing your data inserts and deletes in a way that they cannot succeed if they compromise integrity.
For example, instead of writing
DELETE FROM mytable WHERE id = 5
you can write
DELETE mytable FROM mytable LEFT JOIN linkedtable ON mytable.id=linkedtable.ref WHERE id = 5 AND linkedtable.ref IS NULL
This will succeed in deleting the row only when there are no external references to it.
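The same guard can be expressed portably with NOT EXISTS. Here is a runnable sketch in Python with sqlite3 (which lacks MySQL's multi-table DELETE syntax, hence the subquery form); the mytable/linkedtable names follow the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE linkedtable (id INTEGER PRIMARY KEY, ref INTEGER)")
conn.execute("INSERT INTO mytable (id) VALUES (5)")
conn.execute("INSERT INTO linkedtable (ref) VALUES (5)")  # external reference

guarded_delete = (
    "DELETE FROM mytable WHERE id = ? AND NOT EXISTS "
    "(SELECT 1 FROM linkedtable WHERE linkedtable.ref = mytable.id)"
)
cur = conn.execute(guarded_delete, (5,))
print(cur.rowcount)  # 0 -- still referenced, so nothing was deleted

conn.execute("DELETE FROM linkedtable WHERE ref = 5")
cur = conn.execute(guarded_delete, (5,))
print(cur.rowcount)  # 1 -- reference gone, row deleted
```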