MySQL 5.6 information_schema.REFERENTIAL_CONSTRAINTS inconsistency

I'm building a Docker image with MySQL 5.6 inside (for tests/CI purposes only). The build creates the schema and all the tables, and then I commit the Docker container as an image. I intentionally remap the DATADIR to a specific location to keep it inside the container and not on a volume.
Base image is mysql/mysql-server:5.6
I noticed this super strange behavior:
Every time I start the container and make the following query for the first time:
select * from information_schema.REFERENTIAL_CONSTRAINTS;
I see something like:
+-----+------------------------+-----+
| ... | UNIQUE_CONSTRAINT_NAME | ... |
+-----+------------------------+-----+
| | NULL | |
| | PRIMARY | |
| | PRIMARY | |
| | PRIMARY | |
| | NULL | |
| | PRIMARY | |
| | NULL | |
But if I call it again straight away I see
+-----+------------------------+-----+
| ... | UNIQUE_CONSTRAINT_NAME | ... |
+-----+------------------------+-----+
| | PRIMARY | |
| | PRIMARY | |
| | PRIMARY | |
| | PRIMARY | |
| | PRIMARY | |
| | PRIMARY | |
| | PRIMARY | |
So after the first call, some of the constraints that initially had NULL values somehow became "initialised".
The code that creates these foreign keys is similar for all of them, so I have no idea why some of them have NULLs and some don't.
...FOREIGN KEY (campaign) REFERENCES coupon_campaign(_id) ON DELETE CASCADE,...
Why does this matter? I use JOOQ and want to run jooq-codegen against the created container. Because JOOQ uses these system tables to describe the data structure, it produces inconsistent results.
Could somebody please hint why this might happen?
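To narrow this down, a query along these lines lists only the affected foreign keys right after the container starts. It is just a diagnostic sketch; the column names are the ones information_schema.REFERENTIAL_CONSTRAINTS exposes in MySQL 5.6, and nothing here explains the cause.

```sql
-- Lists only the foreign keys whose UNIQUE_CONSTRAINT_NAME is reported as NULL.
SELECT CONSTRAINT_SCHEMA,
       TABLE_NAME,
       CONSTRAINT_NAME,
       REFERENCED_TABLE_NAME,
       UNIQUE_CONSTRAINT_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE UNIQUE_CONSTRAINT_NAME IS NULL;
```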

Index not used in query. How to improve performance?

I have this query:
SELECT
*
FROM
`av_cita`
JOIN `av_cita_cstm` ON (
(
`av_cita`.`id` = `av_cita_cstm`.`id_c`
)
)
WHERE
av_cita.deleted = 0
This query takes over 120 seconds to finish, yet I have added all indexes.
When I ask for the execution plan:
explain SELECT * FROM `av_cita`
JOIN `av_cita_cstm` ON ( ( `av_cita`.`id` = `av_cita_cstm`.`id_c` ) )
WHERE av_cita.deleted = 0;
I get this:
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
| 1 | SIMPLE | av_cita | ALL | PRIMARY,delete_index | NULL | NULL | NULL | 192549 | Using where |
| 1 | SIMPLE | av_cita_cstm | eq_ref | PRIMARY | PRIMARY | 108 | rednacional_v2.av_cita.id | 1 | |
+----+-------------+--------------+--------+----------------------+---------+---------+---------------------------+--------+-------------+
delete_index is listed in the possible_keys column, but the key is null, and it doesn't use the index.
Table and index definitions:
+------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| id | char(36) | NO | PRI | NULL | |
| name | varchar(255) | YES | MUL | NULL | |
| date_entered | datetime | YES | MUL | NULL | |
| date_modified | datetime | YES | | NULL | |
| modified_user_id | char(36) | YES | | NULL | |
| created_by | char(36) | YES | MUL | NULL | |
| description | text | YES | | NULL | |
| deleted | tinyint(1) | YES | MUL | 0 | |
| assigned_user_id | char(36) | YES | MUL | NULL | |
+------------------+--------------+------+-----+---------+-------+
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| av_cita | 0 | PRIMARY | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | delete_index | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | name_index | 1 | name | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | date_entered_index | 1 | date_entered | A | 96393 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | created_by | 1 | created_by | A | 123 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | assigned_user_id | 1 | assigned_user_id | A | 1276 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 1 | deleted | A | 2 | NULL | NULL | YES | BTREE | | |
| av_cita | 1 | deleted_id | 2 | id | A | 192786 | NULL | NULL | | BTREE | | |
| av_cita | 1 | id | 1 | id | A | 192786 | NULL | NULL | | BTREE | | |
+---------+------------+--------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
How can I improve the performance of this query?
The query is losing time on the join. I would strongly suggest creating an index on av_cita_cstm.id_c. The plan will then probably change to use that index for the av_cita_cstm table, which is much better than PRIMARY. As a consequence, PRIMARY will be used on av_cita.
I think that will bring a big improvement. You might get further improvement if you make sure delete_index is defined on two fields, (deleted, id), and then move the WHERE condition of the SQL statement into the join condition. But I am not sure MySQL will see this as a possibility.
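A minimal sketch of that second suggestion, using only the tables shown in the question. Note that the SHOW INDEX output in the question already lists an index named deleted_id on (deleted, id), so the composite index may already exist and only the query rewrite would be new. For an inner join the optimizer is generally free to treat a condition in ON the same way as one in WHERE, so the plan may not change at all.

```sql
-- Check which indexes already include the `deleted` column before adding another one.
SHOW INDEX FROM av_cita WHERE Column_name = 'deleted';

-- Same query with the filter moved from WHERE into the join condition.
EXPLAIN
SELECT *
FROM av_cita
JOIN av_cita_cstm
  ON av_cita.id = av_cita_cstm.id_c
 AND av_cita.deleted = 0;
```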
The index on deleted is probably not used because the optimizer has decided that a full table-scan is cheaper than using the index. MySQL tends to make this decision if the value you search for is found on about 20% or more of the rows in the table.
By analogy, think of the index at the back of a book. You can understand why common words like "the" aren't indexed. It would be easier to just read the book cover-to-cover than to flip back and forth to the index, which only tells you that "the" appears on a majority of pages.
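To check whether that reasoning plausibly applies here, you could count how the rows split across the values of deleted. This is only a sketch against the av_cita table from the question, and the 20% figure above is a rule of thumb rather than an exact cutoff.

```sql
-- Share of the table taken by each value of `deleted`.
SELECT deleted,
       COUNT(*) AS row_count,
       COUNT(*) / (SELECT COUNT(*) FROM av_cita) AS fraction_of_table
FROM av_cita
GROUP BY deleted;
```

If deleted = 0 covers most of the table, a full scan is likely the cheaper plan and the optimizer's choice is reasonable.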
If you think MySQL has made the wrong decision, you can make it pretend that a table-scan is more expensive than using a specific index:
SELECT
*
FROM
`av_cita` FORCE INDEX (delete_index)
JOIN `av_cita_cstm` ON (
(
`av_cita`.`id` = `av_cita_cstm`.`id_c`
)
)
WHERE
av_cita.deleted = 0
Read http://dev.mysql.com/doc/refman/5.7/en/index-hints.html for more information about index hints. Don't overuse index hints; they're useful only in rare cases. Most of the time the optimizer makes the right decision.
Your EXPLAIN plan shows that your join to av_cita_cstm is already using a unique index (the clue is "type: eq_ref" and also the "rows: 1"). I don't think any new index is needed in that table.
I notice the EXPLAIN shows that the table-scan on av_cita scans about an estimated 192549 rows. I'm really surprised that this takes 120 seconds. On any reasonably powerful computer, that should run much faster.
That makes me wonder if you have something else that needs tuning or configuration on this server:
What other processes are running on the server? A lot of applications, perhaps? Are the other processes also running slowly on this server? Do you need to increase the power of the server, or move applications onto their own server?
If you're on MySQL 5.7, try querying the sys schema:
select * from sys.innodb_buffer_stats_by_table
where object_name like 'av_cita%';
Are there other costly SQL queries running concurrently?
Did you under-allocate MySQL's innodb_buffer_pool_size? If it's too small, it could be furiously recycling pages in RAM as it scans your table.
select @@innodb_buffer_pool_size;
Did you over-allocate innodb_buffer_pool_size? Once I helped tune a server that was running very slowly. It turned out they had a 4GB buffer pool, but only 1GB of physical RAM. The operating system was swapping like crazy, causing everything to run slowly.
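The checks in the list above can be sketched with standard MySQL statements. The status counters are built-in InnoDB ones, but how to read the numbers depends on your workload, so treat this as a starting point rather than a diagnosis.

```sql
-- Other statements currently running on the server.
SHOW FULL PROCESSLIST;

-- Configured buffer pool size, converted from bytes to megabytes.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Innodb_buffer_pool_read_requests counts logical reads;
-- Innodb_buffer_pool_reads counts reads that had to go to disk.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```

Compare buffer_pool_mb with the physical RAM of the machine to spot the over-allocation case described above.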
Another thought: You have shown us the columns in av_cita, but not the table structure for av_cita_cstm. Why are you fetching SELECT *? Do you really need all the columns? Are there huge BLOB/TEXT columns in the latter table? If so, it could be reading a large amount of data from disk that you don't need.
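For example, if only a few columns of av_cita are needed, a query along these lines avoids reading the rest. The column list is an assumption picked from the av_cita definition shown in the question; use whatever your application really needs.

```sql
-- Fetch only the columns that are actually used instead of SELECT *.
SELECT av_cita.id,
       av_cita.name,
       av_cita.date_entered
FROM av_cita
JOIN av_cita_cstm ON av_cita.id = av_cita_cstm.id_c
WHERE av_cita.deleted = 0;
```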
When you ask SQL questions, it would help if you run
SHOW CREATE TABLE av_cita\G
SHOW TABLE STATUS LIKE 'av_cita'\G
And also run the same commands for the other table av_cita_cstm, and include the output in your question above.

Database schema: Key/Value table or all keys in one record

I guess that this is somewhat of a philosophical question. I need to collect pathology results for a group of patients and store them in a database. In the past I have used a very simple table structure (simplified):
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| Updated | datetime | NO | PRI | NULL | |
| PatientId | varchar(255) | NO | | NULL | |
| Name | varchar(255) | NO | | NULL | |
| Value | varchar(255) | NO | | NULL | |
+-------------------+--------------+------+-----+---------+-------+
More often in schema design I see:
+-------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| ID | bigint(20) | NO | PRI | NULL | |
| PatientId | varchar(255) | NO | | NULL | |
| Ph_Value | varchar(255) | NO | | NULL | |
| K_Value | varchar(255) | NO | | NULL | |
| Ca_Value | varchar(255) | NO | | NULL | |
| Ph_Value_updated | datetime | NO | | NULL | |
| K_Value_updated | datetime | NO | | NULL | |
| Ca_Value_updated | datetime | NO | | NULL | |
+-------------------+--------------+------+-----+---------+-------+
It seems to me that the first design is much more flexible, expandable etc. However, I do wonder about performance hits when the records run to the millions.
The issue with the second is that there may be a couple of hundred fields that need to be recorded on occasions.
I would be really interested to get comments / advice / guidance on this.
You are absolutely right, the first schema is a lot more flexible: you can add new keys on a live database without changing the schema. However, flexibility is usually bought with time and/or space. In this case, it's both: you need more space to store all keys for the same row because the ID is replicated N times, and the joins or orderings required to get the fields together take time.
There is no reason to pay for flexibility unless you need it. If most of your queries need most of the columns, the second schema is the most economical. However, if most of your queries ask for a single column, getting the flexibility may be worth spending the CPU time and the database space.
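To illustrate the cost, here is a rough sketch of reading three values per patient from the first (key/value) design. The table name pathology_result and the names 'Ph', 'K' and 'Ca' are assumptions for illustration. The query also assumes at most one row per patient and name; picking the latest Updated row per name would need extra work.

```sql
-- Pivot key/value rows into one row per patient.
SELECT PatientId,
       MAX(CASE WHEN Name = 'Ph' THEN Value END) AS Ph_Value,
       MAX(CASE WHEN Name = 'K'  THEN Value END) AS K_Value,
       MAX(CASE WHEN Name = 'Ca' THEN Value END) AS Ca_Value
FROM pathology_result
GROUP BY PatientId;
```

With the second design the same result is a plain SELECT of three columns, with no grouping.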
In my opinion, if the set of name/value pairs won't change much, the second option is much better in terms of space and number of rows.
You can also optimize the first schema by putting the names in another table and storing a name_id instead of repeating the same name several times.
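A possible shape for that variant, as a sketch only. The table names test_name and pathology_result, the constraint names and the VARCHAR(100) length are all assumptions; the remaining columns follow the first table in the question.

```sql
-- Lookup table: each distinct test name is stored once.
CREATE TABLE test_name (
  name_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name    VARCHAR(100) NOT NULL,
  PRIMARY KEY (name_id),
  UNIQUE KEY uq_test_name (name)
) ENGINE=InnoDB;

-- Result rows reference the name by id instead of repeating the text.
CREATE TABLE pathology_result (
  ID        BIGINT NOT NULL,
  Updated   DATETIME NOT NULL,
  PatientId VARCHAR(255) NOT NULL,
  name_id   INT UNSIGNED NOT NULL,
  Value     VARCHAR(255) NOT NULL,
  PRIMARY KEY (ID, Updated),
  CONSTRAINT fk_result_name FOREIGN KEY (name_id) REFERENCES test_name (name_id)
) ENGINE=InnoDB;
```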
Another schema is to have a patient table plus one table per value, each containing patient_id and value, with the table named after that value.

How to implement a superclass/subclass structure in MySQL?

These are the tables in my database, I need to create a couple superclass/subclass structures.
The first is where...
Superclass-Crew_Member
Subclasses-Director, Producer, Other_Directing, Other_Production, Art, Camera, Sound, Grip, Electrical, Post.
The second is...
Superclass-Producer
Subclasses-Salaries, Budget
+---------------------+
| Tables_in_film_crew |
+---------------------+
| art |
| budget |
| camera |
| crew_member |
| director |
| electrical |
| equipment |
| grip |
| location |
| manufacturer |
| other_directing |
| other_production |
| post_production |
| producer |
| salaries |
| sound |
+---------------------+
So how exactly would I go about creating those relationships?
Edit:
Maybe I should have clarified some other things too.
Here's what's contained in crew_member (Superclass):
+-------------+-------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+-------------------+----------------+
| Member_ID | int(5) | NO | PRI | NULL | auto_increment |
| Member_Name | varchar(25) | YES | | [INSERT EXAMPLE] | |
| DOB | date | YES | | [INSERT EXAMPLE] | |
| Address1 | varchar(25) | YES | | [INSERT EXAMPLE] | |
| Address2 | varchar(25) | YES | | [INSERT EXAMPLE] | |
+-------------+-------------+------+-----+-------------------+----------------+
Meanwhile here's what's contained in Other_Directing (Example Subclass):
+---------------+--------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+--------+------+-----+---------+----------------+
| O_Director_ID | int(4) | NO | PRI | NULL | auto_increment |
| FAD_ID | int(5) | NO | MUL | NULL | |
| SAD_ID | int(5) | NO | MUL | NULL | |
| SUD_ID | int(5) | NO | MUL | NULL | |
+---------------+--------+------+-----+---------+----------------+
Now all the foreign keys are referring to Member_ID from Crew_Member. All the other tables (except Director and Producer) are created in similar ways.
You could start by following some general rules that have to be taken into consideration when creating a database. Put the information that different groups have in common in one table and the specific data in smaller satellite tables.
I would put the generic information about a crew member in the first table:
so we would have an id, name, address and whatever all the members have in common.
Then you create "sub-tables" which relate to the "crew-member" table through the value crew_member_id. In these tables you put only the specific information related to directors, producers, etc.
So the fields here might be something like: id, crew_member_id, directed movies, etc.
Even with the superclass producer you should work in the same way. Relate the subtables to the superclass through its primary key to have relations between them.
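As a rough sketch on a fresh schema, the layout described above could look like this. crew_member follows the columns shown in the question; the director columns (id, crew_member_id, directed_movies) and the constraint names are assumptions based on the field list above, not the asker's real table.

```sql
-- Superclass: attributes shared by every crew member.
CREATE TABLE crew_member (
  Member_ID   INT NOT NULL AUTO_INCREMENT,
  Member_Name VARCHAR(25),
  DOB         DATE,
  Address1    VARCHAR(25),
  Address2    VARCHAR(25),
  PRIMARY KEY (Member_ID)
) ENGINE=InnoDB;

-- Subclass: only director-specific attributes, tied to exactly one crew member.
CREATE TABLE director (
  id              INT NOT NULL AUTO_INCREMENT,
  crew_member_id  INT NOT NULL,
  directed_movies INT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uq_director_member (crew_member_id),
  CONSTRAINT fk_director_member
    FOREIGN KEY (crew_member_id) REFERENCES crew_member (Member_ID)
) ENGINE=InnoDB;
```

The unique key keeps a crew member from appearing twice in the same subclass table; drop it if that is allowed in your model.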
I would suggest reading some articles about database design. It might save your life in the future, because once a database is built it becomes much harder to correct mistakes.
http://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html
Yeah, this is a really good question that I was researching as well.
The ideas I came up with were:
1> Have a parent table as the superclass, with satellite tables for the attributes of each subclass joined by a foreign key. You could then represent it as a view.
2> Have a parent table as the superclass and another single table for all of the extra attributes. This would have to be joined by two foreign keys.
3> One table which holds all of the classes. (Terrible idea)
There are other ideas, but I think the first is the best bet.
Here's more info that suggests the first way.
http://www.tomjewett.com/dbdesign/dbdesign.php?page=subclass.php
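For the first idea, the view could be sketched like this. It assumes a satellite table director with a crew_member_id foreign key and a directed_movies column; both of those names, and the view name, are hypothetical.

```sql
-- One row per director, combining superclass and subclass attributes.
CREATE VIEW director_view AS
SELECT c.Member_ID,
       c.Member_Name,
       c.DOB,
       d.directed_movies
FROM crew_member c
JOIN director d ON d.crew_member_id = c.Member_ID;
```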

SQL optimizing large or query

I'm attempting to run a query to find any matches between multiple phone number columns on two tables and it is taking far too long (>5 minutes) and this is with the data filtered as much as possible. I've separated the actual columns I can search from both tables into their own tables, just to reduce the amount of total rows.
This is from a legacy application I inherited.
Query
select count(b.bid)
from customers_with_phone c,buyers_orders_with_phone b
where
(b.hphone=c.pprim or b.hphone=c.phome or b.hphone=c.pwork or b.hphone=c.pother)
or (b.wphone=c.pprim or b.wphone=c.phome or b.wphone=c.pwork or b.wphone=c.pother)
or (b.cphone=c.pprim or b.cphone=c.phome or b.cphone=c.pwork or b.cphone=c.pother)
group by b.bid;
Tables
mysql> show columns from customers_with_phone;
+--------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------+------+-----+---------+-------+
| pnum | int(11) | YES | | NULL | |
| pprim | text | YES | | NULL | |
| phome | text | YES | | NULL | |
| pwork | text | YES | | NULL | |
| pother | text | YES | | NULL | |
+--------+---------+------+-----+---------+-------+
mysql> show columns from buyers_orders_with_phone;
+--------+------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+------+------+-----+---------+-------+
| bid | text | YES | | NULL | |
| hphone | text | YES | | NULL | |
| wphone | text | YES | | NULL | |
| cphone | text | YES | | NULL | |
+--------+------+------+-----+---------+-------+
Explain
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
| 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 8673 | 100.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 75931 | 100.00 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------+----------------------------------------------+
I realize that neither table has a primary key, as these are only the columns that I need to search on and I extracted these columns from their original table. But using the original table takes even longer because there is far more data to filter through.
I have other queries that are similar to this that will work with much more data so if I can make this one work in a reasonable time, I can get the others to work similarly.
A primary key is not an optimization. What you need are nonclustered indexes on your telephone text fields (one index per column). With these, you won't need to extract your data to separate tables.
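In MySQL these would be ordinary secondary indexes, and because the columns are TEXT each index needs a prefix length. A sketch, where the index names and the prefix length of 20 are assumptions (choose a length that covers your longest phone number):

```sql
-- One secondary index per phone column; TEXT columns require a prefix length.
ALTER TABLE customers_with_phone
  ADD INDEX idx_pprim  (pprim(20)),
  ADD INDEX idx_phome  (phome(20)),
  ADD INDEX idx_pwork  (pwork(20)),
  ADD INDEX idx_pother (pother(20));

ALTER TABLE buyers_orders_with_phone
  ADD INDEX idx_hphone (hphone(20)),
  ADD INDEX idx_wphone (wphone(20)),
  ADD INDEX idx_cphone (cphone(20));
```

Whether the optimizer can use them for a join with this many OR conditions is not guaranteed, so check EXPLAIN again afterwards.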
The legacy query is awful, sorry. It is a full Cartesian product.
The data structure cannot handle such queries effectively. You have 3 fields in one table and 4 in the other, and you try to figure out whether any pair matches.
Possibly a primary key and a key on every phone column can improve this query, I'm not sure, but it can make delete/insert/update performance worse.
Btw, you wrote that it is impossible to index a nullable column. That's not correct.
I can believe only in a radical solution: changing the data structure or adding some kind of caching mechanism with triggers. But it is hard.
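One way such a changed data structure could look is one row per phone number, so the match becomes a single indexed equality join. This is only a sketch: the table names customer_phone and buyer_phone, the index names and the VARCHAR lengths are assumptions, and it presumes every phone and bid value fits in those lengths.

```sql
-- One row per (customer, phone) instead of four phone columns.
CREATE TABLE customer_phone (
  pnum  INT NULL,
  phone VARCHAR(32) NOT NULL,
  KEY idx_customer_phone (phone)
);

INSERT INTO customer_phone (pnum, phone)
SELECT pnum, pprim  FROM customers_with_phone WHERE pprim  IS NOT NULL
UNION
SELECT pnum, phome  FROM customers_with_phone WHERE phome  IS NOT NULL
UNION
SELECT pnum, pwork  FROM customers_with_phone WHERE pwork  IS NOT NULL
UNION
SELECT pnum, pother FROM customers_with_phone WHERE pother IS NOT NULL;

-- One row per (order, phone) instead of three phone columns.
CREATE TABLE buyer_phone (
  bid   VARCHAR(64) NOT NULL,
  phone VARCHAR(32) NOT NULL,
  KEY idx_buyer_phone (phone)
);

INSERT INTO buyer_phone (bid, phone)
SELECT bid, hphone FROM buyers_orders_with_phone WHERE bid IS NOT NULL AND hphone IS NOT NULL
UNION
SELECT bid, wphone FROM buyers_orders_with_phone WHERE bid IS NOT NULL AND wphone IS NOT NULL
UNION
SELECT bid, cphone FROM buyers_orders_with_phone WHERE bid IS NOT NULL AND cphone IS NOT NULL;

-- Matching phone numbers per order.
SELECT b.bid, COUNT(*) AS matches
FROM buyer_phone b
JOIN customer_phone c ON c.phone = b.phone
GROUP BY b.bid;
```

The counts are per matching phone pair, so they can differ from the original query, which counts each pair of rows once no matter how many of their phone columns match.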

Is many to many needed on this DB?

I am designing a DB for a possible PHP MySQL project I may be undertaking.
I am a complete novice at relational DB design, and have only worked with single table DB's before.
This is a diagram of the tables:
So, 'Cars' contains each model of car, and the other 3 tables contains parts that the car can be fitted with.
So each car can have different parts from each of the three tables, and each part can be fitted to different cars from the parts table. In reality, there will be about 10 of these parts tables.
So, what would be the best way to link these together? Do I need another table in the middle, etc.?
And what would I need to do with keys in terms of linking?
There is some inheritance in your parts. The common attributes seem to be:
part_number
price
and there are some specifics for your part types exhaust, software and intake.
There are two strategies:
- have three tables and one view over the three tables
- have one table with a parttype column and maybe three views for the tables.
If you'd like to play with your design you might want to look at my company's website http://www.uml2php.com. UML2PHP will automatically convert your UML design to a database design and let you "play" with the result.
At:
http://service.bitplan.com/uml2phpexamples/carparts/
you'll find an example application along the lines of your design. The menu does not yet allow you to access all tables.
via:
http://service.bitplan.com/uml2phpexamples/carparts/index.php?function=dbCheck
the table definitions are accessible:
mysql> describe CP01_car;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| oid | varchar(32) | NO | | NULL | |
| car_id | varchar(255) | NO | PRI | NULL | |
| model | varchar(255) | YES | | NULL | |
| description | text | YES | | NULL | |
| model_year | decimal(10,0) | YES | | NULL | |
+-------------+---------------+------+-----+---------+-------+
mysql> describe CP01_part;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| oid | varchar(32) | NO | | NULL | |
| part_number | varchar(255) | NO | PRI | NULL | |
| price | varchar(255) | YES | | NULL | |
| car_car_id | varchar(255) | YES | | NULL | |
+-------------+--------------+------+-----+---------+-------+
mysql> describe cp01_exhaust;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| oid | varchar(32) | NO | | NULL | |
| type | varchar(255) | YES | | NULL | |
| part_number | varchar(255) | NO | PRI | NULL | |
| price | varchar(255) | YES | | NULL | |
+-------------+--------------+------+-----+---------+-------+
mysql> describe CP01_intake;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| oid | varchar(32) | NO | | NULL | |
| part_number | varchar(255) | NO | PRI | NULL | |
| price | varchar(255) | YES | | NULL | |
+-------------+--------------+------+-----+---------+-------+
mysql> describe CP01_software;
+-------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| oid | varchar(32) | NO | | NULL | |
| power_gain | decimal(10,0) | YES | | NULL | |
| part_number | varchar(255) | NO | PRI | NULL | |
| price | varchar(255) | YES | | NULL | |
+-------------+---------------+------+-----+---------+-------+
The above tables have been generated from the UML model and the result does not fit your needs yet, especially if you think of having 10 or more tables like this. The field car_car_id that links your parts to the car table should be available in all the tables. And according to the design proposal the base "table" for the parts should be a view like this:
mysql>
create view partview as
select oid,part_number,price from CP01_software
union select oid,part_number,price from CP01_exhaust
union select oid,part_number,price from CP01_intake;
Of course, the car_car_id column also needs to be selected.
Now you can edit every table by itself and the partview will show all parts together.
To be able to distinguish the parts types you might want to add another column "part_type".
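A sketch of that idea, using the same three tables as the view above. The view name partview_typed and the literal type labels are assumptions. UNION ALL is used because the extra column already makes rows from different tables distinct.

```sql
create view partview_typed as
select oid, part_number, price, 'software' as part_type from CP01_software
union all
select oid, part_number, price, 'exhaust' as part_type from CP01_exhaust
union all
select oid, part_number, price, 'intake' as part_type from CP01_intake;
```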
I would do it like this. Instead of having three different tables for car parts:
table - cars
table - parts (this would have only an id, a part number and maybe a type)
table - part_connections (connecting cars with parts)
table - part_options (with all the options which aren't in the parts table, like "power gain")
table - part_option_connections (which connects the parts to the various part options)
This way it is much easier to add new parts (because you won't need a new table), and it's closer to being normalized as well.
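The core of that layout is the many-to-many link between cars and parts. A minimal sketch, where all column names, lengths and constraint names are assumptions and the two option tables are left out for brevity:

```sql
CREATE TABLE cars (
  car_id INT NOT NULL AUTO_INCREMENT,
  model  VARCHAR(255) NOT NULL,
  PRIMARY KEY (car_id)
) ENGINE=InnoDB;

CREATE TABLE parts (
  part_id     INT NOT NULL AUTO_INCREMENT,
  part_number VARCHAR(64) NOT NULL,
  part_type   VARCHAR(32) NULL,
  PRIMARY KEY (part_id)
) ENGINE=InnoDB;

-- Junction table: one row per (car, part) pair that fits together.
CREATE TABLE part_connections (
  car_id  INT NOT NULL,
  part_id INT NOT NULL,
  PRIMARY KEY (car_id, part_id),
  CONSTRAINT fk_pc_car  FOREIGN KEY (car_id)  REFERENCES cars (car_id),
  CONSTRAINT fk_pc_part FOREIGN KEY (part_id) REFERENCES parts (part_id)
) ENGINE=InnoDB;

-- All parts that fit a given car (car_id 1 is a placeholder).
SELECT p.part_number, p.part_type
FROM parts p
JOIN part_connections pc ON pc.part_id = p.part_id
WHERE pc.car_id = 1;
```

The composite primary key on part_connections prevents the same part from being linked to the same car twice.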