On my development server I have a column indexed with a cardinality of 200.
The table has roughly 6 million rows, and I have confirmed the row count is identical on the production server.
However, the production server's index has a cardinality of 31938.
Both are MySQL 5.5; however, my dev server runs Ubuntu Server 13.10 and the production server runs Windows Server 2012.
Any ideas on what would cause such a difference in what should be the exact same data?
The data was loaded into the production server from a MySQL dump of the dev server.
EDIT: It's worth noting that I have queries that take about 15 minutes to run on my dev server but seem to run forever on the production server, due to what I believe to be these indexing issues. Different numbers of rows are being pulled within sub-queries.
MySQL checksums might help you verify that the tables are the same:
-- a table
create table test.t ( id int unsigned not null auto_increment primary key, r float );
-- some data ( 18000 rows or so )
insert into test.t (r) select rand() from mysql.user join mysql.user u2;
-- a duplicate
create table test.t2 select * from test.t;
-- introduce a difference somewhere in there
update test.t2 set r = 0 order by rand() limit 1;
-- and prove the tables are different easily:
mysql> checksum table test.t;
+--------+------------+
| Table  | Checksum   |
+--------+------------+
| test.t | 2272709826 |
+--------+------------+
1 row in set (0.00 sec)
mysql> checksum table test.t2;
+---------+-----------+
| Table   | Checksum  |
+---------+-----------+
| test.t2 | 312923301 |
+---------+-----------+
1 row in set (0.01 sec)
Beware: CHECKSUM TABLE locks the table while it runs.
For more advanced functionality, the Percona Toolkit can both checksum and sync tables (pt-table-checksum and pt-table-sync), though it is built around master/slave replication scenarios, so it might not be a perfect fit for you.
Beyond checksumming, you might consider running REPAIR TABLE or OPTIMIZE TABLE.
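On the cardinality difference itself: the cardinality reported by SHOW INDEX is an estimate produced by sampling, not an exact count, so two servers holding identical data can legitimately report different numbers. If the checksums match, re-sampling the statistics on both servers may bring them in line. A sketch (the schema and table names here are placeholders, substitute your own):

```sql
-- ANALYZE TABLE re-samples the index statistics that SHOW INDEX
-- reports as "Cardinality"; run it on both servers and compare.
ANALYZE TABLE mydb.mytable;
SHOW INDEX FROM mydb.mytable;
```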
Related
I have a query that updates a field in a table using the primary key to locate the row. The table can contain many rows where the date/time field is initially NULL, and then is updated with a date/time stamp using NOW().
When I run the update statement on the table, I am getting a slow query log entry (3.38 seconds). The log indicates that 200,000 rows were examined. Why would that many rows be examined if I am using the PK to identify the row being updated?
The primary key is (item_id, customer_id). I have verified that the PRIMARY KEY is correct in the MySQL table structure.
UPDATE cust_item
SET status = 'approved',
lstupd_dtm = NOW()
WHERE customer_id = '7301'
AND item_id = '12498';
I wonder if it's a hardware issue.
While the changes I've mentioned in comments might help slightly, in truth, I cannot replicate this issue...
I have a data set of roughly 1m rows...:
CREATE TABLE cust_item
(customer_id INT NOT NULL
,item_id INT NOT NULL
,status VARCHAR(12) NULL
,PRIMARY KEY(customer_id,item_id)
);
-- INSERT some random rows...
SELECT COUNT(*)
, SUM(customer_id = 358) dense
, SUM(item_id=12498) sparse
FROM cust_item;
+----------+-------+--------+
| COUNT(*) | dense | sparse |
+----------+-------+--------+
|  1047720 |   104 |      8 |
+----------+-------+--------+
UPDATE cust_item
SET status = 'approved'
WHERE item_id = '12498'
AND customer_id = '358';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
How long does it take to select the record, without the update?
If the select is fast, then you need to look into things that can affect update/write speed:
- too many indexes on the table (don't forget filtered indexes and indexed views)
- index pages with a 0 fill factor that need to split to accommodate the data change
- referential constraints with cascade
- triggers
- slow write speed at the storage level
If the select is slow:
- old/bad statistics on the index
- extreme fragmentation
- a columnstore index with too many open rowgroups
If the select speed improves significantly after the first time, you may be having some cold buffer performance issues. That could point to storage I/O problems as well.
You may also be having concurrency issues caused by another process locking the table momentarily.
Finally, any chance the tool executing the query is returning a false duration? For example, SQL Server Management Studio can occasionally be slow to return a large resultset, even if the server handled it very quickly.
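To separate read cost from write cost, as suggested above, you can time the lookup on its own and then look at the plan for the write. The values below mirror the question; adjust them to a real row, and note that EXPLAIN on an UPDATE requires MySQL 5.6 or later:

```sql
-- 1. Is the lookup itself fast?
SELECT status, lstupd_dtm
FROM cust_item
WHERE customer_id = '7301'
  AND item_id = '12498';

-- 2. What plan does the write use? It should show the PRIMARY key
--    with very few rows examined. (EXPLAIN UPDATE needs MySQL 5.6+.)
EXPLAIN UPDATE cust_item
SET status = 'approved',
    lstupd_dtm = NOW()
WHERE customer_id = '7301'
  AND item_id = '12498';
```

If step 1 is instant but the UPDATE is still slow, the time is going to the write path (locks, triggers, I/O) rather than to finding the row.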
I have read in the PostgreSQL docs that without an ORDER BY clause, SELECT will return records in an unspecified order.
Recently in an interview, I was asked how to SELECT records in the order they were inserted, without a PK, created_at, or other field that could be used for ordering. The senior dev who interviewed me insisted that without an ORDER BY clause the records will be returned in the order they were inserted.
Is this true for PostgreSQL? Is it true for MySQL? Or any other RDBMS?
I can answer for MySQL. I don't know for PostgreSQL.
The default order is not the order of insertion, generally.
In the case of InnoDB, the default order depends on the order of the index read for the query. You can get this information from the EXPLAIN plan.
For MyISAM, rows are returned in the order they are read from the table. This might be the order of insertion, but MyISAM will reuse gaps left after you delete records, so newer rows may be stored earlier in the table.
None of this is guaranteed; it's just a side effect of the current implementation. MySQL could change the implementation in the next version, making the default order of result sets different, without violating any documented behavior.
So if you need the results in a specific order, you should use ORDER BY on your queries.
Following BK's answer, and by way of example...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table(id INT NOT NULL) ENGINE = MYISAM;
INSERT INTO my_table VALUES (1),(9),(5),(8),(7),(3),(2),(6);
DELETE FROM my_table WHERE id = 8;
INSERT INTO my_table VALUES (4),(8);
SELECT * FROM my_table;
+----+
| id |
+----+
| 1 |
| 9 |
| 5 |
| 4 | -- is this what
| 7 |
| 3 |
| 2 |
| 6 |
| 8 | -- we expect?
+----+
In the case of PostgreSQL, that is quite wrong.
If there are no deletes or updates, rows will be stored in the table in the order you insert them. And even though a sequential scan will usually return the rows in that order, that is not guaranteed: the synchronized sequential scan feature of PostgreSQL can have a sequential scan "piggy back" on an already executing one, so that rows are read starting somewhere in the middle of the table.
However, this ordering of the rows breaks down completely if you update or delete even a single row: the old version of the row will become obsolete, and (in the case of an UPDATE) the new version can end up somewhere entirely different in the table. The space for the old row version is eventually reclaimed by autovacuum and can be reused for a newly inserted row.
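A minimal sketch of that effect in PostgreSQL (the exact result depends on concurrent activity and autovacuum timing, so it is illustrative, not guaranteed):

```sql
-- An UPDATE writes a new row version, typically placed later in the
-- table, so the "insertion order" illusion breaks after one update.
CREATE TABLE t (id int);
INSERT INTO t VALUES (1), (2), (3);
UPDATE t SET id = 2 WHERE id = 2;  -- rewrites the row as a new version
SELECT * FROM t;                   -- a seq scan will often return 1, 3, 2
```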
Without an ORDER BY clause, the database is free to return rows in any order. There is no guarantee that rows will be returned in the order they were inserted.
With MySQL (InnoDB), we observe that rows are typically returned in the order of an index used in the execution plan, or of the table's cluster key.
It is not difficult to craft an example...
CREATE TABLE foo
( id INT NOT NULL
, val VARCHAR(10) NOT NULL DEFAULT ''
, UNIQUE KEY (id,val)
) ENGINE=InnoDB;
INSERT INTO foo (id, val) VALUES (7,'seven') ;
INSERT INTO foo (id, val) VALUES (4,'four') ;
SELECT id, val FROM foo ;
MySQL is free to return rows in any order, but in this case, we would typically observe that MySQL will access rows through the InnoDB cluster key.
id val
---- -----
4 four
7 seven
It's not at all clear what point the interviewer was trying to make. If the interviewer was trying to sell the idea that, given a requirement to return rows from a table in the order they were inserted, a query without an ORDER BY clause is ever the right solution, I'm not buying it.
We can craft examples where rows are returned in the order they were inserted, but that is a byproduct of the implementation, not guaranteed behavior, and we should never rely on it to satisfy a specification.
I am running 10.1.26-MariaDB-0+deb9u1 Debian 9.1 in multiple locations.
Just got a call today that some scripts are no longer running at one of the locations. I've diagnosed that whenever a script tries to execute TRUNCATE <table name> it just hangs.
I've tried it from the CLI and Workbench as well with the same results. I have also tried TRUNCATE TABLE <table name> with the same results.
I cannot figure out (a) why this suddenly stopped working, and (b) what's different between this location and the other three, where it does work.
I expect you'll see something like this:
mysql> show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+------------------------+
| Id | User     | Host      | db   | Command | Time | State                           | Info                   |
+----+----------+-----------+------+---------+------+---------------------------------+------------------------+
|  8 | msandbox | localhost | test | Query   |  435 | Waiting for table metadata lock | truncate table mytable |
Try this experiment in a test instance of MySQL (like on your local development environment): open two shell windows and run the mysql client. Create a test table.
mysql> create table test.mytable ( answer int );
mysql> insert into test.mytable set answer = 42;
Now start a transaction and query the table, but do not commit the transaction yet.
mysql> begin;
mysql> select * from test.mytable;
+--------+
| answer |
+--------+
| 42 |
+--------+
In the second window, try to truncate that table.
mysql> truncate table mytable;
<hangs>
What it's waiting for is a metadata lock. It will wait for a number of seconds equal to the lock_wait_timeout configuration option.
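The default for that option is a full year, which is why a blocked statement appears to hang forever. It can be lowered per session so that blocked DDL fails fast instead (the value 60 here is just an example):

```sql
SHOW VARIABLES LIKE 'lock_wait_timeout';  -- default: 31536000 (one year)
SET SESSION lock_wait_timeout = 60;       -- DDL now gives up after 60 seconds
```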
Now go back to the first window, and commit.
mysql> commit;
Now see in your second window, the TRUNCATE TABLE stops waiting, and it finally does its work, truncating the table.
Any DDL statement like ALTER TABLE, TRUNCATE TABLE, DROP TABLE needs to acquire an exclusive metadata lock on the table. But any transaction that has been reading or writing that table holds a shared metadata lock. This means many concurrent sessions can do their work, like SELECT/UPDATE/INSERT/DELETE without blocking each other (because their locks are shared). But a DDL statement requires an exclusive metadata lock, meaning no other metadata lock, either shared or exclusive, can exist.
So I'd guess there's some transaction hanging around that has done some read or write against your table, without committing. Either the query itself is very long-running, or else the query has finished but the transaction hasn't.
You have to figure out where you have an outstanding transaction. If you are using MySQL 5.7 or later, you can query the sys.schema_table_lock_waits view while one of your TRUNCATE TABLE statements is waiting.
select * from sys.schema_table_lock_waits\G
*************************** 1. row ***************************
object_schema: test
object_name: mytable
waiting_thread_id: 47
waiting_pid: 8
waiting_account: msandbox@localhost
waiting_lock_type: EXCLUSIVE
waiting_lock_duration: TRANSACTION
waiting_query: truncate table mytable
waiting_query_secs: 625
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 48
blocking_pid: 9
blocking_account: msandbox@localhost
blocking_lock_type: SHARED_READ
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 9
sql_kill_blocking_connection: KILL 9
This tells us which session is blocked, waiting for a metadata lock. The waiting_pid (8 in the above example) corresponds to the Id in the processlist of the blocked session.
The blocking_pid (9 in the above example) corresponds to the Id in the processlist of the session that currently holds the lock, and which is blocking the truncate table.
It even tells you exactly how to kill the session that's holding the lock:
mysql> KILL 9;
Once the session is killed, it releases its locks, and the TRUNCATE TABLE finally finishes.
mysql> truncate table mytable;
Query OK, 0 rows affected (13 min 34.50 sec)
Unfortunately, you're using MariaDB 10.1, which supports neither the sys schema nor the performance_schema.metadata_locks table it relies on. MariaDB forked from MySQL 5.5, which is nearly ten years old now, and the metadata_locks table did not exist at that time.
I don't use MariaDB, but I googled and found that they have their own proprietary implementation for querying metadata locks: https://mariadb.com/kb/en/library/metadata_lock_info/ I haven't used it, so I'll leave it to you to read the docs about that.
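Based on that documentation page, the rough shape of it looks like the following. I haven't run this on MariaDB 10.1 myself, so treat it as a starting point rather than a recipe:

```sql
-- Load the metadata_lock_info plugin (once), then inspect current
-- metadata locks while the TRUNCATE is stuck:
INSTALL SONAME 'metadata_lock_info';
SELECT * FROM information_schema.METADATA_LOCK_INFO;

-- Long-running InnoDB transactions are the usual holders of the
-- shared metadata lock that blocks the TRUNCATE:
SELECT trx_id, trx_started, trx_mysql_thread_id
FROM information_schema.INNODB_TRX
ORDER BY trx_started;
```

Once you have identified the offending connection id, KILL it as shown above and the TRUNCATE should proceed.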
I have a MySQL master-slave configuration.
On both servers I have two tables: table1 and table2
I also have the following trigger on both servers:
Trigger: test_trigger
Event: UPDATE
Table: table1
Statement: insert into table2 values(null)
Timing: AFTER
The structure of table2 is the following:
+-------+---------+------+-----+---------+----------------+
| Field | Type    | Null | Key | Default | Extra          |
+-------+---------+------+-----+---------+----------------+
| id    | int(11) | NO   | PRI | NULL    | auto_increment |
+-------+---------+------+-----+---------+----------------+
The problem is that, on MySQL 5.1.*, when the slave calls the trigger it adds the id that was inserted on the master and NOT the id it should insert according to its own auto_increment value.
Let's say I have the following data:
On Master:
SELECT * FROM table2;
Empty set (0.08 sec)
On Slave:
SELECT * FROM table2;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
(just ignore the fact that the slave is not a complete mirror of the master)
Given the above scenario, when I update a row from table1 on Master, the Slave stops and returns an error:
Error 'Duplicate entry '1' for key 'PRIMARY'' on query.
I don't see why the slave tries to insert a specific ID.
It's very strange that on MySQL 5.0.* this doesn't happen.
Switch to row-based replication if possible.
Auto increment is pretty much broken for anything but the most basic cases with statement based replication.
For any statement that generates more than one auto_increment value (via triggers, multi-row inserts, etc.), only the first auto_increment value will be correct on the slave, because only the first is logged.
If the slave reads an auto_increment value from the log but does not 'use' it, the value gets applied to the next statement, which can be completely unrelated. This happens when the slave skips the corresponding insert statement for some reason (an ignored table/db in the configuration, a conditional insert in a proc/trigger, etc.).
I had a similar problem with an audit-log style table (a trigger inserts an event into table2 for every change to table1), along with several other auto-increment related problems.
I'm not sure this solution will fit your case but I'm going to post it just in case:
Add an 'updated_count' field to table1. It starts at 0 (on insert) and gets incremented by 1 on every update (using BEFORE INSERT/UPDATE triggers).
Remove table2's auto_increment and change its PK to a composite key (table1_pk, updated_count). Then use table1's PK and 'updated_count' in the AFTER INSERT/UPDATE triggers to populate table2's PK.
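A sketch of that scheme, assuming table1 has an INT primary key named id (the column and trigger names here are made up for illustration):

```sql
ALTER TABLE table1 ADD COLUMN updated_count INT NOT NULL DEFAULT 0;

-- Bump the counter deterministically on every update:
CREATE TRIGGER table1_bu BEFORE UPDATE ON table1
FOR EACH ROW SET NEW.updated_count = OLD.updated_count + 1;

-- table2 keyed deterministically instead of by auto_increment:
CREATE TABLE table2 (
  table1_pk     INT NOT NULL,
  updated_count INT NOT NULL,
  PRIMARY KEY (table1_pk, updated_count)
);

CREATE TRIGGER table1_au AFTER UPDATE ON table1
FOR EACH ROW INSERT INTO table2 VALUES (NEW.id, NEW.updated_count);
```

Because the trigger fires on both master and slave and the inserted values are fully determined by the row being updated, statement-based replication produces identical rows on both sides, with no auto_increment involved.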
Earlier today, I asked for an easy way to store a version number for the SQL table layout you are using in SQLite, and got the suggestion to use PRAGMA user_version. As there is no such thing as a pragma in MySQL, I was wondering how you would go about this in MySQL (except for creating a table named "META" with a column "DB-Scheme-Version").
Just to repeat what I said in the linked question: I'm not looking for a way to find out which version of MySQL is installed, but to save a version number that tells me which version of my MySQL schema I am using, without checking every table via script.
I also saw this question, but it only allows me to version single tables. Is there something similar or, preferably, easier, for whole databases (since it would be no fun to query every single table separately)? Thanks in advance.
MySQL's SET GLOBAL would probably work, but I prefer a solution that does not reset itself every time the server reboots and does not require the SUPER privilege and/or access to the configuration file. In short: it should work with the standard MySQL database you get when renting a small webhosting package, not the kind you get when renting a full server, since you tend to have more access to those.
There are a couple of choices, depending on the privileges you have. The higher your privileges, the more "elegant" the solution.
The most direct route is to create a stored function, which requires the CREATE ROUTINE privilege. e.g.
mysql> CREATE FUNCTION `mydb`.DB_VERSION() RETURNS VARCHAR(15)
RETURN '1.2.7.2861';
Query OK, 0 rows affected (0.03 sec)
mysql> SELECT `mydb`.DB_VERSION();
+--------------+
| DB_VERSION() |
+--------------+
| 1.2.7.2861   |
+--------------+
1 row in set (0.06 sec)
If your privileges limit you to only creating tables, you can create a simple table and put the version in a default value:
mysql> CREATE TABLE `mydb`.`db_version` (
`version` varchar(15) not null default '1.2.7.2861');
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW COLUMNS FROM `mydb`.`db_version`;
+---------+-------------+------+-----+------------+-------+
| Field   | Type        | Null | Key | Default    | Extra |
+---------+-------------+------+-----+------------+-------+
| version | varchar(15) | NO   |     | 1.2.7.2861 |       |
+---------+-------------+------+-----+------------+-------+
1 row in set (0.00 sec)