I have this table in MySQL called ts1
+----------+-----------+---------------+
| position | email     | date_of_birth |
+----------+-----------+---------------+
|        3 | NULL      | 1987-09-03    |
|        1 | NULL      | 1982-03-26    |
|        2 | Sam@gmail | 1976-10-03    |
|        2 | Sam@gmail | 1976-10-03    |
+----------+-----------+---------------+
I want to drop the duplicate rows using ALTER IGNORE.
I have tried
ALTER IGNORE TABLE ts1 ADD UNIQUE INDEX inx (position, email, date_of_birth);
and
ALTER IGNORE TABLE ts1 ADD UNIQUE(position, email, date_of_birth);
In both cases I get
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IGNORE TABLE ts1 ADD UNIQUE(position, email, date_of_birth)' at line 1
I'm using MySQL 5.7.9. Any suggestions?
The IGNORE clause for ALTER TABLE was removed in MySQL 5.7.4, which is why 5.7.9 rejects it with a syntax error, so you need another route. To do it in place against the table, given just the columns you show, consider the below. To do it in a new table as suggested by Strawberry, see my pastie link under comments.
create table thing
( position int not null,
email varchar(100) null,
dob date not null
);
insert into thing (position,email,dob) values
(3,null,'1987-09-03'),(1,null,'1982-03-26'),
(2,'SamIAm@gmail.com','1976-10-03'),(2,'SamIAm@gmail.com','1976-10-03');
select * from thing;
+----------+------------------+------------+
| position | email            | dob        |
+----------+------------------+------------+
|        3 | NULL             | 1987-09-03 |
|        1 | NULL             | 1982-03-26 |
|        2 | SamIAm@gmail.com | 1976-10-03 |
|        2 | SamIAm@gmail.com | 1976-10-03 |
+----------+------------------+------------+
alter table thing add id int auto_increment primary key;
Delete with a join pattern, deleting subsequent dupes (that have a larger id number)
delete thing
from thing
join
( select position,email,dob,min(id) as theMin,count(*) as theCount
from thing
group by position,email,dob
having theCount>1
) xxx -- alias
on thing.position=xxx.position and thing.email=xxx.email and thing.dob=xxx.dob and thing.id>xxx.theMin;
-- note: if duplicates can have a NULL email, compare email with the null-safe operator <=> instead of =,
-- since NULL=NULL is not true and such rows would survive the delete
-- 1 row affected
select * from thing;
+----------+------------------+------------+----+
| position | email            | dob        | id |
+----------+------------------+------------+----+
|        3 | NULL             | 1987-09-03 |  1 |
|        1 | NULL             | 1982-03-26 |  2 |
|        2 | SamIAm@gmail.com | 1976-10-03 |  3 |
+----------+------------------+------------+----+
Add the unique index
CREATE UNIQUE INDEX `thing_my_composite` ON thing (position,email,dob); -- forbid dupes hereafter (caveat: rows with a NULL email can still repeat, since NULLs never collide in a UNIQUE index)
View current table schema
show create table thing;
CREATE TABLE `thing` (
`position` int(11) NOT NULL,
`email` varchar(100) DEFAULT NULL,
`dob` date NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `thing_my_composite` (`position`,`email`,`dob`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
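For completeness, here is a sketch of the new-table route mentioned above (per Strawberry's suggestion); the thing2 name is illustrative, and it assumes you can tolerate a brief write pause during the swap:
CREATE TABLE thing2 LIKE thing;               -- clones the columns, the PK, and the unique key
INSERT IGNORE INTO thing2 (position, email, dob)
    SELECT position, email, dob FROM thing;   -- the unique key silently drops the duplicates
RENAME TABLE thing TO thing_old, thing2 TO thing;
DROP TABLE thing_old;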
Related
I'm faced with a MySQL database containing an events table of ~70 million rows, which has foreign keys to other tables and is used to generate reports. Constructing a performant query that selects (while counting/summing values) and groups data per day from this table is proving challenging.
The database structure is as follows:
CREATE TABLE `client` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_client_id_name` (`id`,`name`)
) ENGINE=InnoDB AUTO_INCREMENT=66 DEFAULT CHARSET=utf8mb3
CREATE TABLE `class` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`client_id` int DEFAULT NULL,
`duration` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_client_id_idx` (`client_id`),
CONSTRAINT `fk_client_id` FOREIGN KEY (`client_id`) REFERENCES `client` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=2606 DEFAULT CHARSET=utf8mb3
CREATE TABLE `event` (
`id` int NOT NULL AUTO_INCREMENT,
`start_time` datetime DEFAULT NULL,
`class_id` int DEFAULT NULL,
`venue_id` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `fk_class_id_idx` (`class_id`),
KEY `fk_venue_id_idx` (`venue_id`),
KEY `idx_1` (`venue_id`,`class_id`,`start_time`),
CONSTRAINT `fk_class_id` FOREIGN KEY (`class_id`) REFERENCES `class` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT `fk_venue_id` FOREIGN KEY (`venue_id`) REFERENCES `venue` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=64093231 DEFAULT CHARSET=utf8mb3
CREATE TABLE `venue` (
`id` int NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_venue_id_name` (`id`,`name`)
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8mb3
The following query demonstrates the desired outcome and is fine on an events table with a few thousand rows:
SELECT
CAST(event.start_time as date) as day,
class.name,
client.name,
venue.name,
COUNT(class.name) AS occurrences,
SUM(class.duration) AS duration
FROM
class,
client,
event,
venue
WHERE
event.venue_id = venue.id
AND event.class_id = class.id
AND class.client_id = client.id
GROUP BY day, class.name, client.name, venue.name
The database isn't indexed, and although I've tried adding indexes such as alter table events add index idx_test (venue_id, class_id, start_time); to improve performance, it's still incredibly slow (I tend to abort queries once they're past the 10 minute mark, so I don't know for sure how long they'd take to complete).
I figured this was a good use case for a summary table (as suggested by Rick James' guide), so that I could hold a separate set of summarized data broken down by day, with occurrences and total duration calculated/incremented on each addition to the table (IODKU). However, I'm then also up against creating rows per day in the summary table based on what the database considers a day (UTC), which may not match the application's "day" due to the timezone offset.
Short of converting the start_time column to a timestamp type (which would then be inconsistent with all the other date types in the database), is there any way around this, or any other optimization I could make to the original events table to get a more responsive query? TIA
Update 23/05
Here's the buffer pool size (134217728 bytes = 128 MB):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
I've also made a bit of progress with indexing, modifying the query and creating a summary table.
I tried various orderings of columns to test indexes and found idx_event_venueid_classid_starttime (below) to be the most efficient for the event table:
SHOW INDEXES FROM EVENT;
+-------+------------+-------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name                            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+-------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| event |          0 | PRIMARY                             |            1 | id          | A         |    62142912 |     NULL | NULL   |      | BTREE      |         |               | YES     | NULL       |
| event |          1 | fk_class_id_idx                     |            1 | class_id    | A         |       51286 |     NULL | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | fk_venue_id_idx                     |            1 | venue_id    | A         |       16275 |     NULL | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | idx_event_venueid_classid_starttime |            1 | venue_id    | A         |       13378 |     NULL | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | idx_event_venueid_classid_starttime |            2 | class_id    | A         |       81331 |     NULL | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
| event |          1 | idx_event_venueid_classid_starttime |            3 | start_time  | A         |    63909472 |     NULL | NULL   | YES  | BTREE      |         |               | YES     | NULL       |
+-------+------------+-------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Here's my modified version of the query, which uses JOIN syntax and CONVERT_TZ to convert from UTC to the timezone required for reporting, then groups by the date (discarding the time portion):
SELECT
DATE(CONVERT_TZ(event.start_time,
'UTC',
'Europe/London')) AS tz_date,
class.name,
client.name,
venue.name,
COUNT(class.id) AS occurrences,
SUM(class.duration) AS duration
FROM
event
JOIN
class ON class.id = event.class_id
JOIN
venue ON venue.id = event.venue_id
JOIN
client ON client.id = class.client_id
GROUP BY tz_date, class.name, client.name, venue.name;
And here's the output of explain for that query:
+----+-------------+--------+------------+--------+---------------------------------------------------------------------+-------------------------------------+---------+-------------------------+------+----------+------------------------------+
| id | select_type | table  | partitions | type   | possible_keys                                                       | key                                 | key_len | ref                     | rows | filtered | Extra                        |
+----+-------------+--------+------------+--------+---------------------------------------------------------------------+-------------------------------------+---------+-------------------------+------+----------+------------------------------+
|  1 | SIMPLE      | venue  | NULL       | index  | PRIMARY,idx_venue_id_name                                           | idx_venue_id_name                   | 772     | NULL                    |   28 |   100.00 | Using index; Using temporary |
|  1 | SIMPLE      | event  | NULL       | ref    | fk_class_id_idx,fk_venue_id_idx,idx_event_venueid_classid_starttime | idx_event_venueid_classid_starttime | 5       | example.venue.id        | 4777 |   100.00 | Using where; Using index     |
|  1 | SIMPLE      | class  | NULL       | eq_ref | PRIMARY,fk_client_id_idx                                            | PRIMARY                             | 4       | example.event.class_id  |    1 |   100.00 | Using where                  |
|  1 | SIMPLE      | client | NULL       | eq_ref | PRIMARY,idx_client_id_name                                          | PRIMARY                             | 4       | example.class.client_id |    1 |   100.00 | NULL                         |
+----+-------------+--------+------------+--------+---------------------------------------------------------------------+-------------------------------------+---------+-------------------------+------+----------+------------------------------+
The query now takes ~1m 20s to run, so I figured I could prepend it with an INSERT INTO to populate a summary table with the timezone-specific dates, and run that on a nightly basis. Summary table structure:
CREATE TABLE `summary` (
`tz_date` date NOT NULL,
`class` varchar(255) NOT NULL,
`client` varchar(255) NOT NULL,
`venue` varchar(255) NOT NULL,
`occurrences` int NOT NULL,
`duration` int NOT NULL,
PRIMARY KEY (`tz_date`,`class`,`client`,`venue`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
From the original ~60m+ rows in the event table, the aggregated summary table is populated with ~66k rows.
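For reference, a minimal sketch of what that nightly population could look like; the ON DUPLICATE KEY UPDATE clause is my assumption (following the IODKU idea above) so that re-runs are idempotent:
INSERT INTO summary (tz_date, class, client, venue, occurrences, duration)
SELECT
    DATE(CONVERT_TZ(event.start_time, 'UTC', 'Europe/London')) AS tz_date,
    class.name,
    client.name,
    venue.name,
    COUNT(class.id),
    SUM(class.duration)
FROM
    event
JOIN
    class ON class.id = event.class_id
JOIN
    venue ON venue.id = event.venue_id
JOIN
    client ON client.id = class.client_id
GROUP BY tz_date, class.name, client.name, venue.name
ON DUPLICATE KEY UPDATE
    occurrences = VALUES(occurrences),
    duration = VALUES(duration);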
Generating the reports from the summary table then takes a fraction of a second (shown below with data snipped):
SELECT * FROM SUMMARY;
66989 rows in set (0.03 sec)
I haven't looked into the impact of inserting into event while the query to populate the summary table is running - is using InnoDB likely to slow that down?
No further indexes are likely to help. It needs to scan the entire events table, reaching into the other tables to get the names.
Some things for us to look at:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
EXPLAIN SELECT ...
How much RAM do you have?
Do the aggregates (COUNT and SUM) look correct? In some situations involving JOIN, they can be over-inflated.
Please use the newer JOIN ... ON syntax. (Won't change performance.)
As you observed, a Summary Table may help -- but only if the older data is not being modified. Please provide the SHOW CREATE TABLE and query for it.
Yes, timezone vs "definition of day" is a thorny issue. Notice how StackOverflow defines day based on UTC.
How many new rows are there per day? Are they spread out somewhat evenly throughout the day? If the average number of rows per hour is at least 20, then the Summary Table could be based on half-hour intervals (see the sketch at the end of this answer). (I picked that because of India time vs most of the rest of the world.) The 20 comes from a Rule of Thumb that says that a summary table should have one-tenth as many rows as the Fact table.
Yes, TIMESTAMP instead of DATETIME may be a workaround.
Since you are talking about moderately large tables, consider whether to change INT NULL to SMALLINT UNSIGNED NOT NULL or some other sized integer.
(As for the cliff in 2038, ask yourself how many databases have been active on the same hardware and software since 2006. That may give some perspective on whether your design must survive 16 years.)
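To illustrate the half-hour bucketing mentioned above, a minimal sketch (my assumptions: start_time is a UTC DATETIME and the session time_zone is UTC, so UNIX_TIMESTAMP() round-trips cleanly):
SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(start_time) / 1800) * 1800) AS half_hour,  -- 1800 = seconds per half hour
       COUNT(*) AS occurrences
FROM event
GROUP BY half_hour;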
I am building a dynamic application which will act based on settings.
The settings are stored in a MySQL table which consists of both App level data and global level data (app_id = 0).
My use case: select the settings of an app, and if a setting does not exist, fetch the corresponding setting from the global level.
I have achieved this using subqueries and the COALESCE function.
Question: Can the data be fetched in a single query? If not, can the schema be modified to handle the app level and global level in a simpler way?
Schema
CREATE TABLE `settings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`partner_id` int(11) NOT NULL,
`app_id` int(11) DEFAULT NULL,
`type` varchar(64) DEFAULT NULL,
`name` varchar(64) DEFAULT NULL,
`value` varchar(300) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `key_partner_id` (`partner_id`),
KEY `key_app_id` (`app_id`),
KEY `key_type` (`type`),
KEY `key_name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Data
| id | partner_id | app_id | type  | name    | value |
|----|------------|--------|-------|---------|-------|
| 1  | 500        | 0      | color | primary | blue  |
| 2  | 500        | 100    | color | primary | green |
| 3  | 500        | 101    | color | primary | red   |
Query
SELECT * FROM settings WHERE app_id in (
COALESCE ((SELECT app_id FROM settings WHERE app_id = 100), 0)
);
| id | partner_id | app_id | type  | name    | value |
|----|------------|--------|-------|---------|-------|
| 2  | 500        | 100    | color | primary | green |
SELECT * FROM settings WHERE app_id in (
COALESCE ((SELECT app_id FROM settings WHERE app_id = 102), 0)
);
| id | partner_id | app_id | type  | name    | value |
|----|------------|--------|-------|---------|-------|
| 1  | 500        | 0      | color | primary | blue  |
Single query to get the settings for the app, falling back on the global settings:
SELECT * FROM settings
WHERE app_id IN(0,102)
ORDER BY app_id DESC
LIMIT 1;
Obviously, this assumes a single row for the app settings.
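If an app can have several settings rows (one per type/name pair), here is a sketch of one way to generalize the fallback, picking the most specific row per setting; the pick derived table and the MAX(app_id) preference are my own illustration:
SELECT s.*
FROM settings s
JOIN (
    -- for each setting, prefer the app-specific row (higher app_id) over the global row (0)
    SELECT type, name, MAX(app_id) AS app_id
    FROM settings
    WHERE app_id IN (0, 102)
    GROUP BY type, name
) pick ON pick.type = s.type AND pick.name = s.name AND pick.app_id = s.app_id
WHERE s.app_id IN (0, 102);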
I use the following query to create the table news:
CREATE TABLE IF NOT EXISTS `news` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`news_title` varchar(500) NOT NULL,
`news_detail` varchar(5000) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
mysql> desc news;
+-------------+---------------+------+-----+---------+----------------+
| Field       | Type          | Null | Key | Default | Extra          |
+-------------+---------------+------+-----+---------+----------------+
| id          | int(11)       | NO   | PRI | NULL    | auto_increment |
| news_title  | varchar(500)  | NO   |     |         |                |
| news_detail | varchar(5000) | NO   |     |         |                |
+-------------+---------------+------+-----+---------+----------------+
mysql> insert into news (news_title, news_detail) values ('test','demod demo');
mysql> select * from news;
+----+--------------------------------------------------------------------------+-------------+
| id | news_title                                                               | news_detail |
+----+--------------------------------------------------------------------------+-------------+
|  3 | Advani wants to shift from Gujarat, BJP trying to convince him otherwise | testt       |
|  5 | test                                                                     | demod demo  |
+----+--------------------------------------------------------------------------+-------------+
As you can see in the SELECT output, the id increments like 1, 3, 5, 7..., i.e. it increments by 2. What is the problem here?
On my local machine it increments by 1 and works perfectly, but on my server this problem occurs.
Thanks in advance.
Why?
The auto_increment step can be changed with the variable auto_increment_increment. Normally it's 1, but for some weird reason it was set to 2 in my case. I think MySQL Workbench may be involved.
You can change it by doing one of these:
SET @@auto_increment_increment=1;
SET GLOBAL auto_increment_increment=1;
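You can verify the current session and global values with:
SHOW VARIABLES LIKE 'auto_increment%';
-- auto_increment_increment is the step size; auto_increment_offset is the starting point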
More information
You can find some information here and here.
Check the system variable @@auto_increment_increment.
It should be
SET @@auto_increment_increment=1;
I have found that MySQL (Win 7 64, 5.6.14) does not use an index properly if I feed an IN clause from a subquery. The USER table contains 900k records.
If I use the IN (_SOME_TABLE_OUTPUT_) syntax, I get a full scan of all 900k users. The query runs forever.
If I use the IN ('CONCRETE','VALUES') syntax, I get correct index usage.
How can I make MySQL finally USE the index?
1st case:
explain SELECT gu.id FROM USER gu WHERE gu.uuid in
(select '11b6a540-0dc5-44e0-877d-b3b83f331231' union
select '11b6a540-0dc5-44e0-877d-b3b83f331232');
+------+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
| id   | select_type        | table      | type  | possible_keys | key  | key_len | ref  | rows   | Extra                    |
+------+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
|    1 | PRIMARY            | gu         | index | NULL          | uuid | 257     | NULL | 829930 | Using where; Using index |
|    2 | DEPENDENT SUBQUERY | NULL       | NULL  | NULL          | NULL | NULL    | NULL |   NULL | No tables used           |
|    3 | DEPENDENT UNION    | NULL       | NULL  | NULL          | NULL | NULL    | NULL |   NULL | No tables used           |
| NULL | UNION RESULT       | <union2,3> | ALL   | NULL          | NULL | NULL    | NULL |   NULL | Using temporary          |
+------+--------------------+------------+-------+---------------+------+---------+------+--------+--------------------------+
2nd case:
explain SELECT gu.id FROM USER gu WHERE gu.uuid in
('11b6a540-0dc5-44e0-877d-b3b83f331231');
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | gu    | ref  | uuid          | uuid | 257     | const |    1 | Using where; Using index |
+----+-------------+-------+------+---------------+------+---------+-------+------+--------------------------+
Table structure:
CREATE TABLE `USER` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`email` varchar(255) DEFAULT NULL,
`uuid` varchar(255) NOT NULL,
`partner_id` bigint(20) NOT NULL,
`password` varchar(255) DEFAULT NULL,
`date_created` datetime DEFAULT NULL,
`last_updated` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique-email` (`partner_id`,`email`),
KEY `uuid` (`uuid`),
CONSTRAINT `fk_USER_partner` FOREIGN KEY (`partner_id`) REFERENCES `partner` (`id`) ON DELETE CASCADE,
CONSTRAINT `FKB2D9FEBE725C505E` FOREIGN KEY (`partner_id`) REFERENCES `partner` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3315452 DEFAULT CHARSET=latin1
FORCE INDEX and USE INDEX statements don't change anything.
Demonstration SQLfiddle: http://sqlfiddle.com/#!2/c607e1/2
In fact I faced such a problem before, and it turned out that I had one table with a single column set to UTF-8 while the other tables were latin1. No matter what I did, MySQL insisted on using no indexes. The problem is quite well described in this blog post: Slow queries in MySQL due to collation problems. Once you manage to fix the character set, I believe either of the queries will work.
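For example, a hedged sketch of aligning the character sets, assuming USER (latin1 per its DDL above) should move to utf8 -- take a backup first, since this rebuilds the table:
ALTER TABLE `USER` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
-- after this, uuid comparisons against utf8 values can use the uuid index again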
An inner join on your virtual table might give you better performance. Try something along these lines.
SELECT gu.id
FROM USER gu
INNER JOIN (
select '11b6a540-0dc5-44e0-877d-b3b83f331231' uuid
union all
select '11b6a540-0dc5-44e0-877d-b3b83f331232') ids
on gu.uuid = ids.uuid;
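This helps because the first EXPLAIN shows the IN subquery being run as a DEPENDENT SUBQUERY/DEPENDENT UNION, re-evaluated while the outer query walks the entire uuid index; the derived table here is materialized once, and MySQL can then probe gu.uuid with an indexed lookup for each of its rows.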
So, here's basically the problem:
For starters, I am not asking anyone to do my homework, just to give me a nudge in the right direction.
I have 2 tables containing names and contact data for practice.
Let's call these tables people and contact.
Create Table for people:
CREATE TABLE `people` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`fname` tinytext,
`mname` tinytext,
`lname` tinytext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Create Table for contact:
CREATE TABLE `contact` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`person_id` int(10) unsigned NOT NULL DEFAULT '0',
`tel_home` tinytext,
`tel_work` tinytext,
`tel_mob` tinytext,
`email` text,
PRIMARY KEY (`id`,`person_id`),
KEY `fk_contact` (`person_id`),
CONSTRAINT `fk_contact` FOREIGN KEY (`person_id`) REFERENCES `people` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
When getting the contact information for each person, the query I use is as follows:
SELECT p.id, CONCAT_WS(' ',p.fname,p.mname,p.lname) name, c.tel_home, c.tel_work, c.tel_mob, c.email
FROM people p
JOIN contact c ON c.person_id = p.id;
This creates a response like:
+----+----------+---------------------+----------+---------+---------------------+
| id | name     | tel_home            | tel_work | tel_mob | email               |
+----+----------+---------------------+----------+---------+---------------------+
|  1 | Jane Doe | 1500 (xxx-xxx 1500) | NULL     | NULL    | janedoe@example.com |
|  2 | John Doe | 1502 (xxx-xxx 1502) | NULL     | NULL    | NULL                |
|  2 | John Doe | NULL                | NULL     | NULL    | johndoe@example.com |
+----+----------+---------------------+----------+---------+---------------------+
The problem with this view is that rows 1 and 2 (counting from 0) could have been grouped into a single row.
Even though this "non-pretty" result is due to corrupt data, it is likely that this will also occur in a multi-node database environment.
The targeted result would be something like
+----+----------+---------------------+----------+---------+---------------------+
| id | name     | tel_home            | tel_work | tel_mob | email               |
+----+----------+---------------------+----------+---------+---------------------+
|  1 | Jane Doe | 1500 (xxx-xxx 1500) | NULL     | NULL    | janedoe@example.com |
|  2 | John Doe | 1502 (xxx-xxx 1502) | NULL     | NULL    | johndoe@example.com |
+----+----------+---------------------+----------+---------+---------------------+
Where the rows with the same id and name are grouped when still showing the effective data.
Side notes:
innodb_version: 5.5.32
version: 5.5.32-0ubuntu-.12.04.1-log
version_compile_os: debian_linux-gnu
You could use GROUP_CONCAT(), which "returns a string result with the concatenated non-NULL values from a group":
SELECT p.id,
       CONCAT_WS(' ', p.fname, p.mname, p.lname) AS name,
       GROUP_CONCAT(c.tel_home) AS tel_home,
       GROUP_CONCAT(c.tel_work) AS tel_work,
       GROUP_CONCAT(c.tel_mob)  AS tel_mob,
       GROUP_CONCAT(c.email)    AS email
FROM people p
JOIN contact c ON c.person_id = p.id
GROUP BY p.id;
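If the comma-separated lists aren't wanted, a variant sketch: assuming each column holds at most one non-NULL value per person (as in the target output), MAX() collapses the NULLs to that single value:
SELECT p.id,
       CONCAT_WS(' ', p.fname, p.mname, p.lname) AS name,
       MAX(c.tel_home) AS tel_home,
       MAX(c.tel_work) AS tel_work,
       MAX(c.tel_mob)  AS tel_mob,
       MAX(c.email)    AS email
FROM people p
JOIN contact c ON c.person_id = p.id
GROUP BY p.id;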