I have a table with nested children. I'm trying to fetch a list of parents sorted by the most recent child when available, otherwise by the parent's created date. My query seemed to work at first, but as I import more and more records (~13.6K at the moment), performance has become a problem.
Version: 10.5.5-MariaDB
Table structure (some fields excluded for brevity):
CREATE TABLE `emails` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NULL DEFAULT NULL,
`_lft` int(10) unsigned NOT NULL DEFAULT 0,
`_rgt` int(10) unsigned NOT NULL DEFAULT 0,
`parent_id` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `emails__lft__rgt_parent_id_index` (`_lft`,`_rgt`,`parent_id`) USING BTREE,
KEY `emails__lft__rgt_created_at_index` (`_lft`,`_rgt`,`created_at`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=13484 DEFAULT CHARSET=utf8
Here's the query I'm working with (currently ~21s):
SELECT
`emails`.`id`,
(
SELECT MAX(`descendants`.`created_at`) AS `created_at`
FROM `emails` AS `descendants`
WHERE `descendants`.`_lft` >= `emails`.`_lft`
AND `descendants`.`_rgt` <= `emails`.`_rgt`
) `descendants_created_at`
FROM `emails`
WHERE `parent_id` IS NULL
ORDER BY `descendants_created_at` DESC
LIMIT 25 OFFSET 0;
The _lft and _rgt fields are provided by the lazychaser/laravel-nestedset package; the correlated subquery uses them to find the descendants of each record returned in the main query. The range includes the parent itself, so a created_at value is always returned.
Sample output:
| id    | created_at          | descendants_created_at |
|-------|---------------------|------------------------|
| 13483 | 2021-07-22 12:35:55 | 2021-07-22 12:35:55    |
| 8460  | 2021-04-29 12:56:57 | 2021-07-22 12:35:00    |
| 13481 | 2021-07-22 12:33:22 | 2021-07-22 12:33:22    |
| 3514  | 2021-01-16 09:43:42 | 2021-07-22 12:23:28    |
| 13479 | 2021-07-22 11:28:07 | 2021-07-22 11:28:07    |
| 13478 | 2021-07-22 11:27:09 | 2021-07-22 11:27:09    |
| 13407 | 2021-07-21 10:05:41 | 2021-07-22 10:21:14    |
| 13408 | 2021-07-21 10:05:41 | 2021-07-22 10:21:14    |
| 13389 | 2021-07-21 08:17:23 | 2021-07-22 10:21:14    |
| 13303 | 2021-07-19 14:25:38 | 2021-07-22 10:21:14    |
The problem seems to be once I'm doing the actual ordering here:
ORDER BY `descendants_created_at` DESC
My EXPLAIN looks like this:
UPDATE #1 - Using a LEFT JOIN and adding a parent_id key, this query now runs in ~10s, which is better but still not great:
https://dbfiddle.uk/?rdbms=mariadb_10.5&fiddle=f5442fdfba119cc750c09a19024ccf7c
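For reference, the LEFT JOIN shape referred to above looks roughly like this (a sketch; the exact query is in the fiddle). Because the nested-set range includes the parent itself, the join always matches at least one row:
SELECT e.id, MAX(d.created_at) AS descendants_created_at
FROM emails AS e
LEFT JOIN emails AS d
       ON d._lft >= e._lft
      AND d._rgt <= e._rgt
WHERE e.parent_id IS NULL
GROUP BY e.id
ORDER BY descendants_created_at DESC
LIMIT 25 OFFSET 0;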
Description
I have a MySQL table like the following one:
CREATE TABLE `ticket` (
`ticket_id` int(11) NOT NULL AUTO_INCREMENT,
`ticket_number` varchar(30) DEFAULT NULL,
`pick1` varchar(2) DEFAULT NULL,
`pick2` varchar(2) DEFAULT NULL,
`pick3` varchar(2) DEFAULT NULL,
`pick4` varchar(2) DEFAULT NULL,
`pick5` varchar(2) DEFAULT NULL,
`pick6` varchar(2) DEFAULT NULL,
PRIMARY KEY (`ticket_id`)
) ENGINE=InnoDB AUTO_INCREMENT=19675 DEFAULT CHARSET=latin1;
Let's also assume we have the following values already stored in the DB:
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
| ticket_id | ticket_number     | pick1 | pick2 | pick3 | pick4 | pick5 | pick6 |
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
| 655       | 08-09-21-24-46-52 | 8     | 9     | 21    | 24    | 46    | 52    |
| 658       | 08-23-24-40-42-45 | 8     | 23    | 24    | 40    | 42    | 45    |
| 660       | 07-18-19-20-22-31 | 7     | 18    | 19    | 20    | 22    | 31    |
| ...       | ...               | ...   | ...   | ...   | ...   | ...   | ...   |
| 19674     | 06-18-33-43-49-50 | 6     | 18    | 33    | 43    | 49    | 50    |
+-----------+-------------------+-------+-------+-------+-------+-------+-------+
Now, my goal is to compare each ticket with every other one in the table (except itself) in terms of their respective ticket_number fields (6 elements per set, split by -). For instance, if I compare ticket_id = 655 with ticket_id = 658 in terms of the elements in their respective ticket_number fields, I find that the elements 08 and 24 appear in both sets. If we compare ticket_id = 660 with ticket_id = 19674, there is only one coincidence: 18.
What I am actually using to carry out these comparisons is the following query:
select A.ticket_id, A.ticket_number, P.ticket_id, P.ticket_number,
       count(P.ticket_number) as cnt
from ticket A
inner join ticket P on A.ticket_id != P.ticket_id
where
      (A.ticket_number like concat("%", lpad(P.pick1,2,0), "%"))
    + (A.ticket_number like concat("%", lpad(P.pick2,2,0), "%"))
    + (A.ticket_number like concat("%", lpad(P.pick3,2,0), "%"))
    + (A.ticket_number like concat("%", lpad(P.pick4,2,0), "%"))
    + (A.ticket_number like concat("%", lpad(P.pick5,2,0), "%"))
    + (A.ticket_number like concat("%", lpad(P.pick6,2,0), "%")) > 3
group by A.ticket_id
having cnt > 5;
That is, I first create an INNER JOIN pairing every row with every other row that has a different ticket_id, then compare each P.pickX (X = 1..6) against A.ticket_number in the joined row, counting the number of matches between both sets.
Finally, after executing, I obtain something like this:
+-------------+-------------------+-------------+-------------------+-----+
| A.ticket_id | A.ticket_number   | P.ticket_id | P.ticket_number   | cnt |
+-------------+-------------------+-------------+-------------------+-----+
| 8489        | 14-21-28-32-48-49 | 2528        | 14-21-33-45-48-49 | 6   |
| 8553        | 02-14-17-38-47-53 | 2364        | 02-30-38-44-47-53 | 6   |
| 8615        | 05-12-29-33-36-43 | 4654        | 12-21-29-33-36-37 | 6   |
| 8686        | 09-13-29-34-44-48 | 6038        | 09-13-17-29-33-44 | 6   |
| 8693        | 01-10-14-17-42-50 | 5330        | 01-10-37-42-48-50 | 6   |
| ...         | ...               | ...         | ...               | ... |
| 19195       | 05-13-29-41-46-51 | 5106        | 07-13-14-29-41-51 | 6   |
+-------------+-------------------+-------------+-------------------+-----+
Problem
The problem is that I execute this for a table of 10476 rows, resulting in more than 100 million ticket_number vs. pickX comparisons and taking around 172 seconds in total. This is too slow.
GOAL
My goal is to make this execution as fast as possible, ideally completing in less than a second, since this must work in real time.
Is that possible?
If you want to keep the current structure, then change pick1..6 to the TINYINT type instead of VARCHAR.
A signed TINYINT stores values between -128 and 127. Your query then won't need the CONCAT with %, which is the cause of the slow run.
Then these two queries will give you the same result:
select * FROM ticket where pick1 = '8';
select * FROM ticket where pick1 = '08';
This is the SQL structure:
CREATE TABLE `ticket` (
`ticket_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`ticket_number` varchar(30) DEFAULT NULL,
`pick1` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick2` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick3` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick4` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick5` tinyint(1) unsigned zerofill DEFAULT NULL,
`pick6` tinyint(1) unsigned zerofill DEFAULT NULL,
PRIMARY KEY (`ticket_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;
I think you can even remove the zerofill.
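With integer picks, the LIKE/CONCAT matching can become plain integer membership tests. A minimal sketch, assuming pick1..6 are TINYINT as above (NULL picks would need extra handling):
select A.ticket_id, A.ticket_number, P.ticket_id, P.ticket_number
from ticket A
inner join ticket P on A.ticket_id != P.ticket_id
where (P.pick1 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick2 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick3 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick4 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick5 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6))
    + (P.pick6 in (A.pick1, A.pick2, A.pick3, A.pick4, A.pick5, A.pick6)) > 3;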
If this doesn't work, change the table design.
How big can the numbers be? It looks like 50. If the answer is 63 or less, then change the format as follows:
Store all 6 numbers in a single SET ('0','1','2',...,'50') column and use suitable operations to set the nth bit.
Then, comparing two sets becomes BIT_COUNT(x & y) to find out how many match. A simple comparison will test for equality.
If your goal is to see if a particular lottery guess is already in the table, then index that column so that a lookup will be fast. I don't mean minutes or even seconds, but rather a few milliseconds. Even for a billion rows.
The bit arithmetic can be done in SQL or in your client language. For example, to build the SET for (11, 33, 7), the code would be
INSERT INTO t SET picks = '11,33,7' -- order does not matter
Also this would work:
... picks = (1 << 11) |
(1 << 33) |
(1 << 7)
A quick example:
CREATE TABLE `setx` (
`picks` set('1','2','3','4','5','6','7','8','9','10') NOT NULL
) ENGINE=InnoDB;
INSERT INTO setx (picks) VALUES ('2,10,6');
INSERT INTO setx (picks) VALUES ('1,3,5,7,9'), ('2,4,6,8,10'), ('9,8,7,6,5,4,3,2,1,10');
SELECT picks, HEX(picks+0) FROM setx;
+----------------------+--------------+
| picks                | HEX(picks+0) |
+----------------------+--------------+
| 2,6,10               | 222          |
| 1,3,5,7,9            | 155          |
| 2,4,6,8,10           | 2AA          |
| 1,2,3,4,5,6,7,8,9,10 | 3FF          |
+----------------------+--------------+
4 rows in set (0.00 sec)
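Building on that, the pairwise comparison from the question could be sketched like this, assuming ticket gains a picks SET column as described (picks + 0 yields the underlying bitmap):
SELECT a.ticket_id, b.ticket_id,
       BIT_COUNT((a.picks + 0) & (b.picks + 0)) AS matches -- matching numbers per pair
FROM ticket a
JOIN ticket b ON a.ticket_id < b.ticket_id                 -- visit each pair once
WHERE BIT_COUNT((a.picks + 0) & (b.picks + 0)) >= 4;
The match count becomes a couple of integer operations per pair instead of six LIKE scans over a string.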
I have this table in MySQL called ts1:
+----------+-------------+---------------+
| position | email       | date_of_birth |
+----------+-------------+---------------+
| 3        | NULL        | 1987-09-03    |
| 1        | NULL        | 1982-03-26    |
| 2        | Sam@gmail   | 1976-10-03    |
| 2        | Sam@gmail   | 1976-10-03    |
+----------+-------------+---------------+
I want to drop the duplicate rows using ALTER IGNORE.
I have tried
ALTER IGNORE TABLE ts1 ADD UNIQUE INDEX inx (position, email, date_of_birth);
and
ALTER IGNORE TABLE ts1 ADD UNIQUE(position, email, date_of_birth);
In both cases I get
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'IGNORE TABLE ts1 ADD UNIQUE(position, email, date_of_birth)' at line 1
I'm using MySQL 5.7.9. Any suggestions?
ALTER IGNORE was removed in MySQL 5.7, so that syntax no longer parses. To do it inline against the table, given just the columns you show, consider the below. To do it in a new table as suggested by Strawberry, see my pastie link under the comments.
create table thing
( position int not null,
email varchar(100) null,
dob date not null
);
insert thing(position,email,dob) values
(3,null,'1987-09-03'),(1,null,'1982-03-26'),
(2,'SamIAm@gmail.com','1976-10-03'),(2,'SamIAm@gmail.com','1976-10-03');
select * from thing;
+----------+------------------+------------+
| position | email            | dob        |
+----------+------------------+------------+
| 3        | NULL             | 1987-09-03 |
| 1        | NULL             | 1982-03-26 |
| 2        | SamIAm@gmail.com | 1976-10-03 |
| 2        | SamIAm@gmail.com | 1976-10-03 |
+----------+------------------+------------+
alter table thing add id int auto_increment primary key;
Delete with a join pattern, deleting subsequent dupes (that have a larger id number)
delete thing
from thing
join
( select position,email,dob,min(id) as theMin,count(*) as theCount
from thing
group by position,email,dob
having theCount>1
) xxx -- alias
on thing.position=xxx.position and thing.email<=>xxx.email -- <=> is the NULL-safe equal, so dupes with NULL email match too
   and thing.dob=xxx.dob and thing.id>xxx.theMin;
-- 1 row affected
select * from thing;
+----------+------------------+------------+----+
| position | email            | dob        | id |
+----------+------------------+------------+----+
| 3        | NULL             | 1987-09-03 | 1  |
| 1        | NULL             | 1982-03-26 | 2  |
| 2        | SamIAm@gmail.com | 1976-10-03 | 3  |
+----------+------------------+------------+----+
Add the unique index
CREATE UNIQUE INDEX `thing_my_composite` ON thing (position,email,dob); -- forbid dupes hereafter
View current table schema
show create table thing;
CREATE TABLE `thing` (
`position` int(11) NOT NULL,
`email` varchar(100) DEFAULT NULL,
`dob` date NOT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `thing_my_composite` (`position`,`email`,`dob`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
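For completeness, the new-table route mentioned at the top (Strawberry's suggestion) might be sketched like this; thing2 and the RENAME step are hypothetical, and the linked pastie has the author's actual version:
create table thing2 like thing;                  -- copies columns, PK and the unique index
insert into thing2 (position,email,dob)
select distinct position,email,dob from thing;   -- DISTINCT treats NULLs as equal, so NULL-email dupes collapse too
rename table thing to thing_old, thing2 to thing;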
I have a very basic table with a list of transactions (in this case, they're Call Detail Record entries):
CREATE TABLE `cdr_records` (
`dateTimeOrigination` int(11) DEFAULT NULL,
`callingPartyNumber` varchar(50) DEFAULT NULL,
`originalCalledPartyNumber` varchar(50) DEFAULT NULL,
`finalCalledPartyNumber` varchar(50) DEFAULT NULL,
`pkid` varchar(50) NOT NULL DEFAULT '',
`duration` int(11) DEFAULT NULL,
`destDeviceName` varchar(50) DEFAULT NULL,
PRIMARY KEY (`pkid`)
)
When I query the transactions in this table, I get output like this:
+------------------------------------+--------------------+---------------------------+------------------------+-----------------------+
| from_unixtime(dateTimeOrigination) | callingPartyNumber | originalCalledPartyNumber | finalCalledPartyNumber | sec_to_time(duration) |
+------------------------------------+--------------------+---------------------------+------------------------+-----------------------+
| 2014-09-26 08:22:11                | 12345              | exampleNumber             | exampleNumber          | 02:49:54              |
| 2014-09-26 15:06:35                | 67891              | exampleNumber             | exampleNumber          | 02:39:46              |
| 2014-09-26 17:46:33                | 67891              | exampleNumber             | exampleNumber          | 02:37:13              |
| 2014-08-21 17:41:30                | 12345              | exampleNumber             | exampleNumber          | 02:23:55              |
| 2014-08-21 14:43:01                | 12345              | exampleNumber             | exampleNumber          | 02:01:56              |
+------------------------------------+--------------------+---------------------------+------------------------+-----------------------+
I would like to write a query that does two things:
1) Tells me who the top talkers are, based on the duration of their calls
2) Gives the total duration of all of that particular user's calls for that period
How is something like this approached? If this is done with a DISTINCT query, how can I sum up the duration values for each entry?
You can aggregate per caller in a derived table, then format the summed duration in the outer query:
SELECT callingPartyNumber, sec_to_time(duration_sum)
FROM
(
    SELECT callingPartyNumber, SUM(duration) AS duration_sum
    FROM cdr_records
    WHERE dateTimeOrigination
          BETWEEN UNIX_TIMESTAMP('2014-08-21 14:43:01') -- lower bound first
              AND UNIX_TIMESTAMP('2014-09-26 08:22:11')
    GROUP BY callingPartyNumber
) sub1
ORDER BY duration_sum DESC; -- top talkers first
So, here's basically the problem:
For starters, I am not asking anyone to do my homework, just to give me a nudge in the right direction.
I have two tables containing names and contact data for practice.
Let's call these tables people and contact.
Create Table for people:
CREATE TABLE `people` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`fname` tinytext,
`mname` tinytext,
`lname` tinytext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Create Table for contact:
CREATE TABLE `contact` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`person_id` int(10) unsigned NOT NULL DEFAULT '0',
`tel_home` tinytext,
`tel_work` tinytext,
`tel_mob` tinytext,
`email` text,
PRIMARY KEY (`id`,`person_id`),
KEY `fk_contact` (`person_id`),
CONSTRAINT `fk_contact` FOREIGN KEY (`person_id`) REFERENCES `people` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
When getting the contact information for each person, the query I use is as follows:
SELECT p.id, CONCAT_WS(' ',p.fname,p.mname,p.lname) name,
       c.tel_home, c.tel_work, c.tel_mob, c.email
FROM people p
JOIN contact c ON c.person_id = p.id;
This creates a response like:
+----+----------+---------------------+----------+---------+---------------------+
| id | name     | tel_home            | tel_work | tel_mob | email               |
+----+----------+---------------------+----------+---------+---------------------+
| 1  | Jane Doe | 1500 (xxx-xxx 1500) | NULL     | NULL    | janedoe@example.com |
| 2  | John Doe | 1502 (xxx-xxx 1502) | NULL     | NULL    | NULL                |
| 2  | John Doe | NULL                | NULL     | NULL    | johndoe@example.com |
+----+----------+---------------------+----------+---------+---------------------+
The problem with this view is that rows 1 and 2 (counting from 0) could have been grouped into a single row.
Even though this "non-pretty" result is due to corrupt data, it is likely that this will occur in a multi-node database environment.
The targeted result would be something like
+----+----------+---------------------+----------+---------+---------------------+
| id | name     | tel_home            | tel_work | tel_mob | email               |
+----+----------+---------------------+----------+---------+---------------------+
| 1  | Jane Doe | 1500 (xxx-xxx 1500) | NULL     | NULL    | janedoe@example.com |
| 2  | John Doe | 1502 (xxx-xxx 1502) | NULL     | NULL    | johndoe@example.com |
+----+----------+---------------------+----------+---------+---------------------+
where rows with the same id and name are grouped while still showing the effective data.
Side notes:
innodb_version: 5.5.32
version: 5.5.32-0ubuntu-.12.04.1-log
version_compile_os: debian_linux-gnu
You could use GROUP_CONCAT(), which "returns a string result with the concatenated non-NULL values from a group":
SELECT p.id,
       GROUP_CONCAT(DISTINCT CONCAT_WS(' ',p.fname,p.mname,p.lname)) name, -- DISTINCT avoids repeating the name per contact row
       GROUP_CONCAT(c.tel_home) tel_home,
       GROUP_CONCAT(c.tel_work) tel_work,
       GROUP_CONCAT(c.tel_mob ) tel_mob,
       GROUP_CONCAT(c.email   ) email
FROM people p
JOIN contact c ON c.person_id = p.id
GROUP BY p.id
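A note on the design: GROUP_CONCAT works here because it skips NULLs, but when each person has at most one non-NULL value per column, plain MAX() gives the same collapsed row without risking comma-joined duplicates. A sketch, assuming the same people/contact join:
SELECT p.id,
       CONCAT_WS(' ', p.fname, p.mname, p.lname) name,
       MAX(c.tel_home) tel_home, -- MAX() ignores NULLs, so it returns the one real value per group
       MAX(c.tel_work) tel_work,
       MAX(c.tel_mob)  tel_mob,
       MAX(c.email)    email
FROM people p
JOIN contact c ON c.person_id = p.id
GROUP BY p.id;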