Joining two different tables in MySQL

I have two tables:
Albums:
CREATE TABLE IF NOT EXISTS `albums` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) NOT NULL,
`singer` varchar(64) NOT NULL,
`year` int(11) NOT NULL,
`releaseDate` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=7 ;
Music:
CREATE TABLE IF NOT EXISTS `musics` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) NOT NULL,
`singer` varchar(128) NOT NULL,
`genre` varchar(128) NOT NULL,
`albumId` int(11) DEFAULT NULL,
`year` int(4) NOT NULL,
`releaseDate` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `albumId` (`albumId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=16 ;
I want to join these tables and order the result by releaseDate. Is that possible?
Sorry for my English.
Currently I get this result:
+-----------------------------------------------+-------------------------+-------------+
| albums_name | musics_name | releaseDate |
+-----------------------------------------------+-------------------------+-------------+
| The Artificial Theory For The Dramatic Beauty | K | NULL |
| The Artificial Theory For The Dramatic Beauty | Fiction In Hope | NULL |
| The Artificial Theory For The Dramatic Beauty | Chemicarium | NULL |
| The Artificial Theory For The Dramatic Beauty | Voice | NULL |
| The Artificial Theory For The Dramatic Beauty | Blue | NULL |
| The Artificial Theory For The Dramatic Beauty | Mirror | NULL |
| The Artificial Theory For The Dramatic Beauty | If You Want To Wake Up? | NULL |
| The Artificial Theory For The Dramatic Beauty | Interlude | NULL |
| NULL | Everything At Once | 2010-11-11 |
| NULL | Blue Freightliner | 2011-11-11 |
+-----------------------------------------------+-------------------------+-------------+
I want:
+-----------------------------------------------+-------------------------+-------------+
| albums_name | musics_name | releaseDate |
+-----------------------------------------------+-------------------------+-------------+
| The Artificial Theory For The Dramatic Beauty | NULL | 2009-11-11 |
| NULL | Everything At Once | 2010-11-11 |
| NULL | Blue Freightliner | 2011-11-11 |
+-----------------------------------------------+-------------------------+-------------+

You should study and experiment with JOIN. There are a few different types (for example INNER JOIN and LEFT JOIN).
Here's a simple example to get you started:
SELECT albums.name AS albums_name, musics.name AS musics_name, musics.releaseDate
FROM albums
LEFT JOIN musics ON albums.id = musics.albumId
ORDER BY musics.releaseDate
Or, if you need music and only the album when it matches:
SELECT albums.name AS albums_name, musics.name AS musics_name, musics.releaseDate
FROM musics
LEFT JOIN albums ON musics.albumId = albums.id
ORDER BY musics.releaseDate
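
To see how the choice of the left-hand table changes the result, here is a minimal sketch that runs both LEFT JOIN variants through Python's built-in sqlite3 module, used here only as a stand-in for MySQL. The table contents ('Album A', 'Empty Album', 'Track 1', 'Track 2', 'Single X') are made up for illustration and the schemas are trimmed to the columns the join needs.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: the join
# semantics shown here are the same in both engines).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT, releaseDate TEXT);
CREATE TABLE musics (id INTEGER PRIMARY KEY, name TEXT, albumId INTEGER, releaseDate TEXT);
-- Hypothetical sample rows: one album with tracks, one album without,
-- and one track that belongs to no album.
INSERT INTO albums VALUES (1, 'Album A', '2009-11-11'), (2, 'Empty Album', '2012-01-01');
INSERT INTO musics VALUES
  (1, 'Track 1', 1, NULL),
  (2, 'Track 2', 1, NULL),
  (3, 'Single X', NULL, '2010-11-11');
""")

# albums on the left: every album is kept, album-less tracks are dropped.
albums_first = conn.execute("""
SELECT albums.name, musics.name
FROM albums
LEFT JOIN musics ON albums.id = musics.albumId
ORDER BY albums.id, musics.id
""").fetchall()

# musics on the left: every track is kept, albums without tracks are dropped.
musics_first = conn.execute("""
SELECT albums.name, musics.name
FROM musics
LEFT JOIN albums ON musics.albumId = albums.id
ORDER BY musics.id
""").fetchall()

print(albums_first)
print(musics_first)
```

Neither variant alone gives one row per album plus one row per album-less track, which is what the question asks for.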

Your output appears to consist of two very distinct parts:
albums and their release dates;
album-less tracks and their release dates.
Since each part comes from a different table, this strikes me as a classic example of a union, not a join, of two sets:
SELECT
name AS albums_name,
NULL AS musics_name,
releaseDate
FROM albums
UNION ALL
SELECT
NULL AS albums_name,
name AS musics_name,
releaseDate
FROM musics
WHERE
albumId IS NULL
ORDER BY
releaseDate ASC
;
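
As a quick check, the UNION ALL approach can be tried against the rows shown in the question. This is only a sketch: it uses Python's sqlite3 module in place of MySQL, trims the tables to the relevant columns, and includes just a few of the album's tracks.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption: UNION ALL and ORDER BY on a
# result column behave the same way for this query).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE albums (id INTEGER PRIMARY KEY, name TEXT, releaseDate TEXT);
CREATE TABLE musics (id INTEGER PRIMARY KEY, name TEXT, albumId INTEGER, releaseDate TEXT);
INSERT INTO albums VALUES
  (1, 'The Artificial Theory For The Dramatic Beauty', '2009-11-11');
INSERT INTO musics VALUES
  (1, 'K', 1, NULL),
  (2, 'Fiction In Hope', 1, NULL),
  (3, 'Chemicarium', 1, NULL),
  (4, 'Everything At Once', NULL, '2010-11-11'),
  (5, 'Blue Freightliner', NULL, '2011-11-11');
""")

rows = conn.execute("""
SELECT name AS albums_name, NULL AS musics_name, releaseDate
FROM albums
UNION ALL
SELECT NULL AS albums_name, name AS musics_name, releaseDate
FROM musics
WHERE albumId IS NULL          -- only tracks that belong to no album
ORDER BY releaseDate ASC
""").fetchall()

for row in rows:
    print(row)
```

The album appears once with its own release date, followed by the two album-less tracks in date order, matching the desired output in the question.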

Related

SQL/MySQL - Update target table with most recent entry

I'm looking for some help with a SQL/MySQL problem.
I have three source tables:
CREATE TABLE `customers` (
`cid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`customer_name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`cid`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8
CREATE TABLE `standards` (
`sid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`standard_name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`sid`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8
CREATE TABLE `partial_standard_compliance` (
`customer` bigint(20) unsigned NOT NULL,
`standard` bigint(20) unsigned NOT NULL,
`standard_compliance` bigint(20) unsigned DEFAULT NULL,
`created_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The idea is a customer gives themselves a rating using the standard_compliance column in the partial_standard_compliance table.
Customers can rate the same standard multiple times.
Result example:
+----------+----------+---------------------+---------------------+
| customer | standard | standard_compliance | created_time |
+----------+----------+---------------------+---------------------+
| 1 | 1 | 50 | 2023-01-28 16:19:34 |
| 1 | 1 | 60 | 2023-01-28 16:19:40 |
| 1 | 1 | 70 | 2023-01-28 16:19:48 |
| 2 | 10 | 30 | 2023-01-28 16:58:21 |
| 2 | 8 | 60 | 2023-01-28 16:58:32 |
| 2 | 9 | 60 | 2023-01-28 16:58:39 |
| 2 | 9 | 80 | 2023-01-28 16:58:43 |
+----------+----------+---------------------+---------------------+
I need to create a 4th table that has customer name, standard name and the most recent rating they have given themselves.
I have been trying with JOINS and CREATE AS SELECT, but haven't been able to solve it.
Any point in the right direction would be great. Thanks.
Rather than creating a fourth table with CREATE TABLE ... AS SELECT, it would be better to create a view, so that the result always reflects the latest ratings. The query below uses row_number(), which requires MySQL 8.0 or later:
create view fourth_table as
select customer_name ,
standard_name ,
standard_compliance,
created_time
from (select c.customer_name,
s.standard_name,
psc.standard_compliance,
psc.created_time,
row_number() over(partition by psc.customer, psc.standard order by psc.created_time desc ) as rn
from customers c
inner join partial_standard_compliance psc on psc.customer=c.cid
inner join standards s on s.sid=psc.standard
) x
where rn=1;
https://dbfiddle.uk/ZiK-k8jN
MySQL View
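
On servers without window functions (MySQL before 8.0), a different technique can give the same kind of result: join the ratings table to a grouped subquery that holds the latest created_time per group. The sketch below is not the answer's method. It assumes that "most recent rating" means the latest row for each customer and standard pair, runs on Python's sqlite3 module as a stand-in for MySQL, and uses made-up customer and standard names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cid INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE standards (sid INTEGER PRIMARY KEY, standard_name TEXT);
CREATE TABLE partial_standard_compliance (
  customer INTEGER, standard INTEGER,
  standard_compliance INTEGER, created_time TEXT);
-- Hypothetical names; the ratings are the rows from the question.
INSERT INTO customers VALUES (1, 'Customer 1'), (2, 'Customer 2');
INSERT INTO standards VALUES (1, 'Standard 1'), (8, 'Standard 8'),
                             (9, 'Standard 9'), (10, 'Standard 10');
INSERT INTO partial_standard_compliance VALUES
  (1, 1, 50, '2023-01-28 16:19:34'),
  (1, 1, 60, '2023-01-28 16:19:40'),
  (1, 1, 70, '2023-01-28 16:19:48'),
  (2, 10, 30, '2023-01-28 16:58:21'),
  (2, 8, 60, '2023-01-28 16:58:32'),
  (2, 9, 60, '2023-01-28 16:58:39'),
  (2, 9, 80, '2023-01-28 16:58:43');
""")

latest = conn.execute("""
SELECT c.customer_name, s.standard_name, psc.standard_compliance
FROM partial_standard_compliance psc
JOIN (SELECT customer, standard, MAX(created_time) AS latest_time
      FROM partial_standard_compliance
      GROUP BY customer, standard) m
  ON  m.customer = psc.customer
  AND m.standard = psc.standard
  AND m.latest_time = psc.created_time   -- keep only the newest row per pair
JOIN customers c ON c.cid = psc.customer
JOIN standards s ON s.sid = psc.standard
ORDER BY psc.customer, psc.standard
""").fetchall()

for row in latest:
    print(row)
```

One caveat of this variant: if two ratings for the same pair share exactly the same created_time, both rows are returned.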

MariaDB - why are the primary keys not being used for joins on a specific table?

I'm trying to understand why these two queries are treated differently with regards to use of the primary keys in joins.
This query with a join on icd_codes (the SELECT query, without the EXPLAIN, of course) completes in 56 ms:
EXPLAIN
SELECT var.Var_ID,
var.Gene,
var.HGVSc,
pVCF_145K.PT_ID,
pVCF_145K.AD_ALT,
pVCF_145K.AD_REF,
icd_codes.ICD_NM,
icd_codes.PT_AGE
FROM public.variants_145K var
INNER JOIN public.pVCF_145K USING (Var_ID)
INNER JOIN public.icd_codes using (PT_ID)
# INNER JOIN public.demographics USING (PT_ID)
WHERE Gene IN ('SLC9A6', 'SLC9A7')
AND Canonical
AND impact = 'high'
+------+-------------+-----------+-------+------------------------------------------------------------------+---------------------------------+---------+------------------------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------+-------+------------------------------------------------------------------+---------------------------------+---------+------------------------+------+------------------------------------+
| 1 | SIMPLE | var | range | PRIMARY,variants_145K_Gene_index,variants_145K_Impact_Gene_index | variants_145K_Impact_Gene_index | 125 | NULL | 280 | Using index condition; Using where |
| 1 | SIMPLE | pVCF_145K | ref | PRIMARY,pVCF_145K_PT_ID_index | PRIMARY | 326 | public.var.Var_ID | 268 | |
| 1 | SIMPLE | icd_codes | ref | PRIMARY | PRIMARY | 38 | public.pVCF_145K.PT_ID | 29 | |
+------+-------------+-----------+-------+------------------------------------------------------------------+---------------------------------+---------+------------------------+------+------------------------------------+
This query with a join on demographics takes over 11 minutes, and I'm not sure how to interpret the difference in the explain results. Why is it resorting to using the join buffer? How can I optimize this further?
EXPLAIN
SELECT variants_145K.Var_ID,
variants_145K.Gene,
variants_145K.HGVSc,
pVCF_145K.PT_ID,
pVCF_145K.AD_ALT,
pVCF_145K.AD_REF,
demographics.Sex,
demographics.Age
FROM public.variants_145K
INNER JOIN public.pVCF_145K USING (Var_ID)
# inner join public.icd_codes using (PT_ID)
INNER JOIN public.demographics USING (PT_ID)
WHERE Gene IN ('SLC9A6', 'SLC9A7')
AND Canonical
AND impact = 'high'
+------+-------------+---------------+--------+------------------------------------------------------------------+---------------------------------+---------+-------------------------------------------------------+---------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------------+--------+------------------------------------------------------------------+---------------------------------+---------+-------------------------------------------------------+---------+------------------------------------+
| 1 | SIMPLE | variants_145K | range | PRIMARY,variants_145K_Gene_index,variants_145K_Impact_Gene_index | variants_145K_Impact_Gene_index | 125 | NULL | 280 | Using index condition; Using where |
| 1 | SIMPLE | demographics | ALL | PRIMARY | NULL | NULL | NULL | 1916393 | Using join buffer (flat, BNL join) |
| 1 | SIMPLE | pVCF_145K | eq_ref | PRIMARY,pVCF_145K_PT_ID_index | PRIMARY | 364 | public.variants_145K.Var_ID,public.demographics.PT_ID | 1 | |
+------+-------------+---------------+--------+------------------------------------------------------------------+---------------------------------+---------+-------------------------------------------------------+---------+------------------------------------+
Adding a further filter on demographics (WHERE demographics.Platform IS NOT NULL), as shown below, reduces the run time to 38 seconds. However, we also run queries without such filters, so ideally the join would use the primary key on PT_ID.
+------+-------------+---------------+--------+------------------------------------------------------------------+---------------------------------+---------+-------------------------------------------------------+--------+------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------------+--------+------------------------------------------------------------------+---------------------------------+---------+-------------------------------------------------------+--------+------------------------------------------------------------------------+
| 1 | SIMPLE | variants_145K | range | PRIMARY,variants_145K_Gene_index,variants_145K_Impact_Gene_index | variants_145K_Impact_Gene_index | 125 | NULL | 280 | Using index condition; Using where |
| 1 | SIMPLE | demographics | range | PRIMARY,Demographics_PLATFORM_index | Demographics_PLATFORM_index | 17 | NULL | 258544 | Using index condition; Using where; Using join buffer (flat, BNL join) |
| 1 | SIMPLE | pVCF_145K | eq_ref | PRIMARY,pVCF_145K_PT_ID_index | PRIMARY | 364 | public.variants_145K.Var_ID,public.demographics.PT_ID | 1 | |
+------+-------------+---------------+--------+------------------------------------------------------------------+---------------------------------+---------+-------------------------------------------------------+--------+------------------------------------------------------------------------+
The tables:
create table public.demographics # 1,916,393 rows
(
PT_ID varchar(9) not null
primary key,
Age float(3,1) null,
Status varchar(8) not null,
Sex varchar(7) not null,
Race_1 varchar(41) not null,
Race_2 varchar(41) not null,
Ethnicity varchar(22) not null,
Smoker_flag tinyint(1) not null,
Platform char(4) null,
MyCode_Consent tinyint(1) not null,
MR_ENC_DT date null,
Birthday date null,
Deathday date null,
max_unrelated_145K tinyint unsigned null
);
create index Demographics_PLATFORM_index
on public.demographics (Platform);
create table public.icd_codes # 116,220,141 rows
(
PT_ID varchar(9) not null,
ICD_CD varchar(8) not null,
ICD_NM varchar(217) not null,
DX_DT date not null,
PT_AGE float(3,1) unsigned not null,
CODE_SYSTEM char(7) not null,
primary key (PT_ID, ICD_CD, DX_DT)
);
create table public.pVCF_145K # 10,113,244,082 rows
(
Var_ID varchar(81) not null,
PT_ID varchar(9) not null,
GT tinyint unsigned not null,
GQ smallint unsigned not null,
AD_REF smallint unsigned not null,
AD_ALT smallint unsigned not null,
DP smallint unsigned not null,
FT varchar(30) null,
primary key (Var_ID, PT_ID)
);
create index pVCF_145K_PT_ID_index
on public.pVCF_145K (PT_ID);
create table public.variants_145K # 151,314,917 rows
(
Var_ID varchar(81) not null,
Gene varchar(22) null,
Feature varchar(18) not null,
Feature_type varchar(10) null,
HIGH_INF_POS tinyint(1) null,
Consequence varchar(26) not null,
rsid varchar(34) null,
Impact varchar(8) not null,
Canonical tinyint(1) not null,
Exon smallint unsigned null,
Intron smallint unsigned null,
HGVSc varchar(323) null,
HGVSp varchar(196) null,
AA_position smallint unsigned null,
gnomAD_NFE_MAF float null,
SIFT varchar(14) null,
PolyPhen varchar(17) null,
GHS_Hom mediumint(5) unsigned null,
GHS_Het mediumint(5) unsigned null,
GHS_WT mediumint(5) unsigned null,
IDT_MAF float null,
VCR_MAF float null,
UKB_MAF float null,
Chr tinyint unsigned not null,
Pos int(9) unsigned not null,
Ref varchar(298) not null,
Alt varchar(306) not null,
primary key (Var_ID, Feature)
);
create index variants_145K_Chr_Pos_Ref_Alt_index
on public.variants_145K (Chr, Pos, Ref, Alt);
create index variants_145K_Gene_index
on public.variants_145K (Gene);
create index variants_145K_Impact_Gene_index
on public.variants_145K (Impact, Gene);
create index variants_145K_rsid_index
on public.variants_145K (rsid);
This is on MariaDB 10.5.8 (innodb)
Thank you!
INDEX(impact, canonical, gene) or INDEX(canonical, impact, gene) would be a better index for var (variants_145K).
If you don't need it, remove INNER JOIN public.icd_codes USING (PT_ID). It is costly to reach into that table, and all it does is filter out any rows that fail in the JOIN.
Ditto for demographics.
Using the "join buffer" is not necessarily a last resort; it is often the faster approach, especially if most of the table is needed and the join buffer is big enough (see join_buffer_size).
More
Note that demographics has a single-column PRIMARY KEY(PT_ID), but the other table has a composite PK. This probably impacts whether the Optimizer will even consider using the "join buffer".
Depending on a lot of things (in the query and the data), the Optimizer may make the wrong choice between join_buffer and repeatedly doing lookups.

Is there a way to combine two aggregate functions in MySQL to get distinct values?

I have the following schema in MySQL:
CREATE TABLE `ORDER_CONTENTS` (
`Order_ID` int(10) NOT NULL,
`Pizza_Name` varchar(20) NOT NULL DEFAULT '',
`Quantity` int(2) NOT NULL,
PRIMARY KEY (`Order_ID`,`Pizza_Name`),
KEY `ordercontentsfk2_idx` (`Pizza_Name`),
CONSTRAINT `order_contentsfk1` FOREIGN KEY (`Order_ID`) REFERENCES `ORDERS` (`Order_ID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `CUSTOMERS` (
`Mobile_Number` varchar(10) NOT NULL,
`Name` varchar(45) NOT NULL,
`Age` int(3) DEFAULT NULL,
`Gender` enum('M','F') DEFAULT NULL,
`Email` varchar(100) DEFAULT NULL,
PRIMARY KEY (`Mobile_Number`),
UNIQUE KEY `Mobile_Number_UNIQUE` (`Mobile_Number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `ORDERS` (
`Order_ID` int(10) NOT NULL AUTO_INCREMENT,
`Mobile_Number` varchar(10) NOT NULL,
`Postcode` int(4) NOT NULL,
`Timestamp` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`Order_ID`),
KEY `ordersfk1_idx` (`Mobile_Number`),
KEY `ordersfk2_idx` (`Postcode`),
CONSTRAINT `ordersfk1` FOREIGN KEY (`Mobile_Number`) REFERENCES `CUSTOMERS` (`Mobile_Number`) ON DELETE NO ACTION ON UPDATE CASCADE,
CONSTRAINT `ordersfk2` FOREIGN KEY (`Postcode`) REFERENCES `STORES` (`Postcode`) ON DELETE NO ACTION ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1;
CREATE TABLE `STORES` (
`Postcode` int(4) NOT NULL DEFAULT '0',
`Address` varchar(100) DEFAULT NULL,
`Phone_Number` varchar(10) DEFAULT NULL,
PRIMARY KEY (`Postcode`),
UNIQUE KEY `Postcode_UNIQUE` (`Postcode`),
UNIQUE KEY `Address_UNIQUE` (`Address`),
UNIQUE KEY `Phone_Number_UNIQUE` (`Phone_Number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I need to find the following:
Problem Statement
For each customer, list the store details of their favorite pizza
store, where a store is the favorite if it is the one where the
customer purchased the most pizzas.
I have got as far as the following query:
select `Name`,SUM(quantity) as hqty,COUNT(*),Postcode from CUSTOMERS natural join orders natural join order_contents group by Mobile_Number,postcode;
This gives me a result as the following:
+---------------+------+----------+----------+
| Name | hqty | COUNT(*) | Postcode |
+---------------+------+----------+----------+
| Homer Simpson | 19 | 3 | 4000 |
| Homer Simpson | 1 | 1 | 4502 |
| Ned Flanders | 2 | 1 | 4000 |
+---------------+------+----------+----------+
But in this case there are two rows for the same customer (i.e. Homer Simpson). Why is this so? I figured that I would need to use a combination of aggregate functions.
Any help/explanation would be great.
Cheers!
[UPDATE 1]
Just for reference:
select * from CUSTOMERS natural join orders natural join
order_contents;
The above query produces this:
+----------+---------------+---------------+------+--------+-----------------+----------+---------------------+--------------+----------+
| Order_ID | Mobile_Number | Name | Age | Gender | Email | Postcode | Timestamp | Pizza_Name | Quantity |
+----------+---------------+---------------+------+--------+-----------------+----------+---------------------+--------------+----------+
| 1 | 0412345678 | Homer Simpson | 38 | M | homer#doh.com | 4000 | 2014-08-21 19:38:01 | Garlic Bread | 9 |
| 1 | 0412345678 | Homer Simpson | 38 | M | homer#doh.com | 4000 | 2014-08-21 19:38:01 | Hawaiian | 9 |
| 2 | 0412345678 | Homer Simpson | 38 | M | homer#doh.com | 4000 | 2014-08-21 19:38:01 | Vegan Lovers | 1 |
| 3 | 0412345678 | Homer Simpson | 38 | M | homer#doh.com | 4502 | 2014-08-21 19:38:12 | Meat Lovers | 1 |
| 4 | 0412345679 | Ned Flanders | 60 | M | ned#vatican.net | 4000 | 2014-08-21 19:39:09 | Meat Lovers | 2 |
+----------+---------------+---------------+------+--------+-----------------+----------+---------------------+--------------+----------+
Also please note the problem statement
SELECT *
FROM customers c
JOIN stores s
ON s.postcode =
(
SELECT postcode
FROM orders o
JOIN order_contents oc
USING (order_id)
WHERE o.mobile_number = c.mobile_number
GROUP BY
postcode
ORDER BY
SUM(quantity) DESC
LIMIT 1
)
This won't show customers who have made no orders at all. If you need those, change the JOIN on stores to a LEFT JOIN.
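
The correlated subquery can be tried on the rows from the question's update. The sketch below uses Python's sqlite3 module as a stand-in for MySQL and trims the tables to the columns the query needs; the store addresses ('Store A', 'Store B') are invented, since the question does not list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (mobile_number TEXT PRIMARY KEY, name TEXT);
CREATE TABLE stores (postcode INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, mobile_number TEXT, postcode INTEGER);
CREATE TABLE order_contents (order_id INTEGER, pizza_name TEXT, quantity INTEGER);
INSERT INTO customers VALUES ('0412345678', 'Homer Simpson'), ('0412345679', 'Ned Flanders');
-- Hypothetical store addresses.
INSERT INTO stores VALUES (4000, 'Store A'), (4502, 'Store B');
INSERT INTO orders VALUES
  (1, '0412345678', 4000), (2, '0412345678', 4000),
  (3, '0412345678', 4502), (4, '0412345679', 4000);
INSERT INTO order_contents VALUES
  (1, 'Garlic Bread', 9), (1, 'Hawaiian', 9), (2, 'Vegan Lovers', 1),
  (3, 'Meat Lovers', 1), (4, 'Meat Lovers', 2);
""")

favorites = conn.execute("""
SELECT c.name, s.postcode, s.address
FROM customers c
JOIN stores s
  ON s.postcode = (
       -- For this customer: total pizzas per store, highest total first.
       SELECT o.postcode
       FROM orders o
       JOIN order_contents oc USING (order_id)
       WHERE o.mobile_number = c.mobile_number
       GROUP BY o.postcode
       ORDER BY SUM(oc.quantity) DESC
       LIMIT 1
     )
ORDER BY c.name
""").fetchall()

for row in favorites:
    print(row)
```

Homer bought 19 pizzas at postcode 4000 and 1 at 4502, so he gets a single row for 4000. If a customer has bought the same number of pizzas at two stores, LIMIT 1 picks one of them arbitrarily.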
Group by the primary key of your customers table.
The reason you are getting duplicate customers is that you are grouping by both mobile_number and postcode, which produces one row per customer per store rather than one row per customer.
Your query should become something like this:
select Name ,SUM(quantity) as hqty,COUNT(*),Postcode from CUSTOMERS natural join orders natural join order_contents group by Mobile_Number
Mobile_Number is the primary key of the CUSTOMERS table, so this groups by the customer uniquely. Note that Postcode is then taken from an arbitrary row of each group, and MySQL rejects the query when ONLY_FULL_GROUP_BY is enabled.

MySQL CONCAT multiple unique rows

So, here's basically the problem:
For starters, I am not asking anyone to do my homework, just to give me a nudge in the right direction.
I have two tables containing names and contact data, which I use for practice.
Let's call these tables people and contact.
Create Table for people:
CREATE TABLE `people` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`fname` tinytext,
`mname` tinytext,
`lname` tinytext,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Create Table for contact:
CREATE TABLE `contact` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`person_id` int(10) unsigned NOT NULL DEFAULT '0',
`tel_home` tinytext,
`tel_work` tinytext,
`tel_mob` tinytext,
`email` text,
PRIMARY KEY (`id`,`person_id`),
KEY `fk_contact` (`person_id`),
CONSTRAINT `fk_contact` FOREIGN KEY (`person_id`) REFERENCES `people` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
When getting the contact information for each person, the query I use is as follows:
SELECT p.id, CONCAT_WS(' ',p.fname,p.mname,p.lname) name, c.tel_home, c.tel_work, c.tel_mob, c.email;
This produces a result like:
+----+----------+---------------------+----------+---------+---------------------+
| id | name | tel_home | tel_work | tel_mob | email |
+----+----------+---------------------+----------+---------+---------------------+
| 1 | Jane Doe | 1500 (xxx-xxx 1500) | NULL | NULL | janedoe#example.com |
| 2 | John Doe | 1502 (xxx-xxx 1502) | NULL | NULL | NULL |
| 2 | John Doe | NULL | NULL | NULL | johndoe#example.com |
+----+----------+---------------------+----------+---------+---------------------+
The problem with this result is that rows 1 and 2 (counting from 0) could have been merged into a single row.
Even though this "non-pretty" result is due to corrupt data, it is likely that this will occur in a multi-node database environment.
The targeted result would be something like
+----+----------+---------------------+----------+---------+---------------------+
| id | name | tel_home | tel_work | tel_mob | email |
+----+----------+---------------------+----------+---------+---------------------+
| 1 | Jane Doe | 1500 (xxx-xxx 1500) | NULL | NULL | janedoe#example.com |
| 2 | John Doe | 1502 (xxx-xxx 1502) | NULL | NULL | johndoe#example.com |
+----+----------+---------------------+----------+---------+---------------------+
That is, rows with the same id and name are merged while still showing all the non-NULL data.
Side notes:
innodb_version: 5.5.32
version: 5.5.32-0ubuntu-.12.04.1-log
version_compile_os: debian_linux-gnu
You could use GROUP_CONCAT(), which "returns a string result with the concatenated non-NULL values from a group":
SELECT p.id,
GROUP_CONCAT(DISTINCT CONCAT_WS(' ',p.fname,p.mname,p.lname)) name,
GROUP_CONCAT(c.tel_home) tel_home,
GROUP_CONCAT(c.tel_work) tel_work,
GROUP_CONCAT(c.tel_mob ) tel_mob,
GROUP_CONCAT(c.email ) email
FROM people p
JOIN contact c ON c.person_id = p.id
GROUP BY p.id
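
Here is a small runnable sketch of the GROUP_CONCAT() idea on the sample rows from the question. It uses Python's sqlite3 module rather than MySQL; SQLite's GROUP_CONCAT also skips NULL values. The tables are cut down to a few columns, and the name is built with the || operator because older SQLite versions have no CONCAT_WS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE contact (id INTEGER PRIMARY KEY, person_id INTEGER,
                      tel_home TEXT, email TEXT);
INSERT INTO people VALUES (1, 'Jane', 'Doe'), (2, 'John', 'Doe');
-- John Doe's data is split across two contact rows, as in the question.
INSERT INTO contact (person_id, tel_home, email) VALUES
  (1, '1500 (xxx-xxx 1500)', 'janedoe#example.com'),
  (2, '1502 (xxx-xxx 1502)', NULL),
  (2, NULL, 'johndoe#example.com');
""")

merged = conn.execute("""
SELECT p.id,
       p.fname || ' ' || p.lname AS name,
       GROUP_CONCAT(c.tel_home)  AS tel_home,   -- NULLs are ignored
       GROUP_CONCAT(c.email)     AS email
FROM people p
JOIN contact c ON c.person_id = p.id
GROUP BY p.id
ORDER BY p.id
""").fetchall()

for row in merged:
    print(row)
```

If a person has several non-NULL values in the same column, GROUP_CONCAT joins them with commas rather than choosing one.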

MySQL structure help for joins ( large tables)

I currently have two tables that are used in a select query with a simple join. The first table holds around 6-9 million rows and is the one being joined in. The primary table has anywhere from 1 million to 300 million rows. However, I notice that once the primary table exceeds about 10 million rows, the select query goes from instant to very slow (3+ seconds, and growing).
Here is my table structure and queries.
CREATE TABLE IF NOT EXISTS `links` (
`link_id` int(10) unsigned NOT NULL,
`domain_id` mediumint(7) unsigned NOT NULL,
`parent_id` int(11) unsigned DEFAULT NULL,
`hash` int(10) unsigned NOT NULL,
`url` text NOT NULL,
`type` enum('html','pdf') DEFAULT NULL,
`processed` enum('N','Y') NOT NULL DEFAULT 'N',
UNIQUE KEY `hash` (`hash`),
KEY `idx_processed` (`processed`),
KEY `domain_id` (`domain_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
CREATE TABLE IF NOT EXISTS `domains` (
`domain_id` mediumint(7) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(170) NOT NULL,
`blocked` enum('N','Y') NOT NULL DEFAULT 'N',
`count` mediumint(6) NOT NULL DEFAULT '0',
`mcount` mediumint(3) NOT NULL,
PRIMARY KEY (`domain_id`),
KEY `name` (`name`),
KEY `blocked` (`blocked`),
KEY `mcount` (`mcount`),
KEY `count` (`count`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=10834389 ;
Query:
(SELECT link_id, url, hash FROM links, domains WHERE links.domain_id = domains.domain_id and mcount > 1 and processed='N' limit 200)
UNION
(SELECT link_id, url, hash FROM links where processed='N' and type='html' limit 200)
Explain select:
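+----+--------------+------------+-------+-------------------------+---------------+---------+---------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+-------+-------------------------+---------------+---------+---------------------------+---------+-------------+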
| 1 | PRIMARY | domains | range | PRIMARY,mcount | mcount | 3 | NULL | 257673 | Using where |
| 1 | PRIMARY | links | ref | idx_processed,domain_id | domain_id | 3 | crawler.domains.domain_id | 1 | Using where |
| 2 | UNION | links | ref | idx_processed | idx_processed | 1 | const | 7090017 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+------------+-------+-------------------------+---------------+---------+---------------------------+---------+-------------+
Right now, I'm trying a partition with 20 partitions on links using domain_id as the key.
Any other options would be greatly appreciated.
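A single SELECT statement could replace your entire UNION statement, provided you only want rows that satisfy both sets of conditions (the UNION returns rows that satisfy either one):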
SELECT link_id, url, hash
FROM links, domains
WHERE links.domain_id = domains.domain_id
AND mcount > 1
AND processed='N'
AND type='html'
This may not be THE answer you are looking for, but it should help you simplify your question.
When things suddenly slow down you might want to check the size of your indexes (used in the query execution) vs size of various mysql buffers.