MySQL query performance very slow

The query below takes more than 8 minutes and processes about 900,000 rows. It is very slow and affects my product. I can't identify why the query is slow; all the indexes appear to be set correctly.
explain SELECT
COUNT(DISTINCT (cinfo.CONTACT_ID))
FROM
cinfo
INNER JOIN
LTocMapping ON cinfo.CONTACT_ID = LTocMapping.CONTACT_ID
WHERE
(((((((((cinfo.COUNTRY LIKE '%Panama%')
OR (cinfo.COUNTRY LIKE '%PANAMA%'))
AND (((cinfo.CONTACT_EMAIL NOT LIKE '%test%')
AND (cinfo.CONTACT_EMAIL NOT LIKE '%engine%'))
OR (cinfo.CONTACT_EMAIL IS NULL)))
AND ((SELECT
(GROUP_CONCAT(Temp.LIST_ID
ORDER BY Temp.LIST_ID) REGEXP ('.*,*221715000514445053,*.*$'))
FROM
LTocMapping Temp
WHERE
((LTocMapping.CONTACT_ID = Temp.CONTACT_ID)
AND (((Temp.MAPPING_ID >= 221715000000000000)
AND (Temp.MAPPING_ID <= 221715999999999999))
OR ((Temp.MAPPING_ID >= 0)
AND (Temp.MAPPING_ID <= 999999999999))))
GROUP BY Temp.CONTACT_ID) = '0'))
AND ((SELECT
(GROUP_CONCAT(Temp.LIST_ID
ORDER BY Temp.LIST_ID) REGEXP ('.*,*221715000520574130,*.*$'))
FROM
LTocMapping Temp
WHERE
((LTocMapping.CONTACT_ID = Temp.CONTACT_ID)
AND (((Temp.MAPPING_ID >= 221715000000000000)
AND (Temp.MAPPING_ID <= 221715999999999999))
OR ((Temp.MAPPING_ID >= 0)
AND (Temp.MAPPING_ID <= 999999999999))))
GROUP BY Temp.CONTACT_ID) = '0'))
AND (LTocMapping.LIST_ID IN (221715000520574130 , 221715000201569885)))
AND (LTocMapping.STATUS = BINARY 'subscribed'))
AND (((cinfo.CONTACT_STATUS = BINARY 'active')
OR (cinfo.CONTACT_STATUS = BINARY 'softbounce'))
AND (LTocMapping.STATUS = BINARY 'subscribed')))
AND (((cinfo.CONTACT_ID >= 221715000000000000)
AND (cinfo.CONTACT_ID <= 221715999999999999))
OR ((cinfo.CONTACT_ID >= 0)
AND (cinfo.CONTACT_ID <= 999999999999))))
And the answer will be
The table definitions are below, for reference:
Table 1 :
mysql> desc cinfo;
+------------------------+--------------+------+-----+-----------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+-----------+-------+
| CONTACT_ID | bigint(19) | NO | PRI | NULL | |
| CONTACT_EMAIL | varchar(100) | NO | MUL | NULL | |
| TITLE | varchar(20) | YES | | NULL | |
| FIRSTNAME | varchar(100) | YES | | NULL | |
| LASTNAME | varchar(50) | YES | | NULL | |
| ADDED_BY | varchar(20) | YES | | NULL | |
| ADDED_TIME | bigint(19) | NO | | NULL | |
| LAST_UPDATED_TIME | bigint(19) | NO | | NULL | |
+------------------------+--------------+------+-----+-----------+-------+
Table 2 :
mysql> desc LTocMapping;
+---------------------+--------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+------------+-------+
| MAPPING_ID | bigint(19) | NO | PRI | NULL | |
| CONTACT_ID | bigint(19) | NO | MUL | NULL | |
| LIST_ID | bigint(19) | NO | MUL | NULL | |
| STATUS | varchar(100) | YES | | subscribed | |
| MAPPING_STATUS | varchar(20) | YES | | connected | |
| MAPPING_TIME | bigint(19) | YES | | NULL | |
+---------------------+--------------+------+-----+------------+-------+

As far as I can tell, your subqueries are the bottleneck:
The first subquery references LTocMapping.CONTACT_ID.
The second subquery references LTocMapping.CONTACT_ID as well.
These references to values of the outer query turn the inner queries into correlated subqueries (also called dependent subqueries). That means that for every row fetched from the outer tables (~970,000), you fire 2 additional queries against another table.
So that is nearly 2 million queries (and not trivial ones, it seems) that you are executing.
Most of the time, a correlated subquery can be replaced by a proper join, but this depends on the use case. You can also join the same table twice by giving it a different alias each time.
To outline some join options, though, you need to explain why the subqueries in the condition group_concat(....) = '0' are important, or better, what you want to achieve.
(PS: You can also see that EXPLAIN lists them as DEPENDENT SUBQUERY.)

OR is inefficient, see if you can avoid it.
Leading wildcards in LIKE are inefficient. See if a FULLTEXT index would work for you.
With a proper COLLATION, you don't need to test both upper and lower case. Also you can avoid use of BINARY. In both cases, you might be able to use an index. (What indexes do you have?)
Try to change from
WHERE ( ( SELECT ... ) = '0' )
to
WHERE ( NOT EXISTS ( SELECT ... ) )
(The SELECT will need some modification.)
(Please get rid of some of the redundant parens; it is hard to read.)
(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
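A minimal sketch of the NOT EXISTS rewrite suggested above, run against an in-memory SQLite database as a stand-in for MySQL. It assumes the intent of each GROUP_CONCAT ... REGEXP ... = '0' test is "this contact has no mapping row for the given LIST_ID". The list IDs 10 and 99, the index name idx_contact_list, and the parameter names are made up for illustration, and the MAPPING_ID range filters are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LTocMapping (
    MAPPING_ID INTEGER PRIMARY KEY,
    CONTACT_ID INTEGER NOT NULL,
    LIST_ID    INTEGER NOT NULL,
    STATUS     TEXT DEFAULT 'subscribed'
);
-- Hypothetical composite index so the NOT EXISTS probe is an index lookup.
CREATE INDEX idx_contact_list ON LTocMapping (CONTACT_ID, LIST_ID);
INSERT INTO LTocMapping (MAPPING_ID, CONTACT_ID, LIST_ID) VALUES
    (1, 100, 10),   -- contact 100: only on the wanted list
    (2, 200, 10),   -- contact 200: on the wanted list ...
    (3, 200, 99),   -- ... and also on the excluded list
    (4, 300, 99);   -- contact 300: only on the excluded list
""")

# Count contacts on the wanted list that have no row for the excluded list.
# NOT EXISTS can stop at the first matching row; it does not need to build
# and pattern-match a concatenated string per contact.
query = """
SELECT COUNT(DISTINCT m.CONTACT_ID)
FROM LTocMapping AS m
WHERE m.LIST_ID = :wanted
  AND m.STATUS = 'subscribed'
  AND NOT EXISTS (
        SELECT 1
        FROM LTocMapping AS t
        WHERE t.CONTACT_ID = m.CONTACT_ID
          AND t.LIST_ID = :excluded
  )
"""
count = conn.execute(query, {"wanted": 10, "excluded": 99}).fetchone()[0]
print(count)
```

Only contact 100 qualifies here, so the count is 1. Whether MySQL executes this faster than the original depends on the indexes actually present on LTocMapping, so check it with EXPLAIN.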

Related

Make a MariaDB view that includes a boolean

I created this view:
CREATE OR REPLACE VIEW vista_metadatos AS
SELECT m.*, f.archivo IS NOT NULL AS myBooleanColumn
FROM metadatos m
LEFT JOIN facturas f ON (m.uuid = f.uuid)
However myBooleanColumn is being returned as an INT and I want it to be a Boolean which in this case should be a TINYINT:
> desc vista_metadatos;
+-----------------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+--------------+------+-----+---------+-------+
| uuid | varchar(40) | YES | | NULL | |
| otherBooleanColumn | tinyint(1) | YES | | NULL | |
| myBooleanColumn | int(1) | NO | | 0 | |
| ... | varchar(42) | NO | | | |
+-----------------------+--------------+------+-----+---------+-------+
From this desc I know that views can hold TINYINT, but how can I create a view that uses that condition as a TINYINT?
You should not really care. A view does not actually store the data, so there is no overhead anyway. And you can use an INT that has 0/1 values just like you use a BOOLEAN.
As far as the documentation states, BOOLEAN (or TINYINT) is not a supported target type for CAST. So although that might not be satisfying from a purely intellectual point of view, you'll have to live with this.
CAST and CONVERT have no boolean target type, so use an IF instead (note that this version returns the strings 'TRUE'/'FALSE', not a TINYINT):
CREATE OR REPLACE VIEW vista_metadatos AS
SELECT
m.*
,IF(f.archivo IS NOT NULL, 'TRUE', 'FALSE') AS myBooleanColumn
FROM metadatos m
LEFT JOIN facturas f ON (m.uuid = f.uuid)

Duplicate removal not working on table with many NULLs

Perhaps I've been staring at the screen too long but I have the following [legacy] table I'm messing with:
describe t3_test;
+--------------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+------------------+------+-----+---------+----------------+
| provnum | varchar(24) | YES | MUL | NULL | |
| trgt_mo | datetime | YES | | NULL | |
| mcare | varchar(2) | YES | | NULL | |
| bed2prsn_asst | varchar(2) | YES | | NULL | |
| trnsfr2prsn_asst | varchar(2) | YES | | NULL | |
| tlt2prsn_asst | varchar(2) | YES | | NULL | |
| hygn2prsn_asst | varchar(2) | YES | | NULL | |
| bath2psrn_asst | varchar(2) | YES | | NULL | |
| ampmcare2prsn_asst | varchar(2) | YES | | NULL | |
| any2prsn_asst | varchar(2) | YES | | NULL | |
| n | float | YES | | NULL | |
| pct | float | YES | | NULL | |
| trgt_qtr | varchar(12) | YES | | NULL | |
| recno | int(10) unsigned | NO | PRI | NULL | auto_increment |
| enddate | date | YES | | NULL | |
+--------------------+------------------+------+-----+---------+----------------+
15 rows in set (0.00 sec)
I have data that looks like this..
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","5767343","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"1",NULL,NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075309","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075308","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,"0",NULL,NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","5767342","2008-12-31"
"555223","2008-10-01 00:00:00","N",NULL,"1",NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075327","2008-12-31"
"555223","2008-10-01 00:00:00","N","1",NULL,NULL,NULL,NULL,NULL,NULL,"36","83.7209","2008Q4","4075323","2008-12-31"
"555223","2008-10-01 00:00:00","Y","1",NULL,NULL,NULL,NULL,NULL,NULL,"4","9.30233","2008Q4","4075325","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"0",NULL,NULL,NULL,NULL,NULL,"3","6.97674","2008Q4","4075310","2008-12-31"
"555223","2008-10-01 00:00:00",NULL,NULL,"1",NULL,NULL,NULL,NULL,NULL,"40","93.0233","2008Q4","4075311","2008-12-31"
The first two lines of the table clearly appear to be dupes (minus the A.I. index "recno"). I've tried a half dozen dupe-removal routines and they are not automatically removed.
At this point I am not sure what exactly is wrong? Is it possible there's an invisible character somewhere? Is it possible a letter is in a different character encoding? When I dump the data to CSV as is listed, it doesn't look any different.
Do you have a delete routine that would work on this file structure and remove anything that is a dupe (ignoring the recno field)? I have been staring at this for two days and, for some reason, it escapes me. (By the way, I am aware of the column name anomaly for bath2psrn_asst; that's not it.)
This (original) table has over 13 million records in it. And is over 3GB in size so I'm looking for the most efficient way to kill dupes.. Any ideas?
Here's an example of one of the dupe-killing techniques I used that did not work:
DELETE a FROM t3_test as a, t3_test as b WHERE
(a.provnum=b.provnum)
AND (a.trgt_mo=b.trgt_mo OR a.trgt_mo IS NULL AND b.trgt_mo IS NULL)
AND (a.mcare=b.mcare OR a.mcare IS NULL AND b.mcare IS NULL)
AND (a.bed2prsn_asst=b.bed2prsn_asst OR a.bed2prsn_asst IS NULL AND b.bed2prsn_asst IS NULL)
AND (a.trnsfr2prsn_asst=b.trnsfr2prsn_asst OR a.trnsfr2prsn_asst IS NULL AND b.trnsfr2prsn_asst IS NULL)
AND (a.tlt2prsn_asst=b.tlt2prsn_asst OR a.tlt2prsn_asst IS NULL AND b.tlt2prsn_asst IS NULL)
AND (a.hygn2prsn_asst=b.hygn2prsn_asst OR a.hygn2prsn_asst IS NULL AND b.hygn2prsn_asst IS NULL)
AND (a.bath2psrn_asst=b.bath2psrn_asst OR a.bath2psrn_asst IS NULL AND b.bath2psrn_asst IS NULL)
AND (a.ampmcare2prsn_asst=b.ampmcare2prsn_asst OR a.ampmcare2prsn_asst IS NULL AND b.ampmcare2prsn_asst IS NULL)
AND (a.any2prsn_asst=b.any2prsn_asst OR a.any2prsn_asst IS NULL AND b.any2prsn_asst IS NULL)
AND (a.n=b.n OR a.n IS NULL AND b.n IS NULL)
AND (a.pct=b.pct OR a.pct IS NULL AND b.pct IS NULL)
AND (a.trgt_qtr=b.trgt_qtr OR a.trgt_qtr IS NULL AND b.trgt_qtr IS NULL)
AND (a.enddate=b.enddate OR a.enddate IS NULL AND b.enddate IS NULL)
AND (a.recno>b.recno);
For such a large table, delete can be quite inefficient -- all the logging needed for the deletes is very cumbersome.
I might recommend that you try the truncate/insert approach:
create table temp_t3_test as
select provnum, trgt_mo, . . .,
min(recno) as recno,
enddate
from t3_test
group by provnum, trgt_mo, . . ., enddate;
truncate table t3_test;
insert into t3_test(provnum, trgt_mo, . . . , recno, enddate)
select *
from temp_t3_test;
Try:
CREATE TABLE t3_new AS
(
SELECT provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
Min(recno),
enddate
FROM t3_test
GROUP BY provnum,
trgt_mo,
mcare,
bed2prsn_asst,
trnsfr2prsn_asst,
tlt2prsn_asst,
hygn2prsn_asst,
bath2psrn_asst,
ampmcare2prsn_asst,
any2prsn_asst,
n,
pct,
trgt_qtr,
enddate
)
When you use MIN(recno) without grouping, you don't select one row per set of duplicates; you get the minimum of all recno values, applied to every row. To keep one row per set of duplicates, use DISTINCT or GROUP BY, as I have here. I would also suggest leaving recno out of the temp table and adding a new auto-increment column to the table you recreate, to avoid gaps in the ids.
This is meant to be used with the method suggested by Gordon Linoff.
In the case of this scenario, the problem was not with the SQL statement. It was a problem with the DATA, but it was not visible.
The two fields designated type "float" held hidden decimal values that were slightly different from each other. Converting those fields to DECIMAL(a,b) type made the dupes show up and be properly deleted by conventional means.
Special thanks to Gordon Linoff for suggesting looking into this.
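The effect described above can be reproduced in a small sketch. It uses an in-memory SQLite database as a stand-in for MySQL, a cut-down hypothetical table t3_demo with only three columns, and a made-up second pct value that differs from the first only beyond the fourth decimal place. Rounding to a fixed scale inside GROUP BY plays the role that converting the column to DECIMAL(a,b) played in the original fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t3_demo (recno INTEGER PRIMARY KEY, provnum TEXT, pct REAL)"
)
conn.executemany(
    "INSERT INTO t3_demo (recno, provnum, pct) VALUES (?, ?, ?)",
    [
        (4075309, "555223", 93.0233),
        (5767343, "555223", 93.02330017),  # differs only past the 4th decimal
        (4075308, "555223", 6.97674),
    ],
)

# Grouping on the raw floating-point value: the near-duplicates stay separate.
raw_groups = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1 FROM t3_demo GROUP BY provnum, pct)"
).fetchone()[0]

# Grouping on the value rounded to a fixed scale: the near-duplicates collapse,
# and MIN(recno) picks the one row to keep from each group.
rows = conn.execute(
    "SELECT MIN(recno) FROM t3_demo GROUP BY provnum, ROUND(pct, 4)"
).fetchall()
keep = sorted(row[0] for row in rows)

print(raw_groups, keep)
```

The raw grouping sees three distinct rows, while the rounded grouping keeps only recno 4075308 and 4075309. The scale of 4 is an assumption for this sample; on real data it has to match the precision the values are meant to carry.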

Is this possible in one fast mysql query?

I have three tables and I need different data from all of them. Sadly, I also need to be able to extract the latest row.
Here are my tables:
messages: I am just storing the content of the messages inside a table because one text could be sent to multiple users
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| message_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| content | varchar(255) | NO | | 0 | |
+------------+------------------+------+-----+---------+----------------+
conversations: This table just reflects a single conversation between two users.
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| partner_id | int(10) unsigned | NO | MUL | NULL | |
| conversation_id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| expedition_id | int(11) | NO | | NULL | |
| active | tinyint(4) | NO | | 1 | |
+-----------------+------------------+------+-----+---------+----------------+
conversation_messages: This table stores the information about the actual messages exchanged.
+-----------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+-------+
| message_id | int(11) unsigned | NO | PRI | NULL | |
| receiver_id | int(11) unsigned | NO | PRI | NULL | |
| conversation_id | int(11) unsigned | NO | MUL | NULL | |
| status | tinyint(4) | NO | | NULL | |
| timestamp | datetime | YES | | NULL | |
+-----------------+------------------+------+-----+---------+-------+
What I want to do now is select the latest message inside each conversation and get the content of that message as well. (It sounds simple, but I did not find a simple solution.) What I tried is the following:
SELECT max(c_m.message_id), m.content, c_m.`status`
FROM expedition_conversations e_c, conversation_messages c_m
INNER JOIN messages m ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1 AND (c_m.conversation_id = e_c.conversation_id)
GROUP BY c_m.conversation_id;
Sadly, since GROUP BY internally seems to select the first inserted row most of the time, the content I select from the messages table is wrong, while the message_id selected from conversation_messages is correct.
Any idea how to perform this in one query? If you have any suggestions for altering the table structure, I would also appreciate those.
Thanks in advance.
You may want to try this version:
SELECT c_m.message_id, m.content, c_m.`status`
FROM expedition_conversations e_c join
conversation_messages c_m
ON c_m.conversation_id = e_c.conversation_id INNER JOIN
messages m
ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1 AND
NOT EXISTS (SELECT 1
FROM conversation_messages cm2
WHERE cm2.conversation_id = c_m.conversation_id AND
cm2.timestamp > c_m.timestamp
)
For performance, you want an index on conversation_messages(conversation_id, timestamp).
This is possible because your use of AUTO_INCREMENT means that the highest id belongs to the latest message:
SELECT
messages.*
FROM
conversations
INNER JOIN (
SELECT conversation_id, MAX(message_id) AS maxmsgid
FROM conversation_messages
GROUP BY conversation_id
) AS latest ON latest.conversation_id=conversations.conversation_id
INNER JOIN messages
ON messages.message_id=latest.maxmsgid
WHERE
1=1 -- whatever you want or need!
Since this query is bound to be quite slow, you might want to consider a few options:
Throw hardware at it: use enough RAM and configure MySQL to spill the intermediate table to disk as late as possible.
Use denormalization: have an AFTER INSERT trigger on messages update a field on conversation_messages that holds the latest message ID.
Try this little trick:
SELECT c_m.message_id, m.content, c_m.status
FROM expedition_conversations e_c
JOIN (select * from (
select * from conversation_messages
order by message_id desc) x
group by conversation_id) c_m ON c_m.conversation_id = e_c.conversation_id
INNER JOIN messages m ON m.message_id = c_m.message_id
WHERE e_c.expedition_id = 1
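This will work on your version of MySQL (5.6.19) and should outperform other approaches. Note, though, that it relies on MySQL's non-standard handling of GROUP BY: which row supplies the non-aggregated columns is not guaranteed, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.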
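For comparison, here is a small runnable sketch of the NOT EXISTS approach from the first answer, using an in-memory SQLite database in place of MySQL. The sample rows and the index name idx_conv_time are invented for illustration, the tables are cut down to the columns the query needs, and the table is named conversations as in the schema shown in the question rather than expedition_conversations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (message_id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE conversations (
    conversation_id INTEGER PRIMARY KEY,
    expedition_id   INTEGER NOT NULL
);
CREATE TABLE conversation_messages (
    message_id      INTEGER NOT NULL,
    receiver_id     INTEGER NOT NULL,
    conversation_id INTEGER NOT NULL,
    status          INTEGER NOT NULL,
    timestamp       TEXT,
    PRIMARY KEY (message_id, receiver_id)
);
-- The index recommended in the answer: (conversation_id, timestamp).
CREATE INDEX idx_conv_time ON conversation_messages (conversation_id, timestamp);

INSERT INTO messages VALUES
    (1, 'hello'), (2, 'how are you?'), (3, 'first'), (4, 'other');
INSERT INTO conversations VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO conversation_messages VALUES
    (1, 7, 10, 0, '2014-01-01 10:00:00'),
    (2, 7, 10, 0, '2014-01-01 11:00:00'),
    (3, 8, 11, 1, '2014-01-02 09:00:00'),
    (4, 9, 12, 0, '2014-01-03 09:00:00');
""")

# Keep a message only if no newer message exists in the same conversation.
latest = conn.execute("""
SELECT c.conversation_id, m.content, cm.status
FROM conversations AS c
JOIN conversation_messages AS cm ON cm.conversation_id = c.conversation_id
JOIN messages AS m ON m.message_id = cm.message_id
WHERE c.expedition_id = 1
  AND NOT EXISTS (
        SELECT 1
        FROM conversation_messages AS newer
        WHERE newer.conversation_id = cm.conversation_id
          AND newer.timestamp > cm.timestamp
  )
ORDER BY c.conversation_id
""").fetchall()

print(latest)
```

Unlike the GROUP BY trick, this does not depend on which row the engine happens to pick for non-aggregated columns. If two messages in a conversation share the same latest timestamp, both are returned, so a tie-breaker on message_id may be needed on real data.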

Query taking very long (Explain included)

Goal of query:
Display race by district.
Query:
SELECT school_data_schools_outer.district_id,
school_data_race_ethnicity_raw_outer.year,
school_data_race_ethnicity_raw_outer.race,
ROUND(
SUM( school_data_race_ethnicity_raw_outer.count) /
(SELECT SUM(count)
FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner
INNER JOIN school_data_schools as school_data_schools_inner
USING (school_id)
WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id
AND school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year) * 100, 2)
FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer
INNER JOIN school_data_schools as school_data_schools_outer USING (school_id)
GROUP BY school_data_schools_outer.district_id,
school_data_race_ethnicity_raw_outer.year,
school_data_race_ethnicity_raw_outer.race
mysql> explain SELECT school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race,ROUND(SUM(school_data_race_ethnicity_raw_outer.count)/( SELECT SUM(count) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_inner INNER JOIN school_data_schools as school_data_schools_inner USING (school_id) WHERE school_data_schools_outer.district_id = school_data_schools_inner.district_id and school_data_race_ethnicity_raw_outer.year = school_data_race_ethnicity_raw_inner.year ) * 100,2) FROM school_data_race_ethnicity_raw as school_data_race_ethnicity_raw_outer INNER JOIN school_data_schools as school_data_schools_outer USING (school_id) GROUP BY school_data_schools_outer.district_id, school_data_race_ethnicity_raw_outer.year, school_data_race_ethnicity_raw_outer.race;
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
| 1 | PRIMARY | school_data_race_ethnicity_raw_outer | ALL | school_id,school_id_2 | NULL | NULL | NULL | 84012 | Using temporary; Using filesort |
| 1 | PRIMARY | school_data_schools_outer | eq_ref | PRIMARY | PRIMARY | 257 | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_outer.school_id | 1 | |
| 2 | DEPENDENT SUBQUERY | school_data_race_ethnicity_raw_inner | ref | school_id,year,school_id_2 | year | 4 | func | 8402 | |
| 2 | DEPENDENT SUBQUERY | school_data_schools_inner | eq_ref | PRIMARY | PRIMARY | 257 | rocdocs_main_drupal_7.school_data_race_ethnicity_raw_inner.school_id | 1 | Using where |
+----+--------------------+--------------------------------------+--------+----------------------------+---------+---------+----------------------------------------------------------------------+-------+---------------------------------+
4 rows in set (0.00 sec)
mysql>
mysql> describe school_data_race_ethnicity_raw;
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| school_id | varchar(255) | NO | MUL | NULL | |
| year | int(11) | NO | MUL | NULL | |
| race | varchar(255) | NO | | NULL | |
| count | int(11) | NO | | NULL | |
+-----------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> describe school_data_schools;
+-------------+----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------------+------+-----+---------+-------+
| school_id | varchar(255) | NO | PRI | NULL | |
| grade_level | varchar(255) | NO | | NULL | |
| district_id | varchar(255) | NO | | NULL | |
| school_name | varchar(255) | NO | | NULL | |
| address | varchar(255) | NO | | NULL | |
| city | varchar(255) | NO | | NULL | |
| lat | decimal(20,10) | NO | | NULL | |
| lon | decimal(20,10) | NO | | NULL | |
+-------------+----------------+------+-----+---------+-------+
8 rows in set (0.00 sec)
NOTE: I also have tried:
select sds.school_id,
detail.year,
detail.race,
ROUND((detail.count / summary.total) * 100 ,2) as percent
FROM school_data_race_ethnicity_raw as detail
inner join school_data_schools as sds USING (school_id)
inner join (
select sds2.district_id, year, sum(count) as total
from school_data_race_ethnicity_raw
inner join school_data_schools as sds2 USING (school_id)
group by sds2.district_id, year
) as summary on summary.district_id = sds.district_id
and summary.year = detail.year
This is slow because:
You have no index in use on school_data_race_ethnicity_raw_outer, so it's scanning each of the ~84,000 rows
You are using a correlated subquery which means that your complex calculation has to be run once per row i.e. 84,000 times.
The best approach is not to use a correlated subquery, but if not, then to make it go fast, you need to use covering indexes so that the whole of that inner query (and the other parts via their own indexes) can be run lightning fast using just the index. For a great tutorial on the subject of indexes, check this out. It taught me a lot! Right now, your inner query just uses the year index on school_data_race_ethnicity_raw, so it has to look up the rest of the stuff it needs by reading 8000 rows for every one of the 84000 calculations. Indexes will make this far faster e.g. create a composite index on school_data_race_ethnicity_raw and you will find it helps:
CREATE INDEX inner_composite ON school_data_race_ethnicity_raw (year, school_id, count)
(district_id lives in school_data_schools, so it cannot be part of an index on this table.) This allows the field used in the WHERE to be read from the index, then the join field, then the field you want for the select. You should see it show up in the 'key' column of your explain result. Also, if you get it right, you'll see 'Using index' in the right-most column, showing that no table access is happening, which is far faster.
You can experiment quick-and-dirty style by adding loads of indexes for the columns that the query mentions and see what gets picked up in the key column. If something appears, read your query to see what other columns from that table are in use, then add a new index with those columns added in too on the right hand side and see if that works better. Remember to delete the unused indexes once you find out what works.
MySQL doesn't allow you to directly index the SUM of a column, which would be the fastest way, so unless you want to move to another DB (good idea if you can), this will always be a little slow.
This should be all you need to aggregate your data and get a count of race by district. I'm not sure why you are doing so much math in your original; it is unnecessary for your goal and forces some expensive subqueries.
SELECT SUM(students.count) as studentCount, schools.district_id, students.race
FROM school_data_schools schools,
school_data_race_ethnicity_raw students
WHERE schools.school_id = students.school_id
GROUP BY district_id, race
You probably also want an index on school_data_race_ethnicity_raw.school_id (alone, not as part of a multiple column key)
EDIT: I was not aware that the OP was looking for a percentage breakdown and not just totals.
SELECT ((studentCount / districtTotal) * 100) as percentage, district_id, race
FROM(
SELECT SUM(students.count) as studentCount, schools.district_id, students.race,
(SELECT SUM(inStudents.count)
FROM school_data_schools inSchools,
school_data_race_ethnicity_raw inStudents
WHERE inSchools.school_id = inStudents.school_id
AND inSchools.district_id = schools.district_id
GROUP BY inSchools.district_id) as districtTotal
FROM school_data_schools schools,
school_data_race_ethnicity_raw students
WHERE schools.school_id = students.school_id
GROUP BY district_id, race
) table1
This will run pretty quickly, but you still need to make sure there is an index on school_data_race_ethnicity_raw.school_id that is not part of a multiple column index. You can see it in action here; my test case is rather small, but it does seem to check out.
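As a sketch of the "don't use a correlated subquery" advice, the query below computes per-race totals and per-district totals in two derived tables and joins them, so each aggregate is computed once instead of once per row. It runs on an in-memory SQLite database as a stand-in for MySQL, with invented sample data and the tables cut down to the columns involved. Unlike the last answer, it also keeps the per-year breakdown from the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE school_data_schools (
    school_id   TEXT PRIMARY KEY,
    district_id TEXT NOT NULL
);
CREATE TABLE school_data_race_ethnicity_raw (
    id        INTEGER PRIMARY KEY,
    school_id TEXT NOT NULL,
    year      INTEGER NOT NULL,
    race      TEXT NOT NULL,
    count     INTEGER NOT NULL
);
INSERT INTO school_data_schools VALUES ('s1', 'd1'), ('s2', 'd1'), ('s3', 'd2');
INSERT INTO school_data_race_ethnicity_raw (school_id, year, race, count) VALUES
    ('s1', 2012, 'A', 30), ('s1', 2012, 'B', 10),
    ('s2', 2012, 'A', 20), ('s2', 2012, 'B', 40),
    ('s3', 2012, 'A', 5),  ('s3', 2012, 'B', 15);
""")

# detail:  one row per (district, year, race) with that race's total.
# summary: one row per (district, year) with the overall total.
percentages = conn.execute("""
SELECT detail.district_id, detail.year, detail.race,
       ROUND(100.0 * detail.race_total / summary.total, 2) AS percent
FROM (
    SELECT s.district_id, r.year, r.race, SUM(r.count) AS race_total
    FROM school_data_race_ethnicity_raw AS r
    JOIN school_data_schools AS s USING (school_id)
    GROUP BY s.district_id, r.year, r.race
) AS detail
JOIN (
    SELECT s.district_id, r.year, SUM(r.count) AS total
    FROM school_data_race_ethnicity_raw AS r
    JOIN school_data_schools AS s USING (school_id)
    GROUP BY s.district_id, r.year
) AS summary
  ON summary.district_id = detail.district_id
 AND summary.year = detail.year
ORDER BY detail.district_id, detail.year, detail.race
""").fetchall()

for row in percentages:
    print(row)
```

With the sample data, district d1 splits 50/50 and district d2 splits 25/75. How MySQL plans the two derived tables depends on the version, so it is worth confirming with EXPLAIN that no DEPENDENT SUBQUERY remains.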

MySQL: return field for which no related entries exist in another table

First, sorry for the title; as I'm not a native English speaker, this is pretty hard to phrase. In other words, what I'm trying to achieve is this:
I'm trying to fetch all domain names from the table virtual_domains where there is no corresponding entry in the virtual_aliases table starting like "postmaster#%".
So if I have two domains:
foo.org
example.org
And they have aliases like:
info#foo.org => admin#foo.org
postmaster#foo.org => user1#foo.org
info#example.org => admin#example.org
I want the query to return only the domain "example.org", as it is missing the postmaster alias.
This is the table layout:
mysql> show columns from virtual_aliases;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| domain_id | int(11) | NO | MUL | NULL | |
| source | varchar(100) | NO | | NULL | |
| destination | varchar(100) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
mysql> show columns from virtual_domains;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | | NULL | |
+-------+-------------+------+-----+---------+----------------+
I tried for many hours with IF, CASE, LIKE queries with no success. I don't need a final solution, maybe just a hint with some explanation. Thanks!
SELECT * FROM virtual_domains AS domains
LEFT JOIN virtual_aliases AS aliases
ON domains.id = aliases.domain_id
WHERE aliases.domain_id IS NULL
LEFT JOIN returns all records from the "left" table, even if they have no corresponding records in the "right" table. Those records will have the right table's fields set to NULL. Use WHERE to strip all the others.
I guess I didn't understand you correctly the first time. You have several entries in aliases for a single domain, and you want to display only those domains that don't have an entry in the aliases table that starts with "postmaster"?
In this case you should use NOT IN like this:
SELECT * FROM virtual_domains AS domains
WHERE domains.id NOT IN (
SELECT domain_id
FROM virtual_aliases
WHERE source LIKE "postmaster#%"
)
select id,name from virtual_domains
where id not in (select domain_id from virtual_aliases)
SELECT vd.* FROM virtual_domains vd
LEFT JOIN virtual_aliases va ON vd.id = va.domain_id
AND va.source LIKE 'postmaster#%'
WHERE va.id IS NULL;
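A quick way to check the anti-join logic is to run it on the sample data from the question. The sketch below uses an in-memory SQLite database as a stand-in for MySQL and a NOT EXISTS form of the same idea. It assumes the alias address is stored in the source column, keeps the '#' separator exactly as it appears in the post, and adds a made-up third domain, empty.org, that has no aliases at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE virtual_domains (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE virtual_aliases (
    id          INTEGER PRIMARY KEY,
    domain_id   INTEGER NOT NULL,
    source      TEXT NOT NULL,
    destination TEXT NOT NULL
);
INSERT INTO virtual_domains VALUES (1, 'foo.org'), (2, 'example.org'), (3, 'empty.org');
INSERT INTO virtual_aliases (domain_id, source, destination) VALUES
    (1, 'info#foo.org',       'admin#foo.org'),
    (1, 'postmaster#foo.org', 'user1#foo.org'),
    (2, 'info#example.org',   'admin#example.org');
""")

# Domains for which no alias row with a source starting "postmaster#" exists.
missing = [
    row[0]
    for row in conn.execute("""
        SELECT d.name
        FROM virtual_domains AS d
        WHERE NOT EXISTS (
            SELECT 1
            FROM virtual_aliases AS a
            WHERE a.domain_id = d.id
              AND a.source LIKE 'postmaster#%'
        )
        ORDER BY d.name
    """)
]

print(missing)
```

Both example.org (aliases, but no postmaster) and empty.org (no aliases at all) come back, while foo.org is filtered out. A plain LEFT JOIN ... IS NULL without the postmaster condition would only find empty.org.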