I have a query that is not grouping properly and returns the wrong results, and I can't figure out what the problem is.
The query is shown below. FYI - it's not obvious in its current form why I need the GROUP BY, because I've stripped away all other parts of the query to get to the most basic form where I still see the problem.
SELECT * FROM (
SELECT *
FROM notifications n
WHERE 1
-- and group_id = '5b35c8eb075881f8bbdfbcb36b052aa7'
GROUP BY `from`
) t
WHERE group_id = '5b35c8eb075881f8bbdfbcb36b052aa7'
The problem is that when I put the WHERE in the inner subquery (currently commented out), for this case I end up with 4 results. Each of the 4 results has a different "from" value, so they should be listed separately. When I put the WHERE outside the subquery I end up with 3 results.
For completeness the table definition is:
CREATE TABLE `notifications` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`mem_id` int(10) unsigned DEFAULT NULL,
`type` varchar(255) NOT NULL,
`from` varchar(255) DEFAULT NULL,
`entry_id` int(11) DEFAULT NULL,
`parent_id` int(11) DEFAULT NULL,
`table_id` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`emailed` tinyint(1) DEFAULT NULL,
`read` tinyint(1) NOT NULL,
`group_id` char(32) NOT NULL,
PRIMARY KEY (`id`),
KEY `mem_id` (`mem_id`),
KEY `created_at` (`created_at`),
KEY `entry_id` (`entry_id`),
KEY `parent_id` (`parent_id`),
KEY `group_id` (`group_id`)
)
Any ideas what could cause this? I'm completely stumped. At this point I'm ready to attribute it to some bug in MySQL, but that also seems unlikely.
Update
I wasn't clear about what I meant by "wrong results". There were 7 records in the data set with this group_id: 2 records each had a unique "from", and the other 5 records were split across 2 other "from" ids (one had 3 records, the other had 2).
Putting the WHERE on the inside of the GROUP BY subquery resulted in the 4 records that I wanted. I don't care which row gets selected per group, because I'm doing other sums/counts that I excluded from the example since they weren't directly relevant to the problem.
When I put the WHERE on the outside, one of the two records with a unique "from" did not come back at all.
I'll try to update with a sqlfiddle (didn't know about that!) - the issue is that the database I was testing on is wiped daily, so I don't have the original data. I'll see if I can reproduce it.
update #2
I noticed that in my question I've been referring to an inner and outer GROUP BY - the GROUP BY is always on the inner query; it's the WHERE that moves. I've tried to adjust the phrasing. Again, it's not immediately obvious why I care about the location of the WHERE, but in my final use case I need the selection to happen on the outside (I'm building a count of notifications that are read/unread, and I need a count both per member and in total per message, i.e. per group_id - see the sketch after the links below).
sqlfiddle: http://www.sqlfiddle.com/#!2/7d746/5
screenshot of query with inner where: https://www.evernote.com/shard/s48/sh/e355e96e-e48d-4550-bbaf-ffb18bc0bb9c/08e2454867e00e3a05535303429748f1
screenshot of query with outer where: https://www.evernote.com/shard/s48/sh/60b10427-e417-4196-8b92-7d6d8031d21e/c779bc9c46d23472983ac6fa0d25e42d
With the sqlfiddle I get back 4 results each time! Which leads me to think it's a server issue. We're running MySQL 5.5.28-29.2 Percona Server (GPL), Release rel29.2, Revision 360.
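For completeness, here's a sketch of the counting I'm ultimately building, using only standard aggregates so the choice of row per group stops mattering (assuming `from` is what identifies the member; column names are from the table definition above):
SELECT `from`,
       SUM(`read` = 0) AS unread_count,
       SUM(`read` = 1) AS read_count,
       COUNT(*)        AS total
FROM notifications
WHERE group_id = '5b35c8eb075881f8bbdfbcb36b052aa7'
GROUP BY `from` WITH ROLLUP;
-- The WITH ROLLUP super-aggregate row (where `from` comes back NULL)
-- carries the totals for the whole group_id; the other rows are per member.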
This query:
SELECT *
FROM notifications n
WHERE 1
GROUP BY `from`
is simply wrong in ANSI SQL and on almost all DBMSs (Oracle, Postgres, MS SQL, etc.).
It runs on MySQL only because of its nonstandard GROUP BY extension.
See this link: http://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html
However, they warn about something:
However, this is useful primarily when all values in each
nonaggregated column not named in the GROUP BY are the same for each
group. The server is free to choose any value from each group, so
unless they are the same, the values chosen are indeterminate.
Because of this "feature" your query (select from select * group by) is unpredicable, results are dependent on the order of records in the table.
Take a look at this simple demo: http://www.sqlfiddle.com/#!2/b762e/2
There are two identical tables in this demo, with the same content; the only difference is the physical row order. And the same queries give completely different results.
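If the fiddle link ever dies, the demo boils down to something like this (a sketch: identical data, different physical insertion order):
CREATE TABLE t1 (grp INT, val INT);
CREATE TABLE t2 (grp INT, val INT);
INSERT INTO t1 VALUES (1, 10), (1, 20);
INSERT INTO t2 VALUES (1, 20), (1, 10);
-- Same data, same query, but val is indeterminate, so the two results
-- may differ depending on which row of each group MySQL happens to keep.
SELECT grp, val FROM t1 GROUP BY grp;
SELECT grp, val FROM t2 GROUP BY grp;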
---- EDIT how to solve this problem -----
To solve this problem in your query, just add both columns to the GROUP BY clause.
select * FROM (
SELECT * FROM notifications n
GROUP BY `from`, `group_id`
) x
WHERE group_id = 'A';
select * FROM (
SELECT * FROM notifications n
WHERE group_id = 'A'
GROUP BY `from`, `group_id`
) x
The above two queries always give the same results for the columns from and group_id; the other columns (not included in the GROUP BY clause) can be random.
Take a look at a simple demo --> http://www.sqlfiddle.com/#!2/5d19b/5
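And if you need the remaining columns to be deterministic too, the usual pattern is to pick one concrete row per group via a join on an aggregate (a sketch, using MAX(id) as the tie-breaker; substitute whatever rule defines the row you actually want):
SELECT n.*
FROM notifications n
JOIN (
    SELECT `from`, MAX(id) AS max_id
    FROM notifications
    WHERE group_id = 'A'
    GROUP BY `from`
) picked ON n.id = picked.max_id;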
Related
My code in Laravel is:
Car::selectRaw('*,
MIN(car_prices.price) AS min_price,
MAX(car_prices.price) AS max_price,
MAX(car_prices.updated_at) AS latest_update')
->leftJoin('car_prices', 'car_prices.car_id', 'cars.id')
->groupBy('car_prices.car_id')
->orderBy('latest_update', 'desc')
->paginate(10);
It takes a long time to run and eventually throws this error:
Maximum execution time of 60 seconds exceeded
There are about 100,000 records in the cars table and 6,000,000 in car_prices.
The table structures:
CREATE TABLE `cars` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=110001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `car_prices` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`car_id` bigint(20) unsigned NOT NULL,
`price` decimal(8,2) NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `car_prices_car_id_foreign` (`car_id`)
) ENGINE=MyISAM AUTO_INCREMENT=5506827 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The queries:
select count(*) as aggregate
from `cars`
left join `car_prices`
on `car_prices`.`car_id` = `cars`.`id`
group by `car_prices`.`car_id`;
select *,
MIN(car_prices.price) AS min_price,
MAX(car_prices.price) AS max_price,
MAX(car_prices.updated_at) AS latest_update from `cars`
left join `car_prices`
on `car_prices`.`car_id` = `cars`.`id`
group by `car_prices`.`car_id`
order by `latest_update` desc
limit 10
offset 0;
How can I optimize it? Should I cache the data? Or is there some better query than this?
My hard disk is SSD
Value of innodb_flush_log_at_trx_commit = 1
The number of writes/inserts is approximately 1000/second from 10 AM to 2 PM; before and after this period there are far fewer requests.
You need to either have a better index on the cars table for the column you order by, or remove ->orderBy('latest_update', 'desc') from the query and sort the results after receiving them.
You can check the performance in MySQL with EXPLAIN, e.g.:
EXPLAIN SELECT * FROM cars ORDER BY updated_at DESC;
Check this https://www.exoscale.com/syslog/explaining-mysql-queries/
and https://dev.mysql.com/doc/refman/5.7/en/using-explain.html
Basically you need to optimize (better index) your cars table so that it performs well.
Another thing you might try is increasing the execution time.
In php.ini, set max_execution_time = 600 or more, just to check how much time the query needs to complete.
https://www.codewall.co.uk/increase-php-script-max-execution-time-limit-using-ini_set-function/
The query you have used is not apt for such large tables. Instead, whenever an entry comes into the car_prices table, compute the minimum and maximum values and store them in the cars table; or you can set up a cron job for this.
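For example, a cron job could periodically refresh denormalized columns on cars with a single multi-table UPDATE (a sketch; it assumes you first add min_price, max_price, and latest_update columns to cars, which are not in the schema above):
-- Assumed one-time schema change (hypothetical columns):
-- ALTER TABLE cars
--     ADD COLUMN min_price DECIMAL(8,2) NULL,
--     ADD COLUMN max_price DECIMAL(8,2) NULL,
--     ADD COLUMN latest_update TIMESTAMP NULL;
UPDATE cars c
JOIN (
    SELECT car_id,
           MIN(price)      AS min_price,
           MAX(price)      AS max_price,
           MAX(updated_at) AS latest_update
    FROM car_prices
    GROUP BY car_id
) p ON p.car_id = c.id
SET c.min_price     = p.min_price,
    c.max_price     = p.max_price,
    c.latest_update = p.latest_update;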
In both queries, use
GROUP BY cars.id
instead of grouping by car_prices.car_id, which can be NULL because of the LEFT JOIN.
Once you have done that, the first query (with just the COUNT) can drop the JOIN. And then the GROUP BY becomes redundant:
select count(*) as aggregate
from `cars`
The second query has issues.
With the current design, you must go through all of both tables. Ugh.
Also... If there are no prices for a given car, it will have NULL for latest_update, so it will sort at the end of the 100,000 rows. Given that, you may as well not display those cars; this would simplify the query enough for it to be better optimized.
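That simplified query might look something like this (a sketch: an INNER JOIN so the no-price cars drop out, grouped by cars.id as mentioned above):
SELECT c.id, c.name,
       MIN(p.price)      AS min_price,
       MAX(p.price)      AS max_price,
       MAX(p.updated_at) AS latest_update
FROM cars c
JOIN car_prices p ON p.car_id = c.id
GROUP BY c.id
ORDER BY latest_update DESC
LIMIT 10 OFFSET 0;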
If you need to list the cars for which you have no prices, make that a separate request in the UI. That query will be a LEFT JOIN .. IS NULL and won't need the MAX()s.
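That separate request is the classic anti-join pattern (a sketch):
SELECT c.id, c.name
FROM cars c
LEFT JOIN car_prices p ON p.car_id = c.id
WHERE p.car_id IS NULL;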
But, I am still concerned about the 10,000 pages that the user needs to paginate through.
Switch from MyISAM to InnoDB.
Toss created_at and updated_at, if you aren't using them for anything.
After that, cars is simply a mapping between id and name. This might allow you to avoid going through cars. Instead do something like
SELECT ( SELECT name FROM cars WHERE id = x.car_id ) AS name,
...
FROM ...
Another thought: whenever you add a row to car_prices, update updated_at in cars. This would allow you to find the 10 newest cars entirely within cars.
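A sketch of that as a trigger (the trigger name is arbitrary; doing the same UPDATE from application code right after each INSERT works too):
CREATE TRIGGER car_prices_touch_car
AFTER INSERT ON car_prices
FOR EACH ROW
    UPDATE cars SET updated_at = NOW() WHERE id = NEW.car_id;
-- With that maintained, the newest 10 come straight from cars:
-- SELECT id, name FROM cars ORDER BY updated_at DESC LIMIT 10;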
Decide what you are willing to sacrifice.
More
Note: With MyISAM, a slow SELECT blocks UPDATEs. With InnoDB, they can run in parallel; the SELECT uses the values as they were before the UPDATE. Either way, the SELECT sees some "point in time". But InnoDB allows more parallelism.
It is a tradeoff. A small slowdown in updates to achieve a big speedup on selects. (No, I don't know for sure that my suggestion is "faster")
Some further questions to analyze the tradeoff:
Disk: HDD or SSD?
Value of innodb_flush_log_at_trx_commit (after you change to InnoDB).
How much traffic? As a first cut, is the number of writes--insert/delete--more than 100/second?
I have this query:
SELECT `Stocks`.`id` AS `Stocks.id` , `Locations`.`id` AS `Locations.id`
FROM `rowiusnew`.`c_stocks` AS `Stocks`
LEFT JOIN `rowiusnew`.`g_locations` AS `Locations` ON ( `Locations`.`ref_id` = `Stocks`.`id` AND `Locations`.`ref_type` = 'stock' )
GROUP BY `Stocks`.`id`
HAVING `Locations.id` IS NOT NULL
This returns 0 results.
When I add
ORDER BY Locations.id
to the exactly same query, I correctly get 3 results.
Noteworthy: When I discard the GROUP BY clause, I get the same 3 results. The grouping is necessary for the complete query with additional joins; this is the simplified one to demonstrate the problem.
My question is: Why do I not get a result with the original query?
Note that there are two conditions in the JOIN ON clause. Changing or removing the parentheses, or changing the order of these conditions, does not change the outcome.
Usually you would suspect that the id field in g_locations is sometimes NULL, so that the ORDER BY clause pushes the correctly referenced result "to the top" of each group. This is not the case; the id field is correctly set up as an auto_increment primary key.
The EXPLAIN statement shows that filesort is used instead of the index in the cases where I actually get results (screenshots of the EXPLAIN output for the original and the modified, working query omitted here).
Below are the table definitions:
CREATE TABLE IF NOT EXISTS `c_stocks` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`id_stock_type` int(10) unsigned DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`locality` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `StockType_idx` (`id_stock_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `g_locations` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`ref_type` enum('stock','object','branch') DEFAULT NULL,
`ref_id` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UniqueLocation` (`ref_type`,`ref_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The ref_id field features a long comment that I omitted in this definition.
After being unable to reproduce the error on SQLFiddle.com and also on my second computer, I realized that there must be a bug involved.
Indeed, my used version 5.6.12 suffers from this bug:
Some LEFT JOIN queries with GROUP BY could return incorrect results. (Bug #68897, Bug #16620047)
See the change log of MySQL 5.6.13: http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-13.html
An upgrade to 5.6.17 solved my problem. I am now getting the results I expect, independent of ORDER clauses and aggregate functions.
Remove
having locations.id is not null
instead use
where locations.id is not null
Filtering on locations.id is not null is not a job for the grouping - you don't want those rows included at all.
Also, you need to do something with locations.id, since it isn't in the GROUP BY clause. Do you want MAX(locations.id)?
so your query now becomes:
SELECT `Stocks`.`id` AS `Stocks.id` , max(`Locations`.`id`) AS `Locations.id`
FROM `rowiusnew`.`c_stocks` AS `Stocks`
LEFT JOIN `rowiusnew`.`g_locations` AS `Locations` ON ( `Locations`.`ref_id` = `Stocks`.`id` AND `Locations`.`ref_type` = 'stock' )
WHERE `Locations`.`id` IS NOT NULL
GROUP BY `Stocks`.`id`
Make those changes and it should work better for you.
FYI: I think that by putting in the ORDER BY clause, you are allowing the engine to guess what you want for locations.id; otherwise it has no clue. In anything other than MySQL, it wouldn't run at all.
I have a site with an activity feed, similar to how social sites like Facebook have one. It is a "newest first" list that describes actions taken by users. In production, there are about 200k entries in that table.
Since this is going to be asked anyway, I'll first share the full table structure:
CREATE TABLE `karmalog` (
`id` int(11) NOT NULL auto_increment,
`guid` char(36) default NULL,
`user_id` int(11) default NULL,
`user_name` varchar(45) default NULL,
`user_avat_url` varchar(255) default NULL,
`user_sec_id` int(11) default NULL,
`user_sec_name` varchar(45) default NULL,
`user_sec_avat_url` varchar(255) default NULL,
`event` enum('EDIT_PROFILE','EDIT_AVATAR','EDIT_EMAIL','EDIT_PASSWORD','FAV_IMG_ADD','FAV_IMG_ADDED','FAV_IMG_REMOVE','FAV_IMG_REMOVED','FOLLOW','FOLLOWED','UNFOLLOW','UNFOLLOWED','COM_POSTED','COM_POST','COM_VOTE','COM_VOTED','IMG_VOTED','IMG_UPLOAD','LIST_CREATE','LIST_DELETE','LIST_ADMINDELETE','LIST_VOTE','LIST_VOTED','IMG_UPD','IMG_RESTORE','IMG_UPD_LIC','IMG_UPD_MOD','IMG_GEO','IMG_UPD_MODERATED','IMG_VOTE','IMG_VOTED','TAG_FAV_ADD','CLASS_DOWN','CLASS_UP','IMG_DELETE','IMG_ADMINDELETE','IMG_ADMINDELETEFAV','SET_PASSWORD','IMG_RESTORED','IMG_VIEW','FORUM_CREATE','FORUM_DELETE','FORUM_ADMINDELETE','FORUM_REPLY','FORUM_DELETEREPLY','FORUM_ADMINDELETEREPLY','FORUM_SUBSCRIBE','FORUM_UNSUBSCRIBE','TAG_INFO_EDITED','IMG_ADDSPECIE','IMG_REMOVESPECIE','SPECIE_ADDVIDEO','SPECIE_REMOVEVIDEO','EARN_MEDAL','JOIN') NOT NULL,
`event_type` enum('follow','tag','image','class','list','forum','specie','medal','user') NOT NULL,
`active` bit(1) NOT NULL,
`delete` bit(1) NOT NULL default '\0',
`object_id` int(11) default NULL,
`object_cache` text,
`object_sec_id` int(11) default NULL,
`object_sec_cache` text,
`karma_delta` int(11) NOT NULL,
`gold_delta` int(11) NOT NULL,
`newkarma` int(11) NOT NULL,
`newgold` int(11) NOT NULL,
`migrated` int(11) NOT NULL default '0',
`date_created` timestamp NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `user_sec_id` (`user_sec_id`),
KEY `image_id` (`object_id`),
KEY `date_event` (`date_created`,`event`),
KEY `event` (`event`),
KEY `date_created` (`date_created`),
CONSTRAINT `karmalog_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`) ON DELETE SET NULL,
CONSTRAINT `karmalog_ibfk_2` FOREIGN KEY (`user_sec_id`) REFERENCES `user` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Before optimizing this table, my query had 5 joins and I ran into slow query times. I have denormalized all of that data, so that not a single join is left. So the table and query are flat.
As you can see in the table design, there's an "event" field which is an enum, holding a few dozen possible values. Throughout the site, I show activity feeds based on specific event types. Typically that query looks like this:
SELECT * FROM karmalog as k
WHERE k.event IN ($events) AND k.delete=0
ORDER BY k.date_created DESC, k.id DESC
LIMIT 0,30
What this query does is to find the latest 30 entries in the total set that match any of the events passed in $events, which can be multiple.
Due to removing the joins and having indexes on most fields, I was expecting this to perform very well, but it doesn't. On 200k entries it still takes over 3 seconds, and I don't understand why.
Regarding solutions, I know I could archive older entries or partition the table per event type, but that will have quite a code impact, and I first would like to understand why the above is so slow.
As a temporary work-around, I'm now doing this:
SELECT * FROM
(SELECT * FROM karmalog ORDER BY date_created DESC, id DESC LIMIT 0,1000) as karma
WHERE karma.event IN ($events) AND karma.delete=0
LIMIT $page,$pagesize
What this does is limit the base set to search to the latest 1000 entries only, hoping and guessing that there are 30 matching entries to be found for the filters I pass in. It's not very robust, though: it will not work for rarer events, and it brings pagination issues.
Therefore, I'd first like to get to the root cause of why my initial query is slow, against my expectation.
Edit: I was asked to share the execution plan. Here's the test query:
EXPLAIN SELECT * FROM karmalog
WHERE event IN ('FAV_IMG_ADD','FOLLOW','COM_POST','IMG_VOTE','LIST_VOTE','JOIN','CLASS_UP','LIST_CREATE','FORUM_REPLY','FORUM_CREATE','FORUM_SUBSCRIBE','IMG_GEO','IMG_ADDSPECIE','SPECIE_ADDVIDEO','EARN_MEDAL') AND karmalog.delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,36
Execution plan:
id = 1
select_type = SIMPLE
table = karmalog
type = range
possible_keys = event
key = event
key_len = 1
ref = NULL
rows = 80519
Extra = Using where; Using filesort
I'm not sure how to read the above, but I do know that the sort clause really seems to kill this query. With the sorting it takes 4.3 secs; without it, 0.03 secs.
SELECT * sometimes slows down ordered queries by a huge amount, so let's start by refactoring your query as follows:
SELECT k.*
FROM karmalog AS k
JOIN (
SELECT id
FROM karmalog
WHERE event IN ($events)
AND delete=0
ORDER BY date_created DESC, id DESC
LIMIT 0,30
) AS m ON k.id = m.id
ORDER BY k.date_created DESC, k.id DESC
This will do your ORDER BY ... LIMIT operation without having to haul the whole table around in the sorting phase. Finally it will look up the appropriate thirty rows from the original table and sort just those again. This might save a whole lot of I/O and in-memory data shuffling.
Second, if id column values are assigned in ascending order as records are inserted, then the use of date_created in your ORDER BY operation is redundant. But MySQL doesn't know that, so leaving it out might help. This will be true if you always use the current date when inserting, and never update the dates.
Third, you might be able to use a compound covering index for the selection (inner) query. This is an index that contains all the fields you need. When you use a covering index, the whole query can be satisfied from the index, and there's no need to bounce back to the original table. This saves disk access time.
Try this compound covering index: (delete, event, id). If you decide you can't get rid of the use of date_created in your ordering, try this instead: (delete, event, date_created, id)
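In MySQL those would be created with something like the following (the index names here are just placeholders; delete is a reserved word, hence the backticks):
ALTER TABLE karmalog ADD INDEX del_event_id (`delete`, `event`, `id`);
-- or, if date_created has to stay in the ORDER BY:
ALTER TABLE karmalog ADD INDEX del_event_date_id (`delete`, `event`, `date_created`, `id`);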
Add a compound index over the two relevant columns. In your table, you can do that by specifying e.g.
KEY `date_created` (`date_created`, `event`)
This key can still be used to satisfy plain old date_created range searches. But in addition, the event data is included as well, so the DBMS will be able to identify the relevant rows by looking only at the index.
If you want, you can try the other order as well: first event and then date. This might allow some optimization if there are many event types but your filter only contains few. On the other hand, I'm not sure the system will be able to make use of the LIMIT clause in this case, so I'm not certain that this other order will be any help at all.
Edit: I completely missed that your date_event index already has this info. According to your execution plan, though, that one isn't used. Looks like the optimizer is getting things wrong. You could try removing the event index, and perhaps the date index as well, and see what happens then.
I have a table with multiple rows per "website_id"
CREATE TABLE `WebsiteStatus` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`tagCheckResult` int(11) DEFAULT NULL,
`website_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IX_website_id` (`website_id`)
) ENGINE=InnoDB;
I am trying to select the latest entry per website_id
-- This creates a temporary table with the last entry per website_id, and joins it
-- to get the entire row
SELECT *
FROM `WebsiteStatus` ws1
JOIN (
SELECT MAX(id) max_id, website_id FROM `WebsiteStatus`
GROUP BY website_id) ws2
ON ws1.id = ws2.max_id
Now, I know the correct way to get the last row per website_id is as above. My question is: I also tried the following simpler query, and it seemed to return the exact same results as above:
SELECT * FROM `WebsiteStatus`
GROUP BY website_id
ORDER BY website_id DESC
I know that in principle GROUP BY without aggregate functions (e.g. MAX), as in my 2nd query, can return any of the relevant rows ... but in practice it returns the last one. Is there an implementation detail in MySQL that guarantees this is always the case?
(Just asking for academic curiosity, I know the 1st query is "more correct").
I'm running the following query on a Macbook Pro 2.53ghz with 4GB of Ram:
SELECT
c.id AS id,
c.name AS name,
c.parent_id AS parent_id,
s.domain AS domain_name,
s.domain_id AS domain_id,
NULL AS stats
FROM
stats s
LEFT JOIN stats_id_category sic ON s.id = sic.stats_id
LEFT JOIN categories c ON c.id = sic.category_id
GROUP BY
c.name
It takes about 17 seconds to complete.
EXPLAIN:
EXPLAIN output (screenshot): http://img7.imageshack.us/img7/1364/picture1va.png
The tables:
Information:
Number of rows: 147397
Data size: 20.3MB
Index size: 1.4MB
Table:
CREATE TABLE `stats` (
`id` int(11) unsigned NOT NULL auto_increment,
`time` int(11) NOT NULL,
`domain` varchar(40) NOT NULL,
`ip` varchar(20) NOT NULL,
`user_agent` varchar(255) NOT NULL,
`domain_id` int(11) NOT NULL,
`date` timestamp NOT NULL default CURRENT_TIMESTAMP,
`referrer` varchar(400) default NULL,
KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=147398 DEFAULT CHARSET=utf8
Information for the second table:
Number of rows: 1285093
Data size: 11MB
Index size: 17.5MB
Second table:
CREATE TABLE `stats_id_category` (
`stats_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
KEY `stats_id` (`stats_id`,`category_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Information for the third table:
Number of rows: 161
Data size: 3.9KB
Index size: 8KB
Third table:
CREATE TABLE `categories` (
`id` int(11) NOT NULL auto_increment,
`parent_id` int(11) default NULL,
`name` varchar(40) NOT NULL,
`questions_category_id` int(11) NOT NULL default '0',
`rank` int(2) NOT NULL default '0',
PRIMARY KEY (`id`),
KEY `id` (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=205 DEFAULT CHARSET=latin1
Hopefully someone can help me speed this up.
I see several WTF's in your query:
You use two LEFT OUTER JOINs but then you group by the c.name column which might have no matches. So perhaps you don't really need an outer join? If that's the case, you should use an inner join, because outer joins are often slower.
You are grouping by c.name but this gives ambiguous results for every other column in your select-list. I.e. there might be multiple values in these columns in each grouping by c.name. You're lucky you're using MySQL, because this query would simply give an error in any other RDBMS.
This is a performance issue because the GROUP BY is likely causing the "using temporary; using filesort" you see in the EXPLAIN. This is a notorious performance-killer, and it's probably the single biggest reason this query is taking 17 seconds. Since it's not clear why you're using GROUP BY at all (using no aggregate functions, and violating the Single-Value Rule), it seems like you need to rethink this.
You are grouping by c.name which doesn't have a UNIQUE constraint on it. You could in theory have multiple categories with the same name, and these would be lumped together in a group. I wonder why you don't group by c.id if you want one group per category.
SELECT NULL AS stats: I don't understand why you need this. It's kind of like creating a variable that you never use. It shouldn't harm performance, but it's just another WTF that makes me think you haven't thought this query through very well.
You say in a comment you're looking for number of visitors per category. But your query doesn't have any aggregate functions like SUM() or COUNT(). And your select-list includes s.domain and s.domain_id which would be different for every visitor, right? So what value do you expect to be in the result set if you only have one row per category? This isn't really a performance issue either, it just means your query results don't tell you anything useful.
Your stats_id_category table has an index over its two columns, but no primary key. So you can easily get duplicate rows, and this means your count of visitors may be inaccurate. You need to drop that redundant index and use a primary key instead. I'd order category_id first in that primary key, so the join can take advantage of the index.
ALTER TABLE stats_id_category DROP KEY stats_id,
ADD PRIMARY KEY (category_id, stats_id);
Now you can eliminate one of your joins, if all you need to count is the number of visitors:
SELECT c.id, c.name, c.parent_id, COUNT(*) AS num_visitors
FROM categories c
INNER JOIN stats_id_category sic ON (sic.category_id = c.id)
GROUP BY c.id;
Now the query doesn't need to read the stats table at all, or even the stats_id_category table. It can get its count simply by reading the index of the stats_id_category table, which should eliminate a lot of work.
You are missing the third table in the information provided (categories).
Also, it seems odd that you are doing a LEFT JOIN and then using the right table (which might be all NULLs) in the GROUP BY. You will end up grouping all of the non-matching rows together as a result; is that what you intended?
Finally, can you provide an EXPLAIN for the SELECT?
Harrison is right; we need the other table. I would start by adding an index on category_id to stats_id_category, though.
I agree with Bill. Point 2 is very important; the query doesn't even make logical sense. Also, the simple fact that there is no WHERE clause means that you have to pull back every row in the stats table, which seems to be around 140,000 rows. It then has to sort all that data to perform the GROUP BY. That's because sorting [O(n log n)] and then finding duplicates [O(n)] is much faster than finding duplicates without sorting the data set [O(n^2)?].