MySQL query is very slow with 50K+ records

I have a table called links with 50,000+ records. Three of the fields are indexed, in the order link_id, org_id, data_id.
The problem is that when I use a GROUP BY in the query, it takes a long time to load.
The query is:
SELECT DISTINCT `links`.*
FROM `links`
WHERE `links`.`org_id` = 2 AND (link !="")
GROUP BY link
The table has 20+ columns.
Is there any way to speed this query up?

Build an index on (org_id, link)
org_id optimizes your WHERE clause, and link covers your GROUP BY (and is also part of the WHERE).
Having link_id in the first position of the existing index is probably what is holding your query back.
create index LinksByOrgAndLink on Links ( Org_ID, Link );
MySQL Create Index syntax
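As a quick check (a sketch, assuming the LinksByOrgAndLink index above has been created), running EXPLAIN on the original query should now report the new index in its key column:
EXPLAIN
SELECT DISTINCT `links`.*
FROM `links`
WHERE `links`.`org_id` = 2 AND (link != "")
GROUP BY link;
-- The key column of the EXPLAIN output should show LinksByOrgAndLink rather than a full scan.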

The problem is in your DISTINCT `links`.*.
The GROUP BY is already doing the work of DISTINCT, so you are de-duplicating twice: once for SELECT DISTINCT and once for GROUP BY.
Try this:
SELECT *
FROM `links`
WHERE `org_id` = 2 AND (link != "")
GROUP BY link

I guess adding an index to your 'link' column would improve the result.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
Only select the columns that you need.
Why is there a distinct for links.*?
Do you have some rows that are exact duplicates in your table?
On the other hand, changing the value "" to NULL could improve your select statement, but I am not sure about this.
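A sketch of both suggestions (the index name is illustrative, and the column list is only an example of selecting what you actually need):
ALTER TABLE `links` ADD INDEX `idx_link` (`link`);

SELECT `link_id`, `org_id`, `link`
FROM `links`
WHERE `org_id` = 2 AND `link` != ''
GROUP BY `link`;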

Related

MySQL query very slow and not using proper index (GROUP BY, IN operator)

The query below was taking 5+ seconds to execute (the table contains 1M+ records).
The outer query was not using a proper index; it was always fetching data with a FULL table scan. Can someone help me optimize it?
Query
SELECT x
FROM UserCardXref x
WHERE x.userCardXrefId IN(
SELECT MAX(y.userCardXrefId)
FROM UserCardXref y
WHERE y.usrId IN(1001,1002)
GROUP BY y.usrId
HAVING COUNT(*) > 0
)
(Screenshots of the EXPLAIN output, query statistics, and execution plan omitted.)
I would re-write the query as
select x.* from UserCardXref x
join (
  select max(userCardXrefId) as userCardXrefId, usrId from UserCardXref
  where usrId in (1001,1002) group by usrId
) y on x.userCardXrefId = y.userCardXrefId
The index you will need is:
alter table UserCardXref add index userCardXrefId_idx(userCardXrefId)
usrId is already indexed as per the explain plan, so there is no need to add that.
Also, you have HAVING COUNT(*) > 0, but since you are already using the MAX() function a group would never have 0 rows, so I have removed that.

Improve performance in sum across multiple rows for multiple fields?

I have a MySQL sub query that selects multiple rows with 15 column names, joining two tables: one has 400,000 records and the other has 9,000 records.
The filter uses a unique id in both tables; the WHERE clause filters a date column with BETWEEN ... AND ..., and that date column mostly contains NULL values. Adding indexes on both tables' columns reduced the time from 28 sec to 17 sec.
The select statement retrieves 140 rows. Summing across multiple rows for multiple fields is what takes the time. How can I improve the performance of this query? If anyone has experience with this, please share it.
SELECT A.xs_id ,
A.unique_id ,
sum(A.xs_type) AS TYPE ,
sum(A.xs_item_type) AS item_type ,
sum(A.xs_counterno) AS counterno ,
r.Modified_date ,
sum(A.sent) AS sent ,
sum(A.sent_amt) AS amount ,
(sum(sent_amt)+sum(rec_amt)) AS total
FROM xs_data A
JOIN r_data r ON (r.unique_id = A.uniqueid
              AND summ_id = 1
              AND modified_date IS NOT NULL)
WHERE date(modified_date) BETWEEN '2012-02-12' AND '2013-01-22'
GROUP BY date(modified_date),
A.xs_id,
A.unique_id;
If you have an index on modified_date, then the BETWEEN query can use it. Once you put a column into a function DATE(modified_date), MySQL no longer uses the column's index, so it has to go through all the rows (that's called a full table scan).
It could be helpful to use
WHERE `modified_date` >= '2012-02-12 00:00:00'
AND `modified_date` < '2013-01-23 00:00:00'
On the other hand EXPLAIN SELECT will tell you more.
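Applied to the question's query, the rewrite would look roughly like this (table and column names are taken from the question; the IS NOT NULL check becomes redundant once the range condition is in place):
SELECT A.xs_id,
       A.unique_id,
       SUM(A.xs_type) AS TYPE,
       SUM(A.xs_item_type) AS item_type,
       SUM(A.xs_counterno) AS counterno,
       r.Modified_date,
       SUM(A.sent) AS sent,
       SUM(A.sent_amt) AS amount,
       (SUM(sent_amt) + SUM(rec_amt)) AS total
FROM xs_data A
JOIN r_data r ON (r.unique_id = A.uniqueid AND summ_id = 1)
WHERE modified_date >= '2012-02-12 00:00:00'
  AND modified_date < '2013-01-23 00:00:00'
GROUP BY date(modified_date),
         A.xs_id,
         A.unique_id;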
Find the pain...
Delete the GROUP BY, delete the columns, and focus on the join. Your join is on unique_id AND summ_id AND modified_date, so you need an index with those three fields, the most selective field first; that's probably uniqueid / unique_id (see the index sketch below).
To focus on the join only, write the result of the join into a temp table, like:
CREATE TEMPORARY TABLE temp AS
SELECT uniqueid
FROM ...
That way, if the join spits out 100k records, you don't get confronted with side effects (like, for example, sending all those records to your management studio).
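A sketch of the three-field index described above (the index name is illustrative, and it assumes summ_id and modified_date live on r_data; adjust uniqueid / unique_id to whichever spelling the tables actually use):
ALTER TABLE r_data
  ADD INDEX idx_join_cols (unique_id, summ_id, modified_date);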

select count(*) taking considerably longer than select * for same "where" clause?

I am finding a select count(*) is taking considerably longer than select * for the queries with the same where clause.
The table in question has about 2.2 million records (call it detailtable). It has a foreign key field linking to another table (maintable).
This query takes about 10-15 seconds:
select count(*) from detailtable where maintableid = 999
But this takes a second or less:
select * from detailtable where maintableid = 999
UPDATE - I was asked to specify the number of records involved: it is 150.
UPDATE 2 - Here is the information when the EXPLAIN keyword is used.
For the SELECT COUNT(*), The EXTRA column reports:
Using where; Using index
KEY and POSSIBLE KEYS both have the foreign key constraint as their value.
For the SELECT * query, everything is the same except EXTRA just says:
Using Where
UPDATE 3 Tried OPTIMIZE TABLE and it still does not make a difference.
For sure
select count(*)
should be faster than
select *
count(*), count(field), count(primary key), count(any) are all the same.
Your EXPLAIN clearly states that the optimizer somehow uses the index for count(*) and not for the other query, making the foreign key the main reason for the delay.
Eliminate the foreign key.
Try
select count(PRIKEYFIELD) from detailtable where maintableid = 999
count(*) will get all the data from the table and then count the rows, meaning it has more work to do.
Using the primary key field means it uses its index, so it should run faster.
Thread Necro!
Crazy idea... In some cases, depending on the query planner and the table size, etc, etc., it is possible for using an index to actually be slower than not using one. So if you get your count without using an index, in some cases, it could actually be faster.
Try this:
SELECT count(*)
FROM detailtable
USE INDEX ()
WHERE maintableid = 999
SELECT count(*)
with that syntax alone is no problem; you can do that on any table.
The main issue in your scenario is the proper use of an INDEX and applying a [WHERE] clause to your search.
Try to reconfigure your index if you have the chance.
If the table is too big, yes, it may take time. Also check the MyISAM locking article.
As the table has 2.2 million records, the count can take time: technically, MySQL has to find the matching records and then count them, which is an extra operation that becomes significant with millions of records. The only way to make it faster is to cache the result in another table and update it behind the scenes.
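A minimal sketch of that caching idea, using a hypothetical summary table named detail_counts kept up to date with triggers (all names here are illustrative, not from the question):
-- One row per maintableid.
CREATE TABLE detail_counts (
  maintableid INT PRIMARY KEY,
  row_count   INT NOT NULL DEFAULT 0
);

-- Keep it current behind the scenes.
CREATE TRIGGER detailtable_ai AFTER INSERT ON detailtable
FOR EACH ROW
  INSERT INTO detail_counts (maintableid, row_count)
  VALUES (NEW.maintableid, 1)
  ON DUPLICATE KEY UPDATE row_count = row_count + 1;

CREATE TRIGGER detailtable_ad AFTER DELETE ON detailtable
FOR EACH ROW
  UPDATE detail_counts
  SET row_count = row_count - 1
  WHERE maintableid = OLD.maintableid;

-- The count becomes a primary-key lookup instead of an index scan.
SELECT row_count FROM detail_counts WHERE maintableid = 999;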
Or simply Try
SELECT count(1) FROM table_name WHERE _condition;
SELECT count('x') FROM table_name WHERE _condition;

MySQL select not in is really slow

This is what I have now.
SELECT distinct ClientID
FROM Table
WHERE PAmt = '' and ClientID not in
(select distinct ClientID from Table where PAmt != '')
ORDER BY ID ASC
A ClientID can appear in the table more than once, and some of those rows have a PAmt value while others don't.
I am trying to get only the ClientIDs that never had a PAmt value. The table has about 12,000 entries, of which only 2,700 are unique ClientIDs.
I think this could be solved more easily by
SELECT
ClientID,
MAX(IF(PAmt='',0,1)) AS HasPAmt
FROM `Table`
GROUP BY ClientID
HAVING HasPAmt=0
Edit
Some words on the rationale behind this:
Subqueries in MySQL are a beastly thing: if the result set is either too large (the original SQL) or intertwined with the driving query (#DavidFleeman's answer), the inner query is looped, i.e. it is repeated for every row of the driving query. This of course gives bad performance.
So we try to reformulate the query in a way that avoids looping. My suggestion works by running only two passes: the first (everything before the HAVING) creates a temp table that marks each distinct ClientID as having at least one non-empty PAmt (or not), and the second selects only those rows of the temp table that are marked as having none into the final result set.
Try to reorganize your query into something like this:
select clientID
from Table
group by clientID
having max(length(PAmt)) = 0
Of course, you should add an index on (clientID, PAmt).
If this query still runs slowly, add a column with the pre-calculated length and replace PAmt with this column.
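One possible way to add that pre-calculated length (a sketch assuming MySQL 5.7+ generated columns; the column and index names are illustrative):
ALTER TABLE `Table`
  ADD COLUMN PAmt_len INT AS (LENGTH(PAmt)) STORED,
  ADD INDEX idx_client_len (clientID, PAmt_len);

SELECT clientID
FROM `Table`
GROUP BY clientID
HAVING MAX(PAmt_len) = 0;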
Take away the subquery:
$clientIds = runQuery("select distinct ClientID from Table where PAmt != ''");
$clientIds = implode(",", $clientIds);
runQuery("SELECT distinct ClientID
FROM Table
WHERE PAmt = '' and ClientID not in
({$clientIds})
ORDER BY ID ASC")
I know it looks like MySQL should do that optimisation step for you, but it doesn't. You will find your query is about 12000 times faster if you do the sub query as a separate query.
Try using not exists instead of not in.
http://sqlfiddle.com/#!2/249cd5/26
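For reference, a sketch of that NOT EXISTS form, using the table and column names from the question:
SELECT DISTINCT t.ClientID
FROM `Table` t
WHERE t.PAmt = ''
  AND NOT EXISTS (
        SELECT 1
        FROM `Table` x
        WHERE x.ClientID = t.ClientID
          AND x.PAmt != ''
      );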

How can I speed up a MySQL query with a large offset in the LIMIT clause?

I'm getting performance problems when LIMITing a mysql SELECT with a large offset:
SELECT * FROM table LIMIT m, n;
If the offset m is, say, larger than 1,000,000, the operation is very slow.
I do have to use limit m, n; I can't use something like id > 1,000,000 limit n.
How can I optimize this statement for better performance?
Perhaps you could create an indexing table which provides a sequential key relating to the key in your target table. Then you can join this indexing table to your target table and use a where clause to more efficiently get the rows you want.
#create table to store sequences
CREATE TABLE seq (
seq_no int not null auto_increment,
id int not null,
primary key(seq_no),
unique(id)
);
#create the sequence
TRUNCATE seq;
INSERT INTO seq (id) SELECT id FROM mytable ORDER BY id;
#now get 1000 rows from offset 1000000
SELECT mytable.*
FROM mytable
INNER JOIN seq USING(id)
WHERE seq.seq_no BETWEEN 1000000 AND 1000999;
If records are large, the slowness may be coming from loading the data. If the id column is indexed, then just selecting it will be much faster. You can then do a second query with an IN clause for the appropriate ids (or you could formulate a WHERE clause using the min and max ids from the first query, as sketched below).
slow:
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
fast:
SELECT id FROM table ORDER BY id DESC LIMIT 10 OFFSET 50000
SELECT * FROM table WHERE id IN (1,2,3...10)
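The min/max variant mentioned above would look something like this (the two id values stand in for the smallest and largest ids returned by the id-only query):
SELECT * FROM table WHERE id BETWEEN 123400 AND 123409 ORDER BY id DESC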
There's a blog post somewhere on the internet about how the selection of the rows to show should be as compact as possible (thus: just the ids), and producing the complete result should in turn fetch all the data you want for only the rows you selected.
Thus, the SQL might be something like this (untested; I'm not sure it will actually do any good):
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
If your SQL engine is too primitive to allow this kind of SQL statement, or it doesn't improve anything (against hope), it might be worthwhile to break this single statement into multiple statements and capture the ids in a data structure.
Update: I found the blog post I was talking about: it was Jeff Atwood's "All Abstractions Are Failed Abstractions" on Coding Horror.
I don't think there's any need to create a separate index if your table already has a primary key. If so, you can order by that primary key and then use values of the key to step through:
SELECT * FROM myBigTable WHERE id > :OFFSET ORDER BY id ASC;
Another optimisation would be not to use SELECT * but just the id, so that MySQL can simply read the index and doesn't have to then locate all the data (reducing IO overhead). If you need some of the other columns, you could perhaps add them to the index so that they are read along with the primary key (which will most likely be held in memory and therefore not require a disc lookup), although this will not be appropriate for all cases, so you will have to have a play.
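A sketch of that covering-index idea; the extra columns, the index name, and the LIMIT value are hypothetical stand-ins for whatever your page actually needs:
ALTER TABLE myBigTable ADD INDEX idx_paging_cover (id, title, created_at);

SELECT id, title, created_at
FROM myBigTable
WHERE id > :OFFSET
ORDER BY id ASC
LIMIT 1000;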
Paul Dixon's answer is indeed a solution to the problem, but you'll have to maintain the sequence table and ensure that there are no row gaps.
If that's feasible, a better solution would be to simply ensure that the original table has no row gaps and starts from id 1. Then grab the rows using the id for pagination.
SELECT * FROM table A WHERE id >= 1 AND id <= 1000;
SELECT * FROM table A WHERE id >= 1001 AND id <= 2000;
and so on...
I ran into this problem recently. There were two parts to the fix. First, I had to use an inner select in my FROM clause that did the limiting and offsetting for me on the primary key only:
$subQuery = DB::raw("( SELECT id FROM titles WHERE id BETWEEN {$startId} AND {$endId} ORDER BY title ) as t");
Then I could use that as the FROM part of my query:
// DB::query(), the $titles variable, and ->get() at the end are assumed here to complete the snippet.
$titles = DB::query()
    ->select(
        'titles.id',
        'title_eisbns_concat.eisbns_concat',
        'titles.pub_symbol',
        'titles.title',
        'titles.subtitle',
        'titles.contributor1',
        'titles.publisher',
        'titles.epub_date',
        'titles.ebook_price',
        'publisher_licenses.id as pub_license_id',
        'license_types.shortname',
        $coversQuery
    )
    ->from($subQuery)
    ->leftJoin('titles', 't.id', '=', 'titles.id')
    ->leftJoin('organizations', 'organizations.symbol', '=', 'titles.pub_symbol')
    ->leftJoin('title_eisbns_concat', 'titles.id', '=', 'title_eisbns_concat.title_id')
    ->leftJoin('publisher_licenses', 'publisher_licenses.org_id', '=', 'organizations.id')
    ->leftJoin('license_types', 'license_types.id', '=', 'publisher_licenses.license_type_id')
    ->get();
The first time I created this query I had used OFFSET and LIMIT in MySQL. This worked fine until I got past page 100; then the offset started getting unbearably slow. Changing that to BETWEEN in my inner query sped it up for any page. I'm not sure why MySQL hasn't sped up OFFSET, but BETWEEN seems to reel it back in.