Optimize SQL Query - mysql

I have a problem optimizing this query:
SET @SEARCH = "dokumentalne";
SELECT SQL_NO_CACHE
`AA`.`version` AS `Version` ,
`AA`.`contents` AS `Contents` ,
`AA`.`idarticle` AS `AdressInSQL` ,
`AA`.`topic` AS `Topic` ,
MATCH (`AA`.`topic` , `AA`.`contents`) AGAINST (@SEARCH) AS `Relevance` ,
`IA`.`url` AS `URL`
FROM `xv_article` AS `AA`
INNER JOIN `xv_articleindex` AS `IA` ON ( `AA`.`idarticle` = `IA`.`adressinsql` )
INNER JOIN (
SELECT `idarticle` , MAX( `version` ) AS `version`
FROM `xv_article`
WHERE MATCH (`topic` , `contents`) AGAINST (@SEARCH)
GROUP BY `idarticle`
) AS `MG`
ON ( `AA`.`idarticle` = `MG`.`idarticle` )
WHERE `IA`.`accepted` = "yes"
AND `AA`.`version` = `MG`.`version`
ORDER BY `Relevance` DESC
LIMIT 0 , 30
Right now this query takes about 20 seconds. How can I optimize it?
EXPLAIN gives this:
id  select_type  table       type      possible_keys  key    key_len  ref   rows   Extra
1   PRIMARY      AA          ALL       NULL           NULL   NULL     NULL  11169  Using temporary; Using filesort
1   PRIMARY      <derived2>  ALL       NULL           NULL   NULL     NULL  681    Using where
1   PRIMARY      IA          ALL       accepted       NULL   NULL     NULL  11967  Using where
2   DERIVED      xv_article  fulltext  topic          topic  0              1      Using where; Using temporary; Using filesort
This is an example server with my data:
user: bordeux_4prog
password: 4prog
phpmyadmin: http://phpmyadmin.bordeux.net/
chive: http://chive.bordeux.net/

It looks like your database is down. Getting rid of the inner query is the key to optimizing this. Please try the following query (not tested):
SET @SEARCH = "dokumentalne";
SELECT SQL_NO_CACHE
aa.idarticle AS `AdressInSQL`,
aa.contents AS `Contents`,
aa.topic AS `Topic`,
MATCH(aa.topic , aa.contents) AGAINST (@SEARCH) AS `Relevance`,
ia.url AS `URL`,
MAX(aa.version) AS `Version`
FROM
xv_article AS aa,
xv_articleindex AS ia
WHERE
aa.idarticle = ia.adressinsql
AND ia.accepted = "yes"
AND MATCH(aa.topic , aa.contents) AGAINST (@SEARCH)
GROUP BY
aa.idarticle,
aa.topic,
aa.contents,
`Relevance`,
ia.url
ORDER BY
`Relevance` DESC
LIMIT
0, 30
To optimize the query further, you can separate fetching the newest version of each article from the full-text search, since the latter is the most expensive part. This can be done with a subquery (also not tested on your DB):
SELECT SQL_NO_CACHE
iq.idarticle AS `AdressInSQL`,
iq.topic AS `Topic`,
iq.contents AS `Contents`,
iq.url AS `URL`,
MATCH(iq.topic, iq.contents) AGAINST (@SEARCH) AS `Relevance`
FROM (
SELECT
a.idarticle,
a.topic,
a.contents,
i.url,
MAX(a.version) AS version
FROM
xv_article AS a,
xv_articleindex AS i
WHERE
i.accepted = "yes"
AND a.idarticle = i.adressinsql
GROUP BY
a.idarticle,
a.topic,
a.contents,
i.url
) AS iq
WHERE
MATCH(iq.topic, iq.contents) AGAINST (@SEARCH)
ORDER BY
`Relevance` DESC
LIMIT
0, 30

The first thing I noticed in your DB is that you don't have an index on xv_articleindex.adressinsql. Add it, and it should significantly improve the query performance. Also, one table is MyISAM, whereas the other is InnoDB. Use one engine (in general, I'd recommend InnoDB).
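To see what an index on the join column changes, here is a minimal sketch in Python using the standard sqlite3 module. SQLite stands in for MySQL only so the example runs anywhere; its planner and EXPLAIN output differ from MySQL's, the table definitions are cut down to a few columns, and the index name idx_adressinsql is made up for illustration.

```python
import sqlite3

def join_plan(with_index):
    """Return SQLite's query-plan detail lines for the article/index join."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE xv_article (idarticle INTEGER, version INTEGER, topic TEXT);
        CREATE TABLE xv_articleindex (adressinsql INTEGER, url TEXT, accepted TEXT);
    """)
    if with_index:
        # Hypothetical index name; the column is the one named in the answer.
        con.execute("CREATE INDEX idx_adressinsql ON xv_articleindex (adressinsql)")
    rows = con.execute("""
        EXPLAIN QUERY PLAN
        SELECT aa.topic, ia.url
        FROM xv_article AS aa
        JOIN xv_articleindex AS ia ON aa.idarticle = ia.adressinsql
    """).fetchall()
    con.close()
    return [row[3] for row in rows]  # the 4th column is the readable plan detail

if __name__ == "__main__":
    for flag in (False, True):
        print("with index" if flag else "without index")
        for line in join_plan(flag):
            print("   ", line)
```

In MySQL the corresponding step would be something like ALTER TABLE xv_articleindex ADD INDEX (adressinsql), followed by running EXPLAIN on the original query again to check that the IA row no longer shows type ALL.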

improve mysql select query with order by option

I have the following table with around 10 million records.
I am using the following query to retrieve data, but it takes more than 4 to 5 seconds to return the response.
Is there any way to improve the query?
CREATE TABLE `master` (
`organizationName` varchar(200) NOT NULL DEFAULT '',
`organizationNameQuery` varchar(200) DEFAULT NULL,
`organizationLinkedinHandle` varchar(200) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
`organizationDomain` varchar(110) NOT NULL DEFAULT '',
`source` varchar(10) NOT NULL DEFAULT '',
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `master_inx` (`organizationName`(80),`organizationDomain`(80),`organizationLinkedinHandle`(80),`organizationNameQuery`(80),`source`),
KEY `organizationDomain` (`organizationDomain`),
KEY `domainWithModified` (`organizationDomain`,`modified`),
KEY `modifiedInx` (`modified`)
);
Query:
SELECT *
FROM (SELECT *
FROM Organizations.master
where ( ( organizationDomain like 'linkedin.com'
|| organizationNameQuery = 'linkedin.com')
and source like 'MY_SOURCE') ) M
ORDER BY M.modified DESC limit 1;
1 row in set (4.69 sec)
UPDATE
I found that if I split the OR condition into separate queries, I get the result faster.
For example:
SELECT *
FROM (SELECT *
FROM Organizations.master
where ( ( organizationDomain like 'linkedin.com')
and source like 'MY_SOURCE') ) M
ORDER BY M.modified DESC limit 1;
1 row in set (0.00 sec)
SELECT *
FROM (SELECT *
FROM Organizations.master
where ( (organizationNameQuery = 'linkedin.com')
and source like 'MY_SOURCE') ) M
ORDER BY M.modified DESC limit 1;
1 row in set (0.00 sec)
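Use OR, not || in that context. By default || means OR in MySQL, but it becomes string concatenation when the PIPES_AS_CONCAT SQL mode is enabled.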
The performance villain is OR. Turn the OR into UNION:
( SELECT *
FROM Organizations.master
WHERE organizationDomain = 'linkedin.com'
AND source = 'MY_SOURCE'
ORDER BY modified DESC limit 1
) UNION ALL
( SELECT *
FROM Organizations.master
WHERE organizationNameQuery = 'linkedin.com'
AND source = 'MY_SOURCE'
ORDER BY modified DESC limit 1
)
ORDER BY modified DESC LIMIT 1;
Notes:
This formulation is likely to take about 0.00s to run.
The ORDER BY and LIMIT show up 3 times.
If you need OFFSET, things get a little tricky.
Change back to LIKE if you allow users to enter wildcards.
A leading wildcard would not be efficient.
UNION ALL is faster than UNION (aka UNION DISTINCT).
It needs two new composite indexes; the order of the 2 columns is not critical:
INDEX(organizationDomain, source),
INDEX(organizationNameQuery, source)
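As a rough check that the UNION ALL form returns the same row as the OR form, here is a small sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here: it does not accept parenthesized SELECTs in a UNION, so each arm is wrapped in a derived table instead. The sample rows and index names are invented.

```python
import sqlite3

def latest_rows():
    """Run the OR query and the UNION ALL rewrite on the same sample data."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE master (
            organizationName TEXT, organizationNameQuery TEXT,
            organizationDomain TEXT, source TEXT, modified TEXT)
    """)
    con.executemany("INSERT INTO master VALUES (?, ?, ?, ?, ?)", [
        ("LinkedIn",  "linkedin",     "linkedin.com", "MY_SOURCE", "2020-01-01"),
        ("LinkedIn2", "linkedin.com", "lnkd.in",      "MY_SOURCE", "2021-06-01"),
        ("LinkedIn3", "linkedin.com", "linkedin.com", "OTHER",     "2022-01-01"),
        ("Example",   "example",      "example.com",  "MY_SOURCE", "2023-01-01"),
    ])
    # The two composite indexes suggested above (index names are made up).
    con.execute("CREATE INDEX domain_source ON master (organizationDomain, source)")
    con.execute("CREATE INDEX namequery_source ON master (organizationNameQuery, source)")

    with_or = con.execute("""
        SELECT organizationName, modified FROM master
        WHERE (organizationDomain = 'linkedin.com'
               OR organizationNameQuery = 'linkedin.com')
          AND source = 'MY_SOURCE'
        ORDER BY modified DESC LIMIT 1
    """).fetchall()

    # Each arm finds its own newest row; the outer ORDER BY/LIMIT picks the winner.
    with_union = con.execute("""
        SELECT organizationName, modified FROM (
            SELECT organizationName, modified FROM master
            WHERE organizationDomain = 'linkedin.com' AND source = 'MY_SOURCE'
            ORDER BY modified DESC LIMIT 1)
        UNION ALL
        SELECT organizationName, modified FROM (
            SELECT organizationName, modified FROM master
            WHERE organizationNameQuery = 'linkedin.com' AND source = 'MY_SOURCE'
            ORDER BY modified DESC LIMIT 1)
        ORDER BY modified DESC LIMIT 1
    """).fetchall()
    con.close()
    return with_or, with_union

if __name__ == "__main__":
    print(latest_rows())
```

Both queries should return the same single row on this data. How much faster the UNION ALL form is depends on the engine and the data, so it is worth timing against the real table.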
Looking at the query, I think you can remove the LIKE operator and use = instead.
SELECT * FROM (
SELECT * FROM Organizations.master
where ( (organizationDomain = 'linkedin.com' ||
organizationNameQuery = 'linkedin.com')
and source = 'MY_SOURCE')
) M
ORDER BY M.modified DESC limit 1

How can reduce the time of this SQL query

Can someone tell me how I can reduce the run time of this query?
This is the SQL query:
SELECT
`i`.`id`,
`i`.`local_file`,
`i`.`remote_file`,
`i`.`remote_file_big`,
`i`.`image_name`,
`i`.`description`,
IF(`i`.`prevent_sync`='1', '5', `i`.`status`) `status`,
GROUP_CONCAT(`il`.`user_id` SEPARATOR ',') AS `likes`,
COUNT(`il`.`user_id`) AS `likes_count`
FROM `images` `i`
LEFT JOIN `image_likes` `il` ON (`il`.`image_id`=`i`.`id`)
WHERE 1 AND `i`.`created` < DATE_SUB(CURDATE(), INTERVAL 48 HOUR)
GROUP BY `i`.`id`
ORDER BY `likes_count` DESC LIMIT 3 OFFSET 0;
On checking the query time, this is the result:
# Query_time: 9.948511 Lock_time: 0.000181 Rows_sent: 3 Rows_examined: 4730490
# Rows_affected: 0
Table images:
id (Primary) int(11)
local_file varchar(100)
orig_name varchar(100)
remote_file varchar(1000)
remote_file_big varchar(1000)
remote_page varchar(1000)
image_name varchar(50)
image_name_eng varchar(50)
user_id (Index) int(11)
author varchar(50)
credit varchar(250)
credit_eng varchar(250)
location varchar(50)
description varchar(500)
description_eng varchar(275)
notes varchar(550)
category (Index) int(11)
date_range varchar(50)
created (Index) datetime
license enum('1', '2', '3')
status enum('0', '1', '2', '3', '4')
locked enum('0', '1')
watch_list enum('0', '1', '2')
url_title varchar(100)
url_data varchar(8192)
rem_date datetime
rem_notes varchar(500)
original_url varchar(1000)
prevent_sync enum('0', '1')
checked_by int(11)
system_recommended enum('0', '1')
Please suggest.
This is a complex task for the DB, and there is not much you can do to get the result really efficiently. You can try to limit the IO with a subquery that operates on covering indexes. Remove everything from your query that you don't need to get the three image ids:
SELECT i.id
FROM images i
JOIN image_likes il ON il.image_id = i.id
WHERE i.created < DATE_SUB(CURDATE(), INTERVAL 48 HOUR)
GROUP BY i.id
ORDER BY COUNT(il.image_id) DESC
LIMIT 3 OFFSET 0
The smallest covering indexes would be images(created, id) and image_likes(image_id). With 5M likes, both indexes together will consume something like 100 - 200 MB and should easily fit into memory. The size of the temporary table, that has to be sorted by the count, will also be smaller.
Use that query as derived table (subquery in FROM clause) and join only the three rows from the images table:
SELECT
`i`.`id`,
`i`.`local_file`,
`i`.`remote_file`,
`i`.`remote_file_big`,
`i`.`image_name`,
`i`.`description`,
IF(`i`.`prevent_sync`='1', '5', `i`.`status`) `status`,
GROUP_CONCAT(`il`.`user_id` SEPARATOR ',') AS `likes`,
COUNT(`il`.`user_id`) AS `likes_count`
FROM (
SELECT i.id
FROM images i
JOIN image_likes il ON il.image_id = i.id
WHERE i.created < DATE_SUB(CURDATE(), INTERVAL 48 HOUR)
GROUP BY i.id
ORDER BY COUNT(il.image_id) DESC
LIMIT 3 OFFSET 0
) sub
JOIN images i ON i.id = sub.id
JOIN image_likes il ON il.image_id = i.id
GROUP BY i.id
ORDER BY likes_count DESC;
If that isn't fast enough, you should cache the likes_count using triggers.
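A minimal sketch of the trigger-based cache, written in Python with the standard sqlite3 module. SQLite's trigger syntax is used only so the example is runnable; MySQL's CREATE TRIGGER syntax differs slightly. The likes_count column on images and the trigger names are assumptions, not part of the original schema.

```python
import sqlite3

def cached_like_counts():
    """Keep a per-image like counter up to date with insert/delete triggers."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE images (
            id INTEGER PRIMARY KEY,
            likes_count INTEGER NOT NULL DEFAULT 0   -- hypothetical cache column
        );
        CREATE TABLE image_likes (image_id INTEGER, user_id INTEGER);

        CREATE TRIGGER image_likes_ai AFTER INSERT ON image_likes
        BEGIN
            UPDATE images SET likes_count = likes_count + 1 WHERE id = NEW.image_id;
        END;

        CREATE TRIGGER image_likes_ad AFTER DELETE ON image_likes
        BEGIN
            UPDATE images SET likes_count = likes_count - 1 WHERE id = OLD.image_id;
        END;
    """)
    con.executemany("INSERT INTO images (id) VALUES (?)", [(1,), (2,), (3,)])
    con.executemany("INSERT INTO image_likes VALUES (?, ?)",
                    [(1, 10), (1, 11), (1, 12), (2, 10), (2, 11), (3, 10)])
    con.execute("DELETE FROM image_likes WHERE image_id = 3 AND user_id = 10")
    # The top-N query no longer needs to touch image_likes at all.
    rows = con.execute(
        "SELECT id, likes_count FROM images ORDER BY likes_count DESC, id LIMIT 3"
    ).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    print(cached_like_counts())
```

With the counter stored on images, an index on that column would let the top-3 query avoid sorting, at the cost of one extra write per like.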
This probably suffers from the "inflate-deflate" syndrome which often happens with JOIN + GROUP BY. Also it usually leads to incorrect aggregate values.
SELECT `id`, `local_file`, `remote_file`,
`remote_file_big`, `image_name`, `description`,
IF(`prevent_sync`='1', '5', `status`) `status`,
s.likes, s.likes_count
FROM `images` AS `i`
JOIN
( SELECT image_id,
GROUP_CONCAT(user_id SEPARATOR ',') AS likes,
COUNT(*) AS likes_count
FROM `image_likes`
GROUP BY image_id
ORDER BY `likes_count` DESC
LIMIT 3 OFFSET 0
) AS s ON s.`image_id`=`i`.`id`
WHERE `created` < CURDATE() - INTERVAL 2 DAY
ORDER BY `likes_count` DESC;
This variant will exclude rows with likes_count=0, but that seems reasonable.
It assumes that the PRIMARY KEY of images is id.
image_likes needs INDEX(image_id, user_id); the query will make one scan of that index, then only 3 lookups into images.
The original query had to scan all the rows of images and repeatedly scan all of image_likes.
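The "aggregate first, then join" idea can be compared with the original "join, then group" shape in a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, with a cut-down schema, invented sample data and a made-up index name, so it only illustrates that the two shapes agree on the top rows when every top image has at least one like.

```python
import sqlite3

def top_liked(limit=3):
    """Compare JOIN + GROUP BY with grouping image_likes first, then joining."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE images (id INTEGER PRIMARY KEY, image_name TEXT);
        CREATE TABLE image_likes (image_id INTEGER, user_id INTEGER);
        CREATE INDEX image_likes_image_user ON image_likes (image_id, user_id);
    """)
    con.executemany("INSERT INTO images VALUES (?, ?)",
                    [(i, "img%d" % i) for i in range(1, 6)])
    # Image 1 gets 1 like, image 2 gets 2 likes, ... image 5 gets 5 likes.
    con.executemany("INSERT INTO image_likes VALUES (?, ?)",
                    [(img, user) for img in range(1, 6) for user in range(img)])

    # Original shape: join everything, then group and sort the whole result.
    join_then_group = con.execute("""
        SELECT i.id, COUNT(il.user_id) AS likes_count
        FROM images AS i
        LEFT JOIN image_likes AS il ON il.image_id = i.id
        GROUP BY i.id
        ORDER BY likes_count DESC LIMIT ?
    """, (limit,)).fetchall()

    # Rewritten shape: find the top image_ids first, then look up only those.
    group_then_join = con.execute("""
        SELECT i.id, s.likes_count
        FROM (SELECT image_id, COUNT(*) AS likes_count
              FROM image_likes
              GROUP BY image_id
              ORDER BY likes_count DESC LIMIT ?) AS s
        JOIN images AS i ON i.id = s.image_id
        ORDER BY s.likes_count DESC
    """, (limit,)).fetchall()
    con.close()
    return join_then_group, group_then_join

if __name__ == "__main__":
    print(top_liked())
```

The sample data avoids ties on purpose; with equal like counts the two forms may legitimately pick different images unless a tie-breaker is added to the ORDER BY.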

fetch datas from two tables and differentiate between them

I have two tables and want to display rows from both of them on the same page, ordered by creation date.
Here my query:
SELECT R.*, R.id as id_return
FROM return R
UNION ALL
SELECT A.*, A.id as id_buy
FROM buy A
WHERE
R.id_buyer = '$user' AND R.id_buyer = A.id_buyer AND (R.stats='1' OR R.stats='3') OR A.stats='4'
ORDER BY R.date, A.date DESC LIMIT $from , 20
With this query I get this error message:
Warning: mysqli_fetch_array() expects parameter 1 to be mysqli_result, boolean given in ...
And here is how I think I can differentiate between the results (knowing whether a row comes from the table RETURN or from the table BUY):
if(isset($hist_rows["id_return"])) {
// show RETURN rows
} else {
// show BUY rows
}
What is wrong with the query, and is this method of differentiating between the tables correct?
EDIT
Here is a sample of my tables:
CREATE TABLE IF NOT EXISTS `return` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_buyer` INT(12) NOT NULL,
`id_seller` INT(12) NOT NULL,
`message` TEXT NOT NULL,
`stats` INT(1) NOT NULL,
`date` varchar(30) NOT NULL,
`update` varchar(30),
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `buy` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_buyer` INT(12) NOT NULL,
`product` INT(12) NOT NULL,
`title` VARCHAR(250) NOT NULL,
`stats` INT(1) NOT NULL,
`date` varchar(30) NOT NULL,
PRIMARY KEY (`id`)
);
Make sure the two tables return and buy have the same number of columns, with matching types in the same order. If not, the query fails.
Try selecting only the columns you need from both tables, and make sure they correspond in number and type:
SELECT R.col1, R.col2, R.id as id_return
FROM return R
UNION ALL
SELECT A.col1, A.col2, A.id as id_buy
FROM buy A
WHERE
........
Looking at your code, you should select the same number and type of columns from both tables, as in the sample below
(where I have added the differing columns and selected null from the table where they are not present).
I have also moved the proper WHERE condition to each table.
SELECT
'from return' as `source_table`
, R.`id`
, R.`id_buyer`
, null as product
, null as title
, R.`id_seller` as id_seller
, R.`message`
, R.`stats`
, R.`date`
, R.`update`
FROM `return` R
WHERE R.id_buyer = '$user'
AND (R.stats='1' OR R.stats='3')
UNION ALL
SELECT
'from buy'
, A.`id`
, A.`id_buyer`
, A.`product`
, A.`title`
, null
, null
, A.`stats`
, A.`date`
, null
FROM buy A
WHERE
A.id_buyer = '$user'
AND A.stats='4'
ORDER BY `source_table`, `date` DESC LIMIT $from , 20
To retrieve the value of the first column, in your case you should use
echo $hist_rows["source_table"];
Otherwise, if the two tables are related in some way, you should look at a join (left join) to link the two tables and select the related columns
(but this is another question).
But if you need a left join, you can try
SELECT
R.`id`
, R.`id_buyer`
, R.`id_seller` as id_seller
, R.`message`
, R.`stats`
, R.`date`
, R.`update`
, A.`id`
, A.`id_buyer`
, A.`product`
, A.`title`
, null
, null
, A.`stats`
, A.`date`
FROM `return` R
LEFT JOIN buy A ON R.id_buyer = A.id_buyer
AND R.id_buyer = '$user'
AND (R.stats='1' OR R.stats='3')
AND A.stats='4'
ORDER BY R.date DESC LIMIT $from , 20
When you use union all, the queries need to have exactly the same columns in the same order. If the types are not quite the same, then they are converted to the same type.
So, you don't want union all. I'm guessing you want a join. Something like this:
SELECT r.col1, r.col2, . . ., r.id as id_return,
b.col1, b.col2, . . ., b.id as id_buy
FROM `return` r JOIN
buy b
ON r.id_buyer = b.id_buyer
WHERE r.id_buyer = '$user' and
(r.stats in (1, 3) OR b.stats = 4)
ORDER BY r.date, b.date DESC
LIMIT $from, 20;
This query is only a guess as to what you might want.
Since you're using a union, select a string that you set identifying each query:
SELECT 'R', R.*, R.id as id_return
FROM `return` R
UNION ALL
SELECT 'A', A.*, A.id as id_buy
This way your string 'R' or 'A' is the first column, showing you where it came from. We can't really know why it's failing without the full query, but I'd guess your $from might be empty?
As for your
Warning: mysqli_fetch_array() expects parameter 1 to be mysqli_result, boolean given in ...
Run the query directly first to get the sql sorted out before putting it into your PHP script. The boolean false indicates the query failed.
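The tagging approach from the answers above can be tried out in isolation with a small sketch. It uses Python's sqlite3 module rather than PHP and MySQL, so the quoting of the return table, the sample rows and the history() helper are all assumptions made for illustration. The user id is passed as a bound parameter instead of being interpolated into the SQL string.

```python
import sqlite3

def history(user_id):
    """Merge rows from "return" and buy, tagging each row with its source table."""
    con = sqlite3.connect(":memory:")
    con.row_factory = sqlite3.Row  # lets us read columns by name
    con.executescript("""
        CREATE TABLE "return" (id INTEGER PRIMARY KEY, id_buyer INTEGER,
                               id_seller INTEGER, message TEXT,
                               stats INTEGER, date TEXT);
        CREATE TABLE buy (id INTEGER PRIMARY KEY, id_buyer INTEGER,
                          product INTEGER, title TEXT,
                          stats INTEGER, date TEXT);
        INSERT INTO "return" VALUES (1, 7, 3, 'broken', 1, '2016-03-02');
        INSERT INTO "return" VALUES (2, 7, 3, 'ignored', 2, '2016-03-05');
        INSERT INTO buy VALUES (1, 7, 42, 'Lamp', 4, '2016-03-04');
        INSERT INTO buy VALUES (2, 8, 43, 'Desk', 4, '2016-03-06');
    """)
    # Both SELECTs list the same number of columns in the same order;
    # NULL fills the columns that one of the tables does not have.
    rows = con.execute("""
        SELECT 'return' AS source_table, id, id_buyer,
               NULL AS product, NULL AS title, id_seller, message, stats, date
        FROM "return"
        WHERE id_buyer = :user AND stats IN (1, 3)
        UNION ALL
        SELECT 'buy', id, id_buyer, product, title, NULL, NULL, stats, date
        FROM buy
        WHERE id_buyer = :user AND stats = 4
        ORDER BY date DESC
        LIMIT 20 OFFSET 0
    """, {"user": user_id}).fetchall()
    result = [(r["source_table"], r["id"], r["date"]) for r in rows]
    con.close()
    return result

if __name__ == "__main__":
    for source, row_id, date in history(7):
        print(source, row_id, date)
```

The calling code can then branch on source_table instead of testing whether an id_return key exists.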

mysql Group by : Slow query

Here is my query
SELECT file_id, file_name, file_date, file_email
FROM (SELECT *
FROM `file`
ORDER BY file_date DESC
) AS t
WHERE file_domains = ''
GROUP BY file_name
ORDER BY file_date DESC
LIMIT 0 , 100
The primary key is file_id and there is an index on file_name. The table has about 900k records.
The query takes about 2 seconds on my local computer.
Is there any way to optimize this query?
Thanks in advance.
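Your query relies on two MySQL "features", one non-standard and one semi-standard: it selects columns that are neither aggregated nor listed in GROUP BY, and it expects the ORDER BY inside the subquery to decide which row of each group is returned. There is no guarantee that this will keep working in future versions of MySQL, once the optimizer is clever enough to understand that the subquery is redundant.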
Add an index on (file_domains, file_name, file_date) and try this version:
SELECT f.file_id, f.file_name, f.file_date, f.file_email
FROM
`file` AS f
JOIN
( SELECT file_name
, MAX(file_date) AS max_file_date
FROM `file`
WHERE file_domains = ''
GROUP BY file_name
ORDER BY max_file_date DESC
LIMIT 0 , 100
) AS fm
ON fm.file_name = f.file_name
AND fm.max_file_date = f.file_date
ORDER BY f.file_date DESC ;
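The join-back-to-MAX pattern in the query above can be checked on a handful of rows. This is a sketch using Python's sqlite3 module as a stand-in for MySQL; the sample rows and the index name are invented, and the real table has more columns.

```python
import sqlite3

def newest_per_name(limit=100):
    """Return the newest row per file_name among rows with an empty file_domains."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE file (file_id INTEGER PRIMARY KEY, file_name TEXT,
                           file_date TEXT, file_email TEXT, file_domains TEXT);
        CREATE INDEX file_dom_name_date ON file (file_domains, file_name, file_date);
        INSERT INTO file VALUES (1, 'a.txt', '2014-01-01', 'x@example.com', '');
        INSERT INTO file VALUES (2, 'a.txt', '2014-03-01', 'y@example.com', '');
        INSERT INTO file VALUES (3, 'b.txt', '2014-02-01', 'z@example.com', '');
        INSERT INTO file VALUES (4, 'b.txt', '2014-05-01', 'w@example.com', 'example.org');
    """)
    # The derived table finds the latest date per name; the join fetches full rows.
    rows = con.execute("""
        SELECT f.file_id, f.file_name, f.file_date, f.file_email
        FROM file AS f
        JOIN (SELECT file_name, MAX(file_date) AS max_file_date
              FROM file
              WHERE file_domains = ''
              GROUP BY file_name
              ORDER BY max_file_date DESC
              LIMIT ?) AS fm
          ON fm.file_name = f.file_name
         AND fm.max_file_date = f.file_date
        ORDER BY f.file_date DESC
    """, (limit,)).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in newest_per_name():
        print(row)
```

Unlike the GROUP BY trick in the question, the row returned for each file_name is fully determined here, as long as no two rows share the same name and date.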
This intermediate query:
SELECT *
FROM `file`
ORDER BY file_date DESC
fetches all 900k records and sorts them by date, which might be slow.

MySQL Query optimization with JOIN and COUNT

I have the following MySQL Query:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
The idea being to select a release and count how many other sites had also released it prior to the recorded date, to get the position.
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t1 ALL NULL NULL NULL NULL 498422 Using temporary; Using filesort
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 91661
2 DERIVED tracers index NULL releaseid 4 NULL 498422
Any suggestions on how to eliminate the Using temporary; Using filesort ? It takes a loooong time. The indexes I have thought of and tried haven't helped anything.
Try adding an index on tracers.releaseid and one on tracers.date
make sure you have an index on releaseid.
flip your JOIN, the sub-query must be on the left side in the LEFT JOIN.
put the ORDER BY and LIMIT clauses inside the sub-query.
Try having two indices, one on (date) and one on (releaseid, date).
Another thing is that your query does not seem to be doing what you describe it does. Does it actually count correctly?
Try rewriting it as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, COUNT(*) AS pos
FROM tracers AS t1
JOIN tracers AS t2
ON t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
GROUP BY t1.id
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
or as:
SELECT t1.id, t1.releaseid, t1.site, t1.`date`
, ( SELECT COUNT(*)
FROM tracers AS t2
WHERE t2.releaseid = t1.releaseid
AND t2.`date` <= t1.`date`
) AS pos
FROM tracers AS t1
ORDER BY t1.`date` DESC
, pos DESC
LIMIT 0 , 100
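To check that the correlated-subquery form really yields a per-row position, here is a small sketch in Python with the standard sqlite3 module. SQLite stands in for MySQL, and the four sample rows and the index name are made up.

```python
import sqlite3

def positions():
    """Compute, for each tracer row, how many sites had the release by that date."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tracers (id INTEGER PRIMARY KEY, releaseid INTEGER,
                              site TEXT, date TEXT);
        CREATE INDEX tracers_release_date ON tracers (releaseid, date);
        INSERT INTO tracers VALUES (1, 100, 'siteA', '2012-01-01');
        INSERT INTO tracers VALUES (2, 100, 'siteB', '2012-01-02');
        INSERT INTO tracers VALUES (3, 100, 'siteC', '2012-01-03');
        INSERT INTO tracers VALUES (4, 200, 'siteA', '2012-01-02');
    """)
    # The subquery counts rows of the same release dated on or before this row,
    # so the first site to post a release gets pos = 1.
    rows = con.execute("""
        SELECT t1.id, t1.releaseid, t1.site, t1.date,
               (SELECT COUNT(*)
                FROM tracers AS t2
                WHERE t2.releaseid = t1.releaseid
                  AND t2.date <= t1.date) AS pos
        FROM tracers AS t1
        ORDER BY t1.date DESC, pos DESC
        LIMIT 100
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in positions():
        print(row)
```

The (releaseid, date) index suggested earlier is what lets each subquery execution read a narrow index range instead of the whole table.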
This answer may not change the EXPLAIN output. However, if your main problem is the sorting (which you can confirm by checking whether the query runs faster without the ORDER BY clause), try sorting the joined subquery first. Your query then becomes:
SELECT t1.id, t1.releaseid, t1.site, t1.date, t2.pos FROM `tracers` as t1
LEFT JOIN (
SELECT `releaseid`, `date`, COUNT(*) AS `pos`
FROM `tracers` GROUP BY `releaseid`
ORDER BY `pos` DESC -- additional order
) AS t2 ON t1.releaseid = t2.releaseid AND t2.date <= t1.date
ORDER BY `date` DESC , `pos` DESC LIMIT 0 , 100
Note: my DB version is mysql-5.0.96-x64; you may get a different result in another version.