I have three tables and I have to search them with a LIKE match. The query runs over 10,000 records. It works fine but takes 4 seconds to return results. What can I do to improve the speed and bring it down to 1 second?
profile_category_table
----------------------
restaurant
sea food restaurant
profile_keywords_table
----------------------
rest
restroom
r.s.t
company_profile_table
---------------------
maha restaurants
indian restaurants
Query:
SELECT name
FROM (
  (SELECT PC_name AS name
   FROM profile_category_table
   WHERE PC_status=1
     AND PC_parentid!=0
     AND (regex_replace('[^a-zA-Z0-9\-]','',remove_specialCharacter(PC_name)) LIKE '%rest%')
   GROUP BY PC_name)
  UNION
  (SELECT PROFKEY_name AS name
   FROM profile_keywords_table
   WHERE PROFKEY_status=1
     AND (regex_replace('[^a-zA-Z0-9\-]','',remove_specialCharacter(PROFKEY_name)) LIKE '%rest%')
   GROUP BY PROFKEY_name)
  UNION
  (SELECT COM_name AS name
   FROM company_profile_table
   WHERE COM_status=1
     AND (regex_replace('[^a-zA-Z0-9\-]','',remove_specialCharacter(COM_name)) LIKE '%rest%')
   GROUP BY COM_name)
) a
ORDER BY IF(name LIKE '%rest%',1,0) DESC
LIMIT 0, 2
I have also added indexes on those columns.
If a user searches for the text rest in the textbox, the auto-suggestion results should be:
results
restaurant
sea food restaurant
maha restaurants
indian restaurants
rest
restroom
r.s.t
I used regex_replace('[^a-zA-Z0-9-]','',remove_specialCharacter(COM_name)) to remove special characters from the field value and to match it against the keyword.
There are lots of things you can consider:
The main performance killer here is probably the regex_replace() ... like '%FOO%'. Because you are applying a function to the columns, indexes cannot be used, which leaves you with several full table scans. On top of that, the regex replace itself is heavyweight. To optimize this, you may:
Keep a separate column that stores the "sanitized" data, create an index on it, and reduce your query to where pc_name_sanitized like '%FOO%'
I am not sure whether it is available in MySQL, but many DBMSs have a feature called a function-based index. You could consider using it to index the result of the regex replace function.
However, even after the above changes, you will find the performance is not very attractive. In most cases, using like with a leading wildcard prevents indexes from being used. If possible, try to do an exact match, or require the beginning of the string to be provided, e.g. where pc_name_sanitized like 'FOO%'
As other users have mentioned, using UNION is also a performance killer. Try to use UNION ALL instead if possible.
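A rough sketch of the sanitized-column idea, shown for one of the three tables. The column name COM_name_sanitized, its VARCHAR(255) type, and the index name are assumptions made for illustration; regex_replace and remove_specialCharacter are the functions from the question. The new column would have to be refreshed whenever COM_name changes (in application code or a trigger), which is not shown.

```sql
-- Hypothetical column holding the pre-computed, sanitized name.
ALTER TABLE company_profile_table
  ADD COLUMN COM_name_sanitized VARCHAR(255) NOT NULL DEFAULT '';

-- One-time backfill using the functions from the question.
UPDATE company_profile_table
SET COM_name_sanitized =
    regex_replace('[^a-zA-Z0-9\-]', '', remove_specialCharacter(COM_name));

-- Index (hypothetical name) on the sanitized column.
CREATE INDEX idx_com_name_sanitized
  ON company_profile_table (COM_name_sanitized);

-- No function call per row any more; with a prefix pattern the index can be used.
SELECT COM_name AS name
FROM company_profile_table
WHERE COM_status = 1
  AND COM_name_sanitized LIKE 'rest%';
```

With LIKE '%rest%' the index still cannot narrow the search, but the per-row regex work is gone. If the per-table queries are combined with UNION ALL rather than UNION, the same name can appear more than once when it exists in several tables.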
I'm going to say don't filter in the query. Do that in whatever language you're programming in. Regex_replace is a heavy operation regardless of the environment, and you're running it several times in a query over 10,000 records, with a union of who knows how many more.
Rewrite it completely.
UNION statements are killing performance, and you're doing the LIKE on too many fields.
Moreover, you're searching in a temporary table (SELECT field FROM (...subquery...)), which has no indexes, so it is really slow (a full scan of that temporary table is guaranteed).
Since you use UNION between all the queries, you can remove the GROUP BY from each of them, because UNION already removes duplicates. You also select only names that have "rest" in them, so remove the expression "IF(name LIKE '%rest%',1,0)" from the ORDER BY clause.
There are two name columns in my table. (name1, name2)
I want to take a keyword as input and return the rows that contain it, ordered by how closely they match the keyword.
If the user inputs ed, we want the output to be in the order of 'ed', 'Ed Sheeran' and 'Ahmedzidan'.
(The order of 'Ed Sheeran' and 'Ahmed Zidan' may vary depending on the matching method.)
We want an exact match of 'ed' to rank highest, immediately followed by names that start with 'ed'.
I don't know how to do exact matching.
A row should be found if 'ed' is included in either name1 or name2.
There is no priority between the two.
The method I am using now:
select
(LENGTH(name1) - LENGTH('ed')) + (LENGTH(name2) - LENGTH('ed')) as score
from user
where name1 like '%ed%' or name2 like '%ed%'
order by score asc
Another way:
select
(CASE WHEN name1 = 'ed' or name2 = 'ed' THEN 4
WHEN name1 like 'ed%' or name2 like 'ed%' THEN 3
WHEN name1 like '%ed' or name2 like '%ed' THEN 2
WHEN name1 like '%ed%' or name2 like '%ed%' THEN 1
END
)
as score
from user
where name1 like '%ed%' or name2 like '%ed%'
order by score desc
However, both give results that differ from what I expected, and I don't know which one is faster.
I tried using a full-text index, but it seems to require too much sacrifice(?) to search for a single letter.
It was also too slow when the user typed a long keyword.
Example: keyword: ed -> 0.2s, keyword: ed Sheeran -> 5s.
What is the best way?
If the above two methods are the best, which one could be faster?
Let me discuss the performance impact of each part of the query:
The WHERE has OR and LIKE with a leading wildcard. Each of those forces the query to do a full scan, checking every row.
I don't need to discuss further; all other aspects (including the lengthy CASE) are less important in judging speed. Things like POSITION vs the alternative might shave off 1%.
If the table is huge (and cannot be cached in RAM), then this would help some: INDEX(name1, name2). The trick here is to change a table scan into an index scan.
All work is done in the "buffer_pool" in RAM. When a table is bigger than RAM, and the query needs to look at all the rows, the processing must bump things out of the buffer_pool to load data from disk. I/O is likely to be the biggest factor in performance.
The table's BTree contains all the columns for all the rows. The INDEX mentioned contains one row for each name1, name2 and whatever column(s) comprise the PRIMARY KEY. That is, the index is likely to be smaller than the table. Hence the index might sit in RAM, whereas the data would have to be paged in. (It's about I/O.)
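As a sketch of that suggestion: the index name below is made up, and the query selects only name1 and name2 so that the index alone can satisfy it. Whether the optimizer actually picks the index scan depends on the data, so it is worth checking with EXPLAIN.

```sql
-- Hypothetical index name; the columns are the two searched columns from the question.
ALTER TABLE user ADD INDEX idx_name1_name2 (name1, name2);

-- Every entry is still examined, but it is read from the smaller index.
-- In the EXPLAIN output, "Using index" in the Extra column indicates
-- that the table rows themselves were not read.
EXPLAIN
SELECT name1, name2
FROM user
WHERE name1 LIKE '%ed%' OR name2 LIKE '%ed%';
```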
I think you can use the POSITION function and order by it. There is no need for a CASE expression here: the query has no real branching logic, and repeating the LIKE tests wastes time. With POSITION you can get the order you want, if you just want "Ed Sharon" first, followed by other names containing "ed" such as "Mr. Bambang Ed".
SELECT name, POSITION('a' IN name) pos FROM user WHERE name LIKE '%a%' ORDER BY pos ASC
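The one-column query above could be extended to the two columns in the question roughly as follows. This is only a sketch: the sentinel value 65535 for "not found" is an arbitrary assumption, and LOCATE is used as the two-argument form of POSITION.

```sql
SELECT name1, name2,
       LEAST(
         -- LOCATE returns 0 when the keyword is absent; map that (and NULL)
         -- to a large sentinel so it does not win the LEAST comparison.
         IFNULL(NULLIF(LOCATE('ed', name1), 0), 65535),
         IFNULL(NULLIF(LOCATE('ed', name2), 0), 65535)
       ) AS pos
FROM user
WHERE name1 LIKE '%ed%' OR name2 LIKE '%ed%'
ORDER BY pos ASC;
```

An exact match and a prefix match both get pos = 1, so a secondary sort key would be needed to separate them.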
I have 3 tables. All 3 tables have approximately 2 million rows. Every day, 10,000-100,000 new entries are added. It takes approximately 10 seconds to run the SQL statement below. Is there a way to make this SQL statement faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but it's slow. Is there a better way to write this code?
All database servers have some form of optimization engine that determines how best to grab the data you want. With a simple query such as the select you showed, there isn't going to be any way to greatly improve performance within the SQL. As others have said, sub-queries won't help, as they will get optimized into the same plan as joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a MySQL expert, but I found this article interesting and worth a skim. https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?
A subquery won't help, but a proper index can improve the performance, so be sure you have the proper indexes:
create index idx1 on customers(gender, cus_id, book_id, name);
create index idx2 on hotels(cus_id);
create index idx3 on bookings(book_id);
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not tied to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.
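Spelled out as statements, the recommended indexes and a version of the query with a defined order might look like this. The index names are invented, and ordering by customers.name is only an assumption about what order is wanted.

```sql
-- Hypothetical index names; the column lists are the ones recommended above.
CREATE INDEX idx_customers_cover ON customers (cus_id, gender, book_id, name);
CREATE INDEX idx_hotels_cus      ON hotels (cus_id);
CREATE INDEX idx_bookings_book   ON bookings (book_id);

-- Same query, with an explicit ORDER BY so that LIMIT returns a defined set of rows.
SELECT customers.name
FROM customers
INNER JOIN hotels   ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0
  AND customers.cus_id = 3
ORDER BY customers.name
LIMIT 25;
```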
This MySQL query is run on a large (about 200,000 records, 41 columns) MyISAM table:
select t1.*
from table t1
where 1
  and t1.inactive = '0'
  and (t1.code like '%searchtext%' or t1.name like '%searchtext%' or t1.ext like '%searchtext%')
order by t1.id desc
LIMIT 0, 15
id is the primary index.
I tried adding a multiple-column index on all 3 searched (LIKE) columns. It works OK, but the results are served to an auto-filled AJAX table on a website, and the 2-second return delay is a bit too slow.
I also tried adding separate indexes on all 3 columns and a fulltext index on all 3 columns, without significant improvement.
What would be the best way to optimize this type of query? I would like to achieve under 1 second performance. Is it doable?
The best thing you can do is implement paging. No matter what you do, that IO cost is going to be huge. If you only return one page of records (10, 25, or whatever), that will help a lot.
As for the index, you need to check the plan to see if your index is actually being used. A full-text index might help, but that depends on how many rows you return and what you pass in. Wildcards such as % really drain performance. An index can still be used if the pattern ends with %, but not if it starts with %. If you put % on both sides of the text you are searching for, indexes can't help much.
You can create a full-text index that covers the three columns: code, name, and ext. Then perform a full-text query using the MATCH() AGAINST () function:
select t1.*
from table t1
where match(code, name, ext) against ('searchtext')
order by t1.id desc
limit 0, 15
If you omit the ORDER BY clause the rows are sorted by default using the MATCH function result relevance value. For more information read the Full-Text Search Functions documentation.
As @Vulcronos notes, the query optimizer is not able to use the index when the LIKE operator is used with an expression that starts with a wildcard %.
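For completeness, a sketch of creating the index that the MATCH() query relies on. The table name items and the index name are placeholders (the question only calls it table). Note that full-text search matches whole words, or word prefixes in boolean mode with a trailing *, rather than arbitrary substrings as LIKE '%searchtext%' does, and words shorter than the configured minimum length are not indexed.

```sql
-- Placeholder table and index names; the column list must match the MATCH() call.
ALTER TABLE items
  ADD FULLTEXT INDEX ft_code_name_ext (code, name, ext);

-- Boolean mode: the trailing * matches words that start with "searchtext".
SELECT t1.*
FROM items t1
WHERE MATCH(code, name, ext) AGAINST ('searchtext*' IN BOOLEAN MODE)
  AND t1.inactive = '0'
ORDER BY t1.id DESC
LIMIT 0, 15;
```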
I am running the following query and however I change it, it still takes almost 5 seconds to run which is completely unacceptable...
The query:
SELECT cat1, cat2, cat3, PRid, title, genre, artist, author, actors, imageURL,
lowprice, highprice, prodcatID, description
from products
where title like '%' AND imageURL <> '' AND cat1 = 'Clothing and accessories'
order by userrating desc
limit 500
I've tried taking out the "like %" and taking out the "imageURL <> ''", but it's still the same. I've tried returning only 1 column; still the same.
I have indexes on almost every column in the table, certainly all the columns mentioned in the query.
This is basically for a category listing. If I do a fulltext search for something in the title column which has a fulltext index, it takes less than a second.
Should I add another fulltext index to column cat1 and change the query focus to "match against" on that column?
Am I expecting too much?
The table has just short of 3 million rows.
You said you had an index on every column. Do you have an index such as this one?
alter table products add index (cat1, userrating)
If you don't, give it a try. Run that query and let me know if it runs faster.
Also, I assume you're actually setting some kind of filter instead of the % on the title field, right?
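A sketch of how that index could be checked; the index name is an assumption. With cat1 fixed by the equality test, the index entries for that category are already ordered by userrating, so MySQL may be able to read the top 500 in index order instead of sorting all matching rows.

```sql
-- Same composite index as suggested above, with a hypothetical explicit name.
ALTER TABLE products ADD INDEX idx_cat1_userrating (cat1, userrating);

-- If the index is used for both filtering and ordering, the key column shows
-- idx_cat1_userrating and "Using filesort" is absent from the Extra column.
EXPLAIN
SELECT cat1, cat2, cat3, PRid, title, genre, artist, author, actors, imageURL,
       lowprice, highprice, prodcatID, description
FROM products
WHERE cat1 = 'Clothing and accessories'
  AND imageURL <> ''
ORDER BY userrating DESC
LIMIT 500;
```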
You should store cat1 as an integer rather than a string in these 3 million rows. You must also index correctly. If indexing every column only ever helped, the system would do it by default.
Apart from that, title LIKE '%' doesn't do anything. I guess you use it for searching, so it becomes title LIKE 'search%'.
Do you use any sort of framework to fetch this? Getting 500 rows with a lot of columns can exhaust the system if your framework saves them to a large array. That is probably not the case here, but:
Try running an ordinary $query = mysql_query() and while($row = mysql_fetch_object($query)).
I suggest adding an index on the columns queried: title, imageURL and cat1.
Second improvement: use the SQL server cache; it will dramatically improve the speed.
Last improvement: if your query is always like that and only the values change, then use prepared statements.
Well, I am quite sure that a % as the first char in a LIKE clause gives you a full table scan for that column (in your case that full table scan won't be executed, because you already have restricting conditions in the AND clauses).
Besides that, try adding an index on the cat1 column. Also, try adding other criteria to your query in order to reduce the size of the data set; your working data set (the number of rows that match your query, without the LIMIT clause) might be too big as well.
So firstly here's my query: (NOTE: I know SELECT * is bad practice; I just switched it in to make the query more readable)
SELECT pcln_cities.*,COUNT(pcln_hotels.cityid) AS hotelcount
FROM pcln_cities
LEFT OUTER JOIN pcln_hotels ON pcln_hotels.cityid=pcln_cities.cityid
WHERE pcln_cities.state_name='California' GROUP BY pcln_cities.cityid
ORDER BY hotelcount DESC
LIMIT 5
So I know that to solve things like that you add EXPLAIN to the beginning of the query but I'm not 100% sure how to read the results, so here they are:
[EXPLAIN results screenshot: http://www.andrew-g-johnson.com/query-results.JPG]
Bonus points to an answer that tells me what to look for in the EXPLAIN results
EDIT The cities table has the following indexes (or is it indices?)
cityid
state_name
and I just added one with both as I thought it might help (it didn't)
The hotels table has the following indexes (or is it indices?)
cityid
Hmm, there's something not very right in your query.
You use an aggregate function (count), but you group by the id only.
Normally, you should group on all columns in your select list that are not inside an aggregate function.
As you've specified the query now, IMHO, the DBMS can never correctly determine which values it should display for those columns that are not aggregated ...
It would be more correct if your query was written like:
select cityname, count(*)
from city inner join hotel on hotel.city_id = city.city_id
group by cityname
order by count(*) desc
If you do not have an index on cityname, and you filter on cityname, it will improve performance if you put an index on that column.
In short: adding an index on columns that you regularly use for filtering or for sorting may improve performance.
(That is simply put, of course; you can use it as a 'guideline', but every situation is different. Sometimes it can be helpful to add an index that spans multiple columns.
Also, remember that if you update or insert a record, indexes need to be updated as well, so there's a slight performance cost in adding/updating/deleting records.)
Another thing that could improve performance, is using an inner join instead of an outer join. I don't think that it is necessary to use an outer join here.
It looks like you don't have an index on pcln_cities.state_name, or pcln_cities.cityid? Try adding them.
Given that you've updated your question to say that you do have these indexes, I can only suggest that your database currently has a preponderance of cities in California, so the query optimizer decided it would be easier to do a table scan and throw out the non-California ones than to use the index to pick out the California ones.
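To check whether a table scan is really what is happening, the query can be run under EXPLAIN. The sketch below selects only cityid to keep it short (an assumption; the original selects every city column), and the comments list the output columns that are usually worth reading first.

```sql
EXPLAIN
SELECT pcln_cities.cityid, COUNT(pcln_hotels.cityid) AS hotelcount
FROM pcln_cities
LEFT OUTER JOIN pcln_hotels ON pcln_hotels.cityid = pcln_cities.cityid
WHERE pcln_cities.state_name = 'California'
GROUP BY pcln_cities.cityid
ORDER BY hotelcount DESC
LIMIT 5;

-- Columns of the EXPLAIN output worth checking for each table:
--   type           ALL means a full table scan; ref or range means an index lookup
--   possible_keys  indexes the optimizer considered
--   key            index actually chosen (NULL means none)
--   rows           estimated number of rows examined
--   Extra          "Using temporary" / "Using filesort" indicate extra work
--                  for the GROUP BY / ORDER BY
```

Sorting by the computed hotelcount cannot use an index, so a filesort on the grouped result is expected here.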
Your query looks fine. Is there a chance that something else has a lock on a record that you need? Are the tables especially big? I doubt that data is the problem as there are not that many hotels...
I've run into similar issues with MySQL. After spending over a year tuning, patching, and thinking I was a SQL dummy, I switched to SQL Server Express. The exact same queries with the exact same data would run 2-5 orders of magnitude faster in SQL Server Express. MySQL seemed to have an especially difficult time with moderately complex queries (5+ tables). I think the MySQL optimizer got worse after Sun bought the company...