I have a table of 10 million records and I am trying to select user details such as firstname, lastname and town. I need to get the results back in an order where an expression like ORDER BY town="abc" puts the matching rows at the top.
What I have tried
Query one
-- this is much slower, taking 45+ seconds
select firstname, lastname, town
from user_db
order by town="abc" DESC
limit 25;
Query two
-- much faster with 0.00019 seconds
select firstname, lastname, town
from user_db
order by town DESC
limit 25;
The problem
The first query works too, but it takes 45+ seconds, whereas if I remove the equals expression from the ORDER BY clause, as in the second query, it's much faster. Obviously I do use WHERE clauses as well, but this is a simplified example.
Other notes
There are currently no joins in the query, as it is just a simple SELECT of user details, and my setup is pretty good: 30GB of RAM and 2TB of storage, all local.
Indexes: all the columns mentioned have indexes, but the ORDER BY town="abc" clause triggers a full table scan, and as a result this ends up taking around 2 minutes.
Is there a way to get results ranked by closest matches first faster within a single query?
Any help will gladly be appreciated. Thank you.
It looks to me like your user_db table has an index on your town column. That means ORDER BY town DESC LIMIT 25 can be satisfied in essentially constant time by jumping to the end of that index and scanning 25 index entries backwards.
But your ORDER BY town='abc' DESC LIMIT 25 has to look at, and sort, every single row in the table. MySQL doesn't use an index to help compute that town='abc' condition when it appears in an ORDER BY clause.
Many people with requirements like yours use FULLTEXT searching and order by the MATCH() function. That gives a useful ordering for a person looking for the closest matches, much like the location-bar search in a web browser. But don't expect Google-like match accuracy from MySQL.
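A minimal sketch of that approach, assuming a FULLTEXT index on town is acceptable for your table (both InnoDB and MyISAM support it):

-- Add the full-text index once:
ALTER TABLE user_db ADD FULLTEXT(town);

-- Rank matching towns first; relevance is 0 for non-matching rows, so they
-- sink to the bottom. Without a WHERE MATCH(...) clause this still examines
-- every row; adding WHERE MATCH(town) AGAINST('abc') would use the full-text
-- index but drop the non-matching rows entirely.
SELECT firstname, lastname, town,
       MATCH(town) AGAINST('abc') AS relevance
FROM user_db
ORDER BY relevance DESC
LIMIT 25;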
You can decouple the query into two queries each one being very fast.
First, create an index on town.
create index ix1 on user_db (town);
Then get the matches, with a limit of 25 rows:
select * from user_db where town = 'abc' limit 25
The query above may return any number of rows between 0 and 25: let's call this number R. Then, get the non-matches:
select * from user_db where town <> 'abc' limit 25 - R
Assemble both result sets in your application, computing 25 - R yourself and substituting it into the second query's LIMIT (MySQL does not accept an expression there), and the problem is solved. Even if the second query results in a table scan, it stops as soon as it has found enough rows, so the cost stays low.
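If you would rather keep it as a single statement, one way to stitch the two queries together is a UNION ALL that over-fetches 25 rows from each side and trims afterwards (a sketch, assuming the index on town from above):

-- Matches first (flagged 1), then up to 25 arbitrary non-matches (flagged 0).
(SELECT firstname, lastname, town, 1 AS is_match
 FROM user_db
 WHERE town = 'abc'
 LIMIT 25)
UNION ALL
(SELECT firstname, lastname, town, 0 AS is_match
 FROM user_db
 WHERE town <> 'abc'
 LIMIT 25)
ORDER BY is_match DESC
LIMIT 25;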
One way is to add a computed column whose value is town="abc", then sort by that column.
I'm rebuilding my workspace right now so I cannot try it properly, but something like:
select firstname, lastname, town, town="abc" as sortme
from user_db
order by sortme desc, town, lastname, firstname
limit 25;
As long as it is unclear what you mean by "closest match", it is difficult to answer your question. Are "abd", "bc", etc. regarded as close matches to "abc"? Should the string "abc" appearing inside the town name, as in "abcville", count as a match?
There are a number of options.
Appearance of search string
Using a LIKE "%abc%" WHERE clause will find all towns with the string "abc" appearing anywhere in them.
select firstname, lastname
from user_db
where town like "%abc%"
order by town
Use "abc%" (leaving out the leading %) if you only want towns starting with "abc"; the advantage is that this can probably use the index on town, if there is one, as shown below. There is no ranking, but you could add a sort.
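For instance, the prefix form looks like this:

-- The leading characters are fixed, so an index on town can narrow this to a
-- small range scan instead of a full scan.
SELECT firstname, lastname
FROM user_db
WHERE town LIKE "abc%"
ORDER BY town;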
Use a fulltext index
Create a FULLTEXT index on town:
ALTER TABLE user_db
ADD FULLTEXT(town);
And use this with a match:
SELECT
MATCH(town) AGAINST('abc') AS Relevance,
firstname, lastname
FROM user_db
WHERE MATCH(town) AGAINST('abc')
ORDER BY Relevance DESC
LIMIT 15
MATCH works on whole words, so in this case the string "abc" must appear as a separate word within the town name in order to produce a match. The NATURAL LANGUAGE mode works well for plain text but might not do so well for town names.
To be honest I have no experience with FULLTEXT and MATCH performance, but it is probably well optimized and works fairly well on large tables.
Create additional fields
As storage is cheap and time is not, you might want to consider adding additional columns with search strings or alternative spellings for the town, creating all the indexes you'll need, and using those as the search source. As this needs analysis of your use case, it is difficult to provide a concrete solution.
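A rough sketch of that idea; the column name search_terms and the example value "abcville" are made up for illustration:

-- Extra column holding the town plus alternative spellings / search strings,
-- indexed so anchored searches on it stay cheap.
ALTER TABLE user_db
  ADD COLUMN search_terms VARCHAR(255) NULL,
  ADD INDEX ix_search_terms (search_terms);

-- Populate it in whatever way fits your use case, for example:
UPDATE user_db
SET search_terms = CONCAT_WS(' ', town, 'abcville')
WHERE town = 'abc';

After that, WHERE search_terms LIKE "abc%" can use the new index.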
Related
I am having a problem with the following task using MySQL. I have a table Records(id, enterprise, department, status), where id is the primary key, enterprise and department are foreign keys, and status is an integer value (0 - CREATED, 1 - APPROVED, 2 - REJECTED).
Now, usually the application needs to filter records for a concrete enterprise, department and status:
SELECT * FROM Records WHERE status = 0 AND enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
The ORDER BY is required, since I have to provide the user with the most recent records. For this query I have created an index on (enterprise, department, status), and everything works fine. However, for some privileged users the status filter should be omitted:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
This obviously breaks the index - it's still good for filtering, but not for sorting. So, what should I do? I don't want to create a separate index on (enterprise, department), so what if I modify the query like this:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
AND status IN (0,1,2)
ORDER BY id desc LIMIT 0,10;
MySQL definitely does use the index now, since it is provided with values for status, but how quick will the sorting by primary key be? Will it take the most recent 10 values for each available status and then merge them, or will it first merge the ids for all statuses together and only then take the first ten (in which case I guess it's going to be much slower)?
All of the queries will benefit from one composite index:
INDEX(enterprise, department, status, id)
enterprise and department can be swapped, but keep the rest of the columns in that order.
The first query will use that index for both the WHERE and the ORDER BY, and will thereby be able to find the 10 rows without scanning the table or doing a sort.
The second query is missing status, so my index is less than perfect. This would be better:
INDEX(enterprise, department, id)
At that point, it works like above. (Note: If the table is InnoDB, then this 3-column index is identical to your 2-column INDEX(enterprise, department) -- the PK is silently included.)
The third query gets dicier because of the IN. Still, my 4-column index will be nearly the best. It will use the first 3 columns, but it will not be able to do the ORDER BY via id, so it won't use id. Nor will it be able to consume the LIMIT. Hence the EXPLAIN will say Using temporary and/or Using filesort. Don't worry, performance should still be fine.
My second index is not as good for the third query.
See my Index Cookbook.
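For reference, the concrete DDL for the two indexes discussed above would be something like this (the index names are just placeholders):

ALTER TABLE Records
  ADD INDEX ix_ent_dep_status_id (enterprise, department, status, id),
  ADD INDEX ix_ent_dep_id        (enterprise, department, id);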
"How quick will sorting by id be"? That depends on two things.
Whether the sort can be avoided (see above);
How many rows in the query without the LIMIT;
Whether you are selecting TEXT columns.
I was careful to distinguish whether the INDEX is used all the way through the ORDER BY; in that case there is no sort, and the LIMIT is folded in. Otherwise, all the rows (after filtering) are written to a temp table, sorted, and then 10 rows are peeled off.
The "temp table" I just mentioned is necessary for various complex queries, such as those with subqueries, GROUP BY, ORDER BY. (As I have already hinted, sometimes the temp table can be avoided.) Anyway, the temp table comes in 2 flavors: MEMORY and MyISAM. MEMORY is favorable because it is faster. However, TEXT (and several other things) prevent its use.
If MEMORY is used then Using filesort is a misnomer -- the sort is really an in-memory sort, hence quite fast. For 10 rows (or even 100) the time taken is insignificant.
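To see what your server actually does for the IN variant, check its plan; "Using temporary" / "Using filesort" will show up in the Extra column:

EXPLAIN
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
AND status IN (0,1,2)
ORDER BY id desc LIMIT 0,10;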
I have a table that contains 250 million records recording people who live in the US together with their state, county and settlement; in simplified form its columns are surname, region, subregion and place.
I have put a combined index on surname, region, subregion and place. The following queries execute in exactly the same time:
SELECT SQL_NO_CACHE surname, place, count(*) as cnt FROM `ustest` group by place, surname;
SELECT SQL_NO_CACHE surname, region, count(*) as cnt FROM `ustest` group by region, surname;
I was under the impression that the first query would not use the index, as I thought that to use an index you had to query on all the columns from left to right.
Can anyone explain how MySQL uses indexes on multiple columns in such instances?
It's hard to tell the specifics of your queries' execution plans without seeing the EXPLAIN output.
But two things jump out:
Both queries must take all rows in the table into account (you don't have a WHERE clause).
Both queries could be satisfied by scanning your compound index based on surname being the lead column of that index. Because you're counting items, it's necessary to do a tight, not loose, index scan. (You can read about those.)
So it's possible that they both have the same execution plan.
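The quickest way to confirm is to compare the two plans directly:

EXPLAIN SELECT SQL_NO_CACHE surname, place, count(*) as cnt
FROM `ustest` group by place, surname;

EXPLAIN SELECT SQL_NO_CACHE surname, region, count(*) as cnt
FROM `ustest` group by region, surname;

If both show the same access type and key, that explains the identical timings.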
Say I have these four tables:
BRANCH (BRANCH_ID, CITY_ID, OWNER_ID, SPECIALTY_ID, INAUGURATION_DATE)
CITY (CITY_ID, NAME)
OWNER (OWNER_ID, NAME)
SPECIALTY (SPECIALTY_ID, NAME)
I have a PrimeFaces datatable where I show all branches with pagination of 50 rows per page (LIMIT X, 50). Today BRANCH has around 10000 rows. I join BRANCH with the other 3 tables because I want to show their names.
I want to fetch the results with the following default sort:
ORDER BY INAUGURATION_DATE ASC, C.NAME ASC, O.NAME ASC, S.NAME ASC
Now, the user can choose to click the header of any of these columns in my datatable, and I will query the database again, making the sort he asked for the highest-priority one. For instance, if he chose to order first by specialty name, descending, I'll do:
ORDER BY S.NAME DESC, INAUGURATION_DATE ASC, C.NAME ASC, O.NAME ASC
Now my question: how can I query the database with this dynamic sort, always using the 4 columns, efficiently? Many users can be viewing this datatable on my site at the same time (around 1000), so using ORDER BY in the SQL is very slow. I'm doing the ordering in Java instead, but then I cannot do the pagination correctly. How can I do this efficiently in SQL? Is creating indexes on these columns enough?
Thanks
10000 rows is quite small, so MySQL should be able to handle that very fast. Assuming you have proper indexes on the CITY, OWNER and SPECIALTY tables (which will be the case if you declare primary keys), this query should return quickly. Also be sure to use LIMIT 50 in your query.
However, if the number of rows becomes large (like a million or much more; just time the query to find out where it begins to slow down), then individual indexes on CITY_ID, OWNER_ID, SPECIALTY_ID or INAUGURATION_DATE will not help. To take advantage of an index for the sort, assuming you are just doing a join and there are no WHERE clauses, the index needs to cover all the sort columns in the order you wish to sort. So you would need quite a few indexes to cover all the cases.
If performance becomes an issue, you may want to consider whether the application really needs all those options. Perhaps you could let the user change the sort on just one column at a time; in that case individual indexes will help (see the sketch below). Also, when the number of rows gets large, the performance bottleneck may not be the sorting but rather how you are performing the pagination. I like the approach in https://stackoverflow.com/a/19609938/4350148.
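A sketch of the single-column case (the index name is a placeholder): if the user re-sorts by just the inauguration date, an index on that column lets MySQL read BRANCH rows in the requested order and skip the filesort, assuming it drives the join from BRANCH:

CREATE INDEX ix_branch_inauguration ON BRANCH (INAUGURATION_DATE);

SELECT B.BRANCH_ID, B.INAUGURATION_DATE,
       C.NAME AS CITY_NAME, O.NAME AS OWNER_NAME, S.NAME AS SPECIALTY_NAME
FROM BRANCH B
JOIN CITY C ON C.CITY_ID = B.CITY_ID
JOIN OWNER O ON O.OWNER_ID = B.OWNER_ID
JOIN SPECIALTY S ON S.SPECIALTY_ID = B.SPECIALTY_ID
ORDER BY B.INAUGURATION_DATE ASC
LIMIT 0, 50;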
One last point: MySQL caches query results by default, so if the tables are not changing, the queries should return without even having to redo the sorting.
I am running this query to search the database:
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person
RIGHT JOIN
`order` ON person.oID = `order`.ref
WHERE
LOWER(firstname) LIKE LOWER ('%{$search}%') OR
LOWER(lastname) LIKE LOWER ('%{$search}%') OR
LOWER(email) LIKE LOWER ('%{$search}%') OR
LOWER(telephone) LIKE LOWER ('%{$search}%') OR
LOWER(ref) LIKE LOWER ('%{$search}%');
It's doing a lot of processing, but how can I get these results faster? The page takes about 6-7 seconds to load, and if I run the query in phpMyAdmin it takes 3-4 seconds. It's not a huge database, 3000 entries or so. I have added an index to the ref, email, firstname and lastname columns, but that doesn't seem to have made any difference. Can anyone help?
The reason this query is slow is that you've combined two convenient but slow features of MySQL in the slowest possible way.
FUNCTION(column) LIKE %matchstring% requires a scan of the table; no ordered index can help satisfy this search because it's unanchored.
condition OR condition OR condition requires the table to be rescanned once per OR clause.
You also happen to be ignoring the fact that MySQL's searches are already case-insensitive if you have set up your column collations correctly.
Finally, it's not clear what you're doing with the RIGHT JOINed table data. Which columns of your result set come from that table? If you don't need data from that table get rid of it.
So, in summary, what you have is slow x many.
So, how can you fix this? The most important thing is for you to get rid of as many of these unanchored scans as possible. If you can change them to
email LIKE '{$search}%'
so the LOWER() functions and leading %s in the LIKE terms can be eliminated, you will have a big win.
If this sort of cast-a-wide-net search feature is critical to your application, you should consider using MySQL fulltext searching.
Or you could consider creating a new column in your table that's the concatenation of all the columns you presently search, so you can search it just once.
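A sketch of that last idea, assuming MySQL 5.7+ generated columns (the column name search_blob is made up):

-- One stored column concatenating the searched person columns, so a single
-- LIKE (and no LOWER() calls) replaces the ORed conditions on those columns.
ALTER TABLE person
  ADD COLUMN search_blob TEXT
    GENERATED ALWAYS AS (
      CONCAT_WS(' ', firstname, lastname, email, telephone)
    ) STORED;

SELECT firstname, lastname, email, telephone
FROM person
WHERE search_blob LIKE '%{$search}%';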
Edit to explain LIKE slowness
If the column haystack is indexed, the search haystack LIKE 'needle%' runs quite quickly. That's because the BTREE style index is inherently ordered. To search this way, MySQL can random-access the first possible match, and then scan sequentially to the last possible match.
But the search haystack LIKE '%needle%' can't use random access to find the first possible match in the index. The first possible match could be anywhere. So it has to scan all the values of the haystack one by one for the needle.
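You can see the difference with EXPLAIN, assuming an index exists on email:

-- Anchored: a range scan over the BTREE index on email is possible.
EXPLAIN SELECT email FROM person WHERE email LIKE 'abc%';

-- Unanchored: the first possible match could be anywhere, so every value has
-- to be examined.
EXPLAIN SELECT email FROM person WHERE email LIKE '%abc%';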
I would suggest that you change the RIGHT JOIN to an INNER JOIN. The fields you are filtering on come from the person table anyway, so the WHERE clause effectively turns the query into an inner join.
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person INNER JOIN
`order`
ON person.oID = `order`.ref
WHERE
LOWER(firstname) LIKE LOWER ('%{$search}%') OR
LOWER(lastname) LIKE LOWER ('%{$search}%') OR
LOWER(email) LIKE LOWER ('%{$search}%') OR
LOWER(telephone) LIKE LOWER ('%{$search}%') OR
LOWER(ref) LIKE LOWER ('%{$search}%');
Second, create an index on order(ref). This should greatly reduce the work needed for the join. The syntax is:
create index order_ref on `order`(ref);
By the way, order is a bad name for a table, because it is a SQL reserved word. I would suggest orders instead.
Why don't you use full-text search instead of a bunch of ORs and LOWER() calls?
SELECT
IFNULL(firstname, '') AS firstname,
IFNULL(lastname, '') AS lastname,
IFNULL(age, ' ') AS age,
email,
telephone,
comments,
ref
FROM person
RIGHT JOIN
`order` ON person.oID = `order`.ref
WHERE
MATCH (firstname, lastname, email, ref)
AGAINST ('$search' IN BOOLEAN MODE)
For MATCH ... AGAINST to work at all (and to run fast), the listed columns need a FULLTEXT index:
ALTER TABLE person ADD FULLTEXT(firstname, lastname, email, ref);
With the following query:
SELECT * FROM people WHERE name REGEXP 'bob|robert'
Am I right in assuming that MySQL will scan each row in a fairly arbitrary order, looking for either 'bob' or 'robert' (rather than scanning for bob first and then doing another scan for robert)?
If so, is there any way to get MySQL to match the entire table against 'bob' first and then 'robert', without performing two separate queries?
SELECT * FROM people WHERE name REGEXP 'bob|robert' ORDER BY name DESC
It is only one query, and it does the job.
The DBMS can scan the data however it pleases; the scan order is not specified by SQL, and it is not random.
Unspecified order may look random, but it isn't random.
There is no logical way to match the entire table against bob first (why would you want to?)
You can order the results, though it can be slow if the table has high cardinality and/or name is not indexed.
SELECT * FROM people WHERE name = 'bob' OR name = 'robert'
ORDER BY name = 'bob' DESC
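If you want to keep the original REGEXP filter (so that names merely containing 'bob' or 'robert' still match), the same trick works, because a REGEXP comparison returns 1 or 0:

-- Names matching 'bob' sort first, then the remaining 'robert' matches.
SELECT * FROM people WHERE name REGEXP 'bob|robert'
ORDER BY (name REGEXP 'bob') DESC, name;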