mysql regex() alternation match order? - mysql

With the following query:
SELECT * FROM people WHERE name REGEXP 'bob|robert'
Am I right in assuming that mysql will scan each row in a fairly random order looking for either 'bob' or 'robert' (rather than bob first, then another scan for robert)?
If so, is there any way to get mysql to attempt to match the entire table against 'bob' first and then 'robert' without performing two separate queries?

SELECT * FROM people WHERE name REGEXP 'bob|robert' ORDER BY name DESC
It is only one query, and it does the job.
The DBMS may scan the data in whatever order it pleases; SQL does not specify the order, but that does not make it random.
Unspecified could in principle be random, but in practice it isn't.

There is no logical way to match the entire table against bob first (why would you want to?).
You can order the results, though this can be slow if the table is large and/or name is not indexed.
SELECT * FROM people WHERE name = 'bob' OR name = 'robert'
ORDER BY name = 'bob' DESC
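The ORDER BY name = 'bob' DESC trick works because the comparison evaluates to 1 for matching rows and 0 otherwise, so sorting it in descending order puts the matches first. A minimal sketch of that behaviour follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable), and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("robert",), ("bob",), ("alice",), ("robert",), ("bob",)],
)

# The comparison yields 1 (true) or 0 (false); sorting it DESC lists 'bob' rows first.
rows = conn.execute(
    "SELECT name FROM people "
    "WHERE name = 'bob' OR name = 'robert' "
    "ORDER BY name = 'bob' DESC"
).fetchall()
names = [row[0] for row in rows]
print(names)
```

The sort still has to evaluate the expression for every row that passes the WHERE clause, which is why it is only cheap when the WHERE clause has already narrowed the set.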

Related

How to get the closest matches first from MySQL

I have a table of 10 million records and am trying to select user details like firstname, lastname and country. I need to get results back in order, where (order by column="abc") would give me results in which the rows that match are ranked at the top.
what I have tried
Query one
-- this is much slower, at 45+ seconds
select firstname, lastname, town
from user_db
order by town="abc" DESC
limit 25;
Query two
-- much faster with 0.00019 seconds
select firstname, lastname, town
from user_db
order by town DESC
limit 25;
The problem
The first query also works but takes 45+ seconds, while if I remove the equals expression in the (order by clause), as in the second query, it's much faster. Obviously I do use where clauses, but this is a simplified example.
other notes
There are currently no joins in the query, as it is just a simple select statement of user details, and my setup is pretty good with 30GB RAM and 2TB of storage, all local.
Indexes: All columns mentioned have indexes, but the (order by town="abc") clause triggers a full table scan and as a result this ends up finishing in 2 minutes.
Is there a way to get results ranked by closest matches first faster within a single query?
Any help will gladly be appreciated. Thank you.
It looks to me like your user_db table has an index on your town column. That means ORDER BY town DESC LIMIT 25 can be satisfied very cheaply: MySQL descends the index to its last entry (a B-tree lookup, effectively constant time in practice) and then scans 25 entries of the index.
But your ORDER BY town='abc' DESC LIMIT 25 has to look at, and sort, every single row in the table. MySQL doesn't use an index to help compute that town='abc' condition when it appears in an ORDER BY clause.
Many people with requirements like yours use FULLTEXT searching and ordering by the MATCH() function. That gives a useful ordering for a person looking at the closest matches, as in the location bar of a web browser. But don't expect Google-like match accuracy from MySQL.
You can split the query into two queries, each of which is very fast.
First, create an index on town.
create index ix1 on user_db (town);
Then get the matches, with a limit of 25 rows:
select * from user_db where town = 'abc' limit 25
The query above may return any number of rows between 0 and 25: let's call this number R. Then get the non-matches, with the application computing 25 - R and passing the result as the limit (MySQL's LIMIT does not accept an expression):
select * from user_db where town <> 'abc' limit 25 - R
Combine both result sets and the problem is solved. Even if the second query results in a table scan, it will stop early, which keeps its cost low.
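The two-query approach above can be sketched in application code, which is also where the 25 - R arithmetic has to happen. This is a minimal illustration, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection, and the function name matches_first and the sample rows are hypothetical.

```python
import sqlite3

def matches_first(conn, town, limit=25):
    """Return up to `limit` rows, with rows whose town equals `town` first."""
    matched = conn.execute(
        "SELECT firstname, lastname, town FROM user_db WHERE town = ? LIMIT ?",
        (town, limit),
    ).fetchall()
    remaining = limit - len(matched)  # this is "25 - R" from the text
    if remaining == 0:
        return matched
    # Note: town <> ? never matches rows whose town is NULL.
    others = conn.execute(
        "SELECT firstname, lastname, town FROM user_db WHERE town <> ? LIMIT ?",
        (town, remaining),
    ).fetchall()
    return matched + others

# SQLite stand-in for MySQL, with invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_db (firstname TEXT, lastname TEXT, town TEXT)")
conn.execute("CREATE INDEX ix1 ON user_db (town)")
conn.executemany(
    "INSERT INTO user_db VALUES (?, ?, ?)",
    [("A", "One", "xyz"), ("B", "Two", "abc"), ("C", "Three", "def"), ("D", "Four", "abc")],
)
result = matches_first(conn, "abc", limit=3)
print(result)
```

Skipping the second query when the first one already fills the limit saves a round trip in the common case where there are plenty of matches.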
One way is to add a new column that has the value of town="abc", then sort by this column.
I'm rebuilding my workspace right now so I cannot try it properly, but something like:
select firstname, lastname, town, town="abc" as sortme
from user_db
order by sortme desc, town, lastname, firstname
limit 25;
As long as it is unclear what you mean by "closest match", it is difficult to answer your question. Are "abd", "bc" etc. regarded as a close match to "abc"? Should the word "abc" appear in the town, and should it match "abcville"?
There are a number of options.
Appearance of search string
Using a like "%abc%" where clause will find all towns with the string "abc" appearing in it.
select firstname, lastname
from user_db
where town like "%abc%"
order by town
Leave out the first % (that is, use "abc%") if you want to find towns starting with "abc". The advantage is that a prefix pattern can probably use the index on town if there is one, whereas a pattern with a leading % cannot. There is no ranking, but you could add a sort.
Use a fulltext index
Create a FULLTEXT index on town:
ALTER TABLE user_db
ADD FULLTEXT(town);
And use this with a match:
SELECT
MATCH(town) AGAINST('abc') AS Relevance,
firstname, lastname
FROM user_db
WHERE MATCH(town) AGAINST('abc')
ORDER BY Relevance DESC
LIMIT 15
MATCH works on whole words, so in this case the string "abc" must appear as a separate word in the town in order to match. The NATURAL LANGUAGE mode works well for plain text but might not do so for town names.
To be honest, I have no experience with FULLTEXT and MATCH performance, but it is probably well optimized and works fairly well on large tables.
Create additional fields
As storage is cheap and time is not, you might want to consider adding additional fields with search strings or alternative spellings for 'town', creating all the indexes you'll need, and using those as a search source. As this needs analysis of your use case, it is difficult to provide a solution.

MySQL query takes long to execute if using ORDER BY String Column

My query on a table that contains 4 million records executes instantly if I don't use ORDER BY. However, I want to give my clients a way to sort results by the Name field and only show the last 100 of the filtered result. As soon as I add ORDER BY Name, it takes 100 seconds to execute.
My table structure is similar to this:
CREATE TABLE Test(
ID INT PRIMARY KEY AUTO_INCREMENT,
Name VARCHAR(100),
StatusID INT,
KEY (StatusID),        -- index on StatusID
KEY (StatusID, Name),  -- index on StatusID, Name
KEY (Name)             -- index on Name
);
My query simply does something like:
explain SELECT ID, StatusID, Name
FROM Test
WHERE StatusID = 113
ORDER BY Name DESC
LIMIT 0, 100
Above explain when I order by Name gives this result:
StatusID_2 is the composite index on StatusID, Name
Now If I change ORDER BY Name DESC to ORDER BY ID I get this:
How can I make it so that it also examines only 100 rows when using ORDER BY Name?
One thing you can try is to restrict the query to the letters that the expected 100 rows would start with, like
SELECT *
FROM Test
*** Some Joins to filter data or get more columns from other tables
WHERE StatusID = 12 AND NAME REGEXP '^[A-H]'
ORDER BY Name DESC
LIMIT 0, 100
Moreover, having an index on Name is very important (it is already in place): in this case an index range scan is started and query execution stops as soon as the required number of rows has been generated.
We can't use ID for this, as the scan won't stop once it has reached the limit; the only thing we can try is to exclude letters that cannot occur in the expected result, which is what the REGEXP does.
It's hard to tell without the joins and the EXPLAIN result, but apparently you're not making use of the index.
It might be because of the joins or because you have another key in the where clause. I'd recommend reading this, it covers all possible cases: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
Increasing the sort_buffer_size and/or read_rnd_buffer_size might help...
You need a composite key based on the filtering WHERE criteria PLUS the order by... create an index on
( StatusID, Name )
This way, the WHERE jumps right to your StatusID = 12 records and ignores the rest of the 4 million... THEN uses the name as a secondary to qualify the ORDER BY.
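The effect of a composite (StatusID, Name) index on this kind of query can be observed with a query planner. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, so the planner output differs from MySQL's, and the index name ix_status_name and the sample data are invented.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption); the schema mirrors the Test table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (ID INTEGER PRIMARY KEY, Name TEXT, StatusID INTEGER)")
conn.executemany(
    "INSERT INTO Test (Name, StatusID) VALUES (?, ?)",
    [(f"name{i:04d}", i % 7) for i in range(1000)],
)

QUERY = (
    "SELECT ID, StatusID, Name FROM Test "
    "WHERE StatusID = 3 ORDER BY Name DESC LIMIT 100"
)

def plan(conn):
    """Return the planner's description of QUERY as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)

plan_without = plan(conn)  # no usable index: full scan plus a separate sort step
conn.execute("CREATE INDEX ix_status_name ON Test (StatusID, Name)")
plan_with = plan(conn)     # composite index: filter and ordering come from the index

print(plan_without)
print(plan_with)
```

In MySQL the equivalent check is to run EXPLAIN on the query and look for "Using filesort" in the Extra column, which should disappear once the composite index is used for both the filter and the ordering.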
Without seeing the other tables / join criteria and associated indexes, you might also want to try adding MySQL keyword
SELECT STRAIGHT_JOIN ... rest of query
So it does the query in the order you have selected, but unsure of impact without seeing other joins as noted previously.
ADDITION (per feedback)
I would remove the single-column index on StatusID so the engine doesn't have to guess which one to use. The composite index can serve StatusID-only queries regardless of the name, so you don't need to have both.
Also, remove the Name-only index UNLESS you will ever be querying PRIMARILY on the name as a WHERE qualifier without the StatusID qualifier... Also, how many total records are even possible for the example IDs you are querying out of the 4 million? You MIGHT want to pull the full set for the id as a sub-query, get a few thousand rows and have THAT ordered by name, which would be quick... something like...
select *
from ( SELECT
ID,
StatusID,
Name
FROM
Test
WHERE
StatusID = 113 ) PreQuery
ORDER BY
Name DESC
LIMIT 0, 100

is there any alternative for this SELECT * FROM type WHERE tid='1' OR tid='2'; query?

Are there any alternatives to this MySQL query?
SELECT * FROM type WHERE tid='1' OR tid='2';
Here type is the table and tid is the id of the table, and I want to select the first 2 rows.
In total there are 3 rows.
The above query works, but at certain times the server displays nothing; everyone suggests it is because the server is getting confused by the query.
Any alternatives, please?
The IDs are not guaranteed to be gapless. They most likely aren't.
What you need is the keyword LIMIT.
As Garr Godfrey already answered, you sort the table by the id (ascending by default) and then limit the results to at most 2:
SELECT tid, foo, bar FROM type ORDER BY tid LIMIT 2
You should have a look at the basic SQL keywords. They already do most of what you usually need.
If those IDs don't exist you might have a problem, but there is nothing confusing about that query. Try
SELECT * FROM `type` ORDER BY tid LIMIT 2
That will select the rows with the lowest 2 ids.

Fastest random selection WHERE column X is Y (NULL)

Currently I am using:
SELECT *
FROM
table AS t1
JOIN (
SELECT (RAND() * (SELECT MAX(id) FROM table where column_x is null)) AS id
) AS t2
WHERE
t1.id >= t2.id
and column_x is null
ORDER BY t1.id ASC
LIMIT 1
This is normally extremely fast however when I include the highlighted column_x being Y (null) condition, it gets slow.
What would be the fastest random querying solution where the records' column X is null?
ID is PK, column X is int(4). Table contains about a million records and over 1 GB in total size doubling itself every 24 hours currently.
column_x is indexed.
Column ID may not be consecutive.
The DB engine used in this case is InnoDB.
Thank you.
Getting a genuinely random record can be slow. There's not really much getting around this fact; if you want it to be truly random, then the query has to load all the relevant data in order to know which records it has to choose from.
Fortunately however, there are quicker ways of doing it. They're not properly random, but if you're happy to trade a bit of pure randomness for speed, then they should be good enough for most purposes.
With that in mind, the fastest way to get a "random" record is to add an extra column to your DB, which is populated with a random value. Perhaps a salted MD5 hash of the primary key? Whatever. Add appropriate indexes on this column, and then simply add the column to your ORDER BY clause in the query, and you'll get your records back in a random order.
To get a single random record, simply specify LIMIT 1 and add a WHERE random_field > $random_value where random value would be a value in the range of your new field (say an MD5 hash of a random number, for example).
Of course the down side here is that although your records will be in a random order, they'll be stuck in the same random order. I did say it was trading perfection for query speed. You can get around this by updating them periodically with fresh values, but I guess that could be a problem for you if you need to keep it fresh.
The other down-side is that adding an extra column might be too much to ask if you have storage constraints and your DB is already massive in size, or if you have a strict DBA to get past before you can add columns. But again, you have to trade off something; if you want the query speed, you need this extra column.
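The precomputed-random-column idea described above can be sketched as follows. Several details are assumptions made for illustration and are not from the answer: the column is named rand_key and holds a random float in [0, 1) rather than an MD5 hash, the table is called items, a composite index on (column_x, rand_key) is used, and Python's built-in sqlite3 module stands in for MySQL.

```python
import random
import sqlite3

# SQLite stand-in for MySQL (assumption), with a hypothetical rand_key column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER, rand_key REAL)"
)
rng = random.Random(0)
conn.executemany(
    "INSERT INTO items (column_x, rand_key) VALUES (?, ?)",
    [(None if i % 3 == 0 else i, rng.random()) for i in range(300)],
)
# The index covers both the filter and the random ordering.
conn.execute("CREATE INDEX ix_x_rand ON items (column_x, rand_key)")

def pick_random_null_row(conn, rng):
    """Pick one row with column_x IS NULL using the precomputed rand_key."""
    threshold = rng.random()
    row = conn.execute(
        "SELECT id, column_x, rand_key FROM items "
        "WHERE column_x IS NULL AND rand_key >= ? "
        "ORDER BY rand_key LIMIT 1",
        (threshold,),
    ).fetchone()
    if row is None:
        # Threshold was above every stored key: wrap around to the smallest key.
        row = conn.execute(
            "SELECT id, column_x, rand_key FROM items "
            "WHERE column_x IS NULL ORDER BY rand_key LIMIT 1"
        ).fetchone()
    return row

picked = pick_random_null_row(conn, rng)
print(picked)
```

Rows whose rand_key follows a large gap are picked more often than others, which is the "not properly random" trade-off the answer describes.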
Anyway, I hope that helped.
I don't think you need a join, nor an order by, nor a limit 1 (providing the ids are unique).
SELECT *
FROM myTable
WHERE column_x IS NULL
AND id = ROUND(RAND() * (SELECT MAX(Id) FROM myTable), 0)
Have you run EXPLAIN on the query? What was the output?
Why not store or cache the value of : SELECT MAX(id) FROM table where column_x is null and use that as a variable. your query would then become:
$rand = rand(0, $storedOrCachedMaxId);
SELECT *
FROM
table AS t1
WHERE
t1.id >= $rand
and column_x is null
ORDER BY t1.id ASC
LIMIT 1
A simpler query will likely be easier on the db.
Be aware that if your ids contain sizable holes, you aren't going to get uniformly random results with these kinds of queries.
I'm new to MySQL syntax, but digging a little further I think a dynamic query might work. We select the Nth row, where the Nth is random:
SELECT @r := CAST(FLOOR(COUNT(1) * RAND()) AS UNSIGNED) FROM table WHERE column_x is null;
PREPARE stmt FROM
'SELECT *
FROM table
WHERE column_x is null
LIMIT 1 OFFSET ?';
EXECUTE stmt USING @r;
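The same "pick the Nth matching row" idea can also be driven from application code, which avoids the prepared statement. The sketch below is an illustration under assumptions: Python's built-in sqlite3 module stands in for MySQL, and the table name items, the function name random_row_by_offset, and the sample data are hypothetical.

```python
import random
import sqlite3

def random_row_by_offset(conn, rng=random):
    """Return a uniformly chosen row with column_x IS NULL, or None if there is none."""
    count = conn.execute(
        "SELECT COUNT(*) FROM items WHERE column_x IS NULL"
    ).fetchone()[0]
    if count == 0:
        return None
    offset = rng.randrange(count)  # uniform integer in [0, count - 1]
    return conn.execute(
        "SELECT id, column_x FROM items WHERE column_x IS NULL "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

# SQLite stand-in for MySQL, with invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER)")
conn.executemany(
    "INSERT INTO items (column_x) VALUES (?)",
    [(None if i % 4 == 0 else i,) for i in range(40)],
)
chosen = random_row_by_offset(conn, random.Random(1))
print(chosen)
```

Unlike the id >= random approach, gaps in the ids do not bias the choice, but the database still has to step over all the skipped rows, so the cost grows with the offset.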

optimize SELECT query, knowing that we are dealing with a limited range

I am trying to include a limitation in a MySQL SELECT query.
My database is structured in such a way that if a record is found in column one, then at most 5000 records with the same name can be found after that one.
Example:
mark
..mark repeated 5000 times
john
anna
..other millions of names
So in this table it would be more efficient to find the first Mark and then continue searching at most 5000 rows down from that one.
Is it possible to do something like this?
Just make a btree index on the name column:
CREATE INDEX name ON your_table(name) USING BTREE
and mysql will silently do exactly what you want each time it looks for a name.
Try with:
SELECT name
FROM table
ORDER BY (name = 'mark') DESC
LIMIT 5000
Basically, you sort mark first, then the rest follow and get limited.
It's actually quite difficult to understand your desired output, but I think this might be heading in the right direction:
(SELECT name
FROM table
WHERE name = 'mark'
LIMIT 5000)
UNION ALL
(SELECT name
FROM table
WHERE name != 'mark'
ORDER BY name)
This first gets up to 5000 records with the name mark and then gets the remainder; you can add a LIMIT to the second query if required.
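One detail worth checking when combining two result sets this way is the difference between UNION and UNION ALL: when only name is selected, plain UNION removes duplicate rows, so thousands of identical mark rows would collapse into one, while UNION ALL keeps them all. A small sketch of that difference follows, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the table name names and its contents are invented, and the first SELECT is wrapped in a subquery because SQLite does not accept MySQL's parenthesized form.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption), with invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?)",
    [("mark",)] * 3 + [("john",), ("anna",)],
)

def run(set_operator):
    """Combine the 'mark' rows with all other rows using the given set operator."""
    sql = (
        "SELECT name FROM (SELECT name FROM names WHERE name = 'mark' LIMIT 5000) "
        f"{set_operator} "
        "SELECT name FROM names WHERE name != 'mark'"
    )
    return [row[0] for row in conn.execute(sql).fetchall()]

deduplicated = run("UNION")      # duplicates removed: one 'mark' survives
everything = run("UNION ALL")    # all rows kept, including repeated 'mark'

print(sorted(deduplicated))
print(everything.count("mark"))
```

Neither operator guarantees the order of the combined result, so an outer ORDER BY is still needed if the mark rows must come first.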
For performance you should ensure that the columns used by ORDER BY and WHERE are indexed accordingly
If you make sure that the column is properly indexed, MySQL will take care of the optimisation for you.
Edit:
Thinking about it, I figured that this answer is only useful if I specify how to do that. user nobody beat me to the punch: CREATE INDEX name ON your_table(name) USING BTREE
This is exactly what database indexes are designed to do; this is what they are for. MySQL will use the index itself to optimise the search.