MySQL: Optimized query to find matching strings from set of strings - mysql

I am having 10 sets of strings each set having 9 strings. Of this 10 sets, all strings in first set have length 10, those in second set have length 9 and so on. Finally, all strings in 10th set have length 1.
There is common prefix of (length-2) characters in each set. And the prefix length reduces by 1 in next set. Thus, first set has 8 characters in common, second has 7 and so on.
Here is what a sample of 10 sets look like:
pu3q0k0vwn
pu3q0k0vwp
pu3q0k0vwr
pu3q0k0vwq
pu3q0k0vwm
pu3q0k0vwj
pu3q0k0vtv
pu3q0k0vty
pu3q0k0vtz
pu3q0k0vw
pu3q0k0vy
pu3q0k0vz
pu3q0k0vx
pu3q0k0vr
pu3q0k0vq
pu3q0k0vm
pu3q0k0vt
pu3q0k0vv
pu3q0k0v
pu3q0k0y
pu3q0k1n
pu3q0k1j
pu3q0k1h
pu3q0k0u
pu3q0k0s
pu3q0k0t
pu3q0k0w
pu3q0k0
pu3q0k2
pu3q0k3
pu3q0k1
pu3q07c
pu3q07b
pu3q05z
pu3q0hp
pu3q0hr
pu3q0k
pu3q0m
pu3q0t
pu3q0s
pu3q0e
pu3q07
pu3q05
pu3q0h
pu3q0j
pu3q0
pu3q2
pu3q3
pu3q1
pu3mc
pu3mb
pu3jz
pu3np
pu3nr
pu3q
pu3r
pu3x
pu3w
pu3t
pu3m
pu3j
pu3n
pu3p
pu3
pu9
pud
pu6
pu4
pu1
pu0
pu2
pu8
pu
pv
0j
0h
05
pg
pe
ps
pt
p
r
2
0
b
z
y
n
q
Requirement:
I have a table PROFILES having columns SRNO (type bigint, primary key) and UNIQUESTRING (type char(10), unique key). I want to find 450 SRNOs for matching UNIQUESTRINGs from those 10 sets.
First find strings like in the first set. If we don't get enough results (ie. 450), find strings like in second set. If we still don't get enough results (450 minus results of first set) find strings like in third set. And so on.
Existing Solution:
I've written query something like:
select srno from profiles
where ( (uniquestring like 'pu3q0k0vwn%')
or (uniquestring like 'pu3q0k0vwp%') -- all those above uniquestrings after this and finally the last one
or (uniquestring like 'n%')
or (uniquestring like 'q%')
)
limit 450
However, after getting feedback from Rick James in this answer I realized this is not optimized query as it touches lot many rows than it needs.
So I plan to rewrite the query like this:
(select srno from profiles where uniquestring like 'pu3q0k0vwn%' LIMIT 450)
UNION DISTINCT
(select srno from profiles where uniquestring like 'pu3q0k0vwp%' LIMIT 450); -- and more such clauses after this for each uniquestring
I like to know if there are any better solutions to do this.

SELECT ...
WHERE str LIKE 'pu3q0k0vw%' AND -- the 10-char set
str REGEXP '^pu3q0k0vw[nprqmj]' -- the 9 next letters
LIMIT ...
# then check for 450; if not enough, continue...
SELECT ...
WHERE str LIKE 'pu3q0k0vt%' AND -- the 10-char set
str REGEXP '^pu3q0k0vt[vyz]' -- the 9 next letters
LIMIT 450
# then check for 450; if not enough, continue...
etc.
SELECT ...
WHERE str LIKE 'pu3q0k0v%' AND -- the 9-char set
str REGEXP '^pu3q0k0v[wyzxrqmtv]' -- the 9 next letters
LIMIT ...
# check, etc; for a total of 10 SELECTs or 450 rows, whichever comes first.
This will be 10+ selects. Each select will be somewhat optimized by first picking rows with a common prefix with LIKE, then it double checks with a REGEXP.
(If you don't like splitting the inconsistent pu3q0k0vw vs. pu3q0k0vt; we can discuss things further.)
You say "prefix"; I have coded the LIKE and REGEXP to assume arbitrary text after the prefix given.
UNION is not viable, since it will (I think) gather all the rows before picking 450. Each SELECT will stop at the LIMIT if there is no DISTINCT GROUP BY or ORDER BY that require gathering everything first.
REGEXP is not smart enough to avoid scanning the entire table; adding the LIKE avoids such (except when more than, say, 20% of the rows match the LIKE).

Related

How do I Query for used BETWEEN Operater for text searches in MySql database?

I have a SQL Table in that i use BETWEEN Operater.
The BETWEEN Operater selects values within range. The values can be numbers, text , dates.
stu_id name city pin
1 Raj Ranchi 123456
2 sonu Delhi 652345
3 ANU KOLKATA 879845
4 K.K's Company Delhi 345546
5 J.K's Company Delhi 123456
I have a query like this:-
SELECT * FROM student WHERE stu_id BETWEEN 2 AND 4 //including 2 & 4
SELECT * FROM `student` WHERE name between 'A' and 'K' //including A & not K
Here My Question is why not including K.
but I want K also in searches.
Don't use between -- until you really understand it. That is just general advice. BETWEEN is inclusive, so your second query is equivalent to:
WHERE name >= 'A' AND
name <= 'K'
Because of the equality, 'K' is included in the result set. However, names longer than one character and starting with 'K' are not -- "Ka" for instance.
Instead, be explicit:
WHERE name >= 'A' AND
name < 'L'
Of course, BETWEEN can be useful. However, it is useful for discrete values, such as integers. It is a bit dangerous with numbers with decimals, strings, and date/time values. That is why I encourage you to express the logic as inequalities.
In supplement to gordon's answer, one way to get what you're expecting is to turn your name into a discrete set of values:
SELECT * FROM `student` WHERE LEFT(name, 1) between 'A' and 'K'
You need to appreciate that K.K's Company is alphabetically AFTER the letter K on its own so it is not BETWEEN, in the same way that 4.1 is not BETWEEN 2 and 4
By stripping it down to just a single character from the start of the string it will work like you expect, but take cautionary note, you should always avoid running functions on values in tables, because if you had a million names, thats a million strings that mysql has to strip out to just the first letter and it might no longer be able to use an index on name, battering the performance.
Instead, you could :
SELECT * FROM `student` WHERE name >= 'A' and name < 'L'
which is more likely to permit the use of an index as you aren't manipulating the stored values before comparing them
This works because it asks for everything up to but not including L.. Which includes all of your names starting with K, even kzzzzzzzz. Numerically it is equivalent to saying number >= 2 and number < 5 which gives you all the numbers starting with 2, 3 or 4 (like the 4.1 from before) but not the 5
Remember that BETWEEN is inclusive at both ends. Always revert to a pattern of a >= b and a < c, a >= c and a < d when you want to specify ranges that capture all possible values
Compare in lexicographical order, 'K.K's Company' > 'K'
We should convert the string to integer. You can try that mysql script with CAST and SUBSTRING. I've updated your script here. It will include the last record as well.
SELECT * FROM student WHERE name CAST(SUBSTRING(username FROM 1) AS UNSIGNED)
BETWEEN 'A' AND 'K';
The script will work. Hope it will helps to you.
Here I've attached my test sample.

How to Find First Valid Row in SQL Based on Difference of Column Values

I am trying to find a reliable query which returns the first instance of an acceptable insert range.
Research:
some of the below links adress similar questions, but I could get none of them to work for me.
Find first available date, given a date range in SQL
Find closest date in SQL Server
MySQL difference between two rows of a SELECT Statement
How to find a gap in range in SQL
and more...
Objective Query Function:
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
Where InsertRange(1) is the value the query should return. In other words, this would be the first instance where the above condition is satisfied.
Table Structure:
Primary Key: StartRange
StartRange(i-1) < StartRange(i)
StartRange(i-1) + EndRange(i-1) < StartRange(i)
Example Dataset
Below is an example User table (3 columns), with a set range distribution. StartRanges are always ordered in a strictly ascending way, UserID are arbitrary strings, only the sequences of StartRange and EndRange matters:
StartRange EndRange UserID
312 6896 user0
7134 16268 user1
16877 22451 user2
23137 25142 user3
25955 28272 user4
28313 35172 user5
35593 38007 user6
38319 38495 user7
38565 45200 user8
46136 48007 user9
My current Query
I am trying to use this query at the moment:
SELECT t2.StartRange, t2.EndRange
FROM user AS t1, user AS t2
WHERE (t1.StartRange - t2.StartRange+1) > NewValue
ORDER BY t1.EndRange
LIMIT 1
Example Case
Given the table, if NewValue = 800, then the returned answer should be 23137. This means, the first available slot would be between user3 and user4 (with an actual slot size = 813):
InsertRange(1) = (StartRange(i) - EndRange(i-1)) > NewValue
InsertRange = (StartRange(6) - EndRange(5)) > NewValue
23137 = 25955 - 25142 > 800
More Comments
My query above seemed to be working for the special case where StartRanges where tightly packed (i.e. StartRange(i) = StartRange(i-1) + EndRange(i-1) + 1). This no longer works with a less tightly packed set of StartRanges
Keep in mind that SQL tables have no implicit row order. It seems fair to order your table by StartRange value, though.
We can start to solve this by writing a query to obtain each row paired with the row preceding it. In MySQL, it's hard to do this beautifully because it lacks the row numbering function.
This works (http://sqlfiddle.com/#!9/4437c0/7/0). It may have nasty performance because it generates O(n^2) intermediate rows. There's no row for user0; it can't be paired with any preceding row because there is none.
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
Then, you can use that as a subquery, and apply your conditions, which are
gap >= 800
first matching row (lowest StartRange value) ORDER BY SB
just one LIMIT 1
Here's the query (http://sqlfiddle.com/#!9/4437c0/11/0)
SELECT SB-EA Gap,
EA+1 Beginning_of_gap, SB-1 Ending_of_gap,
UserId UserID_after_gap
FROM (
select MAX(a.StartRange) SA, MAX(a.EndRange) EA,
b.StartRange SB, b.EndRange EB , b.UserID
from user a
join user b ON a.EndRange <= b.StartRange
group by b.StartRange, b.EndRange, b.UserID
) pairs
WHERE SB-EA >= 800
ORDER BY SB
LIMIT 1
Notice that you may actually want the smallest matching gap instead of the first matching gap. That's called best fit, rather than first fit. To get that you use ORDER BY SB-EA instead.
Edit: There is another way to use MySQL to join adjacent rows, that doesn't have the O(n^2) performance issue. It involves employing user variables to simulate a row_number() function. The query involved is a hairball (that's a technical term). It's described in the third alternative of the answer to this question. How do I pair rows together in MYSQL?

MySQL match area code only when given the full number

I have a database that lists a few area codes, area code + office codes and some whole numbers and a action. I want it to return a result by the digits given but I am not sure how to accomplish it. I have some MySQL knowledge but its not very deep.
Here is a example:
match | action
_____________________
234 | goto 1
333743 | goto 2
8005551212| goto 3
234843 | goto 4
I need to query the database with a full 10 digit number -
query 8005551212 gives "goto 3"
query 2345551212 gives "goto 1"
query 3337431212 gives "goto 2"
query 2348431212 gives "goto 4"
This would be similar to the LIKE selection, but I need to match against the database value instead of the query value. Matching the full number is easy,
SELECT * FROM database WHERE `match` = 8005551212;
First the number to query will always be 10 digits, so I am not sure how to format the SELECT statement to differentiate the match of 234XXXXXXX and 234843XXXX, as I can only have one match return. Basically if it does not match the 10 digits, then it checks 6 digits, then it will check the 3 digits.
I hope this makes sense, I do not have any other way to format the number and it has to be accomplished with just a single SQL query and return over a ODCB connection in Asterisk.
Try this
SELECT match, action FROM mytable WHERE '8005551212' like concat(match,'%')
The issue is that you will get two rows in one case .. given your data..
SELECT action
FROM mytable
WHERE '8005551212' like concat(match,'%')
order by length(match) desc limit 1
That should get the row that had the most digits matched..
try this:
SELECT * FROM (
SELECT 3 AS score,r.* FROM mytable r WHERE match LIKE CONCAT(SUBSTRING('1234567890',1,3),'%')
UNION ALL
SELECT 6 AS score,r.* FROM mytable r WHERE match LIKE CONCAT(SUBSTRING('1234567890',1,6),'%')
UNION ALL
SELECT 10 AS score,r.* FROM mytable r WHERE match LIKE CONCAT(SUBSTRING('1234567890',1,10),'%')
) AS tmp
ORDER BY score DESC
LIMIT 1;
What ended up working -
SELECT `function`,`destination`
FROM reroute
WHERE `group` = '${ARG2}'
AND `name` = 0
AND '${ARG1}' LIKE concat(`match`,'%')
ORDER BY length(`match`) DESC LIMIT 1

Select data which have same letters

I'm having trouble with this SQL:
$sql = mysql_query("SELECT $menucompare ,
(COUNT($menucompare ) * 100 / (SELECT COUNT( $menucompare )
FROM data WHERE $ww = $button )) AS percentday FROM data WHERE $ww >0 ");
$menucompare is table fields names what ever field is selected and contains data bellow
$button is the week number selected (lets say week '6')
$ww table field name with row who have the number of week '6'
For example, I have data in $menucompare like that:
123456bool
521478bool
122555heel
147788itoo
and I want to select those, who have same word in the last of the data and make percentage.
The output should be like that:
bool -- 50% (2 entries)
heel -- 25% (1 entry)
itoo -- 25% (1 entry)
Any clearness to my SQL will be very appreciated.
I didn't find anything like that around.
Well, keeping data in such format probably not the best way, if possible, split the field into 2 separate ones.
First, you need to extract the string part from the end of the field.
if the length of the string / numeric parts is fixed, then it's quite easy;
if not, you should use regular expressions which, unfortunately, are not there by default with MySQL. There's a solution, check this question: How to do a regular expression replace in MySQL?
I'll assume, that numeric part is fixed:
SELECT s.str, CAST(count(s.str) AS decimal) / t.cnt * 100 AS pct
FROM (SELECT substr(entry, 7) AS str FROM data) AS s
JOIN (SELECT count(*) AS cnt FROM data) AS t ON 1=1
GROUP BY s.str, t.cnt;
If you'll have regexp_replace function, then substr(entry, 7) should be replaced to regexp_replace(entry, '^[0-9]*', '') to achieve the required result.
Variant with substr can be tested here.
When sorting out problems like this, I would do it in two steps:
Sort out the SQL independently of the presentation language (PHP?).
Sort out the parameterization of the query and the presentation of the results after you know you've got the correct query.
Since this question is tagged 'SQL', I'm only going to address the first question.
The first step is to unclutter the query:
SELECT menucompare,
(COUNT(menucompare) * 100 / (SELECT COUNT(menucompare) FROM data WHERE ww = 6))
AS percentday
FROM data
WHERE ww > 0;
This removes the $ signs from most of the variable bits, and substitutes 6 for the button value. That makes it a bit easier to understand.
Your desired output seems to need the last four characters of the string held in menucompare for grouping and counting purposes.
The data to be aggregated would be selected by:
SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
The divisor in the percentage is the count of such rows, but the sub-stringing isn't necessary to count them, so we can write:
SELECT COUNT(*) FROM Data WHERE ww = 6
This is exactly what you have anyway.
The divdend in the percentage will be the group count of each substring.
SELECT Last4, COUNT(Last4) * 100.0 / (SELECT COUNT(*) FROM Data WHERE ww = 6)
FROM (SELECT SUBSTR(MenuCompare, -4) AS Last4
FROM Data
WHERE ww = 6
) AS Week6
GROUP BY Last4
ORDER BY Last4;
When you've demonstrated that this works, you can re-parameterize the query and deal with the presentation of the results.

mysql sort varchar with numbers

How can I sort this data numerically rather than lexicographically?
100_10A
100_10B
100_10C
100_11A
100_11B
100_11C
100_12A
100_12B
100_12C
100_13A
100_13B
100_13C
100_14A
100_14B
100_14C
100_15A
100_15B
100_15C
100_16A
100_16B
100_16C
100_1A
100_1B
100_1C
100_2A
100_2B
100_2C
100_3A
100_3B
100_3C
100_4A
100_4B
100_4C
100_5A
100_5B
100_5C
100_6A
100_6B
100_6C
100_7A
100_7B
100_7C
100_8A
100_8B
100_8C
100_9A
100_9B
100_9C
select generalcolum from mytable order by blockid, plotid ASC
What I need out of this sort order is
100_1A
100_1B
100_1C...
...
...
100_10A
100_10B
100_10C
What I need to do in some way is have a zero added before the sort happens so that, I can get them in the order I want.
There are two colums, one that stores the 100 (number before the underscore) and one that stores the 1A the value after the underscore.
My sudo crap select
select thiscolum this table
order by blockid, plotid(+1 zero to prefix if len(plotid) < 2)
For example if the plot value is 1A, to do the best sorting, i need it to be looked at as 01A so that it comes before 10A.
order by length(blockid), blockid, length(plotid), plotid