I have a very long list of users. I want to group them by first letter of name. If the first letter is not a letter, it's grouped under #, so I have max 27 groups, for a-z + #.
I want to show a group label (e.g. F) only if it will have results, and for every letter I want to know how many results it will have. So I run a single GROUP BY query to count all groups:
SELECT
IF(lastname REGEXP '^[a-z]', UPPER(SUBSTRING(lastname, 1, 1)), '#') first_char,
COUNT(1) num_users
FROM users -- table name assumed; the original query omitted the FROM clause
GROUP BY first_char
That seems to work, but using REGEXP means that Ö isn't an O, but a #. That's a problem, because LIKE does treat 'Ö' as equal to 'O', so Ö names end up in the O results when I filter with LIKE 'O%'. I could use REGEXP in the results query too, but I'd rather file Ö under O.
So the LIKE query works perfectly, but the GROUP BY query doesn't. How do I do exactly what LIKE does during comparisons, so that the group counts and the results always match?
Or another way to count correctly?
edit 1
Using LIKE a OR LIKE b OR .. OR LIKE z in the IF doesn't even work, because then the group might be Ö instead of O. The numbers will be correct, but the group label won't be. I really need a conversion...
edit 2
Thanks to @mpen. lastname REGEXP '^[[:alpha:]]' is shorter than 26 LIKEs, but the Ö label problem remains. Converting that outside MySQL is easy though.
You can do the grouping like this:
select
IF(name REGEXP '^[[:alpha:]]', UPPER(SUBSTRING(name, 1, 1)), '#') first_char,
COUNT(1) num_users
from _grouptest
group by first_char
And then remove the accents in your scripting language of choice, or if you're brave, you can attempt to remove them in pure MySQL.
_.deburr in JS
Str::removeDiacritics from my PHP lib ptilz which was yoinked from WordPress
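As a rough illustration of the "remove the accents in your scripting language" step, here is a minimal Python sketch. It assumes the rows come back from the grouping query as (first_char, num_users) pairs; the helper names strip_accents and merge_groups are made up for this example. It only handles letters that Unicode can decompose into a base letter plus combining marks, so a letter such as Ø ends up under #, which may differ from what your MySQL collation treats as equal.

```python
import unicodedata
from collections import Counter


def strip_accents(text):
    """Drop combining marks after canonical decomposition (Ö -> O)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def merge_groups(rows):
    """Merge (first_char, num_users) rows so accented labels join their base letter."""
    totals = Counter()
    for first_char, num_users in rows:
        label = strip_accents(first_char).upper()[:1]
        # Anything that is still not A-Z after stripping goes to the '#' bucket.
        if not ("A" <= label <= "Z"):
            label = "#"
        totals[label] += num_users
    return dict(sorted(totals.items()))


# Hypothetical rows, shaped like the output of the grouping query above.
rows = [("O", 3), ("Ö", 2), ("#", 1), ("É", 4)]
print(merge_groups(rows))
```

The counts for O and Ö are added together, so the label shown and the number of results found by LIKE 'O%' should line up as long as the collation and this decomposition agree on the letter.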
I currently have the following code
SELECT Name
FROM Menu
WHERE Name LIKE 'S%'
ORDER BY LEFT(Name, 2)
Name must begin with S, and I need to sort the results alphabetically while ignoring the first two characters. I have no idea how to do this.
I think you want:
order by substr(name, 3)
This picks up everything from the third character onward.
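To see what "from the third character onward" means for the ordering, here is a small Python sketch of the same idea. SQL string positions are 1-based, so substr(name, 3) corresponds to the slice name[2:] in Python. The function name and the sample values are invented for illustration, and Python compares by code point rather than by a MySQL collation, so mixed-case data may order differently.

```python
def sort_ignoring_prefix(names, prefix="S", skip=2):
    """Keep names starting with `prefix`, then sort by everything after `skip` characters."""
    matching = [n for n in names if n.upper().startswith(prefix.upper())]
    # name[skip:] plays the role of substr(name, skip + 1) in SQL.
    return sorted(matching, key=lambda n: n[skip:])


# Hypothetical menu names: the first two characters are ignored when sorting.
print(sort_ignoring_prefix(["S2-pear", "Tea", "S1-zebra", "S9-apple"]))
```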
I am trying to find addresses from a MySQL database by outcode, ie the first letters of a UK postcode. The following snippet works fine for two letter outcodes:
select * from addresstable where LEFT (Postcode, 2) in ('CB','PE','IP')
but I need it to work in cases where the outcode may be only one letter, ie:
select * from addresstable where LEFT (Postcode, 2) in ('B','BS','GL')
and because LEFT(Postcode, 2) always returns two characters, the single-letter case can never match.
How best can I do this search?
Thanks
Martin
I'm not an expert on UK postcodes, but it seems they always start with either one letter or two, followed by at least one digit. If that is true, you could use PATINDEX to find the first number, and then use SUBSTRING to get the first characters up to that first number:
select *
from addresstable
where SUBSTRING(Postcode,1,PATINDEX('%[0-9]%',Postcode)-1) in ('B','BX','GL')
Apparently, PATINDEX isn't built into MySQL. There are some functions you can create that simulate the behavior if you want to go that route. Another option, a bit clunkier but workable, is to check whether the second character is numeric. If it is, use LEFT for one character; otherwise use LEFT for two characters:
select *
from addresstable
where (case when concat('',SUBSTRING(Postcode,2,1) * 1) = SUBSTRING(Postcode,2,1) then left(Postcode,1) else left(Postcode,2) end) in ('B','BX','GL')
You could use an OR clause for length 1
select *
from addresstable
where LEFT(Postcode, 2) in ('BS','GL')
OR LEFT(Postcode, 1) = 'B'
or
select *
from addresstable
where LEFT(Postcode, 2) = 'GL'
OR LEFT(Postcode, 1) = 'B'
Use regular expressions. To get postcodes that start with "B", "BS", or "GL":
where Postcode regexp '^(B|BS|GL)'
Of course, this example is a bit silly because postcodes that start with BS also start with B, so this can be simplified to:
where Postcode regexp '^(B|GL)'
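One thing to watch with a plain prefix match: '^B' also matches postcodes in areas such as BA or BB, not just the one-letter area B. If the assumption from the earlier answer holds (one or two letters followed by a digit), anchoring on that digit separates the cases, for example regexp '^(B|BS|GL)[0-9]' in MySQL. The Python sketch below shows the same idea outside the database; the function names and sample postcodes are only for illustration.

```python
import re

# One or two leading letters, immediately followed by a digit (assumed postcode shape).
AREA_PATTERN = re.compile(r"^([A-Z]{1,2})[0-9]")


def postcode_area(postcode):
    """Return the leading letters of a postcode, or None if it does not fit the pattern."""
    match = AREA_PATTERN.match(postcode.strip().upper())
    return match.group(1) if match else None


def filter_by_area(postcodes, areas):
    """Keep postcodes whose leading letters are exactly one of `areas`."""
    wanted = set(areas)
    return [p for p in postcodes if postcode_area(p) in wanted]


samples = ["B1 1AA", "BS1 4DJ", "GL50 1AA", "BA1 1AA", "CB2 1TN"]
print(filter_by_area(samples, ["B", "BS", "GL"]))
```

Here "BA1 1AA" is left out even though it starts with B, because its letter prefix is BA rather than B.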
From a MySQL table I would like to determine the most frequent starting letter; for example, if the list is:
day
book
cat
dog
apple
The expected result would ultimately allow me to determine that:
'd' is the most frequent starting letter
'd' has a count of 2
Is there a way to do this without running 26 queries, e.g.:
WHERE myWord LIKE 'a%'
WHERE myWord LIKE 'b%'
...
WHERE myWord LIKE 'y%'
WHERE myWord LIKE 'z%'
I found this SO question which makes me think I can do this in 2 steps:
If I'm not mistaken the approach would be to first build a list of all the first letters using the approach from this SO Answer something like this:
SELECT DISTINCT LEFT(word_name, 1) as letter, word_name
FROM word
GROUP BY (letter)
ORDER BY letter
which I expect would look something like:
a
b
c
d
d
... and then query that list. To do this I would store that new list as a temporary table as per this SO question, something like:
CREATE TEMPORARY TABLE IF NOT EXISTS table2 AS (SELECT * FROM table1)
and query that for magnitude as per this SO question, something like:
SELECT column, COUNT(*) AS magnitude
FROM table
GROUP BY column
ORDER BY magnitude DESC
LIMIT 1
Is this a sensible approach?
NOTE:
As sometimes happens, in writing this question I think I figured out a way forward, as yet I have no working code. I'll update the question later with code that either works or which needs help.
In the meanwhile I appreciate any feedback, pointers, proposed answers.
Finally, I'm using PHP, PDO, and MySQL for this.
TIA
For what it's worth, there was an easier way. This is what I ended up with, thanks to both who took the time to answer:
$stmt_common2 = $pdo->prepare('SELECT COUNT(*) as occurence,SUBSTRING(word,1,1) as letter
FROM words
GROUP BY SUBSTRING(word,1,1)
ORDER BY occurence DESC, letter ASC
LIMIT 1');
$stmt_common2->execute();
$mostCommon2 = $stmt_common2->fetchAll();
echo "most common letter: " . $mostCommon2[0]['letter'] . " (occurs " . $mostCommon2[0]['occurence'] . " times)<br>";
You can achieve this with a single query:
SELECT COUNT(*) as occurence,SUBSTRING(word_name,1,1) as letter
FROM word
GROUP BY SUBSTRING(word_name,1,1)
ORDER BY occurence DESC, letter ASC
LIMIT 1
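For comparison, the same count-then-pick-the-top logic can be sketched in Python, which may help when checking the query result against a small word list. This is only an illustration: the function name is made up, and lower-casing the first letter is an assumption meant to mimic a case-insensitive collation.

```python
from collections import Counter


def most_common_first_letter(words):
    """Return (letter, count) for the most frequent first letter, or None for no words."""
    counts = Counter(word[0].lower() for word in words if word)
    if not counts:
        return None
    # Highest count first; ties broken by letter, like ORDER BY occurence DESC, letter ASC.
    return min(counts.items(), key=lambda item: (-item[1], item[0]))


print(most_common_first_letter(["day", "book", "cat", "dog", "apple"]))
```

With the example list from the question this gives the letter d with a count of 2.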
Note that this question is NOT about searching for (non)accented characters.
Suppose I have a table where there is a column name, with collation utf8mb4_unicode_ci.
This collation works perfectly for the purpose of selecting the base selection
in a case-insensitive, accent-insensitive way.
The problem is that I need to order the results in an accent-sensitive but case-insensitive way.
The purpose is to select every name starting with some character/string and sort the names "alphabetically", unaccented first, then accented.
From selection e.g.:
Črpw
Cewo
céag
čefw
The final results should be:
Cewo
céag -- because accented e is more than non-accented
čefw
Črpw -- because r is more than e
Note that c/C < č/Č , but lower/upper cases are handled as equals.
I tried searching for this problem, but I only find similar questions or questions about searching, which is not my case; the searching itself works fine.
Based on what I found, I've tried this test query:
SELECT * FROM
(SELECT 'Črpw' as t
UNION SELECT 'Cewo'
UNION SELECT 'céag'
UNION SELECT 'čefw')virtual
ORDER BY t COLLATE utf8mb4_czech_ci ASC
Which produces something very similar to what I want
céag
Cewo
čefw
Črpw
But note that é gets ordered before e.
Is there a way to get the result order I want?
Using: MySQL 5.5.54 (Debian)
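To make the desired ordering concrete, here is a Python sketch that compares names character by character, treating each character as a (base letter, accent marks) pair after case folding. It is not a MySQL collation and does not answer whether MySQL 5.5 can do this natively; it only shows the rule described above, applied in application code. The function name accent_sensitive_key is invented for this example.

```python
import unicodedata


def accent_sensitive_key(text):
    """Build a per-character key: case is ignored, an accent makes a character sort later."""
    key = []
    for ch in unicodedata.normalize("NFD", text.casefold()):
        if unicodedata.combining(ch) and key:
            # Attach the combining mark to the base letter it follows.
            base, marks = key[-1]
            key[-1] = (base, marks + ch)
        else:
            key.append((ch, ""))
    return key


names = ["Črpw", "Cewo", "céag", "čefw"]
ordered = sorted(names, key=accent_sensitive_key)
print(ordered)
```

Because the accent is compared at the position where it occurs, é is decided against e before the third character is looked at, which is the difference from the utf8mb4_czech_ci result shown above.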
Is it possible in MySQL to select rows for a certain range of items?
For example, I want to select all items where the first letter of NAME is between B and T, alphabetically.
I know I can do this in PHP as well, but it would save me a bit of time if this is possible in MySQL...
Is it possible, and if so, how?
The ideal situation would be something like this:
$sql="SELECT * FROM paths FROM name=name1 TO name=name6"; //which would select name1, 2, 3, 4, 5, 6.
Using BETWEEN will basically get you there, but you need to use one letter past where you want to end. Experiment until you get the result you desire.
SELECT * FROM paths WHERE UPPER(name) BETWEEN 'B' AND 'U';
The idea here is that everything beginning with a 'T' will sort alphabetically before anything beginning with a 'U'. Note that BETWEEN is inclusive at both ends, so a name that is exactly 'U' would still match. You need to convert it to upper-case via UPPER() so you don't run up against potential collation problems.
So your results could be like:
B,
Bill
Bob
Jane
Tommy
Travis
But Uwe (He's German) would be excluded.
You can use BETWEEN like:
SELECT * FROM paths WHERE name BETWEEN 'B' AND 'U'
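The "one letter past the end" trick is easier to see as a half-open range: keep everything from the first letter up to, but not including, the letter after the last one. A minimal Python sketch of that comparison follows; the function name and sample names are made up for illustration, and it compares upper-cased strings by code point rather than by a MySQL collation.

```python
def names_in_letter_range(names, first="B", last="T"):
    """Return names whose first letter lies between `first` and `last`, inclusive."""
    lower = first.upper()
    # One letter past the end, e.g. 'U' when last is 'T'.
    upper = chr(ord(last.upper()) + 1)
    return sorted(n for n in names if lower <= n.upper() < upper)


people = ["Uwe", "Travis", "Adam", "B", "Bob", "Bill", "Jane", "Tommy", "U"]
print(names_in_letter_range(people))
```

In SQL the same half-open range can be written as name >= 'B' AND name < 'U', which also leaves out a name that is exactly 'U'.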