MySQL find most common starting letter for values

From a MySQL table I would like to determine the most frequent starting letter; for example, if the list is:
day
book
cat
dog
apple
The expected result would ultimately allow me to determine that:
'd' is the most frequent starting letter
'd' has a count of 2
Is there a way to do this without running 26 queries, e.g.:
WHERE myWord LIKE 'a%'
WHERE myWord LIKE 'b%'
...
WHERE myWord LIKE 'y%'
WHERE myWord LIKE 'z%'
I found this SO question which makes me think I can do this in 2 steps:
If I'm not mistaken, the approach would be to first build a list of all the first letters, using the approach from this SO answer, something like this:
SELECT LEFT(word_name, 1) AS letter
FROM word
ORDER BY letter
which I expect would look something like:
a
b
c
d
d
... and then query that list. To do this I would store that new list as a temporary table as per this SO question, something like:
CREATE TEMPORARY TABLE IF NOT EXISTS table2 AS (SELECT LEFT(word_name, 1) AS letter FROM word)
and then query that for magnitude, as per this SO question, something like:
SELECT letter, COUNT(*) AS magnitude
FROM table2
GROUP BY letter
ORDER BY magnitude DESC
LIMIT 1
Is this a sensible approach?
NOTE:
As sometimes happens, in writing this question I think I figured out a way forward, but as yet I have no working code. I'll update the question later with code that either works or needs help.
In the meantime I'd appreciate any feedback, pointers or proposed answers.
Finally, I'm using PHP, PDO and MySQL for this.
TIA
For what it's worth, there was an easier way. This is what I ended up with, thanks to both people who took the time to answer:
$stmt_common2 = $pdo->prepare('SELECT COUNT(*) AS occurrence, SUBSTRING(word, 1, 1) AS letter
FROM words
GROUP BY SUBSTRING(word, 1, 1)
ORDER BY occurrence DESC, letter ASC
LIMIT 1');
$stmt_common2->execute();
// LIMIT 1 returns at most one row, so fetch() is enough; it returns false for an empty table
$mostCommon2 = $stmt_common2->fetch(PDO::FETCH_ASSOC);
if ($mostCommon2 !== false) {
    echo "most common letter: " . $mostCommon2['letter'] . " (occurs " . $mostCommon2['occurrence'] . " times)<br>";
}

You can achieve this with a simple query:
SELECT COUNT(*) AS occurrence, SUBSTRING(word_name, 1, 1) AS letter
FROM word
GROUP BY SUBSTRING(word_name, 1, 1)
ORDER BY occurrence DESC, letter ASC
LIMIT 1
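A minimal sketch of the same GROUP BY idea, run against an in-memory SQLite database through Python's standard sqlite3 module rather than MySQL (an assumption made only so the example is self-contained). SQLite's substr(word, 1, 1) stands in for MySQL's SUBSTRING(word, 1, 1); the table name words and the function most_common_first_letter are hypothetical.

```python
import sqlite3


def most_common_first_letter(words):
    """Return (letter, count) for the most frequent first letter, or None if empty.

    Ties are broken alphabetically, matching ORDER BY occurrence DESC, letter ASC.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany("INSERT INTO words (word) VALUES (?)", [(w,) for w in words])
        # substr(word, 1, 1) is SQLite's spelling of MySQL's SUBSTRING(word, 1, 1)
        return conn.execute(
            """
            SELECT substr(word, 1, 1) AS letter, COUNT(*) AS occurrence
            FROM words
            GROUP BY letter
            ORDER BY occurrence DESC, letter ASC
            LIMIT 1
            """
        ).fetchone()
    finally:
        conn.close()


print(most_common_first_letter(["day", "book", "cat", "dog", "apple"]))  # ('d', 2)
```

One difference to keep in mind: MySQL's default collations are case-insensitive, so 'Day' and 'dog' would fall into the same group, whereas SQLite's GROUP BY compares bytes here; grouping on lower(substr(word, 1, 1)) would mimic the MySQL behaviour.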

Related

MySQL - WHERE x IN ( column)

I tried something out. Here is a simple example in SQL Fiddle: Example
There is a column someNumbers (comma-separated numbers), and I tried to get all the rows where this column contains a specific number. The problem is that the result only contains rows where someNumbers starts with that number.
The query SELECT * FROM myTable where 2 in ( someNumbers ) only returns the row with id 2 and not the row with id 1.
Any suggestions? Thank you all.
You are storing data in the wrong format! You should not be storing multiple values in a single string column. You should not be storing numbers as strings. Instead, you should have a junction table with one row per id and per number.
Sometimes, you just have no choice, because someone else created a really poorly designed database. For these situations, MySQL has the function find_in_set():
SELECT *
FROM myTable
WHERE find_in_set(2, someNumbers ) > 0;
The right solution, however, is to fix the data model.
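To make "fix the data model" concrete, here is a minimal sketch of a junction table with one row per id and per number, using Python's sqlite3 as a stand-in for MySQL. The table names my_table and my_table_numbers, the function ids_containing, and the sample rows are all hypothetical, since the fiddle's actual data isn't shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_table (id INTEGER PRIMARY KEY);
    -- Junction table: one row per (id, number) instead of a comma-separated string.
    CREATE TABLE my_table_numbers (
        my_table_id INTEGER NOT NULL REFERENCES my_table (id),
        number      INTEGER NOT NULL,
        PRIMARY KEY (my_table_id, number)
    );
    CREATE INDEX idx_my_table_numbers_number ON my_table_numbers (number);
    """
)
# Hypothetical data: id 1 held '1,2,3', id 2 held '2', id 3 held '12,23'.
conn.executemany("INSERT INTO my_table (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO my_table_numbers (my_table_id, number) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 2), (3, 12), (3, 23)],
)


def ids_containing(number):
    """Ids whose list contains number, found with an equality test on an indexed column."""
    rows = conn.execute(
        """
        SELECT t.id
        FROM my_table AS t
        JOIN my_table_numbers AS n ON n.my_table_id = t.id
        WHERE n.number = ?
        ORDER BY t.id
        """,
        (number,),
    ).fetchall()
    return [r[0] for r in rows]


print(ids_containing(2))   # [1, 2]
print(ids_containing(12))  # [3]
```

Because the numbers are stored as integers in their own rows, 2 can never accidentally match 12 or 23, and no string parsing is needed at query time.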
While Gordon's answer is a good one, here is a way to do this with LIKE:
SELECT * FROM myTable where someNumbers like '2,%' or someNumbers like '%,2,%' or someNumbers like '%,2'
The first LIKE checks whether your list starts with the number you are looking for (2), the second checks whether 2 appears in the middle of the list, and the last checks whether it appears at the end.
Note that the commas are essential here, because something like '%2%' would also match ...,123,...
EDIT: As the OP pointed out, a row may contain only a single value, so the query must also check for that case by adding ... OR someNumbers = '2'.
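The following sketch compares the four conditions (the three LIKE patterns plus the single-value case from the EDIT) with a naive '%2%' search, using Python's sqlite3 in place of MySQL. SQLite's || operator stands in for CONCAT(); the sample rows and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, someNumbers TEXT)")
# Hypothetical rows; row 3 must NOT match 2 even though its text contains '2'.
conn.executemany(
    "INSERT INTO myTable (id, someNumbers) VALUES (?, ?)",
    [(1, "1,2,3"), (2, "2"), (3, "12,23"), (4, "5,2")],
)


def ids_like_answer(n):
    """Start, middle and end patterns plus the exact single-value check."""
    s = str(n)
    rows = conn.execute(
        """
        SELECT id FROM myTable
        WHERE someNumbers LIKE ? || ',%'
           OR someNumbers LIKE '%,' || ? || ',%'
           OR someNumbers LIKE '%,' || ?
           OR someNumbers = ?
        ORDER BY id
        """,
        (s, s, s, s),
    ).fetchall()
    return [r[0] for r in rows]


def ids_naive(n):
    """Substring search without commas: also matches 12, 23, 123, ..."""
    rows = conn.execute(
        "SELECT id FROM myTable WHERE someNumbers LIKE '%' || ? || '%' ORDER BY id",
        (str(n),),
    ).fetchall()
    return [r[0] for r in rows]


print(ids_like_answer(2))  # [1, 2, 4]
print(ids_naive(2))        # [1, 2, 3, 4]  (row 3 is a false positive)
```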
I would suggest this query:
SELECT * FROM myTable where someNumbers like '%2%'
It will select every entry where someNumbers contains the character '2' anywhere, which also includes entries such as 12 or 123.
SELECT * FROM table_name WHERE column_name IN (value, value, value)
You can use that.

Best way to return "champion" that exists in a table in case the given query "championship" is not found

I have a very big table with strings.
Field "words":
- dog
- champion
- cat
- this is a cat
- pool
- champ
- boots
...
In my example, if a select query is looking for the given string "championship", it won't find it because this string is not in the table.
In that case, I want the query to return "champion" from the table, i.e. the longest string in the table that begins the given word "championship".
The possible match (if found) is the longest one in the table among championship, championshi, championsh, champions, ..., cham, cha, ch and c.
Question: I want to return the longest string in the table that is a prefix of a given string.
I need high speed. Is there a way to create an index and write the query so that it executes fast?
Here's one query that will return the specified result:
SELECT t.mycol
FROM mytable t
WHERE 'championship' LIKE CONCAT(t.mycol,'%')
ORDER
BY LENGTH(t.mycol) DESC
LIMIT 1
This query can't do an index range scan; it has to be a full scan, but it may be able to use an index to satisfy the query (a full scan of an index on mycol rather than of the table).
If you can require a minimum number of leading letters that must match for a row to count as a "hit", you could include another predicate, which gives the optimizer a range to scan. For example, to match at least 4 characters:
SELECT t.mycol
FROM mytable t
WHERE 'championship' LIKE CONCAT(t.mycol,'%')
AND t.mycol LIKE 'cham%'
ORDER
BY LENGTH(t.mycol) DESC
LIMIT 1
--or--
AND t.mycol >= 'cham'
AND t.mycol < 'chan'
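A minimal sketch of both queries, run against SQLite through Python's sqlite3 rather than MySQL (an assumption for the sake of a runnable example). SQLite writes CONCAT(a, b) as a || b; the sample rows come from the question, and the function name and the way the 'chan' upper bound is derived are hypothetical.

```python
import sqlite3


def longest_prefix_match(conn, target, min_len=None):
    """Longest mycol value that is a prefix of target, or None.

    Mirrors WHERE 'championship' LIKE CONCAT(t.mycol, '%'). Stored values that
    contain % or _ would act as wildcards. min_len, if given, must be 1..len(target).
    """
    sql = "SELECT t.mycol FROM mytable t WHERE ? LIKE t.mycol || '%'"
    params = [target]
    if min_len:
        # Range predicate, e.g. mycol >= 'cham' AND mycol < 'chan' for min_len=4:
        # bump the last character of the required prefix to get the upper bound.
        lo = target[:min_len]
        hi = lo[:-1] + chr(ord(lo[-1]) + 1)
        sql += " AND t.mycol >= ? AND t.mycol < ?"
        params += [lo, hi]
    sql += " ORDER BY length(t.mycol) DESC LIMIT 1"
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol TEXT)")
conn.execute("CREATE INDEX idx_mycol ON mytable (mycol)")
conn.executemany(
    "INSERT INTO mytable (mycol) VALUES (?)",
    [(w,) for w in ["dog", "champion", "cat", "this is a cat", "pool", "champ", "boots"]],
)

print(longest_prefix_match(conn, "championship"))             # champion
print(longest_prefix_match(conn, "championship", min_len=4))  # champion
print(longest_prefix_match(conn, "champs"))                   # champ
```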
You are a little vague about 'the longest string in the table that begins the given word "championship".' Would "championing" count as a match?
Perhaps the following will help. If you have an index on words, the following returns the last word that sorts at or before the given word. It should tend to maximize the initial sequence of matching characters:
select words
from t
where words <= 'championship'
order by words desc
limit 1;
This isn't exactly what you are asking for, but it might work in practice.
EDIT:
If you are looking for an exact match, then the following should use an index on words effectively and return what you want:
select words
from t
where words in ('championship', 'championshi', 'championsh', 'champions', 'champion',
'champio', 'champi', 'champ', 'cham', 'cha', 'ch', 'c')
order by words desc
limit 1;
It is a bit brute force, but it should have the property of using the index to speed up the query.
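The candidate list in the IN clause doesn't have to be typed by hand. Here is a sketch that generates it in application code and binds each prefix as a parameter, using Python's sqlite3 as a stand-in for MySQL; prefixes and longest_existing_prefix are hypothetical helper names, and the sample rows come from the question.

```python
import sqlite3


def prefixes(word):
    """All non-empty prefixes of word, longest first."""
    return [word[:i] for i in range(len(word), 0, -1)]


def longest_existing_prefix(conn, word):
    candidates = prefixes(word)
    if not candidates:
        return None
    placeholders = ", ".join("?" for _ in candidates)
    # Every candidate is a prefix of every longer one, so among the matches the
    # longest also sorts last; ORDER BY words DESC therefore returns it first.
    row = conn.execute(
        f"SELECT words FROM t WHERE words IN ({placeholders}) ORDER BY words DESC LIMIT 1",
        candidates,
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (words TEXT PRIMARY KEY)")  # PRIMARY KEY gives an index on words
conn.executemany(
    "INSERT INTO t (words) VALUES (?)",
    [(w,) for w in ["dog", "champion", "cat", "this is a cat", "pool", "champ", "boots"]],
)

print(prefixes("cha"))                                # ['cha', 'ch', 'c']
print(longest_existing_prefix(conn, "championship"))  # champion
```

Each candidate is an exact-match lookup, which is why an index on words can serve the whole query.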
Have a look at this article:
http://blog.fatalmind.com/2010/09/29/finding-the-best-match-with-a-top-n-query/
It explains the solution from this SO question:
How to use index efficienty in mysql query
The solution pattern looks like this:
select words
from (
select words
from yourtable
where words <= 'championship'
order by words desc
limit 1
) tmp
where 'championship' like concat (words, '%')
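A minimal sketch of this top-N pattern, run against SQLite via Python's sqlite3 (an assumption; this is not the article's code). The inner query fetches the one row that sorts at or just before the search string; the outer query keeps it only if it really is a prefix. The second call adds a hypothetical word, championing, to probe a limitation of checking only that single row.

```python
import sqlite3


def make_table(words):
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives an index on words, which the inner ORDER BY ... LIMIT 1 can use.
    conn.execute("CREATE TABLE yourtable (words TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO yourtable (words) VALUES (?)", [(w,) for w in words])
    return conn


def best_match_top_n(conn, target):
    row = conn.execute(
        """
        SELECT words
        FROM (
            SELECT words
            FROM yourtable
            WHERE words <= :target      -- nearest row at or before target
            ORDER BY words DESC
            LIMIT 1
        ) AS tmp
        WHERE :target LIKE words || '%' -- keep it only if it is a prefix
        """,
        {"target": target},
    ).fetchone()
    return row[0] if row else None


sample = ["dog", "champion", "cat", "this is a cat", "pool", "champ", "boots"]
print(best_match_top_n(make_table(sample), "championship"))                    # champion
print(best_match_top_n(make_table(sample + ["championing"]), "championship"))  # None
```

As the second call shows, a non-prefix word that sorts between the real match and the search string hides the match, so whether this pattern is sufficient depends on the data.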

Order by last 3 chars

I have a table like:
id name
--------
1 clark_009
2 clark_012
3 johny_002
4 johny_010
I need to get results in this order:
johny_002
clark_009
johny_010
clark_012
Don't ask me what I've already tried; I have no idea how to do this.
This will do it, simply by selecting the rightmost 3 characters and ordering by that value, ascending.
SELECT *
FROM table_name
ORDER BY RIGHT(name, 3) ASC;
It should be added that as your data grows, this will become an inefficient solution. Eventually, you'll probably want to store the numeric appendix in a separate, indexed integer column, so that sorting will be optimally efficient.
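A sketch of both ideas from this answer, using Python's sqlite3 as a stand-in for MySQL: sorting directly on the last three characters (SQLite's substr(name, -3) playing the role of RIGHT(name, 3)), and storing the suffix in a separate indexed integer column. The column name name_suffix and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO table_name (id, name) VALUES (?, ?)",
    [(1, "clark_009"), (2, "clark_012"), (3, "johny_002"), (4, "johny_010")],
)

# 1) Sort on the last three characters, computed for every row at query time.
by_suffix = [r[0] for r in conn.execute(
    "SELECT name FROM table_name ORDER BY substr(name, -3) ASC")]

# 2) Longer-term fix: keep the numeric suffix in its own indexed integer column.
conn.execute("ALTER TABLE table_name ADD COLUMN name_suffix INTEGER")
conn.execute("UPDATE table_name SET name_suffix = CAST(substr(name, -3) AS INTEGER)")
conn.execute("CREATE INDEX idx_name_suffix ON table_name (name_suffix)")
by_column = [r[0] for r in conn.execute(
    "SELECT name FROM table_name ORDER BY name_suffix ASC")]

print(by_suffix)  # ['johny_002', 'clark_009', 'johny_010', 'clark_012']
print(by_column)  # ['johny_002', 'clark_009', 'johny_010', 'clark_012']
```

The integer column also sorts correctly if the suffixes stop being zero-padded, which the string-based sort relies on.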
You should try this:
SELECT * FROM table_name ORDER BY SUBSTRING(name, -3);
Good luck!
You could use the SUBSTRING_INDEX() function to extract the part after the underscore:
SELECT * FROM table_name ORDER BY SUBSTRING_INDEX(name, '_', -1)
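To see what SUBSTRING_INDEX(name, '_', -1) returns, here is a small pure-Python model of its documented behaviour: a positive count returns everything to the left of the count-th delimiter, a negative count everything to the right of the count-th delimiter counted from the end. It is an illustration, not MySQL's implementation, and it assumes a non-empty delimiter.

```python
def substring_index(s, delim, count):
    """Model of MySQL SUBSTRING_INDEX(str, delim, count) for a non-empty delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # left of the count-th delimiter
    return delim.join(parts[count:])      # right of the |count|-th delimiter from the end


names = ["clark_009", "clark_012", "johny_002", "johny_010"]
print(sorted(names, key=lambda n: substring_index(n, "_", -1)))
# ['johny_002', 'clark_009', 'johny_010', 'clark_012']
```

Sorting the extracted strings gives numeric order here only because the suffixes are zero-padded to the same width.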
You can use MySQL's SUBSTRING() function to sort by a substring.
Syntax: SUBSTRING(string, position, length)
Example: sort by the last 3 characters of a string
SELECT * FROM TableName ORDER BY SUBSTRING(FieldName, -3);
#OR
SELECT * FROM TableName ORDER BY SUBSTRING(FieldName, -3,3);
Example: sort by the first 3 characters of a string
SELECT * FROM TableName ORDER BY SUBSTRING(FieldName, 1,3);
Note: a positive position counts from the left of the string; a negative position counts from the right.
Here are the details about the SUBSTRING() function.
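The position rules can be checked with a small pure-Python model of MySQL's SUBSTRING(str, pos, len). It illustrates the documented behaviour rather than reproducing MySQL's implementation, and it treats the string as a sequence of characters. The last call shows that a start position of length(name) - 2 picks the same three characters as SUBSTRING(name, -3).

```python
def mysql_substring(s, pos, length=None):
    """Model of MySQL SUBSTRING(str, pos[, len]) on a Python str."""
    n = len(s)
    if pos > 0:
        start = pos - 1   # positions are 1-based, counted from the left
    elif pos < 0:
        start = n + pos   # negative positions count from the right
    else:
        return ""         # position 0 yields an empty string
    if start < 0 or start >= n:
        return ""
    if length is None:
        return s[start:]
    if length < 1:
        return ""
    return s[start:start + length]


name = "clark_009"
print(mysql_substring(name, -3))                # 009
print(mysql_substring(name, -3, 3))             # 009
print(mysql_substring(name, 1, 3))              # cla
print(mysql_substring(name, len(name) - 2, 3))  # 009
```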
If you want to order by the last three characters (read from left to right) with variable name lengths, I propose this:
SELECT *
FROM table_name
ORDER BY SUBSTRING(name, CHAR_LENGTH(name) - 2, 3)
The start position is the length of name minus 2, which is the third-to-last character.
I'm a little late but just encountered the same problem and this helped me.

Ordering a Union Query in MS Access SQL

OK, I have a particularly nasty UNION ordering problem, so any help would be appreciated.
The scenario is this:
Member Table with the following records (actual data):
REI882
YUI987
POBO37
NUBS26
BTBU12
MZBY10
TYBW54
(These are listed in the order I want them back from my query.)
There are a number of business rules about the construction of these MemberIDs which I believe are unrelated to the sort. They're historic and set in stone. I'm stuck with them. They indicate seniority of the member.
The ordering is done from the last 4 characters in the ID, ascending. The first two characters of the ID are completely meaningless as far as the sort is concerned.
So the topmost possible record is ??A001 (most senior) and the lowest possible record is ??ZZ99 (least senior).
When I query my member table, the list I get back must show the most senior member at the top. Obviously a standard sort does not work. This is what I have to date:
The first of these queries deals with sorting members whose ID only has 1 leading letter. The second deals with those with 2 leading letters.
SELECT * FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1)) **check the 4th character is a digit
ORDER BY (Mid([Member.ID],3,1)), (Mid([Member.ID],4,1)), (Mid([Member.ID],5,1)), (Mid([Member.ID],6,1))
) t1
UNION
SELECT * FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=0)) **check the 4th character is a letter
ORDER BY (Mid([Member.ID],3,1)), (Mid([Member.ID],4,1)), (Mid([Member.ID],5,1)), (Mid([Member.ID],6,1))
) t2
But I get CRAZY results with the UNION! If I run each SELECT individually there is no problem: my funky sort (heavily reliant on some nasty string manipulation in Access!) works exactly as I want it to.
I understand this is pretty complicated but I hope I've explained it clearly and that someone is up for some kudos for figuring it out!!!
edit: The result from my query is seemingly random:
YUI987
MZBY10
NUBS26
BTBU12
REI882
POBO37
TYBW54
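An ORDER BY inside a SELECT that is combined with another SELECT through UNION does not control the order of the combined result; put one ORDER BY after the last SELECT instead.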
See Specifying a conditional order here
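A minimal sketch of that advice, using Python's sqlite3 as a stand-in for Access (an assumption: SQLite's substr() and GLOB replace Access's Mid() and IsNumeric()). Each arm exposes its group number and sort key as output columns, and a single ORDER BY after the last SELECT sorts the combined result. The sort key mirrors the question's per-arm ordering on characters 3 to 6; the column names grp and sort_key are hypothetical.

```python
import sqlite3

MEMBER_IDS = ["REI882", "YUI987", "POBO37", "NUBS26", "BTBU12", "MZBY10", "TYBW54"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (ID TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Member (ID) VALUES (?)", [(m,) for m in MEMBER_IDS])

# The helper columns must be output columns: SQLite (like Access) only lets the
# final ORDER BY of a compound query refer to columns of the combined result.
rows = conn.execute(
    """
    SELECT ID, 1 AS grp, substr(ID, 3) AS sort_key
    FROM Member
    WHERE substr(ID, 4, 1) GLOB '[0-9]'      -- 4th character is a digit
    UNION ALL
    SELECT ID, 2 AS grp, substr(ID, 3) AS sort_key
    FROM Member
    WHERE substr(ID, 4, 1) NOT GLOB '[0-9]'  -- 4th character is not a digit
    ORDER BY grp, sort_key
    """
).fetchall()
ordered_ids = [r[0] for r in rows]
print(ordered_ids)
```

With the sample IDs this returns REI882, YUI987, POBO37, NUBS26, BTBU12, TYBW54, MZBY10. The last two come out in the opposite order from the list in the question, so the per-arm sort key itself, not just the UNION, has to encode the full seniority rules.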
You can use this:
SELECT ID FROM(
(SELECT Member.ID,1 AS T,Left([Member.ID],2) AS Part1, Right([Member.ID],4) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],3,1)))=-1)))
UNION
(SELECT Member.ID,2 AS T,Left([Member.ID],3) AS Part1, Right([Member.ID],3) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1) and ((IsNumeric(Mid([Member.ID],3,1)))=0)))
UNION
(SELECT Member.ID,3 AS T,Left([Member.ID],4) AS Part1, Right([Member.ID],2) AS Part2
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],5,1)))=-1) and ((IsNumeric(Mid([Member.ID],4,1)))=0)))
ORDER BY T,Part1,Part2)
@Justin Kirk: I don't know exactly what your problem is, but I hope this helps.
Why are you not using the RIGHT function?
Something like:
SELECT ID
FROM (
SELECT ID
FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=-1)) **check the 4th character is a digit
) t1
UNION
SELECT ID
FROM (
SELECT Member.ID
FROM Member
WHERE (((IsNumeric(Mid([Member.ID],4,1)))=0)) **check the 4th character is a letter
) t2
) t3
ORDER BY RIGHT(ID,4)
How about skipping the UNION?
SELECT members.ID
FROM members
ORDER BY Right([ID],3), Right(id,4)
Based on the new rules, this mess may work.
SELECT
Len(IIf([textId] Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left([textid],2),
IIf([textId] Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left([textid],3),
IIf([textId] Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left([textid],4),"_")))) AS Ln,
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left(textid,2),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left(textid,3),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left(textid,4),"_"))) AS Alpha,
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Val(Right(textid,4)),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Val(Right(textid,3)),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Val(Right(textid,2)),0))) AS Numbr,
table.textid
FROM table
ORDER BY
Len(IIf([textId] Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left([textid],2),
IIf([textId] Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left([textid],3),
IIf([textId] Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left([textid],4),"_")))),
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Left(textid,2),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Left(textid,3),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Left(textid,4),"_"))),
IIf(textId Like "[a-z][a-z][0-9][0-9][0-9][0-9]",Val(Right(textid,4)),
IIf(textId Like "[a-z][a-z][a-z][0-9][0-9][0-9]",Val(Right(textid,3)),
IIf(textId Like "[a-z][a-z][a-z][a-z][0-9][0-9]",Val(Right(textid,2)),0)))

SQL search result ranking

I have a table called objects with the following columns:
object_id,
name_english (varchar),
name_japanese (varchar),
name_french (varchar),
object_description
for each object.
When users perform a search, they may enter English, Japanese or French, and my SQL statement is:
SELECT
o.object_id,
o.name_english,
o.name_japanese,
o.name_french,
o.object_description
FROM
objects AS o
WHERE
o.name_english LIKE CONCAT('%',:search,'%') OR
o.name_japanese LIKE CONCAT('%',:search,'%') OR
o.name_french LIKE CONCAT('%',:search,'%')
ORDER BY
o.name_english, o.name_japanese, o.name_french ASC
Some of the entries are:
Tin spoon,
Tin Foil,
Doctor Martin Shoes,
Martini glass,
Cutting board,
Ting Soda.
So when the user searches for the word "Tin", all of these are returned. Instead, I want to return only the results that specifically contain the term "Tin", or else display the results ranked by relevance. How can I achieve that?
Thanks.
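You can use MySQL FULLTEXT indexes to do that. In MySQL 5.5 (the manual version linked below) this requires the MyISAM table type; InnoDB supports FULLTEXT indexes from MySQL 5.6 onward. You also need an index on (name_english, name_japanese, name_french, object_description), or whichever fields you want to search, and the MATCH ... AGAINST operator applied to exactly that set of columns. Note that MyISAM ignores words shorter than four characters by default (ft_min_word_len), so a three-letter term such as "Tin" would not be found without changing that setting.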
See the manual at http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html, and the examples on the following page http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html
After running the query above, you will get all sorts of results, including ones you are not interested in, but you can then apply regular expressions to the result set returned by the MySQL server to filter out what you need.
This should do the trick. Because the two SELECTs assign different ranking values, UNION does not collapse a row that matches both, so you may have to filter out duplicates, but the basic idea is simple:
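A sketch of that post-filtering step in Python, using the sample entries from the question; the function name is hypothetical. A case-insensitive word-boundary regex keeps "Tin spoon" and "Tin Foil" but drops "Martin", "Martini", "Cutting" and "Ting".

```python
import re

entries = ["Tin spoon", "Tin Foil", "Doctor Martin Shoes",
           "Martini glass", "Cutting board", "Ting Soda"]


def whole_word_matches(rows, term):
    # \b marks a word boundary, so "Tin" will not match inside "Martin" or "Ting".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [r for r in rows if pattern.search(r)]


print(whole_word_matches(entries, "Tin"))  # ['Tin spoon', 'Tin Foil']
```

Word boundaries depend on spaces and punctuation, so this filter is unlikely to help with Japanese names, which are usually written without spaces.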
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_description`, 1 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT(:search,'%') OR `object`.`name_japanese` LIKE CONCAT(:search,'%') OR `object`.`name_french` LIKE CONCAT(:search,'%')
union
SELECT
`object`.`object_id`,
`object`.`name_english`,
`object`.`name_japanese`,
`object`.`name_french`,
`object`.`object_description`, 10 as ranking
FROM `objects` AS `object`
WHERE `object`.`name_english` LIKE CONCAT('%',:search,'%') OR `object`.`name_japanese` LIKE CONCAT('%',:search,'%') OR `object`.`name_french` LIKE CONCAT('%',:search,'%')
ORDER BY ranking, `name_english`, `name_japanese`, `name_french` ASC
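An alternative to the UNION (a different technique, swapped in here): compute the rank with a CASE expression in a single SELECT, so every row appears once and no duplicate filtering is needed. The sketch uses Python's sqlite3 instead of MySQL, searches only name_english, and uses the entries from the question as sample data; in MySQL the patterns would be written with CONCAT() as in the queries above, and the function name ranked_search is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (object_id INTEGER PRIMARY KEY, name_english TEXT)")
conn.executemany(
    "INSERT INTO objects (name_english) VALUES (?)",
    [("Tin spoon",), ("Tin Foil",), ("Doctor Martin Shoes",),
     ("Martini glass",), ("Cutting board",), ("Ting Soda",)],
)


def ranked_search(term):
    # One pass: each matching row is returned once, with rank 1 for prefix
    # matches and rank 10 for matches anywhere else in the name.
    return conn.execute(
        """
        SELECT name_english,
               CASE WHEN name_english LIKE :search || '%' THEN 1 ELSE 10 END AS ranking
        FROM objects
        WHERE name_english LIKE '%' || :search || '%'
        ORDER BY ranking, name_english
        """,
        {"search": term},
    ).fetchall()


for name, rank in ranked_search("Tin"):
    print(rank, name)
```

More tiers (for example an exact match ranked 0) can be added as further WHEN branches without touching the WHERE clause.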