I want to check whether a string contains a field value as a substring.
select * from mytable where instr("mystring", column_name);
But this does not match on word boundaries.
select * from mytable where instr("mystring", concat('[[:<:]]',column_name,'[[:>:]]'));
does not work either. How can I correct this?
You can do this using the REGEXP operator:
SELECT * FROM mytable WHERE 'mystring' REGEXP CONCAT('[[:<:]]', column_name, '[[:>:]]');
Note, however, that this is slow. If you care about words, you might be better off using MySQL's FULLTEXT search feature. Alternatively, do a normal INSTR() check first and then filter the results.
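A minimal sketch of the word-boundary check, run against an in-memory SQLite table from Python as a stand-in for MySQL (an assumption made only so the example is self-contained). The helper contains_word and the sample rows are hypothetical. Python's re module spells the boundary as \b rather than [[:<:]] and [[:>:]], and re.escape is added so that a column value containing regex metacharacters is matched literally.

```python
import re
import sqlite3

def contains_word(haystack, word):
    # 1 when `word` occurs in `haystack` delimited by word boundaries, else 0.
    pattern = r"\b" + re.escape(word) + r"\b"
    return 1 if re.search(pattern, haystack) else 0

conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL (hypothetical name, not a MySQL function).
conn.create_function("contains_word", 2, contains_word)
conn.execute("CREATE TABLE mytable (column_name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("house",), ("hou",), ("big",)],
)

search = "a big house here"
matches = [
    row[0]
    for row in conn.execute(
        "SELECT column_name FROM mytable "
        "WHERE contains_word(?, column_name) ORDER BY column_name",
        (search,),
    )
]
print(matches)
```

The row "hou" is a plain substring of the search string but is not reported, which is the difference from the INSTR() version.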
If you don't need the return value of INSTR(), use LIKE instead:
select * from mytable where 'mystring' like concat('%', column_name, '%');
As already discussed in the question you asked yesterday, no indexes can be used and performance is going to be bad, but this could work:
select *
from mytable
where 'mystring' = column_name -- exact match
or 'mystring' like concat('% ', column_name) -- word at the end
or 'mystring' like concat(column_name, ' %') -- word at beginning
or 'mystring' like concat('% ', column_name, ' %') -- word in the middle
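The four cases above can be tried out with a small sketch like the following. It uses SQLite from Python as a stand-in for MySQL, so CONCAT is written with the || operator; the sample rows and the words_found helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (column_name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("big",), ("house",), ("here",), ("hou",), ("ere",)],
)

# Same four cases as the MySQL query, with || instead of CONCAT.
QUERY = """
SELECT column_name FROM mytable
WHERE :s = column_name                          -- exact match
   OR :s LIKE ('% ' || column_name)             -- word at the end
   OR :s LIKE (column_name || ' %')             -- word at the beginning
   OR :s LIKE ('% ' || column_name || ' %')     -- word in the middle
ORDER BY column_name
"""

def words_found(search):
    # Column values that occur in `search` as space-delimited words.
    return [row[0] for row in conn.execute(QUERY, {"s": search})]

print(words_found("big house here"))
```

Only spaces count as separators here, so a word followed by a comma or a period would be missed.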
Related
I have to retrieve distinct entities from a column, all of which start with a vowel. The query looks like this:
Select DISTINCT column_name
FROM table_name
WHERE column_name LIKE '[aeiou]%';
It's not giving compilation errors, but it's not returning anything. Any ideas?
You can use regular expressions:
Select DISTINCT column_name
FROM table_name
WHERE column_name REGEXP '^[aeiou]';
LIKE wildcard patterns do not support character classes, except in a couple of databases that extend the definitions of the LIKE pattern.
Also, you might want:
WHERE column_name REGEXP '^[aeiouAEIOU]'
if you have a case-sensitive collation.
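To see the difference, here is a rough sketch using SQLite from Python as a stand-in for MySQL. SQLite's LIKE also has no character classes, and REGEXP is provided here by registering a Python function, which is an assumption of this example and not how MySQL implements it. The sample values are made up.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value).
conn.create_function(
    "REGEXP", 2, lambda pattern, value: 1 if re.search(pattern, value) else 0
)
conn.execute("CREATE TABLE table_name (column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [("apple",), ("Orange",), ("banana",), ("apple",)],
)

# LIKE treats [aeiou] as six literal characters, so nothing matches.
like_rows = conn.execute(
    "SELECT DISTINCT column_name FROM table_name "
    "WHERE column_name LIKE '[aeiou]%'"
).fetchall()

# The regular expression anchors a character class at the start of the value.
regexp_rows = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT column_name FROM table_name "
        "WHERE column_name REGEXP '^[aeiouAEIOU]' ORDER BY column_name"
    )
]
print(like_rows, regexp_rows)
```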
I have a search module with an SQL query like this:
SELECT * FROM trilers WHERE title LIKE '%something%'
When I search for a keyword such as "spiderman" it returns nothing, but when I search for "spider-man" it returns my content (the original row in MySQL is "spider-man").
How can I ignore all symbols like -, #, !, : and return content with "spiderman" and "spider-man" keywords at the same time?
What you can do is replace the characters you don't care about before the search takes place.
First iteration would look like this:
SELECT * FROM trilers WHERE REPLACE(title, '-', '') LIKE '%spiderman%'
This would ignore any '-'.
Next you would wrap that in another REPLACE to include '#' like this:
SELECT * FROM trilers WHERE REPLACE(REPLACE(title, '-', ''), '#', '') LIKE '%spiderman%'
For all three ('!', '-', '#') you would nest one more REPLACE like this:
SELECT * FROM trilers WHERE REPLACE(REPLACE(REPLACE(title, '-', ''), '#', ''),'!','') LIKE '%spiderman%'
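Writing the nesting by hand gets tedious as the list of symbols grows, so it can be generated. This is only a sketch: strip_symbols_expr is a hypothetical helper, and SQLite (whose REPLACE behaves the same way for this purpose) stands in for MySQL so the example runs on its own.

```python
import sqlite3

def strip_symbols_expr(column, symbols):
    # Wrap `column` in one REPLACE(...) per symbol. The column name is
    # interpolated into SQL, so it must come from trusted code, not user input.
    expr, params = column, []
    for symbol in symbols:
        expr = f"REPLACE({expr}, ?, '')"
        params.append(symbol)
    return expr, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trilers (title TEXT)")
conn.executemany(
    "INSERT INTO trilers VALUES (?)",
    [("spider-man",), ("spider#man!",), ("spiderman",), ("batman",)],
)

expr, params = strip_symbols_expr("title", ["-", "#", "!"])
sql = f"SELECT title FROM trilers WHERE {expr} LIKE ? ORDER BY title"
found = [row[0] for row in conn.execute(sql, params + ["%spiderman%"])]
print(expr)
print(found)
```

The placeholders appear in the generated text from the innermost REPLACE outwards, which is the same order in which the symbols are appended to params.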
You could try something like
SELECT * FROM trilers WHERE replace(title, '-', '') LIKE '%spiderman%'
The other answers involving using REPLACE are great, but if you don't care what characters appear between "spider" and "man" or how many characters there are between the two strings, you can use an additional wildcard in your expression:
SELECT * FROM Superheroes WHERE HeroName LIKE '%spider%man%';
If you want to match only one character, but allow any character, you can use the _ wildcard, which matches only one character:
SELECT * FROM Superheroes WHERE HeroName LIKE '%spider_man%';
This will match "spideryman" and "spideryman in la la land" but not "spiderysupereliteuberheroman".
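A quick way to check both wildcard patterns is a sketch like this one, using SQLite from Python as a stand-in for MySQL (the % and _ wildcards behave the same way in both). The sample names are taken from the text above plus two extra made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Superheroes (HeroName TEXT)")
names = [
    "spiderman",
    "spider-man",
    "spideryman",
    "spideryman in la la land",
    "spiderysupereliteuberheroman",
]
conn.executemany("INSERT INTO Superheroes VALUES (?)", [(n,) for n in names])

def matching(pattern):
    # Names matched by the given LIKE pattern.
    rows = conn.execute(
        "SELECT HeroName FROM Superheroes WHERE HeroName LIKE ?", (pattern,)
    )
    return {row[0] for row in rows}

any_gap = matching("%spider%man%")   # any number of characters in between
one_char = matching("%spider_man%")  # exactly one character in between
print(sorted(any_gap))
print(sorted(one_char))
```

Note that the _ version does not match plain "spiderman", because _ requires exactly one character.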
If you have a limited number of possible symbols, a way to do it without REPLACE is to use a disjunctive expression:
SELECT * FROM Superheroes WHERE
HeroName LIKE '%spiderman%'
OR
HeroName LIKE '%spider-man%'
OR
HeroName LIKE '%spider#man%'
OR
HeroName LIKE '%spider!man%';
WHERE title REGEXP '[[:<:]]spider[-#!:]?man[[:>:]]'
Some discussion:
[[:<:]] -- word boundary
[-#!:] -- character set, matches any of them. ('-' must be first)
[-#!:]? -- optional -- so that 'spiderman' will still match
This, unlike the rest of the answers, will avoid matching
spidermaniac
Also, consider using FULLTEXT.
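The same pattern can be tried outside the database. The sketch below uses Python's re module, where the word boundary is written \b instead of [[:<:]] and [[:>:]]; re.IGNORECASE is an assumption that mimics a case-insensitive collation, and the titles are made up.

```python
import re

# Optional single symbol between "spider" and "man", with word boundaries.
PATTERN = re.compile(r"\bspider[-#!:]?man\b", re.IGNORECASE)

titles = [
    "spiderman",
    "spider-man",
    "Spider:Man returns",
    "spidermaniac",
    "spider--man",
]
matching = [title for title in titles if PATTERN.search(title)]
print(matching)
```

"spidermaniac" is rejected by the trailing boundary, and "spider--man" is rejected because the character set is optional but may occur only once.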
You should be able to use your search with a small update, something like: SELECT * FROM trilers WHERE title LIKE '%spider%'. This matches any title that contains "spider", whatever comes before or after it, such as the hyphen (-).
I am trying to retrieve a list of names from a table where the surname does not start with z or Z using MySQL. I tried combining SUBSTRING with INSTR to accomplish this. Attempt below:
SELECT DISTINCT SQL_CALC_FOUND_ROWS CONCAT(FName," ",SName)
FROM Names
WHERE SUBSTRING(
CONCAT(FName, ' ' ,SName),
INSTR(CONCAT(FName, ' ' ,SName), ' ') +1,
1)
<> 'z'
OR SUBSTRING(
CONCAT(FName, ' ' ,SName),
INSTR(CONCAT(FName, ' ' ,SName), ' ') +1,
1)
<> 'Z'
ORDER BY SName
My attempt is returning results with z as the first letter of the surname. Can anyone explain why? Or is there a better way to achieve this?
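Your condition joins the two tests with OR, so under a case-sensitive comparison it is true for every row: no letter is equal to both 'z' and 'Z'. Use AND instead. It can also be shortened considerably with LIKE, applied to the surname column:
SELECT DISTINCT SQL_CALC_FOUND_ROWS CONCAT(FName," ",SName)
FROM Names
WHERE SName NOT LIKE 'z%' AND SName NOT LIKE 'Z%';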
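IIRC LIKE is case-sensitive since MySQL v5.6.x (in general this depends on the collation of the column; with a case-insensitive collation a single NOT LIKE 'z%' is enough).
I wouldn't write it like
...WHERE LOWER(SName) NOT LIKE 'z%';
since applying a function to a column prevents MySQL from using an index on that column (if one exists).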
SELECT DISTINCT SQL_CALC_FOUND_ROWS CONCAT(FName," ",SName)
FROM Names
WHERE SName REGEXP '^[^zZ]';
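The effect of OR versus AND in the original attempt can be seen in a small sketch. SQLite is used from Python as a stand-in for MySQL; its <> comparison is case-sensitive by default, which corresponds to a case-sensitive collation in MySQL. The names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Names (FName TEXT, SName TEXT)")
conn.executemany(
    "INSERT INTO Names VALUES (?, ?)",
    [("Ann", "Zola"), ("Bob", "zane"), ("Cy", "Adams")],
)

def surnames(where_clause):
    # Surnames kept by the given WHERE clause (trusted, hard-coded SQL only).
    sql = f"SELECT SName FROM Names WHERE {where_clause} ORDER BY SName"
    return [row[0] for row in conn.execute(sql)]

first = "substr(SName, 1, 1)"
# OR: every first letter differs from at least one of 'z' and 'Z'.
or_rows = surnames(f"{first} <> 'z' OR {first} <> 'Z'")
# AND: the first letter must differ from both.
and_rows = surnames(f"{first} <> 'z' AND {first} <> 'Z'")
print(or_rows, and_rows)
```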
I have a table with 50+ VARCHAR(255) columns.
The moderators report that some of the content is cut off after 250 characters in a few of the fields.
As far as I can tell, this is expected behavior for VARCHAR(255), and I have to change some of the fields to TEXT. The problem is that they cannot tell me which fields are causing problems.
So my best guess is to analyse the current data and find the columns that usually store long content.
Is there a good query structure I can use to get:
- AVG length for each column.
- Max length for each column.
- Count of rows with length 200+ for this column.
SELECT AVG(CHAR_LENGTH(col)) avg_length,
MAX(CHAR_LENGTH(col)) max_length,
COUNT(CASE WHEN CHAR_LENGTH(col) >= 200 THEN 1 ELSE NULL END) 200_plus_count
FROM tbl;
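Here is a small sketch of the same three aggregates on sample data, using SQLite from Python as a stand-in for MySQL. SQLite's LENGTH counts characters for text values, so it plays the role of CHAR_LENGTH here, and the alias is renamed to plus_200_count because this example does not rely on MySQL's identifier rules. The rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("a" * 10,), ("b" * 200,), ("c" * 250,), (None,)],
)

avg_length, max_length, plus_200_count = conn.execute(
    """
    SELECT AVG(LENGTH(col))                                AS avg_length,
           MAX(LENGTH(col))                                AS max_length,
           COUNT(CASE WHEN LENGTH(col) >= 200 THEN 1 END)  AS plus_200_count
    FROM tbl
    """
).fetchone()
print(avg_length, max_length, plus_200_count)
```

The NULL row is ignored by all three aggregates, so the average is taken over the three non-NULL values only.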
For the average, use SELECT AVG(LENGTH(column_name)); for the maximum, SELECT MAX(LENGTH(column_name)); and for the 200+ count, SELECT COUNT(column_name) FROM table WHERE LENGTH(column_name) >= 200. Use CHAR_LENGTH instead of LENGTH if you want to count characters rather than bytes. This site should help you with other SQL-related questions. Hope I answered your question :)
I recently had reason to implement exactly this. Using similar logic as Arth's and VMai's answers, I built a stored procedure to get all column sizes for a table.
DELIMITER //
CREATE PROCEDURE ColumnSizeForTable(TableName varchar(64), SchemaName varchar(64))
BEGIN
SELECT @@group_concat_max_len INTO @group_concat_max;
SET SESSION group_concat_max_len = 100000;
SELECT
CONCAT('SELECT TRIM(TRAILING \' UNION ALL \' FROM CAST(CONCAT(',
GROUP_CONCAT(
CONCAT(
CONCAT('\'SELECT \\\'', COLUMN_NAME, '\\\' ColName,\''),
', ',
CONCAT('IFNULL(AVG(CHAR_LENGTH(',COLUMN_NAME,')),\'0\'), \' ColAverage,\''),
', ',
CONCAT('IFNULL(MAX(CHAR_LENGTH(',COLUMN_NAME,')),\'0\'), \' ColMaximum,\''),
', ',
CONCAT('IFNULL(COUNT(CASE WHEN CHAR_LENGTH(',COLUMN_NAME,') >= 200 THEN 1 ELSE NULL END),\'0\'), \' Col200Plus UNION ALL \'')
)
),
') AS CHAR)) INTO @unionquery FROM ',
TABLE_NAME,
';')
INTO @columnquery FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = TableName
AND TABLE_SCHEMA = SchemaName
GROUP BY TABLE_NAME;
PREPARE columnsizestmnt FROM @columnquery;
EXECUTE columnsizestmnt;
PREPARE unionstmnt FROM @unionquery;
EXECUTE unionstmnt;
SET SESSION group_concat_max_len = @group_concat_max;
END //
DELIMITER ;
CALL ColumnSizeForTable('TABLENAME','SCHEMANAME');
How this works:
Increase the session variable group_concat_max_len so we don't have to worry about GROUP_CONCAT being cut off.
Execute a concat query that builds another query.
This first query populates the column names.
The resulting query is put into @columnquery
Execute @columnquery. This builds the query to put the data into a readable format
This second query gets the column data, including average, max, and the 200+ count.
Similar to VMai's answer, if we weren't building another query from this, it would be a flattened result set.
The resulting query is put into @unionquery
Execute @unionquery. This SELECT is output to the user. It returns the column details that we were trying to collect in a single, readable format.
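The idea of the procedure (one SELECT per column, glued together with UNION ALL so that each column becomes a row) can be sketched outside MySQL as well. The version below is an illustration only: it uses SQLite from Python, reads the column names from PRAGMA table_info instead of INFORMATION_SCHEMA.COLUMNS, and the column_sizes helper, the articles table and its rows are all hypothetical.

```python
import sqlite3

def column_sizes(conn, table):
    # Table and column names are interpolated into SQL, so they must be trusted.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    selects = [
        f"SELECT '{col}' AS ColName, "
        f"IFNULL(AVG(LENGTH({col})), 0) AS ColAverage, "
        f"IFNULL(MAX(LENGTH({col})), 0) AS ColMaximum, "
        f"COUNT(CASE WHEN LENGTH({col}) >= 200 THEN 1 END) AS Col200Plus "
        f"FROM {table}"
        for col in columns
    ]
    # One result row per column of the table.
    return conn.execute(" UNION ALL ".join(selects)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("short", "x" * 250), ("a bit longer", "y" * 210), ("tiny", None)],
)

sizes = {name: (avg, mx, n) for name, avg, mx, n in column_sizes(conn, "articles")}
print(sizes)
```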
I'm building on Arth's good answer and integrating my comment. This query should build the query to be executed:
SELECT
CONCAT(
-- The `SELECT` keyword
'SELECT ',
-- we build our list of analyzing columns with a GROUP_CONCAT
GROUP_CONCAT(
CONCAT(
-- of the columns from Arth's answer
CONCAT('AVG(CHAR_LENGTH(', COLUMN_NAME, ')) AVG_', COLUMN_NAME), ', ',
CONCAT('MAX(CHAR_LENGTH(', COLUMN_NAME, ')) MAX_', COLUMN_NAME), ', ',
CONCAT('COUNT(CASE WHEN CHAR_LENGTH(', COLUMN_NAME, ') >= 200 THEN 1 ELSE NULL END) 200_plus_', COLUMN_NAME)
), ' '),
-- and add the FROM clause
' FROM ',
TABLE_NAME,
';' )
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
-- replace it by your own
TABLE_NAME = 'example10'
GROUP BY
TABLE_NAME;
Note
It's not nicely formatted, but searching for "," and replacing it with ",\n" in an editor like Notepad++ will make the statement readable. This spares you the annoying, error-prone task of writing it by hand.
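If a scripting language is at hand, the statement can also be generated already formatted, one expression per line. This is a sketch under assumptions: it uses SQLite from Python, takes the column names from PRAGMA table_info instead of INFORMATION_SCHEMA.COLUMNS, and build_stats_query and the sample rows are hypothetical. The aliases start with a letter (plus_200_...) because this example does not rely on MySQL's identifier rules.

```python
import sqlite3

def build_stats_query(conn, table):
    # Table and column names are interpolated into SQL, so they must be trusted.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    parts = []
    for col in columns:
        parts.append(f"AVG(LENGTH({col})) AS avg_{col}")
        parts.append(f"MAX(LENGTH({col})) AS max_{col}")
        parts.append(
            f"COUNT(CASE WHEN LENGTH({col}) >= 200 THEN 1 END) AS plus_200_{col}"
        )
    return "SELECT\n  " + ",\n  ".join(parts) + f"\nFROM {table};"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example10 (name TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO example10 VALUES (?, ?)",
    [("ab", "n" * 300), ("abcd", "n" * 100)],
)

query = build_stats_query(conn, "example10")
print(query)

cursor = conn.execute(query)
# Pair each output column name with its value from the single result row.
stats = dict(zip([d[0] for d in cursor.description], cursor.fetchone()))
print(stats)
```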
I have two tables and am trying to eliminate all entries in table 1 (multiple words per row) which contain one of the entries in table 2. The words from table 2 can appear anywhere in the strings of table 1.
It should find things like: 'house' in 'big house here' or in 'big house'
It should not find things like this: 'house' in 'houses'
I tried to use the locate function like this:
CREATE TABLE `test`
AS (
SELECT
`table1`.`term1`,
`table2`.`term2`
FROM `table1`,`table2`
WHERE
locate(concat(' ',`table2`.`term2`,' '), concat(' ',`table1`.`term1`,' '))
);
The problem is: it finds some, but not all, and I cannot see the logic behind why it does not work for everything.
If there is any punctuation surrounding the word you're looking for, your matching won't work.
You could replace all punctuation in the field with spaces.
However, I think a much cleaner solution would be a regular expression:
CREATE TABLE test
AS
SELECT table1.term1, table2.term2
FROM table1, table2
WHERE table1.term1 REGEXP CONCAT('(^|[^A-Za-z])',table2.term2,'([^A-Za-z]|$)');
(^|[^A-Za-z]) means either start of field or not A-Z or a-z.
([^A-Za-z]|$) means either not A-Z or a-z or end of field.
SQLFiddle.
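The pattern can be tested on its own before running it against the tables. Here is a sketch using Python's re module; contains_term is a hypothetical helper, and re.escape is an addition so that a term containing regex metacharacters is matched literally.

```python
import re

def contains_term(text, term):
    # Start of field or a non-letter, then the term, then a non-letter or end.
    pattern = r"(^|[^A-Za-z])" + re.escape(term) + r"([^A-Za-z]|$)"
    return re.search(pattern, text) is not None

for text in ["big house here", "big house", "my house, really", "houses"]:
    print(text, contains_term(text, "house"))
```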
EDIT:
While the above is pretty and all, it's not particularly efficient. (140 ms in a small test)
More efficient: (80 ms, could be much better on proper data)
SELECT term1, term2
FROM table1, table2
WHERE term1 LIKE CONCAT('%',term2,'%')
AND term1 REGEXP CONCAT('(^|[^A-Za-z])',term2,'([^A-Za-z]|$)');
Way more efficient: (8 ms) (for some weird reason, MySQL seemingly can't do regex very well)
SELECT COUNT(*)
FROM table1, table2
WHERE term1 LIKE CONCAT(term2,' %')
OR term1 LIKE CONCAT(term2,',%')
OR term1 LIKE CONCAT(term2,'.%')
OR term1 LIKE CONCAT(term2,';%')
OR term1 LIKE CONCAT('% ',term2,' %')
OR term1 LIKE CONCAT('% ',term2,',%')
OR term1 LIKE CONCAT('% ',term2,'.%')
OR term1 LIKE CONCAT('% ',term2,';%')
OR term1 LIKE CONCAT('% ',term2)
Slightly more efficient: (4 ms)
SELECT COUNT(*)
FROM table1, table2
WHERE CONCAT(' ', REPLACE(REPLACE(REPLACE(term1, ',', ' '), '.', ' '), ';', ' '), ' ')
LIKE CONCAT('% ',term2,' %')
You may want to include a few more characters above.
SQLFiddle.
Note that much of the above depends on the data, some may be more efficient in some cases and much worse in others (but regex will probably trail behind).
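The last variant (turn punctuation into spaces, pad both ends with a space, then do a single LIKE) can be sketched as follows. SQLite is used from Python as a stand-in for MySQL, so CONCAT is written with the || operator; the rows are made up and no timing claims are made for this version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (term1 TEXT)")
conn.execute("CREATE TABLE table2 (term2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [("big house here",), ("big house",), ("houses for sale",),
     ("my house, really",), ("car.",)],
)
conn.executemany("INSERT INTO table2 VALUES (?)", [("house",), ("car",)])

# Normalize , . ; to spaces and pad with spaces so every word is space-delimited.
pairs = conn.execute(
    """
    SELECT term1, term2
    FROM table1, table2
    WHERE (' ' || REPLACE(REPLACE(REPLACE(term1, ',', ' '), '.', ' '), ';', ' ') || ' ')
          LIKE ('% ' || term2 || ' %')
    ORDER BY term1, term2
    """
).fetchall()
print(pairs)
```

Any punctuation character that is not covered by one of the REPLACE calls still hides the word, which is why the list may need extending.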
Even more efficient?
Fulltext indices + searching.