MySQL MATCH for more than one field

SELECT * FROM portfolio
INNER JOIN translation
ON portfolio.description = translation.key
WHERE
MATCH(it_translation.*) AGAINST('test')
Why doesn't this code work?
If I write MATCH(it_translation.field) AGAINST('test') everything is fine, but I want to run a FULLTEXT search over more than one field, and I don't know how many fields the table has.

IIRC for FULLTEXT to work you need a FULLTEXT index that covers every field you want to use it for, so if you "don't know how many fields in table" you won't be able to MATCH it like that.
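A minimal sketch of what that implies, written in Python: since MATCH() does not accept a wildcard, the column list has to be spelled out explicitly, and it should correspond to a FULLTEXT index over those same columns. The helper name build_match_clause and the column names title and body are hypothetical; the %s placeholder is the parameter style used by common MySQL drivers for Python.

```python
import re

# Only plain identifiers are accepted, so "*" or injected SQL is rejected.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_match_clause(table, columns):
    """Return a MATCH ... AGAINST fragment with one placeholder for the search term."""
    if not columns:
        raise ValueError("MATCH() needs at least one column")
    for name in [table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    cols = ", ".join(f"{table}.{col}" for col in columns)
    return f"MATCH({cols}) AGAINST (%s)"


# Hypothetical column names; they must be covered by one FULLTEXT index.
clause = build_match_clause("it_translation", ["title", "body"])
print(clause)
```

The search term itself is passed separately to the driver, for example as the second argument of cursor.execute, rather than being pasted into the string.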

Related

Count the number of times keywords (in a table) appear in a field in another table

I will simplify my problem in order to explain it.
I have a table which contains text messages posted by users and another table which contains keywords.
I want to display, for each user, the number of times keywords are found in text messages.
I don't want the result to display a keyword if it's not found in text messages.
I want it to be case INSENSITIVE. All keywords are lowercase, but messages can contain both lowercase and uppercase characters.
Because I'm not sure that my explanation is clear enough, here comes the SQLFiddle.
http://sqlfiddle.com/#!2/c402a
Hope anyone can help me.
I found what I was looking for. It wasn't easy for me but here is my query :
SELECT t_msg.msg_usr,
t_list.list_word,
count(t_list.list_word),
t_msg.msg_text
FROM t_msg
INNER JOIN t_list
ON LOWER(t_msg.msg_text) LIKE CONCAT("%", t_list.list_word, "%")
GROUP BY t_msg.msg_usr, t_list.list_word;
The SQLFiddle is there : http://sqlfiddle.com/#!2/ba052/8
The recommendation would be to not try solving this with a query. It is possible to write a query that does it, but such a query scans the messages table once per keyword to produce a count (or a row that you can group by), so it won't scale, and plain substring matching is not reliable as a language-aware search.
Here is what you might want to do:
Create a table to map (user_id, keyword_id) to a count of this keyword in messages of this user. Let's call it t_keyword_count.
Each time you receive a message, before you save the message into the database, search it for all the keywords you care about (using whatever good text search libraries that account for misspellings, etc.). You should know the (user_id) for this message.
You will, at that point, be ready to add the message to the database, and will have an array of (keyword_id) with keywords that this message will have.
In a transaction, insert the message into the t_msg table, and run update/insert for (user_id,keyword_id) to have value=value+1 (or +n, if you need to count the same keyword more than once in the same message) for the t_keyword_count table.
If you are trying to solve the problem of having to do the above on existing data, you can do this manually, just to build up that t_keyword_count table first (depends on how many keywords you have in total, but even if there are a lot, this can be scripted). But you should change (or mirror) the t_msg.msg_text field to be a field suitable for text search, and use SQL text search functionality to find the keywords.
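The steps above can be sketched roughly as follows in Python. This is only an illustration under several assumptions: sqlite3 stands in for MySQL so the example is self-contained, the KEYWORDS mapping and the cnt column are hypothetical, and find_keywords is a naive substring counter where a real text-search library would go.

```python
import sqlite3

KEYWORDS = {1: "hello", 2: "world"}  # hypothetical keyword_id -> keyword


def find_keywords(text):
    """Count case-insensitive substring hits per keyword id (naive placeholder)."""
    lowered = text.lower()
    counts = {}
    for keyword_id, word in KEYWORDS.items():
        hits = lowered.count(word)
        if hits:
            counts[keyword_id] = hits
    return counts


def save_message(conn, user_id, text):
    """Insert the message and bump the per-user keyword counters in one transaction."""
    hits = find_keywords(text)
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "INSERT INTO t_msg (msg_usr, msg_text) VALUES (?, ?)", (user_id, text)
        )
        for keyword_id, n in hits.items():
            cur = conn.execute(
                "UPDATE t_keyword_count SET cnt = cnt + ? "
                "WHERE user_id = ? AND keyword_id = ?",
                (n, user_id, keyword_id),
            )
            if cur.rowcount == 0:  # no counter row yet for this pair
                conn.execute(
                    "INSERT INTO t_keyword_count (user_id, keyword_id, cnt) "
                    "VALUES (?, ?, ?)",
                    (user_id, keyword_id, n),
                )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_msg (msg_id INTEGER PRIMARY KEY, msg_usr INTEGER, msg_text TEXT)"
)
conn.execute(
    "CREATE TABLE t_keyword_count (user_id INTEGER, keyword_id INTEGER, "
    "cnt INTEGER, PRIMARY KEY (user_id, keyword_id))"
)
save_message(conn, 7, "Hello World, hello again")
save_message(conn, 7, "WORLD peace")
rows = conn.execute(
    "SELECT keyword_id, cnt FROM t_keyword_count WHERE user_id = 7 ORDER BY keyword_id"
).fetchall()
print(rows)
```

In MySQL the update-then-insert pair could likely be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE statement on the (user_id, keyword_id) key.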

MySql plural search without Fulltext

I want to make a plural search on my table, but I don't want to use FULLTEXT. I tried FULLTEXT but my table doesn't support it. My query looks like this:
SELECT
*
FROM
items
WHERE
LOWER(items.`name`) LIKE '%parameter%'
OR LOWER(items.brand) LIKE '%parameter%'
OR LOWER(items.sku) LIKE '%parameter%'
When I search for 'shirt' it returns good results, but when I search for 'shirts' it doesn't. Is there a way to make a plural search without FULLTEXT?
I suggest creating a separate table with the MyISAM engine for items, containing the primary id and the fields you want to search.
You can then run a full-text search on the new table to retrieve the IDs, and use those IDs to fetch the remaining fields from the main items table.
The additional table needs to be kept in sync with items, for example through a trigger or a scheduled script.
This will match all values beginning with the parameter passed:
SELECT
*
FROM
items
WHERE
LOWER(items.`name`) LIKE 'parameter%'
OR LOWER(items.brand) LIKE 'parameter%'
OR LOWER(items.sku) LIKE 'parameter%'

how to search several columns in a sql query using concat and UPPER

So I have a DB and a web page where I want to display the result of my db search
I have tried a couple of ways. Both work, but they are incomplete for what I want to accomplish.
I want to be able to search my table columns id,Id_Nombre,Pais,Estado,Ciudad,website, all of them, with a word or several words from my search text.
This code works, but it is case sensitive, so I have to type the word exactly:
$query = "SELECT * FROM Medios_table WHERE concat(id,Id_Nombre,Pais,Estado,Ciudad,website) LIKE '%$Busqueda%'";
As a result, if I type a word like "People" in my search box and that word is stored in any of those columns as "people", it won't find it.
The second code I use works, but only on one column:
$query = "SELECT * FROM Medios_table WHERE UPPER(Id_Nombre) LIKE UPPER('%$Busqueda%')";
The result of this code is great, since it finds the word no matter the case used, but I need to extend this type of search to all the other columns, and so far everything I have tried does not work.
I have tried:
$query = "SELECT * FROM Medios_table WHERE UPPER(id,Id_Nombre,Pais,Estado,Ciudad,website) LIKE UPPER('%$Busqueda%')";
$query = "SELECT * FROM Medios_table WHERE UPPER(concat(id,Id_Nombre,Pais,Estado,Ciudad,website)) LIKE UPPER('%$Busqueda%')";
$query = "SELECT * FROM Medios_table WHERE concat(UPPER(id,Id_Nombre,Pais,Estado,Ciudad,website)) LIKE UPPER('%$Busqueda%')";
etc.
any help is greatly appreciated, thanks.
Did you try UPPER around each column name? Like this:
$query = "SELECT * FROM Medios_table WHERE concat(UPPER(id),UPPER(Id_Nombre),UPPER(Pais),UPPER(Estado),UPPER(Ciudad),UPPER(website)) LIKE UPPER('%$Busqueda%')";
Your upper(concat(...)) version should work. If you had a problem with that, it's probably just a typo, like you left out a parenthesis or something.
You can't say upper(x,y,z) because upper takes only one parameter. You must do the concat first, then do the upper, i.e. upper(concat(x,y,z)).
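As a rough illustration of the upper(concat(...)) form, here is a Python sketch that builds the query with the search term bound as a parameter instead of being interpolated into the string. The function name build_search is hypothetical, the %s placeholder assumes a typical MySQL driver for Python, and wildcard characters inside the term are not escaped here.

```python
# Column names are taken from the question and fixed in code, never from user input.
COLUMNS = ["id", "Id_Nombre", "Pais", "Estado", "Ciudad", "website"]


def build_search(term, columns=COLUMNS):
    """Return (sql, params) for a case-insensitive search across several columns."""
    if not columns:
        raise ValueError("at least one column is required")
    concatenated = ", ".join(columns)
    sql = (
        "SELECT * FROM Medios_table "
        f"WHERE UPPER(CONCAT({concatenated})) LIKE UPPER(%s)"
    )
    # The wildcards belong to the bound value, not to the SQL text.
    return sql, ("%" + term + "%",)


sql, params = build_search("People")
print(sql)
print(params)
```

The pair would then be handed to the driver, for example cursor.execute(sql, params), which also avoids the quoting problems of pasting $Busqueda directly into the statement.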
That said, this query will be very slow on a big database, because the db engine has to read every record in the table, and then for each one search it character by character. If the table is small or this is done infrequently, that might be acceptable. If the table is big and/or this query will be executed often, you really need a totally different approach.
Update
If you really need to search against any text in a field, where you cannot make any assumptions about the text being searched in or the text being searched for in advance, you should investigate fulltext searches. http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html
If you will be doing these sorts of searches all the time, you might want to build a dictionary, that is, a list of all the words together with all the records each word occurs in. I think that's basically what fulltext does, so this may or may not gain you anything.
But if you have some foreknowledge, that is, if you don't really need to search for arbitrary text occurring anywhere in arbitrary text, then break out the things you want to search for into separate fields.
To take a simple example, suppose you have a field that contains customer full name, like "Fred Smith", "Mary Jones", etc. You want to search for someone by last name. You could search for
where full_name like '%Smith%'
But this would require reading every record in the table. If, instead, you broke the field into first name and last name, then you could search for
where last_name='Smith'
If you have an index on last_name this would be a very fast search.
If you're just trying to look for an entered value in any of several fields, it is much, much faster to do
where estado='Toledo' or ciudad='Toledo' or pais='Toledo'
(assuming that you have indexes on estado, ciudad, and pais), than
where concat(estado, ciudad, pais) like '%Toledo%'
(where no index will do you any good).
If you want to do case-insensitive searches, create an index on upper(estado) instead of on estado, etc. Then "where upper(estado)='TOLEDO'" can use the index.
Also, I'm not sure about MySQL, but some database engines can use an index if the LIKE pattern does not begin with a wildcard. That is, "somefield like '%x%'" must read every record in the table. But on some databases, "somefield like 'x%'" can use an index to get to the records where the field starts with "x" and then just process those.
You need full-text search
This link may be useful Mysql full-text search

Fulltext search that isn't exact match

I have a MySQL table called "customers1" running the MyISAM engine. I've created a full-text index on the columns name, adress and zip. One of the customers in that table is me. I spell my name "Gildebrand". I can't expect users to spell my name correctly; many might write "Glidebrant" but still want to find me. How could I do that search in SQL?
If i run the following query right now
SELECT * FROM customers1 WHERE MATCH(name,adress,zip) AGAINST('Gildebrand')
It finds me, of course. But if I misspell it as "Glidebrand", it doesn't find me. What would be the best approach to this?
I would say the closest you can get to such a result is by using SOUNDEX() http://www.w3resource.com/mysql/string-functions/mysql-soundex-function.php
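To show why a phonetic code helps with this kind of misspelling, here is a small Python sketch of the classic four-character American Soundex. It is an illustration only and not MySQL's implementation: MySQL's SOUNDEX() can return longer strings and may differ in details, so codes computed here should not be compared directly with values produced by the database.

```python
# Consonant groups of the classic Soundex code table.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word, or '' if it has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code counts again
    return result.ljust(4, "0")


for name in ("Gildebrand", "Glidebrand", "Glidebrant"):
    print(name, soundex(name))
```

All three spellings map to the same code, which is the property the suggestion relies on. Names that differ in their first letter would still not match.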
I have a generic search similar to yours. Here's basically what I do:
SELECT * FROM customers1 WHERE MATCH(name,adress,zip) AGAINST('?')
UNION
SELECT * FROM customers1 WHERE name LIKE ('?%')
This allows the user to just enter a prefix also.
If the user realizes they can't spell your name, but they're sure it starts with Gil, then they can just type that.
You can add additional UNION clauses if you want to support prefixes on other columns too.

Generate number id from text/url for fast "SELECT"

I have the following problem:
I have a feed capturer that captures news from different sources every half an hour.
I only insert entries that don't have their URLs already in the database (URL is used to see if the record is already in database).
Even with that, I get some repeated entries, because some sites report the same news (that usually are from a news source like Reuters). I could look for these repeated entries during insertion, but i think this would slow the insertion time even more.
So, I can later find these repeated entries by the title. But I think this search is slow. Then, my idea is to generate a numeric field from the title and then search by this number for repeated titles.
What kind of encoding could I use to encode the titles (I thought of something like the reverse of base64)?
I'm supposing that searching for repeated numbers is a lot faster than searching for repeated words. Is that true or not?
Do you suggest a better solution for this problem?
Well, I don't mind having the repeated entries in the database, I just don't want to show them to the user. Like Google, which filters the repeated results but shows them if you want.
I hope I explained It well. Thanks in advance.
Store the MD5 hashes of the URL and the title, and build a UNIQUE index on them:
CREATE UNIQUE INDEX ux_mytable_title_url ON mytable (title_hash, url_hash)
INSERT
INTO mytable (url, title, url_hash, title_hash)
VALUES ('url', 'title', MD5('url'), MD5('title'))
To select like Google (one result per title), use this query:
SELECT *
FROM (
SELECT DISTINCT title_hash
FROM mytable
) md
JOIN mytable mo
ON mo.title_hash = md.title_hash
AND mo.url_hash =
(
SELECT url_hash
FROM mytable mi
WHERE mi.title_hash = md.title_hash
ORDER BY
mi.title_hash, mi.url_hash
LIMIT 1
)
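The same idea, one result per title hash, can be sketched on the application side in Python. This is only an illustration: the function names and the sample feed are hypothetical, and unlike the query above, which picks the lowest url_hash for each title, this version keeps the first entry seen for each title.

```python
import hashlib


def title_hash(title):
    """Return the MD5 hex digest of the title, as MD5('title') would in the INSERT."""
    return hashlib.md5(title.encode("utf-8")).hexdigest()


def one_per_title(entries):
    """Keep the first (url, title) pair for every distinct title hash."""
    seen = {}
    for url, title in entries:
        # setdefault only stores the pair if this hash has not been seen yet.
        seen.setdefault(title_hash(title), (url, title))
    return list(seen.values())


# Hypothetical captured entries: two sites carrying the same agency story.
feed = [
    ("http://a.example/1", "Markets rally"),
    ("http://b.example/9", "Markets rally"),
    ("http://a.example/2", "Rain expected"),
]
unique = one_per_title(feed)
print(unique)
```

Titles that differ only in case or whitespace hash differently, so some normalization before hashing may be worth considering.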
You can use a new table containing only the hashed keys based on title and URL, and then add an index on it to speed up the search. But I don't think you can use an efficient algorithm to transform strings into numbers.
For the hashing, use
SELECT MD5(CONCAT('title', 'url'));
and before every insertion, test whether the hash of the concatenated title and URL already exists in this table.
@Quassnoi can explain better than I can, but I think there is no visible difference in performance whether you index a VARCHAR/CHAR or an INT column and later use it for GROUP BY or another method of finding the duplicates. So you could use the solution he proposed, but with a normal INDEX instead of a UNIQUE index, keeping the duplicates in the database and filtering them out only when showing results to users.