MySQL Fulltext vs Like - mysql

Background: I have a table with at most 2000 rows, and the user should be able to search across up to 6 columns.
I don't know in advance what the user is looking for, and I want a combined search (search1 AND search2 AND ...).
Problem: these columns hold an ID, not the plain description (i.e. I have the ID of the town, not its name). So I was thinking about two solutions:
Create another table where I put the keywords (one keyword per row) and then search there using LIKE 'search1%' OR LIKE 'search2%' ...
Add a field to the existing table where I put all the keywords, and then run a FULLTEXT search on that.
Which one is best? I know there are so few rows that there won't be big performance problems, but I hope the table will keep growing :)
Example
This is my table:
ID | TOWN  | TYPE | ADDRESS
11 | 14132 | 3    | baker street 220
13 | 45632 | 8    | main street 12
14132 = London
45632 = New York
3 = Customer
8 = Admin
The user typing "London Customer" should find the first row.

If you're simply going to use a series of LIKEs, then I'd have thought it would make sense to use a FULLTEXT index instead, the main reason being that it would let you use more complex boolean queries in the future. (As @Quassnoi states, you can simply create the index on the existing fields if you don't have a use for a dedicated field.)
However, it should be noted that fulltext has its limitations - words that are common across all rows have a low "score" and hence won't match as prominently as if you'd carried out a series of LIKEs. (On the flipside, you can of course get a "score" back from a FULLTEXT query, which may be of use depending on how you want to rank the results.)

You don't have to create a separate field, since a FULLTEXT index can be created on multiple fields:
CREATE FULLTEXT INDEX fx_mytable_fields ON mytable (field1, field2, field3);
SELECT *
FROM mytable
WHERE MATCH(field1, field2, field3) AGAINST ('+search1 +search2' IN BOOLEAN MODE);
This will return all records that contain both search1 and search2 in any of the fields, like this:
field1  | field2    | field3
search1 | something | search2
or this:
field1          | field2    | field3
search1 search2 | something | something else

Given you've got the data in separate tables, you'd have to have a FULLTEXT index on each of the searchable fields in each table. After that, it's just a matter of building the query with the appropriate JOINs in place so you can run the fulltext MATCH ... AGAINST on the text version of the field, and not on the foreign key number. Note that a single MATCH() can only name columns from one table, so each table gets its own MATCH():
SELECT user.id, user.name, town.name
FROM user
LEFT JOIN town ON user.town = town.id
WHERE MATCH(user.name) AGAINST (...) OR MATCH(town.name) AGAINST (...)
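As a rough, runnable illustration of the JOIN idea, here is a sketch in Python that uses the standard sqlite3 module as a stand-in for MySQL, and plain LIKE conditions instead of a FULLTEXT index. The lookup table names town and row_type, and the helper search(), are assumptions made for this example; the question does not name them.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE town (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE row_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, town INTEGER, type INTEGER, address TEXT);
    INSERT INTO town VALUES (14132, 'London'), (45632, 'New York');
    INSERT INTO row_type VALUES (3, 'Customer'), (8, 'Admin');
    INSERT INTO mytable VALUES (11, 14132, 3, 'baker street 220'),
                               (13, 45632, 8, 'main street 12');
""")

def search(terms):
    """Return ids of rows where every term appears in the town name, type name or address."""
    # One AND-ed group per term; inside a group the term may match any text column.
    group = "(town.name LIKE ? OR row_type.name LIKE ? OR mytable.address LIKE ?)"
    where = " AND ".join([group] * len(terms)) or "1"
    params = [f"%{term}%" for term in terms for _ in range(3)]
    sql = f"""
        SELECT mytable.id
        FROM mytable
        JOIN town ON mytable.town = town.id
        JOIN row_type ON mytable.type = row_type.id
        WHERE {where}
        ORDER BY mytable.id
    """
    return [row[0] for row in conn.execute(sql, params)]

result = search("London Customer".split())
```

The search runs against the joined names rather than the stored IDs, so typing "London Customer" finds row 11. With MySQL, the LIKE groups would be replaced by MATCH ... AGAINST on the indexed text columns.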

Related

Sphinx search query - condition for numbers in varchar column

I have items, each with a list of the categories it belongs to:
id | name | categories(varchar)
1 | bike red | 2,5,18
2 | bike black | 4,7,13
With Sphinx I need to search for, for example: bike AND only from category 5
Is there a good way to search in the categories column?
In MySQL I could write: WHERE name LIKE '%bike%' AND categories LIKE '%5%'
But my Sphinx index is big and searching this way could be inefficient. Is there a way to create something like an integer ENUM list? What would be a good solution?
Thanks
Sphinx has Multi-Value Attributes (MVA): http://sphinxsearch.com/docs/current.html#mva . They are pretty much perfect for this!
An MVA works kind of like a numeric SET in MySQL (you have multiple categories, so a set, not an enum).
Sphinx will even automatically parse a string list of numbers separated by commas during indexing.
sql_query = SELECT id,name,categories FROM item
sql_attr_multi = uint categories from field;
Then a SphinxQL query...
SELECT * FROM item WHERE MATCH('bike') AND categories=5
(This may look confusing if you are familiar with MySQL. An equality filter on an MVA actually just means "equals one of the values". If you want, you could write categories IN (5) for the same effect.)
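To make the "equals one of the values" semantics concrete, here is a small Python sketch that emulates it in plain code. It does not call Sphinx; parse_mva() and filter_items() are hypothetical helpers written only for this illustration.

```python
def parse_mva(value):
    """Parse a comma-separated string such as '2,5,18' into a set of ints."""
    return {int(part) for part in value.split(",") if part.strip()}

# The two rows from the question, with categories parsed into sets.
items = [
    {"id": 1, "name": "bike red", "categories": parse_mva("2,5,18")},
    {"id": 2, "name": "bike black", "categories": parse_mva("4,7,13")},
]

def filter_items(keyword, category):
    """Emulate MATCH(keyword) AND categories=category as a set membership test."""
    return [
        item["id"]
        for item in items
        if keyword in item["name"].split() and category in item["categories"]
    ]

matches = filter_items("bike", 5)
```

Because the comparison is against whole numbers, category 1 matches neither row, whereas LIKE '%1%' on the original string would match both (via 18 and 13).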

mysql indexing for better selection performance

I do not fully understand indexes and would like some clarification.
I have a table, named posts, which over time might become very big.
Each post belongs to a category and a language, through 2 columns: category_id and lang.
If I create indexes on the columns category_id and lang, does this mean that the posts table will be "organized"/"classified" in MySQL by "blocks" of category_id and lang, giving better performance when I specify a category_id and/or a lang in my query?
Which type of index should be created, then?
I hope I'm clear enough here...
What an index does is create a "shadow" table, consisting of only the index values, so it only has to look through the index to find what you're looking for.
If you're doing a query, with a where like this:
WHERE zipcode = 555 AND phone = 12345678
You will need an index that covers both zipcode and phone.
If the query is only:
WHERE zipcode = 555
You will need to index zipcode only.
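Applied to the posts table from the question, this can be tried out with a short Python sketch. It uses SQLite through the standard sqlite3 module as a stand-in for MySQL, so the planner output differs from MySQL's EXPLAIN; the index name idx_posts_category_lang and the plan() helper are assumptions for this example.

```python
import sqlite3

# SQLite stands in for MySQL here; only the general idea carries over.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, category_id INTEGER, lang TEXT, title TEXT);
    -- One composite index on both filter columns (hypothetical name).
    CREATE INDEX idx_posts_category_lang ON posts (category_id, lang);
""")

def plan(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)

both = plan("SELECT * FROM posts WHERE category_id = ? AND lang = ?", (3, "en"))
first_only = plan("SELECT * FROM posts WHERE category_id = ?", (3,))
second_only = plan("SELECT * FROM posts WHERE lang = ?", ("en",))
```

In this sketch the composite index is picked for queries that filter on category_id alone or on category_id and lang together, but not for a query that filters on lang alone, because lang is not the leading column. If lang-only queries matter, a separate index on lang would be worth considering.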

MySQL - How can i query a multi-value field to match a primary key of another table?

I have 2 tables. One table contains all of the states in the USA. The other table is just a list of stuff in those states.
My table structure looks something like this:
tbl_states - stateID (PK), stateName
tbl_stuff - stuffID, stuffName, relState
The values look like this
1 | Alabama
2 | Georgia
3 | Maryland
The relState column relates to the tbl_states.stateID column, and I store it in the format shown below. I plan to have a web form to select multiple states and assign the stuff to those states.
1 | This is some stuff | 1,2 [ and this stuff is only AL, GA. ]
So I'm trying to figure out the best way to write the select statement for this. Is there some way to do it strictly with mysql?
Multi-valued fields in a database are a bad idea. Instead, resolve the many-to-many relationship between states and stuff with a separate linking table that holds one row per (stuff, state) pair.
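One possible shape for such a linking table is sketched below in Python, with SQLite (standard sqlite3 module) standing in for MySQL. The table name tbl_stuff_states and the helpers states_for() and stuff_in() are hypothetical; only tbl_states and tbl_stuff come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_states (stateID INTEGER PRIMARY KEY, stateName TEXT);
    CREATE TABLE tbl_stuff (stuffID INTEGER PRIMARY KEY, stuffName TEXT);
    -- Hypothetical linking table: one row per (stuff, state) pair.
    CREATE TABLE tbl_stuff_states (
        stuffID INTEGER REFERENCES tbl_stuff(stuffID),
        stateID INTEGER REFERENCES tbl_states(stateID),
        PRIMARY KEY (stuffID, stateID)
    );
    INSERT INTO tbl_states VALUES (1, 'Alabama'), (2, 'Georgia'), (3, 'Maryland');
    INSERT INTO tbl_stuff VALUES (1, 'This is some stuff');
    INSERT INTO tbl_stuff_states VALUES (1, 1), (1, 2);
""")

def states_for(stuff_id):
    """Names of all states linked to one piece of stuff."""
    sql = """
        SELECT s.stateName
        FROM tbl_stuff_states AS link
        JOIN tbl_states AS s ON s.stateID = link.stateID
        WHERE link.stuffID = ?
        ORDER BY s.stateName
    """
    return [row[0] for row in conn.execute(sql, (stuff_id,))]

def stuff_in(state_name):
    """Names of all stuff linked to one state."""
    sql = """
        SELECT t.stuffName
        FROM tbl_stuff_states AS link
        JOIN tbl_stuff AS t ON t.stuffID = link.stuffID
        JOIN tbl_states AS s ON s.stateID = link.stateID
        WHERE s.stateName = ?
        ORDER BY t.stuffName
    """
    return [row[0] for row in conn.execute(sql, (state_name,))]
```

With this layout the web form would insert or delete rows in the linking table rather than rewriting a comma-separated string, and both directions of the lookup become ordinary joins.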
I came across this post while trying to do this myself and figured a second answer would be helpful. While it is true that multi-valued fields decrease search efficiency, hurt scalability and invite data integrity problems, they can be necessary for simplicity, reporting, and integrating with other systems, as they were in my case.
Assuming the Tables:
Table: States
Id        | Name
235325235 | 'Alabama'
457457432 | 'Georgia'
334634636 | 'Maryland'
Table: Stuff
Id | Text                        | StateIds
1  | 'Some stuff'                | '235325235'
2  | 'Some Stuff for two states' | '235325235,457457432'
The following query would return all stuff for Alabama:
SELECT * FROM Stuff WHERE FIND_IN_SET('235325235', Stuff.StateIds);
Please note that I made your IDs more complex to lower the probability of accidental matches, and I would recommend using a GUID/UUID since you are using a string-searching function.
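To see how FIND_IN_SET behaves, the following Python sketch re-implements its documented semantics (1-based position of an exact list item, 0 when absent, NULL when an argument is NULL) and registers it as a user-defined function in SQLite, which has no built-in FIND_IN_SET. The emulation is an approximation for illustration: it compares strings exactly and ignores MySQL collation rules.

```python
import sqlite3

def find_in_set(needle, csv):
    """Return the 1-based position of needle in a comma-separated list, or 0 if absent."""
    if needle is None or csv is None:
        return None  # mirrors MySQL returning NULL when either argument is NULL
    parts = csv.split(",") if csv else []
    return parts.index(needle) + 1 if needle in parts else 0

conn = sqlite3.connect(":memory:")
# Register the Python function under the MySQL name so the query text can stay the same.
conn.create_function("FIND_IN_SET", 2, find_in_set)
conn.executescript("""
    CREATE TABLE Stuff (Id INTEGER PRIMARY KEY, "Text" TEXT, StateIds TEXT);
    INSERT INTO Stuff VALUES (1, 'Some stuff', '235325235'),
                             (2, 'Some Stuff for two states', '235325235,457457432');
""")

alabama = [row[0] for row in conn.execute(
    "SELECT Id FROM Stuff WHERE FIND_IN_SET('235325235', Stuff.StateIds) ORDER BY Id")]
georgia = [row[0] for row in conn.execute(
    "SELECT Id FROM Stuff WHERE FIND_IN_SET('457457432', Stuff.StateIds) ORDER BY Id")]
```

Since only whole items between commas are compared, a short ID such as '5' is not found inside '15' or '50'.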

Mysql WHERE problem with comma-separated list

I need help with this problem.
In a MySQL table I have a field:
Field : artist_list
Values : 1,5,3,401
I need to find all records for artist uid 401.
I do this:
SELECT uid FROM tbl WHERE artist_list IN ('401');
This returns only the records where artist_list is exactly '401'; if the field holds 11,401, the query does not match.
Any ideas?
(I can't use the LIKE method, because if the artist uid is 3 it would also match 30, 33, 3333...)
Short Term Solution
Use the FIND_IN_SET function:
SELECT uid
FROM tbl
WHERE FIND_IN_SET('401', artist_list) > 0
Long Term Solution
Normalize your data - this appears to be a many-to-many relationship already involving two tables. The comma-separated list needs to be turned into a table of its own:
ARTIST_LIST
artist_id (primary key, foreign key to ARTIST)
uid (primary key, foreign key to TBL)
Your database organization is a problem; you need to normalize it. Rather than having one row with a comma-separated list of values, you should do one value per row:
uid artist
1 401
1 11
1 5
2 5
2 4
2 2
Then you can query:
SELECT uid
FROM tbl
WHERE artist = 401
You should also look into database normalization because what you have is just going to cause more and more problems in the future.
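A one-off migration from the comma-separated column to one value per row could look roughly like the Python sketch below. SQLite (standard sqlite3 module) stands in for MySQL, and the target table name tbl_artist and the helper uids_for_artist() are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (uid INTEGER PRIMARY KEY, artist_list TEXT);
    INSERT INTO tbl VALUES (1, '401,11,5'), (2, '5,4,2');
    -- Hypothetical normalized table: one (uid, artist) pair per row.
    CREATE TABLE tbl_artist (
        uid INTEGER REFERENCES tbl(uid),
        artist INTEGER,
        PRIMARY KEY (uid, artist)
    );
""")

# Split every comma-separated list into individual (uid, artist) pairs.
rows = conn.execute("SELECT uid, artist_list FROM tbl").fetchall()
pairs = [
    (uid, int(part))
    for uid, artist_list in rows
    for part in artist_list.split(",")
    if part.strip()
]
conn.executemany("INSERT INTO tbl_artist VALUES (?, ?)", pairs)

def uids_for_artist(artist_id):
    """Plain equality lookup on the normalized table."""
    sql = "SELECT uid FROM tbl_artist WHERE artist = ? ORDER BY uid"
    return [row[0] for row in conn.execute(sql, (artist_id,))]
```

After the migration, looking up artist 401 is a simple equality test that can use an index, and an ID like 40 can no longer match by accident.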
SELECT uid
FROM tbl
WHERE CONCAT(',', artist_list, ',') LIKE '%,401,%'
Although it would make more sense to normalise your data properly in the first place. Then your query would become trivial and have much better performance.
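The difference between a plain LIKE and the comma-wrapped LIKE can be checked with a short Python sketch. It uses SQLite via the standard sqlite3 module, where || plays the role of MySQL's CONCAT; the sample rows and the helpers naive() and delimited() are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (uid INTEGER PRIMARY KEY, artist_list TEXT);
    INSERT INTO tbl VALUES (1, '1,5,3,401'), (2, '11,401'), (3, '30,33,3333'), (4, '4010');
""")

def naive(artist_id):
    """Substring match: also hits longer ids that merely contain the digits."""
    sql = "SELECT uid FROM tbl WHERE artist_list LIKE ? ORDER BY uid"
    return [row[0] for row in conn.execute(sql, (f"%{artist_id}%",))]

def delimited(artist_id):
    """Wrap the list and the id in commas so only whole items can match."""
    sql = "SELECT uid FROM tbl WHERE ',' || artist_list || ',' LIKE ? ORDER BY uid"
    return [row[0] for row in conn.execute(sql, (f"%,{artist_id},%",))]
```

Here naive(3) also returns the row holding 30,33,3333, while delimited(3) returns only the row that really lists artist 3. Either form still has to scan every row, which is part of why normalising tends to perform better.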

Table with a lot of attributes

I'm planning to build a database project.
One of the tables has a lot of attributes.
My question is: which is better, to divide the class into 2 separate tables or to put everything into one table? Below is an example:
create table User ( id, name, surname,... show_name, show_photos, ...)
or
create table User ( id, name, surname,... )
create table UserPrivacy (usr_id, show_name, show_photos, ...)
I suppose the performance is similar, since I can use an index.
It's best to put all the attributes in the same table.
If you start storing attribute names in a table, you're storing meta data in your database, which breaks first normal form.
Besides, keeping them all in the same table simplifies your queries.
Would you rather have:
SELECT show_photos FROM User WHERE user_id = 1
Or
SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1
Joins are okay, but keep them for associating separate entities and 1->N relationships.
There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.
There are legitimate reasons for storing name-value pairs in a separate table, but fear of adding columns isn't one of them. For example, creating a name-value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database engines, and access layers such as PDO in PHP, include reflection methods whereby you can easily get a list of columns for a table (attributes for an entity).
Also, please note that your id field on User should be user_id, not just id, unless you're using a framework such as Ruby on Rails, which expects just id by convention. 'user_id' is preferred because with just id, your joins look like this:
ON u.id = up.user_id
Which seems odd, and the preferred way is this:
ON u.user_id = up.user_id
or more simply:
USING(user_id)
Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.
I'd say the 2 separate tables, especially if you are using an ORM. In most cases it's best to have each table correspond to a particular object and have its fields or "attributes" be things that are required to describe that object.
You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.
You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.
This will help you to keep the main table smaller.
If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.
Since this appears to be a one to one relationship, I would normally keep it all in one table unless:
You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.
Or if you will normally be querying the main table separately and won't need those fields much of the time.
If some columns are not identified by or dependent on the primary key, or hold values drawn repeatedly from a fixed set, make a different table for those columns and maintain a one-to-one relationship.
Why not have a User table and Features table, e.g.:
create table User ( id int primary key, name varchar(255) ... )
create table Features (
user_id int,
feature varchar(50),
enabled bit,
primary key (user_id, feature)
)
Then the data in your Features table would look like:
| user_id | feature     | enabled |
| 291     | show_photos | 1       |
| 291     | show_name   | 1       |
| 292     | show_photos | 0       |
| 293     | show_name   | 0       |
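Reading a flag from such a Features table might look like the Python sketch below, again with SQLite (standard sqlite3 module) standing in for the real database. The is_enabled() helper and its default argument are assumptions; the answer does not say what should happen when a user has no row for a feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Features (
        user_id INTEGER,
        feature TEXT,
        enabled INTEGER,
        PRIMARY KEY (user_id, feature)
    );
    INSERT INTO Features VALUES (291, 'show_photos', 1), (291, 'show_name', 1),
                                (292, 'show_photos', 0), (293, 'show_name', 0);
""")

def is_enabled(user_id, feature, default=False):
    """Look up one flag; fall back to a default when the user has no row for it."""
    row = conn.execute(
        "SELECT enabled FROM Features WHERE user_id = ? AND feature = ?",
        (user_id, feature),
    ).fetchone()
    return default if row is None else bool(row[0])
```

The missing-row case is the main design decision with this layout: every caller has to agree on the default, whereas a plain column can declare it once in the schema.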
I would suggest something different. It seems likely that in the future you will be asked for 'yet another attribute' to manage. Rather than add a column, you could just add a row to an attributes table:
TABLE Attribute
(
ID
Name
)
TABLE User
(
ID
...
)
TABLE UserAttributes
(
UserID FK User.ID
Attribute FK Attribute.ID
Value...
)
Good comments from everyone. I should have been clearer in my response.
We do this quite a bit to handle special cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVPs (name-value pairs) into columns in a query - we're always asking "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. These would otherwise be a ton of boolean columns, most of them false or NULL for most customers, and the features tend to grow in number, so this works well for us.
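The "is the attribute listed for this customer?" check can be sketched as an EXISTS query. The Python example below uses SQLite (standard sqlite3 module) and the three tables outlined above; the attribute names custom_banner and beta_checkout and the helper has_attribute() are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attribute (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE User (ID INTEGER PRIMARY KEY);
    CREATE TABLE UserAttributes (
        UserID INTEGER REFERENCES User(ID),
        Attribute INTEGER REFERENCES Attribute(ID),
        PRIMARY KEY (UserID, Attribute)
    );
    -- Hypothetical sample data.
    INSERT INTO Attribute VALUES (1, 'custom_banner'), (2, 'beta_checkout');
    INSERT INTO User VALUES (10), (20);
    INSERT INTO UserAttributes VALUES (10, 1);
""")

def has_attribute(user_id, name):
    """True when a row links the user to the named attribute; absence means False."""
    sql = """
        SELECT EXISTS (
            SELECT 1
            FROM UserAttributes AS ua
            JOIN Attribute AS a ON a.ID = ua.Attribute
            WHERE ua.UserID = ? AND a.Name = ?
        )
    """
    return bool(conn.execute(sql, (user_id, name)).fetchone()[0])
```

Presence of a row is the whole signal here, so no Value column is needed for pure on/off features, and adding a new feature only means inserting a row into Attribute.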