Sphinx search query - condition for numbers in varchar column - mysql

I have a list of items, each with the categories it belongs to:
id | name | categories(varchar)
1 | bike red | 2,5,18
2 | bike black | 4,7,13
With Sphinx I need to search for, for example: bike AND only from category 5.
Is there a good way to search in the categories column?
In MySql I could write: WHERE name LIKE '%bike%' AND categories LIKE '%5%'
But my Sphinx index is big, so such a search might not be efficient. Is there a way to create something like an integer ENUM list? What would be a good solution?
Thanks

Sphinx has Multi-Value Attributes (MVA), http://sphinxsearch.com/docs/current.html#mva, which are pretty much perfect for this!
They work kinda like a numeric SET in MySQL (you have multiple categories, so a set, not an enum).
Sphinx will even automatically parse a string list of numbers separated by commas during indexing.
sql_query = SELECT id,name,categories FROM item
sql_attr_multi = uint categories from field;
Then a SphinxQL query:
SELECT * FROM item WHERE MATCH('bike') AND categories=5
(This may look confusing if you are familiar with MySQL: an equality filter on an MVA attribute actually means "equals one of the values". If you prefer, you could write categories IN (5), with the same effect.)
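To make the MVA semantics concrete, here is a minimal Python sketch. It is not Sphinx code, and the function names are hypothetical. It parses the comma-separated string into a set of integers, roughly what the indexer does with `from field`, and treats `categories=5` as set membership. It also shows why the `LIKE '%5%'` idea from the question is unreliable: a substring test matches category 15 as well.

```python
# Hypothetical illustration of Multi-Value Attribute (MVA) semantics; not Sphinx's API.

def parse_mva(value):
    """Turn a comma-separated list like '2,5,18' into a set of integers."""
    return {int(part) for part in value.split(",") if part.strip()}

def mva_equals(values, target):
    """An equality filter on an MVA is true if ANY of the values equals the target."""
    return target in values

items = [
    (1, "bike red", "2,5,18"),
    (2, "bike black", "4,7,13"),
    (3, "bike blue", "15"),  # hypothetical extra row to show the LIKE pitfall
]

# Roughly: MATCH('bike') AND categories=5 (the substring test stands in for MATCH)
mva_hits = [i for i, name, cats in items
            if "bike" in name and mva_equals(parse_mva(cats), 5)]

# Substring test, like MySQL's categories LIKE '%5%'
like_hits = [i for i, name, cats in items
             if "bike" in name and "5" in cats]

print(mva_hits)   # [1]
print(like_hits)  # [1, 3]
```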

Related

How to search a MySQL table for comma-separated elements from another table?

I have two tables, 'A' and 'B'. Table A has a field containing comma-separated values, as follows:
Table A (fields: project_name, model_types)
| project_name | model_types |
project_animals | detection,segmentation,detection,classification
I have table B with some information related to each model type listed above.
Table B (fields : model,labels,image_types)
| model | labels | image_types |
detection | cat,dog | jpg,png
segmentation | rat,dog | jpg,tif
classification| cow,cat | bmp,png
I need to read the labels and image_types for each model type listed in table A's comma-separated string (no need to deduplicate).
Using the following SQL script, I can get the model_types string:
select model_types from A where project_name = 'project_animals'
This will return model_types = 'detection,segmentation,detection,classification'.
So instead of splitting the string outside MySQL (model_types.split(',')) and then reading table B for each item separately, how could I do it all at once in MySQL?
I need the following results from a single MySQL statement:
Model_types | labels | image_types
detection | cat,dog | jpg,png
segmentation | rat,dog | jpg,tif
detection | cat,dog | jpg,png
classification | cow,cat | bmp,png
Is it even possible?
Yes, it is possible, but you probably won't like the result. It's not possible to optimize searches on comma-separated strings (or any other substring matching, regular expressions, etc.). So the query is bound to do a table scan to find matching rows in table B.
SELECT tableB.*
FROM tableA
JOIN tableB ON FIND_IN_SET(tableB.model, tableA.model_types)
WHERE project_name = 'project_animals';
The FIND_IN_SET() expression can't use an index. Also it won't work if your comma-separated list contains spaces.
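The behavior of FIND_IN_SET() described above can be illustrated with a small Python re-implementation. This is a hypothetical sketch for illustration only. It compares strings exactly, whereas MySQL applies the column collation (often case-insensitive). It shows that the join has to test every row of table B, and that a space after a comma prevents a match. Also note that, as in the SQL join, each row of table B is returned at most once, even though detection appears twice in the list.

```python
# Hypothetical Python re-implementation of MySQL's FIND_IN_SET(str, strlist),
# for illustration only; collation rules (e.g. case-insensitivity) are ignored.

def find_in_set(needle, strlist):
    """Return the 1-based position of needle in the comma-separated strlist,
    0 if it is absent, or None if either argument is None."""
    if needle is None or strlist is None:
        return None
    if strlist == "":
        return 0
    items = strlist.split(",")  # items are NOT trimmed, so spaces matter
    return items.index(needle) + 1 if needle in items else 0

model_types = "detection,segmentation,detection,classification"
table_b = ["detection", "segmentation", "classification", "tracking"]  # 'tracking' is hypothetical

# Emulates: JOIN tableB ON FIND_IN_SET(tableB.model, tableA.model_types)
# Every row of table B has to be tested, which is why no index can help.
matched = [model for model in table_b if find_in_set(model, model_types)]

print(matched)  # ['detection', 'segmentation', 'classification']
print(find_in_set("segmentation", "detection, segmentation"))  # 0, because of the space
```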
The proper way to store and query multi-valued attributes is to create another child table, and store multiple rows, with one model value per row.
For example, you could create a table project_models:
project_name | model
project_animals | detection
project_animals | segmentation
project_animals | detection
project_animals | classification
Then join this way:
SELECT tableB.*
FROM project_models
JOIN tableB ON tableB.model = project_models.model
WHERE project_models.project_name = 'project_animals';
This can use an index on tableB.model to optimize the join.
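The child-table design can be tried end to end with the sketch below. It uses SQLite only as a self-contained stand-in for MySQL, and the index name is hypothetical. Because there is one row per list entry, detection comes back twice, which matches the output requested in the question. The primary key on tableB.model provides the index the join can use.

```python
import sqlite3

# SQLite stands in for MySQL; schema and query mirror the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableB (
        model       TEXT PRIMARY KEY,   -- primary key => indexed join column
        labels      TEXT,
        image_types TEXT
    );
    CREATE TABLE project_models (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,  -- preserves insertion order
        project_name TEXT NOT NULL,
        model        TEXT NOT NULL
    );
    CREATE INDEX idx_project_models_project ON project_models(project_name);
""")
conn.executemany("INSERT INTO tableB VALUES (?, ?, ?)", [
    ("detection", "cat,dog", "jpg,png"),
    ("segmentation", "rat,dog", "jpg,tif"),
    ("classification", "cow,cat", "bmp,png"),
])
conn.executemany(
    "INSERT INTO project_models (project_name, model) VALUES (?, ?)",
    [("project_animals", m)
     for m in ["detection", "segmentation", "detection", "classification"]],
)

rows = conn.execute("""
    SELECT tableB.*
    FROM project_models
    JOIN tableB ON tableB.model = project_models.model
    WHERE project_models.project_name = ?
    ORDER BY project_models.id
""", ("project_animals",)).fetchall()

for row in rows:
    print(row)
```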
Besides this optimization problem, using a comma-separated list causes numerous other problems. See my answer to "Is storing a delimited list in a database column really that bad?"

MySQL: two ways to SELECT with WHERE. Which one uses fewer resources and is faster?

For example, I have a URL like domain.com/transport/cars.
Based on the URL, I want to select from MySQL and show a list of ads for cars.
I want to choose the fastest method (the one that takes the least time to show results and uses the fewest resources).
Comparing two ways:
First way
MySQL table transport with rows like:
FirstLevSubcat | Text
---------------------------------
1 | Text1 car
2 | Text1xx lorry
1 | Text another car
FirstLevSubcat type is INT.
Then another MySQL table, subcategories:
Id | NameOfSubcat
---------------------------------
1 | cars
2 | lorries
3 | dogs
4 | flats
A query like:
SELECT Text, AndSoOn FROM transport
WHERE
FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'cars')
Or, instead of SELECT Id FROM subcategories, get the Id from an XML file or a PHP array.
Second way
MySQL table transport with rows like:
FirstLevSubcat | Text
---------------------------------
cars | Text1 car
lorries | Text1xx lorry
cars | Text another car
FirstLevSubcat type is VARCHAR or CHAR.
And the query is simply:
SELECT Text, AndSoOn FROM transport
WHERE FirstLevSubcat = 'cars'
Please advise which way would use fewer resources and take less time to show results. I read that it is better to select WHERE on an int than on a varchar (SQL SELECT speed int vs varchar).
So, as I understand it, the first way would be better?
The first design is much better, because you separate two facts in your data:
There is a category 'cars'.
'Text1 car' is in the Category 'cars'.
Imagine that in your second design you enter another car but type 'cors' instead of 'cars'. The DBMS doesn't notice this, so you have created another category with a single entry. (Well, in MySQL you could use an ENUM column instead to circumvent this issue, but this is not available in most other DBMSs. And anyhow, whenever you want to rename a category, say from 'cars' to 'vans', you would have to change all existing records and alter the table, instead of simply renaming the entry once in the subcategories table.)
So stay away from your second design.
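The two benefits of the first design can be demonstrated with the sketch below: typos are caught, and renaming a category touches one row. It uses SQLite as a stand-in for MySQL, and `add_ad` is a hypothetical helper. The foreign key rejects an ad pointing at a nonexistent category. Renaming 'cars' to 'vans' is a single UPDATE, and the ads follow automatically.

```python
import sqlite3

# Sketch of the first (normalized) design; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE subcategories (
        Id           INTEGER PRIMARY KEY,
        NameOfSubcat TEXT NOT NULL UNIQUE
    );
    CREATE TABLE transport (
        FirstLevSubcat INTEGER NOT NULL REFERENCES subcategories(Id),
        Text           TEXT
    );
    INSERT INTO subcategories VALUES (1, 'cars'), (2, 'lorries');
    INSERT INTO transport VALUES (1, 'Text1 car'), (2, 'Text1xx lorry'),
                                 (1, 'Text another car');
""")

def add_ad(category_name, text):
    """Hypothetical helper: resolve the category by name so a typo fails
    loudly instead of silently creating a new category."""
    row = conn.execute("SELECT Id FROM subcategories WHERE NameOfSubcat = ?",
                       (category_name,)).fetchone()
    if row is None:
        raise ValueError(f"unknown category: {category_name!r}")
    conn.execute("INSERT INTO transport VALUES (?, ?)", (row[0], text))

try:
    add_ad("cors", "Text yet another car")  # the typo from the example
    typo_rejected = False
except ValueError:
    typo_rejected = True

try:
    conn.execute("INSERT INTO transport VALUES (99, 'orphan ad')")  # no category 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Renaming 'cars' to 'vans' changes a single row in subcategories.
conn.execute("UPDATE subcategories SET NameOfSubcat = 'vans' WHERE Id = 1")
vans = conn.execute("""
    SELECT Text FROM transport
    WHERE FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'vans')
    ORDER BY rowid
""").fetchall()
print(typo_rejected, orphan_rejected, vans)
```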
As to Praveen Prasannan's comment on subqueries and joins: that is nonsense. Your query is straightforward and good. You want to select from transport where the category is the desired one. Perfect. There are two groups of people who would prefer a join here:
Beginners who simply don't know better and always join from the start and try to sort things out in the end.
Experienced programmers who know that some DBMSs often handle joins better than subqueries. But this is a pessimistic habit. It is better to write your queries so that they are easy to read and maintain, as you are already doing, and to change this only if serious performance issues occur.
Yup. As the SO link in your question suggests, int comparison is faster than character comparison and yields faster fetches. Keeping this in mind, the first design would be considered the better design. However, subqueries are never recommended. Use a join instead.
E.g.:
SELECT t.Text, t.AndSoOn FROM transport t
INNER JOIN subcategories s ON s.ID = t.FirstLevSubcat
WHERE s.NameOfSubcat = 'cars'
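To check that the subquery from the question and the join suggested here return the same ads, you can run both on the sample rows. The sketch below does that with SQLite as a stand-in for MySQL. It says nothing about relative speed, which you would need to measure with EXPLAIN on your own server. The results agree here because NameOfSubcat is declared UNIQUE, so the scalar subquery yields at most one Id.

```python
import sqlite3

# SQLite stands in for MySQL; sample data taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subcategories (Id INTEGER PRIMARY KEY, NameOfSubcat TEXT UNIQUE);
    CREATE TABLE transport (FirstLevSubcat INTEGER, Text TEXT);
    INSERT INTO subcategories VALUES (1, 'cars'), (2, 'lorries'), (3, 'dogs'), (4, 'flats');
    INSERT INTO transport VALUES (1, 'Text1 car'), (2, 'Text1xx lorry'), (1, 'Text another car');
""")

subquery = """
    SELECT Text FROM transport
    WHERE FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'cars')
    ORDER BY Text
"""
join = """
    SELECT t.Text FROM transport t
    INNER JOIN subcategories s ON s.Id = t.FirstLevSubcat
    WHERE s.NameOfSubcat = 'cars'
    ORDER BY t.Text
"""
via_subquery = conn.execute(subquery).fetchall()
via_join = conn.execute(join).fetchall()
print(via_subquery == via_join, via_join)
```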

Searching a \n-Separated Column

I have a district table in which we store users' preferred districts in the district_id field (VARCHAR(250)). The values stored in this field look like 1 2 5 6 1, separated by \n. How can I search this specific column?
Don't. Your design is absolutely horrible and this is why you are having this issue in the first place.
When you have an N-N relationship (a user can have many preferred districts and each district can be preferred by many users), you need to make a middle table with foreign keys to both tables.
You need:
A table for districts with only information about districts.
A table with users with only information about users.
A table for preferred districts by user with the district number and the user id as columns and foreign key constraints. This will make sure that any user can have an unlimited number of preferred districts with easy querying.
I would not recommend performing searches on data stored that way, but if you are stuck it can be done with regular expressions.
You also have to handle matches at the start and end of the string, so a plain LIKE is not going to work.
MySQL Regular Expressions
Give this SQL a try to search for the number 5:
SELECT * FROM `TABLE` WHERE `field` REGEXP '(^|\n)5(\n|$)';
If you want to match using LIKE instead, it can be done with multiple conditions:
SELECT * FROM `TABLE` WHERE `field` = '5' OR `field` LIKE '5\n%' OR `field` LIKE '%\n5\n%' OR `field` LIKE '%\n5';
Note that inside a MySQL string literal \n already stands for the newline character, so a single backslash is enough (in a LIKE pattern, a doubled \\n would instead match the letter n).
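The anchoring problem can be seen with Python's `re` module, used here only to illustrate the regex logic (MySQL's REGEXP engine differs in details). When the surrounding newlines are optional, as in `\n?(5)\n?`, the pattern matches any value containing the digit 5, including 15. Requiring the start of the string or a newline on each side makes 5 match only as a whole entry. The stored value is a hypothetical example.

```python
import re

# Python's re illustrates the regex logic; MySQL's REGEXP engine differs in details.
stored = "1\n2\n15\n6"  # hypothetical column value: IDs separated by newlines

loose = re.compile(r"\n?(5)\n?")         # newlines optional => any '5' matches
anchored = re.compile(r"(^|\n)5(\n|$)")  # 5 must be a whole entry

print(bool(loose.search(stored)))        # True  (it matched the 5 inside 15)
print(bool(anchored.search(stored)))     # False
print(bool(anchored.search("1\n5\n6")))  # True
```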
The easiest way is to just use a LIKE query, like this:
SELECT * FROM `preferred_districts` WHERE `district_id` LIKE '%6%';
Because this will also match IDs such as 16, 26, 674, etc., you'll have to check each result manually to make sure it's the right one. In PHP (not sure whether you use it), you could use the snippet below:
$id_field = "1\n2\n5\n6\n17"; // value as stored: IDs separated by newlines
$ids = explode("\n", $id_field);
// strict comparison against the string '6', so '16' or '26' never match
if (in_array('6', $ids, true)) {
    echo 'Yup, found the right one';
}
Important: although the above will work, your database design isn't what it should be. You should create what is sometimes called a pivot table between the districts and the users, something like below.
(Table 'users_preferred_districts')
user_id | district_id
--------+------------
2 | 1
2 | 17
9 | 21
Like this it's quite easy to retrieve the records you want...
I used the MySQL function FIND_IN_SET() and got the desired result with it.
I got help from this tutorial.
http://www.w3resource.com/mysql/string-functions/mysql-find_in_set-function.php

Searching a many-to-many relationship with a wildcard: performance issues

I am building a database for an app and I am testing for performance issues on a larger dataset. I generated about 250,000 location records. Each location can be assigned to many categories and a category can be assigned to many locations. My dataset has 2-4 categories assigned to each location.
I want to let the user search for locations by filtering the allowed categories with a wildcard search. For example, I might want to match all categories with the word "red" in them: if I type red, it shows all locations that have a category title containing "red". In addition, I would like to wildcard-search the location title with the same string.
I wrote a query that works, but performance is awful on large datasets. Essentially I am using inner queries, which is fine if my limit is set and results are found quickly (around .05ms). If no results are found right away, it seems to go through the whole database, and the query takes around 9-10 seconds.
Here is a simplified layout of my database:
locations: id | title | address
categories: id | title
locations_categories: id | location_id | category_id
Here is the query I currently am using:
SELECT `id`,`title`,`address`
FROM (`locations`)
WHERE title LIKE '%string%'
AND id IN (
SELECT location_id
FROM locations_categories
JOIN categories ON categories.id = locations_categories.category_id
WHERE categories.title LIKE '%string%')
First of all, your main query just uses the value of the subquery, so it can be rewritten:
SELECT location_id
FROM locations_categories
JOIN categories ON categories.id = locations_categories.category_id
WHERE categories.title LIKE '%string%'
But I'd propose splitting this query up, because JOINs are slow for big datasets. The first query gets the necessary category IDs (with paging):
SELECT id
FROM categories
WHERE title LIKE '%string%' LIMIT <start>, <step>
Then you can get locations_categories:
SELECT location_id FROM locations_categories WHERE category_id IN (...)
And you'll use the location IDs you've got to retrieve corresponding records:
SELECT * FROM locations WHERE id IN (...)
These 3 queries combined will be much faster than your original one.
Also, make sure your title column is indexed, since it can be the bottleneck. But since you have a wildcard at the start of the search term, you'll have to use a FULLTEXT index here.
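The three-step approach in this answer can be sketched as follows. SQLite stands in for MySQL, and the data set is tiny and hypothetical. Each `IN (...)` list is built with one `?` placeholder per ID, so the IDs are passed as parameters instead of being pasted into the SQL string. Like the answer, this sketch filters only on category titles. Paging is written as `LIMIT ? OFFSET ?`, which corresponds to MySQL's `LIMIT <start>, <step>`.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, title TEXT, address TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE locations_categories (
        id INTEGER PRIMARY KEY, location_id INTEGER, category_id INTEGER
    );
    INSERT INTO locations VALUES (1, 'Cafe', '1 Main St'), (2, 'Bakery', '2 Oak Ave'),
                                 (3, 'Garage', '3 Elm Rd');
    INSERT INTO categories VALUES (1, 'red wine'), (2, 'bread'), (3, 'infrared repair');
    INSERT INTO locations_categories (location_id, category_id)
        VALUES (1, 1), (2, 2), (3, 3), (1, 2);
""")

def in_clause(values):
    """Return a '?, ?, ?' placeholder list for an IN (...) clause."""
    return ", ".join("?" for _ in values)

def search(term, start=0, step=20):
    pattern = f"%{term}%"
    # 1) Matching category IDs, paged.
    cat_ids = [r[0] for r in conn.execute(
        "SELECT id FROM categories WHERE title LIKE ? ORDER BY id LIMIT ? OFFSET ?",
        (pattern, step, start))]
    if not cat_ids:
        return []
    # 2) Location IDs for those categories (DISTINCT keeps step 3's list short).
    loc_ids = [r[0] for r in conn.execute(
        f"SELECT DISTINCT location_id FROM locations_categories "
        f"WHERE category_id IN ({in_clause(cat_ids)})", cat_ids)]
    if not loc_ids:
        return []
    # 3) The location records themselves.
    return conn.execute(
        f"SELECT * FROM locations WHERE id IN ({in_clause(loc_ids)}) ORDER BY id",
        loc_ids).fetchall()

print(search("red"))  # [(1, 'Cafe', '1 Main St'), (3, 'Garage', '3 Elm Rd')]
```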
Your EXPLAIN plan will confirm (or disprove) this, but I suspect that your issue is that the leading % in the clauses
WHERE categories.title LIKE '%string%'
and
WHERE title LIKE '%string%'
forces full table scans. Addressing this often requires some knowledge of the domain and application in question.
The simple approach is to only search for 'starts with'. Others include full-text searching, function-based indexes, and a 'grouping table' that presorts and lists the relevant records for known searches.
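The reason a 'starts with' search helps can be shown with an analogy in Python. This is not how MySQL is implemented; it only mirrors the idea. An index keeps its keys sorted, much like the sorted list below. A prefix search can jump straight to the first candidate with binary search and stop at the first non-match. A `%string%` search has no such starting point, so every key must be examined (note how 'tired cafe' matches only the substring test).

```python
import bisect

# Analogy only: a sorted list stands in for an index's sorted keys.
titles = sorted(["blue house", "red barn", "red door", "redwood park", "tired cafe"])

def starts_with(keys, prefix):
    """Range scan on sorted keys, like LIKE 'prefix%' using an index."""
    i = bisect.bisect_left(keys, prefix)  # jump to the first key >= prefix
    hits = []
    while i < len(keys) and keys[i].startswith(prefix):
        hits.append(keys[i])
        i += 1
    return hits

def contains(keys, needle):
    """Full scan, like LIKE '%needle%': every key must be examined."""
    return [k for k in keys if needle in k]

print(starts_with(titles, "red"))  # ['red barn', 'red door', 'redwood park']
print(contains(titles, "red"))     # ['red barn', 'red door', 'redwood park', 'tired cafe']
```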

In MySQL, is there a way to query for a string in the DB but only do an alphanumeric comparison (ignoring special chars)?

I'm trying to figure out whether it is possible to query a MySQL table for a string but have it ignore special characters in the field being queried.
An example may make clearer what I'm trying to achieve.
Suppose I had a table named "Games" with two columns, "id" and "title", containing these rows:
id title
-----------------------
1 f-zero
2 quake
3 quake 4
And I wanted to be able to search for "fzero" (notice the search string has no hyphen), i.e.:
SELECT g.* FROM Games as g WHERE alphanumeric(title) = "fzero";
Would this be possible in one way or another?
Thanks in advance!
If you have some structure in the title attribute values, you can use a regexp condition, like:
SELECT g.* FROM Games as g WHERE title REGEXP '^[^[:alnum:]]*f[^[:alnum:]]*z[^[:alnum:]]*e[^[:alnum:]]*r[^[:alnum:]]*o[^[:alnum:]]*$';
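Both ideas from this thread are sketched below in Python, and the function names are hypothetical. The first normalizes the title by stripping non-alphanumeric characters, like the `alphanumeric()` imagined in the question, and compares the result. The second builds a pattern, like the REGEXP above, that tolerates runs of non-alphanumeric characters between the letters of the search term. (MySQL 8.0 added REGEXP_REPLACE(), which could perform the stripping step in SQL. Check which version you run.) Neither approach can use an ordinary index on title.

```python
import re

# Illustration of the two approaches; function names are hypothetical.

def alphanumeric(text):
    """Drop every character that is not a letter or digit, lower-cased for comparison."""
    return "".join(ch for ch in text if ch.isalnum()).lower()

def loose_pattern(term):
    """Build a regex matching term with any run of non-alphanumeric (ASCII)
    characters allowed before, between and after its characters."""
    gap = r"[^0-9A-Za-z]*"
    body = gap.join(re.escape(ch) for ch in term)
    return re.compile("^" + gap + body + gap + "$", re.IGNORECASE)

games = [(1, "f-zero"), (2, "quake"), (3, "quake 4")]

by_normalizing = [gid for gid, title in games if alphanumeric(title) == "fzero"]
by_regex = [gid for gid, title in games if loose_pattern("fzero").match(title)]

print(by_normalizing, by_regex)  # [1] [1]
```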