MySQL full text search matching similar results - mysql

I'll try to explain my situation: I'm trying to create a search engine for products on my website, so when the user needs to find a product I need to show similar ones, here's an example.
User searches:
assassins creed OR assassinscreed OR aSsAssIn's CreeD assuming there are no letters/numbers mispelling (those 3 queries should produce the same result)
Expected results:
Assassin's Creed AND Assassin's Creed: Unity AND Assassin's Creed: Special Edition
What have I tried so far
I have created a MySQL field for the search engine which contains a parsed name of the product (Assassin's Creed: Unity -> assassinscreedunity
I parse the search query
I search using MySQL's INSTR()
My problem
I'm fine by using this, but I heard it can be slow when the number of rows increases, I've created a full-text index in my table, but I don't think it would help, so I need another solution.
Thanks for any answer, and ask me anything before downvoting.

First of all, you should keep track of performance issues in your queries more precisely than 'heard it cand be slow' and 'think it would help'. One starting point may be the Slow Query Log.
If you have a table which contains the same parsed name in more than one row, consider normalizing your database. In the specific case, store unique parsed names in one table, and only the id of the corresponding parsed name in the table you described in your question. This way, you only need to check the smaller table with unique names and can then quickly find all matching entries in the main table by id.
Example:
Consider the following table with your structure
id | product_name | rating
-----------------------------------
1 | assassinscreedunity | 5
2 | assassinscreedunity | 2
3 | monkeyisland | 3
4 | monkeyisland | 5
5 | assassinscreedunity | 4
6 | monkeyisland | 4
you would have to scan all six entries to find relevant rows.
In contrast, consider two tables like this:
id | p_id | rating
--------------------
1 | 1 | 5
2 | 1 | 2
3 | 2 | 3
4 | 2 | 5
5 | 1 | 4
6 | 2 | 4
id | name
--------------------------
1 | assassinscreedunity
2 | monkeyisland
In this case, you only have to scan two entries (compared to six) and can then efficiently look up relevant rows using the integer id.
To further enhance the performance, you could extend the concept of a parsed name and use hashes. For example, you could calculate the SHA1-hash of your parsed name which is a 160 bit value. You can find entries in your database for this value very efficiently. To match substrings, you can add them to the second table as well. Since the hash only needs to computed once, you still can use the database to match by an integer. Another thing for you might be fuzzy hashing.
In addition, you should read up on the Rabin–Karp algorithm or string searching in general.

Related

Combining multiple select queries in to one table

I've been racking my brains for a while and I'm sure there is a simple solution to it, but for the life of me it's not obvious.
I want to query a database in MySQL Workbench to return a set of serial numbers for a given part number, which is of a fairly basic form:
Select serial_num as sn12345
from process
where part_number = 12345
Thus my output is
sn12345
---------------
0000001
0000002
etc
Now I have a number of part numbers I want to get the serial numbers of so that my output is like
sn12345 | sn12346 | sn12347 |
------------------------------------------
0000001 | 0000005 | 0000008 |
0000002 | 0000006 | 0000009 |
Assume that there are more columns than just these. However, I do not want to UNION the query as I want output in individual columns. Also, there may be different numbers of serial number entries for each part number, i.e 100 for one, but 1000 for a second, and 5 for a third, etc, so I'll probably have a lot of NULL entries.
Thanks in advance!

Most efficient, scalable mysql database design

I have interesting question about database design:
I come up with following design:
first table:
**Survivors:**
Survivor_Id | Name | Strength | Energy
second table:
**Skills:**
Skill_Id | Name
third table:
**Survivor_skills:**
Surviror_Id |Skill_Id | Level
In first table Survivors there will be many records and will grow from time to time.
In second table will be just few skills which can survivors learn (for example: recoon (higher view range), sniper (better accuracy), ...). Theese skills aren't like strength or energy which all survivors have.
Third table is the most interesting, there survivors and skills join together. Everything will work just fine but I am worried about data duplication.
For example: survivor with id 1 will have 5 skills so first table would look like this:
// survivor_id | level_id | level
1 | 1 | 2
1 | 2 | 3
1 | 3 | 1
1 | 4 | 5
1 | 5 | 1
First record: survivor with id 1 has skill with id 1 on level 2
Second record ...
Is this proper approach or should I use something different.
Looks good to me. If you are worried about data duplication:
1) your server-side code should be gear to not letting this happen
2) you could check before inserting if it already exists
3) you could use MYSQL: REPLACE INTO - this will replace duplicate rows if configure proerply, or insert new ones (http://dev.mysql.com/doc/refman/5.0/en/replace.html)
4) set a unique index on columns where you want only unique rows, e.g. level_id, level
I concur with the others - this is the proper approach.
However, there is one aspect which hasn't been discussed: the order of columns in the composite key {Surviror_Id, Skill_Id}, which will be governed by the kinds of queries you need to run...
If you need to find skills of the given survivor, the order needs to be: {Surviror_Id, Skill_Id}.
If you need to find survivors with the given skill, the order needs to be: {Skill_Id, Surviror_Id}.
If you need both, you'll need both the key (and the implied index) on {Surviror_Id, Skill_Id} and an index on {Skill_Id, Surviror_Id}1. Since InnoDB tables are clustered, accessing Level through that secondary index requires double-lookup - to avoid that, consider using a covering index {Skill_Id, Surviror_Id, Level} instead.
1 Or vice-verse.

node.js fetch mysql data using two tables

Table lists
id | user_id | name
1 | 3 | ListA
2 | 3 | ListB
Table celebrities
id | user_id | list_id | celebrity_code
1 | 3 | 1 | AA000297
2 | 3 | 1 | AA000068
3 | 3 | 2 | AA000214
4 | 3 | 2 | AA000348
I am looking a JSON object like this
[
{id:1, name:'ListA', celebrities:[{celebrity_code:AA000297},{celebrity_code:AA000068}]},
{id:2, name:'ListB', celebrities:[{celebrity_code:AA000214},{celebrity_code:AA000348}]}
]
Moved this to an answer since the details were getting long, and I thought the additional references would be useful to future readers.
Since you are using MySQL, check out GROUP_CONCAT. To get your object, you will want to GROUP_CONCAT on a CONCATenated string. If you could live with a schema more like {id:2, name:'ListB', celebrity_codes:['AA000214','AA000348']} you'll have a simpler query. If you make a SQLfiddle of your basic schema (basically your create tables plus the inserts of the above sample data), someone might even write it for you. :-)
To be clear, while GROUP_CONCAT can do this, if you are trying to generate more than a fairly simple schema, it gets to be some pretty messy code and it starts making more and more sense to move it into your application layer both from a code maintenance standpoint as well as performance & scalability considerations.
Also note that SQLLite supports GROUP_CONCAT, for other databases:
Postgres user should look at string_agg
SQL Server users should check out this project on CodePlex.
Oracle users can use MODEL, as illustrated here.

mysql lookup table

Lookup table - unique row identity
The other lookup tables just do not make sense as from what I have seen giving a row an ID then putting that id in another table which also has a id then adding these id's to some more tables which may reference them and still creating a lookup tables with more id's (this is how all the examples I can find seem) What I have done is this :
product_item - table
------------------------------------------
id | title | supplier | price
1 | title11 | suuplier1 | price1
etc.
it then goes on to include more items (sure you get it)
product_feature - table
--------------------------
id | title | iskeyfeature
1 | feature1 | true
feature_desc - table
-----------------------------
id | title | desc
1 | desc1 | text description
product_lookup - table
item_id | feature_id | feature_desc
1 | 1 | 1
1 | 2 | 2
1 | 3 | 3
1 |64 | 15
(as these only need to be referenced in the lookup the id's can be multiples per item or multiple items per feature)
What I want to do without adding item_id to every feature row or description row is retrieve only the columns from the multiple tables where their id is referenced in the same row of the lookup table. I want to know if it is possible to select all the referenced columns from the lookup row if I only know the item_id eg. Item_id = 1 return all rows where item_id = 1 with the columns referenced in the same row. Every item can have multiple features and also every feature could be attached to multiple items , this will not matter if I can just get the pattern right in how to construct this query from a single known value.
Any assistance or just some direction will be greatly appreciated. I'm using phpmyadmin, and sure this will be easier with some php voodoo I am learning mysql from tutorials ect and would like to know how to do it with sql directly.
Having a NULL value in a column is not the major concern that would lead to this design - it's the problem with adding new attribute columns in the future, at which MySQL is disgracefully bad.
If you want to make a query that returns everything about an item in one row, you need to LEFT OUTER JOIN back to the product_lookup table for each feature_id. This is about every 10th mysql question on Stack Overflow, so you should be able to find tons of examples.

mysql fast select query without reading all db

I have a large database with two tables: stat and total.
The example of the relation is the following:
STAT:
| ID | total event |
+--------+--------------+
| 7 | 2 |
| 8 | 1 |
TOTAL:
|ID | Event |
+---+--------------+
| 7 | "hello" |
| 7 | "everybody" |
| 8 | "hi" |
This is a very simplified version; also consider that STAT table could have 500K records, and for each STAT I can have about 200 TOTAL rows.
Currently, if I run a simple SELECT query in table TOTAL the system is terribly slow.
Could anyone help me with some advice for the creation of the TOTAL table? Is it possible to say to MySQL that the id column is already sorted so that there is no reason to scan all the rows till the end where, for example, id=7?
Add INDEX(ID) to your tables (both), if you did not already.
SELECT COUNT(*) FROM TOTAL WHERE ID=7 -> if ID is indexed, this will be fast.
You can add an index, and furthermore you can partition your table.
As per #ypercube's comment, tables are not stored in a sorted state, so one cannot "tell" this to the database. However you can add an index on tables to make them faster to search.
One important thing to check - it looks like TOTAL.ID is intended as a foreign key - if so, the table TOTAL should have a primary key called ID. Rename the existing column of that name to STAT_ID instead, so it is obvious what it is a foreign key for. Then add an index on STAT_ID.
Lastly, as a point of style, I recommend that you make your table and column names case-insensitive, and write them in lower-case. It makes it easier to read SQL when keywords are in upper case, and database objects are in lower.