I have two MySQL tables like below:
table_category
-----------------
id | name | type
1 | A | Cloth
2 | B | Fashion
3 | C | Electronics
4 | D | Electronics
table_product
------------------
id | cat_cloth | cat_fashion | cat_electronics
1 | 1 | 2 | 3
1 | NULL | 2 | 4
Here cat_cloth, cat_fashion and cat_electronics are IDs from table_category.
It is better to have another table for category type but I need a quick solution for now.
I want to get a list of categories with the total number of products in each. I wrote the following query:
SELECT table_category.*, table_product.id, COUNT(table_product.id) as count
FROM table_category
LEFT JOIN table_product ON table_category.id = table_product.cat_cloth
OR table_category.id = table_product.cat_fashion
OR table_category.id = table_product.cat_electronics
GROUP BY table_product.id
ORDER BY table_product.id ASC
Question: The SQL I wrote works, but I have more than 14K categories and 50K products, and the query runs very slowly. I added indexes on the cat_* columns, but there was no improvement. How can I optimize this query?
I found the query takes 3-4 minutes to process the volume of data I mentioned. I want to reduce the execution time.
Best Regards
As far as I can tell, every "OR", whether in the "ON" or the "WHERE" clause, is very expensive. It may sound stupid, but I would recommend making 3 separate small selects combined with UNION ALL.
We do this for similar problems in both MySQL and PostgreSQL, and in some cases where we got "resources exceeded" we had to do it for BigQuery as well. It is crude and means more work, but it certainly works and produces results much more quickly than many "OR"s.
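A minimal sketch of that UNION ALL idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The aliases (c, p, product_id, cat_id, product_count) are made up for illustration, and the second product is given id 2 on the assumption that the duplicate id 1 in the question is a typo. Whether this actually beats the OR join on your data is something to confirm with EXPLAIN.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema mirrors the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_category (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE table_product (
    id INTEGER PRIMARY KEY,
    cat_cloth INTEGER, cat_fashion INTEGER, cat_electronics INTEGER
);
INSERT INTO table_category VALUES
    (1, 'A', 'Cloth'), (2, 'B', 'Fashion'),
    (3, 'C', 'Electronics'), (4, 'D', 'Electronics');
-- Assumption: the second product has id 2 (the question shows id 1 twice).
INSERT INTO table_product VALUES (1, 1, 2, 3), (2, NULL, 2, 4);
""")

# Unpivot the three cat_* columns with UNION ALL, then join once on a
# single equality instead of three OR-ed conditions.
QUERY = """
SELECT c.id, c.name, c.type, COUNT(p.product_id) AS product_count
FROM table_category AS c
LEFT JOIN (
    SELECT id AS product_id, cat_cloth AS cat_id FROM table_product
    UNION ALL
    SELECT id, cat_fashion FROM table_product
    UNION ALL
    SELECT id, cat_electronics FROM table_product
) AS p ON p.cat_id = c.id
GROUP BY c.id, c.name, c.type
ORDER BY c.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this counts one product each for categories A, C and D and two for B. Rows whose cat_* value is NULL simply never match a category, so they do not inflate the counts.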
Here is my table structure:
// posts
+----+-----------+---------------------+-------------+
| id | title | body | keywords |
+----+-----------+---------------------+-------------+
| 1 | title1 | Something here | php,oop |
| 2 | title2 | Something else | html,css,js |
+----+-----------+---------------------+-------------+
// tags
+----+----------+
| id | name |
+----+----------+
| 1 | php |
| 2 | oop |
| 3 | html |
| 4 | css |
| 5 | js |
+----+----------+
// pivot
+---------+--------+
| post_id | tag_id |
+---------+--------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
+---------+--------+
As you see, I store keywords in two ways: both as a delimited string in a column named keywords and relationally in separate tables.
Now I need to select all posts that have specific keywords (for example php and html tags). I can do that in two ways:
1: Using unnormalized design:
SELECT * FROM posts WHERE keywords REGEXP 'php|html';
2: Using normalized design:
SELECT posts.id, posts.title, posts.body, posts.keywords
FROM posts
INNER JOIN pivot ON pivot.post_id = posts.id
INNER JOIN tags ON tags.id = pivot.tag_id
WHERE tags.name IN ('html', 'php')
GROUP BY posts.id
See? The second approach uses two JOINs. I guess it will be slower than using REGEXP in huge dataset.
What do you think? I mean what's your recommendation and why?
The second approach uses two JOINs. I guess it will be slower than
using REGEXP in huge dataset.
Your intuition is simply wrong. Databases are designed to do JOINs. They can take advantage of indexing and partitioning to speed queries. More advanced databases (than MySQL) use statistics on tables to choose optimal algorithms for executing the query.
Your first query always requires a full table scan of posts. Your second query can be optimized in various ways.
Further, maintaining the consistency of the data in the database is much more difficult with the first approach. You probably need to implement triggers to handle updates and inserts on all the tables. That slows things down.
There are some cases where it is worth the effort to do this -- think about summary counts or totals of dollars or time. Putting tags into a delimited string is much less likely to be beneficial, because parsing the string in SQL is not likely to be a really big benefit relative to the other costs.
In small tables, you can use both at your discretion.
If you expect the table to grow, you really need the second choice. The reason is that REGEXP can never use an index in MySQL, and indexes are the key to fast queries.
A join will use an index if one is declared on the join column.
All of this looks fine when we talk about data at a small scale. It is fundamental for an OLTP system to have normalized tables. When you expect your table to scale and want the data to be non-redundant and consistent, normalization is the answer. Of course there are costs involved with joins, but they are trivial compared with these issues.
Let's talk about your scenario:
Pros:
All data is available by querying one table.
Cons:
Wrapping a column in a function (or a regular expression match) forces the query optimizer to scan the whole table regardless of any index on the column. This is very important from a data-scaling point of view.
Each keyword is repeated many times, which leads to data redundancy.
Repeated keywords lead to data inconsistencies: to remove or update a keyword, the column must be searched and replaced in every row. If a keyword is missed anywhere, that creates a data integrity issue.
There are many more. Go through data normalization in RDBMS.
I can't find an answer to this despite looking for several days!
In MySQL I have 2 tables.
ProcessList contains foreign keys, all from the Process table:
ID |Operation1|Operation2|Operation3|etc....
---------------------------------------
1 | 1 | 4 | 6 | ....
---------------------------------------
2 | 2 | 4 | 5 |....
---------------------------------------
.
.
.
Process Table
ID | Name
-------------------
1 | Quote
2 | Order
3 | On-Hold
4 | Manufacturing
5 | Final Inspection
6 | Complete
Now, I am new to SQL, but from my research I understand that MySQL doesn't have a pivot function. I have seen some examples with UNIONs etc., but I need an SQL expression something like this (pseudocode):
SELECT name FROM process
(IF process.id APPEARS in a row of the ProcessList)
WHERE processListID = 2
so I get the result
Order
Manufacturing
Final Inspection
I really need the last line of the query to be
WHERE processListID = ?
because otherwise I will have to completely rewrite my app: the SQL is stored in a String in Java, and the app supplies the key index only at the end of the statement.
One option is using union to unpivot the processlist table and joining it to the process table.
select p.name
from process p
join (select id,operation1 as operation from processlist
union all
select id,operation2 from processlist
union all
select id,operation3 from processlist
--add more unions as needed based on the number of operations
) pl
on pl.operation=p.id
where pl.id = ?
If you always consider only a single row in the process list (i.e. processListId = x), the following query should do a pretty simple and performant job:
select p.name from process p, processlist l
where l.id = 2
and (p.id in (l.operation1, l.operation2, l.operation3))
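A small sketch of the IN-list variant with the key supplied as a placeholder at the very end of the statement, which is what the question asks for. It uses Python's built-in sqlite3 module as a stand-in for MySQL and the Java app; the three operation columns, the lowercase table names and the helper process_names are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; data copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE process (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE processlist (
    id INTEGER PRIMARY KEY,
    operation1 INTEGER, operation2 INTEGER, operation3 INTEGER
);
INSERT INTO process VALUES
    (1, 'Quote'), (2, 'Order'), (3, 'On-Hold'),
    (4, 'Manufacturing'), (5, 'Final Inspection'), (6, 'Complete');
INSERT INTO processlist VALUES (1, 1, 4, 6), (2, 2, 4, 5);
""")

# The placeholder is the last token, so the caller only appends the key.
QUERY = """
SELECT p.name
FROM process AS p
JOIN processlist AS l
  ON p.id IN (l.operation1, l.operation2, l.operation3)
WHERE l.id = ?
"""

def process_names(list_id):
    """Return the process names referenced by one processlist row."""
    return [name for (name,) in conn.execute(QUERY, (list_id,))]

print(process_names(2))
```

The statement has no ORDER BY, so the order of the names is not guaranteed; sort in the application if the order matters.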
I am running a query to retrieve some game levels from a MySQL database. The query itself takes around 0.00025 seconds to execute on a base that contains 40 level strings. I thought it was satisfactory, until I got a message from the website host telling me to optimise the below-mentioned query, or the script will be removed since it is pushing a lot of strain onto their servers.
I tried optimising by using EXPLAIN and EXPLAIN EXTENDED and adjusting the columns accordingly (adding indexes), but I always get the same performance. I also noticed that MySQL didn't use indexes where they were available but did a full-table scan instead.
Results from EXPLAIN EXTENDED:
table id select_type type possible_keys key key_len ref rows Extra
users 1 SIMPLE ALL PRIMARY,id NULL NULL NULL 7 Using temporary; Using filesort
AllTime 1 SIMPLE ref PRIMARY,userid PRIMARY 4 Test.users.id 1
query:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname, AllTime.levelstr
FROM AllTime
INNER JOIN users
ON AllTime.userid=users.id
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
The tables:
users
| id(int) | nickname(varchar) |
| (Primary, Auto_increment) | |
|---------------------------|-------------------|
| 1 | username1 |
| 2 | username2 |
| 3 | username3 |
| ... | ... |
and AllTime
| id(int) | userid(int) | levelname(varchar) | levelstr(text) |
| (Primary, Auto_increment) | (index) | | |
|---------------------------|-------------|--------------------|----------------|
| 1 | 2 | levelname1 | levelstr1 |
| 2 | 2 | levelname2 | levelstr2 |
| 3 | 3 | levelname3 | levelstr3 |
| 4 | 1 | levelname4 | levelstr4 |
| 5 | 1 | levelname5 | levelstr5 |
| 6 | 1 | levelname6 | levelstr6 |
| 7 | 2 | levelname7 | levelstr7 |
Is there a way to optimize this query or would I be better off by calling two consecutive queries from php just to avoid the warning?
I am just learning MySQL, so please take that information into account when replying, thank you :)
I'm assuming you're using InnoDB.
For an INNER JOIN, MySQL typically starts with the table with the fewest rows, in this case users. However, since you just want the latest 20 AllTime records joined with the corresponding user records, you actually should start with AllTime since with the LIMIT, it will be the smaller data set.
Use STRAIGHT_JOIN to force the join order:
SELECT users.nickname, AllTime.userid, AllTime.id, AllTime.levelname,
AllTime.levelstr
FROM AllTime
STRAIGHT_JOIN users
ON users.id = AllTime.userid
ORDER BY AllTime.id DESC
LIMIT ($value_from_php),20;
It should be able to use the primary key on the AllTime table and follow it in descending order. It'll grab all the data on the same pages as it goes.
It should also use the primary key on the users table to grab the id and nickname. If there are more than just two columns, you might add a multi-column covering index on (id, nickname) to improve the speed.
If you can, convert the levelstr column to VARCHAR so that the data is stored on the same page as the rest of the data, otherwise, it has to go fetch the text columns separately. This assumes that your columns are under the 8000 byte row limit for InnoDB. There is no way to avoid the USING TEMPORARY unless you get rid of the text column.
Most likely, your host has identified this query by using the slow query log, which can identify all queries that don't use an index, or they may have red flagged it because of the Using temporary.
It doesn't look like the query itself has a problem.
Review the application code. Most likely the issue is in the code.
Check the MySQL query execution plan.
You are possibly missing an index.
Make sure you cache the data in the application and the database (FYI, sometimes you can load the whole database into application memory).
Make sure you use a connection pool.
Create a view (a very small chance of improvement).
Try removing the "ORDER BY" clause (again, a very small chance that it will improve performance).
The query itself takes around 0.00025 seconds ... I got a message from the website host telling me to optimise the below-mentioned query, or the script will be removed since it is pushing a lot of strain onto their servers.
Ask the website host for more details about why this query has been flagged for attention. A query that trivial is not going to cause strain on anything unless it is being called very frequently.
Find out how many times that query is being run. I will bet you a nickel that your site is getting hammered by a bot and being executed hundreds or thousands of times per minute. If so, then that's your real problem.
LIMIT ($value_from_php),20; -- if $value_from_php is huge, then the query is slow. This is because all the 'old' pages need to be scanned before getting to the 20 you need.
By "remembering where you left off" you can make every page equally fast. See this for further details: http://mysql.rjweb.org/doc.php/pagination
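A rough sketch of what "remembering where you left off" could look like for this query: instead of an OFFSET, the caller passes the smallest AllTime.id it has already seen. Python's built-in sqlite3 module stands in for PHP and MySQL here; fetch_page, last_seen_id and the 2**62 starting value are assumptions for illustration, not part of the linked article.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; tables follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
CREATE TABLE AllTime (
    id INTEGER PRIMARY KEY, userid INTEGER, levelname TEXT, levelstr TEXT
);
INSERT INTO users VALUES (1, 'username1'), (2, 'username2'), (3, 'username3');
""")
# Seven levels with the same user ids as the sample table in the question.
for level_id, user_id in enumerate([2, 2, 3, 1, 1, 1, 2], start=1):
    conn.execute(
        "INSERT INTO AllTime VALUES (?, ?, ?, ?)",
        (level_id, user_id, f"levelname{level_id}", f"levelstr{level_id}"),
    )

# Seek past the last row already shown instead of counting an offset.
PAGE_QUERY = """
SELECT users.nickname, AllTime.userid, AllTime.id,
       AllTime.levelname, AllTime.levelstr
FROM AllTime
INNER JOIN users ON AllTime.userid = users.id
WHERE AllTime.id < ?
ORDER BY AllTime.id DESC
LIMIT ?
"""

def fetch_page(last_seen_id=None, page_size=20):
    # For the first page, start above any id that could exist (assumption).
    cursor_id = last_seen_id if last_seen_id is not None else 2**62
    return conn.execute(PAGE_QUERY, (cursor_id, page_size)).fetchall()

first = fetch_page(page_size=3)
second = fetch_page(last_seen_id=first[-1][2], page_size=3)
print([row[2] for row in first], [row[2] for row in second])
```

The "next page" link then carries the last id shown rather than a page number, so every page costs about the same no matter how deep the user goes.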
I'll try to explain my situation: I'm trying to create a search engine for products on my website, so when the user needs to find a product I need to show similar ones, here's an example.
User searches:
assassins creed OR assassinscreed OR aSsAssIn's CreeD, assuming there are no misspelled letters/numbers (those 3 queries should produce the same result)
Expected results:
Assassin's Creed AND Assassin's Creed: Unity AND Assassin's Creed: Special Edition
What have I tried so far
I have created a MySQL field for the search engine which contains a parsed name of the product (Assassin's Creed: Unity -> assassinscreedunity)
I parse the search query
I search using MySQL's INSTR()
My problem
I'm fine with using this, but I heard it can be slow when the number of rows increases. I've created a full-text index on my table, but I don't think it will help, so I need another solution.
Thanks for any answer, and ask me anything before downvoting.
First of all, you should keep track of performance issues in your queries more precisely than 'heard it can be slow' and 'think it would help'. One starting point may be the Slow Query Log.
If you have a table which contains the same parsed name in more than one row, consider normalizing your database. In the specific case, store unique parsed names in one table, and only the id of the corresponding parsed name in the table you described in your question. This way, you only need to check the smaller table with unique names and can then quickly find all matching entries in the main table by id.
Example:
Consider the following table with your structure
id | product_name | rating
-----------------------------------
1 | assassinscreedunity | 5
2 | assassinscreedunity | 2
3 | monkeyisland | 3
4 | monkeyisland | 5
5 | assassinscreedunity | 4
6 | monkeyisland | 4
you would have to scan all six entries to find relevant rows.
In contrast, consider two tables like this:
id | p_id | rating
--------------------
1 | 1 | 5
2 | 1 | 2
3 | 2 | 3
4 | 2 | 5
5 | 1 | 4
6 | 2 | 4
id | name
--------------------------
1 | assassinscreedunity
2 | monkeyisland
In this case, you only have to scan two entries (compared to six) and can then efficiently look up relevant rows using the integer id.
To further enhance the performance, you could extend the concept of a parsed name and use hashes. For example, you could calculate the SHA1 hash of your parsed name, which is a 160-bit value. You can find entries in your database for this value very efficiently. To match substrings, you can add them to the second table as well. Since the hash only needs to be computed once, you can still use the database to match by an integer. Another technique worth looking at is fuzzy hashing.
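A small sketch of the parsed-name-plus-hash idea in Python, using only the standard library. The parsing rule (lowercase, then keep only ASCII letters and digits) is an assumption inferred from the single example in the question, and parse_name and name_key are hypothetical helper names.

```python
import hashlib
import re

def parse_name(raw):
    """Lowercase the name and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())

def name_key(raw):
    """SHA-1 hex digest of the parsed name, usable as an exact-match key."""
    return hashlib.sha1(parse_name(raw).encode("utf-8")).hexdigest()

# The three spellings from the question collapse to one key.
queries = ["assassins creed", "assassinscreed", "aSsAssIn's CreeD"]
keys = {name_key(q) for q in queries}
print(parse_name("Assassin's Creed: Unity"), len(keys))
```

Note that a hash only supports exact matches: the key for "Assassin's Creed: Unity" differs from the key for "Assassin's Creed", so prefix or substring hits still need the extra rows described above.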
In addition, you should read up on the Rabin–Karp algorithm or string searching in general.
I have a SQL query that takes a very long time to run on MySQL (it takes several minutes). The query is run against a table that has over 100 million rows, so I'm not surprised it's slow. In theory, though, it should be possible to speed it up as I really only want to get back the rows from the large table (let's call it A) that have a reference in another table, B.
So my query is:
SELECT id FROM A, B where A.ref = B.ref;
(A has over 100 million rows; B has just a few thousand).
I've added INDEXes:
alter table A add index(ref);
alter table B add index(ref);
But it's still very slow (several minutes -- I'd be happy with one minute).
Unfortunately, I'm stuck with MySQL 4.1.22, so I can't use views.
I'd rather not copy all of the relevant rows from A into a separate, smaller table, as the rows that I need will change from time to time. On the other hand, at the moment that's the only solution I can think of.
Any suggestions welcome!
EDIT: Here's the output of running EXPLAIN on my query:
+----+-------------+------------------------+------+------------------------------------------+-------------------------+---------+------------------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+------+------------------------------------------+-------------------------+---------+------------------------------------------------+-------+-------------+
| 1 | SIMPLE | B | ALL | B_ref,ref | NULL | NULL | NULL | 16718 | Using where |
| 1 | SIMPLE | A | ref | A_REF,ref | A_ref | 4 | DATABASE.B.ref | 5655 | |
+----+-------------+------------------------+------+------------------------------------------+-------------------------+---------+------------------------------------------------+-------+-------------+
(In redacting my original query example, I chose to use "ref" as my column name, which happens to be the same as one of the types, but hopefully that's not too confusing...)
The query optimizer is probably already doing the best that it can, but in the unlikely event that it's reading the giant table (A) first, you can explicitly tell it to read B first using the STRAIGHT_JOIN syntax:
SELECT STRAIGHT_JOIN id FROM B, A where B.ref = A.ref;
From the answers, it seems like you're doing the most efficient thing you can with the SQL. The A table seems to be the big problem, how about splitting it into three individual tables, kind of like a local version of sharding? Alternatively, is it worth denormalising the B table into the A table, assuming B doesn't have too many columns?
Finally, you might just have to buy a faster box to run it on - there's no substitute for horsepower!
Good luck.
SELECT id FROM A JOIN B ON A.ref = B.ref
You may be able to optimize further by using an appropriate type of join e.g. LEFT JOIN
http://en.wikipedia.org/wiki/Join_(SQL)