GROUP BY optimization with indexing in MySQL - mysql

I am currently trying to optimize a GROUP BY query because for some reason it is taking forever. What's odd is that when I run a GROUP BY query on another column that has the same number of characters, MySQL handles it with ease, so I have a feeling it has something to do with the data itself. This is my first question; if anyone has any suggestions on how to debug this, that would be awesome.
Assuming it's just an optimization problem, I found this post, which recommends creating an index. I am confused about how this flow would work in my terminal.
Suppose the query I am having trouble with is
SELECT user_id, count(uid) FROM table GROUP BY user_id;
Given his advice, would I just run the previous query and then the following one:
CREATE INDEX ix_temp ON table (uid);
Or would they be the same query? What is the exact flow here? Is there a step I am missing?
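For reference, a minimal sketch of the flow using the table and column names from the question (the index name below is illustrative): CREATE INDEX is a separate, one-time statement, and once it completes you simply re-run the same GROUP BY query unchanged; the optimizer uses the index automatically if it helps, and EXPLAIN shows whether it actually does.
-- One-time DDL: index the GROUP BY column, since that is what this query groups on.
-- An index on (user_id, uid) would additionally cover the COUNT(uid) part.
CREATE INDEX ix_user_id ON `table` (user_id);
-- Then re-run the original query unchanged; the "key" column of EXPLAIN shows whether ix_user_id is used.
EXPLAIN SELECT user_id, COUNT(uid) FROM `table` GROUP BY user_id;
SELECT user_id, COUNT(uid) FROM `table` GROUP BY user_id;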

Related

Group by, Order by and Count MySQL performance

I have the following query to get the 15 most sold plates in a place:
This query takes 12 seconds to execute over 100,000 rows. I think this execution takes too long, so I am searching for a way to optimize the query.
I ran the EXPLAIN SQL command in phpMyAdmin and I got this:
(screenshot of the EXPLAIN output from phpMyAdmin)
According to this, the main problem is on the p table, which is being scanned in full, but how can I fix this? The id of the p table is a primary key; do I need to set it as an index as well? Also, is there anything else I can do to make the query run faster?
You can make a relationship between the two tables.
https://database.guide/how-to-create-a-relationship-in-mysql-workbench/
Besides this, you can also use a LEFT JOIN so you won't load in the whole right-hand table.
ORDER BY is a slow operation in MySQL; if you are processing the results in code afterwards, you can just sort there, which is much faster than ORDER BY.
I hope I helped and Community feel free to edit :)
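To make the relationship suggestion concrete, here is a rough sketch; since the original query is not shown, the child table order_plates and its plate_id column are hypothetical, and only the p table with its id primary key comes from the question:
-- Index the joining column on the child table and declare the relationship
ALTER TABLE order_plates
    ADD INDEX ix_order_plates_plate_id (plate_id),
    ADD CONSTRAINT fk_order_plates_plate FOREIGN KEY (plate_id) REFERENCES p (id);
The index on the joining column is what actually speeds up the join; the foreign key constraint mainly enforces integrity (InnoDB requires an index on the referencing column and creates one if it is missing).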
You did include the explain plan, but you did not give any information about your table structure, data distribution, cardinality, or volumes. Assuming your indices are accurate and you have an even data distribution, the query is having to process over 12 million rows, not 100,000. But even then, that is relatively poor performance. And you never told us what hardware this sits on, nor the background load.
A query with so many joins is always going to be slow - are they all needed?
the main problem is on the p table which is scanning the entire table
Full table scans are not automatically bad. The cost of dereferencing an index lookup, as opposed to a streaming read, is about 20 times higher. Since the only constraint you apply to this table is its joins to other tables, there's nothing in the question you asked to suggest there is much scope for improving this.

SQL query for large data filter from big database

I have a large database on a server. The database is all about mobile numbers, with about 20 million records at present. I want to match mobile numbers on my website to filter DND or non-DND mobile numbers. I am using this query for filtering a small number of them:
SELECT phone_number
FROM mobnum_table
WHERE phone_number IN ('7710450282', '76100003451', '8910003402', '9410009850', '7610000191');
But what about when I want to filter 100,000 mobile number records in a few seconds? I have heard about SQL query optimization but am not very familiar with it. Also, please guide me on which storage engine I should consider in this situation.
I have already googled it, but didn't find a good answer.
Thanks in advance.
I think there is some problem in your requirement itself. If you tell us more about your problem, maybe we can help you. Anyway, it's not a good idea to put all 100,000 numbers in IN. One option is to create another table and do an inner join.
Assume you have another table selectednumbers with columns id and phone_number,
you can do an inner join as follows
SELECT a.phone_number
FROM mobnum_table a
INNER JOIN selectednumbers b ON a.phone_number = b.phone_number;
As I mentioned earlier, your question is not complete, so kindly provide some more information and we can suggest an optimized query.
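For completeness, a rough sketch of that approach with hypothetical details filled in: load the 100,000 numbers into an indexed lookup table once, then join, instead of building a huge IN (...) list. An index on mobnum_table(phone_number) is assumed as well.
CREATE TABLE selectednumbers (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    phone_number VARCHAR(15) NOT NULL,
    KEY ix_selectednumbers_phone (phone_number)
);
-- Bulk-load the numbers to be checked (the file path here is illustrative only)
-- LOAD DATA LOCAL INFILE '/tmp/numbers.csv' INTO TABLE selectednumbers (phone_number);
SELECT a.phone_number
FROM mobnum_table a
INNER JOIN selectednumbers b ON a.phone_number = b.phone_number;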
So you're generating a list of 100,000 numbers, and then putting that back into another query?
If you're getting the numbers from a table, take the query that generated the list of numbers in the first place, put it inside the in() brackets, and you'll see a large improvement immediately.
Restructure them both to use a JOIN, instead of in() and you'll see even more.
OR, depending on your DB structure, just do
SELECT phone_number
FROM mobnum_table
WHERE DND = 1
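If such a DND flag column exists (it is assumed by the query above), an index on it may help, though how much depends on how selective the flag is; the optimizer often ignores indexes on low-cardinality columns:
CREATE INDEX ix_mobnum_dnd ON mobnum_table (DND);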

How to do performance tuning for huge MySQL table?

I have a MySQL table MtgoxTrade(id,time,price,amount,type,tid) with more than 500M records, and I need to query three fields (time,price,amount) from all records:
SELECT time, price, amount FROM MtgoxTrade;
It takes 110 seconds on Win7, which is too slow. My questions are:
Will a compound index help on this? Note that my SQL query has no WHERE clause
Could any other optimization be made to improve the query performance here?
Updated: I'm sorry, the MtgoxTrade table has 6 fields in total: (id,time,price,amount,type,tid). My SQL only needs to query three fields (time,price,amount). I already tried adding a composite index on (time,price,amount), but it doesn't seem to help.
If this is your real query, then no, nothing could possibly help. Come to think of it, you are asking to deliver the contents of the whole 500M+ row table! It will be slow no matter what you do, since the whole table must be processed.
If you can constrain your program logic to only process some smaller subset of your table, then it is possible to make it faster.
For example, you can process only the results for the last month using a WHERE clause:
SELECT time, price, amount
FROM MtgoxTrade
WHERE time BETWEEN '2013-09-01' AND '2013-09-21'
This can work really fast, but you would still need to add an index on the time field, like this:
CREATE INDEX mtgoxtrade_time_idx ON mtgoxtrade (time);
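As a follow-up to the update in the question: a composite index on (time, price, amount) cannot make the unfiltered full-table query fast, because all 500M+ rows still have to be read, which matches what was observed. For a ranged query like the one above, though, it can act as a covering index, so the result is served entirely from the index:
-- Covers the ranged query above: MySQL can filter on time and return price
-- and amount without touching the full rows ("Using index" in EXPLAIN)
CREATE INDEX mtgoxtrade_time_price_amount_idx ON MtgoxTrade (time, price, amount);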

how to select data from another column in the same query?

Sorry about the poorly worded question.. but I don't know how else to explain this...
MySQL... I have a query with several extremely complex subqueries in it. I am selecting from a table and I need to find out what "place" each record is in according to a variety of criteria. So I have this:
Select record.id, record.title,
(select count(*) from (complex-query-that-returns-newer-records)) as agePlace,
(select count(*) from (complex-query-that-returns-records-with-better-ROI)) as ROIPlace...
From record...
Now the issue is that the query is slow, as I had expected given the amount of crunching required. But I realized that there are situations where the results of two of the subqueries will be the same, and there is no need for me to run the subquery twice (or have it in my code twice). So I would like to wrap one of the subqueries in an IF statement: if the criteria are met, use the value from another column that already calculated that data; else, run the subquery as normal.
I have tried just putting the other subquery's alias there, but it says unknown column totalSales, because the field is in the query, not in one of the tables.
Is there any way around this?
UPDATE: I have reposted this as a query refactoring question - thanks for the suggestions: How to refactor select subqueries into joins?
There really isn't a way around this. The SQL engine compiles the query to run the whole query, not just part of it. During compile time, the query engine does not know that the results will be the same.
More likely, you can move the subqueries to the from clause and find optimizations there.
If that is of interest, you should write another question with the actual queries you are using. That is a different question from this one ("how to rephrase this query" rather than "how can I conditionally make this run").
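As a rough illustration of moving work into the FROM clause (the created_at column and the self-join criterion below are hypothetical stand-ins for the complex subqueries): compute the ranking once in a derived table, join it in, and reuse it under both aliases when the criteria happen to coincide.
SELECT r.id,
       r.title,
       COALESCE(age.place, 0) AS agePlace,
       COALESCE(age.place, 0) AS ROIPlace  -- reuse the same derived value when the two criteria match
FROM record r
LEFT JOIN (
    -- counts how many newer records exist for each record
    SELECT r1.id, COUNT(*) AS place
    FROM record r1
    JOIN record r2 ON r2.created_at > r1.created_at
    GROUP BY r1.id
) AS age ON age.id = r.id;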

Intermittently slow Mysql table - why?

We recently had an issue I'd never seen before, where, for about 3 hours, one of our Mysql tables got extremely slow. This table holds forum posts, and currently has about one million rows in it. The query that became slow was a very common one in our application:
SELECT * FROM `posts` WHERE (`posts`.forum_id = 1) ORDER BY posts.created_at DESC LIMIT 1;
We have an index on the posts table on (forum_id, created_at), which normally allows this query and sort to happen in memory. But during these three hours, not so much. What is normally an instantaneous query took anywhere from 2 to 45 seconds during this time period. Then it went back to normal.
I've pored through our slow query log and nothing else looks out of the ordinary. I've looked at New Relic (this is a Rails app) and all other actions ran at essentially the same speed as normal. We didn't have an unusual number of message posts today. I can't find anything else weird in our logs. And the database wasn't swapping; it still had gigs of memory available to use.
I'm wondering if MySQL could change its mind back and forth about which indexes to use for a given query, and for whatever reason started deciding to do a full table scan on this query for a few hours today? But if that were true, why would it have stopped doing the full table scans?
Has anyone else encountered an intermittently slow query that defied reason? Or do you have any creative ideas about how one might go about debugging a problem like this?
I'd try the MySQL EXPLAIN statement...
EXPLAIN SELECT * FROM `posts` WHERE (`posts`.forum_id = 1) ORDER BY posts.created_at DESC LIMIT 1;
It may be worth checking the MySQL response time in your Rails code, and if it exceeds a threshold then run the EXPLAIN and log the details somewhere.
Table locking also springs to mind - is the posts table updated by a cronjob or hefty query while SELECTs are going on?
Hope that helps a bit!
On a site I work on, we recently switched to InnoDB from MyISAM, and we found that some simple select queries which had both WHERE and ORDER BY clauses were using the index for the ORDER BY clause, resulting in a table scan to find the few desired rows (but, heck, they didn't need to be sorted when it finally found them all!)
As noted in the linked article, if you have a small LIMIT value, your ORDER BY clause is the first member of the primary key (so the data on file is ordered by it), and there are many results that match your WHERE clause, using that ORDER BY index isn't a bad idea for MySQL. However, I presume created_at is not the first member of your primary key, so it's not a particularly smart idea in this case.
I don't know why MySQL would switch indexes if you haven't changed anything, but I'd suggest you try running ANALYZE TABLE on the relevant table. You might also change the query to remove the LIMIT and ORDER BY clauses and sort at the application level, provided the result set is small enough; or you could add a USE INDEX hint so it never guesses wrong.
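For example (the index name here follows the usual Rails naming convention and is hypothetical; check SHOW INDEX FROM posts for the real one):
-- Refresh the optimizer's statistics for the table
ANALYZE TABLE posts;
-- Pin the query to the (forum_id, created_at) index so it never guesses wrong
SELECT *
FROM posts USE INDEX (index_posts_on_forum_id_and_created_at)
WHERE posts.forum_id = 1
ORDER BY posts.created_at DESC
LIMIT 1;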
You could also change the wait_timeout value to something smaller so that these queries that use a bad index simply never complete (but don't lag all of the legitimate queries too). You will still be able to run long queries interactively, even with a small wait_timeout, since there is a separate configuration parameter for that.