How to make my MySQL SUM() query more faster - mysql

I have about 1 million rows on users table and have columns A AA B BB C CC D DD E EE F FF by example to count int values 0 & 1
SELECT
CityCode,SUM(A),SUM(B),SUM(C),SUM(D),SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode
Result 8 rows in set (24.49 sec).
How to make my statement more faster?

Use explain to to know the excution plan of your query.
Create atleast one or more Index. If possible make CityCode primary key.
Try this one
SELECT CityCode,SUM(A),SUM(B),SUM(C),SUM(D), SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode,A,B,C,D,E,F,AA,BB,CC,DD,EE,FF

Create an index on the CityCode column.

I believe it is not because of SUM(), try to say select CityCode from users group by CityCode; it should take neary the same time...

Use better hardware
increase caching size - if you use InnoDB engine, then increase the innodb_buffer_pool_size value
refactor your query to limit the number of users (if business logic permits that, of course)

You have no WHERE clause, which means the query has to scan the whole table. This will make it slow on a large table.
You should consider how often you need to do this and what the impact of it being slow is. Some suggestions are:
Don't change anything - if it doesn't really matter
Have a table which contains the same data as "users", but without any other columns that you aren't interested in querying. It will still be slow, but not as slow, especially if there are bigger ones
(InnoDB) use CityCode as the first part of the primary key for table "users", that way it can do a PK scan and avoid any sorting (may still be too slow)
Create and maintain some kind of summary table, but you'll need to update it each time a user changes (or tolerate stale data)
But be sure that this optimisation is absolutely necessary.

Related

mysql query speed at table which has 1.5million rows

It takes around 5 seconds to get result of query from a table consisting 1.5 million row. Query is "select * from table where code=x"
Is there a setting to increase speed ? Or should I jump to another database apart from MySQL ?
You could index the code column. Note that the trade off is that inserting new rows or updating the code column on existing rows will be slowed down a bit since the index also needs to be updated. In any event, you should benchmark the improvement to make sure it's worth it.
WHERE code=x -- needs INDEX(code)
SELECT * when many of the columns are bulky: Large columns are stored "off-record". Hence they take longer to fetch. So, explicitly list the columns you really need, hoping to leave out some of the bulky columns.
When a GROUP BY or LIMIT is involved, it is sometimes best to do
SELECT lots of columns
FROM ( SELECT id FROM t WHERE ... group-by or limit ) AS x
JOIN t AS y USING(id)
etc.
That is, start by finding just the ids as simply as possible, then JOIN back to the original table and other table(s). (This is not the case you presented, but I worry that you over-simplified it.)

How to make a faster query when joining multiple huge tables?

I have 3 tables. All 3 tables have approximately 2 million rows. Everyday 10,000-100,000 new entries are entered. It takes approximately 10 seconds to finish the sql statement below. Is there a way to make this sql statement faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but its slow. Is there a better way to write this code?
All database servers have a form of an optimization engine that is going to determine how best to grab the data you want. With a simple query such as the select you showed, there isn't going to be any way to greatly improve performance within the SQL. As others have said sub-queries won't helps as that will get optimized into the same plan as joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a mysql expert but found this article interesting and worth a skim. https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?
A subquery don't help but a proper index can improve the performance so be sure you have proper index
create index idx1 on customers(gender , cus_id,book_id, name )
create index idex2 on hotels(cus_id)
create index idex3 on hotels(book_id)
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.

Optimizing mysql query with the proper index

I have a table of 15.1 million records. I'm running the following query on it to process the records for duplicate checking.
select id, name, state, external_id
from companies
where dup_checked=0
order by name
limit 500;
When I use explain extended on the query it tells me it's using the index_companies_on_name index which is just an index on the company name. I'm assuming this is due to the ordering. I tried creating other indexes based on the name and dup_checked fields hoping it would use this one as it may be faster, but it still uses the index_companies_on_name index.
Initially it was fast enough, but now we're down to 3.3 million records left to check and this query is taking up to 90 seconds to execute. I'm not quite sure what else to do to make this run faster. Is a different index the answer or something else I'm not thinking of? Thanks.
Generally the trick here is to create an index that filters first, reducing the number of rows ("Cardinality"), and has the ordering applied secondarily:
CREATE INDEX `index_companies_on_dup_checked_name`
ON `companies` (`dup_checked`,`name`)
That should give you the scope you need.

Mysql Large Table Join Query very slow not Key indexes issue

SELECT t1.*
FROM
( SELECT key_a,key_b,MAX(date) as date
FROM large_table
WHERE date <= **20150126**
group by key_a,key_b
) AS t2
JOIN large_table AS t1 USING(key_a,key_b ,date)
large_table = 1,223,001,206 rows of data
Primary Key key_a,key_b,date
key on key_b
key on date
There are numerous empty dates between rows for a & b that I want the most recent behind or on the "Date" entered.
Is it the Mysql Join settings causing it to be slow ?
I can copy the entire set of a & b data with an INSERT to a temp table just by selecting all the rows and then run the same query on the temp table, but why do multi queries (insert selected, then select from) when only 1 is needed.
The query above only has 4,128,548 total results in the temp insert all dates table, and the date specific returns under 180,000 total.
Not table optimization, not keys, is it Max sort length, Join Buffer size , I have 128 gig ram, on a 32 core server running this, there is no reason for it to be slow, just never bulk insert this large of a single table to run Join queries on prior if anyone else has dealt with tables this size any info greatly appreciated.
Edited query, yes it's late long day had Distinct when it wasn't needed or in actual query
WHERE date <= **20150126**
group by key_a,key_b
needs an index starting with date. It's about doing what you can with the WHERE clause, not sparse or dense.
Then... Since the inner query references only 3 columns, building a 'covering' index may be useful. (Probably useful in your case.) So, tack on the other two fields, in either order. Such as
INDEX(`date`, key_a, key_b)
For MyISAM this step is critical. For InnoDB, this is redundant, since each secondary key (such as your INDEX(date)) implicitly includes the rest of the fields of the PK.
No, the PRIMARY KEY(key_a, key_b, date) cannot serve the purpose. It's in the wrong order. Also, it is (if you are using InnoDB) "clustered" with the index.
The query above only has 4,128,548 total results in the temp insert all dates table, and the date specific returns under 180,000 total.
Sorry, I had trouble parsing that. I assume you are saying 4M rows had 'date<...' and the subquery delivered only 180K rows. Hence, the outer query also returned 180K rows.
The first goal is to get through the 4M rows as efficiently as possible. With the index I propose, that might be about 20K blocks (#16KB each) of index scanning. That's 300MB.
Next the MAX and GROUP BY are performed. At 300MB, this will involve a disk tmp table. (See max_heap_size and max_tmp_table_size.)
Then comes the JOIN to fetch t1.*. You are using a good technique for fetching a bunch of rows from a huge table, where you need a GROUP BY (or LIMIT or ...) that is clumsy when done the obvious way. It goes like this: Write the subquery to find the PKs. Get the best index for it. Then JOIN on the PK.
Now for the JOIN. (Again, I assume InnoDB.) Since you are JOINing on the PK, each lookup into t1 will be efficient -- drill down the PK's BTree to find a row. Do that 180K times.
If those 180K lookups are scattered around the table, then this could be 180K disk hits.
Total effort: 20K + 180K = 200K disk hits, possibly less. On commodity spinning disks, this would take about 30 minutes (plus time for the tmp table). (No, only one core will be used. Anyway, I/O is probably the bottleneck.)
OPTIMIZE TABLE -- almost always useless.
I assume innodb_buffer_pool_size is about 90G? If things are going to be cached, that is where it would happen (for InnoDB). Since 200K blocks is 3GB, it could be easily cached. That is, if you run the query twice, the first might be 30 minutes, but the second might be less than 3 minutes.
To get more numbers, you could do:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS;
and look for 'Handler%', '%sort%', 'Innodb%' and maybe a few others.
What version are you running? Recent versions have a leapfrog technique that works better for max+groupby than what I described. I think it is called MRR. If so, your PK is actually optimal. (Hmmm... I should play around with that.)
PARTITIONing -- I don't see any benefit (for this query).

Better Query to Select Count Where

I'm hosted at Godaddy and they are giving me a fit with a query that I have which is understandable. I am wondering if there is a better way to go about rewriting this query I have to make it resource friendly:
SELECT count(*) as count
FROM 'aaa_db'
WHERE 'callerid' LIKE '16602158641' AND 'clientid'='41'
I have over 1 million rows, and have duplicates therefore I wrote a small script to output the duplicates and delete them rather than changing tables etc.
Please help.
Having separate indexes on clientid and callerid causes MySQL to use just one index, or in some cases, attempt an index merge (MySQL 5.0+). Neither of these are as efficient as having a multi-column index.
Creating a multi-column index on both callerid and clientid columns will relieve CPU and disk IO resources (no table scan), however it will increase both disk storage and RAM usage. I guess you could give it a shot and see if Godaddy prefers that over the other.
The first thing I would try, is to ensure you have an index on the clientid column, then rewrite your query to look for clientid first, this should remove rows from consideration speeding up your query. Also, as Marcus stated, adding a multi-column index will be faster, but remember to add it as (clientid, callerid) as mysql reads indices from left to right.
SELECT count(*) as count
FROM aaa_db
WHERE clientid = 41 and callerid LIKE '16602158641';
Notice I removed the quotes from the clientid value, as it should be an int datatype, if it is not, converting to int datatype should also be faster. Also, I am not sure why you are doing a LIKE without the wildcard operator, if you could change that to an = that will also help.
I like Marcus' answer, but I would first try a single index on just callerid, because with values like 16602158641, how many rows can there be that match out of a million? Not very many, so the single index could match or exceed performance of the double index.
Also, remove LIKE and use =:
SELECT count(*) as count
FROM aaa_db
WHERE callerid = '16602158641' -- equals instead of like
AND clientid = '41'