It takes around 5 seconds to get the result of a query from a table consisting of 1.5 million rows. The query is "select * from table where code=x".
Is there a setting to increase speed? Or should I switch from MySQL to another database?
You could index the code column. Note that the trade off is that inserting new rows or updating the code column on existing rows will be slowed down a bit since the index also needs to be updated. In any event, you should benchmark the improvement to make sure it's worth it.
WHERE code=x -- needs INDEX(code)
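To see what INDEX(code) changes, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption made only so the example runs anywhere: the plan text comes from SQLite's EXPLAIN QUERY PLAN, not MySQL's EXPLAIN, and the table name t, the index name idx_code, and the sample data are hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return SQLite's query-plan detail strings for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, code INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t (code, payload) VALUES (?, ?)",
    ((i % 1000, "row %d" % i) for i in range(20000)),
)

query = "SELECT * FROM t WHERE code = ?"
plan_before = plan(conn, query, (42,))  # no index yet: the table is scanned

conn.execute("CREATE INDEX idx_code ON t (code)")
plan_after = plan(conn, query, (42,))   # same query, now with an index on code

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL the equivalent check would be running EXPLAIN on the query before and after adding the index, then timing both, as suggested above.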
SELECT * when many of the columns are bulky: Large columns are stored "off-record". Hence they take longer to fetch. So, explicitly list the columns you really need, hoping to leave out some of the bulky columns.
When a GROUP BY or LIMIT is involved, it is sometimes best to do
SELECT lots of columns
FROM ( SELECT id FROM t WHERE ... group-by or limit ) AS x
JOIN t AS y USING(id)
etc.
That is, start by finding just the ids as simply as possible, then JOIN back to the original table and other table(s). (This is not the case you presented, but I worry that you over-simplified it.)
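The "find the ids first, then JOIN back" pattern can be sketched as follows. This again uses sqlite3 as a runnable stand-in for MySQL (an assumption), with a hypothetical table t and hypothetical columns created, title and body. It only shows that both forms return the same rows; whether the second form is faster depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, created INTEGER, title TEXT, body TEXT)"
)
conn.execute("CREATE INDEX idx_created ON t (created)")
conn.executemany(
    "INSERT INTO t (created, title, body) VALUES (?, ?, ?)",
    ((i, "title %d" % i, "x" * 200) for i in range(5000)),
)

# Direct form: the wide rows are selected together with the ORDER BY / LIMIT.
direct = conn.execute(
    "SELECT id, title, body FROM t ORDER BY created DESC LIMIT 10"
).fetchall()

# Deferred-join form: pick only the ids first, then fetch the wide columns.
deferred = conn.execute(
    """
    SELECT y.id, y.title, y.body
    FROM (SELECT id FROM t ORDER BY created DESC LIMIT 10) AS x
    JOIN t AS y USING (id)
    ORDER BY y.created DESC
    """
).fetchall()

print(direct == deferred)
```

The outer ORDER BY is repeated because the order of rows coming out of the subquery is not guaranteed to survive the join.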
I'm trying to use EXPLAIN to take a closer look at my queries and see how they're running. So far, the largest id in an EXPLAIN output has been 7, but that was a lengthy query with a lot going on. I just made another query with a structure similar to the one below, and EXPLAIN gave me a maximum id of 13. From what I know about EXPLAIN, a higher id generally means the query is less efficient or runs longer, but is this a relative rule, or are there some sort of boundaries? For example, is a query with a max id of 2 seen as very efficient and a query with a max id of 13 seen as very inefficient, or is it just that 2 is more efficient than 13? Of course, there's the third option: the id number has no correlation to efficiency.
ID 13 Query:
select if(cond1, subquery, if(cond2, subquery(subsubquery),
subquery(subsubquery))) as colA, if(cond1, subquery(subsubquery), if(cond2,
subquery(subsubquery), subquery(subsubquery))) as colB from TableA join
TableB on X group by y order by z desc
I've never really heard of the id number correlating to efficiency. Unless I am mistaken, it is just little more than the number of tables (and derived tables) that end up being involved in processing the query.
Joining to a huge table once might make for a lower id count; joining numerous times to temp tables that are duplicates (since you can't use a temp table twice in one query) but hold only a minuscule relevant fraction of that huge table (and are better or more appropriately indexed) is sure to increase the id count, but may run much more quickly and efficiently, even factoring in the cost of the preceding queries needed to generate those temp tables.
I have a large table with hundreds of thousands of rows. However, only about 50,000 rows are actually "active" and part of my queries, because I only select the rows that have been updated in the last 14 days with WHERE crdate > "2014-08-10". To speed up queries on the table, I'm wondering which of the following options is the best one (or maybe you have another suggestion?):
I can delete all old entries and insert them into a "history" table with a cronjob running every day/week. However this will still make the history table slow if I want to do queries to that one.
I can make an index on my "crdate" column. However my dates are in the format of "2014-08-10 06:32:59" so I guess because it is storing so many different values, that index will be quite large(?) and potentially slow(?).
Do you guys have any other suggestions for how I can speed up queries to this table? Is it a bad idea to set an index on a date column that has so many different values?
1st rule of databases. Always have indexes on columns you are filtering on.
So yes, put an index on crdate.
You can also go with a history table in parallel, but make sure you put an index on the crdate column in the history table too. Having the history table will allow you to have a smaller index in the main table.
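A minimal sketch of the "index crdate, plus a history table" idea, using sqlite3 as a stand-in for MySQL (an assumption). The table names items and items_history, the index names, and the helper archive_older_than are hypothetical. In practice the cron job mentioned in the question would call something like the helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id INTEGER PRIMARY KEY, crdate TEXT, payload TEXT);
    CREATE TABLE items_history (id INTEGER PRIMARY KEY, crdate TEXT, payload TEXT);
    -- Both tables get an index on the column used for filtering.
    CREATE INDEX idx_items_crdate ON items (crdate);
    CREATE INDEX idx_history_crdate ON items_history (crdate);
    """
)
rows = [
    (1, "2014-07-01 06:32:59", "old"),
    (2, "2014-08-09 23:59:59", "old"),
    (3, "2014-08-10 06:32:59", "active"),
    (4, "2014-08-20 12:00:00", "active"),
]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

def archive_older_than(conn, cutoff):
    """Move rows with crdate < cutoff into items_history in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO items_history SELECT * FROM items WHERE crdate < ?",
            (cutoff,),
        )
        conn.execute("DELETE FROM items WHERE crdate < ?", (cutoff,))

archive_older_than(conn, "2014-08-10")
active_ids = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]
history_ids = [r[0] for r in conn.execute("SELECT id FROM items_history ORDER BY id")]
print(active_ids, history_ids)
```

The dates are stored as text here only because SQLite has no dedicated datetime type; in MySQL the column would normally be a DATETIME or TIMESTAMP.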
I wanted to add to this for future Googlers. If you are querying a datetime, a more specific value will result in a more efficient query. For example:
SELECT * FROM MyTable WHERE MyDateTime = '01/01/2015 00:00:00'
Will be faster than:
SELECT * FROM MyTable WHERE MyDateTime = '01/01/2015'
I tested this repeatedly on an indexed view (indexed by datetime) of 5 million rows; the more specific query gave me a response about 1 second quicker.
[site_list] ~100,000 rows... 10mb in size.
site_id
site_url
site_data_most_recent_record_id
[site_list_data] ~ 15+ million rows and growing... about 600mb in size.
record_id
site_id
site_connect_time
site_speed
date_checked
columns in bold are unique index keys.
I need to return 50 most recently updated sites AND the recent data that goes with it - connect time, speed, date...
This is my query:
SELECT SQL_CALC_FOUND_ROWS
site_list.site_url,
site_list_data.site_connect_time,
site_list_data.site_speed,
site_list_data.date_checked
FROM site_list
LEFT JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_list_data.date_checked DESC
LIMIT 50
Without the ORDER BY and SQL_CALC_FOUND_ROWS (which I need for pagination), the query takes about 1.5 seconds; with them it takes 2 seconds or more. That is not good enough, because the page where this data will be shown gets 20K+ pageviews/day, and this query is apparently too heavy (the server almost dies when I put this live) and too slow.
MySQL experts, how would you do this? What if the table grew to 100 million records? Caching this huge result in a temp table every 30 seconds is the only other solution I have.
You need to add a heuristic to the query, gating it to get reasonable performance. As written, it is effectively sorting your site_list_data table by date descending: the ENTIRE table.
So, if you know that the top 50 will be within the last day or week, add a "and date_checked > <boundary_date>" to the query. Then it should reduce the overall result set first, and THEN sort it.
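Here is a rough sketch of gating the query with a boundary date. It uses sqlite3 as a stand-in for MySQL and a simplified, single-table version of site_list_data (both assumptions). The index name idx_date_checked and the one-day window are hypothetical choices.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_list_data ("
    "record_id INTEGER PRIMARY KEY, site_id INTEGER, date_checked TEXT)"
)
conn.execute("CREATE INDEX idx_date_checked ON site_list_data (date_checked)")

# One check per minute for 3000 minutes (a little over two days of data).
base = datetime(2014, 8, 1)
conn.executemany(
    "INSERT INTO site_list_data (site_id, date_checked) VALUES (?, ?)",
    ((i % 100, (base + timedelta(minutes=i)).strftime(FMT)) for i in range(3000)),
)

# Boundary: only look at the last day before the newest record.
newest = base + timedelta(minutes=2999)
boundary = (newest - timedelta(days=1)).strftime(FMT)

ungated = conn.execute(
    "SELECT record_id FROM site_list_data ORDER BY date_checked DESC LIMIT 50"
).fetchall()
gated = conn.execute(
    "SELECT record_id FROM site_list_data WHERE date_checked > ? "
    "ORDER BY date_checked DESC LIMIT 50",
    (boundary,),
).fetchall()

print(gated == ungated)
```

The gate only gives the same answer when at least 50 rows fall inside the window, so a real page would need a fallback (for example, widening the window) when fewer rows come back.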
SQL_CALC_FOUND_ROWS is slow; use COUNT instead.
A couple of observations.
Both ORDER BY and SQL_CALC_FOUND_ROWS are going to add to the cost of your query. ORDER BY clauses can potentially be improved with appropriate indexing. Do you have an index on your date_checked column? This could help.
What is your exact need for SQL_CALC_FOUND_ROWS? Consider replacing this with a separate query that uses COUNT instead. This can be vastly better assuming your Query Cache is enabled.
And if you can use COUNT, consider replacing your LEFT JOIN with an INNER JOIN as this will help performance as well.
Good luck.
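Replacing SQL_CALC_FOUND_ROWS with a separate COUNT query could look roughly like this. It is a sketch on sqlite3 rather than MySQL (an assumption), with a cut-down site_list table; the helper fetch_page and the constant PAGE_SIZE are hypothetical names.

```python
import sqlite3

PAGE_SIZE = 50

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE site_list (site_id INTEGER PRIMARY KEY, site_url TEXT)")
conn.executemany(
    "INSERT INTO site_list (site_url) VALUES (?)",
    (("http://example.com/%d" % i,) for i in range(230)),
)

def fetch_page(conn, page):
    """Return (rows, total) using one LIMIT/OFFSET query and one COUNT(*) query."""
    rows = conn.execute(
        "SELECT site_id, site_url FROM site_list "
        "ORDER BY site_id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    # The total is computed separately, so it can be cached or refreshed less often.
    total = conn.execute("SELECT COUNT(*) FROM site_list").fetchone()[0]
    return rows, total

rows, total = fetch_page(conn, 4)  # fifth page, zero-based
print(len(rows), total)
```

Because the count is its own query, it can be cached for a few seconds on a busy page instead of being recomputed on every page view.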
Using SQL Server 2012, I have a table with 7 million rows. The PK column is a GUID (COMB GUID). I am trying to test the performance of a query and first need to update a random sampling of data: I want to change a column value (not the PK) of 50,000 rows.
Selecting TOP 50,000 ORDER BY NEWID() takes way too long; I think SQL Server is scanning the whole table. I cannot seem to get the syntax right for TABLESAMPLE; it returns an empty set.
What is the best way to get this to work?
And to treat it as an update:
;WITH x AS
(
SELECT TOP (50000) col
FROM dbo.table TABLESAMPLE (50000 ROWS)
)
UPDATE x SET col = 'something else';
But a couple of notes:
You probably won't see a huge performance improvement over ORDER BY NEWID(). On a table with 1MM rows this took over a minute on my machine.
The TOP is there because TABLESAMPLE doesn't guarantee the exact number of rows - it's based on a rough calculation of how many pages might contain 50,000 rows. You may end up with less or more depending on your fillfactor, how many variable-length columns, how many NULL values, etc. The TOP above will help limit it to 50,000 when the estimate leads to a larger number of pages being read, but it won't help if the estimate is under.
There is some discussion of this going on in another question right now.
I have about 1 million rows in a users table with columns A, AA, B, BB, C, CC, D, DD, E, EE, F, FF, for example, holding int values 0 and 1 that I want to count.
SELECT
CityCode,SUM(A),SUM(B),SUM(C),SUM(D),SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode
Result 8 rows in set (24.49 sec).
How can I make my statement faster?
Use EXPLAIN to see the execution plan of your query.
Create at least one index. If possible, make CityCode the primary key.
Try this one
SELECT CityCode,SUM(A),SUM(B),SUM(C),SUM(D), SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode,A,B,C,D,E,F,AA,BB,CC,DD,EE,FF
Create an index on the CityCode column.
I believe it is not because of SUM(). Try running select CityCode from users group by CityCode; it should take nearly the same time.
Use better hardware
increase caching size - if you use InnoDB engine, then increase the innodb_buffer_pool_size value
refactor your query to limit the number of users (if business logic permits that, of course)
You have no WHERE clause, which means the query has to scan the whole table. This will make it slow on a large table.
You should consider how often you need to do this and what the impact of it being slow is. Some suggestions are:
Don't change anything - if it doesn't really matter
Have a table that contains the same data as "users", but without the columns you aren't interested in querying. It will still be slow, but not as slow, especially if the omitted columns are large.
(InnoDB) use CityCode as the first part of the primary key for table "users", that way it can do a PK scan and avoid any sorting (may still be too slow)
Create and maintain some kind of summary table, but you'll need to update it each time a user changes (or tolerate stale data)
But be sure that this optimisation is absolutely necessary.
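The summary-table suggestion above can be sketched as follows, using sqlite3 as a stand-in for MySQL and only two of the flag columns (both assumptions). The table city_totals and the helper add_user are hypothetical names. On MySQL the insert-or-update step would more likely be written with INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, CityCode INTEGER, A INTEGER, B INTEGER);
    CREATE TABLE city_totals (CityCode INTEGER PRIMARY KEY, sum_a INTEGER, sum_b INTEGER);
    """
)

def add_user(conn, city, a, b):
    """Insert a user and fold its flags into the summary row in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO users (CityCode, A, B) VALUES (?, ?, ?)", (city, a, b)
        )
        # Create the summary row on first sight of a city, then add to it.
        conn.execute("INSERT OR IGNORE INTO city_totals VALUES (?, 0, 0)", (city,))
        conn.execute(
            "UPDATE city_totals SET sum_a = sum_a + ?, sum_b = sum_b + ? "
            "WHERE CityCode = ?",
            (a, b, city),
        )

for i in range(1000):
    add_user(conn, i % 8, i % 2, (i // 2) % 2)

# Full aggregation over users versus a read of the small summary table.
slow = conn.execute(
    "SELECT CityCode, SUM(A), SUM(B) FROM users GROUP BY CityCode ORDER BY CityCode"
).fetchall()
fast = conn.execute(
    "SELECT CityCode, sum_a, sum_b FROM city_totals ORDER BY CityCode"
).fetchall()
print(slow == fast)
```

Updates and deletes on users would need the same treatment (subtracting the old values), which is the maintenance cost mentioned above.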