I applied partitioning to my tables today, and would now like to see stats for each partition (how many rows per partition).
Now, I partitioned it by date, so it's quite easy to get the counts via "SELECT COUNT(*) FROM table WHERE date >= ... AND date <= ...". However, what happens when you partition your tables by, e.g., KEY?
I checked the MySQL online manual, but it only shows solutions similar to the one I described above. There's got to be a simpler method (or a fancier-looking one, so to speak).
Cheers
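Put EXPLAIN PARTITIONS in front of your select:
EXPLAIN PARTITIONS SELECT ... FROM table ....
(From MySQL 5.7 on, plain EXPLAIN already includes the partitions column, and the PARTITIONS keyword was removed in 8.0.)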
For more info see:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-info.html
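The manual page linked above also covers INFORMATION_SCHEMA.PARTITIONS, which reports a row count per partition regardless of the partitioning type (KEY, HASH, RANGE, ...). A minimal sketch, assuming the table is called my_table (a placeholder) and lives in the current database:

```sql
-- One row per partition, with the row count MySQL has recorded for it.
-- For InnoDB, TABLE_ROWS is an estimate; for MyISAM it is exact.
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'my_table'          -- placeholder table name
ORDER BY PARTITION_ORDINAL_POSITION;

-- Exact count for a single partition (MySQL 5.6+ partition selection).
-- p0 is a placeholder; use a PARTITION_NAME returned by the query above.
SELECT COUNT(*) FROM my_table PARTITION (p0);
```

The first query is cheap because it reads metadata only; the second scans the named partition, so it costs about as much as a COUNT(*) over that slice of the table.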
I'm looking to improve the query performance of the following mysql query:
SELECT * FROM items
WHERE items.createdAt > ?
AND items.createdAt + items.duration < ?
What are the best indexes to use here? Should I have one for both (createdAt) and (createdAt, duration)?
Thanks!
Create one index on createdAt
Then create a generated column (also known as a computed column) on createdAt + duration, and create an index on that generated column.
https://dev.mysql.com/doc/refman/8.0/en/create-table-secondary-indexes.html
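A rough sketch of that suggestion, assuming MySQL 5.7 or later and assuming createdAt and duration are integer columns in the same unit (for example Unix seconds). The column name endsAt and the index names are made up for illustration:

```sql
-- endsAt, idx_items_createdAt and idx_items_endsAt are hypothetical names.
-- The BIGINT type assumes integer createdAt/duration; adjust to the real types.
ALTER TABLE items
  ADD COLUMN endsAt BIGINT GENERATED ALWAYS AS (createdAt + duration) STORED,
  ADD INDEX idx_items_createdAt (createdAt),
  ADD INDEX idx_items_endsAt (endsAt);

-- Same filter as the original query, written against the generated column
-- so that the index on endsAt is a candidate.
SELECT * FROM items
WHERE createdAt > ?
  AND endsAt < ?;
```

The optimizer will typically pick only one of the two indexes for a given pair of bounds, so it is worth checking the plan with EXPLAIN for representative values.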
You can't easily solve the problem. Indexes are one-dimensional. You need a 2D index. SPATIAL provides such, but it would be quite contorted to re-frame your data into such. PARTITION sort of gives an extra dimension; but, again, I don't see that as viable.
So, the best you can do is
INDEX(createdAt)
Or perhaps this is a little better:
INDEX(createdAt, duration)
I don't see that a "generated" column will solve the problem -- however it may lead to sometimes running faster for this simple reason:
If createdAt > ? filters out most of the table because createdAt is near the end, the INDEX(createdAt) appears to be a good index.
When that ? is near the beginning of time, such an index is essentially useless, because most of the table needs to be scanned.
An index on a generated createdAt + duration column has the same problem, just in the other direction.
If the real query has an ORDER BY and LIMIT, there may be other tricks to play. I discuss that when trying to map IP-addresses to countries or businesses: http://mysql.rjweb.org/doc.php/ipranges
I'm looking for a way I can get a count for records meeting a condition but my problem is the table is billions of records long and a basic count(*) is not possible as it times out.
I thought that maybe it would be possible to sample the table by doing something like selecting 1/4th of the records. I believe that older records will be more likely to match so I'd need a method which accounts for this (perhaps random sorting).
Is it possible or reasonable to query a certain percent of rows in mysql? And is this the smartest way to go about solving this problem?
The query I currently have which doesn't work is pretty simple:
SELECT count(*) FROM table_name WHERE deleted_at IS NOT NULL
SHOW TABLE STATUS will 'instantly' give an approximate Row count. (There is an equivalent SELECT ... FROM information_schema.tables.) However, this may be significantly far off.
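For reference, the two forms mentioned above look roughly like this (table_name is the placeholder from the question; the second query assumes the table is in the current database):

```sql
-- Approximate row count appears in the Rows column of the output.
SHOW TABLE STATUS LIKE 'table_name';

-- Equivalent lookup through information_schema.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'table_name';
```

Both report an estimate for the whole table, not for the deleted_at IS NOT NULL subset, so they only give an upper bound for the count being asked about.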
A count(*) on an index on any column in the PRIMARY KEY will be faster because it will be smaller. But this still may not be fast enough.
There is no way to "sample". Or at least no way that is reliably better than SHOW TABLE STATUS. EXPLAIN SELECT ... with some simple query will do an estimate; again, not necessarily any better.
Please describe what kind of data you have; there may be some other tricks we can use.
See also Random. There may be a technique that will help you "sample". Be aware that all techniques are subject to various factors of how the data was generated and whether there has been "churn" on the table.
Can you periodically run the full COUNT(*) and save it somewhere? And then maintain the count after that?
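One possible shape for that idea, as a sketch only. The row_counts table, its columns, and the counter name are all hypothetical, and the incremental step assumes the application can run an extra statement whenever it sets deleted_at:

```sql
-- Hypothetical summary table: one row per saved counter.
CREATE TABLE row_counts (
  counter_name VARCHAR(64)     NOT NULL PRIMARY KEY,
  row_count    BIGINT UNSIGNED NOT NULL,
  counted_at   DATETIME        NOT NULL
);

-- Periodic full refresh (slow; meant for an off-hours scheduled job).
REPLACE INTO row_counts (counter_name, row_count, counted_at)
SELECT 'table_name.deleted', COUNT(*), NOW()
FROM table_name
WHERE deleted_at IS NOT NULL;

-- Incremental maintenance: run in the same transaction that
-- sets deleted_at on a row.
UPDATE row_counts
SET row_count = row_count + 1
WHERE counter_name = 'table_name.deleted';

-- Reading the count is then a primary-key lookup.
SELECT row_count, counted_at
FROM row_counts
WHERE counter_name = 'table_name.deleted';
```

If rows can also be un-deleted or physically removed, the counter needs a matching decrement, otherwise it drifts until the next full refresh.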
I assume you don't have this case (else the solution is trivial):
AUTO_INCREMENT id
Never DELETEd, REPLACEd, INSERT IGNOREd or ROLLBACKed any rows
Add an index on the deleted_at column to improve execution time,
and try counting id instead of *, if the table has an id column.
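As a sketch of that suggestion (the index name idx_deleted_at is made up, and building an index on a table with billions of rows is itself a long-running operation):

```sql
-- Hypothetical index name; table and column come from the question.
ALTER TABLE table_name ADD INDEX idx_deleted_at (deleted_at);

-- With the index in place, the count can be answered from the index
-- alone instead of reading full rows.
SELECT COUNT(*) FROM table_name WHERE deleted_at IS NOT NULL;

-- Check which index the optimizer actually chooses.
EXPLAIN SELECT COUNT(*) FROM table_name WHERE deleted_at IS NOT NULL;
```

COUNT(id) returns the same number as COUNT(*) whenever id is declared NOT NULL, so the choice between them should not change the result.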
I'm a newbie when it comes to MySQL/MariaDB partitions, and haven't created one yet, but am reading up on it. My first question is, if I partition a table by year and then month based on a dt_created DATETIME column, do I need to change the way I'm doing SQL queries in order to start to see a performance increase when I'm doing a single day query on dt_created? Or, does a standard query such as:
SELECT * FROM web_tracking_events where dt_created >= '(some time goes here)'
work well enough?
Basically, you can do a query like:
SELECT * FROM web_tracking_events where dt_created >= '(some time goes here)'
This is called pruning. See https://dev.mysql.com/doc/refman/8.0/en/partitioning-pruning.html
However, that means that mysql will open all partitions to check if it finds a match there.
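To see which partitions a query would read, it can help to try a small table first. The following is a sketch under assumptions: the id column, the partition names, and the dates are invented, and monthly RANGE partitions on TO_DAYS(dt_created) are just one common layout:

```sql
-- Hypothetical layout; only the table name and dt_created come from the question.
-- The partitioning column must be part of every unique key, hence the
-- composite primary key.
CREATE TABLE web_tracking_events (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  dt_created DATETIME        NOT NULL,
  PRIMARY KEY (id, dt_created)
)
PARTITION BY RANGE (TO_DAYS(dt_created)) (
  PARTITION p2024_01 VALUES LESS THAN (TO_DAYS('2024-02-01')),
  PARTITION p2024_02 VALUES LESS THAN (TO_DAYS('2024-03-01')),
  PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- A single-day query bounded on both sides. The partitions column of
-- the EXPLAIN output lists the partitions that would be read.
-- (On MySQL 5.6 and earlier, write EXPLAIN PARTITIONS instead.)
EXPLAIN
SELECT * FROM web_tracking_events
WHERE dt_created >= '2024-02-10'
  AND dt_created <  '2024-02-11';
```

Giving the query an upper bound as well as a lower bound matters here: with only dt_created >= ..., every partition from that date onward remains a candidate.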
I have a table with entries that have a start_date and end_date (both indexed, DATE format). I want to return a list of all entries where today is between these 2 dates. Here are 2 options I've considered:
1) Direct query:
MySQL query (where 2014-02-28 would be variable of course; MySQL expects date literals as 'YYYY-MM-DD'):
SELECT * FROM mytable WHERE '2014-02-28' BETWEEN start_date AND end_date
2) Daily cronjob to go through all entries and update a field is_valid (boolean format) to be true when today is between both dates, and false otherwise (the performance is less important here as it's not customer-facing). Then the MySQL query to select entries would be:
SELECT * FROM mytable WHERE is_valid = 1
The end goal is to have the fastest query (will be used in search results which would be a prominent page of the site) when entries could reach 100,000 or even millions in the future. I'm not sure if indexing dates would be good enough, or if the cronjob is just overkill - or if there is an even better way to do this!
Thanks in advance for your advice in which option to choose!
EDIT: thanks for the replies - is this index structure good?
If you want the faster query between these two options, then there is nothing like a cron job to set the flag appropriately. You should then index the resulting column, because otherwise you have to do a full-table scan. Without the index, this approach is probably slower than using the dates with an index.
For most purposes, a composite index on start_date and end_date is the preferred solution and should be quite fast enough.
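A minimal sketch of the composite-index approach, using the table and column names from the question (the index name idx_start_end is made up):

```sql
-- Hypothetical index name.
ALTER TABLE mytable ADD INDEX idx_start_end (start_date, end_date);

-- Equivalent to CURDATE() BETWEEN start_date AND end_date,
-- written as two explicit bounds.
SELECT * FROM mytable
WHERE start_date <= CURDATE()
  AND end_date   >= CURDATE();
```

Running EXPLAIN on the SELECT shows whether the optimizer picks the composite index for the actual data distribution.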
I suspect that you are submitting to the daemon of premature optimization. The fastest approach is to run a cron job and load today's data into a new table, properly indexed and structured for your analysis. Barring that, a composite index is a very reasonable approach. Although updating a flag does solve the problem, it would be neither the fastest nor the cleanest method.
I have used this same schema before. A query with the dates was fast enough, if you have the right indexes.
I am currently part of a team designing a site that will potentially have thousands of users who will be doing a number of date related searches. During the design phase we have been trying to determine which makes more sense for performance optimization.
Should we store the datetime field as a MySQL datetime? Or should we break it up into a number of fields (year, month, day, hour, minute, ...)?
The question is with a large data set and a potentially large set of users, would we gain performance wise breaking the datetime into multiple fields and saving on relying on mysql date functions? Or is mysql already optimized for this?
Have a look at the MySQL Date & Time Functions documentation, because you can pull specific information from a date using existing functions like YEAR, MONTH, etc. But while these exist, if you have an index on the date column(s), wrapping the column in one of these functions in a WHERE clause means that index cannot be used...
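To illustrate the index point, here is a sketch assuming a hypothetical table events with an indexed DATETIME column created_at:

```sql
-- Wrapping the column in a function hides it from the index:
-- every row has to be evaluated.
SELECT * FROM events
WHERE YEAR(created_at) = 2010;

-- The same filter as a range on the bare column can use the index.
SELECT * FROM events
WHERE created_at >= '2010-01-01'
  AND created_at <  '2011-01-01';
```

Both queries return the same rows; only the second leaves the optimizer free to do an index range scan.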
The problem with storing a date as separate components is the work needed to reconstruct them into a date when you want to do range comparisons or date operations.
Ultimately, choose what works best with your application. If there's seldom need for the date to be split out, consider using a VIEW to expose the date components without writing possibly redundant information into your tables.
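A possible shape for such a view, again assuming a hypothetical events table with columns id and created_at (the view and alias names are invented):

```sql
-- Exposes the date parts without storing them in the base table.
CREATE VIEW events_with_date_parts AS
SELECT
  id,
  created_at,
  YEAR(created_at)  AS created_year,
  MONTH(created_at) AS created_month,
  DAY(created_at)   AS created_day
FROM events;

-- Convenient for reporting and grouping.
SELECT created_year, created_month, COUNT(*) AS n
FROM events_with_date_parts
GROUP BY created_year, created_month;
```

Filtering on created_year through the view still evaluates YEAR(created_at) per row, so range conditions on created_at remain the better choice for selective searches.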
Use a regular datetime field. You can always switch over to the separated components down the line if performance becomes an issue. Try to avoid premature optimization - in many cases, YAGNI. You may wind up employing both the datetime field and the separated component methodology, since they both have their strengths.
If you know ahead of time some key criteria that all searches will have, MySQL (>= v5.1) table partitioning might help.
For example, if you have a table like this:
create table Books(pubDate dateTime, title varchar(50));
And you know all searches must at least include a year, you could partition it on the date field, along these lines:
create table Books(pubDate dateTime, title varchar(50))
partition by hash(year(pubDate)) partitions 10;
Then, when you run a select against the table, if your where clause includes criteria that limit the partition the results can exist on, the search will only scan that partition, rather than a full table scan. You can see this in action with:
-- scans entire table
explain partitions select * from Books where title like '%title%';
versus something like:
-- scans just one partition
explain partitions select * from Books
where year(pubDate)=2010
and title like '%title%';
The MySQL documentation on this is quite good, and you can choose from multiple partitioning algorithms.
Even if you opt to break up the date, a table partition on, say, year (int) (assuming searches will always specify a year) could help.