I have an InnoDB table that will eventually hold billions of records. Every second week I expect about 500K records to be loaded into the table. I would like to partition this table by the date on which the data is imported. Luckily this is a column in the table, in the format yyyy-mm-dd. Is it possible to partition on this date column? I looked at chapter 18 of the MySQL docs but couldn't figure out whether this is possible.
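With plain RANGE partitioning the partitioning expression must evaluate to an integer, so you cannot partition on a date column directly, but you can partition on TO_DAYS(thedatecolumn). (MySQL 5.5 and later also accept a DATE column directly with RANGE COLUMNS.) There are some examples here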
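As a minimal sketch of the TO_DAYS approach, assuming a hypothetical table named imports with a DATE column import_date (neither name comes from the question), the definition might look roughly like this. MySQL requires the partitioning column to be part of every primary or unique key, which is why import_date is included in the primary key here.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE imports (
    id          BIGINT NOT NULL AUTO_INCREMENT,
    import_date DATE   NOT NULL,
    payload     VARCHAR(255),
    -- The partitioning column must appear in every unique key.
    PRIMARY KEY (id, import_date)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(import_date)) (
    PARTITION p2013_01 VALUES LESS THAN (TO_DAYS('2013-02-01')),
    PARTITION p2013_02 VALUES LESS THAN (TO_DAYS('2013-03-01')),
    PARTITION p2013_03 VALUES LESS THAN (TO_DAYS('2013-04-01')),
    -- Catch-all so rows with later dates are not rejected.
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);

-- Later, carve the next month out of the catch-all partition.
ALTER TABLE imports REORGANIZE PARTITION pmax INTO (
    PARTITION p2013_04 VALUES LESS THAN (TO_DAYS('2013-05-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
```

Monthly partitions are only one possible granularity; with a load every second week, one partition per month or per quarter keeps the partition count manageable.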
mysql query to find data entry not present in table for particular date
I wish to find the particular Fridays whose entry is not present in the table
You can only query things that exist in the data. So to find missing dates, you can create a table with all dates in your desired date range and write a query which selects those dates that do not exist in your other table.
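A minimal sketch of that idea, assuming MySQL 8.0 or later and a hypothetical table entries with a DATE column entry_date (both names are assumptions, not from the question). Here a recursive CTE stands in for the table of all dates; a permanent calendar table would work the same way in the join.

```sql
-- Generate every date in the range, then keep the Fridays
-- that have no matching row in the (hypothetical) entries table.
WITH RECURSIVE calendar (d) AS (
    SELECT DATE('2016-01-01')
    UNION ALL
    SELECT d + INTERVAL 1 DAY
    FROM calendar
    WHERE d < '2016-03-31'
)
SELECT c.d AS missing_friday
FROM calendar AS c
LEFT JOIN entries AS e
       ON e.entry_date = c.d
WHERE DAYOFWEEK(c.d) = 6        -- DAYOFWEEK: 1 = Sunday ... 6 = Friday
  AND e.entry_date IS NULL;     -- no entry exists for that date
```

For ranges longer than 1000 days the session variable cte_max_recursion_depth would need to be raised, or a real calendar table used instead.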
I want to collect time-series data and store it in snappydata store. I will be collecting millions of rows of data and I want to make queries across timeslices/ranges.
Here is an example query I want to do:
select avg(value)
from example_timeseries_table
where time >= :startDate and time < :endDate;
So, I am thinking that I want to have PARTITION BY COLUMN on time columns rather than the classic PRIMARY KEY column. In other technologies that I am familiar with like Cassandra DB, using the time columns in the partition key would point me directly at the partition and allow pulling the data for the timeslice in a single node rather than across many distributed nodes.
To be performant, I assume I need to partition by column 'time', in this table.
example_timeseries_table
------------------------
id int not nullable,
value varchar(128) not nullable,
time timestamp not nullable
PERSISTENT ASYNCHRONOUS
PARTITION BY COLUMN time
Is this the correct column to partition on for efficient, time-slice queries or do I need to make even more columns like: year_num, month_num, day_num, hour_num columns and PARTITION BY COLUMN on all of them as well, then do a query like this to focus the query to a particular partitioned node?:
select avg(value)
from example_table
where year_num = 2016
and month_num= 1
and day_num = 4
and hour_num = 11
and time >= :startDate and time < :endDate;
When a single partition has all the data, a single processor processes that data and you lose distributed processing. In fact, if you have time series data, most of the time you would be querying the node that holds the latest time range and the rest of your compute capacity sits idle. If you expect concurrent queries on various time ranges then it may be fine but that is not the case most of the time.
Assuming that you are working with row tables, another way to speed up your queries would be by creating an index on your time column.
SnappyData supports partition pruning on row tables. In case you decide to go the way you mention here, the timestamp column's partition pruning should work.
I have a big table with a great many tuples, but I run some queries against only a small range of it (selected by date range). I would like to know how to set up the table structure to optimize the database's work on date ranges. Could I add the startdate and enddate fields as an index? Would that help?
I want to partition a table in MySQL while preserving the table's structure.
I have a column, 'Year', based on which I want to split up the table into different tables for each year respectively. The new tables will have names like 'table_2012', 'table_2013' and so on. The resultant tables need to have all the fields exactly as in the source table.
I have tried the following two pieces of SQL script with no success:
1.
CREATE TABLE all_data_table
( column1 int default NULL,
column2 varchar(30) default NULL,
column3 date default NULL
) ENGINE=InnoDB
PARTITION BY RANGE ((year))
(
PARTITION p0 VALUES LESS THAN (2010),
PARTITION p1 VALUES LESS THAN (2011) , PARTITION p2 VALUES LESS THAN (2012) ,
PARTITION p3 VALUES LESS THAN (2013), PARTITION p4 VALUES LESS THAN MAXVALUE
);
2.
ALTER TABLE all_data_table PARTITION BY RANGE COLUMNS (`year`) (
PARTITION p0 VALUES LESS THAN (2011),
PARTITION p1 VALUES LESS THAN (2012),
PARTITION p2 VALUES LESS THAN (2013),
PARTITION p3 VALUES LESS THAN (MAXVALUE)
);
Any assistance would be appreciated!
This is old, but seeing as it comes up highly ranked in partitioning searches, I figured I'd give some additional details for people who might hit this page. What you are talking about in having a table_2012 and table_2013 is not "MySQL Partitioning" but "Manual Partitioning".
Partitioning means that you have one "logical table" with a single table name, which--behind the scenes--is divided among multiple files. When you have millions to billions of rows, over years, but typically you are only searching a single month, partitioning by Year/Month can have a great performance benefit because MySQL only has to search against the file that contains the Year/Month that you are searching for...so long as you include the partition key in your WHERE.
When you create multiple tables like table_2012 and table_2013, you are MANUALLY partitioning the tables, which you don't do with the MySQL PARTITION configuration. To manually partition the tables, during 2012, you put all data into the 2012 table. When you hit 2013, you start putting all the data into the 2013 table. You have to make sure to create the table before you hit 2013 or it won't have any place to go. Then, when you query across the years (e.g. from Nov 2012 - Jan 2013), you have to do a UNION between table_2012 and table_2013.
SELECT * FROM table_2012 WHERE #...
UNION
SELECT * FROM table_2013 WHERE #...
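A slightly fuller sketch of the cross-year query, assuming both yearly tables share a hypothetical DATETIME column created_at (not named in the original). It uses UNION ALL rather than UNION: plain UNION removes duplicate rows, which costs an extra de-duplication pass and is not needed when each row lives in exactly one yearly table.

```sql
-- created_at is a hypothetical column assumed to exist in both tables.
-- Nov 2012 through Jan 2013, spanning two manually partitioned tables.
SELECT * FROM table_2012 WHERE created_at >= '2012-11-01'
UNION ALL
SELECT * FROM table_2013 WHERE created_at <  '2013-02-01';
```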
With partitioning, this manual work is not necessary. You do the initial setup of the partitions, then you treat it as a single table. No unions required, no checking the date before you insert, etc. This makes life much easier. MySQL handles figuring out which partitions it needs to query. However, you MUST make sure to filter on the Year column or it will have to scan ALL partition files. E.g. SELECT * FROM all_data_table WHERE Month=12 will scan every partition for Month=12. To ensure you are only scanning the partition files that you need to scan, include the partition column in every query that you can.
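As a rough sketch of what a working setup could look like: the table definition shown in the question has no year column, which is one likely reason both attempts were rejected. The version below adds one (the SMALLINT type is an assumption) and then checks which partitions a query touches.

```sql
CREATE TABLE all_data_table (
    column1 INT         DEFAULT NULL,
    column2 VARCHAR(30) DEFAULT NULL,
    column3 DATE        DEFAULT NULL,
    `year`  SMALLINT    NOT NULL      -- assumed type; must exist to partition on it
) ENGINE=InnoDB
PARTITION BY RANGE (`year`) (
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION p2011 VALUES LESS THAN (2012),
    PARTITION p2012 VALUES LESS THAN (2013),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Filtering on the partition column lets MySQL skip the other partitions.
EXPLAIN SELECT * FROM all_data_table WHERE `year` = 2012;

-- No filter on `year`, so every partition has to be examined.
EXPLAIN SELECT * FROM all_data_table WHERE column1 = 12;
```

In MySQL 5.7 and later the partitions column of the EXPLAIN output shows which partitions are read; older versions need EXPLAIN PARTITIONS for the same information.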
Possible negatives to partitioning...if you have billions of rows and you do an ALTER TABLE on the table to--say--add a column...it's going to have to update every row taking a VERY long time. At the company I currently work for, the boss doesn't think it's worth the time it takes to update the billion rows historically when we are adding a new column for going forward...so this is one of the reasons we do manual partitioning instead of letting MySQL do it.
DISCLAIMER: I am not an expert at partitioning...so if I'm wrong in any of this, please let me know and I'll fix the incorrect parts.
From what I see you want to create many tables from one big table.
I think you should try to create views instead.
From what I have read about partitioning, it splits the physical storage of the table and stores the pieces separately, but seen from the outside it is still a single table.
I have a table with 5 million rows, and I want to get only rows that have the field date between two dates (date1 and date2). I tried to do
select column from table where date > date1 and date < date2
but the processing time is very long. Is there a smarter way to do this? Maybe jump directly to a row and run the query only from that row onward? My point is, is there a way to discard the large part of my table that does not fall within the date period? Or do I have to read row by row and compare the dates?
Usually you apply some kind of condition before retrieving the results. If you don't have anything to filter on you might want to use LIMIT and OFFSET:
SELECT * FROM table_name WHERE date BETWEEN ? AND ? LIMIT 1000 OFFSET 1000
Generally you will LIMIT to whatever amount of records you'd like to show on a particular page.
You can try/do a couple of things:
1.) If you don't already have one, index your date column
2.) Range partition your table on the date field
When you partition a table, the query optimizer can eliminate partitions that are not able to satisfy the query without actually processing any data.
For example, let's say you partitioned your table monthly by the date field and that you had 6 months of data in the table. If you query for a one-week date range in OCT-2012, the query optimizer can throw out 5 of the 6 partitions and only scan the partition that holds the records for OCT 2012.
For more details, check the MySQL Partitioning page. It gives you all the necessary information and gives a more thorough example of what I described above in the "Partition Pruning" section.
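The scenario above could be sketched as follows, assuming a hypothetical table events with a DATE column event_date (names not from the question) and MySQL 5.5 or later for RANGE COLUMNS. An ordinary index on the date column would be a plain CREATE INDEX and is independent of the partitioning shown here.

```sql
-- Hypothetical table holding six months of data, one partition per month.
CREATE TABLE events (
    id         BIGINT NOT NULL,
    event_date DATE   NOT NULL,
    -- The partitioning column must be part of the primary key.
    PRIMARY KEY (id, event_date)
) ENGINE=InnoDB
PARTITION BY RANGE COLUMNS (event_date) (
    PARTITION p2012_07 VALUES LESS THAN ('2012-08-01'),
    PARTITION p2012_08 VALUES LESS THAN ('2012-09-01'),
    PARTITION p2012_09 VALUES LESS THAN ('2012-10-01'),
    PARTITION p2012_10 VALUES LESS THAN ('2012-11-01'),
    PARTITION p2012_11 VALUES LESS THAN ('2012-12-01'),
    PARTITION p2012_12 VALUES LESS THAN ('2013-01-01')
);

-- One week in October 2012: only the October partition should be needed.
EXPLAIN SELECT COUNT(*)
FROM events
WHERE event_date >= '2012-10-08'
  AND event_date <  '2012-10-15';
```

If pruning works as expected, the partitions column of the EXPLAIN output should list only p2012_10 (on versions before 5.7, use EXPLAIN PARTITIONS to see that column).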
Note: I would recommend cloning your table into a new partitioned table and running the query there to test whether the results satisfy your requirements. If you haven't already indexed the date column, that should be your first step; test that, and only if need be look into partitioning.