Deleting partitions in Impala dynamically - partitioning

Impala allows you to add partitions dynamically, as follows:
insert into table1 partition (part_col1="merged",part_col2,part_col3)
select col1,col2,col3,part_col2,part_col3 from table2 where
col="SomeValue"
So it will add multiple partitions, depending on the results of the select query. But when it comes to dropping partitions, there does not seem to be an equivalent. Is there one? You have to explicitly specify the partitions to be dropped:
alter table table1 drop
partition(part_col1="A",part_col2="B",part_col3="C")
I cannot just say something like:
alter table table1 drop partition(part_col1="A",part_col2,part_col3)

Sadly, no: there is no way to drop partitions dynamically in the way your example specifies.

Dropping multiple partitions is possible with Impala 2.8+.
The documentation states: "In Impala 2.8 and higher, the expression for the partition clause with a DROP or SET operation can include comparison operators such as <, IN, or BETWEEN, and Boolean operators such as AND and OR."
For an example, see https://stackoverflow.com/a/45550350/4202225 where the second demo shows how to drop a range of partitions.
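As a rough sketch of what that could look like for the table in the question, assuming Impala 2.8 or higher. The value 'A' comes from the question; the IN list is purely illustrative, and none of this has been run against a particular Impala build:

```sql
-- Drop every partition whose part_col1 is 'A',
-- whatever the values of part_col2 and part_col3 are.
ALTER TABLE table1 DROP IF EXISTS PARTITION (part_col1 = 'A');

-- Comparison and Boolean operators can be combined in the partition expression.
ALTER TABLE table1 DROP IF EXISTS PARTITION (part_col1 = 'A' AND part_col2 IN ('B', 'C'));

-- Check which partitions remain.
SHOW PARTITIONS table1;
```

It is probably worth running SHOW PARTITIONS before as well as after, since an expression that matches more than intended drops all of the matching partitions at once.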

Related

Partitioning table on YEAR and create view in MYSQL

I have 2 problems with a partitioned table in MySQL.
My table has three columns:
id_row INT NOT NULL AUTO_INCREMENT
name_element VARCHAR(45) NULL
date_element DATETIME NOT NULL
I modified the table to apply partitioning by range on YEAR(date_element) as follows:
ALTER TABLE `orderslist`
PARTITION BY RANGE(YEAR(date_element))
PARTITIONS 5(
PARTITION part_2013 VALUES LESS THAN (2014),
PARTITION part_2014 VALUES LESS THAN (2015),
PARTITION part_2015 VALUES LESS THAN (2016),
PARTITION part_2016 VALUES LESS THAN (2017),
PARTITION part_2017 VALUES LESS THAN (MAXVALUE));
but when I use
EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE YEAR(date_element) > '2015';
the query uses all the partitions and not only part_2015, part_2016 and part_2017.
Instead if I use
EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE date_element > '2015-10-10 10:00:00';
it works.
So my questions are:
How can I make the first query work?
Is there a way to create a materialized view from this table without losing the partitions?
Thank you
In your first example, EXPLAIN PARTITIONS SELECT * FROM ordersList WHERE YEAR(date_element) > '2015';, there's no way for the engine to identify beforehand which partition your data is in.
It must evaluate YEAR(date_element) for every row to find out the year. It's a classic example of filtering by a function's result. A DBMS generally can't use indexes to find data this way, since the function's result is unknown and must be evaluated for every row, so your search turns into a full scan.
I understand your point here, since you used the same function to define the partitioning and to find the data, but for some reason this optimization is not there. In other words: the engine doesn't notice that both functions are the same.
In the second statement, you're directly comparing a column to an arbitrary value. This is what the engine prefers, and it is where indexes come into play.
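Following that reasoning, one possible way to get the effect of the first query is to express the year condition as a range on the column itself. This is only a sketch against the table from the question; YEAR(date_element) > 2015 is the same condition as date_element being on or after the start of 2016:

```sql
-- Same rows as WHERE YEAR(date_element) > 2015, but the column is compared
-- directly to a constant, so the optimizer has a chance to prune partitions.
EXPLAIN PARTITIONS
SELECT *
FROM ordersList
WHERE date_element >= '2016-01-01 00:00:00';
```

With the partition layout from the question, the partitions column of the EXPLAIN output would be expected to list only part_2016 and part_2017, although the exact output depends on the MySQL version. In MySQL 5.7 and later, plain EXPLAIN already shows the partitions column, and the PARTITIONS keyword was removed in 8.0.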
MySQL's PARTITIONing is quite finicky. YEAR() is recognized as a partitioning expression, and it is probably one of the very few that are, but when it is wrapped in a comparison such as YEAR(date_element) > '2015' in the WHERE clause, the optimizer plays dumb.
Why are you partitioning on YEAR? It may not be useful.
If your queries are like what you described, then an appropriate index on a non-partitioned table is likely to run just as fast.
Please provide the important queries and SHOW CREATE TABLE (with or without partitioning) so we can analyze what makes the most sense.
Also, what is PARTITIONS 5??

Most efficient query to get last modified record in large table

I have a table with a large number of records ( > 300,000). The most relevant fields in the table are:
CREATE_DATE
MOD_DATE
Those are updated every time a record is added or updated.
I now need to query this table to find the date of the record that was modified last. I'm currently using
SELECT mod_date FROM table ORDER BY mod_date DESC LIMIT 1;
But I'm wondering if this is the most efficient way to get the answer.
I've tried adding a where clause to limit the date to the last month, but it looks like that's actually slower (and I need the most recent date, which could be older than the last month).
I've also tried the suggestion I read elsewhere to use:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'db'
AND TABLE_NAME = 'table';
But since I might be working on a dump of the original, that query might return NULL. And it looks like this is actually slower than the original query.
I can't resort to last_insert_id() because I'm not updating or inserting.
I just want to make sure I have the most efficient query possible.
The most efficient way to run this query is to use an index on the column MOD_DATE.
From the MySQL manual, 8.3.1 How MySQL Uses Indexes:
Indexes are used to find rows with specific column values quickly.
Without an index, MySQL must begin with the first row and then read
through the entire table to find the relevant rows. The larger the
table, the more this costs. If the table has an index for the columns
in question, MySQL can quickly determine the position to seek to in
the middle of the data file without having to look at all the data. If
a table has 1,000 rows, this is at least 100 times faster than reading
sequentially.
You can use
SHOW CREATE TABLE tbl_name;
to get the CREATE statement and see whether an index on MOD_DATE is defined.
To add an index you can use
CREATE INDEX
CREATE [UNIQUE|FULLTEXT|SPATIAL] INDEX index_name
[index_type]
ON tbl_name (index_col_name,...)
[index_option]
[algorithm_option | lock_option] ...
see http://dev.mysql.com/doc/refman/5.6/en/create-index.html
Make sure that both of those fields are indexed.
Then I would just run -
select max(mod_date) from table
or create_date, whichever one.
Make sure to create 2 indexes, one on each date field, not a compound index on both.
As for a discussion of the difference between this and using limit, see MIN/MAX vs ORDER BY and LIMIT
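Put together, the advice above might look like the following sketch. The table name tbl_name and the index names are placeholders rather than anything from the question:

```sql
-- One single-column index per date field (not a compound index on both).
CREATE INDEX idx_mod_date ON tbl_name (mod_date);
CREATE INDEX idx_create_date ON tbl_name (create_date);

-- With the index in place, the newest value can be read from the end of the index.
SELECT MAX(mod_date) FROM tbl_name;

-- Compare the plans of the two variants to confirm that the index is used.
EXPLAIN SELECT MAX(mod_date) FROM tbl_name;
EXPLAIN SELECT mod_date FROM tbl_name ORDER BY mod_date DESC LIMIT 1;
```

For the MAX() variant, the Extra column of EXPLAIN typically reads "Select tables optimized away" once the index exists, which means the value is taken straight from the index without scanning rows.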
Use EXPLAIN:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
This tells you how MySQL executes a statement. With that you can figure out the most efficient way, because it depends on your database structure and there is no single universal solution.

how does MySQL know which partition to look up?

Let's analyse the simplest possible example of MySQL partitioning by hash (a slightly modified version of http://dev.mysql.com/doc/refman/5.5/en/alter-table-partition-operations.html):
CREATE TABLE t1 (
id INT,
year_col INT
);
ALTER TABLE t1
PARTITION BY HASH(year_col)
PARTITIONS 8;
Let's say we put millions of records in there. The question is: if a specific query comes in (e.g. SELECT * FROM t1 WHERE year_col = 5), how does MySQL know which partition to look up? There are 8 partitions. I guess that the hash function is calculated, MySQL recognizes that the condition matches the partitioning key, and then MySQL knows which partition that is. But what if the query is SELECT * FROM t1 WHERE year_col IN (5, 45, 5435)? How about other non-trivial queries? Is there any general algorithm for that?
This is called Partition pruning:
The optimizer can perform pruning whenever a WHERE condition can be reduced to either one of the following two cases:
partition_column = constant
partition_column IN (constant1, constant2, ..., constantN)
In the first case, the optimizer simply evaluates the partitioning expression for the value given, determines which partition contains that value, and scans only this partition. (...)
In the second case, the optimizer evaluates the partitioning expression for each value in the list, creates a list of matching partitions, and then scans only the partitions in this partition list. (...)
MySQL can apply partition pruning to SELECT, DELETE, and UPDATE statements. INSERT statements currently cannot be pruned.
Pruning can also be applied to short ranges, which the optimizer can convert into equivalent lists of values. (...)
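To see this on the t1 table from the question, something along these lines could be tried. It is a sketch that assumes the default partition names p0 to p7 and relies on the fact that plain (non-linear) HASH partitioning places a row in partition MOD(expr, number_of_partitions):

```sql
-- Which of the 8 partitions each value hashes to: 5, 45 and 5435.
SELECT MOD(5, 8), MOD(45, 8), MOD(5435, 8);   -- 5, 5, 3

-- Single constant: only the partition for MOD(5, 8) should be scanned.
EXPLAIN PARTITIONS SELECT * FROM t1 WHERE year_col = 5;

-- IN list: the expression is evaluated per value and the matching partitions are collected.
EXPLAIN PARTITIONS SELECT * FROM t1 WHERE year_col IN (5, 45, 5435);
```

Given the MOD results, the partitions column would be expected to show p5 for the first query and p3,p5 for the second, since 5 and 45 fall into the same partition.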

Find items with similar keywords in InnoDB?

Right now I have the following construct to find items with similar keywords:
CREATE TEMPORARY TABLE tmp (FULLTEXT INDEX (keywords)) ENGINE=MyISAM
SELECT object_id, keywords FROM object_search_de;
SELECT object_id
FROM tmp
WHERE MATCH (keywords) AGAINST ('foo,bar') > 1.045;
DROP TEMPORARY TABLE tmp;
So, depending on the amount of overall records and the average size of the keyword field, this can get really slow (over 60 seconds execution time). My goal would be to be within 1 second for this task.
As an alternative to the comma-separated keywords in a TEXT field, I also have an atomic keyword table (meaning two columns, keyword and object_id, directly associating one keyword with an item).
Are there any alternatives or smooth solutions to achieving the same effect without resorting to a MyISAM mirror table?
First of all, do not create the table each time. You can create it once and use a trigger to insert/update/delete records or periodically (every hour for example) truncate and insert the records if you don't want to use triggers.
Alternatively, you can offload this task from MySQL and use Lucene/Solr or Sphinx.
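A minimal sketch of the first suggestion, a permanent MyISAM mirror kept in sync by triggers. The mirror table name, the trigger names and the column types are assumptions, as is object_id being unique in object_search_de:

```sql
-- Create the mirror once instead of on every search.
CREATE TABLE object_search_ft (
  object_id INT NOT NULL PRIMARY KEY,   -- type assumed
  keywords  TEXT,
  FULLTEXT INDEX (keywords)
) ENGINE=MyISAM;

-- Initial load.
INSERT INTO object_search_ft (object_id, keywords)
SELECT object_id, keywords FROM object_search_de;

-- Keep the mirror in sync with the source table.
CREATE TRIGGER object_search_de_ai AFTER INSERT ON object_search_de
FOR EACH ROW
  INSERT INTO object_search_ft (object_id, keywords)
  VALUES (NEW.object_id, NEW.keywords);

CREATE TRIGGER object_search_de_au AFTER UPDATE ON object_search_de
FOR EACH ROW
  UPDATE object_search_ft
  SET object_id = NEW.object_id, keywords = NEW.keywords
  WHERE object_id = OLD.object_id;

CREATE TRIGGER object_search_de_ad AFTER DELETE ON object_search_de
FOR EACH ROW
  DELETE FROM object_search_ft WHERE object_id = OLD.object_id;
```

The search itself then runs directly against object_search_ft with the same MATCH ... AGAINST condition, without the per-query copy and index build.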

Optimize mySql for faster alter table add column

I have a table that has 170,002,225 rows with about 35 columns and two indexes. I want to add a column. The alter table command took about 10 hours. Neither the processor seemed busy during that time nor were there excessive IO waits. This is on a 4 way high performance box with tons of memory.
Is this the best I can do? Is there something I can look at to optimize the add column in tuning of the db?
I faced a very similar situation in the past and I improved the performance of the operation this way:
1. Create a new table (using the structure of the current table) with the new column(s) included.
2. Execute INSERT INTO new_table (column1,..columnN) SELECT column1,..columnN FROM current_table;
3. Rename the current table.
4. Rename the new table using the name of the current table.
ALTER TABLE in MySQL is actually going to create a new table with new schema, then re-INSERT all the data and delete the old table. You might save some time by creating the new table, loading the data and then renaming the table.
From the book "High Performance MySQL" (the Percona guys):
The usual trick for loading a MyISAM table efficiently is to disable keys, load the data and re-enable the keys:
mysql> ALTER TABLE test.load_data DISABLE KEYS;
-- load data
mysql> ALTER TABLE test.load_data ENABLE KEYS;
Well, I would recommend using the latest Percona MySQL builds. Plus, there is the following note in the MySQL manual:
In other cases, MySQL creates a
temporary table, even if the data
wouldn't strictly need to be copied.
For MyISAM tables, you can speed up
the index re-creation operation (which
is the slowest part of the alteration
process) by setting the
myisam_sort_buffer_size system
variable to a high value.
You can do ALTER TABLE DISABLE KEYS first, then add the column, and then ALTER TABLE ENABLE KEYS. I don't see anything else that can be done here.
BTW, can't you go MongoDB? It doesn't rebuild anything when you add a column.
Maybe you can remove the indexes before altering the table, since building the index is what takes most of the time?
Combining some of the comments on the other answers, this was the solution that worked for me (MySQL 5.6):
create table mytablenew like mytable;
alter table mytablenew add column col4a varchar(12) not null after col4;
alter table mytablenew drop index index1, drop index index2,...drop index indexN;
insert into mytablenew (col1,col2,...colN) select col1,col2,...colN from mytable;
alter table mytablenew add index index1 (col1), add index index2 (col2),...add index indexN (colN);
rename table mytable to mytableold, mytablenew to mytable;
On a 75M row table, dropping the indexes before the insert caused the query to complete in 24 minutes rather than 43 minutes.
Other answers/comments have insert into mytablenew (col1) select (col1) from mytable, but this results in ERROR 1241 (21000): Operand should contain 1 column(s) if you have the parentheses in the select query.
Other answers/comments have insert into mytablenew select * from mytable;, but this results in ERROR 1136 (21S01): Column count doesn't match value count at row 1 if you've already added a column.