Issues Creating Apache Kudu Range Partitioned Table - partitioning

I am trying to create a simple Kudu table with Hash and Range Partitions.
When I try to use a Decimal(18,0) for the Range partition I get the following error:
IllegalStateException: null
drop table if exists mydb.xxx;
create table if NOT EXISTS mydb.xxx (
tx_id decimal(18,0) not null ,
tdl_id decimal(18,0) not null ,
dt1 int ,
PRIMARY KEY(tx_id,tdl_id) )
PARTITION BY
HASH(tx_id,tdl_id) PARTITIONS 22 ,
RANGE (tx_id )
(
partition values < 1000 ,
partition 1000 <= values
)
stored as kudu;
This Statement works:
drop table if exists mydb.xxx;
create table if NOT EXISTS mydb.xxx (
tx_id bigint not null ,
tdl_id decimal(18,0) not null ,
dt1 int ,
PRIMARY KEY(tx_id,tdl_id) )
PARTITION BY
HASH(tx_id,tdl_id) PARTITIONS 22 ,
RANGE (tx_id )
(
partition values < 1000 ,
partition 1000 <= values
)
stored as kudu;
The only difference is the data type of tx_id.
Does anyone know whether it is illegal to use DECIMAL data types for range partitioning in Kudu?
Thank you for your help.

Please check the Kudu tablet server threads, then run this change. Note that range partitions must not overlap, so the boundary value 1000 gets its own single-value partition:
drop table if exists mydb.xxx;
create table if NOT EXISTS mydb.xxx (
tx_id bigint not null ,
tdl_id decimal(18,0) not null ,
dt1 int ,
PRIMARY KEY(tx_id,tdl_id) )
PARTITION BY
HASH(tx_id,tdl_id) PARTITIONS 22 ,
RANGE (tx_id )
(
partition values < 1000 ,
partition values = 1000 ,
partition values >= 1000
)
stored as kudu;
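If more splits are needed, Impala's bounded range syntax keeps the partitions non-overlapping by construction. A hedged sketch reusing the working bigint table from the question (the 2000 boundary is purely illustrative):
drop table if exists mydb.xxx;
create table if NOT EXISTS mydb.xxx (
tx_id bigint not null ,
tdl_id decimal(18,0) not null ,
dt1 int ,
PRIMARY KEY(tx_id,tdl_id) )
PARTITION BY
HASH(tx_id,tdl_id) PARTITIONS 22 ,
RANGE (tx_id )
(
partition values < 1000 ,
partition 1000 <= values < 2000 ,
partition 2000 <= values
)
stored as kudu;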

Related

MySQL table partitioning on timestamp

I have partitioned a table (because of an out of memory error - table got too big). I have partitioned it on a timestamp column as shown below:
CREATE TABLE test (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
dob timestamp NOT NULL
)
PARTITION BY RANGE( unix_timestamp(dob) ) (
PARTITION p2012 VALUES LESS THAN (unix_timestamp('2013-01-01 00:00:00')),
PARTITION p2013 VALUES LESS THAN (unix_timestamp('2014-01-01 00:00:00')),
PARTITION pNew VALUES LESS THAN MAXVALUE
);
I was hoping that the process of partitioning would also help in speeding up a couple of my queries which take a few hours to run; however, this type of partitioning doesn't seem to kick in, and all partitions are still being used and scanned through for the queries. I have tried, and failed with, a couple more approaches:
1) Tried to use a different range expression for the partitioning:
CREATE TABLE t2 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob timestamp NOT NULL
)
PARTITION BY RANGE( YEAR(dob) ) (
PARTITION p2012 VALUES LESS THAN (2013),
PARTITION p2013 VALUES LESS THAN (2014),
PARTITION pNew VALUES LESS THAN MAXVALUE
);
However, that results in an error: Error Code: 1486. Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed
2) Gave up on changing the partitioning to be recognized by the query optimizer and, as suggested in MySQL's docs (18.5 Partition Selection), tried specifying which partitions to use in the SELECT statement instead:
select * from t2 partition (p2012)
But, that returns a syntax error Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(p2012) LIMIT 0, 1000' at line 1
Does anybody have any suggestions what else I could try to utilize table partitioning to optimize the queries?
You can use the UNIX_TIMESTAMP() function. Example from the MySQL docs:
CREATE TABLE quarterly_report_status (
report_id INT NOT NULL,
report_status VARCHAR(20) NOT NULL,
report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
PARTITION p9 VALUES LESS THAN (MAXVALUE)
);
You can find it in:
https://dev.mysql.com/doc/refman/5.7/en/partitioning-range.html.
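To check whether pruning actually kicks in for a given query, EXPLAIN PARTITIONS lists the partitions the optimizer will read. A minimal sketch against the test table from the question (the date range is only an example):
EXPLAIN PARTITIONS
SELECT * FROM test
WHERE dob >= '2013-01-01 00:00:00'
  AND dob <  '2014-01-01 00:00:00';
-- if pruning works, the "partitions" column should list only p2013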
You can do this if you use DATE or DATETIME instead of TIMESTAMP as the data type.
CREATE TABLE t2 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob DATETIME NOT NULL
)
PARTITION BY RANGE( YEAR(dob) ) (
PARTITION p2012 VALUES LESS THAN (2013),
PARTITION p2013 VALUES LESS THAN (2014),
PARTITION pNew VALUES LESS THAN MAXVALUE
);
Using the partition-selection hint is only supported in MySQL 5.6 and later.
See http://dev.mysql.com/doc/refman/5.6/en/partitioning-selection.html
Note that this page of the manual exists only for MySQL 5.6; if you click the MySQL 5.5 documentation link, it redirects you back to 5.6.
The problem is that the field dob is not a unique key!
You can use this command:
CREATE TABLE t2 (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
region_code TINYINT UNSIGNED NOT NULL,
dob timestamp NOT NULL,
unique `dob` (dob)
)
or, if the table already exists:
alter table t2 add UNIQUE(dob)
You can try it!

Why is my MySQL group by so slow?

I am trying to query against a partitioned table (by month) approaching 20M rows. I need to group by DATE(transaction_utc) as well as country_id. The rows that get returned if I turn off the group by and aggregates number just over 40k, which isn't too many; however, adding the group by makes the query substantially slower, unless said GROUP BY is on the transaction_utc column, in which case it gets FAST.
I've been trying to optimize this first query below by tweaking the query and/or the indexes, and got to the point below (about 2x as fast as initially); however, I am still stuck with a 5s query for summarizing 45k rows, which seems way too long.
For reference, this box is a brand new 24 logical core, 64GB RAM, MariaDB 5.5.x server with way more InnoDB buffer pool available than index space on the server, so there shouldn't be any RAM or CPU pressure.
So, I'm looking for ideas on what is causing this slow down and suggestions on speeding it up. Any feedback would be greatly appreciated! :)
Ok, onto the details...
The following query (the one I actually need) takes approx 5 seconds (+/-), and returns less than 100 rows.
SELECT lss.`country_id` AS CountryId
, Date(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`, DATE(lss.`transaction_utc`)
EXPLAIN SELECT for the same query is as follows. Notice that it's not using the transaction_utc key. Shouldn't it be using my covering index instead?
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE lss ref idx_unique,transaction_utc,country_id idx_unique 50 const 1208802 Using where; Using temporary; Using filesort
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 georiot.lss.country_id 1
Now onto a couple of other options that I've tried, to attempt to determine what's going on...
The following query (changed group by) takes about 5 seconds (+/-), and returns only 3 rows:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' ) GROUP BY lss.`country_id`
The following query (removed group by) takes 4-5 seconds (+/-) and returns 1 row:
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
The following query takes .00X seconds (+/-) and returns ~45k rows. This to me shows that at max we're only trying to group 45K rows into less than 100 groups (as in my initial query):
SELECT lss.`country_id` AS CountryId
, DATE(lss.`transaction_utc`) AS TransactionDate
, c.`name` AS CountryName, lss.`country_id` AS CountryId
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`transaction_utc`
TABLE SCHEMA:
CREATE TABLE IF NOT EXISTS `sales` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user_linkshare_account_id` int(11) unsigned NOT NULL,
`username` varchar(16) NOT NULL,
`country_id` int(4) unsigned NOT NULL,
`order` varchar(16) NOT NULL,
`raw_tracking_code` varchar(255) DEFAULT NULL,
`transaction_utc` datetime NOT NULL,
`processed_utc` datetime NOT NULL ,
`sku` varchar(16) NOT NULL,
`sale_original` decimal(10,4) NOT NULL,
`sale_usd` decimal(10,4) NOT NULL,
`quantity` int(11) NOT NULL,
`commission_original` decimal(10,4) NOT NULL,
`commission_usd` decimal(10,4) NOT NULL,
`original_currency` char(3) NOT NULL,
PRIMARY KEY (`id`,`transaction_utc`),
UNIQUE KEY `idx_unique` (`username`,`order`,`processed_utc`,`sku`,`transaction_utc`),
KEY `raw_tracking_code` (`raw_tracking_code`),
KEY `idx_usd_amounts` (`sale_usd`,`commission_usd`),
KEY `idx_countries` (`country_id`),
KEY `transaction_utc` (`transaction_utc`,`username`,`country_id`,`sale_usd`,`commission_usd`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE ( TO_DAYS(`transaction_utc`))
(PARTITION pOLD VALUES LESS THAN (735112) ENGINE = InnoDB,
PARTITION p201209 VALUES LESS THAN (735142) ENGINE = InnoDB,
PARTITION p201210 VALUES LESS THAN (735173) ENGINE = InnoDB,
PARTITION p201211 VALUES LESS THAN (735203) ENGINE = InnoDB,
PARTITION p201212 VALUES LESS THAN (735234) ENGINE = InnoDB,
PARTITION pMAX VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ AUTO_INCREMENT=19696320 ;
The offending part is probably the GROUP BY DATE(transaction_utc). You also claim to have a covering index for this query but I see none. Your 5-column index has all the columns used in the query but not in the best order (which is: WHERE - GROUP BY - SELECT).
So the engine, finding no useful index, would have to evaluate this function for all 20M rows. Actually, it finds an index that starts with username (the idx_unique) and uses that, so it has to evaluate the function for (only) 1.2M rows. If you had a (transaction_utc) or a (username, transaction_utc) index, it would choose the most useful of the three.
Can you afford to change the table structure by splitting the column into date and time parts?
If you can, then an index on (username, country_id, transaction_date) or, changing the order of the two columns used for grouping, on (username, transaction_date, country_id), would be quite efficient.
A covering index on (username, country_id, transaction_date, sale_usd, commission_usd) would be even better.
If you want to keep the current structure, try changing the order inside your 5-column index to:
(username, country_id, transaction_utc, sale_usd, commission_usd)
or to:
(username, transaction_utc, country_id, sale_usd, commission_usd)
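As a hedged sketch of the first reordering (transaction_utc is the name of the existing 5-column index in the schema above; the new index name is made up), dropped and re-added in a single statement so the table is rebuilt only once:
ALTER TABLE sales
DROP INDEX transaction_utc,
ADD INDEX idx_user_country_utc
(username, country_id, transaction_utc, sale_usd, commission_usd);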
Since you are using MariaDB, you can use the VIRTUAL columns feature, without changing the existing columns:
Add a virtual (persistent) column and the appropriate index:
ALTER TABLE sales
ADD COLUMN transaction_date DATE
AS (DATE(transaction_utc)) PERSISTENT,
ADD INDEX special_IDX
(username, country_id, transaction_date, sale_usd, commission_usd) ;
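Once the persistent column and special_IDX exist, a sketch of the query from the question grouping on the plain DATE column (the WHERE clause is unchanged; only the selected date and the GROUP BY move to transaction_date, so the GROUP BY no longer evaluates DATE() per row):
SELECT lss.`country_id` AS CountryId
, lss.`transaction_date` AS TransactionDate
, c.`name` AS CountryName
, COALESCE(SUM(lss.`sale_usd`),0) AS SaleUSD
, COALESCE(SUM(lss.`commission_usd`),0) AS CommissionUSD
FROM `sales` lss
JOIN `countries` c ON lss.`country_id` = c.`country_id`
WHERE ( lss.`transaction_utc` BETWEEN '2012-09-26' AND '2012-10-26' AND lss.`username` = 'someuser' )
GROUP BY lss.`country_id`, lss.`transaction_date`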

how do you copy a mysql query minus the primary key into another table

In my code I have the table itself already created; I am just trying to copy all fields from one table into another:
INSERT INTO $tbl_query (imagetargetpath, gamename, gamedirectory, like, dislike)
SELECT *
FROM $sql_tbl
WHERE gamename LIKE '%$item_searched%'
the table itself was created and structured like this
$tbl_query(`id` INT( 14 ) NOT NULL AUTO_INCREMENT ,
`imagetargetpath` VARCHAR( 80 ) NOT NULL ,
`gamename` VARCHAR( 50 ) NOT NULL , `gamedirectory` VARCHAR( 50 ) NOT NULL
, `like` INT( 14 ) NOT NULL , `dislike` INT( 14 ) NOT NULL ,PRIMARY KEY ( `id` )) ";
My goal is basically to copy the query results into a new table, but have $tbl_query generate its own unique id numbers.
Select only the columns you need (and note that like is a reserved word in MySQL, so it has to be quoted with backticks):
INSERT INTO $tbl_query (imagetargetpath, gamename, gamedirectory, `like`, dislike)
SELECT imagetargetpath, gamename, gamedirectory, `like`, dislike
FROM $sql_tbl
WHERE gamename LIKE '%$item_searched%'
FYI, this problem/question comes up a lot with databases, so it's worth remembering how to do it.

partitions and sub partitions

The following CREATE TABLE statement to partition a table works as expected, without error.
CREATE TABLE `ox_data_archive_20120108` (
`id` bigint(20) unsigned NOT NULL,
`creativeid` int unsigned NOT NULL,
`zoneid` int unsigned NOT NULL,
`datetime` datetime NOT NULL
) PARTITION BY LIST(to_days(datetime )) (
PARTITION `1Jan10` VALUES IN (to_days('2010-01-01') ),
PARTITION `2Jan10` VALUES IN (to_days('2010-01-02') ),
PARTITION `3Jan10` VALUES IN (to_days('2010-01-03') )
);
What I need to do is to create subpartitions based on date + zoneid. I tried the following:
CREATE TABLE mypart (
`id` bigint(20) unsigned NOT NULL,
`creativeid` int unsigned NOT NULL,
`zoneid` int unsigned NOT NULL,
`datetime` datetime NOT NULL
) PARTITION BY LIST(to_days(datetime ))
SUBPARTITION BY KEY(zoneid) (
PARTITION `1Jan10` VALUES IN (to_days('2010-01-01') )
(Subpartition s1, Subpartition s2),
PARTITION `2Jan10` VALUES IN (to_days('2010-01-02') )
(Subpartition s3, Subpartition s4),
PARTITION `3Jan10` VALUES IN (to_days('2010-01-03') )
(Subpartition s5, Subpartition s6)
);
Inserting into this table:
INSERT INTO mypart VALUES (1, 2, 3, '2012-01-31 04:10:03');
results in the following error:
ERROR 1526 (HY000): Table has no partition for value 734898
My queries are expected to make use of the zoneid subpartitions within the date-based partitions. Is that possible?
Contrary to your assertion that the first table works without error, inserting the sample data into it:
INSERT INTO `ox_data_archive_20120108` VALUES (1, 2, 3, '2012-01-31 04:10:03');
results in the same error as for the second table. The value given in the error (734898) happens to be the value for to_days('2012-01-31'). You get this error because you only have partitions for January 1st through 3rd, 2010. Both the day-of-month and the year for the sample data are outside the defined partitions. Instead of TO_DAYS (which returns the number of days from year 0 to the given date), you probably want DAYOFMONTH. Since each partition is contiguous, a RANGE partition seems more appropriate than a LIST.
Off topic, you only need to specify separate subpartition definitions when you want to set options for the subpartitions. Since you're not doing that, a SUBPARTITIONS 2 clause will do the same thing as your statement, but is simpler.
CREATE TABLE mypart (
`id` bigint(20) unsigned NOT NULL,
`creativeid` int unsigned NOT NULL,
`zoneid` int unsigned NOT NULL,
`datetime` datetime NOT NULL
) PARTITION BY RANGE(DAYOFMONTH(`datetime`))
SUBPARTITION BY KEY(zoneid)
SUBPARTITIONS 2 (
PARTITION `01` VALUES LESS THAN (2), -- Note: 0 is a valid day-of-month
PARTITION `02` VALUES LESS THAN (3),
PARTITION `03` VALUES LESS THAN (4),
...
);
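Assuming the elided partition list is continued through day 31 (or closed with a MAXVALUE partition), the insert from the question should then find a matching partition:
INSERT INTO mypart VALUES (1, 2, 3, '2012-01-31 04:10:03');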

Using query to change table mapping

I have a table mytable(id, key, value). I realize that key is generating a lot of data redundancy since my key is a string (my keys are really long, but repetitive). How do I build a separate table that has (key, keyID) and then alter my tables to be mytable(id, keyID, value) and keyTable(keyID, key)?
Create keyTable
Fill keys from mytable:
INSERT INTO keyTable (`key`) SELECT DISTINCT mytable.key FROM mytable;
add keyID column to mytable
Assign keyIDs:
UPDATE mytable SET keyID = (SELECT keyTable.keyID FROM keyTable WHERE keyTable.key = mytable.key);
Remove the key column from mytable (see the full sketch of all five steps below)
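Spelled out end to end, the five steps might look like the sketch below; the column types (VARCHAR(255) for key, INT for the ids) are assumptions, since the question does not give them:
-- 1) Create keyTable (types are assumed)
CREATE TABLE keyTable (
  keyID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `key` VARCHAR(255) NOT NULL
);
-- 2) Fill keys from mytable, deduplicating the long strings
INSERT INTO keyTable (`key`) SELECT DISTINCT mytable.`key` FROM mytable;
-- 3) Add the keyID column to mytable
ALTER TABLE mytable ADD COLUMN keyID INT NULL;
-- 4) Assign keyIDs by looking up each row's key
UPDATE mytable
SET keyID = (SELECT keyTable.keyID FROM keyTable WHERE keyTable.`key` = mytable.`key`);
-- 5) Remove the key column from mytable
ALTER TABLE mytable DROP COLUMN `key`;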
I just posted my worked solution for your problem. Just check this, step by step:
CREATE TABLE `keytable` (
`keyID` INT( 11 ) NOT NULL AUTO_INCREMENT,
`key` VARCHAR( 100 ) NOT NULL,
`id` INT( 11 ) NOT NULL,
PRIMARY KEY (`keyID`)
) ;
insert into `keytable` (`key`,`id`) select `key`,`id` from mytable;
ALTER TABLE `mytable` CHANGE `key` `keyID` INT( 11 ) NOT NULL ;
update `mytable` set `keyID`= (select `keyID` from keytable where keytable.id=mytable.id);
ALTER TABLE `keytable` DROP `id` ;