I have a large table with 80 million rows and a trigger that updates two other tables, all of which are TokuDB. The server is running Percona 5.6.
CREATE TABLE `main` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`ip_addr` varchar(50) NOT NULL DEFAULT '',
`username` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=TokuDB;
The trigger code is
if NEW.ip_addr <> "" THEN
-- get the current oldest date
-- set @maxdate := now();
set @maxdate := (select lastseen from uniq_ip where data = NEW.ip_addr);
INSERT INTO uniq_ip (`data`, `total`, `lastseen`)
VALUES (NEW.ip_addr, 1, NEW.timestamp)
ON DUPLICATE KEY UPDATE total = total + 1, lastseen = latest_date(NEW.timestamp, @maxdate);
end if;
-- get all values in one go; it's indexed, so the query comes straight from the index.
if NEW.username <> "" THEN
-- get the current oldest date
set @maxdate := (select lastseen from uniq_username where data = NEW.username);
INSERT INTO uniq_username (`data`, `total`, `lastseen`)
VALUES (NEW.username, 1, NEW.timestamp)
ON DUPLICATE KEY UPDATE total = total + 1, lastseen = latest_date(NEW.timestamp, @maxdate);
end if;
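The question shows only the trigger body. For context, a hedged guess at the surrounding DDL; the trigger name and AFTER timing are assumptions (it could equally be BEFORE INSERT):

DELIMITER $$
CREATE TRIGGER main_after_insert
AFTER INSERT ON main
FOR EACH ROW
BEGIN
-- ... trigger body shown above ...
END$$
DELIMITER ;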
and the uniq_username and uniq_ip tables are:
CREATE TABLE `uniq_ip` (
`data` varchar(42) NOT NULL,
`total` mediumint(4) unsigned NOT NULL DEFAULT '0',
`lastseen` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`data`),
KEY `idx_lastseen` (`data`,`lastseen`)
) ENGINE=TokuDB DEFAULT CHARSET=ascii;
CREATE TABLE `uniq_username` (
`data` varchar(255) CHARACTER SET latin1 NOT NULL,
`total` mediumint(4) unsigned NOT NULL DEFAULT '0',
`lastseen` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`data`),
KEY `idx_data_time` (`data`,`lastseen`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8;
The problem is that a bulk insert under load works perfectly on the username part of the trigger when calculating the lastseen value. However, the first part of the trigger, which does the same on the uniq_ip table, drops the insert rate from 800/s to 30/s when it processes
set @maxdate := (select lastseen from uniq_ip where data = NEW.ip_addr);
If you set it to now() instead, it's fast (but not the correct result). uniq_username and uniq_ip have the same structure and indexes, and the trigger slows right down regardless of which part runs first (username or ip), but it's only the above statement that slows the trigger down.
The problem persists whether the uniq_ip table is TokuDB or InnoDB, and the default charset makes no difference; neither does the INSERT statement being active or commented out. latest_date() is a tiny function that returns the more recent of two datetimes.
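The post doesn't show latest_date() itself; here is a minimal sketch of what such a helper could look like. The body is an assumption; only its contract (return the more recent of two datetimes) comes from the question:

-- Hypothetical reconstruction; tolerates a NULL @maxdate for first-seen rows.
CREATE FUNCTION latest_date(a DATETIME, b DATETIME) RETURNS DATETIME
DETERMINISTIC
RETURN IF(b IS NULL OR a >= b, a, b);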
Any ideas or tips?
Thanks
It turned out main.ip_addr is latin1 while uniq_ip.data was being cast to utf8. Changing uniq_ip.data to latin1 improved the insert rate from 50 inserts/sec to 1000 inserts/sec. I guess that latin1->utf8 cast is a CPU killer.
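For reference, a minimal sketch of that fix (the posted DDL above says CHARSET=ascii, so the exact before/after charsets here are assumptions):

-- Align uniq_ip.data with main.ip_addr so the lookup compares latin1 to latin1
-- and the primary key on data can be used without a per-row conversion.
ALTER TABLE uniq_ip
MODIFY `data` varchar(42) CHARACTER SET latin1 NOT NULL;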
I'm trying to count the records in my "records" table and insert the count into my results table, but I only want to count today's records.
Below are some alternatives that I tried (I'm using MySQL), but I keep getting this error:
You have an error in your SQL syntax near '' on line 2
INSERT INTO results (Data,total)
VALUES (now(), (SELECT COUNT(*) FROM records WHERE Data = now());
This SQL also causes an error:
INSERT INTO results (Data, total)
VALUES (now(), (SELECT COUNT(record.ID) AS day FROM record
WHERE date(Data) = date(date_sub(now(), interval 0 day));
and then
INSERT INTO resultS (Data,total)
VALUES (now(), (SELECT COUNT(*) FROM records
WHERE Data >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));
And yet another attempt:
INSERT INTO results (Data, Total)
VALUES (now(), (SELECT COUNT(*) FROM records
WHERE DATE(Data)= CURRENT_DATE() - INTERVAL 1 DAY));
This is my table setup:
CREATE TABLE `records`
(
`ID` char(23) NOT NULL,
`Name` varchar(255) NOT NULL,
`Total` int(255) NOT NULL,
`Data` date NOT NULL,
`QrCode` varchar(255) NOT NULL,
`City` varchar(255) NOT NULL,
`Device` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `results`
(
`id` int(11) NOT NULL,
`total` int(11) NOT NULL,
`Data` date DEFAULT NULL,
`grown` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
You have defined the grown column as NOT NULL, so you cannot put NULL there. My query works:
INSERT INTO results
VALUES (1, (SELECT COUNT(1) FROM records WHERE Data = now()), now(), 1);
You should define a default value for the grown column. The same problem exists for the id column: it should be auto-incrementing, e.g.
id int(11) NOT NULL AUTO_INCREMENT
(MySQL also requires an AUTO_INCREMENT column to be indexed, e.g. as the primary key.)
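A hedged sketch of what those schema fixes amount to in one statement (types copied from the posted DDL; the default of 0 for grown is an assumption):

ALTER TABLE results
MODIFY id int(11) NOT NULL AUTO_INCREMENT,
ADD PRIMARY KEY (id),
MODIFY grown int(11) NOT NULL DEFAULT 0;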
With the schema fixed, a cleaner way to write the insert is INSERT ... SELECT, which also sidesteps the unbalanced parentheses in the attempts above:
INSERT INTO results (Data, total)
SELECT CURRENT_DATE(), COUNT(*)
FROM records
WHERE DATE(Data) = CURRENT_DATE();
I have a column named 'Ratio' in my table, which I want to use to store each item's share of the total value on a specific date.
The problem I'm facing is that the trigger only sets the ratio of the new row (NEW) and leaves the values of the other rows alone. But when one value changes I want them all to change, since it is a ratio.
This is what I have so far:
CREATE TABLE `item_table` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`Date` DATE DEFAULT NULL,
`Name` VARCHAR(20) DEFAULT NULL,
`Value` DECIMAL(7,6) DEFAULT NULL,
`Ratio` DECIMAL(7,6) DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TRIGGER `ins_ratio`
BEFORE INSERT ON `item_table` FOR EACH ROW
SET NEW.`Ratio` = NEW.`Value` / (SELECT SUM(`Value`)
FROM `item_table`
WHERE `Date` = NEW.`Date`);
How can I get this done?
CREATE TRIGGER `ins_ratio`
BEFORE INSERT ON `item_table`
FOR EACH ROW
SET NEW.`Ratio` = NEW.`Value` / (SELECT COALESCE(SUM(`Value`), 0) + NEW.`Value`
FROM `item_table`
WHERE `Date` = NEW.`Date`);
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=e387a87a05422afec40ecad070e78627
Pay attention: your Ratio is cumulative, not recalculated. Each row's ratio is computed against the total at the moment it is inserted; earlier rows keep their old ratios.
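If every row's ratio has to change when a new value arrives, a trigger alone can't do it: MySQL does not let a trigger update other rows of the table it fires on. A hedged sketch of a separate recalculation you could run after inserting (table and column names taken from the question):

-- Recompute every row's share of its date's total.
UPDATE item_table t
JOIN (SELECT `Date`, SUM(`Value`) AS total
FROM item_table
GROUP BY `Date`) s ON s.`Date` = t.`Date`
SET t.Ratio = t.`Value` / s.total;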
I have a mysql database, and a table structure like this:
CREATE TABLE `user_session_log` (
`stat_id` int(8) NOT NULL AUTO_INCREMENT,
`metric` tinyint(1) NOT NULL DEFAULT '0',
`platform` tinyint(1) NOT NULL DEFAULT '0',
`page_id` varchar(128) DEFAULT '_empty_',
`target_date` date DEFAULT NULL,
`country` varchar(2) DEFAULT NULL COMMENT 'ISO 3166 country code (2 symbols)',
`amount` int(100) NOT NULL DEFAULT '0.000000' COMMENT 'counter or amount',
`unique_key` varchar(180) DEFAULT NULL COMMENT 'Optional unique identifier',
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`stat_id`),
UNIQUE KEY `unique_key` (`unique_key`) USING BTREE,
KEY `target_date` (`target_date`)
) ENGINE=InnoDB AUTO_INCREMENT=21657473 DEFAULT CHARSET=utf8
What I'm trying to achieve is to log active sessions / unique users by date, page_id, and country. Currently I achieve this by generating multiple INSERT statements with a unique_key, by adding the page_id and date into the unique key, but I want something a little bit different.
The logic should be: insert a new row with unique_key (a semi-unique user id), country, date, and page_id. If there is already a row with that information (same page_id, unique_key, date + country) - update amount = amount + 1 (a session).
So I could do lookups like:
SELECT SUM(amount) FROM user_session_log WHERE page_id = "something" AND target_date = "2018-12-21"
This would give me the number of sessions. Or:
SELECT COUNT(*) FROM user_session_log WHERE page_id = "something" AND target_date = "2018-12-21"
This would give me the number of active users on that page_id on that day. Or:
SELECT COUNT(*) FROM user_session_log WHERE target_date = "2018-12-21"
which would give me the total number of users on that day.
I know about unique indexes, but would one give me the result I'm looking for?
Edit, a sample insert:
INSERT INTO `user_session_log`
(`platform`,`page_id`,`target_date`,`country`,`amount`,`unique_key`,`created`,`modified`)
VALUES ('1','page_id_54','2018-10-08','US',1,'ea3d0ce0406a838d9fd31df2e2ec8085',NOW(),NOW())
ON DUPLICATE KEY UPDATE `amount` = amount + 1, `modified` = NOW();
and the table should detect a duplicate based on whether there's a row with the same unique_key + date + country + platform + page_id; otherwise it should just insert a new row.
Right now I'm doing this differently, with different metrics and a unique_key that already contains the date + page_id, hashed. That way it's unique, which means I can count the distinct users per day, but I can't tell how many sessions a unique user has had, how long they use the software, and so on.
First, create a unique index over all the columns that need to be unique, as follows:
ALTER TABLE user_session_log ADD UNIQUE INDEX idx_multi_column (unique_key, target_date, country, platform, page_id);
Then you can use an INSERT ... ON DUPLICATE KEY UPDATE query to insert/update.
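One caveat, inferred from the posted DDL: the table already has a single-column UNIQUE KEY on unique_key, which would keep rejecting rows that share only the unique_key, so it presumably has to be dropped when the composite index is added:

ALTER TABLE user_session_log
DROP INDEX unique_key,
ADD UNIQUE INDEX idx_multi_column (unique_key, target_date, country, platform, page_id);

(Note the combined utf8 key is roughly 940 bytes, so it needs a MySQL version / InnoDB row format that allows index keys longer than 767 bytes.) After that, the sample INSERT ... ON DUPLICATE KEY UPDATE from the question works unchanged.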
I have a call data records (CDR) table with about 7 million rows.
Each row holds a call record:
unique identifier, caller number, receiving number, answer datetime, and duration in seconds.
I am looking for an efficient way of finding calls handled by the same receiving number at overlapping (parallel) times.
Every query I have tried takes too long.
The table structure:
CREATE TABLE `cdrs` (
`global_identifier` varchar(32) DEFAULT NULL,
`caller_num` int(14) DEFAULT NULL,
`receiving_num` int(14) DEFAULT NULL,
`call_answer` datetime DEFAULT NULL,
`call_duration` int(7) DEFAULT NULL,
KEY `caller_num` (`caller_num`),
KEY `receiving_num` (`receiving_num`),
KEY `call_answer` (`call_answer`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
;
The query I have already tried:
SELECT
DATE_FORMAT(call_answer, '%Y%m') AS Ym,
receiving_num,
COUNT(*) AS cnt,
SUM(call_duration) / 60 AS c_dur
FROM
(
SELECT
ycdr.*
FROM
cdrs ycdr
INNER JOIN cdrs ycdr2 ON
ycdr2.receiving_num = ycdr.receiving_num
AND ycdr2.caller_num != ycdr.caller_num
WHERE
ycdr2.call_answer BETWEEN ycdr.call_answer
AND ycdr.call_answer + INTERVAL ycdr.call_duration SECOND
AND ycdr.call_answer >= '2015-01-01'
AND ycdr.call_answer < '2015-01-05'
GROUP BY
ycdr.global_identifier
) a
GROUP BY
Ym, receiving_num
;
The EXPLAIN result was attached as an image in the original post (not reproduced here).
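For what it's worth, a hedged sketch of a more direct overlap test, under the assumption that a call occupies the window [call_answer, call_answer + call_duration seconds]. The composite index is an addition, not part of the original schema:

-- Lets the self-join seek on (receiving_num, call_answer).
ALTER TABLE cdrs ADD INDEX idx_recv_answer (receiving_num, call_answer);

-- Pairs of overlapping calls to the same receiving number.
-- The later-starting call of any overlapping pair starts inside the
-- earlier call's window, so it is enough to test that one case.
SELECT a.receiving_num,
a.global_identifier AS call_a,
b.global_identifier AS call_b
FROM cdrs a
JOIN cdrs b
ON b.receiving_num = a.receiving_num
AND b.caller_num != a.caller_num
AND b.call_answer >= a.call_answer
AND b.call_answer < a.call_answer + INTERVAL a.call_duration SECOND
AND (b.call_answer > a.call_answer
OR b.global_identifier > a.global_identifier) -- report each pair once
WHERE a.call_answer >= '2015-01-01'
AND a.call_answer < '2015-01-05';

The date bound plus the composite index keeps the self-join from comparing every pair of rows across the whole 7 million.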
I have this structure of my db:
CREATE TABLE IF NOT EXISTS `peoples` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
For customers.
CREATE TABLE IF NOT EXISTS `peoplesaddresses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`people_id` int(10) unsigned NOT NULL,
`phone` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`address` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
For their addresses.
CREATE TABLE IF NOT EXISTS `peoplesphones` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`people_id` int(10) unsigned NOT NULL,
`phone` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`address` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
For their phones.
UPD4
ALTER TABLE peoplesaddresses DISABLE KEYS;
ALTER TABLE peoplesphones DISABLE KEYS;
ALTER TABLE peoplesaddresses ADD INDEX i_phone (phone);
ALTER TABLE peoplesphones ADD INDEX i_phone (phone);
ALTER TABLE peoplesaddresses ADD INDEX i_address (address);
ALTER TABLE peoplesphones ADD INDEX i_address (address);
ALTER TABLE peoplesaddresses ENABLE KEYS;
ALTER TABLE peoplesphones ENABLE KEYS;
END UPD4
CREATE TABLE IF NOT EXISTS `order` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`people_id` int(10) unsigned NOT NULL,
`name` varchar(255) CHARACTER SET utf8 NOT NULL,
`phone` varchar(255) CHARACTER SET utf8 NOT NULL,
`adress` varchar(255) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=12 ;
INSERT INTO `order` (`id`, `people_id`, `name`, `phone`, `adress`) VALUES
(1, 0, 'name1', 'phone1', 'address1'),
(2, 0, 'name1_1', 'phone1', 'address1_1'),
(3, 0, 'name1_1', 'phone1', 'address1_2'),
(4, 0, 'name2', 'phone2', 'address2'),
(5, 0, 'name2_1', 'phone2', 'address2_1'),
(6, 0, 'name3', 'phone3', 'address3'),
(7, 0, 'name4', 'phone4', 'address4'),
(8, 0, 'name1_1', 'phone5', 'address1_1'),
(9, 0, 'name1_1', 'phone5', 'address1_2'),
(11, 0, 'name1', 'phone1', 'address1'),
(10, 0, 'name1', 'phone1', 'address1');
The production database has over 9000 records. Is there a way to execute these 3 UPDATE queries a little faster than now (~50 min on the dev machine)?
INSERT INTO peoplesphones( phone, address )
SELECT DISTINCT `order`.phone, `order`.adress
FROM `order`
GROUP BY `order`.phone;
This fills the peoplesphones table with unique phones.
INSERT INTO peoplesaddresses( phone, address )
SELECT DISTINCT `order`.phone, `order`.adress
FROM `order`
GROUP BY `order`.adress;
This fills the peoplesaddresses table with unique addresses.
The next three queries are very slow:
UPDATE peoplesaddresses, peoplesphones SET peoplesaddresses.people_id = peoplesphones.id WHERE peoplesaddresses.phone = peoplesphones.phone;
UPDATE peoplesaddresses, peoplesphones SET peoplesphones.people_id = peoplesaddresses.people_id WHERE peoplesaddresses.address = peoplesphones.address;
UPDATE `order`, `peoplesphones` SET `order`.people_id = `peoplesphones`.people_id where `order`.phone = `peoplesphones`.phone;
Finally, fill the peoples table and drop the unnecessary columns:
INSERT INTO peoples( id, name )
SELECT DISTINCT `order`.people_id, `order`.name
FROM `order`
GROUP BY `order`.people_id;
ALTER TABLE `peoplesphones`
DROP `address`;
ALTER TABLE `peoplesaddresses`
DROP `phone`;
So, again: how can I make those UPDATE queries a little faster? Thanks.
UPD: I forgot to say that this is a one-off migration: I just need to move phones and addresses into separate tables, since one person can have more than one phone and can order pizza somewhere other than at home.
UPD3: Replacing the slow UPDATE queries with the following JOIN versions gained nothing.
UPDATE peoplesaddresses
LEFT JOIN
peoplesphones
ON peoplesaddresses.phone = peoplesphones.phone
SET peoplesaddresses.people_id = peoplesphones.id;
UPDATE peoplesphones
LEFT JOIN
`peoplesaddresses`
ON `peoplesaddresses`.address = `peoplesphones`.address
SET `peoplesphones`.people_id = `peoplesaddresses`.people_id;
UPDATE `order`
LEFT JOIN
`peoplesphones`
ON `order`.phone = `peoplesphones`.phone
SET `order`.people_id = `peoplesphones`.people_id;
UPD4: After adding the code above (UPD4), the script takes only a few seconds to execute. But at around the 6,500th query it terminates with the message: "The system cannot find the Drive specified".
Thanks to All. Especially to xQbert and Brent Baisley.
50 minutes for 9000 records is a bit ridiculous, even without indexes. You might as well put the 9000 records in Excel and do what you need to do there. I think there is something else going on with your dev machine. Perhaps you have MySQL configured to use very little memory? Maybe you can post the results of this "query":
show variables like "%size%";
Just this morning I did an INSERT (IGNORE) ... SELECT from one table into another, both with over 400,000 records. 126,000 records were inserted into the second table, and it took a total of 2 minutes 13 seconds.
I would say put indexes on any of the fields you are joining or grouping on, but this seems like a one time job. I don't think the lack of indexes is your problem.
All write operations are slow in relational databases, and indexes make them slower still, since the indexes have to be updated on every write.
If you're using a WHERE in your statements, you should place an index on the fields referenced.
GROUP BY is always very slow, and so is DISTINCT, since they have to do a lot of checks that don't scale linearly. Always avoid them.
You may like to choose a different database engine for what you're doing. 9000 records in 50 minutes is very slow. Experiment with a few different engines, such as MyISAM and InnoDB. If you're using temporary tables a lot, MEMORY is really fast for those.
Update: Also, updating multiple tables in one statement probably shouldn't be done.