I am working with MySQL and I have a table with the following structure (a summary):
CREATE TABLE `costs` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`utility` DECIMAL(9,2) UNSIGNED NOT NULL,
`tax` DECIMAL(9,2) UNSIGNED NOT NULL,
`active` TINYINT(1) UNSIGNED NOT NULL DEFAULT '1',
`created_at` TIMESTAMP NULL DEFAULT NULL,
`updated_at` TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY (`id`)
)
The active field defaults to 1 when inserting. When saving a new record, I would like all other rows to have their active field updated to 0, so I tried to create a trigger for this, but I am getting a MySQL error.
DELIMITER //
CREATE TRIGGER after_costs_insert AFTER INSERT ON costs FOR EACH ROW
BEGIN
UPDATE costs SET active = 0 WHERE id <> NEW.id;
END;
//
DELIMITER ;
I think it is not possible to do this, so how can I update these rows?
A trigger cannot modify the table it was fired on. That is a typical limitation in SQL, mainly meant to prevent infinite loops on invocation (a query fires a trigger, which executes a query, which fires the trigger again, and so on).
Here, instead of storing this derived information, I would recommend using a view that computes the column on the fly when queried.
If you are running MySQL 8.0:
create view costs_view as
select
id,
utility,
tax,
row_number() over(order by id desc) = 1 active,
created_at,
updated_at
from costs
In earlier versions:
create view costs_view as
select
id,
utility,
tax,
id = (select max(id) from costs) active,
created_at,
updated_at
from costs
This gives you an always up-to-date column that you just don't need to maintain.
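Querying the current active row through the view is then straightforward, for instance (a small usage sketch):
select *
from costs_view
where active = 1;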
If you want only the most recent row, then you can use:
select c.*
from costs c
order by id desc -- or created_at
limit 1;
This will work in a view.
More often, the situation is that you have one active row per something -- such as a "utility" or whatever. In that case, you can use a secondary table to store one row per "something" along with the id of the active row. The trigger can set this id.
In your case, you have only one active row in the costs table, so a secondary table might be considered overkill. You can easily get the current active value using the above query.
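If you ever do need the secondary-table approach described above, a minimal sketch could look like this (the active_cost table and the trigger name are assumptions, not part of the original answer):
-- one-row table holding the id of the currently active cost record
create table active_cost (
    only_row tinyint not null default 1,
    cost_id int unsigned not null,
    primary key (only_row)
);

-- a trigger may freely write to a *different* table than the one it fires on
create trigger costs_set_active after insert on costs
for each row
    insert into active_cost (only_row, cost_id)
    values (1, new.id)
    on duplicate key update cost_id = new.id;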
I have this table
CREATE TABLE `pcodes` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`product_id` int(10) unsigned NOT NULL,
`code` varchar(100) NOT NULL,
`used` int(10) unsigned NOT NULL DEFAULT '0',
`created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
)
and an insert command is the following:
INSERT INTO `pcodes` (`product_id`, `code`) VALUES ('1', 'test2');
The table contains random codes for each product_id. I want to get one unused code randomly (LIMIT 1 is ok for the job), mark the code as used and return it to the next layer.
So far I did this:
SELECT * FROM pcodes where product_id=1 and used=0 LIMIT 1
UPDATE pcodes SET used= 1 WHERE (id = 2);
but this does not work well when multiple threads request the first unused code. What is the optimal solution to do this query? I would like to avoid stored procedures.
Possible solution.
It assumes that no values other than 0 and 1 are stored in the used column.
CREATE PROCEDURE select_one_random_row (OUT rowid BIGINT)
BEGIN
REPEAT
UPDATE pcodes SET used = CONNECTION_ID() WHERE used = 0 LIMIT 1;
SELECT id INTO rowid FROM pcodes WHERE used = CONNECTION_ID();
UNTIL rowid IS NOT NULL END REPEAT;
UPDATE pcodes SET used = 1 WHERE used = CONNECTION_ID();
END
To prevent an infinite loop (for example, when there are no rows with used=0), add a counter which increments in the REPEAT cycle and breaks out of it after a reasonable number of attempts.
The code may be converted to a FUNCTION which returns the selected rowid.
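Calling the procedure from the application could then look like this (a small sketch; the OUT parameter is read back through a session variable):
CALL select_one_random_row(@rowid);
SELECT code FROM pcodes WHERE id = @rowid;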
It is possible that the procedure/function fails (for some external reason), leaving a row "selected by CONNECTION_ID()" while the connection itself is already gone. So you also need a service procedure, executed by the Event Scheduler, which garbage-collects rows belonging to connections that no longer exist and resets their used value back to zero, returning such rows to the unused pool.
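Such a cleanup could be a simple scheduled event; a minimal sketch (assuming used only ever holds 0, 1, or a connection id while a row is reserved, and that the Event Scheduler is enabled):
CREATE EVENT IF NOT EXISTS release_orphaned_codes
ON SCHEDULE EVERY 5 MINUTE
DO
    UPDATE pcodes
    SET used = 0
    WHERE used NOT IN (0, 1)
      AND used NOT IN (SELECT id FROM information_schema.processlist);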
I have a table for storing stats. Currently this is populated with about 10 million rows per day; at the end of the day the rows are copied to a daily stats table and deleted. For this reason I can't have an auto-incrementing primary key.
This is the table structure:
CREATE TABLE `stats` (
`shop_id` int(11) NOT NULL,
`title` varchar(255) CHARACTER SET latin1 NOT NULL,
`created` datetime NOT NULL,
`mobile` tinyint(1) NOT NULL DEFAULT '0',
`click` tinyint(1) NOT NULL DEFAULT '0',
`conversion` tinyint(1) NOT NULL DEFAULT '0',
`ip` varchar(20) CHARACTER SET latin1 NOT NULL,
KEY `shop_id` (`shop_id`,`created`,`ip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I have a key on (shop_id, created, ip), but I'm not sure which columns I should use to create the optimal index to increase lookup speed any further.
The query below takes about 12 seconds with no key and about 1.5 seconds using the index above:
SELECT DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane')) AS `date`, COUNT(*) AS `views`
FROM `stats`
WHERE `created` <= '2017-07-18 09:59:59'
AND `shop_id` = '17515021'
AND `click` != 1
AND `conversion` != 1
GROUP BY DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane'))
ORDER BY DATE(CONVERT_TZ(`created`, 'UTC', 'Australia/Brisbane'));
If there is no column (or combination of columns) that is guaranteed unique, then do have an AUTO_INCREMENT id. Don't worry about truncating/deleting. (However, if the id does not reset, you probably need to use BIGINT, not INT UNSIGNED to avoid overflow.)
Don't use id as the primary key, instead, PRIMARY KEY(shop_id, created, id), INDEX(id).
That unconventional PK will help with performance in 2 ways, while being unique (due to the addition of id). The INDEX(id) is to keep AUTO_INCREMENT happy. (Whether you DELETE hourly or daily is a separate issue.)
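Applied to the existing table, those suggestions could look like this (a sketch; the id column is added as BIGINT as discussed above):
ALTER TABLE stats
    ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    ADD PRIMARY KEY (shop_id, created, id),
    ADD INDEX (id);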
Build a Summary table based on each hour (or minute). It will contain the count for each such period -- roughly 400K rows/hour or 7K/minute of raw data. Augment it each hour (or minute) so that you don't have to do all the work at the end of the day.
The summary table can also filter on click and/or conversion. Or it could keep both, if you need them.
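A minimal sketch of such a summary table and its hourly refresh (all names and the literal hour boundaries are placeholders, not from the original answer):
CREATE TABLE stats_hourly (
    shop_id INT NOT NULL,
    hr DATETIME NOT NULL,          -- start of the hour, stored in UTC
    views INT UNSIGNED NOT NULL,
    PRIMARY KEY (shop_id, hr)
);

-- run once per hour (cron or the Event Scheduler) for the hour that just closed
INSERT INTO stats_hourly (shop_id, hr, views)
SELECT shop_id,
       DATE_FORMAT(created, '%Y-%m-%d %H:00:00') AS hr,
       COUNT(*)
FROM stats
WHERE created >= '2017-07-18 09:00:00'   -- placeholder: start of previous hour
  AND created <  '2017-07-18 10:00:00'   -- placeholder: end of previous hour
  AND click = 0
  AND conversion = 0
GROUP BY shop_id, hr
ON DUPLICATE KEY UPDATE views = views + VALUES(views);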
If click/conversion have only two states (0 & 1), don't say != 1, say = 0; the optimizer is much better at = than at !=.
If they are 2-state and you change to =, then this index becomes viable and much better: INDEX(shop_id, click, conversion, created) -- created must be last.
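For example (a sketch, assuming click and conversion are strictly 0/1):
ALTER TABLE stats ADD INDEX shop_click_conv_created (shop_id, click, conversion, created);

SELECT DATE(CONVERT_TZ(created, 'UTC', 'Australia/Brisbane')) AS `date`,
       COUNT(*) AS views
FROM stats
WHERE shop_id = 17515021
  AND click = 0
  AND conversion = 0
  AND created <= '2017-07-18 09:59:59'
GROUP BY `date`
ORDER BY `date`;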
Don't bother with TZ when summarizing into the Summary table; apply the conversion later.
Better yet, don't use DATETIME, use TIMESTAMP so that you won't need to convert (assuming you have TZ set correctly).
After all that, if you still have issues, start over on the Question; there may be further tweaks.
In your WHERE clause, put first the column that will return the smallest result set, and so on, and create the index in the same column order.
You have
WHERE created <= '2017-07-18 09:59:59'
AND shop_id = '17515021'
AND click != 1
AND conversion != 1
If created returns a smaller result set than the other 3 columns, then you are good; otherwise put the most selective column in the first position of your WHERE clause, choose the second column by the same reasoning, and create the index to match your WHERE clause.
If you think the order is fine, then create an index:
KEY created_shopid_click_conversion (created, shop_id, click, conversion);
I've got a database with over 7000 records. As it turns out, there are several duplicates within those records. I found several suggestions on how to delete duplicates and keep only 1 record.
But in my case things are a bit more complicated: cases are not simply duplicates if they hold the same data as another record. Instead, several cases are perfectly okay holding the same data. They are marked as duplicates only when they hold the same data AND are both inserted within 30 seconds.
Therefore I need a SQL statement that deletes duplicates (i.e. rows matching on all fields except id and datetime) if they have been inserted within a 40-second range (i.e. judged by the datetime field).
Since I'm anything but a SQL expert and can't find a suitable solution online, I truly hope some of you might help me out and point me in the right direction. That would be very much appreciated!
The table structure is as following:
CREATE TABLE IF NOT EXISTS `wp_ttr_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`schoolyear` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
`datetime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`area` varchar(15) CHARACTER SET utf8 NOT NULL,
`content` varchar(10) CHARACTER SET utf8 NOT NULL,
`types` varchar(100) CHARACTER SET utf8 NOT NULL,
`tasksWrong` varchar(300) DEFAULT NULL,
`tasksRight` varchar(300) DEFAULT NULL,
`tasksData` longtext CHARACTER SET utf8,
`parent_id` varchar(20) DEFAULT NULL,
UNIQUE KEY `id` (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=68696 ;
So just to clarify again, a duplicate case is a case that:
[1] holds the same data as another case for all fields, except the id and datetime fields
[2] is inserted into the DB, according to the datetime field, within 40 seconds of another record with the same values
If both conditions are met, all cases except one, should be deleted.
As #Juru pointed out in the comments, we need quite a surgical knife to cut this one. It is, however, possible to do this in an iterative way via a stored procedure.
First we use a self-join to identify, for every record that is not itself a duplicate, its first duplicate:
SELECT DISTINCT
MIN(postdups.id) AS id
FROM wp_ttr_results AS base
INNER JOIN wp_ttr_results AS postdups
ON base.id<postdups.id
AND UNIX_TIMESTAMP(postdups.datetime)-UNIX_TIMESTAMP(base.datetime)<40
AND base.user_id=postdups.user_id
AND base.schoolyear=postdups.schoolyear
AND base.area=postdups.area
AND base.content=postdups.content
AND base.types=postdups.types
AND base.tasksWrong=postdups.tasksWrong
AND base.tasksRight=postdups.tasksRight
AND base.parent_id=postdups.parent_id
LEFT JOIN wp_ttr_results AS predups
ON base.id>predups.id
AND UNIX_TIMESTAMP(base.datetime)-UNIX_TIMESTAMP(predups.datetime)<40
AND base.user_id=predups.user_id
AND base.schoolyear=predups.schoolyear
AND base.area=predups.area
AND base.content=predups.content
AND base.types=predups.types
AND base.tasksWrong=predups.tasksWrong
AND base.tasksRight=predups.tasksRight
AND base.parent_id=predups.parent_id
WHERE predups.id IS NULL
GROUP BY base.id
;
This selects the lowest id of all later records (base.id<postdups.id) that have the same payload as an existing record and are within a 40s window (UNIX_TIMESTAMP(postdups.datetime)-UNIX_TIMESTAMP(base.datetime)<40), but skips those base records that are duplicates themselves. In #Juru's example, the :30 record would be hit, as it is a duplicate of the :00 record, which itself is not a duplicate; but the :41 record would not be hit, as it is a duplicate only of :30, which itself is a duplicate of :00.
Now we have to remove these records - since MySQL can't delete from a table it is selecting from in the same statement, we stage the ids in a temporary table to achieve that:
CREATE TEMPORARY TABLE cleanUpDuplicatesTemp SELECT DISTINCT
-- as above
;
DELETE FROM wp_ttr_results
WHERE id IN
(SELECT id FROM cleanUpDuplicatesTemp)
;
DROP TABLE cleanUpDuplicatesTemp
;
At this point we will have removed the first duplicate for each record, in the process possibly changing what is considered a duplicate ...
Finally we must loop through this process, exiting the loop if the SELECT DISTINCT returns nothing.
Putting it all together into a stored procedure:
DELIMITER ;;
CREATE PROCEDURE cleanUpDuplicates()
BEGIN
DECLARE numDuplicates INT;
iterate: LOOP
DROP TABLE IF EXISTS cleanUpDuplicatesTemp;
CREATE TEMPORARY TABLE cleanUpDuplicatesTemp
SELECT DISTINCT
MIN(postdups.id) AS id
FROM wp_ttr_results AS base
INNER JOIN wp_ttr_results AS postdups
ON base.id<postdups.id
AND UNIX_TIMESTAMP(postdups.datetime)-UNIX_TIMESTAMP(base.datetime)<40
AND base.user_id=postdups.user_id
AND base.schoolyear=postdups.schoolyear
AND base.area=postdups.area
AND base.content=postdups.content
AND base.types=postdups.types
AND base.tasksWrong=postdups.tasksWrong
AND base.tasksRight=postdups.tasksRight
AND base.parent_id=postdups.parent_id
LEFT JOIN wp_ttr_results AS predups
ON base.id>predups.id
AND UNIX_TIMESTAMP(base.datetime)-UNIX_TIMESTAMP(predups.datetime)<40
AND base.user_id=predups.user_id
AND base.schoolyear=predups.schoolyear
AND base.area=predups.area
AND base.content=predups.content
AND base.types=predups.types
AND base.tasksWrong=predups.tasksWrong
AND base.tasksRight=predups.tasksRight
AND base.parent_id=predups.parent_id
WHERE predups.id IS NULL
GROUP BY base.id;
SELECT COUNT(*) INTO numDuplicates FROM cleanUpDuplicatesTemp;
IF numDuplicates<=0 THEN
LEAVE iterate;
END IF;
DELETE FROM wp_ttr_results
WHERE id IN
(SELECT id FROM cleanUpDuplicatesTemp);
END LOOP iterate;
DROP TABLE IF EXISTS cleanUpDuplicatesTemp;
END;;
DELIMITER ;
Now a simple CALL cleanUpDuplicates; should do the trick.
This might work, but it probably won't be very fast...
DELETE FROM dupes
USING wp_ttr_results AS dupes
INNER JOIN wp_ttr_results AS origs
ON dupes.field1 = origs.field1
AND dupes.field2 = origs.field2
AND ....
AND dupes.id <> origs.id
AND dupes.`datetime` BETWEEN origs.`datetime` AND (origs.`datetime` + INTERVAL 40 SECOND)
;
I have a problem similar to
SQL: selecting rows where column value changed from previous row
The accepted answer by ypercube, which I adapted to:
CREATE TABLE `schange` (
`PersonID` int(11) NOT NULL,
`StateID` int(11) NOT NULL,
`TStamp` datetime NOT NULL,
KEY `tstamp` (`TStamp`),
KEY `personstate` (`PersonID`, `StateID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `states` (
`StateID` int(11) NOT NULL AUTO_INCREMENT,
`State` varchar(100) NOT NULL,
`Available` tinyint(1) NOT NULL,
`Otherstatuseshere` tinyint(1) NOT NULL,
PRIMARY KEY (`StateID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
SELECT
COALESCE((@statusPre <> s.Available), 1) AS statusChanged,
c.PersonID,
c.TStamp,
s.*,
@statusPre := s.Available
FROM schange c
INNER JOIN states s USING (StateID),
(SELECT @statusPre:=NULL) AS d
WHERE PersonID = 1 AND TStamp > "2012-01-01" AND TStamp < "2013-01-01"
ORDER BY TStamp ;
The query itself worked just fine in testing, and with the right mix of temporary tables I was able to generate reports with daily availability sums from a huge pile of data in virtually no time at all.
The real problem came when I discovered that the tables were using the MyISAM engine, which we have completely abandoned. I recreated the tables to use InnoDB and noticed the query no longer works as expected.
After some bashing of head into wall, I discovered that MyISAM seems to evaluate the columns of each row in order (selecting statusChanged before updating @statusPre), while InnoDB seems to do all the variable assignments first and only then populate the result rows, regardless of whether the assignment happens in the SELECT or WHERE clauses, in functions (COALESCE, GREATEST, etc.), subqueries or otherwise.
Trying to accomplish this in a query without variables always seems to end the same way: a subquery requiring exponentially more time the more rows are in the set, resulting in an excruciating minutes- (or hours-) long wait to get the beginning and ending events for one status, while a finished report should include daily sums for several.
Can this type of query work on the InnoDB engine, and if so, how should one go about it?
Or is the only feasible option to go for a database product that supports WITH statements?
Removing
KEY personstate (PersonID, StateID)
fixes the problem.
No idea why, though, but it was not really required anyway; the timestamp key is the more important one and speeds up the query nicely.
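For completeness: on MySQL 8.0+ the same report can be written with window functions instead of user variables, which removes the dependence on the engine's evaluation order entirely (a sketch, not part of the original answer):
SELECT
    COALESCE((LAG(s.Available) OVER w) <> s.Available, 1) AS statusChanged,
    c.PersonID,
    c.TStamp,
    s.*
FROM schange c
INNER JOIN states s USING (StateID)
WHERE c.PersonID = 1 AND c.TStamp > '2012-01-01' AND c.TStamp < '2013-01-01'
WINDOW w AS (ORDER BY c.TStamp)
ORDER BY c.TStamp;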
I have a table called promotion_codes
CREATE TABLE promotion_codes (
id int(10) UNSIGNED NOT NULL auto_increment,
created_at datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
code varchar(255) NOT NULL,
order_id int(10) UNSIGNED NULL DEFAULT NULL,
allocated_at datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This table is pre-populated with available codes that will be assigned to orders that meet a specific criteria.
What I need to ensure is that after the ORDER is created, that I obtain an available promotion code and update its record to reflect that it has been allocated.
I am not 100% sure how to not grab the same record twice if simultaneous requests come in.
I have tried locking the row during a select and locking the row during a update - both still seem to allow a second (simultaneous) attempt to grab the same record - which is what I want to avoid
UPDATE promotion_code
SET allocated_at = "' . $db_now . '", order_id = ' . $donation->id . '
WHERE order_id IS NULL LIMIT 1
You can add a second table which holds all used codes. That way you can use a unique constraint in the assignment table to make sure that a code is not assigned twice.
CREATE TABLE `used_codes` (
    `usage` INTEGER PRIMARY KEY AUTO_INCREMENT,
    `id` INTEGER NOT NULL UNIQUE, -- this makes sure that there are no two assignments of one code
    `allocated_at` datetime NOT NULL
);
You add the ID of a used code into the used_codes table, and afterwards query which code you used. When these two operations are in one transaction, the whole transaction will fail when there is a second attempt to use the same code.
I did not test the following code; you might need to adjust it.
Also you need to make sure that your server meets the requirements for transactions.
-- There are changes which have to be atomic, so don't use autocommit
SET autocommit = 0;
START TRANSACTION;
INSERT INTO `used_codes` (`id`, `allocated_at`)
SELECT `id`, NOW()
FROM `promotion_codes`
WHERE `id` NOT IN (SELECT `id` FROM `used_codes`)
LIMIT 1;
-- LAST_INSERT_ID() is per connection, so it returns the `usage` value
-- generated by the INSERT above, even with parallel transactions running.
SELECT `code` FROM `promotion_codes` WHERE `id` =
    (SELECT `id` FROM `used_codes` WHERE `usage` = LAST_INSERT_ID());
COMMIT;
You can use the returned code if the transaction succeeded. If there was more than one process trying to use the same code, only one of them succeeds, while the rest fail with insert errors about the duplicated row. In your software you need to distinguish between the duplicate-row error and other errors, and re-execute the statement on duplication errors.