I have table:
CREATE TABLE `cold_water_volume_value` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`parameter_value_id` int(11) NOT NULL,
`time` timestamp(4) NOT NULL DEFAULT CURRENT_TIMESTAMP(4) ON UPDATE CURRENT_TIMESTAMP(4),
`value` double NOT NULL,
`device_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_cold_water_volume_value_id_device_time` (`parameter_value_id`,`device_id`,`time`),
KEY `idx_cold_water_volume_value_id_time` (`parameter_value_id`,`time`),
KEY `fk_cold_water_volume_value_device_id_idx` (`device_id`),
CONSTRAINT `fk_cold_water_volume_value_device_id` FOREIGN KEY (`device_id`) REFERENCES `device` (`id`) ON UPDATE SET NULL,
CONSTRAINT `fk_cold_water_volume_value_id` FOREIGN KEY (`parameter_value_id`) REFERENCES `cold_water_volume_parameter` (`id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=684740 DEFAULT CHARSET=utf8;
And all rows have device_id = NULL. I want to update it by script:
UPDATE cold_water_volume_value SET device_id = 130101 WHERE parameter_value_id = 2120101;
But instead of replacing all device_id for picked parameter_value_id from null to given value, it sets all content of time and value columns to now () and some (seems like completely random from previous values) number.
Why it happens, and how to do it correct way?
time is automatically updated as per your schema.
`time` timestamp(4) NOT NULL DEFAULT CURRENT_TIMESTAMP(4) ON UPDATE CURRENT_TIMESTAMP(4)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To get around that you can set time to itself in your update.
UPDATE cold_water_volume_value
SET device_id = 130101, time = time
WHERE parameter_value_id = 2120101;
But that is likely there to track when the last time a row was updated. If so it's working as intended, leave it to do its thing.
As for value, that might have an update trigger on it. Check with show triggers and look for triggers on that table.
Your device_id is updated using content of time probably because in your index definition you mixed datatypes. It's worth noting that you should not mix datatypes especially on where clause when indexing.
Try to separate your indexes for example:
KEY idx_cold_water_volume_value_id_device_time (time),
KEY idx_cold_water_volume_value_id_device (parameter_value_id,device_id),
Try above statements for your definition and run query again.
It makes sense for the indexed column to have the same datatypes.
e.g. parameter_value_id and device_id
Related
I am using mysql with Django. I am trying to count the number of visitor_pages for a specific dealer in a certain amount of time.
I would share the raw sql query that I have obtained from django debug toolbar.
SELECT COUNT(*) AS `__count`
FROM `visitor_page`
INNER JOIN `dealer_visitors`
ON (`visitor_page`.`dealer_visitor_id` = `dealer_visitors`.`id`)
WHERE (`visitor_page`.`date_time` BETWEEN '2021-02-01 05:51:00'
AND '2021-03-21 05:50:00'
AND `dealer_visitors`.`dealer_id` = 15)
The issue is that I have more than 13 million records in the visitor_pages table and about 1.5 million records in the dealer_visitor table. I have already indexed date_time. I am thinking of using a materialized view but before attempting that, I would really appreciate suggestions on how I could improve this query.
visitor_pages schema:
CREATE TABLE `visitor_page` (
`id` int NOT NULL AUTO_INCREMENT,
`date_time` datetime(6) DEFAULT NULL,
`added_at` datetime(6) DEFAULT NULL,
`updated_at` datetime(6) DEFAULT NULL,
`page_id` int NOT NULL,
`dealer_visitor_id` int NOT NULL,
PRIMARY KEY (`id`),
KEY `visitor_page_page_id_246babdf_fk_web_page_id` (`page_id`),
KEY `visitor_page_dealer_visitor_id_e2dddea2_fk_dealer_visitors_id` (`dealer_visitor_id`),
KEY `visitor_page_date_time_06e9e9f5` (`date_time`),
CONSTRAINT `visitor_page_dealer_visitor_id_e2dddea2_fk_dealer_visitors_id` FOREIGN KEY (`dealer_visitor_id`) REFERENCES `dealer_visitors` (`id`),
CONSTRAINT `visitor_page_page_id_246babdf_fk_web_page_id` FOREIGN KEY (`page_id`) REFERENCES `web_page` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13626649 DEFAULT CHARSET=latin1;
dealer_visitors schema:
CREATE TABLE `dealer_visitors` (
`id` int NOT NULL AUTO_INCREMENT,
`visit_date` datetime(6) DEFAULT NULL,
`added_at` datetime(6) DEFAULT NULL,
`updated_at` datetime(6) DEFAULT NULL,
`dealer_id` int NOT NULL,
`visitor_id` int NOT NULL,
`type` int DEFAULT NULL,
`notes` longtext,
`location` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `dealer_visitors_dealer_id_306e2202_fk_dealer_id` (`dealer_id`),
KEY `dealer_visitors_visitor_id_27ae498e_fk_visitor_id` (`visitor_id`),
KEY `dealer_visitors_type_af0f7d79` (`type`),
KEY `dealer_visitors_visit_date_f2b138c9` (`visit_date`),
CONSTRAINT `dealer_visitors_dealer_id_306e2202_fk_dealer_id` FOREIGN KEY (`dealer_id`) REFERENCES `dealer` (`id`),
CONSTRAINT `dealer_visitors_visitor_id_27ae498e_fk_visitor_id` FOREIGN KEY (`visitor_id`) REFERENCES `visitor` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1524478 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
EXPLAIN ANALYZE the query gives me the following:
EXPLAIN:
For this query:
SELECT COUNT(*) AS `__count`
FROM visitor_page vp JOIN
dealer_visitors dv
ON vp.dealer_visitor_id = dv.id
WHERE vp.date_time BETWEEN '2021-02-01 05:51:00' AND '2021-03-21 05:50:00' AND
dv.dealer_id = 15;
The best indexes are on dealer_visitors(dealer_id, date_time, id) and visitor_page(dealer_visitor_id).
An index only on date helps a bit. But you are retrieving a month's worth of data and that might be a lot of data to process. Having dealer_id as the first column in the index will restrict the data to only the rows for that dealer in that time frame.
Depending on the distribution of the data, the Optimizer might pick one of the tables to start with, or pick the other. So, let's provide optimal indexes for each case:
ON `visitor_page`.`dealer_visitor_id` = `dealer_visitors`.`id`
WHERE `visitor_page`.`date_time` BETWEEN ...
AND `dealer_visitors`.`dealer_id` = 15
Starting with visitor_page:
visitor_page: INDEX(date_time) -- (already exists)
dealer_visitors: (already has PRIMARY KEY(id))
Starting with dealer_visitors:
dealer_visitors: INDEX(dealer_id) -- (already exists)
visitor_page: INDEX(dealer_visitor_id, date_time) -- in this order
and drop dealer_visitors_visitor_id_27ae498e_fk_visitor_id as now being redundant.
The net is to add one index and drop one index.
Materialized view -- It is often best for Data Warehouse reports to build and incrementally maintain a "summary table" (a "materialized view"). The very odd date range (1 month + 20 days - 61 seconds) makes this clumsy to do. Typically it is handy to make the table based on whole days. If you can shift to daily (or hourly), then see http://mysql.rjweb.org/doc.php/summarytables
Something else to check: How much RAM do you have? What does SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; say?
I see that the tables have different charset/collation. This is not a problem for the query in question, but if you have other queries that JOIN on VARCHARs, check that they use the same collation.
I got a MySQL database with some tables.
In one of these tables i want to insert by a SQL script some new rows.
Unfortunately i have to insert in two columns an empty string and the two columns are part of an unique key for that table.
So i tried to set UNIQUE_CHECKS before and after the insert, but i'm getting errors because of duplicate entries.
Here is the definition of the table:
CREATE TABLE `Table_A` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`number` varchar(25) DEFAULT NULL,
`changedBy` varchar(150) DEFAULT NULL,
`changeDate` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`,`number`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And the INSERT statement which causes error:
SET UNIQUE_CHECKS = 0;
INSERT INTO `Table_A`
(`name`, `number`, `changedBy`, `changeDate`)
SELECT DISTINCT '', 'myUser', CURRENT_TIMESTAMP
FROM Table_A
AND id NOT IN
(
SELECT DISTINCT id
FROM Table_A
);
SET UNIQUE_CHECKS = 1;
As You can see, i'm using UNIQUE_CHECKS.
But as i said this doesn't work properly.
Any help or suggestion would be appreciated.
Patrick
Switching off Unique Keys for the insert operation doesn't indicate that it will check uniqueness only for the operations that happen after you switch it on again. It just means that database will not waste time to check the constraint during the time it is switch off but it will check the constraint when you switch it on again.
What it measn is that you nead to ensure that column has unique value in a columns with Unique Keys before you can turn it on. Which you don't do.
If you want to maintain Uniqueness somehow for new records you insert after some point in time you would need to create trigger and manually check the new records against already existing data. The same possibly goes for updates. But I don't recommend it - you should probably redesign data so either the Unique Key is not there or the data is truly unique for all the records there are and will be.
I'm working with MySQL 5.7. I created a table with a virtual column (not stored) of type DATETIME with an index on it. While I was working on it, I noticed that order by was not returning all the data (some data I was expecting at the top was missing). Also the results from MAX and MIN were wrong.
After I run
ANALYZE TABLE
CHECK TABLE
OPTIMIZE TABLE
then the results were correct. I guess there was an issue with the index data, so I have few questions:
When and why this could happen?
Is there a way to prevent this?
among the 3 command I run, which is the correct one to use?
I'm worried that this could happen in the future but I'll not notice.
EDIT:
as requested in the comments I added the table definition:
CREATE TABLE `items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned DEFAULT NULL,
`image` json DEFAULT NULL,
`status` json DEFAULT NULL,
`status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL COMMENT 'used for index: it checks if status contains expired=true',
`lifetime` tinyint(4) NOT NULL,
`expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
`last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `expiration` (`status_expired`,`expiration`) USING BTREE,
CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
Queries that were returning the wrong results:
SELECT * FROM items ORDER BY expiration DESC;
SELECT max(expiration),min(expiration) FROM items;
Thanks
TLDR;
The trouble is that your data comes from virtual columns materialized via indexes. The check, optimize, analyze operations you are doing forces the indexes to be synced and fixes any errors. That gives you the correct results henceforth. At least until the index gets out of sync again.
Why it may happen
Much of the problems are caused by issues with your table design. Let's start with.
`status_expired` tinyint(1) GENERATED ALWAYS AS (ifnull(json_contains(`status`,'true','$.expired'),false)) VIRTUAL
No doubt this is created to overcome the fact that you cannot directly index a JSON column in mysql. You have created a virtual column and indexed that instead. It's all very well, but this column can hold only one of two values; true or false. Which means it has very poor cadinality. As a result, mysql is unlikely to use this index for anything.
But we can see that you have combined the status_expired column with the expired column when creating the index. Perhaps with the idea of overcoming this poor cardinality mentioned above. But wait...
`expiration` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
Expiration is another virtual column. This has some repercussions.
When a secondary index is created on a generated virtual column,
generated column values are materialized in the records of the index.
If the index is a covering index (one that includes all the columns
retrieved by a query), generated column values are retrieved from
materialized values in the index structure instead of computed “on the
fly”.
Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-secondary-indexes.html#json-column-indirect-index
This is contrary to
VIRTUAL: Column values are not stored, but are evaluated when rows are
read, immediately after any BEFORE triggers. A virtual column takes no
storage.
Ref: https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html
We create virtual columns based on the sound principal that values generated by simple operations on columns shouldn't be stored to avoid redundancy, but by creating an index on it, we reintroduce redundancy.
Proposed fixes
based on the information provided, you don't really seem to need the status_expired column or even the expired column. An item that's past it's expiry date is expired!
CREATE TABLE `items` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) unsigned DEFAULT NULL,
`image` json DEFAULT NULL,
`status` json DEFAULT NULL,
`expire_date` datetime GENERATED ALWAYS AS ((`create_date` + interval `lifetime` day)) VIRTUAL,
`last_update` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`create_date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `expiration` (`expired_date`) USING BTREE,
CONSTRAINT `ts_competition_item_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `ts_user_core` (`user_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1312459 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
Simply compare the current date with the expired_date column in the above table when you need to find out which items have expired. The difference here is instead of expired being a calculated item in every query, you calculate the expiry_date once, when you create the record.
This makes your table a lot neater and queries possibly faster
I need to add multiple records to a mysql database. I tried with multiple queries and its working fine, but not efficient. So I tried it with just one query like below,
INSERT INTO data (block, length, width, rows) VALUES
("BlockA", "200", "10", "20"),
("BlockB", "330", "8", "24"),
("BlockC", "430", "7", "36")
ON DUPLICATE KEY UPDATE
block=VALUES(block),
length=VALUES(length),
width=VALUES(width),
rows=VALUES(rows)
But it always update the table (columns are block_id, block, length, width, rows).
Should I do any changes on the query with adding block_id also. block_id is the primary key. Any help would be appreciated.
I've run your query without any problem, are you sure you don't have other keys defined with the data table ? And also make sure you have 'auto increment' set for the id field. without auto_increment, the query always update existing row
***** Updated **********
Sorry I've mistaken your questions. Yes, with only one auto_increment key, you query will always insert new rows instead of updating existing one ( because the primary key is the only way to detect 'existing' / duplication ), since the key is auto_increment, there's never a duplication if the primary key is not given in the insert query.
I think what you want to achieve is different, you might want to set up composite unique key on all fields (i.e. block, field, width, rows )
By the way, i've set up a SQL fiddle for you.
http://sqlfiddle.com/#!2/e7216/1
The syntax to add the unique key:
CREATE TABLE `data` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`block` varchar(10) DEFAULT NULL,
`length` int(11) DEFAULT NULL,
`width` int(11) DEFAULT NULL,
`rows` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uniqueme` (`block`,`length`,`width`,`rows`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
Need some advice working with EF4 and MySql.
I have a table with lots of data items. Each item belongs to a module and a zone. The data item also has a timestamp (ticks). The most common usage is for the app to query for data after a specified time for a module and a zone. The data should be sorted.
Problem is that the query selects to many rows and the database server will be low on memory resulting in a very slow query. I tried to limit the query to 100 items but the generated sql will only apply the limit after all the items has been selected and sorted.
dataRepository.GetData().WithModuleId(ModuleId).InZone(ZoneId).After(ztime).OrderBy(p
=> p.Timestamp).Take(100).ToList();
Generated SQL by the MySql .Net Connector 6.3.6
SELECT
`Project1`.`Id`,
`Project1`.`Data`,
`Project1`.`Timestamp`,
`Project1`.`ModuleId`,
`Project1`.`ZoneId`,
`Project1`.`Version`,
`Project1`.`Type`
FROM (SELECT
`Extent1`.`Id`,
`Extent1`.`Data`,
`Extent1`.`Timestamp`,
`Extent1`.`ModuleId`,
`Extent1`.`ZoneId`,
`Extent1`.`Version`,
`Extent1`.`Type`
FROM `DataItems` AS `Extent1`
WHERE ((`Extent1`.`ModuleId` = 1) AND (`Extent1`.`ZoneId` = 1)) AND
(`Extent1`.`Timestamp` > 634376753657189002)) AS `Project1`
ORDER BY
`Timestamp` ASC LIMIT 100
Table definition
CREATE TABLE `mydb`.`DataItems` (
`Id` bigint(20) NOT NULL AUTO_INCREMENT,
`Data` mediumblob NOT NULL,
`Timestamp` bigint(20) NOT NULL,
`ModuleId` bigint(20) NOT NULL,
`ZoneId` bigint(20) NOT NULL,
`Version` int(11) NOT NULL,
`Type` varchar(1000) NOT NULL,
PRIMARY KEY (`Id`),
KEY `IX_FK_ModuleDataItem` (`ModuleId`),
KEY `IX_FK_ZoneDataItem` (`ZoneId`),
KEY `Index_4` (`Timestamp`),
KEY `Index_5` (`ModuleId`,`ZoneId`),
CONSTRAINT `FK_ModuleDataItem` FOREIGN KEY (`ModuleId`) REFERENCES
`Modules` (`Id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `FK_ZoneDataItem` FOREIGN KEY (`ZoneId`) REFERENCES `Zones`
(`Id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=22904 DEFAULT CHARSET=utf8;
All suggestions on how to solve this are welcome.
What's your GetData() method doing? I'd bet it's executing a query on the entire table. And that's why your Take(100) at the end isn't doing anything.
I solved this by using the Table Splitting method described here:
Table splitting in entity framework