Strange trouble indeed!
I am experiencing an issue where MySQL adds some properties to my table that I don't want and that I can't change, even when I run the right command and get a "SUCCESS" reply.
Here is the code for creating such a table:
CREATE TABLE `KIIS_EVENT_APPLICATION`
(
`ID_USER` smallint(3) unsigned NOT NULL,
`ID_EVENT` smallint(5) unsigned NOT NULL,
`COMES` timestamp,
`LEAVES` timestamp,
`TRANSPORT_THERE` varchar(30) COLLATE cp1250_czech_cs,
`TRANSPORT_BACK` varchar(30) COLLATE cp1250_czech_cs,
`ROLE` varchar(30) COLLATE cp1250_czech_cs NOT NULL,
`RELEVANCE` tinyint(1) unsigned NOT NULL,
FOREIGN KEY (`ID_EVENT`) REFERENCES `KIIS_EVENTS`(`ID_EVENT`),
FOREIGN KEY (`ID_USER`) REFERENCES `KIIS_USERS`(`ID_USER`)
) ENGINE=InnoDB DEFAULT CHARSET=cp1250 COLLATE cp1250_czech_cs
Let's see the result:
The properties highlighted in yellow are ones I didn't ask for.
If I run a query such as
ALTER TABLE `KIIS_EVENT_APPLICATION` CHANGE `COMES` `COMES` TIMESTAMP NOT NULL;
the page says it completed successfully, but nothing changes.
How can I make the COMES column the same as the LEAVES column?
Could it be caused by the missing primary key? Do I need one when I have two foreign keys there (is that good SQL design practice or not)?
Michael - sqlbot got it right in a comment:
ALTER TABLE KIIS_EVENT_APPLICATION MODIFY COLUMN COMES TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00'; or more correctly, ... TIMESTAMP NULL DEFAULT NULL
The first timestamp column in a table gets magical behavior by default prior to MySQL Server 5.6.
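A minimal sketch of the "more correct" variant, assuming the table from the question: both columns get explicit nullability and an explicit default, so the server has nothing left to fill in implicitly. SHOW CREATE TABLE is only there to check what was actually stored.

```sql
-- Spell out NULL DEFAULT NULL so MySQL does not apply the implicit
-- DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP to the first
-- TIMESTAMP column of the table.
ALTER TABLE `KIIS_EVENT_APPLICATION`
  MODIFY COLUMN `COMES`  TIMESTAMP NULL DEFAULT NULL,
  MODIFY COLUMN `LEAVES` TIMESTAMP NULL DEFAULT NULL;

-- Inspect the definition the server actually stored.
SHOW CREATE TABLE `KIIS_EVENT_APPLICATION`;
```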
I added
timestamp NOT NULL DEFAULT '0000-00-00 00:00:00'
properties to all columns with such a behaviour and it works just fine.
Great!
I'm trying to denormalize a few MySQL tables into a new table that I can use to speed up some complex queries with lots of business logic. The problem is that there are 2.3 million records I need to add to the new table, and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed):
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works: I was able to run it with a filter on log_set_name. However, that filter only covers less than 1% of the total records, because a single value of log_set_name accounts for 2.2 million records, the vast majority. As far as I can see, there is nothing else I can use to break the query into smaller chunks. On the remaining 2.2 million records the query takes too long: it times out after a few hours, the transaction is rolled back, and nothing is added to the new table. Only the other 0.1 million records were processed, because I could add a filter saying where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns in their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all the indexes on the log_set_logs table because I read that they slow down inserts. I also bumped my RDS instance up to a db.r4.4xlarge write node. And since I am using MySQL Workbench, I increased all of its timeout values to their maximums (all nines). All three of these steps helped, and were necessary to get the 1% of the records into the new table, but they still weren't enough to get through the 2.2 million records without timing out. I'd appreciate any insights, as I'm not adept at this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1
...
CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on. Since it's text I cannot index it, and the largest value is above 255 characters, so I cannot change it to a varchar and then index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT '0',
`validation_result` tinyint(1) NOT NULL DEFAULT '0',
`archived` tinyint(1) NOT NULL DEFAULT '0',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT '0',
`random_guid` tinyint(1) NOT NULL DEFAULT '0',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1
baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://localhost/rjweb/mysql/doc.php/uuid ; (MySQL 8.0 has similar functions.)
It would probably be more efficient if you have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there are any, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
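One way the "hash" idea could look, as a sketch only: store a fixed-width digest of original_location in an extra column, index that column, and join on the digest. The column name original_location_md5 and the index name are hypothetical, and MD5 is just one possible digest, not something the answer prescribes.

```sql
-- Hypothetical helper column holding a 16-byte digest of the long TEXT value.
ALTER TABLE database_name.log_trees
  ADD COLUMN original_location_md5 BINARY(16) NULL,
  ADD INDEX idx_original_location_md5 (original_location_md5);

-- Backfill the digest for the existing rows.
UPDATE database_name.log_trees
SET original_location_md5 = UNHEX(MD5(original_location));

-- Join on the indexed digest; the extra equality on the full text guards
-- against two different paths ever sharing a digest.
SELECT t.baselog_index_guid, t.new_location
FROM (
  SELECT SUBSTRING_INDEX(path, '/', 9) AS baselog_path
  FROM database_name.baselog_and_amendment_guid_to_path_mappings
) AS m
LEFT JOIN database_name.log_trees AS t
  ON  t.original_location_md5 = UNHEX(MD5(m.baselog_path))
  AND t.original_location     = m.baselog_path;
```

New rows would also need the digest filled in on insert, for example by the code that writes to log_trees.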
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
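The loop above could be sketched as follows, assuming the chunks are cut on the AUTO_INCREMENT id of baselog_and_amendment_guid_to_path_mappings. The user variables @chunk_start and @chunk_size are hypothetical names, and fixed id ranges are a simplification of the technique described at the link. The calling script repeats the INSERT and the final SET until @chunk_start passes MAX(id).

```sql
-- Chunk boundaries on the source table's primary key (hypothetical names).
SET @chunk_start = 0;
SET @chunk_size  = 1000;

INSERT INTO database_name.log_set_logs
  (offload_date, vehicle, jurisdiction, baselog_path, path,
   baselog_index_guid, new_location, log_set_name, index_guid)
SELECT STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d'),
       logset_logs.vehicle, location.jurisdiction, logset_logs.baselog_path,
       logset_logs.path, baselog_trees.baselog_index_guid,
       baselog_trees.new_location, logset_logs.log_set_name,
       logset_logs.index_guid
FROM (
  SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) AS offload_date,
         SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) AS vehicle,
         SUBSTRING_INDEX(path, '/', 9) AS baselog_path,
         index_guid, path, log_set_name
  FROM database_name.baselog_and_amendment_guid_to_path_mappings
  WHERE id >  @chunk_start
    AND id <= @chunk_start + @chunk_size   -- only this slice of the source rows
) AS logset_logs
LEFT JOIN database_name.log_trees AS baselog_trees
  ON baselog_trees.original_location = logset_logs.baselog_path
LEFT JOIN database_name.baselog_offload_location AS location
  ON location.baselog_index_guid = baselog_trees.baselog_index_guid;

-- Advance to the next slice before the next pass of the loop.
SET @chunk_start = @chunk_start + @chunk_size;
```

With autocommit enabled, each pass commits on its own, so a timeout costs at most one chunk rather than the whole load.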
I'm about to deploy a web app that could end up with quite a big database, and I have some doubts I would like to clear up before going live.
Here is a bit about my setup and my most common queries:
1- I use sqlalchemy
2- I have many different tables that reference each other by their id (unique Integer field)
3- Some tables have a column holding a random, unique 50-char string, which I use client side to avoid exposing the id to clients. This column is indexed.
4- I also indexed some datetime columns, which I use for queries that find rows in date ranges.
5- All relations are indexed because I sometimes query by that parameter.
6- I have also indexed some Bool columns, which I query together with another indexed column.
With this in mind, I ask:
On point 3: Is it fine to query by this unique indexed 50-char string? Is it too long to work as an index? Will it be as fast with 50 million records as it is now?
Example query:
customer=users.query.filter_by(secretString="ZT14V-_hD9qZrtwLj-rLPTioWQ1MJ4rhfqCUx8SvF0BrrCLtgvV_ZVXXV8_xXW").first()
Then I use this user to find their associated object:
associatedObject=objects.query.filter_by(id=customer.associatedObject).first()
Once I have these results, I just get whatever I need from them:
return({"username":customer.Name,"AssociatedStuff":associatedObject.Name})
About point 4:
Will these indexes on datetime columns do any work when comparing with the < and > operators?
About point 6:
Is it OK to query something like:
userFinishedTasks=tasks.query.filter(task.completed==True, task.userID==user.id).all()
where completed and userID are indexed columns and userID is a reference to the users id column.
"Note that this query doesn't make much sense, because I can get the user's completed tasks from user.tasks.all() (given that they are referenced) and filter the completed ones from there; it's just an example query..."
Basically, I'm asking for confirmation that this is a correct way to query rows in a huge database, given that most of my queries will be for unique objects, or whether I'm doing something wrong.
I hope someone can let me know whether this is good practice or whether I will have performance issues in the future.
Thanks in advance!
@Rick James:
Here is the CREATE TABLE SQL code from the database export file.
I hope this is enough to get an idea. It is one of the tables, as an example; the same ideas apply to the rest of my questions.
CREATE TABLE `Bookings` (
`id` int(11) NOT NULL,
`CodigoAlojamiento` int(11) DEFAULT NULL,
`Entrada` datetime DEFAULT NULL,
`Salida` datetime DEFAULT NULL,
`Habitaciones` longtext COLLATE utf8_unicode_ci,
`Precio` float DEFAULT NULL,
`Agencia` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL,
`Extras` text COLLATE utf8_unicode_ci,
`Confirmada` tinyint(1) DEFAULT NULL,
`NumeroOcupantes` int(11) DEFAULT NULL,
`Completada` tinyint(1) DEFAULT NULL,
`Tarifa` int(11) DEFAULT NULL,
`SafeURL` varchar(120) COLLATE utf8_unicode_ci DEFAULT NULL,
`EmailContacto` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`TelefonoContacto` varchar(30) COLLATE utf8_unicode_ci DEFAULT NULL,
`Titular` varchar(300) COLLATE utf8_unicode_ci DEFAULT NULL,
`Observaciones` text COLLATE utf8_unicode_ci,
`IdentificadorReserva` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL,
`Facturada` tinyint(1) DEFAULT NULL,
`FacturarAClienteOAgencia` varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
`Pagada` tinyint(1) DEFAULT NULL,
`CheckOut` tinyint(1) DEFAULT NULL,
`PagaClienteOAgencia` char(1) COLLATE utf8_unicode_ci DEFAULT NULL,
`NumeroFactura` int(11) DEFAULT NULL,
`FechaFactura` datetime DEFAULT NULL,
`CheckIn` tinyint(1) DEFAULT NULL,
`EsPreCheckIn` tinyint(1) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Here are the indexes:
ALTER TABLE `Bookings`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `ix_Bookings_SafeURL` (`SafeURL`),
ADD KEY `ix_Bookings_CodigoAlojamiento` (`CodigoAlojamiento`),
ADD KEY `ix_Bookings_Tarifa` (`Tarifa`),
ADD KEY `ix_BookingsE_CheckIn` (`CheckIn`),
ADD KEY `ix_Bookings_CheckOut` (`CheckOut`),
ADD KEY `ix_Bookings_Completada` (`Completada`),
ADD KEY `ix_Bookings_Confirmada` (`Confirmada`),
ADD KEY `ix_Bookings_Entrada` (`Entrada`),
ADD KEY `ix_Bookings_EsPreCheckIn` (`EsPreCheckIn`),
ADD KEY `ix_Bookings_Salida` (`Salida`);
And here are the references:
ALTER TABLE `Bookings`
ADD CONSTRAINT `Bookings_ibfk_1` FOREIGN KEY (`CodigoAlojamiento`) REFERENCES `Alojamientos` (`id`),
ADD CONSTRAINT `Bookings_ibfk_2` FOREIGN KEY (`Tarifa`) REFERENCES `Tarifas` (`id`);
4- for queries which find rows in date ranges.
Usually there is something else in the WHERE, say
WHERE x = 123
AND Entrada BETWEEN ... AND ...
In that case this is optimal: INDEX(x, Entrada)
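Applied to the Bookings table, that could look like the sketch below. It assumes CodigoAlojamiento is the column playing the role of x. The index name and the literal values are made up for illustration.

```sql
-- Composite index: the equality column first, the range column second.
ALTER TABLE `Bookings`
  ADD INDEX `ix_Bookings_Alojamiento_Entrada` (`CodigoAlojamiento`, `Entrada`);

-- Bookings of one accommodation within a date range.
SELECT `id`, `Entrada`, `Salida`
FROM `Bookings`
WHERE `CodigoAlojamiento` = 123
  AND `Entrada` BETWEEN '2019-01-01 00:00:00' AND '2019-01-31 23:59:59';
```

Running EXPLAIN on the SELECT should show whether the optimizer picks the new index.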
`CheckOut` tinyint(1) DEFAULT NULL
ADD KEY `ix_Bookings_CheckOut` (`CheckOut`),
It is rarely useful to index a "flag". However, a composite index (as above) may be useful.
Why are most columns NULLable? For "booleans", simply use 0 and 1 and DEFAULT to whichever one is appropriate. Use NULL for "don't know", "optional", "not yet supplied", etc.
6- Also have indexed some Bool columns which I query together with another index column.
Then have a composite index. And be sure to say b=1 not b<>0, since <> does not optimize as well.
It's fine to query by this unique indexed 50chars string? It's not too long to work as index? Will work as fast now as with 50millions register?
If the dataset becomes bigger than RAM, there is a performance problem with "random" indexes. Your example should be fine. (Personally, I think 50 chars is excessive.) And such a 'hash' should probably be CHARACTER SET ascii and perhaps with COLLATE ascii_bin instead of a case-folding version.
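As a sketch of the character set suggestion, assuming SafeURL is the column that holds the random string and that every stored value is plain ASCII:

```sql
-- Store the token as single-byte ASCII and compare it byte for byte
-- (case-sensitive), instead of utf8 with a case-folding collation.
ALTER TABLE `Bookings`
  MODIFY `SafeURL` varchar(120) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL;
```

The existing unique key on SafeURL stays in place; the statement only changes how the column is stored and compared.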
And "task.completed==True, task.userID==user.id" is probably best indexed with a "composite" INDEX(userID, completed) in either order.
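A sketch of that composite index, assuming the model maps to a table named tasks with columns userID and completed (names taken from the example query; the index name and the literal 42 are hypothetical):

```sql
-- One index covering both equality tests of the query.
ALTER TABLE `tasks`
  ADD INDEX `ix_tasks_userID_completed` (`userID`, `completed`);

-- Test the flag with = 1 rather than <> 0, so the index can be used for it.
SELECT *
FROM `tasks`
WHERE `userID` = 42
  AND `completed` = 1;
```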
Yes, indexes on datetime columns do some work when comparing with the <, <=, >, >= operators. Strings can also be compared, though I do not see any likely columns for string comparisons other than with =.
50M rows is large, but not "huge". Composite indexes are often important for large tables.
I want to have a column that will store the creation date of a row.
I'm using PHP and MySQL, but I don't think that matters.
I've looked through a series of answers about this, but all of them seem to be about updating an existing table. Surely there is one for what I'm looking for, since it's a pretty basic question, but I have yet to find it.
I've tried things with DEFAULT and CONSTRAINT, but once they are added to my code the table can no longer be created. You might get the feeling that I'm not well versed in SQL, and you would not be wrong.
This creates the table; could you tell me what to add?
CREATE TABLE IF NOT EXISTS artwork (
id_artwork int(4) NOT NULL AUTO_INCREMENT,
title varchar(50) NOT NULL,
creationDate DateTime(3),
CONSTRAINT PK_artwork PRIMARY KEY (id_artwork)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
I've tried the following with no success:
creationDate DateTime(3) DEFAULT GETDATE()
creationDate DateTime(3) DEFAULT (GETDATE())
creationDate DATETIME(3) DEFAULT (CURRENT_TIMESTAMP)
MySQL Version: 5.7.23 - MySQL Community Server (GPL)
From the MySQL documentation on initialization using DATETIME:
If a TIMESTAMP or DATETIME column definition includes an explicit fractional seconds precision value anywhere, the same value must be used throughout the column definition.
This means we'll have to carry forward your precision. I was able to get it to work on SQL Fiddle:
CREATE TABLE IF NOT EXISTS artwork (
id_artwork int(4) NOT NULL AUTO_INCREMENT,
title varchar(50) NOT NULL,
creationDate DateTime(3) DEFAULT CURRENT_TIMESTAMP(3),
CONSTRAINT PK_artwork PRIMARY KEY (id_artwork)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
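A quick way to check the result, as a sketch: insert a row without naming creationDate and read it back. The title value is made up. A DATETIME column with a CURRENT_TIMESTAMP default needs MySQL 5.6.5 or later, which the 5.7.23 server from the question satisfies.

```sql
-- creationDate is left out on purpose so that the column default applies.
INSERT INTO artwork (title) VALUES ('Untitled');

-- creationDate now holds the insert time with millisecond precision.
SELECT id_artwork, title, creationDate FROM artwork;
```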
I have a problem that I have been trying to solve for the last two days. I have two tables, workspat and xtractor_wrk.
xtractor_wrk contains 250000 rows and workspat contains 67 million rows.
CREATE TABLE `xtractor_wrk` (
`db_time` datetime DEFAULT NULL,
`db_position` point NOT NULL,
`db_namn` char(50) CHARACTER SET utf8 COLLATE utf8_swedish_ci NOT NULL,
`db_sis` mediumint(8) unsigned DEFAULT NULL,
`db_om` smallint(5) unsigned DEFAULT NULL,
`db_seq` char(50) DEFAULT NULL,
`db_grarri` datetime DEFAULT NULL,
`db_grtime` datetime DEFAULT NULL,
KEY `db_time` (`db_time`),
KEY `db_sis` (`db_sis`),
KEY `db_om` (`db_om`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
CREATE TABLE `workspat` (
`db_time` datetime NOT NULL,
`db_point` point NOT NULL,
`db_om` smallint(6) NOT NULL,
`db_sis` mediumint(8) NOT NULL,
`db_status` char(10) CHARACTER SET latin1 NOT NULL,
KEY `db_sis` (`db_sis`),
KEY `db_om` (`db_om`),
KEY `db_time` (`db_time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci
I have 2 problems:
What I would like to do is update my table xtractor_wrk with the MAX(workspat.db_time)
and the MIN(workspat.db_time) of the rows matched by the "ON" condition below.
I have tried a lot of things, but the only thing I got somewhat working was this:
UPDATE xtractor_wrk
JOIN workspat
ON date(xtractor_wrk.db_time) = date(workspat.db_time)
and xtractor_wrk.db_om = workspat.db_om
and xtractor_wrk.db_sis = workspat.db_sis
SET xtractor_wrk.db_grtime = workspat.db_time
Of course this does not give me the MAX and MIN update to xtractor_wrk; it's just that this
is the only thing that even remotely worked for me.
workspat.db_time can have any number of matches, and I would like the highest and lowest of them written to xtractor_wrk.db_grtime and xtractor_wrk.db_grarri.
I also have a problem with speed. I have tried indexing, but it's still very slow. Is there a way to index across tables, or is my problem all the updates? Can I write the result to a new table instead of updating, or maybe delay the update, since there are 250000 rows to update? How would I do that?
Just a suggestion:
Add a new column that indicates whether a row has already been updated, e.g. 1 if it has been updated and 0 if it has not, and add a WHERE clause on that column for faster updating.
Example:
column 1 | column 2 | column 3 | ... | Updated
         |          |          |     | 0
         |          |          |     | 1
         |          |          |     | 0
         |          |          |     | 0
         |          |          |     | 1
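A sketch of how the flag column could be combined with the MIN/MAX requirement from the question. The column name updated and the derived-table approach are assumptions, not part of the suggestion above; so is the mapping of the lowest time to db_grarri and the highest to db_grtime.

```sql
-- Hypothetical flag column: 0 = not yet updated, 1 = already updated.
ALTER TABLE xtractor_wrk
  ADD COLUMN updated TINYINT(1) NOT NULL DEFAULT 0;

-- Aggregate workspat once per (day, db_om, db_sis), then join the result.
UPDATE xtractor_wrk AS x
JOIN (
  SELECT DATE(db_time) AS db_day, db_om, db_sis,
         MIN(db_time) AS first_time,
         MAX(db_time) AS last_time
  FROM workspat
  GROUP BY DATE(db_time), db_om, db_sis
) AS w
  ON  w.db_day = DATE(x.db_time)
  AND w.db_om  = x.db_om
  AND w.db_sis = x.db_sis
SET x.db_grarri = w.first_time,   -- assumed: lowest time goes to db_grarri
    x.db_grtime = w.last_time,    -- assumed: highest time goes to db_grtime
    x.updated   = 1
WHERE x.updated = 0;              -- skip rows handled by an earlier run
```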
Why do I get an error of the form:
Error in query: Duplicate entry '10' for key 1
...when doing an INSERT statement like:
INSERT INTO wp_abk_period (pricing_id, apartment_id) VALUES (13, 27)
...with 13 and 27 being valid id-s for existing pricing and apartment rows, and the table is defined as:
CREATE TABLE `wp_abk_period` (
`id` int(11) NOT NULL auto_increment,
`apartment_id` int(11) NOT NULL,
`pricing_id` int(11) NOT NULL,
`type` enum('available','booked','unavailable') collate utf8_unicode_ci default NULL,
`starts` datetime default NULL,
`ends` datetime default NULL,
`recur_type` enum('daily','weekly','monthly','yearly') collate utf8_unicode_ci default NULL,
`recur_every` char(3) collate utf8_unicode_ci default NULL,
`timedate_significance` char(4) collate utf8_unicode_ci default NULL,
`check_in_times` varchar(255) collate utf8_unicode_ci default NULL,
`check_out_times` varchar(255) collate utf8_unicode_ci default NULL,
PRIMARY KEY (`id`),
KEY `fk_period_apartment1_idx` (`apartment_id`),
KEY `fk_period_pricing1_idx` (`pricing_id`),
CONSTRAINT `fk_period_apartment1` FOREIGN KEY (`apartment_id`) REFERENCES `wp_abk_apartment` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_period_pricing1` FOREIGN KEY (`pricing_id`) REFERENCES `wp_abk_pricing` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Isn't key 1 the id in this case, and isn't having it on auto_increment sufficient for not having to specify it?
Note: If I just provide an unused value for id, like INSERT INTO wp_abk_period (id, pricing_id, apartment_id) VALUES (3333333, 13, 27), it works fine, but then again, it is set as auto_increment so I shouldn't need to do this!
Note 2: OK, this is a complete "twilight zone" moment: after running the query above with the huge number for id, things started working normally, with no more duplicate entry errors. Can someone explain to me WTF MySQL was doing to produce this weird behavior?
It could be that your AUTO_INCREMENT value for the table and the actual values in id column have got out of whack.
This might help:
Step 1 - Get Max id from table
select max(id) from wp_abk_period
Step 2 - Align the AUTO_INCREMENT counter on table
ALTER TABLE wp_abk_period AUTO_INCREMENT = <value from step 1 + 100>;
Step 3 - Retry the insert
As for why the AUTO_INCREMENT has got out of whack I don't know. Added auto_increment after data was in the table? Altered the auto_increment value after data was inserted into the table?
Hope it helps.
I had the same problem, and here is my solution:
My ID column had a bad type. It was a TINYINT, and MySQL was trying to write the 128th row.
Sometimes the problem you think is your biggest turns out to be only a tiny parameter...
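A sketch of the corresponding fix, with my_table standing in as a hypothetical table name: a signed TINYINT tops out at 127, so once the counter reaches that value every further insert collides with the existing row 127.

```sql
-- Widen the id column so AUTO_INCREMENT has room to grow past 127.
ALTER TABLE my_table
  MODIFY COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT;
```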
Late to the party, but I just ran into this tonight - duplicate key '472817' and the provided answers didn't help.
On a whim I ran:
repair table wp_abk_period
which output
Number of rows changed from 472816 to 472817
Seems like mysql had the row count wrong, and the issue went away.
My environment:
mysql Ver 14.14 Distrib 5.1.73, for Win64 (unknown)
Create table syntax:
CREATE TABLE `env_events` (
`tableId` int(11) NOT NULL AUTO_INCREMENT,
`deviceId` varchar(50) DEFAULT NULL,
`timestamp` int(11) DEFAULT NULL,
`temperature` float DEFAULT NULL,
`humidity` float DEFAULT NULL,
`pressure` float DEFAULT NULL,
`motion` int(11) DEFAULT NULL,
PRIMARY KEY (`tableId`)
) ENGINE=MyISAM AUTO_INCREMENT=528521 DEFAULT CHARSET=latin1
You can check the current value of the auto_increment with the following command:
show table status
Then check the max value of the id and see if it looks right. If not change the auto_increment value of your table.
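As a sketch, the two values can be put side by side like this, using the table from the question:

```sql
-- Counter the server will hand out on the next insert.
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME   = 'wp_abk_period';

-- Highest id actually stored; the counter above should be greater than this.
SELECT MAX(id) FROM wp_abk_period;
```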
When debugging this problem, check the case sensitivity of table names (especially if you run MySQL on something other than Windows).
E.g. one script runs 'CREATE TABLE my_table' and another script runs 'INSERT INTO MY_TABLE'. These can be two different tables, with different contents and different file system locations, which might lead to the described problem.
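The server setting that governs this can be checked with the statement below, offered as a quick diagnostic sketch.

```sql
-- 0: names are stored as given and compared case-sensitively (typical on Linux).
-- 1: names are stored in lowercase and compared case-insensitively (Windows default).
-- 2: names are stored as given but compared in lowercase (macOS default).
SHOW VARIABLES LIKE 'lower_case_table_names';
```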