What is the meaning of AUTO_INCREMENT=20018215 here in table schema - mysql

CREATE TABLE `tblspmaster` (
`CSN` bigint(20) NOT NULL AUTO_INCREMENT,
`SP` varchar(50) NOT NULL,
`FileImportedDate` date NOT NULL,
`AMZFileName` varchar(580) NOT NULL,
`CasperBatch` varchar(50) NOT NULL,
`BatchProcessedDate` date NOT NULL,
`ExpiryDate` date NOT NULL,
`Region` varchar(50) NOT NULL,
`FCCity` varchar(50) NOT NULL,
`VendorID` int(11) NOT NULL,
`LocationID` int(11) NOT NULL,
PRIMARY KEY (`CSN`)
) ENGINE=InnoDB AUTO_INCREMENT=20018215 DEFAULT CHARSET=latin1;
What is the meaning of AUTO_INCREMENT=20018215 here in the table schema? When I insert 500k records, the identity values are fine, running from 1 to 500000, but when I insert the next 500k records, the next identity value is 524281 instead of 500001.

It means that the next auto-assigned value for `CSN` will be 20018215.

The large initial value, 20018215, was probably the current value of the auto-increment counter when you did a "Send to SQL Editor" -> "Create Statement" in MySQL Workbench. It is just a safe value that skips over existing data, in case you have to reimport the previous records.
I had the same question, but after generating several "Create" edit templates from known tables, I noticed the AUTO_INCREMENT value corresponded to the number of existing records in those tables. I removed the large values from my templates, since I want my new tables to begin with a primary key of 1.
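The gap the question observes (524281 instead of 500001) is a separate InnoDB behavior: for multi-row and INSERT ... SELECT statements, the engine may reserve auto-increment values in blocks, and values reserved but never used are discarded, leaving gaps. A minimal sketch for inspecting the counter and, where safe, resetting it (table name taken from the question; InnoDB will not let the counter drop below the current maximum plus one):

-- Inspect the current counter (the same value shown by SHOW CREATE TABLE).
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'tblspmaster';

-- Reset it; InnoDB silently raises the value to MAX(CSN) + 1 if set lower.
ALTER TABLE tblspmaster AUTO_INCREMENT = 500001;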

Related

Speed Up A Large Insert From Select Query With Multiple Joins

I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem I'm having is that there are 2.3 million records I need to add to the new table, and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed):
INSERT INTO database_name.log_set_logs
    (offload_date, vehicle, jurisdiction, baselog_path, path,
     baselog_index_guid, new_location, log_set_name, index_guid)
SELECT STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') AS offload_date,
       logset_logs.vehicle, jurisdiction, baselog_path, path,
       baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
       logset_logs.index_guid
FROM (
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) AS offload_date,
           SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) AS vehicle,
           SUBSTRING_INDEX(path, '/', 9) AS baselog_path, index_guid,
           path, log_set_name
    FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
LEFT JOIN database_name.log_trees baselog_trees
       ON baselog_trees.original_location = logset_logs.baselog_path
LEFT JOIN database_name.baselog_offload_location location
       ON location.baselog_index_guid = baselog_trees.baselog_index_guid;
The query itself works: I was able to run it using a filter on log_set_name. But that filter only covers less than 1% of the total records, because one value of log_set_name accounts for 2.2 million records, the vast majority. So there is nothing else I can see that would let me break this query into smaller chunks. The problem is that the query takes too long to run on the remaining 2.2 million records: it times out after a few hours, the transaction is rolled back, and nothing is added to the new table. Only the 0.1 million records could be processed, and that was only because I could add a filter saying where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns with their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that those slow down inserts. I also scaled my RDS instance up to a db.r4.4xlarge write node, and since I am using MySQL Workbench, I increased all of its timeout values to their maximums. All three of these steps helped and were necessary to get the 1% of the records into the new table, but it still wasn't enough to insert the 2.2 million records without timing out. I'd appreciate any insights, as I'm not experienced with this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `purged` tinyint(1) NOT NULL DEFAULT '0',
  `baselog_path` text,
  `baselog_index_guid` varchar(36) DEFAULT NULL,
  `new_location` text,
  `offload_date` date NOT NULL,
  `jurisdiction` varchar(20) DEFAULT NULL,
  `vehicle` varchar(20) DEFAULT NULL,
  `index_guid` varchar(36) NOT NULL,
  `path` text NOT NULL,
  `log_set_name` varchar(60) NOT NULL,
  `protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
  `protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
  `protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
  `protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
  `general_comments_about_this_log` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `path` text NOT NULL,
  `index_guid` varchar(36) NOT NULL,
  `log_set_name` varchar(60) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `log_set_name_index` (`log_set_name`),
  KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1
...
CREATE TABLE `baselog_offload_location` (
  `baselog_index_guid` varchar(36) NOT NULL,
  `jurisdiction` varchar(20) NOT NULL,
  KEY `baselog_index` (`baselog_index_guid`),
  KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `log_trees` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `baselog_index_guid` varchar(36) DEFAULT NULL,
  `original_location` text NOT NULL, -- This is what I have to join everything on; since it's text I cannot index it, and the largest value is above 255 characters so I cannot change it to a varchar and index it either.
  `new_location` text,
  `distcp_returncode` int(11) DEFAULT NULL,
  `distcp_job_id` text,
  `distcp_stdout` text,
  `distcp_stderr` text,
  `validation_attempt` int(11) NOT NULL DEFAULT '0',
  `validation_result` tinyint(1) NOT NULL DEFAULT '0',
  `archived` tinyint(1) NOT NULL DEFAULT '0',
  `archived_at` timestamp NULL DEFAULT NULL,
  `created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  `updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `dir_exists` tinyint(1) NOT NULL DEFAULT '0',
  `random_guid` tinyint(1) NOT NULL DEFAULT '0',
  `offload_date` date NOT NULL,
  `vehicle` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1
baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://mysql.rjweb.org/doc.php/uuid (MySQL 8.0 has similar functions built in.)
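For illustration, a sketch of that BINARY(16) conversion using the MySQL 8.0 built-ins UUID_TO_BIN/BIN_TO_UUID (the new column name is made up):

-- MySQL 8.0: store the 36-character GUID as 16 bytes instead.
ALTER TABLE log_trees
  ADD COLUMN baselog_index_guid_bin BINARY(16) DEFAULT NULL;

UPDATE log_trees
SET baselog_index_guid_bin = UUID_TO_BIN(baselog_index_guid)
WHERE baselog_index_guid IS NOT NULL;

-- Convert back to text form for display.
SELECT BIN_TO_UUID(baselog_index_guid_bin) FROM log_trees LIMIT 10;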
It would probably be more efficient to have a separate (optionally redundant) column for vehicle rather than repeatedly computing
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
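One way to get that redundant column without changing the loader, sketched under the assumption of MySQL 5.7+ (the column and key names are made up), is a stored generated column:

-- Persist the parsed vehicle once, instead of re-deriving it per query.
-- Note varchar(20) matches log_set_logs.vehicle; longer values would error in strict mode.
ALTER TABLE baselog_and_amendment_guid_to_path_mappings
  ADD COLUMN vehicle varchar(20)
    GENERATED ALWAYS AS (SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1)) STORED,
  ADD KEY vehicle_index (vehicle);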
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there are, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
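A sketch of that hash idea (the column and key names are assumptions): keep a fixed-width digest of the long text, index the digest, and join on it, re-checking the full text to rule out collisions:

ALTER TABLE log_trees
  ADD COLUMN original_location_md5 BINARY(16),
  ADD KEY original_location_md5_idx (original_location_md5);

UPDATE log_trees
SET original_location_md5 = UNHEX(MD5(original_location));

-- Join on the short indexed hash; compare the full text as a tie-breaker:
--   ON t.original_location_md5 = UNHEX(MD5(l.baselog_path))
--  AND t.original_location = l.baselog_path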
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
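A sketch of one pass of that loop, driving the INSERT .. SELECT through 1000-id windows of the source table's primary key (the @last_id bookkeeping would live in application code or a stored procedure; the derived table is inlined, and jurisdiction/new_location are qualified on the assumption they come from baselog_offload_location and log_trees respectively):

-- One pass: copy source rows whose id falls in the next 1000-id window.
-- Start with @last_id = 0; advance it each pass and stop when no rows remain.
INSERT INTO database_name.log_set_logs
    (offload_date, vehicle, jurisdiction, baselog_path, path,
     baselog_index_guid, new_location, log_set_name, index_guid)
SELECT STR_TO_DATE(SUBSTRING_INDEX(SUBSTRING_INDEX(m.path, '/', 7), '/', -1), '%Y.%m.%d'),
       SUBSTRING_INDEX(SUBSTRING_INDEX(m.path, '/', 8), '/', -1),
       loc.jurisdiction,
       SUBSTRING_INDEX(m.path, '/', 9),
       m.path,
       t.baselog_index_guid,
       t.new_location,
       m.log_set_name,
       m.index_guid
FROM database_name.baselog_and_amendment_guid_to_path_mappings m
LEFT JOIN database_name.log_trees t
       ON t.original_location = SUBSTRING_INDEX(m.path, '/', 9)
LEFT JOIN database_name.baselog_offload_location loc
       ON loc.baselog_index_guid = t.baselog_index_guid
WHERE m.id BETWEEN @last_id + 1 AND @last_id + 1000;

SET @last_id = @last_id + 1000;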

update table takes long time in mysql?

CREATE TABLE `fa` (
  `book` varchar(100) DEFAULT NULL,
  `PRODUCTION` varchar(1000) DEFAULT NULL,
  `VENDOR_LEVEL` varchar(100) DEFAULT NULL,
  `BOOK_NO` int(10) DEFAULT NULL,
  `UNSTABLE_TIME_PERIOD` varchar(100) DEFAULT NULL,
  `PERIOD_YEAR` int(10) DEFAULT NULL,
  `promo_3_visuals_manual_drag` int(10) DEFAULT NULL,
  `PRODUCT_LEVEL_DIST` varchar(100) DEFAULT NULL,
  `PRODUCT_LEVEL_ACV_TREND` varchar(100) DEFAULT NULL,
  KEY `book` (`BOOK_NO`),
  KEY `period` (`PERIOD_YEAR`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Indexes have been added on BOOK_NO and PERIOD_YEAR (the KEY definitions above).
We can't make either column UNIQUE or a PRIMARY KEY, as both contain plenty of duplicate values.
There are 46 million rows.
We tried partitioning on period year with catno as a subpartition, but it didn't help; the query still takes a long time.
When I run the update query:
update fa set UNSTABLE_TIME_PERIOD = NULL where BOOK_NO = 0 and PERIOD_YEAR = 201502;
it takes more than 7 minutes. How can I optimize the query?
Instead of creating 2 different keys, create a single composite key on both columns, like:
KEY book_period (BOOK_NO, PERIOD_YEAR)
Also, filter first on the column that narrows the result to the smaller set of records.
If you expect BOOK_NO to match fewer records than PERIOD_YEAR, put BOOK_NO first in the WHERE clause (and first in the key); otherwise put PERIOD_YEAR first and define the key accordingly.
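A sketch of that change against the schema above, using the key names it defines:

-- Replace the two single-column keys with one composite key.
ALTER TABLE fa
  DROP KEY book,
  DROP KEY period,
  ADD KEY book_period (BOOK_NO, PERIOD_YEAR);

-- The update can now locate matching rows with a single index range scan.
UPDATE fa
SET UNSTABLE_TIME_PERIOD = NULL
WHERE BOOK_NO = 0 AND PERIOD_YEAR = 201502;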
As Álvaro González said, you should use some sort of key (e.g. a primary key).
Adding a Primary Key:
CREATE TABLE fa (
<your_id>,
{...},
PRIMARY KEY(<your_id>),
{...}
)
or
CREATE TABLE fa (
<your_id> PRIMARY KEY,
{...}
)
It'd be a good idea to make your PRIMARY KEY AUTO_INCREMENT too for convenience, but this is not essential.
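For example, a surrogate key could be bolted onto the existing fa table like this (a sketch; the column name id is an assumption):

-- Adds an auto-incrementing surrogate primary key to the existing table.
ALTER TABLE fa
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;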

MySQL archival of a live production database table

I need a proper strategy to archive a live production table.
This table has a lot of inserts happening on it (close to 1000 inserts per minute). It is a MyISAM table, and it has a key column with an auto-incremented numeric value.
I need to move data older than Jan 01 to a new archive table.
Inserts should not get affected.
The data is hosted on an RDS instance in Amazon.
Please help!
EDIT:
The table structure is:
CREATE TABLE `data` (
  `id` int(20) NOT NULL AUTO_INCREMENT,
  `id_1` varchar(64) CHARACTER SET utf8 NOT NULL,
  `id_2` varchar(64) CHARACTER SET utf8 NOT NULL,
  `timestamp` int(10) unsigned NOT NULL,
  `status_code` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `check_2` (`id_1`,`id_2`,`timestamp`,`status_code`),
  KEY `account_id_3` (`id_1`,`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=75996470 DEFAULT CHARSET=latin1;
In addition to the above fields, there are about 30 more fields in this table which can accept NULL values.
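One common pattern, sketched here assuming an archive table data_archive with the same structure, a Unix-time `timestamp` column, and an illustrative cutoff date, is to copy and delete in small primary-key-ordered chunks so MyISAM's table locks stay short and concurrent inserts are not blocked for long:

-- Cutoff year assumed for illustration only.
SET @cutoff := UNIX_TIMESTAMP('2015-01-01');

-- Repeat both statements until the INSERT copies zero rows;
-- each pass moves at most 1000 rows, keeping each table lock brief.
INSERT INTO data_archive
SELECT * FROM data
WHERE `timestamp` < @cutoff
ORDER BY id
LIMIT 1000;

DELETE FROM data
WHERE `timestamp` < @cutoff
ORDER BY id
LIMIT 1000;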

Update table with MAX() and MIN() from another table + performance problems

I have a problem that I have tried to solve for the last 2 days. I have 2 tables, workspat and xtractor_wrk.
xtractor_wrk contains 250,000 rows and workspat contains 67 million rows.
CREATE TABLE `xtractor_wrk` (
`db_time` datetime DEFAULT NULL,
`db_position` point NOT NULL,
`db_namn` char(50) CHARACTER SET utf8 COLLATE utf8_swedish_ci NOT NULL,
`db_sis` mediumint(8) unsigned DEFAULT NULL,
`db_om` smallint(5) unsigned DEFAULT NULL,
`db_seq` char(50) DEFAULT NULL,
`db_grarri` datetime DEFAULT NULL,
`db_grtime` datetime DEFAULT NULL,
KEY `db_time` (`db_time`),
KEY `db_sis` (`db_sis`),
KEY `db_om` (`db_om`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
CREATE TABLE `workspat` (
`db_time` datetime NOT NULL,
`db_point` point NOT NULL,
`db_om` smallint(6) NOT NULL,
`db_sis` mediumint(8) NOT NULL,
`db_status` char(10) CHARACTER SET latin1 NOT NULL,
KEY `db_sis` (`db_sis`),
KEY `db_om` (`db_om`),
KEY `db_time` (`db_time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci
I have 2 problems.
What I would like to do is update my table xtractor_wrk with the MAX(workspat.db_time) and the MIN(workspat.db_time) from the result I would get from the "ON" clause below. I have tried a lot of things, but the only thing I got somewhat working was this:
UPDATE xtractor_wrk
JOIN workspat
ON date(xtractor_wrk.db_time) = date(workspat.db_time)
and xtractor_wrk.db_om = workspat.db_om
and xtractor_wrk.db_sis = workspat.db_sis
SET xtractor_wrk.db_grtime = workspat.db_time
Of course this does not give me the MAX and MIN update to xtractor_wrk; it's just the only thing that even remotely worked for me.
workspat.db_time can have any number of matches, and I would like the highest and lowest written to xtractor_wrk.db_grtime and xtractor_wrk.db_grarri.
I also have a problem with speed. I have tried indexing, but it's still very slow. Is there a way to index across tables, or is my problem all the updates? Could I write the result to a new table instead of updating, or maybe delay the update, since there are 250,000 rows to update? How would I do that?
Just a suggestion:
Add a new column that indicates whether a row has already been updated (1 if updated, 0 if not), and add a WHERE clause on that flag so the update can skip rows that are already done.
Example:
column 1 | column 2 | column 3 | ... | Updated
         |          |          |     |    0
         |          |          |     |    1
         |          |          |     |    0
         |          |          |     |    0
         |          |          |     |    1
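For the MAX/MIN part of the question, one sketch (join columns taken from the query in the question) is to aggregate workspat once in a derived table and drive the update from that, so each xtractor_wrk row sees a single pre-computed pair:

UPDATE xtractor_wrk x
JOIN (
    SELECT DATE(db_time) AS d, db_om, db_sis,
           MAX(db_time) AS max_time, MIN(db_time) AS min_time
    FROM workspat
    GROUP BY DATE(db_time), db_om, db_sis
) w ON DATE(x.db_time) = w.d
   AND x.db_om = w.db_om
   AND x.db_sis = w.db_sis
SET x.db_grtime = w.max_time,
    x.db_grarri = w.min_time;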

alter table statement to insert duplicate into another table

I have a table with a column SP varchar(10) NOT NULL. I want that column always to be unique, so I created a unique index on it. My table schema is as follows:
CREATE TABLE IF NOT EXISTS `tblspmaster` (
`CSN` bigint(20) NOT NULL AUTO_INCREMENT,
`SP` varchar(10) NOT NULL,
`FileImportedDate` date NOT NULL,
`AMZFileName` varchar(50) NOT NULL,
`CasperBatch` varchar(50) NOT NULL,
`BatchProcessedDate` date NOT NULL,
`ExpiryDate` date NOT NULL,
`Region` varchar(50) NOT NULL,
`FCCity` varchar(50) NOT NULL,
`VendorID` int(11) NOT NULL,
`LocationID` int(11) NOT NULL,
PRIMARY KEY (`CSN`),
UNIQUE KEY `SP` (`SP`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=10000000000 ;
Now I want that if anybody tries to insert a duplicate record, that record should be inserted into a secondary table named tblDuplicate.
I have gone through the question MySQL - ignore insert error: duplicate entry, but I am not sure whether, instead of
INSERT INTO tbl VALUES (1,200) ON DUPLICATE KEY UPDATE value=200;
I can insert the duplicate row into another table.
What changes need to be made to the main table schema or the index column?
Note: Data will be inserted by importing Excel or CSV files. The files generally contain 500k to 800k records, but there will be only one single column.
I believe you want to use a trigger for this. Here is the MySQL reference chapter on triggers.
Use a BEFORE INSERT trigger. In the trigger, check whether the row is a duplicate (for example, COUNT(*) where the key column equals the value to be inserted). If the row is a duplicate, perform an insert into your secondary table.
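A minimal sketch of such a trigger, assuming tblDuplicate has just an SP column (the trigger name is made up). Importing with INSERT IGNORE then lets the unique key silently drop the duplicate row itself, while the BEFORE INSERT trigger, which fires before the duplicate-key check, has already logged it:

DELIMITER //
CREATE TRIGGER trg_log_duplicate_sp
BEFORE INSERT ON tblspmaster
FOR EACH ROW
BEGIN
  -- If the incoming SP already exists, log it to the secondary table.
  IF EXISTS (SELECT 1 FROM tblspmaster WHERE SP = NEW.SP) THEN
    INSERT INTO tblDuplicate (SP) VALUES (NEW.SP);
  END IF;
END//
DELIMITER ;

-- Import with IGNORE so the duplicate row is skipped instead of aborting the load:
-- INSERT IGNORE INTO tblspmaster (SP, FileImportedDate, ...) VALUES (...);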