I have a question about CREATE TABLE syntax and, more specifically, about when to create an index on the table.
I need to create a table from scratch and load ~1 million records into it from a CSV file.
The question is: when should I create the index on the table?
Or, more precisely:
- Should I use the INDEX syntax in the CREATE TABLE statement and then fill the table
or
- Should I create the table, fill it, and then use an ALTER TABLE ADD INDEX statement?
Which is faster?
It is better to create the index after loading the data (especially with large data sets).
Creating the index beforehand adds overhead for the DBMS, because the index has to be updated for every inserted row.
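To make the "load first, index afterwards" order concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; the question is about MySQL, where the equivalent would be a bulk load followed by ALTER TABLE ... ADD INDEX. The table name, column names and index name are made up for illustration.

```python
import sqlite3


def load_then_index(rows):
    """Bulk-load rows into an unindexed table, then build the index once."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; no index is declared at creation time.
    conn.execute("CREATE TABLE records (id INTEGER, value TEXT)")
    # Load everything inside one transaction while the table has no index.
    with conn:
        conn.executemany(
            "INSERT INTO records (id, value) VALUES (?, ?)", rows
        )
    # Build the index in a single pass over the already loaded data.
    conn.execute("CREATE INDEX idx_records_id ON records (id)")
    return conn


conn = load_then_index((i, f"row-{i}") for i in range(1000))
row_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
# PRAGMA index_list returns one row per index; the name is in column 1.
index_names = [r[1] for r in conn.execute("PRAGMA index_list('records')")]
```

Timings will differ between engines and data sets, so it is worth measuring both orders on your own data before settling on one.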
I have a schema that is used to archive a data set on a daily basis. Some of the analysis needs to look back, so to optimise things I need to create a couple of indexes on each table. These would be separate (I'm not trying to cross index or anything), just a simple non-unique index, but on each table in the schema.
The archive has already been building for over a year, so we have some 400 - 500 tables, which makes running a manual ALTER query on each table a bit too time consuming.
I could write a PHP script to do it, but wondered if there was a more elegant solution with a single query or transaction?
TIA
I have copied @Shadow's answer from the comments above to show it here as the answer:
Well, the ALTER TABLE and ADD INDEX parts will be string constants: first you generate the ALTER TABLE statements, then you execute the statements you generated in the first step. See an example here: stackoverflow.com/a/44527818/5389997
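As a rough sketch of that generate-then-execute approach, something like the following could build the statements. This is only an illustration: the schema, table, column and index names are hypothetical, and the table list would normally come from information_schema.tables rather than being hard-coded.

```python
# Query to list the tables of one schema. The %s placeholder style is an
# assumption; it depends on the MySQL driver you use.
LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = %s"
)


def build_index_statements(schema, tables, column, index_name):
    """Return one ALTER TABLE ... ADD INDEX statement per table."""
    statements = []
    for table in tables:
        statements.append(
            f"ALTER TABLE `{schema}`.`{table}` "
            f"ADD INDEX `{index_name}` (`{column}`);"
        )
    return statements


# Hypothetical names, for illustration only.
statements = build_index_statements(
    "archive",
    ["day_2017_01_01", "day_2017_01_02"],
    "created_at",
    "idx_created_at",
)
```

Each generated string can then be reviewed and executed one at a time. Note that in MySQL every ALTER TABLE causes an implicit commit, so the whole batch cannot be wrapped in a single transaction.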
Some databases, like MySQL [1] and PostgreSQL [2], support bundling of certain compatible ALTER TABLE statements (as non-standard SQL).
For example we can have:
ALTER TABLE `my_table`
DROP COLUMN `column_1`,
DROP COLUMN `column_2`,
...
or
ALTER TABLE `my_table`
MODIFY `column_1` ... ,
MODIFY `column_2` ... ,
instead of having individual statements:
ALTER TABLE `my_table` DROP COLUMN `column_1`;
ALTER TABLE `my_table` DROP COLUMN `column_2`;
or
ALTER TABLE `my_table` MODIFY `column_1` ... ;
ALTER TABLE `my_table` MODIFY `column_2` ... ;
etc
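If the individual clauses are produced by a script, the bundling could be sketched roughly like this. The helper name combine_alter is made up for this example and is not part of any library.

```python
def combine_alter(table, clauses):
    """Join several ALTER TABLE clauses into one comma-separated statement."""
    if not clauses:
        raise ValueError("at least one clause is required")
    body = ",\n  ".join(clauses)
    return f"ALTER TABLE `{table}`\n  {body};"


combined = combine_alter(
    "my_table",
    ["DROP COLUMN `column_1`", "DROP COLUMN `column_2`"],
)
```

This only builds the string; whether the clauses are actually compatible within one statement is still up to the database.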
For comparison of the same feature, PostgreSQL [2], which also implements this, will perform all operations in a single scan:
The main reason for providing the option to specify multiple changes in a single ALTER TABLE is that multiple table scans or rewrites can thereby be combined into a single pass over the table.
Although for DROP COLUMN specifically it will often not even need to do that:
The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations...
Questions:
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
Assumptions:
Using InnoDB
No indexes/complex defaults are involved in any of the columns we want to change/drop (so basically changes that would not require a temporary table when run as individual alter statements)
References:
[1] MySQL ALTER TABLE docs
[2] PostgreSQL ALTER TABLE docs
MySQL's InnoDB:
(This does not really answer the Questions, but provides a little more insight into the bigger question of ALTER.)
If any of the alters needs to copy the table over, you are probably better off putting all alters into the same statement. Changing the PRIMARY KEY, for example, requires rebuilding the data that is clustered with the PK.
Some alters can be achieved by simply altering the schema; these are virtually instantaneous, and could be done via separate alter statements. Adding an option to ENUM was implemented long ago.
Some alters need some form of scan, but can do it "in the background". DROP INDEX can be done by quickly "hiding" it, then freeing up the BTree in the background.
I have left out a grey area in which you batch 'simple' alters. One would hope that ALTER is smart enough to simply go through them quickly, rather than deciding to copy the table over.
I got some useful feedback but decided to respond to my own question to provide a more concrete set of answers.
Would the multi-column statement result in traversing all the rows just once and performing all changes needed?
Yes, if the alter statement results in rebuilding the table then it only needs to do it once.*
* This answer comes from my own testing and other mostly anecdotal evidence (including @Uueerdo's in this post). It would be useful to have some official docs for this...
How does MySQL actually perform DROP COLUMN? Does it also "hide" the columns first, or does it delete the data straight away?
MySQL will rebuild the table in place (rather than create a copy or just change metadata) for most column operations. Each specific case can be found in the Online DDL docs for InnoDB.
A few operations like renaming a column or setting a default value will just alter metadata, so they don't require a table rebuild.
However, dropping a column DOES require a full table rebuild.
I have quite a big table in MySQL 5.5, ~200M rows, and I want to add an index to one of the columns in this table (btree type). The column is of type integer and contains a wide distribution of integers.
My question is: when is the btree computed?
When I execute the simple create index query:
ALTER TABLE bigtable ADD INDEX (column3);
It returns immediately. Is the computing of the btree happening in the background? I can't imagine that MySQL is that fast at creating a btree of ~200M values with a wide distribution of integers.
Short answer: Yes.
Long Answer: A look at the MySQL documentation for ALTER TABLE reveals the following:
In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions (with the exception noted shortly). Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary copy of the original table is created in the database directory of the new table. This can differ from the database directory of the original table for ALTER TABLE operations that rename the table to a different database.
So, when you create your index, it is built on a temporary copy of the table, which then replaces the original table (which is dropped) once the operation completes.
From the docs:
An ALTER TABLE statement that contains DROP INDEX and ADD INDEX
clauses that both name the same index uses a table copy, not Fast
Index Creation.
This is a bit unclear to me. Is it talking about the NAME of the index? Can someone give an example of a query in which MySQL resorts to a table copy?
Indeed, it sounds like this line is about:
An (One, single) ALTER TABLE statement
that contains (both) a DROP INDEX and an ADD INDEX clause
and both clauses name the same index
and states that such a statement uses a table copy, not Fast Index Creation.
Such a statement would be:
ALTER TABLE MyTable
DROP INDEX MyIndex,
ADD INDEX MyIndex(MyColumn);
The documentation is not really clear about the reason behind this, but I think the database wants to create the new index first and then drop the old one, so that the statement itself can more easily be made atomic. (Creating the index might fail.) If the index name itself is used in the storage as well, that order of creating first and dropping afterwards would cause a conflict.
After all, fast index creation is a relatively new feature, so they might improve this over time.
My database contains 3 million records and initially had no FULLTEXT index. Now I'm trying to add a FULLTEXT index using a CREATE INDEX statement, but it takes a very long time: the browser just shows "connecting" and RAM usage goes up to 70%. It feels like nothing is happening in the database. What should I do? Is there any other way to add a FULLTEXT index? If I add a FULLTEXT index, will it affect subsequent inserts? First of all, I just want to get the FULLTEXT index added to my table. I'm using the MySQL database bundled with XAMPP.
1. Create a dummy table with the same structure as your original table.
2. Create the indexes you want on the dummy table.
3. Take a dump of your original table and import it into the dummy table.
4. Drop the original table and rename the dummy table to the original name.
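The steps above can be sketched as a list of MySQL statements generated from Python. This is only an illustration under a few assumptions: the table, column and index names are hypothetical, the dummy table is simply named with a _new suffix, and step 3 uses INSERT ... SELECT instead of a dump and import.

```python
def build_swap_statements(table, column, index_name):
    """Return the statements for the dummy-table approach, in order."""
    dummy = f"{table}_new"  # hypothetical naming convention
    return [
        # 1. Dummy table with the same structure as the original.
        f"CREATE TABLE `{dummy}` LIKE `{table}`;",
        # 2. Add the wanted index while the dummy table is still empty.
        f"ALTER TABLE `{dummy}` ADD FULLTEXT INDEX `{index_name}` (`{column}`);",
        # 3. Copy the data (assumption: INSERT ... SELECT instead of a dump).
        f"INSERT INTO `{dummy}` SELECT * FROM `{table}`;",
        # 4. Drop the original and give the dummy table its name.
        f"DROP TABLE `{table}`;",
        f"RENAME TABLE `{dummy}` TO `{table}`;",
    ]


# Hypothetical names, for illustration only.
swap_statements = build_swap_statements("articles", "body", "ft_body")
```

Since step 4 drops the original table, it is worth checking the row count of the dummy table before running the last two statements.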