I have a huge table in a database and I want to split it into several parts physically, while maintaining the database schema.
For example, the table is named TableName and has 2,000,000 rows.
I would like to split that table into four parts, but I still want to work with the table in the same way, so
select [Column List] from TableName where [Filter]
insert into TableName ([Column List]) values([Values])
update TableName [Updates] where [Filter]
delete from TableName where [filter]
would work the same way after the split as before. Basically I want my database to handle my queries in different threads. How can I achieve this?
Thanks in advance.
Perhaps you should look at partitioning.
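For example, in MySQL you could hash-partition the table into four physical parts while all of the statements above keep working unchanged. A minimal sketch, assuming TableName has an integer primary key id (MySQL requires every unique key to include the partitioning column):
ALTER TABLE TableName
PARTITION BY HASH(id)
PARTITIONS 4;
The server decides which partition to touch based on id, so your SELECT/INSERT/UPDATE/DELETE statements don't change at all.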
If you are looking into copying data to a separate slave / duplicate, consider implementing that with binary logging, so the replica reads the binary logs to do the replication, rather than you doing it manually or programmatically.
I'm experiencing a huge performance problem in a legacy application.
There is a search form where user can search records with given value.
A result row contains 10 columns. A stored procedure (SP) then returns any row that contains that value in any of its columns.
This SP uses 8 tables, some of which have about a million records, and a new record arrives every minute. The SP handles paging as well.
Executing this SP sometimes takes around 40 seconds.
What I did was create a new table and populate it with all the records, using the query from this SP but without the conditions.
When there is an insert or update in one of the source tables, a trigger updates this new "cache" table.
Now waiting for results from this new table takes only 1-3 seconds.
Does anyone have experience with something like this?
One of my colleagues said I should use a view instead, but then I would be doing the JOINs on every query.
What do you think? Is there another way?
Oftentimes temporary tables can help you resolve performance issues. One approach is to collect only the records you need to consider into temporary tables, then build your final select statement from those temporary tables joined to any other tables you're not filtering on.
As an example, let's say one of the fields you are searching for is field1 in table1. Start by inserting into table #table1 only records that have the value of field1 you are looking for:
select PrimaryKeyTable1, Field1, Field2, Field3, etc...
into #table1
from table1
where Field1 = 'Whatever you are looking for'
This should be pretty fast even for big tables, especially if you have an index on Field1. Do this for every table that has search fields, collecting all the records relevant to what you are searching for.
Then you also need to be sure to insert into your temporary tables any records that have foreign key references to any of your other temporary tables. So let's say you also built a table #table2 with the above method, and it has a foreign key to table1 called PrimaryKeyTable1. You would insert those records like this:
Insert into #table1
(PrimaryKeyTable1, Field1, Field2, Field3, etc...)
select table1.PrimaryKeyTable1, table1.Field1, table1.Field2, table1.Field3, etc...
from table1
join #table2
on table1.PrimaryKeyTable1 = #table2.PrimaryKeyTable1
where table1.PrimaryKeyTable1 not in
(Select PrimaryKeyTable1 from #table1)
Now #table1 will also contain any records that are referenced by a record in #table2 matching the search criteria. Do this for all your temporary tables that have relevant foreign keys. The order of the inserts matters; make sure each temporary table is fully populated before any other insert statement references it.
Then you can simply do your final select statement, replacing the actual tables with the temporary tables you have built and eliminating all the filters that search your field data. Depending on the structure of your query there might be other optimizations, but that is the general idea.
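For illustration, the final select might look like the sketch below; #table1 and #table2 are the temp tables built above, and table3 stands in for a hypothetical table you are not filtering on:
select t1.Field1, t2.Field2, t3.Field3
from #table1 t1
join #table2 t2 on t2.PrimaryKeyTable1 = t1.PrimaryKeyTable1
join table3 t3 on t3.PrimaryKeyTable2 = t2.PrimaryKeyTable2
Since every filter has already been applied while populating the temp tables, this statement only has to join small result sets.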
If you've already explored all of your indexing options and this still doesn't help, MS SQL Server has a "Change Tracking" feature that may be of use to you in building your cache table. You enable the database for change tracking and configure which tables you wish to track. SQL Server then creates change records on every insert, update, and delete on a tracked table and lets you query for the changes made to records since the last time you checked. This is very useful for syncing changes and is more efficient than using triggers. It's also easier to manage than making your own tracking tables. This has been a feature since SQL Server 2008.
How to: Use SQL Server Change Tracking
Change Tracking only captures the primary keys of the tracked tables and lets you query which columns might have been modified. You can then join back to the tables on those keys to get the current data. If you want the changed data itself to be captured as well, you can use Change Data Capture, but it requires more overhead and at least SQL Server 2008 Enterprise edition.
Change Data Capture
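To make the Change Tracking route more concrete, here is a rough sketch; the database, table and key names are placeholders:
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
ALTER TABLE dbo.table1
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
-- @last_sync_version is the version you saved after your previous sync
DECLARE @last_sync_version bigint = 0;
SELECT ct.PrimaryKeyTable1, ct.SYS_CHANGE_OPERATION, t.*
FROM CHANGETABLE(CHANGES dbo.table1, @last_sync_version) AS ct
LEFT JOIN dbo.table1 AS t ON t.PrimaryKeyTable1 = ct.PrimaryKeyTable1;
After each refresh of the cache table you would store the result of CHANGE_TRACKING_CURRENT_VERSION() and pass it in as @last_sync_version on the next run.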
Your solution is a robust way of building what is called an "indexed view" in Microsoft SQL Server or a "materialized view" in Oracle.
Basically you are correct - it's faster to query a single indexed table than a dozen constantly updated ones.
You should really try creating an indexed view (you can start here: https://technet.microsoft.com/en-us/library/dd171921(v=sql.100).aspx) and it will probably solve all your performance issues.
You can create a schema-bound view and put a clustered index on it; this stores the view's data physically. Be aware, though, that while the schema-bound view exists you cannot alter the columns of the underlying tables that the view references.
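As a minimal sketch of such an indexed view (the view, table and column names here are hypothetical):
CREATE VIEW dbo.SearchCache
WITH SCHEMABINDING
AS
SELECT t1.PrimaryKeyTable1, t2.PrimaryKeyTable2, t1.Field1, t2.Field2
FROM dbo.table1 AS t1
JOIN dbo.table2 AS t2 ON t2.PrimaryKeyTable1 = t1.PrimaryKeyTable1;
GO
-- the unique clustered index is what materializes the view's rows on disk
CREATE UNIQUE CLUSTERED INDEX IX_SearchCache ON dbo.SearchCache (PrimaryKeyTable2);
GO
Keep in mind that indexed views have restrictions (two-part object names, no outer joins, deterministic expressions only, and so on), so not every query can be turned into one.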
I have a MySQL database that is up to about 17 GB in size and has 38 million entries. At the moment, I need to both increase the size of one column (varchar 40 to varchar 80) and add more columns.
Many of the fields are indexed, including the one I need to change; it is part of a unique pair that is necessary for the applications to work. When I attempted to just make the change yesterday, the query ran for almost four hours without finishing, at which point I decided to cut the outage short and bring the service back up.
What is the most efficient way to make changes to something of this size?
Many of these entries are also old, and if there is a good way to shard off old entries while still keeping them available, that might help by making the table a much more manageable size.
You have some choices.
In any case you should take a backup before you do this stuff.
One possibility is to take your service offline and do it in place, as you have tried. If you do that you should disable key checks and constraints.
ALTER TABLE bigtable DISABLE KEYS;
SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE (whatever);
ALTER TABLE (whatever else);
...
SET FOREIGN_KEY_CHECKS=1;
ALTER TABLE bigtable ENABLE KEYS;
This will allow the ALTER TABLE operation to go faster. It will regenerate the indexes all at once when you do ENABLE KEYS.
Another possibility is to create a new table with the new schema you want, then disable the keys on the new table, then do as @Bader suggested and insert the contents of the old table.
After your new table is built you will re-enable the keys on it, then rename the old table to some name like "old_bigtable" then rename the new table to "bigtable".
It's possible that you can keep your service online while you're populating the new table. But that might work poorly.
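A rough sketch of this second possibility (the column names are placeholders for your real ones):
CREATE TABLE bigtable_new LIKE bigtable;
-- apply the new layout to the empty copy: widen the column and add the new ones
ALTER TABLE bigtable_new MODIFY some_column VARCHAR(80), ADD COLUMN new_column INT NULL;
ALTER TABLE bigtable_new DISABLE KEYS;
SET FOREIGN_KEY_CHECKS=0;
INSERT INTO bigtable_new (id, some_column, other_column)
SELECT id, some_column, other_column FROM bigtable;
SET FOREIGN_KEY_CHECKS=1;
ALTER TABLE bigtable_new ENABLE KEYS;
RENAME TABLE bigtable TO old_bigtable, bigtable_new TO bigtable;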
A third possibility is to dump your giant table (to a flat file) and then load it into a new table with the new layout. That is pretty much like the second possibility, except that you get a table backup for free. You can make this go pretty fast with SELECT ... INTO OUTFILE and LOAD DATA INFILE. You'll need access to your server machine's file system to do this.
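A sketch of that dump-and-reload route, assuming you have the FILE privilege and access to the server's filesystem (paths and columns are placeholders):
SELECT id, some_column, other_column
INTO OUTFILE '/tmp/bigtable.dump'
FROM bigtable;
-- create bigtable_new with the new layout as in the previous sketch, then:
LOAD DATA INFILE '/tmp/bigtable.dump'
INTO TABLE bigtable_new (id, some_column, other_column);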
In all cases, disable, then re-enable, the constraints and keys to get things to go fast.
Create a new table, with the new structure you want, under a different name, for example NewTable.
Then insert data into this new table from the old table using the following query:
INSERT INTO NewTable (field1, field2, etc...) SELECT field1, field2, ... FROM OldTable
After this is done, you can drop the old table and rename the new table to the original name
DROP TABLE `OldTable`;
RENAME TABLE `NewTable` TO `OldTable` ;
I have tried this approach on a very large table and it's much much faster than altering the table.
With MySQL 5.1, and again with 5.5, certain ALTER statements were enhanced to modify just the structure without rewriting the entire table (http://dev.mysql.com/doc/refman/5.5/en/alter-table.html - search for "in place"). The availability of this varies by the type of change you are making and the engine in use; the most benefit comes from the InnoDB Plugin. For your specific changes, though, the entire table would be rewritten.
When we encounter these issues, we typically try to leverage replica databases. As long as you are adding and not removing, you can run your DDL against the replica first and then schedule a brief outage to promote the replica to the master role. If you happen to be on RDS, this is even one of their suggested uses for replica instances: http://aws.amazon.com/about-aws/whats-new/2012/10/11/amazon-rds-mysql-rr-promotion/.
Some other alternatives include:
Selecting out a subset of records into a new table with the desired structure (use INTO OUTFILE to avoid a table lock). Once complete, you can schedule a maintenance window and REPLACE INTO or UPDATE any records that have changed in the origin table since the initial data copy. Once the update is complete, a RENAME TABLE ... of both tables wraps the changes up.
Using a tool like Percona's pt-online-schema-change: http://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html. This tool works with triggers so if you already have triggers on the tables you want to change this may not fit your needs.
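For reference, an invocation for this kind of change could look roughly like the following (database, table and column names are placeholders; always run --dry-run before --execute):
pt-online-schema-change --alter "MODIFY some_column VARCHAR(80), ADD COLUMN new_column INT NULL" D=mydb,t=bigtable --dry-run
pt-online-schema-change --alter "MODIFY some_column VARCHAR(80), ADD COLUMN new_column INT NULL" D=mydb,t=bigtable --execute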
CONTEXT:
We have big databases with loads of tables. Most of them (99%) use InnoDB.
We want a daily process that monitors which tables have been modified. Because they use InnoDB, the value of Update_time from
SHOW TABLE STATUS FROM information_schema;
is NULL.
For that reason we want to create a daily procedure that stores the checksum (and other things, for that matter) of each table somewhere (preferably another table). We will then run different checks against that.
PROBLEM:
I'm trying to use
CHECKSUM TABLE db_schema.table_name;
which returns a result set with two columns: "Table" and "Checksum".
It gives me the value I want but I'm not able to use it in a select or insert statement.
I tried a lot of things like:
select `checksum` from (checksum table from db_schema.table_name);
or other similar queries, but I'm not able to extract the data from the result set.
Is there a way I can do that?
Thanks in advance for your help.
EDIT: in the end, what I want is to build a more complex result set holding several pieces of information (table schema, table name, row count, checksum, datetime: now()...).
Then I'll compare this result set with yesterday's values and derive my own statistics. That's why I want to get the checksum out of that result set.
There is no way to save the result of CHECKSUM TABLE directly using SQL. Nor can you use prepared statements or cursors in stored procedures to capture the checksum result.
Your best bet is to write a script around it, or download one of the popular tools that do this for you.
For MyISAM tables created with the CHECKSUM=1 table option, you can simply use INFORMATION_SCHEMA like this:
SELECT TABLE_NAME,CHECKSUM FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test' AND ENGINE='MyISAM'
AND CHECKSUM IS NOT NULL;
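If you go the INFORMATION_SCHEMA route, you can store the snapshot described in the question (schema, table name, row count, checksum, timestamp) into an audit table. A minimal sketch, using a hypothetical checksum_audit table that you create yourself; note it only covers MyISAM, since the CHECKSUM column is NULL for InnoDB:
CREATE TABLE IF NOT EXISTS checksum_audit (
  table_schema    VARCHAR(64),
  table_name      VARCHAR(64),
  row_count       BIGINT,
  table_checksum  BIGINT,
  captured_at     DATETIME
);
INSERT INTO checksum_audit (table_schema, table_name, row_count, table_checksum, captured_at)
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS, CHECKSUM, NOW()
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'test' AND ENGINE = 'MyISAM' AND CHECKSUM IS NOT NULL;
For the InnoDB tables you would still need an external script that runs CHECKSUM TABLE and inserts the parsed output.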
I have a large DB that I am chopping into smaller databases based on time intervals. This will reduce query time dramatically. In a query, can I copy a result set from one database to another with an identical schema?
Basically a select followed by an update conducted in the same code block?
Thanks,
slothishtype
Copying data from one database into another should be almost as simple as @slotishtype describes, except you'll need to qualify it with the OTHER database you want it replicated into.
CREATE TABLE OtherDatabase.Student SELECT * FROM FirstDatabase.student;
However, copying the same schema, as you mention, is something else. If you want all your referential-integrity rules, triggers, etc., you may have to dump the schema from your first database (where it has all the CREATE TABLE statements, indexes, etc.) and run it in the new database. That might pose an issue where you have auto-incrementing columns, since the database controls those values; if that were the case, you would have to make those columns plain integer datatypes (or similar) and do a
insert into OtherDatabase.Student ( field1, field2, etc )
select field1, field2, etc from FirstDatabase.student
If it is not necessary to add it to a new database, this will do fine:
CREATE TABLE student1 SELECT * FROM student
EDIT: for the record, this will not copy over indices etc.
This, however, will:
CREATE TABLE student_new LIKE student;
INSERT INTO student_new SELECT * FROM student;
slotishtype
Is it possible to replicate a single table?
Yes, this is possible. Have a look at the replication slave options in the MySQL manual. This still requires creating a complete binlog of the whole database, though.
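If you only want the replica to apply changes for that one table, the relevant slave option is replicate-do-table. A minimal sketch of the replica's my.cnf, with placeholder database and table names:
[mysqld]
replicate-do-table = mydb.mytable
The master still writes the full binlog; the filter is applied on the replica when the events are executed.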
To re-sync specific tables to one or more slaves, rather use
pt-table-checksum
and then
pt-table-sync
That should automatically identify the out-of-sync tables and only sync those.
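As a rough example of that workflow (host, database and table names are placeholders):
pt-table-checksum h=master_host --databases mydb --tables mytable
pt-table-sync --execute --sync-to-master h=replica_host,D=mydb,t=mytable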
CREATE TABLE new_table_name
SELECT *
FROM original_table_name;
Use "*" if you want to select all columns from the original table, otherwise give specific columns name.
This will replicate table within same database.
I know this is an old question but this is for anyone who comes here looking for an answer:
CREATE TABLE table2 LIKE table1;
This will create a table with the same format and columns but no data. To transfer the data use:
INSERT INTO table2 SELECT * FROM table1;
EDIT:
It is important to note that the INSERT transfers the data only. CREATE TABLE table2 LIKE table1 does copy the column definitions and indexes, but if you instead use CREATE TABLE table2 SELECT * FROM table1, the indexes are not carried over and you will have to index table2 manually.