Right now I have the following construct to find items with similar keywords:
CREATE TEMPORARY TABLE tmp (FULLTEXT INDEX (keywords)) ENGINE=MyISAM
SELECT object_id, keywords FROM object_search_de;
SELECT object_id
FROM tmp
WHERE MATCH (keywords) AGAINST ('foo,bar') > 1.045;
DROP TEMPORARY TABLE tmp;
Depending on the total number of records and the average size of the keyword field, this can get really slow (over 60 seconds of execution time). My goal is to stay within 1 second for this task.
As an alternative to comma-separated keywords in a TEXT field, I also have an atomic keyword table (two columns, keyword and object_id, directly associating one keyword with an item).
Are there any alternatives or smooth solutions to achieving the same effect without resorting to a MyISAM mirror table?
First of all, do not create the table each time. You can create it once and use triggers to insert/update/delete its records, or, if you don't want to use triggers, periodically (every hour, for example) truncate it and re-insert the records.
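A minimal sketch of the create-once approach with triggers, assuming object_id is unique in object_search_de. The mirror table name object_search_ft, the trigger names, and the column types are made up for illustration and would need to match the real schema:

```sql
-- One-time setup: a permanent MyISAM mirror carrying the full-text index.
-- object_search_ft and the column types are assumptions, not from the question.
CREATE TABLE object_search_ft (
  object_id INT UNSIGNED NOT NULL PRIMARY KEY,
  keywords  TEXT,
  FULLTEXT INDEX (keywords)
) ENGINE=MyISAM;

-- Initial load from the source table.
INSERT INTO object_search_ft (object_id, keywords)
SELECT object_id, keywords FROM object_search_de;

-- Keep the mirror in sync on every change to the source table.
CREATE TRIGGER object_search_de_ai AFTER INSERT ON object_search_de
FOR EACH ROW
  INSERT INTO object_search_ft (object_id, keywords)
  VALUES (NEW.object_id, NEW.keywords);

CREATE TRIGGER object_search_de_au AFTER UPDATE ON object_search_de
FOR EACH ROW
  UPDATE object_search_ft
  SET object_id = NEW.object_id, keywords = NEW.keywords
  WHERE object_id = OLD.object_id;

CREATE TRIGGER object_search_de_ad AFTER DELETE ON object_search_de
FOR EACH ROW
  DELETE FROM object_search_ft WHERE object_id = OLD.object_id;
```

The search then runs directly against object_search_ft, with no temporary table to build or drop per request.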
Alternatively, you can offload this task from MySQL and use Lucene/Solr or Sphinx.
I'm trying to add a FULLTEXT index to my table, which contains 3 million records. It was very difficult to add the index using an ALTER TABLE or CREATE INDEX statement. Therefore the easiest way seems to be to create a new table, add the index first, and then load the data. How can I load the existing table's data into the newly created table? I'm using the MySQL database that ships with XAMPP.
I don't know why creating a full text index on an existing table would be difficult. You just do:
create fulltext index idx_table_col on table(col)
Usually, it is faster to add indexes to already loaded tables than to load data into an empty table that has indexes pre-defined.
EDIT:
You can do the load by using insert. The following will insert the first 100,000 rows:
insert into newtable
select *
from oldtable
order by id
limit 0, 100000;
You can put this in a loop (via a stored procedure in MySQL or at the application level). Perhaps this will return faster. Each time you run it, you would change the offset value in limit.
I would expect that the overall time for creating an index would be less than using insert, but for your purposes, you might find this more convenient.
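The loop described above could look roughly like this as a stored procedure. This is only a sketch: the procedure name copy_in_chunks and its parameter are invented here, and it assumes a MySQL version that accepts local variables in LIMIT inside stored programs (5.5 or later):

```sql
DELIMITER //

-- Copies oldtable into newtable in chunks of chunk_size rows,
-- advancing the LIMIT offset until a chunk comes back empty.
CREATE PROCEDURE copy_in_chunks(IN chunk_size INT)
BEGIN
  DECLARE start_row INT DEFAULT 0;
  DECLARE copied INT DEFAULT 1;

  WHILE copied > 0 DO
    INSERT INTO newtable
    SELECT *
    FROM oldtable
    ORDER BY id
    LIMIT start_row, chunk_size;

    -- Rows inserted by the statement just executed.
    SET copied = ROW_COUNT();
    SET start_row = start_row + chunk_size;
  END WHILE;
END //

DELIMITER ;

CALL copy_in_chunks(100000);
```

Note that large offsets get progressively slower, because the skipped rows still have to be read on every pass.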
INSERT INTO newTable SELECT * FROM oldTable;
Run this after the new table and its index have been created.
This assumes you want to copy all columns. You can select specific columns as well.
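For example, to copy only some columns (id and col are placeholder column names, assumed to exist in both tables):

```sql
-- List the target columns explicitly and select the matching source columns.
INSERT INTO newTable (id, col)
SELECT id, col
FROM oldTable;
```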
I wish to duplicate a selection of records in a MySQL table.
The PK of the table is an auto-incremented int.
I want to do this with one set of MySQL queries (for performance reasons).
It seems like the fastest way to do this is to put the results of the selection into a temporary table,
make any changes needed, and reinsert the records back to the original table, like this:
CREATE TEMPORARY TABLE temp1234 ENGINE=MEMORY SELECT * FROM a_table WHERE column='my selection';
# do updates in temp1234; (altering FK's mainly)
INSERT INTO a_table SELECT * FROM temp1234;
But when I try to do this I get an error for duplicate PKs.
Now, I realise that I could alter the INSERT ... SELECT query to exclude the PK/ID column, but as I am procedurally generating these queries across multiple tables for a large data-copying function, I want to avoid having to supply column names.
What is the best way around this problem?
I searched the Internet and Stack Overflow for my problem, but couldn't find a good solution.
I have a table (MySQL, MyISAM) containing 300,000 rows (one column is a BLOB field).
I must use:
DELETE FROM tablename WHERE id IN (1,4,7,88,568,.......)
There are nearly 30,000 ids in the IN list.
It takes nearly 1 hour. Also, it does not make the .MYD file smaller even though I delete 10% of it, so I run the OPTIMIZE TABLE command, which also takes a long time (I have to use it, because disk space matters to me).
What's a way to improve performance when deleting the data as above and to recover the space? (Increasing a buffer size? Which one? Something else?)
With IN, MySQL will scan all the rows in the table and match the record against the IN clause. The list of IN predicates will be sorted, and all 300,000 rows in the database will get a binary search against 30,000 ids.
If you do this with JOIN on a temporary table (no indexes on a temp table), assuming id is indexed, the database will do 30,000 binary lookups on a 300,000 record index.
So, 300,000 binary searches against 30,000 records, or 30,000 binary searches against 300,000 records... which is faster? The second one is faster, by far.
Also, delaying the index rebuilding with DELETE QUICK will result in much faster deletes. All records will simply be marked deleted, both in the data file and in the index, and the index will not be rebuilt.
Then, to recover space and rebuild the indexes at a later time, run OPTIMIZE TABLE.
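Put together, the temporary-table join with DELETE QUICK could be sketched like this. The temporary table name ids_to_delete and the id column type are assumptions, and only the first few ids from the question are shown:

```sql
-- Unindexed temporary table holding the ids to remove.
CREATE TEMPORARY TABLE ids_to_delete (id INT UNSIGNED NOT NULL);

-- Load the ids (in practice, all ~30,000 of them, in one or more batches).
INSERT INTO ids_to_delete (id) VALUES (1), (4), (7), (88), (568);

-- Multi-table delete: only rows of tablename (alias t) are removed.
DELETE QUICK t
FROM tablename AS t
INNER JOIN ids_to_delete AS d ON t.id = d.id;

DROP TEMPORARY TABLE ids_to_delete;

-- Later, at a quiet time: reclaim disk space and rebuild the indexes.
OPTIMIZE TABLE tablename;
```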
The size of the list in your IN() statement may be the cause. You could add the IDs to a temporary table and join to do the deletes. Also, as you are using MyISAM you can use the DELETE QUICK option to avoid the index hit whilst deleting:
For MyISAM tables, if you use the QUICK keyword, the storage engine
does not merge index leaves during delete, which may speed up some
kinds of delete operations.
I think the fastest approach is to create a new table, insert into it the rows which you don't want to delete, and then swap it in for the original table, which you can drop afterwards.
Something like this:
CREATE TABLE NewTable LIKE My_Table;
INSERT INTO NewTable SELECT * FROM My_Table WHERE ... ;
Then you can use RENAME TABLE to rename the copy to the original name
RENAME TABLE My_Table TO My_Table_old, NewTable TO My_Table ;
And then finally drop the original table
DROP TABLE My_Table_old;
Try this:
Create a table named temptable with a single column id.
Insert the ids into it: 1,4,7,88,568,......
Use a DELETE with a JOIN, something like:
DELETE a FROM originaltable AS a INNER JOIN temptable AS b ON a.id = b.id;
It's just an idea. The query is not tested, so check the syntax before running it.
I have a MySQL MYISAM table (say tbl) consisting of 2 unsigned int fields, say, f1 and f2. There is an index on f2 and the table is very large (approximately 320,000,000+ rows). I update this table periodically (with approximately 100,000 new rows a week), and, in order to be able to search this table without doing an ORDER BY (which would be very time consuming in real-time queries), I physically ORDER the table according to the way in which I want to retrieve its rows.
So, I perform an ALTER TABLE tbl ORDER BY f1 DESC. (I know I have enough physical space on the server for a copy of the table.) I have read that during this operation, a temporary table is created and SELECT statements are not affected on the current rows.
However, I have experienced that this is not the case, and SELECT statements on the table that occur at the same time with the ALTER table are getting blocked and do not terminate. After the ALTER TABLE tbl completes (about 40 minutes on the production server), the SELECT statements on tbl start executing fine again.
Is there any reason why the "ALTER table tbl ORDER BY f1 DESC" seems to be blocking other clients from querying tbl?
Altering a table will always grab a lock on the table, preventing SELECTs from running.
I'll admit that I didn't even know you could do that with an ALTER TABLE.
What are you trying to get from the table? For example, all records in a given range? 320 million rows is not a trivial number. I'll give you my gut reactions:
1. Switch to InnoDB (allows #2, also gives transactions, but without #2 may hurt performance)
2. Partition the table (makes it act like a number of slightly smaller tables)
3. Consider a redesign, such as having a "working set" table and a "historical" table, basically manually partitioning. If you usually look for recently inserted data, this (along with partitioning) will help a lot. If your lookups are evenly distributed, this probably won't make a difference.
4. Consider adding a new column you could use in conjunction to narrow down selects (so instead of searching on date, search on date and customer ID)
Since I don't know what you're storing, some of these (such as #4) may not apply.
There are some other things you could try. OPTIMIZE TABLE may help and take less time, but I doubt it. I think internally it's implemented as a dump/reload, at least on the InnoDB side.
I have a table with about 35 million rows. Each row has about 35 integer values and one time value (last updated).
The table has two indexes:
Primary - uses two integer values from the table columns
Secondary - uses the 1st integer from the primary + another integer value
I would like to delete old records (about 20 million of them) according to the date field.
What is the fastest way:
1. Delete as is, according to the date field?
2. Create another index on the date field and then delete by date?
There will be a one-time deletion of a large portion of the data and then incremental weekly deletions of much smaller parts.
Is there another way to do it more efficiently?
It might be quicker to create a new table containing the rows you want to keep, drop the old table, and then rename the new table.
For weekly deletions an index on date field would speed things up.
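A rough sketch of both steps. The table name big_table, the column name last_updated, and the cutoff dates are placeholders, since the question does not give them:

```sql
-- One-time purge: copy the rows to keep, then swap the tables.
CREATE TABLE big_table_new LIKE big_table;

INSERT INTO big_table_new
SELECT * FROM big_table
WHERE last_updated >= '2012-01-01';   -- placeholder cutoff

RENAME TABLE big_table TO big_table_old, big_table_new TO big_table;
DROP TABLE big_table_old;

-- For the weekly deletions: index the date column once...
ALTER TABLE big_table ADD INDEX idx_last_updated (last_updated);

-- ...then delete by date each week.
DELETE FROM big_table
WHERE last_updated < '2012-01-08';    -- placeholder cutoff
```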
Fastest (but not easiest), I think, is to keep your records segmented into multiple tables based on date, e.g. one per week, and then have a union table over all those tables for the regular queries across the whole thing (so your queries would be unaltered). Each week you would create a new table and redefine the union table.
When you wish to drop old records, you simply recreate the union table so that it leaves out the old tables, and then drop the tables that were left out (remember to truncate before you drop, depending on your filesystem). This is probably the fastest way to get there with MySQL.
A mess to manage though :)
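If "union table" here means a table using the MERGE storage engine (an assumption on my part), a sketch could look like the following. All table and column names are invented, and the underlying tables must be MyISAM with identical structure:

```sql
-- One MyISAM table per week (only a few of the columns shown).
CREATE TABLE records_w01 (
  a INT NOT NULL,
  b INT NOT NULL,
  last_updated DATETIME NOT NULL,
  PRIMARY KEY (a, b)
) ENGINE=MyISAM;

CREATE TABLE records_w02 LIKE records_w01;

-- The union table: same columns, but a plain index instead of the
-- primary key, since a MERGE table cannot enforce uniqueness.
CREATE TABLE records_all (
  a INT NOT NULL,
  b INT NOT NULL,
  last_updated DATETIME NOT NULL,
  INDEX (a, b)
) ENGINE=MERGE UNION=(records_w01, records_w02) INSERT_METHOD=LAST;

-- Weekly rotation: add the new week, leave the oldest one out, drop it.
CREATE TABLE records_w03 LIKE records_w01;
ALTER TABLE records_all UNION=(records_w02, records_w03);
DROP TABLE records_w01;
```

Regular queries keep going against records_all, and dropping a week is a metadata operation rather than a row-by-row delete.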