Let's say I have a MySQL database table 'article' with the following fields: id, title, url, views
I have the field title marked with a FULLTEXT index and the field url marked with a UNIQUE index.
My question is: if I do an ordinary update, something like:
UPDATE 'article' SET views = views + 1 WHERE id = {id}
...will this result in an update of the MySQL table indexes?
Is it safe (from a speed point of view) to keep the field views in the table article, or should I create a separate table, say article_stats, with the following fields: article_id, views?
Yes, UPDATE statements update indexes. MySQL manages indexes automatically - you never need to worry about updating them manually or triggering an update. If you are asking whether that particular UPDATE will change your indexes, which don't include the views column - no, it won't. Only related indexes get updated.
Keeping a views column is fine, unless you need to track extra information about each view (when it occurred, which user made the view, etc.).
Your SQL does contain a syntax error, however. You can't quote table names like 'article'. If you need to quote a table name (e.g. if it contains a SQL reserved word), then use backticks like this:
UPDATE `article` SET ...
I agree with Cal: an index gets updated by an UPDATE statement only if the statement touches indexed columns. In your specific example the indexes are not updated, because none of them involve the views field. Updating views on every page view will still slow things down, though, because it is such a frequent operation. You can programmatically keep the view counts in shared memory (in a binary tree or hash table) and write them to the database together after some time or size threshold. For the best speed you can also use MEMORY tables, which are volatile, and transfer the data to the actual table from time to time. That way you avoid a disk write for every single "view" update.
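For illustration, here is one way the "buffer the counters and flush them periodically" idea could look with a MEMORY table. This is only a sketch; the table and column names are made up, and in a real setup the flush step should be guarded against concurrent writes (e.g. with LOCK TABLES or a maintenance window):

-- buffer the counters in a volatile MEMORY table (names are illustrative)
CREATE TABLE article_views_buffer (
    article_id INT NOT NULL PRIMARY KEY,
    views      INT NOT NULL DEFAULT 0
) ENGINE=MEMORY;

-- on every page view, touch only the in-memory buffer
INSERT INTO article_views_buffer (article_id, views)
VALUES (42, 1)
ON DUPLICATE KEY UPDATE views = views + 1;

-- periodically (cron job, etc.) flush the buffered counts into the real table
UPDATE article a
JOIN article_views_buffer b ON b.article_id = a.id
SET a.views = a.views + b.views;
DELETE FROM article_views_buffer;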
Keeping a separate table will result in the same thing. Your application slows down because there is an update alongside every select: the row you are updating is locked until you are done with it, and other readers have to wait for your row-level operation. You will still have selects to show the view count even with a separate table.
Well, when you have that much load, you can use master and slave servers to separate reads and writes and synchronize them from time to time.
I just received access to a MySQL database where the ID is a FLOAT field (not AUTO_INCREMENT). This database was first used with a C# application that is not updated anymore.
I have to make a web app, and I can't change the type of the field in the database nor add a new one.
So, how can I make an "INSERT" query that will increment the ID and not create problems when multiple people are working at the same time?
I tried to get the last id, increment it by one, then insert into the table, but that's not safe if users are creating records at the same time.
Thank you
how can I make an "INSERT" query that will increment the ID and not create problems when multiple people are working at the same time?
You literally cannot make an INSERT query alone that will increment the ID and avoid race conditions. It has nothing to do with the data type of the column. The column could be INT and you would have the same race condition problem.
One solution is to use LOCK TABLES to block concurrent sessions from inserting rows. Then your session can read the current MAX() value in the table, increment it, INSERT a new row with the incremented value, and then UNLOCK TABLES as promptly as possible to allow the concurrent sessions to do their INSERTs.
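A rough sketch of that sequence (the table and column names are only placeholders):

-- block concurrent writers while we pick the next id
LOCK TABLES mytable WRITE;

-- read the current maximum and add one
SELECT COALESCE(MAX(id), 0) + 1 INTO @next_id FROM mytable;

INSERT INTO mytable (id, name) VALUES (@next_id, 'new row');

-- release the lock as soon as possible
UNLOCK TABLES;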
In fact, this is exactly how MySQL's AUTO_INCREMENT works. Each table stores its own most recent auto-increment value. When you insert to a table with an auto-increment, the table is locked briefly, just long enough for your session to read the table's auto-inc value, increment it, store it back into the table's metadata, and also store that value in your session's thread data. Then it unlocks the table's auto-inc lock. This all happens very quickly. Read https://dev.mysql.com/doc/refman/8.0/en/innodb-auto-increment-handling.html for more on this.
The difficult part is that you can't simulate this from SQL, because SQL naturally must obey transaction scope. The auto-inc mechanism built into InnoDB works outside of transaction scope, so concurrent sessions can read the latest incremented auto-inc value for the table even if the transaction that incremented it has not finished inserting that value and committing its transaction. This is good for allowing maximum concurrency, but you can't do that at the SQL level.
The closest you can get is the LOCK TABLES solution that I described, but this is rather clumsy because it ends up holding the lock a lot longer than the auto-inc lock typically lasts. This puts a limit on the throughput of concurrent inserts to your table. Is that too limiting for your workload? I can't say. Perhaps you have a modest rate of inserts to this table, and it won't be a problem.
Another solution is to use some other table that has an auto-increment or another type of unique id generator that is safe for concurrent sessions to share. But this would require all concurrent sessions to use the same mechanism as they INSERT rows.
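One common pattern for that is a tiny helper table whose only job is to hand out ids. A minimal sketch, with invented names:

CREATE TABLE id_sequence (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

-- every session obtains its id the same way; LAST_INSERT_ID() is
-- per-connection, so concurrent sessions cannot collide
INSERT INTO id_sequence () VALUES ();
SET @new_id = LAST_INSERT_ID();

-- then use it for the FLOAT id column of the legacy table
INSERT INTO legacy_table (id, some_column) VALUES (@new_id, 'value');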
A possible solution could be the following, but it is risky and requires thorough testing of ALL applications using the table/database!
The steps to follow:
rename the table (xxx_refactored or something)
create a view using the original table and cast the ID column as FLOAT in the view, so the other application will see the data as FLOAT.
create a new column or alter the existing one and add the AUTO_INCREMENT to it
Eventually the legacy application will have to be updated to handle the column properly, so the view can be dropped
The view will be updatable, so the legacy application will still be able to insert and update the table through the view.
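A rough sketch of those steps, with assumed names and types (note that CAST ... AS FLOAT requires MySQL 8.0.17 or later, and everything here needs testing first):

RENAME TABLE mytable TO mytable_refactored;

-- assumes id is already the primary key; AUTO_INCREMENT requires a keyed column
ALTER TABLE mytable_refactored
    MODIFY id INT NOT NULL AUTO_INCREMENT;

-- the legacy application keeps using the old name and still sees a FLOAT id
CREATE VIEW mytable AS
    SELECT CAST(id AS FLOAT) AS id, other_column
    FROM mytable_refactored;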
This won't work if:
Data in the column is outside of the range of the chosen new datatype
The column is referenced by a foreign key constraint from any other table
Probably more :)
!!! TEST EVERYTHING BEFORE YOU DO IT IN PRODUCTION !!!
Probably a better option is to ask somebody to show you the code which maintains this field in the legacy application.
I have an Excel file that contains contents from the database when downloaded. Each row is identified by an identifier called id_number. Users can add new rows to the file with a new unique id_number. When the file is uploaded, for each Excel row:
When the id_number exists in the database, an update is performed on the database row.
When the id_number does not exist in the database, an insert is performed.
Other than the Excel file, data can be added or updated individually using a file called report.php. Users use this page if they only want to add a single record for an employee, for example.
Ideally, I would like to do an insert ... on duplicate key update for maximum performance. I might also put all of them in a transaction. However, I believe this overall process has some flaws:
Before any adds/updates, validation checks have to be done on all Excel rows against their corresponding database rows. The reason is that there are many unique columns in the table, so I'll have to do some select statements to ensure that the data is valid before performing any add/update. Is this efficient on tables with 500 rows and 69 columns? I could probably just fetch all the data, store it in a PHP array, and do the validation checks on the array, but what happens if someone adds a new row (with an id_number of 5) through report.php, and the Excel file I uploaded also contains a row with an id_number of 5? That could destroy my validations, because I cannot be sure my data is up to date without performing a lot of select statements.
Suppose the system is in the middle of a transaction adding/updating the data retrieved from the Excel file, and someone on report.php adds a row because all the validations have been satisfied (e.g. no duplicate id_numbers). Suppose at this point in time the next row to be added from the Excel file and the row that will be added by the user on report.php have the same id_number. What happens then? I don't have much knowledge of transactions; I think they at least prevent two queries from changing a row at the same time? Is that correct?
I don't really mind these kinds of situations that much. But some files have many rows and it might take a long time to process all of them.
One way I've thought of fixing this: while an Excel file upload is processing, prevent users on report.php from modifying the rows currently held by the Excel file. Is this fine?
What could be the best way to fix these problems? I am using MySQL.
If you really need to allow the user to generate their own unique ID, then you could lock the table in question while you're doing your validation and inserting.
If you acquire a write lock, then you can be certain the table isn't changed while you do your work of validation and inserting.
`mysql> LOCK TABLES tbl_name WRITE`
don't forget to
`mysql> UNLOCK TABLES;`
The downside with locking is obvious: the table is locked. If it sees high traffic, then all your traffic is waiting, and that could lead to all kinds of pain (MySQL running out of connections would be one common one).
That said, I would suggest a different path altogether: let MySQL be the only one who generates a unique id. That is, make sure the database table has an AUTO_INCREMENT unique id (primary key) and then have new records in the spreadsheet entered without the unique id given. Then MySQL will ensure that the new records get a unique id, and you don't have to worry about locking and can validate and insert without fear of a collision.
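A minimal sketch of that, combined with the insert ... on duplicate key update idea from the question (the table and column names are invented):

CREATE TABLE employee_report (
    id_number INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(100),
    hours     DECIMAL(6,2)
);

-- a spreadsheet row that already carries an id_number becomes an update
INSERT INTO employee_report (id_number, name, hours)
VALUES (5, 'Jane Doe', 37.5)
ON DUPLICATE KEY UPDATE name = VALUES(name), hours = VALUES(hours);

-- a brand new row simply omits id_number and lets AUTO_INCREMENT assign one
INSERT INTO employee_report (name, hours)
VALUES ('John Roe', 40.0);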
In regards to the question about performance with a 500-record, 69-column table, I can only say that if the PHP server and the MySQL server are reasonably sized and the columns aren't large data types, then this amount of data should be readily handled in a fraction of a second. That said, performance can be sabotaged by one bad line of code, so if your code is slow, I would treat that as a separate optimisation problem.
Is there any workaround to get the latest change in a MySQL database using ADO.NET?
i.e. which table and column changed, the operation performed, and the old and new values, for both single-table and multi-table changes. I want to log the changes in my own new table.
There are several ways change tracking can be implemented for MySQL:
triggers: you can add a DB trigger for insert/update/delete that creates an entry in the audit log.
add application logic to track changes. Implementation highly depends on your data layer; if you use ADO.NET DataAdapter, RowUpdating event is suitable for this purpose.
You also have the following alternatives for storing the audit log in the MySQL database:
use one table for the audit log, with columns like: id, table, operation, new_value (string), old_value (string) (a minimal sketch of this layout follows the list below). This approach has several drawbacks: the table grows very fast (as it holds the change history for all tables), it stores values as strings, it duplicates data between old/new pairs, and changeset calculation takes some resources on every insert/update.
use a 'mirror' table (say, with a '_log' suffix) for each table with change tracking enabled. On insert/update you execute an additional insert into the mirror table - as a result you have record 'snapshots' for every save, and from these snapshots it is possible to calculate what changed and when. The performance overhead on insert/update is minimal, and you don't need to determine which values actually changed - but the 'mirror' table will hold a lot of redundant data, as a full row copy is saved even if only one column changed.
a hybrid solution where record 'snapshots' are saved temporarily and then processed in the background to store the differences in an optimal way without affecting application performance.
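As promised above, a minimal sketch of the trigger approach feeding a single audit table. All names are illustrative, and a real implementation would also need insert and delete triggers:

CREATE TABLE audit_log (
    id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    table_name VARCHAR(64),
    operation  VARCHAR(10),
    old_value  TEXT,
    new_value  TEXT,
    changed_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER customer_after_update
AFTER UPDATE ON customer
FOR EACH ROW
BEGIN
    -- record one row per change; here only a single column is tracked
    INSERT INTO audit_log (table_name, operation, old_value, new_value)
    VALUES ('customer', 'UPDATE',
            CONCAT('name=', OLD.name), CONCAT('name=', NEW.name));
END //
DELIMITER ;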
There is no single best solution for all cases; everything depends on the concrete application requirements: how many inserts/updates are performed, how the audit log is used, etc.
I have a 50GB MySQL database (80 tables) that I need to delete some contents from.
I have a reference table that contains a list of product ids that need to be deleted from the other tables.
Now, the other tables can be 2 GB each and contain the items that need to be deleted.
My question is: since it is not a small database, what is the safest way to delete the data in one shot in order to avoid problems?
What is the best method to verify that all the data was deleted?
Probably this doesn't help anymore, but you should keep it in mind when creating a database. In MySQL (depending on the table storage engine, for instance InnoDB) you can specify relations, called foreign key constraints. These relations mean that if you delete an entry from one table (for instance products), entries in other tables that reference that row as a foreign key (such as product_storage) can automatically be updated or deleted as well. These relations guarantee that you have a 100% consistent state. However, they might be hard to add in hindsight. If you plan to do this more often, it is definitely worth researching whether you can add them to your database; they will save you a lot of work (all kinds of queries become simpler).
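A sketch of what such a relation could look like, assuming both tables are InnoDB and the column names are as guessed here (the reference table of ids to delete is called ids_to_delete for the example):

ALTER TABLE product_storage
    ADD CONSTRAINT fk_product_storage_product
    FOREIGN KEY (product_id) REFERENCES products (id)
    ON DELETE CASCADE;

-- with the constraint in place, deleting a product automatically
-- removes every product_storage row that references it
DELETE FROM products
WHERE id IN (SELECT product_id FROM ids_to_delete);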
Without these relations you can't be 100% sure. So you'd have to go over all the tables, note which columns you want to check, and write a bunch of SQL queries to make sure there are no entries left.
As Thirler has pointed out, it would be nice if you had foreign keys. Without them, burnall's solution can be used with transactions to ensure that no inconsistencies creep in.
Regardless of how you do it, this could take a long time, even hours, so please be prepared for that.
As pointed out earlier, foreign keys would be nice here. But regarding question 1, you could perhaps run the changes within a transaction from the MySQL prompt. This assumes you are using a transaction-safe storage engine like InnoDB. You can convert from MyISAM to InnoDB if you need to. Anyway, something like this:
START TRANSACTION;
...Perform changes...
...Control changes...
COMMIT;
...or...
ROLLBACK;
Is it acceptable to have any downtime?
When working with PostgreSQL databases larger than 250 GB we use this technique on production servers in order to perform database changes. If the outcome isn't as expected we just roll back the transaction. Of course there is a penalty, as the I/O system has to work a bit.
// John
I agree with Thirler that using foreign keys is preferable. It guarantees referential integrity and consistency of the whole database.
I can believe that life sometimes requires trickier logic.
So you could use manual queries like:
delete from a where id in (select id from `keys`)
You could delete all records at once, by ranges of keys, or using LIMIT in DELETE. A proper index is a must.
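For example, a batched variant of the delete above could look like this (the chunk size is arbitrary, and `keys` is back-quoted because KEYS is a reserved word):

delete from a where id in (select id from `keys`) limit 10000;
-- run this repeatedly until it reports 0 rows affected, then move on to the next table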
To verify consistency you need a function or query. For example:
delimiter //
create function check_consistency() returns boolean
reads sql data
begin
  return not exists(select * from child  where id not in (select id from parent))
     and not exists(select * from child2 where id not in (select id from parent));
  -- and so on for the remaining child tables
end //
delimiter ;
Also, something to look into might be partitioning of MySQL tables. For more information check out the reference manual:
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
It comes down to being able to divide tables into different partitions, for example by datetime values or index sets.
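As a rough illustration (the table, columns, and ranges are made up), partitioning by date would let you drop an entire partition instead of deleting its rows one by one:

CREATE TABLE product_history (
    product_id INT NOT NULL,
    created    DATETIME NOT NULL,
    note       VARCHAR(255)
)
PARTITION BY RANGE (YEAR(created)) (
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- dropping a whole partition is much cheaper than DELETEing its rows
ALTER TABLE product_history DROP PARTITION p2009;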
Assume that the default ordering of a MySQL table (MyISAM) is changed by executing:
ALTER TABLE tablename ORDER BY columnname ASC;
From now on, am I guaranteed to obtain records retrieved from the table in the order of "columnname ASC" assuming no "ORDER BY" is specified in my queries (i.e. "SELECT * FROM tablename WHERE ... LIMIT 10;")?
Are there any corner-cases that I should be aware of?
Update #1: Thanks a lot to Quassnoi, who correctly pointed out that INSERTs and DELETEs mess up the ordering. This leads me to the following two extra questions:
What about UPDATEs? Assume that no INSERTs or DELETEs are made to the table, only updates - will the sort order stay intact?
Assume that INSERTs and DELETEs are made - how do I "rebuild" the sorting, say once a day (in this specific case the table only changes daily, so rebuilding it after the changes are done should still be OK)? Does REPAIR TABLE fix it, or must I do ALTER TABLE ... ORDER BY again?
From documentation:
Note that the table does not remain in this order after inserts and deletes
Actually, if you issue SELECT ... ORDER BY against this table, the ALTER TABLE option won't spare you the filesort, but it will make the filesort much faster.
Sorting an already ordered set is equivalent to browsing this set to ensure everything is OK.
What about UPDATEs? Assume that no INSERTs or DELETEs are made to the table, only updates - will the sort order stay intact?
If your table does not contain any dynamic-length fields (like VARCHAR or BLOB), then most probably MyISAM will not move the row when updating it.
I would not rely on this behavior, though, if I were building a nuclear power plant or something I get paid for.
Assume that INSERTs and DELETEs are made - how do I "rebuild" the sorting, say once a day (in this specific case the table only changes daily, so rebuilding it after the changes are done should still be OK)? Does REPAIR TABLE fix it, or must I do ALTER TABLE ... ORDER BY again?
You'll need to do ALTER TABLE ... ORDER BY.
REPAIR just fixes the physical structure of a corrupted table.
Physically ordering a table on a column can save loads of IO.
It's a perfectly legitimate method used in advanced systems to speed up query time.
You just need to reorg the data every now and then so it stays clustered.
Just because some folks haven't heard of it doesn't mean it doesn't exist.
- 25 year advanced DB design veteran.