I've got a MySQL table where each row has its own sequence number in a "sequence" column. However, when a row gets deleted, it leaves a gap. So...
1
2
3
4
...becomes...
1
2
4
Is there a neat way to "reset" the sequencing, so it becomes consecutive again in one SQL query?
Incidentally, I'm sure there is a technical term for this process. Anyone?
UPDATED: The "sequence" column is not a primary key. It is only used for determining the order that records are displayed within the app.
If the field is your primary key...
...then, as stated elsewhere on this question, you shouldn't be changing IDs. The IDs are already unique and you neither need nor want to re-use them.
Now, that said...
Otherwise...
It's quite possible that you have a different field (that is, as well as the PK) for some application-defined ordering. As long as this ordering isn't inherent in some other field (e.g. if it's user-defined), then there is nothing wrong with this.
You could recreate the table using a (temporary) auto_increment field and then remove the auto_increment afterwards.
I'd be tempted to UPDATE in ascending order and apply an incrementing variable.
SET @i = 0;
UPDATE `table`
SET `myOrderCol` = @i := @i + 1
ORDER BY `myOrderCol` ASC;
(Query not tested.)
It does seem quite wasteful to do this every time you delete items, but unfortunately with this manual ordering approach there's not a whole lot you can do about that if you want to maintain the integrity of the column.
You could possibly reduce the load, such that after deleting the entry with myOrderCol equal to, say, 5:
SET @i = 5;
UPDATE `table`
SET `myOrderCol` = @i := @i + 1
WHERE `myOrderCol` > 5
ORDER BY `myOrderCol` ASC;
(Query not tested.)
This will "shuffle" all the following values down by one.
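The same renumbering can be sketched in a runnable form. The snippet below uses SQLite via Python's `sqlite3` (MySQL's user-variable trick isn't available there, so a correlated `COUNT(*)` plays the role of the incrementing variable); the table and column names are illustrative, not from the original question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, myOrderCol INTEGER)")
con.executemany("INSERT INTO t (myOrderCol) VALUES (?)", [(1,), (2,), (3,), (4,)])

# Deleting a row leaves a gap: 1, 2, 4
con.execute("DELETE FROM t WHERE myOrderCol = 3")

# Reassign consecutive values, preserving the existing order: each row's new
# value is the number of rows at or below it.
con.execute("""
    UPDATE t SET myOrderCol = (
        SELECT COUNT(*) FROM t AS s WHERE s.myOrderCol <= t.myOrderCol
    )
""")
result = [r[0] for r in con.execute("SELECT myOrderCol FROM t ORDER BY myOrderCol")]
print(result)  # the gap is closed: 1, 2, 3
```

Because sequence values are unique, the correlated count gives each row its rank, which is exactly the consecutive numbering wanted.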
I'd say don't bother. Reassigning sequential values is a relatively expensive operation, and if the column value is for ordering purposes only, there is no good reason to do it. The only concern you might have is if, for example, your column is UNSIGNED INT and you suspect that over the lifetime of your application you might accumulate more than 4,294,967,296 rows (including deleted rows) and run out of range. Even if that is a concern, you can do the reassignment as a one-time task ten years later when it happens.
This is a question I often read here and in other forums. As zerkms already wrote, this is a non-problem. Moreover, if your table is related to other tables, renumbering will break those relationships.
Just for learning purposes, a simple way is to store your data in a temporary table, truncate the original one (this resets the auto_increment), and then repopulate it.
Silly example:
create table seq (
id int not null auto_increment primary key,
col char(1)
) engine = myisam;
insert into seq (col) values ('a'),('b'),('c'),('d');
delete from seq where id = 3;
create temporary table tmp select col from seq order by id;
truncate seq;
insert into seq (col) select * from tmp;
but it's totally useless. ;)
If this is your PK then you shouldn't change it. PKs should be (mostly) unchanging columns. If you were to change them, you would need to change them not only in that table but also in any foreign keys where they exist.
If you do need a sequential sequence then ask yourself why. In a table there is no inherent or guaranteed order (even in the PK, although it may turn out that way because of how most RDBMSs store and retrieve the data). That's why we have the ORDER BY clause in SQL. If you want to be able to generate sequential numbers based on something else (time added into the database, etc.) then consider generating that either in your query or with your front end.
Assuming that this is an ID field, you can do this when you insert:
INSERT INTO yourTable (ID)
SELECT MIN(t1.ID) + 1
FROM yourTable t1
LEFT JOIN yourTable t2 ON t2.ID = t1.ID + 1
WHERE t2.ID IS NULL
As others have mentioned I don't recommend doing this. It will hold a table lock while the next ID is evaluated.
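A runnable sketch of that gap-filling insert, using SQLite via Python's `sqlite3` (the table name follows the answer above and is illustrative): the self-join finds the smallest ID whose successor is missing, and inserts that successor.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourTable (ID INTEGER PRIMARY KEY)")
# 3 is missing, so the lowest gap is at ID = 3.
con.executemany("INSERT INTO yourTable (ID) VALUES (?)", [(1,), (2,), (4,), (5,)])

con.execute("""
    INSERT INTO yourTable (ID)
    SELECT MIN(t1.ID) + 1
    FROM yourTable t1
    LEFT JOIN yourTable t2 ON t2.ID = t1.ID + 1
    WHERE t2.ID IS NULL
""")
ids = [r[0] for r in con.execute("SELECT ID FROM yourTable ORDER BY ID")]
print(ids)  # 3 has been filled in
```

Note that if there is no gap, the query falls through to MAX(ID) + 1, since the largest ID also has no successor.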
I have a very large table (20-30 million rows) that is completely overwritten each time it is updated by the system supplying the data, over which I have no control.
The table is not sorted in a particular order.
The rows in the table are unique, there is no subset of columns that I can be assured to have unique values.
Is there a way I can run a SELECT query followed by a DELETE query on this table with a fixed limit, without having to trigger any expensive sorting/indexing/partitioning/comparison, whilst being certain that I do not delete a row not covered by the previous select?
I think you're asking for:
SELECT * FROM MyTable WHERE x = 1 AND y = 3;
DELETE FROM MyTable WHERE NOT (x = 1 AND y = 3);
In other words, use NOT against the same search expression you used in the first query to get the complement of the set of rows. This should work for most expressions, unless some of your terms return NULL.
If there are no indexes, then both the SELECT and DELETE will incur a table-scan, but no sorting or temp tables.
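The NULL caveat above is worth seeing concretely. In this SQLite sketch (illustrative table and columns), a row with `x = NULL` matches neither the SELECT nor its NOT(...) complement, so it is never selected yet survives the DELETE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (x INTEGER, y INTEGER)")
con.executemany("INSERT INTO MyTable VALUES (?, ?)",
                [(1, 3), (2, 3), (None, 3)])

# The SELECT half: only (1, 3) matches.
selected = con.execute(
    "SELECT COUNT(*) FROM MyTable WHERE x = 1 AND y = 3").fetchone()[0]

# The DELETE half: (2, 3) is deleted, but for the NULL row the predicate
# evaluates to NULL (not TRUE), so it is left in place.
con.execute("DELETE FROM MyTable WHERE NOT (x = 1 AND y = 3)")
remaining = con.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
print(selected, remaining)
```

One row is selected but two remain, so the two queries together do not partition the table cleanly when NULLs are involved; adding `OR x IS NULL` style terms (or using a NULL-safe comparison) closes the hole.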
Re your comment:
Right, unless you use ORDER BY, you aren't guaranteed anything about the order of the rows returned. Technically, the storage engine is free to return the rows in any arbitrary order.
In practice, you will find that InnoDB at least returns rows in a somewhat predictable order: it reads rows in some index order. Even if your table has no keys or indexes defined, every InnoDB table is stored as a clustered index; if necessary, InnoDB generates a hidden one (named GEN_CLUST_INDEX) over an internal row ID behind the scenes. That will be the order in which InnoDB returns rows.
But you shouldn't rely on that. The internal implementation is not a contract, and it could change tomorrow.
Another suggestion I could offer:
CREATE TABLE MyTableBase (
id INT AUTO_INCREMENT PRIMARY KEY,
A INT,
B DATE,
C VARCHAR(10)
);
CREATE VIEW MyTable AS SELECT A, B, C FROM MyTableBase;
With a table and a view like above, your external process can believe it's overwriting the data in MyTable, but it will actually be stored in a base table that has an additional primary key column. This is what you can use to do your SELECT and DELETE statements, and order by the primary key column so you can control it properly.
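Here is a runnable sketch of that base-table-plus-view arrangement, using SQLite via Python's `sqlite3`. One difference from MySQL: a single-table MySQL view is insertable directly, while SQLite needs an INSTEAD OF trigger to route the insert to the base table. Names follow the answer above and are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE MyTableBase (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        A INT, B DATE, C VARCHAR(10)
    );
    CREATE VIEW MyTable AS SELECT A, B, C FROM MyTableBase;
    -- SQLite-specific: make the view insertable.
    CREATE TRIGGER MyTable_ins INSTEAD OF INSERT ON MyTable
    BEGIN
        INSERT INTO MyTableBase (A, B, C) VALUES (NEW.A, NEW.B, NEW.C);
    END;
""")

# The external process writes to the view and never sees the surrogate key...
con.execute("INSERT INTO MyTable VALUES (1, '2020-01-01', 'x')")
con.execute("INSERT INTO MyTable VALUES (2, '2020-01-02', 'y')")

# ...but the base table has a stable id to SELECT and DELETE by.
rows = con.execute("SELECT id, A FROM MyTableBase ORDER BY id").fetchall()
print(rows)
```

The loader's "overwrite everything" behavior is unchanged, yet every row now carries a monotonically assigned key the cleanup queries can anchor on.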
I have a MySQL table that contains millions of entries.
Each entry must be processed at some point by a cron job.
I need to be able to quickly locate unprocessed entries, using an index.
So far, I have used the following approach: I add a nullable, indexed processedOn column that contains the timestamp at which the entry has been processed:
CREATE TABLE Foo (
...
processedOn INT(10) UNSIGNED NULL,
KEY (processedOn)
);
And then retrieve an unprocessed entry using:
SELECT * FROM Foo WHERE processedOn IS NULL LIMIT 1;
Thanks to MySQL's IS NULL optimization, the query is very fast, as long as the number of unprocessed entries is small (which is almost always the case).
This approach is good enough: it does the job, but at the same time I feel like the index is wasted because it's only ever used for WHERE processedOn IS NULL queries, and never for locating a precise value or range of values for this field. So this has an inevitable impact on storage space and INSERT performance, as every single timestamp is indexed for nothing.
Is there a better approach? Ideally the index would just contain pointers to the unprocessed rows, and no pointer to any processed row.
I know I could split this table into 2 tables, but I'd like to keep it in a single table.
What comes to my mind is to create an isProcessed column with default value 'N', and set it to 'Y' when processed (at the same time you set the processedOn column). Then create an index on the isProcessed field. When you query with the clause WHERE isProcessed = 'N', it will respond very fast.
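A minimal runnable sketch of that flag-column approach, using SQLite via Python's `sqlite3` (table and column names follow the thread; the payload column is an illustrative stand-in for the real row data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Foo (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    isProcessed TEXT NOT NULL DEFAULT 'N',
    processedOn INT NULL)""")
con.execute("CREATE INDEX idx_isProcessed ON Foo (isProcessed)")
con.executemany("INSERT INTO Foo (payload) VALUES (?)", [("a",), ("b",)])

# Fetch one unprocessed row via the indexed flag...
row = con.execute(
    "SELECT id FROM Foo WHERE isProcessed = 'N' LIMIT 1").fetchone()
# ...then mark it processed and stamp the time in one UPDATE.
con.execute(
    "UPDATE Foo SET isProcessed = 'Y', processedOn = strftime('%s','now') "
    "WHERE id = ?", (row[0],))
left = con.execute(
    "SELECT COUNT(*) FROM Foo WHERE isProcessed = 'N'").fetchone()[0]
print(row[0], left)
```

The flag index stays tiny in terms of distinct values, though as the thread notes it still stores an entry per row; it does not solve the "index only the unprocessed rows" wish by itself.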
UPDATE: ALTERNATIVE with partitioning:
Create your table with partitions and define a field that will have just 2 values 1 or 0. This will create one partition for records with the field = 1 and another for records with field = 0.
create table test (field1 int, field2 int DEFAULT 0)
PARTITION BY LIST(field2) (
PARTITION p0 VALUES IN (0),
PARTITION p1 VALUES IN (1)
);
This way, if you want to query only the records with the field equal to one of the values, just do this:
select * from test partition (p0);
The query above will show only records with field2 = 0.
And if you need to query all records together, you just query the table normally:
select * from test;
As far as I was able to understand, this will help you with your need.
I have multiple answers and comments on others' answers.
First, let me assume that the PRIMARY KEY for Foo is id INT UNSIGNED AUTO_INCREMENT (4 bytes) and that the table is Engine=InnoDB.
Indexed Extra column
The index for the extra column would be, per row, the width of the extra column and the PRIMARY KEY, plus a bunch of overhead. With your processedOn, you are talking about 8 bytes (2 INTs). With a simple flag, 5 bytes.
Separate table
This table would have only id for the unprocessed items. It would take extra code to populate it. Its size would stay at some "high-water mark": if there were a burst of unprocessed items, it would grow, but not shrink back. (Here's a rare case where OPTIMIZE TABLE is useful.) InnoDB requires a PRIMARY KEY, and id would work perfectly. So: one column, no extra index. It is a lot smaller than the extra index discussed above. Finding something to work on:
$id = SELECT id FROM tbl LIMIT 1; -- don't care which one
process it
DELETE FROM tbl where id = $id
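The three steps above can be sketched end to end in SQLite via Python's `sqlite3` (names illustrative; this single-process sketch glosses over the locking a concurrent worker pool would need):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, payload TEXT)")
con.execute("CREATE TABLE todo (id INTEGER PRIMARY KEY)")  # unprocessed ids only

# The "extra code to populate it": every insert into Foo also queues its id.
for i, p in enumerate(["a", "b", "c"], start=1):
    con.execute("INSERT INTO Foo VALUES (?, ?)", (i, p))
    con.execute("INSERT INTO todo VALUES (?)", (i,))

# 1. find something to work on (don't care which one)
row = con.execute("SELECT id FROM todo LIMIT 1").fetchone()
# 2. process it (omitted)
# 3. remove it from the queue
con.execute("DELETE FROM todo WHERE id = ?", (row[0],))
left = con.execute("SELECT COUNT(*) FROM todo").fetchone()[0]
print(row[0], left)
```

The main table carries no flag and no extra index; all queue state lives in the small side table.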
2 PARTITIONs, one processed, one not
No. When you change a row from processed to unprocessed, the row must be removed from one partition and inserted into the other. This is done behind the scenes by your UPDATE ... SET flag = 1. Also, both partitions have the "high-water" issue -- they will grow but not shrink. And the space overhead for partitioning may be as much as the other solutions.
SELECT by PARTITION ... requires 5.6. Without that, you would need an INDEX, so you are back to the index issues.
Continual Scanning
This incurs zero extra disk space. (That's better than you had hoped for, correct?) And it is not too inefficient. Here's how it works. Here is some pseudo-code to put into your cron job. But don't make it a cron job. Instead, let it run all the time. (The reason will become clear, I hope.)
SET @a := 0;
Loop:
# Get a clump
SELECT @z := id FROM Foo WHERE id > @a ORDER BY id LIMIT 1000,1;
if no results, set @z to MAX(id)
# Find something to work on in that clump:
SELECT @id := id FROM Foo
WHERE id > @a
AND id <= @z
AND not-processed
LIMIT 1;
if you found something, process it and set @z := @id
SET @a := @z;
if @a >= MAX(id), set @a := 0; # to start over
SLEEP 2 seconds # or some amount that is a compromise
Go Loop
Notes:
It walks through the table with minimal impact.
It works even with gaps in id. (It could be made simpler if there were no gaps.) (If the PK is not AUTO_INCREMENT, it is almost identical.)
The sleep is to be a 'nice guy'.
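The scanning loop above can be made concrete. This SQLite sketch (via Python's `sqlite3`, illustrative names, a small clump size, a bounded loop instead of "run forever", and no sleep) walks the id range in clumps and processes at most one row per pass:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Foo (id INTEGER PRIMARY KEY, processed INT DEFAULT 0)")
con.executemany("INSERT INTO Foo (id) VALUES (?)", [(i,) for i in range(1, 21)])

CLUMP = 5                                  # stands in for the LIMIT 1000,1 clump
a = 0
max_id = con.execute("SELECT MAX(id) FROM Foo").fetchone()[0]
for _ in range(100):                       # stand-in for "let it run all the time"
    # Get a clump: the id CLUMP rows past @a, or MAX(id) near the end.
    row = con.execute("SELECT id FROM Foo WHERE id > ? ORDER BY id "
                      "LIMIT 1 OFFSET ?", (a, CLUMP - 1)).fetchone()
    z = row[0] if row else max_id
    # Find something to work on in that clump.
    work = con.execute("SELECT id FROM Foo WHERE id > ? AND id <= ? "
                       "AND processed = 0 LIMIT 1", (a, z)).fetchone()
    if work:
        con.execute("UPDATE Foo SET processed = 1 WHERE id = ?", (work[0],))
        z = work[0]                        # resume just past the processed row
    a = 0 if z >= max_id else z            # wrap around to start over
unprocessed = con.execute(
    "SELECT COUNT(*) FROM Foo WHERE processed = 0").fetchone()[0]
print(unprocessed)
```

As in the pseudo-code, progress relies only on the primary key, tolerates gaps in id, and touches the table in small, index-ordered reads.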
Selective Index
MariaDB's dynamic columns and MySQL 5.7's JSON can index things, and I think they are "selective". One state would be to have the column empty, the other would be to have the flag set in the dynamic/json column. This will take some research to verify, and may require an upgrade.
I've got a bit of a stupid question. My program has to be able to delete data from my database; that in itself is not the problem. But how can I delete data without the danger that others can see that something has been deleted?
User Table:
U_ID U_NAME
1 Chris
2 Peter
OTHER TABLE
ID TIMESTAMP FK_U_ID
1 2012-12-01 1
2 2012-12-02 1
So the IDs are AUTO_INCREMENT, and if I delete one of them there's a gap. Furthermore, each timestamp is bigger than the row before, so they are ascending.
I want to let the data with ID 1 disappear from the user's profile (U_ID 1).
If I delete it, there is a gap. If I just change the FK_U_ID to 2 (Peter), it's obvious, because when I insert data there are 20 or 30 rows with the same U_ID... so it's obvious that there has been a modification.
If I set the FK_U_ID to NULL, it's the same problem as changing it to another U_ID.
Is there any solution to get this work? I know that if nobody but me has access to the database, it's just no problem. But just in case, if somebody controls my program it should not be obvious that there has been modifications.
So here we go.
For the ID gaps issue you can use GUIDs as @SLaks suggests, but then you can't use the native RDBMS auto_increment, which means you have to create the GUID and insert it along with the rest of the record data upon creation. Of course, you don't really need the ID to be globally unique; you could just store a random string of 20 characters or so, but then you have to do a DB read to see if that ID is taken and repeat the process until you find an unused ID... which could be quite taxing.
It's not at all clear why you would want to "hide" evidence that a delete was performed. That sounds like a really bad idea. I'm not a fan of promulgating misinformation.
Two of the characteristics of an ideal primary key are:
- anonymous (be void of any useful information, doesn't matter what it's set to)
- immutable (once assigned, it will never be changed.)
But, if we set that whole discussion aside...
I can answer a slightly different question (an answer you might find helpful to your particular situation)
The only way to eliminate a "gap" in the values in a column with an AUTO_INCREMENT would be to change the column values from their current values to a contiguous sequence of new values. If there are any foreign keys that reference that column, the values in those columns would need to be updated as well, to preserve the relationship. That will likely leave the current auto_increment value of the table higher than the largest value of the id column, so I'd want to reset that as well, to avoid a "gap" on the next insert.
(I have done re-sequencing of auto_increment values in development and test environments, to "clean up" lookup tables, and to move the id values of some tables to ranges that are distinct from the ranges in other tables... that lets me test SQL to make sure join predicates aren't inadvertently referencing the wrong table and returning rows that look correct by accident. Those are some of the reasons I've done reassignment of auto_increment values.)
Note that the database can "automagically" update foreign key values (for InnoDB tables) when you change the primary key value, as long as the foreign key constraint is defined with ON UPDATE CASCADE and FOREIGN_KEY_CHECKS is not disabled.
If there are no foreign keys to deal with, and assuming that all of the current values of id are positive integers, then I've been able to do something like this: (with appropriate backups in place, so I can recover if things don't work right)
UPDATE mytable t
JOIN (
SELECT s.id AS old_id
, @i := @i + 1 AS new_id
FROM mytable s
CROSS
JOIN (SELECT @i := 0) i
ORDER BY s.id
) c
ON t.id = c.old_id
SET t.id = c.new_id
WHERE t.id <> c.new_id
To reset the table AUTO_INCREMENT back down to the largest id value in the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
Typically, I will create a table and populate it from that query in the inline view (aliased as c) above. I can then use that table to update both foreign key columns and the primary key column, first disabling the FOREIGN_KEY_CHECKS and then re-enabling it. (In a concurrent environment, where other processes might be inserting/updating/deleting rows from one of the tables, I would of course first obtain an exclusive lock on all of the tables to be updated.)
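The mapping-table workflow described above can be sketched in SQLite via Python's `sqlite3` (illustrative table names; a correlated COUNT(*) replaces the MySQL user variable, and SQLite has no FOREIGN_KEY_CHECKS toggle to worry about here): build an old_id-to-new_id map, update the child foreign keys first, then the parent primary keys.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (parent_id INT);
    INSERT INTO parent VALUES (10), (20), (40);   -- gappy ids
    INSERT INTO child VALUES (20), (40);

    -- The mapping table: old id -> contiguous new id, in id order.
    CREATE TEMP TABLE map (old_id INT, new_id INT);
    INSERT INTO map
        SELECT id,
               (SELECT COUNT(*) FROM parent p2 WHERE p2.id <= p1.id)
        FROM parent p1;

    -- Apply the map to the foreign key column, then to the primary key.
    UPDATE child SET parent_id =
        (SELECT new_id FROM map WHERE old_id = parent_id);
    UPDATE parent SET id = (SELECT new_id FROM map WHERE old_id = id);
""")
parents = con.execute("SELECT id FROM parent ORDER BY id").fetchall()
children = con.execute("SELECT parent_id FROM child ORDER BY parent_id").fetchall()
print(parents, children)
```

Both tables end up renumbered consistently; as the answer stresses, this belongs in test environments, not production.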
Taking up again, the discussion I set aside earlier... this type of "administrative" function can be useful in a test environment, when setting up test cases. But it is NOT a function that is ever performed in a production environment, with live data.
All rows in MySQL tables are being inserted like this:
1
2
3
Is there any way how to insert new row at a top of table so that table looks like this?
3
2
1
Yes, yes, I know "order by" but let me explain the problem. I have a dating website and users can search profiles by sex, age, city, etc. There are more than 20 search criteria and it's not possible to create indexes for each possible combination. So, if I use "order by", the search usually ends with "using temporary, using filesort" and this causes a very high server load. If I remove "order by" oldest profiles are shown as first and users have to go to the last page to see the new profiles. That's very bad because first pages of search results always look the same and users have a feeling that there are no new profiles. That's why I asked this question. If it's not possible to insert last row at top of table, can you suggest anything else?
The order in which results are returned when there's no ORDER BY clause depends on the RDBMS. In the case of MySQL, or at least most of its engines, if you don't explicitly specify the order it will usually be ascending, from oldest to newest entries. Where the row is located "physically" doesn't matter. I'm not sure all MySQL engines work that way, though. In PostgreSQL, for example, the "default" order tends to show the most recently updated rows first, and some MySQL engines might behave the same way.
Anyway, the point is: if you want the results ordered, always specify the sort order; don't depend on some default that seems to work. In your case you want something trivial, the users in descending order, so just use:
SELECT * FROM users ORDER BY id DESC
I think you just need to make sure that if you always need to show the latest data first, all of your indexes need to specify the date/time field first, and all of your queries order by that field first.
If ORDER BY is slowing everything down then you need to optimise your queries or your database structure, I would say.
Maybe if you add the id 'by hand' and give it a negative value, but I (and probably nobody else) would recommend doing that:
Regular insert, e.g.
insert into t values (...);
Update with set, e.g.
update t set id = -id where id = last_insert_id();
Normally you specify an auto-incrementing primary key.
However, you can just specify the primary key like so:
CREATE TABLE table1 (
id INTEGER PRIMARY KEY DEFAULT 1, <<-- no auto_increment, but it has a default value
-- other fields...
);
Now add a BEFORE INSERT trigger that changes the primary key.
DELIMITER $$
CREATE TRIGGER ai_table1_each BEFORE INSERT ON table1 FOR EACH ROW
BEGIN
DECLARE new_id INTEGER;
SELECT COALESCE(MIN(id), 0) -1 INTO new_id FROM table1;
SET NEW.id = new_id;
END $$
DELIMITER ;
Now your id will start at -1 and run down from there.
The insert trigger will make sure no concurrency problems occur.
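The count-down scheme can be sketched outside MySQL too. SQLite triggers can't rewrite NEW.id the way the MySQL trigger above does, so in this `sqlite3` sketch (illustrative helper name) the next id is computed client-side with the same COALESCE(MIN(id), 0) - 1 expression:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, col TEXT)")

def insert_desc(c, val):
    # Same expression as the trigger body: next id is one below the minimum.
    nid = c.execute("SELECT COALESCE(MIN(id), 0) - 1 FROM table1").fetchone()[0]
    c.execute("INSERT INTO table1 VALUES (?, ?)", (nid, val))

for v in ["a", "b", "c"]:
    insert_desc(con, v)

# Ids run -1, -2, -3, so a plain ascending scan shows the newest row first.
rows = con.execute("SELECT id, col FROM table1 ORDER BY id").fetchall()
print(rows)
```

Unlike the in-database trigger, this client-side version is not safe against concurrent inserters; two connections could compute the same minimum, which is exactly the race the BEFORE INSERT trigger is meant to avoid.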
I know that a lot of time has passed since the above question was asked. But I have something to add to the comments:
I'm using MySQL version: 5.7.18-0ubuntu0.16.04.1
When no ORDER BY clause is used with SELECT, it is noticeable that records are displayed, regardless of the order in which they were added, in the table's primary key sequence.
I have the following sql code. Is it guaranteed that MyTable is going to be sorted by MyTable.data? Basically the question is - if I am inserting multiple rows with one INSERT statement, can other connection get in the middle of my insertion and insert something else in between my rows?
CREATE TABLE MyTable(
id bigint IDENTITY(1,1) NOT NULL,
data uniqueidentifier NOT NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED (id ASC)
)
DECLARE @data uniqueidentifier
SET @data = NEWID()
WITH Dummy AS
(
SELECT @data as data, 1 as n
UNION ALL
SELECT @data, n + 1
FROM Dummy
WHERE n < 100
)
INSERT INTO MyTable(data)
SELECT data FROM Dummy
Thanks.
By definition, SQL tables have no defined order. But I don't think this is what you are asking. I think you are asking whether another process could insert a row in between one of your inserts. The answer is yes, unless you have the entire table locked, which you probably do not want to do for concurrency reasons.
Forgive me if I misunderstand your question. But you should never depend on primary keys to order data, if that is what you're after. That is not what primary keys are for. Ordering of data should be done with an ORDER BY clause, or with columns that are specifically introduced by you to keep track of order. Primary key columns are for identifying data, not ordering.
You have no guarantee that those rows will be inserted consecutively - unless you put your SQL Server database into single-user mode and make sure no one else is connected.
But the question is: why is this even important to you?? In SQL Server, you should never rely on a "system-given" order - if you need order, use an explicit ORDER BY - that's the only reliable way to get your rows in an ordered fashion anyway.