I have the following sql code. Is it guaranteed that MyTable is going to be sorted by MyTable.data? Basically the question is - if I am inserting multiple rows with one INSERT statement, can other connection get in the middle of my insertion and insert something else in between my rows?
CREATE TABLE MyTable(
id bigint IDENTITY(1,1) NOT NULL,
data uniqueidentifier NOT NULL,
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED (id ASC)
)
DECLARE #data uniqueidentifier
SET #data = NEWID()
WITH Dummy AS
(
SELECT #data as data, 1 as n
UNION ALL
SELECT #data, n + 1
FROM Dummy
WHERE n < 100
)
INSERT INTO MyTable(data)
SELECT data FROM Dummy
Thanks.
By definition, SQL tables have no defined order. But I don't think this is what you are asking. I think you are asking whether another process could insert a row in between one of your inserts. The answer is yes, unless you have the entire table locked, which you probably do not want to do for concurrency reasons.
Forgive me if I misunderstand your question. But you should never depend on primary keys to order data, if that is what you're after. That is not what primary keys are for. Ordering of data should be done with an ORDER BY clause, or with columns that are specifically introduced by you to keep track of order. Primary key columns are for identifying data, not ordering.
You have no guarantee that those rows will be inserted consecutively - unless you put your SQL Server database into single-user mode and make sure no one else is connected.
But the question is: why is this even important to you?? In SQL Server, you should never rely on a "system-given" order - if you need order, use an explicit ORDER BY - that's the only reliable way to get your rows in an ordered fashion anyway.
Related
Step 1:
I am creating a simple table.
CREATE TABLE `indexs`.`table_one` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NULL,
PRIMARY KEY (`id`));
Step 2:
I make two inserts into this table.
insert into table_one (name) values ("B");
insert into table_one (name) values ("A");
Step 3:
I make a select, I get a table, the records in which are ordered by id.
SELECT * FROM table_one;
This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
Now the part I don't understand.
Step 4:
I am creating an index on the name column.
CREATE INDEX index_name ON table_one(name)
I repeat step 3 again, but I get a different result. The lines are now ordered according to the name column.
Why is this happening? why the order of the rows in the table changes in accordance with the new index on the name column, because as far as I understand, in mysql, the primary key is the only clustered index, and all indexes created additionally are secondary.
I make a select, I get a table, the records in which are ordered by id. [...] This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
There is some misunderstanding of a concept here.
Table rows have no inherent ordering: they represent unordered set of rows. While the clustered index enforces a physical ordering of data in storage, it does not guarantee the order in which rows are returned by a select query.
If you want the results of the query to be ordered, then use an order by clause. Without such clause, the ordering or the rows is undefined: the database is free to return results in whichever order it likes, and results are not guaranteed to be consistent over consecutive executions of the same query.
select * from table_one order by id;
select * from table_one order by name;
(GMB explains most)
Why is this happening? why the order of the rows in the table changes in accordance with the new index on the name column
Use EXPLAIN SELECT ... -- it might give a clue of what I am about to suggest.
You added INDEX(name). In InnoDB, the PRIMARY KEY column(s) are tacked onto the end of each secondary index. So it is effectively a BTree ordered by (name,id) and containing only those columns.
Now, the Optimizer is free to fetch the data from the index, since it has everything you asked for (id and name). (This index is called "covering".)
Since you did not specify an ORDER BY, the result set ordering is valid (see GMB's discussion).
Moral of the story: If you want an ordering, specify ORDER BY. (The Optimizer is smart enough to "do no extra work" if it can see how to provide the data without doing a sort.
Further experiment: Add another column to the table but don't change the indexes. Now you will find SELECT * FROM t is ordered differently than SELECT id, name FROM t. I think I have given you enough clues to predict this difference, if not, ask.
I am trying to build an API and one of the endpoints will return a random row from my database. In the database I have a table in which I want a "views" column to be updated every time I run a SELECT query on a row.
My table looks something like this:
CREATE TABLE `movies` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(256) NOT NULL,
`description` text,
`views` int(11) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`)
);
The row is selected by ordering the table with rand() and then limiting the result by 1, like so:
SELECT * FROM table ORDER BY rand() LIMIT 1;
Is something like this below possible?
SELECT * FROM table ORDER BY rand() LIMIT 1
UPDATE table SET views = +1 WHERE (selected row?);
I'm new to SQL queries, so I don't know if this is the best way or even possible at all. Should I run a new query after this one has completed that updates the value instead?
Usually, every table has a Primary Key, i.e. a unique ID of every single row. Since you have a result of your SELECT query and it's only 1 row, you always can make a consequent update query like UPDATE table SET views = views + 1 WHERE id = <returned_record_id>. Here we assume that the column id is a Primary Key column. This pair of queries need to be issued by the application code. If you want to achieve SELECT + UPDATE functionality as a single SQL statement, consider using stored procedures.
While the aforementioned approach is technically possible, it might have a few performance problems. First of, ORDER BY rand() often has a poor performance. Also, having an update on each select could have bad performance implications.
No what you want is not possible .as, select and update commands can not be used togethor in a single transaction.
You can do it seperately
You need to create a procedure for this in your database like:
CREATE PROCEDURE `procedure_name`()
BEGIN
SELECT * FROM table ORDER BY rand() LIMIT 1 ;
UPDATE table SET views = +1 WHERE (selected row?) ;
END
and then call it
call procedure_name();
You can check only as there are many ways to write a procedure.
Thanks
Unfortunately, what you want to do is not possible, at least not without a lot of work. SQL in general -- and MySQL in particular -- offer a capability called triggers.
Triggers allow you to do take actions when something happens in the database. For instance, if you want to check that values are correct, you can write an insert/update trigger to check the values and reject improper ones. Or, if you want to stash deleted records into an audit table, a trigger is the way to go.
What you are describing could be implemented using a trigger on a "select". Such a beast does not exist.
What are your options? Well, the simplest is to do this in your application. When a movie is selected, then you can update views. Of course, that only increments the views where you have the code.
You can move this code into a stored procedure. This simplifies the application code. It just has to "know" to use the stored procedure. But, there is no enforcement mechanism.
You can make this more enforceable by using permissions. Basically, don't allow access to the underlying table except through the stored procedure. This is closest to what you want.
I have a very large table 20-30 million rows that is completely overwritten each time it is updated by the system supplying the data over which I have no control.
The table is not sorted in a particular order.
The rows in the table are unique, there is no subset of columns that I can be assured to have unique values.
Is there a way I can run a SELECT query followed by a DELETE query on this table with a fixed limit without having to trigger any expensive sorting/indexing/partitioning/comparison whilst being certain that I do not delete a row not covered by the previous select.
I think you're asking for:
SELECT * FROM MyTable WHERE x = 1 AND y = 3;
DELETE * FROM MyTable WHERE NOT (x = 1 AND y = 3);
In other words, use NOT against the same search expression you used in the first query to get the complement of the set of rows. This should work for most expressions, unless some of your terms return NULL.
If there are no indexes, then both the SELECT and DELETE will incur a table-scan, but no sorting or temp tables.
Re your comment:
Right, unless you use ORDER BY, you aren't guaranteed anything about the order of the rows returned. Technically, the storage engine is free to return the rows in any arbitrary order.
In practice, you will find that InnoDB at least returns rows in a somewhat predictable order: it reads rows in some index order. Even if your table has no keys or indexes defined, every InnoDB table is stored as a clustered index, even if it has to generate an internal key called GEN_CLUST_ID behind the scenes. That will be the order in which InnoDB returns rows.
But you shouldn't rely on that. The internal implementation is not a contract, and it could change tomorrow.
Another suggestion I could offer:
CREATE TABLE MyTableBase (
id INT AUTO_INCREMENT PRIMARY KEY,
A INT,
B DATE,
C VARCHAR(10)
);
CREATE VIEW MyTable AS SELECT A, B, C FROM MyTableBase;
With a table and a view like above, your external process can believe it's overwriting the data in MyTable, but it will actually be stored in a base table that has an additional primary key column. This is what you can use to do your SELECT and DELETE statements, and order by the primary key column so you can control it properly.
I've got a bit of a stupid question. The thing is my program has to have the function to delete data from my database. Yay, not really the problem. But how can I delete data without the danger that others can see, that there has been something deleted.
User Table:
U_ID U_NAME
1 Chris
2 Peter
OTHER TABLE
ID TIMESTAMP FK_U_D
1 2012-12-01 1
2 2012-12-02 1
Sooooo the ID's are AUTO_INCREMENT, so if I delete one of them there's a gap. Furthermore, the timestamp is also bigger than the row before, so ascending.
I want to let the data with ID 1 disappear from the user's profile (U_ID 1).
If I delete it, there is a gap. If I just change the FK_U_ID to 2 (Peter) it's obvious, because when I insert data, there are 20 or 30 data rows with the same U_ID...so it's obvious that there has been a modification.
If I set the FK_U_ID NULL --> same sh** like when I change it to another U_ID.
Is there any solution to get this work? I know that if nobody but me has access to the database, it's just no problem. But just in case, if somebody controls my program it should not be obvious that there has been modifications.
So here we go.
For the ID gaps issue you can use GUIDs as #SLaks suggests, but then you can't use the native RDBMS auto_increment which means you have to create the GUID and insert it along with the rest of the record data upon creation. Of course, you don't really need the ID to be globally unique, you could just store a random string of 20 characters or something, but then you have to do a DB read to see if that ID is taken and repeat (recursively) that process until you find an unused ID... could be quite taxing.
It's not at all clear why you would want to "hide" evidence that a delete was performed. That sounds like a really bad idea. I'm not a fan of promulgating misinformation.
Two of the characteristics of an ideal primary key are:
- anonymous (be void of any useful information, doesn't matter what it's set to)
- immutable (once assigned, it will never be changed.)
But, if we set that whole discussion aside...
I can answer a slightly different question (an answer you might find helpful to your particular situation)
The only way to eliminate a "gap" in the values in a column with an AUTO_INCREMENT would be to change the column values from their current values to a contiguous sequence of new values. If there are any foreign keys that reference that column, the values in those columns would need to be updated as well, to preserve the relationship. That will likely leave the current auto_increment value of the table higher than the largest value of the id column, so I'd want to reset that as well, to avoid a "gap" on the next insert.
(I have done re-sequencing of auto_increment values in development and test environments, to "cleanup" lookup tables, and to move the id values of some tables to ranges that are distinct from ranges in other tables... that let's me test SQL to make sure the SQL join predicates aren't inadvertently referencing the wrong table, and returning rows that look correct by accident... those are some reasons I've done reassignment if auto_increment values)
Note that the database can "automagically" update foreign key values (for InnnoDB tables) when you change the primary key value, as long as the foreign key constraint is defined with ON UPDATE CASCADE, and FOREIGN_KEY_CHECKS is not disabled.
If there are no foreign keys to deal with, and assuming that all of the current values of id are positive integers, then I've been able to do something like this: (with appropriate backups in place, so I can recover if things don't work right)
UPDATE mytable t
JOIN (
SELECT s.id AS old_id
, #i := #i + 1 AS new_id
FROM mytable s
CROSS
JOIN (SELECT #i := 0) i
ORDER BY s.id
) c
ON t.id = c.old_id
SET t.id = c.new_id
WHERE t.id <> c.new_id
To reset the table AUTO_INCREMENT back down to the largest id value in the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
Typically, I will create a table and populate it from that query in the inline view (aliased as c) above. I can then use that table to update both foreign key columns and the primary key column, first disabling the FOREIGN_KEY_CHECKS and then re-enabling it. (In a concurrent environment, where other processes might be inserting/updating/deleting rows from one of the tables, I would of course first obtain an exclusive lock on all of the tables to be updated.)
Taking up again, the discussion I set aside earlier... this type of "administrative" function can be useful in a test environment, when setting up test cases. But it is NOT a function that is ever performed in a production environment, with live data.
I've got a mysql table where each row has its own sequence number in a "sequence" column. However, when a row gets deleted, it leaves a gap. So...
1
2
3
4
...becomes...
1
2
4
Is there a neat way to "reset" the sequencing, so it becomes consecutive again in one SQL query?
Incidentally, I'm sure there is a technical term for this process. Anyone?
UPDATED: The "sequence" column is not a primary key. It is only used for determining the order that records are displayed within the app.
If the field is your primary key...
...then, as stated elsewhere on this question, you shouldn't be changing IDs. The IDs are already unique and you neither need nor want to re-use them.
Now, that said...
Otherwise...
It's quite possible that you have a different field (that is, as well as the PK) for some application-defined ordering. As long as this ordering isn't inherent in some other field (e.g. if it's user-defined), then there is nothing wrong with this.
You could recreate the table using a (temporary) auto_increment field and then remove the auto_increment afterwards.
I'd be tempted to UPDATE in ascending order and apply an incrementing variable.
SET #i = 0;
UPDATE `table`
SET `myOrderCol` = #i:=#i+1
ORDER BY `myOrderCol` ASC;
(Query not tested.)
It does seem quite wasteful to do this every time you delete items, but unfortunately with this manual ordering approach there's not a whole lot you can do about that if you want to maintain the integrity of the column.
You could possibly reduce the load, such that after deleting the entry with myOrderCol equal to, say, 5:
SET #i = 5;
UPDATE `table`
SET `myOrderCol` = #i:=#i+1
WHERE `myOrderCol` > 5
ORDER BY `myOrderCol` ASC;
(Query not tested.)
This will "shuffle" all the following values down by one.
I'd say don't bother. Reassigning sequential values is a relatively expensive operation and if the column value is for ordering purpose only there is no good reason to do that. The only concern you might have is if for example your column is UNSIGNED INT and you suspect that in the lifetime of your application you might have more than 4,294,967,296 rows (including deleted rows) and go out of range, even if that is your concern you can do the reassigning as a one time task 10 years later when that happens.
This is a question that often I read here and in other forums. As already written by zerkms this is a false problem. Moreover if your table is related with other ones you'll lose relations.
Just for learning purpose a simple way is to store your data in a temporary table, truncate the original one (this reset auto_increment) and than repopulate it.
Silly example:
create table seq (
id int not null auto_increment primary key,
col char(1)
) engine = myisam;
insert into seq (col) values ('a'),('b'),('c'),('d');
delete from seq where id = 3;
create temporary table tmp select col from seq order by id;
truncate seq;
insert into seq (col) select * from tmp;
but it's totally useless. ;)
If this is your PK then you shouldn't change it. PKs should be (mostly) unchanging columns. If you were to change them then not only would you need to change it in that table but also in any foreign keys where is exists.
If you do need a sequential sequence then ask yourself why. In a table there is no inherent or guaranteed order (even in the PK, although it may turn out that way because of how most RDBMSs store and retrieve the data). That's why we have the ORDER BY clause in SQL. If you want to be able to generate sequential numbers based on something else (time added into the database, etc.) then consider generating that either in your query or with your front end.
Assuming that this is an ID field, you can do this when you insert:
INSERT INTO yourTable (ID)
SELECT MIN(ID)
FROM yourTable
WHERE ID > 1
As others have mentioned I don't recommend doing this. It will hold a table lock while the next ID is evaluated.