MySQL insert between IDs

If I have a database with the following information, how can I set up my next INSERT query so that the missing ID is filled in (so that it is 5 in this instance)?
Basically, once it gets to 24, it will continue inserting in order (e.g. 30, 31, 32).

You don't. Not with an auto-incrementing integer anyway.
You could change the column to not be an auto-incrementing integer, but then you'll need to determine the next ID before performing each insert, which would make all of your INSERT queries unnecessarily complex and the code more difficult to maintain. It would also introduce a significant point of failure if multiple threads try to insert at once and the operation to find the next ID and insert a record isn't fully atomic.
Why do you even need this? There's no reason for a database-generated primary key integer to be contiguous like that. Its purpose is to be unique, and as long as it serves that purpose it's working. There's no need to "fill in the holes" left by previously deleted records.
You could add a different column to the database and perform the logic for finding the next contiguous number when inserting records on that column. But you'd still run into the same aforementioned problems of race conditions and unnecessary complexity.
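To make the "find the next contiguous number" logic concrete, here is a minimal sketch in Python, with the standard-library sqlite3 module standing in for MySQL and a hypothetical table named items. The self-join finds the lowest id that is missing; the docstring spells out why this is exactly the race-prone step described above.

```python
import sqlite3

# Lowest n such that n - 1 exists but n does not (id 1 is checked separately).
FIRST_GAP_SQL = """
    SELECT MIN(t1.id + 1)
    FROM items AS t1
    LEFT JOIN items AS t2 ON t2.id = t1.id + 1
    WHERE t2.id IS NULL
"""

def first_free_id(conn: sqlite3.Connection) -> int:
    """Return the lowest unused positive id in the hypothetical `items` table.

    Not concurrency-safe: another session can take the same id between this
    SELECT and your INSERT unless the table is locked for the whole sequence.
    """
    if conn.execute("SELECT 1 FROM items WHERE id = 1").fetchone() is None:
        return 1
    return conn.execute(FIRST_GAP_SQL).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)",
                 [(i, f"row{i}") for i in (1, 2, 3, 4, 6, 7, 24)])
print(first_free_id(conn))  # 5
```

The SQL itself is plain enough that it should also run on MySQL, but the concurrency caveat applies there just the same.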

Change your filename to something more meaningful than the id.
I think something like files/uploads/20130515_170349.wv (for the first row) makes a lot of sense (assuming you don't have more than one file per second).
This also has the advantage that ordering the file names alphabetically is chronological order, making it easier to see the newer and older files.
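A small Python sketch of that naming scheme (the function name upload_path and the default extension are assumptions for illustration). Because the timestamp is zero-padded from year down to second, plain string sorting matches chronological order:

```python
from datetime import datetime

def upload_path(uploaded_at: datetime, ext: str = "wv") -> str:
    """Build a timestamp-based upload path such as files/uploads/20130515_170349.wv.

    Assumes at most one upload per second; two uploads in the same second
    would get the same name.
    """
    return f"files/uploads/{uploaded_at:%Y%m%d_%H%M%S}.{ext}"

print(upload_path(datetime(2013, 5, 15, 17, 3, 49)))  # files/uploads/20130515_170349.wv
```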

You can just give it the id field and value:
INSERT INTO table (id, etc, etc) VALUES (5, etc, etc);
However, I don't think you can do it dynamically. If id is auto-increment, then it'll keep on incrementing whether or not previous tuples have been deleted.
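Both points can be checked in a quick sketch. It uses Python's sqlite3 module as a stand-in for MySQL (SQLite's AUTOINCREMENT also never hands out a deleted id again), and the table name items is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for n in range(1, 7):
    conn.execute("INSERT INTO items (name) VALUES (?)", (f"row{n}",))  # ids 1..6
conn.execute("DELETE FROM items WHERE id = 5")

# The counter does not go back: the next automatic id is 7, not 5.
conn.execute("INSERT INTO items (name) VALUES ('auto')")
# Supplying the id explicitly fills the hole.
conn.execute("INSERT INTO items (id, name) VALUES (5, 'manual')")

ids = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
print(ids)  # [1, 2, 3, 4, 5, 6, 7]
```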

Related

Can/should I make an id column that is part of a composite key non-unique?

I have a table which has an id (primary key with auto increment), a uid (key referring to a users id, for example) and something else which won't matter for my question.
I want to make, let's call it, separate auto-increment keys on id for each uid value.
So, I will add an entry with uid 10, and the id field for this entry will have a 1 because there were no previous entries with a value of 10 in uid. I will add a new one with uid 4 and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as explanatory and clear as I can to demonstrate the idea.
What SQL engine can provide such a functionality natively? (non Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know of a non-SQL database engine providing such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
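The rule quoted above can be modelled in a few lines of Python. This is a toy model of the semantics, not MySQL code; in MySQL the real thing is a MyISAM table with a composite key such as PRIMARY KEY (uid, id) and AUTO_INCREMENT on id. It reproduces the question's example: uid 10 starts at 1 and the third uid 4 row gets 3.

```python
def assign_group_ids(uids):
    """Toy model of MyISAM's AUTO_INCREMENT on a secondary index column:
    each new row gets MAX(id) + 1 among existing rows with the same uid,
    or 1 if that uid has no rows yet. Returns (uid, id) pairs in insert order."""
    rows = []
    for uid in uids:
        next_id = max((i for u, i in rows if u == uid), default=0) + 1
        rows.append((uid, next_id))
    return rows

print(assign_group_ids([4, 4, 10, 4]))  # [(4, 1), (4, 2), (10, 1), (4, 3)]
```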
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. Mario gets 3.
Bill's session fires a trigger to compute MAX(id)+1. Bill gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
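The interleaving above can be replayed deterministically. This sketch uses a single Python sqlite3 connection as a stand-in for the two MySQL sessions (the table name posts is hypothetical): both "sessions" read MAX(id) before either row is written, so the second INSERT hits the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (userid INTEGER, id INTEGER, PRIMARY KEY (userid, id))")
conn.executemany("INSERT INTO posts VALUES (4, ?)", [(1,), (2,)])  # user 4 has two rows

next_sql = "SELECT COALESCE(MAX(id), 0) + 1 FROM posts WHERE userid = 4"
mario_id = conn.execute(next_sql).fetchone()[0]  # Mario computes 3
bill_id = conn.execute(next_sql).fetchone()[0]   # Bill also computes 3

conn.execute("INSERT INTO posts VALUES (4, ?)", (bill_id,))  # Bill finishes first
try:
    conn.execute("INSERT INTO posts VALUES (4, ?)", (mario_id,))
    outcome = "inserted"
except sqlite3.IntegrityError:
    outcome = "primary key conflict"
print(mario_id, bill_id, outcome)  # 3 3 primary key conflict
```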
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like in the example above. It's necessary to lock the whole table: since you're trying to restrict INSERT, there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just that row). But locking the table causes access to the table to become serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
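The "atomic increment per userid" idea can be sketched in-process in Python. The class name PerUserCounter is hypothetical, and a threading.Lock only protects one process; across application servers you would need a shared store such as the memcached key mentioned above. Every caller gets a distinct value regardless of thread scheduling:

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

class PerUserCounter:
    """In-process stand-in for an atomic per-key increment (like memcached's incr)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = defaultdict(int)  # userid -> last id handed out

    def next_id(self, userid):
        with self._lock:  # read-and-increment happens as one step
            self._last[userid] += 1
            return self._last[userid]

counter = PerUserCounter()
with ThreadPoolExecutor(max_workers=8) as pool:
    ids = list(pool.map(lambda _: counter.next_id(4), range(1000)))
print(len(set(ids)))  # 1000: no duplicates
```

As the next paragraph points out, unique is not the same as gap-free: a value handed out to an INSERT that later rolls back is simply lost.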
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask the question about efficiency. Unless you are dealing with extreme volumes, storing an 8 byte DATETIME isn't much of an overhead compared to using, for example, a 4 byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1) A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than not MS/Oracle), the general logic is simple...
Start a transaction (often this is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
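Here is a sketch of the trigger route, written with SQLite trigger syntax and run from Python's sqlite3 module (so it is not MySQL DDL). It finds MAX(amendment_id) for the property and stores MAX + 1, and its WHEN clause leaves rows that arrive with amendment_id already populated alone. In MySQL a trigger cannot UPDATE the table it is defined on, so there you would instead set NEW.amendment_id in a BEFORE INSERT trigger, and the concurrency caveats discussed earlier still apply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE amendments (
    property_id  INTEGER NOT NULL,
    amendment_id INTEGER,            -- filled in by the trigger when left NULL
    note         TEXT,
    UNIQUE (property_id, amendment_id)
);

-- Find MAX(amendment_id) for this property and store MAX + 1.
-- The WHEN clause skips rows inserted with amendment_id already populated.
CREATE TRIGGER amendments_number AFTER INSERT ON amendments
WHEN NEW.amendment_id IS NULL
BEGIN
    UPDATE amendments
    SET amendment_id = (SELECT COALESCE(MAX(amendment_id), 0) + 1
                        FROM amendments
                        WHERE property_id = NEW.property_id)
    WHERE rowid = NEW.rowid;
END;
""")

conn.executemany("INSERT INTO amendments (property_id, note) VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (2, "c"), (1, "d")])
rows = conn.execute(
    "SELECT property_id, amendment_id, note FROM amendments ORDER BY rowid").fetchall()
print(rows)  # [(1, 1, 'a'), (1, 2, 'b'), (2, 1, 'c'), (1, 3, 'd')]
```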
2) A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
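The stored-procedure idea, a single write path where callers never supply amendment_id, can be sketched as a Python function over sqlite3 (the function name add_amendment is hypothetical). SQLite's BEGIN IMMEDIATE takes the database write lock up front, which plays the role of the table lock discussed earlier, so the MAX() read and the INSERT cannot interleave with another writer. A MySQL stored procedure would need its own locking to get the same guarantee.

```python
import sqlite3

def add_amendment(conn: sqlite3.Connection, property_id: int, note: str) -> int:
    """Insert a new amendment for property_id and return its amendment_id."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX()
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(amendment_id), 0) + 1 FROM amendments WHERE property_id = ?",
            (property_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO amendments (property_id, amendment_id, note) VALUES (?, ?, ?)",
            (property_id, next_id, note),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return next_id

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions explicitly
conn.execute("""CREATE TABLE amendments (
    property_id INTEGER, amendment_id INTEGER, note TEXT,
    PRIMARY KEY (property_id, amendment_id))""")
print([add_amendment(conn, p, n) for p, n in [(1, "a"), (1, "b"), (2, "c")]])  # [1, 2, 1]
```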
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
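A sketch of the (entity_id, version_number) design with timestamps as version numbers, run on Python's sqlite3 (which has no DATETIME type, so ISO-8601 text is used; in MySQL this would be a DATETIME or TIMESTAMP column). The query that picks the latest version per entity is ordinary SQL and should carry over; the table name records is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE records (
        entity_id      INTEGER NOT NULL,
        version_number TEXT    NOT NULL,  -- ISO-8601 timestamp, sorts chronologically
        payload        TEXT,
        PRIMARY KEY (entity_id, version_number)
    )""")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", [
    (1, "2013-05-15 17:03:49", "first draft"),
    (1, "2013-05-16 09:00:00", "revised"),
    (2, "2013-05-15 12:00:00", "only version"),
])

# Latest version of every entity: join each entity to its MAX(version_number).
LATEST_SQL = """
    SELECT r.entity_id, r.version_number, r.payload
    FROM records AS r
    JOIN (SELECT entity_id, MAX(version_number) AS latest
          FROM records GROUP BY entity_id) AS m
      ON m.entity_id = r.entity_id AND m.latest = r.version_number
    ORDER BY r.entity_id
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```

The composite primary key both enforces uniqueness of each (entity, version) pair and serves as the index for this lookup.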

two auto incrementing fields in one MySQL table

I'm setting up a table that needs two auto-incrementing fields, 'id' and 'member#'.
I'll use AUTO_INCREMENT = 1001 on the latter for new data, as there is old data with member numbers less than 1000.
I'll use 'MAX(id)+1' on the 'id' field to auto-increment it.
But I'm not sure if this will do the job whenever there's an INSERT, or even where to put that bit of code. All I'm trying to do here is auto-increment the field, not SELECTing anything.
And out of curiosity, why is there only one AUTO_INCREMENTing field per table?
Surely, it can't be difficult to code AUTO_INCREMENT_2, AUTO_INCREMENT_3 etc.
All answers and assistance appreciated.
================================
ADDITIONAL INFORMATION AND LINKS
Sorry for the delay in my response, I've been doing additional research.
Ok so to explain further, we have people joining our group via the net. As such we need to assign a unique membership number to each person. Two John Does? Two different membership numbers. For this I've set the member# column as AUTO_INCREMENT, and then AUTO_INCREMENT = 1001 as a table option. Old membership numbers have three digits, new memberships have four. So each time someone registers as a new member on the web, there's an insert command that automatically assigns the next four digit membership number in the series to the new member.
`member#` INT(6) UNSIGNED NOT NULL UNIQUE KEY AUTO_INCREMENT
And as a table option AUTO_INCREMENT = 1001
I hope this is clear. Other situations where someone might want to use a similar strategy could be assigning consecutive invoice numbers, receipt numbers, account numbers, etc. So how does one guarantee a +1 result, i.e. consecutive numbers?
Now we also need a table id column. Lots of tables need a table id. It too needs to be assigned an AUTO_INCREMENT value, in our case, beginning with 1, and incrementing by 1 (the default), to identify and distinguish one row from another. But unfortunately there can be only one AUTO_INCREMENT column per table in MySQL. :-/
So this situation belongs to a class of problems known as MAX+1 problems. (It may also be related to ROW_COUNT and LAST_INSERT_ID solutions.) The limit of a single AUTO_INCREMENT field per table requires a MAX+1 workaround and I am looking for advice on the best way to implement this. For example, is there a way to set this up inside the CREATE TABLE command itself, when defining the id field? Or something else of an equally simple nature, such as writing a function. It is indeed preferable to optimize for efficiency and use only needed features rather than implement a series of commands. Typically a suggested work around might be:
LOCK TABLES membership WRITE;
SELECT COALESCE(MAX(id), 0) + 1 INTO @next_id FROM membership;
INSERT INTO membership (id, firstname, lastname)
VALUES (@next_id, 'jane', 'smith');
UNLOCK TABLES;
Is there something better?
As to whether AUTO_INCREMENT_2/_3... features should exist: well, I'd have to point out that there are a lot of features in MySQL that I'll never use, but obviously someone needs them. Nevertheless, it would be convenient to have this for those (rare) occasions when you might need it. Perhaps there is a distinction to be drawn between having a feature available and using it on any given table. I doubt an unused feature requires much in the way of additional memory or clicks (which are pretty cheap these days anyway).
Some links that may prove useful in understanding this situation:
https://duckduckgo.com/?q=mysql+max%2B1+problems&t=ffab&atb=v1-1&ia=web
Insert and set value with max()+1 problems
Problem with MySql INSERT MAX()+1
https://bugs.mysql.com/bug.php?id=3575
All answers, advice and assistance appreciated.
Each InnoDB table has at most one counter for its auto-increment. This is part of the implementation. If you could define N auto-increment columns, in the same table, it would need more storage space to store N counters. It would require the auto-increment lock to last longer while you incremented N counters.
As for why there is only one per table: sure, it is possible that they could implement it to support more than one, but why?
It would make the implementation a lot more complex, and hinder performance, for cases that 99.99% of apps don't need.
They were trying to solve the needs for the majority of cases. In nearly every case of a table with an auto-increment, one per table is sufficient.
In nearly every case where someone like you thinks they need more than one per table, you'd be wise to step back and reconsider your design.
In MySQL the table structure cannot contain more than one AUTO_INCREMENT field. When you try to create a table with two auto-incremented fields, or alter the table in an attempt to create a second one, the query fails.
AUTO_INCREMENT guarantees that each next value generated in the field will be greater than the previous one in the current connection. But it does NOT guarantee that each next value will be greater than the previous value by exactly 1. The "delta" may be 2 or even 1000; it just cannot be zero or negative.
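One common source of such jumps is that InnoDB does not give an auto-increment value back when the inserting transaction rolls back. The Python sketch below is a toy model of that behavior (the names ToyAutoIncrement and run are hypothetical, and this is not MySQL source code): values are reserved when an insert starts, so a rollback leaves a gap while the committed ids still only ever increase.

```python
class ToyAutoIncrement:
    """Toy model of an auto-increment counter that never reuses a reserved value."""

    def __init__(self, start=1):
        self._next = start

    def allocate(self):
        value = self._next
        self._next += 1
        return value

def run(events, start=1):
    """events: iterable of 'commit' or 'rollback'; returns ids of committed rows."""
    counter = ToyAutoIncrement(start)
    committed = []
    for outcome in events:
        value = counter.allocate()  # reserved before we know whether the insert sticks
        if outcome == "commit":
            committed.append(value)
    return committed

print(run(["commit", "rollback", "commit", "commit"]))  # [1, 3, 4]
```

With start=1001 the same model mirrors the membership table from the question: numbers stay unique and increasing, but a failed registration still consumes one.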

How can one prevent mysql insertion within a particular timestamp interval?

I have a system whereby users can input data into a mysql table from many sites across the globe.
The data is posted via ajax to my table without issues. However, I would like to improve my insertion code to prevent insertion if the timestamp is within some interval. This would weed out duplicate rows in my table.
Before you get angry: I do understand that I can set a primary key on certain columns and prevent duplicate insertion.
In my use case, I need to allow duplicates of the numeric data where they are truly duplicated values from a unique submission; this is valid in my case. I would like to leverage the timestamp to weed out obvious double insertions where the variables were accidentally submitted twice.
I have tried to disable the button for 1-2 seconds, but this hasn't solved the problem entirely.
If I have columns weight, height, country and the timestamp, I'd like to somehow check whether there is an insert within n seconds of the timestamp where the posted data matches these variables. This would tell me that there is an accidental duplication from a user and I shouldn't insert it into the database.
I'm not too familiar with MySQL, so I was hoping to get some guidance here.
Thanks.
There are different solutions, depending on the specifics of your case:
If you need to apply a rule that validates the new row using values inside the row itself, a CHECK constraint will do. Note, though, that MySQL only enforces CHECK constraints starting with version 8.0.16; earlier versions parse and ignore them.
If you want to enforce a rule in relation to other rows, you can serialize the insertions into a queue. The consumer of the queue validates the insertions one by one and accepts or rejects them. Consider that serialization is not a good option for a massive level of insertions, since it produces a bottleneck (this may be your case, since you mention insertions from across the globe).
Alternatively, you can use optimistic insertion, and always insert the row with an intermediate status of "waiting for validation". Then other process(es) can validate the row. If all is good, the row is approved; if not, a compensation procedure is executed, à la microservices.
Which one is your case?
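For the queue option, the consumer's check is simple once submissions are processed one at a time. Here is a Python sketch of that sequential consumer (the queue is modelled as an in-order list, consume is a hypothetical name, and window_seconds is the question's n); in a real system the accepted rows would then be inserted into MySQL by this single consumer.

```python
from typing import Dict, Iterable, List, Tuple

Submission = Tuple[float, float, float, str]  # (timestamp_seconds, weight, height, country)

def consume(queue: Iterable[Submission], window_seconds: float) -> List[Submission]:
    """Sequential queue consumer: accept a submission unless an identical
    (weight, height, country) was accepted less than window_seconds earlier.
    Assumes the queue delivers submissions in timestamp order."""
    last_accepted: Dict[Tuple[float, float, str], float] = {}
    accepted = []
    for ts, weight, height, country in queue:
        key = (weight, height, country)
        previous = last_accepted.get(key)
        if previous is not None and ts - previous < window_seconds:
            continue  # likely an accidental double submit: reject
        last_accepted[key] = ts
        accepted.append((ts, weight, height, country))
    return accepted

incoming = [
    (0.0, 70.0, 180.0, "NZ"),
    (0.8, 70.0, 180.0, "NZ"),  # double click: dropped
    (1.0, 70.0, 180.0, "FR"),  # same numbers, different country: kept
    (9.0, 70.0, 180.0, "NZ"),  # genuine repeat later on: kept
]
print(consume(incoming, window_seconds=2.0))
```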

Maintaining a list of unique values in a database

Let's say you have a random number generator spitting out numbers between 1 and 100 000 000 and you want to store them in a database (MySQL) with the timestamp when they were generated. If a number that has previously been seen comes up, it is discarded.
What would be the best algorithm to make this happen? SELECT then INSERT as necessary? Is there something more efficient?
You can go for a SEQUENCE:
Pros:
- no relations are locked, thus best performance;
- no race conditions;
- portable.
Cons:
- it is possible to get "gaps" in the series of numbers.
You can do a SELECT ... then INSERT ...:
Pros:
- no gaps; you can also do some complicated math on your numbers.
Cons:
- another parallel session can get in between your SELECT and INSERT, and you end up with two equal numbers;
- if there's a UNIQUE constraint, the previous situation will lead to an exception;
- to avoid such a situation, you might go for explicit table locks, but this will cause an immediate performance impact.
You can choose INSERT ... ON DUPLICATE KEY UPDATE, and by now it seems to be the best option (take a look at "INSERT IGNORE" vs "INSERT ... ON DUPLICATE KEY UPDATE"), at least in my view, with the only drawback being that it is not portable to other RDBMSes.
P.S. This article is not related to MySQL, but it is worth reading it to get an overview of all the catches that can happen on your way.
If you don't need to insert a new random value every time, you can use INSERT IGNORE or REPLACE INTO. Otherwise you should SELECT to check and then INSERT.
This would normally be solved by creating a unique index on the random number column in the table. You could experiment to see if a b-tree versus a hash has better performance.
If you have lots of memory, you could pre-populate a table with 100,000,000 rows, one for every possible value. Then, to check whether a number has already been generated, you only need to see if its timestamp is non-null. However, this would require over a gigabyte of RAM to keep the table in memory, and would only be the optimal solution if you are trying to maximize transactions per second.
If you put a UNIQUE index on the column with the extracted numbers any INSERT attempting to duplicate a UNIQUE key will fail.
Therefore the easiest and most portable version will be (PHP code, but you get the idea):
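function extraction(PDO $pdo) {
    // Assumes PDO::ERRMODE_EXCEPTION and a UNIQUE index on extractions.number.
    $stmt = $pdo->prepare('INSERT INTO extractions (number) VALUES (?)');
    while (true) {
        $random = generate_random_number();
        try {
            $stmt->execute([$random]);
            return $random;
        } catch (PDOException $e) {
            // SQLSTATE 23000 (duplicate key): number already extracted, so retry.
            if ($e->getCode() !== '23000') throw $e;
        }
    }
}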

How do I search part of a column?

I have a MySQL table containing 40 million records that is populated by a process over which I have no control. Data is added only once a month. This table needs to be searchable by the Name column, but the name column contains the full name in the format 'Last First Middle'.
In the sphinx.conf, I have
sql_query = SELECT Id, OwnersName, \
    SUBSTRING_INDEX(SUBSTRING_INDEX(OwnersName, ' ', 2), ' ', -1) AS firstname, \
    SUBSTRING_INDEX(OwnersName, ' ', 1) AS lastname \
    FROM table1
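To see what each SUBSTRING_INDEX expression yields for a 'Last First Middle' value, here is a small Python imitation of MySQL's SUBSTRING_INDEX(str, delim, count). It is a re-implementation for illustration, not MySQL itself. It also shows why a count of 2 for lastname returns "Last First", while a count of 1 returns just the last name.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Imitation of MySQL's SUBSTRING_INDEX: count > 0 keeps everything left of
    the count-th delimiter (from the left); count < 0 keeps everything right of
    the |count|-th delimiter (from the right); count == 0 gives ''."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""

name = "Smith John Paul"
print(substring_index(substring_index(name, " ", 2), " ", -1))  # John
print(substring_index(name, " ", 2))                            # Smith John
print(substring_index(name, " ", 1))                            # Smith
```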
How do I use Sphinx search to search by firstname and/or lastname? For example, I would like to be able to search for 'Smith' in the first name only.
Per-row functions in SQL queries are always a bad idea for tables that may grow large. If you want to search on part of a column, it should be extracted out to its own column and indexed.
I would suggest, if you have power over the schema (as opposed to the population process), adding new columns called OwnersFirstName and OwnersLastName along with an insert/update trigger which extracts the relevant information from OwnersName and populates the new columns appropriately.
This means the expense of figuring out the first name is only done when a row is changed, not every single time you run your query. That is the right time to do it.
Then your queries become blindingly fast. And, yes, this breaks 3NF, but most people don't realize that it's okay to do that for performance reasons, provided you understand the consequences. And, since the new columns are controlled by the triggers, the data duplication that would be cause for concern is "clean".
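A sketch of that trigger-maintained design, run on Python's sqlite3 with SQLite trigger syntax. The helper name_part and the index names are hypothetical, and SQLite has no SUBSTRING_INDEX, so the split is done by a Python function registered with create_function. MySQL does not allow a trigger to UPDATE the table it is defined on, so there you would use BEFORE INSERT and BEFORE UPDATE triggers that SET NEW.OwnersFirstName and NEW.OwnersLastName from SUBSTRING_INDEX(...) instead.

```python
import sqlite3

def name_part(name, index):
    """Hypothetical helper: the index-th space-separated token of 'Last First Middle'."""
    parts = (name or "").split()
    return parts[index] if index < len(parts) else None

conn = sqlite3.connect(":memory:")
conn.create_function("name_part", 2, name_part)
conn.executescript("""
CREATE TABLE table1 (
    Id              INTEGER PRIMARY KEY,
    OwnersName      TEXT,
    OwnersLastName  TEXT,
    OwnersFirstName TEXT
);
CREATE INDEX idx_owners_first ON table1 (OwnersFirstName);
CREATE INDEX idx_owners_last  ON table1 (OwnersLastName);

-- Split the name once, when the row is written, instead of on every query.
CREATE TRIGGER table1_split_insert AFTER INSERT ON table1
BEGIN
    UPDATE table1
    SET OwnersLastName = name_part(NEW.OwnersName, 0),
        OwnersFirstName = name_part(NEW.OwnersName, 1)
    WHERE Id = NEW.Id;
END;
CREATE TRIGGER table1_split_update AFTER UPDATE OF OwnersName ON table1
BEGIN
    UPDATE table1
    SET OwnersLastName = name_part(NEW.OwnersName, 0),
        OwnersFirstName = name_part(NEW.OwnersName, 1)
    WHERE Id = NEW.Id;
END;
""")

conn.executemany("INSERT INTO table1 (Id, OwnersName) VALUES (?, ?)",
                 [(1, "Smith John Paul"), (2, "Jones Smith")])
print(conn.execute("SELECT Id FROM table1 WHERE OwnersFirstName = ?", ("Smith",)).fetchall())  # [(2,)]
```

The lookup on OwnersFirstName can now use an ordinary index instead of evaluating a string function on every row.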
Most of the problems people have with databases concern the speed of their queries. Wasting a bit of disk space to gain a large performance improvement is usually okay.
If you have absolutely no power over even the schema, another possibility is to create your own database with the "correct" schema and populate it periodically from the real database. Then query yours. That may involve a fair bit of data transfer every month however so the first option is the better one, if allowed.
Judging by the other answers, I may have missed something... but to restrict a search in Sphinx to a specific field, make sure you're using the extended (or extended2) match mode, and then use the following query string: @firstname Smith.
You could use a substring function to get the parts of the field that you want to search in, but that will slow down the process. The query cannot use any kind of index to do the comparison, so it has to touch each record in the table.
The best approach would be not to store several values in the same field, but to put the name components in three separate fields. When you store more than one value in a field, there are almost always problems accessing the data. I see this over and over in different forums...
This is an intractable problem because full names can contain prefixes, suffixes, middle names or no middle names, composite first and last names with and without hyphens, etc. There is no reasonable way to do this with 100% reliability.