Getting a Unique Identifier without Inserting - mysql

I'm looking for the best way to get a unique number (guaranteed unique, not a random string or the current time in milliseconds, and of a reasonable length, about 8 characters) using MySQL (suggestions for other approaches are welcome).
I basically want to run some SELECT ... statement and have it always return a unique number without inserting anything into the database. Just something that increments a stored value and returns it, and that can handle a lot of concurrent requests without heavily blocking the application.
I know that I can build something from combinations of random numbers in higher bases (for shorter length), which would make an overlap very unlikely, but that won't guarantee uniqueness.
It just feels like there should be some easy way to get this.
To clarify...
I need this number to be short as it will be part of a URL, and it is OK for the query to lock a row for a short period of time. What I was looking for is maybe some command that, under the hood, does something like this ...
LOCK VALUE
RETURN VALUE++
UNLOCK VALUE
Where the VALUE is stored in the database, a MySQL database maybe.

You seek UUID().
http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
mysql> SELECT UUID();
-> '6ccd780c-baba-1026-9564-0040f4311e29'
It will return a 128-bit hexadecimal number. You can munge as necessary.

Is the unique number to be associated with a particular row in a table? If not, why not call rand(): select rand(); The value returned is between zero and one, so scale as desired.

Great question.
Shortest answer - that is simply not possible according to your specifications.
Long answer - the closest approach to this is MySQL's UUID, but that is neither short nor sortable (i.e. a later UUID value is not guaranteed to be greater or smaller than an earlier one).
"To UUID or not to UUID?" is a nice article describing the pros and cons of using them, and it also touches on some of the reasons why you can't have what you need.

I am not sure I understand exactly, maybe something like this:
SELECT ROUND(RAND() * 123456789) as id
The larger you make the number, the larger your id.
No guarantees about uniqueness of course, this is a quick hack after all and you should implement a check in code to handle the off chance a duplicate is inserted, but maybe this would serve your purpose?
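To put a rough number on that "off chance", here is a minimal Python sketch using the standard birthday-problem approximation. The function name and the sample counts are made up for illustration, and the size of the value space is only approximately what ROUND(RAND() * 123456789) can produce.

```python
import math

def collision_probability(num_ids: int, space_size: int) -> float:
    """Approximate chance that at least two of num_ids random draws collide."""
    # Birthday-problem approximation: 1 - exp(-n(n-1) / (2N))
    return 1.0 - math.exp(-num_ids * (num_ids - 1) / (2.0 * space_size))

# Assumption: roughly this many distinct values come out of
# ROUND(RAND() * 123456789).
SPACE = 123456789

for n in (1_000, 10_000, 100_000):
    print(n, round(collision_probability(n, SPACE), 4))
```

With this approximation, a duplicate becomes likely well before the number of generated ids gets anywhere near the size of the value space, which is why the duplicate check in code matters.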
Of course, there are many other approaches possible to do this.
You can easily use most any scripting language to generate this for you, php example here:
//Generates a 32 character identifier that is extremely difficult to predict.
$id = md5(uniqid(rand(), true));
Then use $id in your query or whatever you need your unique id in. In my opinion, the advantage of doing this in a scripting language when interacting with a DB is that it is easier to validate for application / usage purposes and act accordingly. For instance, in your example, whatever method you use, if you wanted to be 100% always sure of data integrity, you have to make sure there are no duplicates of that id elsewhere. This is easier to do in a script than in SQL.
Hope that helps my friend, good-luck!
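For what it's worth, the LOCK VALUE / RETURN VALUE++ / UNLOCK VALUE idea from the question, combined with a higher base to keep the result short, can be sketched in Python like this. It is only an in-process illustration: the Counter class and encode_base62 are hypothetical names, and a real multi-server setup would keep the counter in the database and rely on the database's own locking.

```python
import string
import threading

# 62 symbols: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(number: int) -> str:
    """Turn a non-negative integer into a short base-62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

class Counter:
    """In-process stand-in for LOCK VALUE / RETURN VALUE++ / UNLOCK VALUE."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def next_value(self) -> int:
        # The lock is held only for the read-and-increment step.
        with self._lock:
            current = self._value
            self._value += 1
            return current

counter = Counter(start=1_000_000)
print(encode_base62(counter.next_value()))
```

Eight base-62 characters cover 62^8 values (about 2.18 x 10^14), so a plain incrementing counter stays within the 8-character target for a very long time.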

Related

Force binary in query

I have a field in my database where, the majority of the time, case sensitivity doesn't matter.
However there are specific circumstances where this isn't the case.
So the options as I see it are, add the BINARY keyword to the query
select id from table where BINARY fieldname= 'tesT'
or alter the table to make it a case sensitive field.
The question is, does using the BINARY keyword cause additional overhead on the query, or is this negligible?
(it would actually be used in an update where statement)
update table set field='tesT' where BINARY anotherfield='4Rtrtactually '
The short answer is binary has been recommended in a similar SO question, however whether overhead is negligible depends on other parts.
Part one is how you define negligible. If you mean performance, then define your threshold for a negligible difference in return time and try selects with and without BINARY to see what kind of return times you get (these need to be cases where case won't matter). Follow this up by running EXPLAIN on both to root out any difference.
The second part is that it depends on how big the table is and how the WHERE clause column is indexed. Generally speaking, the more rows there are, the more prominent any difference will be. So if you're looking at, say, ten records in the table, then it's probably nothing, given you're doing a single-table update. With ten million records there is a better chance of it being noticeable.
The third part is based on Yann Neuhaus' comment in the MySQL 5.7 reference guide. It might depend on exactly where you put the BINARY keyword. Assuming his findings are accurate, putting BINARY on the constant side of the equals sign might be faster (or at least different). So test a rewrite of your update to:
update table set field='tesT' where anotherfield = BINARY '4Rtrtactually '
To be honest, I have my doubts about this third part, but I've seen weirder MySQL magic. I'm kind of curious what the results would be for your scenario if you test it out.
Final part might be what type of collation you are using. Based on your question it seems you are using case insensitive collation. Depending on how you are using this data you might want to look into case sensitive or binary collation. This SO question was educational for me on this subject.

MySQL - query by number or letter?

I need to set values to a "Yes or No" column name STATUS. And I'm thinking about 2 methods.
method 1 (use letter): set value Y/N then find all rows that have value Y in field STATUS by a query like:
SELECT * FROM post WHERE status="Y"
method 2 (use number): set value 1/0 then find all rows that have value 1 in field STATUS by a query like:
SELECT * FROM post WHERE status=1
Should I use method 1 or method 2? Which one is faster? Which one is better?
The two are essentially equivalent, so this becomes a question of which is better for your application.
If you are concerned about space, then the smallest space for one character is char(1), using 8 bits. With a number, you can use bit or set types to pack multiple flags. But this only makes a difference if you have lots of flags.
The store-it-as-a-number approach has a slight advantage, where you can count the "Yes" values by doing:
select sum(status)
(Of course, in MySQL, this is only a marginal improvement on sum(status = 'Y').)
The store-it-as-a-letter approach has a slight advantage if you decide to include "Maybe" or other values at some point in the future.
Finally, any difference in performance in different ways of representing these values is going to be very, very minimal. You would need a table with millions and millions of rows to start to notice a problem. So, use the mechanism that works best for your application and way of representing the value.
The second one is definitely faster, primarily because whenever you involve something within quotes, it is meaningless to SQL. It would be better to use non-string types in order to get better performance. I would suggest using METHOD 2.
The fastest way would be:
SELECT * FROM post WHERE `status` = FIND_IN_SET(`status`,'y');
I think you should create the column as ENUM('n','y'). MySQL stores this type in an optimal way. It will also help you to store only allowed values in the field.
You can also make it more human-friendly with ENUM('no','yes') without affecting performance, because the strings 'no' and 'yes' are stored only once, in the ENUM definition. MySQL stores only the index of the value per row.
I think method 1 is better if you are concerned with storage.
Storing an integer (i.e. 1/0) in an INT column takes 4 bytes, whereas a single character takes only 1 byte, so on that comparison method 1 is better. (A TINYINT column would also take only 1 byte.)
This may improve performance somewhat.

MySQL: SELECTing by hash: is this possible?

I don't think it makes too much sense, although this way you could keep the real static value out of the .php file and keep only its hash value there for the MySQL query. The source of the PHP file can't be reached from the user's machine, but you have to make backups of your files, and that static value is in them. Selecting by a hash of the column would resolve this problem, I believe.
But, I didn't find any examples or documentation saying that it's possible to use such functions in queries (not for values in sql queries, but for columns to select).
Is this possible?
An extremely slow query that simply selects all rows with an empty "column".
SELECT * FROM table WHERE MD5(column) = 'd41d8cd98f00b204e9800998ecf8427e'
If you're doing a lot of these queries, consider saving the MD5 hash in a column or index. Even better would be to do all MD5 calculations on the script's end - the day you're going to need an extra server for your project you'll notice that webservers scale a lot better than database servers. (That's something to worry about in the future, of course)
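As an illustration of doing the MD5 calculation on the script's end, here is a small sketch in Python rather than PHP. The md5_hex helper is a hypothetical name; the hash in the query above is the MD5 of the empty string, which the last line checks.

```python
import hashlib

def md5_hex(value: str) -> str:
    """Hex MD5 digest of a string, computed in the application."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# The literal used in the query above is the MD5 of the empty string.
empty_hash = md5_hex("")
print(empty_hash == "d41d8cd98f00b204e9800998ecf8427e")
```

The resulting string can then be passed to the database as an ordinary query parameter and compared against a stored hash column, so the server does not have to hash every row.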
It should be noted that setting up your system this way won't actually solve any problem in your particular case. You are not making your system more secure doing this, you are just making it more convoluted.
The standard way to hide secret values from the source base is to store these secret values in a separate file, and never submit that file to source control or make a backup of it. Load the value of the secret by using php code and then work with the value directly in MySQL (one way to do this is to store a "config.php" file or something along those lines that just sets variables/constants, and then just php-include the file).
That said, I'll answer your question anyway.
MySQL actually has a wide variety of hashing and encryption functions. See http://dev.mysql.com/doc/refman/5.0/en/encryption-functions.html
Since you tagged your question md5 I'm assuming the function you're looking for is MD5: http://dev.mysql.com/doc/refman/5.0/en/encryption-functions.html#function_md5
You select it just like this:
SELECT MD5(column) AS hashed_column FROM table
Then the value to compare to the hash will be in the column alias 'hashed_column'.
Or to select a particular row based on the hash:
SELECT * FROM table WHERE MD5(column) = 'hashed-value-to-compare'
If I understand correctly, you want to use a hash as a primary key:
INSERT INTO MyTable (pk) VALUES (MD5('plain-value'));
Then you want to retrieve it by hash without knowing what its hash digest is:
SELECT * FROM MyTable WHERE pk = MD5('plain-value');
Somehow this is supposed to provide greater security in case people steal a backup of your database and PHP code? Well, it doesn't. If I know the original plain-value and the method of hashing, I can find the data just as easily as if you didn't hash the value.
I agree with the comment from @scunliffe -- we're not sure exactly what problem you're trying to solve, but it sounds like this method will not solve it.
It's also inefficient to use an MD5 hash digest as a primary key. You have to store it in a CHAR(32), or else UNHEX it and store it in BINARY(16). Regardless, you can't use INT or even BIGINT as the primary key datatype. The key values are more bulky, and therefore make larger indexes.
Also new rows will insert in an arbitrary location in the clustered index. That's more expensive than adding new values to the end of the B-tree, as you would do if you used simple auto-incrementing integers like everyone else.
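To make the CHAR(32) versus BINARY(16) point concrete, here is a quick Python check of the two representations of the same digest. bytes.fromhex plays roughly the role that UNHEX plays in MySQL here; the input string is just the placeholder from the queries above.

```python
import hashlib

digest = hashlib.md5(b"plain-value")

hex_form = digest.hexdigest()  # text form, what CHAR(32) would hold
raw_form = digest.digest()     # raw bytes, what BINARY(16) would hold

print(len(hex_form), len(raw_form))
# Converting the hex text back to bytes gives the raw digest again.
print(bytes.fromhex(hex_form) == raw_form)
```

Either way the key is 16 or 32 bytes wide, compared with 4 bytes for an INT or 8 for a BIGINT.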

generate mysql ids as a string instead of an integer

I am sure this is called something that I don't know the name of.
I want to generate ids like:
59AA307E-94C8-47D1-AA50-AAA7500F5B54
instead of the standard auto incremented number.
It doesn't have to be exactly like that, but would like a long unique string value for it.
Is there an easy way to do this?
I want to do it to reference attachments so they are not easily guessed, like attachment=1
I know there are ways around that, but I figure the string-based id would be better if possible, and I'm sure I am just not searching for the right thing.
Thank you
Last time I checked, you can't specify UUID() as the default constraint for a column in MySQL. That means using a trigger:
CREATE TRIGGER newid
BEFORE INSERT ON your_table_name
FOR EACH ROW
SET NEW.id = UUID();
"I know there are ways around that, but I figure the string based id would be better"
I understand you're after security by obscurity, but be aware that CHAR/VARCHAR columns longer than 4 characters take more space than an INT does (4 bytes). This will impact performance in retrieval and JOINs.
You could always just pass the regular auto_increment values through SHA1() or MD5() whenever it comes time to send it out to the "public". With a decent salting string before/after the ID value, it'd be pretty much impossible to guess what the original number was. You wouldn't get a fancy looking string like a UUID, but you'd still have a regular integer ID value to deal with internally.
If you're worried about the extra CPU time involved in repeatedly hashing the column, you can always store the hash value in a separate field when the record's created.
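A minimal sketch of that salted-hash idea, in Python for illustration. It uses HMAC-SHA1 from the standard library as one way of mixing in the salt, which is a swap for the plain SHA1()/MD5() call described above; SECRET_SALT and public_token are hypothetical names.

```python
import hashlib
import hmac

# Assumption: a secret salt kept out of source control.
SECRET_SALT = b"change-me"

def public_token(record_id: int) -> str:
    """Derive a hard-to-guess public string from an internal integer id."""
    message = str(record_id).encode("ascii")
    return hmac.new(SECRET_SALT, message, hashlib.sha1).hexdigest()

print(public_token(1))
print(public_token(2))
```

The token is deterministic, so it can be computed once and stored next to the integer id, then looked up when a request arrives with the token in the URL.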

How does a hash table work? Is it faster than "SELECT * from .."

Let's say, I have :
Key | Indexes | Key-values
----+---------+-----------
001 | 100001  | Alex
002 | 100002  | Micheal
003 | 100003  | Daniel
Let's say we want to search for 001. How do we do a fast search using a hash table?
Isn't it the same as using "SELECT * from .. " in MySQL? I have read a lot, and they say that "SELECT *" searches from beginning to end, but a hash table does not. Why, and how?
By using hash table, are we reducing the records we are searching? How?
Can anyone demonstrate how to insert and retrieve hash table process in mysql query code? e.g.,
SELECT * from table1 where hash_value="bla" ...
Another scenario:
If the indexes are like S0001, S0002, T0001, T0002, etc. In mysql i could use:
SELECT * from table WHERE value = S*
isn't it the same and faster?
A simple hash table works by keeping the items on several lists, instead of just one. It uses a very fast and repeatable (i.e. non-random) method to choose which list to keep each item on. So when it is time to find the item again, it repeats that method to discover which list to look in, and then does a normal (slow) linear search in that list.
By dividing the items up into 17 lists, the search becomes roughly 17 times faster on average, which is a good improvement.
Although of course this is only true if the lists are roughly the same length, so it is important to choose a good method of distributing the items between the lists.
In your example table, the first column is the key, the thing we need to find the item. And let's suppose we will maintain 17 lists. To insert something, we perform an operation on the key called hashing. This just turns the key into a number. It doesn't return a random number, because it must always return the same number for the same key. But at the same time, the numbers must be "spread out" widely.
Then we take the resulting number and use modulus to shrink it down to the number of lists:
Hash(key) % 17
This all happens extremely fast. Our lists are in an array, so:
_lists[Hash(key) % 17].Add(record);
And then later, to find the item using that key:
Record found = _lists[Hash(key) % 17].Find(key);
Note that each list can just be any container type, or a linked list class that you write by hand. When we execute a Find in that list, it works the slow way (examine the key of each record).
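Here is the same idea as a small runnable Python sketch. The SimpleHashTable class and its method names are made up for illustration, and Python's built-in hash() stands in for the Hash function; it is repeatable within one run of the program, which is all this needs.

```python
from typing import Any, List, Optional, Tuple

NUM_LISTS = 17

class SimpleHashTable:
    """Keeps records on several lists and picks the list by hashing the key."""

    def __init__(self, num_lists: int = NUM_LISTS) -> None:
        self._lists: List[List[Tuple[str, Any]]] = [[] for _ in range(num_lists)]

    def _index(self, key: str) -> int:
        # Hash the key first, then shrink the result to the number of lists.
        return hash(key) % len(self._lists)

    def add(self, key: str, record: Any) -> None:
        bucket = self._lists[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # replace an existing entry
                return
        bucket.append((key, record))

    def find(self, key: str) -> Optional[Any]:
        # Slow linear search, but only within one of the lists.
        for existing_key, record in self._lists[self._index(key)]:
            if existing_key == key:
                return record
        return None

table = SimpleHashTable()
table.add("001", "Alex")
table.add("002", "Micheal")
table.add("003", "Daniel")
print(table.find("001"))
```

Only the one list selected by the hash is scanned, so the cost of a lookup depends on how long that list is, not on the total number of records.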
Do not worry about what MySQL is doing internally to locate records quickly. The job of a database is to do that sort of thing for you. Just run a SELECT [columns] FROM table WHERE [condition]; query and let the database generate a query plan for you. Note that you don't want to use SELECT *, since if you ever add a column to the table that will break all your old queries that relied on there being a certain number of columns in a certain order.
If you really want to know what's going on under the hood (it's good to know, but do not implement it yourself: that is the purpose of a database!), you need to know what indexes are and how they work. If a table has no index on the columns involved in the WHERE clause, then, as you say, the database will have to search through every row in the table to find the ones matching your condition. But if there is an index, the database will search the index to find the exact location of the rows you want, and jump directly to them. Indexes are usually implemented as B+-trees, a type of search tree that uses very few comparisons to locate a specific element. Searching a B-tree for a specific key is very fast. MySQL is also capable of using hash indexes, but these tend to be slower for database uses. Hash indexes usually only perform well on long keys (character strings especially), since they reduce the size of the key to a fixed hash size. For data types like integers and real numbers, which have a well-defined ordering and fixed length, the easy searchability of a B-tree usually provides better performance.
You might like to look at the chapters in the MySQL manual and PostgreSQL manual on indexing.
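As a rough illustration of why an ordered index suits lookups like the S0001/S0002 prefix search in the question, here is a Python sketch that uses a sorted list and binary search as a stand-in for a B-tree. It is not how MySQL implements its indexes; prefix_range is a hypothetical helper.

```python
import bisect
from typing import List

def prefix_range(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return every key that starts with prefix, using binary search."""
    low = bisect.bisect_left(sorted_keys, prefix)
    # Appending the highest code point gives an upper bound for the prefix.
    high = bisect.bisect_left(sorted_keys, prefix + chr(0x10FFFF))
    return sorted_keys[low:high]

keys = sorted(["T0002", "S0001", "T0001", "S0002"])
print(prefix_range(keys, "S"))
```

Because the keys are kept in order, all matches sit next to each other and can be found with two binary searches. A hash table scatters them across buckets, so it only helps with exact-match lookups.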
http://en.wikipedia.org/wiki/Hash_table
Hash tables may be used as in-memory data structures. Hash tables may also be adopted for use with persistent data structures; database indices sometimes use disk-based data structures based on hash tables, although balanced trees are more popular.
I guess you could use a hash function to get the ID you want to select from. Like
SELECT * FROM table WHERE value = hash_fn(whatever_input_you_build_your_hash_value_from)
Then you don't need to know the id of the row you want to select and can do an exact query, since you know that the row will always have the same id because of the input you build the hash value from, and you can always recreate this id through the hash function.
However, this isn't always true, depending on the size of the table and the maximum number of hash values (you often have "X mod hash-table-size" somewhere in your hash). To take care of this you should have a deterministic strategy that you use each time you get two values with the same id. You should check Wikipedia for more info on this strategy; it's called collision handling and should be mentioned in the same article as hash tables.
MySQL probably uses hashtables somewhere because of the O(1) feature norheim.se (up) mentioned.
Hash tables are great for locating entries at O(1) cost where the key (that is used for hashing) is already known. They are in widespread use both in collection libraries and in database engines. You should be able to find plenty of information about them on the internet. Why don't you start with Wikipedia or just do a Google search?
I don't know the details of mysql. If there is a structure in there called "hash table", that would probably be a kind of table that uses hashing for locating the keys. I'm sure someone else will tell you about that. =)
EDIT: (in response to comment)
Ok. I'll try to make a grossly simplified explanation: A hash table is a table where the entries are located based on a function of the key. For instance, say that you want to store info about a set of persons. If you store it in a plain unsorted array, you would need to iterate over the elements in sequence in order to find the entry you are looking for. On average, this will need N/2 comparisons.
If, instead, you put all entries at indexes based on the first character of the person's first name (A=0, B=1, C=2 etc.), you will immediately be able to find the correct entry as long as you know the first name. This is the basic idea. You probably realize that some special handling (rehashing, or allowing lists of entries) is required in order to support multiple entries having the same first letter. If you have a well-dimensioned hash table, you should be able to get straight to the item you are searching for. This means approximately one comparison, with the disclaimer of the special handling I just mentioned.
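A tiny Python sketch of that comparison, counting how many entries are examined with a plain list versus first-letter buckets. The names in the list and the helper functions are invented for the example, and the buckets use the "allow lists of entries" form of special handling.

```python
from collections import defaultdict
from typing import Dict, List

names = ["Alex", "Michael", "Daniel", "Anna", "Maria", "David"]

def linear_comparisons(items: List[str], target: str) -> int:
    """Number of entries examined by a plain front-to-back search."""
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)

# Bucket the entries by the first character of the name.
buckets: Dict[str, List[str]] = defaultdict(list)
for name in names:
    buckets[name[0]].append(name)

def bucket_comparisons(target: str) -> int:
    """Number of entries examined when only the matching bucket is searched."""
    return linear_comparisons(buckets[target[0]], target)

for name in names:
    print(name, linear_comparisons(names, name), bucket_comparisons(name))
```

With only three distinct first letters the buckets still hold two entries each, which is the "special handling" case; a hash function that spreads keys more evenly gets closer to one comparison per lookup.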