Generate MySQL IDs as a string instead of an integer

I am sure this is called something that I don't know the name of.
I want to generate ids like:
59AA307E-94C8-47D1-AA50-AAA7500F5B54
instead of the standard auto incremented number.
It doesn't have to be exactly like that, but would like a long unique string value for it.
Is there an easy way to do this?
I want to do it to reference attachments so the references are not easily guessed, unlike attachment=1.
I know there are ways around that, but I figure the string-based id would be better if possible, and I'm sure I am just not searching for the right thing.
Thank you

Last time I checked, you can't specify UUID() as the default for a column in MySQL (MySQL 8.0.13 and later do accept an expression default such as DEFAULT (UUID())). On older versions that means using a trigger:
CREATE TRIGGER newid
BEFORE INSERT ON your_table_name
FOR EACH ROW
SET NEW.id = UUID();
"I know there are ways around that, but I figure the string based id would be better"
I understand you're after security by obscurity, but be aware that a CHAR/VARCHAR column holding more than 4 characters takes more space than an INT does (4 bytes); a UUID stored as CHAR(36) needs at least 36 bytes. This will impact performance in retrieval and JOINs.

You could always just pass the regular auto_increment values through SHA1() or MD5() whenever it comes time to send them out to the "public". With a decent salting string before/after the ID value, it'd be pretty much impossible to guess what the original number was. You wouldn't get a fancy-looking string like a UUID, but you'd still have a regular integer ID value to deal with internally.
If you're worried about the extra CPU time involved in repeatedly hashing the column, you can always store the hash value in a separate field when the record is created.
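A minimal sketch of the salted-hash idea, written in Python rather than SQL so it can be run on its own. It swaps in HMAC-SHA256 for the plain salted SHA1()/MD5() mentioned above; the names SECRET_SALT, public_token and matches are made up for illustration, and the salt value is a placeholder.

```python
import hashlib
import hmac

# Hypothetical secret salt; in practice load it from configuration, not source code.
SECRET_SALT = b"change-me-to-a-long-random-value"


def public_token(attachment_id: int) -> str:
    """Return a hex token derived from the integer ID and the secret salt."""
    message = str(attachment_id).encode("ascii")
    return hmac.new(SECRET_SALT, message, hashlib.sha256).hexdigest()


def matches(attachment_id: int, token: str) -> bool:
    """Check a token received from the public side against a known ID."""
    return hmac.compare_digest(public_token(attachment_id), token)
```

The token could be stored in its own indexed column when the row is created, so a lookup by token does not have to rehash every ID.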

Related

Tradeoff between using a string or int for column value

I'm making a database table where one of the columns is type. This is the type of thing that's being stored into this row.
Since this software is open source, I have to consider other people using it. I can use an int, which would theoretically be smaller to save in the database as well as much faster on lookup, but then I would have to have some documentation and it would make things more confusing for my users. The other option is to use a string, which takes up much more space and is slower on lookup.
Assuming this table will handle thousands of rows per day, it can reach the point of being unscalable quickly if I select the wrong data type.
Is using int always preferred in this case, when there are many millions of rows potentially in the database?
You are correct, INT is faster and therefore the better choice.
If you are concerned about future developers, add comments to the column explaining each value. If there are a lot of values, consider using a lookup table, so you can ask for a string, get its numeric ID (a little bit like a constant) and then look for that.
Like this
id | id_name
---|------------
1 | TYPE_ALPHA
2 | TYPE_BETA
3 | TYPE_DELTA
Now you have a literal explanation of the IDs. Just fetch the ID (WHERE id_name = 'TYPE_ALPHA') and then use that to filter your table.
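A rough sketch of that lookup-table pattern, using Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The table names item_type and item, the label column and the type_id helper are assumptions for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_type (
        id      INTEGER PRIMARY KEY,
        id_name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE item (
        item_id INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL REFERENCES item_type(id),
        label   TEXT
    );
    INSERT INTO item_type (id, id_name)
        VALUES (1, 'TYPE_ALPHA'), (2, 'TYPE_BETA'), (3, 'TYPE_DELTA');
    INSERT INTO item (type_id, label)
        VALUES (1, 'first'), (2, 'second'), (1, 'third');
""")


def type_id(name):
    """Resolve a readable type name to its integer ID, like a constant."""
    row = conn.execute(
        "SELECT id FROM item_type WHERE id_name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]


# Fetch the ID once, then filter the big table on the integer column.
alpha = type_id("TYPE_ALPHA")
labels = [
    r[0]
    for r in conn.execute(
        "SELECT label FROM item WHERE type_id = ? ORDER BY item_id", (alpha,)
    )
]
```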
Perhaps a happy medium between the two solutions, however, is to use the ENUM data type. See the MySQL documentation for ENUM.
If my understanding of ENUM is correct, it lets you read and compare the field as a string while storing the actual data as a small integer index. If you try to store a string that isn't defined in the table schema, MySQL rejects it with an error in strict SQL mode (otherwise it stores an empty string and raises a warning); if the string is defined, MySQL uses the integer equivalent without ever showing it. This provides both speed and readability.

Reverse column contents to utilize index?

Based on the query I'm running now I assume this is a pipe dream:
I have an index on a column that contains a string id. Those IDs have an identifier on the end, so to capture the data I need I need to pattern match like so:
key LIKE '%racecar'
Since you can't take advantage of an index with the wildcard starting the string, I was hoping I could do this:
reverse(key) LIKE 'racecar%'
But, this would mean MySQL has to look at, and perform a function on, every single row anyway, is that correct? Any other ways to get around this issue short of changing the naming conventions?
This smells like bad DB design. Split the string and the id into two separate columns and the problem (and many other problems) will be automatically solved.
I also doubt the order of the string and the id will make a difference to MySQL with respect to performance.
Also keep in mind that you have an index over key, but this does not mean you have an index over the reverse of key, which is the reason you get no speedup at all. I believe that given the situation the performance is beyond salvation.
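For what an index over the reverse of the key could look like, here is a small sketch that stores a reversed copy in its own indexed column, so a suffix search becomes a prefix search. It uses Python's sqlite3 as a stand-in for MySQL; the names items, item_key and item_key_rev are hypothetical, and whether the index is actually used depends on the database engine and its collation settings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_key TEXT NOT NULL, item_key_rev TEXT NOT NULL)"
)
# The index is on the reversed copy, which is what the prefix search targets.
conn.execute("CREATE INDEX idx_items_rev ON items (item_key_rev)")


def insert_item(key):
    """Store the key together with its reversed copy."""
    conn.execute("INSERT INTO items VALUES (?, ?)", (key, key[::-1]))


def find_by_suffix(suffix):
    """Find keys ending in `suffix` via a prefix match on the reversed column.

    Note: LIKE wildcards (% and _) inside `suffix` are not escaped here.
    """
    prefix = suffix[::-1]
    rows = conn.execute(
        "SELECT item_key FROM items WHERE item_key_rev LIKE ? ORDER BY item_key",
        (prefix + "%",),
    )
    return [r[0] for r in rows]
```

This still duplicates data; splitting the identifier into its own column, as suggested above, avoids the reversed copy altogether.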

Getting an Unique Identifier without Inserting

I'm looking for the best way to get a unique number using MySQL (suggestions for other ways are welcome). It must be guaranteed unique, not some random string or the current time in milliseconds, and of a reasonable length, about 8 characters.
I basically want to run some SELECT ... statement and have it always return a unique number without inserting anything into the database. Just something that increments some stored value and returns the value, and that can handle a lot of requests concurrently without heavy blocking of the application.
I know that I can make something with combinations of random numbers with higher bases (for shorter length), that could make it very unlikely that they overlap, but won't guarantee it.
It just feels like there should be some easy way to get this.
To clarify...
I need this number to be short as it will be part of a URL, and it is OK for the query to lock a row for a short period of time. What I was looking for is maybe some command that under the hood does something like this ...
LOCK VALUE
RETURN VALUE++
UNLOCK VALUE
Where the VALUE is stored in the database, a MySQL database maybe.
You seek UUID().
http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
mysql> SELECT UUID();
-> '6ccd780c-baba-1026-9564-0040f4311e29'
It returns a 128-bit number formatted as a 36-character hexadecimal string. You can munge it as necessary.
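As an illustration of what "munging" might mean, this Python sketch strips the dashes and re-encodes the 128-bit value in base 36 to shorten it. It uses uuid.uuid4() (random) rather than the time-based version 1 value that MySQL's UUID() produces; ALPHABET and to_base36 are made-up helper names.

```python
import uuid

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n):
    """Encode a non-negative integer using digits 0-9 and letters a-z."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


u = uuid.uuid4()
hex32 = u.hex               # the same value without dashes: 32 hex digits
compact = to_base36(u.int)  # at most 25 characters for a 128-bit value
```

Even in base 36 the result is far longer than 8 characters, so this shortens the string but does not make it short.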
Is the unique number to be associated with a particular row in a table? If not, why not call rand(): select rand(); The value returned is between zero and one, so scale as desired.
Great question.
Shortest answer - that is simply not possible according to your specifications.
Long answer - the closest approach to this is MySQL's UUID, but that is neither short nor sortable (i.e. a later UUID value is not guaranteed to be greater than an earlier one).
"To UUID or not to UUID?" is a nice article describing the pros and cons of their usage, also touching on some of the reasons why you can't have what you need.
I am not sure I understand exactly, maybe something like this:
SELECT ROUND(RAND() * 123456789) as id
The larger you make the number, the larger your id.
No guarantees about uniqueness of course, this is a quick hack after all and you should implement a check in code to handle the off chance a duplicate is inserted, but maybe this would serve your purpose?
Of course, there are many other approaches possible to do this.
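One such approach, sketched here under assumptions, is the lock / increment / unlock idea from the question: a single counter row updated inside a short transaction. Python's sqlite3 stands in for MySQL, and the counter table, the 'url_id' row and next_id are hypothetical names. In MySQL a similar effect is commonly achieved with UPDATE ... SET value = LAST_INSERT_ID(value + 1) followed by SELECT LAST_INSERT_ID().

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# transactions can be opened and closed explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE counter (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counter VALUES ('url_id', 0)")


def next_id(name="url_id"):
    """Increment the stored value and return it, holding the lock only briefly."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.execute(
            "UPDATE counter SET value = value + 1 WHERE name = ?", (name,)
        )
        (value,) = conn.execute(
            "SELECT value FROM counter WHERE name = ?", (name,)
        ).fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Unlike the RAND() hack, this does write to the database, but every value is handed out exactly once and the numbers stay short.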
You can easily use almost any scripting language to generate this for you. PHP example:
//Generates a 32 character identifier that is extremely difficult to predict.
$id = md5(uniqid(rand(), true));
Then use $id in your query or whatever you need your unique id in. In my opinion, the advantage of doing this in a scripting language when interacting with a DB is that it is easier to validate for application / usage purposes and act accordingly. For instance, in your example, whatever method you use, if you wanted to be 100% always sure of data integrity, you have to make sure there are no duplicates of that id elsewhere. This is easier to do in a script than in SQL.
Hope that helps my friend, good-luck!

MySQL: SELECTing by hash: is this possible?

I don't think it makes too much sense. Still, this way you could keep the real static value out of the .php file and keep only its hash there for the MySQL query. The source of a PHP file can't be reached from a user's machine, but you have to make backups of your files, and that static value is in them. Selecting by the hash of a column would resolve this problem, I believe.
But, I didn't find any examples or documentation saying that it's possible to use such functions in queries (not for values in sql queries, but for columns to select).
Is this possible?
An extremely slow query that simply selects all rows with an empty "column".
SELECT * FROM table WHERE MD5(column) = 'd41d8cd98f00b204e9800998ecf8427e'
If you're doing a lot of these queries, consider saving the MD5 hash in a column or index. Even better would be to do all MD5 calculations on the script's end - the day you're going to need an extra server for your project you'll notice that webservers scale a lot better than database servers. (That's something to worry about in the future, of course)
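A small sketch of both suggestions together: the MD5 is computed on the script's end and saved in an indexed column, so the query compares stored values instead of hashing every row. Python's sqlite3 stands in for MySQL here, and records, value_md5, md5_hex, insert_record and find_by_hash are hypothetical names.

```python
import hashlib
import sqlite3


def md5_hex(text):
    """Compute the MD5 hex digest in the application instead of in SQL."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (value TEXT NOT NULL, value_md5 TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_records_md5 ON records (value_md5)")


def insert_record(value):
    """Store the value together with its precomputed hash."""
    conn.execute("INSERT INTO records VALUES (?, ?)", (value, md5_hex(value)))


def find_by_hash(digest):
    """Look up rows by the stored hash column; no per-row hashing is needed."""
    rows = conn.execute(
        "SELECT value FROM records WHERE value_md5 = ?", (digest,)
    )
    return [r[0] for r in rows]
```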
It should be noted that setting up your system this way won't actually solve any problem in your particular case. You are not making your system more secure doing this, you are just making it more convoluted.
The standard way to hide secret values from the source base is to store these secret values in a separate file, and never submit that file to source control or make a backup of it. Load the value of the secret by using PHP code and then work with the value directly in MySQL (one way to do this is to store a "config.php" file or something along those lines that just sets variables/constants, and then just php-include the file).
That said, I'll answer your question anyway.
MySQL actually has a wide variety of hashing and encryption functions. See http://dev.mysql.com/doc/refman/5.0/en/encryption-functions.html
Since you tagged your question md5 I'm assuming the function you're looking for is MD5: http://dev.mysql.com/doc/refman/5.0/en/encryption-functions.html#function_md5
You select it just like this:
SELECT MD5(column) AS hashed_column FROM table
Then the value to compare to the hash will be in the column alias 'hashed_column'.
Or to select a particular row based on the hash:
SELECT * FROM table WHERE MD5(column) = 'hashed-value-to-compare'
If I understand correctly, you want to use a hash as a primary key:
INSERT INTO MyTable (pk) VALUES (MD5('plain-value'));
Then you want to retrieve it by hash without knowing what its hash digest is:
SELECT * FROM MyTable WHERE pk = MD5('plain-value');
Somehow this is supposed to provide greater security in case people steal a backup of your database and PHP code? Well, it doesn't. If I know the original plain-value and the method of hashing, I can find the data just as easily as if you didn't hash the value.
I agree with the comment from @scunliffe -- we're not sure exactly what problem you're trying to solve, but it sounds like this method will not solve it.
It's also inefficient to use an MD5 hash digest as a primary key. You have to store it in a CHAR(32), or else UNHEX it and store it in BINARY(16). Regardless, you can't use INT or even BIGINT as the primary key datatype. The key values are more bulky, and therefore make larger indexes.
Also new rows will insert in an arbitrary location in the clustered index. That's more expensive than adding new values to the end of the B-tree, as you would do if you used simple auto-incrementing integers like everyone else.
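To make the size difference concrete, here is a short Python check of the two storage forms mentioned above. The variable names are illustrative only; bytes.fromhex plays the role of MySQL's UNHEX.

```python
import hashlib

digest = hashlib.md5(b"plain-value")

as_char32 = digest.hexdigest()  # 32 hex characters, what CHAR(32) would hold
as_binary16 = digest.digest()   # 16 raw bytes, what BINARY(16) would hold

# Converting the hex form back to raw bytes, as UNHEX would do.
unhexed = bytes.fromhex(as_char32)
```

Either form is wider than the 4 bytes of an INT or the 8 bytes of a BIGINT, and every secondary index on an InnoDB table carries a copy of the primary key.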

MySQL: optimal column type for searching

I've been inserting some numbers as INT UNSIGNED in a MySQL database. I search this column using "SELECT ... FROM tablename WHERE A LIKE 'B'". I'm coming across some number formats that are either too long for an unsigned integer or have dashes in them, like 123-456-789.
What are some good options for modifying the table here? I see two options (are there others?):
1. Make another column (VARCHAR(50)) to store numbers with dashes. When a search query detects numbers with dashes, look in this new column.
2. Recreate the table using a VARCHAR(50) instead of unsigned integer for the column in question.
I'm not sure which way is the better in terms of (a) database structure and (b) search speed. I'd love some inputs on this. Thank you.
Update: I guess I should have included more info.
These are order numbers. The numbers without dashes are for one store (A), and the one with dashes are for Amazon (B; 13 or 14 digits I think with two dashes). A's order numbers should be sortable. I'm not sure if B has to be since the numbers don't mean anything to me really (just a unique number).
If I remove the dashes and put them all together as big int, will there be any decrease in performance in the search queries?
The most important question is how you would like to use the data. What do you need? If you make it a VARCHAR and then want to sort it as a number, you will not be able to, since it will be treated as a string.
You can always consider BIGINT, but the question is: do you need the dashes, or can you just ignore them at the application level? If you need them, it means you need VARCHAR. In that case it might make sense to have two columns if you want to be able to, for example, sort them as numbers or perform any calculations. Otherwise one column probably makes more sense.
You should really provide more context about the problem.
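A quick Python illustration of the sorting point, assuming the dashes carry no meaning and can be stripped at the application level. The sample order numbers and the normalize helper are invented for the example.

```python
def normalize(order_number: str) -> int:
    """Drop the dashes and treat the order number as an integer."""
    return int(order_number.replace("-", ""))


orders = ["9", "10", "123-456-789", "100"]

# Sorted as text: compared character by character, so "10" sorts before "9".
as_text = sorted(orders)

# Sorted by numeric value after stripping the dashes.
as_numbers = sorted(orders, key=normalize)
```

If the original dashed form has to be shown again later, it would need to be stored as well, since normalize throws the dash positions away.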
MySQL has PROCEDURE ANALYSE (removed in MySQL 8.0), which suggests optimal column types based on your existing data.
Given that you are mainly running queries with WHERE A LIKE 'B', you can also try full-text search if "A" varies a lot.
I think option 2 makes the most sense. Just add a new column as varchar(50), put everything in the int column into that varchar, and drop the int. Having 2 separate columns to maintain just isn't a good idea.