For security purposes, I run some queries like this:
SELECT avatar_data FROM users WHERE MD5(ID) ='md5value'
For example, I have this entry:
Table: users
ID | avatar_data
39 | some-data
I run this query:
SELECT avatar_data FROM users WHERE MD5(ID) ='d67d8ab4f4c10bf22aa353e27879133c'
'd67d8ab4f4c10bf22aa353e27879133c' is the MD5 hash of '39'.
I have a very large database with a lot of entries. Could this approach hurt database performance?
Because you are applying a function to the column you search on (MD5(ID) = ...), MySQL cannot use an index on ID and will have to do a full table scan.
I'm not sure why you are searching like that, but to speed things up I suggest adding another column that holds the hashed ID and indexing it.
Then you can query:
SELECT * FROM users WHERE MD5_ID = 'd67d8ab4f4c10bf22aa353e27879133c'
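A minimal sketch of the difference, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. SQLite has no MD5() function, so the sketch registers one with create_function; the md5_id column and the idx_users_md5_id index are hypothetical names, and SQLite's EXPLAIN QUERY PLAN output is worded differently from MySQL's EXPLAIN.

```python
import hashlib
import sqlite3

def md5_hex(value) -> str:
    """Hex MD5 of the value's text form (e.g. 39 -> MD5 of '39')."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()

def query_plan(conn, sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Stand-in for MySQL's MD5(): lets SQL call md5() on a column.
conn.create_function("md5", 1, md5_hex)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, avatar_data TEXT, md5_id TEXT)")
conn.executemany(
    "INSERT INTO users (id, avatar_data, md5_id) VALUES (?, ?, ?)",
    [(i, f"data-{i}", md5_hex(i)) for i in range(1, 10_001)],
)
# Hypothetical precomputed, indexed hash column.
conn.execute("CREATE INDEX idx_users_md5_id ON users (md5_id)")

target = md5_hex(39)

# Function applied to the column: every row must be hashed and compared.
slow_sql = "SELECT avatar_data FROM users WHERE md5(id) = ?"
# Precomputed column: a direct index lookup.
fast_sql = "SELECT avatar_data FROM users WHERE md5_id = ?"

print(query_plan(conn, slow_sql, (target,)))  # plan contains SCAN
print(query_plan(conn, fast_sql, (target,)))  # plan contains SEARCH ... idx_users_md5_id
print(conn.execute(fast_sql, (target,)).fetchone())  # ('data-39',)
```

The trade-off is that md5_id has to be filled in whenever a row is inserted, in exchange for an index lookup instead of hashing every row on every query.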
With that query, and without functional indexes, yes, you would scan the whole table. If you run it often, you may want to precompute the digest into a surrogate table or another column, index it, and look it up directly.
Yes, that would probably get very slow, and it doesn't really add any security. The MD5 of '39' is easy to figure out. For a one-way hash to be effective, its input must contain values that are unknown to an attacker. Otherwise the attacker just hashes the value himself, and you haven't really accomplished anything.
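To illustrate why hashing a small integer ID hides nothing: an attacker who sees the digest can simply hash candidate IDs in order until one matches. This is a sketch; recover_id and its one-million upper bound are hypothetical.

```python
import hashlib

def md5_hex(value) -> str:
    """Hex MD5 of the value's text form, as MD5(ID) would produce for an integer ID."""
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()

def recover_id(digest: str, max_id: int = 1_000_000):
    """Brute force: hash every candidate ID until one matches the digest."""
    for candidate in range(max_id + 1):
        if md5_hex(candidate) == digest:
            return candidate
    return None

leaked = md5_hex(39)       # what an attacker might see in a URL or a log
print(recover_id(leaked))  # 39, found after only 40 hashes
```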
You might consider posting more about what you're doing. For example: is this a web administration tool? Is it password protected? Etc.
If you want this kind of security, you'd probably be better off saving the passwords as MD5 hashes. Encoding IDs doesn't really give you any security.
There are 10 tables, each with a session_id column, and a single session table. The goal is to join them all on the session table. I get the feeling that this is a major code smell. Is this good or bad practice?
What problems could occur?
Whether this is a good design or not depends deeply on what you are trying to represent with it. So, it might be OK or it might not be... there's no way to tell just from your question in its current form.
That being said, there are a couple of ways to speed up a join:
Use indexes.
Use covering indexes.
Under the right DBMS, you could use a materialized view to store pre-joined rows. You should be able to simulate that under MySQL by maintaining a special table via triggers (or even manually).
Don't join a table unless you actually need its fields. List only the fields you need in the SELECT list (instead of blindly using *). The fastest operation is the one you don't have to do!
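A sketch of the covering-index and "list only the fields you need" points, again with SQLite via Python's sqlite3 standing in for MySQL (the session/page_view tables, their columns, and the index name are hypothetical). When the join reads only columns stored in the index, the plan reports a covering index, so the page_view rows themselves never have to be read; adding payload to the SELECT list loses that.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (id INTEGER PRIMARY KEY, started_at TEXT);
    CREATE TABLE page_view (
        id INTEGER PRIMARY KEY,
        session_id INTEGER REFERENCES session(id),
        url TEXT,
        payload TEXT
    );
    -- Covering index: holds the join key plus the one column the query reads.
    CREATE INDEX idx_page_view_session_url ON page_view (session_id, url);
""")

narrow_sql = """
    SELECT s.id, p.url
    FROM session AS s JOIN page_view AS p ON p.session_id = s.id
    WHERE s.id = ?
"""
# Same join, but also asks for payload, which the index does not contain.
wide_sql = narrow_sql.replace("p.url", "p.url, p.payload")

print(query_plan(conn, narrow_sql, (1,)))  # page_view read via a COVERING INDEX
print(query_plan(conn, wide_sql, (1,)))    # index still used, but rows must be fetched
```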
And above all, measure on representative amounts of data! Possible results:
It's lightning fast. Yay!
It's slow, but it doesn't matter that it's slow (i.e. rarely used / not important).
It's slow and it matters that it's slow. Strap in, you have work to do!
Please post the query with the 11 joins and its EXPLAIN output in the original question when it is available. And be kind to the community: for every table involved, also post the output of SHOW CREATE TABLE tblname and SHOW INDEX FROM tblname, to avoid additional requests for these 11 tables. Then we will know the scope of the data and the cardinality of each indexed column.
Of course, more joins hurt performance.
But it depends! If your data model is like that, there's not much you can do short of a complete redesign of the data model.
1) Is it an online (real-time transactional) DB or an offline DB (data warehouse)?
If online, it's better to maintain a single table: keep the data in one table and let it grow in columns.
If offline, it's better to maintain separate tables, because you won't always need all the columns.
I need to create one or more databases that store very large amounts of data, while still being able to extract data quickly enough with MySQL.
I was wondering whether it would help to create a new database, or a new set of tables, for each user instead of using a single large database.
My only worry is that there will be many users, though I hope they won't all use the project at the same time.
Does anyone have experience with similar structures, or any other advice for solving this problem?
Use a single database and a single "User" table for all users, and give each user a unique ID.
If a single user has a lot of data, for example:
if user_a has 10 books,
create a separate table such as "Book_table" with this structure:
id, user_id, book_name
You can use this structure for all users.
Please explain more about what user details you have, and I will try to give a more accurate answer.
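A minimal sketch of that layout with SQLite via Python's sqlite3 (the SQL is close to what MySQL accepts, but this is not MySQL). The users/book_table schema follows the answer; the index on user_id and the books_for helper are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,  -- one unique id per user
        name TEXT NOT NULL
    );
    CREATE TABLE book_table (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES users(id),
        book_name TEXT NOT NULL
    );
    -- Finds one user's books without scanning every book.
    CREATE INDEX idx_book_table_user_id ON book_table (user_id);
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'user_a'), (2, 'user_b')")
conn.executemany(
    "INSERT INTO book_table (user_id, book_name) VALUES (?, ?)",
    [(1, f"Book {n}") for n in range(1, 11)] + [(2, "Other book")],
)

def books_for(user_name):
    """All book names for one user, in insertion order."""
    rows = conn.execute(
        """SELECT b.book_name
           FROM users AS u JOIN book_table AS b ON b.user_id = u.id
           WHERE u.name = ?
           ORDER BY b.id""",
        (user_name,),
    ).fetchall()
    return [name for (name,) in rows]

print(len(books_for("user_a")))  # 10
```

All users share the same two tables; adding a user adds rows, not tables or databases.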
Definitely use only one User table. If you need to search that table quickly, you can index the columns you need fast searches on. For example, say you have a user table with first_name, last_name and email address. These are text fields, so searching would be slow: without an index, MySQL has to scan the entire table and compare all the strings (note that string comparison is much slower than integer comparison). To get around this, you can index a column such as the email address. An index is sort of like a hidden, ordered table that contains only the emails (plus a pointer back to each row). Ordered tables are much easier to search, so your searches would be fast.
Here is a good basic example of how to use indexing. Be careful with indexes, though, since they add overhead to your inserts. Some reading on your part will really help here.
Anyway, that is a very oversimplified explanation of indexing.
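To make the "ordered table with only emails" idea concrete, here is a toy Python sketch: the "index" is a sorted list of (email, row_id) pairs searched with binary search. It assumes emails are unique and is only an analogy; MySQL's InnoDB indexes are on-disk B-trees, not Python lists.

```python
from bisect import bisect_left

# Hypothetical user rows: (row_id, first_name, last_name, email).
rows = [
    (1, "Ada", "Lovelace", "ada@example.com"),
    (2, "Grace", "Hopper", "grace@example.com"),
    (3, "Alan", "Turing", "alan@example.com"),
    (4, "Edsger", "Dijkstra", "edsger@example.com"),
]

# The "index": only the emails, kept sorted, each paired with its row id.
email_index = sorted((email, row_id) for row_id, _, _, email in rows)
emails = [email for email, _ in email_index]

def find_row_id(email):
    """Binary search: about log2(n) comparisons instead of checking every row."""
    pos = bisect_left(emails, email)
    if pos < len(emails) and emails[pos] == email:
        return email_index[pos][1]
    return None

print(find_row_id("alan@example.com"))  # 3
```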
I am wondering how MySQL finds the rows in a table when searching like so:
select * from table where field = 'text';
Does it use a particular search algorithm? Is it practically the fastest way to look up information in a table? Or would building a search macro using another algorithm (like Boyer-Moore) work faster?
If there is an index on field, databases often use a B-tree for indexed searches. If there is no index, the entire table is scanned. This page describes some of the techniques used in MySQL:
http://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html
Many hours of work have gone into optimizing MySQL. Take advantage of that work and resist trying to redo it.
Without an index on field, MySQL can do nothing for that query other than read every row of the table and compare its field column against the string.
Boyer-Moore isn't needed, because the query asks for exact equality, not whether the field contains the string.
If you are interested in how it found those records, try running the query with the EXPLAIN keyword:
EXPLAIN select * from table where field = 'text';
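SQLite has a similar EXPLAIN QUERY PLAN, which makes the equality-versus-contains distinction easy to see. A sketch via Python's sqlite3, with a hypothetical table t and an index on field (which the question's table may not have): the exact match becomes an index SEARCH, while a "contains" pattern such as LIKE '%ext%' cannot use the ordered index and falls back to a SCAN. That second kind of query is where substring-search algorithms would come into play.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, field TEXT)")
conn.execute("CREATE INDEX idx_t_field ON t (field)")

equality_sql = "SELECT * FROM t WHERE field = ?"     # exact match
contains_sql = "SELECT * FROM t WHERE field LIKE ?"  # substring match

print(query_plan(conn, equality_sql, ("text",)))   # SEARCH via idx_t_field
print(query_plan(conn, contains_sql, ("%ext%",)))  # SCAN: every row (or index entry) is checked
```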
I would recommend looking at this article to get a better understanding what is happening in the background.
I would be very surprised if you would be able to write something on your own that is faster. You could look at creating indexes on the table in question to speed up selects.
Okay, MySQL indexing. Is indexing nothing more than having a unique ID for each row that is used in the WHERE clause?
When indexing a table does the process add any information to the table? For instance, another column or value somewhere.
Does indexing happen on the fly when retrieving values or are values placed into the table much like an insert or update function?
Any more information that clearly explains MySQL indexing would be appreciated. And please don't just post a link to the MySQL documentation; it is confusing, and it is always better to get a personal response from a professional.
Lastly, how is indexing different from telling MySQL to look for values between two bounds? For example: WHERE create_time >= 'AweekAgo'
I'm asking because one of my tables has 220,000+ rows, and a very simple MySQL SELECT statement takes more than a minute to return values. I'm hoping indexing will speed this up.
Thanks in advance.
You were downvoted because you didn't make an effort to read or search for what you are asking. A simple Google search could have shown you the benefits and drawbacks of database indexes. Here is a related question on Stack Overflow; I am sure there are many more like it.
To simplify the jargon: it is easier to locate books in a library if you arrange them on shelves numbered by subject. You can then tell somebody to go to a specific location and pick up the book. That is what an index does.
Another example: imagine an alphabetically ordered admission list. If your name starts with Z, you just skip A to Y and go straight to Z. Faster, right? Otherwise you would have to search and search, and might not even find it if you didn't look carefully.
A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
You can create an index like this:
CREATE INDEX index_name
ON table_name ( column1, column2,...);
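A sketch of how a multi-column index like the one above gets used, with SQLite via Python's sqlite3 as a stand-in for MySQL (the orders table and index name are hypothetical). Both engines can use a composite index when the query constrains its leftmost column(s); filtering only on the second column generally can't use it for a direct lookup.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
)
# Same shape as CREATE INDEX index_name ON table_name (column1, column2, ...).
conn.execute("CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)")

both_cols = "SELECT total FROM orders WHERE customer_id = 7 AND status = 'paid'"
first_col = "SELECT total FROM orders WHERE customer_id = 7"
second_col = "SELECT total FROM orders WHERE status = 'paid'"

for sql in (both_cols, first_col, second_col):
    print(query_plan(conn, sql))
# both_cols and first_col: SEARCH using the index; second_col: SCAN of the table
```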
You might be working on a more complex database, so it's good to remember a few simple rules.
Indexes slow down inserts and updates, so use them carefully on columns that are frequently updated.
Indexes speed up WHERE clauses and ORDER BY.
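A sketch of both effects in SQLite (via Python's sqlite3, standing in for MySQL; the events table is hypothetical). Before the index exists, ORDER BY needs a separate sort step (reported as a temporary B-tree) and the range filter scans every row; afterwards, rows are read in index order and the range becomes an index search. This is also roughly the answer to the create_time >= 'AweekAgo' question: the range is what you ask for, and the index is what lets the engine jump straight to it.

```python
import sqlite3

def query_plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a query."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, create_time TEXT, note TEXT)")

order_sql = "SELECT * FROM events ORDER BY create_time"
range_sql = "SELECT * FROM events WHERE create_time >= '2024-01-01'"

before_order = query_plan(conn, order_sql)  # full scan plus a separate sort step
before_range = query_plan(conn, range_sql)  # full scan

conn.execute("CREATE INDEX idx_events_create_time ON events (create_time)")

after_order = query_plan(conn, order_sql)   # rows read in index order, no sort
after_range = query_plan(conn, range_sql)   # index range search

for label, plan in [("order, no index", before_order), ("range, no index", before_range),
                    ("order, indexed", after_order), ("range, indexed", after_range)]:
    print(f"{label}: {plan}")
```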
For further detail, you can read :
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://www.tutorialspoint.com/mysql/mysql-indexes.htm
There are many kinds of index, for example a hash, a trie, or a spatial index, and which one fits depends on the values. Most likely it's a hash or a B-tree (a balanced search tree). Nothing really fancy, because fancy structures are usually expensive.
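A toy Python sketch of why the kind of index matters: a hash index (here a dict) answers equality lookups, but only an ordered structure, like the sorted list below standing in for a B-tree, can answer a range such as create_time >= some date. The rows and dates are made up for illustration.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical rows: row_id -> create_time.
rows = {
    1: date(2024, 1, 1),
    2: date(2024, 1, 5),
    3: date(2024, 1, 9),
    4: date(2024, 1, 12),
}

# Hash index: maps each exact value to its row ids; keys have no useful order.
hash_index = {}
for row_id, created in rows.items():
    hash_index.setdefault(created, []).append(row_id)

# Ordered index (roughly what a B-tree provides): (key, row_id) pairs kept sorted.
ordered = sorted((created, row_id) for row_id, created in rows.items())
keys = [created for created, _ in ordered]

def equal_to(day):
    """Equality lookup: a single hash probe."""
    return hash_index.get(day, [])

def on_or_after(day):
    """Range lookup: binary-search to the first key >= day, then read forward."""
    start = bisect_left(keys, day)
    return [row_id for _, row_id in ordered[start:]]

print(equal_to(date(2024, 1, 5)))     # [2]
print(on_or_after(date(2024, 1, 6)))  # [3, 4]
```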
I'm not sure this makes much sense, but this way I could hide the real static value from the .php file and keep only its hash there for the MySQL query. The PHP source can't be reached from a user's machine, but you have to make backups of your files, and the static value would be in them. Selecting by the hash of a column would solve this problem, I believe.
But I couldn't find any examples or documentation saying that it's possible to use such functions in queries (not on values in SQL queries, but on the columns being selected).
Is this possible?
An extremely slow query that simply selects all rows with an empty "column".
SELECT * FROM table WHERE MD5(column) = 'd41d8cd98f00b204e9800998ecf8427e'
If you're doing a lot of these queries, consider saving the MD5 hash in a column or index. Even better would be to do all MD5 calculations on the script's end - the day you're going to need an extra server for your project you'll notice that webservers scale a lot better than database servers. (That's something to worry about in the future, of course)
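A sketch of the "do the MD5 calculations on the script's end" suggestion, written in Python with SQLite rather than PHP with MySQL: the application computes the digest with hashlib, stores it in a hypothetical indexed col_md5 column, and looks rows up by that column. The digest of the empty string is the same value used in the query above.

```python
import hashlib
import sqlite3

def md5_hex(text: str) -> str:
    """MD5 computed in the application, not by the database."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT, col_md5 TEXT)")
conn.execute("CREATE INDEX idx_t_col_md5 ON t (col_md5)")
for value in ["", "abc", ""]:
    conn.execute("INSERT INTO t (col, col_md5) VALUES (?, ?)", (value, md5_hex(value)))

wanted = md5_hex("")
count = conn.execute("SELECT COUNT(*) FROM t WHERE col_md5 = ?", (wanted,)).fetchone()[0]
print(wanted)  # d41d8cd98f00b204e9800998ecf8427e
print(count)   # 2
```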
It should be noted that setting up your system this way won't actually solve any problem in your particular case. You are not making your system more secure doing this, you are just making it more convoluted.
The standard way to hide secret values from the code base is to store them in a separate file, and never commit that file to source control or include it in backups. Load the secret with PHP code and then use the value directly in MySQL. One way to do this is to keep a "config.php" file, or something along those lines, that just sets variables/constants, and then include it with PHP's include.
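The same pattern in Python, as a rough sketch (the answer describes PHP, where include does this job). The file name secrets.json and the key static_value are hypothetical; the point is only that the secret lives in a file that is deployed separately and never committed.

```python
import json
import tempfile
from pathlib import Path

def load_secret(config_path, key):
    """Read a single secret from a JSON config file kept outside source control."""
    with Path(config_path).open(encoding="utf-8") as fh:
        return json.load(fh)[key]

# Demo only: write a throwaway config file, as a deployment step might.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "secrets.json"  # hypothetical file name
    config_file.write_text(json.dumps({"static_value": "s3cr3t"}), encoding="utf-8")
    secret = load_secret(config_file, "static_value")

print(secret)  # s3cr3t
```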
That said, I'll answer your question anyway.
MySQL actually has a wide-variety of hashing and encryption functions. See http://dev.mysql.com/doc/refman/5.0/en/encryption-functions.html
Since you tagged your question md5 I'm assuming the function you're looking for is MD5: http://dev.mysql.com/doc/refman/5.0/en/encryption-functions.html#function_md5
You select it just like this:
SELECT MD5(column) AS hashed_column FROM table
Then the value to compare to the hash will be in the column alias 'hashed_column'.
Or to select a particular row based on the hash:
SELECT * FROM table WHERE MD5(column) = 'hashed-value-to-compare'
If I understand correctly, you want to use a hash as a primary key:
INSERT INTO MyTable (pk) VALUES (MD5('plain-value'));
Then you want to retrieve it by hash without knowing what its hash digest is:
SELECT * FROM MyTable WHERE pk = MD5('plain-value');
Somehow this is supposed to provide greater security in case people steal a backup of your database and PHP code? Well, it doesn't. If I know the original plain-value and the method of hashing, I can find the data just as easily as if you didn't hash the value.
I agree with the comment from @scunliffe: we're not sure exactly what problem you're trying to solve, but it sounds like this method will not solve it.
It's also inefficient to use an MD5 hash digest as a primary key. You have to store it in a CHAR(32), or else UNHEX it and store it in BINARY(16). Regardless, you can't use INT or even BIGINT as the primary key datatype. The key values are more bulky, and therefore make larger indexes.
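The size difference is easy to check. This Python sketch uses hashlib for the digest and bytes.fromhex as the equivalent of MySQL's UNHEX(); the 8-byte figure is the storage size of a MySQL BIGINT (an INT is 4 bytes).

```python
import hashlib

digest = hashlib.md5(b"plain-value")

as_char32 = digest.hexdigest()          # 32 hex characters, what CHAR(32) would hold
as_binary16 = bytes.fromhex(as_char32)  # 16 raw bytes, what UNHEX() into BINARY(16) holds
bigint_size = 8                         # bytes in a MySQL BIGINT

print(len(as_char32), len(as_binary16), bigint_size)  # 32 16 8
```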
Also new rows will insert in an arbitrary location in the clustered index. That's more expensive than adding new values to the end of the B-tree, as you would do if you used simple auto-incrementing integers like everyone else.
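A rough Python illustration of the insertion-order point, with a sorted list standing in for the leaf order of a clustered index (it does not model pages or page splits): sequential keys always land at the end, while MD5 keys land almost anywhere.

```python
import hashlib
from bisect import bisect_left, insort

def inserts_at_end(keys):
    """Insert keys one by one into a sorted list; count those that land at the end."""
    index, at_end = [], 0
    for key in keys:
        if bisect_left(index, key) == len(index):
            at_end += 1
        insort(index, key)
    return at_end

n = 1000
sequential = list(range(1, n + 1))  # AUTO_INCREMENT-style keys
hashed = [hashlib.md5(str(i).encode("utf-8")).hexdigest() for i in range(1, n + 1)]

print(inserts_at_end(sequential))  # 1000: every key is appended
print(inserts_at_end(hashed))      # far fewer: most keys land mid-list
```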