MySQL: best way to select using a list of IDs

Through a webservice my application receives a list of identifiers.
With this list, I have to look up a field, one per identifier.
If the field does not exist, the value should be null (it has to be shown).
I am wondering what would be the best approach. First I thought it was best to create a temporary table holding the ids and then join it to the table holding the data, but if I'm correct this takes at least one query per identifier to insert it into the temporary table.
In that case, it seems that I could just as well iterate through the list of identifiers in my application and query the database 1 by 1. Is this correct?
Which approach can you advise?
greetings,
coen

Use the SELECT ... WHERE id IN (...) syntax to get a result set with the data you need, then iterate over it in your code. This way you query the DB only once and get only the information you need.

Showing nulls is the tricky part: to get them from the database you need to join the table to itself, so there are two index lookups per record. Running a separate query for each identifier requires only one index lookup per record.
In practice, it won't be twice as slow, since the identifier will be in the key cache by the time the second lookup is executed.
Another option is to render your output using the input identifiers, and use an "IN" like previously suggested. The null records won't be in the query output, but that would be ok since you know what was requested.
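A minimal sketch of that last option, assuming a hypothetical table data(id, field). It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; with a MySQL driver the placeholder is typically %s instead of ?, but the idea is the same: one IN query, then fill in None for every requested id that did not come back.

```python
import sqlite3

def lookup_fields(conn, ids):
    """Return {id: field or None} for every requested id, using a single query."""
    if not ids:
        return {}
    # One placeholder per id; the values themselves are passed as parameters.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, field FROM data WHERE id IN ({placeholders})"
    found = dict(conn.execute(sql, list(ids)).fetchall())
    # Ids missing from the result set are rendered as None (NULL).
    return {i: found.get(i) for i in ids}

# Hypothetical sample data: id 2 does not exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(1, "a"), (3, "c")])

result = lookup_fields(conn, [1, 2, 3])
print(result)
```

Because the output is driven by the input list, the order of the requested identifiers is preserved as well.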

Related

Mysql: use value as alias in query

given a table
create table mymy(A int(2),B int(2))
is it possible to use a field value as an alias? Something like (not really):
select A as valueOf(B) from mymy.
No. You can't. The values are not known until the query is run. And even if you could, you'd have a lot of possibly different values in one column. Which one should be used?
The only valid reason I can imagine for such a request is that you have some kind of EAV design and you want to have a Pivot result.
If that's the case, you could use dynamic SQL (run a query, get the results, build another query based on those results and run that one). But this kind of operation is better done on the application side (get the results and format them there, as you prefer).
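As a rough illustration of doing the pivot on the application side, here is a sketch that assumes the query returns (entity, attribute, value) rows from some EAV table. The row layout and the sample values are made up for the example.

```python
from collections import defaultdict

def pivot(rows):
    """Turn (entity, attribute, value) rows into one dict per entity."""
    result = defaultdict(dict)
    for entity, attribute, value in rows:
        # The attribute value becomes the "column name" for this entity.
        result[entity][attribute] = value
    return dict(result)

# Hypothetical rows, as they might come back from an EAV table.
rows = [(1, "color", "red"), (1, "size", "L"), (2, "color", "blue")]
pivoted = pivot(rows)
print(pivoted)
```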

MySQL more than one INDEX key for the same column

Is it right to have more than one INDEX key for the same column in MySQL database?
for example, id field indexed twice with different Keyname while phpmyadmin gives me a warning message:
More than one INDEX key was created
for column id
I would like to know if that is ok, and if there are any effects or side-effect on my script or the server using this method?
I use this method for grouping columns for each index.
Indexing a single column twice is useless, slows down inserts and updates because now you have to (uselessly) maintain two indexes, and probably confuses the optimizer (if it doesn't actually break something). On the other hand, it's fine (and often useful) to have a column indexed alone and then also as part of one or more compound keys.
Theoretically it can be a good idea to have a reverse index on a column as well as the normal index. I'm not sure whether it's supported by MySQL, though.
It depends on what you are searching for. If you expect the user to search for last names, and you store first and last names in the same column, then many searches will be of the form
LIKE '%lastname'
In that case, a normal index will not help much, because it is built from the beginning of the string. The database will need to look through every record to see whether it contains the search string. A reverse index will be useful, because it indexes from the back forward. Having both indexes would speed up this particular kind of search.
With wildcards at both the beginning and the end.
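The reverse-index idea can be sketched in plain Python, independent of any database: keep the strings sorted by their reversed form, so that a "ends with" search becomes a "starts with" search on the reversed key. The names below are invented sample data. In MySQL a similar effect might be obtained by storing REVERSE(column) in an extra indexed column and searching it with a prefix LIKE, but that is an assumption about your schema, not a built-in reverse index.

```python
import bisect

names = ["John Smith", "Ann Smith", "Bob Jones", "Al Goldsmith"]

# The "reverse index": entries sorted by the reversed string.
reverse_index = sorted((name[::-1], name) for name in names)

def ends_with(suffix):
    """Find all names ending in `suffix` using a prefix search on reversed keys."""
    key = suffix[::-1]
    # Binary search for the first reversed string >= key.
    start = bisect.bisect_left(reverse_index, (key,))
    matches = []
    for reversed_name, name in reverse_index[start:]:
        if not reversed_name.startswith(key):
            break  # sorted order: no later entry can match
        matches.append(name)
    return matches

print(ends_with("Smith"))
```

The comparison here is case-sensitive, so "Al Goldsmith" is not returned for "Smith".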

MySQL: is there something like an internal record identifier for every record in a MySQL table?

I'm building a spreadsheet app using MySQL as storage, I need to identify records that are being updated client-side in order to save the changes.
Is there a way, such as some kind of "internal record identifier" (internal as in used by the database engine itself), to uniquely identify records, so that I'll be able to update the correct one?
Certainly, a SELECT query can be used to identify the record, including all the fields in the table, but obviously that has the downside of returning multiple records in most situations.
IMPORTANT: the spreadsheet app aims to work on ANY table, even ones tremendously poorly designed, without any keys, so solutions such as "define a field with an UNIQUE index and work with that" are not an option, table structure may be extremely variable and must not matter.
Many thanks.
AFAIK no such unique internal identifier (say, a simple row ID) exists.
You may be able to run a SELECT without any sorting and then get the n-th row using a LIMIT. A MySQL guru would need to confirm under what conditions that is reliable and safe to use. It probably never is.
Try playing around with phpMyAdmin, the web frontend to mySQL. It is designed to deal with badly designed tables without keys. If I remember correctly, it uses all columns it can get hold of in such cases:
UPDATE xyz SET a = b WHERE `fieldname` = 'value'
AND `fieldname2` = 'value2'
AND `fieldname3` = 'value3'
LIMIT 1;
and so on.
That isn't entirely duplicate-safe either, of course.
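A sketch of how an application could build such a statement for an arbitrary table. The function name and its arguments are hypothetical; the %s placeholders follow the style used by common MySQL drivers for Python, and <=> is MySQL's NULL-safe equality operator, used here so that columns containing NULL still match.

```python
def build_update(table, changes, original_row):
    """Build an UPDATE that identifies a row by all of its original column values."""
    def quote(identifier):
        # Backticks quote identifiers in MySQL; a literal backtick is doubled.
        return "`" + identifier.replace("`", "``") + "`"

    set_sql = ", ".join(f"{quote(col)} = %s" for col in changes)
    # NULL-safe comparison, so a NULL in the original row matches a NULL column.
    where_sql = " AND ".join(f"{quote(col)} <=> %s" for col in original_row)
    # LIMIT 1 keeps exact duplicates of the row from all being updated.
    sql = f"UPDATE {quote(table)} SET {set_sql} WHERE {where_sql} LIMIT 1"
    params = list(changes.values()) + list(original_row.values())
    return sql, params

sql, params = build_update("xyz", {"a": "new"}, {"a": "old", "b": None})
print(sql)
print(params)
```

If the table contains exact duplicate rows, this still cannot tell them apart; it only guarantees that one of them is changed.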
The only idea that comes to my mind is to add a key column at runtime, and to remove it when your app is done. It's a goose-bump-inducing idea, but maybe better than nothing.
MySQL has "auto-increment" numeric columns that you can add and even define as a primary key, that would give you a unique record id automatically generated by the database. You can query the last record id you just inserted with select LAST_INSERT_ID()
example from mysql's official documentation here
To my knowledge, MySQL lacks the implicit ROWID feature as seen in Oracle (and exists in other engines with their own syntax). You'll have to create your own AUTO_INCREMENT field.

Does MySQL store it in an optimal way if the same string is stored in multiple rows?

I have a table where one of the columns is a sort of id string used to group several rows from the table. Let's say the column name is "map" and one of the values for map is e.g. "walmart". The column has an index on it, because I use it to filter those rows which belong to a certain map.
I have lots of such maps and I don't know how much space the different map values take up in the table. Does MySQL recognize that the same map value is stored for multiple rows, store it only once internally, and reference it with an internal numeric id?
Or do I have to replace the map string with a numeric id explicitly and use a different table to pair map strings to ids if I want to decrease the size of the table?
MySQL will store the whole data for every row, regardless of whether the data already exists in a different row.
If you have a limited set of options, you could use an ENUM field, else you could pull the names into another table and join on it.
I think MySQL will duplicate your content each time: it stores data row by row, unless you explicitly specify otherwise (putting the data in another table, like you suggested).
Using another table means you need to add a JOIN to some of your queries: you might want to think a bit about the size of your data (is it that big?), compared to the (small?) performance loss you may encounter because of that join.
Another solution would be using an ENUM datatype, at least if you know in advance which strings you will have in your table, and there are only a few of those.
Finally, another solution might be to store an integer "code" corresponding to each string, and have those codes translated to strings by your application, totally outside of the database (or use a table to store the correspondences, but have that table cached by your application instead of using joins in SQL queries).
It would not be as "clean", but might be better for performance; still, this may be a micro-optimization that is not necessary in your case...
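A small sketch of the cached lookup table variant. The table and column names (maps, items, map_id) and the sample values are assumptions made for the example, and Python's sqlite3 module stands in for MySQL so that it runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE maps (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE items (id INTEGER PRIMARY KEY, map_id INTEGER, label TEXT);
    INSERT INTO maps (id, name) VALUES (1, 'walmart'), (2, 'target');
    INSERT INTO items (map_id, label)
        VALUES (1, 'store A'), (1, 'store B'), (2, 'store C');
""")

# Load the small lookup table once and keep it in application memory.
map_names = dict(conn.execute("SELECT id, name FROM maps"))

# Query the big table without a JOIN and translate the codes in the application.
rows = conn.execute("SELECT label, map_id FROM items ORDER BY id").fetchall()
labelled = [(label, map_names[map_id]) for label, map_id in rows]
print(labelled)
```

Each row now stores a small integer instead of the full string, and the string itself exists only once in the maps table.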
If you are using the same values over and over again, then there is a good functional reason to move it to a separate table, totally aside from disk space considerations: To avoid problems with inconsistent data.
Suppose you have a table of Stores, which includes a column for StoreName. Among the values in StoreName "WalMart" occurs 300 times, and then there's a "BalMart". Is that just a typo for "WalMart", or is that a different store?
Also, if there's other data associated with a store that would be constant across the chain, you should store it just once and not repeatedly.
Of course, if you're just showing locations on a map and you really don't care what they are, it's just a name to display, then this would all be irrelevant.
And if that's the case, then buying a bigger disk is probably a simpler solution than redesigning your database just to save a few bytes per record. Because if we're talking arbitrary strings for place names here, then trying to find duplicates and have look-ups for them is probably a lot of work for very little gain.

How does a hash table work? Is it faster than "SELECT * from .."

Let's say, I have :
Key | Indexes | Key-values
----+---------+-----------
001 | 100001  | Alex
002 | 100002  | Micheal
003 | 100003  | Daniel
Lets say, we want to search 001, how to do the fast searching process using hash table?
Isn't it the same as using "SELECT * from .. " in MySQL? I have read a lot; people say that "SELECT *" searches from beginning to end, but a hash table does not. Why and how?
By using hash table, are we reducing the records we are searching? How?
Can anyone demonstrate how to insert and retrieve hash table process in mysql query code? e.g.,
SELECT * from table1 where hash_value="bla" ...
Another scenario:
If the indexes are like S0001, S0002, T0001, T0002, etc. In mysql i could use:
SELECT * from table WHERE value = S*
isn't it the same and faster?
A simple hash table works by keeping the items on several lists, instead of just one. It uses a very fast and repeatable (i.e. non-random) method to choose which list to keep each item on. So when it is time to find the item again, it repeats that method to discover which list to look in, and then does a normal (slow) linear search in that list.
By dividing the items up into 17 lists, the search becomes 17 times faster, which is a good improvement.
Although of course this is only true if the lists are roughly the same length, so it is important to choose a good method of distributing the items between the lists.
In your example table, the first column is the key, the thing we need to find the item. And lets suppose we will maintain 17 lists. To insert something, we perform an operation on the key called hashing. This just turns the key into a number. It doesn't return a random number, because it must always return the same number for the same key. But at the same time, the numbers must be "spread out" widely.
Then we take the resulting number and use modulus to shrink it down to the number of lists we have:
Hash(key) % 17
This all happens extremely fast. Our lists are in an array, so:
_lists[Hash(key) % 17].Add(record);
And then later, to find the item using that key:
Record found = _lists[Hash(key) % 17].Find(key);
Note that each list can just be any container type, or a linked list class that you write by hand. When we execute a Find in that list, it works the slow way (examine the key of each record).
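The scheme described above can be sketched as a small Python class. The class and method names are made up for illustration, and Python's built-in hash() plays the role of Hash(key); this is only meant to show the mechanics, not how MySQL implements anything.

```python
class ChainedHashTable:
    """A toy hash table: a fixed array of lists, chosen by hash(key) % count."""

    def __init__(self, bucket_count=17):
        self._lists = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The same key always hashes to the same list within one run.
        return self._lists[hash(key) % len(self._lists)]

    def add(self, key, record):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # replace an existing entry
                return
        bucket.append((key, record))

    def find(self, key):
        # Slow linear search, but only within one short list.
        for existing_key, record in self._bucket(key):
            if existing_key == key:
                return record
        return None

table = ChainedHashTable()
table.add("001", "Alex")
table.add("002", "Micheal")
table.add("003", "Daniel")
print(table.find("001"))
```

Two keys that land in the same list are simply stored side by side, which is why find still compares the full key.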
Do not worry about what MySQL is doing internally to locate records quickly. The job of a database is to do that sort of thing for you. Just run a SELECT [columns] FROM table WHERE [condition]; query and let the database generate a query plan for you. Note that you don't want to use SELECT *, since if you ever add a column to the table that will break all your old queries that relied on there being a certain number of columns in a certain order.
If you really want to know what's going on under the hood (it's good to know, but do not implement it yourself: that is the purpose of a database!), you need to know what indexes are and how they work. If a table has no index on the columns involved in the WHERE clause, then, as you say, the database will have to search through every row in the table to find the ones matching your condition. But if there is an index, the database will search the index to find the exact location of the rows you want, and jump directly to them. Indexes are usually implemented as B+-trees, a type of search tree that uses very few comparisons to locate a specific element. Searching a B-tree for a specific key is very fast. MySQL is also capable of using hash indexes, but these tend to be slower for database uses. Hash indexes usually only perform well on long keys (character strings especially), since they reduce the size of the key to a fixed hash size. For data types like integers and real numbers, which have a well-defined ordering and fixed length, the easy searchability of a B-tree usually provides better performance.
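To illustrate why an ordered index is convenient, here is a sketch that uses a sorted Python list and binary search as a very rough stand-in for a B-tree. Besides exact lookups it can answer prefix searches, which in MySQL would be written as WHERE value LIKE 'S%' and can use a B-tree index; a hash index cannot help with that kind of query. The keys are the ones from the question, and the "\uffff" upper bound assumes keys contain no characters at or above that code point.

```python
import bisect

# A sorted list of keys, standing in for an ordered (B-tree style) index.
keys = sorted(["S0001", "S0002", "T0001", "T0002"])

def find_exact(key):
    """Binary search: about log2(n) comparisons instead of scanning every key."""
    i = bisect.bisect_left(keys, key)
    return i < len(keys) and keys[i] == key

def find_prefix(prefix):
    """All keys starting with `prefix`, found as one contiguous range."""
    lo = bisect.bisect_left(keys, prefix)
    # Assumption: no key contains a character >= "\uffff" right after the prefix.
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return keys[lo:hi]

print(find_exact("T0001"))
print(find_prefix("S"))
```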
You might like to look at the chapters in the MySQL manual and PostgreSQL manual on indexing.
http://en.wikipedia.org/wiki/Hash_table
Hash tables may be used as in-memory data structures. Hash tables may also be adopted for use with persistent data structures; database indices sometimes use disk-based data structures based on hash tables, although balanced trees are more popular.
I guess you could use a hash function to get the ID you want to select from. Like
SELECT * FROM table WHERE value = hash_fn(whatever_input_you_build_your_hash_value_from)
Then you don't need to know the id of the row you want to select and can do an exact query, since the row will always have the same id: it is derived from the input you build the hash value from, and you can always recreate this id through the hash function.
However, this isn't always true, depending on the size of the table and the maximum number of hash values (you often have "X mod hash-table-size" somewhere in your hash). To take care of this you should have a deterministic strategy that you use each time two values get the same id. You should check Wikipedia for more info on this strategy; it's called collision handling and should be mentioned in the same article as hash tables.
MySQL probably uses hashtables somewhere because of the O(1) feature norheim.se (up) mentioned.
Hash tables are great for locating entries at O(1) cost where the key (that is used for hashing) is already known. They are in widespread use both in collection libraries and in database engines. You should be able to find plenty of information about them on the internet. Why don't you start with Wikipedia or just do a Google search?
I don't know the details of mysql. If there is a structure in there called "hash table", that would probably be a kind of table that uses hashing for locating the keys. I'm sure someone else will tell you about that. =)
EDIT: (in response to comment)
Ok. I'll try to give a grossly simplified explanation: a hash table is a table where the entries are located based on a function of the key. For instance, say that you want to store info about a set of persons. If you store it in a plain unsorted array, you need to iterate over the elements in sequence in order to find the entry you are looking for. On average, this will need N/2 comparisons.
If, instead, you put all entries at indexes based on the first character of the person's first name (A=0, B=1, C=2 etc.), you will immediately be able to find the correct entry as long as you know the first name. This is the basic idea. You probably realize that some special handling (rehashing, or allowing lists of entries) is required in order to support multiple entries having the same first letter. If you have a well-dimensioned hash table, you should be able to get straight to the item you are searching for. This means approximately one comparison, with the disclaimer of the special handling I just mentioned.