MySQL SELECT efficiency - mysql

I am using PHP to interact with a MySQL database, and I was wondering if querying MySQL with a "SELECT * FROM..." is more or less efficient than a "SELECT id FROM...".

Less efficient.
If you think about it, SQL is going to send all the data for each row you select.
Imagine that you have a column called MassiveTextBlock - this column will be included when you SELECT * and so SQL will have to send all of that data over when you may not actually require it. In contrast, SELECT id is just grabbing a collection of numbers.

It is less efficient because you are fetching a lot more information than with just SELECT id. Additionally, the second query is much more likely to be served using just an index.

It would depend on why you are selecting the data. If you are writing a raw database interface like phpMyAdmin, then SELECT * may make sense.
If you are running multiple queries on the same table with the same conditions and concatenation operations, then a single SELECT col1, col2 FROM table may make more sense than two independent queries, SELECT col1 FROM table and SELECT col2 FROM table, that fetch different portions of the same data, because it performs both operations in one query.
But, in general, you should only select the columns you need from the table, to minimize the unnecessary data dredged up by the DBMS. The benefits of this increase greatly if your database is on a different server from the client server, or if your database server is old and/or slow.
There should be no situation in which SELECT * is unavoidable; if there is one, your data model probably has some serious design flaws.

It depends on your indexes.
Selecting fewer columns can sometimes save a lot of time, if you select only columns that exist in the index that MySQL has used to fetch the results.
For example, if you have an index on the column id, and you perform this query:
SELECT id FROM mytable WHERE id>5
Then MySQL only needs to read the index, and does not even need to read the table row. If, on the other hand, you select additional columns, as in:
SELECT id, name FROM mytable WHERE id>5
Then MySQL will need to read the id from the index, then go to the table row to read the name column.
If, however, you are reading columns that aren't in the index you're selecting on anyway, then reading more columns really won't make nearly as much difference, though it will make a small difference. If some columns contain a large amount of data, such as large TEXT or BLOB columns, then it'd be wise to exclude them if you don't need them.
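To see the index-only effect described above without a MySQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in (an assumption of this example, not part of the original answer), and the index name idx_id is made up. In MySQL you would run EXPLAIN on the same two queries and look for "Using index" in the Extra column.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here; idx_id is a made-up index name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, name TEXT)")
conn.execute("CREATE INDEX idx_id ON mytable (id)")
conn.executemany(
    "INSERT INTO mytable (id, name) VALUES (?, ?)",
    [(i, f"name{i}") for i in range(100)],
)


def plan(sql):
    """Return the planner's description of how it intends to run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last field of each row is the human-readable detail string.
    return " | ".join(row[-1] for row in rows)


index_only = plan("SELECT id FROM mytable WHERE id > 5")
needs_row = plan("SELECT id, name FROM mytable WHERE id > 5")

print(index_only)  # reports a covering index: the table rows are never read
print(needs_row)   # no covering index: the table has to be visited for name
```

The exact plan text differs between SQLite versions, so treat the printed strings as illustrative rather than something to parse.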

Related

Is it faster to only query specific columns?

I've heard that it is faster to select columns manually ("col1, col2, col3, etc") instead of querying them all with "*".
But what if I don't even want to query all columns of a table? Would it be faster to query, for example, only "col1, col2" instead of "col1, col2, col3, col4"?
From my understanding SQL has to search through all of the columns anyway, and only the returned result changes. I'd like to know if I can achieve a gain in performance by only choosing the right columns.
(I'm doing this anyway, but a backend API of one of my applications returns all columns more often than not, so I'm thinking about letting the user manually select the columns they want.)
In general, reducing the number of columns in the select is a minor optimization. It means that less data is being returned from the database server to the application calling the server. Less data is usually faster.
Under most circumstances, this is a minor improvement. There are some cases where the improvement can be more important:
If a covering index is available for the query, so the index satisfies the query without having to access data pages.
If some fields are very long, so records occupy multiple pages.
If the volume of data being retrieved is a small fraction (think < 10%) of the overall data in each record.
Listing the columns individually is a good idea, because it protects code from changes in underlying schema. For instance, if the name of a column is changed, then a query that lists columns explicitly will break with an easy-to-understand error. This is better than a query that runs and produces erroneous results.
You should try not to use select *.
Inefficiency in moving data to the consumer. When you SELECT *, you're often retrieving more columns from the database than your application really needs to function. This causes more data to move from the database server to the client, slowing access and increasing load on your machines, as well as taking more time to travel across the network. This is especially true when someone adds new columns to underlying tables that didn't exist and weren't needed when the original consumers coded their data access.
Indexing issues. Consider a scenario where you want to tune a query to a high level of performance. If you were to use *, and it returned more columns than you actually needed, the server would often have to perform more expensive methods to retrieve your data than it otherwise might. For example, you wouldn't be able to create an index which simply covered the columns in your SELECT list, and even if you did (including all columns [shudder]), the next guy who came around and added a column to the underlying table would cause the optimizer to ignore your optimized covering index, and you'd likely find that the performance of your query would drop substantially for no readily apparent reason.
Binding Problems. When you SELECT *, it's possible to retrieve two columns of the same name from two different tables. This can often crash your data consumer. Imagine a query that joins two tables, both of which contain a column called "ID". How would a consumer know which was which? SELECT * can also confuse views (at least in some versions of SQL Server) when underlying table structures change: the view is not rebuilt, and the data which comes back can be nonsense. And the worst part of it is that you can take care to name your columns whatever you want, but the next guy who comes along might have no way of knowing that he has to worry about adding a column which will collide with your already-developed names.
I got this from this answer.
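The "two columns called ID" problem is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for whatever client you use (an assumption of this example); the authors and books tables are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (ID INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors (ID, name) VALUES (1, 'Ann');
    INSERT INTO books (ID, author_id, title) VALUES (10, 1, 'SQL Notes');
""")

# SELECT * over a join returns both ID columns under the same name.
cur = conn.execute(
    "SELECT * FROM authors JOIN books ON books.author_id = authors.ID"
)
names = [d[0] for d in cur.description]
row = cur.fetchone()

# Building a name -> value mapping silently keeps only the last ID.
as_dict = dict(zip(names, row))

# Naming and aliasing the columns removes the ambiguity.
explicit = conn.execute(
    "SELECT authors.ID AS author_id, books.ID AS book_id, books.title "
    "FROM authors JOIN books ON books.author_id = authors.ID"
).fetchone()

print(names)     # 'ID' appears twice
print(as_dict)   # one of the two IDs has been overwritten
print(explicit)
```

A consumer that maps rows by column name ends up with one ID missing; one that indexes by position has to know the join order. Aliasing each column avoids both.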
I believe this topic has already been covered here:
select * vs select column
I believe it covers your concerns as well. Please take a look.
All the column labels and values occupy some space. Sending them to the issuer of the request instead of a subset of the columns means sending more data. More data is sent slower.
If you have columns, like
id, username, password, email, bio, url
and you want to get only the username and password, then
select username, password ...
is quicker than
select * ...
because id, email, bio and url are sent as well for the latter, which makes the response larger. But the main problem with select * is different. It might be the source of inconsistencies if, for some reason the order of the columns changed. Also, it might retrieve data you do not want to retrieve. It is always better to have a whitelist with the columns you actually want to retrieve.
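If the column list comes from the caller (for example an API that lets users pick columns), the whitelist idea can be sketched like this. The function name build_select, the ALLOWED_COLUMNS tuple, and the choice to leave password out of it are all assumptions made for illustration, not an established API.

```python
# Columns the application is willing to expose; password is deliberately absent.
ALLOWED_COLUMNS = ("id", "username", "email", "bio", "url")


def build_select(table, requested, allowed=ALLOWED_COLUMNS):
    """Build a SELECT statement from caller-chosen columns, checked against a whitelist.

    `table` is assumed to be a trusted constant from your own code, never user input.
    """
    if not requested:
        raise ValueError("at least one column must be requested")
    unknown = [col for col in requested if col not in allowed]
    if unknown:
        raise ValueError(f"columns not allowed: {unknown}")
    return f"SELECT {', '.join(requested)} FROM {table}"


print(build_select("users", ["username", "email"]))
# SELECT username, email FROM users
```

Because only names from the fixed tuple can reach the SQL string, this also keeps user-supplied column names from being used for injection.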

Selecting all fields (but one) instead of using asterisk (*) decreases running time by 10 times [duplicate]

I've Googled this question and can't seem to find a consistent opinion, or many opinions that are based on solid data. I simply would like to know if using the wildcard in a SQL SELECT statement incurs more overhead than calling each item out individually. I have compared the execution plans of both in several different test queries, and it seems that the estimates always read the same. Is it possible that some overhead is incurred elsewhere, or are they truly handled identically?
What I am referring to specifically:
SELECT *
vs.
SELECT item1, item2, etc.
SELECT * FROM...
and
SELECT every, column, list, ... FROM...
will perform the same because both are an unoptimised scan
The difference is:
the extra lookup in sys.columns to resolve *
the contract/signature change when the table schema changes
inability to create a covering index. In fact, no tuning options at all, really
views have to be refreshed if they are not schema-bound
can not index or schemabind a view using *
...and other stuff
Other SO questions on the same subject...
What is the reason not to use select * ?
Is there a difference betweeen Select * and Select list each col
SQL Query Question - Select * from view or Select col1,col2…from view
“select * from table” vs “select colA,colB,etc from table” interesting behaviour in SqlServer2005
Do you mean select * from ... instead of select col1, col2, col3 from ...?
I think it's always better to name the column and retrieve the minimal amount of information, because
your code will work independently of the physical order of the columns in the db. The column order should not impact your application, but it will be the case if you use *. It can be dangerous in case of db migration, etc.
if you name the columns, the DBMS can further optimize the execution. For instance, if there is an index that contains all the data you are interested in, the table will not be accessed at all.
If you mean something else with "wildcard", just ignore my answer...
EDIT: If you are talking about the asterisk wild card as in Select * From ... then see other responses...
If you are talking about wildcards in predicate clauses, or other query expressions using Like operator, (_ , % ) as described below, then:
This has to do with whether using the wildcard affects whether the SQL is "SARG-able" or not. SARG-able (Search-ARGument-able) means that the query's search or sort arguments can be used as entry parameters to an existing index. If you prepend the wildcard to the beginning of an argument
Where Name Like '%ing'
Then there is no way to traverse an index on the name field to find the nodes that end in 'ing'.
If, on the other hand, you append the wildcard to the end,
Where Name like 'Donald%'
then the optimizer can still use an index on the name column, and the query is still SARG-able
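The reason a trailing wildcard can use an index while a leading one cannot is just sort order. As a rough illustration (not how any particular database implements it), a sorted Python list can play the role of the index on the name column; the names below are made-up sample data.

```python
from bisect import bisect_left

# Sorted list standing in for a B-tree index on the name column (made-up data).
index = sorted(["Donald", "Donaldson", "Dora", "Erling", "King", "Ringo", "Sterling"])


def like_prefix(prefix):
    """Name LIKE 'prefix%': seek to the first candidate, stop at the first miss."""
    matches, visited = [], 0
    i = bisect_left(index, prefix)  # binary search, like descending the B-tree
    while i < len(index):
        visited += 1
        if not index[i].startswith(prefix):
            break
        matches.append(index[i])
        i += 1
    return matches, visited


def like_suffix(suffix):
    """Name LIKE '%suffix': sort order does not help, so every entry is checked."""
    matches = [name for name in index if name.endswith(suffix)]
    return matches, len(index)


print(like_prefix("Donald"))  # (['Donald', 'Donaldson'], 3)
print(like_suffix("ing"))     # (['Erling', 'King', 'Sterling'], 7)
```

The prefix search touches three entries out of seven; the suffix search has to look at all of them, which is what a full index or table scan amounts to.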
If what you call the SQL wildcard is *, it does not imply performance overhead by itself. However, if the table is extended you could find yourself retrieving fields you don't need.
In general, not being specific about the fields you select or insert is a bad habit.
Consider
insert into mytable values(1,2)
What happens if the table is extended to three fields?
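Here is what happens, sketched with Python's sqlite3 module as a stand-in database (an assumption of this example; the column names a, b and c are made up). MySQL rejects the same statement with a "Column count doesn't match value count" error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.execute("INSERT INTO mytable VALUES (1, 2)")  # fine while there are two columns

# Someone extends the table to three fields.
conn.execute("ALTER TABLE mytable ADD COLUMN c INTEGER")

try:
    conn.execute("INSERT INTO mytable VALUES (1, 2)")  # same statement as before
    positional_ok = True
except sqlite3.Error as exc:
    positional_ok = False
    print("positional insert failed:", exc)

# Naming the target columns keeps working; c simply gets its default (NULL).
conn.execute("INSERT INTO mytable (a, b) VALUES (1, 2)")
rows = conn.execute("SELECT a, b, c FROM mytable").fetchall()
print(rows)
```

The statement that named its columns survives the schema change; the positional one does not.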
It may not be more work from an execution plan standpoint. But if you're fetching columns you don't actually need, that's additional network bandwidth being used between the database and your application. Also if you're using a high-level client API that performs some work on the returned data (for example, Perl's selectall_hashref) then those extra columns will impose performance cost on the client side. How much? Depends.

Why shouldn't we use Select * in a mysql query on a production server?

Based on this question, Selecting NOT NULL columns from a table, one of the posters said:
you shouldn't use SELECT * in production.
My Question: Is it true that we shouldn't use Select * in a mysql query on a production server? If yes, why shouldn't we use select all?
Most people do advise against using SELECT * in production, because it tends to break things. There are a few exceptions though.
SELECT * fetches all columns, while most of the time you don't need them all. This causes the SQL server to send more columns than needed, which is a waste and makes the system slower.
With SELECT *, when you later add a column, the old query will also select this new column, while typically it will not need it. Naming the columns explicitly prevents this.
Most people that write SELECT * queries also tend to grab the rows and use column order to get the columns, which WILL break your code once columns are injected between existing columns.
Explicitly naming the columns also guarantees they are always in the same order, while SELECT * might behave differently when the table column order is modified.
But there are exceptions, for example statements like these:
INSERT INTO table_history
SELECT * FROM table
A query like that takes rows from table, and inserts them into table_history. If you want this query to keep working when new columns are added to table AND to table_history, SELECT * is the way to go.
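A small sketch of that exception, again using Python's sqlite3 module as a stand-in (an assumption of this example). The tables item and item_history are hypothetical names used in place of table and table_history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER, name TEXT);
    CREATE TABLE item_history (id INTEGER, name TEXT);
    INSERT INTO item (id, name) VALUES (1, 'widget');
""")

ARCHIVE = "INSERT INTO item_history SELECT * FROM item"
conn.execute(ARCHIVE)  # copies the two-column row

# The same column is added to both tables, in the same position.
conn.execute("ALTER TABLE item ADD COLUMN price REAL")
conn.execute("ALTER TABLE item_history ADD COLUMN price REAL")
conn.execute("UPDATE item SET price = 9.5")

conn.execute(ARCHIVE)  # the unchanged statement now copies three columns

history = conn.execute(
    "SELECT id, name, price FROM item_history ORDER BY rowid"
).fetchall()
print(history)  # [(1, 'widget', None), (1, 'widget', 9.5)]
```

This only works as long as both tables keep the same columns in the same order; if they drift apart, the statement either fails or copies values into the wrong columns.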
Remember that your database server isn't necessarily on the same machine as the program querying the database. The database server could be on a network with limited bandwidth; it could even be halfway across the world.
If you really do need every column, then by all means do SELECT * FROM table.
If you only need certain columns, though, it would waste bandwidth to ask for all columns using SELECT * FROM table only to throw half the columns away.
Other potential reasons it might be good to specify which exact columns you want:
The database structure may change. If your program assumes certain column names, then it may fail if the column names change, for example. Explicitly naming the columns you want to retrieve will make the program fail immediately if your assumptions about the column names are violated.
As @Konerak mentioned, naming the columns you want also ensures that the order of the columns in your result is the same, even if the table schema changes (e.g. inserting one column in-between two others). This is important if you're depending on FirstName being the [2]nd element of a result.
(Note: a more robust and self-documenting way of dealing with this is to ask for your database results as a list of key-value pairs, like a PHP associative array, Perl hash or a Python dict. That way you never need to use a number to index into the result (name = result[2] ) - instead you can use the column name: name = result["FirstName"].)
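As a sketch of the difference between the two access styles, the example below uses Python's sqlite3 module with its sqlite3.Row row factory. The two tables are hypothetical and simply model the same table before and after a Title column was inserted in the middle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support both row[2] and row["FirstName"]
conn.executescript("""
    CREATE TABLE people_old (id INTEGER, LastName TEXT, FirstName TEXT);
    CREATE TABLE people_new (id INTEGER, LastName TEXT, Title TEXT, FirstName TEXT);
    INSERT INTO people_old (id, LastName, FirstName)
        VALUES (1, 'Lovelace', 'Ada');
    INSERT INTO people_new (id, LastName, Title, FirstName)
        VALUES (1, 'Lovelace', 'Countess', 'Ada');
""")

old = conn.execute("SELECT * FROM people_old").fetchone()
new = conn.execute("SELECT * FROM people_new").fetchone()

# Positional access silently changes meaning after the schema change...
print(old[2], "->", new[2])                      # Ada -> Countess
# ...while access by column name keeps returning the same thing.
print(old["FirstName"], "->", new["FirstName"])  # Ada -> Ada
```

In PHP the equivalent would be fetching associative arrays instead of numerically indexed ones.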
Using SELECT * is very inefficient, especially for tables that have a lot of columns. You should only select the columns you need.
Besides this, using column names makes the query easier to read and maintain.

Does the order of columns in a query matter?

When selecting columns from a MySQL table, is performance affected by the order that you select the columns as compared to their order in the table (not considering indexes that may cover the columns)?
For example, you have a table with columns uid, name, bday, and you have the following query.
SELECT uid, name, bday FROM table
Does MySQL see the following query any differently and thus cause any sort of performance hit?
SELECT uid, bday, name FROM table
The order doesn't matter, actually, so you are free to order them however you'd like.
edit: I guess a bit more background is helpful: As far as I know, the process of optimizing any query happens prior to determining exactly what subset of the row data is being pulled. So the query optimizer breaks it down into first what table to look at, joins to perform, indexes to use, aggregates to apply, etc., and then retrieves that dataset. The column ordering happens between the data pull and the formation of the result set, so the data actually "arrives" as ordered by the database, and is then reordered as it is returned to your application.
In practice, I suspect it might.
With a decent query optimiser: it shouldn't.
You can only tell for your cases by measuring. And the measurements will likely change as the distribution of data changes in the database.
The effect of the order of the selected attributes is negligible. The underlying storage engines surely order their attribute locations, but in most cases you have no way of knowing that ordering: because of renames, ALTER TABLE operations, and row vs. column stores, it may be independent of the table description, which is just metadata anyway. Arranging the columns for presentation in the result set adds no measurable overhead.

MySql queries: really never use SELECT *?

I'm a self-taught developer and I've always been told not to use SELECT *, but most of my queries need all the values of a given row...
What should I use then? Should I list ALL of the properties every time, like SELECT elem1,elem2,elem3,....,elem15 FROM...?
thanks
If you really need all the columns and you're fetching the results by name, I would go ahead and use SELECT *. If you're fetching the row results by index, then it makes sense to specify the column names or else they might not be in the order you expect (especially if the table schema changes).
SELECT * FROM ... is not always the best way to go unless you need all columns. That's because if, for example, a table has 10 columns and you only need 2-3 of them and these columns are indexed, then SELECT * will run slower, because the server must fetch all rows from the datafiles. If instead you select only the 2-3 columns that you actually need, the server can run the query much faster when the rows are fetched from a covering index. A covering index is one that is used to return results without reading the datafile.
So, use SELECT * only when you actually need all columns.
If you absolutely have to use *, try to limit it to a specific table; e.g.:
SELECT t.*
FROM mytable t
List only the columns that you need, ideally with a table alias:
SELECT t.elem1,
t.elem2
FROM YOUR_TABLE t
The presence of a table alias helps demonstrate what is a column (and where it's from) vs a derived column.
If you are positive that you will always need all the columns then select * should be ok. But my reasoning for avoiding it is: say another developer has another column added to the table which isn't required by your query..then there is overhead. This can get worse as more columns get added.
The only real performance hit you take from using select * is in the bandwidth required to send back extra columns in your result set, if they're not necessary. Other than that, there's nothing inherently "bad" about using select *.
You might SELECT * from a subquery.
Yes, Select * is bad. You do not state what language you will be using to process the returned data. Suppose you receive these records back as an array (not a hash map). In this case, what is in Row[12]? Maybe it was ZipCode when you wrote the app, but guess what happens when someone inserts a field SuiteNumber before ZipCode.
Or suppose the next coder appends a huge blob field to each record. Ouch!
Or a little more subtle: let's assume you're doing a join or subselect, and you have no Text or Blob type fields. MySQL will create any temp files it needs in memory. But as soon as you include a Text field (even TinyText), MySQL will need to create the temp file on disk, and call sort-merge. This won't break the program, but it can kill performance.
Select * is sacrificing maintainability in order to save a little typing.