MySql queries: really never use SELECT *? - mysql

I'm a self taught developer and ive always been told not to use SELECT *, but most of my queries require to know all the values of a certain row...
what should i use then? should i list ALL of the properties every time? like Select elem1,elem2,elem3,....,elem15 FROM...?
thanks

If you really need all the columns and you're fetching the results by name, I would go ahead and use SELECT *. If you're fetching the row results by index, then it makes sense to specify the column names or else they might not be in the order you expect (especially if the table schema changes).

SELECT * FROM ... is not always the best way to go unless you need all columns. That's because if for example a table has 10 columns and you only need 2-3 of them and these columns are indexed, then if you use SELECT * the query will run slower, because the server must fetch all rows from the datafiles. If instead you used only the 2-3 columns that you actually needed, then the server could run the query much faster if the rows were fetch from a covering index. A covering index is one that is used to return results without reading the datafile.
So, use SELECT * only when you actually need all columns.

If you absolutely have to use *, try to limit it to a specific table; e.g.:
SELECT t.*
FROM mytable t

List only the columns that you need, ideally with a table alias:
SELECT t.elem1,
t.elem2
FROM YOUR_TABLE t
The presence of a table alias helps demonstrate what is a column (and where it's from) vs a derived column.

If you are positive that you will always need all the columns then select * should be ok. But my reasoning for avoiding it is: say another developer has another column added to the table which isn't required by your query..then there is overhead. This can get worse as more columns get added.

The only real performance hit you take from using select * is in the bandwidth required to send back extra columns in your result set, if they're not necessary. Other than that, there's nothing inherently "bad" about using select *.

You might SELECT * from a subquery.

Yes, Select * is bad. You do not state what language you will be using process the returned data. Suppose you receive these records back as an array (not a hash map). In this case, what is in Row[12]? Maybe it was ZipCode when you wrote the app, but guess what happens when someone inserts a field SuiteNumber before ZipCode.
Or suppose the next coder appends a huge blob field to each record. Ouch!
Or a little more subtle: let's assume you're doing a join or subselect, and you have no Text or Blob type fields. MySQL will create any temp files it needs in memory. But as soon as you include a Text field (even TinyText), MySQL will need to create the temp file on disk, and call sort-merge. This won't break the program, but it can kill performance.
Select * is sacrificing maintainability in order to save a little typing.

Related

Should I avoid ORDER BY in queries for large tables?

In our application, we have a page that displays user a set of data, a part of it actually. It also allows user to order it by a custom field. So in the end it all comes down to query like this:
SELECT name, info, description FROM mytable
WHERE active = 1 -- Some filtering by indexed column
ORDER BY name LIMIT 0,50; -- Just a part of it
And this worked just fine, as long as the size of table is relatively small (used only locally in our department). But now we have to scale this application. And let's assume, the table has about a million of records (we expect that to happen soon). What will happen with ordering? Do I understand correctly, that in order to do this query, MySQL will have to sort a million records each time and give a part of it? This seems like a very resource-heavy operation.
My idea is simply to turn off that feature and don't let users select their custom ordering (maybe just filtering), so that the order would be a natural one (by id in descending order, I believe the indexing can handle that).
Or is there a way to make this query work much faster with ordering?
UPDATE:
Here is what I read from the official MySQL developer page.
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it still uses indexes to find the rows that match the WHERE
clause. These cases include the following:
....
The key used to
fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
So yes, it does seem like mysql will have a problem with such a query? So, what do I do - don't use an order part at all?
The 'problem' here seems to be that you have 2 requirements (in the example)
active = 1
order by name LIMIT 0, 50
The former you can easily solve by adding an index on the active field
The latter you can improve by adding an index on name
Since you do both in the same query, you'll need to combine this into an index that lets you resolve the active value quickly and then from there on fetches the first 50 names.
As such, I'd guess that something like this will help you out:
CREATE INDEX idx_test ON myTable (active, name)
(in theory, as always, try before you buy!)
Keep in mind though that there is no such a thing as a free lunch; you'll need to consider that adding an index also comes with downsides:
the index will make your INSERT/UPDATE/DELETE statements (slightly) slower, usually the effect is negligible but only testing will show
the index will require extra space in de database, think of it as an additional (hidden) special table sitting next to your actual data. The index will only hold the fields required + the PK of the originating table, which usually is a lot less data then the entire table, but for 'millions of rows' it can add up.
if your query selects one or more fields that are not part of the index, then the system will have to fetch the matching PK fields from the index first and then go look for the other fields in the actual table by means of the PK. This probably is still (a lot) faster than when not having the index, but keep this in mind when doing something like SELECT * FROM ... : do you really need all the fields?
In the example you use active and name but from the text I get that these might be 'dynamic' in which case you'd have to foresee all kinds of combinations. From a practical point this might not be feasible as each index will come with the downsides of above and each time you add an index you'll add supra to that list again (cumulative).
PS: I use PK for simplicity but in MSSQL it's actually the fields of the clustered index, which USUALLY is the same thing. I'm guessing MySQL works similarly.
Explain your query, and check, whether it goes for filesort,
If Order By doesnt get any index or if MYSQL optimizer prefers to avoid the existing index(es) for sorting, it goes with filesort.
Now, If you're getting filesort, then you should preferably either avoid ORDER BY or you should create appropriate index(es).
if the data is small enough, it does operations in Memory else it goes on the disk.
so you may try and change the variable < sort_buffer_size > as well.
there are always tradeoffs, one way to improve the preformance of order query is to set the buffersize and then the run the order by query which improvises the performance of the query
set sort_buffer_size=100000;
<>
If this size is further increased then the performance will start decreasing

Retrieve min and max values from different tables with same strucure

I have some logs tables with the same structure. Each tables is related to a site and count billion of entries. The reason of this split is to perform quick and efficient query, because 99.99% of the query are related to the site.
But at this time, I would like to retrieve the min and max value of a column of these tables?
I can't manage to write the SQL request. Should I use UNION?
I am just looking for the request concept, not the final SQL request.
You could use a UNION, yes. Something like this should do:
SELECT MAX(PartialMax) AS TotalMax
FROM
( SELECT MAX(YourColumn) AS PartialMax FROM FirstTable UNION ALL SELECT MAX(YourColumn) AS PartialMax FROM SecondTable ) AS X;
If you have an index over the column you want to find a MAX inside, you should have very good performance as the query should seek to the end of the index on that column to find the maximum value very rapidly. Without an index on that column, the query has to scan the whole table to find the maximum value since nothing inherently orders it.
Added some details to address a concern about "enormous queries".
I'm not sure what you mean by "enormous". You could create a VIEW that does the UNIONs for you; then, you use the view and it will make the query very small:
SELECT MAX(YourColumn) FROM YourView;
but that just optimizes for the size of your query's text. Why do you believe it is important to optimize for that? The VIEW can be helpful for maintenance -- if you add or remove a partition, just fix the view appropriately. But a long query text shouldn't really be a problem.
Or by "enormous", are you worried about the amount of I/O the query will do? Nothing can help that much, aside from making sure each table has an index on YourColumn so that maximum value on each partition can be found very quickly.

Selecting all fields (but one) instead of using asterix (*) decreases running time by 10 times [duplicate]

I've Googled this question and can't seem to find a consistent opinion, or many opinions that are based on solid data. I simply would like to know if using the wildcard in a SQL SELECT statement incurs additional overhead than calling each item out individually. I have compared the execution plans of both in several different test queries, and it seems that the estimates always read the same. Is it possible that some overhead is incurred elsewhere, or are they truly handled identically?
What I am referring to specifically:
SELECT *
vs.
SELECT item1, item2, etc.
SELECT * FROM...
and
SELECT every, column, list, ... FROM...
will perform the same because both are an unoptimised scan
The difference is:
the extra lookup in sys.columns to resolve *
the contract/signature change when the table schema changes
inability to create a covering index. In fact, no tuning options at all, really
have to refresh views needed if non schemabound
can not index or schemabind a view using *
...and other stuff
Other SO questions on the same subject...
What is the reason not to use select * ?
Is there a difference betweeen Select * and Select list each col
SQL Query Question - Select * from view or Select col1,col2…from view
“select * from table” vs “select colA,colB,etc from table” interesting behaviour in SqlServer2005
Do you mean select * from ... instead of select col1, col2, col3 from ...?
I think it's always better to name the column and retrieve the minimal amount of information, because
your code will work independently of the physical order of the columns in the db. The column order should not impact your application, but it will be the case if you use *. It can be dangerous in case of db migration, etc.
if you name the columns, the DBMS can optimize further the execution. For instance, if there is an index that contains all the data your are interested in, the table will not be accessed at all.
If you mean something else with "wildcard", just ignore my answer...
EDIT: If you are talking about the asterisk wild card as in Select * From ... then see other responses...
If you are talking about wildcards in predicate clauses, or other query expressions using Like operator, (_ , % ) as described below, then:
This has to do with whether using the Wildcard affects whether the SQL is "SARG-ABLE" or not. SARGABLE, (Search-ARGument-able)means whether or not the query's search or sort arguments can be used as entry parameters to an existing index. If you prepend the wild card to the beginning of an argument
Where Name Like '%ing'
Then there is no way to traverse an index on the name field to find the nodes that end in 'ing'.
If otoh you append the wildcard to the end,
Where Name like 'Donald%'
then the optimizer can still use an index on the name column, and the query is still SARG-able
If that you call SQL wild car is *. It does not imply performance overhead by it self. However, if the table is extended you could find yourself retrieving fields you doesn't search.
In general not being specific in the fields you search or insert is a bad habit.
Consider
insert into mytable values(1,2)
What happen if the table is extended to three fields?
It may not be more work from an execution plan standpoint. But if you're fetching columns you don't actually need, that's additional network bandwidth being used between the database and your application. Also if you're using a high-level client API that performs some work on the returned data (for example, Perl's selectall_hashref) then those extra columns will impose performance cost on the client side. How much? Depends.

Why shouldn't we use Select * in a mysql query on a production server?

Based on this question here Selecting NOT NULL columns from a table One of the posters said
you shouldn't use SELECT * in production.
My Question: Is it true that we shouldn't use Select * in a mysql query on a production server? If yes, why shouldn't we use select all?
Most people do advise against using SELECT * in production, because it tends to break things. There are a few exceptions though.
SELECT * fetches all columns - while most of the times you don't
need them all. This causes the SQL-server to send more columns than
needed, which is a waste and makes the system slower.
With SELECT *, when you later add a column, the old query will also
select this new column, while typically it will not need it. Naming
the columns explicitly prevents this.
Most people that write SELECT * queries also tend to grab the rows
and use column order to get the columns - which WILL break your code
once columns are injected between existing columns.
Explicitly naming the columns also guarantees they are always in the same order, while SELECT * might behave differently when the table column order is modified.
But there are exceptions, for example statements like these:
INSERT INTO table_history
SELECT * FROM table
A query like that takes rows from table, and inserts them into table_history. If you want this query to keep working when new rows are added to table AND to table_history, SELECT * is the way to go.
Remember that your database server isn't necessarily on the same machine as the program querying the database. The database server could be on a network with limited bandwidth; it could even be halfway across the world.
If you really do need every column, then by all means do SELECT * FROM table.
If you only need certain columns, though, it would waste bandwidth to ask for all columns using SELECT * FROM table only to throw half the columns away.
Other potential reasons it might be good to specify which exact columns you want:
The database structure may change. If your program assumes certain column names, then it may fail if the column names change, for example. Explicitly naming the columns you want to retrieve will make the program fail immediately if your assumptions about the column names are violated.
As #Konerak mentioned, naming the columns you want also ensures that the order of the columns in your result is the same, even if the table schema changes (i.e. inserting one column in-between two others.) This is important if you're depending on FirstName being the [2]nd element of a result.
(Note: a more robust and self-documenting way of dealing with this is to ask for your database results as a list of key-value pairs, like a PHP associative array, Perl hash or a Python dict. That way you never need to use a number to index into the result (name = result[2] ) - instead you can use the column name: name = result["FirstName"].)
Using SELECT * is very inefficient, especially for tables that have a lot of columns. You should only select the columns you need.
Besides this, using column names makes the query easier to read and maintain.

MySQL SELECT efficiency

I am using PHP to interact with a MySQL database, and I was wondering if querying MySQL with a "SELECT * FROM..." is more or less efficient than a "SELECT id FROM...".
Less efficient.
If you think about it, SQL is going to send all the data for each row you select.
Imagine that you have a column called MassiveTextBlock - this column will be included when you SELECT * and so SQL will have to send all of that data over when you may not actually require it. In contrast, SELECT id is just grabbing a collection of numbers.
It is less efficient because you are fetching a lot more information than just SELECT id. Additionally, the second question is much more likely to be served using just an index.
It would depend on why you are selecting the data. If you are writing a raw database interface like phpMySQL, then it may make sense.
If you are doing multiple queries on the same table with the same conditions and concatenation operations, then a SELECT col1, col2 FROM table may make more sense to do than using two independent SELECT col1 FROM table and a SELECT col2 FROM table to get different portions of the same data, as this performs both operations in the same query.
But, in general, you should only select the columns you need from the table in order to minimize unnecessary data from being dredged up by the DBMS. The benefits of this increase greatly if your database is on a different server from the client server, or if your database server is old and/or slow.
There is NO CONDITION in which a SELECT * is unavoidable, but if there is, then your data model probably has some serious design flaws.
It depends on your indexes.
Selecting fewer columns can sometimes save a lot of time, if you select only columns that exist in the index that MySQL has used to fetch the results.
For example, if you have an index on the column id, and you perform this query:
SELECT id FROM mytable WHERE id>5
Then MySQL only needs to read the index, and does not need to even read the table row. If on the other hand, you select additional columns, as on:
SELECT id, name FROM mytable WHERE id>5
Then MySQL will need to read the id from the index, then go to the table row to read the name column.
If, however, you are reading columns that aren't in the index you're selecting on anyway, then reading more columns really won't make nearly as much difference, though it will make a small difference. If some columns contain a large amount of data, such as large TEXT or BLOB columns, then it'd be wise to exclude them if you don't need them.