Why is the use of wildcard * in select statements discouraged? [closed] - mysql

Closed. This question is opinion-based. It is not currently accepting answers.
I've been advised on this very site not to use the wildcard * in my SQL queries.
Wrong query
SELECT * FROM table
instead of
SELECT field_a, field_b, field_c FROM table
I understand only one reason: if you need just 3 fields from that query, there is no point in forcing the SQL engine to work with all the fields and send them all back to your program when you only want a few.
But this makes me wonder whether it's still fine to use it when you need all (or almost all) of the fields retrieved, or whether even in those cases it's better to specify every field explicitly.
Is there any other reason to avoid wildcards than reducing the amount of data sent from the DB engine to the program?

The reason you have already understood is very much valid, and is perhaps the strongest reason why this advice is given.
In many applications the table contains a lot of columns (let's say 20) and is also huge, containing millions of records. If you want to retrieve only specific columns, there is no point in using the wildcard *, because the MySQL engine then has to needlessly read and return every column for every matching row.
But to be clear, it is not that * is discouraged outright; in fact it can be a boon in the same situation, when you have 20 columns and you genuinely want the values from all of them.
To add more to it, * can be slower for the following reasons:
You don't create an index on all of your columns, so a SELECT * may force a full table scan where a narrower column list could have been served from an index alone. That makes the query slow.
Returning trailing fields from a table that contains variable-length columns adds a slight lookup overhead, because the engine has to step past the variable-length values to reach them.
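To illustrate the index point, here is a minimal sketch (the table, index, and column names are made up for this example, not taken from the question):
CREATE TABLE orders (
    id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_id INT UNSIGNED NOT NULL,
    status      VARCHAR(20)  NOT NULL,
    notes       TEXT,                                -- large variable-length column
    INDEX idx_customer_status (customer_id, status)
);
-- Can be answered from idx_customer_status alone (EXPLAIN shows "Using index"):
SELECT customer_id, status FROM orders WHERE customer_id = 42;
-- Has to fetch the full row, including the TEXT column, for every match:
SELECT * FROM orders WHERE customer_id = 42;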

Using * means you're querying all the table's fields. If that's what your application is supposed to do, it makes sense to use it.
If that's not what your application is supposed to do, it's a potential recipe for trouble. If the table is ever modified in the future, the best case scenario is that your application will be querying columns it doesn't need, which may harm performance. The worst case scenario is that it will just break.
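As a hedged sketch of the "it will just break" case (the tables and columns here are invented, not from the question), code that relies on the positional result of SELECT * can fail outright once a column is added:
CREATE TABLE users (id INT, name VARCHAR(50));
CREATE TABLE archive_users (id INT, name VARCHAR(50));
INSERT INTO archive_users SELECT * FROM users;    -- works while both tables match
ALTER TABLE users ADD COLUMN email VARCHAR(100);
INSERT INTO archive_users SELECT * FROM users;    -- now fails: column counts no longer match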

I agree with all others that it's not "evil" per se, but I do try and avoid it because of the specific design pattern that I follow. Generally after designing tables, I then create views and join together relevant tables. Finally, I create stored procedures which then select from the views.
I have found that it is problematic (at least in SQL Server) to use wildcard selects in the views and stored procedures. Everything looks good at first, but it breaks down after new fields are added to the source tables. It corrupts the views, and they must then be rebuilt to be fixed. Depending on the size of the system, this can be a real problem.
Since wildcard selects in views end up corrupted after the source tables are altered, I have started avoiding them and instead manually alter the views after adding new columns to tables.
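MySQL behaves similarly: the * in a view definition is expanded into a fixed column list when the view is created, so the view does not pick up columns added later. A minimal sketch (table and view names are made up):
CREATE TABLE clients (id INT PRIMARY KEY, name VARCHAR(50));
CREATE VIEW v_clients AS SELECT * FROM clients;     -- * is expanded to (id, name) right now
ALTER TABLE clients ADD COLUMN email VARCHAR(100);
SELECT * FROM v_clients;                            -- still returns only id and name
CREATE OR REPLACE VIEW v_clients AS                 -- the view has to be redefined by hand
    SELECT id, name, email FROM clients;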

There is no other specific reason beyond the following two, which are also simply considered good, optimized practice when writing queries.
You might not need all the fields from the query, so it's better to fetch only the required ones; that reduces the load on the server and the data is fetched much faster.
Sometimes a field has an awkward name like lcf_user_email_response which we don't want to expose when fetching data and showing it on the site, so to alias the field we have to use the field name, not the wildcard (see the sketch below).
Using field names gives us more freedom to play with the fields and the output.
But there is no hard restriction against the wildcard, nor anything inherently bad about it; use it if you need all the fields.
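A quick illustration of the aliasing point (the users table and its second column are invented for this example):
SELECT lcf_user_email_response AS email_response,
       user_display_name       AS name
FROM   users;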

Related

When/why to use (combined) indexes? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
While working on a project to test the performance on a database, I came to the point of adding indexes.
Having surfed a good part of the internet, I'm still left with a couple of questions.
On what table/column is it a good idea to put an index?
I have different types of tables, for example a table full of predefined country names. I believe it is a good idea to put an index on the column country_name. I know this is good because there is only a small chance I'll have to add new records to this table, and queries will be faster when a country_name is used in the WHERE clause.
But what about more complex tables like client (or any other table that will change a lot and contains a large number of columns)?
What about combined indexes?
When are combined indexes a good idea? Is it when I will query a lot of clients by their first_name and last_name together? Or is it better to add individual indexes to both of those columns?
Paradox?
Having read this answer on Stack Overflow, I'm left with a paradox. Knowing the data will increase significantly is a reason for me to add an index. But it will slow things down at the same time, as indexes slow down updates/inserts.
e.g. I have to keep a daily track of the weight of clients (>3M records). Adding an index will help me get my results faster. But I gain about 1000 new clients each day, so I'll have to insert them AND update their weights, which means slower performance because of the inserts/updates.
MySQL-specific addition
Is there an advantage to using different storage engines, combined with indexes?
So far I've only used InnoDB.
I'm going to focus on the "Combined Indexes" part of the question, but use that to cover several other points that I think will help you better understand indexes.
What about combined indexes?
When are combined indexes a good idea? Is it when I will query a lot of clients by their first_name and last_name together? Or is it better to add individual indexes to both of those columns?
Indexes are just like phone books. A phone book is a table with fields for Last_Name, First_Name, Address, and Phone_Number. This table has an index on Last_Name,First_Name. This is what you called a combined index.
Let's say you wanted to find "John Smith" in this phone book. That would work out to a query like this:
SELECT * FROM PhoneBook WHERE First_Name = 'John' and Last_Name = 'Smith';
That's pretty easy in your phone book. Just find the section for "Smith", and then go find all the "John"s within that section.
Now imagine that instead of a combined index on Last_Name,First_Name, you had separate indexes: one for Last_Name and one for First_Name. You try to run the same query. So you open up the Last_Name index and find the section for Smith. There are a lot of them. You go to find the Johns, but the First_Name fields aren't in the correct order. Maybe it's ordered by Address now instead. More likely in a database, it's in order by when this particular Mr or Ms Smith first moved to town. You'll have to go through all of the Smiths to find your phone number. That's not so good.
So we move to the First_Name index instead. You do the same process and find the section for "John". This isn't any better. We didn't specify to additionally order by last name, and so you have to go through all of the Johns to find your Smiths.
This is exactly how database indexes work. Each index is just a copy of the information included in the index, stored in the order specified by the index, along with a pointer back to the full record. There are some additional optimizations, like not completely filling up each page of the index so that you can more efficiently add new entries without having to rebuild the whole index (you only need to rebuild that page), but in a nutshell each new index is another phone book that you have to maintain. Hopefully you can see now why things like COLUMN LIKE '%keyword%' searches are so bad.
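In MySQL terms, the phone book above looks roughly like this (the index name and column types are assumptions; the table and columns come from the analogy):
CREATE TABLE PhoneBook (
    Last_Name    VARCHAR(50) NOT NULL,
    First_Name   VARCHAR(50) NOT NULL,
    Address      VARCHAR(100),
    Phone_Number VARCHAR(20),
    INDEX idx_last_first (Last_Name, First_Name)   -- the combined index
);
-- Uses idx_last_first fully (both columns form a leftmost prefix of the index):
SELECT * FROM PhoneBook WHERE Last_Name = 'Smith' AND First_Name = 'John';
-- Still uses the index (Last_Name alone is a leftmost prefix):
SELECT * FROM PhoneBook WHERE Last_Name = 'Smith';
-- Cannot use idx_last_first efficiently; First_Name alone is not a leftmost prefix:
SELECT * FROM PhoneBook WHERE First_Name = 'John';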
The other thing to understand about indexes is they exist to support queries, not tables. You don't necessarily want to look at a table and think about what columns you'll key on. You want to look at your queries and think about what columns they use for each table.
For this reason, you may still need separate indexes for both First_Name and Last_Name. This would be when you need to support different queries that use different means to query the table. This is also why applications don't always just let you search by any field. Each additional searchable field requires new indexes, which adds new performance cost to the application.
This is also the reason why it's so important to have a separate and organized database layer in your application. It helps you get a handle on what queries you really have, and therefore what indexes you really need. Good tiered application design, or a well-designed service layer for the service-oriented crowd, is really a performance thing as much as anything else, because database performance often cuts to the core of your larger application performance.
OK, you need to know two things: indexes increase the speed of searches (SELECT) but will slow down your changes (INSERT/UPDATE/DELETE). If you need to keep a tracking log, try using one table only to collect the raw information, and another table to hold a summarized version of your tracking data. Example:
table track ( ip,date,page,... )
table hour_track ( page,number_visitator,date )
In the track table you will only add rows, never update or delete. The hour_track table you generate with a cron job (or another technique), and there you add a combined index (most_searched_column, second_most_searched_column, ...). A combined index will increase your speed because the database only needs to maintain one index tree instead of several. More than that, if one column is used most often in your queries, you can put that column first in your index declaration. You can read more here.
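A rough sketch of that two-table setup (the column types and the hourly aggregation query are assumptions, not part of the original answer):
CREATE TABLE track (
    ip   VARCHAR(45)  NOT NULL,
    date DATETIME     NOT NULL,
    page VARCHAR(255) NOT NULL
    -- insert-only: keep indexes to a minimum so inserts stay fast
);
CREATE TABLE hour_track (
    page             VARCHAR(255) NOT NULL,
    number_visitator INT UNSIGNED NOT NULL,
    date             DATETIME     NOT NULL,
    INDEX idx_page_date (page, date)   -- combined index, most-queried column first
);
-- Filled periodically (e.g. from a cron job) by summarizing the raw track table:
INSERT INTO hour_track (page, number_visitator, date)
SELECT page, COUNT(*), '2013-01-01 12:00:00'
FROM   track
WHERE  date >= '2013-01-01 12:00:00' AND date < '2013-01-01 13:00:00'
GROUP  BY page;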

Batch Set all MySQL columns to all NULL

I have a large database with a bunch of tables, and the columns are mixed: some allow NULL while others don't.
I just recently decided to STANDARDIZE my methods and USE NULL for all empty fields etc., therefore I need to set ALL COLUMNS in ALL my tables to allow NULL (except for primary keys, of course).
I can whip up some PHP code to loop through this, but I was wondering if there's a quick way to do it via SQL?
regards
You can use metadata from the system tables to determine your tables, columns, types etc. Then, using that, dynamically build a string that contains the ALTER SQL you need, with the table and column names concatenated into it. This is then executed.
I've recently posted a solution that allowed the OP to search through columns looking for those that contain a particular value. Until someone provides a more complete answer, this should give you some clues about how to approach this (or at least, what to research). You'd need to either provide table names, or join to them, and then do something similar to this, except you'd be checking the column definition, not a value (and the dynamic SQL you build would be ALTER statements, not a SELECT).
I will be in a position to help you with your specific scenario further in a few hours... If by then you've had no luck with this (or other answers), I'll provide something more complete.
EDIT: Just realised you've tagged this as MySQL... My solution was for MS SQL Server. The principles should be the same (and hence I'll leave this answer up, as I think you'll find it useful), assuming MySQL allows you to query its metadata and execute dynamically generated SQL commands.
SQL Server - Select columns that meet certain conditions?
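For MySQL specifically, the metadata lives in INFORMATION_SCHEMA. A hedged sketch of the same approach (the schema name 'mydb' is a placeholder; review the generated statements before running them, since this simple version drops per-column extras such as DEFAULT values and character sets):
SELECT CONCAT('ALTER TABLE `', TABLE_NAME, '` MODIFY `', COLUMN_NAME, '` ',
              COLUMN_TYPE, ' NULL;') AS alter_stmt
FROM   INFORMATION_SCHEMA.COLUMNS
WHERE  TABLE_SCHEMA = 'mydb'
  AND  IS_NULLABLE  = 'NO'
  AND  COLUMN_KEY   <> 'PRI';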

Which of these 2 MySQL DB Schema approaches would be most efficient for retrieval and sorting?

I'm confused as to which of the two db schema approaches I should adopt for the following situation.
I need to store multiple attributes for a website, e.g. page size, word count, category, etc., and the number of attributes may increase in the future. The purpose is to display this table to the user, and he should be able to quickly filter/sort the data (so the table structure should support fast querying and sorting). I also want to keep a log of previous data to maintain a timeline of changes. So the two table structure options I've thought of are:
Option A
website_attributes
id, website_id, page_size, word_count, category_id, title_id, ...... (going up to 18 columns and have to keep in mind that there might be a few null values and may also need to add more columns in the future)
website_attributes_change_log
same table structure as above with an added column for "change_update_time"
I feel the advantage of this schema is that the queries will be easy to write, even when some attributes are linked to other tables, and sorting will also be simple. The disadvantage, I guess, is that adding columns later can be problematic, with ALTER TABLE taking a very long time to run on large tables, plus there could be many rows with many NULL columns.
Option B
website_attribute_fields
attribute_id, attribute_name (e.g. page_size), attribute_value_type (e.g. int)
website_attributes
id, website_id, attribute_id, attribute_value, last_update_time
The advantage here seems to be the flexibility of this approach, in that I can add columns whenever I want, and I also save on storage space. However, as much as I'd like to adopt this approach, I feel that writing queries will be especially complex when I need to display the tables [since I will need to display records for multiple sites at a time and there will also be cross-referencing of values with other tables for certain attributes], and sorting the data might be difficult [given that this is not a column-based approach].
A sample output of what I'd be looking at would be:
Site-A.com, 232032 bytes, 232 words, PR 4, Real Estate [linked to category table], ..
Site-B.com, ..., ..., ... ,...
And the user needs to be able to sort by all the number based columns, in which case approach B might be difficult.
So I want to know if I'd be doing the right thing by going with Option A or whether there are other better options that I might have not even considered in the first place.
I would recommend using Option A.
You can mitigate the pain of long-running ALTER TABLE by using pt-online-schema-change.
The upcoming MySQL 5.6 supports non-blocking ALTER TABLE operations.
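For reference, a typical pt-online-schema-change invocation for this kind of change looks roughly like the following (the database name and new column are placeholders; check the tool's documentation for your version before running it):
pt-online-schema-change --alter "ADD COLUMN new_attribute INT" D=mydb,t=website_attributes --execute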
Option B is called Entity-Attribute-Value, or EAV. This breaks rules of relational database design, so it's bound to be awkward to write SQL queries against data in this format. You'll probably regret using it.
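To make the awkwardness concrete, here is roughly what reproducing one row per website from Option B looks like (a sketch; the attribute_name values are assumptions). Every attribute needs its own MAX(CASE ...) or self-join, and sorting numerically needs a CAST because attribute_value is a generic string column:
SELECT a.website_id,
       MAX(CASE WHEN f.attribute_name = 'page_size'  THEN a.attribute_value END) AS page_size,
       MAX(CASE WHEN f.attribute_name = 'word_count' THEN a.attribute_value END) AS word_count,
       MAX(CASE WHEN f.attribute_name = 'category'   THEN a.attribute_value END) AS category
FROM   website_attributes a
JOIN   website_attribute_fields f ON f.attribute_id = a.attribute_id
GROUP  BY a.website_id
ORDER  BY CAST(MAX(CASE WHEN f.attribute_name = 'word_count'
                        THEN a.attribute_value END) AS UNSIGNED);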
I have posted several times on Stack Overflow describing pitfalls of EAV.
Also in my blog: EAV FAIL.
Option A is the better way. Although the ALTER TABLE to add an extra column may take a long time, querying and sorting are quicker. I have used a design like Option A before, and the ALTER TABLE didn't take too long even with millions of records in the table.
You should go with Option B because it is more flexible and uses less RAM. With Option A you have to fetch a lot of content into RAM, which increases the chance of page faults. If you want to reduce the query time of the database, you should definitely index it to get fast results.
I think Option A is not a good design. When you design a good data model you should not have to change the tables in the future. If you have a good command of SQL, writing queries against Option B will not be difficult. It is also the solution to your real problem: you need to store an open-ended set of attributes (not a final, fixed list) for some web pages, therefore there should be an entity that represents those attributes.
Use Option A, as the attributes are fixed. It will be difficult to query and process data from the second model, since queries will have to combine data from multiple attribute rows.

Is there any way to make queries using functions in their WHERE sections relatively fast?

Let's take a table Companies with columns id, name and UCId. I'm trying to find companies where the numeric portion of the UCId matches some string of digits.
The UCIds usually look like XY123456, but they're user input, and the users seem to really love leaving random spaces in them and sometimes even not entering the XY at all, and they want to keep it that way. What I'm saying is that I can't enforce a standard pattern. They want to enter it their way, and read it their way as well. So I'm stuck having to use functions in my WHERE clause.
Is there a way to make these queries not take unusably long in mysql? I know what functions to use and all that, I just need a way to make the search at least relatively fast. Can I somehow create a custom index with the functions already applied to the UCId?
Just for reference, an example of the query I'd like to use:
SELECT *
FROM Companies
WHERE digits_only(UCId) = 'some_digits'
I'll just add that the Companies table usually has tens of thousands of rows, and in some instances the query needs to be run repeatedly; that's why I need a fast solution.
Unfortunately, MySQL doesn't have such things as function-based (generally speaking, expression-based) indexes, like Oracle or PostgreSQL do. One possible workaround is to add another column to the Companies table which holds the normalized values (i.e., digits_only(UCId)). This column can be maintained in your code or via DB triggers set on INSERT/UPDATE.
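A sketch of that workaround (digits_only() is the hypothetical function from the question, so substitute whatever normalization you actually use; the extra column and trigger names are placeholders):
ALTER TABLE Companies ADD COLUMN UCId_digits VARCHAR(32);
CREATE INDEX idx_ucid_digits ON Companies (UCId_digits);
-- Backfill existing rows once:
UPDATE Companies SET UCId_digits = digits_only(UCId);
DELIMITER //
CREATE TRIGGER companies_ucid_bi BEFORE INSERT ON Companies
FOR EACH ROW SET NEW.UCId_digits = digits_only(NEW.UCId);
//
CREATE TRIGGER companies_ucid_bu BEFORE UPDATE ON Companies
FOR EACH ROW SET NEW.UCId_digits = digits_only(NEW.UCId);
//
DELIMITER ;
-- Queries can then hit the index directly:
SELECT * FROM Companies WHERE UCId_digits = 'some_digits';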

update vs. insert

We are using InnoDB and have a table that will have many millions of rows. One of the columns will be a varchar(32) whose value will change fairly often. Doing updates to this varchar on tens of thousands of rows will take a long time, so we are toying with the idea of splitting this field off into its own table and then, instead of doing updates, doing a delete followed by a batch insert using LOAD DATA INFILE. It seems like this will greatly improve performance. Am I missing something, though? Is there an easier way to improve update performance? Has anybody done anything like this before?
If you can select the rows you want to update based on indexes alone, this should in practice do the same as your suggestion (and still keep a sane data organization, hence be preferable). Quite possibly this is even faster than doing it yourself.
You could create an index appropriate to the WHERE clause of your UPDATE statement.
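For example (a sketch; the table and column names are placeholders, not from the original post):
CREATE TABLE items (
    id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    batch_id INT UNSIGNED NOT NULL,
    token    VARCHAR(32)  NOT NULL
);
-- Index the column(s) the UPDATE filters on, so row selection never scans the table:
CREATE INDEX idx_items_batch ON items (batch_id);
UPDATE items
SET    token    = 'new-value'   -- the frequently changing varchar(32)
WHERE  batch_id = 42;           -- row selection satisfied via idx_items_batch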
The idea of splitting it up may improve performance (I'm not sure), but only when all values change at once. When individual values change, this approach is slower than the approach with one table.
Another precondition for it being faster is that you must know the key->value mapping of the second table beforehand. If you have to look into the first table to decide how to store values in the second one, you are again slower than with one table.