Best practice with MySQL - mysql

I'm just wondering which is the best practice:
I have a table and I want to hide a record so should I use a column like visible = 1 or should I create another table and transfer the data.
thanks!

I would recommend adding a isHidden field for this purpose. I usually use a tinyint for this.

Tables don't have performance—queries have performance. When you're trying to decide how to optimize, concentrate on the queries you will run against the table.
It could be worth moving the data to another table, for example if 90% of the data is "hidden" and rows seldom change their hidden state, you could greatly improve performance for queries against the non-hidden data by keeping that table small.
On the other hand, if you have a mix of queries where you sometimes include and sometimes exclude the "hidden" rows, or rows frequently change their hidden state, it would be more convenient to keep them in the same table.
Both strategies are valid in different circumstances. You need to take all your uses of the data into account.

Related

Optional fields on very large db

I am currently working on the redesign of a mysql bdd.
This database is actually a huge table of 200 fields, several million lines, almost no index... In short, disastrous performances and huge ram consumption!
I first managed to reduce the number of fields by setting up 1:n relationships.
I have a question on a specific point:
A number of fields on this database are optional and rarely filled in. (sometimes almost never)
What is the best thing to do in this case?
leave the field in the table even if it is very often of null value
set up a n:n relationship knowing that these relationships, if they exist, will only return one line
...or another solution I haven't thought of
Thank you in advance for your wise advice;)
Dimitri
My suggestion:
First of all, make sure to normalize your db. At least to 3rd Normal Form. This will probably reduce some of your original columns and split them over several tables.
Once that is done, in case you still have lots of 'optional and rarely filled' columns in some of your tables, consider how to proceed depending on your needs: What is most important to you? Disk space or Performance?
Take a look at MySQL Optimazing Data Size for extra tips/improvements for the re-design of your database ...depending on your needs.
Regarding 'set up a n:n relationship...' (I figure you meant 1:1 relationship) can be an interesting option in some cases:
In some circumstances, it can be beneficial to split into two a table that is scanned very often. This is especially true if it is a dynamic-format table and it is possible to use a smaller static format table that can be used to find the relevant rows when scanning the table (extracted from here)

Use many tables with few columns or few tables with many columns in MySQL? [duplicate]

I'm setting up a table that might have upwards of 70 columns. I'm now thinking about splitting it up as some of the data in the columns won't be needed every time the table is accessed. Then again, if I do this I'm left with having to use joins.
At what point, if any, is it considered too many columns?
It's considered too many once it's above the maximum limit supported by the database.
The fact that you don't need every column to be returned by every query is perfectly normal; that's why SELECT statement lets you explicitly name the columns you need.
As a general rule, your table structure should reflect your domain model; if you really do have 70 (100, what have you) attributes that belong to the same entity there's no reason to separate them into multiple tables.
There are some benefits to splitting up the table into several with fewer columns, which is also called Vertical Partitioning. Here are a few:
If you have tables with many rows, modifying the indexes can take a very long time, as MySQL needs to rebuild all of the indexes in the table. Having the indexes split over several table could make that faster.
Depending on your queries and column types, MySQL could be writing temporary tables (used in more complex select queries) to disk. This is bad, as disk i/o can be a big bottle-neck. This occurs if you have binary data (text or blob) in the query.
Wider table can lead to slower query performance.
Don't prematurely optimize, but in some cases, you can get improvements from narrower tables.
It is too many when it violates the rules of normalization. It is pretty hard to get that many columns if you are normalizing your database. Design your database to model the problem, not around any artificial rules or ideas about optimizing for a specific db platform.
Apply the following rules to the wide table and you will likely have far fewer columns in a single table.
No repeating elements or groups of elements
No partial dependencies on a concatenated key
No dependencies on non-key attributes
Here is a link to help you along.
That's not a problem unless all attributes belong to the same entity and do not depend on each other.
To make life easier you can have one text column with JSON array stored in it. Obviously, if you don't have a problem with getting all the attributes every time. Although this would entirely defeat the purpose of storing it in an RDBMS and would greatly complicate every database transaction. So its not recommended approach to be followed throughout the database.
Having too many columns in the same table can cause huge problems in the replication as well. You should know that the changes that happen in the master will replicate to the slave.. for example, if you update one field in the table, the whole row will be w

Which of these 2 MySQL DB Schema approaches would be most efficient for retrieval and sorting?

I'm confused as to which of the two db schema approaches I should adopt for the following situation.
I need to store multiple attributes for a website, e.g. page size, word count, category, etc. and where the number of attributes may increase in the future. The purpose is to display this table to the user and he should be able to quickly filter/sort amongst the data (so the table strucuture should support fast querying & sorting). I also want to keep a log of previous data to maintain a timeline of changes. So the two table structure options I've thought of are:
Option A
website_attributes
id, website_id, page_size, word_count, category_id, title_id, ...... (going up to 18 columns and have to keep in mind that there might be a few null values and may also need to add more columns in the future)
website_attributes_change_log
same table strucuture as above with an added column for "change_update_time"
I feel the advantage of this schema is the queries will be easy to write even when some attributes are linked to other tables and also sorting will be simple. The disadvantage I guess will be adding columns later can be problematic with ALTER TABLE taking very long to run on large data tables + there could be many rows with many null columns.
Option B
website_attribute_fields
attribute_id, attribute_name (e.g. page_size), attribute_value_type (e.g. int)
website_attributes
id, website_id, attribute_id, attribute_value, last_update_time
The advantage out here seems to be the flexibility of this approach, in that I can add columns whenever and also I save on storage space. However, as much as I'd like to adopt this approach, I feel that writing queries will be especially complex when needing to display the tables [since I will need to display records for multiple sites at a time and there will also be cross referencing of values with other tables for certain attributes] + sorting the data might be difficult [given that this is not a column based approach].
A sample output of what I'd be looking at would be:
Site-A.com, 232032 bytes, 232 words, PR 4, Real Estate [linked to category table], ..
Site-B.com, ..., ..., ... ,...
And the user needs to be able to sort by all the number based columns, in which case approach B might be difficult.
So I want to know if I'd be doing the right thing by going with Option A or whether there are other better options that I might have not even considered in the first place.
I would recommend using Option A.
You can mitigate the pain of long-running ALTER TABLE by using pt-online-schema-change.
The upcoming MySQL 5.6 supports non-blocking ALTER TABLE operations.
Option B is called Entity-Attribute-Value, or EAV. This breaks rules of relational database design, so it's bound to be awkward to write SQL queries against data in this format. You'll probably regret using it.
I have posted several times on Stack Overflow describing pitfalls of EAV.
Also in my blog: EAV FAIL.
Option A is a better way ,though the time may be large when alert table for adding a extra column, querying and sorting options are quicker. I have used the design like Option A before, and it won't take too long when alert table while millions records in the table.
you should go with option 2 because it is more flexible and uses less ram. When you are using option1 then you have to fetch a lot of content into the ram, so will increases the chances of page fault. If you want to increase the querying time of the database then you should defiantly index your database to get fast result
I think Option A is not a good design. When you design a good data model you should not change the tables in a future. If you domain SQL language, using queries in option B will not be difficult. Also it is the solution of your real problem: "you need to store some attributes (open number, not final attributes) of some webpages, therefore, exist an entity for representation of those attributes"
Use Option A as the attributes are fixed. It will be difficult to query and process data from second model as there will be query based on multiple attributes.

How bad is it to change the structure of big tables?

My target DB is MySQL, but lets try to look at this problem more "generic". I am about to create a framework that will modify table structures as needed. Tables may contain a hundred thousands of records some day. I might add a column, rename a column, or even change the type of a column (lets assume that's nothing impossible, i.e. I might only change a numeric column into a varchar column), or I might also drop a whole column. But in most cases I would just add columns.
Somebody told me that's an very bad idea. He said that very loud and clear, but with no further argumentation. So what do you think? Bad idea? Why?
In most databases, adding and renaming columns are simple operations that just change the table metadata. You should actually verify that that's the case for the MySQL storage engine you're using, though. Dropping a column should be lightweight too.
By contrast, changing the type of a column is an intensive operation, since it involves actually creating data for each row in the table. Similarly, adding a column and populating it (rather than leaving the new column's values as null).
It is a bad idea.
You will have client applications using these columns, so whenever you rename or remove any columns, you will break their code.
Changing the type of a column might be necessary sometimes, but that's not something you will have to do often. If the column is numeric and you do calculations with it, then you can't change it to be of type varchar. And if you don't need it for any calculations and you expect non-numeric characters later, why not define it as varchar in first place?
Databases are built to do some things very well: to handle data integrity, referential integrity and transactional integrity, to name just three. Messing with table structures (when they have existing data and relationships) is probably violating all three of those things simultaneously.
If you want to adjust the way your data is presented to the user (limiting visibility of columns, renaming columns etc.), you are much better off doing this by exposing your data via views and altering their definition as needed. But leave the tables alone!
it is very harmful and should be avoided as integrity of data is disturbed and your queries will also fail incase you have hardcoded queries or improper ORM.
if you are doing it be cautious and if it live take the backup of your DB first.

Should one steer clear of adding yet another field to a larger MySQL table?

I have a MySQL-InnoDB table with 350,000+ rows, containing a couple of things like id, otherId, shortTitle and so on. Now I'm in need of a Bool/ Bit field for perhaps a couple of hundreds or thousands of those rows. Should I just add that bool field into the table, or should I best create a new table referencing the IDs of the old table -- thereby not risking to cause performance issues on all the old existing functions that access the first table?
(Side info: I'm never using "SELECT * ...". The main table has lots of reading, rarely writing.)
Adding a field can indeed hamper performance a little, since your table row grow larger, but it's hardly a problem for a BIT field.
Most probably, you will have exactly same row count per page, which means having no performance decrease at all.
On the other hand, using an extra JOIN to access the row value in another table will be much slower.
I'd add the column right into the table.
What does the new column denote?
From the data modelling perspective, if the column belongs with the data under whichever normal form is in use, then put it with the data; performance impact be damned. If the column doesn't directly belong to the table, then put it in a second table with a foreign key.
Realistically, the performance impact of adding a new column on a table with ~350,000 isn't going to be particularly huge. Have you tried issuing the ALTER TABLE statement against a copy, perhaps on a local workstation?
I don't know why people insist in called 350K-row tables big. In the mainframe world, that's how big the DBMS configuration tables are :-).
That said, you should be designing your tables in third normal form. If, and only if, you have performance problems, then should you consider de-normalizing.
If you have a column that will apply only to certain of the rows, it's (probably) not going to be 3NF to put it in the same table. You should have a separate table with a foreign key into your 'primary' table.
Keep in mind that's if the boolean field actually doesn't apply to some of the rows. That's a different situation to the field applying to all rows but not being known for some. In that case, a nullable column in the primary table would be better. But that doesn't sound like what you're describing.
Requiring a bit field for the next entries only sounds like you want to implement inheritance. If that is the case, I would add it to a new table to keep things readable. Otherwise, it doesn't matter if you add it to the main table or not, unless your queries are not using indexes, in which case I would change that first before making any other decisions regarding performance.