This is all about how to store content in a database in the most efficient way.
The most important thing here is not to save as much space as possible; the focus lies on the fastest way to use this data.
So in general it's a simple case:
We have 10 choices with checkboxes: we can select ALL, none, or just one or some of them.
So I see two general options for saving the result in my database:
A) Just make 10 fields in my table as TINYINT(1) and set each to 0 or 1.
B) Use ONE INT(7) and decode the result like a binary number, e.g. if you choose options 3, 5 & 8, it's like 00101001.
So the question is: which makes more sense?
B will take only 4 bytes and A will take 10 bytes (one byte per TINYINT column); besides, B will need a short PHP function to decode the bits.
The question now: which option do you think will work better sooner or later, once the database gets a hell of a lot of queries?
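A minimal sketch of the encode/decode logic that option B needs, written in Python rather than PHP for illustration. It assumes option k is stored in bit k-1 (bit 0 = option 1), so options 3, 5 and 8 become 4 + 16 + 128 = 148; the function names are hypothetical.

```python
NUM_OPTIONS = 10  # the 10 checkboxes from the question


def encode(selected):
    """Pack option numbers (1..10) into one integer; option k -> bit k-1."""
    flags = 0
    for option in selected:
        if not 1 <= option <= NUM_OPTIONS:
            raise ValueError(f"option out of range: {option}")
        flags |= 1 << (option - 1)
    return flags


def decode(flags):
    """Unpack the integer back into a sorted list of option numbers."""
    return [k for k in range(1, NUM_OPTIONS + 1) if flags & (1 << (k - 1))]


def set_option(flags, option, on):
    """Return a copy of flags with a single option switched on or off."""
    bit = 1 << (option - 1)
    return (flags | bit) if on else (flags & ~bit)


print(encode([3, 5, 8]), decode(148))
```

Note that with this convention the binary form of 148 written out to 10 digits is 0010010100, i.e. option 1 is the rightmost digit, the reverse of the left-to-right string in the question.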
It would probably be better to have 10 different fields, unless you know that each query will always need ALL 10 fields.
It'll be easier for the programmer (no need to keep calculating bit vectors). It's easier to add or remove columns. And you can put small, fast indexes on just the columns that need them, rather than on all of them.
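To make the indexing point concrete, here is a small sketch using Python's built-in sqlite3 module (not MySQL; table and column names are hypothetical, and only 3 of the 10 options are modelled). Option A filters with a plain equality that an ordinary index on that column can serve; option B has to filter with a bitwise expression, which an ordinary index on the integer column generally cannot be used to seek on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Option A: one 0/1 column per choice, with an index on the column we filter by.
conn.execute("CREATE TABLE answers_a (id INTEGER PRIMARY KEY, opt3 INTEGER, opt5 INTEGER, opt8 INTEGER)")
conn.execute("CREATE INDEX idx_answers_a_opt3 ON answers_a (opt3)")
# Option B: a single integer bitmask (assumption: option k -> bit k-1).
conn.execute("CREATE TABLE answers_b (id INTEGER PRIMARY KEY, flags INTEGER)")

rows = [(1, 1, 1, 1), (2, 0, 1, 0), (3, 1, 0, 0)]
conn.executemany("INSERT INTO answers_a VALUES (?, ?, ?, ?)", rows)
for row_id, o3, o5, o8 in rows:
    flags = (o3 << 2) | (o5 << 4) | (o8 << 7)
    conn.execute("INSERT INTO answers_b VALUES (?, ?)", (row_id, flags))

# A: "who picked option 3?" is a plain equality on one column.
a = [r[0] for r in conn.execute("SELECT id FROM answers_a WHERE opt3 = 1 ORDER BY id")]
# B: the same question needs a bitwise test on every row's flags value.
b = [r[0] for r in conn.execute("SELECT id FROM answers_b WHERE (flags & 4) != 0 ORDER BY id")]
print(a, b)
```

Both queries return the same rows; the difference is in what the engine can do with an index and how readable the query is.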
Edit 1:
Because a few good people have pointed out that my question isn't very clear, I thought I would rewrite it to make it clearer.
So basically, I am making an app that allows users to create their own forms with their own sets of input fields, with data like name, type, etc. After a user creates and publishes a form, every entry submitted through it gets saved into the database, of course. Because the form itself is dynamic, I need a way to save this data.
My first choice was serializing it to JSON and saving that. But because I cannot run SQL queries on the data if I save it in JSON format, I am ruling this option out.
Then the simple method is storing it in a table like (id, rowid, columnname, value), keeping the same rowid for all the data of one entry. But this way, if a form contains 30 fields, after 100 entries my db would have 3000 rows. So in the long run it would grow huge, and I think queries will get slow when there are millions of rows in the table.
Then I got the idea of a table like (id, rowid, column1, column2, ..., column100), where I save all the inputs of one form submission into a single row. This way each submission adds only 1 row, and it's easier to query too. I will store the actual column names separately and map them to the right numbered column from there. This is my idea. I chose column100 because 100 is the maximum number of inputs a user can add to a form.
So my question is whether my idea is good, or whether I should stick to the classic table.
If I've understood your question, you need to design a database structure to store data whose schema you don't know in advance.
This is hard - there's no "efficient" solution in relational databases that I'm aware of.
Option 1 would be to look at a non-relational (NoSQL) solution instead.
I won't elaborate on the benefits and drawbacks, as they are highly dependent on which NoSQL option you choose.
It's worth noting that many relational engines (including MySQL) allow you to store and query structured data formats like JSON. I've not used this feature in MySQL myself, but similar functionality in SQL Server performs very well.
Within relational databases, the common solution is an "Entity/Attribute/Value" (EAV) schema. This is essentially your option 2.
EAV designs can theoretically store an unlimited number of columns and an unlimited number of rows, but common queries quickly become impractical. In your sample data, finding all records where the name begins with a K and the power is at least 22 turns into a very complex SQL query. It also means the application needs to enforce rules of uniqueness, mandatory/optional data attributes, and data transformation from one format to another.
From a performance point of view, this doesn't really scale to complex queries. That's because every condition in your WHERE clause needs a self-join, and because all values live in one text column, indexes won't help much with non-text comparisons (searching for a numeric "greater than 20" is not the same as searching for a text "greater than 20").
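A sketch of that "name begins with K and power is at least 22" query against an EAV table, using Python's sqlite3 module. The table follows the question's (id, rowid, columnname, value) layout, but rowid is renamed entry_id because rowid is special in SQLite; the rows are made-up illustration data. Each extra condition needs another copy of the table, and forgetting the CAST silently compares text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EAV table: one row per (entry, field) pair, every value stored as text.
conn.execute("""CREATE TABLE form_values (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL,
    column_name TEXT NOT NULL,
    value TEXT)""")
data = [
    (1, "name", "Kara"), (1, "power", "30"),
    (2, "name", "Kim"), (2, "power", "3"),
    (3, "name", "Bob"), (3, "power", "25"),
]
conn.executemany("INSERT INTO form_values (entry_id, column_name, value) VALUES (?, ?, ?)", data)

# One self-join per condition: n supplies the name, p supplies the power.
QUERY = """
SELECT n.entry_id
FROM form_values AS n
JOIN form_values AS p ON p.entry_id = n.entry_id AND p.column_name = 'power'
WHERE n.column_name = 'name' AND n.value LIKE 'K%'
  AND {power_test}
ORDER BY n.entry_id
"""

# Numeric comparison: the text value has to be cast first.
matches = [r[0] for r in conn.execute(QUERY.format(power_test="CAST(p.value AS INTEGER) >= 22"))]
# Without the cast the TEXT column forces a text comparison, so '3' >= '22' is true.
text_matches = [r[0] for r in conn.execute(QUERY.format(power_test="p.value >= 22"))]
print(matches, text_matches)
```

Kim's power of 3 wrongly passes the text comparison, which is exactly the "numeric vs text greater than" problem described above.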
Option 3 is, indeed, to make the schema logic fit into a limited number of columns (your column1 to column100 idea).
It means you have a limitation on the number of columns, and you still have to manage mandatory/optional, uniqueness etc. in the application. However, querying the data should be easier - finding accounts where the name starts with K and the power is at least 22 is a fairly straightforward exercise.
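A minimal sketch of the wide-table approach with a mapping table, in Python's sqlite3. The names entries, field_map and col1..col3 are hypothetical stand-ins for column1..column100 and for wherever the real column names are stored. Column names cannot be bound as SQL parameters, so the mapped name is checked against a whitelist before it goes into the query string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Generic wide table: col1..col3 stand in for column1..column100.
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, form_id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)")
# Maps each form's real field names onto the generic columns.
conn.execute("""CREATE TABLE field_map (
    form_id INTEGER, field_name TEXT, column_name TEXT,
    PRIMARY KEY (form_id, field_name))""")
conn.executemany("INSERT INTO field_map VALUES (?, ?, ?)", [(1, "name", "col1"), (1, "power", "col2")])
conn.executemany("INSERT INTO entries (form_id, col1, col2) VALUES (?, ?, ?)",
                 [(1, "Kara", "30"), (1, "Kim", "3"), (1, "Bob", "25")])

ALLOWED_COLUMNS = {"col1", "col2", "col3"}


def column_for(form_id, field_name):
    """Look up the generic column that holds a form field, refusing unknown names."""
    row = conn.execute("SELECT column_name FROM field_map WHERE form_id = ? AND field_name = ?",
                       (form_id, field_name)).fetchone()
    if row is None or row[0] not in ALLOWED_COLUMNS:
        raise KeyError(field_name)
    return row[0]


name_col = column_for(1, "name")
power_col = column_for(1, "power")
# One table, no self-joins: each condition is just another column test.
sql = (f"SELECT {name_col} FROM entries WHERE form_id = ? "
       f"AND {name_col} LIKE 'K%' AND CAST({power_col} AS INTEGER) >= 22 ORDER BY id")
names = [r[0] for r in conn.execute(sql, (1,))]
print(names)
```

The unused generic columns (col3 here) simply stay NULL.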
You do have a lot of unused columns, but that doesn't really impact performance much; disk space is so cheap that all the wasted space is probably less than the storage you carry around in your smartphone.
If I understand your requirement correctly, what I would do is create a many-to-many relationship, something like this:
(tbl1) form:
- id
- field1
- field2
(tbl2) user_added_fields:
- id
- field_name
(tbl3) form_table_user_added_fields:
- form_id (fk)
- user_added_fields_id (fk)
This may not solve your exact requirements, but I hope it gives you a hint. Happy coding! :)
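A small sketch of those three tables in Python's sqlite3, with a join that lists the user-added fields of one form. The table and column names come from the answer; the column types, keys and sample rows are assumptions for illustration. As written, this design records which fields belong to which form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE form (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT);
CREATE TABLE user_added_fields (id INTEGER PRIMARY KEY, field_name TEXT NOT NULL);
-- Link table: one row per (form, user-added field) pair.
CREATE TABLE form_table_user_added_fields (
    form_id INTEGER NOT NULL REFERENCES form(id),
    user_added_fields_id INTEGER NOT NULL REFERENCES user_added_fields(id),
    PRIMARY KEY (form_id, user_added_fields_id)
);
INSERT INTO form (id, field1, field2) VALUES (1, 'a', 'b'), (2, 'c', 'd');
INSERT INTO user_added_fields (id, field_name) VALUES (10, 'name'), (11, 'power'), (12, 'email');
INSERT INTO form_table_user_added_fields VALUES (1, 10), (1, 11), (2, 12);
""")

# Which extra fields does form 1 have?
fields_for_form_1 = [r[0] for r in conn.execute("""
SELECT f.field_name
FROM form_table_user_added_fields AS link
JOIN user_added_fields AS f ON f.id = link.user_added_fields_id
WHERE link.form_id = 1
ORDER BY f.field_name
""")]
print(fields_for_form_1)
```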
Hi, just a simple question.
I need to store data in a database, and there are 2 options:
Data: a,b,c,d
1. Store a,b,c,d in 1 column; when needed, query it and perform the splitting in the application.
2. Store a, b, c, d in 4 different columns, which can be queried directly in the database.
Which option will be better? My concern is that splitting it into 4 different columns will make the table contain many columns; does that slow down performance? Also, I am curious: is it possible that the query is fast but the transfer of data to my application is slow?
MySQL performance is a complicated subject. To the issue you raised:
My concern is that splitting it into 4 different columns will make the table contain many columns; does that slow down performance?
There is nothing inherently worse, from a performance perspective, about having 4 columns, or 10, or 20, or 50.
Now, that being said, there are things that could impact performance, and probably will if you don't know about them. For example, if you SELECT * FROM {my_table} when really you only need to SELECT a FROM {my_table}... yeah, that'll impact your performance (although there are arguments to be made in favor of SELECT * FROM {my_table} depending on your caching strategy).
Likewise, you'll want to consider LIMIT clauses. To your question:
And I am also curious: is it possible that the query is fast but the transfer of data to my application is slow?
Yes, of course. If you only need 50 rows and your table has 50000, you're gonna want to add LIMIT clauses to your SQL statements, or you'll be sending a lot more data over the wire than you need to. Memory is faster than disk, and disk is faster than network. If you're sending a lot of data over the wire that you don't need, you better believe it's gonna cause performance problems. But again, keep in mind that this has nothing to do with how many columns you have. There is absolutely nothing inherent in the number of columns a table has that affects performance (at least not at the scale you're talking about and in the way you are thinking about it).
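A rough illustration of "only fetch what you need", using Python's sqlite3 with a hypothetical my_table of 50000 rows. SQLite runs in-process, so there is no network here; the sketch only shows how much data each query hands back to the application, which is what would travel over the wire with a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT)")
conn.executemany("INSERT INTO my_table (a, b, c, d) VALUES (?, ?, ?, ?)",
                 [(f"a{i}", f"b{i}", f"c{i}", f"d{i}") for i in range(50000)])

# Ask only for the column and the number of rows the page actually needs.
page = conn.execute("SELECT a FROM my_table ORDER BY id LIMIT 50").fetchall()
# Versus pulling every column of every row into the application.
everything = conn.execute("SELECT * FROM my_table").fetchall()

print(len(page), sum(len(row) for row in page))            # rows, values returned
print(len(everything), sum(len(row) for row in everything))
```

The first query returns 50 values; the second returns 250000, regardless of how the columns are laid out.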
All of which is to say, performance is a complex topic. You should look into it if you're interested. And it sounds like a, b, c, and d are logically different columns, so you should probably go ahead and store them in different columns in MySQL. Hope this helps.
I'm planning to develop a PHP web app that will mainly be used by registered users (sessions).
While thinking about the DB design, I was contemplating that, in order to give the best user experience possible, there would be lots of options for the user to activate, deactivate, specify, etc.
For example:
- Options for each layout element: dialog boxes, dashboard, grid, etc.
- color, size, stay visible, invisible, don't ask again, show every time, advanced mode, simple mode, etc.
This could add up to hundreds of fields for each user, ranging from simple Yes/No values to values from 1 to N.
So, is having a field for each of these options the way to go?
Or how do CRMs, CMSs, and other web apps store lots of values that are only 1-2 characters long?
Do they group them in text fields separated by a special character and then "explode" them into an array for runtime usage?
Thank you.
How about something like this:
CREATE TABLE settings (
user_id INT,
setting_name VARCHAR(255),
setting_value CHAR(2),
PRIMARY KEY (user_id, setting_name) -- one value per setting per user
)
That way, to store a configuration setting for a user, you can do:
INSERT INTO settings (user_id, setting_name, setting_value)
VALUES (1, 'timezone', '+8')
And when you need to query a setting for a particular user, you can do:
SELECT setting_value FROM settings
WHERE user_id = 1 AND setting_name = 'timezone'
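A minimal sketch of the same key/value settings table driven from application code, using Python's sqlite3 instead of PHP/MySQL. The helper names are hypothetical. INSERT OR REPLACE is SQLite syntax for overwriting an existing (user_id, setting_name) row; in MySQL the usual equivalent is INSERT ... ON DUPLICATE KEY UPDATE. SQLite does not enforce the CHAR(2) length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE settings (
    user_id INTEGER NOT NULL,
    setting_name VARCHAR(255) NOT NULL,
    setting_value CHAR(2),
    PRIMARY KEY (user_id, setting_name))""")


def save_setting(user_id, name, value):
    # Overwrites any existing row with the same (user_id, setting_name) key.
    conn.execute("INSERT OR REPLACE INTO settings (user_id, setting_name, setting_value) VALUES (?, ?, ?)",
                 (user_id, name, value))


def load_settings(user_id):
    # One query per user at login; the result becomes a name -> value dict.
    return dict(conn.execute("SELECT setting_name, setting_value FROM settings WHERE user_id = ?",
                             (user_id,)))


save_setting(1, "timezone", "+8")
save_setting(1, "timezone", "+9")   # replaces the earlier value
save_setting(1, "advanced_mode", "1")
print(load_settings(1))
```

Loading all of a user's rows into a dictionary once avoids issuing one query per setting.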
I would absolutely be inclined to have individual fields for each option. My rule of thumb is that each column holds exactly one piece of data whenever possible. No more, no less. As was mentioned earlier, the ease of maintenance and the ability to add / drop options down the road far outweigh the pain in the arse of setting it up.
I would, however, put some thought into how you create the table(s). The idea mentioned earlier was to have a Settings table with 100 columns (one for each option) and one row for each user. That would work, to be sure. If it were me, I would be inclined to break it down a bit further. You start with a basic User table, of course. That would hold the basics: username, password, userid, etc. That way you can use the numeric userid as the key index for your Settings table(s).
But after that, I would try to break the settings down into smaller tables based on logical usage. For example, if you have 100 options, and 19 of those pertain to how a user views / is viewed / behaves in one specific part of the site, say a forum, then break those out into a separate table called ForumSettings. Maybe there are 12 more that pertain to email preferences but would not be used in other areas of the site / app. Now you have an EmailSettings table. Doing this would not only reduce the number of columns in your generic Settings table, but would also make writing queries for specific tasks or areas of the app much easier, speed up performance a tick, and make maintenance moving forward far less painful.
Some may disagree, as from a strictly data-modeling perspective I'm pretty sure the single Settings table would be indicated. But from a real-world perspective, I have never gone wrong using logical chunks such as this.
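A rough sketch of the "logical chunks" layout in Python's sqlite3: a users table plus one settings table per area of the app, each keyed by the numeric user id. Every column name below is made up for illustration; defaults let a new user start with sensible values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
-- One row per user for each area of the app; column names are hypothetical.
CREATE TABLE forum_settings (
    user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
    show_signatures INTEGER NOT NULL DEFAULT 1,
    posts_per_page INTEGER NOT NULL DEFAULT 20
);
CREATE TABLE email_settings (
    user_id INTEGER PRIMARY KEY REFERENCES users(user_id),
    daily_digest INTEGER NOT NULL DEFAULT 0,
    notify_on_reply INTEGER NOT NULL DEFAULT 1
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO forum_settings (user_id, posts_per_page) VALUES (1, 50);
INSERT INTO email_settings (user_id) VALUES (1);
""")

# The forum pages read only the forum settings, one column per option.
forum = conn.execute("SELECT show_signatures, posts_per_page FROM forum_settings WHERE user_id = ?",
                     (1,)).fetchone()
email = conn.execute("SELECT daily_digest, notify_on_reply FROM email_settings WHERE user_id = ?",
                     (1,)).fetchone()
print(forum, email)
```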
From a pure data-model perspective, that would be the clearest design (though awfully wide). Some might try to bitmask them into a single field for assumed space reasons, but the logic to encode/decode makes that not worthwhile, in my opinion. Also, you lose the ability to index on them.
Another option (which I just saw posted) is to keep a separate table with a foreign key back to the user table. But then you have to iterate over the results to find the value you want to check.
I have a project which needs an Excel GUI (client's request) with a backend MySQL db/table requiring almost 90 fields.
(Almost 60 of these fields are repetitions of the same 6 fields.)
After giving it some thought, I ended up creating a table with 11 fields: 10 searchable fields, and one big field which can contain up to 60 fields "together", separated by ":".
So a value in that big field would look something like this:
charge1:100:200:200::usd:charge2:1000:2000:2000::usd:charge3:150:200:200:250:USD, and so on
As you can see, these are blocks of 6 fields, and there can be up to 10 of these "blocks", but never more than 255 characters altogether.
None of these "fields" need to be indexed or searched (that's done on the other 10 fields).
What I am doing is a "SELECT *" query (from the Excel GUI) of the 11 fields, and then (with VBA) I split these values into columns (this takes less than 1 second).
With VBA I display the data on certain fields within the Excel "form".
This is working fine and I am very happy with the results, as I was looking for a light, simple and super fast solution, and it is.
Is there a "technical" reason for not doing this ?
Could fields with too many characters cause problems?
I understand there are many ways of handling this; however, this is a small project and I am looking for a simple solution that works, not a complex one (with too many tables and/or fields).
Since the GUI is an Excel interface, I don't want to make it too complex if there isn't a need for that.
Thanks in advance for your input.
I think you already have a pretty good idea of problems that may arise.
Indexing doesn't work well on those fields, and updating or reading individual values requires extra work in your application.
Also, you're storing what looks mostly like numbers in a string-type column, so that means some extra storage space (though you'd have to weigh that against a bit of overhead for separate columns).
It might turn into a nightmare when the structure of those columns changes.
All of that might be manageable effort for you, but it's entirely possible that the dev after you will hate you. :p
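To show the kind of "extra work" the packed field needs, here is a sketch of splitting it into 6-value blocks and joining it back, in Python rather than VBA for illustration. The limits (6 values per block, at most 10 blocks, at most 255 characters) come from the question; the function names are hypothetical. A value containing ":" would shift every later field, so packing rejects it.

```python
FIELDS_PER_BLOCK = 6   # each "charge" block has 6 values
MAX_BLOCKS = 10        # up to 10 blocks, per the question
MAX_LENGTH = 255       # never more than 255 characters in total


def parse_blocks(packed):
    """Split the colon-delimited string into a list of 6-value blocks."""
    if not packed:
        return []
    parts = packed.split(":")
    if len(parts) % FIELDS_PER_BLOCK != 0:
        raise ValueError(f"expected a multiple of {FIELDS_PER_BLOCK} values, got {len(parts)}")
    return [parts[i:i + FIELDS_PER_BLOCK] for i in range(0, len(parts), FIELDS_PER_BLOCK)]


def pack_blocks(blocks):
    """Join blocks back into one string, rejecting input the format cannot represent."""
    if len(blocks) > MAX_BLOCKS:
        raise ValueError("too many blocks")
    for block in blocks:
        if len(block) != FIELDS_PER_BLOCK:
            raise ValueError(f"each block needs {FIELDS_PER_BLOCK} values")
        if any(":" in value for value in block):
            raise ValueError("values may not contain ':'")
    packed = ":".join(value for block in blocks for value in block)
    if len(packed) > MAX_LENGTH:
        raise ValueError(f"packed string exceeds {MAX_LENGTH} characters")
    return packed


sample = "charge1:100:200:200::usd:charge2:1000:2000:2000::usd:charge3:150:200:200:250:USD"
blocks = parse_blocks(sample)
blocks[1][2] = "2500"          # changing one value means rewriting the whole field
updated = pack_blocks(blocks)
print(updated)
```

Every single-value update becomes read, split, modify, join, write; separate columns would make that a one-column UPDATE.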
I am designing a database and have run into something that I do not like. I will have up to 1000 different scenarios, and in each scenario I will need to store the state of each one of 64 different toggles. I've come up with the following tables:
Scenario (Scenario_ID, Scenario_Name)
Toggles (Scenario_ID, Toggle_1, Toggle_2, ..., Toggle_64)
This leaves me with a 65-column Toggles table. There has to be a better way than having 64 columns of "on" or "off", but I am at a loss as to what it might be. I don't want to store the toggle state in a CSV in one column because it will be constantly changing, and needs to be parsed quite often. It would be much easier for me to update the table by simply updating Toggle_14 to "off" rather than parsing a CSV, changing it, and reloading it.
What is a better database design?
Use a Many-to-Many relation:
Table Scenario (Id, Name)
Table ScenarioToggles (ScenarioId, TogglesId, ToggleState)
Table Toggles (Id, ToggleName)
The field, ToggleState in the ScenarioToggles table will hold the On/Off value.
It also yields some flexibility if you ever need more than 64 toggles.
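A sketch of that many-to-many design in Python's sqlite3, using the table and column names from the answer. Assumptions: ToggleState is stored as 0/1 rather than "on"/"off", and the toggle names are generated placeholders. Flipping Toggle_14 becomes a single-row UPDATE, with no CSV parsing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scenario (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Toggles (Id INTEGER PRIMARY KEY, ToggleName TEXT NOT NULL);
CREATE TABLE ScenarioToggles (
    ScenarioId INTEGER NOT NULL REFERENCES Scenario(Id),
    TogglesId INTEGER NOT NULL REFERENCES Toggles(Id),
    ToggleState INTEGER NOT NULL CHECK (ToggleState IN (0, 1)),
    PRIMARY KEY (ScenarioId, TogglesId)
);
""")
conn.execute("INSERT INTO Scenario VALUES (1, 'baseline')")
conn.executemany("INSERT INTO Toggles VALUES (?, ?)", [(i, f"Toggle_{i}") for i in range(1, 65)])
# Start every toggle of scenario 1 in the "off" state.
conn.executemany("INSERT INTO ScenarioToggles VALUES (1, ?, 0)", [(i,) for i in range(1, 65)])

# Turning one toggle on touches exactly one row.
conn.execute("UPDATE ScenarioToggles SET ToggleState = 1 WHERE ScenarioId = 1 AND TogglesId = 14")

on = [r[0] for r in conn.execute("""
SELECT t.ToggleName
FROM ScenarioToggles AS st
JOIN Toggles AS t ON t.Id = st.TogglesId
WHERE st.ScenarioId = 1 AND st.ToggleState = 1
ORDER BY t.Id""")]
print(on)
```

Adding a 65th toggle is an INSERT into Toggles rather than an ALTER TABLE.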
You can design the Toggles table like this:
Scenario_ID
Toggle_nr
Toggle_state
and get 64 entries for each scenario.
However, if the number of toggles is not going to change often and every scenario needs all the toggles, your solution seems to be the best and the simplest.
It's a hack (sort of, ish, kinda), but depending on your lookup requirements, you could just store a single 64-character string, which you could access in your programming language of choice as an array.
However, the many, many, downsides of doing this more than likely outweigh the benefits. (Then again, storing 64 bits of flippable state information in a database is a pretty odd requirement in the first place.)
Since your question is tagged "mysql", I'd recommend using the SET type. Updating records to toggle values can be a little tricky, though, as MySQL doesn't natively offer functions for that purpose (search for REMOVE_FROM_SET in MySQL's bug tracker for a workaround). On the other hand, it's very compact, and it won't require you to join 64 tables or use aggregate functions for every SELECT query.
Some people are just trying to be way too clever. Use one table, with scenario Id, name, and 64 toggle columns. This is well normalised, fast, space-efficient, and extensible.