MySQL: using one field to contain many "fields" to save on fields

I have a project which needs an Excel GUI (client's request) with a backend MySQL DB/table requiring almost 90 fields.
(Almost 60 of those fields are repetitions of the same 6 fields.)
After giving it some thought, I ended up creating a table with 11 fields: 10 searchable fields, and one big field which can contain up to 60 fields "together", separated by ":".
So a record in that big field would look something like this:
charge1:100:200:200::usd:charge2:1000:2000:2000::usd:charge3:150:200:200:250:USD, and so on
As you can see, these are blocks of 6 fields, and there can be up to 10 of these "blocks", but never more than 255 characters altogether.
None of these "fields" needs to be indexed or searched on (that's done on the other 10 fields).
What I am doing is a "SELECT *" query (from the Excel GUI) of the 11 fields, and then (with VBA) I separate those values into columns (this takes less than a second).
With VBA I display the data on certain fields within the Excel "form".
This is working fine and I am very happy with the results, as I was looking for a light, simple and super fast solution, and it is.
Is there a "technical" reason for not doing this ?
Perhaps fields with too many characters might give problems ????
I understand there are many ways of handling this, however this is a small project and I am looking for a simple solution that works, not a complex one (with too many tables and/or fields)
Since the GUI is an excel interface I don't want to make it too complex if there isn't need for that.
Thanks in advance for your input.
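For reference, the layout described here would look roughly like the following; the table and column names are invented purely for illustration.

-- Illustrative sketch of the described 11-column layout (names are made up):
CREATE TABLE invoices (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    client_code VARCHAR(20)  NOT NULL,   -- one of the 10 searchable fields
    inv_date    DATE         NOT NULL,   -- another searchable field
    -- ... the remaining searchable fields ...
    details     VARCHAR(255) NOT NULL    -- up to 10 blocks of 6 values, ':'-separated
);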

I think you already have a pretty good idea of problems that may arise.
Indexing doesn't work well on a field like that, and updating or reading individual values requires extra work in your application.
Also, you're storing what looks mostly like numbers in a string-type column, so that means some extra storage space (though you'd have to weigh that against a bit of overhead for separate columns).
It might turn into a nightmare when the structure of those packed blocks changes.
All of that might be manageable effort for you, but it's entirely possible that the dev after you will hate you. :p
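To make the "extra work" point concrete: pulling a single value out of the packed column in SQL means nested SUBSTRING_INDEX calls, while a normalized child table makes it an ordinary query. A rough sketch, reusing the invented names from above:

-- Extract the first amount of the third block (the 14th ':'-separated value):
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(details, ':', 14), ':', -1) AS charge3_amount1
FROM invoices
WHERE id = 42;

-- The normalized alternative: one row per block in a child table.
CREATE TABLE invoice_charges (
    invoice_id INT UNSIGNED NOT NULL,
    block_no   TINYINT      NOT NULL,   -- 1..10
    label      VARCHAR(20),             -- e.g. 'charge1'
    amount1    DECIMAL(10,2),
    amount2    DECIMAL(10,2),
    amount3    DECIMAL(10,2),
    amount4    DECIMAL(10,2),
    currency   CHAR(3),
    PRIMARY KEY (invoice_id, block_no)
);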

Related

"Externalize" a column in a MySQL table

I have kind of a weird request. My research up to this point and my previous experience tell me that it's not doable, but I want some more opinions.
I've been dropped into a flawed system, designed more than 10 years ago for a certain amount of data, and that data has exploded in the last few years.
I have a couple of tables in my DB that have a BLOB field containing XML. Not so big, around 20-30 KB per XML, but I have a few million rows in each table.
The problem is that the queries are quite slow, and the tables are quite big. I've checked and, without the XML field, each table would shrink by more than 100x. Also, this value is rarely read or written.
My question is this: best practices would suggest splitting each table into t1_description and t1_contents - the first containing all the fields except the XML, and the latter containing just the XML (and a foreign key) - but this system has a lot of applications reading and writing on the DB, and a few of these applications are huge monoliths built with little to no future development in mind.
Changing the DB is out of the question, but: can I "externalize" the field, kind of like a partition would, but without actually changing the table structure?
That way, any application requesting the XML field wouldn't need to change anything, but the DB would provide this data automatically when requested.
Thanks a lot!
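For context, the split the question refers to is usually combined with a view that keeps the original table name, so that readers don't have to change anything; whether writers keep working depends on how they write, since MySQL does not allow INSERTs into a multi-table view. A rough sketch with invented column names, assuming t1 has an integer primary key id:

-- Move the blob out, then recreate the original name as a view.
RENAME TABLE t1 TO t1_description;

CREATE TABLE t1_contents (
    t1_id INT UNSIGNED NOT NULL PRIMARY KEY,
    xml   MEDIUMBLOB,
    FOREIGN KEY (t1_id) REFERENCES t1_description (id)
);
INSERT INTO t1_contents (t1_id, xml)
    SELECT id, xml FROM t1_description;
ALTER TABLE t1_description DROP COLUMN xml;

CREATE VIEW t1 AS
    SELECT d.*, c.xml
    FROM t1_description d
    LEFT JOIN t1_contents c ON c.t1_id = d.id;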

What is more efficient: a table with 100 columns and fewer rows, or 5 columns and 30 times more rows?

Edit 1:
Because a few good people have pointed out that my question isn't very clear, I thought I would rewrite it and make it clearer.
So basically, I am making an app which allows users to create their own forms with their own set of input fields, with data like name, type, etc. After a user creates and publishes a form, whenever there is an entry in the form, the data gets saved into the DB, of course. Because the form itself is dynamic, I need a way to save this data.
My first choice was serializing it to JSON and saving that, but because I cannot do any SQL queries on the data if I save it in JSON format, I am eliminating this option.
Then the simple method is storing it in a table like (id, rowid, columnname, value), where I keep the rowid the same for all of one submission's data. But this way, if a form contains 30 fields, after 100 entries my DB would have 3,000 rows, so in the long run it would grow huge, and I think queries will get slow once there are millions of rows in the table.
Then I got the idea of a table like (id, rowid, column1, column2...column100), where I save all the inputs from one form submission into a single row. This way it adds only one row per submit, and it's easier to query too. I will store the actual column names and map them to the right numbered column from there. This is my idea; column100 because 100 is the maximum number of inputs a user can add to his form.
So my question is: is my idea good, or should I stick to the classic table?
If I've understood your question, you need to design a database structure to store data whose schema you don't know in advance.
This is hard - there's no "efficient" solution in relational databases that I'm aware of.
Option 1 would be to look at a non-relational (NoSQL) solution instead.
I won't elaborate on the benefits and drawbacks, as they are highly dependent on which NoSQL option you choose.
It's worth noting that many relational engines (including MySQL) allow you to store and query structured data formats like JSON. I've not used this feature in MySQL myself, but similar functionality in SQL Server performs very well.
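For illustration, in MySQL 5.7 and later a submission can be stored in a JSON column and filtered on individual keys (the table and key names below are invented):

CREATE TABLE form_entries (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    form_id INT UNSIGNED NOT NULL,
    data    JSON NOT NULL
);

INSERT INTO form_entries (form_id, data)
VALUES (1, '{"name": "Kara", "power": 25}');

-- "name begins with K and power is at least 22":
SELECT id
FROM form_entries
WHERE data->>'$.name' LIKE 'K%'
  AND CAST(data->>'$.power' AS UNSIGNED) >= 22;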
Within relational databases, the common solution is an "Entity/Attribute/Value" (EAV) schema. This is sorta like your option 2.
EAV designs can theoretically store an unlimited number of columns, and an unlimited number of rows - but common queries quickly become impossible. In your sample data, finding all records where the name begins with a K and the power is at least 22 turns into a very complex SQL query. It also means the application needs to enforce rules of uniqueness, mandatory/optional data attributes, and data transformation from one format to another.
From a performance point of view, this doesn't really scale to complex queries. This is because every clause in your "where" needs a self join, and indexes won't have much impact on non-text comparisons (searching for a numerical "greater than 20" is not the same as searching for a text "greater than 20").
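To illustrate the self-join problem, here is roughly what "name begins with K and power is at least 22" looks like against the (id, rowid, columnname, value) table from the question; every extra condition needs another join back to the same table (the table name is invented):

SELECT n.rowid
FROM form_values AS n
JOIN form_values AS p ON p.rowid = n.rowid
WHERE n.columnname = 'name'  AND n.value LIKE 'K%'
  AND p.columnname = 'power' AND CAST(p.value AS UNSIGNED) >= 22;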
Option 3 is, indeed, to make the schema logic fit into a limited number of columns (your option 1).
It means you have a limitation on the number of columns, and you still have to manage mandatory/optional, uniqueness etc. in the application. However, querying the data should be easier - finding accounts where the name starts with K and the power is at least 22 is a fairly straightforward exercise.
You do have a lot of unused columns, but that doesn't really impact performance much - disk space is so cheap that all the wasted space is probably less space than you carry around in your smart phone.
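By contrast, against the wide table the same filter is a single WHERE clause, although generic text columns typically still need a cast for numeric comparisons. A sketch, assuming an invented table name, an assumed form_id column, and that this particular form maps "name" to column1 and "power" to column2:

SELECT rowid
FROM form_entries_wide
WHERE form_id = 1
  AND column1 LIKE 'K%'
  AND CAST(column2 AS UNSIGNED) >= 22;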
If I understand your requirement, what I would do is create a many-to-many relationship, something like this:
(tbl1) form:
- id
- field1
- field2
(tbl2) user_added_fields:
- id
- field_name
(tbl3) form_table_user_added_fields:
- form_id (fk)
- user_added_fields_id (fk)
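A literal DDL sketch of those three tables (column types are assumed):

CREATE TABLE form (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    field1 VARCHAR(255),
    field2 VARCHAR(255)
);

CREATE TABLE user_added_fields (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    field_name VARCHAR(255) NOT NULL
);

CREATE TABLE form_table_user_added_fields (
    form_id              INT UNSIGNED NOT NULL,
    user_added_fields_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (form_id, user_added_fields_id),
    FOREIGN KEY (form_id) REFERENCES form (id),
    FOREIGN KEY (user_added_fields_id) REFERENCES user_added_fields (id)
);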
This may not solve your requirements exactly, but I hope it will give you a hint. Happy coding! :)

Performance Issues with Include in Entity Framework

I am working on a large application being developed using the Repository Pattern, Web APIs, and AngularJS. In one scenario, I am trying to retrieve data for a single lead which has relations with approx. 20 tables. Lazy loading is disabled, so I am using Include to get the data from all 20 tables. Now here comes the performance issue: if I try to retrieve a single record, it takes approx. 15 seconds. This is a huge performance issue. I am returning JSON, and my entities are decorated with DataContract(IsReference = true) / DataMember attributes.
Any suggestions will be highly appreciated.
Include is really nasty for performance because of how it joins.
See more info in my blog post here http://mikee.se/Archive.aspx/Details/entity_framework_pitfalls,_include_20140101
To summarize the problem a bit: it's because EF handles Include by joining. This creates a result set where every row includes every column of every joined entity (some contain null values).
This is even nastier if the root entity contains large fields (like long text or binary columns), because those get repeated.
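As a rough illustration of that shape (not the exact SQL EF emits; all table names are invented), the result set behaves like one wide join:

SELECT l.*, c.*, n.*, d.*
FROM Leads l
LEFT JOIN Contacts  c ON c.LeadId = l.Id
LEFT JOIN Notes     n ON n.LeadId = l.Id
LEFT JOIN Documents d ON d.LeadId = l.Id
WHERE l.Id = @leadId;
-- Every returned row repeats all of the lead's columns (including any large
-- text or binary ones) and carries a column slot for every included entity,
-- many of them NULL.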
15s is way too much though. I suspect something more is at play like missing indexes.
To summarize the solutions. My suggestion is normally that you load every relation separately or in a multiquery. A simple query like that should be 5-30ms per entity depending on your setup. In this case it would still be quite slow (~1s if you are querying on indexes). Maybe you need to look at some way to store this data in a better format if this query is run often (Cache, document, json in the db). I can't help you with that though, would need far more information as the update paths affect the possibilities a lot.
The performance has been improved by enabling lazy loading.

MySQL broad schema vs multiple tables

I'm currently modeling the schema for a project which will be using MySQL 5.5. I'm having issues, however, with a feature that's currently being requested. This feature requires a very specific set of data pertaining to a certain "specialty". There are 47 specialties, and this will rarely change (possibly never). The problem is that each of these 47 specialties would require its own table with a different schema.
This would mean my "programs" table would contain 47 different foreign keys, only one of which would bear an actual relation. This seems inefficient to me. However, adding possibly a thousand columns to a single table doesn't seem much better, as most of those values won't be provided in any given scenario.
I feel like I'm choosing the lesser of two evils. The thought did cross my mind to embed these fields in a TEXT column without a schema, possibly as JSON. These fields don't need to be indexed, but this approach also feels dirty.
I've never dealt with an issue like this. Is there any alternative to the approaches I mentioned? If not, which approach is best and why?

Having data stored across tables representing individual data types - Why is it wrong?

Say I have lots of time to waste and decide to make a database where information is not stored as entities, but in separate inter-related tables representing INT, VARCHAR, DATE, TEXT, etc. types.
It would be such a revolution to never have to design a database structure ever again, except that the fact that no one else has done it probably indicates it's not a good idea :p
So why is this a bad design? What principles does it go against? What issues could it cause from a practical point of view with a relational database?
P.S.: This is for the learning exercise.
Why shouldn't you separate out the fields from your tables based on their data types? Well, there are two reasons, one philosophical, and one practical.
Philosophically, you're breaking normalization
A properly normalized database will have different tables for different THINGS, with each table having all the fields necessary and unique for that specific "thing." If the only way to find the make, model, color, mileage, manufacture date, and purchase date of a given car in my CarCollectionDatabase is to join meaningless keys across three tables demarcated by data type, then my database has almost zero discoverability and no real cohesion.
If you designed a database like that, you'd find writing queries and debugging statements would be obnoxiously tiresome. Which is kind of the reason you'd use a relational database in the first place.
(And, really, that will make writing queries WAY harder.)
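To make that concrete, here is roughly what "find the red cars built in 2010 with low mileage" turns into when every value lives in a table named after its data type (all names are invented):

SELECT e.id
FROM entities AS e
JOIN varchar_values AS color ON color.entity_id = e.id
    AND color.attribute = 'color' AND color.value = 'red'
JOIN date_values AS built ON built.entity_id = e.id
    AND built.attribute = 'manufacture_date'
    AND built.value BETWEEN '2010-01-01' AND '2010-12-31'
JOIN int_values AS miles ON miles.entity_id = e.id
    AND miles.attribute = 'mileage' AND miles.value < 50000;
-- Three joins for three attributes, and nothing in the schema says that a row
-- in "entities" is a car at all.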
Practically, databases don't work that way.
Every database engine or data-storage mechanism I've ever seen is simply not meant to be used with that level of abstraction. Whatever engine you had, I don't know how you'd get around essentially doubling your data design with fields. And with a five-fold increase in row count, you'd have a massive increase in index size, to the point that once you get a few million rows your indexes wouldn't actually help.
If you tried to design a database like that, you'd find that even if you didn't mind the headache, you'd wind up with slower performance. Instead of 1,000,000 rows with 20 fields, you'd have that one table with just as many fields, and some 5-6 extra tables with 1,000,000+ entries each. And even if you optimized that away, your indexes would be larger, and larger indexes run slower.
Of course, those two ONLY apply if you're actually talking about databases. There's no reason, for example, that an application can't serialize to a text file of some sort (JSON, XML, etc.) and never write to a database.
And just because your application needs to store SQL data doesn't mean that you need to store everything that way, or that you can't use homogeneous and generic tables. An Access-like application that lets users define their own "tables" might very well keep each field in a distinct row... although in that case your database's THINGS would be those tables and their fields. (And it wouldn't run as fast as a natively written database.)