I'm working on a custom CRM for a client who wants "custom fields" that need to be completely dynamic on a per-user basis. I have considered EAV because I cannot simply add these columns to the table. However, after doing some digging, I have settled on using a JSON column to store the custom field values as key-value pairs.
My concern is the speed of lookups on data in this JSON column. How can I ensure that queries that search or filter (WHERE clauses) on values in this JSON column remain fast when the table has a large number of rows?
Or is a JSON column not the ideal solution for this?
Related
I'm planning on storing a large amount of data from a user-submitted form (around 100 questions) in a JSON field.
I will only ever need to query two pieces of data from the form: name and type.
Would it be advisable (and more efficient) to extract name and type into their own fields for querying, or shall I just whack it all into one JSON field and query that, since JSON searching is now supported?
If you are concerned about performance, then maintaining separate fields for the name and type is probably the way to go here. The reason is that if these two pieces of data exist as separate fields, it leaves open the possibility of doing things like adding indexes to those columns. While you can use MySQL's JSON API to query by name and type, it would most likely never be able to compete with an index lookup, at least not in terms of performance.
From a storage point of view, you would not pay much of a price to maintain two separate columns. The main price you would pay is that every time the JSON gets updated, you would also have to update the name and type columns.
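For what it's worth, later MySQL 5.7 releases also let you index values inside the JSON via generated columns, which keeps the name and type columns in sync automatically. A rough sketch only; the table and column names here are illustrative, not from the question:

-- Hypothetical submissions table; generated columns are derived from the JSON.
CREATE TABLE submissions (
    id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    form_data JSON NOT NULL,
    -- Generated columns stay in sync with the JSON on every write.
    name VARCHAR(100) AS (form_data->>'$.name') STORED,
    type VARCHAR(50)  AS (form_data->>'$.type') STORED,
    INDEX idx_name_type (name, type)
);

-- This lookup can use the index instead of parsing the JSON on every row.
SELECT id, form_data
FROM submissions
WHERE name = 'age' AND type = 'number';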
I'm working on a website which has a database table with more than 100 fields.
The problem is that when the number of records gets large (more than 10,000 or so), the response time gets very slow and eventually the query doesn't return an answer at all.
Now I want to optimize this table.
My question is: can we use the JSON type for fields to reduce the number of columns?
My limitation is that I want to be able to search, change and maybe remove specific pieces of data stored in the JSON.
PS: I read this question: Storing JSON in database vs. having a new column for each key, but that was asked in 2013, and as we know the JSON field type was added in MySQL 5.7.
Thanks for any guidance...
First of all, having a table with 100 columns suggests you should rethink your architecture before proceeding. Otherwise it will only become more and more painful at later stages.
Maybe you are storing data in separate columns that could instead be broken down and stored as separate rows.
I suspect the SQL query you are writing is something like SELECT * ..., which fetches more columns than you actually require. Specify only the columns you need; that alone will speed up the API response.
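For example (the table and column names below are just placeholders for whatever your page actually needs):

-- Fetch only what the page needs instead of SELECT *
SELECT id, name, status
FROM your_table
WHERE status = 'active'
LIMIT 50;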
In my personal view, storing active data as JSON inside SQL is not useful. JSON should be used as a last resort, for metadata that does not mutate and does not need to be searched.
Please make your question more descriptive about your database schema and the query you are making for the API.
I have the following problem:
We have a lot of different, yet similar types of data items that we want to record in a (MariaDB) database. All data items have some common parameters such as id, username, status, file glob, type, comments, start & end time stamps. In addition there are many (let's say between 40 and 100) parameters that are specific to each type of data item.
We would prefer to have the different data item types in the same table because they will be displayed along with several other data, as they happen, in one single list in the web application. This will appear like an activity stream or "Facebook wall".
It seems that the normalised approach, with a top-level generic table joined to specific tables underneath, will lead to bad performance. We would have to do a lot of both joins and unions in order to display the activity stream, and the application will frequently poll with this query, so it's important that the query runs fast.
So, which of these is the better solution in terms of performance and storage optimization?
to utilize MariaDB's dynamic columns
to just add in all the different kinds of columns we need in one table, and just accept that each data item type will only use a few of the columns, i.e. the rest will be null.
something else?
Does it matter if we use regular columns when a lot of the data in them will be null?
When should we use dynamic columns and when is it better to use regular columns?
I believe you should have separate columns for the values you are filtering by. However, you might have some unfiltered values. For those it might be a good idea to store them in a single column as a JSON object (simple to encode/decode).
A few columns -- the main ones for use in WHERE and ORDER BY clauses (but not necessarily all the columns you might filter on).
A JSON column or MariaDB Dynamic columns.
See my blog on why not to use EAV schema. I focus on how to do it in JSON, but MariaDB's Dynamic Columns is arguably better.
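Roughly, such a hybrid layout could look like the sketch below, using MariaDB's dynamic columns for the per-type extras. All names are invented for illustration:

-- Common, frequently filtered attributes get real, indexable columns;
-- the per-type extras go into one dynamic-columns blob.
CREATE TABLE activity_item (
    id         INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    username   VARCHAR(64) NOT NULL,
    status     VARCHAR(32) NOT NULL,
    start_time DATETIME NOT NULL,
    extras     BLOB,            -- MariaDB dynamic columns
    INDEX idx_stream (status, start_time)
);

-- Writing and reading the dynamic columns:
INSERT INTO activity_item (username, status, start_time, extras)
VALUES ('alice', 'open', NOW(),
        COLUMN_CREATE('sensor_id', 42, 'comment', 'first run'));

SELECT username,
       COLUMN_GET(extras, 'sensor_id' AS INTEGER) AS sensor_id
FROM activity_item
WHERE status = 'open'
ORDER BY start_time DESC;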
The needs would be long to describe, so I'll simplify the example.
I want to make a form creation system (the user can create a form, add fields, etc.). Let's focus on checkbox vs. textarea.
The checkbox can have a value of 0 or 1, depending on the checked status.
The textarea must be a LONGTEXT type.
So in the database, that gives me 3 choices concerning the structure of the table field_value:
1.
checkbox_value (TINYINT) | textarea_value (MEDIUMTEXT)
That means that no input will ever use all the columns of the table. The table will waste some space.
2.
allfield_value (MEDIUMTEXT)
That means that for the checkbox, I'll store a really tiny value in a MEDIUMTEXT, which is useless.
3.
tblcheckbox.value
tbltextarea.value
Now I have one separate table per field. That's optimal in terms of space, but in the whole context of the application, I might have to read over 100 tables (one query with many JOINs) in order to generate a single page that displays a form.
In your opinion, what's the best way to proceed?
Do not consider an EAV data model. It's easy to put data in, but hard to get data out. It doesn't scale. It has no data integrity. You have to write lots of code yourself to do things that any RDBMS does for you if you model your data properly. Trying to use an RDBMS to create a general-purpose form management system that can accommodate any future needs is an example of the Inner-Platform Effect antipattern.
(By the way, if you do use EAV, don't try to join all the attributes back into a single row. You already commented that MySQL has a limit on the number of joins per query, but even if you can live within that, it doesn't perform well. Just fetch an attribute per row, and sort it out in application code. Loop over the attribute rows you fetch from the database, and populate your object field by field. That means more code for you to write, but that's the price of Inner-Platform Effect.)
If you want to store form data relationally, each attribute would go in its own column. This means you need to design a custom table for your form (or actually set of tables if your forms support multivalue fields). Name the columns according to the meaning of each given form field, not something generic like "checkbox_value". Choose a data type according to the needs of the given form field, not a one-size-fits-all MEDIUMTEXT or VARCHAR(255).
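For instance, a simple contact form modeled relationally might look something like this; the names and types are purely illustrative:

-- Each form field gets its own meaningfully named column with a fitting type.
CREATE TABLE contact_request (
    id               INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    full_name        VARCHAR(100) NOT NULL,
    email            VARCHAR(255) NOT NULL,
    wants_newsletter TINYINT(1) NOT NULL DEFAULT 0,  -- the "checkbox"
    message          MEDIUMTEXT,                     -- the "textarea"
    created_at       DATETIME NOT NULL
);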
If you want to store form data non-relationally, you have more flexibility. You can use a non-relational document store such as MongoDB or even Solr. You can store documents without having to design a schema as you would with a relational database. But you lose many of the structural benefits that a schema gives you. You end up writing more code to "discover" the fields of documents instead of being able to infer the structure from the schema. You have no constraints or data types or referential integrity.
Also, you may already be using a relational database successfully for the rest of your data management and can't justify running two different databases simultaneously.
A compromise between relational and non-relational extremes is the Serialized LOB design, with the extension described in How FriendFeed Uses MySQL to Store Schema-Less Data. Most of your data resides in traditional relational tables. Your amorphous form data goes into a single BLOB column, in some format that encodes fields and data together (for example, XML or JSON or YAML). Then for any field of that data you want to be searchable, create an auxiliary table to index that single field and reference rows of form data where a given value in that respective field appears.
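A minimal sketch of that pattern, with made-up table names and email as the one searchable field:

-- The main table keeps the whole form submission as one serialized blob.
CREATE TABLE form_entry (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    body MEDIUMBLOB NOT NULL        -- JSON/XML/YAML-encoded form fields
);

-- One auxiliary table per searchable field, indexing just that value.
CREATE TABLE index_email (
    email         VARCHAR(255) NOT NULL,
    form_entry_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (email, form_entry_id)
);

-- To find entries by email, hit the index table, then fetch the blobs.
SELECT fe.id, fe.body
FROM index_email AS ie
JOIN form_entry AS fe ON fe.id = ie.form_entry_id
WHERE ie.email = 'someone@example.com';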
You might want to consider an EAV data model.
I have a situation where I have to create tables dynamically. Depending on some criteria I am going to vary the size of the columns of a particular table.
For that purpose I need to calculate the size of one row.
e.g.
If I am going to create a following table
CREATE TABLE sample(id int, name varchar(30));
So I need a formula that would give me the size of a single row for the table above, taking into account all the overhead of storing a row in a MySQL table.
Is it possible to do so, and is it feasible?
It depends on the storage engine you use and the row format chosen for that table, and also on your indexes. But it is not very useful information.
Edit:
I suggest going against normalization only when you know exactly what you're doing. A DBMS is created to deal with large amounts of data. You probably don't need to serialize your structured data into a single field.
Keep in mind that your application layer then has to tokenize (or worse) the serialized field data to get the original meaning back, which certainly has a larger overhead than getting the data from the DB already in structured form.
The only exception I can think of is a client-heavy architecture, where moving processing to the client side actually takes burden off the server, and you would serialize your data anyway for the sake of the transfer. In server-side code (like PHP) it is not good practice to save serialized-style data into the DB.
(Though using PHP's built-in serialization may be a good idea in some cases, your current project does not seem to benefit from it.)
VARCHAR is a variable-length data type: it has a maximum length, but the stored value can be shorter or even empty, so any calculation may not be exact. Have a look at the 'Avg_row_length' field in information_schema.tables.
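For example, once the table has data you can ask the server what the averages actually turned out to be; the schema name below is a placeholder:

-- Approximate per-row storage actually used, as reported by the server.
SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND TABLE_NAME = 'sample';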