I need your help with a database question on MySQL. I have a table containing spots, with several properties for each spot, and a user table containing properties for each user. How should I store the spots a user has subscribed to?
Currently I have a field containing a JSON array, which I decode and re-encode every time I need to read or modify it. But a direct MySQL query can't work with it, so the complexity grows very quickly...
How would you structure your DB in such a situation?
Put your fields in separate columns in a table instead of a single column holding the JSON array.
You can then use a join across the two tables. That way you won't need to decode and re-encode your JSON array.
The downside is that it will take up more storage space.
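A minimal sketch of that idea, where the table and column names are only assumptions:

CREATE TABLE user_spot_subscription (
  user_id INT NOT NULL,   -- references the user table
  spot_id INT NOT NULL,   -- references the spot table
  PRIMARY KEY (user_id, spot_id)
);

-- all spots a given user subscribed to, no JSON decoding needed:
SELECT s.*
FROM spot s
JOIN user_spot_subscription us ON us.spot_id = s.id
WHERE us.user_id = 42;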
Hope that helps. :)
I have a table A that contains the definition/configuration for a form (fields, display information, etc.). I perform a lookup against that table to determine what the form being displayed looks like. We also dynamically create tables to hold data as specified in that form or record.
When working with other developers, twice it has been suggested to store the field information in JSON format in a single field in table A instead of individual fields for configuration.
My principal concern is one of performance. We are either retrieving row information from table A, or retrieving row information from table A and parsing it in the client.
Which is better in terms of performance? In terms of code reuse?
Short answer: yes. Storing the configuration as a serialized JSON document will give you the flexibility to change things and propagate those changes easily, likely with less code. Ideally, let the client do the deserialization.
Assuming the documents are fairly small (<5 KB), the processing cost is negligible, and as long as your access pattern is key/value based, database performance should be no different from accessing any other row by primary key. Make sure to index the key.
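A rough sketch of that key-addressed lookup (all names here are assumptions, not from the question):

CREATE TABLE form_config (
  form_key   VARCHAR(64) PRIMARY KEY,  -- the indexed key used for lookups
  config_doc MEDIUMTEXT NOT NULL       -- small serialized JSON form definition
);

-- fetching one document by its key is an ordinary indexed primary-key read:
SELECT config_doc FROM form_config WHERE form_key = 'contact_form';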
But more broadly, I would consider the following:
A document store for this scenario (for both the configuration and the data).
Separating the schema definition from the user/system preferences.
Sharding the data by key (this would be a replacement for creating separate tables).
"My principal concern is one of performance. We are either retrieving row information from table A, or retrieving row information from table A and parsing it in the client. Which is better in terms of performance? In terms of code reuse?"
I do not see performance as a problem here.
JSON Pros
Schema flexibility. If you change or add something, you don't need to touch the database tables.
Configuration richness. JSON is more expressive than a database table.
Easy support for nested structures.
JSON Cons
Inability to change only part of the JSON object. You have to deserialize it, change it, serialize it again, and then store it.
Inability to easily change part of many objects at once. Where a simple UPDATE ... WHERE can be issued against a database table, with JSON you have to read the database row by row and update each record separately (see the sketch after this list).
Weak versioning. Changing the JSON schema format is neither simple nor obvious. When you change a database structure, it is always a visible and straightforward process; a JSON schema change is not.
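To make the second con concrete, a hedged sketch with assumed table and column names:

-- with separate columns, one statement changes a field in many rows at once:
UPDATE form_field SET is_required = 1 WHERE form_id = 42;

-- with a serialized JSON column there is no equivalent; each document has to
-- be read, deserialized, modified and written back by the application:
SELECT id, config_doc FROM form_config;                 -- read every row
-- ... deserialize, edit and re-serialize in the client ...
UPDATE form_config SET config_doc = '...' WHERE id = 1; -- write back one by one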
If you go with JSON, I recommend using JSON Schema to validate the current version of the data, and consider establishing a migration policy: if the JSON schema changes, a dedicated migration must be prepared that walks the database and restructures all the JSON data there in a single transaction.
I am trying to understand why JSON is so widely used for data transfer between client and server. I understand that it offers a simple design which is easy to understand. However, consider the following:
A JSON string includes repeated data; e.g., in the case of a table, the column names (keys) are repeated in each object. Would it not be wiser to send the columns as the first object, with the rest of the objects holding only the data from the table (without the column/key information)?
Once we have a JSON object, searching by key is expensive (in time) compared to using indexes. Imagine a table with 20-30 columns: doing this search for every key of every object would cost far more time than using indexes directly.
There may be many more drawbacks and advantages; add them here if you know of any.
I think if you want data transfer then you want a table-based format. JSON is not a table-based format like standard databases or Excel. This can complicate analyzing the data when there is a problem, because someone will usually use Excel for that (sorting, filtering, formulas). Building test files will also be more difficult, because you can't simply export to JSON from Excel.
But if you wanted to use JSON for data transfer, you could basically build a JSON version of a CSV file. You would only use arrays:
{
  "Columns": ["First_Name", "Last_Name"],
  "Rows": [
    ["Joe", "Master"],
    ["Alice", "Gooberg"]
    ... etc.
  ]
}
Seems messy to me though.
If you wanted to use objects, then you would have to embed the column names with every bit of data, which in my opinion indicates the wrong approach.
I have a table in MySQL with fields like name, location, description, and picture.
What I want to do is store multiple picture links in the picture column.
Is there a way of doing that without creating a separate table for the pictures?
Thank you
Well, you need to perform some sort of serialization in order to do that. I used to do this before I moved to document-oriented databases. Quite possibly your best option is to store everything in JSON format, as it is pretty universal, and I can't think of any language that cannot handle it and convert it back to an object, array, dictionary or whatever the language requires. Assuming you need to save file names such as somefile.png, you could store ["image1.png","image2.png","image3.png"] and so on.
If you want to store a blob, however, it's a bit more complicated. You either have to create a second table, or read the contents of each image, convert it to base64, load all the base64 strings into an object and then serialize it into JSON. I wouldn't recommend that, as each operation would cost a lot of system resources.
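A minimal sketch of the file-name variant, where the table name and id are placeholders and the picture column is an ordinary TEXT/VARCHAR field holding a JSON string:

UPDATE place
SET picture = '["image1.png","image2.png","image3.png"]'
WHERE id = 1;

-- the application decodes it after reading, e.g. with json_decode() in PHP:
SELECT picture FROM place WHERE id = 1;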
Let's assume I need to store an unknown amount of data within a database table. I don't want to create extra tables, because that would make it slower to get the data. The amount of data can vary.
My initial thought was to store the data in a key1=value1;key2=value2;key3=value3 format, but the problem is that a value can contain ; in its body. What is the best separator in this case? What other methods can I use to store arbitrary data in a single row?
An example row looks like data=2012-05-14 20:07:45;text=This is a comment, but what if I contain a semicolon?;last_id=123456, from which I can then obtain, in PHP, an array with the corresponding keys and values after correctly exploding the row text on a separator.
First of all: you never, ever store more than one piece of information in a single field if you need to access the pieces separately or search by one of them. This has been discussed here quite a few times.
Assuming you always want to access the complete collection of information at once, I recommend using the native serialization format of your development environment: e.g. if it is PHP, use serialize().
If it is cross-platform, JSON might be the way to go: good JSON encoding/decoding libraries exist for practically every environment out there. The same is true for XML, but in this context the textual overhead of XML is going to bite a bit.
On a side note: are you sure that storing the data in additional tables is slower? You might want to benchmark that before finally deciding.
Edit:
After reading that you use PHP: if you don't want to put it in a table, stick with serialize() / unserialize() and a MEDIUMTEXT field. This works perfectly; I do it all the time.
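A minimal sketch of that layout, with assumed table and column names:

CREATE TABLE user_settings (
  user_id INT PRIMARY KEY,
  data    MEDIUMTEXT NOT NULL   -- output of serialize() (or json_encode())
);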
EAV (cringe) is probably the best way to store arbitrary values like you want, but it sounds like you're firmly against additional tables for whatever reason. In light of that, you could just save the result of json_encode() in the table. When you read it back, just json_decode() it to get it back into an array.
Keep in mind that if you ever need to search for anything in this field, you're going to have to use a SQL LIKE. If you never need to search this field or join it to anything, I suppose it's OK, but if you do, you've totally thrown performance out the window.
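For example (the column and value here are assumptions), a search inside the serialized field becomes a pattern match that cannot use a normal index:

SELECT * FROM settings
WHERE data LIKE '%"last_id":123456%';   -- full table scan, no index help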
The quotes can be what separates them:
key1='value1';key2='value2';key3='value3'
If that doesn't work for you, post your SQL example and we can see how to do it.
I'm trying to implement a key/value store with MySQL.
I have a user table that has 2 columns, one for the global ID and one for the serialized data.
Now the problem is that every time any bit of the user's data changes, I have to retrieve the serialized data from the DB, alter it, re-serialize it, and write it back to the DB. I have to repeat these steps even for a very small change to any of the user's data (since there's no way to update just part of that cell within the DB itself).
Basically, I'm asking what solutions people normally use when faced with this problem.
Maybe you should preprocess your JSON data and insert it as a proper MySQL row, split into fields.
Since your input is JSON, you have various alternatives for converting the data:
You mentioned that many small changes happen in your case. Where do they occur? In a member of a list? In a top-level attribute?
If the updates occur mainly in list members within your JSON data, then perhaps every member should in fact be represented as separate rows in a different table (see the sketch below).
If the updates occur in an attribute, then represent it as a field.
I think the cost of preprocessing won't hurt in your case.
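A hedged sketch of the list-member split, with all table and column names assumed; each list member from the JSON becomes its own row in a child table:

CREATE TABLE user_address (
  user_id  INT NOT NULL,    -- the global ID from the user table
  position INT NOT NULL,    -- index of the member in the original JSON list
  street   VARCHAR(255),
  city     VARCHAR(100),
  PRIMARY KEY (user_id, position)
);

-- a small change now touches a single row instead of the whole document:
UPDATE user_address SET city = 'Berlin' WHERE user_id = 7 AND position = 0;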
When this is a problem, people do not use key/value stores; they design a normalized relational database schema that stores the data in separate, single-valued columns which can be updated individually.
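For instance (all names are assumptions), once the attributes live in their own columns, a small change becomes a single targeted statement:

CREATE TABLE user_profile (
  user_id    INT PRIMARY KEY,
  email      VARCHAR(255),
  city       VARCHAR(100),
  newsletter TINYINT(1)
);

UPDATE user_profile SET email = 'new@example.com' WHERE user_id = 42;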
To be honest, your solution is using a database as a glorified file system - I would not recommend this approach for application data that is core to your application.
The best way to use a relational database, in my opinion, is to store relational data - tables, columns, primary and foreign keys, data types. There are situations where this doesn't work - for instance, if your data is really a document, or when the data structures aren't known in advance. For those situations, you can either extend the relational model, or migrate to a document or object database.
In your case, I'd first see whether the serialized data could be modeled as relational data, and whether you even need a database. If so, move to a relational model. If you need a database but can't model the data as a relational set, you could go for a key/value model where you extract your serialized data into individual key/value pairs; this at least means that you can update or add individual data fields rather than modifying the entire document. Key/value is not a natural fit for RDBMSes, but it may be a smaller jump from your current architecture.
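A rough sketch of that key/value extraction (all names assumed): each field of the serialized document becomes its own row, so individual fields can be updated or added without rewriting the whole document:

CREATE TABLE user_attribute (
  user_id    INT NOT NULL,          -- the global ID
  attr_key   VARCHAR(64) NOT NULL,  -- e.g. 'address', 'phone'
  attr_value TEXT,
  PRIMARY KEY (user_id, attr_key)
);

-- update one field:
UPDATE user_attribute SET attr_value = '12 Main St'
WHERE user_id = 7 AND attr_key = 'address';

-- add a new field:
INSERT INTO user_attribute (user_id, attr_key, attr_value)
VALUES (7, 'phone', '555-0100');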
When you have a key/value store, assuming your serialized data is JSON, it is effective only when you have memcached along with it, because you don't update the database on the fly every time; instead, you update memcached and then push that to your database in the background. So yes, you have to update the entire value rather than an individual field of your JSON data (such as just the address) in the database, but you can update and retrieve data quickly from memcached. And since there are no complex relations in the database, it will be fast to push and pull data between the database and memcached.
I would continue with what you are doing and create separate tables for the indexable data. This allows you to treat your database as a single data store which is easily managed through the usual operational tasks, including updates, backups, restores, clustering, etc.
The only thing you may want to consider is adding Elasticsearch to the mix if you need to perform anything like a LIKE query, purely for improved search performance.
If space is not an issue for you, I would even make it an insert-only database, so any change adds a new record; that way you keep the history. Of course, you may want to remove older records, but a background job can delete the superseded records in batches. (Mind you, what I described is basically Kafka.)
There are many alternatives out there now that beat an RDBMS in terms of performance. However, they all add extra operational overhead, in that each is yet another piece of middleware to maintain.
The way around that, if you have a microservices architecture, is to keep the middleware as part of your microservice stack. However, you then have to deal with transmitting the data across the microservices, so you'd likely still end up switching to Kafka underneath it all.