I'm currently using MySQL to store JSON data, which I fetch and parse in my application. I would like to get rid of MySQL, but first I would like to know: is that wise?
Would it be efficient to store the data in a data folder of .json files instead? The files would hold my app's coordinate data for each user who wants to track themselves on a map. Will that cause any issues? I don't need to query the data, but what about larger data sets, for example 50K lines? The same amount would be in MySQL too. The amount doesn't change, but are there problems that appear when moving from reading from SQL to reading from JSON files?
It's difficult to answer all these questions in one, but I'll address some of them:
There are dedicated NoSQL databases that are very good at the type of data storage you're talking about: MongoDB, CouchDB etc. It might be worth checking these out. They are very good at dealing with JSON data. Querying and parsing are very simple in Node.js.
You can store JSON in MySQL (or another RDBMS); I've done it in several projects with good results. As of MySQL 5.7.8 there is a dedicated JSON type. Queries can work surprisingly well: I've queried tables with tens of millions of JSON entries quite quickly.
Make sure you consider backup and restore scenarios, that is, what happens in the event of data loss. Using MySQL or a NoSQL database will simplify this for you. Either way, make sure you have this covered!
I wouldn't call 50K lines big data! I've dealt with databases with tens of millions of rows, and even that wouldn't be called big data.
I would probably not recommend storing your data in files. I've worked in telematics before, where we stored millions of JSON blobs in relational databases with very few problems. Later on we planned to move to a NoSQL database for these, but the relational database worked surprisingly well, especially because you can adopt a hybrid approach of using relational queries and including JSON data in the results (to be parsed by clients).
You might not need the ability to query, but it's very useful to get for example "Give me all JSON for user id 100". An RDBMS or NoSQL system would make this very easy.
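As a rough illustration of that kind of lookup, here is a minimal Python sketch. It uses the standard library's sqlite3 module as a stand-in for MySQL, and the table name `positions`, its columns, and the two helper functions are all made up for the example rather than taken from the question:

```python
import json
import sqlite3

# In-memory SQLite stands in for MySQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " data TEXT NOT NULL)"
)
# The index is what makes "all JSON for user 100" cheap to answer.
conn.execute("CREATE INDEX idx_positions_user ON positions (user_id)")


def save_position(user_id, payload):
    """Store one JSON document for a user."""
    conn.execute(
        "INSERT INTO positions (user_id, data) VALUES (?, ?)",
        (user_id, json.dumps(payload)),
    )


def positions_for_user(user_id):
    """Return every stored document for a user, parsed, in insertion order."""
    rows = conn.execute(
        "SELECT data FROM positions WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return [json.loads(data) for (data,) in rows]


save_position(100, {"lat": 60.17, "lon": 24.94})
save_position(100, {"lat": 60.18, "lon": 24.95})
save_position(200, {"lat": 48.85, "lon": 2.35})

user_100 = positions_for_user(100)
```

With plain files you would have to build and maintain the equivalent of that index yourself, for example one file per user.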
I have many IoT devices currently sending data to a MySQL database.
I want to port it to some other Database, which will be Open Source and provide me with:
JSON support
Scalability
Flexibility to add multiple columns automatically as per payload
Python and PHP Support
Extremely Fast Read, Write
Ability to export at least 6 months of data in CSV format
Please revert back soon.
Any help will be appreciated.
Thanks
Shaping your database purely around the input data is a mistake. Suppose that tomorrow your data arrives as CSV or XML, in a slightly different format. Design your database based on your abstract data model, normalize it, and map the existing data onto your model. Shape your structure based on what input you have and what output you plan to get. If you only ever retrieve the same content as the input, storing the data in files will be sufficient and you don't need a database.
Also, you don't want to store "raw" records in the database. Even if your database can compose a data record out of the raw element at run time, you cannot run a selection based on a certain extracted element without visiting all the records.
Most databases allow you to connect from anywhere (there is no such thing as better support for PostgreSQL in Java than in Python, although the quality and level of standardization of the drivers may vary). The question is which features your driver must support. For example, you may require support for bulk import (don't issue large INSERT sets to the database).
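To make the batching idea concrete, here is a small Python sketch. It is not a real bulk loader (dedicated mechanisms such as MySQL's LOAD DATA are database-specific); it only shows sending rows in moderate batches through the DB-API `executemany` call, one transaction per batch. SQLite stands in for the real database, and the `readings` table, the `batches` helper, and the batch size of 250 are assumptions chosen for illustration:

```python
import sqlite3

# In-memory SQLite as a stand-in; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (device_id INTEGER, recorded_at INTEGER, value REAL)"
)


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# 10 devices x 100 readings each = 1000 synthetic rows.
rows = [(device, t, device * 0.5 + t) for device in range(10) for t in range(100)]

for batch in batches(rows, 250):
    with conn:  # commits on success, rolls back if the batch fails
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", batch)

total = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Whether your driver offers something faster than this is exactly the kind of feature worth checking before you pick a database.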
What you actually look for is:
scalability: can your database grow with your data? Would the DB benefit from additional CPUs (MySQL in particular doesn't for large queries)? Can you shard your database across multiple instances (again, MySQL fails to handle that)?
does your model look like a snowflake? If yes, you may consider NoSQL; otherwise stay away from it. If you manage to model it as a snowflake (and this means you are open to compromises) you may use anything like Lucene-based search products, Mongo, Cassandra, etc. The fact that you have time series doesn't by itself qualify you for NoSQL. For example, you may have 10K devices issuing 5K message types. Specific data is redundantly recorded at device level and at message type level. In that case, because of the n:m relation, you don't have the snowflake anymore.
why do you store the data? What queries are you going to issue?
Why do you want to move away from MySQL? It is open source and can meet all of the criteria you listed above. This is a very subjective question, so it's hard to give a good answer, but MySQL is not a bad option.
I have never used NoSQL before; generally the applications I write require relations. However, I have encountered something that I don't know how to approach. So far, I am only designing the database. For now, my main logic is in the MySQL database. I have static content that I will be hosting through a CDN. However, I also have dynamic content that will be updated very rarely but read on almost every request, like phone number, email address, address, and additional info. These fields will not be used for searching, and the data is unstructured. A user can have multiple email addresses, phone numbers, and addresses, and they would be needed for multiple tables. So a relational database doesn't fit my needs in this case (I don't want to create an Entity-Attribute-Value table for this), and since I know that this data doesn't affect the logic and is only used as "meta-data", I want to keep it in a JSON format.
After Googling for some time, I found out that MongoDB stores "documents" in JSON, which sounded like the perfect solution. However, I have one question regarding this. How do I connect these databases together? Do I just need to add a user_id or organization_id "column"/field to a document on create/update and do a "select" query (whatever the equivalent is in MongoDB) to retrieve the meta data? Or is there a different way?
I'll present here my opinion. What you're trying to do here is called "polyglot persistence". If you introduce mongo, you'll have 2 architectures for storing your data, different in strength, api, design, what not and this has its price.
MongoDB is a great product, and I've used it myself with great success, but you have to understand that it doesn't provide all the features you would expect from an RDBMS like MySQL. For example, it totally lacks transactions.
Moreover, if you store data in both MySQL and Mongo you'll have to take care of data integrity yourself. What happens if, as part of one logical transaction, the MySQL transaction succeeds but Mongo fails to store the data? There is no rollback...
I believe you've got my point.
Yes, Mongo really does allow you to query by various JSON parameters. In fact it features a whole query language that resembles SQL to some extent, but it's not really a "relational" query engine, because Mongo is not a relational database, so you don't have JOINs, for example. But you've said yourself that you are not going to search by these fields, so I don't quite understand what benefit you would get from using Mongo. Maybe this is only about terminology, but I'm a little confused by this statement.
Where Mongo really shines is when you have a lot of data (it's a big data product after all). Then you get fun stuff like replica sets and sharding, but the question is whether you really need it. Do you really have "big data", that is, a really huge number of objects to be stored?
As an alternative, I think you could use a text column for storing the JSON "as is", that is, a single column in the table that holds the JSON.
Some databases even have "JSON" as a native type; I'm not sure whether MySQL supports it.
In that case you can even do some operations on the JSON values (like append, partial update and so forth).
Of course the choice is yours. All I'm saying is that you should think about whether using two persistence engines brings you more benefits or just makes your project more complicated.
Hope this helps
I'm looking for alternative data storage methods to SQL (That is to say, I do not want to use SQL, even for queries) and came across a few based on JSON. Talking with friends who do database work, they said I shouldn't consider these, but wouldn't elaborate. What are the potential (and practical) drawbacks to using JSON as a data storage file format?
I figured JSON would be better than SQL for these reasons:
JSON is strictly defined and doesn't have flavors (Oracle, Microsoft, MySQL, etc.)
Since Google started making Chrome, JS interpreters have made reading, parsing, and outputting JS (and thus JSON) a very fast and easy process.
Database output could be pure JSON, erasing the need for a middle-man interpreter for browsers, etc.
among others...
I think you might want to take a look at NoSQL databases:
https://en.wikipedia.org/wiki/NoSQL
If you like using JSON-like data, then one I have personally used is MongoDB.
I have not used it as a main/single source of my app data, but only for secondary purposes. But, I guess, you can try using it as your main data storage too (I think many people do).
What I have tried, and found quite satisfying, was MongoDB with C#, using MongoVUE as a GUI application for executing queries and interacting with the DB. I was not very happy with MongoVUE, but it seemed to be the best option at the time.
However, SQL DBs are very good at defining relationships in your data. E.g. referencing an entry on table A from an entry on table B, and that kind of stuff. Using those relationships, you can join tables and do many interesting things. I think, it is good for you to get some experience on this field as well.
MongoDB is not built for defining relationships (as far as I understand). It has the concept of "documents", where you store information in a JSON-like format (with nested key/values). You can query documents, but joining seems like hacking your way around its normal usage: How do I perform the SQL Join equivalent in MongoDB?
Also, ensuring data consistency (in a truly reliable manner) when using relationships in MongoDB seems pretty impossible to me. But even if I am wrong and it is possible, it will be 10 times harder achieving it than with SQL DBs.
But you can have a look at the list on Wikipedia, and there might be a better alternative than MongoDB for you.
But you can use pure JSON as well with no DB system.
So, in summary JSON-like storage has (at least) these issues:
Not good at defining and utilizing relationships
When using relationships, data integrity (or more likely, reference integrity) is hard.
If you are not using a good DB system and just dump JSON into a file, you will have performance issues when that file becomes too big. Imagine querying a 1GB JSON-encoded array of objects to get the ones you want. You will have to load the entire array into memory, run through the whole of it (since you will have no indices) and then (if you have not run out of memory and your connection, when using a network, has not expired) you will get a result. Most NoSQL DBs like MongoDB and most SQL DBs have no such problems (at least within reasonable amounts of data). They are fine-tuned, they support indexing, references, permissions and roles, and you can also define code that executes at the DB level (like triggers and stored procedures). Certainly they are more complex, but that complexity may be required most of the time to achieve the end result.
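The difference described in that last point can be sketched in a few lines of Python. This is only an illustration under assumptions: the records are synthetic, 50,000 objects stand in for the "1GB" case, SQLite stands in for "a good DB system", and the names `find_in_file` and `find_in_db` are invented for the example:

```python
import json
import os
import sqlite3
import tempfile

# Synthetic data: 50,000 small objects spread over 1,000 users.
records = [{"id": i, "user_id": i % 1000, "value": i * 2} for i in range(50_000)]

# Flat file: one big JSON array on disk.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    json.dump(records, fh)
    path = fh.name


def find_in_file(path, user_id):
    """Parse the whole file, then scan every record (no index available)."""
    with open(path) as fh:
        everything = json.load(fh)  # the entire array ends up in memory
    return [r for r in everything if r["user_id"] == user_id]


# Database: same data, plus an index on the column we filter by.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, user_id INTEGER, value INTEGER)"
)
conn.executemany("INSERT INTO records VALUES (:id, :user_id, :value)", records)
conn.execute("CREATE INDEX idx_records_user ON records (user_id)")


def find_in_db(user_id):
    """Let the engine use the index to reach only the matching rows."""
    cur = conn.execute(
        "SELECT id, user_id, value FROM records WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return [{"id": i, "user_id": u, "value": v} for i, u, v in cur]


from_file = find_in_file(path, 42)
from_db = find_in_db(42)
os.remove(path)  # clean up the temporary file
```

Both calls return the same records; the point is that the file version has to read and parse everything on every lookup, while the database version does not.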
JSON, or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs. It is used primarily to transmit data between a server and web application, as an alternative to XML.
What you are really looking at is a comparison between database and flat-file storage.
Even when using a relational DB, data integrity (or referential integrity) is still hard, because rows are usually timestamped. Quite often foreign keys are not enforced because of this. When a row update occurs you have two choices. The first is to "forget" the previous version. The second is to update the original row and copy the previous version into a timestamped "non-relational" history table where foreign keys are useless. Most business data requires updates. Features for maintaining referential integrity in relational databases are useless for this type of business data (which represents most enterprise data).
What is needed is a Temporal Database, or an abstraction layer which presents a user with the appropriate version of a row based on a Time context. Ideally in 2 dimensions i.e. transaction time and business time (aka valid time).
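A very reduced Python sketch of such an abstraction layer follows. It covers only one of the two dimensions (valid time), uses SQLite as a stand-in, and the `customer_versions` table, the integer `YYYYMMDD` dates, and the `address_as_of` function are all assumptions made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each row is one version of a customer, valid from `valid_from` (YYYYMMDD)
# until the next version for the same customer begins.
conn.execute(
    "CREATE TABLE customer_versions ("
    " customer_id INTEGER NOT NULL,"
    " valid_from INTEGER NOT NULL,"
    " address TEXT NOT NULL,"
    " PRIMARY KEY (customer_id, valid_from))"
)
conn.executemany(
    "INSERT INTO customer_versions VALUES (?, ?, ?)",
    [
        (1, 20200101, "1 Old Street"),
        (1, 20220601, "2 New Road"),
    ],
)


def address_as_of(customer_id, when):
    """Return the address that was valid at `when`, or None if none existed yet."""
    row = conn.execute(
        "SELECT address FROM customer_versions"
        " WHERE customer_id = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (customer_id, when),
    ).fetchone()
    return row[0] if row else None
```

Here `address_as_of(1, 20210101)` picks the first version and `address_as_of(1, 20230101)` the second. A bitemporal design would add a second pair of timestamps for transaction time.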
This is more of a concept/database architecture related question. In order to maintain data consistency, instead of a NoSQL data store, I'm just storing JSON objects as strings/Text in MySQL. So a MySQL row will look like this
ID, TIME_STAMP, DATA
I'll store JSON data in the DATA field. I won't be updating any rows, instead I'll add new rows with the current time stamp. So, when I want the latest data I just fetch the row with the max(timestamp). I'm using Tornado with the Python MySQLDB driver as my primary backend application.
I find this approach very straightforward and less prone to errors. The JSON objects are fairly simple and are not heavily nested.
Is this approach fundamentally wrong? Are there any issues with storing JSON data as text in MySQL, or should I use file-system-based storage such as HDFS? Please let me know.
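For reference, the append-only scheme described in the question can be sketched roughly as follows in Python. SQLite replaces MySQL so the snippet runs on its own; the table name `snapshots`, the helper names, and the explicit numeric timestamps are assumptions for the example:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Mirrors the ID, TIME_STAMP, DATA layout from the question.
conn.execute(
    "CREATE TABLE snapshots ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " time_stamp REAL NOT NULL,"
    " data TEXT NOT NULL)"
)
# Without this index, finding the newest row means scanning the table.
conn.execute("CREATE INDEX idx_snapshots_time ON snapshots (time_stamp)")


def append_snapshot(payload, time_stamp):
    """Never update in place; always add a new timestamped row."""
    conn.execute(
        "INSERT INTO snapshots (time_stamp, data) VALUES (?, ?)",
        (time_stamp, json.dumps(payload)),
    )


def latest_snapshot():
    """Fetch the row with the highest timestamp (id breaks ties)."""
    row = conn.execute(
        "SELECT data FROM snapshots ORDER BY time_stamp DESC, id DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]) if row else None


append_snapshot({"status": "starting"}, 1000.0)
append_snapshot({"status": "running"}, 1001.0)
latest = latest_snapshot()
```

Two details are worth deciding early in a scheme like this: an index on the timestamp column, and a tie-breaker for rows written with the same timestamp.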
MySQL, as you probably know, is a relational database manager. It is designed for being used in a way where data is related to each other through keys, forming relations which can then be used to yield complex retrieval of data. Your method will technically work (and be quite fast), but will probably (based on what I've seen so far) considerably impair your possibility of leveraging the technology you're using, should you expand the complexity of your scope!
I would recommend a database like MongoDB, which is designed for document storage, or a key-value store like Redis, rather than a relational architecture.
That said, if you find the approach works fine for what you're building, just go ahead. You might face some blockers up ahead if you want to add complexity to your solution but either way, you'll learn something new! Good luck!
Pradeeb, to help answer your question you need to analyze your use case. What kind of data are you storing? For me, this would be the deciding factor: every technology has a specific use case where it excels.
I think it is safe to assume that you use JSON because your data structure needs to hold very flexible documents, compared to a traditional relational DB. There are certain data stores that natively support such data structures, such as MongoDB (they call it "binary JSON" or BSON), as Phil pointed out. This would give you improved storage and/or improved search capabilities. Again, the utility depends entirely on your use case.
If you are looking for something like a job queue, horizontal scalability is not an issue, and you just need fast access to the latest data, you could use Redis, an in-memory key-value store that has a hash (associative array) data type and lists for this kind of thing. Alternatively, since you mentioned HDFS and horizontal scalability may very well be an issue, I can recommend using queue systems like Apache ActiveMQ or RabbitMQ.
Lastly, if you are writing heavily, and you are not client-limited but your data storage is your bottleneck, look into distributed, flexible-schema data stores like HBase or Cassandra. They offer flexible data schemas and are heavily write-optimized, and data can be appended and remains in chronological order, so you can fetch the newest data efficiently.
Hope that helps.
This is not a problem. You can also use the memcached storage engine in modern MySQL, which would be perfect, although I have never tried it.
Another approach is to use memcached as cache. Write everything to both memcached, and also mysql. When you go to read data, try reading from memcached. If it does not exist, read from mysql. This is a common technique to reduce database bottleneck.
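The read and write paths described above might look roughly like this in Python. To keep the snippet self-contained, a plain dict stands in for the memcached client and SQLite for MySQL; the `documents` table and the `write`/`read` function names are hypothetical:

```python
import json
import sqlite3

cache = {}  # stand-in for a memcached client (get/set by key)
db = sqlite3.connect(":memory:")  # stand-in for MySQL
db.execute("CREATE TABLE documents (doc_key TEXT PRIMARY KEY, data TEXT NOT NULL)")


def write(key, payload):
    """Write to the database first, then to the cache."""
    encoded = json.dumps(payload)
    db.execute(
        "INSERT OR REPLACE INTO documents (doc_key, data) VALUES (?, ?)",
        (key, encoded),
    )
    db.commit()
    cache[key] = encoded


def read(key):
    """Try the cache; on a miss, fall back to the database and refill the cache."""
    encoded = cache.get(key)
    if encoded is None:
        row = db.execute(
            "SELECT data FROM documents WHERE doc_key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        encoded = row[0]
        cache[key] = encoded
    return json.loads(encoded)


write("movie:1", {"title": "Example", "views": 10})
cache.clear()             # simulate a cache eviction or restart
first = read("movie:1")   # cache miss, served from the database
second = read("movie:1")  # cache hit
```

The database stays the source of truth here, so losing the cache costs only speed, not data.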
I'm thinking about doing the following and need suggestions on whether it makes sense to approach it this way. I am able to do queries in MongoDB, and MongoDB is wicked fast at these since the hot spots of the data are cached in memory. So I was thinking of storing in MongoDB the data that I would normally get through a join in MySQL. While I am using memcached to store simple query results (for example a movie description page), for bigger stuff that requires more real-time/on-demand queries I was thinking about using MongoDB. Examples are the view count for movies and who saw them, and doing analysis on that.
Hopefully I explained it clearly.
more info:
We don't want to keep writing to our MySQL server on every rating, like, etc. MongoDB seemed like a good option to store the ratings and views of movies and then later on be able to do processing on that data, whereas with memcached the data is not persisted and we're unable to do queries.
Thanks,
Faisal
Memory caching alone is not a good reason to go with MongoDB. Any properly configured RDBMS will cache frequently used data in memory.
What aspect of MySQL is currently limiting your performance? Do you have enough RAM in your server? Are your disks fast enough? Do you have a low latency cache device like an SSD configured appropriately?
There's nothing wrong with using both solutions in your application. As a matter of fact, I'm using MySQL to store user sessions as opposed to cookies. In addition, I have another project that utilizes MySQL, but for certain parts of my application I will be using MongoDB. Why? It's wicked fast and I hate writing join queries. It's so much easier to pop data in and out of Mongo as opposed to having to do join queries in MySQL.
i.e.
When saving tags for a particular user, it's so gosh darn easy to save/modify/delete the tag that's stored in MongoDB. With MySQL, I would've had to write a query that JOINs multiple tables. For data such as user account, password, city, and state, I saved everything in MySQL.
What you're talking about is the idea of having normalized and denormalized data. Using MongoDB as a denormalized data store for your normalized SQL data is fine. Using MongoDB as your only data store for certain kinds of data is also fine. Just make sure that it's clear in the system design where the true data is and where the denormalized data is.
Normalized data is true fact. Denormalized data is gossip: you aren't sure if it's up to date or not.