How to migrate data from mongodb to mysql? - mysql

I am currently working on an application like to analitics, i has Angularjs app which communicates with Spring REST Client App from which user creates token(trackingID) and use generated script with this id putting on his website to collect information about visitor's actions through another Spring REST tracking App, for tracking app i am using as mongodb to collect visitor actions/visitor info for fast insertion, but for rest client app mysql with user/accounts details.
My question is how to migrate mongo data from tracking app to mysql maybe for getting posibility of join for easily and fastest way of analyze data with any kind of filters from angularjs client app, to create manually any workers that periodically will transfer data from last point to present state from mongo to mysql, or are any existed tools that can be setted for this transfer?

There is no official library to do this.
But you can use mongoexport feature from mongoDB to export it in a CSV format and mysqlimport to import them into MySQL.
Here are links to the documentation MySQL import and MongoDB Export.
One more method you can try to write a program in one of your favorite language and read from MongoDB and write into MySQL

MySQL 5.7 has a new JSON data type, that can be very convenient.
You can create a table at MySQL to receive the JSON messages AS IS, and then use SQL to query it or do a post processing to load the data in a structured set of database tables.
Check this out: https://dev.mysql.com/doc/refman/5.7/en/json.html

I realise this question is a few years old - but recently I've had a number of people enquiring whether a tool I developed (https://virtual.blue/apps/json-converter) can do exactly what the OP is asking (convert MongoDB to SQL) so I am guessing it is still something people want. Keep reading to find out why I am honestly not surprised by this.
The short answer to whether the tool can help you is: perhaps. If your existing data relationships are not too complicated, and your database is not enormous, it may well be worth a try.
However, I thought it might help to try and explain what the issues are with this kind of conversion, since all the answers I have seen so far are along the lines of "try tool X" or "first convert to format Y and then you can slurp it into MySQL using utility Z". ie there is no thought to whether what you get at the end of doing this is going to make sense in terms of data relationships and integrity.
For example, you could just stick your entire database dump in a single field of a single SQL table (ok space limitations might prevent this in reality, but hopefully you get my point). Then your database would be "in MySQL format", but it would be absolutely no use to anyone.
The point is, what you actually want is a fully defined database model, correctly encapsulating all of the intrinsic data relationships. ("Database normalization" as it is known.) If your conversion process gets those relationships wrong, then you have a broken model, and any queries you try to run over it are likely to return nonsense. Unfortunately there is no magic tool that is just going to "know" the best way to represent your data in MySQL, and closing your eyes and shovelling it into a bunch of random tools is unlikely to miraculously get you what you want.
And herein lies the fundamental problem with the "NoSQL" philosophy (fad). They sold people the bogus notion of "non-relational data". My first thought when I heard this was, "How does that work? Surely all data is relational?" By the looks of things we are steadily getting more and more evidence that my instincts were right. ("NoSQL? Why stop there? I go with 'NoDatabase'. It returns no results at all, but it sure is fast!")
The NoSQL madness throws several important fundamental engineering principles to the wind. We shouted "don't hard code!", "DRY!" (Don't Repeat Yourself) because these actions infuse inflexibility into systems. Traditional wisdom makes precisely the same flexibility argument when it advises "create a fully described model with all the data relationships represented". Then you can execute any arbitrary query over it and expect meaningful results. "Yes but there are a whole bunch of queries we are never going to need to run," says the NoSQL proponent. But surely we learnt our lesson on things we are "never going to need to do"? ("I hard code liberally, because I know I am never going to want to change my code." Hmm...)
The arguments about speed are largely moot. Say it turns out you are frequently doing a complex 9 table join, with unsurprisingly sluggish performance. So create an index. Cache it. Swap some disk space for speed. The NoSQL philosophy is to swap data integrity for speed, which makes no sense at all.
When you generate your fast lookup index (cache/table/map/whatever) what you are really doing is creating a view over your model. If your model changes, you can readily update your view. Going from a model to a view is easy - it's a one to many operation and you are on the right side of entropy.
However, when you went with MongoDB you effectively decided to create views without bothering to describe your fundamental model. Now you discover there are queries you want to run, but can't - and so it's no wonder you want to move over to SQL and actually have your data modelled correctly. The problem is you now want to go from a view to a model. Now you're on the wrong side of entropy. Your view is a lossy representation of the model's fundamental relationships. You can't expect a tool to "translate" your database, because you are asking it to insert new relationships which were not originally defined. These are real world relationships that are not machine-guessable. The tool cannot know what relationships were intended.
In short the only way you can do this reliably is to get your hands dirty. An intelligent human, with complete understanding of the system you are modelling needs to sit down and carefully come up with (possibly a substantial amount of) code which effectively picks through the data and resolves all of the insufficiently represented data relationships. If your data is complex then it's going to be a headache and there is no way to cheat.
If your data is still relatively simple then I would suggest making the conversion as soon as possible, before it becomes difficult. In this case my tool (https://virtual.blue/apps/json-converter) may be able to help.
(They really should have asked a Physicist before they came up with all this nonsense...!)

You can download a trial version of Studio 3T for Mongo and export your database to SQL (or JSON) directly

Related

Best way to store/edit GEOJSON working on Wordpress [duplicate]

I want to create a website that will have an ajax search. It will fetch the data or from a JSON file or from a database.I do not know which technology to use to store the data. JSON file or MySQL. Based on some quick research it is gonna be about 60000 entries. So the file size if i use JSON will be around 30- 50 MB and if use MySQL will have 60000 rows. What are the limitations of each technique and what are the benefits?
Thank you
I can't seem to comment since I need 50 rep. for commenting, so I will give it as an answer:
MySQL will be preferable for many reasons, not the least of which being you do not want your web server process to have write access to the filesystem (except for possibly logging) because that is an easy way to get exploited.
Also, the MySQL team has put a lot of engineering effort into things such as replication, concurrent access to data, ACID compliance, and data integrity.
Imagine if, for instance, you add a new field that is required in whatever data structure you are storing. If you store in JSON files, you will have to have some process that opens each file, adds the field, then saves it. Compare this to the difficulty of using ALTER TABLE with a DEFAULT value for the field. (A bit of a contrived example, but how many hacks do you want to leave in your codebase for dealing with old data?) so to be really blunt about, MySQL is a database while JSON is not, so the correct answer is MySQL, without hesitation. JSON is just a language, and barely even that. JSON was never designed to handle anything like concurrent connections or any sort of data manipulation, since its own function is to represent data, not to manage it.
So go with MySQL for storing the data. Then you should use some programming language to read that database, and send that information as JSON, rather than actually storing anything in JSON.
If you store the data in files, whether in JSON format or anything else, you will have all sorts of problems that people have stopped worrying about since databases started being used for the same thing. Size limitations, locks, name it. It's good enough when you have one user, but the moment you add more of them, you'll start solving so many problems that you would probably end up by writing an entire database engine just to handle the files for you, while all along you could have simply used an actual database. Do note! Don't take my word for granted, I am not an expert on this field, so let others post their answer and then judge by that. I think enough people here on stackoverflow have more experience then I do haha. These are NOT entirely my words, but I have taken out the parts that were true from what I knew and know and added some of my own knowledge :) Have a great time making your website
For MySQl :you can select specific rows,or specific column using queries ,filter data based on a key,order alphabetically
downside:need a REST API to fetch data because it can't be accessed directly,you have to use php or python or whatever programming language for backend code.
for json file :benefits :no backend code directly accessed using GET http request.
downside:no filtering ,ordering or any queries,you have to do it manually.

What to use JSON file or SQL

I want to create a website that will have an ajax search. It will fetch the data or from a JSON file or from a database.I do not know which technology to use to store the data. JSON file or MySQL. Based on some quick research it is gonna be about 60000 entries. So the file size if i use JSON will be around 30- 50 MB and if use MySQL will have 60000 rows. What are the limitations of each technique and what are the benefits?
Thank you
I can't seem to comment since I need 50 rep. for commenting, so I will give it as an answer:
MySQL will be preferable for many reasons, not the least of which being you do not want your web server process to have write access to the filesystem (except for possibly logging) because that is an easy way to get exploited.
Also, the MySQL team has put a lot of engineering effort into things such as replication, concurrent access to data, ACID compliance, and data integrity.
Imagine if, for instance, you add a new field that is required in whatever data structure you are storing. If you store in JSON files, you will have to have some process that opens each file, adds the field, then saves it. Compare this to the difficulty of using ALTER TABLE with a DEFAULT value for the field. (A bit of a contrived example, but how many hacks do you want to leave in your codebase for dealing with old data?) so to be really blunt about, MySQL is a database while JSON is not, so the correct answer is MySQL, without hesitation. JSON is just a language, and barely even that. JSON was never designed to handle anything like concurrent connections or any sort of data manipulation, since its own function is to represent data, not to manage it.
So go with MySQL for storing the data. Then you should use some programming language to read that database, and send that information as JSON, rather than actually storing anything in JSON.
If you store the data in files, whether in JSON format or anything else, you will have all sorts of problems that people have stopped worrying about since databases started being used for the same thing. Size limitations, locks, name it. It's good enough when you have one user, but the moment you add more of them, you'll start solving so many problems that you would probably end up by writing an entire database engine just to handle the files for you, while all along you could have simply used an actual database. Do note! Don't take my word for granted, I am not an expert on this field, so let others post their answer and then judge by that. I think enough people here on stackoverflow have more experience then I do haha. These are NOT entirely my words, but I have taken out the parts that were true from what I knew and know and added some of my own knowledge :) Have a great time making your website
For MySQl :you can select specific rows,or specific column using queries ,filter data based on a key,order alphabetically
downside:need a REST API to fetch data because it can't be accessed directly,you have to use php or python or whatever programming language for backend code.
for json file :benefits :no backend code directly accessed using GET http request.
downside:no filtering ,ordering or any queries,you have to do it manually.

Using MySQL and MongoDB together

I have never used NoSQL before, generally the applications I write requires relations. However, I have encountered a something that I don't know how to go about. So far, I am only designing the database. For now, my main logic is in the MySQL Database. I have static content that I will be hosting through a CDN. However, I have dynamic content that will be updated but very rarely but will be read almost on every request - like phone number, email address, address, additional info. They will not be used for searching, however this data is unstructured. A user can have multiple email addresses, phone numbers, and addresses; and they would be needed for multiple tables. So, using relational database in this case fails my needs (I don't want to create an Entity-Attribute-Value Table for this) and since I know that it doesn't affect the logic - its only used as a "meta-data" I want to keep them in a JSON format. And after Google-ing for sometime, I found out that MongoDB stores "documents" in JSON which sounded like the perfect solution. However, I have one question regarding this. How do I connect these databases together? Do I need to just add a user_id or organization_id "column"/field for a document on create/update and do a "select" query (whatever is the equivalent in the MongoDB) to receive the meta data? Or is there a different way?
I'll present here my opinion. What you're trying to do here is called "polyglot persistence". If you introduce mongo, you'll have 2 architectures for storing your data, different in strength, api, design, what not and this has its price.
Mongo DB, is a great product, I've used it by myself with a great success, but you have to understand that it doesn't provide all the features you would expect from RDBMS like MySQL. For example, it totally lacks transactions.
Moreover if you store in both MySQL and Mongo you'll have to care for data integrity by yourself (what happens if as a part of logical transaction mysql transaction succeeds, but mongo fails to store the data), there is no rollback...
I believe you've got my point.
Yes, mongo really allows to query by various JSON parameters, in fact it features the whole query language, it resembles SQL to some extent, but its not really a "relational" query engine, because mongo is not a relational database, so you don't have JOINs for example. But you've said by yourself that you are not going search by these fields, so I kind of don't understand what benefit you would have from using mongo. Maybe this is only about terminology, but I'm confused with this statement a little.
Where mongo is really shines is when you have a lot of data (its a big data product after all), then you have funny stuff like replica-sets and sharding, but the question is whether you really need it? do you really have "big data" - really huge amount of objects to be stored?
As an alternative, I think maybe you can use a text column for storing the JSON "as is". I mean, you might have a column, storing the JSON.
You even sometimes have "JSON" type as a native type in the database, I'm not sure whether MySQL supports it.
In this case you even can do some operations on these jsons (like, append, partial update and so forth).
Of course the choice is yours, all I'm saying is that you should think whether you have more benefits while using 2 persistence engines, or will make your project more complicated.
Hope this helps

Which is the right database for the job?

I am working on a feature and could use opinions on which database I should use to solve this problem.
We have a Rails application using MySQL. We have no issues with MySQL and it runs great. But for a new feature, we are deciding whether to stay MySQL or not. To simplify the problem, let's assume there is a User and Message model. A user can create messages. The message is delivered to other users based on their association with the poster.
Obviously there is an association based on friendship but there are many many more associations based on the user's profile. I plan to store some metadata about the poster along with the message. This way I don't have to pull the metadata each time when I query the messages.
Therefore, a message might look like this:
{
id: 1,
message: "Hi",
created_at: 1234567890,
metadata: {
user_id: 555,
category_1: null,
category_2: null,
category_3: null,
...
}
}
When I query the messages, I need to be able to query based on zero or more metadata attributes. This call needs to be fast and occurs very often.
Due to the number of metadata attributes and the fact any number can be included in a query, creating SQL indexes here doesn't seem like a good idea.
Personally, I have experience with MySQL and MongoDB. I've started research on Cassandra, HBase, Riak and CouchDB. I could use some help from people who might have done the research as to which database is the right one for my task.
And yes, the messages table can easily grow into millions or rows.
This is a very open ended question, so all we can do is give advice based on experience. The first thing to consider is if it's a good idea to decide on using something you haven't used before, instead of using MySQL, which you are familiar with. It's boring not to use shiny new things when you have the opportunity, but believe me that it's terrible when you've painted yourself in a corner because you though that the new toy would do everything it said on the box. Nothing ever works the way it says in the blog posts.
I mostly have experience with MongoDB. It's a terrible choice unless you want to spend a lot of time trying different things and realizing they don't work. Once you scale up a bit you basically can't use things like secondary indexes, updates, and other things that make Mongo an otherwise awesomely nice tool (most of this has to do with its global write lock and the database format on disk, it basically sucks at concurrency and fragments really easily if you remove data).
I don't agree that HBase is out of the question, it doesn't have secondary indexes, but you can't use those anyway once you get above a certain traffic load. The same goes for Cassandra (which is easier to deploy and work with than HBase). Basically you will have to implement your own indexing which ever solution you choose.
What you should consider is things like if you need consistency over availability, or vice versa (e.g. how bad is it if a message is lost or delayed vs. how bad is it if a user can't post or read a message), or if you will do updates to your data (e.g. data in Riak is an opaque blob, to change it you need to read it and write it back, in Cassandra, HBase and MongoDB you can add and remove properties without first reading the object). Ease of use is also an important factor, and Mongo is certainly easy to use from the programmer's perspective, and HBase is horrible, but just spend some time making your own library that encapsulates the nasty stuff, it will be worth it.
Finally, don't listen to me, try them out and see how they perform and how it feels. Make sure you try to load it as hard as you can, and make sure you test everything you will do. I've made the mistake of not testing what happens when you remove lots of data in MongoDB, and have paid for that dearly.
I would recommend to look at presentation about Why databases suck for messaging which is mainly targeted on the fact why you shouldn't use databases such as MySQL for messaging.
I think in this scenario CouchDB's changes feed may come quite handy although you probably would also have to create some more complex views based on querying message metadata. If speed is critical try to also look at redis which is really fast and comes with pub/sub functionality. MongoDB with it's ad hoc queries support may also be a decent solution for this use case.
I think you're spot-on in storing metadata along with each message! Sacrificing storage for faster retrieval time is probably the way to go. Note that it could get complicated if you ever need to change a user's metadata and propagate that to all the messages. You should consider how often that might happen, whether you'll actually need to update all the message records, and based on that whether it's worth paying the price for the sake of less queries (it probably is worth it, but that depends on the specifics of your system).
I agree with #Andrej_L that Hbase isn't the right solution for this problem. Cassandra falls in with it for the same reason.
CouchDB could solve your problem, but you're going to have to define views (materialized indices) for any metadata you're going to want to query. If the whole point of not using MySQL here is to avoid indexing everything, then Couch is probably not the right solution either.
Riak would be a much better option since it queries your data using map-reduce. That allows you to build any query you like without the need to pre-index all your data as in couch. Millions of rows are not a problem for Riak - no worries there. Should the need arise, it also scales very well by simply adding more nodes (and it can balance itself too, so this is really a non-issue).
So based on my own experience, I'd recommend Riak. However, unlike you, I've no direct experience with MongoDB so you'll have to judge it agains Riak yourself (or maybe someone else here can answer on that).
From my experience with Hbase is not good solution for your application.
Because:
Doesn't contain secondary index by default(you should install plugins or something like these). So you can effectively search only by primary key. I have implemented secondary index using hbase and additional tables. So you can't use this one in online application because of for getting result you should run map/reduce job and it will take much time on million data.
It's very difficult to support and adjust this db. For effective work you will use HBAse with Hadoop and it's necessary powerful computers or several ones.
Hbase is very useful when you need make aggregation reports on big amount of data. It seems that you needn't.
Due to the number of metadata attributes and the fact any number can
be included in a query, creating SQL indexes here doesn't seem like a
good idea.
It sounds like you need a join, so you can mostly forget about CouchDB till they sort out the multiview code that was worked on (not actually sure it is still worked on).
Riak can query as fast as you make it, depends on the nodes
Mongo will let you create an index on any field, even if that is an array
CouchDB is very different, it builds indexes using a stored Map-Reduce(but without the reduce) they call a "view"
RethinkDB will let you have SQL but a little faster
TokuDB will too
Redis will kill all in speed, but it's entirely stored in RAM
single level relations can be done in all of them, but differently for each.

Using combination of MySQL and MongoDB

Does it make sense to use a combination of MySQL and MongoDB. What im trying to do basically is use MySQl as a "raw data backup" type thing where all the data is being stored there but not being read from there.
The Data is also stored at the same time in MongoDB and the reads happen only from mongoDB because I dont have to do joins and stuff.
For example assume in building NetFlix
in mysql i have a table for Comments and Movies. Then when a comment is made In mySQL i just add it to the table, and in MongoDB i update the movies document to hold this new comment.
And then when i want to get movies and comments i just grab the document from mongoDb.
My main concern is because of how "new" mongodb is compared to MySQL. In the case where something unexpected happens in Mongo, we have a MySQL backup where we can quickly get the app fallback to mysql and memcached.
On paper it may sound like a good idea, but there are a lot of things you will have to take into account. This will make your application way more complex than you may think. I'll give you some examples.
Two different systems
You'll be dealing with two different systems, each with its own behavior. These different behaviors will make it quite hard to keep everything synchronized.
What will happen when a write in MongoDB fails, but succeeds in MySQL?
Or the other way around, when a column constraint in MySQL is violated, for example?
What if a deadlock occurs in MySQL?
What if your schema changes? One migration is painful, but you'll have to do two migrations.
You'd have to deal with some of these scenarios in your application code. Which brings me to the next point.
Two data access layers
Your application needs to interact with two external systems, so you'll need to write two data access layers.
These layers both have to be tested.
Both have to be maintained.
The rest of your application needs to communicate with both layers.
Abstracting away both layers will introduce another layer, which will further increase complexity.
Chance of cascading failure
Should MongoDB fail, the application will fall back to MySQL and memcached. But at this point memcached will be empty. So each request right after MongoDB fails will hit the database. If you have a high-traffic site, this can easily take down MySQL as well.
Word of advice
Identify all possible ways in which you think 'something unexpected' can happen with MongoDB. Then use the most simple solution for each individual case. For example, if it's data loss you're worried about, use replication. If it's data corruption, use delayed replication.