What's the best approach to showcase the benefits of a SQL/NoSQL hybrid environment?

I was asked to create a small environment showcasing the benefits of using a NoSQL/SQL hybrid over an SQL-only database. Since my background is mostly admin/DevOps, I have basic knowledge of databases, but I've never done something like this.
I thought of creating one VM hosting a MySQL or PostgreSQL instance and populating it with Sakila or another free sample database as a starting point, and a second VM with Mongo/Redis, but I don't know what to do from this point.
How can I integrate those databases? How can I run tests, and what should I test - query response times? Is this even a good strategy?
Any help would be appreciated.

Try making two data models for a Twitter-style data feed: one in relational SQL tables and another as a single JSON document.
Use MySQL as the relational DB and MongoDB as the NoSQL DB.
The main performance difference will show up when you compare SQL queries that join tables in MySQL against MongoDB queries where no joins are needed.
This is just one idea; there are many other advantages.
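For a concrete feel of the difference, here is a minimal sketch of the two models in Python. The field names are invented, sqlite3 stands in for MySQL so the snippet is self-contained, and pymongo is assumed to have a local MongoDB to talk to:

    import sqlite3                    # stand-in for the relational side; the schema would look the same in MySQL
    from pymongo import MongoClient   # assumes a MongoDB instance on localhost:27017

    # Relational model: one feed item is spread over several tables
    rel = sqlite3.connect(":memory:")
    rel.executescript("""
        CREATE TABLE users    (id INTEGER PRIMARY KEY, handle TEXT);
        CREATE TABLE tweets   (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
        CREATE TABLE hashtags (tweet_id INTEGER, tag TEXT);
    """)
    rel.execute("INSERT INTO users VALUES (1, 'alice')")
    rel.execute("INSERT INTO tweets VALUES (10, 1, 'hello world')")
    rel.execute("INSERT INTO hashtags VALUES (10, 'intro')")

    # Rebuilding the feed item requires two joins
    print(rel.execute("""
        SELECT u.handle, t.body, h.tag
        FROM tweets t
        JOIN users u ON u.id = t.user_id
        LEFT JOIN hashtags h ON h.tweet_id = t.id
    """).fetchall())

    # Document model: the same feed item is one self-contained JSON document
    tweets = MongoClient("localhost", 27017).demo.tweets
    tweets.insert_one({
        "user": {"handle": "alice"},
        "body": "hello world",
        "hashtags": ["intro"],
    })
    print(tweets.find_one({"user.handle": "alice"}))   # no joins needed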

Related

Can you practice MSSQL queries on MySQL?

I am trying to practice writing simple SQL queries but I can't connect to my school account on Microsoft SQL Server Studio because they delete your database once you finish the class. I downloaded MySQL but I wasn't sure if I could practice queries on it or not. Any answers would be great, thanks!
The syntax and built-in functions are not identical, but they are similar for many things. For example, row limiting is one place they diverge: SQL Server uses TOP, MySQL uses LIMIT, and Oracle uses ROWNUM.
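For instance, fetching just the first five rows of a (hypothetical) table is spelled differently in each dialect; a quick Python sketch of the three variants:

    # The same "first five rows" query in three dialects (the table name is made up)
    queries = {
        "SQL Server": "SELECT TOP 5 * FROM customers",
        "MySQL":      "SELECT * FROM customers LIMIT 5",
        "Oracle":     "SELECT * FROM customers WHERE ROWNUM <= 5",
    }
    for dialect, sql in queries.items():
        print(f"{dialect:<10} {sql}")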
Personally, I'd recommend taking a look at what w3schools.com has for SQL (and MySQL) resources. They have a ton of info to get you going: tutorials, references, etc. They also have "Try It Yourself" modules where you can practice queries against a playground database they provide, in whichever dialect you're reading about. In cases where the SQL Server and MySQL (and other dialects') syntax differs, w3schools shows examples of each that they support, like the example I mentioned at the top.
Snippet from the top of their SQL tutorial home page:
“Our SQL tutorial will teach you how to use SQL in: MySQL, SQL Server, MS Access, Oracle, Sybase, Informix, Postgres, and other database systems.”
I suggest SQLite. Why?
It is an embedded database, rather than a client-server system like MySQL that manages a whole data directory; SQLite stores everything in a single file. My reasons for starting with SQLite are as follows:
1) It is the default database for a lot of common applications (Django, Airflow, etc.), so knowing it comes in really handy when learning those tools.
2) Not only is the download much simpler, the tooling is much lighter and faster. Your complete database is also just a single file (very beginner friendly).
3) In-memory databases. That's right: you can spawn and drop SQLite databases entirely in memory (or delete the whole file with your OS's file removal). Very useful for learning, data science, and on-the-fly OLAP.
4) It can store up to about 140 TB of data. It is the perfect tool for loading a CSV and quickly analyzing the data. Also, you can create a small database, compress the file, and send it to anyone: sharing your whole database is really just sharing a file.
5) You can import sqlite3 into Python (or link SQLite from C or C++) and start automating your queries; see the sketch right after this list. You can do this with MySQL too, but there is more library downloading and reading to do. Do not use SQLite in production (multi-threaded write limits), but it is great for ad hoc analysis (Jupyter notebooks), prototyping, and learning.
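A minimal sketch of points 3-5 using only the Python standard library (the CSV file and its columns are invented for the example):

    import csv
    import sqlite3

    # Point 3: a throwaway in-memory database -- nothing touches disk
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, amount REAL)")

    # Point 4: load a CSV and analyse it with plain SQL
    with open("sales.csv", newline="") as f:   # hypothetical file with a region,amount header
        rows = [(r["region"], float(r["amount"])) for r in csv.DictReader(f)]
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Point 5: automate queries straight from Python
    query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
    for region, total in con.execute(query):
        print(region, total)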
Overall, MySQL is not the best tool for beginners (and arguably it is not even used in production as much as Postgres or SQL Server). It abstracts too much away from the user to really understand what the database represents or how the query engine works. Also, SQLite stays closer to standard ANSI SQL than MySQL does, in my opinion, given all of MySQL's syntactic sugar. Learn SQLite, move to Postgres, and then explore all the NoSQL, blockchain, etc. options. If you ever face MySQL you'll pick it up in minutes. I guarantee you will have a much easier time picking up SQLite!

Indexing for data copy on production servers

We have a PostgreSQL database that is used in production and a MySQL database for reporting that copies data from the production server. My issue is that the reporting database is denormalized, so it is not a simple replica of the production DB schema; the data from production is copied onto the reporting server using complex queries that transform it into the desired format.
My question is about industry-standard guidelines for a practice like this. I would like to create indexes on the production server designed to make the complex copy queries faster, but I would also like some quantifiable method of measuring the pros and cons of creating such indexes on the production server with respect to their effect on insert/update performance.
If anybody can provide some input based on experience, or point me toward some write-ups on this, I would be grateful. I have tried Google but haven't found anything that addresses an issue of this kind.
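To make it concrete, the kind of quantifiable check I have in mind would look something like the following sketch: time a batch of writes on a disposable copy of the schema before and after adding a candidate index (the table, column, and index names are made up, and psycopg2 is just one way to drive it):

    import time
    import psycopg2   # assumes a throwaway copy of the production schema, not the live server

    conn = psycopg2.connect("dbname=prod_copy user=test")
    conn.autocommit = True
    cur = conn.cursor()

    def time_inserts(n=10_000):
        """Return the wall-clock time taken to insert n rows."""
        start = time.perf_counter()
        for i in range(n):
            cur.execute("INSERT INTO orders (customer_id, total) VALUES (%s, %s)", (i, i * 1.5))
        return time.perf_counter() - start

    baseline = time_inserts()
    cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    with_index = time_inserts()
    print(f"insert time without index: {baseline:.2f}s, with the candidate index: {with_index:.2f}s")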
Any input would be much appreciated.
Thanks in advance.

Performing a join across multiple heterogeneous databases, e.g. PostgreSQL and MySQL

There's a project I'm working on, kind of a distributed database thing.
I started by creating the conceptual schema, and I've partitioned the tables such that I may need to perform joins between tables in MySQL and PostgreSQL.
I know I can write some sort of middleware that will break down the SQL queries and issue sub-queries targeting the individual DBs, then merge the results, but I'd like to do this using SQL if possible.
My search so far has yielded this (the Federated storage engine for MySQL), but it seems to work only between MySQL databases.
If it's possible, I'd appreciate some pointers on what to look at, preferably in Python.
Thanks.
It might take some time to set up, but PrestoDB is a valid open-source solution to consider.
See https://prestodb.io/
You connect to Presto with JDBC and send it the SQL; it works out the different connections, dispatches sub-queries to the different sources, then does the final work on the Presto node before returning the result.
From the Postgres side, you can try using a foreign data wrapper such as mysql_fdw. Queries with joins can then be run through various Postgres clients, such as psql, pgAdmin, psycopg2 (for Python), etc.
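A rough sketch of the foreign-data-wrapper route, driven from Python with psycopg2. The server, credential, table, and column names are placeholders, and the exact OPTIONS keys depend on the mysql_fdw version installed:

    import psycopg2

    conn = psycopg2.connect("dbname=appdb user=postgres")
    conn.autocommit = True
    cur = conn.cursor()

    # One-time setup: expose a MySQL table to Postgres as a foreign table
    cur.execute("CREATE EXTENSION IF NOT EXISTS mysql_fdw")
    cur.execute("""CREATE SERVER mysql_srv FOREIGN DATA WRAPPER mysql_fdw
                   OPTIONS (host '127.0.0.1', port '3306')""")
    cur.execute("""CREATE USER MAPPING FOR CURRENT_USER SERVER mysql_srv
                   OPTIONS (username 'app', password 'secret')""")
    cur.execute("""CREATE FOREIGN TABLE mysql_orders (id int, customer_id int, total numeric)
                   SERVER mysql_srv OPTIONS (dbname 'shop', table_name 'orders')""")

    # After that, a cross-database join is just SQL inside Postgres
    cur.execute("""SELECT c.name, o.total
                   FROM customers c
                   JOIN mysql_orders o ON o.customer_id = c.id""")
    print(cur.fetchall())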
This is not possible with SQL.
One option is to write your own "middleware", as you hinted at. To do that in Python, you would use the standard DB-API drivers for both databases, write the individual queries, and then merge their results. An ORM like SQLAlchemy will go a long way to help with that.
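A bare-bones version of that middleware idea, using psycopg2 and mysql-connector-python (the connection details and table names are assumptions, and the "join" is just a dictionary lookup in application code):

    import psycopg2           # PostgreSQL DB-API driver
    import mysql.connector    # MySQL DB-API driver (mysql-connector-python)

    pg = psycopg2.connect("dbname=appdb user=app")
    my = mysql.connector.connect(host="127.0.0.1", user="app",
                                 password="secret", database="shop")

    # Sub-query each database separately...
    pg_cur = pg.cursor()
    pg_cur.execute("SELECT id, name FROM customers")
    customers = {cid: name for cid, name in pg_cur.fetchall()}

    my_cur = my.cursor()
    my_cur.execute("SELECT customer_id, total FROM orders")

    # ...then merge the results in Python (a hash join done by hand)
    for customer_id, total in my_cur.fetchall():
        if customer_id in customers:
            print(customers[customer_id], total)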
The other option is to use an integration layer. There are many options out there, though none that I know of are written in Python; Mule ESB, Apache ServiceMix, WSO2, and JBoss MetaMatrix are some of the more popular ones.
You can colocate the data on a single RDBMS node (either PostgreSQL or MySQL for example).
There are two main approaches:
Read-only - you might use read replicas of both source systems, then use a process to copy the data to a new writable converged node; or
Primary - you might choose one of the two databases as the primary and move the data from the other into it using a conversion process (e.g. ETL or off-the-shelf table-level replication).
Then you can just run the query on the one RDBMS with JOINs as usual.
Bonus: you can also do log reading from the RDBMS and ship the logs through Kafka. You can make this as complex as required.

Synchronising data between different databases

I'm looking for a possible solution for the following problem.
First the situation I'm at:
I have 2 databases, one Oracle DB and one MySQL DB. Although they have a lot of similarities, they are not identical: a lot of tables are available in both the Oracle DB and the MySQL DB, but the Oracle tables are often more extensive and contain more columns.
The situation with the databases can't be changed, so I have to deal with that.
Now I'm looking for the following:
I want to synchronise data from Oracle to MySQL and vice versa. This has to be done in real time, or as close to real time as possible: when changes are made in one DB, they have to be synced to the other DB as quickly as possible.
Also, not every table has to be in sync, so the solution must offer a way of selecting which tables have to be synced and which do not.
Because the databases are not identical, I don't think replication is an option. But what is?
I hope you guys can help me with finding a way of doing this or a tool which does exactly what I need. Maybe you know some good papers/articles I can use?
Thanks!
Thanks for the comments.
I did some further research on ETL and EAI.
I found out that I am searching for an ETL tool.
I read your question and your answer. I have worked with Oracle, SQL, ETL, and data warehouses, and here are my suggestions:
It is good to have a ready-made ETL tool, but if your application is big enough that you need a tailor-made ETL process, I suggest a home-made one.
If your transactional database is on Oracle, you can set up triggers on the key tables that in turn invoke an external procedure written in C, C++, or Java.
The reason for using an external procedure is to be able to communicate with both databases at the same time - Oracle and MySQL.
You can read more about Oracle external procedures in the Oracle documentation.
If not through ExtProc, you can develop a separate application in Java or .Net that would extract data from the first database, transform it according to your business rules and load it into your warehouse.
Whichever approach you choose, you will have greater control over the ETL process if you implement your own tool rather than going with a ready-made one.
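If you go the separate-application route, a stripped-down incremental sync pass might look like the sketch below. python-oracledb and mysql-connector-python are one possible driver pair, and the table, columns, and last-sync bookkeeping are all assumptions:

    import oracledb           # python-oracledb, the successor of cx_Oracle
    import mysql.connector

    ora = oracledb.connect(user="app", password="secret", dsn="orahost/ORCLPDB1")
    my = mysql.connector.connect(host="myhost", user="app",
                                 password="secret", database="reporting")

    last_sync = "2024-01-01 00:00:00"   # in practice, persisted between runs

    # Extract: only rows changed since the previous pass
    ora_cur = ora.cursor()
    ora_cur.execute(
        """SELECT id, name FROM customers
           WHERE updated_at > TO_TIMESTAMP(:ts, 'YYYY-MM-DD HH24:MI:SS')""",
        ts=last_sync,
    )

    # Transform/load: upsert into the (narrower) MySQL copy of the table
    my_cur = my.cursor()
    for cid, name in ora_cur:
        my_cur.execute(
            "INSERT INTO customers (id, name) VALUES (%s, %s) "
            "ON DUPLICATE KEY UPDATE name = VALUES(name)",
            (cid, name),
        )
    my.commit()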

Converting MySQL to NoSQL databases

I have a production database server running MySQL 5.1. We now need to build an app for reporting that will fetch data from the production database server. Since reporting queries across the entire database may slow it down, we are planning to switch to NoSQL. The whole system runs on the AWS stack, and we plan to use DynamoDB. Kindly suggest ways to sync data from the production MySQL server to the NoSQL database server.
Just remember that a NoSQL database of this kind is essentially a document database; it's really difficult to automatically convert a typical relational MySQL database into a good document design.
In a document database you have collections of documents, and each document will probably contain data that would sit in related rows across multiple tables. The advantage of a NoSQL redesign is that most data access becomes simpler and faster without requiring you to write complex join statements.
If you mechanically convert each MySQL table to a corresponding NoSQL collection, you won't really be taking advantage of a NoSQL DB: you'll end up loading many more documents, and thus making many more calls to the database than needed, losing the simplicity and speed of the NoSQL design.
Perhaps a better approach is to look at how your applications use the MySQL database and go from there. Knowing your MySQL database design well, you might then consider writing a simple utility script to do the conversion.
Because data in a NoSQL database like MongoDB, Riak, or CouchDB has a very different structure than data in a relational database like MySQL, the only way to migrate/synchronise it is to write a job that copies the data from MySQL to the NoSQL database using SELECT queries, as stated on the MongoDB website:
Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
Depending on the quantity of your data, this could take a while to process.
If you have any other questions, don't hesitate to ask.
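A minimal version of that SELECT-and-load job in Python, assuming mysql-connector-python and pymongo, with made-up table and collection names:

    import mysql.connector
    from pymongo import MongoClient

    my = mysql.connector.connect(host="127.0.0.1", user="app",
                                 password="secret", database="shop")
    orders = MongoClient("localhost", 27017).reporting.orders

    # Pull each order together with its customer, so one document replaces the join
    cur = my.cursor(dictionary=True)
    cur.execute("""SELECT o.id, o.total, c.name AS customer_name, c.email AS customer_email
                   FROM orders o JOIN customers c ON c.id = o.customer_id""")

    docs = [
        {
            "_id": row["id"],
            "total": float(row["total"]),
            "customer": {"name": row["customer_name"], "email": row["customer_email"]},
        }
        for row in cur
    ]
    if docs:
        orders.insert_many(docs)
    print(f"migrated {len(docs)} documents")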