Good scalable fault-tolerant in-memory database with LINQ support for .NET

Are there any good in-memory transactional databases that support LINQ and SQL Server persistence? I'd like to create a full representation of a large data store in memory and have it commit to a SQL Server database in a lazy fashion, but still keep some level of fault tolerance by scaling it out horizontally. I don't want to rely on non-relational data stores like CouchDB.

SQLite supports in-memory databases, has transaction support, and has a LINQ provider as well.
As for the SQL Server persistence, I think you would be on your own to code up something that does lazy transfers to a backing SQL Server database.
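A minimal sketch of that pattern in Python's sqlite3 (the same idea applies from .NET via a SQLite provider): work entirely in an in-memory database and lazily snapshot it to a file with the backup API. The table, the snapshot path and the flush trigger are all illustrative; pushing the data onward into SQL Server would still be custom code, as noted above.

```python
import sqlite3, os, tempfile

# In-memory working database: all reads and writes happen at RAM speed.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
mem.execute("INSERT INTO orders (total) VALUES (?)", (19.99,))
mem.commit()

# Lazy persistence: snapshot the whole in-memory DB to a file on disk.
# A real system would run this on a timer or after every N commits;
# here it is triggered once by hand.
snapshot_path = os.path.join(tempfile.mkdtemp(), "orders-snapshot.db")
disk = sqlite3.connect(snapshot_path)
mem.backup(disk)  # copies every page of the in-memory database
disk.close()
```

The snapshot is a complete, consistent copy, so a loader process could pick it up and replay it into SQL Server without blocking the in-memory writers.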

SQL Server with an ORM tool and possibly a caching solution.

Related

Migrating from SQL Server to AWS Aurora

My organization is considering moving our current SQL Server RDS instance to an AWS Aurora instance. Our motivation is solely to cut down on costs. I have run some successful tests using the MySQL Workbench Database Migration tool to move the SQL Server DB to an Aurora instance. The database is about 4 GB, with about 100 tables and about a dozen views and stored procedures. I am already using a MySQL copy of the database for development on a local machine, so all SQL syntax differences are already handled.
Are there any serious downsides to this migration project? Anything we should consider before making the switch?
This is a really serious step. You should consider a few key things while migrating:
Performance: for simple requests Aurora (an improved MySQL) can be faster than MS SQL, but:
a) MS SQL has a smarter query analyzer, and to get good Aurora (MySQL) performance you need to understand how it works very well. SQL Server can "forgive" a lot of developer mistakes.
b) For such big databases, SQL Server Enterprise edition has a few great features: partitioning, data compression, online index rebuilds, Always On availability groups. But Enterprise edition can be quite expensive. That's true.
Development: MySQL (Aurora) syntax is quite limited compared to SQL Server. MySQL doesn't support many things, such as HierarchyID, recursive CTEs, included columns in indexes, filtered indexes, change tracking, XML, JSON and so on. It even limits the syntax for nested queries considerably. Many things you will need to implement yourself. Also, SQL Server has more professional development tools, such as SQL Server Management Studio, SQL Profiler, Database Engine Tuning Advisor and so on.
Different implementations: Some things in Aurora work differently. For instance, a unique index in Aurora allows many NULL values, but SQL Server doesn't allow this. And so on.
Price: For good performance you currently need to rent at least a large Aurora instance, which costs about $210 p/m + a replica = about $420 - not so cheap. See Amazon Aurora Pricing.
So, I'd recommend weighing all the pros and cons before migrating, because you can reduce resource costs but spend additional time and effort (and so money) on development and maintenance.
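The unique-index difference under "Different implementations" is easy to demonstrate. SQLite happens to share the MySQL/Aurora semantics here (NULLs count as distinct, so a unique column admits many of them), which makes it a convenient stand-in for a quick check; a plain unique index in SQL Server would reject the second NULL. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Two rows with a NULL email: accepted by MySQL/Aurora (and SQLite, shown
# here), but a plain unique index in SQL Server would reject the second one.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")

count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]
print(count)  # 2
```

Any application logic that relied on SQL Server enforcing at most one NULL per unique column needs to be re-checked after such a migration.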

Cassandra + Spark vs MySQL + Spark

I have to design a piece of software on a three layer architecture:
A process periodically polling a data source, such as an FTP server, to ingest into a database
A database
Spark for the processing of the data
My data is simple and perfectly suitable for being stored in a single RDBMS table, or I can store it in Cassandra; periodically, I would need Spark to run some machine-learning algorithms on the whole set of data.
Which database better suits my use case? In detail, I do not need to scale to multiple nodes, and I think the main underlying questions are:
Is simple querying (SELECT) faster on Cassandra or MySQL for a simple table?
Does the Spark connector for Cassandra benefit from features that make it faster than a plain SQL connector?
You can use MySQL if your data size is less than 2 TB. SELECTs on a MySQL table will be more flexible than in Cassandra.
You should use Cassandra when your data storage requirement exceeds what a single machine can hold. Cassandra needs careful data modeling for each lookup or SELECT scenario.
You can use the approach suggested below for MySQL-Spark integration:
How to work with MySQL and Apache Spark?
It all depends on the data: size, integrity, scale, flexible schema, sharding, etc.
Use MySQL if:
Data size is small (single-digit TBs)
Strong consistency (Atomicity, Consistency, Isolation & Durability) is required
Use Cassandra if:
Data size is huge and horizontal scalability is required
Eventual consistency is acceptable (BASE: Basically Available, Soft state, Eventually consistent)
A flexible schema is needed
The application is distributed.
Have a look at this benchmarking article and this PDF.
I think it's better to use a SQL database such as MySQL; Cassandra should only be used if you need to scale your data to much bigger proportions and across many datacenters. The Java Cassandra JDBC driver is just a normal driver to connect to Cassandra; it doesn't have any special advantages over other database drivers.

Query an MS-SQL server from MySQL server

I work for a large organization that has an established and well populated MS-SQL server. However, I am not a Microsoft user, and my database of choice is MySQL. I am looking for a solution that will allow me to either...
-Directly query our MS-SQL server from my MySQL server
and/or
-Set up some sort of job that will copy data systematically from the MS-SQL server to our MySQL server.
It looks like Linked Servers may be part of the solution; however, everything I have found describes scenarios where MS-SQL is accessing MySQL, not the other way around.
To be clear I want my MySQL server to talk to/query/pull data from my MS-SQL server.
Any help appreciated!
As far as I'm aware, you can't query any other RDBMS vendor from MySQL. MySQL's remote access feature is FEDERATED tables, which only work with other MySQL databases as far as I know.
About the simplest way you could do this would be to use SQL Server's Import/Export Wizard to create a simple package that copies the data to your MySQL server through an ODBC or ADO.NET connection to the MySQL database.
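If you'd rather script the copy job than use the wizard, the whole thing fits in a small function written against the PEP 249 (DB-API) interface. The drivers named in the docstring are assumptions; the runnable demo below uses two SQLite connections as stand-ins for the MS-SQL source and the MySQL destination.

```python
import sqlite3

def copy_table(src, dst, table, columns):
    """Copy every row of `table` from one PEP 249 connection to another.

    The same function would work with pyodbc for the MS-SQL source and
    mysql-connector-python for the MySQL destination (both hypothetical
    here; the demo below uses two SQLite connections as stand-ins).
    """
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)  # MySQL drivers use %s instead
    cur = src.cursor()
    cur.execute(f"SELECT {cols} FROM {table}")
    rows = cur.fetchall()
    out = dst.cursor()
    out.executemany(f"INSERT INTO {table} ({cols}) VALUES ({marks})", rows)
    dst.commit()
    return len(rows)

# Demo: SQLite stands in for both servers.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE staff (id INTEGER, name TEXT)")
src.executemany("INSERT INTO staff VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
src.commit()

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE staff (id INTEGER, name TEXT)")
copied = copy_table(src, dst, "staff", ["id", "name"])
print(copied)  # 2
```

Run on a scheduler (cron, Task Scheduler), this gives you the "systematic copy" the question asks for without any linked-server setup.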
To be clear I want my MySQL server to talk to/query/pull
data from my MS-SQL server.
I think it is hard to even assume this is the best decision. Without a TON more context about what the real problem and/or the real "need" is, answers vary widely, from "just use MS-SQL" to various levels of ad-hoc ETL. That said, some abstract feedback:
There is nothing wrong with MS-SQL, as long as you are (a) not paying for it and (b) have a clean solution to use it from a real POSIX based system. Technically, MS-SQL is a great database, I just dislike Windows. To that end, I made sure that working with MS-SQL from Ruby was done well at both the C extension layer with TinyTDS and the ActiveRecord adapter.
Sadly, I have personally stopped maintaining the latter, but the C extensions are strong and even used by great projects like Sequel, which, if you had to do some sort of raw ETL without the overhead of ActiveRecord, is a great choice since it has adapters for all DBs, TinyTDS included.

Should I use SQLite instead of MySQL?

I need to improve a PHP-MySQL web application, which only uses MySQL for REPL operations (and some search functions). 99% of the applications that I worked with never used advanced MySQL features, like replication, cross-table constraints, locking etc.
To my understanding I should instead use SQLite.
Are there any practical benefits if I do this?
Will I see a significant (>100ms) speed boost?
Should I expect problems with tables with more than 1,000,000 rows?
There is no catch-all answer to that, but there is a main point to consider: a very good rule of thumb is that the higher your degree of concurrency, the more you'll profit from MySQL, and vice versa.
This means that in a scenario where database requests are never concurrent, you might see a speedup by using SQLite, though I doubt it would be on the order of 100ms.
The reason behind this is (very roughly):
In a database server environment, such as MySQL, PostgreSQL, MS SQL, Oracle and friends, a dedicated process (or a group of processes) exclusively touch the database files - the important part being dedicated. This means, that concurrency issues can be resolved in-process.
In a file-based database, such as SQLite, MS Access (Jet engine) and friends, multiple processes will touch the DB files without knowing of each other - this implies that concurrency issues have to be resolved by writing them to the DB or helper file(s). This is typically much slower and less robust. In exchange, the overhead of communication between the database client (the web app) and the database engine (which runs in-process) is nonexistent.
Edit
After a comment I want to make it clearer that I am talking about concurrent writes, not concurrent reads. Concurrent reads of an unchanging dataset are not a hard problem - they don't need any locking at all.
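The write-serialization point can be observed directly with Python's sqlite3 module: while one connection holds SQLite's write lock, a second writer fails immediately. (timeout=0 disables the default busy-wait; both connections use autocommit so the transaction boundaries are explicit. The file path and table are illustrative.)

```python
import sqlite3, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit mode
writer.execute("CREATE TABLE hits (n INTEGER)")

# First connection takes the write lock and deliberately holds it open.
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO hits VALUES (1)")

# Second connection tries to write concurrently. With timeout=0 it fails
# at once instead of busy-waiting: SQLite serializes writers at file level.
other = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    other.execute("INSERT INTO hits VALUES (2)")
    locked = False
except sqlite3.OperationalError:  # "database is locked"
    locked = True
print(locked)  # True

writer.execute("ROLLBACK")  # release the lock; `other` can write again
```

A server-based engine like MySQL would instead queue or interleave the two writes inside its own process, which is exactly the trade-off described above.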
The principal advantage of SQLite is that it is a file-based relational database that uses SQL as its query language. Being file-based tremendously simplifies deployment, making it very good for the case where an application needs a little database but must be run in an environment where having a database server would be problematic. (For example, many browsers use SQLite to manage their cookie stores; using a database server for that problem would be verging on the insane in many ways.)
The principal advantage of MySQL (with a sane table type) is that it is a database server that uses SQL as its query language. Being server-based allows for many features that a file-based system can't handle simply (such as replication) but does make things quite a bit more complex to deploy.
Whether the benefits of the additional complexity of a database server (e.g., MySQL) outweigh the costs (relative to a file-based database engine like SQLite) depends on a great many factors, notably including how many installations are expected and who is expected to perform those installations.

Converting Mysql to No sql databases

I have a production database server running MySQL 5.1. We now need to build a reporting app that will fetch data from the production database server; since reporting queries across the entire database may slow it down, we are planning to switch to NoSQL. The whole system runs on the AWS stack, and we plan to use DynamoDB. Kindly suggest ways to sync data from the production MySQL server to the NoSQL database server.
Just remember the simple fact that any NoSQL database is essentially a document database; it's really difficult to automatically convert a typical relational database in MySQL to a good document design.
In NoSQL you have a single collection of documents, and each document will probably contain data that would be in related rows in multiple tables. The advantage of a NoSQL redesign is that most data access is simpler and faster without requiring you to write complex join statements.
If you automatically convert each MySQL table to a corresponding NoSQL collection, you really won't be taking advantage of a NoSQL DB. This is because you'll end up loading many more documents, and thus making many more calls to the database than needed, losing the simplicity and speed of a NoSQL DB.
Perhaps a better approach is to look at how your applications use the MySQL database and go from there. You might then consider writing a simple utility script, since you know your MySQL database design well.
As the data in a NoSQL database like MongoDB, Riak or CouchDB has a very different structure than in a relational database like MySQL, the only way to migrate/synchronise the data would be to write a job that copies the data from MySQL to the NoSQL database using SELECT queries, as stated on the MongoDB website:
Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
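A sketch of that SELECT-and-load job in Python, using SQLite as a stand-in for the MySQL source and plain dicts for the documents. The customer/order schema and the pymongo call in the final comment are all illustrative assumptions:

```python
import sqlite3

# Hypothetical relational source: customers and their orders in two tables.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5);
""")

# Build one document per customer, with the related order rows nested
# inside - the denormalized shape a document store expects.
docs = []
for cid, name in src.execute("SELECT id, name FROM customers"):
    orders = [
        {"id": oid, "total": total}
        for oid, total in src.execute(
            "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id",
            (cid,))
    ]
    docs.append({"_id": cid, "name": name, "orders": orders})

# With pymongo (not imported here) the load step would be roughly:
#   MongoClient().reporting.customers.insert_many(docs)
print(docs[0]["name"], len(docs[0]["orders"]))  # Ada 2
```

Note that the joining happens once, at migration time, which is the whole point of the document redesign discussed above.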
Depending on the quantity of your data, this could take a while to process.
If you have any other questions, don't hesitate to ask.