I am new to Ejabberd, so I am still exploring the possible setups for a chat server.
From the documentation, I see that Ejabberd supports Redis for transient data (user sessions, I suppose).
Has anyone had experience using Redis for transient data and MySQL for the rest of the data? Would this setup be beneficial compared to a Mnesia + MySQL one? Or might Redis + Riak be an even better setup?
Just looking for some general opinions since I am a newcomer in this area...
Full disclosure: I work for Basho, the maintainers of Riak, so I have a clear preference here.
Looking at the source of Ejabberd, I see it's written in Erlang, as Riak is; Erlang is well suited to distributed systems. Their architecture diagram specifically shows Riak as a NoSQL backend. Redis is often paired with Riak because of its simple retrieval and key/value design. If scale is a concern on the transient side as well, you could use Riak's in-memory backend alongside the disk-based backend for durable data (more on backends here).
Riak is designed for scaling, so if you anticipate growth beyond a single server's worth of CPU, memory or storage, then it's perfect. If you do not anticipate this growth, then Riak may be overkill. For more on when to use it, read this.
I have a little bit of a problem concerning the design of a planned application, especially database engine and Serverless/not serverless.
The goal is a web application that talks to the database via a REST API. The REST API itself is really just CRUD operations, so I think the serverless approach (AWS Lambda) would fit pretty well. For that, the most efficient database to choose would probably be DynamoDB (NoSQL).
I am familiar with RDBMSs and have only a little knowledge of NoSQL databases.
The schema of the application is not yet finished and should be extensible later, because there could be new features to implement and so on. Because of this, I would rather use an RDBMS than a NoSQL database, because NoSQL databases don't cope that well with schema changes later on (at least that's what I have read over the last couple of hours).
Choosing, for example, an Amazon RDS MySQL database would be much more expensive, and I don't know how well it works with the serverless approach of the REST API.
So I am at a point where I really don't know which services to use. Could I still use DynamoDB? The schema would probably be very relational.
DynamoDB doesn't have any concept of schema, so the whole question of editing one is kind of unrelated to DynamoDB. If by schema you mean objects with certain properties, then it depends on the use case. If you are OK with having objects in a table that don't share the same "schema", then it is extremely simple, as that is allowed by default. On the other hand, if you need all the objects to share the same set of attributes and you are going to change them frequently, then this is indeed not as easy and straightforward as with RDS.
Next, if you have a relational schema - tables - and are planning to do some JOINs on them then DynamoDB is really not a good solution. DynamoDB is good for a specific type of use cases like storing sessions or something with similar (low) complexity. Writing more complex queries in DynamoDB can get very tedious and painful.
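To make the two points above concrete, here is a minimal sketch in plain Python. The dictionaries only stand in for DynamoDB items, no AWS API is involved, and the table and attribute names (`users`, `orders`, `user_id`) are made up for illustration. It shows items in one table carrying different attributes, and what a JOIN turns into when the application has to do it by hand.

```python
# Plain-Python stand-in for two key-value tables (hypothetical names and data).
# Items in the same table do not have to share the same attributes.
users = {
    "u1": {"user_id": "u1", "name": "Ann"},
    "u2": {"user_id": "u2", "name": "Bob", "newsletter": True},  # extra attribute
}

orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u2", "total": 12},
    {"order_id": "o3", "user_id": "u1", "total": 5},
]


def orders_with_user_names(orders, users):
    """Hand-written equivalent of a SQL JOIN between orders and users.

    In an RDBMS this is a single query; with a key-value store the
    application fetches each side and stitches the rows together itself.
    """
    joined = []
    for order in orders:
        user = users.get(order["user_id"])  # one key lookup per order
        if user is None:
            continue  # inner-join semantics: skip orders without a user
        joined.append({
            "order_id": order["order_id"],
            "name": user["name"],
            "total": order["total"],
        })
    return joined


result = orders_with_user_names(orders, users)
```

With three tables and a couple of filter conditions this kind of code grows quickly, which is the "tedious and painful" part.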
As for price, I wouldn't really say that DynamoDB is that cheap. It seems so at first glance, but if you dig deeper you find that it is actually pretty expensive, mainly from the perspective of writes. You need to provision read and write capacity, and the more throughput you require, the more costly it gets (you can go with auto-scaling for burst traffic, but in the case of consistent traffic this will not help you much). At larger scales, RDS (not Aurora) will cost only a fraction of what DynamoDB will cost you (assuming we are talking about a use case that RDS can handle).
If you are worried about RDS integration with Lambda, the complexity is not much greater than with DynamoDB. There are some considerations to take into account, such as the Lambda execution time hard limit (currently 15 minutes) and the fact that RDS may be slower to respond than DynamoDB, but if your query is taking that long then you are either doing something wrong or misusing those tools.
All in all, if you are comfortable with RDS and don't need the millisecond latency provided by DynamoDB (or even microsecond latency if using DAX as well), then I would definitely go with RDS over DynamoDB in your case. Again, DynamoDB is not a general-purpose solution to every data-related problem, and more often than not I see it being heavily misused for things that can easily be handled by RDS.
Is it a good idea to use Redis as a persistent database (with the AOF strategy) to store geodata?
For example, instead of storing all positions of a user in MySQL, I want to use Redis. But I am worried about persistence problems.
Redis persistence is not the same as durability in an ACID database. Trying to make Redis maximally durable (insofar as it can) will limit its performance and lead to large log files. You can relax persistence by various configuration options, but this naturally leads to a compromise on durability.
You should read more about it:
https://redis.io/topics/persistence
http://oldblog.antirez.com/post/redis-persistence-demystified.html
Personally, I would not use Redis as a primary data store for any data that could not be reproduced easily. That would not be using Redis for its strength, in any case.
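The durability trade-off described in this answer can be illustrated with a toy append-only log. This is a minimal sketch and not how Redis implements AOF: the class `AppendOnlyLog`, the `fsync_every` parameter and the key `user:1:pos` are all hypothetical, and the count-based flushing is a simplification of Redis's time-based `appendfsync` choices (`always`, `everysec`, `no`).

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy append-only log; NOT Redis's implementation, just the same idea."""

    def __init__(self, path, fsync_every=1):
        # fsync_every=1 forces data to disk on every write (most durable,
        # slowest); larger values batch writes and risk losing the unsynced
        # tail of the log if the process or machine crashes.
        self.path = path
        self.fsync_every = fsync_every
        self.pending = 0
        self.fsync_calls = 0
        self.file = open(path, "a", encoding="utf-8")

    def set(self, key, value):
        self.file.write(f"SET {key} {value}\n")
        self.pending += 1
        if self.pending >= self.fsync_every:
            self.file.flush()
            os.fsync(self.file.fileno())
            self.fsync_calls += 1
            self.pending = 0

    def close(self):
        self.file.close()


def replay(path):
    """Rebuild the in-memory state by re-reading the log from the start."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            _, key, value = line.rstrip("\n").split(" ", 2)
            state[key] = value
    return state


def run(fsync_every, writes=100):
    """Write the same key many times; return (fsync count, log size, state)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "appendonly.log")
        log = AppendOnlyLog(path, fsync_every=fsync_every)
        for i in range(writes):
            log.set("user:1:pos", f"{i},{i}")  # one key, rewritten each time
        log.close()
        return log.fsync_calls, os.path.getsize(path), replay(path)


strict = run(fsync_every=1)    # one fsync per write
relaxed = run(fsync_every=50)  # far fewer fsyncs, weaker durability
```

Both runs end with a single key in memory, yet the log holds every intermediate position. That is the "large log files" effect, and it is why a position history that is written constantly is an awkward fit for this style of persistence.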
OpenStack Nova is currently using MySQL (powered by SQLAlchemy) as its db backend. What would be the pros and cons of switching to Cassandra?
OpenStack uses MySQL as a backend for persisting the service schema and the state of various artifacts (nodes, roles, networks, security groups, etc.). The transactional load on the persistence store is not very intensive, so NoSQL is a good option in general. Here are some pros and cons:
PROS:
persistence store high availability out of the box
live horizontal scalability
better multi-tenancy, given the large schematic scope and scalability of Cassandra
enablement for analytics: sitting on a NoSQL store makes it more straightforward to introduce analytics functionality within OpenStack
CONS:
code redesign: OpenStack's code is centered on the relational database model. Migrating to NoSQL would require a substantial redesign of all OpenStack projects, as well as the introduction of an indexing model within Cassandra to relate data. Changes like this take time, thought, and a period of stabilization
more complex administration/maintenance than MySQL
potential for data conflicts: Cassandra has an eventually consistent model, although, given that OpenStack's transactional use is not very concurrent, this should not be much of a problem at first sight
performance, although again, as OpenStack is not really "transactional" and has its own performance issues (Python-based code and services), this should not be much of a problem either.
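The "potential for data conflicts" point can be sketched as follows. Cassandra reconciles concurrent writes to the same cell by timestamp (last write wins); the code below is only a plain-Python illustration of that idea, with hypothetical instance names, states and timestamps, and it ignores details such as tie-breaking.

```python
def merge_last_write_wins(replica_a, replica_b):
    """Merge two replicas, keeping the newest value for each key.

    Each replica maps key -> (timestamp, value). The older write is
    dropped silently, which is where lost updates can come from.
    """
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Two services update the same record on different replicas (made-up data).
replica_a = {"instance-42": (100, "ACTIVE")}
replica_b = {"instance-42": (105, "SHUTOFF"), "instance-7": (90, "BUILD")}

merged = merge_last_write_wins(replica_a, replica_b)
```

The write made at timestamp 100 disappears without any error being raised. With MySQL, the same situation would normally be handled by a transaction or a row lock, and that is the behaviour the existing code relies on.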
I'm having trouble getting a clear understanding of what MySQL 5.6 is introducing w/r/t memcache.
As I understand it, memcache by itself is essentially a huge, shared, memory-resident hash table that is managed by a server, memcached. In particular, it knows nothing about a persistent data store, and offers no services in that regard. It simply knows about keys and values (like a Perl hash).
What I think MySQL 5.6 introduces is a NoSQL API, whereby MySQL clients can request data from the MySQL server by key, rather than by a SELECT statement. (And similarly, they can perform updates with key=value pairs.) MySQL uses memcached to cache these in memory as a performance boost, but also takes care of things like writing updates back to the database before they age out of the cache, etc.
In other words, the use of memcached is an implementation detail of the MySQL 5.6 NoSQL feature, and is not something the application programmer needs to be aware of.
I'd welcome any corrections or amplification to my understanding.
Thanks,
Chap
I think it's quite simple (from the official documentation):
I disagree with your last sentence. The application programmer has to be very much aware of the memcached plugin, because having it on board the MySQL server means that they can decide (or may be forced) to access data through a memcached language interface or through the SQL interface.
To better understand the impact of this plugin on an application's design, you should know that MySQL uses three configuration tables to manage memcached; understanding how "cache_policies" works will shed some light on some of your doubts:
Table cache_policies specifies whether to use InnoDB as the data store of memcached (innodb_only), or to use the traditional memcached engine as the backstore (cache-only), or both (caching). In the last case, if memcached cannot find a key in memory, it searches for the value in an InnoDB table.
here is the link: innodb-memcached-internals
The quote above means that, depending on what you decide for a specific key-value pair, you will have different application scenarios:
innodb_only -> you can query the data via the SQL interface or via a memcached interface; here is a link to some memcached language interface examples: memcached-interfaces
cache-only -> you should query the data via the memcached interface only
caching -> you can use both interfaces (note that the storage mechanism changes slightly)
Of course, this configuration decision depends strictly on your specific needs.
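The lookup behaviour described in the quote can be sketched in a few lines. This is only an illustration in plain Python, not the plugin's code: the two dictionaries stand in for memcached's memory and for the InnoDB table, the function name `get` is made up, and the policy strings follow the quote (with `cache_only` written with an underscore here, which is an assumption about the exact spelling).

```python
memory_cache = {}                     # stand-in for memcached's in-memory store
innodb_table = {"k1": "from-innodb"}  # stand-in for the backing InnoDB table


def get(key, policy):
    """Return the value for key under the given (assumed) cache policy."""
    if policy == "innodb_only":
        return innodb_table.get(key)      # memory is never consulted
    if policy == "cache_only":
        return memory_cache.get(key)      # InnoDB is never consulted
    if policy == "caching":
        if key in memory_cache:
            return memory_cache[key]
        return innodb_table.get(key)      # fall back to InnoDB on a miss
    raise ValueError(f"unknown policy: {policy}")


innodb_only_value = get("k1", "innodb_only")
cache_only_value = get("k1", "cache_only")
caching_value = get("k1", "caching")
```

Under `cache_only` the key is simply not found, even though a row for it exists in the table, so SQL clients and memcached clients would see different data. That is why the choice matters to the application programmer.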
I don't really have a complete answer for you I'm afraid, as I too am struggling to find the detail I require before toying around with it.
That said however there is one important point which I have managed to uncover that you seem to have missed, namely that by accessing the InnoDB storage engine via the new plugin you are actually completely bypassing SQL and avoiding all the overhead that comes with it.
This of course makes it essentially a key/value store, more akin to most NoSQL databases, complete with all the drawbacks associated with them (no joins, etc.).
However, on the flip side, for many applications these days this is exactly what we want. I have come across only a handful of real-world performance reports, but all seem to point to this implementation significantly outperforming MongoDB and similar NoSQL solutions (how much truth is in that I do not know). One relatively in-depth comparison even claims as high as 700k qps on a commodity server (compared with around 100k on a well-tuned MySQL setup), which is incredible if true.
Resource here:
http://yoshinorimatsunobu.blogspot.co.uk/search/label/handlersocket
Anyway, sorry I can't be of any more help, but it's food for thought at least!
I come from the cliché land of PHP and MySQL on Dreamhost. BUT! I am also a JavaScript genie and I've been dying to get on the Node.js train. In my reading I inadvertently discovered a NoSQL solution called Redis!
With my shared web host and limited server experience (I know how to install Linux on one of my old Dells and do some basic server admin), how can I get started using Redis and Node.js? And the next best question is: what does one even use Redis for? In what situations would Redis be better suited than MySQL? And does Node.js remove the need for Apache? If so, why do developers recommend using the NGINX server?
Lots of questions, but there doesn't seem to be a solid source out there with all this info in one place!
Thanks again for your guidance and feedback!
NoSQL is just an inadequate buzz word.
I'll attempt to answer the latter part of the question.
Redis is a key-value store database system. Speed is its primary objective, so most of its use comes from event driven implementations (as it goes over in its reddit tutorial).
It excels at areas like logging, message transactions, and other reactive processes.
Node.js, on the other hand, is mainly for independent HTTP transactions. It is basically used to serve content very fast (much like a web server, though Node.js wouldn't necessarily be public facing), which makes it useful for backend business logic applications.
For example, you could have a C program calculate stock values and have Node.js serve the results for another internal application to retrieve, or use Node.js to serve a web page you are developing so your coworkers can view it internally.
It really excels as a middleman between applications.
Redis
Redis is an in-memory datastore: all your data is stored in memory, meaning that a huge database means huge memory usage, but with really fast access and lookup.
It is also a key-value store: you don't have any relationships, or queries to retrieve your data. You can only set a key-value pair and retrieve it by its key. (Redis also provides useful types such as sets and hashes.)
These characteristics make Redis really well suited for storing sessions in a web application, creating indexes on a database, and handling real-time data like analytics.
So if you need something that will "replace" MySQL for storing your basic application models, I suggest you try something like MongoDB, Riak or CouchDB, which are document stores.
Document stores manage your data as something analogous to JSON objects (I know it's a huge shortcut).
Read this article if you want to know more about popular nosql databases.
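The session-storage use case mentioned above boils down to "set a value under a key with an expiry, get it back by key". With Redis that would be a `SET` with an `EX` expiry followed by a `GET`; the sketch below mimics the behaviour in plain Python so it runs without a server. The class `SessionStore`, the session id and the payload are all hypothetical.

```python
import time


class SessionStore:
    """Tiny in-memory key-value store with per-key expiry (toy stand-in for Redis)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (expires_at, value)
        self._clock = clock  # injectable so expiry can be shown without sleeping

    def set(self, key, value, ttl_seconds):
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired sessions on access
            return None
        return value


# Usage with a fake clock (made-up session id and payload).
now = [0.0]
store = SessionStore(clock=lambda: now[0])
store.set("session:abc123", {"user_id": 7}, ttl_seconds=1800)
before = store.get("session:abc123")  # still valid
now[0] = 1801.0                       # just over 30 minutes later
after = store.get("session:abc123")   # expired, so nothing comes back
```

There is no way to ask "which sessions belong to user 7?" without keeping a second structure for that, which is the sense in which a key-value store has no queries.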
Node.js
Node.js provides asynchronous I/O for the V8 JavaScript engine.
When you run a Node server, it listens on a port on your machine (e.g. 3000). It does not do any sort of domain name resolution or virtual host handling, so you have to put an HTTP server such as Apache or nginx in front of it as a proxy.
Choosing nginx in production is a matter of performance, and I find it easier to use. But I suggest you use the one you're most comfortable with.
To get started, just install both and start playing with them. See HowToNode.
You can get a free plan from https://redistogo.com/ - it is a hosted Redis database instance.
A quick intro to Redis data types and basic commands is available here - http://redis.io/topics/data-types-intro.
A good comparison of when to use what is here - http://playbook.thoughtbot.com/choosing-platforms/databases/