MySQL key-value pair viability

I am new to MySQL and I am looking for some answers to the following questions:
a) Can MySQL community server be leveraged for a key-value pair type database?
b) Which MySQL engine is best suited for a key-value pair type database?
c) Is MySQL Cluster a must for horizontal scaling of a key-value based datastore, or can it be achieved using MySQL replication?
d) Are there any docs or whitepapers for best practices when implementing a key-value datastore on MySQL?
e) Are there any known big implementations other than FriendFeed doing key-value pairs using MySQL?

Any relational database can provide a key-value store, but that's not what they're built for, and they aren't good at it compared to native key-value databases such as Cassandra.
If your requirements aren't extreme, your best bet would be MyISAM, as it's probably the fastest engine and transaction support is not high on the priority list for key-value databases.
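To make the first point concrete, here is a minimal sketch of a key-value store layered on a single relational table. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs without a server; the class name, table name, and column names are all made up for illustration. On MySQL the upsert would more likely be written with `INSERT ... ON DUPLICATE KEY UPDATE` or `REPLACE INTO`.

```python
import sqlite3


class KeyValueTable:
    """Hypothetical key-value wrapper over one two-column table."""

    def __init__(self, conn):
        self.conn = conn
        # The primary key on k provides the index that makes key lookups fast.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def put(self, key, value):
        # Insert the pair, or overwrite the value if the key already exists.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def delete(self, key):
        self.conn.execute("DELETE FROM kv WHERE k = ?", (key,))
        self.conn.commit()
```

Every operation touches exactly one row by primary key, which is why none of the relational machinery (joins, foreign keys, secondary indexes) is doing any work here.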

If you're only doing key-value stuff, you might want to check out HandlerSocket for MySQL. I haven't used it, but this overview shows installation and usage. Basically, it bypasses the relational layer (query parsing, joins, etc.) and talks to MySQL's storage directly, making it extremely fast (but suitable only for key-value storage).

I know I am late to the game here, but MySQL 5.6 has some changes that allow it to act as a key-value store with memcached built in. It looks really slick:
NoSQL Interface via memcached

Related

PostgreSQL VS MySQL while dealing with GeoDjango in Django

There are multiple tutorials and questions across the Internet, YouTube, and Stack Overflow on finding nearby businesses given a location, for example (question on Stack Overflow):
Returning nearby locations in Django
But one thing they all have in common is that they prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library.
I am building a project as follows:
Here a user can register as a customer as well as a business (the customer's and the business's name, address, and all other fields will be separate, even if it's the same user)
This is not the actual database design, only a rough idea of the project
Both customer and business will have their locations stored
Customers can find nearby businesses around them
I was wondering what the specific advantages of using PostgreSQL over MySQL are in the context of computing and fetching location-related fields.
(MySQL has been a well-tested database for years, and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server.)
Would there be any processing disadvantages in the algorithms used to compute nearby businesses if I choose to go with MySQL? How would it make my system slow?
"But one thing they all have in common is that they prefer PostgreSQL (instead of MySQL) for Django's GeoDjango library."
The reason why they suggest using Postgres is that it has better support for spatial data. It's not that MySQL doesn't support spatial data. However, there is a long list of features which Postgres supports and MySQL doesn't. You can look at this page for details. Almost every time MySQL is mentioned on that page, it is to describe a feature that it does not support, but that Postgres does.
"(MySQL has been a well-tested database for years, and most of my data is relational, so I was planning to use MySQL or Microsoft SQL Server.)"
Note that foreign key constraints are not supported by MyISAM, which was the only MySQL storage engine with spatial indexes before MySQL 5.7 (InnoDB gained spatial index support in 5.7). So on older MySQL versions, you need to choose between referential integrity and fast spatial lookups.
If you use Postgres, you can have both referential integrity and fast spatial lookups. Postgres is also a quite mature and widely used relational database these days.
"Would there be any processing disadvantages in the algorithms used to compute nearby businesses if I choose to go with MySQL? How would it make my system slow?"
It really depends on how many businesses you're searching through. If you pick an engine that does not support spatial indexes, MySQL is forced to do a full table scan, which takes O(N) time. On the other hand, it can do bounding box comparisons to eliminate many geometries quite quickly. I have seen acceptable interactive performance for 100k points, with performance dropping off after that. In contrast, Postgres with a spatial index stays fast as the number of points grows.
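The "full scan with bounding box comparisons" idea can be illustrated outside the database. The following is only a sketch in plain Python, not what MySQL does internally: the function names, the `(name, lat, lon)` tuple layout, and the spherical-Earth radius are all assumptions made for the example. It rejects most points with a cheap latitude/longitude box test and runs the more expensive great-circle (haversine) distance only on the survivors.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def box_half_widths(lat, radius_km):
    """Half-widths (degrees) of a lat/lon box that contains the search circle."""
    delta = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(delta)
    cos_lat = math.cos(math.radians(lat))
    # Near the poles (or for huge radii) the circle spans every longitude.
    if delta >= math.pi / 2 or math.sin(delta) >= cos_lat:
        return dlat, 180.0
    return dlat, math.degrees(math.asin(math.sin(delta) / cos_lat))


def find_nearby(points, lat, lon, radius_km):
    """Scan all points (O(N)), using the box as a cheap first filter."""
    dlat, dlon = box_half_widths(lat, radius_km)
    hits = []
    for name, plat, plon in points:
        # Longitude gap measured the short way around the globe.
        lon_gap = abs(plon - lon) % 360.0
        lon_gap = min(lon_gap, 360.0 - lon_gap)
        if abs(plat - lat) > dlat or lon_gap > dlon:
            continue  # outside the box: skip the trigonometry entirely
        dist = haversine_km(lat, lon, plat, plon)
        if dist <= radius_km:
            hits.append((name, dist))
    return sorted(hits, key=lambda hit: hit[1])
```

The loop still visits every point, so the cost grows linearly with the table size; a spatial index avoids visiting most rows in the first place.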

What to use to locally store data persistently in flutter?

I am quite new to programming, so any help on which direction to head in would be greatly appreciated. What should I use, NoSQL or SQL? What should my deciding factors be?
Should I go with a NoSQL database like sembast, since my server-side application uses MongoDB for storage and I mainly deal with JSON data, or should I go with a SQL database like sqflite? What should I consider when deciding between the two? Are there other options I should be aware of too?
SQL databases are also called relational databases. They are based on tables, which consist of rows of data. They use SQL (Structured Query Language) for defining and manipulating the data, which makes them a good fit for complex queries. Their transaction mechanism also makes SQL databases a good choice for heavy-duty transactional applications.
NoSQL databases are also called non-relational or distributed databases. They are based on key-value pairs or documents and don't have a standard schema definition. Queries are focused on collections of documents, so they tend to be less expressive than SQL queries. Although NoSQL databases may provide a transaction mechanism, it is generally not considered robust enough for complex transactional applications. NoSQL databases fit best for hierarchical data storage (similar to JSON data), so they are often preferred for large data sets (big data).

Store JSON data as Text in MySQL

This is more of a concept/database architecture related question. In order to maintain data consistency, instead of a NoSQL data store, I'm just storing JSON objects as strings/Text in MySQL. So a MySQL row will look like this
ID, TIME_STAMP, DATA
I'll store JSON data in the DATA field. I won't be updating any rows; instead, I'll add new rows with the current timestamp. So, when I want the latest data, I just fetch the row with the max(timestamp). I'm using Tornado with the Python MySQLdb driver as my primary backend application.
I find this approach very straightforward and less prone to errors. The JSON objects are fairly simple and are not heavily nested.
Is this approach fundamentally wrong? Are there any issues with storing JSON data as text in MySQL, or should I use file system based storage such as HDFS? Please let me know.
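For reference, the append-only scheme described above can be sketched roughly as follows. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server, and the table name, index name, and helper functions are all hypothetical. The index on the timestamp column is what keeps "fetch the newest row" cheap as the table grows.

```python
import json
import sqlite3
import time


def open_store(path=":memory:"):
    """Create the (hypothetical) snapshots table: ID, TIME_STAMP, DATA."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " time_stamp REAL NOT NULL,"
        " data TEXT NOT NULL)"
    )
    # Without this index, finding the newest row means scanning the table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_snapshots_ts ON snapshots (time_stamp)"
    )
    return conn


def append(conn, obj, ts=None):
    """Never update in place; always add a new row with a timestamp."""
    stamp = time.time() if ts is None else ts
    conn.execute(
        "INSERT INTO snapshots (time_stamp, data) VALUES (?, ?)",
        (stamp, json.dumps(obj)),
    )
    conn.commit()


def latest(conn):
    """Return the most recent object, or None if the table is empty."""
    row = conn.execute(
        "SELECT data FROM snapshots ORDER BY time_stamp DESC, id DESC LIMIT 1"
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Ordering by the auto-increment id as a tie-breaker covers the case where two rows share a timestamp.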
MySQL, as you probably know, is a relational database management system. It is designed for data that is related through keys, forming relations that can then be used for complex retrieval. Your method will technically work (and be quite fast), but it will probably (based on what I've seen so far) considerably limit how much of that technology you can leverage should the complexity of your project grow!
I would recommend a database like MongoDB (a document store) or Redis (a key-value store), as they are designed for this kind of storage rather than for relational architectures.
That said, if you find the approach works fine for what you're building, just go ahead. You might face some blockers up ahead if you want to add complexity to your solution but either way, you'll learn something new! Good luck!
Pradeeb, to answer your question you need to analyze your use case. What kind of data are you storing? For me, this would be the deciding factor: every technology has a specific use case at which it excels.
I think it is safe to assume that you use JSON because your data needs to be stored as very flexible documents, compared to a traditional relational DB. Certain data stores natively support such data structures, such as MongoDB (which uses "binary JSON", or BSON), as Phil pointed out. This would give you improved storage and/or improved search capabilities. Again, the utility depends entirely on your use case.
If you are looking for something like a job queue, horizontal scalability is not an issue, and you just need fast access to the latest data, you could use Redis, an in-memory key-value store that has a hash (associative array) data type and lists for this kind of thing. Alternatively, since you mentioned HDFS and horizontal scalability may very well be an issue, I can recommend queue systems like Apache ActiveMQ or RabbitMQ.
Lastly, if you are writing heavily, and you are not client-limited but your data storage is your bottleneck, look into distributed, flexible-schema data stores like HBase or Cassandra. They offer flexible data schemas and are heavily write-optimized, and data can be appended and kept in chronological order, so you can fetch the newest data efficiently.
Hope that helps.
This is not a problem. You can also use the memcached interface to InnoDB available in MySQL 5.6 and later, which could be a good fit, although I have never tried it.
Another approach is to use memcached as a cache. Write everything to both memcached and MySQL. When you read data, try memcached first; if the key is not there, read from MySQL. This is a common technique for reducing database bottlenecks.
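The write-to-both, read-cache-first pattern described above looks roughly like this. It is a sketch under heavy assumptions: a plain Python dict stands in for the memcached client, sqlite3 stands in for MySQL, and the class, table, and counter names are invented for the example. A real memcached client would also need expiry times and error handling.

```python
import sqlite3


class CachedStore:
    """Hypothetical write-through store with a cache-first read path."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # stand-in for a memcached client
        self.db_reads = 0  # counts how often reads fall through to the database
        conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def write(self, key, value):
        # Database first, so the cache never holds a value that was not stored.
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()
        self.cache[key] = value

    def read(self, key):
        # Fast path: serve from the cache when the key is present.
        if key in self.cache:
            return self.cache[key]
        # Slow path: read from the database and repopulate the cache.
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]
        return row[0]
```

After a cache flush or restart, the first read of each key costs one database query and every later read is served from memory.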

Seeking clarification about mysql 5.6 memcache integration

I'm having trouble getting a clear understanding of what MySQL 5.6 is introducing with respect to memcached.
As I understand it, memcached by itself is essentially a huge, shared, memory-resident hash table managed by a server process. In particular, it knows nothing about a persistent data store and offers no services in that regard. It simply knows about keys and values (like a Perl hash).
What I think MySQL 5.6 introduces is a NoSQL API, whereby MySQL clients can request data from the MySQL server by key rather than by a SELECT statement. (Similarly, they can perform updates with key=value pairs.) MySQL uses memcached to cache these in memory as a performance boost, but also takes care of things like writing updates back to the database before they age out of the cache.
In other words, the use of memcached is an implementation detail of the MySQL 5.6 NoSQL feature, and is not something the application programmer needs to be aware of.
I'd welcome any corrections or amplification to my understanding.
Thanks,
Chap
I think it's quite simple (see the quote from the official documentation below).
I disagree with your last sentence: the application programmer has to be very aware of the memcached plugin, because having it on board the MySQL server means they can decide (or may be forced) to access data through a memcached interface or via the SQL interface.
To better understand the impact of this plugin on an application's design, you should know that MySQL uses three configuration tables to manage memcached; understanding how the "cache_policies" table works will shed some light on some of your doubts:
Table cache_policies specifies whether to use InnoDB as the data store of memcached (innodb_only), or to use the traditional memcached engine as the backstore (cache-only), or both (caching). In the last case, if memcached cannot find a key in memory, it searches for the value in an InnoDB table.
Here is the link: innodb-memcached-internals
The quote above means that, depending on the policy you choose for a specific key-value mapping, you will have different application scenarios:
innodb_only -> you can query the data via the SQL interface or via a memcached interface; here is a link to some memcached language interface examples: memcached-interfaces
cache-only -> you should query the data via the memcached interface only
caching -> you can use both interfaces (note that the storage mechanism changes slightly)
Of course, this configuration decision depends strictly on your specific needs.
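As a way of picturing the three policies, here is a toy model of the read path only. It is not plugin code and does not talk to MySQL: two plain dicts stand in for memcached's memory and the InnoDB table, and the enum, its string values, and the `lookup` function are all hypothetical names chosen for this sketch.

```python
from enum import Enum


class CachePolicy(Enum):
    # Illustrative labels for the three policies in the quoted documentation.
    INNODB_ONLY = "innodb_only"
    CACHE_ONLY = "cache_only"
    CACHING = "caching"


def lookup(key, policy, memory, table):
    """Toy read path; memory and table are dicts standing in for
    memcached's in-memory store and the InnoDB table."""
    if policy is CachePolicy.INNODB_ONLY:
        # Values live only in the table, so SQL clients see the same data.
        return table.get(key)
    if policy is CachePolicy.CACHE_ONLY:
        # Values live only in memory; the table is never consulted.
        return memory.get(key)
    # CACHING: try memory first, then fall back to the table on a miss.
    if key in memory:
        return memory[key]
    return table.get(key)
```

For example, with `memory = {"a": "mem"}` and `table = {"a": "db", "b": "db-b"}`, a lookup of `"b"` returns nothing under the cache-only policy but finds `"db-b"` under the caching policy.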
I don't really have a complete answer for you I'm afraid, as I too am struggling to find the detail I require before toying around with it.
That said, there is one important point I have managed to uncover that you seem to have missed: by accessing the InnoDB storage engine via the new plugin, you completely bypass SQL and avoid all the overhead that comes with it.
This of course makes it essentially a key/value store, more akin to most NoSQL databases, complete with all the drawbacks associated with them (no joins, etc.).
On the flip side, for many applications these days this is exactly what we want. I have come across only a handful of real-world performance reports, but all seem to point to this implementation significantly outperforming MongoDB and similar NoSQL solutions (how much truth there is in that I do not know). One relatively in-depth comparison even claims as high as 700k qps on a commodity server (compared with around 100k on a well-tuned MySQL setup), which is incredible if true.
Resource here:
http://yoshinorimatsunobu.blogspot.co.uk/search/label/handlersocket
Anyway, sorry I can't be of any more help, but it's food for thought at least!

NoSQL / InnoDB with Memcached

Does anybody use NoSQL / InnoDB with Memcached?
How stable is it? I set it up yesterday and am going to test it today, but maybe you can share some knowledge as well?
Not sure what you mean by NoSQL/InnoDB: InnoDB is a storage engine used for MySQL tables and isn't really related to NoSQL stores like MongoDB, Redis or CouchDB. If you mean a comparison between the two, here is a basic benchmark of an update statement on MongoDB, a major NoSQL platform, versus MySQL tables using the InnoDB engine.
http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-update-performance.html
That said, most of the NoSQL alternatives have at this point fairly stable libraries. An application my team worked on utilized memcached alongside mongo utilizing their Python APIs in a search app to store query data to train the search results on later. Basically memcached hashes were stored alongside query data and then called after a result set was picked by the user in order to refine the results for those works. Haven't had any problems with utilizing the two together and implementation was a snap.
Most NoSQL engines now use some serialized key-value data, commonly some variant on the JSON spec. This actually makes things generally even easier than the old RDBMS approach of constructing your objects from across multiple tables and running numerous updates for your persistence tier. In the case of Mongo, we handed the whole serialized BSON doc returned from Mongo to memcached for the temp storage and there were no chokes at all.
This NoSQL thing is pretty cool for those already working with the object paradigm.