Caching the data result of complex computation - mysql

I have a Spring Boot server application. Clients of this server ask for statistics about different things all the time. These statistics can be shared among clients, and do not need to be real time.
It's good enough if these statistics are refreshed every 15-30 mins.
Also, computing these statistics requires reading the whole database.
So, I'd like to cache these computed statistics and update them now and then.
What is your suggestion, what tool or pattern should I use?
I have the following ideas so far:
using memcached
upgrading to MySQL 5.7, which has a JSON data type, and storing the data there
Please keep in mind that the hardware of my server is not too powerful: 512MB RAM and 1 CPU (cheapest option in DigitalOcean).
Thank you in advance!
Edit 1:
These statistics are composed of quite simple data structures: int to int maps, lists, etc. and they do NOT fit well in a relational database.
Edit 2:
The whole data is only a few megabytes. The crucial point is that creating this data requires a lot of database reads, and a lot of clients are asking for it.
I also want to keep my server application stateless. I think it's important to mention.

A simple solution to the problem is to save the data in JSON format to a file, and that's it.
Additionally, this file can be on a ram disk partition, so it will be blazing fast.
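The file-based approach could be sketched roughly as follows. This is a minimal illustration in Python rather than Spring Boot code, and the names `CACHE_PATH`, `compute_statistics`, `write_cache` and `read_cache` are hypothetical, as is the sample data. It assumes the cache is written to a temporary file first and then renamed into place, so that clients never read a half-written file, and that a file older than 15 minutes counts as stale.

```python
import json
import os
import tempfile
import time

# Hypothetical location; this could point at a RAM disk mount instead.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "stats_cache.json")
MAX_AGE_SECONDS = 15 * 60  # lower bound of the 15-30 minute window from the question


def compute_statistics():
    """Placeholder for the expensive pass over the whole database."""
    return {"visits_per_day": {"1": 120, "2": 95}, "top_items": [7, 3, 9]}


def write_cache(stats, path=CACHE_PATH):
    """Write to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump({"computed_at": time.time(), "stats": stats}, tmp)
    os.replace(tmp_path, path)  # replaces the old file in one step


def read_cache(path=CACHE_PATH, max_age=MAX_AGE_SECONDS):
    """Return cached statistics, recomputing when the file is missing or stale."""
    try:
        with open(path) as f:
            payload = json.load(f)
        if time.time() - payload["computed_at"] <= max_age:
            return payload["stats"]
    except (FileNotFoundError, KeyError, json.JSONDecodeError):
        pass  # fall through and rebuild the cache
    stats = compute_statistics()
    write_cache(stats, path)
    return stats
```

Note that JSON object keys are always strings, so an int to int map has to convert its keys when it is written and read back. Keeping the state in a file rather than in the application process also leaves the server itself stateless.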

Related

In-memory database for Mahout recommendation

I have been working on Mahout lately. The current version supports inputs from files, MySQL etc. via its DataModels. In my case, the raw data resides within a Postgres DB at a client location. The raw data requires a good amount of pre-processing before being fed into the Mahout DataModel. Currently I'm storing the refined data as a simple *.csv file and loading it into Mahout using the built-in FileDataModel.
Is it possible to use an in-memory DB to store the refined data and to load it into Mahout using its existing MySQLJDBCDataModel/JDBCDataModel? If so, what kind of in-memory DB would serve this purpose?
SQLite 3 is quite often the go-to in-memory database, and for good reason: it's one of the most battle-hardened databases out there and can be found literally everywhere. The browser you're using is likely using it. It has an in-memory option that's fairly straightforward. Even disk-based, it's fast.
Most databases given enough RAM will efficiently load most of your data into RAM anyway. I used PostgreSQL as the backend for a search engine for a long time and most access was to RAM with almost nothing going to disk when reading. If you already have the database in PostgreSQL it might be simpler to keep it in that.
Keep in mind that you can only access an SQLite in-memory database from a single process.
If you need the ultimate performance, even a fully cached persistent database won't be as fast as a true in-memory database system. To me, though, it doesn't sound like you need that level of extreme performance.
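As a rough illustration of the SQLite in-memory option mentioned above, the following sketch loads refined CSV rows into a `:memory:` database using Python's standard `sqlite3` module. The table name, column names and sample rows are made up for the example, and it does not show whether Mahout's JDBC data models can be pointed at SQLite; that would need to be checked separately.

```python
import csv
import io
import sqlite3

# Hypothetical refined data in a user,item,preference layout.
REFINED_CSV = "1,101,4.5\n1,102,3.0\n2,101,5.0\n"

conn = sqlite3.connect(":memory:")  # the database lives only in this connection
conn.execute(
    "CREATE TABLE preferences (user_id INTEGER, item_id INTEGER, preference REAL)"
)
rows = [
    (int(user), int(item), float(pref))
    for user, item, pref in csv.reader(io.StringIO(REFINED_CSV))
]
conn.executemany("INSERT INTO preferences VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM preferences").fetchone()[0]

# A second connection to ":memory:" opens a separate, empty database.
other = sqlite3.connect(":memory:")
other_tables = other.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Here `count` is 3 while `other_tables` is empty, which shows the isolation in practice: the data is not visible even to another connection, let alone another process.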

XML or MySQL for User Database?

Might seem a strange question but would there be a performance benefit in using XML for a database rather than MySQL and tables?
To put this into context, I will be creating a website that has user profiles. I know more XML than MySQL and know most people will use MySQL as standard, but was wondering if anyone could throw some pennies this way about how the two compare, and whether this suggestion is as outrageous as it could be to anyone who understands big O notation...
The bigger the XML file, the higher the memory usage, because you'll have to load the entire XML file into RAM while running your script.
An average MySQL database is about 4 MB. Take an XML file of 4 MB, loaded from disk into RAM at every page view: with about 25 visitors at any given moment that's 100 MB already lost, and if they flick through a lot of pages it quickly adds up to a gigabyte of RAM.
Not to mention you'll add about 1 second to page load every time, if not longer.
Not to mention the continuous disk load for reading and writing changed vars, and the threading and forking issues when two visitors want to update the same XML file.
You don't have these problems with an SQL server.
MySQL has indexes, and it's optimized for the binary values you will be storing. All you have with an XML file is a plain file, and any optimizations (caching, indexing, anything you can think of) will be up to you to implement.
XML is a great format for transport, everybody speaks it, but you do not want to use it for storage.
And if you already know XML, but not yet MySQL, I would say you're ahead of the game. You'll probably find writing SQL queries and fetching the results more straightforward than working with XML data.
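To make the point about indexes and SQL queries concrete, here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the `users` table, its columns and the index name are hypothetical. It fetches one profile by username and then asks the database how it plans to find that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_users_username ON users (username)")
conn.executemany(
    "INSERT INTO users (username, bio) VALUES (?, ?)",
    [(f"user{n}", f"profile text {n}") for n in range(1000)],
)

query = "SELECT id, username, bio FROM users WHERE username = ?"

# Fetch a single profile; the database locates it without reading every row.
row = conn.execute(query, ("user42",)).fetchone()

# Ask the query planner how it finds the row; the last column of each
# plan step is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("user42",)).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)
```

The plan description names `idx_users_username`, meaning the lookup goes through the index instead of scanning the table. With a plain XML file, the equivalent lookup means parsing the file and walking its elements unless you build an index yourself.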
As far as I can see, there are several XML DB solutions available. These appear in a simple Google search:
http://exist-db.org/exist/index.xml;jsessionid=1dowedwdr9hsanbcvdcom8aka
http://basex.org/
http://www.oracle.com/technetwork/database/features/xmldb/index.html
http://www.sedna.org/
So all that matters here is the speed of development. If you're mostly familiar with XML, then using one of those could be a boost for development time.
However, there are plenty of ORM products for relational databases, depending on the programming language, that take over most of the development effort and make it easy to use a database for a web site. So if you don't have some specific needs for your web site, you might go with any of the options above.
It depends on the structure of your database. This question can't be given a definite answer without knowing anything about your data. Any comparison of XML versus a relational database depends heavily on which data you choose, and what type of operations you plan.
For example, say you want to store, index, and query more than a million rows, and each row has a lot of the same fields. That’s a simple and fixed structure and it’s the same for all records. It’s a perfect fit for a relational database and can be stored in a single table. Relational databases handle such fixed records very efficiently.
Well, there are two main questions here.
First, if you're going to use a database, you have a choice between an XML database and a relational database. The choice depends primarily on the nature of your data (especially its complexity, but also the way in which it is used).
Then you have the choice between using a database and using a simple file (for example an XML file). That choice depends primarily on the quantity of data and the transaction throughput.
Since you haven't told us much about the nature of the data or its quantity or the throughput requirements, it's hard to advise you specifically on either question.

Reduce database writes with memcached

I would like to convert my stats tracking system not to write to the database directly, as we're hitting bottlenecks.
We're currently using memcached for certain aspects of the site, and I wanted to use it for storing stats and committing them to mysql DB periodically.
The issue lies however in the number of items (which is in the millions) for which potentially there could be stats collected between the cronjob runs that would commit them into the database. Other than running a SELECT * FROM data and checking for existence of every single memcache key, and then updating the table.... is there any other way to do this?
(I'm not saying below is gospel, this is just my gut feeling. As said later on, I don't have the specifics of your system :) And obviously no offence meant etc :) )
I would advise against using memcached for this. Memcached is built to quickly retrieve values that you've fetched before, not to store values. The big difference is that if your cache is getting full, you'll lose your data.
Normally, you'd just have no data in your cache, and recollect the data from the source, which is impossible in this case. That alone would be a reason for me to try and dissuade you from this.
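The data-loss risk can be shown with a toy example. The `TinyLruCache` class below is a hypothetical, heavily simplified stand-in for a bounded cache that evicts the least recently used key when full; it is not memcached's actual implementation, but it sketches what happens when counters are the only copy of the data.

```python
from collections import OrderedDict


class TinyLruCache:
    """Bounded cache that evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def increment(self, key):
        self.items[key] = self.items.get(key, 0) + 1
        self.items.move_to_end(key)  # mark as most recently used
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # silently drops the oldest counter


cache = TinyLruCache(capacity=3)
hits = ["page:1", "page:2", "page:3", "page:4", "page:5", "page:1"]
for key in hits:
    cache.increment(key)

recorded = len(hits)                     # hits that actually happened
surviving = sum(cache.items.values())    # hits still present in the cache
```

Six hits were recorded, but only three survive in the cache by the time a periodic job would flush them to the database. The evicted counters cannot be recomputed from any other source, which is the core of the objection above.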
Now you say the major problem is the MySQL connection limit you are hitting. If you do simple stuff (like what we talked about in the comments: the insert delayed), it's just a case of increasing the limit. You should probably have enough power to have your scripts/users go to the database once and say "this should eventually be added", and then go away. If your users can't even open 1 connection for that, there's a serious resource problem you probably won't fix by adding extra layers of cache.
Obviously hard to say without any specs of the system, soft and hardware, but my suggestion would be to see if you can just let them open their connections by increasing the limit, and fiddle with the server variables a bit, instead of monkey-patching your system by using a memcached as an in-between layer.
I had a similar issue with statistics data. But please don't use memcached for it. You can't be sure that ALL your items will be moved to the DB. You can lose data and/or process data twice.
You should analyse your bottleneck against how much data you are writing/reading and how many connections you need. And then switch to something scalable like Hadoop, Cassandra, Scribe or other systems.
You need to provide additional information on the platform that you are running: O/S, database (version), storage engine, RAM, CPU (if possible)?
Are you inserting into a single table or more than one table?
Can you disable the indexes on the tables you are inserting into, as they slow down the inserts?
Are you running any triggers or stored procedures to compute values as you insert the raw data?

Database design with millions of entries

Suppose there is a messaging system. This system has millions of entries to be sent and reported on, and the count is growing by 100K every hour. Two services access the DB: one is the sender, one is the reporter. So what would you suggest in order to get maximum performance? How could the DB be designed?
Also, which open source database would you suggest among MySQL, PostgreSQL, MongoDB etc. to handle this high-volume DB?
Thanks
You've not really provided much information on your requirement other than a few comments about expected data volumes. Simple storage of large volumes of data has no real intrinsic value, it's the ability to access that data which gives the real value; so knowing how you expected to retrieve information from the database is more important than how much data you want to store.
Do these messages really require a document DB like MongoDB, or are they structured enough to use a straight RDBMS like PostgreSQL or MySQL? Do you need full text search capability? How often and what type of queries are executed against this message data? Are you trying to write your own Twitter?
If those are your current data volumes, look to using db replication for resilience. Consider partitioning your message table, perhaps by date posted. Use master/slave (or even multi-master/multi-slave) as Konerak has suggested. Look at the possibilities of an archive table for older messages that are less likely to be queried, but which are then still available. Look at what a commercial database like Oracle can offer you. Get in a professional to help tune the db for performance, rather than simply asking for free advice on sites like SO.
Consider your hardware as well... multiple load balanced servers to help with the volumes (we have 14 dedicated servers purely for accepting new messages, and three high performance servers tuned for querying the data).

Best storage engine for constantly changing data

I currently have an application that is using 130 MySQL tables, all with the MyISAM storage engine. Every table has multiple queries every second, including select/insert/update/delete queries, so the data and the indexes are constantly changing.
The problem I am facing is that the hard drive is unable to cope, with waiting times up to 6+ seconds for I/O access with so many read/writes being done by MySQL.
I was thinking of changing to just 1 table and making it memory based. I've never used a memory table for something with so many queries though, so I am wondering if anyone can give me any feedback on whether it would be the right thing to do?
One possibility is that there may be other issues causing performance problems - 6 seconds seems excessive for CRUD operations, even on a complex database. Bear in mind that (back in the day) ArsDigita could handle 30 hits per second on a two-way Sun Ultra 2 (IIRC) with fairly modest disk configuration. A modern low-mid range server with a sensible disk layout and appropriate tuning should be able to cope with quite a substantial workload.
Are you missing an index? - check the query plans of the slow queries for table scans where they shouldn't be.
What is the disk layout on the server? - do you need to upgrade your hardware or fix some disk configuration issues (e.g. not enough disks, logs on the same volume as data).
As the other poster suggests, you might want to use InnoDB on the heavily written tables.
Check the setup for memory usage on the database server. You may want to configure more cache.
Edit: Database logs should live on quiet disks of their own. They use a sequential access pattern with many small sequential writes. Where they share disks with a random access work load like data files the random disk access creates a big system performance bottleneck on the logs. Note that this is write traffic that needs to be completed (i.e. written to physical disk), so caching does not help with this.
I've now changed to a MEMORY table and everything is much better. In fact I now have extra spare resources on the server allowing for further expansion of operations.
Is there a specific reason you aren't using InnoDB? It may yield better performance due to caching and a different concurrency model. It will likely require more tuning, but may yield much better results.
should-you-move-from-myisam-to-innodb
I think that your database structure is very wrong and needs to be optimised. This has nothing to do with the storage engine.