Apologies for the broadness of this question, but I'm looking for a bit of advice.
I have built a system in Laravel. A user is able to upload a file. This file is then sent to a system that does some data science stuff on it. That system sends a response back to my system as a JSON string, which I then save to a JSON file. This file is then loaded in the frontend so some charts can be displayed.
For the most part this approach has been fine. However, when I upload really large files (the upload itself is fine because I am chunk uploading), the saved JSON file ends up huge. This then becomes a problem because it is too big for the frontend to load. A single file can contain hundreds of thousands of rows of data.
So my question is really about what other options I have. Instead of saving the response as a JSON file, is it okay to save these 300k+ rows to a MySQL database, or is this too much for it to handle? Should I use something like MongoDB instead?
I am thinking that a database is probably the best route, as I can then query the specific data I need for each chart without having to load a huge file just to extract it. I do use the other data within the file, but that can be queried on an event, so I'm not as worried about it.
My concern is that if just 3 people upload a file, I could then have a database with over a million rows. How can I scale this, and how can I ensure I won't have issues as more users use the system?
Any advice on this would be greatly appreciated. I am starting to think that I may need to deploy a database server per user and auto-prune their data every X days.
Thanks
300k rows is not a problem. I support MySQL databases with several billion rows in a single table. You may need to upgrade to a larger, more powerful server, but MySQL can store it.
That said, you're on the right track thinking about how much data you want to retain and how much you need to prune.
Scalability depends more on optimizing the queries you need to run, not just on the number of rows you store.
So you should test the queries your app runs and see how many rows it takes before you see performance decline. Try to optimize the queries with the usual techniques: adding indexes, rewriting the SQL logic, partitioning the data, etc. (there are many techniques, too many to get into in a Stack Overflow answer).
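To make that concrete, here is a minimal Python sketch of the "query only what a chart needs" idea. The data_points table, its columns, and the connection details are all invented for illustration, not taken from the question:

    # Hypothetical schema: data_points(upload_id, metric, ts, value).
    # With a composite index, each chart fetches only its own slice of the
    # 300k+ rows instead of the frontend loading one huge JSON file.
    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="app",
                                   password="secret", database="charts")
    cur = conn.cursor()

    # One-time DDL: the index that makes per-chart queries cheap.
    cur.execute("CREATE INDEX idx_upload_metric "
                "ON data_points (upload_id, metric, ts)")

    # Per-chart query: touches only the rows that one chart displays.
    cur.execute("SELECT ts, value FROM data_points "
                "WHERE upload_id = %s AND metric = %s ORDER BY ts",
                (42, "cpu_usage"))
    points = cur.fetchall()

Running EXPLAIN on each query before and after adding an index is the quickest way to confirm the index is actually being used.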
If you have optimized the queries as much as you can, then you need to start pruning data, or splitting data over multiple MySQL instances.
I'm trying to find the best way to implement a multi-consumer, high-load queue system.
My current setup uses a large MySQL database table queue_items, plus queue_item_data (loaded dynamically). The queue_items table is never emptied, so we don't process the same item twice. These tables (our entire DB) get backed up every night.
I've been told this is a bad approach: applications query the database and lock the table while doing so, causing a lot of deadlocks and load on the database, so I'm trying to improve it.
An item starts off by being created in the database. The data is then loaded (fetching the URL via a headless browser), the HTML is parsed, and a data record is inserted based on the response. Another service then checks the newly created data record to see if we can auto-decline it; after that comes a manual review via the web application, and finally the item is passed to the processor service to be processed and inserted into our "data lake".
The loader, checker, and processor are all different microservices, each running multiple instances ("workers") depending on load.
One way I thought of improving this was having RabbitMQ with 3 different queues:
loader-queue (enqueued when created)
checker-queue (enqueued when loaded)
processor-queue (enqueued when checked)
This would reduce the queries on the MySQL end from the microservices constantly checking which items they need to deal with.
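Roughly what I'm picturing, as a minimal pika sketch (the connection details and the message shape are placeholders, not code from our system):

    # Declare the three pipeline queues and push an item into the first one.
    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()

    # Durable queues survive a broker restart.
    for queue in ("loader-queue", "checker-queue", "processor-queue"):
        channel.queue_declare(queue=queue, durable=True)

    def enqueue(queue: str, item_id: int) -> None:
        channel.basic_publish(
            exchange="",
            routing_key=queue,
            body=json.dumps({"item_id": item_id}),
            # delivery_mode=2 marks the message persistent (written to disk).
            properties=pika.BasicProperties(delivery_mode=2),
        )

    enqueue("loader-queue", 123)  # item created in MySQL -> hand it to the loader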
Things I could see going wrong with my new approach (RabbitMQ) that I'm not sure how to handle:
Items could get lost, fall off queues, and never get picked up, failing silently (the consumer sketch below is my attempt at this).
Keeping items unique across the 3 queues, ensuring an item is only ever in one of them.
Storing tens of thousands of items at a time, awaiting action in one or more of the 3 queues. Can RabbitMQ even handle this?
Backing up the state of where items are (similar to how the MySQL tables are backed up), managing that state, avoiding data loss, and easily requeueing items to specific states or queues.
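From what I've read, durable queues, persistent messages, and manual acks are the usual answer to the silent-loss worry; a rough sketch of a loader worker (load_item is a stand-in for our real loader logic):

    import json
    import pika

    def load_item(item_id: int) -> None:
        ...  # stand-in: fetch the URL via headless browser, parse, insert data record

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="loader-queue", durable=True)

    def on_message(ch, method, properties, body):
        item_id = json.loads(body)["item_id"]
        load_item(item_id)
        # Ack only after the work succeeds: unacked messages are redelivered
        # if this worker dies, so a crash means a retry, not a lost item.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_qos(prefetch_count=1)  # one in-flight item per worker
    channel.basic_consume(queue="loader-queue", on_message_callback=on_message)
    channel.start_consuming()

And since every item still has its row in MySQL, the database would remain the source of truth for backing up state and requeueing after a disaster.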
Even a heads-up that I'm on the right path would be great. I'm feeling rather overloaded with information, but surely this is better than querying the database a ton of times and handling all that concurrency via constant locks just to work out which items to deal with?
Background:
I'm trying to build the backend services of an app. This app has rooms that a user can join. When a user joins a room, s/he can exchange some data with other users through a socket. Outside of the room, the user can view the processed data from the transactions that happened inside the rooms s/he joined. The room list, room information, and the data transactions inside the rooms should be stored in a database.
My idea is to create a project with one database; however, an experienced developer suggested splitting the database into two:
One that uses MongoDB for storing data transactions happening inside the room.
One that uses MySQL for storing and returning the room list, room information, and analytics of what happened inside a room.
Problem I see with using multiple databases:
I did some research, and from what I understand, multiple databases are not recommended but can be implemented if the data are unrelated. Displaying analytics will need to process the data transactions that happened inside a room and also display the room's information. If I use the two-database approach, I will need to retrieve data from both databases to achieve this.
Question:
I personally think it's easier to use a single-database approach, since I don't see the data inside and outside of the room as "unrelated". Am I missing an important point about when to use multiple databases?
Thanks in advance. Have a good day.
You can look at this problem from two perspectives: technical and practical.
Technically, if your back-end is expected to become very complex or to operate at scale, it is recommended to break it down into multiple microservices, each in charge of a tiny task. Ideally, these small services should help you achieve separation of concerns, so each service only works with one piece of the data. If other services need to read and/or modify that piece of data, they have to go through the service in charge.
In this case, depending on the data each service is dealing with, you can pick the proper database. For instance, you can use MySQL for transactional data, MongoDB for large schemaless content, or Elasticsearch if you want to perform text search.
Practically, it is recommended to start small with one service and one database (if you prefer to ship your app sooner), and then break it down into multiple services over time as you add and improve features.
There are multiple points to keep in mind. First, if you expect to have a large user base, you should start development with the right architecture from the beginning to avoid scaling issues. Second, sometimes one database cannot perform the task you need. For example, it would be very inefficient to do a text search in MySQL. Finally, there is no absolute right way of doing things. However you do one thing, another friend might show up tomorrow and ask you why you did not do it his/her way. The most important thing is to start doing and then, learning and improving along the way. Facebook was started with MySQL and it worked fine initially. Could it survive one database type today? I suspect the answer is no, but would it have made sense for them to add all the N databases that they have now back then?
Good luck developing!
At the beginning I would like to say that I am not an expert in this domain, so it is hard to describe every nuance. I work on a Rails application which uses a MySQL database. Our DB has grown, and now we have serious performance problems. Our app has two features (for example, sync with mobile) which process a lot of data, and they cause our database to hang. We use New Relic for monitoring, which confirmed that we have problems with those two parts of the app. My main question is: how do I profile my app to figure out which actions cause the biggest problems? Which tools can I use? Do you have any tips on what I can do, or how to configure the DB, to improve performance? What should I do to find out where the problem is (the next small step)? I know these questions are very general, but I am a junior in this domain and new to Rails. I believe more questions will appear after your answers ;)
Firstly, what is the size of the DB, i.e. how many tables and the average number of rows per table? With the basic details available in your question, here are a few steps you can look at.
Regarding the data sync with mobile: the first step should be to avoid heavy processing on the fly. Instead, use background jobs to process the data and store in tables exactly what needs to be sent to mobile (see the sketch below). That way you avoid multiple queries; one or two queries should fetch the entire dataset with minimal processing.
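A minimal sketch of that precompute-and-store idea, written in Python for brevity since the question includes no code (the items and sync_payloads tables and the connection details are invented):

    # Background job: build the mobile sync payload once, ahead of time,
    # so the sync endpoint serves one precomputed row instead of doing
    # heavy on-the-fly processing per request.
    import json
    import mysql.connector

    def rebuild_sync_payload(user_id: int) -> None:
        conn = mysql.connector.connect(host="localhost", user="app",
                                       password="secret", database="app")
        cur = conn.cursor()
        # One query gathers everything the mobile client needs.
        cur.execute("SELECT id, title, updated_at FROM items WHERE user_id = %s",
                    (user_id,))
        payload = json.dumps([{"id": r[0], "title": r[1], "updated_at": str(r[2])}
                              for r in cur.fetchall()])
        # Store the ready-made payload; the sync endpoint just reads this row.
        cur.execute("REPLACE INTO sync_payloads (user_id, payload) VALUES (%s, %s)",
                    (user_id, payload))
        conn.commit()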
I'm sure you have applied indexing and that your ActiveRecord associations are properly managed. It would actually help if you could post some models and the basic relationships between them. Also explain briefly how the sync is done and how the APIs are handled.
Use benchmarking to figure out how long it takes to fetch the data, and keep a watch on the logs when processing data. There are some good benchmarking tools available.
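Even without a dedicated tool, a crude timer around the suspect call narrows things down. A Python sketch (fetch_sync_rows is a made-up stand-in for whatever does the fetching):

    import time

    def timed(label, fn, *args):
        # Wrap any call, print how long it took, and pass the result through.
        start = time.perf_counter()
        result = fn(*args)
        print(f"{label}: {(time.perf_counter() - start) * 1000:.1f} ms")
        return result

    # usage: rows = timed("mobile sync fetch", fetch_sync_rows, user_id)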
I'm working on a project which is somewhat similar to WhatsApp, except that I'm not implementing the chat function.
The only thing I need to store in the database is users' information, which won't be very large, and I also need an offline database in the mobile app which should be synced with the server database.
Currently I use MySQL for my server database, and I'm thinking of using JSON for the syncing between the mobile app and the server. Then I found that MongoDB has natural support for JSON, which made me wonder whether I should switch to MongoDB.
So here are my questions:
Should I switch to MongoDB, or should I stick with MySQL? The data for each user won't be too large, and there is some requirement for data consistency. But MongoDB's JSON support is somewhat attractive.
I'm not familiar with the syncing procedure. I did some digging, and it appears that JSON is a good choice, but what data should I put into the JSON files?
I actually flagged this as too broad and as attracting primarily opinion-based answers, but I'll give it a go anyhow and hope to stay objective.
First of all, you have two separate questions here:
What database system should I use.
How do I sync between app and server.
The first one is easily answered, because it doesn't really matter: both are good options for storing data. MySQL is mature and stable, and MongoDB, although newer, has very good reviews, and I don't know of any problems that would prevent it from being used. So pick the database you find easiest to use.
Now for the second, I'll first put in a disclaimer: entire books have been written about data synchronization between multiple entities, and after all this time it is still the subject of PhDs.
I would advise against synchronizing directly between the mobile app and the database, because that requires the database credentials to be contained within the app. Mobile apps can and will be decompiled and the credentials extracted, which would compromise your entire database. So you'll probably want to create some API which first does device/user authentication and then changes the database.
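A minimal sketch of such an API layer, using Flask purely as an example; the token store and save_to_database are invented placeholders:

    # The app holds only a per-device token; DB credentials stay on the server.
    from flask import Flask, request, jsonify, abort

    app = Flask(__name__)
    VALID_TOKENS = {"device-123": "user-1"}  # made-up token store

    def save_to_database(user: str, record: dict) -> None:
        ...  # hypothetical DB write, done with server-held credentials

    @app.route("/records", methods=["POST"])
    def upsert_record():
        user = VALID_TOKENS.get(request.headers.get("X-Device-Token", ""))
        if user is None:
            abort(401)  # unauthenticated device
        save_to_database(user, request.get_json())
        return jsonify(status="ok")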
This already means that choosing MongoDB just for the sake of the JSON syncing would probably be a bad idea.
Now, JSON itself is just a format for representing data with some structure, just like XML. As such, it's not a method of synchronization but of transport.
For synchronizing data it's important that you know the source of truth.
If you have 1 device <-> 1 record, it's easy, because the device will be the source of truth; after all, the only mutations that take place are presumably made by the user on the device.
If you have n devices <-> 1 record, then it becomes a whole lot more annoying. If you want to allow a device to change the state while offline, you'll need some tricks to synchronize the data when the device comes back online. But this is probably too complex and situation-dependent a question to answer on SO.
If, however, you force the device to always immediately propagate changes to the database, then the database will always contain the most up-to-date record, i.e. the truth. The downside is that part of the app will not be functional when offline.
If offline updates don't change state but merely add new records, then you can push those to the server when the device comes back online. But keep in mind you won't be able to reliably order these events.
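A minimal Python sketch of that append-only case, assuming a hypothetical HTTP endpoint and record shape:

    # New records are queued locally and pushed when the device is back online.
    import json
    import time
    import urllib.request
    import uuid

    offline_queue = []  # would be persisted on the device in a real app

    def record_event(data: dict) -> None:
        # A client-generated UUID lets the server deduplicate retried pushes;
        # the client timestamp is informational only, not a reliable ordering.
        offline_queue.append({"id": str(uuid.uuid4()),
                              "created_at": time.time(),
                              "data": data})

    def flush_when_online(api_url: str) -> None:
        while offline_queue:
            item = offline_queue[0]
            req = urllib.request.Request(api_url,
                                         data=json.dumps(item).encode(),
                                         headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req)  # raises on failure; item stays queued
            offline_queue.pop(0)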
For a web app or a standalone server app, which would you recommend, and why?
have the giant application.log, where everything is logged;
have many smaller logs:
security.log
performance.log
lifecycle.log
integration.log
I like using databases for logging. Four useful features:
You don't lose the time-ordering, as you would when looking at multiple log files at once.
You can still filter by specific message types, if you want to.
You get integrity, so if your computer crashes just as a log entry is being written, you won't get a corrupted log; the entry is replayed from the journal when your database starts up again.
Pruning the log is really easy! No need to use hacky log rotation programs that require your daemons to be SIGHUPed or anything.
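As a minimal illustration of the idea (not a production setup), here is a tiny Python logging handler that writes to SQLite; the table layout is invented:

    import logging
    import sqlite3

    class SQLiteHandler(logging.Handler):
        """Append every log record to a database table."""
        def __init__(self, path="app_log.db"):
            super().__init__()
            self.conn = sqlite3.connect(path)
            self.conn.execute("CREATE TABLE IF NOT EXISTS log "
                              "(ts REAL, level TEXT, logger TEXT, message TEXT)")

        def emit(self, record):
            self.conn.execute("INSERT INTO log VALUES (?, ?, ?, ?)",
                              (record.created, record.levelname,
                               record.name, record.getMessage()))
            self.conn.commit()  # each entry is committed, so a crash loses little

    logging.getLogger().addHandler(SQLiteHandler())
    logging.warning("disk usage at 91%")
    # Filtering and pruning become plain queries, e.g.:
    #   SELECT * FROM log WHERE logger = 'security' ORDER BY ts;
    #   DELETE FROM log WHERE ts < strftime('%s', 'now', '-30 days');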
Your mileage may vary. :-)
I'd suggest multiple logs for a couple of reasons:
Reduced noise in production troubleshooting situations where time counts.
Different groups can get different logs. A systems group may want different logs than an applications group, as each has its own part of the system where it can optimize settings.
As for how to log, I'd suggest a mixture of database, e-mail, and local text files, just in case there are issues with anything that requires connectivity to another server.
It depends on how you review your logs; you are in the best position to answer that. Do you need specific logs so you can identify issues with better accuracy, or do you just need an overall log for maintenance purposes?
I would always use different logs, provided they don't cause too much overhead.
If you expect lots of logging and you will quickly know where to look, then split them up. One big log file can get unruly really quickly, but constantly hunting through multiple log files for the relevant entry is no good either. So when you split them up, make sure it's done in a way that, when you get an error, you know exactly which file to open first (a sketch of this follows).
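A minimal Python sketch of that split, one logger per concern, using the file names from the question (the format string is just an example):

    import logging

    def make_logger(name: str) -> logging.Logger:
        # Each concern gets its own logger writing to its own file.
        logger = logging.getLogger(name)
        handler = logging.FileHandler(f"{name}.log")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        return logger

    security = make_logger("security")
    performance = make_logger("performance")
    lifecycle = make_logger("lifecycle")
    integration = make_logger("integration")

    security.warning("failed login for user admin")  # -> security.log
    performance.info("request served in 840 ms")     # -> performance.log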
I'd recommend using an off-the-shelf centralized logging system rather than inventing your own. This will allow you to easily dig out data in any form whenever you like, and it should save you time and headaches. Managing log files is a burden not only on your application, because you have to write additional code, but also for the humans who actually look at the text. Check out logFaces; it might solve some of your problems. It comes with its own database, can be used with most commercial database brands, and allows instant access to relevant log data.
Disclosure: I am the author of this product.