Log in one place versus multiple logs [closed] - language-agnostic

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
For a web app or a standalone server app, which would you recommend, and why?
have the giant application.log, where everything is logged;
have many smaller logs:
security.log
performance.log
lifecycle.log
integration.log

I like using databases for logging. Four useful features:
You don't lose the time-ordering, as you'd experience when looking at multiple log files at once.
You can still filter by specific message types, if you want to.
You get integrity: if your computer crashes just as a log entry is being written, you won't get a corrupted log, and committed writes will be replayed from the journal when your database starts up again.
Pruning the log is really easy! No need to use hacky log rotation programs that require your daemons to be SIGHUPed or anything.
Your mileage may vary. :-)
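To make the four points above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout and the helper names (open_log, write, read, prune) are my own assumptions for illustration, not something from the answer, and any database with transactions would work along the same lines.

```python
import sqlite3
import time

# Hypothetical schema: one table holds every category of log entry.
SCHEMA = """
CREATE TABLE IF NOT EXISTS log (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order
    ts       REAL NOT NULL,                      -- Unix timestamp
    category TEXT NOT NULL,                      -- e.g. 'security'
    message  TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_log_ts ON log (ts);
"""


def open_log(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def write(conn, category, message, ts=None):
    # "with conn" commits on success and rolls back on an exception.
    with conn:
        conn.execute(
            "INSERT INTO log (ts, category, message) VALUES (?, ?, ?)",
            (time.time() if ts is None else ts, category, message),
        )


def read(conn, category=None):
    # All entries in time order, optionally narrowed to one category.
    sql = "SELECT ts, category, message FROM log"
    params = ()
    if category is not None:
        sql += " WHERE category = ?"
        params = (category,)
    sql += " ORDER BY ts, id"
    return conn.execute(sql, params).fetchall()


def prune(conn, older_than_ts):
    # Delete entries older than the cutoff; returns how many were removed.
    with conn:
        cur = conn.execute("DELETE FROM log WHERE ts < ?", (older_than_ts,))
    return cur.rowcount
```

Calling read(conn) gives the single time-ordered view, read(conn, "security") gives the filtered one, and prune(conn, time.time() - 30 * 86400) would drop everything older than roughly 30 days without any log rotation tooling.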

I'd suggest multiple logs for a couple of reasons:
Reduce noise, if there are production troubleshooting situations where time counts.
Different groups can get different logs. A systems group may want different logs than an applications group as each has their part of the system where they can optimize settings.
As for how to log, I'd suggest a mixture of database, e-mail and local text file just in case there are issues with things that require connectivity to another server.
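For the text-file part of this, one possible way to split logs is one named logger per file, sketched here with Python's standard logging module. The category names are taken from the question; the function name configure_split_logs and the "app." logger prefix are assumptions of mine.

```python
import logging
from pathlib import Path

# Categories taken from the question's list of smaller logs.
CATEGORIES = ("security", "performance", "lifecycle", "integration")


def configure_split_logs(log_dir):
    """Create one logger per category, each writing to <category>.log."""
    log_dir = Path(log_dir)
    formatter = logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"
    )
    loggers = {}
    for name in CATEGORIES:
        logger = logging.getLogger(f"app.{name}")
        logger.setLevel(logging.INFO)
        logger.propagate = False  # keep entries out of the root logger
        handler = logging.FileHandler(log_dir / f"{name}.log", encoding="utf-8")
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        loggers[name] = logger
    return loggers
```

Usage would look like logs = configure_split_logs("logs") followed by logs["security"].warning("failed login for %s", user), assuming the directory already exists. Each group can then be given only the file it cares about.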

It depends on how you review your logs, and you are in the best position to answer that. Do you need specific logs so you can identify issues with better accuracy, or do you just need an overall log for maintenance purposes?
I would always use different logs, provided they don't cause too much overhead.

If you expect lots of logging and you will quickly know where to look, then split them up. One big log file can get unruly real quick, but if you are constantly looking in multiple log files for the relevant entry, that's no good either. So when you split them up, make sure it's in a way where when you get an error, you know exactly which file to open first.

I'd recommend using an off-the-shelf centralized logging system and not inventing your own. This will allow you to easily dig out data in any form whenever you like. It should also save you time and headaches. Managing log files is a burden not only on your application, because you have to write additional code, but also on the humans who actually look at the text. Check out logFaces, it might solve some of your problems: it comes with its own database, can be used with most commercial database brands and allows instant access to relevant log data.
Disclosure: I am the author of this product.

Related

Storing huge amount of data [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 3 years ago.
Apologies for the broadness of this question, but looking for a bit of advice.
I have built a system in Laravel. A user is able to upload a file. This file is then sent to a system that does some data science stuff on it. It sends a response back to my system as a JSON string which I then save back to a JSON file. This file is then loaded in the frontend so some charts can be displayed.
For the most part this approach has been fine. However, when I upload really large files (which is fine because I am chunk uploading), the saved JSON file is then huge. This becomes a problem because it is too big for the frontend to load. A single file can contain hundreds of thousands of rows of data.
So my question is really about what other options I have. Instead of saving the response as a JSON file, is it okay to save these 300k+ rows to a MySQL database, or is this too much for it to handle? Should I use something like MongoDB instead?
I am thinking that a database is probably the best route as I can then query the specific data I need for each chart, without having to load a huge file just to extract it. I do use the other data within the file, but this can be queried on an event so not as worried about this.
My concern is that if just 3 people upload a file, I could then have a database with over a million rows. How can I then scale this, and how can I ensure I won't have issues as more users use the system?
Any advice on this would be greatly appreciated. I am starting to think that I may need to deploy a database server per user and auto prune their data every X days.
Thanks
300k rows is not a problem. I support MySQL databases with several billion rows in a single table. You may need to upgrade to a larger, more powerful server, but MySQL can store it.
That said, you're on the right track thinking about how much data you want to retain and how much you need to prune.
Scalability depends more on optimizing the queries you need to run, not just on the number of rows you store.
So you should test the queries your app runs, and see how many rows it takes before you see performance decline. Try to optimize the queries by the usual techniques like adding indexes, or rewriting the SQL logic, or partitioning the data, etc. (many techniques, too much to get into in a Stack Overflow answer).
If you have optimized the queries as much as you can, then you need to start pruning data, or splitting data over multiple MySQL instances.
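As a rough illustration of "add an index, then check what the query does", here is a small sketch. It uses SQLite from Python's standard library purely as a stand-in, since it runs without a server; MySQL has its own EXPLAIN statement with different output. The results table, its columns and the metric names are hypothetical, not taken from the question.

```python
import sqlite3

CHART_QUERY = (
    "SELECT x, y FROM results WHERE upload_id = ? AND metric = ? ORDER BY x"
)


def build_demo_db(uploads=3, rows_per_upload=1000):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results ("
        " id INTEGER PRIMARY KEY,"
        " upload_id INTEGER NOT NULL,"
        " metric TEXT NOT NULL,"
        " x REAL NOT NULL,"
        " y REAL NOT NULL)"
    )
    # Made-up data: two metrics per upload, alternating row by row.
    rows = (
        (u, "revenue" if i % 2 == 0 else "cost", float(i), float(i * u))
        for u in range(1, uploads + 1)
        for i in range(rows_per_upload)
    )
    conn.executemany(
        "INSERT INTO results (upload_id, metric, x, y) VALUES (?, ?, ?, ?)",
        rows,
    )
    # Composite index matching the WHERE clause of the chart query.
    conn.execute(
        "CREATE INDEX idx_results_upload_metric ON results (upload_id, metric)"
    )
    conn.commit()
    return conn


def chart_points(conn, upload_id, metric):
    # Fetch only the rows one chart needs instead of loading a whole file.
    return conn.execute(CHART_QUERY, (upload_id, metric)).fetchall()


def query_plan(conn, upload_id, metric):
    # The last column of each plan row is a human-readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + CHART_QUERY, (upload_id, metric))
    return [row[-1] for row in rows]
```

If the plan text mentions idx_results_upload_metric, the query is using the index rather than scanning the table. Repeating the same check against realistic data volumes is the kind of testing the answer above is suggesting.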

Database Design: When should I use multiple database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Background:
I'm trying to build the backend services of an app. This app has rooms that a user can join. When a user joins a room, s/he can exchange some data with other users through a socket. Outside of the room, the user can view the processed data from the transactions that happened inside the rooms s/he joined. The room list, room information, and the data transactions inside the room should be stored in a database.
My idea is to create a project with one database; however, an experienced developer suggested splitting the database into two:
One that uses MongoDB for storing data transactions happening inside the room.
One that uses MySQL for storing and returning room list, room information and analytics of what happened inside a room.
Problem I see with using multiple databases:
I did some research and, from what I understand, using multiple databases is not recommended but could be done if the data are unrelated. Displaying analytics will need to process data transactions that happened inside a room and also display the room's information. If I use the two-database approach, I will need to retrieve data from both databases in order to achieve this.
Question:
I personally think it's easier to use a single database since I don't see the data inside and outside of the room as 'unrelated'. Am I missing an important point on when to use multiple databases?
Thanks in advance. Have a good day.
You can look at this problem from two perspectives; technical and practical.
Technically, if your back-end is expected to become very complex or scaled, it is recommended to break it down into multiple microservices, each in charge of a tiny task. Ideally, these small services should help you achieve separation of concerns, so each service only works with one piece of the data. If other services need to read and/or modify that piece of data, they have to go through the service in charge.
In this case, depending on the data each service is dealing with, you can pick the proper database. For instance, if you have transactional data, you can use MySQL, MongoDB for large schemaless content, or Elasticsearch if you want to perform a text search.
Practically, it is recommended to start small with one service and database (if you prefer to develop your app sooner), and then, over time break it down into multiple services as you need to add and/or improve features.
There are multiple points to keep in mind. First, if you expect to have a large user base, you should start development with the right architecture from the beginning to avoid scaling issues. Second, sometimes one database cannot perform the task you need. For example, it would be very inefficient to do a text search in MySQL. Finally, there is no absolute right way of doing things. However you do one thing, another friend might show up tomorrow and ask you why you did not do it his/her way. The most important thing is to start doing and then, learning and improving along the way. Facebook was started with MySQL and it worked fine initially. Could it survive one database type today? I suspect the answer is no, but would it have made sense for them to add all the N databases that they have now back then?
Good luck developing!
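One way to keep the "start with one database, split later" option open is to have the rest of the code talk to small store interfaces, so only the service in charge knows which database sits behind each one. The sketch below is an assumption-laden illustration in Python: the names RoomStore, EventStore and room_summary are hypothetical, and the in-memory classes stand in for whatever MySQL- or MongoDB-backed implementations you would actually write.

```python
from typing import Dict, List, Protocol


class RoomStore(Protocol):
    def room_name(self, room_id: int) -> str: ...


class EventStore(Protocol):
    def events_for(self, room_id: int) -> List[dict]: ...


class InMemoryRoomStore:
    """Stand-in for the store that owns room list and room information."""

    def __init__(self) -> None:
        self._rooms: Dict[int, str] = {}

    def add_room(self, room_id: int, name: str) -> None:
        self._rooms[room_id] = name

    def room_name(self, room_id: int) -> str:
        return self._rooms[room_id]


class InMemoryEventStore:
    """Stand-in for the store that owns in-room data transactions."""

    def __init__(self) -> None:
        self._events: Dict[int, List[dict]] = {}

    def record(self, room_id: int, payload: dict) -> None:
        self._events.setdefault(room_id, []).append(payload)

    def events_for(self, room_id: int) -> List[dict]:
        return list(self._events.get(room_id, []))


def room_summary(rooms: RoomStore, events: EventStore, room_id: int) -> dict:
    # Analytics reads through both interfaces and never touches a database
    # directly, so either store can be moved to another engine later.
    return {
        "room": rooms.room_name(room_id),
        "event_count": len(events.events_for(room_id)),
    }
```

With this shape, both stores can start out backed by the same single database, and moving the event data elsewhere later only means writing a new EventStore implementation.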

Planning for database scaling and schema changes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm doing research before I create my social network database and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies have revamped their databases to account for the insane scaling they go through.
I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1m users (due to unknowns such as how people will use it, how often people post, comment, etc.). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL, and others very difficult, especially with no data to tell me which relationships and data are most important. I believe I can create a solid system using a graph that can support up to 10k users, but I'm worried about potentially having to recreate the database completely as the system grows. Is this something to worry about now to avoid issues, or is it an "implement now and adapt later" type of problem?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?
I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling, it is to come to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about a social network with 200 times more users than GitHub itself (GitHub has ~5 million users).
Also, even if you plan ahead, you will definitely refactor again and again over the years, and you will end up with more than one persistence layer; be sure of it.
Code and code good, stay lean, remain able to change quickly, deploy, show to users, refactor, test, deploy and show to users in the same day. These are the things you need to do now, not asking questions about a problem you don't have yet, you definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment, you might need to accept that there are questions we simply cannot answer, because we don't know your exact requirements.
I have a simple app which uses 4 persistence layers, and this app is not yet online. I'll give you my "why" for each one and its use case:
Neo4j: it is the core of the application data. I use it because I love it, I know it very well (it is my job) and, as the concept of the app is quite new and can evolve rapidly, having a schemaless db saves a lot of refactoring. Also, building the app keeps surfacing new use cases, which makes Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for user accounts and profiles. Why? Because the framework I use already has a lot of bundles integrating this kind of stuff in a couple of lines of code, the bundles are well maintained, and if I used Neo4j for it right now, I would have to reinvent the wheel. Also, all the modules I use evolve in stability and compatibility with the framework.
Of course the MySQL data is coupled (minimally) with the Neo4j data. But I know that this kind of data will not evolve that much, so MySQL is a good choice, and in case I have to refactor some points, this will not be a huge pain.
Redis
I use Redis for storing analytics data. Redis is quite flexible, and I can easily create new keys and add data on top of it.
RabbitMQ:
I use a lot of message queues. Why? For testing refactoring. I can easily process messages with multiple consumers while the app is running, which lets me test changes, new features and multiple database layers.
You will refactor! Just try to keep it as simple as possible.
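The "multiple consumers" idea can be sketched without a broker. The Python below is only an in-process stand-in for what a RabbitMQ fanout exchange would do (deliver each message to every bound consumer); the class names FanoutBus, CurrentStore and CandidateStore are hypothetical and not part of the app described above.

```python
from collections import defaultdict


class FanoutBus:
    """In-process stand-in for a broker: every subscriber sees every message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            # Each consumer gets its own copy so they cannot affect each other.
            handler(dict(message))


class CurrentStore:
    """The persistence layout the app uses today: full text per user."""

    def __init__(self):
        self.posts_by_user = defaultdict(list)

    def handle(self, message):
        self.posts_by_user[message["user"]].append(message["text"])


class CandidateStore:
    """Experimental layout under test: keeps only a per-user post count."""

    def __init__(self):
        self.post_counts = defaultdict(int)

    def handle(self, message):
        self.post_counts[message["user"]] += 1
```

Subscribing both stores to the same bus feeds the candidate layout with live traffic while the current one keeps serving the app, which is roughly how a second consumer lets you try a refactoring before committing to it.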

Which database should I choose? MySQL or mongoDB? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm working on a project which is somewhat similar to WhatsApp, except that I'm not implementing the chat function.
The only thing I need to store in the database is user's information, which won't be very large, and I also need an offline database in the mobile app which should be synced with the server database.
Currently I use MySQL for my server database, and I'm thinking of using JSON for the syncing between the mobile app and the server. Then I found that MongoDB has natural support for JSON, which made me wonder whether I should switch to MongoDB.
So here are my questions:
Should I change to mongoDB or should I still use MySQL? The data for each user won't be too large and it does have some requirement for data consistency. But mongoDB's JSON support is somewhat attractive.
I'm not familiar with the syncing procedure. I did some digging and it appears that JSON is a good choice, but what data should I put into the JSON files?
I actually flagged this as too broad and as attracting primarily opinion based answers but I'll give it a go anyhow and hope to stay objective.
First of all you have 2 separate questions here.
What database system should I use.
How do I sync between app and server.
The first one is easily answered because it doesn't really matter. Both are good options for storing data. MySQL is mature and stable, and MongoDB, although it's newer, has very good reviews, and I don't know of any problems which would prevent it from being used. So take the database which you find easy to use.
Now for the second, I'll first put in a disclaimer: entire books have been written about data synchronization between multiple entities, and after all this time it is still the subject of PhDs.
I would advise against directly synchronizing between the mobile app and the database, because that requires the database credentials to be contained within the app. Mobile apps can and will be decompiled and the credentials extracted, which would compromise your entire database. So you'll probably want to create some API which first does device/user authentication and then changes the database.
This already means that using MongoDB just for the sake of syncing JSON would probably be a bad idea.
Now JSON itself is just a format of representing data with some structure, just as XML. As such it's not a method of synchronization but transport.
For synchronizing data it's important that you know the source of truth.
If you have 1 device <-> 1 record it's easy because the device will be the source of truth, after all the only mutations that take place are presumably done by the user on the device.
If you have n devices <-> 1 record then it becomes a whole lot more annoying. If you want to allow a device to change the state when offline you'll need to do some tricks to synchronize the data when the device comes back online. But this is probably a question too complex and situation dependent to answer on SO.
If you however force the device to always immediately propagate changes to the database then the database will always contain the most up to date record, or truth. Downside is that part of the app will not be functional when offline.
If offline updates don't change the state but merely add new records then you can push those to the server when it comes online. But keep in mind you won't be able to order these events.
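The last case (offline updates that only add records) can be sketched as an outbox on the device plus a server that accepts each record once. This Python sketch is illustrative only: the Outbox and Server classes, the JSON shape with "id" and "payload", and the send callback are all assumptions of mine, and the authentication step the API would need is left out.

```python
import json
import uuid


class Outbox:
    """Client side: queue new records while offline, push them later."""

    def __init__(self):
        self._pending = []

    def add(self, payload):
        # A client-generated id lets the server recognise retries.
        record = {"id": str(uuid.uuid4()), "payload": payload}
        self._pending.append(record)
        return record["id"]

    def flush(self, send):
        # "send" takes a JSON string and returns True once the server
        # has accepted it; anything not accepted stays queued.
        still_pending = []
        for record in self._pending:
            if not send(json.dumps(record)):
                still_pending.append(record)
        self._pending = still_pending
        return len(self._pending)


class Server:
    """Server side: store each record id once, so resends are harmless."""

    def __init__(self):
        self.records = {}

    def receive(self, body):
        record = json.loads(body)
        self.records.setdefault(record["id"], record["payload"])
        return True
```

Note that the server only learns the order in which records arrive, not the order in which they were created on different devices, which is the ordering limitation mentioned above.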

I would like to create a database with the goal of populating this database with comprehensive inventory information obtained via a shell script [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I would like to create a database with the goal of populating this database with comprehensive inventory information obtained via a shell script from each client machine. My shell script currently writes this information to a single csv file located on a server through an ssh connection. Of course, if this script were to be run on multiple machines at once it would likely cause issues as each client potentially would try to write to the csv at the same time.
In the beginning, the inventory was all I was after; however, after more thought I began to ponder whether much more could be possible once I had gathered this information. If I had this information in a database, I might be able to use it to initialize other processes based on the information for a specific machine or group of "like" machines. It is important to note that I am already managing a multitude of processes by identifying specific machine information. However, pulling that information from a database after matching a unique identifier could (in my mind) greatly improve efficiency. It would also allow for a more server-side approach, cutting down on the majority of client-side scripting. (Instead of gathering this information from the client machine on the startup of each client, I would already have it in a central database, allowing a server to use the information and kick off specific events.)
I am completely foreign to SQL and am not certain if it is 100% necessary. Is it necessary? For now I have decided to download and install both PostgreSQL and MySQL on separate Macs for testing. I am also fairly new to stackoverflow and apologize upfront if this is an inappropriate question or style of question. Any help including a redirection would be appreciated greatly.
I do not expect a step by step answer by any means, rather am just hoping for a generic "proceed..." "this indeed can be done..." or "don't bother there is a much easier solution."
As I come from the PostgreSQL world, I highly recommend using it for its strong enterprise-level features and high standards compliance.
I always prefer to have a database for each project that I'm doing for the following benefits:
Normalized data is easier to process and build reports on;
Performance of database queries will be much better due to the caching done by the DB engine, indexes on your data, optimized query paths;
You can greatly improve machine data processing by using SQL/MED, which allows querying external data sources from the database directly. You can have a look at the Multicorn project and the examples they provide.
Should it be required to deliver any kinds of reports to your management, DB will be your friend, while doing this outside the DB will be overly complicated.
In short: go for the database!
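To give a feel for what replacing the shared CSV with a table looks like, here is a minimal sketch. It uses SQLite through Python's standard library only because it needs no setup; with many clients reporting over the network you would point the same idea at a server database such as PostgreSQL. The machines table, its columns and the function names are hypothetical.

```python
import sqlite3


def open_inventory(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS machines ("
        " serial TEXT PRIMARY KEY,"      # the unique identifier per machine
        " hostname TEXT NOT NULL,"
        " model TEXT NOT NULL,"
        " os_version TEXT NOT NULL)"
    )
    return conn


def report(conn, serial, hostname, model, os_version):
    # INSERT OR REPLACE overwrites the previous row for the same serial,
    # so a machine that reports again updates its entry instead of
    # appending a duplicate line as a CSV would.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO machines VALUES (?, ?, ?, ?)",
            (serial, hostname, model, os_version),
        )


def machines_like(conn, model):
    # Find a group of "like" machines, here defined as sharing a model.
    rows = conn.execute(
        "SELECT hostname FROM machines WHERE model = ? ORDER BY hostname",
        (model,),
    )
    return [row[0] for row in rows]
```

A server-side job could then call machines_like(conn, some_model) to decide which events to kick off for a group of machines, instead of each client gathering the same facts at startup.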