To limit per-user resources, I create a new DB user on each registration. The thing is, there can be millions of users signing up, and I have no idea about the maximum number of database users that can be created on MySQL.
How many users can be created on MySQL?
There is no hard-coded limit on the number of users in a MySQL database. User accounts are stored as rows in tables, consuming some variable amount of memory and disk space. Although you could in theory add an unlimited number of users, you will eventually hit resource boundaries: disk space, memory use, and the processing time needed to add new users will grow too long.
The exact limit depends on the configuration settings of the MySQL database.
I am running a finance-related web portal which involves millions of debit/credit transactions in MySQL. Getting the balance of a specific user becomes slow once the table reaches around 3 million rows.
Now I am thinking of creating a separate MySQL container for each user, recording only that user's transactions in each container, and I am sure it will be fast to calculate any user's balance.
I have around 20 thousand users and I want to know: is it practical to create a separate MySQL container for each user, or should I go for another approach? Thanks
I would not recommend a separate MySQL instance per user.
I operated MySQL in docker containers at a past job. Even on very powerful servers, we could run only about 30 MySQL instances per server before running out of resources. Perhaps a few more if each instance is idle most of the time. Regardless, you'll need hundreds or thousands of servers to do what you're describing, and you'll need to keep adding servers as you get more users.
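To put rough numbers on it (the ~30-instances-per-server figure is from my experience above; the 20k users are from the question):

```python
import math

users = 20_000               # one MySQL instance per user, as proposed
instances_per_server = 30    # rough per-server capacity I observed

servers_needed = math.ceil(users / instances_per_server)
print(servers_needed)  # 667 servers, before any growth or redundancy
```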
Have you considered how you will make reports if each user's data is in a different MySQL instance? It will be fine to make a report about any individual user, but you probably also need reports about aggregate financial activity across all the users. You cannot run a single query that spans MySQL instances, so you will have to do one query per instance and write custom code to combine the results.
You'll also have more work when you need to do backups, upgrades, schema changes, error monitoring, etc. Every one of these operations tasks will be multiplied by the number of instances.
You didn't describe how your data is organized or which specific queries you run, but there are techniques to optimize queries that don't require splitting the data into multiple MySQL instances: indexing, caching, partitioning, or upgrading to a more powerful server. Look into those optimization techniques before you split up your data; otherwise you'll just end up with thousands of little instances that are all poorly optimized.
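As a minimal sketch of the indexing idea (SQLite stands in for MySQL here; the table and column names are made up for illustration):

```python
import sqlite3

# SQLite stands in for MySQL; the principle (index on user_id) is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (user_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(uid % 100, 10) for uid in range(10_000)],
)

# Without this index the SUM below is a full table scan;
# with it, the engine reads only one user's rows.
conn.execute("CREATE INDEX idx_user ON transactions (user_id)")

(balance,) = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE user_id = ?", (42,)
).fetchone()
print(balance)  # 100 rows * 10 = 1000
```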
I have around 20 thousand users and I want to know is it practical to create a separate MySQL container for each user
No, definitely not. While Docker containers are relatively lightweight, 20k of them is a lot, and they will require a lot of extra resources (memory, disk, CPU).
getting the balance of a specific user, i.e. with 3 million rows, becomes slow.
There are several things you can try to do.
First of all, try to optimize the database/queries (this can be combined with vertical scaling, i.e. using a more powerful server for the database)
Enable replication (if not already) and use secondary instances for read queries
Use partitioning and/or sharding
I know this is sacrilegious, but for a table like that I like to use two tables. (The naughty part is the redundancy.)
History -- details of all the transactions.
Current -- the current balance
You seem to have just History, but frequently need to compute the Current for a single user. If you maintain this as you go, it will run much faster.
Further, I would do the following:
Provide Stored Procedure(s) for all actions. The typical action would be to add one row to History and update one row in Current.
Never UPDATE or DELETE rows in History. If a correction is needed, add another row with, say, a negative amount. (This, I think, is "proper" accounting practice anyway.)
Once you have made this design change, your question becomes moot. History won't need to have frequent big scans.
Use InnoDB (not MyISAM).
Another thing that may be useful -- change the main indexes on History from
PRIMARY KEY(id),
INDEX(user_id)
to
PRIMARY KEY(user_id, id), -- clusters a user's rows together
INDEX(id) -- this keeps AUTO_INCREMENT happy
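A minimal sketch of the two-table design, with SQLite standing in for MySQL and a Python function standing in for the stored procedure (all names are illustrative; SQLite can't cluster on (user_id, id) the way InnoDB does, so a secondary index approximates it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      user_id INTEGER, amount INTEGER);
CREATE INDEX idx_hist_user ON history (user_id, id);
CREATE TABLE current (user_id INTEGER PRIMARY KEY, balance INTEGER);
""")

def post_transaction(user_id, amount):
    """The 'stored procedure': one INSERT into History, one upsert of Current."""
    with conn:  # single transaction so the two tables never disagree
        conn.execute("INSERT INTO history (user_id, amount) VALUES (?, ?)",
                     (user_id, amount))
        conn.execute("""INSERT INTO current (user_id, balance) VALUES (?, ?)
                        ON CONFLICT(user_id) DO UPDATE
                        SET balance = balance + excluded.balance""",
                     (user_id, amount))

post_transaction(1, 100)
post_transaction(1, -30)   # a correction is a new row, never an UPDATE of History

(balance,) = conn.execute(
    "SELECT balance FROM current WHERE user_id = 1").fetchone()
print(balance)  # 70, read from one row instead of scanning History
```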
So this is very much a conceptual question (as much as I'd love to build a billion user app I don't think it's going to happen).
I've read the article by Pinterest on how they scaled their MySQL fleet a number of times ( https://medium.com/@Pinterest_Engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f ) and I still don't get how they would "open up new shards" without affecting existing users.
The article states that every table is on every shard, including the User table.
So I'm assuming that when a user registers and they are assigned a random shard, this has to be done via a function that will always return the same result regardless of the number of shards.
e.g. if I sign up with test@example.com they would potentially use that email to work out the shard id, and this would have to take into consideration the number of currently 'open' shards. My initial assumption was that they would use something like the mod shard they mention later on in the article, e.g.
md5($email) % number_of_shards
But as they open up the number of shards it would change the function result.
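To illustrate my concern, here is a quick sketch (the email address is made up) of why the result shifts when the shard count changes:

```python
import hashlib

def shard_for(email: str, number_of_shards: int) -> int:
    # md5($email) % number_of_shards, as above
    return int(hashlib.md5(email.encode()).hexdigest(), 16) % number_of_shards

email = "test@example.com"
print(shard_for(email, 8))   # shard while 8 shards are open
print(shard_for(email, 16))  # often a different shard once 16 are open
```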
I then thought perhaps they had a separate DB to hold purely user info for authentication purposes and this would also contain a column with the assigned shard_id, but as I say the article implies that even the user table is on each shard.
Does anyone else have any ideas or insights into how something like this might work?
You are sharding on "user", correct? I see 3 general ways to split up the users.
The modulo approach to sharding has a big problem. When you add a shard, suddenly most users need to move to a different shard.
At the other extreme (from modulo) is the "dictionary" approach. You have some kind of lookup that says which shard each user is on. With millions of users, maintenance of the dictionary becomes a costly headache.
I prefer a hybrid:
Do modulo 4096 (or some suitably large number)
Use a dictionary with 4096 entries. This maps 4096 values into the current number of shards.
You have a package to migrate users from one shard to another. (This is a vital component of the system: you will use it for upgrades, serious crashes, load balancing, etc.)
Adding a shard involves moving a few of the 4096 to the new shard and changing the dictionary. The users to move would probably come from the 'busiest' shards, thereby relieving the pressure on them.
Yes, item 4 impacts some users, but only a small percentage of them. You can soften the blow by picking 'idle' or 'small' or 'asleep' users to move. This would involve computing some metric for each of the 4096 clumps.
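A sketch of that hybrid, assuming md5 for the hash and an initial even split across two shards (all specifics are illustrative):

```python
import hashlib

BUCKETS = 4096  # fixed forever; only the bucket -> shard mapping ever changes

def bucket_for(user_key: str) -> int:
    return int(hashlib.md5(user_key.encode()).hexdigest(), 16) % BUCKETS

# The "dictionary": starts with two shards, buckets split evenly.
bucket_to_shard = {b: b % 2 for b in range(BUCKETS)}

def shard_for(user_key: str) -> int:
    return bucket_to_shard[bucket_for(user_key)]

# Adding shard 2: migrate the users in a few busy buckets, then repoint
# just those dictionary entries. Every other user stays where they are.
for b in (0, 1, 2, 3):
    bucket_to_shard[b] = 2

moved = sum(1 for b in bucket_to_shard if bucket_to_shard[b] == 2)
print(moved, "of", BUCKETS, "buckets repointed")  # 4 of 4096
```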
Situation: I'm trying to create a page which has members that are allowed to view its content, and each member is defined by an ID. To access the information for the page, I want to make sure the user is a member of the page.
Question: Is it faster to have one large record that has to be traversed to look up IDs, or one indexed field spread across a huge number (billions) of records that are each very small?
RDBMS servers like MySQL are optimized for the lookup of a few short records from among many. If you structure your data that way and index it correctly, you'll get decent performance.
On the other hand, long lists of data requiring linear search will not scale up well.
By the way, plan for tens of millions of records, not billions. If you actually need to handle billions, you'll probably have the resources to scale up your system. In the meantime RAM and disk storage are still on a Moore's-law cost curve, even if CPUs aren't.
That scenario would typically be implemented with three tables. A simplified version would look something like this:
users(id, name, password, ... )
pages(id, title, .... )
page_access(user_id, page_id, permission)
where user_id and page_id are references to the users and pages tables respectively. In your application you would then check the current page and user against the page_access table to see if permission should be granted or denied.
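The membership check could be sketched like this (SQLite standing in for MySQL; the sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE page_access (user_id INTEGER, page_id INTEGER, permission TEXT,
                          PRIMARY KEY (user_id, page_id));
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO pages VALUES (10, 'safety manual')")
conn.execute("INSERT INTO page_access VALUES (1, 10, 'read')")

def may_view(user_id, page_id):
    # The composite primary key makes this a single indexed point lookup,
    # however many membership rows the table holds.
    row = conn.execute(
        "SELECT permission FROM page_access WHERE user_id = ? AND page_id = ?",
        (user_id, page_id)).fetchone()
    return row is not None

print(may_view(1, 10))  # True
print(may_view(2, 10))  # False
```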
MySQL has no trouble processing millions upon millions of well indexed rows, so long as you have sufficient hardware powering it. You want to aim for normalised data, not cramming everything into a single row in a single table.
I have a design question for MySQL. As a side project, I am attempting to create a cloud-based safety management system. In the most basic terms, a company will subscribe to the service, which will manage company document records as blobs, corrective actions, employee information, and audit results.
My initial design concept was to have a separate DB for each company.
However, the question I have is if user access control is secure, would it be ok to have all the companies under one DB? What are the pitfalls of this? Are there any performance issues to consider? For identifying records, would it be a compound key of the company and referenceID number unique for each company? If so when generating a reference number for a record of a company, would it slow down as the record set increases?
In terms of constraints, I would expect up to 2000 companies and initially a maximum of 1000 records per company, growing at 5% per year. I expect a maximum of 2 GB of blob storage per company, growing at 10% per year. The system is to run on one cloud server, whether as multiple DBs or one big one.
Any thoughts on this would be appreciated.
If there is not much inter-company interaction or need for frequent overall statistics, and you don't plan to make application updates every week or so that would impact the DB structure, I'd go with a separate DB (and DB user) for each company. It's more scalable, less prone to user-access bugs, and makes some operations, such as removing a company, easier.
On the other hand, 2 million entries is not such a big deal, and if you plan to develop the application further, keeping it all in one DB could be the better approach.
You have two questions: performance and security.
If you use the same MySQL user, security will not differ from one option to the other.
If you need performance, you can get the same results running one database or multiple (see for instance MySQL partitioning).
But there are other things you should consider, like how easy it will be to have one database for your website... or how easy it would be to have one database per user.
In fact, here is my answer: considering the size of your data, the two options perform roughly the same for your needs, so make the choice that will make your life easy.
We have a website with many users. To track users who transacted on a given day, we use Redis and store a binary number as each value. For instance, if our system had five users, and users 2 and 5 transacted on 2nd January, our key for 2nd January would look like '01001'. This also helps us determine unique users over a given period, and new users, using simple bit operations. However, with a growing number of users, we are running out of memory to store all these keys.
Is there any alternative database that we can use to store the data in a similar manner? If not, how should we store the data to get similar performance?
Redis' memory usage can be affected by many parameters, so I would also try looking at INFO ALL for starters.
With every user represented by a bit, 400K daily visitors should take at least 50KB per value, but due to sparsity in the bitmap index that could be much larger. I'd also suspect that since newer users are more active, the majority of your bitmaps' "active" flags are towards its end, causing it to reach close to its maximal size (i.e. total number of users). So the question you should be trying to answer is how to store these 400K visits efficiently w/o sacrificing the functionality you're using. That actually depends what you're doing with the recorded visits.
For example, if you're only interested in total counts, you could consider using the HyperLogLog data structure to count your transacting users with a low error rate and small memory/resources footprint. On the other hand, if you're trying to track individual users, perhaps keep a per user bitmap mapped to the days since signing up with your site.
Furthermore, there are bitmap compression techniques that you could consider implementing in your application code/Lua scripting/hacking Redis. The best answer would depend on what you're trying to do of course.
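For reference, the per-day bitmap scheme from the question can be sketched with plain Python integers (Redis does the equivalent with SETBIT, BITOP, and BITCOUNT; the dates and user IDs below are made up):

```python
# Each day's key is one big integer; bit i set means user i+1 transacted.
def set_bit(bitmap: int, user_id: int) -> int:
    return bitmap | (1 << (user_id - 1))

# Users 2 and 5 transacted on 2nd January, as in the question's '01001'.
jan2 = 0
for user in (2, 5):
    jan2 = set_bit(jan2, user)

jan3 = set_bit(0, 3)              # user 3 transacted on 3rd January

unique_over_period = jan2 | jan3  # OR across days = distinct active users
print(bin(unique_over_period).count("1"))  # 3 distinct users
```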