Can I use user ID as account number? - MySQL

I have fewer than a million users. I use MySQL, and the table auto-increments the ID starting from 100000 (6 digits). Is there any problem if I use the user ID as the account number for a small web application? What is the best practice?

It's not entirely clear what an account number means here. In general you can use it, but in my opinion the user ID is technical information that should NOT get outside of the system to the customer (for security reasons). I suggest creating a GUID or some other generated (unpredictable) ID for each user and using that as the account number you give "outside". With this approach, no user can "predict" the ID of another user.
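As a minimal MySQL sketch of that idea (the table and column names are illustrative, not from the question):
CREATE TABLE users (
    user_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- internal key, never exposed
    account_number CHAR(36)     NOT NULL,                 -- outward-facing, unpredictable
    email          VARCHAR(255) NOT NULL,
    PRIMARY KEY (user_id),
    UNIQUE KEY uk_account_number (account_number)
) AUTO_INCREMENT = 100000;
INSERT INTO users (account_number, email)
VALUES (UUID(), 'alice@example.com');  -- UUID() generates a non-sequential identifier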

Generally, I would decouple business identifiers from technical database keys, even if the overlap is evident. The way key generation happens in databases can leave gaps, and your app may not produce keys at the time they are needed. A simple key generator (synchronized, perhaps) may be better. That said, the related discussions below give you more to ponder.

Related

Suitability of AWS Cognito Identity ID for SQL primary key

I am working on a platform where unique user IDs are Identity IDs from an Amazon Cognito identity pool, which look like this: "us-east-1:128d0a74-c82f-4553-916d-90053e4a8b0f"
The platform has a MySQL database that has a table of items that users can view. I need to add a favorites table that holds every favorited item of every user. This table could possibly grow to millions of rows.
The layout of the 'favorites' table would look like so:
userID, itemID, dateAdded
where userID and itemID together are a composite primary key.
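In rough DDL terms, that would be something like this (the VARCHAR length is my guess):
CREATE TABLE favorites (
    userID    VARCHAR(64) NOT NULL,   -- Cognito Identity ID, e.g. 'us-east-1:128d0a74-...'
    itemID    INT UNSIGNED NOT NULL,
    dateAdded DATETIME NOT NULL,
    PRIMARY KEY (userID, itemID)
);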
My understanding is that this type of userID (practically an expanded UUID that needs to be stored as a CHAR or VARCHAR) gives poor indexing performance, so using it as a key or index for millions of rows is discouraged.
My question is: Is my understanding correct, and should I be worried about performance later on due to this key? Are there any mitigations I can take to reduce performance risks?
My overall database knowledge isn't that great, so if this is a large problem: would moving the favorites list to a NoSQL table (where the userID as a key would allow constant access time), and retrieving an array of favorited item IDs to use in a SELECT ... WHERE IN query, be an acceptable alternative?
Thanks so much!
OK, so here I want to cover why this is not good, an alternative, and the read/write workflow of your application.
Why not: this is not a good architecture, because if something happens to your Cognito user pool, you cannot repopulate it with the same IDs for each individual user. Moreover, Cognito is offered in more regions now compared to last year. Say your user base is in Indonesia and Cognito becomes available in Singapore: you want to move your user pools from Tokyo to Singapore because of latency. Now you not only have the problem of moving the users, you also have the problem of repopulating your database. So this approach lacks scalability and maintainability, and it breaks the single-responsibility principle (updating Cognito requires you to update the DB, and vice versa).
Alternative solution: leave the DB index to the DB domain, and use the username as the link between your DB and your Cognito user pool. So:
The read workflow will be:
User authentication: the user authenticates and gets a token.
Your app verifies the token and gets the username from its payload.
Your app contacts the DB and gets the user's information based on the username.
Your app brings the user to their page and provides the information stored in the database.
The write workflow will be:
Your app gets the write request from the user along with the token.
Your app verifies the token.
Your app writes to the database based on the unique username.
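In schema terms, the link might look like this minimal sketch (column names and lengths are my own):
CREATE TABLE users (
    user_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- key stays in the DB domain
    username VARCHAR(128) NOT NULL,                 -- link to the Cognito user pool
    PRIMARY KEY (user_id),
    UNIQUE KEY uk_username (username)
);
-- Read step: look the user up by the username taken from the verified token.
SELECT user_id FROM users WHERE username = 'alice';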
Regarding MySQL: if you use the UserID and CognitoID composite as the primary key, it has a negative impact on query performance and is therefore not recommended for a large dataset.
However, using this (or even just the UserID) as a key is more suitable for NoSQL DynamoDB, unless you have complex queries. You can also enforce security with DynamoDB's fine-grained access control, connected with Cognito Identity Pools.
While Cognito itself has some issues, which are discussed in this article (there are too many to list), it's a terrible idea to use Cognito and then create a completely separate user ID to use as a PK. First of all, that ID is also going to be a CHAR or VARCHAR, so it doesn't actually help. Additionally, you now have extra complexity to deal with an imaginary problem. If you don't like what Cognito is giving you, then either pair it with another solution or replace it altogether.
Don't over-engineer your solution to solve a trivial case that may never come up. Use the Cognito userId because you use Cognito. 99.9999% of the time this is all you need, and it will support your use case.
Specifically, this SO post explains that there are zero problems with your approach:
There's nothing wrong with using a CHAR or VARCHAR as a primary key.
Sure, it'll take up a little more space than an INT in many cases, but there are many cases where it is the most logical choice and may even reduce the number of columns you need, improving efficiency, by avoiding the need for a separate ID field.

Database design: Using hundreds of fields for small values

I'm planning to develop a PHP web app that will mainly be used by registered users (sessions).
While thinking about the DB design, I was contemplating that, in order to give the best possible user experience, there would be lots of options for the user to activate, deactivate, specify, etc.
For example:
- Options for each layout elements, dialog boxes, dashboard, grid, etc.
- color, size, stay visible, invisible, don't ask again, show every time, advanced mode, simple mode, etc.
This would add up to hundreds of fields, ranging from simple yes/no flags to 1-to-N values, for each user.
So, is having a field for each of these options the way to go?
Or how do CRMs, CMSs, and other web apps store lots of 1-2 character values?
Do they group them in text fields separated by a special character and then "explode" them into an array at runtime?
thank you
How about something like this:
CREATE TABLE settings (
    user_id       INT,
    setting_name  VARCHAR(255),
    setting_value CHAR(2),
    PRIMARY KEY (user_id, setting_name)  -- one value per setting per user
)
That way, to store a configuration setting for a user, you can do:
INSERT INTO settings (user_id, setting_name, setting_value)
VALUES (1, 'timezone', '+8')
And when you need to query a setting for a particular user, you can do:
SELECT setting_value FROM settings
WHERE user_id = 1 AND setting_name = 'timezone'
I would absolutely be inclined to have individual fields for each option. My rule of thumb is that each column holds exactly one piece of data whenever possible. No more, no less. As was mentioned earlier, the ease of maintenance and the ability to add or drop options down the road far outweigh the pain in the arse of setting it up.
I would, however, put some thought into how you create the table(s). The idea mentioned earlier was to have a Settings table with 100 columns (one for each option) and one row for each user. That would work, to be sure. If it were me, I would be inclined to break it down a bit further. You start with a basic User table, of course, holding the basics of username, password, user ID, etc. That way you can use the numeric user ID as the key index for your Settings table(s).
After that, I would try to break the settings down into smaller tables based on logical usage, as sketched below. For example, if you have 100 options, and 19 of those pertain to how a user views / is viewed / behaves in one specific part of the site, say something like a forum, then break those out into a separate table, i.e. ForumSettings. Maybe there are 12 more that pertain to email preferences but would not be used in other areas of the site/app; now you have an EmailSettings table. Doing this would not only reduce the number of columns in your generic Settings table, it would also make writing queries for specific tasks or areas of the app much easier, speed up performance a tick, and make maintenance moving forward far less painful.
Some may disagree, as from a strictly data-modeling perspective I'm pretty sure the single Settings table would be indicated. But from a real-world perspective, I have never gone wrong using logical chunks such as this.
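A minimal sketch of that layout (all table and column names are invented for illustration):
CREATE TABLE Users (
    user_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(64)  NOT NULL,
    password VARCHAR(255) NOT NULL,
    PRIMARY KEY (user_id)
);
CREATE TABLE ForumSettings (
    user_id        INT UNSIGNED NOT NULL,
    show_avatars   TINYINT(1) NOT NULL DEFAULT 1,         -- a yes/no option
    posts_per_page TINYINT UNSIGNED NOT NULL DEFAULT 20,  -- a 1-to-N option
    PRIMARY KEY (user_id),
    FOREIGN KEY (user_id) REFERENCES Users (user_id)
);
CREATE TABLE EmailSettings (
    user_id      INT UNSIGNED NOT NULL,
    daily_digest TINYINT(1) NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id),
    FOREIGN KEY (user_id) REFERENCES Users (user_id)
);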
From a pure data-model perspective, that would be the clearest design (though awfully wide). Some might try to bitmask the options into a single field for assumed space savings, but the encode/decode logic makes that not worthwhile, in my opinion. You also lose the ability to index on them.
Another option (which I just saw posted) is to hold a separate table with an FK back to the user table. But then you have to iterate over the results to get the value you want to check.

What are the ramifications of having a ton of views in MySQL

I have a multi-tenant MySQL database. For most things I use a tenant-ID column in tables to discriminate, but for a few purposes it is much better to have a view that is already filtered by tenant ID. So, for example, I might have a view called 'quarterly_sales_view', but for tenant 30 I would have 'quarterly_sales_view_30' and for tenant 51 'quarterly_sales_view_51'. I create these views dynamically and everything is working great, but we have just a few tenants right now, and I realize this would never work for millions of tenants.
My question is, am I going to run into either performance problems or just hard limits with a few thousand, few hundred, or few dozen custom views?
ADDITIONAL INFO:
I am using a third-party (immature) tool that requires a table name (or view name, since it's read-only) and operates on that. In the context where it's working, I can't let it have access to the entire view, so I create another view that is simply defined as SELECT * FROM MasterView WHERE TenantId = 30. I recognize this is a workaround for a poor limitation: having to let the tool work on the table directly. Luckily the tool is open source, so I can tweak it to use a different approach. I just wanted an idea of how long I have before the current approach blows up.
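Concretely, the dynamically created views amount to something like this (following the naming scheme above):
CREATE VIEW quarterly_sales_view_30 AS
    SELECT * FROM MasterView WHERE TenantId = 30;
CREATE VIEW quarterly_sales_view_51 AS
    SELECT * FROM MasterView WHERE TenantId = 51;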
The primary concern in this question (IMO) should be less about performance and more about design. First, the number of views should not affect performance. But why do you need a view per tenant? Is it not possible to simply filter for a tenant by ID on a more generic view? E.g.:
SELECT * FROM vwMyTenants WHERE TenantId = 30
Whatever the reason, you should reconsider your approach, because this is a sign of a design smell.

Unique, numeric, incremental identifier

I need to generate unique, incremental, numeric transaction IDs for each request I make to a certain XML-RPC service. These numbers only need to be unique across my domain, but they will be generated on multiple machines.
I really don't want to have to keep track of this number in a database and deal with row locking etc. on every single transaction. I tried to hack this using a microsecond timestamp, but there were collisions with just a few threads; my application needs to support hundreds of threads.
Any ideas would be appreciated.
Edit: What if each transaction ID just has to be larger than the previous request's?
If you're going to be using this from hundreds of threads working on multiple machines, and you require an incremental ID, you're going to need some centralized place to store and lock the last generated ID number. This doesn't necessarily have to be in a database, but that would be the most common option. A central server that did nothing but serve IDs could provide the same functionality, but that probably defeats the purpose of distributing this.
If they need to be incremental, any form of timestamp won't be guaranteed unique.
If you don't need them to be incremental, a GUID would work. Potentially doing some type of merge of the timestamp + a hardware ID on each system could give unique identifiers, but the ID number portion would not necessarily be unique.
Could you use a pair of hardware ID + incremental timestamp? This would make each specific machine's IDs incremental, but the sequence would not necessarily be incremental across the entire domain.
---- EDIT -----
I don't think using any form of timestamp is going to work for you, for two reasons.
First, you'll never be able to guarantee that two threads on different machines won't generate an ID at exactly the same time, no matter what timer resolution you use. At a high enough resolution it would be unlikely, but not guaranteed.
Second, to make this work, even if you could resolve the collision issue above, you'd have to get every system to have exactly the same clock with microsecond accuracy, which isn't really practical.
This is a very difficult problem, particularly if you don't want to create a performance bottleneck. You say that the IDs need to be 'incremental' and 'numeric' -- is that a concrete business constraint, or one that exists for some other purpose?
If these aren't necessary, you can use UUIDs, which most common platforms have libraries for. They allow you to generate many (millions!) of IDs in very short timespans and be quite comfortable there will be no collisions. The relevant article on Wikipedia claims:
In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.
If you remove 'incremental' from your requirements, you could use a GUID.
I don't see how you can implement incremental across multiple processes without some sort of common data.
If you target a Windows platform, did you try the Interlocked API?
Google for GUID generators for whatever language you are using, and then convert the result to a number if you really need it to be numeric. It isn't incremental, though.
Or have each thread "reserve" a thousand (or a million, or a billion) transaction IDs and hand them out one at a time, reserving the next batch when it runs out. Still not really incremental.
I'm with the GUID crowd, but if that's not possible, could you consider using db4o or SQLite rather than a heavyweight database?
If each client can keep track of its own "next ID", it could talk to a central server and get a range of IDs, perhaps 1000 at a time. Once a client runs out of IDs, it talks to the server again.
This gives your system a central source of IDs while still avoiding a database round-trip for every single ID.
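A minimal MySQL/InnoDB sketch of that block-allocation idea (the table name is invented; the central server would run this once per requested range):
CREATE TABLE id_blocks (
    next_id BIGINT UNSIGNED NOT NULL
);
INSERT INTO id_blocks (next_id) VALUES (1);
-- Reserve a block of 1000 IDs in one short transaction:
START TRANSACTION;
SELECT next_id FROM id_blocks FOR UPDATE;
UPDATE id_blocks SET next_id = next_id + 1000;
COMMIT;
-- The client may now hand out [next_id, next_id + 999] locally.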

Database denormalization opportunity

I'm looking for a strategy for dealing with the repetitive problem of branching out tables. As a fictitious use case, say I have a table of users that contains their name, login, password, and other metadata. In this particular scenario, say the user is restricted to logging in from a specific subset of IP(s). Thus, we have a 1:M relationship. Every time a use case like this comes up, the normal workflow is to have a 'users' table and a table such as 'user_ips', where you'd have pk(ip_id), fk(user_id), and the IP on the user_ips side.
For similar situations, do you folks normally fan out in the fashion above? Is there an opportunity to denormalize effectively here? Perhaps store the IPs in a BLOB column in some CSV-delimited fashion? What strategies are you folks deploying today?
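For concreteness, the normalized pattern described above looks roughly like this (column types are my own guesses):
CREATE TABLE users (
    user_id  INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name     VARCHAR(100) NOT NULL,
    login    VARCHAR(64)  NOT NULL,
    password VARCHAR(255) NOT NULL,
    PRIMARY KEY (user_id)
);
CREATE TABLE user_ips (
    ip_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id INT UNSIGNED NOT NULL,
    ip      VARCHAR(45)  NOT NULL,  -- long enough for the IPv6 text form
    PRIMARY KEY (ip_id),
    KEY idx_user (user_id),
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);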
Opportunity to denormalize? I think you may have misunderstood conventional wisdom: denormalization is an optimization technique, not something you go out looking for.
I would suspect that any normalized solution, when the number of potential related items is large, is going to outperform a denormalized solution if properly indexed. My strategy is to normalize the database and then provide views or table-based functions that take advantage of indexed joins to make the cost bearable. I'd let performance demands dictate any move to a denormalized form.
Keep this in mind. If you need to implement role-based security access to parts of the information, table-based security is MUCH easier to implement than column-based, especially at the database or data layer level.
I would strongly suggest against putting multiple IP addresses in one field. Never mind 3NF; this breaks 1NF.
Tvanfsson is right that if you index the FK you'll get pretty comparable performance, unless there are going to be millions of records in the 'user_ips' table.
What's even better is that by keeping these tables normalized you can actually report on this information in the future, so that when users are confused about why they can't log in from certain LANs, writing the app (or SQL) to troubleshoot and do user-IP lookups will be A LOT easier.
One option would be to store your IP addresses as an XML string. I think this would be better than a comma-separated list, and it gives you the flexibility to add other elements to the string should you need them (port comes to mind) without database changes.
That said, I think the normalized approach is better in most cases.
As with any denormalization question, you need to consider the costs associated with it. In particular, if you list the IP addresses in the main table, how are you going to be able to answer the question "which users can be associated with IP address w.x.y.z?". With the fully normalized form, that is easy and symmetric with "which IP addresses can be associated with user pqr?". With denormalized forms, the questions have very different answers. Also, ensuring that the correct integrity rules are applied is much harder in the denormalized version, in general.
You may want to consider a user-attribute table and an attribute-type table, where you define what types of attributes a user can have. Each new use case would become an attribute type, and the data would simply be added to the user-attribute table.
With your example of the IP addresses, you would have an attribute type of IP and store the respective IPs in the user-attribute table. This gives you the flexibility to add another type, such as MAC address, without having to create a new table to support the new data type. For each new use case you do not have to add anything but data.
The downside is that your queries will be a little more complex given this attribute structure.
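A minimal sketch of that structure (all names invented for illustration):
CREATE TABLE attribute_types (
    attribute_type_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name              VARCHAR(64) NOT NULL,  -- e.g. 'IP', 'MAC'
    PRIMARY KEY (attribute_type_id)
);
CREATE TABLE user_attributes (
    user_id           INT UNSIGNED NOT NULL,
    attribute_type_id INT UNSIGNED NOT NULL,
    value             VARCHAR(255) NOT NULL,
    KEY idx_user_type (user_id, attribute_type_id),
    FOREIGN KEY (attribute_type_id) REFERENCES attribute_types (attribute_type_id)
);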
IMHO, it's all about cost/benefit analysis. Everything depends on the requirements (including probable ones) and the capabilities of the platform you are using.
For example, if you have a requirement like "Show all unique IP addresses recorded in the system", then you'd better "branch" now and create a separate table to store IP addresses. Or if you need certain constraints on IP addresses (like "all IP addresses of a given user must be unique"), then you might greatly benefit from having a separate table with proper constraints applied to it. (Note that you could meet both requirements even with a denormalized design and proper XML-related machinery; however, a RelDB-based solution to these requirements seems much cheaper to implement and maintain.)
Obviously, these are not the only examples of requirements that would dictate a normalized solution.
At the same time, I think that requirements like "Show all IP addresses of a user" or "Show all users associated with a given IP address" may not be sufficient to justify a normalized solution.
You could try to perform a deeper analysis (in search of requirements of the first type), or just rely on your understanding of the project's context (current and future) and on your gut feeling.
My own gut feeling in this particular case is that requirements of the first type (pro-normalization requirements) are extremely likely, so you'd be better off with a normalized solution from the very beginning. However, you've said this use case is fictitious, so in your real situation the conclusion may be exactly the opposite.
Never say "never": 3NF is not always the best answer.