Using sAMAccountName as uid in MySQL database - mysql

I have an application that authenticates with LDAP and returns a JWT with the sAMAccountName of the logged-in user.
This application has a MySQL database where I'd like to store the user in different tables (fields like createdBy, updatedBy, etc.), and I was wondering what the correct way of handling this is:
using the sAMAccountName as the identifier (so createdBy will be a VARCHAR(25))
using a link table to match the sAMAccountName with an auto-incremented identifier
Normally I would choose the "id" way, as it's faster and easier to read in my opinion, but I'm not keen on linking users from the LDAP directory to a separate id in my database, so honestly I would choose the first option.
What are the pros and cons of using a string as uid? In my case it's likely to be used only for audit fields like updatedBy, createdBy, deletedBy etc., so I won't have hard links between multiple tables using a user identifier.

I think you should create a user table with a surrogate primary key (an auto-incrementing one) and put a unique index on the sAMAccountName column.
Natural primary keys are good because they naturally describe the record they point to. The downside is that they take more space in the index, so index lookups and rebuilds are slower, and the tables that reference them take more space as well.
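To make the suggestion above concrete, here is a minimal sketch of a user table with a surrogate key and a unique login name. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs as-is; the table names, column names and the helper user_id_for are assumptions for illustration, not something from the question. In MySQL the equivalent of INSERT OR IGNORE would be INSERT IGNORE.

```python
import sqlite3

# SQLite stands in for MySQL here; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,          -- surrogate key
        sam_account_name VARCHAR(25) NOT NULL UNIQUE   -- natural key, unique index
    );
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        created_by INTEGER NOT NULL REFERENCES users(id)
    );
""")


def user_id_for(conn, sam_account_name):
    """Return the surrogate id for a login name, creating the row on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO users (sam_account_name) VALUES (?)",
        (sam_account_name,),
    )
    row = conn.execute(
        "SELECT id FROM users WHERE sam_account_name = ?",
        (sam_account_name,),
    ).fetchone()
    return row[0]


# In the real application the name would come from the verified JWT.
uid = user_id_for(conn, "jdoe")
conn.execute(
    "INSERT INTO documents (title, created_by) VALUES (?, ?)", ("Report", uid)
)
same_uid = user_id_for(conn, "jdoe")  # a second call reuses the existing row
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

With this shape the application never has to pre-load users from LDAP: a row is created the first time a login name shows up in a token.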

I'd connect everything using an id as primary key.
One thing is that the sAMAccountName is not necessarily a stable identifier. Think of a user changing her or his name: the sAMAccountName might then change, but it's still the same user. When you connect everything via an ID, you can change the sAMAccountName field without breaking everything.
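A small sketch of that rename scenario, again with sqlite3 standing in for MySQL and with made-up table and account names: the audit column stores the numeric id, so a renamed account only touches one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        sam_account_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY,
        subject TEXT NOT NULL,
        updated_by INTEGER NOT NULL REFERENCES users(id)
    );
    INSERT INTO users (id, sam_account_name) VALUES (1, 'asmith');
    INSERT INTO tickets (subject, updated_by) VALUES ('Printer offline', 1);
""")

# The account is renamed in the directory: only the users row changes here.
conn.execute("UPDATE users SET sam_account_name = 'ajones' WHERE id = 1")

# The audit column still resolves to the same person under the new name.
updated_by_name = conn.execute("""
    SELECT u.sam_account_name
    FROM tickets AS t
    JOIN users AS u ON u.id = t.updated_by
    WHERE t.subject = 'Printer offline'
""").fetchone()[0]
```

How the application learns about the rename (for example via an immutable directory attribute) is outside this sketch.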
But that's just my 2 cents.

Related

Should I use multiple databases in MySQL for my "hosting" platform? [duplicate]

Let us say I need to design a database which will host data for multiple companies. For security and admin purposes I need to make sure that the data for different companies is properly isolated, but I also do not want to start 10 MySQL processes to host the data for 10 companies on 10 different servers. What are the best ways to do this with MySQL?
There are several approaches to multi-tenant databases. For discussion, they're usually broken into three categories.
One database per tenant.
Shared database, one schema per tenant.
Shared database, shared schema. A tenant identifier (tenant key) associates every row with the right tenant.
MSDN has a good article on the pros and cons of each design, and examples of implementations.
Microsoft has apparently taken down the pages I referred to, but they are on archive.org. Links have been changed to point there.
For reference, this is the original link for the second article
In MySQL I prefer to use a single database for all tenants. I restrict access to the data by using a separate database user for each tenant that only has access to views that only show rows that belong to that tenant.
This can be done by:
Add a tenant_id column to every table
Use a trigger to populate the tenant_id with the current database username on insert
Create a view for each table where tenant_id = current_database_username
Only use the views in your application
Connect to the database using the tenant specific username
I've fully documented this in a blog post:
https://opensource.io/it/mysql-multi-tenant/
The simple way is: for each shared table, add a column, say SEGMENT_ID, and assign a proper SEGMENT_ID to each customer. Then create views for each customer based on the SEGMENT_ID. These views keep each customer's data separate. With this method information can still be shared, which keeps both operation and development simple (stored procedures can also be shared).
Assuming you'd run one MySQL database on a single MySQL instance, there are several ways to distinguish what belongs to whom.
Most obvious choice (for me at least) would be creating a composite primary key such as:
CREATE TABLE some_table (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    companyId INT UNSIGNED NOT NULL,
    -- ... other columns ...
    PRIMARY KEY (id, companyId)  -- must use the same column name as declared above
) ENGINE = InnoDB;
and then distinguishing between companies by changing the companyId part of the primary key.
That way you can have all the data of all the companies in the same table / database and at application level you can control what company is tied to which companyId and determine which data to display for certain company.
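As a rough illustration of the application-level control described above, the sketch below routes every read through one helper that always binds the companyId. It uses sqlite3 in place of MySQL; the invoices table, its data and the helper name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        id INTEGER NOT NULL,
        companyId INTEGER NOT NULL,
        amount INTEGER NOT NULL,
        PRIMARY KEY (id, companyId)
    );
    INSERT INTO invoices VALUES (1, 10, 500), (2, 10, 750), (3, 20, 90);
""")


def invoices_for_company(conn, company_id):
    """Single entry point for reads, so the companyId filter is never forgotten."""
    return conn.execute(
        "SELECT id, amount FROM invoices WHERE companyId = ? ORDER BY id",
        (company_id,),
    ).fetchall()


company_10 = invoices_for_company(conn, 10)
company_20 = invoices_for_company(conn, 20)
```

This only protects against mistakes in application code; anyone who can run arbitrary SQL still sees every row.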
If this wasn't what you were looking for - my apologies for misunderstanding your question.
Have you considered creating a different schema for each company?
You should try to define more precisely what you want to achieve, though.
If you want to make sure that an HW failure doesn't compromise data for more than one company, for example, you have to create different instances and run them on different nodes.
If you want to make sure that someone from company A cannot see data that belongs to company B, you can do that at the application level, as per Matthew PK's answer, for example.
If you want to be sure that someone who manages to compromise security and run arbitrary SQL against the DB still cannot reach other companies' data, you need something more robust than that, though.
If you want to be able to back up data independently, so that you can safely back up Company C on Mondays and Company A on Sundays and be able to restore just Company C, then, again, a purely application-based solution won't help.
Given a specific DB User, you could give a user membership to group(s) indicating the companies whose data they are permitted to access.
I presume you're going to have a Companies table, so just create a one-to-many relationship between Companies and MySQLUsers or something similar.
Then, as a condition of all your queries, just match the CompanyID based on the UserID.
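One way to read this suggestion is a membership table that every query joins against. The sketch below is an assumption about how that could look (sqlite3 in place of MySQL; the UserCompanies link table, the Orders table and the sample data are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Companies (CompanyID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    -- Hypothetical membership table: which user may see which company's data.
    CREATE TABLE UserCompanies (
        UserID INTEGER NOT NULL,
        CompanyID INTEGER NOT NULL REFERENCES Companies(CompanyID),
        PRIMARY KEY (UserID, CompanyID)
    );
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CompanyID INTEGER NOT NULL REFERENCES Companies(CompanyID),
        Item TEXT NOT NULL
    );
    INSERT INTO Companies VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO UserCompanies VALUES (100, 1), (200, 1), (200, 2);
    INSERT INTO Orders VALUES (1, 1, 'anvil'), (2, 2, 'reactor');
""")


def orders_visible_to(conn, user_id):
    """Return only the orders of companies the user is a member of."""
    rows = conn.execute("""
        SELECT o.Item
        FROM Orders AS o
        JOIN UserCompanies AS uc ON uc.CompanyID = o.CompanyID
        WHERE uc.UserID = ?
        ORDER BY o.OrderID
    """, (user_id,))
    return [row[0] for row in rows]


user_100_orders = orders_visible_to(conn, 100)
user_200_orders = orders_visible_to(conn, 200)
```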
In my file Generate_multiTanentMysql.php I do all the steps with a PHP script:
https://github.com/ziedtuihri/SaaS_Application
A solution design pattern:
Creating a database user for each tenant
Renaming every table to a different and unique name (e.g. using a prefix ‘someprefix_’)
Adding a text column called ‘id_tenant’ to every table to store the name of the tenant the row belongs to
Creating a trigger for each table to automatically store the current database username to the id_tenant column before inserting a new row
Creating a view for each table with the original table name with all the columns except id_tenant. The view will only return rows where (id_tenant = current_database_username)
Only grant permission to the views (not tables) to each tenant’s database user
Then, the only part of the application that needs to change is the database connection logic. When someone connects to the SaaS, the application would need to:
Connect to the database as that tenant-specific username
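The steps above can be sketched as a small generator that emits the MySQL statements for one table. This is an untested outline rather than the linked project's actual script: the function names, the t_ prefix and the VARCHAR(32) size are assumptions, identifiers are assumed to be trusted (no escaping is done), and the statements assume the target schema has already been selected. It uses USER() rather than CURRENT_USER(), because inside a view or trigger CURRENT_USER() normally reports the definer rather than the connected tenant.

```python
# Expression that yields the connected account's user name in MySQL.
CURRENT_TENANT = "SUBSTRING_INDEX(USER(), '@', 1)"


def tenant_isolation_ddl(table, columns, prefix="t_"):
    """Build the MySQL statements that hide `table` behind a per-tenant view.

    `columns` lists the original columns (without id_tenant) the view exposes.
    """
    base = f"`{prefix}{table}`"
    view = f"`{table}`"
    column_list = ", ".join(f"`{c}`" for c in columns)
    return [
        # 1. Move the real table out of the way.
        f"RENAME TABLE {view} TO {base};",
        # 2. Add the tenant column; a default keeps the view insertable.
        f"ALTER TABLE {base} ADD COLUMN `id_tenant` VARCHAR(32) NOT NULL DEFAULT '';",
        # 3. Stamp every new row with the connected database user.
        f"CREATE TRIGGER `{prefix}{table}_set_tenant` BEFORE INSERT ON {base} "
        f"FOR EACH ROW SET NEW.`id_tenant` = {CURRENT_TENANT};",
        # 4. Expose only the tenant's own rows under the original table name.
        f"CREATE VIEW {view} AS SELECT {column_list} FROM {base} "
        f"WHERE `id_tenant` = {CURRENT_TENANT};",
    ]


def tenant_grant(table, tenant_user, host="%"):
    """Grant a tenant's database user access to the view only."""
    return (
        f"GRANT SELECT, INSERT, UPDATE, DELETE ON `{table}` "
        f"TO '{tenant_user}'@'{host}';"
    )


statements = tenant_isolation_ddl("orders", ["id", "total"])
grant = tenant_grant("orders", "acme")
```

Printing statements and grant gives a script that could be reviewed and then run once per table by an administrative account.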

Database design with user data and central database updates

I am designing a windows desktop app. It uses LiteDB as the single file local db for users - using it very much as a relational database with foreign keys etc (each Table having an integer ID as primary key and references to other tables via FK integers).
It's a retro-gaming app, so 'tables' will include things such as:
System (e.g. "Sony PlayStation", "Nintendo 64")
Controller (e.g. "Sony Dual Shock")
Control (e.g. "Cross", "Start", "Select")
Because of the above, I will have to stick to using integer IDs as the primary key. I thought about using the 'name', but this wouldn't work for Controls (i.e. Start will be found on many controllers).
Users should be able to add and delete records as they wish (although deleting the 'standard' records will be discouraged).
The challenge is that I'm also going to host a MySQL database on my server, allowing users to update their tables from this. Now this is the bit I can't get my head around.
Say they add a System "Casio Watch" to their local table. This will get an auto-gen ID (say '94'). At the same time, some updates occur on the server database and a new system is added (e.g. "Commodore Calculator") this also gets the auto-gen ID of '94.' That's conflict number 1.
You could get around the above by just appending it as a new row in the user db, where it would get a new ID. But my second worry is around foreign keys. Let's say there's a 'Manufacturers' table with a 'Biggest Seller' field. On the server, for Manufacturer = Commodore, the 'Biggest Seller' FK is 94 for "Commodore Calculator". However, if this Manufacturer table is imported into the user's local db, then Commodore's biggest seller would be "Casio Watch", since its ID is 94 on the user db.
Forgive me if I'm being a bit slow about all this. Referential integrity is coming to mind (is that the one with update/null FKs on change??) but I don't think you can do this through LiteDB (i.e. a change in one does not cascade to related tables).
Any advice would be greatly appreciated.
Using a simple auto increment field will not work as you have accurately stated.
Either add a "server id" field to the relevant tables identifying the computer / installation the data comes from and making sure that this field is unique across all your installations. Each system / manufacturer / etc that you need to synchronise across multiple databases will have a compound primary key consisting of the server id and an auto incremented value (although, you probably need to have a separate generator to create the auto increment locally). So, "Casio Watch" would have the server id of 1 and the auto incremented value of 94. The "Commodore Calculator" would have the same auto increment value, but its server id would be different, therefore no conflict will occur.
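A minimal sketch of the compound-key idea, using sqlite3 as a stand-in for both LiteDB and MySQL. The table layout, the column names and the server id chosen for the central database are assumptions; only the "Casio Watch" having server id 1 and local value 94 comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE systems (
        server_id INTEGER NOT NULL,   -- which installation created the row
        local_id INTEGER NOT NULL,    -- counter that is only unique per installation
        name TEXT NOT NULL,
        PRIMARY KEY (server_id, local_id)
    );
    CREATE TABLE manufacturers (
        name TEXT PRIMARY KEY,
        seller_server_id INTEGER,
        seller_local_id INTEGER,
        FOREIGN KEY (seller_server_id, seller_local_id)
            REFERENCES systems (server_id, local_id)
    );
""")

USER_INSTALL = 1    # server id of the user's installation, as in the text
CENTRAL_SERVER = 2  # hypothetical id for the central database

# Both rows carry the same counter value 94 but do not collide.
conn.execute("INSERT INTO systems VALUES (?, 94, 'Casio Watch')", (USER_INSTALL,))
conn.execute(
    "INSERT INTO systems VALUES (?, 94, 'Commodore Calculator')", (CENTRAL_SERVER,)
)
# The imported foreign key keeps pointing at the server's row.
conn.execute("INSERT INTO manufacturers VALUES ('Commodore', ?, 94)", (CENTRAL_SERVER,))

biggest_seller = conn.execute("""
    SELECT s.name
    FROM manufacturers AS m
    JOIN systems AS s
      ON s.server_id = m.seller_server_id AND s.local_id = m.seller_local_id
    WHERE m.name = 'Commodore'
""").fetchone()[0]
system_count = conn.execute("SELECT COUNT(*) FROM systems").fetchone()[0]
```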
The other option is to use a universally unique ID (UUID) instead of a simple auto increment field. UUIDs are guaranteed to be unique across all MySQL installations (with some limitations). In MySQL you can use the UUID() function to generate one.
From a system design view, UUID is simpler because MySQL guarantees its uniqueness within certain limitations that are described in the above link. However, UUIDs require more storage space and will have a negative impact on InnoDB's performance.
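For the desktop side, a UUID can also be generated in application code. The sketch below uses Python's uuid.uuid4() (random UUIDs) as a stand-in, which is not the same algorithm as MySQL's UUID() function, and sqlite3 in place of the local database; collisions are extremely unlikely rather than strictly impossible. Table and function names are illustrative.

```python
import sqlite3
import uuid


def new_id():
    """Generate a 36-character random UUID string to use as a primary key."""
    return str(uuid.uuid4())


local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE systems (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

# A row created offline on the user's machine.
local_id = new_id()
local.execute("INSERT INTO systems VALUES (?, ?)", (local_id, "Casio Watch"))

# Rows downloaded from the server keep the ids they were created with,
# so they can be inserted unchanged and foreign keys stay valid.
server_rows = [(new_id(), "Commodore Calculator")]
local.executemany("INSERT INTO systems VALUES (?, ?)", server_rows)

names = sorted(row[0] for row in local.execute("SELECT name FROM systems"))
```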

At what point should I create a separate table (mysql)?

A friend and I are working on a database that stores information about cPanel hosting accounts, such as what settings, apps, and features each account is using.
Most of the fields are boolean, such as whether or not the account has any wordpress sites, any php 5.4 driven sites, any ruby on rails sites, etc...
A small number of fields are non-boolean data like disk usage in MB, hostname of the server the account resides on, and the username of the account, etc...
In my mind, it makes sense to store ALL this information in one single table.
So the table might have the following columns:
php54 boolean,
wordpress boolean,
ror boolean,
username varchar(8),
hostname varchar(20),
usage_mb int(9),
I figure that the primary key could be (username,hostname).
However, my friend has already set up the database with multiple tables that look like this:
Fact Table:
id int(11),
php54 boolean,
wordpress boolean,
ror boolean,
usage_mb int(9),
User Table:
id int(11),
factid int(11),
hostid int(11),
username varchar(8)
Hostname Table:
id int(11),
hostname varchar(20),
ip varchar(15),
Where each table's primary key is "id" and the user table references the hostname table and fact table using 'hostid' and 'factid' foreign keys (respectively).
I believe my friend's rationale behind multiple tables is to organize the data based on the type of data, despite all the data being related to one single, unique account.
My rationale is that since all the data belongs to one unique account, and therefore every single row is 1:1, does it make sense to have multiple tables?
I would think multiple tables would be sensible if a row in one table can reference multiple rows in another table... But in this case each row from each table can only be associated with one single row from any other table... so I think one table is fine.
Should this data be in multiple tables, or in one single table?
We're both sort of noobs figuring things out as we go.
At which point does it make sense to use multiple tables?
Currently it's really difficult to write an API to add the data associated with one single account to three separate tables, as all the primary keys auto increment, and other than that there isn't any key that is unique to the account which would make it easy to update existing data.
Sorry if none of this makes sense
In your case, I don't think having multiple tables with one-to-one relationships is the right way.
It is not forbidden, and in some cases it can be helpful (see "Is there ever a time where using a database 1:1 relationship makes sense?"), but you'll have to deal with unnecessary joins in your queries.
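A sketch of the single-table layout from the question, with (username, hostname) as the primary key, shows why updates become easy: the natural key identifies the row, so saving an account is a single statement. sqlite3 stands in for MySQL here, the table name accounts and the helper are made up, and INSERT OR REPLACE is SQLite syntax (in MySQL, REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE play the same role).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        username VARCHAR(8) NOT NULL,
        hostname VARCHAR(20) NOT NULL,
        php54 BOOLEAN NOT NULL DEFAULT 0,
        wordpress BOOLEAN NOT NULL DEFAULT 0,
        ror BOOLEAN NOT NULL DEFAULT 0,
        usage_mb INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (username, hostname)
    )
""")


def save_account(conn, username, hostname, php54, wordpress, ror, usage_mb):
    """Insert the account, or overwrite the existing row with the same key."""
    conn.execute(
        "INSERT OR REPLACE INTO accounts "
        "(username, hostname, php54, wordpress, ror, usage_mb) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (username, hostname, php54, wordpress, ror, usage_mb),
    )


save_account(conn, "alice", "web01", 1, 0, 0, 120)
save_account(conn, "alice", "web01", 1, 1, 0, 340)  # same key: row is replaced

rows = conn.execute(
    "SELECT username, hostname, wordpress, usage_mb FROM accounts"
).fetchall()
```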
Ignoring ids, the way you find out what your CKs (candidate keys) are and whether you should decompose is the topic of normalization to higher NFs (normal forms). This formalizes your notion of "a row in one table can reference multiple rows in another" (among others). Guessing using common sense here, there's no particular need to decompose. Introducing ids not visible at the business level is always technically unnecessary but happens per its own practical/ergonomic reasons. Further explanation/justification is information modeling & database design textbook chapters on design, CKs, NFs & surrogates--read some. Vague notions like "same type of data" are not helpful.
(TL;DR "At what point should I create a separate table?" is a basic question with a complex answer that requires learning some stuff.)

Mysql deduce foreign key relationship for random queries

I am an MySQL novice and am looking for the solution to the following problem:
I would like to create a CMS with cppcms that supports modules. Since I want to reduce the chance of (accidental) access to private data, I want a module which handles data access and rights. Since this module is supposed to be unaware of data structures created by other modules, I would like it to deduce the data owner through foreign key relations. My idea would be to search for a path (over foreign keys) which links a row to a user id.
Sum up:
What I am trying to do
Taking a random query, determine the affected rows
for the affected rows determine a relationship/path (via foreign keys) to a user/userid (a column in an existing table)
return only the rows for which a relationship could be determined and a condition holds (e.g. the userid found in the related query matches a fixed user id, such as the user currently accessing the system)
(As far as I know foreign keys only enforce the existence of a key in another table, however the precondition I assume is, that every row is linked to a user over a path of foreign key relations)
My Problem/Question:
Is there an existing solution or a better approach to the problem? Prepared statements won't do the trick since I don't know all data structures/queries in advance.
How do I get the foreign key relations? Is there another way besides "SHOW CREATE TABLE" and then parsing the result string?
How can I determine the rows that would be affected, without modifying them? I would like to filter this set afterwards by determining if I can link it to the current user (not the MySQL user but the system user).
Could I try executing the query, then select the affected rows, and if I determine an access violation simply do a rollback? The problem with this: how do I apply the changes to the subset of rows for which it is legal (e.g. I attempt to change 5 rows, may only change 2, how do I change only those 2)? One idea was to find a way to create a temporary table with the result set; this solution has several drawbacks: foreign key relations are not possible for temporary tables, so they are 'lost'.
P.S.: I am coding in C++, therefore I would prefer C++-compatible library recommendations; however, I am open to other suggestions. While googling I stumbled over Doctrine and I am currently researching it.
P.P.S.: Database engine is InnoDB (has to because of the foreign keys)
UPDATE: Explanation Attempt of Part 2:
I am trying to filter which columns of a table a user is allowed to see. To do so I would like to find a connection in the database over foreign keys (with foreign keys I ensure that I can get to all data over joins, and they are a hint on which columns I have to join). Since I plan on a more complex system (e.g. a forum), I don't want to join all data in a temporary table and run a user query on that. I would rather evaluate the user query and check for the result whether I can map it with a join to the user's id. For example, I could use this to enforce that an edit button is only enabled for the posts created by the user. (I know there are easier ways to do this, but I basically want to allow programmers to write their own queries without giving them the chance to edit or view data that they are not allowed to see. My assumption is that the programmer is not an evildoer but simply forgets constraints, thus I want to enforce them in software.)
Getting here would be pretty good, but I have a little more complex need.
First a basic example. Let's say its like facebook and all the friends of a person are allowed to see his pictures.
pictures = id **userid** file (bool)visibleForFriends album
friendship = **userid1** **userid2**
users = userid
What I want to happen is:
1. Programmer inputs "SELECT * FROM pictures WHERE album=2"
2. System gets all matching records (e.g. a set of ids)
3. System sees the foreign key userid, tries to match the current userid against the pictures userid, and adds all matching rows to the returned result
4. System notices the special column visibleForFriends
5. System tries to determine all friends (SELECT userid1 FROM friendship WHERE userid2=currentUserID join (have to read up on joins) SELECT userid2 FROM friendship WHERE userid1=currentUserID)
6. System adds all rows where visibleForFriends is true and pictures.userid is in the result from step 5.
While the friendship part is some extra code (I think doable once I get started on the first bit), I still need to figure out how to automatically follow the foreign keys to see the connection. Ignoring the friendship special case, I would like the system to work on this as well:
pictures = id **albumid** file (bool)visibleForFriends album
albums = id **userid**
users = userid
Now the system should go pictures.albumid ==> albums.id -> albums.userid ==> users.userid.
I hope the examples clarified the question a bit. One problem is that in step 1 of the example (programmer query input) I don't want to let "DELETE *" take effect on anything not owned by the user. So I have to filter which rows to actually delete.
In response to part 1 of your question: provided the MySQL user you access the database with has access rights to information_schema, you can use the following query to list the existing foreign key relations within a specific database:
SELECT
TABLE_NAME,
COLUMN_NAME,
REFERENCED_TABLE_NAME,
REFERENCED_COLUMN_NAME
FROM
information_schema.KEY_COLUMN_USAGE
WHERE
TABLE_SCHEMA = 'dbname' AND REFERENCED_COLUMN_NAME IS NOT NULL;
I am slightly confused by the part 2 and am unsure how to give an appropriate response to this section. I hope you find the above query helpful though in your project!
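Once the rows of that information_schema query are available, finding a chain of foreign keys from a table to the users table is a graph search. The sketch below is one possible way to do it, written in Python for brevity even though the question targets C++; the function name fk_path is hypothetical and the sample rows are the albums example from the question, typed in by hand rather than fetched from a server.

```python
from collections import deque


def fk_path(fk_rows, start_table, target_table):
    """Breadth-first search over foreign keys.

    fk_rows: iterable of (TABLE_NAME, COLUMN_NAME,
                          REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME).
    Returns the list of foreign keys leading from start_table to
    target_table, or None when no chain exists.
    """
    edges = {}
    for table, column, ref_table, ref_column in fk_rows:
        edges.setdefault(table, []).append((column, ref_table, ref_column))

    queue = deque([(start_table, [])])
    seen = {start_table}
    while queue:
        table, path = queue.popleft()
        if table == target_table:
            return path
        for column, ref_table, ref_column in edges.get(table, []):
            if ref_table not in seen:
                seen.add(ref_table)
                step = (table, column, ref_table, ref_column)
                queue.append((ref_table, path + [step]))
    return None


# Rows as the information_schema query would return them for the example.
fk_rows = [
    ("pictures", "albumid", "albums", "id"),
    ("albums", "userid", "users", "userid"),
]
path = fk_path(fk_rows, "pictures", "users")
no_path = fk_path(fk_rows, "users", "pictures")
```

Each step in the returned path names the columns to join on, so the path can be turned into a JOIN chain that ends at users.userid.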
Is there an existing solution/Better approach to the problem?
Yes, I think so. You're describing a multi-tenant database. In a multi-tenant database in which the users share tables (also known as "shared everything"), each table should have a column for the user id. In effect, each row knows its owner.
This will vastly simplify your SQL, since you need no joins to determine who a row belongs to. It will probably speed up your SQL a lot, too.
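With an owner column on every table, a data-access layer could enforce ownership on reads by wrapping whatever SELECT the programmer wrote. The following is only a sketch of that idea under clear assumptions: it handles SELECT statements whose result includes a userid column, it uses sqlite3 in place of MySQL, and the helper name select_owned and the sample data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pictures (
        id INTEGER PRIMARY KEY,
        userid INTEGER NOT NULL,   -- every row knows its owner
        file TEXT NOT NULL,
        album INTEGER NOT NULL
    );
    INSERT INTO pictures VALUES
        (1, 7, 'beach.jpg', 2),
        (2, 8, 'city.jpg', 2),
        (3, 7, 'cat.jpg', 3);
""")


def select_owned(conn, programmer_query, params, current_user_id):
    """Run a SELECT but return only the rows owned by the current user."""
    scoped = f"SELECT * FROM ({programmer_query}) AS q WHERE q.userid = ?"
    # Placeholders are bound in order: the inner query's first, then the owner.
    return conn.execute(scoped, (*params, current_user_id)).fetchall()


rows = select_owned(conn, "SELECT * FROM pictures WHERE album = ?", (2,), 7)
```

Writes (UPDATE, DELETE) would need a different mechanism, such as appending the owner condition to the WHERE clause or using per-user views.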
This SO answer has a decent summary of the issues and alternatives.

User name as DB ID

I am creating a simple web application.
Is it wrong to identify a user by its username even at the application's low level?
For example, say I have a authentication token table that has three columns: token, userID, expDate.
Will it be wrong to put the user username in userID column?
Do I have to worry about the fact that everybody knows the user ID in my DB?
No, I don't think there's anything wrong with that particularly. I've seen that in practice at very big sites - just make sure that you have a unique constraint and index for that value (better, make it the primary key). Also, consider that using the username as their ID means you can't let the user change their username later without breaking existing links (say, if your user shares their user page externally).
I'm not sure, but there might be some overhead from using a string instead of a number.
Also it could be a hassle to update other database tables if a user's username ever changes.
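If the username is kept as the key, one way to reduce that hassle is a foreign key declared with ON UPDATE CASCADE, so the database rewrites the referencing rows itself. This is a suggestion beyond what the answers above describe; the sketch uses sqlite3 (where foreign keys must be switched on per connection), and MySQL's InnoDB supports the same clause. The auth_tokens layout follows the question's token, userID, expDate columns; the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE users (username TEXT PRIMARY KEY);
    CREATE TABLE auth_tokens (
        token TEXT PRIMARY KEY,
        userID TEXT NOT NULL REFERENCES users(username) ON UPDATE CASCADE,
        expDate TEXT NOT NULL
    );
    INSERT INTO users VALUES ('bob');
    INSERT INTO auth_tokens VALUES ('abc123', 'bob', '2030-01-01');
""")

# Renaming the user updates every referencing row automatically.
conn.execute("UPDATE users SET username = 'robert' WHERE username = 'bob'")

token_owner = conn.execute(
    "SELECT userID FROM auth_tokens WHERE token = 'abc123'"
).fetchone()[0]
```

The cascade only covers tables with a declared foreign key; usernames copied into URLs, logs or external systems still break on rename.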