Mapping strings to relations - mysql

I'm creating a system where tasks can be assigned to different users. The problem is that tasks are mapped through a string column called recipient, that in the end maps to a collection of users. The contents of this column could look like this:
has:tasks-update,tasks-access - Users that have the tasks-update and tasks-access Permission.
role:administrator - Users that have the administrator role.
Right now I'm resolving it problematically. This is somewhat easy when I have to figure out who has access to a specific task, but cumbersome when a user needs to know what tasks are "assigned" to them.
Right now I'm resolving each recipient column to see if the user is included, this is unfortunately not very feasible as it comes with a huge performance cost.
I already have indices on the appropriate columns to speed the look-ups up.
A solution to this, was that I would resolve the recipients when the recipient was changed and then place the relationships between users and tasks in an intermediate table. While this lets me quickly look up the tasks a user is assigned to, it also becomes problematic since now I need to keep track of (for example) each time a user has been given the administrator role and now synchronize this to the intermediate table.
I was hoping I could get some insight into solving this issue without sacrificing performance like I am right now, but also not have to synchronize all the time.

Storing a list of anything as a string in a singular column can lead to all sorts of problems down the line
As you have encountered already.. any relational look-up, insert, update or delete operations on the list will first require some form of parsing of the existing list
It is worth noting that any indexes on this column will likely NOT be usable by the engine for these tasks, as indexes on string based columns (other than FULL TEXT) are only really useful when searching the start of the string
For example,
SELECT *
FROM site_user
WHERE recipients LIKE '%tasks-update%'
Will not be able to use an index on the recipients column
A suggestion
I would split out your current lists into new tables, like
role - id, name, …
e.g. {3, 'administrator',… }
permission - id, name, …
e.g. {5, 'tasks-access',… }
e.g. {9, 'tasks-update',… }
site_user - id, name, role_id, …
e.g. {7, 'Jeff', 3,… }
site_user_permission - id, site_user_id, permission_id, …
e.g. {1, 7, 5,… }
e.g. {2, 7, 9,… }
Where from the example records, 'Jeff' is an 'administrator' and has been assigned the 'tasks-update', and 'tasks-access' permissions
Lookups should be easily achievable using JOINs, and stay consistent when data is added or removed. Data integrity can be maintained by adding appropriate foreign keys and unique indexes
N.B. Without specific examples of the operations that are causing you issues, or more details on how you intend to use user roles and permissions, it is difficult to do more than make general suggestions

The tried and good approach, complying to normal forms would be to have task_type and role tables. You of course have a user table and since a user can have many roles and privileges, you will need a user_role and a user_privilege table to handle the many-to-many relations. An easy way to handle the problems is to have some numbers representing some privileges and roles, like 1 for administrator and 2, 3, 5, 7, 11, 13, 17 and so on for other privileges. Having a similar number for a role as a primary key would ease the role matching problem. For example, let's consider the case when you have a privilege with code 7. If you search for roles with the id divisible with this code, then you will get 7 (data_read, for example) and 1 (administrator).

You need for sure a relation table between users and tasks and of course in this relation you have also to flag if user is adminstrator or not. This is the best way for design the structure of your application instead of merging information into a single columns which cause performance/complexity issue. Go ahead with this approach ,your work will benefit from this.

Related

Access query is duplicating unique records / Linked table issues

I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities which the organisation wants to track, hence all registered clients go into one list and individual tables spin of that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by
'Name')
Social Work Register (list of social work clients with full details
of each case)
Social Work Follow-up Table (used when staff call social work clients
to see how their issue is progressing; the register has too many
columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of
which workshops they were attended and why they were absent if they
missed a session)
Individual workshop tables x14 (each workshop includes a test and
these tables records the clients answers and their score for each
individual test; there will be more than 20 of these when the
database is finished; all joined to the 'Participants Register' by
'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only
overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client
Register' to the 'AllList' for data entry so that when a new client
is registered it creates a new record in both tables, with the
records matched together)
Participant Query (not yet attempted; as above, intended to link the
'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
people change name, for instance in many cultures, women change their family name after they get married.
There could also have been a typo when entering the name and now it can be hard to correct it if that data is used as a Foreign Key all in different tables.
as your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to make changes to that name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not mean to be edited, changed or even displayed to the user. It's sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't that well-thought of; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClienCase.ClientID
GROUP BY Client.Name
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)
and for places:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please don´t pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
as far as i understand. several tables are required. There is a many to many relation between them.
Table which stores the user data such as name, surname, birth date with a identity field.
Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
each different data type has its table. for example, status updates, photos, links.
the last table is for many to many relation storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?
If not favour the one that requires the fewer tables
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.
One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

Database user table design, for specific scenario

I know this question has been asked and answered many times, and I've spent a decent amount of time reading through the following questions:
Database table structure for user settings
How to handle a few dozen flags in a database
Storing flags in a DB
How many database table columns are too many?
How many columns is too many columns?
The problem is that there seem to be a somewhat even distribution of supporters for a few classes of solutions:
Stick user settings in a single table as long as it's normalized
Split it into two tables that are 1 to 1, for example "users" and "user_settings"
Generalize it with some sort of key-value system
Stick setting flags in bitfield or other serialized form
So at the risk of asking a duplicate question, I'd like to describe my specific scenario, and hopefully get a more specific answer.
Currently my site has a single user table in mysql, with around 10-15 columns(id, name, email, password...)
I'd like to add a set of per-user settings for whether to send email alerts for different types of events (notify_if_user_follows_me, notify_if_user_messages_me, notify_when_friend_posts_new_stuff...)
I anticipate that in the future I'd be infrequently adding one off per-user settings which are mostly 1 to 1 with users.
I'm leaning towards creating a second user_settings table and stick "non-essential" information such as email notification settings there, for the sake of keeping the main user table more readable, but is very curious to hear what expects have to say.
Seems that your dilemma is to vertically partition the user table or not. You may want to read this SO Q/A too.
i'm gonna cast my vote for adding two tables... (some sota key-value system)
it is preferable (to me) to add data instead of columns... so,
add a new table that links users to settings, then add a table for the settings...
these things: notify_if_user_follows_me, notify_if_user_messages_me, notify_when_friend_posts_new_stuff. would then become row insertions with an id, and you can reference them at any time and extend them as needed without changing the schema.

Database schema for ACL

I want to create a schema for a ACL; however, I'm torn between a couple of ways of implementing it.
I am pretty sure I don't want to deal with cascading permissions as that leads to a lot of confusion on the backend and for site administrators.
I think I can also live with users only being in one role at a time. A setup like this will allow roles and permissions to be added as needed as the site grows without affecting existing roles/rules.
At first I was going to normalize the data and have three tables to represent the relations.
ROLES { id, name }
RESOURCES { id, name }
PERMISSIONS { id, role_id, resource_id }
A query to figure out whether a user was allowed somewhere would look like this:
SELECT id FROM resources WHERE name = ?
SELECT * FROM permissions WHERE role_id = ? AND resource_id = ? ($user_role_id, $resource->id)
Then I realized that I will only have about 20 resources, each with up to 5 actions (create, update, view, etc..) and perhaps another 8 roles. This means that I can exercise blatant disregard for data normalization as I will never have more than a couple of hundred possible records.
So perhaps a schema like this would make more sense.
ROLES { id, name }
PERMISSIONS { id, role_id, resource_name }
which would allow me to lookup records in a single query
SELECT * FROM permissions WHERE role_id = ? AND permission = ? ($user_role_id, 'post.update')
So which of these is more correct? Are there other schema layouts for ACL?
In my experience, the real question mostly breaks down to whether or not any amount of user-specific access-restriction is going to occur.
Suppose, for instance, that you're designing the schema of a community and that you allow users to toggle the visibility of their profile.
One option is to stick to a public/private profile flag and stick to broad, pre-emptive permission checks: 'users.view' (views public users) vs, say, 'users.view_all' (views all users, for moderators).
Another involves more refined permissions, you might want them to be able to configure things so they can make themselves (a) viewable by all, (b) viewable by their hand-picked buddies, (c) kept private entirely, and perhaps (d) viewable by all except their hand-picked bozos. In this case you need to store owner/access-related data for individual rows, and you'll need to heavily abstract some of these things in order to avoid materializing the transitive closure of a dense, oriented graph.
With either approach, I've found that added complexity in role editing/assignment is offset by the resulting ease/flexibility in assigning permissions to individual pieces of data, and that the following to worked best:
Users can have multiple roles
Roles and permissions merged in the same table with a flag to distinguish the two (useful when editing roles/perms)
Roles can assign other roles, and roles and perms can assign permissions (but permissions cannot assign roles), from within the same table.
The resulting oriented graph can then be pulled in two queries, built once and for all in a reasonable amount of time using whichever language you're using, and cached into Memcache or similar for subsequent use.
From there, pulling a user's permissions is a matter of checking which roles he has, and processing them using the permission graph to get the final permissions. Check permissions by verifying that a user has the specified role/permission or not. And then run your query/issue an error based on that permission check.
You can extend the check for individual nodes (i.e. check_perms($user, 'users.edit', $node) for "can edit this node" vs check_perms($user, 'users.edit') for "may edit a node") if you need to, and you'll have something very flexible/easy to use for end users.
As the opening example should illustrate, be wary of steering too much towards row-level permissions. The performance bottleneck is less in checking an individual node's permissions than it is in pulling a list of valid nodes (i.e. only those that the user can view or edit). I'd advise against anything beyond flags and user_id fields within the rows themselves if you're not (very) well versed in query optimization.
This means that I can exercise blatant
disregard for data normalization as I
will never have more than a couple
hundred possible records.
The number of rows you expect isn't a criterion for choosing which normal form to aim for.
Normalization is concerned with data integrity. It generally increases data integrity by reducing redundancy.
The real question to ask isn't "How many rows will I have?", but "How important is it for the database to always give me the right answers?" For a database that will be used to implement an ACL, I'd say "Pretty danged important."
If anything, a low number of rows suggests you don't need to be concerned with performance, so 5NF should be an easy choice to make. You'll want to hit 5NF before you add any id numbers.
A query to figure out if a user was
allowed somewhere would look like
this:
SELECT id FROM resources WHERE name = ?
SELECT * FROM permissions
WHERE role_id = ? AND resource_id = ? ($user_role_id, $resource->id)
That you wrote that as two queries instead of using an inner join suggests that you might be in over your head. (That's an observation, not a criticism.)
SELECT p.*
FROM permissions p
INNER JOIN resources r ON (r.id = p.resource_id AND
r.name = ?)
You can use a SET to assign the roles.
CREATE TABLE permission (
id integer primary key autoincrement
,name varchar
,perm SET('create', 'edit', 'delete', 'view')
,resource_id integer );

database modelling -mysql

I am doing the design of a database, that will have eventually thousands of users. Each user has your profile and specific data associated.
In your opinion, it is best practice a table for id, username, activationLink and hash and another for address, age, photo, job, or it is best a unique table for all stuff?
thanks for your time
If:
All (or almost all) users have all data filled
Most of the time you query for all fields
then keep them in a single table, otherwies split them.
In your model, activationLink seems to be queried for only once per activation, so I'd move it into a separate table (which would allow deleting it after the account had been activated).
Address, age, photo and job are usually shown along with the username, so it would be better to merge them into a single table.
Don't allow your initial design to limit the ability (or just make it difficult) to expand your requirements in the future.
At the moment, a user may have one address so you might put it in the users table - what if you want them to be able to store "work" and "home" addresses in future, or a history of past addresses?
A user may only be allowed to have a single photo, but if you put it (or a URL for it) in users.photo, then you'd have to change your data structure to allow a user to have a history of profile photos
As Quassnoi mentions, there are performance implications for each of these decisions - more tables means more complexity, and more potential for slow queries. Don't create new tables for the sake of it, but consider your data model carefully as it quickly becomes very hard to change it.
Any values that are a strict 1-to-1 relationship with a user entity, and are unlikely to ever change and require a history for (date of birth is a good example) should go in the table with the core definition. Any potential 1-to-many relationships (even if they aren't right now) are good candidates for their own tables.