Should a session table be cleared off from the records after a user logs out? - mysql

I am using a MySQL table to store a session record for the currently logged-in user. Once the user logs off, I update a few fields in the same record and flag it as revoked so that it is not used again. So a new record is created for every login. This serves my purpose, but it turns out that the table is going to grow huge.
What is the standard approach for storing sessions? Should the revoked ones be moved to a separate table, deleted, or left in the same table?
I am considering leaving the data in the same session table. When querying for a particular record, I filter on two fields, idPeople (not unique) and revoked (0 or 1), for example SELECT * FROM session WHERE idPeople = "someValue" AND revoked = 0, and then update the record if needed while the user is logged in or logging out. Will the huge size of the table affect this, or will MySQL handle it? And what other ramifications am I unable to see?

First, it may be a good idea to add a unique field to your table (e.g. SESSION_ID, which could be a running auto-increment number), define it as the unique ID, and use it to quickly find the record to be updated (i.e. to set revoked = 1).
Second, this type of table always triggers the question you are asking, and the best answer can only be given after you assess and answer some preliminary questions, for instance:
When you wish to check the activities of a user, how far into the past does it make sense to go? One month? One year?
What is the longest period for which you may wish to keep this information available (even if it takes non-routine queries to retrieve it)?
What type of questions (queries) do you expect to be asked of this table?
Once you answer those questions, you can consider the following options:
Have a routine process that would run once a day (at midnight or any other time your system can afford it) which would delete rows whose timestamp is older than, say, one month (or any other period suiting your needs), OR
Same as above, but first copy those records to a "history" table, OR
Change the structure of your table to a more efficient one, by adding some fields (as suggested above) and indices that would provide good answers for your "SELECT" needs.
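The options above can be sketched as a small scheduled job. This is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the names session_id, created_at, session_history and archive_old_sessions are assumptions that do not come from the question. It also assumes that only revoked sessions should be moved.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: session_id is the unique auto-increment key,
    # and the composite index serves "WHERE idPeople = ? AND revoked = 0".
    conn.executescript("""
        CREATE TABLE session (
            session_id INTEGER PRIMARY KEY AUTOINCREMENT,
            idPeople   INTEGER NOT NULL,
            revoked    INTEGER NOT NULL DEFAULT 0,
            created_at TEXT    NOT NULL  -- 'YYYY-MM-DD HH:MM:SS'
        );
        CREATE INDEX idx_session_people_revoked
            ON session (idPeople, revoked);
        CREATE TABLE session_history (
            session_id INTEGER PRIMARY KEY,
            idPeople   INTEGER NOT NULL,
            revoked    INTEGER NOT NULL,
            created_at TEXT    NOT NULL
        );
    """)


def archive_old_sessions(conn, cutoff):
    """Copy revoked sessions older than `cutoff` to the history table,
    then delete them from the live table. Returns the number moved."""
    with conn:  # both statements run in one transaction
        conn.execute(
            "INSERT INTO session_history "
            "SELECT session_id, idPeople, revoked, created_at "
            "FROM session WHERE revoked = 1 AND created_at < ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM session WHERE revoked = 1 AND created_at < ?",
            (cutoff,),
        )
    return cur.rowcount
```

A scheduler (cron, for example) would call archive_old_sessions once a day with a cutoff such as "one month ago". Dropping the INSERT step gives the plain delete-only variant.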

Related

Should I use more columns or more rows?

I need to create a table where each user (approx. 60 at the moment) would have a defined task for each day. Right now the database has one column for each user with the task name in it (which is bad in my opinion as each new user would need to change the schema of the table) and a "date" column.
A solution would be to have a "user" column and add a "task" column but that would mean there would be 60 (number of current users) rows per day.
I don't really know what's the best situation in this case.
Should I use more columns or more rows?
They're two completely different things, so this comparison doesn't make much sense...
Right now the database has one column for each user
Bad idea. Full stop. A user is a record of data, not a structural element of the database itself. For example, a table of users might contain columns like Username, Email, RegistrationDate, etc. It would not be a single row of data in which you add a column for each new user.
This would be a nightmare to maintain, would render things like Foreign Keys useless (and, honestly, render the entire concept of a relational database useless), would reach resource limits very quickly, etc.
Each record of information is a row, not a column (or table). In this case, each row in your table is a "User Task". It defines (or has a Foreign Key to) a User and defines (or has a Foreign Key to) a Task.
but that would mean there would be 60 (number of current users) rows per day
If the number of records in the table starts to become a problem, you can start looking into things like sharding and partitioning, archiving old data, etc. You've got time though, because "dozens of records per day" is sustainable for thousands of years. (And by then I imagine the hardware will be at least twice as good as it is today.)
Right now the database has one column for each user with the task name in it (which is bad in my opinion as each new user would need to change the schema of the table)
You're right, this is very bad. Using one column for user, one for the task and one for the date, will be much better.
60 rows per day is not much. That is 21,900 rows per year and 219,000 rows in ten years. MySQL can handle millions of rows in a table.
If you have two indexes, one for user and one for the date, searching for data will be fast enough.
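A minimal sketch of that row-per-task layout follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (user_task, user_id, task_date, task) are assumptions chosen for illustration. It also assumes one task per user per day, so the primary key on (user_id, task_date) doubles as the index for lookups by user, and a second index covers lookups by date.

```python
import sqlite3


def create_task_schema(conn):
    # One row per user per day; adding a user never changes the schema.
    conn.executescript("""
        CREATE TABLE user_task (
            user_id   INTEGER NOT NULL,
            task_date TEXT    NOT NULL,  -- 'YYYY-MM-DD'
            task      TEXT    NOT NULL,
            PRIMARY KEY (user_id, task_date)
        );
        CREATE INDEX idx_user_task_date ON user_task (task_date);
    """)


def tasks_for_day(conn, day):
    """All (user_id, task) pairs assigned on a given day."""
    return conn.execute(
        "SELECT user_id, task FROM user_task "
        "WHERE task_date = ? ORDER BY user_id",
        (day,),
    ).fetchall()


def tasks_for_user(conn, user_id):
    """All (task_date, task) pairs for one user, oldest first."""
    return conn.execute(
        "SELECT task_date, task FROM user_task "
        "WHERE user_id = ? ORDER BY task_date",
        (user_id,),
    ).fetchall()
```

Adding a 61st user is then just another INSERT with a new user_id, with no ALTER TABLE involved.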
Knowing nothing else about your database or schema, why not create a dimension table to store your users and fact table to track your task details?
That way you can more easily add new users and the tasks table would continue to grow as new facts are added. It would also be very easy to denormalize this model for query and/or reporting purposes.
Adding columns is a nuisance and can be slow. Instead, have a table with columns (user, task, etc.).
Even "60 rows per second" is not a problem. 600/second might be.
See the tag [pivot-table] for how to turn rows into columns for output display.
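One common way to turn rows into columns for display is conditional aggregation, which a short sketch can illustrate. This is an assumption about what such a pivot might look like, not the method from any particular answer: it runs on Python's built-in sqlite3 module as a stand-in for MySQL, and the table user_task(user_id, task_date, task) and the helper names are hypothetical.

```python
import sqlite3


def make_demo_db():
    # Hypothetical row-per-task table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_task ("
        "user_id INTEGER NOT NULL, task_date TEXT NOT NULL, task TEXT NOT NULL)"
    )
    return conn


def pivot_tasks(conn, user_ids):
    """Return one row per date with one task column per requested user.

    Each output column is MAX(CASE WHEN user_id = ? THEN task END), which
    picks that user's task for the date, or NULL if there is none.
    """
    if not user_ids:
        raise ValueError("user_ids must not be empty")
    cols = ", ".join(
        "MAX(CASE WHEN user_id = ? THEN task END)" for _ in user_ids
    )
    sql = (
        f"SELECT task_date, {cols} FROM user_task "
        "GROUP BY task_date ORDER BY task_date"
    )
    return conn.execute(sql, list(user_ids)).fetchall()
```

The storage stays one row per user per day; only the query output is widened, so the schema never changes when users are added.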

I came up with this SQL structure to allow rolling back and auditing user information, will this be adequate?

So, I came up with an idea to store my user information and the updates they make to their own profiles in a way that it is always possible to rollback (as an option to give to the user, for auditing and support purposes, etc.) while at the same time improving (?) the security and prevent malicious activity.
My idea is to store the user's info in rows but never allow the API backend to delete or update those rows, only to insert new ones that should be marked as the "current" data row. I created a graphical explanation:
Schema image
The potential issue I see with this model is that users may update their information too frequently, bloating the database (1 million users with an average of 5 updates per user makes 5 million entries). To address this, I came up with the idea of setting aside the rows with "false" in the "current" column through partitioning, where they should not harm performance and will wait to be cleaned up periodically.
Am I right to choose this model? Is there any other way to do such a thing?
I'd also use a second table user_settings_history.
When a setting is created, INSERT it in the user_settings_history table, along with a timestamp of when it was created. Then also UPDATE the same settings in the user_settings table. There will be one row per user in user_settings, and it will always be the current settings.
So the user_settings would always have the current settings, and the history table would have all prior sets of settings, associated with the date they were created.
This simplifies your queries against the user_settings table. You don't have to modify your queries to filter for the current flag column you described. You just know that the way your app works, the values in user_settings are defined as current.
If you're concerned about the user_settings_history table getting too large, the timestamp column makes it fairly easy to periodically DELETE rows over 180 days old, or whatever number of days seems appropriate to you.
By the way, 5 million rows isn't so large for a MySQL database. You'd want your queries to use an index where appropriate, but the size alone isn't a disadvantage.
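The two-table approach described in this answer can be sketched as follows. It is a minimal sketch under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, the settings are stored as a single text value for brevity, and the function names and the id and created_at columns are hypothetical.

```python
import sqlite3


def create_settings_schema(conn):
    conn.executescript("""
        CREATE TABLE user_settings (
            user_id  INTEGER PRIMARY KEY,  -- exactly one current row per user
            settings TEXT NOT NULL
        );
        CREATE TABLE user_settings_history (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id    INTEGER NOT NULL,
            settings   TEXT NOT NULL,
            created_at TEXT NOT NULL       -- 'YYYY-MM-DD HH:MM:SS'
        );
        CREATE INDEX idx_history_created
            ON user_settings_history (created_at);
    """)


def save_settings(conn, user_id, settings, now):
    """Append to the history and overwrite the current row atomically."""
    with conn:
        conn.execute(
            "INSERT INTO user_settings_history (user_id, settings, created_at) "
            "VALUES (?, ?, ?)",
            (user_id, settings, now),
        )
        # SQLite spelling of an upsert; MySQL has its own equivalents.
        conn.execute(
            "INSERT OR REPLACE INTO user_settings (user_id, settings) "
            "VALUES (?, ?)",
            (user_id, settings),
        )


def purge_history(conn, cutoff):
    """Delete history rows created before `cutoff`; returns rows removed."""
    with conn:
        cur = conn.execute(
            "DELETE FROM user_settings_history WHERE created_at < ?",
            (cutoff,),
        )
    return cur.rowcount
```

Queries for the current settings read user_settings directly and never need a "current" flag, while rollback and auditing read user_settings_history.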

Store all user's login dates

Let's say that I have a website and I want to know all the users that logged in during a certain time interval.
Would it be a good idea to create a new table in the database for this purpose and add a new entry whenever a user logs in?
The table would contain 2 columns: the id of the user and the login date.
My main concern is that the number of entries from the table will become extremely large.
Can this be considered a good idea? Do you know if this method is being applied for other websites?
Thanks in advance!
The number of records in a table can be controlled by an external script run from cron or another scheduler. If the table becomes too big, old records can be removed.
If that is not possible, as a workaround you could check the number of records on each insert.
Just do not forget to set an index on the date field.
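A minimal sketch of such a login table with an index on the date field is shown here. Python's built-in sqlite3 module is used as a stand-in for MySQL, and the names user_login, login_at and users_logged_in_between are assumptions made for illustration.

```python
import sqlite3


def create_login_schema(conn):
    conn.executescript("""
        CREATE TABLE user_login (
            user_id  INTEGER NOT NULL,
            login_at TEXT    NOT NULL  -- 'YYYY-MM-DD HH:MM:SS'
        );
        -- The index lets range queries on the date avoid a full scan.
        CREATE INDEX idx_user_login_at ON user_login (login_at);
    """)


def users_logged_in_between(conn, start, end):
    """Distinct user ids with a login in the half-open range [start, end)."""
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM user_login "
        "WHERE login_at >= ? AND login_at < ? ORDER BY user_id",
        (start, end),
    )
    return [row[0] for row in rows]
```

The same date index also serves the cleanup script, since removing old records is a range condition on login_at.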
Yes, you can create a table that logs every login time of each user. If there are millions of users, you might want to store only the most recent login time instead. If space is not a problem, then it is good to store the login time each time a user is authenticated or authorized. You can then archive the data in this table periodically.
The general answer to this question is 'depends'.
You can:
Add the user to the table on login. This hits the disk for each login, so be careful with a large number of users.
Keep a batch of logins in memory and write the whole group once it reaches a certain size or age. This way you hit the disk fewer times.
Depending on how many users you expect, you could consider a NoSQL solution.
Depending on your system, I advise the 2nd or 3rd approach.
Read this for more info: Fast write performance, even if reads are very slow
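The second approach, buffering logins in memory and writing them in groups, might look roughly like the sketch below. It is only an illustration under assumptions: Python's built-in sqlite3 module stands in for the real database, the LoginBuffer class and the user_login table are hypothetical, and only a size-based flush is shown (a time-based flush would need a timer on top).

```python
import sqlite3


def create_login_table(conn):
    conn.execute(
        "CREATE TABLE user_login ("
        "user_id INTEGER NOT NULL, login_at TEXT NOT NULL)"
    )


class LoginBuffer:
    """Collects logins in memory and writes them in one batch."""

    def __init__(self, conn, max_size=100):
        self.conn = conn
        self.max_size = max_size
        self.pending = []

    def record(self, user_id, login_at):
        self.pending.append((user_id, login_at))
        if len(self.pending) >= self.max_size:
            self.flush()

    def flush(self):
        """Write all pending logins in a single transaction."""
        if not self.pending:
            return 0
        with self.conn:
            self.conn.executemany(
                "INSERT INTO user_login (user_id, login_at) VALUES (?, ?)",
                self.pending,
            )
        written = len(self.pending)
        self.pending = []
        return written
```

The trade-off is durability: anything still in the buffer is lost if the process dies before the next flush, so this suits data where an occasional missing login record is acceptable.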

Recommend to track all logins, update login table, or both?

Currently I am having a hard time deciding/weighing the pros/cons of tracking login information for a member website.
Currently
I have two tables, login_i and login_d.
login_i contains the member's id, password, last login datetime, and total count of logins. (member id is primary key and obviously unique so one row per member)
login_d contains the full history of logins, tracking each and every time a login occurs. It contains the member's id, the datetime of the login, and the ip_address of the login. This table's primary key is simply an auto-incremented INT field, which serves no real purpose, but the table needs a primary key and this is the only single field that is unique (an index, on the other hand, is a different matter, but that is not my concern here).
In many ways I see these tables as being very similar but the benefit of having the latter is to view exactly when a member logged in, how many times, and which IP it came from. All of the information in login_i (last login and count) truthfully exists in login_d but in a more concise form without ever needing to calculate a COUNT(*) on the latter table.
Does anybody have advice on which method is preferred? Two tables will exist regardless but should I keep record of last_login and count in login_i at all if login_d exists?
added thought/question
good comment made below - what about also tracking login attempts based on a username/email/ip? Should this ALSO be stored in a table (a 3rd table I assume).
This is called denormalization.
Ideally you would never denormalize.
It is sometimes done anyway to avoid recomputing computationally expensive results, possibly like your total login count.
The downside is that you may at some point get into a situation where the value in one table does not match the values in the other table(s). Of course you will try your best to keep them properly up to date, but sometimes things happen. In that case, you may introduce bugs in the application logic if it receives an incorrect value from one of the sources.
In this specific case, a count of logins is probably not that critical to the successful running of the app - so not a big risk - although you will still have the overhead of maintaining the value.
Do you often need the last login and the count? If yes, then you should store them in login_i as well. If they are rarely used, then you can take the time to run the query against the giant table of all logins instead of storing duplicated data.
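Both options can be put side by side in a short sketch. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, omits the password column, and the function names and exact column names (last_login, login_count, login_at) are assumptions. Updating both tables inside one transaction is one way to reduce the risk of the two copies drifting apart.

```python
import sqlite3


def create_login_tables(conn):
    conn.executescript("""
        CREATE TABLE login_i (
            member_id   INTEGER PRIMARY KEY,
            last_login  TEXT,
            login_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE login_d (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            member_id  INTEGER NOT NULL,
            login_at   TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS'
            ip_address TEXT    NOT NULL
        );
        CREATE INDEX idx_login_d_member ON login_d (member_id, login_at);
    """)


def record_login(conn, member_id, login_at, ip_address):
    """Denormalized option: log the login and refresh the summary
    columns in the same transaction so they cannot drift apart."""
    with conn:
        conn.execute(
            "INSERT INTO login_d (member_id, login_at, ip_address) "
            "VALUES (?, ?, ?)",
            (member_id, login_at, ip_address),
        )
        conn.execute(
            "UPDATE login_i SET last_login = ?, login_count = login_count + 1 "
            "WHERE member_id = ?",
            (login_at, member_id),
        )


def summary_from_history(conn, member_id):
    """Normalized option: derive (count, last login) from login_d alone."""
    return conn.execute(
        "SELECT COUNT(*), MAX(login_at) FROM login_d WHERE member_id = ?",
        (member_id,),
    ).fetchone()
```

If the summary is read on every page view, the stored columns save an aggregate query each time. If it is read rarely, summary_from_history alone is enough and there is nothing to keep in sync.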

Where to store users visited pages?

I have a project, where I have posts for example.
The task is this: I must show each user the posts they last visited.
This is my solution: every time a user visits a topic that is new to them, I create a new record in the visits table.
The visits table has the following structure: id, user_id, post_id, last_visit.
My visits table now has ~14,000,000 records and is still growing every day.
Maybe my solution isn't optimal and there is another way to store user visits?
It's important to save every visit as a standalone record, because I also have a feature that selects and uses users' visits. And I can't purge this table, because the data could be needed months or a year later. How could I optimize this?
Nope, you don't really have much choice other than to store your visit data in a table with columns for (at a bare minimum) user id, post id, and timestamp if you need to track the last time that each user visited each post.
I question whether you need an id field in that table, rather than using a composite key on (user_id, post_id), but I'd expect that to have a minor effect, provided that you already have a unique index on (user_id, post_id). (If you don't have an index on that pair of fields, adding one should improve query performance considerably and making it a unique index or composite key will protect against accidentally inserting duplicate records.)
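A minimal sketch of the composite-key variant is given here, again using Python's built-in sqlite3 module as a stand-in for MySQL. The function names are hypothetical, and INSERT OR REPLACE is SQLite's spelling of "insert, or overwrite the row with the same key"; MySQL offers comparable statements.

```python
import sqlite3


def create_visits_schema(conn):
    # The composite primary key replaces the separate id column and
    # guarantees at most one row per (user, post) pair.
    conn.executescript("""
        CREATE TABLE visits (
            user_id    INTEGER NOT NULL,
            post_id    INTEGER NOT NULL,
            last_visit TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS'
            PRIMARY KEY (user_id, post_id)
        );
    """)


def record_visit(conn, user_id, post_id, visited_at):
    """Insert the visit, or overwrite last_visit if the pair exists."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO visits (user_id, post_id, last_visit) "
            "VALUES (?, ?, ?)",
            (user_id, post_id, visited_at),
        )


def last_visits(conn, user_id, limit=10):
    """Most recently visited posts for one user, newest first."""
    return conn.execute(
        "SELECT post_id, last_visit FROM visits "
        "WHERE user_id = ? ORDER BY last_visit DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
```

Because user_id is the leading column of the key, looking up everything one user has visited can use the key's index directly.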
If performance is still an issue despite proper indexing, you should be able to improve it a bit by segmenting the table into a collection of smaller tables, but segment it by user_id or post_id (rather than by date as previous answers have suggested). If you break it up by user or post id, then you will still be able to determine whether a given user has previously viewed a given post and, if so, on what date with only a single query. If you segment it by date, then that information will be spread across all tables and, in the worst-case scenario of a user who has never previously viewed a post (which I expect to be fairly common), you'll need to separately query each and every table before having a definitive answer.
As for whether to segment it by user id or by post id, that depends on whether you will more often be looking for all posts viewed by a user (segment by user_id to get them all in one query) or all users who have viewed a post (segment by post_id).
If it doesn't need to be long lasting, you could store it in session instead. If it does, you could either break the records apart by table, like say 1 per month, or you could only store the last 5-10 pages visited, and delete old ones as new ones come in. You could also change it to pages visited today, this week, etc.
If you do need all 14 million records, I would create another historical table to archive the visits that are not the most relevant for the day-to-day site operation.
At the end of the month (or week, or quarter, etc.) have some scheduled logic archive records beyond a certain cutoff point to the historical table and reduce the number of records in the "live" table. This should help increase the query speed on the "live" table, since it would have fewer records in it.
If you do need to query all of the data, you can use both tables and have all of the data available to you.
You could delete the ones you don't need. If you only want to show the last 10 visited posts, then run
DELETE FROM visits WHERE user_id = ? AND id NOT IN (SELECT id FROM (SELECT id FROM visits WHERE user_id = ? ORDER BY last_visit DESC LIMIT 10) AS recent);
(I think that's the best way to write that query; any MySQL guru can tell me otherwise. The extra derived table is there because MySQL rejects LIMIT directly inside an IN subquery and does not let a subquery select from the table being deleted from. You can ORDER BY in DELETE, but its LIMIT only takes one parameter, so you can't do LIMIT 10, 100 there.)
Run it after inserting/updating each new row, or every few days if you like.
Having a structure like (id, user_id, post_id, last_visit) for your visits table makes it appear as though you are saving all posts, not just the last post per topic. Don't you need a topic ID in there somewhere so that you can determine what their last post PER TOPIC was, and so you know which row to replace when they post in the same topic more than once?
Store the post_ids in $_SESSION and then, using MySQL's IN with a single SELECT query, you will be able to show the user's visited posts. All those ids will be destroyed once the member closes their browser, but this is still much faster than using the database.
Edit: sorry, I didn't notice that you must store those records in the database and use them months later. In that case I have no idea how to optimize it, but with 14 million records you should definitely use indexes.