mysql: Complex conditional query before group by - mysql

I have a Postings table (with data of people posting a service they offer) and a table of people that have corresponded (written mails) to these Posting authors thus starting a Transaction (inserted into a second table: Transactions).
Each Posting can have many transactions. Each time a user Logs-in he/she (Transaction_Taker) can send mail to the author (Posting_Author) of his choice.
Each first mail generates a new Transaction and its Transaction_Id (int) is appended to Postings table in the varchar, hyphen-separated Posting_Transaction_List field.
The contents of each subsequent mail that same (logged-in) user (aka Transaction_Taker) sends, does not create/insert a new transaction (nothing inserted to field Posting_Transaction_List) but rather is appended (update) to the Transaction that was started initially by that user for that Posting.
For easy navigation and search, once a user has logged in, I show an AJAX-generated list of all these postings such that each Posting shows only once, even though it can have many transactions. In other words, I need to show a list of all available Postings, including the ones for which this (logged-in) user has started a Transaction. Those postings should appear with this user's transaction, but never with OTHER users' transactions. That is, only the logged-in user should see his/her transactions.
Assuming I have table Postings with fields: Posting_Id (int), Posting_Author (varchar), Posting_Content (text), Posting_Transaction_List (varchar)
…and table Transactions with fields: Transaction_Id (int), Transaction_Posting_Id (int), Transaction_Taker_Id (int)
I am (almost) achieving my goal with the following SQL:
$AlmostGoodSQL = "SELECT *, Posting_Id FROM Postings LEFT JOIN Transactions ON
Postings.Posting_Id = Transactions.Transaction_Posting_Id WHERE Posting_Content
LIKE '%" . $SomeSearchString . "%' GROUP BY Posting_Id";
The problem is this shows a distinct instance of each Posting, but not necessarily the ones that have to do with the logged-in user (in the case where there are many transactions -including hers- for a Posting). To do this, I would need to select ALL Postings without transactions attached PLUS those that have Transactions just for this user BEFORE doing the group by. This is what I cannot achieve. I believe that due to the way 'group by' works you could maybe select maximum or minimum values, but not an exact match, say for all the Postings that have Transactions with user (Transaction_Taker) '123456'. I think "group by" shows whichever instance it finds first. How to make it match my criteria?
It does not look like a subquery would do, but rather like something conditional, like: "Search for all Postings and if the Posting has a Transaction listed in the Posting_Transaction_List that points to a Transaction where the Transaction_Taker_Id is the one of the logged-in user ($UserId), then show it distinctly (just that one, once)"… and I don't know how to do all that in SQL: Can anybody please help?
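One possible approach, sketched here under assumptions rather than offered as the definitive answer: put the user condition into the ON clause of the LEFT JOIN, so that each Posting is paired only with the logged-in user's Transaction (or with NULLs if she has none). The sketch uses Python's sqlite3 as a stand-in for MySQL; the table and column names come from the question, while the sample rows and the function name postings_for_user are hypothetical. It also assumes at most one Transaction per user per Posting (as described above), so no GROUP BY is needed.

```python
import sqlite3

def make_db():
    # In-memory SQLite database standing in for the MySQL tables in the question.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Postings (
            Posting_Id INTEGER PRIMARY KEY,
            Posting_Author TEXT,
            Posting_Content TEXT
        );
        CREATE TABLE Transactions (
            Transaction_Id INTEGER PRIMARY KEY,
            Transaction_Posting_Id INTEGER,
            Transaction_Taker_Id INTEGER
        );
        -- Hypothetical sample data: posting 1 has two takers, posting 3 has none.
        INSERT INTO Postings VALUES
            (1, 'ann', 'piano lessons'),
            (2, 'bob', 'guitar lessons'),
            (3, 'cy',  'dog walking');
        INSERT INTO Transactions VALUES
            (10, 1, 999),
            (11, 1, 123456),
            (12, 2, 999);
    """)
    return conn

def postings_for_user(conn, user_id, search):
    # The user filter lives in ON, not WHERE, so postings without a matching
    # transaction are kept (with NULL) instead of being filtered out.
    sql = """
        SELECT p.Posting_Id, t.Transaction_Id
        FROM Postings AS p
        LEFT JOIN Transactions AS t
               ON t.Transaction_Posting_Id = p.Posting_Id
              AND t.Transaction_Taker_Id = ?
        WHERE p.Posting_Content LIKE ?
        ORDER BY p.Posting_Id
    """
    return conn.execute(sql, (user_id, f"%{search}%")).fetchall()
```

Calling postings_for_user(make_db(), 123456, "lessons") returns each matching Posting once, with the user's own Transaction_Id where one exists. Bound parameters are used instead of string concatenation, which also avoids SQL injection through the search string.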

Related

Is there a scalability issue in having a one-to-many relationship between participants and conversation?

I have a database design as shown in following entity-relationship diagram (ERD):
https://app.dbdesigner.net/designer/schema/0-social_media-00a3405c-0bcd-4809-9f8e-e86c1b8e5f33
I was wondering if I should have a one-to-many relationship between Participants and Conversation.
Issue: need many joins
The issue is that we need to make a join every time we want to get the id of the Participants of a Conversation to broadcast Messages.
Not only that, but we also need the content of the Messages, meaning we need to make two joins between three tables.
Questions
Is there a more scalable solution for this?
Are there any bottleneck issues?
As an added bonus: is there anything else wrong with the table design aside from that?
Scalable because:
If one conversation attracts more and more Users (in their role as participants), you simply have to add rows in the table Participants. Imagine the conversation has a members-list, it's called Participants.
If one User account was deleted, you simply have to search for all his records (associated conversations) in table Participants and delete them as well.
Both cases mean only a modification of Participants, whereas the conversation remains untouched.
Associative Entity
This membership or relationship of User to Conversation is bridged by a so-called associative relationship, associative table or associative entity. This means one User can attend (participate in) zero or many Conversations; conversely, one Conversation can have one (at least the creator) or many participating Users.
So the entity/table Participants acts like a bridge: connecting two sides/perspectives.
Broadcast Example
User A wants to broadcast a message to the channel/conversation 1. Now the system needs to determine all recipients. So look only within table Participants for the conversation 1 and find their attending Users A, B and C. All except the sender A should receive the broadcast: B and C.
There was no join involved. A simple query: SELECT user_id FROM participants WHERE conversation_id = 1 AND user_id <> 'A'. Given the Message and assuming that user_ids can be used directly as destination (email-address, phone-number, etc.), the system can immediately send the broadcast out.
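As a minimal sketch of that lookup (assuming a participants table with just conversation_id and user_id columns, and using Python's sqlite3 as a stand-in for the real database; the function names and sample rows are made up for illustration):

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Associative table: one row per (conversation, user) membership.
        CREATE TABLE participants (
            conversation_id INTEGER NOT NULL,
            user_id TEXT NOT NULL,
            PRIMARY KEY (conversation_id, user_id)
        );
        INSERT INTO participants VALUES
            (1, 'A'), (1, 'B'), (1, 'C'), (2, 'A'), (2, 'D');
    """)
    return conn

def recipients(conn, conversation_id, sender_id):
    # Single-table query: everyone in the conversation except the sender.
    rows = conn.execute(
        "SELECT user_id FROM participants "
        "WHERE conversation_id = ? AND user_id <> ? "
        "ORDER BY user_id",
        (conversation_id, sender_id),
    ).fetchall()
    return [row[0] for row in rows]

def remove_user(conn, user_id):
    # Deleting an account only touches the membership rows.
    conn.execute("DELETE FROM participants WHERE user_id = ?", (user_id,))
```

The composite primary key doubles as the index that serves the conversation_id lookup, so the broadcast query should not need any join.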

How to efficiently program a big/big left join query in mySQL?

Our problem lies in performing a left join on two large tables (both having millions of entries).
The first one is a table that contains input supplied by the end-user of our program. It contains answers to a variety of questions. Every question belongs to a certain questionnaire. The most important columns are an identifier for the given response, an identifier for the questionnaire form, the datetime the answer is given and an identifier for the user that supplied the answer.
The second table contains information on daily progress of the users regarding the completion of questionnaires. It contains information on the amount of answers a certain user has given on a certain day for a given activity. The most important columns in this table are the user id, the questionnaire id and the date.
The second table is updated right after a new answer enters the first table. Updating is performed by code (workers) that runs on a different server. We would like to make the system robust against failure of this other server. An important step to ensure that the table with the results ('responses') remains in sync with the progress ('progress_questionnaires') table is to be able to check whether a combination of user_id, questionnaire_id and datetime from the 'responses' table is also present in the 'progress_questionnaires' table. A query that captures the required results, but does not perform on large tables (N x N, where N is a couple of million entries), is displayed below.
A query that captures the required results is:
SELECT r.chapter_id, r.user_id, CAST(first_created as date) as date, 1 as original
FROM responses r
LEFT JOIN progress_questionnaires pq ON r.questionnaire_id = pq.questionnaire_id AND r.user_id = pq.user_id AND CAST(r.first_created as date) = pq.date
WHERE pq.questionnaire_id IS NULL
GROUP BY r.questionnaire_id, r.user_id, CAST(r.first_created as date)
As stated before, this query does capture the required results, but does not perform well on large tables. All key columns are properly indexed as far as we know.
We would be very happy if someone could help us out.
P.S. We are using MariaDB, SQL version 5.5.43. I hope I supplied all necessary information, but naturally I would be happy to supply additional information where necessary.
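One thing that might be worth trying, as a sketch and not a guaranteed fix: express the check as an anti-join with NOT EXISTS and make sure progress_questionnaires has a single composite index covering all three lookup columns. The example below uses Python's sqlite3 as a stand-in for MariaDB, so SQLite's date() replaces CAST(... AS date); the index name, the sample rows and the function name are hypothetical. Whether this actually beats the LEFT JOIN ... IS NULL form on MariaDB 5.5 would have to be verified with EXPLAIN on the real data.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE responses (
            id INTEGER PRIMARY KEY,
            questionnaire_id INTEGER,
            user_id INTEGER,
            first_created TEXT
        );
        CREATE TABLE progress_questionnaires (
            questionnaire_id INTEGER,
            user_id INTEGER,
            date TEXT
        );
        -- One composite index that matches the whole lookup condition.
        CREATE INDEX idx_pq_lookup
            ON progress_questionnaires (user_id, questionnaire_id, date);
        INSERT INTO responses VALUES
            (1, 7, 100, '2015-06-01 09:00:00'),
            (2, 7, 100, '2015-06-01 17:30:00'),
            (3, 7, 100, '2015-06-02 08:15:00'),
            (4, 8, 200, '2015-06-02 10:00:00');
        INSERT INTO progress_questionnaires VALUES
            (7, 100, '2015-06-01');
    """)
    return conn

def missing_progress(conn):
    # Anti-join: (questionnaire, user, day) combinations that have responses
    # but no matching row in progress_questionnaires.
    sql = """
        SELECT DISTINCT r.questionnaire_id, r.user_id,
               date(r.first_created) AS day
        FROM responses AS r
        WHERE NOT EXISTS (
            SELECT 1
            FROM progress_questionnaires AS pq
            WHERE pq.user_id = r.user_id
              AND pq.questionnaire_id = r.questionnaire_id
              AND pq.date = date(r.first_created)
        )
        ORDER BY r.questionnaire_id, r.user_id, day
    """
    return conn.execute(sql).fetchall()
```

If the worker failure only ever affects recent data, restricting the outer query to a recent first_created range would shrink the scan further, but that depends on requirements not stated in the question.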

User Point System DB Design Approach

I am looking for a point system DB design. My question is quite similar to one I found here: "Database design - Approach for storing points for users".
In this system, a user earns points when any of these actions happens:
The user registers on the website (i.e. an active entry in the User table).
The user writes an answer to another user's question (i.e. an entry in the Answer table).
The user's answers are rated by other users (i.e. an entry in the Answer Rating table for the user).
The user invites other users to join the platform.
From the DB Design Side, I have created these two tables:-
Action_Master table, and
User_Action_Point table.
The Action_Master table contains this :- (id, action_name, action_point)
The User_Action_Point table stores the history of each action, so it looks like this:
(id, action_master_id, action_done, created_by, created_at, updated_by, updated_at, deleted_at)
Now the problem here is the User_Action_Point table: it repeats data that is already in the User table, the Answer table and the Answer_rating table.
This problem is very well addressed by Jeffrey in the first answer to the linked question. According to his answer, we should use views or stored procedures to sum up the points from the different tables every time. This approach is appealing because we do not need to handle the overhead of data deletion or any other change that may affect the user's points.
But is that a good approach when we need users' points very frequently? Could it increase the DB response time or the load on the MySQL server?
Or do I need to store the aggregated user points in some table, with the overhead of handling repeated data (i.e. if anything gets deleted, we also have to subtract those points in the points table)?
Please Suggest.
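To make the view-based option concrete, here is a rough sketch using Python's sqlite3 as a stand-in for MySQL. Only action_master (id, action_name, action_point) comes from the question; the users, answers and answer_ratings layouts, the invited_by column, the view names and the point values are all assumptions for illustration.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE action_master (
            id INTEGER PRIMARY KEY, action_name TEXT UNIQUE, action_point INTEGER);
        CREATE TABLE users (id INTEGER PRIMARY KEY, invited_by INTEGER);
        CREATE TABLE answers (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE answer_ratings (id INTEGER PRIMARY KEY, answer_id INTEGER);

        INSERT INTO action_master VALUES
            (1, 'register', 10), (2, 'answer', 5),
            (3, 'answer_rated', 2), (4, 'invite', 20);
        INSERT INTO users VALUES (1, NULL), (2, 1);
        INSERT INTO answers VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO answer_ratings VALUES (1, 1), (2, 1), (3, 3);

        -- One row per point-earning event, derived from the source tables.
        CREATE VIEW user_action_events AS
            SELECT id AS user_id, 'register' AS action_name FROM users
            UNION ALL
            SELECT user_id, 'answer' FROM answers
            UNION ALL
            SELECT a.user_id, 'answer_rated'
            FROM answer_ratings AS r JOIN answers AS a ON a.id = r.answer_id
            UNION ALL
            SELECT invited_by, 'invite' FROM users WHERE invited_by IS NOT NULL;

        -- Points are computed on read, so deletions need no bookkeeping.
        CREATE VIEW user_points AS
            SELECT e.user_id, SUM(m.action_point) AS points
            FROM user_action_events AS e
            JOIN action_master AS m ON m.action_name = e.action_name
            GROUP BY e.user_id;
    """)
    return conn

def points_for(conn, user_id):
    row = conn.execute(
        "SELECT points FROM user_points WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0
```

With this layout, deleting an answer changes the total automatically on the next read. If reads turn out to be too slow, the same view could be used to refill a cached totals table periodically, at the cost of slightly stale numbers.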

Where to store users visited pages?

I have a project in which I have posts, for example.
The task is as follows: I must show each user the posts he visited most recently.
This is my solution: every time a user visits a topic that is new (for him), I create a new record in the table visits.
The table visits has the following structure: id, user_id, post_id, last_visit.
My visits table now has ~14,000,000 records and it is still growing every day.
Maybe my solution isn't optimal and there is another way to store users' visits?
It's important to save every visit as a standalone record, because I also have a feature to select and use users' visits. And I can't purge this table, because the data could be needed a month or a year later. How could I optimize this situation?
Nope, you don't really have much choice other than to store your visit data in a table with columns for (at a bare minimum) user id, post id, and timestamp if you need to track the last time that each user visited each post.
I question whether you need an id field in that table, rather than using a composite key on (user_id, post_id), but I'd expect that to have a minor effect, provided that you already have a unique index on (user_id, post_id). (If you don't have an index on that pair of fields, adding one should improve query performance considerably and making it a unique index or composite key will protect against accidentally inserting duplicate records.)
If performance is still an issue despite proper indexing, you should be able to improve it a bit by segmenting the table into a collection of smaller tables, but segment it by user_id or post_id (rather than by date as previous answers have suggested). If you break it up by user or post id, then you will still be able to determine whether a given user has previously viewed a given post and, if so, on what date with only a single query. If you segment it by date, then that information will be spread across all tables and, in the worst-case scenario of a user who has never previously viewed a post (which I expect to be fairly common), you'll need to separately query each and every table before having a definitive answer.
As for whether to segment it by user id or by post id, that depends on whether you will more often be looking for all posts viewed by a user (segment by user_id to get them all in one query) or all users who have viewed a post (segment by post_id).
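A minimal sketch of the composite-key layout, assuming the three columns named above and using Python's sqlite3 as a stand-in for MySQL (the function names are hypothetical). SQLite's INSERT OR REPLACE is used here; in MySQL the corresponding statement would be INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE visits (
            user_id INTEGER NOT NULL,
            post_id INTEGER NOT NULL,
            last_visit TEXT NOT NULL,
            -- Composite key: at most one row per (user, post) pair.
            PRIMARY KEY (user_id, post_id)
        )
    """)
    return conn

def record_visit(conn, user_id, post_id, when):
    # Inserts the first visit or overwrites last_visit on a repeat visit.
    conn.execute(
        "INSERT OR REPLACE INTO visits (user_id, post_id, last_visit) "
        "VALUES (?, ?, ?)",
        (user_id, post_id, when),
    )

def last_visit(conn, user_id, post_id):
    # Single indexed lookup; returns None if the user never saw the post.
    row = conn.execute(
        "SELECT last_visit FROM visits WHERE user_id = ? AND post_id = ?",
        (user_id, post_id),
    ).fetchone()
    return row[0] if row else None
```

Because the key leads with user_id, "all posts viewed by a user" is also served by the same index; the reverse lookup by post_id would need its own index.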
If it doesn't need to be long lasting, you could store it in session instead. If it does, you could either break the records apart by table, like say 1 per month, or you could only store the last 5-10 pages visited, and delete old ones as new ones come in. You could also change it to pages visited today, this week, etc.
If you do need all 14 million records, I would create another historical table to archive the visits that are not the most relevant for the day-to-day site operation.
At the end of the month (or week, or quarter, etc.), have some scheduled logic archive records beyond a certain cutoff point to the historical table and reduce the number of records in the "live" table. This should help increase the query speed on the "live" table, since it would hold fewer records.
If you do need to query all of the data, you can use both tables and have all of the data available to you.
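The archiving step could look roughly like this. It is a sketch using Python's sqlite3 in place of MySQL; the visits_archive table name, the cutoff format (ISO date strings) and the function names are assumptions.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE visits (
            id INTEGER PRIMARY KEY, user_id INTEGER,
            post_id INTEGER, last_visit TEXT);
        CREATE TABLE visits_archive (
            id INTEGER PRIMARY KEY, user_id INTEGER,
            post_id INTEGER, last_visit TEXT);
        INSERT INTO visits VALUES
            (1, 1, 10, '2023-11-05'),
            (2, 1, 11, '2024-01-20'),
            (3, 2, 10, '2023-12-31');
    """)
    return conn

def archive_before(conn, cutoff):
    # Copy and delete inside one transaction so a failure cannot lose rows.
    with conn:
        conn.execute(
            "INSERT INTO visits_archive "
            "SELECT id, user_id, post_id, last_visit FROM visits "
            "WHERE last_visit < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM visits WHERE last_visit < ?", (cutoff,))
    return cur.rowcount  # number of rows moved out of the live table

def all_visits(conn, user_id):
    # Query both tables when the full history is needed.
    sql = """
        SELECT post_id, last_visit FROM visits WHERE user_id = ?
        UNION ALL
        SELECT post_id, last_visit FROM visits_archive WHERE user_id = ?
        ORDER BY last_visit
    """
    return conn.execute(sql, (user_id, user_id)).fetchall()
```

A scheduled job (cron or similar) would call archive_before with the chosen cutoff.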
You could delete the ones you don't need. If you only want to show the last 10 visited posts, then run
DELETE FROM visits WHERE user_id = ? AND id NOT IN (SELECT id FROM (SELECT id FROM visits WHERE user_id = ? ORDER BY last_visit DESC LIMIT 10) AS recent);
(I think that's the best way to do that query; any MySQL guru can tell me otherwise? The extra derived table is there because MySQL rejects LIMIT directly inside an IN subquery and does not let a DELETE select from its own target table in a plain subquery. You can ORDER BY in DELETE, but the LIMIT there only takes 1 parameter, so you can't do LIMIT 10, 100.)
Run this after inserting/updating each new row, or every few days if you like.
Having a structure like (id, user_id, post_id, last_visit) for your visits table makes it appear as though you are saving all posts, not just the last post per topic. Don't you need a topic ID in there somewhere, so that you can determine what their last post PER TOPIC was, and so you know which row to replace when they post in the same topic more than once?
Store the post_ids in $_SESSION; then, using MySQL IN with one SELECT query, you will be able to show the user's visited posts. All those ids will be destroyed after the member closes his browser, but this is much faster than using the database.
Edit: sorry, I didn't notice that you must store those records in the database and use them months later. Then I have no idea how to optimize it, but with 14 million records you should definitely use indexes.

Facebook like notifications tracking (DB Design)

I am just trying to figure out how Facebook's database is structured for tracking notifications.
I won't go into as much complexity as Facebook does. If we imagine a simple table structure for notifications:
notifications (id, userid, update, time);
We can get the notifications of friends using:
SELECT `userid`, `update`, `time`
FROM `notifications`
WHERE `userid` IN
(... query for getting friends...)
However, what should be the table structure to check out which notifications have been read and which haven't?
I don't know if this is the best way to do this, but since I got no ideas from anyone else, this is what I would be doing. I hope this answer might help others as well.
We have 2 tables
notification
-----------------
id (pk)
userid
notification_type (for complexity like notifications for pictures, videos, apps etc.)
notification
time
notificationsRead
--------------------
id (pk) (i dont think this field is required, anyways)
lasttime_read
userid
The idea is to select notifications from notifications table and join the notificationsRead table and check the last read notification and rows with ID > notificationid. And each time the notifications page is opened update the row from notificationsRead table.
The query for unread notifications I guess would be like this..
SELECT `userid`, `notification`, `time` FROM `notifications`
WHERE
`notifications`.`userid` IN ( ... query to get a list of friends ...)
AND
(`notifications`.`time` > (
SELECT `notificationsRead`.`lasttime_read` FROM `notificationsRead`
WHERE `notificationsRead`.`userid` = ...$userid...
))
The query above is not checked.
Thanks to the idea of db design from #espais
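For anyone who wants to try the last-read idea, here is a small sketch with Python's sqlite3 standing in for MySQL. It assumes integer timestamps, a notificationsRead table keyed by userid, and a friend list passed in from application code; the function names are made up. COALESCE covers users who have no notificationsRead row yet, which the unchecked query above would treat as "nothing unread".

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE notifications (
            id INTEGER PRIMARY KEY, userid INTEGER,
            notification TEXT, time INTEGER);
        CREATE TABLE notificationsRead (
            userid INTEGER PRIMARY KEY, lasttime_read INTEGER);
        INSERT INTO notifications VALUES
            (1, 2, 'a', 100), (2, 3, 'b', 200), (3, 4, 'c', 300);
    """)
    return conn

def unread(conn, reader_id, friend_ids):
    if not friend_ids:
        return []
    marks = ",".join("?" * len(friend_ids))  # one placeholder per friend id
    sql = f"""
        SELECT id, userid, notification
        FROM notifications
        WHERE userid IN ({marks})
          AND time > COALESCE(
                (SELECT lasttime_read FROM notificationsRead WHERE userid = ?),
                0)
        ORDER BY time
    """
    return conn.execute(sql, (*friend_ids, reader_id)).fetchall()

def mark_all_read(conn, reader_id, now):
    # Called when the notifications page is opened.
    conn.execute(
        "INSERT OR REPLACE INTO notificationsRead (userid, lasttime_read) "
        "VALUES (?, ?)",
        (reader_id, now),
    )
```

This stores one row per user regardless of how many notifications exist, but it cannot mark individual notifications as read.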
You could add another table...
tblUserNotificationStatus
-------------------------
- id (pk)
- notification_id
- user_id
- read_status (boolean)
If you wanted to keep a history, you could keep the X latest notifications and delete the rest that are older than your last notification in the list....
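A sketch of how that status table might be queried, using Python's sqlite3 as a stand-in. The notifications table layout, the UNIQUE constraint on (notification_id, user_id) and the function names are assumptions; a missing status row is treated as unread.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE notifications (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE tblUserNotificationStatus (
            id INTEGER PRIMARY KEY,
            notification_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            read_status INTEGER NOT NULL DEFAULT 0,
            UNIQUE (notification_id, user_id)
        );
        INSERT INTO notifications VALUES (1, 'n1'), (2, 'n2'), (3, 'n3');
    """)
    return conn

def unread_for(conn, user_id):
    # LEFT JOIN keeps notifications that have no status row for this user.
    sql = """
        SELECT n.id
        FROM notifications AS n
        LEFT JOIN tblUserNotificationStatus AS s
               ON s.notification_id = n.id AND s.user_id = ?
        WHERE COALESCE(s.read_status, 0) = 0
        ORDER BY n.id
    """
    return [row[0] for row in conn.execute(sql, (user_id,))]

def mark_read(conn, user_id, notification_id):
    conn.execute(
        "INSERT OR REPLACE INTO tblUserNotificationStatus "
        "(notification_id, user_id, read_status) VALUES (?, ?, 1)",
        (notification_id, user_id),
    )
```

Unlike a single last-read timestamp, this allows per-notification read state, at the cost of one row per user per notification read.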
If, when you give notifications, you give all relevant notifications available at that time, you can make this simpler by attaching timestamps to notifiable events, and keeping track of when each user last received notifications. If you are in a multi-server environment, though, you do have to be careful about synchronization. Note that this approach doesn't require true date-time stamps, just something that increases monotonically.
I see no one here addresses the fact that notifications are usually recurring; i.e. a notification of an upcoming transaction is always going to be the same, but with a different transaction ID or date in it, like so: { You have a new upcoming payment: #paymentID, with a due date of #dueDate }.
Having the texts in a different table can also help:
If you want to change the notification text later on, there is only one place to change it.
Making the app multilingual is easier, because I can just layer the notifications table with a language code and retrieve the appropriate string.
Thus I also made a table for those abstract notifications, which are just linked to the user through a middle table, where one notification type can be sent to one user multiple times. I also linked the notifications to the user not by a foreign key ID; instead I made notification codes for all notifications and full_text indexed the varchar field of those codes, for faster read speeds. Because these notifications need to be sent at specific times, it is also easier for the developer to write
NotificationService::sendNew( Notification::NOTE_NEW_PAYMENT, ['paymentId' => 123, 'dueDate' => Carbon::now()], 'userIdToSendTo' );
Now, since my messages are going to have custom data inserted into the string (as you can see from the second argument above), I will store that data in a database blob, like this:
$values = base64_encode(serialize($valuesInTextArray));
This is because I want to decouple the notifications from other tables, so I don't want to create unnecessary FK relations from and to the notifications table (for example, saying that notification 234 is attached to transaction 23 and then joining to get that transaction ID). Decoupling this takes away the overhead of managing these relations. The downside is that it is nigh impossible to delete notifications when, for example, a transaction is deleted, but in my use case I decided this is not needed anyway.
I will retrieve and fill the texts on the app side as follows. P.S. I am using someone's vksprintf function (https://github.com/washingtonpost/datawrapper/blob/master/lib/utils/vksprintf.php), props to them!
$valuesToFillInString = unserialize(base64_decode($notification->values));
vksprintf( $notificationText->text, $valuesToFillInString );
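The same store-then-fill round trip can be sketched in Python. This is an analogue, not the PHP code above: JSON replaces serialize(), string.Template replaces vksprintf, and the $name placeholder syntax, the TEXTS dictionary and the function names are all assumptions made for the example.

```python
import base64
import json
from string import Template

# Hypothetical stand-in for the NotificationTexts table.
TEXTS = {
    "NOTE_NEW_PAYMENT":
        "You have a new upcoming payment: $paymentID, "
        "with a due date of $dueDate",
}

def pack_values(values):
    # Encode the per-notification values for storage in a blob/text column.
    raw = json.dumps(values).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def unpack_values(blob):
    return json.loads(base64.b64decode(blob))

def render(text_id, blob):
    # Look up the shared template and fill in the stored values.
    return Template(TEXTS[text_id]).substitute(unpack_values(blob))

stored = pack_values({"paymentID": 123, "dueDate": "2024-01-31"})
message = render("NOTE_NEW_PAYMENT", stored)
```

Keeping the template separate from the values means the wording (or language) can change without touching stored notifications.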
Notice also which fields I index, because I am going to find or sort by them
My Database design is as follows
==============================
TABLE: Users
id (pk)
==============================
TABLE: Notifications
id (pk)
user_id (fk, indexed)
text_id (fk - NotificationTexts table)
values (blob) [containing the array of values, to input into the text string]
createdDateTime (DateTime)
read (boolean)
[ClusterIndex] => (user_id, createdDateTime)
==============================
TABLE: NotificationTexts
id (pk)
text_id (unique, indexed)
text (varchar) [{ You have a new upcoming payment: #paymentID, with a due date of #dueDate }]
note (varchar, nullable) [notes for developers, informational column]
I am also trying to figure out how to design a notification system. Regarding notification status (read, unread, deleted, archived, etc.), I think it would be a good candidate for an ENUM. It is possible that there will be more statuses than just READ and UNREAD, such as deleted, archived, seen, dismissed, etc.
That will allow you to expand as your needs evolve.
Also I think it may make sense (at least in my case) to have a field to store an action url or a link. Some notifications could require or prompt the user to follow a link.
It also may make sense to have a notification type as well if you want different types. I am thinking there could be system notifications (such as a verify email notification) and user prompted notifications (such as a friend request).
Here is the structure I think would be a minimum to have a decent notification system.
users
-------------
id
username
password
email
notifications
-------------
id
user_id (fk)
notification_type (enum)
notification_status (enum)
notification_action (link)
notification_text
date_created (timestamp)
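As a rough, runnable sketch of that structure (Python's sqlite3 standing in for MySQL): SQLite has no ENUM type, so CHECK constraints play that role here, and the specific type and status values, the default status and the function names are assumptions.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY, username TEXT, password TEXT, email TEXT);
        CREATE TABLE notifications (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            -- CHECK constraints emulate MySQL ENUM columns.
            notification_type TEXT NOT NULL
                CHECK (notification_type IN ('system', 'user')),
            notification_status TEXT NOT NULL DEFAULT 'unread'
                CHECK (notification_status IN
                       ('unread', 'read', 'archived', 'deleted')),
            notification_action TEXT,
            notification_text TEXT NOT NULL,
            date_created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
    """)
    return conn

def add_notification(conn, user_id, ntype, text, action=None):
    cur = conn.execute(
        "INSERT INTO notifications "
        "(user_id, notification_type, notification_text, notification_action) "
        "VALUES (?, ?, ?, ?)",
        (user_id, ntype, text, action),
    )
    return cur.lastrowid

def set_status(conn, notification_id, status):
    # An unknown status violates the CHECK and raises sqlite3.IntegrityError.
    conn.execute(
        "UPDATE notifications SET notification_status = ? WHERE id = ?",
        (status, notification_id),
    )

def get_status(conn, notification_id):
    return conn.execute(
        "SELECT notification_status FROM notifications WHERE id = ?",
        (notification_id,),
    ).fetchone()[0]
```

Adding a new status later means changing the allowed list in one place, which in MySQL would be an ALTER TABLE on the ENUM column.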
The tables are as follows:
User
userId (Integer)
fullName(VarChar)
Notification
notificationId (Integer)
creationDate (Date)
notificationDetailUrl (VarChar)
isRead (boolean)
description (VarChar)
userId (F.K)