I've researched related questions on the site but failed to find a solution. What I have is a user activity table in MySQL. It lists all kinds of events of a user within a photo community site. Depending on the event that took place, I need to query certain data from other tables.
I'll explain it in a more practical way by using two examples. First, a simple event, where the user joined the site. This is what the row in the activity table would look like:
event: REGISTERED
user_id: 19 (foreign key to user table)
date: current date
image_id: null, since this event has nothing to do with images
It is trivial to query this. Now an event for which extra data needs to be queried. This event indicates a user that uploaded an image:
event: IMAGEUPLOAD
user_id: 19 (foreign key to user table)
date: current date
image_id: 12
This second event needs to do a join to the image table to get the image URL column from that table. A third event could be about a comment vote, where I would need to do a join to the comments table to get extra columns.
In essence, I need a way to conditionally select extra columns (not rows) per row based on the event type. This is easy to do when the columns come from the same table, but I'm struggling to do this using joins from other tables. I hope to do this in one, conditional query without the use of a stored procedure.
Is this possible?
You could make the joins depend on the event type, like:
select *
from Events e
left join Image i
on e.event = 'IMAGEUPLOAD'
and e.image_id = i.id
left join comments c
on e.event = 'COMMENT'
and e.comment_id = c.id
If there's one column that is shared among all linked tables, for example create_date, you can coalesce to select the one that's not NULL:
select coalesce(i.create_date, c.create_date, ...) as create_date
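As a rough illustration of the conditional joins above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table layout, the sample rows, and the `comment_id` column are assumptions made up for the demo; only the join pattern itself comes from the answer.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image   (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE comment (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE activity (
    id         INTEGER PRIMARY KEY,
    event      TEXT NOT NULL,
    user_id    INTEGER NOT NULL,
    image_id   INTEGER,
    comment_id INTEGER
);
INSERT INTO image   VALUES (12, 'https://example.com/12.jpg');
INSERT INTO comment VALUES (7, 'Nice shot');
INSERT INTO activity VALUES
    (1, 'REGISTERED',  19, NULL, NULL),
    (2, 'IMAGEUPLOAD', 19, 12,   NULL),
    (3, 'COMMENT',     19, NULL, 7);
""")

# Each LEFT JOIN only matches for its own event type, so the extra
# columns stay NULL for every other kind of row.
rows = conn.execute("""
SELECT a.id, a.event, i.url, c.body,
       COALESCE(i.url, c.body) AS detail
FROM activity a
LEFT JOIN image i   ON a.event = 'IMAGEUPLOAD' AND a.image_id = i.id
LEFT JOIN comment c ON a.event = 'COMMENT'     AND a.comment_id = c.id
ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Every row comes back with the same set of columns; the ones that do not apply to an event are simply NULL (None in Python).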
Doing precisely what you want to do is not possible. A SELECT is designed to return a list of tuples/rows, and each has the same number of elements/columns.
What you really are doing here is collecting 2 different kinds of information, and you're going to have to process the 2 different kinds of information separately anyway, which should be a hint that you're doing something slightly wrong. Instead, pull the various event types out individually, perform whatever additional operations you need to do to convert them to your common output type (eg. HTML if this is for a website), and then interleave them together at that stage.
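To make the idea of interleaving at the output stage concrete, here is a small Python sketch. It assumes the two event types have already been fetched by separate queries into lists of dicts; the field names, the sample data, and the render functions are all hypothetical.

```python
from datetime import date

# Results of two separate queries, one per event type (made-up sample data).
registrations = [
    {"date": date(2024, 5, 1), "user": "alice"},
    {"date": date(2024, 5, 3), "user": "carol"},
]
uploads = [
    {"date": date(2024, 5, 2), "user": "alice", "url": "https://example.com/12.jpg"},
]


def render_registration(event):
    """Convert a registration row to the common output type (a string here)."""
    return f"{event['user']} joined the site"


def render_upload(event):
    """Convert an upload row to the common output type."""
    return f"{event['user']} uploaded {event['url']}"


# Pair each rendered event with its date, then interleave by sorting on the date.
dated = [(e["date"], render_registration(e)) for e in registrations]
dated += [(e["date"], render_upload(e)) for e in uploads]
feed = [text for _, text in sorted(dated, key=lambda item: item[0])]

for line in feed:
    print(line)
```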
I am getting the students added to a course using a query, but I noticed that 2 of them do not appear (even though they are active and listed as participants). Checking the tables, I noticed they do not appear in mdl_user_info_data. How can I prevent this? Or what is the reason they were not added to this table?
This is my query:
SELECT u.id,u.username,u.firstname,u.lastname,u.email, b.data
FROM mdl_user u,mdl_role_assignments r, mdl_user_info_data b
WHERE u.id=r.userid
AND u.id=b.userid
AND r.roleid=5
AND r.contextid = 'somecontextId'
ORDER BY u.email ASC;
The table mdl_user_info_data holds the values for custom user profile fields (that are defined in mdl_user_info_field).
If a user's profile has not been edited since a particular custom user profile field has been created, then there will not be an associated mdl_user_info_data record for them.
Note that if there is more than one custom user field defined, there can be more than one mdl_user_info_data field - so your query could return more than one record per user.
You probably want to rewrite your query to LEFT JOIN with mdl_user_info_data. You probably also want to LEFT JOIN with mdl_user_info_field to identify which of the custom user profile fields it relates to.
Also note that your query makes a number of assumptions that may not always be true - if your query is running inside Moodle code, then you should use {user_info_data} instead of the 'mdl_' prefix, as that prefix can be changed. Hard-coding roleid 5 for 'student' can also fail on some sites (although it is usually the case).
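A possible shape for the LEFT JOIN rewrite is sketched below, run against an in-memory SQLite database from Python so that it is self-contained. The tables are heavily simplified versions of the Moodle ones, the context id 100 is a placeholder, and the sample users are invented; treat it as an illustration of the join, not as the exact query for your site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-ins for the Moodle tables (assumed columns only).
CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, username TEXT, email TEXT);
CREATE TABLE mdl_role_assignments (id INTEGER PRIMARY KEY, roleid INTEGER,
                                   contextid INTEGER, userid INTEGER);
CREATE TABLE mdl_user_info_field (id INTEGER PRIMARY KEY, shortname TEXT);
CREATE TABLE mdl_user_info_data (id INTEGER PRIMARY KEY, userid INTEGER,
                                 fieldid INTEGER, data TEXT);

INSERT INTO mdl_user VALUES (1, 'ann', 'ann@example.com'), (2, 'bob', 'bob@example.com');
INSERT INTO mdl_role_assignments VALUES (1, 5, 100, 1), (2, 5, 100, 2);
INSERT INTO mdl_user_info_field VALUES (1, 'department');
-- Only ann has a stored value; bob's profile was never saved with the field.
INSERT INTO mdl_user_info_data VALUES (1, 1, 1, 'Math');
""")

# LEFT JOINs keep users that have no mdl_user_info_data record.
students = conn.execute("""
SELECT u.id, u.username, f.shortname, d.data
FROM mdl_user u
JOIN mdl_role_assignments r ON r.userid = u.id
LEFT JOIN mdl_user_info_data d ON d.userid = u.id
LEFT JOIN mdl_user_info_field f ON f.id = d.fieldid
WHERE r.roleid = 5          -- 'student' on most sites, but not guaranteed
  AND r.contextid = 100     -- placeholder context id
ORDER BY u.email ASC
""").fetchall()

for row in students:
    print(row)
```

With the original inner join, bob would be dropped; here he appears with NULL (None) in the profile field columns.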
Our problem lies in performing a left join on two large tables (both having millions of entries).
The first one is a table that contains input supplied by the end-user of our program. It contains answers to a variety of questions. Every question belongs to a certain questionnaire. The most important columns are an identifier for the given response, an identifier for the questionnaire form, the datetime the answer is given and an identifier for the user that supplied the answer.
The second table contains information on daily progress of the users regarding the completion of questionnaires. It contains information on the amount of answers a certain user has given on a certain day for a given activity. The most important columns in this table are the user id, the questionnaire id and the date.
The second table is updated right after a new answer enters the first table. Updating is performed by code (workers) that runs on a different server. We would like to make the system robust against failure of this other server. An important step to ensure that the table with the results ('responses') remains in sync with the progress ('progress_questionnaires') table is to be able to check whether a combination of user_id, questionnaire_id and datetime from the 'responses' table is also present in the 'progress_questionnaires' table. A query that captures the required results, but does not perform on large tables (NxN, in which N is a couple of million entries), is displayed below.
SELECT r.chapter_id, r.user_id, CAST(first_created as date) as date, 1 as original
FROM responses r
LEFT JOIN progress_questionnaires pq ON r.questionnaire_id = pq.questionnaire_id AND r.user_id = pq.user_id AND CAST(r.first_created as date) = pq.date
WHERE pq.activity_id IS NULL
GROUP BY r.questionnaire_id, r.user_id, CAST(r.first_created as date)
As stated before, this query does capture the required results, but does not perform well on large tables. All key columns are properly indexed as far as we know.
We would be very happy if someone could help us out.
P.S. We are using MariaDB, SQL version 5.5.43. I hope I supplied all necessary information, but naturally I would be happy to supply additional information where necessary.
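To show on a small scale what the check is meant to return, here is a toy reproduction in Python with an in-memory SQLite database. The column set is reduced to the ones named above, the sample rows are invented, and SQLite's DATE() stands in for the CAST(... as date) of the original query; it says nothing about performance on MariaDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE responses (
    id               INTEGER PRIMARY KEY,
    questionnaire_id INTEGER NOT NULL,
    user_id          INTEGER NOT NULL,
    first_created    TEXT NOT NULL      -- datetime stored as ISO text
);
CREATE TABLE progress_questionnaires (
    user_id          INTEGER NOT NULL,
    questionnaire_id INTEGER NOT NULL,
    date             TEXT NOT NULL      -- date stored as ISO text
);
INSERT INTO responses VALUES
    (1, 10, 1, '2015-06-01 10:15:00'),
    (2, 10, 1, '2015-06-01 11:00:00'),
    (3, 10, 2, '2015-06-02 09:00:00');
-- The worker recorded progress for user 1 only.
INSERT INTO progress_questionnaires VALUES (1, 10, '2015-06-01');
""")

# Anti-join: keep response groups that have no matching progress row.
missing = conn.execute("""
SELECT r.questionnaire_id, r.user_id, DATE(r.first_created) AS day
FROM responses r
LEFT JOIN progress_questionnaires pq
       ON r.questionnaire_id = pq.questionnaire_id
      AND r.user_id = pq.user_id
      AND DATE(r.first_created) = pq.date
WHERE pq.user_id IS NULL
GROUP BY r.questionnaire_id, r.user_id, DATE(r.first_created)
""").fetchall()

print(missing)
```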
On the project I'm working on we have an activity table and each activity can be linked to one of about 20 different "activity details" tables...
e.g. If the activity was of type "work", then it would have a corresponding activity_details_work record, if it was of type "sick leave" then it would have a corresponding activity_details_sickleave record and so on.
Currently we are loading the activities and then for each activity we have a separate query to go fetch the activity details from the relevant table. This obviously doesn't scale well if you have thousands of activities.
So my initial thought was to have a single query which fetches the activities and joins the details in one go e.g.
SELECT * FROM activity
LEFT JOIN activity_details_1_work ON ...
LEFT JOIN activity_details_2_sickleave ON ...
LEFT JOIN activity_details_3_travelwork ON ...
...etc...
LEFT JOIN activity_details_20_yearleave ON ...
But this will result in each record having 100's of fields, most of which are empty and that feels nasty.
Lazy-loading the details isn't really an option either as the details are almost always requested in the core logic, at least for the main types anyway.
Is there a super clever way of doing this that I'm not thinking of?
Thanks in advance
My suggestion is to define a view for each ActivityType, that is tailored specifically to that activity.
Then add an index on the Activity table led by the ActivityType field. Cluster said index unless there is an overwhelming need for some other to be clustered (or performance benchmarking shows some other clustering selection to be more performant).
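As a rough sketch of the view-per-type idea, the following Python snippet builds one such view in an in-memory SQLite database. The table and column names (`activity_type`, `started_on`, `project`) and the view name are made up for the example, and clustering is not shown because it depends on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (
    id            INTEGER PRIMARY KEY,
    activity_type TEXT NOT NULL,
    started_on    TEXT NOT NULL
);
CREATE TABLE activity_details_work (
    activity_id INTEGER PRIMARY KEY REFERENCES activity(id),
    project     TEXT NOT NULL
);

-- Index led by the activity type, as suggested above.
CREATE INDEX idx_activity_type ON activity (activity_type, id);

-- One view per activity type, exposing only the columns that type needs.
CREATE VIEW activity_work_v AS
SELECT a.id, a.started_on, w.project
FROM activity a
JOIN activity_details_work w ON w.activity_id = a.id
WHERE a.activity_type = 'work';

INSERT INTO activity VALUES (1, 'work', '2024-03-01'), (2, 'sick leave', '2024-03-02');
INSERT INTO activity_details_work VALUES (1, 'Website redesign');
""")

work_rows = conn.execute("SELECT * FROM activity_work_v ORDER BY id").fetchall()
print(work_rows)
```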
Is there a particular reason why this degree of denormalization was designed in? Is that reason well known?
Chances are your activity tables are like (date_from, date_to, with_who, descr) or something to that effect. As Pieter suggested, consider tossing in a type varchar or enum field in there, so as to deal with a single details table.
If there are rational reasons to keep the tables apart, consider adding triggers that maintain boolean/tinyint fields (has_work, has_sickleave, etc), or a bit string (has_activites_of_type where the first position amounts to has_work, the next to has_sickleave, etc.).
Either way, you'll probably be better off by fetching the activity's details in one or more separate queries -- if only to avoid field name collisions.
I don't think enum is the way to go, because as you say there might be 1000's of activities, then altering your activity table would become an issue.
There is no point doing a left join on a large number of tables either.
So the options that you have are:
See this. The first comment might be useful.
I am guessing that your activity table has a field called activity_type_id.
Build a table called activity_types containing the fields activity_type_id, activity_name, activity_details_table_name. First, query in the following way:
SELECT a.*, t.activity_details_table_name
FROM activity a
INNER JOIN activity_types t
USING (activity_type_id)
This query gives you the table name on which to query for the details.
This way you can add any new activity type just by adding a row in the activity_types table.
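Here is one way the two-step lookup could look in practice, sketched in Python against an in-memory SQLite database. The details tables, their columns, and the `activity_id` key are assumptions for the demo. The table name is interpolated into the second query, which is only reasonable because it comes from your own activity_types table and never from user input.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity_types (
    activity_type_id            INTEGER PRIMARY KEY,
    activity_name               TEXT NOT NULL,
    activity_details_table_name TEXT NOT NULL
);
CREATE TABLE activity (
    activity_id      INTEGER PRIMARY KEY,
    activity_type_id INTEGER NOT NULL REFERENCES activity_types
);
CREATE TABLE activity_details_work      (activity_id INTEGER PRIMARY KEY, hours INTEGER);
CREATE TABLE activity_details_sickleave (activity_id INTEGER PRIMARY KEY, reason TEXT);

INSERT INTO activity_types VALUES
    (1, 'work', 'activity_details_work'),
    (2, 'sick leave', 'activity_details_sickleave');
INSERT INTO activity VALUES (1, 1), (2, 2), (3, 1);
INSERT INTO activity_details_work VALUES (1, 8), (3, 6);
INSERT INTO activity_details_sickleave VALUES (2, 'flu');
""")

# Step 1: one query tells us which details table each activity lives in.
ids_by_table = defaultdict(list)
lookup = conn.execute("""
SELECT a.activity_id, t.activity_details_table_name
FROM activity a
INNER JOIN activity_types t USING (activity_type_id)
ORDER BY a.activity_id
""")
for activity_id, table in lookup:
    ids_by_table[table].append(activity_id)

# Step 2: one query per details table instead of one query per activity.
details = {}
for table, ids in ids_by_table.items():
    placeholders = ",".join("?" * len(ids))
    query = (f'SELECT * FROM "{table}" '
             f"WHERE activity_id IN ({placeholders}) ORDER BY activity_id")
    details[table] = conn.execute(query, ids).fetchall()

print(details)
```

The number of queries then grows with the number of activity types in the result, not with the number of activities.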
I have 200 users; each user will eventually have a "reviewINFO" table with certain data.
Each user will have a review every 3 to 4 months.
So for every review, it creates a new row inside the "reviewINFO" table.
This is where I'm stuck. I'm not sure if I need to serialize a table inside each row or not.
Example:
-> links
"USER1reviewINFO"-row1->USER1table1
-row2->USER1table2
-row3->USER1table3
-row4->USER1table4
-row5->USER1table5
"USER2reviewINFO"-row1->USER2table1
-row2->USER2table2
-row3->USER2table3
-row4->USER2table4
-row5->USER2table5
Using this method will create a couple of thousand rows within two years, and I think it's harder to manage.
"Userxtablex" is a table with dynamic rows of children's names, ages, and a boolean.
What I'm thinking of doing is serializing each USERxtable into its corresponding row.
Please help, as I would not like to make this complicated or inefficient.
Generally, you should never have to serialize data of this nature into a table row to accomplish your goal (which I am assuming is an implicit link between a user and a review).
What you need to do is key the reviews by a user_id such that all the reviews are packaged in one table, and relate numerically back to the users table.
Assuming you have an AUTO_INCREMENT primary key in the user table, all you would need is a user_id field in the reviews table that represents what user the review relates to. There is no need for a separate structure for each user, if that's what you are suggesting. Reviews can have date fields as well, so you can perform queries for a specific year or window of time.
You can then use a JOIN query to select out your data set relating to a particular user or review, and apply the usual WHERE clause to determine what result set you want to fetch.
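A minimal sketch of that layout, using Python's sqlite3 module in place of MySQL so it runs on its own. The table names, the `review_date` column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
-- One shared reviews table; user_id says which user each review belongs to.
CREATE TABLE reviews (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id     INTEGER NOT NULL REFERENCES users(id),
    review_date TEXT NOT NULL
);
INSERT INTO users (name) VALUES ('user1'), ('user2');
INSERT INTO reviews (user_id, review_date) VALUES
    (1, '2023-01-15'),
    (1, '2023-05-10'),
    (2, '2023-02-01');
""")

# All reviews for one user within a window of time.
user1_reviews = conn.execute("""
SELECT u.name, r.review_date
FROM reviews r
JOIN users u ON u.id = r.user_id
WHERE u.id = ?
  AND r.review_date BETWEEN ? AND ?
ORDER BY r.review_date
""", (1, "2023-01-01", "2023-12-31")).fetchall()

print(user1_reviews)
```

The per-review child rows from the question could be handled the same way, with a table keyed by a review id instead of a serialized blob.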
I have searched for a solution for this problem, but haven't found it (yet), probably because I don't quite know how to explain it properly myself. If it is posted somewhere already, please let me know.
What I have is three databases that are related to each other: main, pieces & groups. Basically, the main database contains the most elementary/most used information from a post and the pieces database contains data that is associated with that post. The groups database contains all of the (long) names of the groups a post in the main database can be 'posted in'. A post can be posted in multiple groups simultaneously. When a new post is added to my site, I check the pieces to see if there are any duplicates (check if the post has been posted already). In order to make the search for duplicates more effective, I only check the pieces that are posted in the same group(s).
Hopefully you're still with me, cause here's where it starts to get really confusing I think (let me know if I need to specify things more clearly): right now, both the main and the pieces database contain the full name of the group(s) (basically I'm not using the groups database at all). What I want to do is replace the names of those groups with their associated IDs from the groups database. For example, I want to change this:
from:
MAIN_table:
id | group_posted_in
--------|---------------------------
1 | group_1, group_5
2 | group_15, group_75
3 | group_1, group_215
GROUPS_table:
id | group_name
--------|---------------------------
1 | group_1
2 | group_2
3 | group_3
etc...
into:
MAIN_table:
id | group_posted_in
--------|---------------------------
1 | 1,5
2 | 15,75
3 | 1,215
Or something similar to this. However, this format specifically causes issues as the following query will return all of the rows (from the example), instead of just the one I need:
SELECT * FROM main_table WHERE group_posted_in LIKE '%5%'
I either have to change the query to something like this:
...WHERE group_posted_in = '5' OR group_posted_in LIKE '5,%' OR group_posted_in LIKE '%,5,%' OR group_posted_in LIKE '%,5'
Or I have to change the database structure from Comma Separated Values to something like this: [15][75]. The accompanying query would be simpler, but it somehow seems like a cumbersome solution to me. Additionally, (simple) joins will not be easy/ possible at all. It will always require me to run a separate query to fetch the names of the groups--whether a user searches for posts in a specific group (in which case, I first have to run a query to fetch the id's, then to search for the associated posts), or whether it is to display them (first the posts, then another query to match the groups).
So, in conclusion: I suppose I know there is a solution to this problem, but my gut tells me that it is not the right/ best way to do it. So, I suppose the question that ties this post together is:
What is the correct method to connect the group database to the others?
For a many-to-many relationship, you need to create a joining table. Rather than storing a list of groups in a single column, you should split that column out into multiple rows in a separate table. This will allow you to perform set based functions on them and will significantly speed up the database, as well as making it more robust and error proof.
Main
MainID ...
Group
GroupID GroupName
GroupsInMain
GroupsInMainID MainID(FK) GroupID(FK)
So, for MainID 1, you would have GroupsInMain records:
1,1,1
2,1,5
This associates groups 1 and 5 with MainID 1
FK in this case means a Foreign Key (i.e. a reference to a primary key in another table). You'd probably also want to add a unique constraint to GroupsInMain on MainID and GroupID, since you'd never want the same values for the pairing to show up more than once.
Your query would then be:
select GroupsInMain.MainID, Group.GroupName
from Group, GroupsInMain
where Group.GroupID=GroupsInMain.GroupID
and Group.GroupID=5
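The joining-table design can be tried out with the short Python sketch below, which uses an in-memory SQLite database. The table names are lowercased and renamed (`main_posts`, `post_groups`, `groups_in_main`) to sidestep reserved words such as GROUP; those names and the sample data are assumptions based on the example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_posts  (main_id INTEGER PRIMARY KEY);
CREATE TABLE post_groups (group_id INTEGER PRIMARY KEY, group_name TEXT NOT NULL);
-- Joining table: one row per (post, group) pair.
CREATE TABLE groups_in_main (
    groups_in_main_id INTEGER PRIMARY KEY,
    main_id  INTEGER NOT NULL REFERENCES main_posts(main_id),
    group_id INTEGER NOT NULL REFERENCES post_groups(group_id),
    UNIQUE (main_id, group_id)
);
""")

conn.executemany("INSERT INTO main_posts VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO post_groups VALUES (?, ?)",
                 [(g, f"group_{g}") for g in (1, 5, 15, 75, 215)])
conn.executemany("INSERT INTO groups_in_main (main_id, group_id) VALUES (?, ?)",
                 [(1, 1), (1, 5), (2, 15), (2, 75), (3, 1), (3, 215)])

# Posts in group 5: an exact match on the id, no string matching needed.
posts_in_group_5 = conn.execute("""
SELECT gim.main_id, g.group_name
FROM groups_in_main gim
JOIN post_groups g ON g.group_id = gim.group_id
WHERE g.group_id = 5
""").fetchall()
print(posts_in_group_5)

# The unique constraint rejects a repeated (post, group) pairing.
try:
    conn.execute("INSERT INTO groups_in_main (main_id, group_id) VALUES (1, 5)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Unlike the comma-separated column, groups 15, 75 and 215 are not matched by accident when searching for group 5.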