There are 3 entities: articles, journals and subscribers. There are no restrictions on how the data is stored in the database.
The same article can be simultaneously published in several journals.
How can I select all published articles from subscribed journals, sorted
by date of publication and without repeats?
The easiest way:
Create a table with articles:
posts
p_id, j1_id, j2_id, text, date
Create a table with subscriptions:
follows
f_id, u_id, j_id (u_id is a user id from the users table)
Execute:
example query
select posts.* from posts
inner join follows on (j_id = j1_id or j_id = j2_id)
where u_id = 1
order by date desc
This query returns duplicates. You can remove them with DISTINCT or GROUP BY, but that adds an extra sorting operation.
Another way is to use UNION, but it also performs a DISTINCT.
(select posts.* from posts inner join follows on j_id = j1_id where u_id = 1)
union
(select posts.* from posts inner join follows on j_id = j2_id where u_id = 1)
order by date desc
Perhaps I chose the wrong storage structure.
So the actual question: is it possible to do something about this problem to minimize the query time on large data sets?
You can use the following table structure:
posts : pid, text, date
journals : jid, jtext
journals_posts : jid, pid
follows : fid, uid, jid
select distinct posts.* from posts
inner join journals_posts on journals_posts.pid = posts.pid
inner join follows on follows.jid = journals_posts.jid
where follows.uid = <userid>
To take care of speed you can create indexes on
journals_posts(jid)
follows(uid)
You might need to create indexes on other fields; check with EXPLAIN which tables are scanned without using an index.
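To make the junction-table layout above easier to try out, here is a minimal sketch in Python using the standard sqlite3 module. SQLite only stands in for MySQL here, and the sample rows, the index name idx_follows_uid and the helper subscribed_posts are made up for illustration. It runs the DISTINCT join from the answer (with the ORDER BY the question asked for) next to an EXISTS variant, which is one possible way to avoid producing duplicates in the first place; whether it is faster on real data is something EXPLAIN would have to show.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (pid INTEGER PRIMARY KEY, text TEXT, date TEXT);
    CREATE TABLE journals (jid INTEGER PRIMARY KEY, jtext TEXT);
    -- The composite primary key also serves as an index that starts with jid.
    CREATE TABLE journals_posts (jid INTEGER, pid INTEGER, PRIMARY KEY (jid, pid));
    CREATE TABLE follows (fid INTEGER PRIMARY KEY, uid INTEGER, jid INTEGER);
    CREATE INDEX idx_follows_uid ON follows (uid);

    INSERT INTO posts VALUES (1, 'a', '2024-01-01'), (2, 'b', '2024-01-02'), (3, 'c', '2024-01-03');
    INSERT INTO journals VALUES (10, 'J10'), (20, 'J20'), (30, 'J30');
    -- Post 1 is published in two journals that user 1 follows.
    INSERT INTO journals_posts VALUES (10, 1), (20, 1), (20, 2), (30, 3);
    INSERT INTO follows (uid, jid) VALUES (1, 10), (1, 20), (2, 30);
""")

# The query from the answer, plus the ordering asked for in the question.
DISTINCT_JOIN = """
    SELECT DISTINCT posts.pid, posts.text, posts.date
    FROM posts
    INNER JOIN journals_posts ON journals_posts.pid = posts.pid
    INNER JOIN follows ON follows.jid = journals_posts.jid
    WHERE follows.uid = ?
    ORDER BY posts.date DESC
"""

# Alternative: keep a post if at least one followed journal contains it.
EXISTS_FILTER = """
    SELECT p.pid, p.text, p.date
    FROM posts p
    WHERE EXISTS (
        SELECT 1
        FROM journals_posts jp
        JOIN follows f ON f.jid = jp.jid
        WHERE jp.pid = p.pid AND f.uid = ?
    )
    ORDER BY p.date DESC
"""

def subscribed_posts(sql, user_id):
    """Run one of the queries above for the given user id."""
    return conn.execute(sql, (user_id,)).fetchall()

print(subscribed_posts(DISTINCT_JOIN, 1))
print(subscribed_posts(EXISTS_FILTER, 1))
```

Both calls list post 1 only once, even though it sits in two followed journals.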
I have two tables (documents and states).
documents fields:
id, (int)
document,(string)
file, (string)
creation (date)
states fields:
id, (int)
id_document, (int)
status, (string)
last_update (date)
id_document obviously matches id in the first table.
The first table holds data about the documents, and the second holds a series of status updates recording how each document in the first table is being processed.
I need to create a view that shows the list of documents with only their most recent status, if any.
I wrote this query, which makes the correct join, but I'm unable to restrict it to the last status:
SELECT
documents.*, states.status, states.last_update
FROM
documents
LEFT JOIN states ON states.id_document = documents.id
ORDER BY states.last_update
I tried with DISTINCT, DISTINCTROW but without luck....
This should provide what you are looking for (there may exist a more elegant solution):
select d.*,
       s.status,
       s.last_update
from documents d
-- left join keeps documents that have no state yet
left join states s
  on s.id_document = d.id
 and s.last_update = ( select s2.last_update
                       from states s2
                       where s2.id_document = d.id
                       order by s2.last_update desc
                       limit 1 );
Use a sub-query with max(last_update) per document. Then join the sub-query to your query on the document id and last_update.
Maybe this works:
SELECT
documents.*, states.status, updates.updated
FROM
documents
LEFT JOIN (SELECT id_document, MAX(last_update) AS updated
FROM states
GROUP BY id_document) AS updates
ON updates.id_document = documents.id
LEFT JOIN states
ON states.id_document = updates.id_document
AND states.last_update = updates.updated
ORDER BY updates.updated
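For anyone who wants to check the "max(last_update) per document" idea on sample data, here is a minimal sketch in Python with the standard sqlite3 module. SQLite is only a stand-in for MySQL, and the view name documents_last_status and the sample rows are made up for illustration. It assumes a document never has two states with the same last_update; if that can happen, such a document would show up twice.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, document TEXT, file TEXT, creation TEXT);
    CREATE TABLE states (id INTEGER PRIMARY KEY, id_document INTEGER, status TEXT, last_update TEXT);

    -- Document 3 has no state at all.
    INSERT INTO documents VALUES
        (1, 'contract', 'c.pdf', '2024-01-01'),
        (2, 'invoice', 'i.pdf', '2024-01-02'),
        (3, 'memo', 'm.pdf', '2024-01-03');
    INSERT INTO states (id_document, status, last_update) VALUES
        (1, 'received', '2024-01-05'),
        (1, 'approved', '2024-01-09'),
        (2, 'received', '2024-01-06');

    -- Hypothetical view name. The derived table finds the newest update per
    -- document, and the second join fetches the status of exactly that row.
    CREATE VIEW documents_last_status AS
    SELECT d.id, d.document, s.status, s.last_update
    FROM documents d
    LEFT JOIN (
        SELECT id_document, MAX(last_update) AS last_update
        FROM states
        GROUP BY id_document
    ) latest ON latest.id_document = d.id
    LEFT JOIN states s
        ON s.id_document = latest.id_document
       AND s.last_update = latest.last_update;
""")

rows = conn.execute("SELECT * FROM documents_last_status ORDER BY id").fetchall()
for row in rows:
    print(row)
```

Each document appears once, and the one without any state comes back with NULL (None in Python) for status and last_update.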
I have 2 tables, authors and books.
authors contains the unique id authorId.
books also contains this as a foreign key.
I need to find the authors with the largest number of books. If 2 or more authors are tied for the greatest number of books, I need to show all of them.
I have been able to achieve this by first getting the maximum count
SELECT @maxCount := (MAX(counter)) FROM (SELECT count(*) AS counter FROM books GROUP BY authorId) AS counts;
and then using it to get the Ids with that count as part of my author selection
SELECT *
FROM authors
WHERE authorId IN (
SELECT authorId
FROM books
GROUP BY authorId
HAVING COUNT(*) = @maxCount
);
I've been told that I am not allowed to use variables and that what I've done is horribly inefficient if the tables grow very large.
Am I missing something obvious here? Is there a way to do this in a single statement without a variable (or temp table), and without having to select/group the entire books table twice?
SELECT author, COUNT(*)
FROM authors
JOIN books
ON authors.authorId=books.AuthorId
GROUP BY author
ORDER BY COUNT(*) DESC
This will give you a list ordered by the number of books for each author. I don't have an instance nearby to test and tend to steer clear of embedded variables, but I'd expect something like this to work:
SELECT *
FROM (
    SELECT author
    , @maxcount:=IF(COUNT(*)>@maxcount,COUNT(*), @maxcount)
    , COUNT(*) AS cnt
    FROM authors
    JOIN books
    ON authors.authorId=books.AuthorId
    GROUP BY author
    ORDER BY COUNT(*) DESC
) ilv
WHERE cnt=@maxcount;
Performance still sucks with large datasets (even with the right indexes). If you have to run this query frequently with >100,000 records, then you might consider denormalizing your data.
Symcbean's solution is great... you can add LIMIT 1 to it to get only one author.
SELECT A.authorId, A.name, COUNT(*) AS num_books
FROM authors A
INNER JOIN books B
ON A.authorId=B.AuthorId
GROUP BY A.authorId, A.name
ORDER BY COUNT(*) DESC
LIMIT 1
But if you want all the authors who share the maximum number of books, your best bet is to store the max(count) in a variable or temp table and use it in a second query.
For example, you can store the info in the following temp table:
CREATE TEMPORARY TABLE IF NOT EXISTS maxBooks AS (
SELECT authorId, COUNT(*) AS num_books
FROM books
GROUP BY authorId
ORDER BY COUNT(*) DESC
LIMIT 1
)
Now you can join it to your table to get the authors whose count equals the maximum.
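The second query is not spelled out above, so here is one way it could look, as a minimal sketch in Python with the standard sqlite3 module. SQLite only stands in for MySQL, and the sample rows plus the name and title columns are assumptions made for illustration. The temp table holds the single largest count, and the second query keeps every author whose count matches it.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (authorId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (bookId INTEGER PRIMARY KEY, authorId INTEGER, title TEXT);
    -- Ann and Bob are tied with two books each, Cy has one.
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO books (authorId, title) VALUES
        (1, 'a1'), (1, 'a2'), (2, 'b1'), (2, 'b2'), (3, 'c1');

    -- Step 1: remember the largest per-author count (one row).
    CREATE TEMPORARY TABLE IF NOT EXISTS maxBooks AS
    SELECT authorId, COUNT(*) AS num_books
    FROM books
    GROUP BY authorId
    ORDER BY COUNT(*) DESC
    LIMIT 1;
""")

# Step 2: keep every author whose count equals the stored maximum.
top_authors = conn.execute("""
    SELECT a.authorId, a.name, COUNT(*) AS num_books
    FROM authors a
    JOIN books b ON b.authorId = a.authorId
    GROUP BY a.authorId, a.name
    HAVING COUNT(*) = (SELECT num_books FROM maxBooks)
    ORDER BY a.authorId
""").fetchall()

print(top_authors)
```

Note that this still groups the books table twice, once per step, so it answers the "ties" part of the question rather than the "group only once" part.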
I have a table in a MySQL DB, called ‘users’. The fields for users are : id, email, username, first_name, last_name. Another table in the same MySQL DB, called ‘premium_users’ , contains 2 fields : user_id, premium_service_id. A third table called ‘premium_services’ contains 2 fields : premium_service_id , premium_service_name.
I want to create an SQL query to interrogate my db, so I can have a full list of what premium services has every premium user. How can I interrogate it properly with an inner join? I’ve tried this:
select * from users inner join premium_users on users.id = premium_users.user_id inner join premium_services on premium_users.premium_service_id = premium_services.premium_service_id;
Since you say which service has every user, you'll need to use aggregation to determine this. Here's one way:
select user_id
from premium_users
group by user_id
having count(*) = (select count(*) from premium_services)
Depending on your data, you may need count(distinct premium_service_id) instead, but you should have constraints that don't allow duplicates in those tables.
Rereading your question, I might have got this backwards. Looks like you want a list of premium services instead of users. Same concept applies:
select ps.premium_service_id
from premium_services ps
join premium_users pu on ps.premium_service_id = pu.premium_service_id
group by ps.premium_service_id
having count(distinct pu.user_id) = (select count(distinct user_Id) from premium_users)
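Both aggregation queries above can be tried on a few rows with this minimal sketch in Python, using the standard sqlite3 module. SQLite only stands in for MySQL, and the sample rows and service names are made up for illustration. User 1 holds both services and user 2 holds only the first one.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE premium_services (premium_service_id INTEGER PRIMARY KEY,
                                   premium_service_name TEXT);
    -- The composite primary key rules out duplicate (user, service) rows.
    CREATE TABLE premium_users (user_id INTEGER, premium_service_id INTEGER,
                                PRIMARY KEY (user_id, premium_service_id));
    INSERT INTO premium_services VALUES (1, 'backup'), (2, 'support');
    INSERT INTO premium_users VALUES (1, 1), (1, 2), (2, 1);
""")

# Users who hold every premium service.
users_with_every_service = conn.execute("""
    SELECT user_id
    FROM premium_users
    GROUP BY user_id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM premium_services)
""").fetchall()

# Services that every premium user holds.
services_of_every_user = conn.execute("""
    SELECT ps.premium_service_id
    FROM premium_services ps
    JOIN premium_users pu ON ps.premium_service_id = pu.premium_service_id
    GROUP BY ps.premium_service_id
    HAVING COUNT(DISTINCT pu.user_id) = (SELECT COUNT(DISTINCT user_id) FROM premium_users)
""").fetchall()

print(users_with_every_service)
print(services_of_every_user)
```

With these rows the first query returns user 1 and the second returns service 1.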
I have three Tables:
Posts:
id, title, authorId, text
authors:
id, name, country
Comments:
id, authorId, text, postId
I want to run a mysql command which selects the first 5 posts which were written by authors, whose country is 'Ireland'. In the same call, I want to retrieve all the comments for those five posts, and also the author info.
I've tried the following:
SELECT posts.id as 'posts.id', posts.title as 'posts.title' (etc. etc. list all fields in three table)
FROM
(SELECT * FROM posts, authors WHERE authors.country = 'ireland' AND authors.id = posts.authorId LIMIT 0, 5 ) as posts
LEFT JOIN
comments ON comments.postId = posts.id,
authors
WHERE
authors.id = posts.authorId
I had to include every field with an alias ^ because there was a duplicate for id, and more fields in future may become duplicates as I'm looking for a generic solution.
My two questions are:
1) I am getting a duplicate field entry from within my subselect for id, so do I have to list out all my fields as aliases again within the subselect or is there only one field I need for a subselect
2) Is there a way to auto-alias my call? At the moment I've just aliased every field in the main select but can it do this for me so there are no duplicates?
Sorry if this isn't very clear it's a bit of a messy problem! Thanks.
You are doing an unnecessary join back to the author table in your query. You get all the fields you want in the posts subquery. I would rename this to something other than an existing table, perhaps pa to indicate posts and authors.
You say you want the first 5 posts, but have no order clause. A better form of the query is:
SELECT pa.id as 'posts.id', pa.title as 'posts.title' (etc. etc. list all fields in the three tables)
FROM (SELECT posts.*, authors.name, authors.country -- avoids two columns named id
      FROM posts join
           authors
           on authors.id = posts.authorId
      WHERE authors.country = 'ireland'
      order by posts.date -- assumes posts has a date column
      LIMIT 0, 5
     ) pa LEFT JOIN
     comments c
     ON c.postId = pa.id
Note that this returns the first five posts and their authors (as specified in the question). But one author may be responsible for all five posts.
In MySQL, you can use * and it will get rid of duplicate aliases in the from clause. I think this is dangerous. It is better to list all the columns you want.
To answer your questions:
You can select as many (or as few) columns as you need from a sub-query
You do not need to join the authors table again since you already selected all fields in the sub-query (and so get rid of duplicate columns names).
A few additional remarks...
... about the JOIN syntax
Prefer the form
FROM t1 JOIN t2 ON (t1.fk = t2.pk)
to the obsolete, obscure
FROM t1, t2 WHERE t1.fk = t2.pk
... about the use of a LIMIT clause without an ORDER BY clause
The order in which rows are returned by a SELECT statement without an ORDER BY clause is undefined. Therefore, a LIMIT n clause without an ORDER BY clause could return any n rows in theory.
Your final query should look like this:
SELECT *
FROM (
    SELECT posts.*, authors.name, authors.country -- avoids two columns named id
    FROM posts
    JOIN authors ON (authors.id = posts.authorId)
    WHERE authors.country = 'ireland'
    ORDER BY posts.id DESC -- assuming this column is monotonically increasing
    LIMIT 5
) AS last_posts
LEFT JOIN comments ON (comments.postId = last_posts.id)
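Here is a minimal sketch of that final shape in Python with the standard sqlite3 module, for anyone who wants to see what comes back. SQLite only stands in for MySQL, and the sample rows and the aliases post_id, author_name, comment_id and comment_text are made up for illustration. The derived table picks the five newest posts by Irish authors, and the outer LEFT JOIN attaches their comments, so a post with two comments yields two rows and a post without comments yields one row with NULLs.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, authorId INTEGER, text TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, authorId INTEGER, text TEXT, postId INTEGER);
    INSERT INTO authors VALUES (1, 'Aoife', 'Ireland'), (2, 'Hans', 'Germany');
    INSERT INTO comments VALUES
        (1, 2, 'c-on-7-a', 7), (2, 2, 'c-on-7-b', 7),
        (3, 1, 'c-on-2', 2), (4, 2, 'c-on-1', 1), (5, 1, 'c-on-4', 4);
""")
# Seven posts; post 4 is by the German author, all others by the Irish one.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [(i, f"t{i}", 2 if i == 4 else 1, "body") for i in range(1, 8)],
)

rows = conn.execute("""
    SELECT lp.post_id, lp.title, lp.author_name,
           c.id AS comment_id, c.text AS comment_text
    FROM (
        -- Explicit aliases keep the two id columns apart.
        SELECT p.id AS post_id, p.title, a.name AS author_name
        FROM posts p
        JOIN authors a ON a.id = p.authorId
        WHERE a.country = 'Ireland'
        ORDER BY p.id DESC
        LIMIT 5
    ) AS lp
    LEFT JOIN comments c ON c.postId = lp.post_id
    ORDER BY lp.post_id DESC, c.id
""").fetchall()

for row in rows:
    print(row)
```

The LIMIT applies to posts only, so the number of result rows depends on how many comments those five posts have.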
I have three tables: users, groups and relation.
Table users with fields: usrID, usrName, usrPass, usrPts
Table groups with fields: grpID, grpName, grpMinPts
Table relation with fields: uID, gID
User can be placed in group in two ways:
if he collects the group's minimal number of points (users.usrPts > groups.grpMinPts ORDER BY groups.grpMinPts DESC LIMIT 1)
if his relation to the group is added manually in the relation table (user ID stored as uID and group ID stored as gID)
Can I create a single query that determines, for every user (or one specific user), which group he belongs to, where a manual relation (from the relation table) has higher priority than comparing usrPts with grpMinPts? Also, I do not want a user shown twice (once with his group by points and once with his related group)...
Thanks in advance! :) I tried:
SELECT * FROM users LEFT JOIN (relation LEFT JOIN groups ON relation.gID = groups.grpID) ON users.usrID = relation.uID
Using this I managed to extract the specified relations (from the relation table), but I have no idea how to include user points while respecting the priority mentioned above. I know how to do this with a few separate queries in PHP, that is simple, but I am curious: can it be done in one single query?
EDIT TO ADD:
Thanks to the really educational technique using coalesce that @GordonLinoff provided, I managed to make this query work as I expected. So, here it goes:
SELECT o.usrID, o.usrName, o.usrPass, o.usrPts, t.grpID, t.grpName
FROM (
    SELECT u.*, COALESCE(relationgroupid, groupid) AS thegroupid
    FROM (
        SELECT u.*, (
            SELECT grpID
            FROM groups g
            WHERE u.usrPts > g.grpMinPts
            ORDER BY g.grpMinPts DESC
            LIMIT 1
        ) AS groupid, (
            SELECT gID
            FROM relation r
            WHERE r.uID = u.usrID
        ) AS relationgroupid
        FROM users u
    ) u
) o
JOIN groups t ON t.grpID = o.thegroupid
Also, if you are wondering, as I did, whether this approach is faster or slower than running three queries and processing the results in PHP: it is slightly faster. Executing this query and showing the results on a webpage took 14 ms on average. Three simple queries, processed in PHP and shown on a webpage, took 21 ms. The averages are based on 10 runs, and the execution time was nearly constant across runs.
Here is an approach that uses correlated subqueries to get each of the values. It then chooses the appropriate one using the precedence rule that if the relations exist use that one, otherwise use the one from the groups table:
select u.*,
       coalesce(relationgroupid, groupid) as thegroupid
from (select u.*,
             (select grpID from groups g where u.usrPts > g.grpMinPts order by g.grpMinPts desc limit 1
             ) as groupid,
             (select gID from relation r where r.uID = u.usrID
             ) as relationgroupid
      from users u
     ) u
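To see the precedence rule at work, here is a minimal sketch in Python with the standard sqlite3 module. SQLite only stands in for MySQL, the sample rows are made up, and the table name is written as "groups" in quotes because it collides with a keyword in some engines (MySQL would use backticks). COALESCE returns the manual relation if one exists and otherwise falls back to the highest group whose grpMinPts the user exceeds. The LIMIT 1 on the relation sub-query is an added safeguard in case a user has more than one manual relation.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (usrID INTEGER PRIMARY KEY, usrName TEXT, usrPass TEXT, usrPts INTEGER);
    CREATE TABLE "groups" (grpID INTEGER PRIMARY KEY, grpName TEXT, grpMinPts INTEGER);
    CREATE TABLE relation (uID INTEGER, gID INTEGER);

    INSERT INTO "groups" VALUES (1, 'bronze', 0), (2, 'silver', 100), (3, 'gold', 500);
    INSERT INTO users VALUES (1, 'ann', 'x', 150), (2, 'bob', 'x', 50), (3, 'cy', 'x', 600);
    -- cy has enough points for gold but is manually placed in bronze.
    INSERT INTO relation VALUES (3, 1);
""")

user_groups = conn.execute("""
    SELECT u.usrID, u.usrName, g.grpName
    FROM (
        SELECT u.usrID, u.usrName,
               COALESCE(
                   -- 1st choice: manual relation, if any
                   (SELECT r.gID FROM relation r WHERE r.uID = u.usrID LIMIT 1),
                   -- 2nd choice: highest group the user's points exceed
                   (SELECT g.grpID FROM "groups" g
                    WHERE u.usrPts > g.grpMinPts
                    ORDER BY g.grpMinPts DESC
                    LIMIT 1)
               ) AS thegroupid
        FROM users u
    ) u
    JOIN "groups" g ON g.grpID = u.thegroupid
    ORDER BY u.usrID
""").fetchall()

print(user_groups)
```

Each user appears exactly once. A user who matches neither rule would be dropped by the final inner join; a LEFT JOIN there would keep such users with a NULL group.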
Try something like this
select users.usrName, groups.grpName
from groups
join relation on relation.gID = groups.grpID
join users on users.usrID = relation.uID
union
select users.usrName, g1.grpName
from groups g1
join groups g2 on g2.grpMinPts > g1.grpMinPts
join users on users.usrPts between g1.grpMinPts and g2.grpMinPts