How to write mysql subselect properly with conditions and limiting


I have three Tables:
Posts:
id, title, authorId, text
authors:
id, name, country
Comments:
id, authorId, text, postId
I want to run a MySQL query that selects the first 5 posts written by authors whose country is 'Ireland'. In the same call, I want to retrieve all the comments for those five posts, and also the author info.
I've tried the following:
SELECT posts.id as 'posts.id', posts.title as 'posts.title' (etc. etc. list all fields in three table)
FROM
(SELECT * FROM posts, authors WHERE authors.country = 'ireland' AND authors.id = posts.authorId LIMIT 0, 5 ) as posts
LEFT JOIN
comments ON comments.postId = posts.id,
authors
WHERE
authors.id = posts.authorId
I had to include every field with an alias ^ because there was a duplicate for id, and more fields in future may become duplicates as I'm looking for a generic solution.
My two questions are:
1) I am getting a duplicate field entry for id from within my subselect. Do I have to list out all my fields as aliases again within the subselect, or is there only one field I need for a subselect?
2) Is there a way to auto-alias my call? At the moment I've just aliased every field in the main select but can it do this for me so there are no duplicates?
Sorry if this isn't very clear it's a bit of a messy problem! Thanks.

You are doing an unnecessary join back to the author table in your query. You get all the fields you want in the posts subquery. I would rename this to something other than an existing table, perhaps pa to indicate posts and authors.
You say you want the first 5 posts, but have no order clause. A better form of the query is:
SELECT pa.id as 'posts.id', pa.title as 'posts.title' (etc. etc. list all fields in three table)
FROM (SELECT *
FROM posts join
authors
on authors.id = posts.authorId
WHERE authors.country = 'ireland'
order by posts.date -- assumes posts has a date column (not in the schema as listed)
LIMIT 0, 5
) pa LEFT JOIN
comments c
ON c.postId = pa.id
Note that this returns the first five posts and their authors (as specified in the question). But one author may be responsible for all five posts.
In MySQL, you can use * and it will get rid of duplicate aliases in the from clause. I think this is dangerous. It is better to list all the columns you want.
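For question 1, one option is to alias the columns inside the subselect so the derived table never exposes two columns called id. Here is a minimal sketch, using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the alias names (post_id, author_id and so on) and the choice to order by posts.id are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors  (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE posts    (id INTEGER PRIMARY KEY, title TEXT, authorId INTEGER, text TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, authorId INTEGER, text TEXT, postId INTEGER);
-- Hypothetical sample data.
INSERT INTO authors  VALUES (1, 'Aoife', 'ireland'), (2, 'Bob', 'france');
INSERT INTO posts    VALUES (10, 'first', 1, 'a'), (11, 'second', 2, 'b'), (12, 'third', 1, 'c');
INSERT INTO comments VALUES (100, 2, 'nice', 10), (101, 1, 'thanks', 10);
""")

# Every column leaving the subselect gets a unique alias, so the outer query
# can refer to pa.post_id and pa.author_id without any name clash.
QUERY = """
SELECT pa.post_id, pa.post_title, pa.author_id, pa.author_name,
       c.id AS comment_id, c.text AS comment_text
FROM (SELECT p.id AS post_id, p.title AS post_title,
             a.id AS author_id, a.name AS author_name
      FROM posts p
      JOIN authors a ON a.id = p.authorId
      WHERE a.country = 'ireland'
      ORDER BY p.id          -- assumption: "first" means lowest post id
      LIMIT 5) pa
LEFT JOIN comments c ON c.postId = pa.post_id
ORDER BY pa.post_id, c.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

A post with several comments comes back once per comment, and a post with no comments comes back once with NULL comment columns, so the application still has to group the rows by post_id.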

To answer your questions:
1) You can select as many (or as few) columns as you need from a sub-query.
2) You do not need to join the authors table again, since you already selected all of its fields in the sub-query. Dropping that join also gets rid of the duplicate column names.
A few additional remarks...
... about the JOIN syntax
Prefer the form
FROM t1 JOIN t2 ON (t1.fk = t2.pk)
to the obsolete, obscure
FROM t1, t2 WHERE t1.fk = t2.pk
... about the use of a LIMIT clause without an ORDER BY clause
The order in which rows are returned by a SELECT statement without an ORDER BY clause is undefined. Therefore, a LIMIT n clause without an ORDER BY clause could return any n rows in theory.
Your final query should look like this:
SELECT *
FROM (
SELECT *
FROM posts
JOIN authors ON (authors.id = posts.authorId)
WHERE authors.country = 'ireland'
ORDER BY posts.id DESC -- assuming this column is monotonically increasing
LIMIT 5
) AS last_posts
LEFT JOIN comments ON (comments.postId = last_posts.id)

Related

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";

This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ reviews ] GROUP BY fruit_id is invalid. If you group by fruit_id, which count_vote and which text_c do you expect to get for each ID? You don't tell the DBMS (that would be something like MAX(count_vote) and MIN(text_c), for instance). MySQL should throw an error, and does when ONLY_FULL_GROUP_BY is enabled; otherwise it silently treats count_vote, text_c as ANY_VALUE(count_vote), ANY_VALUE(text_c) instead. This means you get arbitrarily picked values for a fruit.
Hence the answer to your question is: don't try to speed it up, but fix it instead. (Maybe you want to post a new question showing the query and explaining what it is supposed to do, so people can help you with that.)
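If the intent of the ORDER BY plus GROUP BY trick was "one review per fruit, the one with the most votes", that has to be stated explicitly. The sketch below is only a guess at that intent: it assumes the highest count_vote wins and the lowest r_id breaks ties, and it uses Python's sqlite3 module with made-up rows in place of the real MySQL tables.

```python
import sqlite3

# In-memory stand-in for the Reviews table; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reviews (r_id INTEGER PRIMARY KEY, fruit_id INTEGER,
                      customer_id INTEGER, count_vote INTEGER, text_c TEXT);
INSERT INTO Reviews VALUES
  (1, 1, 7, 3, 'ok'),
  (2, 1, 8, 9, 'great'),
  (3, 1, 9, 9, 'also great'),
  (4, 2, 7, 1, 'meh');
""")

# Keep a review only if no other review of the same fruit beats it:
# more votes wins, and on equal votes the lower r_id wins.
TOP_REVIEW_PER_FRUIT = """
SELECT r.fruit_id, r.r_id, r.count_vote, r.text_c
FROM Reviews r
WHERE NOT EXISTS (
    SELECT 1
    FROM Reviews better
    WHERE better.fruit_id = r.fruit_id
      AND (better.count_vote > r.count_vote
           OR (better.count_vote = r.count_vote AND better.r_id < r.r_id))
)
ORDER BY r.fruit_id
"""
top = conn.execute(TOP_REVIEW_PER_FRUIT).fetchall()
for row in top:
    print(row)
```

Because the tie-break is part of the query, the result no longer depends on which row the engine happens to pick for each group.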

Your Categories table does not seem to be joined or related to the others, so it produces a Cartesian product with all the other rows.
If you want distinct results, don't use GROUP BY but DISTINCT, so you can avoid an unnecessary subquery,
and you don't need an ORDER BY in a subquery.
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN (
SELECT DISTINCT fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
-- Customers is joined after r, because its ON clause refers to r
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /* Your Categories table seems not joined/related to the others */
WHERE category = "Fruits";
For better readability, you should use explicit JOIN syntax and avoid the old join syntax based on comma-separated table names and WHERE conditions.

The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate Cartesian product. It's also unclear whether any attributes from this table are actually required for the result, whether it is there to produce multiples of the same row in the output, or whether it is just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

How can I find records that contain multiple category IDs (using Toxi structure)?

SQL
SELECT *
FROM posts
LEFT JOIN taxonomy_term_map
ON (posts.ID = taxonomy_term_map.object_id)
WHERE taxonomy_term_map.term_id
IN (98,119)
GROUP BY posts.ID DESC
HAVING COUNT(posts.ID ) >= 2
LIMIT 0,20
TABLES & COLUMNS
posts { ID, post_title, etc... }
taxonomy_terms { term_id, term_label, term_slug, etc. }
post_taxonomy_term_map { map_id, object_id, taxonomy, term_id}
(NOTE: object_id relates to the posts.ID value)
My site uses the Toxi taxonomy structure for tagging/categorizing posts. Each post can have multiple term IDs attached to it.
Each taxonomy term associated with a post gets a record in the "post_taxonomy_term_map" table.
The query I'm currently using returns matching records at the top of the results ("GROUP BY posts.ID DESC") along with additional records that don't fully match all the terms provided.
I only want to select records that match ALL of the term ID values provided, everything else should be ignored. Additionally, I want to order records by posts.rank, but I can't currently do that while ordering posts by GROUP.
I would appreciate some assistance.

I think this query will get what you need:
select p.* -- get only the columns from posts
from posts p
join post_taxonomy_term_map m on m.object_id = p.id
where m.term_id in (98, 119) -- filter to specific terms
group by p.id
having count(*) = 2 -- filter out incomplete ones
order by p.rank -- order by rank
limit 20 -- get only the first 20 rows
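One caveat with count(*) = 2: it assumes the map table never holds the same (object_id, term_id) pair twice. A slightly more defensive sketch counts distinct terms instead and derives the expected count from the list of term IDs. It uses Python's sqlite3 module as a stand-in for MySQL; the helper name posts_with_all_terms and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the two tables; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_title TEXT, rank INTEGER);
CREATE TABLE post_taxonomy_term_map (map_id INTEGER PRIMARY KEY,
                                     object_id INTEGER, term_id INTEGER);
INSERT INTO posts VALUES (1, 'both terms', 20), (2, 'one term', 10),
                         (3, 'both, duplicated', 5);
INSERT INTO post_taxonomy_term_map (object_id, term_id) VALUES
  (1, 98), (1, 119),
  (2, 98),
  (3, 98), (3, 98), (3, 119);
""")

def posts_with_all_terms(conn, term_ids, limit=20):
    """Return (ID, post_title) for posts tagged with every term in term_ids."""
    wanted = sorted(set(term_ids))
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT p.ID, p.post_title
        FROM posts p
        JOIN post_taxonomy_term_map m ON m.object_id = p.ID
        WHERE m.term_id IN ({placeholders})
        GROUP BY p.ID
        HAVING COUNT(DISTINCT m.term_id) = ?   -- one hit per wanted term
        ORDER BY p.rank
        LIMIT ?
    """
    return conn.execute(sql, (*wanted, len(wanted), limit)).fetchall()

matches = posts_with_all_terms(conn, [98, 119])
print(matches)
```

Post 3 has a duplicated mapping row, so count(*) would report 3 for it and the = 2 test would wrongly drop it. Counting distinct term IDs keeps it.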

Can I use a join query to do this?

I have to admit trying to understand JOINs makes my brain explode so I need some help.
What I'm trying to accomplish is return info on the last 25 postings in a forum, but the main posts table only returns numbers for the Topic and Forum, whereas I need the textual names of the topic and forum, which I can retrieve from two other tables. In my very limited understanding of joins, it seems I can use one to do all of this in a single query rather than coding 3+ queries with loops and other perhaps unneeded code.
This would be the main query:
SELECT post_id, topic_id, forum_id, post_time
FROM posts
ORDER BY post_id DESC
LIMIT 25
But for each of the 25 results I also want included forum_title from table forums where forum_id in that table matches the forum_id from the main query results, as well as topic_title from table topics where the topic_id in that table matches the topic_id in the main query results.
I'm hoping just even seeing what this would look like will help my understanding of how JOINs work.
Thanks
EDIT: I realized I should have used the exact column and table names so that I wouldn't be editing suggestions. Using the exact names, this is how Aquinas' suggestion would look:
SELECT post_id, topic_id, forum_id, post_time, Forum_Title, Topic_Title
FROM phpbb3_posts
INNER JOIN phpbb3_topics
on phpbb3_topics.topic_id = phpbb3_posts.topic_id
INNER JOIN phpbb3_forums
on phpbb3_forums.forum_id = phpbb3_posts.forum_id
ORDER BY post_id DESC
LIMIT 25
but I get this error (this is in mysql)
1052 - Column 'topic_id' in field list is ambiguous

SELECT posts.post_id, posts.topic_id, posts.forum_id, posts.post_time, Forum_Title, Topic_Title FROM posts
INNER JOIN topics on topics.topic_id = posts.topic_id
INNER JOIN forums on forums.forum_id = posts.forum_id
ORDER BY posts.post_id DESC LIMIT 25
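Error 1052 from the question's edit means that a column name such as topic_id exists in more than one of the joined tables, so it has to be prefixed with its table name or alias. Here is a small sketch of both the failure and the fix, using Python's sqlite3 module in place of MySQL (SQLite reports the same problem as "ambiguous column name"). The simplified table layout and the sample rows are assumptions, not the real phpBB schema.

```python
import sqlite3

# Simplified, hypothetical versions of the three forum tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forums (forum_id INTEGER PRIMARY KEY, forum_title TEXT);
CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, forum_id INTEGER, topic_title TEXT);
CREATE TABLE posts  (post_id INTEGER PRIMARY KEY, topic_id INTEGER,
                     forum_id INTEGER, post_time INTEGER);
INSERT INTO forums VALUES (1, 'General');
INSERT INTO topics VALUES (5, 1, 'Welcome');
INSERT INTO posts  VALUES (100, 5, 1, 1700000000), (101, 5, 1, 1700000500);
""")

# Unqualified topic_id exists in both posts and topics, so the engine refuses.
try:
    conn.execute("""
        SELECT post_id, topic_id
        FROM posts
        INNER JOIN topics ON topics.topic_id = posts.topic_id
    """)
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)
print("unqualified:", ambiguous_error)

# Qualifying every column with a table alias removes the ambiguity.
rows = conn.execute("""
    SELECT p.post_id, p.topic_id, p.forum_id, p.post_time,
           f.forum_title, t.topic_title
    FROM posts p
    INNER JOIN topics t ON t.topic_id = p.topic_id
    INNER JOIN forums f ON f.forum_id = p.forum_id
    ORDER BY p.post_id DESC
    LIMIT 25
""").fetchall()
for row in rows:
    print(row)
```

Each INNER JOIN simply looks up the matching row in the other table for every post, which is why the titles arrive in the same result row as the post itself.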

How to query data without repeats and minimize the time?

There are 3 entities - articles, journals and subscribers. There are no restrictions on how to store data in database.
The same article can be simultaneously published in several journals.
How to select all published articles from subscribed journals sorted
by date of publication and without repeats?
The easiest way:
Create a table with articles:
posts
p_id, j1_id, j2_id, text, date
Create a table with subscriptions:
follows
f_id, u_id, j_id (u_id — is a user id from table users)
Execute this example query:
select posts.* from posts inner join follows on (j_id = j1_id or j_id
= j2_id) where u_id = 1 order by date desc
This query returns data with duplicates. You can use DISTINCT or GROUP BY to remove them, but that adds an extra sorting operation.
Another way is to use UNION, but that also implies a DISTINCT.
(select posts.* from posts inner join follows on j_id = j1_id where u_id = 1)
union
(select posts.* from posts inner join follows on j_id = j2_id where u_id = 1)
order by date desc
Perhaps I chose the wrong storage structure.
So the actual question is: can anything be done about this problem to minimize the query time on large data sets?

You can use the following table structure:
posts : pid, text, date
journals : jid, jtext
journals_posts : jid, pid
follows : fid, uid, jid
select distinct posts.* from posts
inner join journals_posts on journals_posts.pid = posts.pid
inner join follows on follows.jid = journals_posts.jid
where follows.uid = <userid>
To take care of speed, you can create indexes on:
journals_posts(jid)
follows(uid)
You might need to create indexes on other fields as well. Check with EXPLAIN which tables are scanned without using an index.
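With the link table in place, one alternative to SELECT DISTINCT is an EXISTS test, which can never return the same post twice. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the sample rows and the index name follows_uid are invented. Whether EXISTS is actually faster than DISTINCT depends on the data and the optimizer, so treat it as something to compare with EXPLAIN rather than a guaranteed win.

```python
import sqlite3

# In-memory stand-in for the normalized schema; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (pid INTEGER PRIMARY KEY, text TEXT, date TEXT);
CREATE TABLE journals_posts (jid INTEGER, pid INTEGER, PRIMARY KEY (jid, pid));
CREATE TABLE follows (fid INTEGER PRIMARY KEY, uid INTEGER, jid INTEGER);
CREATE INDEX follows_uid ON follows (uid, jid);
INSERT INTO posts VALUES (1, 'in two journals', '2024-01-02'),
                         (2, 'in one journal',  '2024-01-03'),
                         (3, 'not followed',    '2024-01-04');
INSERT INTO journals_posts VALUES (10, 1), (11, 1), (10, 2), (12, 3);
INSERT INTO follows (uid, jid) VALUES (1, 10), (1, 11);
""")

# Plain joins: a post published in two followed journals shows up twice.
joined = conn.execute("""
    SELECT p.pid
    FROM posts p
    JOIN journals_posts jp ON jp.pid = p.pid
    JOIN follows f ON f.jid = jp.jid
    WHERE f.uid = ?
""", (1,)).fetchall()

# EXISTS: each post is tested once, so no duplicates and no DISTINCT needed.
feed = conn.execute("""
    SELECT p.pid, p.text, p.date
    FROM posts p
    WHERE EXISTS (
        SELECT 1
        FROM journals_posts jp
        JOIN follows f ON f.jid = jp.jid
        WHERE jp.pid = p.pid AND f.uid = ?
    )
    ORDER BY p.date DESC
""", (1,)).fetchall()

print(len(joined), "joined rows;", len(feed), "feed rows")
```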

sql with GROUP_CONCAT

I'm running this SQL query
$sql = "select images.image, images.comment as feedDescription,
customers.fullName, CONCAT('[', GROUP_CONCAT(DISTINCT likes.uid),']') as likes,
CONCAT('[', GROUP_CONCAT(DISTINCT CONCAT('{\"userid\":\"', comments.fid, '\", \"comment\":\"', comments.comment, '\"}') separator ','),']') as comments
FROM images
LEFT JOIN customers on images.client_id = customers.client_id
LEFT JOIN likes on images.image = likes.image
LEFT JOIN comments on images.image = comments.image
WHERE images.fid=:userID
ORDER BY images.image LIMIT $offset,$limit";
the only problem is that I am getting only the first row ...
I have an images table, a customers table (used to get the customer's name from the id stored in images), a likes table (people who "liked" the image) and a comments table (people who wrote comments on the image).

You are using an aggregation function on a query, so MySQL is automatically returning only one row -- the aggregation of all the data.
In other databases, this would produce an error, because you have a mixture of aggregated and non-aggregated columns. This is a (mis)feature of MySQL called "hidden columns".
Add a group by to your query to fix the problem:
group by images.image, images.comment, customers.fullName
Be sure to add this after the WHERE clause and before the ORDER BY.
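The effect is easy to reproduce on a small scale. The sketch below uses Python's sqlite3 module, whose GROUP_CONCAT behaves like MySQL's for this purpose, and runs the same aggregate query with and without GROUP BY. It uses only a cut-down images/likes pair of tables, and the rows are made up.

```python
import sqlite3

# Cut-down, hypothetical versions of the images and likes tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (image TEXT PRIMARY KEY, comment TEXT, fid INTEGER);
CREATE TABLE likes  (image TEXT, uid INTEGER);
INSERT INTO images VALUES ('a.jpg', 'first', 1), ('b.jpg', 'second', 1);
INSERT INTO likes  VALUES ('a.jpg', 5), ('a.jpg', 6), ('b.jpg', 7);
""")

BASE = """
SELECT images.image, GROUP_CONCAT(DISTINCT likes.uid) AS likes
FROM images
LEFT JOIN likes ON images.image = likes.image
WHERE images.fid = ?
"""

# No GROUP BY: the aggregate collapses every matching row into one.
without_group = conn.execute(BASE, (1,)).fetchall()

# GROUP BY goes after WHERE and before ORDER BY: one row per image.
with_group = conn.execute(
    BASE + " GROUP BY images.image ORDER BY images.image", (1,)
).fetchall()

print(len(without_group), "row without GROUP BY")
print(len(with_group), "rows with GROUP BY")
```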
