MySQL: Ordering By Using Two Counts From Another Table? - mysql

I have one SQL table called "posts" that looks like this:
id | user
--------------------------------
0 | tim
1 | tim
2 | bob
And another called "votes" that stores either upvotes or downvotes on the posts in the "posts" table:
id | postID | type
--------------------------------
0 | 0 | 0
1 | 2 | 1
2 | 0 | 1
3 | 0 | 1
4 | 3 | 0
In this table, the 'type' is either a 0 for downvote or 1 for upvote.
How would I go about ordering posts by "tim" by the number of (upvotes - downvotes) the post has?

SELECT
p.id,
p.user,
SUM(v.type * 2 - 1) AS votecount
FROM posts p
LEFT JOIN votes v ON p.id = v.postID
WHERE p.user = 'tim'
GROUP BY p.id, p.user
ORDER BY votecount DESC
UPDATE – p and v explained.
In this query, p and v are aliases of, respectively, posts and votes. An alias is essentially an alternative name, and it is defined only within the scope of the statement that declares it (in this case, the SELECT statement). A column can have an alias too, not only a table. In this query, votecount is an alias of the column represented by the SUM(v.type * 2 - 1) expression. But for now we are talking only about tables.
Before I go on to explain table aliases, I'll briefly explain why you may need to prefix column names with table names, as in posts.id as opposed to just id. Basically, when a query references more than one table, as in this case, it is useful always to prefix column names with the respective table names. That way, when you revisit an old script, you can always tell which column belongs to which table without having to look up the structures of the tables referenced. It is also mandatory to include the table reference when omitting it would make it ambiguous which table the column belongs to. (In this case, referencing the id column without referencing the posts table does create an ambiguous situation, because each table has its own id.)
Now, a large and complex query may be difficult to read when you write out complete table names before column names. This is where (short) aliases come in handy: they make a query easier to read and understand, although I've already learnt that not all people share that opinion, and so you should judge for yourself: this question contains two versions of the same query, one with long-named table references and the other with short-aliased ones, as well as an opinion (in a comment to one of the answers) why aliases are not suitable.
Anyway, using short table aliases in this particular query may not be as beneficial as in some more complex statements. It's just that I'm used to aliasing tables whenever the query references more than one.
This MySQL documentation article contains the official syntax for aliasing tables in MySQL (which is actually the same as in standard SQL).
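To check the score expression against the sample data from the question, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for MySQL, and the function name tim_posts_by_score is made up for illustration. It also wraps the SUM in COALESCE, which is an addition of this sketch, so that a post with no votes scores 0 instead of NULL.

```python
import sqlite3


def tim_posts_by_score():
    """Return tim's posts ordered by (upvotes - downvotes), highest first."""
    # SQLite in memory stands in for MySQL here (assumption of this sketch).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user TEXT);
        CREATE TABLE votes (id INTEGER PRIMARY KEY, postID INTEGER, type INTEGER);
        -- Sample rows copied from the question.
        INSERT INTO posts VALUES (0, 'tim'), (1, 'tim'), (2, 'bob');
        INSERT INTO votes VALUES (0, 0, 0), (1, 2, 1), (2, 0, 1), (3, 0, 1), (4, 3, 0);
    """)
    rows = conn.execute("""
        SELECT p.id,
               p.user,
               -- type * 2 - 1 maps 0 (downvote) to -1 and 1 (upvote) to +1.
               -- COALESCE is an addition: unvoted posts get 0 instead of NULL.
               COALESCE(SUM(v.type * 2 - 1), 0) AS votecount
        FROM posts p
        LEFT JOIN votes v ON p.id = v.postID
        WHERE p.user = 'tim'
        GROUP BY p.id, p.user
        ORDER BY votecount DESC
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(tim_posts_by_score())
```

With the sample data, post 0 has one downvote and two upvotes, so it scores 1, and post 1 has no votes, so it scores 0.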

Not tested, but should work:
select posts.id, sum(if(type = 0, -1, 1)) as score
from posts join votes on posts.id = votes.postID
where user = 'tim'
group by posts.id
order by score
Do you plan to conquer SO? ;-)

Edit: I cut out the subquery since in MySQL it's unnecessary. The original query was portable, but the subquery is not needed for MySQL.
select
p.id, SUM(case
when v.type = 0 then -1
when v.type = 1 then 1
else 0 end) as VoteCount
from
posts p
left join votes v
on p.id = v.postid
where
p.user = 'tim'
group by
p.id
order by
VoteCount desc

Related

Check values exist and not exist between 2 tables

I have 3 tables, one of accounts, one of friends and another of consumers.
Something like this:
table_accounts
id | account_name
table_friends
id | account_id | people_id
table_consumers
id | account_id | people_id
I need to cross the following information:
Which consumer_id coexist in both tables, something simple like this:
SELECT
*
FROM
table_friends,
table_consumers
WHERE
table_friends.account_id = 12345
AND table_friends.account_id = table_consumers.account_id
GROUP BY table_friends.people_id
this query is very slow
Well, I now need to find which consumer_id values in the friends table are NOT in the consumers table. And, as a third step, find out which consumer_id values do NOT exist in the friends table. But I think it's the same thing ...
My problem is with the logic: I can't work out how to cross-reference this information.
This is probably more or less what you are trying to do (and take a look at Subqueries with EXISTS vs IN - MySQL):
SELECT *
FROM table_friends
WHERE
NOT EXISTS (
SELECT *
FROM table_consumers
WHERE table_consumers.people_id = table_friends.people_id
)
BTW, you say "this query is very slow": how many rows are you querying? What counts as "slow"? Do you have indexes where you need them?
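The NOT EXISTS pattern above works in both directions by swapping the two tables. Here is a minimal sketch, assuming an in-memory SQLite database in place of MySQL. The sample rows and the names ANTI_JOIN and compare_friends_and_consumers are hypothetical. Matching on account_id as well as people_id is also an assumption about what the asker wants.

```python
import sqlite3

# Anti-join template: rows of {left} with no matching row in {right}.
# The table names are fixed constants below, never user input.
ANTI_JOIN = """
    SELECT l.people_id
    FROM {left} l
    WHERE l.account_id = ?
      AND NOT EXISTS (
          SELECT 1
          FROM {right} r
          WHERE r.account_id = l.account_id
            AND r.people_id = l.people_id
      )
    ORDER BY l.people_id
"""


def compare_friends_and_consumers(account_id):
    """Return the people_ids found in only one of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_friends (
            id INTEGER PRIMARY KEY, account_id INTEGER, people_id INTEGER);
        CREATE TABLE table_consumers (
            id INTEGER PRIMARY KEY, account_id INTEGER, people_id INTEGER);
        -- Hypothetical sample rows, not taken from the question.
        INSERT INTO table_friends (account_id, people_id)
            VALUES (12345, 1), (12345, 2), (12345, 3);
        INSERT INTO table_consumers (account_id, people_id)
            VALUES (12345, 2), (12345, 3), (12345, 4);
    """)

    def only_in(left, right):
        sql = ANTI_JOIN.format(left=left, right=right)
        return [row[0] for row in conn.execute(sql, (account_id,))]

    result = {
        "friends_only": only_in("table_friends", "table_consumers"),
        "consumers_only": only_in("table_consumers", "table_friends"),
    }
    conn.close()
    return result


if __name__ == "__main__":
    print(compare_friends_and_consumers(12345))
```

On a real MySQL table, a composite index on (account_id, people_id) in both tables is the usual first thing to try for this kind of correlated lookup.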
Could you do something like this:
Select a.account_name
, a.id
, case when f.id is null then 0 else 1 end isFriend
, case when c.id is null then 0 else 1 end isConsumer
from table_accounts a
left join table_friends f on a.id = f.account_id
left join table_consumers c on a.id = c.account_id
If I understand your question correctly you can use NOT IN to find the exceptions for each table. Something like this:
SELECT id
FROM table_consumers
WHERE account_id
NOT IN
(SELECT account_id
FROM table_friends)
You can do the same thing with the table names reversed to find out which friends are not in consumers. If you were wanting to include more than one table in the query, you may want to check out using UNION or UNION ALL as well. See: UNION ALL and NOT IN together
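One caveat with NOT IN is worth checking before relying on it. If the subquery can return a NULL, the NOT IN condition is never true and the query returns no rows. This follows from standard SQL three-valued logic. The sketch below assumes in-memory SQLite instead of MySQL, and the rows and the function name not_in_vs_not_exists are hypothetical.

```python
import sqlite3


def not_in_vs_not_exists():
    """Compare NOT IN and NOT EXISTS when the subquery column contains NULL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_friends (id INTEGER PRIMARY KEY, account_id INTEGER);
        CREATE TABLE table_consumers (id INTEGER PRIMARY KEY, account_id INTEGER);
        -- Hypothetical rows; note the NULL account_id in table_friends.
        INSERT INTO table_friends (id, account_id) VALUES (1, 1), (2, NULL);
        INSERT INTO table_consumers (id, account_id) VALUES (1, 1), (2, 2);
    """)
    queries = {
        # 2 NOT IN (1, NULL) evaluates to NULL, so consumer 2 is filtered out.
        "not_in": """
            SELECT id FROM table_consumers
            WHERE account_id NOT IN (SELECT account_id FROM table_friends)
            ORDER BY id
        """,
        # Excluding NULLs inside the subquery restores the expected result.
        "not_in_filtered": """
            SELECT id FROM table_consumers
            WHERE account_id NOT IN (
                SELECT account_id FROM table_friends
                WHERE account_id IS NOT NULL)
            ORDER BY id
        """,
        # NOT EXISTS is not affected by the NULL row.
        "not_exists": """
            SELECT c.id FROM table_consumers c
            WHERE NOT EXISTS (
                SELECT 1 FROM table_friends f
                WHERE f.account_id = c.account_id)
            ORDER BY c.id
        """,
    }
    results = {name: [row[0] for row in conn.execute(sql)]
               for name, sql in queries.items()}
    conn.close()
    return results


if __name__ == "__main__":
    print(not_in_vs_not_exists())
```

If account_id is declared NOT NULL in both tables, the plain NOT IN form behaves the same as NOT EXISTS.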
Looks like you already got the answer on how to compose your query, but you should think about the redesign of your schema. If it's not too late.
Both table_friends, and table_consumers represent people. The only difference is what type/kind of people. You don't want to add a new table every time you need to add a new attribute to people.
What you need is:
table_accounts
table_people
table_people_type
table_people_type_mapping
The last one being a mapping table between table_people and table_people_type.
In table_people_type you could have friends and consumers for now, but you could also add different types later on without schema change. And your queries would be more intuitive.
Again, that only applies if a schema change is still an option for you.
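To make the suggested layout concrete, here is a rough sketch of the four tables and one query against them. The table names come from the list above. Every column name, the sample rows, and the function friends_who_are_not_consumers are assumptions made for illustration, and in-memory SQLite stands in for MySQL.

```python
import sqlite3


def friends_who_are_not_consumers(account_id):
    """Return ids of people typed 'friend' but not 'consumer' for one account."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Column names below are hypothetical; only the table names
        -- come from the answer.
        CREATE TABLE table_accounts (id INTEGER PRIMARY KEY, account_name TEXT);
        CREATE TABLE table_people (
            id INTEGER PRIMARY KEY,
            account_id INTEGER REFERENCES table_accounts(id));
        CREATE TABLE table_people_type (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE table_people_type_mapping (
            people_id INTEGER REFERENCES table_people(id),
            type_id INTEGER REFERENCES table_people_type(id),
            PRIMARY KEY (people_id, type_id));

        INSERT INTO table_accounts VALUES (12345, 'demo');
        INSERT INTO table_people VALUES (1, 12345), (2, 12345), (3, 12345);
        INSERT INTO table_people_type VALUES (1, 'friend'), (2, 'consumer');
        -- Person 1: friend. Person 2: friend and consumer. Person 3: consumer.
        INSERT INTO table_people_type_mapping VALUES (1, 1), (2, 1), (2, 2), (3, 2);
    """)
    rows = conn.execute("""
        SELECT p.id
        FROM table_people p
        JOIN table_people_type_mapping m ON m.people_id = p.id
        JOIN table_people_type t ON t.id = m.type_id
        WHERE p.account_id = ?
        GROUP BY p.id
        -- A comparison yields 1 or 0, so SUM counts the matching types.
        HAVING SUM(t.name = 'friend') > 0
           AND SUM(t.name = 'consumer') = 0
        ORDER BY p.id
    """, (account_id,)).fetchall()
    conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(friends_who_are_not_consumers(12345))
```

Adding a new kind of person then means inserting one row into table_people_type rather than creating another table.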

MySQL - How do I optimize appending field from table b to a query of table a

I know this has to be a fairly common issue, and I am sure the answer is readily available but I am not sure how to phrase my search so I have been forced to troubleshoot this on my own for the most part.
Table A
id | content_id | score
1 | 2 | 16
2 | 2 | 4
3 | 3 | 8
4 | 3 | 12
Table B
id | content
1 | "Content Goes Here"
2 | "Content Goes Here"
3 | "Content Goes Here"
Objective: SUM all scores from table A, group by the unique content_id and show the content associated with the id, ordered by the sum score.
Current Working Query:
SELECT a.content_id, b.content, SUM(a.score) AS sum
FROM table_a a
LEFT JOIN table_b b ON a.content_id = b.id
GROUP BY a.content_id
ORDER BY sum ASC;
Problem: As far as I can tell, with the way I have structured my query, the content is grabbed from table_b by looping through each record in table_a, checking for a record in table_b with an identical id, and grabbing the content field. The problem here is that table_a has 500k+ records and table_b has 112 records. This means that potentially 500,000 x 112 cross-table lookups/matches are being performed just to attach 112 unique content fields to a total of 112 rows in the final result set.
HELP!: How do I more efficiently append the 112 content fields from table_b to the 112 results produced by the query? I am guessing it has something to do with the query execution order, like somehow only looking for and appending the content field to the matched result row AFTER the sums are produced and it is narrowed down to only 112 records? I have studied the MySQL API and benchmarked various subqueries, several joins, and even tried playing with UNION. It is probably something abundantly obvious to you guys, but my brain just can't get around it.
FYI: Like mentioned earlier, the query does work. The results are produced in about 8 to 10 seconds, and of course each subsequent query after that is immediate because of query caching. But for me, with how simple this is, I know that 8 seconds can at LEAST be cut in half. I just feel it deep down in my guts. Right deep down in my gutssss.
I hope this is concise enough, if I need to clarify or explain something better please let me know! Thanks in advance.
The MySQL query optimiser only allows "nested loop joins" ** These are the internal operators for how an INNER join is evaluated. Other RDBMS allow other kinds of JOINs which are more efficient.
However, in your case you can try this. Hopefully the optimiser will do the aggregate before the JOIN
SELECT
a.content_id, b.content, a.sum
FROM
(
SELECT content_id, SUM(score) AS sum
FROM table_a
GROUP BY content_id
) a
JOIN table_b b ON a.content_id = b.id
ORDER BY
sum ASC;
In addition, if you don't want the results ordered you can use ORDER BY NULL which usually removes a filesort from the EXPLAIN. And of course, I assume that there are indexes on the 2 content_id columns (one primary key, one foreign key index)
Finally, I would also assume that an INNER JOIN will be enough: every a.content_id exists in table_b. If not, you are missing a foreign key and index on a.content_id.
** It's getting better but you need MariaDB or MySQL 5.6
This should be a little faster:
SELECT
tmp.content_id,
b.content,
tmp.asum
FROM (
SELECT
a.content_id,
SUM(a.score) AS asum
FROM
table_a a
GROUP BY
a.content_id
ORDER BY
NULL
) as tmp
LEFT JOIN table_b b
ON tmp.content_id = b.id
ORDER BY
tmp.asum ASC
You can use EXPLAIN to check the query execution plan for both queries when you want to benchmark them.
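Before benchmarking, it helps to confirm that the "aggregate first, then join" rewrite returns the same rows as the original query. Here is a minimal sketch using the sample rows from the question, assuming in-memory SQLite instead of MySQL. The alias total, the index name, the extra tie-break on content_id, and the function run_both are additions for this illustration. It says nothing about speed on the real 500k-row table.

```python
import sqlite3

# Original shape: join every score row to its content, then aggregate.
JOIN_THEN_GROUP = """
    SELECT a.content_id, b.content, SUM(a.score) AS total
    FROM table_a a
    LEFT JOIN table_b b ON a.content_id = b.id
    GROUP BY a.content_id, b.content
    ORDER BY total ASC, a.content_id
"""

# Rewritten shape: aggregate table_a first, then join the few summary rows.
GROUP_THEN_JOIN = """
    SELECT t.content_id, b.content, t.total
    FROM (
        SELECT content_id, SUM(score) AS total
        FROM table_a
        GROUP BY content_id
    ) t
    LEFT JOIN table_b b ON t.content_id = b.id
    ORDER BY t.total ASC, t.content_id
"""


def run_both():
    """Run both query shapes on the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_b (id INTEGER PRIMARY KEY, content TEXT);
        CREATE TABLE table_a (
            id INTEGER PRIMARY KEY,
            content_id INTEGER REFERENCES table_b(id),
            score INTEGER);
        -- Index on the grouping/join column, as the answer assumes.
        CREATE INDEX idx_a_content_id ON table_a (content_id);
        INSERT INTO table_b VALUES
            (1, 'Content Goes Here'), (2, 'Content Goes Here'), (3, 'Content Goes Here');
        INSERT INTO table_a VALUES (1, 2, 16), (2, 2, 4), (3, 3, 8), (4, 3, 12);
    """)
    first = conn.execute(JOIN_THEN_GROUP).fetchall()
    second = conn.execute(GROUP_THEN_JOIN).fetchall()
    conn.close()
    return first, second


if __name__ == "__main__":
    print(run_both())
```

Both content ids sum to 20 in the sample data, which is why the sketch adds content_id as a secondary sort key to keep the order deterministic.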

MySQL query subtract issue

To begin with I have two tables:
feeds:(id,content_id,author) and
feeds_ratings:(id,feed_id (FK to feeds(id)),user_id,rating)
What I want to do is get the total rating difference for a specific author by a certain user.
To explain a bit more, let's say we have three rows in the feeds table, (1,3245,test), (2,3215,test), (3,3122,test), and three rows in the feeds_ratings table, (1,1,12,like), (1,2,12,like), (1,3,12,dislike).
The input will be the user_id and the author, and I want the output to be the difference between the total dislikes and likes by the input user for the specific input author. (In this example it will be 1, because of the two likes and the one dislike.)
How can that be implemented in a mysql query? I tried searching and some code of my own but I can't make it work, so any help is appreciated!
Something like this will work using SUM with CASE -- add 1 for likes and -1 for dislikes:
SELECT SUM(CASE WHEN fr.rating = 'like' THEN 1 ELSE -1 END) TotalLikes
FROM Feeds F
INNER JOIN Feeds_Ratings FR ON F.id = FR.feed_id
WHERE F.author = 'test'
AND FR.user_id = 12
Obviously replace author and user_id with the appropriate values -- these are just for your sample input.
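The SUM with CASE approach can be tried on the sample rows from the question with the sketch below. It assumes in-memory SQLite rather than MySQL, and the function name rating_difference is made up. The feeds_ratings ids are set to 1, 2, 3 here, because the question lists all three as 1, which a primary key would not allow. Unlike the query above, this version scores only 'like' and 'dislike' and treats any other rating value as 0.

```python
import sqlite3


def rating_difference(author, user_id):
    """Return (likes - dislikes) given by user_id to feeds written by author."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feeds (
            id INTEGER PRIMARY KEY, content_id INTEGER, author TEXT);
        CREATE TABLE feeds_ratings (
            id INTEGER PRIMARY KEY,
            feed_id INTEGER REFERENCES feeds(id),
            user_id INTEGER,
            rating TEXT);
        INSERT INTO feeds VALUES (1, 3245, 'test'), (2, 3215, 'test'), (3, 3122, 'test');
        -- Rating ids renumbered 1..3 (assumption, see the note above).
        INSERT INTO feeds_ratings VALUES
            (1, 1, 12, 'like'), (2, 2, 12, 'like'), (3, 3, 12, 'dislike');
    """)
    (total,) = conn.execute("""
        SELECT COALESCE(SUM(CASE fr.rating
                                WHEN 'like' THEN 1
                                WHEN 'dislike' THEN -1
                                ELSE 0
                            END), 0)
        FROM feeds f
        INNER JOIN feeds_ratings fr ON f.id = fr.feed_id
        WHERE f.author = ?
          AND fr.user_id = ?
    """, (author, user_id)).fetchone()
    conn.close()
    return total


if __name__ == "__main__":
    print(rating_difference("test", 12))
```

The COALESCE is an addition so that a user with no ratings for the author gets 0 instead of NULL.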

getting quize data, questions and answers in 1 query?

I need to get the quiz title, the quiz description, the quiz questions and the answers for each question. My table structure is:
quizes
quize_id | title | user_id | ...
questions
questions_id | quize_id | question | ...
question_answers
answer_id | question_id | user_id | answer | ...
I can use join
SELECT * FROM quizes JOIN questions q ON q.quize_id=quizes.quize_id JOIN question_answers a ON a.question_id=q.question_id
But the problem with this is that the results will contain many rows with redundant data. For example, each row will carry the fields title, user_id, ... Another way is to run an extra query for each question to get its answers. Is there any better way? Should I use only 1 query or more?
Your tables hold 3 types of data. If you use the query you've got, you'll get all the data as a big table. You've said that this involves a lot of duplication.
If you use multiple queries, you will get multiple result sets, which effectively will leave you with multiple tables, and thus this is unlikely to help.
You could cut the query down to just the columns you want to get the data for:
SELECT qq.Question, qa.Answer
FROM quizes qz
join questions qq on qz.quize_id = qq.quize_id
join question_answers qa on qq.question_id = qa.question_id
WHERE qz.quize_id = ? -- bind the quiz id as a parameter
ORDER BY 1, 2 -- or other ordering
However, where there are multiple answers for the same question, the question will be repeated on every row. There isn't much you can do about that; it is the price of combining multiple tables' data into one table ("denormalising").
If you need to format your output table so that it looks like this (but with more columns):
Quize_id | Question | Answer
1        | Q1       | A1
         |          | A2
         | Q2       | A3
2        | Q3       | A4
This is a whole different matter. You would need to use the query you've got to populate a temporary table, ordering the data by the sort order you want displayed. To this table you'd need to add a primary key (integer) column, then run a set of update statements to replace the repeated values with nulls, then output the table in the order of the primary key column. (There are other ways to do this, but this is the easiest to explain)
Does this help?
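One of the "other ways" is to leave the SQL result flat and fold the repeated values in application code after fetching. Here is a minimal sketch in Python, assuming rows arrive as (quize_id, question, answer) tuples like the example table above. The function name nest_quiz_rows is made up for illustration.

```python
def nest_quiz_rows(rows):
    """Fold flat (quize_id, question, answer) rows into nested dicts.

    The result maps quize_id -> question -> list of answers, so each quiz
    and each question appears once no matter how many answers it has.
    """
    nested = {}
    for quize_id, question, answer in rows:
        # setdefault creates the inner dict or list on first sight of a key.
        nested.setdefault(quize_id, {}).setdefault(question, []).append(answer)
    return nested


if __name__ == "__main__":
    # Same shape as the example output table above.
    flat = [
        (1, "Q1", "A1"),
        (1, "Q1", "A2"),
        (1, "Q2", "A3"),
        (2, "Q3", "A4"),
    ]
    print(nest_quiz_rows(flat))
```

This keeps the database work to a single join query and avoids the temporary table and update statements.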
I also found another way, which returns all the data I need, including user details for each question:
SELECT
question,
group_concat(qa.answer SEPARATOR ',') as answers,
group_concat(qa.user_id SEPARATOR ',') as userIds,
group_concat(up.nickname SEPARATOR ',') as nickname
FROM quize_questions qq
INNER JOIN question_answers qa ON qa.question_id=qq.question_id
INNER JOIN user_profile up ON up.user_id = qa.user_Id
GROUP BY qq.question_id
I am just not sure if this is the right way. I am worried about speed.

MySQL Select based on count of substring in a column?

Let's say I have two tables (I'm trying to remove everything irrelevant to the question from the tables and make some sample ones, so bear with me :)
___________________            ________________________
|File             |            |Content               |
|_________________|            |______________________|
|ID Primary Key   | 1        * |ID Primary Key        |
|URL Varchar(255) |------------|FileID Foreign Key    |
|_________________|            |   ref File(ID)       |
                               |FileContent Text      |
                               |______________________|
A File has a url. There may be many Content items corresponding to each File.
I need to create a query using these tables that I'm having some trouble with. I essentially want the query, in simple terms, to say:
"Select the file URL and the sum of the times substring "X" appears in all content entries associated with that file."
I'm pretty good with SQL selects, but I'm not so good with aggregate functions and it's letting me down. Any help is greatly appreciated :)
The query won't be efficient but might give you a hint:
SELECT url, cnt
FROM (
SELECT
f.id,
IFNULL(
SUM(
(LENGTH(c.FileContent) - LENGTH(REPLACE(c.FileContent, f.url, '')))/LENGTH(f.url)
),
0
) as cnt
FROM file f
JOIN content c ON f.id = c.fileid
GROUP BY f.id
) cnts JOIN file USING(id);
To append files that do not have a match in the content table, you can UNION ALL the rest or use LEFT JOIN in the cnts subquery.
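The LENGTH/REPLACE trick counts non-overlapping occurrences: removing every copy of the substring shortens the text by (occurrences x substring length). Here is a minimal sketch of that idea with a fixed search string and a LEFT JOIN, so files without content show 0. It assumes in-memory SQLite in place of MySQL, and the sample rows and the function name substring_counts are hypothetical.

```python
import sqlite3


def substring_counts(needle):
    """Return (URL, occurrences of needle across all Content rows) per File."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE File (ID INTEGER PRIMARY KEY, URL TEXT);
        CREATE TABLE Content (
            ID INTEGER PRIMARY KEY,
            FileID INTEGER REFERENCES File(ID),
            FileContent TEXT);
        -- Hypothetical sample rows.
        INSERT INTO File VALUES (1, 'a.txt'), (2, 'b.txt'), (3, 'c.txt');
        INSERT INTO Content VALUES
            (1, 1, 'abXcdX'), (2, 1, 'X'), (3, 2, 'nothing here');
    """)
    rows = conn.execute("""
        SELECT f.URL,
               COALESCE(SUM(
                   -- characters removed, divided by the needle length
                   (LENGTH(c.FileContent)
                    - LENGTH(REPLACE(c.FileContent, :needle, '')))
                   / LENGTH(:needle)
               ), 0) AS cnt
        FROM File f
        LEFT JOIN Content c ON c.FileID = f.ID
        GROUP BY f.ID, f.URL
        ORDER BY f.ID
    """, {"needle": needle}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(substring_counts("X"))
```

Note that MySQL's LENGTH counts bytes while CHAR_LENGTH counts characters. The ratio still works out as long as the same function is used for both the text and the substring.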
This solution attempts to use REGEXP to match the substring. REGEXP returns 1 if it matches, 0 if not, so SUM() them up for the total. REGEXP might seem like overkill, but would allow for more complicated matching than a simple substring.
SELECT
File.ID,
File.URL,
SUM(Content.FileContent REGEXP 'substring') AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.FileID
GROUP BY File.ID, File.URL;
The easier method, if a more complex match pattern won't ever be needed, uses LIKE and COUNT() instead of SUM():
SELECT
File.ID,
File.URL,
COUNT(Content.ID) AS numSubStrs
FROM File LEFT JOIN Content ON File.ID = Content.FileID
AND Content.FileContent LIKE '%substring%'
GROUP BY File.ID, File.URL;
Note the use of LEFT JOIN with the LIKE condition in the ON clause, which produces 0 when there are no matching entries in Content. COUNT(Content.ID) counts only the joined rows. A WHERE filter on Content would drop files without a match, and COUNT(*) would report 1 for them.
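Where the LIKE condition sits changes what a LEFT JOIN returns, and that is easy to check on a few rows. The sketch below runs the query twice, once with the filter in WHERE and once with it in the ON clause. It assumes in-memory SQLite in place of MySQL, and the sample rows and the function name like_filter_placement are hypothetical.

```python
import sqlite3


def like_filter_placement(pattern):
    """Compare a LIKE filter in WHERE with the same filter in the ON clause."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE File (ID INTEGER PRIMARY KEY, URL TEXT);
        CREATE TABLE Content (
            ID INTEGER PRIMARY KEY,
            FileID INTEGER REFERENCES File(ID),
            FileContent TEXT);
        -- Hypothetical rows: file 1 has one match, file 2 has content but
        -- no match, file 3 has no content at all.
        INSERT INTO File VALUES (1, 'a.txt'), (2, 'b.txt'), (3, 'c.txt');
        INSERT INTO Content VALUES
            (1, 1, 'has substring here'), (2, 1, 'nothing'), (3, 2, 'nothing');
    """)
    # Filter in WHERE: rows where Content is NULL or non-matching are removed
    # after the join, so files without a match disappear from the result.
    in_where = conn.execute("""
        SELECT f.ID, COUNT(*)
        FROM File f
        LEFT JOIN Content c ON f.ID = c.FileID
        WHERE c.FileContent LIKE ?
        GROUP BY f.ID
        ORDER BY f.ID
    """, (pattern,)).fetchall()
    # Filter in ON: every file is kept, and COUNT(c.ID) ignores the NULLs
    # produced for files without a matching Content row.
    in_on = conn.execute("""
        SELECT f.ID, COUNT(c.ID)
        FROM File f
        LEFT JOIN Content c ON f.ID = c.FileID AND c.FileContent LIKE ?
        GROUP BY f.ID
        ORDER BY f.ID
    """, (pattern,)).fetchall()
    conn.close()
    return in_where, in_on


if __name__ == "__main__":
    print(like_filter_placement("%substring%"))
```

Either form counts matching Content rows, not the number of occurrences inside each row.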