mysql count votes optimization - mysql

So I'm making a file hub, nothing huge or fancy, just to store some files that may be shared by others for download. It just occurred to me that the way I originally intended to count the upvotes and downvotes could be heavy on the server. The query to get the files is something along the lines of
SELECT * FROM files;
which gives me an array of my files that I can loop over to get the specifics of each file. With voting included, that same foreach loop would run a further query to count the votes for each file (the file id goes in the WHERE clause), like so
SELECT * FROM votes WHERE upvoted = true AND file_id = ?
and I was thinking of using PDOStatement::rowCount to get my answer. Now every bone in my body says this is bad, very bad: if I'm getting 10,000 files, I just ran 10,000 extra queries, one for each file, and I haven't looked at the downvotes yet, which I was thinking could be handled in a similar fashion. Any optimization advice? Here is a small representation of the structure of a few tables. The upvoted and downvoted columns are of type bool (or tinyint, if you will).
table: file
+----+---------------+
| id | storedname    |
+----+---------------+
| 1  | 45tfvb.txt    |
| 2  | jj7fnfddf.pdf |
| .. | ..            |
+----+---------------+

table: user
+----+----------+
| id | username |
+----+----------+
| 1  | matthew  |
| 2  | mark     |
| .. | ..       |
+----+----------+

table: votes
+---------+---------+---------+-----------+
| file_id | user_id | upvoted | downvoted |
+---------+---------+---------+-----------+
| 1       | 2       | 1       | 0         |
| 2       | 1       | 1       | 1         |
| ..      | ..      | ..      | ..        |
+---------+---------+---------+-----------+

There are two ways to do this. The better (that is, faster) way is to write separate queries and combine the results in one variable in your programming language (PHP, Python, etc.).
SELECT
d.id AS doc_id,
COUNT(v.document_id) AS num_upvotes
FROM votes v
JOIN document d ON d.id = v.document_id
WHERE v.upvoted IS TRUE
GROUP BY doc_id;
That will return the upvote count for each document that has upvotes. You can do the same for your downvotes.
Then, after your select from document, loop over the documents, match each one to its vote counts by ID, and build the result into a dictionary or list.
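The "separate queries, then merge" idea can be sketched roughly as follows. This is only an illustration: it uses Python with an in-memory SQLite database standing in for PHP and MySQL, takes the file and votes table names from the question rather than the document naming used in the query above, and the sample rows are made up. It also folds the upvote and downvote counts into one grouped query, which is a variation on the suggestion rather than the exact queries shown.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (id INTEGER PRIMARY KEY, storedname TEXT);
    CREATE TABLE votes (file_id INTEGER, user_id INTEGER,
                        upvoted INTEGER, downvoted INTEGER);
    INSERT INTO file VALUES (1, '45tfvb.txt'), (2, 'jj7fnfddf.pdf'), (3, 'novotes.txt');
    INSERT INTO votes VALUES (1, 2, 1, 0), (1, 3, 1, 0), (2, 1, 0, 1);
""")

# Query 1: all files.
files = conn.execute("SELECT id, storedname FROM file").fetchall()

# Query 2: a single grouped pass over votes yields both counters per file.
vote_rows = conn.execute("""
    SELECT file_id, SUM(upvoted), SUM(downvoted)
    FROM votes
    GROUP BY file_id
""").fetchall()
votes_by_file = {file_id: (up, down) for file_id, up, down in vote_rows}

# Merge in application code; files without any votes default to (0, 0).
result = {}
for file_id, storedname in files:
    up, down = votes_by_file.get(file_id, (0, 0))
    result[file_id] = {"storedname": storedname, "upvotes": up, "downvotes": down}

print(result)
```

Whatever the number of files, this runs two queries in total instead of one extra query per file.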
The second way, which can take a lot longer at runtime if you have a lot of records in the table (it's less efficient, but easier to write), is to add subquery selects to your select statement, like this:
SELECT
logical_name ,
document.id ,
file_type ,
physical_name ,
uploader_notes ,
views ,
downloads ,
user.name ,
category.name AS category_name,
(SELECT COUNT(1) FROM votes WHERE upvoted = true AND document_id = document.id) AS upvoted,
(SELECT COUNT(1) FROM votes WHERE upvoted = false AND document_id = document.id) AS downvoted
FROM document
INNER JOIN category ON document.category_id = category.id
INNER JOIN user ON document.uploader_id = user.id
ORDER BY category.id

Two pieces of advice:
Avoid SELECT *, especially if you only want a count. Replace it with something like this:
SELECT COUNT(1) AS total FROM votes WHERE upvoted = true AND file_id = ?
You may also want to create a TRIGGER that keeps a counter in the file table up to date.
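A rough sketch of the trigger idea, again in Python with in-memory SQLite standing in for MySQL. The upvotes and downvotes counter columns and the trigger name votes_after_insert are hypothetical, and MySQL's CREATE TRIGGER syntax differs slightly from SQLite's, so treat this as an outline rather than something to paste into MySQL as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file (
        id INTEGER PRIMARY KEY,
        storedname TEXT,
        upvotes INTEGER NOT NULL DEFAULT 0,    -- hypothetical counter column
        downvotes INTEGER NOT NULL DEFAULT 0   -- hypothetical counter column
    );
    CREATE TABLE votes (file_id INTEGER, user_id INTEGER,
                        upvoted INTEGER, downvoted INTEGER);

    -- Hypothetical trigger: bump the counters whenever a vote row is inserted.
    CREATE TRIGGER votes_after_insert AFTER INSERT ON votes
    FOR EACH ROW
    BEGIN
        UPDATE file
        SET upvotes = upvotes + NEW.upvoted,
            downvotes = downvotes + NEW.downvoted
        WHERE id = NEW.file_id;
    END;

    INSERT INTO file (id, storedname) VALUES (1, '45tfvb.txt');
    INSERT INTO votes VALUES (1, 2, 1, 0);
    INSERT INTO votes VALUES (1, 3, 0, 1);
    INSERT INTO votes VALUES (1, 4, 1, 0);
""")

# Reading the counters no longer touches the votes table at all.
counters = conn.execute(
    "SELECT upvotes, downvotes FROM file WHERE id = 1"
).fetchone()
print(counters)
```

This sketch only covers inserts. If votes can be changed or withdrawn, matching UPDATE and DELETE triggers would be needed to keep the counters honest.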
I hope this is helpful to you.

Related

NOT IN subquery gives 0 results

I'm not a MySQL expert, but I have to deal with the following problem.
Given the following table:
+-------+-----------+-------------+------+
| id | articleID | img | main |
+-------+-----------+-------------+------+
| 48350 | 4325 | scr426872xa | 1 |
| 48351 | 4325 | scr426872ih | 2 |
| 48352 | 4325 | scr426872jk | 2 |
| 48353 | 4326 | scr426882vs | 1 |
| 48354 | 4326 | scr426882ss | 2 |
| 48355 | 4326 | scr426882nf | 2 |
+-------+-----------+-------------+------+
Each set of images for one distinct articleID should have one image set as main=1 and an unspecified number of images with a main value of 2.
Due to processing issues it can happen that no main=1 is set for an article's images, and I need to find the articleIDs where images with main=2 exist but none with main=1.
Explaining it backwards makes it easier to formulate my thinking process for the query. My idea was to create a result set (subquery) by querying the table for articleIDs where main is 1, then use that result to check which distinct articleIDs from a query where main=2 are not in the results of that subquery. Basically "subtracting" all matching articleID lines.
This should basically leave all main=2 lines which have no line with the same articleID where main=1:
SELECT DISTINCT articleID
FROM img_table WHERE main = 2
AND articleID
NOT IN (SELECT articleID FROM img_table WHERE main = 1 );
I get no results when I know for a fact that there are some. There is surely something I'm doing wrong. I hope my problem is explained in a way that lets others, not only me, understand what I want :)
Given your problem description, it looks like you're actually looking for NOT EXISTS to check for rows that don't have a matching row in the subselect. Note that you do have to add the article id to the where clause in the subselect:
SELECT DISTINCT articleID
FROM img_table t1
WHERE main = 2
AND NOT EXISTS
(SELECT articleID
FROM img_table t2
WHERE main = 1
AND t2.articleID = t1.articleID);
I think your current solution should work too, but maybe you didn't show all the data. For the data you specified, the query would indeed return 0 rows, because all articleIDs have at least one main=1 and a main=2 image.
One important thing to remember: the subquery must not return any NULL value, otherwise NOT IN won't work properly. So if articleID is nullable, make sure your subselect looks like this:
(SELECT articleID FROM img_table WHERE main = 1 and articleID IS NOT NULL)
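To see the NULL pitfall in action, here is a small sketch in Python with in-memory SQLite, which follows the same three-valued logic for NOT IN as MySQL. The table name and columns come from the question; the sample rows, including the one with a NULL articleID, are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE img_table (id INTEGER PRIMARY KEY, articleID INTEGER,
                            img TEXT, main INTEGER);
    INSERT INTO img_table VALUES
        (1, 4325, 'a', 1), (2, 4325, 'b', 2),   -- article with a main image
        (3, 4327, 'c', 2),                      -- main=2 only: the row we want
        (4, NULL, 'd', 1);                      -- main=1 row with NULL articleID
""")

# Plain NOT IN: the NULL in the subquery result makes every comparison unknown.
not_in = conn.execute("""
    SELECT DISTINCT articleID FROM img_table
    WHERE main = 2
      AND articleID NOT IN (SELECT articleID FROM img_table WHERE main = 1)
""").fetchall()

# NOT IN with NULLs filtered out of the subquery.
not_in_filtered = conn.execute("""
    SELECT DISTINCT articleID FROM img_table
    WHERE main = 2
      AND articleID NOT IN (SELECT articleID FROM img_table
                            WHERE main = 1 AND articleID IS NOT NULL)
""").fetchall()

# NOT EXISTS with a correlated subquery is not affected by the NULL row.
not_exists = conn.execute("""
    SELECT DISTINCT articleID FROM img_table t1
    WHERE main = 2
      AND NOT EXISTS (SELECT 1 FROM img_table t2
                      WHERE t2.main = 1 AND t2.articleID = t1.articleID)
""").fetchall()

print(not_in, not_in_filtered, not_exists)
```

With this data the plain NOT IN query returns nothing, while the filtered NOT IN and the NOT EXISTS versions both return article 4327.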
I didn't find any issue in your query. Please add some sample data where an articleID has only main=2 rows. In the data you posted, every articleID has both main=1 and main=2 rows, which is why you are not getting any results.

MySQL counting number of max groups

I asked a similar question earlier today, but I've run into another issue that I need assistance with.
I have a logging system that scans a server and catalogs every user that's online at that given moment. Here is what my table looks like:
-----------------
| ab_logs |
-----------------
| id |
| scan_id |
| found_user |
-----------------
id is an autoincrementing primary key. Has no real value other than that.
scan_id is an integer that is incremented after each successful scan of all users. It's there so I can separate results from different scans.
found_user. Stores which user was found online during the scan.
The above will generate a table that could look like this:
id | scan_id | found_user
----------------------------
1 | 1 | Nick
2 | 2 | Nick
3 | 2 | John
4 | 3 | John
So on the first scan the system found only Nick online. On the 2nd it found both Nick and John. On the 3rd only John was still online.
My problem is that I want to get the total number of unique users that have connected to the server up to the time of each scan. In other words, I want the cumulative number of distinct users seen at each scan. Think counter.
From the example above, the result I want from the sql is:
1
2
2
EDIT:
This is what I have tried so far, but it's wrong:
SELECT COUNT(DISTINCT(found_user)) FROM ab_logs WHERE DATE(timestamp) = CURDATE() GROUP BY scan_id
What I tried returns this:
1
2
1
The code below should give you the results you are looking for
select s.scan_id, count(*) from
(select distinct
t.scan_id
,t1.found_user
from
tblScans t
inner join tblScans t1 on t.scan_id >= t1.scan_id) s
group by
s.scan_id;
It assumes the names are unique, and it counts the users from the current scan and every previous scan.
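For anyone who wants to check the running-count idea against the sample data from the question, here is a sketch in Python with in-memory SQLite standing in for MySQL. It uses the ab_logs table from the question and a slight variation of the self-join: COUNT(DISTINCT ...) over all rows up to each scan, instead of a DISTINCT subquery followed by COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ab_logs (id INTEGER PRIMARY KEY, scan_id INTEGER, found_user TEXT);
    INSERT INTO ab_logs VALUES
        (1, 1, 'Nick'), (2, 2, 'Nick'), (3, 2, 'John'), (4, 3, 'John');
""")

# For each scan, join to every row from that scan or an earlier one,
# then count the distinct users seen so far.
rows = conn.execute("""
    SELECT s.scan_id, COUNT(DISTINCT prev.found_user) AS users_so_far
    FROM (SELECT DISTINCT scan_id FROM ab_logs) AS s
    JOIN ab_logs AS prev ON prev.scan_id <= s.scan_id
    GROUP BY s.scan_id
    ORDER BY s.scan_id
""").fetchall()

print(rows)  # [(1, 1), (2, 2), (3, 2)]
```

The self-join grows quickly with the number of scans, so on a large log table this is worth measuring before relying on it.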
Try with group by clause:
SELECT scan_id, count(*)
FROM mytable
GROUP BY scan_id

How to SELECT row B only if row A doesn't exist on GROUP BY

I'm facing the following situation and have not found a good solution to this problem. I am optimizing an API, so I am looking for the fastest possible solution.
The following description is not exactly what I am doing, but I think it represents the problem well.
Let's say I have a table of products:
+----+----------+
| id | name |
+----+----------+
| 1 | product1 |
| 2 | product2 |
+----+----------+
And I have a table of attachments to each product, separate by language:
+----+----------+------------+-----------------------+
| id | language | product_id | attachment_url |
+----+----------+------------+-----------------------+
| 1 | bb | 1 | image1_bb.jpg |
| 1 | en | 1 | image1_en.jpg |
| 1 | pt | 1 | image1_pt.jpg |
| 2 | bb | 1 | image2_bb.jpg |
| 2 | pt | 1 | image2_pt.jpg |
+----+----------+------------+-----------------------+
What I intend to do is to get the correct attachment according to the language selected in the request. As you can see above, I can have several attachments for each product. We use Babel (bb) as a generic language, so whenever I don't have an attachment in the right language, I should get the Babel version. It is also important to note that the primary key of the attachments table is a composite of id + language.
So, supposing I try to get all the data in pt, my first option to create a SQL query was:
SELECT p.id, p.name,
GROUP_CONCAT( '{',a.id,',',a.attachment_url, '}' ) as attachments_list
FROM products p
LEFT JOIN attachments a
ON (a.product_id=p.id AND (a.language='pt' OR a.language='bb'))
The problem is that, with this query I always get the bb data and I only want to get it when there is no attachment on the right language.
I already tried to do a subquery changing attachments for:
(SELECT * FROM attachments GROUP BY id ORDER BY id ASC, language DESC)
but it doubles the time of the request.
I also tried using DISTINCT inside the GROUP_CONCAT, but it only works if the whole result of each row is equal, so it does not work for me.
Does anyone know any other solution that I can apply directly in the query?
EDIT:
Combining the answers of @Vulcronos and @Barmar made the final solution at least 2x faster than the one I first suggested.
Just to add some context for anybody else who is looking for it: I am using Phalcon. Because of that, I had a lot of trouble putting the pieces together, as Phalcon PHQL does not support subqueries, nor a lot of the other things I had to use.
For my scenario, where I had to deliver approximately 1.2MB of JSON content with more than 2100 objects, using custom queries made the total request time up to 3x faster than Phalcon's native relation management methods (hasMany(), hasManyToMany(), etc.) and 10x faster than my original solution (which made heavy use of the find() method).
Try doing two joins instead of one:
SELECT p.id, p.name,
GROUP_CONCAT( '{',COALESCE(a.id, b.id),',',COALESCE(a.attachment_url, b.attachment_url), '}' ) as attachments_list
FROM products p
LEFT JOIN attachments a
ON (a.product_id=p.id AND a.language='pt')
LEFT JOIN attachments b
ON (b.product_id=p.id AND b.language='bb')
and then using COALESCE to return b instead of a if a doesn't exist. You can also do it with a subselect if the above doesn't work.
OR conditions tend to make queries slow, because it's hard to optimize them with indexes. Try joining separately using the two different languages.
SELECT p.id, p.name,
IFNULL(apt.attachment_url, abb.attachment_url) AS attachment_url
FROM products AS p
JOIN attachments AS abb ON abb.product_id = p.id
LEFT JOIN attachments AS apt ON apt.product_id = p.id AND apt.language = 'pt'
WHERE abb.language = 'bb'
This assumes that all products have a bb attachment, while pt is optional.
I left out the join of Product, because it's not relevant for this problem. It's only needed to include the product name in the resultset.
SELECT a.product_id, a.id, a.attachment_url FROM attachments a
WHERE a.language = ?
OR (a.language = 'bb'
AND NOT EXISTS
(SELECT * FROM attachments
WHERE language = ?
AND id = a.id
AND product_id = a.product_id));
Notes: problems like this usually have many possible solutions. This is not necessarily the most efficient one.
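The NOT EXISTS version above can be tried against the sample attachments from the question. The sketch below uses Python with in-memory SQLite standing in for MySQL; the attachments_for helper is a hypothetical name, and an ORDER BY was added only to make the output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attachments (
        id INTEGER, language TEXT, product_id INTEGER, attachment_url TEXT,
        PRIMARY KEY (id, language)
    );
    INSERT INTO attachments VALUES
        (1, 'bb', 1, 'image1_bb.jpg'), (1, 'en', 1, 'image1_en.jpg'),
        (1, 'pt', 1, 'image1_pt.jpg'), (2, 'bb', 1, 'image2_bb.jpg'),
        (2, 'pt', 1, 'image2_pt.jpg');
""")

# Take the row in the requested language; take the 'bb' row only when
# the same attachment has no row in the requested language.
QUERY = """
    SELECT a.id, a.attachment_url FROM attachments a
    WHERE a.language = ?
       OR (a.language = 'bb'
           AND NOT EXISTS (SELECT 1 FROM attachments
                           WHERE language = ?
                             AND id = a.id
                             AND product_id = a.product_id))
    ORDER BY a.id
"""

def attachments_for(language):
    """Hypothetical helper: resolve attachments with fallback to 'bb'."""
    return conn.execute(QUERY, (language, language)).fetchall()

print(attachments_for("pt"))
print(attachments_for("en"))
```

For pt both attachments come back in Portuguese. For en, attachment 1 comes back in English and attachment 2 falls back to its bb version.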

SQL statement to return elements from a column only if no elements from a different column match

Sorry for the confusing question, I will try to clarify.
I have an SQL database (that I did not create) that I would like to write a query for. I know very little about SQL, so it is hard for me to even know what to search for to see if this question has already been asked. Sorry if it has. It should be an easy solution for those in the know.
The query I need is for a search I would like to perform on an existing data management system. I want to return all the documents that a given user has NOT signed off on, as indicated by rows in a signoffs_table. The data is stored roughly as follows (this is a simplification of the actual schema and hides several LEFT JOINs and columns):
signoffs_table:
| id | user_id | document_id | signers_list |
The naive solution I had was to do something like the following:
SELECT document_id from signoffs_table WHERE (user_id <> $BobsID) AND signers_list LIKE "%Bob%";
This works if ONLY Bob signs the document. The problem is that if Bob and Mary have signed the document then the table looks like this:
signoffs_table:
-----------------------------------------------
| id | user_id | document_id | signers_list |
-----------------------------------------------
| 1 | 10 | 100 | "Bob,Mary,Jim" |
| 2 | 20 | 100 | "Bob,Mary,Jim" |
-----------------------------------------------
(Assume Bob's ID = 10 and Mary's ID = 20.)
When I run the query, I get back document_id 100 (from row #2), because that row lists Bob as a signer but belongs to a different user.
Is what I am trying to do possible with the given database structure? I can provide more details if needed. I am not sure how much detail is needed.
I guess this query is what you mean:
SELECT document_id FROM signoffs_table AS t1
WHERE signers_list LIKE "%Bob%"
AND NOT EXISTS (
SELECT 1 FROM signoffs_table AS t2
WHERE (t2.user_id = $BobsID) AND t2.document_id = t1.document_id )
I believe your design is incorrect. You have a many-to-many relationship between documents and signers. You should have a junction table, something like:
ID DocumentID SignerID
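A rough sketch of how the junction-table design could answer the original question, in Python with in-memory SQLite standing in for the real database. Everything here is hypothetical: document_signers is the suggested junction table replacing signers_list, signoffs is a cut-down stand-in for signoffs_table, Jim's ID of 30 is invented, and the pending_documents helper is just for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical junction table: one row per (document, required signer).
    CREATE TABLE document_signers (id INTEGER PRIMARY KEY,
                                   document_id INTEGER, signer_id INTEGER);
    -- Simplified stand-in for signoffs_table: one row per completed sign-off.
    CREATE TABLE signoffs (id INTEGER PRIMARY KEY,
                           user_id INTEGER, document_id INTEGER);

    -- Bob = 10, Mary = 20 as in the question; Jim = 30 is an assumed ID.
    INSERT INTO document_signers (document_id, signer_id) VALUES
        (100, 10), (100, 20), (100, 30), (101, 10), (101, 20);
    INSERT INTO signoffs (user_id, document_id) VALUES
        (10, 100), (20, 100), (20, 101);
""")

def pending_documents(user_id):
    """Documents the user is expected to sign but has no sign-off row for."""
    rows = conn.execute("""
        SELECT ds.document_id
        FROM document_signers ds
        WHERE ds.signer_id = ?
          AND NOT EXISTS (SELECT 1 FROM signoffs s
                          WHERE s.user_id = ds.signer_id
                            AND s.document_id = ds.document_id)
        ORDER BY ds.document_id
    """, (user_id,)).fetchall()
    return [document_id for (document_id,) in rows]

print(pending_documents(10))  # Bob
print(pending_documents(30))  # Jim
```

Matching on IDs instead of LIKE "%Bob%" also avoids false hits when one signer's name is a substring of another's.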

Mysql returning multiple comma separated records

I'm trying to return a list of discussions and their attached queues (ids and names).
So far I have the following:
SELECT a.id as discussion_id,c.queue_id,e.queue_name
FROM support_discussions AS a
JOIN (
SELECT b.queue_id,b.discussion_id
FROM support_queues_discussions AS b
) AS c ON a.id=c.discussion_id
JOIN (
SELECT d.id,d.name AS queue_name
FROM support_queues AS d
) AS e ON c.queue_id=e.id
This returns the following (as expected):
discussion_id | queue_id | queue_name
1 | 1 | Queue name A
1 | 2 | Queue name B
What I'd really like to do is to get it to return each discussion as one line, along with separate columns for the queue id and the queue name:
discussion_id | queue_id | queue_name
1 | 1,2 | Queue name A,Queue name B
Any thoughts on how this can be done in an efficient manner?
There is a GROUP_CONCAT function in MySQL which does exactly what you want.
Did you think about what will happen if a queue name contains a comma? Maybe you should rethink your solution, because what you described sounds way too dodgy.
You can use GROUP_CONCAT.
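A minimal sketch of what that could look like, using the table names from the question. It runs in Python against in-memory SQLite, whose GROUP_CONCAT behaves similarly for this simple case. The nested subselects from the original query are replaced by plain joins, and the sample rows are the ones implied by the question's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE support_discussions (id INTEGER PRIMARY KEY);
    CREATE TABLE support_queues (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE support_queues_discussions (queue_id INTEGER, discussion_id INTEGER);
    INSERT INTO support_discussions VALUES (1);
    INSERT INTO support_queues VALUES (1, 'Queue name A'), (2, 'Queue name B');
    INSERT INTO support_queues_discussions VALUES (1, 1), (2, 1);
""")

# One output row per discussion; queue ids and names are collapsed
# into comma-separated strings.
rows = conn.execute("""
    SELECT d.id AS discussion_id,
           GROUP_CONCAT(q.id) AS queue_ids,
           GROUP_CONCAT(q.name) AS queue_names
    FROM support_discussions AS d
    JOIN support_queues_discussions AS qd ON qd.discussion_id = d.id
    JOIN support_queues AS q ON q.id = qd.queue_id
    GROUP BY d.id
""").fetchall()

print(rows)
```

In MySQL, GROUP_CONCAT also accepts ORDER BY and SEPARATOR clauses, so the ids and names can be kept in the same order and a delimiter that cannot appear in a queue name can be chosen, which addresses the comma concern raised above.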