MySQL count all latest versions - mysql

I had a question up yesterday, but unfortunately I didn't explain myself well enough; it was one of those end-of-the-day things.
Anyway I have a table called documents...
+----+--------------------------------------+-----------+---------+---------+
| id | document_guid | title | version | payload |
+----+--------------------------------------+-----------+---------+---------+
| 1 | 0D2753BE-583B-42CE-B0DA-1FD0171D95C0 | animation | 1 | {} |
| 2 | 0D2753BE-583B-42CE-B0DA-1FD0171D95C0 | animation | 2 | {} |
| 3 | 1C2A1131-0261-4D58-81AA-EFAB5285B282 | formation | 1 | {} |
| 4 | 1E17403F-C590-4CE4-9E79-E1B7C98F97F1 | session | 1 | {} |
| 4 | 1E17403F-C590-4CE4-9E79-E1B7C98F97F1 | session | 2 | {} |
+----+--------------------------------------+-----------+---------+---------+
As you can see, we can have multiple versions of the same document (referenced by document_guid). What I need is a count of all documents in the table, excluding obsolete versions. For example, if the document 1E17403F-C590-4CE4-9E79-E1B7C98F97F1 has two versions, as in the example above, it should only count as one document in the overall total.
I really hope this makes more sense than my last question.
The main problem is that I also need a similar query that returns all the latest versions, rather than just the count.

A useful query would probably look like:
-- select the maximum version (and other information, per group)
-- can also add a 'count(1) as version_count' if required
select max(version) as latest_version, title, document_guid
from documents
-- from each group, as divided up by the same guid *see note 1
group by document_guid, title
This query returns the latest version number; there is always exactly one latest version per document.
Note 1: The title, whose presence in this table may be a break in normalization, needs to be part of the GROUP BY for it to be included in the result columns; if it is not needed, it can be removed.
If the title is a required field that can change across versions then this needs to be written differently - first find the "latest version" and then join it back with the appropriate rows. An example:
select t.latest_version, d.title, d.document_guid
from documents d
join (
select max(version) as latest_version, document_guid
from documents
group by document_guid
) t
on t.document_guid = d.document_guid and t.latest_version = d.version
And of course this assumes a key of (document_guid, version).
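As a minimal sketch of the join-back query above, the following runs it against an in-memory SQLite database. SQLite stands in for MySQL here, and the shortened guid values and the changed title "animation v2" are hypothetical sample data, not rows from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER, document_guid TEXT, title TEXT, version INTEGER)"
)
# Hypothetical sample rows; guid-a changes its title between versions.
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        (1, "guid-a", "animation", 1),
        (2, "guid-a", "animation v2", 2),
        (3, "guid-b", "formation", 1),
        (4, "guid-c", "session", 1),
        (5, "guid-c", "session", 2),
    ],
)

# Find the highest version per guid, then join back to fetch that row's title.
latest_rows = conn.execute(
    """
    SELECT t.latest_version, d.title, d.document_guid
    FROM documents d
    JOIN (
        SELECT MAX(version) AS latest_version, document_guid
        FROM documents
        GROUP BY document_guid
    ) t ON t.document_guid = d.document_guid AND t.latest_version = d.version
    ORDER BY d.document_guid
    """
).fetchall()

for row in latest_rows:
    print(row)
```

Because the title comes from the joined row rather than from the GROUP BY, the result carries the title of the latest version even when it differs from earlier versions.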

To count distinct document_guids:
select count(distinct document_guid) from documents
To return the latest version of each document, you can either do a GROUP BY (as in user2864740's answer), or a NOT EXISTS:
select * from documents d1
where not exists (select 1 from documents d2
where d2.document_guid = d1.document_guid
and d2.version > d1.version)
I.e. return a row if no other row with the same document_guid has a higher version number.
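A small sketch of both queries, run against in-memory SQLite as a stand-in for MySQL. The shortened guid values are hypothetical sample data.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, document_guid TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "guid-a", 1), (2, "guid-a", 2), (3, "guid-b", 1), (4, "guid-c", 1), (5, "guid-c", 2)],
)

# Count each document once, however many versions it has.
document_count = conn.execute(
    "SELECT COUNT(DISTINCT document_guid) FROM documents"
).fetchone()[0]

# Keep a row only when no row with the same guid has a higher version.
latest_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT d1.id FROM documents d1
        WHERE NOT EXISTS (
            SELECT 1 FROM documents d2
            WHERE d2.document_guid = d1.document_guid
              AND d2.version > d1.version
        )
        ORDER BY d1.id
        """
    )
]

print(document_count, latest_ids)
```

The number of rows returned by the NOT EXISTS query matches the distinct count, provided (document_guid, version) is unique.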

Related

Joined query missing records

I have a database that contains a people table and another table with names for those people. For each person, there is at least one record in the names table. One of those records is set as the 'person_default_name_id' for that person, and the others are variations of that name in different languages. The idea is that the user who looks up the table will have a preferred language set (e.g. English, Spanish, Russian) and a preferred script, which is based on their preferred language (e.g. if their preferred language is English or Spanish, the script would be "Latin", while if the preferred language is Russian, the script would be "Cyrillic"). It's a little complex: I want to display a list of names with only one name per person, and that one name should be chosen according to the best fit for the user's chosen language and script.
The code below is what I'm trying:
SELECT
people.person_id,
names.name
FROM
`people`
LEFT JOIN
`names` ON names.person_id=people.person_id
LEFT JOIN
`languages` ON names.language_id = languages.language_id
LEFT JOIN
`language_scripts` ON languages.language_id = language_scripts.language_id
WHERE
(
/* 1st preference - display the default name for the person IF the default name's language writing system matches the user's writing system */
(people.person_default_name_id=names.name_id AND language_scripts.script_id = :user_script_id)
OR
/* 2nd preference - display the alternative name in the user's chosen language if an alternative name exists in that language */
names.language_id = :user_language_id
OR
/* 3rd preference - display the alternative name in the user's chosen writing system if an alternative name exists in that writing system */
language_scripts.script_id = :user_script_id
)
GROUP BY
people.person_id
ORDER BY
names.name ASC
Example data is below:
Table: people
person_id | person_default_name_id
------------------------------------
1 | 2
Table: names
name_id | name | person_id | language_id
--------------------------------------------
1 | George | 1 | 1
2 | Jorge | 1 | 2
3 | Джордж | 1 | 3
Table: languages
language_id | language
------------------------
1 | English
2 | Spanish
3 | Russian
Table: language_scripts
language_script_id | language_id | script_id
----------------------------------------------
1 | 1 | 1
2 | 2 | 1
3 | 3 | 2
Table: scripts
script_id | script
----------------------
1 | Latin
2 | Cyrillic
I'm finding that some of the expected records are not coming through. I'm guessing that there are improvements I could make to my query, but my skills are not quite advanced enough to know the best path. Can anyone see what I'm doing wrong?
I would suggest you put your where clause conditions in your select statement and return a "score" for each record. Remove it entirely from your where clause and it may give you insight into why you have missing records if they are returned with a 0 score.
CASE WHEN condition THEN 5
WHEN condition THEN 4
-- etc.
ELSE 0
END AS score
Once you have your results scored, you can order by your score descending and take the first one per person. Or add additional outer queries to only return the rows having the max score per person.
Apologies for answering from my phone.
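The scoring idea can be sketched roughly as follows, using in-memory SQLite in place of MySQL and the example data from the question. The weights 3, 2 and 1 and the helper name best_names are assumptions made for illustration; the answer itself does not fix particular values.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER, person_default_name_id INTEGER);
CREATE TABLE names (name_id INTEGER, name TEXT, person_id INTEGER, language_id INTEGER);
CREATE TABLE language_scripts (language_script_id INTEGER, language_id INTEGER, script_id INTEGER);
INSERT INTO people VALUES (1, 2);
INSERT INTO names VALUES (1, 'George', 1, 1), (2, 'Jorge', 1, 2), (3, 'Джордж', 1, 3);
INSERT INTO language_scripts VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2);
""")

# The WHERE conditions move into a CASE that scores every candidate name.
# The weights 3/2/1 are hypothetical; only their order matters.
SCORED = """
SELECT p.person_id, n.name,
       CASE
           WHEN p.person_default_name_id = n.name_id AND ls.script_id = :script THEN 3
           WHEN n.language_id = :lang THEN 2
           WHEN ls.script_id = :script THEN 1
           ELSE 0
       END AS score
FROM people p
LEFT JOIN names n ON n.person_id = p.person_id
LEFT JOIN language_scripts ls ON ls.language_id = n.language_id
ORDER BY p.person_id, score DESC
"""

def best_names(lang, script):
    """Return {person_id: (name, score)}, keeping the top-scored row per person."""
    best = {}
    for person_id, name, score in conn.execute(SCORED, {"lang": lang, "script": script}):
        best.setdefault(person_id, (name, score))  # first row per person has the highest score
    return best

print(best_names(1, 1))  # English user, Latin script
print(best_names(3, 2))  # Russian user, Cyrillic script
```

Since nothing is filtered out, a person whose names all score 0 still appears in the result, which makes missing records easier to diagnose.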

Updating to every number after column value, up to 9?

So I created a database table in MySQL that holds permission rights for permissions and commands. The command rights start with the prefix command_ in the column permission_name, and I have an extra column called allowed_ranks, which is a list of the INT rank IDs that are required, separated by a , character.
The issue is that the command permissions are meant to apply to a rank and anything higher, but I've only put one ID in allowed_ranks. Is there a way I can loop through all the rows whose permission_name starts with command_ and change the allowed_ranks values that are just one ID to every number from that ID up to 9? 9 is the maximum rank ID.
I've already written part of the query; I'm just not sure how to do the updating:
UPDATE `permission_rights` SET `allowed_ranks` = '?' WHERE `permission_name` LIKE 'command_%';
How would I update it to every number from the column's value up to 9? Let's say I had these records, just as a quick example to make sure you know what I mean.
| permission_name | allowed_ids |
----------------------------------
| command_hello | 2
| command_junk | 5
| command_delete | 8
| command_update | 1
Would become...
| permission_name | allowed_ids |
----------------------------------
| command_hello | 2,3,4,5,6,7,8,9
| command_junk | 5,6,7,8,9
| command_delete | 8,9
| command_update | 1,2,3,4,5,6,7,8,9
The better approach would be to use a number generator (some method which will produce numbers from 1 to n), but stock MySQL has no built-in sequence generator like that.
If you use MariaDB you can use seq_1_to_1000, as suggested in the answer by O.Jones.
However, your use case seems to be simpler. Since you said that the highest rank is 9, I would just use:
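update permission_rights a
-- keep the tail of the full list, starting at the current rank (length 19 - 2 * rank)
set a.allowed_ids = RIGHT('1,2,3,4,5,6,7,8,9', 19 - 2 * a.allowed_ids)
where a.permission_name like 'command_%'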
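To check the arithmetic, here is a minimal sketch run against in-memory SQLite instead of MySQL. SQLite has no RIGHT(), so substr() with a negative start position is swapped in to take the same tail of the string; the view_page row is hypothetical and only shows that non-command rows are left alone.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permission_rights (permission_name TEXT, allowed_ids TEXT)")
conn.executemany(
    "INSERT INTO permission_rights VALUES (?, ?)",
    [
        ("command_hello", "2"),
        ("command_junk", "5"),
        ("command_delete", "8"),
        ("command_update", "1"),
        ("view_page", "3"),  # hypothetical non-command row, must stay unchanged
    ],
)

# '1,2,...,9' is 17 characters long; the tail starting at rank r is 19 - 2*r long.
# substr(s, -n) is SQLite's equivalent of MySQL's RIGHT(s, n).
conn.execute(
    """
    UPDATE permission_rights
    SET allowed_ids = substr('1,2,3,4,5,6,7,8,9', -(19 - 2 * CAST(allowed_ids AS INTEGER)))
    WHERE permission_name LIKE 'command_%'
    """
)

result = dict(conn.execute("SELECT permission_name, allowed_ids FROM permission_rights"))
for name, ids in result.items():
    print(name, ids)
```

The formula relies on every rank being a single digit, so it would need rethinking if the maximum rank ever went above 9.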

mysql count votes optimization

So I'm making a file hub, nothing huge or fancy, just somewhere to store files that may be shared with others for download. It just occurred to me that the way I originally intended to count the upvotes and downvotes could be heavy on the server. The query to get the files is something along the lines of
select * from files;
This gives me an array of files that I can loop over to get the specifics of each file. With voting added, that same foreach loop would include a further query to count the votes for each file (with the file id in the where clause), like so
select * from votes where upvoted = true and file_id = ?
and I was thinking of using PDOStatement::rowCount to get my answer. Now every bone in my body says this is bad, very bad: if I'm getting 10,000 files, I've just run 10,000 extra queries, one per file, and I haven't even looked at the downvotes yet, which I was thinking could be handled in a similar fashion. Any optimization advice? Here is a small representation of the structure of a few tables. The upvoted and downvoted columns are of type bool, or tinyint if you will.
table: file table: user table: votes
+----+-------------+ +----+-------------+ +--------+--------+--------+--------+
| id |storedname | | id | username | |file_id | user_id| upvoted | downvoted
+----+-------------+ +----+-------------+ +--------+--------+--------+--------+
| 1 | 45tfvb.txt | | 1 | matthew | | 1 | 2 | 1 | 0
| 2 |jj7fnfddf.pdf| | 2 | mark | | 2 | 1 | 1 | 1
| .. | .. | | .. | .. | | .. | .. | .. | ..
There are two ways to do this. The better (i.e. faster) way is to write separate queries and combine the results in one variable in your programming language (PHP, Python, etc.).
SELECT
d.id as doc_id,
COUNT(v.document_id) as num_upvotes
FROM votes v
JOIN document d on d.id = v.document_id
WHERE v.upvoted IS TRUE
GROUP BY d.id;
That will return the upvote count for each upvoted document. You can do the same for your downvotes.
Then, after your select from document, loop over the documents, match the vote counts to each one by ID, and build the result into a dictionary or list.
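A rough sketch of that merge step, using in-memory SQLite in place of MySQL and the files/votes tables from the question. As a simplification, it folds the upvote and downvote counts into one grouped query with SUM instead of running two separate COUNT queries; the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (id INTEGER PRIMARY KEY, storedname TEXT);
CREATE TABLE votes (file_id INTEGER, user_id INTEGER, upvoted INTEGER, downvoted INTEGER);
INSERT INTO files VALUES (1, '45tfvb.txt'), (2, 'jj7fnfddf.pdf'), (3, 'unvoted.txt');
INSERT INTO votes VALUES (1, 2, 1, 0), (1, 1, 1, 0), (2, 1, 0, 1);
""")

# One grouped query for all files instead of one query per file.
vote_totals = {
    file_id: (ups, downs)
    for file_id, ups, downs in conn.execute(
        "SELECT file_id, SUM(upvoted), SUM(downvoted) FROM votes GROUP BY file_id"
    )
}

# Merge the totals into the file list by ID; files without votes default to 0.
files = [
    {
        "id": file_id,
        "storedname": storedname,
        "upvotes": vote_totals.get(file_id, (0, 0))[0],
        "downvotes": vote_totals.get(file_id, (0, 0))[1],
    }
    for file_id, storedname in conn.execute("SELECT id, storedname FROM files ORDER BY id")
]

for f in files:
    print(f)
```

This issues two queries in total regardless of how many files there are, which avoids the per-file query the question is worried about.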
The second way, which can take a lot longer at runtime if you have a lot of records in the table (it's less efficient, but easier to write), is to add subquery selects to your select statement, like this:
SELECT
logical_name ,
document.id ,
file_type ,
physical_name ,
uploader_notes ,
views ,
downloads ,
user.name ,
category.name AS category_name,
(select count(1) from votes where upvoted = true and document_id = document.id) as upvoted,
(select count(1) from votes where downvoted = true and document_id = document.id) as downvoted
FROM document
INNER JOIN category ON document.category_id = category.id
INNER JOIN user ON document.uploader_id = user.id
ORDER BY category.id
Two pieces of advice:
Avoid SELECT *, especially if you're only going to count. Replace it with something like this:
SELECT COUNT(1) AS total FROM votes WHERE upvoted = true AND file_id = ?
You may also want to create a TRIGGER that keeps a counter in the file table up to date.
I hope this is helpful to you.
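The trigger idea might look something like the sketch below, written for in-memory SQLite rather than MySQL (MySQL's trigger syntax differs slightly, e.g. it needs FOR EACH ROW). The upvotes column and the trigger name votes_after_insert are hypothetical additions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 'upvotes' is a hypothetical counter column added to the files table.
CREATE TABLE files (id INTEGER PRIMARY KEY, storedname TEXT, upvotes INTEGER NOT NULL DEFAULT 0);
CREATE TABLE votes (file_id INTEGER, user_id INTEGER, upvoted INTEGER, downvoted INTEGER);

-- Bump the counter whenever a vote row is inserted (upvoted is 0 or 1).
CREATE TRIGGER votes_after_insert AFTER INSERT ON votes
BEGIN
    UPDATE files SET upvotes = upvotes + NEW.upvoted WHERE id = NEW.file_id;
END;

INSERT INTO files (id, storedname) VALUES (1, '45tfvb.txt'), (2, 'jj7fnfddf.pdf');
INSERT INTO votes VALUES (1, 2, 1, 0), (1, 3, 1, 0), (2, 1, 0, 1);
""")

counters = dict(conn.execute("SELECT id, upvotes FROM files"))
print(counters)
```

This sketch only covers inserts; changed or deleted votes would need matching UPDATE and DELETE triggers to keep the counter accurate.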

Select a row with specific content from a GROUP BY group

I have two tables which allow a user to request songs. Of course a song can be requested by multiple users:
| Id | Song_Name |     | Requested_Id | By_IP   |
+====+===========+     +==============+=========+
| 1  | song1     |     | 1            | 1.1.1.1 |
| 2  | song2     |     | 1            | 2.2.2.2 |
| 3  | song3     |     | 1            | 3.3.3.3 |
                       | 2            | 2.2.2.2 |
In order to prevent one user from requesting a song multiple times (abuse), I need to check whether a given song has already been requested by the user who is trying to request it again. So I'm doing a LEFT JOIN between the first and the second table and a GROUP BY on the row's Id, which returns one row for each song.
PROBLEM: GROUP BY returns unpredictable values for fields which are not grouped. That is known. But how can I make sure that SELECT returns the row containing a specific IP, in case this IP exists in the group? If the IP does not exist, any other row of the group can be returned by SELECT.
Thanks a lot!
UPDATE: I need to show the song in a list, independent of how many users (or even none at all) have requested it. So SELECT definitely needs to return one row for every song. But in case that for example the user with IP 3.3.3.3 is trying to request song1, (which was already requested by him) I expect the query to return this:
| Id | Song_Name | By_IP |
+====+===========+=========+
| 1 | song1 | 3.3.3.3 | (3.3.3.3 in case it exists, otherwise anything else)
| 2 | song2 | 2.2.2.2 |
I also need the grouping with the other requests (IPs), because I need to get the whole number of requests per song as well. Therefore I use Count().
WORKAROUND: Since it seems to be pretty complicated to do what I need (if it is possible at all), I'm now using a workaround: the GROUP_CONCAT() aggregate function. This gives me all IPs of the group separated by ",", so I can check whether the one I'm looking for is already there. The only drawback is that the (default) maximum length of the returned string is 1024. That means I can't handle a large number of users, but for now it should be fine.
It is still unclear what you want. There is no request date in the table; without a date, how do you know when a particular song was requested?
Select Songs.id, Songs.Song_name, requested_songs.By_IP
from Songs
INNER JOIN requested_songs
on Songs.id = requested_songs.Requested_id
Group BY requested_songs.Requested_id
order by requested_songs.Requested_id ASC
;
Are you sure you're not overthinking your solution a bit? If all you want to do is eliminate duplicates, just put a UNIQUE index on your second table on both columns.
If you're trying to do something more complicated with that GROUP BY, please provide a sample resultset, as Quassnoi requested.
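As a minimal sketch of the UNIQUE index suggestion, the following uses in-memory SQLite as a stand-in for MySQL. The table name requested_songs is taken from the query in another answer; the index name one_request_per_ip and the helper request_song are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requested_songs (Requested_Id INTEGER, By_IP TEXT)")
# The unique index makes the database itself reject a second request
# for the same song from the same IP.
conn.execute(
    "CREATE UNIQUE INDEX one_request_per_ip ON requested_songs (Requested_Id, By_IP)"
)

def request_song(song_id, ip):
    """Return True if the request was stored, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO requested_songs VALUES (?, ?)", (song_id, ip))
        return True
    except sqlite3.IntegrityError:
        return False

first = request_song(1, "3.3.3.3")
repeat = request_song(1, "3.3.3.3")
other_song = request_song(2, "3.3.3.3")
print(first, repeat, other_song)
```

With the constraint in place, the duplicate check no longer depends on which row a GROUP BY happens to return.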
Just group by Song_Name and By_IP, like this:
SELECT * FROM `songs` JOIN users GROUP BY song_name, ip
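For the requirement stated in the question (one row per song, the total number of requests, and whether a given IP is among the requesters), one possible approach is conditional aggregation, which none of the answers above use. The sketch below is an illustration under assumptions: it runs on in-memory SQLite rather than MySQL, the table names songs and requested_songs follow the earlier answer, and the column aliases and the helper song_list are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs (Id INTEGER PRIMARY KEY, Song_Name TEXT);
CREATE TABLE requested_songs (Requested_Id INTEGER, By_IP TEXT);
INSERT INTO songs VALUES (1, 'song1'), (2, 'song2'), (3, 'song3');
INSERT INTO requested_songs VALUES
    (1, '1.1.1.1'), (1, '2.2.2.2'), (1, '3.3.3.3'), (2, '2.2.2.2');
""")

def song_list(ip):
    """One row per song: id, name, total requests, and 1/0 for 'requested by this IP'."""
    return conn.execute(
        """
        SELECT s.Id, s.Song_Name,
               COUNT(r.By_IP) AS request_count,
               -- the comparison yields 1 or 0 per row; MAX is 1 if any row matched
               COALESCE(MAX(r.By_IP = ?), 0) AS requested_by_ip
        FROM songs s
        LEFT JOIN requested_songs r ON r.Requested_Id = s.Id
        GROUP BY s.Id, s.Song_Name
        ORDER BY s.Id
        """,
        (ip,),
    ).fetchall()

rows = song_list("3.3.3.3")
for row in rows:
    print(row)
```

Every selected column is either grouped or aggregated, so the result does not depend on which row of the group the database picks, and it avoids the length limit of the GROUP_CONCAT() workaround.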

Max occurrences of a given value in a table

I have a table (pretty big one) with lots of columns, two of them being "post" and "user".
For a given "post", I want to know which "user" posted the most.
I was first thinking about getting all the entries WHERE (post='wanted_post') and then using a PHP hack to find which "user" value occurs the most, but given the large size of my table and my poor knowledge of MySQL's subtleties, I am looking for a pure-MySQL way to get this value (basically, the "user" id that posted the most on a given "post").
Is it possible? Or should I fall back on the hybrid SQL-PHP solution?
Thanks,
Cystack
It sounds like this is what you want... am I missing something?
SELECT user
FROM myTable
WHERE post='wanted_post'
GROUP BY user
ORDER BY COUNT(*) DESC
LIMIT 1;
EDIT: Explanation of what this query does:
Hopefully the first three lines make sense to anyone familiar with SQL. It's the last three lines that do the fun stuff.
GROUP BY user -- This collapses rows with identical values in the user column. If this was the last line in the query, we might expect output something like this:
+-------+
| user |
+-------+
| bob |
| alice |
| joe |
ORDER BY COUNT(*) DESC -- COUNT(*) is an aggregate function that works along with the previous GROUP BY clause. It tallies all of the rows that are "collapsed" by the GROUP BY for each user. It might be easier to understand what it's doing with a slightly modified statement and its potential output:
SELECT user,COUNT(*)
FROM myTable
WHERE post='wanted_post'
GROUP BY user;
+-------+-------+
| user | count |
+-------+-------+
| bob | 3 |
| alice | 1 |
| joe | 8 |
This is showing the number of posts per user.
However, it's not strictly necessary to actually output the value of an aggregate function in this case--we can just use it for the ordering, and never actually output the data. (Of course if you want to know how many posts your top-poster posted, maybe you do want to include it in your output, as well.)
The DESC keyword tells the database to sort in descending order, rather than the default of ascending order.
Naturally, the sorted output would look something like this (assuming we leave the COUNT(*) in the SELECT list):
+-------+-------+
| user | count |
+-------+-------+
| joe | 8 |
| bob | 3 |
| alice | 1 |
LIMIT 1 -- This is probably the easiest to understand, as it just limits how many rows are returned. Since we're sorting the list from most-posts to fewest-posts, and we only want the top poster, we just need the first result. If you wanted the top 3 posters, you might instead use LIMIT 3.
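The whole query can be tried out with a short sketch like the one below, using in-memory SQLite in place of MySQL. The rows mirror the example counts above (bob 3, alice 1, joe 8); the extra other_post rows are hypothetical and only show that the WHERE clause filters them out.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (post TEXT, user TEXT)")
rows = (
    [("wanted_post", "bob")] * 3
    + [("wanted_post", "alice")]
    + [("wanted_post", "joe")] * 8
    + [("other_post", "alice")] * 10  # hypothetical rows for a different post
)
conn.executemany("INSERT INTO myTable VALUES (?, ?)", rows)

# Group the matching rows by user, sort by group size, keep the largest group.
top = conn.execute(
    """
    SELECT user, COUNT(*) AS post_count
    FROM myTable
    WHERE post = ?
    GROUP BY user
    ORDER BY post_count DESC
    LIMIT 1
    """,
    ("wanted_post",),
).fetchone()

print(top)
```

If two users tie for the highest count, LIMIT 1 returns only one of them, and which one is not guaranteed unless a tie-breaking column is added to the ORDER BY.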