I'm struggling with MySQL and could really use some help. I don't know what this kind of query is called, and all my attempts at using combinations of DISTINCT and GROUP BY are just not working out.
I have a table of server monitoring data with these columns:
nStatusNumber
Bandwidth
Load
Users
ServerNumber
DiskFree
MemFree
TimeStamp
**nStatusNumber** - A unique number increasing for each entry
**ServerNumber** - A unique number for each server
For the top of my dashboard for this, I need to display the most recent report for each unique server.
// How many servers are we monitoring?
$nNumServers = mysql_num_rows(mysql_query("SELECT DISTINCT ServerNumber FROM server_status;"));
// Get our list of servers
$strQuery = "SELECT * FROM server_status ORDER BY nStatusNumber DESC LIMIT ".$nNumServers.";";
I then loop through the results until I've seen $nNumServers rows. This worked at first, until servers started going down and up and the report order got jumbled.
Say there are 20 servers; the most recent 20 rows aren't necessarily one from each server.
I'm trying to figure this out in a hurry and failing. I'd appreciate any guidance on what's probably an embarrassingly easy problem I just can't see the answer to.
Thanks!
PS - Here's an example query that I've been trying, showing the problem I'm having. Check the "nStatusNumber" field: these should show only the most recent result for each server - http://pastebin.com/raw.php?i=ngXLRhd6
PPS - Selecting max(nStatusNumber) doesn't give accurate results. I don't want some average/sum/median figure; I need the most recent ACTUAL figures reported by each server. Here are more example results for the queries:
http://pastebin.com/raw.php?i=eyuPD7vj
For your purpose you need to find the row matching each nServerNumber's latest TimeStamp. This is not as simple as selecting MAX(TimeStamp), because you also need the rest of the row that corresponds to it.
Although I am not an expert in SQL, you can try this and see if it works.
SELECT A.nServerNumber, A.nStatusNumber, A.nVNStatsBandwidth, A.fLoad, A.nUsers,
A.nUsersPaid, A.nFreeDisk, A.nTotalDisk, A.nFreeMemory,
A.nTotalMemory, A.TimeStamp
FROM server_status A
INNER JOIN
(
SELECT nServerNumber, MAX(TimeStamp) as `TimeStamp`
FROM server_status
GROUP BY nServerNumber
) B
ON A.nServerNumber = B.nServerNumber
AND A.TimeStamp = B.TimeStamp
ORDER BY A.nServerNumber ASC;
This query will give you all the servers with their latest info. So if you want the total number of servers, just run mysql_num_rows(...) on this result, and if you want the data, just iterate through the same result (no need to fire two separate SQL queries).
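If it helps to experiment locally, here is a minimal sketch of that "latest row per group" join using SQLite and made-up data (only a few of the question's columns are kept; the table name matches the question):

```python
import sqlite3

# Minimal sketch of the "latest row per server" join, in SQLite with
# invented data. Only a subset of the question's columns is kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE server_status (
    nStatusNumber INTEGER PRIMARY KEY,  -- unique, increasing per entry
    nServerNumber INTEGER,              -- unique per server
    nUsers        INTEGER,
    TimeStamp     TEXT
);
INSERT INTO server_status VALUES
    (1, 1, 10, '2014-01-01 10:00'),
    (2, 2, 20, '2014-01-01 10:05'),
    (3, 1, 12, '2014-01-01 10:10'),  -- newer report for server 1
    (4, 2, 18, '2014-01-01 10:15');  -- newer report for server 2
""")

# Join each server's MAX(TimeStamp) back to the full row so we get the
# actual latest readings, not an aggregate mix of different rows.
rows = conn.execute("""
    SELECT A.nServerNumber, A.nStatusNumber, A.nUsers, A.TimeStamp
    FROM server_status A
    INNER JOIN (
        SELECT nServerNumber, MAX(TimeStamp) AS TimeStamp
        FROM server_status
        GROUP BY nServerNumber
    ) B
      ON A.nServerNumber = B.nServerNumber
     AND A.TimeStamp = B.TimeStamp
    ORDER BY A.nServerNumber ASC
""").fetchall()
print(rows)
# [(1, 3, 12, '2014-01-01 10:10'), (2, 4, 18, '2014-01-01 10:15')]
```

Note that each row carries the real nStatusNumber and nUsers of the latest report, which is exactly what plain GROUP BY with bare columns does not guarantee.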
Try this:
SELECT MAX(nStatusNumber),
       Bandwidth,
       `Load`,
       Users,
       ServerNumber,
       DiskFree,
       MemFree,
       MAX(`TimeStamp`)
FROM your_table
GROUP BY ServerNumber;
Related
I have a database with over 100 million rows of reddit comment data in the format of:
{
author: redditauthor1,
body: example comment,
subreddit: /r/funny,
....
}
I am trying to get a list of users with their respective number of comments for all the subreddits they posted in. I am also narrowing it down by users who also posted in the subreddit I pass through as a parameter.
I have 4 indexes on this single table. The reason is that I only plan on reading from it for the time being. The indexes look like so:
CREATE INDEX idx_subreddit
ON comments(subreddit);
CREATE INDEX idx_author
ON comments(author);
CREATE INDEX idx_authsub
ON comments(author, subreddit);
CREATE INDEX idx_subauth
ON comments(subreddit, author);
I've also tried narrowing it down to just the subreddit,author index, with no improvement. I am further narrowing my search by removing [deleted] users from the rows. My query is as follows:
SELECT author, subreddit, count(*) as numcomments
from comments
WHERE author IN (SELECT author FROM comments WHERE subreddit="politics" AND author != "[deleted]")
group by author, subreddit
ORDER BY author
LIMIT 100
;
According to my EXPLAIN plan, this returns 3 million rows, which is expected of a nearly 100 GB dataset.
The query takes well over 300 seconds to run for large subreddits such as /r/politics; smaller ones with less activity run in a second or less. Is there anything I can do to improve this execution time? I've tried running the query through EverSQL, using both the query it suggested and the single subreddit,author composite index it recommended, but that actually made the runtime worse. I know there are third-party options like the PushShift API, which uses Google BigQuery, but since I'd like to work on this offline I want to do it all locally.
Lastly, I've thought of just fetching all the comments and counting them myself instead of using MySQL's COUNT(*) and GROUP BY, but even then the query takes a while to retrieve all the comments (15 million) that I'd have to process on the back end. Is there a solution to this? Something like a Redis caching layer? Partitioning? I'd like to get this query under 3 seconds if possible. Any feedback is appreciated.
Per a user's suggestion I have run an explain on this query:
SELECT x.author
, x.subreddit
, COUNT(*) numcomments
FROM comments x
JOIN
( SELECT author
FROM comments
WHERE subreddit = "politics"
AND author != "[deleted]"
) y
ON y.author = x.author
GROUP
BY x.author
, x.subreddit;
and the EXPLAIN produced this:
Move the criteria directly into the main query. By adding two SELECTs you are doing at least twice the work. Good luck.
SELECT author, subreddit, count(*) as numcomments
from comments
WHERE subreddit="politics" AND author != "[deleted]"
group by author, subreddit
LIMIT 100
;
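A runnable sketch of the flattened query, again in SQLite with toy data. One caveat worth noting: unlike the original IN-subquery version, this counts only the politics comments themselves, not each politics author's comments across all subreddits.

```python
import sqlite3

# Toy demonstration of the flattened query in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (author TEXT, subreddit TEXT);
CREATE INDEX idx_subauth ON comments(subreddit, author);
INSERT INTO comments VALUES
    ('alice', 'politics'), ('alice', 'politics'),
    ('bob',   'politics'), ('[deleted]', 'politics'),
    ('carol', 'funny');
""")

# The WHERE clause filters before grouping, so the (subreddit, author)
# index can drive the whole query.
rows = conn.execute("""
    SELECT author, subreddit, COUNT(*) AS numcomments
    FROM comments
    WHERE subreddit = 'politics' AND author != '[deleted]'
    GROUP BY author, subreddit
    ORDER BY author
    LIMIT 100
""").fetchall()
print(rows)
# [('alice', 'politics', 2), ('bob', 'politics', 1)]
```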
I've tried about a dozen different methods to solve this issue, and everything I try breaks my query. I have the following code, used to generate a loop of threads on a message board:
SELECT MB_TOPICS.*, MAX(MB_REPLIES.TIMESTAMP) AS LATEST
FROM MB_TOPICS
LEFT JOIN MB_REPLIES
ON MB_TOPICS.TOPIC_ID = MB_REPLIES.TOPIC_ID
WHERE MB_TOPICS.CATEGORY_ID='$CATEGORY'
GROUP BY MB_TOPICS.TOPIC_ID
ORDER BY MB_TOPICS.STICKY DESC, LATEST DESC, MB_TOPICS.TIMESTAMP DESC
LIMIT $start,$limit
This is basically pulling all of the topics within the category, and then via a join it is also getting a timestamp of the most recent reply (if any) from the replies table.
On the sort, I want to keep the most recently active threads at the top. Currently (after sticky Y/N) it sorts by the most recent reply and then by the timestamp when the thread was created. This is wrong because it means a new thread will appear after an old thread with replies. I've tried things like
GREATEST(LATEST, MB_TOPICS.TIMESTAMP)
or using IIF statements, CASE statements within the ORDER BY, etc., but anything I do is just breaking the query so that no results appear. I just want to make this so that whichever timestamp is most recent (last reply or topic creation), it sorts descending on that largest value. I know this must be simple but it's killing me today. Thank you!
Edit: If it's helpful information here... the 'LATEST' column will be null for threads that have no replies...
OK, I finally got it. Using the MAX() function again rather than the alias, with COALESCE to deal with the null values, combined with RiggsFolly's suggestion of pulling it out as a new column, resulted in this functioning query:
"SELECT MB_TOPICS.*, MAX(MB_REPLIES.TIMESTAMP) AS LATEST,
GREATEST(COALESCE(MAX(MB_REPLIES.TIMESTAMP),0), MB_TOPICS.TIMESTAMP) AS SORT_ORDER
FROM MB_TOPICS
LEFT JOIN MB_REPLIES ON MB_TOPICS.TOPIC_ID = MB_REPLIES.TOPIC_ID
WHERE MB_TOPICS.CATEGORY_ID='$CATEGORY'
GROUP BY MB_TOPICS.TOPIC_ID
ORDER BY MB_TOPICS.STICKY DESC, SORT_ORDER DESC
LIMIT $start,$limit";
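For anyone wanting to poke at the sort logic, here is a small SQLite sketch with invented timestamps. SQLite has no GREATEST(), so its two-argument scalar MAX() plays that role, and the latest reply comes from a derived table instead of an aggregate:

```python
import sqlite3

# Toy data: topic 1 is old but has a recent reply; topic 2 is newer
# with no replies. The sort key is the later of "latest reply" and
# "topic created".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MB_TOPICS  (TOPIC_ID INTEGER, STICKY INTEGER,
                         TIMESTAMP INTEGER, CATEGORY_ID INTEGER);
CREATE TABLE MB_REPLIES (TOPIC_ID INTEGER, TIMESTAMP INTEGER);
INSERT INTO MB_TOPICS  VALUES (1, 0, 100, 5), (2, 0, 300, 5);
INSERT INTO MB_REPLIES VALUES (1, 200);
""")

# COALESCE handles replyless threads (NULL latest reply), and the
# two-argument MAX picks whichever timestamp is more recent.
rows = conn.execute("""
    SELECT T.TOPIC_ID,
           MAX(COALESCE(L.LATEST, 0), T.TIMESTAMP) AS SORT_ORDER
    FROM MB_TOPICS T
    LEFT JOIN (SELECT TOPIC_ID, MAX(TIMESTAMP) AS LATEST
               FROM MB_REPLIES GROUP BY TOPIC_ID) L
      ON T.TOPIC_ID = L.TOPIC_ID
    WHERE T.CATEGORY_ID = 5
    ORDER BY T.STICKY DESC, SORT_ORDER DESC
""").fetchall()
print(rows)
# [(2, 300), (1, 200)] -- the new replyless topic sorts above the old one
```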
Thanks, I wouldn't have gotten there without the discussion here.
I have a table of documents named z_web_dok with 133,369 rows and a table of products named z_web_dok_art with 693,930 rows. All fields needed for querying are indexed, and the tables are linked through the auto-increment field z_web_dok.oid and the matching z_web_dok_art.oid_id.
What is the problem?
I made a simple SQL query to return old prices (by buyer, place, country). The query is:
SELECT (COALESCE(NULLIF(TNew.cijena_e,''), TNew.cijena)*(TN.kurs_iznos/1))
FROM z_web_dok_art As TNew
INNER JOIN z_web_dok As TN ON TN.oid=TNew.oid_id
WHERE (TN.Drzava='BiH' AND TNew.aid='SOME_PRODUCT_ID' AND TN.vrsta_dok='pri'
AND TN.kup_id='1047' AND TN.mag_id='5' AND TN.oid<>'151967')
ORDER BY TN.dat_zavrsena DESC
LIMIT 1
It worked for each product, so I used it as a subquery inside another query that fetches the products with their new prices, and tried to pull the old prices with the query above (the connection between the tables is TNew.aid = WArt.aid):
SELECT WArt.aid, WArt.nc, WArt.isp, COALESCE(NULLIF(WArt.cijena_e,''), WArt.cijena) As cijena, WArt.rab, WArt.vpc, WArt.mpc, C.EAN, C.Model, C.Naziv, C.JM, C.WebBiH, C.DostupnostBiH,
(
SELECT (COALESCE(NULLIF(TNew.cijena_e,''), TNew.cijena)*(TN.kurs_iznos/1))
FROM z_web_dok_art As TNew
INNER JOIN z_web_dok As TN ON TN.oid=TNew.oid_id
WHERE (TN.Drzava='BiH' AND TNew.aid=WArt.aid AND TN.vrsta_dok='pri'
AND TN.kup_id='1047' AND TN.mag_id='5' AND TN.oid<>'151967')
ORDER BY TN.dat_zavrsena DESC
LIMIT 1
) As s_cijena
FROM
z_web_dok_art As WArt
LEFT JOIN Cjenovnik As C ON C.ID=WArt.aid
WHERE WArt.oid_id='151967'
ORDER BY CASE WHEN WArt.isp='0' THEN 1 ELSE 0 END, WArt.id_dok_art
It worked for a long period, but today we discovered it returned an old price six times too high for two products (on an order that has 19 products).
So I queried TN.oid with the same query to see what it pulls from the database: it was an oid sharing only (kup_id, vrsta_dok), with all the other fields different, even though the query asked for Drzava='BiH' AND TN.mag_id='5'.
The other strange thing: if I execute the subquery alone with that product (or several of them), it returns the RIGHT results, while the full query returns a mix of right and wrong ones.
Solving it is not the problem!
I can work around it, but I want to know why this query didn't work. What is wrong with it? Has anyone else had this problem?
This is my first bad experience with queries like this...
My guess is that it happens because the subquery joins another table inside the outer query (the subquery works perfectly without its parent query).
Thank you for your time!
Best regards,
Nedžad.
IMPORTANT UPDATE (25.10.2016):
Server MySQL version: 5.5.52-cll
Local MySQL version: 5.6.17 - MySQL Community Server (GPL)
Localhost returns the proper results, with NO MISTAKES,
while the server MySQL still returns wrong results.
Is it a bug in MySQL 5.5.52-cll, or what?
Image of results:
UPDATE: (SOLVED)
I solved it by adding a GROUP BY to the subquery:
GROUP BY TNew.oid_id, TNew.aid
(grouping by document, and by product inside it), and it returned the right results. Performance was good, 0.1704 s (which is fine, because it's always one document at a time). Once again, thanks to everyone who led me down the right path.
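To illustrate the shape of the query (not the version-specific bug), here is a reduced SQLite sketch with invented documents and prices, where a correlated subquery with ORDER BY ... LIMIT 1 pulls each product's most recent earlier price:

```python
import sqlite3

# Three documents for buyer 1047; document 3 is the current one, so
# the "old price" for product P1 should come from document 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE z_web_dok     (oid INTEGER PRIMARY KEY, kup_id INTEGER,
                            dat_zavrsena TEXT);
CREATE TABLE z_web_dok_art (oid_id INTEGER, aid TEXT, cijena REAL);
INSERT INTO z_web_dok VALUES
    (1, 1047, '2016-01-01'), (2, 1047, '2016-06-01'),
    (3, 1047, '2016-10-01');
INSERT INTO z_web_dok_art VALUES
    (1, 'P1', 10.0), (2, 'P1', 12.0), (3, 'P1', 15.0);
""")

# For each line of the current document, the correlated subquery finds
# the same product's price on the most recent other document.
rows = conn.execute("""
    SELECT WArt.aid, WArt.cijena,
           (SELECT TNew.cijena
            FROM z_web_dok_art TNew
            INNER JOIN z_web_dok TN ON TN.oid = TNew.oid_id
            WHERE TNew.aid = WArt.aid
              AND TN.kup_id = 1047
              AND TN.oid <> 3          -- exclude the current document
            ORDER BY TN.dat_zavrsena DESC
            LIMIT 1) AS s_cijena
    FROM z_web_dok_art WArt
    WHERE WArt.oid_id = 3
""").fetchall()
print(rows)
# [('P1', 15.0, 12.0)] -- new price 15.0, old price 12.0
```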
I am trying to make my query more efficient, because it is still heavy and in the future it will get a lot worse.
Here is my query:
SELECT SUM(fb_diff.shares) shares
FROM (
SELECT (SUM(fb.shares) - SUM(fbs.shares)) shares
FROM (
SELECT post_id, shares
FROM wp_facebook_total_stats
WHERE date = '2014-08-01 00:00:00'
GROUP BY post_id
) fbs
LEFT JOIN wp_facebook_total_stats fb ON fb.post_id = fbs.post_id
WHERE fb.date = '2014-09-28'
) fb_diff
It works and I get the data, but is there a way to do the same without reading the same table twice?
Because when I do EXPLAIN, I get this:
id  select_type  table                    type   possible_keys  key      key_len  ref   rows    Extra
2   DERIVED      fb                       ALL    post_id        NULL     NULL     NULL  588849  Using where
3   DERIVED      wp_facebook_total_stats  index  post_id        post_id  8        NULL  588849  Using where
If you are trying to get the difference between post shares based on different dates or lapsed time and don't want to join recursively to the same table, I can see at least a couple of options:
Create a view that does this ahead of time and can be cached, then query the view.
Pull the data into an array in your code by changing your SELECT to group on date and post_id, then do the math in your code to show the share differences.
Modify your schema to better meet your needs, if possible. For example add a column(s) to wp_facebook_total_stats which shows difference between shares versus previous day, previous week, previous month, etc. Whatever you will need to get the job done.
Each option has its benefits and drawbacks, consider them carefully.
Hope this helps, good luck.
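A quick sketch of the second option (group in SQL, subtract in application code), using SQLite and invented share counts:

```python
import sqlite3

# Two snapshots per post; we want the total growth in shares between
# the two dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_facebook_total_stats
    (post_id INTEGER, date TEXT, shares INTEGER);
INSERT INTO wp_facebook_total_stats VALUES
    (1, '2014-08-01', 5), (1, '2014-09-28', 9),
    (2, '2014-08-01', 3), (2, '2014-09-28', 10);
""")

# One pass over the table, grouped by post and date...
data = conn.execute("""
    SELECT post_id, date, SUM(shares)
    FROM wp_facebook_total_stats
    WHERE date IN ('2014-08-01', '2014-09-28')
    GROUP BY post_id, date
""").fetchall()

# ...then the subtraction happens in application code.
by_post = {}
for post_id, date, shares in data:
    by_post.setdefault(post_id, {})[date] = shares
total = sum(d.get('2014-09-28', 0) - d.get('2014-08-01', 0)
            for d in by_post.values())
print(total)  # 11  (the posts grew by 4 and 7 shares)
```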
I have a query that needs to select the average difference between two times stored in two separate tables.
That seemed easy until the next part of the query came in: I have to group by the reason for a student coming into the office. So now this query needs to:
Select the reason (why)
Count how many times a student has come in for that reason count(why)
Then I need to compute the AVG time from signintime to finishtime. Keep in mind that I am grouping by why, so the time difference must be computed over all records that fall under each reason for a student's visit.
I wrote this query :
SELECT why,
count(why),
SEC_TO_TIME(AVG(MAX(support.finishtime) - session.signintime))
FROM session
LEFT JOIN support
ON support.session_id = session.session_id
WHERE status = 3
GROUP BY why;
However, I get an error:
ERROR 1111 (HY000): Invalid use of group function
I don't quite understand this problem. From reading past questions, people suggest using HAVING? But I don't understand how, or even where, to add a HAVING clause in this situation.
Any help would be much appreciated.
Thank you
SELECT
session.why AS Reason,
COUNT(session.why) AS Count,
SEC_TO_TIME(AVG(TIMEDIFF(t.fin, session.signintime))) AS Time
FROM session
LEFT JOIN (SELECT support.session_id, MAX(support.finishtime) AS fin
FROM support
GROUP BY support.session_id) AS t
ON session.session_id = t.session_id
WHERE session.status = 3
GROUP BY session.why
ORDER BY session.signintime DESC
This does the job perfectly! I got it a couple of hours ago; I'd never really worked with subqueries, so my professor helped me out with it.
OK, looking over this a little, I'm thinking that what you want is a nested query to get the max support record.
One way to do this is:
SELECT why,
count(why),
SEC_TO_TIME(AVG((select max(finishtime) from
support where support.session_id = session.session_id) -
session.signintime))
FROM session
WHERE status = 3
GROUP BY why;
I've created an SQL Fiddle so you can look at the table structure that I was using.
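Both answers hinge on the same idea: collapse support to one MAX(finishtime) per session before averaging, so the aggregate-inside-aggregate error never arises. A small SQLite sketch with made-up data (times reduced to plain integer seconds, so no SEC_TO_TIME/TIMEDIFF is needed):

```python
import sqlite3

# Two sessions for the same reason; each may have several support rows,
# so we take MAX(finishtime) per session first, then average.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session (session_id INTEGER, why TEXT,
                      signintime INTEGER, status INTEGER);
CREATE TABLE support (session_id INTEGER, finishtime INTEGER);
INSERT INTO session VALUES (1, 'advising', 100, 3),
                           (2, 'advising', 200, 3);
INSERT INTO support VALUES (1, 130), (1, 160), (2, 240);
""")

# The derived table t holds one finish time per session, so the outer
# AVG sees one duration per session - no nested aggregates.
rows = conn.execute("""
    SELECT s.why, COUNT(s.why),
           AVG(t.fin - s.signintime) AS avg_seconds
    FROM session s
    LEFT JOIN (SELECT session_id, MAX(finishtime) AS fin
               FROM support GROUP BY session_id) t
      ON s.session_id = t.session_id
    WHERE s.status = 3
    GROUP BY s.why
""").fetchall()
print(rows)
# [('advising', 2, 50.0)] -- durations 60 and 40 average to 50 seconds
```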