COUNT(*) does not count correctly - mysql

I'm having trouble with a MySQL query:
SELECT switch_id, port_id, isp_service.service, isp_service.id
FROM traffic, isp_service
WHERE datetime>='2013-09-01 00:00:00'
AND datetime<'2013-09-02 00:00:00'
AND isp_service.id=traffic.isp_service_id
GROUP BY switch_id, port_id
This query returns 1000 rows.
Now I am trying to count how many users each service has, so I wrote:
SELECT ris.id, COUNT(*) as numberOfUsers
FROM
(SELECT switch_id, port_id, isp_service.service, isp_service.id
FROM traffic, isp_service
WHERE datetime>='2013-09-01 00:00:00'
AND datetime<'2013-09-02 00:00:00'
AND isp_service.id=traffic.isp_service_id
GROUP BY switch_id, port_id)ris
GROUP BY ris.id
ORDER BY ris.id
How is it possible that, when I sum up the numberOfUsers column, the result is greater than 1000?

You should learn proper JOIN syntax and prefix all columns with table aliases. A better way to write the second query is:
SELECT ris.id, COUNT(*) as numberOfUsers
FROM (SELECT switch_id, port_id, s.service, s.id
FROM traffic t join
isp_service s
on s.id = t.isp_service_id
WHERE datetime >= '2013-09-01 00:00:00' AND datetime < '2013-09-02 00:00:00'
GROUP BY switch_id, port_id
) ris
GROUP BY ris.id
ORDER BY ris.id;
This query has a problem: s.service and s.id are included in the SELECT list but not in the GROUP BY, so MySQL picks arbitrary values for them from each group.
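A minimal sketch of that pitfall, using Python's built-in sqlite3 module as a stand-in for MySQL (SQLite also accepts bare, non-grouped columns and takes their value from some row of the group). The schema and rows are hypothetical: switch/port pair (10, 1) has traffic under two services.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isp_service (id INTEGER PRIMARY KEY, service TEXT);
    CREATE TABLE traffic (switch_id INTEGER, port_id INTEGER, isp_service_id INTEGER);
    INSERT INTO isp_service VALUES (1, 'dsl'), (2, 'fiber');
    -- Hypothetical data: pair (10, 1) has traffic under both services.
    INSERT INTO traffic VALUES (10, 1, 1), (10, 1, 2), (20, 1, 2);
""")

# Grouping by the pair only: s.id is a bare column, so each group
# reports just one of its service ids, chosen by the engine.
per_pair = conn.execute("""
    SELECT t.switch_id, t.port_id, s.id
    FROM traffic t JOIN isp_service s ON s.id = t.isp_service_id
    GROUP BY t.switch_id, t.port_id
""").fetchall()

# Adding s.id to the GROUP BY keeps one row per (pair, service),
# so every service the pair used is represented.
per_pair_and_service = conn.execute("""
    SELECT t.switch_id, t.port_id, s.id
    FROM traffic t JOIN isp_service s ON s.id = t.isp_service_id
    GROUP BY t.switch_id, t.port_id, s.id
""").fetchall()

print(len(per_pair), len(per_pair_and_service))  # 2 3
```

Grouping by the pair alone yields 2 rows, so one of the two services that pair (10, 1) used is silently dropped; adding s.id to the GROUP BY keeps all 3 pair/service combinations.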
It is unclear what identifies a "user". If the switch_id/port_id pair identifies a user, then the query should produce correct results. However, you are likely to be missing id values that have users. You might be able to do this in one query:
SELECT s.id, count(*) as NumberOfUsers, count(distinct switch_id, port_id) as NumberOfUsers2
FROM traffic t join
isp_service s
on s.id = t.isp_service_id
WHERE datetime >= '2013-09-01 00:00:00' AND datetime < '2013-09-02 00:00:00'
GROUP BY s.id;
I am not sure which count is most appropriate.
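To see how the two counts differ, here is a small sketch with Python's sqlite3 standing in for MySQL on hypothetical rows. SQLite does not accept MySQL's multi-column COUNT(DISTINCT switch_id, port_id), so the distinct pairs are counted through a DISTINCT subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isp_service (id INTEGER PRIMARY KEY, service TEXT);
    CREATE TABLE traffic (switch_id INTEGER, port_id INTEGER, isp_service_id INTEGER);
    INSERT INTO isp_service VALUES (1, 'dsl'), (2, 'fiber');
    INSERT INTO traffic VALUES
        (10, 1, 1), (10, 1, 1), (10, 1, 2),
        (20, 1, 2), (20, 1, 2), (30, 2, 1);
""")

# COUNT(*) counts traffic rows per service, not users.
rows_per_service = dict(conn.execute("""
    SELECT s.id, COUNT(*)
    FROM traffic t JOIN isp_service s ON s.id = t.isp_service_id
    GROUP BY s.id
""").fetchall())

# Distinct (switch_id, port_id) pairs per service, via a DISTINCT subquery.
users_per_service = dict(conn.execute("""
    SELECT service_id, COUNT(*)
    FROM (SELECT DISTINCT s.id AS service_id, t.switch_id, t.port_id
          FROM traffic t JOIN isp_service s ON s.id = t.isp_service_id) AS d
    GROUP BY service_id
""").fetchall())

print(rows_per_service)   # {1: 3, 2: 3}
print(users_per_service)  # {1: 2, 2: 2}
```

COUNT(*) counts traffic rows, while the distinct version counts switch/port pairs. Pair (10, 1) counts toward both services, so the per-service user counts add up to 4 even though there are only 3 distinct pairs.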

Related

MySQL aggregate select query returning incorrect data

I have the following code:
SELECT gl.account_description AS invoice_total, COUNT(ili.invoice_id) AS total_invoice,
SUM(ili.line_item_amount) AS total_convert
FROM general_ledger_accounts gl JOIN
invoice_line_items ili
ON gl.account_number = ili.account_number JOIN
invoices i
ON ili.invoice_id = i.invoice_id
GROUP BY gl.account_description, i.invoice_date, ili.account_number
HAVING i.invoice_date BETWEEN '2014-04-01' AND '2014-06-30' AND
COUNT(ili.account_number) > 1
ORDER BY account_description DESC;
My query should return 10 rows of data, but only 7 are returned, and none of them has the correct information. It should return the account_description column from the general_ledger_accounts table, a count of the items in the invoice_line_items table, and a sum of the line_item_amount column for the invoice_line_items rows that have the same account number. It should only search invoices dated between '2014-04-01' and '2014-06-30', and I'm supposed to join in the invoices table.
Can anyone see what I'm doing wrong in my syntax that produces the wrong results?
You want a WHERE clause, not a HAVING clause, to filter on the dates, and you should remove the date from the GROUP BY (you are not selecting it):
SELECT gl.account_description AS invoice_total,
COUNT(*) AS total_invoice,
SUM(ili.line_item_amount) AS total_convert
FROM general_ledger_accounts gl JOIN
invoice_line_items ili
ON gl.account_number = ili.account_number JOIN
invoices i
ON ili.invoice_id = i.invoice_id
WHERE i.invoice_date BETWEEN '2014-04-01' AND '2014-06-30'
GROUP BY gl.account_description, ili.account_number
HAVING COUNT(*) > 1
ORDER BY account_description DESC;
I don't know if there are other issues.
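A small sketch of why the original GROUP BY loses rows, with Python's sqlite3 standing in for MySQL and hypothetical accounts, invoices and amounts. With invoice_date in the GROUP BY, each account splits into one group per date, so the COUNT(...) > 1 condition in HAVING rejects groups that should have been merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE general_ledger_accounts (account_number INTEGER, account_description TEXT);
    CREATE TABLE invoices (invoice_id INTEGER, invoice_date TEXT);
    CREATE TABLE invoice_line_items (invoice_id INTEGER, account_number INTEGER,
                                     line_item_amount INTEGER);
    INSERT INTO general_ledger_accounts VALUES (100, 'Freight'), (200, 'Rent');
    INSERT INTO invoices VALUES (1, '2014-04-10'), (2, '2014-05-20'), (3, '2014-08-01');
    INSERT INTO invoice_line_items VALUES
        (1, 100, 10), (2, 100, 20), (3, 100, 99), (1, 200, 5), (2, 200, 5);
""")

# Original shape: date in GROUP BY, date filter in HAVING.
original = conn.execute("""
    SELECT gl.account_description, COUNT(ili.invoice_id), SUM(ili.line_item_amount)
    FROM general_ledger_accounts gl
    JOIN invoice_line_items ili ON gl.account_number = ili.account_number
    JOIN invoices i ON ili.invoice_id = i.invoice_id
    GROUP BY gl.account_description, i.invoice_date, ili.account_number
    HAVING i.invoice_date BETWEEN '2014-04-01' AND '2014-06-30'
       AND COUNT(ili.account_number) > 1
    ORDER BY gl.account_description DESC
""").fetchall()

# Fixed shape: filter rows in WHERE, group by account only.
fixed = conn.execute("""
    SELECT gl.account_description, COUNT(*), SUM(ili.line_item_amount)
    FROM general_ledger_accounts gl
    JOIN invoice_line_items ili ON gl.account_number = ili.account_number
    JOIN invoices i ON ili.invoice_id = i.invoice_id
    WHERE i.invoice_date BETWEEN '2014-04-01' AND '2014-06-30'
    GROUP BY gl.account_description, ili.account_number
    HAVING COUNT(*) > 1
    ORDER BY gl.account_description DESC
""").fetchall()

print(original)  # []
print(fixed)     # [('Rent', 2, 10), ('Freight', 2, 30)]
```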
I suspect that you want to remove the columns i.invoice_date and ili.account_number from the GROUP BY clause. Otherwise, you get one record per distinct combination of values of these three columns, which does not seem to be what you want.
Accordingly, you should move the filter on the dates to the WHERE clause:
SELECT
gl.account_description AS invoice_total,
COUNT(ili.invoice_id) AS total_invoice,
SUM(ili.line_item_amount) AS total_convert
FROM general_ledger_accounts gl
INNER JOIN invoice_line_items ili
ON gl.account_number = ili.account_number
INNER JOIN invoices i
ON ili.invoice_id = i.invoice_id
WHERE
i.invoice_date >= '2014-04-01'
AND i.invoice_date < '2014-07-01'
GROUP BY gl.account_description
HAVING COUNT(ili.account_number) > 1
ORDER BY gl.account_description DESC;
Note that I modified the condition on the dates to use half-open intervals: this way, you don't have to worry about whether the last month has 30 or 31 days (or 28, or 29...); this would also smoothly handle the time part of the dates, if any.
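A quick check of the half-open interval, sketched with Python's sqlite3 and hypothetical rows. SQLite compares these ISO-8601 strings as text; MySQL treats '2014-06-30' compared against a DATETIME as midnight of that day. In both cases, BETWEEN drops an invoice issued later on 30 June, while the half-open range keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, invoice_date TEXT)")
# Hypothetical rows; invoice 3 has a time part on the last day of the range.
conn.executemany("INSERT INTO invoices VALUES (?, ?)", [
    (1, "2014-04-01 00:00:00"),
    (2, "2014-05-15 09:30:00"),
    (3, "2014-06-30 15:45:00"),
    (4, "2014-07-01 00:00:00"),
])

# Closed interval: '2014-06-30' acts as midnight, so later times that day are lost.
closed = [r[0] for r in conn.execute(
    "SELECT invoice_id FROM invoices "
    "WHERE invoice_date BETWEEN '2014-04-01' AND '2014-06-30' ORDER BY invoice_id")]

# Half-open interval: everything before the first instant of July.
half_open = [r[0] for r in conn.execute(
    "SELECT invoice_id FROM invoices "
    "WHERE invoice_date >= '2014-04-01' AND invoice_date < '2014-07-01' ORDER BY invoice_id")]

print(closed)     # [1, 2]
print(half_open)  # [1, 2, 3]
```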

Include zeros in SQL count query?

I want the count to return 0 when there are no logins. I'd prefer not to use joins, as my query doesn't use them.
This is my query.
SELECT count( user_id ) as agencyLogins,
DATE_FORMAT(login_date, '%Y-%m-%d') as date
FROM logins, users
WHERE login_date >= '2015-02-10%' AND login_date < '2016-02-11%'
AND logins.user_id = users.id
GROUP BY DATE_FORMAT(login_date,'%Y-%m-%d')
It counts the number of times users have logged into the website.
However, it doesn't return zeros, whereas I want to know when there have been no logins.
Please use explicit JOIN syntax in the future: it is more readable and helps you avoid errors like this one. What you need is a LEFT JOIN:
SELECT t.id, COUNT(s.user_id) AS agencyLogins, DATE_FORMAT(s.login_date, '%Y-%m-%d') AS date
FROM users t
LEFT OUTER JOIN logins s
ON t.id = s.user_id
AND s.login_date >= '2015-02-10' AND s.login_date < '2016-02-11'
GROUP BY t.id, DATE_FORMAT(s.login_date, '%Y-%m-%d')
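A sketch of why the date filter belongs in the ON clause of a LEFT JOIN, using Python's sqlite3 as a stand-in for MySQL (date() replaces DATE_FORMAT, which SQLite lacks). The data are hypothetical: user 2 logged in only before the range, and user 3 never logged in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE logins (user_id INTEGER, login_date TEXT);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO logins VALUES
        (1, '2015-03-01 10:00:00'), (1, '2015-03-01 12:00:00'),
        (2, '2014-12-31 08:00:00');   -- user 2 only logged in before the range
""")

# Date filter in WHERE: the join matches user 2's old login, WHERE then
# discards that row, so user 2 disappears instead of showing 0.
in_where = conn.execute("""
    SELECT t.id, COUNT(s.user_id), date(s.login_date)
    FROM users t LEFT JOIN logins s ON t.id = s.user_id
    WHERE (s.login_date >= '2015-02-10' AND s.login_date < '2016-02-11')
       OR s.user_id IS NULL
    GROUP BY t.id, date(s.login_date)
    ORDER BY t.id
""").fetchall()

# Date filter in ON: it only decides which logins match; every user survives.
in_on = conn.execute("""
    SELECT t.id, COUNT(s.user_id), date(s.login_date)
    FROM users t LEFT JOIN logins s
      ON t.id = s.user_id
     AND s.login_date >= '2015-02-10' AND s.login_date < '2016-02-11'
    GROUP BY t.id, date(s.login_date)
    ORDER BY t.id
""").fetchall()

print(in_where)  # [(1, 2, '2015-03-01'), (3, 0, None)]
print(in_on)     # [(1, 2, '2015-03-01'), (2, 0, None), (3, 0, None)]
```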
This might help you out:
SELECT SUM(agencyLogins), date FROM (
SELECT count( user_id ) as agencyLogins,
DATE_FORMAT(login_date, '%Y-%m-%d') as date
FROM logins, users
WHERE login_date >= '2015-02-10%' AND login_date < '2016-02-11%'
AND logins.user_id = users.id
GROUP BY DATE_FORMAT(login_date,'%Y-%m-%d')
UNION ALL
SELECT 0,''
) AS A
GROUP BY DATE
I think the SQL below will be useful to you. Also, please remove the % symbol from date strings such as '2015-02-10%'.
SELECT COUNT(logins.user_id) AS agencyLogins, DATE_FORMAT(login_date, '%Y-%m-%d') AS date
FROM users LEFT JOIN logins
ON logins.user_id = users.id
AND DATE(login_date) >= '2015-02-10' AND DATE(login_date) <= '2016-02-11'
GROUP BY DATE_FORMAT(login_date, '%Y-%m-%d')
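COUNT(column) ignores NULLs and returns 0, never NULL, for a group in which every value is NULL, so wrapping it in an IF(... IS NULL, '0', ...) check is unnecessary. A minimal check with Python's sqlite3 (this is standard SQL behaviour that MySQL shares) on hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE logins (user_id INTEGER, login_date TEXT);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO logins VALUES (1, '2015-03-01 10:00:00');
""")

# User 2 has no logins: the LEFT JOIN yields one row with logins.user_id = NULL,
# and COUNT over that all-NULL group is 0.
rows = conn.execute("""
    SELECT users.id, COUNT(logins.user_id)
    FROM users LEFT JOIN logins ON logins.user_id = users.id
    GROUP BY users.id
    ORDER BY users.id
""").fetchall()

print(rows)  # [(1, 1), (2, 0)]
```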

MySQL funnel multiple ANDs

With the MySQL query below, I would like to match users whose page is /signup and then, later in the user flow, /confirm:
SELECT COUNT(*) as `total` FROM (
SELECT COUNT(DISTINCT t.user_id) AS `visitors`
FROM `tracks` t
JOIN `user_details` u ON u.id=t.user_id AND u.site_id=t.site_id
WHERE t.site_id='334565'
AND (t.page = '/signup' AND t.page = '/confirm')
AND t.timestamp BETWEEN '2015-01-23 00:00:00' AND '2015-04-30 23:59:59'
GROUP BY t.user_id, t.track_id
) as a
The main problem with this query is that MySQL doesn't work the way I'm trying to use it: the conditions are evaluated against a single row, and one row's page cannot be both '/signup' and '/confirm'.
The other problem is that the pages could be matched in the wrong order, so the query also needs to enforce the specified order.
Maybe this query needs to be written completely differently; I'm not sure I'm on the right track.
Has anyone done this before or is there a better way to get the job done?
Please note that the WHERE clause above could match on more than just page; it could be any column, such as t.referrer or u.somethingelse.
Another example would be:
SELECT COUNT(*) as `total` FROM (
SELECT COUNT(DISTINCT t.user_id) AS `visitors`
FROM `tracks` t
JOIN `user_details` u ON u.id=t.user_id AND u.site_id=t.site_id
WHERE t.site_id='334565'
AND (u.browser = 'chrome' AND t.referrer_host = 'google.com' AND t.page = '/confirm' and t.page = '/preferences')
AND t.timestamp BETWEEN '2015-01-23 00:00:00' AND '2015-04-30 23:59:59'
GROUP BY t.user_id, t.track_id
) as a
Each of u.browser, t.referrer_host and t.page is a goal, and I am trying to show them all together as a funnel, much as an analytics program would.
I'm assuming this is tracking visitors to web pages (not a tough assumption to make), with each url / page endpoint having its own entry in the tracking table.
In order to find users who have hit both pages, you need to join the tracking table to itself. Something like this:
SELECT COUNT(DISTINCT t1.user_id) AS `visitors`
FROM `tracks` t1
JOIN `user_details` u ON u.id=t1.user_id AND u.site_id=t1.site_id
join `tracks` t2 on t1.site_id = t2.site_id and u.id = t2.user_id and t1.track_id <> t2.track_id
WHERE t1.site_id='334565'
AND (t1.page = '/signup' AND t2.page = '/confirm')
AND t1.timestamp BETWEEN '2015-01-23 00:00:00' AND '2015-04-30 23:59:59'
I don't think there's any need for grouping, as I think you just want the number of distinct visitors who have signed up and then confirmed.
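The self-join above does not enforce the order of the steps. One way to add it, assuming the timestamp column reflects the order of the user flow, is to require t2.timestamp > t1.timestamp. The sketch uses Python's sqlite3 as a stand-in for MySQL with hypothetical rows: user 1 signs up and then confirms, user 2 does the reverse, and user 3 only signs up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_details (id INTEGER, site_id TEXT);
    CREATE TABLE tracks (track_id INTEGER, user_id INTEGER, site_id TEXT,
                         page TEXT, timestamp TEXT);
    INSERT INTO user_details VALUES (1, '334565'), (2, '334565'), (3, '334565');
    INSERT INTO tracks VALUES
        (1, 1, '334565', '/signup',  '2015-02-01 10:00:00'),
        (2, 1, '334565', '/confirm', '2015-02-01 10:05:00'),
        (3, 2, '334565', '/confirm', '2015-02-02 09:00:00'),
        (4, 2, '334565', '/signup',  '2015-02-02 09:10:00'),
        (5, 3, '334565', '/signup',  '2015-02-03 12:00:00');
""")

# t1 is the first funnel step, t2 the second; step_condition links them.
FUNNEL = """
    SELECT COUNT(DISTINCT t1.user_id)
    FROM tracks t1
    JOIN user_details u ON u.id = t1.user_id AND u.site_id = t1.site_id
    JOIN tracks t2 ON t2.site_id = t1.site_id AND t2.user_id = u.id
                  AND {step_condition}
    WHERE t1.site_id = '334565'
      AND t1.page = '/signup' AND t2.page = '/confirm'
      AND t1.timestamp BETWEEN '2015-01-23 00:00:00' AND '2015-04-30 23:59:59'
"""

# Any two different rows: user 2 counts even though /confirm came first.
any_order = conn.execute(
    FUNNEL.format(step_condition="t1.track_id <> t2.track_id")).fetchone()[0]
# Second step strictly later than the first: only user 1 qualifies.
in_order = conn.execute(
    FUNNEL.format(step_condition="t2.timestamp > t1.timestamp")).fetchone()[0]

print(any_order, in_order)  # 2 1
```

Further steps (for example /preferences) could be added the same way with a t3 alias joined on t3.timestamp > t2.timestamp.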

MySQL Syntax Issue combining two working queries

I'm just starting to learn SQL and managed to cobble together a couple of working queries, but when I combine them I get a syntax error. The query throwing the error:
SELECT sca_ticket_status.name As Status, AVG(QueueTime)
FROM (SELECT DateDiff (created, now()) as 'QueueTime'
FROM sca_ticket as SubQuery
LEFT JOIN sca_ticket_status
ON sca_ticket.status_id = sca_ticket_status.id
GROUP BY name
ORDER BY sort
For reference, the two working queries that I am attempting to leverage are as follows:
SELECT sca_ticket_status.name As Status, COUNT(sca_ticket.ticket_id) AS Count
FROM sca_ticket
LEFT JOIN sca_ticket_status
ON sca_ticket.status_id = sca_ticket_status.id
WHERE sca_ticket.created between date_sub(now(),INTERVAL 1 WEEK) and now()
GROUP BY name
ORDER BY sort
SELECT AVG(QueueTime)
FROM (SELECT DateDiff (created, now()) as 'QueueTime'
FROM `sca_ticket`
WHERE `status_id` = 1) as SubQuery
Try closing the parenthesis of your inner SELECT statement and giving the subquery an alias:
SELECT sca_ticket_status.name AS Status, AVG(QueueTime)
FROM (SELECT status_id, DATEDIFF(NOW(), created) AS QueueTime
FROM sca_ticket) q1
LEFT JOIN sca_ticket_status
ON q1.status_id = sca_ticket_status.id
GROUP BY name
ORDER BY sort
You will also need to expose the status_id column in your inner select list if you want to join on it later.
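A small sketch of that point, using Python's sqlite3 as a stand-in for MySQL. julianday() differences replace DATEDIFF, and a fixed hypothetical "now" of 2024-01-05 keeps the result reproducible. The outer query can only see the columns the derived table selects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sca_ticket_status (id INTEGER, name TEXT, sort INTEGER);
    CREATE TABLE sca_ticket (ticket_id INTEGER, status_id INTEGER, created TEXT);
    INSERT INTO sca_ticket_status VALUES (1, 'Open', 1), (2, 'Closed', 2);
    INSERT INTO sca_ticket VALUES
        (1, 1, '2024-01-01'), (2, 1, '2024-01-03'), (3, 2, '2024-01-04');
""")

# The derived table exposes only queue_days, so q1.status_id does not exist.
without_status = """
    SELECT ts.name, AVG(q1.queue_days)
    FROM (SELECT julianday('2024-01-05') - julianday(created) AS queue_days
          FROM sca_ticket) q1
    LEFT JOIN sca_ticket_status ts ON q1.status_id = ts.id
    GROUP BY ts.name
"""
try:
    conn.execute(without_status)
    error_raised = False
except sqlite3.OperationalError:
    error_raised = True

# Selecting status_id inside the subquery makes it joinable outside.
with_status = conn.execute("""
    SELECT ts.name, AVG(q1.queue_days)
    FROM (SELECT status_id, julianday('2024-01-05') - julianday(created) AS queue_days
          FROM sca_ticket) q1
    LEFT JOIN sca_ticket_status ts ON q1.status_id = ts.id
    GROUP BY ts.name, ts.sort
    ORDER BY ts.sort
""").fetchall()

print(error_raised)  # True
print(with_status)   # [('Open', 3.0), ('Closed', 1.0)]
```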
You do not need a subquery at all. It just slows down processing in MySQL: the optimizer (at least in versions before 5.7) materializes subqueries in the FROM clause into temporary tables, losing index information.
SELECT ts.name AS Status, AVG(DATEDIFF(NOW(), t.created))
FROM sca_ticket t LEFT JOIN
sca_ticket_status ts
ON t.status_id = ts.id
GROUP BY ts.name, ts.sort
ORDER BY ts.sort
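A sketch showing that the flattened join returns the same averages as the derived-table version, again with Python's sqlite3 standing in for MySQL, julianday() in place of DATEDIFF, and a fixed hypothetical reference date. Whether the flat form is also faster depends on the MySQL version and indexes; the sketch only checks the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sca_ticket_status (id INTEGER, name TEXT, sort INTEGER);
    CREATE TABLE sca_ticket (ticket_id INTEGER, status_id INTEGER, created TEXT);
    INSERT INTO sca_ticket_status VALUES (1, 'Open', 1), (2, 'Closed', 2);
    INSERT INTO sca_ticket VALUES
        (1, 1, '2024-01-01'), (2, 1, '2024-01-03'), (3, 2, '2024-01-04');
""")

# Version with a derived table (subquery in FROM).
derived = conn.execute("""
    SELECT ts.name, AVG(q1.queue_days)
    FROM (SELECT status_id, julianday('2024-01-05') - julianday(created) AS queue_days
          FROM sca_ticket) q1
    LEFT JOIN sca_ticket_status ts ON q1.status_id = ts.id
    GROUP BY ts.name, ts.sort
    ORDER BY ts.sort
""").fetchall()

# Same aggregate computed directly over the join, no subquery.
flat = conn.execute("""
    SELECT ts.name, AVG(julianday('2024-01-05') - julianday(t.created))
    FROM sca_ticket t
    LEFT JOIN sca_ticket_status ts ON t.status_id = ts.id
    GROUP BY ts.name, ts.sort
    ORDER BY ts.sort
""").fetchall()

print(flat == derived)  # True
print(flat)             # [('Open', 3.0), ('Closed', 1.0)]
```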

Query joining in millions of records is slow, help me optimize please

Here's my query:
SELECT SQL_BUFFER_RESULT SQL_BIG_RESULT users.id, users.email,
COUNT(av.user_id) AS article_views_count,
COUNT(af.id) AS article_favorites_count,
COUNT(lc.user_id) AS link_clicks_count,
COUNT(ai.user_id) AS ad_impressions_count,
COUNT(ac.user_id) AS ad_clicks_count
FROM users
LEFT JOIN article_views AS av ON (av.user_id = users.id AND av.created_at >= '2012-11-28 00:00:00' AND av.created_at <= '2012-11-30 23:59:59')
LEFT JOIN article_favorites AS af ON (af.user_id = users.id AND af.created_at >= '2012-11-28 00:00:00' AND af.created_at <= '2012-11-30 23:59:59')
LEFT JOIN link_clicks AS lc ON (lc.user_id = users.id AND lc.created_at >= '2012-11-28 00:00:00' AND lc.created_at <= '2012-11-30 23:59:59')
LEFT JOIN ad_impressions AS ai ON (ai.user_id = users.id AND ai.created_at >= '2012-11-28 00:00:00' AND ai.created_at <= '2012-11-30 23:59:59')
LEFT JOIN ad_clicks AS ac ON (ac.user_id = users.id AND ac.created_at >= '2012-11-28 00:00:00' AND ac.created_at <= '2012-11-30 23:59:59')
GROUP BY users.id
HAVING (article_views_count + article_favorites_count + link_clicks_count + ad_impressions_count + ad_clicks_count) > 0
Some stats to give you context:
users: 1,474,348 rows
article_views: 32,603,637 rows
article_favorites: 10,199 rows
link_clicks: 4,258,901 rows
ad_impressions: 66,758,573 rows
ad_clicks: 324,125 rows
Every table that is joined in has a composite index on user_id and created_at (in that order).
We're running MySQL 5, and every table uses the MyISAM engine.
Here's an EXPLAIN of the query: https://gist.github.com/4197482
The goal is to only return users that have any activity (view, favorite, click, impression, ad click) within the time period.
Any ideas to optimize this bad boy?
Your query seems to be an analytical query over a large amount of data (it contains aggregate functions and a GROUP BY clause).
To improve performance on such queries, you can materialize the result of the JOIN into a table (MySQL has no native materialized views) with something like:
CREATE TABLE my_view AS SELECT ... FROM ... JOIN ...
By doing that, subsequent queries will be much more efficient, as MySQL will only have to compute the aggregation.
You will then need a strategy to refresh the table (for example, based on a timestamp).
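A minimal sketch of the summary-table idea, using Python's sqlite3 as a stand-in for MySQL. The daily_activity table and the refresh_summary and run_report helpers are hypothetical names, and this variant pre-aggregates one row per user and day rather than storing the raw JOIN result. It refreshes by rebuilding the whole table; an incremental refresh keyed on created_at would be the timestamp-based alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE article_views (user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO article_views VALUES
        (1, '2012-11-28 08:00:00'), (1, '2012-11-29 09:00:00'),
        (2, '2012-11-30 10:00:00');
""")


def refresh_summary(conn):
    """Hypothetical helper: rebuild the precomputed per-user, per-day table."""
    conn.executescript("""
        DROP TABLE IF EXISTS daily_activity;
        CREATE TABLE daily_activity AS
            SELECT user_id, date(created_at) AS day, COUNT(*) AS views
            FROM article_views
            GROUP BY user_id, date(created_at);
    """)


def run_report(conn):
    """Hypothetical report: aggregate the small summary table, not the raw log."""
    return conn.execute("""
        SELECT u.id, u.email, SUM(d.views)
        FROM users u JOIN daily_activity d ON d.user_id = u.id
        WHERE d.day >= '2012-11-28' AND d.day < '2012-12-01'
        GROUP BY u.id, u.email
        ORDER BY u.id
    """).fetchall()


refresh_summary(conn)
before = run_report(conn)

# New events only show up in the report after the next refresh.
conn.execute("INSERT INTO article_views VALUES (2, '2012-11-30 18:00:00')")
stale = run_report(conn)
refresh_summary(conn)
after = run_report(conn)

print(before)  # [(1, 'a@example.com', 2), (2, 'b@example.com', 1)]
print(after)   # [(1, 'a@example.com', 2), (2, 'b@example.com', 2)]
```

Daily granularity assumes reports only ask for whole-day ranges; the trade-off is that results are only as fresh as the last refresh.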
Another solution is to import your data into a DBMS built to be efficient on this kind of query: a column-oriented database. For example, InfiniDB is an open-source DBMS based on MySQL, with a storage engine optimized for analytical queries.
Try splitting the query into a separate INNER JOIN with each activity table and combining the results with UNION.
For example:
SELECT users.id, users.email, COUNT(av.user_id) AS article_views_count
FROM users
JOIN article_views AS av ON (av.user_id = users.id AND av.created_at >= '2012-11-28 00:00:00' AND av.created_at <= '2012-11-30 23:59:59')
GROUP BY users.id, users.email
UNION
....
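A sketch of the split-per-table idea with only two of the five activity tables, using Python's sqlite3 as a stand-in for MySQL and hypothetical rows. It also shows a side effect of the original chained LEFT JOINs: a user's rows from different tables multiply (2 views times 3 clicks gives 6 rows), inflating every COUNT. Here the per-table aggregates are stacked with UNION ALL rather than UNION, since the branches never need de-duplicating, and then summed per user; date ranges use half-open bounds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE article_views (user_id INTEGER, created_at TEXT);
    CREATE TABLE link_clicks (user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES
        (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO article_views VALUES
        (1, '2012-11-28 08:00:00'), (1, '2012-11-29 08:00:00');
    INSERT INTO link_clicks VALUES
        (1, '2012-11-28 09:00:00'), (1, '2012-11-28 10:00:00'),
        (1, '2012-11-30 11:00:00'), (2, '2012-11-29 12:00:00');
""")

# Chained LEFT JOINs: user 1's 2 views pair with each of 3 clicks -> 6 rows.
chained = conn.execute("""
    SELECT users.id, COUNT(av.user_id), COUNT(lc.user_id)
    FROM users
    LEFT JOIN article_views av ON av.user_id = users.id
        AND av.created_at >= '2012-11-28' AND av.created_at < '2012-12-01'
    LEFT JOIN link_clicks lc ON lc.user_id = users.id
        AND lc.created_at >= '2012-11-28' AND lc.created_at < '2012-12-01'
    GROUP BY users.id
    HAVING COUNT(av.user_id) + COUNT(lc.user_id) > 0
    ORDER BY users.id
""").fetchall()

# One aggregate per table, stacked with UNION ALL, then summed per user.
# Users with no activity never appear, so no HAVING filter is needed.
per_table = conn.execute("""
    SELECT u.id, u.email, SUM(a.views), SUM(a.clicks)
    FROM (
        SELECT user_id, COUNT(*) AS views, 0 AS clicks
        FROM article_views
        WHERE created_at >= '2012-11-28' AND created_at < '2012-12-01'
        GROUP BY user_id
        UNION ALL
        SELECT user_id, 0, COUNT(*)
        FROM link_clicks
        WHERE created_at >= '2012-11-28' AND created_at < '2012-12-01'
        GROUP BY user_id
    ) AS a
    JOIN users u ON u.id = a.user_id
    GROUP BY u.id, u.email
    ORDER BY u.id
""").fetchall()

print(chained)    # [(1, 6, 6), (2, 0, 1)]
print(per_table)  # [(1, 'a@example.com', 2, 3), (2, 'b@example.com', 0, 1)]
```

Each branch can use the (user_id, created_at) index on its own table; whether that is faster on the real data is something only an EXPLAIN on those tables can tell.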