I have a place_analytics table from which I fetch followed places per month. My problem is that when I fetch a distinct count on a monthly basis, the query seems to ignore DISTINCT and returns unwanted records (described in the screenshot) for each month.
Note: A user can follow/unfollow the same place multiple times. I do not delete this data but store it as multiple entries.
For instance, in the screenshot below, I have filtered the result by place_id, where user_id=5 follows place_id=4 in May, unfollows it in July, then follows it again, also in July.
When I query,
SELECT place_id, COUNT(DISTINCT user_id) as follow_count
FROM place_analytics
WHERE is_followed=1
GROUP BY place_id
This returns the expected results. But when I add a second GROUP BY expression for the month, as
MONTHNAME(STR_TO_DATE(MONTH(created_at), '%m'))
it returns the following result set, where place_id 4 appears multiple times.
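The behaviour described above can be reproduced with a minimal sketch. It uses Python's sqlite3 module rather than MySQL so that it runs standalone, so strftime stands in for MONTHNAME, and the sample dates (including the year) are assumptions made up for illustration. It shows that COUNT(DISTINCT user_id) is evaluated once per group, so the same user is counted again in every month in which they have a follow row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place_analytics "
    "(user_id INTEGER, place_id INTEGER, is_followed INTEGER, created_at TEXT)"
)
# Hypothetical rows mirroring the scenario: follow in May, unfollow and re-follow in July.
conn.executemany(
    "INSERT INTO place_analytics VALUES (?, ?, ?, ?)",
    [
        (5, 4, 1, "2019-05-10"),
        (5, 4, 0, "2019-07-02"),
        (5, 4, 1, "2019-07-20"),
    ],
)

# Grouping by place only: user 5 is counted once for place 4.
per_place = conn.execute(
    """
    SELECT place_id, COUNT(DISTINCT user_id)
    FROM place_analytics
    WHERE is_followed = 1
    GROUP BY place_id
    """
).fetchall()

# Grouping by place and month: DISTINCT is applied inside each (place, month) group,
# so user 5 is counted once in May and once again in July.
per_place_month = conn.execute(
    """
    SELECT place_id, strftime('%m', created_at) AS mon, COUNT(DISTINCT user_id)
    FROM place_analytics
    WHERE is_followed = 1
    GROUP BY place_id, mon
    ORDER BY place_id, mon
    """
).fetchall()

print(per_place)
print(per_place_month)
```

So the extra rows are not DISTINCT being ignored; they come from the additional grouping key.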
QUESTION
If user_id=5 has followed/unfollowed place_id=4 in May and followed it again in July, my query should consider only one record throughout the table, regardless of the monthly GROUP BY. Is this possible?
A detailed MySQL Fiddle is here,
The expected result should be,
I'm pretty new to SQL and I'm struggling with one of the questions on my exercise. How would I calculate average session length per daily active user? The table shown is just a sample of what the extended table is. Imagine loads more rows.
I simply used this query to calculate the daily active users:
SELECT COUNT (DISTINCT user_id)
FROM table1
and welcome to StackOverflow!
now, your question:
How would I calculate average session length per daily active user?
you already have the session time, and using AVG function you will get a simple average for all
select AVG(session_length_seconds) avg from table_1
but you want it per day, so you need to think in terms of grouping by day. So how do you get the day? You have activity_date as a date column, and it's easy to extract the day, month and year from it, for example
select
DAY(activity_date) day,
MONTH(activity_date) month,
YEAR(activity_date) year
from
table_1
will break down the date field in columns you can use...
now, back to your question: it says "daily active user", but all you have is sessions, and a user could have multiple sessions. From the context you have shared I have no idea how you want to handle that, and an average for each individual session makes no sense as data to retrieve. So, just to get you started, I'll assume that you want the average per day only
knowing how to get the average, let's create a query that has it all together:
select
DAY(activity_date) day,
MONTH(activity_date) month,
YEAR(activity_date) year,
AVG(session_length_seconds) avg
from
table_1
group by
DAY(activity_date),
MONTH(activity_date),
YEAR(activity_date)
will output the average of session_length_seconds per day/month/year
about the group by part: you need to list every field from the select that does not do a calculation (sum, count, etc.). In our case avg does a calculation, so we don't group by that value, but we do group by the other 3 values, which gives us 3 columns with day, month and year. You can also use concat to join day, month and year into a single string if you prefer...
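To make the per-day grouping concrete, here is a minimal sketch using Python's sqlite3 module (so it runs standalone; it is not MySQL). The sample rows are made up, and the user_id column is assumed to exist in table_1 because the question's own query counts it. The second computed column is only one possible reading of "per daily active user": total session time that day divided by the number of distinct users active that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 "
    "(user_id INTEGER, activity_date TEXT, session_length_seconds INTEGER)"
)
# Hypothetical sample data: user 1 has two sessions on the first day.
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 100),
        (1, "2020-01-01", 200),
        (2, "2020-01-01", 300),
        (1, "2020-01-02", 60),
    ],
)

rows = conn.execute(
    """
    SELECT activity_date,
           AVG(session_length_seconds) AS avg_per_session,
           SUM(session_length_seconds) * 1.0 / COUNT(DISTINCT user_id) AS avg_per_active_user
    FROM table_1
    GROUP BY activity_date
    ORDER BY activity_date
    """
).fetchall()

for row in rows:
    print(row)
```

On the first day the two figures differ (200 seconds per session versus 300 seconds per active user) because one user had more than one session.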
I am trying to write a SQL query that counts unique personid values per month, where a person counts as a 'Returning' visitor unless they also have a 'New' record for that month.
month   | personid | visitstat
---------------------------------
January | john     | new
January | john     | returning
January | Bill     | returning
So the query I'm looking for should get a count for each unique personid that has "returning" unless a "new" exists for that personid as well - in this instance returning a count of 1 for
January Bill returning
because john is new for the month.
The query I've tried is
SELECT COUNT(distinct personid) as count FROM visit_info WHERE visitstat = 'Returning' GROUP BY MONTH(date) ORDER BY date
Unfortunately this counts "Returning" even if a "New" record exists for the person in that month.
Thanks in advance, hopefully I explained this clearly enough.
SQL Database Image
Chart of Data
You already wrote the "magic" word yourself, "exists". You can use exactly that, a NOT EXISTS and a correlated subquery.
SELECT count(DISTINCT vi1.personid) count
FROM visit_info vi1
WHERE vi1.visitstat = 'Returning'
AND NOT EXISTS (SELECT *
FROM visit_info vi2
WHERE vi2.personid = vi1.personid
AND year(vi2.date) = year(vi1.date)
AND month(vi2.date) = month(vi1.date)
AND vi2.visitstat = 'New')
GROUP BY year(vi1.date),
month(vi1.date)
ORDER BY year(vi1.date),
month(vi1.date);
I also recommend including the year in the GROUP BY expression, as you might otherwise get unexpected results when the data spans more than one year. Also, in the ORDER BY clause, only use expressions that are included in the GROUP BY clause or passed to an aggregate function. MySQL, unlike virtually any other DBMS, might accept other expressions, but may then produce weird results.
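For anyone who wants to try the NOT EXISTS approach without a MySQL server, here is a minimal sketch using Python's sqlite3 module. SQLite has no year() or month() functions, so strftime is swapped in, and the concrete dates are assumptions made up to match the sample table (plus one February row to show that the check is scoped to the month).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit_info (personid TEXT, visitstat TEXT, date TEXT)")
# Hypothetical dates: john is New and Returning in January, Returning again in February.
conn.executemany(
    "INSERT INTO visit_info VALUES (?, ?, ?)",
    [
        ("john", "New", "2020-01-05"),
        ("john", "Returning", "2020-01-20"),
        ("Bill", "Returning", "2020-01-11"),
        ("john", "Returning", "2020-02-03"),
    ],
)

rows = conn.execute(
    """
    SELECT strftime('%Y', vi1.date) AS yr,
           strftime('%m', vi1.date) AS mon,
           COUNT(DISTINCT vi1.personid) AS cnt
    FROM visit_info vi1
    WHERE vi1.visitstat = 'Returning'
      AND NOT EXISTS (SELECT 1
                      FROM visit_info vi2
                      WHERE vi2.personid = vi1.personid
                        AND strftime('%Y-%m', vi2.date) = strftime('%Y-%m', vi1.date)
                        AND vi2.visitstat = 'New')
    GROUP BY yr, mon
    ORDER BY yr, mon
    """
).fetchall()

print(rows)
```

In January only Bill is counted, because john also has a 'New' row that month; in February john counts as returning.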
I also faced a similar scenario while working with a database. The way I handled it was to use GROUP BY with a HAVING clause and a subquery.
I have a table called "Sold_tickets" with attributes "Ticket_id" and "Date_sold". I want to find the day when the most tickets have been sold and the amount of tickets that were sold.
ticket_id date_sold
1 2017-02-15
2 2017-02-15
3 2017-02-14
In this case I want my output to look like this:
date_sold amount
2017-02-15 2
I know you can use a query like this
SELECT Count(ticket_id)
FROM Sold_tickets
WHERE date_sold = '2017-02-15';
to get an output of 2. The same can of course be done for 2017-02-14 to get an output of 1. However, then I have to manually check all the dates and compare them myself. Does a function exist (in sqlite) that counts the tickets sold for all the dates and then shows you only the maximum value?
Try using a GROUP BY aggregation query, then retain only the record having the maximum number of sales.
SELECT date_sold, COUNT(*)
FROM Sold_tickets
GROUP BY date_sold
ORDER BY COUNT(*) DESC
LIMIT 1
This solution would work well assuming that you don't have two or more dates tied for the greatest number of sales, or, if there is a tie, that you don't mind choosing just one date group.
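If ties do matter, one possible variation is to keep every date whose count equals the maximum count instead of using LIMIT 1. The sketch below runs against SQLite through Python's sqlite3 module; the extra rows are made up to create a tie, and this is just one way to write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sold_tickets (ticket_id INTEGER, date_sold TEXT)")
# The question's three rows plus two hypothetical ones, so two dates tie at 2 sales.
conn.executemany(
    "INSERT INTO Sold_tickets VALUES (?, ?)",
    [
        (1, "2017-02-15"),
        (2, "2017-02-15"),
        (3, "2017-02-14"),
        (4, "2017-02-14"),
        (5, "2017-02-13"),
    ],
)

rows = conn.execute(
    """
    SELECT date_sold, COUNT(*) AS amount
    FROM Sold_tickets
    GROUP BY date_sold
    HAVING COUNT(*) = (SELECT MAX(cnt)
                       FROM (SELECT COUNT(*) AS cnt
                             FROM Sold_tickets
                             GROUP BY date_sold) AS t)
    ORDER BY date_sold
    """
).fetchall()

print(rows)
```

The inner query computes the per-date counts, MAX picks the highest one, and HAVING keeps every date that reaches it.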
What is the best way to think about the Group By function in MySQL?
I am writing a MySQL query to pull data through an ODBC connection in a pivot table in Excel so that users can easily access the data.
For example, I have:
Select
statistic_date,
week(statistic_date,4),
year(statistic_date),
Emp_ID,
count(distinct Emp_ID),
Site,
Cost_Center
I'm trying to count the number of unique employees we have by site by week. The problem I'm running into is around year end: the calendar years don't always match up, so it is important to have them by date so that I can manually filter down to the correct dates using a pivot table (2013/2014 had a week where we had to add week 53 + week 1).
I'm experimenting by using different group by statements but I'm not sure how the order matters and what changes when I switch them around.
i.e.
Group by week(statistic_date,4), Site, Cost_Center, Emp_ID
vs
Group by Site, Cost_Center, week(statistic_date,4), Emp_ID
Other things to note:
-Employees can work any number of days. Some are working 4 x 10's, others 5 x 8's with possibly a 6th day if they sign up for OT. If I sum the counts by week, I get anywhere between 3-7 per Emp_ID. I'm hoping to get 1 for the week.
-There are different pay code per employee so the distinct count helps when we are looking by day (VTO = Voluntary Time Off, OT = Over Time, LOA = Leave of Absence, etc). The distinct count will show me 1, where often times I will have 2-3 for the same emp in the same day (hits 40 hours and starts accruing OT then takes VTO or uses personal time in the same day).
I'm starting with a query I wrote to understand our paid hours by week. I'm trying to adapt it for this application. Actual code is below:
SELECT
dkh.STATISTIC_DATE AS 'Date'
,week(dkh.STATISTIC_DATE,4) as 'Week'
,month(dkh.STATISTIC_DATE) as 'Month'
,year(dkh.STATISTIC_DATE) as 'Year'
,dkh.SITE AS 'Site ID Short'
,aep.LOC_DESCR as 'Site Name'
,dkh.EMPLOYEE_ID AS 'Employee ID'
,count(distinct dkh.EMPLOYEE_ID) AS 'Distinct Employee ID'
,aep.NAME AS 'Employee Name'
,aep.BUSINESS_TITLE AS 'Business_Ttile'
,aep.SPRVSR_NAME AS 'Manager'
,SUBSTR(aep.DEPTID,1,4) AS 'Cost_Center'
,dkh.PAY_CODE
,dkh.PAY_CODE_SHORT
,dkh.HOURS
FROM metrics.DAT_KRONOS_HOURS dkh
JOIN metrics.EMPLOYEES_PUBLIC aep
ON aep.SNAPSHOT_DATE = SUBDATE(dkh.STATISTIC_DATE, DAYOFWEEK(dkh.STATISTIC_DATE) + 1)
AND aep.EMPLID = dkh.EMPLOYEE_ID
WHERE dkh.STATISTIC_DATE BETWEEN adddate(now(), interval -1 year) AND DATE(now())
group by dkh.SITE, SUBSTR(aep.DEPTID,1,4), week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE, dkh.EMPLOYEE_ID
The order you use in group by doesn't matter. Each unique combination of the values gets a group of its own. Selecting columns you don't group by gives you somewhat arbitrary results; you'd probably want to use some aggregation function on them, such as SUM to get the group total.
Grouping by values you derive from other values that you already use in group by, like below, isn't very useful.
week(dkh.STATISTIC_DATE,4), dkh.STATISTIC_DATE
If two rows have different weeks, they'll also have different dates, right?
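Both points can be checked with a small sketch. It uses Python's sqlite3 module with a simplified, hypothetical hours table (not the real metrics schema), and strftime('%Y-%W', ...) stands in for MySQL's week(), so the week numbering differs from mode 4. It also shows that while Emp_ID is one of the grouping columns, count(distinct Emp_ID) is always 1; dropping it from the GROUP BY gives distinct employees per site per week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (statistic_date TEXT, site TEXT, emp_id INTEGER)")
# Hypothetical rows: employee 1 works two days in the first week and one day in the next.
conn.executemany(
    "INSERT INTO hours VALUES (?, ?, ?)",
    [
        ("2014-01-06", "A", 1),
        ("2014-01-07", "A", 1),
        ("2014-01-06", "A", 2),
        ("2014-01-13", "A", 1),
    ],
)

WEEK = "strftime('%Y-%W', statistic_date)"  # year plus week number, SQLite flavour

def grouped(group_by):
    """Run the same SELECT with a given GROUP BY list and return sorted rows."""
    sql = (
        f"SELECT site, {WEEK} AS wk, emp_id, COUNT(DISTINCT emp_id) "
        f"FROM hours GROUP BY {group_by}"
    )
    return sorted(conn.execute(sql).fetchall())

# Same grouping columns in a different order: identical groups.
order_a = grouped(f"{WEEK}, site, emp_id")
order_b = grouped(f"site, {WEEK}, emp_id")
print(order_a == order_b)

# Without emp_id in the GROUP BY, the distinct count is per site and week.
per_week = conn.execute(
    f"SELECT site, {WEEK} AS wk, COUNT(DISTINCT emp_id) "
    f"FROM hours GROUP BY site, wk ORDER BY wk"
).fetchall()
print(per_week)
```

Including the year in the week key is an assumption here, added so that weeks from different years do not collapse into one group.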
I'm looking to make some bar graphs to count item sales by day, month, and year. The problem that I'm encountering is that my simple MySQL queries only return counts where there are values to count. It doesn't magically fill in dates where dates don't exist and item sales=0. This is causing me problems when trying to populate a table, for example, because all weeks in a given year aren't represented, only the weeks where items were sold are represented.
My tables and fields are as follows:
items table: account_id and item_id
// table keeping track of owners' items
items_purchased table: purchaser_account_id, item_id, purchase_date
// table keeping track of purchases by other users
calendar table: datefield
//table with all the dates incremented every day for many years
here's the 1st query I was referring to above:
SELECT COUNT(*) as item_sales, DATE(purchase_date) as date
FROM items_purchased join items on items_purchased.item_id=items.item_id
where items.account_id=125
GROUP BY DATE(purchase_date)
I've read that I should join a calendar table with the tables where the counting takes place. I've done that, but now I can't get the first query to play nice with this 2nd query, because the join in the first query eliminates dates from the query result where item sales are 0.
here's the 2nd query which needs to be merged with the 1st query somehow to produce the results i'm looking for:
SELECT calendar.datefield AS date, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(purchase_date))
FROM items_purchased) AND (SELECT MAX(DATE(purchase_date)) FROM items_purchased))
GROUP BY date
// this lists the sales/day
// to make it per week, change the group by to this: GROUP BY week(date)
The failure of this 2nd query is that it doesn't count item_sales by account_id (the person trying to sell the item to the purchaser_account_id users). The 1st query does but it doesn't have all dates where the item sales=0. So yeah, frustrating.
Here's how I'd like the resulting data to look (NOTE: these are what account_id=125 has sold; other people may have different numbers during this time frame):
2012-01-01 1
2012-01-08 1
2012-01-15 0
2012-01-22 2
2012-01-29 0
Here's what the 1st query currently returns:
2012-01-01 1
2012-01-08 1
2012-01-22 2
If someone could provide some advice on this I would be hugely grateful.
I'm not quite sure about the problem you're getting as I don't know the actual tables and data they contain that generates those results (that would help a lot!). However, let's try something. Use this condition:
where (items.account_id = 125 or items.account_id is null) and (other-conditions)
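As a different option from the condition above, the account filter can be applied inside a subquery that is then LEFT JOINed to the calendar table, so every calendar date survives even when that account sold nothing. This is only a sketch under assumptions: it runs on SQLite through Python's sqlite3 module, the sample rows are made up, and account 200 is a hypothetical second seller.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (account_id INTEGER, item_id INTEGER);
    CREATE TABLE items_purchased (purchaser_account_id INTEGER, item_id INTEGER, purchase_date TEXT);
    CREATE TABLE calendar (datefield TEXT);
    """
)
conn.executemany("INSERT INTO items VALUES (?, ?)", [(125, 1), (200, 2)])
conn.executemany(
    "INSERT INTO items_purchased VALUES (?, ?, ?)",
    [
        (9, 1, "2012-01-01"),  # sale for account 125
        (9, 2, "2012-01-02"),  # sale for the other account only
        (8, 1, "2012-01-04"),  # two sales for account 125 on the same day
        (8, 1, "2012-01-04"),
    ],
)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [("2012-01-01",), ("2012-01-02",), ("2012-01-03",), ("2012-01-04",)],
)

rows = conn.execute(
    """
    SELECT c.datefield, COUNT(s.item_id) AS item_sales
    FROM calendar c
    LEFT JOIN (SELECT ip.item_id, DATE(ip.purchase_date) AS d
               FROM items_purchased ip
               JOIN items i ON i.item_id = ip.item_id
               WHERE i.account_id = 125) s
           ON s.d = c.datefield
    GROUP BY c.datefield
    ORDER BY c.datefield
    """
).fetchall()

print(rows)
```

COUNT(s.item_id) ignores the NULLs produced by the outer join, which is what turns a date without matching sales into 0, including a date where only another account sold something.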
Your first query is perfectly acceptable. The fact is you don't have data in the mysql table and therefore it can't group any data together. This is fine. You can account for this in your code so that if the date does not exist, then obviously there's no data to graph. You can better account for this by ordering the date value so you can loop through it accordingly and look for missed days.
Also, to avoid repeating the DATE() function, you can change the GROUP BY to GROUP BY date (because your selected fields already include DATE(purchase_date) as date)
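Accounting for the missing dates in application code could look roughly like the sketch below. It is written in Python purely for illustration (the question does not say which language builds the graphs), fill_missing is a hypothetical helper name, and the 7-day step is an assumption taken from the weekly sample output in the question.

```python
from datetime import date, timedelta

def fill_missing(rows, start, end, step_days=7):
    """Return (date, count) pairs for every step between start and end,
    using 0 where the query returned no row for that date."""
    counts = dict(rows)
    filled = []
    current = start
    while current <= end:
        filled.append((current, counts.get(current, 0)))
        current += timedelta(days=step_days)
    return filled

# Rows as the 1st query would return them (dates with at least one sale only).
rows = [
    (date(2012, 1, 1), 1),
    (date(2012, 1, 8), 1),
    (date(2012, 1, 22), 2),
]

filled = fill_missing(rows, date(2012, 1, 1), date(2012, 1, 29))
for day, sales in filled:
    print(day.isoformat(), sales)
```

The printed list has five entries, with 0 filled in for 2012-01-15 and 2012-01-29.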