I have a purchase table with columns - item_key, price, recorded_at(datetime).
For any item, the price changes once in a while as we run promotional campaigns. For instance, the regular price for item_key 1 is 100; we reduce it to 80 for a week and then go back to the regular price. The promotional price can differ from one campaign to the next (next time it could be 60 instead of 80). We have around 100 items.
I am trying to write a query (mysql) to fetch number of days by item by price. My query is getting way too complex and taking more than acceptable time to return results.
I would greatly appreciate any help on this.
Thanks,
I am not entirely sure what you mean by "number of days by item by price".... I assume since this is a purchase table, each entry is a purchase made... If this is the case and you want to see how many distinct days there was a purchase for each item and a certain price, I think this should work:
SELECT item_key, price, COUNT(DISTINCT DATE(recorded_at)) AS days
FROM purchase
GROUP BY item_key, price
I think this should take every item_key/price pair and count how many distinct days there was a sale for that pair.
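Since recorded_at is a datetime, the count only reflects days if the time part is stripped first. Here is a minimal sketch of that, run against an in-memory SQLite database as a stand-in for MySQL (the sample rows are made up, and the table name purchase is assumed from the question):

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL purchase table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchase (item_key INTEGER, price INTEGER, recorded_at TEXT)"
)
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [
        (1, 100, "2024-01-01 09:00:00"),
        (1, 100, "2024-01-01 17:30:00"),  # same day, different time
        (1, 100, "2024-01-02 10:00:00"),
        (1, 80, "2024-01-03 11:00:00"),
    ],
)

# DATE() drops the time part, so two purchases on one day count once.
days_per_price = conn.execute(
    """
    SELECT item_key, price, COUNT(DISTINCT DATE(recorded_at)) AS days
    FROM purchase
    GROUP BY item_key, price
    ORDER BY item_key, price
    """
).fetchall()

for row in days_per_price:
    print(row)
```

Without DATE(), the two purchases on 2024-01-01 would be counted as two distinct values. Note this counts days with at least one purchase at a price, not days the price was in effect.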
I'm pretty new to SQL and I'm struggling with one of the questions on my exercise. How would I calculate average session length per daily active user? The table shown is just a sample of what the extended table is. Imagine loads more rows.
I simply used this query to calculate the daily active users:
SELECT COUNT (DISTINCT user_id)
FROM table1
and welcome to StackOverflow!
now, your question:
How would I calculate average session length per daily active user?
you already have the session time, and using AVG function you will get a simple average for all
select AVG(session_length_seconds) avg from table_1
but you want it per day... so you need to think of it as a group by day. So how do you get the day? You have activity_date as a Date entry, and it's easy to extract day, month and year from it, for example
select
DAY(activity_date) day,
MONTH(activity_date) month,
YEAR(activity_date) year
from
table_1
will break down the date field in columns you can use...
now, back to your question: it says daily active user, but all you have is sessions, and a user could have multiple sessions. From the context you have shared I have no idea how you want to handle that, and an average for each single session makes no sense as data to retrieve. So, just to get you started, I'll assume that you want the avg per day only
knowing how to get the average, let's create a query that has it all together:
select
DAY(activity_date) day,
MONTH(activity_date) month,
YEAR(activity_date) year,
AVG(session_length_seconds) avg
from
table_1
group by
DAY(activity_date),
MONTH(activity_date),
YEAR(activity_date)
will output the average of session_length_seconds per day/month/year
about the group by part: it needs to list every field in the select that does not do a calculation (sum, count, etc.). In our case avg does a calculation, so we don't group by that value, but we do group by the other 3 values, which gives us 3 columns with day, month and year. You can also use concat to join day, month and year into just one string if you prefer...
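If "per daily active user" is read as total session time on a day divided by the number of distinct users active that day (an assumption, since the question does not pin this down), the grouping above extends as in this sketch. It uses an in-memory SQLite database as a stand-in for MySQL, with made-up rows and the column names from the question:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_1 ("
    "user_id INTEGER, activity_date TEXT, session_length_seconds INTEGER)"
)
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100),
        (1, "2024-01-01", 200),  # user 1 has two sessions on the same day
        (2, "2024-01-01", 300),
        (1, "2024-01-02", 120),
    ],
)

# Per day: distinct active users, and total session seconds per active user.
per_day = conn.execute(
    """
    SELECT activity_date,
           COUNT(DISTINCT user_id) AS daily_active_users,
           SUM(session_length_seconds) * 1.0 / COUNT(DISTINCT user_id)
               AS avg_seconds_per_user
    FROM table_1
    GROUP BY activity_date
    ORDER BY activity_date
    """
).fetchall()

for row in per_day:
    print(row)
```

Grouping by the date column directly gives the same buckets as grouping by day, month and year separately. Using AVG(session_length_seconds) instead would give the average per session, which differs whenever a user has more than one session in a day.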
data table looks like this
Use a query to calculate average income per hour by day of week.
SELECT WEEKDAY(date_start_time), SUM(total_income)/SUM(DATEDIFF(hour,
date_start_time, date_end_time)) AS avg_income
FROM Deliveries
GROUP BY WEEKDAY(date_start_time)
Things to know:
Entry_id is a unique key for each time the employee comes into the office
There will be many records of the same user_id if an employee comes into the office repeatedly
Tasks completed will most likely stay unused in this question
Am I appropriately answering this question?
Things I am concerned about:
1) Does DATEDIFF only return an integer value? If that's the case, then to get a better estimate of avg_income, should we use DATEDIFF(minutes, ..., ...) and then calculate the hours with decimal places from that integer?
2) Are people working overnight shifts something that I need to worry about? How much more complicated would it make this query?
3) Moving onward if I was asked to "calculate the average earnings per hour during 9am to 5pm" does this mean I need to calculate this for each individual employee... or for each individual hour (ie. ultimately am I grouping by hour or by user_ID)?
1) Use timediff()
2) You will not only need to consider overnight shifts but you will need to consider overtime pay if they work > 40 hours in between the week start date and the week end date for a given week. This is only if employees are paid different hourly rates for these (ex.time and a half). If this is a factor then you will need to roll up your sleeves because it will be a full algorithm.
3) This depends on what you are trying to find the average by (user, day, etc.) but a simple way would be to just nest your select and grab an avg().
select avg(earnings) overall_average from
(select user, [calculated_earnings] as earnings from [table] where [conditions]) as per_user
select avg(earnings) overall_average from
(select weekday, [calculated_earnings] as earnings from [table] where [conditions]) as per_weekday
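To illustrate points 1 and 2, here is a minimal sketch that measures each shift in seconds and converts to fractional hours, so partial hours and shifts that cross midnight are both handled. It runs on an in-memory SQLite database as a stand-in; in MySQL the duration could instead be taken with TIMESTAMPDIFF(MINUTE, date_start_time, date_end_time) / 60 or TIME_TO_SEC(TIMEDIFF(date_end_time, date_start_time)) / 3600. The sample rows are made up, and the whole shift is attributed to the weekday it starts on, which is an assumption:

```python
import sqlite3

# In-memory SQLite database standing in for the Deliveries table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Deliveries ("
    "date_start_time TEXT, date_end_time TEXT, total_income REAL)"
)
conn.executemany(
    "INSERT INTO Deliveries VALUES (?, ?, ?)",
    [
        ("2024-01-01 09:00:00", "2024-01-01 10:30:00", 30.0),  # Monday, 1.5 h
        ("2024-01-01 22:00:00", "2024-01-02 00:30:00", 50.0),  # overnight, 2.5 h
        ("2024-01-02 09:00:00", "2024-01-02 11:00:00", 50.0),  # Tuesday, 2 h
    ],
)

# strftime('%w') is SQLite's day of week (0 = Sunday); strftime('%s') is
# epoch seconds, so the difference is the exact shift length in seconds.
income_per_hour = conn.execute(
    """
    SELECT CAST(strftime('%w', date_start_time) AS INTEGER) AS dow,
           SUM(total_income)
           / (SUM(CAST(strftime('%s', date_end_time) AS INTEGER)
                  - CAST(strftime('%s', date_start_time) AS INTEGER)) / 3600.0)
               AS income_per_hour
    FROM Deliveries
    GROUP BY dow
    ORDER BY dow
    """
).fetchall()

for row in income_per_hour:
    print(row)
```

Dividing the summed income by the summed hours weights long shifts more heavily than averaging each shift's own hourly rate would; which one is wanted depends on how the question is read.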
Have this query that I use to get the average price of the products in a product category for each of the last 30 days:
SELECT DATE(bsbp.date) AS pricedate, UNIX_TIMESTAMP(DATE(bsbp.date)) AS unixdate,
ROUND(AVG((bsbp.price / 100) * (bc.exchangerate / 100)), 0) AS avgprice
FROM bd_shopbikesprices bsbp, bd_categoriesshopbikes bcsb, bd_shopbikes bsb,
bd_shops bs, bd_currencies bc
WHERE bsbp.shopbikeid = bcsb.shopbikeid AND bcsb.categoryid = 94
AND bsbp.shopbikeid = bsb.id AND bsb.shopid = bs.id AND bs.feedcurrencyid = bc.id
AND bsbp.price > 0
GROUP BY DATE(bsbp.date) ORDER BY pricedate DESC LIMIT 30
Problem is that the table with the prices (bsbp) only contains price changes, i.e. the last price of each product where the price was different than the previous price of the product (or where the product was new and therefore didn't have a previous price).
Like this:
shopbikeid|date|price
890061|2016-07-27 02:50:01|29999
890061|2016-07-21 03:21:51|49999
890061|2016-07-17 21:20:55|29999
890061|2016-06-30 04:41:36|49999
Currently the query takes the average new prices for each day, which isn't the actual average price since the average new prices only covers the products where the price was changed/new products.
My question is how the query should be rewritten so each daily average is the average price of all products on that day, including products where the prices was changed before that day.
Can it be done somehow with a nice query? (the database is a MySQL database)
I had a similar case and solved it using the following approach:
Create a temporary table tmpLatestDates from your price table, in which you group by the product and use MAX(date)
Create another temporary table tmpLatestPrices: Join tmpLatestDates with your price table on product and date, only keeping the rows from tmpLatestDates. This gives you the latest price for each product.
Run your original query on tmpLatestPrices
When you do this with large datasets you'll want to add indexes to the temporary tables after you create them. Also don't forget to drop the temporary tables once you're done.
The most practical way of handling it would be to put all queries in a stored procedure.
Edit: You can follow the same logic using subqueries, but I find the temp. table approach easier to follow plus it simplifies maintenance later on.
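For a per-day average rather than a single latest price, the same "latest row per product" idea can be applied as of each day. The sketch below is one possible way to do that with a correlated subquery; it is not the temp-table procedure described above. It assumes a hypothetical calendar table with one row per day, uses an in-memory SQLite database as a stand-in for MySQL, and leaves out the category filter and currency conversion from the original query:

```python
import sqlite3

# In-memory SQLite database; bd_shopbikesprices is reduced to three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bd_shopbikesprices (shopbikeid INTEGER, date TEXT, price INTEGER)"
)
conn.execute("CREATE TABLE calendar (day TEXT)")  # hypothetical helper table
conn.executemany(
    "INSERT INTO bd_shopbikesprices VALUES (?, ?, ?)",
    [
        (1, "2016-07-01 03:00:00", 100),
        (1, "2016-07-03 02:00:00", 80),   # product 1 changes price on day 3
        (2, "2016-07-02 04:00:00", 200),  # product 2 first appears on day 2
    ],
)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [("2016-07-01",), ("2016-07-02",), ("2016-07-03",)],
)

# For each day, keep only the price row that is the most recent one for its
# product on or before that day, then average over those rows.
daily_avg = conn.execute(
    """
    SELECT c.day, AVG(p.price) AS avgprice
    FROM calendar c
    JOIN bd_shopbikesprices p
      ON p.date = (SELECT MAX(p2.date)
                   FROM bd_shopbikesprices p2
                   WHERE p2.shopbikeid = p.shopbikeid
                     AND DATE(p2.date) <= c.day)
    GROUP BY c.day
    ORDER BY c.day
    """
).fetchall()

for row in daily_avg:
    print(row)
```

On day 2 the average covers both products even though only product 2 changed that day. On a large price table an index on (shopbikeid, date) would likely matter for the correlated subquery.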
(Table names in quotes)
Let's say there are "users" that try to sell "products". They earn a commission on all "product_sales" (id, product_id, user_id, total, sale_date). I want to somehow store their commission rate based on certain dates. For example, a user will earn 1% from 2015-01-01 to 2015-01-15, 2% from 2015-01-16 to 2015-01-28, and 3% from 2015-01-29 onwards.
I want to run a query that calculates the total commissions for a user in January.
I want to run a query that calculates daily earnings in January.
How do I store the commission rates? One idea was having a table "user_commissions" that has (id, user_id, commission_rate, from_date, to_date). It would be easy to calculate the rate for (1) if commissions stayed the same, in which case I'd do this:
SELECT (sum(total) * 0.01) as total_commissions FROM product_sales WHERE user_id = 5 and sale_date between '2015-01-01' and '2015-01-31'
But with commission rates variable this is more complex. I need to somehow join the commissions table on each sale to get the right totals.
Another question would be:
How do I store the users' current commission rate that doesn't have an expiration date and include that in the reports? In my example, "3% from 2015-01-29 onwards". This has no end date.
Your table structure is a very reasonable structure and often used for slowly changing dimensions. Storing the effective and end dates in the structure is important for efficiency.
One way to store to_date for the most recent commission is to use NULL. This allows you to do:
select *
from commissions
where to_date is null
to get the most recent record.
However, a better way is to use some far distant date, such as '9999-12-12'. This allows you to get the most recent commission using:
where curdate() between from_date and to_date
This is an expression that can also make use of an index on from_date, to_date.
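With a far-future to_date on the open-ended rate, each sale can be matched to its rate with a plain BETWEEN join. A minimal sketch of both reports from the question, using the user_commissions and product_sales layouts described there; it runs on an in-memory SQLite database as a stand-in for MySQL, the sample rows are made up, and '9999-12-31' is just one possible sentinel:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_commissions ("
    "user_id INTEGER, commission_rate REAL, from_date TEXT, to_date TEXT)"
)
conn.execute(
    "CREATE TABLE product_sales (user_id INTEGER, total REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_commissions VALUES (?, ?, ?, ?)",
    [
        (5, 0.01, "2015-01-01", "2015-01-15"),
        (5, 0.02, "2015-01-16", "2015-01-28"),
        (5, 0.03, "2015-01-29", "9999-12-31"),  # open-ended current rate
    ],
)
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?, ?)",
    [(5, 1000.0, "2015-01-10"), (5, 500.0, "2015-01-20"), (5, 200.0, "2015-01-30")],
)

# Each sale joins to the one rate whose date range contains the sale date.
JOINED = """
    FROM product_sales s
    JOIN user_commissions c
      ON c.user_id = s.user_id
     AND s.sale_date BETWEEN c.from_date AND c.to_date
    WHERE s.user_id = 5
      AND s.sale_date BETWEEN '2015-01-01' AND '2015-01-31'
"""

# (1) Total commissions for the user in January.
total_commissions = conn.execute(
    "SELECT SUM(s.total * c.commission_rate)" + JOINED
).fetchone()[0]

# (2) Daily earnings in January.
daily_earnings = conn.execute(
    "SELECT s.sale_date, SUM(s.total * c.commission_rate)"
    + JOINED
    + " GROUP BY s.sale_date ORDER BY s.sale_date"
).fetchall()

print(total_commissions)
print(daily_earnings)
```

This relies on the date ranges for a user not overlapping; if two ranges covered the same day, a sale on that day would be counted twice.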
Honestly, I would store user commission percentages and the effective dates of those commissions in one table.
TABLE: COMMISSION
user_id, date_effective, commission
In the other table I would store sales data. With each sale, I would keep the commission the salesperson got on the sale. (Added bonus, you can change the commission on any sale, like an override of sorts.)
TABLE: SALE
sale_id, sale_date, user_id, sale_amount, commission
When you create the row in your program, you can grab the correct commission rate using the following query:
SELECT commission FROM commission WHERE user_id=[user's id] AND date_effective<=[sale date, today] ORDER BY date_effective DESC LIMIT 1;
MySQL Left Joins, and SQL in general, can get really tricky when trying to join on dates that don't exactly match up. (Looking back, basically.) I am struggling with the same problem right now without the luxury of the solution I just suggested.
(This whole post is based on the assumption that you aren't going to be directly interacting with this database through a DBMS but instead through an application.)
I'm looking to make some bar graphs to count item sales by day, month, and year. The problem that I'm encountering is that my simple MySQL queries only return counts where there are values to count. It doesn't magically fill in dates where dates don't exist and item sales=0. This is causing me problems when trying to populate a table, for example, because all weeks in a given year aren't represented, only the weeks where items were sold are represented.
My tables and fields are as follows:
items table: account_id and item_id
// table keeping track of owners' items
items_purchased table: purchaser_account_id, item_id, purchase_date
// table keeping track of purchases by other users
calendar table: datefield
//table with all the dates incremented every day for many years
here's the 1st query I was referring to above:
SELECT COUNT(*) as item_sales, DATE(purchase_date) as date
FROM items_purchased join items on items_purchased.item_id=items.item_id
where items.account_id=125
GROUP BY DATE(purchase_date)
I've read that I should join a calendar table with the tables where the counting takes place. I've done that, but now I can't get the first query to play nice with this 2nd query, because the join in the first query eliminates dates from the query result where item sales are 0.
here's the 2nd query which needs to be merged with the 1st query somehow to produce the results i'm looking for:
SELECT calendar.datefield AS date, IFNULL(SUM(purchaseyesno),0) AS item_sales
FROM items_purchased join items on items_purchased.item_id=items.item_id
RIGHT JOIN calendar ON (DATE(items_purchased.purchase_date) = calendar.datefield)
WHERE (calendar.datefield BETWEEN (SELECT MIN(DATE(purchase_date))
FROM items_purchased) AND (SELECT MAX(DATE(purchase_date)) FROM items_purchased))
GROUP BY date
// this lists the sales/day
// to make it per week, change the group by to this: GROUP BY week(date)
The failure of this 2nd query is that it doesn't count item_sales by account_id (the person trying to sell the item to the purchaser_account_id users). The 1st query does but it doesn't have all dates where the item sales=0. So yeah, frustrating.
Here's how I'd like the resulting data to look (NOTE: these are what account_id=125 has sold, other people may have different numbers during this time frame):
2012-01-01 1
2012-01-08 1
2012-01-15 0
2012-01-22 2
2012-01-29 0
Here's what the 1st query currently returns:
2012-01-01 1
2012-01-08 1
2012-01-22 2
If someone could provide some advice on this I would be hugely grateful.
I'm not quite sure about the problem you're getting as I don't know the actual tables and data they contain that generates those results (that would help a lot!). However, let's try something. Use this condition:
where (items.account_id = 125 or items.account_id is null) and (other-conditions)
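One caveat with that condition: a date on which only other accounts sold something still has a non-null account_id, so the WHERE clause would drop it. A sketch of an alternative, assuming the table layouts from the question, starts from the calendar and puts the account filter in the join condition instead. It runs on an in-memory SQLite database as a stand-in for MySQL, with made-up rows:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (account_id INTEGER, item_id INTEGER)")
conn.execute(
    "CREATE TABLE items_purchased ("
    "purchaser_account_id INTEGER, item_id INTEGER, purchase_date TEXT)"
)
conn.execute("CREATE TABLE calendar (datefield TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(125, 1), (126, 2)])
conn.executemany(
    "INSERT INTO items_purchased VALUES (?, ?, ?)",
    [
        (7, 1, "2012-01-01 10:00:00"),
        (8, 2, "2012-01-02 11:00:00"),  # sale of another account's item
        (9, 1, "2012-01-04 09:00:00"),
        (7, 1, "2012-01-04 15:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [("2012-01-01",), ("2012-01-02",), ("2012-01-03",), ("2012-01-04",)],
)

# Every calendar date survives the LEFT JOINs; COUNT(i.item_id) only counts
# purchases whose item belongs to account 125, and is 0 otherwise.
sales_per_day = conn.execute(
    """
    SELECT c.datefield AS date, COUNT(i.item_id) AS item_sales
    FROM calendar c
    LEFT JOIN items_purchased ip
           ON DATE(ip.purchase_date) = c.datefield
    LEFT JOIN items i
           ON i.item_id = ip.item_id
          AND i.account_id = 125
    GROUP BY c.datefield
    ORDER BY c.datefield
    """
).fetchall()

for row in sales_per_day:
    print(row)
```

The date-range restriction from the 2nd query can be added back as a WHERE on c.datefield, since filtering on the calendar side does not remove zero-sale days.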
Your first query is perfectly acceptable. The fact is you don't have data in the mysql table and therefore it can't group any data together. This is fine. You can account for this in your code so that if the date does not exist, then obviously there's no data to graph. You can better account for this by ordering the date value so you can loop through it accordingly and look for missed days.
Also, to avoid calling the DATE() function twice, you can change the GROUP BY to GROUP BY date (because you already select DATE(purchase_date) as date)