I'm currently developing a program that will generate reports based on lead data. My issue is that I'm running 3 queries for something that I would like to do with only one query. For instance, I want to gather data for leads generated in the past day:
submission_date > (NOW() - INTERVAL 1 DAY)
and I would like to find out how many leads there were in total and how many of them were sold in that timeframe (sold = 1 / sold = 0). The issue is that this is currently done with 2 queries, one with WHERE sold = 1 and one with WHERE sold = 0. This is all well and good, but when I want to generate this data for the past day, week, month, year, and all time, I will have to run 10 queries. I feel like there HAS to be a more efficient way of doing this. I know I could create a MySQL function for this, but I don't see how that would solve the problem.
Thanks!!
Why not GROUP BY sold, so you get the totals for sold and unsold leads?
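A minimal sketch of that suggestion, using Python's built-in sqlite3 module as a stand-in for MySQL (the GROUP BY itself is the same in both). The `leads` table layout and its rows are hypothetical. One query returns one row per value of `sold`:

```python
import sqlite3

def leads_by_sold(conn, since):
    """One (sold, count) row per value of `sold` for leads submitted after `since`."""
    return conn.execute(
        "SELECT sold, COUNT(*) FROM leads WHERE submission_date > ? "
        "GROUP BY sold ORDER BY sold",
        (since,),
    ).fetchall()

# Hypothetical sample data; ISO-8601 strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, submission_date TEXT, sold INTEGER)")
conn.executemany(
    "INSERT INTO leads (submission_date, sold) VALUES (?, ?)",
    [("2024-05-10 09:00:00", 1), ("2024-05-10 11:30:00", 0),
     ("2024-05-10 15:45:00", 0), ("2024-05-01 08:00:00", 1)],
)
print(leads_by_sold(conn, "2024-05-09 12:00:00"))  # [(0, 2), (1, 1)]
```

Adding the counts gives the total. A `sold` value with no matching leads produces no row at all, so the caller should treat a missing value as 0. This still needs one query per time frame, though.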
One way to do this is to exploit the aggregate functions (usually SUM and COUNT help you the most in this situation) along with MySQL's IF() function.
For example, you could use a query such as:
SELECT
SUM(IF(sold = 1, 1, 0)) AS TotalSold,
SUM(IF(sold = 0, 1, 0)) AS TotalUnsold,
SUM(IF(submission_date > (NOW() - INTERVAL 1 WEEK)
AND sold = 1, 1, 0)) AS TotalSoldThisWeek
FROM ...
WHERE ...
The condition (e.g. sold = 1) can be as complex as you want by combining conditions with AND and OR.
Disclaimer: this code wasn't tested; it is just an example that should work with minor modifications.
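The same idea can cover all five time frames from the question in a single query. The sketch below again uses Python's sqlite3 as a stand-in for MySQL, so CASE WHEN ... THEN 1 ELSE 0 END replaces IF(..., 1, 0), and SQLite's datetime(:now, '-7 days') replaces NOW() - INTERVAL 1 WEEK. The reference time is passed in instead of using NOW() so the result is reproducible. The table and rows are hypothetical.

```python
import sqlite3

# Hypothetical reporting windows as SQLite date modifiers
# (MySQL equivalents: NOW() - INTERVAL 1 DAY, INTERVAL 1 WEEK, ...).
WINDOWS = {"day": "-1 day", "week": "-7 days", "month": "-1 month", "year": "-1 year"}

def build_report_sql():
    # Each column is a conditional aggregate: CASE yields 1 for matching rows and
    # 0 otherwise, so SUM counts the matching rows (like MySQL's IF(cond, 1, 0)).
    cols = ["COUNT(*) AS total_all",
            "SUM(CASE WHEN sold = 1 THEN 1 ELSE 0 END) AS sold_all"]
    for name, offset in WINDOWS.items():
        # The offsets are fixed constants above, so formatting them into the SQL is safe.
        recent = f"submission_date > datetime(:now, '{offset}')"
        cols.append(f"SUM(CASE WHEN {recent} THEN 1 ELSE 0 END) AS total_{name}")
        cols.append(f"SUM(CASE WHEN {recent} AND sold = 1 THEN 1 ELSE 0 END) AS sold_{name}")
    return "SELECT " + ",\n       ".join(cols) + "\nFROM leads"

def lead_report(conn, now):
    """All ten figures from a single scan of the table, returned as a dict."""
    cur = conn.execute(build_report_sql(), {"now": now})
    names = [col[0] for col in cur.description]
    return dict(zip(names, cur.fetchone()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, submission_date TEXT, sold INTEGER)")
conn.executemany(
    "INSERT INTO leads (submission_date, sold) VALUES (?, ?)",
    [("2024-05-10 09:00:00", 1), ("2024-05-10 11:00:00", 0),
     ("2024-05-06 10:00:00", 1), ("2024-04-20 10:00:00", 0),
     ("2023-12-01 10:00:00", 1), ("2022-01-01 10:00:00", 0)],
)
print(lead_report(conn, "2024-05-10 12:00:00"))
```

In MySQL a comparison already evaluates to 1 or 0, so SUM(sold = 1) is a common shorthand for SUM(IF(sold = 1, 1, 0)).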
I know there are lots of possible duplicates of this question; I tested all I could find, but I still have trouble understanding the logic of the query.
I have a bar chart that I want to connect to a query showing how many "seatbelt", "Speeding", etc. violations happened per day.
The query I have now is:
select violationType as 'Violations', count( DISTINCT violationType) as 'Total', day(violationDateTime) as 'Day' from traffic_violations group by day(violationDateTime);
The result:
Do you think the result is correct? What I see is that "wrong parking" is repeated for each day. Yes, I know it shows how many wrong parking violations happened on days 1, 2 and 3, but what about "Speeding"? Did it not happen on day 1, 2 or 3?
Basically, I want to create a daily chart in the dashboard that starts at 0 at the beginning of the day, counts up the traffic violations until the end of the day, and then resets the next day.
I also tried this:
select count(id) as 'Total', violationType as 'Violation', violationDateTime as 'Date' from traffic_violations group by violationType, day(violationDateTime);
results in:
I think this is more correct than the above. Is it?
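In the first query, COUNT(DISTINCT violationType) counts how many different violation types occurred on each day, and the non-aggregated violationType column shows only one value from each day's group (MySQL accepts this only when ONLY_FULL_GROUP_BY is disabled), which would explain why "Speeding" seems to disappear. The second query, grouping by both type and day and counting rows, matches what a per-day, per-type chart needs. One further point: MySQL's DAY() returns the day of the month, so 1 March and 1 April land in the same group, while grouping on DATE(violationDateTime) keeps them apart. A sketch in SQLite via Python, where date() plays the role of MySQL's DATE(); the column names follow the question, the rows are hypothetical:

```python
import sqlite3

def violations_per_day(conn):
    """One row per (calendar date, violation type) that actually occurs, with its count."""
    return conn.execute(
        """
        SELECT date(violationDateTime) AS violation_day,
               violationType,
               COUNT(*) AS total
        FROM traffic_violations
        GROUP BY violation_day, violationType
        ORDER BY violation_day, violationType
        """
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic_violations "
             "(id INTEGER PRIMARY KEY, violationType TEXT, violationDateTime TEXT)")
conn.executemany(
    "INSERT INTO traffic_violations (violationType, violationDateTime) VALUES (?, ?)",
    [("Wrong parking", "2024-03-01 08:00:00"), ("Wrong parking", "2024-03-01 17:30:00"),
     ("Speeding", "2024-03-01 12:00:00"), ("Seatbelt", "2024-03-02 09:15:00"),
     ("Wrong parking", "2024-04-01 10:00:00")],
)
for row in violations_per_day(conn):
    print(row)
```

A violation type with no rows on a given day simply does not appear, so a chart that needs an explicit 0 has to fill those gaps itself.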
I started an HR management project, and I want to count the days between 2 dates without counting holidays and weekends, so that HR can count an employee's days off.
Here's the case: I want to count between 2018-02-14 and 2018-02-20, where there is an office holiday on 2018-02-16. The result should be 3 days.
I have already created a table called tbl_holiday in which I put all the weekends and holidays for one year.
I found this post, and I tried it on my MariaDB.
Here's my query:
SELECT 5 * (DATEDIFF('2018-02-20', '2018-02-14') DIV 7) +
MID('0123444401233334012222340111123400012345001234550', 7 *
WEEKDAY('2018-02-14') + WEEKDAY('2018-02-20') + 1, 1) -
(SELECT COUNT(dates) FROM tbl_holiday WHERE dates NOT IN (SELECT dates FROM tbl_holiday)) as Days
The query works, but the result is 4 days, not 3. It seems the query only excludes the weekends, but not the holiday.
What is wrong with my query? Am I missing something? Thank you for your help.
@RichardDoe, from the question comments.
In a reasonable implementation of a date table, you create a list of all days (covering a sufficient range to cope with any query you may run against it; 15 years each way from today is probably a useful minimum), and alongside each day you store a variety of derived attributes.
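The examples in this answer are in SQL Server syntax. Purely as an illustration of the same structure, here is a sketch that materialises such a date table in SQLite from Python, using a recursive CTE to generate the days. The table and column names are hypothetical, only a few derived attributes are shown, and the range is a single year to keep the example small.

```python
import sqlite3

def build_date_table(conn, start, end):
    """Create and fill a stored date table: one row per day plus derived attributes."""
    conn.execute("""
        CREATE TABLE date_table (
            calendar_date TEXT PRIMARY KEY,   -- 'YYYY-MM-DD'
            year          INTEGER NOT NULL,
            month         INTEGER NOT NULL,
            day_of_month  INTEGER NOT NULL,
            day_of_week   INTEGER NOT NULL,   -- SQLite %w: 0 = Sunday ... 6 = Saturday
            is_weekend    INTEGER NOT NULL
        )""")
    conn.execute("""
        WITH RECURSIVE days(d) AS (          -- every date from start to end, inclusive
            SELECT date(:start)
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < date(:end)
        )
        INSERT INTO date_table
        SELECT d,
               CAST(strftime('%Y', d) AS INTEGER),
               CAST(strftime('%m', d) AS INTEGER),
               CAST(strftime('%d', d) AS INTEGER),
               CAST(strftime('%w', d) AS INTEGER),
               CASE WHEN strftime('%w', d) IN ('0', '6') THEN 1 ELSE 0 END
        FROM days""", {"start": start, "end": end})

conn = sqlite3.connect(":memory:")
build_date_table(conn, "2018-01-01", "2018-12-31")
print(conn.execute(
    "SELECT * FROM date_table WHERE calendar_date = '2018-02-17'").fetchone())
# ('2018-02-17', 2018, 2, 17, 6, 1)
```

For real use, the range would be much wider, for example build_date_table(conn, "2003-01-01", "2033-12-31"), and more attributes (quarter, ISO week, fiscal period and so on) would be added as columns.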
I wrote a Q&A recently with basic tools that would get you started in SQL Server: https://stackoverflow.com/a/48611348/9129668
Unfortunately, I don't have a MySQL environment, or enough familiarity with MySQL, to write or adapt queries off the top of my head (as I'm doing here), but I hope this illustrates the structure of a solution for you in SQL Server syntax.
Building on the answer I linked to (which generates a date table on the fly), extending it with your holiday table (and making some inferences about how you've defined that table), and noting that a working day is any day Mon-Fri that isn't a holiday, you'd write a query like this to get the number of working days between any two dates:
WITH
dynamic_date_table AS
(
SELECT *
FROM generate_series_datetime2('2000-01-01','2030-12-31',1)
CROSS APPLY datetime2_params_fxn(datetime2_value)
)
,date_table_ext1 AS
(
SELECT
ddt.*
,IIF(hol.dates IS NOT NULL, 1, 0) AS is_company_holiday
FROM
dynamic_date_table AS ddt
LEFT JOIN
tbl_holiday AS hol
ON (hol.dates = ddt.datetime2_value)
)
,date_table_ext2 AS
(
SELECT
*
,IIF(is_weekend = 1 OR is_company_holiday = 1, 0, 1) AS is_company_work_day
FROM date_table_ext1
)
SELECT
COUNT(datetime2_value)
FROM
date_table_ext2
WHERE
(datetime2_value BETWEEN '2018-02-14' AND '2018-02-20')
AND
(is_company_work_day = 1)
Obviously, the idea for a well-factored solution is that these intermediate calculations (being general in nature to the entire company) get rolled into datetime2_params_fxn, so that any query run against the database gains access to the predefined list of company workdays. Queries run against it then start to resemble plain English (rather than the approach you linked to and adapted in your question, which is ingenious but far from clear).
If you want top performance (which will be relevant if you are hitting these calculations heavily) then you define appropriate parameters, save the lot into a stored date table, and index that table appropriately. This way, your query would become as simple as the final part of the query here, but referencing the stored date table instead of the with-block.
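A sketch of that stored-table variant, in SQLite via Python rather than SQL Server or MySQL: the day flags, including is_company_work_day, are computed once, stored and indexed, so the query that answers the question is a plain COUNT. The `date_table` name and the single-column layout of `tbl_holiday` are assumptions.

```python
import sqlite3

def build_workday_table(conn, start, end):
    """Materialise one row per day with precomputed flags, then index it."""
    conn.execute("""
        CREATE TABLE date_table (
            calendar_date       TEXT PRIMARY KEY,   -- 'YYYY-MM-DD'
            is_weekend          INTEGER NOT NULL,
            is_company_holiday  INTEGER NOT NULL,
            is_company_work_day INTEGER NOT NULL
        )""")
    conn.execute("""
        WITH RECURSIVE days(d) AS (
            SELECT date(:start)
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < date(:end)
        ),
        flagged AS (
            SELECT days.d,
                   CASE WHEN strftime('%w', days.d) IN ('0', '6') THEN 1 ELSE 0 END AS wk,
                   CASE WHEN h.dates IS NOT NULL THEN 1 ELSE 0 END AS hol
            FROM days LEFT JOIN tbl_holiday AS h ON h.dates = days.d
        )
        INSERT INTO date_table
        SELECT d, wk, hol, CASE WHEN wk = 1 OR hol = 1 THEN 0 ELSE 1 END
        FROM flagged""", {"start": start, "end": end})
    # Composite index so the flag + date-range filter below can be served from the index.
    conn.execute("CREATE INDEX ix_workday ON date_table (is_company_work_day, calendar_date)")

def count_work_days(conn, first, last):
    """Work days from `first` to `last`, both inclusive (BETWEEN is inclusive)."""
    return conn.execute(
        "SELECT COUNT(*) FROM date_table "
        "WHERE is_company_work_day = 1 AND calendar_date BETWEEN ? AND ?",
        (first, last),
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_holiday (dates TEXT PRIMARY KEY)")  # hypothetical layout
conn.execute("INSERT INTO tbl_holiday VALUES ('2018-02-16')")
build_workday_table(conn, "2018-01-01", "2018-12-31")
print(count_work_days(conn, "2018-02-14", "2018-02-20"))  # 4
```

BETWEEN includes both end points, so 2018-02-14 to 2018-02-20 gives 4 work days (14, 15, 19 and 20 February). The 3 expected in the question corresponds to leaving out one end point, as DATEDIFF does, e.g. counting from 2018-02-15.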
The sequentially numbered workdays I referred to in my comment on your question are a further step for the efficiency and indexability of certain types of queries against a date table, but I won't complicate this answer any further for now. If any further clarification is required, please feel free to ask.
I found the answer to this problem.
It turns out I just needed simple arithmetic:
SELECT (SELECT DATEDIFF('2018-02-20', '2018-02-14')) - (SELECT COUNT(id) FROM tbl_holiday WHERE dates BETWEEN '2018-02-14' AND '2018-02-20');
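The same arithmetic as a SQLite sketch run from Python, where the difference of two julianday() values stands in for MySQL's DATEDIFF(). It relies, as the question says, on tbl_holiday listing weekends as well as holidays (the id column and sample rows are hypothetical). One deliberate change from the query above: holidays are counted over the same range DATEDIFF covers (after the start date, up to and including the end date), so a start date that is itself a weekend or holiday is not subtracted.

```python
import sqlite3

def days_off(conn, start, end):
    """Calendar days after `start` up to and including `end`, minus listed days off in that range."""
    return conn.execute(
        """
        SELECT CAST(julianday(:end) - julianday(:start) AS INTEGER)  -- like DATEDIFF(end, start)
               - (SELECT COUNT(*) FROM tbl_holiday
                  WHERE dates > :start AND dates <= :end)          -- same half-open range
        """,
        {"start": start, "end": end},
    ).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_holiday (id INTEGER PRIMARY KEY, dates TEXT UNIQUE)")
# As in the question, the table lists weekends as well as company holidays.
conn.executemany("INSERT INTO tbl_holiday (dates) VALUES (?)",
                 [("2018-02-16",), ("2018-02-17",), ("2018-02-18",)])
print(days_off(conn, "2018-02-14", "2018-02-20"))  # 3
```

With BETWEEN instead, the range 2018-02-17 to 2018-02-20 would give 1 rather than 2 (only 19 and 20 February are working days), because Saturday the 17th would be subtracted even though DATEDIFF never counted it.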
I am creating a pipeline report so we can count opportunities that have been added each week.
My query is:
SELECT
sum(IF(o.date_entered > date_sub(now(), INTERVAL 1 WEEK), 1,0))
Pretty simple, and it works. The problem is that sales now also wants to count any opportunity that has been moved out of a loss status as new. So I left-joined to an audit table to cover this use case. But now it counts every audit-table row for a given account where the field = sales_stage and the before_value is a loss status. So if an opportunity moves from loss to lead, back to loss, and back to lead (not that this would happen often, if ever), it is counted as 2 new opportunities. I just want to get the latest instance where field = sales_stage and before_value is a loss status, and count it once.
I want something like a sub-query in the left join, and I keep trying to use MAX, but nothing's working. Here's part of my join:
INNER JOIN opportunities o ON ao.opportunity_id=o.id
LEFT JOIN opportunities_audit oa ON o.id=oa.parent_id
AND after_value_string = 'Loss'
AND date_created > date_sub(now(), INTERVAL 1 WEEK)
Does anybody know the solution to this type of problem? Thank you in advance for any advice!
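One way to count each opportunity at most once is to collapse the audit rows in a derived table grouped by parent_id before the LEFT JOIN, which is also where MAX() fits. A sketch in SQLite via Python (so CASE replaces IF(), and the cutoff is a parameter instead of date_sub(now(), INTERVAL 1 WEEK)). The columns field_name and before_value_string, and all rows, are assumptions; the filter uses before_value_string = 'Loss' following the prose, whereas the join above filters on after_value_string.

```python
import sqlite3

NEW_OPPS_SQL = """
SELECT SUM(CASE WHEN o.date_entered > :cutoff OR reopened.parent_id IS NOT NULL
                THEN 1 ELSE 0 END) AS new_opportunities
FROM opportunities AS o
LEFT JOIN (
    -- GROUP BY parent_id collapses the audit trail to at most one row per opportunity;
    -- MAX() keeps the date of the latest move out of 'Loss' inside the window.
    SELECT parent_id, MAX(date_created) AS last_reopened
    FROM opportunities_audit
    WHERE field_name = 'sales_stage'
      AND before_value_string = 'Loss'
      AND date_created > :cutoff
    GROUP BY parent_id
) AS reopened ON reopened.parent_id = o.id
"""

def count_new_opportunities(conn, cutoff):
    return conn.execute(NEW_OPPS_SQL, {"cutoff": cutoff}).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE opportunities (id INTEGER PRIMARY KEY, date_entered TEXT)")
conn.execute("CREATE TABLE opportunities_audit (parent_id INTEGER, field_name TEXT, "
             "before_value_string TEXT, after_value_string TEXT, date_created TEXT)")
conn.executemany("INSERT INTO opportunities VALUES (?, ?)",
                 [(1, "2024-05-05"), (2, "2024-01-01"), (3, "2024-01-01"),
                  (4, "2024-01-01"), (5, "2024-05-04")])
conn.executemany("INSERT INTO opportunities_audit VALUES (?, 'sales_stage', ?, ?, ?)", [
    (2, "Loss", "Lead", "2024-05-04"),  # moved out of Loss
    (2, "Lead", "Loss", "2024-05-05"),  # moved back to Loss
    (2, "Loss", "Lead", "2024-05-06"),  # moved out again: still one opportunity
    (4, "Loss", "Lead", "2024-04-01"),  # reopened before the window
    (5, "Loss", "Lead", "2024-05-06"),  # new AND reopened: counted once
])
print(count_new_opportunities(conn, "2024-05-03"))  # 3
```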
For a task in a managerial accounting context, I wrote a relatively large SQL query against a MySQL database. The query has close to 600 lines and produces a large table with the economic analysis for different products.
This works fine so far, and the query takes only about 3 seconds.
But the outcome is the analysis for only one month. Now we would like to run the query for several months and aggregate the results.
I could simply change the query to use a larger time period as a condition (currently just one month). But that would lead to an incorrect (averaged) distribution of overhead costs, because it would ignore larger monthly fluctuations in certain key figures.
Therefore, I think I would have to generate one (sub-)table per month I want to analyze and finally aggregate all these sub-tables in a superordinate main query. That should probably work, but the query would then be really large: for 12 months, I would need about 12 x 600 lines for the subqueries and about another 100 lines for the main query.
This leads to my question: is this how one would do it? Not knowing any better, it seems to me an unusually large query that might also be cumbersome to maintain. What would be the best-practice way to accomplish this task?
Thank you
If the data is static once the month is over, you can run your SELECT at the beginning of each month (to calculate the previous month) and store the result in a table with an extra column "month".
insert into monthly_aggregation (month, ...)
select ... <600 lines of SQL for specific month>
This can be triggered at the beginning of every month.
If historical data can change, you have to rebuild the whole table by running the INSERT ... SELECT once per month.
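A sketch of this approach in SQLite via Python. The real 600-line query is replaced by a hypothetical stand-in that allocates each month's overhead by that month's revenue share, and the `sales`, `overhead` and `monthly_aggregation` tables are made up. Deleting and re-inserting a month inside one transaction covers both the regular monthly run and a rebuild after historical changes.

```python
import sqlite3

# Hypothetical stand-in for the ~600-line monthly analysis: each product's share of
# the month's overhead, allocated by its share of that same month's revenue.
MONTHLY_INSERT = """
INSERT INTO monthly_aggregation (month, product_id, revenue, overhead_share)
SELECT month, product_id, SUM(revenue),
       (SELECT amount FROM overhead WHERE month = :month) * SUM(revenue)
       / (SELECT SUM(revenue) FROM sales WHERE month = :month)
FROM sales
WHERE month = :month
GROUP BY month, product_id
"""

def refresh_month(conn, month):
    """(Re)build one month's rows; run at the start of each month for the month just closed."""
    with conn:  # one transaction: old rows are deleted and new ones inserted together
        conn.execute("DELETE FROM monthly_aggregation WHERE month = ?", (month,))
        conn.execute(MONTHLY_INSERT, {"month": month})

def totals(conn, first, last):
    """Aggregate the stored monthly results over a range of months."""
    return conn.execute(
        "SELECT product_id, SUM(revenue), SUM(overhead_share) FROM monthly_aggregation "
        "WHERE month BETWEEN ? AND ? GROUP BY product_id ORDER BY product_id",
        (first, last),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (month TEXT, product_id TEXT, revenue REAL);
    CREATE TABLE overhead (month TEXT PRIMARY KEY, amount REAL);
    CREATE TABLE monthly_aggregation (month TEXT, product_id TEXT,
                                      revenue REAL, overhead_share REAL);
    INSERT INTO sales VALUES ('2024-01', 'A', 100), ('2024-01', 'B', 300),
                             ('2024-02', 'A', 100), ('2024-02', 'B', 100);
    INSERT INTO overhead VALUES ('2024-01', 80), ('2024-02', 200);
""")
for m in ("2024-01", "2024-02"):
    refresh_month(conn, m)
print(totals(conn, "2024-01", "2024-02"))  # [('A', 200.0, 120.0), ('B', 400.0, 160.0)]
```

Allocating both months' overhead (280) over their combined revenue (600) would instead give product A about 93.33 rather than 120, which is the averaging effect the question wants to avoid. The monthly run could be triggered by cron or, in MySQL, by the event scheduler (CREATE EVENT).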
Let's say this is your query showing products for a particular month:
select product_id, sum(purchased), avg(price), ...
from <many tables>
where month = 6
group by product_id;
Then you can change it like this to show product data per month:
select month, product_id, sum(purchased), avg(price), ...
from <many tables>
group by month, product_id;
You can then work on this with an outer query:
select ...
from
(
select month, product_id, sum(purchased), avg(price), ...
from <many tables>
group by product_id, month
) product_and_month
group by ...;
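A runnable sketch of the derived-table version in SQLite via Python, with a hypothetical stand-in for the monthly figures (overhead allocated by revenue share within each month). The month range is a parameter, so the long monthly body would appear once in the query rather than once per month; the tables and rows are made up.

```python
import sqlite3

# Inner query: one row per (month, product), with overhead allocated inside each month.
# Outer query: aggregates those monthly rows over the requested range of months.
MULTI_MONTH_SQL = """
SELECT product_id, SUM(revenue) AS revenue, SUM(overhead_share) AS overhead_share
FROM (
    SELECT s.month, s.product_id, SUM(s.revenue) AS revenue,
           oh.amount * SUM(s.revenue) / t.month_revenue AS overhead_share
    FROM sales AS s
    JOIN overhead AS oh ON oh.month = s.month
    JOIN (SELECT month, SUM(revenue) AS month_revenue FROM sales GROUP BY month) AS t
      ON t.month = s.month
    WHERE s.month BETWEEN :first AND :last
    GROUP BY s.month, s.product_id, oh.amount, t.month_revenue
) AS product_and_month
GROUP BY product_id
ORDER BY product_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (month TEXT, product_id TEXT, revenue REAL);
    CREATE TABLE overhead (month TEXT PRIMARY KEY, amount REAL);
    INSERT INTO sales VALUES ('2024-01', 'A', 100), ('2024-01', 'B', 300),
                             ('2024-02', 'A', 100), ('2024-02', 'B', 100),
                             ('2024-03', 'A', 50);
    INSERT INTO overhead VALUES ('2024-01', 80), ('2024-02', 200), ('2024-03', 10);
""")
print(conn.execute(MULTI_MONTH_SQL, {"first": "2024-01", "last": "2024-02"}).fetchall())
# [('A', 200.0, 120.0), ('B', 400.0, 160.0)]
```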
In Rails 3 (also with the meta_where gem, if you feel like using it in your query), I have a really tricky query that I've been banging my head against:
Suppose I have two models, customers and purchases, where a customer has many purchases. Let's define customers with at least 2 purchases as "repeat_customer". I need to find the total number of repeat customers by each day for the past 3 months, something like:
Date TotalRepeatCustomerCount
1/1/11 10 (10 repeat customers by the end of 1/1/11)
1/2/11 15 (5 more customer gained "repeat" status on this date)
1/3/11 16 (1 more customer gained "repeat" status on this date)
...
3/30/11 150
3/31/11 160
Basically I need to group customer count based on the date of creation of their second purchase, since that is when they "gain repeat status".
Certainly this can be achieved in Ruby, with something like:
Customer.includes(:purchases).all.select { |c| c.purchases.size >= 2 }.group_by { |c| c.purchases.sort_by(&:created_at).second.created_at.to_date }.map { |date, customers| [date, customers.size] }
However, the above code will fire queries along the lines of Customer.all and Purchase.all, then do a bunch of calculations in Ruby. I would much prefer doing the selection, grouping, and calculations in MySQL, since it is not only much faster, it also reduces the bandwidth from the database. On large databases, the code above is basically useless.
I have been trying for a while to conjure up the query in Rails/ActiveRecord, but have had no luck, even with the nice meta_where gem. If I have to, I will accept a pure MySQL query as well.
Edit: I would cache it (or add a "repeat" field to customers), but only for this simplified problem. The client can change the criteria for a repeat customer at any point (2 purchases, 3 purchases, 4 purchases, etc.), so unfortunately I do have to calculate it on the spot.
SELECT p_date, COUNT(customers.id) FROM
(
SELECT p_date - INTERVAL 1 day p_date, customers.id
FROM
customers JOIN purchases ON purchases.customer_id = customers.id
JOIN (SELECT DISTINCT date(purchase_date) p_date FROM purchases) p_dates
WHERE purchases.purchase_date < p_date
GROUP BY p_date, customers.id
HAVING COUNT(purchases.id) >= 2
) a
GROUP BY p_date
I didn't test this in the slightest, so I hope it works. Also, I hope I understood what you are trying to accomplish.
But please note that you should not do this: it'll be too slow. Since the data never changes once the day has passed, just cache it for each day.
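Since the question notes that the threshold (2, 3, 4 purchases) can change, here is a sketch that takes it as a parameter, in SQLite via Python rather than MySQL. For each customer, a correlated subquery with LIMIT 1 OFFSET threshold - 1 picks the date of their N-th purchase, i.e. the day they gained repeat status; the per-date counts are then turned into a running total in Python. The Rails-style columns customer_id and created_at, and all rows, are assumptions.

```python
import sqlite3

# For every customer, the date of their N-th purchase (N = threshold), i.e. the day they
# "gained repeat status"; customers with fewer purchases get NULL and are dropped.
GAINED_SQL = """
SELECT gained_on, COUNT(*) AS newly_repeat
FROM (
    SELECT (SELECT date(p.created_at)
            FROM purchases AS p
            WHERE p.customer_id = c.id
            ORDER BY p.created_at, p.id
            LIMIT 1 OFFSET :offset) AS gained_on
    FROM customers AS c
)
WHERE gained_on IS NOT NULL
GROUP BY gained_on
ORDER BY gained_on
"""

def repeat_customers_by_day(conn, threshold=2):
    """[(date, running total of repeat customers by the end of that date)]."""
    running, result = 0, []
    for day, newly_repeat in conn.execute(GAINED_SQL, {"offset": threshold - 1}):
        running += newly_repeat
        result.append((day, running))
    return result

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer_id INTEGER, created_at TEXT);
    INSERT INTO customers (id) VALUES (1), (2), (3), (4);
    INSERT INTO purchases (customer_id, created_at) VALUES
        (1, '2011-01-01 09:00:00'), (1, '2011-01-01 18:00:00'),
        (2, '2011-01-01 10:00:00'), (2, '2011-01-02 10:00:00'),
        (3, '2011-01-03 10:00:00'),
        (4, '2011-01-01 11:00:00'), (4, '2011-01-02 11:00:00'), (4, '2011-01-03 11:00:00');
""")
print(repeat_customers_by_day(conn))  # [('2011-01-01', 1), ('2011-01-02', 3)]
```

Dates on which nobody became a repeat customer are absent, so a day-by-day report would carry the previous total forward. On MySQL 8.0 or later, the running total could instead be computed in SQL with a window function such as SUM(...) OVER (ORDER BY ...).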