SELECT c.siteno, a.sitename, a.location, Count(a.status) AS ChargeablePermit
FROM (PermitStatus AS a LEFT JOIN states AS b ON a.status = b.statusheading)
LEFT JOIN Sitedetails AS c ON a.zone = c.compexzone
WHERE b.statusheading like "Chargeable" and a.loaded_date between
(select monthstart from ChargeDate) and (select Monthend from ChargeDate)
GROUP BY a.sitename, c.siteno, a.location;
This query returns the count of chargeable permits by site:
Mar14
Siteno (1) Sitename (site1) Location (location1) Chargeablepermit (30)
These calculations are based on the period determined by the two subselects (i.e. the month of March 14).
I was wondering if I could change the date range covered by the subselects (i.e. to April 14), subtract one count of chargeable permits from the other across the two result sets, and have that difference displayed in a single table.
For instance, if April 14 was:
April
Siteno (1) Sitename (Site1) Location (Location1) ChargeablePermit (40) Difference (10)
Not in the way you seem to be proposing. You would simply double up your SQL within a UNION query to return the data sets for the two periods, and then perform an aggregate on the results:
SELECT SUM(CP) FROM (
SELECT (ChargeablePermit * -1) AS CP FROM ... WHERE dates = Date1
UNION ALL
SELECT ChargeablePermit AS CP FROM ... WHERE dates = Date2
)
Depending on how many records you're dealing with, however, a UNION like this could be quite slow. The other approach would be to turn your SQL into an Append query that inserts the output into a temp table. You would run the query once for each period, then run a second query to aggregate the results from the temp table.
Also you should consider using joins to filter your results rather than subqueries.
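As a rough sketch of the UNION ALL idea, the snippet below uses Python's built-in sqlite3 in place of Access. The permits table, its columns, and the sample rows are made up for illustration. Each period's count is computed per site, the earlier period is negated, and the outer SUM yields the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the PermitStatus/Sitedetails join.
conn.execute("CREATE TABLE permits (siteno INTEGER, sitename TEXT, loaded_date TEXT)")
rows = (
    [(1, "Site1", "2014-03-05")] * 3
    + [(1, "Site1", "2014-04-07")] * 5
    + [(2, "Site2", "2014-03-10")] * 2
    + [(2, "Site2", "2014-04-11")] * 2
)
conn.executemany("INSERT INTO permits VALUES (?, ?, ?)", rows)

sql = """
SELECT siteno, sitename, SUM(cp) AS difference
FROM (
    -- Earlier period, negated so that SUM subtracts it.
    SELECT siteno, sitename, -COUNT(*) AS cp
    FROM permits
    WHERE loaded_date BETWEEN '2014-03-01' AND '2014-03-31'
    GROUP BY siteno, sitename
    UNION ALL
    -- Later period, counted as is.
    SELECT siteno, sitename, COUNT(*) AS cp
    FROM permits
    WHERE loaded_date BETWEEN '2014-04-01' AND '2014-04-30'
    GROUP BY siteno, sitename
) AS both_periods
GROUP BY siteno, sitename
ORDER BY siteno
"""
differences = conn.execute(sql).fetchall()
print(differences)
```

Grouping the outer query by site keeps one difference per site rather than a single grand total.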
Noobie to SQL. I have a simple query here that returns 70 million rows, and my work laptop cannot handle that volume when I import it into Tableau. Usually 20 million rows or fewer work fine. Here's my problem.
Table name: Table1
Fields: UniqueID, State, Date, claim_type
Query:
SELECT uniqueID, states, claim_type, date
FROM table1
WHERE date >= '11-09-2021'
This gives me what I want, BUT, I can limit the query significantly if I count the number of uniqueIDs that have been used in 3 or more different states. I use this query to do that.
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '11-09-2021'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
The only issue is, when I put this query into Tableau it only displays the FIRST state a unique_id showed up in, and the first date it showed up. A unique_id shows up in multiple states over multiple dates, so when I use this count aggregation it's only giving me the first result and not the whole picture.
Any ideas here? I am totally lost and spent a whole business day trying to fix this
Expected output would be something like
uniqueID | state | claim type | Date
123 Ohio C 01-01-2021
123 Nebraska I 02-08-2021
123 Georgia D 03-08-2021
If your table consists of only those four columns and your queries are based on date ranges, you need an index to help optimize them. If 70 million records exist, how far back do they go? Years? If the data since 2021-09-11 is only, say, 30k records, that should be all you need to scan for your results.
I would ensure you have an index on the following columns, in this order:
(date, uniqueId, claim_type, states). Also, you mentioned you wanted a count of 3 OR MORE; your query's > 3 will return 4 or more unless you change it to count(distinct states) >= 3.
Then, to get the entries you care about, you need
SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3
This returns just the date/id/claim_type combinations that have three or more distinct states. You would then use THIS result set to get the other entries via
select distinct
t1.date, t1.uniqueID, t1.claim_type, t1.states
from
( SELECT date, uniqueID, claim_type
FROM table1
WHERE date >= '2021-09-11'
group by date, uniqueID, claim_type
having count( distinct states ) >= 3 ) PQ
JOIN Table1 t1
on PQ.date = t1.date
and PQ.UniqueID = t1.UniqueID
and PQ.Claim_Type = t1.Claim_Type
The "PQ" (preQuery) gets the qualified records. Then it joins back to the original table and grabs all records that qualified from the unique date/id/claim_type and returns all the states.
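Here is a small sketch of the pre-query and join-back pattern, run against SQLite through Python's sqlite3 rather than MySQL. The sample rows are invented. One deliberate difference from the query above: this pre-query groups by uniqueID alone, on the assumption that the goal is IDs seen in three or more states across all dates and claim types. Keep the original grouping if that assumption does not hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (uniqueID INTEGER, states TEXT, claim_type TEXT, date TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (123, "Ohio", "C", "2021-10-01"),
    (123, "Nebraska", "I", "2021-11-08"),
    (123, "Georgia", "D", "2021-12-08"),
    (456, "Ohio", "C", "2021-10-02"),
    (456, "Ohio", "I", "2021-10-05"),
])

sql = """
SELECT t1.uniqueID, t1.states, t1.claim_type, t1.date
FROM (
    -- Pre-query: IDs that appear in at least three distinct states.
    SELECT uniqueID
    FROM table1
    WHERE date >= '2021-09-11'
    GROUP BY uniqueID
    HAVING COUNT(DISTINCT states) >= 3
) AS pq
JOIN table1 AS t1 ON t1.uniqueID = pq.uniqueID
WHERE t1.date >= '2021-09-11'
ORDER BY t1.date
"""
# Join-back returns every detail row for the qualifying IDs.
detail_rows = conn.execute(sql).fetchall()
for row in detail_rows:
    print(row)
```

Only the rows for ID 123 come back, with every state and date intact, which is the shape Tableau needs.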
Yes, you are grouping rows, so you 'lose' information in the grouped result.
You won't get 70m records with your grouped query.
Why don't you split your imports into smaller chunks? For example, limit the rows to chunks of, say, 15m:
1st:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000;
2nd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 15000000;
3rd:
SELECT uniqueID, states, claim_type, date FROM table1 WHERE date >= '11-09-2021' LIMIT 15000000 OFFSET 30000000;
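and so on. Without an ORDER BY, the row order is not guaranteed to be the same from one query to the next, so add one on a stable key if the chunks must not overlap.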
I know it's not a perfect or very handy solution, but maybe it gets you to the desired outcome.
See this link for information about LIMIT and OFFSET:
https://www.bitdegree.org/learn/mysql-limit-offset
It is wise in the long run to use the DATE datatype. That requires dates to look like '2021-09-11', not '09-11-2021'. That will let > correctly compare dates that are in two different years.
If your data is coming from some source that formats it as '11-09-2021', use STR_TO_DATE() to convert it as it goes in. You can reconstruct that format on output via DATE_FORMAT().
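To see why the text format matters, here is a small Python illustration (not MySQL itself). It assumes the strings are DD-MM-YYYY and uses a hypothetical helper, to_iso, that plays the role STR_TO_DATE() would play on the way in.

```python
from datetime import datetime

def to_iso(dd_mm_yyyy: str) -> str:
    """Convert a 'DD-MM-YYYY' string to ISO 'YYYY-MM-DD' (assumed input format)."""
    return datetime.strptime(dd_mm_yyyy, "%d-%m-%Y").date().isoformat()

cutoff = "11-09-2021"   # 11 September 2021
earlier = "15-12-2020"  # 15 December 2020, which is before the cutoff

# Comparing the raw strings goes character by character: "15..." sorts after "11...".
text_says_on_or_after = earlier >= cutoff

# Comparing ISO strings (or real DATE values) orders them chronologically.
iso_says_on_or_after = to_iso(earlier) >= to_iso(cutoff)

print(text_says_on_or_after, iso_says_on_or_after)
```

The raw comparison wrongly treats the 2020 date as on or after the cutoff, while the ISO form gets it right.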
Once you have done that, we can talk about optimizing
SELECT unique_id, count(distinct states), claim_type, date
FROM table1
WHERE date >= '2021-09-11'
GROUP BY Unique_id, claim_type, date
HAVING COUNT(DISTINCT states) > 3
Tentatively, I recommend this composite index to speed up the query:
INDEX(Unique_id, claim_type, date, states)
That will also help with your other query.
(I am assuming the ambiguous '11-09-2021' is DD-MM-YYYY.)
I am relatively new to SQL. I am trying to update the monthly salary for employees who have worked for a certain duration. The query displays the data using info from the person and employee tables, but it won't update; I keep getting an 'operand should contain 1 column' error. How would I go about displaying all the data and updating the monthly_salary column as well? Thanks.
UPDATE employee ep set monthly_salary = monthly_salary*1.15 = all(
SELECT p.person_id, p.name_first, p.name_last, ep.monthly_salary, ep.start_date, curdate() as today_date,
TIMESTAMPDIFF(month,ep.start_date,curdate()) as duration_months
FROM employee ep
INNER JOIN person p ON ep.person_id = p.person_id having duration_months > 24);
query result
I want this expected result, but the monthly salary hasn't been updated yet. Is it possible to display this and update the monthly_salary?
You are not able to do both in a single query. Typically one would run a "select query" to inspect if the desired logic appears correct, e.g.
SELECT
p.person_id
, p.name_first
, p.name_last
, ep.start_date
, curdate() as today_date
, TIMESTAMPDIFF(month,ep.start_date,curdate()) as duration_months
FROM employee ep
INNER JOIN person p ON ep.person_id = p.person_id
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
;
In that query the important piece of logic is the where clause which seeks out any employees with a start date earlier than today - 24 months.
If that logic is correct, then apply the same logic in an "update query":
UPDATE employee ep
SET monthly_salary = monthly_salary*1.15
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
;
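The preview-then-update workflow can be sketched as follows, using SQLite through Python's sqlite3 instead of MySQL. The sample rows are invented, and a fixed today value replaces curdate() so the result is reproducible. SQLite's date(?, '-24 months') stands in for curdate() - INTERVAL 24 MONTH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (person_id INTEGER PRIMARY KEY, start_date TEXT, monthly_salary REAL)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "2019-01-15", 1000.0),  # more than 24 months of service
    (2, "2023-06-01", 2000.0),  # less than 24 months of service
])

today = "2024-01-01"  # assumed fixed date for the example
predicate = "start_date < date(?, '-24 months')"

# Step 1: inspect which rows the logic selects.
preview = conn.execute(
    f"SELECT person_id FROM employee WHERE {predicate}", (today,)
).fetchall()

# Step 2: reuse exactly the same predicate in the update.
conn.execute(
    f"UPDATE employee SET monthly_salary = monthly_salary * 1.15 WHERE {predicate}",
    (today,),
)
salaries = dict(conn.execute("SELECT person_id, monthly_salary FROM employee"))
print(preview, salaries)
```

Sharing one predicate string between the two statements is just a convenience here to keep the preview and the update in step.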
Syntax notes:
you cannot string multiple conditions together using multiple equality operators: there are 2 = signs in (monthly_salary = monthly_salary*1.15 = all(...))
x = all() requires that all values returned by the subquery equal x, and the subquery must return a single column, which is why you get the 'operand should contain 1 column' error
the having clause is NOT just a substitute for a where clause. A having clause is designed for evaluating aggregated data, e.g. having count(*) > 2
Finally, while it was inventive to use the having clause, what you were doing was gaining access to the alias 'duration_months', so you could simply have done this instead:
where TIMESTAMPDIFF(month,ep.start_date,curdate()) > 24
BUT this is not a good way to filter information because it requires running a function on every row of data before a decision can be reached. This has the effect of making queries slower. Compare that to the following:
WHERE ep.start_date < curdate() - INTERVAL 24 MONTH
ep.start_date is not affected by any function, and curdate() - INTERVAL 24 MONTH is just one calculation (not done every row). So this is much more efficient (also known as "sargable").
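One way to observe the difference is to ask the database for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3. MySQL's EXPLAIN output looks different, and the index name idx_start_date is made up, so treat this as an illustration of the principle rather than MySQL's exact behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (person_id INTEGER PRIMARY KEY, start_date TEXT, monthly_salary REAL)"
)
conn.execute("CREATE INDEX idx_start_date ON employee (start_date)")

def plan(sql: str) -> str:
    """Return the 'detail' column of SQLite's query plan as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Column wrapped in a function: every row must be evaluated.
wrapped = plan(
    "SELECT person_id FROM employee "
    "WHERE julianday('now') - julianday(start_date) > 730"
)

# Bare column compared with a value computed once: an index range search is possible.
bare = plan(
    "SELECT person_id FROM employee "
    "WHERE start_date < date('now', '-24 months')"
)

print(wrapped)
print(bare)
```

The first plan should report a scan and the second a search on the index, which is what "sargable" buys you.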
I am using Graph Reports with the select below. The MySQL database only holds active records, so if there are no records from X hours to Y hours, the select returns nothing for that interval. In my case, I need the select to return zero values for PayPal even when there was no activity in the database. I do not understand how to use UNION, or how to rewrite the select, to get zero values when nothing was recorded in a time interval. Could you please help?
select STR_TO_DATE ( DATE_FORMAT(`acctstarttime`,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(*) as `Active Paid Accounts`
from radacct_history where `paymentmethod` = 'PayPal'
group by DATE_FORMAT(`#date`,'%y-%m-%d %H')
When I run the select the output is:
Current Output
But if there are no values between 2016-07-27 07:00:00 and 2016-07-28 11:00:00, I need every hour to show zero active accounts, like this:
Needed output with no values every hour
I have created the select below, but it does not put a zero value in every hour as I need. It still shows a big gap between 12 Sep and 13 Sep, where there should be zero values for every hour.
(select STR_TO_DATE ( DATE_FORMAT(acctstarttime,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', count(paymentmethod) as Active Paid Accounts
from radacct_history where paymentmethod <> 'PayPal'
group by DATE_FORMAT(#date,'%y-%m-%d %H'))
union ALL
(select STR_TO_DATE ( DATE_FORMAT(acctstarttime,'%y-%m-%d %H'),'%y-%m-%d %H')
as '#date', 0 as Active Paid Accounts
from radacct_history where paymentmethod <> 'PayPal'
group by DATE_FORMAT(#date,'%y-%m-%d %H')) ;
I guess you want to return 0 if there are no matching rows in MySQL. Here is an example:
(SELECT Col1,Col2,Col3 FROM ExampleTable WHERE ID='1234')
UNION (SELECT 'Def Val' AS Col1,'none' AS Col2,'' AS Col3) LIMIT 1;
Update: You are trying to retrieve data that isn't present in the table, judging by the output provided. In that case, you have to maintain a date table to show the dates that aren't in the table. It's a little bit tricky; please refer to "SQL query that returns all dates not used in a table".
You need an artificial table with all the necessary time intervals. E.g. if you need daily data, create a table and add all the dates, say from 1970 to 2100.
Then you can LEFT JOIN your radacct_history to that table, so that each desired interval produces a group item (the GROUP BY should be based on the intervals table).
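A minimal sketch of the intervals-table idea for hourly data, using SQLite through Python's sqlite3 in place of MySQL. The hours table, its hour_start column, and the sample rows are assumptions for illustration. In MySQL the hour bucket would be built with DATE_FORMAT instead of strftime.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE radacct_history (acctstarttime TEXT, paymentmethod TEXT)")
conn.executemany("INSERT INTO radacct_history VALUES (?, ?)", [
    ("2016-07-27 07:15:00", "PayPal"),
    ("2016-07-27 07:40:00", "PayPal"),
    ("2016-07-27 10:05:00", "PayPal"),
    ("2016-07-27 08:30:00", "Card"),
])

# Hypothetical intervals table: one row per hour in the reporting window.
conn.execute("CREATE TABLE hours (hour_start TEXT PRIMARY KEY)")
start = datetime(2016, 7, 27, 7)
conn.executemany(
    "INSERT INTO hours VALUES (?)",
    [((start + timedelta(hours=i)).strftime("%Y-%m-%d %H:00:00"),) for i in range(4)],
)

sql = """
SELECT h.hour_start, COUNT(r.acctstarttime) AS active_paid_accounts
FROM hours AS h
LEFT JOIN radacct_history AS r
       ON strftime('%Y-%m-%d %H:00:00', r.acctstarttime) = h.hour_start
      AND r.paymentmethod = 'PayPal'   -- filter in ON, not WHERE, to keep empty hours
GROUP BY h.hour_start
ORDER BY h.hour_start
"""
hourly = conn.execute(sql).fetchall()
for row in hourly:
    print(row)
```

COUNT(r.acctstarttime) counts only matched rows, so hours without PayPal activity come out as 0 instead of disappearing.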
For the purposes of my question, I have a database in a MySQL server with info on many taxi rides (it is comprised of two tables, history_trips and trip_info).
In history_trips, each row's useful data is comprised of a unique alphanumeric ID, ride_id, the name of the rider, rider, and the time the ride ended, finishTime as a Y-m-d string.
In trip_info, each row's useful data similarly contains ride_id and rider, but also contains an integer, value (calculated in the back end from other data).
What I need to do is create a query that can find the average of all the maximum 'values' from all riders in a given time period. The riders included in this average are only considered if they completed less than X (let's say 3) rides within the aforementioned time period.
So far, I have a query that creates a grouped table containing the name of the rider, the finishTime of their highest 'value' ride, the value of said ride, and the number of rides, num_rides, they have taken in that time period. The AVG(b.value) column, however, gives me the same values as b.value, which is unexpected. I would like to find some way to return the average of the b.value column.
SELECT a.rider, a.finishTime, b.value, AVG(b.value), COUNT(a.rider) as num_rides
FROM history_trips as a, trip_info as b
WHERE a.finishTime > 'arbitrary_start_date_str' and a.ride_id = b.ride_id
and b.value = (SELECT MAX(value)
from trip_info where rider = b.rider and ride_id = b.ride_id)
GROUP BY a.rider
HAVING COUNT(a.rider) < 3
I am a novice in SQL but have read on some other forums that when using the AVG function on a value you must also GROUP BY that value. I was wondering if there is a way around that or if I am thinking of this problem incorrectly. Thanks in advance for any advice / solutions you might have!
The following worked for me:
SELECT AVG(ridergroups.maxvalues) avgmaxvalues FROM
(SELECT MAX(trip_info.value) maxvalues FROM trip_info
INNER JOIN history_trips
ON trip_info.ride_id = history_trips.ride_id
WHERE history_trips.finishTime > '2010-06-20'
GROUP BY trip_info.rider
HAVING COUNT(trip_info.rider) < 3) ridergroups;
The subquery groups the maximum values by rider after filtering by date and rider count. The containing query calculates the average of the maximum values.
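To check the logic on a tiny data set, the same query can be run against SQLite through Python's sqlite3. The rows below are invented: alice has two rides in the period, bob has one plus an older ride outside it, and carol has three, so she should be excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history_trips (ride_id TEXT, rider TEXT, finishTime TEXT)")
conn.execute("CREATE TABLE trip_info (ride_id TEXT, rider TEXT, value INTEGER)")

rides = [
    ("r1", "alice", "2010-07-01", 10),
    ("r2", "alice", "2010-07-02", 30),
    ("r3", "bob",   "2010-07-03", 20),
    ("r4", "carol", "2010-07-01", 100),
    ("r5", "carol", "2010-07-02", 200),
    ("r6", "carol", "2010-07-03", 300),
    ("r7", "bob",   "2010-01-01", 999),  # before the start date
]
conn.executemany("INSERT INTO history_trips VALUES (?, ?, ?)", [r[:3] for r in rides])
conn.executemany("INSERT INTO trip_info VALUES (?, ?, ?)", [(r[0], r[1], r[3]) for r in rides])

sql = """
SELECT AVG(ridergroups.maxvalues) AS avgmaxvalues
FROM (
    SELECT MAX(trip_info.value) AS maxvalues
    FROM trip_info
    INNER JOIN history_trips ON trip_info.ride_id = history_trips.ride_id
    WHERE history_trips.finishTime > ?
    GROUP BY trip_info.rider
    HAVING COUNT(trip_info.rider) < 3
) AS ridergroups
"""
avg_of_max = conn.execute(sql, ("2010-06-20",)).fetchone()[0]
print(avg_of_max)
```

With these rows the inner query yields 30 for alice and 20 for bob, so the outer average is 25.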
This query below selects all rows that have a row with the same father registering 335 days or less since earlier registration. Is there a way to edit this query so that it does not eliminate the duplicate row in the output? I need to see all instances of the registration for that father within 335 days of each other.
SELECT * FROM ymca_reg AS later
WHERE NOT EXISTS (
SELECT 1 FROM ymca_reg AS earlier
WHERE
earlier.Father_First_Name = later.Father_First_Name
AND earlier.Father_Last_Name = later.Father_Last_Name
AND (later.Date - earlier.Date < 335) AND (later.Date > earlier.Date))
My current query is:
SELECT ymca_reg.* FROM ymca_reg WHERE (((ymca_reg.Year) In (SELECT Year FROM ymca_reg As Tmp
GROUP BY Year, Father_Last_Name, Father_First_Name
HAVING Count(*)>1
And Father_Last_Name = ymca_reg.Father_Last_Name
And Father_First_Name = ymca_reg.Father_First_Name)))
ORDER BY ymca_reg.Year, ymca_reg.Father_Last_Name, ymca_reg.Father_First_Name
This query does return all the duplicates for review correctly, but it's terribly slow because it doesn't use a join and as soon as I add the date criteria it only returns the later row. Thanks.
I think you want something like this:
SELECT *
FROM ymca_reg later
WHERE EXISTS (SELECT 1
FROM ymca_reg earlier
WHERE earlier.Father_First_Name = later.Father_First_Name AND
earlier.Father_Last_Name = later.Father_Last_Name AND
abs(later.Date - earlier.Date) < 335 and
later.Date <> earlier.Date
);
This should return all records that have such duplicates. Note that "later" and "earlier" are no longer really apt descriptions, but I left the names so you can see the similarity to your query.
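A quick way to convince yourself that both rows of a pair come back is to run the EXISTS query on a few made-up rows. The sketch uses SQLite through Python's sqlite3, where the day difference has to be computed with julianday() because plain subtraction of text dates does not yield days. The id column and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ymca_reg (id INTEGER PRIMARY KEY, "
    "Father_First_Name TEXT, Father_Last_Name TEXT, Date TEXT)"
)
conn.executemany("INSERT INTO ymca_reg VALUES (?, ?, ?, ?)", [
    (1, "John", "Smith", "2015-01-10"),
    (2, "John", "Smith", "2015-06-01"),  # 142 days after row 1
    (3, "Ann",  "Lee",   "2014-01-01"),
    (4, "Ann",  "Lee",   "2015-06-01"),  # 516 days after row 3
])

sql = """
SELECT later.id
FROM ymca_reg AS later
WHERE EXISTS (
    SELECT 1
    FROM ymca_reg AS earlier
    WHERE earlier.Father_First_Name = later.Father_First_Name
      AND earlier.Father_Last_Name = later.Father_Last_Name
      AND abs(julianday(later.Date) - julianday(earlier.Date)) < 335
      AND later.Date <> earlier.Date
)
ORDER BY later.id
"""
matched_ids = [row[0] for row in conn.execute(sql)]
print(matched_ids)
```

Both John Smith rows are returned because each one has a partner within 335 days, while the Ann Lee rows are too far apart to match.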