I have a database with roughly 6 million entries, and it will grow. I query it to feed a HighCharts chart. I need to read longitudinally over years, so I'm running queries like this:
foreach ($states as $state_id) { // PHP code
SELECT -- MySQL pseudocode
sum(case when mytable.Year = '2003' then 1 else 0 end) Year_2003,
sum(case when mytable.Year = '2004' then 1 else 0 end) Year_2004,
sum(case when mytable.Year = '2005' then 1 else 0 end) Year_2005,
sum(case when mytable.Year = '2006' then 1 else 0 end) Year_2006,
sum(case when mytable.Year = '2007' then 1 else 0 end) Year_2007,
sum(case when mytable.Year = '$more_years' then 1 else 0 end) Year_$whatever_year
FROM mytable
WHERE State='$state_id'
AND Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
AND other_filters IN (etc, etc, etc)
} // end PHP code
But for several states at once. So for, let's say, 5 states, each one gets the above statement with its own state ID substituted. Meanwhile the years can be any number of years, and the Sex (male/female/other), Age segment and other modifiers keep changing based on filters. The queries are slow (at minimum 30-40 seconds apiece). So a thought I had, unless I'm doing it totally wrong, is to store the above query in a second table together with its results, then first check that "meta query" table to see whether the query was "cached" and, if so, return the results without reading the main table (which won't be updated very often).
Is this a good method or are there potential problems I'm not seeing?
EDIT: changed to table, not db (duh).
Table structure is:
id | Year | Sex | Age_segment | Another_filter | Etc
Nothing more complicated than that and no joining anything else. There are keys on id, Year, Sex, and Age_segment right now.
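The "meta query" cache described above could look roughly like the sketch below. It is only an illustration under stated assumptions: SQLite (via Python's sqlite3) stands in for MySQL, and the query_cache table, the cache_key helper and the counts_by_year function are hypothetical names, not anything from the original setup. The filter set is normalised and hashed, the hash is looked up in the cache table, and the expensive aggregate only runs on a miss.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL connection
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, State INTEGER, Year INTEGER,
                          Sex INTEGER, Age_segment INTEGER);
    -- hypothetical cache table: one row per distinct filter combination
    CREATE TABLE query_cache (cache_key TEXT PRIMARY KEY, result_json TEXT);
""")
conn.executemany(
    "INSERT INTO mytable (State, Year, Sex, Age_segment) VALUES (?, ?, ?, ?)",
    [(1, 2003, 0, 1), (1, 2003, 1, 2), (1, 2004, 0, 3), (2, 2003, 1, 1)],
)


def cache_key(filters):
    # Sort keys and values so equivalent filter sets hash to the same key.
    normalized = json.dumps({k: sorted(v) for k, v in filters.items()},
                            sort_keys=True)
    return hashlib.sha256(normalized.encode()).hexdigest()


def marks(values):
    # One "?" placeholder per value, e.g. "?,?,?"
    return ",".join("?" * len(values))


def counts_by_year(conn, filters):
    """Return ([[year, count], ...], served_from_cache)."""
    key = cache_key(filters)
    row = conn.execute(
        "SELECT result_json FROM query_cache WHERE cache_key = ?", (key,)
    ).fetchone()
    if row:
        return json.loads(row[0]), True  # cache hit: main table not read

    sql = f"""
        SELECT Year, COUNT(*)
        FROM mytable
        WHERE State IN ({marks(filters['State'])})
          AND Sex IN ({marks(filters['Sex'])})
          AND Age_segment IN ({marks(filters['Age_segment'])})
        GROUP BY Year
        ORDER BY Year
    """
    params = filters["State"] + filters["Sex"] + filters["Age_segment"]
    result = [list(r) for r in conn.execute(sql, params)]
    conn.execute("INSERT INTO query_cache VALUES (?, ?)",
                 (key, json.dumps(result)))
    return result, False


filters = {"State": [1], "Sex": [0, 1], "Age_segment": [1, 2, 3]}
first, first_hit = counts_by_year(conn, filters)    # computed from mytable
second, second_hit = counts_by_year(conn, filters)  # read from query_cache
```

One thing a cache like this has to handle is invalidation: whenever mytable is updated, the cached rows are stale and need to be deleted or recomputed.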
Proper indexing is what is needed to speed up the query. Start by doing an "EXPLAIN" on the query and post the results here.
I would suggest the following to start off. This way avoids the for loop and returns the data in 1 query. Not knowing the number of rows and cardinality of each column I suggest a composite index on State and Year.
SELECT mytable.State,mytable.Year,count(*)
FROM mytable
WHERE Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
AND other_filters IN (etc, etc, etc)
GROUP BY mytable.State,mytable.Year
The above query can be further optimised by checking the cardinality of some of the columns. Run the following for each column of interest; the number of rows returned is that column's cardinality:
SELECT Age_segment FROM mytable GROUP BY Age_segment;
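As a rough illustration of the single grouped query, here is a minimal sketch that runs it once for all states and reshapes the rows into one series per state, which is the shape a chart usually wants. It assumes SQLite (Python's sqlite3) as a stand-in for MySQL; the index name idx_state_year, the sample rows and the added State filter are made up for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, State INTEGER, Year INTEGER,
                          Sex INTEGER, Age_segment INTEGER);
    -- composite index on State and Year, as suggested above (name is arbitrary)
    CREATE INDEX idx_state_year ON mytable (State, Year);
""")
conn.executemany(
    "INSERT INTO mytable (State, Year, Sex, Age_segment) VALUES (?, ?, ?, ?)",
    [
        (1, 2003, 0, 1),
        (1, 2003, 1, 2),
        (1, 2004, 0, 3),
        (2, 2003, 1, 1),
        (2, 2005, 0, 9),  # Age_segment 9 is excluded by the filter below
    ],
)

# One query for every state instead of one query per state in a loop.
rows = conn.execute("""
    SELECT State, Year, COUNT(*)
    FROM mytable
    WHERE State IN (1, 2)
      AND Sex IN (0, 1)
      AND Age_segment IN (5, 4, 3, 2, 1)
    GROUP BY State, Year
    ORDER BY State, Year
""").fetchall()

# Pivot the (State, Year, count) rows into {state: {year: count}}.
series = defaultdict(dict)
for state, year, total in rows:
    series[state][year] = total

# Cardinality of a column as a single number.
cardinality = conn.execute(
    "SELECT COUNT(DISTINCT Age_segment) FROM mytable"
).fetchone()[0]
```

Years with no matching rows are simply absent from a state's dictionary, so the charting code would need to fill those with 0.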
Pseudo code...
SELECT Year
, COUNT(*) total
FROM my_its_not_a_database_its_a_table
WHERE State = $state_id
AND Sex IN (0,1)
AND Age_segment IN (5,4,3,2,1)
GROUP
BY Year;
Related
I have set up a database with a table of approximately 100GB. I use 4 CPUs and 15GB of RAM on the gcs. When I run a query I can see from the read/write operations dashboards that only a few hundred rows are read per second, which will take forever to complete.
The query is run from DBeaver. It is not a complicated query, as can be seen from the picture attached. I simply do not understand why it is so slow.
the query is simply this:
"""INSERT INTO analytics_agg (
hash,
product,
interface,
click_time_0,
click_time_5,
click_time_10,
click_time_30,
click_time_60)
SELECT
hash,
product,
interface,
count(case when click_time=0 then 1 else 0 end) ,
count(case when click_time=5 then 1 else 0 end) ,
count(case when click_time=10 then 1 else 0 end) ,
count(case when click_time=30 then 1 else 0 end) ,
count(case when click_time=60 then 1 else 0 end)
FROM analytics
group by hash,product,interface """
As you're selecting from one table and inserting in another, the selection from the first table has a large impact on the final insertion rate and performance.
Try to create the following index, which should optimize the selection query and improve the overall performance of the insertion:
ALTER TABLE `analytics` ADD INDEX `analytics_idx_hash_product_interface` (`hash`,`product`,`interface`);
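Separately from speed, one detail of the query as posted may be worth checking: COUNT counts every non-NULL value, so count(case when ... then 1 else 0 end) counts all rows in the group, not only the matching ones. The sketch below shows the difference on three sample rows. It uses SQLite through Python's sqlite3 as a stand-in for the real database, and the data is invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analytics (hash TEXT, product TEXT, interface TEXT, "
    "click_time INTEGER)"
)
conn.executemany(
    "INSERT INTO analytics VALUES (?, ?, ?, ?)",
    [("h1", "p", "web", 0), ("h1", "p", "web", 5), ("h1", "p", "web", 5)],
)

row = conn.execute("""
    SELECT
        COUNT(CASE WHEN click_time = 5 THEN 1 ELSE 0 END),  -- every row: 0 is not NULL
        SUM(CASE WHEN click_time = 5 THEN 1 ELSE 0 END),    -- matching rows only
        COUNT(CASE WHEN click_time = 5 THEN 1 END)          -- matching rows only
    FROM analytics
    GROUP BY hash, product, interface
""").fetchone()

count_with_else_zero, sum_of_flags, count_without_else = row
```

If per-bucket counts are what is wanted, either the SUM form or the COUNT form without ELSE 0 gives them.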
I am working on an application where I need to write a query on a table that returns the counts for multiple columns in a single query.
After some research I was able to develop a query for a single sourceId, but what if I want the result for multiple sourceIds?
select '3'as sourceId,
(select count(*) from event where sourceId = 3 and plateCategoryId = 3) as TotalNewCount,
(select count(*) from event where sourceId = 3 and plateCategoryId = 4) as TotalOldCount;
I need to get TotalNewCount and TotalOldCount for several source Ids, for example (3,4,5,6)
Can anyone help, how can I revise my query to return a result set of three columns including data of all sources in list (3,4,5,6)
Thanks
You can do all source ids at once:
select sourceId,
sum(case when plateCategoryId = 3 then 1 else 0 end) as TotalNewCount,
sum(case when plateCategoryId = 4 then 1 else 0 end) as TotalOldCount
from event
group by sourceId;
Use a where (before the group by) if you want to limit the source ids.
Note: The above works in both Vertica and MySQL, and being standard SQL should work in any database.
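To make the WHERE variant concrete, here is a small sketch restricted to the source ids 3, 4, 5 and 6. It runs against SQLite via Python's sqlite3 rather than MySQL or Vertica, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (sourceId INTEGER, plateCategoryId INTEGER)")
conn.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [(3, 3), (3, 3), (3, 4), (4, 4), (5, 3), (7, 3)],
)

wanted = [3, 4, 5, 6]
placeholders = ",".join("?" * len(wanted))  # "?,?,?,?"

rows = conn.execute(f"""
    SELECT sourceId,
           SUM(CASE WHEN plateCategoryId = 3 THEN 1 ELSE 0 END) AS TotalNewCount,
           SUM(CASE WHEN plateCategoryId = 4 THEN 1 ELSE 0 END) AS TotalOldCount
    FROM event
    WHERE sourceId IN ({placeholders})   -- filter before grouping
    GROUP BY sourceId
    ORDER BY sourceId
""", wanted).fetchall()
```

A source id with no rows at all (6 in this sample) does not appear in the result. If a zero row is needed for it, it has to be added in the application or by joining against a list of ids.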
I have tried searching all over for answers but none have answered my exact issue. I have what should be a relatively simple query. However, I am very new and still learning SQL.
I need to query two columns with different dates. I want to return rows with the current number of accounts and current outstanding balance and in the same query, return rows for the same columns with data 90 days prior. This way, we can see how much the number of accounts and balance increased over the past 90 days. Optimally, I am looking for results like this:
PropCode|PropCat|Accts|AcctBal|PriorAccts|PriorBal
--------------------------------------------------
77      |Comm   |  350|  1,000|       275|     750
Below is my starting query. I realize it's completely wrong but I have tried numerous different solution attempts but none seem to work for my specific problem. I included it to give an idea of my needs. The accts & AcctBal columns would contain the 1/31/14 data. The PriorAcct & PriorBal columns would contain the 10/31/13 data.
select
prop_code AS PropCode,
prop_cat,
COUNT(act_num) Accts,
SUM(act_bal) AcctBal,
(SELECT
COUNT(act_num)
FROM table1
where date = '10/31/13'
and Pro_Group in ('BB','FF')
and prop_cat not in ('retail', 'personal')
and Not (Acct_Code = 53 and ACTType in (1,2,3,4,5,6,7))
)
AS PriorAccts,
(SELECT
SUM(act_bal)
FROM table1
where date = '10/31/13'
and Pro_Group in ('BB','FF')
and prop_cat not in ('retail', 'personal')
and Not (Acct_Code = 53 and ACTType in (1,2,3,4,5,6,7))
)
AS PriorBal
from table1
where date = '01/31/14'
and Pro_Group in ('BB','FF')
and prop_cat not in ('retail', 'personal')
and Not (Acct_Code = 53 and ACTType in (1,2,3,4,5,6,7))
group by prop_code, prop_cat
order by prop_cat
You can use a CASE with aggregates for this (at least in SQL Server, not sure about MySQL):
...
COUNT(CASE WHEN date='1/31/14' THEN act_num ELSE NULL END) as 'Accts'
,SUM(CASE WHEN date='1/31/14' THEN act_bal ELSE NULL END) as 'AcctBal'
,COUNT(CASE WHEN date='10/31/13' THEN act_num ELSE NULL END) as 'PriorAccts'
,SUM(CASE WHEN date='10/31/13' THEN act_bal ELSE NULL END) as 'PriorAcctBal'
....
WHERE Date IN ('1/31/14', '10/31/13')
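A cut-down, runnable version of this idea might look like the sketch below. It is written against SQLite (Python's sqlite3) rather than SQL Server, stores the dates as ISO strings, uses made-up sample rows, and leaves out the Pro_Group, prop_cat, Acct_Code and ACTType filters from the question for brevity; all of those are assumptions of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (prop_code INTEGER, prop_cat TEXT, act_num INTEGER, "
    "act_bal INTEGER, date TEXT)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (77, "Comm", 1, 400, "2014-01-31"),
        (77, "Comm", 2, 600, "2014-01-31"),
        (77, "Comm", 1, 750, "2013-10-31"),
    ],
)

# A CASE without ELSE yields NULL, which COUNT and SUM both ignore,
# so each aggregate only sees the rows for its own date.
rows = conn.execute("""
    SELECT prop_code AS PropCode,
           prop_cat,
           COUNT(CASE WHEN date = '2014-01-31' THEN act_num END) AS Accts,
           SUM(CASE WHEN date = '2014-01-31' THEN act_bal END)   AS AcctBal,
           COUNT(CASE WHEN date = '2013-10-31' THEN act_num END) AS PriorAccts,
           SUM(CASE WHEN date = '2013-10-31' THEN act_bal END)   AS PriorBal
    FROM table1
    WHERE date IN ('2014-01-31', '2013-10-31')
    GROUP BY prop_code, prop_cat
    ORDER BY prop_cat
""").fetchall()
```

The table is read once for both dates, instead of once for the outer query and once more for each scalar subquery.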
I have the following MySQL table storing meter readings of different power stations.
Date, station_name, reading
2013-05-06, ABC, 102
2013-05-06, PQR, 122
I want a SQL query that returns the following result for a particular date.
Date, ABC, PQR, ABC-PQR
2013-05-06,102,122,-20
You could use CASE statements:
SELECT Date
, SUM(CASE WHEN station_name = 'ABC' THEN reading ELSE 0 END) as ABC
, SUM(CASE WHEN station_name = 'PQR' THEN reading ELSE 0 END) as PQR
, SUM(CASE WHEN station_name = 'ABC' THEN reading ELSE 0 END) - SUM(CASE WHEN station_name = 'PQR' THEN reading ELSE 0 END) as 'ABC-PQR'
FROM table
WHERE Date = '20130506'
GROUP BY Date
You can search for MySQL PIVOT to find out other methods people use.
I believe it is not possible to generate columns dynamically from row values. I believe you should do it in the application layer rather than the database layer.
See this post: mysql select dynamic row values as column names, another column as value.
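Doing the pivot in the application layer could be sketched like this: the query returns one row per date and station, and the code turns station names into columns without knowing them in advance. The example assumes Python with sqlite3 standing in for MySQL, and a table named readings, which is a made-up name.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (Date TEXT, station_name TEXT, reading INTEGER)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("2013-05-06", "ABC", 102), ("2013-05-06", "PQR", 122)],
)

# Plain grouped query: no station names are hard-coded in the SQL.
pivot = defaultdict(dict)
for date, station, total in conn.execute(
    "SELECT Date, station_name, SUM(reading) FROM readings "
    "GROUP BY Date, station_name"
):
    pivot[date][station] = total

# Column list is discovered from the data, then each date becomes one row.
stations = sorted({name for per_date in pivot.values() for name in per_date})
table = [
    [date] + [per_date.get(name, 0) for name in stations]
    for date, per_date in sorted(pivot.items())
]

# The ABC-PQR difference from the question, computed per date.
difference = {
    date: per_date.get("ABC", 0) - per_date.get("PQR", 0)
    for date, per_date in pivot.items()
}
```

A new station showing up in the data adds a column automatically, which is the part that a fixed list of CASE expressions cannot do.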
I have a log table with following schema:
OperatorId - JobId - Status ( Good/Bad/Ugly )
Alex         6       Good
Alex         7       Good
James        6       Bad
Description: Whenever an operator works on a job, an entry is made along with Status. That's it.
Now I need a report like:
OperatorId - Good Count - Bad Count - Ugly Count
Alex         2            0           0
James        0            1           0
select operatorid,
sum(if(status="good",1,0)) as good,
sum(if(status="bad",1,0)) as bad,
sum(if(status="ugly",1,0)) as ugly
from table
group by operatorid
This is called a Pivot Table. It is done by setting a value of 1 or 0 for each status and then summing them up:
SELECT
T.OperatorId,
SUM(T.GoodStat) AS Good,
SUM(T.BadStat) AS Bad,
SUM(T.UglyStat) AS Ugly
FROM
(
SELECT
CASE WHEN Status = 'Good' THEN 1 ELSE 0 END AS GoodStat,
CASE WHEN Status = 'Bad' THEN 1 ELSE 0 END AS BadStat,
CASE WHEN Status = 'Ugly' THEN 1 ELSE 0 END AS UglyStat,
OperatorId
FROM logTable
) T
GROUP BY T.OperatorId
If, like me, you prefer, whenever possible, to calculate counts with COUNT rather than with SUM, here's an alternative solution, which uses a method asked about in this thread:
SELECT
operatorid,
COUNT(status = 'good' OR NULL) as good,
COUNT(status = 'bad' OR NULL) as bad,
COUNT(status = 'ugly' OR NULL) as ugly
FROM table
GROUP BY operatorid
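The reason the COUNT(... OR NULL) form works is three-valued logic: a true comparison OR NULL is true, a false comparison OR NULL is NULL, and COUNT skips NULLs. The sketch below checks this on the sample data from the question, using SQLite through Python's sqlite3 as a stand-in for MySQL and logTable as the table name. SQLite compares strings case-sensitively by default, so the literals here match the stored capitalisation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logTable (operatorid TEXT, jobid INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO logTable VALUES (?, ?, ?)",
    [("Alex", 6, "Good"), ("Alex", 7, "Good"), ("James", 6, "Bad")],
)

# true OR NULL -> 1, false OR NULL -> NULL (None in Python)
truth = conn.execute("SELECT (1 OR NULL), (0 OR NULL)").fetchone()

rows = conn.execute("""
    SELECT operatorid,
           COUNT(status = 'Good' OR NULL) AS good,
           COUNT(status = 'Bad'  OR NULL) AS bad,
           COUNT(status = 'Ugly' OR NULL) AS ugly
    FROM logTable
    GROUP BY operatorid
    ORDER BY operatorid
""").fetchall()
```

The result matches the report asked for in the question, with a 0 wherever an operator has no rows of a given status.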