I have a table where I collect data for the jobs that I do. Each time I create a job I assign it a date. The problem is that days on which I have no jobs aren't stored in the database, so when I graph my data I never see the days with zero jobs.
My current query looks like this:
SELECT job_data_date, SUM(job_data_invoice_amount) as job_data_date_income
FROM job_data
WHERE job_data_date >= '2010-05-05'
GROUP BY job_data_date
ORDER BY job_data_date;
The output is:
| job_data_date | job_data_date_income |
| 2010-05-17 | 125 |
| 2010-05-18 | 190 |
| 2010-05-20 | 170 |
As you can see from the example output, 2010-05-19 does not show up in the results because no row was ever stored for it.
Is there a way to show the dates that are missing?
Thank you,
Marat
One idea is that you could have a table with all of the dates in it that you want to show and then do an outer join with that table.
So if you had a table called alldates with one column (job_data_date):
SELECT ad.job_data_date, COALESCE(SUM(jd.job_data_invoice_amount), 0) AS job_data_date_income
FROM alldates ad left outer join job_data jd on ad.job_data_date = jd.job_data_date
WHERE ad.job_data_date >= '2010-05-05'
GROUP BY ad.job_data_date
ORDER BY ad.job_data_date;
The downside is that you would need to keep this table populated with all of the dates you want to show.
There's no reasonable way to do this using pure SQL, on MySQL at least, without creating a table with every date ever devised. Your best option is to alter the application that's using the results of that query to fill in the holes itself. Rather than graphing only the values it received, it can construct its own set of values with a simple loop, counting up one day at a time and filling in values from the query wherever they're available.
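A minimal sketch of that application-side loop, written in Python purely for illustration (the original application's language is not stated). The function name `fill_missing_days` and the shape of `rows` (pairs of date and income, as the query above would return them) are assumptions.

```python
from datetime import date, timedelta

def fill_missing_days(rows, start, end):
    """Return one (day, income) pair per calendar day from start to end inclusive.

    Days that the query did not return get an income of 0.
    """
    income_by_day = dict(rows)
    filled = []
    day = start
    while day <= end:
        filled.append((day, income_by_day.get(day, 0)))
        day += timedelta(days=1)
    return filled

# Rows as they appear in the example output of the question.
rows = [
    (date(2010, 5, 17), 125),
    (date(2010, 5, 18), 190),
    (date(2010, 5, 20), 170),
]
filled = fill_missing_days(rows, date(2010, 5, 17), date(2010, 5, 20))
```

The graphing code can then iterate over `filled` instead of the raw query result, so 2010-05-19 appears with a value of zero.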
Related
Is there a way to auto increment the id field of my database based on the values of two other columns in the inserted row?
I'd like to set up my database so that when multiple rows are inserted at the same time, they keep their tracknumber ordering. The ID field should auto increment based firstly on the automatically generated timestamp field, and then secondly the tracknumber contained within that timestamp.
Here's an example of how the database might look:
id | tracknumber | timestamp
________________________________________
1 | 1 | 2014-03-31 11:35:17
2 | 2 | 2014-03-31 11:35:17
3 | 3 | 2014-03-31 11:35:17
4 | 1 | 2014-04-01 09:10:14
5 | 2 | 2014-04-01 09:10:14
I've been reading up on triggers but I'm not sure whether they're appropriate here. I feel as though I'm missing an obvious function.
This is a bit long for a comment.
There is no automatic way to do this. You can do it with triggers, if you like. Note the plural: you will need triggers for insert, update, and delete if you want the numbering to remain accurate as the data changes.
You can do this on the query side, if the goal is to enumerate the values. Here is one method using a subquery:
select t.*,
(select count(*) from table t2 where t2.timestamp = t.timestamp and t2.id <= t.id
) as tracknumber
from table t;
The performance of this might even be reasonable with an index on table(timestamp, id).
If the data is being created once, you can also populate the values using an update query.
If you are inserting the rows in one transaction and/or script, sort the data yourself on the server side by these two fields (assuming you also create the timestamp on the server side, which would seem logical) and then insert the rows one after another. I don't think it is necessary to overthink this and look for a difficult approach in the database. The database still inserts rows one after another, not all at once, so it has no way of knowing that some kind of sorting is needed beforehand. You are the one who has to do it.
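A rough sketch of the sort-then-insert idea, using Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere. The table name `tracks` and the sample batch are assumptions; SQLite's `AUTOINCREMENT` plays the role of MySQL's `AUTO_INCREMENT` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " tracknumber INTEGER,"
    " timestamp TEXT)"
)

# A batch that arrives in arbitrary order: (tracknumber, timestamp).
ts = "2014-03-31 11:35:17"
batch = [(3, ts), (1, ts), (2, ts)]

# Sort by timestamp first, then by tracknumber, before inserting.
batch.sort(key=lambda row: (row[1], row[0]))

# Rows are inserted one after another, so ids follow the sorted order.
conn.executemany(
    "INSERT INTO tracks (tracknumber, timestamp) VALUES (?, ?)", batch
)
conn.commit()

stored = conn.execute(
    "SELECT id, tracknumber FROM tracks ORDER BY id"
).fetchall()
```

Because the ordering is decided before the first insert, no trigger is involved; the auto-increment column simply records the order the application chose.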
I am building a front-end to a largish db (10's of millions of rows). The data is water usage for loads of different companies and the table looks something like:
id | company_id | datetime | reading | used | cost
=============================================================
1 | 1 | 2012-01-01 00:00:00 | 5000 | 5 | 0.50
2 | 1 | 2012-01-01 00:01:00 | 5015 | 15 | 1.50
....
On the frontend users can select how they want to view the data, e.g. in 6-hourly, daily, or monthly increments. What would be the best way to do this quickly? Given how much the data changes and how few times any one set of data will be seen, caching the query data in memcached or something similar is almost pointless, and there is no way to build the data beforehand as there are too many variables.
I figured some kind of aggregate table would work, having tables such as readings, readings_6h, readings_1d with exactly the same structure, just already aggregated.
If this is a viable solution, what is the best way to keep the aggregate tables up to date and accurate? Besides the data coming in from meters, the table is read-only. Users never have to update or write to it.
A number of possible solutions include:
1) stick to doing queries with group / aggregate functions on the fly
2) doing a basic select and save
SELECT `company_id`, CONCAT_WS(' ', date(`datetime`), '23:59:59') AS datetime,
MAX(`reading`) AS reading, SUM(`used`) AS used, SUM(`cost`) AS cost
FROM `readings`
WHERE `datetime` > '$lastUpdateDateTime'
GROUP BY `company_id`, DATE(`datetime`)
3) duplicate key update (not sure how the aggregation would be done here, or how to make sure the data is accurate, i.e. nothing counted twice and no rows missing)
INSERT INTO `readings_6h` ...
SELECT FROM `readings` ....
ON DUPLICATE KEY UPDATE .. calculate...
4) other ideas / recommendations?
I am currently doing option 2 which is taking around 15 minutes to aggregate +- 100k rows into +- 30k rows over 4 tables (_6h, _1d, _7d, _1m, _1y)
TL;DR: What is the best way to view / store aggregate data for numerous reports that can't be cached effectively?
This functionality would be best served by materialized views, a feature MySQL unfortunately lacks. You could consider migrating to a different database system, such as PostgreSQL.
There are ways to emulate materialized views in MySQL using stored procedures, triggers, and events. You create a stored procedure that updates the aggregate data. If the aggregate data has to be updated on every insert you could define a trigger to call the procedure. If the data has to be updated every few hours you could define a MySQL scheduler event or a cron job to do it.
There is a combined approach, similar to your option 3, that does not depend on the dates of the input data; imagine what would happen if some new data arrives a moment too late and does not make it into the aggregation. (You might not have this problem, I don't know.) You could define a trigger that inserts new data into a "backlog," and have the procedure update the aggregate table from the backlog only.
All these methods are described in detail in this article: http://www.fromdual.com/mysql-materialized-views
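To illustrate why the backlog approach does not depend on the dates of the input, here is a small Python sketch of the aggregation step only. It is not the linked article's implementation: the in-memory list stands in for the backlog table, the dictionary stands in for `readings_6h`, and the names `bucket_start` and `apply_backlog` are hypothetical. It assumes the bucket size in hours divides 24.

```python
from datetime import datetime

def bucket_start(ts, hours=6):
    """Round a timestamp down to the start of its bucket (hours must divide 24)."""
    return ts.replace(hour=ts.hour - ts.hour % hours,
                      minute=0, second=0, microsecond=0)

def apply_backlog(aggregates, backlog, hours=6):
    """Fold every backlog row into the aggregates, emptying the backlog.

    Each row is (company_id, datetime, reading, used, cost). A row is
    counted exactly once, however late it arrives, because it is removed
    from the backlog as soon as it has been applied.
    """
    while backlog:
        company_id, ts, reading, used, cost = backlog.pop()
        key = (company_id, bucket_start(ts, hours))
        agg = aggregates.setdefault(
            key, {"reading": reading, "used": 0, "cost": 0.0})
        agg["reading"] = max(agg["reading"], reading)
        agg["used"] += used
        agg["cost"] += cost
    return aggregates

aggregates = {}
backlog = [
    (1, datetime(2012, 1, 1, 0, 0), 5000, 5, 0.50),
    (1, datetime(2012, 1, 1, 0, 1), 5015, 15, 1.50),
]
apply_backlog(aggregates, backlog)

# A late row for the same bucket is simply merged on the next run.
backlog.append((1, datetime(2012, 1, 1, 5, 59), 5020, 5, 0.50))
apply_backlog(aggregates, backlog)
```

In MySQL the same merge step would be the body of the stored procedure, with the trigger doing the equivalent of `backlog.append`.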
I have a dataset from which I need to count all row occurrences, grouped by day, and collect them into a dataset of the following format:
| date | count |
| 2001-01-01 | 11 |
| 2001-01-02 | 0 |
| 2001-01-03 | 4 |
The problem is that some of the data is missing for certain periods of time, and new dates should be created with a count of zero. I have searched various topics on this same issue, and from them I've learned that it can be solved by creating a temporary calendar table to hold all the dates and joining the result dataset with that table.
However, I have only read access to the database I'm using, so it's not possible for me to create a separate calendar table. Could this be solved in a single query? If not, I could always do this in PHP, but I would prefer a more straightforward way.
EDIT: Just to clarify based on the questions asked in the comments: the missing dates are required for a specific, user-given time frame. E.g. the query could be:
SELECT date(timestamp), count(distinct(id))
FROM `table`
WHERE date(timestamp) BETWEEN date("2001-01-01") AND date("2001-12-31")
GROUP BY date(timestamp)
SQL is really not made for this kind of job :/
It's possible, but really, really messy, and I strongly discourage you from doing it.
The easiest way would be to have a separate calendar table, but as you said, you only have read access to your database.
The other one is to generate the sequence using this kind of trick:
SELECT @rownum:=@rownum+1 rownum, t.* FROM (SELECT @rownum:=0) r, ("yourquery") t;
I won't get into it, as I already told you, it's really ugly :(
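A different technique from the variable trick above, for servers that support recursive common table expressions (MySQL 8.0 and later do; older MySQL versions do not): generate the dates inside the query itself and left join the data onto them. The sketch below runs against an in-memory SQLite database through Python's `sqlite3` module so it can be tried without a MySQL server; the table name `events` and the sample rows are assumptions, and the date arithmetic (`date(day, '+1 day')`) is SQLite syntax that would need to be translated for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    (1, "2001-01-01 08:00:00"),
    (2, "2001-01-01 09:30:00"),
    (3, "2001-01-03 12:00:00"),
])

query = """
WITH RECURSIVE days(day) AS (
    SELECT :start
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < :end
)
SELECT days.day, COUNT(DISTINCT events.id) AS occurrences
FROM days
LEFT JOIN events ON date(events.timestamp) = days.day
GROUP BY days.day
ORDER BY days.day
"""

# One row per day in the requested range, including days without events.
counts = conn.execute(
    query, {"start": "2001-01-01", "end": "2001-01-03"}
).fetchall()
```

Only read access is needed for the query itself; the `CREATE TABLE` and `INSERT` statements exist solely to give the sketch some data.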
try this...
SELECT Date, COUNT(*) Count
FROM yourtable
GROUP BY Date
This works for sure!!!
Let me know, if it helped!
I have two simple MySQL tables:
SYMBOL
| id | symbol |
(INT(primary) - varchar)
PRICE
| id | id_symbol | date | price |
(INT(primary), INT(index), date, double)
I have to pass two symbols to get something like:
DATE | A | B
2001-01-01 | 100.25 | 25.26
2001-01-02 | 100.23 | 25.25
2001-01-03 | 100.24 | 25.24
2001-01-04 | 100.25 | 25.26
2001-01-05 | 100.26 | 25.28
2001-01-06 | 100.27 | 30.29
Where A and B are the symbols I need to search for and the date is the date of the prices (I need the same date in order to compare the symbols).
If one symbol has no price on a date that the other one has, that date should be skipped. I only need to retrieve the last N prices of those symbols.
ORDER: from the earliest date to latest (example the last 100 prices of both)
How could I implement this query?
Thank you
Implementing these steps should bring you the desired result:
1) Get dates and prices for symbol A. (Inner join PRICE with SYMBOL to obtain the necessary rows.)
2) Similarly get dates and prices for symbol B.
3) Inner join the two result sets on the date column and pull the price from the first result set as the A column and the other one as B.
This should be simple if you know how to join tables.
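The steps above can be sketched as a single query. The example below runs it against an in-memory SQLite database via Python's `sqlite3` module as a stand-in for MySQL; the sample prices are made up, and the "last N, oldest first" wrapping (inner `ORDER BY ... DESC LIMIT`, outer `ORDER BY`) is one possible reading of the ordering requirement in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE symbol (id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE price (id INTEGER PRIMARY KEY, id_symbol INTEGER,
                    date TEXT, price REAL);
INSERT INTO symbol VALUES (1, 'A'), (2, 'B');
INSERT INTO price (id_symbol, date, price) VALUES
    (1, '2001-01-01', 100.25), (2, '2001-01-01', 25.26),
    (1, '2001-01-02', 100.23),
    (1, '2001-01-03', 100.24), (2, '2001-01-03', 25.24),
    (1, '2001-01-04', 100.25), (2, '2001-01-04', 25.26);
""")

query = """
SELECT date, a, b FROM (
    -- Steps 1-3: prices of both symbols joined on the shared date.
    SELECT pa.date AS date, pa.price AS a, pb.price AS b
    FROM price pa
    JOIN symbol sa ON sa.id = pa.id_symbol AND sa.symbol = :a
    JOIN price pb ON pb.date = pa.date
    JOIN symbol sb ON sb.id = pb.id_symbol AND sb.symbol = :b
    ORDER BY pa.date DESC   -- newest first, so LIMIT keeps the last N
    LIMIT :n
) AS latest
ORDER BY date               -- present them from earliest to latest
"""

rows = conn.execute(query, {"a": "A", "b": "B", "n": 2}).fetchall()
```

The date 2001-01-02 never appears in the result because symbol B has no price on that day, which is the skipping behaviour the question asks for.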
I think you should update your question to fix the mistakes you made in representing your data; I'm having a hard time following the details. However, based on what I am seeing, I think there are four MySQL concepts you need to solve your problem.
The first is JOIN. You would use a join to put two tables together so that you can select related data using the key that you describe as "id_symbol".
The second is LIMIT, which lets you specify the number of records to return: for one record you would use the keyword LIMIT 1, and for a hundred records LIMIT 100.
The third is the WHERE clause, which lets you search for a specific value in one of the fields of the table you are querying.
The last is ORDER BY, which lets you specify a field to sort the returned records by and the direction in which to sort them, ASC or DESC.
An Example:
SELECT *
FROM table1
JOIN table2 ON table1.id = table2.table1_id
WHERE table1.searchfield = 'search string'
ORDER BY table1.orderfield DESC
LIMIT 100
(This is pseudo code so this query may not actually work but is close and should provide you with the correct idea.)
I suggest consulting the MySQL documentation, which should provide everything you need to keep going.
I want to write a query that, for any given start date in the past, has as each row a week-long date interval up to the present.
For instance, given a start date of 2010-11-13 and a present date of 2010-12-16, I want a result set like
+------------+------------+
| Start | End |
+------------+------------+
| 2010-11-15 | 2010-11-21 |
+------------+------------+
| 2010-11-22 | 2010-11-28 |
+------------+------------+
| 2010-11-29 | 2010-12-05 |
+------------+------------+
| 2010-12-06 | 2010-12-12 |
+------------+------------+
It doesn't go past 2010-12-12 because the week-long period that contains the present date isn't complete.
I can't get a foothold on how I would even start to write this query.
Can I do this in a single query? Or should I use code for looping, and do multiple queries?
It's quite difficult (but not impossible) to create such a result set dynamically in MySQL versions before 8.0, as they support none of recursive CTEs, CONNECT BY, or generate_series, which I would use to do this in other databases.
Here's an alternative approach you can use.
Create and prepopulate a table containing all the possible rows from some date far in the past to some date far in the future. Then you can easily generate the result you need by querying this table with a WHERE clause, using an index to make the query efficient.
The drawbacks of this approach are quite obvious:
It takes up storage space unnecessarily.
If you query outside of the range that you populated your table with you won't get any results, which means that you will either have to populate the table with enough dates to last the lifetime of your application or else you need a script to add more dates every so often.
See also this related question:
How do I make a row generator in MySQL
Beware, this is just a concept: I do not have a MySQL installation at hand, so I cannot test it.
However, I would base this on a table containing the integers, in order to emulate a series.
Something like :
CREATE TABLE integers_table
(
id integer primary key
);
Followed by (warning, this is pseudo code)
INSERT INTO integers_table(0…32767);
(that should be enough weeks for the rest of our lives :-)
Then
FirstMondayInUnixTimeStamp_Utc = 3600 * 24 * 4
SecondPerDay = 3600 * 24
(since 1 Jan 1970 was a Thursday. Beware, I did not cross-check! I might be off by a few hours!)
And then
CREATE VIEW weeks
AS
-- 345600 = 4 days in seconds (first Monday after 1970-01-01), 604800 = 7 days, 518400 = 6 days.
-- A view cannot reference variables, so the constants are written out as literals.
SELECT id AS week_id,
FROM_UNIXTIME(345600 + id * 604800) AS week_start,
FROM_UNIXTIME(345600 + id * 604800 + 518400) AS week_end
FROM integers_table;
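For the "use code for looping" option raised in the question, the week boundaries can also be computed entirely in application code, with no helper table. This is a sketch in Python (the asker's language is not stated); the function name `complete_weeks` is hypothetical, and it assumes weeks run Monday to Sunday and that a week counts as complete only once its Sunday lies before the present date, as in the example result set.

```python
from datetime import date, timedelta

def complete_weeks(start, today):
    """Return (monday, sunday) pairs for every full week after start.

    The first week begins on the first Monday on or after start; a week
    is included only if its Sunday is strictly before today.
    """
    monday = start + timedelta(days=(7 - start.weekday()) % 7)
    weeks = []
    while monday + timedelta(days=6) < today:
        weeks.append((monday, monday + timedelta(days=6)))
        monday += timedelta(days=7)
    return weeks

weeks = complete_weeks(date(2010, 11, 13), date(2010, 12, 16))
for week_start, week_end in weeks:
    print(week_start.isoformat(), week_end.isoformat())
```

With the dates from the question this yields the four weeks from 2010-11-15 through 2010-12-12; each pair can then be passed as parameters to whatever per-week query is needed.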