I have a complex(?) SQL query that I need to build. We have an application that captures a simple data set for multiple clients:
ClientID | AttributeName | AttributeValue | TimeReceived
----------------------------------------------------------------
002 | att1 | 123.98 | 23:02:00 02-03-2017
----------------------------------------------------------------
003 | att2 | 987.2 | 23:02:00 02-03-2017
I need to be able to return a single record per client that looks something like this:
Attribute | Hour_1 | Hour_2 | Hour_x |
--------------------------------------
att1 | 120.67 |
--------------------------------------
att2 | 10 | 89.3 |
The hours are determined by a time passed to the query. If the time were 11:00 on 02-03-2017, then hour 1 would cover 10:00-11:00 on 02-03-2017 and hour 2 would cover 09:00-10:00 on 02-03-2017. Attributes are assigned to these hourly buckets based on the hour and date in their timestamp (not all buckets will have data). There will be a limit on the number of hours covered by a single query. In summary, there are possibly 200-300 attributes and up to 172 hourly buckets. To be honest, I am not really sure where to start building a query like this. Any guidance appreciated.
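One possible starting point, offered as a rough sketch and not a definitive answer, is conditional aggregation: generate one CASE column per hourly bucket and group by attribute. The sketch below uses Python's sqlite3 module only so that it can be run as-is. The table name readings, the snake_case column names, the ISO-8601 text timestamps, the choice of AVG for combining several values in one bucket, and the sample rows are all assumptions, not taken from the question.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # assumed storage format: ISO-8601 text, so string order == time order


def hourly_pivot_sql(n_hours):
    """Build a query with one conditional-aggregation column per hourly bucket."""
    cols = [
        f"AVG(CASE WHEN time_received >= :s{i} AND time_received < :e{i} "
        f"THEN attribute_value END) AS hour_{i}"
        for i in range(1, n_hours + 1)
    ]
    return (
        "SELECT attribute_name, " + ", ".join(cols) + " "
        "FROM readings "
        "WHERE client_id = :client "
        "AND time_received >= :oldest AND time_received < :ref "
        "GROUP BY attribute_name ORDER BY attribute_name"
    )


def hourly_pivot(conn, client_id, ref, n_hours):
    """Hour 1 is [ref - 1h, ref), hour 2 is [ref - 2h, ref - 1h), and so on."""
    params = {
        "client": client_id,
        "ref": ref.strftime(FMT),
        "oldest": (ref - timedelta(hours=n_hours)).strftime(FMT),
    }
    for i in range(1, n_hours + 1):
        params[f"s{i}"] = (ref - timedelta(hours=i)).strftime(FMT)
        params[f"e{i}"] = (ref - timedelta(hours=i - 1)).strftime(FMT)
    return conn.execute(hourly_pivot_sql(n_hours), params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (client_id TEXT, attribute_name TEXT, "
    "attribute_value REAL, time_received TEXT)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [
        ("002", "att1", 120.67, "2017-03-02 10:15:00"),
        ("002", "att2", 10.0, "2017-03-02 10:30:00"),
        ("002", "att2", 89.3, "2017-03-02 09:45:00"),
        ("003", "att2", 987.2, "2017-03-02 10:05:00"),
    ],
)
rows = hourly_pivot(conn, "002", datetime(2017, 3, 2, 11, 0), 3)
for row in rows:
    print(row)
```

Empty buckets come back as NULL (None in Python). With 172 buckets the generated statement is long but still a single query; an index on (client_id, time_received) would probably help, though that is untested here.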
I have a job table, where each job has some metrics like cost, time taken, etc. I'd like to select information for a set of jobs, like the requestor and job action, and in addition to that row data, select some high-level metrics (min cost, max cost, min time taken, etc.).
The data changes frequently, so I'd like to get this information in a single select. Is it possible to do this? I'm not sure if this is conceptually possible because the DB would have to return row-level data along with aggregate data.
Right now I can get all the details and calculate the min/max, something like this:
select requestor, action, cost, time_taken from job;
But then I have to write code to find the min/max and this query has to download all the cost/time data when I am really only interested in the min/max. I really want to do something like
select (min(cost), max(cost), min(time_taken), max(time_taken)), (requestor, action) from job;
And get the aggregate data first, and then the row level data. Is this possible? (On a real server this is on MySQL, but for dev I locally use sqlite so it'd be nice if it worked there too, but not required).
The table looks something like this:
+----+-----------+--------+------+------------+
| id | requestor | action | cost | time_taken |
+----+-----------+--------+------+------------+
| 1 | 31233 | sync | 8 | 423.3 |
+----+-----------+--------+------+------------+
| 2 | 11229 | read | 1 | 1.3 |
+----+-----------+--------+------+------------+
| 3 | 1434 | edit | 5 | 152.8 |
+----+-----------+--------+------+------------+
| 4 | 101781 | sync | 12 | 712.1 |
+----+-----------+--------+------+------------+
I'd like to get back the stats:
min/max cost: 1/12
min/max time_taken: 1.3/712.1
and all the requestors and actions:
+-----------+--------+
| requestor | action |
+-----------+--------+
| 31233 | sync |
+-----------+--------+
| 11229 | read |
+-----------+--------+
| 1434 | edit |
+-----------+--------+
| 101781 | sync |
+-----------+--------+
Do you just want aggregation?
select requestor, action, min(cost), max(cost), min(time_taken), max(time_taken)
from job
group by requestor, action;
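If the goal is instead to get every row together with the table-wide min/max in one select, one option (a sketch, not the only way) is to cross join the row data with a one-row aggregate subquery. The aliases j, s, min_cost and so on are made up for illustration. The example runs against SQLite through Python's sqlite3 module; the same SELECT should also be valid on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, requestor INTEGER, "
    "action TEXT, cost INTEGER, time_taken REAL)"
)
# Sample rows copied from the table in the question.
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?, ?, ?)",
    [
        (1, 31233, "sync", 8, 423.3),
        (2, 11229, "read", 1, 1.3),
        (3, 1434, "edit", 5, 152.8),
        (4, 101781, "sync", 12, 712.1),
    ],
)

# The subquery yields exactly one row, so the cross join repeats the
# aggregates next to every job row without changing the row count.
QUERY = """
SELECT j.requestor, j.action,
       s.min_cost, s.max_cost, s.min_time, s.max_time
FROM job AS j
CROSS JOIN (
    SELECT MIN(cost) AS min_cost, MAX(cost) AS max_cost,
           MIN(time_taken) AS min_time, MAX(time_taken) AS max_time
    FROM job
) AS s
ORDER BY j.id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The aggregates are repeated on every row, which costs a few bytes per row but keeps everything in a single consistent read.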
I have a data set with this structure:
ContractNumber | MonthlyPayment | Duration | StartDate | EndDate
One contract number can occur many times as this data set is a consolidation of different reports with the same structure.
Now I want to filter / find the contract numbers in which MonthlyPayment and/or Duration and/or StartDate and/or EndDate differ.
Example (note that Contract Number is not a Primary key):
ContractNumber | MonthlyPayment | Duration | StartDate | EndDate
001 | 500 | 12 | 01.01.2015 | 31.12.2015
001 | 500 | 12 | 01.01.2015 | 31.12.2015
001 | 500 | 12 | 01.01.2015 | 31.12.2015
002 | 1500 | 24 | 01.01.2014 | 31.12.2017
002 | 1500 | 24 | 01.01.2014 | 31.12.2017
002 | 1500 | 24 | 01.01.2014 | 31.12.2018
With this sample data set, I would need to retrieve 002 with a specific query. 001 is always the same and does not change, but 002 changes over time.
Besides writing a VBA script that runs over an Excel sheet, I don't have any solid idea of how to solve this with SQL.
My first idea would be an SQL approach with grouping, where identical values are grouped together but differing ones are not. I am currently experimenting with this. My current attempt is:
1.) Have the usual table
2.) Create a second table / query with this structure:
ContractNumber | AVG(MonthlyPayment) | AVG(Duration) | AVG(StartDate) | AVG(EndDate)
Which I created with Grouping.
E.G.
Table 1.)
ContractNumber | MonthlyPayment
1 | 10
1 | 10
1 | 20
2 | 300
2 | 300
2 | 300
Table 2.)
ContractNumber | AVG(MonthlyPayment)
1 | 13.3
2 | 300
3.) Now I want to find the distinct contract numbers where a value (in this example only MonthlyPayment) does not equal the average. It should be the same; otherwise we have a variation, which is what I need to find.
Do you have any idea how I could solve this? Otherwise I would start writing a VBA or Python script. I have the data set as CSV, so for now I could also do it with MySQL, Power BI or Excel.
I need to perform this analysis only once, so I don't need a complete solution; the queries can be split into separate steps.
Much appreciated! Thank you very much.
To find all contract numbers with differences, use:
select ContractNumber
from
(
select distinct ContractNumber, MonthlyPayment , Duration , StartDate , EndDate
from MyTable
) x
group by ContractNumber
having count(*) >1
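To check the query above against the sample data from the question, here is a small runnable sketch using Python's sqlite3 module. The column types are assumptions, and the dates are kept as the text values shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (ContractNumber TEXT, MonthlyPayment INTEGER, "
    "Duration INTEGER, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("001", 500, 12, "01.01.2015", "31.12.2015"),
        ("001", 500, 12, "01.01.2015", "31.12.2015"),
        ("001", 500, 12, "01.01.2015", "31.12.2015"),
        ("002", 1500, 24, "01.01.2014", "31.12.2017"),
        ("002", 1500, 24, "01.01.2014", "31.12.2017"),
        ("002", 1500, 24, "01.01.2014", "31.12.2018"),
    ],
)

# Collapse exact duplicates first; any contract that still has more than
# one row differs in at least one of the compared columns.
QUERY = """
SELECT ContractNumber
FROM (
    SELECT DISTINCT ContractNumber, MonthlyPayment, Duration, StartDate, EndDate
    FROM MyTable
) AS x
GROUP BY ContractNumber
HAVING COUNT(*) > 1
"""
changed = [row[0] for row in conn.execute(QUERY)]
print(changed)
```

Only 002 is returned, because its third row has a different EndDate.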
I have a project, an online service; I built part of it and got stuck. When a user uses a service, it must charge some amount (e.g. $5 per service). I don't know how to design the MySQL tables. I have made two tables: the first holds the remaining balance, the second records additions and subtractions. Maybe this is the wrong way; what is the best practice?
action_table
id | userId | reason | amount
1 | 4 | for service 3 | -5
2 | 2 | refill account | 100
3 | 13 | for service 3 | -5
balance_table
1 | 4 | 23
2 | 2 | 125
3 | 13 | 0
After a service is used, a query adds one row to action_table and updates balance_table.
Personally, if I was making an account database, I would have one table for an account and one for transactions, like this:
Accounts:
| id | user | name | balance |
Transactions:
| id | account_id | description | amount | is_withdrawal |
The reason I came up with this is that it sometimes helps to think of database tables as real-world objects, and in this case you have a Transaction and an Account.
Then you can use a TRIGGER to update the account table any time you add a transaction.
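A minimal sketch of such a trigger, written for SQLite through Python's sqlite3 module so it can be run directly (MySQL trigger syntax differs slightly, e.g. it needs FOR EACH ROW). The column types, the trigger name apply_transaction, and the convention that amount is stored as a positive number with is_withdrawal deciding the sign are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        user    INTEGER,
        name    TEXT,
        balance NUMERIC NOT NULL DEFAULT 0
    );
    CREATE TABLE transactions (
        id            INTEGER PRIMARY KEY,
        account_id    INTEGER NOT NULL REFERENCES accounts(id),
        description   TEXT,
        amount        NUMERIC NOT NULL,           -- assumed always positive
        is_withdrawal INTEGER NOT NULL DEFAULT 0  -- 1 = subtract, 0 = add
    );
    -- Keep accounts.balance in step with every inserted transaction.
    CREATE TRIGGER apply_transaction AFTER INSERT ON transactions
    BEGIN
        UPDATE accounts
        SET balance = balance
            + CASE WHEN NEW.is_withdrawal THEN -NEW.amount ELSE NEW.amount END
        WHERE id = NEW.account_id;
    END;
    """
)

conn.execute("INSERT INTO accounts (id, user, name) VALUES (1, 2, 'main')")
conn.execute(
    "INSERT INTO transactions (account_id, description, amount, is_withdrawal) "
    "VALUES (1, 'refill account', 100, 0)"
)
conn.execute(
    "INSERT INTO transactions (account_id, description, amount, is_withdrawal) "
    "VALUES (1, 'for service 3', 5, 1)"
)
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)
```

Because the balance can always be recomputed from the transactions table, the stored balance is effectively a cache; the trigger just keeps it current.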
I have this existing schema where a "schedule" table looks like this (very simplified).
CREATE TABLE schedule (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(45),
start_date date,
availability int(3),
PRIMARY KEY (id)
);
For each person it specifies a start date and the percentage of work time available to spend on this project. That availability percentage implicitly continues until a newer value is specified.
For example take a project that lasts from 2012-02-27 to 2012-03-02:
id | name | start_date | availability
-------------------------------------
1 | Tom | 2012-02-27 | 100
2 | Tom | 2012-02-29 | 50
3 | Ben | 2012-03-01 | 80
So Tom starts on Feb. 27th, full time, until Feb. 29th, from which point he'll be available for only 50% of his work time.
Ben only starts on March 1st, and only with 80% of his time.
Now the goal is to "normalize" this sparse data, so that there is a result row for each person for each day with the availability coming from the last specified day:
name | start_date | availability
--------------------------------
Tom | 2012-02-27 | 100
Tom | 2012-02-28 | 100
Tom | 2012-02-29 | 50
Tom | 2012-03-01 | 50
Tom | 2012-03-02 | 50
Ben | 2012-02-27 | 0
Ben | 2012-02-28 | 0
Ben | 2012-02-29 | 0
Ben | 2012-03-01 | 80
Ben | 2012-03-02 | 80
Think of a chart showing the availability of each person over time, or calculating the "resource" values in a burndown diagram.
I can easily do this with procedural code in the app layer, but would prefer a nicer, faster solution.
To make this remotely effective, I recommend creating a calendar table. One that contains each and every date of interest. You then use that as a template on which to join your data.
Equally, things improve further if you have a person table to act as the template for the name dimension of your results.
You can then use a correlated sub-query in your join, to pick which record in Schedule matches the calendar, person template you have created.
SELECT
*
FROM
calendar
CROSS JOIN
person
LEFT JOIN
schedule
ON schedule.name = person.name
AND schedule.start_date = (SELECT MAX(start_date)
FROM schedule
WHERE name = person.name
AND start_date <= calendar.date)
WHERE
calendar.date >= <yourStartDate>
AND calendar.date <= <yourEndDate>
etc
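For reference, here is a runnable sketch of the query above against the sample data from the question, using SQLite through Python's sqlite3 module. The calendar and person table definitions, the COALESCE to 0 for days before a person's first entry, the s2 alias, and the :start / :end parameter names are assumptions added for the demo.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schedule (id INTEGER PRIMARY KEY, name TEXT,
                           start_date TEXT, availability INTEGER);
    CREATE TABLE calendar (date TEXT PRIMARY KEY);
    CREATE TABLE person (name TEXT PRIMARY KEY);
    INSERT INTO schedule VALUES (1, 'Tom', '2012-02-27', 100),
                                (2, 'Tom', '2012-02-29', 50),
                                (3, 'Ben', '2012-03-01', 80);
    INSERT INTO person VALUES ('Tom'), ('Ben');
    """
)
# Fill the calendar with every day of the project.
first_day = date(2012, 2, 27)
conn.executemany(
    "INSERT INTO calendar VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(5)],
)

# For each (day, person) pair, pick the schedule row with the latest
# start_date on or before that day; no such row means availability 0.
QUERY = """
SELECT person.name, calendar.date,
       COALESCE(schedule.availability, 0) AS availability
FROM calendar
CROSS JOIN person
LEFT JOIN schedule
       ON schedule.name = person.name
      AND schedule.start_date = (SELECT MAX(s2.start_date)
                                 FROM schedule AS s2
                                 WHERE s2.name = person.name
                                   AND s2.start_date <= calendar.date)
WHERE calendar.date >= :start AND calendar.date <= :end
ORDER BY person.name, calendar.date
"""
rows = conn.execute(QUERY, {"start": "2012-02-27", "end": "2012-03-02"}).fetchall()
for row in rows:
    print(row)
```

This yields ten rows, five per person, with Ben at 0 until 2012-03-01.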
Often, however, it is more efficient to deal with it in one of two other ways...
Don't allow gaps in the data in the first place. Have a nightly batch process, or some other business logic, that ensures all relevant data points are populated.
Or deal with it in your client. Return each dimension in your report (date and name) as separate data sets to act as your templates, and then return the data as your final data set. Your client can iterate over the data and fill in the blanks as appropriate. It's more code, but can actually use fewer resources overall than trying to fill the gaps with SQL.
(If your client-side code does this slowly, post another question examining that code. Provided that the data is sorted, this is actually quite quick to do in most languages.)
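As an illustration of the client-side option, here is a rough Python sketch that fills the gaps in a single pass over sorted schedule rows. The function name fill_forward and the default of 0 before a person's first entry are assumptions chosen to match the expected output in the question.

```python
from datetime import date, timedelta


def fill_forward(schedule_rows, names, start, end):
    """Expand sparse (name, start_date, availability) rows into one row per
    person per day. schedule_rows must be sorted by start_date within each name."""
    changes_by_name = {}
    for name, day, availability in schedule_rows:
        changes_by_name.setdefault(name, []).append((day, availability))

    result = []
    for name in names:
        changes = changes_by_name.get(name, [])
        idx = 0
        current = 0  # assumed availability before the first entry
        day = start
        while day <= end:
            # Apply every change that takes effect on or before this day.
            while idx < len(changes) and changes[idx][0] <= day:
                current = changes[idx][1]
                idx += 1
            result.append((name, day, current))
            day += timedelta(days=1)
    return result


schedule_rows = [
    ("Tom", date(2012, 2, 27), 100),
    ("Tom", date(2012, 2, 29), 50),
    ("Ben", date(2012, 3, 1), 80),
]
filled = fill_forward(schedule_rows, ["Tom", "Ben"], date(2012, 2, 27), date(2012, 3, 2))
for row in filled:
    print(row)
```

Each schedule row is visited once, so the work is proportional to the number of output rows plus the number of schedule rows.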
Objective: Convert an overgrown Excel sheet into an Access database, but maintain a front-end that is familiar and easy to use.
There are several aspects to this, but the one I'm stuck on is one of the input forms. I'm not going to clutter this question with the back-end implementation that I have already tried because I'm open to changing it. Currently, an Excel spreadsheet is used to input employee hour allocations to various tasks. It looks something like the following.
Employee | Task | 10/03/10 | 10/10/10 | 10/17/10 | 10/24/10 | ... | 12/26/11
---------------------------------------------------------------------------------
Doe, John | Code | 16 | 16 | 20 | 20 | ... | 40
---------------------------------------------------------------------------------
Smith, Jane | Code | 32 | 32 | 16 | 32 | ... | 32
---------------------------------------------------------------------------------
Doe, John | Test | 24 | 24 | 20 | 20 | ... | 0
---------------------------------------------------------------------------------
Smith, Jane | Test | 0 | 0 | 16 | 0 | ... | 0
---------------------------------------------------------------------------------
Smith, Jane | QA | 8 | 8 | 8 | 8 | ... | 8
---------------------------------------------------------------------------------
TOTAL | 80 | 80 | 80 | 80 | ... | 80
Note that there are fifteen months of data on the sheet and that employee allocations are entered for each week of those fifteen months. Currently, at the end of the fifteen months, a new sheet is created, but the database should maintain this data for historical purposes.
Does anyone have any ideas on how to create an editable form/datasheet that has the same look and feel? If not, how about an alternative solution that still provides the user a quick glance at all fifteen months and allows easy editing of the data? What would the back-end tables look like for your proposed solution?
This is a classic de-normalization problem.
To produce an editable, spreadsheet-like view of your database you'll need a table with 66 columns (the two identifying columns and 64 weekly integer columns). The question is whether you want the permanent storage of the data to use this table, or to use a normalized table with four columns (the two identifiers, the week-starting date, and the integer hours value).
I would give serious consideration to storing the data in the normalized form, then converting (using a temporary table) into the denormalized form, allowing the user to print/edit the data, and then converting back to normal form.
Using this technique you get the following benefits:
The ability to support rolling windows into the data (with 66 fixed columns, you will see one specified 15-month period and no other). With a rolling window you can show the last month and the next 14 months, or whatever.
It will be substantially easier to do things like check people's total hours per month, or compare the hours spent in testing vs. QA for an arbitrary range of dates.
Of course, you have to write the code to translate between normal and denormal form, but that should be pretty straightforward.
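To give a feel for the translation step, here is a language-neutral sketch in Python (in Access itself this would more likely be VBA or a crosstab query). The function names to_wide and to_long, the tuple layout, and the use of the sheet's date labels as plain strings are assumptions for illustration.

```python
def to_wide(rows, weeks):
    """Normalized (employee, task, week, hours) rows ->
    {(employee, task): [hours for each week in `weeks`]}."""
    position = {week: i for i, week in enumerate(weeks)}
    wide = {}
    for employee, task, week, hours in rows:
        if week in position:  # rows outside the window are ignored
            wide.setdefault((employee, task), [0] * len(weeks))[position[week]] = hours
    return wide


def to_long(wide, weeks):
    """Inverse of to_wide: one normalized row per cell of the grid."""
    return [
        (employee, task, week, hours)
        for (employee, task), values in wide.items()
        for week, hours in zip(weeks, values)
    ]


weeks = ["10/03/10", "10/10/10"]  # the visible window; slide it to change the view
rows = [
    ("Doe, John", "Code", "10/03/10", 16), ("Doe, John", "Code", "10/10/10", 16),
    ("Smith, Jane", "Code", "10/03/10", 32), ("Smith, Jane", "Code", "10/10/10", 32),
    ("Doe, John", "Test", "10/03/10", 24), ("Doe, John", "Test", "10/10/10", 24),
    ("Smith, Jane", "Test", "10/03/10", 0), ("Smith, Jane", "Test", "10/10/10", 0),
    ("Smith, Jane", "QA", "10/03/10", 8), ("Smith, Jane", "QA", "10/10/10", 8),
]
wide = to_wide(rows, weeks)
totals = [sum(column) for column in zip(*wide.values())]  # the TOTAL row
print(wide[("Doe, John", "Code")], totals)
```

Changing the weeks list is all it takes to show a different window over the same normalized data.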