Time as an independent dimension outside the dataset in QuickSight - data-analysis

I am using the KPI visual in QuickSight to show the change in a calculated field from one month to the next.
My data is transactional. Each record in the database includes a "transaction date" and the total dollar amount for the transaction. In December 2021, we recorded no transactions in the database. I am using the transaction date field to drive time-based aggregations of the total number of transactions and the total value of transactions over a period. I'm also using that date field to drive this KPI visual.
The problem is that my data has nothing for December, so that month doesn't exist in the aggregated result and therefore doesn't show up in my KPI visual.
I was able to get the gap in the timeline to show up with a column chart over time, but not with the KPI visual.
Is there a data analysis concept I don't know about here, where time is handled as an independent dimension outside of the dataset?

Usually you would have a date dimension table that contains one row per date (at the granularity you need for your dashboards).
You would then LEFT JOIN your transactional table to this date dimension table on the date field, with the date dimension on the left side of the join. Every date in the dimension then appears in the result, along with all the transactions that fall on it; if there are no transactions for a particular date, the left join still returns a row for that date, with the fields from your transaction table left NULL.
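As a minimal sketch of that join, the following uses Python's built-in sqlite3 module rather than QuickSight itself. The table and column names (`transactions`, `dim_month`, `month_start`) and the sample rows are assumptions made up for illustration, and the date dimension is kept at month grain to match the KPI in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical transactional data with no rows for December 2021.
    CREATE TABLE transactions (transaction_date TEXT, amount REAL);
    INSERT INTO transactions VALUES
        ('2021-11-03', 120.0),
        ('2021-11-19', 80.0),
        ('2022-01-07', 50.0);

    -- Hypothetical month-grain date dimension: one row per month.
    CREATE TABLE dim_month (month_start TEXT PRIMARY KEY);
    INSERT INTO dim_month VALUES ('2021-11-01'), ('2021-12-01'), ('2022-01-01');
""")

# The date dimension is on the left, so every month survives the join.
rows = conn.execute("""
    SELECT strftime('%Y-%m', d.month_start) AS month,
           COUNT(t.transaction_date)        AS txn_count,
           COALESCE(SUM(t.amount), 0)       AS txn_total
    FROM dim_month AS d
    LEFT JOIN transactions AS t
           ON strftime('%Y-%m', t.transaction_date) = strftime('%Y-%m', d.month_start)
    GROUP BY d.month_start
    ORDER BY d.month_start
""").fetchall()

for row in rows:
    print(row)
```

December comes back as a row with a count and total of zero instead of being absent, which is what a month-over-month comparison needs.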

Related

Which data model to use to maintain every day historical data for a customer

I need to maintain the daily closing balance of a customer and plot a line graph of that balance for the last 365 days. Which data model is preferred for maintaining this data?
MySQL, Cassandra, or some other database?
The obvious solution would be a table keyed by [client id, date], with the closing balance as the value.
The question is how you fill that data in. You could run a job at the end of each day. The bigger question is how to make the system reliable: if the job fails and is restarted the next day, will it still produce data for the previous day?
The typical way of addressing this type of problem is "event sourcing". This is a fancy way of saying that you store a record for every operation executed on a balance. Every add/deduct should be there, including client id and date. These records may also carry a "resulting balance", in which case the last operation for a customer on a given day holds that day's closing balance.
In the end, you will have a list of transactions and will be able to "replay" previous events to get balances. It's your choice whether to keep an actual table of daily balances per client, as it may be cheaper to look that data up than to recalculate it every time.
In the banking industry, every transaction is stored as a separate record for this specific reason: to be able to produce different reports, including closing balances per day.
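The "replay" idea can be sketched in a few lines of Python. The event tuple layout, the sample values, and the function name `daily_closing_balances` are assumptions for illustration; a real system would read the events from its own store.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical event log: (client_id, day, signed amount).
events = [
    ("c1", date(2024, 1, 1), 100),
    ("c1", date(2024, 1, 1), -30),
    ("c1", date(2024, 1, 3), 50),
]

def daily_closing_balances(events, client_id, start, end):
    """Replay one client's events and return {day: closing balance}."""
    # Net movement per day for this client.
    per_day = defaultdict(int)
    for cid, day, amount in events:
        if cid == client_id:
            per_day[day] += amount

    # Events before the window still contribute to the opening balance.
    balance = sum(amount for day, amount in per_day.items() if day < start)

    closing = {}
    day = start
    while day <= end:
        balance += per_day.get(day, 0)  # days without events carry the balance forward
        closing[day] = balance
        day += timedelta(days=1)
    return closing

balances = daily_closing_balances(events, "c1", date(2024, 1, 1), date(2024, 1, 4))
```

Because the balances are derived from the events, a failed end-of-day job can be rerun later for any past day and still produce the same result.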

Is there a way to know the number of records added into the SQL database after a particular date and time

The table doesn't have any date-time column. I want to know if there is any built-in keyword that can do that.
I want to know all commits done after a particular date.
If flashback is enabled on the database, you can get the records in the table as they were around a particular date in Oracle. (This depends entirely on whether flashback is enabled and how long the flashback data is retained.)
You can query the data in the table as it was 3 days ago as follows:
select *
from table as of timestamp sysdate-3

How do I store data pertaining to a month or year in Mysql

I receive CSV files at the end of each month from my customer for each of their KPIs (for example, CSVs for resumes received, candidates joined, candidates resigned, sales, profits, loss, etc.) for that specific month.
I want to be able to query this data in order to generate reports for any month, day or year. This report will be generated dynamically, i.e. the admin would specify which rows he would like to have in a report (e.g. a report with applications received, applications shortlisted, and candidates shortlisted after the 1st interview for the period of Jan to July) for any period of time.
What would be the best way to store the data in my database in order to generate such reports? I am using MySQL as my database.
I am not sure whether I will need to flush old data out of my tables. So, assuming that I keep all the data, what would be the best-suited database design for this?
Currently I have a table for each of their KPIs. Each table has a date field which I use to generate the report. But I am looking for a more optimized way.
Thanks in advance.
It is better to store those values (month- or year-related values) in a DATE-type field, which needs no further manipulation when building reports. The conditions or logic for the specific period of time should be handled in your front end. In this case, using a DATE field is already the optimized way.
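A rough sketch of what this looks like in practice, using Python's sqlite3 module as a stand-in for MySQL: the table name `resumes_received`, its columns, and the sample rows are assumptions, and SQLite stores the date as ISO-8601 text where MySQL would use a DATE column. The period is passed in by the caller, standing in for the front end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical KPI table with a real date column (ISO-8601 text in SQLite).
    CREATE TABLE resumes_received (report_date TEXT, value INTEGER);
    INSERT INTO resumes_received VALUES
        ('2014-01-31', 40), ('2014-02-28', 55), ('2014-08-31', 60);
""")

def monthly_totals(conn, start, end):
    """Sum values per month for a period chosen by the caller (e.g. the report UI)."""
    return conn.execute(
        """
        SELECT strftime('%Y-%m', report_date) AS month, SUM(value)
        FROM resumes_received
        WHERE report_date BETWEEN ? AND ?
        GROUP BY month
        ORDER BY month
        """,
        (start, end),
    ).fetchall()

jan_to_jul = monthly_totals(conn, "2014-01-01", "2014-07-31")
```

Because the column holds full dates, the same table can be filtered or grouped by day, month or year without reshaping the stored data.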

MS-Access: updating query for each new linked table each week

First - thanks for your time. My issue lies in my new usage of Access for tracking values from weekly Excel reports. Each week I'm given a new Excel file with updated values for about 50 employees. These values generally track their performance over 6 different metrics. I've begun to link these Excel files into an Access database to keep and track that data each week. The linked tables are named after the date the data is as of, for example: 05-05; 05-12; 05-19; 05-26; etc.
My question is: is there a way to build a query that automatically tracks the change (difference in values) from last week to this week (05-19 to 05-26)? And can it account for future additions of linked tables, so that I don't have to add a piece to the query each week?
In addition, I'm looking to track overall change, from the first table (05-05) to the most recent linked table (whichever date that is, whether it's the end of July or the end of the year).
Based on these 2 results, I'd eventually build out the query to show every week with its value and, in the next column, the week-over-week change (up, down or neutral).

Query to aggregate data by day to generate charts - rows unknown

I started building a search engine monitor. I'm pulling data from the Google REST API into a MySQL database with the following fields: date, search-keyword, domain, url, position.
Now I've run into trouble querying and outputting the data for charting. The results go up and down, and new results from Google come into the list that weren't there on the first day. However, for charting I have to assign at least blank values to those first days in order to output a chart.
What I do right now: first I select every domain showing up in the period. Let's say that for the keyword searchengine I get the domains wikipedia.org, ixquick.com, yahoo.com, searchenginewatch.com. Then I make another request for every domain to query an array of rankings grouped by day. leading to the ...
Problem: Is there any query (MySQL/NoSQL) which returns an average for each day and, if there is no row, a default value, e.g. blank?
Result should look like:
dates={01/01/2014,02,03,04,05,06,07,08,...,31}
wikipedia={1,1,1,1,1,1,1,1,...,1}
yahoo = {"","",7,5,3,3,3,...,3}
You can create a date table, select the date range you'd like, and outer join your data to it, filling in 0s for values that do not exist for a given term/date.
Edit:
Some more details.
1) Create a table that has a row for every date +- 10 years (or whatever is appropriate). You can make this one column if you'd like, or many columns (date, month, year, etc.). The second approach makes this extensible if you want to summarize by various rollups in the future.
2) Outer join your table to the date table and use NVL (Oracle) or its equivalent, such as IFNULL or COALESCE in MySQL, to coerce any null averages to 0.
3) Profit!
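The steps above can be sketched with Python's sqlite3 module standing in for MySQL. The table names `rankings` and `dates`, the three-day date table, and the sample positions are assumptions for illustration. The cross join with the distinct domains is an addition to the steps above, so that each domain gets a row for every date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical ranking data; yahoo.com only appears on the third day.
    CREATE TABLE rankings (day TEXT, domain TEXT, position INTEGER);
    INSERT INTO rankings VALUES
        ('2014-01-01', 'wikipedia.org', 1),
        ('2014-01-02', 'wikipedia.org', 1),
        ('2014-01-03', 'wikipedia.org', 1),
        ('2014-01-03', 'yahoo.com', 6),
        ('2014-01-03', 'yahoo.com', 8);

    -- Step 1: the date table (only three days here to keep the sketch short).
    CREATE TABLE dates (day TEXT PRIMARY KEY);
    INSERT INTO dates VALUES ('2014-01-01'), ('2014-01-02'), ('2014-01-03');
""")

# Step 2: every (domain, day) pair, outer joined to the data, nulls coerced to 0.
rows = conn.execute("""
    SELECT dom.domain, d.day, COALESCE(AVG(r.position), 0) AS avg_position
    FROM dates AS d
    CROSS JOIN (SELECT DISTINCT domain FROM rankings) AS dom
    LEFT JOIN rankings AS r
           ON r.day = d.day AND r.domain = dom.domain
    GROUP BY dom.domain, d.day
    ORDER BY dom.domain, d.day
""").fetchall()

# Reshape into one list per domain, ready for charting.
series = {}
for domain, day, avg_position in rows:
    series.setdefault(domain, []).append(avg_position)
```

Each domain ends up with one value per date in the date table, so the lists line up with the date axis of the chart.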
If your results are grouped by date alone, how can MySQL know that there are (for example) 31 days in that month?
Alternatively, you can fill the holes in PHP by looping through the array and filling in a zero wherever a value does not exist.
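The same gap-filling loop, sketched here in Python rather than PHP: the function name `fill_missing_days` and the sample data are assumptions, and the default is a blank string to match the result shape asked for in the question (pass `default=0` for zeros).

```python
from datetime import date, timedelta

def fill_missing_days(values_by_day, start, end, default=""):
    """Return one value per calendar day from start to end, using default for gaps."""
    filled = []
    day = start
    while day <= end:
        filled.append(values_by_day.get(day, default))
        day += timedelta(days=1)
    return filled

# Hypothetical query result for one domain: only two days have data.
yahoo = {date(2014, 1, 3): 7, date(2014, 1, 4): 5}
series = fill_missing_days(yahoo, date(2014, 1, 1), date(2014, 1, 5))
```

The calendar arithmetic lives in the application here, so the query itself can stay a plain GROUP BY on the date.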