Does the Socrata SODA API support getting a list of dates on which the dataset was modified? - socrata

Does the Socrata SODA API support a method to query out all the dates a dataset has been updated? Basically a changelog for the dataset that has an object for every modification/update to a dataset.
There is an existing question that asks for the last modified date (you can get it through the "/data.json" API available on all Socrata-powered sites).
There is also a method to get the modified dates of individual rows using System Fields and the :updated_at field. But this is incomplete: each row only records its most recent modification, and a data provider might update every row each time. This means there is no guarantee that we are really getting back a history of modifications, just the top layer of modification on each row.
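A minimal sketch of the :updated_at approach in Python. `domain` and `dataset_id` are placeholders, and the `$select`/`$limit`/`$offset` query parameters are assumed to behave as in SODA 2.x (large datasets need paging). It only collapses whatever timestamps come back into distinct dates, so it inherits the limitation above: each row shows only its latest change.

```python
from datetime import date
from urllib.parse import urlencode


def updated_at_query_url(domain: str, dataset_id: str, limit: int = 50000, offset: int = 0) -> str:
    """Build a SODA query URL that selects only the :updated_at system field."""
    params = {"$select": ":updated_at", "$limit": limit, "$offset": offset}
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


def distinct_update_dates(rows: list[dict]) -> list[date]:
    """Collapse per-row :updated_at timestamps into sorted distinct calendar dates.

    The timestamp's first 10 characters (YYYY-MM-DD) are used, i.e. the date as
    returned by the API. Earlier updates overwritten by a later one are invisible.
    """
    days = {date.fromisoformat(row[":updated_at"][:10]) for row in rows if ":updated_at" in row}
    return sorted(days)
```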
I'm looking for the complete list of modification dates, at least. We are trying to get a sense of activity on datasets and we need to know how often they are being updated.

Unfortunately, Max, we don't offer what you're looking for. We've got the last time the dataset and metadata were modified, but not a changelog of every single time that there was a change.
A surprisingly large number of datasets change very frequently, as often as every 5 minutes.
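Since only the last-modified time is exposed, one workaround (not something the answer offers) is to poll /data.json on a schedule and keep your own log whenever a dataset's `modified` value changes. A sketch, assuming the catalog is a JSON object with a `dataset` array whose entries carry `identifier` and `modified` keys; check this against your portal's actual output. Changes that happen between two polls collapse into one log entry, so for datasets updating every 5 minutes this undercounts unless you poll at least that often.

```python
import json
from urllib.request import urlopen


def fetch_catalog(domain: str) -> dict:
    """Download the site-wide /data.json catalog (network call)."""
    with urlopen(f"https://{domain}/data.json") as resp:
        return json.load(resp)


def record_modified(catalog: dict, log: dict[str, list[str]]) -> list[str]:
    """Append each dataset's 'modified' value to log when it differs from the last one seen.

    Returns the identifiers whose modified value changed since the previous poll.
    """
    changed = []
    for entry in catalog.get("dataset", []):
        ident, modified = entry.get("identifier"), entry.get("modified")
        if ident is None or modified is None:
            continue
        history = log.setdefault(ident, [])
        if not history or history[-1] != modified:
            history.append(modified)
            changed.append(ident)
    return changed
```

Persisting `log` between runs (a file or table) turns it into a home-made changelog of observed modification times.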

Related

MySQL Storing Reports

I am looking for a way to store auto-generated reports. There are about 10-15 columns and 100-3000 rows depending on the report, but each report is consistent in column count.
I am looking for a way to organise and store these reports in one large group without creating an entire new database and 1000s of tables to store each individual report.
The reports need to be queryable so they can be subdivided by team/area/person etc., as each report can be a combination of 3-4 different sub-reports depending on how you split/sort the data.
I am using Python to collect and sort the data from the database, so using MariaDB/MySQL would be preferred, but I'm happy to use something else if there is a pre-existing connection library for it.
To sum up, I need something similar to an Excel spreadsheet, with each table being a sheet and the sheet name being the date it was generated, so I can select by the date generated.
Think through the goals.
Is this a legal issue? That is, do you need to produce an unalterable report as something "official", à la a non-editable .pdf?
(at the opposite extreme) Be able to generate (or regenerate) any report for any timeframe.
Is performance an issue? (Either perceived or real)
I like to build and maintain Summary Table(s) for any "Data Warehouse" application. And build "reports" that take as a parameter a date range and a small number of other things. And have the report generation so fast that it does not matter if multiple people are pulling reports at random times.
15 columns and 3000 rows is usually excessive. If pulling a report is trivial enough, it can be less 'massive'; just get the parts you want, without such bulk.
http://mysql.rjweb.org/doc.php/summarytables
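A minimal sketch of that idea applied to the stored-reports question: one detail table where every row carries the date its report was generated (instead of one table per report), plus a small summary table that date-range reports read from. It uses Python's built-in sqlite3 as a stand-in for MariaDB/MySQL (with a MySQL driver the placeholders would be `%s` rather than `?`), and the columns team, area, person, and hours are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE report_rows (
    report_date TEXT NOT NULL,  -- ISO date the report was generated (the 'sheet name')
    team        TEXT NOT NULL,
    area        TEXT,
    person      TEXT,
    hours       REAL
);
CREATE INDEX idx_rows_date_team ON report_rows (report_date, team);

-- Summary table: one row per (date, team), refreshed after each load
CREATE TABLE team_daily (
    report_date TEXT NOT NULL,
    team        TEXT NOT NULL,
    total_hours REAL,
    row_count   INTEGER,
    PRIMARY KEY (report_date, team)
);
"""


def store_report(conn, report_date, rows):
    """Append one generated report; every row carries its generation date."""
    conn.executemany(
        "INSERT INTO report_rows (report_date, team, area, person, hours) VALUES (?, ?, ?, ?, ?)",
        [(report_date, *row) for row in rows],
    )


def refresh_summary(conn, report_date):
    """Rebuild the summary rows for one date from the detail rows (safe to rerun)."""
    conn.execute("DELETE FROM team_daily WHERE report_date = ?", (report_date,))
    conn.execute(
        """INSERT INTO team_daily (report_date, team, total_hours, row_count)
           SELECT report_date, team, SUM(hours), COUNT(*)
           FROM report_rows WHERE report_date = ?
           GROUP BY report_date, team""",
        (report_date,),
    )


def hours_by_team(conn, start, end):
    """Report for a date range, read from the small summary table."""
    return conn.execute(
        """SELECT team, SUM(total_hours) FROM team_daily
           WHERE report_date BETWEEN ? AND ?
           GROUP BY team ORDER BY team""",
        (start, end),
    ).fetchall()
```

Selecting "a sheet" becomes `WHERE report_date = ?`, and splitting by team/area/person is a `GROUP BY` rather than a separate table.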

Firestore Cloud Function Trigger only when path updates with new entry

I have a Firestore database that looks like this:
/entries/ ....
/users/{userid}...
A bunch of documents is being sent into ... of entries, and /users/{userid} contains only 8 docs of user profile information.
My problem is that the entries docs contain the field hours and have no relation to the user doc, which contains the field weekly_capacity.
I need to aggregate the two fields, hours and weekly_capacity, into a full-time equivalency (FTE) variable.
But the FTE needs to be accurate, and at this company a user's FTE can change, so it would need to be calculated over any date range even if the user changed their FTE status any number of times.
And the current app only fetches the entries when the user logs into the app, which can be at any time.
None of the API requests that I am using return JSON that holds both weekly_capacity and hours in the same fetch. If Firestore calls the HTTP endpoint to fetch all entries every time a user logs into the app, how can I compare the hours field in the entries collection to the weekly_capacity field?
Just a little context: FTE = full-time equivalency, a standard measure of whether an employee meets the core committed hours they signed up for, which is 40. So if I agreed to work 40 hours and I actually work 40 hours, then I am 1 whole FTE. If I worked 20 and I was supposed to work 40, I am .5 FTE. The math is really simple; it's just that in my situation the FTE variable can change at any time, and the app will allow the user to enter a range of dates, fetching the total actual hours they worked and their FTE, letting them know how many hours they were supposed to work vs. how many hours they actually worked. Since the variable changes, I need some way in Firestore to track the change and aggregate correctly against the hours actually worked. To give an example: let's say I changed my FTE from 1 to .7 on March 20th. I then want to generate a report for March 1 to March 30 stating my hours worked and my FTE status, meaning whether I reached my goal. The kicker is that I can't fetch or merge the entries, which hold the hours var, and /users/, which holds the weekly_capacity var.
I don't even think a Cloud Function would solve the problem, since entries are only fetched when the user logs in, right?
I'm assuming the following for answering your question.
Requirement: To calculate FTE for a user when the user's weekly_capacity is updated or the user logs in.
Problems:
Some way in Firestore to track the change.
Calculate FTE correctly according to the change.
Here's what I think will solve the problems.
Google Cloud Firestore supports listeners on the collections in which you store the data, so you can listen for any change in the users and entries collections. This is how you can track the change.
To calculate FTE when a change is made to the weekly_capacity of a user document or a new entry is added to the entries collection, query both collections separately to get the records for the affected user. You can also use a collection group query for this purpose, but that depends on your database design.
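Once both collections are queried separately, the combination happens in application code. A sketch in Python of the calculation from the question's March example, where plain lists stand in for the two Firestore query results: the user's entries, and a history of weekly_capacity changes. The history is an assumption: a single weekly_capacity field on the user document only holds the current value, so past values would have to be kept somewhere (for example a hypothetical capacity_history subcollection) for a March report to use 40 hours before March 20 and 28 (0.7 × 40) after. Expected hours also assume a Monday to Friday week.

```python
from bisect import bisect_right
from datetime import date, timedelta


def fte_report(entries, capacity_history, start, end):
    """Compare actual vs expected hours for one user over [start, end], inclusive.

    entries: iterable of (date, hours) taken from the entries query.
    capacity_history: (effective_date, weekly_capacity) pairs sorted by date,
        e.g. from a hypothetical per-user history of weekly_capacity changes.
    Assumption: Monday-Friday week, so each weekday expects weekly_capacity / 5.
    """
    change_dates = [d for d, _ in capacity_history]
    expected = 0.0
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            # The latest change on or before this day is the capacity in effect
            i = bisect_right(change_dates, day) - 1
            if i < 0:
                raise ValueError(f"no weekly_capacity recorded on or before {day}")
            expected += capacity_history[i][1] / 5
        day += timedelta(days=1)
    actual = sum(hours for d, hours in entries if start <= d <= end)
    # A ratio of 1.0 means the user worked exactly the hours they committed to
    ratio = actual / expected if expected else 0.0
    return {"expected": expected, "actual": actual, "ratio": ratio}
```

For March 1 to 30, 2021 with a change from 40 to 28 weekly hours on March 20, this gives about 159.2 expected hours (15 weekdays at 8 h plus 7 weekdays at 5.6 h).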
Hope that helps.

MySQL: Dates as column headers

I'm trying to develop a new reporting module for a resource management tool (PHP+Mysql).
I am trying to extract data in the following format from MySQL:
I have a table that consists of the date and location of multiple people (e.g., Office, Home, or Client).
Sample data as in the DB:
Here, date_plotted means the date on which the user is engaged, and plotting_date represents the date this particular entry was made in the system. So the user was plotted to be in the office on 30th Oct, and the entry was made on 30th Oct.
Data as in the resource table:
The resource table represents the user table.
Any suggestions on how to do the same in MySQL?
These are the primary tables which need to be used.
The table above is done in Excel for now to represent the outcome.
I'm new to SQL so haven't tried anything yet.
There is a tool for Windows that might simplify this operation. It's made by MySQL and called MySQL for Excel. In theory it should allow you to structure and make changes to MySQL databases as well as perform queries that result in spreadsheets.
Without knowing more about your data (for example, an actual CSV file to work with) and the parameters of the actual pull (whether the dates are always fixed or the pull is dynamic, based on a range), this question could result in 100 different implementations that return visually similar results but have massively different implementation overhead.
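One common SQL technique for "dates as column headers" (not suggested in the answers above) is conditional aggregation: one `MAX(CASE ...)` column per date, with the column list built dynamically because the dates come from the data. A sketch using Python's sqlite3 as a stand-in for MySQL; the table names `plotting` and `resource` and the columns `user_id`, `id`, and `name` are hypothetical, while date_plotted and location come from the question. With a MySQL driver the placeholders would be `%s`; backtick-quoted aliases work in both.

```python
import sqlite3


def pivot_locations(conn, start, end):
    """One row per user, one column per date_plotted in [start, end]."""
    days = [row[0] for row in conn.execute(
        "SELECT DISTINCT date_plotted FROM plotting "
        "WHERE date_plotted BETWEEN ? AND ? ORDER BY date_plotted",
        (start, end),
    )]
    # One MAX(CASE ...) per date: that user's location on that day, else NULL
    date_cols = "".join(
        f", MAX(CASE WHEN p.date_plotted = ? THEN p.location END) AS `{d}`" for d in days
    )
    sql = (
        f"SELECT r.name{date_cols} "
        "FROM resource r LEFT JOIN plotting p ON p.user_id = r.id "
        "GROUP BY r.id, r.name ORDER BY r.name"
    )
    return ["name", *days], conn.execute(sql, days).fetchall()
```

If a user can be re-plotted for the same date, `MAX` picks the alphabetically largest location rather than the latest entry; filtering to the row with the latest plotting_date first would fix that.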

How can you generate a report for newly created groups/DLs on a weekly basis?

I am an admin for a Google Apps for Business domain, and we want to be able to run a report that tells us which groups have been created in the last week. There is no "Date Created" column for the groups. The best I have been able to do so far is to run a list of the groups on a weekly basis, but I want to be able to automate comparing it to the list from the week before.
You could store the list you get in permanent storage (a spreadsheet, ScriptDB, or Script Properties) and compare it every week to see if something has been added (or removed). This is maybe less straightforward and elegant, but it might be simpler to get working.
The weekly triggered function could do this:
get the list of names
sort it
write the sorted list to spreadsheet
retrieve the sorted list from last week by reading the preceding row in spreadsheet
compare both sorted lists at array level
and send yourself an email with the difference (optionally also writing the log to the spreadsheet)
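The steps above, sketched in Python with a list of rows standing in for the spreadsheet (each row is a date followed by that week's sorted group names). It compares the two weeks with set differences rather than walking the sorted arrays pairwise, which gives the same added/removed answer. The very first run reports every group as added, since there is no previous row. Sending the mail is left out; in Apps Script that step could use `MailApp.sendEmail`.

```python
from datetime import date


def weekly_group_check(sheet_rows, current_groups, today):
    """Run the weekly steps against sheet_rows, a list standing in for the spreadsheet.

    Each stored row is [ISO date, group, group, ...] with the groups sorted.
    Returns (added, removed) compared with the previous row.
    """
    current = sorted(set(current_groups))                 # get the list and sort it
    previous = sheet_rows[-1][1:] if sheet_rows else []   # last week's row, read before writing
    sheet_rows.append([today.isoformat(), *current])      # write this week's row
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    return added, removed


def difference_message(added, removed):
    """Plain-text body for the email you send yourself."""
    lines = [f"Added: {', '.join(added) or 'none'}", f"Removed: {', '.join(removed) or 'none'}"]
    return "\n".join(lines)
```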
This is certainly possible but requires a bit of coding.
You'll have to use the Audit API for this. See this response for some starter code on how to make basic calls to the API. The one tricky part is setting up OAuth 2, but it's very doable after that.
Once you have the setup working you can then add additional startTime and endTime parameters to define your week interval along with the CREATE_GROUP event filter in the URL.
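A sketch of the week-interval part, assuming a timezone-aware `now`. The base URL is a placeholder to take from the API documentation, and the filter parameter name `eventName` is an assumption (it is the name the newer Admin SDK Reports API uses; the older Audit API may differ), so it is passed as an overridable argument. The OAuth 2 request itself is not shown.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

RFC3339 = "%Y-%m-%dT%H:%M:%S.000Z"


def last_week_params(now, event_name="CREATE_GROUP", filter_param="eventName"):
    """startTime/endTime covering the 7 days before `now`, in UTC RFC 3339 form."""
    end = now.astimezone(timezone.utc).replace(microsecond=0)
    start = end - timedelta(days=7)
    return {
        "startTime": start.strftime(RFC3339),
        "endTime": end.strftime(RFC3339),
        filter_param: event_name,  # assumed parameter name, see the lead-in
    }


def weekly_report_url(base_url, now):
    """Append the weekly interval and event filter to a placeholder base URL."""
    return f"{base_url}?{urlencode(last_week_params(now))}"
```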

Slowly Changing Dimensions in SSAS and SSRS

I have a project where establishments are inspected anything from once every 6 months to once every 3 years and the results of the inspection scorecard are recorded as a record in a type 2 slowly changing dimension table [tblInspections], using StartDate and EndDate to cover the period between inspections for which this scorecard is valid. The inspections table is linked to [tblEstablishments] which contains other details about other fixed dimensions such as location and business type.
So currently, we are providing aggregated reports of the current situation (where EndDate is null) and also audit reports for the history of any one establishment (by EstablishmentID).
My next task is to provide more detailed analysis reports of trends of the scorecard results and I need to provide historical aggregated results of the situation on the last day of each month.
My problem is that despite knowing exactly what I want, I am now unsure how to get there.
1) Do I start by writing ETL process to build a cube based on all the historical results working out what all the aggregates would have been at the end of each month?
2) Am I then able to just process the current records at the end of each month effectively add a new slice onto the end of an existing cube without reprocessing from scratch? (if so how?)
3) Is there another way of doing this? Does Analysis Services have better ways of dealing with SCDs automatically when determining historical status at any point in time by selecting the correct record from multiple records with start and end date?
Any advice and pointers to tutorials related to this would be much appreciated.
First I think you are going to want to build a new periodic (monthly) snapshot fact table if you are trying to analyze the inspection results across establishments (and other dimensions, like time/date). Then you can build the ETL process to populate this new fact table. Finally, you can model the fact table as a new measure group in a new or existing cube...be sure to pay attention to the aggregation property of the measures in this new measure group...typically you don't want to sum periodic snapshot measures (think about what happens if you sum your bank account balance at the end of each month and look at it by year).
Yes, you will run your ETL at the end of each month, which will add more rows to your periodic (monthly) snapshot fact table. Then you can just process the cube and you are all set.
Analysis Services handles SCD2 dimensions quite well (assuming you are using surrogate keys... you are, aren't you?). I think the business process that you are trying to model (inspections) is what is causing some confusion, because it's no longer a dimension in this new analysis; it has become a fact (a periodic snapshot fact).
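A sketch of the periodic-snapshot ETL, using Python's sqlite3 as a stand-in for SQL Server. The fact table `factInspectionMonthly` and the `Score` column are hypothetical, and EndDate is assumed to be exclusive (equal to the next inspection's StartDate); if your EndDate is the last valid day, change `>` to `>=`. The same `snapshot_month` call backfills history (question 1) and appends one new month (question 2), and the summary query averages rather than sums, per the note about periodic snapshot measures.

```python
import sqlite3
from datetime import date, timedelta


def month_end(year, month):
    """Last calendar day of the given month."""
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)


def snapshot_month(conn, as_of):
    """(Re)load one row per establishment: the scorecard valid on as_of. Safe to rerun."""
    d = as_of.isoformat()
    conn.execute("DELETE FROM factInspectionMonthly WHERE SnapshotDate = :d", {"d": d})
    conn.execute(
        """INSERT INTO factInspectionMonthly (SnapshotDate, EstablishmentID, Score)
           SELECT :d, EstablishmentID, Score
           FROM tblInspections
           WHERE StartDate <= :d
             AND (EndDate IS NULL OR EndDate > :d)""",
        {"d": d},
    )


def backfill(conn, first, last):
    """Snapshot every month-end from first's month through last's month."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        snapshot_month(conn, month_end(y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def monthly_average(conn):
    """Semi-additive measure: average across establishments, never summed over time."""
    return conn.execute(
        "SELECT SnapshotDate, AVG(Score) FROM factInspectionMonthly "
        "GROUP BY SnapshotDate ORDER BY SnapshotDate"
    ).fetchall()
```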