I wish to count the number of distinct visitors to a store every half hour.
I know I will need:
the temporal context (half an hour)
What kind of EPA should I create to count distinct visitor ids?
I suppose I will need an internal segmentation by visitorId?
Here you go:
The EPA should be of type Aggregate.
Yes, you need internal segmentation context by visitorId.
See the documentation for a sample that demonstrates how to perform a count:
https://github.com/ishkin/Proton/tree/master/documentation/sample/fraud
A PDF document is included.
Hope this helps.
If you want to count how many times each user entered the store, you can use an Aggregate EPA with a segmentation context on visitorId, as suggested by @urishani.
If you want to count how many distinct visitors entered the store (assuming you can get multiple input events for each visitor), then you need to:
1. Aggregate all the input events related to the same visitor that arrived during a half hour into one event; let's call it distinctVisitor.
2. Count how many distinctVisitor events arrived in the previous half hour.
To implement (1), I would use an Aggregate-type EPA with a segmentation context on visitorId and a half-hour sliding-window temporal context. It will have a condition that the count over the input events is at least 1, its evaluation policy will be deferred (evaluated at the end of the half hour), and it will derive the distinctVisitor event.
To implement (2), I would use an Aggregate-type EPA that takes distinctVisitor as its input event and has a sliding-window temporal context that opens every half hour for a short time (let's say 30 seconds). It will count the number of distinctVisitor events, and its evaluation policy will be deferred. It will not use a segmentation context, since you want to count the events of all visitors.
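To make the two-stage logic concrete, here is a plain-JavaScript sketch of what the two EPAs compute. It is not Proton code and does not use the Proton API; the event shape `{ visitorId, time }` (epoch milliseconds), the function names, and the use of fixed back-to-back half-hour buckets instead of Proton sliding windows are all assumptions made for illustration.

```javascript
// Hypothetical sketch, NOT the Proton API: it only mimics the two Aggregate EPAs.
// Assumption: input events look like { visitorId: string, time: number } (epoch ms),
// and each half hour is a fixed bucket rather than a Proton sliding window.
const HALF_HOUR_MS = 30 * 60 * 1000;

// Stage (1): derive one "distinctVisitor" event per visitor per half hour,
// like an Aggregate EPA segmented by visitorId with condition count >= 1.
function deriveDistinctVisitors(events) {
  const derived = new Map(); // key "bucket|visitorId" -> derived event
  for (const e of events) {
    const bucket = Math.floor(e.time / HALF_HOUR_MS);
    const key = `${bucket}|${e.visitorId}`;
    if (!derived.has(key)) {
      derived.set(key, { visitorId: e.visitorId, bucket });
    }
  }
  return [...derived.values()];
}

// Stage (2): count distinctVisitor events per half hour, with no segmentation by visitor.
function countPerHalfHour(distinctVisitors) {
  const counts = new Map(); // bucket index -> number of distinct visitors
  for (const dv of distinctVisitors) {
    counts.set(dv.bucket, (counts.get(dv.bucket) || 0) + 1);
  }
  return counts;
}

const events = [
  { visitorId: "a", time: 0 },
  { visitorId: "a", time: 5 * 60 * 1000 },  // same visitor, same half hour: counted once
  { visitorId: "b", time: 10 * 60 * 1000 },
  { visitorId: "a", time: 40 * 60 * 1000 }, // same visitor, next half hour
];
const counts = countPerHalfHour(deriveDistinctVisitors(events));
// bucket 0 -> 2 distinct visitors, bucket 1 -> 1 distinct visitor
```

The point of the first stage is deduplication: without it, visitor "a" would be counted twice in the first half hour.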
Related
I'm tracking some events in my app. However, I can only see the count of "action" events in the last 30 minutes. I would like to see the count of those events over a given period of time.
If you're talking about the "view" parameter of your action event, you need to register it as a custom dimension for it to appear in your event reporting. Once you register it on the Custom Definitions page, a data card for each registered parameter is added to the related event-detail report. Note, however, that it may take up to 24 hours for the data card to appear. During this 24-hour period, you may see (not set) as a parameter value. Once that initial 24-hour period has passed, you will see the expected parameter values from that point forward. You may also check the documentation on custom dimensions and metrics for reference.
I am creating a booking management system that allows users to create recurring events.
Searching around, I understood that storing "repeating patterns" would be a good approach for the DB design, as explained here: Calendar Recurring/Repeating Events - Best Storage Method
My issue is that I need to store some data for each individual event, such as whether payment has been made for it, confirmation, notes, etc.
This would mean creating a separate table with one row for each event created. In other words, physically adding a row for each event instead of using "recurrent patterns".
I can't see a way to avoid having one row in the DB per event. Any suggestions? In my system, each user would not have many events; let's say a maximum of 50 events per week.
Let's assume you have a table already to store recurrent events, such as
table recurrent_event
id bigint
start date
interval int -- as simple or complex as needed
Now your application will need some logic anyway to calculate which single events result from this, e.g. to display a list of single events. I would not store a list of all these single events in the database, as, initially, this list wouldn't add any useful information. Also, the list would have to end somewhere, so it might fail to cover all single events coming from the recurrent event. The only reason to insert a record for a single event is when additional information for that event is actually entered. Just for these single events, I'd create a
table single_event_additional_info
id bigint
recurrent_event_id bigint
single_event_date date
additional_information ... whatever data type fits
that points back to the recurring event. So, when treating a recurring event, selecting from this table all single events referring to it will yield all information relevant for the recurring event. The rest of single events will still be determined by calculation.
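A minimal sketch of the calculation side, in JavaScript: occurrences are computed from the recurrent_event row, and only the sparse single_event_additional_info rows come from storage. It assumes `interval` is a whole number of days and that dates are `YYYY-MM-DD` strings in UTC; the function names and the `paid` column are hypothetical.

```javascript
// Sketch only. ASSUMPTION: recurrent_event.interval is a number of days; dates are UTC "YYYY-MM-DD".
const DAY_MS = 24 * 60 * 60 * 1000;

// Compute the occurrence dates of one recurrent event between `from` and `to` (inclusive).
function occurrences(recurrentEvent, from, to) {
  const start = Date.parse(recurrentEvent.start); // date-only ISO strings parse as UTC midnight
  const step = recurrentEvent.interval * DAY_MS;
  const fromMs = Date.parse(from);
  const toMs = Date.parse(to);
  const dates = [];
  // Jump straight to the first occurrence on or after `from`.
  const firstK = Math.max(0, Math.ceil((fromMs - start) / step));
  for (let t = start + firstK * step; t <= toMs; t += step) {
    dates.push(new Date(t).toISOString().slice(0, 10));
  }
  return dates;
}

// Merge computed occurrences with whatever additional-info rows exist for them.
function singleEvents(recurrentEvent, infoRows, from, to) {
  const infoByDate = new Map(
    infoRows
      .filter((r) => r.recurrent_event_id === recurrentEvent.id)
      .map((r) => [r.single_event_date, r])
  );
  return occurrences(recurrentEvent, from, to).map((date) => ({
    recurrentEventId: recurrentEvent.id,
    date,
    info: infoByDate.get(date) || null, // null: no row stored, the event exists only by calculation
  }));
}

const weekly = { id: 1, start: "2024-01-01", interval: 7 };
const info = [{ id: 10, recurrent_event_id: 1, single_event_date: "2024-01-15", paid: true }];
const list = singleEvents(weekly, info, "2024-01-01", "2024-01-31");
// five weekly occurrences in January; only the one on 2024-01-15 carries stored info
```

Because the occurrence list is computed on demand for any date range, it never has to "end somewhere", which is the concern raised above.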
I'm writing a script that will, given parameters for start/end datetimes, find the first mutually available timeslot on two different calendars in order to book a meeting.
The script I have so far is running really slowly. I'm guessing this is because I'm looping through both calendars in 30 minute increments and running CalendarApp.getCalendarById(email).getEvents each time to see if there's a free 30-minute timeslot.
I've thought about running a batch operation using .getEvents() once to minimize the number of reads but I get stuck here because the result is an array with busy timeslots, whereas I'm trying to find free timeslots.
Is there a better way to approach this to make my script run faster?
Find Open Time Slots
I've done something similar recently, and what I did was create an object like this:
var timeSlotsObj={"8:00-8:30":0,"8:30-9:00":0,"9:00-9:30":0,"9:30-10:00":0,...,"7:30-8:00":0,slotA:["8:00-8:30","8:30-9:00",...]};
Then I went through each calendar and, for each event on a given day, incremented the value of every time slot that the event overlapped. After that, I looped through the slotA array looking for any time slot that still had 0 in it.
The loop looks like the following:
for (var i = 0; i < timeSlotsObj.slotA.length; i++) {
  if (timeSlotsObj[timeSlotsObj.slotA[i]] == 0) {
    // You just found an empty time slot; its label is timeSlotsObj.slotA[i]
  }
}
Any property whose value is still 0 is an open time slot for the given set of calendars on that day.
JavaScript Object Reference
In my case, I actually used Date objects as the object keys, but the idea is the same: whatever slots have no events in them are free slots.
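A sketch of the same slot-counting idea with time-based keys, assuming 30-minute slots. Each calendar is read once (the batch read from the question), and the busy intervals are passed in as plain `{ start: Date, end: Date }` objects so the function also runs outside Apps Script. `firstFreeSlot` is a hypothetical name; the commented Apps Script lines use `CalendarApp.getCalendarById`, `Calendar.getEvents(startTime, endTime)`, `CalendarEvent.getStartTime()` and `getEndTime()`.

```javascript
// Sketch, assuming 30-minute slots and busy intervals given as { start: Date, end: Date }.
// In Apps Script, each calendar could be read once, e.g.:
//   const busy = emails.map((email) =>
//     CalendarApp.getCalendarById(email).getEvents(dayStart, dayEnd)
//       .map((ev) => ({ start: ev.getStartTime(), end: ev.getEndTime() })));
const SLOT_MS = 30 * 60 * 1000;

function firstFreeSlot(dayStart, dayEnd, calendarsBusy) {
  // One counter per slot, keyed by the slot start in ms; slotStarts keeps the order (like slotA).
  const slotStarts = [];
  const busyCount = new Map();
  for (let t = dayStart.getTime(); t + SLOT_MS <= dayEnd.getTime(); t += SLOT_MS) {
    slotStarts.push(t);
    busyCount.set(t, 0);
  }
  // Increment every slot that overlaps any event on any calendar.
  for (const events of calendarsBusy) {
    for (const ev of events) {
      for (const t of slotStarts) {
        if (ev.start.getTime() < t + SLOT_MS && ev.end.getTime() > t) {
          busyCount.set(t, busyCount.get(t) + 1);
        }
      }
    }
  }
  // The first slot still at 0 is free on every calendar.
  const free = slotStarts.find((t) => busyCount.get(t) === 0);
  return free === undefined ? null : { start: new Date(free), end: new Date(free + SLOT_MS) };
}

// Example with made-up busy times (UTC).
const at = (h, m = 0) => new Date(Date.UTC(2024, 0, 15, h, m));
const calA = [{ start: at(8), end: at(9) }];
const calB = [{ start: at(9), end: at(9, 30) }];
const slot = firstFreeSlot(at(8), at(17), [calA, calB]); // first common gap starts at 09:30 UTC
```

Reading each calendar once and doing the overlap checks in memory replaces the per-slot `getEvents` calls, which is where the question's script was spending its time.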
I am stuck with a problem. In my app's DB, I have a schedule table that stores user-provided schedules, e.g.:
Daily
Every Week
Twice a Week
Every 3rd (or any user-chosen) day of the week
Every Month
Twice a month
Every x day of month
Every x month of year
And so on. These schedules will then provide a reference point for scheduling different tasks or identifying their recurrence.
I can't think of a proper database structure for it. The best I can come up with is a table with the following columns:
Day
Week
Month
Year
type
Then store the specified schedule in the related column and provide the type.
For example, "every week" could be stored as 1 in the Week column plus 1 as the type (a designated value meaning the whole period repeats), or something like that.
The problem with this approach is that this table is going to be used very frequently and the data retrieved will not be straightforward: it will need calculation to determine the schedule type, and hence complex DB queries to get each type of schedule.
I am implementing this in a Laravel app, in case that suggests another approach. It's a SaaS app with a huge amount of data related to the schedule table.
Any help will be very much appreciated. Thanks
I suggest you are approaching the problem backwards.
Devise several rules. Code the rules in your app, not in SQL. When inserting an event, pre-fill a calendar through the next 12 months with all occurrences of the event. Every month, go through all events and extend the "pre-fill" through another month (13 months hence).
Now the SELECTs are simple and fast.
SELECT ... WHERE date = '...'
has all the events for that day (assuming it is within 12 months).
The complexity is on inserting. But presumably you insert less often than you select.
The table with the event definitions would be only as complex as needed for your app to figure out what to do. Perhaps
start_date DATE,
frequency ENUM('day', 'week', 'month', ...),
multiplier TINYINT, -- this lets you say "every second week"
offset TINYINT, -- to get "15th of every month"
Twice a week would be two entries.
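The "complexity on inserting" side can be sketched in app code as a function that expands one rule into the concrete dates to pre-fill. This is a hypothetical sketch, not one of the packages mentioned: the rule shape mirrors `start_date`, `frequency` and `multiplier` above, `offset` is not handled, dates are `YYYY-MM-DD` in UTC, `until` is an exclusive bound, and a monthly rule on a day a month lacks (e.g. the 31st) simply skips that month.

```javascript
// Hypothetical sketch: expand one rule { start_date, frequency, multiplier } into dates before `until`.
// ASSUMPTIONS: UTC "YYYY-MM-DD" strings; `offset` is ignored; short months are skipped for monthly rules.
function expandRule(rule, until) {
  if (!(rule.multiplier >= 1)) throw new Error("multiplier must be >= 1");
  const [y, m, d] = rule.start_date.split("-").map(Number);
  const limit = Date.parse(until); // exclusive upper bound (UTC midnight)
  const dates = [];
  for (let k = 0; ; k++) {
    const n = k * rule.multiplier; // number of units after start_date
    let next;
    if (rule.frequency === "day") {
      next = new Date(Date.UTC(y, m - 1, d + n)); // Date.UTC normalizes day overflow
    } else if (rule.frequency === "week") {
      next = new Date(Date.UTC(y, m - 1, d + 7 * n));
    } else if (rule.frequency === "month") {
      next = new Date(Date.UTC(y, m - 1 + n, d));
      // Date.UTC rolls e.g. "Feb 31" into March; treat that month as having no occurrence.
      if (next.getUTCDate() !== d) {
        if (next.getTime() >= limit) break;
        continue;
      }
    } else {
      throw new Error(`unsupported frequency: ${rule.frequency}`);
    }
    if (next.getTime() >= limit) break;
    dates.push(next.toISOString().slice(0, 10));
  }
  return dates;
}

// "Every second week" and "last day of a 31-day month" examples.
const biweekly = expandRule({ start_date: "2024-01-01", frequency: "week", multiplier: 2 }, "2024-02-01");
const monthEnd = expandRule({ start_date: "2024-01-31", frequency: "month", multiplier: 1 }, "2024-06-01");
```

Each returned date would become one row in the pre-filled calendar table, so the `SELECT ... WHERE date = '...'` above stays trivial. Running the same function monthly with a later `until` (and discarding dates already inserted) would give the "extend through another month" step; twice a week is simply two rules.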
Better yet, there are several packages (in Perl, shell, etc.) that provide a very rich language for expressing event-date patterns. Furthermore, you may be able to simply 'call' one to do all the work for you!
I am moving a dataset from one SQL database into another while summarizing it. The source database has over 3 billion records covering several years of readings from a couple thousand utility meters. I am running it through an SSIS package. First, I run the data through a Script Component that rounds the timestamp to the nearest hour and converts it from UTC to local time. Then an Aggregate transform groups the data by meter, reading type, and hour, giving me the min and max of the reading value.
I would like to subtract the max of the previous hour from the current hour to get a delta. In SQL, I could use the LAG function. I am thinking I could do a script component in SSIS that keeps the last meter, reading type, timestamp and max(value). Then, if we are still on the same meter, I can get the delta.
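The Script Component idea described here is a one-pass "remember the previous row" scan. A plain-JavaScript sketch of that logic (not SSIS code; the field names `meter`, `readingType`, `hour`, `maxValue` are hypothetical) looks like this; it only works if rows arrive sorted by meter, reading type and hour.

```javascript
// Sketch of the streaming approach, assuming rows are sorted by meter, readingType, hour.
// Equivalent in spirit to LAG(maxValue) OVER (PARTITION BY meter, readingType ORDER BY hour).
function addDeltas(sortedRows) {
  let prev = null; // last row seen
  return sortedRows.map((row) => {
    const sameSeries =
      prev !== null && prev.meter === row.meter && prev.readingType === row.readingType;
    // The first row of each meter/reading type has no previous hour, so its delta is null.
    const delta = sameSeries ? row.maxValue - prev.maxValue : null;
    prev = row;
    return { ...row, delta };
  });
}

const meterRows = [
  { meter: "M1", readingType: "kWh", hour: "2024-01-01T00", maxValue: 100 },
  { meter: "M1", readingType: "kWh", hour: "2024-01-01T01", maxValue: 112 },
  { meter: "M1", readingType: "kWh", hour: "2024-01-01T02", maxValue: 130 },
  { meter: "M2", readingType: "kWh", hour: "2024-01-01T00", maxValue: 50 },
];
const withDeltas = addDeltas(meterRows); // deltas: null, 12, 18, null
```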
But, is there a cleaner way? Is there a transform that keeps the previous row, and lets me access it and operate on the values of both the current row and previous row?
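No, there is no LAG() in SSIS, but you can implement it with a self-join:
1. Sort the data (by meter, reading type and hour).
2. Add an index column (e.g. 1, 2, 3, 4, 5).
3. Multicast the flow into two copies.
4. The left side keeps all rows (1, 2, 3, 4, 5); the right side starts from the second row (2, 3, 4, 5).
5. Self-join the two sides so that each left row is paired with the row after it (left index + 1 = right index): leftside1 with rightside2, leftside2 with rightside3, ..., leftside4 with rightside5. Include meter and reading type in the join so rows from different meters are not paired.
The right side is then the 'next' row of the left side, so subtracting the left max from the right max gives the hourly delta.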
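The sort / index / multicast / self-join recipe can be modelled on plain arrays to check the pairing logic. This is a JavaScript sketch of the data flow, not SSIS code; the field names are hypothetical, and `hour` is assumed to be an ISO-style string so that it sorts correctly as text. The join is done with a hash lookup on the shifted index.

```javascript
// Sketch of the self-join recipe on arrays (hypothetical field names; hour is an ISO-style string).
function lagBySelfJoin(rows) {
  // 1. Sort by meter, reading type, hour.
  const sorted = [...rows].sort((a, b) =>
    a.meter.localeCompare(b.meter) ||
    a.readingType.localeCompare(b.readingType) ||
    a.hour.localeCompare(b.hour));
  // 2. Add an index column 1, 2, 3, ...
  const indexed = sorted.map((row, i) => ({ ...row, index: i + 1 }));
  // 3-4. "Multicast": the left copy is keyed by index + 1, so left row n meets right row n + 1.
  const left = new Map(indexed.map((row) => [row.index + 1, row]));
  // 5. Self-join each right row to its previous row; only pair rows from the same series.
  return indexed.map((cur) => {
    const prev = left.get(cur.index);
    const sameSeries =
      prev !== undefined && prev.meter === cur.meter && prev.readingType === cur.readingType;
    return { ...cur, delta: sameSeries ? cur.maxValue - prev.maxValue : null };
  });
}

const shuffledRows = [
  { meter: "M2", readingType: "kWh", hour: "2024-01-01T00", maxValue: 50 },
  { meter: "M1", readingType: "kWh", hour: "2024-01-01T01", maxValue: 112 },
  { meter: "M1", readingType: "kWh", hour: "2024-01-01T00", maxValue: 100 },
  { meter: "M1", readingType: "kWh", hour: "2024-01-01T02", maxValue: 130 },
];
const joined = lagBySelfJoin(shuffledRows); // M1: null, 12, 18; M2: null
```

The same-series check plays the role of adding meter and reading type to the join keys; without it, the first hour of M2 would be paired with the last hour of M1.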