Fiware: subscription duration

What are the possibilities for the subscription duration?
I saw that the default one is "PT24H".
What is the maximum period that I can get, and can it be unlimited, for example?
Thanks and best regards.

In NGSIv1 (i.e. POST /v1/subscribeContext operation) duration is mandatory. If you don't use an explicit value, then PT24H (24 hours) is used by default, as you mention. You cannot set "unlimited" explicitly, but something absurdly large (such as P100Y, i.e. one hundred years) would achieve the same effect from a practical point of view.
In NGSIv2 (i.e. POST /v2/subscriptions operation) expires (whose value is a date) is used instead of duration. In this case, you can create a subscription without expires.
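For illustration, here is a minimal sketch of creating an NGSIv2 subscription without expires, using Python's requests library; the broker URL, entity pattern, and notification endpoint are placeholders, not values from the question:
import requests

payload = {
    "description": "subscription with no expires field",
    "subject": {"entities": [{"idPattern": ".*", "type": "Room"}]},
    "notification": {"http": {"url": "http://my-consumer:1028/accumulate"}},
    # "expires": "2040-01-01T00:00:00.00Z",  # add this only if you do want it to expire
}

resp = requests.post("http://localhost:1026/v2/subscriptions", json=payload)
print(resp.status_code, resp.headers.get("Location"))  # 201 plus the path of the new subscription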

Function for getting today's date?

I am creating a 'choice' inside a contract template that requires checking today's date. My DAML code is as follows:
controller dealer can
  Add_Car : CarId
    with
      startCoverage: Date
    do
      -- Check for a legal start date
      assert (
        startCoverage > *today* -- should check that it's not before today
      )
      create this with date_vehicle_added = startCoverage
What is the name of the function I can use to get the current date? It needs to go where it says "*today*".
tl;dr: Use getTime : Update Time and toDateUTC : Time -> Date, but be aware of the pitfalls. Preferably, use the Notarized Date pattern if this is possible.
Modeling Date/Time Implicitly
Modeling date and time is always a subtle problem, doubly so when
dealing with a deterministic, distributed system such as a digital
ledger. DAML provides a primitive function getTime which will return
the "Ledger Effective Time" (LET), which the ledger model guarantees will be
a monotonically increasing time value (in milliseconds) that is
constrained to within a ledger defined delta of wall-clock UTC time.
This can be converted to a UTC Date using the toDateUTC function in
DA.Date. This is the straight-line answer to your question, but has
a couple of caveats.
This Time and Date are in UTC; you will have to explicitly model
how this corresponds to local time. As DAML is a deterministic,
distributed system, there is no local time, since any given transaction
must be deterministically executed across multiple timezones.
Gratuitous use of Date/Time comparisons can lead to contracts being
implicitly stalled due to the passage of time. If a choice is guarded
by a check against wall-clock time, this can mean that any delay in
your application processing an exercise can result in the choice
becoming invalid. Making sure you handle this case properly in your
application code is a subtle issue, and as this is an implicit
parameter to your choice, there is no operational intervention
possible to avoid the problem even if you have advanced warning.
Integration testing of your model and application becomes
non-deterministic and non-repeatable in the presence of explicit time
comparisons. While you can write repeatable Scenario tests for your
model, as you have explicit control of LET there, this will not
exercise your off-ledger application.
Modeling Date/Time Explicitly
An alternative is the Notarized Date Pattern. Here, the signatories to
a contract agree on a trusted party to notarize the current date. This
notarization takes the form of a CurrentDate contract on the ledger.
This contract has the Notary as the signatory, and generally has a
single consuming choice, controlled by the Notary, to advance the
date.
If you use this approach, your Add_Car choice would take an extra
parameter currentDate : ContractId CurrentDate, which you can think
of as the controller providing evidence or a proof that the agreed
Notary has attested to the current date for the purposes of this
contract. This resolves the issues with the implicit time model thus:
As Date is now explicit on the ledger, timezones become explicit in
the progress of the CurrentDate contract.
While a contract can still be stalled if the Notary advances the
current date contract, the explicit nature of Date management means
that a) Any exercise sequenced ahead of the Notary date update will be
processed successfully; which means, b) There is now an avenue for
operational intervention where you have advanced warning that
processing for a given day has fallen behind schedule — assuming such
intervention is anticipated and permitted by the Notary agreement.
Because the operation of your system is once again a pure function of the contents of the ledger, the behaviour of the larger application becomes deterministic and repeatable. This massively reduces the effort required
for maintenance, testing, and debugging.
For these reasons I would recommend using the Notarized Date pattern where
this is possible, reserving implicit Date handling for those cases where
there really isn't an alternative.
Before your assert, you can bind the result of the getTime function. Then I suggest converting startCoverage to a Time, by means of the toGregorian and datetime functions in DA.Date.
You may not have enough information to do this correctly with your sample code; Date is a "local date", expected to be interpreted relative to some timezone, whereas Time is an absolute UNIX epoch offset. It's for that reason that there are so many caveats to toDateUTC listed in the manual and that I recommend avoiding that function.
Moreover, please keep in mind that only "ledger effective time" is available, which is not quite the same thing as "current time". Sure, for the purposes of the live exercise of Add_Car, getTime's result will correspond to the current time. However, transactions are necessarily replayable (for validation or other reasons), and for those executions getTime will produce what it originally did on exercise. You cannot use getTime to determine the amount of wall-clock time DAML took to execute some code, and that implies that even during live exercise ledger effective time doesn't correspond precisely to wall-clock. When you run test scenarios, the time starts at UNIX epoch and can be advanced manually in your scenario as your test requires; in fact I can recommend using pass or passToDate to test the very contract you're writing.

Does upsert() preserve TTL in couchbase?

This one is simple to figure out via code, but since it's counter-intuitive (and not documented) I'm documenting it here:
Does Upsert (insert/update operation) preserve TTL in couchbase?
In other words, if I run this code:
cb.upsert('hello',{'hi':'there'},ttl=10)
cb.upsert('hello',{'hi':'there'})
will the document created (and then updated) expire after 10 seconds?
So, no. The second upsert resets the TTL, and the document will never expire.
Note that this behavior isn't consistent across couchbase: the incr() operation (for counters) does NOT reset the TTL.
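If you want the expiry to survive later writes, you have to supply it again on every mutation, or refresh it separately. A hedged sketch against the Python SDK 2.x Bucket API used in the question (cb is the connected bucket from above):
cb.upsert('hello', {'hi': 'there'}, ttl=10)

# A later upsert clears the expiry unless ttl is passed again:
cb.upsert('hello', {'hi': 'there'}, ttl=10)

# Alternatively, refresh the expiry without rewriting the value:
cb.touch('hello', ttl=10)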

Day wise aggregation considering client's timezone from millions of rows

Suppose I have a table where visitors' (website visitor) information is stored. Suppose the table structure consists of the following fields:
ID
visitor_id
visit_time (stored as milliseconds in UTC since '1970-01-01 00:00:00')
Millions of rows are in this table and it's still growing.
In that case, if I want to see a report (day vs. visitors) from any timezone, then one solution is:
Solution #1:
Get the timezone of the report viewer (i.e. client)
Aggregate the data from this table considering the client's timezone
Show the result day wise
But in that case performance will degrade. Another solution may be the following:
Solution #2:
Using Pre-aggregated tables / summary tables where client's timezone is ignored
But in either case there is a trade off between performance and correctness.
Solution #1 ensures correctness and Solution #2 ensures better performance.
I want to know what is the best practice in this particular scenario?
The issue of handling time comes up a fair amount when you get into distributed systems, users and matching events between various sources of data.
I would strongly suggest that you ensure all logging systems use UTC. This allows collection from any variety of servers (which are all hopefully kept synchronized with respect to their view of the current UTC time) located anywhere in the world.
Then, as requests come in, you can convert from the user's timezone to UTC. At this point you have the same decision -- perform a real-time query or perhaps access some data previously summarized.
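As a concrete illustration of the real-time-query option, here is a hedged MySQL sketch (run from Python) that buckets the UTC millisecond timestamps by the viewer's local day. It assumes the MySQL time zone tables have been loaded (mysql_tzinfo_to_sql); the table name, connection details, and viewer time zone are placeholders:
import mysql.connector  # any DB-API driver would work the same way

DAY_REPORT = """
    SELECT DATE(CONVERT_TZ(FROM_UNIXTIME(visit_time / 1000), 'UTC', %s)) AS local_day,
           COUNT(DISTINCT visitor_id) AS visitors
    FROM visits
    GROUP BY local_day
    ORDER BY local_day
"""

conn = mysql.connector.connect(host="localhost", user="report", password="...", database="analytics")
cur = conn.cursor()
cur.execute("SET time_zone = '+00:00'")          # so FROM_UNIXTIME yields UTC datetimes
cur.execute(DAY_REPORT, ("America/New_York",))   # the report viewer's time zone
for local_day, visitors in cur:
    print(local_day, visitors)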
Whether or not you want to aggregate the data in advance will depend on a bunch of things. Some of these might entail the ability to reduce the amount of data kept, reducing the amount of processing to support queries, how often queries will be performed or even the cost of building a system versus the amount of use it might see.
With respect to best practices -- keep the display characteristics (e.g. time zone) independent from the processing of the data.
If you haven't already, be sure you consider the lifetime of the data you are keeping. Will you need ten years of back data available? Hopefully not. Do you have a strategy for culling old data when it is no longer required? Do you know how much data you'll have if you store every record (estimate with various traffic growth rates)?
Again, a best practice for larger data sets is to understand how you are going to deal with the size and how you are going to manage that data over time as it ages. This might involve long term storage, deletion, or perhaps reduction to summarized form.
Oh, and to slip in a Matrix analogy, what is really going to bake your noodle in terms of "correctness" is the fact that correctness is not at issue here. Every timezone has a different view of traffic during a "day" in their own zone and every one of them is "correct". Even those oddball time zones that differ from yours by an adjustment that isn't measured only in hours.

redis as write-back view count cache for mysql

I have a very high throughput site for which I'm trying to store "view counts" for each page in a mySQL database (for legacy reasons they must ultimately end up in mySQL).
The sheer number of views is making it impractical to do SQL "UPDATE ITEM SET VIEW_COUNT=VIEW_COUNT+1" type of statements. There are millions of items but most are only viewed a small number of times, others are viewed many times.
So I'm considering using Redis to gather the view counts, with a background thread that writes the counts to mySQL. What is the recommended method for doing this? There are some issues with the approach:
how often does the background thread run?
how does it determine what to write back to mySQL?
should I store a Redis KEY for every ITEM that gets hit?
what TTL should I use?
is there already some pre-built solution or powerpoint presentation that gets me halfway there, etc.
I have seen very similar questions on StackOverflow but none with a great answer...yet! Hoping there's more Redis knowledge out there at this point.
I think you need to step back and look at some of your questions from a different angle to get to your answers.
"how often does the background thread run?"
To answer this you need to answer these questions: How much data can you lose? What is the reason for the data being in MySQL, and how often is that data accessed? For example, if the DB is only needed to be consulted once per day for a report, you might only need it to be updated once per day. On the other hand, what if the Redis instance dies? How many increments can you lose and still be "ok"? These will provide the answers to the question of how often to update your MySQL instance and aren't something we can answer for you.
I would use a very different strategy for storing this in redis. For the sake of the discussion let us assume you decide you need to "flush to db" every hour.
Store each hit in hashes with a key name structure along these lines:
interval_counter:DD:HH
interval_counter:total
Use the page id (such as MD5 sum of the URI, the URI itself, or whatever ID you currently use) as the hash key and do two increments on a page view; one for each hash. This provides you with a current total for each page and a subset of pages to be updated.
You would then have your cron job run a minute or so after the start of the hour to pull down all pages with updated view counts by grabbing the previous hour's hash. This provides you with a very fast means of getting the data to update the MySQL DB with while avoiding any need to do math or play tricks with timestamps etc. By pulling data from a key which is no longer being incremented you avoid race conditions due to clock skew.
You could set an expiration on the daily key, but I'd rather use the cron job to delete it when it has successfully updated the DB. This means your data is still there if the cron job fails or fails to be executed. It also provides the front-end with a full set of known hit counter data via keys that do not change. If you wanted, you could even keep the daily data around to be able to do window views of how popular a page is. For example if you kept the daily hash around for 7 days by setting an expire via the cron job instead of a delete, you could display how much traffic each page has had per day for the last week.
Executing the two hincr operations, whether solo or pipelined, still performs quite well and is more efficient than doing calculations and munging data in code.
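A minimal sketch of that per-hit bookkeeping with the redis-py client; the key names follow the interval_counter:DD:HH / interval_counter:total scheme above, and page_id stands for whatever identifier you already use:
import redis
from datetime import datetime, timezone

r = redis.Redis()

def record_hit(page_id):
    now = datetime.now(timezone.utc)
    hour_key = now.strftime("interval_counter:%d:%H")   # e.g. interval_counter:27:08
    pipe = r.pipeline()
    pipe.hincrby(hour_key, page_id, 1)                  # this hour's per-page counts
    pipe.hincrby("interval_counter:total", page_id, 1)  # running total per page
    pipe.execute()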
Now for the question of expiring the low traffic pages vs memory use. First, your data set doesn't sound like one which will require huge amounts of memory. Of course, much of that depends on how you identify each page. If you have a numerical ID the memory requirements will be rather small. If you still wind up with too much memory, you can tune it via the config, and if need be could even use a 32 bit compile of redis for a significant memory use reduction. For example, the data I describe in this answer I used to manage for one of the ten busiest forums on the Internet and it consumed less than 3GB of data. I also stored the counters in far more "temporal window" keys than I am describing here.
That said, in this use case Redis is the cache. If you are still using too much memory after the above options you could set an expiration on keys and add an expire command to each hit. More specifically, if you follow the above pattern you will be doing the following per hit:
hincr -> total
hincr -> daily
expire -> total
This lets you keep anything that is actively used fresh by extending its expiration every time it is accessed. Of course, to do this you'd need to wrap your display call to catch the null answer for hget on the totals hash and populate it from the MySQL DB, then increment. You could even do both as an increment. This would preserve the above structure and would likely be the same codebase needed to update the Redis server from the MySQL DB if the Redis node needed repopulation. For that you'll need to consider and decide which data source will be considered authoritative.
You can tune the cron job's performance by modifying your interval in accordance with the parameters of data integrity you determine from the earlier questions. To get a faster running cron job you decrease the window. With this method decreasing the window means you should have a smaller collection of pages to update. A big advantage here is you don't need to figure out what keys you need to update and then go fetch them. You can do an hgetall and iterate over the hash's keys to do updates. This also saves many round trips by retrieving all the data at once. In either case, you will likely want to consider a second Redis instance slaved to the first to do your reads from. You would still do deletes against the master but those operations are much quicker and less likely to introduce delays in your write-heavy instance.
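For illustration, a hedged sketch of that hourly flush job in Python, using redis-py and a DB-API connection; the ITEM/VIEW_COUNT names come from the question, while the connection details and the ID column are placeholders:
import redis
import mysql.connector
from datetime import datetime, timedelta, timezone

r = redis.Redis()
db = mysql.connector.connect(host="localhost", user="counter", password="...", database="site")
cur = db.cursor()

prev_hour = datetime.now(timezone.utc) - timedelta(hours=1)
hour_key = prev_hour.strftime("interval_counter:%d:%H")   # the hash no longer being written

counts = r.hgetall(hour_key)                              # one round trip: {page_id: hits}
for page_id, hits in counts.items():
    cur.execute(
        "UPDATE ITEM SET VIEW_COUNT = VIEW_COUNT + %s WHERE ID = %s",
        (int(hits), page_id.decode()),
    )
db.commit()
r.delete(hour_key)                                        # only after the DB update succeeded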
If you need disk persistence of the Redis DB, then certainly put that on a slave instance. Otherwise if you do have a lot of data being changed often your RDB dumps will be constantly running.
I hope that helps. There are no "canned" answers because to use Redis properly you need to think first about how you will access the data, and that differs greatly from user to user and project to project. Here I based the route taken on this description: two consumers accessing the data, one to display only and the other to determine updating another datasource.
Consolidation of my other answer:
Define a time-interval in which the transfer from redis to mysql should happen, i.e. minute, hour or day. Define it in a way that an identifying key can be obtained quickly and easily. This key must be ordered, i.e. a smaller time should give a smaller key.
Let it be hourly and the key be YYYYMMDD_HH for readability.
Define a prefix like "hitcount_".
Then for every time-interval you set a hash hitcount_<timekey> in redis which contains all requested items of that interval in the form ITEM => count.
There are two parts to the solution:
The actual page that has to count:
a) get the current $timekey, i.e. by date functions
b) get the value of $ITEM
c) send the redis command HINCRBY hitcount_$timekey $ITEM 1
A cronjob which runs in that given interval, not too close to the boundary of those intervals (for example: not at the full hour). This cronjob:
a) Extracts the current time-key (for now it would be 20130527_08)
b) Requests all matching keys from redis with KEYS hitcount_* (those should be a small number)
c) compares every such hash against the current hitcount_<timekey>
d) if that key is smaller than current key, then process it as $processing_key:
read all pairs ITEM => counter by HGETALL $processing_key as $item, $cnt
update the database with UPDATE ITEM SET VIEW_COUNT=VIEW_COUNT+$cnt WHERE ITEM=$item
delete that key from the hash by HDEL $processing_key $item
no need to del the hash itself; there are no empty hashes in redis, as far as I have seen
If you want to have a TTL involved, say because the cleanup cronjob may not be reliable (it might not run for many hours), then you could have the cronjob create the future hashes with an appropriate TTL. That means for now we could create a hash 20130527_09 with a TTL of 10 hours, 20130527_10 with a TTL of 11 hours, and 20130527_11 with a TTL of 12 hours. The problem is that you would need a pseudo-key, because empty hashes seem to be deleted automatically.
See EDIT3 for the current state of the answer.
I would write a key for every ITEM. A few tens of thousands of keys are definitely no problem at all.
Do the pages change very much? I mean do you get a lot of pages that will never be called again? Otherwise I would simply:
add the value for an ITEM on page request.
every minute or 5 minutes call a cronjob that reads the redis keys, reads the value (say 7) and reduces it by decrby ITEM 7 (see the sketch below). In MySQL you could increment the value for that ITEM by 7.
If you have a lot of pages/ITEMS which will never be called again you could make a cleanup-job once a day to delete keys with value 0. This should be locked against incrementing that key again from the website.
I would set no TTL at all, so the values should live forever. You could check the memory usage, but with today's gigabytes of memory a lot of different pages will fit.
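A minimal sketch of that read-then-decrement step with redis-py; decrementing by exactly the value that was read means hits arriving between the GET and the DECRBY are not lost. The MySQL update is left as a placeholder callback:
import redis

r = redis.Redis()

def flush_item(item, apply_to_mysql):
    seen = int(r.get(item) or 0)
    if seen:
        apply_to_mysql(item, seen)   # e.g. UPDATE ITEM SET VIEW_COUNT = VIEW_COUNT + seen
        r.decrby(item, seen)         # hits counted after the GET stay in redis for next time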
EDIT: incr is very nice for that, because it sets the key if not set before.
EDIT2: Given the large amount of different pages, instead of the slow "keys *" command you could use HASHES with incrby (http://redis.io/commands/hincrby). Still I am not sure if HGETALL is much faster than KEYS *, and a HASH does not allow a TTL for single keys.
EDIT3: Oh well, sometimes the good ideas come late. It is so simple: just prefix the key with a timeslot (say day-hour) or make a HASH with the name prefix "requests_". Then no overlap of delete and increment can happen! Every hour you take the keys with older "day_hour_*" values, update MySQL and delete those old keys. The only condition is that your servers are not too different on their clocks, so use UTC and synchronized servers, and don't start the cron at x:01 but at x:20 or so.
That means: a called page converts a call of ITEM1 at 23:37, May 26 2013 to Hash 20130526_23, ITEM1. HINCRBY count_20130526_23 ITEM1 1
One hour later the list of keys count_* is checked, and every hash older than the current count_20130526_23 is processed (read the key-values with hgetall, update mysql), its fields deleted one by one after processing (hdel). After finishing that you check if hlen is 0 and del count_...
So you only have a small number of keys (one per unprocessed hour), which makes keys count_* fast, and then you process the actions of that hour. You can give a TTL of a few hours, in case your cron is delayed, time-jumped, or down for a while, or something like that.

What is enough to store dates/times in the DB from multiple time zones for accurate calculations?

This is a HARD question. In fact it is so hard it seems the SQL standard and most of the major databases out there don't have a clue in their implementation.
Converting all datetimes to UTC allows for easy comparison between records but throws away the timezone information, which means you can't do calculations with them (e.g. add 8 months to a stored datetime) nor retrieve them in the time zone they were stored in. So the naive approach is out.
Storing the timezone offset from UTC in addition to the timestamp (e.g. timestamp with time zone in postgres) would seem to be enough, but different timezones can have the same offset at one point in the year and a different one 6 months later due to DST. For example you could have New York and Chile both at UTC-4 now (August) but after the 4th of November New York will be UTC-5 and Chile (after the 2nd of September) will be UTC-3. So storing just the offset will not allow you to do accurate calculations either. Like the above naive approach it also discards information.
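For illustration, the offset divergence is easy to reproduce with Python's zoneinfo module (Python 3.9+, assuming an up-to-date tz database):
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

for zone in ("America/New_York", "America/Santiago"):
    for day in ("2012-08-01", "2012-12-01"):
        local_noon = datetime.fromisoformat(day + "T12:00:00").replace(tzinfo=ZoneInfo(zone))
        print(zone, day, local_noon.strftime("%z"))
# New York prints -0400 in August and -0500 in December;
# Santiago prints -0400 in August and -0300 in December.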
What if you store the timezone identifier (e.g. America/Santiago) with the timestamp instead? This would allow you to distinguish between a Chilean datetime and a New York datetime. But this still isn't enough. If you are storing an expiration date, say midnight 6 months into the future, and the DST rules change (as unfortunately politicians like to do) then your timestamp will be wrong and expiration could happen at 11 pm or 1 am instead. Which might or might not be a big deal to your application. So using a timestamp also discards information.
It seems that to truly be accurate you need to store the local datetime (e.g. using a non timezone aware timestamp type) with the timezone identifier. To support faster comparisons you could cache the utc version of it until the timezone db you use is updated, and then update the cached value if it has changed. So that would be 2 naive timestamp types plus a timezone identifier and some kind of external cron job that checks if the timezone db has changed and runs the appropriate update queries for the cached timestamp.
Is that an accurate solution? Or am I still missing something? Could it be done better?
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
You've summarized the problem well. Sadly the answer is to do what you've described.
The correct format to use does depend on the pragmatics of what the timestamp is supposed to represent. It can in general be divided between past and future events (though there are exceptions):
Past events can and usually should be stored as something which can never be reinterpreted differently (e.g. a UTC timestamp with a numeric time zone offset). If the named time zone should be kept (to be informative to the user) then this should be separate.
Future events need the solution you've described. Local timestamp and named time zone. This is because you want to change the "actual" (UTC) time of that event when the time zone rules change.
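As a rough sketch of that scheme for future events, using Python's zoneinfo with placeholder column values: the stored local time plus zone name stay authoritative, and the UTC value is merely a cache you can re-derive after a tz database update.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

stored_local = "2025-06-30T00:00:00"   # e.g. a TIMESTAMP WITHOUT TIME ZONE column
stored_zone = "America/Santiago"       # e.g. a VARCHAR column holding the zone name

def cached_utc(local_iso, zone):
    # Re-derive the UTC instant from the authoritative local time + zone name.
    # Re-run this (and update the cached column) whenever the tz database changes.
    local = datetime.fromisoformat(local_iso).replace(tzinfo=ZoneInfo(zone))
    return local.astimezone(timezone.utc)

print(cached_utc(stored_local, stored_zone))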
I would question whether time zone conversion is such an overhead; it's usually pretty quick. I'd only go through the pain of caching if you are seeing a really significant performance hit. There are (as you pointed out) some big operations which will require caching (such as sorting billions of rows based on the actual (UTC) time).
If you require future events to be cached in UTC for performance reasons then yes, you need to put a process in place to update the cached values. Depending on the type of DB it is possible that this could be done by the sysadmins, as TZ rules change rarely.
If you care about the offset, you should store the actual offset. Storing the timezone identifier is not the same thing, as timezones can, and do, change over time. By storing the timezone offset, you can calculate the correct local time at the time of the event, rather than the local time based on the current offset. You may still want to store the timezone identifier, if it's important to know what timezone the event was considered to have happened in.
Remember, time is a physical attribute, but a timezone is a political one.
If you convert to UTC you can order and compare the records
If you add the name of the timezone it originated from you can represent it in its original tz and be able to add/subtract time periods like weeks, months, etc. (instead of elapsed time).
In your question you state that this is not enough because DST might be changed. DST makes calculating with dates (other than elapsed time) complicated and quite code intensive. Just like you need code to deal with leap years, you need to take into account whether for a given date/period you need to apply a DST correction or not. For some years the answer will be yes, for others no.
See this wiki page for how complex those rules have become.
Storing the offset is basically storing the result of those calculations. That calculated offset is only valid for that given point in time and can't be applied as is to later or earlier points like you suggest in your question. You do the calculation on the UTC time and then convert the resulting time to the required timezone based on the rules that are active at that time in that timezone.
Note that there wasn't any DST before the first world war anywhere and date/time systems in databases handle those cases perfectly.
I'm interested in solutions for MySQL, SQL Server, Oracle, PostgreSQL and other DBMS that handle TIMESTAMP WITH TIME ZONE.
Oracle converts the instant in time to UTC but keeps the time zone or UTC offset, depending on what you pass. Oracle (correctly) makes a difference between the time zone and the UTC offset and returns what you passed to you. This only costs two additional bytes.
Oracle does all calculations on TIMESTAMP WITH TIME ZONE in UTC. This does not make a difference for adding months, but makes a difference for adding days as there is no daylight savings time. Note that the result of a calculation must always be a valid timestamp, e.g. adding one month to January 31st will throw an exception in Oracle as February 31st does not exist.