How can I get a timestamp difference in DAML choice execution? - daml

I am trying to execute two choices one after another. Both execute so fast that they end up with the same timestamp:
timestamp = 1607079031453
This makes it difficult to arrange them in ascending order in a table.
Can you suggest a workaround for this?

getTime in DAML does not give you "system time", as there is no notion of system time on a distributed system. It gives you something called "Ledger Time" documented here: https://docs.daml.com/concepts/time.html
Ledger Time is specified by the submitting node, and a property of the entire transaction. That means all calls to getTime within a single transaction will return the same time.
If you create two identical contracts in a single transaction, there are only two ways to distinguish them:
Position in the transaction tree
Contract Id
A Contract Id is a hash, so it gives you no useful ordering properties other than being a stable value to order by. If you want to order by the order in which the contracts were created, you need to use the position in the transaction tree.
I don't know where you store your data or which API you use to store it, but suppose you subscribed to the Transaction Service, which returns Create events in order, and stored them in an SQL database: you could simply add an auto-incrementing integer column to your table and use that integer to sort by.
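For example, a minimal sketch with Python and SQLite (the table and column names are hypothetical, and the code that actually streams the Create events from the Transaction Service is left out):

```python
import sqlite3

conn = sqlite3.connect("contracts.db")  # hypothetical local store
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS contract_events (
        seq         INTEGER PRIMARY KEY AUTOINCREMENT,  -- arrival order
        contract_id TEXT NOT NULL,
        payload     TEXT NOT NULL
    )
    """
)

def store_create_event(contract_id, payload):
    # Create events arrive from the Transaction Service in creation order, so
    # the auto-incremented `seq` preserves that order even when every contract
    # in the transaction carries the same ledger time.
    conn.execute(
        "INSERT INTO contract_events (contract_id, payload) VALUES (?, ?)",
        (contract_id, payload),
    )
    conn.commit()

# Later, read the contracts back in creation order.
rows = conn.execute(
    "SELECT contract_id, payload FROM contract_events ORDER BY seq"
).fetchall()
```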

bame's answer is mostly geared towards the DAML language; I'll explore the question from the point of view of the Ledger API.
If your objective is to establish that one choice effectively occurred after the other, and the two choices occur as part of different transactions, you can use offsets for that.
Offsets are effectively an opaque binary blob from the client's perspective, but they are guaranteed to be lexicographically comparable: take the two offsets, and the lower one will have occurred before the higher one.
Note that this only applies if the two choices were exercised as part of two different transactions. If they occurred in the same transaction, the choice that occurred first will appear first as you traverse the transaction tree in preorder.
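As a rough sketch (assuming the offsets are surfaced to the client as strings):

```python
def happened_before(offset_a, offset_b):
    # Offsets are opaque, but they compare lexicographically in ledger order,
    # so a plain string comparison tells you which transaction came first.
    return offset_a < offset_b
```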

Related

Load balancing KEYs using GET via Redis

My application currently uses MySQL: it makes phone calls, fetching information about the dialed numbers and the caller ID from the DB. I want to have a group where a list of caller IDs is defined in Redis - let's say 10 caller IDs. But for each dialing, I want to SELECT/GET the caller ID from the Redis server, not just pick a random number. Is that possible with Redis? It's like load balancing across the list of keys in Redis to make sure all keys are given a fair chance to be used.
An example of the data set would be a phonebook, which would be the key, with, say, 10 phone numbers in that phonebook. I want to use those numbers for every unique dialing so that all numbers in the phonebook are used evenly for dialing.
I can do that in MySQL by setting up an update field in the table, but that's going to increase the UPDATEs on MySQL. Is this something that can easily be done with Redis? I can't seem to think of a logic for how to do that.
Thanks.
There are two ways to do it in Redis:
ZSET
You can track the usage frequency with the score of a zset entry. When you fetch the entry with the lowest score out of Redis, you increase its score by one.
The side benefit is you can easily see exactly how many times each element has been used.
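A minimal sketch of that approach with redis-py (the key and member names are made up; if several dialers share the zset, wrap the read and the increment in a pipeline or Lua script so they stay race-free):

```python
import redis

r = redis.Redis()

# Seed the phonebook once: every caller ID starts with a usage count of 0.
r.zadd("phonebook", {"+15550001": 0, "+15550002": 0, "+15550003": 0})

def next_caller_id():
    # Take the least-used caller ID (the entry with the lowest score)...
    member, _score = r.zrange("phonebook", 0, 0, withscores=True)[0]
    # ...and bump its usage count so the next call picks a different one.
    r.zincrby("phonebook", 1, member)
    return member.decode()
```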
LIST
If you're not bothered about tracking the usage counts, you can also do it with a Redis list. Just use RPOPLPUSH source destination with the same list as source and destination to achieve a round-robin load-balancing effect. Basically it takes an element from the bottom of the queue, puts it back onto the top, and returns you the value of the rotated element.
The benefit is there is only one command to run and the operation is atomic.
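A sketch of the same idea with redis-py (the list name is made up):

```python
import redis

r = redis.Redis()

# Seed the rotation once.
r.rpush("phonebook:rotation", "+15550001", "+15550002", "+15550003")

def next_caller_id():
    # One atomic command: pop from the tail, push back onto the head, and
    # return the rotated element -- a simple round robin over the list.
    return r.rpoplpush("phonebook:rotation", "phonebook:rotation").decode()
```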

Validation and Creating unique ID

Okay - for my project I was asked to identify some validation techniques for a process we use to transform some data. Let me give you some background.
We receive data from the client: we load the file and only pull in the fields necessary for processing. A few checks are done at this stage. From here we run scripts on the data which essentially do all the heavy lifting (dropping duplicates, checking dates, etc.). Then it runs through a blackbox system and spits results back out to us.
We have been notified by the client that we are extremely off in our counts for a particular group - roughly $4 million for this one.
We have a process to identify a unique member: we generate a pol_ID and a Suf_ID, and together with their associated groupname these are considered unique in our system and in our processing system.
We need a process to handle the records for these unique members.
A unique member can have one to many claims associated to their name in a given time period.
When we receive claim information, it is generally handled by using the payor_field + claimno + a generated sequence number (sometimes this sequence number is the last two digits of claimno)...
Ex. Three claims come into the system, and after processing through the load we see the client has repeated the claimno - since we're using the last two digits, this no longer makes them unique and drops two of the three records, only retaining the first one:
WKS-01100 75.02 - stays
WKS-01100 6000.56 - drops
WKS-01100 560.23 - drops
My problem comes into play because we usually assume that the claimno is unique once we parse off the last two digits. In testing this case we have tried creating an explicit incremental sequence number in another column to make the records unique, which then doubles our results.
Now my questions are as follows:
Is there another way to make these claims unique? Auto-increment is not an option. Consider that the client can send duplicate claimnos, which is where our problem lies: they can potentially recycle their claimno.
Since it's month based, maybe there could be some kind of month ID on the end?
Would any binary representation of the sequence number work? It is an INT data type.
(It should also be noted that we deal with historical data going back 24 months; each month we get the next consecutive month's data and drop the first month in the set.)
We are not limited in what we can do to transform this claimno, so I am open to suggestions... I tried to keep it short, but let me know if I need to add more info :) Thanks.
Do you have a timestamp saved for each claim? A possible solution is to append the timestamp to make the claim unique.
WKS-01100-1330464175
WKS-01100-1327514036
WKS-01100-1341867984
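A small sketch of building such an identifier (the function name is hypothetical and second-level resolution is assumed; if two claims with the same claimno can arrive within the same second, you would need a finer-grained timestamp or an extra tiebreaker):

```python
import time

def unique_claim_id(claimno, received_at=None):
    # Append the Unix timestamp (in seconds) at which the claim was received.
    ts = int(received_at if received_at is not None else time.time())
    return f"{claimno}-{ts}"

# e.g. unique_claim_id("WKS-01100") might yield "WKS-01100-1330464175"
```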

Should id or timestamp be used to determine the creation order of rows within a database table? (given possibility of incorrectly set system clock)

A database table is used to store editing changes to a text document.
The database table has four columns: {id, timestamp, user_id, text}
A new row is added to the table each time a user edits the document. The new row has an auto-incremented id, and a timestamp matching the time the data was saved.
To determine what editing changes a user made during a particular edit, the text from the row inserted in response to his or her edit is compared to the text in the previously inserted row.
To determine which row is the previously inserted row, either the id column or the timestamp column could be used. As far as I can see, each method has advantages and disadvantages.
Determining the creation order using id
Advantage: Immune to problems resulting from incorrectly set system clock.
Disadvantage: Seems to be an abuse of the id column since it prescribes meaning other than identity to the id column. An administrator might change the values of a set of ids for whatever reason (eg. during a data migration), since it ought not matter what the values are so long as they are unique. Then the creation order of rows could no longer be determined.
Determining the creation order using timestamp
Advantage: The id column is used for identity only, and the timestamp is used for time, as it ought to be.
Disadvantage: This method is only reliable if the system clock is known to have been correctly set each time a row was inserted into the table. How could one be convinced that the system clock was correctly set for each insert? And how could the state of the table be fixed if ever it was discovered that the system clock was incorrectly set for a not precisely known period in the past?
I seek a strong argument for choosing one method over the other, or a description of another method that is better than the two I am considering.
Using the sequential id would be simpler, as it's probably(?) a primary key and thus indexed and quicker to access. Given that you have user_id, you can quickly ascertain the last and prior edits.
Using the timestamp is also applicable, but it's likely to be a longer entry, we don't know whether it's indexed at all, and there is the potential for collisions. You rightly point out that system clocks can change, whereas sequential ids cannot.
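For example (the revisions table and its columns are assumed, with SQLite standing in for whatever database is actually used):

```python
import sqlite3

conn = sqlite3.connect("documents.db")  # hypothetical store

# Fetch the latest edit and the one before it for a given user, ordering by
# the auto-incremented id rather than by the timestamp.
last_two = conn.execute(
    """
    SELECT id, timestamp, text
    FROM revisions
    WHERE user_id = ?
    ORDER BY id DESC
    LIMIT 2
    """,
    (42,),
).fetchall()
```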
Given your update:
As it's difficult to see what your exact requirements are, I've included this as evidence of what a particular project required for 200K+ complex documents and millions of revisions.
From my own experience building a fully auditable doc/profiling system for an internal team of more than 60 full-time researchers, we ended up using both an id and a number of other fields (including a timestamp) to provide audit trailing and full versioning.
The system we built has more than 200 fields for each profile, so versioning a document was far more complex than just storing a block of changed text/content for each one; yet each profile could be edited, approved, rejected, rolled back, published and even exported as a PDF or other format as ONE document.
What we ended up doing (after a lot of strategy/planning) was to store sequential versions of the profile, but they were keyed primarily on an id field.
Timestamps
Timestamps were also captured as a secondary check, and we made sure to keep the system clocks accurate (amongst a cluster of servers) through cron scripts that checked the time alignment regularly and corrected it where necessary. We also used ntpd to prevent clock drift.
Other captured data
Other data captured for each edit also included (but not limited to):
User_id
User_group
Action
Approval_id
There were also other tables that fulfilled internal requirements (including automatically generated annotations for the documents), as some of the profile editing was done using data from bots (built using NER/machine learning/AI), but approval by one of the team was required before edits/updates could be published.
An action log was also kept of all user actions, so that in the event of an audit one could look at the actions of an individual user; even when a user didn't have the permissions to perform such an action, the attempt was still logged.
With regard to migration, I don't see it as a big problem, as you can easily preserve the id sequences when moving/dumping/transferring data. Perhaps the only issue would be if you needed to merge datasets; you could always write a migration script in that event, so from a personal perspective I consider that disadvantage somewhat diminished.
It might be worth looking at the Stack Overflow table structures in their Data Explorer (which is reasonably sophisticated). You can see the table structure here: https://data.stackexchange.com/stackoverflow/query/new, which comes from a question on Meta: How does SO store revisions?
As a revision system, SO works well and the markdown/revision functionality is probably a good example to pick over.
Use Id. It's simple and works.
The only caveat is if you routinely add rows from a store-and-forward server, so that rows may be added later but should be treated as having been added earlier.
Or add another column whose sole purpose is to record the editing order. I suggest you do not use datetime for this.

The data structure for the "order" property of an object in a list

I have a list of ordered objects stored in the database and accessed through an ORM (specifically, Django's ORM). The list is sorted arbitrarily by the user and I need some way to keep track of it. To that end, each object has an "order" property that specifies its order in relation to the other objects.
Any data structure I use to sort on this order field will have to be recreated on each request, so creation has to be cheap. I will often compare multiple objects. And insertions can't require me to update every single row in the database.
What data structure should I use?
Heap
As implemented by std::set (C++), with comparison on the stored value (depending on what is stored): integers by value, strings by string ordering, and so on. Good for ordering and access, but it isn't clear when you need to do the reordering - during insertion, constantly, and so on.
You haven't specified what other constraints you have. If the number of elements is known, and it is reasonable or possible to keep a fixed array in memory, then each position could be specified by an index. This assumes a fixed number of elements.
You also haven't specified which operations you want to perform often.
If you load the values once and then access them without modification, you can use one algorithm for creation and then either build an index or transform the storage to get fast access...
Linked list (doubly)?
I don't think you can get constant time for insertions (but maybe amortized constant time?).
I would use a binary tree; the bit sequence describing the path can then be used as your "order" property.
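One way to make that concrete (a sketch only; the encoding and the helper names are my own): write the path as bits (0 = left, 1 = right), append a sentinel 1 bit, and compare the resulting strings lexicographically, which matches the in-order position of the nodes. A key strictly between two existing keys is then the midpoint of the corresponding binary fractions, so an insertion never has to touch any other row:

```python
from fractions import Fraction

def order_key(path_bits):
    # Tree path ("0" = left, "1" = right) plus a sentinel "1", so that plain
    # lexicographic comparison matches in-order position:
    # order_key("0") = "01" < order_key("") = "1" < order_key("1") = "11".
    return path_bits + "1"

def _to_fraction(key):
    # A key "b1 b2 ... bn" (always ending in 1) is the binary fraction
    # 0.b1b2...bn; lexicographic order on keys equals numeric order here.
    return Fraction(int(key, 2), 2 ** len(key))

def _to_key(f):
    # Inverse of _to_fraction: the denominator is a power of two, so the
    # binary expansion terminates (and ends with a 1 bit).
    bits = []
    while f:
        f *= 2
        bits.append("1" if f >= 1 else "0")
        if f >= 1:
            f -= 1
    return "".join(bits)

def key_between(lo, hi):
    # A key strictly between lo and hi (lo < hi): the midpoint of the two
    # fractions, i.e. a free slot in the tree between the two nodes.
    return _to_key((_to_fraction(lo) + _to_fraction(hi)) / 2)

# key_between("01", "1") == "011", and "01" < "011" < "1"
```

In Django terms the "order" column would then be a string field rather than an integer, and ordering the queryset by that column yields the user-defined order.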

Using a table to keep the last used ID in a web server farm

I use a table with one row to keep the last used ID (I have my reasons not to use auto_increment). My app should work in a server farm, so I wonder how I can update the last inserted ID (i.e. increment it) and select the new ID in one step, to avoid problems with thread safety (a race condition between the servers in the farm).
You're going to use a server farm for the database? That doesn't sound "right".
You may want to consider using GUIDs for IDs. They may be big, but they don't have duplicates.
With a single "next id" value you will run into locking contention on that record. What I've done in the past is use a table of ranges of IDs (RangeId, RangeFrom, RangeTo). The range table has a primary key of RangeId, which is a simple number (e.g. 1 to 100). The "get next id" routine picks a random number from 1 to 100 and gets the first range record with an id lower than the random number. This spreads the locks out across N records. You can use tens, hundreds or thousands of range records. When a range is fully consumed, just delete the range record.
If you're really using multiple databases then you can manually ensure each database's set of range records do not overlap.
You need to make sure that your ID column is only ever accessed under a lock - then only one client can read the highest ID and set the new highest ID.
You can do this in C# using a lock statement around the code that accesses the table, or in your database you can wrap the read/write in a transaction. I don't know the exact syntax for this on MySQL.
Use a transactional database and control transactions manually. That way you can submit multiple queries without risking having something mixed up. Also, you may store the relevant query sets in stored procedures, so you can simply invoke these transactional queries.
If you have problems with performance, increment the ID by 100 and use a thread per "client" server. The thread should do the increment and hand each interested party a new ID. This way, the thread only needs to access the DB once per 100 IDs.
If the thread crashes you'll lose a couple of IDs, but if that doesn't happen all the time, you shouldn't need to worry about it.
AFAIK the only way to get this out of a DB with nicely incrementing numbers is going to be transactional locks at the DB, which is hideous performance-wise. You can get lockless behaviour using GUIDs, but frankly you're going to run into transaction requirements in every CRUD operation you can think of anyway.
Assuming that your database is configured to run with a transaction isolation of READ_COMMITTED or better, then use one SQL statement that updates the row, setting it to the old value selected from the row plus an increment. With lower levels of transaction isolation you might need to use INSERT combined with SELECT FOR UPDATE.
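As an illustration only, here is a sketch with SQLite standing in for the real database (the table name is made up; RETURNING needs SQLite 3.35+, and on MySQL you would use an equivalent idiom such as UPDATE ... SET last_id = LAST_INSERT_ID(last_id + 1) followed by SELECT LAST_INSERT_ID()):

```python
import sqlite3  # stand-in for whatever RDBMS the farm actually uses

conn = sqlite3.connect("ids.db", isolation_level=None)  # autocommit
conn.execute(
    "CREATE TABLE IF NOT EXISTS id_counter ("
    " id INTEGER PRIMARY KEY CHECK (id = 1),"
    " last_id INTEGER NOT NULL)"
)
conn.execute("INSERT OR IGNORE INTO id_counter (id, last_id) VALUES (1, 0)")

def next_id():
    # Increment and read back in a single statement, so concurrent callers
    # serialize on the row instead of racing between a SELECT and an UPDATE.
    row = conn.execute(
        "UPDATE id_counter SET last_id = last_id + 1 WHERE id = 1 RETURNING last_id"
    ).fetchone()
    return row[0]
```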
As pointed out [by Aaron Digulla] it is better to allocate blocks of IDs, to reduce the number of queries and table locks.
The application must perform the ID acquisition in a separate transaction from any business logic, otherwise any transaction that needs an ID will end up waiting for every transaction that asks for an ID first to commit/rollback.
This article: http://www.ddj.com/architect/184415770 explains the HIGH-LOW strategy that allows your application to obtain IDs from multiple allocators. Multiple allocators improve concurrency, reliability and scalability.
There is also a long discussion here: http://www.theserverside.com/patterns/thread.tss?thread_id=4228 "HIGH/LOW Singleton+Session Bean Universal Object ID Generator"
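For completeness, a rough sketch of the HIGH/LOW idea described in that article (all names are hypothetical; fetch_next_hi is whatever atomically increments and returns the database counter, for example the next_id() sketch above):

```python
import threading

class HiLoAllocator:
    # Hands out IDs from locally cached blocks, so the database is only hit
    # once per `block_size` IDs; allocators on different app servers can run
    # in parallel without ever producing the same ID.

    def __init__(self, fetch_next_hi, block_size=100):
        self._fetch_next_hi = fetch_next_hi
        self._block_size = block_size
        self._lock = threading.Lock()
        self._next = 0
        self._limit = 0  # empty block: forces a fetch on first use

    def next_id(self):
        with self._lock:
            if self._next >= self._limit:
                hi = self._fetch_next_hi()           # one DB round trip
                self._next = hi * self._block_size   # start of the new block
                self._limit = self._next + self._block_size
            value = self._next
            self._next += 1
            return value

# allocator = HiLoAllocator(fetch_next_hi=next_id)
# new_id = allocator.next_id()
```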