Extract event data from process execution in Bonita Studio

I am working on a project; I have built my Bonita solution and executed it over a hundred times. I would now like to move the project to the process mining phase, and to carry out process mining I need three fundamental pieces of information: the IDs of the activities in my process, the names of those activities, and the timestamps of when each activity was completed.
I now need to extract this information from Bonita Studio, but I don't know how. I had a look at the application's website but could not find anything that meets my need.
I would be glad if you could give me some guidance, please.
Thanks!
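One possible starting point is to read the archived (completed) activity instances back out of the Bonita engine through its REST API and write them to a CSV event log. This is only a sketch: the loginservice call, the bpm/archivedActivity resource, and the caseId / name / archivedDate field names are assumptions that should be checked against the REST API documentation for your Bonita version.

```python
# Sketch: pull archived activity instances from Bonita's REST API and write
# a CSV event log (case id, activity name, timestamp) for process mining.
# Resource name, paging parameters and field names are assumptions; the
# credentials below are placeholders.
import csv
import requests

BONITA_URL = "http://localhost:8080/bonita"   # adjust to your server
session = requests.Session()

# Authenticate; Bonita keeps the session in cookies.
session.post(f"{BONITA_URL}/loginservice",
             data={"username": "walter.bates", "password": "bpm", "redirect": "false"})

rows, page, page_size = [], 0, 100
while True:
    resp = session.get(f"{BONITA_URL}/API/bpm/archivedActivity",
                       params={"p": page, "c": page_size})
    resp.raise_for_status()
    batch = resp.json()
    if not batch:
        break
    for act in batch:
        # caseId = process instance, name = activity, archivedDate = completion time
        rows.append((act.get("caseId"), act.get("name"), act.get("archivedDate")))
    page += 1

rows.sort(key=lambda r: str(r[2]))            # order events by timestamp

with open("event_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["case_id", "activity", "timestamp"])
    writer.writerows(rows)
```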

Related

Is there a good way for surfacing the time data was last updated in a Workshop Module?

Is there a good way to surface when a dataset backing an object was last built in a Workshop module? This would be helpful for giving users of the module a view on data freshness.
The ideal situation is that your data encodes the relevant information about how fresh it is; for instance, if your object type represents "flights", then you can write a Function that sorts the flights, returns the most recent one, and presents its departure timestamp as the "latest" update, since it represents the most recent data available.
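For illustration, the logic amounts to nothing more than taking the maximum of a timestamp your objects already carry (Foundry Functions are typically written in TypeScript; the Python below, with its made-up departure_time field, is only meant to show the idea):

```python
# Sketch: surface "data freshness" as the latest timestamp found in the data
# itself, rather than any pipeline or build metadata. Field name is illustrative.
from datetime import datetime

def latest_departure(flights: list[dict]) -> datetime:
    """Return the departure time of the most recent flight in the dataset."""
    return max(flight["departure_time"] for flight in flights)
```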
The next best approach would be to have a last_updated column or similar that's either coming from the source system or added during the sync step. If the data connection is a JDBC or similar connection, this is straightforward; something like select *, now() as last_updated_timestamp. If you have a file-based connection, you might need to get a bit more creative. This still falls short of accurately conveying the actual "latest data" available in the object type, but it at least lets the user know when the last extract from the source system occurred.
There are API endpoints for various services in Foundry related to executing schedules and builds, but metadata from these can be misleading if presented to users as an indication of data freshness because they don't actually "know" anything about the data itself - for example you might get the timestamp of when the latest pipeline build started, but if the source system has a 4 hour lag before data is synced to the relevant export tables, then you'll still be "off". So again, best to get this from inside your data wherever possible.

Experiencing long wait times for job to be created when using Data Connector API

The company I work for has multiple BIM360 accounts. We use an ETL pipeline to automate the extraction of data from these accounts. The goal is to load this data into our data warehouse.
In order to do this, we loop through all our accounts. For each account, we post a data request.
After this step, we check if the job has been created. If not, we wait 5 minutes and then check again. Once created, we check if the export is ready and then ingest the data.
Lately we have been noticing extremely long wait times for these jobs to be created. In many cases it took over an hour per account, after which our API token was no longer valid, which caused our ETL pipeline to fail.
Is there anything that can be done about this? I would also like to know if there is anything we can do ourselves to improve the performance.
Example job IDs (for Autodesk):
cca436eb-cddc-49db-9efd-3b67c96c514c,
68dd4a6f-290a-4a02-bce8-32a8d8764799,
3f0ec71a-716c-4344-bd5e-b94c82782023
More IDs can be provided if needed.
Many thanks in advance for your help.
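For reference, the per-account loop described above looks roughly like the sketch below, with the access token refreshed on every poll so that a long wait cannot invalidate it mid-run. The data-connector request and jobs paths, the payload keys and the response field names are assumptions to verify against Autodesk's Data Connector API reference, and get_token() is a hypothetical helper for obtaining a fresh 2-legged OAuth token.

```python
# Sketch of the per-account extract: post a data request, then poll for the
# job, re-authenticating on each poll so the token never expires while waiting.
# Endpoint paths, payload keys and response fields are assumptions to check
# against the Data Connector API reference.
import time
import requests

BASE = "https://developer.api.autodesk.com/data-connector/v1"

def get_token():
    # Hypothetical helper: return a fresh 2-legged OAuth access token.
    raise NotImplementedError

def extract_account(account_id):
    headers = {"Authorization": f"Bearer {get_token()}"}

    # 1. Post the data request for this account.
    resp = requests.post(f"{BASE}/accounts/{account_id}/requests",
                         headers=headers,
                         json={"description": "nightly ETL extract",
                               "scheduleInterval": "ONE_TIME",
                               "serviceGroups": ["admin", "issues"]})
    resp.raise_for_status()
    request_id = resp.json()["id"]

    # 2. Poll until a job for that request exists and has completed.
    while True:
        headers = {"Authorization": f"Bearer {get_token()}"}   # fresh token each poll
        jobs = requests.get(f"{BASE}/accounts/{account_id}/jobs",
                            headers=headers).json()
        job = next((j for j in jobs.get("results", [])
                    if j.get("requestId") == request_id), None)
        if job and job.get("status") == "complete":
            return job          # export ready: download and ingest from here
        time.sleep(300)         # wait 5 minutes before checking again
```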

Active Collab 5 Webhooks / Maintaining "metric" data

I have an application I am working on that basically takes the data from Active Collab and creates reports / graphs out of it. The API itself is insufficient for getting the proper data on a per-request basis, so I resorted to pulling the data down into a separate data set that can be queried more efficiently.
So, to avoid needing to query the entire API constantly, I decided to make use of webhooks to apply the transformations to the relevant data and lower the need to resync the data.
However, I notice not all events are sent, notably the following:
TaskListUpdated
MemberUpdated
TimeRecordUpdated
ProjectUpdated
There are probably more, but these are the main ones I have noticed so far.
Time records are probably the most important; the fact that they are missing from webhooks means that almost any application that needs time record data has a good chance of ending up with incorrect data. It's fairly common to make a typo in a time record and then adjust it later.
So am I missing anything here? Is there some way to see these events reliably?
EDIT:
To avoid a long comment reply to Ilija, I am putting the bulk here.
"Webhooks apart, what information do you need to pull? The API that powers time tracking reports can do all sorts of cross-project filtering, so your approach of keeping a separate database may be overkill."
Basically we are doing a multi-variable tiered time report. It can be sorted / grouped by any conceivable method you may want to look at.
http://www.appsmagnet.com/product/time-reports-plus/
This is the closest to what we are trying to do. Back when we used Active Collab 4 it did the job, but even then we had to consolidate the results in our own spreadsheets.
So the idea of this is to better integrate our Active Collab data into our own workflow.
So the main data we are looking for in this case is:
Job Types
Projects
Task Lists
Tasks
Time Records
Categories
Members / Clients
Companies
These items can feed not only our reports, but many other aspects of our company as well. For us Active Collab is the point of truth, so we want the data quickly accessible and fully query-able.
So I have set up a sync system that initially grabs all the data it can from Active Collab and then uses a mix of crons and webhooks to keep it up to date.
Cron jobs work well for all the aspects that do not have "sub items"; for the ones that do (projects / tasks / task lists / time records) I need to rely on the webhook, since syncing them takes too much time to keep up to date in real time.
For the webhook, I noticed the events above do not come through. For time records I figured out a way around it, listed in my answer, and members can be handled through the cron. However, task list and project updates are the two of most concern. Project is fairly important because the budget can change and that is used in reports, and task lists have the start / end dates that could be used as well. Since going through every project / task list constantly to see if there is a change is really not a great idea, I am looking for a way to reliably see updates for them.
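To make that cron / webhook split concrete, here is a rough sketch (not from the original post): a webhook receiver applies incremental updates for whatever events do arrive, a daily cron re-pulls members, and project / task list updates are left as the open gap. The "type" / "payload" body shape is an assumption about the webhook payload, and apply_update() / resync_members() are hypothetical helpers for the local copy of the data.

```python
# Sketch of the sync split: incremental updates from webhooks where possible,
# a cheap daily re-pull for members, and no good answer yet for projects and
# task lists. Payload shape and helper functions are assumptions.
from flask import Flask, request

app = Flask(__name__)

@app.route("/activecollab/webhook", methods=["POST"])
def webhook():
    event = request.get_json(force=True)
    apply_update(event.get("type"), event.get("payload"))
    return "", 200

def daily_cron():
    resync_members()   # MemberUpdated never arrives, but a full pull is cheap
    # ProjectUpdated / TaskListUpdated have no cheap equivalent (see the answer below).

def apply_update(event_type, payload):
    ...  # hypothetical: update only the affected rows in the local database

def resync_members():
    ...  # hypothetical: re-pull all members / clients from the API
```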
I have based this system on https://developers.activecollab.com/api-documentation/ but I know there are at least a few endpoints that are not listed.
Cross-project time-record filtering using Active Collab 5 API
This question is actually from another developer on the same system (and also shows a TrackingFilter report not listed in the docs). Due to issues with maintaining an accurate set of data we had to adapt it. I actually notice that you (Ilija) are the person replying and did recommend we move over to this style of system.
This is not a total answer but a way to solve the issue with TimeRecordUpdated not going through the webhook.
There is another API endpoint, /whats-new. This endpoint describes changes for roughly the last day, and it has a category called TrackingObjectUpdatedActivityLog, which refers to an updated time record.
So I set up a cron job to check this fairly consistently and manually push the TimeRecordUpdated event through my system to keep it consistent.
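That cron amounts to something like the sketch below: poll /whats-new, pick out the TrackingObjectUpdatedActivityLog entries, and feed them into the same code path the TimeRecordUpdated webhook would have triggered. The instance URL, the authentication header and the response field names are assumptions to verify against your Active Collab 5 instance.

```python
# Sketch of the /whats-new polling workaround for the missing TimeRecordUpdated
# webhook. URL, auth header and field names are assumptions.
import requests

AC_URL = "https://activecollab.example.com/api/v1"   # hypothetical instance URL
TOKEN = "..."                                         # API token for the account

def poll_whats_new(handle_time_record_updated):
    resp = requests.get(f"{AC_URL}/whats-new",
                        headers={"X-Angie-AuthApiToken": TOKEN})
    resp.raise_for_status()
    for entry in resp.json():                         # assumed: list of activity log entries
        if entry.get("type") == "TrackingObjectUpdatedActivityLog":
            # Re-emit as if a TimeRecordUpdated webhook had arrived.
            handle_time_record_updated(entry)
```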
For MemberUpdated, since the data for a member being updated is unlikely to affect much, having a daily cron that checks the users seems good enough.
ProjectUpdated could technically be handled the same way, but combined with the absence of TaskListUpdated that leads to far too many API calls to sync the data. I have not found a solution for this yet, unfortunately.

Designing a MySQL workflow database

I am writing a simple issue tracking system. I need to know how I can design a database allowing for dynamic workflows.
Employees will make requests in this system. Such requests move from office to office. For example, the workflow for an issue X could be as follows:
The employee posts an issue for which I have defined a category
The issue is routed to the first department defined in the work flow
After approval, it is routed to the second department defined in the workflow
I already have tables for issues, issue_category, departments
So I want to know how to implement a workflow table related to the departments, and how to forward an issue to the next department in the workflow after approval.
Sorry for the long-winded question. Suggestions, guidelines, and requests for clarification are welcome.
Not writing a system takes less time than writing one.
Have you looked into off-the-shelf workflow systems? There are lots of BPM solutions out there that will do what you're describing very nicely.
Issue tracking? Have you thought about just using JIRA or Bugzilla or something like that?
If your purpose is to learn how to write a workflow system, go for it. But if you're intending to put a solution into production for a wider audience be aware of other possibilities.
I'd forget about tables for a while and just think in a more abstract way about the problem. I see a number of meaningful entities in your statement:
workflow
task
employee
department
I also imagine some other items that might be useful in your solution:
a queue for each employee to accept and prioritize incoming tasks
a mechanism for allowing both computer and human consumers of tasks to plug into the system
an auditing capability to track how tasks flow through for compliance and debugging purposes
an alerting mechanism to notify users that a task has been completed
a scheduler to allow tasks to be added on a regular basis
You have a lot of thinking to do before you even begin thinking about tables. I'd recommend doing that first.
This is a very big problem. If you're doing this for someone else, make sure that all parties understand what they're getting into.
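To make the routing idea from the question concrete, here is one possible minimal shape for the workflow table and the "forward on approval" step. This is only a sketch, not from the original post: it uses sqlite3 so it runs self-contained, the table and column names are illustrative, and the same DDL maps to MySQL with minor type changes.

```python
# Sketch: a per-category sequence of workflow steps, each pointing at a
# department, and an issue that remembers which step it is currently at.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);

-- One row per step of a category's workflow, in order.
CREATE TABLE workflow_step (
    id            INTEGER PRIMARY KEY,
    category_id   INTEGER NOT NULL,          -- references issue_category
    step_order    INTEGER NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(id)
);

-- Each issue remembers which step it is currently sitting at.
CREATE TABLE issue (
    id              INTEGER PRIMARY KEY,
    category_id     INTEGER NOT NULL,
    current_step_id INTEGER REFERENCES workflow_step(id)
);
""")

def approve(issue_id):
    """Forward an issue to the next department in its category's workflow."""
    row = conn.execute("""
        SELECT ws.category_id, ws.step_order
        FROM issue i JOIN workflow_step ws ON ws.id = i.current_step_id
        WHERE i.id = ?""", (issue_id,)).fetchone()
    if row is None:
        return
    category_id, step_order = row
    nxt = conn.execute("""
        SELECT id FROM workflow_step
        WHERE category_id = ? AND step_order = ?""",
        (category_id, step_order + 1)).fetchone()
    # A NULL current step can be read as "workflow finished".
    conn.execute("UPDATE issue SET current_step_id = ? WHERE id = ?",
                 (nxt[0] if nxt else None, issue_id))
    conn.commit()
```

Routing then becomes a matter of listing the issues whose current step points at a given department, and approval simply moves the pointer forward.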

efficient way to store sitewide user activity

I tried searching on Stack Overflow as well as googling around a lot, but I am not able to find an answer to my problem (I guess I'm searching with the wrong keywords / terms).
We are in the process of building a recommendation engine. We initially log all user activity in custom log files (we use Ruby / Rails), and at the end of each day we need to scan that file and arrange the entries by user. We also have other user data coming in from other places (Facebook activity, Twitter timeline, etc.), so by end of day we want all the data for a particular user saved somewhere, and we then run our analyzer code on all of that user's data to generate the recommendations.
The problem is that we are generating a lot of data, and while for the time being we are using a MySQL table to store it all, we are not sure how long we can keep doing this as our user base grows (we are still testing internally with about 10 very active users). Plus, as eager developers, we would like to try out something new that can meet our needs.
Any pointers in this direction will be very helpful.
Check out Amazon Elastic MapReduce. It was built for this very type of thing.
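As an illustration of what that end-of-day step could look like as a MapReduce job, here is a sketch using the mrjob library, which can run the same script locally or on Elastic MapReduce; the tab-separated log format assumed in the mapper is made up for the example.

```python
# Sketch: group the day's activity log by user so the analyzer can consume
# one user's events at a time. Runs locally or on EMR via mrjob.
# Assumed log line format: "timestamp<TAB>user_id<TAB>action".
from mrjob.job import MRJob

class GroupActivityByUser(MRJob):
    def mapper(self, _, line):
        timestamp, user_id, action = line.rstrip("\n").split("\t", 2)
        yield user_id, (timestamp, action)

    def reducer(self, user_id, events):
        # Emit each user's events in time order for the analyzer.
        yield user_id, sorted(events)

if __name__ == "__main__":
    GroupActivityByUser.run()
```

Running python group_activity.py activity.log processes the file locally; adding -r emr (plus AWS credentials in the mrjob config) runs the same job on Elastic MapReduce as the data outgrows a single machine.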