Simulate Multi-Threading in MS Access (VBA) - ms-access

I wrote a VBA script that runs in an Access database. The script looks up values on various tables and assigns an attribute to a main table based on the combination of values.
The script works as intended; however, I am working with millions of records, so it takes an unacceptably long time.
I would like to break the process up into smaller parts and run the script concurrently on separate threads.
Before I start attempting to build a solution, I would like to know:
Based on your experience, would this increase performance? Or would the process take just as long?
I am looking at using PowerShell or VBScript to accomplish this. Are there any obstacles to look out for?
Please note: due to the client this will run on, I have to use Access for the backend, and if I use PowerShell it will have to be version 1.0.
I know these are very vague questions but any feedback based on prior experience is appreciated. Thanks

Just wanted to post back with my final solution on this...
I tried the following ways to assign an attribute to a main table based on a combination of values from other tables for a 60,000 record sample size:
Solution 1: Used a combination of SQL queries and FSO Dictionary objects to assign attribute
Result: 60+ minutes to update 60,000 records
Solution 2: Ran script from Solution 1 concurrently from 3 separate instances of Excel
Result: CPU was maxed out (Instance 1 - 50% of CPU, Instances 2 and 3 - 25% each); stopped the code after an hour since it wasn't a viable solution
Solution 3: Tried using SQL UPDATE queries to update main table with the attribute
Result: This failed because apparently Access does not allow for a join on an UPDATE sub-query (or I just stink at writing SQL)
Solution 4 (Best Result): Selected all records from the main table that matched the criteria for each attribute, wrote those records out to CSV, and assigned the attribute to every record in the CSV file. This created a separate file for each attribute, all in the same format. I then imported and appended all of the records from the CSV files into a new main table.
Result: 2.5 minutes to update 60,000 records
Special thanks to Pynner and Remou who suggested writing the data out to csv.
I never would have thought that this would be the quickest way to update the records with the attribute. I probably would have scrapped the project thinking it was impossible to accomplish with Access and VBA had you not made this suggestion. Thank you so much for sharing your wisdom!
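For readers who want to see the shape of Solution 4, here is a minimal sketch in Python rather than VBA. The rules, field names, and sample records are hypothetical and stand in for the real lookup criteria; the point is only the pattern of one same-format CSV per attribute, followed by a single append into a new main table.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rules: attribute name -> test a record must pass to get it.
RULES = {
    "small": lambda rec: rec["amount"] < 100,
    "large": lambda rec: rec["amount"] >= 100,
}

def export_by_attribute(records, out_dir):
    """Write one CSV per attribute; every file has the same columns."""
    paths = []
    for attribute, matches in RULES.items():
        path = Path(out_dir) / f"{attribute}.csv"
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["id", "amount", "attribute"])
            for rec in records:
                if matches(rec):
                    writer.writerow([rec["id"], rec["amount"], attribute])
        paths.append(path)
    return paths

def append_all(paths):
    """Append the rows of every CSV into one new 'main table' (a list here)."""
    main_table = []
    for path in paths:
        with open(path, newline="") as handle:
            main_table.extend(csv.DictReader(handle))
    return main_table

records = [{"id": 1, "amount": 50}, {"id": 2, "amount": 250}, {"id": 3, "amount": 99}]
with tempfile.TemporaryDirectory() as tmp:
    new_main = append_all(export_by_attribute(records, tmp))
```

Each record is written once with its attribute already attached, so no row-by-row UPDATE against the main table is needed.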

Related

Pentaho Import unique records into database

I am quite new to Pentaho Spoon and I would like to import the records of a CSV file into a database table. However, only unique records should be imported. That is why I need to compare EACH record with all records of the database table in order to determine whether it should be imported.
So far, I tried out the suggested CRUD-pattern which looks like this:
As you can see in the picture, I merge the Excel input and the table input. (Ignore the cast steps; I needed to cast a value because the float formats differed: the database format was #.000000 and the CSV format was #.0.)
After the merge, I check the flag produced by Merge Rows (diff): if the compared records are new, I insert them into the database table; if they are changed, I update the record; and if they are deleted or identical, I do nothing. So far, so good.
But here is the problem: if I shuffle the records of the CSV input file and run the transformation again, all the records are imported again and, consequently, there are duplicates in my database table (which I wanted to avoid). To emphasize again: the right way to solve this is for each row of the CSV input file to be compared with ALL entries in the database table.
How can I realize this? Any suggestions? Thank you so much in advance!!
The Merge Rows (diff) step expects its input to be sorted. Normally, a pop-up warns you about this.
Put a Sort rows step on the output flow of the Excel Input, before it reaches the Merge Rows (diff).
You should do the same between the Table Input and the Merge Rows (diff). Of course, you may think you could do the sorting in the SQL statement of the Table Input instead.
However, there is a beginner trap here. You have three other steps (Output Rows, Update and Delete) that operate on the same table, and these steps may lock the table. Because all steps in Kettle run concurrently, you do not know which step will fire first, so the table may be locked before the Table Input has been able to read even the first record. This is known in jargon as an auto-lock, and the way to solve it is to put a Sort rows step in as a buffer.
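To see why sorting matters, here is a rough Python model of a single-pass diff over two key-sorted streams. It is a simplified illustration, not Kettle's actual implementation; the function name and sample rows are made up, and the flag names are the ones mentioned in the question.

```python
def merge_rows_diff(reference, compare):
    """Single-pass diff of two lists of (key, value) pairs.

    Both lists are assumed to be sorted by key; the function does not check.
    """
    result = []
    i = j = 0
    while i < len(reference) and j < len(compare):
        ref_key, ref_val = reference[i]
        cmp_key, cmp_val = compare[j]
        if ref_key == cmp_key:
            result.append((ref_key, "identical" if ref_val == cmp_val else "changed"))
            i += 1
            j += 1
        elif ref_key < cmp_key:
            # Key exists only in the reference stream.
            result.append((ref_key, "deleted"))
            i += 1
        else:
            # Key exists only in the compare stream.
            result.append((cmp_key, "new"))
            j += 1
    result.extend((key, "deleted") for key, _ in reference[i:])
    result.extend((key, "new") for key, _ in compare[j:])
    return result

table_rows = [(1, "a"), (2, "b"), (3, "c")]   # already in the database
csv_rows = [(3, "c"), (1, "a"), (2, "x")]     # shuffled input file

unsorted_flags = merge_rows_diff(table_rows, csv_rows)
sorted_flags = merge_rows_diff(table_rows, sorted(csv_rows))
```

With the shuffled input, keys 1 and 2 come out flagged as "new" even though they are already in the table, which is exactly how the duplicates appear. After sorting, the same rows are flagged "identical" and "changed".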
You can use the 'Dimension lookup/update' step, which provides the functionality you are trying to achieve.
Thanks,
Nilesh

Creating a global variable in Talend to use as a filter in another component

I have a job in Talend that is designed to bring together some data from different databases: one is a MySQL database and the other an MSSQL database.
What I want to do is match a selection of loan numbers from the MySQL database (about 82,000 loan numbers) to the corresponding information we have housed in the MSSQL database.
However, the tables in MSSQL to which I am joining the data from MySQL are much larger (~ 2 million rows), are quite wide, and thus cost much more time to query. Ideally I could perform an inner join between the two tables based on the loan number, but since they are in different databases this is not possible. The inner join that is performed inside a tMap occurs after the Lookup input has already returned its data set, which is quite large (especially since this particular MSSQL query will execute a user-defined function for each loan number).
Is there any way to create a global variable out of the output from the MySQL query (namely, the loan numbers selected by the MySQL query) and use that global variable as an IN clause in the MSSQL query?
This should be possible. I'm not working in MySQL but I have something roughly equivalent here that I think you should be able to adapt to your needs.
I've never actually answered a Stackoverflow question and while I was typing this the page started telling me I need at least 10 reputation to post more than 2 pictures/links here and I think I need 4 pics, so I'm just going to write it out in words here and post the whole thing complete with illustrations on my blog in case you need more info (quite likely, I should think!)
As you can see, I've got some data coming out of the table and getting filtered by tFilterRow_1 to only show the rows I'm interested in.
The next step is to limit it to just the field I want to use in the variable. I've used tMap_3 rather than a tFilterColumns because the field I'm using is a string and I wanted to be able to concatenate single quotes around it, but if you're using an integer you might not need to do that. And of course, if you have a lot of duplicate values you might also want to put a tUniqRow in there as well.
The next step is the one that does the magic. I've got a list like this:
'A1'
'A2'
'B1'
'B2'
etc, and I want to turn it into 'A1','A2','B1','B2' so I can slot it into my where clause. For this, I've used tAggregateRow_1, selecting "list" as the aggregate function to use.
Next up, we want to take this list and put it into a context variable (I've already created the context variable in the metadata - you know how to do that, right?). Use another tMap component, feeding into a tContextLoad component. tContextLoad always has two columns in its schema, "key" and "value", so map the output of tAggregateRow_1 to the "value" column and enter the name of the variable in the "key" column. In this example, my context variable is called MyList.
Now your list is loaded as a text string and stored in the context variable, ready for retrieval. So open up a new input and embed the variable in the SQL code like this:
"SELECT distinct MY_COLUMN
from MY_SECOND_TABLE where the_selected_row in ("+
context.MyList+")"
It should be as easy as that, and when I whipped it up it worked first time, but let me know if you have any trouble and I'll see what I can do.
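Outside Talend, the same transformation (deduplicate, quote, join with commas, splice into the WHERE clause) can be sketched in a few lines of Python. This is only an illustration of what the tMap, tAggregateRow and context variable steps produce between them; build_in_list is a made-up helper name, and the table and column names are the placeholders from the query above.

```python
def build_in_list(values):
    """Deduplicate values, wrap each in single quotes, and join with commas."""
    unique = list(dict.fromkeys(values))  # drops repeats, keeps first-seen order
    # Double any embedded single quote so the literal stays valid SQL.
    quoted = ["'" + v.replace("'", "''") + "'" for v in unique]
    return ",".join(quoted)

my_list = build_in_list(["A1", "A2", "B1", "A2", "B2"])

query = (
    "SELECT distinct MY_COLUMN from MY_SECOND_TABLE "
    "where the_selected_row in (" + my_list + ")"
)
```

Keep in mind that with many thousands of values the generated statement becomes very large, so it is worth testing against the real data volume.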

MS Access data lineage documentation

I am looking for a scripted/automated way (presumably VBA?) to take an Access query and generate some kind of savable, searchable, publish-able documentation on the data lineage. So if there were a bunch of layered/nested queries, or even passthrough queries, along the way I want a way to trace the final fields in the specified query back until I get back to the original source tables/fields.
Everything I've found seems to do database documentation focused on how the table relationships are configured. I'm looking for a way to get the documentation of the user-created portion, down to the field. I'm very open-minded on what format the output is in. I'm convinced this must be possible, but haven't had any luck yet.
I'm also open to recommendations for a third-party application if it could do this.
Thanks in advance!
Access does have a built in “dependency” feature. The result is a VERY nice tree-view of those dependencies, and you can even launch such objects using that treeview of your application to “navigate” the application so to speak.
The option is found under database tools and is appropriately called Object Dependencies.
The result looks like this:
While you may not want to use Name AutoCorrect, this feature will force the "Track name AutoCorrect info" option on. If this is a large application, a significant delay will occur on the first run. After that, the results can be viewed instantly. As noted, not only do you have a hierarchical tree view, but objects in the tree view can be clicked on to launch the object in question.
The above will also work for a query that is based on another query, and so on, all the way down to the base table.
https://www.dropbox.com/sh/f73rs3h9u9q2xk5/AAArloN_Cmf_WbPZ4W75I6KVa?dl=0
This is a set of queries I wrote to provide the kind of documentation you're looking for. It seems a bit kludgy, but it works for me. It's not as simple as the other response, but it provides output that can be incorporated into other documentation.
Note - the documentation is out of date with respect to Union queries. The query I have to analyze Union queries seems to only pick up the 1st two things that go into the Union, so I changed this to a Make Table query, and manually edit the resulting table to add the missing relationships.
To use the queries:
Copy the table and all the queries into your database
Run the "Mapping Unions Make Table" query
Manually edit the Unions table if necessary
When you run any of the 3 main output queries, you are prompted for the Top object you want to analyze. Enter the name of a query or table to find all the dependencies for that object. The three main outputs are:
Mapping Summary - lists all of the objects that go into the top object and all of the objects that go into them, to a depth of about 10 (depth is controlled in the "Mapping all parents" query)
Mapping summary without duplicates
Mapping summary duplicates
I especially like the 2nd output - this is in a format that can be saved in Excel and input to Visio's Org Chart Wizard to get a simple graphical representation of the relationships. Then the 3rd output query can be used to manually add in the queries that go into more than one other query, which Visio's wizard cannot handle.
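The idea behind the mapping queries (start at a top object and keep listing the objects that feed into it, up to a fixed depth) can be sketched in Python as follows. This is not the code from the linked queries; the dependency map and object names are hypothetical, and in Access the map would have to be built from the saved query definitions.

```python
def trace_sources(top, parents, max_depth=10):
    """Return (child, parent, depth) edges reachable from `top`, level by level."""
    edges = []
    frontier = [top]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for child in frontier:
            for parent in parents.get(child, []):
                edges.append((child, parent, depth))
                next_frontier.append(parent)
        if not next_frontier:
            break  # reached the base tables
        frontier = next_frontier
    return edges

# Hypothetical map: object name -> objects it reads from.
parents = {
    "qryFinal": ["qryMiddle", "tblCustomers"],
    "qryMiddle": ["tblOrders", "tblCustomers"],
}

edges = trace_sources("qryFinal", parents)
# Objects that read from nothing else are treated as source tables.
base_tables = sorted({p for _, p, _ in edges if p not in parents})
```

The depth cap plays the same role as the limit of about 10 mentioned above: it keeps the walk finite even if the map accidentally contains a cycle.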

Access Web Application Data Macro to Sum values in Query and return as variable

I am working on a small application in Access Services on SharePoint to log colleagues leave requests, and I need to work out a data macro to calculate how many days of leave they have remaining from their allowance.
I have a table [Colleagues] with all of the user data, for simplicity I'll reduce it to [Email] and [Allowance] in days. I have another table which stores the requests [Requests] including the number of days to deduct in each approved leave request [Days Requested].
I have set up a query that returns all approved requests for the colleague, and I would like to use a data macro that is triggered when the colleague logs in. As you cannot use aggregate functions in Web Applications, I am currently using ForEachRecord on the query to total the number of deductible days; however, I cannot work out how to return that total to a field in the [Colleagues] record.
According to the Access help, I should be able to set the value to a LocalVar and use it in expressions as simply as referencing [Deductible Days]; however, this is not working.
Any help?
I finally worked this out after much tinkering.
In my query I included the [Colleague Email] field as well as the [Days Requested] field, and then when my Application loads it navigates to a form created from the [Colleagues] table. I have modified the Data Source of the form to link the [Email] field in the query results to the [Email] field in the [Colleagues] form.
Following this I was able to create an unbound textbox with the data source =Sum([Days Requested]) referring to the relevant field in the query. Voila! I now have the value to play around with in my application.
Hope that helps; it took a lot of fiddling around. No data macros were needed after all, but it's a method I shall remember in future, as it opens up a lot of possibilities.
If I understand your situation correctly, I was faced with a very similar problem.
I believe the solution used here will work for you. It involves using a query to Sum up the values (we would use Sum where he used Count), using a Data Macro to run the query, and then having an On Insert/On Update event trigger the Data Macro:
http://devspoint.wordpress.com/2014/03/26/validating-data-with-data-macros-in-access-services-2013/
Let me know if this works for you. It worked for me!

Similar rows in MySQL

I'm trying to select the top ten most similar properties for a given property on a realty site and I am wondering if you guys could help me out. The variables I'm working with would be price(int), area(int), bathrooms(int), bedrooms(int), suites(int), parking(int). At the moment I'm thinking of ordering by ABS(a-b), but wouldn't that be slow if I had to calculate it every time a property is viewed? (I'm not sure I could cache this, since the database is constantly being updated.) Is there another option?
Thanks for your help!
One solution could be to create a new table that holds the precomputed results, like this:
property_id similar_properties_ids
--------------------------------------
1 2,5,8
2 3,10
...
...
A cron job running at regular intervals would then do the calculation for all the properties and fill in similar_properties_ids.
So, at runtime, you don't have the calculation overhead but the downside is that you get results which are a little old (updated during the last cron run).
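As a rough sketch of what such a scheduled job could compute, here is the calculation in Python. It assumes similarity is the plain sum of absolute differences over the six fields from the question; the function names and sample data are made up, and in practice price would probably need scaling so it does not dominate the other fields.

```python
FIELDS = ("price", "area", "bathrooms", "bedrooms", "suites", "parking")

def distance(a, b):
    """Sum of absolute differences over all compared fields (smaller = more similar)."""
    return sum(abs(a[f] - b[f]) for f in FIELDS)

def build_similar_table(properties, top_n=10):
    """Map each property id to the ids of its top_n most similar properties."""
    table = {}
    for pid, prop in properties.items():
        others = [
            (distance(prop, other), oid)
            for oid, other in properties.items()
            if oid != pid
        ]
        others.sort()  # closest first; ties broken by id
        table[pid] = [oid for _, oid in others[:top_n]]
    return table

# Hypothetical sample data.
properties = {
    1: dict(price=100, area=50, bathrooms=1, bedrooms=2, suites=0, parking=1),
    2: dict(price=105, area=52, bathrooms=1, bedrooms=2, suites=0, parking=1),
    3: dict(price=300, area=120, bathrooms=3, bedrooms=4, suites=2, parking=2),
}

similar = build_similar_table(properties, top_n=2)
```

This compares every property with every other one, so the cost grows with the square of the number of properties. That is acceptable for a periodic batch job but is the reason not to run it on every page view.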