Access Update Query using result of a Group By Query

I have a table tblResponses which records responses received for each project in my database. Multiple responses per project, each with a date.
Another table tblActivity stores each activity on a project. Multiple activities per project.
I want to update each record in the Activity table with the date of the MOST RECENT response received for that project. If I use a GROUP BY query on tblResponses to get the Max(ResponseDate) grouped by projectID, I cannot then use this in an update query on tblActivity, as it makes the query not updateable.
At the moment I am having to populate a temporary table from the output of the GROUP BY query, and then use this in the Update query to update tblActivity. Not ideal, as it leads to database bloat, poor performance, etc.
Is there any way to do this WITHOUT populating a temporary table? I understand why a Group By query cannot be updateable itself, but don't see why it cannot be used to provide the Update To values for updating another table.
(And yes, I know it shouldn't be necessary to store the result physically in a separate table when it could be calculated, but for various lengthy reasons, that isn't an option here.)
Many thanks for any help!
Jim
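For reference, one commonly suggested way to avoid the temporary table entirely in Access is the DMax() domain aggregate: because the update query then contains no join to a GROUP BY query, Access still treats it as updateable. A minimal sketch, assuming tblActivity has a LastResponseDate column to receive the value and that ProjectID is numeric (a text ID would need quote delimiters in the criteria string):
UPDATE tblActivity
SET tblActivity.LastResponseDate = DMax("ResponseDate", "tblResponses", "ProjectID = " & [tblActivity].[ProjectID]);
DMax() can be slow on large tables, since it runs once per updated row, but it sidesteps the non-updateable GROUP BY problem.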

Related

Filtering a query by another query while allowing record input

I have 2 queries, A and B.
Query A has several columns of data and B has only 1 column. When I link A & B I get exactly what I want (filtered records of A).
However, I still want to be able to input new data into the query. How do I do this?
Ok then :)
Question was how to make a query with JOINs updateable.
See: Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables
Reasons why a Query or Recordset is not Updateable
There are many reasons why your data may not be updateable. Some are pretty obvious:
The query is a Totals query (uses GROUP BY) or Crosstab query (uses TRANSFORM), so the records aren't individual records
The field is a calculated field, so it can't be edited
You don't have permissions/rights to edit the table or database
The query uses VBA functions or user defined functions and the database isn't enabled (trusted) to allow code to run
Some reasons are less obvious but can't be avoided:
Linked tables without a primary key for certain backend databases (e.g. SQL Server). Access/Jet requires the table to be keyed to make any changes. This makes sense, since Access wants to issue a SQL query for modifications but can't uniquely identify the record.
Less obvious are these situations:
Queries where some fields are summaries linked to individual records; the individual records still can't be edited
Queries with multi-table joins that aren't on key fields
Union queries
Another resource: http://allenbrowne.com/ser-61.html
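For the asker's original scenario (filter query A by the single column of query B while keeping A editable), one commonly suggested workaround is to filter with a WHERE-clause subquery instead of a join, since the result then still consists of individual records of A and typically remains updateable. A sketch, assuming the shared column is named ID:
SELECT A.*
FROM A
WHERE A.ID IN (SELECT B.ID FROM B);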

MySQL count selected rows in one table to update value in another table

I have created a table ("texts" table) for storing OCR text from scanned documents. The table now has 100,000+ records; it stores a separate record for each page in a document. I set up the table originally so it stored the document's title and location against each record, which was obviously bad design, as the info was duplicated across many records. I have subsequently created a separate table which stores only one record for each document ("documents" table). The original table still contains a record for each page in the document, but the only columns now are the OCR text and the id of the document's record in the documents table.
The documents table has a column "total_pages". I am trying to update this value using the following query:
UPDATE documents SET total_pages=(SELECT Count(*) from texts where texts.docs_id=documents.id)
This just seems to take forever to execute and I have had to crash out of it on a couple of occasions. There are over 8000 records in the documents table.
I have tested the query by limiting it to just one document
UPDATE documents SET total_pages=(SELECT Count(*) from texts where texts.docs_id=documents.id) where documents.id=1
This works eventually with just one record, but it takes a very long time to execute. I am guessing that my full query needs a bit of optimization! Any help greatly appreciated.
This is your query:
UPDATE documents
SET total_pages = (SELECT COUNT(*)
                   FROM texts
                   WHERE texts.docs_id = documents.id);
For performance, you want an index on texts(docs_id). That will probably fix your performance problem. In fact, it might make it unnecessary to store this value in the master table.
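The index can be created with a one-line DDL statement (the index name here is illustrative):
CREATE INDEX idx_texts_docs_id ON texts (docs_id);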
If you do decide to store the count, be sure that you keep the value up-to-date. That would typically require a trigger to handle inserts and deletes (and perhaps updates, if docs_id changes).
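A sketch of what the insert side of such a trigger could look like in MySQL (the delete side is analogous, decrementing instead; the trigger name is illustrative):
CREATE TRIGGER texts_after_insert
AFTER INSERT ON texts
FOR EACH ROW
UPDATE documents
SET total_pages = total_pages + 1
WHERE documents.id = NEW.docs_id;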

MS SQL Server: using CDC to populate single destination table from several source tables

Can I use Change Data Capture in MS SQL Server (2008 or 2012) with the SSIS Package which joins several source tables into one destination table?
Technet articles describe CDC + SSIS use cases where the source table and the destination table have the same structure. The only hint at the possibility of change tracking for custom data transformations is that it is possible to specify the columns for which CDC will track changes.
The problem is, I need to combine data from a number of source tables to get the destination table and then keep it in sync with those source tables.
This is because the data in the destination data warehouse is normalized to a lesser extent than in the source database. For example, I have an Events table (containing Computer ID, Date/Time, and Event Description) and a Computers table (containing Computer ID and Computer Name). I don't need those normalized tables and computer ids in the destination table, so the select to fill the destination table should be:
INSERT INTO DestDB..ComputerEvents (ComputerName, DateTime, Event)
SELECT s.ComputerName, e.DateTime, e.Event
FROM SourceDB..EventLog e
JOIN SourceDB..ComputerNames s
ON e.CompID = s.CompID
I just cannot figure out how to make CDC work with an SSIS package containing such a transformation. Is it even possible?
To answer the question: No, you can't.
As one other responder has pointed out, CDC can only tell you what changed in EACH source table since the last time you extracted changes.
Using CDC to extract changes from multiple source tables to load a single destination table is anything but simple.
Let's show why by means of an example. For this example I assume that a staging table is a table that is truncated routinely before being populated.
Suppose we have two source tables, Order and OrderDetail, and one destination fact table, FactOrder. FactOrder contains the OrderKey (from Order) and the sum of the order amount from OrderDetail.
A customer orders 3 products: one Order record and 3 OrderDetail records are inserted into the source database tables. Our DW ETL extracts the 1 Order record (insert) and 3 OrderDetail records (insert). If we chose to load changed records into staging tables, as a previous responder said, we could simply join our staging tables to create our FactOrder record.
But what happens if we no longer carry one of the products and someone deletes a record from the OrderDetail table? The next DW ETL extracts 1 OrderDetail record (delete). How do we use this information to update the target table? Clearly we can't join from Order to OrderDetail, because Order has no record for this particular OrderKey: it is a staging table that we just truncated. I chose a delete example, but consider the same problem if dependent tables are updated.
What I propose instead is to extract the distinct set of primary key values (OrderKey in our example) for which there are changes in any of the source tables required to build the FactOrder record, and then extract the full FactOrder records in a subsequent request.
For example, if 5 Order records are changed, we know the 5 OrderKey values. If 30 OrderDetail records are changed, we need to determine the distinct set of OrderKey values; let's say that is 10. We then union the two sets; there is some overlap, so let's say that yields 12 OrderKey values. Now we seed our FactOrder extract query with those 12 OrderKey values and get back 12 complete FactOrder records. We then compare a binary checksum of each new record to the stored checksum to determine how to action the 12 records (insert or update).
The above approach does not cover deletes from the Order table. Those would result in trivial deletes from FactOrder.
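A rough T-SQL sketch of those two steps, assuming CDC is enabled on both tables with the default capture instance names (dbo_Order, dbo_OrderDetail) and that OrderDetail carries OrderKey and Amount columns (both assumptions, since the question doesn't show the schema):
DECLARE @to_lsn binary(10) = sys.fn_cdc_get_max_lsn();
DECLARE @from_order_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_Order');
DECLARE @from_detail_lsn binary(10) = sys.fn_cdc_get_min_lsn('dbo_OrderDetail');
-- Step 1: distinct set of OrderKey values touched in either source table
SELECT DISTINCT OrderKey
INTO #ChangedKeys
FROM (
    SELECT OrderKey
    FROM cdc.fn_cdc_get_all_changes_dbo_Order(@from_order_lsn, @to_lsn, 'all')
    UNION ALL
    SELECT OrderKey
    FROM cdc.fn_cdc_get_all_changes_dbo_OrderDetail(@from_detail_lsn, @to_lsn, 'all')
) AS changed;
-- Step 2: re-extract the complete FactOrder rows for just those keys
SELECT o.OrderKey, SUM(d.Amount) AS OrderAmount
FROM dbo.[Order] AS o
JOIN dbo.OrderDetail AS d ON d.OrderKey = o.OrderKey
WHERE o.OrderKey IN (SELECT OrderKey FROM #ChangedKeys)
GROUP BY o.OrderKey;
In a real ETL you would persist the upper LSN from each run and use it as the lower bound of the next one, rather than re-reading from the minimum LSN as this sketch does.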
As you noted, the many examples out there show how to use CDC to replicate/synchronize data from 1 source to 1 destination, which isn't a typical data warehouse load use case, since the tables in the data warehouse are typically denormalized (thus requiring joins among multiple source tables to build the destination row).
OK, first thing: CDC captures changes in a table. If there is an insert, update, or delete in a table, a CDC record gets created with an indicator column saying insert, update, or delete, and all the CDC task does is output records to one of three outputs based on that indicator column. So, coming back to your question: you might have to have multiple OLE DB Sources and a CDC task for each source, UNION ALL the similar operations (insert, update, delete) together, and then feed the Destination component or OLE DB Command component. Hope this helps :)
Consider CDC as if it were your automated mechanism for filling staging tables (instead of a SQL query, or replication), using one CDC source table pointed at one regular staging table. From there, simply build your joined queries against the multiple staging tables as needed.
My assumption is that you are pulling data from non-identical tables, like an Order table, an OrderDetail table, etc.
If you are pulling from several identical tables in the same or different dbs, then you can push the output of the CDC directly into the staging table and you're done.

MySQL Query: Return all rows with a certain value in one column when value in another column matches specific criteria

This may be a little difficult to answer given that I'm still learning to write queries and I'm not able to view the database at the moment, but I'll give it a shot.
The database I'm trying to acquire information from contains a large table (TransactionLineItems) that essentially functions as a store transaction log. This table currently contains about 5 million rows and several columns describing products which are included in each transaction (TLI_ReceiptAlias, TLI_ScanCode, TLI_Quantity and TLI_UnitPrice). This table has a foreign key which is paired with a primary key in another table (Transactions), and this table contains transaction numbers (TRN_ReceiptNumber). When I join these two tables, the query returns one row for every item we've ever sold, and each row has a receipt number. 16 rows might have the same receipt number, meaning that all of these items were sold in a single transaction. Below that might be 12 more rows, each sharing another receipt number. All transactions are broken down into multiple rows like this.
I'm attempting to build a query which returns all rows sharing a single receipt number where at least one row with that receipt number meets certain criteria in another column. For example, three separate types of gift cards all have values in the TLI_ScanCode column that begin with "740000." I want the query to return rows with values beginning with these six digits in the TLI_ScanCode column, but I would also like to return all rows which share a receipt number with any of the rows which meet the given scan code criteria. Essentially, I need the query to return all rows for every receipt number which is also paired in at least one row with a gift card-related scan code.
I attempted to use a subquery to return a column of all receipt numbers paired with gift card scan codes, using "WHERE A.TRN_ReceiptAlias IN (subquery..." to return only those rows with a receipt number which matched one of the receipt numbers returned by the subquery. This appeared to run without issue for five minutes before the server ground to a halt for another twenty while it processed the query. The query appeared to complete successfully, but given that I was working with IT to restore normal store operations during this time I failed to obtain the results of the query (apart from the associated shame and embarrassment).
I'd like to know if there is a way to write a query to obtain this information without causing the server to hang. I'm assuming that either: a) it wasn't very smart to use a subquery in this manner on such a large table, or b) I don't know enough about SQL to obtain the information I need. I'm assuming the answer is both A and B, but I'd very much like to learn how to do this the right way. Any help would be greatly appreciated. Thanks!
SELECT *
FROM a AS a1
JOIN b
  ON b.id = a1.id
JOIN a AS a2
  ON a2.id = b.id
WHERE b.some_criteria = 'something';
Include an index on (b.id, b.some_criteria).
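In MySQL, that composite index could be created like so (the index name is illustrative):
CREATE INDEX idx_b_id_criteria ON b (id, some_criteria);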
You aren't the first person, nor will you be the last to bring down your system with an inefficient query.
The most important lesson is that "Decision Support" and "Analytics" really don't co-exist with a transaction system. You really want to pull the data into a datamart or datawarehouse or some other database that isn't your transaction database, so that you don't take the business offline.
In terms of understanding why your initial query was so inefficient, you want to familiarize yourself with the EXPLAIN EXTENDED syntax, which returns plan information that should help you debug your query and work on making it perform acceptably. If you update your question with the actual explain plan output for it, that would be helpful in determining what the issue is.
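For instance, prefixing the query runs the plan analysis instead of the query itself, and SHOW WARNINGS afterwards displays the optimizer's rewritten form (EXPLAIN EXTENDED applies to older MySQL versions; newer ones fold the extra detail into plain EXPLAIN). Using the generic names from the query in the other answer:
EXPLAIN EXTENDED
SELECT *
FROM a AS a1
JOIN b ON b.id = a1.id
JOIN a AS a2 ON a2.id = b.id
WHERE b.some_criteria = 'something';
SHOW WARNINGS;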
Just from the outline you provided, it does sound like a self join would make sense rather than the subquery.
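Mapped onto the schema in the question, that self join might look roughly like the following (the question doesn't name the foreign-key column on TransactionLineItems, so TRN_ReceiptNumber is assumed on both sides; DISTINCT collapses the duplicates that appear when a receipt contains more than one gift-card line):
SELECT DISTINCT tli2.*
FROM TransactionLineItems AS tli1
JOIN Transactions AS t
  ON t.TRN_ReceiptNumber = tli1.TRN_ReceiptNumber
JOIN TransactionLineItems AS tli2
  ON tli2.TRN_ReceiptNumber = t.TRN_ReceiptNumber
WHERE tli1.TLI_ScanCode LIKE '740000%';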

SQL to update column in modified table

I am a reasonably competent SQL programmer but my skills are still pretty much in the domain of simple INSERT, SELECT, UPDATE statements with an occasional LIKE etc thrown in. What I am currently trying to do is rather more complex. Here is the scenario.
I have three tables.
Table 1, *users*, identifies users via a User ID, uid. Users can have one or more subaccounts.
Table 2, *accounts*, keeps a record of subaccounts for each user with, amongst other things, the columns uid and sid, where uid is the one defined in the *users* table.
Table 3, *data*, currently stores some data in a data column that is associated with a particular subaccount, sid.
The thing I have just realized is that there is no particular reason to block users from using those data across subaccounts. No problem - I can change my data subset search SQL to work with the uid instead. However, given the frequency of such searches, it seems well worth while simply sticking in a uid column in *data*.
To do that I would need to write some smart SQL that would get uid,sid pairs from the *accounts* table and use that information to update the newly created uid column in the data table. This I have to admit is beyond my knowledge of SQL.
I should mention that the system using these data is now in production and has several hundred users, so the option of just acting like they are not there is not available. Not terribly relevant, I think, but I should also mention that uid and sid are alphanumeric strings, with both columns being indexed.
I would be most grateful to anyone here who might be able to help out with it.
MySQL can do updates based on joins, and based on my reading of your schema, here's what I'd do...
UPDATE accounts a, data d
SET d.uid = a.uid
WHERE a.sid = d.sid
  AND d.uid IS NULL;
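Before running that update, the new uid column from the question would need to exist in *data*; a hypothetical sketch of the preparatory DDL (the VARCHAR length and index name are assumptions, since the question only says uid is an indexed alphanumeric string):
ALTER TABLE data ADD COLUMN uid VARCHAR(32) NULL;
CREATE INDEX idx_data_uid ON data (uid);
With the column indexed, the join update above should scale reasonably even across several hundred users' worth of rows.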