SQL Server sometimes temporarily partitions indexes - sql-server-2008

I have implemented a system on one of our SQL Servers (all currently 2008) that reads out the size and usage of indexes (not PKs) on our tables and stores the information historically in a dedicated database.
Each index gets assigned its own SID in table a, and each time the index size or usage changes by a specified amount, a new entry is created in table b and the old one is set to inactive (SCD2).
The job runs once a day.
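For context, the daily job follows the usual SCD2 close-and-insert pattern; a minimal sketch, assuming a staging snapshot and hypothetical column names (SizePages, UsagePct, IsActive, ValidFrom/ValidTo and the thresholds are illustrative, not the real schema):

-- #Snapshot is assumed to hold today's values per IndexOverviewSid
UPDATE b
SET b.IsActive = 0, b.ValidTo = GETDATE()
FROM IndexInfo.IndexSizeUsage b
JOIN #Snapshot s ON s.IndexOverviewSid = b.IndexOverviewSid
WHERE b.IsActive = 1
  AND (ABS(s.SizePages - b.SizePages) >= 100      -- size changed by the threshold
       OR ABS(s.UsagePct - b.UsagePct) >= 5.0);   -- usage changed by the threshold

INSERT INTO IndexInfo.IndexSizeUsage (IndexOverviewSid, SizePages, UsagePct, IsActive, ValidFrom)
SELECT s.IndexOverviewSid, s.SizePages, s.UsagePct, 1, GETDATE()
FROM #Snapshot s
LEFT JOIN IndexInfo.IndexSizeUsage b
       ON b.IndexOverviewSid = s.IndexOverviewSid AND b.IsActive = 1
WHERE b.IndexOverviewSid IS NULL;   -- only where no active row remains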
Problem: On very rare occasions I get two size rows for some indexes; so far it has only happened on 3 tables out of more than 1000 that are watched.
Where I get the data from:
FROM IndexInfo.IndexOverview io
JOIN IndexInfo.IndexSizeUsage isu
    ON io.IndexOverviewSid = isu.IndexOverviewSid
JOIN XYZ.sys.partitions p
    ON io.object_id = p.object_id AND io.index_id = p.index_id
JOIN (SELECT container_id, SUM(total_pages) total_pages, SUM(used_pages) used_pages
      FROM XYZ.sys.allocation_units
      GROUP BY container_id) a
    ON p.partition_id = a.container_id
LEFT JOIN (SELECT object_id, index_id,
                  ISNULL(100.0 * (user_lookups + user_scans + user_seeks)
                         / NULLIF((SELECT SUM(user_lookups + user_scans + user_seeks)
                                   FROM XYZ.sys.dm_db_index_usage_stats indusinner
                                   WHERE indusinner.database_id = indusout.database_id
                                     AND indusinner.object_id = indusout.object_id
                                   GROUP BY object_id), 0), 0) usage
           FROM XYZ.sys.dm_db_index_usage_stats indusout
           WHERE database_id = DB_ID('XYZ')) usageselect
    ON io.object_id = usageselect.object_id AND io.index_id = usageselect.index_id
This is the important part.
IndexOverview (table a): one entry per created user index
IndexSizeUsage (table b): SCD2 data for usage and index size (at the time the error occurs there is only 1 active entry for each index)
The result:
Two rows in table b for one entry in table a, where the only difference between the rows is the information coming from the derived table a (the grouped allocation_units subquery). As that subquery is grouped by the only value I use for its join, SQL Server must somehow be creating a second partition by the time the data is read.
I tried to replicate the problem by manually executing this query while reorganizing / rebuilding the index (which should never happen by the time the job is scheduled) and by doing large inserts, forcing the index to grow.
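For what it's worth, a quick check of whether any index currently has more than one row in XYZ.sys.partitions (which is what would let the grouped allocation_units join return two rows per index) would be something like:

SELECT p.object_id, p.index_id, COUNT(*) AS partition_rows
FROM XYZ.sys.partitions p
GROUP BY p.object_id, p.index_id
HAVING COUNT(*) > 1;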
When and why does SQL Server temporarily create a second partition?

Related

Data design best practices for customer data

I am trying to store customer attributes in a MySQL database although it could be any type of database. I have a customer table and then I have a number of attribute tables (status, product, address, etc.)
The business requirements are to be able to A) look back at a point in time to see if a customer was active or what address they had on any given date, and B) have a customer service rep be able to enter things like future vacation holds. A customer might call today and tell the rep they will be on vacation next week.
I currently have different tables for each customer attribute. For instance, the customer status table has records like this:
CustomerID | Status   | dEffectiveStart | dEffectiveEnd
-----------|----------|-----------------|--------------
1          | Active   | 2022-01-01      | 2022-05-01
1          | Vacation | 2022-05-02      | 2022-05-04
1          | Active   | 2022-05-05      | 2099-01-01
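For reference, a minimal sketch of what such a status table might look like in MySQL (the column types and key choice are assumptions, not taken from the question):

CREATE TABLE customerStatus (
    CustomerID      INT NOT NULL,
    Status          VARCHAR(20) NOT NULL,
    dEffectiveStart DATE NOT NULL,
    dEffectiveEnd   DATE NOT NULL,
    PRIMARY KEY (CustomerID, dEffectiveStart)
);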
When I join these tables the sql typically looks like this:
SELECT *
FROM customers c
JOIN customerStatus cs
on cs.CustomerID = c.CustomerID
and curdate() between cs.dEffectiveStart and cs.dEffectiveEnd
While this setup does work as designed, it is slow. The query joins themselves aren't too bad, but when I try to throw an ORDER BY on, it's done. The typical client query would pull 5-20k records. There are 5-6 other tables similar to the one above that I join to a customer.
Do you have any suggestions for a better approach?
That ON clause is very hard to optimize. So, let me try to 'avoid' it.
If you are always (or usually) testing CURDATE(), then I recommend this schema design pattern. I call it History + Current.
The History table contains many rows per customer.
The Current table contains only "current" info about each customer -- one row per customer. Your SELECT would need only this table.
Your design is "proper" because the current status is not redundantly stored in two places. My design requires changing the status in both tables when it changes. This is a small extra cost when changing the "status", for a big gain in SELECT.
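A minimal sketch of the Current table and the dual write, using the status example above (the column types and the closing-date convention are assumptions):

CREATE TABLE customerCurrent (
    CustomerID INT NOT NULL PRIMARY KEY,
    Status     VARCHAR(20) NOT NULL
);

-- On a status change: close the open history row, add the new one, and update Current.
UPDATE customerStatus
   SET dEffectiveEnd = DATE_SUB(CURDATE(), INTERVAL 1 DAY)
 WHERE CustomerID = 1 AND dEffectiveEnd = '2099-01-01';

INSERT INTO customerStatus (CustomerID, Status, dEffectiveStart, dEffectiveEnd)
VALUES (1, 'Vacation', CURDATE(), '2099-01-01');

UPDATE customerCurrent
   SET Status = 'Vacation'
 WHERE CustomerID = 1;

With that in place, the typical client query reads only customerCurrent and never needs the date-range ON clause.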
More
The Optimizer will probably transform that query into
SELECT *
FROM customerStatus cs
JOIN customers c
ON cs.CustomerID = c.CustomerID
WHERE curdate() >= cs.dEffectiveStart
AND curdate() <= cs.dEffectiveEnd
(Use EXPLAIN SELECT ...; SHOW WARNINGS; to find out exactly.)
In a plain JOIN, the Optimizer likes to start with the table that is most filtered. I moved the "filtering" to the WHERE clause so we could see it; I left the "relation" in the ON.
curdate() >= cs.dEffectiveStart might use an index on dEffectiveStart. Or it might use an index to help the other part.
The Optimizer would probably notice that "too much" of the table would need to be scanned with either index, and eschew both indexes and simply do a table scan.
Then it will quickly and efficiently JOIN to the other table.

Access Database Slow Finding Any Records Not Matching

My Access Database is slow when finding non-matching records
SELECT
RT3_Data_Query.Identifier, RT3_Data_Query.store, RT3_Data_Query.SOURCE,
RT3_Data_Query.TRAN_CODE, RT3_Data_Query.AMOUNT,
RT3_Data_Query.DB_CR_TYPE, RT3_Data_Query.status,
RT3_Data_Query.TRAN_DATE, RT3_Data_Query.ACCEPTED_DATE,
RT3_Data_Query.RECONCILED_DATE
FROM
RT3_Data_Query
LEFT JOIN Debit_AO_Query ON RT3_Data_Query.[Identifier] = Debit_AO_Query.[Identifier]
WHERE
(((Debit_AO_Query.Identifier) Is Null));
I'm doing a query of two queries I created. The last query just compares those two queries and shows what is missing between them, which is what I posted above. I'm matching an identifier between the two queries that looks like 583005-01-20185804.33, which is a combination of store, date and amount.
Here is a link to the database:
https://wetransfer.com/downloads/15f912909fbe2ea0a5111e44b953d11a20190808195913/db9912
The query is slow because you don't use indexes on the tables and you join on concatenated fields (Identifier is Location & Date & Total)!
Each table needs a primary key or it is not a table! That should be an AutoNumber to begin with!
Indexing:
Add a field called id to each table, with datatype AutoNumber, and make it the PK.
Add an index for each field compared in the join and the WHERE clause (set all index properties (primary, unique, ignore nulls) to No); a DDL sketch follows after the field lists below.
For table RT3_Data (because it is huge, create a copy first and then delete the data, or creating the index will fail on MaxLocksPerFile):
store
AMOUNT
TRAN_DATE
After that, reimport the data from the copy with this query:
INSERT INTO RT3_DATA
SELECT [Copy Of RT3_DATA].*
FROM [Copy Of RT3_DATA];
for table Debit_AO:
Location
Total
Date (should be renamed, since Date() is a VBA function)
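A rough DDL sketch of the indexing steps above (index and constraint names are made up; you can equally create all of this through the Access table designer):

ALTER TABLE RT3_Data ADD COLUMN id COUNTER CONSTRAINT pk_RT3_Data PRIMARY KEY;
CREATE INDEX idxStore    ON RT3_Data (store);
CREATE INDEX idxAmount   ON RT3_Data (AMOUNT);
CREATE INDEX idxTranDate ON RT3_Data (TRAN_DATE);

ALTER TABLE Debit_AO ADD COLUMN id COUNTER CONSTRAINT pk_Debit_AO PRIMARY KEY;
CREATE INDEX idxLocation ON Debit_AO (Location);
CREATE INDEX idxTotal    ON Debit_AO (Total);
CREATE INDEX idxDate     ON Debit_AO ([Date]);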
Now change the query RT3_Data_Query Without Matching Debit_AO_Query to:
SELECT RT3_Data.store
,RT3_Data.SOURCE
,RT3_Data.TRAN_CODE
,RT3_Data.AMOUNT
,RT3_Data.DB_CR_TYPE
,RT3_Data.STATUS
,RT3_Data.TRAN_DATE
,RT3_Data.ACCEPTED_DATE
,RT3_Data.RECONCILED_DATE
FROM RT3_Data
LEFT JOIN Debit_AO
ON RT3_Data.[store] = Debit_AO.[Location]
AND RT3_Data.[AMOUNT] = Debit_AO.[Total]
AND RT3_Data.[TRAN_DATE] = Debit_AO.[DATE]
WHERE (
(
Debit_AO.Location IS NULL
AND Debit_AO.Total IS NULL
AND Debit_AO.[Date] IS NULL
)
);
Now the query executes in less than 10 seconds, and there are certainly further optimizations (e.g. a composite index).
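For example, a single composite index per table covering the three join fields could be tried instead of (or in addition to) the separate single-column indexes; a hedged sketch:

CREATE INDEX idxStoreAmountDate   ON RT3_Data (store, AMOUNT, TRAN_DATE);
CREATE INDEX idxLocationTotalDate ON Debit_AO (Location, Total, [Date]);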

MySQL - merge column values for rows with the same key

Here's a screenshot describing the keys in the table that I use:
Table keys
Each row in the table is a representation of purchases by a specific client in a specific hour.
So a typical row would be like:
typical row (screenshot)
I need to merge two clients data, so one client will have all the purchases values summed up in his rows, for each hour.
In pseudocode, what I want to perform is:
For every hour (row), add the 'purchase amount' of the rows that have client id '526' to the rows that have client id '518'.
At first, I tried to execute this but then got an error due to the multiple keys configured in the table:
UPDATE purchases SET client_id = 518 WHERE client_id = 526;
Since client '518' already has rows for the same hours, I cannot perform the above query, as it would produce rows with duplicate keys.
How should I tackle this?
You will require three queries:
One to do the sum if a record exists for both customers with the same time value:
update purchase p1
inner join purchase p2 on p2.client_id=528 and p2.date=p1.date
set p1.amount = p1.amount + p2.amount
where p1.client_id=526;
A second one to handle the records where only one exists (and not the one that will continue to exist):
insert into purchase
(select 526, date, amount
from purchase p1
where p1.client_id=528 and
not exists (select *
from purchase p2
where p2.client_id=526 and
p2.date=p1.date));
Note - the above can probably also be done (and more elegantly) using an update query.
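A sketch of that alternative, using a multi-table UPDATE so the non-overlapping rows are simply reassigned instead of copied (same caveat about the 526/528 ids as below):

update purchase p1
left join purchase p2 on p2.client_id = 526 and p2.date = p1.date
set p1.client_id = 526
where p1.client_id = 528
  and p2.client_id is null;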
And a final query to remove the merged records:
delete from purchase where client_id=528;
Note - I used client_id values 526 and 528 throughout - you may need to alter these numbers to fit your purpose.

MySQL DELETE where ID isn't present in multiple tables - best practice?

I want to delete people that aren't present in events or photos or email subscribers. Maybe they were, but the only photo they're tagged in gets deleted, or the event they were at gets purged from the database.
Two obvious options:
1)
DELETE FROM people
WHERE personPK NOT IN (
SELECT personFK FROM attendees
UNION
SELECT personFK FROM photo_tags
UNION
SELECT personFK FROM email_subscriptions
)
2)
DELETE people FROM people
LEFT JOIN attendees A on A.personFK = personPK
LEFT JOIN photo_tags P on P.personFK = personPK
LEFT JOIN email_subscriptions E on E.personFK = personPK
WHERE attendeePK IS NULL
AND photoTagPK IS NULL
AND emailSubPK IS NULL
Both A & P are about a million rows apiece, and E a few thousand.
The first option works fine, taking 10 seconds or so.
The second option times out.
Is there a cleverer, better, faster third option?
Here is what I would do with, say, a multi-million-row, half-fictitious schema like the one above.
To the people table, I would add one count column for each child table, plus a datetime. Such as:
photoCount INT NOT NULL,
...
lastUpdt DATETIME NOT NULL,
When it comes time for an INSERT/UPDATE on child tables (main focus naturally being insert), I would
begin a transaction
perform a "select for update" which renders an Intention Lock on the parent (people) row
perform the child insert, such as a new picture or email
increment the parent relevant count variable and set lastUpdt=now()
commit the tran (which releases the intention lock)
A delete against a child row is like above but with a decrement.
Whether these are done client-side / Stored Procs/ Trigger is your choice.
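A minimal sketch of the insert path (personPK, personFK, photoCount and lastUpdt come from above; the photo_tags column photoId and the literal ids are made up):

START TRANSACTION;
-- lock the parent row so concurrent writers on this person serialize
SELECT photoCount FROM people WHERE personPK = 42 FOR UPDATE;
-- insert the child row
INSERT INTO photo_tags (personFK, photoId) VALUES (42, 9001);
-- keep the summary columns in sync
UPDATE people SET photoCount = photoCount + 1, lastUpdt = NOW() WHERE personPK = 42;
COMMIT;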
Have an Event (the MySQL Event Scheduler) that fires off once a week (you choose how often) and deletes people rows whose lastUpdt is more than 1 week old and whose count columns are all zero.
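A hedged sketch of such an Event (the count column names follow the assumptions above, and the event scheduler must be enabled):

CREATE EVENT purge_orphaned_people
ON SCHEDULE EVERY 1 WEEK
DO
  DELETE FROM people
  WHERE lastUpdt < NOW() - INTERVAL 1 WEEK
    AND photoCount = 0
    AND attendeeCount = 0
    AND emailSubCount = 0;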
I realize the Intention Lock is not an exact analogy but the point about timeouts and row-level locking and the need for speed are relevant.
As always carefully craft your indexes considering frequency of use, real benefit, and potential drags on the system.
As for any periodic cleanup Events, schedule them to run in low peak hours with the scheduler.
There are some natural downsides to all of this. But if those summary numbers are useful for other profile pages, and fetching them on the fly is too costly, you benefit from it. Also, you avoid what I see as the expensive calls in your two proposed solutions.
I tried to duplicate your scenario here using PostgreSQL, but I think there is something else you didn't tell us.
Both A & P are about a million rows apiece, and E a few thousand.
Table people = 10k records.
I selected 9,500 records at random and inserted them into email_subscriptions.
Then I duplicated those 9,500 records 100 times for attendees and photo_tags, totaling 950k rows in each table.
SQL FIDDLE DEMO
The first query needs 5 seconds.
The second one needs 11 milliseconds.

Merging two tables in Access?

I have two tables that have different data that I need to merge. They do have fields in common, such as Order number, Name, and type of product, but they have separate data as well, like Order date and Engravings.
Would I do two separate Append queries in Access into a merged table? Or one Append query? Or just keep the data separate?
I am new to Access and trying to find the best way to approach this.
Merging the two tables into one completely defeats the purpose of using a database; you're better off using Excel at that point. You want to split the data as much as possible along logical lines so that you can find, say, all the orders that Mr X has ever made for a specific product. In that case you're going to want separate tables for customers, orders, engravings and the like.
The best practice from a design standpoint is to place fields that each table has in common into a third "master" table, then create relationships from that table to the existing tables and delete the data that has been transferred to the main table (except for the primary keys, which have to be common with your master table).
To create the master table, use a Make Table query to generate the master table based on one of your tables, then an append query to add any products in the master table that might not be common to both, based on the other table. Finally, delete queries for each table would rid you of redundant data in both original tables.
However, I strongly suggest you use Microsoft's tutorials and download the Northwind sample database so you can get an idea of what a properly structured database looks like. The beginner's learning curve for Access is very steep, and having well-built example databases is almost a prerequisite.
Make a backup of your database(s) and play with it until it turns out right. Do not make the mistake of playing with live data until you know what you're doing.
As you have similar fields on either table, take the Order number field from both tables using a union query. Something like:
SELECT tbl_Delivery_Details.OrderNo
FROM tbl_Delivery_Details
GROUP BY tbl_Delivery_Details.OrderNo
UNION
SELECT tbl_Delivery_Header.[Order number]
FROM tbl_Delivery_Header
GROUP BY tbl_Delivery_Header.[Order number];
This would take the order numbers from the delivery details table and from the delivery header table and merge them into one list with only one instance of each order number. Save the query.
You could then use this query in a new query. Bring in your 2 tables to this query and insert the fields from either table that you require.
As users add records to the tables, they will be included in the union select query the next time it is run.
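A minimal sketch of that follow-up query, assuming the union query was saved as qryAllOrderNumbers and that the detail/header field names here are placeholders:

SELECT q.OrderNo, d.Engraving, h.OrderDate
FROM (qryAllOrderNumbers AS q
LEFT JOIN tbl_Delivery_Details AS d ON q.OrderNo = d.OrderNo)
LEFT JOIN tbl_Delivery_Header AS h ON q.OrderNo = h.[Order number];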
It depends on what you want to do. Let's assume you have tables A (with 50 records) and B (with 75 records), and both tables have a similar column called OrderID.
Appending Rows:
If you want to create a table with 125 total records by combining records (rows) from A and records (rows) from B, run the following two queries:
Query 1:
SELECT A.ORDER_NUMBER, A.TEXT_FIELD1 as DATA INTO C
FROM A;
Query 2:
INSERT INTO C ( ORDER_NUMBER, DATA )
SELECT B.ORDER_NUMBER, B.TEXT_FIELD2
FROM B;
Appending Columns: If you want to create a table with 75 total records where you are appending columns from A to the columns in B, then run the following query:
SELECT B.ORDER_NUMBER, A.TEXT_FIELD1, B.TEXT_FIELD2 INTO C
FROM A RIGHT JOIN B ON A.ORDER_NUMBER = B.ORDER_NUMBER;
... in a similar way, you can append columns in B to columns in A in a new table C with a total of 50 records by running the following query:
SELECT A.ORDER_NUMBER, A.TEXT_FIELD1, B.TEXT_FIELD2 INTO C
FROM A LEFT JOIN B ON A.ORDER_NUMBER = B.ORDER_NUMBER;