Use of index to improve speed of aggregate functions in a SELECT query - MySQL

I need to create a new table with SUM aggregates of the measure columns in a source table.
The source table is very large.
E.g. source table:
Category | Product | Sales
A | P1 | 100
B | P2 | 200
C | P3 | 300
The query is like:
SELECT Category,
       Product,
       SUM(Sales)
FROM source_table
GROUP BY Category, Product;
(Product must appear in the GROUP BY as well, since it is selected unaggregated.)
There is no WHERE condition.
Will indexing help speed up the query?
Is there any alternate mechanism for speeding it up?

It might help to add an index on Category since it is in the GROUP BY clause. But you're doing a full table scan anyway, so it might just be slow.
Probably a better strategy is to create a separate table for the sales report and populate it based on your business needs. If it only needs to be refreshed daily, schedule a stored procedure or event to repopulate it nightly. If it needs to reflect the current state of the base table, you can use triggers to update the report table as the base table changes, or run a separate query at the application level whenever the base table is updated.
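For the nightly-refresh option, a minimal sketch, assuming the report table layout below (sales_report and refresh_sales_report are made-up names; the DELIMITER lines are for the mysql client):

-- Hypothetical report table matching the question's columns.
CREATE TABLE sales_report (
  Category   VARCHAR(50),
  Product    VARCHAR(50),
  TotalSales DECIMAL(14,2)
);

-- Rebuild the report once a day (the event scheduler must be on:
-- SET GLOBAL event_scheduler = ON;).
DELIMITER //
CREATE EVENT refresh_sales_report
ON SCHEDULE EVERY 1 DAY
DO
BEGIN
  TRUNCATE TABLE sales_report;
  INSERT INTO sales_report (Category, Product, TotalSales)
  SELECT Category, Product, SUM(Sales)
  FROM source_table
  GROUP BY Category, Product;
END //
DELIMITER ;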

Indexes are a tricky tool. If you're planning to add an index to a column of your table, you should consider at the very least:
1. How many distinct values the column has.
2. The ratio between the total number of records and the number of distinct values.
3. How often you apply WHERE, GROUP BY, or ORDER BY clauses to this column.
As Kasey's answer states, from what we can see you could add an index on the Category column, but whether it helps will depend on the number of distinct values in that column.
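Going one step further, if the index covers every column the query touches, MySQL can answer the aggregate from the index alone instead of scanning the full rows; a hedged sketch (the index name is made up):

-- Covering index: Category, Product and Sales are all in the index,
-- so the GROUP BY ... SUM() never has to read the base rows.
ALTER TABLE source_table
  ADD INDEX idx_cat_prod_sales (Category, Product, Sales);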

Related

How to compare a huge MySQL table

I have a huge MySQL table which contains more than 33 million records. How can I compare the table against itself to find the non-duplicate records? Unfortunately a plain SELECT statement doesn't work, because the table is so huge.
Please provide me a solution.
First, create a snapshot of your database, or of the tables you want to compare.
Optionally, you can also limit the range of data you compare, for example only 3 years of data. That way your SELECT query won't hog all the resources.
The snapshot will be a bunch of files, each representing a table and containing the primary key or business key for each record (I am assuming you can compare data based on such a key; if that's not the case, record all the fields in your file).
Next, read each record from the file and run a SELECT against the corresponding table. If it returns more than 1 row, you know it is a duplicate.
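If the comparison can stay entirely inside the database, a single grouped query is a set-based alternative; a minimal sketch, assuming a hypothetical table huge_table whose business key is a single column business_key:

-- Keys that appear exactly once, i.e. the non-duplicate records.
-- Adding a date range (e.g. the last 3 years) keeps the scan manageable.
SELECT business_key
FROM huge_table
GROUP BY business_key
HAVING COUNT(*) = 1;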
Thanks
Look at the explain plan and see what the DB is actually doing for the NOT IN.
You could try refactoring, with an index on subscriber as Roy suggested if necessary. I'm not familiar enough with MySQL to know whether the optimizer will execute these identically.
SELECT *
FROM contracts
WHERE NOT EXISTS (
    SELECT 1
    FROM edms
    WHERE edms.subscriber = contracts.subscriber
);

-- or

SELECT C.*
FROM contracts AS C
LEFT JOIN edms AS E
       ON E.subscriber = C.subscriber
WHERE E.subscriber IS NULL;
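Either form benefits from the index Roy suggested on the probed column; a minimal sketch (the index name is made up):

-- Lets the anti-join probe edms by index instead of scanning it
-- once per contracts row.
CREATE INDEX idx_edms_subscriber ON edms (subscriber);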

MySQL combine n tables

I have n (source) tables with the same structure, each with a few million rows. Each of these tables receives new data from a different source on a regular basis.
(Ex: sales tables. Each store has its own sales table. There are 1,000 stores selling hundreds of thousands of items each day. How would you combine those tables?)
I would like to merge them into one summary table. I would like changes to any of the source tables to be reflected in the summary, and changes to the summary to be reflected in the appropriate source table.
(Ex: when a new sale occurs, the summary table is updated. If a change to the sale is made in the summary table, it is reflected in the appropriate store table.)
I can see three solutions.
1. Create an event/trigger that refreshes my summary table at a given time or after an INSERT/UPDATE/DELETE.
Something like:
#Some event triggers this
TRUNCATE TABLE table_summary;  # a DROP here would destroy the table the INSERT needs
INSERT INTO table_summary
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
UNION ALL
SELECT * FROM tablen...
The downside here, I believe, is performance; I do not think I can afford to run this query every time there is an INSERT/UPDATE/DELETE on one of the tables.
2. Create a view.
CREATE VIEW table_summary AS
SELECT * FROM table1
UNION ALL
SELECT * FROM table2;
#This query takes 90s to complete
Performance-wise, I have the same kind of problem as with solution #1.
3. Create an INSERT/UPDATE/DELETE trigger on each table. That's a lot of triggers, and MySQL (before 5.7.2) limits you to one trigger per event and timing per table. I started down that path, but the amount of code scaffolding to maintain quickly became impressive and likely hard to maintain (see the sketch below).
I am sure there's a better way I have not thought of.
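For reference, the per-table trigger from option 3 would look something like this minimal sketch (store1_sales and all column names are hypothetical):

-- One AFTER INSERT trigger per source table pushes new rows into the
-- summary. UPDATE/DELETE mirrors, plus triggers to push summary edits
-- back to the stores, each need their own trigger; that is the
-- scaffolding that multiplies across n tables.
CREATE TRIGGER store1_sales_after_insert
AFTER INSERT ON store1_sales
FOR EACH ROW
  INSERT INTO table_summary (store_id, product_id, quantity, sold_at)
  VALUES (1, NEW.product_id, NEW.quantity, NEW.sold_at);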

Using update and insert triggers to provide a count column in other relations

I have a table 'user_plays_track' that keeps track of how many times a user has 'played' a track.
I use the following query to either insert a new track a user has played, or update the number of times they have played an existing track:
INSERT INTO user_plays_track (user_id, track_id)
VALUES (x, y)
ON DUPLICATE KEY UPDATE play_count = play_count + 1;
Here is the structure of my table:
user_id | track_id | play_count
1       | 5        | 2
4       | 2        | 1
3       | 5        | 7
From this information, I can infer things such as the total number of times a track has been played, or the total number of plays an artist has had, by summing the play counts of all their tracks.
With a thousand or so records, this would soon become messy and the semantics unclear. What I wish to do is use triggers to produce what could be described as a cache.
For example, when a record is updated or inserted into 'user_plays_track', the 'tracks' table will increment its play_count column, indicating the total number of plays from all users for that track.
track_id | artist_id | track_name | play_count
2        | 1         | Hey        | 1
5        | 1         | Test       | 9
Furthering this, another trigger should be applied to infer new knowledge, such as the total number of artist plays. This would again be triggered when a new play is recorded; it will find the artist_id the track belongs to and update the 'artist' table accordingly.
artist_id | artist_name | play_count
1         | Bob         | 10
How would I go about implementing the relevant triggers to provide incrementing totals when a user 'plays' a track?
The more you want to calculate at query time, the more you want views, calculated columns, and stored or user routines. The more you want to calculate at normalized-base update time, the more you want cascades and triggers. The more you want to calculate at some other (scheduled or ad hoc) time, the more you want snapshots, aka materialized views, and updated denormalized bases. You can combine these. Any time the database is accessed, that access can be enabled by and restricted to stored routines or another API.
Until you can show that they are inadequate, views and calculated columns are the simplest.
The whole idea of a DBMS is to store a representation of your application state as the database (whose redundancy normalization reduces), then query it and let the DBMS implement and optimize the calculation of the answer. You haven't presented a reason for not doing that in the most straightforward way possible.
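If you do take the trigger route the question asks about, a minimal sketch, assuming the table and column names shown in the question (note that a repeat play arrives via ON DUPLICATE KEY UPDATE, so an UPDATE trigger is needed alongside the INSERT one):

-- Level 1: a play lands in user_plays_track and bumps tracks.play_count.
CREATE TRIGGER upt_after_insert
AFTER INSERT ON user_plays_track
FOR EACH ROW
  UPDATE tracks SET play_count = play_count + 1
  WHERE track_id = NEW.track_id;

CREATE TRIGGER upt_after_update
AFTER UPDATE ON user_plays_track
FOR EACH ROW
  UPDATE tracks
  SET play_count = play_count + (NEW.play_count - OLD.play_count)
  WHERE track_id = NEW.track_id;

-- Level 2: the change to tracks cascades to the artist total
-- (in MySQL, updates made by a trigger do fire triggers on the
-- other table).
CREATE TRIGGER tracks_after_update
AFTER UPDATE ON tracks
FOR EACH ROW
  UPDATE artist
  SET play_count = play_count + (NEW.play_count - OLD.play_count)
  WHERE artist_id = NEW.artist_id;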

MySQL: Copy entire row from one to another and delete original

Could someone please explain (or point me in the right direction) how I would move multiple rows from one table to another, removing each row from the original table, based on a set of criteria?
I understand
INSERT INTO table2 SELECT * FROM table1
copies the data from one table to another, but I then need to remove the originals. The reason: it has been suggested that, to speed up querying of the table, I should move all redundant data (ended, expired, products older than 3 months) from the main table to another one.
A bit of background, I have a table that holds products, some products have expired but the products still need to be accessible. There are about 50,000 products that have expired and 2,000 which are active. There is a status column (int 1 = active, 2 = expired etc) to determine what to show on the front end.
I guess this post is 2 questions:
Is there a better way to speed up the querying of the product table without removing expired items?
If not, how to move rows from one table to another
Many many thanks!
INSERT INTO table2 (column_name1, column_name2)
SELECT column_name1, column_name2
FROM table1
WHERE (where clause here);

DELETE FROM table1
WHERE (where clause here);
Source for above: mysql move row between tables
50,000 records in the table really isn't that many. If you're having performance issues, I'd look at your queries and your indexes to help speed things up. And since those expired records still need to be accessed, maintaining multiple tables could actually make things more difficult.
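For instance, if the front end always filters on the status column, an index on it lets MySQL skip straight past the expired rows; a minimal sketch (the index name is made up):

-- With only 2,000 of ~52,000 rows active, this index is very
-- selective for the front end's Status = 1 queries.
CREATE INDEX idx_products_status ON Products (Status);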
However, to move data from one table to another as you've asked, you just need to run 2 different statements. Assuming you want to move inactive products:
INSERT INTO ProductsBackup SELECT * FROM Products WHERE Status <> 1;
DELETE FROM Products WHERE Status <> 1;
If you have auto-increment identity columns, you might be better off specifying the column names explicitly. And assuming ProductId is the identity, be careful moving those rows to a different table: you probably don't want to lose the original id, as it may be referenced by other tables.
Good luck.

Matching algorithm in SQL Server 2008

I have more than 3 million rows in my table. When the user tries to insert or update a row, I have to check the following conditions sequentially (business need):
Does any row have the same address?
Does any row have the same postcode?
Does any row have the same DOB?
Obviously the newly inserted or updated row will match a lot of the records in this table.
But the business need is that the matching process should stop as soon as the first match is found, and that row should be returned.
I can easily achieve this using a simple SELECT query, but it takes a very long time to find the match.
Please suggest a more efficient way to do this.
If you're just looking for a way to return after the first match, use TOP (1) (SQL Server's equivalent of MySQL's LIMIT 1).
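A hedged sketch of what that could look like in SQL Server, with hypothetical table, column and parameter names; the CASE ranking enforces the address-then-postcode-then-DOB order:

-- Returns the single best match, preferring an address match,
-- then a postcode match, then a DOB match.
SELECT TOP (1) *
FROM customers
WHERE address  = @address
   OR postcode = @postcode
   OR dob      = @dob
ORDER BY CASE
           WHEN address  = @address  THEN 1
           WHEN postcode = @postcode THEN 2
           ELSE 3
         END;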
You may want to maintain a table of either birth dates or postcodes, with each row linking to a user, so that you can easily filter customers down to a smaller set. It would allow you to perform a much faster search on the database.
Example:
dob      | userID
1/1/1980 | 235
1/1/1980 | 482
1/1/1980 | 123
2/1/1980 | 521
In that scenario, you only have to read 3 rows from the large users table if your target date is 1/1/1980. The lookup goes via a primary key index, too, so it'll be really fast.
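A minimal sketch of such a lookup table (the names are hypothetical):

-- The composite primary key makes the DOB probe an index seek;
-- userID then joins back to the large users table.
CREATE TABLE user_dob_lookup (
  dob    DATE NOT NULL,
  userID INT  NOT NULL,
  PRIMARY KEY (dob, userID)
);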