Oracle best practice - MySQL

I have to pull some incremental data and then do some small but complex calculations on it. As the days passed, the data grew large, and after the 1st incremental stage it started taking more time to insert and update the large number of records.
So, what I did was:
CREATE TABLE T1 AS (SELECT (some_conditions) FROM SOME_TABLE);
CREATE TABLE T2 AS (SELECT (some_conditions) FROM T1);
DROP TABLE T1;
RENAME T2 TO T;
Is this a good practice in a production environment? It works very fast, though.

Normally I'd agree that DDL is a pretty bad thing to do regularly, but we need to be pragmatic.
I think if Tom Kyte (Oracle Guru) says it's ok then it's ok.
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6407993912330
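For the MySQL half of the title, a rough sketch of just the swap step, assuming the finished copy is T2 and the live table it replaces is T (T_old is a hypothetical holding name); MySQL's RENAME TABLE swaps both names in one atomic statement, so readers never catch the table missing:
-- Assumes T2 has already been built (e.g. CREATE TABLE T2 AS SELECT ... as in the question).
RENAME TABLE T TO T_old, T2 TO T;
DROP TABLE T_old;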

Related

Manage querying from a huge table which takes a lot of time to alter/update

I have a very huge table "table1" from which I am continuously querying all day (24x7).
What happens is that at the end of the day, say at 12 AM, I run a query which alters "table1" at the row level. This activity takes around 3-4 hours until my updated "table1" has finished being created.
But during that time I still want to be able to query "table1".
So I decided to create two tables: "table1_active" and "table1_passive".
Normally during the day I will query from "table1_passive", and after I have finished updating "table1_active" I should switch my querying from "table1_passive" to "table1_active".
This switching should be done every day, so that my all-day querying is not hampered.
I don't know whether there is a better way, perhaps with a trigger, or can anyone suggest a method to do it?
In my experience, the use of a secondary table like table1_passive is risky. You don't know exactly (as I understand it) when the update process finishes, so you won't know either when you should switch querying between table1_passive and table1_active.
There are several ways to improve the update process on your table, but keep in mind these are temporary solutions if table1 grows constantly:
Use MyISAM as the storage engine. Here is a very good article about improving updates on a MyISAM table.
If you are updating table1 based on a WHERE clause, you might use indexes to help the database engine find which records have to be updated.
Consider using partitions to work with your table faster.
If you still have those two tables, you can (see the sketch after this list):
Create a UNIQUE index on table1_active and use ON DUPLICATE KEY UPDATE.
Update table1_passive.
Use bulk inserts into table1_active to speed up the process; the database will make sure there are no duplicate rows based on your criteria.
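A minimal sketch of that unique-index plus ON DUPLICATE KEY UPDATE idea, assuming table1_active has an id key column and a val data column (both column names are assumptions):
-- One-time: make id unique so duplicates get turned into updates.
ALTER TABLE table1_active ADD UNIQUE KEY uk_table1_active_id (id);
-- Bulk upsert: existing ids are updated in place, new ids are inserted.
INSERT INTO table1_active (id, val)
VALUES (1, 'a'), (2, 'b'), (3, 'c')
ON DUPLICATE KEY UPDATE val = VALUES(val);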
BUT, if you are querying all day and the table grows constantly, I suggest using NoSQL, because the problem will still be there even if you optimize the update process now.
If we could see the table structure and the update query you are using, maybe we could help you better ;)
Regards and good luck!

Optimising "NOT IN(...)" query for millions of rows

Note: I do not have access to the source code/database to which this question pertains. The two tables in question are located on different servers.
I'm working with a 3rd party company that has systems integrated with our own. They have a query that runs something like this:
DELETE FROM table WHERE column NOT IN(1,2,3,4,5,.....3 000 000)
It's pretty much referencing around 3 million values in the NOT IN.
I'm trying to point out that this seems like an inefficient method for deleting multiple rows while keeping all the ones noted in the query. The problem is, as I don't have access to the source code/database myself, I'm not totally sure what to suggest as a solution.
I know the idea of this query is to keep a target server synced up with a source server. So if a row is deleted on the source server, the target server will reflect that change when this (and other) query is run.
With this limited knowledge, what possible suggestions could I present to them?
The first thing that comes to mind is having some kind of flag column that indicates whether it's been deleted or not. When the sync script runs it would first perform an update on the target server for all rows marked as deleted (or insert for new rows), then a second query to delete all rows marked for deletion.
Is there a more logical way to do something like this, bearing in mind that complete overhauls of the functionality are out of the question? Only small tweaks to the current process will be possible, for a number of reasons.
Instead of
DELETE FROM your_table
WHERE column NOT IN(1,2,3,4,5,.....3 000 000)
you could do
DELETE t1
FROM your_table t1
LEFT JOIN table_where_the_ids_come_from t2 ON t1.column = t2.id
WHERE t2.id IS NULL;
I know the idea of this query is to keep a target server synced up with a source server. So if a row is deleted on the source server, the target server will reflect that change when this (and other) query is run.
I know this is obvious, but why don't these two servers stay in sync using replication? I'm guessing it's because aside from this one table, they don't have identical data.
If out-of-the-box replication isn't flexible enough, you could use a change-data capture tool.
The idea is that the tool monitors changes in a MySQL binary log stream, and reacts to them. The reaction is user-defined, and it can include applying the same change to another MySQL instance, which would keep them in sync.
Here's a blog post that shows how to use Maxwell, which is one of the open-source CDC tools, this one released by Zendesk:
https://www.percona.com/blog/2016/09/13/mysql-cdc-streaming-binary-logs-and-asynchronous-triggers/
A couple of advantages of this approach:
No need to re-sync the whole table. You'd only apply incremental changes as they occur.
No need to schedule re-syncs daily or whatever. Since incremental changes are likely to be small, you could apply the changes nearly immediately.
Deleting a large number of rows will take a huge amount of time. It is likely to require a full table scan. As it finds rows to delete, it will stress the undo/redo log. It will clog replication (if you are using it). Etc.
How many rows do you expect to delete?
Better would be to break the list up into chunks of 1000. (This applies whether you are using IN(list of constants) or a JOIN.) But since you are doing NOT, it gets stickier. Possibly the best way is to copy over what you want to keep:
-- (backticks because `real`, and possibly `new`/`old`, collide with keywords)
CREATE TABLE `new` LIKE `real`;
INSERT INTO `new`
    SELECT * FROM `real` WHERE id IN (...);  -- IN, without NOT
RENAME TABLE `real` TO `old`,
             `new` TO `real`;
DROP TABLE `old`;
I go into details of chunking, partitioning, and other techniques in Big Deletes.
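A rough sketch of the chunked variant of the earlier LEFT JOIN delete, with the same placeholder table/column names plus an assumed integer primary key id; each pass only touches one key range, so no single statement runs for hours or floods the log:
-- Run repeatedly, advancing the range, until the whole id space has been covered.
DELETE t1
FROM your_table AS t1
LEFT JOIN table_where_the_ids_come_from AS t2 ON t1.column = t2.id
WHERE t2.id IS NULL
  AND t1.id BETWEEN 1 AND 100000;   -- next pass: 100001 to 200000, and so on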

Should I be using CREATE VIEW instead of JOIN all the time

I have the following query:
SELECT t.*, a.hits AS ahits
FROM t, a
WHERE (t.TRACK LIKE 'xxx')
AND a.A_ID = t.A_ID
ORDER BY t.hits DESC, a.hits DESC
which runs very frequently. Table t has around 15M+ rows and a has around 3M+ rows.
When I did an EXPLAIN on the above query, I received a note saying that it always creates a temp table. I noticed that creating a temp table based on the above query takes quite a while, and this is done plenty of times.
Thus, I am wondering if I should create a view using the above, say:
CREATE VIEW v_t_a AS
SELECT t.*, a.hits AS ahits
FROM t, a
WHERE a.A_ID = t.A_ID
And change my code to:
SELECT * FROM v_t_a WHERE TRACK LIKE 'xxx' ORDER BY hits DESC, ahits DESC
Will it improve the performance? Will it remove the create temp table time?
Thank you so much for your suggestions!
It is very dangerous to assume MySQL will optimize your VIEWs the same way more advanced database systems would. As with subqueries and derived tables, MySQL 5.0 will fail and perform very inefficiently on many counts.
MySQL has two ways of handling VIEWs: query merge, in which case the VIEW is simply expanded like a macro, and temporary table, in which case the VIEW is materialized into a temporary table (without indexes!) which is then used further in query execution.
There do not seem to be any optimizations applied from the outer query to the query used for temporary table creation, and if you join together more than one view that uses the temporary-table method you may have serious issues, because such tables do not get any indexes.
So be very careful implementing MySQL VIEWs in your application, especially ones which require the temporary-table execution method. VIEWs can be used with very small performance overhead, but only if they are used with caution.
MySQL has a long way to go in getting queries with VIEWs properly optimized.
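For what it's worth, the handling method can be requested explicitly when the view is defined; a sketch using the view from the question (MERGE silently falls back to TEMPTABLE if the view body cannot be merged):
-- Ask for the "macro expansion" behaviour explicitly.
CREATE ALGORITHM = MERGE VIEW v_t_a AS
SELECT t.*, a.hits AS ahits
FROM t
JOIN a ON a.A_ID = t.A_ID;
-- EXPLAIN on a query against the view shows whether a derived/temporary table is still built.
EXPLAIN SELECT * FROM v_t_a WHERE TRACK LIKE 'xxx' ORDER BY hits DESC, ahits DESC;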
A VIEW internally JOINS the two tables every time you QUERY the VIEW...!!
To prevent this, create a MATERIALIZED VIEW...
It is a view that behaves more like a TABLE... You can query it directly like any other table.
But you have to write some TRIGGERS to update it automatically if any underlying TABLE data changes...
See this : http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
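MySQL has no built-in MATERIALIZED VIEW, so this has to be hand-rolled; a minimal sketch for the t/a join from the question, where T_ID as t's primary key (and any column beyond hits, ahits, A_ID and TRACK) is an assumption, showing one of the triggers you would need:
-- Hand-rolled "materialized view" of the join (schema details are assumptions).
CREATE TABLE mv_t_a AS
SELECT t.T_ID, t.TRACK, t.hits, t.A_ID, a.hits AS ahits
FROM t
JOIN a ON a.A_ID = t.A_ID;
CREATE INDEX idx_mv_t_a_track ON mv_t_a (TRACK);
-- One of several triggers needed: keep ahits current when a row in a changes.
DELIMITER //
CREATE TRIGGER trg_a_after_update AFTER UPDATE ON a
FOR EACH ROW
BEGIN
  UPDATE mv_t_a SET ahits = NEW.hits WHERE A_ID = NEW.A_ID;
END//
DELIMITER ;
Matching triggers on t (INSERT/UPDATE/DELETE) are needed as well, otherwise the copy silently drifts from the base tables.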
It's rare that doing exactly the same operations in a view will be more efficient than doing them as a query.
Views are more for managing the complexity of queries than for performance; they simply perform the same actions at the back end as the query would have.
One exception to this is materialised query tables, which actually create a separate long-lived table for the query so that subsequent queries are more efficient. I have no idea whether MySQL has such a thing; I'm a DB2 man myself :-)
But, you could possibly implement such a scheme yourself if performance of the query is an issue.
It depends greatly on the rate of change of the table. If the data is changing so often that a materialised query would have to be regenerated every time anyway, it won't be worth it.

How to continuously remove anything older than the newest 10 entries of a MySQL database (possibly in JPQL/JPA)

I'm looking for a way to continuously monitor and delete the oldest entries so that the database never grows larger than a certain size. I'm only interested in the latest 10 entries, for example, and everything past that number should be deleted. The database is updated through various programs, but the program that does the monitoring and deleting will probably be a Java EE application with JPA. I don't know at which layer of the implementation this should be done: whether MySQL has built-in management that does this, whether I'll have to write a query that does this, or whether there is a feature of Java that can do this.
Edit: I'm using an auto-incremented id that could be used to determine the threshold for deleting.
This is a complex problem, because unless your table is not linked to any other table, you might very well have the latest row in table A referencing a very old row in table B. In this case, although table B's row is very old, you can't delete it without breaking the coherence of your database.
Doing it "continuously" is even harder (read: impossible). I would:
first examine whether it's really needed. Disks are cheap, and 10 entries in an enterprise database is really nothing.
then implement some purge mechanism and execute it every now and then, when the database is not being used by anyone else.
I'll have a stab without knowing anything about your table schema:
DELETE FROM MyTable
WHERE Id NOT IN (SELECT Id FROM (SELECT Id FROM MyTable ORDER BY Date DESC LIMIT 10) AS newest)
This is pretty inefficient to run all the time, and there may be a MySQL-specific TRUNCATE that does the job more nicely. You'd probably get better performance from limiting your reads to the 10 rows you need, and actually archiving/deleting the extraneous data only periodically.
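If the periodic purge is to live inside MySQL rather than in the Java application, the event scheduler can run such a delete on a timer; a sketch under the same assumed MyTable/Id names, keeping the newest 10 rows by the auto-incremented Id mentioned in the question's edit:
-- Requires SET GLOBAL event_scheduler = ON; (and the EVENT privilege).
CREATE EVENT prune_mytable
ON SCHEDULE EVERY 1 HOUR
DO
  DELETE FROM MyTable
  WHERE Id < (
    SELECT MIN(Id) FROM (
      SELECT Id FROM MyTable ORDER BY Id DESC LIMIT 10
    ) AS newest
  );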

Temporary table vs Table variable

I have a table where I have around 1.5 million+ rows that I need to delete. Previously, I was using a temporary table, and this caused the transaction log to grow in size quite quickly. The problem is, once I have done one result set, I need to move onto another where there are another 1.5 million+ rows. The performance of this is rather slow, and I'm wondering if I should use a table variable rather than writing a table to the temp database.
EDIT
I use the temporary table when I select the initial 1.5 million+ records.
Side-stepping the table variable vs. temp table question, you're probably better off batching your deletes into smaller groups inside of a while loop. That's your best bet for keeping the transaction log size reasonable.
Something like:
while (1=1) begin
    delete top (1000)
    from YourTable
    where ...
    if @@rowcount < 1000 break
end /* while */
In general, I prefer using table variables over temp tables, if only because they're easier to use. I find few cases where the use of temp tables is warranted. You don't talk about how you're using temp tables in your routines, but I suggest benchmarking the two options.
A table variable is often not suitable for such large result sets, being more appropriate for small numbers of rows. You'd likely find that the table variable's data would be written to tempdb anyway due to its size.
Personally I have found table variables to be much slower than temporary tables when dealing with large result sets. In an example mentioned at the end of this article on SQL Server Central, using 1 million rows in a table of each type, the query using the temporary table took less than a sixth of the time to complete.
Personally I've found table variables to often suffer performance-wise when I have to join them to real tables in a query.
If the performance is slow, it may be at least partly down to the settings on the database itself. Is it set to grow automatically? What's its recovery model?
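If it helps, both settings can be checked directly in SQL Server (run these in the database in question):
-- Recovery model of the current database.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();
-- File growth settings: growth is in 8 KB pages unless is_percent_growth = 1.
SELECT name, growth, is_percent_growth
FROM sys.database_files;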