I have an application that inserts a datetime field. There was an issue, though: it inserted the values with the seconds included, and it wasn't supposed to!
E.g. 2011-08-07 15:24:06
This has been corrected, and inserts now only include up to the minute, so the above example would look like
2011-08-07 15:24:00
There are 20 million rows "wrong" at the moment. What is the most efficient query to fix the old ones?
UPDATE
table
SET
timefield = DATE_FORMAT(timefield,'%Y-%m-%d %H:%i:00')
WHERE
SECOND(timefield) <> 0;
This will have to read each row of the table and extract the seconds part of the time (this is unavoidable), but it won't have to update rows which already are correct.
Probably filter your update based on the 'wrong' ones, then do a right trim of the last 3 characters (the ':SS' part).
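A sketch of that trim-based approach, assuming the same placeholder table and column names as the query above; LEFT() keeps the first 16 characters ('YYYY-MM-DD HH:MM'), which MySQL casts back to a DATETIME with the seconds zeroed:
UPDATE
table
SET
timefield = LEFT(timefield, 16)  -- drops the trailing ':SS'
WHERE
SECOND(timefield) <> 0;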
I have a database with ~200 tables that I want to audit to ensure the tables won't grow too large.
I know I can easily get an idea of a lot of the table attributes I want (size in MB, rows, row length, data length, etc) with:
SHOW TABLE STATUS FROM myDatabaseName;
But it's missing one key piece of information I'm after: how many rows are added to each table in a given time period?
Each of my tables has a datestamp column in a matching format, if that helps.
Edit: Essentially, I want something like:
SELECT COUNT(*)
FROM *
WHERE datestamp BETWEEN [begindate] AND [enddate]
GROUP BY tablename
The following should work to get the number of rows entered into a given table for a given time period:
select count(*) from [tablename] where datestamp between [begindate] and [enddate]
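If you want to run that across all ~200 tables without typing each name by hand, one possible approach (a sketch; it assumes every table really does have a column named datestamp, and [begindate]/[enddate] remain placeholders to fill in) is to generate the per-table counts from information_schema:
SELECT CONCAT(
'SELECT ''', table_name, ''' AS tablename, COUNT(*) AS rows_added FROM ', table_name,
' WHERE datestamp BETWEEN ''[begindate]'' AND ''[enddate]'';'
) AS stmt
FROM information_schema.tables
WHERE table_schema = 'myDatabaseName';
Each generated statement can then be run (manually, via a prepared statement, or from a script) to get one count per table.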
After a bit of research, it looks like this isn't possible in MySQL, since it would require massive table reads (after all, the number of rows can differ between users).
Instead, I grabbed the transaction logs for all the jobs that write into the tables and I'll parse them. A bit hacky, but it works.
How can I give a row a lifetime so that after a specific time, say 2 weeks, the row is automatically erased? Any info would be great.
RDBMSs don't generally allow rows to automatically self-destruct. It's bad for business.
More seriously, some ideas, depending on your exact needs:
run a scheduled job to run a DELETE that removes rows based on some date/time column (see the sketch after this list)
(more complex idea) use a partitioned table with a sliding window to move older rows to another partition
use a view to only show rows less than 2 weeks old
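One way to implement the scheduled-DELETE idea without an external cron job is MySQL's event scheduler. A minimal sketch (myTable, createdColumn, and the daily schedule are assumptions to adapt):
-- the event scheduler must be enabled, e.g. SET GLOBAL event_scheduler = ON;
CREATE EVENT purge_old_rows
ON SCHEDULE EVERY 1 DAY
DO
DELETE FROM myTable
WHERE createdColumn < NOW() - INTERVAL 2 WEEK;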
Add a timestamp column to the table that defaults to CURRENT_TIMESTAMP, and install a cron job on the server that frequently runs and prunes old records.
DELETE FROM MyTable WHERE datediff(now(), myTimestamp) >= 14;
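A note on that DELETE: wrapping the column in DATEDIFF() keeps MySQL from using an index on myTimestamp, so on a big table a roughly equivalent, index-friendly form may be preferable (DATEDIFF compares date parts only, so the cutoff can differ by up to a day):
DELETE FROM MyTable WHERE myTimestamp < NOW() - INTERVAL 14 DAY;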
Or you can add a timestamp column and always select like this:
SELECT * FROM myTable WHERE timestampColumn >= date_sub(now(), interval 2 week);
This is better if you don't actually need to erase the data and only want to show data from the last 2 weeks.
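That SELECT is also what the view idea mentioned earlier would wrap up once and for all (a sketch, assuming the same myTable and timestampColumn names):
CREATE VIEW recentRows AS
SELECT * FROM myTable WHERE timestampColumn >= NOW() - INTERVAL 2 WEEK;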
I am working on a MySQL query for a report. The idea is to have a simple table, say 'reportTable', with the values fetched from various places. I could then use reportTable more easily without remembering lots of joins, etc., and also share this table with other projects.
Should I break down the inner insert part of the query so it runs in chunks? I will probably be adding tens of thousands of rows.
INSERT INTO reportTable
-- long query grabbing results from various places
SELECT var1 FROM schema1.table1
UNION ALL
SELECT var2 FROM schema2.table1
UNION ALL
SELECT var2 FROM schema2.table1
-- etc.
This addresses your concern that inserting the data takes too long and so on. I understood it as though you rebuild your table each time. So, instead of doing that, just fetch the data that is new and not already in your table. Since checking whether the data is already present in your report table might be expensive too, just get the delta. Here's how:
Make sure that a column like this is present in every table you pull data from:
ALTER TABLE yourTable ADD COLUMN created timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
The ON UPDATE clause is of course optional; I don't know if you need to keep track of changes. If so, leave a comment and I can provide a solution with which you can keep a history of your data.
Now you need a small table that holds some meta information.
CREATE TABLE deltameta (tablename varchar(50), LSET timestamp, CET timestamp);
LSET is short for Last Successful Extraction Time, CET for Current Extraction Time.
When you get your data it works like this:
UPDATE deltameta SET CET = CURRENT_TIMESTAMP WHERE tablename = 'theTableFromWhichYouGetData';
SELECT @varLSET := LSET, @varCET := CET FROM deltameta WHERE tablename = 'theTableFromWhichYouGetData';
INSERT INTO yourReportTable (
SELECT whatever FROM aTable WHERE created >= @varLSET AND created < @varCET
);
UPDATE deltameta SET LSET = CET WHERE tablename = 'theTableFromWhichYouGetData';
If anything goes wrong during the insert, your script stops and you get the same data the next time you run it. Additionally, you can work with transactions here if you need to roll back. Again, write a comment if you need help with this.
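A minimal sketch of the transactional variant, assuming InnoDB tables and the same names as above, so that the insert and the metadata update succeed or fail together:
START TRANSACTION;
UPDATE deltameta SET CET = CURRENT_TIMESTAMP WHERE tablename = 'theTableFromWhichYouGetData';
SELECT @varLSET := LSET, @varCET := CET FROM deltameta WHERE tablename = 'theTableFromWhichYouGetData';
INSERT INTO yourReportTable (
SELECT whatever FROM aTable WHERE created >= @varLSET AND created < @varCET
);
UPDATE deltameta SET LSET = CET WHERE tablename = 'theTableFromWhichYouGetData';
COMMIT;  -- or ROLLBACK; if any step failed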
I may be wrong, but you seem to be talking about a basic view. You can read an introduction to views here: http://techotopia.com/index.php/An_Introduction_to_MySQL_Views, and here are the mysql view docs: http://dev.mysql.com/doc/refman/5.0/en/create-view.html
I'm having an optimization problem. I'm not really familiar with MySQL indexes yet. I've read a few articles and questions, but I haven't found what I'm looking for.
I have an UPDATE statement with a WHERE condition on a table having nearly 1 million rows.
I want to speed up this UPDATE operation. I think most of the time is spent looking for the record specified in the WHERE clause.
My queries look like this, with different URLs:
UPDATE table SET updated = 1, updated_date = now() , [some more fields updated]
WHERE url = 'someurl.com?id=123532js23';
URLs are unique. Currently there is an AUTO_INCREMENT id field defined as the PRIMARY KEY. I don't need any <, > or BETWEEN operations, so maybe I could use some hashing?
What engine and indexing should I use for best performance?
One of my friends suggested using InnoDB + UNIQUE on the url field. Is there anything else I can do?
This UPDATE runs very often - about 1,000,000 times each day - and most of the executions (about 95%) result in an actual update.
Thanks for any help!
One of my friends suggested using InnoDB + UNIQUE on the url field. Is there anything else I can do? This UPDATE runs very often - about 1,000,000 times each day - and most of the executions (about 95%) result in an actual update.
Your friend is right.
One thing to note is that URLs may be long, and the maximum possible length of an index key on InnoDB is 767 bytes.
Thus, you would be better off hashing the URLs (say, with MD5) and creating a UNIQUE index on the field containing the URL's hash (and of course using it in the WHERE condition):
INSERT
INTO mytable (url, hashed_url, ...)
VALUES ('someurl.com?id=123532js23', MD5('someurl.com?id=123532js23'))
UPDATE mytable
SET ...
WHERE hashed_url = MD5('someurl.com?id=123532js23')
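The schema side of that suggestion might look like this (a sketch; the CHAR(32) type and the index name are assumptions; existing rows are backfilled before the unique index is added):
ALTER TABLE mytable ADD COLUMN hashed_url CHAR(32);
UPDATE mytable SET hashed_url = MD5(url);
ALTER TABLE mytable ADD UNIQUE INDEX idx_hashed_url (hashed_url);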
I have a large MySQL photos table, and even with full-text indexing, normal indexing, and other performance tricks, I'm finding my query is getting a little slower (crappy MySQL full-text).
I have two fields, one called 'caption' and the other 'shortCaption'. The table has 500,000 rows; the 'caption' field holds between 150 and 2000 characters per row, and the new 'shortCaption' field is empty.
The searchable information within my 'caption' field is mostly in the first 300 characters, so to speed up my full-text query I would like to have a 'shortCaption' field that contains the first 300 characters only, making less work for my DB/query.
Is there a way, in one query, to iterate through each photo record, grab the first 300 characters (something like LEFT(field, 300)), and update the new field with the shorter version? I was just about to write a little server-side script to deal with this, but I'm curious to see if it can be done purely with SQL, as that would be quicker.
Thanks in advance.
UPDATE tablename SET newfield = LEFT(oldfield, 300)
This query would update all the rows in tablename
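Once shortCaption is populated, you would presumably also want a full-text index on it so the shorter field is what actually gets searched (a sketch; the photos table name and the index name are assumptions, and FULLTEXT requires MyISAM or MySQL 5.6+ InnoDB):
ALTER TABLE photos ADD FULLTEXT INDEX ft_shortCaption (shortCaption);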
Could try something like this:
UPDATE table
SET new_column = left(old_column, 300);
I ran a test against one of my test tables and it worked. Just know that the query will take some time to execute over 500K rows.
UPDATE table1
SET column1 = LEFT( column2, 300 )