How to find old recovery files on Abinitio production boxes? - ab-initio

Can anyone throw some light on the process of finding old recovery files on Ab Initio production boxes?

Ab Initio's Co>Op creates recovery files in the working directory as <jobname>.rec
So if you have an idea where the graphs/psets were executed from, you could simply do a find:
find . -name \*.rec
Or if you have access to the Ab Initio log files from previously executed jobs, you could grep them for this line:
ABINITIO: Job may be closed and recovery file deleted by: m_rollback -d <working directory path>/<jobname>.rec
which shows the name of the recovery file it created.

You can try the following command to find .rec files older than 60 days in the current directory and its subdirectories:
find . -name \*.rec -mtime +60
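Once stale recovery files are identified, the log message quoted earlier suggests they are closed out with m_rollback -d rather than deleted by hand. A hedged sketch combining both steps (the search path is a placeholder; review the list of files before rolling anything back):
find /data/serial -name '*.rec' -mtime +60 | while read -r recfile; do
    # m_rollback -d rolls back the failed job and removes its recovery file
    m_rollback -d "$recfile"
done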

Related

What will happen with a gsutil command if a DRA bucket's contents are unavailable?

I'm on a DRA (Durable Reduced Availability) bucket and I run the gsutil rsync command quite often to upload/download files to/from the bucket.
Since files could be unavailable (because of the DRA), what exactly will happen during a gsutil rsync session when such a scenario is hit?
Will gsutil just wait until the unavailable files become available and complete the task, thus always downloading everything from the bucket?
Or will gsutil exit with a warning about a certain file not being available, and if so, exactly what output is produced (so that I can write a script to look for this type of message)?
What will the return code be of the gsutil command in a session where files are found to be unavailable?
I need to be 100% sure that I download everything from the bucket, which I'm guessing can be difficult to keep track of when downloading hundreds of gigabytes of data. In case gsutil rsync completes without downloading unavailable files, is it possible to construct a command which retries the unavailable files until all such files have been successfully downloaded?
1. If your files exceed the resumable threshold (as of 4.7, this is 8MB), any availability issues will be retried with exponential backoff according to the num_retries and max_retry_delay configuration variables. If the file is smaller than the threshold, it will not be retried (this will be improved in 4.8 so small files also get retries).
2. If any file(s) fail to transfer successfully, gsutil will halt and output an exception depending on the failure encountered. If you are using gsutil -m rsync or gsutil rsync -C, gsutil will continue on errors and at the end you'll get a CommandException with the message 'N file(s)/object(s) could not be copied/removed'.
3. If retries are exhausted and/or either of the failure conditions described in #2 occur, the exit code will be nonzero.
4. In order to ensure that you download all files from the bucket, you can simply rerun gsutil rsync until it finishes with a zero (success) exit code (see the sketch after this list).
5. Note that gsutil rsync relies on listing objects. Listing in Google Cloud Storage is eventually consistent, so if you upload files to the bucket and then immediately run gsutil rsync, it is possible you will miss newly uploaded files; the next run of gsutil rsync should pick them up.
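A minimal retry-loop sketch for point 4, assuming a bash shell (the bucket name and local path are placeholders):
# Re-run the download until gsutil rsync exits with code 0 (success).
until gsutil rsync -r gs://your-bucket /local/dir; do
    echo "rsync reported failures; retrying in 60 seconds..." >&2
    sleep 60
done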
I did some tests on a project and could not get gsutil to throw any errors. AFAIK, gsutil operates at the directory level; it is not looking for a specific file.
When you run, for example, $ gsutil rsync local_dir gs://bucket, gsutil is not expecting any particular file; it just takes whatever you have in "local_dir" and uploads it to gs://bucket, so:
gsutil will not wait, it will complete.
you will not get any errors - the only errors I got were when the local directory or the bucket was missing entirely.
if, let's say, a file is missing in local_dir but is available in the bucket, and you then run $ gsutil rsync -r local_dir gs://bucket, then nothing will change in the bucket. With the "-d" option, the file will be deleted on the bucket side.
As a suggestion, you could just add a crontab entry to rerun the gsutil command a couple of times a day or at night.
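For example, a crontab entry that reruns the sync every six hours could look like this (the schedule, path, and bucket name are illustrative):
0 */6 * * * gsutil rsync -r /home/user gs://bucket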
Another way is to create a simple script and add it to your crontab to run every hour or so. This checks whether your file exists; if it does not, it runs the gsutil command:
#!/bin/bash
# Example file to check for; adjust to your own data.
FILE=/home/user/test.txt
if [ -f "$FILE" ]; then
    echo "file exists..or something"
else
    # File is missing locally, so pull it down from the bucket.
    gsutil rsync /home/user gs://bucket
fi
UPDATE:
I think this may be what you need. In ~/ you should have a .boto file.
~$ more .boto | grep max
# num_retries = <integer value>
# max_retry_delay = <integer value>
Uncomment those lines and add your numbers. The default is 6 retries, so you could do something like 24 retries and put 3600 seconds in between; in theory this should keep retrying.
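For example, the uncommented entries might then look like this (the numbers are only an illustration):
num_retries = 24
max_retry_delay = 3600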
Hope this helps!

Mercurial (TortoiseHG) won't commit after manually deleting some files

I removed some database files from my project using the search function in Explorer. After that, Mercurial complains that it cannot find the files and refuses to commit. I tried using the shelve tool, but I then ran into a bug report for version 2.5 of TortoiseHG stating that the node holding the database file could not be found.
How do I solve this?
Is it possible you deleted not only the files in your working directory but also files down in the data store itself (.hg/...)? It's possible to do that if you search indelicately in Explorer. Here's the command-line equivalent:
ry4an@four:~/projects/unblog$ find . -name '*.xml*'
./static/attachments/2005-09-22-isle-royale.gpx.xml
./.hg/store/data/static/attachments/2005-09-22-isle-royale.gpx.xml.i
It is entirely safe and okay for me to delete that .gpx.xml file, but if I deleted every file with .gpx.xml in the name then I'd be deleting the file from the store and corrupting my repository.
Try running hg verify in the repository and see what output you get.
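If hg verify reports no errors, the store is intact and Mercurial only needs the deletions recorded before it will commit. A hedged sketch of that cleanup, assuming the files really should be gone:
hg verify                  # check the repository store for corruption
hg addremove               # record files deleted on disk as removed (and track any new ones)
hg commit -m "Remove database files"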

What is the 'Query' MySQL data file used for?

I am having some real difficulties finding out exactly what a certain file in the MySQL data directory is used for. (Using Google with its file name is pointless!)
Basically, I need to create some space on the drive that hosts all MySQL data and have noticed a file almost 16GB in size!!
I can't see any reference to a 'Query' file in my config file, nor can I match its size up to that of any log file, etc. (in case it's a log file missing the .log extension). I'm totally stumped!
I would like to know what this file is and how to reduce its size, if at all possible.
Thanks in advance for your assistance!
That could be the general query log (I say "could" because you can configure the name yourself). Look in your my.ini for an entry like:
log=/path/to/query
Or start MySQL Administrator, go to "Startup Variables -> Log Files" and look for "Query Logfile".
That file is completely unnecessary for your server to run (provided you have confirmed that the log=... entry exists in your config). It is just useful for debugging.
Try stopping your MySQL server, deleting the file, and restarting the server. The file will be recreated.
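A hedged sketch of that procedure on a Windows install, run from cmd.exe in the data directory (the service name "MySQL" is an assumption; check services.msc for the exact name):
REM the service name below is an assumption - adjust to your installation
net stop MySQL
del query
net start MySQL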
I also noticed that the slow query log ("diamond-slow-log") is large, too.
That file only logs queries that take longer than x seconds (2 by default). It can be deleted or deactivated as well, but I would keep it, since it contains queries that could easily be optimized with an extra index.
Update
There is another way to confirm that this is the general query log.
Download a Windows port of the tail Unix command, e.g. this one: http://tailforwin32.sourceforge.net/
I often use this on my dev machine to see what is going on.
Open a shell (cmd.exe) and navigate to the folder where that file exists.
Then type
tail -f query
That will print the last few lines of the file and, as the file changes, every newly appended line.
So if you run a SELECT * FROM table, you should see the query in the console output.

How to solve jenkins 'Disk space is too low' issue?

I have deployed Jenkins on my CentOS machine. Jenkins was working well for 3 days, but yesterday there was a "Disk space is too low. Only 1.019GB left." problem.
How can I solve this problem? It has kept my master offline for hours.
You can easily change the threshold from the Jenkins UI (my version is 1.651.3).
Update: How to ensure high disk space
This feature is meant to prevent slaves with low free disk space from being used for builds. Lowering the threshold would not change the fact that some jobs do not properly clean up after they finish.
Depending on what you're building:
Make sure you understand what disk output your build produces and, if possible, restrict that output to the job workspace. Use the Workspace Cleanup plugin to clean up the workspace as a post-build step.
If the process must write some data to external folders, clean them up manually in post-build steps.
Alternative 1 - provision a new slave per job (use spot slaves; there are many plugins that integrate with different cloud providers to provision machines on demand, on the fly).
Alternative 2 - run the build inside a container; everything will be discarded once the build is finished (see the sketch below).
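For Alternative 2, a minimal sketch using Docker (the image and build command are placeholders; --rm discards the container, and with it everything the build wrote inside it, as soon as the build exits):
docker run --rm -v "$PWD":/workspace -w /workspace maven:3-jdk-8 mvn -B package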
Besides the above solutions, there is a more common way: directly delete the largest space consumers on the Linux machine. You can follow the steps below:
Log in to the Jenkins machine (e.g. via PuTTY)
cd to the Jenkins installation path
Use ls -lart to list hidden folders as well; normally the Jenkins installation is placed in the .jenkins/ folder:
[xxxxx ~]$ ls -lart
drwxrwxr-x 12 xxxx 4096 Feb 8 02:08 .jenkins/
List the folder sizes:
Use df -h to show disk space usage at a high level.
Use du -sh ./*/ to list the total disk usage of each subfolder in the current path.
du -a /etc/ | sort -n -r | head -n 10 will list the top 10 disk-space consumers under /etc/.
Delete old builds or other large folders.
Normally the ./jobs/ folder or the ./workspace/ folder is the largest. Go inside and delete based on your needs (do NOT delete the entire folder).
rm -rf theFolderToDelete
You can limit the loss of disk space by discarding old builds. There's a checkbox for this in the project configuration.
This is actually a legitimate question, so I don't understand the downvotes; perhaps it belongs on Super User or Server Fault. This is a soft warning threshold, not a hard limit where the disk is actually out of space.
For Hudson, see "where to configure hudson node disk temp space thresholds" - that discussion is about the host, not the nodes.
Jenkins is the same. The conclusion is that, for many small projects, the system property hudson.diagnosis.HudsonHomeDiskUsageChecker.freeSpaceThreshold could be decreased; a sketch of setting it appears after the disclaimer below.
That said, I haven't tested it, and there is a disclaimer:
No compatibility guarantee
In general, these switches are often experimental in nature, and subject to change without notice. If you find some of those useful, please file a ticket to promote it to the official feature.
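A hedged sketch of setting that property, assuming a Linux package install where the startup script reads JAVA_ARGS from /etc/default/jenkins (both the file location and the value, which is presumably in bytes, should be checked against your version's documentation):
# /etc/default/jenkins (assumption: your package reads JAVA_ARGS from here)
JAVA_ARGS="$JAVA_ARGS -Dhudson.diagnosis.HudsonHomeDiskUsageChecker.freeSpaceThreshold=536870912"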
I got the same issue. My Jenkins version is 2.3 and its UI is slightly different; putting it here so that it may help someone. Increasing both disk space thresholds to 5GB fixed the issue.
I have a cleanup job with the following build steps. You can schedule it @daily or @weekly.
Execute a system Groovy script build step to clean up old jobs:
import jenkins.model.Jenkins
import hudson.model.Job

BUILDS_TO_KEEP = 5

for (job in Jenkins.instance.items) {
    println job.name
    // Keep the most recent BUILDS_TO_KEEP builds; delete everything older.
    def recent = job.builds.limit(BUILDS_TO_KEEP)
    for (build in job.builds) {
        if (!recent.contains(build)) {
            println "Preparing to delete: " + build
            build.delete()
        }
    }
}
You'd need to have the Groovy plugin installed.
Execute a shell build step to clean cache directories:
# Remove local dependency caches; they will be re-downloaded on demand.
rm -r ~/.gradle/
rm -r ~/.m2/
echo "Disk space"
du -h -s /
To check the free space as a Jenkins job:
Parameters
FREE_SPACE: Needed free space in GB.
Job
#!/usr/bin/env bash
# Fails the build when free disk space drops below ${FREE_SPACE} GB.
free_space="$(df -Ph . | awk 'NR==2 {print $4}')"
if [[ "${free_space}" = *G* ]]; then
    free_space_gb=${free_space/[^0-9]*/}   # strip the unit suffix, e.g. "12G" -> "12"
    if [[ ${free_space_gb} -lt ${FREE_SPACE} ]]; then
        echo "Warning! Low space: ${free_space}"
        exit 2
    fi
else
    echo "Warning! Unknown: ${free_space}"
    exit 1
fi
echo "Free space: ${free_space}"
Plugins
Set build description
Post-Build Actions
Regular expression: Free space: (.*)
Description: Free space: \1
Regular expression for failed builds: Warning! (.*)
Description for failed builds: \1
For people who do not know where the configs are, download the tmpcleaner plugin from
https://updates.jenkins-ci.org/download/plugins/tmpcleaner/
You will get an .hpi file. Go to Manage Jenkins -> Manage Plugins -> Advanced, upload the .hpi file there, and restart Jenkins.
You can immediately see a difference if you go to Manage Nodes.
Since my Jenkins was installed on a Debian server, I did not understand most of the answers related to this, since I cannot find an /etc/default folder or a jenkins file.
If someone knows where the /tmp folder is or how to configure it for Debian, do let me know in the comments.

Copying a MySQL database to another machine

I'm trying to make a copy of a MySQL database on another server. I stop the server, tar up the mysql directory, copy it to the other server, and untar it. I set all the permissions to match those on the working server, and I copied the my.cnf file, so everything is the same except for server-specific changes.
However, when I try to startup the server, I get the following InnoDB error:
InnoDB: Operating system error number 13 in a file operation.
This error means mysql does not have the access rights to
the directory.
File name /var/lib/mysql/ibdata1
File operation call: 'open'.
The owner/group for all the files is mysql. I even tried changing permissions to a+rw. I can su to the mysql user and access the ibdata1 file using head.
SOLUTION:
The problem was that SELinux was enabled and was preventing the new files from being accessed.
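A hedged sketch of diagnosing and fixing that, assuming the default /var/lib/mysql datadir and that you want to keep SELinux enforcing:
getenforce                        # "Enforcing" means SELinux is actively blocking access
ls -Z /var/lib/mysql/ibdata1      # show the SELinux context on the copied files
restorecon -Rv /var/lib/mysql     # restore the default contexts for the MySQL datadir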
A silly question, but people forget: you said you checked that all the files have the same permissions; still, even though the message mentions directory access rights, might you possibly have forgotten to check the permissions on the containing directory?
UPDATE: Two more suggestions:
You might try adding the --console and --log-warnings flags for lots of debugging output, something like this (on my Mac):
/usr/libexec/mysqld --console --log-warnings --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --socket=/var/lib/mysql/mysql.sock
If all else fails, you can probably try strace mysqld ... to see where exactly it is failing. The error will be somewhere near the bottom.
UPDATE 2: Interesting indeed... I can't see what your OS is. I normally don't use /sbin/service; it's a bit mysterious to me. On a Mac it's deprecated in favour of launchctl with a config file in /System/Library/LaunchDaemons/mysqld.plist, and on most Linux boxes you have /etc/init.d/mysqld, so you could insert strace there.
Or (untested, but the manpage suggests it's possible) you could try stracing the service call:
strace -ff -o straces /sbin/service mysqld start
This should produce files named straces.<pid>, one of which should be mysqld's, and hopefully you'll find your error there.
This isn't a direct answer to your question, but I would recommend trying one of these programs for your backup / restore needs.
Percona Xtrabackup: https://launchpad.net/percona-xtrabackup
Mydumper: http://www.mydumper.org/
Both are great tools, are free and open source, and will help you avoid that problem entirely.
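Minimal invocation sketches for both, assuming reasonably recent versions (the directories, database name, and any connection options are placeholders you will need to fill in):
xtrabackup --backup --target-dir=/backups/full
mydumper --database=mydb --outputdir=/backups/mydb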
Hope that helps.