Troubleshooting failed packer build - packer

I am just getting started with Packer, and have had several instances where my build is failing and I'd LOVE to log in to the box to investigate the cause. However, there doesn't seem to be a packer login or similar command to give me a shell. Instead, the run just terminates and tears down the box before I have a chance to investigate.
I know I can use the --debug flag to pause execution at each stage, but I'm curious if there is a way to just pause after a failed run (and prior to cleanup) and then run the cleanup after my debugging is complete.
Thanks.

This was my top annoyance with packer. Thankfully, packer build now has an option -on-error that gives you options.
packer build -on-error=ask ... to the rescue.
From the packer build docs:
-on-error=cleanup (default), -on-error=abort, -on-error=ask - Selects what to do when the build fails. cleanup cleans up after the previous steps, deleting temporary files and virtual machines. abort exits without any cleanup, which might require the next build to use -force. ask presents a prompt and waits for you to decide to clean up, abort, or retry the failed step.
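For what it's worth, here is a minimal sketch of a wrapper that only accepts the three modes quoted above and then calls packer. The function name and the PACKER_BIN variable are my own inventions for illustration (PACKER_BIN just lets you dry-run with echo); only the -on-error flag itself comes from the docs.

```shell
# Run `packer build` with an explicit -on-error mode.
# PACKER_BIN is a hypothetical override (not a Packer setting) for dry runs.
build_with_on_error() {
  mode="$1"
  template="$2"
  case "$mode" in
    cleanup|abort|ask) ;;          # the three modes listed in the docs
    *)
      echo "unsupported -on-error mode: $mode" >&2
      return 1
      ;;
  esac
  "${PACKER_BIN:-packer}" build "-on-error=$mode" "$template"
}
```

Something like build_with_on_error ask template.json would be the interactive case; on a CI server where nobody can answer the prompt, abort is probably the mode that leaves the box around for inspection.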

Having used Packer extensively, I find the --debug flag the most helpful. Once the process is paused, you can SSH to the box with the key (saved in the current directory) and figure out what is going on.
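A rough sketch of that SSH step, under a few assumptions: that the debug key ends in .pem (the amazon-ebs builder, for example, saves something like ec2_amazon-ebs.pem), and that you read the user and address off Packer's debug output. The function names and SSH_BIN are hypothetical helpers, not part of Packer.

```shell
# List candidate debug keys in a directory (default: current directory).
# Assumes the builder writes the key with a .pem extension.
find_debug_keys() {
  find "${1:-.}" -maxdepth 1 -type f -name '*.pem'
}

# Open a shell on the paused build machine.
# user and host are placeholders; take them from Packer's debug output.
# SSH_BIN is a hypothetical override so the command can be dry-run with echo.
ssh_into_box() {
  key="$1"
  user="$2"
  host="$3"
  chmod 600 "$key"                 # ssh refuses keys that are world-readable
  "${SSH_BIN:-ssh}" -i "$key" "$user@$host"
}
```

Usage would look like ssh_into_box ./ec2_amazon-ebs.pem ubuntu 203.0.113.10, with the user and IP swapped for whatever your builder reports.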

Yeah, the way I handle this is to put a long sleep in an inline shell provisioner after the failing step, then I can ssh onto the box and see what's up. Certainly the debug flag is useful, but if you're running the packer build remotely (I do it on Jenkins) you can't really sit there and hit the button.
I do try and run tests on all the stuff I'm packing outside of the build - using the Chef provisioner I've got kitchen tests all over everything before it gets packed. It's a royal pain to try and debug anything besides packer during a packer run.
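The long-sleep trick mentioned above could be sketched like this. DEBUG_SLEEP_SECONDS is a made-up variable name (you could pass it in through the shell provisioner's environment_vars); leaving it unset keeps normal builds from pausing.

```shell
# Pause the provisioning script so there is time to SSH in.
# DEBUG_SLEEP_SECONDS is a hypothetical variable; unset or 0 means no pause.
debug_pause() {
  secs="${DEBUG_SLEEP_SECONDS:-0}"
  if [ "$secs" -gt 0 ]; then
    echo "Pausing ${secs}s for debugging; SSH in now."
    sleep "$secs"
  fi
}
```

Note the pause only happens if provisioning actually reaches it, so if the step you are investigating exits non-zero, the call may need to go just before that command rather than after it.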

While looking up info for this myself, I ran across numerous bug reports/feature requests for Packer.
Apparently, someone added new features to the virtualbox and vmware builders a year ago (https://github.com/mitchellh/packer/issues/409), but it hasn't gotten merged into main.
In another bug (https://github.com/mitchellh/packer/issues/1687), they were looking at adding additional features to --debug, but that seemed to stall out.

If a Packer build is failing, first check where the build process got stuck, working through these checks in order:
1. Are the boot commands the appropriate ones?
2. Is the preseed config OK?
3. If 1. and 2. are OK, then the box has booted and the next thing to check is the login: SSH keys, ports, ...
4. Finally, look for any issues within the provisioning scripts.

Related

Why is shadow-cljs returning this error message on "Stale Output"? How to guarantee the watch for this build is running?

I am new to Clojure and not a pro in Javascript. I am watching the free part of the course on Reagent.
Following the instructions on the course's repo, after doing the git clone and the npm install, the author indicates running $ npm run dev. Everything seems to work fine. I can see the app on my http://localhost:3000/.
The favicon with the app's logo and its name is loaded on the corner of the browser's tab:
However, on the bottom of the web page, there is this error message from shadow-cljs:
shadow-cljs - Stale Output! Your loaded JS was not produced by the
running shadow-cljs instance. Is the watch for this build running?
Why is this happening? How should I fix it?
How to guarantee that the watch for this build is running?
Is there a simple command to run on terminal to check this?
Obs. 1: If this is relevant, my operating system is NixOS and this is my config file.
Obs. 2: I am not sure if this question is connected to my previous question on npm and Cider (Emacs IDE for Clojure) that happened while working with this same repo.
It is likely that this is due to you running npm run dev AND cider-jack-in.
I don't use emacs, so I'm not exactly sure what cider-jack-in does, but I believe it launches a new JVM. Since the npm run dev also did that you end up with two running JVMs, which also means two running shadow-cljs instances. That is not ideal and they will start interfering with each other leading to errors such as yours.
So either you run npm run dev and use emacs to connect to that server (cider-connect, or whatever it is called, should do that),
or you don't run npm run dev at all and instead only use cider-jack-in and then start the watch from the REPL.
Don't forget to first kill all java processes that might be running for that project. As long as there is more than one shadow-cljs process running for the project things will be weird.
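As a rough way to check for leftover processes, something like the sketch below could filter a process listing. It assumes the shadow-cljs JVMs show up with both "java" and "shadow" somewhere in their command line, which may not hold for every setup; the function names are hypothetical.

```shell
# Read `ps` output on stdin and print how many lines look like shadow-cljs JVMs.
# Assumption: such processes mention both "java" and "shadow" in their arguments.
count_shadow_jvms() {
  grep -i 'java' | grep -ic 'shadow' || true
}

# Complain if more than one candidate JVM is found in the listing on stdin.
check_single_instance() {
  n=$(count_shadow_jvms)
  if [ "$n" -gt 1 ]; then
    echo "found $n shadow-cljs JVMs; stop the extras before starting the watch"
    return 1
  fi
  echo "ok: $n shadow-cljs JVM(s) running"
}
```

You would feed it a listing, e.g. ps -eo pid,args | check_single_instance, and kill the extra PIDs by hand once you have confirmed they belong to this project.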
This happens to me when I click on the build link BEFORE it has compiled. In that case, the link displays a previously compiled version, not the live version, and "watch" on code changes doesn't work either. Just wait for your terminal output to say "compiled" before clicking on the link.

QEMU, No bootable device, Windows Subsystem for Linux

I'm learning how to build a basic OS kernel with https://intermezzos.github.io
I've created my .iso file and I'm at the point where I am running qemu-system-x86_64 -cdrom os.iso
When I press enter, QEMU runs a window with the following output:
Booting from Floppy...
Boot failed: could not read the boot disk
Booting from DVD/CD...
Boot failed: Could not read from CDROM (code 0004)
Booting from ROM...
iPXE (PCI 00:03.0) starting execution...ok
iPXE initializing devices...ok
iPXE 1.0.0+git-20131111.c3d1e78-2ubuntu1.1 -- Open Source Network Boot Firmware
-- http://ipxe.org
Features: HTTP HTTPS iSCSI DNS TFTP AoE bzImage ELF MBOOT PXE Menu
net0: 52:54:00:12:34:56 using 82549em on PCI00:03.0 (open)
[Link:up, TX:0 TXE:0 RX:0 RXE:01]
Configuring (net0 52:54:00:12:34:56)...ok
net0: 10.0.2.15/255.255.255.0 gw 10.0.2.2
Nothing to boot: No such file for directory (http://ipxe.org/2d03e13b)
No more network devices
No bootable device.
I went to the website listed in the output (http://ipxe.org/2d03e13b) and one of the tips is that I might Use the iPXE command line to perform DHCP manually, however when I press CTRL + B to access cli, I'm not able to do so.
Where do I look next to troubleshoot this problem of not being able to boot my .iso?
How do I make QEMU have access to keyboard input?
UPDATE
I don't know how, but I am now able to use CTRL + B to access the iPXE command line.
This seems like a good place to start diagnosing my problem of not being able to boot my .iso.
What am I looking for?
UPDATE 2
Thanks to Peter Maydell's suggestion below, I've tested a known-good iso image (https://alpinelinux.org/), running qemu-system-x86_64 -cdrom alpine-3.4.3-x86_64.iso and it booted perfectly just as I expected.
I've rewritten my files from https://intermezzos.github.io to create a new iso image, this time copy and pasting the code from the repository, just in case I was previously inputting typos.
Still not booting. On to the next clue...
The first thing to do is to check whether this command line and ISO image work on a normal Linux host system. That will tell you whether the problem is (a) the Windows Subsystem for Linux not correctly implementing something QEMU relies on or (b) your ISO image actually not being a bootable CDROM.
You might also try booting a known-good ISO image such as one for a Linux distribution.
(The general principle here is to try to do diagnostic tests to split the space of "what might be the problem" into smaller sections and determine which side your problem is.)
There was a simple solution to the problem, and it has to do with systems that use EFI to boot.
I needed to apt-get install grub-pc-bin and then rebuild the image.
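A hedged sketch of that rebuild step, guarded by a check for the BIOS GRUB modules. It assumes grub-pc-bin installs them under /usr/lib/grub/i386-pc (the usual Debian/Ubuntu location) and that the image is built with grub-mkrescue from a directory such as isofiles, as in the tutorial; GRUB_BIOS_DIR and MKRESCUE_BIN are made-up override variables.

```shell
# Assumed location of the BIOS GRUB modules that grub-pc-bin provides.
GRUB_BIOS_DIR="${GRUB_BIOS_DIR:-/usr/lib/grub/i386-pc}"

has_bios_grub() {
  [ -d "$GRUB_BIOS_DIR" ]
}

# Rebuild the ISO only if the BIOS modules are present.
# $1 = output ISO, $2 = source directory (both placeholders).
# MKRESCUE_BIN is a hypothetical override for dry runs.
rebuild_iso() {
  if ! has_bios_grub; then
    echo "BIOS GRUB modules not found in $GRUB_BIOS_DIR; try: sudo apt-get install grub-pc-bin" >&2
    return 1
  fi
  "${MKRESCUE_BIN:-grub-mkrescue}" -o "$1" "$2"
}
```

After rebuild_iso os.iso isofiles succeeds, the same qemu-system-x86_64 -cdrom os.iso command from the question is the thing to retry.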

deploying YouTrack6 on OpenShift

Some time ago, I've deployed a YouTrack5 instance on OpenShift, using this excellent tutorial. It works fine and smoothly.
Now, I want to install YouTrack6. Unfortunately, the same method can't be used for it, because starting with version 6 the YouTrack .war file is no longer available.
So, I've tried to deploy YouTrack6 jar via a DIY cart, which should be ok, as the jar can be run standalone.
This is the command line that I've provided in the .openshift/action_hooks/start script:
nohup /usr/lib/jvm/java-1.7.0/bin/java -Xmx1g -XX:MaxPermSize=250m -Djetty.home=$OPENSHIFT_DATA_DIR -Duser.home=$OPENSHIFT_DATA_DIR -Ddatabase.location=${OPENSHIFT_DATA_DIR}teamsysdata -Djava.awt.headless=true -jar ${OPENSHIFT_REPO_DIR}youtrack-6.0.12463.jar ${OPENSHIFT_DIY_IP}:${OPENSHIFT_DIY_PORT} &
Indeed, it works, the application is deployed and started - BUT: it is very unstable; it looks like it crashes and is forced to restart after every few actions.
From the logs, I couldn't understand where the problem lies, looks like on YouTrack's side everything's ok.
My question is - what can be the problem that causes this unstable behavior, and is there any way to work around it (maybe by changing the command line flags, etc.)?

Why won't my NAnt builds run in Hudson?

My NAnt builds run fine locally on a developer machine, and locally on the command line of the Hudson server, but they will not run in my configured Hudson project.
The console output when I run a Build via the Hudson web UI is similar to the following :
Started by user anonymous
[workspace] $ sh -xe C:\WINDOWS\TEMP\hudson8104357939096562606.sh
C:\WINDOWS\TEMP\hudson8104357939096562606.sh: fork failed: no error [1]
Archiving artifacts
Finished: SUCCESS
I have another project configured properly that runs fine so I know the NAnt plugin is setup properly in Hudson, and that NAnt is on the system path.
Can anyone suggest possible causes as to why this build won't run?
The problematic build may be configured to execute a shell script, rather than to execute a Windows batch file.
Copy the command from the existing build step (the shell script step) and remove the step. Then add a new step to execute a Windows batch file and paste the command.
Trigger the build and observe the results.
(I asked and answered this since it took me quite a while to figure out how I had mis-configured this particular build. Hopefully it'll save time or give ideas to other people troubleshooting automation.)

Hudson build fails when run in browser but works from command line

I am setting up a new Hudson task (on WinXP) for a project which generates javascript files, and performs xslt transformations as part of the build process.
The ant build is failing on the XSL transformations when run from Hudson, but works fine when the same build on the same codebase (i.e. in Hudson's workspace) is run from the command line.
The failure message is:
line 208: Variable 'screen' is multiply defined in the same scope.
I have tried configuring Hudson to use both ant directly and to use a batch script - both fail in Hudson.
I have tried in Firefox, IE6 and Chrome and have seen the same issue.
Can anyone suggest how we can work around this problem with Hudson?
Problem solved.
Our build is actually dependent on jdk 1.4.2, and Hudson appears to run using 1.6. When I set Hudson to run as a service, it ran as my local user, which meant that it picked up the 1.4.2 JAVA_HOME environment variable - and therefore worked.
I guess another possible solution is to configure Hudson to use 1.4.2 by default.
I would assume this is not so much an issue with Hudson directly as with the build script and/or the environment itself.
Is your build script relying on certain environment variables being defined, or worse, the job running from within a certain directory structure (i.e. it works if it's run from under /home/mash/blah but not from under another directory like /tmp)? Is the build script making reference to external files by relative paths?
These are the things I would look into. For environment variables, you can tell Hudson to pass these into Ant. For the other issues, you probably want to change your build script. Check the console output provided by Hudson, and maybe set Ant to print verbose/debug messages to get a better idea about the environment/filepaths.
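One way to compare the two environments is to dump the same few values from a Hudson build step and from your own command line, then diff the results. A minimal sketch, assuming a POSIX shell is available on the build machine (on a Windows-only box the same idea works with echo %JAVA_HOME% in a batch step); the function name is made up.

```shell
# Print the pieces of the environment most likely to differ between
# a Hudson-run build and a command-line build.
dump_build_env() {
  echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
  echo "PWD=$(pwd)"
  echo "PATH=$PATH"
}
```

Running dump_build_env > hudson-env.txt inside the job and dump_build_env > local-env.txt in your terminal gives you two files to diff. Ant's -v (verbose) and -d (debug) flags print more detail about what the build itself sees.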