Hadoop switching to new HDFS "image" - exception

I have new HDFS local storage directories (dfs.namenode.name.dir and dfs.datanode.data.dir), i.e. the actual local directories under which HDFS stores data (for the namenode, secondary namenode and datanode), with all the necessary things (edits, fsimage, etc.). I would like to switch from my current HDFS to this new HDFS.
In order to do this I stopped the cluster (I run in pseudo-distributed mode), edited the hdfs-site.xml config (modified the paths) and started the cluster.
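Concretely, the change in hdfs-site.xml was along these lines (the paths below are placeholders for my actual new directories):
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///path/to/new/namenode/dir</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///path/to/new/datanode/dir</value>
</property>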
However, the NameNode fails to start with the following error:
FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /path/to/NameNodeDir. Reported: -X. Expecting = -Y.
Why doesn't this work? As I said, I have a whole "image/snapshot" of a properly working new HDFS (the whole thing: name, secondary, data). I thought that I could simply "swap" the HDFS config for the new one and it should work.
I cannot format the NameNode, as the new HDFS "image" contains data. What I am trying to achieve is to "plug in" the new HDFS, replacing the old one, without any modifications to the new HDFS "data/meta" files.
One potential problem might be a YARN/HDFS version mismatch. As explained here (http://hortonworks.com/blog/hdfs-metadata-directories-explained), the layoutVersion key in the VERSION file under namenode/current differs between the two: the new HDFS has layoutVersion X, while my previous HDFS "instance" had Y, which matches the value expected in the logs. To simplify: the new HDFS layoutVersion value is X, the old HDFS layoutVersion value is Y. I will try to upgrade to YARN 2.6 in order to verify this.
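For reference, the VERSION file I am talking about (in the new namenode/current directory) looks roughly like this; all values below are placeholders, and the layoutVersion line is the one reported as -X in the error above:
namespaceID=123456789
clusterID=CID-xxxxxxxx
cTime=0
storageType=NAME_NODE
blockpoolID=BP-xxxxxxxx
layoutVersion=-X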
I would be grateful for any help.
Regards
SOLUTION
The problem was an HDFS version mismatch, as I wrote in the comment to user Mikhail Golubtsov. I was trying to run newer HDFS metadata with binaries from an older HDFS version, hence the error. If anybody encounters a similar problem, just update/upgrade your YARN/HDFS version to the appropriate one. That solved the issue for me.
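For anyone checking for the same mismatch, two quick checks helped me (the path is a placeholder): hadoop version prints the release of the binaries in use, and the VERSION file shows the layout the metadata was written with; the NameNode startup log states which layout version those binaries expect.
hadoop version
grep layoutVersion /path/to/NameNodeDir/current/VERSION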

Related

XAMPP on Mac OS X - use external USB drive for MySQL storage

I'm using fairly large MySQL tables with XAMPP, which makes things tough given the rather small internal storage of my Mac. I thought I would just keep the MySQL data on an external USB 3.0 SSD drive, but it looks like it's not that easy.
Here is what I've tried:
With XAMPP (not VM): I moved /Applications/XAMPP/xamppfiles/var/mysql to /Volumes/myexternalssd/mysql and then pointed everything in my.cnf to that dir. The permissions seem to have copied properly, but it didn't work: MySQL does not start at all if I trash the original dir, or just keeps using the original dir if I leave it in place.
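For reference, the my.cnf change was along these lines (just an example of my setup, not a recommendation):
[mysqld]
# point the data directory at the external drive
datadir = /Volumes/myexternalssd/mysql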
With XAMPP-VM: I moved the ~/.bitnami dir to the external drive and then symlinked (ln -s) to the new location. The error is then:
Cannot load stack definitions.
Details:
1 error occurred:
* failed to create stack: cannot deserialize stack from file "/Users/arseni/.bitnami/stackman/machines/xampp/metadata.json": open /Users/arseni/.bitnami/stackman/machines/xampp/metadata.json: operation not permitted

JHipster Liquibase

I'm getting started with JHipster and am attempting to initialize my data using Liquibase. I have added two entities via the JHipster yo task, added my two CSV files to the /resources/config/liquibase directory, and added the relevant loadData section to my "added entity" change log files to point at the CSVs. I had to update the MD5 hash in the databasechangelog table and the app is running, BUT the CSV files don't seem to get picked up via the loadData elements I added to the "added entity" XML files. No data is inserted. Any ideas how to go about running this down?
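The loadData sections I added look roughly like this (entity name, column names and separator here are just placeholders standing in for my actual project):
<changeSet id="load-my-entity-data" author="me">
    <loadData file="config/liquibase/my_entity.csv"
              separator=";"
              tableName="my_entity">
        <column name="id" type="numeric"/>
        <column name="name" type="string"/>
    </loadData>
</changeSet>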
If you updated the MD5 hashes in the changelog table, I suspect your change log files will not be run, because Liquibase will think they have already been run. I would rather set the MD5 hashes to null and restart the app.
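Something along these lines should do it (the table is called DATABASECHANGELOG by default; adjust if yours differs):
UPDATE databasechangelog SET md5sum = NULL;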
This was my solution:
1. Delete the row in databasechangelog
2. Delete the table
3. Restart the app
Liquibase re-generated the table and loaded all the CSV data into the database.
I hope I helped you :)

Apache Cassandra: unable to load a CQL file

I'm just starting out with Apache Cassandra. I have some CQL files that define my data. I have Cassandra installed on my machine and I started it as per the Apache Cassandra wiki. Nothing suspicious!
I'm using the CLI to create the keyspaces and the tables, for which I have some CQL files in a specific directory, like:
create_tables.cql
load_tables.cql
I was able to successfully run create_tables.cql, but when I try to run load_tables.cql, I always end up seeing:
/Users/myUser/data/load-test-data.cql:7:Can't open 'test_data.csv' for reading: [Errno 2] No such file or directory: 'test_data.csv'
The load_tables.cql refers to another csv file that contains the test data that I want to populate my database with!
COPY test_table (id, name) FROM 'test_data.csv';
I tried all sorts of permission changes on the data folder where the CQL files are, but I still keep getting this message. Any hints as to what I could do to get this solved?
OK, I got this one sorted! It has to do with absolute vs. relative paths. I ended up using an absolute path to where my CSV is located, and that solved the issue!
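So the working line in load_tables.cql ended up looking something like this (the path is just an example of my layout):
COPY test_table (id, name) FROM '/Users/myUser/data/test_data.csv';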

Hadoop ConnectException

I recently installed Hadoop on my local Ubuntu machine. I started the DataNode (and the other daemons) by invoking the bin/start-all.sh script. However, when I try to run the word count program
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /home/USER/Desktop/books /home/USER/Desktop/books-output
I always get a connect exception. The folder 'books' is on my desktop (local filesystem). Any suggestions on how to overcome this?
I have followed every step in this tutorial. I am not sure how to get rid of that error. All help will be appreciated.
Copy your books folder into HDFS, and for the input path argument use the HDFS path of the copied folder.
For more detail, go through the link below.
http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount#Basic_Hadoop_Admin_Commands
There is a bit of confusion here: when you run the hadoop ... command, the default filesystem it uses is the Hadoop Distributed File System (HDFS), hence the files must be located on HDFS for Hadoop to access them.
To copy files from the local filesystem to HDFS you have to use the following command:
hdfs dfs -copyFromLocal /path/in/local/file/system /destination/on/hdfs
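So in this case the whole sequence would look roughly like this (the HDFS target path is just an example; on Hadoop 1.x you can use bin/hadoop fs instead of hdfs dfs):
# create a home directory on HDFS and copy the local folder into it
hdfs dfs -mkdir -p /user/USER
hdfs dfs -copyFromLocal /home/USER/Desktop/books /user/USER/books
# run wordcount with HDFS paths; the output directory must not exist yet
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /user/USER/books /user/USER/books-output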
One more thing: if you want to run the program directly from your IDE, you sometimes get this issue, which can be solved by adding the core-site.xml and hdfs-site.xml files to the conf variable, something like:
// conf is an org.apache.hadoop.conf.Configuration; Path is org.apache.hadoop.fs.Path
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
Change the paths above to where core-site.xml and hdfs-site.xml live on your machine.
These configuration files can also be provided from the command line by adding them to the classpath with the -cp flag.

What is the 'Query' MySQL data file used for?

I am having some real difficulties finding out exactly what a certain file in the MySQL data directory is used for. (Using Google with its file name is pointless!)
Basically, I need to free some space on the drive that hosts all the MySQL data, and I have noticed a file almost 16GB in size!
I can't see any reference to a Query file in my config file, nor can I match its size up to that of any log files, etc. (in case it's a log file missing the .log extension). I'm totally stumped!
I would like to know what this file is and how to reduce its size, if at all possible.
Thanks in advance for your assistance!
That could be the general query log (I say "could" because you can configure the name yourself). Look in your my.ini for an entry
log=/path/to/query
Or start MySQL Administrator, go to "Startup Variables -> Log Files" and look for "Query Logfile".
That file is completely unnecessary for your server to run (if you have confirmed that the entry log=... exists in your config); it is just useful for debugging.
Try stopping your MySQL server, deleting the file and restarting the server. The file will be recreated.
I also noticed that the slow query log ("diamond-slow-log") is large. That file only logs queries that take longer than long_query_time seconds (10 by default). It can be deleted or deactivated, too, but I would keep it since it contains queries that could easily be optimized with an extra index.
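For reference, on older MySQL versions (which this thread seems to be using) both logs are controlled by my.ini/my.cnf entries roughly like the ones below; newer versions use general_log/general_log_file and slow_query_log/slow_query_log_file instead, and the paths here are placeholders:
# general query log (remove or comment out to disable)
log = /path/to/query
# slow query log and an example threshold in seconds
log-slow-queries = /path/to/diamond-slow-log
long_query_time = 2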
Update
There is another way to confirm that this is the general query log.
Download a Windows port of the Unix tail command, e.g. this one: http://tailforwin32.sourceforge.net/
I often use this on my dev machine to see what is going on.
Open a shell (cmd.exe) and navigate to the folder where that file exists.
Then type
tail -f query
That will print the last few lines of the file and, whenever the file changes, every new line.
So if you do a SELECT * FROM table you should see the query in the console output.