Too many WARN and INFO messages during Hive startup

Currently, I am using hadoop-2.2.0.tar.gz and hive-0.11.0.tar.gz to study Hadoop and Hive. I just extracted the files from these two tarballs and set up the environment for them.
Then I start Hive with the hive command. I see too many WARN and INFO messages, as shown below. Is there any way to eliminate these messages? I haven't set any configuration yet. Is the Hadoop version incompatible with Hive?
14/03/21 09:24:52 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/03/21 09:24:52 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/03/21 09:24:52 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/03/21 09:24:52 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/03/21 09:24:52 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/03/21 09:24:52 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/21 09:24:52 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/03/21 09:24:52 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream#6295eb:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/03/21 09:24:52 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream#6295eb:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
Logging initialized using configuration in jar:file:/home/ljq/hive-0.11.0/lib/hive-common-0.11.0.jar!/hive-log4j.properties
Hive history file=/tmp/ljq/hive_job_log_ljq_8719#rhel6-hadoop1_201403210924_1350715255.txt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ljq/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ljq/hive-0.11.0/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive (default)>

According to the link org.apache.hadoop.hive.conf.HiveConf.java, these values are defined inside the class and have only recently been deprecated, so these warnings will always be shown. The solution proposed in this other link, hadoop.apache.org/Configuration, is simply to suppress the warnings. Edit the file $HADOOP_HOME/etc/hadoop/log4j.properties and uncomment the line after the comment:
# Uncomment the following line to turn off configuration deprecation warnings.
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
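If you also want to quiet Hive's own INFO lines (such as the logging-initialized and history-file messages), one option is to raise Hive's root logger level. This is a minimal sketch, assuming a stock Hive 0.11 layout where $HIVE_HOME/conf/hive-log4j.properties.template exists; it is not required for the deprecation messages themselves:
# cp $HIVE_HOME/conf/hive-log4j.properties.template $HIVE_HOME/conf/hive-log4j.properties
# then edit the copied file and raise the root logger level (DRFA is the rolling file appender the template defines)
hive.root.logger=WARN,DRFA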

Related

rhc: An unexpected error occurred: invalid character at "<!doctype " even when I just hit Enter

I am having an issue while setting up the OpenShift environment, even though I just hit Enter. Is there an alternative way to set it up?
-bash-4.2$ rhc setup
/usr/local/lib/ruby/gems/2.5.0/gems/commander-4.2.1/lib/commander/user_interaction.rb:328: warning: constant ::NIL is deprecated
/usr/local/lib/ruby/gems/2.5.0/gems/commander-4.2.1/lib/commander/user_interaction.rb:328: warning: constant ::Data is deprecated
/usr/local/lib/ruby/gems/2.5.0/gems/commander-4.2.1/lib/commander/user_interaction.rb:328: warning: constant ::TRUE is deprecated
/usr/local/lib/ruby/gems/2.5.0/gems/commander-4.2.1/lib/commander/user_interaction.rb:328: warning: constant ::FALSE is deprecated
/usr/local/lib/ruby/gems/2.5.0/gems/commander-4.2.1/lib/commander/user_interaction.rb:328: warning: constant ::TimeoutError is deprecated
/usr/local/lib/ruby/gems/2.5.0/gems/commander-4.2.1/lib/commander/user_interaction.rb:328: warning: constant ::Fixnum is deprecated
/usr/local/lib/ruby/gems/2.5.0/gems/commander-4.2.1/lib/commander/user_interaction.rb:328: warning: constant ::Bignum is deprecated
OpenShift Client Tools (RHC) Setup Wizard
This wizard will help you upload your SSH keys, set your application namespace, and check that other programs like Git are properly installed.
If you have your own OpenShift server, you can specify it now. Just hit enter to use the server for OpenShift Online: openshift.redhat.com.
Enter the server hostname: |openshift.redhat.com|
You can add more servers later using 'rhc server'.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:746: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:616: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:872: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:950: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:746: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:616: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:872: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:983: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:983: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:983: warning: Object#timeout is deprecated, use Timeout.timeout instead.
/usr/local/lib/ruby/gems/2.5.0/gems/httpclient-2.6.0.1/lib/httpclient/session.rb:983: warning: Object#timeout is deprecated, use Timeout.timeout instead.
An unexpected error occurred: invalid character at "<!doctype "
rhc is the command line tool for the deprecated OpenShift v2. If you're looking for the command line tool for OpenShift v3, which is built on Kubernetes, you can find it here: https://github.com/openshift/origin/releases/tag/v3.11.0
If you're trying to use v2 locally for some reason, the openshift.redhat.com server is no longer active, so you would have to specify where you're running a v2 server.
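If you do have a self-hosted v2 server, you can point the client at it either by typing its hostname at the "Enter the server hostname" prompt shown above, or on the command line; the --server flag below is how rhc setup typically takes it, but treat the exact invocation as an assumption and check rhc help setup:
rhc setup --server your-v2-broker.example.com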

No suitable driver found for jdbc in Spark

I am using
df.write.mode("append").jdbc("jdbc:mysql://ip:port/database", "table_name", properties)
to insert into a table in MySQL.
Also, I have added Class.forName("com.mysql.jdbc.Driver") in my code.
When I submit my Spark application:
spark-submit --class MY_MAIN_CLASS \
  --master yarn-client \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
This yarn-client mode works for me.
But when I use yarn-cluster mode:
spark-submit --class MY_MAIN_CLASS \
  --master yarn-cluster \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
It doesn't work. I also tried setting --conf:
spark-submit --class MY_MAIN_CLASS \
  --master yarn-cluster \
  --jars /path/to/mysql-connector-java-5.0.8-bin.jar \
  --driver-class-path /path/to/mysql-connector-java-5.0.8-bin.jar \
  --conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.0.8-bin.jar \
  MY_APPLICATION.jar
but still get the "No suitable driver found for jdbc" error.
I had to add the driver option when using the sparkSession's read function.
.option("driver", "org.postgresql.Driver")
val jdbcDF = sparkSession.read
  .format("jdbc")
  .option("driver", "org.postgresql.Driver")
  .option("url", "jdbc:postgresql://<host>:<port>/<DBName>")
  .option("dbtable", "<tableName>")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()
Depending on how your dependencies are set up, you'll notice that when you include something like compile group: 'org.postgresql', name: 'postgresql', version: '42.2.8' in Gradle, for example, this will include the Driver class at org/postgresql/Driver.class, and that's the one you want to instruct Spark to load.
There are 3 possible solutions:
You might want to assemble your application with your build manager (Maven, SBT) so that you won't need to add the dependencies in your spark-submit CLI.
You can use the following option in your spark-submit CLI:
--jars $(echo ./lib/*.jar | tr ' ' ',')
Explanation: Assuming that you have all your jars in a lib directory in your project root, this will pick up all the libraries and add them to the application submission.
You can also try to configure these 2 variables: spark.driver.extraClassPath and spark.executor.extraClassPath in the SPARK_HOME/conf/spark-defaults.conf file and set their values to the path of the jar file. Ensure that the same path exists on the worker nodes.
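For example, the entries in SPARK_HOME/conf/spark-defaults.conf could look roughly like this (a hedged sketch; the connector path is only an illustration and must exist at the same location on every node):
spark.driver.extraClassPath   /path/to/mysql-connector-java-5.0.8-bin.jar
spark.executor.extraClassPath /path/to/mysql-connector-java-5.0.8-bin.jar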
I tried the suggestions shown here which didn't work for me (with mysql). While debugging through the DriverManager code, I realized that I needed to register my driver since this was not happening automatically with "spark-submit". I therefore added
Driver driver = new Driver();
Instantiating the driver class loads it and registers it with the DriverManager, which solved the SQLException problem for me.
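A minimal sketch of the same idea for the MySQL case above, assuming the Connector/J jar is already on the driver's classpath (the explicit registerDriver call is just one way to force the class to load; loading it by name also works):
import java.sql.DriverManager
// force the MySQL driver to load and register itself before calling df.write.jdbc(...)
DriverManager.registerDriver(new com.mysql.jdbc.Driver())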

ERROR org.apache.sqoop.tool.ExportTool - Error during export: Export job failed

We are trying to export data from HDFS to MySQL using Sqoop, and are facing the following issue.
Sample data:
4564,38,153,2013-05-30 10:40:42.767,false,No credentials attempted,,,00 00 00 00 01 64 e6 a6
4565,38,160,2013-05-30 10:40:42.767,false,No credentials attempted,,,00 00 00 00 01 64 e6 a7
4566,38,80,2013-03-07 12:16:26.03,false,No SSH or Telnet credentials available. If an HTTP(S) exists for this asset, it was not able to authenticate.,,,00 00 00 00 01 0f c7 e6
We used the following Sqoop program to export data from HDFS to MySQL; the schema is specified in the table:
public static void main(String[] args) {
    String[] str = { "export", "--connect", "jdbc:mysql://-------/test",
            "--table", "status", "--username", "root", "--password", "******",
            "--export-dir", "hdfs://-----/user/hdfs/InventoryCategoryStatus/",
            "--input-fields-terminated-by", ",", "--input-lines-terminated-by", "\n"
    };
    Sqoop.runTool(str);
}
Error after program execution:
[exec:exec]
0 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
123 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
130 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
Note: /tmp/sqoop-manish/compile/fd0060344195ec9b06030b84cdf6e243/status.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
9516 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
11166 [main] WARN org.apache.hadoop.conf.Configuration - mapred.jar is deprecated. Instead, use mapreduce.job.jar
16598 [main] WARN org.apache.hadoop.conf.Configuration - mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16612 [main] WARN org.apache.hadoop.conf.Configuration - mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
16614 [main] WARN org.apache.hadoop.conf.Configuration - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16618 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
17074 [main] WARN org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id
17953 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.classpath.files is deprecated. Instead, use mapreduce.job.classpath.files
17956 [main] WARN org.apache.hadoop.conf.Configuration - mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
17957 [main] WARN org.apache.hadoop.conf.Configuration - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
17958 [main] WARN org.apache.hadoop.conf.Configuration - mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
17959 [main] WARN org.apache.hadoop.conf.Configuration - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
17959 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.name is deprecated. Instead, use mapreduce.job.name
17959 [main] WARN org.apache.hadoop.conf.Configuration - mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
17960 [main] WARN org.apache.hadoop.conf.Configuration - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
17960 [main] WARN org.apache.hadoop.conf.Configuration - mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
17960 [main] WARN org.apache.hadoop.conf.Configuration - mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
17961 [main] WARN org.apache.hadoop.conf.Configuration - mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
17961 [main] WARN org.apache.hadoop.conf.Configuration - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
19283 [main] WARN org.apache.hadoop.mapred.LocalDistributedCacheManager - LocalJobRunner does not support symlinking into current working dir.
19312 [main] WARN org.apache.hadoop.conf.Configuration - mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
20963 [Thread-29] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.Exception: java.lang.NumberFormatException: For input string: " it was not able to authenticate."
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:400)
Caused by: java.lang.NumberFormatException: For input string: " it was not able to authenticate."
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:481)
at java.lang.Integer.valueOf(Integer.java:582)
at status.__loadFromFields(status.java:412)
at status.parse(status.java:334)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:77)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:36)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:183)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:338)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:232)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
21692 [main] WARN mapreduce.Counters - Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
21698 [main] WARN mapreduce.Counters - Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
21699 [main] ERROR org.apache.sqoop.tool.ExportTool - Error during export: Export job failed!
------------------------------------------------------------------------
BUILD SUCCESS
------------------------------------------------------------------------
Total time: 30.419s
Finished at: Fri Aug 23 15:28:03 IST 2013
Final Memory: 14M/113M
Afterwards, we checked the MySQL table and it contained only 100 records out of 1600. When we executed the same program against other tables, only 6800 records out of 8000, and 235202 records out of 376927, got exported into the MySQL table. Can anyone please provide some suggestions regarding the execution error above?
Looking forward to a reply; your help is highly appreciated.
Looking at your examples, it seems that you are using a comma as the column (field) separator, but you are also allowing the comma to be part of the data itself. Notice the third line of the example data:
4566,38,80,2013-03-07 12:16:26.03,false,No SSH or Telnet credentials available. If an HTTP(S) exists for this asset, it was not able to authenticate.,,,00 00 00 00 01 0f c7 e6
The 6th column (No SSH ...) contains a comma. As a result this one column will be split by Sqoop into two different columns, hence the exception you are getting. I would suggest cleaning up your data. If you are using Sqoop to import the data into HDFS, you can use the parameters --enclosed-by or --escaped-by to overcome this issue.
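For instance, an import invocation along these lines would quote every field so that embedded commas survive a later export (a hedged sketch; the connection string, user, and directory are placeholders, not your actual values):
sqoop import \
  --connect jdbc:mysql://<host>/<db> --username <user> -P \
  --table status \
  --target-dir /user/hdfs/InventoryCategoryStatus/ \
  --fields-terminated-by ',' \
  --enclosed-by '"' \
  --escaped-by \\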
You seem to have a string where a number is expected, namely " it was not able to authenticate." (as I can see from the trace that you shared). Please check the source data that is being pushed to the database.
Edit
Use some other character as the delimiter. When the data is being written to HDFS (I assume that an MR program is generating this data), use a rare character (such as ^A or #) as the delimiter.
There are various options in the 'export' command such as '--enclosed-by' and '--escaped-by', but your data has to be prepared accordingly. The simplest option looks to be selecting a delimiter character that is highly unlikely to occur inside your data.
Edit-2
In that case, there's not much any tool can do, as the delimiter character appears inside the data fields without any escape character and without enclosing strings (like "Hello, How are you"). The data needs to be controlled while it is stored. So while extracting through Flume you should use a different delimiter than ',', OR escape ',' characters (like "Hello\, How are you"), OR enclose each field ("Hello, How are you").
You should achieve this while extracting and storing the data through Flume, so explore whether Flume has options for doing this.
Alternatively, you can write an MR program to cleanse or filter out the problem records (to handle them separately), OR load the data into a staging table in MySQL and write a stored procedure that handles the problem-record scenario and inserts into the target table.

slf4j exception with quartz

I am trying to use Quartz in a simple example project. I am getting the following exception, and I am not sure what it means. I updated my slf4j to 1.6.1 in my POM file, but this still appears:
SLF4J: slf4j-api 1.6.x (or later) is incompatible with this binding.
SLF4J: Your binding is version 1.5.5 or earlier.
SLF4J: Upgrade your binding to version 1.6.x. or 2.0.x
Exception in thread "main" java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
at org.slf4j.LoggerFactory.bind(LoggerFactory.java:121)
at org.slf4j.LoggerFactory.performInitialization(LoggerFactory.java:111)
at org.slf4j.LoggerFactory.getILoggerFactory(LoggerFactory.java:268)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:241)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:131)
at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:395)
at org.quartz.impl.StdSchedulerFactory.<init>(StdSchedulerFactory.java:249)
............
Any help on this would be highly appreciated. Thanks.
You need all your SLF4J dependencies to use the same version.
SLF4J: Your binding is version 1.5.5 or earlier.
SLF4J: Upgrade your binding to version 1.6.x. or 2.0.x
If you look at your dependency tree, I expect that you'll find more than one version of SLF4J among the various jars it uses.
For example
[INFO] +- org.hibernate:hibernate-core:jar:3.5.3-Final:compile
[INFO] | +- antlr:antlr:jar:2.7.7:compile (version managed from 2.7.6)
[INFO] | \- org.slf4j:slf4j-api:jar:1.5.8:compile
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.5.8:compile
Here the two slf4j deps have the same version.
Looks like the SLF4J binding used inside quartz is too old. You should exclude the old version from quartz and add a new one explicitly to your project. Run mvn dependency:tree and post your result here. I will be able to give you exact instructions then.
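A typical exclusion looks roughly like this in the POM (a hedged sketch; the quartz coordinates and the 1.6.1 versions are assumptions based on the question, so adjust them to what mvn dependency:tree actually shows):
<dependency>
  <groupId>org.quartz-scheduler</groupId>
  <artifactId>quartz</artifactId>
  <version>1.8.6</version>
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- then declare the SLF4J version you actually want, api and binding matching -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <version>1.6.1</version>
</dependency>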

How to configure logging in Jetty via a config file?

How do I get jetty to turn down the level of logging from the default of INFO?
I'm actually trying to run the default Apache Solr installation, which ships with jetty, but dumps a lot of information to the console, and I'd only like to see warnings.
I don't want to go hack up the code, I just would like to be able to drop a config file somewhere, but I've been googling for a while, and all I find are obsolete methods or programmatic methods.
Thanks!
edit: -D options would be great, too!
Short answer: java -DDEBUG -jar start.jar
Long answer: (taken from http://docs.codehaus.org/display/JETTY/Debugging)
"Jetty has it's own builtin logging facade that can log to stderr or slf4j (which in turn can log to commons logging, log4j, nlog4j and java logging). Jetty logging looks for a slf4j jar on the classpath. If found, slf4j is used to control logging otherwise stderr is used. The org.mortbay.log.Log class is used to coordinate logging and the following system parameters may be used to control logging:"
org.mortbay.log.class: Specify an implementation of org.mortbay.log.Logger to use
DEBUG: If set, debug logs will be produced, else only INFO and WARN logs will be generated
VERBOSE: If set, verbose logging is produced, including ignored exceptions
IGNORED: If set (jetty 6.1.10 and later), ignored exceptions are logged (independent of DEBUG and VERBOSE settings)
Here I understand that by "system parameters" the text cited above means "Java system properties".
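As a concrete example of passing them on the command line (a hedged sketch; which properties you actually want depends on whether you are turning logging up or down, and org.mortbay.log.StdErrLog is just one Logger implementation shipped with Jetty 6):
java -DDEBUG -Dorg.mortbay.log.class=org.mortbay.log.StdErrLog -jar start.jar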
If you run jetty 6 as a daemon, the logging config file is:
/usr/share/jetty/resources/log4j.properties
(Where /usr/share/jetty is your $jetty.home.) And to turn down the default log level in that log4j.properties file, change the rootLogger entry:
log4j.rootLogger=WARN, stdout
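For that rootLogger line to work, the stdout appender it references has to be defined in the same file; a minimal sketch of the whole file might look like this (the appender name and pattern are illustrative, not Jetty defaults):
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %-5p %c - %m%n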
Find the file logging.properties under your JAVA_HOME directory
Change the default global logging level from
.level= INFO
to
.level= WARNING
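If you'd rather not edit the JDK-wide file, java.util.logging can also be pointed at a per-process copy via a -D option, which matches the "-D options would be great" request above (a hedged sketch; the path is just an example):
java -Djava.util.logging.config.file=/path/to/my-logging.properties -jar start.jar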