JSON data on Hive - json

I am trying to read JSON data using a Hive external table, but I get a NullPointerException when using a JSON SerDe.
Below are the table command and the error:
hive> create external table json_tab
> (
> name string, age string, passion string
> )
> row format SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' location '/home/pandi/hive_in';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.NullPointerException
I have also added the jars below:
add jar /usr/local/apache-hive-2.1.1-bin/lib/hive-contrib-2.1.1.jar;
add jar /usr/local/apache-hive-2.1.1-bin/lib/hive-json-serde.jar;
Please help.

It looks like an issue with the SerDe class.
Try this implementation instead: 'org.apache.hive.hcatalog.data.JsonSerDe', which is present in hive-hcatalog-core-0.13.0.jar.
This works for me.
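As a minimal sketch of the suggestion above, the table from the question could be recreated with the HCatalog SerDe. The jar path is an assumption (Hive 2.1.1 binary distributions usually keep the HCatalog jar under hcatalog/share/hcatalog); adjust it to your installation.

```sql
-- Assumed jar location; check where hive-hcatalog-core lives in your install.
ADD JAR /usr/local/apache-hive-2.1.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-2.1.1.jar;

-- Same columns and location as in the question, only the SerDe class differs.
CREATE EXTERNAL TABLE json_tab (
  name STRING,
  age STRING,
  passion STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/home/pandi/hive_in';

-- Quick check that rows are parsed.
SELECT name, age, passion FROM json_tab LIMIT 5;
```

This SerDe expects one complete JSON object per line in the files under the table location.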

Related

BigQuery error: "Could not parse '41.66666667' as INT64"

I am attempting to create a table using a .tsv file in BigQuery, but keep getting the following error:
"Failed to create table: Error while reading data, error message: Could not parse '41.66666667' as INT64 for field Team_Percentage (position 8) starting at location 14419658 with message 'Unable to parse'"
I am not sure what to do as I am completely new to this.
Here is a file with the first 100 lines of the full data:
https://wetransfer.com/downloads/25c18d56eb863bafcfdb5956a46449c920220502031838/f5ed2f
Here are the steps I am currently taking to create the table:
https://i.gyazo.com/07815cec446b5c0869d7c9323a7fdee4.mp4
Appreciate any help I can get!
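As confirmed with the OP (#dan), the error is caused by selecting Auto detect when creating a table from a .tsv file. Auto detect inferred INT64 for Team_Percentage, but the column also holds decimal values such as 41.66666667.
The fix is to define the schema manually and give each column the correct data type; here, Team_Percentage needs a type that accepts decimals, such as FLOAT64 or NUMERIC. For more on specifying a schema in BigQuery, see the BigQuery documentation.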

Athena Creating Table from JSON, how to deal with multiple nested structure

I am trying to create a table from JSON that looks like this:
{"ocNo" : "6090","clientSessionKey" : {"office" : {"ortsCode" : 6090},"workstationNo" : 1}}
I tried to achieve this by executing the following query:
CREATE EXTERNAL TABLE events_tryout(
ocNo string,
clientSessionKey struct<office struct<ortsCode: int>,
workstationNo int>
) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://lab.ea38-zplus.cap.nonprod.int2/test/'
However I get the following error message:
FAILED: ParseException line 1:81 missing : at 'struct' near '<EOF>' line 1:118 missing : at 'int' near '<EOF>'
I checked that the JSON is valid, so that is not the problem.
However, when I remove clientSessionKey and its nesting, the query works, which tells me that the problem is the additional level of nesting. Can Athena deal with structs inside structs when creating tables from JSON, or should I take another approach?
Athena can handle nested structs. The problem is a missing : after office, just as the error message says; inside struct<...>, every field is written as name:type.
Another : is missing after workstationNo.
Try struct<office:struct<ortsCode:int>,workstationNo:int>.
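Putting the two fixes together, the full statement from the question could look like the sketch below. The table name, SerDe, and S3 location are copied from the question; the SELECT is only an illustrative way to check the nested fields, and depending on the client you may need to run each statement separately.

```sql
-- Every struct field uses name:type, including the nested struct.
CREATE EXTERNAL TABLE events_tryout (
  ocNo string,
  clientSessionKey struct<office:struct<ortsCode:int>,workstationNo:int>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://lab.ea38-zplus.cap.nonprod.int2/test/';

-- Nested fields are reached with dot notation.
SELECT ocNo,
       clientSessionKey.office.ortsCode,
       clientSessionKey.workstationNo
FROM events_tryout
LIMIT 10;
```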

Problems with importing a JSON tweet into hive

I work on the Cloudera QuickStart image with Docker, and I'm trying to create a table in the Hive interface.
This is my code.
add jar hdfs:///user/cloudera/hive-serdes-1.0-SNAPSHOT.jar
drop table if exists tweets;
CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user1:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user1 STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/';
load data inpath '/user/cloudera/search.json' into table tweets;
When I run "select * from tweets;", I get this error:
Fetching results ran into the following error(s):
Bad status for request TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xf2e\xcc\xb6v\x8eC"\xae^x\x89*\xd6j\xa7', guid='h\xce\xacgmZIP\x8d\xcc\xc0\xe8C\t\x1a\x0c')), orientation=4, maxRows=100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/cloudera/2015_11_18', sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.io.IOException: Not a file: hdfs://quickstart.cloudera:8020/user/cloudera/2015_11_18:25:24', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:366', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:275', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:752', 'sun.reflect.GeneratedMethodAccessor19:invoke::-1', 'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43', 'java.lang.reflect.Method:invoke:Method.java:606',
Don't use your user folder as a Hive table location. A user folder is meant for general file storage and contains other things, such as the 2015_11_18 directory that Hive is trying to read as a data file; a table location should hold only that table's data.
Use a dedicated directory instead, for example LOCATION '/user/cloudera/tweets';.
You could also just make a regular managed table if you don't mind the data being deleted when you drop the table.
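A minimal sketch of that change, assuming the jar and search.json are still at the paths given in the question. The column list is shortened here purely for illustration; the full list from the question would go in its place.

```sql
ADD JAR hdfs:///user/cloudera/hive-serdes-1.0-SNAPSHOT.jar;

DROP TABLE IF EXISTS tweets;

-- The table now points at its own directory instead of the whole user folder.
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  text STRING
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/cloudera/tweets';

-- LOAD DATA INPATH moves the file into the table directory.
LOAD DATA INPATH '/user/cloudera/search.json' INTO TABLE tweets;

SELECT id, text FROM tweets LIMIT 10;
```

Because the table is external, dropping it first removes only the metadata and leaves the files in HDFS untouched.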

Read a JSON file with 12 nested levels into Hive in Azure HDInsight

I tried to create a schema for the JSON file manually and then create a Hive table, but I get:
column type name length 10888 exceeds max allowed length 2000.
I am guessing I have to change the metastore settings, but I am not sure where the config is located in Azure HDInsight.
The other approach I tried was to take the schema from a Spark DataFrame and create the table from a view, but I still get the same error.
These are the steps I tried in Spark:
val tne1 = sc.wholeTextFiles("wasb:path").map(x=>x._2)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val tne2 = sqlContext.read.json(tne1)
tne2.createOrReplaceTempView("my_temp_table");
sqlContext.sql("create table s ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'hive.serialization.extend.nesting.levels'='true') as select * from my_temp_table")
I get the error at this step:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: InvalidObjectException(message:Invalid column type name length 5448 exceeds max allowed length 2000, type struct
When I try to persist or create the RDD, I get the schema, but only in a formatted view. If I could get the full view, I might be able to extract the schema.
I added the following property through Ambari > Hive > Configs > Advanced > Custom hive-site:
hive.metastore.max.typename.length=14000
Now I am able to create tables with column type names up to 14000 characters long.
I was able to fix this problem by running the command below before my CREATE TABLE statement. You can set it to whatever limit fits your schema definition; I made mine extra large.
Note that you have to do this again for each Hive session.
set hive.metastore.max.typename.length=11000;

Hive error on CREATE

I'm following these instructions and I've got as far as running Hive. I ran the following commands:
ADD JAR /home/cloudera/Downloads/hive-serdes-1.0-SNAPSHOT.jar
CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/home/cloudera/flume/tweets';
And then I encountered an error:
CREATE does not exist
Query returned non-zero code: 1, cause: CREATE does not exist.
As I'm new to Hive, I might be missing something obvious.
What might be causing such an error?
I was getting a similar error on my Hive console while running Hive commands:
create does not exist
Query returned non-zero code: 1, cause: create does not exist
I resolved this problem by changing the Hive run-as-user setting.
I changed "Run as end user instead of Hive user" from True to False and restarted the Hive server and clients.
With this setting, my Hive commands ran as the hive user and started working.
Before I changed it, commands ran as the default user, which was root, the user Hive had been started from.
This is a Hive settings issue. Please restart your Hive console and check that your hive-jdbc and Hadoop versions are compatible. Hopefully this solves your issue, as the query itself looks fine.
The problem is that you didn't put a ; at the end of the first statement. Without it, Hive reads everything up to the next ; as part of the ADD JAR command, so it treats CREATE as another jar path, which does not exist.
You need to change this:
ADD JAR /home/cloudera/Downloads/hive-serdes-1.0-SNAPSHOT.jar
Into this:
ADD JAR /home/cloudera/Downloads/hive-serdes-1.0-SNAPSHOT.jar;
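For illustration, a shortened version of the script with both statements terminated might look like this. The paths and SerDe class are taken from the question; the column list is cut down for brevity, and LIST JARS is only an optional check.

```sql
-- Each statement ends with ';' so Hive parses them separately.
ADD JAR /home/cloudera/Downloads/hive-serdes-1.0-SNAPSHOT.jar;

-- Optional: confirm the jar was registered in this session.
LIST JARS;

-- Column list shortened for illustration.
CREATE EXTERNAL TABLE tweets (
  id BIGINT,
  created_at STRING,
  text STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/home/cloudera/flume/tweets';
```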