I am trying to load a JSON file using the JSON SerDe. I have added the SerDe jar file successfully.
1) My JSON SerDe jar file is placed at /apps/hive/warehouse/lib/.
I have run this command successfully:
add jar hdfs:///apps/hive/warehouse/lib/json-serde-1.3-jar-with-dependencies.jar;
converting to local hdfs:///apps/hive/warehouse/lib/json-serde-1.3-jar-with-dependencies.jar
Added [/tmp/6f1a54b9-65c4-4e32-8e87-25d60ef775c6_resources/json-serde-1.3-jar-with-dependencies.jar] to class path
Added resources: [hdfs:///apps/hive/warehouse/lib/json-serde-1.3-jar-with-dependencies.jar]
2) Now when I try to upload the JSON file to /apps/hive/warehouse/lib/ or /tmp/ using the Ambari GUI, the upload fails with error 500 (see attached image).
3) I have also tried the following command, but because I am unable to upload the JSON file, it does not work either:
hadoop fs -put tmp/test.json /apps/hive/warehouse/lib/test.json
Kindly help me solve this issue.
To load the file into Hive, first copy the file to an HDFS location, like below:
hadoop fs -put /complete_path_to_my_json/json_to_upload.json /app/hive/a_temp_location
Then create the table with the JSON SerDe, like below:
create table if not exists my_json_table (id int, name string, designation string) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe';
You can load the data into the table with the following statement:
load data inpath '/app/hive/a_temp_location/json_to_upload.json' into table my_json_table;
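For reference, this SerDe expects one JSON object per line, with keys matching the column names. An illustrative record for the table above (the sample values are made up) would look like:
{"id":1,"name":"John","designation":"Engineer"}
and you can sanity-check the load with a quick query:
select * from my_json_table limit 10;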
I got this link from the Snowflake site, "https://youtu.be/H0sbMDqdYQ8", where they load a JSON file with the COPY command into a table that has 4 columns, 2 of which are VARIANT. I am trying the same, but when I load the JSON file with the COPY command and a JSON file format, it throws the error "SQL compilation error: JSON/XML/AVRO file format can produce one and only one column of type variant or object or array. Use CSV file format if you want to load more than one column." How can I load a JSON file into a table which has more than one column? My requirement is the same as in the YouTube link above.
In the example from Snowflake's docs (data-load-transform), check out the section "Load Semi-structured Data into Separate Columns" and you'll be able to load multiple columns of a table.
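The pattern there is to select individual paths out of the staged JSON inside the COPY statement instead of loading the whole document into one column. A rough sketch of that approach (the table name, stage name, and JSON paths below are placeholders, not from the question):
copy into my_target_table (id, name, details, metadata)
from (
  select
    $1:id::number,    -- scalar column, cast from a JSON path
    $1:name::string,  -- scalar column, cast from a JSON path
    $1:details,       -- VARIANT column, loaded as-is
    $1:metadata       -- VARIANT column, loaded as-is
  from @my_stage/json_to_upload.json
)
file_format = (type = 'json');
The key point is that the file format stays JSON; the one-variant-column restriction only applies when you copy the file straight into the table without a transforming SELECT.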
I am trying to load csv files into a Hive table. I need to have it done through HDFS.
My end goal is to have the hive table also connected to Impala tables, which I can then load into Power BI, but I am having trouble getting the Hive tables to populate.
I create a table in the Hive query editor using the following code:
CREATE TABLE IF NOT EXISTS dbname.table_name (
time_stamp TIMESTAMP COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value DOUBLE COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
Then I check and see the LOCATION using the following code:
SHOW CREATE TABLE dbname.table_name;
and find that it has gone to the default location:
hdfs://our_company/user/hive/warehouse/dbname.db/table_name
So I go to the above location in HDFS and upload a few CSV files manually, which are in the same five-column format as the table I created. This is where I expect the data to be loaded into the Hive table, but when I go back to dbname in Hive and open the table I made, all values are still null, and when I try to open it in the browser I get:
DB Error
AnalysisException: Could not resolve path: 'dbname.table_name'
Then I try the following code:
LOAD DATA INPATH 'hdfs://our_company/user/hive/warehouse/dbname.db/table_name' INTO TABLE dbname.table_name;
It runs fine, but the table in Hive still does not populate.
I also tried all of the above using CREATE EXTERNAL TABLE instead, and specifying the HDFS in the LOCATION argument. I also tried making an HDFS location first, uploading the csv files, then CREATE EXTERNAL TABLE with the LOCATION argument pointed at the pre-made HDFS location.
I already made sure I have authorization privileges.
My table will not populate with the csv files, no matter which method I try.
What am I doing wrong here?
I was able to solve the problem using:
CREATE TABLE IF NOT EXISTS dbname.table_name (
time_stamp STRING COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value STRING COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
and
LOAD DATA INPATH 'hdfs://our_company/user/hive/warehouse/dbname.db/table_name' OVERWRITE INTO TABLE dbname.table_name;
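If you prefer the external-table route mentioned in the question, the same delimiter declaration is the key missing piece. A sketch, assuming the CSV files are uploaded into a separate pre-made HDFS directory (the path and table name here are illustrative):
CREATE EXTERNAL TABLE IF NOT EXISTS dbname.table_name_ext (
time_stamp STRING COMMENT 'time_stamp',
attribute STRING COMMENT 'attribute',
value STRING COMMENT 'value',
vehicle STRING COMMENT 'vehicle',
filename STRING COMMENT 'filename')
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://our_company/user/hive/csv_landing/table_name_ext';
With an external table the files already sitting in that directory are read in place, so no LOAD DATA step is needed.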
I am getting "java.lang.ClassNotFoundException: com.bizo.hive.serde.csv.CSVSerde" exception when trying to query a hive table having properties
ROW FORMAT SERDE
'com.bizo.hive.serde.csv.CSVSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
The solution is adding a jar file when you submit your Spark command.
I had the same problem. I could not connect Spark to a Hive table with CSV format, but for other Hive tables Spark worked perfectly.
After reading through your post and Rao's comment, I realized it should be a missing jar issue.
Step 1:
Download a jar file (csv-serde-1.1.2-0.11.0-all.jar) from here
Step 2:
Then run spark-submit or spark-shell or pyspark with this jar.
I use pyspark:
pyspark --deploy-mode client --master yarn --jars /your/jar/path/csv-serde-1.1.2-0.11.0-all.jar
Step 3:
Test your Spark + Hive connection:
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
hiveTableRdd = sqlContext.sql("SELECT * FROM hiveDatabase.hiveTable")
hiveTableRdd.show()
Now it should work.
***note: I used 'com.bizo.hive.serde.csv.CSVSerde' because the data was double-quoted:
"ID1","A,John","25.6"
"ID2","B,Mike","29.1"
"ID3","C,Tony","27.3"
...
The Hive table with the CSVSerde:
CREATE EXTERNAL TABLE hiveDatabase.hiveTable (
ID string,
Name string,
Value string
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
'separatorChar' = '\,'
,'quoteChar' = '\"')
stored as textfile
LOCATION
'/data/path/hiveTable';
The problem is the following:
After having created a table with Cygnus 0.2.1, I receive a MapReduce error when trying to select a column from Hive. If we look at the files created in Hadoop by Cygnus, we can see that the format used is JSON. This problem didn't appear in previous versions of Cygnus, as they created the Hadoop files in CSV format.
In order to test it, I left 2 tables created reading from each format. You can compare and see the error with the following queries:
SELECT entitytype FROM fiware_ports_meteo; (it fails, created with 0.2.1 in JSON format)
SELECT entitytype FROM fiware_test_table; (it works, created with 0.2 in CSV format)
The paths to the HDFS files are, respectively:
/user/fiware/ports/meteo
/user/fiware/testTable/
I suspect the error comes from parsing the JSON file by the MapReduce job since the CSV format works as expected.
How can this issue be avoided?
You simply have to add the JSON SerDe to the Hive classpath. As a non-privileged user, you can do that from the Hive CLI:
hive> ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;
If you have developed a remote Hive client, you can perform the same operation as any other query execution. Let's say you are using Java:
Statement stmt = con.createStatement();
stmt.executeQuery("ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar");
stmt.close();
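Either way, the jar must be added in the same session that runs the query, because ADD JAR is session-scoped. For example, from the Hive CLI the failing query from above should then succeed:
hive> ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;
hive> SELECT entitytype FROM fiware_ports_meteo;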
I am trying to upload the CSV file to HDFS for Impala and failing many times. Not sure what is wrong here, as I have followed the guide. And the CSV is also on HDFS.
CREATE EXTERNAL TABLE gc_imp
(
asd INT,
full_name STRING,
sd_fd_date STRING,
ret INT,
ftyu INT,
qwer INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/Gc_4';
The error which I am getting (I am using Hue for it):
TExecuteStatementResp(status=TStatus(errorCode=None, errorMessage='MetaException: hdfs://nameservice1/user/hadoop/Gc_4 is not a directory or unable to create one', sqlState='HY000', infoMessages=None, statusCode=3), operationHandle=None)
Any leads?
The LOCATION path /user/hadoop/Gc_4 must be a directory, not a file. So you need to create a directory, for example /user/hadoop/Gc_4, and then upload your Gc_4 file into it, so that the file path is /user/hadoop/Gc_4/Gc_4. After that, you can use LOCATION to point at the directory path /user/hadoop/Gc_4.
LOCATION must be a directory; this requirement is the same in Hive and Impala.
It's not the answer but a workaround.
In most cases I have seen that the table uploaded but the "status" was not successful.
Also, if you have stored the data with the help of Hive, which gives you more control, then don't forget to REFRESH the metadata on the Impala UI. Very important.
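For reference, the refresh can also be issued as a statement against Impala (the table name is taken from the question above); which form you need depends on whether Impala already knows about the table:
REFRESH gc_imp;              -- pick up new data files in a table Impala already knows
INVALIDATE METADATA gc_imp;  -- reload metadata for a table created or changed through Hive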