I have a MySQL table which includes a column that is AUTO_INCREMENT:
CREATE TABLE features (
  id INT NOT NULL AUTO_INCREMENT,
  name CHAR(30),
  value DOUBLE PRECISION,
  PRIMARY KEY (id)
);
I created a DataFrame and wanted to insert it into this table.
case class Feature(name: String, value: Double)
val rdd: RDD[Feature]
val df = rdd.toDF()
df.write.mode(SaveMode.Append).jdbc("jdbc:mysql://...", "features", new Properties)
I get the error 'Column count doesn't match value count at row 1'. If I delete the id column, it works. How can I insert this data into the table without changing the schema?
You have to include an id field in the DataFrame so that the column counts match, and set it to 0: under MySQL's default SQL mode (i.e. without NO_AUTO_VALUE_ON_ZERO), an explicit 0 in an AUTO_INCREMENT column is replaced with the next auto-generated ID. That is:
case class Feature(id: Int, name: String, value: Double)
Then just set id to 0 whenever you create a Feature.
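The reason this works is on the MySQL side, not the Spark side. A quick sketch of the behavior in plain SQL (the sample values are made up):
-- Under the default SQL mode, an explicit 0 (or NULL) in an
-- AUTO_INCREMENT column is replaced with the next sequence value.
INSERT INTO features (id, name, value) VALUES (0, 'accuracy', 0.97);
SELECT LAST_INSERT_ID();  -- the id MySQL actually assigned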
When working with the JSON datatype, is there a way to ensure the inserted JSON contains certain elements? I don't mean a primary key; I want every JSON value that gets inserted to have at least the id and name elements. It can have more, but at a minimum id and name must be there.
Thanks.
The function checks what you want:
create or replace function json_has_id_and_name(val json)
returns boolean language sql as $$
    -- true when both 'id' and 'name' appear among the object's top-level keys
    select coalesce(
        (
            select array['id', 'name'] <@ array_agg(key)
            from json_object_keys(val) key
        ),
        false)
$$;
select json_has_id_and_name('{"id":1, "name":"abc"}'), json_has_id_and_name('{"id":1}');
 json_has_id_and_name | json_has_id_and_name
----------------------+----------------------
 t                    | f
(1 row)
You can use it in a check constraint, e.g.:
create table my_table (
    id int primary key,
    jdata json check (json_has_id_and_name(jdata))
);
insert into my_table values (1, '{"id":1}');
ERROR: new row for relation "my_table" violates check constraint "my_table_jdata_check"
DETAIL: Failing row contains (1, {"id":1}).
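And a row carrying both required keys passes, extra keys included (the sample values here are made up):
insert into my_table values (2, '{"id":2, "name":"abc", "extra":true}');
-- INSERT 0 1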
ADD JAR /path/to/hive-serdes-1.0-SNAPSHOT.jar;
CREATE EXTERNAL TABLE student (
  id INT,
  student_id INT,
  type STRING,
  score DOUBLE
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES (
  'mongo.columns.mapping' = '{"id":"_id", "student_id":"student_id", "type":"type", "score":"score"}'
)
TBLPROPERTIES('mongo.uri'='mongodb://****---****.nam.nsroot.net:*****/admin.student');
I am able to run the code and ingest data successfully, but the id field gets populated as NULL.
Should I change the data type? I tried STRING as well and got the same result.
According to the mongo-hadoop Hive SerDe, an ObjectId corresponds to a special instance of STRUCT.
A Hive field corresponding to an ObjectId must be a STRUCT with exactly two fields: oid, a STRING, and bsontype, an INT. The oid is the string form of the ObjectId, while bsontype should always be 8. Per your example, it should be:
CREATE EXTERNAL TABLE student
(id STRUCT<oid:STRING, bsontype:INT>, student_id INT, type STRING, score DOUBLE)
The output would look similar to:
{"oid":"56d6e0f6ff1f17f74ebbc16c","bsontype":8}
{"oid":"56d6e0f8ff1f17f74ebbc16d","bsontype":8}
...
The above was tested with: MongoDB v3.2.x, mongo-java-driver-3.2.2.jar, mongo-hadoop-core-1.5.0-rc0.jar, mongo-hadoop-hive-1.5.0-rc0.jar.
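Once id is declared as that STRUCT, the hex string is reachable with ordinary dot notation; a quick sketch against the table above (the literal is the first oid from the sample output):
SELECT id.oid, student_id, score
FROM student
WHERE id.oid = '56d6e0f6ff1f17f74ebbc16c';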
I need to copy the data from an old table with millions of rows to a newer table with a slightly different definition. Most importantly, there is one new field with a NULL default, and a varchar field became an enum (with directly mapping values).
Old table:
id : integer
type : varchar
New table:
id : integer
type : enum
number : integer, default null
All of the possible string values of type are within the new enumeration.
I tried the following:
insert into new.table select * from old.table
But I obviously get:
Insert value list does not match column list: 1136 Column count doesn't match value count at row 1
You can copy the table's data and structure from the phpMyAdmin window, then alter the new table to add the new column.
Using the INSERT ... SELECT syntax with an explicit column list, so the new number column falls back to its default:
INSERT INTO new.table (`id`, `type`) SELECT `id`, `type` FROM old.table;
The varchar-to-enum remapping isn't a problem: MySQL implicitly converts each string to the matching enum member, since every old value exists in the enumeration.
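For reference, a minimal reproduction of the whole setup (the table names and enum members are made up, since the question doesn't list them):
CREATE TABLE old_table (
  id   INT,
  type VARCHAR(16)
);

CREATE TABLE new_table (
  id     INT,
  type   ENUM('alpha', 'beta'),  -- hypothetical members covering every old value
  number INT DEFAULT NULL        -- new column; the copy leaves it NULL
);

-- Omitting `number` from the column list lets its DEFAULT NULL apply:
INSERT INTO new_table (`id`, `type`)
SELECT `id`, `type` FROM old_table;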
I have a use case where I have a table a. I want to select data from it, group by some fields, do some aggregations, and insert the result into another Hive table b, which has one column that is a struct. I am facing some difficulty with it. Can someone please tell me what's wrong with my queries?
CREATE EXTERNAL TABLE IF NOT EXISTS a (
date string,
acct string,
media string,
id1 string,
val INT
) PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'folder1/folder2/';
ALTER TABLE a ADD IF NOT EXISTS PARTITION (day='{DATE}') LOCATION 'folder1/folder2/Date={DATE}';
CREATE EXTERNAL TABLE IF NOT EXISTS b (
date string,
acct string,
media string,
st1 STRUCT<id1:STRING, val:INT>
) PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'path/';
FROM a
INSERT OVERWRITE TABLE b PARTITION (day='{DATE}')
SELECT date,acct,media,named_struct('id1',id1,'val',sum(val))
WHERE day='{DATE}' and media is not null and acct is not null and NOT (id1 = "0" )
GROUP BY date,acct,media,id1;
The error I got:
SemanticException [Error 10044]: Line 3:31 Cannot insert into target table because column number/types are different ''2015-07-16'': Cannot convert column 4 from struct<id1:string,val:bigint> to struct<id1:string,val:int>.
sum() returns a BIGINT, not an INT, so declare:
st1 STRUCT<id1:STRING, val:BIGINT>
instead of
st1 STRUCT<id1:STRING, val:INT>
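Alternatively, if the table must keep val as an INT, cast the aggregate back down in the query instead (safe as long as each sum fits in 32 bits):
-- only the SELECT line of the insert changes
SELECT date, acct, media, named_struct('id1', id1, 'val', CAST(sum(val) AS INT))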
I'm trying to create a new table from another table with CREATE TABLE ... AS SELECT and dynamic partitioning on the Hive CLI. I'm learning from the official Hive wiki, where there is this example:
CREATE TABLE T (key int, value string)
PARTITIONED BY (ds string, hr int) AS
SELECT key, value, ds, hr+1 hr1
FROM srcpart
WHERE ds is not null
And hr>10;
But I received this error:
FAILED: SemanticException [Error 10065]:
CREATE TABLE AS SELECT command cannot specify the list of columns for the target table
Source: https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions#DynamicPartitions-Syntax
Since you already know the full schema of the target table, try creating it first and then populating it with an INSERT ... SELECT statement:
SET hive.exec.dynamic.partition.mode=nonstrict;
CREATE TABLE T (key int, value string)
PARTITIONED BY (ds string, hr int);
INSERT OVERWRITE TABLE T PARTITION(ds, hr)
SELECT key, value, ds, hr+1 AS hr
FROM srcpart
WHERE ds is not null
And hr>10;
Note: the SET command is needed because you are performing a full dynamic-partition insert, where no partition column is given a static value.
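For contrast, strict mode (the default) only requires that at least one partition column be static; a hypothetical variant (the date literal is made up):
INSERT OVERWRITE TABLE T PARTITION (ds='2008-04-08', hr)
SELECT key, value, hr+1 AS hr
FROM srcpart
WHERE ds='2008-04-08' AND hr>10;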
In the above code, instead of the CREATE statement you can simply use CREATE TABLE T LIKE srcpart; in case srcpart's partitioning (and column layout) already matches what you need.
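A sketch of that variant, assuming srcpart has columns key and value and is partitioned by (ds, hr) as in the wiki example:
SET hive.exec.dynamic.partition.mode=nonstrict;

-- LIKE copies srcpart's columns and its (ds, hr) partitioning
CREATE TABLE T LIKE srcpart;

INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr+1 AS hr
FROM srcpart
WHERE ds IS NOT NULL AND hr > 10;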