Rebuild a Hive table with different definitions - mysql

I want to build an automatic system to help me map Hive tables.
I have an SQL table with metadata: tableID, fieldName, fieldType, description, lastUpdated.
I want to update my tables automatically -
where lastUpdated = CURDATE() - INTERVAL '1' DAY
But I don't have any indication of what change was made - it could be a new column in the table, a renamed column, or even a description update.
Is there a way to "define" a table all over again when it already exists, so that all the changes I want to make are applied at once (all change types)?
For instance, I have a table that was defined like this:
create external table IF NOT EXISTS tableA (`a` string, `b` int, `c` int) PARTITIONED BY (dt date) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE LOCATION 'File/Path';
And the change is that column "b" is now of type "string". Is there a (generic) update/alter query that I can write:
*SomeCommand* tableA (`a` string, `b` string, `c` int)
and my column will be updated?
Same if I have a new column - d, type: float.
*SomeCommand* tableA (`a` string, `b` string, `c` int, `d` float)
I need one command that covers both of these cases. Or, if you have another good idea on how to do this, I would really appreciate it...
Thank you!

You can use ALTER TABLE ... REPLACE COLUMNS. It does exactly what you asked: it replaces all the columns at once. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Add/ReplaceColumns
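For your two examples, a minimal sketch (reusing your tableA definition from above) would be:
ALTER TABLE tableA REPLACE COLUMNS (`a` string, `b` string, `c` int);
ALTER TABLE tableA REPLACE COLUMNS (`a` string, `b` string, `c` int, `d` float);
Note that REPLACE COLUMNS only rewrites the table metadata, not the underlying data files, and it is only supported for tables with a native SerDe (which includes the delimited text format used here); the partition column dt is not part of the column list and is left untouched.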

Related

ALTER COLUMN TYPE from tinyInt to Varchar in Mysql

I need to change a column type from tinyint (used as bool) to varchar, without losing data.
I have found many answers on Stack Overflow, but all of them are written for Postgres and I have no idea how to rewrite them for MySQL.
Answers for this problem on Stack Overflow look like this:
ALTER TABLE mytable ALTER mycolumn TYPE VARCHAR(10) USING CASE WHEN mycolumn=0 THEN 'Something' ELSE 'TEST' END;
How would similar logic look in MySQL?
The syntax you show has no equivalent in MySQL. There's no way to modify values during an ALTER TABLE. An ALTER TABLE in MySQL will only translate values using built-in type casting. That is, an integer will be translated to the string format of that integer value, just as it would in a string expression.
For MySQL, here's what you have to do:
Add a new column:
ALTER TABLE mytable ADD COLUMN type2 VARCHAR(10);
Backfill that column:
UPDATE mytable SET type2 = CASE `type` WHEN 0 THEN 'Something' ELSE 'TEST' END;
If the table has millions of rows, you may have to do this in batches.
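For example, a rough sketch of a batched backfill (the 10000-row batch size is arbitrary; rerun the statement until it reports 0 rows affected):
UPDATE mytable SET type2 = CASE `type` WHEN 0 THEN 'Something' ELSE 'TEST' END WHERE type2 IS NULL LIMIT 10000;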
Drop the old column and optionally rename the new column to the name of the old one:
ALTER TABLE mytable DROP COLUMN `type`, RENAME COLUMN type2 to `type`;
Another approach would be to change the column, allowing integers to convert to the string format of the integer values. Then update the strings as you want.
ALTER TABLE mytable MODIFY COLUMN `type` VARCHAR(10);
UPDATE mytable SET `type` = CASE `type` WHEN '0' THEN 'Something' ELSE 'TEST' END;
Either way, be sure to test this first on another table before trying it on your real table.

Copy table data on same server with field remapping

I need to copy the data of an old table with millions of rows to a newer table, with a slightly different definition. Most importantly, there is one new field with a null-default, and a varchar field became an enum (with directly mapping values).
Old table:
id : integer
type : varchar
New table:
id : integer
type : enum
number : integer, default null
All of the possible string values of type are within the new enumeration.
I tried the following:
insert into new.table select * from old.table
But I obviously get:
Insert value list does not match column list: 1136 Column count doesn't match value count at row 1
You can copy the table data and structure from the phpMyAdmin window, and then modify the new table and add the new column.
Using the INSERT ... SELECT syntax:
INSERT INTO new.table (`id`, `type`) SELECT `id`, `type` FROM old.table;
Apparently the varchar to enum remapping isn't a problem.
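For completeness, a minimal sketch of the whole scenario (the table names and ENUM members here are assumed, not taken from the question):
CREATE TABLE new_table (
  id INT,
  `type` ENUM('foo', 'bar'),
  number INT DEFAULT NULL
);
INSERT INTO new_table (`id`, `type`)
SELECT `id`, `type` FROM old_table;
As long as every distinct string in old_table.type matches one of the ENUM members, MySQL converts the values directly; under strict SQL mode a non-matching value makes the INSERT fail instead of being stored as the empty string.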

hive insert into structure data type using a query

I have a use case where I have a table a. I want to select data from it, group by some fields, do some aggregations, and insert the result into another Hive table b that has one of its columns as a struct. I am facing some difficulty with it. Can someone please help and tell me what's wrong with my queries?
CREATE EXTERNAL TABLE IF NOT EXISTS a (
date string,
acct string,
media string,
id1 string,
val INT
) PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'folder1/folder2/';
ALTER TABLE a ADD IF NOT EXISTS PARTITION (day='{DATE}') LOCATION 'folder1/folder2/Date={DATE}';
CREATE EXTERNAL TABLE IF NOT EXISTS b (
date string,
acct string,
media string,
st1 STRUCT<id1:STRING, val:INT>
) PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 'path/';
FROM a
INSERT OVERWRITE TABLE b PARTITION (day='{DATE}')
SELECT date,acct,media,named_struct('id1',id1,'val',sum(val))
WHERE day='{DATE}' and media is not null and acct is not null and NOT (id1 = "0" )
GROUP BY date,acct,media,id1;
Error I got :
SemanticException [Error 10044]: Line 3:31 Cannot insert into target table because column number/types are different ''2015-07-16'': Cannot convert column 4 from struct<id1:string,val:bigint> to struct<id1:string,val:int>.
sum() returns a BIGINT, not an INT, so declare
st1 STRUCT<id1:STRING, val:BIGINT>
instead of
st1 STRUCT<id1:STRING, val:INT>
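Alternatively, if you want to keep val as an INT in table b, a sketch of the same query with an explicit cast (bearing in mind the cast will overflow if a sum exceeds the INT range):
FROM a
INSERT OVERWRITE TABLE b PARTITION (day='{DATE}')
SELECT date, acct, media, named_struct('id1', id1, 'val', CAST(sum(val) AS INT))
WHERE day='{DATE}' and media is not null and acct is not null and NOT (id1 = "0")
GROUP BY date, acct, media, id1;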

Dynamic Partitioning + CREATE AS on HIVE

I'm trying to create a new table from another table with CREATE AS and dynamic partitioning on the Hive CLI. I'm learning from the official Hive wiki, where there is this example:
CREATE TABLE T (key int, value string)
PARTITIONED BY (ds string, hr int) AS
SELECT key, value, ds, hr+1 hr1
FROM srcpart
WHERE ds is not null
And hr>10;
But I received this error:
FAILED: SemanticException [Error 10065]:
CREATE TABLE AS SELECT command cannot specify the list of columns for the target table
Source: https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions#DynamicPartitions-Syntax
Since you already know the full schema of the target table, try creating it first and then populating it with an INSERT OVERWRITE ... SELECT:
SET hive.exec.dynamic.partition.mode=nonstrict;
CREATE TABLE T (key int, value string)
PARTITIONED BY (ds string, hr int);
INSERT OVERWRITE TABLE T PARTITION(ds, hr)
SELECT key, value, ds, hr+1 AS hr
FROM srcpart
WHERE ds is not null
And hr>10;
Note: the set command is needed since you are performing a full dynamic partition insert.
In the above code, if the partitioning scheme is the same, you can use CREATE TABLE T LIKE srcpart; instead of the CREATE statement.
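Put together, a sketch of that variant (assuming srcpart's columns and partition columns are exactly what you want for T):
SET hive.exec.dynamic.partition.mode=nonstrict;
CREATE TABLE T LIKE srcpart;
INSERT OVERWRITE TABLE T PARTITION(ds, hr)
SELECT key, value, ds, hr+1 AS hr
FROM srcpart
WHERE ds is not null
and hr>10;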

Truncate decimals from column in SQL

How would I remove all the decimal places from a column in SQL Server 2008?
If I have a column X_Coord with three rows of values, how would I trim them so that there are NO decimals after the whole number?
For example, let's say my table is called RCMP, and the column is below:
X_Coord
---------------
5588790.77000
5588873.79000
5588943.71000
How would I remove the decimals in a single query?
I tried ROUND, but that ends up making the values appear as, e.g., 5588790.00000.
I want it to appear as: 5588790.
Cast the decimal data type to an integer.
SELECT CAST(x_coord AS INT)
FROM dbo.RCMP
Edit: I have updated the code to show how to change the column's data type. This is a high-impact change, so be very careful; I would urge you to test this in development first.
if object_id('tempdb..#Demo') is not null
    drop table #Demo;
go
create table #Demo (x_coord decimal(12,5))
insert into #Demo values (5588790.77000), (5588873.79000), (5588943.71000)
-- change the column type in place; the decimal portion is truncated by the conversion
alter table #Demo
ALTER COLUMN x_coord INT NULL
select *
from #Demo
GO
--or this works
if object_id('tempdb..#Demo') is not null
    drop table #Demo;
go
create table #Demo (x_coord decimal(12,5))
insert into #Demo values (5588790.77000), (5588873.79000), (5588943.71000)
-- add a new integer column and backfill it from the decimal column
alter table #Demo
add new_x_coord INT NULL
UPDATE #Demo SET new_x_coord = CAST(x_coord AS INT)
GO
-- don't drop anything until you confirm the data is good, and test this in development first!
ALTER TABLE #Demo
DROP COLUMN x_coord
-- #Demo is a temp table, so sp_rename has to be run in tempdb; for a permanent table use: exec sp_rename 'dbo.RCMP.new_x_coord', 'x_coord', 'COLUMN'
exec tempdb.sys.sp_rename '#Demo.new_x_coord', 'x_coord', 'COLUMN'
select *
from #Demo
GO