Optimizing Hive queries - mysql

I am trying to optimize a Hive query. I have partitioned my base table and stored it as an ORC file, as shown below.
create table if not exists processed (
plc string,
direction string,
table int,
speed float,
time string
) PARTITIONED BY (time_id bigint) STORED AS ORC;
I am running the query below on this table (it contains 500,000 records). The final result is stored as JSON. The whole transaction takes about 35 seconds. Is there a way I can reduce this time? Or maybe someone could suggest a different framework instead of Hive. This is the query:
String finalQuery = "select plc,direction,AVG(speed) as speed ,COUNT(plc) as count,time_id from processed WHERE plc IN "
+ " "
+ "("
+ plcCSV
+ ")"
+ " " + " " + "AND" + " " + "time_id =" + " " + time_id + " "
+ "group by plc,direction,time_id";

First of all, create an index on the plc column and then try.
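For example (an untested sketch): assuming a Hive version that still supports indexes (available from 0.7 up to 2.x; the feature was removed in Hive 3.0), a compact index on plc could be created over a HiveServer2 JDBC connection. The connection URL and the index name below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePlcIndex {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL; adjust host, port and database for your cluster.
        try (Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = con.createStatement()) {
            // Build a compact index on the column used in the IN (...) filter
            // (index support exists in Hive 0.7 - 2.x only).
            stmt.execute("CREATE INDEX idx_plc ON TABLE processed (plc) "
                    + "AS 'COMPACT' WITH DEFERRED REBUILD");
            // The index is empty until it is rebuilt.
            stmt.execute("ALTER INDEX idx_plc ON processed REBUILD");
        }
    }
}

Since the table is already partitioned by time_id, the time_id = ... equality predicate should already prune the scan to a single partition; the index only helps the plc IN (...) filter.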

Why does the SQL operation show the same output? [duplicate]

When I run these two SQL operations, the values for the two months (01 = January) and (02 = February) are summed together instead of individually, so they show the same output. But why?
If I want to output February (02), it should not include the January sum, but it does. Where is the problem in my code?
"SELECT SUM(" + dataBaseHelper.GET_SUM + ") FROM " + dataBaseHelper.TABLE_NAME + " WHERE " + dataBaseHelper.DATE + " BETWEEN '01/01/2022' AND " + "'07/01/2022'"
and
"SELECT SUM(" + dataBaseHelper.GET_SUM + ") FROM " + dataBaseHelper.TABLE_NAME + " WHERE " + dataBaseHelper.DATE + " BETWEEN '01/02/2022' AND " + "'07/02/2022'"
The standard date syntax in MySQL is yyyy-mm-dd. I think you could rewrite the dates in your WHERE ... BETWEEN clause with that format in mind. Otherwise, read the specifications for casting strings to dates.
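For example, a minimal sketch of the two query strings rewritten with ISO-format date literals, reusing the dataBaseHelper constants from the question (and assuming the column stores DATE values or ISO-formatted strings):

// January 1-7, 2022
String januaryQuery =
        "SELECT SUM(" + dataBaseHelper.GET_SUM + ") FROM " + dataBaseHelper.TABLE_NAME +
        " WHERE " + dataBaseHelper.DATE + " BETWEEN '2022-01-01' AND '2022-01-07'";

// February 1-7, 2022
String februaryQuery =
        "SELECT SUM(" + dataBaseHelper.GET_SUM + ") FROM " + dataBaseHelper.TABLE_NAME +
        " WHERE " + dataBaseHelper.DATE + " BETWEEN '2022-02-01' AND '2022-02-07'";

In yyyy-mm-dd form, string order matches chronological order, so the two BETWEEN ranges no longer overlap and each SUM covers only its own month.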

Data truncation error using MySQL JDBC when inserting data from a CSV file into a table

I am inserting data from a .csv file into my MySQL database table in Eclipse using "load data local infile". However, I am getting the error message shown below.
Exception in thread "main" com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect date value: '1894' for column 'startYear' at row 1
Sample Data:
tconst_titles,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes
tt1,short,Carmencita,Carmencita,0,1894,0000,1
tt2,short,Le clown et ses chiens,chiens,0,1892,0000,5
A similar thread mentioned that it has to do with the date values not being in the correct format. However, I declared "startYear" as a DATE when creating the table, so it should recognize '1894' as a year, shouldn't it?
Code for creating the table:
String sql1 = "CREATE TABLE Titles" +
"(tconst_titles VARCHAR(255) PRIMARY KEY, " +
" titleType VARCHAR(255), " +
" primaryTitle VARCHAR(255), " +
" originalTitle VARCHAR(255), " +
" isAdult TINYINT(1), " +
" startYear DATE, " +
" endYear DATE, " +
" runtimeMinutes INT)";
Code for inserting data from .csv file:
import java.sql.*;
public class populate {
public static void main(String[] args) throws SQLException {
Connection con = DriverManager.getConnection("jdbc:mysql://localhost:3306/IMDB","root","user");
Statement stmt = con.createStatement();
String sql =
"load data local infile 'titles.csv' \n" +
" replace \n" +
" into table Titles \n" +
" columns terminated by '\\t' \n" +
" ignore 1 lines";
stmt.execute(sql);
}
}
https://dev.mysql.com/doc/refman/5.7/en/datetime.html says:
The DATE type is used for values with a date part but no time part. MySQL retrieves and displays DATE values in 'YYYY-MM-DD' format. The supported range is '1000-01-01' to '9999-12-31'.
Your data is only YYYY. This has no month or day, so it's not a date in the format required by MySQL.
If you only want to store a year with no month or day, use the YEAR data type if you have values in the supported range of 1901 - 2155.
If you have other years (like you have 1894), use SMALLINT UNSIGNED.
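A sketch of the table definition with the year columns changed accordingly; SMALLINT UNSIGNED is used here because the sample data contains 1894, which falls outside the YEAR range:

String sql1 = "CREATE TABLE Titles" +
        "(tconst_titles VARCHAR(255) PRIMARY KEY, " +
        " titleType VARCHAR(255), " +
        " primaryTitle VARCHAR(255), " +
        " originalTitle VARCHAR(255), " +
        " isAdult TINYINT(1), " +
        " startYear SMALLINT UNSIGNED, " +  // plain four-digit years such as 1894 and 0000 fit here
        " endYear SMALLINT UNSIGNED, " +
        " runtimeMinutes INT)";

Note that the sample rows are comma-separated, so the columns terminated by '\\t' clause in the LOAD DATA statement would also need to use ',' for the fields to line up with these columns.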

Teradata create table auto-increment column error

I am trying to import my CSV file into Teradata using a Teradata FastLoad script.
I also tried adding an auto-increment column.
This is my CSV file:
Word,country,sale,week
hi,USA,26.17,11/22/15-11/28/15
bye,USA,16.5,11/22/15-11/28/15
Code snippet:
String tableName = "my_db.mytable";
String createTable = "CREATE TABLE " + tableName + "," +
"NO FALLBACK," +
"NO BEFORE JOURNAL," +
"NO AFTER JOURNAL," +
"CHECKSUM = DEFAULT" +
"(" +
" id decimal(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 NO CYCLE),"+
" word VARCHAR(500) CHARACTER SET UNICODE," +
" country VARCHAR(50)," +
" sale FLOAT," +
" week VARCHAR(30)" +
") " +
"PRIMARY INDEX (id)";
// INSERT statement
String insertTable = "INSERT INTO " + tableName + " VALUES(?,?,?,?,?)";
The error I got:
Row 1 in FastLoad table my_db.mytable_ERR_1 contains the following data:
ErrorCode=2673
ErrorFieldName=F_id
ActualDataParcelLength=55
DataParcel: byte array length 55 (0x37), offset 0 (0x0), dump length 55 (0x37)
This doesn't look like a FastLoad script; is this part of a JDBC FastLoad?
2673 The source parcel length does not match data that was defined.
Your input data is comma-delimited text, so you must define all columns as VARCHAR.
And there are four columns in your input file, but you specify five in the INSERT. As the name implies, a GENERATED ALWAYS identity value is generated automatically, so you should not supply one.
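Applied to the snippet in the question, that could look like the sketch below (untested, and assuming your Teradata release accepts an identity column on a JDBC FastLoad target): the sale column is declared as VARCHAR since the CSV field arrives as text, and the identity column is left out of the INSERT so Teradata generates it.

String tableName = "my_db.mytable";

// Data columns declared as VARCHAR to match the comma-delimited text input.
String createTable = "CREATE TABLE " + tableName + ", " +
        "NO FALLBACK, " +
        "NO BEFORE JOURNAL, " +
        "NO AFTER JOURNAL, " +
        "CHECKSUM = DEFAULT " +
        "(" +
        " id DECIMAL(10,0) NOT NULL GENERATED ALWAYS AS IDENTITY " +
        "(START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 2147483647 NO CYCLE)," +
        " word VARCHAR(500) CHARACTER SET UNICODE," +
        " country VARCHAR(50)," +
        " sale VARCHAR(30)," +
        " week VARCHAR(30)" +
        ") " +
        "PRIMARY INDEX (id)";

// Only the four CSV-backed columns are listed, so only four parameters are bound.
String insertTable = "INSERT INTO " + tableName +
        " (word, country, sale, week) VALUES (?,?,?,?)";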

Insert statement difficulty

This is where I am getting my info from: when I choose the address, it fills in all the info.
But the problem starts when I try to add a renter to the Renter table after I have deleted a renter. That table no longer shows rows for all AddressIDs, so I am trying to insert the AddressID from the Property table as well. I hope this makes sense.
I can't insert pictures yet, but here is what it looks like when I choose a property (rentals):
if ( ( evt.getStateChange() == java.awt.event.ItemEvent.SELECTED ) &&
( PropertyComboBox.getSelectedIndex() != 0 ) )
{
Address = ( String ) PropertyComboBox.getSelectedItem();
try {
myResultSet = myStatement.executeQuery(
"SELECT Property.Address,Property.AddressID,Property.RentAmt, Renter.RenterID, Renter.AddressID, Renter.FirstName, Renter.LastName, Renter.CellPhone, Renter.DepositPaid,Renter.DepositAmtPaid " +
"FROM Property, Renter " +
"WHERE Property.Address = '" + Address + "'" + "AND Renter.AddressID = Property.AddressID" );
if (myResultSet.next())
{
renterID = (myResultSet.getString("Renter.RenterID"));
addressID = (myResultSet.getString("Property.AddressID"));
txtRentAmt.setText(myResultSet.getString("Property.RentAmt"));
txtShowAddressID.setText(myResultSet.getString("Property.AddressID"));
txtShowRenterID.setText(myResultSet.getString("Renter.RenterID"));
txtFirstName.setText(myResultSet.getString("Renter.FirstName"));
txtLastName.setText(myResultSet.getString("Renter.LastName"));
txtCellPhone.setText(myResultSet.getString("Renter.CellPhone"));
txtDepositPaid.setText(myResultSet.getString("Renter.DepositPaid"));
txtDepositAmtPaid.setText(myResultSet.getString("Renter.DepositAmtPaid"));
if(myResultSet.getString("Renter.DepositPaid") == ("Y"))
{
txtDepositPaid.setText("Y");
}
else
{
txtDepositPaid.setText("N");
}
}
}
Can someone help me with this? I am trying to insert a new renter from a NetBeans JForm into my database. The AddressID (PK, auto-increment) from the Property table should automatically insert into the Renter table's AddressID (FK, auto-increment, or so I thought).
It will insert if I use this statement, but then the AddressID shows as NULL, not the AddressID from the Property table, which I need. I've been working on this since Saturday. Please help! It seems very simple, yet I cannot figure it out.
ls_query = "INSERT INTO Renter (FirstName,LastName,CellPhone,DepositPaid,DepositAmtPaid)"
+ " VALUES (" + addressID + ",'"
+ addFirstName + "','"
+ addLastName + "','"
+ addCellPhone + "','"
+ addDepositPaid + "',"
+ addDepositAmtPaid + ")" + " WHERE Property.AddressID = " + addressID ;
INSERT plus WHERE? I guess you need UPDATE, not INSERT: http://dev.mysql.com/doc/refman/5.0/en/update.html
EDIT: it's not clear; are you mixing an INSERT into one table with a WHERE on another table? Just do "INSERT ... (fields) VALUES (values)" without the WHERE and include AddressID in the field list.
You need to specify AddressID in the field list.
...INTO Renter (AddressID, FirstName...
Assuming that you specify all columns in the table, you can omit the field list.
You may also be more comfortable with the INSERT ... SET syntax.
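Putting the two answers together, a sketch of the statement with AddressID added to the field list and the WHERE clause removed (variable names as in the question):

// AddressID was read from the Property row selected in the combo-box handler above.
ls_query = "INSERT INTO Renter (AddressID, FirstName, LastName, CellPhone, DepositPaid, DepositAmtPaid)"
        + " VALUES (" + addressID + ", '"
        + addFirstName + "', '"
        + addLastName + "', '"
        + addCellPhone + "', '"
        + addDepositPaid + "', "
        + addDepositAmtPaid + ")";

A PreparedStatement with ? placeholders would sidestep the quoting and SQL-injection problems that come with string concatenation, but the concatenated form above stays closest to the original code.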

How to add group_concat to hsqldb with distinct?

I am trying to add the group_concat function to HSQLDB so that I can properly test a query as a unit/integration test. The query works fine in MySQL, so I need it to work in HSQLDB (hopefully).
// GROUP_CONCAT
jdbcTemplate.update("DROP FUNCTION GROUP_CONCAT IF EXISTS;");
jdbcTemplate.update(
"create aggregate function group_concat(in val varchar(100), in flag boolean, inout buffer varchar(1000), inout counter int) " +
" returns varchar(1000) " +
" contains sql " +
"begin atomic " +
" if flag then" +
" return buffer;" +
" else" +
" if val is null then return null; end if;" +
" if buffer is null then set buffer = ''; end if;" +
" if counter is null then set counter = 0; end if;" +
" if counter > 0 then set buffer = buffer || ','; end if;" +
" set buffer = buffer + val;" +
" set counter = counter + 1;" +
" return null;" +
" end if;" +
"end;"
);
Adding this aggregate function solves most of the problem: it correctly behaves like MySQL's group_concat. However, what it won't do is let me use the DISTINCT keyword like this:
group_concat(distinct column)
Is there any way to factor in the distinct keyword? Or do I rewrite the query to avoid the distinct keyword altogether?
HSQLDB has a built-in GROUP_CONCAT that accepts DISTINCT.
http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#dac_aggregate_funcs
At the moment you cannot add DISTINCT to a user-defined aggregate function, but this looks like an interesting feature to allow in the future.
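For example, a minimal sketch of a query that relies on the built-in aggregate instead of the user-defined one above (using the same jdbcTemplate as in the question; the my_table, category and name identifiers are placeholders for illustration only):

// HSQLDB's built-in GROUP_CONCAT accepts DISTINCT, ORDER BY and SEPARATOR.
List<Map<String, Object>> rows = jdbcTemplate.queryForList(
        "SELECT category, " +
        "GROUP_CONCAT(DISTINCT name ORDER BY name SEPARATOR ',') AS names " +
        "FROM my_table " +
        "GROUP BY category");

This keeps the test query identical to the MySQL version, since MySQL's GROUP_CONCAT supports the same DISTINCT, ORDER BY and SEPARATOR options.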