How can I strip dollar signs ($) from a field in Hive? - csv

I'm trying to import a CSV into Hive. I have a column which is a dollar value and is reported within the CSV as '$123,244.00'. I would like to convert this value into a float in Hive.
So I've loaded the csv into a temporary table, treating that column as a string. Next I want to load it into the final table, and in the process convert that string into a float or decimal.
Any suggestions on the best way to go about doing this?

This should work:
select float(regexp_replace(substr('$123,244.00', 2, length('$123,244.00')), ',', '')) from table;
You need to remove any commas as well as the dollar sign. You may find this link helpful as well: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-NumericTypes
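The same cleanup logic (drop the dollar sign, drop the commas, then convert) can be sketched outside Hive. This is only an illustration in Python, not Hive code; the function name `parse_dollars` is made up for the example, and it uses `Decimal` rather than a float so the cents are kept exactly.

```python
from decimal import Decimal


def parse_dollars(text: str) -> Decimal:
    """Remove the '$' sign and thousands commas, then convert to Decimal."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


if __name__ == "__main__":
    print(parse_dollars("$123,244.00"))
```

If a float is really what you want, `float(cleaned)` works on the same cleaned string.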

Related

How to store integer data with comma in it?

I am having some difficulty storing integer data that contains commas. I have prices like 4,600, so I need to store them with commas, but when I try to send the value as an integer it gets cut off after the first number. I tried changing the column type to BigInt or Double, but it has no effect. Is there any way to do that?
I also tried changing the comma to a dot ("."), but then MySQL drops the trailing "0"... I don't know why...
Prices
------
4,500
2,300
1,500
Because you're using a comma, MySQL most likely interprets the number as two fields, separated by the comma. For example:
Prices,Unspecified
------,-----------
4 ,500
2 ,300
1 ,500
In the numbers in question: If the comma is a thousands separator, remove it (via String replace) before trying to store the number. If it's a decimal point, replace it with a period (via String replace) and store it as a DOUBLE (or DECIMAL if you need high accuracy for large numbers).
If you want to display the number with a comma, use String formatting (possibly a number-formatting function other than String.format() or sprintf()) after retrieving the value from the database.
If you want to be able to do calculations on those numbers in SQL queries, then you need to store the price either as a DECIMAL or as an integer type, multiplying the number by e.g. 1000 before saving.
DOUBLE or any other floating-point representation is not suitable for price calculations or storage.
If you use DECIMAL, you need to convert the number from your local format (e.g. 4,500) to the format the database expects when you store it, and convert it back to the local format when you retrieve your data.
If you store it as a string you can keep your local format, but that's the worst solution and should never be used.
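A rough sketch of that round trip, written in Python for illustration and assuming the comma is a thousands separator: strip the comma before the value goes to the database, and add it back only when displaying. The function names `to_db_value` and `to_local` are hypothetical.

```python
from decimal import Decimal


def to_db_value(local: str) -> Decimal:
    """Convert a locally formatted price such as '4,500' to a plain number."""
    return Decimal(local.replace(",", ""))


def to_local(value: Decimal) -> str:
    """Format a stored number for display with a comma thousands separator."""
    return f"{value:,}"


if __name__ == "__main__":
    stored = to_db_value("4,500")
    print(stored)
    print(to_local(stored))
```

The database column then only ever sees plain numeric values, so sums and comparisons in SQL behave as expected.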

redshift unload : Putting quotes only on character fields and not numeric

I am trying to run the unload command on Redshift to dump data from a table into a CSV file. This table has character and numeric fields. The character fields may contain a comma (,), so I need quotes around them. However, I don't need quotes around my numeric columns.
The following command is the closest I have come, but I can't seem to get rid of the quotes around my numeric data. How can I achieve the desired result?
unload ('select * from mytable') to
's3://mybucket/path/file.csv'
DELIMITER ',' ADDQUOTES
This results in data like:
"Henry, Jr","23","4.5"
"Henry, Sr","56","4.2"
What I would like is :
"Henry, Jr",23,4.5
"Henry, Sr",56,4.2
From reading the official documentation, it seems like that's not possible.
I can suggest two potential workarounds:
1) wrap your string columns with quotes in the query, i.e. instead of
select * from mytable
have
select int_col_1, int_col_2, '"'||str_col_1||'"','"'||str_col_2||'"' from mytable
2) export tab delimited files so the commas in text columns stop being a problem
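To make the effect of workaround 1 concrete, here is a small Python sketch that mimics what the `'"'||str_col||'"'` concatenation does per row: string values get wrapped in double quotes, numeric values are written bare. It is not Redshift code, and `format_row` is a made-up helper. Like the SQL version, it does not escape double quotes that appear inside a value.

```python
def format_row(row):
    """Quote string fields only, mirroring '"' || col || '"' in the query."""
    fields = []
    for value in row:
        if isinstance(value, str):
            fields.append('"' + value + '"')  # character column: add quotes
        else:
            fields.append(str(value))  # numeric column: leave as-is
    return ",".join(fields)


if __name__ == "__main__":
    for row in [("Henry, Jr", 23, 4.5), ("Henry, Sr", 56, 4.2)]:
        print(format_row(row))
```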

Preserving decimal values in SSIS

I have a column in my .csv file with values such as 1754625.24, whereas we have to save it as an integer in our database. So I am trying to split the number at the '.' and divide the second part by 1000 (24/1000), as I want a 3-digit number,
so I get 0.024. But I am having issues storing/preserving that value as a decimal.
I tried (DT_DECIMAL,3) conversion but i get result as '0'.
My idea is to then append '024' part to original first part. So my final result should look like 1754625024
Please help
I am not sure why you would store 1754625.24 as 1754625024 when storing it as an int.
Still, for your case, you can use a Derived Column transformation and
apply the REPLACE function to the source column from the CSV, e.g.:
REPLACE("1754625.24", ".", "0")
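The replace trick can be checked quickly outside SSIS. The Python sketch below (the function name is made up) swaps the decimal point for a "0" and converts the result to an integer. It assumes every value has exactly two digits after the point; a value like 1754625.5 would not give a three-digit suffix.

```python
def decimal_to_padded_int(text: str) -> int:
    """Replace the decimal point with '0' so '.24' becomes the suffix '024'."""
    return int(text.replace(".", "0"))


if __name__ == "__main__":
    print(decimal_to_padded_int("1754625.24"))
```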

Convert datatypes in Access Insert

OK, here is my problem. I have a CSV file, created outside my control, that holds data for several different groupings in the same file. The first seven lines are table headers, which differ for each group. I first import this file into a single Access table, and I have since created queries to pull out the individual groups for data analysis. The problem is that I need to use an expression on one of the fields, but the field has to be text for the import from the spreadsheet to work: each column contains both numbers and characters because of the headers at the top, and sometimes the data is in the wrong column and needs to be massaged. So I want to insert each group into its own table and convert some of the columns to numbers so that my expression will work. Here is the expression I am having problems with. Thanks.
Sum(IIf([2000 Query].[Field19]=1,IIf([5000 Query].[Field21]>0,-[5000 Query].[Field21],[5000 Query].[Field21]),[5000 Query].[Field21])) AS [ADJ Invoice Total]
CDec:
IIf(CDec([2000 Query].[Field19])=1 ...
It works like so:
?cdec(" 20,121.34 ")
20121.34
So commas and leading and trailing spaces should be okay.
CDec is available in VBA but not in MS Access queries. In queries, Val will work:
IIf(Val([2000 Query].[Field19])=1 ...
Or CDbl, which will accept comma thousand separators and leading and trailing spaces.

CSV with Semi-Colon as delimiter

Has anyone ever written PeopleCode to read a CSV file that uses semicolons as the delimiter and a comma as the decimal separator?
I need to read an Italian CSV file that uses those characters instead of the normal comma and decimal point.
What was your experience? Anything to watch out for?
Two options, one using file layout and the other without.
Option A) Using file layout: Consider below properties of filelayout
Definition Delimiter = "semicolon"
FieldType for the numeric field having comma as decimal divider = "character"
After reading the field, replace comma with a period and use value(&new_str) on the new string to convert it to number
Option B) Without file layout:
Open the input file in your code.
Loop through each line.
Use split to fetch field values- e.g.
&ret_arr = split(&str_line,";");
&ret_arr array will be populated with field values, access them using &ret_arr[1],..[2] etc.
Replace comma from that numeric field and use value(&new_str) for conversion.
Above was my experience (long back), nothing else to watch out for. Hope this helps!
If you are using a file layout, it probably won't read the commas in as the decimal separator, although you can tell it to use the semicolon as the delimiter. Your other option is to read all fields in as text, then replace the comma with a period in the number fields and call value() on the string to convert it to a number.
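The "read everything as text, then fix the numbers" approach looks like this when sketched in Python rather than PeopleCode. The sample data, the function name `read_semicolon_csv`, and the choice of which columns are numeric are all assumptions made for the example.

```python
import csv
import io
from decimal import Decimal


def read_semicolon_csv(text, numeric_columns):
    """Split on ';' and convert comma-decimal fields to Decimal."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=";"):
        for i in numeric_columns:
            # Swap the decimal comma for a period before converting.
            record[i] = Decimal(record[i].replace(",", "."))
        rows.append(record)
    return rows


if __name__ == "__main__":
    sample = "Rossi;12,50\nBianchi;3,75\n"
    for row in read_semicolon_csv(sample, numeric_columns=[1]):
        print(row)
```

If the file also uses a period as a thousands separator, that would have to be removed before the comma is swapped; the sketch does not handle that case.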