Failing to load a 4-column CSV file into Octave - output is only the first column, or 1 array per line

Trouble loading a CSV file into Octave.
EDIT: as pointed out by Andy and Eliahu Aaron, I changed ; to ,.
csvread now returns 4 separate columns, each named after the first line.
My MATLAB script throws these errors:
error: 'z' undefined near line 13 column 3
error: called from myScript at line 13 column 2
It can't find z even though there is now a column called z from which it should calculate.
This fixed my issue in the end:
g = cell2mat(A(2:end-1,2));
My csv looks like this:
time;z;y;x
5;15084;-1360;-9664
7;15280;-1296;-9784
10;15032;-1384;-9688
30;15160;-1548;-9772
56;15116;-1532;-9660
First I had to delete the first row, because otherwise the matrix was unreadable for Octave.
If I try to csv2cell my file, I only get 1 column, with all the values of each line in a single cell:
mycsvdata = csv2cell("file.csv")
If I try csvread, I get 1 column named "ans" with the values of the first column... the 2nd, 3rd and 4th columns are ignored:
csvread("file.csv")
When I drag and drop the same CSV into MATLAB and click on the green tick, every column is named after its first cell and becomes a variable. I end up having 4 variables called time, z, y and x.
In Octave this seems impossible for me to achieve.
What am I doing wrong?
This seems to be such a basic problem, but I haven't come across a solution on the internet.
I need to get 4 variables called time, z, y and x, each holding all the values from the 1st (time), 2nd (z), 3rd (y) and 4th (x) column.
I am new to Octave and have code written for MATLAB, which I want to port to Octave. I am not even able to test my code, because I am not able to load the CSV properly. This is very frustrating for me.
Thanks in advance.

CSV by default uses , as the column delimiter, but your file has ; as the column delimiter.
You can use dlmread("file.csv", ";") instead of csvread, but it can't read the first row time;z;y;x.
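dlmread can also skip that header row for you via its row/column offsets (these are zero-based, and this is standard dlmread syntax):
M = dlmread("file.csv", ";", 1, 0);  # start at row 1 (skip the header), keep all columns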
You can use csv2cell("file.csv", ";"); the first row will be strings and the rest numbers.
To create a struct array with fields time, z, y and x you can use the following code:
pkg load io
A = csv2cell("file.csv", ";");
B = cell2struct(A(2:end,:),A(1,:),2);
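If you instead want the four plain vectors that the MATLAB import tool gives you, a minimal sketch building on the cell array A from the snippet above (the variable names mirror the header row):
time = cell2mat(A(2:end, 1));  # 1st column: time
z    = cell2mat(A(2:end, 2));  # 2nd column: z
y    = cell2mat(A(2:end, 3));  # 3rd column: y
x    = cell2mat(A(2:end, 4));  # 4th column: x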

Related

Preserving decimal values in SSIS

I have a column from my .csv file coming in with values like 1754625.24, whereas we have to save it as an integer in our database. So I am trying to split the number on '.' and divide the second part by 1000 (24/1000), as I want a 3-digit number.
That gives me 0.024, but I am having issues storing/preserving that value as a decimal.
I tried a (DT_DECIMAL,3) conversion but I get the result '0'.
My idea is to then append the '024' part to the original first part, so my final result should look like 1754625024.
Please help.
I am not convinced why you would store 1754625.24 as 1754625024 when storing it as an int.
But still, for your case, we can use a derived column task and
use the REPLACE command on the source column of the CSV. E.g.:
REPLACE("1754625.24", ".", "0")
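In a Derived Column transformation, with the source column swapped in for the literal, that could look like this (MyCsvColumn is a placeholder; note the trick only pads correctly because there are exactly two digits after the decimal point, so replacing '.' with '0' turns .24 into 024):
(DT_I8)REPLACE(MyCsvColumn, ".", "0")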

AWS Data Pipeline RedShift "delimiter not found" error

I'm working on a data pipeline. In one of the steps, a CSV from S3 is consumed by a RedShift DataNode. My RedShift table has 78 columns. Checked with:
SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'my_table';
After the RedshiftCopyActivity fails, the 'stl_load_errors' table shows a "Delimiter not found" (1214) error for line number 1, for the column namespace (this is the second column, varchar(255)) at position 0. The consumed CSV line looks like this:
0,my.namespace.string,2119652,458031,S,60,2015-05-02,2015-05-02 14:51:02,2015-05-02 14:51:14.0,1,Counter,1,Counter 01,91,Chaymae,0,,,,227817,1,Dine In,5788,2015-05-02 14:51:02,2015-05-02 14:51:27,17.45,0.00,0.00,17.45,,91,Chaymae,0,0.00,12,M,A,-1,13,F,0,0,2,2.50,F,1094055,Coleslaw Md Upt,8,Sonstige,900,Sides,901,Sides,0.00,0.00,0,,,0.0000,0,0,,,0.00,0.0000,0.0000,0,,,0.00,0.0000,,1,Woche Counter,127,Coleslaw Md Upt,2,2.50
After a simple replacement ("," to "\n") I have 78 lines, so it looks like the data should match up... I'm stuck on that. Maybe someone knows how I can find more information about the error, or sees the solution?
EDIT
Query:
select d.query, substring(d.filename,14,20),
d.line_number as line,
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id();
returns 0 rows.
I figured it out and maybe it will be useful for someone else:
There were in fact two problems.
My first field in the RedShift table was of type INT IDENTITY(1,1), and in the CSV I had a 0 value there. After removing the first column from the CSV, everything was copied without a problem, even without a specified column mapping, if...
the DELIMITER ',' commandOption was added to the S3ToRedshiftCopyActivity to force using the comma. Without it, RedShift recognized the dot in namespace (my.namespace.string) as a delimiter.
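For reference, the COPY the activity then issues looks roughly like this (bucket, table and credentials are placeholders):
COPY my_table
FROM 's3://your-bucket/your-file.csv'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopyRole'
DELIMITER ',';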
You need to add FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt'. AWS has not documented this well. Please note that this only works when your data is in JSON form, like:
{"attr1": "val1", "attr2": "val2"} {"attr1": "val1", "attr2": "val2"}
{"attr1": "val1", "attr2": "val2"} {"attr1": "val1", "attr2": "val2"}
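A COPY using such a JSONPaths file would then look something like this (bucket and credentials are placeholders):
COPY my_table
FROM 's3://your-bucket/data.json'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS JSON 's3://yourbucketname/aJsonPathFile.txt';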

Why does SSIS TOKEN function fail to count adjacent column delimiters?

I ran into a problem with SQL Server Integration Services 2012's new string function in the Expression Editor called TOKEN().
This is supposed to help you parse a delimited record. If the record comes out of a flat file, you can do this with the Flat File Source. In this case, I am dealing with old delimited import records that were stored as strings in a database VARCHAR field. Now they need to be extracted, massaged, and re-exported as delimited strings. For example:
1^Apple^0001^01/01/2010^Anteater^A1
2^Banana^0002^03/15/2010^Bear^B2
3^Cranberry^0003^4/15/2010^Crow^C3
If these strings are in a column called OldImportRecord, the delimiter is a caret (as shown), and we wish to put the fifth field into a Derived Column, we would use an expression like:
TOKEN(OldImportRecord,"^",5)
This returns Anteater, Bear, Crow, etc. In fact, we can create Derived Columns for each of the fields in this record (note that the index is one-based), change them as needed, and then build another delimited record for export.
Here's the problem. What if some of our data includes some empty strings (or Nulls rendered as empty strings)?
4^^0004^6/15/2010^Duck^D4
TOKEN() fails to count the adjacent column delimiters, which throws off the column count: now it only sees five columns instead of six. Our TOKEN(OldImportRecord,"^",5) returns "D4" instead of the intended "Duck". When we extract the fourth column, we wind up trying to put "Duck" into a Date column, and all sorts of fun ensues.
Here's a partial workaround:
TOKEN(REPLACE(OldImportRecord,"^^","^ ^"),"^",5)
Notice this misses every second delimiter pair, so it will fail for a string like "5^^^^Emu^E5", which looks like "5^ ^^ ^Emu^E5" after the REPLACE(). The column count is still wrong.
So here's my full workaround. This includes two nested REPLACE() statements, an RTRIM() to remove the superfluous spaces, and a DT_STR cast because I would like to keep the result as VARCHAR:
(DT_STR,255,1252)RTRIM(TOKEN(REPLACE(REPLACE(OldImportRecord,"^^","^ ^"),"^^","^ ^"),"^",5))
I am posting this for information, since others may also run into this problem.
Does anyone have a better workaround, or even a real solution?
Reason for the issue:
The TOKEN method in SSIS uses the implementation of the strtok function in C++. I gathered this information while reading the book Microsoft® SQL Server® 2012 Integration Services; it is mentioned in a note on page 113 (I like this book! Lots of nice information.).
I searched for the implementation of the strtok function and found the following links.
INFO: strtok(): C Function -- Documentation Supplement - The code sample in this link shows that the function does ignore consecutive delimiter characters.
The answers to the following SO questions point out that the strtok function is designed to ignore consecutive delimiters.
Need to know when no data appears between two token separators using strtok()
strtok_s behaviour with consecutive delimiters
I think that the TOKEN and TOKENCOUNT functions are working as per design but whether that is how SSIS should behave might be a question for the Microsoft SSIS team.
Original post (the section above is an update):
I created a simple package in SSIS 2012 based on your data inputs. As you had described in your question, the TOKEN function does not behave as intended. I agree with you that the function doesn't seem to work. This post is not an answer to your original issue.
Here is an alternative way to write the expression in a relatively simpler fashion. This will only work if the last segment in your input record will always have a value (say A1, B2, C3 etc.).
The expression can be rewritten as:
(DT_STR,50,1252)TOKEN(OldImportRecord,"^",TOKENCOUNT(OldImportRecord,"^") - 1)
This statement takes the input record as the first parameter and the delimiter caret (^) as the second parameter. The third parameter calculates the total number of segments in the record when split by the delimiter; if you have data in the last segment, you are guaranteed at least two segments, so you can subtract 1 to fetch the penultimate segment.
I created a simple package with a data flow task. An OLE DB source retrieves the data, and the derived column transformation parses and splits it as described above; the output is then inserted into the destination table. The destination table has two columns: the first stores the penultimate segment's data, and the second stores the segment count based on the delimiter (which again isn't correct). You can see that the last record doesn't fetch the correct results; if the last record didn't have the value 8, the above expression would fail, because it would evaluate to a zero index.
Hope that helps to simplify your expression.
If you don't hear from anyone else, I would recommend logging this issue in Microsoft Connect website.
Create table and populate scripts:
CREATE TABLE [dbo].[SourceTable](
[OldImportRecord] [varchar](50) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DestinationTable](
[NewImportRecord] [varchar](50) NOT NULL,
[CaretCount] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.SourceTable (OldImportRecord) VALUES
('1^Apple^0001^01/01/2010^Anteater^A1'),
('2^Banana^0002^03/15/2010^Bear^B2'),
('3^Cranberry^0003^4/15/2010^Crow^C3'),
('4^^0004^6/15/2010^Duck^D4'),
('5^^^^Emu^E5'),
('6^^^^Geese^F6'),
('^^^^Pheasant^G7'),
('8^^^^Sparrow^');
GO
(Screenshots: the derived column transformation inside the data flow task, and the data in the source and destination tables.)
Not only does TOKEN skip adjacent delimiters, it skips leading and trailing delimiters as well. So, using your example, if you had a "good" field that looks like this:
1^Apple^0001^01/01/2010^Anteater^A1
Followed by one with adjacent and leading delimiters like this:
^^^0004^6/15/2010^Duck^
TOKENCOUNT would only find two delimiters and you'd end up with 0004 assigned to Token1, 6/15/2010 for Token2, and Duck for Token3.
I used a different kind of replace. Rather than placing spaces between adjacent delimiters, which wouldn't help with leading or trailing ones, I used REPLACE to surround the delimiters with characters I absolutely wouldn't find in my text. The following expression works well for me. It's wordy, but it is what it is.
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",1),"~","")
Of course, you'd replace the number 1 with whatever Token you wanted and adjust the cast according to your needs. Hope that helps.
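For instance, for the fifth field of the sample data (Anteater, Duck, Emu, ...), the same pattern would be:
(DT_STR,255,1252)REPLACE(TOKEN(REPLACE(OldImportRecord,"^","~^~"),"^",5),"~","")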

Detecting the value 0x in a varchar column sourced from Excel

I have a SQL table that gets populated via SQLBulkCopy from Excel. The copy down is done using the Microsoft ACE drivers.
I had a problem with one particular file: when it was loaded down to SQL, some of the columns (which appear empty in Excel) contained an odd value.
For example, running this sql:
SELECT
CONVERT(VARBINARY(10),MyCol),
LEN(MyCol)
FROM MyTab
would return
0x, 0
i.e. converting the value in the column to varbinary shows something, but taking the length of the varchar shows no length. I realise that the value shown is the stem of a hex value, but it's weird that it gets there, and hard to detect.
Obviously I can just clear out the cells in Excel, but I really need to detect this automatically, as end users will have the same issue. It is causing issues further down the line when the data gets processed. It's quite hard to trace the problem back from its eventual symptoms to this issue in the source.
Other than the above conversion to varbinary to output in SSMS I've not come up with a way of detecting these values, either in Excel or via a SQL script to remove them.
Any ideas?
This may help you:
-- Conversion from hex string to varbinary:
DECLARE @hexstring varchar(max);
SET @hexstring = 'abcedf012439';
SELECT CAST('' AS xml).value('xs:hexBinary( substring(sql:variable("@hexstring"), sql:column("t.pos")) )', 'varbinary(max)')
FROM (SELECT CASE SUBSTRING(@hexstring, 1, 2) WHEN '0x' THEN 3 ELSE 0 END) AS t(pos);
GO
-- Conversion from varbinary to hex string:
DECLARE @hexbin varbinary(max);
SET @hexbin = 0xabcedf012439;
SELECT '0x' + CAST('' AS xml).value('xs:hexBinary(sql:variable("@hexbin") )', 'varchar(max)');
GO
One method is to add a new column, convert the data, drop the old column and rename the new column to the old name.
As Martin points out above, 0x is what you get when you convert an empty string, e.g.:
SELECT CONVERT(VARBINARY(10),'')
So the problem of detecting it obviously goes away.
I have to assume that there is some rubbish in the Excel cell that is being filtered out during the write-down by either the ACE driver or the SQLBulkCopy. Because there was something in the field originally, the value written is empty instead of null.
In order to make sure that everything is consistent in the data, we'll need to run a post-process to switch all empty values to nulls so that the next lot of scripts works.
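A minimal sketch of that post-process, reusing the MyTab/MyCol names from the question:
-- Empty strings are what show up as 0x under the varbinary conversion
UPDATE MyTab
SET MyCol = NULL
WHERE MyCol = '';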

Load file into mysql

I need to load a tab-delimited file into a MySQL database. My table is set up with columns ID;A;B;C;D;E, and my file is a dump of columns ID and D. How can I load this file into my db and replace just columns ID and D, without changing the values of columns C and E? When I load it in now, columns C and E are changed to DEFAULT: NULL.
I already answered a similar question here, but in your case you'd want to load the file into a temporary table, then use a simple UPDATE statement to copy the specific columns from your temporary table to your production table.
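A minimal sketch of that approach (the file path, temp-table name and column types are assumptions; the production table is called the_table here):
-- Load the two-column dump into a temporary table, then update in place.
CREATE TEMPORARY TABLE tmp_import (ID INT, D VARCHAR(255));

LOAD DATA LOCAL INFILE '/path/to/dump.tsv'
INTO TABLE tmp_import
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(ID, D);

UPDATE the_table t
JOIN tmp_import i ON i.ID = t.ID
SET t.D = i.D;

DROP TEMPORARY TABLE tmp_import;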
You can update specific column using this command:
UPDATE the_table
SET D=<value-of-D>
WHERE ID=<value-of-ID>
Then run this command for each row in the tab-delimited file, replacing the D and ID values for each.
You can use a stored procedure or a PHP program to do this.
For MySQL, the SP would need to open the file using load_file() and store the contents in a variable. Then the program needs to loop through it by finding "\n", which marks a new line, to get each whole line as a string.
Then the program needs to find the first tab position using locate() and, using substring(), get the first ID column. Then the program needs to find the 4th tab, i.e. 3 more tabs, by using locate() and its 3rd parameter; this will be the starting position of your D column. Then find the next tab character, again using locate() and its 3rd parameter, which will give you the end character of the D column. Using substring(), get the content of the D column. Use an update command to update the row's D column, using ID as the search key in the where clause.
Since the above loops through all lines, it will update all rows of data.
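A hedged sketch of that stored procedure, assuming (as the description above does) that each line carries all six tab-separated columns, so D is the fifth field, after the fourth tab; the file path and procedure name are placeholders:
DELIMITER //
CREATE PROCEDURE update_d_from_dump()
BEGIN
  DECLARE txt LONGTEXT;
  DECLARE line TEXT;
  DECLARE nl, t1, t4, t5 INT;
  SET txt = LOAD_FILE('/tmp/dump.tsv');  -- needs the FILE privilege
  WHILE txt IS NOT NULL AND LENGTH(txt) > 0 DO
    -- Peel one line off the front of the file contents.
    SET nl = LOCATE('\n', txt);
    IF nl = 0 THEN
      SET line = txt, txt = '';
    ELSE
      SET line = SUBSTRING(txt, 1, nl - 1), txt = SUBSTRING(txt, nl + 1);
    END IF;
    IF LENGTH(line) > 0 THEN
      SET t1 = LOCATE('\t', line);          -- 1st tab: end of the ID column
      SET t4 = LOCATE('\t', line, t1 + 1);  -- 2nd tab
      SET t4 = LOCATE('\t', line, t4 + 1);  -- 3rd tab
      SET t4 = LOCATE('\t', line, t4 + 1);  -- 4th tab: D starts after it
      SET t5 = LOCATE('\t', line, t4 + 1);  -- 5th tab: D ends before it
      UPDATE the_table
      SET D = SUBSTRING(line, t4 + 1, t5 - t4 - 1)
      WHERE ID = SUBSTRING(line, 1, t1 - 1);
    END IF;
  END WHILE;
END//
DELIMITER ;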