Issue while accessing data from an Octave dataframe

I am trying to access data from a dataframe in Octave that satisfies certain criteria.
Let us say the dataframe is named A, with columns 'Date' and 'Close Price'.
Let us say we intend to access the close prices that are less than 5.
I am using the command:
A.Close(A.Close<5)
I am getting the following error:
error: subsref.m querying name Close returned positions 5
subsref.m querying name returned positions
I saw an instructional video on YouTube where the same command was used and no error showed up.

I cannot reproduce the problem. Are you sure your dataframe is correctly initialised?
pkg load dataframe
A = dataframe( [1,2;3,4;5,6;7,8], 'colnames', { 'Date', 'Close Price' } );
A.Close( A.Close < 5 )
% ans =
% 2
% 4
I suspect your error may have to do with the fact that your column name is Close_Price, but you tried to index it via 'Close'. Do you have any other columns that start with the word 'Close' by any chance?

Related

Stream a NOT NULL selection of a table?

I'm trying to select the primary key of every row in a table based on whether another column is NULL.
The following code does not do what I want (see the EDIT below), but it shows what the query would look like as a plain select(); the problem is that the table is so large that it nearly fills up memory before returning any results.
s = tweets.select().where(tweets.c.coordinates != None)
result = engine.execute(s)
for row in result:
    print(row)
Because the table is so large, I found a streaming solution that works for the session.query() object:
def page_query(q):
    r = True
    offset = 0
    while r:
        r = False
        for elem in q.limit(1000).offset(offset):
            r = True
            yield elem
        offset += 1000
So I'm trying to restructure the above select() as a query(), but when I do, it returns every row in the table, including the ones whose coordinates are 'null':
q = session.query(Tweet).filter(Tweet.coordinates.is_not(None))
for i in page_query(q):
    print(f' {i}')
If I instead do
q = session.query(Tweet).filter(Tweet.coordinates.is_not('null'))
for i in page_query(q):
    print(f' {i}')
I get an error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "'null'"
LINE 3: WHERE milan_tweets.coordinates IS NOT 'null'
^
(Using != appears to give the same results as the built-in .is_not().)
So how can I make this selection?
EDIT: The code block at the top does NOT do what I originally expected, my mistake.
Rows are added to the database as Python Nones, and looking at them in DBeaver shows the values as "null".
You have correctly diagnosed the problem. The query returns e.g. a million rows, and the psycopg2 driver drags all of those result rows over the network, buffering them locally, before returning even a single row up to your app. Why? Because the public API includes a detail where your app could ask "how many rows were in that result?", and the driver must retrieve them all in order to learn that bit of trivia. If you promise not to ask the "how many?" question, you can stream results with this:
import sqlalchemy as sa
engine = sa.create_engine(uri).execution_options(stream_results=True)
Then rows will be delivered up to your app nearly as soon as they become available, rather than being buffered for a long time. This yields a significantly smaller memory footprint for your Python process, as the DB driver layer does not need to malloc() storage sufficient to hold all million result rows.
https://docs.sqlalchemy.org/en/14/core/connections.html#streaming-with-a-fixed-buffer-via-yield-per
cf test_core_fetchmany_w_streaming
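As a minimal sketch of how this ties back to the original query (assuming SQLAlchemy 1.4+ on psycopg2, and reusing the Tweet model and the uri variable from the question), the streaming engine can be combined with yield_per() so the page_query() workaround is no longer needed:
import sqlalchemy as sa
from sqlalchemy.orm import Session

# stream_results makes psycopg2 use a server-side cursor, so rows arrive in batches
# instead of being buffered in full on the client.
engine = sa.create_engine(uri).execution_options(stream_results=True)

with Session(engine) as session:
    # yield_per(1000) keeps at most ~1000 ORM objects in memory at a time.
    query = session.query(Tweet).filter(Tweet.coordinates.is_not(None)).yield_per(1000)
    for tweet in query:
        print(tweet)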

Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns

I am new to BigQuery. I am trying to load data into a GCP BigQuery table which I created manually. I have a bash file which contains the following bq load command:
bq load --source_format=CSV --field_delimiter=$(printf '\u0001') dataset_name.table_name gs://bucket-name/sample_file.csv
My CSV file contains multiple rows with 16 columns; a sample row is
100563^3b9888^Buckname^https://www.settttt.ff/setlllll/buckkkkk-73d58581.html^Buckcherry^null^null^2019-12-14^23d74444^Reverb^Reading^Pennsylvania^United States^US^40.3356483^-75.9268747
Table schema: (screenshot not included)
When I execute the bash script from Cloud Shell, I get the following error:
Waiting on bqjob_r10e3855fc60c6e88_0000016f42380943_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'project-name-staging:bqjob_r10e3855fc60c6e88_0000ug00004521': Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.
Failure details:
- gs://bucket-name/sample_file.csv: Error while reading data, error message: CSV table references column position 15, but line starting at position:0 contains only 1 columns.
What would be the solution? Thanks in advance.
You are trying to insert values that do not match the schema you provided for your table.
Based on your table schema and your sample data, I ran this command:
./bq load --source_format=CSV --field_delimiter=$(printf '^') mydataset.testLoad /Users/tamirklein/data2.csv
1st error
Failure details:
- Error while reading data, error message: Could not parse '39b888'
as int for field Field2 (position 1) starting at location 0
At this point I manually removed the 'b' from '39b888', and then I got this:
2nd error
Failure details:
- Error while reading data, error message: Could not parse
'14/12/2019' as date for field Field8 (position 7) starting at
location 0
At this point I changed '14/12/2019' to '2019-12-14', which is the BigQuery date format, and then everything was OK:
Upload complete.
Waiting on bqjob_r9cb3e4ef5ad596e_0000016f42abd4f6_1 ... (0s) Current status: DONE
You will need to clean your data before uploading it, or use the --max_bad_records flag (some of the lines will load and some will not, depending on your data quality).
Note: unfortunately there is no way to control the date format during the upload; see this answer as a reference.
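As an illustration of that pre-cleaning, here is a minimal Python sketch (assuming pandas is available, that '^' is the delimiter as in the command above, and using hypothetical column names Field1..Field16 with the date in Field8, matching the error messages):
import pandas as pd

# Hypothetical column names; the real ones come from your table schema.
cols = [f'Field{i}' for i in range(1, 17)]

df = pd.read_csv('sample_file.csv', sep='^', header=None, names=cols, dtype=str)

# Rewrite the date column (assumed here to be Field8) into BigQuery's YYYY-MM-DD format.
df['Field8'] = pd.to_datetime(df['Field8'], dayfirst=True, errors='coerce').dt.strftime('%Y-%m-%d')

df.to_csv('sample_file_clean.csv', sep='^', header=False, index=False)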
We had the same problem while importing data from local files into BigQuery. After inspecting the data we saw that some values started with \r or whitespace.
After applying ua['ColumnName'].str.strip() and ua['District'].str.rstrip() we could load the data into BigQuery.
Thanks
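A minimal sketch of that cleanup, assuming the data has already been read into a pandas DataFrame called ua and that 'ColumnName' and 'District' are placeholder column names:
import pandas as pd

# Path and column names are placeholders for this sketch.
ua = pd.read_csv('raw_data.csv', dtype=str)

# Remove leading/trailing whitespace and stray \r characters from the text columns.
ua['ColumnName'] = ua['ColumnName'].str.strip()
ua['District'] = ua['District'].str.rstrip()

ua.to_csv('clean_data.csv', index=False)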

During bulk insert date is detected as null

In one of my applications I am importing CSV data into my Access DB using the following bulk insert query.
INSERT INTO Log_134_temp ([DATE],[TIME],CH0,CH1,CH2,CH3) SELECT [DATE],[TIME],CH0,CH1,CH2,CH3 FROM [Text;FMT=CSVDelimited;HDR=Yes;DATABASE=C:\tmp].[SAMPLE_1.csv]
The query executes and all the parameters in it are correct. The issue is with just one of the CSV files, which gives the following error when the query is executed:
The field 'Log_134_temp.date' cannot contain a Null value because the
Required property for this field is set to True. Enter a value in
this field.
Whereas the other CSV files get imported without any issue.
The file that imports successfully and the file with the issue look identical in format, and this has puzzled me for over a day now.
The file that gets imported
https://www.dropbox.com/s/amddhzhi6nr24ex/SAMPLE_1_111.csv?dl=0
The file that doesn't get imported
https://www.dropbox.com/s/2rrgdf7oor5ptbf/SAMPLE_1_112.csv?dl=0
The bad line is in row 135169:
2019-02-14,16:57:54,310,837,300,650
It contains a lot of NUL (0x00) bytes.
I found this with the help of a simple Python loop:
In [43]: f = read_file(r'...SAMPLE_1_112.csv')
In [44]: li = f.split('\n')
...
In [60]: prev_len = 1
In [61]: for l in li:
    ...:     if not len(l): continue
    ...:     if prev_len != len(l): print(l)
    ...:     prev_len = len(l)
    ...:
DATE,TIME,CH0,CH1,CH2,CH3
2018-10-03,11:45:44,246,621,250,600
2019-02-14,16:57:54,310,837,300,650
2019-02-14,16:59:01,309,859,300,650
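As a follow-up, a minimal Python sketch of one possible fix (assuming the file only needs the NUL bytes removed; the file names are placeholders) that writes a cleaned copy for the bulk insert to read:
# Read the file as raw bytes, drop the NUL (0x00) bytes, and write a cleaned copy.
with open('SAMPLE_1_112.csv', 'rb') as src:
    data = src.read()

with open('SAMPLE_1_112_clean.csv', 'wb') as dst:
    dst.write(data.replace(b'\x00', b''))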

Timeout error when downloading large files with URL

When I try to download the datasets within the for loop, I always get this error:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : InternetOpenUrl failed: 'The operation timed out'
I have tried changing the timeout to different values (100, 200, 500, 1000), but it doesn't seem to take effect, as the error occurs after the same amount of time regardless of what value I set.
## Get catalog and select for dataset ids, also filtered out data sets that don't contain key words
catalog<-tbl_df(read.csv2("https://datasource.kapsarc.org/explore/download/"))
select(catalog,datasetid,country,data.classification,title,theme,keyword)%>%
filter(grepl("Demo|Econo|Trade|Industry|Policies|Residential",theme))%>%
select(datasetid)->datasets
data_kapsarc<-list()
base_url<-"https://datasource.kapsarc.org/api/v2/catalog/datasets/population-by-sex-and-age-group/exports/json?rows=-1&start=0&pretty=true&timezone=UTC"
options(timeout=1000)
##download data sets and store in list of dataframes
for (i in 1:length(datasets$datasetid)){
  try({
    url<-gsub("population-by-sex-and-age-group",datasets[i,1]$datasetid,base_url)
    temp <- tempfile()
    download.file(url,temp,mode='wb')
    data_kapsarc[[i]]<-fromJSON(temp)
    unlink(temp)
  },silent=TRUE)
}

SSIS Import a date and time from a txt to a table datetime

So I want to import a datetime from a txt:
2015-01-22 09:19:59
into a table using a data flow. I have my Flat File Source and my destination DB set up fine. In the advanced settings and the input and output properties, I changed the data type of that column of the txt input to:
database timestamp [DT_DBTIMESTAMP]
This is the same data type as the DB used for the table so this should work.
However, when I execute the package I get an error saying the data conversion failed... How do I make this work?
[Import txt data [1743]] Error: Data conversion failed. The data conversion for column "statdate" returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
[Import txt data [1743]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "statdate" (2098)" failed because error code 0xC0209084 occurred, and the error row disposition on "output column "statdate" (2098)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
[Import txt data [1743]] Error: An error occurred while processing file "C:\Program Files\Microsoft SQL Server\MON_Datamart\Sourcefiles\tbl_L30T1.txt" on data row 14939.
On the row where it gives the error, the datetime field is filled with spaces. That is why "allow nulls" is checked on the table, but my SSIS package still gives the error for some reason... Can I tell the package to allow nulls as well somewhere?
I suggest you import the data into a character field and then parse it after entry.
The following function should help you:
SELECT IsDate('2015-01-22 09:19:59')
, IsDate(Current_Timestamp)
, IsDate(' ')
, IsDate('')
The IsDate() function returns a 1 when it thinks the value is a date and a 0 when it is not.
This would allow you to do something like:
SELECT value_as_string
, CASE WHEN IsDate(value_as_string) = 1 THEN
Cast(value_as_string As datetime)
ELSE
NULL
END As value_as_datetime
FROM ...
I solved it myself. Thank you for your suggestion, gvee, but the way I did it is much easier.
In the Flat File Source, when making a new connection, I set all the data types in the Advanced tab according to the table in the database EXCEPT the column with the timestamp (in my case it was called "statdate"). I changed that data type to a string, because otherwise my Flat File Source would give me a conversion error before anything downstream could even run, and the only way around that was setting the error output to ignore failure, which I don't want. (You still have to change the data type after setting it to a string in the advanced settings: right-click the Flat File Source -> Show Advanced Editor -> go to the output columns and change the data type there from date to string.)
After the timestamp was set to a string, I added a Derived Column with this expression to trim the spaces and replace empty values with NULL:
TRIM(<YourColumnName>) == "" ? (DT_STR,4,1252)NULL(DT_STR,4,1252) : <YourColumnName>
Next I added a Data Conversion to convert the string back to a timestamp. The Data Conversion is then connected to the OLE DB Destination.
I hope this helps anyone with the same problem in the future.
End result: (picture of the data flow not included)