alter external table TABLENAME refresh error - external

Error message - SQL execution internal error: Processing aborted due to error 300002:1263389222; incident 9679362.
I am trying to refresh an external table on S3; it has worked previously. The table was created with:
create or replace external table EXT_ANIXTER_WNT_PART (
    event_timestamp datetime as TO_TIMESTAMP(value:wirepas.wirepas.packetReceivedEvent.rxTimeMsEpoch::varchar),
    source_endpoint int as (value:wirepas.wirepas.packetReceivedEvent.sourceEndpoint::int),
    source_address int as (value:wirepas.wirepas.packetReceivedEvent.sourceAddress::int),
    folder varchar as (split_part(metadata$filename, '/', 1)::varchar),
    message_date date as to_date(split_part(metadata$filename, '/', 2) || '/' || split_part(metadata$filename, '/', 3) || '/' || split_part(metadata$filename, '/', 4), 'YYYY/MM/DD')
)
partition by (folder, message_date)
location = @LABS_DATA.SBAS.PROLOGIS2WPEWNT_EXTWIREPAS_COM
file_format = (type = JSON)
refresh_on_create = TRUE
auto_refresh = TRUE;

Try increasing the warehouse size and see if that works. The current warehouse may not be large enough to read all the files (it may be running out of memory at the current size), depending on the number of files in the S3 bucket.
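For reference, a minimal sketch of trying that suggestion from Python, assuming the snowflake-connector-python package is installed; the connection parameters and the warehouse name MY_WH are placeholders, not values from the question:

# Hypothetical sketch: resize the warehouse, then retry the manual refresh.
# Account, credentials and the warehouse name MY_WH are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    database="LABS_DATA", schema="SBAS", warehouse="MY_WH",
)
cur = conn.cursor()
try:
    # Bump the warehouse size before retrying.
    cur.execute("ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'LARGE'")
    # Re-run the refresh that previously failed.
    cur.execute("ALTER EXTERNAL TABLE EXT_ANIXTER_WNT_PART REFRESH")
finally:
    cur.close()
    conn.close()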

Related

Lua - How to analyse a .csv export to show the highest, lowest and average values etc

Using Lua, I’m downloading a .csv file and then taking the first line and last line to help me validate the time period visually by the start and end date/times provided.
I’d also like to scan through the values and create a variety of variables, e.g. the highest, lowest and average value reported during that period.
The .csv is formatted in the following way:
created_at,entry_id,field1,field2,field3,field4,field5,field6,field7,field8
2021-04-16 20:18:11 UTC,6097,17.5,21.1,20,20,19.5,16.1,6.7,15.10
2021-04-16 20:48:11 UTC,6098,17.5,21.1,20,20,19.5,16.3,6.1,14.30
2021-04-16 21:18:11 UTC,6099,17.5,21.1,20,20,19.6,17.2,5.5,14.30
2021-04-16 21:48:11 UTC,6100,17.5,21,20,20,19.4,17.9,4.9,13.40
2021-04-16 22:18:11 UTC,6101,17.5,20.8,20,20,19.1,18.5,4.4,13.40
2021-04-16 22:48:11 UTC,6102,17.5,20.6,20,20,18.7,18.9,3.9,12.40
2021-04-16 23:18:11 UTC,6103,17.5,20.4,19.5,20,18.4,19.2,3.5,12.40
And my code to get the first and last lines is as follows:
print("Part 1")
print("Start : check 2nd and last row of csv")
local ctr = 0
local i = 0
local csvfilename = "/home/pi/shared/feed12hr.csv"
local hFile = io.open(csvfilename, "r")
for _ in io.lines(csvfilename) do ctr = ctr + 1 end
print("...... Count : Number of lines downloaded = " ..ctr)
local linenumbera = 2
local linenumberb = ctr
for line in io.lines(csvfilename) do i = i + 1
if i == linenumbera then
secondline = line
print("...... 2nd Line is = " ..secondline) end
if i == linenumberb then
lastline = line
print("...... Last line is = " ..lastline)
-- return line
end
end
print("End : Extracted 2nd and last row of csv")
But I now plan to pick a column, ideally by name (as I’d like to be able to use this against other .csv exports that are of a similar structure), and get the .csv into a table/array...
I’ve found an option for that here - Csv file to a Lua table and access the lines as new table or function()
See below:
#!/usr/bin/lua
print("Part 2")
print("Start : Convert .csv to table")
local csvfilename = "/home/pi/shared/feed12hr.csv"
local csv = io.open(csvfilename, "r")
local items = {}   -- Store our values here
local headers = {}
local first = true
for line in csv:gmatch("[^\n]+") do
    if first then -- this is to handle the first line and capture our headers.
        local count = 1
        for header in line:gmatch("[^,]+") do
            headers[count] = header
            count = count + 1
        end
        first = false -- set first to false to switch off the header block
    else
        local name
        local i = 2 -- We start at 2 because we won't be incrementing for the header
        for field in line:gmatch("[^,]+") do
            name = name or field -- check if we know the name of our row
            if items[name] then -- if the name is already in the items table then this is a field
                items[name][headers[i]] = field -- assign our value at the header in the table with the given name.
                i = i + 1
            else -- if the name is not in the table we create a new index for it
                items[name] = {}
            end
        end
    end
end
print("End : .csv now in table/array structure")
print("End : .csv now in table/array structure")
But I’m getting the following error:
pi@raspberrypi:/ $ lua home/pi/Documents/csv_to_table.lua
Part 2
Start : Convert .csv to table
lua: home/pi/Documents/csv_to_table.lua:12: attempt to call method 'gmatch' (a nil value)
stack traceback:
home/pi/Documents/csv_to_table.lua:12: in main chunk
[C]: ?
pi@raspberrypi:/ $
Any ideas on that?
I can confirm that the .csv file is there.
Once everything (hopefully) is in a table, I then want to be able to generate a list of variables based on the information in a chosen column, which I can then use and send within a push notification or email (which I already have the code for).
The following is what I’ve been able to create so far, but I would appreciate any/all help to do more analysis of the values within the chosen column, so I can see things like the highest, lowest, average etc.
print("Part 3")
print("Start : Create .csv analysis values/variables")
local total = 0
local count = 0
for name, item in pairs(items) do
for field, value in pairs(item) do
if field == "cabin" then
print(field .. " = ".. value)
total = total + value
count = count + 1
end
end
end
local average = tonumber(total/count)
local roundupdown = math.floor(average * 100)/100
print(count)
print(total)
print(total/count)
print(rounddown)
print("End : analysis values/variables created")
io.open returns a file handle on success, not a string.
Hence
local csv = io.open(csvfilename, "r")
--...
for line in csv:gmatch("[^\n]+") do
--...
will raise an error.
You need to read the file into a string first.
Alternatively, you can iterate over the lines of a file using file:lines(...) or io.lines, as you already do in your code.
local csv = io.open(csvfilename, "r")
if csv then
    for line in csv:lines() do
        -- ...
You're iterating over the file more often than you need to.
Edit:
This is how you could fill a data table while calculating the per-column maxima on the fly. This assumes you always have valid lines! A proper solution should verify the data.
-- prepare a table to store the minima and maxima in
local colExtrema = {min = {}, max = {}}
local rows = {}
local csvFile = io.open(csvfilename, "r")
-- go over the file linewise
for line in csvFile:lines() do
    -- split the line into 3 parts
    local timeStamp, id, dataStr = line:match("([^,]+),(%d+),(.*)")
    if timeStamp then -- skip the header and any malformed lines
        -- create a row container
        local row = {timeStamp = timeStamp, id = id, data = {}}
        -- fill the row data
        for val in dataStr:gmatch("[%d%.]+") do
            table.insert(row.data, tonumber(val))
            -- find the biggest value so far
            -- our initial value is the smallest number possible
            local oldMax = colExtrema.max[#row.data] or -math.huge
            -- store the bigger value as the new maximum
            colExtrema.max[#row.data] = math.max(tonumber(val), oldMax)
        end
        -- insert row data
        table.insert(rows, row)
    end
end

RFC-enabled function module to update physical samples

I need to update some fields of Physical samples in SAP ERP:
List of columns which are in the table QPRS:
ABINF: Storage Information
ABDAT: Storage Deadline
ABORT: Storage Location
List of fields which correspond to statuses (table JEST):
Sample Was Stored: status I0363 (short code in Status History: "STRD")
Sample Consumed/Destroyed: status I0362 (short code in Status History: "USED")
Is there an RFC-enabled function module to update these fields?
Thanks.
As far as I know there is no BAPI for updating storage data, so you will need ABAP development for this. QPRS_QPRS_STORAGE_UPDATE is the FM you can copy into a Z one and make remote-enabled:
DATA: i_qprs      TYPE qprs,
      i_lgort     TYPE qprs-lgort  VALUE 'Z07',
      i_abort     TYPE qprs-abort  VALUE '1',
      i_abdau     TYPE qprs-abdau  VALUE 10,
      i_abdat     TYPE qprs-abdat  VALUE '20200510',
      i_abinf     TYPE qprs-abinf  VALUE 'info 1st',
      i_aufbx     TYPE rqprs-aufbx VALUE 'first storage',
      i_prnvx     TYPE rqprs-prnvx VALUE abap_true,
      i_qprs_cust TYPE qprs_cust,
      e_qprs_new  TYPE qprs,
      e_aufbx     TYPE rqprs-aufbx,
      e_prnvx     TYPE rqprs-prnvx.

i_qprs-phynr = '000900000054'.

CALL FUNCTION 'QPRS_QPRS_STORAGE_UPDATE'
  EXPORTING
    i_qprs      = i_qprs
    i_lgort     = i_lgort
    i_abort     = i_abort
    i_abdau     = i_abdau
    i_abdat     = i_abdat
    i_abinf     = i_abinf
    i_aufbx     = i_aufbx
    i_prnvx     = i_prnvx
    i_qprs_cust = i_qprs_cust
  IMPORTING
    e_qprs_new  = e_qprs_new
    e_aufbx     = e_aufbx
    e_prnvx     = e_prnvx
  EXCEPTIONS
    sample_locked          = 1
    locking_error          = 2
    sample_not_found       = 3
    abort_not_found        = 4
    sample_already_changed = 5.

Kernel32.dll - DeviceIOControl returns false while trying to get String descriptor in Win 10

I am currently using the DeviceIOControl API from kernel32.dll to get the String Descriptors of the list of connected USB devices.
public static String GetStringDescriptor(IntPtr deviceHandle, Int32 ConnectionIndex, Byte DescriptorIndex, UInt16 LanguageID)
{
    USB_DESCRIPTOR_REQUEST Buffer = new USB_DESCRIPTOR_REQUEST();
    Buffer.ConnectionIndex = ConnectionIndex;
    Buffer.SetupPacket.wValue = (UInt16)((USB_STRING_DESCRIPTOR_TYPE << 8) | DescriptorIndex);
    Buffer.SetupPacket.wIndex = LanguageID;
    Buffer.SetupPacket.wLength = MAXIMUM_USB_STRING_LENGTH;
    Int32 nBytesReturned;
    Boolean Status = DeviceIoControl(deviceHandle,
                                     IOCTL_USB_GET_DESCRIPTOR_FROM_NODE_CONNECTION,
                                     ref Buffer,
                                     Marshal.SizeOf(Buffer),
                                     ref Buffer,
                                     Marshal.SizeOf(Buffer),
                                     out nBytesReturned,
                                     IntPtr.Zero);
    if (Status)
        return Buffer.Data.bString;
    else
        return null;
}
We use this function to get descriptor details such as the language ID, serial number, manufacturer and product string. Only when requesting the serial number does the call return TRUE with the expected values; for the language ID, manufacturer and product string it returns FALSE.
I checked the error status returned by the DeviceIoControl using:
int error = Marshal.GetLastWin32Error();
It returns 31 as the error code (ERROR_GEN_FAILURE), which means that the device is not working properly or its driver is not properly installed.
I tried all the obvious solutions, like reinstalling the driver for the device and restarting the PC, but none of them worked. I am sure there are no issues with the device or the code, because it works flawlessly on Windows 7 PCs. Also, since I am able to get the serial number, I think the device handle is valid.
I am not able to proceed with any further debugging. Is there some update to the DeviceIoControl function in Windows 10? Or is the way to get the languageID, manufacturer and Product String changed in Windows 10?
Most probably the device you are trying to get string descriptors from is in a low-power state. Check its current power state first; if it differs from PowerDeviceD0, string descriptors may not be obtainable (depending on the device and on the actual power level D1, D2 or D3). This could be the cause of error code 31 from DeviceIoControl().
Try to wake the device first, or read the cached strings with SetupAPI instead.

SSIS Event Handler - How do I get the entire error message?

I've set up a data flow task with a source component (ODBC to Salesforce) that writes rowcounts and any raised error messages to a table.
I've created an OnError event handler that writes the message from System::ErrorDescription to a variable, and then that variable is written to the table.
My problem is that System::ErrorDescription doesn't contain the interesting error message, only the summary.
These are the messages being generated in the Progress tab:
[SRC - Extract Account [6]] Error: System.Data.Odbc.OdbcException (0x80131937): ERROR [HY000] INVALID_LOGIN: Invalid username, password, security token; or user locked out.etc, etc,etc
[SSIS.Pipeline] Error: SRC - Extract Account failed the pre-execute phase and returned error code 0x80131937.
System::ErrorDescription only has the [SSIS.Pipeline] error ("SRC - Extract Account failed the pre-execute phase and returned error code 0x80131937").
How do I return the more detailed [SRC - Extract Account [6]] message?
Thanks,
Jason
You could also just query your SSISDB to get the error.
Use event_name to find your error
Try this:
/*
:: PURPOSE
Show the Information/Warning/Error messages found in the log for a specific execution
:: NOTES
The first resultset is the log, the second one shows the performance
:: INFO
Author: Davide Mauri
Version: 1.1
:: VERSION INFO
1.0:
First Version
1.1:
Added filter option on Message Source
Correctly handled the "NULL" filter on ExecutionId
*/
USE SSISDB
GO
/*
Configuration
*/
-- Filter data by execution id (use NULL for no filter)
DECLARE @executionIdFilter BIGINT = NULL;
-- Show only Child Packages or everything
DECLARE @showOnlyChildPackages BIT = 0;
-- Show only messages from a specific Message Source
DECLARE @messageSourceName NVARCHAR(MAX) = '%'
/*
Implementation
*/
/*
Log Info
*/
SELECT * FROM catalog.event_messages em
WHERE ((em.operation_id = @executionIdFilter) OR @executionIdFilter IS NULL)
AND (em.event_name IN ('OnInformation', 'OnError', 'OnWarning'))
AND (package_path LIKE CASE WHEN @showOnlyChildPackages = 1 THEN '\Package' ELSE '%' END)
AND (em.message_source_name like @messageSourceName)
ORDER BY em.event_message_id;
/*
Performance Breakdown
*/
IF (OBJECT_ID('tempdb..#t') IS NOT NULL) DROP TABLE #t;
WITH
ctePRE AS
(
SELECT * FROM catalog.event_messages em
WHERE em.event_name IN ('OnPreExecute')
AND ((em.operation_id = @executionIdFilter) OR @executionIdFilter IS NULL)
AND (em.message_source_name like @messageSourceName)
),
ctePOST AS
(
SELECT * FROM catalog.event_messages em
WHERE em.event_name IN ('OnPostExecute')
AND ((em.operation_id = @executionIdFilter) OR @executionIdFilter IS NULL)
AND (em.message_source_name like @messageSourceName)
)
SELECT
b.operation_id,
from_event_message_id = b.event_message_id,
to_event_message_id = e.event_message_id,
b.package_path,
b.execution_path,
b.message_source_name,
pre_message_time = b.message_time,
post_message_time = e.message_time,
elapsed_time_min = DATEDIFF(mi, b.message_time, COALESCE(e.message_time, SYSDATETIMEOFFSET()))
INTO
#t
FROM
ctePRE b
LEFT OUTER JOIN
ctePOST e ON b.operation_id = e.operation_id AND b.package_name = e.package_name AND b.message_source_id = e.message_source_id AND b.[execution_path] = e.[execution_path]
INNER JOIN
[catalog].executions e2 ON b.operation_id = e2.execution_id
WHERE
e2.status IN (2,7)
OPTION
(RECOMPILE)
;
I know the question is old, but I had this problem today.
Each error message line fires the OnError event.
So to capture all error lines, concatenate each one onto your variable.
Something like this:
Dts.Variables["MyErrorVar"].Value = Dts.Variables["MyErrorVar"].Value + Environment.NewLine + Dts.Variables["System::ErrorDescription"].Value.ToString()

SQL Deadlock with Python Data Insert

I'm currently trying to build a database interface with Python to store stock data. The data comes as a list of tuples, with each element consisting of date, ticker symbol, open, high, low, close and volume. The date is a UNIX timestamp and has to be unique in combination with the ticker symbol in the database. Below is an example of a typically processed output (company_stock):
[(1489780560, 'NYSE:F', 12.5, 12.505, 12.49, 12.495, 567726),
(1489780620, 'NYSE:F', 12.495, 12.5, 12.48, 12.48, 832487),
(1489780680, 'NYSE:F', 12.485, 12.49, 12.47, 12.475, 649818),
(1489780740, 'NYSE:F', 12.475, 12.48, 12.47, 12.47, 700579),
(1489780800, 'NYSE:F', 12.47, 12.48, 12.47, 12.48, 567798)]
I'm using the pymysql package to insert this list into a local MySQL database (version 5.5). While the code runs through and the values get inserted, the database will crash - or rather stop - after reaching about ~250k rows. Below is the export part of the stock data processing function, which gets called about once every 20 seconds and inserts about 400 values.
# SQL Export
def tosql(company_stock, ticker, interval, amount_period, period):
    try:
        conn = pymysql.connect(host="localhost", user="root",
                               passwd="pw", db="db", charset="utf8",
                               autocommit=True,
                               cursorclass=pymysql.cursors.DictCursor)
        cur = conn.cursor()

        # To temp table
        query = "INSERT INTO stockdata_import "
        query += "(date, tickersymbol, open, high, low, close, volume) "
        query += "VALUES (%s, %s, %s, %s, %s, %s, %s)"
        cur.executemany(query, company_stock)

        # Duplicate check with temp table and existing database storage
        query = "INSERT INTO stockdata (date, tickersymbol, open, high, low, close, volume) "
        query += "SELECT i.date, i.tickersymbol, i.open, i.high, i.low, "
        query += "i.close, i.volume FROM stockdata_import i "
        query += "WHERE NOT EXISTS(SELECT dv.date, dv.tickersymbol FROM "
        query += "stockdata dv WHERE dv.date = i.date "
        query += "AND dv.tickersymbol = i.tickersymbol)"
        cur.execute(query)

        print(": ".join([datetime.now().strftime("%d.%m.%Y %H:%M:%S"),
                         "Data stored in Vault. Ticker", str(ticker),
                         "Interval", str(interval),
                         "Last", str(amount_period), str(period)]))
    finally:
        # Clear temp import table and close connection
        query = "DELETE from stockdata_import"
        cur.execute(query)
        cur.close()
        conn.close()
I suspect that the check for already existing values takes too long as the database grows, and eventually breaks down due to the locking of the tables (?) while checking for uniqueness of the date/ticker combination. Since I expect this database to grow rather fast (about 1 million rows per week), it seems that a different solution is required to ensure that there is only one date/ticker pair. This is the SQL CREATE statement for the import table (the real table with which it gets compared looks the same):
CREATE TABLE stockdata_import (id_stock_imp BIGINT(12) NOT NULL AUTO_INCREMENT,
date INT(10),
tickersymbol VARCHAR(16),
open FLOAT(12,4),
high FLOAT(12,4),
low FLOAT(12,4),
close FLOAT(12,4),
volume INT(12),
crawled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(id_stock_imp));
I have already looked into setting a constraint for the date/tickersymbol pair and handling the resulting exceptions in Python, but my research so far suggests that this would be even slower, plus I am not even sure whether it would work with the bulk insert of the pymysql cursor function executemany(query, data).
Context information:
The SQL export shown above is the final part of a python script handling the stock data response. This script, in turn, gets called by another script which is timed by a crontab to run at a specific time each day.
Once the crontab starts the control script, this will call the subscript about 500 times with a sleep time of about 20-25 seconds between each run.
The error which I see in the logs is: ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Questions:
How can I optimize the query or alter the storage table to ensure uniqueness for a given date/ticker combination?
Is this even the problem or do I fail to see some other problem here?
Any further advice is also welcome.
If you would like to ensure uniqueness of your data, then just add a unique index on the relevant date and tickersymbol fields. A unique index prevents duplicate values from being inserted, so there is no need to check for the existence of the data before the insertion; see the sketch below.
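A minimal one-off migration along these lines could add the index. This is only a sketch: the index name uq_date_ticker and the connection details are placeholders, and the statement will fail if the table already contains duplicate date/tickersymbol pairs.

# Hypothetical one-off migration: enforce uniqueness of (date, tickersymbol)
# on the target table. Index name and credentials are placeholders.
import pymysql

conn = pymysql.connect(host="localhost", user="root", passwd="pw", db="db")
try:
    with conn.cursor() as cur:
        cur.execute(
            "ALTER TABLE stockdata "
            "ADD UNIQUE KEY uq_date_ticker (date, tickersymbol)"
        )
finally:
    conn.close()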
Since you do not want to insert duplicate data, just use INSERT IGNORE instead of a plain INSERT to suppress duplicate-key errors. Based on the number of affected rows, you can still detect and log duplicate insertions.
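A sketch of how the export could then look, assuming the unique index above is in place (the temp table and the NOT EXISTS check become unnecessary) and that your pymysql version returns the affected-row count from executemany; tosql_ignore is a hypothetical helper name, and the connection details are placeholders:

# Sketch: bulk insert with INSERT IGNORE, assuming a unique index on
# (date, tickersymbol) exists on stockdata. Connection details are placeholders.
import pymysql

def tosql_ignore(company_stock):
    conn = pymysql.connect(host="localhost", user="root", passwd="pw",
                           db="db", charset="utf8", autocommit=True)
    try:
        with conn.cursor() as cur:
            query = ("INSERT IGNORE INTO stockdata "
                     "(date, tickersymbol, open, high, low, close, volume) "
                     "VALUES (%s, %s, %s, %s, %s, %s, %s)")
            inserted = cur.executemany(query, company_stock) or 0
            # Rows that hit the unique index are silently skipped, so the
            # difference tells you how many duplicates were ignored.
            duplicates = len(company_stock) - inserted
            if duplicates:
                print("Skipped %d duplicate rows" % duplicates)
    finally:
        conn.close()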