Upload XML into Exact Online exceeding maximum size or response time - exact-online

I have several XML files generated by an industry solution with new data to be uploaded into Exact Online using the XML API, either directly or through the UploadXmlTopics table in Invantive SQL with the Exact Online driver.
However, the Exact Online XML API imposes a limit of approximately 10 MB per upload, and even then the load time can be long when the system is heavily loaded.
When the load time exceeds ten minutes, some of the transactions have been applied and some have failed. With a timeout, no message is returned stating what remains to be loaded.
I cannot change the XML files, since they are automatically generated by the industry solution.
What is the best way to reliably upload the XML files into Exact Online?

Of course it is best to have the XML files changed, but there are various alternatives. Note first of all that the performance of Exact Online varies across the day and the weekend. The best time to upload massive amounts of data is Sunday between 13:00 and 23:00, based upon experience.
When uploading manually, you can split the XML files by hand into several smaller XML files. Always split on the main topic at the path /eExact/TOPIC.
When uploading through the UploadXmlTopics table, you can use one of two approaches:
Calculated
Automated
Calculated XML size for Exact Online
The calculated approach is the only one available in older versions. It works as follows:
In an in-memory table or a file-based table, put a record per XML topic that you wish to upload. I normally use 'xml' as the column name.
Then determine how many fragments you need, for instance using:
select ceil(log(xmlsize / 10000, 16)) + 1
from ( select sum(length(xml)) xmlsize from xmlaccounts#inmemorystorage )
Replace 10000 by the maximum fragment size. Choose a smaller value during periods of heavy load and 1000000 for the weekend.
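For instance, assuming the in-memory table holds roughly 50,000,000 characters of XML in total and you use the weekend maximum of 1,000,000 characters per fragment (both figures are just an illustration), the arithmetic works out as in the comments below:
--
-- 50,000,000 / 1,000,000 = ~50 fragments needed; log(50, 16) = ~1.41,
-- so ceil(1.41) + 1 = 3: the first 3 characters of each MD5 value will be
-- used to group the XML topics into fragments.
--
select ceil(log(xmlsize / 1000000, 16)) + 1
from   ( select sum(length(xml)) xmlsize from xmlaccounts#inmemorystorage )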
Store the outcome, for instance using:
local define xmlaccountsparts "${outcome:0,0}"
Then construct the new XML to insert into UploadXmlTopics as follows:
select filenamepostfix
,      xml
from   ( select filenamepostfix
         ,      listagg(xml, '') xml
         from   ( select substr(md5(xml), 1, ${xmlaccountsparts}) filenamepostfix
                  ,      xml
                  from   xmlaccounts#inmemorystorage
                )
         group
         by     filenamepostfix
       )
And insert this payload into Exact Online using UploadXMLTopics.
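For instance, a hedged sketch of that insert, assuming the result of the previous query was stored in an in-memory table named xmlaccountsfragments#inmemorystorage (a hypothetical name) and reusing the UploadXMLTopics columns shown in the automated example below; the topic 'Accounts' and division code 123456 are purely illustrative values:
insert into UploadXMLTopics#eol
( topic
, payload
, division_code
, orig_system_reference
)
select 'Accounts'          /* XML topic of the payload (illustrative). */
,      xml                 /* Reassembled XML fragment. */
,      123456              /* Target division (illustrative). */
,      filenamepostfix     /* Traceable reference per fragment. */
from   xmlaccountsfragments#inmemorystorage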
What the calculated approach effectively does is first determine approximately how many files you need, using a logarithm with base 16. MD5 then assigns a somewhat randomly distributed hexadecimal (base 16) value to each XML topic to upload. Taking a number of left-side characters of each MD5 value equal to the logarithm's outcome buckets the topics into approximately that number of files, each with approximately the same payload size. Finally, the XML is reconstructed per bucket.
Automatic XML size for Exact Online
Newer releases have an auto-fragment option which does the heavy lifting for you. Use SQL like:
insert into UploadXMLTopics#eol
--
-- Upload seed data into Exact Online.
--
( topic
, payload
, division_code
, orig_system_reference
, fragment_payload_flag
, fragment_max_size_characters
)
select topic
, filecontents
, division_code
, filename
, true
, 10000 /* This maximum is in characters; you can also specify a number of fragments instead. */
from ...
The loaded fragments can be queried using:
select *
--
-- Check results and reload.
--
from UploadXMLTopicFragments#eol

Related

Ensuring Same date format in JSON objects

Setup:
Angular 8 + Spring Boot 2.2.3 + (Oracle 12.1 / PostgreSQL 12.3)
We are building an approval system where users can fill in online forms, much like Google Forms, and submit them for approval. Rather than normalizing the form structure, we'll be storing the form metadata in JSON format in our DB.
The values filled in on the forms will also be stored as JSON in the DB.
One concern that came up: in the DB we can store dates in a particular format, like 12-May-2020, which would be consistent across all inserted data, as this data might be used to construct reports in the near future.
Based on the pros/cons of this approach, we need to decide on the DB / data model as well.
So,
Is there any way I can enforce a date format in JSON?
If this cannot be done in JSON, what options do I have at the Angular 8 / Spring Boot application level to enforce that all developers / date components / date fields use the same date format?
If neither can be done, how can I handle the different formats when querying the JSON data for reporting or otherwise, both in Oracle and PostgreSQL?
Regards
The proper solution to your problem is to create a real, normalized date column.
If for some reason you can't or don't want to do that, I would create a check constraint that validates the date format by trying to cast it to a real date value.
The following is for Postgres, but you can create something similar for Oracle as well:
create table the_table
(
  form_data jsonb,
  constraint check_valid_date check ( (form_data ->> 'entry_date')::date is not null )
);
Obviously you will need to adjust the expression that gets the date value from the JSON to match the key and path inside your json value.
The cast to date will require that the date value is entered using the ISO standard format, yyyy-mm-dd which is the only "consistent" way to store a date as a string.
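To illustrate how the constraint behaves, here is a small sketch against the table defined above:
-- Accepted: a valid ISO-formatted date.
insert into the_table (form_data) values ('{"entry_date": "2020-05-12"}');

-- Rejected: 2020-02-30 is not a valid date, so the cast fails and the row is not inserted.
insert into the_table (form_data) values ('{"entry_date": "2020-02-30"}');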
Alternatively you can use to_date() with a format mask:
check ( to_date(form_data ->> 'entry_date', 'yyyy-mm-dd') is not null )
in DB we can store date in particular format like 12-May-2020
You are mistaken: Oracle doesn't store dates in that format. A DATE is stored internally in a proprietary binary format (type 12/13), in which each byte represents a different part of the date. What you see is a human-readable representation displayed according to your locale-specific NLS settings, or produced by TO_CHAR with a format mask.
To keep it aligned across all platforms, use the globally accepted ANSI standard date literal, which uses the fixed format 'YYYY-MM-DD'.
For example:
DATE '2020-05-21'
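For example, in a query (the orders table and order_date column are just illustrative names):
select *
from   orders
where  order_date >= DATE '2020-05-21';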

Storing large amounts of queryable JSON

I am trying to find a database solution that is capable of the following:
1) Store flat, random JSON structures separated by a table name (random_json_table_1, random_json_table_2, for example).
2) Capable of handling a large number of insert operations (10,000+/second).
3) Able to query the random JSON structures (for example, SELECT * FROM random_json_table_1 WHERE JSON_SELECT('data', '$.city.busses') NOT NULL AND JSON_SELECT('data', '$.city.busStops', 'length') > 5).
4) SELECT queries must run fast over gigabytes of data.
I had a look at Amazon Athena and it looks a bit promising, but I am curious whether there are any other solutions out there.
You may consider BigQuery.
Regarding 2), there is the BigQuery streaming interface.
And for 4), you can play with the BigQuery public data sets (e.g. the popular Bitcoin transaction table) to see how fast BigQuery can be.
Below is a sample query using BigQuery standard SQL, showing how to filter data that is stored in a JSON string.
#standardSQL
SELECT JSON_EXTRACT(json_text, '$') AS student
FROM UNNEST([
'{"age" : 1, "class" : {"students" : [{"name" : "Jane"}]}}',
'{"age" : 2, "class" : {"students" : []}}',
'{"age" : 10,"class" : {"students" : [{"name" : "John"}, {"name": "Jamie"}]}}'
]) AS json_text
WHERE CAST(JSON_EXTRACT_SCALAR(json_text, '$.age') AS INT64) > 5;
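As a rough sketch, requirement 3) from the question could look like this in BigQuery standard SQL (the table random_json_table_1 and its JSON string column data are taken from the question and assumed to exist):
#standardSQL
SELECT *
FROM random_json_table_1
WHERE JSON_EXTRACT(data, '$.city.busses') IS NOT NULL
  AND ARRAY_LENGTH(JSON_EXTRACT_ARRAY(data, '$.city.busStops')) > 5;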
It feels like Google's managed BigQuery database might be of value to you. Reading the BigQuery quota documentation, there appears to be a soft limit of 100,000 rows per second and the ability to insert 10,000 rows per single request. For queries, BigQuery advertises itself as being able to process petabyte-sized tables within acceptable limits.
Here is a link to the main page for BigQuery:
https://cloud.google.com/bigquery/

Homogenize a field with different date formats in Mysql

I am working with MySQL Workbench.
I have a huge database in a CSV file that contains, among other things, 3 columns with different date formats.
To be able to load this CSV file into my database, I have to set the 3 date columns as text; otherwise, they are not uploaded properly.
Here is an example of my data:
inDate, outDate
19-01-10, 02-02-10
04-01-11 12:02, 2011-01-11 11:31
29-01-11 6:57, 29-03-2010
30-03-10, 01-04-2010
2012-12-03 05:39:27.040, 12-12-12 17:04
2012-12-04 13:47:01.040, 29-11-12
I want to homogenize them and split each of those columns into two: one with only the date and the other with only the time.
I have tried working with regular expressions and with CASE. The regular expressions gave me NULLs, and CASE gave me "truncated incorrect value" errors.
I have tried to find similar situations on the web. People have had similar issues, but with two date formats, not with as many different formats as I have:
Convert varchar column to date in mysql at database level
Converting multiple datetime formats to single mysql datetime format
Format date in SELECT * query.
I am really new to this and I do not know how to write so many exceptions in MySQL.
Load the CSV into a temporary table, massage the values in that table, and finally copy them to the 'real' table.
Have 2 columns in that table for each date: one for the raw value coming from the CSV, the other being a DATETIME(3) (or whatever the sanitized version will be).
Do one of these for each distinctly different format:
UPDATE tmp SET inDate = ...
WHERE raw_inDate REGEXP '...';
The WHERE may need things like AND LENGTH(raw_inDate) = 8 and other tests in addition to REGEXP.
SUBSTRING_INDEX(inDate, '-', ...) may be a handy function for splitting up a date.
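For instance, a hedged sketch of such per-format updates using STR_TO_DATE (the staging table tmp, its columns and the guessed formats are assumptions based on the sample data; in particular, verify whether the two-digit-year values are day-first):
-- raw_inDate holds the CSV text, inDate is the sanitized DATETIME(3) column.
UPDATE tmp
SET    inDate = STR_TO_DATE(raw_inDate, '%d-%m-%y')              -- e.g. 19-01-10
WHERE  raw_inDate REGEXP '^[0-9]{2}-[0-9]{2}-[0-9]{2}$';

UPDATE tmp
SET    inDate = STR_TO_DATE(raw_inDate, '%d-%m-%y %H:%i')        -- e.g. 04-01-11 12:02
WHERE  raw_inDate REGEXP '^[0-9]{2}-[0-9]{2}-[0-9]{2} [0-9]{1,2}:[0-9]{2}$';

UPDATE tmp
SET    inDate = STR_TO_DATE(raw_inDate, '%Y-%m-%d %H:%i:%s.%f')  -- e.g. 2012-12-03 05:39:27.040
WHERE  raw_inDate REGEXP '^[0-9]{4}-[0-9]{2}-[0-9]{2} ';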
But, really, I would rather write the code in Perl or some other real programming language.

Left Trim 'abc' and right trim 'xyz'

A vendor writes user-defined data into a single column as XML. I need to write a SQL query (SQL Server 2008 / 2012 / 2014) that pulls data from that column for a 3rd-party application. Here's an example of what is in the column:
<udf><udf_date_ppe>15/12/2019</udf_date_ppe><udf_text_ppn>300965994</udf_text_ppn><udf_date_ved>8/12/2016</udf_date_ved><udf_text_vtno>417 - Working holiday</udf_text_vtno><udf_text_ppi>Taiwan</udf_text_ppi></udf>
The problem is that I need to grab the actual data, not the XML, and the XML isn't stored in the same order, meaning I have to dynamically figure out the lengths to trim on the left and right. For example, I want only the date in between these XML tags:
<udf_date_ppe>15/12/2019</udf_date_ppe>
but I don't know how many characters come before it. Once I figure out how to do one, I can replicate it for the other fields; this is only one user-defined field, but at least the XML isn't going to change. I only have view access to the server.
Bit of a pain, I know, but any help is appreciated.
If you just want to grab the 10 characters appearing inside the <udf_date_ppe> tags, you can use SQL Server's string functions and the following query:
SELECT SUBSTRING(col, CHARINDEX('<udf_date_ppe>', col) + 14, 10)
FROM yourTable
This assumes that there is only a single <udf_date_ppe> tag in the column.
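For values that are not a fixed length, such as <udf_text_vtno>, the same string-function approach can be extended by also locating the closing tag; a sketch against the same hypothetical table and column names ('<udf_text_vtno>' is 15 characters long):
SELECT SUBSTRING(
           col,
           CHARINDEX('<udf_text_vtno>', col) + 15,
           CHARINDEX('</udf_text_vtno>', col) - (CHARINDEX('<udf_text_vtno>', col) + 15)
       ) AS udf_text_vtno
FROM yourTable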

Eliminate or translate HTML character in SQL import from XML file

I came across an issue when running a procedure that shreds an XML file and imports the data into SQL Server.
It has been running successfully for a few months, but today I got an error:
Conversion failed when converting date and/or time from character string.
This is the line where it fails:
SELECT
    ltrim(rtrim(T.X.value('Cell[4]/Data[1]','varchar(max)'))) AS StartDate
  , ltrim(rtrim(T.X.value('Cell[5]/Data[1]','varchar(max)'))) AS EndDate
FROM #xml.nodes('/Workbook[1]/Worksheet[1]/Table[1]/Row') as T(X)
When I looked at the XML file, I noticed that some of the dates were written like this:
01/12/2016&#160
This character, &#160, is a non-breaking space.
I would like to know if there is any way in SQL Server to account for these types of issues. For this specific problem, I can use REPLACE:
SELECT
    REPLACE(ltrim(rtrim(T.X.value('Cell[4]/Data[1]','varchar(max)'))), '&#160', '') AS StartDate
  , ltrim(rtrim(T.X.value('Cell[5]/Data[1]','varchar(max)'))) AS EndDate
FROM #xml.nodes('/Workbook[1]/Worksheet[1]/Table[1]/Row') as T(X)
but if other XML/HTML characters come up, is there a way to universally check for/deal with them?
I'd imagine you could create an auxiliary table of strings that you wish to replace/remove and join that to your query. This would be preferred over hard-coding each character and would allow you to expand the functionality easily. The caveat is that I'd expect it to slow down your query, depending on the number of characters you need to replace/remove.
SELECT
    REPLACE(ltrim(rtrim(T.X.value('Cell[4]/Data[1]','varchar(max)'))),
            StringsToRemove.string, '') AS StartDate
  , REPLACE(ltrim(rtrim(T.X.value('Cell[5]/Data[1]','varchar(max)'))),
            StringsToRemove.string, '') AS EndDate
FROM
    #xml.nodes('/Workbook[1]/Worksheet[1]/Table[1]/Row') as T(X),
    StringsToRemove
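A minimal sketch of the auxiliary table this query assumes (the name StringsToRemove and its contents are illustrative; note that, with this join, each row in StringsToRemove produces its own output row and removes one string at a time):
CREATE TABLE StringsToRemove
(
    string varchar(20) NOT NULL
);

INSERT INTO StringsToRemove (string)
VALUES ('&#160'),   -- non-breaking space, as seen in the failing dates
       ('&#09');    -- tab, purely as an example of another character reference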