How to build and send an IDOC from MII to SAP ECC using IDOC_Asynchronous_Inbound - integration

We have a custom built legacy application that collects data from a SQL server database, builds an IDOC and then "sends" that IDOC to ECC. (This application was written in VB6 and uses the SAPGUI 6 SDK to accomplish this.)
I'm attempting to decommission this solution and replace it with a solution built in MII.
As far as I can tell, I need to create the IDOC in MII using IDOC_Asynchronous_Inbound, but I'm stuck on how to populate the required fields.
IDOC_Asynchronous_Inbound has two segments: IDOC_CONTROL_REC_40 and IDOC_DATA_REC_40
I guessed which fields to fill in the IDOC_CONTROL_REC_40/item segment by looking at the source code of the old VB application. I think this should do:
IDOC_INBOUND_ASYNCHRONOUS/TABLES/IDOC_CONTROL_REC_40/item
- IDOCTYP: WMMBID01
- MESTYP: WMMBXY
- SNDPRN: <value>
- SNDPRT: LI
- SNDPOR: <value>
- RCVPRN: <value>
- RCVPRT: LS
- EXPRSS: X
Looking at the source code of the old VB app, I should now add a segment of type E1MBXYH with the following fields filled:
- BLDAT: <date>
- BUDAT: <date>
- TCODE: MB31
- XBLNR: <value>
- BKTXT: <value>
Based on guesswork and some blog posts, I believe I have to add this segment as an item segment under the IDOC_DATA_REC_40 segment.
My guess is I should then add item segments of type E1MBXYI for all of the 'records' I'd like to send to SAP with the following fields:
- MATNR: <value>
- WERKS: <value>
- LGORT: <value>
- CHARG: <value>
- BWART: 261
- ERFMG: <value>
- SHKZG: H
- ERFME: <value>
- AUFNR: <value>
- SGTXT: <value>
Now, looking at the IDOC_DATA_REC_40 segment in MII, these are the fields that are available:
- SEGNAM
- MANDT
- DOCNUM
- SEGNUM
- PSGNUM
- HLEVEL
- SDATA
My guess is that the segment name should go into SEGNAM and the (properly structured/spaced) data should go into SDATA. I'm not sure what I should put in the other fields, if anything. (I have the description file for this IDOC type, so I know how to structure the data that goes into the SDATA field... counting spaces, yay!)
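To make that guess concrete, here is a minimal sketch (Python; every field width below is a hypothetical placeholder, the real offsets come from the IDOC description file):

def fixed(value, width):
    # Left-align a value and pad it with spaces to the segment field width.
    return str(value).ljust(width)[:width]

# E1MBXYH header segment: SDATA would be the concatenation of its
# fixed-width fields, in the order given by the description file.
sdata_header = (
    fixed("20240101", 8)         # BLDAT (document date), width assumed
    + fixed("20240101", 8)       # BUDAT (posting date), width assumed
    + fixed("MB31", 20)          # TCODE, width assumed
    + fixed("DELIVERY-123", 16)  # XBLNR, width assumed
    + fixed("header text", 25)   # BKTXT, width assumed
)

data_record = {
    "SEGNAM": "E1MBXYH",    # segment type
    "SDATA": sdata_header,  # fixed-width payload
    # MANDT/DOCNUM/SEGNUM/PSGNUM/HLEVEL: unclear to me whether these can be
    # left empty or must be numbered; that is part of my question.
}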
To hopefully clarify how the IDOC should be structured, this is a (link to a) screenshot of an IDOC posted by the current VB application:
[screenshot of an IDOC in SAP showing the data structure]
I hope someone here can confirm that I'm on the right track in filling the segments, and that someone knows which fields I should fill in the data records.
Kind regards,
Thomas
P.S. Some of the resources consulted:
How to create and send Idocs to SAP using SAP .Net Connector 3
Goods movement IDOC SAP documentation
How to send IDOCs from SAP MII to SAP ERP
P.P.S. Full disclosure: I've also posted this question on the SAP Community Questions & Answers board.

Correctly dealing with SAP IDocs is unfortunately not as easy as it looks at first glance. Maybe it would be a good idea to have a look at the SAP Java IDoc Class Library, as mentioned here:
SAP .Net Connector 3.0 - How can I send an idoc from a non-SAP system?
Even if you don't want to switch to Java, it can at least serve as a reference implementation showing how the remote function modules have to be filled with the IDoc data to send.
The SAP Java IDoc Class Library can be downloaded together with the SAP Java Connector from here.
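If switching stacks for a reference is acceptable, the two tables can also be exercised from Python with the open-source PyRFC library. A minimal sketch (all connection parameters and partner values below are placeholders, not taken from the question):

from pyrfc import Connection

# Sketch: filling the two tables of IDOC_INBOUND_ASYNCHRONOUS via PyRFC.
# All connection parameters and partner values are placeholders.
conn = Connection(ashost="ecc-host", sysnr="00", client="100",
                  user="RFC_USER", passwd="secret")

control = [{
    "TABNAM": "EDI_DC40",
    "IDOCTYP": "WMMBID01",
    "MESTYP": "WMMBXY",
    "SNDPRT": "LI", "SNDPRN": "SUPPLIER01",
    "RCVPRT": "LS", "RCVPRN": "ECCCLNT100",
}]

data = [{
    "SEGNAM": "E1MBXYH",
    "SDATA": "...",  # fixed-width segment payload, as discussed in the question
}]

conn.call("IDOC_INBOUND_ASYNCHRONOUS",
          IDOC_CONTROL_REC_40=control,
          IDOC_DATA_REC_40=data)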

I have no MII system at hand, but you'd be better off thoroughly examining the IDoc documentation than reading tea leaves. It can contain helpful hints on how to fill particular segment fields.
Go to transaction WE60 and enter your segment names (IDOC_CONTROL_REC_40/IDOC_DATA_REC_40) or the IDoc definition name IDOC_Asynchronous_Inbound.
It may not be very helpful but better than nothing.


AllenNLP BERT SRL input format ("OntoNotes v. 5.0 formatted")

The goal is to train BERT SRL on another data set. According to the configuration, it requires conll-formatted-ontonotes-5.0.
Natively, my data comes in a CoNLL format, and I converted it to the conll-formatted-ontonotes-5.0 format of the GitHub edition of OntoNotes v5.0. Reading the data works and training seems to work, except that precision remains at 0. I suspect that either the encoding of SRL arguments (BIO or phrasal?) or the column structure (other OntoNotes editions in CoNLL format differ here) differs from the expected input. Alternatively, the error may arise because the role labels are hard-wired in the code. I followed the reference data in using the long form (ARGM-TMP), but you often see the short form (AM-TMP) in other data.
The question is which dataset and format is expected here. I guess it's one of the CoNLL/Skel formats for OntoNotes 5.0 with a restored WORD column, but:
- The CoNLL edition doesn't seem to be shipped with the LDC edition of OntoNotes.
- It does not seem to be the format of the "conll-formatted-ontonotes-5.0" edition of OntoNotes v5.0 on GitHub provided by the OntoNotes creators.
- There is at least one other CoNLL/Skel edition of OntoNotes 5.0 data as part of PropBank. This differs from the other one in leaving out 3 columns and in the encoding of predicates. (For parts of my data, this is the native format.)
The SrlReader documentation mentions BIO (IOBES) encoding. This has indeed been used in other CoNLL editions of PropBank data, but not in the OntoNotes corpora mentioned above. Other such formats include the CoNLL-2008 and CoNLL-2009 formats, in various variants.
Before I start reverse-engineering the SrlReader, does anyone have a data snippet at hand so that I can prepare my data accordingly?
conll-formatted-ontonotes-5.0 version of my data (sample from EWT corpus):
google/ewt/answers/00/20070404104007AAY1Chs_ans.xml 0 0 where WRB (TOP(S(SBARQ(WHADVP*) - - - - * (ARGM-LOC*) * * -
google/ewt/answers/00/20070404104007AAY1Chs_ans.xml 0 1 can MD (SQ* - - - - * (ARGM-MOD*) * * -
google/ewt/answers/00/20070404104007AAY1Chs_ans.xml 0 2 I PRP (NP*) - - - - * (ARG0*) * * -
google/ewt/answers/00/20070404104007AAY1Chs_ans.xml 0 3 get VB (VP* get 01 - - * (V*) * * -
google/ewt/answers/00/20070404104007AAY1Chs_ans.xml 0 4 morcillas NNS (NP*) - - - - * (ARG1*) * * -
The "native" format is the one under of the CoNLL-2012 edition, see cemantix.org/conll/2012/data.html how to create it.
The Ontonotes class that reads it may, however, encounter difficulties when parsing "native" CoNLL-2012 data, because the CoNLL-2012 preprocessing scripts can lead to invalid parse trees. Parsing with NLTK will naturally lead to a ValueError such as
ValueError: Tree.read(): expected ')' but got 'end-of-string'
at index 1427.
"...LT#.#.) ))"
There is no direct way to solve that at the data level, because the string being parsed is an intermediate representation, not the original data. If you want to process CoNLL-2012 data, the ValueError has to be caught, cf. https://github.com/allenai/allennlp/issues/5410.
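A minimal sketch of what catching it can look like when you build the trees with NLTK yourself (tree_string stands for the intermediate parse string):

from nltk import Tree

def safe_tree(tree_string):
    # Return the parse tree, or None when the CoNLL-2012 preprocessing
    # produced an invalid tree string.
    try:
        return Tree.fromstring(tree_string)
    except ValueError:
        # e.g. "Tree.read(): expected ')' but got 'end-of-string'"
        return None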

Bioisosteric replacement using SMARTS (KNIME and RDKit)

I am trying to create a KNIME workflow that would accept a list of compounds and carry out bioisosteric replacements (we will use the following example here: carboxylic acid to tetrazole) automatically.
NOTE: I am using the following workflow as inspiration: RDKit-bioisosteres (myexperiment.org). It uses a text file as SMARTS input. I cannot seem to replicate the SMARTS format used there.
For this, I plan to use the RDKit One Component Reaction node, which takes as input a set of compounds to run the reaction on and a SMARTS string that defines the reaction.
My issue is the generation of a working SMARTS string describing the reaction.
I would like to input two SDF files (or another format, not particularly attached to SDF): one with the group to replace (carboxylic acid) and one with the list of possible bioisosteric replacements (tetrazole). I would then combine these two in KNIME and generate a SMARTS string for the reaction to then be used in the Rdkit One Component Reaction node.
NOTE: The input SDF files have the structures written with an attachment point (*COOH for the carboxylic acid, for example) which defines where the group to replace is attached. I suspect this is the cause of many of the issues I am experiencing.
So far, I can easily generate the reactions in RXN format using the Reaction Builder node from the Indigo node package. However, converting this reaction into a SMARTS string that is accepted by the RDKit One Component Reaction node has proven tricky.
What I have tried so far:
Converting RXN to SMARTS (Molecule Type Cast node): gives the following error: scanner: BufferScanner::read() error
Converting the source and target molecules into SMARTS (Molecule Type Cast node): gives the following error: SMILES loader: unrecognised lowercase symbol: y
Displaying this as a string in KNIME shows that the conversion is not carried out and the string is still in SDF format: *filename*.sdf 0 0 0 0 0 0 0 V3000M V30 BEGIN etc.
Converting the source and target molecules into RDKit first (RDKit From Molecule node), then from RDKit into SMARTS (RDKit To Molecule node, SMARTS option). This outputs the following SMARTS strings:
Carboxylic acid : [#6](-[#8])=[#8]
Tetrazole : [#6]1:[#7H]:[#7]:[#7]:[#7]:1
This is as close as I've managed to get. I can then join these two SMARTS strings with >> in between (output: [#6](-[#8])=[#8]>>[#6]1:[#7H]:[#7]:[#7]:[#7]:1) to create a reaction SMARTS string, but this is not accepted as input by the RDKit One Component Reaction node.
Error message in KNIME console :
ERROR RDKit One Component Reaction 0:40 Creation of Reaction from SMARTS value failed: null
WARN RDKit One Component Reaction 0:40 Invalid Reaction SMARTS: missing
Note that the SMARTS strings this last option (3.) generates are very different from the ones used in the myexperiments.org example ([*:1][C:2]([OH])=O>>[*:1][C:2]1=NNN=N1). I also seem to have lost the attachment point information through these conversions, which is likely to cause issues in the rest of the workflow.
Therefore I am looking for a way to generate the SMARTS strings used in the myexperiments.org example on my own sets of substituents. Obviously doing this by hand is not an option. I would also like this workflow to use only the open-source nodes available in KNIME and not proprietary nodes (Schrodinger etc.).
Hopefully, someone can help me out with this. If you need my current workflow I am happy to upload that with the source files if required.
Thanks in advance for your help,
Stay safe and healthy!
-Antoine
What you're describing is template generation, which has long been an active area of work in reaction prediction and retrosynthesis in cheminformatics.
I'm not particularly familiar with KNIME myself, though I know RDKit extensively. Your last option (3) is closest to what I'd consider a usable workflow. The way I would do this (a minimal sketch follows the steps):
Load the substitution pair molecules from SDF into RDKit mol objects.
Export these RDKit mol objects as SMARTS strings using rdkit.Chem.MolToSmarts().
Concatenate these strings into the form before_substructure>>after_substructure to generate a reaction SMARTS string.
Load this SMARTS string into a reaction object with rxn = rdkit.Chem.AllChem.ReactionFromSmarts().
Use the rxn.RunReactants() method to generate your bioisosterically substituted products.
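Put together as plain RDKit code, the five steps might look like this (file names and the input SMILES are placeholders):

from rdkit import Chem
from rdkit.Chem import AllChem

# 1. Load the substitution pair from SDF (file names are placeholders).
before = next(Chem.SDMolSupplier("carboxylic_acid.sdf"))
after = next(Chem.SDMolSupplier("tetrazole.sdf"))

# 2. Export both as SMARTS.
before_smarts = Chem.MolToSmarts(before)
after_smarts = Chem.MolToSmarts(after)

# 3. Concatenate into a reaction SMARTS.
rxn_smarts = before_smarts + ">>" + after_smarts

# 4. Build the reaction object.
rxn = AllChem.ReactionFromSmarts(rxn_smarts)

# 5. Run it on an input molecule (benzoic acid as an example).
mol = Chem.MolFromSmiles("OC(=O)c1ccccc1")
for products in rxn.RunReactants((mol,)):
    for product in products:
        Chem.SanitizeMol(product)  # products come back unsanitized
        print(Chem.MolToSmiles(product))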
The error you quote for the RDKit One Component Reaction node input cuts off just before the important information, unfortunately. Running rdkit.Chem.AllChem.ReactionFromSmarts("[#6](-[#8])=[#8]>>[#6]1:[#7H]:[#7]:[#7]:[#7]:1") produces no errors for me locally, which leads me to believe this is specific to the KNIME node functionality.
Note that the difference between [#6](-[#8])=[#8] and [*:1][C:2]([OH])=O is relatively minimal: the former represents an O-C=O substructure, the latter represents a ~COOH group. Within the square brackets of the latter, the :num refers to an optional 'atom map' number, which allows a one-to-one mapping of reactant and product atoms. For example, [C:1][C:3].[C:2][C:4]>>[C:1][C:3][C:4][C:2] allows you to track which carbon is which during a reaction, for situations where it may matter. The token [*:1] means "any atom" and is equivalent to a wavy line in organic chemistry (here carrying atom map number 1).
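A quick sketch of that mapped template in action (acetic acid stands in for an arbitrary acid):

from rdkit import Chem
from rdkit.Chem import AllChem

# The mapped template from the myexperiments.org workflow: the [*:1]
# attachment atom of the acid is carried over into the tetrazole product.
rxn = AllChem.ReactionFromSmarts("[*:1][C:2]([OH])=O>>[*:1][C:2]1=NNN=N1")
products = rxn.RunReactants((Chem.MolFromSmiles("CC(=O)O"),))
product = products[0][0]
Chem.SanitizeMol(product)
print(Chem.MolToSmiles(product))  # 5-methyltetrazole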
There are only two situations I can think of where [#6](-[#8])=[#8] and [*:1][C:2]([OH])=O might differ:
You have methanoic acid as a potential input for substitution (the former will match, the latter might not; I can't remember how implicit hydrogens are treated in this situation).
Inputs are over/under protonated. (COO- != COOH)
Converting these reaction SMARTS to RDKit reaction objects and running them on input molecule objects should potentially create a number of substituted products. Note: Typically, in extensive projects, there will be some SMARTS templates that require some degree of manual intervention - indicating attachment points, specifying explicit hydrogens, etc. If you need any help or have any questions don't hesitate to drop a comment and I'll do my best to help with specifics.

Working on migration of SPL 3.0 to 4.2 (TEDA)

I am working on migrating 3.0 code to the new 4.2 framework. I am facing a few difficulties:
How do I do CDR-level deduplication in the new 4.2 framework? (Note: table deduplication is already done.)
Where should I implement PostDedupProcessor: context or chainsink custom? In either case, do I need to remove duplicate hash codes from the list or just reject the tuples? I am also updating columns for a few tuples here.
My file is not moving into the archive. The temporary output file is generated, but it is empty and located outside the load directory. What could be the possible reasons? I have thoroughly checked the config parameters, and the logs suggest the correct output is being sent from the transformer custom, so I don't know where it gets stuck. I printed the TableRowGenerator stream to the logs (at the end of DataProcessor).
1. and 2.:
You need to select the type of deduplication; it does not make a big difference whether you choose "table-" or "cdr-level-deduplication".
The ite.businessLogic.transformation.outputType setting affects this. There is only one dedup; you cannot have both.
Select recordStream for "cdr-level-deduplication", and do the transformation to table row format (e.g. if you want to use the TableFileWriter) in xxx.chainsink.custom::PostContextDataProcessor.
In xxx.chainsink.custom::PostContextDataProcessor you need to add custom code for duplicate handling: reject (discard) tuples, set special column values, or write them to different target tables.
3.:
Possible reasons could be:
- missing forwarding of window punctuations or the statistics tuple
- an error in the BloomFilter configuration; you would spot this easily because the PE goes down and the error log hints at wrong sha2 functions being used
To troubleshoot your ITE application, I recommend enabling the following debug sinks if checking the StreamsStudio live graph is not sufficient:
ite.businessLogic.transformation.debug=on
ite.businessLogic.group.debug=on
ite.businessLogic.sink.debug=on
Run a test with a single input file only and check the flow of your record and statistics tuples. Debug sinks also write punctuation markers to the debug files.

Regarding dom4j, iCal4j and backport-util-concurrent Export Control Classification Number (ECCN)

We would like to know the details below in order to use dom4j, iCal4j and backport-util-concurrent in a commercial product:
Can anyone tell me if the Java code contains encryption? Or even better:
Can anyone tell me what the export code (ECCN) for dom4j, iCal4j and backport-util-concurrent will be?
Can anyone tell me what export code (ECCN) to use when distributing a product with dom4j, iCal4j and backport-util-concurrent?
More info on ECCN: http://en.wikipedia.org/wiki/Export_Control_Classification_Number
With Regards,
Kasim Basha Shaik
iCal4j's ECCN is n/a (not applicable). iCal4j is not developed in the US, so I don't believe it is subject to export restrictions. Either way, there is not really any encryption code in iCal4j; the only encoding is BASE64 encoding of binary values.
(The above information was provided by Ben, the creator of iCal4j, here.)
In both the dom4j source (from here) and the backport-util-concurrent source (from here), I scanned through the code for the following keywords:
- AlgorithmParameters
- CertificateFactory
- CertPathBuilder
- CertPathValidator
- CertStore
- Cipher
- AES
- DES
- DESede
- RSA
- KeyFactory
- KeyGenerator
- Hmac
- KeyPairGenerator
- KeyStore
- Mac
- MessageDigest
- SecretKeyFactory
- Signature
- TransformService
- XMLSignatureFactory
No encryption-related code was found; the encryption keywords above are taken from here.
From the above code scan, I came to the conclusion that the ECCN for dom4j and backport-util-concurrent is n/a.
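For reference, the kind of scan described above can be scripted along these lines (a sketch; the path is a placeholder, and the naive substring match will produce false positives, e.g. "DES" inside "describe"):

import os

# Keyword scan sketch: walk a source tree and flag .java files that
# mention any of the JCA/JCE class names listed above.
KEYWORDS = ["AlgorithmParameters", "CertificateFactory", "CertPathBuilder",
            "CertPathValidator", "CertStore", "Cipher", "AES", "DES",
            "DESede", "RSA", "KeyFactory", "KeyGenerator", "Hmac",
            "KeyPairGenerator", "KeyStore", "Mac", "MessageDigest",
            "SecretKeyFactory", "Signature", "TransformService",
            "XMLSignatureFactory"]

def scan(source_root):
    for dirpath, _, filenames in os.walk(source_root):
        for name in filenames:
            if not name.endswith(".java"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read()
            hits = [kw for kw in KEYWORDS if kw in text]
            if hits:
                print(path, hits)

scan("dom4j-src")  # placeholder path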

Using the nltk to recognise dates as named entities?

I'm trying to use the NLTK Named Entity Tagger to identify various named entities. In the book Natural Language Processing with Python, they provide a list of commonly used named entities (Table 7.4, if anyone is curious), which includes DATE (June, 2008-06-29) and TIME (two fifty a m, 1:30 p.m.). So I got the impression that this could be done with the NLTK named entity tagger.
However, when I've run the tagger, it doesn't seem to pick up dates or times at all, as it does people or organizations. Does the NLTK named entity tagger not handle these date/time cases, or does it only pick up a specific date/time format? If it doesn't handle these cases, does anybody know of a system that does? Or is creating my own the only solution?
Thanks!
You should check out the contrib repository of NLTK; it contains a module called timex.py, which you can also download here:
https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/timex.py
From the first line of the module:
# Code for tagging temporal expressions in text
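If I remember the module correctly, usage is along these lines (a sketch; I believe timex.py exposes a tag() function that wraps temporal expressions in TIMEX2 tags, but check the linked source if the API differs):

# Assumption: nltk_contrib's timex module exposes tag(), which wraps
# temporal expressions in <TIMEX2> tags; verify against the linked source.
from nltk_contrib import timex

text = "The meeting was moved from June to 2008-06-29 at 1:30 p.m."
print(timex.tag(text))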