Number of lines read with Spring Batch ItemReader - csv

I am using Spring Batch to write a CSV file to the database. This works just fine.
I am using a FlatFileItemReader and a custom ItemWriter. I am using no processor.
The import takes quite some time, and the UI shows no progress. I implemented a progress bar and have some global properties where I can store information (like the number of lines to read or the current import index).
My question is: how can I get the number of lines from the CSV?
Here's my XML:
<batch:job id="importPersonsJob" job-repository="jobRepository">
<batch:step id="importPersonStep">
<batch:tasklet transaction-manager="transactionManager">
<batch:chunk reader="personItemReader"
writer="personItemWriter"
commit-interval="5"
skip-limit="10">
<batch:skippable-exception-classes>
<batch:include class="java.lang.Throwable"/>
</batch:skippable-exception-classes>
</batch:chunk>
<batch:listeners>
<batch:listener ref="skipListener"/>
<batch:listener ref="chunkListener"/>
</batch:listeners>
</batch:tasklet>
</batch:step>
<batch:listeners>
<batch:listener ref="authenticationJobListener"/>
<batch:listener ref="afterJobListener"/>
</batch:listeners>
</batch:job>
I already tried the ItemReadListener interface, but that doesn't give me the number either.

If you need to know how many lines were read, that is available in Spring Batch itself: take a look at StepExecution.
The method getReadCount() should give you the number you are looking for.
You need to add a step execution listener to your step in your XML configuration. To do that (copied from the Spring documentation):
<step id="step1">
<tasklet>
<chunk reader="reader" writer="writer" commit-interval="10"/>
<listeners>
<listener ref="chunkListener"/>
</listeners>
</tasklet>
</step>
where "chunkListener" is a bean of yours with a method annotated with @AfterStep, which tells Spring Batch to call it after your step has finished.
You should also take a look at the Spring reference for step configuration.
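getReadCount() reports how many items have been read so far, while a progress bar also needs the total. One option, sketched here in plain Java and not part of Spring Batch, is to count the lines of the file before the job starts. This assumes one record per physical line and no header row; quoted fields that span several lines would make the count too high. The names CsvLineCounter, countLines and progressPercent are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not a Spring Batch class.
final class CsvLineCounter {

    private CsvLineCounter() {
    }

    // Counts physical lines; assumes UTF-8 and one record per line.
    static long countLines(Path csv) throws IOException {
        try (Stream<String> lines = Files.lines(csv, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    // Turns a read count (for example StepExecution.getReadCount())
    // into a percentage between 0 and 100.
    static int progressPercent(long readCount, long totalLines) {
        if (totalLines <= 0) {
            return 0;
        }
        return (int) Math.min(100, (readCount * 100) / totalLines);
    }
}
```

The total could be stored in the global properties before the job is launched, and the percentage recomputed from a chunk listener as the read count grows.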
Hope that helps,

Wildfly json-formatter class name metadata

Is it possible to configure a non-static value for the meta-data field in the WildFly json-formatter?
I didn't find anything about it in the WildFly documentation; it only has a simple static field example (meta-data=[@version=1]).
For example, I would like to have a field "simpleClassName": the class of the code calling the log method.
I also tried a syntax similar to pattern-formatter (example below), but it doesn't work:
<formatter name="JSON">
<json-formatter>
<meta-data>
<property name="simpleClassName" value="%c{1}"/>
</meta-data>
</json-formatter>
</formatter>
No, the meta-data is only static information. However, what you're looking for seems to be the details of the caller. Note that this is an expensive operation, so you should use it with caution. What you'd want to do is change print-details to true. In the CLI it would be something like:
/subsystem=logging/json-formatter=JSON:write-attribute(name=print-details, value=true)

Is there a checkstyle rule for forcing every field in a class to have an annotation?

We want to make compliance easy and for FedRAMP want something like this on all fields in our database objects
@FedRamp(confidentiality=LOW, integrity=MODERATE, availability=HIGH)
We want checkstyle to break the builds if people add data and forget to add these on 'any' field in the *Dbo.java class. Then, we can generate the FedRAMP compliance on each data item (and therefore the entire system). We run checkstyle on every class but only want this rule run on classes ending in *Dbo.java. Is this possible where we import some already existing checkstyle rule or plugin and add the class name filter to it?
thanks,
Dean
To report violations for such cases for any classes, you can use MatchXpathCheck (you need checkstyle 8.39+)
Config will look like:
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
"-//Checkstyle//DTD Checkstyle Configuration 1.3//EN"
"https://checkstyle.org/dtds/configuration_1_3.dtd">
<module name = "Checker">
<module name="TreeWalker">
<module name="MatchXpath">
<property name="id" value="fedramp_check"/>
<property name="query" value="//CLASS_DEF/OBJBLOCK/VARIABLE_DEF/MODIFIERS/ANNOTATION/IDENT[not(@text='FedRamp')]"/>
<message key="matchxpath.match"
value="Field should have 'FedRamp' annotation."/>
</module>
</module>
</module>
This will report violations like this:
$ cat Test.java
class Test {
@FedRamp(confidentiality=LOW, integrity=MODERATE, availability=HIGH)
private int withAnnotation = 11; // no violation
@Fed(confidentiality=LOW, integrity=MODERATE, availability=HIGH)
private int without = 11; // violation
@NotNull
int without = 11; // violation
}
$ java -jar checkstyle-8.42-all.jar -c config.xml Test.java
Starting audit...
[ERROR] C:\workdir\Test.java:6:4: Field should have FedRamp annotation. [fedramp_check]
[ERROR] C:\workdir\Test.java:9:4: Field should have FedRamp annotation. [fedramp_check]
Audit done.
Checkstyle ends with 2 errors.
The second part of your question, narrowing execution to specific classes only, can be solved in several ways:
1. Use a slightly different XPath that filters on class names (not file names, since a single file can contain many classes):
<property name="query"
value="//CLASS_DEF[./IDENT[ends-with(@text, 'Dbo')]]/OBJBLOCK/VARIABLE_DEF/MODIFIERS/ANNOTATION/IDENT[not(@text='FedRamp')]"/>
2. Use BeforeExecutionExclusionFileFilter. It is a filter for the whole config, so it only works well if you have a separate config dedicated to this annotation check.
3. Suppress violations of this check (by id, for example) for all other class files; see the documentation.
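Since the XPath query inspects annotations that are present, a field carrying no annotation at all has no ANNOTATION node to match and may slip through. As a complement (a different technique, not a Checkstyle feature), a reflection-based check run from a unit test could flag such fields. The sketch below assumes the annotation has runtime retention and that its elements use an enum; Level, the FedRamp definition, FedRampFieldCheck and PersonDbo are hypothetical stand-ins for your own types.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum for the annotation values used in the question.
enum Level { LOW, MODERATE, HIGH }

// Assumed definition: RUNTIME retention is required for reflection to see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FedRamp {
    Level confidentiality();
    Level integrity();
    Level availability();
}

final class FedRampFieldCheck {

    private FedRampFieldCheck() {
    }

    // Returns the names of declared fields that lack @FedRamp.
    static List<String> fieldsMissingFedRamp(Class<?> type) {
        List<String> missing = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler- or tool-generated fields
            }
            if (!field.isAnnotationPresent(FedRamp.class)) {
                missing.add(field.getName());
            }
        }
        return missing;
    }
}

// Hypothetical database object used only to exercise the check.
class PersonDbo {
    @FedRamp(confidentiality = Level.LOW, integrity = Level.MODERATE, availability = Level.HIGH)
    private int withAnnotation = 11;

    private int without = 11;
}
```

A build-time test could call fieldsMissingFedRamp on every class whose name ends in Dbo and fail when the returned list is not empty.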

Custom SSIS workflow task

I have a ton of containers that all follow this same basic premise:
When I pull data from a remote database I first blank out a collector table, copy the data from the remote DB to the collector, count the rows in the collector, and if there are enough rows then I merge into the real table. If not, I send an email with an error message.
Instead of repeating this over and over I would like to make a custom component. I think this is just a filter component I would make, but what I'm not really sure of is how to replicate the Data Flow Task piece. Are there any good examples somebody could point me to, or even just let me know what I want to do isn't possible?
When I see problems like this, Biml tends to offer the lowest barrier to creating a simple, repeatable solution. Biml is free: all it costs you is a registration email, and you install BimlExpress into whatever version of Visual Studio/SSDT you are working with.
I assume that I'm going to collect the data from AdventureWorks2014 Sales.Currency table and transport it to a table in tempdb called dbo.SalesCurrency.
I defined it as
CREATE TABLE dbo.SalesCurrency
(
CurrencyCode nchar(3) NOT NULL
, Name nvarchar(50) NOT NULL
, ModifiedDate datetime NOT NULL
);
Given that, let's look at some Biml concepts. Biml is an XML based dialect that describes the business intelligence artifacts (and then some). If you ever did classic ASP development with the mix of scripting and tags, it's a similar concept but much nicer due to the .NET integration.
<# #> this is a multi-line block
<#= #> is a single line expression
Great, how do I use it? Assuming you've installed BimlExpress, open an SSIS project and right click on the Project section and select Add New Biml File. Do that twice and we'll rename the second one. The first one is a driver, the second one is the worker.
Brains biml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="Source" ConnectionString="Data Source=localhost\dev2017;Initial Catalog=AdventureWorks2014;Provider=SQLNCLI11.0;Integrated Security=SSPI;" />
<OleDbConnection Name="Target" ConnectionString="Data Source=localhost\dev2017;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;" />
</Connections>
<#
string sourceQuery = "SELECT * FROM Sales.Currency;";
string targetSchemaTable = "[dbo].[SalesCurrency]";
string templateName = "so_56050574_include.biml";
dynamic customOutput;
#>
<Packages>
<#= CallBimlScriptWithOutput(templateName, out customOutput, sourceQuery, targetSchemaTable) #>
</Packages>
</Biml>
The first line is just the XML namespace.
In the next block, the Connections collection, I define my Source and Target connections. I'm very creative and named them Source and Target.
The next lines look a lot like C# because they are. I define my source query, fully qualified target table name, square brackets included and the name of my template file. The final variable, customOutput isn't used in here but it's a bag that allows me to pass information back from the template file - namely the name of the SSIS package it built.
I then define a Packages collection and make a single package. The package I make is defined by whatever I send to CallBimlScriptWithOutput and I then use the variables I just defined.
It looks complex but it's not. What I like about this approach is that, instead of hard-coding these values into my driver program, it lets me take a metadata-driven approach to development. I could look these values up from a spreadsheet, a SharePoint list, a web service, whatever I feel like (or my client offers as a repository).
Worker biml
I name this file so_56050574_include.biml and while there's plenty of text in there, it's fairly straightforward.
The first line helps the Intellisense during the Biml design experience.
The next two lines specify that these variables are going to be passed in - like a function call. I'll be able to use them like a .NET variable within the scope of this file.
The next few lines are a little funky but SSIS doesn't like duplicated names and it also doesn't like "bad" characters in names. I specify the package name will be Populate Collector and then I make the target table safe for SSIS. All the way at the bottom of the file, you'll see I have made a tiny method called MakeSsisSafeName which I use to sanitize the package name.
I create a Package and give it a good name. That Package has a Container. Within the Container, I create a handful of SSIS Variables that I'll need to do my work. That Container has tasks of Execute SQL Task -> Data Flow Task -> Execute SQL Task -> Execute SQL Task -> Send Mail Task.
<#@ template designerbimlpath="/Biml/Packages" #>
<#@ property name="SourceQuery" type="string" #>
<#@ property name="TargetSchemaTable" type="string" #>
<#
string packageName = string.Format("Populate Collector {0}", MakeSsisSafeName(TargetSchemaTable));
CustomOutput.PackageName = packageName;
#>
<Package Name="<#= packageName #>" ConstraintMode="Linear">
<Tasks>
<Container Name="SEQC Collector" ConstraintMode="Parallel">
<Variables>
<Variable Name="RowCount" DataType="Int64">0</Variable>
<Variable Name="QueryEmpty" DataType="String">TRUNCATE TABLE <#=TargetSchemaTable#></Variable>
<Variable Name="QueryCount" DataType="String">SET NOCOUNT ON; SELECT COUNT_BIG(1) AS rc FROM <#=TargetSchemaTable#></Variable>
<Variable Name="QuerySource" DataType="String"><#=SourceQuery#></Variable>
<Variable Name="TargetSchemaTable" DataType="String"><#=TargetSchemaTable #></Variable>
</Variables>
<Tasks>
<ExecuteSQL Name="SQL Empty Collector Table" ConnectionName="Target">
<VariableInput VariableName="User.QueryEmpty" />
</ExecuteSQL>
<Dataflow Name="DFT Populate Collector Table">
<Transformations>
<OleDbSource Name="OLESRC Query" ConnectionName="Source">
<VariableInput VariableName="User.QuerySource" />
</OleDbSource>
<OleDbDestination Name="OLEDST Target" ConnectionName="Target">
<TableFromVariableOutput VariableName="User.TargetSchemaTable" />
</OleDbDestination>
</Transformations>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="SQL Empty Collector Table.Output" EvaluationValue="Success" />
</Inputs>
</PrecedenceConstraints>
</Dataflow>
<ExecuteSQL Name="SQL Count Collector Table Rows" ConnectionName="Target" ResultSet="SingleRow">
<VariableInput VariableName="User.QueryCount" />
<Results>
<Result Name="0" VariableName="User.RowCount" />
</Results>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="DFT Populate Collector Table.Output" EvaluationValue="Success" />
</Inputs>
</PrecedenceConstraints>
</ExecuteSQL>
<ExecuteSQL Name="SQL Merge Collector Data" ConnectionName="Target">
<DirectInput>SELECT 1; -- simulate merge</DirectInput>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="SQL Count Collector Table Rows.Output" EvaluationOperation="ExpressionAndConstraint" EvaluationValue="Success" Expression="@[User::RowCount] > 0" />
</Inputs>
</PrecedenceConstraints>
</ExecuteSQL>
<!--
<SendMail Name="Send Mail" ToLine="Foo@bar.com" ConnectionName="Target" Subject="Subject line">
<DirectInput>Body here, I think</DirectInput>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="SQL Count Collector Table Rows.Output" EvaluationOperation="ExpressionOrConstraint" EvaluationValue="Success" Expression="@[User::RowCount] == 0" />
</Inputs>
</PrecedenceConstraints>
</SendMail>
-->
<ExecuteSQL Name="SQL Pretend I send mail" ConnectionName="Target">
<DirectInput>SELECT 2; -- simulate sending mail</DirectInput>
<PrecedenceConstraints>
<Inputs>
<Input OutputPathName="SQL Count Collector Table Rows.Output" EvaluationOperation="ExpressionAndConstraint" EvaluationValue="Success" Expression="@[User::RowCount] == 0" />
</Inputs>
</PrecedenceConstraints>
</ExecuteSQL>
</Tasks>
</Container>
</Tasks>
</Package>
<#+
private static string MakeSsisSafeName(string name)
{
return name.Replace("/", "_").Replace("\\", "_").Replace(":", "_").Replace("[", "_").Replace("]", "_").Replace(".", "_").Replace("=", "_").Trim();
}
#>
Right click on the BimlScript brains file and select Generate SSIS Package
That should build out a package like this and hey, it works!
What's not covered
I don't know how you actually use this. Maybe you have one big package with lots of containers and your vision is to just push the button and have another template container added. Biml won't do that. It doesn't merge two SSIS packages - it overlays one with current definition. But, the way I defined all of this, you should be able to copy the generated Container and paste it into an existing SSIS package - assuming it has two connections named Source and Target.
Connections can also be tricky. If you're collecting data from N source servers then you'll likely want a looping mechanism to change out the Source value. That's not hard. But if the source data you're pulling back for each Collector has a different signature, then you need each bespoke Data Flow task.
Sending Email. I don't have an SMTP connection handy so I put a best guess at what the Send Mail would look like and then commented it out <!-- ... --> You'll need to add a Connection for your SMTP server in the brains package and then configure the SendMail task to use it. And then remove my "SQL Pretend I send mail" task.
Finally, you'll notice the names are repeated in the worker Biml. That tells the engine how things should be wired up. If you don't like what I called something, you'll need to change it in two places. Search and Replace will be handy in this ;)
The question asked about custom workflow tasks - answer it
Fine. It sucks. The Data Flow stuff gets into COM objects, and they aren't pleasant to work with. When you supply a query or source table, you need to check the metadata, add/remove columns, and do lots of other poorly documented scut work. And that's just building a "regular" package through the interfaces. Once you get that solved, you are looking at encapsulating that logic into a custom component, which used to be documented with fair enough samples on CodePlex, but that's dead now and I don't know if it's been migrated to GitHub. Oh, and custom tasks, and components especially, are version dependent, so you get to build against the various binaries to get a DLL for each. And then you'll likely need to build out UI components to help folks configure your SSIS task/component. And then you'll need to worry about delivering and installing it on each developer's computer. And the server installation.
Or, I can define it once via Biml and be done.

OpenDDS and OpenSplice interoperability

I have two programs, one using OpenSplice 6.7.1 and the other using OpenDDS 3.10.
They are both using RTPS as the protocol, with the same domain ID and destination port (I verified this using Wireshark).
The problem is that they are not communicating.
I don't know if I am doing anything wrong with the config... I am using the basic config for OpenDDS with RTPS and for OpenSplice I used the provided ospl.xml after changing the domain ID.
Here are my config files.
For OpenDDS:
[common]
DCPSGlobalTransportConfig=$file
DCPSDefaultDiscovery=DEFAULT_RTPS
[transport/the_rtps_transport]
transport_type=rtps_udp
For OpenSplice:
<OpenSplice>
<Domain>
<Name>ospl_sp_ddsi</Name>
<Id>223</Id>
<SingleProcess>true</SingleProcess>
<Description>Stand-alone 'single-process' deployment and standard DDSI networking.</Description>
<Service name="ddsi2">
<Command>ddsi2</Command>
</Service>
<Service name="durability">
<Command>durability</Command>
</Service>
<Service name="cmsoap">
<Command>cmsoap</Command>
</Service>
</Domain>
<DDSI2Service name="ddsi2">
<General>
<NetworkInterfaceAddress>AUTO</NetworkInterfaceAddress>
<AllowMulticast>true</AllowMulticast>
<EnableMulticastLoopback>true</EnableMulticastLoopback>
<CoexistWithNativeNetworking>false</CoexistWithNativeNetworking>
</General>
<Compatibility>
<!-- see the release notes and/or the OpenSplice configurator on DDSI interoperability -->
<StandardsConformance>lax</StandardsConformance>
<!-- the following one is necessary only for TwinOaks CoreDX DDS compatibility -->
<!-- <ExplicitlyPublishQosSetToDefault>true</ExplicitlyPublishQosSetToDefault> -->
</Compatibility>
</DDSI2Service>
<DurabilityService name="durability">
<Network>
<Alignment>
<TimeAlignment>false</TimeAlignment>
<RequestCombinePeriod>
<Initial>2.5</Initial>
<Operational>0.1</Operational>
</RequestCombinePeriod>
</Alignment>
<WaitForAttachment maxWaitCount="100">
<ServiceName>ddsi2</ServiceName>
</WaitForAttachment>
</Network>
<NameSpaces>
<NameSpace name="defaultNamespace">
<Partition>*</Partition>
</NameSpace>
<Policy alignee="Initial" aligner="true" durability="Durable" nameSpace="defaultNamespace"/>
</NameSpaces>
</DurabilityService>
<TunerService name="cmsoap">
<Server>
<PortNr>Auto</PortNr>
</Server>
</TunerService>
</OpenSplice>
What am I doing wrong?
Multi-vendor interoperability has been demonstrated repeatedly at OMG events, but not recently, so a regression may have crept into either product.
Your OpenSplice configuration is a proper default configuration, apart from the domain ID, which should match the one used in your application (users typically pass DDS::DOMAIN_ID_DEFAULT to indicate they want the ID specified in the configuration pointed to by the OSPL_URI environment variable). I'm sure you are aware that the AUTO setting for the network interface/IP address is a potential source of confusion on multi-homed machines.
So the next step would be to look at the DDSI traces and/or Wireshark captures and see whether you spot DDSI wire frames from both vendors (1.2 for PrismTech, 1.3 for OCI).
If, for instance, there's no sign of vendor 1.3 being identified in the OpenSplice DDSI traces, that suggests there are still some 'fundamental' communication issues.
Note that at these OMG events we typically used the (for us 'bundled') iShapes example on domain '0' with a module-less IDL topic-type specification to verify interoperability, so if it doesn't work for your application, that's worth trying too (and use Wireshark in combination with that example as well).
I'll also keep watching the community forum for new information on this.

How to get JAXB output to have namespace included with the child node with no prefix?

God knows I searched the forum for an answer, but didn't see any.
This is the simplified XML my JAXB code reads. There are two namespaces involved, xyz and abc. They are defined in two different schema files, and xjc generates two different packages for them. JAXB reads the following file into those classes nicely and can even write it back.
<xyz:xyz xsi:schemaLocation="urn:xyz xyz.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xyz="urn:xyz">
<session>
<App xsi:schemaLocation="urn:abc abc.xsd" xmlns="urn:abc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<AppItem att1="1234"/>
</App>
</session>
</xyz:xyz>
This is how it writes it.
<ns3:xyz xmlns:ns2="urn:abc" xmlns:ns3="urn:xyz">
<session>
<ns2:App>
<ns2:AppItem att1="1234"/>
</ns2:App>
</session>
</ns3:xyz>
Now I know about NamespacePrefixMapper, and I can change ns2 and ns3 to the values I want. But what I want is the following: basically, I want to maintain the original form of the XML. The App element should have all its information contained in itself and not create a prefix.
<xyz:xyz xmlns:xyz="urn:xyz">
<session>
<App xsi:schemaLocation="urn:abc abc.xsd" xmlns="urn:abc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<AppItem att1="1234"/>
</App>
</session>
</xyz:xyz>
Does anyone have any clue as to how to achieve this? Seems like some setting in AppType.java should tell the writer to not update root element with prefix.