Boost.Log: using formatting_ostream to store a log message

I want to prepare a log message and store it in something.
Then pass this something to a function, which applies << to it, putting it into the BOOST_LOG macro.
I need this because I send this log message to several backends using several loggers.
I read about formatting_ostream, but all the examples show overloading of <<, which takes an lvalue reference to formatting_ostream. I wonder: where is the formatting_ostream created?
Can I do this:
boost::log::formatting_ostream os << "Request #" << this->GetId() << " for " << mUrl << " has been cancelled by the user at start of execute coroutine." << std::endl;
boost_log_function(mHTTPRequest_LoggingInstance_shared_pointer);
Then:
BOOST_LOG(*loggerChannel_cancelExecute.get()) << os;

First, you don't need to output a log record through multiple loggers in order to have it processed by multiple sink backends. As long as the record is not rejected by filters, every log record is processed in all sinks, regardless of which logger was used to produce it. If you purposely arrange filters and attributes in loggers so that records from one logger are only processed in one sink (e.g. by using channels), you can also arrange them so that records from other loggers not associated with particular sinks are processed in all sinks. This is much more efficient than generating multiple records in different loggers, because it avoids the overhead of creating extra log records and applying filters to them.
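To illustrate, here is a minimal sketch of such an arrangement (the channel names, stream targets and the init_sinks function are made up for the example, not taken from your code): two sinks, each bound to its own channel via a filter, with both filters also letting through records that carry no Channel attribute at all.

#include <boost/core/null_deleter.hpp>
#include <boost/log/core.hpp>
#include <boost/log/expressions.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/log/sinks/text_ostream_backend.hpp>
#include <boost/make_shared.hpp>
#include <fstream>
#include <iostream>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;
namespace expr = boost::log::expressions;

BOOST_LOG_ATTRIBUTE_KEYWORD(channel, "Channel", std::string)

void init_sinks()
{
    typedef sinks::synchronous_sink<sinks::text_ostream_backend> text_sink;

    // Sink A: console; accepts the "net" channel and any record without a channel
    boost::shared_ptr<text_sink> sink_a = boost::make_shared<text_sink>();
    sink_a->locked_backend()->add_stream(
        boost::shared_ptr<std::ostream>(&std::clog, boost::null_deleter()));
    sink_a->set_filter(!expr::has_attr(channel) || channel == "net");

    // Sink B: file; accepts the "db" channel and any record without a channel
    boost::shared_ptr<text_sink> sink_b = boost::make_shared<text_sink>();
    sink_b->locked_backend()->add_stream(boost::make_shared<std::ofstream>("db.log"));
    sink_b->set_filter(!expr::has_attr(channel) || channel == "db");

    logging::core::get()->add_sink(sink_a);
    logging::core::get()->add_sink(sink_b);
}

With this setup, a record emitted through a plain logger (no Channel attribute) is processed by both sinks, while records from channel loggers bound to "net" or "db" only reach their own sink.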
Now, to directly answer your question, the formatting_ostream object that is passed to various functions is created by Boost.Log. Exactly where it is created depends on the function in question. For example, the stream that is passed to formatters is created as part of the sink frontend implementation.
You can create formatting_ostream, but you need to remember the following:
You have to supply, in the formatting_ostream constructor, a string in which the formatted output will be stored. That string must stay alive for the whole lifetime of the stream object.
After you're done with formatting, you need to explicitly flush the stream to ensure any buffered content in the stream is pushed out into the string.
std::string str;
boost::log::formatting_ostream strm(str);
strm << "Request #" << this->GetId() << " for " << mUrl
<< " has been cancelled by the user at start of execute coroutine.";
strm.flush();
BOOST_LOG(*loggerChannel_cancelExecute.get()) << str;
However, you are not required to use formatting_ostream in the first place. You can compose the string in any way you want, including std::ostringstream, Boost.Format or even std::snprintf.
You should know though that pre-composing the message string like this may be bad for performance. If a log record is discarded by filters, the streaming expression is not evaluated at all. But your code that pre-composes the message is always evaluated, even if the log record is discarded afterwards.
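To make that last point concrete, here is a rough comparison (lg and expensive_to_format() are placeholders, not names from your code):

// Pre-composed: expensive_to_format() runs even if the record is later
// rejected by filters.
std::string str;
boost::log::formatting_ostream strm(str);
strm << expensive_to_format();
strm.flush();
BOOST_LOG(lg) << str;

// Streamed directly: the right-hand side is only evaluated if the record
// passes the filters, so rejected records cost almost nothing.
BOOST_LOG(lg) << expensive_to_format();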

Related

Writing multiple values to a CSV file using BeanShell PostProcessor in JMeter

I had a scenario where I needed to write correlation values to a CSV file, and the easiest way I came up with is the code in the answer section below.
More suggestions are appreciated.
There is a plugin called Flexible File Writer. You can use that; it is efficient and easy to implement, as explained here.
Be aware that starting from JMeter 3.1 it's recommended to use JSR223 Test Elements and the Groovy language for any form of scripting, so consider migrating to the JSR223 PostProcessor and the following code:
new File('FILEPATH/filename.csv') << vars.get('PARAM_1') << ',' << vars.get('PARAM_2') << System.getProperty('line.separator')
However, this approach will work only if no concurrency is assumed; if the PostProcessor is executed by 2 or more concurrent threads, you may run into a race condition where multiple threads write to the same file, resulting in data corruption.
So I would recommend declaring your PARAM_1 and PARAM_2 as Sample Variables and storing them in a file using, for example, the Flexible File Writer.
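For reference, declaring Sample Variables is just a JMeter property, usually added to user.properties (the variable names below match the question; JMeter needs a restart for the change to take effect):

# user.properties
sample_variables=PARAM_1,PARAM_2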
Add the below code in a BeanShell PostProcessor:
a = vars.get("PARAM_1"); // PARAM_1 is parameter/correlation variable
b = vars.get("PARAM_2"); // PARAM_2 is parameter/correlation variable
f = new FileOutputStream("FILEPATH/filename.csv", true); // open the CSV file in append mode
p = new PrintStream(f);
this.interpreter.setOut(p); // redirect BeanShell's print() output to the file
print(a + "," + b); // write one CSV line: PARAM_1, comma, PARAM_2
f.close();

How to run SAP R/3 transactions through JCo3, or execute reports through JCo?

If I log in to SAP R/3 and execute the transaction code MM60, it shows a UI screen for the material list and asks for a material number. If I specify a material number and execute, it shows me the output, i.e. the material list.
Here the story ends if I am an SAP R/3 user.
But what if I want to do the same steps using a Java program and get the result in Java itself, instead of going to SAP R/3? I basically want to do this because I want to use that output data for a BI tool.
Suppose I am using JCo3 for the connection to R/3.
EDIT
Based on the info in the link, I tried to do something like the code below, but it neither schedules any job in the background nor downloads any spool file, etc.
I've manually sent a document to the spool and tried giving its ID in the code. This is for MM60.
JCoContext.begin(destination);
function = mRepository.getFunction("BAPI_XBP_JOB_OPEN");
JCoParameterList input = function.getImportParameterList();
input.setValue("JOBNAME", "jb1");
input.setValue("EXTERNAL_USER_NAME", "sap*");
function.execute(destination);
JCoFunction function2 = mRepository.getFunction("BAPI_XBP_JOB_ADD_ABAP_STEP");
function2.getImportParameterList().setValue("JOBNAME", "jb1");
function2.getImportParameterList().setValue("EXTERNAL_USER_NAME", "sap*");
function2.getImportParameterList().setValue("ABAP_PROGRAM_NAME", "RMMVRZ00");
function2.getImportParameterList().setValue("ABAP_VARIANT_NAME", "KRUGMANN");
function2.getImportParameterList().setValue("SAP_USER_NAME", "sap*");
function2.getImportParameterList().setValue("LANGUAGE", destination.getLanguage());
function2.execute(destination);
function3.getImportParameterList().setValue("JOBNAME", "jb1");
function3.getImportParameterList().setValue("EXTERNAL_USER_NAME", "sap*");
function3.getImportParameterList().setValue("EXT_PROGRAM_NAME", "RMMVRZ00");
function3.getImportParameterList().setValue("SAP_USER_NAME", "sap*");
function3.execute(destination);
JCoFunction function4 = mRepository.getFunction("BAPI_XBP_JOB_CLOSE");
function4.getImportParameterList().setValue("JOBNAME", "jb1");
function4.getImportParameterList().setValue("EXTERNAL_USER_NAME", "sap*");
function4.execute(destination);
JCoFunction function5 = mRepository.getFunction("BAPI_XBP_JOB_START_ASAP");
function5.getImportParameterList().setValue("JOBNAME", "jb1");
function5.getImportParameterList().setValue("EXTERNAL_USER_NAME", "sap*");
function5.execute(destination);
JCoFunction function6 = mRepository.getFunction("RSPO_DOWNLOAD_SPOOLJOB");
function6.getImportParameterList().setValue("ID", "31801");
function6.getImportParameterList().setValue("FNAME", "abc");
function6.execute(destination);
You cannot execute an SAP transaction through JCo. What you can do, is run remote-enabled function modules. So you need to either write a function module of your own, providing exactly the functionality you require, or find an SAP function module, that does what you need (or close enough to be useful).
Your code has the following issues:
XBP BAPIs can only be used if you declare their usage via BAPI_XMI_LOGON and BAPI_XMI_LOGOFF. Pass the parameters interface = 'XBP', version = '3.0', extcompany = 'any name you want'.
You start the program RMMVRZ00 (which corresponds to the program directly behind the transaction code MM60) with the program variant KRUGMANN which is defined at SAP side with a given material number, but your goal is probably to pass a varying material number, so you should first change the material number in the program variant via BAPI_XBP_VARIANT_CHANGE.
After calling BAPI_XBP_JOB_OPEN, you should read the returned value of the JOBCOUNT parameter, and pass it to all subsequent BAPI_XBP_JOB_* calls, along with JOBNAME (I mean, two jobs may be named identically, JOBCOUNT is there to identify the job uniquely).
After calling BAPI_XBP_JOB_START_ASAP, you should wait for the job to be finished, by repeatedly calling BAPI_XBP_JOB_STATUS_GET until the job status is A (aborted) or F (finished successfully).
You hardcode the spool number generated by the program. To retrieve the spool number, you may call BAPI_XBP_JOB_SPOOLLIST_READ which returns all spool data of the job.
Moreover I'm not sure whether you may call the function module RSPO_DOWNLOAD_SPOOLJOB to download the spool data to a file on your java computer. If it doesn't work, you may use the spool data returned by BAPI_XBP_JOB_SPOOLLIST_READ and do whatever you want.
In short, I think that the sequence should be the following (a rough JCo3 sketch follows the list):
BAPI_XMI_LOGON
BAPI_XBP_VARIANT_CHANGE
BAPI_XBP_JOB_OPEN
BAPI_XBP_JOB_ADD_ABAP_STEP
BAPI_XBP_JOB_CLOSE
BAPI_XBP_JOB_START_ASAP
Calling BAPI_XBP_JOB_STATUS_GET repeatedly until the status is A or F
Note that it may take some time if there are many jobs waiting in the SAP queue
BAPI_XBP_JOB_SPOOLLIST_READ
Possibly RSPO_DOWNLOAD_SPOOLJOB, if it works
BAPI_XMI_LOGOFF
Possibly BAPI_TRANSACTION_COMMIT, because XMI writes an XMI log.
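A condensed JCo3 sketch of that sequence might look like the following. It is only an outline: RETURN-structure checks, the variant change and the spool read are omitted, and the parameter names should be verified against the BAPI definitions on your system.

// uses classes from com.sap.conn.jco.*
void runJob(JCoDestination destination, JCoRepository mRepository)
        throws JCoException, InterruptedException {
    JCoContext.begin(destination);
    try {
        // 1. Declare XBP usage
        JCoFunction logon = mRepository.getFunction("BAPI_XMI_LOGON");
        logon.getImportParameterList().setValue("EXTCOMPANY", "mycompany"); // any name
        logon.getImportParameterList().setValue("EXTPRODUCT", "myproduct"); // any name
        logon.getImportParameterList().setValue("INTERFACE", "XBP");
        logon.getImportParameterList().setValue("VERSION", "3.0");
        logon.execute(destination);

        // 2. (BAPI_XBP_VARIANT_CHANGE here if the material number must vary)

        // 3. Open the job and keep JOBCOUNT, which identifies it uniquely
        JCoFunction open = mRepository.getFunction("BAPI_XBP_JOB_OPEN");
        open.getImportParameterList().setValue("JOBNAME", "jb1");
        open.getImportParameterList().setValue("EXTERNAL_USER_NAME", "sap*");
        open.execute(destination);
        String jobCount = open.getExportParameterList().getString("JOBCOUNT");

        // 4. Add the ABAP step, passing both JOBNAME and JOBCOUNT
        JCoFunction addStep = mRepository.getFunction("BAPI_XBP_JOB_ADD_ABAP_STEP");
        addStep.getImportParameterList().setValue("JOBNAME", "jb1");
        addStep.getImportParameterList().setValue("JOBCOUNT", jobCount);
        addStep.getImportParameterList().setValue("EXTERNAL_USER_NAME", "sap*");
        addStep.getImportParameterList().setValue("ABAP_PROGRAM_NAME", "RMMVRZ00");
        addStep.getImportParameterList().setValue("ABAP_VARIANT_NAME", "KRUGMANN");
        addStep.getImportParameterList().setValue("SAP_USER_NAME", "sap*");
        addStep.execute(destination);

        // 5./6. BAPI_XBP_JOB_CLOSE and BAPI_XBP_JOB_START_ASAP follow the same
        // pattern, always passing both JOBNAME and JOBCOUNT.

        // 7. Poll until the job is aborted (A) or finished (F)
        String status;
        do {
            Thread.sleep(2000);
            JCoFunction statusGet = mRepository.getFunction("BAPI_XBP_JOB_STATUS_GET");
            statusGet.getImportParameterList().setValue("JOBNAME", "jb1");
            statusGet.getImportParameterList().setValue("JOBCOUNT", jobCount);
            statusGet.getImportParameterList().setValue("EXTERNAL_USER_NAME", "sap*");
            statusGet.execute(destination);
            status = statusGet.getExportParameterList().getString("STATUS");
        } while (!"A".equals(status) && !"F".equals(status));

        // 8.-10. BAPI_XBP_JOB_SPOOLLIST_READ, optionally RSPO_DOWNLOAD_SPOOLJOB,
        // then BAPI_XMI_LOGOFF (and possibly BAPI_TRANSACTION_COMMIT)
    } finally {
        JCoContext.end(destination);
    }
}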

GCP Dataflow - processing JSON takes too long

I am trying to process JSON files in a bucket and write the results into a bucket:
DataflowPipelineOptions options = PipelineOptionsFactory.create()
        .as(DataflowPipelineOptions.class);
options.setRunner(BlockingDataflowPipelineRunner.class);
options.setProject("the-project");
options.setStagingLocation("gs://some-bucket/temp/");

Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from("gs://some-bucket/2016/04/28/*/*.json"))
 .apply(ParDo.named("SanitizeJson").of(new DoFn<String, String>() {
     @Override
     public void processElement(ProcessContext c) {
         try {
             JsonFactory factory = JacksonFactory.getDefaultInstance();
             String json = c.element();
             SomeClass e = factory.fromString(json, SomeClass.class);
             // manipulate the object a bit...
             c.output(factory.toString(e));
         } catch (Exception err) {
             LOG.error("Failed to process element: " + c.element(), err);
         }
     }
 }))
 .apply(TextIO.Write.to("gs://some-bucket/output/"));
p.run();
I have around 50,000 files under the path gs://some-bucket/2016/04/28/ (in sub-directories).
My question is: does it make sense that this takes more than an hour to complete? Doing something similar on a Spark cluster in Amazon takes about 15-20 minutes. I suspect that I might be doing something inefficiently.
EDIT:
In my Spark job I aggregate all the results in a DataFrame and only then write the output, all at once. I noticed that my pipeline here writes each file separately; I assume that is why it's taking much longer. Is there a way to change this behavior?
Your jobs are hitting a couple of performance issues in Dataflow, caused by the fact that it is more optimized for executing work in larger increments, while your job is processing lots of very small files. As a result, some aspects of the job's execution end up dominated by per-file overhead. Here are some details and suggestions.
The job is limited by writing output rather than by reading input (though reading input is also a significant part). You can significantly cut that overhead by specifying withNumShards on your TextIO.Write, depending on how many files you want in the output; e.g. 100 could be a reasonable value. By default you get an unspecified number of output files which, in this case, given the current behavior of the Dataflow optimizer, matches the number of input files. Usually this is a good idea because it allows us not to materialize the intermediate data, but here it is not, because the input files are so small and the per-file overhead dominates.
I recommend setting maxNumWorkers to a value such as 12 - currently the second job is autoscaling to an excessively large number of workers. This is caused by Dataflow's autoscaling currently being geared toward jobs that process data in larger increments - it doesn't take per-file overhead into account and does not behave well in your case.
The second job is also hitting a bug because of which it fails to finalize the written output. We're investigating, however setting maxNumWorkers should also make it complete successfully.
To put it briefly:
set maxNumWorkers=12
set TextIO.Write.to("...").withNumShards(100)
and it should run much better; see the sketch below.
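Applied to the code in the question, the two changes would look roughly like this (12 and 100 are just the suggested values above, not hard requirements):

// Cap autoscaling for this many-small-files workload
options.setMaxNumWorkers(12);

// Bound the number of output files instead of producing one per input file
p.apply(TextIO.Read.from("gs://some-bucket/2016/04/28/*/*.json"))
 .apply(/* SanitizeJson ParDo as before */)
 .apply(TextIO.Write.to("gs://some-bucket/output/").withNumShards(100));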

How do I chain Python generator expressions with filtering?

I am trying to chain generators so I can process a large CSV file as a stream of lines rather than doing each operation in a batch.
This way, I can delay each iteration of each step, avoiding loading the entire data set into memory.
Generator expressions work fine except when I try to filter the output with an if clause.
This works, and only one line is consumed at a time:
file_iterable = open("myfile.csv")
parsed_csv_iterable = (parse_i(i) for i in file_iterable)
Then I can get a line at a time by calling next() on the resulting iterable. However, if I do this:
file_iterable = open("myfile.csv")
parsed_csv_iterable = (parse_i(i) for i in file_iterable if parse_i(i)[0] in [1,2,3])
Then the iterator keeps running until it is exhausted. Why? What's the workaround?
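For what it's worth, the filtered expression is still lazy, but each next() call has to consume input lines until the predicate matches (or the file ends), and it calls parse_i twice per line. Chaining a second generator avoids the double parse while keeping everything lazy; a sketch using the names from the question:

file_iterable = open("myfile.csv")
parsed_csv_iterable = (parse_i(i) for i in file_iterable)
filtered_iterable = (row for row in parsed_csv_iterable if row[0] in [1, 2, 3])

next(filtered_iterable)  # consumes only as many lines as needed to find one match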

How can I set an expression on the FileSpec property of the Foreach File enumerator?

I'm trying to create an SSIS package to process files from a directory that contains many years' worth of files. The files are all named numerically, so to save processing everything, I want to pass SSIS a minimum number and only enumerate files whose name (converted to a number) is higher than my minimum.
I've tried letting the ForEach File loop enumerate everything and then exclude files in a Script Task, but when dealing with hundreds of thousands of files, this is way too slow to be suitable.
The FileSpec property lets you specify a file mask to dictate which files you want in the collection, but I can't quite see how to specify an expression to make that work, as it's essentially a string match.
If there's an expression within the component somewhere which basically says Should I Enumerate? - Yes / No, that would be perfect. I've been experimenting with the below expression, but can't find a property to which to apply it.
(DT_I4)REPLACE( SUBSTRING(#[User::ActiveFilePath],FINDSTRING( #[User::ActiveFilePath], "\", 7 ) + 1 ,100),".txt","") > #[User::MinIndexId] ? "True" : "False"
Here is one way you can achieve this. You could use Expression Task combined with Foreach Loop Container to match the numerical values of the file names. Here is an example that illustrates how to do this. The sample uses SSIS 2012.
This may not be very efficient but it is one way of doing this.
Let's assume there is a folder with a bunch of files named in the format YYYYMMDD. The folder contains files for the first day of every month since 1921, like 19210101, 19210201, 19210301 ... all the way up to the current month, 20121101. That adds up to 1,103 files.
Let's say the requirement is only to loop through the files that were created since June 1948. That would mean the SSIS package has to loop through only the files greater than 19480601.
On the SSIS package, create the following three parameters. It is better to configure parameters for these because the values are configurable across environments.
ExtensionToMatch - This parameter of String data type will contain the extension that the package has to loop through. This will supplement the value to FileSpec variable that will be used on the Foreach Loop container.
FolderToEnumerate - This parameter of String data type will store the folder path that contains the files to loop through.
MinIndexId - This parameter of Int32 data type will contain the minimum numerical value above which the files should match the pattern.
Create the following four variables that will help us loop through the files.
ActiveFilePath - This variable of String data type will hold the file name as the Foreach Loop container loops through each file in the folder. This variable is used in the expression of another variable. To avoid error, set it to a non-empty value, say 1.
FileCount - This is a dummy variable of Int32 data type will be used for this sample to illustrate the number of files that the Foreach Loop container will loop through.
FileSpec - This variable of String data type will hold the file pattern to loop through. Set the expression of this variable to the value below. The expression uses the extension specified in the parameters; if no extension is given, it defaults to *.* to loop through all files.
"*" + (#[$Package::ExtensionToMatch] == "" ? ".*" : #[$Package::ExtensionToMatch])
ProcessThisFile - This variable of Boolean data type will evaluate whether a particular file matches the criteria or not.
Configure the package as shown below. Foreach loop container will loop through all the files matching the pattern specified on the FileSpec variable. An expression specified on the Expression Task will evaluate during runtime and will populate the variable ProcessThisFile. The variable will then be used on the Precedence constraint to determine whether to process the file or not.
The script task within the Foreach loop container will increment the counter of variable FileCount by 1 for each file that successfully matches the expression.
The script task outside the Foreach loop will simply display how many files were looped through by the Foreach loop container.
Configure the Foreach loop container to loop through the folder using the parameter and the files using the variable.
Store the file name in variable ActiveFilePath as the loop passes through each file.
On the Expression task, set the expression to the following value. The expression will convert the file name without the extension to a number and then check whether it evaluates to greater than the given number in the parameter MinIndexId.
#[User::ProcessThisFile] = (DT_BOOL)((DT_I4)(REPLACE(#[User::ActiveFilePath], #[User::FileSpec] ,"")) > #[$Package::MinIndexId] ? 1: 0)
Right-click on the Precedence constraint and configure it to use the variable ProcessThisFile on the expression. This tells the package to process the file only if it matches the condition set on the expression task.
#[User::ProcessThisFile]
On the first script task, I have the variable User::FileCount set in the ReadWriteVariables and the following C# code within the script task. This increments the counter for each file that successfully matches the condition.
public void Main()
{
    Dts.Variables["User::FileCount"].Value = Convert.ToInt32(Dts.Variables["User::FileCount"].Value) + 1;
    Dts.TaskResult = (int)ScriptResults.Success;
}
On the second script task, I have the variable User::FileCount set in the ReadOnlyVariables and the following C# code within the script task. This simply outputs the total number of files that were processed.
public void Main()
{
    MessageBox.Show(String.Format("Total files looped through: {0}", Dts.Variables["User::FileCount"].Value));
    Dts.TaskResult = (int)ScriptResults.Success;
}
When the package is executed with MinIndexId set to 19480601 (exclusive), it outputs the value 773.
When the package is executed with MinIndexId set to 20111201 (exclusive), it outputs the value 11.
Hope that helps.
From investigating how the ForEach loop works in SSIS (with a view to creating my own to solve the issue), it seems that it enumerates the entire file collection first, before any mask is applied. It's hard to tell exactly what's going on without seeing the underlying code for the ForEach loop, but it appears to work this way, resulting in slow performance when dealing with over 100k files.
While @Siva's solution is fantastically detailed and definitely an improvement over my initial approach, it is essentially just the same process, except using an Expression Task to test the filename rather than a Script Task (this does seem to offer some improvement).
So, I decided to take a totally different approach and rather than use a file-based ForEach loop, enumerate the collection myself in a Script Task, apply my filtering logic, and then iterate over the remaining results. This is what I did:
In my Script Task, I use the DirectoryInfo.EnumerateFiles method, which is the recommended approach for large file collections because it streams results lazily, rather than waiting for the entire collection to be built before applying any logic.
Here's the code:
public void Main()
{
    string sourceDir = Dts.Variables["SourceDirectory"].Value.ToString();
    int minJobId = (int)Dts.Variables["MinIndexId"].Value;

    // Enumerate the file collection (using EnumerateFiles to allow us to start processing immediately)
    List<string> activeFiles = new List<string>();
    System.Threading.Tasks.Task listTask = System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        DirectoryInfo dir = new DirectoryInfo(sourceDir);
        foreach (FileInfo f in dir.EnumerateFiles("*.txt"))
        {
            FileInfo file = f;
            string filePath = file.FullName;
            string fileName = filePath.Substring(filePath.LastIndexOf("\\") + 1);
            int jobId = Convert.ToInt32(fileName.Substring(0, fileName.IndexOf(".txt")));

            if (jobId > minJobId)
                activeFiles.Add(filePath);
        }
    });

    // Wait here for completion
    System.Threading.Tasks.Task.WaitAll(new System.Threading.Tasks.Task[] { listTask });

    Dts.Variables["ActiveFilenames"].Value = activeFiles;
    Dts.TaskResult = (int)ScriptResults.Success;
}
So, I enumerate the collection, applying my logic as files are discovered and immediately adding the file path to my list for output. Once complete, I then assign this to an SSIS Object variable named ActiveFilenames which I'll use as the collection for my ForEach loop.
I configured the ForEach loop as a Foreach From Variable Enumerator, which now iterates over a much smaller collection (a post-filtered List<string>, compared to what I can only assume was an unfiltered List<FileInfo> or something similar in SSIS's built-in ForEach File Enumerator).
So the tasks inside my loop can just be dedicated to processing the data, since it has already been filtered before hitting the loop. Although it doesn't seem to be doing much different to either my initial package or Siva's example, in production (for this particular case, anyway) it seems like filtering the collection and enumerating asynchronously provides a massive boost over using the built in ForEach File Enumerator.
I'm going to continue investigating the ForEach loop container and see if I can replicate this logic in a custom component. If I get this working I'll post a link in the comments.
The best you can do is use FileSpec to specify a mask, as you said. You could include at least some specs in it, like files starting with "201" for 2010, 2011 and 2012. Then, in some other task, you could filter out those you don't want to process (for instance, 2010).
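For example, given the .txt naming used in the question, a FileSpec mask along these lines would at least narrow the enumeration to names starting with 201 before any per-file logic runs:

201*.txt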