Spark Profile for Projections Module Configuration Check Failed: MIN_EXECUTORS_LESS_MAX_EXECUTORS - palantir-foundry

We are on an on-prem stack. I set up a Projection on a dataset. In the Projection configuration dialog, in the build section, I left the text box in which the Spark profile can be set empty, as the GUI says: "The spark profiles on this projection are managed and adjusted based on data volume. Adjustments to spark profiles will only apply to the next build of this projection."
I schedule the build to run once a day. At first it succeeds consistently. From the details of a succeeded build I can see that Spark is using dynamicAllocationMaxExecutors 40 and dynamicAllocationMinExecutors 2.
When I open the GUI of the projection again (a couple of days later), I can see that Foundry has set the profile "NUM_EXECUTORS_1". All builds using this profile are failing. In the build details I see: Configuration Check Failed: Min executors must be less than the max executors.
Is our Spark profile incorrectly configured, or is the Projection simply choosing a profile that is not suitable? I would like to let Foundry choose the profile based on the data volume without having to adjust it manually. Can you let me know how best to solve this issue?

Based on the Spark profiles documentation, setting NUM_EXECUTORS_1 adds the following Spark configuration: spark.executor.instances: 1 and spark.dynamicAllocation.maxExecutors: 1. Your stack has spark.dynamicAllocation.minExecutors: 2 set as a default, so the pre-launch check MIN_EXECUTORS_LESS_MAX_EXECUTORS fails, since minExecutors is greater than maxExecutors.
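To make the conflict concrete, here is a minimal standalone illustration (not Foundry's actual pre-launch check, just the comparison it describes) using the effective values above:

// Illustration only: the effective executor bounds of the failing build.
public class MinMaxExecutorCheck {
    public static void main(String[] args) {
        int minExecutors = 2; // stack default: spark.dynamicAllocation.minExecutors
        int maxExecutors = 1; // from NUM_EXECUTORS_1: spark.dynamicAllocation.maxExecutors
        if (minExecutors > maxExecutors) {
            // This is the condition the failing builds report.
            throw new IllegalStateException(
                    "Configuration Check Failed: Min executors must be less than the max executors");
        }
    }
}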
I raised this with the product team and asked whether NUM_EXECUTORS_1 could set spark.dynamicAllocation.minExecutors as well. Another option would be to change the stack default to spark.dynamicAllocation.minExecutors: 1. Thank you for reporting this.

Our on-site support team let me know that we have configured a minimum of 2 executors on our stack.
If Foundry then tries to use the profile NUM_EXECUTORS_1, that profile's maximum of 1 is below the configured minimum of 2.
The expected behavior would be that the dynamic selection of the Spark profile honors the configured minimum thresholds, so yes, I would consider this a bug.

Related

How do I know if my Foundry job is using incremental computation?

I want to know if a job I'm debugging is using incremental computation or not since it's necessary for my debugging techniques.
There are two ways to tell: the job's Spark Details will indicate this (if it's using Python), as will its code.
Spark Details
If you navigate to the Spark Details page as noted here, you'll notice there's a tab for Snapshot / Incremental. In this tab, and if your job is using Python, you'll get a description of whether your job is running using incremental computation. If the page reports No Incremental Details Found and you ran the job recently, this means it is not using incremental computation. However, if your job is somewhat old (typically more than a couple of days), this may not be accurate, as these Spark details are removed for retention reasons.
A quick way to check if your job's information has been removed due to retention is to navigate to the Query Plan tab and see if any information is present. If nothing is present, this means your Spark details have been deleted and you will need to re-run your job in order to see anything. If you want a more reliable way of determining if a job is using incremental computation, I'd suggest following the second method below.
Code
If you navigate to the code backing the Transform, you'll want to look for a couple indicators, depending on the language used.
Python
The Transform will have an @incremental() decorator on it if it's using incremental computation. However, this doesn't indicate whether it will choose to write or read an incremental view. The backing code can choose what types of reads or writes it wishes to do, so you'll want to inspect the code more closely to see what it's written to do.
from transforms.api import transform, Input, Output, incremental

@incremental()  # This being present indicates incremental computation
@transform(...)
def my_compute_function(...):
    ...
Java
The Transform will have the getReadRange and getWriteMode methods overridden in the backing code.

Facing round-off issue with JMeter throughput metric in exported CSV file in JMeter 5.0

I can see the full throughput number in JMeter 5.0 if I double-click on each sampler.
But it doesn't appear when I export the same report to a .csv file.
It is rounded off in the CSV file, and I need the full number so that I can compare with baseline and prior deployments.
How do I deal with this? The same approach was working in the older JMeter 2.13; I recently upgraded to the latest version 5.0 and am facing this issue.
Could anyone help me out with this?
Thanks
Looking into the Synthesis Report plugin source:
new RateRenderer("#.0"), // Throughput
I don't see an easy way of getting the full throughput number, as it is being cut to one decimal place.
I would recommend going for the Summary Report listener instead; looking into its source, you will have 5 decimal places in the resulting table:
new DecimalFormat("#.00000"), // Throughput //$NON-NLS-1$
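To see the difference concretely, here is a small standalone comparison of the two format patterns (plain java.text.DecimalFormat, not JMeter or plugin code; the throughput value is made up for the demo):

import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Renders the same example throughput value with both patterns shown above.
public class ThroughputFormatDemo {
    public static void main(String[] args) {
        double throughput = 123.45678; // example requests-per-second value
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        System.out.println(new DecimalFormat("#.0", symbols).format(throughput));     // 123.5
        System.out.println(new DecimalFormat("#.00000", symbols).format(throughput)); // 123.45678
    }
}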
Also be aware that you can use the Merge Results tool to combine the results of 2 test runs into a single .jtl file and give different prefixes to the different runs. Once done, you will be able to visualize the difference in throughput for the 2 test runs using, for example, the Transactions Per Second listener.
You can install the Merge Results tool using the JMeter Plugins Manager.
The throughput number is limited to 1 decimal place in the jp@gc - Synthesis Report (filtered) listener,
whereas we can still get the throughput number up to 5 decimal places in the Summary and Aggregate Report listeners.
This only happens with the latest JMeter version along with the latest plugins and Plugins Manager.
But I need to use the jp@gc - Synthesis Report (filtered) listener, as I have to use both the 90% Line Response Time and Std. Dev. metrics in my custom report, and it has the RegExp filtering capability which the two built-in listeners above do not have.
Hence, I have found a workaround:
I manually replaced the following older jar files in the latest JMeter 5.0, and it works:
-JMeterPlugins-Standard.jar
-JMeterPlugins-Extras.jar
-JMeterPlugins-ExtrasLibs.jar
This lets me get the full throughput number from the "jp@gc - Synthesis Report (filtered)" listener.

How to create a custom node in KNIME?

I have added all the KNIME plugins in Eclipse and I want to create my own custom node, but I am not able to understand how to pass data from one node to another.
I saw a node provided by KNIME itself, the "File Reader" node. I want the source code or the jar file for this node, but I am not able to find it.
I searched for a similar name in the Eclipse plugins folder, but still did not find it.
Can someone please tell me how to pass data from one node to another, and how to identify the classes, jar, and source code for any node provided by KNIME?
Assuming that your data is a standard data table, you need to subclass NodeModel, with a call to the supertype constructor:
public MyNodeModel() {
    // One incoming table, one outgoing table
    super(1, 1);
}
You need to override the default #execute(BufferedDataTable[] inData, ExecutionContext exec) method - this is where the meat of the node work is done and the output table created. Ideally, if your input and output table have a one-to-one row mapping then use a ColumnRearranger class (because this reduces disk IO considerably, and if you need it, allows simple parallelisation of your node), otherwise your execute method needs to iterate through the incoming datatable and generate an output table.
The #configure(DataTableSpec[] inSpecs) method needs to be implemented to at least provide a spec for the output table if this can be determined before the node is executed (it normally can, and this allows downstream nodes to be configured too; the 'Transpose' node is an example of a node which cannot do so).
There are various other methods which you also need to implement, but in some cases these will be empty methods.
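For orientation, here is a minimal skeleton of such a NodeModel. This is a sketch only: the class name MyNodeModel and the pass-through behaviour are illustrative, and a real node would build its output table in execute, for example via a ColumnRearranger and exec.createColumnRearrangeTable as described above.

// Minimal single-in, single-out NodeModel skeleton; settings and internals
// methods are intentionally left empty.
import java.io.File;
import java.io.IOException;

import org.knime.core.data.DataTableSpec;
import org.knime.core.node.BufferedDataTable;
import org.knime.core.node.CanceledExecutionException;
import org.knime.core.node.ExecutionContext;
import org.knime.core.node.ExecutionMonitor;
import org.knime.core.node.InvalidSettingsException;
import org.knime.core.node.NodeModel;
import org.knime.core.node.NodeSettingsRO;
import org.knime.core.node.NodeSettingsWO;

public class MyNodeModel extends NodeModel {

    public MyNodeModel() {
        super(1, 1); // one incoming table, one outgoing table
    }

    @Override
    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException {
        // Output spec equals the input spec here; compute a new spec if the
        // node adds or removes columns, so downstream nodes can configure.
        return new DataTableSpec[]{inSpecs[0]};
    }

    @Override
    protected BufferedDataTable[] execute(final BufferedDataTable[] inData,
            final ExecutionContext exec) throws Exception {
        // Pass-through: a real node would iterate the input rows or use a
        // ColumnRearranger to create the output table.
        return new BufferedDataTable[]{inData[0]};
    }

    @Override
    protected void reset() {
        // Nothing to reset in this sketch.
    }

    @Override
    protected void saveSettingsTo(final NodeSettingsWO settings) {
        // No settings in this sketch.
    }

    @Override
    protected void validateSettings(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        // No settings to validate.
    }

    @Override
    protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        // No settings to load.
    }

    @Override
    protected void loadInternals(final File nodeInternDir, final ExecutionMonitor exec)
            throws IOException, CanceledExecutionException {
        // No internals to load.
    }

    @Override
    protected void saveInternals(final File nodeInternDir, final ExecutionMonitor exec)
            throws IOException, CanceledExecutionException {
        // No internals to save.
    }
}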
In addition to the NodeModel, you need to implement some other classes too - a NodeFactory, optionally a NodeSettingsPane and optionally a NodeView.
In Eclipse you can view the sources for many nodes, and the KNIME community 'book' pages all have a link to their source code. Take a look at https://tech.knime.org/developer-guide and https://tech.knime.org/developer/example for a step-by-step guide. Questions to the KNIME forums (including a developer forum) generally get rapid responses, and KNIME runs a Developer Training Course a few times a year if you want to spend a few days learning more. Last but not least, it is worth familiarising yourself with the noding guidelines, which describe best practice for how your node should behave.
The source code for KNIME nodes is now available on GitHub.
Alternatively, in the Eclipse KNIME SDK you can check under your project > Plugin Dependencies > knime-base.jar > org.knime.base.node.io.filereader for the File Reader source code.
knime-base.jar is added to your project by default when it is created with the KNIME SDK.

View plot for Node in KNIME_BATCH_APPLICATION

I have been using KNIME 2.7.4 to run an analysis algorithm. I have integrated KNIME with our existing application to run in batch mode using the command below.
<<KNIME_ROOT_PATH>>\\plugins\\org.eclipse.equinox.launcher_1.2.0.v20110502.jar -application org.knime.product.KNIME_BATCH_APPLICATION -reset -workflowFile=<<Workflow Archive>> -workflow.variable=<<parameter>>,<<value>>,<<DataType>>
KNIME provides different kinds of plots which I want to use. However, I am running the workflow in batch mode. Is there any option in KNIME where I can specify the node ID and "View" option as a parameter to KNIME_BATCH_APPLICATION?
I would need a suggestion or some guidance to achieve this functionality.
I posted this question in the KNIME forum and got the satisfactory answer below.
As per the concept of command-line execution, this requirement does not fit in. Also, there is no way for the batch executor to open the view of a specific plot node.
Hence there are two possible solutions.
Solution 1
Write the output of the workflow to a file and use any charting plugin to plot the graph and do the drill-down activity.
Solution 2
Use jFreeChart and write the image using the ImageWriter node, which can then be displayed on any screen.

Change default configuration on Hadoop slave nodes?

Currently I am trying to pass some values through command-line arguments and then parse them using GenericOptionsParser, with the Tool interface implemented.
From the master node I run something like this:
bin/hadoop jar MYJAR.jar MYJOB -D mapred.reduce.tasks=13
But this only gets applied on the master! Is there any way to make it apply on the slaves as well?
I use Hadoop 0.20.203.
Any help is appreciated.
But this only gets applied on the master! Is there any way to make it apply on the slaves as well?
According to "Hadoop: The Definitive Guide", setting some properties on the client side is of no use; you need to set them in the configuration files. Note that you can also create new properties in the configuration files and read them in the code using the Configuration object.
Be aware that some properties have no effect when set in the client configuration. For example, if in your job submission you set mapred.tasktracker.map.tasks.maximum with the expectation that it would change the number of task slots for the tasktrackers running your job, then you would be disappointed, since this property is only honored if set in the tasktracker's mapred-site.xml file. In general, you can tell the component where a property should be set by its name, so the fact that mapred.tasktracker.map.tasks.maximum starts with mapred.tasktracker gives you a clue that it can be set only for the tasktracker daemon. This is not a hard and fast rule, however, so in some cases you may need to resort to trial and error, or even reading the source.
You can also configure the environment of the Hadoop daemons using the HADOOP_*_OPTS variables in the conf/hadoop-env.sh file.
Again, according to "Hadoop: The Definitive Guide":
Do not confuse setting Hadoop properties using the -D property=value option to GenericOptionsParser (and ToolRunner) with setting JVM system properties using the -Dproperty=value option to the java command. The syntax for JVM system properties does not allow any whitespace between the D and the property name, whereas GenericOptionsParser requires them to be separated by whitespace.
JVM system properties are retrieved from the java.lang.System class, whereas Hadoop properties are accessible only from a Configuration object.
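As a concrete sketch of how those Hadoop properties are read (the class name MyJob and the property my.custom.property are illustrative, not from the question), the Configuration handed to run() by ToolRunner already contains whatever was passed via -D, merged with the values from the *-site.xml files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch only: shows how -D properties parsed by GenericOptionsParser end up
// in the Configuration object, alongside values from the configuration files.
public class MyJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // "-D mapred.reduce.tasks=13" from the command line is visible here.
        int reduces = conf.getInt("mapred.reduce.tasks", 1);
        // Custom properties defined in the configuration files (or via -D)
        // are read the same way.
        String custom = conf.get("my.custom.property", "default-value");
        System.out.println("reduces=" + reduces + ", custom=" + custom);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner wires in GenericOptionsParser, so -D options are removed
        // from args and merged into the Configuration before run() is called.
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}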