I have a javaexec task in my build.gradle file while processes more than 100 files sequentially. I want to add incremental build behavior to this task.
Please let me know how to implement this incremental feature for custom JavaExec task.
I have tried the following:
class IncrementalReverseTask extends DefaultTask {
#InputDirectory
def File inputDir
#OutputDirectory
def File outputDir
#Input
def inputProperty
#TaskAction
void execute(IncrementalTaskInputs inputs) {
println inputs.incremental ? "CHANGED inputs considered out of date"
: "ALL inputs considered out of date"
if (!inputs.incremental)
project.delete(outputDir.listFiles())
inputs.outOfDate { change ->
println "Modified: ${change.file.name}"
def targetFile = new File(outputDir, change.file.name)
targetFile.text = change.file.text.reverse()
}
inputs.removed { change ->
println "removed: ${change.file.name}"
def targetFile = new File(outputDir, change.file.name)
targetFile.delete()
}
}
}
then create a task like this
task incrementalReverse(type: IncrementalReverseTask) {
inputDir = file('resource/inputs')
outputDir = file('resource/outputs')
inputProperty = project.properties['taskInputProperty'] ?: "original"
}
The above incremental task checks for changes in files in inputs folder.
Related
I have a scenario where a certain number of operations including a group by has to be applied on a number of small (~300MB each) files. The operation looks like this..
df.groupBy(....).agg(....)
Now to process it on multiple files, I can use a wildcard "/**/*.csv" however, that creates a single RDD and partitions it to for the operations. However, looking at the operations, it is a group by and involves lot of shuffle which is unnecessary if the files are mutually exclusive.
What, I am looking at is, a way where i can create independent RDD's on files and operate on them independently.
It is more an idea than a full solution and I haven't tested it yet.
You can start with extracting your data processing pipeline into a function.
def pipeline(f: String, n: Int) = {
sqlContext
.read
.format("com.databricks.spark.csv")
.option("header", "true")
.load(f)
.repartition(n)
.groupBy(...)
.agg(...)
.cache // Cache so we can force computation later
}
If your files are small you can adjust n parameter to use as small number of partitions as possible to fit data from a single file and avoid shuffling. It means you are limiting concurrency but we'll get back to this issue later.
val n: Int = ???
Next you have to obtain a list of input files. This step depends on a data source but most of the time it is more or less straightforward:
val files: Array[String] = ???
Next you can map above list using pipeline function:
val rdds = files.map(f => pipeline(f, n))
Since we limit concurrency at the level of the single file we want to compensate by submitting multiple jobs. Lets add a simple helper which forces evaluation and wraps it with Future
import scala.concurrent._
import ExecutionContext.Implicits.global
def pipelineToFuture(df: org.apache.spark.sql.DataFrame) = future {
df.rdd.foreach(_ => ()) // Force computation
df
}
Finally we can use above helper on the rdds:
val result = Future.sequence(
rdds.map(rdd => pipelineToFuture(rdd)).toList
)
Depending on your requirements you can add onComplete callbacks or use reactive streams to collect the results.
If you have many files, and each file is small (you say 300MB above which I would count as small for Spark), you could try using SparkContext.wholeTextFiles which will create an RDD where each record is an entire file.
By this way we can write multiple RDD parallely
public class ParallelWriteSevice implements IApplicationEventListener {
private static final IprogramLogger logger = programLoggerFactory.getLogger(ParallelWriteSevice.class);
private static ExecutorService executorService=null;
private static List<Future<Boolean>> futures=new ArrayList<Future<Boolean>>();
public static void submit(Callable callable) {
if(executorService==null)
{
executorService=Executors.newFixedThreadPool(15);//Based on target tables increase this
}
futures.add(executorService.submit(callable));
}
public static boolean isWriteSucess() {
boolean writeFailureOccured = false;
try {
for (Future<Boolean> future : futures) {
try {
Boolean writeStatus = future.get();
if (writeStatus == false) {
writeFailureOccured = true;
}
} catch (Exception e) {
logger.error("Erorr - Scdeduled write failed " + e.getMessage(), e);
writeFailureOccured = true;
}
}
} finally {
resetFutures();
if (executorService != null)
executorService.shutdown();
executorService = null;
}
return !writeFailureOccured;
}
private static void resetFutures() {
logger.error("resetFutures called");
//futures.clear();
}
}
I'm trying to setup TeamCity, using config as code with Kotlin. I'm writing wrappers for buildsteps so I can hide the default exposed configuration and only expose parameters that matter. This would allow me to prevent users of the class from changing values that would cause build errors.
I want this:
steps {
step {
name = "Restore NuGet Packages"
type = "jb.nuget.installer"
param("nuget.path", "%teamcity.tool.NuGet.CommandLine.3.3.0%")
param("nuget.updatePackages.mode", "sln")
param("nuget.use.restore", "restore")
param("sln.path", "path_to_solution") //parameter here
param("toolPathSelector", "%teamcity.tool.NuGet.CommandLine.3.3.0%")
}
...to be this:
MyBuildSteps.buildstep1("path_to_solution")
Here's the function signature for step:
public final class BuildSteps {
public final fun step(base: BuildStep?, init: BuildStep.() -> Unit ): Unit { /* compiled code */ }
}
This is what I tried:
class MyBuildSteps {
fun restoreNugetPackages(slnPath: String): kotlin.Unit {
var step: BuildStep = BuildStep {
name = "Restore NuGet Packages"
type = "jb.nuget.installer"
}
var stepParams: List = Parametrized {
param("build-file-path", slnPath)
param("msbuild_version", "14.0")
param("octopus_octopack_package_version", "1.0.0.%build.number%")
param("octopus_run_octopack", "true")
param("run-platform", "x86")
param("toolsVersion", "14.0")
param("vs.version", "vs2015")
}
return {
step.name
step.type
stepParams
} //how do I return this?
}
}
Any advice would be much appreciated!
I assume you want to encapsulate step {...} into a function buildstep1 with a parameter slnPath.
Use this function signature and copy-paste the step {...} part right inside. Add any parameters you see fit:
fun BuildSteps.buildstep1(slnPath: String) {
step {
name = "Restore NuGet Packages"
type = "jb.nuget.installer"
param("nuget.path", "%teamcity.tool.NuGet.CommandLine.3.3.0%")
param("nuget.updatePackages.mode", "sln")
param("nuget.use.restore", "restore")
param("sln.path", slnPath) // your parameter here
param("toolPathSelector", "%teamcity.tool.NuGet.CommandLine.3.3.0%")
}
}
That's all! Use it instead of the step {...} construct:
steps {
buildstep1("path_to_solution")
}
This function may be declared anywhere in the configuration file (I usually place those at the bottom) or in a separate .kts file and imported (theoretically).
I'm trying to write automated tests for a gui application written in scala with swing. I'm adapting the code from this article which leverages getName() to find the right ui element in the tree.
This is a self-contained version of my code. It's incomplete, untested, and probably broken and/or non-idiomatic and/or suboptimal in various ways, but my question is regarding the compiler error listed below.
import scala.swing._
import java.awt._
import ListView._
import org.scalatest.junit.JUnitRunner
import org.junit.runner.RunWith
import org.scalatest.FunSpec
import org.scalatest.matchers.ShouldMatchers
object MyGui extends SimpleSwingApplication {
def top = new MainFrame {
title = "MyGui"
val t = new Table(3, 3)
t.peer.setName("my-table")
contents = t
}
}
object TestUtils {
def getChildNamed(parent: UIElement, name: String): UIElement = {
// Debug line
println("Class: " + parent.peer.getClass() +
" Name: " + parent.peer.getName())
if (name == parent.peer.getName()) { return parent }
if (parent.peer.isInstanceOf[java.awt.Container]) {
val children = parent.peer.getComponents()
/// COMPILER ERROR HERE ^^^
for (child <- children) {
val matching_child = getChildNamed(child, name)
if (matching_child != null) { return matching_child }
}
}
return null
}
}
#RunWith(classOf[JUnitRunner])
class MyGuiSpec extends FunSpec {
describe("My gui window") {
it("should have a table") {
TestUtils.getChildNamed(MyGui.top, "my-table")
}
}
}
When I compile this file I get:
29: error: value getComponents is not a member of java.awt.Component
val children = parent.peer.getComponents()
^
one error found
As far as I can tell, getComponents is in fact a member of java.awt.Component. I used the code in this answer to dump the methods on parent.peer, and I can see that getComponents is in the list.
Answers that provide another approach to the problem (automated gui testing in a similar manner) are welcome, but I'd really like to understand why I can't access getComponents.
The issue is that you're looking at two different classes. Your javadoc link points to java.awt.Container, which indeed has getComponents method. On the other hand, the compiler is looking for getComponents on java.awt.Component (the parent class), which is returned by parent.peer, and can't find it.
Instead of if (parent.peer.isInstanceOf[java.awt.Container]) ..., you can verify the type of parent.peer and cast it like this:
parent.peer match {
case container: java.awt.Container =>
// the following works because container isA Container
val children = container.getComponents
// ...
case _ => // ...
}
I've written a Scala program that I'd like to be triggered through a UI (also in Swing). The problem is, when I trigger it, the UI hangs until the background program completes. I got to thinking that the only way to get around this is by having the program run in another thread/actor and have it update the UI as and when required. Updating would include a status bar which would show the file currently being processed and a progressbar.
Since Scala actors are deprecated, I'm having a tough time trying to plough through Akka to get some kind of basic multithreading running. The examples given on the Akka website are also quite complicated.
But more than that, I'm finding it difficult to wrap my head around how to attempt this problem. What I can come up with is:
Background program runs as one actor
UI is the main program
Have another actor that tells the UI to update something
Step 3 is what is confounding me. How do I tell the UI without locking up some variable somewhere?
Also, I'm sure this problem has been solved earlier. Any sample code for the same would be highly appreciated.
For scala 2.10
You can use scala.concurrent.future and then register a callback on completion. The callback will update the GUI on the EDT thread.
Lets do it!
//in your swing gui event listener (e.g. button clicked, combo selected, ...)
import scala.concurrent.future
//needed to execute futures on a default implicit context
import scala.concurrent.ExecutionContext.Implicits._
val backgroundOperation: Future[Result] = future {
//... do that thing, on another thread
theResult
}
//this goes on without blocking
backgroundOperation onSuccess {
case result => Swing.onEDT {
//do your GUI update here
}
}
This is the most simple case:
we're updating only when done, with no progress
we're only handling the successful case
To deal with (1) you could combine different futures, using the map/flatMap methods on the Future instance. As those gets called, you can update the progress in the UI (always making sure you do it in a Swing.onEDT block
//example progress update
val backgroundCombination = backgroundOperation map { partial: Result =>
progress(2)
//process the partial result and obtain
myResult2
} //here you can map again and again
def progress(step: Int) {
Swing.onEDT {
//do your GUI progress update here
}
}
To deal with (2) you can register a callback onFailure or handle both cases with the onComplete.
For relevant examples: scaladocs and the relevant SIP (though the SIP examples seems outdated, they should give you a good idea)
If you want to use Actors, following may work for you.
There are two actors:
WorkerActor which does data processing (here, there is simple loop with Thread.sleep). This actor sends messages about progress of work to another actor:
GUIUpdateActor - receives updates about progress and updates UI by calling handleGuiProgressEvent method
UI update method handleGuiProgressEvent receives update event.
Important point is that this method is called by Actor using one of Akka threads and uses Swing.onEDT to do Swing work in Swing event dispatching thread.
You may add following to various places to see what is current thread.
println("Current thread:" + Thread.currentThread())
Code is runnable Swing/Akka application.
import akka.actor.{Props, ActorRef, Actor, ActorSystem}
import swing._
import event.ButtonClicked
trait GUIProgressEventHandler {
def handleGuiProgressEvent(event: GuiEvent)
}
abstract class GuiEvent
case class GuiProgressEvent(val percentage: Int) extends GuiEvent
object ProcessingFinished extends GuiEvent
object SwingAkkaGUI extends SimpleSwingApplication with GUIProgressEventHandler {
lazy val processItButton = new Button {text = "Process it"}
lazy val progressBar = new ProgressBar() {min = 0; max = 100}
def top = new MainFrame {
title = "Swing GUI with Akka actors"
contents = new BoxPanel(Orientation.Horizontal) {
contents += processItButton
contents += progressBar
contents += new CheckBox(text = "another GUI element")
}
val workerActor = createActorSystemWithWorkerActor()
listenTo(processItButton)
reactions += {
case ButtonClicked(b) => {
processItButton.enabled = false
processItButton.text = "Processing"
workerActor ! "Start"
}
}
}
def handleGuiProgressEvent(event: GuiEvent) {
event match {
case progress: GuiProgressEvent => Swing.onEDT{
progressBar.value = progress.percentage
}
case ProcessingFinished => Swing.onEDT{
processItButton.text = "Process it"
processItButton.enabled = true
}
}
}
def createActorSystemWithWorkerActor():ActorRef = {
def system = ActorSystem("ActorSystem")
val guiUpdateActor = system.actorOf(
Props[GUIUpdateActor].withCreator(new GUIUpdateActor(this)), name = "guiUpdateActor")
val workerActor = system.actorOf(
Props[WorkerActor].withCreator(new WorkerActor(guiUpdateActor)), name = "workerActor")
workerActor
}
class GUIUpdateActor(val gui:GUIProgressEventHandler) extends Actor {
def receive = {
case event: GuiEvent => gui.handleGuiProgressEvent(event)
}
}
class WorkerActor(val guiUpdateActor: ActorRef) extends Actor {
def receive = {
case "Start" => {
for (percentDone <- 0 to 100) {
Thread.sleep(50)
guiUpdateActor ! GuiProgressEvent(percentDone)
}
}
guiUpdateActor ! ProcessingFinished
}
}
}
If you need something simple you can run the long task in a new Thread and just make sure to update it in the EDT:
def swing(task: => Unit) = SwingUtilities.invokeLater(new Runnable {
def run() { task }
})
def thread(task: => Unit) = new Thread(new Runnable {
def run() {task}
}).run()
thread({
val stuff = longRunningTask()
swing(updateGui(stuff))
})
You can define your own ExecutionContext which will execute anything on the Swing Event Dispatch Thread using SwingUtilities.invokeLater and then use this context to schedule a code which needs to be executed by Swing, still retaining the ability to chain Futures the Scala way, including passing results between them.
import javax.swing.SwingUtilities
import scala.concurrent.ExecutionContext
object OnSwing extends ExecutionContext {
def execute(runnable: Runnable) = {
SwingUtilities.invokeLater(runnable)
}
def reportFailure(cause: Throwable) = {
cause.printStackTrace()
}
}
case ButtonClicked(_) =>
Future {
doLongBackgroundProcess("Timestamp")
}.foreach { result =>
txtStatus.text = result
}(OnSwing)
I wrote my first console app in Scala, and I wrote my first Swing app in Scala -- in case of the latter, the entry point is top method in my object extending SimpleSwingApplication.
However I would like to still go through main method, and from there call top -- or perform other equivalent actions (like creating a window and "running" it).
How to do it?
Just in case if you are curious why, the GUI is optional, so I would like to parse the command line arguments and then decide to show (or not) the app window.
If you have something like:
object MySimpleApp extends SimpleSwingApplication {
// ...
}
You can just call MySimpleApp.main to start it from the console. A main method is added when you mix SimpleSwingApplication trait. Have a look at the scaladoc.
Here is the most basic example:
import swing._
object MainApplication {
def main(args: Array[String]) = {
GUI.main(args)
}
object GUI extends SimpleSwingApplication {
def top = new MainFrame {
title = "Hello, World!"
}
}
}
Execute scala MainApplication.scala from the command line to start the Swing application.
If you override the main method that you inherit from SimpleSwingApplication, you can do whatever you want:
object ApplicationWithOptionalGUI extends SimpleSwingApplication {
override def main(args: Array[String]) =
if (parseCommandLine(args))
super.main(args) // Starts GUI
else
runWithoutGUI(args)
...
}
I needed main.args to be usable in SimpleSwingApplication. I wanted to give a filename to process as an argument in CLI, or then use JFileChooser in case the command line argument list was empty.
I do not know if there is simple method to use command line args in SimpleSwingApplication, but how I got it working was to define, in demoapp.class:
class demoSSA(args: Array[String]) extends SimpleSwingApplication {
....
var filename: String = null
if (args.length > 0) {
filename = args(0)
} else {
// use JFileChooser to select filename to be processed
// https://stackoverflow.com/questions/23627894/scala-class-running-a-file-chooser
}
....
}
object demo {
def main(args: Array[String]): Unit = {
val demo = new demoSSA(args)
demo.startup(args)
}
}
Then start the app by
demo [args], by giving filename as a CLI argument or leaving it empty and program uses JFileChooser to ask it.
Is there a way to get to the args of main() in SimpleSwingApplication singleton object?
Because it doesn't work to parse 'filename' in overridden SimpleSwingApplication.main and to use 'filename' in the singleton object it needs to be declared (val args: Array[String] = null;), not just defined (val args: Array[String];). And singleton objecs inherited from SSA cannot have parameters.
(Edit)
Found one another way:
If whole demoSSA is put inside overridden main() or startup(), then top MainFrame needs to defined outside as top = null and re-declare it from startup() as this.top:
object demo extends SimpleSwingApplication {
var top: MainFrame = null
override def startup(args: Array[String]) {
... // can use args here
this.top = new MainFrame {
... // can use args here also
};
val t = this.top
if (t.size == new Dimension(0,0)) t.pack()
t.visible = true
}
}
But I think I like the former method more, with separated main object.It has at least one level indentation less than the latter method.