Change or Add ROW label - powershell-5.1

$counters = @(
    "\Processor(_Total)\% Processor Time", "\Memory\Available MBytes",
    "\Paging File(_Total)\% Usage", "\LogicalDisk(*)\Avg. Disk Bytes/Read",
    "\LogicalDisk(*)\Avg. Disk Bytes/Write", "\LogicalDisk(*)\Avg. Disk sec/Read",
    "\LogicalDisk(*)\Avg. Disk sec/Write", "\LogicalDisk(*)\Disk Read Bytes/sec",
    "\LogicalDisk(*)\Disk Write Bytes/sec", "\LogicalDisk(*)\Disk Reads/sec",
    "\LogicalDisk(*)\Disk Writes/sec"
)
(Get-Counter $counters).countersamples
I'm new to PowerShell and found this script to get server performance. When you execute this command, you get a column called "Path". How would you rename this, or add a new row, for better understanding?
Example labels:
Read Latency = "\LogicalDisk(*)\Disk Reads/sec"
Write Latency = "\LogicalDisk(*)\Disk Writes/sec"
I have tried foreach, but then the counters are executed one at a time and the data will not be accurate. They need to be executed at once to capture the performance of the server at that exact moment for all performance counters. Our environment is still running on PS 5.1 (including Windows 2016/2019).
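One way to keep the single Get-Counter call (so every counter is sampled at the same moment) and still show a friendlier name is to post-process the returned CounterSamples with a calculated property. The sketch below is only an illustration, not from the original post: it builds on the $counters array above, the $labels hashtable and its friendly names are invented, and it keys on the last segment of each sample's Path (the counter name).

# Hypothetical friendly-name map; the names here are only examples.
$labels = @{
    '% processor time'    = 'CPU %'
    'available mbytes'    = 'Available Memory (MB)'
    'avg. disk sec/read'  = 'Read Latency (s)'
    'avg. disk sec/write' = 'Write Latency (s)'
    'disk reads/sec'      = 'Read IOPS'
    'disk writes/sec'     = 'Write IOPS'
}

# One Get-Counter call, so every counter is sampled at the same moment.
$samples = (Get-Counter $counters).CounterSamples

# Add a Label column next to the raw Path; unknown counters keep the Path.
$samples | Select-Object @{ Name = 'Label'; Expression = {
        # The last path segment is the counter name, e.g. 'avg. disk sec/read'.
        $name = ($_.Path -split '\\')[-1].ToLower()
        if ($labels.ContainsKey($name)) { $labels[$name] } else { $_.Path }
    } }, InstanceName, CookedValue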

Related

Read-side latency and responsiveness

I am using Lagom with MySQL and I am having a latency issue. I am using ES and CQRS. I have integrated my backend service and frontend service and am now facing an issue: I have to refresh my page each time to get the response, since it takes some time for the data to be stored in the MySQL database. There is a lag before the data is stored, so fetching it from the database gives a late response.
Is there a way to solve this issue?
Thanks in advance
I have tried providing some settings in the configuration file but didn't get the desired result.
lagom.persistence.jdbc {
  # Configuration for creating tables
  create-tables {
    # Whether tables should be created automatically as needed
    auto = true
    # How long to wait for tables to be created, before failing
    timeout = 20s
    # The cluster role to create tables from
    run-on-role = ""
    # Exponential backoff for failures configuration for creating tables
    failure-exponential-backoff {
      # minimum (initial) duration until processor is started again
      # after failure
      min = 3s
      # the exponential back-off is capped to this duration
      max = 30s
      # additional random delay is based on this factor
      random-factor = 0.2
    }
  }
}
I think you cannot fully solve this in ES and CQRS, because one of its goals is to separate the writing and reading parts. So it is normal for the read-side projection and the write side to hold different values for some time.
You can try to read from the persistent entity directly.

Can you create a disk and an instance with one command in Google Compute Engine?

Currently, I'm creating a disk from a snapshot. Then I wait for 60 seconds and create an instance which will use that disk as its system disk. I'm using the gcloud utility for this.
Is there any way I can create the disk and the instance in one command?
Mix of copy-pasted Python code and pseudocode below:
cmd_create_disk = [GCLOUD, 'compute', 'disks', 'create', new_instance,
                   '--source-snapshot', GCE_RENDER_SNAPSHOT_VERSION,
                   '--zone', GCE_REGION, '--project', GCE_PROJECT]
# wait for 60 seconds
cmd_make_instance = [GCLOUD, 'compute', 'instances', 'create', new_instance,
                     '--disk', 'name=' + new_instance + ',boot=yes,auto-delete=yes',
                     '--machine-type', instance_type, '--network', GCE_NETWORK,
                     '--no-address', '--tags', 'render', '--tags', 'vpn',
                     '--tags', proj_tag, '--zone', GCE_REGION,
                     '--project', GCE_PROJECT]
The instance uses the disk as its system disk. Waiting for 60 seconds is quite arbitrary and I'd rather leave this up to GCE, making sure the instance is indeed started with the system disk.
When you delete an instance you can specify that the disk should also be deleted. In the same manner, I'd like to create an instance and specify that the disk be created from an image.
The boot disk can be created automatically. You can specify the image to use for that using --image and --image-project flags in gcloud compute instances create command line. You'll need to make sure to create the image first though - your current command to create the disk seems to use a snapshot rather than an image.
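As a rough sketch of what that could look like from Python, assuming the gcloud CLI is installed and an image has already been created from the snapshot; the function name and parameters here are placeholders, not the original script's.

import subprocess

GCLOUD = 'gcloud'

def create_instance_with_boot_disk(name, image, image_project,
                                   machine_type, zone, project):
    """Create the instance and its boot disk from an image in one gcloud call."""
    cmd = [GCLOUD, 'compute', 'instances', 'create', name,
           '--image', image,                  # boot disk is built from this image
           '--image-project', image_project,  # project that owns the image
           '--boot-disk-auto-delete',         # remove the boot disk with the instance
           '--machine-type', machine_type,
           '--zone', zone,
           '--project', project]
    subprocess.check_call(cmd)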

GCP dataflow - processing JSON takes too long

I am trying to process json files in a bucket and write the results into a bucket:
DataflowPipelineOptions options = PipelineOptionsFactory.create()
        .as(DataflowPipelineOptions.class);
options.setRunner(BlockingDataflowPipelineRunner.class);
options.setProject("the-project");
options.setStagingLocation("gs://some-bucket/temp/");

Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from("gs://some-bucket/2016/04/28/*/*.json"))
 .apply(ParDo.named("SanitizeJson").of(new DoFn<String, String>() {
     @Override
     public void processElement(ProcessContext c) {
         try {
             JsonFactory factory = JacksonFactory.getDefaultInstance();
             String json = c.element();
             SomeClass e = factory.fromString(json, SomeClass.class);
             // manipulate the object a bit...
             c.output(factory.toString(e));
         } catch (Exception err) {
             LOG.error("Failed to process element: " + c.element(), err);
         }
     }
 }))
 .apply(TextIO.Write.to("gs://some-bucket/output/"));
p.run();
I have around 50,000 files under the path gs://some-bucket/2016/04/28/ (in sub-directories).
My question is: does it make sense that this takes more than an hour to complete? Doing something similar on a Spark cluster in amazon takes about 15-20 minutes. I suspect that I might be doing something inefficiently.
EDIT:
In my Spark job I aggregate all the results in a DataFrame and only then write the output, all at once. I noticed that my pipeline here writes each file separately; I assume that is why it's taking so much longer. Is there a way to change this behavior?
Your jobs are hitting a couple of performance issues in Dataflow, caused by the fact that it is optimized for executing work in larger increments, while your job is processing lots of very small files. As a result, some aspects of the job's execution end up dominated by per-file overhead. Here are some details and suggestions.
The job is limited more by writing output than by reading input (though reading input is also a significant part). You can cut that overhead significantly by specifying withNumShards on your TextIO.Write, depending on how many files you want in the output; e.g. 100 could be a reasonable value. By default you get an unspecified number of files which, in this case, given the current behavior of the Dataflow optimizer, matches the number of input files. Usually that is a good idea because it lets us avoid materializing the intermediate data, but here it is not, because the input files are so small and the per-file overhead dominates.
I recommend setting maxNumWorkers to a value such as 12; currently the second job is autoscaling to an excessively large number of workers. This is caused by Dataflow's autoscaling currently being geared toward jobs that process data in larger increments; it doesn't yet take per-file overhead into account, so it behaves poorly in your case.
The second job is also hitting a bug that prevents it from finalizing the written output. We're investigating; however, setting maxNumWorkers should also make it complete successfully.
In short:
set maxNumWorkers=12
set TextIO.Write.to("...").withNumShards(100)
and it should run much better.
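For concreteness, here is roughly how those two settings could be applied to the pipeline from the question (old Dataflow 1.x SDK style, matching the code above). This is a hedged sketch: SanitizeJsonFn stands in for the anonymous DoFn shown earlier and is not a class from the post.

DataflowPipelineOptions options = PipelineOptionsFactory.create()
        .as(DataflowPipelineOptions.class);
options.setRunner(BlockingDataflowPipelineRunner.class);
options.setProject("the-project");
options.setStagingLocation("gs://some-bucket/temp/");
options.setMaxNumWorkers(12);               // cap autoscaling at 12 workers

Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from("gs://some-bucket/2016/04/28/*/*.json"))
 .apply(ParDo.named("SanitizeJson").of(new SanitizeJsonFn()))
 .apply(TextIO.Write.to("gs://some-bucket/output/")
                    .withNumShards(100));   // ~100 output files instead of one per input file
p.run();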

How to increase hadoop map tasks by implementing getSplits

I want to process multiline CSV files and for that I wrote a custom CSVInputFormat.
I would like to have about 40 threads processing CSV lines on each Hadoop node. However, when I create a cluster on Amazon EMR with 5 machines (1 master and 4 core nodes), I only get 2 map tasks running, even though there are 6 available map slots.
I implemented getSplits in my InputFormat so it would behave like NLineInputFormat. I was expecting this to get more things running in parallel, but it had no effect. I also tried setting the arguments -s,mapred.tasktracker.map.tasks.maximum=10 --args -jobconf,mapred.map.tasks=10, but again with no effect.
What can I do to have the lines processed in parallel? The way Hadoop is running, it's not scalable: no matter how many instances I allocate to the cluster, at most two map tasks will run.
UPDATE:
When I use a non-compressed file (rather than a zip) as the source, more map tasks are created, about 17 for 1.3 million rows. Even so, I wonder why there aren't more, and why more mappers aren't created when the data is zipped.
Change the split size to have more splits.
Configuration conf = new Configuration();
//set the value that increases your number of splits.
conf.set("mapred.max.split.size", "1020");
Job job = new Job(conf, "My job name");
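If the job uses the newer mapreduce API, the same idea can be expressed through the input-format helpers instead of a raw property string. This is a hedged sketch; a custom CSVInputFormat would still have to honour these settings in its getSplits implementation, and the class name and values are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class SplitTuning {
    public static Job configure() throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "My job name");

        // Cap each split at ~1 MB so more map tasks are created.
        FileInputFormat.setMaxInputSplitSize(job, 1024L * 1024L);

        // Alternatively, NLineInputFormat-style splitting: one split per N lines.
        NLineInputFormat.setNumLinesPerSplit(job, 10000);

        return job;
    }
}

Note also that whole-file compression formats such as gzip and zip are not splittable in Hadoop, which is why a compressed input ends up in far fewer map tasks regardless of the split size.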

all pooled connections were in use and max pool size was reached

I am writing a .NET 4.0 console app that:
Opens up a connection
Uses a Data Reader to cursor through a list of keys
For each key read, calls a web service
Stores the result of the web service in the database
I then spawn multiple threads of this process in order to improve the maximum number of records that I can process per second.
When I up the process beyond about 30 or so threads, I get the following error:
System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
Is there a server or client side option I can tweak to allow me to obtain more connections from the connection pool?
I am calling a SQL Server 2008 R2 database.
Thanks
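On the literal question of a tweakable option: the pool ceiling is a client-side connection-string setting (Max Pool Size, which defaults to 100). Below is a minimal sketch with placeholder server and database names; raising the ceiling only hides the pressure, and the answer that follows argues for changing the design instead.

using System.Data.SqlClient;

class PoolSizeDemo
{
    static void Main()
    {
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "MyServer",          // placeholder
            InitialCatalog = "MyDatabase",    // placeholder
            IntegratedSecurity = true,
            MaxPoolSize = 200                 // default is 100
        };

        using (var connection = new SqlConnection(builder.ConnectionString))
        {
            connection.Open();
            // ... use the connection ...
        }
    }
}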
This sounds like a design issue. What's your total record count from the database? Iterating through the reader will be really fast. Even if you have hundreds of thousands of rows, going through that reader will be quick. Here's a different approach you could take:
Iterate through the reader and store the data in a list of objects. Then iterate through your list of objects at a number of your choice (e.g. two at a time, three at a time, etc) and spawn that number of threads to make calls to your web service in parallel.
This way you won't be opening multiple connections to the database, and you're dealing with what is likely the true bottleneck (the HTTP call to the web service) in parallel.
Here's an example:
List<SomeObject> yourObjects = new List<SomeObject>();
if (yourReader.HasRows)
{
    while (yourReader.Read())
    {
        SomeObject foo = new SomeObject();
        foo.SomeProperty = yourReader.GetInt32(0);
        yourObjects.Add(foo);
    }
}

for (int i = 0; i < yourObjects.Count; i = i + 2)
{
    // Kick off your web service calls in parallel. You will likely want to do something with the result.
    int first = i;          // copy the indexes so the lambdas capture fixed values
    int second = i + 1;
    var tasks = new List<Task>
    {
        Task.Factory.StartNew(() => yourService.MethodName(yourObjects[first].SomeProperty))
    };
    if (second < yourObjects.Count)   // guard against an odd number of items
    {
        tasks.Add(Task.Factory.StartNew(() => yourService.MethodName(yourObjects[second].SomeProperty)));
    }
    Task.WaitAll(tasks.ToArray());
}
// Now do your database INSERT.
Opening up a new connection for all your requests is incredibly inefficient. If you simply want to use the same connection to keep requesting things, that is more than possible: you can open one connection and run as many SqlCommand commands as you need through it. Simply keep that ONE connection around, and dispose of it after all your threading is done.
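A small sketch of the one-connection idea, with the caveat that a single SqlConnection instance is not safe to share across threads, so the commands here reuse it sequentially. The connection string and SQL text are placeholders.

using System.Data.SqlClient;

class SingleConnectionDemo
{
    static void Main()
    {
        // Open one connection, reuse it for every command, and dispose it once at the end.
        using (var connection = new SqlConnection("Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=true"))
        {
            connection.Open();

            for (int i = 0; i < 100; i++)
            {
                using (var command = new SqlCommand("INSERT INTO Results (Value) VALUES (@v)", connection))
                {
                    command.Parameters.AddWithValue("@v", i);
                    command.ExecuteNonQuery();   // runs on the same open connection
                }
            }
        }
    }
}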
Please restart IIS and you will be able to connect.