How to specify EMR cluster create CLI commands using AWS Java SDK?

OK, this question is where I ended up after trying out a few things. I'll first give a brief intro to what I wanted to do and how I got here.
I'm writing a script to start an EMR cluster using the AWS Java SDK. The EMR cluster is to be started inside a VPC, in a subnet with a certain id. When I specify the subnet id (the code line below ending with // ******), the EMR cluster stays in the STARTING state and does not move ahead for several minutes, eventually giving up and failing. I'm not sure if there's a bug in the SDK's implementation of this functionality.
try {
    /**
     * Specifying credentials
     */
    String accessKey = EmrUtils.ACCESS_KEY;
    String secretKey = EmrUtils.SECRET_ACCESS_KEY;
    AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);

    /**
     * Initializing emr client object
     */
    emrClient = new AmazonElasticMapReduceClient(credentials);
    emrClient.setEndpoint(EmrUtils.ENDPOINT);

    /**
     * Specifying bootstrap actions
     */
    ScriptBootstrapActionConfig scriptBootstrapConfig = new ScriptBootstrapActionConfig();
    scriptBootstrapConfig.setPath("s3://bucket/bootstrapScript.sh");
    BootstrapActionConfig bootstrapActions = new BootstrapActionConfig(
            "Bootstrap Script", scriptBootstrapConfig);

    RunJobFlowRequest jobFlowRequest = new RunJobFlowRequest()
            .withName("Java SDK EMR cluster")
            .withLogUri(EmrUtils.S3_LOG_URI)
            .withAmiVersion(EmrUtils.AMI_VERSION)
            .withBootstrapActions(bootstrapActions)
            .withInstances(
                    new JobFlowInstancesConfig()
                            .withEc2KeyName(EmrUtils.EC2_KEY_PAIR)
                            .withHadoopVersion(EmrUtils.HADOOP_VERSION)
                            .withInstanceCount(1)
                            .withEc2SubnetId(EmrUtils.EC2_SUBNET_ID) // ******
                            .withKeepJobFlowAliveWhenNoSteps(true)
                            .withMasterInstanceType(EmrUtils.MASTER_INSTANCE_TYPE)
                            .withTerminationProtected(true)
                            .withSlaveInstanceType(EmrUtils.SLAVE_INSTANCE_TYPE));

    RunJobFlowResult result = emrClient.runJobFlow(jobFlowRequest);
    String jobFlowId = result.getJobFlowId();
    System.out.println(jobFlowId);
} catch (Exception e) {
    e.printStackTrace();
    System.out.println("Shutting down cluster");
    if (emrClient != null) {
        emrClient.shutdown();
    }
}
When I do the same thing using the EMR console, the cluster starts, bootstraps, and successfully goes into the WAITING state. Is there any other way I can specify the subnet id when starting a cluster? I believe boto allows sending additional parameters as a string, and I found something similar in Java: .withAdditionalInfo(additionalInfo), a method of RunJobFlowRequest that takes a JSON string as an argument. However, I don't know which key should be used for the EC2 subnet id in that JSON string.
(Using Python boto is not an option for me; I've faced other show-stopping issues with it and had to switch to the AWS Java SDK.)
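As an aside, a minimal sketch (not the additionalInfo route asked about, but a diagnostic aid) of polling the cluster state and state-change reason while it sits in STARTING, reusing the emrClient and jobFlowId from the code above; DescribeClusterRequest and ClusterStatus are standard classes in the AWS SDK for Java v1:
// Poll the cluster to see why it is stuck in STARTING
DescribeClusterRequest describeRequest = new DescribeClusterRequest()
        .withClusterId(jobFlowId); // id returned by runJobFlow above
ClusterStatus status = emrClient.describeCluster(describeRequest)
        .getCluster()
        .getStatus();
System.out.println(status.getState());
// The state-change reason often hints at networking problems when a
// cluster launched into a VPC subnet never leaves STARTING.
System.out.println(status.getStateChangeReason());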

Related

Create a Network Load Balancer on Oracle Cloud Infrastructure with a Reserved IP using Terraform

Using Terraform to set up a Network Load Balancer on Oracle Cloud Infrastructure, it works as expected if created with an ephemeral public IP; however, one created using a reserved public IP does not respond. Here are the exact Terraform resources used to create the load balancer:
resource "oci_core_public_ip" "ip" {
for_each = { for lb in var.load_balancers: lb.subnet => lb if ! lb.private
compartment_id = local.compartment_ocid
display_name = "${var.name}-public-ip"
lifetime = "RESERVED"
lifecycle {
prevent_destroy = true
}
}
resource "oci_network_load_balancer_network_load_balancer" "nlb" {
for_each = { for lb in var.load_balancers: lb.subnet => lb if lb.type == "network" }
compartment_id = local.compartment_ocid
display_name = "${var.name}-network-load-balancer"
subnet_id = oci_core_subnet.s[each.value.subnet].id
is_private = each.value.private
#reserved_ips {
# id = oci_core_public_ip.ip[each.value.subnet].id
#}
}
All of the other resources (security list rules, listeners, backend set and backends, etc.) are created such that the above works. If, however, I uncomment the assignment of reserved_ips to the network load balancer, then it does not work: there is no response from the load balancer's public IP. Everything is the same except for those three lines being uncommented.
Between each test I tear down everything and recreate with Terraform. It always works with an ephemeral IP and never works with the reserved IP. Why? What am I missing? Or does this just not work as advertised?
The Terraform version is v1.3.4 and the resource version is oracle/oci version 4.98.0.
The reserved IP is set up correctly; however, the Terraform provider removes its association with the load balancer's private IP. Closer inspection of the Terraform output shows this:
~ resource "oci_core_public_ip" "ip" {
id = "ocid1.publicip.oc1.uk-london-1.ama...sta"
- private_ip_id = "ocid1.privateip.oc1.uk-london-1.abw...kya" -> null
# (11 unchanged attributes hidden)
}
Manually restoring the association fixes it (until the next tf run):
$ oci network public-ip update --public-ip-id ocid1.publicip.oc1.uk-london-1.ama...rrq --private-ip-id ocid1.privateip.oc1.uk-london-1.abw...kya
There is a bug ticket about this on Terraform's GitHub.

How to find CPU MEMORY usage with docker stats command?

I am using the docker-java API to call the Docker API in my project. I couldn't find any suitable method that lists Docker CPU and memory usage the way
GET /v1.24/containers/redis1/stats HTTP/1.1
does, with the help of the docker-java API.
Dependency
compile group: 'com.github.docker-java', name: 'docker-java', version: '3.1.2'
Code
public static void execute() {
    DockerClient dockerClient = DockerClientBuilder.getInstance().build();
    dockerClient.statsCmd("containerName");
}
I didn't get any output.
How do I execute docker stats with the docker-java API?
This works for me:
public Statistics getNextStatistics() throws ProfilingException {
    AsyncResultCallback<Statistics> callback = new AsyncResultCallback<>();
    client.statsCmd(containerId).exec(callback);
    Statistics stats = null; // initialized so the method compiles if awaitResult() throws
    try {
        stats = callback.awaitResult();
        callback.close();
    } catch (RuntimeException | IOException e) {
        // you may want to wrap and rethrow this as a ProfilingException here
    }
    return stats; // this may be null or invalid if the container has terminated
}
DockerClient is where we can establish a connection between a Docker engine/daemon and our application.
By default, the Docker daemon is only accessible at the unix:///var/run/docker.sock file. Unless otherwise configured, we communicate locally with the Docker engine listening on that Unix socket.
We can open a connection in two steps:
DefaultDockerClientConfig.Builder config
        = DefaultDockerClientConfig.createDefaultConfigBuilder();
DockerClient dockerClient = DockerClientBuilder
        .getInstance(config)
        .build();
Since the engine may be exposed in other ways, the client is also configurable for different setups.
For example, the builder accepts a server URL, that is, we can change the connection value if the engine is available on port 2375:
DockerClient dockerClient
        = DockerClientBuilder.getInstance("tcp://docker.baeldung.com:2375").build();
Note that we need to prepend the connection string with unix:// or tcp:// depending on the connection type.
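For example, a minimal sketch of the Unix-socket form, equivalent to what the default builder already uses (the socket path below is the standard Docker default):
// Connect over the default Unix socket explicitly
DockerClient dockerClient = DockerClientBuilder
        .getInstance("unix:///var/run/docker.sock")
        .build();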

SSH to Google Compute instance using NodeJS, without gcloud

I'm trying to create an SSH tunnel into a compute instance from an environment that doesn't have gcloud installed (the App Engine Standard NodeJS Environment).
What are the steps needed to do that? How does the gcloud compute ssh command do it? Is there a NodeJS library that already does it?
I created the package gcloud-ssh-tunnel that does the necessary steps:
Creates a private/public key pair using sshpk
Imports the public key using the OS Login API
SSHs in using ssh2 (and specifically creates a tunnel, because this was the use case I needed; see the Why? section in the package)
Deletes the public key using the OS Login API (so as not to overflow the account or leave lingering security access)
You can use ssh2 to do that in Node.js.
"gcloud compute ssh" generates persistent SSH keys for the user. The public key is stored in project or instance SSH keys metadata, and the Guest Environment creates the necessary local user and places ~/.ssh/authorized_keys in its home directory.
You can manually add your public key to the instance, and then connect to it via SSH using a Node SSH library.
Or you can set a startup script for the instance when you are creating it.
As Cloud Ace pointed out, you can use the ssh2 module for Node.js compatibility.
In order to SSH into a GCP instance you have to:
Enable OS Login
Create a service account and assign it the "Compute OS Admin Login" role.
Create an SSH key and import it into the service account.
Use that SSH key and the POSIX username.
The first two steps already link to the documentation.
Create an SSH key:
import {
    generatePrivateKey,
} from 'sshpk';

const keyPair = generatePrivateKey('ecdsa');
const privateKey = keyPair.toString();
const publicKey = keyPair.toPublic().toString();
Import key:
const osLoginServiceClient = new OsLoginServiceClient({
    credentials: googleCredentials,
});
const [result] = await osLoginServiceClient.importSshPublicKey({
    parent: osLoginServiceClient.userPath(googleCredentials.client_email),
    sshPublicKey: {
        expirationTimeUsec: ((Date.now() + 10 * 60 * 1_000) * 1_000).toString(),
        key: publicKey,
    },
});
SSH using the key:
const ssh = new NodeSSH();
await ssh.connect({
    host,
    privateKey,
    username: loginProfile.posixAccounts[0].username,
});
In this example, I am using node-ssh, but you can use anything.
The only other catch is that you need to figure out the public host. The implementation for that looks like this:
const findFirstPublicIp = async (
    googleCredentials: GoogleCredentials,
    googleZone: string,
    googleProjectId: string,
    instanceName: string,
) => {
    const instancesClient = new InstancesClient({
        credentials: googleCredentials,
    });
    const instances = await instancesClient.get({
        instance: instanceName,
        project: googleProjectId,
        zone: googleZone,
    });
    for (const instance of instances) {
        if (!instance || !('networkInterfaces' in instance) || !instance.networkInterfaces) {
            throw new Error('Unexpected result.');
        }
        for (const networkInterface of instance.networkInterfaces) {
            if (!networkInterface || !('accessConfigs' in networkInterface) || !networkInterface.accessConfigs) {
                throw new Error('Unexpected result.');
            }
            for (const accessConfig of networkInterface.accessConfigs) {
                if (accessConfig.natIP) {
                    return accessConfig.natIP;
                }
            }
        }
    }
    throw new Error('Could not locate public instance IP address.');
};
Finally, to clean up, you have to call deleteSshPublicKey with the name of the key that you've imported:
const fingerprint = crypto
    .createHash('sha256')
    .update(publicKey)
    .digest('hex');
const sshPublicKey = loginProfile.sshPublicKeys?.[fingerprint];
if (!sshPublicKey) {
    throw new Error('Could not locate SSH public key with a matching fingerprint.');
}
await osLoginServiceClient.deleteSshPublicKey({
    name: sshPublicKey.name,
});
In general, you'd need to reserve and assign a static external IP address to begin with (unless you are trying to SSH from within the same network). A firewall rule also needs to be defined for port tcp/22, which can then be applied as a "label" (network tag) to the instance whose network interface has that external IP assigned.
The other way around works with gcloud app instances ssh:
SSH into the VM of an App Engine Flexible instance
which might be less effort and cost to set up, because a GCP VM usually has gcloud installed.

Flume stream to mysql

I have been trying to stream data into a MySQL database using Apache Kafka and Flume. (Here is my Flume configuration file.)
agent.sources=kafkaSrc
agent.channels=channel1
agent.sinks=jdbcSink
agent.channels.channel1.type=org.apache.flume.channel.kafka.KafkaChannel
agent.channels.channel1.brokerList=localhost:9092
agent.channels.channel1.topic=kafkachannel
agent.channels.channel1.zookeeperConnect=localhost:2181
agent.channels.channel1.capacity=10000
agent.channels.channel1.transactionCapacity=1000
agent.sources.kafkaSrc.type = org.apache.flume.source.kafka.KafkaSource
agent.sources.kafkaSrc.channels = channel1
agent.sources.kafkaSrc.zookeeperConnect = localhost:2181
agent.sources.kafkaSrc.topic = kafka-mysql
***agent.sinks.jdbcSink.type = How to declare this?***
agent.sinks.jdbcSink.connectionString = jdbc:mysql://1.1.1.1:3306/test
agent.sinks.jdbcSink.username=user
agent.sinks.jdbcSink.password=password
agent.sinks.jdbcSink.batchSize = 10
agent.sinks.jdbcSink.channel =channel1
agent.sinks.jdbcSink.sqlDialect=MYSQL
agent.sinks.jdbcSink.driver=com.mysql.jdbc.Driver
agent.sinks.jdbcSink.sql=(${body:varchar})
I know how to stream data into Hadoop or HBase (logger type or hdfs type); however, I can't find a sink type for streaming into a MySQL DB. So my question is: how do I declare jdbcSink.type?
You could always create a custom sink for MySQL. This is what we did at FIWARE with the Cygnus tool.
Feel free to get inspired from it: https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-ngsi/src/main/java/com/telefonica/iot/cygnus/sinks/NGSIMySQLSink.java
It extends this other custom base class for all our sinks: https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-ngsi/src/main/java/com/telefonica/iot/cygnus/sinks/NGSISink.java
Basically, you have to extend AbstractSink and implement the Configurable interface. That means overriding at least the following methods:
public Status process() throws EventDeliveryException
and:
public void configure(Context context)
respectively.
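For illustration, a minimal skeleton of such a sink; the class name MySqlSink and the property names read in configure() are hypothetical, while AbstractSink, Configurable, process() and configure() are the standard Flume pieces described above. The JDBC insert itself is left as a comment:
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class MySqlSink extends AbstractSink implements Configurable {

    private String connectionString;
    private String username;
    private String password;

    @Override
    public void configure(Context context) {
        // Reads the agent.sinks.jdbcSink.* properties from the Flume agent config
        connectionString = context.getString("connectionString");
        username = context.getString("username");
        password = context.getString("password");
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                // Nothing to deliver right now
                txn.commit();
                return Status.BACKOFF;
            }
            // Insert event.getBody() into MySQL via JDBC here,
            // using connectionString, username and password.
            txn.commit();
            return Status.READY;
        } catch (Exception e) {
            txn.rollback();
            throw new EventDeliveryException("Failed to deliver event to MySQL", e);
        } finally {
            txn.close();
        }
    }
}
The sink would then be wired in by setting agent.sinks.jdbcSink.type to the sink's fully qualified class name.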

Bare Metal Cloud - How to set authorized ssh keys for compute instances?

I have successfully provisioned Bare Metal Cloud compute instances using the following code:
public static Instance createInstance(
        ComputeClient computeClient,
        String compartmentId,
        AvailabilityDomain availabilityDomain,
        String instanceName,
        Image image,
        Shape shape,
        Subnet subnet
) {
    LaunchInstanceResponse response = computeClient.launchInstance(
            LaunchInstanceRequest.builder()
                    .launchInstanceDetails(
                            LaunchInstanceDetails.builder()
                                    .availabilityDomain(availabilityDomain.getName())
                                    .compartmentId(compartmentId)
                                    .displayName(instanceName)
                                    .imageId(image.getId())
                                    .shape(shape.getShape())
                                    .subnetId(subnet.getId())
                                    .build())
                    .build());
    return response.getInstance();
}
However, I can't SSH into any instances I create via the code above, because there's no parameter on launchInstance to pass in the public key of my SSH keypair.
How can I tell the instance what SSH public key to allow? I know it must be possible somehow since the console UI allows me to provide the SSH public key as part of instance creation.
According to the launch instance API documentation, you need to pass your SSH public key via the ssh_authorized_keys field of the metadata parameter:
Providing Cloud-Init Metadata
You can use the following metadata key names to provide information to Cloud-Init:
"ssh_authorized_keys" - Provide one or more public SSH keys to be
included in the ~/.ssh/authorized_keys file for the default user on
the instance. Use a newline character to separate multiple keys. The
SSH keys must be in the format necessary for the authorized_keys file
The code for this in the Java SDK looks like this:
public static Instance createInstance(
        ComputeClient computeClient,
        String compartmentId,
        AvailabilityDomain availabilityDomain,
        String instanceName,
        Image image,
        Shape shape,
        Subnet subnet
) {
    String sshPublicKey = "ssh-rsa AAAAB3NzaC1y...key shortened for example...fdK/ABqxgH7sy3AWgBjfj some description";
    Map<String, String> metadata = new HashMap<>();
    metadata.put("ssh_authorized_keys", sshPublicKey);
    LaunchInstanceResponse response = computeClient.launchInstance(
            LaunchInstanceRequest.builder()
                    .launchInstanceDetails(
                            LaunchInstanceDetails.builder()
                                    .availabilityDomain(availabilityDomain.getName())
                                    .compartmentId(compartmentId)
                                    .displayName(instanceName)
                                    .imageId(image.getId())
                                    .metadata(metadata)
                                    .shape(shape.getShape())
                                    .subnetId(subnet.getId())
                                    .build())
                    .build());
    return response.getInstance();
}
Then the instance will allow you to SSH to it using the SSH keypair for that public key.