I have a VM instance running in GCE (using Container-Optimized OS), and within it I have an actively running container that is generating JSON logs. I can see these logs when I navigate to /var/lib/docker/containers/<CONTAINER_ID>/<CONTAINER_ID>-json.log.
In the same Instance, another docker container is running using the image gcr.io/stackdriver-agents/stackdriver-logging-agent:1.8.4. This was automatically set up when I created the VM.
The VM has permission to access Cloud Logging, and the Cloud Logging API is enabled. I have also followed the steps here and added google-logging-enabled to the metadata with a value of true.
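For reference, I set that metadata with something equivalent to the following gcloud command (the instance name and zone here are placeholders):
gcloud compute instances add-metadata INSTANCE_NAME --zone ZONE --metadata google-logging-enabled=true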
When the VM starts, the logging agent seems to spin up correctly and emits a log saying that it is tailing the log file of the Docker container I want logs for; however, the logs within that file never appear in Cloud Logging. Below is a screenshot of the logs that do make it to Cloud Logging:
I have had this issue for a while now so would be very grateful for any help with this issue! Thanks in advance (:
In the JSON logs I was producing, the time format used was not being accepted by fluentd. I was able to get around that by adding:
reserve_time true
to the filter in the default config. Now the config ignores any nested fields with a time specified. I learned of this from here.
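For context, the filter stanza I changed ended up looking roughly like this. Note that the tag and field names below are an approximation of the agent's default Docker filter, not a verbatim copy of it:
<filter reform.**>
  @type parser
  format json
  key_name log
  reserve_data true
  reserve_time true    # the added line: keep the original event time instead of tripping over the record's own time field
</filter>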
Google Cloud Logging uses fluentd (the google-fluentd agent) to collect the logs.
You can reconfigure fluentd to include additional log files.
Create a file /etc/google-fluentd/config.d/my_app_name.conf and add to it a source block containing a line in the format path /path/to/my/log. There are more examples in the fluentd documentation.
You can also specify how the file is to be parsed: as a single string-type field or in a more structured way (more convenient when you're searching for something). Again, here's some more info about fluentd's output plugins.
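As a minimal sketch, such a config file could look like this (the path, tag, and pos_file location below are placeholders to adapt, and the parsing format depends on your log file):
<source>
  @type tail
  format none                                            # or json, etc., depending on how the file should be parsed
  path /path/to/my/log                                    # the log file you want shipped
  pos_file /var/lib/google-fluentd/pos/my_app_name.pos    # where fluentd remembers its read position
  tag my_app_name
</source>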
Finally go ahead and read the fluentd documentation to have a better understanding on using this tool.
I am using the Couchbase Server 6.0.2 image from Red Hat (https://access.redhat.com/containers/?tab=overview&get-method=registry-tokens#/registry.connect.redhat.com/couchbase/server) in OpenShift.
The pod is running but does not respond at http://localhost:8091. The logs show the error below.
I have 3 questions:
Why is whoami failing in the entrypoint?
Why isn't the server responding on port 8091?
Does the couchbase server image require root permissions?
It seems the couchbase/server image expects to be run as root, and then creates its own couchbase user and couchbase group.
At the end it runs an entrypoint script, and in there it checks whether the user running the whole thing is actually the couchbase user, by executing the whoami command.
This is not the case if you just run it in OpenShift, as the container will be run as some "random" unprivileged user.
This leads to a set of consecutive failures:
Here you will find the evaluation that is done in entrypoint.sh.
Now the whoami command fails, since there is no actual user, just said random UID. That failure leaves the first part of the evaluation blank, which results in a failure.
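A rough sketch of the kind of check involved (this is an approximation for illustration, not the actual Couchbase entrypoint.sh, and the start command is hypothetical):
#!/bin/bash
# Under an arbitrary OpenShift UID there is no passwd entry, so whoami prints nothing
# and exits non-zero; the comparison is therefore never true and the script bails out.
if [ "$(whoami)" = "couchbase" ]; then
    exec /opt/couchbase/bin/couchbase-server -- -noinput
else
    echo "refusing to start: not running as the couchbase user" >&2
    exit 1
fi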
This is a bug in the couchbase/server image, and as such you should, if time allows, contribute to fixing it by opening an issue against that repo.
I am using Ubuntu 18.04 on Google compute engine.
I am following the steps shown in the Google Cloud documentation. My command is
sudo gcloud logging write "logname" "A simple entry"
The entry gets created, but under the resource type 'global'. However, I want it to be created under the Compute Engine resource type.
I have tried setting the log name to "projects/campuskudos-980/logs/appengine.googleapis.com%2Fvm.syslog", but that didn't work out.
I want the logs to be created under the GCE VM Instance resource type, so I can filter on it in Stackdriver.
Currently there's no way to specify the resource type when using the gcloud logging write command. As explained in the documentation, for simplicity this command makes several assumptions about the log entry; for instance, it always sets the resource type to global.
Right now, there are two ways to work around that:
1- With the gcloud logging write command, use a log name such as projects/[PROJECT_ID]/logs/compute.googleapis.com. After that, as explained in the documentation, you can use an advanced filter in Stackdriver Logging to query all entries inside 'compute.googleapis.com'.
For e.g.:
logName: ("projects/[PROJECT_ID]/logs/compute.googleapis.com")
2- Call the API directly, as explained in the documentation, specifying the resource type as gce_instance.
Then that entry will appear under GCE VM Instance resource type on Stackdriver Logging.
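For illustration, a minimal entries.write request could look like the sketch below. The log name, instance_id, zone and project values are placeholders, not taken from your setup:
POST https://logging.googleapis.com/v2/entries:write
{
  "entries": [
    {
      "logName": "projects/[PROJECT_ID]/logs/my-test-log",
      "resource": {
        "type": "gce_instance",
        "labels": {
          "project_id": "[PROJECT_ID]",
          "instance_id": "1234567890123456789",
          "zone": "us-central1-a"
        }
      },
      "textPayload": "A simple entry"
    }
  ]
}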
I'm desperate for help here. I have a compute engine instance that hosts a lot of websites. These are the steps that I took:
Go to Compute Engine > Snapshots and take a snapshot of my instance
Click on the newly created snapshot and click Create Instance.
The new instance has all the configs of the current running instance
Then when I tried to access the new instance via SSH, it wouldn't work. Error message:
"Connection Failed
We are unable to connect to the VM on port 22. Learn more about possible causes of this issue."
Clicking on Learn more gets me to https://cloud.google.com/compute/docs/ssh-in-browser#ssherror
The instance is booting up and sshd is not yet running - Not sure how to check this
The instance is not running sshd - Not sure how to check this either
sshd is listening on a port other than the one you are connecting to - My current instance has SSH running on port 22, so I guess this is fine?
There is no firewall rule allowing SSH access on the port - Again, my current instance has SSH running, so I don't think it's because of the firewall, right?
The firewall rule allowing SSH access is enabled, but is not configured to allow connections from GCP Console services. - Same as above
The instance is shut down - Instance is still running.
The strange thing is, if I create a fresh instance from scratch and then do the steps above to clone it to a new instance, that new instance can be accessed normally via SSH.
Can anyone show me how to fix this if possible? Or show me how to view the logs and check what went wrong, etc.; I tried to Google this but got pretty confused by all the jargon and where to find particular things. Sorry for the wall of text. Thanks
Edit #1: I got technical support from Google. The steps below might help someone else, but not me, as when I reached step 7 I waited forever and couldn't get to the login page.
1.) Go to the VM instances page and click on the Instance name of your VM.
2.) Click the Edit button at the top of the page.
3.) Under Custom metadata, click Add item.
4.) Set 'Key' to 'startup-script' and set 'Value' to this script:
#! /bin/bash
# Create a user in the sudo group and set its password so you can log in over the serial console
useradd -G sudo USERNAME
echo 'USERNAME:PASSWORD' | chpasswd
NOTE: change the value of USERNAME and PASSWORD to the name and password of your choice.
5.) Enable "Enable connecting to serial ports" by checking the box below the SSH button.
6.) Click Save and then click RESET on the top of the page. Wait for some time for the instance to reboot.
7.) Click on 'Connect to serial port' on the page. In the new window, you might need to wait a bit and press Enter on your keyboard once; then you should see the login prompt.
8.) Login using the USERNAME and PASSWORD you provided.
Note: For your data security, please do not share your username or password.
As those steps above couldn't help me, and the Google support representative looked at the logs but didn't see anything wrong, she suggested debugging SSH by following this guide, https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh#use_your_disk_on_a_new_instance, which I will do when I have time. Feels like I'm writing an essay. Will keep this posted.
The troubleshooting steps that you can follow are:
Use the serial console to view your instance logs and check whether the new instance you created from the snapshot failed to reach the run level at which the SSH daemon gets started; if sshd was not started, you would not have SSH access to your instance (see the command sketch after this list).
You can try restarting the instance if it doesn’t affect production and try to gain ssh access again. Might be that some issue prevented the instance from starting up properly and restarting it could fix it.
You can try creating another VM instance from the snapshot in case the previous instance wasn’t created properly.
If creating a new VM instance from the snapshot doesn’t fix the issue, it might be that the snapshot itself wasn’t created properly. You can read this documentation guide, section Understanding snapshot best practices, and try creating another snapshot and VM instances from it.
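For the first step, the serial console output can also be pulled from your workstation with a command along these lines (the instance name and zone are placeholders):
gcloud compute instances get-serial-port-output INSTANCE_NAME --zone ZONE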
I had the same problem and after a lot of searching, I found an answer from user Peripheral from ServerFault that worked for me.
I found the fix for me. A recent update has a known issue where it removes the default gateway from the iptables. To fix it, I have to go to the instance and select Edit. Scroll down, and under Custom Metadata put the following:
key: startup-script
value: route add default gw <gatewayIP> eth0
Save and restart the VM.
Source
All credit to them; just wanted to share to help others find their solution faster.
I had the same issue. I eventually figured out that it was because I had attached a persistent disk and added an entry to the /etc/fstab file. This entry is supposed to automatically mount the attached disk when the instance restarts.
However, when I created the snapshot of the boot disk, I didn't remove the /etc/fstab entry. So creating a new instance from this snapshot will always cause a boot error as the boot process tries to mount a disk that is not attached.
This information is present in the documentation.
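For illustration, an /etc/fstab line like the one sketched below (the UUID and mount point are placeholders) will stall the boot if the disk is missing unless the nofail option is present, which is why the docs recommend adding it:
UUID=SOME_DISK_UUID /mnt/disks/data ext4 discard,defaults,nofail 0 2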
Before I begin, I'd like to start by saying I am completely new to Kafka and am fairly new to Linux, so if this ends up being a ridiculously simple answer, please be kind! :)
The high level idea of what I'm trying to do is use Confluent's Kafka Connect to read from a MySQL database that is having sensor data streamed to it on a minute or sub-minute basis and then use Kafka as an "ETL pipeline" to instantly route that data to a Data Warehouse and/or MongoDB for reporting or even tie in directly to Kafka from our web-app.
I am using Robin Moffatt's series as well as Confluent's JDBC Source Connector Quickstart as my initial guide. As far as where these are hosted, I am using an Amazon RDS MySQL database and a separate AWS EC2 t2.large instance with Ubuntu 16.04.2 to run Kafka Connect.
Using Robin's workflow, I am at the point where I have created the configuration file, but I am not using the JSON format he uses; I am using the format from the quickstart article.
name=jdbc_source_mysql_4427_Data
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
connection.url=jdbc:mysql://lndbtest.cdveaddpnevv.us-east-2.rds.amazonaws.com:3306/LNDBv1?user=adminRDS&password=*****
table.whitelist=4427_Data
mode=timestamp
timestamp.column.name=TmStamp
validate.non.null=false
topic.prefix=mysql-
And that is saved at:
/etc/kafka-connect-jdbc/kafka-connect-jdbc-source.properties
I then run:
/usr/bin/confluent load jdbc_source_mysql_4427_Data -d /etc/kafka-connect-jdbc/kafka-connect-jdbc-source.properties
and get this error:
{
"error_code": 400,
"message": "Connector configuration is invalid and contains the following 2 error(s):\nInvalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://lndbtest.cdveaddpnevv.us-east-2.rds.amazonaws.com:3306/LNDBv1?user=adminRDS&password=*** for configuration Couldn't open connection to jdbc:mysql://lndbtest.cdveaddpnevv.us-east-2.rds.amazonaws.com:3306/LNDBv1?user=adminRDS&password=***\nInvalid value java.sql.SQLException: No suitable driver found for jdbc:mysql://lndbtest.cdveaddpnevv.us-east-2.rds.amazonaws.com:3306/LNDBv1?user=adminRDS&password=*** for configuration Couldn't open connection to jdbc:mysql://lndbtest.cdveaddpnevv.us-east-2.rds.amazonaws.com:3306/LNDBv1?user=adminRDS&password=***\nYou can also find the above list of errors at the endpoint `/{connectorType}/config/validate`"
}
It seems to be a driver issue. My question at this point is, "Do I need to download the MySQL JDBC driver to my EC2 instance, or should that have been included in the Confluent Platform package?"
Also, does my overall idea sound like a good fit for Kafka Connect?
As I mentioned earlier, I am new to these technologies, but have found the best way to learn something is to jump right in and try to solve a problem. Any ideas and suggestions would be more than welcome. Thank you!
The overall concept makes sense to me. You do need to download the driver and add it to your worker classpath. It isn't packaged, I assume for licensing reasons.
As @dawsaw says, you do need to make the MySQL JDBC driver available to the connector.
My observation here would be that, given a free hand in the application and architecture you describe, it would be best to stream from the sensors into Kafka, and then from Kafka into MySQL, Mongo, the web app, etc.
Streaming into a database only to stream back out of it is not an ideal choice, if you have the option.
It's because there is no MySQL driver in the Confluent distribution. I think you can solve the problem by downloading a MySQL driver jar file, putting it in the confluent/share/java/kafka-connect-jdbc folder, and re-running the program.
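A minimal sketch of that fix, assuming a Connector/J 5.1.x tarball already downloaded from dev.mysql.com and Confluent installed under /path/to/confluent (both the version and the path are placeholders):
tar -xzf mysql-connector-java-5.1.45.tar.gz
cp mysql-connector-java-5.1.45/mysql-connector-java-5.1.45-bin.jar /path/to/confluent/share/java/kafka-connect-jdbc/
# restart the Connect worker so it picks up the new jar
confluent stop connect && confluent start connect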
I am using AWS Elastic Beanstalk for my project. When I upload a new application version, it gives this error:
Update environment operation is complete, but with errors. For more information, see troubleshooting documentation
My IAM role has AWSElasticBeanstalkFullAccess.
So why am I getting this error?
Thanks in advance
I had the same issue.
I did the following and it worked.
From the Elastic Beanstalk environment page, I chose to rebuild the environment (Actions > Rebuild Environment).
Deployed the new application version.
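If you prefer doing the rebuild from the command line, the AWS CLI equivalent is roughly the following (the environment name is a placeholder):
aws elasticbeanstalk rebuild-environment --environment-name my-env-name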
A number of things can result in this, including issues with .ebextensions files.
Troubleshooting tends to be iterative, since logs are frequently not created or inaccessible.
Things to try:
Roll back to a previous application version and verify any changes to .ebextensions are valid
Rebuild the environment (Actions -> Rebuild Environment) in the EB console. This frequently allows EB's log snapshot facility to recover, so that you can get further insight into what might be amiss.
You can try digging into what exactly the error was by checking the logs of the event in the 'eb-engine.log' file. In my case I got this error and followed another answer to solve it. You can also try the 'eb logs' command to get detailed info.
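For example, with the EB CLI (the environment name is a placeholder; on newer platforms the full bundle includes eb-engine.log):
# Show the most recent log excerpts for the environment
eb logs my-env-name
# Retrieve the complete log bundle for deeper digging
eb logs my-env-name --all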