I am looking for a solution to show S3 bucket size information in Kibana using the ELK Stack. I want the S3 bucket size information stored in an S3 bucket and then to read the files through Logstash. I hope this would also let me view historical data. Since I am new to ELK, I am pretty unsure how to get this to work.
I have tried the things below, but they did not go well...
Using the ncdu-s3 tool I have the S3 bucket size details in JSON format. I am trying to put that into Logstash and it is throwing the error below.
The JSON file format is
[1,0,{"timestamp":1469370986,"progver":"0.1","progname":"ncdu-s3"},
[{"name":"s3:\/\/BucketName\/FolderName1"},
[{"name":"FolderName1"},
{"dsize":107738,"name":"File1.rar"},
{"dsize":532480,"name":"File2.rar"},
[{"name":"FolderName2"},
{"dsize":108890,"name":"File3.rar"}]]]
I use the command below:
curl -XPOST 'http://localhost:9200/test/test/1' -d @/home/ubuntu/filename.json
ERROR:
{"error":"MapperParsingException[Malformed content, must start with an object]","status":400}
I think I need to reformat the JSON file in order for this to work... Can anyone suggest a good way to go?
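The error happens because Elasticsearch documents must be JSON objects, while the ncdu-s3 export is one big nested array. A minimal Python sketch of one way to flatten it into one object per file (the output file name and field names are my own choices, not anything ncdu-s3 or Logstash mandates); the resulting newline-delimited JSON can then be read by Logstash with a json codec or indexed document by document:
import json

def walk(node, prefix=""):
    # ncdu-style directory: a list whose first element describes the directory
    # and whose remaining elements are files (dicts) or subdirectories (lists).
    if isinstance(node, list):
        info, entries = node[0], node[1:]
        path = prefix + "/" + info["name"] if prefix else info["name"]
        for entry in entries:
            yield from walk(entry, path)
    else:
        yield {"path": prefix + "/" + node["name"], "size": node.get("dsize", 0)}

with open("/home/ubuntu/filename.json") as f:
    dump = json.load(f)          # [majorver, minorver, metadata, root directory]

with open("/home/ubuntu/bucketsize.ndjson", "w") as out:
    for doc in walk(dump[3]):    # dump[3] is the root directory list
        out.write(json.dumps(doc) + "\n")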
Afternoon
The Marketing Team have lost a lot of data at my current place of work.
I am running PM2 in PuTTY.
I am currently using pm2 logs 'applicationname' --lines 1000 to get the data I need.
This works, but is there a good way to export the data, perhaps to a CSV or .json file?
You should be able to find logs in
$HOME/.pm2/logs/*
and then it's easy to convert them to anything you want, for example: read the .log file and convert it to JSON.
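For instance, a minimal Python sketch that collects those plain-text log files into a single JSON file (the glob pattern and output name are just assumptions; PM2 typically writes one -out.log and one -error.log per app in that folder):
import glob, json, os

records = []
for path in glob.glob(os.path.expanduser("~/.pm2/logs/*.log")):
    with open(path, errors="replace") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                records.append({"file": os.path.basename(path), "message": line})

with open("pm2-logs.json", "w") as out:
    json.dump(records, out, indent=2)   # one JSON array of {file, message} objects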
I'm building an architecture using boto3, and I want to dump data in JSON format from an API to S3. What is blocking me right now is, first, that Firehose does NOT seem to support JSON; my workaround for now is not compressing the records, but the result is still different from a plain JSON file. I would still like a better option to make the files more compatible.
And second, the file names can't be customized. All the data I collect will eventually be queried through Athena, so can boto3 do the naming?
Answering a couple of the questions you have. Firstly, if you stream JSON into Firehose, it will write JSON to S3. JSON is the data format and compression is just the file encoding; compressing JSON doesn't make it something else. You'll just need to decompress it before consuming it.
RE: file naming, you shouldn't care about that. Let the system name the files whatever it wants. If you define the Athena table with the S3 location, you'll be able to query it, and when new files are added you'll be able to query them immediately.
Here is an AWS tutorial that walks you through this process. JSON stream to S3 with Athena query.
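For example, a minimal boto3 sketch of streaming newline-delimited JSON records into a Firehose delivery stream (the stream name and payload are placeholders); whether the objects land in S3 gzipped or not is controlled by the delivery stream's compression setting, not by the record format:
import json
import boto3

firehose = boto3.client("firehose")

record = {"user_id": 42, "event": "click"}       # example payload
firehose.put_record(
    DeliveryStreamName="my-delivery-stream",
    Record={"Data": json.dumps(record) + "\n"},  # newline-delimited JSON is easy for Athena to query
)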
I am trying to send CSV file data through a producer into a Kafka topic, and then on the consumer side I am listening for the events.
The producer is a command line. I am sending the CSV file using the command below:
kafka-console-producer.bat --broker-list localhost:9092 --topic freshTopic < E:\csv\sample.csv
I can successfully listen for the events on the consumer side as well.
Now I have to save that data in a data store such as Elasticsearch. For this I have to convert the CSV records into a data model. I read the tutorial below, but I am not able to understand how to write this in Java. Can anyone help me here: how can I convert the CSV file data into a data model? Thanks in advance.
Csv data streaming using kafka
What you wrote will work fine to get data into Kafka. There are lots of ways to get data into Elasticsearch (which is not a database) from there...
You don't need Avro, as JSON will work too, but confluent-schema-registry doesn't handle conversion from CSV, and Kafka has no "DataModel" class.
Assuming you want Avro so that it ends up in Elasticsearch as individual fields, then:
You could use the Kafka Connect spooldir source connector instead of the console producer, which would get you further along, and then you can run the Elasticsearch sink connector from there.
Or use something to parse the CSV to Avro, as the link you posted shows (it doesn't have to be Python; KSQL could work too).
If you are fine with JSON, then Logstash would work as well; a minimal sketch of the JSON route follows below.
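If you do want to do the CSV-to-JSON step yourself (in Python rather than Java here), a minimal sketch using the kafka-python client; the topic name and file path come from your command, while the broker address and CSV columns are assumptions. Logstash or the Elasticsearch sink connector can then consume the JSON messages:
import csv
import json
from kafka import KafkaProducer   # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

with open("E:\\csv\\sample.csv", newline="") as f:
    for row in csv.DictReader(f):      # one dict per CSV row, keyed by the header line
        producer.send("freshTopic", row)

producer.flush()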
I'm working on an ETL job that will ingest JSON files into a RDS staging table. The crawler I've configured classifies JSON files without issue as long as they are under 1MB in size. If I minify a file (instead of pretty print) it will classify the file without issue if the result is under 1MB.
I'm having trouble coming up with a workaround. I tried converting the JSON to BSON or GZIPing the JSON file but it is still classified as UNKNOWN.
Has anyone else run into this issue? Is there a better way to do this?
I have two JSON files, 42 MB and 16 MB, partitioned on S3 at these paths:
s3://bucket/stg/year/month/_0.json
s3://bucket/stg/year/month/_1.json
I had the same problem as you, crawler classification as UNKNOWN.
I was able to solve it:
You must create a custom classifier with the JSON path "$[*]", then create a new crawler with that classifier.
Run your new crawler over the data on S3 and the proper schema will be created.
DO NOT update your current crawler with the classifier, as it won't apply the change. I don't know why; maybe because of the classifier versioning AWS mentions in its documentation. Creating a new crawler makes it work.
As mentioned in
https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json
When you run a crawler using the built-in JSON classifier, the entire file is used to define the schema. Because you don’t specify a JSON path, the crawler treats the data as one object, that is, just an array.
That is something which Dung also pointed out in his answer.
Please also note that file encoding can lead to JSON being classified as UNKNOWN. Please try and re-encode the file as UTF-8.
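If you prefer to script those steps, a minimal boto3 sketch of the same approach (the classifier, crawler and database names, the IAM role and the S3 path are placeholders):
import boto3

glue = boto3.client("glue")

# Custom JSON classifier that treats each element of the top-level array as a record.
glue.create_classifier(
    JsonClassifier={"Name": "json-array-classifier", "JsonPath": "$[*]"}
)

# A NEW crawler that uses the classifier (do not attach it to the existing crawler).
glue.create_crawler(
    Name="stg-json-crawler",
    Role="arn:aws:iam::123456789012:role/MyGlueServiceRole",
    DatabaseName="stg_db",
    Targets={"S3Targets": [{"Path": "s3://bucket/stg/"}]},
    Classifiers=["json-array-classifier"],
)

glue.start_crawler(Name="stg-json-crawler")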
I need to upload data into MapQuest DMv2 through a CSV file. After going through the documentation I found the following syntax for uploading data:
http://www.mapquestapi.com/datamanager/v2/upload-data?key=[APPLICATION_KEY]&inFormat=json&json={"clientId": "[CLIENT_ID]","password": "[REGISTRY_PASSWORD]","tableName": "mqap.[CLIENT_ID]_[TABLENAME]","append":true,"rows":[[{"name":"[NAME]","value":"[VALUE]"},...],...]}
This is fair enough if I want to put individual rows in rows[], but there is no mention of the procedure to follow to upload data through a CSV file. It has been clearly mentioned that "CSV, KML, and zipped Shapefile uploads are supported". How can I achieve this through the Data Manager API service?
Use a multipart post to upload the csv instead of the rows. You can see it working here.
I used the CURL program to accomplish that. Here is an example of a CURL.exe command line. You can call it from a batch file, or in my case, from a C# program.
curl.exe -F clientId=XXXXX -F password=XXXXX -F tableName=mqap.XXXXX_xxxxx -F append=false --referer http://www.mapquest.com -F "file=@C:\\file.csv" "http://www.mapquestapi.com/datamanager/v2/upload-data?key=KEY&ambiguities=ignore"
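If you are calling it from code instead of curl, the same multipart upload can be sketched with Python's requests library (key, credentials, table name and file path are placeholders, exactly as in the curl example):
import requests

response = requests.post(
    "http://www.mapquestapi.com/datamanager/v2/upload-data",
    params={"key": "KEY", "ambiguities": "ignore"},
    data={
        "clientId": "XXXXX",
        "password": "XXXXX",
        "tableName": "mqap.XXXXX_xxxxx",
        "append": "false",
    },
    files={"file": open("C:\\file.csv", "rb")},  # sent as multipart/form-data
    headers={"Referer": "http://www.mapquest.com"},
)
print(response.status_code, response.text)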