Logstash: limit the size of the output file

I am running Logstash 6.8.11, using it to receive logs from Filebeat.
The configuration is quite simple:
input {
beats {
port => hidden
}
}
output {
if [beat][name] =~ "anonymous*" {
file {
path => "/tmp/test-%{+YYYY-MM-DD-HH}"
codec => json_lines
}
}}
Is there any way I can limit the output file to a specific size (let's say 10 MB or 50 MB)?

The file output does not itself support size-based log rotation. There is a long-standing open issue that speaks directly to this here (and related issues here and here). The response from the Elastic developers was that you can use existing tools like logrotate to do the rotation.
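For example, a minimal logrotate rule could look like the sketch below (the path pattern and sizes are illustrative; copytruncate avoids interfering with the file handle Logstash keeps open):
/tmp/test-* {
  size 50M
  rotate 5
  compress
  missingok
  notifempty
  copytruncate
}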

JSZip read downloaded data (Angular 2)

I am trying to use JSZip to unzip a JSON file, but due to my lack of understanding of how JSZip works, I get the response in a format that I do not know how to use.
So far this is my code:
this.rest.getFile(this.stlLocation).subscribe(
  data => {
    let JSONFIle = new JSZIP();
    JSONFIle.file(data.url, data._body, {binary: true, compression: 'DEFLATE'});
    console.log(JSONFIle);
  },
  err => {
    this.msgs.push({severity: 'error', summary: 'Error Message', detail: err});
  }
);
So I download a file using an Angular 2 service and use an observable to get the response. When the data is received, I finally call JSZip and try to unzip the file, but the result of the operation is an intricate object with my data scattered all over the place and buried inside several layers. All I want is the unzipped JSON file that I can open and process.
Thank you for your help,
Dino
After a bit of reading I have realized I was going down the wrong path. If you are downloading the file to a browser, you shouldn't have to do anything. Browsers add the Accept-Encoding: 'deflate' header automatically; it is both unnecessary and not good practice to do this at a DOM/JS level. If you are using NGINX, the following link may help you out:
NGINX COMPRESSION AND DECOMPRESSION
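For reference, the NGINX side can be as small as the sketch below (standard gzip directives; values are illustrative). The browser then decompresses transparently, and no JSZip work is needed in the client:
gzip on;
gzip_types application/json;
gzip_min_length 1024;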

Google Cloud Vision API 'Request Admission Denied'

I am new to the Google Cloud Vision API. I am doing OCR on images, primarily of bills and receipts.
For a few images it is working fine, but when I try some other images it gives me this error:
Error: { [Error: Request Admission Denied.]
  code: 400,
  errors:
   [ { message: 'Request Admission Denied.',
       domain: 'global',
       reason: 'badRequest' } ] }
This is my code:
// construct parameters
const req = new vision.Request({
  image: new vision.Image('./uploads/reciept.png'),
  features: [
    new vision.Feature('TEXT_DETECTION', 1)
  ]
})

vision.annotate(req).then((res) => {
  // handling response
  //console.log(res.responses[0].textAnnotations);
  var desc = res.responses[0].textAnnotations;
  var descarr = [];
  for (var i = 0; i < desc.length; i++) {
    descarr.push(desc[i].description);
  }
})
Ran into this problem as well. It was an image size issue. I don't know what the hard limit is; 4 MB worked, but 9 MB didn't, so it's somewhere in between.
I was able to work around this by saving the image as another format and submitting that instead; there was something "wrong" (or at least, unexpected by Google) with the image file itself. Not sure about image manipulation in the language you're using (js?), but in Python it was as simple as:
from PIL import Image
bad_image = Image.open(open('failure.jpg', 'rb'))
bad_image.save(open('success.png', 'wb'))
The Best Practices doc says that the image file size should not exceed 4 MB. Based on the responses above, this could be the problem.
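If the 4 MB limit is what you are hitting, downscaling before submitting usually gets you under it. A rough sketch with Pillow (file names and the target size are illustrative):
import os
from PIL import Image

MAX_BYTES = 4 * 1024 * 1024  # limit mentioned in the Best Practices doc

def shrink_to_limit(src, dst='receipt_small.png'):
    # Re-encode, halving the dimensions until the file fits under the limit
    img = Image.open(src)
    img.save(dst)
    while os.path.getsize(dst) > MAX_BYTES:
        w, h = img.size
        img = img.resize((w // 2, h // 2))
        img.save(dst)
    return dst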
Interesting. I ran into the same problem today, using Google's Java client. The way I read James' answer, he had a JPEG file that failed, but worked as a PNG file. In my case, I had a PNG file that failed, but worked as a JPEG.
I had concluded it was a size limitation, as I'd expect JPEGs to be typically smaller than PNGs; however, James' experience suggests otherwise.
I couldn't find any relevant documentation in Google's Javadocs. Since the response is a 400 error, perhaps the Java library is not encoding the image buffer correctly.

How can I read a 4k-row CSV file with PHPExcel?

I can read a few CSV rows using PHPExcel, but when I try to read 4k rows, an exception is thrown and I get the following message:
Notice: Undefined index:
Error loading file "": Could not open for reading! File does not exist.
$nomOrigine = $_FILES["monfichier"]["name"];
$elementsChemin = pathinfo($nomOrigine);
try {
    $objPHPExcel = PHPExcel_IOFactory::load($nomOrigine);
    $objWorksheet = $objPHPExcel->getActiveSheet();
} catch (Exception $e) {
    die('Error loading file "'.pathinfo($nomOrigine, PATHINFO_BASENAME).'": '.$e->getMessage());
}
When posting data, if post_max_size is smaller than the file (or other data) that is being uploaded, the file will not be received on the server and you may get (as you did) an error that doesn't seem to make sense.
Increase your post_max_size to the largest value that you think you will need, but if you need a very large size then you should not be doing file uploads via a form.
I say that because with, for example, a 100MB file, the upload time will be very long, user feedback (last I checked) is not very good, and if there is an error on the client or server, the upload has to be restarted completely. You can research other upload methods if you think you'll need such sizable uploads.
I imagine 20M (as you mentioned in comment) will be fine, but that's really up to you and what size of files you expect to be dealing with.
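For example, in php.ini (these are the standard PHP directives; 20M matches the size mentioned above, and post_max_size should be at least as large as upload_max_filesize):
post_max_size = 20M
upload_max_filesize = 20M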

Importing local json file using d3.json does not work

I am trying to import a local .json file using d3.json().
The file filename.json is stored in the same folder as my HTML file.
Yet the json parameter is null.
d3.json("filename.json", function(json) {
root = json;
root.x0 = h / 2;
root.y0 = 0;});
. . .
}
My code is basically the same as in this d3.js example
If you're running in a browser, you cannot load local files.
But it's fairly easy to run a dev server: on the command line, simply cd into the directory with your files, then:
python -m SimpleHTTPServer
(or python -m http.server using python 3)
Now in your browser, go to localhost:8000 (or whatever port is shown on the command line).
The following used to work in older versions of d3:
var json = {"my": "json"};
d3.json(json, function(json) {
  root = json;
  root.x0 = h / 2;
  root.y0 = 0;
});
In d3 v5, you should do it as:
d3.json("file.json").then(function(data){ console.log(data)});
Similarly, with csv and other file formats.
You can find more details at https://github.com/d3/d3/blob/master/CHANGES.md
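If you also need the error for debugging, the v5 promise form exposes it via catch; a small sketch, the filename is illustrative:
d3.json("filename.json")
  .then(function(data) { console.log(data); })
  .catch(function(error) { console.log(error); });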
Adding to the previous answers, it's simpler to use the HTTP server provided by most Linux/Mac machines (just by having Python installed).
Run the following command in the root of your project
python -m SimpleHTTPServer
Then, instead of accessing file://.....index.html, open your browser at http://localhost:8000 (or the port reported when you start the server). This will make the browser fetch all the files in your project without being blocked.
http://bl.ocks.org/eyaler/10586116
Refer to this code; it reads from a file and creates a graph.
I also had the same problem, but later I figured out that the problem was in the JSON file I was using (an extra comma). If you are getting null here, try printing the error you are getting, like this:
d3.json("filename.json", function(error, graph) {
alert(error)
})
This works in Firefox; in Chrome, somehow, it does not print the error.
Loading a local CSV or JSON file with (d3)js is not safe to do; browsers prevent you from doing it. There are some solutions to get it working, though. The following line basically does not work (CSV or JSON) because it is a local import:
d3.csv("path_to_your_csv", function(data) {console.log(data) });
Solution 1:
Disable the security in your browser
Different browsers have different security settings that you can disable. This solution can work and you can load your files. Disabling is, however, not advisable: it will make you vulnerable to all kinds of threats. On the other hand, who is going to use your software if you tell them to manually disable the security?
Disable the security in Chrome:
--disable-web-security
--allow-file-access-from-files
Solution 2:
Load your csv/json file from a website.
This may seem like a weird solution, but it will work. It is an easy fix, though it can be impractical. See here for an example; check out the page source. This is the idea:
d3.csv("https://path_to_your_csv", function(data) {console.log(data) });
Solution 3:
Start your own web server, e.g. with Python.
Files served over HTTP are not subject to the browser's local-file security checks. This may be a solution when you experiment with your code on your own machine; in many cases it will not be the solution when you have users. This example will serve HTTP on port 8888 unless it is already taken:
python -m http.server 8888          # Python 3
python -m SimpleHTTPServer 8888 &   # Python 2
Open the (Chrome) browser address bar and type the address below. This will open index.html. In case you have a different name, type the path to that local HTML page.
localhost:8888
Solution 4:
Use localhost and CORS
You may be able to use localhost and CORS, but the approach is not user-friendly because setting it up may not be so straightforward.
Solution 5:
Embed your data in the HTML file
I like this solution the most. Instead of loading your CSV, you can write a script that embeds your data directly in the HTML. This allows users to use their favorite browser, and there are no security issues. This solution may not be so elegant, because your HTML file can grow very large depending on your data, but it will work. See here for an example; check out the page source.
Remove this line:
d3.csv("path_to_your_csv", function(data) { })
Replace with this:
var data =
[
$DATA_COMES_HERE$
]
You can't readily read local files, at least not in Chrome, and possibly not in other browsers either.
The simplest workaround is to include your JSON data in your script file, then get rid of your d3.json call and keep the code in the callback you pass to it.
Your code would then look like this:
json = { ... };
root = json;
root.x0 = h / 2;
root.y0 = 0;
...
I have used this:
d3.json("graph.json", function(error, xyz) {
  if (error) throw error;
  // the rest of my d3 graph code here
});
so you can refer to your JSON data using the variable xyz, and graph.json is the name of my local JSON file.
Use the resource as a local variable:
var filename = {x0: 0, y0: 0};
// you can choose a different name for the function than json
d3.json = (x, cb) => cb.call(null, x);
d3.json(filename, function(json) {
  root = json;
  root.x0 = h / 2;
  root.y0 = 0;
});
//...
}

SOLR - Best approach to import 20 million documents from csv file

My current task at hand is to figure out the best approach to load millions of documents into Solr.
The data file is an export from DB in csv format.
Currently, I am thinking about splitting the file into smaller files and having a script post these smaller ones using curl.
I have noticed that if you post a large amount of data, most of the time the request times out.
I am looking into the Data Import Handler and it seems like a good option.
Any other ideas are highly appreciated.
Thanks
Unless a database is already part of your solution, I wouldn't add that additional complexity. Quoting the Solr FAQ, it's your servlet container that is issuing the session timeout.
As I see it, you have a couple of options (in my order of preference):
Increase container timeout
Increase the container timeout (the "maxIdleTime" parameter, if you're using the embedded Jetty instance).
I'm assuming you only occasionally index such large files? Increasing the timeout temporarily might just be the simplest option.
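As a rough illustration (assuming the Jetty bundled with older Solr releases and its etc/jetty.xml), the connector definition carries the setting; the value is in milliseconds:
<!-- inside the connector definition in etc/jetty.xml -->
<Set name="maxIdleTime">300000</Set>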
Split the file
Here's a simple Unix script that will do the job (splitting the file into 500,000-line chunks):
split -d -l 500000 data.csv split_files.
for file in `ls split_files.*`
do
  curl 'http://localhost:8983/solr/update/csv?fieldnames=id,name,category&commit=true' -H 'Content-type:text/plain; charset=utf-8' --data-binary @$file
done
Parse the file and load in chunks
The following Groovy script uses opencsv and SolrJ to parse the CSV file and commit changes to Solr every 500,000 lines.
import au.com.bytecode.opencsv.CSVReader
import org.apache.solr.client.solrj.SolrServer
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
import org.apache.solr.common.SolrInputDocument

@Grapes([
    @Grab(group='net.sf.opencsv', module='opencsv', version='2.3'),
    @Grab(group='org.apache.solr', module='solr-solrj', version='3.5.0'),
    @Grab(group='ch.qos.logback', module='logback-classic', version='1.0.0'),
])
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/");

new File("data.csv").withReader { reader ->
    CSVReader csv = new CSVReader(reader)
    String[] result
    Integer count = 1
    Integer chunkSize = 500000

    while (result = csv.readNext()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", result[0])
        doc.addField("name_s", result[1])
        doc.addField("category_s", result[2])
        server.add(doc)
        if (count.mod(chunkSize) == 0) {
            server.commit()
        }
        count++
    }
    server.commit()
}
In Solr 4.0 (currently in beta), CSVs from a local directory can be imported directly using the UpdateHandler. Modifying the example from the Solr wiki:
curl "http://localhost:8983/solr/update?stream.file=exampledocs/books.csv&stream.contentType=text/csv;charset=utf-8"
This streams the file from the local location, so there is no need to chunk it up and POST it via HTTP.
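One thing to watch: remote streaming has to be enabled in solrconfig.xml for stream.file to be accepted; the relevant bit looks roughly like this (the upload limit value is illustrative):
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" />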
The answers above have explained the ingestion strategies from a single machine really well.
Here are a few more options if you have big data infrastructure in place and want to implement a distributed data ingestion pipeline.
Use Sqoop to bring the data into Hadoop, or place your CSV file in Hadoop manually.
Use one of the connectors below to ingest the data:
the hive-solr connector or the spark-solr connector.
PS:
Make sure no firewall blocks connectivity between the client nodes and the Solr/SolrCloud nodes.
Choose the right directory factory for data ingestion; if near-real-time search is not required, use StandardDirectoryFactory.
If you get the exception below in the client logs during ingestion, tune the autoCommit and autoSoftCommit configuration in the solrconfig.xml file.
SolrServerException: No live SolrServers available to handle this request
Definitely just load these into a normal database first. There are all sorts of tools for dealing with CSVs (for example, Postgres' COPY), so it should be easy. Using the Data Import Handler is also pretty simple, so this seems like the most friction-free way to load your data. This method will also be faster since you won't have unnecessary network/HTTP overhead.
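For instance, a single Postgres statement gets the CSV into a staging table (the table name and path are hypothetical; from psql, \copy does the same without superuser rights), and the Data Import Handler can then read from that table:
COPY docs_staging FROM '/path/to/data.csv' WITH (FORMAT csv, HEADER true);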
The reference guide says ConcurrentUpdateSolrServer could/should be used for bulk updates.
The Javadocs are somewhat incorrect (v3.6.2, v4.7.0):
ConcurrentUpdateSolrServer buffers all added documents and writes them into open HTTP connections.
It doesn't buffer indefinitely, but up to int queueSize, which is a constructor parameter.
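A minimal SolrJ sketch of that, assuming Solr 4.x (the URL, queue size, thread count, and field names are illustrative, and the two-row array stands in for the parsed CSV):
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;

public class BulkLoad {
    public static void main(String[] args) throws IOException, SolrServerException {
        // 10,000 buffered documents, 4 background threads (illustrative values)
        ConcurrentUpdateSolrServer server =
                new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 10000, 4);

        String[][] rows = { {"1", "first"}, {"2", "second"} };  // stand-in for the parsed CSV
        for (String[] row : rows) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", row[0]);
            doc.addField("name_s", row[1]);
            server.add(doc);  // buffered and written over the open HTTP connections
        }
        server.blockUntilFinished();  // drain the internal queue before committing
        server.commit();
        server.shutdown();
    }
}
Calling blockUntilFinished() before the final commit is what keeps the tail of the queue from being left behind.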