Will logstash insert duplicate documents when restarting after a crash - duplicates

In the scenario where Logstash crashes or the Elasticsearch server is not reachable, I might have to restart Logstash and begin processing a file that was half-way inserted into Elasticsearch.
Does Logstash remember which line in the log file was last processed and pick up where it left off, or will it insert duplicate documents?
I suspect that the _id could be a hash generated by the file and line number to avoid duplicates but I am not sure.

The Elastic products that read files (logstash, filebeat, or the older logstash-forwarder) remember where they are in the files that they're reading.
If the pipeline backs up, each component will then stop sending more logs until the congestion is removed.
There will be logs "stuck" in the pipeline. IIRC, the Logstash internal queue holds 20 events. If you kill Logstash before it can write those, you'll lose them. They were working on improving that for Logstash 2, but it didn't make it in.
So, the risk is more of missing a few documents than getting duplicates.
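The idea raised in the question, deriving the _id from the file and line so that a replayed event targets the same document instead of creating a new one, can be sketched roughly as follows. This is a plain Python illustration, not Logstash configuration; the helper name `event_id`, the sample path, and the choice of SHA-1 are assumptions made for the example.

```python
import hashlib


def event_id(path: str, line_number: int, line: str) -> str:
    """Derive a stable document ID from where a log line came from.

    Hypothetical helper: the same (path, line_number, line) always
    yields the same ID, so sending the event twice would address the
    same document rather than create a second one.
    """
    # NUL separators keep ("a", 12) and ("a1", 2) from colliding.
    key = f"{path}\x00{line_number}\x00{line}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()


first = event_id("/var/log/app.log", 42, "GET /index.html 200")
replayed = event_id("/var/log/app.log", 42, "GET /index.html 200")
next_line = event_id("/var/log/app.log", 43, "GET /index.html 200")
```

Here `first` and `replayed` are identical while `next_line` differs, which is the property a deduplicating _id needs. Note that this only helps against duplicates; it does nothing for events lost from the in-flight queue.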

Related

Using Consul for dynamic configuration management

I am designing a small project where I need to use Consul to manage application configuration dynamically, so that all my app machines get the configuration at the same time without any inconsistency. We already use Consul for service discovery, so I read more about it, and it looks like it has a Key/Value store that I can use to manage my configuration.
All our configuration lives in JSON files, so we make a zip file containing all of them and store a reference to where the zip file can be downloaded in a particular key in the Consul Key/Value store. Every app machine needs to download the zip file from that reference and store it on disk. I then need all app machines to switch to the new config at approximately the same time to avoid any inconsistency.
Let's say I have 10 app machines, and all 10 need to download the zip file that has all my configs and then switch to the new configs at the same time, atomically, to avoid any inconsistency (since they are taking traffic). Below are the steps I came up with, but I am confused about how loading the new files into memory and switching to the new configs will work:
All 10 machines are already up and running with the default config files, which are also on disk.
Some outside process will update the key in my Consul key/value store with the latest zip file reference.
All 10 machines have a watch on that key, so once someone updates its value, the watch is triggered and each machine downloads the zip file to disk and uncompresses it to get all the config files.
(..)
(..)
(..)
Now this is where I am confused about how the remaining steps should work.
How should the apps load these config files into memory and then all switch at the same time?
Do I need to use leadership election with Consul or anything else to achieve any of these things?
What will be the logic around this, since all 10 apps are already running with default configs in memory (which are also stored on disk)? Do we need two separate directories, one with the default configs and one for the new configs, and then work with these two directories?
Let's say this is the node I have in Consul, just a rough design (could be wrong here):
{"path":"path-to-new-config", "machines":"ip1:ip2:ip3:ip4:ip5:ip6:ip7:ip8:ip9:ip10", ...}
where path holds the new zip file reference and machines is a key holding the list of all machines, so each machine adds its IP address to that key as soon as it has downloaded the file successfully? And once the machines list has a size of 10, I can say we are ready to switch? If yes, how can I atomically update the machines key in that node? Maybe this logic is wrong, but I just wanted to throw something out. I would also need to clean up the machines list after the switch, since I need to do the same exercise for the next config update.
Can someone outline the logic for efficiently managing configuration on all my app machines dynamically while avoiding inconsistency? Maybe I need one more node for status, holding details about each machine's config: when it was downloaded, when it switched, and so on?
I can think of several possible solutions, depending on your scenario.
The simplest solution is not to store your config in memory and files at all, just store the config directly in the consul kv store. And I'm not talking about a single key that maps to the entire json (I'm assuming your json is big, otherwise you wouldn't zip it), but extracting smaller key/value sets from the json (this way you won't need to pull the whole thing every time you make a query to consul).
If you get the config directly from consul, your consistency guarantees match consul consistency guarantees. I'm guessing you're worried about performance if you lose your in-memory config, that's something you need to measure. If you can tolerate the performance loss, though, this will save you a lot of pain.
If performance is a problem here, a variation on this might be to use fsconsul. With this, you'll still extract your json into multiple key/value sets in consul, and then fsconsul will map that to files for your apps.
If that's off the table, then the question is how much inconsistency you are willing to tolerate.
If you can stand a few seconds of inconsistency, your best bet might be to put a TTL (time-to-live) on your in-memory config. You'll still have the watch on consul, but you combine it with evicting your in-memory cache every few seconds as a fallback in case the watch fails (or stalls) for some reason. This should give you a worst case of a few seconds of inconsistency (depending on the value you set for your TTL), but the normal case (I think) should be fast.
If that's not acceptable (does downloading the zip take a lot of time, maybe?), you can go down the route you mentioned. To update a value atomically you can use their cas (check-and-set) operation. The write is rejected if the key was modified between the time you read it and the time consul tried to apply your update. In that case you need to pull the list of machines again, apply your change, and retry (until it succeeds).
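A minimal sketch of the TTL idea, assuming a hypothetical `load` callable that stands in for whatever actually fetches and parses the config (it is not a Consul API). The class name `TtlConfig` and the injectable `clock` parameter are also assumptions, the latter added only to make the behaviour easy to test.

```python
import time
from typing import Callable, Optional


class TtlConfig:
    """Caches a loaded config and reloads it once the TTL has elapsed."""

    def __init__(self, load: Callable[[], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # hypothetical loader, supplied by the app
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._loaded_at = 0.0

    def invalidate(self) -> None:
        """Call from the watch handler to force a reload on the next read."""
        self._value = None

    def get(self) -> dict:
        now = self._clock()
        expired = now - self._loaded_at >= self._ttl
        if self._value is None or expired:
            # Either the watch fired or the TTL ran out: fetch again.
            self._value = self._load()
            self._loaded_at = now
        return self._value
```

In the normal case the watch handler calls `invalidate()` and the next `get()` picks up the new config right away; if the watch stalls, the TTL bounds how long a machine can keep serving the old one.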
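The read, modify, check-and-set, retry loop described above might look roughly like this. `FakeKV` is a hypothetical in-memory stand-in used so the sketch runs on its own; it is not a Consul client, and a real implementation would make the equivalent read and cas calls against Consul's KV store. The colon-separated list format follows the node layout from the question.

```python
from typing import Dict, Tuple


class FakeKV:
    """In-memory stand-in for a KV store with check-and-set semantics.

    Hypothetical class for illustration; it is not a Consul client.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, str]] = {}
        self._index = 0

    def get(self, key: str) -> Tuple[int, str]:
        # Returns (modify_index, value); index 0 means the key is absent.
        return self._data.get(key, (0, ""))

    def cas(self, key: str, value: str, index: int) -> bool:
        # Write only if the key is unchanged since `index` was read.
        current_index, _ = self.get(key)
        if current_index != index:
            return False
        self._index += 1
        self._data[key] = (self._index, value)
        return True


def register_machine(kv: FakeKV, key: str, ip: str, max_retries: int = 10) -> bool:
    """Append `ip` to the colon-separated list under `key`, retrying on conflict."""
    for _ in range(max_retries):
        index, raw = kv.get(key)
        machines = raw.split(":") if raw else []
        if ip in machines:
            return True  # already registered, nothing to do
        if kv.cas(key, ":".join(machines + [ip]), index):
            return True
        # Another machine wrote first: re-read the list and try again.
    return False
```

The retry limit is a design choice for the sketch; with only 10 machines contending for the key, a handful of attempts should normally be enough.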
I don't see why you would need 2 directories, but maybe I'm misunderstanding the question: when your app starts, before you do anything else, you check if there's a new config and if there is you download it and load it to memory. So you shouldn't have a "default config" if you want to be consistent. After you downloaded the config on startup, you're up and alive. When your watch signals a key change you can download the config to directly override your old config. This is assuming you're running the watch triggered code on a single thread, so you're not going to be downloading the file multiple times in parallel. If the download failed, it's not like you're going to load the corrupt file to your memory. And if you crashed mid-download, then you'll download again on startup, so should be fine.
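One way to make "override the old config" safe against failed or partial downloads, without a second directory, is to validate the new bytes first and then swap the file in with an atomic rename. The sketch below assumes a single JSON file rather than a zip to keep it short; the function name `install_config` is hypothetical.

```python
import json
import os
import tempfile


def install_config(config_bytes: bytes, target_path: str) -> dict:
    """Validate new config bytes, then atomically replace `target_path`.

    Raises ValueError (json.JSONDecodeError) on malformed input and
    leaves the existing file untouched in that case.
    """
    parsed = json.loads(config_bytes)  # fail before touching the disk
    directory = os.path.dirname(os.path.abspath(target_path))
    # Temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(config_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, target_path)  # readers see old or new, never half
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return parsed
```

A process that crashes mid-download leaves at most a stray temp file behind, and the startup check described above would fetch the config again anyway.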

loopback handle long requests/processes

Suppose I receive a big CSV file, and after it is uploaded the LoopBack server must parse all the data and run some processing on it (e.g. create user accounts and do other registrations related to each account, or just create a database entry for each row in the file). The file can have anywhere from 10,000 to 3,000,000 entries (I'm using MySQL btw, maybe there is a better option for that too), so it takes a long time to process. Is there a "neat" way to handle that? Right now, after I get the file, I return the response to the user in my remote method with callback(null,{message:'got file, server still working'}); and continue processing in the background (in the same remote method; I just don't call back when done because I already did). The front-end then polls a different endpoint on a 500 ms interval to get the process status (I save the progress percentage in a database field for that endpoint to read). Is this the way to do it, or is there a better option? I already run the MySQL queries in groups of 10,000 per commit and I disable foreign key checking too (I'm executing queries directly through the MySQL connector). Thanks in advance :)

mysql huge operations

I am currently importing a huge CSV file from my iPhone to a Rails server. The server parses the data and then starts inserting rows into the database. The CSV file is fairly large, so the operation takes a long time to finish.
Since I am doing this asynchronously, my iPhone is then able to go to other views and do other stuff.
However, when it then sends another query against a different table, the request hangs because the first operation is still inserting the CSV data into the database.
Is there a way to resolve this type of issue?
As long as the phone doesn't care when the database insert is complete, you might want to try storing the CSV file in a tmp directory on your server and then have a script write from that file to the database. Or simply store it in memory. That way, once the phone has posted the CSV file, it can move on to other things while the script handles the database inserts asynchronously. And yes, @Barmar is right about using the InnoDB engine rather than MyISAM (which may be the default in some configurations).
Or, you might want to consider enabling "low-priority updates" which will delay write calls until all pending read calls have finished. See this article about MySQL table locking. (I'm not sure what exactly you say is hanging: the update, or reads while performing the update…)
Regardless, if you are posting the data asynchronously from your phone (i.e., not from the UI thread), it shouldn't be an issue as long as you don't try to use more than the maximum number of concurrent HTTP connections.
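The first suggestion above, spooling the CSV to a file and letting a separate script load it, could be sketched like this. The example uses Python's built-in sqlite3 purely as a runnable stand-in for MySQL; the table `items`, its columns, and the batch size are assumptions for illustration. Committing once per batch keeps each transaction short, which should generally give other queries a chance to run between batches.

```python
import csv
import sqlite3
from typing import Iterator, List


def batches(rows: Iterator[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Yield lists of up to `size` rows."""
    batch: List[List[str]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def import_csv(conn: sqlite3.Connection, csv_path: str, batch_size: int = 1000) -> int:
    """Insert rows from a spooled CSV file, committing once per batch.

    Returns the number of rows inserted. The `items` table is a
    hypothetical target used only for this sketch.
    """
    total = 0
    with open(csv_path, newline="") as f:
        for batch in batches(csv.reader(f), batch_size):
            conn.executemany("INSERT INTO items (name, qty) VALUES (?, ?)", batch)
            conn.commit()
            total += len(batch)
    return total
```

The upload handler would only need to save the file and hand its path to whatever runs `import_csv` (a background job, a cron script, and so on), then return to the phone immediately.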

Choosing what to log in mysql binary logs

I'm running the mysql-server-5.0.95-1.el5_7.1.
I'm getting some strange behaviours in the database, and I'd like to store some actions.
Following the MySQL Reference Manual, I enabled the binary log and started logging everything that happened in the database.
But the log files are too big, and I'd like to know if there is a way to configure the binary log to store just some actions (like ALTER TABLE, DELETE, or CREATE INDEX) instead of logging all actions.
If yes, how can I do it?
Rgds.
The point of the binary logs is to record state changes on the server. If it changes state, it gets recorded.

How to recover from Solr deleted index files?

When I delete solr's index files on disk, (found in /solr/data/index and solr/data/spellchecker), solr throws an exception whenever I try to make a request to it:
java.lang.RuntimeException: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.NIOFSDirectory@/…/solr/data/index:
The only way I've found to recover from this is to “seed” the data directory with the index files from elsewhere. It doesn't really matter where it seems. Once I do this, I can run a query to reload the schema and regenerate the index. Is this how this is supposed to work? It seems like there should be a way to tell solr to regenerate those files from scratch. Maybe I'm just mistaken in my assumption that these files are not part of the application itself (kind of implied by the name “data”)?
Solr will throw that exception at startup if the index directory exists but is empty. However if you delete the directory, Solr will create it and the empty segments files at startup.
If you are using sunspot solr on rails, sunspot can reindex all the data from the database into solr. However, solr standalone would not know where to pull the data to reindex. You would need a backup of the data.
The problem may lie with the segments file. If you delete the index files directly, they are physically removed from disk, but Solr may still hold references to them in memory or in its caches. Avoid deleting Solr's index files by hand. Use a delete query to clear the index instead; this updates the index segments properly, and you will not have to restart Solr.
regards
Rajat
FileNotFoundException signals that an attempt to open the file denoted by a specified pathname has failed. So your index is either invalid or corrupted.
NIOFSDirectory class is used for reading and writing index files. The directory is created at the named location if it does not yet exist.
So you should probably:
Delete the index directory or restore data from backups.
Restart the server (or at least reload the config).