- S3 files being re-uploaded with additional data appended.
Logstash processes every file modified after the datetime recorded by its last processing run, so a re-uploaded file is read again in full.
- Local file system files being copy-pasted over with updates from other locations.
Files are handled with descriptors, so they can be renamed or moved without affecting Logstash tailing the file. However, when they are overwritten, they are considered to be a completely different file and all the data in the file will be reprocessed.
- Losing / deleting the sincedb file used to track progress.
- And I'm sure there can be more.
The solution is actually surprisingly simple. Calculate a hash of the event message and use that as the document id for elasticsearch. Here's a sample config:
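input {
  # something. anything.
}
filter {
  # Copy the raw message into a separate field so the original stays intact.
  mutate {
    add_field => ["logstash_checksum", "%{message}"]
  }
  # Replace the copied value with its MD5 hash.
  anonymize {
    fields => ["logstash_checksum"]
    algorithm => "MD5"
    key => "a"
  }
}
output {
  elasticsearch {
    host => "127.0.0.1"
    # Use the hash as the document id, so a reprocessed event overwrites itself.
    document_id => '%{logstash_checksum}'
  }
}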
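Note that this works best with events that already contain their own timestamp, such as web server access logs from IIS or Apache, load balancer logs, etc. It would be a bad idea to apply this technique to stream-based log entries that rely on the timestamp assigned at the time of ingestion by Logstash: that timestamp is not part of the message, so two separate events with identical text would hash to the same id and one would overwrite the other.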
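To see why this removes duplicates, here is a minimal Python sketch of the idea rather than of Logstash itself. A plain dict stands in for the Elasticsearch index, the access log lines are made up, and the helper names `checksum` and `index_events` are hypothetical. It also assumes the hash is a keyed HMAC-MD5 of the message, which may not match byte for byte what the anonymize filter produces.

```python
import hashlib
import hmac


def checksum(message: str, key: str = "a") -> str:
    """Keyed MD5 digest of the raw message (assumed HMAC), used as the document id."""
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.md5).hexdigest()


def index_events(index: dict, messages) -> dict:
    """Dict stand-in for an index: writing an id that already exists overwrites it."""
    for message in messages:
        index[checksum(message)] = {"message": message}
    return index


# Made-up access log lines; each carries its own timestamp.
lines = [
    '10.0.0.1 - - [01/Jan/2015:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2015:10:00:01 +0000] "GET /about HTTP/1.1" 200 734',
]
appended = '10.0.0.3 - - [01/Jan/2015:10:00:02 +0000] "GET /faq HTTP/1.1" 404 0'

index = {}
index_events(index, lines)               # first run
index_events(index, lines + [appended])  # file re-uploaded with one new line
print(len(index))  # 3, not 5: the two old lines were overwritten in place

# The caveat: identical text without its own timestamp collapses into one document.
stream = {}
index_events(stream, ["worker started", "worker started"])
print(len(stream))  # 1
```

The second half is the failure mode described above: without a timestamp inside the message, two genuinely separate events cannot be told apart by their hash.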
This looks perfect, it'll write the message to a document_id that already exists, so you don't get a duplicate entry. One thing: is there a way to check if the document_id exists in the index before writing to the output?
Extremely late reply, but not that I'm aware of.