Having single-handedly implemented the ElasticSearch cluster where I work, I'll go over a few points of interest. (The design is pretty straightforward for anyone who has worked with ElasticSearch. If you want to know how to set it up, RTFM.)
- First of all, you want to use a single security group across all your data nodes and master nodes for the purpose of discovery. This shields your nodes from rogue or malicious nodes trying to join the cluster and is part of your security. Open port 9300 to and from this security group itself.
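A minimal boto3 sketch of the self-referencing rule (the group ID and region are placeholders, not my actual values; only the ingress side is shown, since the default egress rules already allow the outbound leg):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

ES_SG = "sg-0123456789abcdef0"  # placeholder: the one security group shared by all ES nodes

# Allow transport traffic (9300) only from members of the same security group,
# so only instances carrying this group can discover and join the cluster.
ec2.authorize_security_group_ingress(
    GroupId=ES_SG,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 9300,
        "ToPort": 9300,
        "UserIdGroupPairs": [{"GroupId": ES_SG}],
    }],
)
```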
- A split of "core" data nodes and "spot" data nodes. Basically, you keep a small set of "core" data nodes that guarantee the safety of the data, and a set of spot instances is then added to the cluster to boost performance.
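A rough sketch of how the spot data node group might be created with boto3 (names, AMI, subnets, and the bid price are placeholders, not the exact values I used):

```python
import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")  # assumed region

# Launch configuration for the spot data nodes: same AMI and security group as the
# core nodes, but with a spot bid so the instances come out of the spot market.
asg.create_launch_configuration(
    LaunchConfigurationName="es-spot-data-v1",      # placeholder name
    ImageId="ami-0123456789abcdef0",                # placeholder AMI with ES installed
    InstanceType="r3.2xlarge",
    SecurityGroups=["sg-0123456789abcdef0"],        # the shared ES security group
    SpotPrice="0.20",                               # placeholder bid, USD per hour
)

# The spot group only boosts capacity; the core group is what keeps the data safe.
asg.create_auto_scaling_group(
    AutoScalingGroupName="es-spot-data",
    LaunchConfigurationName="es-spot-data-v1",
    MinSize=0,
    MaxSize=12,
    DesiredCapacity=4,
    VPCZoneIdentifier="subnet-11111111,subnet-22222222",  # placeholder subnets
)
```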
- Set rack_id and cluster.routing.allocation.awareness.attributes!!!
I don't like stating the obvious, but THIS IS CRITICAL. Give all the "core" nodes one rack_id and the spot instances another. Shard allocation awareness will then force the replicas to be spread so that at least one complete copy of the data lives on the "core" nodes. Also, install the kopf plugin and MAKE SURE that is actually the case. Spot instances are just that, spot instances: they can all disappear in the blink of an eye.
- Your shard and replica count directly caps the maximum number of machines you can "Auto Scale" to. Self explanatory.
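To make the rack_id point concrete, here is a sketch of what ends up in each node's elasticsearch.yml, written as a snippet you might run from the instance's user data at boot (the path and attribute values are assumptions, not my exact setup):

```python
ES_YML = "/etc/elasticsearch/elasticsearch.yml"  # assumed install path

def tag_node(is_spot: bool) -> None:
    """Append the allocation-awareness settings before ElasticSearch starts."""
    rack = "spot" if is_spot else "core"
    with open(ES_YML, "a") as f:
        # Every node in the same group reports the same rack_id value.
        f.write("node.rack_id: %s\n" % rack)
        # Make the allocator rack-aware: replicas of a shard are spread across
        # different rack_id values, so at least one full copy stays on "core".
        f.write("cluster.routing.allocation.awareness.attributes: rack_id\n")

tag_node(is_spot=True)  # e.g. on a spot data node
```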
- You can update the instance size specified in the launch configuration and terminate the instances one by one to "vertically" scale the servers, should you run into the horizontal scaling limit imposed by your shard and replica count.
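Launch configurations are immutable, so "updating the instance size" really means creating a new launch configuration, pointing the ASG at it, and then recycling instances one at a time. Something along these lines (all names are placeholders):

```python
import boto3

asg = boto3.client("autoscaling", region_name="us-east-1")  # assumed region

# 1. New launch configuration, identical except for a bigger instance type.
asg.create_launch_configuration(
    LaunchConfigurationName="es-core-data-v2",       # placeholder name
    ImageId="ami-0123456789abcdef0",                 # same AMI as before (placeholder)
    InstanceType="r3.2xlarge",                       # was r3.large
    SecurityGroups=["sg-0123456789abcdef0"],
)

# 2. Point the group at it; only NEW instances pick up the larger size.
asg.update_auto_scaling_group(
    AutoScalingGroupName="es-core-data",
    LaunchConfigurationName="es-core-data-v2",
)

# 3. Terminate instances one by one, waiting for the cluster to go green in
#    between, so replicas are rebuilt before the next node is replaced.
asg.terminate_instance_in_auto_scaling_group(
    InstanceId="i-0123456789abcdef0",                # placeholder instance
    ShouldDecrementDesiredCapacity=False,            # ASG replaces it with the bigger type
)
```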
- This is an incredibly economical setup. Taking r3.2xlarge instances as an example, even a 3-year heavy-utilization reserved instance costs $191/month, while a spot instance costs you around $60/month. It is the ability to leverage spot instances that makes all managed ElasticSearch hosting look like daylight robbery.
For $4k/month, you can easily scale all the way up to 2TB+ of memory, 266+ cores, and 10TB of SSD at 30k IOPS (assuming 50% of the $4k monthly fee is spent on spot instances and 25% on SSD; that's roughly 33 r3.2xlarge spot instances at ~$60/month each, with 61GB of RAM and 8 vCPUs apiece). For comparison, you get 60GB of RAM and 780GB of storage from Bonsai.
- Set up your master nodes in a dedicated, separate Auto Scaling Group with at least 2 servers in it. This prevents the cluster from falling apart should any master node in that Auto Scaling Group get recycled.
Dedicated master nodes matter all the more here because the core and spot data nodes are volatile (they sit in Auto Scaling Groups), so shard and cluster state get reshuffled frequently, and you want stable nodes holding that state. The masters don't need to be particularly beefy.
- Front the master nodes with an Elastic Load Balancer (a sketch of the ELB setup follows this list):
- Open port 9200 for HTTP. User queries will then be evenly distributed across the master nodes. If there is a high volume of query traffic, you can use a separate ASG for a set of non-data nodes dedicated to serving queries. Optionally set up SSL here.
- Open port 9300 for TCP. Logstash instances can then join the cluster using this endpoint as the "discovery point". Otherwise you will not have a stable address to set in the logstash elasticsearch output when using the node protocol.
- Configure an S3 bucket to store snapshots in and restore from, using the master nodes as the gateway.
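For the load balancer mentioned above, a classic ELB with the two listeners described in the list would look roughly like this (name, subnet, and security group are placeholders):

```python
import boto3

elb = boto3.client("elb", region_name="us-east-1")  # assumed region

elb.create_load_balancer(
    LoadBalancerName="es-master-elb",                 # placeholder name
    Listeners=[
        # 9200: HTTP queries, spread evenly across the master (or client) nodes.
        {"Protocol": "HTTP", "LoadBalancerPort": 9200,
         "InstanceProtocol": "HTTP", "InstancePort": 9200},
        # 9300: raw TCP transport, used by logstash as its discovery point.
        {"Protocol": "TCP", "LoadBalancerPort": 9300,
         "InstanceProtocol": "TCP", "InstancePort": 9300},
    ],
    Subnets=["subnet-11111111"],                      # placeholder subnet
    SecurityGroups=["sg-0123456789abcdef0"],          # the shared ES security group
    Scheme="internal",                                # keep the cluster off the public internet
)
```

The master ASG then attaches to this ELB, so recycled masters register themselves automatically.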
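Registering the S3 bucket as a snapshot repository is then a single call against the cluster through that endpoint. This sketch assumes the AWS cloud plugin is installed on the nodes; the endpoint, repository, and bucket names are placeholders:

```python
import requests

ELB_ENDPOINT = "http://es-master-elb.internal:9200"   # placeholder ELB DNS name

# Register an S3-backed snapshot repository via the master nodes.
requests.put(
    ELB_ENDPOINT + "/_snapshot/s3_backup",
    json={
        "type": "s3",
        "settings": {
            "bucket": "my-es-snapshots",               # placeholder bucket
            "region": "us-east-1",
        },
    },
).raise_for_status()

# Take a snapshot of everything currently in the cluster.
requests.put(ELB_ENDPOINT + "/_snapshot/s3_backup/snapshot_1").raise_for_status()
```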
The cluster I'm currently running has 2x t2.small masters, 2x r3.large "core" nodes, and 4x r3.large spot instances. It manages to hold 200GB of index and 260 million records, and growing, without breaking a sweat. It has a long, long way to go before it hits its scaling limit of 12x r3.8xlarge, which should be good for upwards of many terabytes of index and billions of rows. Fingers crossed.