Saturday 8 November 2014

ElasticSearch on AWS with AutoScaling Groups and Spot Instances

One of the most powerful features of ElasticSearch is its ability to scale horizontally in many different ways: routing, sharding, and time / pattern based index creation and querying. It is a robust storage solution that can start out small and cheap and then grow and scale as load and volume rise.

Having single-handedly implemented the ElasticSearch cluster where I work, I'll go over a few points of interest. (The design is pretty straightforward for people who have worked with ElasticSearch. If you want to know how to set it up, RTFM.)


As marked in the diagram:
  1. First of all, use a single security group across all your data nodes and master nodes for the purpose of discovery. This "shields" your nodes from other nodes or malicious attempts to join the cluster and is part of your security. Open port 9300 to and from this security group itself (see the discovery sketch after this list).
  2. A split of "core" data nodes and "spot" data nodes. Basically you have a small set of "core" data nodes that guarantee the safety of the data. A set of spot instances is then added to the cluster to boost performance.
    • Set rack_id and cluster.routing.allocation.awareness.attributes!!!
      I don't like stating the obvious, but THIS IS CRITICAL. Set the "core" nodes to use one rack_id and the spot instances another (see the elasticsearch.yml sketch after this list). This forces the replication to store at least 1 complete set of the data on "core" nodes. Also, install the kopf plugin and MAKE SURE THAT IS THE CASE. Spot instances are just that, spot instances. They can all disappear in the blink of an eye.
    • Your shard and replica count directly determines the maximum number of machines you can "Auto Scale" to: each data node needs at least one shard copy to host, so with, say, 5 shards and 1 replica there are 10 shard copies and at most 10 data nodes that can usefully hold them.
    • You can update the instance size specified in the launch configuration and terminate the instances one by one to "vertically" scale the servers, should you run into the horizontal scaling limit imposed by the shard and replica count (see the CLI sketch after this list).
    • This is an incredibly economical setup. Taking r3.2xlarge instances as an example, even a 3-year heavy reservation costs $191 / month, while a spot instance costs around $60 / month. It is the ability to leverage spot instances that makes all managed hosting of ElasticSearch look like daylight robbery.
      For $4k / month, you can easily scale all the way up to 2TB+ of memory, 266+ cores and 10TB of SSD at 30k IOPS (assuming 50% of the $4k monthly fee is spent on spot instances and 25% on SSD). By comparison, you get 60GB of RAM and 780GB of storage from Bonsai.
  3. Set up your master nodes in a separate, dedicated Auto Scaling Group with at least 2 servers in it (see the master node settings after this list). This prevents the cluster from falling apart should any of the master nodes in the Auto Scaling Group get recycled.
    Another reason these dedicated master nodes are necessary is the volatility of the core and spot data nodes: because they sit in Auto Scaling Groups, shard allocation and the cluster state get reshuffled frequently, and you want that coordination handled by nodes that stay put.
    They don't need to be particularly beefy.
  4. Front the master nodes with an Elastic Load Balancer
    • Open port 9200 for HTTP. User queries will then be evenly distributed across the master nodes. You can use a separate ASG for a set of non-data (client) nodes dedicated to query handling if there is a high volume of query traffic. Optionally set up SSL here.
    • Open port 9300 for TCP. Logstash instances can then join the cluster using this endpoint as the "discovery point" (see the logstash sketch after this list). Otherwise you will not have a stable address to put in the logstash ElasticSearch output when using the node protocol.
  5. Configure an S3 bucket to snapshot to and restore from, using the master nodes as the gateway (see the snapshot sketch after this list).
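
To make point 1 concrete, here is a minimal sketch of elasticsearch.yml using the shared security group as the discovery filter. It assumes the cloud-aws plugin is installed; the cluster name, region and group name are placeholders.

    # elasticsearch.yml on every data and master node
    cluster.name: my-es-cluster
    cloud.aws.region: eu-west-1
    # discover peers through the EC2 API instead of multicast
    discovery.type: ec2
    # only consider instances carrying the shared security group
    discovery.ec2.groups: elasticsearch-cluster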
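
For the rack_id bullet in point 2, a sketch of the allocation awareness settings; "core" and "spot" are just labels.

    # elasticsearch.yml on the "core" data nodes
    node.rack_id: core
    cluster.routing.allocation.awareness.attributes: rack_id

    # elasticsearch.yml on the spot data nodes
    node.rack_id: spot
    cluster.routing.allocation.awareness.attributes: rack_id

With two rack_id values and at least 1 replica, ElasticSearch spreads the copies of each shard across both groups, which is what keeps a complete set on the "core" side.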
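
The vertical scaling trick in point 2 boils down to swapping the launch configuration and recycling instances; a sketch with the aws CLI, where every name and ID is a placeholder.

    # create a new launch configuration with a bigger instance type
    aws autoscaling create-launch-configuration \
        --launch-configuration-name es-core-data-r3-xlarge \
        --image-id ami-xxxxxxxx \
        --instance-type r3.xlarge \
        --security-groups sg-xxxxxxxx

    # point the ASG at it, then terminate the old instances one at a
    # time, waiting for the cluster to go green again in between
    aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name es-core-data \
        --launch-configuration-name es-core-data-r3-xlarge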
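
The dedicated masters in point 3 are just nodes that hold no data; a sketch of the relevant settings (the minimum_master_nodes value should be a majority of your master-eligible nodes).

    # elasticsearch.yml on the dedicated master nodes
    node.master: true
    node.data: false

    # elasticsearch.yml on the data nodes (core and spot)
    node.master: false
    node.data: true

    # on every node: require a quorum before electing a master
    discovery.zen.minimum_master_nodes: 2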
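
For the 9300 listener in point 4, a sketch of a Logstash 1.x elasticsearch output joining the cluster through the ELB with the node protocol; the ELB hostname and cluster name are placeholders.

    output {
      elasticsearch {
        host     => "internal-es-masters-1234567890.eu-west-1.elb.amazonaws.com"
        port     => 9300
        protocol => "node"
        cluster  => "my-es-cluster"
      }
    }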
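
And for point 5, registering an S3 snapshot repository through the ELB (again relying on the cloud-aws plugin; the bucket, region and repository names are placeholders).

    # register the repository once
    curl -XPUT 'http://your-elb-hostname:9200/_snapshot/s3_backup' -d '{
      "type": "s3",
      "settings": { "bucket": "my-es-snapshots", "region": "eu-west-1" }
    }'

    # then take snapshots, e.g. from cron
    curl -XPUT 'http://your-elb-hostname:9200/_snapshot/s3_backup/snapshot_1?wait_for_completion=true'
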
The cluster I'm currently running has 2x t2.small for masters, 2x r3.large for "core" nodes and 4x r3.large for spot instances. It holds 200GB of index and 260 million records, and growing, without breaking a sweat. It has a long, long way to go before it hits its scaling limit of 12x r3.8xlarge, which should be good for many terabytes of index and billions of rows. Fingers crossed.

Comments:

  1. John, great info. If you are interested in some well-paid contracting work, please contact me at http://dreamingwell.com/contact. I'm in need of an expert in ElasticSearch on AWS, specifically for planning a large-scale roll out.

  2. Great post! I'm doing something similar, and I'm wondering whether you actually utilize any ASG policy for the spot instance group? Or is it a fixed-size group the same way as the core ASG?

  3. Nice job. Can you tell us what tool you used for designing the architecture picture?

  4. Hi, which tool did you use to make this 3D diagram? I really need it.

  5. I know what you mean; unfortunately I used Illustrator and the free symbol pack. You'll find that if you just Google around a bit.
