AWS Elasticsearch Access with Serverless Lambda
Elasticsearch is a well-known search solution, and AWS offers a fully managed service for it. The managed service exposes largely the same REST API as a self-managed cluster, which is great because you can use the existing tooling as is.
There are a couple of options when it comes to deploying a business service backed by Elasticsearch. For example, you can:
- Expose the ES endpoint directly so that clients can call the ES REST API.
- Build a business-centric API around ES and make it the only way to consume the ES service.
Which approach is better depends on your needs. Additionally, AWS ES offers Kibana to visualize your search data. To protect Kibana from unauthorized access, you can integrate it with Cognito. Once you do, only authenticated users from your Cognito User Pool can access the Kibana dashboard.
In my case, I wanted to deploy an ES domain on AWS with lambda and API Gateway in front, using the Serverless Framework to manage my lambda functions. I also wanted to secure access to Kibana using Cognito User Pools. When I first started, I faced a lot of challenges setting up my ES domain with proper access policies; merely defining an Identity Based Policy did not work. There is a great article by Jon Handler, "Getting started with Elasticsearch and Cognito", that covers a lot of this in great detail. However, given my limited knowledge at the time, some pieces were still missing. This post outlines some of the concepts that may help you understand what makes Elasticsearch access control different from the IAM policies you are used to defining in your serverless.yml.
Resource vs Identity Based Policy
This is, I think, one of the most important concepts to understand before you configure an Elasticsearch access policy.
A Resource Based Policy is attached to the resource you are trying to secure. That means you have to know up front who can access the resource. The "who" can mean many different things depending on what you want to do; for example, you can allow every identity under one AWS account ID to access ES.
An Identity Based Policy is attached to the entity accessing your resource, for example an indexing lambda function that inserts data into Elasticsearch.
The main difference between the two is that a Resource Based Policy requires you to plan ahead, because it has to name who is allowed in before any request is made.
One of the many ways to do this is to create an IAM Role, assign it to your lambda, and grant that role access in the ES Access Policy. The policy's Principal key holds the role's ARN, which tells Elasticsearch specifically to allow all access to everyone who is assigned this role.
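To make that concrete, here is a minimal TypeScript sketch that builds such a role-scoped access policy as an object you could serialize to JSON and paste into the domain's access policy. The region, account ID (`123456789012`), domain name (`my-search-domain`) and role name (`my-indexer-role`) are hypothetical placeholders, and `buildRoleAccessPolicy` is an illustrative helper, not part of any AWS library.

```typescript
// Shape of a resource-based policy statement: unlike an identity-based
// statement, it carries a Principal saying WHO the policy applies to.
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Principal: { AWS: string | string[] };
  Action: string | string[];
  Resource: string | string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: grants one IAM role full access to one ES domain.
function buildRoleAccessPolicy(
  region: string,
  accountId: string,
  domainName: string,
  roleName: string,
): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // Only callers running as this role are allowed in.
        Principal: { AWS: `arn:aws:iam::${accountId}:role/${roleName}` },
        // "Allow all access", as described above.
        Action: "es:*",
        // The trailing "/*" covers every index and API path in the domain.
        Resource: `arn:aws:es:${region}:${accountId}:domain/${domainName}/*`,
      },
    ],
  };
}

// Placeholder values; replace with your own.
const policy = buildRoleAccessPolicy("us-east-1", "123456789012", "my-search-domain", "my-indexer-role");
console.log(JSON.stringify(policy, null, 2));
```

`es:*` mirrors the "allow all access" wording; it could be narrowed to specific `es:ESHttp*` actions if the role only needs to read.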
Initial plan
In my first iteration, I planned to:
- Create two separate roles (one for read and the other for read and write). The actions in the roles don't matter (I think), because ES ignores them anyway.
- Add these additional roles to my serverless.yml and attach them to functions.
But I couldn’t find a way to keep the default role that Serverless creates and also attach this additional role, so that plan failed.
Second attempt
Then I created an access policy in Elasticsearch which allowed access to everything under my account.
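Compared with the role-scoped policy described earlier, an account-wide policy differs only in its `Principal`, which names the account root instead of a single role. A minimal TypeScript sketch, with a hypothetical account ID, region and domain name; `buildAccountAccessPolicy` is an illustrative name:

```typescript
// Hypothetical helper: resource-based policy that trusts the whole AWS account.
function buildAccountAccessPolicy(region: string, accountId: string, domainName: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // ":root" delegates the decision to IAM: an identity in this account
        // gets in only if its own identity-based policy also allows the action.
        Principal: { AWS: `arn:aws:iam::${accountId}:root` },
        Action: "es:*",
        Resource: `arn:aws:es:${region}:${accountId}:domain/${domainName}/*`,
      },
    ],
  };
}

// Placeholder values; replace with your own.
const accountPolicy = buildAccountAccessPolicy("us-east-1", "123456789012", "my-search-domain");
console.log(JSON.stringify(accountPolicy, null, 2));
```

That delegation to IAM is why a policy like this is not enough on its own: the caller still needs an identity-based grant.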
But this was still not enough: I kept getting an error that the lambda was not authorized to perform es:ESHttpGet.
Making it work with Serverless
I finally found why my second attempt failed in this section of the Identity and Access Management documentation for Elasticsearch. Essentially, if your Resource Based Policy allows access to everything under the account, you must also create an Identity Based Policy for the entity accessing the service. That means, in addition to my second attempt, I also had to add a statement under iamRoleStatements in my serverless.yml that grants the lambda's role the Elasticsearch HTTP actions (such as es:ESHttpGet) on the domain.
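A minimal sketch of such a statement, written in TypeScript because Serverless also accepts a `serverless.ts` config whose keys match `serverless.yml` one-to-one, so the same `Effect`/`Action`/`Resource` structure applies in YAML. The action list, region, account ID and domain name are assumptions for illustration, and `esIamRoleStatements` is a hypothetical helper. Newer Serverless versions expect this list under `provider.iam.role.statements` rather than `provider.iamRoleStatements`.

```typescript
// Identity-based statement: note there is no Principal, because it is
// attached to the lambda's own role.
interface IamRoleStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

// Hypothetical helper producing the iamRoleStatements entry for one domain.
function esIamRoleStatements(region: string, accountId: string, domainName: string): IamRoleStatement[] {
  const domainArn = `arn:aws:es:${region}:${accountId}:domain/${domainName}`;
  return [
    {
      Effect: "Allow",
      // HTTP-verb actions; keep only the verbs your lambda actually sends.
      Action: ["es:ESHttpGet", "es:ESHttpHead", "es:ESHttpPost", "es:ESHttpPut", "es:ESHttpDelete"],
      // The domain itself and every path under it.
      Resource: [domainArn, `${domainArn}/*`],
    },
  ];
}

// Fragment of a serverless.ts-style config; only the relevant keys are shown.
const serverlessConfigFragment = {
  provider: {
    name: "aws",
    iamRoleStatements: esIamRoleStatements("us-east-1", "123456789012", "my-search-domain"),
  },
};

console.log(JSON.stringify(serverlessConfigFragment, null, 2));
```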
Signing the requests
A client (a lambda in this case) talks to Elasticsearch via its REST API. That means the requests are plain HTTP calls to the domain endpoint, not AWS API calls that the SDK authenticates for you (like invoking a lambda by its ARN). So how does ES tell a request coming from your lambda apart from a request from some other source, when both are just HTTP?
Well, once you allow your lambda to access the Elasticsearch domain, you must also sign its HTTP requests with AWS Signature Version 4 (SigV4), using the lambda role's credentials. Otherwise the request appears to come from an unauthorized, anonymous user. To sign requests in NodeJS, check out the following two libraries:
1. aws-elasticsearch-connector
2. aws4
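Both libraries implement SigV4 for you. To show what they do under the hood, here is a simplified, hand-rolled TypeScript sketch that uses only `node:crypto`: it builds the canonical request, the string to sign, derives the signing key, and adds the `Authorization` header. It assumes a path that needs no URI encoding and no query string, and `signEsRequest` is a hypothetical helper; for real code, prefer one of the libraries above.

```typescript
import { createHash, createHmac } from "node:crypto";

interface Credentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string; // set when using temporary credentials, as in Lambda
}

interface SignedRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string;
}

const sha256Hex = (data: string): string => createHash("sha256").update(data, "utf8").digest("hex");
const hmac = (key: Buffer | string, data: string): Buffer => createHmac("sha256", key).update(data, "utf8").digest();

// Simplified SigV4 for the "es" service: no query string, path used verbatim.
function signEsRequest(
  method: string,
  host: string,
  path: string,
  body: string,
  region: string,
  creds: Credentials,
  now: Date = new Date(),
): SignedRequest {
  const service = "es";
  // "2020-01-01T12:00:00.000Z" -> "20200101T120000Z"
  const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, "");
  const dateStamp = amzDate.slice(0, 8);

  const headers: Record<string, string> = {
    host,
    "x-amz-date": amzDate,
    "content-type": "application/json",
  };
  if (creds.sessionToken) headers["x-amz-security-token"] = creds.sessionToken;

  // 1. Canonical request: method, path, empty query, sorted headers, payload hash.
  const names = Object.keys(headers).sort();
  const canonicalHeaders = names.map((h) => `${h}:${headers[h].trim()}\n`).join("");
  const signedHeaders = names.join(";");
  const canonicalRequest = [method, path, "", canonicalHeaders, signedHeaders, sha256Hex(body)].join("\n");

  // 2. String to sign, scoped to date/region/service.
  const scope = `${dateStamp}/${region}/${service}/aws4_request`;
  const stringToSign = ["AWS4-HMAC-SHA256", amzDate, scope, sha256Hex(canonicalRequest)].join("\n");

  // 3. Derive the signing key from the secret, then sign.
  const kDate = hmac(`AWS4${creds.secretAccessKey}`, dateStamp);
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  const kSigning = hmac(kService, "aws4_request");
  const signature = createHmac("sha256", kSigning).update(stringToSign, "utf8").digest("hex");

  headers["authorization"] =
    `AWS4-HMAC-SHA256 Credential=${creds.accessKeyId}/${scope}, SignedHeaders=${signedHeaders}, Signature=${signature}`;
  return { method, url: `https://${host}${path}`, headers, body };
}
```

Inside Lambda, the execution role's temporary credentials are exposed through the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` environment variables. Because they are temporary, the session token has to be sent (and signed) as `x-amz-security-token`; the returned headers can then be passed to any HTTP client.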
Configuring Cognito
The "Getting started with Elasticsearch and Cognito" article mentioned before has a step-by-step guide to configuring Cognito for Kibana. However, I found some things that didn't work for me:
- I didn’t see the need to check “Enable access to unauthenticated identities”. Since I don’t fully understand Cognito Identity Pools yet, enabling it seemed dangerous. The permissions you assign to the Unauth_Role don't matter to ES anyway, because the ES access policy supersedes them. This option is only needed (I think) if you want to create a Kibana dashboard that is accessible without authentication.
- If you follow the steps in “Change the Principal to the ARN for the assumed Auth role.”, only Identity Pool users will have access to Kibana. You can always add additional roles there, but then you may run into the same problem as in my first attempt: how to assign that extra role to your lambda.
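If you do add another role, the `Principal` can hold a list of role ARNs. This TypeScript sketch admits a hypothetical Cognito authenticated role (`Cognito_MyPoolAuth_Role`) alongside a hypothetical lambda role (`my-indexer-role`); the region, account ID, domain and `buildMultiRolePolicy` helper are placeholders as well.

```typescript
// Hypothetical helper: one resource-based policy that admits several roles.
function buildMultiRolePolicy(region: string, accountId: string, domainName: string, roleNames: string[]) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // Principal.AWS accepts an array; each entry is one role ARN in this account.
        Principal: { AWS: roleNames.map((name) => `arn:aws:iam::${accountId}:role/${name}`) },
        Action: "es:*",
        Resource: `arn:aws:es:${region}:${accountId}:domain/${domainName}/*`,
      },
    ],
  };
}

const kibanaPolicy = buildMultiRolePolicy("us-east-1", "123456789012", "my-search-domain", [
  "Cognito_MyPoolAuth_Role", // hypothetical Identity Pool authenticated role (Kibana users)
  "my-indexer-role", // hypothetical lambda execution role
]);
console.log(JSON.stringify(kibanaPolicy, null, 2));
```

The catch described above still applies: this only helps if your lambda actually runs as the listed role.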
Summary
- Create an access policy with a Principal value which allows everything under your account to access Elasticsearch.
- Grant your lambda the necessary ES permissions (an Identity Based Policy) under iamRoleStatements in serverless.yml.
- Sign your requests with AWS Signature Version 4.