
OpenSearch on NetApp ONTAP in the cloud


Subbareddy Jangalapalli

OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications. First announced by Amazon Web Services in January 2021 as an open-source fork of Elasticsearch and Kibana, OpenSearch is a powerful tool for building e-commerce, application, document search, machine learning, anomaly detection, and observability solutions. It is licensed under the Apache 2.0 license and is community supported. Although the community is led by AWS, OpenSearch is not limited to running in AWS; it can be run in any cloud or even on premises.

OpenSearch is a distributed database designed to run effectively on local disks. However, many organizations have deployed NetApp® ONTAP® storage in AWS, Azure, or Google Cloud, and with NetApp Cloud Volumes ONTAP, Azure NetApp Files, Google Cloud NetApp Volumes, or on-premises NetApp AFF systems, they can leverage the power of ONTAP data management, storage efficiencies, and speed.

This blog post explains how to deploy the infrastructure necessary to run OpenSearch by using NetApp ONTAP with NFS or iSCSI volumes and how to configure the OpenSearch cluster.

OpenSearch architecture

OpenSearch is a distributed search and analytics engine that can scale to hundreds of nodes. Data is organized into indexes, each of which is a collection of shards that store data in a document database format. A shard is an instance of an Apache Lucene index. Each primary shard has replica shards based on your resilience and I/O needs, but a replication factor of 3 is typical.
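
Once a cluster is running, you can check how shards and replicas are distributed across nodes with the _cat API. A one-line example, assuming the default admin:admin demo credentials used later in this post:

  # List every shard, its type (p = primary, r = replica), size, and host node
  curl -sku admin:admin "https://<node-1_IP>:9200/_cat/shards?v"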

OpenSearch Dashboards (a fork of Kibana) is the user interface to OpenSearch. OpenSearch Benchmark is a performance benchmarking tool that is itself a fork of Elasticsearch Rally.

Sizing and scoping considerations

  • Primary shards are read/write, and replica shards are read only.
    • Adding primary shards can scale writes, but an unbounded number of primary shards will eventually degrade performance.
  • Although it is tempting to pack indexes as full as possible for efficiency, best practice is to keep each shard between 10GB and 50GB in size.
    • An Apache Lucene index has an upper limit of 2,147,483,519 documents.
    • Oversized shards are inefficient and consume more resources, degrading performance. Queries can run in parallel across many shards when the shards are located on different nodes. Decisions about the number of shards must be made before deployment (see the sketch after this list).
    • Maintaining the recommended shard size while using networked storage solves many problems:
      • Data remains accessible when a node is down, unlike data on that node’s local SSD.
      • Scaling becomes much easier when data grows rapidly; just grow the volume nondisruptively.
      • Local SSDs often require rebalancing; shared storage reduces the need for these operations, along with the time, cost, and effort of data transfer and migration.
      • No lock-in with any cloud provider.
      • The data layer is automatically highly available, giving a 99.99% SLA, and is encrypted at rest with no extra configuration required.
      • Deduplication, compression, and tiering reduce the amount of consumed capacity, allowing intelligent placement of storage blocks on efficiently priced storage.
      • Troubleshooting and support are simplified, reducing mean time to resolution.
  • OpenSearch is compute intensive, requiring at least 8 cores per node for production use.
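
Because the primary shard count is fixed at index creation, it has to be set explicitly up front. A minimal sketch, assuming the demo cluster and the default admin:admin credentials used later in this post:

  # Create an index with 3 primary shards, each with 2 replicas (9 shards total),
  # sized so that each shard stays within the recommended 10GB-50GB range
  curl -sku admin:admin -X PUT "https://<node-1_IP>:9200/my-index" \
    -H 'Content-Type: application/json' \
    -d '{"settings": {"number_of_shards": 3, "number_of_replicas": 2}}'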

Infrastructure deployment

For our testing, we deployed a three-node cluster with a Dashboards node behind an optional load balancer, which terminates and load balances the HTTPS connections. Each node was deployed into its own availability zone.

[Diagram: a load balancer terminating HTTPS and distributing connections across the three-node OpenSearch cluster]

AWS, Azure, and Google Cloud each handle networking differently, so provision the network according to your hyperscaler. All have the concept of an availability zone, and for full SLA supportability, all require a zonal deployment. Spanning OpenSearch cluster nodes across regions introduces inter-regional latency into internode communications. Data Prepper or other tools can be used to push data into OpenSearch; they could live in other regions and would need connectivity, including from the edge. This must be a consideration in the design, especially as the OpenSearch cluster evolves over time.

Your organization’s networking configuration requirements might differ from what we tested with. NetApp is happy to help you with any questions you might have.

  1. If you are in Azure, you will need an Azure Resource Group.
  2. Create a network (VNet or VPC) with routing rules and security groups according to your networking and security policies and needs.
    1. If you are in Azure using Azure NetApp Files, you will need a delegated subnet for Azure NetApp Files.
  3. If you need public access, create an external load balancer to terminate and load balance the HTTPS connections. This could be a managed load balancer provided by your cloud provider or a third-party load balancer deployed in your network.
  4. Storage configuration
    1. Deploy a volume for each OpenSearch node to achieve the required performance.
    2. For most NetApp offerings in a cloud provider, including cloud first-party services such as Azure NetApp Files, Amazon FSx for NetApp ONTAP, and Google Cloud NetApp Volumes, performance is coupled to the size of the volume; one way to provision such a volume is sketched after this list.
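
As one illustration of step 4, an NFS volume on Amazon FSx for NetApp ONTAP can be provisioned with the AWS CLI. This is a sketch only; the storage virtual machine ID below is a placeholder for a resource you would have created beforehand:

  # Create a 1TB NFS volume on an existing FSx for ONTAP SVM; on first-party
  # services, the volume size also determines the available throughput
  aws fsx create-volume --volume-type ONTAP --name opensearch_node1 \
    --ontap-configuration 'StorageVirtualMachineId=svm-0123456789abcdef0,JunctionPath=/opensearch_node1,SizeInMegabytes=1048576,StorageEfficiencyEnabled=true'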

There are two main ways to deploy OpenSearch: with a “bare metal” deployment into a virtual machine or by using containers. We used Ubuntu for the operating system, but you’re not limited to Ubuntu. However, some of the following commands will be different if you are using Red Hat.

VM bare metal installation

  1. Deploy a Linux VM into each availability zone in the virtual network you are using.
    Spot Elastigroup Stateful Nodes by NetApp can be used to deploy your instances; this deployment reduced our operating costs by 60%.
  2. Install OpenSearch according to the documentation.
    Because we are running OpenSearch on NFS, there are some differences from the linked installation guide:
  3. Update and prep the operating system and install the NFS client utilities for your Linux distribution:
    sudo apt update && sudo apt upgrade -y && sudo apt install nfs-common -y
  4. Create mount points and mount your NFS volumes by adding the mounts to your fstab file (a sketch follows this list).
  5. Deploy a VM for Dashboards and install OpenSearch Dashboards on it.
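
A minimal sketch of step 4, assuming an ONTAP NFS export at 10.0.1.10:/opensearch_node1 (a placeholder address) and the data path used by the Debian package:

  # Create the mount point and persist the NFS mount across reboots
  sudo mkdir -p /var/lib/opensearch
  echo '10.0.1.10:/opensearch_node1 /var/lib/opensearch nfs rw,hard,vers=4.1 0 0' | sudo tee -a /etc/fstab
  sudo mount -a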

Docker installation

We followed the Docker installation instructions.

We found it easier and more effective to run from a shell script called from cloud-init for each node, making the node.name parameter particular to the node:

  docker run -d --network host \
    -v /data/data1:/usr/share/opensearch/data \
    -e "cluster.name=opensearch-cluster" \
    -e "node.name=docker1" \
    -e "discovery.seed_hosts=<node-1_IP>,<node-2_IP>,<node-3_IP>" \
    -e "cluster.initial_master_nodes=<node-1_IP>" \
    -e "path.data=/usr/share/opensearch/data" \
    opensearchproject/opensearch:latest

This script replaces the default OpenSearch.yml parameters that come with the container. Modify to suit your environment.
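
A sketch of such a per-node script; it derives node.name from the VM hostname so that the same script works unchanged on every node (the hostname convention is an assumption):

  #!/bin/bash
  # start-opensearch.sh - invoked from cloud-init on each node
  # Derive a unique node.name from the VM hostname
  NODE_NAME=$(hostname -s)
  docker run -d --network host \
    -v /data/data1:/usr/share/opensearch/data \
    -e "cluster.name=opensearch-cluster" \
    -e "node.name=${NODE_NAME}" \
    -e "discovery.seed_hosts=<node-1_IP>,<node-2_IP>,<node-3_IP>" \
    -e "cluster.initial_master_nodes=<node-1_IP>" \
    -e "path.data=/usr/share/opensearch/data" \
    opensearchproject/opensearch:latest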

Deploy OpenSearch Dashboards

  docker run -d --network host \
    -e 'OPENSEARCH_HOSTS=["https://<node-1_IP>:9200","https://<node-2_IP>:9200","https://<node-3_IP>:9200"]' \
    -e "OPENSEARCH_SSL_VERIFICATIONMODE=none" \
    -e "OPENSEARCH_USERNAME=admin" \
    -e "OPENSEARCH_PASSWORD=admin" \
    opensearchproject/opensearch-dashboards:latest

Note: The default account and password here are admin:admin. Obviously you wouldn’t use that in production!

We used the default OpenSearch security configuration for demonstration purposes. In production you would use your own certificates and identity management. The security module must be configured for OpenSearch to work.
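
For reference, a hedged sketch of what pointing the security plugin at your own certificates might look like; the certificate file names are placeholders, and paths are relative to the OpenSearch config directory:

  # Point the security plugin's HTTP layer at your own certificates (example paths)
  printf '%s\n' \
    'plugins.security.ssl.http.enabled: true' \
    'plugins.security.ssl.http.pemcert_filepath: node.pem' \
    'plugins.security.ssl.http.pemkey_filepath: node-key.pem' \
    'plugins.security.ssl.http.pemtrustedcas_filepath: root-ca.pem' \
    | sudo tee -a /etc/opensearch/opensearch.yml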

Performance details and results

We benchmarked performance with the opensearch-benchmark tool (v0.1.0) across various platforms and storage options. In our observations, NetApp LUN (iSCSI) and NFS (FSx for ONTAP) storage volumes delivered better performance than the alternatives, as shown in the following table.
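
For reproducibility, a typical invocation against an already-running cluster looks roughly like this (the subcommand name varies slightly across tool versions, and the workload is illustrative):

  # Benchmark the existing cluster without provisioning one (benchmark-only pipeline)
  opensearch-benchmark execute-test \
    --target-hosts=https://<node-1_IP>:9200 \
    --pipeline=benchmark-only \
    --workload=nyc_taxis \
    --client-options="use_ssl:true,verify_certs:false,basic_auth_user:'admin',basic_auth_password:'admin'"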

[Table: ONTAP storage benchmark data]

Demo

View the OpenSearch Demo

Subbareddy Jangalapalli

With 20+ years of experience, Subbareddy has played various roles at Walmart, Capital One, Walgreens, and Pfizer through Accenture and Cognizant. He currently works as a Cloud Solutions Architect at NetApp in open source, multi-cloud, and data engineering/architecture roles, evaluating and benchmarking open-source technologies for various use cases with ONTAP volumes versus the respective cloud-native volumes, and helping customers and stakeholders.

View all Posts by Subbareddy Jangalapalli
