OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications. First announced by Amazon Web Services in January 2021 as an open-source fork of Elasticsearch and Kibana, OpenSearch is a powerful tool for building e-commerce, application, document search, machine learning, anomaly detection, and observability solutions. It is licensed under the Apache 2.0 license and is community supported. Although the community is led by AWS, OpenSearch is not limited to running in AWS; it can be run in any cloud or even on premises.
OpenSearch is a distributed database designed to run effectively on local disks. However, many organizations have deployed NetApp® ONTAP® storage in AWS, Azure, or Google Cloud. Amazon FSx for NetApp ONTAP, NetApp Cloud Volumes ONTAP, Azure NetApp Files, Google Cloud NetApp Volumes, and on-premises NetApp AFF systems all let an OpenSearch deployment take advantage of ONTAP data management, storage efficiencies, and performance.
This blog post explains how to deploy the infrastructure necessary to run OpenSearch by using NetApp ONTAP with NFS or iSCSI volumes and how to configure the OpenSearch cluster.
OpenSearch is a distributed search and analytics engine that can scale to hundreds of nodes. Data is organized into indexes, and each index is divided into shards that hold JSON documents. Each shard is an instance of an Apache Lucene index. Each primary shard can have replica shards; how many depends on your resilience and I/O needs, and a replication factor of 3 is typical.
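Shard and replica counts are set per index when the index is created. The following is a minimal sketch, not taken from the post, that builds the index-creation request for the OpenSearch REST API. It assumes that "replication factor of 3" means three copies of each shard (one primary plus two replicas, so `number_of_replicas` is 2), and the endpoint, index name, and `admin:admin` demo credentials are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: create an index with explicit primary-shard and replica counts.
# Assumptions (not from the post): the endpoint, index name, and the demo
# admin:admin credentials, which can be overridden with OS_AUTH.

# Print the JSON body for a new index.
#   $1 = number of primary shards
#   $2 = number of replica shards per primary
index_settings_json() {
  printf '{"settings":{"index":{"number_of_shards":%d,"number_of_replicas":%d}}}' \
    "$1" "$2"
}

# Create the index with a PUT request. To stay safe by default, this only
# prints the curl command; set DRY_RUN=0 to actually send the request.
#   $1 = endpoint (for example https://<node-1_IP>:9200)
#   $2 = index name, $3 = primary shards, $4 = replicas per primary
create_index() {
  local body
  body="$(index_settings_json "$3" "$4")"
  # -k accepts the self-signed certificates of the demo security setup.
  local cmd=(curl -sk -u "${OS_AUTH:-admin:admin}" -X PUT "$1/$2"
    -H "Content-Type: application/json" -d "$body")
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "${cmd[@]}"
  else
    printf '%q ' "${cmd[@]}"
    echo
  fi
}
```

For example, `DRY_RUN=0 create_index https://<node-1_IP>:9200 my-index 3 2` would create a hypothetical index `my-index` with three primaries and two replicas each; on a three-node cluster, every node then holds one copy of every shard.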
OpenSearch Dashboards (a fork of Kibana) is the user interface to OpenSearch. OpenSearch Benchmark is a performance benchmarking tool that is itself a fork of Elasticsearch Rally.
Sizing and scoping considerations
For our testing, we deployed a three-node cluster and a Dashboards node behind an optional load balancer, which terminates and load balances the HTTPS connections. Each cluster node was deployed into its own availability zone.
AWS, Azure, and Google Cloud each handle networking differently, so you need to provision the network according to your hyperscaler's model. All three have the concept of an availability zone, and all require deploying across multiple zones for full SLA support. Spanning OpenSearch cluster nodes across regions adds inter-regional latency to internode communication. Data Prepper or other tools can be used to push data into OpenSearch; those clients might run in other regions, or at the edge, and would need network connectivity to the cluster. Account for this in your design, especially as the OpenSearch cluster evolves over time.
Your organization's networking requirements might differ from the configuration we tested. NetApp is happy to help with any questions you might have.
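With nodes spread across availability zones, OpenSearch's shard allocation awareness can keep a primary shard and its replicas in different zones, so that losing one zone does not lose every copy of a shard. The post does not say whether it used this feature; the sketch below is an assumption-labeled illustration. The attribute name `zone` and the zone values are hypothetical, and the settings are passed as `-e` variables, which the OpenSearch container converts into node settings.

```shell
#!/usr/bin/env bash
# Sketch: shard allocation awareness by availability zone.
# The attribute name "zone" and the zone values are illustrative; each node
# gets its own zone value, and all nodes share the awareness setting.

# Print extra docker run arguments for one node, one argument per line,
# so callers can load them safely with mapfile.
#   $1 = availability zone of this node (for example us-east-1a)
zone_awareness_args() {
  printf '%s\n' \
    -e "node.attr.zone=$1" \
    -e "cluster.routing.allocation.awareness.attributes=zone"
}

# Succeed only if no two nodes share a zone (one node per zone).
zones_are_distinct() {
  local unique
  unique="$(printf '%s\n' "$@" | sort -u | wc -l)"
  (( unique == $# ))
}
```

A node's launch script could load the arguments with `mapfile -t zone_args < <(zone_awareness_args us-east-1a)` and pass `"${zone_args[@]}"` to `docker run`. Emitting one argument per line avoids relying on word splitting.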
There are two main ways to deploy OpenSearch: with a “bare metal” deployment into a virtual machine or by using containers. We used Ubuntu for the operating system, but you’re not limited to Ubuntu. However, some of the following commands will be different if you are using Red Hat.
We followed the Docker installation instructions.
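The post does not list its host preparation steps. One prerequisite that the OpenSearch documentation calls out for Docker hosts is the Linux kernel setting `vm.max_map_count`, which should be at least 262144; a multi-node cluster that binds to non-loopback addresses, as with `--network host`, enforces this at startup. The following sketch checks the value; the file name under `/etc/sysctl.d` is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Sketch: verify the kernel memory-map limit that OpenSearch needs.
# OpenSearch documents a minimum of 262144 for vm.max_map_count.

MIN_MAP_COUNT=262144

# Succeed if the given (or current) vm.max_map_count meets the minimum.
#   $1 (optional) = value to check; defaults to the host's current value.
map_count_ok() {
  local current="${1:-$(cat /proc/sys/vm/max_map_count)}"
  (( current >= MIN_MAP_COUNT ))
}

# Print commands that raise the limit now and persist it across reboots.
# The file name 99-opensearch.conf is an illustrative choice.
print_map_count_fix() {
  printf 'sudo sysctl -w vm.max_map_count=%d\n' "$MIN_MAP_COUNT"
  printf "echo 'vm.max_map_count=%d' | sudo tee /etc/sysctl.d/99-opensearch.conf\n" \
    "$MIN_MAP_COUNT"
}
```

Running `map_count_ok || print_map_count_fix` on each node before starting the container shows whether any change is needed.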
We found it easier and more effective to start the container from a shell script that cloud-init calls on each node, setting the node.name parameter to a value unique to that node:
docker run -d --network host \
  -v /data/data1:/usr/share/opensearch/data \
  -e "cluster.name=opensearch-cluster" \
  -e "node.name=docker1" \
  -e "discovery.seed_hosts=<node-1_IP>,<node-2_IP>,<node-3_IP>" \
  -e "cluster.initial_master_nodes=<node-1_IP>" \
  -e "path.data=/usr/share/opensearch/data" \
  opensearchproject/opensearch:latest
These -e settings override the default opensearch.yml parameters that ship with the container. Modify them to suit your environment.
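The post does not publish its cloud-init script. A minimal sketch of what such a per-node script might look like follows; the script name, argument order, and `docker${n}` naming scheme are hypothetical. It derives `node.name` from a node number and builds `discovery.seed_hosts` from the list of node IPs. On OpenSearch 2.x, `cluster.initial_cluster_manager_nodes` is the newer name for `cluster.initial_master_nodes`; the sketch keeps the setting used in the post.

```shell
#!/usr/bin/env bash
# Sketch of a per-node launch script that cloud-init could call, e.g.:
#   start-opensearch.sh 2 <node-1_IP> <node-2_IP> <node-3_IP>
# The script name and argument order are hypothetical. By default it only
# prints the docker command; set DRY_RUN=0 to start the container.

# Print the docker run arguments for node number $1, one per line.
# Remaining arguments are the IP addresses of all cluster nodes.
build_node_args() {
  local n="$1"; shift
  local ips=("$@") seeds
  seeds="$(IFS=,; echo "${ips[*]}")"
  printf '%s\n' docker run -d --network host \
    -v /data/data1:/usr/share/opensearch/data \
    -e "cluster.name=opensearch-cluster" \
    -e "node.name=docker${n}" \
    -e "discovery.seed_hosts=${seeds}" \
    -e "cluster.initial_master_nodes=${ips[0]}" \
    -e "path.data=/usr/share/opensearch/data" \
    opensearchproject/opensearch:latest
}

main() {
  local n="$1"
  # The node number must select one of the supplied IP addresses.
  if (( $# < 2 || n < 1 || n > $# - 1 )); then
    echo "usage: $0 NODE_NUMBER NODE_IP..." >&2
    return 1
  fi
  local args
  mapfile -t args < <(build_node_args "$@")
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "${args[@]}"
  else
    echo "${args[*]}"
  fi
}

# Run only when executed directly with arguments (not when sourced).
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  main "$@"
fi
```

Because every node receives the same IP list, only the node number changes from node to node, which keeps the cloud-init user data nearly identical across the cluster.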
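To run OpenSearch Dashboards, we started its container with the following command. The Dashboards image reads its settings from uppercase environment variables such as OPENSEARCH_HOSTS:
docker run -d --network host \
  -e 'OPENSEARCH_HOSTS=["https://<node-1_IP>:9200","https://<node-2_IP>:9200","https://<node-3_IP>:9200"]' \
  -e "OPENSEARCH_SSL_VERIFICATIONMODE=none" \
  -e "OPENSEARCH_USERNAME=admin" \
  -e "OPENSEARCH_PASSWORD=admin" \
  opensearchproject/opensearch-dashboards:latest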
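Note: The default account and password here are admin:admin. Obviously you wouldn't use that in production! Starting with OpenSearch 2.12, the demo security configuration no longer defaults to admin:admin; instead, you must set a strong initial admin password through the OPENSEARCH_INITIAL_ADMIN_PASSWORD environment variable.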
We used the default OpenSearch security configuration for demonstration purposes. In production, use your own certificates and identity management. The security plugin must be configured for OpenSearch to work.
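After the containers start, a quick health check confirms that all three nodes joined one cluster. The sketch below calls the `_cluster/health` REST endpoint and reads its `status` and `number_of_nodes` fields. The endpoint, the expected node count of 3, and the demo credentials are assumptions, and `-k` (skip certificate verification) is appropriate only with the self-signed demo certificates.

```shell
#!/usr/bin/env bash
# Sketch: confirm the cluster formed and is healthy.
# Endpoint and credentials are assumptions; override OS_AUTH as needed.

# Extract "status" (green, yellow, or red) from a _cluster/health
# JSON response read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Extract "number_of_nodes" from the same response.
health_node_count() {
  sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p'
}

# Succeed only if the cluster is green with the expected number of nodes.
#   $1 = endpoint (for example https://<node-1_IP>:9200)
#   $2 = expected node count (default 3)
check_cluster() {
  local endpoint="$1" expected="${2:-3}" json status nodes
  json="$(curl -sk -u "${OS_AUTH:-admin:admin}" "${endpoint}/_cluster/health")" || return 1
  status="$(health_status <<<"$json")"
  nodes="$(health_node_count <<<"$json")"
  echo "status=${status} nodes=${nodes}"
  [[ "$status" == "green" && "$nodes" == "$expected" ]]
}
```

A `yellow` status usually means some replica shards are unassigned, for example while a node is still joining, so it is worth re-running the check after a short wait before investigating.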
We benchmarked performance with OpenSearch Benchmark (opensearch-benchmark v0.1.0) across various platforms and storage options. In our observations, NetApp iSCSI LUNs and NFS volumes (FSx for ONTAP) gave better performance than the other storage options we tested, as shown in the following table.
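The post does not give the exact benchmark command line. The following hypothetical sketch shows how OpenSearch Benchmark can target an existing cluster without provisioning one (`--pipeline=benchmark-only`). The `geonames` workload, host list, and demo credentials are assumptions, and the subcommand defaults to `execute-test`, the spelling used by recent releases; older releases such as v0.1.0 may differ, so confirm with `opensearch-benchmark --help` and override `OSB_SUBCOMMAND` if needed.

```shell
#!/usr/bin/env bash
# Sketch: benchmark an already-running cluster with OpenSearch Benchmark.
# Assumptions: workload, hosts, and credentials are examples; the
# subcommand can be overridden with OSB_SUBCOMMAND for other versions.

# Print the opensearch-benchmark arguments, one per line.
#   $1 = comma-separated host:port list, $2 = workload (default: geonames)
build_benchmark_args() {
  printf '%s\n' opensearch-benchmark "${OSB_SUBCOMMAND:-execute-test}" \
    "--workload=${2:-geonames}" \
    "--target-hosts=$1" \
    "--pipeline=benchmark-only" \
    "--client-options=use_ssl:true,verify_certs:false,basic_auth_user:'admin',basic_auth_password:'admin'"
}

# Print the command by default, or run it when DRY_RUN=0.
run_benchmark() {
  local args
  mapfile -t args < <(build_benchmark_args "$@")
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "${args[@]}"
  else
    echo "${args[*]}"
  fi
}
```

Keeping the workload and client options fixed while swapping only the storage behind `/data/data1` is what makes results from different storage options comparable.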
Demo
View the OpenSearch Demo
Has more than 20 years of experience in various roles at Walmart, Capital One, Walgreens, and Pfizer through Accenture and Cognizant. Currently works as a cloud solutions architect at NetApp in open-source, multi-cloud, and data engineering and architecture roles: evaluating and benchmarking open-source technologies for various use cases on ONTAP volumes versus the respective cloud-native volumes, and helping customers and stakeholders.