A Quick Guide to Elasticsearch with Spring Data and Spring Boot

You’ve probably heard of Elasticsearch or the Elastic Stack. The project started as a search engine that Shay Banon built on top of Lucene, the open-source search engine library, to index his wife’s cooking recipes. Since those early days, Elasticsearch has come a long way and has evolved into the Elastic Stack, a great suite for taking data from any source in order to search, analyze, and visualize it in near real-time.

Elasticsearch is a distributed document store; data is serialized as JSON documents and stored across the cluster’s nodes. The inverted index is a data structure that lists every unique word that appears in any document, along with the documents in which each word appears. This enables fast full-text search, a feature that traditional database engines support only partially, if at all. An index is a collection of documents, and each document is a collection of fields. In turn, each field can be indexed in a data structure optimized for its type; for example, an inverted index for a text field and a BKD tree for a numeric field.
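
To make the idea concrete, here is a toy sketch of an inverted index in plain Java: it maps each unique word to the set of document ids containing it, so finding every document that mentions a term becomes a single lookup. Elasticsearch’s real data structures are far more sophisticated; this is only an illustration.

import java.util.*;

public class InvertedIndexDemo {

    public static void main(String[] args) {
        // Three tiny "documents", identified by their position in the list
        List<String> docs = List.of(
            "spring boot with elasticsearch",
            "spring data repositories",
            "elasticsearch inverted index");

        // For each unique word, the set of document ids in which it appears
        Map<String, Set<Integer>> invertedIndex = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String word : docs.get(docId).split("\\s+")) {
                invertedIndex.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
            }
        }

        // Finding every document containing "elasticsearch" is a single lookup
        System.out.println(invertedIndex.get("elasticsearch")); // prints [0, 2]
    }
}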

How can Elasticsearch be integrated into a Spring Boot application? What are the options for adding Elasticsearch to a Java app? This post gives you a quick introduction to Elasticsearch integration options.

This tutorial was created with the following frameworks and tools:

Elasticsearch integration options for Spring Boot

For Java applications, including Spring Boot applications, Elasticsearch provides the following clients for integration:

  • Java Transport Client: Deprecated in Elasticsearch 7.0.0. Provides a client object to execute all operations asynchronously, accepting a listener or returning a future.

  • Java REST Client: Composed of the Low Level REST Client and the High Level REST Client. The Low Level Client provides load balancing, failover, persistent connections, and request/response trace logging. The High Level Client works on top of the Low Level Client and is the replacement for the TransportClient. It depends on the Elasticsearch core and provides synchronous and asynchronous APIs.

  • Java API Client: The new client library, independent of the Elasticsearch core, provides strongly typed requests and responses, blocking and asynchronous versions of all APIs, fluent builders and functional patterns, as well as Jackson and JSON-B support. (A minimal standalone sketch follows this list.)
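
For a rough idea of what the Java API Client looks like, here is a minimal, hypothetical sketch. It assumes an Elasticsearch node on localhost:9200 and a tag index holding documents with a name field; none of this is part of the application generated later in this post.

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import co.elastic.clients.transport.ElasticsearchTransport;
import co.elastic.clients.transport.rest_client.RestClientTransport;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;

public class JavaApiClientSketch {

    // Hypothetical document class mapped from the "tag" index
    public static class Tag {
        public String id;
        public String name;
    }

    public static void main(String[] args) throws Exception {
        // Low-level REST client pointing at an assumed local node
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200)).build();

        // Transport layer with Jackson-based JSON mapping
        ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());
        ElasticsearchClient client = new ElasticsearchClient(transport);

        // Strongly typed, fluent search request: match query on the "name" field
        SearchResponse<Tag> response = client.search(
            s -> s.index("tag").query(q -> q.match(m -> m.field("name").query("cooking"))),
            Tag.class
        );
        response.hits().hits().forEach(hit -> System.out.println(hit.source().name));

        transport.close();
    }
}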

Hello Spring Data Elasticsearch!

Spring Data Elasticsearch is another integration option that adds the Spring repository abstraction at the search layer, providing access and search functionality for domain objects stored in Elasticsearch. Operations are sent through a client connected to the Elasticsearch node. With Spring Data, the High Level REST Client is the default client, although Elasticsearch documentation states that it’s been deprecated in favor of the Java API Client since version 7.15. The Java API Client is not listed as a supported client yet. In the meantime, the Java Transport Client is still supported in Spring Data, but the general recommendation is to use the High Level Client.

Instead of calling the Elasticsearch APIs directly, the repository and REST template abstractions provide a simplified interface for document operations, encapsulating API request/response processing, and exposing a query interface that has multiple implementations for different levels of query complexity. Through the starter dependency, it can also handle client autoconfiguration and automatic document index mapping for simple use cases.
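
For a sense of what that abstraction looks like, here is a small, hypothetical sketch; the entity and repository names are illustrative and not the code JHipster generates later in this post. Spring Data derives the Elasticsearch queries from the repository method names.

import java.util.List;
import org.springframework.data.annotation.Id;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Hypothetical domain object stored in a "tag" index
@Document(indexName = "tag")
public class Tag {

    @Id
    private String id;

    @Field(type = FieldType.Text)
    private String name;

    // getters and setters omitted
}

// The query is derived from the method name; no Elasticsearch API calls are written by hand
interface TagRepository extends ElasticsearchRepository<Tag, String> {

    List<Tag> findByName(String name);

    // the same search, with pagination handled by the abstraction
    Page<Tag> findByNameContaining(String name, Pageable pageable);
}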

Besides the high-level REST client support, Spring Data provides the Reactive Client, a non-official driver based on WebClient, with calls operated directly on the Reactive Stack. The Reactive Client also depends on the Elasticsearch core, as it is designed for handling Elasticsearch request/response types.

Build an application with Spring Data Elasticsearch

JHipster provides the Elasticsearch option to add search capabilities on top of your database. The integration is based on Spring Data Elasticsearch repositories, so let’s generate a reactive blog application to explore what the generator provides. The blog application built in this tutorial is based on the JHipster sample reactive-ms.jdl, with the addition of Maven, MongoDB (the Elasticsearch integration only works with SQL databases and MongoDB), Bootstrap pagination, Okta authentication, and Kibana for index mapping inspection.

JHipster has a quick setup you can follow to set up your environment. For installing Node.js, I like NVM (Node Version Manager):

wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install 14.18.3

Install JHipster with the following line:

npm install -g generator-jhipster@7.6.0

Verify the JHipster version with the following line:

jhipster --version
INFO! Using JHipster version installed globally
7.6.0

Once your environment is ready, create a spring-data-elasticsearch folder for the project, fetch the application JDL from the GitHub repository (the command below uses HTTPie), and generate the application with JHipster:

mkdir spring-data-elasticsearch
cd spring-data-elasticsearch
https -d raw.githubusercontent.com/oktadev/okta-spring-data-elasticsearch-example/main/blog-reactive-ms.jdl
jhipster jdl blog-reactive-ms.jdl

Before you begin, you’ll need a free Okta developer account. Install the Okta CLI and run okta register to sign up for a new account. If you already have an account, run okta login. Then, run okta apps create jhipster. Select the default app name, or change it as you see fit. Then, change the Redirect URIs to:

http://localhost:8080/login/oauth2/code/oidc,http://localhost:8081/login/oauth2/code/oidc,http://localhost:8761/login/oauth2/code/oidc

Use http://localhost:8080,http://localhost:8081,http://localhost:8761 for the Logout Redirect URIs.

What does the Okta CLI do?

The Okta CLI streamlines configuring a JHipster app and does several things for you:

  1. Creates an OIDC app with the correct redirect URIs (see above; the values below are the defaults):
    • login: http://localhost:8080/login/oauth2/code/oidc and http://localhost:8761/login/oauth2/code/oidc
    • logout: http://localhost:8080 and http://localhost:8761
  2. Creates ROLE_ADMIN and ROLE_USER groups that JHipster expects
  3. Adds your current user to the ROLE_ADMIN and ROLE_USER groups
  4. Creates a groups claim in your default authorization server and adds the user’s groups to it

NOTE: The http://localhost:8761* redirect URIs are for the JHipster Registry, which is often used when creating microservices with JHipster. The Okta CLI adds these by default.

You will see output like the following when it’s finished:

Okta application configuration has been written to: /path/to/app/.okta.env

Run cat .okta.env (or type .okta.env on Windows) to see the issuer and credentials for your app. It will look like this (except the placeholder values will be populated):

export SPRING_SECURITY_OAUTH2_CLIENT_PROVIDER_OIDC_ISSUER_URI="https://{yourOktaDomain}/oauth2/default"
export SPRING_SECURITY_OAUTH2_CLIENT_REGISTRATION_OIDC_CLIENT_ID="{clientId}"
export SPRING_SECURITY_OAUTH2_CLIENT_REGISTRATION_OIDC_CLIENT_SECRET="{clientSecret}"

NOTE: You can also use the Okta Admin Console to create your app. See Create a JHipster App on Okta for more information.

The JHipster Registry is also a Spring Cloud Config server, and by default, it is configured with the profiles dev and native, which means the configuration will be provided from the location docker-compose/central-server-config. Update docker-compose/central-server-config/application.yml with the OIDC settings to be shared with all microservices. Set the values from the .okta.env file the Okta CLI created. To avoid copy/paste errors, you can do it from the command line:

source .okta.env
echo \"
spring:
  security:
    oauth2:
      client:
        provider:
          oidc:
            issuer-uri: $SPRING_SECURITY_OAUTH2_CLIENT_PROVIDER_OIDC_ISSUER_URI
        registration:
          oidc:
            client-id: $SPRING_SECURITY_OAUTH2_CLIENT_REGISTRATION_OIDC_CLIENT_ID
            client-secret: $SPRING_SECURITY_OAUTH2_CLIENT_REGISTRATION_OIDC_CLIENT_SECRET" >> docker-compose/central-server-config/application.yml

For the experiment in this tutorial, configure the Kibana interface for Elasticsearch, which will let you visualize the Elasticsearch data.

Edit docker-compose/docker-compose.yml and add the Kibana service like this:

services:
  ...  
  blog-kibana:
    image: docker.elastic.co/kibana/kibana:7.15.2
    ports:
      - 5601:5601
    environment:
      ELASTICSEARCH_URL: http://blog-elasticsearch:9200
      ELASTICSEARCH_HOSTS: '["http://blog-elasticsearch:9200"]'

NOTE: The Kibana version must match the version of Elasticsearch that’s used by the blog-elasticsearch image.

Also add the following configuration to the blog-elasticsearch service to expose its port to the local host for sending API requests:

services:
  blog-elasticsearch:
    ...
    ports:
      - 9200:9200

Remove the Keycloak service at the bottom, as Okta will be used as the identity provider (IdP).

Create each application’s container image:

cd blog
./mvnw -DskipTests -ntp -Pprod verify jib:dockerBuild
cd ../gateway
./mvnw -DskipTests -ntp -Pprod verify jib:dockerBuild

Run all your images with Docker Compose:

cd ../docker-compose
docker compose up

Access the JHipster Registry at http://localhost:8761 and sign in with your Okta credentials. When you see all services up and green, go to http://localhost:8080 and sign in with your Okta account.

Okta Sign In Form

IMPORTANT NOTE: There is a potential issue with the blog service. Because Kubernetes probes are enabled by default, the blog application might change its state to OUT_OF_SERVICE during startup, and Eureka might then discard the subsequent heartbeats reporting state UP. In this case, disable the Kubernetes probes in docker-compose.yml with the following environment variable:

services:
  blog:
    image: blog
    environment:
      ...
      - MANAGEMENT_ENDPOINT_HEALTH_PROBES_ENABLED=false

Restart your microservices stack using Ctrl+C and run docker compose up again.

Inspecting Elasticsearch index mapping

During startup, Spring Data Elasticsearch creates an index for each entity annotated with @Document, deriving the mappings from the entity’s annotations. For properties whose field type is not specified, the type defaults to FieldType.Auto. This means that no mapping entry is written for the property and that Elasticsearch adds a mapping entry dynamically when the first data for this property is stored.
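
To illustrate the difference, here is a hypothetical entity (not the code JHipster generates): the annotated property gets a mapping entry written at index creation, while the unannotated one falls back to FieldType.Auto and is mapped dynamically.

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// Hypothetical entity to illustrate how mappings are derived
@Document(indexName = "article")
public class Article {

    @Id
    private String id;

    // Explicit type: a "title" mapping entry is written when the index is created
    @Field(type = FieldType.Text)
    private String title;

    // No @Field annotation: FieldType.Auto, so Elasticsearch maps this property
    // dynamically when the first document containing it is stored
    private String content;

    // getters and setters omitted
}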

Since the Kibana interface was configured in Docker Compose, let’s inspect the index mappings that were created automatically for the blog microservice before persisting any entities. Go to http://localhost:5601 to open the Kibana dashboard. In the top-left menu, go to Management > Stack Management > Index Management > Indices tab. Besides the user index, there should be one index per entity.

Kibana Indexes

Choose, for example, the tag index. Before persisting any instances, the Mappings tab will look like the following:

{
  "mappings": {
    "_doc": {
      "properties": {
        "_class": {
          "type": "keyword",
          "index": false,
          "doc_values": false
        }
      }
    }
  }
}

As you can see, properties are not mapped yet.

Go back to the application, and in the top-right menu go to Entities > Tag and create some Tag entities. Then create some Blog and Post entities as well. You’ll see that a search box is present on the entity list page. Go ahead and test the search functionality. The results page will return the matches with pagination.

Entity Search Box

Go back to Kibana, and display the Tag mappings after persistence:

{
  "mappings": {
    "_doc": {
      "properties": {
        "_class": {
          "type": "keyword",
          "index": false,
          "doc_values": false
        },
        "id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Now the id and name properties have been dynamically mapped with multi-field mapping. For example, the name field is defined with type text, for full-text searches, and it defines a name.keyword subfield for aggregations and sorting:

"name": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

It is often useful to index the same field in different ways for different purposes. Text fields are searchable by default, but they are not available for aggregations or sorting unless multi-field mapping is in place. Another option for enabling sorting on text fields is the fielddata mapping parameter; it is not covered in this tutorial because the Elasticsearch documentation discourages it due to its high memory consumption.
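
If you prefer explicit mappings over dynamic ones, Spring Data Elasticsearch can declare the same multi-field layout on the entity with the @MultiField annotation. The following is a hypothetical sketch, not the generated code:

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.InnerField;
import org.springframework.data.elasticsearch.annotations.MultiField;

// Hypothetical entity declaring the multi-field layout explicitly,
// instead of relying on Elasticsearch's dynamic mapping
@Document(indexName = "tag")
public class Tag {

    @Id
    private String id;

    // "name" is mapped as text for full-text search, and a "name.keyword"
    // subfield is added for aggregations and sorting
    @MultiField(
        mainField = @Field(type = FieldType.Text),
        otherFields = { @InnerField(suffix = "keyword", type = FieldType.Keyword, ignoreAbove = 256) }
    )
    private String name;

    // getters and setters omitted
}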

Elasticsearch exposes several APIs; the Index APIs let you manage individual indices, index settings, aliases, mappings, and index templates. Although Kibana is a user-friendly front end, you can also query the index mappings through the Index APIs with an HTTPie command like the following:

http :9200/tag/_mapping

The output should look the same as the mapping visualized in Kibana:

Kibana Index Mapping

Key components in reactive Spring Data Elasticsearch

As mentioned before, the blog application is reactive because the JDL file sets reactive true in its gateway and blog application configurations.

This JDL configuration also tells JHipster to include the Elasticsearch dependencies in the blog microservice, through the searchEngine elasticsearch option. JHipster generates the web layer with the entity resource class, and it optionally generates the service class too. Two of the key components that enable the reactive Elasticsearch integration are the ReactiveElasticsearchClient and the ReactiveElasticsearchOperations. A client is required to connect to an Elasticsearch cluster or node, and *Operations is the interface for executing CRUD and query commands. The third component is the entity’s ReactiveElasticsearchRepository, which can provide domain-specific search methods.

Spring Data Collaboration

When using the starter dependency, ReactiveElasticsearchRestClientAutoConfiguration configures the reactive Elasticsearch client from the Spring Elasticsearch properties. ElasticsearchDataAutoConfiguration is the root configuration class that triggers the reactive search template initialization. Let’s briefly describe each of these components.

The reactive client

The ReactiveElasticsearchClient is based on Spring’s WebClient, and calls are executed directly on the reactive stack. The reactive client sends and receives high-level request and response objects. In this example, the reactive client is initialized from the spring.elasticsearch.* properties. You can verify that the Elasticsearch endpoint is set in the docker-compose.yml file for the blog service:

blog:
  image: blog
  environment:
    ...
    - SPRING_ELASTICSEARCH_URIS=http://blog-elasticsearch:9200
    ...
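
If you wanted to configure the reactive client programmatically instead of through these properties, a configuration class might look roughly like the following sketch (assuming Spring Data Elasticsearch 4.x; the generated application relies on the starter’s autoconfiguration instead):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.reactive.ReactiveElasticsearchClient;
import org.springframework.data.elasticsearch.client.reactive.ReactiveRestClients;
import org.springframework.data.elasticsearch.config.AbstractReactiveElasticsearchConfiguration;

// A sketch of manual client configuration; the JHipster app uses the starter's
// autoconfiguration and the SPRING_ELASTICSEARCH_URIS property instead
@Configuration
public class ReactiveElasticsearchConfig extends AbstractReactiveElasticsearchConfiguration {

    @Override
    @Bean
    public ReactiveElasticsearchClient reactiveElasticsearchClient() {
        ClientConfiguration clientConfiguration = ClientConfiguration.builder()
            .connectedTo("blog-elasticsearch:9200")
            .build();
        return ReactiveRestClients.create(clientConfiguration);
    }
}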

The operations abstraction

ReactiveElasticsearchOperations is the gateway for executing high-level commands against an Elasticsearch cluster using the ReactiveElasticsearchClient. The ReactiveElasticsearchTemplate is the default implementation of ReactiveElasticsearchOperations. Some high-level operations are save, get, delete and search, which accept the Spring Data Query types as parameters.
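
As an illustration, a hypothetical service could execute a criteria query through the operations abstraction like this (the service name is illustrative, and Tag refers to the application’s Elasticsearch @Document entity):

import org.springframework.data.elasticsearch.core.ReactiveElasticsearchOperations;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.query.Criteria;
import org.springframework.data.elasticsearch.core.query.CriteriaQuery;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

// Hypothetical service executing a criteria query through the operations abstraction
@Service
public class TagQueryService {

    private final ReactiveElasticsearchOperations operations;

    public TagQueryService(ReactiveElasticsearchOperations operations) {
        this.operations = operations;
    }

    public Flux<Tag> findByName(String term) {
        CriteriaQuery query = new CriteriaQuery(new Criteria("name").is(term));
        // search() returns Flux<SearchHit<Tag>>; unwrap the matched documents
        return operations.search(query, Tag.class).map(SearchHit::getContent);
    }
}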

The entity search repository

The entity search repository enables document CRUD and search operations in the application. The search repository extends ReactiveElasticsearchRepository, the persistence technology-specific abstraction, which builds on the core repository support and uses operations provided through the ReactiveElasticsearchTemplate, executed by the ReactiveElasticsearchClient. JHipster generates a search repository for the entities specified with the search option in the JDL. In this example, all entities are set to be searchable in Elasticsearch:

application {
  config {
    baseName blog
    reactive true
    ...
    searchEngine elasticsearch
  }
  ...
  search * with elasticsearch
}

When an entity in the JDL definition requires the Elasticsearch engine as well as pagination, the entity search repository is generated with pagination and sorting support. For example, the generated TagSearchRepository adds a custom search() method that assembles a native search query and passes it to the ReactiveElasticsearchTemplate:

public interface TagSearchRepository extends ReactiveElasticsearchRepository<Tag, String>, TagSearchRepositoryInternal {}

interface TagSearchRepositoryInternal {
    Flux<Tag> search(String query, Pageable pageable);
}

class TagSearchRepositoryInternalImpl implements TagSearchRepositoryInternal {

    private final ReactiveElasticsearchTemplate reactiveElasticsearchTemplate;

    TagSearchRepositoryInternalImpl(ReactiveElasticsearchTemplate reactiveElasticsearchTemplate) {
        this.reactiveElasticsearchTemplate = reactiveElasticsearchTemplate;
    }

    @Override
    public Flux<Tag> search(String query, Pageable pageable) {
        List<FieldSortBuilder> builders = new SortToFieldSortBuilderConverter().convert(pageable.getSort());

        NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder()
            .withQuery(queryStringQuery(query))
            .withPageable(PageRequest.of(pageable.getPageNumber(), pageable.getPageSize()));

        builders
            .stream()
            .forEach(builder -> {
                queryBuilder.withSort(builder);
            });

        NativeSearchQuery nativeSearchQuery = queryBuilder.build();
        return reactiveElasticsearchTemplate.search(nativeSearchQuery, Tag.class).map(SearchHit::getContent);
    }
}
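
A hypothetical call site for this method might look like the snippet below: a Lucene query string plus a Pageable that sorts on the dynamically mapped name.keyword subfield. The query syntax and page size are illustrative.

// Hypothetical usage of the generated search() method (illustrative names and query)
Pageable page = PageRequest.of(0, 20, Sort.by(Sort.Direction.ASC, "name.keyword"));
Flux<Tag> hits = tagSearchRepository.search("name:cook*", page);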

Learn more about Elasticsearch, Spring Boot, and JHipster

JHipster helps simplify the setup of Spring Boot applications or microservices with search capabilities. I hope you enjoyed this quick introduction to Elasticsearch integration options and got a taste of the advantages of the Spring Data repository abstraction for encapsulating basic Elasticsearch operations. Remember to check the compatibility matrix to find the right Spring Data Elasticsearch dependency for your Elasticsearch version.

Keep learning, and for more examples and recipes on Elasticsearch and Okta integrations for Spring Boot, check out the following links:

You can find the source code for this tutorial in the @oktadev/okta-spring-data-elasticsearch-example repository on GitHub.

Be sure to follow us on Twitter and subscribe to our YouTube Channel so that you never miss any of our excellent content!
