NestJS with Elasticsearch: Full-Text Search

Elasticsearch (ES) is a database that provides distributed, near real-time search and analytics for different types of data. It is based on the Apache Lucene™ library and is developed in Java. It works on structured, unstructured, numerical and geospatial data. The data is stored in the form of schema-less JSON documents.

The official clients for Elasticsearch are available in the following languages:

Java
JavaScript
Go
.NET
PHP
Perl
Python
Ruby

Features it supports

Some of the major features that Elasticsearch has to offer are:

  • Security analytics and infrastructure monitoring.
  • It can be scaled to thousands of servers and can handle petabytes of data.
  • It can be integrated with Kibana to provide real-time visualisation of Elasticsearch data for assessing application performance and for monitoring logs and infrastructure metrics.
  • It uses machine learning to automatically model the behaviour of your data in real time.

Major Concepts

Index

It is similar to a table in a relational database which stores documents having a particular schema in JSON format. In ES versions before 6.0.0, a single index could have multiple types where documents having different schemas could be stored in the same index. For example: We could have Cars and Bikes types in the same index. However, from version 6.0.0 onwards, if we want to store documents of both Cars and Bikes, we will have to create separate indices for each type.

Documents

They are the records in an index, just like rows in a relational database. Each document is in JSON format, has a unique _id, and conforms to the mapping (schema) of its index.

Fields

These are basically attributes of a document in an index similar to columns in a table of a relational database.

Data types

Elasticsearch supports a number of different data types for the fields in a document. I’ll just explain some of the most commonly used ones.

String: There are two string types, text and keyword.

Text is used when we want to store something like a product description, a tweet or a news article. If we want to find all the documents in which a particular attribute contains a specific word or phrase, we use the text data type. Elasticsearch has special analysers which process the string and convert it into a list of individual tokens before indexing the document. After analysing the text, it creates an inverted index, which consists of a list of all the unique words that appear in any document and, for each word, a list of all the documents in which it appears. For example: if our index has a field Description and for one of the documents its value is “This phone has dual sim capability”, then before indexing this document ES checks whether an analyser is specified for the field. If none is, it uses the default Standard Analyser to divide the value into individual tokens and convert each token to lower case.

Tokens: [“this”, “phone”, “has”, “dual”, “sim”, “capability”]
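To make the idea of an inverted index concrete, here is a minimal sketch in TypeScript. The analyse function is a simplified stand-in for the Standard Analyser (it only lowercases and splits on non-alphanumeric characters), and the function names and the second sample document are made up for illustration.

```typescript
// Simplified stand-in for the Standard Analyser: lowercase the text and split
// it on anything that is not a letter or digit. The real analyser follows
// Unicode text segmentation rules and is more sophisticated than this.
function analyse(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Map each unique token to the sorted list of document ids that contain it.
function buildInvertedIndex(docs: Record<string, string>): Map<string, string[]> {
  const index = new Map<string, Set<string>>();
  for (const id of Object.keys(docs)) {
    for (const token of analyse(docs[id])) {
      if (!index.has(token)) {
        index.set(token, new Set<string>());
      }
      index.get(token)!.add(id);
    }
  }
  const result = new Map<string, string[]>();
  index.forEach((ids, token) => result.set(token, Array.from(ids).sort()));
  return result;
}

const sampleDocs = {
  doc1: 'This phone has dual sim capability',
  doc2: 'This tablet has no sim slot', // hypothetical second document
};
const invertedIndex = buildInvertedIndex(sampleDocs);

console.log(analyse(sampleDocs.doc1)); // this, phone, has, dual, sim, capability
console.log(invertedIndex.get('sim')); // doc1 and doc2
console.log(invertedIndex.get('dual')); // doc1 only
```

Looking up a word is then a single map access, which is why full-text search over an inverted index stays fast even with many documents.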

I will explain the analysing process in greater detail in future posts.

Keyword is used for storing user names, email addresses, hostnames, zip codes, etc. In this case, Elasticsearch does not analyse the string; it is indexed as is, without being broken into tokens. This is the ideal type when we want an exact match on fields with string values. Keywords are also used for sorting and aggregation.

Numeric: As is evident from the name, it is used when we want to store numeric data like marks or percentages. Identifiers that merely look numeric, such as phone numbers, are usually better stored as keyword, since we match them exactly rather than by range. Some of the numeric types that ES supports are long, integer, short, byte, double and float.
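The difference between text and keyword shows up in the queries we write against them. The sketch below uses a hypothetical mapping with one field of each type; the field names and values are assumptions chosen only for illustration.

```typescript
// Hypothetical mapping: `description` is analysed, `email` is indexed as is.
const userMapping = {
  properties: {
    description: { type: 'text' },
    email: { type: 'keyword' },
  },
};

// Full-text search: the query string is analysed too, so "Phone" matches
// documents whose description contains the token "phone".
const fullTextQuery = {
  query: { match: { description: 'Phone' } },
};

// Exact match: the value is compared against the whole stored string.
const exactQuery = {
  query: { term: { email: 'jane@example.com' } },
};

// Keyword fields can also be used for sorting (and aggregations).
const sortedByEmail = {
  query: { match_all: {} },
  sort: [{ email: 'asc' }],
};

console.log(JSON.stringify({ fullTextQuery, exactQuery, sortedByEmail }, null, 2));
```

As a rule of thumb, use match on text fields and term on keyword fields.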

Date: It can either be a string containing a formatted date, like “2015-01-01” or “2015/01/01 12:10:30”, or a long number representing milliseconds-since-the-epoch, or an integer representing seconds-since-the-epoch. Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.

Boolean: It accepts JSON true and false values, but can also accept strings which are interpreted as either true or false.

IP: It is a special data type for storing IPv4 and IPv6 addresses.

Nested: In Elasticsearch, an attribute can have an array of JSON objects as its value. For example: suppose we are maintaining an index of all the clubs that play football. Each document pertaining to a specific club will then have a field named players, which can be an array of the different players that play for that club. Here is a sample document:

{  
   "name":"ABC United",
   "homeGround":"Old Trafford",
   "players":[  
      {  
         "firstName":"James",
         "lastName":"Cohen",
         "position":"Goal Keeper"
      },
      {  
         "firstName":"Paul",
         "lastName":"Pogba",
         "position":"Midfielder"
      }
   ]
}
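For a document like this, the players field would be mapped with the nested type so that each player is matched as a unit. The following is a sketch under assumptions: the sub-field types are my choice, not something the sample document dictates.

```typescript
// Hypothetical mapping for the club index; `players` is declared as nested.
const clubMapping = {
  properties: {
    name: { type: 'text' },
    homeGround: { type: 'text' },
    players: {
      type: 'nested',
      properties: {
        firstName: { type: 'text' },
        lastName: { type: 'text' },
        position: { type: 'keyword' },
      },
    },
  },
};

// Find clubs that have a single player who is both named Paul and a Midfielder.
const paulTheMidfielder = {
  query: {
    nested: {
      path: 'players',
      query: {
        bool: {
          must: [
            { match: { 'players.firstName': 'Paul' } },
            { term: { 'players.position': 'Midfielder' } },
          ],
        },
      },
    },
  },
};

console.log(JSON.stringify(paulTheMidfielder, null, 2));
```

Without the nested type, the array of objects is flattened, so a club with a player called Paul and a different player who is a Midfielder could match as well.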

Elasticsearch (ES) search with NestJS APIs

Let's spin up a container for ES before we integrate it with NestJS.

What we need for this setup:

  • Node.js
  • a simple editor such as VS Code
  • Docker, to create the ES container
  • a basic understanding of NestJS, to get the most out of this post

Save the following as docker-compose.yml:

version: '3.7'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.9.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200        
volumes:
  data01:

Then start the container:

docker-compose up

Now, suppose we want to expose search through an API. We can write those APIs in NestJS using the official Elasticsearch module:

https://www.npmjs.com/package/@nestjs/elasticsearch

Installation

$ npm i --save @nestjs/elasticsearch @elastic/elasticsearch

Usage

Import ElasticsearchModule:

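import { Module } from '@nestjs/common';
import { ElasticsearchModule } from '@nestjs/elasticsearch';

@Module({
  imports: [ElasticsearchModule.register({
    node: 'http://localhost:9200',
  })],
  providers: [...],
})
export class SearchModule {}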

Inject ElasticsearchService:

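import { Injectable } from '@nestjs/common';
import { ElasticsearchService } from '@nestjs/elasticsearch';

@Injectable()
export class SearchService {
  constructor(private readonly elasticsearchService: ElasticsearchService) {}
}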

Async options

Quite often you might want to pass your module options asynchronously instead of passing them beforehand. In that case, use the registerAsync() method, which provides several ways to deal with async data.

  1. Use factory

ElasticsearchModule.registerAsync({
  useFactory: () => ({
    node: 'http://localhost:9200'
  })
});

Like any other factory provider, our factory can be async and can inject dependencies through inject.

ElasticsearchModule.registerAsync({
  imports: [ConfigModule],
  useFactory: async (configService: ConfigService) => ({
    node: configService.get('ELASTICSEARCH_NODE'),
  }),
  inject: [ConfigService],
})

Let's create the main search module, which registers the ES module:

import { Module, OnModuleInit } from '@nestjs/common';
import { ElasticsearchModule } from '@nestjs/elasticsearch';
import { SearchQueryBuilderService } from './query-builder.service';
import { SearchServiceController } from './search.controller';
import { SearchService } from './search.service';

@Module({
  imports: [
    ElasticsearchModule.registerAsync({
      imports: [],
      useFactory: async () => ({
        node: process.env.ELASTIC_URL || 'http://localhost:9200',
        maxRetries: 10,
        requestTimeout: 60000,
        auth: {
          username: process.env.ELASTIC_USERNAME,
          password: process.env.ELASTIC_PASSWORD
        }
      }),
      inject: [],
    }),
  ],
  controllers: [SearchServiceController],
  providers: [SearchService, SearchQueryBuilderService],
  exports: [ElasticsearchModule, SearchService, SearchQueryBuilderService],
})
export class SearchModule implements OnModuleInit {
  constructor(private readonly searchService: SearchService) { }
  public async onModuleInit() {
    await this.searchService.createIndex();
  }
}

Before we can search, the index has to exist in ES, so the module creates it on initialization:

export class SearchModule implements OnModuleInit {
  constructor(private readonly searchService: SearchService) { }
  public async onModuleInit() {
    await this.searchService.createIndex();
  }
}

Here the ES module is registered asynchronously with the ES endpoint and credentials. For a local setup we can fall back to http://localhost:9200, the default port:

ElasticsearchModule.registerAsync({
  imports: [],
  useFactory: async () => ({
    node: process.env.ELASTIC_URL || 'http://localhost:9200',
    maxRetries: 10,
    requestTimeout: 60000,
    auth: {
      username: process.env.ELASTIC_USERNAME,
      password: process.env.ELASTIC_PASSWORD
    }
  }),
  inject: [],
})
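For a local container without security enabled, the username and password variables are usually unset, so it can be cleaner to leave auth out entirely. Here is a minimal sketch of a helper that builds the same options from the environment; buildEsOptions and EsClientOptions are hypothetical names, not part of @nestjs/elasticsearch.

```typescript
// Subset of the client options used in this post.
interface EsClientOptions {
  node: string;
  maxRetries: number;
  requestTimeout: number;
  auth?: { username: string; password: string };
}

// Build the client options from environment variables, adding `auth` only
// when both credentials are present.
function buildEsOptions(env: Record<string, string | undefined>): EsClientOptions {
  const options: EsClientOptions = {
    node: env.ELASTIC_URL || 'http://localhost:9200',
    maxRetries: 10,
    requestTimeout: 60000,
  };
  const username = env.ELASTIC_USERNAME;
  const password = env.ELASTIC_PASSWORD;
  if (username && password) {
    options.auth = { username, password };
  }
  return options;
}

console.log(buildEsOptions({})); // local defaults, no auth
console.log(buildEsOptions({ ELASTIC_URL: 'https://es.example.com:9243', ELASTIC_USERNAME: 'elastic', ELASTIC_PASSWORD: 'secret' }));
```

With such a helper, the factory shrinks to useFactory: async () => buildEsOptions(process.env).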

We also need a mapping for the index, on which our search query will be built. The mapping determines how each field is analysed and searched.

export const Mapping = {
  properties: {
    text: {
      type: 'text',
      analyzer: 'english',
      fields: {
        keyword: {
          type: 'keyword',
          ignore_above: 1024
        },
        word_delimiter: {
          type: 'text',
          analyzer: 'word_delimiter'
        }
      }
    },
    name: {
      type: 'text',
      analyzer: 'english',
      fields: {
        keyword: {
          type: 'keyword',
          ignore_above: 256
        }
      }
    },
    id: {
      type: 'keyword'
    },
    description: {
      type: 'text',
      analyzer: 'english',
      fields: {
        keyword: {
          type: 'keyword',
          ignore_above: 256
        }
      }
    },
    url: {
      type: 'text',
      analyzer: 'english',
      fields: {
        keyword: {
          type: 'keyword',
          ignore_above: 256
        }
      }
    }
  }
}

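The index has five fields: text, name, id, description and url. Each text field is mapped with an analyser that decides how search is performed on it, while id is a keyword that is matched exactly. Note that english is a built-in analyser, but word_delimiter is not a built-in analyser name, so an analyser with that name has to be defined in the index settings.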
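The service shown later imports Settings from ./mapping next to Mapping, but the post does not show it. Below is one possible sketch, assuming a custom analyser named word_delimiter built on the built-in word_delimiter token filter; the filter name custom_word_delimiter and the chosen options are my assumptions. In mapping.ts it would be exported alongside Mapping.

```typescript
// Hypothetical index settings defining the analyser referenced by the mapping.
const Settings = {
  analysis: {
    filter: {
      // Built-in `word_delimiter` token filter: splits tokens on case changes,
      // letter/number boundaries and non-alphanumeric characters.
      custom_word_delimiter: {
        type: 'word_delimiter',
        preserve_original: true, // also keep the unsplit token
      },
    },
    analyzer: {
      // The key must match the `analyzer` value used in the mapping.
      word_delimiter: {
        type: 'custom',
        tokenizer: 'whitespace',
        filter: ['custom_word_delimiter', 'lowercase'],
      },
    },
  },
};

console.log(Object.keys(Settings.analysis.analyzer)); // word_delimiter
```

With this in place, a value such as "PowerShot-500" in the text field would also be searchable through its parts in text.word_delimiter.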

Now we can build the query and send it to ES for search:

import { Injectable } from '@nestjs/common';
import { SearchDtoParam } from './search.dto';

@Injectable()
export class SearchQueryBuilderService {
  public buildSearchQuery(searchParam: SearchDtoParam) {
    // tslint:disable-next-line:naming-convention
    const { search_term } = searchParam;

    // Without a search term, return an empty body, which matches all documents.
    if (!search_term) {
      return {};
    }

    return {
      query: {
        bool: {
          must: [
            {
              multi_match: {
                query: `${search_term}`,
                type: 'cross_fields',
                // Note: the mapping above defines a word_delimiter sub-field
                // only for `text`; sub-fields that are not mapped match nothing.
                fields: [
                  'name',
                  'name.word_delimiter',
                  'url',
                  'text',
                  'description',
                  'description.word_delimiter',
                ],
                operator: 'or',
              },
            },
          ],
        },
      },
    };
  }
}
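The builder imports SearchDtoParam from ./search.dto, which is not shown in this post. Here is a minimal sketch, assuming the DTO carries a single optional search_term field, together with the request body we would expect the builder to produce for it.

```typescript
// Hypothetical search.dto.ts: the builder only reads `search_term`.
class SearchDtoParam {
  // tslint:disable-next-line:naming-convention
  search_term?: string;
}

const param: SearchDtoParam = { search_term: 'dual sim' };

// Shape of the body sent to ES for the parameter above.
const expectedBody = {
  query: {
    bool: {
      must: [
        {
          multi_match: {
            query: param.search_term,
            type: 'cross_fields',
            fields: [
              'name',
              'name.word_delimiter',
              'url',
              'text',
              'description',
              'description.word_delimiter',
            ],
            operator: 'or',
          },
        },
      ],
    },
  },
};

console.log(JSON.stringify(expectedBody, null, 2));
```

With cross_fields, ES treats the listed fields as if they were one combined field, and with the or operator a document matches as soon as any of the search terms appears in any of them.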

This service does all the work:

  • creating the index
  • indexing documents
  • running the search with the query builder

import { Injectable } from '@nestjs/common';
import { ElasticsearchService } from '@nestjs/elasticsearch';
import { Mapping, Settings } from './mapping';
import { SearchQueryBuilderService } from './query-builder.service';
import debug from "debug";
import { uuid } from 'uuidv4';
import { SearchDtoParam } from './search.dto';

const error = debug("lib:error:azure");

@Injectable()
export class SearchService {
  constructor(
    private readonly esService: ElasticsearchService,
    private readonly builderService: SearchQueryBuilderService) { }
  public async createIndex() {
    // Create the index if it doesn't exist yet.
    try {
      const index = process.env.ELASTIC_INDEX;
      // With the 7.x client, exists() resolves with statusCode 404 when the index is missing.
      const checkIndex = await this.esService.indices.exists({ index });
      if (checkIndex.statusCode === 404) {
        // Await the call so that failures reach the catch block below.
        await this.esService.indices.create({
          index,
          body: {
            mappings: Mapping,
            settings: Settings,
          },
        });
      }
    } catch (err) {
      error(err, 'SearchService -> createIndex');
      throw err;
    }
  }
  public async indexData(payload: any) {
    try {
      return await this.esService.index({
        index: process.env.ELASTIC_INDEX,
        id: uuid(),
        body: payload,
      });
    } catch (err) {
      error(err, 'SearchService -> indexData');
      throw err;
    }
  }
  public async search(searchParam: SearchDtoParam) {
    try {
      const { body } = await this.esService.search<any>({
        index: process.env.ELASTIC_INDEX,
        body: this.builderService.buildSearchQuery(searchParam),
        from: 0,
        size: 1000,
      });
      const totalCount = body.hits.total.value;
      const hits = body.hits.hits;
      const data = hits.map((item: any) => item._source);
      return {
        totalCount,
        data,
      };
    } catch (err) {
      error(err, 'SearchService || search query issue || -> search');
      throw err;
    }
  }
}
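The search method always asks for the first 1000 hits (from: 0, size: 1000). If the API should page through results instead, from and size can be derived from a page number. Below is a minimal sketch; toFromSize is a hypothetical helper, and the limit of 10000 is the default value of the index.max_result_window setting.

```typescript
// Default value of the `index.max_result_window` index setting.
const MAX_RESULT_WINDOW = 10000;

// Convert a 1-based page number and a page size into ES `from` and `size`.
function toFromSize(page: number, pageSize: number): { from: number; size: number } {
  const safePage = Math.max(1, Math.floor(page));
  const size = Math.min(Math.max(1, Math.floor(pageSize)), MAX_RESULT_WINDOW);
  const from = (safePage - 1) * size;
  if (from + size > MAX_RESULT_WINDOW) {
    throw new RangeError(`from + size must not exceed ${MAX_RESULT_WINDOW}`);
  }
  return { from, size };
}

console.log(toFromSize(1, 20)); // from 0, size 20
console.log(toFromSize(3, 20)); // from 40, size 20
```

The result can be spread into the search call in place of the hard-coded values, for example ...toFromSize(page, pageSize).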

On module initialization, if we can connect to ES, the service creates the index, after which we can synchronize data to ES. This is a one-time process; if the index already exists, the step is skipped.

Now we can plug in the controller:

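import { Body, Controller, HttpCode, HttpStatus, Logger, Post } from '@nestjs/common';
import { ApiBearerAuth, ApiConsumes, ApiTags } from '@nestjs/swagger';
import { SearchDtoParam } from './search.dto';
import { SearchService } from './search.service';

@Controller('search-service')
@ApiTags('search-service')
@ApiBearerAuth()
export class SearchServiceController {
  private readonly log = new Logger(SearchServiceController.name);

  constructor(private readonly service: SearchService) { }

  // A search creates nothing, so respond with 200 instead of POST's default 201.
  @HttpCode(HttpStatus.OK)
  @ApiConsumes('application/json')
  @Post('/search')
  //@UseGuards(new JWTAuthGuard())
  async fetchESResults(@Body() searchDto: SearchDtoParam) {
    return this.service.search(searchDto);
  }

  @HttpCode(HttpStatus.CREATED)
  @ApiConsumes('application/json')
  @Post('/sync')
  // The original post leaves this handler without a body; forwarding the
  // payload to SearchService.indexData() is an assumption.
  async syncData(@Body() payload: any) {
    return this.service.indexData(payload);
  }
}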
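Once the application is running, a client can call the search endpoint with a JSON body. The sketch below only builds the request; buildSearchRequest is a hypothetical helper, and the base URL assumes the Nest app listens on localhost:3000, which this post does not specify.

```typescript
// Build the URL and fetch options for POST /search-service/search.
function buildSearchRequest(searchTerm: string, baseUrl = 'http://localhost:3000') {
  return {
    url: `${baseUrl}/search-service/search`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ search_term: searchTerm }),
    },
  };
}

const request = buildSearchRequest('dual sim');
console.log(request.url); // http://localhost:3000/search-service/search

// Usage with the global fetch available in Node 18+:
// const response = await fetch(request.url, request.init);
// const { totalCount, data } = await response.json();
```

The response carries the totalCount and data fields returned by SearchService.search.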

Conclusion

That is all there is to integrating ES with NestJS. We just need to take care of:

  • getting the ES credentials
  • initializing the main module with the async ES registration and creating the index on init
  • defining the mapping for the ES index
  • building the query, running it against the ES index, and wiring it all together with the controller and service
