6. Elasticsearch: data indexing and integration (monstache)

In this post we apply the analyzer, settings, and mapping covered earlier and index some data.

Taeha Hong
Oct 7, 2021

Analyzer settings and mapping

nori-analyzer

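nori is a Korean analyzer supported by Elasticsearch.
It is not bundled as a default plugin, however, so it has to be installed manually. Run the install command from the Elasticsearch bin directory:

{ELASTICSEARCH_PATH}/bin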

./elasticsearch-plugin install analysis-nori

Settings and mapping

Based on what was covered earlier, I configured the analyzer and defined the mapping as follows.

  • 5 shards
  • 1 replica
  • nori-analyzer
    - nori_tokenizer
    - decompound_mode: mixed
  • filter
    - nori_readingform
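
As a rough sketch, the settings listed above could be expressed as the following index-creation body. The values for shards, replicas, `nori_tokenizer`, `decompound_mode`, and `nori_readingform` come from the list. The analyzer name `nori_analyzer`, the tokenizer name `nori_mixed`, and the type chosen for the `title` field are assumptions made for illustration, and the remaining fields are left out.

```javascript
// Index-creation body built from the settings listed above.
// Assumed names (not from the article): "nori_analyzer", "nori_mixed".
// Assumed mapping: only `title` is shown, typed as analyzed text.
const indexBody = {
  settings: {
    number_of_shards: 5,
    number_of_replicas: 1,
    analysis: {
      tokenizer: {
        // nori tokenizer that keeps compounds and their decomposed parts
        nori_mixed: { type: "nori_tokenizer", decompound_mode: "mixed" },
      },
      analyzer: {
        nori_analyzer: {
          type: "custom",
          tokenizer: "nori_mixed",
          // converts Hanja to the Hangul reading form
          filter: ["nori_readingform"],
        },
      },
    },
  },
  mappings: {
    properties: {
      title: { type: "text", analyzer: "nori_analyzer" },
    },
  },
};

// Print the JSON that would be sent as the body of `PUT /<index-name>`.
console.log(JSON.stringify(indexBody, null, 2));
```

The printed JSON can be pasted into a `PUT /<index-name>` request, for example in Kibana Dev Tools, once the analysis-nori plugin is installed.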

Integration with monstache

To connect MongoDB to Elasticsearch you can use either Logstash or monstache. Here I use monstache, because it is simpler and easier to set up than Logstash.
When new data arrives in MongoDB, monstache writes it to Elasticsearch as well, keeping the two in sync.
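
To make the idea of synchronization concrete, here is a purely conceptual sketch that applies MongoDB-style change events to an in-memory stand-in for an index. It is not how monstache is implemented. `applyChange` and `fakeIndex` are hypothetical names, and the sketch assumes update events carry the full document, which in MongoDB requires the `updateLookup` option.

```javascript
// Conceptual sketch: mirror change events into a Map keyed by document id.
// `applyChange` and `fakeIndex` are hypothetical, not monstache internals.
function applyChange(index, event) {
  const id = String(event.documentKey._id);
  switch (event.operationType) {
    case "insert":
    case "replace":
    case "update":
      // Assumes the event includes the full document.
      index.set(id, event.fullDocument);
      break;
    case "delete":
      index.delete(id);
      break;
    default:
      // Other event types are ignored in this sketch.
      break;
  }
  return index;
}

const fakeIndex = new Map();
applyChange(fakeIndex, { operationType: "insert", documentKey: { _id: 1 }, fullDocument: { title: "bike" } });
applyChange(fakeIndex, { operationType: "update", documentKey: { _id: 1 }, fullDocument: { title: "road bike" } });
applyChange(fakeIndex, { operationType: "insert", documentKey: { _id: 2 }, fullDocument: { title: "desk" } });
applyChange(fakeIndex, { operationType: "delete", documentKey: { _id: 2 } });

console.log([...fakeIndex.entries()]);
```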

Install

  1. Install Go
  2. Install monstache
wget https://github.com/rwynn/monstache/releases/download/v6.7.4/monstache-98f8bc6.zip
unzip monstache-98f8bc6.zip

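Creating the configuration file

Create the mongo-elastic.toml file.
Set mongo-url and elasticsearch-urls.
Specify the namespaces to sync.

  • direct-read-namespaces: each entry must follow the MongoDB namespace convention, <database>.<collection> (for example usedMarketDB.items).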
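
As a small illustration of the namespace rule, the sketch below splits a namespace at its first dot into a database name and a collection name. `parseNamespace` is a hypothetical helper written for this post, not part of monstache or the MongoDB driver. It covers only the full `<database>.<collection>` form used by direct-read-namespaces.

```javascript
// Split a MongoDB namespace ("<database>.<collection>") at the first dot.
// `parseNamespace` is a hypothetical helper for illustration only.
function parseNamespace(ns) {
  const dot = ns.indexOf(".");
  // Reject values with no dot, a leading dot, or a trailing dot.
  if (dot <= 0 || dot === ns.length - 1) {
    throw new Error(`not a valid namespace: ${ns}`);
  }
  // Collection names may contain dots, so only the first dot separates.
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

console.log(parseNamespace("usedMarketDB.items"));
```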
# connection settings
# connect to MongoDB using the following URL
mongo-url = "{mongoDB URL}"
# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://192.168.0.87:9200"]

# frequently required settings
# if you need to seed an index from a collection and not just listen and sync changes events
# you can copy entire collections or views from MongoDB to Elasticsearch
direct-read-namespaces = ["usedMarketDB.items"]
# if you want to use MongoDB change streams instead of legacy oplog tailing use change-stream-namespaces
# change streams require at least MongoDB API 3.6+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# in this case you usually don't need regexes in your config to filter collections unless you target the deployment
# to listen to an entire db use only the database name. For a deployment use an empty string.
change-stream-namespaces = ["usedMarketDB.items"]

# additional settings
# compress requests to Elasticsearch
gzip = true
# generate indexing statistics
stats = true
# index statistics into Elasticsearch
index-stats = true
# use 4 go routines concurrently pushing documents to Elasticsearch
elasticsearch-max-conns = 4
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = false
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = false
# do not replay the MongoDB oplog from the beginning
# when replaying, you may see version conflicts for docs that are already
# in Elasticsearch with a newer version. Elasticsearch is preventing the old docs from overwriting new ones.
replay = false
# resume processing from a timestamp saved in a previous run
resume = false
# do not validate that progress timestamps have been saved
resume-write-unsafe = true
# override the name under which resume state is saved
resume-name = "default"
# use a custom resume strategy (tokens) instead of the default strategy (timestamps)
# tokens work with MongoDB API 3.6+ while timestamps work only with MongoDB API 4.0+
resume-strategy = 1
# print detailed information including request traces
verbose = true
index-as-update = false
index-oplog-time = false
index-files = false
file-highlighting = false
elasticsearch-user = "elastic"
elasticsearch-password = "{elasticsearch-password}"

[[mapping]]
namespace = "{mongoDB namespace}"
index = "{elasticsearch index name}"

[[script]]
namespace = "{mongoDB namespace}"
script = """
module.exports = function(doc) {
    // keep only the fields that should be indexed in Elasticsearch
    var newdoc = {
        title: doc.title,
        community: doc.community,
        salePrice: doc.salePrice,
        saleStatus: doc.saleStatus,
        uploadTime: doc.uploadTime,
        parentCommunity: doc.parentCommunity
    };
    return newdoc;
}
"""
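
To see what the script does to a document, the same projection can be tried with Node.js outside monstache. The sketch below is a minimal stand-alone version. `transform` is a hypothetical name, and every value in the sample document, including the extra `description` field, is made up for illustration.

```javascript
// Same projection as the [[script]] block, as a plain function.
// `transform` is a hypothetical name used only in this sketch.
function transform(doc) {
  return {
    title: doc.title,
    community: doc.community,
    salePrice: doc.salePrice,
    saleStatus: doc.saleStatus,
    uploadTime: doc.uploadTime,
    parentCommunity: doc.parentCommunity,
  };
}

// Made-up sample document; `description` stands in for any field
// that should not reach Elasticsearch.
const sample = {
  _id: "sample-id",
  title: "road bike",
  community: "sample-community",
  salePrice: 100000,
  saleStatus: "onSale",
  uploadTime: "2021-10-07T00:00:00Z",
  parentCommunity: "sample-parent",
  description: "this field is dropped",
};

const indexed = transform(sample);
console.log(indexed);
```

Only the six listed fields survive; anything else on the MongoDB document is left out of the indexed document.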

Running

Run it in the background:

nohup ./monstache -f mongo-elastic.toml 2>&1 &

Reference

https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori-analyzer.html
https://coding-start.tistory.com/167
https://rastalion.me/mongodb-to-elasticsearch-realtime-sync/
