6. Elasticsearch: Data Indexing and Syncing with Monstache
Let's apply the analyzer, settings, and mapping covered in the previous posts to actually index some data.
7 min read · Oct 7, 2021
Analyzer settings and mapping
nori-analyzer
Nori is the Korean analyzer supported by Elasticsearch. It is not bundled as a default plugin, though, so it has to be installed manually (restart Elasticsearch afterward):
cd {ELASTICSEARCH_PATH}/bin
./elasticsearch-plugin install analysis-nori
Setting, mapping
Based on what we studied earlier, I configured the analyzer and the mapping:
- 5 shards
- 1 replica
- nori analyzer
  - nori_tokenizer
    - decompound_mode: mixed
  - filter
    - nori_readingform
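The list above can be sketched as an index-creation request body. The index name (`items`) and the mapped field are placeholders I chose for illustration; adapt them to your own schema:

```json
PUT /items
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "nori_analyzer": {
          "type": "custom",
          "tokenizer": "nori_mixed_tokenizer",
          "filter": ["nori_readingform"]
        }
      },
      "tokenizer": {
        "nori_mixed_tokenizer": {
          "type": "nori_tokenizer",
          "decompound_mode": "mixed"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "nori_analyzer" }
    }
  }
}
```

With decompound_mode set to mixed, compound Korean words are indexed both as the full compound and as their decomposed parts, which improves recall on partial-word searches.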
Syncing with Monstache
To connect MongoDB to Elasticsearch, you can use either Logstash or monstache. I chose monstache because it is simpler and easier to set up than Logstash.
Whenever new data arrives in MongoDB, monstache pushes it into Elasticsearch as well (synchronization).
Install
- Install Go
- Install monstache
> wget https://github.com/rwynn/monstache/releases/download/v6.7.4/monstache-98f8bc6.zip
> unzip monstache-98f8bc6.zip
Create the configuration file
Create a mongo-elastic.toml file
Set mongo-url and elasticsearch-urls
Specify the namespaces
- direct-read-namespaces: must follow MongoDB's namespace convention (i.e. database.collection)
# connection settings
# connect to MongoDB using the following URL
mongo-url = "{mongoDB URL}"
# connect to the Elasticsearch REST API at the following node URLs
elasticsearch-urls = ["http://192.168.0.87:9200"]

# frequently required settings
# if you need to seed an index from a collection and not just listen and sync changes events
# you can copy entire collections or views from MongoDB to Elasticsearch
direct-read-namespaces = ["usedMarketDB.items"]

# if you want to use MongoDB change streams instead of legacy oplog tailing use change-stream-namespaces
# change streams require at least MongoDB API 3.6+
# if you have MongoDB 4+ you can listen for changes to an entire database or entire deployment
# in this case you usually don't need regexes in your config to filter collections unless you target the deployment
# to listen to an entire db use only the database name. For a deployment use an empty string.
change-stream-namespaces = ["usedMarketDB.items"]

# additional settings
# compress requests to Elasticsearch
gzip = true
# generate indexing statistics
stats = true
# index statistics into Elasticsearch
index-stats = true
# use 4 go routines concurrently pushing documents to Elasticsearch
elasticsearch-max-conns = 4
# propagate dropped collections in MongoDB as index deletes in Elasticsearch
dropped-collections = false
# propagate dropped databases in MongoDB as index deletes in Elasticsearch
dropped-databases = false
# replay events from the beginning of the MongoDB oplog
replay = false
# resume processing from a timestamp saved in a previous run
resume = false
# do not validate that progress timestamps have been saved
resume-write-unsafe = true
# override the name under which resume state is saved
resume-name = "default"
# use a custom resume strategy (tokens) instead of the default strategy (timestamps)
# tokens work with MongoDB API 3.6+ while timestamps work only with MongoDB API 4.0+
resume-strategy = 1
# print detailed information including request traces
verbose = true
index-as-update = false
index-oplog-time = false
index-files = false
file-highlighting = false
elasticsearch-user = "elastic"
elasticsearch-password = "{elasticsearch-password}"

[[mapping]]
namespace = "{mongoDB namespace}"
index = "{index name}"

[[script]]
namespace = "{mongoDB namespace}"
script = """
module.exports = function(doc) {
    var newdoc = {
        title: doc.title,
        community: doc.community,
        salePrice: doc.salePrice,
        saleStatus: doc.saleStatus,
        uploadTime: doc.uploadTime,
        parentCommunity: doc.parentCommunity
    };
    return newdoc;
}
"""
Run
Run it in the background:
nohup ./monstache -f mongo-elastic.toml 2>&1 &
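Once monstache is running, the sync can be verified with a simple search against the target index. The index name here is an assumption (by default monstache derives it from the lowercased namespace unless a [[mapping]] block overrides it), and the query term is an arbitrary example:

```json
GET /usedmarketdb.items/_search
{
  "query": {
    "match": { "title": "맥북" }
  }
}
```

If documents inserted into MongoDB after startup show up in the hits, the change-stream sync is working.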
Reference
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-nori-analyzer.html
https://coding-start.tistory.com/167
https://rastalion.me/mongodb-to-elasticsearch-realtime-sync/