HibernateSearchからElasticSearchへの接続タイムアウト

こんにちは
昨年１０月からジョインさせて頂いてます卜部です
よろしくおねがいします

webアプリケーションから検索を伴うページが表示できず、タイトルのようなことが起きていたので調べた事をメモしておきます

まずはエラーログ

org.hibernate.search.exception.SearchException: HSEARCH400007: Elasticsearch request failed

ググった結論から言うと、アプリケーション側で特に指定しないまま（デフォルト）だとコネクション数２なので、１０に指定するといいよという話のようです
Hibernate Search: concurrent timeout exception
Max_total_connection_per_route property

という事で、コネクション数が接続タイムアウトに対して効果があるのか検証してみました。

再現方法

アプリケーションからElasticSearchへのコネクション数を増やしながら、同じ方法でElasticSearchに対して負荷をかけます。
後述する項目についてデータを記録し、それを比較をしていきます。
コネクション数はプロパティに値を指定することで変更します。

spring.jpa.properties.hibernate.search.default.elasticsearch.max_total_connection_per_route=10

負荷の再現

現象が発生した時の条件を揃えるのは難しいので、バックアップを使って全データに対して再indexingすることで代用します

比較したい項目

アプリケーション設定の変更による他への影響も知りたかったので、こんな視点でチェックしてみます

アプリケーションが全データ再indexingに要した時間
アプリケーションとElasticSearch間でのデータ推移（時間と量）
ElasticSearchのステータス

測定結果

※１AZ、１ノードの環境で検証

index再構成のログ

 １コネクション：３８５秒
2020-01-31 15:40:47.987  INFO 6080 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000030: 260000 documents indexed in 385070 ms
2020-01-31 15:40:47.987  INFO 6080 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000031: Indexing speed: 675.201904 documents/second; progress: 100.00%
2020-01-31 15:40:48.396  INFO 6080 --- [h coordinator-1] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000028: Reindexed 260012 entities
 
２コネクション（デフォルト）：３６７秒
2020-01-31 17:26:47.338  INFO 9567 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000030: 260000 documents indexed in 367259 ms
2020-01-31 17:26:47.338  INFO 9567 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000031: Indexing speed: 707.947266 documents/second; progress: 100.00%
2020-01-31 17:26:47.803  INFO 9567 --- [h coordinator-1] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000028: Reindexed 260012 entities

４コネクション：４０３秒
2020-01-31 12:17:49.336  INFO 30276 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000030: 260000 documents indexed in 403305 ms
2020-01-31 12:17:49.336  INFO 30276 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000031: Indexing speed: 644.673401 documents/second; progress: 100.00%
2020-01-31 12:17:49.748  INFO 30276 --- [h coordinator-1] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000028: Reindexed 260012 entities

１０コネクション：３７９秒
2020-01-31 14:53:18.370  INFO 1758 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000030: 260000 documents indexed in 379773 ms
2020-01-31 14:53:18.370  INFO 1758 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000031: Indexing speed: 684.619507 documents/second; progress: 100.00%
2020-01-31 14:53:18.850  INFO 1758 --- [h coordinator-1] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000028: Reindexed 260012 entities
 
２０コネクション：３４８秒
2020-01-31 20:41:20.533  INFO 17435 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000030: 260000 documents indexed in 348115 ms
2020-01-31 20:41:20.533  INFO 17435 --- [nsport thread-2] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000031: Indexing speed: 746.879639 documents/second; progress: 100.00%
2020-01-31 20:41:21.023  INFO 17435 --- [h coordinator-1] o.h.s.b.i.SimpleIndexingProgressMonitor  : HSEARCH000028: Reindexed 260012 entities

HibernateSearchのログでは秒数がバラバラで、少なくともコネクション数との相関関係は見られませんでした

アプリケーションサーバからのデータトラフィック量

見づらくて申し訳ないですが、グラフの山は左からそれぞれコネクション数４，１０，１，２（デフォルト），２０です

アプリケーションサーバから送信トラフィックが増え始め、０に収束するまでの時間は、先程のログで表記された時間とかなり違っています。
２０、１０コネクションはほぼログの時間と誤差無しですが、それ未満のコネクション数ではログの時間経過した後にもデータを送信し続けています。
アプリのログとの乖離が小さくなるため、これだけでも１０コネクションに設定するメリットはありそうです。

ElasticSearchのステータス

画像を消してしまったので後日撮ったデータで申し訳ないですが、２（デフォルト）と１０の比較です

インデックス作成、検索クエリのレイテンシー、バルクスレッドプールの低下も明らかです。
もちろん他にも検証すべき視点や調査も必用ですが、少なくともパフォーマンス改善に繋がりそうだと分かりました。

検証環境では１ノードだったので２０コネクションが最もパフォーマンスが出ていましたが、トータル接続数の設定（デフォルトで２０）は変更していないので、本番２ノードに適用するには１０コネクションにしてみるのが良さそうです。
経過観察して、効果を報告できればと思います