Workers setup with nginx
Bo Jeanes edited this page 2021-01-12 16:14:04 +11:00

WIP

The current documentation for setting up workers is not very easy to follow.

This is how I changed my setup to use workers. It seems to work for me now.

**WARNING: SHOULD BE REVIEWED! WIP!** Look at the issues below first.

I assume you already have a working Synapse configuration. I'm not putting whole config files here.

Background

  • My setup has around 400 users, mostly around 300 concurrent connections during the day, 4500 local rooms, and some big federated rooms too.
  • The server runs in a VMware VM with 16 CPUs and 32 GB RAM (half of it for PostgreSQL).
  • The DB is 14 GB.
  • nginx is used as a reverse proxy.
  • The Synapse homeserver process runs at 100-120% CPU all day long, but never uses more of the available CPUs (a single Synapse process is essentially limited to one core).
  • My nginx graph shows an average of 140 requests/s during working hours.
  • I'm using the Debian packages from matrix.org and starting Synapse with systemd.

Which workers make sense?

analysing old logs

First, I wanted to check which endpoints are requested the most on my installation. I grepped 24 hours of my nginx access log for the endpoints of every worker, as listed in https://github.com/matrix-org/synapse/blob/master/docs/workers.md. Below are the grep patterns I used for each worker's endpoints:

synapse.app.synchrotron

grep -E '(/_matrix/client/(v2_alpha|r0)/sync|/_matrix/client/(api/v1|v2_alpha|r0)/events|/_matrix/client/(api/v1|r0)/initialSync|/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync)'

synapse.app.federation_reader

grep -E '(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'

synapse.app.media_repository

grep -E '(/_matrix/media/|/_synapse/admin/v1/purge_media_cache|/_synapse/admin/v1/room/.*/media.*|/_synapse/admin/v1/user/.*/media.*|/_synapse/admin/v1/media/.*|/_synapse/admin/v1/quarantine_media/.*)'

synapse.app.client_reader

grep -E '(/_matrix/client/(api/v1|r0|unstable)/publicRooms|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/joined_members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/context/.*|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/members|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state|/_matrix/client/(api/v1|r0|unstable)/login|/_matrix/client/(api/v1|r0|unstable)/account/3pid|/_matrix/client/(api/v1|r0|unstable)/keys/query|/_matrix/client/(api/v1|r0|unstable)/keys/changes|/_matrix/client/versions|/_matrix/client/(api/v1|r0|unstable)/voip/turnServer|/_matrix/client/(api/v1|r0|unstable)/joined_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups|/_matrix/client/(api/v1|r0|unstable)/publicised_groups/|/_matrix/client/(api/v1|r0|unstable)/pushrules/.*|/_matrix/client/(api/v1|r0|unstable)/groups/.*|/_matrix/client/(r0|unstable)/register|/_matrix/client/(r0|unstable)/auth/.*/fallback/web)'

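Note: I didn't include /_matrix/client/(api/v1|r0|unstable)/rooms/.*/messages in the client_reader pattern. Without /messages the count is 175576; with /messages it was 9998816, which is exactly the total number of requests in the log, so that pattern most likely matched every line (I'm not sure why).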

synapse.app.user_dir

grep -E '/_matrix/client/(api/v1|r0|unstable)/user_directory/search'

synapse.app.frontend_proxy

grep -E '/_matrix/client/(api/v1|r0|unstable)/keys/upload'

synapse.app.event_creator

grep -E '(/_matrix/client/(api/v1|r0|unstable)/rooms/.*/send|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/state/|/_matrix/client/(api/v1|r0|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)|/_matrix/client/(api/v1|r0|unstable)/join/|/_matrix/client/(api/v1|r0|unstable)/profile/)'
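The patterns above can be turned into per-worker counts and percentages with a small script like the one below. It's a minimal sketch, not exactly what I ran: it assumes one request per line in the access log, and the script and helper names (count_requests, percent, report) are made up for this example.

```shell
#!/usr/bin/env bash
# Count how many access-log lines match each worker's endpoint pattern.
# Assumes one request per log line; helper names are hypothetical.

# count_requests LOG PATTERN: number of lines in LOG matching the ERE PATTERN
count_requests() {
  # grep -c prints 0 but exits with status 1 when nothing matches; keep the 0.
  grep -cE -- "$2" "$1" || true
}

# percent PART TOTAL: PART as a percentage of TOTAL, two decimals
percent() {
  awk -v p="$1" -v t="$2" 'BEGIN { if (t == 0) print "0.00"; else printf "%.2f\n", 100 * p / t }'
}

# report LOG NAME=PATTERN...: print "name count percent%" for each worker
report() {
  local log=$1
  shift
  local total pair name pattern count
  total=$(( $(wc -l < "$log") ))
  for pair in "$@"; do
    name=${pair%%=*}    # text before the first "="
    pattern=${pair#*=}  # text after the first "="
    count=$(count_requests "$log" "$pattern")
    printf '%s %s %s%%\n' "$name" "$count" "$(percent "$count" "$total")"
  done
}

# Only run when called with arguments: LOG NAME=PATTERN...
if [ "$#" -gt 0 ]; then
  report "$@"
fi
```

For example (hypothetical script name): ./count-workers.sh /var/log/nginx/access.log 'user_dir=/_matrix/client/(api/v1|r0|unstable)/user_directory/search'. A log line can match the patterns of more than one worker (e.g. /rooms/.*/state is in both the client_reader and event_creator patterns), so the counts can overlap.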

results

| worker endpoints | requests/day | percent |
|---|---:|---:|
| synchrotron | 9017921 | 90.19% |
| federation_reader | 321413 | 3.21% |
| media_repository | 115749 | 1.16% |
| client_reader | 175576 | 1.76% |
| user_dir | 1341 | 0.01% |
| frontend_proxy | 6936 | 0.07% |
| event_creator | 26876 | 0.27% |
| total | 9665812 | 96.67% |
| total requests | 9998816 | 100.00% |
| others | 333004 | 3.33% |

So the synchrotron is the worker that makes the most sense for me: it would take about 90% of the requests. Since I think my setup is fairly standard, I guess this is almost always the case.

Setting up synchrotron worker(s)

WARNING: I broke parts of my setup quite badly while doing this on a live server.

homeserver.yaml

Add this to the existing listeners section of the config:

listeners:
  # The TCP replication port
  - port: 9092
    bind_address: '127.0.0.1'
    type: replication
  # The HTTP replication port
  - port: 9093
    bind_address: '127.0.0.1'
    type: http
    resources:
     - names: [replication]

Also add this to homeserver.yaml

worker_app: synapse.app.homeserver
daemonize: false 

Restart Synapse to check that it still works:

# systemctl restart matrix-synapse
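To check that both replication listeners came up after the restart, you can look at the listening TCP sockets. This is a sketch assuming the iproute2 ss tool is available and the ports 9092/9093 used above; listening_ports and check_ports are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Check that the replication ports are listening after the restart.
# Assumes the iproute2 "ss" tool; helper names are hypothetical.

# listening_ports: read `ss -ltn` output on stdin, print one port per line
listening_ports() {
  # Skip the header; column 4 is "Local Address:Port" (IPv4, [IPv6] or *).
  awk 'NR > 1 { n = split($4, parts, ":"); print parts[n] }'
}

# check_ports PORT...: report each PORT as "listening" or "MISSING"
# (reads `ss -ltn` output on stdin)
check_ports() {
  local ports port
  ports=$(listening_ports)
  for port in "$@"; do
    if grep -qx -- "$port" <<< "$ports"; then
      echo "$port listening"
    else
      echo "$port MISSING"
    fi
  done
}

# Example: ./check-replication.sh 9092 9093
if [ "$#" -gt 0 ]; then
  ss -ltn | check_ports "$@"
fi
```

A MISSING port means nothing is listening on it, so the new listener entries were not picked up.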

workers configuration

Note: if you work as root, remember to give the config files to the matrix-synapse user after creating them.

I used the systemd instructions from https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers, but changed them so that I can start multiple synchrotron workers.

mkdir /etc/matrix-synapse/workers

/etc/matrix-synapse/workers/synchrotron-1.yaml

worker_app: synapse.app.synchrotron

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8083
   resources:
     - names:
       - client

worker_daemonize: False
worker_pid_file: /var/run/synchrotron1.pid
worker_log_config: /etc/matrix-synapse/synchrotron1-log.yaml
# This is needed until https://github.com/matrix-org/synapse/pull/7133 is merged
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False

If you want to run multiple synchrotrons, create the other configs like this: sed -e 's/synchrotron1/synchrotron2/g' -e 's/8083/8084/' /etc/matrix-synapse/workers/synchrotron-1.yaml > /etc/matrix-synapse/workers/synchrotron-2.yaml

Don't forget to create a log config file for each worker as well.

/etc/matrix-synapse/synchrotron1-log.yaml

With this config, the worker writes its log to /var/log/matrix-synapse/synchrotron1.log. The config could probably be trimmed down.

version: 1

formatters:
  precise:
    format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s - %(message)s'

filters:
  context:
    (): synapse.util.logcontext.LoggingContextFilter
    request: ""

handlers:
  file:
    class: logging.handlers.RotatingFileHandler
    formatter: precise
    filename: /var/log/matrix-synapse/synchrotron1.log
    maxBytes: 104857600
    backupCount: 10
    filters: [context]
    encoding: utf8
    level: DEBUG
  console:
    class: logging.StreamHandler
    formatter: precise
    level: WARN

loggers:
    synapse:
        level: WARN

    synapse.storage.SQL:
        level: INFO

    synapse.app.synchrotron:
        level: DEBUG
root:
    level: WARN
    handlers: [file, console]
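If you want more than one synchrotron, the worker and log configs can be generated in one go instead of editing copies by hand. A sketch assuming the file names used on this page (workers/synchrotron-1.yaml and synchrotron1-log.yaml) and that synchrotron N listens on port 8082+N (8083, 8084, ...); make_synchrotron and CONF_DIR are hypothetical names.

```shell
#!/usr/bin/env bash
# Generate synchrotron-N.yaml and synchrotronN-log.yaml from the synchrotron 1 files.
# Assumptions: file names as on this page, synchrotron N listens on 8082+N.

CONF_DIR=${CONF_DIR:-/etc/matrix-synapse}  # hypothetical override for testing

# make_synchrotron N: write the worker config and log config for synchrotron N
make_synchrotron() {
  local n=$1
  local port=$((8082 + n))
  if [ "$n" = 1 ]; then
    # synchrotron 1 is the template; writing it onto itself would truncate it.
    echo "synchrotron 1 is the template, skipping" >&2
    return 0
  fi
  # Worker config: rename pid/log-config references and change the HTTP port.
  sed -e "s/synchrotron1/synchrotron${n}/g" \
      -e "s/port: 8083/port: ${port}/" \
      "$CONF_DIR/workers/synchrotron-1.yaml" > "$CONF_DIR/workers/synchrotron-${n}.yaml"
  # Log config: only the log file name changes.
  sed -e "s/synchrotron1/synchrotron${n}/g" \
      "$CONF_DIR/synchrotron1-log.yaml" > "$CONF_DIR/synchrotron${n}-log.yaml"
}

# Example (as root): ./make-synchrotrons.sh 2 3
for n in "$@"; do
  make_synchrotron "$n"
done
```

For example, ./make-synchrotrons.sh 2 3 (hypothetical script name) creates synchrotron-2.yaml (port 8084) and synchrotron-3.yaml (port 8085) with matching log configs. Afterwards, hand the new files to the matrix-synapse user (e.g. chown matrix-synapse: on each file) and add the new ports to the nginx upstream.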

Starting the worker

I tried to start the worker with synctl, but I had to change the config to include /etc/matrix-synapse/conf.d/* because synctl wasn't reading those files. Since I use systemd to start Synapse in production, it's better to start the workers with systemd directly.

systemd

I followed this: https://github.com/matrix-org/synapse/tree/master/contrib/systemd-with-workers

I also created an extra (templated) systemd service to be able to run multiple synchrotrons.

/etc/systemd/system/matrix-synapse-worker-synchrotron@.service

[Unit]
Description=Synapse Matrix Worker
After=matrix-synapse.service
BindsTo=matrix-synapse.service

[Service]
Type=notify
NotifyAccess=main
User=matrix-synapse
WorkingDirectory=/var/lib/matrix-synapse
EnvironmentFile=/etc/default/matrix-synapse
ExecStart=/opt/venvs/matrix-synapse/bin/python -m synapse.app.synchrotron --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/workers/synchrotron-%i.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=always
RestartSec=3
SyslogIdentifier=matrix-synapse-synchrotron-%i

[Install]
WantedBy=matrix-synapse.service
  • Reload the systemd config: systemctl daemon-reload
  • Start synchrotron1: systemctl start matrix-synapse-worker-synchrotron@1.service
  • Check the logs: journalctl -xe -f -u matrix-synapse-worker-synchrotron@1.service
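With several synchrotrons, enabling and starting each instance can be looped. Enabled instances start together with matrix-synapse.service, because of the WantedBy=matrix-synapse.service line in the unit. A sketch assuming the template unit above is installed; unit_name and start_synchrotrons are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Enable and start several instances of the templated synchrotron unit.
# Assumes matrix-synapse-worker-synchrotron@.service is installed as above.

# unit_name N: systemd unit name for synchrotron N (it reads synchrotron-N.yaml)
unit_name() {
  printf 'matrix-synapse-worker-synchrotron@%s.service\n' "$1"
}

# start_synchrotrons N...: reload unit files, then enable and start each instance
start_synchrotrons() {
  systemctl daemon-reload || return
  local n
  for n in "$@"; do
    # enable: pulled in by matrix-synapse.service (WantedBy=); --now: start it now as well
    systemctl enable --now "$(unit_name "$n")" || return
  done
}

# Example (as root): ./start-synchrotrons.sh 1 2
if [ "$#" -gt 0 ]; then
  start_synchrotrons "$@"
fi
```

For example, ./start-synchrotrons.sh 1 2 (hypothetical script name, run as root) enables and starts matrix-synapse-worker-synchrotron@1.service and matrix-synapse-worker-synchrotron@2.service.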

If this worked, you should now have an extra Python process for synchrotron1, but it doesn't handle any traffic yet.

Nginx config

Some extras

Add this somewhere inside the server { } block of your default_server:

        location /nginx_status {
                stub_status on;
                access_log   off;
                allow 127.0.0.1;
                allow ::1;
                deny all;
        }

You can then get an idea of the requests you receive with:

$ curl http://127.0.0.1/nginx_status
Active connections: 270 
server accepts handled requests
 172758 172758 3500311 
Reading: 0 Writing: 126 Waiting: 144 
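The requests counter in that output only ever increases, so sampling it twice gives an average request rate, which is one way to get a figure comparable to the 140 requests/s mentioned above. A sketch assuming the /nginx_status location above and curl; the helper names are hypothetical. stub_status counts every request this nginx handles, not only Matrix ones.

```shell
#!/usr/bin/env bash
# Parse nginx stub_status output and estimate the request rate.
# Assumes the /nginx_status location above and curl; helper names are hypothetical.

# parse_stub_status: read stub_status text on stdin, print key=value pairs
parse_stub_status() {
  awk '
    /^Active connections:/ { active = $3 }
    NR == 3 { accepts = $1; handled = $2; requests = $3 }
    /^Reading:/ { reading = $2; writing = $4; waiting = $6 }
    END {
      printf "active=%s accepts=%s handled=%s requests=%s reading=%s writing=%s waiting=%s\n",
             active, accepts, handled, requests, reading, writing, waiting
    }'
}

# requests_counter: print the total "requests" counter (3rd line, 3rd field)
requests_counter() {
  awk 'NR == 3 { print $3 }'
}

# request_rate BEFORE AFTER SECONDS: average requests/s between two samples
request_rate() {
  awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Example: ./nginx-rate.sh http://127.0.0.1/nginx_status 10
if [ "$#" -gt 0 ]; then
  url=$1
  interval=${2:-10}
  before=$(curl -s "$url" | requests_counter)
  sleep "$interval"
  after=$(curl -s "$url" | requests_counter)
  echo "$(request_rate "$before" "$after" "$interval") requests/s"
fi
```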

upstream synchrotrons

First, I set up a pool for the synchrotrons (use the ports configured in the workers). This way, I can scale out when there is too much load. I also added a log format so that nginx logs which worker handles which request (borrowed from somewhere I don't remember):

Place this in your nginx config (I put it in my vhost config, outside of server {}):

log_format backend '$remote_addr - $remote_user - [$time_local] $upstream_addr: $request $status URT:$upstream_response_time request_time $request_time';

upstream synchrotron {
#               ip_hash; # this might help in some cases, not in mine
#               server 127.0.0.1:8008; # main synapse process, to roll back when it goes wrong (reacted strangely)
               server 127.0.0.1:8083; # synchrotron1
#               server 127.0.0.1:8084; # synchrotron2
#               server 127.0.0.1:8085; # synchrotron3
}

Then change the access log format of your vhost:

server {
#[...]
       access_log /var/log/nginx/matrix-access.log backend;
#[...]
}
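With the backend log format, the access log shows which upstream served each request, so you can check how the load is spread between the main process and the synchrotrons. A sketch that assumes exactly the log_format backend line above; requests_per_upstream is a hypothetical name. When nginx tried several upstreams for one request, $upstream_addr holds a comma-separated list and this simple pattern skips the line.

```shell
#!/usr/bin/env bash
# Count requests per upstream in a log written with the "backend" log_format.
# Helper name is hypothetical.

# requests_per_upstream: read log lines on stdin, print "count upstream", busiest first
requests_per_upstream() {
  awk '
    # The format puts "] <upstream_addr>: " right after [$time_local].
    match($0, /] [^ ]+: /) {
      # Drop the leading "] " and the trailing ": " of the match.
      count[substr($0, RSTART + 2, RLENGTH - 4)]++
    }
    END { for (u in count) print count[u], u }' | sort -rn
}

# Example: ./per-upstream.sh /var/log/nginx/matrix-access.log
if [ "$#" -gt 0 ]; then
  requests_per_upstream < "$1"
fi
```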

reverse proxy the endpoints

In my server {} section I set up several locations (to avoid one very big regex):

       location ~ ^/_matrix/client/(v2_alpha|r0)/sync$ {
               proxy_pass http://synchrotron$request_uri;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }
       location ~ ^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$ {
               proxy_pass http://synchrotron$request_uri;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }
       location ~ ^/_matrix/client/(api/v1|r0)/initialSync$ {
               proxy_pass http://synchrotron$request_uri;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }
       location ~ ^/_matrix/client/(api/v1|v2_alpha|r0)/events$ {
               proxy_pass http://synchrotron$request_uri;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }

Reload the nginx config (check it with nginx -t, then run systemctl reload nginx), and your synchrotron worker should start getting traffic.
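To double-check which request paths the four synchrotron locations catch, the regexes can be tried on sample paths. A sketch using bash's [[ =~ ]] (POSIX extended regex), which gives the same result as nginx's PCRE for these simple patterns; goes_to_synchrotron is a hypothetical name. nginx matches locations against the path without the query string, so test paths without ?since=....

```shell
#!/usr/bin/env bash
# Check which request paths the synchrotron location regexes would catch.
# Bash [[ =~ ]] uses POSIX ERE, which agrees with nginx's PCRE for these patterns.

# The four location regexes from the server {} block above.
SYNC_LOCATIONS=(
  '^/_matrix/client/(v2_alpha|r0)/sync$'
  '^/_matrix/client/(api/v1|r0)/rooms/[^/]+/initialSync$'
  '^/_matrix/client/(api/v1|r0)/initialSync$'
  '^/_matrix/client/(api/v1|v2_alpha|r0)/events$'
)

# goes_to_synchrotron PATH: succeed if PATH (no query string) matches a location
goes_to_synchrotron() {
  local re
  for re in "${SYNC_LOCATIONS[@]}"; do
    if [[ $1 =~ $re ]]; then
      return 0
    fi
  done
  return 1
}

# Example: ./check-sync-routes.sh /_matrix/client/r0/sync /_matrix/client/r0/login
for path in "$@"; do
  if goes_to_synchrotron "$path"; then
    echo "$path -> synchrotron"
  else
    echo "$path -> other location"
  fi
done
```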

federation_reader

workers/federation_reader.yaml

synapse.app.federation_reader listens on port 8011:

worker_app: synapse.app.federation_reader

worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
    - type: http
      port: 8011
      resources:
          - names: [federation]

            
worker_pid_file: "/var/run/app.federation_reader.pid"
worker_daemonize: False
worker_log_config: /etc/matrix-synapse/federation-reader-log.yaml
# This is needed until https://github.com/matrix-org/synapse/pull/7133 is merged
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False

Here I put the ^/_matrix/federation/v1/send/ endpoint in its own location, since the docs say that all requests to it must go to the same worker instance (it can't be spread over multiple federation_readers):

        location ~ ^/_matrix/federation/v1/send/ {
                proxy_pass http://127.0.0.1:8011$request_uri;
                proxy_set_header X-Forwarded-For $remote_addr;
                proxy_set_header Host $host;
        }
# and a big regex for the rest
        location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
                proxy_pass http://127.0.0.1:8011$request_uri;
                proxy_set_header X-Forwarded-For $remote_addr;
                proxy_set_header Host $host;
        }
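Among regex locations, nginx uses the first one that matches, in the order they appear in the config, which is why the /send/ location comes first. The sketch below emulates that order for the two federation locations, e.g. to check that /send_join/ paths don't end up in the /send/ location. FED_READER_RE and route_federation are hypothetical names, and paths matching neither are assumed to fall through to a catch-all location proxying to the main process (not shown on this page).

```shell
#!/usr/bin/env bash
# Emulate nginx's "first matching regex location wins" for the two federation
# locations above. Paths matching neither are assumed to go to the main process.

# Copied from the second federation location (the "big regex").
FED_READER_RE='^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/)'

# route_federation PATH: print which location would handle PATH
route_federation() {
  if [[ $1 =~ ^/_matrix/federation/v1/send/ ]]; then
    echo "send location (port 8011)"
  elif [[ $1 =~ $FED_READER_RE ]]; then
    echo "big regex location (port 8011)"
  else
    echo "main process"
  fi
}

# Example: ./check-fed-routes.sh /_matrix/federation/v1/send/123 /_matrix/federation/v1/version
for path in "$@"; do
  printf '%s -> %s\n' "$path" "$(route_federation "$path")"
done
```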

event_creator

/etc/matrix-synapse/workers/event_creator.yaml

worker_app: synapse.app.event_creator

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8102
   resources:
     - names:
       - client

worker_daemonize: False
worker_pid_file: /var/run/event_creator.pid
worker_log_config: /etc/matrix-synapse/event_creator-log.yaml
# This is needed until https://github.com/matrix-org/synapse/pull/7133 is merged
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False

nginx

# event_creator
       location ~ ^/_matrix/client/(api/v1|r0|unstable)(/rooms/.*/send|/rooms/.*/state/|/rooms/.*/(join|invite|leave|ban|unban|kick)$|/join/|/profile/) {
               proxy_pass http://127.0.0.1:8102$request_uri;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }

media_repository

/etc/matrix-synapse/workers/media_repository.yaml

worker_app: synapse.app.media_repository

# The replication listener on the synapse to talk to.
worker_replication_host: 127.0.0.1
worker_replication_port: 9092
worker_replication_http_port: 9093

worker_listeners:
 - type: http
   port: 8101
   resources:
     - names:
       - media

worker_daemonize: False
worker_pid_file: /var/run/media_repository.pid
worker_log_config: /etc/matrix-synapse/media_repository-log.yaml
# This is needed until https://github.com/matrix-org/synapse/pull/7133 is merged
send_federation: False
update_user_directory: False
start_pushers: False
notify_appservices: False

Nginx

# media_repository
       location ~ (^/_matrix/media/|^/_synapse/admin/v1/purge_media_cache$|^/_synapse/admin/v1/room/.*/media.*$|^/_synapse/admin/v1/user/.*/media.*$|^/_synapse/admin/v1/media/.*$|^/_synapse/admin/v1/quarantine_media/.*$) {
               proxy_pass http://127.0.0.1:8101$request_uri;
               proxy_set_header X-Forwarded-For $remote_addr;
               proxy_set_header Host $host;
       }

Issues

These are the issues I have run into so far (some of them might also be related to big federated rooms):