User:Dominik.epple/DocumentConverterInstall

DocumentConverter Quickinstall Guide / Cheatsheet

Introduction

This document aims to be a condensed, HOWTO-style installation walkthrough.

Treat the other relevant documentation as reference material.

This document describes the fully clustered setup. Simpler setups can be derived as special cases.

This document has been created and verified on CentOS7 and OX App Suite 7.8.4.

Design description

We consider a clustered setup with multiple middleware nodes and multiple converter nodes.

We consider a highly available connection from the middleware nodes to the converter nodes. It is realized using HAproxy instances running locally on the middleware nodes, which do simple round-robin to the Apache instances of the frontend nodes. The Apache instances in turn do session-stickiness-aware routing to the converter nodes.

We need the Apache instances to get correct session-sticky routing behavior.

We need the HAproxy instances to connect to the Apache instances in a highly available fashion. If other components in your infrastructure already offer that functionality, they can be used instead. We present the HAproxy-based setup as an example of a fully working setup.

Software installation and configuration

Middleware nodes

Prerequisites: Drive is installed and working. We need Drive to upload test documents and verify their conversion. Since we are discussing a clustered setup, this also assumes that a clustered filestore is available and working. If you are only going for a single-node proof-of-concept setup, that is of course not relevant.

Packages to be installed:

open-xchange-documentconverter-api open-xchange-documentconverter-client

Configuration:

/opt/open-xchange/etc/permissions.properties:
# assume a global switch on for testing. further config cascade stuff etc is out of scope
# of this document.
com.openexchange.capability.document_preview=true
documentconverter-client.properties:
# this needs to be adjusted. will be discussed below.
com.openexchange.documentconverter.client.remoteDocumentConverterUrl=http://host[:port]/documentconverterws

Frontend nodes

None

Converter nodes

Packages to be installed:

open-xchange-documentconverter-server open-xchange-documentconverter-api readerengine

(Note: the official documentation also mentions "pdf2svg", which does not exist on CentOS7 at least; instead, a package named "readerengine-pdf2svg" is pulled in as a dependency. So for the moment let's assume we don't need to install that explicitly, but if you are following this guide on Debian you should double-check.)

Configuration:

/opt/open-xchange/etc/server.properties:
# Pick a unique route, which will be configured consistently in apache
com.openexchange.server.backendRoute=DC1
# Clustered setups need to listen not only on localhost
com.openexchange.connector.networkListenerHost=*
/opt/open-xchange/documentconverter/etc/documentconverter.properties:
# TODO figure out the recommended Cache setup for a clustered scenario
com.openexchange.documentconverter.RemoteCacheUrls = ?
# Default is 3. Adjust for your sizing.
com.openexchange.documentconverter.jobProcessorCount=3

Services configuration:

systemctl start open-xchange-documentconverter-server
systemctl enable open-xchange-documentconverter-server

Note: you don't need ConfigDB or any other DB access. Don't connect the converter nodes to any database. Don't make them join any Hazelcast cluster. No further configuration is required.

If you are annoyed by the (false) "unable to connect to ConfigDB" error messages in the logs, you can configure logback to suppress them (TODO: add information on how to do that).

Configure the middleware to converter connectivity

So that was the trivial part. Now it gets interesting.

First test: direct connectivity

Pick a middleware node and configure a direct connection to one converter service; here we assume the converter node is called dc1. By default the service listens on port 8008 and on the path /documentconverterws. Test it from the middleware node:

# curl http://dc1:8008/documentconverterws/
<html>
<head><meta charset="UTF-8"><title>Open-Xchange DC</title></head>
<body><h1 align="center">OX Software GmbH DC</h1>
<p>WebService is running...</p>
<p>Error Code: 0</p>
<p>API: v5</p></body>
</html>

That's how it should look. You should see HTTP status code 200 (not shown for clarity, but you can verify with curl -v), "WebService is running...", and "Error Code: 0".

If that is successful, go ahead and configure that URL in the middleware node:
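The check on the response body can be scripted. The following is a minimal sketch, assuming the two marker strings from the sample response above are sufficient to call a node healthy; check_dc_body is a hypothetical helper name and not part of any Open-Xchange package.

```shell
#!/bin/sh
# Sketch: decide whether a response body looks like a healthy converter.
# check_dc_body is a hypothetical helper, not shipped with OX App Suite.
# It reads the response body on stdin and returns 0 only if both markers
# from the sample response are present.
check_dc_body() {
    body=$(cat)
    printf '%s\n' "$body" | grep -q 'WebService is running' || return 1
    # The trailing "<" avoids matching other codes that start with 0.
    printf '%s\n' "$body" | grep -q 'Error Code: 0<' || return 1
    return 0
}

# Usage against a live node (dc1:8008 as in the text above).
# -f makes curl print nothing on HTTP errors, so the check then fails:
#   curl -sf http://dc1:8008/documentconverterws/ | check_dc_body && echo OK
```

The same helper can be reused unchanged for the later tests via Apache and HAproxy, since the response body is identical there.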

documentconverter-client.properties:
# direct connection to a single converter node (for this first test only)
com.openexchange.documentconverter.client.remoteDocumentConverterUrl=http://dc1:8008/documentconverterws

Do a testing cycle as described below.

If everything works, you confirmed that the middleware node and the converter node are configured correctly.

Now ensure all your middleware nodes and all your converter nodes are configured likewise. Make sure the converter nodes get unique com.openexchange.server.backendRoute values (see above).
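Duplicate route values are easy to miss when nodes are cloned from one another. A minimal sketch for spotting them, assuming you can collect the configured values one per line; find_duplicate_routes is a hypothetical helper name, and the ssh loop in the usage comment assumes SSH access to converter nodes called dc1 and dc2.

```shell
#!/bin/sh
# Sketch: report backendRoute values that are configured more than once.
# find_duplicate_routes is a hypothetical helper; it reads one route value
# per line on stdin and prints each value that occurs at least twice.
find_duplicate_routes() {
    sort | uniq -d
}

# Usage (assumes SSH access; the file path is the one used in this document):
#   for h in dc1 dc2; do
#       ssh "$h" "grep '^com.openexchange.server.backendRoute=' /opt/open-xchange/etc/server.properties" | cut -d= -f2
#   done | find_duplicate_routes
```

Empty output means all collected route values are unique.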

Second test: use Apache for loadbalancing

Pick one frontend node to configure its Apache for loadbalancing for the converter nodes.

There are sample configuration stanzas in our default configuration. They are just fine. Just make sure the route parameters match the ones from the converter nodes.

Take particular care to configure the Allow line correctly. The /documentconverterws endpoint must not be made available publicly!

A sample Apache configuration looks like this:

<Proxy balancer://oxcluster_docs>
    Order Deny,Allow
    Deny from all
    # configure the allowed IPs such that only the middleware nodes are able to access
    # the endpoint; /documentconverterws must not be made available publicly!
    Allow from 10.0.1
    BalancerMember http://dc1:8008 timeout=100 smax=0 ttl=60 retry=60 loadfactor=50 keepalive=On route=DC1
    BalancerMember http://dc2:8008 timeout=100 smax=0 ttl=60 retry=60 loadfactor=50 keepalive=On route=DC2
    # add further converter nodes, as many as you have
    ProxySet stickysession=JSESSIONID|jsessionid scolonpathdelim=On
    SetEnv proxy-initial-not-pooled
    SetEnv proxy-sendchunked
</Proxy>

ProxyPass /documentconverterws balancer://oxcluster_docs/documentconverterws

With that configuration in place, test the connectivity from the middleware node to that endpoint, e.g. (assuming the frontend node is called frontend1):

# curl http://frontend1/documentconverterws
<html>
<head><meta charset="UTF-8"><title>Open-Xchange DC</title></head>
<body><h1 align="center">OX Software GmbH DC</h1>
<p>WebService is running...</p>
<p>Error Code: 0</p>
<p>API: v5</p></body>
</html>

The response looks exactly like before. The difference is just that we access the service now via Apache.

Execute the test multiple times. It must answer fast every time. If some answers are slow (maybe only once or twice) that is an indication that some route configuration was incorrect and Apache disabled some converter targets.
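Repeating the request by hand gets tedious, so the timing check can be scripted. This is a sketch only: flag_slow is a hypothetical helper, and the threshold of 1 second in the usage comment is an assumption you should adapt to your environment.

```shell
#!/bin/sh
# Sketch: flag slow responses in a series of request timings.
# flag_slow is a hypothetical helper. $1 is the threshold in seconds;
# stdin carries one total request time (in seconds) per line.
# It prints one line per slow request and returns non-zero if any was slow.
flag_slow() {
    awk -v limit="$1" '
        $1 + 0 > limit + 0 {
            print "slow: request " NR " took " $1 "s"
            bad = 1
        }
        END { exit bad }
    '
}

# Usage: ten requests via Apache, total time printed by curl's -w option:
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       curl -s -o /dev/null -w '%{time_total}\n' http://frontend1/documentconverterws
#   done | flag_slow 1
```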

Make sure you are actually getting responses from the different converter nodes by whatever means suit you (e.g. tcpdump on the converter nodes, looking at Apache's balancer-manager to verify all nodes show up as Ok, the Elected number increases equally for all converter nodes, etc.)

Configure that endpoint in your test middleware node (com.openexchange.documentconverter.client.remoteDocumentConverterUrl). Restart the middleware service.

Do a full test cycle as described below and in particular make sure that all preview images are rendered correctly in pop-out view. That is the test case for correct session stickiness.

If everything works correctly, make sure the Apache configuration is adjusted accordingly on all your Apache / frontend nodes.

Now we have high availability for the converter nodes, but the Apache service is still a single point of failure (SPOF).

Third test: adding local HAproxy instances

We add local HAproxy instances to eliminate the Apache SPOF.

We assume HAproxy is already running locally for other services. If not, install it and do a basic configuration according to [HAproxy].

Add a stanza like this:

listen converter
    mode              http
    option            dontlognull
    option            redispatch
    no option         httpclose
    timeout connect   10s
    timeout client    2m
    timeout server    2m
    bind 127.0.0.1:8008
    balance roundrobin
    option httpchk GET /documentconverterws
    server frontend1 frontend1:80 check port 80 inter 6000 rise 3 fall 1
    server frontend2 frontend2:80 check port 80 inter 6000 rise 3 fall 1
    # add more frontend nodes if you have them

Re-test to confirm that this listener also works correctly:

# curl http://127.0.0.1:8008/documentconverterws
<html>
<head><meta charset="UTF-8"><title>Open-Xchange DC</title></head>
<body><h1 align="center">OX Software GmbH DC</h1>
<p>WebService is running...</p>
<p>Error Code: 0</p>
<p>API: v5</p></body>
</html>

As before, execute the test multiple times and verify in Apache's balancer-manager on each frontend node that everything works as expected.

Look at the HAproxy stats endpoint to verify all frontend nodes show as OK and get roughly the same number of sessions.
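If the stats endpoint is enabled, HAproxy can return the same data as CSV when ";csv" is appended to the stats URI, which makes the check scriptable. The following is a sketch under assumptions: haproxy_backend_summary is a hypothetical helper, and the stats address in the usage comment (127.0.0.1:8080/stats) is a placeholder that depends entirely on your own "stats uri" configuration.

```shell
#!/bin/sh
# Sketch: summarize the servers of one HAproxy proxy from the CSV stats.
# haproxy_backend_summary is a hypothetical helper. $1 is the proxy name
# (e.g. "converter"); stdin carries the CSV stats output.
# It prints "<server> <status> <total sessions>" for each server line.
haproxy_backend_summary() {
    awk -F, -v proxy="$1" '
        NR == 1 {
            # The header line starts with "# "; map column names to positions.
            sub(/^# /, "", $1)
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $col["pxname"] == proxy && $col["svname"] != "FRONTEND" && $col["svname"] != "BACKEND" {
            print $col["svname"], $col["status"], $col["stot"]
        }
    '
}

# Usage (the stats URL is an assumption; adapt it to your configuration):
#   curl -s 'http://127.0.0.1:8080/stats;csv' | haproxy_backend_summary converter
```

All listed servers should report UP, with session totals of similar size.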

If that looks good, configure the HAproxy endpoint http://127.0.0.1:8008/documentconverterws in your test middleware node (com.openexchange.documentconverter.client.remoteDocumentConverterUrl). Restart the middleware service.

Do a full test cycle (as described below).

Final setup

Assuming the test middleware node works as described in the previous section, do the same HAproxy configuration on all middleware nodes and configure the HAproxy endpoint http://127.0.0.1:8008/documentconverterws in all middleware nodes' middleware configuration.

The fully clustered setup should be working now.

Testing

It is highly recommended to make sure you always end up on the same middleware node for testing, by whatever means your infrastructure requires. For this document we assume you can access the middleware nodes directly (internal testing). Otherwise you'd need to expose special entry points to your setup.

If you can't ensure that, you need to keep all middleware nodes configured identically during setup and testing as well, which is very cumbersome and error-prone.

If you reconfigured something on the middleware node(s), restart the service there:

service open-xchange restart

Wipe out caches on the converter node(s) and restart the service there:

service open-xchange-documentconverter-server stop
rm -rf /var/spool/open-xchange/documentconverter/readerengine.*/*
service open-xchange-documentconverter-server start

(Be careful with your paths when copy-pasting; I take no responsibility if you remove the wrong thing.)

If your test user is still logged in from a previous test, log out.

Log in as your test user.

Access Drive. Upload test documents if not done before: preferably some small trivial documents as well as some large multi-page documents with embedded diagrams etc.

Switch to "icons" or "tiles" view. You should get nice preview thumbnails for each document. (This test is only meaningful the first time, since the previews are then stored in the database. TODO: find out how to wipe / re-test that.)

Click the "eye" icon to start the Document Viewer. You should be able to view documents with reasonable speed. Scrolling through the pages should be possible and fast.

Click the "pop out" icon at the top right to use pop-out view. Verify all pages are rendered correctly in the full-page view and in the thumbnail previews.

Troubleshooting

In our experiments, when something was not working, it was most often a misconfiguration at the network level (wrong hostnames, IPs, ports, missing firewall adjustments, etc.).

The documentconverter service