Installation of FeatureHub
Deployment Options
As explained in the Architecture section, there are a number of different combinations for deploying FeatureHub, and they typically differ in where you will deploy the infrastructure, how you want your clients to use it, and how much use it will get. As of 1.5.0, FeatureHub is designed to support anything from very low usage of only hundreds of requests a day up to tens of millions of requests a day or more.
To get you up and going quickly, we have created a number of different Docker options using Docker-Compose.
You can check for the latest versions here.
To install, grab the latest tagged release of FeatureHub, e.g.:
curl -L https://github.com/featurehub-io/featurehub-install/archive/refs/tags/featurehub-1.4.1.tar.gz | tar xv
Make sure you have your Docker server running and docker-compose installed if you are using it (or your swarm set up if you are using docker stack). These example stacks are great for experimenting with, or for understanding the different capabilities of the FeatureHub stack - for example read replica databases, OAuth2 configuration for your provider, multiple NATS servers and so forth.
Evaluation Deployment
For the Evaluation option (not recommended for production):
cd featurehub-install-featurehub-1.4.1/docker-compose-options/all-in-one-h2
docker compose up
This will install all the necessary components including FeatureHub Admin Console. You can now load it on http://localhost:8085
What makes this an evaluation-only option is simply the database used (H2): it writes to the local disk and is not intended for a long-running or highly performant, concurrent system.
Production-Ready Deployments
Streaming | App traffic volume | Option | Recommendation |
---|---|---|---|
Yes | Low | 1a | Suits a simple container deployment platform like ECS or Compute Engine with a single Docker instance, using the Party Server |
Yes | Mid-High | 2a | Suits a Kubernetes deployment like GKE, EKS or AKS, or a more complex ECS-based deploy, with Edge/Dacha deployed separately from the Admin App |
No | Low | 1b | Suits a party-server-ish deployment with a single database on Cloud Run or Azure Container Instances. Could also deploy to a platform like ECS or Compute Engine, but less cost-effective. |
No | Mid | 2b | Suits an MR deployment with a single master and one or more read replicas, with Edge-REST pointed at the read replicas. |
Option 1a - Low Volume Deployment (Streaming)
With Option 1a, the Party Server, all of the services are deployed in a single running container. Internally this is packaged as a single Java server running four different services (a static web server serving the Admin App, the Management Repository server, the cache and the Edge service), along with the NATS service bus to provide cache and streaming services. This allows you to run just one of these containers (for instance) and still support a full streaming service talking to an external database.
The image is the same as the basic evaluation image with the difference being the database (we recommend Postgres).
Because the single container is responsible for handling all incoming requests (including requests for features), it should be able to handle around 150-200 concurrent requests per CPU, although streaming requests will further limit that capacity. As such, Edge traffic competes with Admin traffic.
You can run multiple Party Servers; in this case the `nats.urls` configuration (see below) must be set correctly for discovery (along with allowing network traffic between them). If you start doing this, it is likely a better choice to split them into multiple parts - with Admin on one server and Edge/Dacha on others. If you think your usage will grow, we encourage you to use different DNS hosts pointing to the same server for features vs the Admin app.
Setup Instructions
For Postgres option:
cd featurehub-install-featurehub-1.4.1/docker-compose-options/all-in-one-postgres
docker compose up
Or for MySQL option:
cd featurehub-install-featurehub-1.4.1/docker-compose-options/all-in-one-mysql
docker compose up
This will install all the necessary components including FeatureHub Admin Console. You can now load it on http://localhost:8085
Option 1b - Low Volume Deployment (Non-streaming)
This deployment, known as `party-server-ish`, is different from the evaluation image: it deploys only the Management Repository and a version of Edge that talks to the database. The `party-server-ish` serves the website, Admin App and Edge-REST applications inside a single process. There is no NATS or Dacha, and no SSE-based streaming capability.
This option is suitable if you are only using GET requests (the test API for updating features remains available), such as for mobile or web applications.
As with all deploys, you can configure a read replica for each container, and Edge requests will hit the replica by default (as they are read only).
Option 2a - Streaming Scalable Deployment
This option is best if you want to run FeatureHub in production at scale. Running separate instances of Edge, Cache, NATS and the FeatureHub MR Server means you can deploy these components independently for scalability and redundancy, and Docker images are provided for each of these services (see our docker-compose section below).
Because they are deployed in separate containers, you have considerably greater control over what network traffic gains access to each of these pieces, and they do not all sit under the same web server. This kind of deployment is intended for situations where you want streaming support, or where you want much greater volume or responsiveness than the 2b solution can provide. The Dacha servers support massive horizontal scaling of features and feature updates via the NATS cluster, scaling up as necessary without creating load on the database.
We provide an installation option with Postgres database. It brings up the Admin App (MR), the cache (Dacha), the Edge server, the distributed bus (NATS) and the database all as separate services. Edge runs on a different port to the Admin App and shows how you can use a different URL to serve traffic for feature consumers from your Admin App.
Setup Instructions
cd featurehub-install-featurehub-1.4.1/docker-compose-options/all-separate-postgres
docker compose up
There is also a Helm chart available for production Kubernetes deployment of this option. Please follow the documentation here. It doesn’t include a Postgres or NATS server, as generally your cloud provider will have a managed Postgres service, and NATS has its own Kubernetes Helm charts for scalable, reliable deploys.
Option 2b - Non-Streaming Scalable Deployment
This option is limited only by the number of read replicas you can support and the method you have over balancing access
to these replicas. This deployment uses the separation of mr
(the Admin App and its backend) from edge-rest
instead of bundling
them together and configuring a read replica for edge-rest
(the only time we recommend doing this). As many cloud providers
allow you to configure multiple active read replicas, potentially across different zones of the world, this allows you to scale
your connectivity across those replicas. See the documentation below on Database Read Replicas for how to configure this.
Cloud Deployments
Deploying FeatureHub (non-streaming) on Google Cloud Run
Google Cloud Run lets you spin up a container instance and multiplex requests to it, making it directly available as soon as you have configured it. These are basic instructions on how to do this.
Create your Cloud SQL Instance
In this example we use a Postgres 13 instance of the smallest possible size and deploy a 2-CPU, 512MB Cloud Run instance that scales from 0 to 3 instances, allowing up to 400 concurrent incoming requests per instance. Each CPU is capable of supporting around 200 concurrent incoming Edge requests. The CPU of the database affects the speed at which the instances respond - for example, we were only able to sustain around 50 requests per second (with around a 650ms time per request) with a 0.6 CPU database.
export GCP_REGION=us-east1
export GCP_ZONE=us-east1-b
gcloud config set project your-project
gcloud config set compute/zone $GCP_ZONE
We are now going to create a Cloud SQL database, so you need to choose a root password, a database name and a schema name. We will create a very small instance that is zonal only, has no daily backup, and has connectivity via public IP but with SSL required - Cloud SQL pricing gives you more details on how much this will cost. You can of course choose a larger one, but this initial deployment will probably be throwaway as it is quite easy to repeat. This step takes a while.
export FH_DB_NAME=featurehub-db
export FH_DB_PASSWORD=FeatureHub17#
export FH_DB_SCHEMA=featurehub
gcloud sql instances create $FH_DB_NAME --database-version=POSTGRES_13 --zone=$GCP_ZONE --tier=db-f1-micro "--root-password=$FH_DB_PASSWORD" --assign-ip --require-ssl --storage-type=SSD
This should just show you a database schema called postgres:
gcloud sql databases list --instance=$FH_DB_NAME
Now create the new featurehub database schema:
gcloud sql databases create $FH_DB_SCHEMA --instance $FH_DB_NAME
Now get the "connection name" - it is the `connectionName` parameter from this:
gcloud sql instances describe $FH_DB_NAME
You need it in the custom properties below. In my case this was:
backendType: SECOND_GEN
connectionName: featurehub-example:us-central1:featurehub-db
databaseVersion: POSTGRES_13
...
This becomes the name you pass to the container:
export FH_DB_CONN_NAME=featurehub-example:us-central1:featurehub-db
Create your Cloud Run deployment
export FH_CR_NAME=featurehub
export FH_IMAGE=featurehub/party-server-ish:1.5.4
Note that you need to be a Project Owner or Cloud Run Admin to allow unauthenticated traffic.
export HOST_URL=http://localhost
gcloud run deploy $FH_CR_NAME --image=$FH_IMAGE --min-instances=0 --max-instances=3 --cpu=2 --memory=512Mi --port=8085 --concurrency=400 "--set-env-vars=db.url=jdbc:postgresql:///$FH_DB_SCHEMA,db.username=postgres,db.password=$FH_DB_PASSWORD,db.minConnections=3,db.maxConnections=100,monitor.port=8701,db.customProperties=cloudSqlInstance=$FH_DB_CONN_NAME;socketFactory=com.google.cloud.sql.postgres.SocketFactory" --set-cloudsql-instances=$FH_DB_NAME --platform=managed --region=$GCP_REGION --allow-unauthenticated
If you are using OAuth2, you will need to set those properties, and we recommend setting `oauth2.disable-login` to true to prevent logging in without an OAuth2 connection.
Use the example Cloud Shell to ensure you can connect to it; note it can take a while to create.
Configuration
Run configuration
By this we mean the properties you can set to control the behaviour of different servers. As of 1.5.0 all FeatureHub controller properties are available as environment variables using the same case. If you have been using the mechanism introduced in 1.4.1 this still works but isn’t recommended going forward.
If you are using a system like Kubernetes, you can mount these properties in `/etc/app-config/application.properties` and `/etc/app-config/secrets.properties`.
Database configuration
All subsystems that talk to the database take these parameters. Even if you are using environment variables, we recommend using lower case so the database connections are correctly configured.
- `db.url` - the JDBC URL of the database server
- `db.username` - the username used to log in
- `db.password` - the password for the user
- `db.minConnections` - the minimum number of connections to hold open (default 3)
- `db.maxConnections` - the maximum connections to open to the db (default 100)
- `db.pstmtCacheSize` - the prepared statement cache size (no default)
The library we use - ebean - supports a number of other configuration parameters.
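As a minimal sketch, a Postgres configuration might look like the following (the host, credentials and pool sizes are assumptions - substitute your own):
# illustrative values only - not a recommendation
db.url=jdbc:postgresql://db.example.internal:5432/featurehub
db.username=featurehub
db.password=change-me
db.minConnections=3
db.maxConnections=100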
Database Read Replicas
We also support read replicas, which are useful for deployments of `edge-rest`. We do not recommend them for `mr`, `party-server` or `party-server-ish` deployments, as users of a read replica have to accept data that is up to a couple of seconds out of date. This is fine for `edge-rest`, as its major functionality is reading via a GET. To use a read replica, use `db-replica.` prefixes instead of `db.` prefixes to configure where the replica is and how it should be connected to. Typically an `edge-rest` deployment will configure both of these (`db.` and `db-replica.` parameters) but the corresponding `mr` will not.
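For example, a hedged sketch of an `edge-rest` deployment that reads from a replica, assuming the replica takes the same keys under the `db-replica.` prefix (hostnames and credentials are placeholders):
# primary database plus a read replica for GET traffic - placeholder hosts and credentials
db.url=jdbc:postgresql://primary.example.internal:5432/featurehub
db.username=featurehub
db.password=change-me
db-replica.url=jdbc:postgresql://replica.example.internal:5432/featurehub
db-replica.username=featurehub
db-replica.password=change-me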
NATS
If you are using the Streaming version of FeatureHub, then you may need to configure your NATS URLs. If you have only one instance of a party-server, you can leave it as the default. If you have deployed Option 2a, or you have multiple servers with Option 1a, you will need to make sure your NATS servers are configured correctly.
- `nats.urls` - a comma separated list of NATS servers.
NATS works by having the clients tell the servers where each other are, so the NATS servers need to be routable (i.e. they must be able to talk to each other) but do not need to be explicitly told about each other.
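As a sketch, a three-node cluster might be configured like this (the hostnames are assumptions; 4222 is the standard NATS client port):
nats.urls=nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222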
Management Repository
The following properties can be set:
- `passwordsalt.iterations` (1000) - how many iterations it will use to salt passwords
- `cache.pool-size` (10) - how many threads it will allocate to publishing changes to Dacha and SSE
- `feature-update.listener.enable` (true) - whether this MR should listen to the same topic as the Dachas and respond if they are empty
- `environment.production.name` (production) - the name given to the automatically created production environment. It will be tagged "production".
- `environment.production.desc` (production) - the description field for the same.
- `register.url` [deprecated] - the URL used for registration. The front-end should strip the prefix off this and add its own relative one. The format has to be `register.url=http://localhost:8085/register-url?token=%s` - if your site is `https://some.domain.info`, for example, it would be `register.url=https://some.domain.info/register-url?token=%s`. This is honoured but no longer required, and we recommend removing it.
- `portfolio.admin.group.suffix` ("Administrators") - the suffix added to a portfolio group when a portfolio is created for the first time, as it needs an Admin group. So a portfolio called "Marketing" would get an admin group called "Marketing Administrators" created.
- `web.asset.location=/var/www/html/intranet` - can be set optionally if you intend to serve the Admin web app on an intranet without public internet access. We supply this application build already preloaded with all necessary assets. Available in FeatureHub v1.5.4 and higher.
Dacha Config
The following meaningful properties can be set:
- `nats.urls` - a comma separated list of NATS servers
- `cache.timeout` - how long the server will attempt to find and resolve a master cache before moving onto the next step (in ms, default = 5000)
- `cache.complete-timeout` - how long it will wait after another cache has negotiated master before it expects to see data (in ms, default = 15000)
- `cache.pool-size` - the number of threads in the pool for doing "work" - defaults to 10
Edge (all) Config
- `jersey.cors.headers` - a list of CORS headers that will be allowed, specifically for browser support
- `update.pool-size` (10) - how many threads to allocate to processing incoming updates from NATS. These are responses to feature requests and feature updates coming from the server.
- `edge.cache-control.header` - specifically for the GET (polling) API, this lets your infrastructure limit how often clients can actually poll. It allows an infrastructure team to override individual development teams on how often they wish polling to take place. It is generally not recommended, but there may be situations where it makes sense.
Edge (Streaming) Config
- `nats.urls` - a comma separated list of NATS servers
- `listen.pool-size` (10) - how many threads to allocate to processing incoming requests to listen. This just takes the request, decodes it, sends it down via NATS and releases.
- `edge.sse.drop-after-seconds` (30) - how many seconds a client is allowed to listen for before being kicked off. Used to ensure connections don’t go stale. This was previously named `maxSlots` and a value in that field is recognized.
- `edge.dacha.delay-slots` (10) - if Dacha is unavailable because it does not have a full cache, it will reject the request. For SSE, this creates a sliding window of a random delay in seconds, meaning a connection will be dropped in 1-10 seconds (by default). This is designed to prevent reconnect storms as infrastructure is restarted.
- `edge.sse.heartbeat-period` (0) - if defined, Edge will attempt to send heartbeat signals down the SSE connection for the duration of the connection while it is alive. If you set `edge.sse.drop-after-seconds` to 0, the SSE connection will stay open, sending heartbeat signals until the remote system drops the connection. This allows the heartbeat to be used as well as, or instead of, kicking SSE connections off to prevent ghost connections.
- `dacha.url.<cache-name>` = url - this is only relevant if you are running split servers, i.e. Dacha and Edge run in their own containers. You need to tell Edge where Dacha is located. The default cache is called default, so Edge will expect a property called `dacha.url.default` with the URL. In the sample docker-compose where they are split, the hostname for Dacha is dacha, so this is `dacha.url.default=http://dacha:8094`. This isn’t required for the Party Server because communication is internal.
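A hedged sketch of a split streaming deployment, assuming NATS and Dacha hostnames of `nats` and `dacha` as in the sample docker-compose (the remaining values restate the defaults):
nats.urls=nats://nats:4222
dacha.url.default=http://dacha:8094
listen.pool-size=10
edge.sse.drop-after-seconds=30
edge.dacha.delay-slots=10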
Edge (REST only) Config
Edge REST uses the database, so it also needs the database config. Edge-REST is bundled as a separate container, so it can be run and exposed directly instead of being exposed along with the Admin site.
Party Server
The party server honours all values set by the Management Repository, Dacha and the SSE-Edge.
Party-Server-ish
The `party-server-ish` honours all the values set by the Management Repository and Edge REST.
Common to all servers
All servers expose metrics and health checks. The metrics are for Prometheus and are on `/metrics`, liveness is on `/health/liveness` and readiness is on `/health/readyness`. Furthermore, every listening port responds with a 200 on a request to `/`, so that load balancers that aren’t configured to use the proper readiness checks will still function.
Each server has its own collection of checks that indicate aliveness. The `server.port` setting will expose these endpoints, which means they are available alongside all of your normal API endpoints. In a cloud-native environment, which FeatureHub is aimed at, this is rarely what you want, so FeatureHub has the ability to list these endpoints on a different port.
- `monitor.port` (undefined) - if not defined, the metrics and health endpoints are exposed on the server port. If defined, they are exposed on this port instead (and not on the server port).
- `featurehub.url-path` - allows you to configure a base path (context root) other than "/". This will set the base path in the index.html of the FeatureHub web app and in the backend. Note this is an offset, not a full domain name, e.g. `featurehub.url-path=/foo/featurehub`. If the front-end is decoupled on a CDN, the base path needs to be configured directly in index.html by setting `<base href="/foo/featurehub/">` (note the trailing slash).
- `connect.logging.environment` - a comma separated list that lets you pick up values from environment variables and add them directly to your logs. It is typically used in Kubernetes deploys to allow you to extract information from the k8s deploy, put it in environment variables and have them logged. The format is `<ENV-VAR>=<log-key>`. You can use `.` notation to split the key into objects. For example:
connect.logging.environment=MY_KUBERNETES_NODE=kubernetes.node,MY_KUBERNETES_ZONE=kubernetes.zone
{"@timestamp":"2022-01-22T18:12:56.767+1300","message":"1 * Server has received a request on thread grizzly-http-server-0\n1 > GET http://localhost:8903/info/version\n1 > accept: */*\n1 > host: localhost:8903\n1 > user-agent: curl/7.77.0\n","priority":"TRACE","path":"jersey-logging","thread":"grizzly-http-server-0","kubernetes":{"node":"peabody","zone":"amelia"},"host":"thepolishedbrasstack.lan","connect.rest.method":"received: GET - http://localhost:8903/info/version"}
- `audit.logging.web.header-fields` - a comma separated list of fields that will be extracted out of each web request and put into a field in the JSON logs output by the server. All headers are grouped into an object called `http-headers`. Headers are by definition case insensitive. Available from 1.5.5. An example:
audit.logging.web.header-fields=user-agent,origin,Sec-fetch-Mode
{"@timestamp":"2022-01-22T14:46:19.374+1300","message":"txn[1106] Begin","priority":"TRACE","path":"io.ebean.TXN","thread":"grizzly-http-server-0","host":"my-computer","http-headers":{"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36","origin":"http://localhost:53000","Sec-fetch-Mode":"cors"}}
- `audit.logging.user` - if this is set to true (it is false by default) then the user’s ID and email will be logged against each of their requests where they are known. They appear in a `user` object with `id` and `email` as components. Available from 1.5.5. An example:
audit.logging.user=true
{"@timestamp":"2022-01-22T14:58:15.854+1300","message":"txn[1109] select t0.id, t0.when_archived, t0.feature_key, t0.alias, t0.name, t0.secret, t0.link, t0.value_type, t0.when_updated, t0.when_created, t0.version, t0.fk_app_id from fh_app_feature t0 where t0.id = ?; --bind(2b86605b-1a81-4fc7-80b7-17edc5e3206e, ) --micros(697)","priority":"DEBUG","path":"io.ebean.SQL","thread":"grizzly-http-server-1","host":"my-computer","user":{"id":"68c09a3d-6e44-4379-bfc1-3e75af59af38","email":"irina@i.com"}}
Common to Party, SSE Edge and Management Repository
- `server.port` (8903) - the server port that the server runs on. It always listens to 0.0.0.0 (all network interfaces).
- `server.gracePeriodInSeconds` (10) - how long the server will wait for connections to finish after it has stopped listening to incoming traffic
Jersey-specific config around logging is from here: Connect Jersey Common
- `jersey.exclude`
- `jersey.tracing`
- `jersey.bufferSize` (8k) - how much data of a body to log before chopping it off
- `jersey.logging.exclude-body-uris` - URLs for which the body should be excluded from the logs
- `jersey.logging.exclude-entirely-uris` - URLs for which the entire context should be excluded from the logs. Typically you will include the /health/liveness and /health/readyness API calls along with /metrics in this. You may also wish to include login URLs (see the sketch below).
- `jersey.logging.verbosity` - the default level of verbosity for logging: HEADERS_ONLY, PAYLOAD_TEXT or PAYLOAD_ANY
Runtime Monitoring
Prometheus
The Prometheus endpoint is on /metrics for each of the servers. Extensive metrics are exposed on all services by default. It is recommended that for public facing sites, you separate the monitoring port from the server port, so you don’t expose your health check or metrics endpoints to the public.
Health and Liveness checks
A server is deemed "Alive" once it is in STARTING or STARTED mode. It is deemed "Ready" when it is in STARTED mode. All servers put themselves into STARTING mode as soon as they are able, and then STARTED once the server is actually listening. The urls are:
- `/health/liveness`
- `/health/readyness`
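As a hedged example, combining these endpoints with the separate monitoring port recommended above (8701 is an arbitrary choice, matching the earlier Cloud Run example):
# application traffic stays on the default server port, health and metrics move to 8701
server.port=8903
monitor.port=8701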