Configuration
Run configuration
By this we mean the properties you can set to control the behaviour of the different servers. As of 1.5.0, all FeatureHub controller properties are available as environment variables using the same case. If you have been using the mechanism introduced in 1.4.1, it still works but isn’t recommended going forward.
Please make sure you check out the featurehub-install GitHub repository and the featurehub-helm repository for examples of how these projects should be run and deployed.
If you are using a system like Kubernetes, you can mount these properties in /etc/app-config/application.properties and /etc/app-config/secrets.properties.
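As a sketch of how the two files might be split (the values are illustrative and use properties described later on this page; any property can live in either file):

# /etc/app-config/application.properties - non-secret settings (illustrative)
nats.urls=nats://nats:4222
db.url=jdbc:postgresql://db:5432/featurehub

# /etc/app-config/secrets.properties - credentials (illustrative)
db.username=featurehub
db.password=change-me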
Database configuration
Only the Management Repository, Party Server, Party-Server-ish and Edge-REST deployments talk to the database. Edge (streaming), Dacha1 and Dacha2 do not.
Configuration parameters:

- db.url - the JDBC URL of the database server
- db.username - the username used to log in
- db.password - the password for that user
- db.minConnections - the minimum number of connections to hold open (default 3)
- db.maxConnections - the maximum number of connections to open to the database (default 100)
- db.pstmtCacheSize - the prepared statement cache size (no default)
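A minimal sketch of a database configuration (the host, database name and credentials are illustrative):

db.url=jdbc:postgresql://postgres.example.internal:5432/featurehub
db.username=featurehub
db.password=change-me
db.minConnections=3
db.maxConnections=30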
The library we use, ebean, supports a number of other configuration parameters.
Database Read Replicas
We also support read replicas, which are useful for edge-rest deployments. We do not recommend them for mr, party-server or party-server-ish deployments, as read replicas have to tolerate being up to a couple of seconds out of date. That is fine for edge-rest, as its major functionality is reading via a GET. Where db. prefixes configure the main database, db-replica. prefixes configure a read replica - where it is and how it should be connected to. Typically an edge-rest deployment will configure both of these (db and db-replica parameters) but the corresponding mr will not.
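A sketch for an edge-rest deployment with a replica, assuming the replica takes the same connection keys under the db-replica prefix (hosts and credentials are illustrative):

db.url=jdbc:postgresql://primary.example.internal:5432/featurehub
db.username=featurehub
db.password=change-me
db-replica.url=jdbc:postgresql://replica.example.internal:5432/featurehub
db-replica.username=featurehub_ro
db-replica.password=change-me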
Async options for FeatureHub
If you use a streaming version of FeatureHub, you need an async layer and this has traditionally been NATS. With the advent of Dacha2, the async layer is now a bigger topic.
If you use Dacha1, you do not need to change anything; the preconfigured settings are sufficient.
To swap from Dacha1 (NATS only) to Dacha2 (all options), you need to set the following environment variables:

- dacha1.enabled=false - disable Dacha1
- dacha2.enabled=true - enable Dacha2
NATS
The upside of NATS is simplicity and speed; it is the fastest product we test with by far. We do not use JetStream, but you can configure it with JetStream if you wish. The downside is typically that your Operations team has to support it, and it is an extra cost to deploy. The Kubernetes charts for NATS are excellent and reliable.
Communication
NATS communicates using a protocol whereby the clients essentially tell the servers about each other, but they do have to be able to communicate. Please read the NATS documentation for more information.
If you have only one instance of a party-server, you do not need to set anything. If you have deployed NATS yourself or are running multiple Party Servers, you will need to set the following environment variables:

- nats.urls - a comma separated list of NATS servers
- nats.healthcheck.disabled - set to true if you don’t want MR, Edge or Dacha to fail its health check when it cannot communicate with NATS
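For example, a three-node NATS cluster might be configured as (hostnames illustrative):

nats.urls=nats://nats-0:4222,nats://nats-1:4222,nats://nats-2:4222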
You do not need to configure anything further for NATS for Dacha1 or Dacha2. NATS supports request/reply, true pub/sub and pub/queues in a single product.
Channels
If you use NATS outside of FeatureHub, you have likely secured it; if so, you need to know which channels FeatureHub uses.
If you are using Dacha1, the following channels are used (and carry gzipped JSON data). They are published from MR and listened to by Dacha1.
- default/feature-updates-v2 - used to send any updates to individual features
- default/environment-updates-v2 - used to send environment updates
- default/service-account-channel-v2 - used to send service account updates
- default/cache-management-v2 - used by the Dacha1 cache as a management layer to communicate with MR and the other Dacha instances, usually when seeking a full cache
If you are using Dacha2, it uses CloudEvents, so there is a single channel (published from MR, listened to by Dacha2):

- featurehub/mr-dacha2-channel - its name can be configured using the environment variable cloudevents.mr-dacha2.nats.channel-name. Dacha2 will listen to this channel both as a Pub/Sub style listener (i.e. a topic listener) and as a Pub/Queue style listener (a shared subscription), for the purposes of the Enricher process used by the Webhooks.
Dacha1 and Dacha2 publish enriched events (if this is turned on) on the following channel, which Edge listens to:

- featurehub/enriched-events - configured using cloudevents.enricher.channel-name
Edge listens to the following channel (published from MR):

- featurehub/mr-edge-channel - configured using cloudevents.mr-edge.nats.channel-name. MR broadcasts on this channel; it is a stream of feature updates.
Edge publishes on the following channel (consumed by MR):

- featurehub/mr-updates-queue - configured using cloudevents.edge-mr.nats.channel-name. This sends back updates from PUT requests on the FeatureApi by valid API keys that have write permissions, as well as any other traffic (e.g. webhook status data).
Edge publishes on the stats channel only if you have turned it on:

- featurehub/edge-stats - configured using cloudevents.stats.nats.channel-name

We have a Usage Service which stores this data, but we haven’t yet exposed the data on the front end, so we aren’t releasing it (the tool is Postgres only).
If Edge communicates with Dacha (1 or 2) over NATS, there is an extra channel:

- default/edge_v2 - a request/reply channel for requests from Edge to get data from the Dacha cache
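If your NATS security policy uses its own naming scheme, the configurable channels above can be renamed; a sketch with illustrative names:

cloudevents.mr-dacha2.nats.channel-name=secured/mr-dacha2
cloudevents.mr-edge.nats.channel-name=secured/mr-edge
cloudevents.edge-mr.nats.channel-name=secured/edge-mr
cloudevents.enricher.channel-name=secured/enriched-events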
Google’s Pub/Sub
For Google’s Pub/Sub, you will need to turn NATS off. We consider Google Pub/Sub to be production ready as of 1.5.9.

- nats.enabled=false

And Pub/Sub on:

- cloudevents.pubsub.enabled=true
You will also need to tell FeatureHub what the topics for publishing are. You need to configure these across all three applications.

- cloudevents.pubsub.project=featurehub - whatever Google project ID you are deploying FeatureHub into
- cloudevents.edge-mr.pubsub.topic-name=featurehub-edge-updates - the name of the topic used by Edge to publish TestSDK updates back to MR
- cloudevents.mr-edge.pubsub.topic-name=featurehub-mr-edge - feature updates published from MR → Edge for streaming clients
- cloudevents.stats.pubsub.topic-name=featurehub-stats - you only need to configure this if you have stats publishing turned on, otherwise you can ignore it
- cloudevents.mr-dacha2.pubsub.topic-name=featurehub-mr-dacha2 - the name of the topic on which MR publishes feature updates, environment updates and service account updates
- cloudevents.pubsub.min-backoff-delay-seconds=5 - Edge and Dacha must be able to create their own subscriptions. Google Pub/Sub is not actually "pub/sub" - there is no way for all connections to a subscription to receive a message - so this delay reflects how long Pub/Sub should wait before trying to redeliver a message. Edge and Dacha will delete their subscriptions when they shut down, so keep this low.
- cloudevents.mr-edge.pubsub.subscription-prefix=featurehub-edge-listener - the prefix used for subscriptions created by Edge
- cloudevents.mr-dacha2.pubsub.subscription-prefix=featurehub-dacha2-listener - the prefix used for subscriptions created by Dacha2 instances
MR also needs an extra environment variable to tell it what subscription to listen to for updates from the TestSDK topic configured above (cloudevents.edge-mr.pubsub.topic-name):

- cloudevents.inbound.channel-names=featurehub-edge-updates-mr-sub
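Putting this together, a minimal sketch of a Pub/Sub configuration shared across the three applications (the project, topic and subscription names are illustrative):

nats.enabled=false
cloudevents.pubsub.enabled=true
cloudevents.pubsub.project=my-gcp-project
cloudevents.edge-mr.pubsub.topic-name=featurehub-edge-updates
cloudevents.mr-edge.pubsub.topic-name=featurehub-mr-edge
cloudevents.mr-dacha2.pubsub.topic-name=featurehub-mr-dacha2
cloudevents.mr-edge.pubsub.subscription-prefix=featurehub-edge-listener
cloudevents.mr-dacha2.pubsub.subscription-prefix=featurehub-dacha2-listener
# MR only: the subscription for inbound Edge updates
cloudevents.inbound.channel-names=featurehub-edge-updates-mr-sub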
For testing locally there are extra fields if you are using the emulator - they are used in the featurehub-installs PubSub folder. To test in Google, we use Pulumi.
AWS Kinesis
AWS Kinesis was supported from v1.5.9 of FeatureHub; we discontinued support in 1.8.0 due to lack of interest.
Management Repository
The following properties can be set:
- passwordsalt.iterations (1000) - how many iterations it will use to salt passwords
- cache.pool-size (10) - how many threads it will allocate to publishing changes to Dacha and SSE
- feature-update.listener.enable (true) - whether this MR should listen to the same topic as the Dachas and respond if they are empty
- environment.production.name (production) - the name given to the automatically created production environment. It will be tagged "production".
- environment.production.desc (production) - the description field for the same.
- register.url [deprecated] - the URL used for registration. The front-end should strip the prefix off this and add its own relative one. The format has to be register.url=http://localhost:8085/register-url?token=%s - if your site is https://some.domain.info for example, it would be register.url=https://some.domain.info/register-url?token=%s. This is honoured but no longer required, and it is recommended that it be removed.
- portfolio.admin.group.suffix ("Administrators") - the suffix added to a portfolio group when a portfolio is created for the first time, as it needs an Admin group. So a portfolio called "Marketing" would get an admin group called "Marketing Administrators" created.
- web.asset.location=/var/www/html/intranet - can be set optionally if you intend to serve the Admin web app on an intranet without public internet access. We supply this application build already preloaded with all necessary assets. Available in FeatureHub v1.5.4 and higher. With 1.5.10 or higher there is also web.asset.location=/var/www/html/html, which can be used if there is a lot of mobile use.
- cache-control.web.index - allows you to set the Cache-Control header on the index.html file. It is set by default to no-store, max-age=0, preventing any caching, so that new versions are correctly picked up as they roll out.
- cache-control.web.other - sets the cache control on all the other content of the website, which is essentially considered to be versioned. This data should never change, and it is set by default to max-age=864000 - or about 10 days.
- webhooks.features.enabled - enables webhooks functionality. True by default. Note that internally webhooks depend on the enrichment pipeline; see the enricher.enabled property below. If that property is set to false, it will override webhooks.features.enabled. To disable webhooks, set webhooks.features.enabled=false. This will remove webhooks functionality from the Admin UI. Depending on the installation option, webhooks may require additional configuration as described here.
- webhook.features.max-fails - the number of webhook retries before disabling. When a webhook fails to connect and deliver its result - any HTTP status outside of the 200 range (including 0, where a connection is refused) - the FeatureHub app begins a countdown and, after n retries, will automatically disable the webhook. The default value is 5. To change it, set the desired number: webhook.features.max-fails=10
- enricher.enabled - the enricher pipeline, currently only required for webhooks to work. True by default. To reduce unnecessary load if you do not use webhooks, disable it as follows: enricher.enabled=false
- ga.tracking-id - if provided, this will enable tracking of the usage of your Admin UI and tell you what features people are using.
- mr.dacha2.api-keys - this swaps the Dacha2 endpoint to the public interface and protects it by expecting a key header with one of the keys listed here. See Dacha2 below for more details.
SDK Feature Extension Properties
The system is capable of letting you extend the data you publish to the SDK and control access to that data using the Service Accounts. Each Service Account has Read/Unlock/Lock/Change Value permissions and also Read Extended Feature Value.
- sdk.feature.properties - in the Management Repository (and Party Server/Party-Server-ish) you can define this environment variable or system property as name/value pairs of data which will appear in the "Feature Properties" section in your SDK (if the associated Service Account is given permission). It is an advanced feature, it requires an understanding of the object model of FeatureHub, and it is available only in the Open Source version.
- sdk.feature.properties.size - controls the cache size. If you are using metadata (see below), the system will hold onto the JSON-parsed metadata for this many features to speed up publishing. It is set by default to 100.
The name/value pairs are pairs of names and Handlebars templates. The templates let you walk the tree of data provided.
The values in the name/value pairs can be actual templates or, if the value starts with a #, a reference to an absolute file location. The latter is useful if you wish to store your definitions in files as part of your Kubernetes deployment. We recommend experimenting with a local copy of FeatureHub to get your templates just the way you want them.
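As a sketch of the file-reference form, assuming the # prefix applies per value as described above (the paths and file names are illustrative):

sdk.feature.properties=appName=#/etc/app-config/templates/app-name.hbs,category=#/etc/app-config/templates/category.hbs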
An example of an inline template:
sdk.feature.properties=appName={{{feature.parentApplication.name}}},portfolio={{{feature.parentApplication.portfolio.name}}},category={{{metadata.category}}}
If you curl your SDK endpoint, you will see something like:

"fp":{"appName":"D2nbUfr8LH1i","portfolio":"D2nbUfr8LH1i","category":"shoes"}

fp means "feature properties".
Your template will be handed the following fields which you refer to by name in your template:
- feature - represents the DbApplicationFeature definition and lets you walk around the feature definition (it is not the value but the feature itself).
- featureValue - represents the DbFeatureValue definition. For flags it will always have a value; for the other types it may be empty.
- metadata - represents feature.metaData parsed into a JSON object, which is thus navigable in Handlebars. If you want the raw metadata, use {{{feature.metaData}}}; if, for example, you want the category inside the metadata of your features, use {{{metadata.category}}}. If there is no metadata, no category, or the metadata isn’t JSON for some features, that won’t cause problems - the key will just be left out (see below).
In addition to these we give you:
- pubFeature - the feature that is about to be published; documented in the API.
- pubFeatureValue - the value of the feature (if any) that is about to be published; it can be null and is also documented in the API.
- fgStrategies - the strategies coming from feature groups this feature is associated with. They are RolloutStrategies and basically follow the same API pattern.
A few things to note
- in Handlebars, if you don’t want it to HTML-escape your data, you need to use triple braces, e.g. {{{feature.parentApplication.name}}}.
- null values are safe to navigate; Handlebars will just return an empty value.
- the data is sparse - if a key returns a null or empty value, that key will be dropped to keep data transfer down.
- on SSE (streaming) connections, a change in permission won’t become visible until the connection is re-established, as Edge does not get updated about changes to Service Account permissions. If you are running your SSE connections in heartbeat mode, you won’t get updates until they drop and re-establish.
Dacha1 Config
If you are using Dacha1 (the "active" cache), the following meaningful properties/environment variables can be set:
- cache.timeout - how long the server will attempt to find and resolve a master cache before moving on to the next step (in ms, default 5000)
- cache.complete-timeout - how long it will wait after another cache has negotiated master before it expects to see data (in ms, default 15000)
- cache.pool-size - the number of threads in the pool for doing "work" - defaults to 10
Dacha2 Config
If you are using Dacha2 (the "lazy" cache), then the following properties/environment variables can be set:
- management-repository.url - the http(s) location of the Management Repository. It needs to include a port number.
- dacha2.cache.api-key - used to allow Dacha2 to communicate with a remote Management Repository to get its cache misses. If an Edge request comes in that Dacha2 hasn’t seen, it will make a request to the Management Repository to get the state - which is a fairly heavy call. You don’t normally want to expose this call over the internet, as it can be a vector for a denial of service attack on your Management Repository. If you do wish to distribute your Dacha2 instances out to several clusters, potentially across the world, you can protect the MR API by setting mr.dacha2.api-keys (as outlined above) - which gives you (a) the ability to rotate keys and (b) different potential keys for each cluster - and this key, dacha2.cache.api-key, on your Dacha2 instances. By setting the mr.dacha2.api-keys value, the API will swap to the public interface and will not be available internally. Note that this traffic need not go through the external load balancer for Dacha2 instances in the same cluster as the Management Repository. The following diagram shows this in more detail.
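A sketch of the pairing for a remote cluster (the keys and URL are illustrative):

# On the Management Repository
mr.dacha2.api-keys=eu-cluster-key,us-cluster-key

# On each Dacha2 instance in the EU cluster
management-repository.url=https://mr.example.com:443
dacha2.cache.api-key=eu-cluster-key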
There are other configuration fields that allow you to not actively cache incoming new environments, or to set the size of the LRU cache for data, but unless you are hitting memory issues or need to support more than 10000 active environments, it is not recommended to change these.
Edge (all) Config
- jersey.cors.headers - a list of CORS headers that will be allowed, specifically for browser support
- update.pool-size (10) - how many threads to allocate to processing incoming updates from NATS. These are responses to feature requests and feature updates coming from the server.
- edge.cache-control.header - specifically for the GET (polling) API, this lets your infrastructure limit how often clients can actually poll. It allows an infrastructure team to override individual development teams on how often they wish polling to take place. It is generally not recommended, but there may be situations where it makes sense.
Edge (Streaming) Config
- listen.pool-size (10) - how many threads to allocate to processing incoming requests to listen. This just takes the request, decodes it, sends it down via NATS and releases.
- edge.sse.drop-after-seconds (30) - how many seconds a client is allowed to listen for before being kicked off. Used to ensure connections don’t go stale. This was previously named maxSlots, and a value in that field is recognized.
- edge.dacha.delay-slots (10) - if Dacha is unavailable because it does not have a full cache, it will reject the request. For SSE, this creates a sliding window of a random delay in seconds, meaning a connection will be dropped in 1-10 seconds (by default). This is designed to prevent reconnect storms as infrastructure is restarted.
- edge.sse.heartbeat-period (0) - if defined, Edge will attempt to send heartbeat signals down the SSE connection for the duration of the connection while it is alive. If you set edge.sse.drop-after-seconds to 0, the SSE connection will stay open, sending heartbeat signals until the remote system drops the connection. This allows the heartbeat to be used as well as, or instead of, kicking SSE connections off, to deal with ghost connections.
- dacha.url.default=url - you MUST specify this for Dacha2. It is only relevant if you are running split servers - Dacha and Edge in their own containers - where you need to tell Edge where Dacha is located. In the sample docker-compose where they are split, the hostname for Dacha is dacha, so this is dacha.url.default=http://dacha:8034. This isn’t required for the Party Server because communication is internal.
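A sketch for a split streaming Edge talking to a separately deployed Dacha (the hostname and heartbeat value are illustrative):

dacha.url.default=http://dacha:8034
edge.sse.drop-after-seconds=30
edge.sse.heartbeat-period=10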
Edge (REST only) Config
Edge REST uses the database, so it also needs the database config. Edge-REST is bundled as a separate container, so it can be run and exposed directly instead of being exposed along with the Admin site.
Party Server
The party server honours all values set by the Management Repository, Dacha and the SSE-Edge.
Party-Server-ish
The party-server-ish honours all the values set by the Management Repository and Edge REST.
Common to all servers
All servers expose metrics and health checks. The metrics are for Prometheus and are on /metrics, liveness is on /health/liveness and readiness on /health/readyness. Furthermore, every listening port responds with a 200 on a request to /, so that load balancers that aren’t configured to listen to the proper readiness checks will function.
Each server has its own collection of things that are important in indicating aliveness. The server.port setting will expose these endpoints, which means they are available alongside all of your normal API endpoints. In a cloud-native environment, which FeatureHub is aimed at, this is rarely what you want, so FeatureHub has the ability to list these endpoints on a different port.
- monitor.port (undefined) - if not defined or 0, the metrics and health endpoints will be exposed on the server port. Otherwise they will be exposed on this port (and not on the server port). For systems like ECS where having more than one port is not desirable, you should set it to 0.
- featurehub.url-path - allows you to configure a base path (context root) other than "/". This will set the base path in the index.html of the FeatureHub web app and in the backend. Note that this is an offset, not a full domain name, e.g. featurehub.url-path=/foo/featurehub. If the front-end is decoupled on a CDN, the base path needs to be configured directly in index.html by setting <base href="/foo/featurehub/"> (note the trailing slash).
- cache-control.api - allows the configuration of the Cache-Control headers on all GET based API calls. This lets you put a CDN in front of FeatureHub and ensure the CDN does not cache any API responses. It is on by default. See also the cache-control.web configuration for MR and Party Server.
- cache-control.api.enabled - set this to false if you wish to disable the Cache-Control headers for APIs.
- connect.logging.environment - a comma separated value list that lets you pick up values from environment variables and add them directly to your logs. It is typically used in Kubernetes deploys to let you extract information from the k8s deploy, put it in environment variables and have it logged. The format is <ENV-VAR>=<log-key>. You can use . notation to split it into objects. An example follows this list.
- opentelemetry.valid-baggage-headers - allows you to filter the W3C Baggage headers you will accept from incoming clients. The Web UI adds its own (the request-id for that session) and the server adds the UUID of the user who is performing the action. Your clients can add anything further they wish to the baggage headers, but you need to specify them here to allow them to be added to the baggage. The side effect of this is that all baggage will be logged in the logs. This gives you very good OpenTelemetry and log-level traceability for actions from clients.
An example of connect.logging.environment and the resulting log output:

connect.logging.environment=MY_KUBERNETES_NODE=kubernetes.node,MY_KUBERNETES_ZONE=kubernetes.zone

{"@timestamp":"2022-01-22T18:12:56.767+1300","message":"1 * Server has received a request on thread grizzly-http-server-0\n1 > GET http://localhost:8903/info/version\n1 > accept: */*\n1 > host: localhost:8903\n1 > user-agent: curl/7.77.0\n","priority":"TRACE","path":"jersey-logging","thread":"grizzly-http-server-0","kubernetes":{"node":"peabody","zone":"amelia"},"host":"thepolishedbrasstack.lan","connect.rest.method":"received: GET - http://localhost:8903/info/version"}
- audit.logging.web.header-fields - a comma separated list of fields that will be extracted out of each web request and put into a field in the JSON logs output by the server. All headers are grouped into an object called http-headers. Headers are by definition case insensitive. Available from 1.5.5. An example:
audit.logging.web.header-fields=user-agent,origin,Sec-fetch-Mode
{"@timestamp":"2022-01-22T14:46:19.374+1300","message":"txn[1106] Begin","priority":"TRACE","path":"io.ebean.TXN","thread":"grizzly-http-server-0","host":"my-computer","http-headers":{"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36","origin":"http://localhost:53000","Sec-fetch-Mode":"cors"}}
- audit.logging.user - if this is set to true (it is false by default), the user’s ID and email will be logged against each of their requests where they are known. They appear in a user object with id and email as components. Available from 1.5.5. An example:
audit.logging.user=true
{"@timestamp":"2022-01-22T14:58:15.854+1300","message":"txn[1109] select t0.id, t0.when_archived, t0.feature_key, t0.alias, t0.name, t0.secret, t0.link, t0.value_type, t0.when_updated, t0.when_created, t0.version, t0.fk_app_id from fh_app_feature t0 where t0.id = ?; --bind(2b86605b-1a81-4fc7-80b7-17edc5e3206e, ) --micros(697)","priority":"DEBUG","path":"io.ebean.SQL","thread":"grizzly-http-server-1","host":"my-computer","user":{"id":"68c09a3d-6e44-4379-bfc1-3e75af59af38","email":"irina@i.com"}}
Encryption
To enable Slack integration, or URL and header encryption for webhooks, you will need to specify an encryption key/password for these items, which are encrypted at rest (i.e. they remain encrypted until they are explicitly used; in Slack’s case, until the system is about to POST to Slack, and the same is true for webhooks).
Config options to enable encryption:
- webhooks.encryption.password - the encryption key/password. Required for Slack integration to work. This can be set to anything; we recommend using a randomiser or password generator to produce a reasonably long key (16+ characters).
- webhooks.decryption.enabled - defaults to false, which means: 1) once the Slack Bot User OAuth Token is set in the Admin app, it cannot be viewed again and will show as hidden - set this to true to be able to view or change the token; 2) once a webhook URL or header value is set and the "Encrypt" option is checked in the Admin app, they cannot be viewed again and will be shown as hidden - set this to true to be able to view, delete or reset the webhook URL or header values.
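A sketch (generate your own long random key; the value shown is illustrative):

webhooks.encryption.password=q8Jp2vX0sZr4wB7nTk1yLmC6
webhooks.decryption.enabled=false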
If you are migrating from v1.8.0 to 1.8.1 (only - earlier versions are not affected): the webhooks.encryption.password property was set to a default value in 1.8.0 but was removed as a security precaution in 1.8.1. This will affect your Slack integration if you used it in 1.8.0. You will need to: 1. Set this property with your own generated value, e.g. webhooks.encryption.password=foobar, then set webhooks.decryption.enabled=true. 2. Go to the FeatureHub Admin UI System Config page, select the disable Slack checkbox, reveal and copy the Slack Bot User OAuth Token, then clear it and Save. Re-enter the Slack Bot User OAuth Token and Save. This should re-enable Slack with the new encryption key. If you cannot view and copy the Slack Bot User OAuth Token, go to your Slack Workspace settings at https://api.slack.com/apps and find the FeatureHub app. Go to the OAuth and Permissions screen and find your Slack Bot User OAuth Token.
OpenTelemetry AutoConfiguration
OpenTelemetry configuration is done using the AutoConfiguration style definition. It allows you to specify your own tracing exporters, propagators, etc. based on system properties (in the config files) or environment variables.
Common to Party, SSE Edge and Management Repository
- server.port (8903) - the port the server runs on. It always listens on 0.0.0.0 (all network interfaces).
- server.gracePeriodInSeconds (10) - how long the server will wait for connections to finish after it has stopped listening for incoming traffic.
Jersey-specific config around logging comes from here: Connect Jersey Common

- jersey.exclude
- jersey.tracing
- jersey.bufferSize (8k) - how much of a body to log before chopping it off
- jersey.logging.exclude-body-uris - URLs for which the body should be excluded from the logs
- jersey.logging.exclude-entirely-uris - URLs for which the entire context should be excluded from the logs. Typically you will include the /health/liveness and /health/readyness API calls along with /metrics here. You may also wish to include login URLs.
- jersey.logging.verbosity - the default level of verbosity for logging: one of HEADERS_ONLY, PAYLOAD_TEXT or PAYLOAD_ANY
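A sketch that keeps health-check and metrics noise out of the logs (assuming comma separated lists, as with the other list-valued properties):

jersey.logging.exclude-entirely-uris=/health/liveness,/health/readyness,/metrics
jersey.logging.verbosity=HEADERS_ONLY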
Runtime Monitoring
Prometheus
The Prometheus endpoint is on /metrics for each of the servers. Extensive metrics are exposed on all services by default. It is recommended that for public facing sites, you separate the monitoring port from the server port, so you don’t expose your health check or metrics endpoints to the public.
Health and Liveness checks
A server is deemed "Alive" once it is in STARTING or STARTED mode. It is deemed "Ready" when it is in STARTED mode. All servers put themselves into STARTING mode as soon as they are able, and then STARTED once the server is actually listening. The urls are:
- /health/liveness
- /health/readyness