SDK Contributors Documentation

What to know up front

There are a few things you need to know up front.

MIT License

All SDKs are MIT licensed, because they are included in the client’s code base. That is generally as far as we can go licence-wise, so try to choose dependencies that are MIT or similarly licensed.

OpenAPI

We use an OpenAPI definition for the SSE layer. Hold on, you say, OpenAPI is a REST/JSON-RPC standard, it doesn’t support SSE. You would be right - it doesn’t support WebSockets either. However, there is one REST call in the API, and all of the data structures that are sent over the wire are listed in that document.

To find the latest published version of the API, go to http://api.dev.featurehub.io/edge/releases.json, take the latest version, and then go to http://api.dev.featurehub.io/edge/1.1.3.yaml (replacing 1.1.3.yaml with that latest revision).

You will notice the eventsource URL is missing - it cannot be described in OpenAPI. If you use the standard OpenAPI generator as supported by the community, you will generally get a passable API. If you are having difficulty with it, please let us know - we have expertise in making it work well.

SDK submissions

From our perspective, we are happy to accept any contributions that fall within our guidelines and follow the basic requirements of the SDK pattern we have established. It is fine for them to be delivered in stages; we just ideally want to keep the key functions the same between the different languages.

An SDK is generally recommended to have a local feature cache, which we refer to as a repository. This stores the state of all of the features and lets the user interact with them at any time, without the application needing to ask the FeatureHub server for features. Then there is a client, which operates independently, gets the features and passes them to the repository for processing. There are currently two forms: GET and SSE (Server Sent Events).

It is worthwhile making them idiomatic to your language.

The SDK is broken into three main API calls:

  • GET - the basic HTTP GET API is used for polling. It is intended primarily for customer applications, such as browsers and mobile clients, where various limitations mean holding a constant radio push link to the FeatureHub server is infeasible, and immediate feature change is generally not required or wanted. Customer-facing applications will typically have a capture/release mechanism for when updates become available: you should provide a "holding cell" for them and let the programmer using the SDK release them into the repository for use, or fire events, at their leisure. This typically appears in a UI as an "the application needs to update" kind of message.

  • SSE - this is the ideal form of connection to FeatureHub: it holds a connection open to the server for as long as the server allows, and the server pushes changes down to the client as they occur. This is generally what you want in a server-based microservice that is serving calls in a multi-threaded environment - typically Go, Java, .NET or standalone Python or Ruby services. Applications like Rails running on Passenger should generally not use it, because Rails under Passenger generally runs single-threaded. PHP cannot support this model, as it is entirely request-driven and does not have long-lived threads.

  • PUT - this is the TEST API that FeatureHub provides; it is intended for automated QA tests, allowing QA environments to control features. We generally recommend that Baggage support is built into the SDK using OpenTelemetry, as use of this API otherwise means that testing cannot achieve massive parallelisation.

Building the GET Adapter

The GET Adapter is almost fully documented in the OpenAPI document above.

General notes

  • MUST: etags are supported. When a response comes back, its etag header must be held onto, and on the next request it must be sent back in the if-none-match header. If this matches the current etag of the features, the server will reply with a 304.

  • MUST: You will always get a 200 (list of environments and their features) or a 236 on success. A 236 sends back the same data as a 200, but the client must stop polling: it is an indication that the environment is stale and will no longer receive updates.

  • MUST: 304 (no change in data, so no data) - the client should just skip and expect no data and wait for the next poll

  • MUST: 400 - malformed request - typically you have requested no environments

  • MUST: 404 - the key doesn’t exist, please stop asking for it - polling should stop completely and not restart.

  • MUST: 500 - some tragic event happened on the server and woe to the Ops person investigating it

  • MUST: 503 - the server cache isn’t ready and can’t give an answer as to whether the key does or does not exist - wait for a little while and try again.

  • MUST: Support polling interval changes from the server by interpreting the max-age=X directive. If the cache-control header is set and contains a max-age=X directive, the polling interval must be updated to match; this is the number of seconds between polls. This gives per-environment control over cache behaviour and lets the server indicate to the client how quickly (or slowly) to poll. A polling sketch covering these notes follows this list.
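Below is a minimal polling sketch in TypeScript, assuming a hypothetical updateRepository helper and the global fetch API. It is illustrative only, showing etag handling, the 200/236/304/404 behaviour and the max-age driven poll interval described above.

    declare function updateRepository(data: unknown): void;   // hypothetical: hand features to your repository

    interface PollState {
      etag?: string;
      interval: number;    // milliseconds between polls
      stopped: boolean;
    }

    async function poll(url: string, state: PollState): Promise<void> {
      if (state.stopped) return;

      const resp = await fetch(url, {
        headers: state.etag ? { 'if-none-match': state.etag } : {},
      });

      // honour server-directed polling intervals (cache-control: max-age=X, in seconds)
      const maxAge = resp.headers.get('cache-control')?.match(/max-age=(\d+)/);
      if (maxAge) {
        state.interval = parseInt(maxAge[1], 10) * 1000;
      }

      if (resp.status === 200 || resp.status === 236) {
        state.etag = resp.headers.get('etag') ?? state.etag;
        updateRepository(await resp.json());
        if (resp.status === 236) state.stopped = true;   // environment is stale - stop polling
      } else if (resp.status === 404) {
        state.stopped = true;                            // key does not exist - stop completely
      }
      // 304 (no change), 503 (cache not ready), 400/500: keep the existing cache and wait

      if (!state.stopped) {
        setTimeout(() => poll(url, state), state.interval);
      }
    }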

Server Evaluated Feature Support

If you are intending to support server evaluated features:

  • MUST: For server evaluated support, the Context sends an x-featurehub header. If it can’t set a header (e.g. a browser) it can pass it as a parameter instead.

  • MUST: The x-featurehub header is a FORM style separation of attributes from the context, key=value&key=value, etc

  • MUST: When an x-featurehub header is sent (i.e. the client context has values), the SDK must calculate a SHA-256 hash of the header value in hex-digest form. This is added as a query parameter called contextSha to the outgoing request (the parameter is in the OpenAPI document). It is intended to allow the client side to "bust the cache" if there is a CDN such as Fastly in front of a FeatureHub deployment. A small sketch follows this list.
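As a sketch (assuming Node.js crypto, and assuming the hash is taken over the encoded header value):

    import { createHash } from 'crypto';

    // headerValue is the already-encoded x-featurehub header value
    function contextSha(headerValue: string): string {
      return createHash('sha256').update(headerValue).digest('hex');
    }

    // appended to the outgoing GET request as ...&contextSha=<contextSha(headerValue)>
    // alongside the x-featurehub header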

Building an SSE Adapter

The SSE Adapter is perhaps the most difficult one to support if your language doesn’t have a readily supported SSE implementation.

The key to the FeatureHub SDK is that all clients should receive updates at the same time. In a perfect world, this means your whole stack - backend and front-end - updates instantly with your configured changes. We also ideally want to take advantage of caching at CDN layers where we can.

To achieve this, there are only three technologies commonly available across the major platforms: plain old HTTP, Server Sent Events, and WebSockets. Let's brush HTTP/2 and HTTP/3 (QUIC) under the table for the time being.

At its core, HTTP is a connectionless protocol; even with Keep-Alive it is a client-driven protocol. As such, it isn’t suitable for our stated goal of instantaneous updates. It is however important in our story, because it gives the user full control over when they update their features, and in a costly environment (both in data charges and in battery life), having that control while still rolling out features in a measured way is invaluable. We will use plain HTTP for our Mobile SDK when it comes along, and we use it for our test API to update features.

WebSockets is essentially bi-directional TCP overlaid on the HTTP layer, and it suffers a few problems for our use case. It isn’t cacheable, it requires considerably greater complexity in the client implementation, and it is bi-directional, which isn’t really necessary in our case. WebSocket servers also have to regularly kill your connection to ensure they don’t hold stale, phantom connections.

That leaves us with one technology, and given that our use case is exactly what it was designed for, it makes sense that we use it. Server Sent Events came out in 2006 and is very widely supported, both in the browser space (except for IE, which requires a polyfill) and through many client libraries. It is well supported by web servers, and around the globe by proxies and gateways. It doesn’t require complex protocols like Socket.IO, as it is a simple set of key/value pairs that periodically come down the wire. Further, it is focused on server-push and is cacheable, allowing you to use clever CDNs like Fastly.

SSE also kills your connection regularly to ensure stale connections are removed, but you can control that in FeatureHub, and CDNs also use it as a trigger to refresh their own caches.

Note that for this reason you will see the connection being dropped and then reconnected every 30-60 seconds. You have the option of setting it longer by changing maxSlots in the Edge server.

What is even better about SSE is that you can debug it with nothing more than curl or your normal browser inspection tools, and our implementation of it is very easy to use and understand.

The downsides, as mentioned in Fastly's post on the topic, are the same as with WebSockets: it keeps a radio link open, so you shouldn’t use it for Mobile without explicitly connecting and disconnecting. We intend to provide a simple GET API for use in our Mobile APIs for Android (Java), iOS (Swift) and, of course, Dart for Flutter.

Back to HTTP/2 - this is a technology we see as best used from a CDN, as it allows multiplexing multiple event streams over one connection. HTTP/2 supports Server Sent Events, but offers limited advantages unless more than just features are being sourced from the same server.


Before you start

We recommend you start up a FeatureHub Party Server docker image and curl into the features - you can even use a browser for your link and you will see a list of updates. The default server kicks you off every 30 seconds, but that is configurable and is intended to ensure that you don’t have stale, phantom connections.

If you create a feature, change a feature, delete a feature or add a new feature, you should be able to watch all of these things come down the line. This is roughly what it should look like:

curl -v http://localhost:8553/features/default/fc5b929b-8296-4920-91ef-6e5b58b499b9/VNftuX5LV6PoazPZsEEIBujM4OBqA1Iv9f9cBGho2LJylvxXMXKGxwD14xt2d7Ma3GHTsdsSO8DTvAYF
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8553 (#0)
> GET /features/default/fc5b929b-8296-4920-91ef-6e5b58b499b9/VNftuX5LV6PoazPZsEEIBujM4OBqA1Iv9f9cBGho2LJylvxXMXKGxwD14xt2d7Ma3GHTsdsSO8DTvAYF HTTP/1.1
> Host: localhost:8553
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/event-stream
< Transfer-Encoding: chunked
<
event: ack
data: {"status":"discover"}

event: features
data: [{"id":"6c376de1-3cb8-4297-b641-8f27e0d11612","key":"FEATURE_SAMPLE","version":1,"type":"BOOLEAN","value":false},{"id":"b8d9b3a0-2972-4f56-a57f-3f74fe9c7e4f","key":"NEW_BUTTON","version":1,"type":"BOOLEAN","value":false},{"id":"5f562e19-aedf-44d5-ab5f-c2994e2b7f57","key":"NEW_BOAT","version":4,"type":"BOOLEAN","value":false}]

event: feature
data: {"id":"5f562e19-aedf-44d5-ab5f-c2994e2b7f57","key":"NEW_BOAT","version":5,"type":"BOOLEAN","value":true}

event: feature
data: {"id":"ae5e1af5-ac7d-475c-9862-7a3f88fa20d3","key":"dunk","type":"BOOLEAN"}

event: feature
data: {"id":"ae5e1af5-ac7d-475c-9862-7a3f88fa20d3","key":"dunk","version":1,"type":"BOOLEAN","value":false}

event: delete_feature
data: {"id":"ae5e1af5-ac7d-475c-9862-7a3f88fa20d3","key":"dunk","type":"BOOLEAN"}

event: bye
data: {"status":"closed"}

You can see it is a series of pairs: event, data. These are standard names in SSE, their values are what we control.

The event is the command, there is a special one called "error" that is managed by the protocol itself.

Requirements for SSE client

The SSE client must support all of the event types the server sends, and must not fault if new event types are introduced. A dispatch sketch follows this list. The event types are:

  • ack: nothing needs to be done for this, it is simply a heartbeat and is sent at start to ensure that the client receives something.

  • features: this contains an array of Feature objects and their appropriate state. If client evaluated keys are supported, this will include strategies, if server evaluated keys are being used, it will only include the specific feature values for this client.

  • feature: this contains the update to a single feature

  • delete_feature: this indicates that the specified feature should be deleted. The semantics of this are slightly more complex, however. A deleted feature is (a) an actual delete of that feature value, (b) a delete of the feature entirely, or (c) a feature value being retired. In case (c), because of parallelism in the way messages are delivered, a feature could be created (v0), updated (v1), retired (v2) and unretired (v3) in quick succession, which could lead to an incorrect delivery order for deleted features. As such, you need to check if the incoming delete has a "null" version, a version of 0, or a version greater than the version currently in the repository. In any of these cases, delete the feature.

  • config: this contains metadata used about how the channel should perform, see below

  • bye: the server is closing the connection (you will not always receive this)
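A minimal dispatch sketch in TypeScript (the repository method names here are illustrative, not a required API):

    interface FeatureState {
      id: string;
      key: string;
      version?: number;
      type: string;
      value?: unknown;
    }

    interface Repository {
      notifyFeatures(features: FeatureState[]): void;       // full state: "features" event
      notifyFeature(feature: FeatureState): void;           // single update: "feature" event
      deleteFeature(feature: FeatureState): void;           // "delete_feature" event
      notifyConfig(config: Record<string, unknown>): void;  // "config" event (e.g. edge.stale)
    }

    function dispatch(repository: Repository, eventType: string, data: string): void {
      switch (eventType) {
        case 'ack':        // heartbeat - nothing to do
        case 'bye':        // server is closing the connection; the client will reconnect
          break;
        case 'features':
          repository.notifyFeatures(JSON.parse(data));
          break;
        case 'feature':
          repository.notifyFeature(JSON.parse(data));
          break;
        case 'delete_feature':
          repository.deleteFeature(JSON.parse(data));
          break;
        case 'config':
          repository.notifyConfig(JSON.parse(data));
          break;
        default:
          break;           // unknown event types must be ignored, not treated as a fault
      }
    }

    // the delete rule from above: only delete when the incoming version is null/0
    // or greater than the version currently held in the repository
    function shouldDelete(incoming: FeatureState, existing?: FeatureState): boolean {
      if (!existing) return false;
      if (incoming.version == null || incoming.version === 0) return true;
      return incoming.version > (existing.version ?? 0);
    }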

config

The config response always responds with a JSON data packet. The currently recognized fields are:

  • edge.stale - the client should stop attempting to connect as this API key represents an environment that is stale. This MUST be honoured.

A note on the EventSource spec

The EventSource spec indicates that if the server wants the client to stop listening, it should send an HTTP 204. However, in our case, because we have to validate the Service Account and Environment and this causes a slight delay, we send back an ack and then a failed message. If you receive a failed message, that is when you could stop listening. However, it may be due to transient network issues preventing your client from talking to the server; that would be rare, but it does happen. It could also happen because the cache does not yet know about your environment or service account, such as when Dacha starts after the Edge server, or the first Dacha takes a short while to negotiate its cache.

Look at the other examples, talk to us

There are multiple SDK implementations so far, so have a look at how they are written. Chances are you have a passing familiarity with at least one of the languages.

Please also talk to us, we are available on the #fh-sdk channel on the Anyways Labs Slack.

A Feature Repository

It is expected that there will be a repository pattern of some kind in each SDK. It may have all the functionality pertaining to features, listeners, streamed updates and analytics logging built in, and yet not connect to anything itself.

As an example, the Typescript client has two repositories - a LocalRepository intended to be loaded from a file, and a ClientFeatureRepository. Most of the other SDKs have just one. They generally have two Edge connectors to choose from - a polling edge connector and an eventstream edge connector. Both of these feed data to the repository through a generic interface.

For the Java version, this has been done because Jersey is the first example stack, but there are many others in Java-land, and when we have a Mobile SDK it will support Android-Java, which will not be able to use SSE. It also means that if someone built a pure NATS client or Kafka client, the same repository could be used.

Consider approaching it this way, where the event source is passed the repository and it notifies that repository as new events come in.

However, if it is unlikely your repository will be used a different way, then merging them together makes sense.

Typically, because the repository is what the main code base will interact with, a repository will be responsible for (a minimal interface sketch follows this list):

  • holding all of the features

  • keeping track of new features coming in and checking their versions to make sure they are new versions or that their values have changed. A feature could arrive with the same version but a different value with Server Evaluated API Keys, because the client can change its context, which changes the way the feature is evaluated.

  • triggering events (callbacks, streams or whatever is idiomatic in your language) for when features change

  • keeping track of user context so you can apply rollout strategies (see Rollout Strategies below)

  • allowing clients to remove themselves from listening

  • indicating to clients when the full list of features has arrived ("ready"). If your SSE layer actually blocks until it has received the full list, this may be perfectly idiomatic, especially if your SDK is targeting servers or command-line tools.

  • analytics logging and registering senders

  • other optional characteristics, such as the catch & release mode supported by Javascript and Dart (because of their UI focus)
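A minimal repository interface sketch in TypeScript, reflecting the responsibilities above (all names are illustrative, not a required API):

    interface FeatureStateHolder {
      key: string;
      value: unknown;
      version?: number;
    }

    interface FeatureRepository {
      readonly readiness: 'NotReady' | 'Ready' | 'Failed';

      // fed by an edge connector (polling or SSE); the connector hands over the
      // decoded event type and JSON payload and the repository does the rest
      notify(eventType: string, data: unknown): void;

      // holds all of the features
      feature(key: string): FeatureStateHolder | undefined;

      // listeners; the returned function removes the listener again
      addReadinessListener(listener: (state: string) => void): () => void;
      addFeatureListener(key: string, listener: (f: FeatureStateHolder) => void): () => void;

      // analytics logging and senders
      addAnalyticsCollector(collector: (action: string, features: FeatureStateHolder[]) => void): void;
    }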

Repository Readiness

The current design for repositories is that there are three states a repository can be in: Not Ready (before the first successful connection), Ready, and Failed.

If you are writing a server application, you should include the FeatureHub repository state in your liveness checks - anything other than Ready means the repository is not usable.

  • Not Ready - this represents the situation before the SDK has been able to connect and get feature states. At this point it does not even know if the FeatureHub URL and API key it has been given are valid.

  • Failed - this represents the state if the server returns a 4xx error of any kind. This can happen if your SDK is too old or because the API key does not exist.

  • Ready - this represents the repository having features, not necessarily the current features, that depends on how your application connects to the FeatureHub server.

A repository starts in Not Ready and transitions to Ready only when the first successful connection is made. It should not transition back to Not Ready, only to Failed. Transient outages in the FeatureHub install should not mark the repository as Not Ready, because the repository is designed specifically to be a cache. Avoiding wide-ranging server outages because a FeatureHub install was interrupted is a key design goal of using a local cache.

So transitions are: Not Ready > Ready, Not Ready > Ready > Failed, Not Ready > Failed.

Similarly, if a server receives a "usage cap exceeded" message from the SaaS version of FeatureHub, it should not transition to a different state.
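As a sketch, the readiness transitions could be expressed like this (the event names are illustrative):

    type Readiness = 'NotReady' | 'Ready' | 'Failed';

    type ReadinessEvent = 'connected' | 'failed4xx' | 'transientOutage' | 'usageCapExceeded';

    function nextReadiness(current: Readiness, event: ReadinessEvent): Readiness {
      switch (event) {
        case 'connected':
          // Not Ready -> Ready on the first successful connect; a Failed repository stays Failed
          return current === 'Failed' ? current : 'Ready';
        case 'failed4xx':
          return 'Failed';
        case 'transientOutage':
        case 'usageCapExceeded':
          return current;   // never fall back to Not Ready - keep serving the cached features
      }
    }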

The SSE Layer

This is normally a separate thing, and you would pass your repository into this and it would update it as new updates come in. Exactly how this works is up to you, the Dart, Java and Typescript clients simply hand off the decoded event type and the JSON blob and let the repository deal with the rest.

The SSE layer could be held onto, or it might not be. If, for example, you wanted to block until the full list of features was available, you might hold onto it until it told you it was ready or it timed out.

The Test Client

The Test API is something an integration or e2e test would use to toggle features. Where it sits in your SDK is up to you; it could simply be made available via the generated OpenAPI client, as it is in Dart and C#.

Rollout Strategies

New in Milestone 1.0 is the support for Rollout Strategies, and each of the SDKs has had a ClientContext added to it to support this.

Essentially the ClientContext is information provided to the Repository about the client that is using it. It is designed to support rollout strategies.

The ClientContext is essentially a key/value pair repository with some keys having a special meaning. The keys themselves are case sensitive, but how they appear in your language and what case they use is up to you. All keys are stored as a key and a list of possible values, because the strategy API supports matching against arrays. The keys at the moment are:

  • userkey - an arbitrary key that is primarily used for percentage based rollout (the UI support for this delivered in Milestone 1.0). This key will also be used in the future for individual user profiling if you wish to use it for that, so keep it as opaque as possible. A good opaque key is also useful for percentage rollout (see below).

  • session - a key which is usually used to indicate the current logged in session.

  • device - the device the user is using (mobile, desktop, browser). Defined by the OpenAPI enum StrategyAttributeDeviceName

  • platform - the device’s platform. Again defined by the OpenAPI enum StrategyAttributePlatformName

  • country - the country of the user. We define acceptable variants using the OpenAPI enum StrategyAttributeCountryName because it allows us to also infer geographical regions. Please let us know if we have missed any from the Encyclopedia Britannica’s list - if your country isn’t on their list or shouldn’t be on their list, please take it up with them.

  • version - the semantic version of your application. Generally the combination of version and platform is very useful when rolling out features to specific platforms (such as Mobile).

We also expose the ability for a person to store a key/value pair or a key/list of values pair.

We encourage a fluent-style API for developers to use for this context.
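A fluent ClientContext sketch in TypeScript (method and key names are illustrative; mirror whatever is idiomatic for your language):

    class ClientContext {
      private values = new Map<string, string[]>();

      userKey(key: string): this { return this.attr('userkey', key); }
      sessionKey(key: string): this { return this.attr('session', key); }
      device(device: string): this { return this.attr('device', device); }
      platform(platform: string): this { return this.attr('platform', platform); }
      country(country: string): this { return this.attr('country', country); }
      version(version: string): this { return this.attr('version', version); }

      attr(key: string, value: string): this {
        this.values.set(key, [value]);
        return this;
      }

      attrs(key: string, vals: string[]): this {
        this.values.set(key, vals);
        return this;
      }

      // build() signals that the user has finished adding data; a server-evaluated
      // context would re-request features here with a fresh x-featurehub header
      async build(): Promise<this> {
        return this;
      }
    }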

Supporting server side evaluation

We started with server side evaluation in Milestone 1.0. To support this, if a user puts data into the ClientContext, there needs to be a mechanism by which the user indicates they have finished putting data into it (build is used in the other SDKs). This then triggers whichever client the user is using to refresh its connection, passing a special header - x-featurehub.

Supporting client side evaluation

In client side evaluation, the API keys have an * in them to indicate they are client side. The Edge Server knows which keys are which, and won’t let strategy details out for Server Evaluated keys as it did prior to 1.3. All client side evaluation is done in the SDK, and is intended to compare a set of rollout strategies against the ClientContext entries.

In the Typescript, C# and Java SDKs, this is done by creating a special Server or Client eval context that is hidden from the user behind the ClientContext interface. When the build happens, each one knows whether to make an updated request to the server (server side eval, which needs to send a new x-featurehub header) or to do nothing (client side eval).

Encoding the header

The header x-featurehub is designed to follow the same kind of format as the W3C Baggage spec: key/value pairs where the value is URL-encoded. In our case, we are sending arrays of values which we expect to be separated by commas. So the header will be:

key=value,value,value,key=value,value,value,key=value

To support this, the values for each key are joined by commas and then URL-encoded, and then key=value pairs are made of them.

An example from the C# API is as follows:

     await _fhConfig.NewContext()
        .Attr("city", "Istanbul City")
        .Attrs("family", new List<String> {"Bambam", "DJ Elif"})
        .Country(StrategyAttributeCountryName.Turkey)
        .Platform(StrategyAttributePlatformName.Ios)
        .Device(StrategyAttributeDeviceName.Mobile)
        .UserKey("tv-show")
        .Version("6.2.3")
        .SessionKey("session-key").Build();

This makes the header:

city=Istanbul+City,country=turkey,device=mobile,family=Bambam%2cDJ+Elif,platform=ios,session=session-key,userkey=tv-show,version=6.2.3

The key replacements in attribute values are that = must be replaced with %3D and , with %2C, as these are the delimiters the server uses to split the attributes up for evaluation.

You should sort the attributes by key name, at least to make testing easier.
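A header-encoding sketch in TypeScript (the helper name is hypothetical; the exact escaping - e.g. + vs %20 for spaces, and the case of hex escapes - should match the form encoding your platform provides):

    // attrs: attribute name -> list of values from the ClientContext
    function encodeFeatureHubHeader(attrs: Map<string, string[]>): string {
      return Array.from(attrs.entries())
        .sort(([a], [b]) => a.localeCompare(b))    // sort by key name
        .map(([key, values]) => {
          const joined = values.join(',');         // multiple values joined by commas
          // URL-encode the joined value, then use form-style '+' for spaces
          const encoded = encodeURIComponent(joined).replace(/%20/g, '+');
          return `${key}=${encoded}`;
        })
        .join(',');
    }

    // encodeFeatureHubHeader(new Map([['city', ['Istanbul City']], ['family', ['Bambam', 'DJ Elif']]]))
    // => "city=Istanbul+City,family=Bambam%2CDJ+Elif"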

Percentage Rollout

For percentage rollout, we apply the Murmur3 Hash to the user’s key (by default) and the feature’s ID and spread it out over 1 million values. This means a given key will get consistent results across different devices for the same feature.

So a user "fred" might get assigned a value of 23.2852%, and will always for that feature get that percentage. "mary" on the other hand may get 77.5421%, but for that feature will always get that percentage. For a different feature, they will both get different percentages.

When applying percentage rollouts, order matters for the rollout strategies defined on the feature: the first matching strategy will be applied. Let's take an example. Say we have a String rollout with a default of red, 20% blue and 30% green. This means anyone whose calculated hash is 20% or lower will get blue. Anyone at 50% (20% + 30%) or lower will get green - note however that the people in the first 20% have already been matched and have exited the criteria matching. If the strategies were in the opposite order, 0-30% would get green, and 30-50% would get blue.
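A bucketing sketch in TypeScript for the example above (murmur3_32 is an assumed helper - any 32-bit Murmur3 implementation - and the exact hash-to-bucket formula is illustrative):

    declare function murmur3_32(input: string): number;   // assumed: returns an unsigned 32-bit hash

    const MAX_PERCENTAGE = 1_000_000;   // percentages are spread over one million buckets

    // the same user key + feature id always produces the same bucket
    function bucket(percentageKey: string, featureId: string): number {
      return Math.floor((murmur3_32(percentageKey + featureId) / 0xffffffff) * MAX_PERCENTAGE);
    }

    // first match wins over the cumulative percentage: default red, 20% blue, 30% green
    function colourFor(userBucket: number): string {
      const strategies = [
        { percentage: 200_000, value: 'blue' },    // 0 - 20%
        { percentage: 300_000, value: 'green' },   // 20% - 50%
      ];
      let cumulative = 0;
      for (const s of strategies) {
        cumulative += s.percentage;
        if (userBucket < cumulative) return s.value;
      }
      return 'red';   // the feature's default value
    }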

We do support the ability to indicate that the percentage rollout could or should be over different keys, so if for example you wanted percentage rollout over a company or store field, you will be able to do this in the future. The API and server side evaluation supports this, the most complex part is the UI to allow users to manage this data, so this will appear over time. It will only become important for SDKs when we start supporting client side evaluation.

Client Side Evaluation

Client side evaluation currently follows the same basic pattern in all SDKs. A rollout strategy basically consists of a bucket of data from the user, against which you need to map a rollout strategy’s attributes and potentially its percentage criteria (if it has any). Writing client side evaluation can be quite time consuming.

  • A "context" in the following discussion is simply a map, or dictionary of key / value pairs.

  • Each feature can always have zero or more rollout strategies attached to it.

The process for each feature is normally thus:

  1. if you are supporting value interceptors, check those first - generally the rule is that if the feature is not locked and you have an interceptor value, return that value. Value interceptors can come from the incoming request trying to override the value (which is why the lock status is important) or from the local developer’s machine when they are trying to operate within a specific context regardless of what the server is saying.

  2. now check the rollout strategies and determine if any of them match. Each rollout strategy comes with a value (returned if matched), a percentage rollout strategy (which may be null) and zero or more attributes. The caveat is that if there is no context or there are no strategies, this step can be skipped - nothing can match.

  3. if neither of these match, then you fall back to the default value of the feature.

Evaluating the rollout strategies

Remember we also have to follow the percentage rules from above when applying strategies, so we start iterating over the strategies with the cumulative percentage set to 0. Then, for each strategy (a sketch follows the list):

  1. if there is a percentage on this strategy, figure out which keys we are using for the percentage and determine, based on the rules above, whether we are inside that percentage range. If we aren’t, skip this strategy. If we are and there are attributes, check the attributes as well (see below). If we match, return the associated value; if not, skip ahead.

  2. if there is no percentage on this strategy, check the attributes. If they match, return the associated value; if not, skip ahead.
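A sketch of this evaluation loop in TypeScript (the types and the bucketFor/attributesMatch helpers are assumed, not a defined API):

    interface Attribute {
      fieldName: string;
      type: string;        // string, number, date, datetime, semver, ip/cidr, boolean
      condition: string;   // equals, not equals, includes, ...
      values: unknown[];
    }

    interface RolloutStrategy {
      value: unknown;
      percentage?: number;        // may be absent
      attributes?: Attribute[];   // zero or more
    }

    type Context = Map<string, string[]>;

    declare function bucketFor(context: Context, strategy: RolloutStrategy, featureId: string): number; // assumed
    declare function attributesMatch(attrs: Attribute[], context: Context): boolean;                    // assumed

    function evaluate(strategies: RolloutStrategy[], context: Context, featureId: string): unknown {
      let cumulative = 0;   // start the cumulative percentage at 0

      for (const strategy of strategies) {
        if (strategy.percentage != null) {
          cumulative += strategy.percentage;
          if (bucketFor(context, strategy, featureId) > cumulative) {
            continue;   // outside the percentage range - skip this strategy
          }
        }
        if (strategy.attributes?.length && !attributesMatch(strategy.attributes, context)) {
          continue;     // attributes did not match - skip ahead
        }
        return strategy.value;   // first matching strategy wins
      }
      return undefined;          // fall back to the feature's default value
    }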

Checking the attributes

There are a collection of attributes associated with each rollout strategy. Each one of them has a set of key data:

  • fieldName - which field name in the Context to compare against

  • type - string, number, date, datetime, semver, ip/cidr, boolean. Dates and Date/Times are always UTC. You will need to write a matcher for each of these types, although once formatted correctly, the date/datetimes can usually reuse the string matcher.

  • condition - the condition to apply. Different types have different sets of conditions they can compare against; the conditions are equals, not equals, includes, excludes, greater, lesser, greater-equals, lesser-equals (a minimal string matcher sketch follows this list).

  • values - the value(s) you are comparing against as a match
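As a sketch, a string matcher might look like this (the condition names follow the list above - the OpenAPI enum may spell them differently, and the includes/excludes semantics here are a substring check, which you should verify against an existing SDK's matcher):

    function matchString(condition: string, supplied: string, values: string[]): boolean {
      switch (condition) {
        case 'equals':         return values.some(v => v === supplied);
        case 'not-equals':     return !values.some(v => v === supplied);
        case 'includes':       return values.some(v => supplied.includes(v));
        case 'excludes':       return !values.some(v => supplied.includes(v));
        case 'greater':        return values.some(v => supplied > v);
        case 'greater-equals': return values.some(v => supplied >= v);
        case 'lesser':         return values.some(v => supplied < v);
        case 'lesser-equals':  return values.some(v => supplied <= v);
        default:               return false;
      }
    }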

The Typescript strategy matcher, for instance, is a useful reference.