These extra timeout contexts were only in the new multiple IDPs e2e
test. Remove this possible cause of test cleanup flakes where the test
runs slow enough in CI that this timeout context has already expired
and then the cleanup function fails with context deadline exceeded
errors.
To make the subject of the downstream ID token more unique when
there are multiple IDPs. It is possible to define two IDPs in a
FederationDomain using the same identity provider CR, in which
case the only thing that would make the subject claim different
is adding the IDP display name into the values of the subject claim.
- Remove that validation from the controller since the CRD already
validates it during creates and updates.
- Also finish the supervisor_federationdomain_status_test.go by adding
more tests for both controller validations and CRD validations
- Also fix small bug in controller where it used Sprintf wrong
- Rename WaitForTestFederationDomainStatus test helper to
WaitForFederationDomainStatusPhase
Also changes the transformation pipeline code to sort and uniq
the transformed group names at the end of the pipeline. This makes
the results more predicable without changing the semantics.
- Avoid a possible race condition where the status says "Ready" but
the endpoints take another moment to become available, potentially
casing a fast client to get a 404 after observing that the status
is "Ready" and then immediately trying to use the endpoints.
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
- Refactor testlib.CreateTestFederationDomain helper
- Call testlib.WaitForTestFederationDomainStatus after each integration
test creates an IDP and expects the FederationDomain to become ready
- Create an IDP for some tests which want the FederationDomain to be
ready but were previously not creating any IDP
- Expect the new FederationDomain condition type
"IdentityProvidersFound" in those tests where it is needed
Co-authored-by: Joshua Casey <joshuatcasey@gmail.com>
- adds the truthy condition
- TODOs for falsy conditions
- addiional notes for other conditions
- tests updated to pass with the new condition
Co-authored-by: Ryan Richard <richardry@vmware.com>
- move pushd/popd inside if statements for alternative-deploy methods
- add specific alternative-deploy vars for individual components
- supervisor
- concierge
- local-user-authenticator
while preserving the current alternative-deploy for all three
- doc that equals for flags does not work
--foo=bar is invalid
--foo bar is valid
Used this as an opportunity to refactor how some tests were
making assertions about error strings.
New test helpers make it easy for an error string to be expected as an
exact string, as a string built using sprintf, as a regexp, or as a
string built to include the platform-specific x509 error string.
All of these helpers can be used in a single `wantErr` field of a test
table. They can be used for both unit tests and integration tests.
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
- Specify mappings on OIDCIdentityProvider.spec.claims.additionalClaimMappings
- Advertise additionalClaims in the OIDC discovery endpoint under claims_supported
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Joshua Casey <joshuatcasey@gmail.com>
Also increase the timeout in an integration test because it is flaking
on one of the GKE environments sometimes, probably because the
Concierge controllers aren't ready fast enough before the integration
tests start.
The fuzzed value depends on which Go compiler is used. This breaks
the fips tests in CI as long as the fips compiler is a version behind
(we are still waiting for the 1.19 fips compiler to come out).
The fuzzing is still being tested by a separate unit test, so we are
not losing fuzzing test coverage.
- Upgrade Go used in CI from 1.19.0 to 1.19.1
- Upgrade all go.mod direct dependencies to latest available versions
- Upgrade distroless base image to latest available version
- Upgrade Go fips compiler to to latest available version
Note that upgrading the go-oidc library changed an error message
returned by that library, so update the places where tests were
expecting that error message.
When oidcclientsecretstorage.Set() wants to update the contents of the
storage Secret, it also wants to keep the original ownerRef of the
storage Secret, so it needs the middleware to rewrite the API group
of the ownerRef again during the update (just like it had initially done
during the create of the Secret).
Sets the Name, Namespace, CreationTimestamp fields in the object meta
of the return value.
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
- Change update-codegen.sh script to also generated openapi code for the
aggregated API types
- Update both aggregated API servers' configuration to make them serve
the openapi docs for the aggregated APIs
- Add new integration test which runs `kubectl explain` for all Pinniped
API resources, and all fields and subfields of those resources
- Update some the comments on the API structs
- Change some names of the tmpl files to make the filename better match
the struct names
This commit is a WIP commit because it doesn't include many tests
for the new feature.
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
Where possible, use securityContext settings which will work with the
most restrictive Pod Security Admission policy level (as of Kube 1.25).
Where privileged containers are needed, use the namespace-level
annotation to allow them.
Also adjust some integration tests to make similar changes to allow the
integration tests to pass on test clusters which use restricted PSAs.
This test is a little flaky in slow Kubernetes clusters. Try giving a
little more time for things to update before failing the test, to
hopefully make this test a little more reliable.
It seems that google changed it from "the gcp auth plugin is deprecated
in v1.22+, unavailable in v1.25+; use gcloud instead" to instead say
"unavailable in v1.26+". Make the matcher in category_test.go more loose
to allow both to match.
Also fix some tests that were broken by bumping golang and dependencies
in the previous commits.
Note that in addition to changes made to satisfy the linter which do not
impact the behavior of the code, this commit also adds ReadHeaderTimeout
to all usages of http.Server to satisfy the linter (and because it
seemed like a good suggestion).
Add the new `username` scope to the Supervisor and exclude usernames from dynamic clients which are not granted the scope, and other dynamic client related changes
When the token exchange grant type is used to get a cluster-scoped
ID token, the returned token has a new audience value. The client ID
of the client which performed the authorization was lost. This didn't
matter before, since the only client was `pinniped-cli`, but now that
dynamic clients can be registered, the information would be lost in the
cluster-scoped ID token. It could be useful for logging, tracing, or
auditing, so preserve the information by putting the client ID into the
`azp` claim in every ID token (authcode exchange, clsuter-scoped, and
refreshed ID tokens).
- For backwards compatibility with older Pinniped CLIs, the pinniped-cli
client does not need to request the username or groups scopes for them
to be granted. For dynamic clients, the usual OAuth2 rules apply:
the client must be allowed to request the scopes according to its
configuration, and the client must actually request the scopes in the
authorization request.
- If the username scope was not granted, then there will be no username
in the ID token, and the cluster-scoped token exchange will fail since
there would be no username in the resulting cluster-scoped ID token.
- The OIDC well-known discovery endpoint lists the username and groups
scopes in the scopes_supported list, and lists the username and groups
claims in the claims_supported list.
- Add username and groups scopes to the default list of scopes
put into kubeconfig files by "pinniped get kubeconfig" CLI command,
and the default list of scopes used by "pinniped login oidc" when
no list of scopes is specified in the kubeconfig file
- The warning header about group memberships changing during upstream
refresh will only be sent to the pinniped-cli client, since it is
only intended for kubectl and it could leak the username to the
client (which may not have the username scope granted) through the
warning message text.
- Add the user's username to the session storage as a new field, so that
during upstream refresh we can compare the original username from the
initial authorization to the refreshed username, even in the case when
the username scope was not granted (and therefore the username is not
stored in the ID token claims of the session storage)
- Bump the Supervisor session storage format version from 2 to 3
due to the username field being added to the session struct
- Extract commonly used string constants related to OIDC flows to api
package.
- Change some import names to make them consistent:
- Always import github.com/coreos/go-oidc/v3/oidc as "coreosoidc"
- Always import go.pinniped.dev/generated/latest/apis/supervisor/oidc
as "oidcapi"
- Always import go.pinniped.dev/internal/oidc as "oidc"
- Add dynamic client unit tests for the upstream OIDC callback and
POST login endpoints.
- Enhance a few log statements to print the full fosite error messages
into the logs where they were previously only printing the name of
the error type.
- Enhance the token exchange to check that the same client is used
compared to the client used during the original authorization and
token requests, and also check that the client has the token-exchange
grant type allowed in its configuration.
- Reduce the minimum required bcrypt cost for OIDCClient secrets
because 15 is too slow for real-life use, especially considering
that every login and every refresh flow will require two client auths.
- In unit tests, use bcrypt hashes with a cost of 4, because bcrypt
slows down by 13x when run with the race detector, and we run our
tests with the race detector enabled, causing the tests to be
unacceptably slow. The production code uses a higher minimum cost.
- Centralize all pre-computed bcrypt hashes used by unit tests to a
single place. Also extract some other useful test helpers for
unit tests related to OIDCClients.
- Add tons of unit tests for the token endpoint related to dynamic
clients for authcode exchanges, token exchanges, and refreshes.
This is only a first commit towards making this feature work.
- Hook dynamic clients into fosite by returning them from the storage
interface (after finding and validating them)
- In the auth endpoint, prevent the use of the username and password
headers for dynamic clients to force them to use the browser-based
login flows for all the upstream types
- Add happy path integration tests in supervisor_login_test.go
- Add lots of comments (and some small refactors) in
supervisor_login_test.go to make it much easier to understand
- Add lots of unit tests for the auth endpoint regarding dynamic clients
(more unit tests to be added for other endpoints in follow-up commits)
- Enhance crud.go to make lifetime=0 mean never garbage collect,
since we want client secret storage Secrets to last forever
- Move the OIDCClient validation code to a package where it can be
shared between the controller and the fosite storage interface
- Make shared test helpers for tests that need to create OIDC client
secret storage Secrets
- Create a public const for "pinniped-cli" now that we are using that
string in several places in the production code
The following validation is enforced:
1. Names must start with client.oauth.pinniped.dev-
2. Redirect URIs must start with https://
or http://127.0.0.1
or http://::1
3. All spec lists must not have duplicates
Added an integration test to assert all static validations.
Signed-off-by: Monis Khan <mok@vmware.com>
When response_mode=form_post is requested, some error cases will be
returned to the client using the form_post web page to POST the result
back to the client's redirect URL.
Also fix some comments that didn't fit onto one line in the yaml
examples, be consistent about putting a blank line above `---` yaml
separators, and some other small doc improvements.
Also:
- Add CSS to login page
- Refactor login page HTML and CSS into a new package
- New custom CSP headers for the login page, because the requirements
are different from the form_post page
Note that attempting to update 1.18.18 to 1.18.20 didn't work for some
reason, so I skipped that one. The code generator didn't like 1.18.20
and it deleted all the generated code. Avoiding 1.18.19 because it is
listed as having a regression at
https://kubernetes.io/releases/patch-releases/#non-active-branch-history
The other handlers for GET and POST requests are not yet implemented in
this commit. The shared handler code in login_handler.go takes care of
things checking the method, checking the CSRF cookie, decoding the state
param, and adding security headers on behalf of both the GET and POST
handlers.
Some code has been extracted from callback_handler.go to be shared.
Also fix some test failures on the callback handler, register the
new login handler in manager.go and add a (half baked) integration test
Signed-off-by: Margo Crawford <margaretc@vmware.com>
To keep this backwards compatible, this PR changes how
the cli deals with ambiguous flows. Previously, if there
was more than one flow advertised, the cli would require users
to set the flag --upstream-identity-provider-flow. Now it
chooses the first one in the list.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
Go 1.18.1 started using MacOS' x509 verification APIs on Macs
rather than Go's own. The error messages are different.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
We cannot use plog until the log level config has been setup, but
that occurs after this init function has run.
Signed-off-by: Monis Khan <mok@vmware.com>
- Use camel-case in the static configmap
- Parse the value into a boolean in the go struct instead of a string
- Add test for when unsupported value is used in the configmap
- Run the config_test.go tests in parallel
- Update some paragraphs in configure-supervisor.md for clarity
Add new deprecated_insecure_accept_external_unencrypted_http_requests
value in values.yaml. Allow it to be a boolean or a string to make it
easier to use (both --data-value and --data-value-yaml will work).
Also:
- Consider "ip6-localhost" and "ip6-loopback" to be loopback addresses
for the validation
- Remove unused env.SupervisorHTTPAddress var
- Deprecate the `service_http_*` values in values.yaml by renaming them
and causing a ytt render error when the old names are used
ory/x has new releases very often, sometimes multiple times per week,
causing a lot of noise from dependabot. We were barely using it
directly, so replace our direct usages with equivalent code.
You can use an older version of K8s on your development workstation
by temporarily editing kind-up.sh to add the `--image` flag. By defining
both v1beta2 and v1beta3 you should continue to be able to use old
versions of K8s in this way with Kind v0.12.0.
It appears that kind completely ignores kubeadm.k8s.io/v1beta2 config
starting in Kind v0.12.0.
You can observe the config being ignored or used by adding `-v 10` to
the command-line arguments of `kind create cluster` in kind-up.sh.
- Two of the linters changed their names
- Updated code and nolint comments to make all linters pass with 1.44.2
- Added a new hack/install-linter.sh script to help developers install
the expected version of the linter for local development
- Note that v0.8.0 no longer supports the "trivialVersions=true"
command-line option, so remove that from update-codegen.sh.
It doesn't seem to impact the output (our generated CRD yaml files).
Also:
- Make our code generator script work with Go 1.17
- Make our update.sh script work on linux
- Update the patch versions of the old Kube versions that we were using
to generate code (see kube-versions.txt)
- Use our container images from ghcr instead of
projects.registry.vmware.com for codegen purposes
- Make it easier to debug in the future by passing "-v" to the Kube
codegen scripts
- Updated copyright years to make commit checks pass
The purpose of this change is to allow Helm to be used to deploy Pinniped
into the local KinD cluster for the local integration tests. That said,
the change allows any alternate deployment mechanism, I just happen
to be using it with Helm.
All default behavior is preserved. This won't change how anyone uses the
script today, it just allows me not to copy/paste the whole setup for the
integration tests.
Changes:
1) An option called `--alternate-deploy <path-to-deploy-script>` has been
added, that when enabled calls the specified script instead of using ytt
and kapp. The alternate deploy script is called with the app to deploy
and the tag of the docker image to use. We set the default value of
the alternate_deploy variable to undefined, and there is a check that
tests if the alternate deploy is defined. For the superivsor it looks
like this:
```
if [ "$alternate_deploy" != "undefined" ]; then
log_note "The Pinniped Supervisor will be deployed with $alternate_deploy pinniped-supervisor $tag..."
$alternate_deploy pinniped-supervisor $tag
else
normal ytt/kapp deploy
fi
```
2) Additional log_note entries have been added to enumerate all values passed
into the ytt/kapp deploy. Used while I was trying to reach parity in the integration
tests, but I think they are useful for debugging.
3) The manifests produced by ytt and written to /tmp are now named individually.
This is so an easy comparison can be made between manifests produced by a ytt/kapp
run of integration tests and manifests produced by helm run of the integration tests.
If something is not working I have been comparing the manifests after these runs to
find differences.
This allows us to target browser based tests with the regex:
go test -v -race -count 1 -timeout 0 ./test/integration -run '/_Browser'
New tests that call browsertest.Open will automatically be forced to
follow this convention.
Signed-off-by: Monis Khan <mok@vmware.com>
When the POST to the CLI's localhost callback endpoint results in a
non-2XX status code, then treat that as a failed login attempt and
automatically show the manual copy/paste UI.
Just in case some future browser change sends some new kind of request
to our CLI, just ignore them by returning StatusMethodNotAllowed and
continuing to listen.
When the test was going to fail, a goroutine would accidentally block
on writing to an unbuffered channel, and the spawnTestGoroutine helper
would wait for that goroutine to end on cleanup, causing the test to
hang forever while it was trying to fail.
This is to support the new changes in Google Chrome v98 which now
performs CORS preflight requests for the Javascript form submission
on the Supervisor's login page, even though the form is being submitted
to a localhost listener.
reccommend using install-pinniped-concierge-crds.yaml, then
install-pinniped-concierge-resources.yaml.
Previously we recommended install-pinniped-concierge-crds (a subset),
then install-pinniped-concierge (everything concierge related, including
the crds). This works fine for install, but not uninstall. Instead we
should use a separate yaml file that contains everything in
install-pinniped-concierge but *not* in install-pinniped-concierge-crds.
We have been generating this file in CI since a5ced4286b6febc7474b7adee34eeb1b62ec82b7
but we haven't released since then so we haven't been able to recommend
its use.
Fosite v0.42.0 introduced a new RevokeRefreshTokenMaybeGracePeriod()
interface function. Updated our code to support this change. We didn't
support grace periods on refresh tokens before, so implemented it by
making the new RevokeRefreshTokenMaybeGracePeriod() method just call
the old RevokeRefreshToken() method, therefore keeping our old behavior.
This change allows configuration of the http and https listeners
used by the supervisor.
TCP (IPv4 and IPv6 with any interface and port) and Unix domain
socket based listeners are supported. Listeners may also be
disabled.
Binding the http listener to TCP addresses other than 127.0.0.1 or
::1 is deprecated.
The deployment now uses https health checks. The supervisor is
always able to complete a TLS connection with the use of a bootstrap
certificate that is signed by an in-memory certificate authority.
To support sidecar containers used by service meshes, Unix domain
socket based listeners include ACLs that allow writes to the socket
file from any runAsUser specified in the pod's containers.
Signed-off-by: Monis Khan <mok@vmware.com>
Also refactor oidc downstreamsessiondata code to be shared between
callback handler and auth handler.
Signed-off-by: Ryan Richard <richardry@vmware.com>
When the LDAP and AD IDP watcher controllers encountered an update error
while trying to update the status conditions of the IDP resources, then
they would drop the computed desired new value of the condition on the
ground. Next time the controller ran it would not try to update the
condition again because it wants to use the cached settings and had
already forgotten the desired new value of the condition computed during
the previous run of the controller. This would leave the outdated value
of the condition on the IDP resource.
This bug would manifest in CI as random failures in which the expected
condition message and the actual condition message would refer to
different versions numbers of the bind secret. The actual condition
message would refer to an older version of the bind secret because the
update failed and then the new desired message got dropped on the
ground.
This commit changes the in-memory caching strategy to also cache the
computed condition messages, allowing the conditions to be updated
on the IDP resource during future calls to Sync() in the case of a
failed update.
Ran:
go get -u ./... && go mod tidy
Pinned all go.opentelemetry.io deps to match k/k.
This is needed to make the go get command work.
Signed-off-by: Monis Khan <mok@vmware.com>
- Make everything private
- Drop unused AuthTime field
- Use %q format string instead of "%s"
- Only rely on GetRawAttributeValues in AttributeUnchangedSinceLogin
Signed-off-by: Monis Khan <mok@vmware.com>
For password based login on the CLI (i.e. no browser), this change
relaxes the response code check to allow for any redirect code
handled by the Go standard library. In the future, we can drop the
rewriteStatusSeeOtherToStatusFoundForBrowserless logic from the
server side code.
Signed-off-by: Monis Khan <mok@vmware.com>
Highlights from this dep bump:
1. Made a copy of the v0.4.0 github.com/go-logr/stdr implementation
for use in tests. We must bump this dep as Kube code uses a
newer version now. We would have to rewrite hundreds of test log
assertions without this copy.
2. Use github.com/felixge/httpsnoop to undo the changes made by
ory/fosite#636 for CLI based login flows. This is required for
backwards compatibility with older versions of our CLI. A
separate change after this will update the CLI to be more
flexible (it is purposefully not part of this change to confirm
that we did not break anything). For all browser login flows, we
now redirect using http.StatusSeeOther instead of http.StatusFound.
3. Drop plog.RemoveKlogGlobalFlags as klog no longer mutates global
process flags
4. Only bump github.com/ory/x to v0.0.297 instead of the latest
v0.0.321 because v0.0.298+ pulls in a newer version of
go.opentelemetry.io/otel/semconv which breaks k8s.io/apiserver.
We should update k8s.io/apiserver to use the newer code.
5. Migrate all code from k8s.io/apimachinery/pkg/util/clock to
k8s.io/utils/clock and k8s.io/utils/clock/testing
6. Delete testutil.NewDeleteOptionsRecorder and migrate to the new
kubetesting.NewDeleteActionWithOptions
7. Updated ExpectedAuthorizeCodeSessionJSONFromFuzzing caused by
fosite's new rotated_secrets OAuth client field. This new field
is currently not relevant to us as we have no private clients.
Signed-off-by: Monis Khan <mok@vmware.com>
Also refactor the code that decides which types of revocation failures
are worth retrying. Be more selective by only retrying those types of
errors that are likely to be worth retrying.
- Rename the RevokeRefreshToken() function to RevokeToken() and make it
take the token type (refresh or access) as a new parameter.
- This is a prefactor getting ready to support revocation of upstream
access tokens in the garbage collection handler.
This change updates the new TLS integration tests to:
1. Only create the supervisor default TLS serving cert if needed
2. Port forward the node port supervisor service since that is
available in all environments
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the TLS config used by all pinniped components.
There are no configuration knobs associated with this change. Thus
this change tightens our static defaults.
There are four TLS config levels:
1. Secure (TLS 1.3 only)
2. Default (TLS 1.2+ best ciphers that are well supported)
3. Default LDAP (TLS 1.2+ with less good ciphers)
4. Legacy (currently unused, TLS 1.2+ with all non-broken ciphers)
Highlights per component:
1. pinniped CLI
- uses "secure" config against KAS
- uses "default" for all other connections
2. concierge
- uses "secure" config as an aggregated API server
- uses "default" config as a impersonation proxy API server
- uses "secure" config against KAS
- uses "default" config for JWT authenticater (mostly, see code)
- no changes to webhook authenticater (see code)
3. supervisor
- uses "default" config as a server
- uses "secure" config against KAS
- uses "default" config against OIDC IDPs
- uses "default LDAP" config against LDAP IDPs
Signed-off-by: Monis Khan <mok@vmware.com>
- Used to determine on which port the impersonation proxy will bind
- Defaults to 8444, which is the old hard-coded port value
- Allow the port number to be configured to any value within the
range 1024 to 65535
- This commit does not include adding new config knobs to the ytt
values file, so while it is possible to change this port without
needing to recompile, it is not convenient
- Allow the port number to be configured to any value within the
range 1024 to 65535
- This commit does not include adding new config knobs to the ytt
values file, so while it is possible to change this port without
needing to recompile, it is not convenient
- pull construction of authenticators.Response into searchAndBindUser
- remove information about the identity provider in the error that gets
returned to users. Put it in debug instead, where it may show up in
logs.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
- changed to use custom authenticators.Response rather than the k8s one
that doesn't include space for a DN
- Added more checking for correct idp type in token handler
- small style changes
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This stores the user DN in the session data upon login and checks that
the entry still exists upon refresh. It doesn't check anything
else about the entry yet.
Use "..." instead of "main.go" as the build target since we may have
extra files in the future.
https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies
-trimpath
remove all file system paths from the resulting executable.
Instead of absolute file system paths, the recorded file names
will begin with either "go" (for the standard library),
or a module path@version (when using modules),
or a plain import path (when using GOPATH).
Signed-off-by: Monis Khan <mok@vmware.com>
- Discover the revocation endpoint of the upstream provider in
oidc_upstream_watcher.go and save it into the cache for future use
by the garbage collector controller
- Adds RevokeRefreshToken to UpstreamOIDCIdentityProviderI
- Implements the production version of RevokeRefreshToken
- Implements test doubles for RevokeRefreshToken for future use in
garbage collector's unit tests
- Prefactors the crud and session storage types for future use in the
garbage collector controller
- See remaining TODOs in garbage_collector.go
Otherwise, the CA and proxy settings will not be used for the call
to the upstream token endpoint while performing the refresh. This
mistake was exposed by the TestSupervisorLogin integration test, so
it has test coverage.
- If the upstream refresh fails, then fail the downstream refresh
- If the upstream refresh returns an ID token, then validate it (we
use its claims in the future, but not in this commit)
- If the upstream refresh returns a new refresh token, then save it
into the user's session in storage
- Pass the provider cache into the token handler so it can use the
cached providers to perform upstream refreshes
- Handle unexpected errors in the token handler where the user's session
does not contain the expected data. These should not be possible
in practice unless someone is manually editing the storage, but
handle them anyway just to be safe.
- Refactor to share the refresh code between the CLI and the token
endpoint by moving it into the UpstreamOIDCIdentityProviderI
interface, since the token endpoint needed it to be part of that
interface anyway
- Requiring refresh tokens to be returned from upstream OIDC idps
- Storing refresh tokens (for oidc) and idp information (for all idps) in custom session data during authentication
- Don't pass access=offline all the time
- throw an error when prompt=none because the spec says we can't ignore
it
- ignore the other prompt params
Signed-off-by: Ryan Richard <richardry@vmware.com>
This will allow us to store custom data inside the fosite session
storage for all downstream OIDC sessions.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This was wrong, since you don't need a LoadBalancer to run the
impersonation proxy if you specify spec.service.type = "None" or
"ClusterIP" on the CredentialIssuer.
This change fixes a copy paste error that led to the impersonation
proxy signer CA being rotated based on the configuration of the
rotation of the aggregated API serving certificate. This would lead
to occasional "Unauthorized" flakes in our CI environments that
rotate the serving certificate at a frequent interval.
Updated the certs_expirer controller logs to be more detailed.
Updated CA common names to be more specific (this does not update
any previously generated CAs).
Signed-off-by: Monis Khan <mok@vmware.com>
Updated Roadmap to reflect the work on Supervisor token refresh for OIDC and LDAP/AD. Also changed ordering on Multiple IDP Support as we are seeing more user interest for this feature.
At debug level:
upstreamoidc.go:213] "claims from ID token and userinfo"
providerName="oidc"
keys=[at_hash aud email email_verified exp iat iss sub]
At all level:
upstreamoidc.go:207] "claims from ID token and userinfo"
providerName="oidc"
claims="{\"at_hash\":\"C55S-BgnHTmr2_TNf...hYmVhYWESBWxvY2Fs\"}"
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the kube cert agent to a middle ground behavior
that balances leader election gating with how quickly we load the
signer.
If the agent labels have not changed, we will attempt to load the
signer even if we cannot roll out the latest version of the kube
cert agent deployment.
This gives us the best behavior - we do not have controllers
fighting over the state of the deployment and we still get the
signer loaded quickly.
We will have a minute of downtime when the kube cert agent deployment
changes because the new pods will have to wait to become a leader
and for the new deployment to rollout the new pods. We would need
to have a per pod deployment if we want to avoid that downtime (but
this would come at the cost of startup time and would require
coordination with the kubelet in regards to pod readiness).
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates our certificate code to use the same 5 minute
backdate that is used by the Kubernetes controller manager. This
helps to account for clock skews between the API servers and the
kubelets that are running the pinniped pods. While this backdating
reflects a large percentage of the lifetime of our short lived
certificates (100% for the 5 minute client certificates), even a 10
minute irrevocable client certificate is within our limits. When
we move to the CSR based short lived certificates, they will always
have at least a 15 minute lifetime (5 minute backdating plus 10 minute
minimum valid duration).
Signed-off-by: Monis Khan <mok@vmware.com>
CertificatesV1beta1 was removed in Kube 1.22, so the tests cannot
blindly rely on it anymore. Use CertificatesV1 whenever the server
reports that is available, and otherwise use the old
CertificatesV1beta1.
Note that CertificatesV1 was introduced in Kube 1.19.
This commit makes the following changes to the kube cert agent tests:
1. Informers are synced on start using the controllerinit code
2. Deployment client and informer are synced per controller sync loop
3. Controller sync loop exits after two consistent errors
4. Use assert instead of require to avoid ending the test early
Signed-off-by: Monis Khan <mok@vmware.com>
This makes it so that our service selector will match exactly the
YAML we specify instead of including an extra "kapp.k14s.io/app" key.
This will take us closer to the standard kubectl behavior which is
desirable since we want to avoid future bugs that only manifest when
kapp is not used.
Signed-off-by: Monis Khan <mok@vmware.com>
This should make it easier for us to to notice if something is wrong
with our service (especially in any future kubectl tests we add).
Signed-off-by: Monis Khan <mok@vmware.com>
Not required, but within the spirit of using the version number.
Since the existing kube cert agent deployment will get deleted anyway
during an upgrade, it shouldn't hurt to change the version number.
New installations will get the new version number on the new kube cert
agent deployment.
Fixes#801. The solution is complicated by the fact that the Selector
field of Deployments is immutable. It would have been easy to just
make the Selectors of the main Concierge Deployment, the Kube cert agent
Deployment, and the various Services use more specific labels, but
that would break upgrades. Instead, we make the Pod template labels and
the Service selectors more specific, because those not immutable, and
then handle the Deployment selectors in a special way.
For the main Concierge and Supervisor Deployments, we cannot change
their selectors, so they remain "app: app_name", and we make other
changes to ensure that only the intended pods are selected. We keep the
original "app" label on those pods and remove the "app" label from the
pods of the Kube cert agent Deployment. By removing it from the Kube
cert agent pods, there is no longer any chance that they will
accidentally get selected by the main Concierge Deployment.
For the Kube cert agent Deployment, we can change the immutable selector
by deleting and recreating the Deployment. The new selector uses only
the unique label that has always been applied to the pods of that
deployment. Upon recreation, these pods no longer have the "app" label,
so they will not be selected by the main Concierge Deployment's
selector.
The selector of all Services have been updated to use new labels to
more specifically target the intended pods. For the Concierge Services,
this will prevent them from accidentally including the Kube cert agent
pods. For the Supervisor Services, we follow the same convention just
to be consistent and to help future-proof the Supervisor app in case it
ever has a second Deployment added to it.
The selector of the auto-created impersonation proxy Service was
also previously using the "app" label. There is no change to this
Service because that label will now select the correct pods, since
the Kube cert agent pods no longer have that label. It would be possible
to update that selector to use the new more specific label, but then we
would need to invent a way to pass that label into the controller, so
it seemed like more work than was justified.
Today is my last day working full time on Pinniped (for now). This change removes me from the MAINTAINERS.md and the website.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These fields were changed as a minor hardening attempt when we switched to Distroless, but I bungled the field names and we never noticed because Kapp doesn't apply API validations.
This change fixes the field names so they act as was originally intended. We should also follow up with a change that validates all of our installation manifest in CI.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Changes made to both components:
1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever
Changes made to concierge:
1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook
Changes made to supervisor:
1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle
Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit. This reduces
the time needed for the next leader to be elected.
Signed-off-by: Monis Khan <mok@vmware.com>
The kubelet will send the SIGTERM signal when it wants a process to
exit. After a grace period, it will send the SIGKILL signal to
force the process to terminate. The concierge has always handled
both SIGINT and SIGTERM as indicators for it to gracefully exit
(i.e. stop watches, controllers, etc). This change updates the
supervisor to do the same (previously it only handled SIGINT). This
is required to allow the leader election lock release logic to run.
Otherwise it can take a few minutes for new pods to acquire the
lease since they believe it is already held.
Signed-off-by: Monis Khan <mok@vmware.com>
For clusters where the control plane nodes aren't running a CNI, the
kube-cert-agent pods deployed by concierge cannot be scheduled as they
don't know to use `hostNetwork: true`. This change allows embedding the
host network setting in the Concierge configuration. (by copying it from
the kube-controller-manager pod spec when generating the kube-cert-agent
Deployment)
Also fixed a stray double comma in one of the nearby tests.
Instead of blindly waiting long enough for a disruptive change to
have been observed by the old leader and followers, we instead rely
on the approximation that checkOnlyLeaderCanWrite provides - i.e.
only a single actor believes they are the leader. This does not
account for clients that were in the followers list before and after
the disruptive change, but it serves as a reasonable approximation.
Signed-off-by: Monis Khan <mok@vmware.com>
Those images that are pulled from Dockerhub will cause pull failures
on some test clusters due to Dockerhub rate limiting.
Because we already have some images that we use for testing, and
because those images are already pre-loaded onto our CI clusters
to make the tests faster, use one of those images and always specify
PullIfNotPresent to avoid pulling the image again during the integration
test.
OpenShift has good defaults for these duration fields that we can
use instead of coming up with them ourselves:
e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)
Copied here for easy future reference:
// We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts.
// We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts.
// We want a single set of guidance for nearly every lease in openshift. If you're special, we'll let you know.
// 1. clock skew tolerance is leaseDuration-renewDeadline == 30s
// 2. kube-apiserver downtime tolerance is == 78s
// lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104
// downtimeTolerance = lastRetry-retryPeriod == 78s
// 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s
// 4. worst graceful lease acquisition is retryPeriod == 26s
if ret.LeaseDuration.Duration == 0 {
ret.LeaseDuration.Duration = 137 * time.Second
}
if ret.RenewDeadline.Duration == 0 {
// this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew
// if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew),
// then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will
// be 33 seconds of extra time before the lease is lost.
ret.RenewDeadline.Duration = 107 * time.Second
}
if ret.RetryPeriod.Duration == 0 {
ret.RetryPeriod.Duration = 26 * time.Second
}
Signed-off-by: Monis Khan <mok@vmware.com>
This change fixes a small race condition that occurred when the
current leader failed to renew its lease. Before this change, the
leader would first release the lease via the Kube API and then would
update its in-memory status to reflect that change. Now those
events occur in the reverse (i.e. correct) order.
Signed-off-by: Monis Khan <mok@vmware.com>
Even though a client may hold the leader election lock in the Kube
lease API, that does not mean it has had a chance to update its
internal state to reflect that. Thus we retry the checks in
checkOnlyLeaderCanWrite a few times to allow the client to catch up.
Signed-off-by: Monis Khan <mok@vmware.com>
We no longer have a transitive dependency on this older repository, so we don't need the replace directive anymore.
There is a new fork of this that we should move to (https://github.com/golang-jwt/jwt), but we can't easily do that until a couple of our direct dependencies upgrade.
This is a revert of d162cb9adf.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Change list of attributeParsingOverrides to a map
- Add unit test for sAMAccountName as group name without the override
- Change some comments in the the type definition.
The Kube API server code that we use will cast inputs in an attempt
to see if they implement optional interfaces. This change adds a
simple wrapper struct to prevent such casts from causing us any
issues.
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the default NO_PROXY for the supervisor to not
proxy requests to the Kubernetes API and other Kubernetes endpoints
such as Kubernetes services.
It also adds https_proxy and no_proxy settings for the concierge
with the same default.
Signed-off-by: Monis Khan <mok@vmware.com>
In the upstream dynamiccertificates package, we rely on two pieces
of code:
1. DynamicServingCertificateController.newTLSContent which calls
- clientCA.CurrentCABundleContent
- servingCert.CurrentCertKeyContent
2. unionCAContent.VerifyOptions which calls
- unionCAContent.CurrentCABundleContent
This results in calls to our tlsServingCertDynamicCertProvider and
impersonationSigningCertProvider. If we Unset these providers, we
subtly break these consumers. At best this results in test slowness
and flakes while we wait for reconcile loops to converge. At worst,
it results in actual errors during runtime. For example, we
previously would Unset the impersonationSigningCertProvider on any
sync loop error (even a transient one caused by a network blip or
a conflict between writes from different replicas of the concierge).
This would cause us to transiently fail to issue new certificates
from the token credential require API. It would also cause us to
transiently fail to authenticate previously issued client certs
(which results in occasional Unauthorized errors in CI).
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the pinniped CLI entrypoint to prevent browser
processes that we spawn from polluting our std out stream.
For example, chrome will print the following message to std out:
Opening in existing browser session.
Which leads to the following incomprehensible error message from
kubectl:
Unable to connect to the server: getting credentials:
decoding stdout: couldn't get version/kind; json parse error:
json: cannot unmarshal string into Go value of type struct
{ APIVersion string "json:\"apiVersion,omitempty\"";
Kind string "json:\"kind,omitempty\"" }
This would only occur on the initial login when we opened the
browser. Since credentials would be cached afterwards, kubectl
would work as expected for future invocations as no browser was
opened.
I could not think of a good way to actually test this change. There
is a clear gap in our integration tests - we never actually launch a
browser in the exact same way a user does - we instead open a chrome
driver at the login URL as a subprocess of the integration test
binary and not the pinniped CLI. Thus even if the chrome driver was
writing to std out, we would not notice any issues.
It is also unclear if there is a good way to prevent future related
bugs since std out is global to the process.
Signed-off-by: Monis Khan <mok@vmware.com>
After merging the new Kube 1.22 ExecCredential changes from main into
this feature branch, some of the new units test on this feature branch
needed to be update to account for the new ExecCredential "interactive"
field.
- Add `AllowPasswordGrant` boolean field to OIDCIdentityProvider's spec
- The oidc upstream watcher controller copies the value of
`AllowPasswordGrant` into the configuration of the cached provider
- Add password grant to the UpstreamOIDCIdentityProviderI interface
which is implemented by the cached provider instance for use in the
authorization endpoint
- Enhance the IDP discovery endpoint to return the supported "flows"
for each IDP ("cli_password" and/or "browser_authcode")
- Enhance `pinniped get kubeconfig` to help the user choose the desired
flow for the selected IDP, and to write the flow into the resulting
kubeconfg
- Enhance `pinniped login oidc` to have a flow flag to tell it which
client-side flow it should use for auth (CLI-based or browser-based)
- In the Dex config, allow the resource owner password grant, which Dex
implements to also return ID tokens, for use in integration tests
- Enhance the authorize endpoint to perform password grant when
requested by the incoming headers. This commit does not include unit
tests for the enhancements to the authorize endpoint, which will come
in the next commit
- Extract some shared helpers from the callback endpoint to share the
code with the authorize endpoint
- Add new integration tests
At a high level, it switches us to a distroless base container image, but that also includes several related bits:
- Add a writable /tmp but make the rest of our filesystems read-only at runtime.
- Condense our main server binaries into a single pinniped-server binary. This saves a bunch of space in
the image due to duplicated library code. The correct behavior is dispatched based on `os.Args[0]`, and
the `pinniped-server` binary is symlinked to `pinniped-concierge` and `pinniped-supervisor`.
- Strip debug symbols from our binaries. These aren't really useful in a distroless image anyway and all the
normal stuff you'd expect to work, such as stack traces, still does.
- Add a separate `pinniped-concierge-kube-cert-agent` binary with "sleep" and "print" functionality instead of
using builtin /bin/sleep and /bin/cat for the kube-cert-agent. This is split from the main server binary
because the loading/init time of the main server binary was too large for the tiny resource footprint we
established in our kube-cert-agent PodSpec. Using a separate binary eliminates this issue and the extra
binary adds only around 1.5MiB of image size.
- Switch the kube-cert-agent code to use a JSON `{"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}` format.
This is more robust to unexpected input formatting than the old code, which simply concatenated the files
with some extra newlines and split on whitespace.
- Update integration tests that made now-invalid assumptions about the `pinniped-server` image.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We accidentally missed this in the v0.10.0 release process. The new YAML field here should make it easier to automate this step, which seems like a really good idea.
This may be a temporary fix. It switches the manual auth code prompt to use `promptForValue()` instead of `promptForSecret()`. The `promptForSecret()` function no longer supports cancellation (the v0.9.2 behavior) and the method of cancelling in `promptForValue()` is now based on running the blocking read in a background goroutine, which is allowed to block forever or leak (which is not important for our CLI use case).
This means that the authorization code is now visible in the user's terminal, but this is really not a big deal because of PKCE and the limited lifetime of an auth code.
The main goroutine now correctly waits for the "manual prompt" goroutine to clean up, which now includes printing the extra newline that would normally have been entered by the user in the manual flow.
The text of the manual login prompt is updated to be more concise and less scary (don't use the word "fail").
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Remove all the "latest" links and replace them with our new shortcode so they point at the latest release in a more explicit way.
This also eliminates one of the sections in our Concierge and Supervisor install guides, since you're always installing a specific version.
- Provide instructions for installing with both kapp (one step) and kubectl (two steps for the Concierge).
- Minor wording changes. Mainly we are now a bit less verbose about reminding people they can choose a different version (once per page instead of in each step).
- When we give an example `kapp deploy` command, don't suggest `--yes` and `--diff-changes`.
Users can still use these but it seems overly verbose for an example command.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test is asynchronously waiting for the controller to do something, and in some of our test environments it will take a bit longer than we'd previously allowed.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Prior to this fix, this controller did not correctly react to changes to the ClusterIP service. It would still eventually react with a long delay due to our 5 minute resync interval.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change fixes a race that can occur because we have multiple
writers with no leader election lock.
1. TestAPIServingCertificateAutoCreationAndRotation/automatic
expires the current serving certificate
2. CertsExpirerController 1 deletes expired serving certificate
3. CertsExpirerController 2 starts deletion of expired serving
certificate but has not done so yet
4. CertsManagerController 1 creates new serving certificate
5. TestAPIServingCertificateAutoCreationAndRotation/automatic
records the new serving certificate
6. CertsExpirerController 2 finishes deletion, and thus deletes the
newly created serving certificate instead of the old one
7. CertsManagerController 2 creates new serving certificate
8. TestAPIServingCertificateAutoCreationAndRotation/automatic keeps
running and eventually times out because it is expecting the
serving certificate created by CertsManagerController 2 to match
the value it recorded from CertsManagerController 1 (which will
never happen since that certificate was incorrectly deleted).
Signed-off-by: Monis Khan <mok@vmware.com>
This functioned fine, but did not have the intended visual appearance when it came to how the text of the auth code wrapped inside the copy button in the manual flow.
The new styling behaves correctly on at least Chrome, Firefox, and Safari on macOS.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
It turns out that `syscall.Stdin` is of type `int` on Linux and macOS, but not on Windows (it's `syscall.Handle`). This should now be portable and do all the require type casting on every platform.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Add Dex to the prerequisites and add a note that to query for the groups
scope the user must set the organizations Dex should search against.
Otherwise the groups claim would be empty. This is because of the format
group claims are represented, i.e. "org:team".
Signed-off-by: Radoslav Dimitrov <dimitrovr@vmware.com>
For CLI-based auth, such as with LDAP upstream identity providers, the
user may use these environment variables to avoid getting interactively
prompted for username and password.
The following guide describes the process of configuring Supervisor
with Dex and identify users through their Github account. Issue #415
Signed-off-by: Radoslav Dimitrov <dimitrovr@vmware.com>
this will hopefully fix some flakes where aws provisioned a host for the
load balancer but the tests weren't able to resolve it.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
TestAgentController really runs the controller and evaluates multiple
calls to the controller's Sync with real informers caching updates.
There is a large amount of non-determinism in this unit test, and it
does not always behave the same way. Because it makes assertions about
the specific errors that should be returned by Sync, it was not
accounting for some errors that are only returned by Sync once in a
while depending on the exact (unpredictable) order of operations.
This commit doesn't fix the non-determinism in the test, but rather
tries to work around it by also allowing other (undesired but
inevitable) error messages to appear in the list of actual error
messages returned by the calls to the Sync function.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
Previously, the ytt install docs suggested that you use ytt templates
from the HEAD of main with the container image from the latest public
release, which could result in a mismatch.
It seems like page.ClearCookies() only clears cookies for the current
domain, so there doesn't seem to be a function to clear all browser
cookies. Instead, we'll just start a whole new browser each test.
They start fast enough that it shouldn't be a problem.
Our actual CLI code behaved correctly, but this test made some invalid assumptions about the "upstream" IDP we're testing. It assumed that the upstream didn't support `response_mode=form_post`, but Okta does. This means that when we end up on the localhost callback page, there are no URL query parameters.
Adjusting this regex makes the test pass as expected.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I found that there are some situations with `response_mode=form_post` where Chrome will open additional speculative TCP connections. These connections will be idle so they block server shutdown until the (previously 5s) timeout. Lowering this to 500ms should be safe and makes any added latency at login much less noticeable.
More information about Chrome's TCP-level behavior here: https://bugs.chromium.org/p/chromium/issues/detail?id=116982#c5
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Using the same fake TTY trick we used to test LDAP login, this new subtest runs through the "manual"/"jump box" login flow. It runs the login with a `--skip-listen` flag set, causing the CLI to skip opening the localhost listener. We can then wait for the login URL to be printed, visit it with the browser and log in, and finally simulate "manually" copying the auth code from the browser and entering it into the waiting CLI prompt.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This flag is (for now) meant only to facilitate end-to-end testing, allowing us to force the "manual" login flow. If it ends up being useful we can un-hide it, but this seemed like the safest option to start with.
There is also a corresponding `--oidc-skip-listen` on the `pinniped get kubeconfig` command.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
For some reason our headless Chrome test setup behaves slightly differently on Linux and macOS hosts. On Linux, the emoji characters are not recognized as valid text, so they are URL encoded. This change updates the test to cope with both cases correctly.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This adds a new login flow that allows manually pasting the authorization code instead of receiving a browser-based callback.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a more restrictive library interface that more closely matches the use cases of our new form_post login flow.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This allows the prompts to be cancelled, which we need to be able to do in the case where we prompt for a manually-pasted auth code but the automatic callback succeeds.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a new pacakge internal/oidc/provider/formposthtml containing a number of static files embedded using the relatively recent Go "//go:embed" functionality introduced in Go 1.16 (https://blog.golang.org/go1.16).
The Javascript and CSS files are minifiied and injected to make a single self-contained HTML response. There is a special Content-Security-Policy helper to calculate hash-based script-src and style-src rules.
This new code is covered by a new integration test that exercises the JS/HTML functionality in a real browser outside of the rest of the Supervisor.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Our Supervisor callback handler now needs to load JS and CSS from the provider endpoint, and this JS needs to make a `fetch()` call across origins (to post the form to the CLI callback). This requires a custom Content-Security-Policy compared to other pages we render.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test would occasionally flake for me when running locally. This change moves more of the assertions into the "eventually" loop, so they can temporarily fail as long as they converge on the expected values.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test did not tolerate this connection failing, which can happen for any number of flaky networking-related reasons. This change moves the connection setup into an "eventually" retry loop so it's allowed to fail temporarily as long as it eventually connects.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This CredentialIssuer field is called `spec.impersonationProxy.service.type`, not `spec.impersonationProxy.service.mode`.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
TestSimultaneousLDAPRequestsOnSingleProvider proved to be unreliable
on AKS due to some kind of kubectl port-forward issue, so only
run the LDAP client's integration tests on Kind. They are testing
the integration between the client code and the OpenLDAP test server,
not testing anything about Kubernetes, so running only on Kind should
give us sufficient test coverage.
After noticing that the upstream OIDC discovery calls can hang
indefinitely, I had tried to impose a one minute timeout on them
by giving them a timeout context. However, I hadn't noticed that the
context also gets passed into the JWKS fetching object, which gets
added to our cache and used later. Therefore the timeout context
was added to the cache and timed out while sitting in the cache,
causing later JWKS fetchers to fail.
This commit is trying again to impose a reasonable timeout on these
discovery and JWKS calls, but this time by using http.Client's Timeout
field, which is documented to be a timeout for *each* request/response
cycle, so hopefully this is a more appropriate way to impose a timeout
for this use case. The http.Client instance ends up in the cache on
the JWKS fetcher object, so the timeout should apply to each JWKS
request as well.
Requests that can hang forever are effectively a server-side resource
leak, which could theoretically be taken advantage of in a denial of
service attempt, so it would be nice to avoid having them.
- Add new optional ytt params for the Supervisor deployment.
- When the Supervisor is making calls to an upstream OIDC provider,
use these variables if they were provided.
- These settings are integration tested in the main CI pipeline by
sometimes setting them on deployments in certain cases, and then
letting the existing integration tests (e.g. TestE2EFullIntegration)
provide the coverage, so there are no explicit changes to the
integration tests themselves in this commit.
- this allows the oidc upsream watcher to honor the
HTTP_PROXY,HTTPS_PROXY,NO_PROXY environment variables
Co-authored-by: Christian Ang <angc@vmware.com>
We want the value of time.Now() to be calculated before the call to
IssueClientCertPEM to prevent the ExpirationTimestamp from being
later than the notAfter timestamp on the issued certificate.
Signed-off-by: Monis Khan <mok@vmware.com>
This fixes some rare test flakes caused by a data race inherent in the way we use `assert.Eventually()` with extra variables for followup assertions. This function is tricky to use correctly because it runs the passed function in a separate goroutine, and you have no guarantee that any shared variables are in a coherent state when the `assert.Eventually()` call returns. Even if you add manual mutexes, it's tricky to get the semantics right. This has been a recurring pain point and the cause of several test flakes.
This change introduces a new `library.RequireEventually()` that works by internally constructing a per-loop `*require.Assertions` and running everything on a single goroutine (using `wait.PollImmediate()`). This makes it very easy to write eventual assertions.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Before this change, we used the `fosite.DefaultOpenIDConnectClient{}` struct, which implements the `fosite.Client` and `fosite.OpenIDConnectClient` interfaces. For a future change, we also need to implement some additional optional interfaces, so we can no longer use the provided default types. Instead, we now use a custom `clientregistry.Client{}` struct, which implements all the requisite interfaces and can be extended to handle the new functionality (in a future change).
There is also a new `clientregistry.StaticRegistry{}` struct, which implements the `fosite.ClientManager` and looks up our single static client. We could potentially extend this in the future with a registry backed by Kubernetes API, for example.
This should be 100% refactor, with no user-observable change.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change updates the impersonator to always authorize every
request instead of relying on the Kuberentes API server to perform
the check on the impersonated request. This protects us from
scenarios where we fail to correctly impersonate the user due to
some bug in our proxy logic. We still rely completely on the API
server to perform admission checks on the impersonated requests.
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the impersonation proxy code to run as a
distinct service account that only has permission to impersonate
identities. Thus any future vulnerability that causes the
impersonation headers to be dropped will fail closed instead of
escalating to the concierge's default service account which has
significantly more permissions.
Signed-off-by: Monis Khan <mok@vmware.com>
WithImpersonation already deletes impersonation headers and has done
so since the early days:
https://github.com/kubernetes/kubernetes/pull/36769
ensureNoImpersonationHeaders will still reject any request that has
impersonation headers set so we will always fail closed.
Signed-off-by: Monis Khan <mok@vmware.com>
When anonymous authentication is disabled, the impersonation proxy
will no longer authenticate anonymous requests other than calls to
the token credential request API (this API is used to retrieve
credentials and thus must be accessed anonymously).
Signed-off-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
Signed-off-by: Monis Khan <mok@vmware.com>
There was nothing to guarantee that _all_ Supervisor pods would be ready to handle this request. We saw a rare test flake where the LDAPIdentityProvider was marked as ready but one of the Supervisor pods didn't have it loaded yet and returned an HTTP 422 error (`Unprocessable Entity: No upstream providers are configured`).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The `require.Eventually()` function runs the body of the check in a separate goroutine, so it's not safe to use other `require` assertions as we did here. Our `library.RequireEventuallyWithoutError()` function does not spawn a goroutine, so it's safer to use here.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
When a CredentialIssuer is switched from one service type to another (or switched to disabled mode), the `impersonatorconfig` controller will delete the previous Service, if any. Normally one Concierge pod will succeed to delete this initially and any other pods will see a NotFound error.
Before this change, the NotFound would bubble up and cause the strategy to enter a ErrorDuringSetup status until the next reconcile loop. We now handle this case without reporting an error.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This typo wasn't caught in testing because 1) the Kubernetes API ignores the unknown field and 2) the `type` field defaults to `LoadBalancer` anyway, so things behave as expected.
Even though this doesn't cause any large problems, it's quite confusing.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The LastUpdateTime is no longer updated on every resync. It only changes if the underlying status has changed, so that it effectively shows when the transition happened.
This change happened in ab750f48aa, but we missed this test. It only fails when it has been more than ten minutes since the CredentialIssuer transitioned into a healthy state, but that can happen in our long-running CI environments.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test felt overly complex and some of the cleanup logic wasn't 100% correct (it didn't clean up in all cases).
The new code is essentially the same flow but hopefully easier to read.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We had this one test that mutated the CredentialIssuer, which could cause the impersonation proxy to blip on one or both of the running concierge pods. This would sometimes break other concurrently running tests.
Instead, this bit of code is split into a separate non-concurrent test.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This required a weird hack because some of the Fosite tests (or a transitive dependency of them) depends on a newer version of gRPC that's incompatible with the Kubernetes runtime version we use. It wasn't as simple as just replacing the gRPC module with an older version, because in the latest versions of gRPC, they split out the "examples" packages into their own module. This new module name doesn't exist at the old version.
Ultimately, the workaround was to make a fake "examples" module locally. This module can be empty because we never actually depend on that code (it's only used in transitive dependency tests).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These are tricky because a real load balancer controller (e.g., on GKE) will overwrite and set NodePort, so we can't blindly set the desired state of this fields.
For now, we will just skip reconciling these. In the future, we could be more clever about merging them together with the current state.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
If the only thing that has changed about a strategy is the LastUpdated timestamp, then we should not update the object.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This is to allow the use of binary LDAP entry attributes as the UID.
For example, a user might like to configure AD’s objectGUID or maybe
objectSid attributes as the UID attribute.
This negatively impacts the readability of the UID when it did not come
from a binary value, but we're considering this an okay trade-off to
keep things simple for now. In the future, we may offer more
customizable encoding options for binary attributes.
These UIDs are currently only used in the downstream OIDC `sub` claim.
They do not effect the user's identity on the Kubernetes cluster,
which is only based on their mapped username and group memberships from
the upstream identity provider. We are not currently supporting any
special encoding for those username and group name LDAP attributes, so
their values in the LDAP entry must be ASCII or UTF-8 in order for them
to be interpreted correctly.
This test setup should tolerate when the TokenCredentialRequest API isn't quite ready to authenticate the user or issue a cert.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This check is no longer valid, because there can be ephemeral, recoverable errors that show as ErrorDuringSetup.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This updates the code to use a different mechanism for driving desired state:
- Read existing object
- If it does not exist, create desired object
- If it does exist, make a copy and set all the desired fields
- Do a deepequal to see if an update is necessary.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We also no longer need an initial event, since we don't do anything unless the CredentialIssuer exists, so we'll always be triggered at the appropriate time.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This type of field appears in more than one of our APIs, so this package will provide a single source of truth for validating and parsing inputs.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The documentation was a bit confusing before, and it was easy to accidentally install a very outdated version if you weren't reading carefully.
We could consider writing a post-release CI job to update these references automatically (perhaps using a Hugo macro?), but for now a manual update seems sufficient.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Automatically try to fall back to using StartTLS when using TLS
doesn't work. Only complain when both don't work.
- Remember (in-memory) which one worked and keeping using that one
in the future (unless the pod restarts).
- This enhances our LDAP client code to make it possible to optionally
dial an LDAP server without TLS and then use StartTLS to upgrade
the connection to TLS.
- The controller for LDAPIdentityProviders is not using this option
yet. That will come in a future commit.
Previously, our controllers would automatically create a CredentialIssuer with a singleton name. The helpers we had for this also used "raw" client access and did not take advantage of the informer cache pattern.
With this change, the CredentialIssuer is always created at install time in the ytt YAML. The controllers now only update the existing CredentialIssuer status, and they do so using the informer cache as much as possible.
This change is targeted at only the kubecertagent controller to start. The impersonatorconfig controller will be updated in a following PR along with other changes.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- For testing purposes, we would like to ensure that when we connect
to the LDAP server we cannot accidentally avoid using TLS or StartTLS.
- Also enabled the openldap `memberOf` overlay in case we want to
support group search using `memberOf` in the future.
- This required changes to the docker.io/bitnami/openldap container
image, so we're using our own fork for now. Will submit a PR to
bitnami/openldap to see if they will accept it (or something similar)
upstream.
Reflect the upstream group membership into the Supervisor's
downstream tokens, so they can be added to the user's
identity on the workload clusters.
LDAP group search is configurable on the
LDAPIdentityProvider resource.
- Note that this adds an extra check of the response, which is that
the issuer string in the response must match issuer of the requested
URL.
- Some of the error messages also changed to match the errors provided
by oidc.NewProvider
See RFC6648 which asks that people stop using `X-` on header names.
Also Matt preferred not mentioning "IDP" in the header name.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change makes it easier to understand misconfigurations caused
by issuers with extraneous trailing slashes.
Signed-off-by: Mo Khan <mok@vmware.com>
The admin kubeconfigs we have on EKS clusters are a bit different from others, because there is no certificate/key (EKS does not use certificate auth).
This code didn't quite work correctly in that case. The fix is to allow the case where `tlsConfig.GetClientCertificate` is non-nil, but returns a value with no certificates.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
to avoid garbage collection breaking the refresh flow
Also changed the access token lifetime to be 2 minutes instead of 15
since we now have cert caching.
- Use `nickname` claim as an example, which means we only need the `openid` scope.
This is also more stable since emails can change over time.
- Put the OIDCIdentityProvider and Secret into one YAML blob, since they will likely be copy-pasted together anyway.
- Add a separate section for using alternate claims.
- Add a separate section for using a private GitLab instance.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Some minor edits I came across while reviewing this:
- Capitalize "GitLab" the way they do.
- Use `{{< ref "xyz" >}}` references when linking internally. The advantage of these is that they're "type checked" by Hugo when the site is rendered, so we'll know if we ever break one.
- Add links to the GitLab docs about creating an OAuth client. These also cover adding a group-level or instance-wide application.
- Re-wrap the YAML lines to fit a bit more naturally.
- Add a `namespace` to the YAML examples, so they're more likely to work without tweaks.
- Use "gitlab" instead of "my-oidc-identity-provider" as the example name, for clarity.
- Re-word a few small bits. These are 100% subjective but hopefully an improvement?
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The supervisor treats all events the same hence it must use a
singleton queue.
Updated the integration test to remove the data race caused by
calling methods on testing.T outside of the main test go routine.
Signed-off-by: Monis Khan <mok@vmware.com>
Followup on the previous comment to split apart the ServiceAccount of the kube-cert-agent and the main concierge pod. This is a bit cleaner and ensures that in testing our main Concierge pod never requires any privileged permissions.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Since 0dfb3e95c5, we no longer directly create the kube-cert-agent Pod, so our "use"
permission on PodSecurityPolicies no longer has the intended effect. Since the deployments controller is now the
one creating pods for us, we need to get the permission on the PodSpec of the target pod instead, which we do somewhat
simply by using the same service account as the main Concierge pods.
We still set `automountServiceAccountToken: false`, so this should not actually give any useful permissions to the
agent pod when running.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- And perform auto-discovery when the flags are not set
- Several TODOs remain which will be addressed in the next commit
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This change updates the impersonator logic to pass through requests
that authenticated via a bearer token that asserts a UID. This
allows us to support service account tokens (as well as any other
form of token based authentication).
Signed-off-by: Monis Khan <mok@vmware.com>
- The linux base64 command is different, so avoid using it at all.
On linux the default is to split the output into multiple lines,
which messes up the integration-test-env file. The flag used to
disable this behavior on linux ("-w0") does not exist on MacOS's
base64.
- On debian linux, the latest version of Docker from apt-get still
requires DOCKER_BUILDKIT=1 or else it barfs.
This controller is responsible for cleaning up kube-cert-agent pods that were deployed by previous versions.
They are easily identified because they use a different `kube-cert-agent.pinniped.dev` label compared to the new agent pods (`true` vs. `v2`).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a relatively large rewrite of much of the kube-cert-agent controllers. Instead of managing raw Pod objects, they now create a single Deployment and let the builtin k8s controller handle it from there.
This reduces the amount of code we need and should handle a number of edge cases better, especially those where a Pod becomes "wedged" and needs to be recreated.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Make PINNIPED_TEST_LDAP_LDAPS_CA_BUNDLE optional for integration tests
- When there is no CA bundle provided, be careful to use nil instead of
an empty bundle, because nil means to use the OS defaults
Now that we have the fix from https://github.com/kubernetes/kubernetes/pull/97693, we no longer need these sleeps.
The underlying authenticator initialization is still asynchronous, but should happen within a few milliseconds.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change updates the impersonator logic to use the delegated
authorizer for all non-rest verbs such as impersonate. This allows
it to correctly perform authorization checks for incoming requests
that set impersonation headers while not performing unnecessary
checks that are already handled by KAS.
The audit layer is enabled to track the original user who made the
request. This information is then included in a reserved extra
field original-user-info.impersonation-proxy.concierge.pinniped.dev
as a JSON blob.
Signed-off-by: Monis Khan <mok@vmware.com>
$PINNIPED_TEST_SUPERVISOR_UPSTREAM_OIDC_ISSUER_CA_BUNDLE was recently
changed to be a base64 encoded value, so this script does not need to
base64 encode the value itself anymore.
- Also some light prefactoring in login.go to make room for LDAP-style
login, which is not implemented yet in this commit. TODOs are added.
- And fix a test pollution problem in login_oidc_test.go where it was
using a real on-disk CLI cache file, so the tests were polluted by
the contents of that file and would sometimes cause each other to
fail.
Also force the LDAP server pod to restart whenever the LDIF file
changes, so whenever you redeploy the tools deployment with a new test
user password the server will be updated.
Unfortunately, Secrets do not seem to have a Generation field, so we
use the ResourceVersion field instead. This means that any change to
the Secret will cause us to retry the connection to the LDAP server,
even if the username and password fields in the Secret were not
changed. Seems like an okay trade-off for this early draft of the
controller compared to a more complex implementation.
This early version of the controller is not intended to act as an
ongoing health check for your upstream LDAP server. It will connect
to the LDAP server to essentially "lint" your configuration once.
It will do it again only when you change your configuration. To account
for transient errors, it will keep trying to connect to the server
until it succeeds once.
This commit does not include looking for changes in the associated bind
user username/password Secret.
Avoid them because they can't be used in GoLand for running integration
tests in the UI, like running in the debugger.
Also adds optional PINNIPED_TEST_TOOLS_NAMESPACE because we need it
on the LDAP feature branch where we are developing the upcoming LDAP
support for the Supervisor.
- Bad usernames and passwords aren't really errors, since they are
based on end-user input.
- Other kinds of authentication failures are caused by bad configuration
so still treat those as errors.
- Empty usernames and passwords are already prevented by our endpoint
handler, but just to be safe make sure they cause errors inside the
authenticator too.
- The unit tests for upstreamldap.Provider need to mock the LDAP server,
so add an integration test which allows us to get fast feedback for
this code against a real LDAP server.
- Automatically wrap the user search filter in parenthesis if it is not
already wrapped in parens.
- More special handling for using "dn" as the username or UID attribute
name.
- Also added some more comments to types_ldapidentityprovider.go.tmpl
- The ldap_upstream_watcher.go controller validates the bind secret and
uses the Conditions to report errors. Shares some condition reporting
logic with its sibling controller oidc_upstream_watcher.go, to the
extent which is convenient without generics in golang.
I don't believe this is used by any tests or docs. I think it was for some initial local testing of the impersonation proxy?
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- When the upstream IDP is an LDAP IDP and the user's LDAP username and
password are received as new custom headers, then authenticate the
user and, if authentication was successful, return a redirect with
an authcode. Handle errors according to the OAuth/OIDC specs.
- Still does not support having multiple upstream IDPs defined at the
same time, which was an existing limitation of this endpoint.
- Does not yet include the actual LDAP authentication, which is
hidden behind an interface from the point of view of auth_handler.go
- Move the oidctestutil package to the testutil directory.
- Add an interface for Fosite storage to avoid a cyclical test
dependency.
- Add GetURL() to the UpstreamLDAPIdentityProviderI interface.
- Extract test helpers to be shared between callback_handler_test.go
and auth_handler_test.go because the authcode and fosite storage
assertions should be identical.
- Backfill Content-Type assertions in callback_handler_test.go.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This isn't strictly necessary because we currently always have the concierge endpoint and CA as CLI flags, but it doesn't hurt and it's better to err on the side of _not_ reusing a cache entry.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We have some nice normalization code in this package to remove expired or otherwise malformed cache entries, but we weren't calling it in the appropriate place.
Added calls to normalize the cache data structure before and after each transaction, and added test cases to ensure that it's being called.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Add some fields to LDAPIdentityProvider that we will need to be able
to search for users during login
- Enhance TestSupervisorLogin to test logging in using an upstream LDAP
identity provider. Part of this new test is skipped for now because
we haven't written the corresponding production code to make it
pass yet.
- Some refactoring and enhancement to env.go and the corresponding env
vars to support the new upstream LDAP provider integration tests.
- Use docker.io/bitnami/openldap for our test LDAP server instead of our
own fork now that they have fixed the bug that we reported.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The goal here was to start on an integration test to get us closer to the red
test that we want so we can start working on LDAP.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Rename the test/deploy/dex directory to test/deploy/tools
- Rename the dex namespace to tools
- Add a new ytt value called `pinny_ldap_password` for the tools
ytt templates
- This new value is not used on main at this time. We intend to use
it in the forthcoming ldap branch. We're defining it on main so
that the CI scripts can use it across all branches and PRs.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Before this change, the "context", "cluster", and "user" fields in generated kubeconfig YAML were always hardcoded to "pinniped". This could be confusing if you generated many kubeconfigs for different clusters.
After this change, the fields will be copied from their names in the original kubeconfig, suffixed with "-pinniped". This suffix can be overridden by setting the new `--generated-name-suffix` CLI flag.
The goal of this change is that you can distinguish between kubeconfigs generated for different clusters, as well as being able to distinguish between the Pinniped and original (admin) kubeconfigs for a cluster.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
A demo of running the Supervisor and Concierge on
a kind cluster. Can be used to quickly set up an
environment for manual testing.
Also added some missing copyright headers to other
hack scripts.
These commits include security fixes (CVE-2021-3121) for code generated by github.com/gogo/protobuf.
We expect this fix to also land in v1.20.6, but we don't want to wait for it.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test could flake if the load balancer hostname was provisioned but is not yet resolving in DNS from the test process.
The fix is to retry this step for up to 5 minutes.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test could fail when the cluster was under heavy load. This could cause kubectl to emit "Throttling request took [...]" logs that triggered a failure in the test.
The fix is to ignore these innocuous warnings.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We had this code that printed out pod logs when certain tests failed, but it is a bit cumbersome. We're removing it because we added a CI task that exports all pod logs after every CI run, which accomplishes the same thing and provides us a bunch more data.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This allows setting `$PINNIPED_TEST_CLI` to point at an existing `pinniped` CLI binary instead of having the test build one on-the-fly. This is more efficient when you're running the tests across many clusters as we do in CI.
Building the CLI from scratch in our CI environment takes 1.5-2 minutes, so this change should save nearly that much time on every test job.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We've seen some test flakes caused by this test. Some small changes:
- Use a 30s timeout for each iteration of the test loop (so each iteration needs to check or fail more quickly).
- Log a bit more during the checks so we can diagnose what's going on.
- Increase the overall timeout from one minute to five minutes
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- a credential that is understood by -> a credential that can be used to
authenticate to
- This is more neutral to whether its going directly to k8s
or through the impersonation proxy
Instead of using the LongRunningFunc to determine if we can safely
use http2, follow the same logic as the aggregation proxy and only
use http2 when the request is not an upgrade.
Signed-off-by: Monis Khan <mok@vmware.com>
In the case where we are using middleware (e.g., when the api group is
different) in our kubeclient, these error messages have a "...middleware request
for..." bit in the middle.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
When the frontend connection to our proxy is closed, the proxy falls through to
a panic(), which means the HTTP handler goroutine is killed, so we were not
seeing this log statement.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This is probably a good idea regardless, but it also avoids an infinite recursion from IntegrationEnv() -> assertNoRestartsDuringTest() -> NewKubeclient() -> IntegrationEnv() -> ...
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test could flake in some rare scenarios. This change adds a bunch of retries, improves the debugging output if the tests fail, and puts all of the subtests in parallel which saves ~10s on my local machine.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test has occasionally flaked because it only waited for the APIService GET to finish, but did not wait for the controller to successfully update the target object.
The new code should be more patient and allow the controller up to 10s to perform the expected action.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This new capability describes whether a cluster is expected to allow anonymous requests (most do since k8s 1.6.x, but AKS has it disabled).
This commit also contains new capability YAML files for AKS and EKS, mostly to document publicly how we expect our tests to function in those environments.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
It takes a lot of manual steps to get ready to manually test the
impersonation proxy on a kind cluster, which makes it error prone,
so encapsulate them into a script to make it easier.
At the end of the test, wait for the KubeClusterSigningCertificate
strategy on the CredentialIssuer to go back to being healthy, to avoid
polluting other integration tests which follow this one.
This is the configuration for https://pre-commit.com/, which now also runs golangci-lint using the same version as CI (currently v1.33.0).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We were previously issuing both client certs and server certs with
both extended key usages included. Split the Issue*() methods into
separate methods for issuing server certs versus client certs so
they can have different extended key usages tailored for each use
case.
Also took the opportunity to clean up the parameters of the Issue*()
methods and New() methods to more closely match how we prefer to call
them. We were always only passing the common name part of the
pkix.Name to New(), so now the New() method just takes the common name
as a string. When making a server cert, we don't need to set the
deprecated common name field, so remove that param. When making a client
cert, we're always making it in the format expected by the Kube API
server, so just accept the username and group as parameters directly.
I'm kinda surprised this is working with our current implementation of the
impersonator, but regardless this seems like a step forward.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The impersonator_test.go unit test now starts the impersonation
server and makes real HTTP requests against it using client-go.
It is backed by a fake Kube API server.
The CA IssuePEM() method was missing the argument to allow a slice
of IP addresses to be passed in.
These tests occasionally flake because of a conflict error such as:
```
supervisor_discovery_test.go:105:
Error Trace: supervisor_discovery_test.go:587
supervisor_discovery_test.go:105
Error: Received unexpected error:
Operation cannot be fulfilled on federationdomains.config.supervisor.pinniped.dev "test-oidc-provider-lvjfw": the object has been modified; please apply your changes to the latest version and try again
Test: TestSupervisorOIDCDiscovery
```
These retries should improve the reliability of the tests.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Also make each t.Run use its own namespace to slight reduce the
interdependency between them.
Use t.Cleanup instead of defer in whoami_test.go just to be consistent
with other integration tests.
The same coverage that was supplied by
TestCredentialRequest_OtherwiseValidRequestWithRealTokenShouldFailWhenTheClusterIsNotCapable
is now provided by an assertion at the end of TestImpersonationProxy,
so delete the duplicate test which was failing on GKE because the
impersonation proxy is now active by default on GKE.
When testing that the impersonation proxy port was closed there
is no need to include credentials in the request. At the point when
we want to test that the impersonation proxy port is closed, it is
possible that we cannot perform a TokenCredentialRequest to get a
credential either.
Also add a new assertion that the TokenCredentialRequest stops handing
out credentials on clusters which have no successful strategies.
Signed-off-by: Monis Khan <mok@vmware.com>
To make an impersonation request, first make a TokenCredentialRequest
to get a certificate. That cert will either be issued by the Kube
API server's CA or by a new CA specific to the impersonator. Either
way, you can then make a request to the impersonator and present
that client cert for auth and the impersonator will accept it and
make the impesonation call on your behalf.
The impersonator http handler now borrows some Kube library code
to handle request processing. This will allow us to more closely
mimic the behavior of a real API server, e.g. the client cert
auth will work exactly like the real API server.
Signed-off-by: Monis Khan <mok@vmware.com>
It also works on the slightly older MacOS Catalina.
This script is only used on development laptops, so hopefully
this will work for more laptop OS's now.
Signed-off-by: Ryan Richard <richardry@vmware.com>
This adds two new flags to "pinniped get kubeconfig": --skip-validation and --timeout.
By default, at the end of the kubeconfig generation process, we validate that we can reach the configured cluster. In the future this might also validate that the TokenCredentialRequest API is running, but for not it just verifies that the DNS name resolves, and the TLS connection is available on the given port.
If there is an error during this check, we block and retry for up to 10 minutes. This duration can be changed with --timeout an the entire process can be skipped with --skip-validation.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This makes output that's easier to copy-paste into the test. We could also make it ignore the order of key/value pairs in the future.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The thing we're waiting for is mostly that DNS is resolving, the ELB is listening, and connections are making it to the proxy.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
All controller unit tests were accidentally using a timeout context
for the informers, instead of a cancel context which stays alive until
each test is completely finished. There is no reason to risk
unpredictable behavior of a timeout being reached during an individual
test, even though with the previous 3 second timeout it could only be
reached on a machine which is running orders of magnitude slower than
usual, since each test usually runs in about 100-300 ms. Unfortunately,
sometimes our CI workers might get that slow.
This sparked a review of other usages of timeout contexts in other
tests, and all of them were increased to a minimum value of 1 minute,
under the rule of thumb that our tests will be more reliable on slow
machines if they "pass fast and fail slow".
In impersonator_config_test.go, instead of waiting for the resource
version to appear in the informers, wait for the actual object to
appear.
This is an attempt to resolve flaky failures that only happen in CI,
but it also cleans up the test a bit by avoiding inventing fake resource
version numbers all over the test.
Signed-off-by: Monis Khan <mok@vmware.com>
- Use `Eventually` when making tls connections because the production
code's handling of starting and stopping the TLS server port
has some async behavior.
- Don't use resource version "0" because that has special meaning
in the informer libraries.
This time, don't use the Squid proxy if the cluster supports real external load balancers (as in EKS/GKE/AKS).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Because otherwise `go test` will panic/crash your test if it takes
longer than 10 minutes, which is an annoying way for an integration
test to fail since it skips all of the t.Cleanup's.
This updates our issuerconfig.UpdateStrategy to sort strategies according to a weighted preference.
The TokenCredentialRequest API strategy is preffered, followed by impersonation proxy, followed by any other unknown types.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- This commit does not include the updates that we plan to make to
the `status.strategies[].frontend` field of the CredentialIssuer.
That will come in a future commit.
This is more than an automatic merge. It also includes a rewrite of the CredentialIssuer API impersonation proxy fields using the new structure, and updates to the CLI to account for that new API.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I think this is another aspect of the test flakes we're trying to fix. This matters especially for the "Multiple Pinnipeds" test environment where two copies of the test suite are running concurrently.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
If the test is run immediately after the Concierge is installed, the API server can still have broken discovery data and return an error on the first call.
This commit adds a retry loop to attempt this first kubectl command for up to 60s before declaring failure.
The subsequent tests should be covered by this as well since they are not run in parallel.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These controllers were a bit inconsistent. There were cases where the controllers ran out of the expected order and the custom labels might not have been applied.
We should still plan to remove this label handling or move responsibility into the middleware layer, but this avoids any regression.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This field is a new tagged-union style field that describes how clients can connect using each successful strategy.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We don't support using the impersonate headers through the impersonation
proxy yet, so this integration test is a negative test which asserts
that we get an error.
- The CA cert will end up in the end user's kubeconfig on their client
machine, so if it changes they would need to fetch the new one and
update their kubeconfig. Therefore, we should avoid changing it as
much as possible.
- Now the controller writes the CA to a different Secret. It writes both
the cert and the key so it can reuse them to create more TLS
certificates in the future.
- For now, it only needs to make more TLS certificates if the old
TLS cert Secret gets deleted or updated to be invalid. This allows
for manual rotation of the TLS certs by simply deleting the Secret.
In the future, we may want to implement some kind of auto rotation.
- For now, rotation of both the CA and TLS certs will also happen if
you manually delete the CA Secret. However, this would cause the end
users to immediately need to get the new CA into their kubeconfig,
so this is not as elegant as a normal rotation flow where you would
have a window of time where you have more than one CA.
Also fixes our sitemap to have correct `lastmod` times when built locally (it was already correct on Netlify).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Should work on cluster which have:
- load balancers not supported, has squid proxy (e.g. kind)
- load balancers supported, has squid proxy (e.g. EKS)
- load balancers supported, no squid proxy (e.g. GKE)
When testing with a load balancer, call the impersonation proxy through
the load balancer.
Also, added a new library.RequireNeverWithoutError() helper.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This flag selects a CredentialIssuer to use when detecting what mode the Concierge is in on a cluster. If not specified, the command will look for a single CredentialIssuer. If there are multiple, then the flag is required.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Also update concierge_impersonation_proxy_test.go integration test
to use real TLS when calling the impersonator.
Signed-off-by: Ryan Richard <richardry@vmware.com>
These are prone to breaking when stdr is upgraded because they rely on the exact ordering of keys in the log message. If we have more problems we can rewrite the assertions to be more robust, but for this time I'm just fixing them to match the new output.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The login commands now expect either `--concierge-mode ImpersonationProxy` or `--concierge-mode TokenCredentialRequestAPI` (the default).
This is partly a style choice, but I also think it helps in case we need to add a third major mode of operation at some point.
I also cleaned up some other minor style items in the help text.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Adds a new optional `spec.impersonationProxyInfo` field to hold the URL and CA data for the impersonation proxy, as well as some additional status condition constants for describing the current status of the impersonation proxy.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These are some more changes that came up when Pablo and I were reviewing the previous docs PR.
In no particular order:
- Fix "related posts" on the blog section, and hide the section if there are none.
- Minor style changes to several pages (guided by various style guides).
- Redirect the root of get.pinniped.dev to our main page (shouldn't really be hit, but it's nice to do something).
- Add more mobile-friendly CSS for our docs.
- Reword the "getting started" CTA, and hide it on the docs pages (you're already there).
- Fix the "Learn how Pinniped provides identity services to Kubernetes" link on the landing page.
- Add a date to our blog post cards.
- Rewrite the hero text on the landing page.
- Fix the docs link for the "Get Started with Pinniped" button on the landing page.
- Rework the landing page grid text.
- Add Margo and Nanci to the team section and sort it alphabetically.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Also:
- Shut down the informer correctly in
concierge_impersonation_proxy_test.go
- Remove the t.Failed() checks which avoid cleaning up after failed
tests. This was inconsistent with how most of the tests work, and
left cruft on clusters when a test failed.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Also:
- Changed base64 encoding of impersonator bearer tokens to use
`base64.StdEncoding` to make it easier for users to manually
create a token using the unix `base64` command
- Test the headers which are and are not passed through to the Kube API
by the impersonator more carefully in the unit tests
- More WIP on concierge_impersonation_proxy_test.go
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This change adds a new virtual aggregated API that can be used by
any user to echo back who they are currently authenticated as. This
has general utility to end users and can be used in tests to
validate if authentication was successful.
Signed-off-by: Monis Khan <mok@vmware.com>
- Because the impersonation proxy config controller needs to be able
to delete the load balancer which it created
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This is a more reliable way to determine whether the load balancer
is already running.
Also added more unit tests for the load balancer.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Makes most of the fonts a bit bigger, increases contrast, fixes some nits about the spacing in numbered/bulletted lists, and adds some image alt texts.
Overall this improves our Lighthouse accessibility score from 71 to 95 and I think it's subjectively more readable.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This wasn't needed before because the other code wasn't in the main module and golangci-lint won't cross a module boundary.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
If someone has already set impersonation headers in their request, then
we should fail loudly so the client knows that its existing impersonation
headers will not work.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I think we were assuming the name of our Concierge app, and getting lucky
because it was the name we use when testing locally (but not in CI).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Watch a configmap to read the configuration of the impersonation
proxy and reconcile it.
- Implements "auto" mode by querying the API for control plane nodes.
- WIP: does not create a load balancer or proper TLS certificates yet.
Those will come in future commits.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This is a partial revert of 288d9c999e. For some reason it didn't occur to me
that we could do it this way earlier. Whoops.
This also contains a middleware update: mutation funcs can return an error now
and short-circuit the rest of the request/response flow. The idea here is that
if someone is configuring their kubeclient to use middleware, they are agreeing
to a narrow-er client contract by doing so (e.g., their TokenCredentialRequest's
must have an Spec.Authenticator.APIGroup set).
I also updated some internal/groupsuffix tests to be more realistic.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I think the reason we were seeing flakes here is because the kube cert agent
pods had not reached a steady state even though our test assertions passed, so
the test would proceed immediately and run more assertions on top of a weird
state of the kube cert agent pods.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I added that test helper to create an http.Request since I wanted to properly
initialize the http.Request's context.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This allows us to keep all of our resources in the pinniped category
while not having kubectl return errors for calls such as:
kubectl get pinniped -A
Signed-off-by: Monis Khan <mok@vmware.com>
As of upgrading to Kubernetes 1.20, our aggregated API server nows runs some
controllers for the two flowcontrol.apiserver.k8s.io resources in the title of
this commit, so it needs RBAC to read them.
This should get rid of the following error messages in our Concierge logs:
Failed to watch *v1beta1.FlowSchema: failed to list *v1beta1.FlowSchema: flowschemas.flowcontrol.apiserver.k8s.io is forbidden: User "system:serviceaccount:concierge:concierge" cannot list resource "flowschemas" in API group "flowcontrol.apiserver.k8s.io" at the cluster scope
Failed to watch *v1beta1.PriorityLevelConfiguration: failed to list *v1beta1.PriorityLevelConfiguration: prioritylevelconfigurations.flowcontrol.apiserver.k8s.io is forbidden: User "system:serviceaccount:concierge:concierge" cannot list resource "prioritylevelconfigurations" in API group "flowcontrol.apiserver.k8s.io" at the cluster scope
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I messed this up before because the ordering of the path components is a bit different than in the specific version case.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
When the Pinniped server has been installed with the `api_group_suffix`
option, for example using `mysuffix.com`, then clients who would like to
submit a `TokenCredentialRequest` to the server should set the
`Spec.Authenticator.APIGroup` field as `authentication.concierge.mysuffix.com`.
This makes more sense from the client's point of view than using the
default `authentication.concierge.pinniped.dev` because
`authentication.concierge.mysuffix.com` is the name of the API group
that they can observe their cluster and `authentication.concierge.pinniped.dev`
does not exist as an API group on their cluster.
This commit includes both the client and server-side changes to make
this work, as well as integration test updates.
Co-authored-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Margo Crawford <margaretc@vmware.com>
Makes it easy to deploy Pinniped under a different API group for manual
testing and iterating on integration tests on your laptop.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Because it is a test of the conciergeclient package, and the naming
convention for integration test files is supervisor_*_test.go,
concierge_*_test.go, or cli_*_test.go to identify which component
the test is primarily covering.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This makes sure that if our clients ever send types with the wrong
group, the server will refuse to decode it.
Signed-off-by: Monis Khan <mok@vmware.com>
- I realized that the hardcoded fakekubeapi 404 not found response was invalid,
so we were getting a default error message. I fixed it so the tests follow a
higher fidelity code path.
- I caved and added a test for making sure the request body was always closed,
and believe it or not, we were double closing a body. I don't *think* this will
matter in production, since client-go will pass us ioutil.NopReader()'s, but
at least we know now.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Yes, this is a huge commit.
The middleware allows you to customize the API groups of all of the
*.pinniped.dev API groups.
Some notes about other small things in this commit:
- We removed the internal/client package in favor of pkg/conciergeclient. The
two packages do basically the same thing. I don't think we use the former
anymore.
- We re-enabled cluster-scoped owner assertions in the integration tests.
This code was added in internal/ownerref. See a0546942 for when this
assertion was removed.
- Note: the middlware code is in charge of restoring the GV of a request object,
so we should never need to write mutations that do that.
- We updated the supervisor secret generation to no longer manually set an owner
reference to the deployment since the middleware code now does this. I think we
still need some way to make an initial event for the secret generator
controller, which involves knowing the namespace and the name of the generated
secret, so I still wired the deployment through. We could use a namespace/name
tuple here, but I was lazy.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
This was generated via `hugo gen chromastyles --style=monokailight > ./site/themes/pinniped/assets/scss/_syntax.css`.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We have these redirects set up to make the `kubectl apply -f [...]` commands cleaner, but we never went back and fixed up the documentation to use them until now.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This project overwrote the v1.0.0 tag with a different commit ID, which has caused issues with the Go module sum DB (which accurately detected the issue).
This has been one of the reasons why Dependabot is not updating our Go dependencies.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This optimizes our image in a few different ways:
- It adds a bunch of files and directories to the `.dockerignore` file.
This lets us have a single `COPY . .` but still be very aggressive about pruning what files end up in the build context.
- It adds build-time cache mounts to the `go build` commands using BuildKit's `--mount=type=cache` flag.
This requires BuildKit-capable Docker, but means that our Go builds can all be incremental builds.
This replaces the previous flow we had where we needed to split out `go mod download`.
- Instead of letting the full `apt-get install ca-certificates` layer end up in our final image, we copy just the single file we need.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We need this in CI when we want to configure Dex with the redirect URI for both
primary and secondary deploys at one time (since we only stand up Dex once).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I didn't advertise this feature in the deploy README's since (hopefully) not
many people will want to use it?
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Previously, when triggering a Tilt reload via a *.go file change, a reload would
take ~13 seconds and we would see this error message in the Tilt logs for each
component.
Live Update failed with unexpected error:
command terminated with exit code 2
Falling back to a full image build + deploy
Now, Tilt should reload images a lot faster (~3 seconds) since we are running
the images as root.
Note! Reloading the Concierge component still takes ~13 seconds because there
are 2 containers running in the Concierge namespace that use the Concierge
image: the main Concierge app and the kube cert agent pod. Tilt can't live
reload both of these at once, so the reload takes longer and we see this error
message.
Will not perform Live Update because:
Error retrieving container info: can only get container info for a single pod; image target image:image/concierge has 2 pods
Falling back to a full image build + deploy
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This reverts commit 4a28d1f800.
This commit was originally made to fix a bug that caused TokenCredentialRequest
to become slow when the server was idle for an extended period of time. This was
to address a Kubernetes issue that was fixed in 1.19.5 and onward. We are now
running with Kubernetes 1.20, so we should be able to pick up this fix.
2021-01-13 11:12:09 -05:00
2875 changed files with 427350 additions and 72140 deletions
This type of issue should only be opened if you intend to create a
formal proposal document. Please refer to the proposal process in
[proposals/README.md](proposals/README.md).
Please title this issue starting with `[Proposal]` followed by a
title for what you are going to propose. For example:
`[Proposal] Lunar landing module authentication via Pinniped`.
-->
### Proposal Tracking Issue
- Proposal: <!-- this starts empty, then please update to link to proposal PR, then also link to proposal doc file after it is merged -->
- Discussion Links: <!-- link to any mailing list threads, Slack conversations, community meetings, or other places where the proposal was discussed, if any -->
-<!-- A -->
-<!-- B -->
- Pull requests: <!-- link to all PRs related to this proposal such as updates to the proposal doc, implementation PRs, etc. - keep this list up to date -->
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
# queries: security-extended,security-and-quality
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name:Autobuild
uses:github/codeql-action/autobuild@v2
# ℹ️ Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
# If the Autobuild fails above, remove it and uncomment the following three lines.
# modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
Below is a list of solutions where Pinniped is being used as a component.
**[Kubeapps](https://kubeapps.com/)**
Kubeapps uses Pinniped to [enable SSO authentication](https://github.com/kubeapps/kubeapps/blob/master/docs/user/using-an-OIDC-provider-with-pinniped.md) when running on clusters where SSO cannot be configured for the cluster API server.
TKG uses Pinniped to provide a seamless SSO experience across management and workload clusters.
**[VMware Tanzu Mission Control (TMC)](https://tanzu.vmware.com/mission-control)**
TMC uses Pinniped to provide a uniform authentication experience across all attached clusters.
## Adding your organization to the list of adopters
If you are using Pinniped and would like to be included in the list of Pinniped Adopters, add an SVG version of your logo that is less than 150 KB to
the [img directory](https://github.com/vmware-tanzu/pinniped/tree/main/site/themes/pinniped/static/img) in this repo and submit a pull request with your change including 1-2 sentences describing how your organization is using Pinniped. Name the image file something that
reflects your company (e.g., if your company is called Acme, name the image acme.svg). Please feel free to send us a message in [#pinniped](https://kubernetes.slack.com/archives/C01BW364RJA) with any questions you may have.
Pinniped is better because of our contributors and [maintainers](MAINTAINERS.md). It is because of you that we can bring
great software to the community.
Contributions to Pinniped are welcome. Here are some things to help you get started.
## Code of Conduct
@@ -8,24 +11,19 @@ Please see the [Code of Conduct](./CODE_OF_CONDUCT.md).
## Project Scope
Learn about the [scope](https://pinniped.dev/docs/scope/) of the project.
See [SCOPE.md](./SCOPE.md) for some guidelines about what we consider in and out of scope for Pinniped.
## Meeting with the Maintainers
## Roadmap
The maintainers aspire to hold a video conference every other week with the Pinniped community.
Any community member may request to add topics to the agenda by contacting a [maintainer](MAINTAINERS.md)
in advance, or by attending and raising the topic during time remaining after the agenda is covered.
Typical agenda items include topics regarding the roadmap, feature requests, bug reports, pull requests, etc.
A [public document](https://docs.google.com/document/d/1qYA35wZV-6bxcH5375vOnIGkNBo7e4OROgsV4Sj8WjQ)
tracks the agendas and notes for these meetings.
These meetings are currently scheduled for the first and third Thursday mornings of each month
at 9 AM Pacific Time, using this [Zoom meeting](https://VMware.zoom.us/j/94638309756?pwd=V3NvRXJIdDg5QVc0TUdFM2dYRzgrUT09).
If the meeting day falls on a US holiday, please consider that occurrence of the meeting to be canceled.
The near-term and mid-term roadmap for the work planned for the project [maintainers](MAINTAINERS.md) is documented in [ROADMAP.md](ROADMAP.md).
## Discussion
Got a question, comment, or idea? Please don't hesitate to reach out via the GitHub [Discussions](https://github.com/vmware-tanzu/pinniped/discussions) tab at the top of this page.
Got a question, comment, or idea? Please don't hesitate to reach out
via GitHub [Discussions](https://github.com/vmware-tanzu/pinniped/discussions),
and tag it with `proposal`, or create a new [Discussion](https://github.com/vmware-tanzu/pinniped/discussions).
The project team will work with you on your feature request.
The project [maintainers](MAINTAINERS.md) will work with you on your feature request.
Once the feature request has been validated, a [pull request](https://github.com/vmware-tanzu/pinniped/compare)
can be opened to implement the feature.
@@ -60,25 +58,44 @@ can be opened to implement the feature.
For specifics on what to include in your feature request, please follow the
guidelines in the issue and pull request templates.
### Reporting security vulnerabilities
Please follow the procedure described in [SECURITY.md](SECURITY.md).
## CLA
We welcome contributions from everyone but we can only accept them if you sign
We welcome contributions from everyone, but we can only accept them if you sign
our Contributor License Agreement (CLA). If you would like to contribute and you
have not signed it, our CLA-bot will walk you through the process when you open
a Pull Request. For questions about the CLA process, see the
[FAQ](https://cla.vmware.com/faq) or submit a question through the GitHub issue
tracker.
## Learning about Pinniped
New to Pinniped?
- Start here to learn how to install and use Pinniped: [Learn to use Pinniped for federated authentication to Kubernetes clusters](https://pinniped.dev/docs/tutorials/concierge-and-supervisor-demo/)
- Start here to learn how to navigate the source code: [Code Walk-through](https://pinniped.dev/docs/reference/code-walkthrough/)
- Other more detailed documentation can be found at: [Pinniped Docs](https://pinniped.dev/docs/)
## Building
The [Dockerfile](Dockerfile) at the root of the repo can be used to build and
package the code. After making a change to the code, rebuild the docker image with the following command.
package the server-side code. After making a change to the code, rebuild the
docker image with the following command.
```bash
# From the root directory of the repo...
docker build .
```
The Pinniped CLI client can be built for local use with the following command.
Pinniped provides identity services to Kubernetes.
Pinniped allows cluster administrators to easily plug in external identity
providers (IDPs) into Kubernetes clusters. This is achieved via a uniform
install procedure across all types and origins of Kubernetes clusters,
declarative configuration via Kubernetes APIs, enterprise-grade integrations
with IDPs, and distribution-specific integration strategies.
- Easily plug in external identity providers into Kubernetes clusters while offering a simple install and configuration experience. Leverage first class integration with Kubernetes and kubectl command-line.
- Give users a consistent, unified login experience across all your clusters, including on-premises and managed cloud environments.
- Securely integrate with an enterprise IDP using standard protocols or use secure, externally managed identities instead of relying on simple, shared credentials.
### Example Use Cases
To learn more, please visit the Pinniped project's website, https://pinniped.dev.
* Your team uses a large enterprise IDP, and has many clusters that they
manage. Pinniped provides:
* Seamless and robust integration with the IDP
* Easy installation across clusters of any type and origin
* A simplified login flow across all clusters
* Your team shares a single cluster. Pinniped provides:
* Simple configuration to integrate an IDP
* Individual, revocable identities
## Getting started with Pinniped
### Architecture
The Pinniped Supervisor component offers identity federation to enable a user to
access multiple clusters with a single daily login to their external IDP. The
Pinniped Supervisor supports various external [IDP
Care to kick the tires? It's easy to [install and try Pinniped](https://pinniped.dev/docs/demo/).
Care to kick the tires? It's easy to [install and try Pinniped](https://pinniped.dev/docs/).
## Discussion
Got a question, comment, or idea? Please don't hesitate to reach out via the GitHub [Discussions](https://github.com/vmware-tanzu/pinniped/discussions) tab at the top of this page.
Got a question, comment, or idea? Please don't hesitate to reach out
via GitHub [Discussions](https://github.com/vmware-tanzu/pinniped/discussions),
Pinniped development is sponsored by VMware, and the Pinniped team encourages users
who become aware of a security vulnerability in Pinniped to report any potential
vulnerabilities found to security@vmware.com. If possible, please include a description
of the effects of the vulnerability, reproduction steps, and a description of in which
version of Pinniped or its dependencies the vulnerability was discovered.
The use of encrypted email is encouraged. The public PGP key can be found at https://kb.vmware.com/kb/1055.
Pinniped provides identity services for Kubernetes clusters. The community has adopted this security disclosure and response policy to ensure we responsibly handle critical issues.
The Pinniped team hopes that users encountering a new vulnerability will contact
us privately as it is in the best interests of our users that the Pinniped team has
an opportunity to investigate and confirm a suspected vulnerability before it becomes public knowledge.
## Supported Versions
As of right now, only the latest version of Pinniped is supported.
## Reporting a Vulnerability - Private Disclosure Process
Security is of the highest importance and all security vulnerabilities or suspected security vulnerabilities should be reported to Pinniped privately, to minimize attacks against current users of Pinniped before they are fixed. Vulnerabilities will be investigated and patched on the next patch (or minor) release as soon as possible. This information could be kept entirely internal to the project.
If you know of a publicly disclosed security vulnerability for Pinniped, please **IMMEDIATELY** contact the VMware Security Team (security@vmware.com). The use of encrypted email is encouraged. The public PGP key can be found at https://kb.vmware.com/kb/1055.
**IMPORTANT: Do not file public issues on GitHub for security vulnerabilities**
To report a vulnerability or a security-related issue, please contact the VMware email address with the details of the vulnerability. The email will be fielded by the VMware Security Team and then shared with the Pinniped maintainers who have committer and release permissions. Emails will be addressed within 3 business days, including a detailed plan to investigate the issue and any potential workarounds to perform in the meantime. Do not report non-security-impacting bugs through this channel. Use [GitHub issues](https://github.com/vmware-tanzu/pinniped/issues/new/choose) instead.
## Proposed Email Content
Provide a descriptive subject line and in the body of the email include the following information:
* Basic identity information, such as your name and your affiliation or company.
* Detailed steps to reproduce the vulnerability (POC scripts, screenshots, and logs are all helpful to us).
* Description of the effects of the vulnerability on Pinniped and the related hardware and software configurations, so that the VMware Security Team can reproduce it.
* How the vulnerability affects Pinniped usage and an estimation of the attack surface, if there is one.
* List other projects or dependencies that were used in conjunction with Pinniped to produce the vulnerability.
## When to report a vulnerability
* When you think Pinniped has a potential security vulnerability.
* When you suspect a potential vulnerability but you are unsure that it impacts Pinniped.
* When you know of or suspect a potential vulnerability on another project that is used by Pinniped.
## Patch, Release, and Disclosure
The VMware Security Team will respond to vulnerability reports as follows:
1. The Security Team will investigate the vulnerability and determine its effects and criticality.
2. If the issue is not deemed to be a vulnerability, the Security Team will follow up with a detailed reason for rejection.
3. The Security Team will initiate a conversation with the reporter within 3 business days.
4. If a vulnerability is acknowledged and the timeline for a fix is determined, the Security Team will work on a plan to communicate with the appropriate community, including identifying mitigating steps that affected users can take to protect themselves until the fix is rolled out.
5. The Security Team will also create a [CVSS](https://www.first.org/cvss/specification-document) using the [CVSS Calculator](https://www.first.org/cvss/calculator/3.0). The Security Team makes the final call on the calculated CVSS; it is better to move quickly than making the CVSS perfect. Issues may also be reported to [Mitre](https://cve.mitre.org/) using this [scoring calculator](https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator). The CVE will initially be set to private.
6. The Security Team will work on fixing the vulnerability and perform internal testing before preparing to roll out the fix.
7. The Security Team will provide early disclosure of the vulnerability by emailing the [Pinniped Distributors](https://groups.google.com/g/project-pinniped-distributors) mailing list. Distributors can initially plan for the vulnerability patch ahead of the fix, and later can test the fix and provide feedback to the Pinniped team. See the section **Early Disclosure to Pinniped Distributors List** for details about how to join this mailing list.
8. A public disclosure date is negotiated by the VMware SecurityTeam, the bug submitter, and the distributors list. We prefer to fully disclose the bug as soon as possible once a user mitigation or patch is available. It is reasonable to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well-tested, or for distributor coordination. The timeframe for disclosure is from immediate (especially if it’s already publicly known) to a few weeks. For a critical vulnerability with a straightforward mitigation, we expect the report date for the public disclosure date to be on the order of 14 business days. The VMware Security Team holds the final say when setting a public disclosure date.
9. Once the fix is confirmed, the Security Team will patch the vulnerability in the next patch or minor release, and backport a patch release into all earlier supported releases. Upon release of the patched version of Pinniped, we will follow the **Public Disclosure Process**.
## Public Disclosure Process
The Security Team publishes a [public advisory](https://github.com/vmware-tanzu/pinniped/security/advisories) to the Pinniped community via GitHub. In most cases, additional communication via Slack, Twitter, mailing lists, blog and other channels will assist in educating Pinniped users and rolling out the patched release to affected users.
The Security Team will also publish any mitigating steps users can take until the fix can be applied to their Pinniped instances. Pinniped distributors will handle creating and publishing their own security advisories.
## Mailing lists
* Use security@vmware.com to report security concerns to the VMware Security Team, who uses the list to privately discuss security issues and fixes prior to disclosure. The use of encrypted email is encouraged. The public PGP key can be found at https://kb.vmware.com/kb/1055.
* Join the [Pinniped Distributors](https://groups.google.com/g/project-pinniped-distributors) mailing list for early private information and vulnerability disclosure. Early disclosure may include mitigating steps and additional information on security patch releases. See below for information on how Pinniped distributors or vendors can apply to join this list.
## Early Disclosure to Pinniped Distributors List
The private list is intended to be used primarily to provide actionable information to multiple distributor projects at once. This list is not intended to inform individuals about security issues.
## Membership Criteria
To be eligible to join the [Pinniped Distributors](https://groups.google.com/g/project-pinniped-distributors) mailing list, you should:
1. Be an active distributor of Pinniped.
2. Have a user base that is not limited to your own organization.
3. Have a publicly verifiable track record up to the present day of fixing security issues.
4. Not be a downstream or rebuild of another distributor.
5. Be a participant and active contributor in the Pinniped community.
6. Accept the Embargo Policy that is outlined below.
7. Have someone who is already on the list vouch for the person requesting membership on behalf of your distribution.
**The terms and conditions of the Embargo Policy apply to all members of this mailing list. A request for membership represents your acceptance to the terms and conditions of the Embargo Policy.**
## Embargo Policy
The information that members receive on the Pinniped Distributors mailing list must not be made public, shared, or even hinted at anywhere beyond those who need to know within your specific team, unless you receive explicit approval to do so from the VMware Security Team. This remains true until the public disclosure date/time agreed upon by the list. Members of the list and others cannot use the information for any reason other than to get the issue fixed for your respective distribution's users.
Before you share any information from the list with members of your team who are required to fix the issue, these team members must agree to the same terms, and only be provided with information on a need-to-know basis.
In the unfortunate event that you share information beyond what is permitted by this policy, you must urgently inform the VMware Security Team (security@vmware.com) of exactly what information was leaked and to whom. If you continue to leak information and break the policy outlined here, you will be permanently removed from the list.
## Requesting to Join
Send new membership requests to https://groups.google.com/g/project-pinniped-distributors. In the body of your request please specify how you qualify for membership and fulfill each criterion listed in the Membership Criteria section above.
## Confidentiality, integrity and availability
We consider vulnerabilities leading to the compromise of data confidentiality, elevation of privilege, or integrity to be our highest priority concerns. Availability, in particular in areas relating to DoS and resource exhaustion, is also a serious security concern. The VMware Security Team takes all vulnerabilities, potential vulnerabilities, and suspected vulnerabilities seriously and will investigate them in an urgent and expeditious manner.
cmd.Flags().Uint16Var(&flags.listenPort,"listen-port",0,"TCP port for localhost listener (authorization code flow only)")
cmd.Flags().StringSliceVar(&flags.scopes,"scopes",[]string{oidc.ScopeOfflineAccess,oidc.ScopeOpenID,"pinniped:request-audience"},"OIDC scopes to request during login")
cmd.Flags().StringSliceVar(&flags.scopes,"scopes",[]string{oidcapi.ScopeOfflineAccess,oidcapi.ScopeOpenID,oidcapi.ScopeRequestAudience,oidcapi.ScopeUsername,oidcapi.ScopeGroups},"OIDC scopes to request during login")
cmd.Flags().BoolVar(&flags.skipBrowser,"skip-browser",false,"Skip opening the browser (just print the URL)")
cmd.Flags().BoolVar(&flags.skipListen,"skip-listen",false,"Skip starting a localhost callback listener (manual copy/paste flow only)")
cmd.Flags().StringVar(&flags.sessionCachePath,"session-cache",filepath.Join(mustGetConfigDir(),"sessions.yaml"),"Path to session cache file")
cmd.Flags().StringSliceVar(&flags.caBundlePaths,"ca-bundle",nil,"Path to TLS certificate authority bundle (PEM format, optional, can be repeated)")
cmd.Flags().StringSliceVar(&flags.caBundleData,"ca-bundle-data",nil,"Base64 endcoded TLS certificate authority bundle (base64 encoded PEM format, optional, can be repeated)")
cmd.Flags().StringSliceVar(&flags.caBundleData,"ca-bundle-data",nil,"Base64 encoded TLS certificate authority bundle (base64 encoded PEM format, optional, can be repeated)")
cmd.Flags().BoolVar(&flags.debugSessionCache,"debug-session-cache",false,"Print debug logs related to the session cache")
cmd.Flags().StringVar(&flags.requestAudience,"request-audience","","Request a token with an alternate audience using RFC8693 token exchange")
cmd.Flags().BoolVar(&flags.conciergeEnabled,"enable-concierge",false,"Exchange the OIDC ID token with the Pinniped concierge during login")
cmd.Flags().StringVar(&flags.conciergeNamespace,"concierge-namespace","pinniped-concierge","Namespace in which the concierge was installed")
cmd.Flags().BoolVar(&flags.conciergeEnabled,"enable-concierge",false,"Use the Concierge to login")
cmd.Flags().StringVar(&conciergeNamespace,"concierge-namespace","pinniped-concierge","Namespace in which the Concierge was installed")
cmd.Flags().StringVar(&flags.conciergeAuthenticatorType,"concierge-authenticator-type","","Concierge authenticator type (e.g., 'webhook', 'jwt')")
cmd.Flags().StringVar(&flags.conciergeEndpoint,"concierge-endpoint","","API base for the Pinniped concierge endpoint")
cmd.Flags().StringVar(&flags.conciergeCABundle,"concierge-ca-bundle-data","","CA bundle to use when connecting to the concierge")
cmd.Flags().StringVar(&flags.conciergeEndpoint,"concierge-endpoint","","API base for the Concierge endpoint")
cmd.Flags().StringVar(&flags.conciergeCABundle,"concierge-ca-bundle-data","","CA bundle to use when connecting to the Concierge")
cmd.Flags().StringVar(&flags.conciergeAPIGroupSuffix,"concierge-api-group-suffix",groupsuffix.PinnipedDefaultSuffix,"Concierge API group suffix")
cmd.Flags().StringVar(&flags.credentialCachePath,"credential-cache",filepath.Join(mustGetConfigDir(),"credentials.yaml"),"Path to cluster-specific credentials cache (\"\" disables the cache)")
cmd.Flags().StringVar(&flags.upstreamIdentityProviderName,"upstream-identity-provider-name","","The name of the upstream identity provider used during login with a Supervisor")
cmd.Flags().StringVar(&flags.upstreamIdentityProviderType,"upstream-identity-provider-type",idpdiscoveryv1alpha1.IDPTypeOIDC.String(),fmt.Sprintf("The type of the upstream identity provider used during login with a Supervisor (e.g. '%s', '%s', '%s')",idpdiscoveryv1alpha1.IDPTypeOIDC,idpdiscoveryv1alpha1.IDPTypeLDAP,idpdiscoveryv1alpha1.IDPTypeActiveDirectory))
cmd.Flags().StringVar(&flags.upstreamIdentityProviderFlow,"upstream-identity-provider-flow","",fmt.Sprintf("The type of client flow to use with the upstream identity provider during login with a Supervisor (e.g. '%s', '%s')",idpdiscoveryv1alpha1.IDPFlowBrowserAuthcode,idpdiscoveryv1alpha1.IDPFlowCLIPassword))
mustMarkHidden(&cmd,"debug-session-cache")
mustMarkRequired(&cmd,"issuer")
// --skip-listen is mainly needed for testing. We'll leave it hidden until we have a non-testing use case.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.