These extra timeout contexts were only used in the new multiple-IDPs e2e
test. Remove this possible cause of test cleanup flakes, where the test
runs slowly enough in CI that the timeout context has already expired by
the time cleanup runs, causing the cleanup function to fail with
"context deadline exceeded" errors.
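A minimal sketch of that failure mode, assuming an illustrative test helper and resource names (not the actual e2e test code):
```
package example

import (
	"context"
	"testing"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// Sketch of the flake: a timeout-scoped context reused by a cleanup func.
func testWithFlakyCleanup(t *testing.T, client kubernetes.Interface) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	t.Cleanup(cancel) // registered first, so it runs after the cleanup below

	t.Cleanup(func() {
		// If the test body was slow in CI, ctx has already expired here, so
		// this delete fails with "context deadline exceeded". Cleanup should
		// create its own fresh context instead of reusing the test's ctx.
		_ = client.CoreV1().Secrets("default").Delete(ctx, "test-resource", metav1.DeleteOptions{})
	})

	// ... slow test body that consumes most of the two minutes ...
}
```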
Make the subject of the downstream ID token more unique when
there are multiple IDPs. It is possible to define two IDPs in a
FederationDomain using the same identity provider CR, in which
case the only thing that would make the subject claim different
is adding the IDP display name into the value of the subject claim.
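A minimal sketch of the idea, using a hypothetical helper (not the actual controller code):
```
package example

import (
	"fmt"
	"net/url"
)

// downstreamSubject is a hypothetical helper showing how including the IDP
// display name keeps subjects distinct even when two IDPs in one
// FederationDomain point at the same identity provider CR.
func downstreamSubject(upstreamIssuer, upstreamSubject, idpDisplayName string) string {
	return fmt.Sprintf("%s?sub=%s&idpName=%s",
		upstreamIssuer,
		url.QueryEscape(upstreamSubject),
		url.QueryEscape(idpDisplayName))
}
```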
- Remove that validation from the controller since the CRD already
validates it during creates and updates.
- Also finish the supervisor_federationdomain_status_test.go by adding
more tests for both controller validations and CRD validations
- Also fix a small bug in the controller where it used Sprintf incorrectly
- Rename WaitForTestFederationDomainStatus test helper to
WaitForFederationDomainStatusPhase
Also changes the transformation pipeline code to sort and de-duplicate
the transformed group names at the end of the pipeline. This makes
the results more predictable without changing the semantics.
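A minimal sketch of that sort-and-uniq step (helper name assumed, not the actual pipeline code):
```
package example

import "sort"

// sortAndUniq sorts group names and removes duplicates, making the
// pipeline output order deterministic without changing its semantics.
func sortAndUniq(groups []string) []string {
	sort.Strings(groups)
	result := make([]string, 0, len(groups))
	for _, g := range groups {
		if len(result) == 0 || result[len(result)-1] != g {
			result = append(result, g)
		}
	}
	return result
}
```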
- Avoid a possible race condition where the status says "Ready" but
the endpoints take another moment to become available, potentially
causing a fast client to get a 404 after observing that the status
is "Ready" and then immediately trying to use the endpoints.
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
- Refactor testlib.CreateTestFederationDomain helper
- Call testlib.WaitForTestFederationDomainStatus after each integration
test creates an IDP and expects the FederationDomain to become ready
- Create an IDP for some tests which want the FederationDomain to be
ready but were previously not creating any IDP
- Expect the new FederationDomain condition type
"IdentityProvidersFound" in those tests where it is needed
Co-authored-by: Joshua Casey <joshuatcasey@gmail.com>
- adds the truthy condition
- TODOs for falsy conditions
- additional notes for other conditions
- tests updated to pass with the new condition
Co-authored-by: Ryan Richard <richardry@vmware.com>
- move pushd/popd inside if statements for alternative-deploy methods
- add specific alternative-deploy vars for individual components
- supervisor
- concierge
- local-user-authenticator
while preserving the current alternative-deploy for all three
- document that using equals signs with flags does not work:
--foo=bar is invalid
--foo bar is valid
Used this as an opportunity to refactor how some tests were
making assertions about error strings.
New test helpers make it easy for an error string to be expected as an
exact string, as a string built using sprintf, as a regexp, or as a
string built to include the platform-specific x509 error string.
All of these helpers can be used in a single `wantErr` field of a test
table. They can be used for both unit tests and integration tests.
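A minimal sketch of what such helpers could look like (the names below are hypothetical, not the actual helper API):
```
package example

import (
	"fmt"
	"testing"

	"github.com/stretchr/testify/require"
)

// WantErrorFunc is a single type usable in a test table's wantErr field.
type WantErrorFunc func(t *testing.T, actual error)

// WantExactError requires the error message to match exactly.
func WantExactError(want string) WantErrorFunc {
	return func(t *testing.T, actual error) {
		t.Helper()
		require.EqualError(t, actual, want)
	}
}

// WantSprintfError builds the expected message with Sprintf.
func WantSprintfError(format string, args ...interface{}) WantErrorFunc {
	return func(t *testing.T, actual error) {
		t.Helper()
		require.EqualError(t, actual, fmt.Sprintf(format, args...))
	}
}

// WantMatchingError requires the message to match a regexp.
func WantMatchingError(pattern string) WantErrorFunc {
	return func(t *testing.T, actual error) {
		t.Helper()
		require.Error(t, actual)
		require.Regexp(t, pattern, actual.Error())
	}
}
```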
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
- Specify mappings on OIDCIdentityProvider.spec.claims.additionalClaimMappings
- Advertise additionalClaims in the OIDC discovery endpoint under claims_supported
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Joshua Casey <joshuatcasey@gmail.com>
Also increase the timeout in an integration test because it is flaking
on one of the GKE environments sometimes, probably because the
Concierge controllers aren't ready fast enough before the integration
tests start.
The fuzzed value depends on which Go compiler is used. This breaks
the fips tests in CI as long as the fips compiler is a version behind
(we are still waiting for the 1.19 fips compiler to come out).
The fuzzing is still being tested by a separate unit test, so we are
not losing fuzzing test coverage.
- Upgrade Go used in CI from 1.19.0 to 1.19.1
- Upgrade all go.mod direct dependencies to latest available versions
- Upgrade distroless base image to latest available version
- Upgrade Go fips compiler to latest available version
Note that upgrading the go-oidc library changed an error message
returned by that library, so update the places where tests were
expecting that error message.
When oidcclientsecretstorage.Set() wants to update the contents of the
storage Secret, it also wants to keep the original ownerRef of the
storage Secret, so it needs the middleware to rewrite the API group
of the ownerRef again during the update (just like it had initially done
during the create of the Secret).
Sets the Name, Namespace, CreationTimestamp fields in the object meta
of the return value.
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
- Change update-codegen.sh script to also generate openapi code for the
aggregated API types
- Update both aggregated API servers' configuration to make them serve
the openapi docs for the aggregated APIs
- Add new integration test which runs `kubectl explain` for all Pinniped
API resources, and all fields and subfields of those resources
- Update some of the comments on the API structs
- Change some names of the tmpl files to make the filename better match
the struct names
This commit is a WIP commit because it doesn't include many tests
for the new feature.
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
Where possible, use securityContext settings which will work with the
most restrictive Pod Security Admission policy level (as of Kube 1.25).
Where privileged containers are needed, use the namespace-level
annotation to allow them.
Also adjust some integration tests to make similar changes to allow the
integration tests to pass on test clusters which use restricted PSAs.
This test is a little flaky in slow Kubernetes clusters. Try giving a
little more time for things to update before failing the test, to
hopefully make this test a little more reliable.
It seems that Google changed it from "the gcp auth plugin is deprecated
in v1.22+, unavailable in v1.25+; use gcloud instead" to instead say
"unavailable in v1.26+". Make the matcher in category_test.go looser
to allow both to match.
Also fix some tests that were broken by bumping golang and dependencies
in the previous commits.
Note that in addition to changes made to satisfy the linter which do not
impact the behavior of the code, this commit also adds ReadHeaderTimeout
to all usages of http.Server to satisfy the linter (and because it
seemed like a good suggestion).
Add the new `username` scope to the Supervisor, exclude usernames from ID tokens issued to dynamic clients which are not granted that scope, and make other dynamic client related changes
When the token exchange grant type is used to get a cluster-scoped
ID token, the returned token has a new audience value. The client ID
of the client which performed the authorization was lost. This didn't
matter before, since the only client was `pinniped-cli`, but now that
dynamic clients can be registered, the information would be lost in the
cluster-scoped ID token. It could be useful for logging, tracing, or
auditing, so preserve the information by putting the client ID into the
`azp` claim in every ID token (authcode exchange, cluster-scoped, and
refreshed ID tokens).
- For backwards compatibility with older Pinniped CLIs, the pinniped-cli
client does not need to request the username or groups scopes for them
to be granted. For dynamic clients, the usual OAuth2 rules apply:
the client must be allowed to request the scopes according to its
configuration, and the client must actually request the scopes in the
authorization request.
- If the username scope was not granted, then there will be no username
in the ID token, and the cluster-scoped token exchange will fail since
there would be no username in the resulting cluster-scoped ID token.
- The OIDC well-known discovery endpoint lists the username and groups
scopes in the scopes_supported list, and lists the username and groups
claims in the claims_supported list.
- Add username and groups scopes to the default list of scopes
put into kubeconfig files by "pinniped get kubeconfig" CLI command,
and the default list of scopes used by "pinniped login oidc" when
no list of scopes is specified in the kubeconfig file
- The warning header about group memberships changing during upstream
refresh will only be sent to the pinniped-cli client, since it is
only intended for kubectl and it could leak the username to the
client (which may not have the username scope granted) through the
warning message text.
- Add the user's username to the session storage as a new field, so that
during upstream refresh we can compare the original username from the
initial authorization to the refreshed username, even in the case when
the username scope was not granted (and therefore the username is not
stored in the ID token claims of the session storage)
- Bump the Supervisor session storage format version from 2 to 3
due to the username field being added to the session struct
- Extract commonly used string constants related to OIDC flows to api
package.
- Change some import names to make them consistent:
- Always import github.com/coreos/go-oidc/v3/oidc as "coreosoidc"
- Always import go.pinniped.dev/generated/latest/apis/supervisor/oidc
as "oidcapi"
- Always import go.pinniped.dev/internal/oidc as "oidc"
- Add dynamic client unit tests for the upstream OIDC callback and
POST login endpoints.
- Enhance a few log statements to print the full fosite error messages
into the logs where they were previously only printing the name of
the error type.
- Enhance the token exchange to check that the same client is used
compared to the client used during the original authorization and
token requests, and also check that the client has the token-exchange
grant type allowed in its configuration.
- Reduce the minimum required bcrypt cost for OIDCClient secrets
because 15 is too slow for real-life use, especially considering
that every login and every refresh flow will require two client auths.
- In unit tests, use bcrypt hashes with a cost of 4 (see the sketch
after this list), because bcrypt slows down by 13x when run with the
race detector, and we run our tests with the race detector enabled,
causing the tests to be unacceptably slow. The production code uses
a higher minimum cost.
- Centralize all pre-computed bcrypt hashes used by unit tests to a
single place. Also extract some other useful test helpers for
unit tests related to OIDCClients.
- Add tons of unit tests for the token endpoint related to dynamic
clients for authcode exchanges, token exchanges, and refreshes.
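As referenced above, a minimal sketch of generating a cheap bcrypt hash for unit tests (golang.org/x/crypto/bcrypt's minimum cost is 4; the helper name is hypothetical):
```
package example

import "golang.org/x/crypto/bcrypt"

// testBcryptHash pre-computes a low-cost hash for unit tests; production
// code enforces a much higher minimum cost.
func testBcryptHash(password string) ([]byte, error) {
	return bcrypt.GenerateFromPassword([]byte(password), bcrypt.MinCost) // cost 4
}
```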
This is only a first commit towards making this feature work.
- Hook dynamic clients into fosite by returning them from the storage
interface (after finding and validating them)
- In the auth endpoint, prevent the use of the username and password
headers for dynamic clients to force them to use the browser-based
login flows for all the upstream types
- Add happy path integration tests in supervisor_login_test.go
- Add lots of comments (and some small refactors) in
supervisor_login_test.go to make it much easier to understand
- Add lots of unit tests for the auth endpoint regarding dynamic clients
(more unit tests to be added for other endpoints in follow-up commits)
- Enhance crud.go to make lifetime=0 mean never garbage collect,
since we want client secret storage Secrets to last forever
- Move the OIDCClient validation code to a package where it can be
shared between the controller and the fosite storage interface
- Make shared test helpers for tests that need to create OIDC client
secret storage Secrets
- Create a public const for "pinniped-cli" now that we are using that
string in several places in the production code
The following validation is enforced:
1. Names must start with client.oauth.pinniped.dev-
2. Redirect URIs must start with https:// or http://127.0.0.1 or http://::1
3. All spec lists must not have duplicates
Added an integration test to assert all static validations.
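A minimal sketch of checks like those described above (the function names are hypothetical; the real validation lives in the CRD schema and the shared validation package):
```
package example

import (
	"fmt"
	"strings"
)

func validateClientName(name string) error {
	const prefix = "client.oauth.pinniped.dev-"
	if !strings.HasPrefix(name, prefix) {
		return fmt.Errorf("metadata.name must start with %q", prefix)
	}
	return nil
}

func validateRedirectURI(uri string) error {
	if strings.HasPrefix(uri, "https://") ||
		strings.HasPrefix(uri, "http://127.0.0.1") ||
		strings.HasPrefix(uri, "http://::1") {
		return nil
	}
	return fmt.Errorf("redirect URI %q must start with https://, http://127.0.0.1, or http://::1", uri)
}

func requireNoDuplicates(list []string) error {
	seen := make(map[string]bool, len(list))
	for _, item := range list {
		if seen[item] {
			return fmt.Errorf("duplicate entry %q", item)
		}
		seen[item] = true
	}
	return nil
}
```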
Signed-off-by: Monis Khan <mok@vmware.com>
When response_mode=form_post is requested, some error cases will be
returned to the client using the form_post web page to POST the result
back to the client's redirect URL.
Also fix some comments that didn't fit onto one line in the yaml
examples, be consistent about putting a blank line above `---` yaml
separators, and some other small doc improvements.
Also:
- Add CSS to login page
- Refactor login page HTML and CSS into a new package
- New custom CSP headers for the login page, because the requirements
are different from the form_post page
Note that attempting to update 1.18.18 to 1.18.20 didn't work for some
reason, so I skipped that one. The code generator didn't like 1.18.20
and it deleted all the generated code. Avoiding 1.18.19 because it is
listed as having a regression at
https://kubernetes.io/releases/patch-releases/#non-active-branch-history
The other handlers for GET and POST requests are not yet implemented in
this commit. The shared handler code in login_handler.go takes care of
things like checking the method, checking the CSRF cookie, decoding the state
param, and adding security headers on behalf of both the GET and POST
handlers.
Some code has been extracted from callback_handler.go to be shared.
Also fix some test failures on the callback handler, register the
new login handler in manager.go, and add a (half-baked) integration test.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
To keep this backwards compatible, this PR changes how
the cli deals with ambiguous flows. Previously, if there
was more than one flow advertised, the cli would require users
to set the flag --upstream-identity-provider-flow. Now it
chooses the first one in the list.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
Go 1.18.1 started using macOS's x509 verification APIs on Macs
rather than Go's own. The error messages are different.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
We cannot use plog until the log level config has been setup, but
that occurs after this init function has run.
Signed-off-by: Monis Khan <mok@vmware.com>
- Use camel-case in the static configmap
- Parse the value into a boolean in the go struct instead of a string
- Add test for when unsupported value is used in the configmap
- Run the config_test.go tests in parallel
- Update some paragraphs in configure-supervisor.md for clarity
Add new deprecated_insecure_accept_external_unencrypted_http_requests
value in values.yaml. Allow it to be a boolean or a string to make it
easier to use (both --data-value and --data-value-yaml will work).
Also:
- Consider "ip6-localhost" and "ip6-loopback" to be loopback addresses
for the validation
- Remove unused env.SupervisorHTTPAddress var
- Deprecate the `service_http_*` values in values.yaml by renaming them
and causing a ytt render error when the old names are used
ory/x has new releases very often, sometimes multiple times per week,
causing a lot of noise from dependabot. We were barely using it
directly, so replace our direct usages with equivalent code.
You can use an older version of K8s on your development workstation
by temporarily editing kind-up.sh to add the `--image` flag. By defining
both v1beta2 and v1beta3 you should continue to be able to use old
versions of K8s in this way with Kind v0.12.0.
It appears that kind completely ignores kubeadm.k8s.io/v1beta2 config
starting in Kind v0.12.0.
You can observe the config being ignored or used by adding `-v 10` to
the command-line arguments of `kind create cluster` in kind-up.sh.
- Two of the linters changed their names
- Updated code and nolint comments to make all linters pass with 1.44.2
- Added a new hack/install-linter.sh script to help developers install
the expected version of the linter for local development
- Note that v0.8.0 no longer supports the "trivialVersions=true"
command-line option, so remove that from update-codegen.sh.
It doesn't seem to impact the output (our generated CRD yaml files).
Also:
- Make our code generator script work with Go 1.17
- Make our update.sh script work on linux
- Update the patch versions of the old Kube versions that we were using
to generate code (see kube-versions.txt)
- Use our container images from ghcr instead of
projects.registry.vmware.com for codegen purposes
- Make it easier to debug in the future by passing "-v" to the Kube
codegen scripts
- Updated copyright years to make commit checks pass
The purpose of this change is to allow Helm to be used to deploy Pinniped
into the local KinD cluster for the local integration tests. That said,
the change allows any alternate deployment mechanism, I just happen
to be using it with Helm.
All default behavior is preserved. This won't change how anyone uses the
script today, it just allows me not to copy/paste the whole setup for the
integration tests.
Changes:
1) An option called `--alternate-deploy <path-to-deploy-script>` has been
added that, when enabled, calls the specified script instead of using ytt
and kapp. The alternate deploy script is called with the app to deploy
and the tag of the docker image to use. We set the default value of
the alternate_deploy variable to undefined, and there is a check that
tests whether the alternate deploy is defined. For the supervisor it looks
like this:
```
if [ "$alternate_deploy" != "undefined" ]; then
  log_note "The Pinniped Supervisor will be deployed with $alternate_deploy pinniped-supervisor $tag..."
  "$alternate_deploy" pinniped-supervisor "$tag"
else
  # ... the normal ytt/kapp deploy runs here ...
fi
```
2) Additional log_note entries have been added to enumerate all values
passed into the ytt/kapp deploy. I used these while trying to reach parity
in the integration tests, but I think they are useful for debugging.
3) The manifests produced by ytt and written to /tmp are now named
individually, so that an easy comparison can be made between the manifests
produced by a ytt/kapp run of the integration tests and those produced by
a Helm run. When something is not working, I have been comparing the
manifests from these runs to find differences.
This allows us to target browser-based tests with the regex:
```
go test -v -race -count 1 -timeout 0 ./test/integration -run '/_Browser'
```
New tests that call browsertest.Open will automatically be forced to
follow this convention.
Signed-off-by: Monis Khan <mok@vmware.com>
When the POST to the CLI's localhost callback endpoint results in a
non-2XX status code, then treat that as a failed login attempt and
automatically show the manual copy/paste UI.
Just in case some future browser change sends some new kind of request
to our CLI, ignore it by returning StatusMethodNotAllowed and
continuing to listen.
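A minimal sketch of that behavior (handler shape assumed, not the actual CLI code):
```
package example

import "net/http"

// callbackHandler ignores unexpected request kinds from future browsers by
// responding 405 and continuing to listen for the real callback.
func callbackHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet && r.Method != http.MethodPost {
		w.WriteHeader(http.StatusMethodNotAllowed)
		return
	}
	// ... handle the authorization code callback; per the description
	// above, a non-2XX result is treated as a failed login attempt and
	// the manual copy/paste UI is shown ...
}
```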
When the test was going to fail, a goroutine would accidentally block
on writing to an unbuffered channel, and the spawnTestGoroutine helper
would wait for that goroutine to end on cleanup, causing the test to
hang forever while it was trying to fail.
This is to support the new changes in Google Chrome v98 which now
performs CORS preflight requests for the Javascript form submission
on the Supervisor's login page, even though the form is being submitted
to a localhost listener.
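A minimal sketch of answering the preflight on the CLI's localhost listener (header values and function shape are assumptions for illustration):
```
package example

import "net/http"

// handlePreflight answers a CORS preflight request and reports whether the
// request was a preflight. allowOrigin would be the Supervisor issuer
// origin that serves the login page.
func handlePreflight(w http.ResponseWriter, r *http.Request, allowOrigin string) bool {
	if r.Method != http.MethodOptions {
		return false // not a preflight; let the normal handler run
	}
	w.Header().Set("Access-Control-Allow-Origin", allowOrigin)
	w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
	w.WriteHeader(http.StatusNoContent)
	return true
}
```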
Recommend using install-pinniped-concierge-crds.yaml, then
install-pinniped-concierge-resources.yaml.
Previously we recommended install-pinniped-concierge-crds (a subset),
then install-pinniped-concierge (everything concierge related, including
the crds). This works fine for install, but not uninstall. Instead we
should use a separate yaml file that contains everything in
install-pinniped-concierge but *not* in install-pinniped-concierge-crds.
We have been generating this file in CI since a5ced4286b6febc7474b7adee34eeb1b62ec82b7
but we haven't released since then so we haven't been able to recommend
its use.
Fosite v0.42.0 introduced a new RevokeRefreshTokenMaybeGracePeriod()
interface function. Updated our code to support this change. We didn't
support grace periods on refresh tokens before, so implemented it by
making the new RevokeRefreshTokenMaybeGracePeriod() method just call
the old RevokeRefreshToken() method, therefore keeping our old behavior.
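A minimal sketch of that delegation, assuming the receiver type name and the fosite method signature as we understand it from fosite v0.42.0:
```
package example

import "context"

// storage is a stand-in for our fosite storage implementation.
type storage struct{}

func (s *storage) RevokeRefreshToken(ctx context.Context, requestID string) error {
	// ... existing revocation logic ...
	return nil
}

// RevokeRefreshTokenMaybeGracePeriod keeps the old behavior: we never
// supported grace periods on refresh tokens, so revoke immediately.
func (s *storage) RevokeRefreshTokenMaybeGracePeriod(ctx context.Context, requestID string, signature string) error {
	return s.RevokeRefreshToken(ctx, requestID)
}
```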
This change allows configuration of the http and https listeners
used by the supervisor.
TCP (IPv4 and IPv6 with any interface and port) and Unix domain
socket based listeners are supported. Listeners may also be
disabled.
Binding the http listener to TCP addresses other than 127.0.0.1 or
::1 is deprecated.
The deployment now uses https health checks. The supervisor is
always able to complete a TLS connection with the use of a bootstrap
certificate that is signed by an in-memory certificate authority.
To support sidecar containers used by service meshes, Unix domain
socket based listeners include ACLs that allow writes to the socket
file from any runAsUser specified in the pod's containers.
Signed-off-by: Monis Khan <mok@vmware.com>
Also refactor oidc downstreamsessiondata code to be shared between
callback handler and auth handler.
Signed-off-by: Ryan Richard <richardry@vmware.com>
When the LDAP and AD IDP watcher controllers encountered an update error
while trying to update the status conditions of the IDP resources, then
they would drop the computed desired new value of the condition on the
ground. Next time the controller ran it would not try to update the
condition again because it wants to use the cached settings and had
already forgotten the desired new value of the condition computed during
the previous run of the controller. This would leave the outdated value
of the condition on the IDP resource.
This bug would manifest in CI as random failures in which the expected
condition message and the actual condition message would refer to
different version numbers of the bind secret. The actual condition
message would refer to an older version of the bind secret because the
update failed and then the new desired message got dropped on the
ground.
This commit changes the in-memory caching strategy to also cache the
computed condition messages, allowing the conditions to be updated
on the IDP resource during future calls to Sync() in the case of a
failed update.
Ran:
```
go get -u ./... && go mod tidy
```
Pinned all go.opentelemetry.io deps to match k/k.
This is needed to make the go get command work.
Signed-off-by: Monis Khan <mok@vmware.com>
- Make everything private
- Drop unused AuthTime field
- Use %q format string instead of "%s"
- Only rely on GetRawAttributeValues in AttributeUnchangedSinceLogin
Signed-off-by: Monis Khan <mok@vmware.com>
For password based login on the CLI (i.e. no browser), this change
relaxes the response code check to allow for any redirect code
handled by the Go standard library. In the future, we can drop the
rewriteStatusSeeOtherToStatusFoundForBrowserless logic from the
server side code.
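A minimal sketch of the relaxed check, accepting every redirect status that the Go standard library's HTTP client will follow:
```
package example

import "net/http"

// isRedirect reports whether a status code is one of the redirects that
// Go's http.Client follows (301, 302, 303, 307, 308).
func isRedirect(statusCode int) bool {
	switch statusCode {
	case http.StatusMovedPermanently,
		http.StatusFound,
		http.StatusSeeOther,
		http.StatusTemporaryRedirect,
		http.StatusPermanentRedirect:
		return true
	}
	return false
}
```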
Signed-off-by: Monis Khan <mok@vmware.com>
Highlights from this dep bump:
1. Made a copy of the v0.4.0 github.com/go-logr/stdr implementation
for use in tests. We must bump this dep as Kube code uses a
newer version now. We would have to rewrite hundreds of test log
assertions without this copy.
2. Use github.com/felixge/httpsnoop to undo the changes made by
ory/fosite#636 for CLI based login flows. This is required for
backwards compatibility with older versions of our CLI. A
separate change after this will update the CLI to be more
flexible (it is purposefully not part of this change to confirm
that we did not break anything). For all browser login flows, we
now redirect using http.StatusSeeOther instead of http.StatusFound.
3. Drop plog.RemoveKlogGlobalFlags as klog no longer mutates global
process flags
4. Only bump github.com/ory/x to v0.0.297 instead of the latest
v0.0.321 because v0.0.298+ pulls in a newer version of
go.opentelemetry.io/otel/semconv which breaks k8s.io/apiserver.
We should update k8s.io/apiserver to use the newer code.
5. Migrate all code from k8s.io/apimachinery/pkg/util/clock to
k8s.io/utils/clock and k8s.io/utils/clock/testing
6. Delete testutil.NewDeleteOptionsRecorder and migrate to the new
kubetesting.NewDeleteActionWithOptions
7. Updated ExpectedAuthorizeCodeSessionJSONFromFuzzing caused by
fosite's new rotated_secrets OAuth client field. This new field
is currently not relevant to us as we have no private clients.
Signed-off-by: Monis Khan <mok@vmware.com>
Also refactor the code that decides which types of revocation failures
are worth retrying. Be more selective by only retrying those types of
errors that are likely to be worth retrying.
- Rename the RevokeRefreshToken() function to RevokeToken() and make it
take the token type (refresh or access) as a new parameter.
- This is a prefactor getting ready to support revocation of upstream
access tokens in the garbage collection handler.
This change updates the new TLS integration tests to:
1. Only create the supervisor default TLS serving cert if needed
2. Port forward the node port supervisor service since that is
available in all environments
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the TLS config used by all pinniped components.
There are no configuration knobs associated with this change. Thus
this change tightens our static defaults.
There are four TLS config levels:
1. Secure (TLS 1.3 only)
2. Default (TLS 1.2+ best ciphers that are well supported)
3. Default LDAP (TLS 1.2+ with less good ciphers)
4. Legacy (currently unused, TLS 1.2+ with all non-broken ciphers)
Highlights per component:
1. pinniped CLI
- uses "secure" config against KAS
- uses "default" for all other connections
2. concierge
- uses "secure" config as an aggregated API server
- uses "default" config as a impersonation proxy API server
- uses "secure" config against KAS
- uses "default" config for JWT authenticater (mostly, see code)
- no changes to webhook authenticater (see code)
3. supervisor
- uses "default" config as a server
- uses "secure" config against KAS
- uses "default" config against OIDC IDPs
- uses "default LDAP" config against LDAP IDPs
Signed-off-by: Monis Khan <mok@vmware.com>
- Used to determine on which port the impersonation proxy will bind
- Defaults to 8444, which is the old hard-coded port value
- Allow the port number to be configured to any value within the
range 1024 to 65535
- This commit does not include adding new config knobs to the ytt
values file, so while it is possible to change this port without
needing to recompile, it is not convenient
- pull construction of authenticators.Response into searchAndBindUser
- remove information about the identity provider in the error that gets
returned to users. Put it in debug instead, where it may show up in
logs.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
- changed to use custom authenticators.Response rather than the k8s one
that doesn't include space for a DN
- Added more checking for correct idp type in token handler
- small style changes
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This stores the user DN in the session data upon login and checks that
the entry still exists upon refresh. It doesn't check anything
else about the entry yet.
Use "..." instead of "main.go" as the build target since we may have
extra files in the future.
https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies
```
-trimpath
	remove all file system paths from the resulting executable.
	Instead of absolute file system paths, the recorded file names
	will begin with either "go" (for the standard library),
	or a module path@version (when using modules),
	or a plain import path (when using GOPATH).
```
Signed-off-by: Monis Khan <mok@vmware.com>
- Discover the revocation endpoint of the upstream provider in
oidc_upstream_watcher.go and save it into the cache for future use
by the garbage collector controller
- Adds RevokeRefreshToken to UpstreamOIDCIdentityProviderI
- Implements the production version of RevokeRefreshToken
- Implements test doubles for RevokeRefreshToken for future use in
garbage collector's unit tests
- Prefactors the crud and session storage types for future use in the
garbage collector controller
- See remaining TODOs in garbage_collector.go
Otherwise, the CA and proxy settings will not be used for the call
to the upstream token endpoint while performing the refresh. This
mistake was exposed by the TestSupervisorLogin integration test, so
it has test coverage.
- If the upstream refresh fails, then fail the downstream refresh
- If the upstream refresh returns an ID token, then validate it (we will
use its claims in the future, but not in this commit)
- If the upstream refresh returns a new refresh token, then save it
into the user's session in storage
- Pass the provider cache into the token handler so it can use the
cached providers to perform upstream refreshes
- Handle unexpected errors in the token handler where the user's session
does not contain the expected data. These should not be possible
in practice unless someone is manually editing the storage, but
handle them anyway just to be safe.
- Refactor to share the refresh code between the CLI and the token
endpoint by moving it into the UpstreamOIDCIdentityProviderI
interface, since the token endpoint needed it to be part of that
interface anyway
- Require refresh tokens to be returned from upstream OIDC IDPs
- Store refresh tokens (for OIDC) and IDP information (for all IDPs) in custom session data during authentication
- Don't pass access_type=offline all the time
- Throw an error when prompt=none because the spec says we can't ignore
it
- Ignore the other prompt params
Signed-off-by: Ryan Richard <richardry@vmware.com>
This will allow us to store custom data inside the fosite session
storage for all downstream OIDC sessions.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This was wrong, since you don't need a LoadBalancer to run the
impersonation proxy if you specify spec.service.type = "None" or
"ClusterIP" on the CredentialIssuer.
This change fixes a copy-paste error that led to the impersonation
proxy signer CA being rotated based on the configuration of the
rotation of the aggregated API serving certificate. This would lead
to occasional "Unauthorized" flakes in our CI environments that
rotate the serving certificate at a frequent interval.
Updated the certs_expirer controller logs to be more detailed.
Updated CA common names to be more specific (this does not update
any previously generated CAs).
Signed-off-by: Monis Khan <mok@vmware.com>
Updated Roadmap to reflect the work on Supervisor token refresh for OIDC and LDAP/AD. Also changed ordering on Multiple IDP Support as we are seeing more user interest in this feature.
At debug level:
```
upstreamoidc.go:213] "claims from ID token and userinfo"
providerName="oidc"
keys=[at_hash aud email email_verified exp iat iss sub]
```
At the "all" level:
```
upstreamoidc.go:207] "claims from ID token and userinfo"
providerName="oidc"
claims="{\"at_hash\":\"C55S-BgnHTmr2_TNf...hYmVhYWESBWxvY2Fs\"}"
```
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the kube cert agent to a middle ground behavior
that balances leader election gating with how quickly we load the
signer.
If the agent labels have not changed, we will attempt to load the
signer even if we cannot roll out the latest version of the kube
cert agent deployment.
This gives us the best behavior - we do not have controllers
fighting over the state of the deployment and we still get the
signer loaded quickly.
We will have a minute of downtime when the kube cert agent deployment
changes because the new pods will have to wait to become a leader
and for the new deployment to rollout the new pods. We would need
to have a per pod deployment if we want to avoid that downtime (but
this would come at the cost of startup time and would require
coordination with the kubelet in regards to pod readiness).
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates our certificate code to use the same 5 minute
backdate that is used by the Kubernetes controller manager. This
helps to account for clock skews between the API servers and the
kubelets that are running the pinniped pods. While this backdating
reflects a large percentage of the lifetime of our short lived
certificates (100% for the 5 minute client certificates), even a 10
minute irrevocable client certificate is within our limits. When
we move to the CSR based short lived certificates, they will always
have at least a 15 minute lifetime (5 minute backdating plus 10 minute
minimum valid duration).
Signed-off-by: Monis Khan <mok@vmware.com>
CertificatesV1beta1 was removed in Kube 1.22, so the tests cannot
blindly rely on it anymore. Use CertificatesV1 whenever the server
reports that is available, and otherwise use the old
CertificatesV1beta1.
Note that CertificatesV1 was introduced in Kube 1.19.
This commit makes the following changes to the kube cert agent tests:
1. Informers are synced on start using the controllerinit code
2. Deployment client and informer are synced per controller sync loop
3. Controller sync loop exits after two consistent errors
4. Use assert instead of require to avoid ending the test early
Signed-off-by: Monis Khan <mok@vmware.com>
This makes it so that our service selector will match exactly the
YAML we specify instead of including an extra "kapp.k14s.io/app" key.
This will take us closer to the standard kubectl behavior which is
desirable since we want to avoid future bugs that only manifest when
kapp is not used.
Signed-off-by: Monis Khan <mok@vmware.com>
This should make it easier for us to notice if something is wrong
with our service (especially in any future kubectl tests we add).
Signed-off-by: Monis Khan <mok@vmware.com>
Not required, but within the spirit of using the version number.
Since the existing kube cert agent deployment will get deleted anyway
during an upgrade, it shouldn't hurt to change the version number.
New installations will get the new version number on the new kube cert
agent deployment.
Fixes #801. The solution is complicated by the fact that the Selector
field of Deployments is immutable. It would have been easy to just
make the Selectors of the main Concierge Deployment, the Kube cert agent
Deployment, and the various Services use more specific labels, but
that would break upgrades. Instead, we make the Pod template labels and
the Service selectors more specific, because those are not immutable, and
then handle the Deployment selectors in a special way.
For the main Concierge and Supervisor Deployments, we cannot change
their selectors, so they remain "app: app_name", and we make other
changes to ensure that only the intended pods are selected. We keep the
original "app" label on those pods and remove the "app" label from the
pods of the Kube cert agent Deployment. By removing it from the Kube
cert agent pods, there is no longer any chance that they will
accidentally get selected by the main Concierge Deployment.
For the Kube cert agent Deployment, we can change the immutable selector
by deleting and recreating the Deployment. The new selector uses only
the unique label that has always been applied to the pods of that
deployment. Upon recreation, these pods no longer have the "app" label,
so they will not be selected by the main Concierge Deployment's
selector.
The selectors of all Services have been updated to use new labels to
more specifically target the intended pods. For the Concierge Services,
this will prevent them from accidentally including the Kube cert agent
pods. For the Supervisor Services, we follow the same convention just
to be consistent and to help future-proof the Supervisor app in case it
ever has a second Deployment added to it.
The selector of the auto-created impersonation proxy Service was
also previously using the "app" label. There is no change to this
Service because that label will now select the correct pods, since
the Kube cert agent pods no longer have that label. It would be possible
to update that selector to use the new more specific label, but then we
would need to invent a way to pass that label into the controller, so
it seemed like more work than was justified.
Today is my last day working full time on Pinniped (for now). This change removes me from the MAINTAINERS.md and the website.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These fields were changed as a minor hardening attempt when we switched to Distroless, but I bungled the field names and we never noticed because Kapp doesn't apply API validations.
This change fixes the field names so they act as was originally intended. We should also follow up with a change that validates all of our installation manifests in CI.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Changes made to both components:
1. Logs are always flushed on process exit
2. Informer cache sync can no longer hang process start up forever
Changes made to concierge:
1. Add pre-shutdown hook that waits for controllers to exit cleanly
2. Informer caches are synced in post-start hook
Changes made to supervisor:
1. Add shutdown code that waits for controllers to exit cleanly
2. Add shutdown code that waits for active connections to become idle
Waiting for controllers to exit cleanly is critical as this allows
the leader election logic to release the lock on exit. This reduces
the time needed for the next leader to be elected.
Signed-off-by: Monis Khan <mok@vmware.com>
The kubelet will send the SIGTERM signal when it wants a process to
exit. After a grace period, it will send the SIGKILL signal to
force the process to terminate. The concierge has always handled
both SIGINT and SIGTERM as indicators for it to gracefully exit
(i.e. stop watches, controllers, etc). This change updates the
supervisor to do the same (previously it only handled SIGINT). This
is required to allow the leader election lock release logic to run.
Otherwise it can take a few minutes for new pods to acquire the
lease since they believe it is already held.
Signed-off-by: Monis Khan <mok@vmware.com>
For clusters where the control plane nodes aren't running a CNI, the
kube-cert-agent pods deployed by the Concierge cannot be scheduled because
they don't know to use `hostNetwork: true`. This change allows embedding
the host network setting in the Concierge configuration, by copying it
from the kube-controller-manager pod spec when generating the
kube-cert-agent Deployment.
Also fixed a stray double comma in one of the nearby tests.
Instead of blindly waiting long enough for a disruptive change to
have been observed by the old leader and followers, we instead rely
on the approximation that checkOnlyLeaderCanWrite provides - i.e.
only a single actor believes they are the leader. This does not
account for clients that were in the followers list before and after
the disruptive change, but it serves as a reasonable approximation.
Signed-off-by: Monis Khan <mok@vmware.com>
Those images that are pulled from Docker Hub will cause pull failures
on some test clusters due to Docker Hub rate limiting.
Because we already have some images that we use for testing, and
because those images are already pre-loaded onto our CI clusters
to make the tests faster, use one of those images and always specify
PullIfNotPresent to avoid pulling the image again during the integration
test.
OpenShift has good defaults for these duration fields that we can
use instead of coming up with them ourselves:
e14e06ba8d/pkg/config/leaderelection/leaderelection.go (L87-L109)
Copied here for easy future reference:
```
// We want to be able to tolerate 60s of kube-apiserver disruption without causing pod restarts.
// We want the graceful lease re-acquisition fairly quick to avoid waits on new deployments and other rollouts.
// We want a single set of guidance for nearly every lease in openshift. If you're special, we'll let you know.
// 1. clock skew tolerance is leaseDuration-renewDeadline == 30s
// 2. kube-apiserver downtime tolerance is == 78s
//      lastRetry=floor(renewDeadline/retryPeriod)*retryPeriod == 104
//      downtimeTolerance = lastRetry-retryPeriod == 78s
// 3. worst non-graceful lease acquisition is leaseDuration+retryPeriod == 163s
// 4. worst graceful lease acquisition is retryPeriod == 26s
if ret.LeaseDuration.Duration == 0 {
	ret.LeaseDuration.Duration = 137 * time.Second
}
if ret.RenewDeadline.Duration == 0 {
	// this gives 107/26=4 retries and allows for 137-107=30 seconds of clock skew
	// if the kube-apiserver is unavailable for 60s starting just before t=26 (the first renew),
	// then we will retry on 26s intervals until t=104 (kube-apiserver came back up at 86), and there will
	// be 33 seconds of extra time before the lease is lost.
	ret.RenewDeadline.Duration = 107 * time.Second
}
if ret.RetryPeriod.Duration == 0 {
	ret.RetryPeriod.Duration = 26 * time.Second
}
```
Signed-off-by: Monis Khan <mok@vmware.com>
This change fixes a small race condition that occurred when the
current leader failed to renew its lease. Before this change, the
leader would first release the lease via the Kube API and then would
update its in-memory status to reflect that change. Now those
events occur in the reverse (i.e. correct) order.
Signed-off-by: Monis Khan <mok@vmware.com>
Even though a client may hold the leader election lock in the Kube
lease API, that does not mean it has had a chance to update its
internal state to reflect that. Thus we retry the checks in
checkOnlyLeaderCanWrite a few times to allow the client to catch up.
Signed-off-by: Monis Khan <mok@vmware.com>
We no longer have a transitive dependency on this older repository, so we don't need the replace directive anymore.
There is a new fork of this that we should move to (https://github.com/golang-jwt/jwt), but we can't easily do that until a couple of our direct dependencies upgrade.
This is a revert of d162cb9adf.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Change list of attributeParsingOverrides to a map
- Add unit test for sAMAccountName as group name without the override
- Change some comments in the type definition.
The Kube API server code that we use will cast inputs in an attempt
to see if they implement optional interfaces. This change adds a
simple wrapper struct to prevent such casts from causing us any
issues.
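A minimal sketch of the wrapper-struct idea, using io.Reader rather than the actual Kube types for illustration:
```
package example

import "io"

// onlyReader embeds the interface in a struct so that the concrete value's
// optional interfaces (e.g. io.WriterTo) are hidden from type assertions.
type onlyReader struct{ io.Reader }

// blockOptionalCasts wraps r so that callers of the returned value cannot
// successfully assert it to io.WriterTo, even when the wrapped value
// implements it.
func blockOptionalCasts(r io.Reader) io.Reader {
	return onlyReader{r}
}
```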
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the default NO_PROXY for the supervisor to not
proxy requests to the Kubernetes API and other Kubernetes endpoints
such as Kubernetes services.
It also adds https_proxy and no_proxy settings for the concierge
with the same default.
Signed-off-by: Monis Khan <mok@vmware.com>
In the upstream dynamiccertificates package, we rely on two pieces
of code:
1. DynamicServingCertificateController.newTLSContent which calls
- clientCA.CurrentCABundleContent
- servingCert.CurrentCertKeyContent
2. unionCAContent.VerifyOptions which calls
- unionCAContent.CurrentCABundleContent
This results in calls to our tlsServingCertDynamicCertProvider and
impersonationSigningCertProvider. If we Unset these providers, we
subtly break these consumers. At best this results in test slowness
and flakes while we wait for reconcile loops to converge. At worst,
it results in actual errors during runtime. For example, we
previously would Unset the impersonationSigningCertProvider on any
sync loop error (even a transient one caused by a network blip or
a conflict between writes from different replicas of the concierge).
This would cause us to transiently fail to issue new certificates
from the token credential request API. It would also cause us to
transiently fail to authenticate previously issued client certs
(which results in occasional Unauthorized errors in CI).
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the pinniped CLI entrypoint to prevent browser
processes that we spawn from polluting our std out stream.
For example, chrome will print the following message to std out:
Opening in existing browser session.
Which leads to the following incomprehensible error message from
kubectl:
Unable to connect to the server: getting credentials:
decoding stdout: couldn't get version/kind; json parse error:
json: cannot unmarshal string into Go value of type struct
{ APIVersion string "json:\"apiVersion,omitempty\"";
Kind string "json:\"kind,omitempty\"" }
This would only occur on the initial login when we opened the
browser. Since credentials would be cached afterwards, kubectl
would work as expected for future invocations as no browser was
opened.
I could not think of a good way to actually test this change. There
is a clear gap in our integration tests - we never actually launch a
browser in the exact same way a user does - we instead open a chrome
driver at the login URL as a subprocess of the integration test
binary and not the pinniped CLI. Thus even if the chrome driver was
writing to std out, we would not notice any issues.
It is also unclear if there is a good way to prevent future related
bugs since std out is global to the process.
Signed-off-by: Monis Khan <mok@vmware.com>
After merging the new Kube 1.22 ExecCredential changes from main into
this feature branch, some of the new unit tests on this feature branch
needed to be updated to account for the new ExecCredential "interactive"
field.
- Add `AllowPasswordGrant` boolean field to OIDCIdentityProvider's spec
- The oidc upstream watcher controller copies the value of
`AllowPasswordGrant` into the configuration of the cached provider
- Add password grant to the UpstreamOIDCIdentityProviderI interface
which is implemented by the cached provider instance for use in the
authorization endpoint
- Enhance the IDP discovery endpoint to return the supported "flows"
for each IDP ("cli_password" and/or "browser_authcode")
- Enhance `pinniped get kubeconfig` to help the user choose the desired
flow for the selected IDP, and to write the flow into the resulting
kubeconfig
- Enhance `pinniped login oidc` to have a flow flag to tell it which
client-side flow it should use for auth (CLI-based or browser-based)
- In the Dex config, allow the resource owner password grant, which Dex
implements to also return ID tokens, for use in integration tests
- Enhance the authorize endpoint to perform password grant when
requested by the incoming headers. This commit does not include unit
tests for the enhancements to the authorize endpoint, which will come
in the next commit
- Extract some shared helpers from the callback endpoint to share the
code with the authorize endpoint
- Add new integration tests
At a high level, it switches us to a distroless base container image, but that also includes several related bits:
- Add a writable /tmp but make the rest of our filesystems read-only at runtime.
- Condense our main server binaries into a single pinniped-server binary. This saves a bunch of space in
the image due to duplicated library code. The correct behavior is dispatched based on `os.Args[0]`, and
the `pinniped-server` binary is symlinked to `pinniped-concierge` and `pinniped-supervisor`.
- Strip debug symbols from our binaries. These aren't really useful in a distroless image anyway and all the
normal stuff you'd expect to work, such as stack traces, still does.
- Add a separate `pinniped-concierge-kube-cert-agent` binary with "sleep" and "print" functionality instead of
using builtin /bin/sleep and /bin/cat for the kube-cert-agent. This is split from the main server binary
because the loading/init time of the main server binary was too large for the tiny resource footprint we
established in our kube-cert-agent PodSpec. Using a separate binary eliminates this issue and the extra
binary adds only around 1.5MiB of image size.
- Switch the kube-cert-agent code to use a JSON `{"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}` format
(see the sketch after this list). This is more robust to unexpected input formatting than the old code,
which simply concatenated the files with some extra newlines and split on whitespace.
- Update integration tests that made now-invalid assumptions about the `pinniped-server` image.
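As referenced above, a minimal sketch of that JSON credential format (function and variable names assumed, not the actual agent code):
```
package example

import (
	"encoding/base64"
	"encoding/json"
)

// marshalAgentOutput renders the kube-cert-agent's output format:
// {"tls.crt": "<b64 cert>", "tls.key": "<b64 key>"}.
func marshalAgentOutput(certPEM, keyPEM []byte) ([]byte, error) {
	return json.Marshal(map[string]string{
		"tls.crt": base64.StdEncoding.EncodeToString(certPEM),
		"tls.key": base64.StdEncoding.EncodeToString(keyPEM),
	})
}
```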
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We accidentally missed this in the v0.10.0 release process. The new YAML field here should make it easier to automate this step, which seems like a really good idea.
This may be a temporary fix. It switches the manual auth code prompt to use `promptForValue()` instead of `promptForSecret()`. The `promptForSecret()` function no longer supports cancellation (the v0.9.2 behavior) and the method of cancelling in `promptForValue()` is now based on running the blocking read in a background goroutine, which is allowed to block forever or leak (which is not important for our CLI use case).
This means that the authorization code is now visible in the user's terminal, but this is really not a big deal because of PKCE and the limited lifetime of an auth code.
The main goroutine now correctly waits for the "manual prompt" goroutine to clean up, which now includes printing the extra newline that would normally have been entered by the user in the manual flow.
The text of the manual login prompt is updated to be more concise and less scary (don't use the word "fail").
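A minimal sketch of this pattern (function shape assumed, not the exact CLI code):
```
package example

import (
	"bufio"
	"context"
	"fmt"
	"os"
)

// promptForValue reads a line from stdin in a background goroutine so the
// read can be abandoned on context cancellation. The goroutine may leak if
// the user never types anything, which is acceptable for a short-lived CLI.
func promptForValue(ctx context.Context, promptLabel string) (string, error) {
	_, _ = fmt.Fprint(os.Stderr, promptLabel)

	type readResult struct {
		text string
		err  error
	}
	// Buffered so an abandoned read can still complete its send later
	// without blocking forever.
	results := make(chan readResult, 1)

	go func() {
		scanner := bufio.NewScanner(os.Stdin)
		scanner.Scan()
		results <- readResult{scanner.Text(), scanner.Err()}
	}()

	select {
	case r := <-results:
		return r.text, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}
```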
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Remove all the "latest" links and replace them with our new shortcode so they point at the latest release in a more explicit way.
This also eliminates one of the sections in our Concierge and Supervisor install guides, since you're always installing a specific version.
- Provide instructions for installing with both kapp (one step) and kubectl (two steps for the Concierge).
- Minor wording changes. Mainly we are now a bit less verbose about reminding people they can choose a different version (once per page instead of in each step).
- When we give an example `kapp deploy` command, don't suggest `--yes` and `--diff-changes`.
Users can still use these but it seems overly verbose for an example command.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test is asynchronously waiting for the controller to do something, and in some of our test environments it will take a bit longer than we'd previously allowed.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Prior to this fix, this controller did not correctly react to changes to the ClusterIP service. It would still eventually react with a long delay due to our 5 minute resync interval.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change fixes a race that can occur because we have multiple
writers with no leader election lock.
1. TestAPIServingCertificateAutoCreationAndRotation/automatic
expires the current serving certificate
2. CertsExpirerController 1 deletes expired serving certificate
3. CertsExpirerController 2 starts deletion of expired serving
certificate but has not done so yet
4. CertsManagerController 1 creates new serving certificate
5. TestAPIServingCertificateAutoCreationAndRotation/automatic
records the new serving certificate
6. CertsExpirerController 2 finishes deletion, and thus deletes the
newly created serving certificate instead of the old one
7. CertsManagerController 2 creates new serving certificate
8. TestAPIServingCertificateAutoCreationAndRotation/automatic keeps
running and eventually times out because it is expecting the
serving certificate created by CertsManagerController 2 to match
the value it recorded from CertsManagerController 1 (which will
never happen since that certificate was incorrectly deleted).
Signed-off-by: Monis Khan <mok@vmware.com>
This functioned fine, but did not have the intended visual appearance when it came to how the text of the auth code wrapped inside the copy button in the manual flow.
The new styling behaves correctly on at least Chrome, Firefox, and Safari on macOS.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
It turns out that `syscall.Stdin` is of type `int` on Linux and macOS, but not on Windows (it's `syscall.Handle`). This should now be portable and do all the required type casting on every platform.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Add Dex to the prerequisites and add a note that, to query for the groups
scope, the user must set the organizations Dex should search against.
Otherwise the groups claim would be empty. This is because of the format
in which group claims are represented, i.e. "org:team".
Signed-off-by: Radoslav Dimitrov <dimitrovr@vmware.com>
For CLI-based auth, such as with LDAP upstream identity providers, the
user may use these environment variables to avoid getting interactively
prompted for username and password.
The following guide describes the process of configuring the Supervisor
with Dex and identifying users through their GitHub account. Issue #415
Signed-off-by: Radoslav Dimitrov <dimitrovr@vmware.com>
This will hopefully fix some flakes where AWS provisioned a host for the
load balancer but the tests weren't able to resolve it.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
TestAgentController really runs the controller and evaluates multiple
calls to the controller's Sync with real informers caching updates.
There is a large amount of non-determinism in this unit test, and it
does not always behave the same way. Because it makes assertions about
the specific errors that should be returned by Sync, it was not
accounting for some errors that are only returned by Sync once in a
while depending on the exact (unpredictable) order of operations.
This commit doesn't fix the non-determinism in the test, but rather
tries to work around it by also allowing other (undesired but
inevitable) error messages to appear in the list of actual error
messages returned by the calls to the Sync function.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
Previously, the ytt install docs suggested that you use ytt templates
from the HEAD of main with the container image from the latest public
release, which could result in a mismatch.
It seems like page.ClearCookies() only clears cookies for the current
domain, so there doesn't seem to be a function to clear all browser
cookies. Instead, we'll just start a whole new browser each test.
They start fast enough that it shouldn't be a problem.
Our actual CLI code behaved correctly, but this test made some invalid assumptions about the "upstream" IDP we're testing. It assumed that the upstream didn't support `response_mode=form_post`, but Okta does. This means that when we end up on the localhost callback page, there are no URL query parameters.
Adjusting this regex makes the test pass as expected.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I found that there are some situations with `response_mode=form_post` where Chrome will open additional speculative TCP connections. These connections will be idle so they block server shutdown until the (previously 5s) timeout. Lowering this to 500ms should be safe and makes any added latency at login much less noticeable.
More information about Chrome's TCP-level behavior here: https://bugs.chromium.org/p/chromium/issues/detail?id=116982#c5
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Using the same fake TTY trick we used to test LDAP login, this new subtest runs through the "manual"/"jump box" login flow. It runs the login with a `--skip-listen` flag set, causing the CLI to skip opening the localhost listener. We can then wait for the login URL to be printed, visit it with the browser and log in, and finally simulate "manually" copying the auth code from the browser and entering it into the waiting CLI prompt.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This flag is (for now) meant only to facilitate end-to-end testing, allowing us to force the "manual" login flow. If it ends up being useful we can un-hide it, but this seemed like the safest option to start with.
There is also a corresponding `--oidc-skip-listen` on the `pinniped get kubeconfig` command.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
For some reason our headless Chrome test setup behaves slightly differently on Linux and macOS hosts. On Linux, the emoji characters are not recognized as valid text, so they are URL encoded. This change updates the test to cope with both cases correctly.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This adds a new login flow that allows manually pasting the authorization code instead of receiving a browser-based callback.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a more restrictive library interface that more closely matches the use cases of our new form_post login flow.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This allows the prompts to be cancelled, which we need to be able to do in the case where we prompt for a manually-pasted auth code but the automatic callback succeeds.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a new package internal/oidc/provider/formposthtml containing a number of static files embedded using the relatively recent Go "//go:embed" functionality introduced in Go 1.16 (https://blog.golang.org/go1.16).
The Javascript and CSS files are minified and injected to make a single self-contained HTML response. There is a special Content-Security-Policy helper to calculate hash-based script-src and style-src rules.
This new code is covered by a new integration test that exercises the JS/HTML functionality in a real browser outside of the rest of the Supervisor.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
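For readers unfamiliar with the embedding pattern, a minimal sketch (file and function names are illustrative, not the actual package contents; the build requires the embedded files to exist next to the source file):
```
package formposthtml

import (
	"crypto/sha256"
	"encoding/base64"
	_ "embed" // required for //go:embed on plain string variables
	"fmt"
)

//go:embed form_post.js
var js string

//go:embed form_post.css
var css string

// cspHash computes a CSP hash-source expression for an inline script or
// style, e.g. 'sha256-AbC...='.
func cspHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return fmt.Sprintf("'sha256-%s'", base64.StdEncoding.EncodeToString(sum[:]))
}

// ContentSecurityPolicy allows only the embedded inline assets to load.
func ContentSecurityPolicy() string {
	return fmt.Sprintf("default-src 'none'; script-src %s; style-src %s",
		cspHash(js), cspHash(css))
}
```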
Our Supervisor callback handler now needs to load JS and CSS from the provider endpoint, and this JS needs to make a `fetch()` call across origins (to post the form to the CLI callback). This requires a custom Content-Security-Policy compared to other pages we render.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test would occasionally flake for me when running locally. This change moves more of the assertions into the "eventually" loop, so they can temporarily fail as long as they converge on the expected values.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test did not tolerate this connection failing, which can happen for any number of flaky networking-related reasons. This change moves the connection setup into an "eventually" retry loop so it's allowed to fail temporarily as long as it eventually connects.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This CredentialIssuer field is called `spec.impersonationProxy.service.type`, not `spec.impersonationProxy.service.mode`.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
TestSimultaneousLDAPRequestsOnSingleProvider proved to be unreliable
on AKS due to some kind of kubectl port-forward issue, so only
run the LDAP client's integration tests on Kind. They are testing
the integration between the client code and the OpenLDAP test server,
not testing anything about Kubernetes, so running only on Kind should
give us sufficient test coverage.
After noticing that the upstream OIDC discovery calls can hang
indefinitely, I had tried to impose a one minute timeout on them
by giving them a timeout context. However, I hadn't noticed that the
context also gets passed into the JWKS fetching object, which gets
added to our cache and used later. Therefore the timeout context
was added to the cache and timed out while sitting in the cache,
causing later JWKS fetchers to fail.
This commit is trying again to impose a reasonable timeout on these
discovery and JWKS calls, but this time by using http.Client's Timeout
field, which is documented to be a timeout for *each* request/response
cycle, so hopefully this is a more appropriate way to impose a timeout
for this use case. The http.Client instance ends up in the cache on
the JWKS fetcher object, so the timeout should apply to each JWKS
request as well.
Requests that can hang forever are effectively a server-side resource
leak, which could theoretically be taken advantage of in a denial of
service attempt, so it would be nice to avoid having them.
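The difference, sketched with the coreos/go-oidc client (assuming that library; the one-minute value is the timeout described above, the rest is illustrative):
```
package upstreamoidc

import (
	"context"
	"net/http"
	"time"

	"github.com/coreos/go-oidc/v3/oidc"
)

// newProvider performs discovery with a per-request timeout. Unlike a
// context deadline, http.Client.Timeout applies to each request/response
// cycle, so this client can safely live in a cache and be reused by the
// JWKS fetcher later.
func newProvider(issuer string) (*oidc.Provider, error) {
	httpClient := &http.Client{Timeout: time.Minute}
	// go-oidc pulls the *http.Client out of the context for discovery and
	// for the JWKS requests it makes afterward.
	ctx := oidc.ClientContext(context.Background(), httpClient)
	return oidc.NewProvider(ctx, issuer)
}
```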
- Add new optional ytt params for the Supervisor deployment.
- When the Supervisor is making calls to an upstream OIDC provider,
use these variables if they were provided.
- These settings are integration tested in the main CI pipeline by
sometimes setting them on deployments in certain cases, and then
letting the existing integration tests (e.g. TestE2EFullIntegration)
provide the coverage, so there are no explicit changes to the
integration tests themselves in this commit.
- this allows the OIDC upstream watcher to honor the
HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables
Co-authored-by: Christian Ang <angc@vmware.com>
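The standard library provides this behavior via `http.ProxyFromEnvironment`; a hedged sketch of the wiring:
```
package upstreamwatcher

import (
	"net/http"
	"time"
)

// newProxyAwareClient returns a client whose transport honors the
// HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables.
func newProxyAwareClient() *http.Client {
	return &http.Client{
		Timeout: time.Minute,
		Transport: &http.Transport{
			Proxy: http.ProxyFromEnvironment,
		},
	}
}
```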
We want the value of time.Now() to be calculated before the call to
IssueClientCertPEM to prevent the ExpirationTimestamp from being
later than the notAfter timestamp on the issued certificate.
Signed-off-by: Monis Khan <mok@vmware.com>
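A sketch of the ordering fix; `issueFunc` stands in for the real certificate-issuing call, whose exact signature is not reproduced here:
```
package issuing

import "time"

// issueFunc stands in for the real cert-issuing call.
type issueFunc func(lifetime time.Duration) (certPEM, keyPEM []byte, err error)

// issueWithSafeExpiry captures time.Now() *before* issuing, so the
// reported expiration can never be later than the cert's notAfter, which
// the issuer derives from a strictly later clock reading.
func issueWithSafeExpiry(issue issueFunc, lifetime time.Duration) ([]byte, []byte, time.Time, error) {
	expires := time.Now().Add(lifetime) // must happen before the issue call
	certPEM, keyPEM, err := issue(lifetime)
	return certPEM, keyPEM, expires, err
}
```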
This fixes some rare test flakes caused by a data race inherent in the way we use `assert.Eventually()` with extra variables for followup assertions. This function is tricky to use correctly because it runs the passed function in a separate goroutine, and you have no guarantee that any shared variables are in a coherent state when the `assert.Eventually()` call returns. Even if you add manual mutexes, it's tricky to get the semantics right. This has been a recurring pain point and the cause of several test flakes.
This change introduces a new `library.RequireEventually()` that works by internally constructing a per-loop `*require.Assertions` and running everything on a single goroutine (using `wait.PollImmediate()`). This makes it very easy to write eventual assertions.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
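A sketch of how such a helper can work; the panic/recover plumbing below is illustrative rather than the library's exact implementation:
```
package testlib

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"
	"k8s.io/apimachinery/pkg/util/wait"
)

// loopT records assertion failures for a single polling iteration.
// FailNow panics with the receiver so the iteration unwinds immediately.
type loopT struct{ failed bool }

func (l *loopT) Errorf(string, ...interface{}) { l.failed = true }
func (l *loopT) FailNow()                      { l.failed = true; panic(l) }

// RequireEventually polls f on the *current* goroutine until it completes
// without failed assertions, avoiding the data races of assert.Eventually.
func RequireEventually(t *testing.T, f func(requireEventually *require.Assertions), waitFor, tick time.Duration) {
	t.Helper()
	err := wait.PollImmediate(tick, waitFor, func() (bool, error) {
		l := &loopT{}
		func() {
			defer func() {
				// Swallow only our own sentinel panic from FailNow.
				if r := recover(); r != nil && r != l {
					panic(r)
				}
			}()
			f(require.New(l)) // fresh per-loop Assertions, single goroutine
		}()
		return !l.failed, nil
	})
	require.NoError(t, err, "condition was not eventually satisfied")
}
```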
Before this change, we used the `fosite.DefaultOpenIDConnectClient{}` struct, which implements the `fosite.Client` and `fosite.OpenIDConnectClient` interfaces. For a future change, we also need to implement some additional optional interfaces, so we can no longer use the provided default types. Instead, we now use a custom `clientregistry.Client{}` struct, which implements all the requisite interfaces and can be extended to handle the new functionality (in a future change).
There is also a new `clientregistry.StaticRegistry{}` struct, which implements the `fosite.ClientManager` and looks up our single static client. We could potentially extend this in the future with a registry backed by Kubernetes API, for example.
This should be a 100% refactor, with no user-observable change.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
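The shape of that refactor, sketched (method sets trimmed; depending on the fosite version, `fosite.ClientManager` requires additional JWT-assertion methods not shown here):
```
package clientregistry

import (
	"context"

	"github.com/ory/fosite"
)

// Client wraps fosite's default OIDC client so that extra optional
// fosite interfaces can be implemented on it later.
type Client struct {
	fosite.DefaultOpenIDConnectClient
}

// StaticRegistry looks up the single static CLI client. It could later be
// replaced by a registry backed by the Kubernetes API.
type StaticRegistry struct {
	PinnipedCLI Client
}

func (r *StaticRegistry) GetClient(_ context.Context, id string) (fosite.Client, error) {
	if id == r.PinnipedCLI.GetID() {
		return &r.PinnipedCLI, nil
	}
	return nil, fosite.ErrNotFound
}
```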
This change updates the impersonator to always authorize every
request instead of relying on the Kubernetes API server to perform
the check on the impersonated request. This protects us from
scenarios where we fail to correctly impersonate the user due to
some bug in our proxy logic. We still rely completely on the API
server to perform admission checks on the impersonated requests.
Signed-off-by: Monis Khan <mok@vmware.com>
This change updates the impersonation proxy code to run as a
distinct service account that only has permission to impersonate
identities. Thus any future vulnerability that causes the
impersonation headers to be dropped will fail closed instead of
escalating to the concierge's default service account which has
significantly more permissions.
Signed-off-by: Monis Khan <mok@vmware.com>
WithImpersonation already deletes impersonation headers and has done
so since the early days:
https://github.com/kubernetes/kubernetes/pull/36769
ensureNoImpersonationHeaders will still reject any request that has
impersonation headers set so we will always fail closed.
Signed-off-by: Monis Khan <mok@vmware.com>
When anonymous authentication is disabled, the impersonation proxy
will no longer authenticate anonymous requests other than calls to
the token credential request API (this API is used to retrieve
credentials and thus must be accessed anonymously).
Signed-off-by: Benjamin A. Petersen <ben@benjaminapetersen.me>
Signed-off-by: Monis Khan <mok@vmware.com>
There was nothing to guarantee that _all_ Supervisor pods would be ready to handle this request. We saw a rare test flake where the LDAPIdentityProvider was marked as ready but one of the Supervisor pods didn't have it loaded yet and returned an HTTP 422 error (`Unprocessable Entity: No upstream providers are configured`).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The `require.Eventually()` function runs the body of the check in a separate goroutine, so it's not safe to use other `require` assertions as we did here. Our `library.RequireEventuallyWithoutError()` function does not spawn a goroutine, so it's safer to use here.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
When a CredentialIssuer is switched from one service type to another (or switched to disabled mode), the `impersonatorconfig` controller will delete the previous Service, if any. Normally one Concierge pod will succeed in deleting it initially, and any other pods will see a NotFound error.
Before this change, the NotFound would bubble up and cause the strategy to enter an ErrorDuringSetup status until the next reconcile loop. We now handle this case without reporting an error.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This typo wasn't caught in testing because 1) the Kubernetes API ignores the unknown field and 2) the `type` field defaults to `LoadBalancer` anyway, so things behave as expected.
Even though this doesn't cause any large problems, it's quite confusing.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The LastUpdateTime is no longer updated on every resync. It only changes if the underlying status has changed, so that it effectively shows when the transition happened.
This change happened in ab750f48aa, but we missed this test. It only fails when it has been more than ten minutes since the CredentialIssuer transitioned into a healthy state, but that can happen in our long-running CI environments.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test felt overly complex and some of the cleanup logic wasn't 100% correct (it didn't clean up in all cases).
The new code is essentially the same flow but hopefully easier to read.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We had this one test that mutated the CredentialIssuer, which could cause the impersonation proxy to blip on one or both of the running concierge pods. This would sometimes break other concurrently running tests.
Instead, this bit of code is split into a separate non-concurrent test.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This required a weird hack because some of the Fosite tests (or a transitive dependency of them) depend on a newer version of gRPC that's incompatible with the Kubernetes runtime version we use. It wasn't as simple as just replacing the gRPC module with an older version, because in the latest versions of gRPC, they split out the "examples" packages into their own module. This new module name doesn't exist at the old version.
Ultimately, the workaround was to make a fake "examples" module locally. This module can be empty because we never actually depend on that code (it's only used in transitive dependency tests).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These are tricky because a real load balancer controller (e.g., on GKE) will overwrite and set NodePort, so we can't blindly set the desired state of these fields.
For now, we will just skip reconciling these. In the future, we could be more clever about merging them together with the current state.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
If the only thing that has changed about a strategy is the LastUpdated timestamp, then we should not update the object.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This is to allow the use of binary LDAP entry attributes as the UID.
For example, a user might like to configure AD’s objectGUID or maybe
objectSid attributes as the UID attribute.
This negatively impacts the readability of the UID when it did not come
from a binary value, but we're considering this an okay trade-off to
keep things simple for now. In the future, we may offer more
customizable encoding options for binary attributes.
These UIDs are currently only used in the downstream OIDC `sub` claim.
They do not affect the user's identity on the Kubernetes cluster,
which is only based on their mapped username and group memberships from
the upstream identity provider. We are not currently supporting any
special encoding for those username and group name LDAP attributes, so
their values in the LDAP entry must be ASCII or UTF-8 in order for them
to be interpreted correctly.
This test setup should tolerate when the TokenCredentialRequest API isn't quite ready to authenticate the user or issue a cert.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This check is no longer valid, because there can be ephemeral, recoverable errors that show as ErrorDuringSetup.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This updates the code to use a different mechanism for driving desired state:
- Read existing object
- If it does not exist, create desired object
- If it does exist, make a copy and set all the desired fields
- Do a deepequal to see if an update is necessary.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
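In client-go terms, that flow looks roughly like the following sketch, using a Service as a stand-in resource:
```
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureService drives a Service toward the desired state: create it when
// missing, otherwise copy-and-compare before issuing an update.
func ensureService(ctx context.Context, client kubernetes.Interface, desired *corev1.Service) error {
	existing, err := client.CoreV1().Services(desired.Namespace).Get(ctx, desired.Name, metav1.GetOptions{})
	if k8serrors.IsNotFound(err) {
		_, err = client.CoreV1().Services(desired.Namespace).Create(ctx, desired, metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return err
	}
	updated := existing.DeepCopy()
	updated.Spec = desired.Spec
	updated.Labels = desired.Labels
	if apiequality.Semantic.DeepEqual(existing, updated) {
		return nil // nothing changed; skip the no-op update
	}
	_, err = client.CoreV1().Services(desired.Namespace).Update(ctx, updated, metav1.UpdateOptions{})
	return err
}
```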
We also no longer need an initial event, since we don't do anything unless the CredentialIssuer exists, so we'll always be triggered at the appropriate time.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This type of field appears in more than one of our APIs, so this package will provide a single source of truth for validating and parsing inputs.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The documentation was a bit confusing before, and it was easy to accidentally install a very outdated version if you weren't reading carefully.
We could consider writing a post-release CI job to update these references automatically (perhaps using a Hugo macro?), but for now a manual update seems sufficient.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Automatically try to fall back to using StartTLS when using TLS
doesn't work. Only complain when both don't work.
- Remember (in-memory) which one worked and keep using that one
in the future (unless the pod restarts).
- This enhances our LDAP client code to make it possible to optionally
dial an LDAP server without TLS and then use StartTLS to upgrade
the connection to TLS.
- The controller for LDAPIdentityProviders is not using this option
yet. That will come in a future commit.
Previously, our controllers would automatically create a CredentialIssuer with a singleton name. The helpers we had for this also used "raw" client access and did not take advantage of the informer cache pattern.
With this change, the CredentialIssuer is always created at install time in the ytt YAML. The controllers now only update the existing CredentialIssuer status, and they do so using the informer cache as much as possible.
This change is targeted at only the kubecertagent controller to start. The impersonatorconfig controller will be updated in a following PR along with other changes.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- For testing purposes, we would like to ensure that when we connect
to the LDAP server we cannot accidentally avoid using TLS or StartTLS.
- Also enabled the openldap `memberOf` overlay in case we want to
support group search using `memberOf` in the future.
- This required changes to the docker.io/bitnami/openldap container
image, so we're using our own fork for now. Will submit a PR to
bitnami/openldap to see if they will accept it (or something similar)
upstream.
Reflect the upstream group membership into the Supervisor's
downstream tokens, so they can be added to the user's
identity on the workload clusters.
LDAP group search is configurable on the
LDAPIdentityProvider resource.
- Note that this adds an extra check of the response, which is that
the issuer string in the response must match the issuer of the requested
URL.
- Some of the error messages also changed to match the errors provided
by oidc.NewProvider
See RFC6648 which asks that people stop using `X-` on header names.
Also Matt preferred not mentioning "IDP" in the header name.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change makes it easier to understand misconfigurations caused
by issuers with extraneous trailing slashes.
Signed-off-by: Mo Khan <mok@vmware.com>
The admin kubeconfigs we have on EKS clusters are a bit different from others, because there is no certificate/key (EKS does not use certificate auth).
This code didn't quite work correctly in that case. The fix is to allow the case where `tlsConfig.GetClientCertificate` is non-nil, but returns a value with no certificates.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
to avoid garbage collection breaking the refresh flow
Also changed the access token lifetime to be 2 minutes instead of 15
since we now have cert caching.
- Use `nickname` claim as an example, which means we only need the `openid` scope.
This is also more stable since emails can change over time.
- Put the OIDCIdentityProvider and Secret into one YAML blob, since they will likely be copy-pasted together anyway.
- Add a separate section for using alternate claims.
- Add a separate section for using a private GitLab instance.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Some minor edits I came across while reviewing this:
- Capitalize "GitLab" the way they do.
- Use `{{< ref "xyz" >}}` references when linking internally. The advantage of these is that they're "type checked" by Hugo when the site is rendered, so we'll know if we ever break one.
- Add links to the GitLab docs about creating an OAuth client. These also cover adding a group-level or instance-wide application.
- Re-wrap the YAML lines to fit a bit more naturally.
- Add a `namespace` to the YAML examples, so they're more likely to work without tweaks.
- Use "gitlab" instead of "my-oidc-identity-provider" as the example name, for clarity.
- Re-word a few small bits. These are 100% subjective but hopefully an improvement?
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The supervisor treats all events the same, hence it must use a
singleton queue.
Updated the integration test to remove the data race caused by
calling methods on testing.T outside of the main test goroutine.
Signed-off-by: Monis Khan <mok@vmware.com>
Followup on the previous comment to split apart the ServiceAccount of the kube-cert-agent and the main concierge pod. This is a bit cleaner and ensures that in testing our main Concierge pod never requires any privileged permissions.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Since 0dfb3e95c5, we no longer directly create the kube-cert-agent Pod, so our "use"
permission on PodSecurityPolicies no longer has the intended effect. Since the deployments controller is now the
one creating pods for us, we need to get the permission on the PodSpec of the target pod instead, which we do somewhat
simply by using the same service account as the main Concierge pods.
We still set `automountServiceAccountToken: false`, so this should not actually give any useful permissions to the
agent pod when running.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- And perform auto-discovery when the flags are not set
- Several TODOs remain which will be addressed in the next commit
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This change updates the impersonator logic to pass through requests
that authenticated via a bearer token that asserts a UID. This
allows us to support service account tokens (as well as any other
form of token based authentication).
Signed-off-by: Monis Khan <mok@vmware.com>
- The Linux base64 command is different, so avoid using it at all.
On Linux the default is to split the output into multiple lines,
which messes up the integration-test-env file. The flag used to
disable this behavior on Linux ("-w0") does not exist on macOS's
base64.
- On Debian Linux, the latest version of Docker from apt-get still
requires DOCKER_BUILDKIT=1 or else it barfs.
This controller is responsible for cleaning up kube-cert-agent pods that were deployed by previous versions.
They are easily identified because they use a different `kube-cert-agent.pinniped.dev` label compared to the new agent pods (`true` vs. `v2`).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a relatively large rewrite of much of the kube-cert-agent controllers. Instead of managing raw Pod objects, they now create a single Deployment and let the builtin k8s controller handle it from there.
This reduces the amount of code we need and should handle a number of edge cases better, especially those where a Pod becomes "wedged" and needs to be recreated.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Make PINNIPED_TEST_LDAP_LDAPS_CA_BUNDLE optional for integration tests
- When there is no CA bundle provided, be careful to use nil instead of
an empty bundle, because nil means to use the OS defaults
Now that we have the fix from https://github.com/kubernetes/kubernetes/pull/97693, we no longer need these sleeps.
The underlying authenticator initialization is still asynchronous, but should happen within a few milliseconds.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change updates the impersonator logic to use the delegated
authorizer for all non-rest verbs such as impersonate. This allows
it to correctly perform authorization checks for incoming requests
that set impersonation headers while not performing unnecessary
checks that are already handled by KAS.
The audit layer is enabled to track the original user who made the
request. This information is then included in a reserved extra
field original-user-info.impersonation-proxy.concierge.pinniped.dev
as a JSON blob.
Signed-off-by: Monis Khan <mok@vmware.com>
$PINNIPED_TEST_SUPERVISOR_UPSTREAM_OIDC_ISSUER_CA_BUNDLE was recently
changed to be a base64 encoded value, so this script does not need to
base64 encode the value itself anymore.
- Also some light prefactoring in login.go to make room for LDAP-style
login, which is not implemented yet in this commit. TODOs are added.
- And fix a test pollution problem in login_oidc_test.go where it was
using a real on-disk CLI cache file, so the tests were polluted by
the contents of that file and would sometimes cause each other to
fail.
Also force the LDAP server pod to restart whenever the LDIF file
changes, so whenever you redeploy the tools deployment with a new test
user password the server will be updated.
Unfortunately, Secrets do not seem to have a Generation field, so we
use the ResourceVersion field instead. This means that any change to
the Secret will cause us to retry the connection to the LDAP server,
even if the username and password fields in the Secret were not
changed. Seems like an okay trade-off for this early draft of the
controller compared to a more complex implementation.
This early version of the controller is not intended to act as an
ongoing health check for your upstream LDAP server. It will connect
to the LDAP server to essentially "lint" your configuration once.
It will do it again only when you change your configuration. To account
for transient errors, it will keep trying to connect to the server
until it succeeds once.
This commit does not include looking for changes in the associated bind
user username/password Secret.
Avoid them because they can't be used in GoLand for running integration
tests in the UI, like running in the debugger.
Also adds optional PINNIPED_TEST_TOOLS_NAMESPACE because we need it
on the LDAP feature branch where we are developing the upcoming LDAP
support for the Supervisor.
- Bad usernames and passwords aren't really errors, since they are
based on end-user input.
- Other kinds of authentication failures are caused by bad configuration
so still treat those as errors.
- Empty usernames and passwords are already prevented by our endpoint
handler, but just to be safe make sure they cause errors inside the
authenticator too.
- The unit tests for upstreamldap.Provider need to mock the LDAP server,
so add an integration test which allows us to get fast feedback for
this code against a real LDAP server.
- Automatically wrap the user search filter in parentheses if it is not
already wrapped in parens (see the sketch after this list).
- More special handling for using "dn" as the username or UID attribute
name.
- Also added some more comments to types_ldapidentityprovider.go.tmpl
- The ldap_upstream_watcher.go controller validates the bind secret and
uses the Conditions to report errors. Shares some condition reporting
logic with its sibling controller oidc_upstream_watcher.go, to the
extent which is convenient without generics in golang.
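A sketch of the paren-wrapping mentioned in the list above (helper name is illustrative):
```
package upstreamldap

import "strings"

// wrapFilter ensures the user-supplied LDAP search filter is wrapped in
// parentheses, as required by RFC 4515 filter syntax.
func wrapFilter(filter string) string {
	if strings.HasPrefix(filter, "(") && strings.HasSuffix(filter, ")") {
		return filter
	}
	return "(" + filter + ")"
}
```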
I don't believe this is used by any tests or docs. I think it was for some initial local testing of the impersonation proxy?
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- When the upstream IDP is an LDAP IDP and the user's LDAP username and
password are received as new custom headers, then authenticate the
user and, if authentication was successful, return a redirect with
an authcode. Handle errors according to the OAuth/OIDC specs.
- Still does not support having multiple upstream IDPs defined at the
same time, which was an existing limitation of this endpoint.
- Does not yet include the actual LDAP authentication, which is
hidden behind an interface from the point of view of auth_handler.go
- Move the oidctestutil package to the testutil directory.
- Add an interface for Fosite storage to avoid a cyclical test
dependency.
- Add GetURL() to the UpstreamLDAPIdentityProviderI interface.
- Extract test helpers to be shared between callback_handler_test.go
and auth_handler_test.go because the authcode and fosite storage
assertions should be identical.
- Backfill Content-Type assertions in callback_handler_test.go.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This isn't strictly necessary because we currently always have the concierge endpoint and CA as CLI flags, but it doesn't hurt and it's better to err on the side of _not_ reusing a cache entry.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We have some nice normalization code in this package to remove expired or otherwise malformed cache entries, but we weren't calling it in the appropriate place.
Added calls to normalize the cache data structure before and after each transaction, and added test cases to ensure that it's being called.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Add some fields to LDAPIdentityProvider that we will need to be able
to search for users during login
- Enhance TestSupervisorLogin to test logging in using an upstream LDAP
identity provider. Part of this new test is skipped for now because
we haven't written the corresponding production code to make it
pass yet.
- Some refactoring and enhancement to env.go and the corresponding env
vars to support the new upstream LDAP provider integration tests.
- Use docker.io/bitnami/openldap for our test LDAP server instead of our
own fork now that they have fixed the bug that we reported.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The goal here was to start on an integration test to get us closer to the red
test that we want so we can start working on LDAP.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Rename the test/deploy/dex directory to test/deploy/tools
- Rename the dex namespace to tools
- Add a new ytt value called `pinny_ldap_password` for the tools
ytt templates
- This new value is not used on main at this time. We intend to use
it in the forthcoming ldap branch. We're defining it on main so
that the CI scripts can use it across all branches and PRs.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Before this change, the "context", "cluster", and "user" fields in generated kubeconfig YAML were always hardcoded to "pinniped". This could be confusing if you generated many kubeconfigs for different clusters.
After this change, the fields will be copied from their names in the original kubeconfig, suffixed with "-pinniped". This suffix can be overridden by setting the new `--generated-name-suffix` CLI flag.
The goal of this change is that you can distinguish between kubeconfigs generated for different clusters, as well as being able to distinguish between the Pinniped and original (admin) kubeconfigs for a cluster.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
A demo of running the Supervisor and Concierge on
a kind cluster. Can be used to quickly set up an
environment for manual testing.
Also added some missing copyright headers to other
hack scripts.
These commits include security fixes (CVE-2021-3121) for code generated by github.com/gogo/protobuf.
We expect this fix to also land in v1.20.6, but we don't want to wait for it.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test could flake if the load balancer hostname was provisioned but is not yet resolving in DNS from the test process.
The fix is to retry this step for up to 5 minutes.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test could fail when the cluster was under heavy load. This could cause kubectl to emit "Throttling request took [...]" logs that triggered a failure in the test.
The fix is to ignore these innocuous warnings.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We had this code that printed out pod logs when certain tests failed, but it is a bit cumbersome. We're removing it because we added a CI task that exports all pod logs after every CI run, which accomplishes the same thing and provides us a bunch more data.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This allows setting `$PINNIPED_TEST_CLI` to point at an existing `pinniped` CLI binary instead of having the test build one on-the-fly. This is more efficient when you're running the tests across many clusters as we do in CI.
Building the CLI from scratch in our CI environment takes 1.5-2 minutes, so this change should save nearly that much time on every test job.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We've seen some test flakes caused by this test. Some small changes:
- Use a 30s timeout for each iteration of the test loop (so each iteration needs to check or fail more quickly).
- Log a bit more during the checks so we can diagnose what's going on.
- Increase the overall timeout from one minute to five minutes
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- a credential that is understood by -> a credential that can be used to
authenticate to
- This is more neutral about whether it's going directly to k8s
or through the impersonation proxy
Instead of using the LongRunningFunc to determine if we can safely
use http2, follow the same logic as the aggregation proxy and only
use http2 when the request is not an upgrade.
Signed-off-by: Monis Khan <mok@vmware.com>
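The same check the aggregation layer uses is exported from apimachinery; sketched:
```
package impersonator

import (
	"net/http"

	"k8s.io/apimachinery/pkg/util/httpstream"
)

// useHTTP2 reports whether it is safe to proxy this request over HTTP/2.
// Upgrade requests (SPDY, WebSocket) must stay on HTTP/1.1.
func useHTTP2(req *http.Request) bool {
	return !httpstream.IsUpgradeRequest(req)
}
```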
In the case where we are using middleware (e.g., when the api group is
different) in our kubeclient, these error messages have a "...middleware request
for..." bit in the middle.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
When the frontend connection to our proxy is closed, the proxy falls through to
a panic(), which means the HTTP handler goroutine is killed, so we were not
seeing this log statement.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This is probably a good idea regardless, but it also avoids an infinite recursion from IntegrationEnv() -> assertNoRestartsDuringTest() -> NewKubeclient() -> IntegrationEnv() -> ...
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test could flake in some rare scenarios. This change adds a bunch of retries, improves the debugging output if the tests fail, and puts all of the subtests in parallel which saves ~10s on my local machine.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This test has occasionally flaked because it only waited for the APIService GET to finish, but did not wait for the controller to successfully update the target object.
The new code should be more patient and allow the controller up to 10s to perform the expected action.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This new capability describes whether a cluster is expected to allow anonymous requests (most do since k8s 1.6.x, but AKS has it disabled).
This commit also contains new capability YAML files for AKS and EKS, mostly to document publicly how we expect our tests to function in those environments.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
It takes a lot of manual steps to get ready to manually test the
impersonation proxy on a kind cluster, which makes it error prone,
so encapsulate them into a script to make it easier.
At the end of the test, wait for the KubeClusterSigningCertificate
strategy on the CredentialIssuer to go back to being healthy, to avoid
polluting other integration tests which follow this one.
This is the configuration for https://pre-commit.com/, which now also runs golangci-lint using the same version as CI (currently v1.33.0).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We were previously issuing both client certs and server certs with
both extended key usages included. Split the Issue*() methods into
separate methods for issuing server certs versus client certs so
they can have different extended key usages tailored for each use
case.
Also took the opportunity to clean up the parameters of the Issue*()
methods and New() methods to more closely match how we prefer to call
them. We were always only passing the common name part of the
pkix.Name to New(), so now the New() method just takes the common name
as a string. When making a server cert, we don't need to set the
deprecated common name field, so remove that param. When making a client
cert, we're always making it in the format expected by the Kube API
server, so just accept the username and group as parameters directly.
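A sketch of the resulting split (names illustrative): the Kube API server reads the username from CommonName and groups from Organization on client certs, while serving certs only need the server-auth EKU and SANs.
```
package certauthority

import (
	"crypto/x509"
	"crypto/x509/pkix"
	"math/big"
	"net"
	"time"
)

// newServerCertTemplate: serving certs get only the server-auth EKU plus
// IP SANs, and no deprecated CommonName.
func newServerCertTemplate(ips []net.IP, ttl time.Duration) *x509.Certificate {
	now := time.Now()
	return &x509.Certificate{
		SerialNumber: big.NewInt(1), // illustrative; use random serials in real code
		NotBefore:    now,
		NotAfter:     now.Add(ttl),
		KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		IPAddresses:  ips,
	}
}

// newClientCertTemplate: client certs get only the client-auth EKU, with
// the username and groups in the places the Kube API server expects.
func newClientCertTemplate(username string, groups []string, ttl time.Duration) *x509.Certificate {
	now := time.Now()
	return &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: username, Organization: groups},
		NotBefore:    now,
		NotAfter:     now.Add(ttl),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
}
```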
I'm kinda surprised this is working with our current implementation of the
impersonator, but regardless this seems like a step forward.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The impersonator_test.go unit test now starts the impersonation
server and makes real HTTP requests against it using client-go.
It is backed by a fake Kube API server.
The CA IssuePEM() method was missing the argument to allow a slice
of IP addresses to be passed in.
These tests occasionally flake because of a conflict error such as:
```
supervisor_discovery_test.go:105:
Error Trace: supervisor_discovery_test.go:587
supervisor_discovery_test.go:105
Error: Received unexpected error:
Operation cannot be fulfilled on federationdomains.config.supervisor.pinniped.dev "test-oidc-provider-lvjfw": the object has been modified; please apply your changes to the latest version and try again
Test: TestSupervisorOIDCDiscovery
```
These retries should improve the reliability of the tests.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Also make each t.Run use its own namespace to slightly reduce the
interdependency between them.
Use t.Cleanup instead of defer in whoami_test.go just to be consistent
with other integration tests.
The same coverage that was supplied by
TestCredentialRequest_OtherwiseValidRequestWithRealTokenShouldFailWhenTheClusterIsNotCapable
is now provided by an assertion at the end of TestImpersonationProxy,
so delete the duplicate test which was failing on GKE because the
impersonation proxy is now active by default on GKE.
When testing that the impersonation proxy port was closed there
is no need to include credentials in the request. At the point when
we want to test that the impersonation proxy port is closed, it is
possible that we cannot perform a TokenCredentialRequest to get a
credential either.
Also add a new assertion that the TokenCredentialRequest stops handing
out credentials on clusters which have no successful strategies.
Signed-off-by: Monis Khan <mok@vmware.com>
To make an impersonation request, first make a TokenCredentialRequest
to get a certificate. That cert will either be issued by the Kube
API server's CA or by a new CA specific to the impersonator. Either
way, you can then make a request to the impersonator and present
that client cert for auth and the impersonator will accept it and
make the impersonation call on your behalf.
The impersonator http handler now borrows some Kube library code
to handle request processing. This will allow us to more closely
mimic the behavior of a real API server, e.g. the client cert
auth will work exactly like the real API server.
Signed-off-by: Monis Khan <mok@vmware.com>
It also works on the slightly older macOS Catalina.
This script is only used on development laptops, so hopefully
this will work for more laptop OSes now.
Signed-off-by: Ryan Richard <richardry@vmware.com>
This adds two new flags to "pinniped get kubeconfig": --skip-validation and --timeout.
By default, at the end of the kubeconfig generation process, we validate that we can reach the configured cluster. In the future this might also validate that the TokenCredentialRequest API is running, but for now it just verifies that the DNS name resolves and that a TLS connection can be established on the given port.
If there is an error during this check, we block and retry for up to 10 minutes. This duration can be changed with --timeout, and the entire process can be skipped with --skip-validation.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
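A sketch of what that validation loop can look like (function name and retry cadence are illustrative):
```
package kubeconfig

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
	"net/url"
	"time"
)

// probeCluster retries until the API server's DNS name resolves and a TLS
// connection succeeds on the given port, or the timeout elapses.
func probeCluster(ctx context.Context, server string, roots *x509.CertPool, timeout time.Duration) error {
	u, err := url.Parse(server)
	if err != nil {
		return err
	}
	host := u.Host
	if u.Port() == "" {
		host = net.JoinHostPort(u.Hostname(), "443")
	}
	dialer := &tls.Dialer{Config: &tls.Config{RootCAs: roots}}
	deadline := time.Now().Add(timeout)
	for {
		conn, err := dialer.DialContext(ctx, "tcp", host)
		if err == nil {
			return conn.Close() // handshake worked; we're done
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("could not connect to %s within %s: %w", host, timeout, err)
		}
		time.Sleep(2 * time.Second)
	}
}
```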
This makes output that's easier to copy-paste into the test. We could also make it ignore the order of key/value pairs in the future.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The thing we're waiting for is mostly that DNS is resolving, the ELB is listening, and connections are making it to the proxy.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
All controller unit tests were accidentally using a timeout context
for the informers, instead of a cancel context which stays alive until
each test is completely finished. There is no reason to risk
unpredictable behavior of a timeout being reached during an individual
test, even though with the previous 3 second timeout it could only be
reached on a machine which is running orders of magnitude slower than
usual, since each test usually runs in about 100-300 ms. Unfortunately,
sometimes our CI workers might get that slow.
This sparked a review of other usages of timeout contexts in other
tests, and all of them were increased to a minimum value of 1 minute,
under the rule of thumb that our tests will be more reliable on slow
machines if they "pass fast and fail slow".
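The two context styles, side by side as tiny test helpers:
```
package testutil

import (
	"context"
	"testing"
	"time"
)

// Risky: a deadline that can expire while a slow test is still running.
func timeoutCtx(t *testing.T) context.Context {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	t.Cleanup(cancel)
	return ctx
}

// Safer: cancellation is tied to test cleanup, never to wall-clock time.
func cancelCtx(t *testing.T) context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	t.Cleanup(cancel)
	return ctx
}
```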
In impersonator_config_test.go, instead of waiting for the resource
version to appear in the informers, wait for the actual object to
appear.
This is an attempt to resolve flaky failures that only happen in CI,
but it also cleans up the test a bit by avoiding inventing fake resource
version numbers all over the test.
Signed-off-by: Monis Khan <mok@vmware.com>
- Use `Eventually` when making tls connections because the production
code's handling of starting and stopping the TLS server port
has some async behavior.
- Don't use resource version "0" because that has special meaning
in the informer libraries.
This time, don't use the Squid proxy if the cluster supports real external load balancers (as in EKS/GKE/AKS).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Because otherwise `go test` will panic/crash your test if it takes
longer than 10 minutes, which is an annoying way for an integration
test to fail since it skips all of the t.Cleanup's.
This updates our issuerconfig.UpdateStrategy to sort strategies according to a weighted preference.
The TokenCredentialRequest API strategy is preferred, followed by impersonation proxy, followed by any other unknown types.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
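A sketch of the weighted ordering (the weights and the treatment of unknown types are illustrative):
```
package issuerconfig

import "sort"

// strategyWeights orders strategy types by preference; unknown types get
// the zero weight and sort last.
var strategyWeights = map[string]int{
	"KubeClusterSigningCertificate": 2, // TokenCredentialRequest API strategy
	"ImpersonationProxy":            1,
}

// sortStrategies is stable, so equally-weighted strategies keep their
// existing relative order.
func sortStrategies(types []string) {
	sort.SliceStable(types, func(i, j int) bool {
		return strategyWeights[types[i]] > strategyWeights[types[j]]
	})
}
```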
- This commit does not include the updates that we plan to make to
the `status.strategies[].frontend` field of the CredentialIssuer.
That will come in a future commit.
This is more than an automatic merge. It also includes a rewrite of the CredentialIssuer API impersonation proxy fields using the new structure, and updates to the CLI to account for that new API.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I think this is another aspect of the test flakes we're trying to fix. This matters especially for the "Multiple Pinnipeds" test environment where two copies of the test suite are running concurrently.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
If the test is run immediately after the Concierge is installed, the API server can still have broken discovery data and return an error on the first call.
This commit adds a retry loop to attempt this first kubectl command for up to 60s before declaring failure.
The subsequent tests should be covered by this as well since they are not run in parallel.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These controllers were a bit inconsistent. There were cases where the controllers ran out of the expected order and the custom labels might not have been applied.
We should still plan to remove this label handling or move responsibility into the middleware layer, but this avoids any regression.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This field is a new tagged-union style field that describes how clients can connect using each successful strategy.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We don't support using the impersonate headers through the impersonation
proxy yet, so this integration test is a negative test which asserts
that we get an error.
- The CA cert will end up in the end user's kubeconfig on their client
machine, so if it changes they would need to fetch the new one and
update their kubeconfig. Therefore, we should avoid changing it as
much as possible.
- Now the controller writes the CA to a different Secret. It writes both
the cert and the key so it can reuse them to create more TLS
certificates in the future.
- For now, it only needs to make more TLS certificates if the old
TLS cert Secret gets deleted or updated to be invalid. This allows
for manual rotation of the TLS certs by simply deleting the Secret.
In the future, we may want to implement some kind of auto rotation.
- For now, rotation of both the CA and TLS certs will also happen if
you manually delete the CA Secret. However, this would cause the end
users to immediately need to get the new CA into their kubeconfig,
so this is not as elegant as a normal rotation flow where you would
have a window of time where you have more than one CA.
Also fixes our sitemap to have correct `lastmod` times when built locally (it was already correct on Netlify).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Should work on clusters which have:
- load balancers not supported, has squid proxy (e.g. kind)
- load balancers supported, has squid proxy (e.g. EKS)
- load balancers supported, no squid proxy (e.g. GKE)
When testing with a load balancer, call the impersonation proxy through
the load balancer.
Also, added a new library.RequireNeverWithoutError() helper.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This flag selects a CredentialIssuer to use when detecting what mode the Concierge is in on a cluster. If not specified, the command will look for a single CredentialIssuer. If there are multiple, then the flag is required.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Also update concierge_impersonation_proxy_test.go integration test
to use real TLS when calling the impersonator.
Signed-off-by: Ryan Richard <richardry@vmware.com>
These are prone to breaking when stdr is upgraded because they rely on the exact ordering of keys in the log message. If we have more problems we can rewrite the assertions to be more robust, but for now I'm just fixing them to match the new output.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The login commands now expect either `--concierge-mode ImpersonationProxy` or `--concierge-mode TokenCredentialRequestAPI` (the default).
This is partly a style choice, but I also think it helps in case we need to add a third major mode of operation at some point.
I also cleaned up some other minor style items in the help text.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Adds a new optional `spec.impersonationProxyInfo` field to hold the URL and CA data for the impersonation proxy, as well as some additional status condition constants for describing the current status of the impersonation proxy.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These are some more changes that came up when Pablo and I were reviewing the previous docs PR.
In no particular order:
- Fix "related posts" on the blog section, and hide the section if there are none.
- Minor style changes to several pages (guided by various style guides).
- Redirect the root of get.pinniped.dev to our main page (shouldn't really be hit, but it's nice to do something).
- Add more mobile-friendly CSS for our docs.
- Reword the "getting started" CTA, and hide it on the docs pages (you're already there).
- Fix the "Learn how Pinniped provides identity services to Kubernetes" link on the landing page.
- Add a date to our blog post cards.
- Rewrite the hero text on the landing page.
- Fix the docs link for the "Get Started with Pinniped" button on the landing page.
- Rework the landing page grid text.
- Add Margo and Nanci to the team section and sort it alphabetically.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Also:
- Shut down the informer correctly in
concierge_impersonation_proxy_test.go
- Remove the t.Failed() checks which avoid cleaning up after failed
tests. This was inconsistent with how most of the tests work, and
left cruft on clusters when a test failed.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Also:
- Changed base64 encoding of impersonator bearer tokens to use
`base64.StdEncoding` to make it easier for users to manually
create a token using the unix `base64` command
- Test the headers which are and are not passed through to the Kube API
by the impersonator more carefully in the unit tests
- More WIP on concierge_impersonation_proxy_test.go
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This change adds a new virtual aggregated API that can be used by
any user to echo back who they are currently authenticated as. This
has general utility to end users and can be used in tests to
validate if authentication was successful.
Signed-off-by: Monis Khan <mok@vmware.com>
- Because the impersonation proxy config controller needs to be able
to delete the load balancer which it created
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This is a more reliable way to determine whether the load balancer
is already running.
Also added more unit tests for the load balancer.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Makes most of the fonts a bit bigger, increases contrast, fixes some nits about the spacing in numbered/bulleted lists, and adds some image alt texts.
Overall this improves our Lighthouse accessibility score from 71 to 95 and I think it's subjectively more readable.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This wasn't needed before because the other code wasn't in the main module and golangci-lint won't cross a module boundary.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
If someone has already set impersonation headers in their request, then
we should fail loudly so the client knows that its existing impersonation
headers will not work.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I think we were assuming the name of our Concierge app, and getting lucky
because it was the name we use when testing locally (but not in CI).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Watch a configmap to read the configuration of the impersonation
proxy and reconcile it.
- Implements "auto" mode by querying the API for control plane nodes.
- WIP: does not create a load balancer or proper TLS certificates yet.
Those will come in future commits.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
This is a partial revert of 288d9c999e. For some reason it didn't occur to me
that we could do it this way earlier. Whoops.
This also contains a middleware update: mutation funcs can return an error now
and short-circuit the rest of the request/response flow. The idea here is that
if someone is configuring their kubeclient to use middleware, they are agreeing
to a narrower client contract by doing so (e.g., their TokenCredentialRequests
must have a Spec.Authenticator.APIGroup set).
I also updated some internal/groupsuffix tests to be more realistic.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I think the reason we were seeing flakes here is because the kube cert agent
pods had not reached a steady state even though our test assertions passed, so
the test would proceed immediately and run more assertions on top of a weird
state of the kube cert agent pods.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I added that test helper to create an http.Request since I wanted to properly
initialize the http.Request's context.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This allows us to keep all of our resources in the pinniped category
while not having kubectl return errors for calls such as:
kubectl get pinniped -A
Signed-off-by: Monis Khan <mok@vmware.com>
As of upgrading to Kubernetes 1.20, our aggregated API server now runs some
controllers for the two flowcontrol.apiserver.k8s.io resources in the title of
this commit, so it needs RBAC to read them.
This should get rid of the following error messages in our Concierge logs:
Failed to watch *v1beta1.FlowSchema: failed to list *v1beta1.FlowSchema: flowschemas.flowcontrol.apiserver.k8s.io is forbidden: User "system:serviceaccount:concierge:concierge" cannot list resource "flowschemas" in API group "flowcontrol.apiserver.k8s.io" at the cluster scope
Failed to watch *v1beta1.PriorityLevelConfiguration: failed to list *v1beta1.PriorityLevelConfiguration: prioritylevelconfigurations.flowcontrol.apiserver.k8s.io is forbidden: User "system:serviceaccount:concierge:concierge" cannot list resource "prioritylevelconfigurations" in API group "flowcontrol.apiserver.k8s.io" at the cluster scope
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I messed this up before because the ordering of the path components is a bit different than in the specific version case.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
When the Pinniped server has been installed with the `api_group_suffix`
option, for example using `mysuffix.com`, then clients who would like to
submit a `TokenCredentialRequest` to the server should set the
`Spec.Authenticator.APIGroup` field as `authentication.concierge.mysuffix.com`.
This makes more sense from the client's point of view than using the
default `authentication.concierge.pinniped.dev` because
`authentication.concierge.mysuffix.com` is the name of the API group
that they can observe in their cluster, and `authentication.concierge.pinniped.dev`
does not exist as an API group on their cluster.
This commit includes both the client and server-side changes to make
this work, as well as integration test updates.
Co-authored-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
Co-authored-by: Margo Crawford <margaretc@vmware.com>
Makes it easy to deploy Pinniped under a different API group for manual
testing and iterating on integration tests on your laptop.
Signed-off-by: Ryan Richard <richardry@vmware.com>
Because it is a test of the conciergeclient package, and the naming
convention for integration test files is supervisor_*_test.go,
concierge_*_test.go, or cli_*_test.go to identify which component
the test is primarily covering.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This makes sure that if our clients ever send types with the wrong
group, the server will refuse to decode it.
Signed-off-by: Monis Khan <mok@vmware.com>
- I realized that the hardcoded fakekubeapi 404 not found response was invalid,
so we were getting a default error message. I fixed it so the tests follow a
higher fidelity code path.
- I caved and added a test for making sure the request body was always closed,
and believe it or not, we were double closing a body. I don't *think* this will
matter in production, since client-go will pass us ioutil.NopCloser()'s, but
at least we know now.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Yes, this is a huge commit.
The middleware allows you to customize the API groups of all of the
*.pinniped.dev API groups.
Some notes about other small things in this commit:
- We removed the internal/client package in favor of pkg/conciergeclient. The
two packages do basically the same thing. I don't think we use the former
anymore.
- We re-enabled cluster-scoped owner assertions in the integration tests.
This code was added in internal/ownerref. See a0546942 for when this
assertion was removed.
- Note: the middleware code is in charge of restoring the GV of a request object,
so we should never need to write mutations that do that.
- We updated the supervisor secret generation to no longer manually set an owner
reference to the deployment since the middleware code now does this. I think we
still need some way to make an initial event for the secret generator
controller, which involves knowing the namespace and the name of the generated
secret, so I still wired the deployment through. We could use a namespace/name
tuple here, but I was lazy.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
This was generated via `hugo gen chromastyles --style=monokailight > ./site/themes/pinniped/assets/scss/_syntax.css`.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We have these redirects set up to make the `kubectl apply -f [...]` commands cleaner, but we never went back and fixed up the documentation to use them until now.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This project overwrote the v1.0.0 tag with a different commit ID, which has caused issues with the Go module sum DB (which accurately detected the issue).
This has been one of the reasons why Dependabot is not updating our Go dependencies.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This optimizes our image in a few different ways:
- It adds a bunch of files and directories to the `.dockerignore` file.
This lets us have a single `COPY . .` but still be very aggressive about pruning what files end up in the build context.
- It adds build-time cache mounts to the `go build` commands using BuildKit's `--mount=type=cache` flag.
This requires BuildKit-capable Docker, but means that our Go builds can all be incremental builds.
This replaces the previous flow we had where we needed to split out `go mod download`.
- Instead of letting the full `apt-get install ca-certificates` layer end up in our final image, we copy just the single file we need.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We need this in CI when we want to configure Dex with the redirect URI for both
primary and secondary deploys at one time (since we only stand up Dex once).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I didn't advertise this feature in the deploy README's since (hopefully) not
many people will want to use it?
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Previously, when triggering a Tilt reload via a *.go file change, a reload would
take ~13 seconds and we would see this error message in the Tilt logs for each
component.
Live Update failed with unexpected error:
command terminated with exit code 2
Falling back to a full image build + deploy
Now, Tilt should reload images a lot faster (~3 seconds) since we are running
the images as root.
Note! Reloading the Concierge component still takes ~13 seconds because there
are 2 containers running in the Concierge namespace that use the Concierge
image: the main Concierge app and the kube cert agent pod. Tilt can't live
reload both of these at once, so the reload takes longer and we see this error
message.
Will not perform Live Update because:
Error retrieving container info: can only get container info for a single pod; image target image:image/concierge has 2 pods
Falling back to a full image build + deploy
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The group claims read from the session cache file are loaded as `[]interface{}` (slice of empty interfaces) so when we previously did a `groups, _ := idTokenClaims[oidc.DownstreamGroupsClaim].([]string)`, then `groups` would always end up nil.
The solution I tried here was to convert the expected value to also be `[]interface{}` so that `require.Equal(t, ...)` does the right thing.
This bug only showed up in our acceptance environment against Okta, since we don't have any other integration test coverage with IDPs that pass a groups claim.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
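A sketch of that conversion (helper name is illustrative):
```
package testlib

// toInterfaceSlice converts expected groups the same way a JSON decode
// would produce them, so equality assertions compare like with like.
func toInterfaceSlice(groups []string) []interface{} {
	out := make([]interface{}, 0, len(groups))
	for _, g := range groups {
		out = append(out, g)
	}
	return out
}
```
It can then be used like `require.Equal(t, toInterfaceSlice(wantGroups), idTokenClaims[oidc.DownstreamGroupsClaim])`.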
We were seeing a race in this test code since the require.NoError() and
require.Eventually() would write to the same testing.T state on separate
goroutines. Hopefully this helper function should cover the cases when we want
to require.NoError() inside a require.Eventually() without causing a race.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Co-authored-by: Margo Crawford <margaretc@vmware.com>
Co-authored-by: Monis Khan <i@monis.app>
This reverts commit 4a28d1f800.
This commit was originally made to fix a bug that caused TokenCredentialRequest
to become slow when the server was idle for an extended period of time. This was
to address a Kubernetes issue that was fixed in 1.19.5 and onward. We are now
running with Kubernetes 1.20, so we should be able to pick up this fix.
This change updates our clients to always set an owner ref when:
1. The operation is a create
2. The object does not already have an owner ref set
Signed-off-by: Monis Khan <mok@vmware.com>
See comment. This is at least a first step to make our GKE acceptance
environment greener. Previously, this test assumed that the Pinniped-under-test
had been deployed in (roughly) the last 10 minutes, which is not an assumption
that we make anywhere else in the integration test suite.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- JWKSWriterController
- JWKSObserverController
- FederationDomainSecretsController for HMAC keys
- FederationDomainSecretsController for state signature key
- FederationDomainSecretsController for state encryption key
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Only sync on add/update of secrets in the same namespace which
have the "storage.pinniped.dev/garbage-collect-after" annotation, and
also during a full resync of the informer whenever secrets in the
same namespace with that annotation exist.
- Ignore deleted secrets to avoid having this controller trigger itself
unnecessarily when it deletes a secret. This controller is never
interested in deleted secrets, since its only job is to delete
existing secrets.
- No change to the self-imposed rate limit logic. That still applies
because secrets with this annotation will be created and updated
regularly while the system is running (not just during rare system
configuration steps).
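The filter described above boils down to something like this sketch (function name is hypothetical):
```
import corev1 "k8s.io/api/core/v1"

const gcAnnotation = "storage.pinniped.dev/garbage-collect-after"

// React to add/update of annotated secrets; never react to deletes, so the
// controller does not trigger itself when it deletes a secret.
func shouldSync(secret *corev1.Secret, isDelete bool) bool {
	if isDelete {
		return false
	}
	_, ok := secret.Annotations[gcAnnotation]
	return ok
}
```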
I'm not sure if these docs are used anywhere on our website, but I don't think
that they are. I'm assuming someone or something will yell if these should not
be deleted. These docs also live at the root of the repo, and the duplicate
versions are already drifting out of sync from one another.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
We stared at this very carefully and we don't think there are any structural changes. Maybe something small happened to get the RNG off by one?
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This implementation is janky because I wanted to make the smallest change
possible to try to get the code back to stable so we can release.
Also deep copy an object so we aren't mutating the cache.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This is a bit more clear. We're changing this now because it is a non-backwards-compatible change that we can make now since none of this RFC8693 token exchange stuff has been released yet.
There is also a small typo fix in some flag usages (s/RF8693/RFC8693/)
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- The overall timeout for logins is increased to 90 minutes.
- The timeout for token refresh is increased from 30 seconds to 60 seconds to be a bit more tolerant of extremely slow networks.
- A new, matching timeout of 60 seconds has been added for the OIDC discovery, auth code exchange, and RFC8693 token exchange operations.
The new code uses the `http.Client.Timeout` field rather than managing contexts on individual requests. This is easier because the OIDC package stores a context at creation time and tries to use it later when performing key refresh operations.
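Roughly, the shape of the new approach:
```
import (
	"net/http"
	"time"
)

// One client-wide timeout covers discovery, auth code exchange, token exchange,
// and refresh, instead of a context deadline per request (60s per the above).
var httpClient = &http.Client{Timeout: 60 * time.Second}
```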
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Fosite overrides the `Cache-Control` header we set, which is basically fine even though it's not exactly what we want.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
It would be great to do this for the supervisor's callback endpoint as well, but it's difficult to get at those since the request happens inside the spawned browser.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
From RFC2616 (https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2):
> It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair,
> without changing the semantics of the message, by appending each subsequent field-value to the first,
> each separated by a comma.
This was correct before, but this simplifies a bit and shaves off a few bytes from the response.
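For illustration (the header name and values here are just an example):
```
import "net/http"

func combinedHeaderExample() http.Header {
	h := http.Header{}
	h.Add("Cache-Control", "no-store")
	h.Add("Cache-Control", "no-cache")
	// Per the quoted RFC text, this single combined field is semantically
	// identical to the two fields above, and a few bytes smaller on the wire:
	h.Set("Cache-Control", "no-store,no-cache")
	return h
}
```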
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The bug itself has to do with when headers are streamed to the client. Once a wrapped handler has sent any bytes to the `http.ResponseWriter`, the value of the map returned from `w.Header()` no longer matters for the response. The fix is fairly trivial, which is to add those response headers before invoking the wrapped handler.
The existing unit test didn't catch this due to limitations in `httptest.NewRecorder()`. It is now replaced with a new test that runs a full HTTP test server, which catches the previous bug.
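The shape of the fix, sketched (the middleware and header choices are hypothetical):
```
import "net/http"

// Headers must be added before the wrapped handler writes any body bytes;
// after that, mutations of w.Header() are silently ignored.
func withHeaders(h http.Handler, headers map[string]string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for k, v := range headers {
			w.Header().Set(k, v)
		}
		h.ServeHTTP(w, r)
	})
}
```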
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Because the library that we are using, which returns that error,
formats the timestamp in local time, which is LMT when running
on a laptop but UTC when running in CI.
Signed-off-by: Ryan Richard <richardry@vmware.com>
This reverts commit be4e34d0c0.
Roll back this change that was supposed to make the test more robust. If we
retry multiple token exchanges with the same auth code, of course we are going
to get failures on the second try onwards because the auth code was invalidated
on the first try.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The value is correctly validated as `secrets.pinniped.dev/oidc-client` elsewhere, only this comment was wrong.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Refactor the test to avoid testing a private method and instead
always test the results of running the controller.
- Also remove the `if testing.Short()` check because it will always
be short when running unit tests. This prevented the unit test
from ever running, both locally and in CI.
Signed-off-by: Ryan Richard <richardry@vmware.com>
It now tests both the deprecated `pinniped get-kubeconfig` and the new `pinniped get kubeconfig --static-token` flows.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is extracted from library.NewClientsetForKubeConfig(). It is useful so you can assert properties of the loaded, parsed kubeconfig.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Adds two new subcommands: `pinniped get kubeconfig` and `pinniped login static`
- Adds concierge support to `pinniped login oidc`.
- Adds back wrapper commands for the now deprecated `pinniped get-kubeconfig` and `pinniped exchange-credential` commands. These now wrap `pinniped get kubeconfig` and `pinniped login static` respectively.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is a slightly evolved version of our previous client package, exported to be public and refactored to use functional options for API maintainability.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I hope this will make TestSupervisorLogin less flaky. There are some instances
where the front half of the OIDC login flow happens so fast that the JWKS
controller doesn't have time to properly generate an asymmetric key.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I saw this message in our CI logs, which led me to this fix.
could not update status: OIDCProvider.config.supervisor.pinniped.dev "acceptance-provider" is invalid: status.status: Unsupported value: "SameIssuerHostMustUseSameSecret": supported values: "Success", "Duplicate", "Invalid"
Also - correct an integration test error message that was misleading.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
We believe this API is more forwards compatible with future secrets management
use cases. The implementation is a cry for help, but I was trying to follow the
previously established pattern of encapsulating the secret generation
functionality to a single group of packages.
This commit makes a breaking change to the current OIDCProvider API, but that
OIDCProvider API was added after the latest release, so it is technically still
in development until we release, and therefore we can continue to thrash on it.
I also took this opportunity to make some things private that didn't need to be
public.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This forced us to add labels to the CSRF cookie secret, just as we do
for other Supervisor secrets. Yay tests.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- AudienceMatchingStrategy: we want to use the default matcher from
fosite, so remove that line
- AllowedPromptValues: We can use the default if we add a small
change to the auth_handler.go to account for it (in a future commit)
- MinParameterEntropy: Use the fosite default to make it more likely
that off the shelf OIDC clients can work with the supervisor
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Also add more log statements to the controller
- Also have the controller apply a rate limit to itself, to avoid
having a very chatty controller that runs way more often than is
needed.
- Also add an integration test for the controller's behavior.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
When we try to decode with the wrong decryption key, we could get any number of
error messages, depending on what failure mode we are in (couldn't authenticate
plaintext after decryption, couldn't deserialize, etc.). This change makes the
test weaker, but at least we know we will get an error message in the case where
the decryption key is wrong.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This also sets the CSRF cookie Secret's OwnerReference to the Pod's grandparent
Deployment so that when the Deployment is cleaned up, then the Secret is as
well.
Obviously this controller implementation has a lot of issues, but it will at
least get us started.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
There is still a test failing, but I am sure it is a simple fix hiding in the
code. I think this is the general shape of the controller that we want.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Note that we don't cache the securecookie.SecureCookie that we use in our
implementation. This was purely because of laziness. We should think about
caching this value in the future.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Make it more likely that the end user will get the more specific error
message saying that their refresh token has expired the first time
that they try to use an expired refresh token
Signed-off-by: Ryan Richard <richardry@vmware.com>
- This struct represents the configuration of all timeouts. These
timeouts are all interrelated, so declare them all in one place.
This should also make it easier to allow the user to override
our defaults if we would like to implement such a feature in the
future.
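Something like this sketch (the field names here are hypothetical):
```
import "time"

// All of the interrelated lifetimes in one place, so their relationships are
// visible together and could later be overridden by user configuration.
type TimeoutsConfiguration struct {
	AuthorizeCodeLifespan time.Duration
	AccessTokenLifespan   time.Duration
	IDTokenLifespan       time.Duration
	RefreshTokenLifespan  time.Duration
}
```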
Signed-off-by: Margo Crawford <margaretc@vmware.com>
Before this, we weren't properly parsing the `Content-Type` header. This breaks in integration with the Supervisor since it sends an extra encoding parameter like `application/json;charset=UTF-8`.
This change switches to properly parsing with the `mime.ParseMediaType` function, and adds test cases to match the supervisor behavior.
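The parsing change, roughly:
```
import (
	"mime"
	"net/http"
)

// "application/json;charset=UTF-8" parses to the media type "application/json",
// so the Supervisor's responses are accepted.
func isJSONResponse(resp *http.Response) bool {
	mediaType, _, err := mime.ParseMediaType(resp.Header.Get("Content-Type"))
	return err == nil && mediaType == "application/json"
}
```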
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This default matches the static client we have defined in the supervisor, which will be the correct value in most cases.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I think this should be more correct. In the server we're authenticating the request primarily via the `subject_token` parameter anyway, and Fosite needs the `client_id` to be set.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
kubectl 1.20 prints "Kubernetes control plane" instead of "Kubernetes master".
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- This is to make it easier for the token exchange branch to also edit
this test without causing a lot of merge conflicts with the
refresh token branch, to enable parallel development of closely
related stories.
- This refactor will allow us to add new test tables for the
refresh and token exchange requests, which both must come after
an initial successful authcode exchange has already happened
Signed-off-by: Margo Crawford <margaretc@vmware.com>
We decided that we don't really need these in every case, since we'll be returning username and groups in a custom claim.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This refactors the `UpstreamOIDCIdentityProviderI` interface and its implementations to pass ID token claims through a `*oidctypes.Token` return parameter rather than as a third return parameter.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is just a more convenient copy of these values which are already stored inside the ID token. This will save us from having to pass them around separately or re-parse them later.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I'm worried that these errors are going to be really buried from the user, so
add some log statements to try to make them a tiny bit more observable.
Also follow some of our error message conventions by using lowercase error
messages.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This CSRF cookie needs to be included on the request to the callback endpoint triggered by the redirect from the OIDC upstream provider. This is not allowed by `Same-Site=Strict` but is allowed by `Same-Site=Lax` because it is a "cross-site top-level navigation" [1].
We didn't catch this earlier with our Dex-based tests because the upstream and downstream issuers were on the same parent domain `*.svc.cluster.local` so the cookie was allowed even with `Strict` mode.
[1]: https://tools.ietf.org/html/draft-ietf-httpbis-cookie-same-site-00#section-3.2
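A sketch of the relevant cookie attributes (the cookie name is hypothetical):
```
import "net/http"

// Lax, not Strict: the cookie must accompany the cross-site top-level redirect
// back from the upstream IDP to our callback endpoint.
func setCSRFCookie(w http.ResponseWriter, signedValue string) {
	http.SetCookie(w, &http.Cookie{
		Name:     "pinniped-csrf",
		Value:    signedValue,
		SameSite: http.SameSiteLaxMode,
		Secure:   true,
		HttpOnly: true,
		Path:     "/",
	})
}
```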
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This commit includes a failing test (amongst other compiler failures) for the
dynamic signing key fetcher that we will inject into fosite. We are checking it
in so that we can pass the WIP off.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
We are currently using EC keys to sign ID tokens, so we should reflect that in
our OIDC discovery metadata.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This fixes a regression introduced by 24c4bc0dd4. It could occasionally cause the tests to fail when run on a machine with an IPv6 localhost interface. As a fix I added a wrapper for the new Go 1.15 `LookupIP()` method, and created a partially-functional backport for Go 1.14. This should be easy to delete in the future.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This adds a few new "create test object" helpers and extends `CreateTestOIDCProvider()` to optionally wait for the created OIDCProvider to enter some expected status condition.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This allows the token exchange request to be performed with the correct TLS configuration.
We go to a bit of extra work to make sure the `http.Client` object is cached between reconcile operations so that connection pooling works as expected.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Note that this WIP commit includes a failing unit test, which will
be addressed in the next commit
Signed-off-by: Ryan Richard <richardry@vmware.com>
Mainly, avoid using some `testing` helpers that were added in 1.14, as well as a couple of other niceties we can live without.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We were assuming that env.SupervisorHTTPAddress was set, but it might not be
depending on the environment on which the integration tests are being run. For
example, in our acceptance environments, we don't currently set
env.SupervisorHTTPAddress.
I tried to follow the pattern from TestSupervisorOIDCDiscovery here.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Generate a new cookie for the user and move on as if they had not sent
a bad cookie. Hopefully this will make the user experience better if,
for example, the server rotated cookie signing keys and then a user
submitted a very old cookie.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Also use ConstantTimeCompare() to compare CSRF tokens to prevent
leaking any information in how quickly we reject bad tokens.
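That comparison is a one-liner with the standard library:
```
import "crypto/subtle"

// Constant-time equality, so response timing does not reveal how much of a bad
// token happened to match.
func csrfTokensEqual(a, b string) bool {
	return subtle.ConstantTimeCompare([]byte(a), []byte(b)) == 1
}
```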
Signed-off-by: Ryan Richard <richardry@vmware.com>
This is much nicer UX for an administrator installing an UpstreamOIDCProvider
CRD. They don't have to guess as hard at what the callback endpoint path should
be for their UpstreamOIDCProvider.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Also aggressively refactor for readability:
- Make helper validation functions for each type of storage
- Try to label symbols based on their downstream/upstream use and group them
accordingly
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Prior to this we re-used the CLI testing client to test the authorize flow of the supervisor, but they really need to be separate upstream clients. For example, the supervisor client should be a non-public client with a client secret and a different callback endpoint.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Also handle several more error cases
- Move RequireTimeInDelta to shared testutils package so other tests
can also use it
- Move all of the oidc test helpers into a new oidc/oidctestutils
package to break a circular import dependency. The shared testutil
package can't depend on any of our other packages or else we
end up with circular dependencies.
- Lots more assertions about what was stored at the end of the
request to build confidence that we are going to pass all of the
right settings over to the token endpoint through the storage, and
also to avoid accidental regressions in that area in the future
Signed-off-by: Ryan Richard <richardry@vmware.com>
Also refactor to get rid of duplicate test structs.
Also also don't default groups ID token claim because there is no standard one.
Also also also add some logging that will hopefully help us in debugging in the
future.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Because we want it to implement an AuthcodeExchanger interface and
do it in a way that will be more unit test-friendly than the underlying
library that we intend to use inside its implementation.
This will allow it to be imported by Go code outside of our repository, which was something we have planned for since this code was written.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Before, we did this in an init container, which meant if the Dex pod restarted we would have fresh certs, but our Tilt/bash setup didn't account for this.
Now, the certs are generated by a Job which runs once and saves the generated files into a Secret. This should be a bit more stable.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This change deploys a small Squid-based proxy into the `dex` namespace in our integration test environment. This lets us use the cluster-local DNS name (`http://dex.dex.svc.cluster.local/dex`) as the OIDC issuer. It will make generating certificates easier, and most importantly it will mean that our CLI can see Dex at the same name/URL as the supervisor running inside the cluster.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- To better support having multiple downstream providers configured,
the authorize endpoint will share a CSRF cookie between all
downstream providers' authorize endpoints. The first time a
user's browser hits the authorize endpoint of any downstream
provider, that endpoint will set the cookie. Then if the user
starts an authorize flow with that same downstream provider or with
any other downstream provider which shares the same domain name
(i.e. differentiated by issuer path), then the same cookie will be
submitted and respected.
- Just in case we are sharing the domain name with some other app,
we sign the value of any new CSRF cookie and check the signature
when we receive the cookie. This wasn't strictly necessary since
we probably won't share a domain name with other apps, but it
wasn't hard to add this cookie signing.
Signed-off-by: Ryan Richard <richardry@vmware.com>
We want to have our APIs respond to `kubectl get pinniped`, and we shouldn't use `all` because we don't think most average users should have permission to see our API types, which means if we put our types there, they would get an error from `kubectl get all`.
I also added some tests to assert these properties on all `*.pinniped.dev` API resources.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This check predates the API renaming we did. Now that our API groups have `concierge`/`supervisor` in the name, we don't need to maintain a specific set of `cp` commands and keep them in sync, so we don't really need this check.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
These only really make sense for aggregated API types where we need `conversion-gen` to do version conversion.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Our unit tests are gonna touch a lot more corner cases than our
integration tests, so let's make them run as close to the real
implementation as possible.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This commit is meant to be reverted when we are unblocked and
ready to start working on this integration test again. Temporarily
remove it so we can merge this PR to main.
Note: I had tried using t.Skip() in the test, but then that caused lint
failures, so decided to just remove it for now.
We want to run all of the fosite validations in the authorize
endpoint, but we don't need to store anything yet because
we are storing what we need for later in the upstream state
parameter.
Signed-off-by: Ryan Richard <richardry@vmware.com>
This is helpful for us, amongst other users, because we want to enable "debug"
logging whenever we deploy components for testing.
See a5643e3 for addition of log level.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Add a new helper method to plog to make a consistent way to log
expected errors at the info level (as opposed to unexpected
system errors that would be logged using plog.Error)
Signed-off-by: Ryan Richard <richardry@vmware.com>
This is awaiting the new upstream OIDC provider CRD in order
to pass; however, hopefully this is a starting point for us.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Also move definition of our oauth client and the general fosite
configuration to a helper so we can use the same config to construct
the handler for both test and production code.
Signed-off-by: Ryan Richard <richardry@vmware.com>
This prevents unnecessary sync loop runs when the controller is
running with a single worker. When the controller is running with
more than one worker, it prevents subtle bugs that can cause the
controller to go "back in time."
Signed-off-by: Monis Khan <mok@vmware.com>
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Previously we were injecting the whole oauth handler chain into this function,
which meant we were essentially writing unit tests to test our tests. Let's push
some of this logic into the source code.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Does not validate incoming request parameters yet. Also is not
served on the http/https ports yet. Those will come in future commits.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This is needed on clusters with PodSecurityPolicy enabled by default, but should be harmless in other cases.
This is generally needed because a restrictive PodSecurityPolicy will usually otherwise prevent the `hostPath` volume mount needed by the dynamically-created cert agent pod.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is the beginning of a change to add cpu/memory limits to our pods.
We are doing this because some consumers require this, and it is generally
a good practice.
Setting limits == requests gives us "Guaranteed" QoS.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I tried to follow a principle of encapsulation here - we can still default to
peeps making connections to 80/443 on a Service object, but internally we will
use 8080/8443.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
And delete the agent pod when it needs its custom labels to be
updated, so that the creator controller will notice that it is missing
and immediately create it with the new custom labels.
This only matters for local development, since we don't use this script directly in CI. Makes the full codegen step take ~90s on my laptop.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is the first of a few related changes that re-organize our API after the big recent changes that introduced the supervisor component.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Setting a Secret in the supervisor's namespace with a special name
will cause it to get picked up and served as the supervisor's TLS
cert for any request which does not have a matching SNI cert.
- This is especially useful for when there is no DNS record for an
issuer and the user will be accessing it via IP address. This
is not how we would expect it to be used in production, but it
might be useful for other cases.
- Includes a new integration test
- Also suppress all of the warnings about ignoring the error returned by
Close() in lines like `defer x.Close()` to make GoLand happier
- TLS certificates can be configured on the OIDCProviderConfig using
the `secretName` field.
- When listening for incoming TLS connections, choose the TLS cert
based on the SNI hostname of the incoming request.
- Because SNI hostname information on incoming requests does not include
the port number of the request, we add a validation that
OIDCProviderConfigs where the issuer hostnames (not including port
number) are the same must use the same `secretName`.
- Note that this approach does not yet support requests made to an
IP address instead of a hostname. Also note that `localhost` is
considered a hostname by SNI.
- Add port 443 as a container port to the pod spec.
- A new controller watches for TLS secrets and caches them in memory.
That same in-memory cache is used while servicing incoming connections
on the TLS port.
- Make it easy to configure both port 443 and/or port 80 for various
Service types using our ytt templates for the supervisor.
- When deploying to kind, add another nodeport and forward it to the
host on another port to expose our new HTTPS supervisor port to the
host.
- When two different Issuers have the same host (i.e. they differ
only by path) then they must have the same secretName. This is because
it wouldn't make sense for there to be two different TLS certificates
for one host. Find any that do not have the same secret name to
put an error status on them and to avoid serving OIDC endpoints for
them. The host comparison is case-insensitive.
- Issuer hostnames should be treated as case-insensitive, because
DNS hostnames are case-insensitive. So https://me.com and
https://mE.cOm are duplicate issuers. However, paths are
case-sensitive, so https://me.com/A and https://me.com/a are
different issuers. Fixed this in the issuer validations and in the
OIDC Manager's request router logic.
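A sketch of the SNI-based cert selection described above, assuming a hypothetical certForHost lookup into the in-memory cache (keyed by lowercased hostname, since SNI carries no port number):
```
import (
	"crypto/tls"
	"strings"
)

func newTLSConfig(certForHost func(host string) *tls.Certificate, defaultCert *tls.Certificate) *tls.Config {
	return &tls.Config{
		GetCertificate: func(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
			// Hostnames are case-insensitive, so normalize before the lookup.
			if cert := certForHost(strings.ToLower(hello.ServerName)); cert != nil {
				return cert, nil
			}
			return defaultCert, nil // the specially-named default TLS cert Secret, if any
		},
	}
}
```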
This was hidden behind a `pinniped alpha` hidden subcommand, but we're comfortable enough with the CLI flag interface now to promote it.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
When using kind we forward the node's port to the host, so we only
really care about the `nodePort` value. For acceptance clusters,
we put an Ingress in front of a NodePort Service, so we only really
care about the `port` value.
Based on our experiences today with GKE, it will be easier for our users
to configure Ingress health checks if the healthz endpoint is available
on the same public port as the OIDC endpoints.
Also add an integration test for the healthz endpoint now that it is
public.
Also add the optional `containers[].ports.containerPort` to the
supervisor Deployment because the GKE docs say that GKE will look
at that field while inferring how to invoke the health endpoint. See
https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#def_inf_hc
- Not used by any of our integration test clusters yet
- Planning to use it later for the kind clusters and maybe for
the acceptance clusters too (although the acceptance clusters might
not need to use self-signed certs so maybe not)
- It didn't matter before because it would be cleaned up by a
t.Cleanup() function, but now that we might loop twice it will matter
during the second time through the loop
EC keys are smaller and take less time to generate. Our integration
tests were super flakey because generating an RSA key would take up to
10 seconds *gasp*. The main token verifier that we care about is
Kubernetes, which supports P256, so hopefully it won't be that much of
an issue that our default signing key type is EC. The OIDC spec seems
kinda squirmy when it comes to using non-RSA signing algorithms...
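The change amounts to swapping the key generation call, roughly:
```
import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
)

// P-256 key generation is effectively instant, versus potentially several
// seconds for a comparable RSA key.
func newSigningKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}
```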
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
I brought this over because I copied code from work in flight on
another branch. But now the other branch doesn't use this package.
No use bringing on another dependency if we can avoid it.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- When using `local()` in the Tiltfile it will not know
to watch those files for changes, so each time we use
`local()` we now also use `watch_file()`
- As a result, editing a ytt template file now causes
an immediate `kubectl apply` of the results
- New optional ytt value called `into_namespace` means install into that
preexisting namespace rather than creating a new namespace for each app
- Also ensure that every resource that is created statically by our yaml
at install-time by either app is labeled consistently
- Also support adding custom labels to all of those resources from a
new ytt value called `custom_labels`
This will be used for other types of "capabilities" of the test environment besides just those of the test cluster, such as those of an upstream OIDC provider.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Based on the spec, it seems like it's required that OAuth2 servers which do not support PKCE should just ignore the parameters, so this should always work.
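For illustration, the PKCE parameters can always be attached to the authorize request (a sketch using golang.org/x/oauth2; challenge is assumed to be the S256 hash of the verifier):
```
import "golang.org/x/oauth2"

// Servers without PKCE support must ignore unknown parameters, so sending
// these unconditionally is safe per the spec.
func authCodeURL(conf *oauth2.Config, state, challenge string) string {
	return conf.AuthCodeURL(state,
		oauth2.SetAuthURLParam("code_challenge", challenge),
		oauth2.SetAuthURLParam("code_challenge_method", "S256"),
	)
}
```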
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Add install-pinniped-supervisor.yaml and rename install-pinniped.yaml
to install-pinniped-concierge.yaml in the release process and
installation/demo documentation.
- Tiltfile and prepare-for-integration-tests.sh both specify the
NodePort Service using `--data-value-yaml 'service_nodeport_port=31234'`
- Also rename the namespaces used by the Concierge and Supervisor apps
during integration tests running locally
- Also continue renaming things related to the concierge app
- Enhance the uninstall test to also test uninstalling the supervisor
and local-user-authenticator apps
- Variables specific to the concierge include it in their name
- All variables now start with `PINNIPED_TEST_` which makes it clear
that they are for tests and also helps them not conflict with the
env vars that are used in the Pinniped CLI code
- The OIDCProviderConfigWatcherController synchronizes the
OIDCProviderConfig settings to dynamically mount and unmount the
OIDC discovery endpoints for each provider
- Integration test passes but unit tests need to be added still
- Intended to be a red test in this commit; will make it go
green in a future commit
- Enhance env.go and prepare-for-integration-tests.sh to make it
possible to write integration tests for the supervisor app
by setting more env vars and by exposing the service to the kind
host on a localhost port
- Add `--clean` option to prepare-for-integration-tests.sh
to make it easier to start fresh
- Make prepare-for-integration-tests.sh advise you to run
`go test -v -count 1 ./test/integration` because this does
not buffer the test output
- Make concierge_api_discovery_test.go pass by adding expectations
for the new OIDCProviderConfig type
- Note that this avoids committing the demo screencast
file to our git history because it is 5.76 MB. We don't
want to have to download that content on
every `git clone`.
- Instead the file is hosted by GitHub's CDN
This will hopefully come in handy later if we ever decide to add
support for multiple OIDC providers as a part of one supervisor.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Moves the "front matter" to a Markdown comment so you don't necessarily have to delete it.
- Reduces a little bit of boilerplate (this is a bit subjective).
- Tweaks some formatting (also subjective).
- Describe what happens when you use "Fixes [...]".
- Tweak the release note block so it should be easier to parse out automatically (using the same syntax as Kubernetes).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This needs to be overridden for Tilt usage, since the main image referenced by Tilt isn't actually pullable.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Prevent the server binary from lying about its version number
by having it report "?.?.?" as its version number for now.
- Later we can devise a way for CI to inject the version number
for the server into the container image at release time,
not at build time, since the version number is not known
at build time.
- Pre-release builds of the binary from before the release stage or
builds on developer workstations will also report "?.?.?" as their
version number, which is fine since they are not official releases
and shouldn't find their way to the public.
I saw this helper function the other day and wondered if we could use it.
It does indeed look like it does what we want, because when I run this code,
I get `...User "system:anonymous" cannot get resource...`.
```
c := library.NewAnonymousPinnipedClientset(t)
_, err := c.
	ConfigV1alpha1().
	CredentialIssuerConfigs("integration").
	Get(context.Background(), "pinniped-config", metav1.GetOptions{})
t.Log(err)
```
I also ran a similar test using this new helper in the context of
library.NewClientsetWithCertAndKey(). Seemed to get us what we want.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This change replaces our previous test helpers for checking cluster capabilities and passing external test parameters. Prior to this change, we always used `$PINNIPED_*` environment variables and these variables were accessed throughout the test code.
The new code introduces a more strongly-typed `TestEnv` structure and helpers which load and expose the parameters. Tests can now call `env := library.IntegrationEnv(t)`, then access parameters such as `env.Namespace` or `env.TestUser.Token`. This should make this data dependency easier to manage and refactor in the future. In many ways this is just an extended version of the previous cluster capabilities YAML.
Tests can also check for cluster capabilities easily by using `env := library.IntegrationEnv(t).WithCapability(xyz)`.
The actual parameters are still loaded from OS environment variables by default (for compatibility), but the code now also tries to load the data from a Kubernetes Secret (`integration/pinniped-test-env` by default). I'm hoping this will be a more convenient way to pass data between various scripts than the local `/tmp` directory. I hope to remove the OS environment code in a future commit.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This should fix integration tests running on clusters that don't have
visible controller manager pods (e.g., GKE). Pinniped should boot, not
find any controller manager pods, but still post a status in the CIC.
I also updated a test helper so that we could tell the difference
between when an event was not added and when an event was added with
an empty key.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Right now in the YTT templates we assume that the agent pods are gonna use
the same image as the main Pinniped deployment, so we can use the same logic
for the image pull secrets.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This seems to fail on CI when the Concourse workers get slow and
kind stops working reliably. It would be interesting to see the
error message in that case to figure out if there's anything we
could do to make the test more resilient.
Simplifies the implementation, makes it more consistent with other
updaters of the cic (CredentialIssuerConfig), and also retries on
update conflicts
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Only inject things through the constructor that the controller
will need
- Use pkg private constants when possible for things that are not
actually configurable by the user
- Make the agent pod template private to the pkg
- Introduce a test helper to reduce some duplicated test code
- Remove some `it.Focus` lines that were accidentally committed, and
repair the broken tests that they were hiding
I think we want to reconcile on these pod template fields so that if
someone were to redeploy Pinniped with a new image for the agent, the
agent would get updated immediately. Before this change, the agent image
wouldn't get updated until the agent pod was deleted.
This thing is supposed to be used to help our CredentialRequest handler issue certs with a dynamic
CA keypair.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
3 main reasons:
- The cert and key that we store in this object are not always used for TLS.
- The package name "provider" was a little too generic.
- dynamiccert.Provider reads more go-ish than provider.DynamicCertProvider.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Lots of TODOs added that need to be resolved to finish this WIP
- execer_test.go seems like it should be passing, but it fails (sigh)
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Annotations do not have this restriction, so we can put it there instead. This only currently occurs on clusters without the cluster signing capability (GKE).
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This also has fallback compatibility support if no IDP is specified and there is exactly one IDP in the cache.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- All of the `kubecertagent` controllers now take two informers
- This is moving in the direction of creating the agent pods in the
Pinniped installation namespace, but that will come in a future
commit
New resource naming conventions:
- Do not repeat the Kind in the name,
e.g. do not call it foo-cluster-role-binding, just call it foo
- Names will generally start with a prefix to identify our component,
so when a user lists all objects of that kind, they can tell to which
component it is related,
e.g. `kubectl get configmaps` would list one named "pinniped-config"
- It should be possible for an operator to make the word "pinniped"
mostly disappear if they choose, by specifying the app_name in
values.yaml, to the extent that is practical (but not from APIService
names because those are hardcoded in golang)
- Each role/clusterrole and its corresponding binding have the same name
- Pinniped resource names that must be known by the server golang code
are passed to the code at run time via ConfigMap, rather than
hardcoded in the golang code. This also allows them to be prepended
with the app_name from values.yaml while creating the ConfigMap.
- Since the CLI `get-kubeconfig` command cannot guess the name of the
CredentialIssuerConfig resource in advance anymore, it lists all
CredentialIssuerConfig in the app's namespace and returns an error
if there is not exactly one found, and then uses that one regardless
of its name
Eventually we could refactor to remove support for the old APIs, but they are so similar that a single implementation seems to handle both easily.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This is essentially meant to be "v1alpha2" of the existing CredentialRequest API, but since we want to move API groups we can just start over at v1alpha1.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We want to be able to run kind integration tests against the same
versions that we generate code against. There is no public
kindest/node image for 1.17.9, so let's update to the next 1.17.x
version where there is an image: 1.17.11.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This was previously using the unpadded (raw) base64 encoder, which worked sometimes (if the CA happened to be a length that didn't require padding). The correct encoding is the `base64.StdEncoding` one that includes padding.
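The difference, for illustration:
```
import "encoding/base64"

// StdEncoding appends '=' padding; RawStdEncoding omits it, which only decodes
// cleanly when the input length happens to be a multiple of 3.
func encodeCA(caPEM []byte) string {
	return base64.StdEncoding.EncodeToString(caPEM)
}
```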
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The dry-run fails now because we are trying to install a CRD and a custom resource (of that CRD type) in the same step.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I tried to follow the following principles.
- Use existing wordsmithing.
- Only document things that we support today.
- Grab our README.md reader's attention using a picture.
- Use "upstream" when referring to OSS and "external" when referring to
IDP integrations.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Add flag parsing and help messages for root command,
`exchange-credential` subcommand, and new `get-kubeconfig` subcommand
- The new `get-kubeconfig` subcommand is a work in progress in this
commit
- Also add here.Doc() and here.Docf() to enable nice heredocs in
our code
So I looked into other TokenReview webhook implementations, and most
of them just use the json stdlib package to unmarshal/marshal
TokenReview payloads. I'd say let's follow that pattern, even though
it leads to extra fields in the JSON payload (these are not harmful).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Also correct the webhook url setting in prepare-for-integration-tests.sh
- Change the bcrypt count to 10, because 16 is way too slow on old laptops
Signed-off-by: Ryan Richard <richardry@vmware.com>
I also started updating the script to deploy the test-webhook instead of
doing TMC stuff. I think the script should live in this repo so that
Pinniped contributors only need to worry about one repo for running
integration tests.
There are a bunch of TODOs in the script, but I figured this was a good
checkpoint. The script successfully runs on my machine and sets up the
test-webhook and pinniped on a local kind cluster. The integration tests
are failing because of some issue with pinniped talking to the test-webhook,
but this is a step in the right direction.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This script was basically an alias for `./hack/module.sh unittest`. We even
tell people to run the unit tests via module.sh in our contributing doc.
Let's ditch it - the best line of shell code is the one you don't write.
An analogous change was made in CI to use module.sh in place of test-unit.sh.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- For now, build the test-webhook binary in the same container image as
the pinniped-server binary, to make it easier to distribute
- Also fix lots of bugs from the first draft of the test-webhook's
`/authenticate` implementation from the previous commit
- Add a detailed README for the new deploy-test-webhook directory
The webhook still needs to be updated to auto generate its
certificates.
We decided not to give this webhook its own go module for now since
this webhook only pulled in one more dependency, and it is a
dependency that we will most likely need in the future.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- The certs manager controller, along with its sibling certs expirer
and certs observer controllers, are generally useful for any process
that wants to create its own CA and TLS certs, but only if the
updating of the APIService is not included in those controllers
- So that functionality for updating APIServices is moved to a new
controller which watches the same Secret which is used by those
other controllers
- Also parameterize `NewCertsManagerController` with the service name
and the CA common name to make the controller more reusable
- We are not setting an upper limit because Kubernetes might randomly
decide to unschedule our pod in ways that we can't anticipate in
advance, causing very hard to reproduce production bugs.
- We noticed that our app currently uses ~30 MB of memory when idle,
and ~35 MB of memory under some load. So a memory request of 128
MB should be reasonable.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
When we use RSA private keys to sign our test certificates, we run
into strange test timeouts. The internal/controller/apicerts package
was timing out on my machine more than once every 3 runs. When I
changed the RSA crypto to EC crypto, this timeout goes away. I'm not
gonna try to figure out what the deal is here because I think it would
take longer than it would be worth (although I am sure it is some fun
story involving prime numbers; the goroutine traces for timed out
tests would always include some big.Int operations involving prime
numbers...).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
It looks like requests to our aggregated API service on GKE vacillate
between success and failure until they reach a converged successful
state. I think this has to do with our pods updating the API serving
cert at different times. If only one pod updates its serving cert to
the correct value, then it should respond with success. However, the
other pod would respond with failure. Depending on the load balancing
algorithm that GKE uses to send traffic to pods in a service, we could
end up with a success that we interpret as "all pods have rotated
their certs" when it really just means "at least one pod has rotated
its certs."
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Controllers will automatically run again when there's an error,
but when we want to update CredentialIssuerConfig from server.go
we should be careful to retry on conflicts
- Add unit tests for `issuerconfig.CreateOrUpdateCredentialIssuerConfig()`
which was covered by integration tests in previous commits, but not
covered by unit tests yet.
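The conflict-retry part typically looks like the standard client-go pattern (a sketch; the get/update closures here are hypothetical):
```
import "k8s.io/client-go/util/retry"

// Re-read the latest object on every attempt, then re-apply the mutation; a
// Conflict error from update triggers another attempt.
func updateWithRetry(getLatest func() error, update func() error) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		if err := getLatest(); err != nil {
			return err
		}
		return update()
	})
}
```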
So that operators won't look at the lifetime of the CA cert and be
like, "wtf, why does the serving cert have the lifetime that I
specified, but its CA cert is valid for 100 years".
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Upgrade from `1.19.0-rc.0` to the newly-released `1.19.0`.
- Downgrade from `1.18.6` to `1.18.2` to match some downstream consumers.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This should simplify our build/test setup quite a bit, since it means we have only a single module (at the top level) with all hand-written code. I'll leave `module.sh` alone for now but we may be able to simplify that a bit more.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We were using this at one point to control which tests ran with `go test ./...`, but now we're also using the `-short` flag to differentiate unit vs. integration tests.
Hopefully this will simplify things a bit.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
kubectl pulls these in in its main package... I wonder if we should do
the same for our main packages?
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Controller and aggregated API server are allowed to run
- Keep retrying to borrow the cluster signing key in case the failure
to get it was caused by a transient failure
- The CredentialRequest endpoint will always return an authentication
failure as long as the cluster signing key cannot be borrowed
- Update which integration tests are skipped to reflect what should
and should not work based on the cluster's capability under this
new behavior
- Move CreateOrUpdateCredentialIssuerConfig() and related methods
to their own file
- Update the CredentialIssuerConfig's Status every time we try to
refresh the cluster signing key
- Indicate the success or failure of the cluster signing key strategy
- Also introduce the concept of "capabilities" of an integration test
cluster to allow the integration tests to be run against clusters
that do or don't allow the borrowing of the cluster signing key
- Tests that are not expected to pass on clusters that lack the
borrowing of the signing key capability are now ignored by
calling the new library.SkipUnlessClusterHasCapability test helper
- Rename library.Getenv to library.GetEnv
- Add copyrights where they were missing
These never worked quite right, so let's disable them for now: #51
We can probably come up with some better solution now with the new codegen scripts, but I'll leave that for later.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
We are seeing between 1 and 2 minutes of difference between the current time
reported in the API server pod and the pinniped pods on one of our testing
environments. Hopefully this change makes our tests pass again.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Didn't fix CI. I didn't think it would.
I have never seen the integration tests fail like this locally, so I
have to imagine the failure has something to do with the environment
on which we are testing.
This reverts commit ba2e2f509a.
We are getting these weird flakes in CI where the kube client that we
create with these helper functions doesn't work against the kube API.
The kube API tells us that we are unauthorized (401). Seems like something
is wrong with the keypair itself, but when I create a one-off kubeconfig
with the keypair, I get 200s from the API. Hmmm...I wonder what CI will
think of this change?
I also tried to align some naming in this package.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Makes it easier to guess/remember what are the legal arguments
- Also update the output a little to make it easier to tell
when the command has succeeded
- And run tests using `-count 1` because cached test results are not
very trustworthy
These configuration knobs are much more human-understandable than the
previous percentage-based threshold flag.
We now allow users to set the lifetime of the serving cert via a ConfigMap.
Previously this was hardcoded to 1 year.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
We need a way to validate that this generated code is up to date. I added
a long-term engineering TODO for this.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The rotation is forced by a new controller that deletes the serving cert
secret, as other controllers will see this deletion and ensure that a new
serving cert is created.
Note that the integration tests now have an additional worst-case runtime of
60 seconds. This is because of the way that the aggregated API server code
reloads certificates. We will fix this in a future story. Then, the
integration tests should hopefully get much faster.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
This switches us back to an approach where we use the Pod "exec" API to grab the keys we need, rather than forcing our code to run on the control plane node. It will help us fail gracefully (or dynamically switch to alternate implementations) when the cluster is not self-hosted.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Co-authored-by: Ryan Richard <richardry@vmware.com>
We don't want people to run codegen.sh directly, because it is meant
to be driven by hack/module.sh. To discourage this behavior, we will hide
codegen.sh away in hack/lib. I don't think this is actually what the
hack/lib directory is for, though...meh.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Runs code generation on a per-module basis. If `CONTAINED` is not set,
the code generation is run in a container.
Mount point in docker is randomized to simulate Concourse.
Introduce K8S_PKG_VERSION to make room to build different versions
eventually.
- Call the auto-generated /healthz endpoint of our aggregated API server
- Use http for liveness even though tcp seems like it might be
more appropriate, because tcp probes cause TLS handshake errors
to appear in our logs every few seconds
- Use conservative timeouts and retries on the liveness probe to avoid
having our container get restarted when it is temporarily slow due
to running in an environment under resource pressure
- Use less conservative timeouts and retries for the readiness probe
to remove an unhealthy pod from the service less conservatively than
restarting the container
- Tuning the settings for retries and timeouts seem to be a mysterious
art, so these are just a first draft
- It would sometimes fail with this error:
namespaces is forbidden: User "tanzu-user-authentication@groups.vmware.com"
cannot list resource "namespaces" in API group "" at the cluster scope
- Seems like it was because the RBAC rule added by the test needs a
moment before it starts to take effect, so change the test to retry
the API until it succeeds or fail after 3 seconds of trying.
- We want to follow the <noun>Request convention.
- The actual operation does not log in a user, but it does retrieve a
credential with which they can login.
- This commit includes changes to all LoginRequest-related symbols and
constants to try to update their names to follow the new
CredentialRequest type.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
As discussed in API review, this field exists for convenience right
now. Since the username/groups are encoded in the Credential sent in
the LoginRequestStatus, the client still has access to their
user/groups information. We want to remove this for now to be
conservative and limit our API surface area (smaller surface area =
less to maintain). We can always add this back in the future.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- When we call the LoginRequest endpoint in loginrequest_test.go,
do it with an unauthenticated client, to make sure that endpoint works
with unauthenticated clients.
- For tests which want to test using certs returned by LoginRequest to
make API calls back to kube to check if those certs are working, make
sure they start with a bare client and then add only those certs.
Avoid accidentally picking up other kubeconfig configuration like
tokens, etc.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
find(1) seems to look at directory entries in the order in which they exist
in the directory fs entry. Let's sort these so that we get the same results
regardless of the order of the directory entries.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- For high availability reasons, we would like our app to scale linearly
with the size of the control plane. Using a DaemonSet allows us to run
one pod on each node-role.kubernetes.io/master node.
- The hope is that the Service that we create should load balance
between these pods appropriately.
Wow fun times with symlinks. We *think* this script should work in CI
now...but we'll see.
Previously we were seeing a false positive where even though the generated
code was out of date, the CI step did not report failure.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Add integration test for serving cert auto-generation and rotation
- Add unit test for `WithInitialEvent` of the cert manager controller
- Move UpdateAPIService() into the `apicerts` package, since that is
the only user of the function.
- Add a unit test for each cert controller
- Make DynamicTLSServingCertProvider an interface and use a mutex
internally
- Create a shared ToPEM function instead of having two very similar
functions
- Move the ObservableWithInformerOption test helper to testutils
- Rename some variables and imports
- Refactors the existing cert generation code into controllers
which read and write a Secret containing the certs
- Does not add any new functionality yet, e.g. no new handling
for cert expiration, and no leader election to allow for
multiple servers running simultaneously
- This commit also doesn't add new tests for the cert generation
code, but it should be more unit testable now as controllers
- No functional changes
- Move all the stuff about clients and controllers into the controller
package
- Add more comments and organize the code into more helper
functions to make each function smaller
I suppose we could solve this in other ways, but this utility is
only used in one place right now, so it is easiest to copy it over.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Not strictly necessary at the moment because both our build layer
and our run layer are based on alpine, but static linking our binary
will help us later when we want to base our run image on something
closer to scratch
Instead, make the integration tests a separate module. You can't run
these tests by accident because they will not run at all when you
`go test` from the top-level directory. You will need to `cd test`
before using `go test` in order to run the integration tests.
Signed-off-by: Ryan Richard <richardry@vmware.com>
- Previously the golang code would create a Service and an APIService.
The APIService would be given an owner reference which pointed to
the namespace in which the app was installed.
- This prevented the app from being uninstalled. The namespace would
refuse to delete, so `kapp delete` or `kubectl delete` would fail.
- The new approach is to statically define the Service and an APIService
in the deployment.yaml, except for the caBundle of the APIService.
Then the golang code will perform an update to add the caBundle at
runtime.
- When the user uses `kapp deploy` or `kubectl apply` either tool will
notice that the caBundle is not declared in the yaml and will
therefore avoid editing that field.
- When the user uses `kapp delete` or `kubectl delete` either tool will
destroy the objects because they are statically declared with names
in the yaml, just like all of the other objects. There are no
ownerReferences used, so nothing should prevent the namespace from
being deleted.
- This approach also allows us to have less golang code to maintain.
- In the future, if our golang controllers want to dynamically add
an Ingress or other objects, they can still do that. An Ingress
would point to our statically defined Service as its backend.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- We are temporarily adding this change on the main branch so that CI works
with the main branch and we can iterate on our changes on our PR branch.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Why? Because the discovery URL is already there in the kubeconfig; let's
not make our lives more complicated by passing it in via an env var.
- Also allow for ytt callers to not specify data.values.discovery_url - there
are going to be a non-trivial number of installers of placeholder-name
that want to use the server URL found in the cluster-info ConfigMap.
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Seems like the next step is to allow override of the CA bundle; I didn't
do that here for simplicity of the commit, but seems like it is the right
thing to do in the future.
- Also includes bumping the api and client-go dependencies to the newer
version which also moved LoginDiscoveryConfig to the
crds.placeholder.suzerain-io.github.io group in the generated code
Our unit test command is going to get slightly more complex in a future revision. This should let us avoid having to sync the CI pipeline definition so many times.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Also, don't repeat `spec.Parallel()` because, according to the docs
for the spec package, "options are inherited by subgroups and subspecs"
- Two tests are left pending to be filled in on the next commit
- `Before` gives a nice place to call `require.New(t)` to make the assertion lines more terse
- Just delete the keys for testing when env vars are missing
This is kind of a subtle bug, but we were using the unversioned Kubernetes type package here, where we should have been using the v1beta1 version. They have the same fields, but they serialize to JSON differently.
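To illustrate with two hypothetical types (not the actual ones from this commit): identical fields, but only the versioned type carries JSON tags, so the wire formats differ.
```go
package main

import (
	"encoding/json"
	"fmt"
)

// internalConfig mimics an unversioned/internal Kubernetes type: no JSON tags.
type internalConfig struct {
	APIEndpoint string
	CABundle    []byte
}

// v1beta1Config mimics the versioned type: same fields, explicit JSON tags.
type v1beta1Config struct {
	APIEndpoint string `json:"apiEndpoint"`
	CABundle    []byte `json:"caBundle,omitempty"`
}

func main() {
	a, _ := json.Marshal(internalConfig{APIEndpoint: "https://example.test"})
	b, _ := json.Marshal(v1beta1Config{APIEndpoint: "https://example.test"})
	fmt.Println(string(a)) // {"APIEndpoint":"https://example.test","CABundle":null}
	fmt.Println(string(b)) // {"apiEndpoint":"https://example.test"}
}
```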
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The type signatures of these methods make them easy to mix up. `require.Error()` asserts that there is any non-nil error -- the last parameter is an optional human-readable message to log when the assertion fails. `require.EqualError()` asserts that there is a non-nil error _and_ that when you call `err.Error()`, the string matches the expected value. It also takes an additional optional parameter to specify the log message.
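A minimal illustration, with a hypothetical doSomething helper:
```go
package example

import (
	"errors"
	"testing"

	"github.com/stretchr/testify/require"
)

// doSomething is a hypothetical helper that always fails.
func doSomething() error {
	return errors.New("something broke")
}

func TestErrorAssertions(t *testing.T) {
	err := doSomething()

	// require.Error passes for any non-nil error; the trailing string is only
	// a human-readable message printed if the assertion fails.
	require.Error(t, err, "expected doSomething to fail")

	// require.EqualError passes only when err is non-nil AND err.Error()
	// exactly equals the expected string.
	require.EqualError(t, err, "something broke")
}
```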
Signed-off-by: Matt Moyer <moyerm@vmware.com>
Again, no idea why, but this word has two commonly accepted spellings and Go code seems to very consistently use the one with one "l".
Signed-off-by: Matt Moyer <moyerm@vmware.com>
The error was:
```
internal/certauthority/certauthority.go:68:15: err113: do not define dynamic errors, use wrapped static errors instead: "fmt.Errorf(\"expected CA to be a single certificate, found %d certificates\", certCount)" (goerr113)
return nil, fmt.Errorf("expected CA to be a single certificate, found %d certificates", certCount)
^
exit status 1
```
I'm not sure if I love this err113 linter.
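For reference, the shape the linter wants is a static sentinel error wrapped with %w; a sketch (names illustrative):
```go
package certauthority

import (
	"errors"
	"fmt"
)

// A static sentinel error satisfies err113, and callers can match it with
// errors.Is even after wrapping.
var errUnexpectedCertCount = errors.New("expected CA to be a single certificate")

func validateCertCount(certCount int) error {
	if certCount != 1 {
		// Wrap the static error with %w so the dynamic detail (the count) is
		// preserved without defining a new error value at the call site.
		return fmt.Errorf("%w, found %d certificates", errUnexpectedCertCount, certCount)
	}
	return nil
}
```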
It turns out these fields are not meant to be base64 encoded, even though that's how they are in the kubeconfig.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
I think we may still split this apart into multiple packages, but for now it works pretty well in both use cases.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Dynamically grant RBAC permission to the test user to allow them
to make read requests via the API
- Then use the credential returned from the LoginRequest to make a
request back to the API server which should be successful
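Sketched with client-go and testify, using hypothetical helper and client names:
```go
package integration

import (
	"context"
	"testing"

	"github.com/stretchr/testify/require"
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// grantReadAccessAndVerify is a hypothetical helper illustrating the flow:
// grant RBAC to the test user with an admin client, then prove the
// LoginRequest credential works by reading from the API as that user.
func grantReadAccessAndVerify(ctx context.Context, t *testing.T, adminClient, userClient kubernetes.Interface, testUsername string) {
	// Dynamically grant the test user cluster-wide read ("view") permission.
	_, err := adminClient.RbacV1().ClusterRoleBindings().Create(ctx, &rbacv1.ClusterRoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "integration-test-user-can-view"},
		Subjects:   []rbacv1.Subject{{Kind: rbacv1.UserKind, APIGroup: rbacv1.GroupName, Name: testUsername}},
		RoleRef:    rbacv1.RoleRef{Kind: "ClusterRole", APIGroup: rbacv1.GroupName, Name: "view"},
	}, metav1.CreateOptions{})
	require.NoError(t, err)

	// userClient was built from the certificate returned by the LoginRequest;
	// a successful read request proves that credential works.
	_, err = userClient.CoreV1().Pods("default").List(ctx, metav1.ListOptions{})
	require.NoError(t, err)
}
```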
This is a somewhat more basic way to get access to the certificate and private key we need to issue short lived certificates.
The host path, tolerations, and node selector here should work on any kubeadm-derived cluster including TKG-S and Kind.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
This will make manual testing easier and seems like a reasonable tradeoff. We'll iterate more in the future.
Signed-off-by: Matt Moyer <moyerm@vmware.com>
- Mostly testing the way that the validation webhooks are called
- Also error when the auth webhook does not return user info, since we wouldn't know who you are in that case
- Also fix mistakes in the deployment.yaml
- Also hardcode the ownerRef kind and version because otherwise we get an error
Signed-off-by: Monis Khan <mok@vmware.com>
- Users may want to consume pkg/config to generate configuration files.
- This also involved putting config-related utilities in the config
package for ease of consumption.
- We did not add versioning into the Config type for now...this is
something we will likely do in the future, but it is not deemed
necessary this early in the project.
- The config file format tries to follow the patterns of Kube. One such
example of this is requiring the use of base64-encoded CA bundle PEM
bytes instead of a file path. This also slightly simplifies the config
file handling because we don't have to 1) read in a file or 2) deal
with the error case of the file not being there (see the sketch after
this list).
- The webhook code from k8s.io/apiserver is really exactly what we want
here. If this dependency gets too burdensome, we can always drop it,
but the pros outweigh the cons at the moment.
- Writing out a kubeconfig to disk to configure the webhook is a little
janky, but hopefully this won't hurt performance too much in the year
2020.
- Also bonus: call the right *Serve*() function when starting our
servers.
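The base64 convention from above boils down to something like this sketch (function name hypothetical):
```go
package config

import (
	"encoding/base64"
	"fmt"
)

// decodeCABundle is a hypothetical helper showing the convention: the config
// carries base64-encoded PEM bytes inline, so there is no file to read and no
// missing-file error case to handle.
func decodeCABundle(encoded string) ([]byte, error) {
	pemBytes, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, fmt.Errorf("invalid base64 in CA bundle: %w", err)
	}
	return pemBytes, nil
}
```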
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
- Trying to use "placeholder-name" or "placeholder_name" everywhere
that should later be changed to the actual name of the product,
so we can just do a simple search and replace when we have a name.
This type of issue should only be opened if you intend to create a
formal proposal document. Please refer to the proposal process in
[proposals/README.md](proposals/README.md).
Please title this issue starting with `[Proposal]` followed by a
title for what you are going to propose. For example:
`[Proposal] Lunar landing module authentication via Pinniped`.
-->
### Proposal Tracking Issue
- Proposal: <!-- this starts empty, then please update to link to proposal PR, then also link to proposal doc file after it is merged -->
- Discussion Links: <!-- link to any mailing list threads, Slack conversations, community meetings, or other places where the proposal was discussed, if any -->
- <!-- A -->
- <!-- B -->
- Pull requests: <!-- link to all PRs related to this proposal such as updates to the proposal doc, implementation PRs, etc. - keep this list up to date -->
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
# queries: security-extended,security-and-quality
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
  uses: github/codeql-action/autobuild@v2
# ℹ️ Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
# If the Autobuild fails above, remove it and uncomment the following three lines.
# and modify them (or add more) to build your code; if your project needs a custom build, please refer to the EXAMPLE below for guidance.
Below is a list of solutions where Pinniped is being used as a component.
**[Kubeapps](https://kubeapps.com/)**
Kubeapps uses Pinniped to [enable SSO authentication](https://github.com/kubeapps/kubeapps/blob/master/docs/user/using-an-OIDC-provider-with-pinniped.md) when running on clusters where SSO cannot be configured for the cluster API server.
**[VMware Tanzu Kubernetes Grid (TKG)](https://tanzu.vmware.com/kubernetes-grid)**
TKG uses Pinniped to provide a seamless SSO experience across management and workload clusters.
**[VMware Tanzu Mission Control (TMC)](https://tanzu.vmware.com/mission-control)**
TMC uses Pinniped to provide a uniform authentication experience across all attached clusters.
## Adding your organization to the list of adopters
If you are using Pinniped and would like to be included in the list of Pinniped Adopters, add an SVG version of your logo that is less than 150 KB to
the [img directory](https://github.com/vmware-tanzu/pinniped/tree/main/site/themes/pinniped/static/img) in this repo and submit a pull request with your change including 1-2 sentences describing how your organization is using Pinniped. Name the image file something that
reflects your company (e.g., if your company is called Acme, name the image acme.svg). Please feel free to send us a message in [#pinniped](https://kubernetes.slack.com/archives/C01BW364RJA) with any questions you may have.
We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.
Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when an individual is officially representing the community in public spaces. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at [oss-coc@vmware.com](mailto:oss-coc@vmware.com). All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of actions.
**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0,
available at https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at https://www.contributor-covenant.org/translations.
and tag it with `proposal`, or create a new [Discussion](https://github.com/vmware-tanzu/pinniped/discussions).
The project [maintainers](MAINTAINERS.md) will work with you on your feature request.
Once the feature request has been validated, a [pull request](https://github.com/vmware-tanzu/pinniped/compare)
can be opened to implement the feature.
For specifics on what to include in your feature request, please follow the
guidelines in the issue and pull request templates.
### Reporting security vulnerabilities
Please follow the procedure described in [SECURITY.md](SECURITY.md).
## CLA
We welcome contributions from everyone, but we can only accept them if you sign
our Contributor License Agreement (CLA). If you would like to contribute and you
have not signed it, our CLA-bot will walk you through the process when you open
a Pull Request. For questions about the CLA process, see the
[FAQ](https://cla.vmware.com/faq) or submit a question through the GitHub issue
tracker.
## Learning about Pinniped
New to Pinniped?
- Start here to learn how to install and use Pinniped: [Learn to use Pinniped for federated authentication to Kubernetes clusters](https://pinniped.dev/docs/tutorials/concierge-and-supervisor-demo/)
- Start here to learn how to navigate the source code: [Code Walk-through](https://pinniped.dev/docs/reference/code-walkthrough/)
- Other more detailed documentation can be found at: [Pinniped Docs](https://pinniped.dev/docs/)
## Building
The [Dockerfile](Dockerfile) at the root of the repo can be used to build and
package the server-side code. After making a change to the code, rebuild the
docker image with the following command.
```bash
# From the root directory of the repo...
docker build .
```
The Pinniped CLI client can be built for local use with the following command.
```bash
# From the root directory of the repo...
go build -o pinniped ./cmd/pinniped
```
## Testing
### Running Lint
```bash
./hack/module.sh lint
```
### Running Unit Tests
```bash
./hack/module.sh units
```
### Running Integration Tests
1. Install dependencies:
- [`docker`](https://www.docker.com/)
- `htpasswd` (installed by default on macOS; usually found in the `apache2-utils` package on Linux)
Pinniped provides identity services to Kubernetes.
- Easily plug in external identity providers into Kubernetes clusters while offering a simple install and configuration experience. Leverage first class integration with Kubernetes and kubectl command-line.
- Give users a consistent, unified login experience across all your clusters, including on-premises and managed cloud environments.
- Securely integrate with an enterprise IDP using standard protocols or use secure, externally managed identities instead of relying on simple, shared credentials.
To learn more, please visit the Pinniped project's website, https://pinniped.dev.
## Getting started with Pinniped
Care to kick the tires? It's easy to [install and try Pinniped](https://pinniped.dev/docs/).
## Discussion
Got a question, comment, or idea? Please don't hesitate to reach out
via GitHub [Discussions](https://github.com/vmware-tanzu/pinniped/discussions),
Pinniped provides identity services for Kubernetes clusters. The community has adopted this security disclosure and response policy to ensure we responsibly handle critical issues.
## Supported Versions
As of right now, only the latest version of Pinniped is supported.
## Reporting a Vulnerability - Private Disclosure Process
Security is of the highest importance and all security vulnerabilities or suspected security vulnerabilities should be reported to Pinniped privately, to minimize attacks against current users of Pinniped before they are fixed. Vulnerabilities will be investigated and patched on the next patch (or minor) release as soon as possible. This information could be kept entirely internal to the project.
If you know of a publicly disclosed security vulnerability for Pinniped, please **IMMEDIATELY** contact the VMware Security Team (security@vmware.com). The use of encrypted email is encouraged. The public PGP key can be found at https://kb.vmware.com/kb/1055.
**IMPORTANT: Do not file public issues on GitHub for security vulnerabilities**
To report a vulnerability or a security-related issue, please contact the VMware email address with the details of the vulnerability. The email will be fielded by the VMware Security Team and then shared with the Pinniped maintainers who have committer and release permissions. Emails will be addressed within 3 business days, including a detailed plan to investigate the issue and any potential workarounds to perform in the meantime. Do not report non-security-impacting bugs through this channel. Use [GitHub issues](https://github.com/vmware-tanzu/pinniped/issues/new/choose) instead.
## Proposed Email Content
Provide a descriptive subject line and in the body of the email include the following information:
* Basic identity information, such as your name and your affiliation or company.
* Detailed steps to reproduce the vulnerability (POC scripts, screenshots, and logs are all helpful to us).
* Description of the effects of the vulnerability on Pinniped and the related hardware and software configurations, so that the VMware Security Team can reproduce it.
* How the vulnerability affects Pinniped usage and an estimation of the attack surface, if there is one.
* List other projects or dependencies that were used in conjunction with Pinniped to produce the vulnerability.
## When to report a vulnerability
* When you think Pinniped has a potential security vulnerability.
* When you suspect a potential vulnerability but you are unsure that it impacts Pinniped.
* When you know of or suspect a potential vulnerability on another project that is used by Pinniped.
## Patch, Release, and Disclosure
The VMware Security Team will respond to vulnerability reports as follows:
1. The Security Team will investigate the vulnerability and determine its effects and criticality.
2. If the issue is not deemed to be a vulnerability, the Security Team will follow up with a detailed reason for rejection.
3. The Security Team will initiate a conversation with the reporter within 3 business days.
4. If a vulnerability is acknowledged and the timeline for a fix is determined, the Security Team will work on a plan to communicate with the appropriate community, including identifying mitigating steps that affected users can take to protect themselves until the fix is rolled out.
5. The Security Team will also create a [CVSS](https://www.first.org/cvss/specification-document) score using the [CVSS Calculator](https://www.first.org/cvss/calculator/3.0). The Security Team makes the final call on the calculated CVSS; it is better to move quickly than to make the CVSS perfect. Issues may also be reported to [Mitre](https://cve.mitre.org/) using this [scoring calculator](https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator). The CVE will initially be set to private.
6. The Security Team will work on fixing the vulnerability and perform internal testing before preparing to roll out the fix.
7. The Security Team will provide early disclosure of the vulnerability by emailing the [Pinniped Distributors](https://groups.google.com/g/project-pinniped-distributors) mailing list. Distributors can initially plan for the vulnerability patch ahead of the fix, and later can test the fix and provide feedback to the Pinniped team. See the section **Early Disclosure to Pinniped Distributors List** for details about how to join this mailing list.
8. A public disclosure date is negotiated by the VMware Security Team, the bug submitter, and the distributors list. We prefer to fully disclose the bug as soon as possible once a user mitigation or patch is available. It is reasonable to delay disclosure when the bug or the fix is not yet fully understood, the solution is not well-tested, or for distributor coordination. The timeframe for disclosure is from immediate (especially if it's already publicly known) to a few weeks. For a critical vulnerability with a straightforward mitigation, we expect the time from report to public disclosure to be on the order of 14 business days. The VMware Security Team holds the final say when setting a public disclosure date.
9. Once the fix is confirmed, the Security Team will patch the vulnerability in the next patch or minor release, and backport a patch release into all earlier supported releases. Upon release of the patched version of Pinniped, we will follow the **Public Disclosure Process**.
## Public Disclosure Process
The Security Team publishes a [public advisory](https://github.com/vmware-tanzu/pinniped/security/advisories) to the Pinniped community via GitHub. In most cases, additional communication via Slack, Twitter, mailing lists, blog and other channels will assist in educating Pinniped users and rolling out the patched release to affected users.
The Security Team will also publish any mitigating steps users can take until the fix can be applied to their Pinniped instances. Pinniped distributors will handle creating and publishing their own security advisories.
## Mailing lists
* Use security@vmware.com to report security concerns to the VMware Security Team, who uses the list to privately discuss security issues and fixes prior to disclosure. The use of encrypted email is encouraged. The public PGP key can be found at https://kb.vmware.com/kb/1055.
* Join the [Pinniped Distributors](https://groups.google.com/g/project-pinniped-distributors) mailing list for early private information and vulnerability disclosure. Early disclosure may include mitigating steps and additional information on security patch releases. See below for information on how Pinniped distributors or vendors can apply to join this list.
## Early Disclosure to Pinniped Distributors List
The private list is intended to be used primarily to provide actionable information to multiple distributor projects at once. This list is not intended to inform individuals about security issues.
## Membership Criteria
To be eligible to join the [Pinniped Distributors](https://groups.google.com/g/project-pinniped-distributors) mailing list, you should:
1. Be an active distributor of Pinniped.
2. Have a user base that is not limited to your own organization.
3. Have a publicly verifiable track record up to the present day of fixing security issues.
4. Not be a downstream or rebuild of another distributor.
5. Be a participant and active contributor in the Pinniped community.
6. Accept the Embargo Policy that is outlined below.
7. Have someone who is already on the list vouch for the person requesting membership on behalf of your distribution.
**The terms and conditions of the Embargo Policy apply to all members of this mailing list. A request for membership represents your acceptance to the terms and conditions of the Embargo Policy.**
## Embargo Policy
The information that members receive on the Pinniped Distributors mailing list must not be made public, shared, or even hinted at anywhere beyond those who need to know within your specific team, unless you receive explicit approval to do so from the VMware Security Team. This remains true until the public disclosure date/time agreed upon by the list. Members of the list and others cannot use the information for any reason other than to get the issue fixed for your respective distribution's users.
Before you share any information from the list with members of your team who are required to fix the issue, these team members must agree to the same terms, and only be provided with information on a need-to-know basis.
In the unfortunate event that you share information beyond what is permitted by this policy, you must urgently inform the VMware Security Team (security@vmware.com) of exactly what information was leaked and to whom. If you continue to leak information and break the policy outlined here, you will be permanently removed from the list.
## Requesting to Join
Send new membership requests to https://groups.google.com/g/project-pinniped-distributors. In the body of your request please specify how you qualify for membership and fulfill each criterion listed in the Membership Criteria section above.
## Confidentiality, integrity and availability
We consider vulnerabilities leading to the compromise of data confidentiality, elevation of privilege, or integrity to be our highest priority concerns. Availability, in particular in areas relating to DoS and resource exhaustion, is also a serious security concern. The VMware Security Team takes all vulnerabilities, potential vulnerabilities, and suspected vulnerabilities seriously and will investigate them in an urgent and expeditious manner.
Short:"Generate a Pinniped-based kubeconfig for a cluster",
SilenceUsage:true,// do not print usage message when commands fail
}
flagsgetKubeconfigParams
namespacestring// unused now
)
f:=cmd.Flags()
f.StringVar(&flags.staticToken,"static-token","","Instead of doing an OIDC-based login, specify a static token")
f.StringVar(&flags.staticTokenEnvName,"static-token-env","","Instead of doing an OIDC-based login, read a static token from the environment")
f.BoolVar(&flags.concierge.disabled,"no-concierge",false,"Generate a configuration which does not use the Concierge, but sends the credential to the cluster directly")
f.StringVar(&namespace,"concierge-namespace","pinniped-concierge","Namespace in which the Concierge was installed")
f.StringVar(&flags.concierge.credentialIssuer,"concierge-credential-issuer","","Concierge CredentialIssuer object to use for autodiscovery (default: autodiscover)")
f.StringVar(&flags.concierge.authenticatorType,"concierge-authenticator-type","","Concierge authenticator type (e.g., 'webhook', 'jwt') (default: autodiscover)")
f.StringVar(&flags.concierge.authenticatorName,"concierge-authenticator-name","","Concierge authenticator name (default: autodiscover)")
f.StringVar(&flags.concierge.apiGroupSuffix,"concierge-api-group-suffix",groupsuffix.PinnipedDefaultSuffix,"Concierge API group suffix")
f.BoolVar(&flags.concierge.skipWait,"concierge-skip-wait",false,"Skip waiting for any pending Concierge strategies to become ready (default: false)")
f.Var(&flags.concierge.caBundle,"concierge-ca-bundle","Path to TLS certificate authority bundle (PEM format, optional, can be repeated) to use when connecting to the Concierge")
f.StringVar(&flags.concierge.endpoint,"concierge-endpoint","","API base for the Concierge endpoint")
f.Var(&flags.concierge.mode,"concierge-mode","Concierge mode of operation")
f.StringVar(&flags.oidc.clientID,"oidc-client-id",oidcapi.ClientIDPinnipedCLI,"OpenID Connect client ID (default: autodiscover)")
f.Uint16Var(&flags.oidc.listenPort,"oidc-listen-port",0,"TCP port for localhost listener (authorization code flow only)")
f.StringSliceVar(&flags.oidc.scopes,"oidc-scopes",[]string{oidcapi.ScopeOfflineAccess,oidcapi.ScopeOpenID,oidcapi.ScopeRequestAudience,oidcapi.ScopeUsername,oidcapi.ScopeGroups},"OpenID Connect scopes to request during login")
f.BoolVar(&flags.oidc.skipBrowser,"oidc-skip-browser",false,"During OpenID Connect login, skip opening the browser (just print the URL)")
f.StringVar(&flags.oidc.sessionCachePath,"oidc-session-cache","","Path to OpenID Connect session cache file")
f.Var(&flags.oidc.caBundle,"oidc-ca-bundle","Path to TLS certificate authority bundle (PEM format, optional, can be repeated)")
f.BoolVar(&flags.oidc.debugSessionCache,"oidc-debug-session-cache",false,"Print debug logs related to the OpenID Connect session cache")
f.StringVar(&flags.oidc.requestAudience,"oidc-request-audience","","Request a token with an alternate audience using RFC8693 token exchange")
f.StringVar(&flags.oidc.upstreamIDPName,"upstream-identity-provider-name","","The name of the upstream identity provider used during login with a Supervisor")
f.StringVar(&flags.oidc.upstreamIDPType,"upstream-identity-provider-type","",fmt.Sprintf("The type of the upstream identity provider used during login with a Supervisor (e.g. '%s', '%s', '%s')",idpdiscoveryv1alpha1.IDPTypeOIDC,idpdiscoveryv1alpha1.IDPTypeLDAP,idpdiscoveryv1alpha1.IDPTypeActiveDirectory))
f.StringVar(&flags.oidc.upstreamIDPFlow,"upstream-identity-provider-flow","",fmt.Sprintf("The type of client flow to use with the upstream identity provider during login with a Supervisor (e.g. '%s', '%s')",idpdiscoveryv1alpha1.IDPFlowCLIPassword,idpdiscoveryv1alpha1.IDPFlowBrowserAuthcode))
f.StringVar(&flags.kubeconfigPath,"kubeconfig",os.Getenv("KUBECONFIG"),"Path to kubeconfig file")
f.StringVar(&flags.kubeconfigContextOverride,"kubeconfig-context","","Kubeconfig context name (default: current active context)")
f.BoolVar(&flags.skipValidate,"skip-validation",false,"Skip final validation of the kubeconfig (default: false)")
f.DurationVar(&flags.timeout,"timeout",10*time.Minute,"Timeout for autodiscovery and validation")
f.StringVar(&flags.generatedNameSuffix,"generated-name-suffix","-pinniped","Suffix to append to generated cluster, context, user kubeconfig entries")
f.StringVar(&flags.credentialCachePath,"credential-cache","","Path to cluster-specific credentials cache")
f.StringVar(&flags.installHint,"install-hint","The pinniped CLI does not appear to be installed. See https://get.pinniped.dev/cli for more details","This text is shown to the user when the pinniped CLI is not installed.")
mustMarkHidden(cmd,"oidc-debug-session-cache")
// --oidc-skip-listen is mainly needed for testing. We'll leave it hidden until we have a non-testing use case.
returnnil,fmt.Errorf("multiple authenticators were found, so the --concierge-authenticator-type/--concierge-authenticator-name flags must be specified")
cmd.Flags().Uint16Var(&flags.listenPort,"listen-port",0,"TCP port for localhost listener (authorization code flow only)")
cmd.Flags().StringSliceVar(&flags.scopes,"scopes",[]string{oidcapi.ScopeOfflineAccess,oidcapi.ScopeOpenID,oidcapi.ScopeRequestAudience,oidcapi.ScopeUsername,oidcapi.ScopeGroups},"OIDC scopes to request during login")
cmd.Flags().BoolVar(&flags.skipBrowser,"skip-browser",false,"Skip opening the browser (just print the URL)")
cmd.Flags().BoolVar(&flags.skipListen,"skip-listen",false,"Skip starting a localhost callback listener (manual copy/paste flow only)")
cmd.Flags().StringVar(&flags.sessionCachePath,"session-cache",filepath.Join(mustGetConfigDir(),"sessions.yaml"),"Path to session cache file")
cmd.Flags().StringSliceVar(&flags.caBundlePaths,"ca-bundle",nil,"Path to TLS certificate authority bundle (PEM format, optional, can be repeated)")
cmd.Flags().StringSliceVar(&flags.caBundleData,"ca-bundle-data",nil,"Base64 encoded TLS certificate authority bundle (base64 encoded PEM format, optional, can be repeated)")
cmd.Flags().BoolVar(&flags.debugSessionCache,"debug-session-cache",false,"Print debug logs related to the session cache")
cmd.Flags().StringVar(&flags.requestAudience,"request-audience","","Request a token with an alternate audience using RFC8693 token exchange")
cmd.Flags().BoolVar(&flags.conciergeEnabled,"enable-concierge",false,"Use the Concierge to login")
cmd.Flags().StringVar(&conciergeNamespace,"concierge-namespace","pinniped-concierge","Namespace in which the Concierge was installed")
cmd.Flags().StringVar(&flags.conciergeAuthenticatorType,"concierge-authenticator-type","","Concierge authenticator type (e.g., 'webhook', 'jwt')")
cmd.Flags().StringVar(&flags.conciergeEndpoint,"concierge-endpoint","","API base for the Concierge endpoint")
cmd.Flags().StringVar(&flags.conciergeCABundle,"concierge-ca-bundle-data","","CA bundle to use when connecting to the Concierge")
cmd.Flags().StringVar(&flags.conciergeAPIGroupSuffix,"concierge-api-group-suffix",groupsuffix.PinnipedDefaultSuffix,"Concierge API group suffix")
cmd.Flags().StringVar(&flags.credentialCachePath,"credential-cache",filepath.Join(mustGetConfigDir(),"credentials.yaml"),"Path to cluster-specific credentials cache (\"\" disables the cache)")
cmd.Flags().StringVar(&flags.upstreamIdentityProviderName,"upstream-identity-provider-name","","The name of the upstream identity provider used during login with a Supervisor")
cmd.Flags().StringVar(&flags.upstreamIdentityProviderType,"upstream-identity-provider-type",idpdiscoveryv1alpha1.IDPTypeOIDC.String(),fmt.Sprintf("The type of the upstream identity provider used during login with a Supervisor (e.g. '%s', '%s', '%s')",idpdiscoveryv1alpha1.IDPTypeOIDC,idpdiscoveryv1alpha1.IDPTypeLDAP,idpdiscoveryv1alpha1.IDPTypeActiveDirectory))
cmd.Flags().StringVar(&flags.upstreamIdentityProviderFlow,"upstream-identity-provider-flow","",fmt.Sprintf("The type of client flow to use with the upstream identity provider during login with a Supervisor (e.g. '%s', '%s')",idpdiscoveryv1alpha1.IDPFlowBrowserAuthcode,idpdiscoveryv1alpha1.IDPFlowCLIPassword))
// --skip-listen is mainly needed for testing. We'll leave it hidden until we have a non-testing use case.
Use "pinniped get kubeconfig" to generate a kubeconfig file which includes this
login command in its configuration. This login command is not meant to be
invoked directly by a user.
This login command is a Kubernetes client-go credential plugin which is meant to
be configured inside a kubeconfig file. (See the Kubernetes authentication
documentation for more information about client-go credential plugins.)
Usage:
oidc --issuer ISSUER [flags]
Flags:
--ca-bundle strings Path to TLS certificate authority bundle (PEM format, optional, can be repeated)
--ca-bundle-data strings Base64 encoded TLS certificate authority bundle (base64 encoded PEM format, optional, can be repeated)
--client-id string OpenID Connect client ID (default "pinniped-cli")
--concierge-api-group-suffix string Concierge API group suffix (default "pinniped.dev")
--concierge-authenticator-name string Concierge authenticator name
--concierge-authenticator-type string Concierge authenticator type (e.g., 'webhook', 'jwt')
--concierge-ca-bundle-data string CA bundle to use when connecting to the Concierge
--concierge-endpoint string API base for the Concierge endpoint
--credential-cache string Path to cluster-specific credentials cache ("" disables the cache) (default "`+cfgDir+`/credentials.yaml")
--enable-concierge Use the Concierge to login
-h, --help help for oidc
--issuer string OpenID Connect issuer URL
--listen-port uint16 TCP port for localhost listener (authorization code flow only)
--request-audience string Request a token with an alternate audience using RFC8693 token exchange
--scopes strings OIDC scopes to request during login (default [offline_access,openid,pinniped:request-audience,username,groups])
--session-cache string Path to session cache file (default "`+cfgDir+`/sessions.yaml")
--skip-browser Skip opening the browser (just print the URL)
--upstream-identity-provider-flow string The type of client flow to use with the upstream identity provider during login with a Supervisor (e.g. 'browser_authcode', 'cli_password')
--upstream-identity-provider-name string The name of the upstream identity provider used during login with a Supervisor
--upstream-identity-provider-type string The type of the upstream identity provider used during login with a Supervisor (e.g. 'oidc', 'ldap', 'activedirectory') (default "oidc")
`),
},
{
	name:      "missing required flags",
	args:      []string{},
	wantError: true,
	wantStderr: here.Doc(`
		Error: required flag(s) "issuer" not set
	`),
},
{
	name: "missing concierge flags",
	args: []string{
		"--client-id", "test-client-id",
		"--issuer", "test-issuer",
		"--enable-concierge",
	},
	wantError: true,
	wantStderr: here.Doc(`
		Error: invalid Concierge parameters: endpoint must not be empty
	`),
},
{
	name: "invalid CA bundle path",
	args: []string{
		"--client-id", "test-client-id",
		"--issuer", "test-issuer",
		"--ca-bundle", "./does/not/exist",
	},
	wantError: true,
	wantStderr: here.Doc(`
		Error: could not read --ca-bundle: open ./does/not/exist: no such file or directory
	`),
},
{
	name: "invalid CA bundle data",
	args: []string{
		"--client-id", "test-client-id",
		"--issuer", "test-issuer",
		"--ca-bundle-data", "invalid-base64",
	},
	wantError: true,
	wantStderr: here.Doc(`
		Error: could not read --ca-bundle-data: illegal base64 data at input byte 7
		Error: invalid Concierge parameters: invalid API group suffix: a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
	`),
},
{
	name: "invalid upstream type is an error",
	args: []string{
		"--issuer", "test-issuer",
		"--upstream-identity-provider-type", "invalid",
	},
	wantError: true,
	wantStderr: here.Doc(`
		Error: --upstream-identity-provider-type value not recognized: invalid (supported values: oidc, ldap, activedirectory)
	`),
},
{
	name: "invalid upstream type when flow override env var is used is still an error",
	name: "oidc upstream type with unsupported flow is an error",
	args: []string{
		"--issuer", "test-issuer",
		"--client-id", "test-client-id",
		"--upstream-identity-provider-type", "oidc",
		"--upstream-identity-provider-flow", "foobar",
		"--credential-cache", "", // must specify --credential-cache or else the cache file on disk causes test pollution
	},
	wantError: true,
	wantStderr: here.Doc(`
		Error: --upstream-identity-provider-flow value not recognized for identity provider type "oidc": foobar (supported values: browser_authcode, cli_password)
	`),
},
{
	name: "oidc upstream type with unsupported flow in flow override env var is an error",
		Error: PINNIPED_UPSTREAM_IDENTITY_PROVIDER_FLOW value not recognized for identity provider type "oidc": foo (supported values: browser_authcode, cli_password)
	`),
},
{
	name: "ldap upstream type with default flow is allowed",
	args: []string{
		"--issuer", "test-issuer",
		"--client-id", "test-client-id",
		"--upstream-identity-provider-type", "ldap",
		"--credential-cache", "", // must specify --credential-cache or else the cache file on disk causes test pollution
	name: "ldap upstream type with unsupported flow is an error",
	args: []string{
		"--issuer", "test-issuer",
		"--client-id", "test-client-id",
		"--upstream-identity-provider-type", "ldap",
		"--upstream-identity-provider-flow", "foo",
		"--credential-cache", "", // must specify --credential-cache or else the cache file on disk causes test pollution
	},
	wantError: true,
	wantStderr: here.Doc(`
		Error: --upstream-identity-provider-flow value not recognized for identity provider type "ldap": foo (supported values: cli_password, browser_authcode)
	`),
},
{
	name: "ldap upstream type with unsupported flow in flow override env var is an error",
		Error: PINNIPED_UPSTREAM_IDENTITY_PROVIDER_FLOW value not recognized for identity provider type "ldap": foo (supported values: cli_password, browser_authcode)
	`),
},
{
	name: "active directory upstream type with CLI flow is allowed",
		"--credential-cache", "", // must specify --credential-cache or else the cache file on disk causes test pollution
	},
	wantError: true,
	wantStderr: here.Doc(`
		Error: --upstream-identity-provider-flow value not recognized for identity provider type "activedirectory": foo (supported values: cli_password, browser_authcode)
	`),
},
{
	name: "active directory upstream type with unsupported flow in flow override env var is an error",
		Error: PINNIPED_UPSTREAM_IDENTITY_PROVIDER_FLOW value not recognized for identity provider type "activedirectory": foo (supported values: cli_password, browser_authcode)
	`),
},
{
	name: "login error",
	args: []string{
		"--client-id", "test-client-id",
		"--issuer", "test-issuer",
		"--credential-cache", "", // must specify --credential-cache or else the cache file on disk causes test pollution
	},
	loginErr:         fmt.Errorf("some login error"),
	wantOptionsCount: 4,
	wantError:        true,
	wantStderr: here.Doc(`
		Error: could not complete Pinniped login: some login error
cmd.Flags().StringVar(&flags.conciergeEndpoint, "concierge-endpoint", "", "API base for the Concierge endpoint")
cmd.Flags().StringVar(&flags.conciergeCABundle, "concierge-ca-bundle-data", "", "CA bundle to use when connecting to the Concierge")
cmd.Flags().StringVar(&flags.conciergeAPIGroupSuffix, "concierge-api-group-suffix", groupsuffix.PinnipedDefaultSuffix, "Concierge API group suffix")
cmd.Flags().StringVar(&flags.credentialCachePath, "credential-cache", filepath.Join(mustGetConfigDir(), "credentials.yaml"), "Path to cluster-specific credentials cache (\"\" disables the cache)")
Error: invalid Concierge parameters: invalid API group suffix: a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
wantStderr: "Error: could not complete WhoAmIRequest (is the Pinniped WhoAmI API running and healthy?): whoamirequests.identity.concierge.pinniped.dev \"whatever\" not found\n",
// Copyright 2020-2022 the Pinniped contributors. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0

package main

import (
	"os"

	"github.com/pkg/browser"

	"go.pinniped.dev/cmd/pinniped/cmd"
	// this side effect import ensures that we use fipsonly crypto in fips_strict mode.
	_ "go.pinniped.dev/internal/crypto/ptls"
)

//nolint:gochecknoinits
func init() {
	// browsers like chrome like to write to our std out which breaks our JSON ExecCredential output
	// thus we redirect the browser's std out to our std err
	browser.Stdout = os.Stderr
}

func main() {
	if err := cmd.Execute(); err != nil {
		os.Exit(1)
	}
}