utils/base64.cc had some strange code, with a misleading comment, in
base64_begins_with().
The code had:
    base.substr(operand.size() - 4, operand.size())
The comment claims that this is the "last 4 bytes of base64-encoded string",
but this comment is misleading: operand is typically shorter than base
(that is the whole point of base64_begins_with()), so the real
intention of the code is not to find the *last* 4 bytes of base, but rather
the *next* four bytes after the first (operand.size() - 4) bytes, which we
already copied. These four bytes may need the full power of
base64_decode_string() because they may or may not contain padding.
But if we really want the next 4 bytes, why pass operand.size() as the
length of the substring? operand.size() is at least 4 (it's a multiple of
4, and if it's 0 we returned earlier), but it could be more. We don't
need more; we just need 4. It's not really wrong to take more than 4 (so
this patch doesn't *fix* any bug), but it can be wasteful. So this code
should be:
    base.substr(operand.size() - 4, 4)
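For illustration, here is a minimal self-contained sketch of the prefix check using the fixed substring length. The names decode_b64() and begins_with_b64() are hypothetical stand-ins for base64_decode_string() and base64_begins_with(). This is not the code in utils/base64.cc, and the sketch assumes both inputs are padded base64 strings.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical minimal decoder for padded base64 (a stand-in for
// base64_decode_string()). Throws std::invalid_argument on bad input.
std::string decode_b64(std::string_view in) {
    static const std::array<int, 256> table = [] {
        std::array<int, 256> t{};
        t.fill(-1);  // -1 marks bytes outside the base64 alphabet
        const std::string_view alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (std::size_t i = 0; i < alphabet.size(); ++i) {
            t[static_cast<unsigned char>(alphabet[i])] = static_cast<int>(i);
        }
        return t;
    }();
    if (in.size() % 4 != 0) {
        throw std::invalid_argument("length is not a multiple of 4");
    }
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 4) {
        std::uint32_t acc = 0;
        int pad = 0;
        for (std::size_t j = 0; j < 4; ++j) {
            const unsigned char c = static_cast<unsigned char>(in[i + j]);
            // '=' is only accepted in positions 2 and 3 of the final group.
            if (c == '=' && j >= 2 && i + 4 == in.size()) {
                ++pad;
                acc <<= 6;
                continue;
            }
            if (pad > 0 || table[c] < 0) {
                throw std::invalid_argument("invalid base64 character");
            }
            acc = (acc << 6) | static_cast<std::uint32_t>(table[c]);
        }
        out.push_back(static_cast<char>((acc >> 16) & 0xFF));
        if (pad < 2) out.push_back(static_cast<char>((acc >> 8) & 0xFF));
        if (pad < 1) out.push_back(static_cast<char>(acc & 0xFF));
    }
    return out;
}

// Hypothetical: returns true if decode_b64(base) starts with
// decode_b64(operand), without decoding all of `base`.
bool begins_with_b64(std::string_view base, std::string_view operand) {
    if (operand.size() % 4 != 0 || base.size() % 4 != 0 ||
        operand.size() > base.size()) {
        return false;
    }
    if (operand.empty()) {
        return true;  // every string begins with the empty string
    }
    const std::size_t full = operand.size() - 4;
    // Groups before the operand's last one carry no padding, so they decode
    // to the same bytes iff their encoded characters are identical.
    if (base.substr(0, full) != operand.substr(0, full)) {
        return false;
    }
    // Only the *next* 4 characters of base are needed: length 4,
    // not operand.size().
    const std::string base_group = decode_b64(base.substr(full, 4));
    const std::string operand_group = decode_b64(operand.substr(full));
    return base_group.compare(0, operand_group.size(), operand_group) == 0;
}
```

For example, "aGVsbG8gd29ybGQ=" ("hello world") begins with "aGVsbA==" ("hell"). The first group "aGVs" is compared as text, and then "bG8g" ("lo ") is decoded and checked against the decoded "bA==" ("l").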
We already have a test in test/boost/alternator_unit_test.cc,
test_base64_begins_with, that takes encoded base64 strings up to 12
characters in length (corresponding to decoded strings up to 8 chars)
and substrings from length 0 to the base string's length, and checks
that base64_begins_with() succeeds.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#25712
This patch fixes an error-path bug in the base-64 decoding code in
utils/base64.cc, which among other things is used in Alternator to decode
blobs in JSON requests.
The base-64 decoding code has a lookup table, which was wrongly sized at 255
bytes but needed to be 256 bytes. This meant that if the byte 255 (0xFF)
was included in an invalid base-64 string, the code did not detect that
this is an invalid byte (the only valid bytes in a base-64 string are
A-Z, a-z, 0-9, +, / and =). Instead, it would either treat the byte as
valid with a nonsense 6-bit value, or even crash on an out-of-bounds read.
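To illustrate why 256 entries are needed, here is a minimal sketch of such a table. It assumes the table is indexed by the input byte converted to unsigned char. The names decode_table and sextet_of() are hypothetical, not the ones in utils/base64.cc.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string_view>

// One table entry per possible byte value, i.e. indices 0..UCHAR_MAX.
// With 8-bit bytes this is 256; a 255-entry table has no slot for 0xFF.
constexpr std::size_t table_size = UCHAR_MAX + 1;

// Sentinel for bytes outside the base64 alphabet (including '=', which
// this sketch assumes is handled separately as padding).
constexpr int invalid = -1;

const std::array<int, table_size> decode_table = [] {
    std::array<int, table_size> t{};
    t.fill(invalid);
    constexpr std::string_view alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    for (std::size_t i = 0; i < alphabet.size(); ++i) {
        t[static_cast<unsigned char>(alphabet[i])] = static_cast<int>(i);
    }
    return t;
}();

// Returns the 6-bit value of a base64 character, or `invalid`.
// Converting to unsigned char first keeps the index in 0..UCHAR_MAX even on
// platforms where plain char is signed.
int sextet_of(char c) {
    return decode_table[static_cast<unsigned char>(c)];
}
```

With this layout, sextet_of(static_cast<char>(0xFF)) reads the table's last slot and returns the sentinel, rather than reading past the end of the table.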
Besides the trivial fix, this patch also includes a reproducing test,
which tries to write a blob as a supposedly base-64 encoded string with
a 0xFF byte in it. The test fails before this patch (the write succeeds,
unexpectedly), and passes after this patch (the write fails as
expected). The test also passes on DynamoDB.
Fixes #25701
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#25705
Add helpers for base64url encoding.
base64url is a variant of base64 that uses a URL-safe alphabet. It can
be constructed from base64 by replacing the '+' and '/' characters with
'-' and '_' respectively. Many implementations also strip the padding,
although this is not required by the spec [1].
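Here is a minimal sketch of the conversion described above, assuming the input is already valid base64 (or base64url). The names to_base64url() and from_base64url() are hypothetical and are not necessarily the helpers added by this patch.

```cpp
#include <string>
#include <string_view>

// Converts standard base64 to base64url: '+' -> '-', '/' -> '_',
// and strips trailing '=' padding (optional per RFC 4648, section 5).
std::string to_base64url(std::string_view b64) {
    std::string out(b64);
    for (char& c : out) {
        if (c == '+') c = '-';
        else if (c == '/') c = '_';
    }
    while (!out.empty() && out.back() == '=') {
        out.pop_back();
    }
    return out;
}

// Converts base64url (padded or not) back to standard padded base64, so it
// can be passed to an ordinary base64 decoder. Malformed lengths are left
// for that decoder to reject.
std::string from_base64url(std::string_view url) {
    std::string out(url);
    for (char& c : out) {
        if (c == '-') c = '+';
        else if (c == '_') c = '/';
    }
    // Restore padding up to the next multiple of 4 characters.
    while (out.size() % 4 != 0) {
        out.push_back('=');
    }
    return out;
}
```

For example, the bytes 0xFB 0xFF encode to "+/8=" in base64 and to "-_8" in unpadded base64url.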
This will be used in upcoming patches for Azure Key Vault requests that
require base64url-encoded payloads.
[1] https://datatracker.ietf.org/doc/html/rfc4648#section-5
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
`seastar/core/print.hh` includes `seastar/core/format.hh`, and in addition
to `seastar::format()`, it also provides helpers like `seastar::fprint()` and
`seastar::print()`, which are deprecated and not used by scylladb.
Previously, we included `seastar/core/print.hh` to use
`seastar::format()`. In seastar 5b04939e, we extracted
`seastar::format()` into `seastar/core/format.hh`. This allows us
to include a much smaller header.
In this change, we just include `seastar/core/format.hh` in place of
`seastar/core/print.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21574
We already fixed the case of missing padding, but there is also a
more generic one where the input to the decode function contains
non-base64 characters.
This is mostly done for Alternator's sake: it should reject a request
containing such data and return an HTTP 400 error.
Additionally, a harmless integer overflow during integer casting is
fixed here. Commit 2d33a3f attempted to fix it, but since we also
implicitly cast to uint8_t, the problem persisted.
This is done to bring Alternator's behavior more on a par with DynamoDB.
The decode function is used there when processing user requests containing
binary item values. We will now reject improperly formed user input with an
HTTP 400 error.
It also makes the code more consistent, as some of our other base64 functions
may have assumed that padding is present.
The patch should not break other users of the base64 functions, as the only
other one is in db/hints, where the code already throws std::runtime_error.
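Here is a minimal sketch of the stricter input check, assuming a validation pass that runs before decoding. The names is_valid_base64() and status_for_blob() are hypothetical. In particular, status_for_blob() only models the "reject with HTTP 400" behavior and is not Alternator's actual request path.

```cpp
#include <cstddef>
#include <string_view>

// Returns true for characters of the standard base64 alphabet.
bool is_b64_alphabet(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/';
}

// Strict check: the length is a multiple of 4 (so missing padding is
// rejected), '=' appears only as one or two trailing characters, and every
// other character is in the alphabet.
bool is_valid_base64(std::string_view s) {
    if (s.size() % 4 != 0) {
        return false;
    }
    std::size_t pad = 0;
    if (!s.empty() && s.back() == '=') {
        pad = (s[s.size() - 2] == '=') ? 2 : 1;
    }
    for (std::size_t i = 0; i + pad < s.size(); ++i) {
        if (!is_b64_alphabet(s[i])) {
            return false;  // e.g. '*', a stray '=', or byte 0xFF
        }
    }
    return true;
}

// Hypothetical request-level mapping: malformed blobs are rejected with 400.
int status_for_blob(std::string_view encoded) {
    return is_valid_base64(encoded) ? 200 : 400;
}
```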
Fixes #6487
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
Closes #9937
The base64 encoding/decoding functions will be used for serialization of
hint sync point descriptions. The base64 format is not specific to
Alternator, so the code can be moved to utils.