14 Commits

Author SHA1 Message Date
Kefu Chai
7215d4bfe9 utils: do not include unused headers
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-14 07:56:39 -05:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
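A minimal sketch of what an always-enabled assertion macro can look like. This is not the actual definition in utils/assert.hh; `ALWAYS_ASSERT` and `checked_divide` are hypothetical names used only for illustration. The point is that the check does not consult NDEBUG at all.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for an always-on assertion macro. It never looks at
// NDEBUG, so the check survives in release builds.
#define ALWAYS_ASSERT(cond)                                           \
    do {                                                              \
        if (!(cond)) {                                                \
            std::fprintf(stderr, "%s:%d: assertion failed: %s\n",     \
                         __FILE__, __LINE__, #cond);                  \
            std::abort();                                             \
        }                                                             \
    } while (0)

// Example caller (hypothetical): the divisor check runs in every build mode.
inline int checked_divide(int a, int b) {
    ALWAYS_ASSERT(b != 0);
    return a / b;
}
```

With a macro of this shape, defining NDEBUG later only affects the remaining plain assert() calls, for example those in third-party headers.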

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Kefu Chai
e42d83dc46 treewide: include used headers
before this change, we rely on `seastar/util/std-compat.hh` to
include the used headers provided by the standard library. this was
necessary before we moved to a C++20 compliant standard library
implementation. but since Seastar has dropped C++17 support, its
`seastar/util/std-compat.hh` is not responsible for providing these
headers anymore.

so, in this change, we include the used headers directly instead
of relying on `seastar/util/std-compat.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18883
2024-05-27 17:34:38 +03:00
Kefu Chai
a1dcddd300 utils: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16833
2024-01-18 12:50:06 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
a55b434a2b treewide: extend copyright statements to present day
2021-06-06 19:18:49 +03:00
Tomasz Grabiec
7747f2dde3 Merge "nodetool toppartitions" from Rafi & Avi
Implementation of nodetool toppartitions query, which samples most frequent PKs in read/write
operations over a period of time.

Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
  interfaces with data_listeners and the REST api),
- REST api for toppartitions query.

Uses Top-k structure for handling stream summary statistics (based on implementation in C*, see #2811).

What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).

Fixes #2811

* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
  top_k: whitespace and minor fixes
  top_k: map template arguments
  top_k: std::list -> chunked_vector
  top_k: support for appending top_k results
  nodetool toppartitions: refactor table::config constructor
  nodetool toppartitions: data listeners
  nodetool toppartitions: add data_listeners to database/table
  nodetool toppartitions: fully_qualified_cf_name
  nodetool toppartitions: Toppartitions query implementation
  nodetool toppartitions: Toppartitions query REST API
  nodetool toppartitions: nodetool-toppartitions script
2018-12-28 16:31:24 +01:00
Rafi Einstein
eda43b93c9 top_k: support for appending top_k results
Allow appending results of one top_k into another.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:56 +02:00
Rafi Einstein
aeebe8e86b top_k: std::list -> chunked_vector
Replaced std::list with chunked_vector. Because chunked_vector requires
a noexcept move constructor from its value type, change the bad_boy type
in the unit test not to throw in the move constructor.
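
A sketch of the property involved, assuming the requirement is the standard nothrow-move-constructible trait. Whether chunked_vector enforces it with a static_assert or in some other way is not stated here; `throwing_move` and `nothrow_move` are hypothetical types, not the bad_boy type from the unit test.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical element type whose move constructor may throw.
struct throwing_move {
    throwing_move() = default;
    throwing_move(throwing_move&&) noexcept(false) {
        throw std::runtime_error("move failed");
    }
};

// Hypothetical element type with a noexcept move constructor.
struct nothrow_move {
    nothrow_move() = default;
    nothrow_move(nothrow_move&&) noexcept {}
};

// A container that needs non-throwing moves could reject the first type
// at compile time with a check along these lines.
static_assert(std::is_nothrow_move_constructible<nothrow_move>::value,
              "nothrow_move can be moved without throwing");
static_assert(!std::is_nothrow_move_constructible<throwing_move>::value,
              "throwing_move does not promise a non-throwing move");
```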

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:07 +02:00
Rafi Einstein
533e46ac72 top_k: map template arguments
Added Hash and KeyEqual template arguments to enable unordered_map in top_k implementation.
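
A sketch of how Hash and KeyEqual parameters are typically forwarded to std::unordered_map. `key_counter`, `ci_hash` and `ci_equal` are hypothetical names for illustration and do not reflect the actual top_k interface.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical counter that forwards Hash and KeyEqual to the map it owns.
template <typename T,
          typename Hash = std::hash<T>,
          typename KeyEqual = std::equal_to<T>>
class key_counter {
public:
    void add(const T& key) { ++_counts[key]; }

    std::size_t count(const T& key) const {
        auto it = _counts.find(key);
        return it == _counts.end() ? 0 : it->second;
    }

private:
    std::unordered_map<T, std::size_t, Hash, KeyEqual> _counts;
};

// Example custom functors: case-insensitive hashing and comparison.
struct ci_hash {
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) {
            h = h * 31 + static_cast<std::size_t>(std::tolower(c));
        }
        return h;
    }
};

struct ci_equal {
    bool operator()(const std::string& a, const std::string& b) const {
        if (a.size() != b.size()) {
            return false;
        }
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(a[i])) !=
                std::tolower(static_cast<unsigned char>(b[i]))) {
                return false;
            }
        }
        return true;
    }
};
```

Keys that the functors treat as equal then share a single counter, e.g. with `key_counter<std::string, ci_hash, ci_equal>`.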

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-20 16:41:40 +02:00
Rafi Einstein
75f21954d4 top_k: whitespace and minor fixes
Style and minor logic changes from code review.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-20 16:41:33 +02:00
Benny Halevy
dcd18e2b62 remove exec permission from top_k source files
This was introduced by 32525f2694

Cc: Rafi Einstein <rafie@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>
2018-11-21 18:38:50 +02:00
Rafi Einstein
32525f2694 Space-Saving Top-k algorithm for handling stream summary statistics
Based on the following implementation ([2]) for the Space-Saving algorithm from [1].
[1] http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf
[2] https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java

The algorithm keeps a map between keys seen and their counts, keeping a bound on the number of tracked keys.
Replacement policy evicts the key with the lowest count while inheriting its count, and recording an estimation
of the error which results from that.
This error estimation can be later used to prove if the distribution we arrived at corresponds to the real top-K,
which we can display alongside the results.
Accuracy depends on the number of tracked keys.
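
The replacement policy described above can be sketched roughly as follows. This is an illustration only, not the top_k code from this commit: `space_saving_sketch` is a hypothetical class that uses a linear scan to find the minimum, where a real implementation would use a more efficient structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, simplified Space-Saving counter (linear-scan eviction).
class space_saving_sketch {
public:
    struct counter {
        std::uint64_t count = 0;  // estimated occurrences, may overestimate
        std::uint64_t error = 0;  // upper bound on that overestimation
    };

    explicit space_saving_sketch(std::size_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    void add(const std::string& key) {
        auto it = _counters.find(key);
        if (it != _counters.end()) {
            ++it->second.count;  // already tracked: just count it
            return;
        }
        if (_counters.size() < _capacity) {
            _counters.emplace(key, counter{1, 0});  // room left: exact count
            return;
        }
        // Full: evict the key with the lowest count.
        auto victim = _counters.begin();
        for (auto i = _counters.begin(); i != _counters.end(); ++i) {
            if (i->second.count < victim->second.count) {
                victim = i;
            }
        }
        // The newcomer inherits the evicted count; that inherited part is
        // recorded as the possible error of the new estimate.
        const std::uint64_t inherited = victim->second.count;
        _counters.erase(victim);
        _counters.emplace(key, counter{inherited + 1, inherited});
    }

    // Returns nullptr when the key is not currently tracked.
    const counter* find(const std::string& key) const {
        auto it = _counters.find(key);
        return it == _counters.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return _counters.size(); }

private:
    std::size_t _capacity;
    std::unordered_map<std::string, counter> _counters;
};
```

In this sketch the true frequency of a tracked key lies between `count - error` and `count`, which is what allows a reported ranking to be checked against the guaranteed lower bounds.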

Introduced as part of 'nodetool toppartition' query implementation.

Refs #2811
Message-Id: <20181027220937.58077-1-rafie@scylladb.com>
2018-10-28 10:10:28 +02:00