Commit Graph

18679 Commits

Author SHA1 Message Date
Piotr Jastrzebski
f711cce024 sstables: Handle empty counter value in read path
Due to a bug in an sstable writer, empty counters
were stored without a header.

The correct way of storing an empty counter is to still write
a header that indicates the emptiness.

Next patch in this series fixes the write path
but we have to make sure that we handle incorrectly
serialized counters in the read path because there
may exist sstables with counters stored without header.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-05-21 12:07:12 +02:00
Avi Kivity
d92973ba86 Merge "scylla-gdb.py: scylla_fiber: add fallback mode" from Botond
"
Add a fallback-mode that can be used when the `scylla ptr` cannot be
used, either because the application is not built with the seastar
allocator, or due to bugs. The fallback mode relies on a more primitive
method for determining how much memory to scan looking for task pointers
inside the task object. This mode, being more primitive, is less prone
to errors, but is more wasteful and less precise.
"

* 'scylla-fiber-fallback-mode/v2' of https://github.com/denesb/scylla:
  scylla-gdb.py: scylla_fiber: add fallback mode
  scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used()
  scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers
  scylla-gdb.py: scylla_fiber: fix misaligned text in docstring
2019-05-19 18:34:55 +03:00
Takuya ASADA
4b08a3f906 reloc/python3: add license files on relocatable python3 package
It's better to have license files on our python3 distribution.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190516094329.13273-1-syuu@scylladb.com>
2019-05-19 18:30:19 +03:00
Jesse Haber-Kucharsky
68353a8265 build: Don't build iotune unconditionally
We compile Seastar unconditionally so that changes to Seastar files are
reflected in Scylla when it's built.

We don't need to unconditionally build `iotune` in the same way.

`iotune` is still listed as a build artifact, so it will be built if
`ninja` is invoked without a particular target.

However, building a specific target (like `ninja build/dev/scylla`) will
not build `iotune`.

Fixes #4165

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <9fb96a281580a8743e04d5dd11398be53960cb58.1558100815.git.jhaberku@scylladb.com>
2019-05-19 18:24:05 +03:00
Avi Kivity
5a276d44af Merge "row_cache: Make invalidate() preemptible" from Tomasz
"
This patchset fixes reactor stalls caused by cache invalidation not being preemptible.
This becomes a problem when there are a lot of partitions in cache inside the invalidated range.

This affects high-level operations like nodetool refresh, table
truncation, repair and streaming.

Fixes #2683

The improvement on stalls was measured using tests/perf_row_cache_update:

  Before:

    Small partitions, no overwrites:
    invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]}
    Small partition with a few rows:
    invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]}
    Large partition, lots of small rows:
    invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]}

  After:

    Small partitions, no overwrites:
    invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]}
    Small partition with a few rows:
    invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]}
    Large partition, lots of small rows:
    invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]}

The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).

Tests:
  - unit (dev)
"

* tag 'cache-preemptible-invalidation-v2' of github.com:tgrabiec/scylla:
  row_cache: Make invalidate() preemptible
  row_cache: Switch _prev_snapshot_pos to be a ring_position_ext
  dht: Introduce ring_position_ext
  dht: ring_position_view: Take key by const pointer
  tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion
  tests: perf_row_cache_update: Report stalls around invalidation
2019-05-19 10:47:46 +03:00
Takuya ASADA
f625284113 dist/debian: apply product name variable on override_dh_auto_install
To make product name templatization work correctly, we cannot use
"debian/scylla-server" as the package contents directory path;
we need to use a template like "debian/{{product}}-server" instead.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190517121946.18248-1-syuu@scylladb.com>
2019-05-19 10:46:08 +03:00
Gleb Natapov
31bf4cfb5e cache_hitrate_calculator: make cache hitrate calculation preemptable
The calculation is done in a non-preemptable loop over all tables, so if
the number of tables is very large it may take a while, since we also build
a string for gossiper state. Make the loop preemptable and also make
the string calculation more efficient by preallocating memory for it.
Message-Id: <20190516132748.6469-3-gleb@scylladb.com>
2019-05-16 15:32:36 +02:00
Gleb Natapov
4517c56a57 cache_hitrate_calculator: do not copy stats map for each cpu
invoke_on_all() copies provided function for each shard it is executed
on, so by moving stats map into the capture we copy it for each shard
too. Avoid it by putting it into the top level object which is already
captured by reference.
Message-Id: <20190516132748.6469-2-gleb@scylladb.com>
2019-05-16 15:32:24 +02:00
Dejan Mircevski
8dcb35913a table: Avoid needless allocation of cell lockers
All `table` instances currently unconditionally allocate a cell locker
for counter cells, though not all need one.  Since the lockers occupy
quite a bit of memory (as reported in #4441), it's wasteful to
allocate them when unneeded.

Fixes #4441.

Tests: unit (dev, debug)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190515190910.87931-1-dejan@scylladb.com>
2019-05-16 11:10:38 +03:00
Avi Kivity
5b2c8847c7 Merge "Pre timestamp based data segregation cleanup" from Botond
"
This series contains loosely related generic cleanup patches that the
timestamp based data segregation series depends on. Most of the patches
have to do with making headers self-sustainable, that is compilable on
their own. This was needed to be able to ensure that the new headers
introduced or touched by that series are self-sustainable too.
This series also introduces `schema_fwd.hh` which contains a forward
declaration of `schema` and `schema_ptr` classes. No effort was made to
find and replace all existing ad-hoc schema forward declarations in the
source tree.
"

* 'pre-timestamp-based-data-segregation-cleanup/v1' of https://github.com/denesb/scylla:
  encoding_stats.hh: add missing include
  sstables/time_window_compaction_strategy.hh: make self-sufficient
  sstables/size_tiered_compaction_strategy.hh: make self-sufficient
  sstables/compaction_strategy_impl.hh: make header self-sufficient
  compaction_strategy.hh: use schema_fwd.hh
  db/extensions.hh: use schema_fwd.hh
  Add schema_fwd.hh
2019-05-15 17:37:06 +03:00
Asias He
51c4f8cc47 repair: Fix use after free in remove_repair_meta for repair_metas
We should capture repair_metas so that it will not be freed until the
parallel_for_each is finished.

Fixes: #4333
Tests: repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
Message-Id: <237b20a359122a639330f9f78c67568410aef014.1557922403.git.asias@scylladb.com>
2019-05-15 17:22:51 +03:00
Calle Wilund
e7003f1051 sstable: Make all sstable components subject to file extensions
Makes opening all sstable components go through same file open
routine, optionally applying extensions to each (except TOC which
is special).

Also ensures we read Scylla metadata before other non-TOC
components, as we might need this for extensions (hint hint).

Message-Id: <20190513201821.14417-1-calle@scylladb.com>
2019-05-15 17:14:58 +03:00
Botond Dénes
a0010f52c5 scylla-gdb.py: scylla_fiber: add fallback mode
The current implementation of the `scylla fiber` command relies on the
`scylla ptr` command to provide metadata on pointers, more
specifically the boundaries of the region the object they point to
occupies. However, in debug mode, seastar is using the standard allocator
and thus the `scylla ptr` command doesn't work.
To work around this, provide a fallback mode for debug builds. This mode
assumes pointers point to the start of objects and scans a
configurable region of memory. While less exact than the variant relying
on `scylla ptr` it still works reasonably well.
The size of the to-be-scanned memory region can be set using the
`--scanned-region-size` command line argument. This defaults to 512.

Additionally, add a flag (`--force-fallback-mode`) to force using the
fallback mode. This is useful if `scylla ptr` is not working for any
reason.
2019-05-15 15:46:42 +03:00
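The fallback scan described above can be illustrated with a minimal sketch. It is not the scylla-gdb.py code: `scan_for_task_pointers`, the `known_tasks` set and the flat `memory` buffer are hypothetical stand-ins, and only the 512-byte default comes from the commit message. The idea is to walk a fixed-size region word by word and report the words that match a known task address.

```python
import struct

def scan_for_task_pointers(memory, start, known_tasks, region_size=512):
    """Scan region_size bytes from start for 8-byte words that are known task addresses.

    memory is a bytes object standing in for the inferior's memory (an assumption
    made for illustration; a debugger would read the memory instead).
    """
    found = []
    end = min(start + region_size, len(memory))
    # Step through the region one little-endian 64-bit word at a time.
    for off in range(start, end - 7, 8):
        (word,) = struct.unpack_from("<Q", memory, off)
        if word in known_tasks:
            found.append((off, word))
    return found
```

A smaller `region_size` scans less memory but may miss pointers located further into the object, which mirrors the precision trade-off the commit message mentions.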
Botond Dénes
c78d667153 scylla-gdb.py: scylla_ptr: add is_seastar_allocator_used()
Determines whether the application is using the seastar allocator or
not. This is done by attempting to resolve the
`seastar::memory::cpu_mem` symbol. To avoid the expensive symbol lookup
the result is cached. This means that loading a new inferior will
possibly return the wrong value. The cache can be flushed by re-sourcing
the `scylla-gdb.py` script.
2019-05-15 15:44:38 +03:00
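A rough sketch of the caching behaviour described above, assuming the symbol lookup is passed in as a plain callable; `AllocatorProbe`, `lookup_symbol` and `flush` are hypothetical names for illustration, not the script's API. It shows why a cached answer can go stale when a new inferior is loaded.

```python
class AllocatorProbe:
    """Caches whether a marker symbol resolves; the lookup is injected for illustration."""

    def __init__(self, lookup_symbol, symbol="seastar::memory::cpu_mem"):
        self._lookup = lookup_symbol
        self._symbol = symbol
        self._cached = None  # None means "not probed yet"

    def is_seastar_allocator_used(self):
        if self._cached is None:
            # The expensive lookup runs once; later calls reuse the result,
            # even if the debugged program has changed since.
            self._cached = self._lookup(self._symbol) is not None
        return self._cached

    def flush(self):
        """Forget the cached answer (the commit achieves this by re-sourcing the script)."""
        self._cached = None
```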
Botond Dénes
c3a06da8fb scylla-gdb.py: pointer_metadata: allow constructing from non-seastar pointers
2019-05-15 15:43:34 +03:00
Botond Dénes
4964671e83 scylla-gdb.py: scylla_fiber: fix misaligned text in docstring
2019-05-15 15:43:29 +03:00
Avi Kivity
8e19121e98 Merge "Implement simple selection alongside aggregation" from Dejan
"
Although CQL allows SELECT statements with both simple and aggregate
selectors, Scylla disallows them.  This patch removes that restriction
and ensures that mixed simple/aggregate selection works as specified
both with and without GROUP BY.

Tests: unit (dev)
"

* 'aggregate-and-simple-select-together' of https://github.com/dekimir/scylla:
  cql: Fix mixed selection with GROUP BY
  cql: Allow mixing of aggregate and simple selectors
2019-05-14 20:03:58 +03:00
Dejan Mircevski
f9b00a4318 cql: Fix mixed selection with GROUP BY
GROUP BY is currently supported by simple_selection, the class used
when all selectors are simple.  But when selectors are mixed, we use
selection_with_processing, which does not yet support GROUP BY.  This
patch fixes that.

It also adapts one testcase in filtering_test to the new behavior of
simple_selector.  The test currently expects the last value seen, but
simple_selector now outputs the first value seen.

(More details: the WHERE clause implicitly selects the columns it
references, and unit tests are forced to provide expected values for
these columns.  The user-visible result is unchanged in the test;
users never see the WHERE column values due to filtering in
cql::transport, outside unit tests.)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 12:50:39 -04:00
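The grouping semantics described above can be sketched as follows. This is an illustration only, not Scylla code: `select_grouped`, the dictionary rows and the choice of a sum as the aggregate are all assumptions. Within each group the simple selector keeps the first value seen, while the aggregate folds over every row.

```python
def select_grouped(rows, group_col, simple_col, agg_col):
    """Return one (group, first simple value, sum) tuple per group, in first-seen order."""
    groups = {}
    for row in rows:
        key = row[group_col]
        if key not in groups:
            # Simple selector: remember the first value seen in the group.
            groups[key] = [row[simple_col], 0]
        # Aggregate selector (a sum, chosen for illustration): fold every row.
        groups[key][1] += row[agg_col]
    return [(key, first, total) for key, (first, total) in groups.items()]
```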
Dejan Mircevski
06e3b36164 cql: Allow mixing of aggregate and simple selectors
Scylla currently rejects SELECT statements with both simple and
aggregate selectors, but Cassandra allows them.  This patch brings
parity to Scylla.

Fixes #4447.

Tests: unit (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2019-05-14 10:34:02 -04:00
Botond Dénes
fe3b798b51 scylla-gdb.py: scylla fiber: add seastar::smp_message_queue::async_work_item to the whitelist
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4c49fcf5391e027eae68707c9e6ab2f9188c2ea4.1557838171.git.bdenes@scylladb.com>
2019-05-14 17:09:32 +03:00
Avi Kivity
82b91c1511 Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.
Refs #4485.
"

* tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
  db/schema_tables: Move feed_hash_for_schema_digest() to .cc file
  hashing: Introduce type-erased interface for the hasher
  hashing: Introduce C++ concept for the hasher
  hashers: Rename hasher to cryptopp_hasher
  gc_clock: Fix hashing to be backwards-compatible
2019-05-14 16:59:50 +03:00
Tomasz Grabiec
285ada5035 Merge "config: remove _make_config_values macro" from Avi
The _make_config_values macro reduces duplication (both the item name
and the types need to be available as C++ identifiers and as runtime
strings), but is hard to work with. The macro is huge and editors
don't handle it well, errors aren't identified at the correct
location, and since the macro doesn't have types, it's hard to
refactor.

This series replaces the macro with ordinary C++ code. Some repetition is
introduced, but IMO the result is easier to maintain than the macro. As a
bonus the bulk of the code is moved away from the header file.

Tests: unit (dev), manual testing of the config REST API

* https://github.com/avikivity/scylla config-no-macro/v2
  config: make the named_value type name available without requiring
    _make_config_values
  config: remove value_status from named_value template parameter list
  config: add named_value::value_as_json()
  api: config: stop using _make_config_values
  config: auto-add named_values into config_file
  config: add allowed_values parameter to named_value constructor
  config: convert _make_config_values to individual named_value member
    declarations and initializers
2019-05-14 16:00:23 +03:00
Avi Kivity
987739898f docs: document SSTable Scylla.db component
Document the format and meaning of the various bits of the Scylla.db component.
Message-Id: <20190513081605.7394-1-avi@scylladb.com>
2019-05-14 16:00:23 +03:00
Avi Kivity
786ce70dfc doc: mention the Slack workspace as a place to get help
Message-Id: <20190514090420.5598-1-avi@scylladb.com>
2019-05-14 16:00:23 +03:00
Botond Dénes
c2ec78358b encoding_stats.hh: add missing include
2019-05-14 13:27:30 +03:00
Botond Dénes
eeacf45b4a sstables/time_window_compaction_strategy.hh: make self-sufficient
2019-05-14 13:27:30 +03:00
Botond Dénes
9953cecc83 sstables/size_tiered_compaction_strategy.hh: make self-sufficient
2019-05-14 13:27:30 +03:00
Botond Dénes
d02c2253a5 sstables/compaction_strategy_impl.hh: make header self-sufficient
Add missing includes and forward declarations. De-inline some methods.
2019-05-14 13:27:30 +03:00
Botond Dénes
20d9d18ab3 compaction_strategy.hh: use schema_fwd.hh
2019-05-14 13:27:30 +03:00
Botond Dénes
690ef09b8f db/extensions.hh: use schema_fwd.hh
2019-05-14 13:27:30 +03:00
Botond Dénes
48bf1d5629 Add schema_fwd.hh
2019-05-14 13:27:30 +03:00
Tomasz Grabiec
6159d5522d tests: Add test which verifies that schema digest stays the same
(cherry picked from commit 8019634dba)
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
815295547d tests: Add sstables for the schema digest test
Generated by running test_schema_digest_does_not_change with
regenerate set to true.

(cherry picked from commit 1f2995c8c5)
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
9de071d214 schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
Schema digest is calculated by querying for mutations of all schema
tables, then compacting them so that all tombstones in them are
dropped. However, even if the mutation becomes empty after compaction,
we still feed its partition key. If the same mutations were compacted
prior to the query, because the tombstones expire, we won't get any
mutation at all and won't feed the partition key. So schema digest
will change once an empty partition of some schema table is compacted
away.

That's not a problem during normal cluster operation because the
tombstones will expire at all nodes at the same time, and schema
digest, although it changes, will change to the same value on all nodes
at about the same time.

This fix changes digest calculation to not feed any digest for
partitions which are empty after compaction.

The digest returned by schema_mutations::digest() is left unchanged by
this patch. It affects the table schema version calculation. It's not
changed because the version is calculated on boot, where we don't yet
know all the cluster features. It's possible to fix this but it's more
complicated, so this patch defers that.

Refs #4485.

Asd
2019-05-14 10:43:06 +02:00
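A minimal sketch of the digest change described above, assuming partitions are modelled as `(key, rows)` pairs after compaction. `schema_digest`, the `skip_empty` flag and the use of SHA-256 are illustrative assumptions, not the actual implementation. It shows that skipping empty partitions makes the digest the same whether or not the empty partition is still present.

```python
import hashlib

def schema_digest(partitions, skip_empty=True):
    """Digest an iterable of (key: bytes, rows: list of bytes) pairs."""
    h = hashlib.sha256()
    for key, rows in partitions:
        if skip_empty and not rows:
            # An empty partition contributes nothing, as if it were absent.
            continue
        h.update(key)
        for row in rows:
            h.update(row)
    return h.hexdigest()
```

With `skip_empty=False` the sketch reproduces the old behaviour, where the key of an empty partition is still fed into the hash.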
Tomasz Grabiec
3a4a903674 db/schema_tables: Move feed_hash_for_schema_digest() to .cc file
2019-05-14 10:43:06 +02:00
Tomasz Grabiec
b0eecdcb8f hashing: Introduce type-erased interface for the hasher
The motivation is to allow hiding the definition of functions
accepting a hasher. For one, this reduces (re)compilation times,
because we can put the definition in .cc
2019-05-14 10:43:06 +02:00
Avi Kivity
1cf72b39a5 Merge "Unbreak the Unbreakable Linux" from Glauber
"
scylla_setup is currently broken for OEL. This happens because the
OS detection code checks for RHEL and Fedora. CentOS returns itself
as RHEL, but OEL does not.
"

* 'unbreakable' of github.com:glommer/scylla:
  scylla_setup: be nicer about unrecognized OS
  scylla_util: recognize OEL as part of the RHEL family
2019-05-13 21:38:21 +03:00
Glauber Costa
3b64727244 scylla_setup: be nicer about unrecognized OS
Right now if the user tries to execute this in an unrecognized OS, the
following will be thrown:

  Traceback (most recent call last):
   File "/usr/lib/scylla/libexec/scylla_setup", line 214, in <module>
     do_verify_package('scylla-enterprise-jmx')
   File "/usr/lib/scylla/libexec/scylla_setup", line 73, in do_verify_package
     if res != 0:
  UnboundLocalError: local variable 'res' referenced before assignment

It would be a lot nicer to exit gracefully and print a message saying what
is going on. This was caught when running on OEL, which the previous patch
fixed. Still, there are other unknown OS out there the users may try to run
on.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-05-13 14:31:49 -04:00
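The traceback above comes from a variable that is assigned only in some branches. A minimal sketch of the pattern and one way to fail clearly instead, assuming a hypothetical `verify_package` helper with an injected `check_installed` callable (neither is taken from scylla_setup):

```python
def verify_package(pkg, distro, check_installed):
    """Return True if pkg is installed; raise a clear error for an unknown distro."""
    if distro == "redhat":
        res = check_installed(["rpm", "-q", pkg])
    elif distro == "debian":
        res = check_installed(["dpkg", "-s", pkg])
    else:
        # Without this branch, res would be unbound in the comparison below.
        raise RuntimeError("unsupported distribution: %s" % distro)
    return res == 0
```

A caller could catch the `RuntimeError`, print its message and exit with a non-zero status rather than showing a traceback.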
Glauber Costa
6c15ae5b36 scylla_util: recognize OEL as part of the RHEL family
Oracle Linux is a RHEL-like distribution and we support it just fine, but our
new incarnation of scylla_setup is failing to recognize it.

os-release for OEL is a bit different. It doesn't have an ID_LIKE string, and
only shows an ID string, which is set to 'ol'. So let's recognize this.

Fixes: #4493
Branches: 3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-05-13 14:31:38 -04:00
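The detection described above can be sketched as a check over both `ID` and `ID_LIKE`. This is a simplified illustration: `parse_os_release`, `is_redhat_variant` and the set of family names are assumptions, not the scylla_util code.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip("\"'")
    return info

def is_redhat_variant(info):
    """Hypothetical check: RHEL-like if ID or ID_LIKE names a known family member."""
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    # 'ol' covers Oracle Linux, which per the commit has no ID_LIKE entry.
    return bool(ids & {"rhel", "fedora", "centos", "ol"})
```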
Tomasz Grabiec
77fb34821b row_cache: Make invalidate() preemptible
This change inserts preemption points between removal of partitions.

The main complication is in maintaining consistency in the face of
concurrent population or eviction. We use the same mechanism which is
used by memtable updates. _prev_snapshot_pos is the ring position
which partitions the ring into the part which is already updated in
cache and the one which is yet to be updated. That position should be
set accordingly on preemption.

In case of invalidation, updating means removing all entries in the
range and marking the range as discontinuous.  When resuming
invalidation of a range we continue from _prev_snapshot_pos as the
lower bound.

This affects high-level operations like nodetool refresh, table
truncation, repair and streaming.

Fixes #2683

The improvement on stalls was measured using tests/perf_row_cache_update:

Before

Small partitions, no overwrites:
invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]}
Small partition with a few rows:
invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]}
Large partition, lots of small rows:
invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]}

After:

Small partitions, no overwrites:
invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]}
Small partition with a few rows:
invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]}
Large partition, lots of small rows:
invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]}

The maximum scheduling latency went down from 339 ms to 0.5 ms (task quota).
2019-05-13 19:32:00 +02:00
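The resume-from-position idea above can be illustrated with a toy model. It is a loose sketch, not the row_cache implementation: `ToyCache`, the sorted list of keys and the `batch` parameter are assumptions, and a generator `yield` stands in for a preemption point.

```python
import bisect

class ToyCache:
    """Sorted-key cache whose invalidation can be paused and resumed."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        # Resume point, loosely modelled on _prev_snapshot_pos.
        self.prev_snapshot_pos = None

    def invalidate(self, lo, hi, batch=2):
        """Remove keys in [lo, hi), yielding after every batch of removals."""
        self.prev_snapshot_pos = lo
        while True:
            start = bisect.bisect_left(self.keys, self.prev_snapshot_pos)
            end = min(start + batch, bisect.bisect_left(self.keys, hi))
            if start >= end:
                break
            # Record where to resume before dropping the entries.
            self.prev_snapshot_pos = self.keys[end - 1]
            del self.keys[start:end]
            yield  # preemption point: other work may run here
        self.prev_snapshot_pos = None
```

Each `yield` bounds how long the loop runs before other work gets a turn, at the cost of some extra bookkeeping per batch.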
Tomasz Grabiec
595e1a540e row_cache: Switch _prev_snapshot_pos to be a ring_position_ext
dht::ring_position cannot represent all ring_position_view instances,
in particular those obtained from
dht::ring_position_view::for_range_start(). To allow using the latter,
switch to views.
2019-05-13 19:30:50 +02:00
Tomasz Grabiec
1530224377 dht: Introduce ring_position_ext
It's an owning version of ring_position_view.

Note that ring_position has a narrower domain than the
ring_position_view for historical reasons, so we cannot use that.
2019-05-13 19:30:50 +02:00
Tomasz Grabiec
b08180c7fa dht: ring_position_view: Take key by const pointer
2019-05-13 19:30:39 +02:00
Tomasz Grabiec
ed697306be tests: perf_row_cache_update: Rename 'stall' to 'preemption' to avoid confusion
2019-05-13 19:18:20 +02:00
Tomasz Grabiec
b516e5fdbf tests: perf_row_cache_update: Report stalls around invalidation
2019-05-13 10:47:03 +02:00
Avi Kivity
a8b3cb8a28 Update seastar submodule
* seastar f73690e...3f7a5e1 (7):
  > Revert "Make sure all allocations/deallocations are properly byte aligned"
  > http: fix request content for POST requests
  > doc: discourage generic lambdas and unconstrained templates
  > smp: add smp_service_group for smp::submit_to() resource control
  > Revert "smp: add smp_service_group for smp::submit_to() resource control"
  > smp: add smp_service_group for smp::submit_to() resource control
  > Make sure all allocations/deallocations are properly byte aligned
2019-05-12 13:32:41 +03:00
Tomasz Grabiec
fd349a3c65 hashing: Introduce C++ concept for the hasher
2019-05-10 12:54:30 +02:00
Tomasz Grabiec
5c2f5b522d hashers: Rename hasher to cryptopp_hasher
So that we can introduce a truly generic interface named "hasher".
2019-05-10 12:54:08 +02:00
Tomasz Grabiec
b7ece4b884 gc_clock: Fix hashing to be backwards-compatible
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.

(cherry picked from commit 549d0eb2f3)
2019-05-10 12:48:46 +02:00
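The effect described above, where widening the integer representation changes the hash, can be reproduced in a few lines. This is only a sketch: MD5, little-endian packing and the value 3600 are assumptions chosen for illustration and say nothing about Scylla's actual hasher or serialization.

```python
import hashlib
import struct

def digest_of_count(count, fmt):
    """Hash an integer after packing it with the given struct format."""
    return hashlib.md5(struct.pack(fmt, count)).hexdigest()

# The same logical value, fed to the hasher in two widths.
narrow = digest_of_count(3600, "<i")  # 4-byte representation
wide = digest_of_count(3600, "<q")    # 8-byte representation
```

The two digests differ because the hasher sees different byte sequences, so nodes using different widths would disagree even on identical data.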
Avi Kivity
fdace36fa5 Merge "Fixes for GCC9 build" from Paweł
"
This series contains fixes for GCC9 build, mostly corrections needed
after changes in libstdc++. With this series and a workaround for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90415 (not included)
Scylla builds and passes unit tests with GCC9 (tested on Fedora 30,
development mode only).

Tests: unit(dev with gcc8 and gcc9).
"

* tag 'gcc9-fixes/v1' of https://github.com/pdziepak/scylla:
  tests/imr: add missing noexcept
  counters: bytes_view::pointer is not const pointer
  imr/fundamental: use bytes_view::const_pointer for const pointer
2019-05-09 21:51:24 +03:00