Compare commits

...

2101 Commits

Author SHA1 Message Date
Asias He
9b46b9f1a8 gossip: Add an option to force gossip generation
Consider 3 nodes in the cluster, n1, n2, n3, with gossip generation
numbers g1, g2, g3.

n1, n2, n3 run a Scylla version with commit
0a52ecb6df (gossip: Fix max generation
drift measure).

One year later, the user wants to upgrade n1, n2, n3 to a new version.

When n3 does a rolling restart with the new version, it will use a new
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.

Such unnecessary marking of nodes as down can cause availability issues.
For example:

DC1: n1, n2
DC2: n3, n4

When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
causes the whole DC2 to be unavailable.

To fix, we can start the node with a gossip generation that stays within
MAX_GENERATION_DIFFERENCE of the generations already known to the other
nodes.

Once all the nodes run a version with commit
0a52ecb6df, the option is no longer
needed.

Fixes #5164

(cherry picked from commit 743b529c2b)

[tgrabiec: resolved major conflicts in config.hh]
2020-03-27 13:08:26 +01:00
Asias He
93da2e2ff0 gossiper: Always use the new generation number
User reported an issue that after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.

Consider the following:

- n1, n2, n3 in the cluster
- n3 shuts itself down
- n3 sends the shutdown verb to n1 and n2
- n1 and n2 set n3 to SHUTDOWN status and force the heartbeat version to
  INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
  storage_service::prepare_to_join,
- n3 receives the response from n1 in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 1; fiber 1 finishes faster than fiber 2 and
  sets _in_shadow_round = false
- n3 receives the response from n2 in gossiper::handle_ack_msg; since
  _enabled = false and _in_shadow_round == false, n3 will apply the
  application state in fiber 2; fiber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
  gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2 and applies the application state about n3 into
  endpoint_state_map; at this point endpoint_state_map contains
  information about n3 itself, learned from n2
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
  with a new generation number generated correctly in
  storage_service::prepare_to_join, but
  maybe_initialize_local_state(generation_nbr) will not set the new
  generation and heartbeat if endpoint_state_map already contains the
  node itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop; in gossiper::run,
  hbs.update_heart_beat() sets the heartbeat version counting up from 0
- n1 and n2 will not get updates from n3, because n3 uses the same
  generation number while n1 and n2 hold a larger heartbeat version
- n1 and n2 will mark n3 as down even though n3 is alive.

To fix, always use the new generation number.

Fixes: #5800
Backports: 3.0 3.1 3.2
(cherry picked from commit 62774ff882)
2020-03-27 12:53:26 +01:00
Piotr Sarna
b764db3f1c cql: fix qualifying indexed columns for filtering
When qualifying columns to be fetched for filtering, we also check
whether the target column is used as the index - in which case there is
no need to fetch it. However, the check incorrectly assumed that any
restriction is eligible for indexing, while currently that is only true
for EQ. The fix makes a more specific check and contains many dynamic
casts, but these will hopefully be gone once our long-planned
"restrictions rewrite" is done.
This commit comes with a test.

Fixes #5708
Tests: unit(dev)

(cherry picked from commit 767ff59418)
2020-03-22 10:08:48 +01:00
Konstantin Osipov
304d339193 locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug: we first add at least one endpoint to the
list, and only then compare the list size with the replication factor.
If RF=0 the comparison never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>

(cherry picked from commit ac6f64a885)
2020-03-12 12:10:45 +02:00
Avi Kivity
9f7ba4203d logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>

(cherry picked from commit c020b4e5e2)
2020-03-12 11:25:50 +02:00
Benny Halevy
8b6a792f81 dist/redhat: scylla.spec.mustache: set _no_recompute_build_ids
By default, `/usr/lib/rpm/find-debuginfo.sh` will tamper with
the binary's build-id when stripping its debug info, as it is passed
the `--build-id-seed <version>.<release>` option.

To prevent that, we need to set the following macros:
  unset `_unique_build_ids`
  set `_no_recompute_build_ids` to 1

Fixes #5881

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 25a763a187)
2020-03-09 15:22:58 +02:00
Benny Halevy
8a94f6b180 gossiper: do_stop_gossiping: copy live endpoints vector
It can be resized asynchronously by mark_dead.

Fixes #5701

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200203091344.229518-1-bhalevy@scylladb.com>
(cherry picked from commit f45fabab73)
2020-02-26 13:00:33 +02:00
Benny Halevy
27209a5b2e storage_service: drain_on_shutdown: unregister storage_proxy subscribers from local_storage_service
Match subscription done in main() and avoid cross shard access
to _lifecycle_subscribers vector.

Fixes #5385

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200123092817.454271-1-bhalevy@scylladb.com>
(cherry picked from commit 5b0ea4c114)
2020-02-25 16:40:31 +02:00
Hagit Segev
c25f627a6e release: prepare for 3.1.4 2020-02-24 23:53:03 +02:00
Avi Kivity
58b1bdc20c Revert "streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations"
This reverts commit 4791b726a0. It exposes a data
resurrection bug (#5858).
2020-02-24 11:41:37 +02:00
Piotr Dulikowski
e1a7df174c hh: handle counter update hints correctly
This patch fixes a bug that appears because of an incorrect interaction
between counters and hinted handoff.

When a counter is updated on the leader, it sends mutations to other
replicas that contain all counter shards from the leader. If consistency
level is achieved but some replicas are unavailable, a hint with
mutation containing counter shards is stored.

When a hint's destination node is no longer a replica of its mutation,
the hint is sent to all of the mutation's current replicas instead.
Previously, storage_proxy::mutate was used for that purpose. That was
incorrect because the function treats mutations for counter tables as
mutations containing only a delta (by how much to increase/decrease the
counter). These two types of mutations have different serialization
formats, so a "shards" mutation was reinterpreted as a "delta" mutation,
which can cause data corruption.

This patch backports `storage_proxy::mutate_hint_from_scratch`
function, which bypasses special handling of counter mutations and
treats them as regular mutations - which is the correct behavior for
"shards" mutations.

Refs #5833.
Backports: 3.1, 3.2, 3.3
Tests: unit(dev)
(cherry picked from commit ec513acc49)
2020-02-19 18:04:06 +02:00
Avi Kivity
6c39e17838 Merge "cql3: time_uuid_fcts: validate time UUID" from Benny
"
Throw an error in case we hit an invalid time UUID
rather than hitting an assert.

Fixes #5552

(Ref #5588 that was dequeued and fixed here)

Test: UUID_test, cql_query_test(debug)
"

* 'validate-time-uuid' of https://github.com/bhalevy/scylla:
  cql3: abstract_function_selector: provide assignment_testable_source_context
  test: cql_query_test: add time uuid validation tests
  cql3: time_uuid_fcts: validate timestamp arg
  cql3: make_max_timeuuid_fct: delete outdated FIXME comment
  cql3: time_uuid_fcts: validate time UUID
  test: UUID_test: add tests for time uuid
  utils: UUID: create_time assert nanos_since validity
  utils/UUID_gen: make_nanos_since
  utils: UUID: assert UUID.is_timestamp

(cherry picked from commit 3343baf159)

Conflicts:
	cql3/functions/time_uuid_fcts.hh
	tests/cql_query_test.cc
2020-02-17 20:09:09 +02:00
Avi Kivity
507d763f45 Update seastar submodule
* seastar a5312ab85a...a51bd8b91a (1):
  > config: Do not allow zero rates

Fixes #5360.
2020-02-16 17:02:54 +02:00
Benny Halevy
b042e27f0a repair: initialize row_level_repair: _zero_rows
Avoid following UBSAN error:
repair/row_level.cc:2141:7: runtime error: load of value 240, which is not a valid value for type 'bool'

Fixes #5531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 474ffb6e54)
2020-02-16 16:11:36 +02:00
Rafael Ávila de Espíndola
375ce345a3 main: Explicitly allow scylla core dumps
I have not looked into the security reason for disabling it when
a program has file capabilities.

Fixes #5560

[avi: remove extraneous semicolon]
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200106231836.99052-1-espindola@scylladb.com>
(cherry picked from commit b80852c447)
2020-02-16 16:04:24 +02:00
Avi Kivity
493b821dfa Update seastar submodule
* seastar cfc082207c...a5312ab85a (1):
  > perftune.py: Use safe_load() for fix arbitrary code execution

Fixes #5630.
2020-02-16 15:55:05 +02:00
Avi Kivity
e1999c76b2 tools: toolchain: dbuild: relax process limit in container
Docker restricts the number of processes in a container to some
limit it calculates. This limit turns out to be too low on large
machines, since we run multiple links in parallel, and each link
runs many threads.

Remove the limit by specifying --pids-limit -1. Since dbuild is
meant to provide a build environment, not a security barrier,
this is okay (the container is still restricted by host limits).

I checked that --pids-limit is supported by old versions of
docker and by podman.

Fixes #5651.
Message-Id: <20200127090807.3528561-1-avi@scylladb.com>

(cherry picked from commit 897320f6ab)
2020-02-16 15:42:12 +02:00
Asias He
4791b726a0 streaming: Do not invalidate cache if no sstable is added in flush_streaming_mutations
table::flush_streaming_mutations dates from the days when streamed
data went to memtables. After switching to the new streaming, data goes
directly to sstables, so the sstables generated in
table::flush_streaming_mutations will be empty.

It is unnecessary to invalidate the cache if no sstables are added. To
avoid needless cache invalidation, which pokes holes in the cache, skip
calling _cache.invalidate() if the sstable set is empty.

The steps are:

- STREAM_MUTATION_DONE verb is sent when streaming is done with old or
  new streaming
- table::flush_streaming_mutations is called in the verb handler
- cache is invalidated for the streaming ranges

In summary, this patch will avoid a lot of cache invalidation for
streaming.

Backports: 3.0 3.1 3.2
Fixes: #5769
(cherry picked from commit 5e9925b9f0)
2020-02-16 15:16:50 +02:00
Botond Dénes
d7354a5b8d row: append(): downgrade assert to on_internal_error()
This assert, added by 060e3f8 is supposed to make sure the invariant of
the append() is respected, in order to prevent building an invalid row.
The assert however proved to be too harsh, as it converts any bug
causing out-of-order clustering rows into cluster unavailability.
Downgrade it to on_internal_error(). This will still prevent corrupt
data from spreading in the cluster, without the unavailability caused by
the assert.

Fixes: #5786
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200211083829.915031-1-bdenes@scylladb.com>
(cherry picked from commit 3164456108)
2020-02-16 15:14:03 +02:00
Takuya ASADA
6815b72b06 dist/debian: keep /etc/systemd .conf files on 'remove'
Since dpkg does not re-install conffiles that were removed by the user,
we currently lose dependencies.conf and sysconfdir.conf on rollback.
To prevent this, we need to stop running
'rm -rf /etc/systemd/system/scylla-server.service.d/' on 'remove'.

Fixes #5734

(cherry picked from commit 43097854a5)
2020-02-12 14:29:30 +02:00
Rafael Ávila de Espíndola
efc2df8ca3 types: Fix encoding of negative varint
We would sometimes produce an unnecessary extra 0xff prefix byte.

The new encoding matches what cassandra does.

This was both an efficiency and a correctness issue, as using a varint
in a key could produce different tokens.

Fixes #5656

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit c89c90d07f)
2020-02-02 17:35:02 +02:00
Avi Kivity
dbe90f131f test: make eventually() more patient
We use eventually() in tests to wait for eventually consistent data
to become consistent. However, we see spurious failures indicating
that we wait too little.

Increasing the timeout has a negative side effect in that tests that
fail will now take longer to do so. However, this negative side effect
is negligible to false-positive failures, since they throw away large
test efforts and sometimes require a person to investigate the problem,
only to conclude it is a false positive.

This patch therefore makes eventually() more patient, by a factor of
32.

Fixes #4707.
Message-Id: <20200130162745.45569-1-avi@scylladb.com>

(cherry picked from commit ec5b721db7)
2020-02-01 13:22:03 +02:00
Takuya ASADA
31d5d16c3d dist/debian: Use tilde for release candidate builds
We need to add '~' to handle rcX versions correctly on Debian variants
(merged at ae33e9f), but when we moved to relocatable packages we
mistakenly dropped the code, so add it back.

Fixes #5641

(cherry picked from commit dd81fd3454)
2020-01-28 18:35:40 +02:00
Hagit Segev
b0d122f9c5 release: prepare for 3.1.3 2020-01-28 14:09:57 +02:00
Asias He
9a10e4a245 repair: Avoid duplicated partition_end write
Consider this:

1) Write partition_start of p1
2) Write clustering_row of p1
3) Write partition_end of p1
4) Repair is stopped due to an error before writing partition_start of p2
5) Repair calls repair_row_level_stop() to tear down, which calls
   wait_for_writer_done(); a duplicate partition_end is written

To fix, track the partition_start and partition_end writes, and avoid
unpaired writes.

Backports: 3.1 and 3.2
Fixes: #5527
(cherry picked from commit 401854dbaf)
2020-01-21 13:39:19 +02:00
Piotr Sarna
871d1ebdd5 view: ignore duplicated key entries in progress virtual reader
Build progress virtual reader uses Scylla-specific
scylla_views_builds_in_progress table in order to represent
legacy views_builds_in_progress rows. The Scylla-specific table contains
additional cpu_id clustering key part, which is trimmed before returning
it to the user. That may cause duplicated clustering row fragments to be
emitted by the reader, which may cause undefined behaviour in consumers.
The solution is to keep track of previous clustering keys for each
partition and drop fragments that would cause duplication. That way if
any shard is still building a view, its progress will be returned,
and if many shards are still building, the returned value will indicate
the progress of a single arbitrary shard.

Fixes #4524
Tests:
unit(dev) + custom monotonicity checks from <tgrabiec@scylladb.com>

(cherry picked from commit 85a3a4b458)
2020-01-16 12:07:40 +01:00
Tomasz Grabiec
bff996959d cql: alter type: Format field name as text instead of hex
Fixes #4841

Message-Id: <1565702635-26214-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 64ff1b6405)
2020-01-05 18:51:53 +02:00
Gleb Natapov
1bdc83540b cache_hitrate_calculator: do not ignore a future returned from gossiper::add_local_application_state
We should wait for the future returned from add_local_application_state()
to resolve before issuing a new calculation; otherwise two
add_local_application_state() calls may run simultaneously for the same
state.

Fixes #4838.

Message-Id: <20190812082158.GE17984@scylladb.com>
(cherry picked from commit 00c4078af3)
2020-01-05 18:50:13 +02:00
Takuya ASADA
478c35e07a dist/debian: fix missing scyllatop files
The Debian package build script runs relocate_python_scripts.py for
scyllatop, but mistakenly forgets to install tools/scyllatop/*.py.
We need to install them using scylla-server.install.

Fixes #5518

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191227025750.434407-1-syuu@scylladb.com>
2019-12-30 19:38:34 +02:00
Benny Halevy
ba968ab9ec tracing: one_session_records: keep local tracing ptr
Similar to trace_state keep shared_ptr<tracing> _local_tracing_ptr
in one_session_records when constructed so it can be used
during shutdown.

Fixes #5243

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 7aef39e400)
2019-12-24 18:42:21 +02:00
Avi Kivity
883b5e8395 database: fix schema use-after-move in make_multishard_streaming_reader
On aarch64, asan detected a use-after-move. It doesn't happen on x86_64,
likely due to different argument evaluation order.

Fix by evaluating full_slice before moving the schema.

Note: I used "auto&&" and "std::move()" even though full_slice()
returns a reference. I think this is safer in case full_slice()
changes, and works just as well with a reference.

Fixes #5419.

(cherry picked from commit 85822c7786)
2019-12-24 18:35:01 +02:00
Rafael Ávila de Espíndola
b47033676a types: recreate dependent user types.
In the system.types table a user type refers to another by name. When
a user type is modified, only its entry in the table is changed.

At runtime a user type has direct pointers to the types it uses. To
handle the discrepancy we need to recreate any dependent types when an
entry in system.types changes.

Fixes #5049

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
(cherry picked from commit 5af8b1e4a3)
2019-12-23 17:58:26 +02:00
Tomasz Grabiec
67e45b73f0 types: Fix abort on type alter which affects a compact storage table with no regular columns
Fixes #4837

Message-Id: <1565702247-23800-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 34cff6ed6b)
2019-12-23 17:34:06 +02:00
Rafael Ávila de Espíndola
37eac75b6f cql: Fix use of UDT in reversed columns
We were missing calls to underlying_type in a few locations and so the
insert would think the given literal was invalid and the select would
refuse to fetch a UDT field.

Fixes #4672

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190708200516.59841-1-espindola@scylladb.com>
(cherry picked from commit 4e7ffb80c0)
2019-12-23 15:54:36 +02:00
Piotr Sarna
e8431a3474 table: Reduce read amplification in view update generation
This commit makes sure that single-partition readers for
read-before-write do not have fast-forwarding enabled,
as it may lead to huge read amplification. The observed case was:
1. Creating an index.
  CREATE INDEX index1  ON myks2.standard1 ("C1");
2. Running cassandra-stress in order to generate view updates.
cassandra-stress write no-warmup n=1000000 cl=ONE -schema \
  'replication(factor=2) compaction(strategy=LeveledCompactionStrategy)' \
  keyspace=myks2 -pop seq=4000000..8000000 -rate threads=100 -errors
  skip-read-validation -node 127.0.0.1;

Without disabling fast-forwarding, single-partition readers
were turned into scanning readers in cache, which resulted
in reading 36GB (sic!) on a workload which generates less
than 1GB of view updates. After applying the fix, the number
dropped down to less than 1GB, as expected.

Refs #5409
Fixes #4615
Fixes #5418

(cherry picked from commit 79c3a508f4)
2019-12-05 22:36:20 +02:00
Yaron Kaikov
9d78d848e6 release: prepare for 3.1.2 2019-11-27 10:24:43 +02:00
Rafael Ávila de Espíndola
32aa6ddd7e commitlog: make sure a file is closed
If allocate or truncate throws, we have to close the file.

Fixes #4877

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191114174810.49004-1-espindola@scylladb.com>
(cherry picked from commit 6160b9017d)
2019-11-24 17:48:24 +02:00
Tomasz Grabiec
74cc9477af row_cache: Fix abort on bad_alloc during cache update
Since 90d6c0b, cache will abort when trying to detach partition
entries while they're updated. This should never happen. It can happen
though, when the update fails on bad_alloc, because the cleanup guard
invalidates the cache before it releases partition snapshots (held by
"update" coroutine).

Fix by destroying the coroutine first.

Fixes #5327.

Tests:
  - row_cache_test (dev)

Message-Id: <1574360259-10132-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit e3d025d014)
2019-11-24 17:44:07 +02:00
Nadav Har'El
95acf71680 merge: row_marker: correct row expiry condition
Merged patch set by Piotr Dulikowski:

This change corrects condition on which a row was considered expired by its
TTL.

The logic that decides when a row becomes expired was inconsistent with the
logic that decides if a single cell is expired. A single cell becomes expired
when expiry_timestamp <= now, while a row became expired when
expiry_timestamp < now (notice the strict inequality). For rows inserted
with TTL, this caused non-key cells to expire (change their values to null)
one second before the row disappeared. Now, row expiry logic uses non-strict
inequality.

Fixes #4263,
Fixes #5290.

Tests:

    unit(dev)
    python test described in issue #5290

(cherry picked from commit 9b9609c65b)
2019-11-20 21:40:11 +02:00
Asias He
921f8baf00 gossip: Fix max generation drift measure
Assume n1 and n2 in a cluster with generation numbers g1 and g2. The
cluster runs for more than 1 year (MAX_GENERATION_DIFFERENCE). When n1
reboots with generation g1', which is time-based, n2 will see
g1' > g2 + MAX_GENERATION_DIFFERENCE and reject n1's gossip update.

To fix, check the generation drift with generation value this node would
get if this node were restarted.

This is a backport of CASSANDRA-10969.

Fixes #5164

(cherry picked from commit 0a52ecb6df)
2019-11-20 11:39:16 +02:00
Avi Kivity
071d7d9210 reloc: do not install dependencies when building the relocatable package
The dependencies are provided by the frozen toolchain. If a dependency
is missing, we must update the toolchain rather than rely on build-time
installation, which is not reproducible (as different package versions
are available at different times).

Luckily "dnf install" does not update an already-installed package. Had
that been a case, none of our builds would have been reproducible, since
packages would be updated to the latest version as of the build time rather
than the version selected by the frozen toolchain.

So, to prevent missing packages in the frozen toolchain translating to
an unreproducible build, remove the support for installing dependencies
from reloc/build_reloc.sh. We still parse the --nodeps option in case some
script uses it.

Fixes #5222.

Tests: reloc/build_reloc.sh.
(cherry picked from commit cd075e9132)
2019-11-18 14:58:24 +02:00
Kamil Braun
769b9bbe59 view: fix bug in virtual columns.
When creating a virtual column of non-frozen map type,
the wrong type was used for the map's keys.

Fixes #5165.

(cherry picked from commit ef9d5750c8)
2019-11-18 14:55:17 +02:00
Benny Halevy
d4e553c153 sstables: delete_atomically: fix misplaced parenthesis in pending_delete_log warning message
Fixes #4861.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190818064637.9207-1-bhalevy@scylladb.com>
(cherry picked from commit 20083be9f6)
2019-11-17 17:57:00 +02:00
Avi Kivity
d983411488 build: adjust libthread_db file name to match gdb expectations
gdb searches for libthread_db.so using its canonical name of libthread_db.so.1 rather
than the file name of libthread_db-1.0.so, so use that name to store the file in the
archive.

Fixes #4996.

(cherry picked from commit d77171e10e)
2019-11-17 17:57:00 +02:00
Avi Kivity
27de1bb8e6 reconcilable_result: use chunked_vector to hold partitions
Usually, a reconcilable_result holds very few partitions (1 is common),
since the page size is limited by 1MB. But if we have paging disabled or
if we are reconciling a range full of tombstones, we may see many more.
This can cause large allocations.

Change to chunked_vector to prevent those large allocations, as they
can be quite expensive.

Fixes #4780.

(cherry picked from commit 093d2cd7e5)
2019-11-17 17:57:00 +02:00
Avi Kivity
854f8ccb40 utils::chunked_vector: add rbegin() and related iterators
Needed as an std::vector replacement.

(cherry picked from commit eaa9a5b0d7)

Prerequisite for #4780.
2019-11-17 17:57:00 +02:00
Avi Kivity
a68170c9a3 utils: chunked_vector: make begin()/end() const correct
begin() of a const vector should return a const_iterator, to avoid
giving the caller the ability to mutate it.

This slipped through since iterator's constructor does a const_cast.

Noticed by code inspection.

(cherry picked from commit df6faae980)

Prerequisite for #4780.
2019-11-17 17:57:00 +02:00
Glauber Costa
7e4bcf2c0f do not crash in user-defined operations if the controller is disabled
Scylla currently crashes if we run manual operations like nodetool
compact with the controller disabled. While we neither like nor
recommend running with the controller disabled, due to some corner cases
in the controller algorithm we are not yet at the point in which we can
deprecate this and are sometimes forced to disable it.

The reason for the crash is that manual operations invoke
_backlog_of_shares, which returns the backlog needed to create a
certain number of shares. That scans the existing control points, but
when we run without the controller there are no control points, and we
crash.

Backlog doesn't matter if the controller is disabled, and the return
value of this function is immaterial in this case. So, to avoid the
crash, we return a value right away if the controller is disabled.

Fixes #5016

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit c9f2d1d105)
2019-11-17 12:33:23 +02:00
Avi Kivity
a74b3a182e Merge "Add proper aggregation for paged indexing" from Piotr
"
Fixes #4540

This series adds proper handling of aggregation for paged indexed queries.
Before this series returned results were presented to the user in per-page
partial manner, while they should have been returned as a single aggregated
value.

Tests: unit(dev)
"

* 'add_proper_aggregation_for_paged_indexing_for_3.1' of https://github.com/psarna/scylla:
  test: add 'eventually' block to index paging test
  tests: add indexing+paging test case for clustering keys
  tests: add indexing + paging + aggregation test case
  tests: add query_options to cquery_nofail
  cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
  cql3: add proper aggregation to paged indexing
  cql3: add a query options constructor with explicit page size
  cql3: enable explicit copying of query_options
  cql3: split execute_base_query implementation
2019-11-17 12:23:30 +02:00
Piotr Sarna
e9bc579565 test: add 'eventually' block to index paging test
Without 'eventually', the test is flaky because the index can still
be not up to date while checking its conditions.

Fixes #4670

(cherry picked from commit ebbe038d19)
2019-11-15 09:12:40 +01:00
Piotr Sarna
ad46bf06a7 tests: add indexing+paging test case for clustering keys
Indexing a non-prefix part of the clustering key has a separate
code path (see issue #3405), so it deserves a separate test case.
2019-11-14 10:26:49 +01:00
Piotr Sarna
1ff21a28b7 tests: add indexing + paging + aggregation test case
Indexed queries used to erroneously return partial per-page results
for aggregation queries. This test case used to reproduce the problem
and now ensures there will be no regressions.

Refs #4540
2019-11-14 10:26:49 +01:00
Piotr Sarna
fb3dfaa736 tests: add query_options to cquery_nofail
The cquery_nofail utility is extended, so it can accept custom
query options, just like execute_cql does.
2019-11-14 10:26:49 +01:00
Piotr Sarna
5a02e6976f cql3: make DEFAULT_COUNT_PAGE_SIZE constant public
The constant will be later used in test scenarios.
2019-11-14 10:26:49 +01:00
Piotr Sarna
5202eea7a7 cql3: add proper aggregation to paged indexing
Aggregated and paged filtering needs to aggregate the results
from all pages in order to avoid returning partial per-page
results. It's a little bit more complicated than regular aggregation,
because each paging state needs to be translated between the base
table and the underlying view. The routine keeps fetching pages
from the underlying view, which are then used to fetch base rows,
which go straight to the result set builder.

Fixes #4540
2019-11-14 10:26:48 +01:00
Gleb Natapov
038733f1a5 storage_proxy: do not release mutation if not all replies were received
MV backpressure code frees mutation for delayed client replies earlier
to save memory. The commit 2d7c026d6e that
introduced the logic claimed to do it only when all replies are received,
but this is not the case. Fix the code to free only when all replies
are received for real.

Fixes #5242

Message-Id: <20191113142117.GA14484@scylladb.com>
(cherry picked from commit 552c56633e)
2019-11-14 11:04:27 +02:00
Piotr Sarna
0ed2e90925 cql3: add a query options constructor with explicit page size
For internal use, there already exists a query_options constructor
that copies data from another query_options with overwritten paging
state. This commit adds an option to overwrite page size as well.
2019-11-14 09:58:35 +01:00
Piotr Sarna
9ee6d2bc15 cql3: enable explicit copying of query_options 2019-11-14 09:58:28 +01:00
Piotr Sarna
23582a2ce9 cql3: split execute_base_query implementation
In order to handle aggregation queries correctly, the function that
returns base query results is split into two, so it's possible to
access raw query results, before they're converted into end-user
CQL message.
2019-11-14 09:58:05 +01:00
Takuya ASADA
5ddf0ec1df dist/common/scripts/scylla_setup: don't proceed with empty NIC name
Currently the NIC selection prompt in scylla_setup just proceeds with
setup when the user presses the Enter key at the prompt.
The prompt should ask for the NIC name again until the user inputs a
correct NIC name.

Fixes #4517
Message-Id: <20190617124925.11559-1-syuu@scylladb.com>

(cherry picked from commit 7320c966bc)
2019-11-13 17:27:21 +02:00
Avi Kivity
e6eb54af90 Update seastar submodule
* seastar 75488f6ef2...cfc082207c (2):
  > core: fix a race in execution stages
  > execution_stage: prevent unbounded growth

Fixes #4749.
Fixes #4856.
2019-11-13 13:14:27 +02:00
Piotr Sarna
f5a869966a view: fix view_info select statement for local indexes
Calculating the select statement for a given view_info structure
used to work fine, but once local indexes were introduced a subtle
bug appeared: the legacy token column does not exist in local indexes,
so a valid clustering key column was omitted instead.
That results in potentially incorrect partition slices being used later
in read-before-write.
There's a long term plan for removing select_statement from
view info altogether, but nonetheless the bug needs to be fixed first.

cherry picked from commit 9e98b51aaa

Fixes #5241
Message-Id: <cb2e863e8e993e00ec7329505f737a9ce4b752ae.1572432826.git.sarna@scylladb.com>
2019-11-01 08:06:30 +02:00
Piotr Sarna
0c70cd626b index: add is_global_index() utility
The helper function is useful for determining if a given schema
represents a global index.

cherry picked from commit 2ee8c6f595
Message-Id: <db5c9383e426fb2e55e5dbeebc7b8127afc91158.1572432826.git.sarna@scylladb.com>
2019-11-01 08:06:25 +02:00
Botond Dénes
0928aa4791 repair: repair_cf_range(): extract result of local checksum calculation only once
The loop that collects the results of the checksum calculations also logs
any errors. The error logging includes `checksums[0]`, which corresponds
to the checksum calculation on the local node. This violates the
assumption of the code following the loop, which assumes that the future
of `checksums[0]` is intact after the loop terminates. However this is
only true when the checksum calculation is successful and is false when
it fails, as in this case the loop extracts the error and logs it. When
the code after the loop checks again whether said calculation failed, it
will get a false negative and will go ahead and attempt to extract the
value, triggering an assert failure.
Fix by making sure that even in the case of failed checksum calculation,
the result of `checksum[0]` is extracted only once.

Fixes: #5238
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191029151709.90986-1-bdenes@scylladb.com>
(cherry picked from commit e48f301e95)
2019-10-29 20:43:30 +02:00
Yaron Kaikov
f32ec885c4 release: prepare for 3.1.1 2019-10-24 21:55:50 +03:00
Tomasz Grabiec
762eec2bc6 Merge "Fix TTL serialization breakage" from Avi
Commit 93270dd changed gc_clock to be 64-bit, to fix the Y2038
problem. While 64-bit tombstone::deletion_time is serialized in a
compatible way, TTLs (gc_clock::duration) were not.

This patchset reverts TTL serialization to the 32-bit serialization
format, and also allows opting-in to the 64-bit format in case a
cluster was installed with the broken code. Only Scylla 3.1.0 is
vulnerable.

Fixes #4855

Tests: unit (dev)
(cherry picked from commit e621db591e)
2019-10-24 08:55:34 +03:00
Avi Kivity
3f4d9f210f Merge "Fix handling of schema alters and eviction in cache" from Tomasz
"
Fixes #5134, Eviction concurrent with preempted partition entry update after
  memtable flush may allow stale data to be populated into cache.

Fixes #5135, Cache reads may miss some writes if schema alter followed by a
  read happened concurrently with preempted partition entry update.

Fixes #5127, Cache populating read concurrent with schema alter may use the
  wrong schema version to interpret sstable data.

Fixes #5128, Reads of multi-row partitions concurrent with memtable flush may
  fail or cause a node crash after schema alter.
"

* tag 'fix-cache-issues-with-schema-alter-and-eviction-v2' of github.com:tgrabiec/scylla:
  tests: row_cache: Introduce test_alter_then_preempted_update_then_memtable_read
  tests: row_cache_stress_test: Verify all entries are evictable at the end
  tests: row_cache_stress_test: Exercise single-partition reads
  tests: row_cache_stress_test: Add periodic schema alters
  tests: memtable_snapshot_source: Allow changing the schema
  tests: simple_schema: Prepare for schema altering
  row_cache: Record upgraded schema in memtable entries during update
  memtable: Extract memtable_entry::upgrade_schema()
  row_cache, mvcc: Prevent locked snapshots from being evicted
  row_cache: Make evict() not use invalidate_unwrapped()
  mvcc: Introduce partition_snapshot::touch()
  row_cache, mvcc: Do not upgrade schema of entries which are being updated
  row_cache: Use the correct schema version to populate the partition entry
  delegating_reader: Optimize fill_buffer()
  row_cache, memtable: Use upgrade_schema()
  flat_mutation_reader: Introduce upgrade_schema()

(cherry picked from commit 8ed6f94a16)
2019-10-18 13:59:40 +02:00
Yaron Kaikov
9c3cdded9e release: prepare for 3.1.0 2019-10-12 08:45:49 +03:00
yaronkaikov
05272c53ed release: prepare for 3.1.0.rc9 2019-10-06 10:51:37 +03:00
Botond Dénes
393b2abdc9 querier_cache: correctly account entries evicted on insertion in the population
Currently, the population stat is not increased for entries that are
evicted immediately on insert; however, the code that does the eviction
still decreases the population stat, leading to an imbalance and in some
cases the underflow of the population stat. To fix, unconditionally
increase the population stat upon inserting an entry, regardless of
whether it is immediately evicted or not.

Fixes: #5123

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20191001153215.82997-1-bdenes@scylladb.com>
(cherry picked from commit 00b432b61d)
2019-10-05 12:36:21 +03:00
Avi Kivity
d9dc8f92cc Merge " hinted handoff: fix races during shutdown and draining" from Vlad
"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.

The root cause of use-after-free events in question is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex() and we need to make sure this mutex is alive as long as
it's accessed even if the corresponding end_point_hints_manager instance
is destroyed in the context of manager::drain_for().

File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
In case of such an exception new hints generation is going to be blocked
- including for materialized views, till the next space_watchdog round (in 1s).

Issues that are fixed are #4685 and #4836.

Tested as follows:
 1) Patched the code in order to trigger the race with (a lot) higher
    probability and running slightly modified hinted handoff replace
    dtest with a debug binary for 100 times. Side effect of this
    testing was discovering of #4836.
 2) Using the same patch as above tested that there are no crashes and
    nodes survive stop/start sequences (they were not without this series)
    in the context of all hinted handoff dtests. Ran the whole set of
    tests with dev binary for 10 times.
"

Fixes #4685
Fixes #4836

* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
  hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
  hinted handoff: make taking file_update_mutex safe
  db::hints::manager::drain_for(): fix alignment
  db::hints::manager: serialize calls to drain_for()
  db::hints: cosmetics: indentation and missing method qualifier

(cherry picked from commit 3cb081eb84)
2019-10-05 09:50:05 +03:00
Gleb Natapov
c009f7b182 messaging_service: enable reuseaddr on messaging service rpc
Fixes #4943

Message-Id: <20190918152405.GV21540@scylladb.com>
(cherry picked from commit 73e3d0a283)
2019-10-03 14:42:38 +03:00
Avi Kivity
303a56f2bd Update seastar submodule
* seastar 7dfcf334c4...75488f6ef2 (2):
  > net: socket::{set,get}_reuseaddr() should not be virtual
  > Merge "fix some tcp connection bugs and add reuseaddr option to a client socket" from Gleb

Prerequisite for #4943.
2019-10-03 14:41:34 +03:00
Tomasz Grabiec
57512d3df9 db: read: Filter-out sstables using its first and last keys
Affects single-partition reads only.

Refs #5113

When executing a query on the replica we do several things in order to
narrow down the sstable set we read from.

For tables which use LeveledCompactionStrategy, we store sstables in
an interval set and we select only sstables whose partition ranges
overlap with the queried range. Other compaction strategies don't
organize the sstables and will select all sstables at this stage. The
reasoning behind this is that for non-LCS compaction strategies the
sstables' ranges will typically overlap and using interval sets in
this case would not be effective and would result in quadratic (in
sstable count) memory consumption.

The assumption for overlap does not hold if the sstables come from
repair or streaming, which generates non-overlapping sstables.

At a later stage, for single-partition queries, we use the sstables'
bloom filter (kept in memory) to drop sstables which surely don't
contain given partition. Then we proceed to sstable indexes to narrow
down the data file range.

Tables which don't use LCS will do unnecessary I/O to read index pages
for single-partition reads if the partition is outside of the
sstable's range and the bloom filter is ineffective (Refs #5112).

This patch fixes the problem by consulting sstable's partition range
in addition to the bloom filter, so that the non-overlapping sstables
will be filtered out with certainty and not depend on bloom filter's
efficiency.

It's also faster to drop sstables based on the keys than the bloom
filter.

Tests:
  - unit (dev)
  - manual using cqlsh

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927122505.21932-1-tgrabiec@scylladb.com>
(cherry picked from commit b0e0f29b06)
2019-09-29 10:57:58 +03:00
Tomasz Grabiec
a894868298 sstables: Fix partition key count estimation for a range
The method sstable::estimated_keys_for_range() was severely
under-estimating the number of partitions in an sstable for a given
token range.

The first reason is that it underestimated the number of sstable index
pages covered by the range, by one. In extreme, if the requested range
falls into a single index page, we will assume 0 pages, and report 1
partition. The reason is that we were using
get_sample_indexes_for_range(), which returns entries with the keys
falling into the range, not entries for pages which may contain the
keys.

A single page can have a lot of partitions though. By default, there
is a 1:20000 ratio between summary entry size and the data file size
covered by it. If partitions are small, that can be many hundreds of
partitions.

Another reason is that we underestimate the number of partitions in an
index page. We multiply the number of pages by:

   (downsampling::BASE_SAMPLING_LEVEL * _components->summary.header.min_index_interval)
     / _components->summary.header.sampling_level

Using defaults, that means multiplying by 128. In the cassandra-stress
workload a single partition takes about 300 bytes in the data file and
summary entry is 22 bytes. That means a single page covers 22 * 20'000
= 440'000 bytes of the data file, which contains about 1'466
partitions. So we underestimate by an order of magnitude.

Underestimating the number of partitions will result in too small
bloom filters being generated for the sstables which are the output of
repair or streaming. This will make the bloom filters ineffective
which results in reads selecting more sstables than necessary.

The fix is to base the estimation on the number of index pages which
may contain keys for the range, and multiply that by the average key
count per index page.

Fixes #5112.
Refs #4994.

The output of test_key_count_estimation:

Before:

count = 10000
est = 10112
est([-inf; +inf]) = 512
est([0; 0]) = 128
est([0; 63]) = 128
est([0; 255]) = 128
est([0; 511]) = 128
est([0; 1023]) = 128
est([0; 4095]) = 256
est([0; 9999]) = 512
est([5000; 5000]) = 1
est([5000; 5063]) = 1
est([5000; 5255]) = 1
est([5000; 5511]) = 1
est([5000; 6023]) = 128
est([5000; 9095]) = 256
est([5000; 9999]) = 256
est(non-overlapping to the left) = 1
est(non-overlapping to the right) = 1

After:

count = 10000
est = 10112
est([-inf; +inf]) = 10112
est([0; 0]) = 2528
est([0; 63]) = 2528
est([0; 255]) = 2528
est([0; 511]) = 2528
est([0; 1023]) = 2528
est([0; 4095]) = 5056
est([0; 9999]) = 10112
est([5000; 5000]) = 2528
est([5000; 5063]) = 2528
est([5000; 5255]) = 2528
est([5000; 5511]) = 2528
est([5000; 6023]) = 5056
est([5000; 9095]) = 7584
est([5000; 9999]) = 7584
est(non-overlapping to the left) = 0
est(non-overlapping to the right) = 0

Tests:
  - unit (dev)

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190927141339.31315-1-tgrabiec@scylladb.com>
(cherry picked from commit b93cc21a94)
2019-09-28 22:12:04 +03:00
Raphael S. Carvalho
a5d385d702 sstables/compaction_manager: Don't perform upgrade on shared SSTables
compaction_manager::perform_sstable_upgrade() fails when it feeds the
compaction mechanism with shared sstables. Shared sstables should
be ignored when performing upgrade, waiting instead for reshard to pick
them up in parallel. Whenever a shared sstable is brought up, either
on restart or via refresh, the reshard procedure kicks in.
Reshard picks the highest supported format, so the upgrade for
shared sstables will naturally take place.

Fixes #5056.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190925042414.4330-1-raphaelsc@scylladb.com>
(cherry picked from commit 571fa94eb5)
2019-09-28 17:39:09 +03:00
Avi Kivity
6413063b1b Merge "mvcc: Fix incorrect schema version being used to copy the mutation when applying (#5099)" from Tomasz
"
Currently affects only counter tables.

Introduced in 27014a2.

mutation_partition(s, mp) is incorrect because it uses s to interpret
mp, while it should use mp_schema.

We may hit this if the current node has a newer schema than the
incoming mutation. This can happen during table schema altering when we receive the
mutation from a node which hasn't processed the schema change yet.

This is undefined behavior in general. If the alter was adding or
removing columns, this may result in corruption of the write where
values of one column are inserted into a different column.

Fixes #5095.
"

* 'fix-schema-alter-counter-tables' of https://github.com/tgrabiec/scylla:
  mvcc: Fix incorrect schema version being used to copy the mutation when applying
  mutation_partition: Track and validate schema version in debug builds
  tests: Use the correct schema to access mutation_partition

(cherry picked from commit 83bc59a89f)
2019-09-28 17:38:04 +03:00
Tomasz Grabiec
0d31c6da62 Merge "storage_proxy: tolerate view_update_write_response_handler id not found on shutdown" from Benny
1. Add assert in remove_response_handler to make crashes like in #5032 easier to understand.
2. Look up the view_update_write_response_handler id before calling timeout_cb and tolerate it not being found.
   Just log a warning if this happens.

Fixes #5032

(cherry picked from commit 06b9818e98)
2019-09-28 17:37:40 +03:00
Tomasz Grabiec
b62bb036ed Merge "toppartitions: don't transport schema_ptr across shards" from Avi
When the toppartitions operation gathers results, it copies partition
keys with their schema_ptr:s. When these schema_ptr:s are copied
or destroyed, they can cause leaks or premature frees of the schema
in its original shard, since reference count operations are not atomic.

Fix that by converting the schema_ptr to a global_schema_ptr during
transportation.

Fixes #5104 (direct bug)
Fixes #5018 (schema prematurely freed, toppartitions previously executed on that node)
Fixes #4973 (corrupted memory pool of the same size class as schema, toppartitions previously executed on that node)

Tests: new test added that fails with the existing code in debug mode,
manual toppartitions test

(cherry picked from commit 5b0e48f25b)
2019-09-28 17:35:19 +03:00
Glauber Costa
bdabd2e7a4 toppartitions: fix typo
toppartitons -> toppartitions

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190627160937.7842-1-glauber@scylladb.com>
(cherry picked from commit d916601ea4)

Ref #5104 (prerequisite for patch)
2019-09-28 17:34:24 +03:00
Benny Halevy
d7fc7bcf9f commitlog: descriptor: skip leading path from filename
std::regex_match of the leading path may run out of stack
with long paths in debug build.

Use rfind instead to look up the last '/' in the pathname
and skip it if found.

Fixes #4464

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190505144133.4333-1-bhalevy@scylladb.com>
(cherry picked from commit d9136f96f3)
2019-09-23 11:29:26 +03:00
Hagit Segev
21aec9c7ef release: prepare for 3.1.0.rc8 2019-09-23 07:01:02 +03:00
Asias He
02ce19e851 storage_service: Replicate and advertise tokens early in the boot up process
When a node is restarted, there is a race between gossip starts (other
nodes will mark this node up again and send requests) and the tokens are
replicated to other shards. Here is an example:

- n1, n2
- n2 is down, n1 thinks n2 is down
- n2 starts again, n2 starts gossip service, n1 thinks n2 is up and sends
  reads/writes to n2, but n2 hasn't replicated the token_metadata to all
  the shards.
- n2 complains:
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  token_metadata - sorted_tokens is empty in first_token_index!
  storage_proxy - Failed to apply mutation from $ip#4: std::runtime_error
  (sorted_tokens is empty in first_token_index!)

The code path looks like below:

0 storage_service::init_server
1    prepare_to_join()
2          add gossip application state of NET_VERSION, SCHEMA and so on.
3         _gossiper.start_gossiping().get()
4    join_token_ring()
5           _token_metadata.update_normal_tokens(tokens, get_broadcast_address());
6           replicate_to_all_cores().get()
7           storage_service::set_gossip_tokens() which adds the gossip application state of TOKENS and STATUS

The race talked above is at line 3 and line 6.

To fix, we can replicate the token_metadata early after it is filled
with the tokens read from system table before gossip starts. So that
when other nodes think this restarting node is up, the tokens are
already replicated to all the shards.

In addition, this patch also fixes the issue that other nodes might see
a node miss the TOKENS and STATUS application state in gossip if that
node failed in the middle of a restarting process, i.e., it is killed
after line 3 and before line 7. As a result we could not replace the
node.

Tests: update_cluster_layout_tests.py
Fixes: #4709
Fixes: #4723
(cherry picked from commit 3b39a59135)
2019-09-22 12:45:22 +03:00
Eliran Sinvani
37c4be5e74 Storage proxy: protect against infinite recursion in query_partition_key_range_concurrent
A recent fix to #3767 limited the amount of ranges that
can return from query_ranges_to_vnodes_generator. This with
the combination of a large amount of token ranges can lead to
an infinite recursion. The algorithm multiplies by a factor of
2 (actually a shift left by one) the amount of requested
tokens in each recursion iteration. As long as the requested
number of ranges is greater than 0, the recursion is implicit,
and each call is scheduled separately since the call is inside
a continuation of a map reduce.
But if the amount of iterations is large enough (~32) the
counter for requested ranges zeros out and from that moment on
two things will happen:
1. The counter will remain 0 forever (0*2 == 0)
2. The map reduce future will be immediately available and this
will result in the continuation being invoked immediately.
The latter causes the recursive call to be a "regular" recursive call
thus, through the stack and not the task queue of the scheduler, and
the former causes this recursion to be infinite.
The combination creates a stack that keeps growing and eventually
overflows resulting in undefined behavior (due to memory overrun).

This patch prevents the problem from happening: it limits the growth of
the concurrency counter to twice the last amount of tokens returned
by the query_ranges_to_vnodes_generator, and also makes sure it does not
get stuck at zero.

Testing: * Unit test in dev mode.
         * Modified add 50 dtest that reproduce the problem

Fixes #4944

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190922072838.14957-1-eliransin@scylladb.com>
(cherry picked from commit 280715ad45)
2019-09-22 11:59:19 +03:00
Avi Kivity
d81ac93728 Update seastar submodule
* seastar b314eb21b1...7dfcf334c4 (1):
  > iotune: fix exception handling in case test file creation fails

Fixes #5001.
2019-09-18 18:36:13 +03:00
Tomasz Grabiec
024d1563ad Revert "Simplify db::cql_type_parser::parse"
This reverts commit 7f64a6ec4b.

Fixes #5011

The reverted commit exposes #3760 for all schemas, not only those
which have UDTs.

The problem is that table schema deserialization now requires keyspace
to be present. If the replica hasn't received schema changes which
introduce the keyspace yet, the write will fail.

(cherry picked from commit 8517eecc28)
2019-09-12 20:17:39 +03:00
yaronkaikov
4a1a281e84 release: prepare for 3.1.0.rc7 2019-09-11 15:15:38 +03:00
Piotr Sarna
d61dd1a933 main: make sure view_builder doesn't propagate semaphore errors
Stopping services which occurs in a destructor of deferred_action
should not throw, or it will end the program with
terminate(). View builder breaks a semaphore during its shutdown,
which results in propagating a broken_semaphore exception,
which in turn results in throwing an exception during stop().get().
In order to fix that issue, semaphore exceptions are explicitly
ignored, since they're expected to appear during shutdown.

Fixes #4875
Fixes #4995.

(cherry picked from commit 23c891923e)
2019-09-10 16:34:46 +03:00
Gleb Natapov
447c1e3bcc messaging_service: configure different streaming domain for each rpc server
A streaming domain identifies a server across shards. Each server should
have different one.

Fixes: #4953

Message-Id: <20190908085327.GR21540@scylladb.com>
(cherry picked from commit 9e9f64d90e)
2019-09-09 20:36:11 +03:00
Botond Dénes
834b92b3d7 stream_session: STREAM_MUTATION_FRAGMENTS: print errors in receive and distribute phase
Currently when an error happens during the receive and distribute phase
it is swallowed and we just return a -1 status to the remote. We only
log errors that happen during responding with the status. This means
that when streaming fails, we only know that something went wrong, but
the node on which the failure happened doesn't log anything.

Fix by also logging errors happening in the receive and distribute
phase. Also mention the phase in which the error happened in both error
log messages.

Refs: #4901
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20190903115735.49915-1-bdenes@scylladb.com>
(cherry picked from commit 783277fb02)
2019-09-09 14:34:33 +03:00
Hagit Segev
2ec036f50c release: prepare for 3.1.0.rc6 2019-09-08 10:32:22 +03:00
Avi Kivity
958fe2024f Update seastar submodule
* seastar c59d019d6b...b314eb21b1 (2):
  > reactor: fix false positives in the stall detector due to large task queue
  > reactor: remove unused _tasks_processed variable

Ref #4955, #4951, #4899, #4898.
2019-09-05 14:39:53 +03:00
Rafael Ávila de Espíndola
cd998b949a sstable: close file_writer if an exception is thrown
The previous code was not exception safe and would eventually cause a
file to be destroyed without being closed, causing an assert failure.

Unfortunately it doesn't seem to be possible to test this without
error injection, since using an invalid directory fails before this
code is executed.

Fixes #4948

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190904002314.79591-1-espindola@scylladb.com>
(cherry picked from commit 000514e7cc)
2019-09-05 10:14:36 +03:00
Avi Kivity
2e1e1392ea storage_proxy: protect _view_update_handlers_list iterators from invalidation
on_down() iterates over _view_update_handlers_list, but it yields during iteration,
and while it yields, elements in that list can be removed, resulting in a
use-after-free.

Prevent this by registering iterators that can be potentially invalidated, and
any time we remove an element from the list, check whether we're removing an element
that is being pointed to by a live iterator. If that is the case, advance the iterator
so that it points at a valid element (or at the end of the list).

Fixes #4912.

Tests: unit (dev)
(cherry picked from commit 301246f6c0)
2019-09-05 09:42:00 +03:00
yaronkaikov
623ea5e3d9 release: prepare for 3.1.0.rc5 2019-09-02 14:42:47 +03:00
Avi Kivity
f92a7ca2bf tools: toolchain: fix dbuild in interactive mode regression
Before ede1d248af, running "tools/toolchain/dbuild -it -- bash" was
a nice way to play in the toolchain environment, for example to start
a debugger. But that commit caused containers to run in detached mode,
which is incompatible with interactive mode.

To restore the old behavior, detect that the user wants interactive mode,
and run the container in non-detached mode instead. Add the --rm flag
so the container is removed after execution (as it was before ede1d248af).

Fixes #4930.

Message-Id: <20190506175942.27361-1-avi@scylladb.com>

(cherry picked from commit db536776d9)
2019-08-29 18:33:44 +03:00
Tomasz Grabiec
d70c2db09c service: Announce the new schema version when features are enabled
Introduced in c96ee98.

We call update_schema_version() after features are enabled and we
recalculate the schema version. This method is not updating gossip
though. The node will still use its database::version() to decide on
syncing, so it will not sync and stay inconsistent in gossip until the
next schema change.

We should call update_schema_version_and_announce() instead so that
the gossip state is also updated.

There is no actual schema inconsistency, but the joining node will
think there is and will wait indefinitely. Making a random schema
change would unblock it.

Fixes #4647.

Message-Id: <1566825684-18000-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit ac5ff4994a)
2019-08-27 08:35:58 +03:00
Paweł Dziepak
e4a39ed319 mutation_partition: verify row::append_cell() precondition
row::append_cell() has a precondition that the new cell column id needs
to be larger than that of any other already existing cell. If this
precondition is violated the row will end up in an invalid state. This
patch adds assertion to make sure we fail early in such cases.

(cherry picked from commit 060e3f8ac2)
2019-08-23 15:05:59 +02:00
Hagit Segev
bb70b9ed56 release: prepare for 3.1.0.rc4 2019-08-22 21:12:42 +03:00
Avi Kivity
e06e795031 Merge "database: assign proper io priority for streaming view updates" from Piotr
"
Streamed view updates parasitized on writing io priority, which is
reserved for user writes - it's now properly bound to streaming
write priority.

Verified manually by checking appropriate io metrics: scylla_io_queue_total_bytes{class="streaming_write" ...} vs scylla_io_queue_total_bytes{class="query" ...}

Tests: unit(dev)
"

Fixes #4615.

* 'assign_proper_io_priority_to_streaming_view_updates' of https://github.com/psarna/scylla:
  db,view: wrap view update generation in stream scheduling group
  database: assign proper io priority for streaming view updates

(cherry picked from commit 2c7435418a)
2019-08-22 16:20:19 +03:00
Piotr Sarna
7d56e8e5bb storage_proxy: fix iterator liveness issue in on_down (#4876)
The loop over view update handlers used a guard in order to ensure
that the object is not prematurely destroyed (thus invalidating
the iterator), but the guard itself was not in the right scope.
Fixed by replacing a 'for' loop with a 'while' loop, which moves
the iterator incrementation inside the scope in which it's still
guarded and valid.

Fixes #4866

(cherry picked from commit 526f4c42aa)
2019-08-21 19:04:56 +03:00
Avi Kivity
417250607b relocatable: switch from run-time relocation to install-time relocation
Our current relocation works by invoking the dynamic linker with the
executable as an argument. This confuses gdb since the kernel records
the dynamic linker as the executable, not the real executable.

Switch to install-time relocation with patchelf: when installing the
executable and libraries, all paths are known, and we can update the
path to the dynamic loader and to the dynamic libraries.

Since patchelf itself is dynamically linked, we have to relocate it
dynamically (with the old method of invoking it via the dynamic linker).
This is okay since it's a one-time operation and since we don't expect
to debug core dumps of patchelf crashes.

We lose the ability to run scylla directly from the uninstalled
tarball, but since the nonroot installer is already moving in the
direction of requiring install.sh, that is not a great loss, and
certainly the ability to debug is more important.

dh_strip barfs on some binaries which were treated with patchelf,
so exclude them from dh_strip. This doesn't lose any functionality,
since these binaries didn't have debug information to begin with
(they are already-stripped Fedora executables).

Fixes #4673.

(cherry-picked from commit 698b72b501)

Backport notes:
 - 3.1 doesn't call install.sh from the debian packager, so add an adjust_bin
   and call it from the debian rules file directly
 - adjusted install.sh for 3.1 prefix (/usr) compared to master prefix (/opt/scylladb)
2019-08-20 17:08:49 +03:00
Pekka Enberg
d06bcef3b7 Merge "docker: relax permission checks" from Avi
"Commit e3f7fe4 added file owner validation to prevent Scylla from
 crashing when it tries to touch a file it doesn't own. However, under
 docker, we cannot expect to pass this check since user IDs are from
 different namespaces: the process runs in a container namespace, but the
 data files usually come from a mounted volume, and so their uids are
 from the host namespace.

 So we need to relax the check. We do this by reverting b1226fb, which
 causes Scylla to run as euid 0 in docker, and by special-casing euid 0
 in the ownership verification step.

 Fixes #4823."

* 'docker-euid-0' of git://github.com/avikivity/scylla:
  main: relax file ownership checks if running under euid 0
  Revert "dist/docker/redhat: change user of scylla services to 'scylla'"

(cherry picked from commit 595434a554)
2019-08-14 08:31:10 +03:00
Tomasz Grabiec
50c5cb6861 Merge "Multishard combining reader more robust reader recreation" from Botond
Make the reader recreation logic more robust, by moving away from
deciding which fragments have to be dropped based on a bunch of
special cases, instead replacing this with a general logic which just
drops all already seen fragments (based on their position).  Special
handling is added for the case when the last position is a range
tombstone with a non full prefix starting position.  Reproducer unit
tests are added for both cases.

Refs #4695
Fixes #4733

(cherry picked from commit 0cf4fab2ca)
2019-08-14 08:30:53 +03:00
Kamil Braun
70f5154109 Fix command line argument parsing in main.
Command line arguments are parsed twice in Scylla: once in main and once in Seastar's app_template::run.
The first parse is there to check if the "--version" flag is present --- in this case the version is printed
and the program exits. The second parsing is correct; however, most of the arguments were improperly treated
as positional arguments during the first parsing (e.g., "--network host" would treat "host" as a positional argument).
This happened because the arguments weren't known to the command line parser.
This commit fixes the issue by moving the parsing code until after the arguments are registered.
Resolves #4141.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
(cherry picked from commit f155a2d334)
2019-08-13 20:11:26 +03:00
Rafael Ávila de Espíndola
329c419c30 Always close commitlog files
We were using segment::_closed to decide whether _file was already
closed. Unfortunately they are not exactly the same thing. As far as
I understand it, segments can be closed and reused without actually
closing the file.

Found with a seastar patch that asserts on destroying an open
append_challenged_posix_file_impl.

Fixes #4745.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190721171332.7995-1-espindola@scylladb.com>
(cherry picked from commit 636e2470b1)
2019-08-13 19:59:13 +03:00
Avi Kivity
062d43c76e Merge "Unbreak the Unbreakable Linux" from Glauber
"
scylla_setup is currently broken for OEL. This happens because the
OS detection code checks for RHEL and Fedora. CentOS returns itself
as RHEL, but OEL does not.
"

Fixes #4842.

* 'unbreakable' of github.com:glommer/scylla:
  scylla_setup: be nicer about unrecognized OS
  scylla_util: recognize OEL as part of the RHEL family

(cherry picked from commit 1cf72b39a5)
2019-08-13 16:52:05 +03:00
Avi Kivity
cf4c238b28 Merge "Catch unclosed partition sstable write #4794" from Tomasz
"
Not emitting partition_end for a partition is incorrect. The SSTable
writer assumes that it is emitted. If it's not, the sstable will not
be written correctly. The partition index entry for the last partition
will be left partially written, which will result in errors during
reads. Also, statistics and sstable key ranges will not include the
last partition.

It's better to catch this problem at the time of writing, and not
generate bad sstables.

Another way of handling this would be to implicitly generate a
partition_end, but I don't think that we should do this. We cannot
trust the mutation stream when invariants are violated, we don't know
if this was really the last partition which was supposed to be
written. So it's safer to fail the write.

Enabled for both mc and la/ka.

Passing --abort-on-internal-error on the command line will switch to
aborting instead of throwing an exception.

The reason we don't abort by default is that it may bring the whole
cluster down and cause unavailability, while it may not be necessary
to do so. It's safer to fail just the affected operation,
e.g. repair. However, failing the operation with an exception leaves
little information for debugging the root cause. So the idea is that the
user would enable aborts on only one of the nodes in the cluster to
get a core dump and not bring the whole cluster down.
"
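The abort-or-throw policy described above can be sketched as a single choke point (hypothetical names, not Scylla's actual helpers): a config flag decides whether an internal-error report aborts the process or throws, and the writer validates the invariant at write time.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the --abort-on-internal-error idea.
inline bool abort_on_internal_error = false;

[[noreturn]] void on_internal_error(const std::string& msg) {
    if (abort_on_internal_error) {
        std::abort();                      // core dump on the one opted-in node
    }
    throw std::runtime_error(msg);         // default: fail only this operation
}

// Writer-side validation: the input mutation stream must have closed
// its last partition before it ends.
void check_partition_closed(bool partition_open) {
    if (partition_open) {
        on_internal_error("mutation stream ended without partition_end");
    }
}
```

By default the failed write surfaces as an exception that fails only the affected operation; flipping the flag on one node trades that for a debuggable core dump.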

* 'catch-unclosed-partition-sstable-write' of https://github.com/tgrabiec/scylla:
  sstables: writer: Validate that partition is closed when the input mutation stream ends
  config, exceptions: Add helper for handling internal errors
  utils: config_file: Introduce named_value::observe()

(cherry picked from commit 95c0804731)
2019-08-08 13:13:42 +02:00
Amnon Heiman
20090c1992 init: do not allow replace-address for seeds
If a node is a seed node, it cannot be started with
replace-address-first-boot or the replace-address flag.

The issue is that, as a seed node, it will generate new tokens instead of
replacing the existing ones the user expects it to replace when supplying
the flags.

This patch will throw a bad_configuration_error exception
in this case.

Fixes #3889

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 399d79fc6f)
2019-08-07 22:04:58 +03:00
Raphael S. Carvalho
8ffb567474 table: do not rely on undefined behavior in cleanup_sstables
It shouldn't rely on argument evaluation order, which is undefined behavior (UB).
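A self-contained sketch of the hazard (generic code, not the actual cleanup_sstables): when two argument expressions both touch the same container, C++ leaves their evaluation order unspecified, and the fix is to sequence them through named locals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// take_front() both reads and shrinks the vector, so an unsequenced
// call like process(take_front(v), v.size()) could observe either size.
int take_front(std::vector<int>& v) {
    int x = v.front();
    v.erase(v.begin());
    return x;
}

int process(int head, std::size_t remaining) {
    return head * 100 + static_cast<int>(remaining);
}

// The fix: name the intermediate results so each step is fully
// sequenced before the next one runs.
int process_sequenced(std::vector<int>& v) {
    int head = take_front(v);       // guaranteed to run first
    std::size_t rest = v.size();    // sees the vector after the erase
    return process(head, rest);
}
```

The sequenced version gives one well-defined answer on every compiler; the unsequenced call may silently give either of two.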

Fixes #4718.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 0e732ed1cf)
2019-08-07 21:48:44 +03:00
Tomasz Grabiec
710ec83d12 Merge "Fix the system.size_estimates table" from Kamil
Fixes a segfault when querying for an empty keyspace.

Also, fixes an infinite loop on smp > 1. Queries to
system.size_estimates table which are not single-partition queries
caused Scylla to go into an infinite loop inside
multishard_combining_reader::fill_buffer. This happened because
multishard_combining_reader assumes that shards return rows belonging
to separate partitions, which was not the case for
size_estimates_mutation_reader.

Fixes #4689.

(cherry picked from commit 14700c2ac4)
2019-08-07 21:38:38 +03:00
Asias He
8d7c489436 streaming: Move stream_mutation_fragments_cmd to a new file (#4812)
Avoid including the lengthy stream_session.hh in messaging_service.

More importantly, fix the build, because currently messaging_service.cc
and messaging_service.hh do not include stream_mutation_fragments_cmd.
I am not sure why it builds on my machine. Spotted this when backporting
"streaming: Send error code from the sender to receiver" to the 3.0
branch.

Refs: #4789
(cherry picked from commit 49a73aa2fc)
2019-08-07 19:11:51 +02:00
Asias He
6ec558e3a0 streaming: Send error code from the sender to receiver
In case of an error on the sender side, the sender does not propagate the
error to the receiver. The sender will close the stream. As a result,
the receiver will get nullopt from the source in
get_next_mutation_fragment and pass a mutation_fragment_opt with no value
to the generating_reader. In turn, the generating_reader generates end
of stream. However, the last element that the generating_reader has
generated can be any type of mutation_fragment. This makes the sstable
that consumes the generating_reader violate the mutation_fragment
stream rules.

To fix, we need to propagate the error. However, RPC streaming does not
support propagating errors in the framework, so the user has to send an
error code explicitly.
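The shape of that fix can be sketched like this (invented message types, not Scylla's actual wire format): every stream message is either a data fragment or an explicit error code, and the receiver turns the latter into an exception instead of a clean end-of-stream.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

struct fragment { std::string data; };
struct error_code { int code; };
using stream_msg = std::variant<fragment, error_code>;

// Receiver side: nullopt remains the one legitimate end-of-stream
// signal; an error_code sent by the sender surfaces as an exception,
// so a failed stream can no longer masquerade as a complete one.
std::optional<std::string> next_fragment(std::vector<stream_msg>& in) {
    if (in.empty()) {
        return std::nullopt;                       // clean end of stream
    }
    stream_msg msg = std::move(in.front());
    in.erase(in.begin());
    if (auto* err = std::get_if<error_code>(&msg)) {
        throw std::runtime_error("sender failed with code " + std::to_string(err->code));
    }
    return std::get<fragment>(msg).data;
}
```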

Fixes: #4789
(cherry picked from commit bac987e32a)
(cherry picked from commit 288371ce75)
2019-08-07 19:11:33 +02:00
Tomasz Grabiec
b1e2842c8c sstables: ka/la: reader: Make sure push_ready_fragments() does not miss to emit partition_end
Currently, if there is a fragment in _ready and _out_of_range was set
after the row end was consumed, push_ready_fragments() would return
without emitting partition_end.

This is problematic once we make consume_row_start() emit
partition_start directly, because we will want to assume that all
fragments for the previous partition are emitted by then. If they're
not, then we'd emit partition_start before partition_end for the
previous partition. The fix is to make sure that
push_ready_fragments() emits everything.

Fixes #4786

(cherry picked from commit 9b8ac5ecbc)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-08-01 13:06:08 +03:00
Avi Kivity
5a273737e3 Update seastar submodule
* seastar 82637dcab4...c59d019d6b (1):
  > reactor: fix deadlock of stall detector vs dlopen

Fixes #4759.
2019-07-31 18:31:40 +03:00
Avi Kivity
b0d2312623 toppartitions: fix race between listener removal and reads
Data listener reads are implemented as flat_mutation_readers, which
take a reference to the listener and then execute asynchronously.
The listener can be removed between the time when the reference is
taken and actual execution, resulting in a dangling pointer
dereference.

Fix by using a weak_ptr to avoid writing to a destroyed object. Note that writes
don't need protection because they execute atomically.
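A minimal model of the pattern (std::weak_ptr standing in for seastar's weak_ptr; types invented): the asynchronous reader keeps only a weak reference and promotes it at use time, so a listener removed in the meantime is seen as expired rather than dereferenced.

```cpp
#include <cassert>
#include <memory>

struct listener {
    int rows_seen = 0;
    void on_row() { ++rows_seen; }
};

// The reader holds a weak_ptr taken when it was created; by the time
// it actually runs, the listener may already have been removed.
void deliver_row(const std::weak_ptr<listener>& l) {
    if (auto alive = l.lock()) {   // promotes only while the listener lives
        alive->on_row();
    }                              // otherwise: drop the update, no dangling access
}
```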

Fixes #4661.

Tests: unit (dev)
(cherry picked from commit e03c7003f1)
2019-07-28 13:53:40 +03:00
Avi Kivity
2f007d8e6b sstable: index_reader: close index_reader::reader more robustly
If we had an error while reading, then we would have failed to close
the reader, which in turn can cause memory corruption. Make the
closing more robust by using then_wrapped (that doesn't skip on
exception) and log the error for analysis.

Fixes #4761.

(cherry picked from commit b272db368f)
2019-07-27 18:19:57 +03:00
yaronkaikov
bebfd7b26c release: prepare for 3.1.0.rc3 2019-07-25 12:15:55 +03:00
Tomasz Grabiec
03b48b2caf database: Add missing partition slicing on streaming reader recreation
streaming_reader_lifecycle_policy::create_reader() was ignoring the
partition_slice passed to it and always creating the reader for the
full slice.

That's wrong because create_reader() is called when recreating a
reader after it's evicted. If the reader stopped in the middle of a
partition, we need to start from that point. Otherwise, fragments in
the mutation stream will appear duplicated or out of order, violating
assumptions of the consumers.

This was observed to result in repair writing incorrect sstables with
duplicated clustering rows, which results in
malformed_sstable_exception on read from those sstables.

Fixes #4659.

In v2:

  - Added an overload without partition_slice to avoid changing existing users which never slice

Tests:

  - unit (dev)
  - manual (3 node ccm + repair)

Backport: 3.1
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1563451506-8871-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 7604980d63)
2019-07-22 15:08:09 +03:00
Avi Kivity
95362624bc Merge "Fix disable_sstable_write synchronization with on_compaction_completion" from Benny
"
disable_sstable_write needs to acquire _sstable_deletion_sem to properly synchronize
with background deletions done by on_compaction_completion to ensure no sstables will
be created or deleted during reshuffle_sstables after
storage_service::load_new_sstables disables sstable writes.

Fixes #4622

Test: unit(dev), nodetool_additional_test.py migration_test.py
"

* 'scylla-4622-fix-disable-sstable-write' of https://github.com/bhalevy/scylla:
  table: document _sstables_lock/_sstable_deletion_sem locking order
  table: disable_sstable_write: acquire _sstable_deletion_sem
  table: uninline enable_sstable_write
  table: reshuffle_sstables: add log message

(cherry picked from commit 43690ecbdf)
2019-07-22 13:47:25 +03:00
Asias He
7865c314a5 repair: Avoid deadlock in remove_repair_meta
Start n1, n2
Create ks with rf = 2
Run repair on n2
Stop n2 in the middle of repair
n1 will notice n2 is DOWN, gossip handler will remove repair instance
with n2 which calls remove_repair_meta().

Inside remove_repair_meta(), we have:

```
1        return parallel_for_each(*repair_metas, [repair_metas] (auto& rm) {
2            return rm->stop();
3        }).then([repair_metas, from] {
4            rlogger.debug("Removed all repair_meta for single node {}", from);
5        });
```

Since 3.1, we start 16 repair instances in parallel, which will create 16
readers. The reader semaphore limit is 10.

At line 2, it calls

```
6    future<> stop() {
7       auto gate_future = _gate.close();
8       auto writer_future = _repair_writer.wait_for_writer_done();
9       return when_all_succeed(std::move(gate_future), std::move(writer_future));
10    }
```

The gate protects the reader to read data from disk:

```
11 with_gate(_gate, [] {
12   read_rows_from_disk
13        return _repair_reader.read_mutation_fragment() --> calls reader() to read data
14 })
```

So line 7 won't return until all the 16 readers return from the call of
reader().

The problem is, the reader won't release the reader semaphore until the
reader is destroyed!
So, even if 10 out of the 16 readers have finished reading, they won't
release the semaphore. As a result, the stop() hangs forever.

To fix in the short term, we can delete the reader, i.e., drop the
repair_meta object once it is stopped.
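The lifetime problem can be modeled in a few lines (toy semaphore, not the real reader concurrency semaphore): a reader returns its unit only from its destructor, so a stopped-but-alive reader still starves new ones, and the remedy is to destroy it promptly.

```cpp
#include <cassert>
#include <memory>

struct semaphore {
    int free;
    bool try_acquire() { if (free > 0) { --free; return true; } return false; }
    void release() { ++free; }
};

// A reader's unit comes back only when the reader object dies, not
// when it merely finishes reading.
struct reader {
    semaphore* sem;
    explicit reader(semaphore& s) : sem(s.try_acquire() ? &s : nullptr) {}
    bool acquired() const { return sem != nullptr; }
    ~reader() { if (sem) sem->release(); }
};
```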

Refs: #4693
(cherry picked from commit 8774adb9d0)
2019-07-21 13:31:08 +03:00
Asias He
0e6b62244c streaming: Do not open rpc stream connection if ranges are not relevant to a shard
Given a list of ranges to stream, stream_transfer_task will create a
reader with the ranges and create an rpc stream connection on all the shards.

When the user provides ranges to repair with the -st and -et options, e.g.,
using scylla-manager, such ranges can belong to only one shard; repair
will pass such ranges to streaming.

As a result, only one shard will have data to send, while the rpc stream
connections are created on all the shards, which can cause the kernel to
run out of ports on some systems.

To mitigate the problem, do not open the connection if the ranges do not
belong to the shard at all.

Refs: #4708
(cherry picked from commit 64a4c0ede2)
2019-07-21 10:23:49 +03:00
Kamil Braun
9d722a56b3 Fix timestamp_type_impl::timestamp_from_string.
Now it accepts the 'z' or 'Z' timezone, denoting UTC+00:00.
Fixes #4641.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
(cherry picked from commit 4417e78125)
2019-07-17 21:54:47 +03:00
Eliran Sinvani
7009d5fb23 auth: Prevent race between role_manager and pasword_authenticator
When scylla is started for the first time with PasswordAuthenticator
enabled, it can happen that a record of the default superuser
is created in the table with can_login and is_superuser
set to null. This happens because the module in charge of creating
the row is the role manager, while the module in charge of setting the
default password salted hash value is the password authenticator.
Those two modules are started together; in the case where the
password authenticator finishes its initialization first, then until
the role manager completes its initialization, the row
contains those null columns, and any login attempt in this period
will cause a memory access violation, since those columns are not
expected to ever be null. This patch removes the race by starting
the password authenticator and authorizer only after the role manager
has finished its initialization.

Tests:
  1. Unit tests (release)
  2. Auth and cqlsh auth related dtests.

Fixes #4226

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190714124839.8392-1-eliransin@scylladb.com>
(cherry picked from commit 997a146c7f)
2019-07-15 21:18:05 +03:00
Takuya ASADA
eb49fae020 reloc: provide libthread_db.so.1 to debug thread on gdb
In the scylla-debuginfo package, we have /usr/lib/debug/opt/scylladb/libreloc/libthread_db-1.0.so-666.development-0.20190711.73a1978fb.el7.x86_64.debug,
but we do not actually have libthread_db.so.1 in /opt/scylladb/libreloc,
since it does not show up in the ldd output for the scylla binary.

To debug threads, we need to add the library to the relocatable package manually.

Fixes #4673

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190711111058.7454-1-syuu@scylladb.com>
(cherry picked from commit 842f75d066)
2019-07-15 14:50:56 +03:00
Asias He
92bf928170 repair: Allow repair when a replica is down
Since commit bb56653 (repair: Sync schema from follower nodes before
repair), the behaviour of handling a down node during repair has
changed. That is, if a repair follower is down, we will fail to sync
schema with it, and the repair of the range will be skipped. This means
a range cannot be repaired unless all the nodes holding replicas are up.

To fix, we filter out the nodes that are down, mark the repair as
partial, and repair with the nodes that are still up.
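A sketch of that filtering step (simplified types; the real code consults gossiper state): drop the down replicas, remember that the repair was partial, and proceed with the live set.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct repair_plan {
    std::vector<std::string> nodes;   // replicas we will actually repair with
    bool partial = false;             // true if any replica was skipped
};

repair_plan filter_live(const std::vector<std::string>& replicas,
                        const std::vector<std::string>& down) {
    repair_plan plan;
    for (const auto& n : replicas) {
        if (std::find(down.begin(), down.end(), n) == down.end()) {
            plan.nodes.push_back(n);
        } else {
            plan.partial = true;      // this range's repair is only partial
        }
    }
    return plan;
}
```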

Tests: repair_additional_test:RepairAdditionalTest.repair_with_down_nodes_2b_test
Fixes: #4616
Backports: 3.1

Message-Id: <621572af40335cf5ad222c149345281e669f7116.1562568434.git.asias@scylladb.com>
(cherry picked from commit 39ca044dab)
2019-07-11 11:44:49 +03:00
Rafael Ávila de Espíndola
deac0b0e94 mc writer: Fix exception safety when closing _index_writer
This fixes a possible cause of #4614.

From the backtrace in that issue, it looks like a file is being closed
twice. The first point in the backtrace where that seems likely is in
the MC writer.

My first idea was to add a writer::close and make it the responsibility
of the code using the writer to call it. That way we would move work
out of the destructor.

That is a bit hard since the writer is destroyed from
flat_mutation_reader::impl::~consumer_adapter and that would need to
get a close function too.

This patch instead just fixes an exception safety issue. If
_index_writer->close() throws, _index_writer is still valid and
~writer will try to close it again.

If the exception was thrown after _completed.set_value(), that would
explain the assert about _completed.set_value() being called twice.

With this patch the path outside of the destructor now moves the
writer to a local variable before trying to close it.
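The move-to-local pattern looks roughly like this (toy types, not the actual mc writer): once the member is moved out, a throwing close() leaves the member null, so the destructor cannot close the same file a second time.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct index_writer {
    bool closed = false;
    void close() {
        if (closed) throw std::logic_error("file closed twice");
        closed = true;
    }
};

struct writer {
    std::unique_ptr<index_writer> _index_writer = std::make_unique<index_writer>();

    void close_index() {
        auto local = std::move(_index_writer);   // member becomes null first
        local->close();                          // even if this throws, the
    }                                            // destructor sees a null member

    ~writer() {
        if (_index_writer) {                     // only closes a never-closed writer
            try { _index_writer->close(); } catch (...) {}
        }
    }
};
```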

Fixes #4614
Message-Id: <20190710171747.27337-1-espindola@scylladb.com>

(cherry picked from commit 281f3a69f8)
2019-07-11 11:42:48 +03:00
kbr-
c294000113 Implement tuple_type_impl::to_string_impl. (#4645)
Resolves #4633.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
(cherry picked from commit 8995945052)
2019-07-08 11:08:10 +03:00
Avi Kivity
18bb2045aa Update seastar submodule
* seastar 4cdccae53b...82637dcab4 (1):
  > perftune.py: fix the i3 metal detection pattern

Ref #4057.
2019-07-02 13:49:21 +03:00
Avi Kivity
5e3276d08f Update seastar submodule to point to scylla-seastar.git
This allows us to add 3.1 specific patches to Seastar.
2019-07-02 13:47:50 +03:00
Piotr Sarna
acff367ea8 main: stop view builder conditionally
The view builder is started only if it's enabled in config,
via the view_building=true variable. Unfortunately, stopping
the builder was unconditional, which may result in failed
assertions during shutdown. To remedy this, view building
is stopped only if it was previously started.

Fixes #4589

(cherry picked from commit efa7951ea5)
2019-06-26 10:45:50 +03:00
Tomasz Grabiec
e39724a343 Merge "Sync schema before repair" from Asias
This series makes sure new schema is propagated to repair master and
follower nodes before repair.

Fixes #4575

* dev.git asias/repair_pull_schema_v2:
  migration_manager: Add sync_schema
  repair: Sync schema from follower nodes before repair

(cherry picked from commit 269e65a8db)
2019-06-26 09:35:42 +02:00
Asias He
31c4db83d8 repair: Avoid searching all the rows in to_repair_rows_on_wire
The repair_rows in row_list are sorted. It is only possible for the
current repair_row to share the same partition key with the last
repair_row inserted into repair_row_on_wire. So, no need to search from
the beginning of the repair_rows_on_wire to avoid quadratic complexity.
To fix, look at the last item in repair_rows_on_wire.
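The linear grouping can be sketched as follows (simplified stand-ins for repair_row and repair_rows_on_wire): because the input is sorted by partition key, only the most recently appended group can match, so comparing against back() is enough.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using row = std::pair<int, std::string>;   // (partition key, row payload)
using rows_on_wire = std::vector<std::pair<int, std::vector<std::string>>>;

rows_on_wire group_sorted_rows(const std::vector<row>& sorted_rows) {
    rows_on_wire out;
    for (const auto& [pk, payload] : sorted_rows) {
        if (!out.empty() && out.back().first == pk) {
            out.back().second.push_back(payload);   // joins the last group: O(1)
        } else {
            out.push_back({pk, {payload}});         // a new partition begins
        }
    }
    return out;
}
```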

Fixes #4580
Message-Id: <08a8bfe90d1a6cf16b67c210151245879418c042.1561001271.git.asias@scylladb.com>

(cherry picked from commit b99c75429a)
2019-06-25 12:48:37 +02:00
Tomasz Grabiec
433cb93f7a Merge "Use same schema version for repair nodes" from Asias
This patch set fixes repair nodes using different schema versions and
optimizes the hashing, thanks to the fact that all nodes now use the
same schema version.

Fixes: #4549

* seastar-dev.git asias/repair_use_same_schema.v3:
  repair: Use the same schema version for repair master and followers
  repair: Hash column kind and id instead of column name and type name

(cherry picked from commit cd1ff1fe02)
2019-06-23 20:57:16 +03:00
Avi Kivity
f553819919 Merge "Fix infinite paging for indexed queries" from Piotr
"
Fixes #4569

This series fixes the infinite paging for indexed queries issue.
Before this fix, paging indexes tended to end up in an infinite loop
of returning pages with 0 results, but has_more_pages flag set to true,
which confused the drivers.

Tests: unit(dev)
Branches: 3.0, 3.1
"

* 'fix_infinite_paging_for_indexed_queries' of https://github.com/psarna/scylla:
  tests: add test case for finishing index paging
  cql3: fix infinite paging for indexed queries

(cherry picked from commit 9229afe64f)
2019-06-23 20:54:27 +03:00
Nadav Har'El
48c34e7635 storage_proxy: fix race and crash in case of MV and other node shutdown
Recently, in merge commit 2718c90448,
we added the ability to cancel pending view-update requests when we detect
that the target node went down. This is important for view updates because
these have a very long timeout (5 minutes), and we wanted to make this
timeout even longer.

However, the implementation caused a race: Between *creating* the update's
request handler (create_write_response_handler()) and actually starting
the request with this handler (mutate_begin()), there is a preemption point
and we may end up deleting the request handler before starting the request.
So mutate_begin() must gracefully handle the case of a missing request
handler, and not crash with a segmentation fault as it did before this patch.
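The graceful-miss handling can be modeled like this (invented types; the real handlers live in storage_proxy): the write looks its handler up again when it starts, and a handler erased in the preemption window is treated as a cancelled write rather than dereferenced.

```cpp
#include <cassert>
#include <map>
#include <string>

struct response_handler { std::string target; };
using handler_map = std::map<int, response_handler>;

// Returns false when the handler was already removed (e.g. the target
// node was marked down between creation and start), instead of crashing.
bool mutate_begin(handler_map& handlers, int id) {
    auto it = handlers.find(id);
    if (it == handlers.end()) {
        return false;              // request cancelled in the preemption window
    }
    // ... start the write using it->second.target ...
    return true;
}
```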

Eventually the lifetime management of request handlers could be refactored
to avoid this delicate fix (which requires more comments to explain than
code), or even better, it would be more correct to cancel individual writes
when a node goes down, not drop the entire handler (see issue #4523).
However, for now, let's not do such invasive changes and just fix bug that
we set out to fix.

Fixes #4386.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190620123949.22123-1-nyh@scylladb.com>
(cherry picked from commit 6e87bca65d)
2019-06-23 20:53:01 +03:00
Nadav Har'El
7f85b30941 Fix deciding whether a query uses indexing
The code that decides whether a query should use indexing was buggy: a partition key index might have influenced the decision even if the whole partition key was passed in the query (which effectively means that indexing is not necessary).

Fixes #4539

Closes https://github.com/scylladb/scylla/pull/4544

Merged from branch 'fix_deciding_whether_a_query_uses_indexing' of git://github.com/psarna/scylla
  tests: add case for partition key index and filtering
  cql3: fix deciding if a query uses indexing

(cherry picked from commit 6aab1a61be)
2019-06-18 13:25:18 +03:00
Hagit Segev
7d14514b8a release: prepare for 3.1.0.rc2 2019-06-16 20:28:31 +03:00
Piotr Sarna
35f906f06f tests: add a test case for filtering clustering key
The test case makes sure that clustering key restriction
columns are fetched for filtering if they form a clustering key prefix
but not a primary key prefix (partition key columns are missing).

Ref #4541
Message-Id: <3612dc1c6c22c59ac9184220a2e7f24e8d18407c.1560410018.git.sarna@scylladb.com>

(cherry picked from commit 2c2122e057)
2019-06-16 14:36:52 +03:00
Piotr Sarna
2c50a484f5 cql3: fix qualifying clustering key restrictions for filtering
Clustering key restrictions can sometimes avoid filtering if they form
a prefix, but that can happen only if the whole partition key is
restricted as well.

Ref #4541
Message-Id: <9656396ee831e29c2b8d3ad4ef90c4a16ab71f4b.1560410018.git.sarna@scylladb.com>

(cherry picked from commit c4b935780b)
2019-06-16 14:36:52 +03:00
Piotr Sarna
24ddb46707 cql3: fix fetching clustering key columns for filtering
When a column is not present in the select clause, but used for
filtering, it usually needs to be fetched from replicas.
Sometimes it can be avoided, e.g. if primary key columns form a valid
prefix - then, they will be optimized out before filtering itself.
However, clustering key prefix can only be qualified for this
optimization if the whole partition key is restricted - otherwise
the clustering columns still need to be present for filtering.

This commit also fixes tests in cql_query_test suite, because they now
expect more values - columns fetched for filtering will be present as
well (only internally, the clients receive only data they asked for).

Fixes #4541
Message-Id: <f08ebae5562d570ece2bb7ee6c84e647345dfe48.1560410018.git.sarna@scylladb.com>

(cherry picked from commit adeea0a022)
2019-06-16 14:36:52 +03:00
Dejan Mircevski
f2fc3f32af tests: Add cquery_nofail() utility
Most tests await the result of cql_test_env::execute_cql().  Most
would also benefit from reporting errors with top-level location
included.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
(cherry picked from commit a9849ecba7)
2019-06-16 14:36:52 +03:00
Asias He
c9f488ddc2 repair: Avoid writing row with same partition key and clustering key more than once
Consider

   master: row(pk=1, ck=1, col=10)
follower1: row(pk=1, ck=1, col=20)
follower2: row(pk=1, ck=1, col=30)

When repair runs, master fetches row(pk=1, ck=1, col=20) and row(pk=1,
ck=1, col=30) from follower1 and follower2.

Then repair master sends row(pk=1, ck=1, col=10) and row(pk=1, ck=1,
col=30) to follower1, follower1 will write the row with the same
pk=1, ck=1 twice, which violates uniqueness constraints.

To fix, we apply a row with the same pk and ck onto the previous row.
We only need this on the repair follower, because the rows can come from
multiple nodes. On the repair master, we have an sstable writer per
follower, so the rows fed into an sstable writer can come from only a
single node.

Tests: repair_additional_test.py:RepairAdditionalTest.repair_same_row_diff_value_3nodes_test
Fixes: #4510
Message-Id: <cb4fbba1e10fb0018116ffe5649c0870cda34575.1560405722.git.asias@scylladb.com>
(cherry picked from commit 9079790f85)
2019-06-16 10:23:58 +03:00
Asias He
46498e77b8 repair: Allow repair_row to initialize partially
On a repair follower node, only the decorated_key_with_hash and the
mutation_fragment inside repair_row are used in apply_rows() to apply
the rows to disk. Allow repair_row to initialize partially, and throw if
an uninitialized member is accessed, to be safe.
Message-Id: <b4e5cc050c11b1bafcf997076a3e32f20d059045.1560405722.git.asias@scylladb.com>

(cherry picked from commit 912ce53fc5)
2019-06-16 10:23:50 +03:00
Piotr Jastrzebski
440f33709e sstables: distinguish empty and missing cellpath
Before this patch, the mc sstables writer was ignoring
empty cellpaths. This is wrong behaviour, because
it is possible to have an empty key in a map. In such a case,
our writer creates a wrong sstable that we can't read back.
This is because a complex cell expects a cellpath for each
simple cell it has. When the writer ignores an empty cellpath,
it writes nothing; instead, it should write a length
of zero to the file so that we know there's an empty cellpath.
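A sketch of the corrected encoding (invented byte layout, not the actual mc format): an empty cellpath writes an explicit zero length, while a genuinely absent one writes nothing, so the reader can count one path per simple cell.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Absent path: write nothing. Present path (even empty): write its
// length, then its bytes, so an empty map key round-trips correctly.
void write_cell_path(std::vector<uint8_t>& out, const std::optional<std::string>& path) {
    if (!path) {
        return;                                        // absent: nothing written
    }
    out.push_back(static_cast<uint8_t>(path->size())); // empty path => explicit 0
    out.insert(out.end(), path->begin(), path->end());
}
```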

Fixes #4533

Tests: unit(release)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <46242906c691a56a915ca5994b36baf87ee633b7.1560532790.git.piotr@scylladb.com>
(cherry picked from commit a41c9763a9)
2019-06-16 09:04:24 +03:00
Pekka Enberg
34696e1582 dist/docker: Switch to 3.1 release repository 2019-06-14 08:10:02 +03:00
Takuya ASADA
43bb290705 dist/docker/redhat: change user of scylla services to 'scylla'
On branch-3.1 / master, we are getting following error:

ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/data: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/commitlog: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)
ERROR 2019-06-11 10:58:49,156 [shard 0] database - /var/lib/scylla/view_hints: File not owned by current euid: 0. Owner is: 999
ERROR 2019-06-11 10:58:49,156 [shard 0] init - Failed owner and mode verification: std::runtime_error (File not owned by current euid: 0. Owner is: 999)

It seems like owner verification of the data directory fails because
the scylla-server process is running as root but the data directory is
owned by scylla, so we should run the services as the scylla user.

Fixes #4536
Message-Id: <20190611113142.23599-1-syuu@scylladb.com>

(cherry picked from commit b1226fb15a)
2019-06-14 08:02:45 +03:00
Calle Wilund
53980816de api.hh: Fix bool parsing in req_param
Fixes #4525

req_param uses boost::lexical_cast to convert text->var.
However, lexical_cast does not handle textual booleans,
thus param=true causes not only wrong values, but
exceptions.

Message-Id: <20190610140511.15478-1-calle@scylladb.com>
(cherry picked from commit 26702612f3)
2019-06-13 11:56:27 +03:00
Vlad Zolotarov
c1f4617530 fix_system_distributed_tables.py: declare the 'port' argument as 'int'
If a port value is passed as a string, this makes cluster.connect()
fail with Python 3.4.

Let's fix this by explicitly declaring a 'port' argument as 'int'.

Fixes #4527

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190606133321.28225-1-vladz@scylladb.com>
(cherry picked from commit 20a610f6bc)
2019-06-13 11:45:54 +03:00
Raphael S. Carvalho
efde9416ed sstables: fix log of failure on large data entry deletion by fixing use-after-move
Fixes #4532.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190527200828.25339-1-raphaelsc@scylladb.com>
(cherry picked from commit 62aa0ea3fa)
2019-06-13 11:44:22 +03:00
Hagit Segev
224f9cee7e release: prepare for 3.1.0.rc1 2019-06-06 18:16:06 +03:00
Hagit Segev
cd1d13f805 release: prepare for 3.1.rc1 2019-06-06 15:32:54 +03:00
Pekka Enberg
899291bc9b relocate_python_scripts.py: Fix node-exporter install on Debian variants
The relocatable Python is built from Fedora packages. Unfortunately TLS
certificates are in a different location on Debian variants, which
causes "node_exporter_install" to fail as follows:

  Traceback (most recent call last):
    File "/usr/lib/scylla/libexec/node_exporter_install", line 58, in <module>
      data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True)
    File "/usr/lib/scylla/scylla_util.py", line 40, in curl
      with urllib.request.urlopen(req) as res:
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 222, in urlopen
      return opener.open(url, data, timeout)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 525, in open
      response = self._open(req, data)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 543, in _open
      '_open', req)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 503, in _call_chain
      result = func(*args)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1360, in https_open
      context=self._context, check_hostname=self._check_hostname)
    File "/opt/scylladb/python3/lib64/python3.7/urllib/request.py", line 1319, in do_open
      raise URLError(err)
  urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1056)>
  Unable to retrieve version information
  node exporter setup failed.

Fix the problem by overriding the SSL_CERT_FILE environment variable to
point to the correct location of the TLS bundle.

Message-Id: <20190604175434.24534-1-penberg@scylladb.com>
(cherry picked from commit eb00095bca)
2019-06-05 22:20:06 +03:00
Paweł Dziepak
4130973f51 tests/perf_fast_forward: report average number of aio operations
perf_fast_forward is used to detect performance regressions. The two
main metrics used for this are fragments per second and the number of
IO operations. The former is a median of several runs, but the
latter is just the actual number of asynchronous IO operations performed
in the run that happened to be picked as the median frag/s-wise. There's
not always a direct correlation between frag/s and aio, and the latter can
vary, which makes it hard to compare.

In order to make this easier a new metric was introduced: "average aio"
which reports the average number of asynchronous IO operations performed
in a run. This should produce much more stable results and therefore
make the comparison more meaningful.
Message-Id: <20190430134401.19238-1-pdziepak@scylladb.com>

(cherry picked from commit 51e98e0e11)
2019-06-05 16:36:09 +03:00
Takuya ASADA
24e2c72888 dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants have a python3 package in their official
repositories, so we didn't have to use relocatable python3 on these
distributions. However, the official python3 version differs on each
distribution, and we may run into issues because of that.
Also, our scripts and packaging implementation increasingly presuppose
the existence of relocatable python3, which is causing issues on Debian
variants.

Switching to relocatable python3 on Debian variants avoids these issues
and makes it easier to manage Scylla python3 environments across multiple
distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190531112707.20082-1-syuu@scylladb.com>
(cherry picked from commit 25112408a7)
2019-06-03 17:42:26 +03:00
Raphael S. Carvalho
69cc7d89c8 compaction: do not unconditionally delete a new sstable in interrupted compaction
After incremental compaction, new sstables may have already replaced old
sstables at any point, meaning that a new sstable is in use by the table
and an old sstable is already deleted while the compaction itself is
UNFINISHED. Therefore, we should *NEVER* delete a new sstable
unconditionally for an interrupted compaction, or data loss could happen.
To fix it, we'll only delete new sstables that didn't replace anything
in the table, meaning they are unused.

Found the problem while auditting the code.

Fixes #4479.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190506134723.16639-1-raphaelsc@scylladb.com>
(cherry picked from commit ef5681486f)
2019-06-03 16:00:20 +03:00
Avi Kivity
5f6c5d566a Revert "dist/debian: support relocatable python3 on Debian variants"
This reverts commit 1fbab82553. Breaks build_deb.sh:

18:39:56 +	seastar/scripts/perftune.py seastar/scripts/seastar-addr2line seastar/scripts/perftune.py
18:39:56 Traceback (most recent call last):
18:39:56   File "./relocate_python_scripts.py", line 116, in <module>
18:39:56     fixup_scripts(archive, args.scripts)
18:39:56   File "./relocate_python_scripts.py", line 104, in fixup_scripts
18:39:56     fixup_script(output, script)
18:39:56   File "./relocate_python_scripts.py", line 79, in fixup_script
18:39:56     orig_stat = os.stat(script)
18:39:56 FileNotFoundError: [Errno 2] No such file or directory: '/data/jenkins/workspace/scylla-master/unified-deb/scylla/build/debian/scylla-package/+'
18:39:56 make[1]: *** [debian/rules:19: override_dh_auto_install] Error 1
2019-05-29 14:00:29 +03:00
Takuya ASADA
f32aea3834 reloc/python3: add license files on relocatable python3 package
It's better to have license files on our python3 distribution.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190516094329.13273-1-syuu@scylladb.com>
(cherry picked from commit 4b08a3f906)
2019-05-29 13:59:38 +03:00
Takuya ASADA
933260cb53 dist/ami: output scylla version information to AMI tags and description
Users may want to know which versions of packages are used for the
AMI; it's good to have that in the AMI tags and description.

To do this, we need to download the .rpm files from the specified
.repo and extract version information from them.

Fixes #4499

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-2-syuu@scylladb.com>
(cherry picked from commit a55330a10b)
2019-05-29 13:59:38 +03:00
Takuya ASADA
f8ff0e1993 dist/ami: build scylla-python3 when specified --localrpm
Since we switched to relocatable python3, we need to build it for AMI too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190520123924.14060-1-syuu@scylladb.com>
(cherry picked from commit abe44c28c5)
2019-05-29 13:59:38 +03:00
Takuya ASADA
1fbab82553 dist/debian: support relocatable python3 on Debian variants
Unlike CentOS, Debian variants have a python3 package in the official
repository, so we don't have to use relocatable python3 on these
distributions. However, the official python3 version differs between
distributions, which may cause issues. Also, our scripts and packaging
implementation increasingly presuppose the existence of relocatable
python3, which is causing issues on Debian variants.

Switching to relocatable python3 on Debian variants avoids these
issues and makes it easier to manage Scylla python3 environments
across multiple distributions.

Fixes #4495

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190526105138.677-1-syuu@scylladb.com>
(cherry picked from commit 4d119cbd6d)
2019-05-26 15:40:56 +03:00
Paweł Dziepak
c664615960 Merge "Fix empty counters handling in MC" from Piotr
"
Before this patchset empty counters were incorrectly persisted in the
MC format: no value was written to disk for them. The correct way is
to still write a header that indicates the counter is empty.

We also need to make sure that reading wrongly persisted empty
counters works, because customers may have sstables with wrongly
persisted empty counters.

Fixes #4363
"

* 'haaawk/4363/v3' of github.com:scylladb/seastar-dev:
  sstables: add test for empty counters
  docs: add CorrectEmptyCounters to sstable-scylla-format
  sstables: Add a feature for empty counters in Scylla.db.
  sstables: Write header for empty counters
  sstables: Remove unused variables in make_counter_cell
  sstables: Handle empty counter value in read path

(cherry picked from commit 899ebe483a)
2019-05-23 22:15:00 +03:00
Benny Halevy
6a682dc5a2 cql3: select_statement: provide default initializer for parameters::_bypass_cache
Fixes #4503

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190521143300.22753-1-bhalevy@scylladb.com>
(cherry picked from commit fae4ca756c)
2019-05-23 08:28:01 +03:00
Gleb Natapov
c1271d08d3 cache_hitrate_calculator: make cache hitrate calculation preemptable
The calculation is done in a non-preemptable loop over all tables, so
if the number of tables is very large it may take a while, since we
also build a string for gossiper state. Make the loop preemptable and
also make the string calculation more efficient by preallocating
memory for it.
Message-Id: <20190516132748.6469-3-gleb@scylladb.com>

(cherry picked from commit 31bf4cfb5e)
2019-05-17 12:38:34 +02:00
Gleb Natapov
0d5c2501b3 cache_hitrate_calculator: do not copy stats map for each cpu
invoke_on_all() copies the provided function for each shard it is
executed on, so by moving the stats map into the capture we copy it
for each shard too. Avoid this by putting it into the top-level
object, which is already captured by reference.
Message-Id: <20190516132748.6469-2-gleb@scylladb.com>

(cherry picked from commit 4517c56a57)
2019-05-17 12:38:30 +02:00
Asias He
0dd84898ee repair: Fix use after free in remove_repair_meta for repair_metas
We should capture repair_metas so that it will not be freed until the
parallel_for_each is finished.

Fixes: #4333
Tests: repair_additional_test.py:RepairAdditionalTest.repair_kill_1_test
Message-Id: <237b20a359122a639330f9f78c67568410aef014.1557922403.git.asias@scylladb.com>
(cherry picked from commit 51c4f8cc47)
2019-05-16 11:12:09 +03:00
Avi Kivity
d568270d7f Merge "gc_clock: Fix hashing to be backwards-compatible" from Tomasz
"
Commit d0f9e00 changed the representation of the gc_clock::duration
from int32_t to int64_t.

Mutation hashing uses appending_hash<gc_clock::time_point>, which by
default feeds duration::count() into the hasher. duration::rep changed
from int32_t to int64_t, which changes the value of the hash.

This affects schema digest and query digests, resulting in mismatches
between nodes during a rolling upgrade.

Fixes #4460.
Refs #4485.
"

* tag 'fix-gc_clock-digest-v2.1' of github.com:tgrabiec/scylla:
  tests: Add test which verifies that schema digest stays the same
  tests: Add sstables for the schema digest test
  schema_tables, storage_service: Make schema digest insensitive to expired tombstones in empty partition
  db/schema_tables: Move feed_hash_for_schema_digest() to .cc file
  hashing: Introduce type-erased interface for the hasher
  hashing: Introduce C++ concept for the hasher
  hashers: Rename hasher to cryptopp_hasher
  gc_clock: Fix hashing to be backwards-compatible

(cherry picked from commit 82b91c1511)
2019-05-15 09:48:05 +03:00
Takuya ASADA
78c57f18c4 dist/ami: fix wrong path of SCYLLA-PRODUCT-FILE
Since the other build_*.sh scripts run inside an extracted relocatable
package, they have SCYLLA-PRODUCT-FILE at the top of the directory.
build_ami.sh does not run in such a condition, so we need to run
SCYLLA-VERSION-GEN first and then refer to build/SCYLLA-PRODUCT-FILE.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190509110621.27468-1-syuu@scylladb.com>
(cherry picked from commit 19a973cd05)
2019-05-13 16:45:25 +03:00
Glauber Costa
ce27949797 Support AWS i3en instances
AWS just released their new instances, the i3en instances. The
instance type is already verified to work well with Scylla; the only
adjustments we need are to advertise that we support it and to
pre-fill the disk information according to the performance numbers
obtained by running the instance.

Fixes #4486
Branches: 3.1

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190508170831.6003-1-glauber@scylladb.com>
(cherry picked from commit a23531ebd5)
2019-05-13 16:45:25 +03:00
Hagit Segev
6b47e23d29 release: prepare for 3.1.0.rc0 2019-05-13 15:03:34 +03:00
Piotr Sarna
1cb6cc0ac4 Revert "view: cache is_index for view pointer"
This reverts commit dbe8491655.
Caching the value was not done in a correct manner, which resulted
in longevity test failures.

Fixes #4478

Branches: 3.1

Message-Id: <762ca9db618ca2ed7702372fbafe8ecd193dcf4d.1557129652.git.sarna@scylladb.com>
(cherry picked from commit cf8d2a5141)
2019-05-08 11:14:11 +03:00
Benny Halevy
67435eff15 time_window_backlog_tracker: fix use after free
Fixes #4465

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190430094209.13958-1-bhalevy@scylladb.com>
(cherry picked from commit 3a2fa82d6e)
2019-05-06 09:38:08 +03:00
Gleb Natapov
086ce13fb9 batchlog_manager: fix array out of bound access
The endpoint_filter() function assumes that each bucket of
std::unordered_multimap contains elements with the same key only, so
its size can be used to know how many elements with a particular key
there are. But this is not the case: elements with different keys may
share a bucket. Fix it by counting keys another way.

Fixes #3229

Message-Id: <20190501133127.GE21208@scylladb.com>
(cherry picked from commit 95c6d19f6c)
2019-05-03 11:59:09 +03:00
Glauber Costa
eb9a8f4442 scylla_setup: respect user's decision not to call housekeeping
The setup script asks the user whether or not housekeeping should
be called, and in the first time the script is executed this decision
is respected.

However if the script is invoked again, that decision is not respected.

This is because the check has the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
 }
 if (version_check) { do_version_check() } else { dont_do_it() }

When it should have the form:

 if (housekeeping_cfg_file_exists) {
    version_check = ask_user();
    if (version_check) { do_version_check() } else { dont_do_it() }
 }

(Thanks python)

This is problematic in systems that are not connected to the internet, since
housekeeping will fail to run and crash the setup script.

Fixes #4462

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502034211.18435-1-glauber@scylladb.com>
(cherry picked from commit 47d04e49e8)
2019-05-03 09:57:31 +03:00
Glauber Costa
178fb5fe5f make scylla_util OS detection robust against empty lines
Newer versions of RHEL ship the os-release file with empty lines at
the end, which our script was not prepared to handle. As such,
scylla_setup would fail.

This patch makes our OS detection robust against that.

Fixes #4473

Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502152224.31307-1-glauber@scylladb.com>
(cherry picked from commit 99c00547ad)
2019-05-03 09:57:21 +03:00
Nadav Har'El
a45b6e41a0 materialized views and secondary index: sometimes allow dropping base columns
Until this patch, dropping columns from a table was completely forbidden
if this table has any materialized views or secondary indexes. However,
this is excessively harsh, and not compatible with Cassandra which does
allow dropping columns from a base table which has a secondary index on
*other* columns. This incompatibility was raised in the following
Stackoverflow question:
https://stackoverflow.com/questions/55757273/error-while-dropping-column-from-a-table-with-secondary-index-scylladb/55776490

In this patch, we allow dropping a base table column if none of its
materialized views *needs* this column. Columns selected by a view
(as regular or key columns) are needed by it, of course, but when
virtual columns are used (namely, there is a view with the same key
columns as the base), *all* columns are needed by the view, so
unfortunately none of the columns may be dropped.

After this patch, when a base-table column cannot be dropped because one
of the materialized views needs it, the error message will look like:

   exceptions::invalid_request_exception: Cannot drop column a from base
   table ks.cf: a materialized view cf_a_idx_index needs this column.

This patch also includes extensive testing for the cases where dropping
columns are now allowed, and not allowed. The secondary-index tests are
especially interesting, because they demonstrate that now usually (when
a non-key column is being indexed) dropping columns will be allowed,
which is what originally bothered the Stackoverflow user.

Fixes #4448.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190429214805.2972-1-nyh@scylladb.com>
2019-04-30 12:13:10 +01:00
Nadav Har'El
92d5f61ba5 cql: support single-value IN restriction wherever EQ restriction is supported
There are several places where IN restrictions are not currently supported,
especially in queries involving a secondary index. However, when the IN
restriction has just a single value, it is nothing more than an equality
restriction and can be converted into one and be supported. So this patch
does exactly this.

Note that Cassandra does this conversion since August 2016, and therefore
supports the special case of single-value IN even where general IN is not
supported. So it's important for Cassandra compatibility that we do this
conversion too.

This patch also includes a test with two queries involving a secondary
index that were previously disallowed because of the "IN" on the primary
key or the indexed column - and are now allowed when the IN restriction
has just a single value. A third query tested is not related to secondary
indexes, but confirms we don't break multi-column single-value IN queries.

Fixes #4455.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190428160317.23328-1-nyh@scylladb.com>
2019-04-30 12:13:06 +01:00
Tomasz Grabiec
1adcb3637e Merge "multishard reader: fix handling of non strictly monotonous positions" from Botond
The shard readers of the multishard reader assumed that the positions in
the data stream are strictly monotonous. This assumption is invalid.
Range tombstones can have positions that they share with other range
tombstones and/or a clustering row. The effect of this false
assumption was that when the shard reader was evicted while the last
seen fragment was a range tombstone, the recreated reader would skip
any unseen fragments that have the same position as that of the last
seen range tombstone.

Fixes: #4418

Branches: master, 3.0, 2019.1

Tests: unit(dev)

* https://github.com/denesb/scylla.git
multishard_reader_handle_non_strictly_monotonous_positions/v4:
  multishard_combining_reader: shard_reader::remote_reader extract
    fill-buffer logic into do_fill_buffer()
  mutlishard_combining_reader: reorder
    shard_reader::remote_reader::do_fill_buffer() code
  position_in_partition_view: add region() accessor
  multishard_combining_reader: fix handling of non-strictly monotonous
    positions
  flat_mutation_reader: add flat_mutation_reader_from_mutations()
    overload with range and slice
  flat_mutation_reader: add make_flat_mutation_reader_from_fragments()
    overload with range and slice
  tests: add unit test for multishard reader correctly handling
    non-strictly monotonous positions
2019-04-30 12:35:28 +02:00
Tomasz Grabiec
077c639e42 Merge "Simplify the result_set_row API" from Rafael
Currently null and missing values are treated differently. Missing
values throw no_such_column. Null values return nullptr, std::nullopt
or throw null_column_value.

The api is a bit confusing since a function returning a std::optional
either returns std::nullopt or throws depending on why there is no
value.

With this patch series only get_nonnull throws and there is only one
exception type.

* https://github.com/espindola/scylla.git espindola/merge-null-and-missing-v2:
  query-result-set: merge handling of null and missing values
  Remove result_set_row::has
  Return a reference from get_nonnull
2019-04-30 11:06:29 +02:00
Rafael Ávila de Espíndola
63c47117b5 Return a reference from get_nonnull
No reason to copy if we don't have to. Now that get_nonnull doesn't
copy, replace a raw use of get_data_value with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 21:14:11 -07:00
Rafael Ávila de Espíndola
0474458872 Remove result_set_row::has
Now that the various get methods return nullptr or std::nullopt on
missing values, we don't need to do double lookups.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 19:56:26 -07:00
Rafael Ávila de Espíndola
2770b29036 query-result-set: merge handling of null and missing values
Nothing seems to differentiate a missing and a null value. This patch
then merges the two exception types and now the only method that
throws is get_nonnull. The other methods return nullptr or
std::nullopt as appropriate.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-29 19:56:20 -07:00
Avi Kivity
3726a4fbd9 Merge "Fix schema disagreement during rolling upgrade" from Tomasz
"
After 7c87405, schema sync includes system_schema.view_virtual_columns in the
schema digest. Old nodes don't know about this table and will not include it
in the digest calculation. As a result, there will be schema disagreement
until the whole cluster is upgraded.

Also, the order in which tables were hashed changed in 7c87405, which
causes digests to differ in some schemas.

Fixes #4457.
"

* tag 'fix-disagreement-during-upgrade-v2' of github.com:tgrabiec/scylla:
  db/schema_tables: Include view_virtual_columns in the digest only when all nodes do
  storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature
  db/schema_tables: Hash schema tables in the same order as on 3.0
  db/schema_tables: Remove table name caching from all_tables()
  treewide: Propagate schema_features to db::schema::all_tables()
  enum_set: Introduce full()
  service/storage_service: Introduce cluster_schema_features()
  schema: Introduce schema_features
  schema_tables: Propagate storage_service& to merge_schema()
  gms/feature: Introduce a more convenient when_enabled()
  gms/feature: Mark all when_enabled() overloads as const
2019-04-29 14:23:53 +03:00
Avi Kivity
ede1d248af tools: toolchain: improve dbuild signal handling
Currently, we use --sig-proxy to forward signals to the container. However, this
requires the container's co-operation, which usually doesn't exist. For example,

    docker run --sig-proxy fedora:29 bash -c "sleep 5"

Does not respond to ctrl-C.

This is a problem for continuous integration. If a build is aborted, Jenkins will
first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give
up and use SIGKILL. If the graceful termination doesn't work, we end up with an
orphan container running on the node, which can then consume enough memory and CPU
to harm the following jobs.

To fix this, trap signals and handle them by killing the container. Also trap
shell exit, and even kill the container unconditionally, since if Jenkins happens
to kill the "docker wait" process the regular paths will not be taken.

We lose a lot by running the container asynchronously with the dbuild shell
script, so we need to add it back:

 - log display: via the "docker logs" command
 - auto-removal of the container: add a "docker rm -f" command on signal
   or normal exit
Message-Id: <20190424130112.794-1-avi@scylladb.com>
2019-04-29 10:05:21 +02:00
Botond Dénes
aa18bb33b9 tests: add unit test for multishard reader correctly handling non-strictly monotonous positions 2019-04-29 10:24:14 +03:00
Botond Dénes
51e81cf027 flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice
To be able to support this new overload, the reader is made
partition-range aware. It will now correctly only return fragments that
fall into the partition-range it was created with. For completeness'
sake and to be able to test it, also implement
`fast_forward_to(const dht::partition_range)`. Slicing is done by
filtering out non-overlapping fragments from the initial list of
fragments. Also add a unit test that runs it through the mutation_source
test suite.
2019-04-29 10:24:14 +03:00
Tomasz Grabiec
c96ee9882b db/schema_tables: Include view_virtual_columns in the digest only when all nodes do
After 7c87405, schema sync includes system_schema.view_virtual_columns
in the schema digest. Old nodes don't know about this table and will
not include it in the digest calculation. As a result, there will be
schema disagreement until the whole cluster is upgraded.

Fix this by taking the new table into account only when the whole
cluster is upgraded.

The table should not be used for anything before this happens. This is
not currently enforced, but should be.

Fixes #4457.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
a108df09f9 storage_service: Introduce the VIEW_VIRTUAL_COLUMNS cluster feature
Needed for determining if all nodes in the cluster are aware of the
new schema table. Only when all nodes are aware of it can we take it
into account when calculating the schema digest; otherwise there would
be permanent schema disagreement during a rolling upgrade.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
73b859005c db/schema_tables: Hash schema tables in the same order as on 3.0
The commit 7c87405 also indirectly changed the order of schema tables
during hash calculation (index table should be taken after all other
tables). This shows up when there is an index created and any of {user
defined type, function, or aggregate}.

Refs #4457.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
394a684a99 db/schema_tables: Remove table name caching from all_tables()
The set of table names will depend on the features and thus will be dynamic.
2019-04-28 15:50:13 +02:00
Tomasz Grabiec
3cb7b2d72e treewide: Propagate schema_features to db::schema::all_tables() 2019-04-28 15:50:13 +02:00
Tomasz Grabiec
f33f0d759d enum_set: Introduce full() 2019-04-28 15:50:12 +02:00
Tomasz Grabiec
1d9b88dceb service/storage_service: Introduce cluster_schema_features() 2019-04-28 15:50:12 +02:00
Tomasz Grabiec
0633fcde10 schema: Introduce schema_features 2019-04-28 15:50:12 +02:00
Tomasz Grabiec
6e2c190b5f schema_tables: Propagate storage_service& to merge_schema()
We will need to calculate cluster schema features at the time we
calculate the schema digest.
2019-04-28 12:33:10 +02:00
Tomasz Grabiec
6db002163f gms/feature: Introduce a more convenient when_enabled()
It can be invoked with a lambda without the ceremony of creating a
class deriving from gms::feature::listener.

The returned registration object controls the listener's scope.
2019-04-28 12:33:10 +02:00
Tomasz Grabiec
22c07b9183 gms/feature: Mark all when_enabled() overloads as const 2019-04-28 12:33:10 +02:00
Rafael Ávila de Espíndola
ee9f3388f6 cql_query_test: Fix a use after return
There was nothing keeping the verify lambda alive after the return. It
worked most of the time since the only state kept by the lambda was
a pointer to cql_test_env.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190426203823.15562-1-espindola@scylladb.com>
2019-04-27 08:06:35 +03:00
Avi Kivity
07d06aee43 Update seastar submodule
* seastar e84d2647c...4cdccae53 (4):
  > Merge "future: Move some code out of line" from Rafael
  > tests: socket_test: Add missing virtual and override
  > build: Don't pass -Wno-maybe-uninitialized to clang
  > Merge "expose file_permssions for creating files and dirs in API" from Benny
2019-04-26 22:58:48 +03:00
Tomasz Grabiec
c6274fdef3 keys: Avoid implicit conversion to partition_key in the hasher of partition_key_view
Message-Id: <1556230107-13557-1-git-send-email-tgrabiec@scylladb.com>
2019-04-26 20:02:35 +03:00
Botond Dénes
bc08f8fd07 flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice
To be able to run the mutation-source test suite with this reader. In
the next patch, this reader will be used in testing another reader, so
it is important to make sure it works correctly first.
2019-04-26 12:43:45 +03:00
Botond Dénes
eba310163d multishard_combining_reader: fix handling of non-strictly monotonous positions
The shard readers under a multishard reader are paused after every
operation executed on them. When paused they can be evicted at any time.
When this happens, they will be re-created lazily on the next
operation, with a start position such that they continue reading from
where the evicted reader left off. This start position is determined
from the last fragment seen by the previous reader. When this position
is a clustering position, the reader will be recreated such that it
reads the clustering range (last-ckey, +inf) of the half-read partition.
This can cause problems if the last fragment seen by the evicted reader
was a range-tombstone. Range tombstones can share the same clustering
position with other range tombstones and potentially one clustering row.
This means that when the reader is recreated, it will start from the
next clustering position, ignoring any unread fragments that share the
same position as the last seen range tombstone.
To fix, ensure that on each fill-buffer call, the buffer contains all
fragments for the last position. To this end, when the last fragment in
the buffer is a range tombstone (with pos x), we continue reading until
we see a fragment with a position y that is greater. This way it is
ensured that we have seen all fragments for pos x and it is safe to
resume the read, starting from after position x.
2019-04-26 11:38:12 +03:00
Botond Dénes
b30af48c83 position_in_partition_view: add region() accessor 2019-04-26 11:38:12 +03:00
Piotr Sarna
037b517c85 service: initialize system distributed keyspace after schema agreement
In order to avoid schema disagreements during upgrades (which may lead
to deadlocks), system distributed keyspace initialization is moved
right before starting the bootstrapping process, after the schema
agreement checks already succeeded.

Fixes #3976
Message-Id: <932e642659df1d00a2953df988f939a81275774a.1556204185.git.sarna@scylladb.com>
2019-04-25 18:44:08 +02:00
Raphael S. Carvalho
ccb29c6c20 sstables: make partitioned sstable set available to custom compaction strategies
To make it available, we'll need to make the usage of level metadata
optional (it is used to deal with the interval map's fragmentation
issue when level 0 falls behind), and also introduce an interface for
compaction strategies to implement make_sstable_set(), which
instantiates a partitioned sstable set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190424232948.668-1-raphaelsc@scylladb.com>
2019-04-25 12:59:04 +03:00
Botond Dénes
a3f79bfe5e mutlishard_combining_reader: reorder shard_reader::remote_reader::do_fill_buffer() code
Reduce the number of indentations - use early return for the short path.
2019-04-24 10:55:16 +03:00
Botond Dénes
bbd3f0acc3 multishard_combining_reader: shard_reader::remote_reader extract fill-buffer logic into do_fill_buffer() 2019-04-24 10:55:16 +03:00
Avi Kivity
b19792405f main: RAII-ify shutdown
Instead of app-template::run_deprecated() and at_exit() hooks, use
app_template::run() and RAII (via defer()) to stop services. This makes it
easier to add services that do support shutdown correctly.

Ref #2737
Message-Id: <20190420175733.29454-1-avi@scylladb.com>
2019-04-23 16:13:39 +02:00
Tomasz Grabiec
21fbf59fa8 lsa: Fix compact_and_evict() being called with a too low step
compact_and_evict gets memory_to_release in bytes while
reclamation step is in segments.

Broken in f092decd90.

It doesn't make much difference with the current default step of 1
segment since we cannot reclaim less than that, so shouldn't cause
problems in practice.

Message-Id: <1556013920-29676-1-git-send-email-tgrabiec@scylladb.com>
2019-04-23 13:14:43 +03:00
Gleb Natapov
c6b3b9ff13 cache_hitrate_calculator: wait for ongoing calculation to complete during stop
Currently stop() returns a ready future immediately. This is not a
problem, since the calculation loop holds a shared pointer to the
local service, so it will not be destroyed until the calculation
completes, and the global database object db, which is also used by
the calculation, is never destroyed. But the latter is just a
workaround for a shutdown sequence that cannot handle it and will be
changed one day. Make the cache hitrate calculation service ready for
that.

Message-Id: <20190422113538.GR21208@scylladb.com>
2019-04-22 14:44:42 +03:00
Takuya ASADA
64c2aa8f9b reloc/python3: add missing SCYLLA-PRODUCT-FILE to python3 relocatable package
Since 214c74a, we need SCYLLA-PRODUCT-FILE in the relocatable package,
so add it to the python3 package as well.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190422085620.22486-1-syuu@scylladb.com>
2019-04-22 13:56:38 +03:00
Gleb Natapov
306f5b99b5 cache_hitrate_calculator: fix use after free in non_system_filter lambda
The non_system_filter lambda is defined static, which means it is
initialized only once, so the 'this' that it captures will belong to
the shard where the function runs first. During service destruction
the function may run on a different shard and access another shard's
service, which may already be freed.

Fixes #4425

Message-Id: <20190421152139.GN21208@scylladb.com>
2019-04-21 18:22:31 +03:00
Amnon Heiman
9ad63efcfe Adding node_exporter to docker
This patch adds node_exporter to the docker image. It installs it,
and creates and runs a service for it.

After this patch node_exporter will run as part of the Scylla Docker
image.

Fixes #4300

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190421130643.6837-1-amnon@scylladb.com>
2019-04-21 18:12:58 +03:00
Benny Halevy
0c9aaef673 sstables: make lambdas that std::move mutable
As noticed by Rafael Ávila de Espíndola <espindola@scylladb.com>
regarding commit 5a99023d4a:
Without the lambda being mutable, the second std::move actually doesn't move anything.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190421150422.19304-1-bhalevy@scylladb.com>
2019-04-21 18:11:42 +03:00
Benny Halevy
5a99023d4a treewide: use lambda for io_check of *touch_directory
To prepare for a seastar change that adds an optional file_permissions
parameter to touch_directory and recursive_touch_directory.
This change messes up the call to io_check since the compiler can't
derive the Func&& argument.  Therefore, use a lambda function instead
to wrap the call to {recursive_,}touch_directory.

Ref #4395

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>
2019-04-21 12:04:39 +03:00
Tomasz Grabiec
f092decd90 lsa: Fix potential bad_alloc even though evictable memory exists
When we start the LSA reclamation it can be that
segment_pool::_free_segments is 0 under some conditions and
segment_pool::_current_emergency_reserve_goal is set to 1. The
reclamation step is 1 segment, and compact_and_evict_locked() frees 1
segment back into the segment_pool. However,
segment_pool::reclaim_segments() doesn't free anything to the standard
allocator because the condition _free_segments >
_current_emergency_reserve_goal is false. As a result,
tracker::impl::reclaim() returns 0 as the amount of released memory,
tracker::reclaim() returns
memory::reclaiming_result::reclaimed_nothing and the seastar allocator
thinks it's a real OOM and throws std::bad_alloc.

The fix is to change compact_and_evict() to make sure that reserves
are met, by releasing more if they're not met at entry.

This change also allows us to drop the variant of allocate_segment()
which accepts the reclamation step as a means to refill reserves
faster. This is now not needed, because compact_and_evict() will look
at the reserve deficit to increase the amount of memory to reclaim.

Fixes #4445

Message-Id: <1555671713-16530-1-git-send-email-tgrabiec@scylladb.com>
2019-04-20 09:17:49 +03:00
Avi Kivity
704600f829 Update seastar submodule
* seastar eb03ba5cd...e84d2647c (14):
  > Fix hardcoded python paths in shebang line
  > Disable -Wmaybe-uninitialized everywhere
  > app_template: allow opting out of automatic SIGINT/SIGTERM handling
  > build: Restore DPDK machine inference from cflags
  > http: capture request content for POST requests
  > Merge "Simplify future_state and promise" from Rafael
  > temporary_buffer: fix memleak on fast path
  > perftune.py: allow explicitly giving a CPU mask to be used for binding IRQs
  > perftune.py: fix the sanity check for args.tune
  > perftune.py: identify fast-path hardware queues IRQs of Mellanox NICs
  > memory: malloc_allocator should be always available
  > Merge "Using custom allocator in the posix network stack" from Elazar
  > memory: Tell reclaimers how much should be reclaimed
  > net/ipv4_addr: add std::hash & operator== overloads
2019-04-20 09:16:53 +03:00
Avi Kivity
d485facea2 Revert "tools: toolchain: improve dbuild signal handing"
This reverts commit 6c672e674b. It loses
build logs, and the patch that restores logs causes build failures, so
the whole thing needs to be revisited.
2019-04-19 15:16:42 +03:00
Takuya ASADA
0a874f1897 dist/docker/redhat: prioritize /opt/scylladb/python3/bin on $PATH
To prevent running the entrypoint script with another python3 package,
such as python36 from EPEL, move /opt/scylladb/python3/bin to the top of
$PATH. This won't happen on the container image itself, but it may occur
when a user tries to extend the image.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417165806.12212-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Takuya ASADA
c3dae6673f dist/common/scripts: use out() to run perftune.py
perftune.py executes hwloc-calc, which is now provided as a relocatable
binary placed under /opt/scylladb/bin.
So we need to add that directory to PATH when calling
subprocess.check_output(); our utility function out() already does that,
so switch to it.

Fixes #4443

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190418124345.24973-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Benny Halevy
9785754e0d distributed_loader: do not follow symlinks when verifying mode and owner
We allow only regular files and directories, so to detect symlinks
we must not follow them.

Fixes #4375

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190418051627.9298-1-bhalevy@scylladb.com>
2019-04-19 11:47:40 +03:00
Takuya ASADA
214c74a71d dist: merge product name parameter on single place
When we added product name customization, we mistakenly defined the
parameter in each package build script.
The number of scripts is increasing since we recently added the
relocatable python3 package, so we should merge the definition into a
single place.

Also, we should save the parameter in the relocatable package, just like
the version-release parameters.

So move the definition to SCYLLA-VERSION-GEN, save it to
build/SCYLLA-PRODUCT-FILE then archive it to relocatable package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417163335.10191-1-syuu@scylladb.com>
2019-04-19 11:47:40 +03:00
Paweł Dziepak
d47ea66ec6 messaging_service: add lz4_fragmented RPC compressor
Seastar now supports two RPC compression algorithms: the original LZ4 one
and LZ4_FRAGMENTED. The latter uses the lz4 streaming interface, which
allows it to process large messages without fully linearising them. Since
RPC requests used by Scylla often contain user-provided data that could
potentially be very large, LZ4_FRAGMENTED is a better choice for the
default compression algorithm.

Message-Id: <20190417144318.27701-1-pdziepak@scylladb.com>
2019-04-18 19:07:14 +03:00
Takuya ASADA
592fec32a0 dist/common/scripts: use /etc/os-release to detect distributions
Since we moved to a relocatable .rpm, Scylla is now able to run on
Amazon Linux 2.
However, is_redhat_variant() in scylla_util.py does not work on Amazon
Linux 2, since it does not have /etc/redhat-release.
So we switch to /etc/os-release and use ID_LIKE to detect Red Hat and
Debian variants.
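A minimal sketch of this kind of detection (the helper names are invented, not the actual scylla_util.py code; the field semantics follow the os-release(5) format):

```python
def parse_os_release(text):
    # Parse KEY=VALUE lines from /etc/os-release content.
    fields = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            fields[key] = value.strip().strip('"')
    return fields

def is_redhat_variant(fields):
    # Amazon Linux 2 has ID="amzn" but lists rhel/centos/fedora in
    # ID_LIKE, so checking ID_LIKE catches it even though it has no
    # /etc/redhat-release.
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    return any(i in ("rhel", "fedora", "centos") for i in ids)

amazon2 = parse_os_release('ID="amzn"\nID_LIKE="centos rhel fedora"\n')
ubuntu = parse_os_release('ID=ubuntu\nID_LIKE=debian\n')
```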

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417115634.9635-1-syuu@scylladb.com>
2019-04-18 19:07:14 +03:00
Takuya ASADA
3cf7cf015a dist/docker/redhat: use relocatable python3 on docker-entrypoint.py
Switch to relocatable python3 instead of EPEL's python3 in docker-entrypoint.py.
Also drop unneeded dependencies, since we switched to the relocatable
scylla image.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417111024.6604-1-syuu@scylladb.com>
2019-04-18 19:07:14 +03:00
Paweł Dziepak
85409c1a16 Merge "Validate elements of collections" from Piotr
"
Previously we weren't validating elements of collections, so it
was possible to add a non-UTF-8 string to a column with type
list<text>.

Tests: unit(release)

Fixes #4009
"

* 'haaawk/4009/v5' of github.com:scylladb/seastar-dev:
  types: Test correct map validation
  types: Test correct in clause validation
  types: Test correct tuple validation
  types: Test correct set validation
  types: Test correct list validation
  types: Add test_tuple_elements_validation
  types: Add test_in_clause_validation
  types: Add test_map_elements_validation
  types: Add test_set_elements_validation
  types: Add test_list_elements_validation
  types: Validate input when parsing tuples
  types: Validate input when parsing a set
  types: Validate input when parsing a map
  types: Validate input when parsing a list
  types: Implement validation for tuple
  types: Implement validation for set
  types: Implement validation for map
  types: Implement validation for list
  types: Add cql_serialization_format parameter to validate
2019-04-18 19:07:14 +03:00
Botond Dénes
6e85d1e8c1 date_type_impl: add notice explaining why it's not used
And why it is still in the code. The note has been copied from Origin.

Refs: #4419
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c7790a898c331a7f58014d82a10cbc9ee7ad3265.1555483620.git.bdenes@scylladb.com>
2019-04-18 19:07:14 +03:00
Piotr Jastrzebski
134b59a425 table_helper: take insert function arguments by value
Previous version wasn't working correctly with r-values.

Fixes #4438

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <5017b04901c47bd826b2e411e603ce01e42a83a5.1555424512.git.piotr@scylladb.com>
2019-04-16 17:34:35 +03:00
Tomasz Grabiec
5dc3f5ea33 Merge "Properly enable MC format on the cluster" from Piotr
1. All nodes in the cluster have to support MC_SSTABLE_FEATURE
2. When a node observes that whole cluster supports MC_SSTABLE_FEATURE
   then it should start using MC format.
3. Once all shards start to use MC, the node should broadcast that
   unbounded range tombstones are now supported by the cluster.
4. Once the whole cluster supports unbounded range tombstones, we can
   start accepting them at the CQL level.

tests: unit(release)

Fixes #4205
Fixes #4113

* seastar-dev.git dev/haaawk/enable_mc/v11:
  system_keyspace: Add scylla_local
  system_keyspace: add accessors for SCYLLA_LOCAL
  storage_service: add _sstables_format field
  feature: add when_enabled callbacks
  system_keyspace: add storage_service param to setup
  Add sstable format helper methods
  Register feature listeners in storage_service
  Add service::read_sstables_format
  Use read_sstables_format in main.cc
  Use _sstables_format to determine current format
  Add _unbounded_range_tombstones_feature
  Update supported features on format change
2019-04-16 14:07:05 +02:00
Avi Kivity
6c672e674b tools: toolchain: improve dbuild signal handing
Currently, we use --sig-proxy to forward signals to the container. However, this
requires the container's co-operation, which usually doesn't exist. For example,

    docker run --sig-proxy fedora:29 bash -c "sleep 5"

does not respond to Ctrl-C.

This is a problem for continuous integration. If a build is aborted, Jenkins will
first attempt to gracefully terminate the processes (SIGINT/SIGTERM) and then give
up and use SIGKILL. If the graceful termination doesn't work, we end up with an
orphan container running on the node, which can then consume enough memory and CPU
to harm the following jobs.

To fix this, trap signals and handle them by killing the container. Also trap
shell exit, and even kill the container unconditionally, since if Jenkins happens
to kill the "docker wait" process the regular paths will not be taken.
Message-Id: <20190415084040.12352-1-avi@scylladb.com>
2019-04-16 14:07:05 +02:00
Tomasz Grabiec
ac0d435c3e Merge "hinted handoff: don't reuse_segments and discard corrupted segments" from Vlad
This series addresses two issues in the hinted handoff that should
complete fixing the infamous #4231.

In particular the second patch removes the requirement to manually
delete hints files after upgrading to 3.0.4.

Tested with manual unit testing.

* https://github.com/vladzcloudius/scylla.git hinted_handoff_drop_broken_segments-v3:
  hinted handoff: disable "reuse_segments"
  commitlog: introduce a segment_error
  hinted handoff: discard corrupted segments
2019-04-16 14:07:05 +02:00
Avi Kivity
643bddbecc Update seastar submodule
* seastar 6f73675...eb03ba5 (11):
  > tests: tests C++14 dialect in continuous integration
  > rpc/compressor/lz4: fix std:variant related compiler errors
  > tests: futures_test: allow project to compile with C++14
  > Merge "io_queue: make io_priority_class namespace global" from Benny
  > future::then_wrapped: use std::terminate instead of abort
  > reactor: make metric about task quota violations less sensitive
  > Merge "Add LZ4_FRAGMENTED compressor for RPC" from Paweł
  > Fix build issues with Clang 7
  > Merge "file_stat follow_symlink option and related fixes" from Benny
  > doc/tutorial.md: reword mention of seastar::thread premption on get()
  > tests: semaphore_test: relax timeouts

Fixes #4272.
2019-04-16 14:34:32 +03:00
Raphael S. Carvalho
52e1125b52 sstables: do not destroy sstable runs after resharding
Resharding wasn't preserving the sstable run structure, which depends
on all fragments sharing the same run identifier. So let's make
resharding run aware, meaning that a run will be created for each
shard involved.

tests: release mode.

Fixes #4428.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190415193556.16435-1-raphaelsc@scylladb.com>
2019-04-16 10:34:49 +03:00
Tomasz Grabiec
ff66b27754 gdb: heapprof: Coalesce parents in the flamegraph mode
This change drops the hit count from the name of the node, because it
prevents coalescing of nodes which are shared parents for paths with
different counts. This lack of coalescing makes the flamegraph a lot
less useful.

Message-Id: <1555348576-26382-1-git-send-email-tgrabiec@scylladb.com>
2019-04-15 21:05:08 +03:00
Tomasz Grabiec
3fd82021b1 schema_tables: Serialize schema merges fairly
All schema changes made to the node locally are serialized on a
semaphore which lives on shard 0. For historical reasons, they don't
queue but rather try to take the lock without blocking and retry on
failure with a random delay from the range [0, 100 us]. Contenders
which do not originate on shard 0 are at an extra disadvantage, as
each lock attempt takes longer by the across-shard round-trip
latency. If there is constant contention on shard 0, contenders
originating from other shards may keep losing the race to take the lock.

Schema merge executed on behalf of a DDL statement may originate on
any shard. Same for the schema merge which is coming from a push
notification. Schema merge executed as part of the background schema
pull will originate on shard 0 only, where the application state
change listeners run. So if there are constant schema pulls, DDL
statements may take a long time to get through.

The fix is to serialize merge requests fairly, by using the blocking
semaphore::wait(), which is fair.

We don't have to back-off any more, since submit_to() no longer has a
global concurrency limit.
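The fairness property can be sketched with a toy FIFO semaphore (an illustration only; Scylla uses seastar's semaphore, not this class):

```python
import collections
import threading

class FairSemaphore:
    # Toy FIFO-fair semaphore: waiters are served strictly in arrival
    # order, unlike try-lock-and-retry, where a contender with higher
    # per-attempt latency can keep losing the race.
    def __init__(self, count=1):
        self._count = count
        self._waiters = collections.deque()
        self._mutex = threading.Lock()

    def wait(self):
        with self._mutex:
            if self._count > 0 and not self._waiters:
                self._count -= 1
                return
            ev = threading.Event()
            self._waiters.append(ev)
        ev.wait()  # block until a signal() hands the unit to us

    def signal(self):
        with self._mutex:
            if self._waiters:
                # Hand the unit directly to the oldest waiter (FIFO).
                self._waiters.popleft().set()
            else:
                self._count += 1
```

Under try-lock-and-retry, a contender that pays an across-shard round trip per attempt can starve indefinitely; with FIFO hand-off, its position in the queue is fixed at arrival time.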

Fixes #4436.

Message-Id: <1555349915-27703-1-git-send-email-tgrabiec@scylladb.com>
2019-04-15 20:40:38 +03:00
Botond Dénes
c6314e422f tests/mutation_source_test: use a single random seed
Currently, each instantiation of `random_mutation_generator::impl`
generates a new random seed for itself. Although these seeds are
printed, mapping each printed seed back to the exact source location
where it has to be substituted is non-trivial. This makes reproducing
random test failures very hard. To solve this problem, use
`tests::random::get_int()` to produce the random seed of the
`random_mutation_generator::impl` instances. This way the seed of every
mutation generator is derived from a single "master" seed that is
easily replaced after a test failure, hopefully also leading to easily
reproducible random test failures.

I checked that after substituting in a previously generated master
random seed, all derived seeds were exactly the same.
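The derivation scheme can be sketched like this (Python stands in for the C++ test harness; `MASTER_SEED` is a made-up placeholder for the seed printed by a failing run):

```python
import random

MASTER_SEED = 1234  # placeholder; substitute the printed master seed
master_rng = random.Random(MASTER_SEED)

def next_generator_seed():
    # Each generator instance draws its seed from the master RNG
    # instead of seeding itself independently.
    return master_rng.randrange(2**32)

seeds_a = [next_generator_seed() for _ in range(3)]

# Substituting the same master seed reproduces the same derived seeds.
master_rng.seed(MASTER_SEED)
seeds_b = [next_generator_seed() for _ in range(3)]
```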

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <0471415938fc27485975ef9213d37d94bff20fd5.1555329062.git.bdenes@scylladb.com>
2019-04-15 17:37:31 +03:00
Avi Kivity
3afbe219cd Merge "UDF/UDA related cleanups and refactoring" from Rafael
"
These are patches I wrote while working on UDF/UDA, but IMHO they are
independent improvements and are ready for review.

Tests: unit (debug) dtest (release)

I checked that all tests in

nosetests -v  user_types_test.py sstabledump_test.py cqlsh_tests/cqlsh_tests.py

now pass.
"

* 'espindola/udf-uda-refactoring-v3' of https://github.com/espindola/scylla:
  Refactor user type merging
  cql_type_parser::raw_builder: Allow building types incrementally
  cql3: delete dead code
  Include missing header
  return a const reference from return_type
  delete unused var
  Add a test on nested user types.
2019-04-15 16:52:13 +03:00
Glauber Costa
c01ed239a3 fix typo in create table statement error message
specifed -> specified

Fixes #4434

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190415125206.2993-1-glauber@scylladb.com>
2019-04-15 16:51:13 +03:00
Benny Halevy
b543ab4c76 sstables: remove_temp_dir: do not return then_wrapped future
f.get_exception makes the future invalid so it must not be returned.
Instead, make_exception_future<> with the exception ptr.

Fixes #4435.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190415111909.30499-1-bhalevy@scylladb.com>
2019-04-15 16:42:49 +03:00
Glauber Costa
b9327f81cf conf: stop telling people to run auto_bootstrap: false
auto_bootstrap: false provides negligible gains for new clusters and
is extremely dangerous everywhere else. We have seen a couple of
cases in which users, confused by this, added the flag by mistake
and then added nodes with it. While they were pleased by the extremely
fast node-add times, they were later displeased to find their data
missing.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190414012028.20767-1-glauber@scylladb.com>
2019-04-14 10:42:25 +03:00
Piotr Jastrzebski
2c599122e1 Update supported features on format change
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:38:31 +02:00
Piotr Jastrzebski
9c7e3dd470 Add _unbounded_range_tombstones_feature
This requires introduction of storage_service::get_known_features
and using it with check_knows_remote_features.
Otherwise a node joining the existing cluster won't be able to
join because it does not support unbounded range tombstones yet.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
96ad8f7df9 Use _sstables_format to determine current format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
da1eba5bdb Use read_sstables_format in main.cc
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
7339e9de30 Add service::read_sstables_format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:37:12 +02:00
Piotr Jastrzebski
9934740c39 Register feature listeners in storage_service
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 10:36:58 +02:00
Piotr Jastrzebski
7a62235259 Add sstable format helper methods
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
caa6798f2c system_keyspace: add storage_service param to setup
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
460fb260cb feature: add when_enabled callbacks
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
081542cf00 storage_service: add _sstables_format field
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
0211541d84 system_keyspace: add accessors for SCYLLA_LOCAL
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Piotr Jastrzebski
4c205b733a system_keyspace: Add scylla_local
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-12 09:33:40 +02:00
Benny Halevy
adf539fb2c tests: sstable_test_env::do_with_async: wait_for_background_jobs
To solve memory leak seen in
sstable_datafile_test -t test_old_format_non_compound_range_tombstone_is_read

Refs #4376

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190411154621.9716-1-bhalevy@scylladb.com>
2019-04-11 18:50:42 +03:00
Takuya ASADA
4636284856 dist/ami: drop EPEL, convert scylla_install_ami script to python2
We have to run this script in python2: we dropped EPEL from the
dependencies, and since the script is the installer for the rpms we
cannot use relocatable python3 for it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190411151858.2292-1-syuu@scylladb.com>
2019-04-11 18:21:48 +03:00
Glauber Costa
f3a24b6c22 dist: remove curl dependency to simplify dependency list further
Although curl is widely available, there is no reason to depend on it.
There are mainly three users, as indicated by grep:
1) scylla-housekeeping
2) scripts within the AMI
3) docker image

The AMI has its own RPM and it already depends on curl. While we could
get rid of the curl dependency there too, we can do that later. Docker
is its own thing and it only needs it at build time anyway.

For the main scylla repo, this patch changes scylla-housekeeping so as
not to depend on the curl binary and use urllib directly instead. We can
then remove curl from our dependency list.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190411125642.9754-1-glauber@scylladb.com>
2019-04-11 16:12:36 +03:00
Benny Halevy
8181acd83b test.py: fail if given test name not found
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190411092041.24712-1-bhalevy@scylladb.com>
2019-04-11 12:31:23 +03:00
Tzach Livyatan
f444c949bd Fix the Dockerhub documentation for listen-address
Fix the listen-address documentation: it is used for internal communication, not for external clients.

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <20190410181409.16078-1-tzach@scylladb.com>
2019-04-11 11:53:40 +03:00
Botond Dénes
f201f8abab types: fix date_type_impl::less() (timestamp cql type)
date_type_impl::less() invokes `compare_unsigned()` to compare the
underlying raw byte values. `compare_unsigned()` is a three-way
comparator, but `date_type_impl::less()` implicitly converted the
returned value to bool. In effect, `date_type_impl::less()` would
*always* return `true` when the two compared values were not equal.

Found while working on a unit test which employs a randomly generated
schema to test a component.
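The conversion bug boils down to this (a language-neutral sketch, not the actual C++ code):

```python
def compare_unsigned(a: bytes, b: bytes) -> int:
    # Three-way comparator: <0, 0, or >0, like the C++ original.
    # Python's bytes comparison is lexicographic on unsigned bytes.
    return (a > b) - (a < b)

def broken_less(a: bytes, b: bytes) -> bool:
    # The bug: bool() of a three-way result is True for BOTH -1 and +1,
    # so any two unequal values compare as "less".
    return bool(compare_unsigned(a, b))

def fixed_less(a: bytes, b: bytes) -> bool:
    return compare_unsigned(a, b) < 0
```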
Fixes #4419.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8a17c81bad586b3772bf3d1d1dae0e3dc3524e2d.1554907100.git.bdenes@scylladb.com>
2019-04-10 21:01:25 +03:00
Botond Dénes
90721468f0 tests/mutation_diff: remove false-positive diff of the partition header
Currently the partition header will always be reported as different when
comparing two mutations. This is because they are prepended with the
"expected: " and "... but got: " texts. This generates unnecessary
noise. Inject a newline between the prefix and the partition-header
proper. This way the partition header will only show up in the diff when
there is an actual difference. The "expected: " and "... but got: "
phrases are still shown as different at the top of the diff, but this is
fine, as one can immediately see that they are not part of the data and
additionally they help the reader determine which part of the diff
is the expected one and which is the actual one.
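The effect can be sketched with difflib (an illustrative stand-in for the test's diff output, not the actual code):

```python
import difflib

def render(prefix, header, body):
    # The fix: a newline between the prefix and the partition header
    # keeps the header on its own line, so it only appears in the diff
    # when it actually differs.
    return "%s\n%s\n%s" % (prefix, header, body)

old = render("expected: ", "partition header", "row a")
new = render("... but got: ", "partition header", "row b")

diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
changed = [l for l in diff
           if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
# Only the prefixes and the genuinely different row appear as changes;
# the identical partition header stays out of the diff.
```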

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <29e0f413d248048d7db032224a3fd4180bf1b319.1554909144.git.bdenes@scylladb.com>
2019-04-10 18:05:36 +02:00
Raphael S. Carvalho
8a117c338a compaction: fix use-after-free when calculating backlog after schema change
The problem happens after a schema change because we fail to properly
remove the ongoing compaction, which stopped being tracked, from the
list that is used to calculate the backlog. As a result, a compaction
read monitor (which ceases to exist after the compaction ends) may be
used after it is freed.

Fixes #4410.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190409024936.23775-1-raphaelsc@scylladb.com>
2019-04-10 15:54:39 +03:00
Vlad Zolotarov
db2ba0df61 hinted handoff: discard corrupted segments
If we discover that the current segment is corrupted there is nothing
we can do about it.

This patch does the following:
1) Drops the corrupted segment and moves on to the next one.
2) Logs such events as ERRORs.
3) Introduces a new metric that accounts for such events.

Fixes #4364

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-09 15:54:20 -04:00
Vlad Zolotarov
1cba4a54bb commitlog: introduce a segment_error
Introduce a common base class for all errors that indicate that the current
segment has "issues".

This allows a laconic "catch" clause for all such errors.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-09 15:31:13 -04:00
Vlad Zolotarov
00fe2acb35 hinted handoff: disable "reuse_segments"
Hinted handoff doesn't utilize this feature (which was developed with a
commitlog in mind).
Since it's enabled by default we need to explicitly disable it.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-04-09 11:13:41 -04:00
Piotr Jastrzebski
dee64c30b3 types: Test correct map validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:23 +02:00
Piotr Jastrzebski
3d94f0aaf0 types: Test correct in clause validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:23 +02:00
Piotr Jastrzebski
36853a7a5c types: Test correct tuple validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
94bdc1c868 types: Test correct set validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
429a8e082a types: Test correct list validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
910d81e03e types: Add test_tuple_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
e2fe9ca5d0 types: Add test_in_clause_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
cd11959a8e types: Add test_map_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
22f541af1d types: Add test_set_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
be405e24e9 types: Add test_list_elements_validation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
47e242efc5 types: Validate input when parsing tuples
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
c4df3014ac types: Validate input when parsing a set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
8a7b05ae26 types: Validate input when parsing a map
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
16596ec045 types: Validate input when parsing a list
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
8482764003 types: Implement validation for tuple
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
bd2823b623 types: Implement validation for set
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
086d8abf89 types: Implement validation for map
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
4a51ee6e34 types: Implement validation for list
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Piotr Jastrzebski
f5f6367674 types: Add cql_serialization_format parameter to validate
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-04-09 16:58:22 +02:00
Takuya ASADA
e3a5ac2945 reloc: run fix_sharedlib() only on application/x-sharedlib and application/x-pie-executable
We need to prevent running fix_sharedlib() on non-ELF files.

Fixes #4415

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190409114941.28276-1-syuu@scylladb.com>
2019-04-09 14:54:54 +03:00
Rafael Ávila de Espíndola
89b2c4ddc5 Refactor user type merging
The comparison of tables before and after mutation is now done by a
generic diff_rows function. The same function will be used for user
defined functions and user defined aggregates.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 14:16:40 -07:00
Rafael Ávila de Espíndola
4f1260f3e3 cql_type_parser::raw_builder: Allow building types incrementally
Before this patch raw_builder would always start with an empty list of
user types. This means that every time a type is added to a keyspace,
every type in that keyspace needs to be recreated.

With this patch we pass a keyspace_metadata instead of just the
keyspace name and can construct new user types on top of previous
ones.

This will be used in the followup patch, where only new types are
created.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 14:06:51 -07:00
Rafael Ávila de Espíndola
c037b266b4 cql3: delete dead code
In C++, TOKEN_FUNCTION_NAME is only needed in the .cc file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
1db0b83711 Include missing header
abstract_function.hh uses function, which is defined in function.hh,
so it should include it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
4551691b5d return a const reference from return_type
We define data_type as

using data_type = shared_ptr<const abstract_type>;

Since it is a (non-atomic) shared_ptr, it cannot be copied on another
thread, since that would create a race condition when incrementing the
reference counter.

In particular, before this patch it was not legal to call
return_type from another thread.

With this patch, read-only access from another thread is possible.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
35f1b1055d delete unused var
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 11:07:45 -07:00
Rafael Ávila de Espíndola
b577082c64 Add a test on nested user types.
This would have found a bug in a previous version of this series.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-08 10:54:33 -07:00
Takuya ASADA
1f009b5e9b dist/redhat/python3: drop SCYLLA-*-FILE files in rpm
Related to #4409: these are more files that are not needed at runtime,
so drop them too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190405074030.3990-1-syuu@scylladb.com>
2019-04-08 11:52:48 +03:00
Rafael Ávila de Espíndola
6191fd7701 Avoid duplicated read_keyspace_mutation calls
There were many calls to read_keyspace_mutation. One in each function
that prepares a mutation for some other schema change.

With this patch they are all moved to a single location.

Tests: unit (dev, debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190328024440.26201-1-espindola@scylladb.com>
2019-04-07 09:26:56 +03:00
Takuya ASADA
d180caea89 dist/redhat/python3: drop dist/ files in rpm
These files are not needed at runtime, so drop them.

Fixes #4409

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190405071445.18678-1-syuu@scylladb.com>
2019-04-07 09:26:56 +03:00
Amos Kong
db9a721d02 scylla_kernel_check: update kb_fs_not_qualified_aio doc link
The doc has been moved to
https://docs.scylladb.com/troubleshooting/error_messages/kb_fs_not_qualified_aio/

Fixes #4398

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <75fdc97d222667f4402cadc7a46e52d6f38a32a8.1554375560.git.amos@scylladb.com>
2019-04-07 09:26:56 +03:00
Glauber Costa
2305cc88f3 relocatable python: Be more permissive with mime type checking
Fedora 28's python magic used to return an x-sharedlib mime type for .so
files. Fedora 29 changed that to x-pie-executable, so the libraries are
no longer relocated.

Let's be more permissive and relocate everything that starts with application/.

Fixes #4396

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190404140929.7119-1-glauber@scylladb.com>
2019-04-07 09:26:56 +03:00
Piotr Jastrzebski
882ea9caf0 tests: Fix use after free in check_multi_schema
Refs #4376

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <7d7b4cf69cea1e4d31058d8f1fd2c01f1dd11c58.1554387442.git.piotr@scylladb.com>
2019-04-07 09:26:56 +03:00
Piotr Jastrzebski
4485868d27 tests: Fix use after free in check_read_indexes
Refs #4376

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <0dc76b2a55bebc49558f30e8d2894973ce817577.1554386770.git.piotr@scylladb.com>
2019-04-07 09:26:56 +03:00
Tomasz Grabiec
a717e11026 Merge "row level repair shutdown fixes" from Asias
This series fixes row-level repair shutdown related issues we saw with
dtests, e.g., use-after-free of the repair meta object and failure to
stop a table during shutdown.

Fixes: #4044
Fixes: #4314
Fixes: #4333
Fixes: #4380

Tests: repair_additional_test.py:RepairAdditionalTest.repair_abort_test
       repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test

* seastar-dev.git asias/repair.fix.shutdown.v1:
  repair: Wait for pending repair_meta operation before removing it
  repair: Check shutdown in row level repair
  repair: Remove repair meta when node is dead
  repair: Remove all row level repair during shutdown
2019-04-05 15:47:25 +03:00
Avi Kivity
e63bc6b1e3 Update seastar submodule
* seastar 63d8607...6f73675 (5):
  > Merge "seastar-addr2line: improve the context of backtraces" from Botond
  > log: fix std::system_error ostream operator to print full error message
  > Revert "threads: yield on get if we had run for too long."
  > core/queue: Document concurrency constraints
  > core/memory: Make small pools use the full span size

Fixes #4407.
Fixes #4316.
2019-04-05 15:47:25 +03:00
Avi Kivity
b1c4c371fa Merge "fix I/O calculation for i3.metal instances" from Glauber
"
Calculation of IO properties is slightly wrong for i3.metal, because we get
the number of disks wrong. The reason for that is our check for ephemeral nvme
disks, which pre-dates the time when root devices were exposed as nvme devices
(nitro and metal instances).
"

toolchain updated with python3-psutil

* 'ec2fixes' of github.com:glommer/scylla:
  scylla_util.py: do not include root disks in ephemeral list
  scylla-python3: include the psutil module
  fix typo in scylla_ec2_check
2019-04-05 15:46:59 +03:00
Asias He
f212dfb887 streaming: Reject stream if the _sys_dist_ks or _view_update_generator are not ready
They are of type db::system_distributed_keyspace and
db::view::view_update_generator.

- n1 is in normal status
- n2 boots up while _sys_dist_ks or _view_update_generator are not
  initialized
- n1 runs a stream; n2 is the follower
- n2 uses the _sys_dist_ks or _view_update_generator
- "Assertion `local_is_initialized()' failed" is observed

Fixes #4360

Message-Id: <4ae13e1640ac8707a9ba0503a2744f6faf89ecf4.1554330030.git.asias@scylladb.com>
2019-04-04 10:48:00 +03:00
Avi Kivity
8abba6f6a6 Merge "Avoid copying data_type" from Rafael
"
With these changes we avoid a std::vector<data_value> copy, which is
nice in itself, but also makes it possible to call get_list from other
shards.
"

* 'espindola/result-set-v3' of https://github.com/espindola/scylla:
  Avoid copying a std::vector in get_list
  query-result-set: add and use a get_ptr method
2019-04-03 21:29:22 +03:00
Asias He
99da196e6f repair: Reject repair if the _sys_dist_ks or _view_update_generator are not ready
They are of type db::system_distributed_keyspace and db::view::view_update_generator.

n1 is in normal status
n2 boots up and _sys_dist_ks or _view_update_generator are not initialized
n1 runs repair, n2 is the follower.
n2 uses the _sys_dist_ks or _view_update_generator
"Assertion `local_is_initialized()' failed" is observed

Fixes #4360

Message-Id: <6616c21078c47137a99ba71baf82594ba709597c.1553742487.git.asias@scylladb.com>
2019-04-03 21:29:22 +03:00
Rafael Ávila de Espíndola
74f956e5a8 Avoid copying a std::vector in get_list
For now this is just an optimization. But it also avoids copying
data_type, which will allow this to be used across shards.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-03 09:20:12 -07:00
Rafael Ávila de Espíndola
c2a8807c35 query-result-set: add and use a get_ptr method
This moves a copy up the call stack and makes it possible to avoid it
completely by passing a reference type to get_nonnull.
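
The idea behind get_ptr can be sketched with a stand-in row type (a minimal illustration with hypothetical names, not the actual query-result-set code): expose a pointer-returning accessor so callers that only need to inspect a value never force a copy.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for a query result row. get_nonnull() copies the value out;
// get_ptr() hands back a pointer into the row (nullptr when absent),
// letting the caller decide whether a copy is needed at all.
struct row {
    std::unordered_map<std::string, std::string> cells;

    // copying accessor: always materializes a new string
    std::string get_nonnull(const std::string& name) const {
        return cells.at(name);
    }

    // non-copying accessor: nullptr when the column is absent
    const std::string* get_ptr(const std::string& name) const {
        auto it = cells.find(name);
        return it == cells.end() ? nullptr : &it->second;
    }
};
```

With get_ptr in place, get_nonnull can be layered on top of it, moving the single remaining copy up the call stack exactly as the commit describes.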

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-04-03 09:19:52 -07:00
Tomasz Grabiec
3356a085d2 lsa: Cover more bad_alloc cases with abort
When --abort-on-lsa-bad-alloc is enabled we want to abort whenever
we think we can be out of memory.

We covered failures due to bad_alloc thrown from inside of the
allocation section, but did not cover failures from reservations done
at the beginning of with_reserve(). Fix by moving the trap into
reserve().
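
The single-choke-point idea can be sketched roughly like this (a minimal standalone sketch; `region`, `reserve()` and the flag are stand-ins, not Scylla's actual LSA code): with the trap inside reserve(), every caller that reserves — including with_reserve() — hits the same abort path on allocation failure.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <memory>
#include <vector>

// Stand-in for the --abort-on-lsa-bad-alloc option.
static bool abort_on_lsa_bad_alloc = true;

struct region {
    std::vector<std::unique_ptr<char[]>> reserved;

    void reserve(std::size_t segments) {
        try {
            for (std::size_t i = 0; i < segments; ++i) {
                reserved.push_back(std::make_unique<char[]>(1024));
            }
        } catch (const std::bad_alloc&) {
            if (abort_on_lsa_bad_alloc) {
                std::fprintf(stderr, "failed to reserve memory, aborting\n");
                std::abort();  // single choke point: all reservation failures land here
            }
            throw;  // otherwise propagate as before
        }
    }
};
```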

Message-Id: <1553258915-27929-1-git-send-email-tgrabiec@scylladb.com>
2019-04-03 16:39:40 +03:00
Glauber Costa
0e9a50ab57 scylla_util.py: do not include root disks in ephemeral list
Nitro instances (and metal ones) put their root device in nvme (as a
protocol. it is still EBS). Our algorithm so far has relied on parsing
the nvme devices to figure out which ones are ephemeral but it will
break for those instances.

Out of our supported instances so far, the i3.metal is the only one
in which this breaks.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-04-03 07:57:00 -04:00
Glauber Costa
6d7ac87136 scylla-python3: include the psutil module
Using a new python3 module has never been that easy! So we'll
unapologetically use psutil and don't even worry about whether or not
CentOS supports it (it doesn't)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-04-02 17:24:25 -04:00
Glauber Costa
027eee5f13 fix typo in scylla_ec2_check
enahanced -> enhanced

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-04-02 17:24:00 -04:00
Dejan Mircevski
a66a5d423a query_processor: Add query-count metrics
... with labels for each consistency level.  Fixes
https://github.com/scylladb/scylla/issues/4309 ("add counters breaking
up cql requests based on consistency_level").

Tests: unit (dev)
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <1554127055-17705-1-git-send-email-dejan@scylladb.com>
2019-04-02 19:08:25 +03:00
Avi Kivity
be6905da84 Update seastar submodule
* seastar 5572de7...63d8607 (6):
  > test: verify that negative sleep time doesn't cause infinite sleep
  > httpd: Change address handling to use socket_address
  > dns: Change "unspecififed" address search type to retrive first avail
  > Allow when_all and when_all_succeed to take function arguments
  > when_all: abort if memory allocation fails
  > inet_address: Add missing constructor impl.
2019-04-02 16:56:56 +03:00
Asias He
b98d95ebf0 repair: Remove all row level repair during shutdown
We saw a dtest fail to stop a node like:

```
ERROR: repair_one_missing_row_test (repair_additional_test.RepairAdditionalTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 2521, in repair_one_missing_row_test
    return RepairAdditionalBase._repair_one_missing_row_test(self)
  File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 1842, in _repair_one_missing_row_test
    self.check_rows_on_node(node2, nr_rows)
  File "/home/asias/src/cloudius-systems/scylla-dtest/repair_additional_test.py", line 34, in check_rows_on_node
    node.stop(wait_other_notice=True)
  File "/home/asias/src/cloudius-systems/scylla-ccm/ccmlib/scylla_node.py", line 496, in stop
    raise NodeError("Problem stopping node %s" % self.name)
NodeError: Problem stopping node node1
```

Log archive: [2019.1.3.node1.repair.zip](https://github.com/scylladb/scylla/files/2723244/2019.1.3.node1.repair.zip)

The problem is:

1) repair_meta is created:
repair_meta -> repair_writer::create_writer() -> t.stream_in_progress()
repair_meta -> repair_reader::repair_reader -> cf.read_in_progress()

2) repair_meta is stored in the _repair_metas map.

3) Repair shutdown: the repair_meta is not removed from the _repair_metas map

4) Database shutdown, which waits for the utils::phased_barrier.

To fix, we should stop and remove all the repair_meta objects from the _repair_metas map.
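
The fix amounts to draining the map on shutdown; a minimal sketch (type and member names are approximations, not the actual repair code):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Stand-in for a follower-side repair state object. stop() would also
// close the repair reader/writer, releasing the in-progress holds that
// the database's phased_barrier waits on.
struct repair_meta {
    bool stopped = false;
    void stop() { stopped = true; }
};

struct repair_service {
    std::map<uint32_t, std::shared_ptr<repair_meta>> _repair_metas;

    void shutdown() {
        for (auto& [id, rm] : _repair_metas) {
            (void)id;
            rm->stop();           // release in-progress operations first
        }
        _repair_metas.clear();    // nothing left to block database shutdown
    }
};
```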

Tests: 30 successful runs of the repair_kill_2_test

Fixes: #4044
2019-04-02 19:28:53 +08:00
Asias He
344d0ee37d repair: Remove repair meta when node is dead
Repair follower nodes will create repair meta object when repair master
node starts a repair. Normally, the repair meta object is removed when
repair master finishes the repair and sends the verb
REPAIR_ROW_LEVEL_STOP to all the followers to remove the repair meta
object. In case the repair master is killed suddenly, no one will remove
the repair meta object.

To prevent keeping this repair meta object forever, we should remove
such objects when gossip detects a node is dead with the gossip
listener.

Fixes: #4380

Reviewed-by: Botond Dénes <bdenes@scylladb.com>
2019-04-02 19:28:53 +08:00
Asias He
b061157b21 repair: Check shutdown in row level repair
During node shutdown, we should abort the repair as soon as possible.
Check if we are in shutdown in row level repair steps.

Refs: #4044
2019-04-02 19:28:53 +08:00
Asias He
e3e489328e repair: Wait for pending repair_meta operation before removing it
We remove the repair_meta object in remove_repair_meta upon receiving the stop
row level repair RPC verb. It is possible there is a pending operation
on the repair_meta. To avoid a use-after-free, we should not remove the
repair_meta object until all the pending operations are done.
Use a gate to protect it.
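
The gate pattern can be illustrated with a minimal synchronous stand-in (Seastar's real seastar::gate is asynchronous — close() returns a future that resolves when the count drops to zero — but the invariant is the same): every pending operation enters the gate, and removal may only complete once all of them have left.

```cpp
#include <cassert>
#include <stdexcept>

// Toy synchronous gate. Operations call enter()/leave() around their work;
// the object owner closes the gate before destruction, which both rejects
// new operations and tells the owner whether any are still pending.
class gate {
    int _count = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");  // no new operations
        }
        ++_count;
    }
    void leave() { --_count; }
    // Returns true when it is already safe to destroy the protected object.
    bool try_close() {
        _closed = true;
        return _count == 0;
    }
    int count() const { return _count; }
};
```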

Fixes: #4333
Fixes: #4314
Tests: 50 successful runs of repair_additional_test.py:RepairAdditionalTest.repair_kill_2_test
2019-04-02 19:28:53 +08:00
Vlad Zolotarov
0dc0a6025d query_pager::fetch_page: cosmetics: fix code alignment
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190401214030.5570-2-vladz@scylladb.com>
2019-04-02 11:53:10 +03:00
Asias He
70fbe85b3e main: Add shutdown database log
It is useful to know which step we are during shutdown process.

Refs: #4044
Message-Id: <f7c94c60d039560bfacd6d473f7d828940cc55b7.1554172140.git.asias@scylladb.com>
2019-04-02 11:49:00 +03:00
Benny Halevy
3749148339 storage_service: fix handling of load_new_sstables exception
ignore_ready_future in load_new_sstables broke
migration_test:TestMigration_with_*.migrate_sstable_with_counter_test_expect_fail dtests.

The java.io.NotSerializableException in nodetool was caused by exceptions that
were too long.

This fix prints the problematic file names onto the node system log
and includes the cause in the resulting exception so as to provide the user
with information about the nature of the error.

Fixes #4375

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190331154006.12808-1-bhalevy@scylladb.com>
2019-04-02 11:46:19 +03:00
Avi Kivity
988dfd7209 Merge "add relocatable CLI tools required for scylla setup scripts" from Takuya
"
To make the offline installer easier we need to minimize dependencies as
much as possible.
Python dependencies were already dropped by adding a relocatable python3
(by Glauber); now it's time to drop the rest of the command line tools used
by the scylla setup tools.
(Even though the scripts are converted to python3, they still execute some
external commands, so these commands should be distributed with the offline
installer.)

Note that some CLI tools haven't been added, such as the NTP and RAID stuff,
since these tools have daemons, not just a CLI.
To use such stuff in offline mode, users have to install them manually.
But both NTP setup and RAID setup are optional; users can still run Scylla
w/o them.
"

Toolchain updated to docker.io/scylladb/scylla-toolchain:fedora-29-20190401
for changes in install-dependencies.sh; also updates to gnutls 3.6.7 security
release.

* 'reloc_clitools_v5' of https://github.com/syuu1228/scylla:
  reloc: add relocatable CLI tools for scylla setup scripts
  dist/redhat: drop systemd-libs from dependency
  dist/redhat: drop file from dependency since it seems unused
  dist/redhat: drop pciutils from dependency since it only used in DPDK mode
2019-04-01 14:23:04 +03:00
Raphael S. Carvalho
d59f716e1c table: fix wild disk usage stat after sstables are discarded by truncate
Truncate would make the disk usage stat go wild because it isn't updated
when sstables are removed in table::discard_sstables(). Let's update
the stat after the sstables are removed from the sstable set.

Fixes #3624.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190328154918.25404-1-raphaelsc@scylladb.com>
2019-04-01 13:55:11 +03:00
Duarte Nunes
b2dd8ce065 database: Make exception message more accurate
It's the sstable read queue that's overloaded, not the inactive one
(which can be considered empty when we can't admit newer reads).

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190328003533.6162-1-duarte@scylladb.com>
2019-04-01 13:53:50 +03:00
Takuya ASADA
75a7859019 reloc: add relocatable CLI tools for scylla setup scripts
To minimize dependencies of Scylla, add relocatable image of CLI tools
required for scylla setup scripts.
2019-04-01 02:59:01 +09:00
Takuya ASADA
a3c1b9fcf3 dist/redhat: drop systemd-libs from dependency
Since we switched to relocatable package, we don't need distribution
native libraries, so the package is not needed anymore.
2019-04-01 02:58:22 +09:00
Takuya ASADA
a3741b4052 dist/redhat: drop file from dependency since it seems unused
The package is not used in our scripts anymore; drop it.
2019-04-01 02:57:43 +09:00
Takuya ASADA
7d78515d5b dist/redhat: drop pciutils from dependency since it only used in DPDK mode
Since we don't use DPDK mode by default, and the mode is not officially
supported, drop pciutils from package dependency.
Users who want to use DPDK mode need to install the package
manually.
2019-04-01 02:56:31 +09:00
Avi Kivity
77a0d5c5da Update seastar submodule
* seastar 05efbce...5572de7 (5):
  > posix_file_impl::list_directory: do not ignore symbolic link file type
  > prometheus: yield explicitly after each metric is processed
  > thread: add maybe_yield function
  > metrics: add vector overload of add_group()
  > memory: tone down message for memory allocator
2019-03-31 15:26:21 +03:00
Tomasz Grabiec
4c0584289b tests: cql_test_env: Fix _feature_service not being initialized
We moved from an uninitialized field instead of from the constructor parameter.

No known issues.

Message-Id: <1553854544-26719-1-git-send-email-tgrabiec@scylladb.com>
2019-03-31 13:05:35 +03:00
Takuya ASADA
b1bba0c1b0 dist/redhat/python3: product name customization support
Currently the scylla-python3 package name is hardcoded; we need to support
package name renaming just like for the other scylla packages.
This is required to release enterprise version.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190329003941.12289-1-syuu@scylladb.com>
2019-03-29 19:22:24 +02:00
Amos Kong
98cb7d145b scylla_setup: don't repeatedly select disks if it's assigned
Currently scylla_setup would get stuck selecting disks in non-interactive mode.

Fixes #4370

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <8fb445708a6ac0d2130f8a8d041b1d8d71f1cf14.1553745961.git.amos@scylladb.com>
2019-03-28 15:21:36 +02:00
Avi Kivity
65dd45d9cf Merge "sstable: validate file ownership and mode." from Benny
"
The file must either be owned by the process uid,
or the process must have both read and write access to it,
so it can be (hard) linked when sysctl
fs.protected_hardlinks is enabled.

Fixes #3117
"

* 'projects/valid_owner_and_mode/v3-rebased' of https://github.com/bhalevy/scylla:
  storage_service: handle load_new_sstables exception
  init: validate file ownership and mode.
  treewide: use std::filesystem
2019-03-28 14:58:14 +02:00
Benny Halevy
956cb2e61c storage_service: handle load_new_sstables exception
Refs #3117

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-28 14:54:56 +02:00
Benny Halevy
e3f7fe44c0 init: validate file ownership and mode.
Files and directories must be owned by the process uid.
Files must have read access and directories must have
read, write, and execute access.

Refs #3117

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-28 14:40:12 +02:00
Benny Halevy
ff4d8b6e85 treewide: use std::filesystem
Rather than {std::experimental,boost,seastar::compat}::filesystem

On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote:
> The intent for seastar::compat was to allow the application to choose
> the C++ dialect and have seastar follow, rather than have seastar choose
> the types and have the application follow (as in your patch).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-28 14:21:10 +02:00
Dejan Mircevski
aa11f5f35e Drop unused #include
v2: fix "From" field in email

Tests: unit/cql_query_test (dev)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <1553099087-11621-1-git-send-email-dejan@scylladb.com>
2019-03-28 01:48:19 +00:00
Duarte Nunes
d8fcdefe4a tests/view_schema_test: Remove debug output
A stray std::cout remained.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-03-27 21:58:10 +00:00
Tomasz Grabiec
2b8bf0dbf8 Merge "db/view: Apply tracked tombstones for new updates" from Duarte
When generating view updates for base mutations when no pre-existing
data exists, we were forgetting to apply the tracked tombstones.

Fixes #4321
Tests: unit(dev)

* https://github.com/duarten/scylla materialized-views/4321/v1.1:
  db/view: Apply tracked tombstones for new updates
  tests/view_schema_test: Add reproducer for #4321
2019-03-27 13:24:28 +01:00
Duarte Nunes
f609848b69 tests/view_schema_test: Add reproducer for #4321
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-03-27 12:01:39 +00:00
Duarte Nunes
ded9221187 db/view: Apply tracked tombstones for new updates
When generating view updates for base mutations when no pre-existing
data exists, we were forgetting to apply the tracked tombstones.

Fixes #4321
Tests: unit(dev)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-03-27 12:01:39 +00:00
Glauber Costa
043d102ab6 commitlog: fix typo in error message
maxiumum -> maximum

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190326191108.7573-1-glauber@scylladb.com>
2019-03-26 21:32:56 +02:00
Avi Kivity
a77762b02a Merge "Optimise vint deserialisation" from Paweł
"

Variable length integers are used extensively by the SSTables mc
format. The current deserialisation routine is quite naive in that
it reads each byte separately. Since those vints usually appear inside
much larger buffers, we optimise for such cases: read 8 bytes at once
and then mask out the unneeded parts (as well as fix their order, because
the encoding is big-endian).
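
The masking trick can be sketched like this (illustrative only — this is not the exact mc vint encoding; it assumes the length is already known, the buffer has 8 readable bytes at the input position, and the host is little-endian):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Instead of reading `len` bytes one at a time, do a single 8-byte load,
// byte-swap from the big-endian wire order to host order, and shift away
// the trailing bytes that belong to the next value. `len` must be 1..8.
inline uint64_t load_be_masked(const uint8_t* in, unsigned len) {
    uint64_t word;
    std::memcpy(&word, in, 8);        // one wide read instead of a byte loop
    word = __builtin_bswap64(word);   // big-endian wire -> host order (LE host)
    return word >> (8 * (8 - len));   // keep only the first `len` bytes
}
```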

Tests: unit(dev).

perf_vint (average time per element when deserializing 1000 vints):

before:
vint.deserialize                            69442000    14.400ns     0.000ns    14.399ns    14.400ns

after:
vint.deserialize                           241502000     4.140ns     0.000ns     4.140ns     4.140ns

perf_fast_forward (data on /tmp):
large-partition-single-key-slice on dataset large-part-ds1:

before:
   range            time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> [0, 1]           0.000278         8792         2       7190        119       7367       1960      3        104       2       0        0        1        1        0        0        1 100.0%
-> [1, 100)         0.000344           96        99     288100       4335     307689     193809      2        108       2       0        0        1        1        0        0        1 100.0%
-> (100, 200]       0.000339        13254       100     295263       2824     301734     222725      2        108       2       0        0        1        1        0        0        1 100.0%

after:
   range            time (s)   iterations     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
-> [0, 1]           0.000236        10001         2       8461         59       8718       2261      3        104       2       0        0        1        1        0        0        1 100.0%
-> [1, 100)         0.000285           89        99     347500       2441     355826     215745      2        108       2       0        0        1        1        0        0        1 100.0%
-> (100, 200]       0.000293        14369       100     341302       1512     350123     222049      2        108       2       0        0        1        1        0        0        1 100.0%
"

* tag 'optimise-vint/v2' of https://github.com/pdziepak/scylla:
  sstable: pass full length of buffer to vint deserialiser
  vint: optimise deserialisation routine
  vint: drop deserialize_type structure
  tests/vint: reduce test dependencies
  tests/perf: add performance test for vint serialisation
2019-03-26 16:41:44 +02:00
Avi Kivity
4b330b3911 Merge "introduce sstables manager" from Benny
"
This series introduces a rudimentary sstables manager
that will be used for making and deleting sstables, and the tracking
thereof.

The motivation for having a sstables manager is detailed in
https://github.com/scylladb/scylla/issues/4149.
The gist of it is that we need a proper way to manage the life
cycle of sstables to solve potential races between compaction
and various consumers of sstables, so they don't get deleted by
compaction while being used.

In addition, we plan to add global statistics methods like returning
the total capacity used by all sstables.

This patchset changes the way class sstable gets the large_data_handler.
Rather than passing it separately for writing the sstable and when deleting
sstables, we provide the large_data_handler when the sstable object is
constructed and then use it when needed.

Refs #4149
"

* 'projects/sstables_manager/v3' of https://github.com/bhalevy/scylla:
  sstables: provide large_data_handler to constructor
  sstables_manager: default_sstable_buffer_size need not be a function
  sstables: introduce sstables_manager
  sstables: move shareable_components def to its own header
  tests: use global nop_lp_handler in test_services
  sstables: compress.hh: add missing include
  sstables: reorder entry_descriptor constructor params
  sstables: entry_descriptor: get rid of unused ctor
  sstables: make load_shared_components a method of sstable
  sstables: remove default params from sstable constructor
  database: add table::make_sstable helper
  distributed_loader: pass column_family to load_sstables_with_open_info
  distributed_loader: no need for forward declaration of load_sstables_with_open_info
  distributed_loader: reshard: use default params for make_sstable
2019-03-26 16:31:40 +02:00
Benny Halevy
223e1af521 sstables: provide large_data_handler to constructor
And use it for writing the sstable and/or when deleting it.

Refs #4198

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:24:19 +02:00
Benny Halevy
c23f658d0e sstables_manager: default_sstable_buffer_size need not be a function
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
eebc3701a5 sstables: introduce sstables_manager
The goal of the sstables manager is to track and manage the sstables life cycle.
There is an sstables_manager instance per database, and it is passed to each column family
(and test environment) on construction.
All sstables created, loaded, and deleted pass through the sstables manager.

The manager will make sure consumers of sstables are in sync so that sstables
will not be deleted while in use.

Refs #4149

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
b50c041aa2 sstables: move shareable_components def to its own header
To be used by sstables_manager.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
2cd11208a1 tests: use global nop_lp_handler in test_services
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
0e3f9c25e4 sstables: compress.hh: add missing include
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
33cbfe81f2 sstables: reorder entry_descriptor constructor params
To match make_sstable's, in preparation for moving to sstables_manager

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
ac5f9c1eae sstables: entry_descriptor: get rid of unused ctor
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
adf8428321 sstables: make load_shared_components a method of sstable
and open code its static part in the caller (distributed_loader)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
ff7b7910f1 sstables: remove default params from sstable constructor
The goal is to construct sstables only via make_sstable,
which will be moved to class sstables_manager in a later patch.

Defining the default values in both interfaces is unneeded
and may lead to them going out of sync.

Therefore, have only make_sstable provide the default parameter values.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
3a17053cb8 database: add table::make_sstable helper
In most cases we make a sstable based on the table schema
and soon - large_data_handler.
Encapsulate that in a make_sstable method.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
67f705ae04 distributed_loader: pass column_family to load_sstables_with_open_info
Rather than just its schema.

In preparation for adding table::make_sstable

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
99875ba966 distributed_loader: no need for forward declaration of load_sstables_with_open_info
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Benny Halevy
7a8ab1d6f1 distributed_loader: reshard: use default params for make_sstable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Avi Kivity
5e39b62fcc Merge "configure: Optionally don't compress debug in executables" from Rafael
"
Most of the binaries we link in a debug build are linked with -s, so
the only impact is build/debug/scylla, which grows by 583 MiB when
using --compress-exec-debuginfo=0.

On the other hand, not having to recompress all the debug info from
all the used object files is a pretty big win when debugging an issue.

For example, linking build/debug/scylla goes from

56.01s user 15.86s system 220% cpu 32.592 total

to

27.39s user 19.51s system 991% cpu 4.731 total

Note how the cpu time is "only" 2x better, but given that compressing
debug info is a long serial task, the wall time is 6.8x better.

Tests: unit (debug)
"

* 'espindola/dont-compress-debug-v5' of https://github.com/espindola/scylla:
  configure: Add a --compress-exec-debuginfo option
  configure: Move some flags from cxx_ld_flags to cxxflags
  configure: rename per mode opt to cxx_ld_flags
  configure: remove per mode libs
  configure: remove sanitize_libs and merge sanitize into opt
  configure: split a ld_flags_{mode} out of cxxflags_{mode}
2019-03-26 15:25:07 +02:00
Avi Kivity
fad1be0ddc Update seastar submodule
* seastar caa98f8...05efbce (2):
  > fix use after free in rpc server handler
  > rpc: wait for send_negotiation_frame

Fixes #4336.
2019-03-26 14:33:37 +02:00
Gleb Natapov
1abc50ad8a messaging_service: make sure a client is unique for a destination
Function messaging_service::get_rpc_client() is supposed to either return
an existing client or create one and return it. The function is supposed to
be atomic, so after checking that the requested client does not exist it is
safe to assume emplace() will succeed. But we saw bugs that made the
function non-atomic. Let's add an assert that will help catch
such bugs more easily if they happen in the future.
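
The get-or-create-with-assert pattern, sketched with stand-in types (not the actual messaging_service code): between the failed lookup and the emplace() the function must not yield, so a failed insertion means a duplicate client was created — exactly the non-atomicity bug the assert is meant to catch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct rpc_client {};  // stand-in for the real RPC client type

std::unordered_map<std::string, std::shared_ptr<rpc_client>> clients;

std::shared_ptr<rpc_client> get_rpc_client(const std::string& destination) {
    auto it = clients.find(destination);
    if (it != clients.end()) {
        return it->second;                    // reuse the existing client
    }
    auto client = std::make_shared<rpc_client>();
    auto [pos, inserted] = clients.emplace(destination, client);
    assert(inserted);  // lookup missed, so emplace must succeed; a failure
                       // means a second client for this destination slipped in
    return pos->second;
}
```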

Message-Id: <20190326115741.GX26144@scylladb.com>
2019-03-26 14:19:08 +02:00
Avi Kivity
a696a3daf2 Merge "Fix decimal and varint serialization" from Piotr
"
Fixes #4348

v2 changes:
 * added a unit test

This miniseries fixes decimal/varint serialization - it did not update
output iterator in all cases, which may lead to overwriting decimal data
if any other value follows them directly in the same buffer (e.g. in a tuple).
It also comes with a reproducing unit test covering both decimals and varints.

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
   json_test.FromJsonInsertTests.complex_data_types_test
   json_test.ToJsonSelectTests.complex_data_types_test
"

* 'fix_varint_serialization_2' of https://github.com/psarna/scylla:
  tests: add test for unpacking decimals
  types: fix varint and decimal serialization
2019-03-26 13:00:19 +02:00
Piotr Sarna
e538163a29 tests: add test for unpacking decimals
Refs #4348
2019-03-26 11:52:44 +01:00
Piotr Sarna
287a02dc05 types: fix varint and decimal serialization
Varint and decimal type serialization did not update the output
iterator after generating a value, which may lead to corrupted
sstables - variable-length integers were properly serialized,
but if anything followed them directly in the buffer (e.g. in a tuple),
their value would be overwritten.
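
The bug class is easiest to see with a toy serializer (not Scylla's actual code): a serializer writing through an output iterator must advance it and hand the advanced position back, otherwise the next value in the same buffer lands on top of the previous one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Writes the bytes of `v` through `out` and returns the advanced iterator.
// Returning a stale iterator here would make the next serialized value
// overwrite this one -- the corruption described above.
template <typename OutIt>
OutIt serialize_bytes(const std::vector<uint8_t>& v, OutIt out) {
    for (uint8_t b : v) {
        *out++ = b;
    }
    return out;  // the fix: the caller continues from the advanced position
}
```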

Fixes #4348

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
       json_test.FromJsonInsertTests.complex_data_types_test
       json_test.ToJsonSelectTests.complex_data_types_test

Note that dtests still do not succeed 100% due to formatting differences
in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query
correctness issue.
2019-03-26 11:02:43 +01:00
Rafael Ávila de Espíndola
ddac002fd4 Make atomic_cell comparison symmetrical
I noticed a test failure with

Mutation inequality is not symmetric for ...

And the difference between the two mutations was that one atomic_cell
was live and the other wasn't.

Looking at the code I found a few cases where the comparison was not
symmetrical. This patch fixes them.

This patch will not fix the test, as it will now fail with a
"Mutations differ" error, but that is probably an independent issue.

Ref #3975.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190325194647.54950-1-espindola@scylladb.com>
2019-03-26 11:14:22 +02:00
Vlad Zolotarov
c798563cb0 scylla_util.py: ignore perftune.py's error messages when calling it in order to get mode's CPU mask
When we call perftune.py in order to get a particular mode's cpu set
(e.g. mode=sq_split) it may fail and print an error message to stderr because
there are too few CPUs for a particular configuration mode (e.g. when
there are only 2 CPUs and the mode is sq_split).

We already treat these situations correctly however we let the
corresponding perftune.py error message get out into the syslog.

This is definitely confusing, stressful and annoying.
Let's not let these messages out.

Fixes #4211

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190325220018.22824-1-vladz@scylladb.com>
2019-03-26 11:08:31 +02:00
Vlad Zolotarov
afa176851b transport: result_message: fix the compilation with fmt v5.3.0
Compilation fails with fmt release 5.3.0 when we print a bytes_view
using "{}" formatter.
The compiler's complaint is: "error: static assertion failed: mismatch between char-types of context and argument"

Resolve this by explicitly using the operator<<() across the whole
operator<<(std::ostream& os, const result_message::rows& msg) function.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190325203628.5902-1-vladz@scylladb.com>
2019-03-26 11:06:18 +02:00
Benny Halevy
af7f2a07f4 table::open_sstable: test has_scylla_component after load
has_scylla_component is always false before loading the sstable.

Also, return exception future rather than throwing.

Hit with the following dtests:
 counter_tests.TestCounters.upgrade_test
 counter_tests.TestCountersOnMultipleNodes.counter_consistency_node_*_test
 resharding_test.ReshardingTest_nodes?_with_*CompactionStrategy.resharding_counter_test
 update_cluster_layout_tests.TestUpdateClusterLayout.increment_decrement_counters_in_threads_nodes_restarted_test

Fixes #4306

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190326084151.18848-1-bhalevy@scylladb.com>
2019-03-26 10:58:52 +02:00
Avi Kivity
f259a4c3b4 Merge "Remove usage of static gossiper object in init.cc and storage_service" from Asias
"
This series removes the usage of the static gossiper object in init.cc
and storage_service.

Follow up series will remove more in other components. This is the
effort to clean up the component dependencies and have better shutdown
procedure.

Tests: tests/gossip_test, tests/cql_query_test, tests/sstable_mutation_test,  dtests.
"

* tag 'asias/storage_service_gossiper_dep_v5' of github.com:cloudius-systems/seastar-dev:
  storage_service: Do not use the global gms::get_local_gossiper()
  storage_service: Pass gossiper object to storage_service
  gms: Remove i_failure_detector.hh
  gossip: Get rid of the gms::get_local_failure_detector static object
  dht: Do not use failure_detector::is_alive in failure_detector_source_filter
  tests: Fix stop snitch in gossip_test.cc
  gossiper: Do not use value_factory from storage_service object
  gossiper: Use cfg options from _cfg instead of get_local_storage_service
  gossiper: Pass db::config object to gossiper class
  init: Pass gossiper object to init_ms_fd_gossiper
2019-03-26 08:54:46 +02:00
Avi Kivity
1d9699d833 Update seastar submodule
* seastar 33baf62...caa98f8 (8):
  > Merge "Add file_accessible and file_stat methods" from Benny
  > future::then: use std::terminate instead of abort
  > build: Allow cooked dependencies with configure.py
  > tests: Show a test's output when it fails
  > posix_file_impl: Bypass flush() call iff opened with O_DSYNC
  > posix_file_impl: Propagate and keep open_flags
  > open_flags: Add O_DSYNC value
  > build: Forward variables to CMake correctly
2019-03-25 15:45:52 +02:00
Avi Kivity
a7520c0ba9 Merge "Turn cql3_type into a trivial wrapper over data_type" from Rafael
"
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.

To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.

Even before this series cql3_type was a fairly light weight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.

This avoids the reference cycle and is easier to understand IMHO.

The one corner case is varchar. In the old system cql3_type::varchar
and cql3_type::text don't compare equal, but they both map to the same
data_type.

In the new system they would compare equal, so we avoid the confusion
by just removing the cql3_type::varchar variable.

Tests: unit (dev)
"

* 'espindola/merge-cq3-type-and-type-v3' of https://github.com/espindola/scylla:
  Turn cql3_type into a trivial wrapper over data_type
  Delete cql3_type::varchar
  Simplify db::cql_type_parser::parse
  Add a test for the varchar column representation
2019-03-25 15:03:16 +02:00
Tomasz Grabiec
80020118d0 Merge "Fix a couple of bugs related to large entry deletion" from Rafael
The crash observed in issue #4335 happens because
delete_large_data_entries is passed a deleted name.

Normally we don't get a crash, but a garbage name and we fail to
delete entries from system.large_*.

Adding a test for the fix uncovered another issue, which the second
patch in this series fixes.

Tests: unit (dev)

Fixes #4335.

* https://github.com/espindola/scylla guthub/fix-use-after-free-v4:
  large_data_handler: Fix a use after destruction
  large_data_handler: Make a variable non static
  Allow large_data_handler to be stopped twice
  Allow table to be stopped twice
  Test that large data entries are deleted
2019-03-25 10:37:36 +01:00
Avi Kivity
8c6306897d Merge "load_new_sstables: validate new_tables before calling row_cache::invalidate" from Benny
"
Validate the to-be-loaded sstables in the open_sstable phase and handle
any exceptions before calling cf.get_row_cache().invalidate.

Currently, if an exception is thrown from
distributed_loader::open_sstable, cf._sstables_opened_but_not_loaded may
be left partially populated.

Fixes #4306

Tests: unit (dev)
	- next-gating dtests (dev)
	- migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test
	  migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test_expect_fail
	  - with bypassing exception in distributed_loader::flush_upload_dir
	    to trigger the exception in table::open_sstable

"

* 'issues/4306/v3' of https://github.com/bhalevy/scylla:
  table: move sstable counters validation from load_sstable to open_sstable
  distributed_loader::load_new_sstables: handle exceptions in open_sstable
2019-03-24 20:30:44 +02:00
Avi Kivity
bd3a836e6c Merge "fixes for relocatable python3 packaging" from Takuya
"
Aligned way to build relocatable rpm with existing relocatable packages.
"

* 'relocatable-python3-fix-v3' of https://github.com/syuu1228/scylla:
  reloc: allow specify rpmbuild dir
  reloc/python3: archive package version number on build_reloc.sh
  reloc/python3: archive rpm build script in the relocatable package, build rpm using the script
  relloc/python3: fix PyYAML package name
  reloc: rename python3 relocatable package filename to align same style with other packages
  reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3
2019-03-24 20:29:56 +02:00
Duarte Nunes
93a1c27b31 service/storage_proxy: Don't consider view hints for MV backpressure
When a view replica becomes unavailable, updates to it are stored as
hints at the paired base replica. This on-disk queue of pending view
updates grows as long as there are view updates and the view replica
remains unavailable. Currently, we take that relative queue size into
account when calculating the delay for new base writes, in the context
of the backpressure algorithm for materialized views.

However, the way we're calculating that on-disk backlog is wrong,
since we calculate it per-device and then feed it to all the hints
managers for that device. This means that normal hints will show up as
backlog for the view hints manager, which in turn introduces delays.
This can make the view backpressure mechanism kick in even if the
cluster uses no materialized views.

There's yet another way in which considering the view hints backlog is
wrong: a view replica that is unavailable for some period of time can
cause the backlog to grow to a point where all base writes incur the
maximum delay of 1 second. This turns a single-node failure into
cluster unavailability.

The fix to both issues is to simply not take this on-disk backlog into
account for the backpressure algorithm.

Fixes #4351
Fixes #4352

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190321170418.25953-1-duarte@scylladb.com>
2019-03-24 20:29:56 +02:00
Benny Halevy
32bf0f36ef table: move sstable counters validation from load_sstable to open_sstable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-24 18:25:09 +02:00
Benny Halevy
564be8b720 distributed_loader::load_new_sstables: handle exceptions in open_sstable
Propagate exception to caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-24 18:25:09 +02:00
Takuya ASADA
efb3865840 reloc: allow specify rpmbuild dir
Added the same option, --builddir, on python3/build_rpm.sh to specify
the rpmbuild dir.
2019-03-24 00:34:09 +09:00
Takuya ASADA
dc5cec4194 reloc/python3: archive package version number on build_reloc.sh
Instead of getting the python3 version number in build_rpm.sh, archive
the version number when generating the python3 relocatable package.
2019-03-24 00:27:24 +09:00
Takuya ASADA
4fed4fecf6 reloc/python3: archive rpm build script in the relocatable package, build rpm using the script
Since we already archive the rpm/deb build script in the relocatable
package and build rpm/deb using that script, align the python
relocatable package the same way.

Also added SCYLLA-RELOCATABLE-FILE, SCYLLA-RELEASE-FILE and SCYLLA-VERSION-FILE
since these files are required for relocatable package.
2019-03-24 00:27:16 +09:00
Takuya ASADA
b1283b23bb relloc/python3: fix PyYAML package name
On Fedora 29 (Scylla official toolchain uses it),
PyYAML package name is "python3-pyyaml", no uppercase character.
2019-03-24 00:27:02 +09:00
Takuya ASADA
3762c4447a reloc: rename python3 relocatable package filename to align same style with other packages 2019-03-24 00:26:48 +09:00
Takuya ASADA
a515324732 reloc: move relocatable python build scripts to reloc/python3 and dist/redhat/python3
To make it easier to find the build scripts and keep their filenames
simpler, move them to the python3 directory.
2019-03-24 00:25:50 +09:00
Tomasz Grabiec
bc4a614e17 Merge "Add scylla fiber gdb command" from Botond
Debugging continuations is challenging. There is no support from gdb for
finding out which continuation was this continuation called from, nor
what other continuations are attached to it. GDB's `bt` command is of
limited use, at best a handful of continuations will appear in the
backtrace, those that were ready. This series attempts to fill part of
this void and provides a command that answers the latter question: what
continuations are attached to this one?
`scylla fiber` allows for walking a continuation chain, printing each
continuation. It is supposed to be the seastar equivalent of `bt`.
The continuation chain is walked starting from an arbitrary task,
specified by the user. The command will print all continuations attached
to the specified task.
This series also contains some loosely related cleanup of existing
commands and code in `scylla-gdb.py`.

* https://github.com/denesb/scylla.git scylla-fiber-gdb-command/v4:
  scylla-gdb.py: fix static_vector
  scylla-gdb.py: std_unique_ptr: add get() method
  scylla-gdb.py: fix existing documentation
  scylla-gdb.py: fix tasks and task-stats commands
  scylla-gdb.py: resolve(): add cache parameter
  scylla-gdb.py: scylla_ptr: move actual logic into analyze()
  scylla-gdb.py: scylla_ptr: make analyze() usable for outside code
  scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input
  scylla-gdb.py: add scylla fiber command
2019-03-23 10:20:20 +02:00
Asias He
7447c92d63 storage_service: Do not use the global gms::get_local_gossiper()
Use the gossiper object stored in the _gossiper member of storage_service.
2019-03-22 09:11:26 +08:00
Asias He
b91452ed4c storage_service: Pass gossiper object to storage_service
Pass the gossiper object to storage_service class in order to avoid the
usage of the static object returned from get_local_gossiper().
2019-03-22 09:11:26 +08:00
Asias He
b2c110699e gms: Remove i_failure_detector.hh
It is not used any more.
2019-03-22 09:08:51 +08:00
Asias He
af579a055b gossip: Get rid of the gms::get_local_failure_detector static object
Store the failure_detector object inside gossiper object.

- The global sharded<failure_detector> object is gone.

- No need to initialize sharded<failure_detector> manually, which
simplifies the code in tests/cql_test_env.cc and init.cc.
2019-03-22 09:08:51 +08:00
Asias He
2b6a4050c2 dht: Do not use failure_detector::is_alive in failure_detector_source_filter
Switch failure_detector_source_filter to use get_local_gossiper().is_alive
directly, since we are going to remove the static
gms::get_local_failure_detector object soon.
Pass the nodes that are down to the filter directly, so that
range_streamer does not depend on gossiper at all.
2019-03-22 08:26:47 +08:00
Asias He
9dbc4af1dd tests: Fix stop snitch in gossip_test.cc
It should stop the snitch, not the failure detector. Fix it up. We are
going to remove the static failure_detector object soon.
2019-03-22 08:26:47 +08:00
Asias He
967794798a gossiper: Do not use value_factory from storage_service object
Avoid using value_factory from storage_service inside gossiper.
2019-03-22 08:26:47 +08:00
Asias He
4a55617c6c gossiper: Use cfg options from _cfg instead of get_local_storage_service
Gossiper has a db::config _cfg member now; avoid using
get_local_storage_service() to get config options.
2019-03-22 08:26:44 +08:00
Asias He
ee1227b3ae gossiper: Pass db::config object to gossiper class
Gossiper calls service::get_local_storage_service() to get cfg options.
To avoid cyclic dependency, pass the cfg object to gossiper directly.
2019-03-22 08:25:16 +08:00
Asias He
1652ee512a init: Pass gossiper object to init_ms_fd_gossiper
In order to avoid the usage of the static gossiper object returned from
get_local_gossiper().
2019-03-22 08:25:16 +08:00
Rafael Ávila de Espíndola
51754ab068 Test that large data entries are deleted
This area is hard to test since we only issue deletes during
compaction and we wait for deletes only during shutdown.

That is probably worth it, seeing that two independent bugs would have
been found by this test.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 10:48:20 -07:00
Rafael Ávila de Espíndola
bd1593c12a Allow table to be stopped twice
This will be used in a testcase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 10:47:59 -07:00
Rafael Ávila de Espíndola
c8da28a3eb Allow large_data_handler to be stopped twice
This will be used in a testcase.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 10:47:23 -07:00
Rafael Ávila de Espíndola
c0b0a6baeb configure: Add a --compress-exec-debuginfo option
The default is the old behavior, but it is now possible to configure
with --compress-exec-debuginfo=0 to get faster links but larger
binaries.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:55:54 -07:00
Rafael Ávila de Espíndola
ab53055640 configure: Move some flags from cxx_ld_flags to cxxflags
They are moved because they are not relevant for linking.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:55:39 -07:00
Rafael Ávila de Espíndola
e11cefab9c configure: rename per mode opt to cxx_ld_flags
It is the same name used in the build.ninja file.

A followup patch will add cxxflags and move compiler only flags there.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:46:58 -07:00
Rafael Ávila de Espíndola
443a85a68c configure: remove per mode libs
It was always empty.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:46:32 -07:00
Rafael Ávila de Espíndola
35c7ec6777 configure: remove sanitize_libs and merge sanitize into opt
These are flags we want to pass to both compilation and linking. There
is nothing special about the fact that they are sanitizer related.

With {sanitize} being passed to the link, we don't need {sanitize_libs}.

We do need to make sure -fno-sanitize=vptr is the last one in the
command line. Before we were implicitly getting it from seastar, but
it is bad practice to get some sanitizer flags from seastar but not
others.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-21 09:43:02 -07:00
Duarte Nunes
5752174762 Merge 'Use staging directory for uploaded sstables awaiting view updates' from Piotr
"
This series moves sstables uploaded via `nodetool refresh` to the
staging/ directory if view updates need to be generated from them.
The previous behavior (leaving these sstables in the upload/ directory
until view updates are generated) could cause sstables with
conflicting names to be mistakenly overwritten by the user.

Fixes #4047

Tests: unit (dev)
dtest: backup_restore_tests.py + backup_restore_tests.py modified with
       having materialized view definitions
"

* 'use_staging_directory_for_uploaded_sstables_awaiting_view_updates' of https://github.com/psarna/scylla:
  sstables: simplify requires_view_building
  loader: move uploaded view pending sstables to staging
2019-03-21 12:46:02 -03:00
Gleb Natapov
bb93d990ad messaging_service: keep shared pointer to an rpc connection while opening mutation fragment stream
The current code captures a reference to rpc::client in a continuation,
but there is no guarantee that the reference will still be valid when
the continuation runs. Capture a shared pointer to rpc::client instead.

Fixes #4350.

Message-Id: <20190314135538.GC21521@scylladb.com>
2019-03-21 12:46:01 -03:00
Tomasz Grabiec
69775c5721 row_cache: Fix abort in cache populating read concurrent with memtable flush
When we're populating a partition range and the population range ends
with a partition key (not a token) which is present in sstables and
there was a concurrent memtable flush, we would abort on the following
assert in cache::autoupdating_underlying_reader:

     utils::phased_barrier::phase_type creation_phase() const {
         assert(_reader);
         return _reader_creation_phase;
     }

That's because autoupdating_underlying_reader::move_to_next_partition()
clears the _reader field when it tries to recreate a reader but it finds
the new range to be empty:

         if (!_reader || _reader_creation_phase != phase) {
            if (_last_key) {
                auto cmp = dht::ring_position_comparator(*_cache._schema);
                auto&& new_range = _range.split_after(*_last_key, cmp);
                if (!new_range) {
                    _reader = {};
                    return make_ready_future<mutation_fragment_opt>();
                }

Fix by not asserting on _reader. creation_phase() will now be
meaningful even after we clear the _reader. The meaning of
creation_phase() is now "the phase in which the reader was last
created or 0", which makes it valid in more cases than before.

If the reader was never created we will return 0, which is smaller
than any phase returned by cache::phase_of(), since cache starts from
phase 1. This shouldn't affect current behavior, since we'd abort() if
called for this case, it just makes the value more appropriate for the
new semantics.

Tests:

  - unit.row_cache_test (debug)

Fixes #4236
Message-Id: <1553107389-16214-1-git-send-email-tgrabiec@scylladb.com>
2019-03-21 12:46:00 -03:00
Asias He
c0f744b407 storage_service: Wait for gossip to settle only if do_bind is set
In commit 71bf757b2c, we call
wait_for_gossip_to_settle() which takes some time to complete in
storage_service::prepare_to_join().

tests/cql_query_test calls init_server with do_bind == false, which in
turn calls storage_service::prepare_to_join(). Since there is only one
node in the test, there is no point in waiting for gossip to settle.

To make the cql_query_test fast again, do not call
wait_for_gossip_to_settle if do_bind is false.

Before this patch, cql_query_test took forever to complete.
After it, the test takes 10s.

Tests: tests/cql_query_test
Message-Id: <3ae509e0a011ae30eef3f383c6a107e194e0e243.1553147332.git.asias@scylladb.com>
2019-03-21 12:46:00 -03:00
Avi Kivity
a9cf07369f Merge "Add local indexes" from Piotr
"
This series adds support for local indexing, i.e. when the index table
resides on the same partition as base data.
It addresses the performance issue of having an indexed query
that also specifies a partition key: the index will be queried
locally.
"

* 'add_local_indexing_11' of https://github.com/psarna/scylla: (30 commits)
  tests: add cases for local index prefix optimization
  tests: add create/drop local index test case
  tests: add non-standard names cases to local index tests
  tests: add multi pk case for local index tests
  tests: add test for malformed local index definitions
  tests: add local index paging test
  tests: add local indexing test
  cql3: add CREATE INDEX syntax for local indexes
  cql3: use serialization function to create index target string
  index: add serialization function for index targets
  index: use proper local index target when adding index
  index: add parsing target column name from local index targets
  db: add checking for local index in schema tables
  index: add checking if serialized target implies local index
  index: enable parsing multi-key targets
  index: move target parser code to .cc file
  json: add non-throwing overload for to_json_value
  cql3: add checking for local indexes in has_supporting_index()
  cql3: move finding index restrictions to prepare stage
  cql3: add picking an index by score
  ...
2019-03-21 12:46:00 -03:00
Nadav Har'El
561c640ed1 materialized views: allow view without clustering columns
When a materialized view was created, the verification code artificially
forbade creating a view without a clustering key column. However, there
is no real reason to forbid this. In the trivial case, the original base
table might not have had a clustering key, and the view might want to use
the exact same key. In a more complex case, a view may want to have all the
primary key columns as *partition* key columns, and that should be fine.

The patch also includes a regression test, which failed before this patch,
and succeeds with it (we test that we can create materialized views in both
aforementioned scenarios, and these materialized views work as expected).

Duarte raised the opinion that the "trivial" case of a view table with
a key identical to that of the base should be disallowed. However, this
should be done, if at all (I think it shouldn't), in a follow-up patch,
which will implement the non-triviality requirement consistently (e.g.,
require the view primary key to be different from the base's, regardless
of the existence or non-existence of clustering columns).

Fixes #4340.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320122925.10108-1-nyh@scylladb.com>
2019-03-21 12:45:52 -03:00
Glauber Costa
34b640993f storage proxy: add tracepoints about delays
When we are tracing requests, we would like to know everything that
happened to a query that can contribute to it having increased
latencies.

We insert some of those latencies explicitly due to throttling, but we
do not log that into tracing.

In the case of storage proxy, we do have a log message at trace level
but that is rarely used: trace messages are too heavy of a hammer, there
is no way to specify specific queries, etc.

The correct place for that is CQL tracing. This patch moves that message
to CQL tracing. We also add a matching tracepoint assuring us that no
delay happened if that's the case.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190320163350.15075-1-glauber@scylladb.com>
2019-03-21 12:45:52 -03:00
Avi Kivity
eddb98e8c6 Merge "sstables: mc: Write and read static compact tables the same way as Cassandra" from Tomasz
"
Static compact tables are tables with compact storage and no
clustering columns.

Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of as static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.

This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in our schema, and
returning no rows as a result. Also, Cassandra and Scylla tools
would have problems reading those sstables.

Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.

Fixes #4139.

Tests:

  - unit (dev)
"

* tag 'static-compact-mc-fix-v3.1' of github.com:tgrabiec/scylla:
  tests: sstables: Test reading of static compact sstable generated by Cassandra
  tests: sstables: Add test for writing and reading of static compact tables
  sstables: mc: Write static compact tables the same way as Cassandra
  sstable: mc: writer: Set _static_row_written inside write_static_row()
  sstables: Add sstable::features()
  sstables: mc: writer: Prepare write_static_row() for working with any column_kind
  storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag
  sstables: mc: writer: Build indexed_columns together with serialization_header
  sstables: mc: writer: De-optimize make_serialization_header()
  sstable: mc: writer: Move attaching of mc-specific components out of generic code
2019-03-21 12:45:51 -03:00
Rafael Ávila de Espíndola
53ab298957 Turn cql3_type into a trivial wrapper over data_type
Both cql3_type and abstract_type are normally used inside
shared_ptr. This creates a problem when an abstract_type needs to refer
to a cql3_type as that creates a cycle.

To avoid warnings from asan, we were using a std::unordered_map to
store one of the edges of the cycle. This avoids the warning, but
wastes even more memory.

Even before this patch, cql3_type was a fairly lightweight
structure. This patch pushes in that direction and now cql3_type is a
struct with a single member variable, a data_type.

This avoids the reference cycle and is easier to understand IMHO.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 14:10:28 -07:00
Rafael Ávila de Espíndola
c76148b6ce Delete cql3_type::varchar
varchar is just an alias for text. Handle that conversion directly in
the parser and delete the cql3_type::varchar variable.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 14:07:46 -07:00
Rafael Ávila de Espíndola
7f64a6ec4b Simplify db::cql_type_parser::parse
Since its first version, db::cql_type_parser::parse had special cases
for native and user defined types.

Those are not necessary, as the general parser has no problem handling
them.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola
088d59aced Add a test for the varchar column representation
We map varchar to text, and so does cassandra.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 12:44:31 -07:00
Rafael Ávila de Espíndola
8d9baf9843 large_data_handler: Make a variable non static
The value computed is not static since
f254664fe6, but unfortunately that was
missed in that commit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 09:31:21 -07:00
Rafael Ávila de Espíndola
e7749e7aee large_data_handler: Fix a use after destruction
The path leading to the issue was:

The sstable name is allocated and passed to maybe_delete_large_data_entries by reference

   auto name = sst->get_filename();
   return large_data_handler.maybe_delete_large_data_entries(*sst->get_schema(), name, sst->data_size());

A future is created with a reference to it

  large_partitions = with_sem([&s, &filename, this] {
     return delete_large_data_entries(s, filename, db::system_keyspace::LARGE_PARTITIONS);
  });

The semaphore blocks.

The filename is destroyed.

delete_large_data_entries is called with a destroyed filename.

The reason this did not reproduce trivially in a debug build was that
the sstable itself was on the stack and the destroyed value was read
as an internal value, so asan had nothing to complain about.

Unfortunately we also had no tests that the entry in
system.large_rows was actually deleted.

This patch passes the name by value. It might create up to 3 copies of
it. If that is too inefficient it can probably be avoided with a
do_with in maybe_delete_large_data_entries.

Fixes #4335

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 09:30:42 -07:00
Rafael Ávila de Espíndola
c250a26e68 configure: split a ld_flags_{mode} out of cxxflags_{mode}
Flags that we want to pass to gcc during compilation and linking are
in cxx_ld_flags_{mode}.

With this patch, we no longer pass

-I. -I build/{mode}/gen

to the link, which should have no impact.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-20 08:33:23 -07:00
Piotr Sarna
9695a47e96 sstables: simplify requires_view_building
Since sstables uploaded via upload/ directory are no longer left there
awaiting view updates, the only remaining valid directory is staging/.
2019-03-20 13:47:21 +01:00
Botond Dénes
0c381572fd repair::row_level: pin table for local reads
The repair reader depends on the table object being alive, while it is
reading. However, for local reads, there was no synchronization between
the lifecycle of the repair reader and that of the table. In some cases
this can result in use-after-free. Solve by using the table's existing
mechanism for lifecycle extension: `read_in_progress()`.

For the non-local reader, when the local node's shard configuration is
different from the remote one's, this problem is already solved, as the
multishard streaming reader already pins table objects on the used
shards. This creates an inconsistency that might be surprising (in a bad
way): one reader takes care of pinning the needed resources while the
other one doesn't. I was torn on how to reconcile this, and decided to go
with the simplest solution, explicitly pinning the table for local
reads, that is, preserving the inconsistency. It was suggested that this
inconsistency be remedied by building resource pinning into the local
reader as well [1], but there is opposition to this [2]. Adding a wrapper
reader which does just the resource pinning seems excessive, in both
code and runtime overhead.

Spotted while investigating repair-related crashes which occurred during
interrupted repairs.

Fixes: #4342

[1] https://github.com/scylladb/scylla/issues/4342#issuecomment-474271050
[2] https://github.com/scylladb/scylla/issues/4342#issuecomment-474331657

Tests: none, this is a trivial fix for a not-yet-seen-in-the-wild bug.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <8e84ece8343468960d4e161467ecd9bb10870c27.1553072505.git.bdenes@scylladb.com>
2019-03-20 14:45:22 +02:00
Piotr Sarna
986004a959 loader: move uploaded view pending sstables to staging
Tables uploaded via `nodetool refresh` used to be left in the upload/
directory if view updates needed to be generated from them. Since view
update generation is asynchronous, sstables left in the directory could
erroneously get overwritten if the user decided to upload another batch
of sstables and some of the names collided.
To remedy this, uploaded sstables that need view updates are moved
to staging/ directory with a unique generation number, where they
await view update generation.

Fixes #4047
2019-03-20 13:44:29 +01:00
Juliana Oliveira
8cd6028d0d Dockerfile: remove cgroup volume mount
Mounting /sys/fs/cgroup inside the image causes docker cgroup to not
be mounted internally. Therefore, hosts cannot limit resources on
Scylla. This patch removes the cgroup volume mount, allowing folders
under /sys/fs/cgroup to be created inside docker.

Message-Id: <20190320122053.GA20256@shenzou.localdomain>
2019-03-20 14:30:27 +02:00
Nadav Har'El
7c874057f5 materialized_views: propagate "view virtual columns" between nodes
db::schema_tables::ALL and db::schema_tables::all_tables() are both supposed
to list the same schema tables - the former is the list of their names, and
the latter is the list of their schemas. This code duplication makes it easy
to forget to update one of them, and indeed recently the new
"view_virtual_columns" was added to all_tables() but not to ALL.

What this patch does is to make ALL a function instead of constant vector.
The newly named all_table_names() function uses all_tables() so the list
of schema tables only appears once.

So that nobody worries about the performance impact, all_table_names()
caches the list in a per-thread vector that is only prepared once per thread.

Because after this patch all_table_names() has the "view_virtual_columns"
that was previously missing, this patch also fixes #4339, which was about
virtual columns in materialized views not being propagated to other nodes.

Unfortunately, to test the fix for #4339 we need a test with multiple
nodes, so we cannot test it here in a unit test, and will instead use
the dtest framework, in a separate patch.

Fixes #4339

Branches: 3.0
Tests: all unit tests (release and debug mode), new dtest for #4339. The unit test mutation_reader_test failed in debug mode but not in release mode, but this probably has nothing to do with this patch (?).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190320063437.32731-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
Nadav Har'El
ccf731a820 Materialized views: add metric for current flow-control delay
The materialized views flow control mechanism works by adding a certain
delay to each client request, designed to slow down the client to the
rate at we can complete the background view work. Until now we could observe
this mechanism only indirectly, in whether or not it succeeded to keep the
view backlog bounded; But we had no way to directly observe the delay that
we decided to add. In fact, we had a bug where this delay was constantly
zero, and we didn't even notice :-)

So in this patch we add a new metric,
scylla_storage_proxy_coordinator_last_mv_flow_control_delay

The metric is a floating point number, in units of seconds.

This metric is somewhat peculiar that it always contains the *last* delay
used for some request - unlike other metrics it doesn't measure the "current"
value of something. Moreover, it can jump wildly because there is no
guarantee that each request's delay will be identical (in particular,
different requests may involve different base replicas which have different
view backlogs, so decide on different delays). In the future we may want
to supplement this metric with some sort of delay histogram. But even
this simple metric is already useful to debug certain scenarios and
understand if the materialized-views flow control is working or not.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190227133630.26328-1-nyh@scylladb.com>
2019-03-20 09:14:59 -03:00
Tomasz Grabiec
fbeae4ffeb toolchain: Install gdb in the image
Scylla built using the frozen toolchain needs to be debugged
on a system with matching libraries. It's easiest if it's also done on the same image.
Install gdb in the image so that it's always there when we need it.

Fixes #4329

Message-Id: <1553072393-9145-1-git-send-email-tgrabiec@scylladb.com>
2019-03-20 13:35:26 +02:00
Piotr Sarna
41679de13e tests: add cases for local index prefix optimization
The cases check if incorporating clustering key prefix into
the indexed query works fine (i.e. does not require filtering
and returns proper rows).
2019-03-20 10:51:27 +01:00
Piotr Sarna
56a0e6d992 tests: add create/drop local index test case 2019-03-20 10:51:27 +01:00
Piotr Sarna
3c61c8e18a tests: add non-standard names cases to local index tests
New test cases cover case-sensitive column/table names and names with
non-alphanumeric characters like commas and parentheses.
2019-03-20 10:51:27 +01:00
Piotr Sarna
d664e0e522 tests: add multi pk case for local index tests 2019-03-20 10:51:27 +01:00
Piotr Sarna
3b39029924 tests: add test for malformed local index definitions 2019-03-20 10:51:27 +01:00
Piotr Sarna
4b82011cd3 tests: add local index paging test 2019-03-20 10:51:27 +01:00
Piotr Sarna
8836500fcd tests: add local indexing test
A test case for local indexing is added to the SI suite.
2019-03-20 10:51:27 +01:00
Piotr Sarna
cedec95f8d cql3: add CREATE INDEX syntax for local indexes
In order to create a local index, the syntax used is:
CREATE INDEX t ON ((p1, p2, p3), v);

where (p1, p2, p3) are partition key columns (all of them),
and v is the indexed column.
2019-03-20 10:51:27 +01:00
Piotr Sarna
1fd61c5ac4 cql3: use serialization function to create index target string
Instead of building the string manually, a serialization function
is called to create a string out of the index target list.
2019-03-20 10:51:27 +01:00
Piotr Sarna
757419b524 index: add serialization function for index targets
Since target_parser is responsible for deserializing target strings,
the function that serializes them belongs in the same class.
2019-03-20 10:51:26 +01:00
Piotr Sarna
074ed2c8a5 index: use proper local index target when adding index
With global indexes, the target column name is always the same as the string
kept in the 'options[target]' field. That's not the case for local indexes,
and so a proper extraction function is used to get the value.
2019-03-20 10:20:24 +01:00
Piotr Sarna
2fcae3d0ec index: add parsing target column name from local index targets
When (re)creating a local index, the target string needs to be used
to parse out the actual indexed column:
"(base_pk_part1,base_pk_part2,base_pk_part3),actual_indexed_column".
This column is later used to determine if an index should be applied
to a SELECT statement.
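A sketch of that parsing step in Python, using hypothetical helper names (the real code lives in target_parser and also handles quoted identifiers):

```python
def is_local_target(target):
    """Local-index targets embed the base partition key in parentheses,
    so they can be recognized even before the base schema is known."""
    return target.startswith('(')

def parse_local_index_target(target):
    """Split "(pk1,pk2,...),indexed_column" into the partition-key
    columns and the actual indexed column. Illustrative helper only."""
    if not is_local_target(target):
        raise ValueError('not a local index target: %s' % target)
    close = target.index(')')
    pk_columns = target[1:close].split(',')
    indexed_column = target[close + 1:].lstrip(',')
    return pk_columns, indexed_column
```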
2019-03-20 10:20:24 +01:00
Piotr Sarna
e0d7807eed db: add checking for local index in schema tables
Based on which targets the index has, it will be either local
or global - local indexes have their full base partition key
embedded in their targets.
2019-03-20 10:20:24 +01:00
Piotr Sarna
de5e5ee1a5 index: add checking if serialized target implies local index
This utility enables checking whether the specified target indicates
a local index, even before the base table schema is known.
2019-03-20 10:20:24 +01:00
Piotr Sarna
5672edc149 index: enable parsing multi-key targets
Parsing index targets that consist of partition key columns
followed by clustering key columns is enabled.
2019-03-20 10:20:24 +01:00
Piotr Sarna
9782381dd4 index: move target parser code to .cc file
It will be useful later when expanding the implementation.
2019-03-20 10:20:24 +01:00
Piotr Sarna
25264d61ee json: add non-throwing overload for to_json_value
It will be needed later to avoid unnecessary try-catch blocks.
2019-03-20 10:20:24 +01:00
Piotr Sarna
b46ab76d4b cql3: add checking for local indexes in has_supporting_index()
With local indexes it's not sufficient to check if a single
restriction is supported by an index in order to decide
that it can be used, because local indexes can be leveraged
only when the full partition key is properly restricted.

(It also serves as a great example why restrictions code
 would greatly benefit from a facelift! :) )
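The rule can be sketched in Python; names here are illustrative, not the actual restrictions API:

```python
def local_index_usable(index_pk_columns, restricted_columns):
    """A local index may only be leveraged when every base partition-key
    column it embeds is restricted by the query."""
    return set(index_pk_columns) <= set(restricted_columns)
```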
2019-03-20 10:20:24 +01:00
Piotr Sarna
87f6e37caa cql3: move finding index restrictions to prepare stage
Index restrictions that match a given index were recomputed
during execution stage, which is redundant and prone to errors.
Now, the index restrictions that are used are cached in the prepared statement.
2019-03-20 10:20:22 +01:00
Piotr Sarna
9823898b27 cql3: add picking an index by score
Instead of choosing the first index that we find (in column def order),
the index with the highest score is picked. Currently, local indexes
score higher than global ones if restrictions allow local indexing
to be applied.
2019-03-20 10:20:02 +01:00
Piotr Sarna
2f173f7ed8 cql3: add handling paging state for local indexes
When computing paging state for local indexes, the partition
and clustering keys are different than with global ones:
 - partition key is the same as base's
 - clustering key starts with the indexed column
2019-03-20 10:20:02 +01:00
Piotr Sarna
75dd964751 cql3: add handling partition slices for local indexes
For local indexes, a slice will consist of the indexed column
followed by base clustering columns.
2019-03-20 10:20:01 +01:00
Piotr Sarna
b12162c8f5 cql3: add returning correct partition ranges for local indexes
Local indexes always share the partition range with their base.
2019-03-20 09:51:46 +01:00
Piotr Sarna
da8e8f18b3 cql3: make read_posting_list a member function
It already accepts several arguments that can be extracted from 'this',
and more will be added in the future.
New parameters include lambdas prepared during prepare stage
that define how to extract partition/clustering key ranges depending
on which index is used, so keeping it a static function would result
in an unbounded number of parameters with complex types, which would
in turn make the function header almost illegible to a reader.
Hence, read_posting_list becomes a member function with easy access
to any data prepared during prepare stage.
2019-03-20 09:51:46 +01:00
Piotr Sarna
85017c5ad4 cql3: look for indexed column definition only once
There's no need to look for the column definition inside a loop.
2019-03-20 09:51:46 +01:00
Piotr Sarna
8002471c81 cql3: allow index target to keep multiple columns
Instead of having just one column definition, index target is now
a variant of either single column definition or a vector of them.
The vector is expected to be used when part of a target definition
is enclosed in parentheses:
 $ CREATE INDEX ON t((p),v);
or
 $ CREATE INDEX ON t((p1,p2), v);
etc.

This feature will allow providing a (possibly composite) base partition key
to the CREATE INDEX statement, which will result in creating a local index.
2019-03-20 09:51:46 +01:00
Piotr Sarna
a45022dbc7 docs: document index target serialization
Index target serialization format is extended for the purpose
of local indexing. Both new and old formats are described
in docs.
2019-03-20 09:51:46 +01:00
Piotr Sarna
9c984f9da9 index: fix indentation 2019-03-20 09:51:46 +01:00
Piotr Sarna
3b908b7b5d index: add base partition keys to local index schema
When the index is local, its partition key in the underlying materialized
view is the same as the base's, and the indexed column is the first
clustering key. This implementation ensures that view and base rows
will reside on the same partition, while querying the indexed column
will be possible by putting it as a first clustering key part.
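The key mapping described above can be sketched as follows (illustrative only; the real schema building also handles types, remaining clustering columns, etc.):

```python
def local_index_view_keys(base_pk, base_ck, indexed_column):
    """The view keeps the base partition key (so base and view rows
    share a partition) and prepends the indexed column to the base
    clustering key (so the indexed column can be queried)."""
    view_pk = list(base_pk)
    view_ck = [indexed_column] + list(base_ck)
    return view_pk, view_ck
```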
2019-03-20 09:51:46 +01:00
Piotr Sarna
90d47ca183 schema: add is_local_index cached value to index metadata
In order to quickly distinguish global indexes from local ones,
a cached boolean value is introduced.
2019-03-20 09:51:46 +01:00
Botond Dénes
ddf795d2f9 configure.py: add check header targets
Our guidelines dictate that each header is self-sufficient, i.e.
after including it into an empty .cc file, the .cc file can be compiled
without having to include any other header file.
Currently we don't have any tool to check that a header is
self-sufficient. This patch aims to remedy that by adding a target to check
each header, as well as a target to check all the headers.
For each header a target is generated that does the equivalent of
including the header into an empty .cc file, then compiling the
resulting .cc file. This target is called {header_name}.o, so given
the header `myheader.hh` this will be `build/dev/myheader.hh.o`
(if the dev build mode is used).
Also, a target `checkheaders` is added which validates all headers in
the project. This currently fails as we have many headers that are not
self-sufficient.
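The generation step can be sketched in Python; `header_check_sources` is a hypothetical name, not the actual configure.py code:

```python
def header_check_sources(headers, mode='dev'):
    """For each header, the generated target compiles a .cc file whose
    only content is an #include of that header; the resulting object is
    named build/{mode}/{header}.o."""
    targets = {}
    for header in headers:
        obj = 'build/%s/%s.o' % (mode, header)
        targets[obj] = '#include "%s"\n' % header
    return targets

# 'checkheaders' would then simply depend on every generated object file.
```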

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <fdf550dc71203417252f1d8144e7a540eec074a1.1552636812.git.bdenes@scylladb.com>
2019-03-19 17:35:18 +02:00
Botond Dénes
721dd70d93 scylla-gdb.py: add scylla fiber command
The scylla fiber command traverses a continuation chain, given an
arbitrary task pointer.
Example (cropped for brevity):
(gdb) scylla fiber this
 #0  (task*) 0x0000600000550360 0x000000000468ac40 vtable for seastar...
 #1  (task*) 0x0000600000550300 0x00000000046c3778 vtable for seastar...
 #2  (task*) 0x00006000018af600 0x00000000046c37a0 vtable for seastar...
 #3  (task*) 0x00006000005502a0 0x00000000046c37f0 vtable for seastar...
 #4  (task*) 0x0000600001a65e10 0x00000000046c6b10 vtable for seastar...

scylla fiber can be passed any expression that evaluates to a task
pointer. C++ variables, raw addresses and GDB variables (e.g. $1) all
work.

The command works by scanning the task object for pointers. Each
pointer found is dereferenced, and if that succeeds, the command
checks whether the pointer leads to a vtable of a known task class.
If it does, the found task is saved, and the scan then recursively
proceeds to the newly found task, until a task with no further
attached continuations is found.
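The traversal described above can be sketched like this; memory is modeled as a dict from address to (vtable name, pointer slots), and all names are illustrative, not the real scylla-gdb.py API:

```python
def walk_fiber(start, memory, known_task_vtables, max_depth=100):
    """Follow a continuation chain: starting from a task, scan its
    pointer slots and step to the first slot whose target's vtable
    belongs to a known task type; stop when no such slot exists."""
    chain = [start]
    current = start
    for _ in range(max_depth):
        _vtable, slots = memory[current]
        next_task = None
        for ptr in slots:
            if ptr in memory and memory[ptr][0] in known_task_vtables:
                next_task = ptr
                break
        if next_task is None:
            break
        chain.append(next_task)
        current = next_task
    return chain
```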
2019-03-19 17:06:41 +02:00
Botond Dénes
697fc5cefe scylla-gdb.py: scylla_ptr: accept any valid gdb expression as input 2019-03-19 17:06:41 +02:00
Botond Dénes
e1ea4db7ca scylla-gdb.py: scylla_ptr: make analyze() usable for outside code
Instead of a formatted message, intended for humans, return a
`pointer_metadata` object, suitable for use by code. The
formatting of the pointer metadata into the human readable message is
now done by the `pointer_metadata.__str__()` method, on the call site.

Also make `analyze()` a class method, so that it can be called
without creating a `scylla_ptr` command instance, which could
confuse GDB.
2019-03-19 17:06:41 +02:00
Botond Dénes
e77b6d12d1 scylla-gdb.py: scylla_ptr: move actual logic into analyze()
In preparation to this method being made usable for outside code.
2019-03-19 17:06:41 +02:00
Botond Dénes
7d5c0ff666 scylla-gdb.py: resolve(): add cache parameter
Allow callers to prevent the resolved name from being saved. Useful when
one is just probing addresses but doesn't want to flood the cache with
useless symbols.
2019-03-19 17:06:41 +02:00
Botond Dénes
48b96d25b3 scylla-gdb.py: fix tasks and task-stats commands
These two commands have been broken for some time, roughly since the
CPU scheduler was merged. Fix them and move the task queue parsing code
into a common method, which is now used by both commands.
2019-03-19 17:06:41 +02:00
Botond Dénes
87c28df429 scylla-gdb.py: fix existing documentation
Some commands are documented, but not in the Python way. Refactor these
commands so they use the standard Python way of self-documenting. In
addition to being more Pythonic, this makes these documentation strings
discoverable by GDB, so they appear in the `help scylla` output.
2019-03-19 17:06:41 +02:00
Botond Dénes
e1dffc3850 scylla-gdb.py: std_unique_ptr: add get() method
Add a `get()` method that retrieves the wrapped pointer without
dereferencing it. All existing methods are refactored to use this new
method to obtain the pointer instead of directly accessing the members.
This way only a single method has to be fixed if the object
implementation changes.
2019-03-19 17:06:41 +02:00
Botond Dénes
c51b11c0ed scylla-gdb.py: fix static_vector
Apparently a new 'dummy' level was added.
2019-03-19 17:06:41 +02:00
Glauber Costa
7119440cbc tests: make sure that commitlog replay works after truncate.
Tomek and I recently had a discussion about whether or not a commitlog
replay would be safe after we dropped or truncated a table that is not
flushed (durable, but auto_snapshots being false).

While we agreed that it would be safe, we both agreed we would feel
better with a unit test covering that.

This patch adds such a test (btw, it passes)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190318223811.6862-1-glauber@scylladb.com>
2019-03-19 11:30:51 +01:00
Avi Kivity
0441b59a70 Update seastar submodule
* seastar 463d24e...33baf62 (3):
  > reactor: improve detection of io_pgetevents()
  > rpc: fix stack use after free in frame reading functions
  > core/thread: enable move-only functions
2019-03-19 11:44:35 +02:00
Takuya ASADA
32cee92d56 dist/debian: don't strip ld.so
On some environments dh_strip fails at libreloc/ld.so, so it's better to
skip it too, just like libprotobuf.so.15.

The error message is:
dh_strip -Xlibprotobuf.so.15 --dbg-package=scylla-server-dbg
strip:debian/scylla-server/opt/scylladb/libreloc/ld.so[.gnu.build.attributes]: corrupt GNU build attribute note: bad description size: Bad value
dh_strip: strip --remove-section=.comment --remove-section=.note --strip-unneeded debian/scylla-server/opt/scylladb/libreloc/ld.so returned exit code 1
0

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190319005153.26506-1-syuu@scylladb.com>
2019-03-19 11:06:44 +02:00
Asias He
71bf757b2c gossiper: Enable features only after gossip is settled
n1, n2, n3 in the cluster,

shutdown n1, n2, n3

start n1, n2

start n3: we saw features being enabled from the system table while n1 and n2 were already up and running in the cluster.

INFO  2019-02-27 09:24:41,023 [shard 0] gossip - Feature check passed. Local node 127.0.0.3 features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}, Remote common_features = {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}
INFO  2019-02-27 09:24:41,025 [shard 0] storage_service - Starting up server gossip
INFO  2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.1 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}
INFO  2019-02-27 09:24:41,063 [shard 0] gossip - Node 127.0.0.2 does not contain SUPPORTED_FEATURES in gossip, using features saved in system table, features={CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS, DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS, LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT, RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3, STREAM_WITH_RPC_STREAM, TRUNCATION_TABLE, WRITE_FAILURE_REPLY, XXHASH}

The problem is we enable the features too early in the start up process.
We should enable features after gossip is settled.

Fixes #4289
Message-Id: <04f2edb25457806bd9e8450dfdcccc9f466ae832.1551406991.git.asias@scylladb.com>
2019-03-18 18:25:29 +01:00
Dejan Mircevski
c7d05b88a6 Update GCC version check in configure.py
This brings the version check up-to-date with README.md and HACKING.md,
which were updated by commit fa2b03 ("Replace std::experimental types
with C++17 std version.") to say that minimum GCC 8.1.1 is required.

Tests: manually run configure.py with various `--compiler` values.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
Message-Id: <20190318130543.24982-1-dejan@scylladb.com>
2019-03-18 15:24:25 +02:00
Tomasz Grabiec
33f15aa1b5 tests: sstables: Test reading of static compact sstable generated by Cassandra 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
c78568daef tests: sstables: Add test for writing and reading of static compact tables 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
47ca280e57 sstables: mc: Write static compact tables the same way as Cassandra
Static compact tables are tables with compact storage and no
clustering columns.

Before this patch, Scylla was writing rows of static compact tables as
clustered rows instead of static rows. That's because in our in-memory
model such tables have regular rows and no static row. In Cassandra's
schema (since 3.x), those tables have columns which are marked as
static and there are no regular columns.

This worked fine as long as Scylla was writing and reading those
sstables. But when importing sstables from Cassandra, our reader was
skipping the static row, since it's not present in the schema, and
returning no rows as a result. Also, Cassandra and the Scylla tools
would have problems reading those sstables.

Fix this by writing rows for such tables the same way as Cassandra
does. In order to support rolling downgrade, we do that only when all
nodes are upgraded.

Fixes #4139.
2019-03-18 11:18:33 +01:00
Tomasz Grabiec
b0ff68d8d9 sstable: mc: writer: Set _static_row_written inside write_static_row() 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
b68df143a1 sstables: Add sstable::features() 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
cf9721e855 sstables: mc: writer: Prepare write_static_row() for working with any column_kind 2019-03-18 11:18:33 +01:00
Tomasz Grabiec
fefef7b9eb storage_service: Introduce the CORRECT_STATIC_COMPACT feature flag
When enabled on all nodes, sstable writers will start to produce
correct MC-format sstables for compact storage tables by writing rows
into the static row (like C*) rather than into the regular row.

We only do that when all nodes are upgraded to support rolling
downgrade. After all nodes are upgraded, regular rolling downgrade will
not be possible.

Refs #4139
2019-03-18 11:18:33 +01:00
Tomasz Grabiec
52d634025d sstables: mc: writer: Build indexed_columns together with serialization_header
The set of columns in both must match, so it's better to build them
together. Later the logic for choosing columns will become more
complicated, and this patch will allow for avoiding duplication.
2019-03-18 11:18:33 +01:00
Tomasz Grabiec
701ac53b80 sstables: mc: writer: De-optimize make_serialization_header()
So that it's easier to make it use schema_v3 conditionally in later
patches. It's not on the hot path, so it shouldn't matter that we
don't reserve the vectors.
2019-03-18 11:15:18 +01:00
Tomasz Grabiec
8bb8d67a93 sstable: mc: writer: Move attaching of mc-specific components out of generic code 2019-03-18 11:15:18 +01:00
Tomasz Grabiec
b0e6f17a22 Merge "Fix empty remote common_features in check_knows_remote_features" from Asias
Three nodes in the cluster node1, node2, node3

Shutdown the whole cluster

Start node1

Start node2, node2 sees empty remote common_features.

   gossip - Feature check passed.  Local node 127.0.0.2 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is node3 hasn't started yet, node1 sees node3 has empty
features. In get_supported_features(), an empty common feature set will be
returned if a node with an empty feature set is seen. To fix, we should
fall back to using the features saved in the system table.

Start node3, node3 sees empty remote common_features.

   gossip - Feature check passed. Local node 127.0.0.3 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is node3 hasn't inserted its own features into gossip
endpoint_state_map. get_supported_features() returns the common features
of all nodes in endpoint_state_map. To fix, we should fall back to using
the features stored in the system table for such a node in this case.

Fixes #4225
Fixes #4341

* dev asias/fix_check_knows_remote_features.upstream.v4.1:
  gossiper: Remove unused register_feature and unregister_feature
  gossiper: Remove unused wait_for_feature_on_all_node and
    wait_for_feature_on_node
  gossiper: Log feature is enabled only if the feature is not enabled
    previously
  gossiper: Fix empty remote common_features in
    check_knows_remote_features
2019-03-18 10:56:10 +01:00
Asias He
1d59f26c11 gossiper: Fix empty remote common_features in check_knows_remote_features
Three nodes in the cluster node1, node2, node3

Shutdown the whole cluster

Start node1

Start node2, node2 sees empty remote common_features.

   gossip - Feature check passed.  Local node 127.0.0.2 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is node3 hasn't started yet, node1 sees node3 has empty
features. In get_supported_features(), an empty common feature set will be
returned if a node with an empty feature set is seen. To fix, we should
fall back to using the features saved in the system table.

Start node3, node3 sees empty remote common_features.

   gossip - Feature check passed. Local node 127.0.0.3 features =
   {CORRECT_COUNTER_ORDER, CORRECT_NON_COMPOUND_RANGE_TOMBSTONES, COUNTERS,
   DIGEST_MULTIPARTITION_READ, INDEXES, LARGE_PARTITIONS,
   LA_SSTABLE_FORMAT, MATERIALIZED_VIEWS, MC_SSTABLE_FORMAT,
   RANGE_TOMBSTONES, ROLES, ROW_LEVEL_REPAIR, SCHEMA_TABLES_V3,
   STREAM_WITH_RPC_STREAM, WRITE_FAILURE_REPLY, XXHASH},
   Remote common_features = {}

The problem is node3 hasn't inserted its own features into gossip
endpoint_state_map. get_supported_features() returns the common features
of all nodes in endpoint_state_map. To fix, we should fall back to using
the features stored in the system table for such a node in this case.

Fixes #4225
2019-03-18 10:56:10 +01:00
Asias He
acb4badbc3 gossiper: Log feature is enabled only if the feature is not enabled previously
We saw the log "Feature FOO is enabled" more than once, like below. It is
better to log it only when the feature was not enabled previously.

    gossip - InetAddress 127.0.0.1 is now UP, status = NORMAL
    gossip - Feature CORRECT_COUNTER_ORDER is enabled
    gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled
    gossip - Feature COUNTERS is enabled
    gossip - Feature DIGEST_MULTIPARTITION_READ is enabled
    gossip - Feature INDEXES is enabled
    gossip - Feature LARGE_PARTITIONS is enabled
    gossip - Feature LA_SSTABLE_FORMAT is enabled
    gossip - Feature MATERIALIZED_VIEWS is enabled
    gossip - Feature MC_SSTABLE_FORMAT is enabled
    gossip - Feature RANGE_TOMBSTONES is enabled
    gossip - Feature ROLES is enabled
    gossip - Feature ROW_LEVEL_REPAIR is enabled
    gossip - Feature SCHEMA_TABLES_V3 is enabled
    gossip - Feature STREAM_WITH_RPC_STREAM is enabled
    gossip - Feature TRUNCATION_TABLE is enabled
    gossip - Feature WRITE_FAILURE_REPLY is enabled
    gossip - Feature XXHASH is enabled

    gossip - Feature CORRECT_COUNTER_ORDER is enabled
    gossip - Feature CORRECT_NON_COMPOUND_RANGE_TOMBSTONES is enabled
    gossip - Feature COUNTERS is enabled
    gossip - Feature DIGEST_MULTIPARTITION_READ is enabled
    gossip - Feature INDEXES is enabled
    gossip - Feature LARGE_PARTITIONS is enabled
    gossip - Feature LA_SSTABLE_FORMAT is enabled
    gossip - Feature MATERIALIZED_VIEWS is enabled
    gossip - Feature MC_SSTABLE_FORMAT is enabled
    gossip - Feature RANGE_TOMBSTONES is enabled
    gossip - Feature ROLES is enabled
    gossip - Feature ROW_LEVEL_REPAIR is enabled
    gossip - Feature SCHEMA_TABLES_V3 is enabled
    gossip - Feature STREAM_WITH_RPC_STREAM is enabled
    gossip - Feature TRUNCATION_TABLE is enabled
    gossip - Feature WRITE_FAILURE_REPLY is enabled
    gossip - Feature XXHASH is enabled
    gossip - InetAddress 127.0.0.2 is now UP, status = NORMAL
2019-03-18 10:56:10 +01:00
Asias He
f32f08c91e gossiper: Remove unused wait_for_feature_on_all_node and wait_for_feature_on_node
Remove unused check_features helper as well.
2019-03-18 10:56:09 +01:00
Asias He
6dbcb2e0c9 gossiper: Remove unused register_feature and unregister_feature
They are not used any more.
2019-03-18 10:56:09 +01:00
Benny Halevy
ecf88d8e2e compaction: fix sstable_window_size calculation if only unit/size is set
A user who changes the default UNIT from DAYS to HOURS and does not set
compaction_window_size will end up with a window of 24H instead of 1H.

According to the docs https://docs.scylladb.com/getting-started/compaction/#twcs-options
compaction_window_size should default to a value of 1.
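The corrected defaulting rule can be sketched in Python; option names mirror the TWCS docs, while the function name and unit table are illustrative (the real code is C++):

```python
UNIT_SECONDS = {'MINUTES': 60, 'HOURS': 3600, 'DAYS': 86400}

def sstable_window_seconds(options):
    """compaction_window_size defaults to 1 regardless of which
    compaction_window_unit was chosen, so changing only the unit to
    HOURS yields a 1-hour window, not a 24-hour one."""
    unit = options.get('compaction_window_unit', 'DAYS')
    size = int(options.get('compaction_window_size', 1))
    return size * UNIT_SECONDS[unit]
```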

Fixes #4310

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190307131318.13998-1-bhalevy@scylladb.com>
2019-03-18 11:19:18 +02:00
Takuya ASADA
02be95365f reloc/build_rpm.sh: don't use '*' for tar xf argument
It works accidentally, but the '*' is just expanded by bash to match
files in the current directory; it is not interpreted by tar itself.
We need to use the full file name instead.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190312172243.5482-2-syuu@scylladb.com>
2019-03-18 11:09:55 +02:00
Takuya ASADA
5b10b6a0ce reloc/build_reloc.sh: enable DPDK
We get the following link error when running reloc/build_reloc.sh in dbuild,
so we need to enable DPDK on Seastar:

g++: error: /usr/lib64/librte_cfgfile.so: No such file or directory
g++: error: /usr/lib64/librte_cmdline.so: No such file or directory
g++: error: /usr/lib64/librte_ethdev.so: No such file or directory
g++: error: /usr/lib64/librte_hash.so: No such file or directory
g++: error: /usr/lib64/librte_kvargs.so: No such file or directory
g++: error: /usr/lib64/librte_mbuf.so: No such file or directory
g++: error: /usr/lib64/librte_eal.so: No such file or directory
g++: error: /usr/lib64/librte_mempool.so: No such file or directory
g++: error: /usr/lib64/librte_mempool_ring.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_bnxt.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_e1000.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_ena.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_enic.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_fm10k.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_qede.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_i40e.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_ixgbe.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_nfp.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_ring.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_sfc_efx.so: No such file or directory
g++: error: /usr/lib64/librte_pmd_vmxnet3_uio.so: No such file or directory
g++: error: /usr/lib64/librte_ring.so: No such file or directory

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190312172243.5482-1-syuu@scylladb.com>
2019-03-18 11:09:55 +02:00
Piotr Sarna
2e05d86cf3 service: reduce number of spawned threads when notifying
Commit 9c544df217 introduced running up/down/join/leave notifications
in threaded context, but spawned a thread for every notification,
while it could be done once for all notifiees.

Reported-by: Avi Kivity <avi@scylladb.com>
Message-Id: <34815d5aa11902c4a052cff38f4c45c45ff919d8.1552897848.git.sarna@scylladb.com>
2019-03-18 10:45:47 +02:00
Avi Kivity
64fa2dd1d2 Merge "gdb: Introduce 'scylla sstables'" from Tomasz
"
Finds all sstables on current shard and prints useful information,
like on-disk and in-memory usage.

Example:

  (gdb) scylla sstables
  (sstables::sstable*) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096)
  ...
  total (shard-local): count=144, data_file=782839677, in_memory=59774408

Because of the way it finds sstables (bag_sstable_set), it doesn't yet support tables using LeveledCompactionStrategy.
"

* 'gdb-scylla-sstables' of github.com:tgrabiec/scylla:
  gdb: Introduce 'scylla sstables'
  gdb: Introduce find_instances()
  gdb: Extract std_unqiue_ptr.get()
  gdb: Add chunked_vector wrapper
  gdb: Add small_vector wrapper
  gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint()
  gdb: Add wrapper for seastar::lw_shared_ptr
  gdb: Add std_vector.external_memory_footprint()
  gdb: Add wrapper for boost::variant
  gdb: Add wrapper for std::optional
2019-03-17 19:37:44 +02:00
Takuya ASADA
270f9cf9e6 dist/debian: fix installing scyllatop
Since we removed dist/common/bin/scyllatop we are getting a build error
on .deb package build (1bb65a0888).
To fix the error we need to create a symlink for /usr/bin/scyllatop.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190316162105.28855-1-syuu@scylladb.com>
2019-03-17 19:37:44 +02:00
Tomasz Grabiec
05e2c87936 gdb: Introduce 'scylla sstables'
Finds all sstables on current shard and prints useful information,
like on-disk and in-memory usage.

Example:

  (gdb) scylla sstables
  (sstables::sstable*) 0x60100034d200: local=1 data_file=9551, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348600: local=1 data_file=1229, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x601000348000: local=1 data_file=4785, in_memory=266192 (bf=400, summary=3072, sm=262096)
  (sstables::sstable*) 0x60100034c600: local=1 data_file=298, in_memory=266192 (bf=400, summary=3072, sm=262096)
2019-03-15 15:12:48 +01:00
Tomasz Grabiec
929653f51d gdb: Introduce find_instances() 2019-03-15 15:12:48 +01:00
Tomasz Grabiec
fc4952c579 gdb: Extract std_unqiue_ptr.get() 2019-03-15 15:12:48 +01:00
Tomasz Grabiec
e47a5019f2 gdb: Add chunked_vector wrapper 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
a6da71e4da gdb: Add small_vector wrapper 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
0e8589cfdf gdb: Add circular_buffer.size() and circular_buffer.external_memory_footprint() 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
380c6fbdfe gdb: Add wrapper for seastar::lw_shared_ptr 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
93e5e0d644 gdb: Add std_vector.external_memory_footprint() 2019-03-15 15:12:47 +01:00
Tomasz Grabiec
8866b1320a gdb: Add wrapper for boost::variant 2019-03-15 15:12:46 +01:00
Tomasz Grabiec
dd237c32af gdb: Add wrapper for std::optional 2019-03-15 15:12:46 +01:00
Paweł Dziepak
f4f56027bf Merge "Detect partitioner mismatch" from Piotr
"
Refuse to accept SSTables that were created with partitioner
different than the one used by the Scylla server.

Fixes #4331
"

* 'haaawk/4331/v4' of github.com:scylladb/seastar-dev:
  sstables: Add test for sstable::validate_partitioner
  sstables: Add sstable::validate_partitioner and use it
2019-03-15 11:45:10 +00:00
Piotr Jastrzebski
2b0437a147 sstables: Add test for sstable::validate_partitioner
Make sure the exception is thrown when Scylla
tries to load an SSTable created with an incompatible partitioner.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-03-15 10:47:47 +01:00
Piotr Jastrzebski
4aea97f120 sstables: Add sstable::validate_partitioner and use it
The Scylla server can't read sstables that were created
with a different partitioner than the one being used by Scylla.

We should make sure that Scylla identifies such a mismatch
and refuses to use such SSTables.

We can use partitioner information stored in validation metadata
(Statistics.db file) for each SSTable and compare it against
partitioner used by Scylla.

Fixes #4331

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-03-15 10:14:37 +01:00
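The check described in this commit can be sketched roughly as follows. This is a hypothetical illustration, not Scylla's actual sstable::validate_partitioner (which reads the validation metadata out of Statistics.db itself): it compares the partitioner name recorded for the sstable against the one this node runs, and refuses the sstable on mismatch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical sketch: compare the partitioner name recorded in the
// sstable's validation metadata (Statistics.db) against the partitioner
// this node runs, and refuse the sstable on mismatch.
void validate_partitioner(const std::string& sstable_partitioner,
                          const std::string& server_partitioner) {
    if (sstable_partitioner != server_partitioner) {
        throw std::runtime_error(
            "SSTable was created with partitioner " + sstable_partitioner +
            " but this node uses " + server_partitioner);
    }
}
```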
Rafael Ávila de Espíndola
94c28cfb16 sstable: Wait for future returned by maybe_record_large_cells.
A previous version of the patch that introduced these calls had no
limit on how far behind the large data recording could get, and
maybe_record_large_cells returned null.

The final version switched to a semaphore, but unfortunately these
calls were not updated.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190314195856.66387-1-espindola@scylladb.com>
2019-03-14 21:01:37 +01:00
Paweł Dziepak
349601ac32 sstable: pass full length of buffer to vint deserialiser
The vint deserialiser can be more performant if it is allowed to do an
overread (i.e. read more memory than the value it is deserialising).
In the case of sstable reads those vints are usually going to be in the
middle of a much larger buffer, so let's pass the whole length of the
buffer and enable this optimisation.
2019-03-14 13:37:06 +00:00
Paweł Dziepak
552fc0c6b9 vint: optimise deserialisation routine
At the moment, vint deserialisation uses a naive approach, reading
each byte separately. In practice, vints will most often appear
inside larger buffers. That means we can read 8 bytes at a time and then
figure out the unneeded parts and mask them out. This way we avoid a loop
and do fewer memory loads, which are much more expensive than arithmetic
operations (even if they hit the cache).
2019-03-14 13:37:06 +00:00
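The word-at-a-time trick described above can be sketched as below. This is a simplified, hypothetical decoder for a value of known byte length in a plain little-endian encoding, not Scylla's actual vint format (Cassandra-style vints encode their length in the leading bits of the first byte); it only illustrates the single 8-byte load plus mask that replaces the per-byte loop. It assumes the buffer extends at least 8 bytes past `p` (the permitted overread) and a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical simplified decoder: the value's byte length is already
// known, the encoding is plain little-endian, and the buffer is guaranteed
// to extend at least 8 bytes past `p` (the permitted overread). Real
// Scylla vints are encoded differently; this only shows load-and-mask.
uint64_t decode_le_uint(const uint8_t* p, unsigned len) {
    uint64_t word;
    std::memcpy(&word, p, sizeof(word));   // one 8-byte load, no per-byte loop
    if (len < 8) {
        // Mask off the bytes that belong to the next value in the buffer.
        word &= (uint64_t(1) << (len * 8)) - 1;
    }
    return word;
}
```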
Paweł Dziepak
57de2c26b3 vint: drop deserialize_type structure
The deserialisation function returns a structure containing both the value
and its length in the input buffer. In the vast majority of cases
the caller will already know the length, and having this structure makes
it harder for the compiler to emit good code, especially if the
function is not inlined.

In practice I've seen the structure causing register pressure problems
that lead to spilling variables to memory.
2019-03-14 13:37:06 +00:00
Paweł Dziepak
6110278439 tests/vint: reduce test dependencies
The vint serialisation test doesn't need the whole of Scylla, so let's
reduce its dependencies to improve build times.
2019-03-14 13:37:06 +00:00
Paweł Dziepak
54a079cdb5 tests/perf: add performance test for vint serialisation 2019-03-14 13:37:06 +00:00
Piotr Sarna
9c544df217 service: run notifying code in threaded context
In order to allow yielding when handling endpoint lifecycle changes,
notifiers now run in threaded context.
Implementations which used this assumption before are supplemented
with assertions that they indeed run in seastar::async mode.

Fixes #4317
Message-Id: <45bbaf2d25dac314e4f322a91350705fad8b81ed.1552567666.git.sarna@scylladb.com>
2019-03-14 12:56:53 +00:00
Piotr Sarna
a7602bd2f1 database: add global view update stats
Currently view update metrics are only per-table, but per-table metrics
are not always enabled. In order to be able to see the number of
generated view updates in all cases, global stats are added.

Fixes #4221
Message-Id: <e94c27c530b2d7d262f76d03937e7874d674870a.1552552016.git.sarna@scylladb.com>
2019-03-14 12:04:18 +00:00
Paweł Dziepak
d4d2eb2ed5 Update seastar submodule
* seastar e640314...463d24e (3):
  > Merge 'Handle IOV_MAX limit in posix_file_impl' from Paweł
  > core: remove unneeded 'exceptional future ignored' report
  > tests/perf: support multiple iterations in a single test run
2019-03-13 14:24:58 +00:00
Tomasz Grabiec
2ef9d9c12e Merge "Record large cells to system.large_cells" from Rafael
Issue #4234 asks for a large collection detector. Discussing the issue,
Benny pointed out that it is probably better to have a generic large
cell detector, as it makes a natural progression from what we already
warn on (large partitions and large rows).

This patch series implements that. It is on top of
shutdown-order-patches-v7 which is currently on next.

With the changes to use a semaphore this patch series might be getting
a bit big. Let me know if I should split it.

* https://github.com/espindola/scylla espindola/large-cells-on-top-of-shutdown-v5:
  db: refactor large data deletion code
  db: Rename (maybe_)?update_large_partitions
  db: refactor a try_record helper
  large_data_handler: assert it is not used after stop()
  db: don't use _stopped directly
  sstables: delete dead error handling code.
  large_data_handler: Remove const from a few functions
  large_data_handler: propagate a future out of stop()
  large_data_handler: Run large data recording in parallel
  Create a system.large_cells table
  db: Record large cells
  Add a test for large cells
2019-03-13 09:44:57 +01:00
Rafael Ávila de Espíndola
f983570ac8 Add a test for large cells
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
63251b66c1 db: Record large cells
Fixes #4234.

Large cells are now recorded in system.large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
d17083b483 Create a system.large_cells table
This is analogous to the system.large_rows table, but holds individual
cells, so it also needs the column name.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
8b4ae95168 large_data_handler: Run large data recording in parallel
With these changes the futures returned by large_data_handler will not
normally wait for entries to be written to system.large_rows or
system.large_partitions.

We use a semaphore to bound how far behind the system.large_* table
updates can get.

This should avoid delaying sstable writes in the common case, which
is more relevant once we warn of large cells, since the default
threshold will be just 1MB.

Note that there is no ordering between the various maybe_record_* and
maybe_delete_large_data_entries requests. This means that we can end
up with a stale entry that is only removed once the TTL expires.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
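The bounding scheme described above can be modeled with a small single-threaded sketch. In Scylla this is a seastar::semaphore and futures; here, hypothetically, free units count how many system.large_* writes may be pending, and a caller that finds no free unit must drain a pending write before queueing its own.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Hypothetical single-threaded model of bounding background large_* writes.
// `units` plays the role of the semaphore: each pending write holds one unit.
struct bounded_recorder {
    unsigned units;                               // free "semaphore" units
    std::queue<std::function<void()>> pending;    // writes not yet applied

    explicit bounded_recorder(unsigned max_pending) : units(max_pending) {}

    // Queue a background write. Returns true if it could be queued without
    // waiting; false if the backlog bound was hit and the caller had to
    // drain one pending write first (the "wait on the semaphore" case).
    bool maybe_record(std::function<void()> write) {
        bool had_unit = units > 0;
        if (!had_unit) {
            drain_one();
        }
        --units;
        pending.push(std::move(write));
        return had_unit;
    }

    // Apply the oldest pending write and free its unit.
    void drain_one() {
        pending.front()();
        pending.pop();
        ++units;
    }
};
```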
Rafael Ávila de Espíndola
54b856e5e4 large_data_handler: propagate a future out of stop()
stop() will close a semaphore in a followup patch, so it needs to return a
future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
989ab33507 large_data_handler: Remove const from a few functions
These will use a member semaphore variable in a followup patch, so they
cannot be const.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
0b763ec19b sstables: delete dead error handling code.
maybe_delete_large_data_entries handles exceptions internally, so the
code this patch deletes would never run.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
5fcb3ff2d7 db: don't use _stopped directly
This gives flexibility in how it is implemented.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
a17a936882 large_data_handler: assert it is not used after stop()
This should have been changed in the patch

db: stop the commit log after the tables during shutdown

But unfortunately I missed it then.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:04 -07:00
Rafael Ávila de Espíndola
f3089bf3d1 db: refactor a try_record helper
We had almost identical error handling for large_partitions and
large_rows. Refactor in preparation for large_cells.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:19:02 -07:00
Rafael Ávila de Espíndola
d7f263d334 db: Rename (maybe_)?update_large_partitions
This renames it to record_large_partitions, which matches
record_large_rows. It also changes the signature to be closer to
record_large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Rafael Ávila de Espíndola
f254664fe6 db: refactor large data deletion code
The code for deleting entries from system.large_partitions was almost
a duplicate of the code for deleting entries from system.large_rows.

This patch unifies the two, which also improves the error message when
we fail to delete entries from system.large_partitions.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-12 13:16:04 -07:00
Asias He
b8158dd65d streaming: Get rid of the keep alive timer in streaming
There is no guarantee that rpc streaming makes progress in some time
period. Remove the keep alive timer in streaming to avoid killing the
session when the rpc streaming is just slow.

The keep alive timer is used to close the session in the following case:

n2 (the rpc streaming sender) streams to n1 (the rpc streaming receiver)
kill -9 n2

We need this because we do not kill the session when gossip thinks a
node is down: the node being down might only be temporary, and it is a
waste to drop the work that has already been done, especially when the
stream session takes a long time.

Since in range_streamer we do not stream all data in a single stream
session (we stream 10% of the data at a time, and we have retry logic),
I think it is fine to kill a stream session when gossip thinks a node is
down. This patch changes to close all stream sessions with a node that
gossip thinks is down.
Message-Id: <bdbb9486a533eee25fcaf4a23a946629ba946537.1551773823.git.asias@scylladb.com>
2019-03-12 12:20:28 +01:00
Duarte Nunes
2718c90448 Merge 'Add canceling long-standing view update requests' from Piotr
"
This series allows canceling view update requests when a node is
discovered DOWN. View updates are sent in the background with a long
timeout (5 minutes), and in case we discover that the node is
unavailable, there's no point in waiting that long for the request
to finish. What's more, waiting for these requests occurs on shutdown,
which may result in waiting 5 minutes until Scylla properly shuts down,
which is bad for both users and dtests.

This series implements storage_proxy as a lifecycle subscriber,
so it can react to membership changes. It also keeps track of all
"interruptible" writes per endpoint, so once a node is detected as DOWN,
an artificial timeout can be triggered for all aforementioned write
requests.

Fixes #3826
Fixes #3966
Fixes #4028
"

* 'write_hints_for_view_updates_on_shutdown_4' of https://github.com/psarna/scylla:
  service: remove unused stop_hints_manager
  storage_proxy: add drain_on_shutdown implementation
  main: register storage proxy as lifecycle subscriber
  storage_proxy: add endpoint_lifecycle_subscriber interface
  storage_proxy: register view update handlers for view write type
  storage_proxy: add intrusive list of view write handlers
  storage_proxy: add view_update_write_response_handler
2019-03-08 13:34:46 -03:00
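The tracking described in this series can be sketched as below. Names and types are illustrative (Scylla links the handlers intrusively and uses real timeout machinery): the proxy keeps every outstanding view-update write handler on a list, and the lifecycle callback for a DOWN endpoint triggers an artificial timeout on each handler waiting on that endpoint.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical handler for an in-flight view-update write.
struct view_write_handler {
    std::string endpoint;     // replica this write is waiting on
    bool timed_out = false;
    void trigger_timeout() { timed_out = true; }
};

// Sketch of storage_proxy's bookkeeping. Scylla links handlers intrusively;
// a plain list of pointers is enough to show the idea.
struct storage_proxy_sketch {
    std::list<view_write_handler*> view_writes;

    void register_handler(view_write_handler* h) { view_writes.push_back(h); }

    // endpoint_lifecycle_subscriber-style callback: fail fast instead of
    // waiting out the 5-minute write timeout.
    void on_down(const std::string& endpoint) {
        for (auto* h : view_writes) {
            if (h->endpoint == endpoint) {
                h->trigger_timeout();
            }
        }
    }
};
```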
Piotr Sarna
ae52b3baa7 tests: fix complex timestamp test flakiness
Complex timestamp tests were ported from dtest and contained a potential
race - rows were updated with TTL 1 and then checked if the row exists
in both base and view replicas in an eventually() loop.
During this loop however, TTL of 1 second might have already passed
and the row could have been deleted from base.
This patch changes the mentioned TTL to 30 seconds, making the tests
extremely unlikely to be flaky.
Message-Id: <6b43fe31850babeaa43465eb771c0af45ee6e80d.1552041571.git.sarna@scylladb.com>
2019-03-08 13:34:27 -03:00
Tomasz Grabiec
eb5506275b Merge "Further enhancements to perf_fast_forward" from Paweł
This series contains several improvements to perf_fast_forward that
either address some of the problems seen in the automated runs or help
understanding the results.

The main problem was that the small-partition-slicing test had a
preparation stage disproportionately long compared to the actual testing
phase. While the fragments-per-second result wasn't affected by that, it
restricted the number of iterations of the test that we were able to run,
and the test, whose single iteration is short (and more prone to noise),
was executed only four times. This was solved by sharing the preparation
stage across all iterations, thus enabling the test to be run many times
and improving the stability of the results.

Another improvement is the ability to dump all test results and process
them to produce histograms. This allows us to see what the distribution
of a particular statistic looks like and whether there are any
complications.

Refs #4278.

* https://github.com/pdziepak/scylla.git more-perf_fast_forward/v1:
  tests/perf_fast_forward: print number of iterations of each test
  tests/perf_fast_forward: reuse keys in small partition slicing test
  tests/perf_fast_forward: extract json result file writing logic
  tests/perf_fast_forward: add an option to dump all results
  tests/perf_fast_forward: add script for analysing full results
2019-03-07 12:22:13 -03:00
Piotr Sarna
aea4b7ea78 service: remove unused stop_hints_manager
Stopping hints manager now occurs when draining storage proxy
and it shouldn't be executed independently, so it's removed
from external API.
2019-03-07 13:44:06 +01:00
Piotr Sarna
cc806909d7 storage_proxy: add drain_on_shutdown implementation
When storage proxy is shutting down, all interruptible writes
can be timed out in order not to wait for them. Instead, the mechanism
will fall back to storing hints and/or not progressing with view
building.
2019-03-07 13:44:05 +01:00
Piotr Sarna
c61d0ee8aa main: register storage proxy as lifecycle subscriber
In order to be able to act when node joins/leaves, storage proxy
is registered as an endpoint lifecycle subscriber.

Fixes #3826
Fixes #4028
2019-03-07 12:10:40 +01:00
Piotr Sarna
92df1d5a6b storage_proxy: add endpoint_lifecycle_subscriber interface
Storage proxy is able to react to membership changes
in order to cancel long-standing operations for an endpoint.
2019-03-07 12:10:40 +01:00
Piotr Sarna
f9ff97511f storage_proxy: register view update handlers for view write type
View update handlers have a specialized class, so all writes
of type write_type::VIEW are now registered as such.
2019-03-07 12:10:40 +01:00
Piotr Sarna
75ec5fa876 storage_proxy: add intrusive list of view write handlers
In order to be able to iterate over view update write response handlers,
an intrusive list of them is added to storage proxy. This way
iteration can easily yield without invalidating iterators, and all
logic is moved to the slow path.
2019-03-07 12:10:40 +01:00
Piotr Sarna
c2048a0758 storage_proxy: add view_update_write_response_handler
View update write response handler inherits from a regular write
response handler, but it's also possible to link it intrusively
in order to be able to induce timeouts on them later.
2019-03-07 12:10:40 +01:00
Paweł Dziepak
0ba7a3c55a tests/perf_fast_forward: add script for analysing full results
perf_fast_forward with flag --dump-all-results reports the results of
every test iteration that was executed. This patch introduces a python
script that can analyse those results (in json format) and present them
in a more human-friendly way.

For now, the only option is to plot histograms of selected statistics.
2019-03-06 15:48:49 +00:00
Paweł Dziepak
4220b90b22 tests/perf_fast_forward: add an option to dump all results
perf_fast_forward runs each test case multiple times and reports a
summary of those results (median, min, max, and median absolute
deviation).

While very convenient, the summary may hide some important information
(e.g. the distribution of the results). This patch adds an option to
report the results of every single executed iteration.
2019-03-06 15:48:48 +00:00
Paweł Dziepak
55ed8b2472 tests/perf_fast_forward: extract json result file writing logic
We are about to report, depending on flags, both full results as well as
the results summary written now. Most of the logic is going to be
identical.
2019-03-06 15:48:45 +00:00
Paweł Dziepak
daafde21c5 tests/perf_fast_forward: reuse keys in small partition slicing test 2019-03-06 15:48:42 +00:00
Paweł Dziepak
0eb1e570aa tests/perf_fast_forward: print number of iterations of each test 2019-03-06 15:48:38 +00:00
Avi Kivity
0beeb2f721 Merge "implement upgradesstables + scrub" from Calle
"
Fixes #4245

Breaks up "perform_cleanup" in parameterized "rewrite_sstables"
and implements upgrade + scrub in terms of this.

Both run as a "regular" compaction, but ignore the normal criteria
for compaction and select obsolete/all tables.
We also ensure all previous compactions are done so we can guarantee
all tables are rewritten post invocation of command.
"

* 'calle/upgrade_sstables' of github.com:scylladb/seastar-dev:
  api::storage_service: Implement "scrub"
  api/storage_service: Implement "upgradesstables"
  api::storage_service: Add keyspace + tables helper
  compaction_manager: Add perform_sstable_scrub
  compaction_manager: Add perform_sstable_upgrade
  compaction_manager: break out rewrite_sstables from cleanup
  table: parameterize cleanup_sstables
2019-03-06 15:47:26 +02:00
Duarte Nunes
a29ec4be76 Merge 'Update system.large_partitions during shutdown' from Rafael
"
Currently any large partitions found during shutdown are not
recorded. The reason is that the database commit log is already off,
so there is nowhere to record it to.

One possible solution is to have an independent system database. With
that the regular db is shutdown first and writes can continue to the
system db.

That is a pretty big change. It would also not allow us to record
large partitions in any system tables.

This patch series instead tries to stop the commit log later. With
that any large partitions are recorded to the log and moved to a
sstable on the next startup.
"

* 'espindola/shutdown-order-patches-v7' of https://github.com/espindola/scylla:
  db: stop the commit log after the tables during shutdown
  db: stop the compaction manager earlier
  db: Add a stop_database helper
  db: Don't record large partitions in system tables
2019-03-06 10:36:38 -03:00
Calle Wilund
ef1bdebd0a api::storage_service: Implement "scrub" 2019-03-06 13:13:21 +00:00
Calle Wilund
23f4c982ea api/storage_service: Implement "upgradesstables"
Fixes #4245

Implemented as a compaction barrier (forcing previous compactions to
finish) + parameterized "cleanup", with the sstable list based on
parameters.
2019-03-06 13:13:21 +00:00
Calle Wilund
3b5588dddd api::storage_service: Add keyspace + tables helper
To avoid repeating code to get keyspace + tables
2019-03-06 13:13:21 +00:00
Calle Wilund
c0bb6a4bef compaction_manager: Add perform_sstable_scrub
Suspiciously similar to an unconditional upgrade
2019-03-06 13:13:21 +00:00
Calle Wilund
7585b8c310 compaction_manager: Add perform_sstable_upgrade
Rewrites obsolete/all sstables via compaction
2019-03-06 13:13:21 +00:00
Tomasz Grabiec
889f31fabe Merge "fix slow truncation under flush pressure" from Glauber
Truncating a table is very slow if the system is under pressure. Because
in that case we mostly just want to get rid of the existing data, it
shouldn't take this long. The problem happens because truncate has to
wait for memtable flushes to end, twice. This is regardless of whether
or not the table being truncated has any data.

1. The first time is when we call truncate itself:

if auto_snapshot is enabled, we will flush the contents of this table
first and we are expected to be slow. However, even if auto_snapshot is
disabled we will still do it -- which is a bug -- if the table is marked
as durable. We should just not flush in this case and it is a silly bug.

2. The second time is when we call cf->stop(). Stopping a table will
wait for a flush to finish. At this point, regardless of which path
(Durable or non-durable) we took in the previous step we will have no
more data in the table. However, calling `flush()` still needs to acquire
a flush_permit, which means we will wait for whichever memtable is
flushing at that very moment to end.

If the system is under pressure and a memtable flush takes many
seconds, so will truncate. Even if auto_snapshots are enabled, we
shouldn't have to flush twice. The first flush should already put us in
a state in which the next one is immediate (maybe holding on to the
permit, maybe destroying the memtable_list already at that point,
since no other memtables should be created).

If auto_snapshots are not enabled, the whole thing should just be
instantaneous.

This patchset fixes that by removing the flush need when !auto_snapshot,
and special casing the flush of an empty table.

Fixes #4294

* git@github.com:glommer/scylla.git slowtruncate-v2:
  database: immediately flush tables with no memtables.
  truncate: do not flush memtables if auto_snapshot is false.
2019-03-06 13:54:58 +01:00
Eliran Sinvani
479131259e auth: prevent failure due to race in tables creation
This commit rewrites the logic of table creation at startup of the auth
mechanism to be race proof. This is done by simply ignoring the
already_exists exception as done in system_distributed_keyspace.
The old creation logic, tested for existance of the column family and
right after called announce_new_column_family with the newly
created table schema. The problem was that it does not prevent
a race since the announcement itself is a fiber and the created table
can still be gossiped from another node, causing the announce
function to throw an already_exists exception that in turn crashes
scylla.
Message-Id: <20190306075016.28131-1-eliransin@scylladb.com>
2019-03-06 13:09:09 +01:00
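The race-proof creation logic can be sketched as below; the exception type and helper name are hypothetical stand-ins for Scylla's actual schema-announcement API. The point is that creation is attempted unconditionally and a concurrent creation (gossiped from another node) is treated as success rather than a crash.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the "table already exists" error thrown by the
// schema announcement when another node created (and gossiped) the table.
struct already_exists : std::runtime_error {
    already_exists() : std::runtime_error("table already exists") {}
};

// Attempt the creation unconditionally and swallow already_exists:
// whether we created the table or another node did, the end state is
// the same, so neither outcome is an error.
template <typename Announce>
bool create_table_if_missing(Announce&& announce_new_column_family) {
    try {
        announce_new_column_family();
        return true;    // we created it
    } catch (const already_exists&) {
        return false;   // a concurrent creation won the race; that's fine
    }
}
```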
Rafael Ávila de Espíndola
16ed9a2574 db: stop the commit log after the tables during shutdown
This allows for system.large_partitions to be updated if a large
partition is found while writing the last sstables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola
a3e1f14134 db: stop the compaction manager earlier
We want to finish all large data logging in stop_system, so stopping
the compaction manager should be the first thing stop_system does.

The make_ready_future<>() will be removed in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:51 -08:00
Rafael Ávila de Espíndola
765d8535f1 db: Add a stop_database helper
This reduces code duplication. A followup patch will add more code to
stop_database.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 18:04:45 -08:00
Rafael Ávila de Espíndola
0b86a99592 db: Don't record large partitions in system tables
This will allow us to delay shutdown of all system tables in a uniform
way.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-03-05 17:52:00 -08:00
Tomasz Grabiec
c584f48c32 Merge "transport: sort bound ranges in read requests in order to conform to cql definitions" from Eliran
According to the cql definitions, if no ORDER BY clause is present,
records should be returned ordered by the clustering keys. Since the
backend returns the ranges according to their order of appearance
in the request, the bounds should be sorted before being sent to the
backend. This kind of sorting is needed in queries that generate more
than one bound to be read; examples of such queries are:
1. a SELECT query with an IN clause.
2. a SELECT query on a mixed order tuple of columns (see #2050).
The assumption this commit makes is the correctness of the bounds
list, that is, the bounds are non-overlapping. If this wasn't true, multiple
occurrences of the same record could have been returned for certain queries.

Tests:
1. Unit tests release
2. All dtest that requires #2050 and #2029

Fixes #2029
2019-03-05 21:07:15 +01:00
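The sorting step described in this merge can be sketched as below, with a bound modeled as a simple [start, end] pair for illustration. Because the bounds are assumed non-overlapping, ordering them by their lower end is enough for the backend to return rows in clustering order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of a clustering bound as a closed [start, end] range.
struct bound {
    int start;
    int end;
};

// Sort the bounds by their lower end before sending them to the backend.
// This is only correct because the bounds are assumed non-overlapping;
// overlapping bounds could return the same record more than once.
void sort_bounds(std::vector<bound>& bounds) {
    std::sort(bounds.begin(), bounds.end(),
              [](const bound& a, const bound& b) { return a.start < b.start; });
}
```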
Avi Kivity
3cfbd682ec Merge "Add JSON support to tuples and UDT" from Piotr
"
Fixes #3708

This series adds JSON serialization and deserialization procedures
to tuples and user defined types.

Tests: unit (dev)
"

* 'add_tuple_and_udt_json_support_2' of https://github.com/psarna/scylla:
  tests: add test cases for JSON and UDT
  types: add JSON support to UDT
  tests: add JSON tuple tests
  types: add JSON support for tuples
2019-03-05 20:06:15 +02:00
Glauber Costa
c2c6c71398 truncate: do not flush memtables if auto_snapshot is false.
Right now we flush memtables if the table is durable (which in practice
it almost always is).

We are truncating, so we don't want the data. We should only flush if
auto_snapshot is true.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-03-05 11:22:48 -05:00
Glauber Costa
ed8261a0fe database: immediately flush tables with no memtables.
If a table has no data, it may still take a long time to flush. This is
because before we even try to flush, we need to acquire a permit, and
that can take a while if there is a long-running flush already queued.

We can special case the situation in which there is no data in any of
the memtables owned by table and return immediately.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-03-05 11:22:48 -05:00
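The special case described above can be sketched as below (hypothetical types; the real code checks the table's memtable_list and flush permit). If every memtable is empty, the flush returns immediately instead of queueing behind an unrelated long-running flush.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical memtable: only its occupancy matters here.
struct memtable {
    std::size_t occupancy;
    bool empty() const { return occupancy == 0; }
};

// Sketch of the short-circuit: if every memtable is empty there is nothing
// to write, so return immediately instead of acquiring the flush permit
// (which may be held by an unrelated, long-running flush).
// Returns true when the flush completed via the fast path.
bool flush(const std::vector<memtable>& memtables) {
    for (const auto& m : memtables) {
        if (!m.empty()) {
            // ... acquire flush permit and write an sstable (elided) ...
            return false;
        }
    }
    return true;    // no data anywhere: flush is a no-op
}
```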
Piotr Sarna
a5c66d5ce1 tests: add test cases for JSON and UDT 2019-03-05 16:25:18 +01:00
Piotr Sarna
ebf0eb92bb types: add JSON support to UDT
User defined types can now be serialized to and deserialized from JSON.

Fixes #3708
2019-03-05 16:08:05 +01:00
Piotr Sarna
c2064d152d tests: add JSON tuple tests 2019-03-05 16:08:05 +01:00
Piotr Sarna
aa0cc8a8a2 types: add JSON support for tuples
Tuples can now be serialized to and deserialized from JSON.

Refs #3708
2019-03-05 16:08:04 +01:00
Piotr Sarna
e9bc2a7912 cql3: fix error message for lack of primary keys in JSON
When any primary key part is not present in INSERT JSON statement,
proper error message will be presented to the client.

Tests: unit (dev) 
Message-Id: <3aa99703523c45056396a0b6d97091da30206dab.1551797502.git.sarna@scylladb.com>
2019-03-05 16:54:46 +02:00
Avi Kivity
256b7d34e2 Update seastar submodule
* seastar ab54765...e640314 (10):
  > net: enable IP_BIND_ADDRESS_NO_PORT before binding a socket during connection
  > core: show address in error message for posix_listen failures
  > fmt: remove submodule
  > tests: fix loopback socket close() to not fail when the peer's side is already closed
  > Merge "Add suffixes to target names" from Jesse
  > temporary_buffer: improve documentation for alignment param requirements
  > docs: Fix dependencies for split tutorial target
  > deleter: prevent early memory free caused by deleter append.
  > doc/tutorial.md: introduce memory allocation foreign_ptr
  > Fix CLI help message (network & DPDK options)

Toolchain and configure.py updated for fmt submodule removal.
2019-03-05 15:51:38 +02:00
Botond Dénes
817490cda1 tests/multishard_mutation_query_test: fuzzy_test: replace BOOST_WARN_* with logger::debug()
fuzzy_test performs some checks that are expected to fail and whose
failure does not influence the outcome of the test. For this it uses the
`BOOST_WARN_*` family of macros. These just log a warning when their
predicate fails. This can however confuse someone looking at the logs
trying to determine the cause of a failure. Since these checks are
performed primarily to provide an aid in debugging failures, replace them
with a conditional debug-level log message.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f550a9d9ab1b5b4aeb4f81860cbd3d924fc86898.1551792035.git.bdenes@scylladb.com>
2019-03-05 15:24:53 +02:00
Botond Dénes
0ed0d3297a tests/multishard_mutation_query_test: test_abandoned_read: reduce querier TTL
The `test_abandoned_read` test verifies that an abandoned read does a
proper cleanup. One of the things checked is that after the querier TTL
expires, the saved queriers are cleaned up. This check however had
very tight timing. The TTL was 2s and the test waited 2s before it did
the check, which is wrapped in an `eventually_true()` (max +1s).
The TTL timer scans the queriers with a period of TTL/2, so a querier
can live for 1.5*TTL. This means that the 2s + 1s wait time is right on
the limit and with some bad luck (and a slow machine) it can fail.
Reduce the TTL in this test to 1s to relax the dependence on timing.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <ed0d45b5a07960b83b391d289cade9b9f60c7785.1551787638.git.bdenes@scylladb.com>
2019-03-05 14:10:04 +02:00
Eliran Sinvani
eeb0845be0 unit test: validate order instead of just content in the mixed order token test
This change amends the functionality of the result generation:
it changes the behaviour to return the expected results vector sorted
in the expected order of appearance in the result set. The
result set is then validated for both content and order.

Tests: unit tests (Release)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2019-03-05 13:51:17 +02:00
Eliran Sinvani
13284d9272 unit test: change IN clause tests to validate with ordering_spec
Whenever a query with an IN clause on clustering keys is executed,
assuming only one partition, the rows are ordered according to the
clustering keys. This commit adds order validation to the content
validation whenever possible (which means removing the
ignore-order part).

Tests: unit tests (Release)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2019-03-05 13:51:17 +02:00
Eliran Sinvani
7df0c873aa transport: sort bound ranges in read requests in order to conform to cql definitions
According to the cql definitions, if no ORDER BY clause is present,
records should be returned ordered by the clustering keys. Since the
backend returns the ranges according to their order of appearance
in the request, the bounds should be sorted before being sent to the
backend. This kind of sorting is needed in queries that generate more
than one bound to be read; examples of such queries are:
1. a SELECT query with an IN clause.
2. a SELECT query on a mixed order tuple of columns (see #2050).
The assumption this commit makes is the correctness of the bounds
list, that is, the bounds are non-overlapping. If this wasn't true, multiple
occurrences of the same record could have been returned for certain queries.

Tests:
1. Unit tests release
2. All dtest that requires #2050 and #2029

Fixes #2029

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2019-03-05 13:51:17 +02:00
Avi Kivity
5993c05a1b Merge "partitioner: Futurize split_range_to_single_shard" from Asias
"
Futurize split_range_to_single_shard to fix reactor stall.

Fixes: #3846
"

* tag 'asias/split_range_to_single_shard/v4' of github.com:scylladb/seastar-dev:
  partitioner: Futurize split_range_to_single_shard
  tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc
2019-03-05 11:25:36 +02:00
Asias He
58fae5f4c1 partitioner: Futurize split_range_to_single_shard
We saw reactor stalls when closing SSTables. The backtrace looks like:

Oct 12 19:00:51 dell-1 scylla[435045]: Backtrace:[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /home/sylla/scylla/seastar/util/backtrace.hh:56
seastar::backtrace_buffer::append_backtrace() at /home/sylla/scylla/seastar/core/reactor.cc:410
 (inlined by) print_with_backtrace at /home/sylla/scylla/seastar/core/reactor.cc:431
seastar::reactor::block_notifier(int) at /home/sylla/scylla/seastar/core/reactor.cc:749
_L_unlock_13 at funlockfile.c:?
std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>::_Optional_base(std::experimental::fundamentals_v1::_Optional_base<range_bound<dht::ring_position>, true>&&) at /opt/scylladb/include/c++/7/experimental/optional:247
 (inlined by) std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >::optional(std::experimental::fundamentals_v1::optional<range_bound<dht::ring_position> >&&) at /opt/scylladb/include/c++/7/experimental/optional:493
 (inlined by) wrapping_range<dht::ring_position>::wrapping_range(wrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:61
 (inlined by) nonwrapping_range<dht::ring_position>::nonwrapping_range(nonwrapping_range<dht::ring_position>&&) at /home/sylla/scylla/./range.hh:430
 (inlined by) void __gnu_cxx::new_allocator<nonwrapping_range<dht::ring_position> >::construct<nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>*, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/ext/new_allocator.h:136
 (inlined by) void std::allocator_traits<std::allocator<nonwrapping_range<dht::ring_position> > >::construct<nonwrapping_range<dht::ring_position>, nonwrapping_range<dht::ring_position> >(std::allocator<nonwrapping_range<dht::ring_position> >&, nonwrapping_range<dht::ring_position>*, nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/alloc_traits.h:475
 (inlined by) nonwrapping_range<dht::ring_position>& std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::emplace_back<nonwrapping_range<dht::ring_position> >(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/deque.tcc:167
 (inlined by) std::deque<nonwrapping_range<dht::ring_position>, std::allocator<nonwrapping_range<dht::ring_position> > >::push_back(nonwrapping_range<dht::ring_position>&&) at /opt/scylladb/include/c++/7/bits/stl_deque.h:1558
 (inlined by) dht::split_range_to_single_shard(dht::i_partitioner const&, schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:454
dht::split_range_to_single_shard(schema const&, nonwrapping_range<dht::ring_position> const&, unsigned int) at /home/sylla/scylla/dht/i_partitioner.cc:464
create_sharding_metadata at /home/sylla/scylla/sstables/sstables.cc:2075
 (inlined by) sstables::sstable::write_scylla_metadata(seastar::io_priority_class const&, unsigned int, sstables::sstable_enabled_features) at /home/sylla/scylla/sstables/sstables.cc:2435
sstables::sstable_writer_m::consume_end_of_stream() at /home/sylla/scylla/sstables/sstables.cc:3483
sstables::compaction::finish_new_sstable(std::experimental::fundamentals_v1::optional<sstables::sstable_writer>&, seastar::lw_shared_ptr<sstables::sstable>&) at /home/sylla/scylla/sstables/compaction.cc:338
 (inlined by) sstables::regular_compaction::stop_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:579
 (inlined by) sstables::regular_compaction::finish_sstable_writer() at /home/sylla/scylla/sstables/compaction.cc:585
sstables::compacting_sstable_writer::consume_end_of_stream() at /home/sylla/scylla/sstables/compaction.cc:494
 (inlined by) auto compact_mutation_state<(emit_only_live_rows)0, (compact_for_sstables)1>::consume_end_of_stream<sstables::compacting_sstable_writer>(sstables::compacting_sstable_writer&) at /home/sylla/scylla/./mutation_compactor.hh:292
 (inlined by) compact_mutation<(emit_only_live_rows)0, (compact_for_sstables)1, sstables::compacting_sstable_writer>::consume_end_of_stream() at /home/sylla/scylla/./mutation_compactor.hh:397
 (inlined by) stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >::consume_end_of_stream() at /home/sylla/scylla/./mutation_reader.hh:366
 (inlined by) auto flat_mutation_reader::impl::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:288
 (inlined by) auto flat_mutation_reader::consume_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)> >(stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (dht::decorated_key const&)>, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >) at /home/sylla/scylla/./flat_mutation_reader.hh:370
 (inlined by) operator() at /home/sylla/scylla/sstables/compaction.cc:757
 (inlined by) apply at /home/sylla/scylla/seastar/core/apply.hh:35
 (inlined by) apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/apply.hh:43
 (inlined by) apply<sstables::compaction::run(std::unique_ptr<sstables::compaction>)::<lambda()> > at /home/sylla/scylla/seastar/core/future.hh:1309
 (inlined by) operator() at /home/sylla/scylla/./seastar/core/thread.hh:315
 (inlined by) _M_invoke at /opt/scylladb/include/c++/7/bits/std_function.h:316
std::function<void ()>::operator()() const at /opt/scylladb/include/c++/7/bits/std_function.h:706
 (inlined by) seastar::thread_context::main() at /home/sylla/scylla/seastar/core/thread.cc:313

The call chain is:

sstable_writer_k_l::consume_end_of_stream and mc::writer::consume_end_of_stream
-> sstable::write_scylla_metadata -> create_sharding_metadata -> dht::split_range_to_single_shard

Since the sstable writer assumes a thread context, we can futurize dht::split_range_to_single_shard.

Fixes: #3846
Tests: dtest + build/dev/tests/partitioner_test
2019-03-05 17:21:27 +08:00
Benny Halevy
1021eb29c9 distributed_loader: fix old format counters exception
table::load_sstable: fix missing arg in old format counters exception

Properly catch and log the exception in load_new_sstables.
Abort when the exception is caught to keep current behavior.

Seen with migration_test:TestMigration_with_2_1_x.migrate_sstable_with_counter_test
without enable_dangerous_direct_import_of_cassandra_counters.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190301091235.2914-1-bhalevy@scylladb.com>
2019-03-04 17:36:09 +01:00
Avi Kivity
026821fb59 Merge "Record large rows in the system.large_rows table" from Rafael
"
This fixes #3988.

We already have a system.large_partitions, but only a warning for
large rows. These patches close the gap by also recording large rows
into a new system.large_rows.
"

* 'espindola/large-row-add-table-v6' of https://github.com/espindola/scylla:
  Add a testcase for large rows
  Populate system.large_rows.
  Create a system.large_rows table
  Extract a key_to_str helper
  Don't call record_large_rows if stopped
  Add a delete_large_rows_entries method to large_data_handler
  db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
  Rename maybe_delete_large_partitions_entry
  Rename log_large_row to record_large_rows
  Rename maybe_log_large_row to maybe_record_large_rows
2019-03-04 18:31:10 +02:00
Avi Kivity
da0a25859b Merge "Improvements to commitlog logs" from Paweł
"
This series contains minor improvements to commitlog log messages that
have helped investigating #4231, but are not specific to that bug.
"

* tag 'improve-commitlog-logs/v1' of https://github.com/pdziepak/scylla:
  commitlog: use consistent chunk offsets in logs
  commitlog: provide more information in logs
  commitlog: remove unnecessary comment
2019-03-04 14:52:46 +02:00
Paweł Dziepak
00b33de25c commitlog: use consistent chunk offsets in logs
Logs in commitlog writer use offset in the file of the chunk header to
identify chunks. However, the replayer is using offset after the header
for the same purpose. This causes unnecessary confusion, suggesting that
the replayer is reading at the wrong position.

This patch changes the replayer so that it reports chunk header offsets.
2019-03-04 12:15:50 +00:00
Paweł Dziepak
813b00a1a6 commitlog: provide more information in logs
This commit adds some more information to the logs. Motivated by
experience investigating #4231.

 * size of each write
 * position of each write
 * log message for final write
2019-03-04 12:15:50 +00:00
Paweł Dziepak
1a657e9c5f commitlog: remove unnecessary comment 2019-03-04 12:15:50 +00:00
Avi Kivity
d95dec22d9 Merge "Fix commitlog chunks overwriting each other" from Paweł
"
This series fixes a problem in the commitlog cycle() function that
confused in-memory and on-disk size of chunks it wrote to disk. The
former was used to decide how much data needs to be actually written,
and the latter was used to compute the offset of the next chunk. If two
chunk writes happened concurrently, the one positioned earlier in
the file could corrupt the header of the next one.

Fixes #4231.

Tests: unit(dev), dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup,test_commitlog_replay_with_alter_table)
"

* tag 'fix-commitlog-cycle/v1' of https://github.com/pdziepak/scylla:
  commitlog: write the correct buffer size
  utils/fragmented_temporary_buffer_view: add remove suffix
2019-03-04 14:14:32 +02:00
Tomasz Grabiec
58e7ad20eb sstable/compaction: Use correct schema in the writing consumer
Introduced in 2a437ab427.

regular_compaction::select_sstable_writer() creates the sstable writer
when the first partition is consumed from the combined mutation
fragment stream. It gets the schema directly from the table
object. That may be a different schema than the one used by the
readers if there was a concurrent schema alter during that small time
window. As a result, the writing consumer attached to readers will
interpret fragments using the wrong version of the schema.

One effect of this is storing values of some columns under a different
column.

This patch replaces all column_family::schema() accesses with accesses
to the _schema member which is obtained once per compaction and is
the same schema which readers use.

Fixes #4304.

Tests:

  - manual tests with hard-coded schema change injection to reproduce the bug
  - build/dev/scylla boot
  - tests/sstable_mutation_test

Message-Id: <1551698056-23386-1-git-send-email-tgrabiec@scylladb.com>
2019-03-04 13:27:19 +02:00
Paweł Dziepak
434023425d commitlog: write the correct buffer size
Commitlog files contain multiple chunks. Each chunk starts as a single
(possibly fragmented) buffer. The size of that buffer in memory may be
larger than the size in the file.

cycle() was incorrectly using the in-memory size to write the whole
buffer to the file. That sometimes caused data corruption, since a
smaller on-file size was used to compute the offset of the next chunk
and there could be multiple chunk writes happening at the same time.

This patch solves the issue by ensuring that only the actual on-file
size of the chunk is written.
2019-03-04 10:25:48 +00:00
Paweł Dziepak
ca8d1025c0 utils/fragmented_temporary_buffer_view: add remove suffix
This patch adds fragmented_temporary_buffer_view::remove_suffix(). It is
also necessary to adjust remove_prefix() since now the total size of all
fragments may be larger than the size of the view if both those
operations are performed.
2019-03-04 10:23:45 +00:00
Asias He
3861f538dc tests: Use SEASTAR_THREAD_TEST_CASE for partitioner_test.cc
We are going to convert split_range_to_single_shard to return a future.
2019-03-04 09:41:09 +08:00
Avi Kivity
8f71e7ffd4 Merge "auth: Prevent disallowed roles from logging in" from Jesse
"
This series heavily refactors `auth_test` in anticipation of
the last patch, which fixes a bug and which should be backported.

Branches: branch-3.0, branch-2.3
"

Fixes #4284

* 'jhk/check_can_login/v2' of https://github.com/hakuch/scylla:
  auth: Reject logins from disallowed roles
  tests: Restrict the scope of a variable
  tests: Simplify boolean assertions in `auth_test`
  tests: Abstract out repeated assertion checking
  tests: Do not use the `auth` namespace
  tests: Validate authentication correctly
  tests: Ensure test roles are created and dropped
  tests: Use `static` variables in `auth_test`
  tests: Remove non-useful test
2019-03-02 17:13:06 +02:00
Asias He
a949ccee82 repair: Reject combination of -dc and -hosts options
4 nodes in the cluster
n1, n2 in dc1
n3, n4 in dc2

dc1 RF=2, dc2 RF=2.

If we run

    nodetool repair -hosts 127.0.0.1,127.0.0.3 -dc "dc1,dc2" multi

on n1.

The -hosts option will be ignored and only the -dc option
will be used to choose which hosts to repair. In this case, n1 to n4
will be repaired.

If the user wants to select specific hosts to repair with, there is no need
to specify the -dc option. Using the -hosts option is enough.

Reject the combination so as not to surprise the user.

In https://issues.apache.org/jira/browse/CASSANDRA-9876, the same logic
is introduced as well.

Refs #3836
Message-Id: <e95ac1099f98dd53bb9d6534316005ea3577e639.1551406529.git.asias@scylladb.com>
2019-03-02 16:42:29 +02:00
Juliana Oliveira
6322293263 dist/docker: add ssh server
Scylla Manager communicates through SSH, so this patch adds SSH server
to Scylla's docker image in order for it to be configurable by Scylla
Manager.

Message-Id: <20190301161428.GA12148@shenzou.localdomain>
2019-03-01 19:11:35 +02:00
Avi Kivity
41078de096 tools: toolchain: update image for gcc-8.3.1-2.fc29.x86_64
tests: unit (debug, dev, release)
2019-03-01 16:42:18 +02:00
Duarte Nunes
44966d0a66 Merge 'Fix view update generation optimizations' from Piotr
"
This series aims to fix inconsistencies in recent view update generation series (435447998).

First of all, it checks view row marker liveness instead of that of a base row marker
when deciding if optimizations can be applied or not.

Secondly, tests based on creating mutations directly are removed. Instead:
 - dtest case which detected inconsistencies in previous series is ported to be a unit test
 - the above case is also expanded to cover views with regular base column in their key
 - additional test for TTL and timestamps is added and it's based on CQL

Tests: unit (dev)
dtest: materialized_views_test.TestMaterializedViews.test_no_base_column_in_view_pk_complex_timestamp_without_flush

Fixes: #4271
"

* 'fix_virtual_columns_liveness_checks_in_update_optimization_5' of https://github.com/psarna/scylla:
  tests: add view update optimization case for TTL
  database: add view_stats getter
  tests: port complex timestamp view test from dtest
  db,view: fix virtual columns liveness checks
  tests: remove update generating test case
2019-03-01 10:58:39 -03:00
Jesse Haber-Kucharsky
a139afc30c auth: Reject logins from disallowed roles
When the `LOGIN` option for a role is set to `false`, Scylla should not
permit the role to log in.

Fixes #4284

Tests: unit (debug)
2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
320b4a7b99 tests: Restrict the scope of a variable 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
f8764a12e6 tests: Simplify boolean assertions in auth_test 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
879217ccaf tests: Abstract out repeated assertion checking 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
3c8eeb0e86 tests: Do not use the auth namespace 2019-02-28 15:02:53 -05:00
Jesse Haber-Kucharsky
afed9c7bee tests: Validate authentication correctly
There are additional validation steps that the server executes beyond
simply invoking the authenticator, so we adapt the tests to also perform
that validation.

We also eliminate lots of code duplication.
2019-02-28 15:01:14 -05:00
Jesse Haber-Kucharsky
baefde0f6c tests: Ensure test roles are created and dropped
Since the role manager and authenticator work in tandem, the test cases
should use the wrapper for `auth::service` to create and drop users
instead of just doing it through the authenticator.
2019-02-28 15:00:20 -05:00
Jesse Haber-Kucharsky
fd88d59ad9 tests: Use static variables in auth_test
This way, we avoid copies and alleviate resource-management concerns.
2019-02-28 14:59:38 -05:00
Jesse Haber-Kucharsky
f274982522 tests: Remove non-useful test
Password handling is verified in its own test suite, and this test not
only makes a number of assumptions about implementation details, but
also tries to verify a hashing scheme (bcrypt) which is not supported on
most Linux distributions.
2019-02-28 14:58:27 -05:00
Avi Kivity
7c968f4a9e build: move XXH_PRIVATE_API and SEASTAR_TESTING_MAIN non-mode-specific
These defines are global, so they can be in the mode-agnostic cxxflags
rather than the mode-specific cxxflags_{mode}.
Message-Id: <20190228081247.20116-1-avi@scylladb.com>
2019-02-28 09:51:02 +00:00
Piotr Sarna
032f8e2893 tests: add view update optimization case for TTL
This test case checks whether redundant updates are omitted
and the essential ones are still generated.
2019-02-28 10:47:20 +01:00
Piotr Sarna
67e63d4dd7 database: add view_stats getter
It will be used for testing purposes
2019-02-28 10:47:20 +01:00
Piotr Sarna
09b8d2e9d6 tests: port complex timestamp view test from dtest
This test was useful in discovering corner cases for TTLs of virtual
columns, so it's ported to unit test suite from dtest.

The test is also extended with a mirrored case for base regular column
that *is* included in view pk.
2019-02-28 10:47:20 +01:00
Piotr Sarna
5f85a7a821 db,view: fix virtual columns liveness checks
When looking for optimization paths, columns selected in a view
are checked against multiple conditions - unfortunately virtual
columns were erroneously skipped from that check, which resulted
in ignoring their TTLs. That can lead to overoptimizing
and not including vital liveness info into view rows,
which can then result in row disappearing too early.
2019-02-28 10:47:19 +01:00
Piotr Sarna
b963543762 tests: remove update generating test case
This test case should have been based on CQL instead of creating
artificial update scenarios. It also contains invalid cases
regarding base and view row marker, so it's removed here
and replaced with CQL-based test in this same series.
2019-02-28 10:40:47 +01:00
Avi Kivity
20eadb2c39 relocatable-package: package and redirect gnutls configuration
gnutls requires a configuration file, and the configuration file must match
the one used by the library. Since we ship our own version of the library with
the relocatable package, we must also ship the configuration file.

Luckily, it is possible to override the location of the configuration file via
an environment variable, so all we need to do is to copy the file to the archive
and provide the environment variable in the thunk that adjusts the library path.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190227110529.14146-1-avi@scylladb.com>
2019-02-28 10:57:32 +02:00
Avi Kivity
4022a919f6 test: allocate at least one logical core per unit test
Currently, we only allocate memory for concurrent unit test runs. This can cause
CPU overcommit when running test.py on machines with a lot of memory but few cores.
This overcommit can cause timeouts in tests that are time-sensitive (bad practice,
but can happen) and makes the desktop sluggish.

Improve by allocating at least one logical core per running test.

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190227132516.22147-1-avi@scylladb.com>
2019-02-28 10:34:33 +02:00
Dan Yasny
6dbb48a12a node_health_check: collect scylla.d contents with node_health_check
We are missing data for CPU conf files and potentially other
information when collecting node data.

Fixes #4094

Message-Id: <20190225204727.20805-5-dyasny@scylladb.com>
2019-02-28 10:23:19 +02:00
Dan Yasny
9055e7a49e node_health_check: Add redhat-release to health check if present
Collect /etc/redhat-release as well as os-release from relevant
hosts. The problem with os-release is that it doesn't contain the
minor version of the EL OS family. Since this is only present in
Red Hat distributions and derivatives, it will not be collected
in Debian derivatives.

Another approach is to use lsb_release -a but it will not provide
anything more useful than os-release on Debian and lsb needs to be
installed on EL derivatives first.

Fixes #4093

Message-Id: <20190225204727.20805-4-dyasny@scylladb.com>
2019-02-28 10:23:12 +02:00
Dan Yasny
2f26390f52 node_health_check: Use clear hostname instead of -i for filenames and report names
Hostname -i produces garbled output on new systems with IPv6
enabled; it is better to use the clean hostname for the file
names instead.

Message-Id: <20190225204727.20805-3-dyasny@scylladb.com>
2019-02-28 10:23:06 +02:00
Dan Yasny
f483c594ee node_health_check: Detect the address for the CQL (port 9042) listener and use it
The script relies on hostname -i for the host address, which can be
wrong on some systems. This patch checks where the defined
CQL_PORT is listening, and uses the correct IP address instead.

Message-Id: <20190225204727.20805-2-dyasny@scylladb.com>
2019-02-28 10:22:58 +02:00
Avi Kivity
632c7c303a Merge "auth: Restructure SASL code" from Jesse
"
This series restructures the SASL code that was previously internal
to the `password_authenticator` so that it can be used in other contexts.
"

* 'jhk/restructure_sasl/v1' of https://github.com/hakuch/scylla:
  auth: Rename SASL challenge class for "PLAIN"
  auth: Make a ctor `explicit`
  auth: Move `sasl_challenge` to its own file
  auth: Decouple SASL code from its parent class
2019-02-28 10:19:41 +02:00
Jesse Haber-Kucharsky
f2d92f81e8 auth: Report a more specific error with bad creds
Without this change, the resulting error message for an invalid password
is "authentication failed".

With this change, we report "Username and/or password are incorrect".

Fixes #4285

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <32d00be8af5075ee10d2c14f85b76843a9adac10.1551306914.git.jhaberku@scylladb.com>
2019-02-28 09:53:57 +02:00
Jesse Haber-Kucharsky
3d883e8cf2 auth: Rename SASL challenge class for "PLAIN" 2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky
0c955b7992 auth: Make a ctor explicit 2019-02-27 18:36:58 -05:00
Jesse Haber-Kucharsky
dc41f1098b auth: Move sasl_challenge to its own file
This will allow for other authenticators other than
`password_authenticator` from making use of the PLAIN SASL
authentication code.
2019-02-27 18:36:52 -05:00
Jesse Haber-Kucharsky
2d59fa6be9 auth: Decouple SASL code from its parent class
This way, we can (in the future) use this implementation of the SASL
"PLAIN" mechanism in other contexts other than `password_authenticator`.
2019-02-27 18:11:31 -05:00
Avi Kivity
88322086cb Merge "Add fuzzer-type unit test for range scans" from Botond
"
This series adds a fuzzer-type unit test for range scans, which
generates a semi-random dataset and executes semi-random range scans
against it, validating the result.
This test aims to cover a wide range of corner cases with the help of
randomness. Data and queries against it are generated in such a way that
various corner cases and their combinations are likely to be covered.

The infrastructure under range-scans has undergone massive changes in
the last year, growing in complexity and scope. The correctness of range
scans is critical for the correct functioning of any Scylla cluster, and
while the current unit tests served well in detecting any major problems
(mostly while developing), they are too simplistic and can only be
relied on to check the correctness of the basic functionality. This test
aims to extend coverage drastically, testing cases that the author of
the range-scan code or that of the existing unit tests didn't even know
existed, by relying on some randomness.

Fixes: #3954 (deprecates really)
"

* 'more-extensive-range-scan-unit-tests/v2' of https://github.com/denesb/scylla:
  tests/multishard_mutation_query_test: add fuzzy test
  tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan()
  tests/test_table: add advanced `create_test_table()` overload
  tests/test_table: make `create_test_table()` customizable
  query: add trim_clustering_row_ranges_to()
  tests/test_table: add keyspace and table name params
  tests/test_table: s/create_test_cf/create_test_table/
  tests: move create_test_cf() to tests/test_table.{hh,cc}
  tests/multishard_mutation_query_test: drop many partition test
  tests/multishard_mutation_query_test: drop range tombstone test
2019-02-27 17:26:53 +02:00
Avi Kivity
cc2f9841c4 Merge "Simplify -g and -gz checks in configure.py" from Rafael
* 'simplify-g-gz-check-v2' of https://github.com/espindola/scylla:
  Assume -gz is always available
  Assume -g is always available
2019-02-27 17:19:37 +02:00
Duarte Nunes
871790a340 Merge 'Hide virtual columns write time and ttl from the user' from Piotr
"
This miniseries hides virtual columns's writetime and ttl
from the user.

Tests: unit (dev)

Fixes #4288
"

* 'hide_virtual_columns_writetime_and_ttl_2' of https://github.com/psarna/scylla:
  tests: add test for hiding virtual columns from WRITETIME
  cql3: hide virtual columns from WRITETIME() and TTL()
  schema: add column_definition::is_hidden_from_cql
2019-02-27 14:36:08 +00:00
Calle Wilund
93602ecee3 compaction_manager: break out rewrite_sstables from cleanup
Allowing additional behaviour control. Such as which tables,
and whether to actually lock ourselves out as a "cleanup".
2019-02-27 14:25:31 +00:00
Calle Wilund
7fb6bbe68c table: parameterize cleanup_sstables
To allow using the logic for one-sstable-at-a-time compaction (i.e.
rewrite) of sstables without the "normal" cleanup logic and partition
selection.
2019-02-27 14:25:31 +00:00
Piotr Sarna
09eb0429ce tests: add test for hiding virtual columns from WRITETIME
Visibility checks for virtual columns' WRITETIME and TTL
are added.
2019-02-27 15:08:16 +01:00
Piotr Sarna
af39787bf0 cql3: hide virtual columns from WRITETIME() and TTL()
Virtual columns should not be visible to the user,
so they are now hidden not only from directly selecting them,
but also via WRITETIME() and TTL() keywords.

Fixes #4288
2019-02-27 15:08:15 +01:00
Piotr Sarna
b0ab4c28cf schema: add column_definition::is_hidden_from_cql
Right now the only columns hidden from CQL are view virtual columns,
but in case of expanding this set, a helper function is provided.
2019-02-27 15:07:54 +01:00
Avi Kivity
d189e12438 tests: database_test: fix misaligned dma write
test_distributed_loader_with_pending_delete issues a dma write, but violates
the unwritten contract of temporary_buffer::aligned(), which requires that
the size be a multiple of the alignment. As a result the test fails spuriously.

Instead of playing with the alignment, rewrite that snippet to use the
easier-to-use make_file_output_stream().

Introduced in 1ba88b709f.
Branches: master.
Message-Id: <20190226181850.3074-1-avi@scylladb.com>
2019-02-27 09:00:31 +01:00
Rafael Ávila de Espíndola
d9e0b47d53 Add a testcase for large rows
Tests: unit (release)

Fixes #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:56:50 -08:00
Rafael Ávila de Espíndola
25f81cf3e3 Populate system.large_rows.
It now records large rows when they are first written to an sstable
and removes them when the sstable is deleted.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:56:42 -08:00
Rafael Ávila de Espíndola
66d8a0cf93 Create a system.large_rows table
This is analogous to the system.large_partitions table, but holds
individual rows, so it also needs the clustering key of the large
rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
da4c0da78a Extract a key_to_str helper
It will be used in more places in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
b7fd03d0fd Don't call record_large_rows if stopped
The implementations of large_data_handler should only be called if
large_data_handler hasn't been stopped yet.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
0c401f56f8 Add a delete_large_rows_entries method to large_data_handler
This will be responsible for removing large rows from
system.large_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
81a21ea425 db::large_data_handler::(maybe_)?record_large_rows: Return future<> instead of void
These functions will record into tables in a followup patch, so they
will need to return a future.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
d4c001cba8 Rename maybe_delete_large_partitions_entry
It will also delete large rows, so rename it to
maybe_delete_large_data_entries.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
e9a13aff90 Rename log_large_row to record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
6fb7066755 Rename maybe_log_large_row to maybe_record_large_rows
It will also record into a table in a followup patch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 15:46:21 -08:00
Rafael Ávila de Espíndola
a586ac209a Assume -gz is always available
It is available since clang 5 and gcc 5.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola
054078b6af Assume -g is always available
From the log it looks like these checks were added in 2014 because of
a broken clang.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-26 09:57:26 -08:00
Rafael Ávila de Espíndola
87106ea5e2 Improve the build mode documentation
With this patch, HACKING suggests using just ./configure.py and passing
the mode to ninja. It also expands on the characteristics of each mode
and mentions the dev mode.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190208020444.19145-1-espindola@scylladb.com>
2019-02-26 19:54:50 +02:00
Nadav Har'El
da54d0fc7d Materialized views: fix accidental zeroing of flow-control delay
The materialized-views flow control carefully calculates the number of
microseconds to delay a client to slow it down to the desired rate -
but then a typo (std::min instead of std::max) causes this delay to
be zeroed, which in effect completely nullifies the flow control
algorithm.

Before this fix, experiments suggested that view flow control was
not having any effect and view backlog not bounded at all. After this
fix, we can see the flow control having its desired effect, and the
view backlog converging.

Fixes #4143.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190226161452.498-1-nyh@scylladb.com>
2019-02-26 18:22:18 +02:00
Tomasz Grabiec
1a63a313c8 Merge "repair: Rename names to be consistent with rpc verb
" from Asias

Some of the function names are not updated after we change the rpc verb
names. Rename them to make them consistent with the rpc verb names.

* seastar-dev.git asias/row_level_repair_rename_consistent_with_rpc_verb/v1:
  repair: Rename request_sync_boundary to get_sync_boundary
  repair: Rename request_full_row_hashes to get_full_row_hashes
  repair: Rename request_combined_row_hash to get_combined_row_hash
  repair: Rename request_row_diff to get_row_diff
  repair: Rename send_row_diff to put_row_diff
  repair: Update function name in docs/row_level_repair.md
2019-02-26 13:01:36 +01:00
Tomasz Grabiec
b06aac4fdb Merge "Fix temporary spurious schema version mismatch when nodes are restarted" from Asias
Fixes: #4148
Fixes: #4258

Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test

* seastar-dev.git asias/fix_schema_mismatch_when_nodes_restarts/v1:
  database: Add update_schema_version and announce_schema_version
  storage_service: Add application_state::SCHEMA when gossip starts
2019-02-26 12:55:52 +01:00
Avi Kivity
5f94bc902a transport: add option to disable shard-aware drivers
The shard-aware drivers can cause a huge number of connections to be created
when there are tens of thousands of clients. While normally the shard-aware
drivers are beneficial, in those cases they can consume too much memory.

Provide an option to disable shard awareness from the server (it is likely to
be easier to do this on the server than to reprovision those thousands of
clients).

Tests: manual test with wireshark.
Message-Id: <20190223173331.24424-1-avi@scylladb.com>
2019-02-26 12:44:11 +01:00
Asias He
459836079c storage_service: Add application_state::SCHEMA when gossip starts
In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw:

   4 nodes in the tests

   n1, n2, n3, n4 are started

   n1 is stopped

   n1 is changed to use different shard config

   n1 is restarted ( 2019-01-27 04:56:00,377 )

The backtrace happened on n2 right after n1 restarts:

   0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled
   1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled
   2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled
   3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed)
   4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status =
   5 Segmentation fault on shard 0.
   6 Backtrace:
   7 0x00000000041c0782
   8 0x00000000040d9a8c
   9 0x00000000040d9d35
   10 0x00000000040d9d83
   11 /lib64/libpthread.so.0+0x00000000000121af
   12 0x0000000001a8ac0e
   13 0x00000000040ba39e
   14 0x00000000040ba561
   15 0x000000000418c247
   16 0x0000000004265437
   17 0x000000000054766e
   18 /lib64/libc.so.6+0x0000000000020f29
   19 0x00000000005b17d9

The theory is: migration_manager::maybe_schedule_schema_pull is scheduled
while n1 still has the SCHEMA application_state. When n1 restarts, n2 gets
a new application state from n1 which does not have SCHEMA yet. When
maybe_schedule_schema_pull wakes up from its 60-second sleep, n1 has a
non-empty endpoint_state but an empty application_state for SCHEMA. We
dereference the null application_state pointer and abort.

In commit da80f27f44, we fixed the problem by
checking the pointer before dereference.

To prevent this from happening in the first place, we'd better add
application_state::SCHEMA when gossip starts. This way, peer nodes
always see application_state::SCHEMA when a node restarts.

Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test

Fixes #4148
Fixes #4258
2019-02-26 19:30:22 +08:00
Asias He
75edbe939d database: Add update_schema_version and announce_schema_version
Split the update_schema_version_and_announce() into
update_schema_version() and announce_schema_version(). This is going to
be used in storage_service::prepare_to_join(), where we want to first
update the schema version, then start gossip, and only then announce the
schema version.
2019-02-26 19:10:02 +08:00
Amnon Heiman
b8a838c66c node_exporter_install: Add a force install option
It is sometimes useful to force reinstallation of node_exporter,
for example during an upgrade or if something is wrong with the current
installation.

This patch adds a --force command line option.

If --force is given to node_exporter_install, it will reinstall
node_exporter to the latest version, regardless of whether it was already
installed.

The symbolic link in /usr/bin/node_exporter will be set to the newly
installed version, so if there are other installed versions, they will remain.

Examples:
$ sudo ./dist/common/scripts/node_exporter_install
node_exporter already installed, you can use `--force` to force reinstallation

$ sudo ./dist/common/scripts/node_exporter_install --force
node_exporter already installed, reinstalling

Fixes #4201

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190225151120.21919-1-amnon@scylladb.com>
2019-02-25 20:16:58 +02:00
Pekka Enberg
ca288189a9 dist/ami: Support different products for the AMI
Let's add a PRODUCT variable, similar to build_rpm.sh, for example, so
that we can override package names for enterprise AMIs.

Message-Id: <20190225063319.19516-1-penberg@scylladb.com>
2019-02-25 11:17:44 +02:00
Asias He
3e615c3a15 repair: Update function name in docs/row_level_repair.md
The repair rpc request_* functions are renamed to get_*.
The send_row_diff is renamed to put_row_diff.
2019-02-25 15:13:39 +08:00
Asias He
62104902db repair: Rename send_row_diff to put_row_diff
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
6e4ea1b3c4 repair: Rename request_row_diff to get_row_diff
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
5b29fb30ac repair: Rename request_combined_row_hash to get_combined_row_hash
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
6f6c4878d5 repair: Rename request_full_row_hashes to get_full_row_hashes
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Asias He
02ddfa393e repair: Rename request_sync_boundary to get_sync_boundary
Make it consistent with the row level repair rpc verb.
2019-02-25 15:13:39 +08:00
Avi Kivity
a0b0db7915 Merge "Fix regression in perf_fast_forward results" from Paweł
"
After adcb3ec20c ("row_cache: read is not
single-partition if inter-partition forwarding is enabled") we have
noticed a regression in the results of some perf_fast_forward tests.
This was caused by those tests not disabling partition-level
fast-forwarding even though it was not needed and the commit in question
fixed an incorrect optimisation in such cases.

However, after solving that issue it has also become apparent that
mutation_reader_merger performs worse when fast-forwarding is
disabled. This was attributed to the logic responsible for dropping readers
as soon as they have reached the end of stream (which cannot be done if
fast-forwarding is enabled). This problem was mitigated by avoiding a
scan of the list and by removing readers in small batches.

Fixes #4246.
Fixes #4254.

Tests: unit(dev)
"

* tag 'perf_fast_forward-fix-regression/v1' of https://github.com/pdziepak/scylla:
  mutation_reader_merger: drop unneeded readers in small batches
  mutation_reader_merger: track readers by iterators and not pointers
  tests/perf_fast_forward: disable partition-level fast-forwarding if not needed
2019-02-24 19:24:00 +02:00
Avi Kivity
e3c53ff3ff Update seastar submodule
* seastar 2313dec...ab54765 (10):
  > Fix C++-17-only uses of static_assert() with a single parameter.
  > README.md: fix out-of-date explanation of C++ dialect
  > net: fix tcp load balancer accounting leak while moving socket to other shard
  > Revert "deleter: prevent early memory free caused by deleter append."
  > deleter: prevent early memory free caused by deleter append.
  > Solve seastar.unit.thread failure in debug mode
  > Fix iovec-based read_dma: use make_readv_iocb instead of make_read_iocb
  > build: Fix the required version of `fmt`
  > app_template: fix use after move in app constructor
  > build: Rename CMake variable for private flags

Fixes #4269.
2019-02-24 16:06:23 +02:00
Avi Kivity
a3a7bea12f Merge "Clean up preprocessor definitions" from Jesse
* 'jhk/define_debug/v1' of https://github.com/hakuch/scylla:
  build: Remove the `DEBUG_SHARED_PTR` pp variable
  build: Prefer the Seastar version of a pp variable
2019-02-23 14:04:08 +02:00
Jesse Haber-Kucharsky
f9297895c1 auth: Change the log level for async. retries
The log message is benign, but it has caused some users of Scylla to
think that an error has occurred.

Fixes #3850

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <ba49c38266c0e77c3ed23cfca3c1a082b3060f17.1550777586.git.jhaberku@scylladb.com>
2019-02-23 14:03:16 +02:00
Tomasz Grabiec
3f698701c2 gdb: Drop incorrect throw of StopIteration
It is converted into a RuntimeError by python3:

  https://docs.python.org/3/library/exceptions.html#StopIteration

We should just return.

Message-Id: <20190221144321.18093-1-tgrabiec@scylladb.com>
2019-02-23 14:02:47 +02:00
Nadav Har'El
0eddf19432 main: add INFO log messages at start, initialization end, and end.
Scylla currently prints a welcome message when it starts, with the
Scylla version, but this is not printed to the regular log so in some
cases (e.g., Jenkins runs) we do not see it in the log. So let's add
a regular INFO-level log message with the same information.

Also, Scylla currently doesn't print any specific log message when it
normally completes its shutdown. In some cases, users may end up
wondering whether Scylla hung in the middle of the shutdown, or in
fact exited normally. Refs #4238. So in this patch we add a "shutdown
complete" message as the very last message in a successful shutdown.
We print Scylla's version also in the shutdown message, which may be
useful to see in the logs when shutting down one version of Scylla
and starting a different version.

Finally, we also add a log message when initialization is complete,
which may also be useful to understand whether Scylla hung during
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217140659.19512-1-nyh@scylladb.com>
2019-02-22 16:52:31 +01:00
Tomasz Grabiec
b90cb91468 gdb: Introduce 'scylla cache'
Prints contents of the row cache for each table on current shard.
Message-Id: <20190222144420.19677-1-tgrabiec@scylladb.com>
2019-02-22 14:58:58 +00:00
Paweł Dziepak
b524f96a74 mutation_reader_merger: drop unneeded readers in small batches
It was observed that destroying readers as soon as they are not needed
negatively affects performance of relatively small reads. We don't want
to keep them alive for too long either, since they may own a lot of
memory, but deferring the destruction slightly and removing them in
batches of 4 seems to solve the problem for the small reads.
2019-02-22 14:43:38 +00:00
Paweł Dziepak
435e24f509 mutation_reader_merger: track readers by iterators and not pointers
mutation_reader_merger uses a std::list of mutation_reader to keep them
alive while the rest of the logic operates on non-owning pointers.

This means that when it is a time to drop some of the readers that are
no longer needed, the merger needs to scan the list looking for them.
That's not ideal.

The solution is to make the logic use iterators to elements in that
list, which allows for O(1) removal of an unneeded reader. Iterators to
list are just pointers to the node and are not invalidated by unrelated
additions and removals.
2019-02-22 14:33:10 +00:00
Paweł Dziepak
5d5777f85e tests/perf_fast_forward: disable partition-level fast-forwarding if not needed
Several of the test cases in perf_fast_forward do not need
partition-level fast-forwarding. However, since the defaults are used to
construct most of the readers the fast-forwarding is enabled regardless.

This showed an apparent regression in the perf_fast_forward results
after adcb3ec20c ("row_cache: read is not
single-partition if inter-partition forwarding is enabled") which
disabled an optimisation that was invalid when partition-level
fast-forwarding was requested.

This patch ensures that all single-partition reads that do not need
partition-level fast-forwarding keep it disabled.
2019-02-22 14:28:02 +00:00
Avi Kivity
fdefee696e Merge "sstables: mc: writer: Avoid large allocations for keeping promoted index entries" from Tomasz
"
Currently we keep the entries in a circular_buffer, which uses
contiguous storage. For large partitions with many promoted index
entries this can cause OOM and sstable compaction failure.

A similar problem exists for the offset vector built
in write_promoted_index().

This change solves the problem by serializing promoted index entries
and the offset vector on the fly directly into a bytes_ostream, which
uses fragmented storage.

The serialization of the first entry is deferred, so that
serialization is avoided if there will be fewer than 2
entries. A promoted index is not added for such partitions.

There still remains a problem that a large-enough promoted index can cause OOM.

Refs #4217

Tests:
  - unit (release)
  - scylla-bench write

Branches: 3.0
"

* tag 'fix-large-alloc-for-promoted-index-v3' of github.com:tgrabiec/scylla:
  sstables: mc: writer: Avoid large allocations for maintaining promoted index
  sstables: mc: writer: Avoid double-serialization of the promoted index
2019-02-22 15:44:51 +02:00
Avi Kivity
177159da75 Merge "delete_atomically recovery" from Benny
"
The delete_atomically function is required to delete a set of sstables
atomically, i.e. either delete all of them or none. Deleting only
some sstables in the set might result in data resurrection if
sstable A, holding a tombstone that covers a mutation in sstable B,
is deleted while sstable B remains.

This patchset introduces a log file holding a list of SSTable TOC files
to delete for recovering a partial delete_atomically operation.

A new subdirectory called `pending_delete` is created in the sstables dir,
holding in-flight logs.

The logs are created with a temporary name (using a .tmp suffix)
and renamed to the final .log name once ready.  This indicates
the commit point for the operation.

When populating the column family, all files in the pending_delete
sub-directory are examined.  Temporary log files are just removed,
and committed log files are read, replayed, and deleted.

Fixes #4082

Tests: unit (dev), database_test (debug)
"

* 'projects/delete_atomically_recovery/v5' of https://github.com/bhalevy/scylla:
  tests: database_test: add test_distributed_loader_with_pending_delete
  distributed_loader: replay and cleanup pending_delete log files
  distributed_loader: populated_column_family: separate temp sst dirs cleanup phase
  docs: add sstables-directory-structure.md
  sstables: commit sstables to delete_atomically into a pending_delete log file
  sstables: delete_atomically: delete sstables in a thread
  sstables: component_basename: reuse with sstring component
  sstables: introduce component_basename
  database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions
  sstables: add delete_sstable_and_maybe_large_data_entries
  sstables: call remove_by_toc_name in dtor if marked_for_deletion
2019-02-22 15:37:17 +02:00
Benny Halevy
1ba88b709f tests: database_test: add test_distributed_loader_with_pending_delete
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
043673b236 distributed_loader: replay and cleanup pending_delete log files
Scan the table's pending_delete sub-directory if it exists.
Remove any temporary pending_delete log files to roll back the respective
delete_atomically operation.
Replay completed pending_delete log files to roll forward the respective
delete_atomically operation, and finally delete the log files.

Cleanup of temporary sstable directories and pending_delete
sstables is done in a preliminary scan phase when populating the column family,
so that we won't attempt to load the to-be-deleted sstables.

Fixes #4082

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
ee3ad75492 distributed_loader: populated_column_family: separate temp sst dirs cleanup phase
In preparation for replaying pending_delete log files,
we would like to first remove any temporary sst dirs
and later handle pending_delete log files, and only
then populate the column family.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
f35e4cbac7 docs: add sstables-directory-structure.md
Refs #4184

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:08:22 +02:00
Benny Halevy
024d0a6d49 sstables: commit sstables to delete_atomically into a pending_delete log file
To facilitate recovery of a delete_atomically operation that crashed mid
way, add a replayable log file holding the committed sstables to delete.

It will be used by populate_column_family to replay the atomic deletion.

1. Write the toc names of sstables to be deleted into a temporary file.
2. Once flushed and closed, rename the temp log file into the final name
   and flush the pending_delete directory.
3. delete the sstables.
4. Remove the pending_delete log file
   and flush the pending_delete directory.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:05:37 +02:00
Benny Halevy
70fda0eda0 sstables: delete_atomically: delete sstables in a thread
In preparation for implementing a pending_delete log file.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:05:37 +02:00
Benny Halevy
9ac04850a0 sstables: component_basename: reuse with sstring component
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 11:05:10 +02:00
Benny Halevy
a2a9750074 sstables: introduce component_basename
component_basename returns just the basename for the component filename
without the leading sstdir path.

To be used for delete_atomically's pending_delete log file.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Benny Halevy
13ffda5c31 database: maybe_delete_large_partitions_entry: do not access sstable and do not mask exceptions
1. We would like to be able to call maybe_delete_large_partitions_entry
from the sstable destructor path in the future so the sstable might go away
while the large data entries are being deleted.

2. We would like the caller to handle any exception on this path,
especially in the preparation part, before calling delete_large_partitions_entry().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Benny Halevy
ae29db8db6 sstables: add delete_sstable_and_maybe_large_data_entries
To be called by delete_atomically,
rather than passing a vector to delete_sstables.

This way, there is no need to build the `sstables_to_delete_atomically` vector.

To be replaced in the future with an sstable method once we
provide the large_data_handler upon construction.

Handle exceptions from remove_by_toc_name or maybe_delete_large_partitions_entry
by merely logging an error.  There is nothing else we can do at this point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Benny Halevy
387f14a874 sstables: call remove_by_toc_name in dtor if marked_for_deletion
No need to call delete_sstables, which works on a list of sstables
(by toc name).

Also, add FIXME comment about not calling
large_data_handler.maybe_delete_large_partitions_entry
on this path.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-22 10:44:02 +02:00
Avi Kivity
34b254381f sstables: checksummed_file_writer: fix dma alignment
checksummed_file_writer does not override allocate_buffer(), so it inherits
data_source_impl's default allocate_buffer, which does not care about alignment.
The buffer is then passed to the real file_data_sink_impl, and thence to the file
itself, which cannot complete the write since it is not properly aligned.

This doesn't fail in release mode, since the Seastar allocator will supply a
properly aligned buffer even if not asked to do so. The ASAN allocator usually
does supply an aligned buffer, but not always, which causes the test to fail.

Fix by forwarding the allocate_buffer() function to the underlying data_source.

Fixes #4262.
Branches: branch-3.0
Message-Id: <20190221184115.6695-1-avi@scylladb.com>
2019-02-21 21:26:56 +01:00
Jesse Haber-Kucharsky
b7b50392ed build: Remove the DEBUG_SHARED_PTR pp variable
This definition is exported by Seastar as `SEASTAR_DEBUG_SHARED_PTR` and
no code in Scylla uses this definition either way.
2019-02-21 10:45:09 -05:00
Jesse Haber-Kucharsky
f4883a1aea build: Prefer the Seastar version of a pp variable
Seastar defines `SEASTAR_DEFAULT_ALLOCATOR`, and everywhere else in
Scylla we use this variable too.
2019-02-21 10:41:42 -05:00
Piotr Sarna
c743617236 cql3: unify max value for row limit and per-partition limit
Limits are stored as uint32_t everywhere, but in some places
int32_t was used, which created inconsistencies when comparing
the value to std::numeric_limits<Type>::max().
In order to solve inconsistencies, the types are unified to uint32_t,
and instead of explicitly calling numeric limit max,
an already existing constant value query::max_rows is utilized.

Fixes #4253

Message-Id: <4234712ff61a0391821acaba63455a34844e489b.1550683120.git.sarna@scylladb.com>
2019-02-21 13:56:02 +02:00
Tomasz Grabiec
ecff716f40 query-result-set: Give more context on failure
We've seen schema application failing with marshal_exception
here. That's not enough information to figure out what is the
problem. Knowing which table and column is affected would make
diagnosis much easier in certain cases.

This patch wraps errors in query::deserialization_error with more
information.

Example output:

  query::deserialization_error (failed on column system_schema.tables#bloom_filter_fp_chance \
  (version: c179c1d7-9503-3f66-a5b3-70e72af3392a, id: 0, index: 0, type: org.apache.cassandra.db.marshal.DoubleType):\
  seastar::internal::backtraced<marshal_exception> (marshaling error: read_simple - not enough bytes (expected 8, got 3)
Message-Id: <20190221113219.13018-1-tgrabiec@scylladb.com>
2019-02-21 11:35:27 +00:00
Nadav Har'El
f55bdea364 compaction manager: avoid spurious "asked to stop" message at the end of the log
This patch removes the log message about "compaction_manager - Asked to stop"
at the very end of Scylla runs. This log message is confusing because it
only has the "asked to stop" part, without finally a "stopped", and may
lead a user to incorrectly fear that the shutdown hung - when it in fact
finished just fine.

The database object holds a compaction_manager and stop()s it when the
database is stop()ed - and that is the very last thing our shutdown does.
However, much earlier, as the *first* shutdown operation (i.e., the last
at_exit() in main.cc), we already stop() the compaction manager.

The second stop() call does nothing, but unfortunately prints the log
message just before checking if it has anything to stop. So this patch
just moves the log message to after the check.

Fixes #4238.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217142657.19963-1-nyh@scylladb.com>
2019-02-21 12:32:47 +01:00
Rafael Ávila de Espíndola
5a7bff36ca Simplify sstable::filename
No functionality change, but avoids a std::unordered_map.

Tests: unit (dev)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190221014630.15476-1-espindola@scylladb.com>
2019-02-21 12:40:01 +02:00
Avi Kivity
5520fc37ba Merge " Fix INSERT JSON with null values" from Piotr
"
Fixes #4256

This miniseries fixes a problem with inserting NULL values through
INSERT JSON interface.

Tests: unit (dev)
"

* 'fix_insert_json_with_null' of https://github.com/psarna/scylla:
  tests: add test for INSERT JSON with null values
  cql3: add missing value erasing to json parser
2019-02-21 12:36:09 +02:00
Piotr Sarna
4d211690f9 tests: add test for INSERT JSON with null values 2019-02-21 11:25:14 +01:00
Piotr Sarna
6618191e49 cql3: add missing value erasing to json parser
When inserting a null value through INSERT JSON, the column
was erroneously not removed from the 'not used' list of columns.

Fixes #4256
2019-02-21 11:23:44 +01:00
Tomasz Grabiec
8687666169 schema_tables: Add trace-level logging of schema mutations
Can be useful in diagnosing problems with application of schema
mutations.

do_merge_schema() is called on every change of schema of the local
node.

create_table_from_mutations() is called on schema merge when a table
was altered or created using mutations read from local schema tables
after applying the change, or when loading schema on boot.

Message-Id: <20190221093929.8929-2-tgrabiec@scylladb.com>
2019-02-21 12:16:38 +02:00
Tomasz Grabiec
f65d1e649d schema_mutations: Make printable
Message-Id: <20190221093929.8929-1-tgrabiec@scylladb.com>
2019-02-21 12:16:32 +02:00
Avi Kivity
9adfd11374 Merge "Avoid including cryptopp headers" from Rafael
"
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

This patch series introduces a single .cc file that has to include
cryptopp headers.
"

* 'avoid-cryptopp-v3' of https://github.com/espindola/scylla:
  Avoid including cryptopp headers
  Delete dead code
2019-02-21 10:31:20 +02:00
Rafael Ávila de Espíndola
fd5ea2df5a Avoid including cryptopp headers
cryptopp's config.h has the following pragma:

 #pragma GCC diagnostic ignored "-Wunused-function"

It is not wrapped in a push/pop. Because of that, including cryptopp
headers disables that warning on scylla code too.

The issue has been reported as
https://github.com/weidai11/cryptopp/issues/793

To work around it, this patch uses a pimpl to have a single .cc file
that has to include cryptopp headers.

While at it, it also reduces the differences and code duplication
between the md5 and sha1 hashers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-20 08:03:46 -08:00
Rafael Ávila de Espíndola
a309f952d2 Delete dead code
This code would have had to be refactored by the next patch. Since it is
commented out, just delete it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-02-20 08:03:46 -08:00
Duarte Nunes
4354479985 Merge 'Minimize generated view updates for unselected column updates' from Piotr
"
This series addresses the issue of redundant view updates,
generated for columns that were not selected for given materialized view.
Cases covered (quote:)
* If a base row has a live row marker, then we can avoid generating
  view updates if only unselected columns change;
* If a base row has no live row marker, then we can avoid generating
  view updates if unselected columns are updated, unless they are newly
  created, deleted, or they have a TTL.

Additionally, this series includes caching selected columns and is_index information
to avoid unnecessary CPU cycles spent on recomputing these two.

Fixes #3819
"

* 'send_less_view_updates_if_not_necessary_4' of https://github.com/psarna/scylla:
  tests: add cases for view update generation optimizations
  view: minimize generated view updates for unselected columns
  view: cache is_index for view pointer
  index: make non-pointer overload of is_index function
  index: avoid copying when checking for is_index
2019-02-20 13:24:44 +00:00
Piotr Sarna
563456e3ac tests: add cases for view update generation optimizations
Test cases that cover avoiding generating view updates
when not necessary (e.g. when a column not selected by the view
is modified) are added.
2019-02-20 14:05:29 +01:00
Piotr Sarna
bd52e05ae2 view: minimize generated view updates for unselected columns
In some cases generating view updates for columns that were not
selected in the CREATE VIEW statement is redundant - namely, when
the update will not influence row liveness in any way.
Currently, these cases are optimized out:
 - the row marker is live and only unselected columns were updated;
 - the row marker is not live and only unselected columns were updated,
   and in the process nothing was created or deleted and there was
   no TTL involved;
2019-02-20 14:05:27 +01:00
Piotr Sarna
dbe8491655 view: cache is_index for view pointer
It's wasteful to keep querying the index manager every time to check
whether a view is backing a secondary index, so this value is cached
at construction time.
At the same time, this value is not simply passed to view_info
when being created in secondary index manager, in order to
decouple materialized view logic from secondary indexes as much as
possible (the sole existence of is_index() is bad enough).
2019-02-20 12:52:32 +01:00
Piotr Sarna
cb20fc2e4f index: make non-pointer overload of is_index function
The previous interface enforced passing a shared pointer, which
might result in calling an unneeded shared_from_this().
2019-02-20 12:52:32 +01:00
Piotr Sarna
94db098d39 index: avoid copying when checking for is_index
Previously the is_index implementation used the list_indexes() helper
function, which copies data.
2019-02-20 12:52:32 +01:00
Tomasz Grabiec
a8c74bc7ab gdb: Print LSA/Cache/Memtable memory usage from "scylla memory"
Example output:

LSA:
  allocated:     181010432
  used:          177209344
  free:            3801088

Cache:
  total:          97255424
  used:           60700600
  free:           36554824

Memtables:
 total:            83755008
 Regular:
  real dirty:      79429632
  virt dirty:      35168426
 System:
  real dirty:        524288
  virt dirty:        466764
 Streaming:
  real dirty:             0
  virt dirty:             0

Message-Id: <1550598424-23428-1-git-send-email-tgrabiec@scylladb.com>
2019-02-20 12:53:53 +02:00
Tomasz Grabiec
dafe22dd83 lsa: Fix spurious abort with --enable-abort-on-lsa-bad-alloc
allocate_segment() can fail even though we're not out of memory, when
it's invoked inside an allocating section with the cache region
locked. That section may later succeed when retried after memory
reclamation.

We should ignore a bad_alloc thrown inside the allocating section body
and fail only when the whole section fails.

Fixes #2924

Message-Id: <1550597493-22500-1-git-send-email-tgrabiec@scylladb.com>
2019-02-20 12:53:49 +02:00
Avi Kivity
84465c23c4 Merge "Add multi-column restrictions filtering" from Piotr
"
Fixes #3574

This series adds missing multi-column restrictions filtering to CQL.
The underlying infrastructure already allows checking multi-column
restrictions in a reasonable way, so this series consists of mostly
adding simple interfaces and parameters.
Also, unit test cases for multi-column restrictions are provided.

Tests: unit (dev)
"

* 'add_multi_column_restrictions_filtering_3' of https://github.com/psarna/scylla:
  tests: add multi-column filtering tests
  cql3: add multi-column restrictions filtering
  cql3: add specified is_satisfied_by to multi-column restriction
  cql3: rewrite raw loop in is_satisfied_by to boost::any_of
  cql3: fix is_satisfied_by for multi-column restrictions
  cql3: add missing include to multi-column restriction
2019-02-19 14:42:14 +02:00
Piotr Sarna
9432937816 tests: add multi-column filtering tests
Refs #3574
2019-02-19 13:24:25 +01:00
Piotr Sarna
4dc0b0672c cql3: add multi-column restrictions filtering
It's now possible to pass multi-column restrictions
to queries that require filtering.

Fixes #3574
2019-02-19 13:24:25 +01:00
Piotr Sarna
3db526ffe2 cql3: add specified is_satisfied_by to multi-column restriction
Multi-column restrictions need only schema, clustering key and query
options in order to decide if they are satisfied, so an overloaded
function that takes reduced number of parameters is added.
2019-02-19 13:24:25 +01:00
Piotr Sarna
16dbc917a4 cql3: rewrite raw loop in is_satisfied_by to boost::any_of 2019-02-19 13:24:12 +01:00
Piotr Sarna
0d675e4419 cql3: fix is_satisfied_by for multi-column restrictions
Multi-column restriction should be satisfied by the value
if any of the ranges contains it, not all of them.
Example: SELECT * FROM t WHERE (a,b) IN ((1,2),(1,3))
will operate on two singular ranges: [(1,2),(1,2)] and [(1,3),(1,3)].
It's sufficient for a value to be inside any of these two in order
to satisfy the restriction.
2019-02-19 13:10:58 +01:00
Avi Kivity
934ba7ccb2 Merge "tests: introduce test environment and cleanup sstable tests" from Benny
"
As part of implementing the sstables manager and fixing an issue related
to updating large_data_handler on all delete paths, we want to funnel
all sstable creations, loading, and deletions through a manager.

The patchset lays out test infrastructure to funnel these operations
through class sstables::test_env.

In the process, it cleans up the numerous call sites in the existing
unit tests that evolved over time.

Refs #4198
Refs #4149

Tests: unit (dev)
"

* 'projects/test_env/v3' of https://github.com/bhalevy/scylla:
  tests: introduce sstables::test_env
  tests: perf_sstable: rename test_env
  tests: sstable_datafile_test: use useable_sst
  tests: sstable_test: add write_and_validate_sst helper
  tests: sstable_test: add test_using_reusable_sst helper
  tests: sstable_test: use reusable_sst where possible
  tests: sstable_test: add test_using_working_sst helper
  tests: sstable_3_x_test: make_test_sstable
  tests: run_sstable_resharding_test: use default parameters to make_sstable
  tests: sstables::test::make_test_sstable: reorder params
  tests: test_setup: do_with_test_directory is unused
  tests: move sstable_resharding_strategy_tests to sstable_resharding_test
  tests: move create_token_from_key helpers to test_services
  tests: move column_family_for_tests to test_services
  dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh
2019-02-19 11:26:42 +02:00
Piotr Sarna
4eecb57a0b cql3: add missing include to multi-column restriction 2019-02-19 10:24:31 +01:00
Tomasz Grabiec
9c6f897731 tools/toolchain/README: Add the "Troubleshooting" section
Message-Id: <1550567863-29404-1-git-send-email-tgrabiec@scylladb.com>
2019-02-19 11:21:02 +02:00
Tzach Livyatan
622361bf1a docs/docker-hub.md: Docker Compose cluster example
This adds a simple example of launching a 3-node Scylla cluster with
Docker Compose.

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
[ penberg: minor edits ]
Message-Id: <20190213081003.6401-1-tzach@scylladb.com>
2019-02-19 09:52:20 +02:00
Avi Kivity
e37e095432 build: allow configuring and testing multiple modes
Allow the --mode argument to ./configure.py and ./test.py to be repeated. This
is to allow continuous integration to configure only debug and release, leaving dev
to developers.
Message-Id: <20190214162736.16443-1-avi@scylladb.com>
2019-02-18 15:52:25 +00:00
Tomasz Grabiec
08f4a3664e sstables: mc: writer: Avoid large allocations for maintaining promoted index
Currently, we keep the entries in a circular_buffer, which uses
a contiguous storage. For large partitions with many promoted index
entries this can cause OOM and sstable compaction failure.

A similar problem exists for the offset vector built
in write_promoted_index().

This change solves the problem by serializing promoted index entries
and the offset vector on the fly directly into a bytes_ostream, which
uses fragmented storage.

The serialization of the first entry is deferred, so that
serialization is avoided if there will be less than 2
entries. Promoted index is not added for such partitions.

There still remains a problem that large-enough promoted index can cause OOM.

Refs #4217
2019-02-18 16:03:07 +01:00
Tomasz Grabiec
4e093bc3a4 sstables: mc: writer: Avoid double-serialization of the promoted index 2019-02-18 16:03:07 +01:00
Duarte Nunes
6e83457b1b Merge 'Add PER PARTITION LIMIT' from Piotr
"
This series introduces PER PARTITION LIMIT to CQL.
Protocol and storage is already capable of applying per-partition limits,
so for nonpaged queries the changes are superficial - a variable is parsed
and passed down.
For paged queries and filtering the situation is a little bit more complicated
due to corner cases: results for one partition can be split over 2 or more pages,
filtering may drop rows, etc. To solve these, another variable is added to paging
state - the number of rows already returned from last served partition.
Note that the "last" partition may be stretched over any number of pages, not just
the last one, which is especially the case when considering filtering.
As a result, per-partition-limiting queries are not eligible for page generator
optimization, because they may need to have their results locally filtered
for extraneous rows (e.g. when the next page asks for a per-partition limit of 5,
but we already received 4 rows from the last partition, so we need just 1 more
from the last partition key, but 5 from all subsequent ones).

Tests: unit (dev)

Fixes #2202
"

* 'add_per_partition_limit_3' of https://github.com/psarna/scylla:
  tests: remove superfluous ignore_order from filtering tests
  tests: add filtering with per partition key limit test
  tests: publish extract_paging_state and count_rows_fetched
  tests: fix order of parameters in with_rows_ignore_order
  cql3,grammar: add PER PARTITION LIMIT
  idl,service: add persistent last partition row count
  cql3: prevent page generator usage for per-partition limit
  cql3: add checking for previous partition count to filtering
  pager: add adjusting per-partition row limit
  cql3: obey per partition limit for filtering
  cql3: clean up unneeded limit variables
  cql3: obey per partition limit for select statement
  cql3: add get_per_partition_limit
  cql3: add per_partition_limit to CQL statement
2019-02-18 14:47:11 +00:00
Amnon Heiman
750b76b1de scylla-housekeeping: Read JSON as UTF-8 string for older Python 3 compatibility
Python 3.6 is the first version to accept bytes to the json.loads(),
which causes the following error on older Python 3 versions:

  Traceback (most recent call last):
    File "/usr/lib/scylla/scylla-housekeeping", line 175, in <module>
      args.func(args)
    File "/usr/lib/scylla/scylla-housekeeping", line 121, in check_version
      raise e
    File "/usr/lib/scylla/scylla-housekeeping", line 116, in check_version
      versions = get_json_from_url(version_url + params)
    File "/usr/lib/scylla/scylla-housekeeping", line 55, in get_json_from_url
      return json.loads(data)
    File "/usr/lib64/python3.4/json/__init__.py", line 312, in loads
      s.__class__.__name__))
  TypeError: the JSON object must be str, not 'bytes'

To support those older Python versions, convert the bytes read to utf8
strings before calling the json.loads().

Fixes #4239
Branches: master, 3.0

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20190218112312.24455-1-amnon@scylladb.com>
2019-02-18 14:52:32 +02:00
Piotr Sarna
5ad5221ce1 tests: remove superfluous ignore_order from filtering tests
The filtering-with-LIMIT tests used the with_rows_ignore_order function,
while it's better to use the simpler with_rows.
2019-02-18 11:06:44 +01:00
Piotr Sarna
5f67a501ec tests: add filtering with per partition key limit test 2019-02-18 11:06:44 +01:00
Piotr Sarna
a84e237177 tests: publish extract_paging_state and count_rows_fetched
These local lambda functions will be reused, so they are promoted
to static functions.
2019-02-18 11:06:44 +01:00
Piotr Sarna
824e9dc352 tests: fix order of parameters in with_rows_ignore_order
When reporting a failure, expected rows were mixed up with received
rows. Also, the message assumed it received more rows, but it can
just as well be fewer, so now it reports a "different number" of rows.
2019-02-18 11:06:44 +01:00
Piotr Sarna
3e4f065847 cql3,grammar: add PER PARTITION LIMIT
Select statements now allow passing PER PARTITION LIMIT (?) directive
which will trim results for each partition accordingly.
2019-02-18 11:06:44 +01:00
Piotr Sarna
acf7bedad4 idl,service: add persistent last partition row count
In order to process paged queries with per-partition limits properly,
paging state needs to keep an additional piece of information: the row
count of the last partition returned in the previous run.
That's necessary because the end of previous page and the beginning
of current one might consist of rows with the same partition key
and we need to be able to trim the results to the number indicated
by per-partition limit.
2019-02-18 11:06:44 +01:00
Piotr Sarna
3a2b004f02 cql3: prevent page generator usage for per-partition limit
Paged queries that induce per-partition limits cannot use
page generator optimization, as sometimes the results need
to be filtered for extraneous rows on page breaks.
2019-02-18 11:06:44 +01:00
Piotr Sarna
1dadae212a cql3: add checking for previous partition count to filtering
Filtering now needs to take into account per partition limits as well,
and for that it's essential to be able to compare partition keys
and decide which rows should be dropped - if previous page(s) contained
rows with the same partition key, these need to be taken into
consideration too.
2019-02-18 11:06:43 +01:00
Piotr Sarna
82a3883575 pager: add adjusting per-partition row limit
For filtering pagers, per partition limit should be set
to page size every time a query is executed, because some rows
may potentially get dropped from results.
2019-02-18 10:55:52 +01:00
Piotr Sarna
b965c3778f cql3: obey per partition limit for filtering
Filtering queries now take into account the limit of rows
per single partition provided by the user.
2019-02-18 10:29:34 +01:00
Piotr Sarna
b3aa939cde cql3: clean up unneeded limit variables
Some places extracted a `limit` variable to be captured by lambdas,
but they were not used inside them.
2019-02-18 10:29:34 +01:00
Piotr Sarna
cfb6e9c79c cql3: obey per partition limit for select statement
Select statement now takes into account the limit of rows
per single partition provided by the user.
2019-02-18 10:29:34 +01:00
Piotr Sarna
41b466246e cql3: add get_per_partition_limit 2019-02-18 10:29:34 +01:00
Piotr Sarna
93786a9148 cql3: add per_partition_limit to CQL statement
Select statements can now accept per_partition_limit variable.
2019-02-18 10:29:34 +01:00
Gleb Natapov
b01a659014 storage_proxy: remove old Cassandra code
Part of the code is already implemented (counters and hinted-handoff).
Part of the code will probably never be (triggers). And the rest is
the code that estimates the number of rows per range to determine query
parallelism, but we implemented exponential growth algorithms instead.

Message-Id: <20190214112226.GE19055@scylladb.com>
2019-02-18 10:34:55 +02:00
Avi Kivity
a1567b0997 Merge "replace get_restricted_ranges() function with generator interface" from Gleb
"
get_restricted_ranges() is inefficient since it calculates all
vnodes that cover the requested key ranges in advance, but callers often
use only the first one. Replace the function with a generator interface
that generates the requested number of vnodes on demand.
"

* 'gleb/query_ranges_to_vnodes_generator' of github.com:scylladb/seastar-dev:
  storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator
  storage_proxy: remove old get_restricted_ranges() interface
  cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface
  tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface
  storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface
  storage_proxy: introduce new query_ranges_to_vnode_generator interface
2019-02-18 10:33:54 +02:00
Avi Kivity
497367f9f7 Revert "build: switch debug mode from -O0 to -Og"
This reverts commit e988521b89. It triggers a bug
in gcc variable tracking, and there are reports it significantly slows down
compilation.
2019-02-17 18:32:28 +02:00
Nadav Har'El
05db7d8957 Materialized views: name the "batch_memory_max" constant
Give the constant 1024*1024 introduced in an earlier commit a name,
"batch_memory_max", and move it from view.cc to view_builder.hh.
It now resides next to the pre-existing constant that controlled how
many rows were read in each build step, "batch_size".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217100222.15673-1-nyh@scylladb.com>
2019-02-17 13:28:16 +00:00
Avi Kivity
7b411e30a9 Update seastar submodule
* seastar 11546d4...2313dec (6):
  > Deprecate thread_scheduling_group in favor of scheduling_group
  > Merge "Fixes for Doxygen documentation" from Jesse
  > future: optionally type-erase future::then() and future::then_wrapped
  > build: Allow deprecated declarations internally
  > rpc: fix insertion of server connections into server's container
  > rpc: split BOOST_REQUIRE with long conditions into multiple
2019-02-16 22:27:34 +02:00
Avi Kivity
03531c2443 fragmented_temporary_buffer: fix read_exactly() during premature end-of-stream
read_exactly(), when given a stream that does not contain the amount of data
requested, will loop endlessly, allocating more and more memory as it does, until
it fails with an exception (at which point it will release the memory).

Fix by returning an empty result, like input_stream::read_exactly() (which it
replaces). Add a test case that fails without a fix.

Affected callers are the native transport, commitlog replay, and internal
deserialization.

Fixes #4233.

Branches: master, branch-3.0
Tests: unit(dev)
Message-Id: <20190216150825.14841-1-avi@scylladb.com>
2019-02-16 17:06:19 +00:00
Takuya ASADA
af988a5360 install-dependencies.sh: show description when 'yum-utils' package is installed on Fedora
When yum-utils is already installed on Fedora, 'yum install dnf-utils'
causes a conflict and fails.
We should show a descriptive message instead of just producing a dnf
error message.

Fixes #4215

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190215221103.2379-1-syuu@scylladb.com>
2019-02-16 17:16:18 +02:00
Pekka Enberg
f7cf04ac4b tools/toolchain: Clean up DNF cache from Docker image
Make sure we call "dnf clean all" to remove the DNF cache, which reduces
Docker image size as per the following guidelines:

https://github.com/fedora-cloud/Fedora-Dockerfiles/wiki/Guidelines-for-Creating-Dockerfiles

A freshly built image is 250 MB smaller than the one on Docker Hub:

  <none>                                <none>               b8cafc8ff557        16 seconds ago      1.2 GB
  docker.io/scylladb/scylla-toolchain   fedora-29-20190212   d253d45a964c        3 days ago          1.45 GB

Message-Id: <20190215142322.12466-1-penberg@scylladb.com>
2019-02-16 17:12:10 +02:00
Botond Dénes
2125e99531 service/storage_service: fix pre-bootstrap wait for schema agreement
When bootstrapping, a node should wait to reach schema agreement
with its peers, before it can join the ring. This is to ensure it can
immediately accept writes. Failing to reach schema agreement before
joining is not fatal, as the node can pull unknown schemas on writes
on-demand. However, if such a schema contains references to UDFs, the
node will reject writes using it, due to #3760.

To ensure that schema agreement is reached before joining the ring,
`storage_service::join_token_ring()` has two checks. First it checks that
at least one peer was connected previously. For this it compares
`database::get_version()` with `database::empty_version`. The (implied)
assumption is that this will become something other than
`database::empty_version` only after having connected (and pulled
schemas from) at least one peer. This assumption doesn't hold anymore,
as we now set the version earlier in the boot process.
The second check verifies that we have the same schema version as all
known, live peers. This check assumes (since 3e415e2) that we have
already "met" all (or at least some) of our peers and if there is just
one known node (us) it concludes that this is a single-node cluster,
which automatically has schema agreement.
It's easy to see how these two checks will fail. The first fails to
ensure that we have met our peers, and the second wrongfully concludes
that we are a one-node cluster, and hence have schema agreement.

To fix this, modify the first check. Instead of relying on the presence
of a non-empty database version, supposedly implying that we already
talked to our peers, explicitly make sure that we have really talked to
*at least* one other node, before proceeding to the second check, which
will now do the correct thing, actually checking the schema versions.

Fixes: #4196

Branches: 3.0, 2.3

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <40b95b18e09c787e31ba6c5519fb64d68b4ca32e.1550228389.git.bdenes@scylladb.com>
2019-02-15 15:56:46 +01:00
Rafael Ávila de Espíndola
9cd14f2602 Don't write to system.large_partition during shutdown
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.

There doesn't seem to be an order we can stop the existing services in
cql_test_env that makes this possible.

This patch then adds another step when shutting down a database: first
stop updating system.large_partition.

This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.

This seems to impact only tests, since main.cc calls _exit directly.

Tests: unit (release,debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
2019-02-15 10:49:10 +01:00
Avi Kivity
e988521b89 build: switch debug mode from -O0 to -Og
-Og is advertised as debug-friendly optimization, both in compile time
and debug experience. It also cuts sstable_mutation_test run time in half:

Changing -O0 to -Og

Before:

real    16m49.441s
user    16m34.641s
sys    0m10.490s

After:

real    8m38.696s
user    8m26.073s
sys    0m10.575s

Message-Id: <20190214205521.19341-1-avi@scylladb.com>
2019-02-15 08:19:48 +02:00
Benny Halevy
c8f239ff2b tests: introduce sstables::test_env
In preparation to adding sstables_manager we want
to establish an environment for testing sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:37:41 +02:00
Benny Halevy
f9546b23b7 tests: perf_sstable: rename test_env
test_env is going to be a class in sstables namespace

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:15 +02:00
Benny Halevy
d6cfc1fae5 tests: sstable_datafile_test: use useable_sst
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
2a6b5a7622 tests: sstable_test: add write_and_validate_sst helper
In preparation for sstables::test_env

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
255f05e6c8 tests: sstable_test: add test_using_reusable_sst helper
In preparation for sstables::test_env

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
e11e29a1fc tests: sstable_test: use reusable_sst where possible
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
9d4989f2e8 tests: sstable_test: add test_using_working_sst helper
In preparation for sstables::test_env

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
55aac22b37 tests: sstable_3_x_test: make_test_sstable
Reused for making sstables for test cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
3bc1b8b9ff tests: run_sstable_resharding_test: use default parameters to make_sstable
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:22:14 +02:00
Benny Halevy
b0f3f8d766 tests: sstables::test::make_test_sstable: reorder params
In preparation for providing a default large_data_handler in
a test-standard way.

The buffer_size parameter is reordered and now has a default value,
the same as make_sstable()'s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:36 +02:00
Benny Halevy
bcd3f36a8a tests: test_setup: do_with_test_directory is unused
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:32 +02:00
Benny Halevy
b39c7bc4ae tests: move sstable_resharding_strategy_tests to sstable_resharding_test
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:32 +02:00
Benny Halevy
8801a6da1f tests: move create_token_from_key helpers to test_services
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:32 +02:00
Benny Halevy
815fd76c25 tests: move column_family_for_tests to test_services
And unify multiple copies of column_family_test_config().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:21:10 +02:00
Benny Halevy
b6ad61d2e5 dht: move declaration of default_partitioner from sstable_datafile_test to i_partitioner.hh
So it can be used by other tests

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-02-14 22:16:52 +02:00
Nadav Har'El
43c42d608d materialized views: forbid using "virtual" columns in restrictions
For fixing issue #3362 we added in materialized views, in some cases,
"virtual columns" for columns which were not selected into the view.
Although these columns nominally exist in the view's schema, they must
not be visible to the user, and in commit
3f3a76aa8f we prevented a user from being
able to SELECT these columns.

In this patch we also prevent the user from being able to use these
column names (which shouldn't exist in the view) in WHERE restrictions.

Fixes #4216

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212162014.18778-1-nyh@scylladb.com>
2019-02-14 16:08:41 +02:00
Gleb Natapov
0b84b04f97 consistency_level: make it more const correct
Message-Id: <20190214122631.GF19055@scylladb.com>
2019-02-14 14:52:51 +02:00
Nadav Har'El
fec562ec8f Materialized views: limit size of row batching during bulk view building
The bulk materialized-view building processes (when adding a materialized
view to a table with existing data) currently reads the base table in
batches of 128 (view_builder::batch_size) rows. This is clearly better
than reading entire partitions (which may be huge), but still, 128 rows
may grow pretty large when we have rows with large strings or blobs,
and there is no real reason to buffer 128 rows when they are large.

Instead, when the rows we read so far exceed some size threshold (in this
patch, 1MB), we can operate on them immediately instead of waiting for
128.

As a side-effect, this patch also solves another bug: At worst case, all
the base rows of one batch may be written into one output view partition,
in one mutation. But there is a hard limit on the size of one mutation
(commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the
batch size to exceed this limit. By not batching further after 1MB,
we avoid reaching this limit when individual rows do not reach it but
128 of them did.

Fixes #4213.

This patch also includes a unit test reproducing #4213, and demonstrating
that it is now solved.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190214093424.7172-1-nyh@scylladb.com>
2019-02-14 12:04:40 +02:00
Calle Wilund
e70286a849 db/extensions: Allow schema extensions to turn themselves off
Fixes #4222

Iff an extension creation callback returns null (not exception)
we treat this as "I'm not needed" and simply ignore it.

Message-Id: <20190213124311.23238-1-calle@scylladb.com>
2019-02-13 14:50:51 +02:00
Jesse Haber-Kucharsky
74ac1deee1 build: Fix the build on Ubuntu
The way the `pkg-config` executable works on Fedora and Ubuntu is
different, since on Fedora `pkg-config` is provided by the `pkgconf`
project.

In the build directory of Seastar, `seastar.pc` and `seastar-testing.pc`
are generated. `seastar` is a requirement of `seastar-testing`.

When pkg-config is invoked like this:

    pkg-config --libs build/release/seastar-testing.pc

the version of `pkg-config` on Fedora resolves the reference to
`seastar` in `Requires` to the `seastar.pc` in the same directory.

However, the version of `pkg-config` on Ubuntu 18.04 does not:

    Package seastar was not found in the pkg-config search path.
    Perhaps you should add the directory containing `seastar.pc'
    to the PKG_CONFIG_PATH environment variable
    Package 'seastar', required by '/seastar-testing', not found

To address the divergent behavior, we set the `PKG_CONFIG_PATH` variable
to point to the directory containing `seastar.pc`. With this change, I
was able to configure Scylla on both Fedora 29 and Ubuntu 18.04.

Fixes #4218

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <d7164bde2790708425ac6761154d517404818ecd.1550002959.git.jhaberku@scylladb.com>
2019-02-13 13:33:50 +02:00
Avi Kivity
2915baeff4 Merge "Move truncation records to separate table" from Calle
"
Fixes #4083

Instead of sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. Makes query/update easier
and easier on the query memory.

The code also migrates any existing truncation
positions on startup and clears the old data.
"

* 'calle/truncation' of github.com:scylladb/seastar-dev:
  truncation_migration_test: Add rudimentary test
  system_keyspace: Add waitable for trunc. migration
  cql_test_env: Add separate config w. feature disable
  cql_test_env: Add truncation migration to init
  cql_assertions: Add null/non-null tests
  storage_service: Add features disabling for tests
  Add system.truncated documentation in docs
  commitlog_replay: Use dedicated table for truncation
  storage_service: Add "truncation_table" feature
2019-02-13 11:16:30 +02:00
Calle Wilund
2e320a456c truncation_migration_test: Add rudimentary test 2019-02-13 09:08:12 +00:00
Calle Wilund
4e657c0633 system_keyspace: Add waitable for trunc. migration
For tests. Hooray for separation of concerns.
2019-02-13 09:08:12 +00:00
Calle Wilund
b253757b17 cql_test_env: Add separate config w. feature disable 2019-02-13 09:08:12 +00:00
Calle Wilund
859a1d8f36 cql_test_env: Add truncation migration to init 2019-02-13 09:08:12 +00:00
Calle Wilund
fbcbe529ad cql_assertions: Add null/non-null tests 2019-02-13 09:08:12 +00:00
Calle Wilund
64e8c6f31d storage_service: Add features disabling for tests 2019-02-13 09:08:12 +00:00
Calle Wilund
7d3867e153 Add system.truncated documentation in docs 2019-02-13 09:08:12 +00:00
Calle Wilund
12ebcf1ec7 commitlog_replay: Use dedicated table for truncation
Fixes #4083

Instead of sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. Makes query/update easier
and easier on the query memory.

The code also migrates any existing truncation
positions on startup and clears the old data.
2019-02-13 09:08:12 +00:00
Calle Wilund
ff5e541335 storage_service: Add "truncation_table" feature 2019-02-13 09:08:12 +00:00
Avi Kivity
a3de5581ce Update seastar submodule
* seastar 428f4ac...11546d4 (9):
  > reactor: Fix an infinite loop caused by the high resolution timer not being monitored
  > build: Add back `SEASTAR_SHUFFLE_TASK_QUEUE`
  > build: Unify dependency versions
  > future-util: optimize parallel_for_each() with single element
  > core/sharded.hh: fix doxygen for "Multicore" group
  > build: switch from travis-ci to circleci
  > perftune.py: fix irqbalance tuning on Ubuntu 18
  > build: Make the use of sanitizers transitive
  > net: ipv6: fix ipv6 detection and tests by binding to loopback
2019-02-12 18:42:07 +02:00
Avi Kivity
c7aa73af51 Merge "Automatically pause shard readers when not used" from Botond
"
Recently, there has been a series of incidents of the multishard
combining reader deadlocking, when the concurrency of reads were
severely restricted and there was no timeout for the read.
Several fixes have been merged (414b14a6b, 21b4b2b9a, ee193f1ab,
170fa382f) but eliminating all occurrences of deadlocks proved to be a
whack-a-mole game. After the last bug report I have decided that instead
of trying to plug new holes as we find them, I'll try to make holes
impossible to appear in the first place. To translate this into the
multishard reader, instead of sprinkling new `reader.pause()` calls all
over the place in the multishard reader to solve the newly found
deadlocks, make the pausing of readers fully automatic on the shard
reader level. Readers are now always kept in a paused state, except when
actually used. This eliminates the entire class of deadlock bugs.

This patch-set also aims at simplifying the multishard reader code, as
well as the code of the existing `lifecycle_policy` implementations.
This effort resulted in:
* mutation_reader.cc: no change in SLOC, although it now also contains
  logic that used to be duplicated in every `lifecycle_policy`
  implementation;
* multishard_mutation_query.cc: 150 SLOC removed;
* database.cc: 30 SLOC removed;
Also the code is now (hopefully) simpler, safer and has a clearer
structure.

Fixes #4050 (main issue)
Fixes #3970
Fixes #3998 (deprecates really)
"

* 'simplify-and-fix-multishard-reader/v3.1' of https://github.com/denesb/scylla:
  query_mutations_on_all_shards(): make states light-weight
  query_mutations_on_all_shards(): get rid of read_context::paused_reader
  query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state
  query_mutations_on_all_shards(): pause looked-up readers
  query_mutation_on_all_shards(): remove unnecessary indirection
  shard_reader: auto pause readers after being used
  reader_concurrency_semaphore::inactive_read_handle: fix handle semantics
  shard_reader: make reader creation sync
  shard_reader: use semaphore directly to pause-resume
  shard_reader: recreate_reader(): fix empty range case
  foreign_reader: rip out the now unused private API
  shard_reader: move away from foreign_reader
  multishard_combining_reader: make shard_reader a shared pointer
  multishard_combining_reader: move the shard reader definition out
  multishard_combining_reader: disentangle shard_reader
2019-02-12 16:22:52 +02:00
Botond Dénes
db106a32c8 query_mutations_on_all_shards(): make states light-weight
Previously the different states a reader can be in were all separate
structs, and were joined together by a variant. When this was designed,
this made sense, as states were numerous and quite different. By this
point, however, the number of states has been reduced to 4, with 3 of them
being almost the same. Thus it makes sense to merge these states into a
single struct and keep track of the current state with an enum field.
This can theoretically increase the chances of mistakes, but in practice
I expect the opposite, due to the simpler (and less) code. Also, all the
important checks that verify that a reader is in the state expected by
the code are all left in place.
A byproduct of this change is that the amount of cross-shard writes is
greatly reduced. Whereas previously the whole state object had to be
rewritten on state change, now a single enum value has to be updated.
Cross shard reads are reduced as well to the read of a few foreign
pointers, all state-related data is now kept on the shard where the
associated reader lives.
2019-02-12 16:20:51 +02:00
Botond Dénes
65b2eb0939 query_mutations_on_all_shards(): get rid of read_context::paused_reader 2019-02-12 16:20:51 +02:00
Botond Dénes
ec44a4dbb1 query_mutations_on_all_shards(): merge the dismantling and ready_to_save states into saving state
These two states are now the same, with the artificial distinction that
all readers are promoted to ready_to_save state after the compaction
state and the combined buffer is dismantled. From a practical
perspective this distinction is meaningless so merge the two states into
a single `saving` state.
2019-02-12 16:20:51 +02:00
Botond Dénes
9a1bd24d82 query_mutations_on_all_shards(): pause looked-up readers
On the beginning of each page, all saved readers from the previous pages
(if any) are looked up, so they can be reused. Some of these saved
readers can end up not being used at all for the current page, in which
case they will needlessly sit on their permit for the duration of
filling the page. Avoid this by immediately pausing all looked-up
readers. This also allows a nice unifying of the reader saving logic, as
now *all* readers will be in a paused state when `save_reader()` is
called. Previously, looked-up, but not used readers were an exception to
this, requiring extra logic to handle both cases. This logic can now be
removed.
2019-02-12 16:20:51 +02:00
Botond Dénes
61b9ed7faf query_mutation_on_all_shards(): remove unnecessary indirection 2019-02-12 16:20:51 +02:00
Botond Dénes
9000626647 shard_reader: auto pause readers after being used
Previously it was the responsibility of the layer above (multishard
combining reader) to pause readers, which happened via an explicit
`pause()` call. This proved to be a very bad design as we kept finding
spots where the multishard reader should have paused the reader to avoid
potential deadlocks (due to starved reader concurrency semaphores), but
didn't.

This commit moves the responsibility of pausing the reader into the
shard reader. The reader is now kept in a paused state, except when it
is actually used (a `fill_buffer()` or `fast_forward_to()` call is
executing). This is fully transparent to the layer above.
As a side note, the shard reader now also hides when the reader is
created. This also used to be the responsibility of the multishard
reader, and although it caused no problems so far, it can be considered
a leak of internal details. The shard reader now automatically creates
the remote reader the first time it is used.

The code has been reorganized, such that there is now a clear separation
of responsibilities. The multishard combining reader handles the
combining of the output of the shard readers, as well as issuing
read-aheads. The shard reader handles read-ahead and creating the
remote reader when needed, as well as transferring the results of remote
reads to the "home" shard. The remote reader
(`shard_reader::remote_reader`, new in this patch) handles
pausing-resuming as well as recreating the reader after it was evicted.
Layers don't access each other's internals (like they used to).

After this commit, the reader passed to `destroy_reader()` will always
be in paused state.
2019-02-12 16:20:51 +02:00
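The auto-pause behaviour described in the commit above can be sketched with an RAII guard in plain C++. This is a hypothetical stand-in, not Scylla's actual API: the reader is resumed only for the duration of a call such as `fill_buffer()`, and paused again when the call returns, so the layer above never has to remember to call `pause()`.

```cpp
#include <cassert>

// Sketch (names illustrative): the reader is paused except while a call
// is executing. An RAII guard resumes the reader on entry and pauses it
// again on exit, making the pausing transparent to the layer above.
struct shard_reader {
    bool paused = true;
    int fills = 0;

    struct use_guard {
        shard_reader& r;
        explicit use_guard(shard_reader& reader) : r(reader) { r.paused = false; }
        ~use_guard() { r.paused = true; } // auto-pause when the call ends
    };

    void fill_buffer() {
        use_guard g(*this); // resumed only for the duration of the call
        ++fills;            // ...actual remote reading would happen here...
    }
};
```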
Botond Dénes
ab5d717052 reader_concurrency_semaphore::inactive_read_handle: fix handle semantics
That is:
* make it move only;
* make moved-from handles null handles;
* add (public) default constructor, which constructs a null handle;
2019-02-12 16:20:51 +02:00
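A minimal sketch of the handle semantics listed above, with a plain pointer standing in for the registered inactive read (the real handle's internals differ): move-only, a moved-from handle becomes null, and the default constructor yields a null handle.

```cpp
#include <cassert>
#include <utility>

// Simplified sketch of the fixed handle semantics: copying is deleted,
// moving transfers ownership and nulls the source, and a
// default-constructed handle is null.
class inactive_read_handle {
    void* _read = nullptr; // stand-in for the registered read
public:
    inactive_read_handle() = default; // null handle
    explicit inactive_read_handle(void* read) : _read(read) {}
    inactive_read_handle(const inactive_read_handle&) = delete;
    inactive_read_handle& operator=(const inactive_read_handle&) = delete;
    inactive_read_handle(inactive_read_handle&& o) noexcept
        : _read(std::exchange(o._read, nullptr)) {}
    inactive_read_handle& operator=(inactive_read_handle&& o) noexcept {
        _read = std::exchange(o._read, nullptr);
        return *this;
    }
    explicit operator bool() const noexcept { return _read != nullptr; }
};
```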
Botond Dénes
37006135dc shard_reader: make reader creation sync
Reader creation happens through the `reader_lifecycle_policy` interface,
which offers a `create_reader()` method. This method accepts a shard
parameter (among others) and returns a future. Its implementation is
expected to go to the specified shard and then return with the created
reader. The method is expected to be called from the shard where the
shard reader (and consequently the multishard reader) lives. This API,
while reasonable enough, has a serious flaw. It doesn't make batching
possible. For example, if the shard reader issues a call to the remote
shard to fill the remote reader's buffer, but finds that it was evicted
while paused, it has to come back to the local shard just to issue the
recreate call. This makes the code both convoluted and slow.
Change the reader creation API to be synchronous, that is, callable from
the shard where the reader has to be created, allowing for simple call
sites and batching.
This change requires that implementations of the lifecycle policy update
any per-reader data-structure they have from the remote shard. This is
not a problem however, as these data-structures are usually partitioned,
such that they can be accessed safely from a remote shard.
Another, very pleasant, consequence of this change is that now all
methods of the lifecycle interface are sync and thus calls to them
cannot overlap anymore.

This patch also removes the
`test_multishard_combining_reader_destroyed_with_pending_create_reader`
unit test, which is not useful anymore.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize
noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
57d1f6589c shard_reader: use semaphore directly to pause-resume
The shard reader relies on the `reader_lifecycle_policy` for pausing and
resuming the remote reader. The lifecycle policy's API was designed to
be as general as possible, allowing for any implementation of
pause/resume. However, in practice, we have a single implementation of
pause/resume: registering/unregistering the reader with the relevant
`reader_concurrency_semaphore`, and we don't expect any new
implementations to appear in the future.
Thus, the generic API of the lifecycle policy is needlessly abstract,
making its implementations needlessly complex. We can instead make this
very concrete and have the lifecycle policy just return the relevant
semaphore, removing the need for every implementor of the lifecycle
policy interface to have a duplicate implementation of the very same
logic.

For now just emulate the old interface inside shard reader. We will
overhaul the shard reader after some further changes to minimize noise.
2019-02-12 16:20:51 +02:00
Botond Dénes
fae5a2a8c8 shard_reader: recreate_reader(): fix empty range case
If the shard reader is created for a singular range (has a single
partition), and then it is evicted after reaching EOS, when recreated we
would have to create a reader that reads an empty range, since the only
partition the range has was already read. Since it is not possible to
create a reader with an empty range, we just didn't recreate the reader
in this case. This is incorrect however, as the code might still attempt
to read from this reader, if only due to a bug, and would trigger a
crash. The correct fix is to create an empty reader that will
immediately be at EOS.
2019-02-12 16:20:51 +02:00
Botond Dénes
cd807586f6 foreign_reader: rip out the now unused private API
Drop all the glue code, needed in the past so the shard reader can be
implemented on top of foreign reader. As the shard reader moved away
from foreign reader, this glue code is not needed anymore.
2019-02-12 16:20:51 +02:00
Botond Dénes
d80bc3c0a5 shard_reader: move away from foreign_reader
In the past, shard reader wrapped a foreign reader instance, adding
functionality required by the multishard reader on top. This has worked
well to a certain degree, but after the addition of pause-resume of
shard reader, the cooperation with foreign reader became more-and-more a
struggle. It has now gotten to a point where it feels like shard reader
is fighting foreign reader as much as it reuses it. This manifested
itself in the ever growing amount of glue code, and hacks baked into
foreign reader (which is supposed to be of general use), specific to
the usage in the multishard reader.
It is time we don't force this code-reuse anymore and instead implement
all the required functionality in shard reader directly.
2019-02-12 16:20:51 +02:00
Botond Dénes
da0c01c68b multishard_combining_reader: make shard_reader a shared pointer
Some members of shard reader have to be accessed even after it is
destroyed. This is required by background work that might still be
pending when the reader is destroyed. This was solved by creating a
special `state` struct, which contained all the members of the shard
readers that had to be accessed even after it was destroyed. This state
struct was managed through a shared pointer, that each continuation that
was expected to outlive the reader, held a copy of. This however created
a minefield, where each line of the code had to be carefully audited to
access only fields that were guaranteed to remain valid.
Fix this mess by making the whole class a shared pointer, with
`enable_shared_from_this`. Now each continuation just has to make sure
to keep `this` alive and code can now access all members freely (well,
almost).
2019-02-12 16:20:51 +02:00
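The lifetime fix described above can be sketched in plain C++ (names hypothetical, Seastar futures replaced by a stored callable): instead of sharing a separate `state` struct, the whole reader derives from `enable_shared_from_this`, and each background continuation captures a shared pointer to it, keeping *all* members alive until the work completes.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Sketch: background work holds a shared_ptr copy obtained via
// shared_from_this(), so even if the owning multishard reader drops its
// last reference first, `this` (and every member) stays valid.
struct shard_reader : std::enable_shared_from_this<shard_reader> {
    int buffered_fragments = 0;

    std::function<void()> start_background_fill() {
        // The continuation owns a reference to the whole object.
        return [self = shared_from_this()] { self->buffered_fragments += 1; };
    }
};
```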
Botond Dénes
f1c3421eb4 multishard_combining_reader: move the shard reader definition out
Shard reader started its life as a very thin layer above foreign reader,
with just some convenience methods added. As usual, by now it has grown
into a hairy monster, its class definition out-growing even that of the
multishard reader itself. It is time shard reader is moved into the
top-level scope, improving the readability of both classes.
2019-02-12 16:20:51 +02:00
Botond Dénes
7114b59309 multishard_combining_reader: disentangle shard_reader
Currently shard reader has a reference to the owning multishard reader
and it freely accesses its members. This resulted in a mess, where it's
not clear what exactly shard reader depends on. Disentangle this mess,
by making the shard reader self-sufficient, passing all it depends on
into its constructor.
2019-02-12 16:20:51 +02:00
Nadav Har'El
85e5791710 tests/view_schema_test: fix flakiness caused by missing eventually()
All tests that involve writing to a base table and then reading from the
view table must use the eventually() function to account for the fact that
the view update is asynchronous, and may be visible only some time after
writing the base table. Forgetting an eventually() can cause the test
to become flaky and sometimes fail because the expected data is not *yet*
in the view. Botond noticed these failures in practice in two subtests
(test_partition_key_filtering_with_slice and
test_clustering_key_in_restrictions).

This patch fixes both tests, and I also reviewed the entire source file
view_schema_test.cc and found additional places missing an eventually()
(and also places that unnecessarily used eventually() to read from the
base table), and fixed those as well.

Fixes #4212

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212121140.14679-1-nyh@scylladb.com>
2019-02-12 16:10:30 +02:00
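The real `eventually()` lives in Scylla's test helpers; the synchronous stand-in below only illustrates the retry-until-the-assertion-passes idea (the attempt count and the omitted sleep are assumptions made for brevity). Because the view update is asynchronous, a check that throws on the first attempts may legitimately pass a moment later.

```cpp
#include <cassert>
#include <stdexcept>

// Simplified stand-in for the test suite's eventually() helper: retry
// the assertion until it stops throwing or the attempts run out.
template <typename Func>
void eventually(Func check, int max_attempts = 10) {
    for (int attempt = 1;; ++attempt) {
        try {
            check(); // passes once the view has caught up
            return;
        } catch (...) {
            if (attempt == max_attempts) {
                throw; // still failing: surface the real assertion failure
            }
            // the real helper sleeps here, giving the view update time
        }
    }
}
```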
Paweł Dziepak
eb03cf00f5 sstable: write_components: drop default for encoding stats
There is no value in having a default value for the encoding_stats parameter
of write_components(). If anything it weakens the tests by encouraging
not using the real encoding stats which is not what the actual sstable
write path in Scylla does.

This patch removes the default value and makes most of the tests provide
real encoding statistics. The ones that do not are those that have no
easy way of obtaining those (and those stats are not that important for
the test itself) or there is a reason for not using those
(sstable_3_x_test::test_sstable_write_large_row uses row size thresholds
based on size with default-constructed encoding_stats).

Message-Id: <20190212124356.14878-1-pdziepak@scylladb.com>
2019-02-12 16:08:24 +02:00
Calle Wilund
4a52ed7884 commitlog: Accept recycled (not yet re-used) segments in replay
Refs #4085

Changes commitlog descriptor to both accept "Recycled-Commitlog..."
file names, and preserve said name in the descriptor.

This ensures we pick up the not-yet-used recycled segments left
from a crash for replay. The replay in turn will simply ignore
the recycled files, and post actual replay they will be deleted
as needed.

Message-Id: <20190129123311.16050-1-calle@scylladb.com>
2019-02-12 12:23:55 +02:00
Nadav Har'El
93baa334ea create-relocatable-package.py: speed up slow compression
create-relocatable-package.py currently (refs #4194) builds a compressed
tar file, but does so using a painfully slow Python implementation of gzip,
which is a problem considering the huge size (around 2 gigabytes) of Scylla's
executable. On my machine, running it for a release build of Scylla takes a
whopping 6 minutes.

Just replacing the Python compression with a pipe to an external "gzip"
process speeds up the run to just 2 minutes. But gzip is still not optimal,
using only one thread even when on a many-core machine. If we switch to
"pigz", a parallel implementation of "gzip", all cores are used and on
my machine the compression speeds up to just 23 seconds - that's 15
times faster than before this patch.

So this patch has create-relocatable-package.py use an external pigz process.
"pigz" is now required on the build system (if you want to create packages),
so is added to install-dependencies.sh.

[avi: update toolchain]
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190212090333.3970-1-nyh@scylladb.com>
2019-02-12 11:19:04 +02:00
Nadav Har'El
1cf1af1502 scylla_setup: fix non-interactive behavior
In commit ec66dd6562, in non-interactive
runs of scylla_setup all options were unintentionally set to "false",
regardless of the options passed on the scylla_setup command line. This
can lead to all sorts of wrong behaviors, and in particular one test
setup assumed it was enabling the Scylla service (which was previously
the default) but after this commit, it no longer did.

This patch restores the previous behavior: Non-interactive invocations
of scylla_setup adhere to the defaults and the command-line options,
rather than blindly choosing "false".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190211214105.32613-1-nyh@scylladb.com>
2019-02-12 10:50:00 +02:00
Gleb Natapov
26e5700819 storage_proxy: limit amount of precalculated ranges by query_ranges_to_vnodes_generator
Do not precalculate too many ranges in advance; it requires a large
allocation and usually means that a consumer of the interface is going
to do too much work in parallel.

Fixes: #3767
2019-02-12 10:45:25 +02:00
Avi Kivity
da9628c6dc auth: password_authenticator: protect against NULL salted_hash
In case salted_hash was NULL, we'd access uninitialized memory when dereferencing
the optional in get_as<>().

Protect against that by using get_opt() and failing authentication if we see a NULL.

Fixes #4168.

Tests: unit (release)
Branches: 3.0, 2.3
Message-Id: <20190211173820.8053-1-avi@scylladb.com>
2019-02-11 18:54:03 +01:00
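The defensive pattern described above can be illustrated with `std::optional` standing in for the CQL result accessor (types and names are illustrative, not Scylla's actual code): a NULL `salted_hash` column yields an empty optional, and authentication fails closed instead of dereferencing it.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Sketch of the fix: check the optional before dereferencing; a NULL
// salted_hash means authentication fails rather than reading
// uninitialized memory.
bool check_password(const std::optional<std::string>& salted_hash,
                    const std::string& candidate_hash) {
    if (!salted_hash) {
        return false; // NULL in the table: fail closed
    }
    return *salted_hash == candidate_hash;
}
```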
Botond Dénes
c9e00172e9 tests/multishard_mutation_query_test: add fuzzy test
"Fuzzy test" executes semi-random range-scans against semi-random data.
By doing so we hope to achieve a coverage of edge cases that would
be very hard to achieve by "conventional" unit tests.

Fuzzy test generates a table with a population of partitions that are
combinations of all of:
* Size of static row: none, tiny, small and large;
* Number of clustering rows: none, few, several, and lots;
* Size of clustering rows: tiny, small and large;
* Number of range deletions: few, several and lots;
* Number of rows covered by a range deletion: few, several;

As well as a partition with an extremely large static row, an extreme
number of rows, and rows of extreme size.

To avoid writing an excess amount of data, the size limit of pages is
reduced to 1KB (from the default 1MB) and the row count limit of pages
is reduced to 1000 (from the default of 10000).

The test then executes range-scans against this population. For each
range scan, a random partition range is generated, that is guaranteed to
contain at least one partition (to avoid executing mostly empty scans),
as well as a random partition-slice (row ranges). The data returned by
the query is then thoroughly validated against the population
description returned by the `create_test_table()` function.

As this test has a large degree of randomness to it, covering a
quasi-infinite input-space, it can (theoretically) fail at any time.
As such I took great care in making such failures deterministically
reproducible, based on a single random seed, which is logged to the
output in case of a failure, together with instructions on how to repeat
the particular run. The test also uses extensive logging to aid
investigations. For logging, seastar's logging mechanism is used, as
`BOOST_TEST_MESSAGE` produces unintelligible output when running with
-c > 1. Log messages are carefully tagged, so that the test produces the
least amount of noise by default, while being very explicit about what's
happening when run with `debug` or especially `trace` log levels.
2019-02-11 17:14:47 +02:00
Botond Dénes
4b2cac6f40 tests/multishard_mutation_query_test: refactor read_all_partitions_with_paged_scan()
The existing `read_all_partitions_with_paged_scan()` implementation was
tailored to the existing, simplistic test cases. Refactor it so that it
can be used in much more complex test cases:
* Allow specifying the page's `max_size`.
* Allow specifying the query range.
* Allow specifying the partition slice's ck ranges.
* Fix minor bugs in the paging logic.

To avoid churn, a backward-compatible overload is added, that retains
the old parameter set.
2019-02-11 17:14:47 +02:00
Botond Dénes
542301fdc9 tests/test_table: add advanced create_test_table() overload
This overload provides a middle ground between the very generic, but
hard-to-use "expert version" and the very restrictive and simplistic
"beginner version". It allows the user to declaratively describe the
to-be-generated population in terms of a bunch of
`std::uniform_int_distribution` objects (e.g. number of rows, size of
rows, etc.).
This allows for generating a random population in a controlled way, with
a minimum amount of boiler-plate code on the user side.
2019-02-11 17:14:47 +02:00
Botond Dénes
7e1c1c2e8c tests/test_table: make create_test_table() customizable
Allow the user to specify the population of the table in a generic and
flexible way. This patch essentially rewrites the `create_test_table()`
implementation from scratch, so that it populates the table using the
partition generator passed in by the user. Backward compatibility is
kept, by providing a `create_test_table()` overload that is identical to
the previous API. This overload is now implemented on top of the generic
overload.
2019-02-11 17:14:47 +02:00
Gleb Natapov
ecc5230de5 storage_proxy: remove old get_restricted_ranges() interface
It is not used any more.
2019-02-11 14:45:43 +02:00
Gleb Natapov
0cd9bbb71d cql3/statements/select_statement: convert index query interface to new query_ranges_to_vnodes_generator interface 2019-02-11 14:45:43 +02:00
Gleb Natapov
e6208b1cde tests: convert storage_proxy test to new query_ranges_to_vnodes_generator interface 2019-02-11 14:45:43 +02:00
Gleb Natapov
2735a85c8e storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface 2019-02-11 14:45:43 +02:00
Gleb Natapov
692a0bd000 storage_proxy: introduce new query_ranges_to_vnode_generator interface
The get_restricted_ranges() function gets query-provided key ranges
and divides them on vnode boundaries. It iterates over all ranges and
calculates all vnodes, but its users are usually interested in only
one vnode, since most likely it will be enough to populate a page. If it
is not enough, they will ask for more. This patch replaces the function
with a new interface that allows generating vnode ranges on demand
instead of precalculating all of them.
2019-02-11 14:45:43 +02:00
Avi Kivity
cb51fcab9d README: improve dbuild instructions
Add a quick start, document more options, and link from the main README.
Message-Id: <20190210154606.21739-1-avi@scylladb.com>
2019-02-11 09:25:25 +01:00
Avi Kivity
2724a66a12 docker: don't send .git during "docker build"
It's huge and useless during "docker build" operations.
Message-Id: <20190208161848.21125-1-avi@scylladb.com>
2019-02-11 09:17:14 +01:00
Glauber Costa
e0bfd1c40a allow Cassandra SSTables with counters to be imported if they are new enough
Right now Cassandra SSTables with counters cannot be imported into
Scylla.  The reason for that is that Cassandra changed their counter
representation in their 2.1 version and kept transparently supporting
both representations.  We do not support their old representation, nor
there is a sane way to figure out by looking at the data which one is in
use.

For safety, we had made the decision long ago to not import any
tables with counters: if a counter was generated in older Cassandra, we
would misrepresent them.

In this patch, I propose we offer a non-default way to import SSTables
with counters: we can gate it with a flag, and trust that the user knows
what they are doing when flipping it (at their own peril). Cassandra 2.1
is by now pretty old. Many users can safely say they've never used
anything older.

While there are tools like sstableloader that can be used to import
those counters, there are often situations in which directly importing
SSTables is either better, faster, or, worse, the only option left.  I
argue that having a flag that allows us to import them, when we are sure
it is safe, is better than having no option at all.

With this patch I was able to successfully import Cassandra tables with
counters that were generated in Cassandra 2.1, reshard and compact their
SSTables, and read the data back to get the same values in Scylla as in
Cassandra.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190210154028.12472-1-glauber@scylladb.com>
2019-02-10 17:50:48 +02:00
Glauber Costa
61ea54eff6 tools: toolchain: dbuild: use host networking
This is convenient to test scylla directly by invoking build/dev/scylla.
This needs to be done under docker because the shared objects scylla
looks for may not exist in the host system.

During quick development we may not want to go through the trouble of
packaging relocatable scylla every time to test changes.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190209021033.8400-1-glauber@scylladb.com>
2019-02-10 12:16:47 +02:00
Duarte Nunes
d2d885fb93 Merge 'Fix misdetection of remote counter shards' from Paweł
"
The code reading counter cells from sstables verifies that there are no
unsupported local or remote shards. The latter are detected by checking
if all shards are present in the counter cell header (only remote shards
do not have entries there). However, the logic responsible for doing
that was incorrectly computing the total number of counter shards in a
cell if the header was larger than a single counter shard. This resulted
in incorrect complaints that remote shards are present.

Fixes #4206

Tests: unit(release)
"

* tag 'counter-header-fix/v1' of https://github.com/pdziepak/scylla:
  tests/sstables: test counter cell header with large number of shards
  sstables/counters: fix remote counter shard detection
2019-02-10 12:16:31 +02:00
Paweł Dziepak
4eeb8eeed5 tests/sstables: test counter cell header with large number of shards
The logic responsible for reading counters from sstables was getting
confused by large headers. The size of the header depends directly on
the number of shards. This test checks that we can handle cells with a
large number of counter shards properly.
2019-02-08 17:06:31 +00:00
Paweł Dziepak
df1ac03154 sstables/counters: fix remote counter shard detection
Each counter cell has a header with an entry for each local and global
shard. The detection of remote shards is done by checking if there are
any counter shards that do not have an entry in the header. This is done
by computing the number of counter shards in a cell and comparing it to
the number of header entries. However, the computation was wrong and
included the size taken by the header itself. As a result, if the header
was as big or larger than a single counter shard Scylla incorrectly
complained about remote shards.
2019-02-08 17:04:22 +00:00
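The miscomputation above can be sketched with illustrative sizes (not the real sstable layout): the shard count must be derived from the cell size *minus* the header, otherwise a header as big as, or bigger than, a shard inflates the count and triggers a false "remote shards present" complaint.

```cpp
#include <cassert>
#include <cstddef>

// Simplified sketch of the bug and fix: exclude the header's own size
// before dividing by the per-shard size.
std::size_t shard_count(std::size_t cell_size,
                        std::size_t header_size,
                        std::size_t shard_size) {
    // buggy version: return cell_size / shard_size;
    return (cell_size - header_size) / shard_size; // fixed: exclude the header
}
```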
Glauber Costa
8ba6b569b1 relocatable python: make sure all shared objects are relocated
The interpreter as it is right now has a bug: I incorrectly assumed that
all the shared libraries that python dynamically links would be in
lib-dynload. That is not true, and at least some of them are in
site-packages.

With that, we were loading system libraries for some shared objects.
The approach taken to fix this is to just check if we're seeing a shared
library and relocate everything we see: we will end up relocating the
ones in lib64 too, but that not only should be okay, it is probably even
more fool-proof.

While doing that I noticed that I had forgotten to incorporate one of
previous feedback from Avi (that we're leaving temporary files behind).
So I'm fixing that as well.

[avi: update toolchain]
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190208115501.7234-1-glauber@scylladb.com>
2019-02-08 18:42:24 +02:00
Glauber Costa
fb742473e2 replace /usr/local as a source of packages in the python relocatable interpreter
I was playing with the python3 interpreter trying to get pip to work,
just to see how far we can go. We don't really need pip, but I figured
it would be a good stress test to make sure that the process is working
and robust.

And it didn't really work, because although pip will correctly install
things into $relocatable_root/local/lib, sys.path will still refer to a
hardcoded /usr/local. While this should not affect Scylla, since we
expect to have all our modules in our path anyway -- and that path is
searched before /usr/local, it is still dangerous to make an absolute
reference like this.

Unfortunately, /usr/local/ is included unconditionally by site.py,
which is executed when the interpreter is started and there is no
environment variable I found to change that (the help string refers to
PYTHONNOUSERSITE, but I found no mention of that in site.py whatsoever)

There is a way to tell site.py not to bother to add user sites, by
passing the -s flag, which this patch does.

Aside from doing that, we also enhance PYTHONPATH to include a reference
to ./local/{lib,lib64}/python<version>/site-packages.

After applying this patch, I was able to build an interpreter containing
only python3-pip and python3-setuptools, and build the relocatable
environment from there.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190206052104.25927-1-glauber@scylladb.com>
2019-02-08 18:41:52 +02:00
Botond Dénes
181bf64858 query: add trim_clustering_row_ranges_to()
This algorithm was already duplicated in two places
(service/pager/query_pagers.cc and mutation_reader.cc). Soon it will be
used in a third place. Instead of triplicating, move it into a function
that everybody can use.
2019-02-08 16:30:17 +02:00
Botond Dénes
bc31d8cbcc tests/test_table: add keyspace and table name params
Allow the keyspace and table names to be customizable by the caller.
2019-02-08 16:30:17 +02:00
Botond Dénes
2d885c6453 tests/test_table: s/create_test_cf/create_test_table/
Also move it to the `test` namespace.
2019-02-08 16:30:17 +02:00
Botond Dénes
c2a6ac307f tests: move create_test_cf() to tests/test_table.{hh,cc}
In the next patches `create_test_cf()` will be made much more powerful
and as such generally useful. Move it into its own files so other tests
can start using it as well.
2019-02-08 16:30:17 +02:00
Botond Dénes
2d3c4f9009 tests/multishard_mutation_query_test: drop many partition test
Soon a much better test will be added that will cover many partitions
as well and much more.
2019-02-08 16:30:17 +02:00
Botond Dénes
ced0e7ecb3 tests/multishard_mutation_query_test: drop range tombstone test
Soon a much better test will be added that will also cover range
tombstones and much more.
2019-02-08 16:30:17 +02:00
Paweł Dziepak
64b1a2caf9 tests: modernise tmpdir
tmpdir is a helper class representing a temporary directory.
Unfortunately, it suffers from some problems, such as lack of proper
encapsulation and weak typing. This has caused bugs in the past when the
user code accidentally modified the member variable with the path to the
directory.

This patch modernises tmpdir and updates its users. The path is stored
in a std::filesystem::path and available read-only to the class users.
mkdtemp and boost are replaced by standard solution.

The users are updated to use the path more (when it didn't involve too many
changes to their code) and stop using lw_shared_ptr to store the tmpdir
when it wasn't necessary.

tmpdir intentionally doesn't provide any helpers for getting the path as
a string in order to discourage weak types.

Message-Id: <20190207145727.491-1-pdziepak@scylladb.com>
2019-02-07 20:18:14 +02:00
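The encapsulation change can be sketched as follows (directory creation and cleanup are omitted; the real class wraps mkdtemp-style setup and removal): the path is stored as a `std::filesystem::path` and exposed read-only, so user code can no longer accidentally modify it.

```cpp
#include <cassert>
#include <filesystem>
#include <utility>

// Sketch: private, strongly-typed path with a const accessor, and no
// string helpers, discouraging weakly-typed use of the path.
class tmpdir {
    std::filesystem::path _path;
public:
    explicit tmpdir(std::filesystem::path p) : _path(std::move(p)) {}
    const std::filesystem::path& path() const noexcept { return _path; }
};
```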
Avi Kivity
e2e25720c1 Update seastar submodule
* seastar c3be06d...428f4ac (13):
  > build: make the "dist" test respect the build type
  > Merge 'Add support for docker --cpuset-cpus' from Juliana
  > Merge "Add support for Coroutines TS" from Paweł
  > Merge "Modernize dependency management" from Avi
  > future: propagate broken_promise exception to abandoned continuations
  > net/inet_address: avoid clang Wmissing-braces
  > build: Default to the "Release" type if unspecified
  > rpc: log an exception that may happen while processing an RPC message
  > Add a --split-dwarf option to configure.py
  > build: Fix the `StdFilesystem` module
  > Compress debug info by default
  > Add an option for building with split dwarf
  > Dockerfile: install stow
2019-02-07 20:08:15 +02:00
Paweł Dziepak
de2a447576 utils/extremum_tracking: drop default constructor
A default-constructed extremum_tracker has an uninitialised _default_value,
which basically makes it never correct to do that. Since this class is a
mechanism and not a value it doesn't really need to be a regular type,
so let's drop the default constructor.

Message-Id: <20190207162430.7460-1-pdziepak@scylladb.com>
2019-02-07 18:31:25 +02:00
Tomasz Grabiec
7184289015 Merge "Various fixes and improvements for sstables statistics" from Paweł
This series contains several fixes and improvements as well as new tests
for sstable code dealing with statistics.

 * https://github.com/pdziepak/scylla.git sstable-stats-fixes/v1-rebased:
  sstables: compaction: don't access moved-from vector of sstables
  memtable: move encoding_stats_collector implementation out of header
  sstables: seal_statistics(): pass encoding_stats by constant reference
  sstables/mc/writer: don't assume all schema columns are present
  tests/sstable3: improvements to file compare
  tests: extract mutation data model
  tests/data_model: add support for expiring atomic cells
  tests/data_model: allow specifying timestamp for row markers
  tests/memtable: test column tracking for encoding stats
  sstables: use correct source of statistics in
    get_encoding_stats_for_compaction()
  utils/extremum_tracking: preserve "not-set" status on merge
  sstables/metadata_collector: move the default values to the global
    tracker
  tests/sstables: test for reading serialisation header
  tests/sstables: pass encoding stats to write_components()
  tests/sstable: test merging encoding_stats

Fixes #4202.
2019-02-07 12:35:29 +01:00
Paweł Dziepak
67252de195 tests/sstable: test merging encoding_stats 2019-02-07 10:17:06 +00:00
Paweł Dziepak
e25603fbf7 tests/sstables: pass encoding stats to write_components()
By default write_components() uses a safe default for encoding_stats
which indicates that all columns are present. This may hide some bugs, so
let's pass the real thing in the tests where this may matter.
2019-02-07 10:17:06 +00:00
Paweł Dziepak
d44d5ebf86 tests/sstables: test for reading serialisation header 2019-02-07 10:17:06 +00:00
Paweł Dziepak
ebf667fb9c sstables/metadata_collector: move the default values to the global tracker
column_stats is a per-partition tracker, while metadata_collector is the
global one. The statistics gathered by column_stats are merged into the
metadata_collector. In order to ensure that we get proper default values
in case no value of particular kind (e.g. no TTLs) was seen they need to
be set on the global tracker, not the per-partition one.
2019-02-07 10:16:50 +00:00
Paweł Dziepak
2680022df0 utils/extremum_tracking: preserve "not-set" status on merge
extremum_tracker allows choosing a default value that's going to be used
only if no "real" values were provided. Since it is never compared with
the actual input values it can be anything. For instance, if the minimum
tracker default value is 0 and there was one update with the value 1 the
detected minimum is going to be 1 (the default is ignored).

However, this doesn't work when the trackers are merged since that
process always leaves the destination tracker in the "set" state
regardless of whether any of the merged trackers has ever seen any value.

This patch fixes that by properly preserving the _is_set state on
merge.
2019-02-07 10:16:50 +00:00
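The merge fix above can be sketched with a simplified minimum tracker (the real `extremum_tracker` is a template tracking both extrema): `merge()` must consult the source tracker's set flag, so merging never-updated trackers leaves the destination in the "not-set" state and the default value survives.

```cpp
#include <algorithm>
#include <cassert>

// Simplified sketch: _is_set distinguishes "never saw a value" from any
// actual tracked value, and merge() skips not-set sources.
class min_tracker {
    int _value = 0;
    bool _is_set = false;
public:
    void update(int v) {
        _value = _is_set ? std::min(_value, v) : v;
        _is_set = true;
    }
    void merge(const min_tracker& other) {
        if (other._is_set) { // the fix: ignore trackers that saw no values
            update(other._value);
        }
    }
    int get(int default_value) const { return _is_set ? _value : default_value; }
};
```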
Paweł Dziepak
84d8ee35d4 sstables: use correct source of statistics in get_encoding_stats_for_compaction()
The sstable class is responsible for many more things than it should be. In
particular, it takes care of both writing and reading sstables. The
problem this causes is that it is very easy to confuse the two.

This is what happened in get_encoding_stats_for_compaction().
Originally, it used _c_stats as the source of the statistics, which
is used only during writes, and per-partition at that. Needless to say, the
returned encoding_stats were bogus.

The correct source of those statistics is get_stats_metadata().
2019-02-07 10:16:50 +00:00
Paweł Dziepak
e315448d0a tests/memtable: test column tracking for encoding stats 2019-02-07 10:16:50 +00:00
Paweł Dziepak
591d5195a9 tests/data_model: allow specifying timestamp for row markers 2019-02-07 10:16:50 +00:00
Paweł Dziepak
b07cba6a89 tests/data_model: add support for expiring atomic cells 2019-02-07 10:16:50 +00:00
Paweł Dziepak
aab0b7360f tests: extract mutation data model 2019-02-07 10:16:50 +00:00
Paweł Dziepak
fa216be260 tests/sstable3: improvements to file compare
This patch introduces some improvements to file comparison:
 - exception flags are set so that any error triggers an exception and
   is guaranteed not to be silently ignored
 - std::ios_base::binary flag is passed to open()
 - istreambuf_iterator is used instead of istream_iterator. It is better
   suited for comparing binary data.
2019-02-07 10:16:50 +00:00
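The pitfalls this patch addresses, text-mode translation and silently ignored I/O errors, are language-agnostic. A minimal sketch of a byte-wise file comparison, here in Python rather than the patch's C++ (the function name is hypothetical):

```python
import os
import tempfile


def files_equal(path_a, path_b, chunk=4096):
    # Open in binary mode so newline translation cannot corrupt the
    # comparison; open()/read() raise on I/O errors instead of
    # silently ignoring them.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return False
            if not ca:           # both streams hit EOF at the same point
                return True


# Small demonstration: two files differing only in the last byte.
d = tempfile.mkdtemp()
p_same = os.path.join(d, "a.bin")
p_diff = os.path.join(d, "b.bin")
with open(p_same, "wb") as f:
    f.write(b"\x00\x01\r\n\x02")
with open(p_diff, "wb") as f:
    f.write(b"\x00\x01\r\n\x03")
```

The `\r\n` bytes in the sample data are exactly what a text-mode comparison could mangle on some platforms.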
Paweł Dziepak
bc61471132 sstables/mc/writer: don't assume all schema columns are present
The writer constructor prepares lists of present static and regular
columns, those should be used for any further checks.
2019-02-07 10:16:50 +00:00
Paweł Dziepak
0132bcc035 sstables: seal_statistics(): pass encoding_stats by constant reference 2019-02-07 10:16:50 +00:00
Paweł Dziepak
341f186933 memtable: move encoding_stats_collector implementation out of header 2019-02-07 10:16:50 +00:00
Paweł Dziepak
6d5c1a9813 sstables: compaction: don't access moved-from vector of sstables 2019-02-07 10:16:50 +00:00
Paweł Dziepak
a8a45a243b tests/cql_test_env: don't override tmpdir::path
The tmpdir::path interface isn't properly encapsulated and its users can
modify the path even though they really shouldn't. This can happen
accidentally: in cql_test_env a reference to tmpdir::path was created
and later assigned to in one of the code paths. This caused the tmpdir
destructor to remove the wrong directory at program exit.

This patch solves the problem by avoiding referencing tmpdir::path, a
copy is perfectly acceptable considering that this is tests-only code.

Message-Id: <20190206173046.26801-1-pdziepak@scylladb.com>
2019-02-06 20:55:40 +02:00
Takuya ASADA
96b1cb97ba dist/ami: don't cleanup build dir
rm -rf build/* was meant to start rpm building from a clean state, but it also
deleted the built scylla binaries, so it was not a good idea.

Instead of rm -rf build/*, we can check for file existence in the cloned
directory; if it looks good we can reuse it.
We also need to run git pull on each package repo since it may not
include the latest commit.

Fixes #4189

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190206101755.2056-1-syuu@scylladb.com>
2019-02-06 15:33:09 +02:00
Nadav Har'El
3e7dc7230d build_deb.sh: fix error message
The error message was apparently copied from the RPM script. Fix it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190205162148.20698-1-nyh@scylladb.com>
2019-02-05 18:22:36 +02:00
Avi Kivity
54748ad15b Merge "Allow non-key IN restrictions" from Piotr
"
Fixes #4193
Fixes #3795

This series enables handling IN restrictions for regular columns,
which is needed by both filtering and indexing mechanisms.

Tests: unit (release)
"

* 'allow_non_key_in_restrictions' of https://github.com/psarna/scylla:
  tests: add filtering with IN restriction test
  cql3: remove unused can_have_only_one_value function
  cql3: allow non-key IN restrictions
2019-02-05 17:30:35 +02:00
Piotr Sarna
45db5da51b tests: add filtering with IN restriction test
Test case for filtering regular columns with IN restriction is added.
2019-02-05 16:04:17 +01:00
Piotr Sarna
36609d1376 cql3: remove unused can_have_only_one_value function 2019-02-05 16:04:17 +01:00
Piotr Sarna
c178ed8b16 cql3: allow non-key IN restrictions
Restricting a regular column with IN restriction is a perfectly
valid case for filtering and indexing, so it should be allowed.

Fixes #4193
Fixes #3795
2019-02-05 15:50:17 +01:00
Rafael Ávila de Espíndola
84542dadfa sstables: delete_atomically: don't drop futures
We still allow the deletion of rows from system.large_partition to run
in parallel with the sstable deletion, but now we return a future that
waits for both.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190205001526.68774-1-espindola@scylladb.com>
2019-02-05 16:47:58 +02:00
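The fix described above, running two operations in parallel without dropping either future, corresponds to this asyncio sketch (the coroutine names are stand-ins, not Scylla code):

```python
import asyncio


async def delete_large_partition_rows():
    await asyncio.sleep(0)       # stand-in for the system table delete
    return "rows-deleted"


async def delete_sstable_files():
    await asyncio.sleep(0)       # stand-in for removing sstable components
    return "files-deleted"


async def delete_atomically():
    # Both deletions still run in parallel, but instead of
    # fire-and-forget we return a future that completes only when
    # both are done, so errors and completion are not lost.
    return await asyncio.gather(delete_large_partition_rows(),
                                delete_sstable_files())


result = asyncio.run(delete_atomically())
```

Dropping one of the futures (the pre-fix behavior) would be the equivalent of creating a task and never awaiting it.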
Calle Wilund
ba6a8ef35b tls: Use a default prio string disabling TLS1.0 forcing min 128bits
Fixes #4010

Unless the user sets this explicitly, we should try to explicitly avoid
deprecated protocol versions. While gnutls should do this for
connections initiated this way, clients such as drivers etc. might
use obsolete versions.

Message-Id: <20190107131513.30197-1-calle@scylladb.com>
2019-02-05 15:34:18 +02:00
Avi Kivity
6c71eae63f Merge "API: Stream compaction history records" from Amnon
"
get_compaction_history can return a lot of records which will add up to a
big http reply.

This series makes sure it will not create large allocations when
returning the results.

It adds an api to the query_processor to use paged queries with a
consumer function that returns a future; this way we can use the http
stream after each record.

This implementation will prevent large allocations and stalls.

Fixes #4152
"

* 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev:
  tests/query_processor_test: add query_with_consumer_test
  system_keyspace, api: stream get_compaction_history
  query_processor: query and for_each_cql_result with future
2019-02-05 14:16:36 +02:00
Avi Kivity
ebf179318c Merge "SI: Add virtual columns to underlying MV" from Duarte
"
Virtual columns are MV-specific columns that contribute to the
liveness of view rows. However, we were not adding those columns when
creating an index's underlying MV, causing indexes to miss base rows.

Fixes #4144
Branches: master, branch-3.0
"

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

* 'sec-index/virtual-columns/v1' of https://github.com/duarten/scylla:
  tests/secondary_index_test: Add reproducer for #4144
  index/secondary_index_manager: Add virtual columns to MV
2019-02-05 13:26:45 +02:00
Avi Kivity
367ef8d318 Merge "provide our own, relocatable, python3 interpreter" from Glauber
"

We would like to deploy Scylla in constrained environments where
internet access is not permitted. In those environments it is not
possible to acquire the dependencies of Scylla from external repos and
the packages have to be sent alongside with its dependencies.

In older distributions, like CentOS7 there isn't a python3 interpreter
available. And while we can package one from EPEL this tends to break in
practice when installing the software in older patchlevels (for
instance, installing into RHEL7.3 when the latest is RHEL7.5).

The reason for that, as we saw in practice, is that EPEL may
not respect RHEL patchlevels and have the python interpreter depending
on newer versions of some system libraries.

virtualenv can be used to create isolated python environments, but it is
not designed for full isolation and I hit at least two roadblocks in
practice:

1) It doesn't copy the files, linking some instead. There is an
   --always-copy option but it is broken (for years) in some
   distributions.
2) Even when the above works, it still doesn't copy some files, relying
   on the system files instead (one sad example was the subprocess
   module that was just kept in the system and not moved to the
   virtualenv)

This patch solves that problem by creating a python3 environment in a
directory with the modules that Scylla uses, and nothing else. It is
essentially doing what virtualenv should do but doesn't. Once this
environment is assembled the binaries are made relocatable the same
way the Scylla binary is.

One difference (for now) between the Scylla binary relocation process
and ours is that we steer away from LD_LIBRARY_PATH: the environment
variable is inherited by any child process stemming from the caller,
which means that we would be unable to use the subprocess module to call
system binaries like mkfs (which our scripts do a lot). Instead, we rely
on RUNPATH to tell the binary where to search for its libraries.

Once we generate an archive with the python3 interpreter, we then
package it as an rpm with barely any dependencies. The dependencies listed
are:

$ rpm -qpR scylla-relocatable-python3-3.6.7-1.el7.x86_64.rpm
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PartialHardlinkSets) <= 4.0.4-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsXz) <= 5.2-1

And the total size of that rpm, with all modules scylla needs is 20MB.

The Scylla rpm now has a far more modest dependency list:

$ rpm -qpR scylla-server-666.development-0.20190121.80b7c7953.el7.x86_64.rpm | sort | uniq
/bin/sh
curl
file
hwloc
kernel >= 3.10.0-514
mdadm
pciutils
rpmlib(CompressedFileNames) <= 3.0.4-1
rpmlib(FileDigests) <= 4.6.0-1
rpmlib(PayloadFilesHavePrefix) <= 4.0-1
rpmlib(PayloadIsXz) <= 5.2-1
scylla-conf
scylla-relocatable-python3 <== our python3 package.
systemd-libs
util-linux
xfsprogs

I have tested this end to end by generating RPMs from our master branch,
then installing them in a clean CentOS7.3 installation without even
using yum, just rpm -Uhv <package_list>

Then I called scylla_setup to make sure all python scripts were working
and started Scylla successfully.
"

* 'scylla-python3-v5' of github.com:glommer/scylla:
  Create a relocatable python3 interpreter
  spec file: fix python3 dependency list.
  fixup scripts before installing them to their final location
  automatically relocate python scripts
  make scyllatop relocatable
  use relative paths for installing scylla and iotune binaries
2019-02-05 12:53:34 +02:00
Amnon Heiman
c96c3ce9e8 tests/query_processor_test: add query_with_consumer_test
This patch adds a unit test for querying with a consumer function.

Query with consumer uses paging; the tests cover scenarios where
the number of rows is below and above the page size, and also test the
option to stop in the middle of reading.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 12:35:53 +02:00
Amnon Heiman
6c7742d616 system_keyspace, api: stream get_compaction_history
get_compaction_history can return a big chunk of data.

To prevent large memory allocations, get_compaction_history now reads
each compaction_history record and uses the http stream to send it.

Fixes #4152

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 11:14:53 +02:00
Amnon Heiman
c0e3b7673d query_processor: query and for_each_cql_result with future
query and for_each_cql_result accept a function that reads a row and
returns a stop_iteration.

This implementation of those functions takes a function that returns a
future of stop_iteration, allowing preemption between calls.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 11:14:53 +02:00
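The consumer-driven paged iteration described in the commit above can be sketched synchronously like this (a Python analogue; `StopIteration_` mimics seastar's stop_iteration enum, and all names are illustrative):

```python
from enum import Enum


class StopIteration_(Enum):      # analogue of seastar::stop_iteration
    no = 0
    yes = 1


def for_each_result(rows, consumer, page_size=2):
    """Feed rows to `consumer` page by page; stop early if it asks to."""
    for start in range(0, len(rows), page_size):
        for row in rows[start:start + page_size]:
            if consumer(row) is StopIteration_.yes:
                return
        # A real implementation would yield to the scheduler here,
        # between pages, which is what the future-returning consumer
        # variant enables.


seen = []


def consumer(row):
    seen.append(row)
    return StopIteration_.yes if row == 3 else StopIteration_.no


for_each_result([1, 2, 3, 4, 5], consumer)
```

Here the consumer stops the iteration mid-page, so only the first three rows are consumed.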
Glauber Costa
afed2cddae Create a relocatable python3 interpreter
We would like to deploy Scylla in constrained environments where
internet access is not permitted. In those environments it is not
possible to acquire the dependencies of Scylla from external repos and
the packages have to be sent alongside with its dependencies.

In older distributions, like CentOS7 there isn't a python3 interpreter
available. And while we can package one from EPEL this tends to break in
practice when installing the software in older patchlevels (for
instance, installing into RHEL7.3 when the latest is RHEL7.5).

The reason for that, as we saw in practice, is that EPEL may
not respect RHEL patchlevels and have the python interpreter depending
on newer versions of some system libraries.

virtualenv can be used to create isolated python environments, but it is
not designed for full isolation and I hit at least two roadblocks in
practice:

1) It doesn't copy the files, linking some instead. There is an
  --always-copy option but it is broken (for years) in some
  distributions.
2) Even when the above works, it still doesn't copy some files, relying
   on the system files instead (one sad example was the subprocess
   module that was just kept in the system and not moved to the
   virtualenv)

This patch solves that problem by creating a python3 environment in a
directory with the modules that Scylla uses, and nothing else. It is
essentially doing what virtualenv should do but doesn't. Once this
environment is assembled the binaries are made relocatable the same
way the Scylla binary is.

One difference (for now) between the Scylla binary relocation process
and ours is that we steer away from LD_LIBRARY_PATH: the environment
variable is inherited by any child process stemming from the caller,
which means that we would be unable to use the subprocess module to call
system binaries like mkfs (which our scripts do a lot). Instead, we rely
on RUNPATH to tell the binary where to search for its libraries.

In terms of the python interpreter, PYTHONPATH does not need to be set
for this to work as the python interpreter will include the lib
directory in its PYTHONPATH. To confirm this, we executed the following
code:

    bin/python3 -c "import sys; print('\n'.join(sys.path))"

with the interpreter unpacked to both /home/centos/glaubertmp/test/ and
/tmp. It yields respectively:

    /home/centos/glaubertmp/test/lib64/python36.zip
    /home/centos/glaubertmp/test/lib64/python3.6
    /home/centos/glaubertmp/test/lib64/python3.6/lib-dynload
    /home/centos/glaubertmp/test/lib64/python3.6/site-packages

and

    /tmp/python/lib64/python36.zip
    /tmp/python/lib64/python3.6
    /tmp/python/lib64/python3.6/lib-dynload
    /tmp/python/lib64/python3.6/site-packages

This was tested by moving the .tar.gz generated on my Fedora28 laptop to
a CentOS machine without python3 installed. I could then invoke
./scylla_python_env/python3 and use the interpreter to call 'ls' through
the subprocess module.

I have also tested that we can successfully import all the modules we listed
for installation and that we can read a sample yaml file (since PyYAML depends
on the system's libyaml, we know that this works).

Time to build:
real	0m15.935s
user	0m15.198s
sys	0m0.382s

Final archive size (uncompressed): 81MB
Final archive size (compressed)  : 25MB

Signed-off-by: Glauber Costa <glauber@scylladb.com>
--
v3:
- rewrite in python3
- do not use temporary directories, add directly to the archive. Only the python binary
  has to be materialized
- Use --cacheonly for repoquery, and also repoquery --list in a second step to grab the file list
v2:
- do not use yum, resolve dependencies from installed packages instead
- move to scripts as Avi wants this not only for old offline CentOS
2019-02-04 18:02:40 -05:00
Glauber Costa
f757b42ba7 spec file: fix python3 dependency list.
The dependency list as it was did not reflect the fact that scyllatop is
now written in python3.

Some packages, like urwid, should use the python3 version. CentOS
doesn't really have an urwid package for python3, not even in EPEL. So
this officially marks the point at which we can't build packages that
will install on CentOS7 anyway.

Luckily, we will soon be providing our own python3 interpreter. But for
now, as a first step, simplify the dependency list by removing the
CentOS/Fedora conditional and listing the full python3 list

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 18:02:40 -05:00
Glauber Costa
7052028752 fixup scripts before installing them to their final location
Before installing python files to their final location in install.sh,
replace them with a thunk so that they can work with our python3
interpreter.  The way the thunk works, they will also work without our
python3 interpreter so unconditionally fixing them up is always safe.

In this patch I opt for fixing things up only at install time, to simplify
developers' lives; they won't have to worry about this at all.

Note about the rpm .spec file: since we are relying on a specific format
for the shebangs, we shouldn't let rpmbuild mess with them. Therefore,
we need to disable a global variable that controls that behavior (by
default, Fedora rpmbuild rewrites all shebangs to /usr/bin/python3).

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 18:02:40 -05:00
Glauber Costa
3869628429 automatically relocate python scripts
Given a python script at $DIR/script.py, this copies the script to
$DIR/libexec/script.py.bin, fixes its shebang to use /usr/bin/env instead
of an absolute path for the interpreter and replaces the original script
with a thunk that calls into that script.

PYTHONPATH is adjusted so that the original directory containing the script
can also serve as a source of modules, as would be originally intended.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 18:02:39 -05:00
Glauber Costa
1bb65a0888 make scyllatop relocatable
Right now the binary we distribute with scyllatop calls into
/usr/lib/scylla/scyllatop/scyllatop.py unconditionally. Calling that is
all that this binary does.

This poses a problem to our relocatable process, since we don't want
to be referring to absolute paths (And moreover, that is calling python
whereas it should be calling python3)

The scyllatop.py file includes a python3 shebang and is executable.
Therefore, it is best to just create a link to that file and execute it
directly.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 16:12:46 -05:00
Glauber Costa
e890b8af09 use relative paths for installing scylla and iotune binaries
The answer is yes: if we install them in $root/opt, we should link
to $root/opt

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2019-02-04 14:33:51 -05:00
Piotr Jastrzebski
834bec5cc9 Read shard awareness columns as dropped
Without this, a new version of Scylla won't be able to
start with system tables inherited from an older version
that had shard awareness columns.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>
2019-02-04 18:43:11 +02:00
Rafael Ávila de Espíndola
bbd9dfcba7 Add a --split-dwarf option to configure.py
It is off by default as it conflicts with distcc.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190204002706.15540-1-espindola@scylladb.com>
2019-02-04 18:42:16 +02:00
Benny Halevy
a9e1e0233a Add a dev build mode to test.py
Message-Id: <20190204162112.7471-2-espindola@scylladb.com>
2019-02-04 18:38:23 +02:00
Rafael Ávila de Espíndola
6243443591 Add a dev build mode
The build times I got with a clean ccache were:

ninja dev      10806.89s user  678.29s system 2805% cpu  6:49.33 total
ninja release  28906.37s user 1094.53s system 2378% cpu 21:01.27 total
ninja debug    18611.17s user 1405.66s system 2310% cpu 14:26.52 total

With this version -gz is not passed to seastar's configure. It should
probably be seastar's configure responsibility to do that and I will
send a separate patch to do it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190204162112.7471-1-espindola@scylladb.com>
2019-02-04 18:38:22 +02:00
Calle Wilund
9cadbaa96f commitlog_replayer: Bugfix: finding truncation positions uses local var ref
"uuid" was captured by reference in a continuation. This works 99.9% of the
time because the continuation is not actually delayed (and assuming we
begin the checks with non-truncated (system) cf:s it works).
But if the continuation is delayed, the resulting cf map will be
borked.

Fixes #4187.

Message-Id: <20190204141831.3387-1-calle@scylladb.com>
2019-02-04 16:51:13 +02:00
Rafael Ávila de Espíndola
15a515a39b build: Don't link utils/gz/gen_crc_combine_table with seastar
It doesn't use seastar, so there is no point in linking with it.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190203214145.43009-1-espindola@scylladb.com>
2019-02-04 15:43:16 +02:00
Botond Dénes
2a67355ded multishard_combining_reader: better shard selection algorithm
The multishard reader has to combine the output of all shards into a
single fragment stream. To do that, each time a `partition_start` is
read it has to check if there is another partition, from another shard,
that has to be emitted before this partition. Currently for this it
uses the partitioner. At every partition start fragment it checks if the
token falls into the current shard sub-range. The shard sub-range is the
continuous range of tokens, where each token belongs to the same shard.
If the partition doesn't belong to the current shard sub-range, the
multishard reader assumes the following shard sub-range, belonging to the
next shard, will have data and moves over to it. This assumption only
holds for very dense tables, however, and fails miserably on less dense
tables, resulting in the multishard reader effectively iterating over
the shard sub-ranges (4096 in the worst case), only to find data in just
a few of them. This resulted in high user-perceived latency when
scanning a sparse table.

This patch replaces this algorithm with one based on a shard heap. The
shards are now organized into a min-heap, by the next token they have
data for. When a partition start fragment is read from the current
shard, its token is compared to the smallest token in the shard heap. If
smaller, we continue to read from the current shard. Otherwise we move
to the shard with the smallest token. When constructing the reader, or
after fast-forwarding, we don't know what first token each reader will
produce. To avoid reading in a partition from each reader, we assume
each reader will produce its first token from the first shard sub-range
that overlaps with the query range. This algorithm performs much better
on sparse tables, while also being slightly better on dense tables.

I did only a very rough measurement using CQL tracing. I populated a
table with four rows on a 64 shards machine, then scanned the entire
table.
Time to scan the table (microseconds):
before 27'846
after   5'248

Fixes: #4125

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <d559f887b650ab8caa79ad4d45fa2b7adc39462d.1548846019.git.bdenes@scylladb.com>
2019-02-04 14:10:23 +02:00
Piotr Sarna
11e6d88ca7 tests: supplement filtering collections with more cases
Filtering test cases for collections are supplemented with
checking whether CONTAINS works correctly for sets and maps.

Message-Id: <4a684152cdcdb65e1415ba5859699cb324312c2b.1548837150.git.sarna@scylladb.com>
2019-02-03 17:19:30 +02:00
Avi Kivity
468f8c7ee7 Merge "Print a warning if a row is too large" from Rafael
"
This is a first step in fixing #3988.
"

* 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla:
  Rename large_partition_handler
  Print a warning if a row is too large
  Remove defaut parameter value
  Rename _threshold_bytes to _partition_threshold_bytes
  keys: add schema-aware printing for clustering_key_prefix
2019-02-03 13:57:42 +02:00
Nadav Har'El
5a695b8029 Materialized views: fix three error messages
Three error messages were supposed to include a column name, but a "{}"
was missing in the format so the given column name didn't actually appear
in the error message. So this patch adds the missing {}'s.

Fixes #4183.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190203112100.13031-1-nyh@scylladb.com>
2019-02-03 12:23:29 +01:00
Tomasz Grabiec
72dd6f54e3 gdb: Print total amount of memory used by small and large allocations
Message-Id: <1548956406-7601-2-git-send-email-tgrabiec@scylladb.com>
2019-02-01 13:18:16 +00:00
Tomasz Grabiec
f48fa542fc gdb: Extend 'scylla memory' to show memory used by large allocations
Adds new columns to the "Page spans" table, named "large [B]" and
"[spans]", which show how much memory is allocated in spans of a given
size. Spans used by small pools are excluded.

Useful for determining the size of the large allocations which
consume the memory.

Example output:

Page spans:
index      size [B]      free [B]     large [B] [spans]
    0          4096          4096          4096       1
    1          8192         32768             0       0
    2         16384         16384             0       0
    3         32768         98304       2785280      85
    4         65536         65536       1900544      29
    5        131072        524288     471597056    3598
...
   31 8796093022208             0             0       0
Large allocations: 484675584 [B]
Message-Id: <1548956406-7601-1-git-send-email-tgrabiec@scylladb.com>
2019-02-01 13:18:01 +00:00
Asias He
28d6d117d2 migration_manager: Fix nullptr dereference in maybe_schedule_schema_pull
Commit 976324bbb8 changed the code to use
get_application_state_ptr to get a pointer to the application_state. It
may return nullptr, which is then dereferenced unconditionally.

In resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test, we saw:

   4 nodes in the tests

   n1, n2, n3, n4 are started

   n1 is stopped

   n1 is changed to use different shard config

   n1 is restarted ( 2019-01-27 04:56:00,377 )

The backtrace happened on n2 right after n1 restarted:

   0 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature STREAM_WITH_RPC_STREAM is enabled
   1 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature WRITE_FAILURE_REPLY is enabled
   2 INFO 2019-01-27 04:56:05,175 [shard 0] gossip - Feature XXHASH is enabled
   3 WARN 2019-01-27 04:56:05,177 [shard 0] gossip - Fail to send EchoMessage to 127.0.58.1: seastar::rpc::closed_error (connection is closed)
   4 INFO 2019-01-27 04:56:05,205 [shard 0] gossip - InetAddress 127.0.58.1 is now UP, status =
   5 Segmentation fault on shard 0.
   6 Backtrace:
   7 0x00000000041c0782
   8 0x00000000040d9a8c
   9 0x00000000040d9d35
   10 0x00000000040d9d83
   11 /lib64/libpthread.so.0+0x00000000000121af
   12 0x0000000001a8ac0e
   13 0x00000000040ba39e
   14 0x00000000040ba561
   15 0x000000000418c247
   16 0x0000000004265437
   17 0x000000000054766e
   18 /lib64/libc.so.6+0x0000000000020f29
   19 0x00000000005b17d9

We do not know exactly when this backtrace happened, but according to the logs from n3 and n4:

   INFO 2019-01-27 04:56:22,154 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL
   INFO 2019-01-27 04:56:21,594 [shard 0] gossip - InetAddress 127.0.58.2 is now DOWN, status = NORMAL

We can be sure the backtrace on n2 happened before 04:56:21 minus 19 seconds (the
delay before gossip notices that a peer is down), so the abort time is around 04:56:0X.
The migration_manager::maybe_schedule_schema_pull call that triggers the backtrace
must have been scheduled before n1 restarted, because it dereferences the
application_state pointer after sleeping 60 seconds, so
maybe_schedule_schema_pull was called around 04:55:0X, which is before n1
restarted.

So my theory is: migration_manager::maybe_schedule_schema_pull is scheduled; at this time
n1 has the SCHEMA application state. When n1 restarts, n2 gets new application
state from n1 which does not have SCHEMA yet. When maybe_schedule_schema_pull
wakes up from the 60-second sleep, n1 has a non-empty endpoint_state but an empty
application_state for SCHEMA. We dereference the nullptr
application_state and abort.

Fixes: #4148
Tests: resharding_test.py:ReshardingTest_nodes4_with_SizeTieredCompactionStrategy.resharding_by_smp_increase_test
Message-Id: <9ef33277483ae193a49c5f441486ee6e045d766b.1548896554.git.asias@scylladb.com>
2019-02-01 09:01:08 +02:00
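The fix implied by this analysis, re-checking the application state after the delay instead of dereferencing it unconditionally, can be sketched as follows (an asyncio analogue with hypothetical names; the real code deals with a C++ pointer that may be nullptr):

```python
import asyncio

# Gossip state per node; a restarted peer may momentarily have no
# SCHEMA entry, which is the situation described in the commit.
state = {"n1": {"SCHEMA": "v1"}}


def get_application_state_ptr(node, key):
    # May return None, e.g. after the peer restarted with fresh
    # gossip state (analogue of returning nullptr).
    return state.get(node, {}).get(key)


async def maybe_schedule_schema_pull(node):
    await asyncio.sleep(0)       # stand-in for the 60-second delay
    value = get_application_state_ptr(node, "SCHEMA")
    if value is None:            # the fix: check before dereferencing
        return "skipped: no SCHEMA state"
    return "pull schema " + value


state["n1"] = {}                 # n1 restarted; SCHEMA not gossiped yet
outcome = asyncio.run(maybe_schedule_schema_pull("n1"))
```

The key point is that the state fetched before the sleep cannot be trusted after it; it must be re-fetched and re-validated.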
Piotr Jastrzebski
ad217bbdc7 Revert "system_keyspace: add sharding information to local table"
This reverts commit bdce561ada.

Those columns are not used and cause problems with tools.

Refs #4112
Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>
2019-01-31 19:06:55 +01:00
Avi Kivity
9adf46b50e Update seastar submodule
* seastar 2f35731...c3be06d (1):
  > rpc: support closing streaming when only sink or source was created

Ref #4124.
2019-01-31 12:39:02 +02:00
Nadav Har'El
7b9b7f8ebc docs/metrics.md: document syntax for choosing specific instance/shard
As another useful example of Prometheus syntax, show the syntax of plotting
a graph for one particular node or shard.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190129221607.11813-1-nyh@scylladb.com>
2019-01-31 12:37:30 +02:00
Asias He
9d9ecda619 repair: Log keyspace and table name in repair_cf_range
When a repair failed, we saw logs like:

   repair - Checksum of range (8235770168569320790, 8235957818553794560] on
   127.0.0.1 failed: std::bad_alloc (std::bad_alloc)

It is hard to tell which keyspace and table have failed.

To fix, log the keyspace and table name. This is useful to know when debugging.

Fixes #4166
Message-Id: <8424d314125b88bf5378ea02a703b0f82c2daeda.1548818669.git.asias@scylladb.com>
2019-01-31 12:36:46 +02:00
Gleb Natapov
a70374d982 messaging_service: do not forget to close stream when sending it to another side failed
Fixes #4124

Message-Id: <20190131091857.GC3172@scylladb.com>
2019-01-31 12:01:56 +02:00
Piotr Jastrzebski
4b47094f30 Prevent undefined behaviour while writing range tombstones in LA/KA
Stop calling .remove_suffix on empty string_view.

ck_bview can be empty because this function can be
called for a half open range tombstone.

It is impossible to write such range tombstones to LA/KA SSTables
so we should throw a proper exception instead of allowing
an undefined behaviour.

Refs #4113

Tests: unit(release)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c3738916953e4b10812aed95e645c739b4c29462.1548777086.git.piotr@scylladb.com>
2019-01-31 10:58:19 +01:00
Glauber Costa
94ead559f7 move scylla-housekeeping to dist/common/scripts
All of our python scripts are there and they are all installed
automatically into /usr/lib/scylla. By keeping scylla-housekeeping
separately we are just complicating our build process.

This would be just a minor annoyance but this broke the new relocatable
process for python3 that I am trying to put together because I forgot to
add the new location as a source for the scripts.

Therefore, I propose we start being more diligent with this and keeping
all scripts together for the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190123191732.32126-2-glauber@scylladb.com>
2019-01-31 11:44:34 +02:00
Jesse Haber-Kucharsky
c37aa258c5 build: Fix incremental builds when Seastar changes
When a file in the `seastar` directory changes, we want to minimize the
amount of Scylla artifacts that are re-built while ensuring that all
changes in Seastar are reflected in Scylla correctly.

For compiling object files, we change Seastar to be an "order only"
dependency so that changes to Seastar don't trigger unnecessary builds.

For linking, we add an "implicit" dependency on Seastar so that Scylla
is re-linked when Seastar changes.

With these changes, modifying a Seastar header file will trigger the
recompilation of the affected Scylla object files, and modifying a
Seastar source file will trigger linking only.

Fixes #4171

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <0ab43d79ce0d41348238465d1819d4c937ac6414.1548906335.git.jhaberku@scylladb.com>
2019-01-31 11:00:40 +02:00
Raphael S. Carvalho
930f8caff9 sstables/compaction: Fix segfault when replacing expired sstable in incremental compaction
A fully expired sstable is not added to the compacting set, meaning it's not actually
compacted, but it is kept in a list of sstables which incremental compaction
uses to check whether any sstable can be replaced.
Incremental compaction was unconditionally removing the expired sstable from the
compacting set, which led to a segfault because the end iterator was passed.

The fix changes sstable_set::erase() to follow the standard behavior for
erase functions, which works even if the target element is not present.

Fixes #4085.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190130163100.5824-1-raphaelsc@scylladb.com>
2019-01-30 16:32:45 +00:00
Avi Kivity
056b6a4439 Update seastar submodule
* seastar 07e1ed3...2f35731 (1):
  > Merge " Initial seastar ipv6 support" from Calle
2019-01-30 17:41:39 +02:00
Avi Kivity
1224cde871 Merge "Make perf_simple_query produce JSON results" from Paweł
"
This series enhances perf_simple_query results reporting by adding an
option of producing a json file containing the results. The format of
that file is very similar to the results produced by perf_fast_forward,
in order to ease integration with any tools that may want to interpret
them.

In addition to that perf_simple_query now prints to the standard output
median, median absolute deviation, minimum and maximum of the partial
results, so that there is no need for external scripts to compute those
values.
"

* tag 'perf_simple_query-json/v1' of https://github.com/pdziepak/scylla:
  perf_simple_query: produce json results
  perf_simple_query: calculate and print statistics
  perf: time_parallel: return results of each iteration
  perf_simple_query: take advantage of threads in main()
2019-01-30 17:39:19 +02:00
Paweł Dziepak
6a0ee5dbbf Merge "Simpler fix for the memtable reader's fragment monotonicity violation" from Botond
"
Recently it was discovered that the memtable reader
(partition_snapshot_reader, to be more precise) can violate mutation
fragment monotonicity by re-emitting range tombstones when those overlap
with more than one ck range of the partition slice.
This was fixed by 7049cd9; however, after that fix was merged, a much
simpler fix was proposed by Tomek, one that doesn't involve nearly as
many changes to the partition snapshot reader and hence poses less risk
of breaking it.
This mini-series reverts the previous fix, then applies the new, simpler
one.

Refs: #4104
"

* 'partition-snapshot-reader-simpler-fix/v2' of https://github.com/denesb/scylla:
  partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges
  Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges"
2019-01-30 15:24:31 +00:00
Jesse Haber-Kucharsky
b39eac653d Switch to the CMake-ified Seastar
Committer: Avi Kivity <avi@scylladb.com>
Branch: next

Switch to the CMake-ified Seastar

This change allows Scylla to be compiled against the `master` branch of
Seastar.

The necessary changes:

- Add `-Wno-error` to prevent a Seastar warning from terminating the
  build

- The new Seastar build system generates the pkg-config files (for
  example, `seastar.pc`) at configure time, so we don't need to invoke
  Ninja to generate them

- The `-march` argument is no longer inherited from Seastar (correctly),
  so it needs to be provided independently

- Define `SEASTAR_TESTING_MAIN` so that the definition of an entry
  point is included for all unit test compilation units

- Independently link Scylla against Seastar's compiled copy of fmt in
  its build directory

- All test files use the (now public) Seastar testing headers

- Add some missing Seastar headers to source files

[avi: regenerate frozen toolchain, adjust seastar submodule]
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <02141f2e1ecff5cbcd56b32768356c3bf62750c4.1548820547.git.jhaberku@scylladb.com>
2019-01-30 11:17:38 +02:00
Botond Dénes
8d59c36165 partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges
When entering a new ck range (of the partition-slice), the partition
snapshot reader will apply to its range tombstones stream all the
tombstones that are relevant to the new ck range. When the partition has
range tombstones that overlap with multiple ck ranges, these will be
applied to the range tombstone stream when entering any of the ck ranges
they overlap with. This will result in the violation of the monotonicity
of the mutation fragments emitted by the reader, as these range
tombstones will be re-emitted on each ck range, if the ck range has at
least one clustering row they apply to.
For example, given the following partition:
    rt{[1,10]}, cr{1}, cr{2}, cr{3}...

And a partition-slice with the following ck ranges:
    [1,2], [3, 4]

The reader will emit the following fragment stream:
    rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ...

Note how the range tombstone is emitted twice. In addition to violating
the monotonicity guarantee, this can also result in an explosion of the
number of emitted range tombstones.

Fix by trimming range tombstones to the start of the current ck range,
thus ensuring that they will not violate mutation fragment monotonicity
guarantees.

Refs: #4104

This is a much simpler fix for the above issue, than the already
committed one (7049cd937A). The latter is reverted by the previous
patch and this patch applies the simpler fix.
2019-01-30 10:01:13 +02:00
Nadav Har'El
9dd3c59c77 docs/metrics.md: explain Prometheus and Grafana
docs/metrics.md so far explained just the REST API for retrieving current
metrics from a single Scylla node. In this patch, I add basic explanations
on how to use the Prometheus and Grafana tools included in the
"scylla-grafana-monitoring" project.

It is true that technically, what is being explained here doesn't come
with the Scylla project and requires the separate scylla-grafana-monitoring
to be installed as well. Nevertheless, most Scylla developers will need this
knowledge eventually, and surprisingly it appears it was never documented
anywhere accessible to newbie developers, and I think metrics.md is the
right place to introduce it.

In fact, I myself wasn't aware until today that Prometheus actually had
its own Web UI on port 9090, and that it is probably more useful for
developers than Grafana is.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190129114214.17786-1-nyh@scylladb.com>
2019-01-29 15:46:06 +02:00
Duarte Nunes
35c03f41a4 Merge 'Fix multiple contains for one column' from Piotr
"
An error in validating CONTAINS restrictions against collections caused
only the first restriction to be taken into account due to returning
prematurely.
This miniseries provides a fix for that as well as a matching test case.

Tests: unit (release)
Fixes #4161
"

* 'fix_multiple_contains_for_one_column' of https://github.com/psarna/scylla:
  tests: enable CONTAINS tests for filtering
  cql3: remove premature return from is_satisfied_by
  cql3: restore indentation
2019-01-29 11:10:13 +00:00
Piotr Sarna
11aae54cca tests: enable CONTAINS tests for filtering
Tests for filtering with CONTAINS restrictions were not enabled,
so they are now. Also, another case for having two CONTAINS restrictions
for a single column is added.

Refs #4161
2019-01-29 11:47:28 +01:00
Piotr Sarna
9595fec2ec cql3: remove premature return from is_satisfied_by
Function which checked whether a CONTAINS restriction is satisfied
by a collection erroneously returned prematurely after checking
just the first restriction - which works fine for the usual case,
but fails if there are multiple CONTAINS restrictions present
for a column.

Fixes #4161
2019-01-29 11:47:28 +01:00
Piotr Sarna
89af01315d cql3: restore indentation 2019-01-29 11:47:28 +01:00
Rafael Ávila de Espíndola
625080b414 Rename large_partition_handler
Now that it also handles large rows, rename it to large_data_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:14 -08:00
Rafael Ávila de Espíndola
1185138a34 Print a warning if a row is too large
Tests: unit (release)

Refs #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:10 -08:00
Rafael Ávila de Espíndola
776d5bb9e2 Remove default parameter value
The value is already passed by cql_table_large_partition_handler, so
the default was just for nop_large_partition_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola
30528fa853 Rename _threshold_bytes to _partition_threshold_bytes
A followup patch will add a threshold for rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola
561285488b keys: add schema-aware printing for clustering_key_prefix
For reporting large rows we have to be able to print clustering keys
in addition to partition keys.

Refs #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:01:54 -08:00
Paweł Dziepak
335dca54a5 perf_simple_query: produce json results 2019-01-28 16:36:06 +00:00
Paweł Dziepak
7d21c9c31f perf_simple_query: calculate and print statistics 2019-01-28 16:36:06 +00:00
Paweł Dziepak
eb3d80fa2b perf: time_parallel: return results of each iteration 2019-01-28 16:35:33 +00:00
Pekka Enberg
7bda3abbc6 toolchain/dbuild: Fix permission errors when SELinux is enabled
Use the ":z" suffix to tell Docker to relabel file objects on shared
volumes. Fixes accessing filesystem via dbuild when SELinux is enabled.

Message-Id: <20190128160557.2066-1-penberg@scylladb.com>
2019-01-28 18:16:53 +02:00
Paweł Dziepak
6a1e1e8454 perf_simple_query: take advantage of threads in main() 2019-01-28 13:21:08 +00:00
Paweł Dziepak
11a1f97307 Merge "Fix cleanup of temporary sstable directories" from Benny
"
Cleanup of temporary sstable directories in distributed_loader::populate_column_family
is completely broken and non tested. This code path was never executed since
populate_column_family doesn't currently list subdirectories at all.

This patchset fixes this code path and scans subdirectories in populate_column_family.
Also, a unit test is added for testing the cleanup of incomplete (unsealed) sstables.

Fixes: #4129
"

* 'projects/sst-temp-dir-cleanup/v3' of https://github.com/bhalevy/scylla:
  tests: add test_distributed_loader_with_incomplete_sstables
  tests: single_node_cql_env::do_with: use the provided data_file_directories path if available
  tests: single_node_cql_env::_data_dir is not used
  distributed_loader: populate_column_family should scan directories too
  sstables: fix is_temp_dir
  distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir
  distributed_loader: remove temporary sstable directories only on shard 0
  distributed_loader: push future returned by rmdir into futures vector
2019-01-28 12:23:00 +00:00
Duarte Nunes
ea34e242de Merge 'Do not use hints for view building' from Piotr
"
This series prevents view building from falling back to storing hints.
Instead, it will try to send updates to an endpoint as if it had
consistency level ONE, and in case of failure retry the whole
building step. Then, view building will never be marked as finished
prematurely (because of pending hints), which will help avoid
creating inconsistencies when decommissioning a node from the cluster.

Tests:
  unit (release)
  dtest (materialized_views_test.py.*)

Fixes #3857
Fixes #4039
"

* 'do_not_mark_view_as_built_with_hints_7' of https://github.com/psarna/scylla:
  db,view: add updating view_building_paused statistics
  database: add view_building_paused metrics
  table: make populate_views not allow hints
  db,view: add allow_hints parameter to mutate_MV
  storage_proxy: add allow_hints parameter to send_to_endpoint
2019-01-28 10:31:14 +00:00
Piotr Sarna
9a6261ca27 db,view: add updating view_building_paused statistics
Each time view building is paused because of a connection failure,
the view_building_paused metric is bumped.
2019-01-28 09:38:42 +01:00
Piotr Sarna
e30b0663d6 database: add view_building_paused metrics
The metric exposes how many times the view building process was paused,
e.g. because the target node was down or overloaded.
2019-01-28 09:38:42 +01:00
Piotr Sarna
5dec6dc6c6 table: make populate_views not allow hints
View building uses populate_views to generate and send view updates.
This procedure will now not allow hints to be used to acknowledge
the write. Instead, the whole building step will be retried on failure.

Fixes #3857
Fixes #4039
2019-01-28 09:38:42 +01:00
Piotr Sarna
e30cf22956 db,view: add allow_hints parameter to mutate_MV
Mutating MV function can now accept a parameter whether
hints should be allowed during sending mutations to endpoints.
2019-01-28 09:38:42 +01:00
Piotr Sarna
e0fe9ce2c0 storage_proxy: add allow_hints parameter to send_to_endpoint
With hints allowed, send_to_endpoint will leverage consistency level ANY
to send data. Otherwise, it will use the default - cl::ONE.
2019-01-28 09:38:41 +01:00
Rafael Ávila de Espíndola
5332ebd50c Update the description of compaction_large_partition_warning_threshold_mb
Despite the name, this option also controls if a warning is issued
during memtable writes.

Warning during memtable writes is useful but the option name also
exists in cassandra, so probably the best we can do is update the
description.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190125020821.72815-1-espindola@scylladb.com>
2019-01-28 09:09:35 +02:00
Takuya ASADA
5c6c008109 dist/ami: follow build script changes on -jmx/-tools/-ami packages
We need to follow the changes to the rpm package build procedure in the
-jmx/-tools/-ami packages, since it was changed when we merged the
relocatable package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190127204436.13959-1-syuu@scylladb.com>
2019-01-28 09:08:32 +02:00
Takuya ASADA
7db1b45839 reloc: move relocatable libraries from /opt/scylladb/lib to /opt/scylladb/libreloc
On Scylla 3rdparty tools, we add /opt/scylladb/lib to LD_LIBRARY_PATH.
We use the same directory for relocatable binaries, including libc.so.6.
Once we install both the scylla-env package and the relocatable version
of the scylla-server package, the loader tries to load libc from
/opt/scylladb/lib and the entire distribution becomes unusable.

We may be able to use the Obsoletes or Conflicts tag on .rpm/.deb to
avoid installing the new Scylla package with scylla-env, but it's better
and safer not to share the same directory for different purposes.

Fixes #3943

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190128023757.25676-1-syuu@scylladb.com>
2019-01-28 09:04:56 +02:00
Avi Kivity
274f553485 tools: toolchain: run dbuild container with same timezone as host
Make it easier to work interactively by not reporting surprising times.

There are also reports that dtest fails with incorrect timezones, but those
are probably bugs in dtest.
Message-Id: <20190127134754.1428-1-avi@scylladb.com>
2019-01-27 22:48:42 +00:00
Duarte Nunes
aafaf840a2 tests/secondary_index_test: Add reproducer for #4144
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-01-27 22:30:34 +00:00
Duarte Nunes
aa476cd6c9 index/secondary_index_manager: Add virtual columns to MV
Virtual columns are MV-specific columns that contribute to the
liveness of view rows. However, we were not adding those columns when
creating an index's underlying MV, causing indexes to miss base rows.

Fixes #4144
Branches: master, branch-3.0

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2019-01-27 22:30:12 +00:00
Benny Halevy
36b6a3ebcf tests: add test_distributed_loader_with_incomplete_sstables
Test removal of sstables with temporary TOC file,
with and without temporary sstable directory.

Temporary sstable directories may be empty or still have
leftover components in them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:48:24 +02:00
Benny Halevy
64a23ea3bc tests: single_node_cql_env::do_with: use the provided data_file_directories path if available
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
441809094a tests: single_node_cql_env::_data_dir is not used
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
74ef09a3a2 distributed_loader: populate_column_family should scan directories too
To detect and cleanup leftover temporary sstable directories.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
bd85975277 sstables: fix is_temp_dir
1. fs::canonical required that the path will exist.
   and there is no need for fs::canonical here.
2. fs::path::extension will return the leading dot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
c2a5f3b842 distributed_loader: populate_column_family: ignore directories other than sstable::is_temp_dir
populate_column_family currently lists only regular files, ignoring all
directories. A later patch in this series allows it to also list
directories so as to clean up the temporary sstable directories, yet
valid sub-directories, like staging|upload|snapshots, may still exist
and need to be ignored.

Other kinds of handling, like validating recognized sub-directories and
halting on unrecognized sub-directories, are possible, yet out of scope
for this patch(set).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
9bd7b2f4e6 distributed_loader: remove temporary sstable directories only on shard 0
Similar to calling remove_sstable_with_temp_toc later on in
populate_column_family(), we need only one thread to do the
cleanup work and the existing convention is that it's shard 0.

Since lister::rmdir is checking remove_file of all entries
(recursively) and the dir itself, doing that concurrently would fail.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-27 14:14:32 +02:00
Benny Halevy
bcfb2e509b distributed_loader: push future returned by rmdir into futures vector 2019-01-27 14:14:32 +02:00
Asias He
ee0bb0aa94 tests: Drop the unsupported random_read mode in perf_sstable
It is not supported. Remove it.
Message-Id: <fe31e090574be96a9620b6902ceb843699d558d0.1548403105.git.asias@scylladb.com>
2019-01-25 14:24:40 +00:00
Avi Kivity
85abb13679 Merge "Fix cross shard cf usage" from Piotr
"
Lambda passed to distribute_reader_and_consume_on_shards shouldn't
capture shard local variables.

Fixes #4108

Tests:
unit(release),
dtest(update_cluster_layout_tests.TestLargeScaleCluster.add_50_nodes_test)
"

* 'haaawk/4108/v2' of github.com:scylladb/seastar-dev:
  Fix cross shard cf usage in repair
  Fix cross shard cf usage in streaming
2019-01-24 19:40:44 +02:00
Avi Kivity
d0f9e00e85 Merge " Support 64-bit gc_clock" (fixes) from Benny
"
Use int64_t in data::cell for expiry / deletion time.

Extend time_overflow unit tests in cql_query_test to use
select statements with and without bypass cache to access deeper
into the system.

Refs #3353
"

* 'projects/gc_clock_64_fixes/v1' of https://github.com/bhalevy/scylla:
  tests: extend time_overflow unit tests
  data::cell: use int64_t for expiry and deletion time
2019-01-24 19:15:12 +02:00
Piotr Jastrzebski
fab1b7a3a2 Fix cross shard cf usage in repair
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 18:13:49 +01:00
Piotr Jastrzebski
1ac7283550 Fix cross shard cf usage in streaming
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 18:13:30 +01:00
Glauber Costa
ec66dd6562 scylla_setup: tell users about the possibility of a non-interactive session
From day one, scylla_setup can be run either interactively or through
command line parameters. Still, one of the requests we hear most
often from users is whether we can provide them with a version of
scylla_setup that they can call from their scripts.

This probably happens because once you call a script interactively,
it may not be totally obvious that a different mode is available.
Even when we do tell users about that possibility, the request number
two is then "which flags do I pass?"

The solution I am proposing is to just tell users the answers to those
questions at the end of an interactive session. After this patch, we
print the following message to the console:

  ScyllaDB setup finished.
  scylla_setup accepts command line arguments as well! For easily provisioning in a similar environmen than this, type:

    scylla_setup --no-raid-setup --nic eth0 --no-kernel-check \
                 --no-verify-package --no-enable-service --no-ntp-setup \
                 --no-node-exporter --no-fstrim-setup

  Also, to avoid the time-consuming I/O tuning you can add --no-io-setup and copy the contents of /etc/scylla.d/io*
  Only do that if you are moving the files into machines with the exact same hardware

Notes on the implementation: it is unfortunate for these purposes that
all our options are negated. Most conditionals are branching on true
conditions, so although I could write this:

  args.no_option = not interactive_ask_service(...)
  if not args.no_option:
    ...

I opted in this patch to write:

  option = interactive_ask_service(...)
  args.no_option = not option
  if option:
    ...

There is an extra line and we have to update args separately, but it
makes it less hard to get confused in the conditional with the double
negation. Let me know if there are disagreements here.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190124153832.21140-1-glauber@scylladb.com>
2019-01-24 17:41:26 +02:00
Benny Halevy
6efd85ed01 tests: extend time_overflow unit tests
Test also cql select queries with and without bypass cache.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-24 15:55:06 +02:00
Benny Halevy
7373825473 data::cell: use int64_t for expiry and deletion time
TTL may still use int32_t to reduce the footprint.

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-24 15:55:06 +02:00
Takuya ASADA
597059b4b1 dist/debian: skip stripping libprotobuf.so.15
dh_strip isn't able to strip libprotobuf.so.15, and we actually don't
need to strip dependency libraries, so skip it.

Fixes #4135

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190123202213.2117-4-syuu@scylladb.com>
2019-01-24 15:51:56 +02:00
Takuya ASADA
aefc18e70d dist/debian: install /usr/bin/file for dh_strip
dh_strip requires /usr/bin/file but it is not automatically installed,
so install it in build_deb.sh.

Fixes #4134

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190123202213.2117-3-syuu@scylladb.com>
2019-01-24 15:51:53 +02:00
Benny Halevy
fbebd0bb1d thrift: validate_column_name: fix exception format string
It's printing uint32_t rather than char*.

Refs #4140

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190124104002.32381-1-bhalevy@scylladb.com>
2019-01-24 12:46:23 +02:00
Avi Kivity
b58b82c9a2 Merge "Cut build dependencies around types.hh" from Piotr
"
I've recently had to work on the types.hh/types.cc files and had
a very unpleasant experience with incremental builds on every change
to types.hh. It took ~30 min on my machine, which is almost as much
as a clean build.

I looked around and it turns out that types.hh contains the whole
hierarchy of the types. At the same time, many places access the
types only through abstract_type which is the root of the
hierarchy.

This patchset extracts user_type_impl, tuple_type_impl,
map_type_impl, set_type_impl, list_type_impl and
collection_type_impl from types.hh and places each of them
in a separate header.

The result of this is that a change to user_type_impl now causes an
incremental build of ~6 min instead of ~30 min.
A change to tuple_type_impl causes an incremental build of ~7.5 min
instead of ~30 min, and a change to map_type_impl triggers an
incremental build that takes ~20 min instead of ~30 min.

Tests: unit(release)
"

* 'haaawk/types_build_speedup_2/rfc/2' of github.com:scylladb/seastar-dev:
  Stop including types/list.hh in cql3/tuples.hh
  Stop including types/set.hh into cql3/sets.hh
  Move collection_type_impl out of types.hh to types/collection.hh
  Move set_type_impl out of types.hh to types/set.hh
  Move list_type_impl out of types.hh to types/list.hh
  Move map_type_impl out of types.hh to types/map.hh
  Move tuple_type_impl from types.hh to types/tuple.hh
  Decouple database.hh from types/user.hh
  Allow to use shared_ptr with incomplete type other than sstable
  Move user_type_impl out of types.hh to types/user.hh
2019-01-24 11:21:22 +02:00
Piotr Jastrzebski
a3912a35f5 Stop including types/list.hh in cql3/tuples.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:57:19 +01:00
Piotr Jastrzebski
fe8dfc8fdc Stop including types/set.hh into cql3/sets.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:57:19 +01:00
Piotr Jastrzebski
5a5201a50b Move collection_type_impl out of types.hh to types/collection.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
ad016a732b Move set_type_impl out of types.hh to types/set.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
b1e1b66732 Move list_type_impl out of types.hh to types/list.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
147cc031db Move map_type_impl out of types.hh to types/map.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
b6b2fdc5be Move tuple_type_impl from types.hh to types/tuple.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
7666e81b51 Decouple database.hh from types/user.hh
This commit declares shared_ptr<user_types_metadata> in
database.hh, where user_types_metadata is an incomplete type, so
it requires
"Allow to use shared_ptr with incomplete type other than sstable"
to compile correctly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:55:04 +01:00
Piotr Jastrzebski
316be5c6b5 Allow to use shared_ptr with incomplete type other than sstable
When seastar/core/shared_ptr_incomplete.hh is included in a header
then it causes problems with all declarations of shared_ptr<T> with
incomplete type T that end up in the same compilation unit.

The problem happens when we have a compilation unit that includes
two headers a.hh and b.hh such that a.hh includes
seastar/core/shared_ptr_incomplete.hh and b.hh declares
shared_ptr<T> with incomplete type T. At the same time, this
compilation unit does not use the declared shared_ptr<T>, so it should
compile and work, but it does not, because shared_ptr_incomplete.hh
is included and it forces instantiation of:

template <typename T>
T*
lw_shared_ptr_accessors<T,
void_t<decltype(lw_shared_ptr_deleter<T>{})>>::to_value(lw_shared_ptr_counter_base*
counter) {
    return static_cast<T*>(counter);
}

for each declared shared_ptr<T> with incomplete type T, even the ones
that are never used.

Following commit "Decouple database.hh from types/user.hh"
moves user_types_metadata type out of database.hh and instead
declares shared_ptr<user_types_metadata> in database.hh where
user_types_metadata is incomplete. Without this commit
the compilation of the following one fails with:

In file included from ./sstables/sstables.hh:34,
                 from ./db/size_estimates_virtual_reader.hh:38,
                 from db/system_keyspace.cc:77:
seastar/include/seastar/core/shared_ptr_incomplete.hh: In
instantiation of ‘static T*
seastar::internal::lw_shared_ptr_accessors<T,
seastar::internal::void_t<decltype
(seastar::lw_shared_ptr_deleter<T>{})>
>::to_value(seastar::lw_shared_ptr_counter_base*) [with T =
user_types_metadata]’:
seastar/include/seastar/core/shared_ptr.hh:243:51:   required from
‘static void seastar::internal::lw_shared_ptr_accessors<T,
seastar::internal::void_t<decltype
(seastar::lw_shared_ptr_deleter<T>{})>
>::dispose(seastar::lw_shared_ptr_counter_base*) [with T =
user_types_metadata]’
seastar/include/seastar/core/shared_ptr.hh:300:31:   required from
‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T =
user_types_metadata]’
./database.hh:1004:7:   required from ‘static void
seastar::internal::lw_shared_ptr_accessors_no_esft<T>::dispose(seastar::lw_shared_ptr_counter_base*)
[with T = keyspace_metadata]’
seastar/include/seastar/core/shared_ptr.hh:300:31:   required from
‘seastar::lw_shared_ptr<T>::~lw_shared_ptr() [with T =
keyspace_metadata]’
./db/size_estimates_virtual_reader.hh:233:67:   required from here
seastar/include/seastar/core/shared_ptr_incomplete.hh:38:12: error:
invalid static_cast from type ‘seastar::lw_shared_ptr_counter_base*’
to type ‘user_types_metadata*’
     return static_cast<T*>(counter);
            ^~~~~~~~~~~~~~~~~~~~~~~~
[131/415] CXX build/release/distributed_loader.o

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:45:25 +01:00
Piotr Jastrzebski
e92b4c3dbc Move user_type_impl out of types.hh to types/user.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:04:04 +01:00
Rafael Ávila de Espíndola
f7d1dc16d4 database: Use nop_large_partition_handler to avoid self-reporting
Currently nop_large_partition_handler is only used in tests, but it
can also be used to avoid self-reporting.

Tests: unit(Release)

I also tested starting scylla with
--compaction-large-partition-warning-threshold-mb=0.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190123205059.39573-1-espindola@scylladb.com>
2019-01-23 21:11:21 +00:00
Avi Kivity
4882f29f82 Merge "Detemplatize primary key restrictions" from Piotr
"
This series is a first small step towards rewriting
CQL restrictions layer. Primary key restrictions used to be
a template that accepts either partition_key or clustering_key,
but the implementation is already based on virtual inheritance,
so in multiple cases these templates need specializations.

Refs #3815
"

* 'detemplatize_primary_key_restrictions_2' of https://github.com/psarna/scylla:
  cql3: alias single_column_primary_key_restrictions
  cql3: remove KeyType template from statement_restrictions
  cql3: remove template from primary_key_restrictions
  cql3: remove forwarding_primary_key_restrictions
2019-01-23 17:43:03 +02:00
Piotr Sarna
9982587bea cql3: alias single_column_primary_key_restrictions
In preparation for detemplatizing this class, it's aliased with
single_column_partition_key restrictions and
single_column_clustering_key_restrictions accordingly.
2019-01-23 17:43:03 +02:00
Piotr Sarna
4663094474 cql3: remove KeyType template from statement_restrictions
The code is unfolded into serving partition and clustering key
cases separately instead of overloading a template.
2019-01-23 17:43:03 +02:00
Piotr Sarna
4bd0cb8dd9 cql3: remove template from primary_key_restrictions
Partition key restrictions and clustering key restrictions
currently require virtual function specializations and have
lots of distinct code, so there's no value in having
primary_key_restrictions<KeyType> template.
2019-01-23 17:43:03 +02:00
Piotr Sarna
bdd8566ea3 cql3: remove forwarding_primary_key_restrictions
I presume this header was created during code translation from C*,
but it's not used or included anywhere.
2019-01-23 17:43:03 +02:00
Avi Kivity
c83ae62aed build: fix libdeflate object file corruption during parallel build
libdeflate's build places some object files in the source directory, which is
shared between the debug and release build. If the same object file (for the two
modes) is written concurrently, or if one more reads it while the other writes it,
it will be corrupted.

Fix by not building the executables at all. They aren't needed, and we already
placed the libraries' objects in the build directory (which is unshared). We only
need the libraries anyway.

Fixes #4130.
Branches: master, branch-3.0
Message-Id: <20190123145435.19049-1-avi@scylladb.com>
2019-01-23 15:32:17 +00:00
Nadav Har'El
76f1fcc346 cql3: really ensure retrieval of columns for filtering
Commit fd422c954e aimed to fix
issue #3803. In that issue, if a query SELECTed only certain columns but
did filtering (ALLOW FILTERING) over other unselected columns, the filtering
didn't work. The fix involved adding the columns being filtered to the set
of columns we read from disk, so they can be filtered.

But that commit included an optimization: If you have clustering keys
c1 and c2, and the query asks for a specific partition key and c1 < 3 and
c2 > 3, the "c1 < 3" part does NOT need to be filtered because it is already
done as a slice (a contiguous read from disk). The committed code erroneously
concluded that both c1 and c2 don't need to be filtered, which was wrong
(c2 *does* need to be read and filtered).

In this patch, we fix this optimization. Previously, we used the "prefix
length", which in the above example was 2 (both c1 and c2 were restricted),
but we need a new and more elaborate function,
num_prefix_columns_that_need_not_be_filtered(), to determine that we can
only skip filtering of one column (c1) and cannot skip the second.

Fixes #4121. This patch also adds a unit test to confirm this.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Message-Id: <20190123131212.6269-1-nyh@scylladb.com>
2019-01-23 15:24:30 +02:00
Avi Kivity
835ad406de tools: toolchain: update docker build command to include --no-cache
If docker sees the Dockerfile hasn't changed it may reuse an old image, not
caring that context files and dependent images have in fact changed. This can
happen for us if install-dependencies.sh or the base Fedora image changed.

To make sure we always get a correct image, add --no-cache to the build command.
Message-Id: <20190122185042.23131-1-avi@scylladb.com>
2019-01-23 10:47:40 +01:00
Glauber Costa
5d754c1d11 install-dependencies.sh: add packages that will be needed by scylla-python3
Done in a separate step so we can update the toolchain first.

dnf-utils is used to bring us repoquery, which we will use to derive the
list of files in the python packages.
patchelf is needed so we can add a DT_RUNPATH section to the interpreter
binary.
the python modules, as well as the python3 interpreter are taken from
the current RPM spec file.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
[avi: regenerate frozen toolchain image]
Message-Id: <20190123011751.14440-1-glauber@scylladb.com>
2019-01-23 10:53:10 +02:00
Avi Kivity
c1dd04986b Merge "Prepare for the switch to CMake-ified Seastar" from Jesse
"
This series prepares for the integration of the `master` branch of
Seastar back into Scylla.

A number of changes to the existing build are necessary to integrate
Seastar correctly, and these are detailed in the individual change
messages.

I tested with and without DPDK, in release and debug mode.

The actual switch is a separate patch.
"

* 'jhk/seastar_cmake/v4' of https://github.com/hakuch/scylla:
  build: Fix link order for DPDK
  tests: Split out `sstable_datafile_test`
  build: Remove unnecessary inclusion
  tests: Fix use-after-free errors in static vars
  build: Remove Seastar internals
  build: Only use Seastar flags from pkg-config
  build: Query Seastar flags using pkg-config
  build: Change parameters for `pkg_config` function
2019-01-23 10:33:00 +02:00
Duarte Nunes
88c7c1e851 Merge 'hinted handoff: cache cf mappings' from Vlad
"
Cache cf mappings when we break in the middle of sending a segment, so
that the sender has them the next time it resumes sending this segment
from where it left off.

Also add the "discard" metric so that we can track hints that are being
discarded in the send flow.
"

Fixes #4122

* 'hinted_handoff_cache_cf_mappings-v1' of https://github.com/vladzcloudius/scylla:
  hinted handoff: cache column family mappings for segments that were not sent out in full
  hinted handoff: add a "discarded" metric
2019-01-23 00:44:41 +00:00
Jesse Haber-Kucharsky
3d79bd25b2 build: Fix link order for DPDK
Without this change, DPDK libraries will not be linked to Scylla
correctly when we switch to the new pkg-config support in Seastar.
2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky
cfb1492a6e tests: Split out sstable_datafile_test
Each `*_test.cc` file must be compiled separately so that there is only
one definition of `main`.

This change correctly defines an independent `sstable_datafile_test`
from `sstable_datafile_test.cc` and adds that test to the existing
suite.
2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky
02dd7bcc82 build: Remove unnecessary inclusion 2019-01-22 18:25:01 -05:00
Jesse Haber-Kucharsky
2a62550002 tests: Fix use-after-free errors in static vars
Without these two variables being declared as TLS, executing these two
tests in "debug" mode fails AddressSanitizer's checks.
2019-01-22 18:24:52 -05:00
Jesse Haber-Kucharsky
88cc43d5e0 build: Remove Seastar internals
We don't need to re-specify Seastar internals in Scylla's build, since
everything private to Seastar is managed via pkg-config.

We can eliminate all references to ragel and generated ragel header
files from Seastar.

We can also simplify the dependence on generated Seastar header files by
ensuring that all object files depend on Seastar being built first.
2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky
4f44e143be build: Only use Seastar flags from pkg-config
Some Seastar-specific flags were manually specified as Ninja rules, but
we want to rely exclusively on Seastar for its necessary flags.

The pkg-config file generated by the latest version of Seastar is
correct and allows us to do this, but the version generated by Scylla's
current check-out of Seastar does not. Therefore, we have to manually
adjust the pkg-config results temporarily until we update Seastar.
2019-01-22 18:24:38 -05:00
Jesse Haber-Kucharsky
8743cff59b build: Query Seastar flags using pkg-config
Previously, we manually parsed the pkg-config file. We now use
pkg-config itself to get the correct build flags.

This means that we will get the correct behavior for variable expansion,
and fields like `Requires`, `Requires.private`, and `Libs.private`.
Previously, these fields were ignored.
2019-01-22 18:24:38 -05:00
Vlad Zolotarov
34829b8f81 hinted handoff: cache column family mappings for segments that were not sent out in full
We will try to send a particular segment again later (in 1s) from the place
where we left off if it wasn't sent out in full before. However, we may miss
some column family mappings when we get back to sending this file and
start sending from some entry in the middle of it (where we left off)
if we didn't save the column family mappings we cached while reading this
segment from its beginning.

This happens because the commitlog doesn't save column family information
in every entry but rather once for each unique column family (version) per
"cycle" (see commitlog::segment description for more info).

Therefore we have to assume that a particular column family mapping
appears once in the whole segment (worst case). And therefore, when we
decide to resume sending a segment we need to keep the column family
mappings we accumulated so far and drop them only after we are done with
this particular segment (sent it out in full).

Fixes #4122

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-01-22 15:24:22 -05:00
Vlad Zolotarov
4516a8cfc4 hinted handoff: add a "discarded" metric
Account for the number of hints that were discarded in the send path.
This may happen, for instance, due to a schema change or because a hint
is too old.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-01-22 14:11:09 -05:00
Avi Kivity
fa0312d0f2 Merge "Support 64-bit gc_clock" from Benny
"
32-bit deletion times wrap around on 2038-01-19 03:14:07 UTC. Such dates
are valid deletion times starting 2018-01-19 with the 20-year maximum ttl.

This patchset extends gc_clock::duration::rep to int64_t and adds
respective unit tests for the max_ttl cases.

Fixes #3353

Tests: unit (release)
"

* 'projects/gc_clock_64/v2' of https://github.com/bhalevy/scylla:
  tests: cql_query_test add test_time_overflow
  gc_clock: make 64 bit
  sstables: mc: use int64_t for local_deletion_time and ttl
  sstables: add capped_tombstone_deletion_time stats counter
  sstables: mc: cap partition tombstone local_deletion_time to max
  sstables: add capped_local_deletion_time stats counter
  sstables: mc: metadata collector: cap local_deletion_time at max
  sstables: mc: use proper gc_clock types for local_deletion_time and ttl
  db: get default_time_to_live as int32_t rather than gc_clock::rep
  sstables: safely convert ttl and local_deletion_time to int32_t
  sstables: mc: move liveness_info initialization to members
  sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m
  sstables: mc: define expired_liveness_ttl as signed int32_t
  sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time
  sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time
2019-01-22 18:21:55 +02:00
Glauber Costa
54bc0ce70d scylla_setup: make sure it works (again) in interactive mode
Commit 019a2e3a27 marked some arguments as required, which improved
the usability of scylla_setup.

The problem is that when we call scylla_setup in interactive mode,
no argument should be required. After the aforementioned commit
scylla_setup will either complain that the required arguments were
not passed if zero arguments are present, or skip interactive mode
if one of the mandatory ones is present.

This patch fixes that by checking whether or not we were invoked with
no command line arguments and lifting the requirements for mandatory
arguments in that case.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190122003621.11156-1-glauber@scylladb.com>
2019-01-22 16:54:55 +02:00
Benny Halevy
7d0854a1e5 tests: cql_query_test add test_time_overflow
Test 32-bit time overflow scenarios.
Fails without "gc_clock: make 64 bit".

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
93270dd8e0 gc_clock: make 64 bit
Fixes: #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
1ccd72f115 sstables: mc: use int64_t for local_deletion_time and ttl
In preparation for changing gc_clock::duration::rep to int64_t.

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
427d6e6090 sstables: add capped_tombstone_deletion_time stats counter
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
0ec46924bf sstables: mc: cap partition tombstone local_deletion_time to max
The on-disk deletion_time struct stores deletion_time as int32_t, which
cannot hold large time values. Cap local_deletion_time to
max_local_deletion_time and log a warning about that.
This corresponds to Cassandra's MAX_DELETION_TIME.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
156f9ffa11 sstables: add capped_local_deletion_time stats counter
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
7609a04565 sstables: mc: metadata collector: cap local_deletion_time at max
The max local_deletion_time_tracker in stats is int32_t, so just track the
limit of (max int32_t - 1) if the time_point is greater than the limit.
This corresponds to Cassandra's MAX_DELETION_TIME.

Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
bd6861989d sstables: mc: use proper gc_clock types for local_deletion_time and ttl
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
9878b36895 db: get default_time_to_live as int32_t rather than gc_clock::rep
Otherwise, value_cast<> throws a std::bad_cast exception
when gc_clock::rep is defined as int64_t.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
33314cec3f sstables: safely convert ttl and local_deletion_time to int32_t
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
9a00c5a763 sstables: mc: move liveness_info initialization to members
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
0aba922b6d sstables: mc: move parsing of liveness_info deltas to data_consume_rows_context_m
To be consistent with other calls to parse_* methods there.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
6465a673f5 sstables: mc: define expired_liveness_ttl as signed int32_t
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
c4c2133e3e sstables: mc: change write_delta_deletion_time to receive tombstone rather than deletion_time
The mc format writes only the delta local_deletion_time of tombstones.
Conventional deletion_time is written only for the partition header.

Restructure the code to pass a tombstone to write_delta_deletion_time
rather than struct deletion_time to prepare for using 64-bit deletion times.

The tombstone uses gc_clock::time_point while struct
deletion_time is limited to int32_t local_deletion_time.

Note that for "live" tombstones we encode <api::missing_timestamp,
no_deletion_time> as was previously evaluated by to_deletion_time().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Benny Halevy
820906b794 sstables: mc: use gc_clock types for writing delta ttl and local_deletion_time
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 13:36:35 +02:00
Tomasz Grabiec
dbc1894bd5 lsa: Avoid unnecessary compact_and_evict_locked()
When the reclaim request was satisfied from the pool there's no need
to call compact_and_evict_locked(). This allows us to avoid calling
boost::range::make_heap(), which is a tiny performance difference, as
well as some confusing log messages.

Message-Id: <1548091941-8534-1-git-send-email-tgrabiec@scylladb.com>
2019-01-21 20:19:20 +02:00
Jesse Haber-Kucharsky
72da3283b9 build: Change parameters for pkg_config function
We can invoke pkg-config with multiple options, and we specify the
package name first since this is the "target" of the pkg-config query.

Supporting multiple options is necessary for querying Seastar's
pkg-config file with `--static`, which we anticipate in a future change.
2019-01-21 11:38:25 -05:00
Glauber Costa
ca997b5f60 scylla_setup: warn users on the severity of answering no to IOTUne
The system won't work properly if IOTune is not run. While it is fair
to skip this step because it takes long (indeed, it is common to provision
io.conf manually in order to skip it), first-time users don't know
this and can get the impression that this is just a totally optional step.

Except the node won't boot up without it.

As a user nicely put recently in our mailing list:

"...in this case, it would be even simpler to forbid answering "no"
 to this not-so-optional step :)"

We should not forbid saying no to IOTune, but we should warn the user
about the consequences of doing so.

Fixes #4120

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190121144506.17121-1-glauber@scylladb.com>
2019-01-21 16:55:50 +02:00
Botond Dénes
4e89dea9ea database: don't allow access to global semaphores
Recently we had a bug (#4096) due to a component
(`multishard_mutation_query()`) assuming that all reads used the
semaphore obtainable via `database::user_read_concurrency_sem()`.
This problem revealed that it is plain wrong to allow access to the
shard-global semaphores residing in the database object. Instead all
code wishing to access the relevant semaphore for some read, should do
so via the relevant `table` object, thus guaranteeing that it will get
the correct semaphore, configured for that table.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>
2019-01-21 16:29:02 +02:00
Piotr Sarna
5d76a635ca distributed_loader: migrate flush_upload_dir to thread
Flushing upload dir code suffers from overcomplication,
so in order to make it a little bit simpler, it's moved
to threaded context.

Refs #4118

Message-Id: <232cca077bae7116cfa87de9c9b4ba60efc2a01d.1548077720.git.sarna@scylladb.com>
2019-01-21 15:48:17 +02:00
Gleb Natapov
85cb09294e storage_service: do not start thrift and cql servers if a node is isolated due to errors
Scylla starts doing IO much earlier than it starts the cql/thrift servers.
The IO may cause an error that tries to stop all servers, but since they
are not running yet this does nothing, and the servers still get started
later. Fix it by checking that the node is not isolated before starting
the servers.

Message-Id: <20190110152830.GE3172@scylladb.com>
2019-01-21 13:04:23 +00:00
Tomasz Grabiec
e02baabd62 tests: perf_fast_forward: Introduce --with-compression option
Message-Id: <1547819062-4369-1-git-send-email-tgrabiec@scylladb.com>
2019-01-21 12:18:31 +00:00
Botond Dénes
ff2884f25b Revert "partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges"
A much simpler and more complete fix was found. Let's revert this before
applying the simpler fix.

This reverts commit 7049cd9374.
2019-01-21 13:56:56 +02:00
Botond Dénes
f229dff210 auth/service: unregister migration listener on stop()
Otherwise any event that triggers notification to this listener would
trigger a heap-use-after-free.

Refs: #4107

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b6bbd609371a2312aed7571b05119d59c7d103d7.1548067626.git.bdenes@scylladb.com>
2019-01-21 13:06:59 +02:00
Tomasz Grabiec
d7c701d2d1 Merge "Type-erase gratuitous templates with functions" from Avi
Many areas of the code are splattered with unneeded templates. This patchset replaces
some of them, where the template parameter is a function object, with an std::function
or noncopyable_function (with a preference towards the latter; but it is not always
possible). As the template is compiled for each instantiation (if the function
object is a lambda) while a function is compiled only once, there are significant
savings in compile time and bloat.

   text    data     bss     dec     hex filename
85160690          42120  284910 85487720        5187068 scylla.before
84824762          42120  284910 85151792        5135030 scylla.after

* https://github.com/avikivity/scylla detemplate/v2:
  api/commitlog: de-template acquire_cl_metric()
  database: de-template do_parse_schema_tables
  database: merge for_all_partitions and for_all_partitions_slow
  hints: de-template scan_for_hints_dirs()
  schema_tables: partially de-template make_map_mutation()
  distributed_loader: de-template
  tests: commitlog_test: de-template
  tests: cql_auth_query_test: de-template
  test: de-template eventually() and eventually_true()
  tests: flush_queue_test: de-template
  hint_test: de-template
  tests: mutation_fragment_test: de-template
  test: mutation_test: de-template
2019-01-21 11:32:22 +01:00
Avi Kivity
826cf90f3f Merge "Restore mutating uploaded sstables to level 0" from Piotr
"
This miniseries fixes the behaviour of distributed loader,
which now unconditionally mutates new sstables found in /upload
dir to LCS level 0 first, and only after that proceeds with
either queueing them for update generation or moving them
to data directory.
"

* 'restore_always_mutating_sstables_level_0' of https://github.com/psarna/scylla:
  distributed_loader: restore indentation
  distributed_loader: restore always mutating to level 0
2019-01-20 20:32:15 +02:00
Benny Halevy
844a2de263 sstables: mc: prevent signed integer overflow
Fix runtime error: signed integer overflow
introduced by 2dc3776407

Delta-encoded values may wrap around if the encoded value is
less than the base value.  This could happen in two places:
In the mc-format serialization header itself, where the base values are implicit
Cassandra epoch time, and in the sstables data files, where the base values
are taken from the encoding_stats (later written to the serialization_header).

In these cases, when the calculation is done using signed integer/long we may see
"runtime error: signed integer overflow" messages in debug mode
(with -fsanitize=undefined / -fsanitize=signed-integer-overflow).

Overflow here is expected and harmless since we guarantee neither that
the base values in the serialization header are greater than or equal
to Cassandra's epoch, nor that the delta-encoded values are always
greater than or equal to the respective base values in the
serialization header.

To prevent these warnings, the subtraction/addition should be done with unsigned
(two's complement) arithmetic and the result converted to the signed type.

Note that, to keep the code simple where possible, we also rely on the
implicit conversion of signed integers to unsigned when one of the added
values is unsigned and the other is signed.

Fixes: #4098

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190120142950.15776-1-bhalevy@scylladb.com>
2019-01-20 16:59:46 +02:00
Avi Kivity
1e5c09dbce test: mutation_test: de-template
Replace the with_column_family helper template with an ordinary function, to
reduce code bloat.
2019-01-20 15:55:20 +02:00
Avi Kivity
28db56df13 tests: mutation_fragment_test: de-template
The for_each_target() template is called four times, so making it a normal function
reduces a lot of code generation.
2019-01-20 15:55:20 +02:00
Avi Kivity
401684503d hint_test: de-template
While cl_test is duplicated with commitlog_test, at least deduplicate it internally
by converting it to an ordinary function.
2019-01-20 15:55:20 +02:00
Avi Kivity
208b0f80a4 tests: flush_queue_test: de-template
The internal test_propagation template is instantiated many times. Replace
with an ordinary function to reduce bloat. Call sites adjusted to have a
uniform signature.
2019-01-20 15:55:20 +02:00
Avi Kivity
2f36d30572 test: de-template eventually() and eventually_true()
These templates are not trivial and called many times. De-template them to
reduce code bloat.
2019-01-20 15:55:20 +02:00
Avi Kivity
96a8eacc3c tests: cql_auth_query_test: de-template
Replace the with_user() and verify_unauthorized_then_ok() templates with functions.
2019-01-20 15:55:20 +02:00
Avi Kivity
e0b0e18234 tests: commitlog_test: de-template
The cl_test function is called many times, so its contents are bloat. De-template
it so it is compiled only once.
2019-01-20 15:55:20 +02:00
Avi Kivity
baf9480c8d distributed_loader: de-template
distributed_loader has several large templates that can be converted to normal
function with the help of noncopyable_function<>, reducing code bloat.

One of the lambdas used as an actual argument was adjusted, because the de-templated
callee only accepts functions returning a future, while the original accepted both
functions returning a future and functions returning void (similar to future::then).
2019-01-20 15:55:20 +02:00
Avi Kivity
e0914a080e schema_tables: partially de-template make_map_mutation()
make_map_mutation() is called several times, hopfully with the same Map type
parameter. Replace the Func parameter with a noncopyable_function<>.
2019-01-20 15:55:20 +02:00
Avi Kivity
630f841e5b hints: de-template scan_for_hints_dirs()
This function is called twice, and is not doing anything performance critical,
so replace the template parameter Func with std::function<>.
2019-01-20 15:55:20 +02:00
Avi Kivity
fae4c6c0b6 database: merge for_all_partitions and for_all_partitions_slow
for_all_partitions is only used in the implementation of for_all_partitions_slow,
so merge them and get rid of a template.
2019-01-20 15:55:20 +02:00
Avi Kivity
9858395c3e database: de-template do_parse_schema_tables
This long slow-path function is called four times, so de-templating it is an
easy win. We use std::function instead of noncopyable_function because the
function is copied within the parallel_for_each callback. The original code
uses a move, which is incorrect, but did not fail because moving the lambdas
that were used as the actual arguments is equivalent to a copy.
2019-01-20 15:55:18 +02:00
Tomasz Grabiec
c422bfc2c5 tests: perf_fast_forward: Store results for each dataset in separate sub-directory
Otherwise read test results for subsequent datasets will override each other.

Also, rename population test case to not include dataset name, which
is now redundant.

Message-Id: <1547822942-9690-1-git-send-email-tgrabiec@scylladb.com>
2019-01-20 15:38:46 +02:00
Botond Dénes
7049cd9374 partition_snapshot_reader: don't re-emit range tombstones overlapping multiple ck ranges
When entering a new ck range (of the partition-slice), the partition
snapshot reader will apply to its range tombstones stream all the
tombstones that are relevant to the new ck range. When the partition has
range tombstones that overlap with multiple ck ranges, these will be
applied to the range tombstone stream when entering any of the ck ranges
they overlap with. This will result in the violation of the monotonicity
of the mutation fragments emitted by the reader, as these range
tombstones will be re-emitted on each ck range, if the ck range has at
least one clustering row they apply to.
For example, given the following partition:
    rt{[1,10]}, cr{1}, cr{2}, cr{3}...

And a partition-slice with the following ck ranges:
    [1,2], [3, 4]

The reader will emit the following fragment stream:
    rt{[1,10]}, cr{1}, cr{2}, rt{[1,10]}, cr{3}, ...

Note how the range tombstone is emitted twice. In addition to violating
the monotonicity guarantee, this can also result in an explosion of the
number of emitted range tombstones.

Fix by applying only those range tombstones to the range tombstone
stream, that have a position strictly greater than that of the last
emitted clustering row (or range tombstone), when entering a new ck
range.

Fixes: #4104

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <e047af76df75972acb3c32c7ef9bb5d65d804c82.1547916701.git.bdenes@scylladb.com>
2019-01-20 15:38:04 +02:00
Paweł Dziepak
14757d8a83 types: collection_type: drop tombstone if covered by higher-level one
At the moment there are inefficiencies in how
collection_type_impl::mutation::compact_and_expire() handles tombstones.
If there is a higher-level tombstone that covers the collection one
(including cases where there is no collection tombstone) it will be
applied to the collection tombstone and present in the compaction
output. This also means that the collection tombstone is never dropped
if fully covered by a higher-level one.

This patch fixes both those problems. After the compaction the
collection tombstone is either unchanged or removed if covered by a
higher-level one.

Fixes #4092.

Message-Id: <20190118174244.15880-1-pdziepak@scylladb.com>
2019-01-20 15:32:34 +02:00
Avi Kivity
e51ef95868 Update seastar submodule
* seastar af6b797...7d620e1 (1):
  > perftune.py: don't let any exception out when connecting to AWS meta server

Fixes #4102.
2019-01-20 13:59:09 +02:00
Avi Kivity
32e79fc23b api/commitlog: de-template acquire_cl_metric()
Use std::function instead of a template parameter. Likely doesn't gain
anything, because the template was always instantiated with the same type
(the result of std::bind() with the same signatures), but still good practice.

std::function was used instead of noncopyable_function because
sharded::map_reduce0() copies the input function.
2019-01-20 11:58:39 +02:00
Avi Kivity
6e6372e8d2 Revert "Merge "Type-eaese gratuitous templates with functions" from Avi"
This reverts commit 31c6a794e9, reversing
changes made to 4537ec7426. It causes bad_function_calls
in some situations:

INFO  2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708
INFO  2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
INFO  2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables
INFO  2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)
2019-01-20 11:32:14 +02:00
Paweł Dziepak
e212d37a8a utils/small_vector: fix leak in copy assignment slow path
Fixes #4105.

Message-Id: <20190118153936.5039-1-pdziepak@scylladb.com>
2019-01-18 17:49:46 +02:00
Paweł Dziepak
23cfb29fea Merge "compaction: mc: re-calculate encoding_stats" from Benny
"
Use input sstables stats metadata to re-calculate encoding_stats.

Fixes #3971.
"

* 'projects/compaction-encoding-stats/v3' of https://github.com/bhalevy/scylla:
  compaction: mc: re-calculate encoding_stats based on column stats
  memtable: extract encoding_stats_collector base class to encoding_stats header file
2019-01-18 14:36:17 +00:00
Tomasz Grabiec
7308effb45 tests: flat_mutation_reader_test: Drop unneeded includes
Message-Id: <1547819118-4645-1-git-send-email-tgrabiec@scylladb.com>
2019-01-18 13:58:05 +00:00
Tomasz Grabiec
6461e085fe managed_bytes: Fix compilation on gcc 8.2
The compilation fails on -Warray-bounds, even though the branch is never taken:

    inlined from ‘managed_bytes::managed_bytes(bytes_view)’ at ./utils/managed_bytes.hh:195:22,
    inlined from ‘managed_bytes::managed_bytes(const bytes&)’ at ./utils/managed_bytes.hh:162:77,
    inlined from ‘dht::token dht::bytes_to_token(bytes)’ at dht/random_partitioner.cc:68:57,
    inlined from ‘dht::token dht::random_partitioner::get_token(bytes)’ at dht/random_partitioner.cc:85:39:
/usr/include/c++/8/bits/stl_algobase.h:368:23: error: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ offset 16 from the object at ‘<anonymous>’ is out of the bounds of referenced subobject ‘managed_bytes::small_blob::data’ with type ‘signed char [15]’ at offset 0 [-Werror=array-bounds]
      __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Work around by disabling the diagnostic locally.
Message-Id: <1547205350-30225-1-git-send-email-tgrabiec@scylladb.com>
2019-01-18 13:48:05 +00:00
Tomasz Grabiec
31c6a794e9 Merge "Type-eaese gratuitous templates with functions" from Avi
Many areas of the code are splattered with unneeded templates. This patchset replaces
some of them, where the template parameter is a function object, with an std::function
or noncopyable_function (with a preference towards the latter; but it is not always
possible). As the template is compiled for each instantiation (if the function
object is a lambda) while a function is compiled only once, there are significant
savings in compile time and bloat.

   text    data     bss     dec     hex filename
85160690          42120  284910 85487720        5187068 scylla.before
84824762          42120  284910 85151792        5135030 scylla.after

* https://github.com/avikivity/scylla detemplate/v1:
  api/commitlog: de-template acquire_cl_metric()
  database: de-template do_parse_schema_tables
  database: merge for_all_partitions and for_all_partitions_slow
  hints: de-template scan_for_hints_dirs()
  schema_tables: partially de-template make_map_mutation()
  distributed_loader: de-template
  tests: commitlog_test: de-template
  tests: cql_auth_query_test: de-template
  test: de-template eventually() and eventually_true()
  tests: flush_queue_test: de-template
  hint_test: de-template
  tests: mutation_fragment_test: de-template
  test: mutation_test: de-template
2019-01-18 11:42:01 +01:00
Piotr Sarna
3d65eb5d4a distributed_loader: restore indentation 2019-01-18 10:59:37 +01:00
Piotr Sarna
e50e9b5150 distributed_loader: restore always mutating to level 0
When introducing view update generation path for sstables
in /upload directory, mutating these sstables was moved
to regular path only. It was wrong, because sstables that
need view updates generated from them may still need
to be downgraded to LCS level 0, so they won't disrupt
LCS assumptions after being loaded.

Reported-by: Nadav Har'El <nyh@scylladb.com>
2019-01-18 10:35:20 +01:00
Avi Kivity
089931fb56 test: mutation_test: de-template
Replace the with_column_family helper template with an ordinary function, to
reduce code bloat.
2019-01-17 19:06:42 +02:00
Avi Kivity
53a3db9446 tests: mutation_fragment_test: de-template
The for_each_target() template is called four times, so making it a normal function
reduces a lot of code generation.
2019-01-17 19:05:48 +02:00
Avi Kivity
4a21de4592 hint_test: de-template
While cl_test is duplicated with commitlog_test, at least deduplicate it internally
by converting it to an ordinary function.
2019-01-17 19:03:31 +02:00
Avi Kivity
1f02fd3ff6 tests: flush_queue_test: de-template
The internal test_propagation template is instantiated many times. Replace
with an ordinary function to reduce bloat. Call sites adjusted to have a
uniform signature.
2019-01-17 19:02:26 +02:00
Avi Kivity
63077501ed test: de-template eventually() and eventually_true()
These templates are not trivial and called many times. De-template them to
reduce code bloat.
2019-01-17 19:00:55 +02:00
Avi Kivity
a5d3254ed3 tests: cql_auth_query_test: de-template
Replace the with_user() and verify_unauthorized_then_ok() templates with functions.
Some adjustments made to the call site to unify the signatures.
2019-01-17 18:59:30 +02:00
Avi Kivity
8c05debecb tests: commitlog_test: de-template
The cl_test function is called many times, so its contents are bloat. De-template
it so it is compiled only once.
2019-01-17 18:57:35 +02:00
Avi Kivity
b6239134c2 distributed_loader: de-template
distributed_loader has several large templates that can be converted to normal
functions with the help of noncopyable_function<>, reducing code bloat.
2019-01-17 18:56:22 +02:00
Avi Kivity
2407c35cc1 schema_tables: partially de-template make_map_mutation()
make_map_mutation() is called several times, hopefully with the same Map type
parameter. Replace the Func parameter with a noncopyable_function<>.
2019-01-17 18:54:43 +02:00
Avi Kivity
81d004b2c0 hints: de-template scan_for_hints_dirs()
This function is called twice, and is not doing anything performance critical,
so replace the template parameter Func with std::function<>.
2019-01-17 18:51:46 +02:00
Avi Kivity
f61dbc9855 database: merge for_all_partitions and for_all_partitions_slow
for_all_partitions is only used in the implementation of for_all_partitions_slow,
so merge them and get rid of a template.
2019-01-17 18:50:36 +02:00
Avi Kivity
4568a4e4b0 database: de-template do_parse_schema_tables
This long slow-path function is called four times, so de-templating it is an
easy win.
2019-01-17 18:48:57 +02:00
Avi Kivity
08bd28942b api/commitlog: de-template acquire_cl_metric()
Use noncopyable_function instead of a template parameter. Likely doesn't gain
anything, because the template was always instantiated with the same type
(the result of std::bind() with the same signatures), but still good practice.
2019-01-17 18:45:14 +02:00
Botond Dénes
4537ec7426 multishard_mutation_query(): use correct reader concurrency semaphore
The multishard mutation query used the semaphore obtained from
`database::user_read_concurrency_sem()` to pause-resume shard readers.
This presented a problem when `multishard_mutation_query()` was reading
from system tables. In this case the readers themselves would obtain
their permits from the system read concurrency semaphore. Since the
pausing of shard readers used the user read semaphore, pausing failed to
fulfill its objective of alleviating pressure on the semaphore the reads
obtained their permits from. In some cases this led to a deadlock
during system reads.
To ensure the correct semaphore is used for pausing-resuming readers,
obtain the semaphore from the `table` object. To avoid looking up the
table on every pause or resume call, cache the semaphores when readers
are created.
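A minimal model of the fix (the type and member names below are illustrative; the real types are Scylla's reader_concurrency_semaphore and table): cache the semaphore reference when the reader is created, rather than unconditionally reaching for the user-read semaphore later.

```cpp
#include <string>

// Each table knows which semaphore its reads draw permits from;
// system tables would hold the system read semaphore here.
struct semaphore { std::string name; };

struct table {
    semaphore& read_sem;
};

struct shard_reader {
    semaphore* sem;  // cached at creation time, so pause/resume hit the right one
    explicit shard_reader(table& t) : sem(&t.read_sem) {}
    const std::string& pause_target() const { return sem->name; }
};
```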

Fixes: #4096

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c784a3cd525ce29642d7216fbe92638fa7884e88.1547729119.git.bdenes@scylladb.com>
2019-01-17 15:19:59 +02:00
Avi Kivity
8e9989685d scyllatop: complete conversion to python3
d2dbbba139 converted scyllatop's interpreter to Python 3, but neglected to do
the actual conversion. This patch does so, by running 2to3 over all files and adding
an additional bytes->string decode step in prometheus.py. Superfluous 2to3 changes
to print() calls were removed.
Message-Id: <20190117124121.7409-1-avi@scylladb.com>
2019-01-17 12:50:25 +00:00
Duarte Nunes
7505815013 Merge 'Fix filtering with LIMIT and paging' from Piotr
"
Before this series the limit was applied per page instead
of globally, which might have resulted in returning too many
rows.

To fix that:
 1. restrictions filter now has a 'remaining' parameter
    in order to stop accepting rows after enough of them
    have already been accepted
 2. pager passes its row limit to restrictions filter,
    so no more rows than necessary will be served to the client
 3. results no longer need to be trimmed on select_statement
    level
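The accumulated-limit idea in steps 1 and 2 can be sketched like this (hypothetical types; the real logic lives in Scylla's restrictions filter and pager):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The filter carries a 'remaining' budget that persists across pages,
// instead of being reset on every page as before the fix.
struct filter {
    uint64_t remaining;  // global LIMIT budget, decremented as rows pass
    bool accept(bool row_matches) {
        if (!row_matches || remaining == 0) {
            return false;
        }
        --remaining;
        return true;
    }
};

// Serve one page of at most page_size rows; once the global budget is
// exhausted, later pages come back empty with no trimming needed.
std::vector<int> serve_page(filter& f, const std::vector<int>& rows, std::size_t page_size) {
    std::vector<int> out;
    for (int r : rows) {
        if (out.size() == page_size) {
            break;
        }
        if (f.accept(/*row_matches=*/r % 2 != 0)) {
            out.push_back(r);
        }
    }
    return out;
}
```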

Tests: unit (release)
"

* 'fix_filtering_limit_with_paging_3' of https://github.com/psarna/scylla:
  tests: add filtering+limit+paging test case
  tests: allow null paging state in filtering tests
  cql3: fix filtering with LIMIT with regard to paging
2019-01-17 12:50:00 +00:00
Piotr Sarna
ed7328613f tests: add filtering+limit+paging test case
A test case that checks whether a combination of paging
and LIMIT clause for filtering queries doesn't return
with too many rows.

Refs #4100
2019-01-17 13:25:10 +01:00
Piotr Sarna
7d4f994e98 tests: allow null paging state in filtering tests
Previously the utility to extract paging state asserted
that the state exists, but in future tests it would be useful
to be able to call this function even if it would return null.
2019-01-17 13:25:10 +01:00
Piotr Sarna
87c23372fb cql3: fix filtering with LIMIT with regard to paging
Previously the limit was erroneously applied per page
instead of being accumulated, which might have caused returning
too many rows. As of now, LIMIT is handled properly inside
restrictions filter.

Fixes #4100
2019-01-17 13:25:09 +01:00
Piotr Sarna
02d88de082 db,view: add consuming units in staging table registration
View update generator service can accept sstables even before it starts,
but it should still acknowledge the number of waiters in the semaphore.

Reported-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <fcaa0f2884ebb4d34d1716e9e1cfed0642b4b85d.1547661048.git.sarna@scylladb.com>
2019-01-16 18:05:17 +00:00
Benny Halevy
1d483bc424 compaction: mc: re-calculate encoding_stats based on column stats
When compacting several sstables, get and merge their encoding_stats
for encoding the result.

Introduce sstable::get_encoding_stats_for_compaction to return encoding_stats
based on the sstable's column stats.

Use encoding_stats_collector to keep track of the minimum encoding_stats
values of all input sstables.
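The merge step can be sketched as a component-wise minimum over the inputs (a simplified stand-in for Scylla's encoding_stats; field names follow the commit's description):

```cpp
#include <algorithm>
#include <cstdint>

// Keeping the minima of all input sstables means the deltas encoded in
// the compacted output stay as small as possible.
struct encoding_stats {
    int64_t min_timestamp;
    int32_t min_local_deletion_time;
    int32_t min_ttl;
};

encoding_stats merge(const encoding_stats& a, const encoding_stats& b) {
    return {
        std::min(a.min_timestamp, b.min_timestamp),
        std::min(a.min_local_deletion_time, b.min_local_deletion_time),
        std::min(a.min_ttl, b.min_ttl),
    };
}
```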

Fixes #3971

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-16 17:59:59 +02:00
Benny Halevy
e2c4d2d60a memtable: extract encoding_stats_collector base class to encoding_stats header file
To be used also by compaction.

Refs #3971

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-16 17:59:58 +02:00
Asias He
4b9e1a9f1d repair: Add row level metrics
Number of rows sent and received
- tx_row_nr
- rx_row_nr

Bytes of rows sent and received
- tx_row_bytes
- rx_row_bytes

Number of row hashes sent and received
- tx_hashes_nr
- rx_hashes_nr

Number of rows read from disk
- row_from_disk_nr

Bytes of rows read from disk
- row_from_disk_bytes

Message-Id: <d1ee6b8ae8370857fe45f88b6c13087ea217d381.1547603905.git.asias@scylladb.com>
2019-01-16 14:04:57 +02:00
Duarte Nunes
04a14b27e4 Merge 'Add handling staging sstables to /upload dir' from Piotr
"
This series adds generating view updates from sstables added through
/upload directory if their tables have accompanying materialized views.
Said sstables are left in /upload directory until updates are generated
from them and are treated just like staging sstables from /staging dir.
If there are no views for a given tables, sstables are simply moved
from /upload dir to datadir without any changes.

Tests: unit (release)
"

* 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla:
  all: rename view_update_from_staging_generator
  distributed_loader: fix indentation
  service: add generating view updates from uploaded sstables
  init: pass view update generator to storage service
  sstables: treat sstables in upload dir as needing view build
  sstables,table: rename is_staging to requires_view_building
  distributed_loader: use proper directory for opening SSTable
  db,view: make throttling optional for view_update_generator
2019-01-15 18:19:27 +00:00
Duarte Nunes
9b79f0f58b Merge 'Add stream phasing' from Piotr
"
This series addresses the problem mentioned in issue 4032, which is a race
between creating a view and streaming sstables to a node. Before this patch
the following scenario is possible:
 - sstable X arrives from a streaming session
 - we decide that view updates won't be generated from an sstable X
   by the view builder
 - new view is created for the table that owns sstable X
 - view builder doesn't generate updates from sstable X, even though the table
   has accompanying views - which is an inconsistency

This race is fixed by making the view builder wait for all ongoing streams,
just like it does for reads and writes. It's implemented with a phaser.

Tests:
unit (release)
dtest(not merged yet: materialized_views_test.TestMaterializedViews.stream_from_repair_during_build_process_test)
"

* 'add_stream_phasing_2' of https://github.com/psarna/scylla:
  repair: add stream phasing to row level repair
  streaming: add phasing incoming streams
  multishard_writer: add phaser operation parameter
  view: wait for stream sessions to finish before view building
  table: wait for pending streams on table::stop
  database: add pending streams phaser
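The phaser idea behind this series can be modeled minimally as follows (Scylla's actual phaser is future-based; this single-threaded sketch only models the counting, and the names are illustrative):

```cpp
// Each stream session registers on start and deregisters on finish; a
// waiter such as the view builder knows every session it saw has drained
// once the in-flight count returns to zero.
struct phaser {
    int in_flight = 0;
    struct operation {
        phaser* p;
        explicit operation(phaser& ph) : p(&ph) { ++p->in_flight; }
        ~operation() { --p->in_flight; }
    };
    operation start() { return operation(*this); }
    bool drained() const { return in_flight == 0; }
};
```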
2019-01-15 18:18:40 +00:00
Piotr Sarna
0eb703dc80 all: rename view_update_from_staging_generator
The new name, view_update_generator, is both more concise
and correct, since we now generate from directories
other than "/staging".
2019-01-15 17:31:47 +01:00
Piotr Sarna
a5d24e40e0 distributed_loader: fix indentation
Bad indentation was introduced in the previous commit.
2019-01-15 17:31:37 +01:00
Piotr Sarna
13c8c84045 service: add generating view updates from uploaded sstables
SSTables loaded into the system via the /upload dir may sometimes need
view updates generated from them (if their table has accompanying
views).

Fixes #4047
2019-01-15 17:31:37 +01:00
Piotr Sarna
46305861c3 init: pass view update generator to storage service
Storage service needs to access view update generator in order
to register staging sstables from /upload directory.
2019-01-15 17:31:36 +01:00
Piotr Sarna
13f6453350 sstables: treat sstables in upload dir as needing view build
In some cases, sstables put in the upload dir should have view updates
generated from them. In order to avoid moving them across directories
(which then involves handling failure paths), upload dir will also be
treated as a valid directory where staging sstables reside.
Regular sstables that are not needed for view updates will be
immediately moved from upload/ dir as before.
2019-01-15 16:47:01 +01:00
Piotr Sarna
09401e0e71 sstables,table: rename is_staging to requires_view_building
A generalized name will be more fitting once we treat uploaded sstables
as requiring view building too.
2019-01-15 16:47:01 +01:00
Piotr Sarna
76616f6803 distributed_loader: use proper directory for opening SSTable
The previous implementation assumed that each SSTable resides directly
in the table::datadir directory, while what should actually be used
is the directory path from the SSTable descriptor.
This patch prevents a regression when adding staging sstables support
for upload/ dir.
2019-01-15 16:47:01 +01:00
Piotr Sarna
beb4836726 db,view: make throttling optional for view_update_generator
Currently registering new view updates is throttled by a semaphore,
which makes sense during stream sessions in order to avoid overloading
the queue. Still, registration also occurs during initialization,
where it makes little sense to wait on a semaphore, since view update
generator might not have started at all yet.
2019-01-15 16:47:01 +01:00
Paweł Dziepak
635873639b Merge "Encoding stats enhancements" from Benny
"
Cleanup various cases related to updating of metadata stats and encoding stats
updating in preparation for 64-bit gc_clock (#3353).

Fixes #4026
Fixes #4033
Fixes #4035
Fixes #4041

Refs #3353
"

* 'projects/encoding-stats-fixes/v6' of https://github.com/bhalevy/scylla:
  sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES
  sstables: mc: use api::timestamp_type in write_liveness_info
  sstables: mc: sstable_write encoding_stats are const
  mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time
  memtable: don't use encoding_stats epochs as default
  memtable: mc: update min_ttl encoding stats for dead row marker
  memtable: mc: add comment regarding updating encoding stats of collection tombstones
  sstables: metadata_collector: add update tombstone stats
  sstables: assert that delete_time is not live when updating stats
  sstables: move update_deletion_time_stats to metadata collector
  sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram
  sstables: mc: write_liveness_info and write_collection should update tombstone_histogram
  sstables: update_local_deletion_time for row marker deletion_time and expiration
2019-01-15 16:53:36 +02:00
Tomasz Grabiec
32f711ce56 row_cache: Fix crash on memtable flush with LCS
Presence checker is constructed and destroyed in the standard
allocator context, but the presence check was invoked in the LSA
context. If the presence checker allocates and caches some managed
objects, there will be alloc-dealloc mismatch.

That is the case with LeveledCompactionStrategy, which uses
incremental_selector.

Fix by invoking the presence check in the standard allocator context.

Fixes #4063.

Message-Id: <1547547700-16599-1-git-send-email-tgrabiec@scylladb.com>
2019-01-15 16:53:36 +02:00
Piotr Sarna
08a42d47a5 repair: add stream phasing to row level repair
In order to allow other services to wait for incoming streams
to finish, row level repair uses stream phasing when creating
new sstables from incoming data.

Fixes scylladb#4032
2019-01-15 10:28:21 +01:00
Piotr Sarna
7e61f02365 streaming: add phasing incoming streams
Incoming streams are now phased, which can be leveraged later
to wait for all ongoing streams to finish.

Refs #4032
2019-01-15 10:28:15 +01:00
Asias He
1cc7e45f44 database: Make log max_vector_size and internal_count debug level
It is useful for developers but not useful for users. Make it debug
level.

Message-Id: <775ce22d6f8088a44d35601509622a7e73ddeb9b.1547524976.git.asias@scylladb.com>
2019-01-15 11:02:30 +02:00
Piotr Sarna
238003b773 multishard_writer: add phaser operation parameter
Multishard writer can now accept a phaser operation parameter
in order to sustain a phased operation (e.g. a streaming session).
2019-01-15 10:02:22 +01:00
Piotr Sarna
b9203ec4f8 view: wait for stream sessions to finish before view building
During streaming, there's a race between streamed sstables
and view creation, which might result in some sstables not being
used to generate view updates, even though they should.
That happens when the decision about view update path for a table
is done before view creation, but after already receiving some sstables
via streaming. These will not be used in view building even though
they should.
Hence, a phaser is used to make the view builder wait for all ongoing
stream sessions for a table to finish before proceeding with build steps.

Refs #4032
2019-01-15 09:36:55 +01:00
Piotr Sarna
d3a8fb378c table: wait for pending streams on table::stop
Stream sessions are now phased, so it's possible to wait for existing
streams to finish gently before stopping a table.
2019-01-15 09:36:55 +01:00
Piotr Sarna
8a5aaf2839 database: add pending streams phaser
This phaser will be used later to wait for all existing stream sessions
to finish before proceeding with view building.
2019-01-15 09:36:55 +01:00
Nadav Har'El
9062750089 scylla_util.py: make view_hints_directory setting optional
It is optional to set "view_hints_directory", so we shouldn't insist that
it is defined in scylla.yaml on upgrade.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190114125225.10794-1-nyh@scylladb.com>
2019-01-14 14:59:20 +02:00
Benny Halevy
238866228f memtable: rename get_stats to get_encoding_stats
For symmetry with similar sstable and compaction methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190113105155.29118-2-bhalevy@scylladb.com>
2019-01-14 14:58:43 +02:00
Avi Kivity
df090a15ff Merge "Add counters for inactive reads" from Botond
"
This mini-series adds counters for the inactive reads registered in the
reader concurrency semaphore.
"

* 'reader-concurrency-semaphore-counters/v6' of https://github.com/denesb/scylla:
  tests/querier_cache: use stats to get the no. of inactive reads
  reader_concurrency_semaphore: add counters for inactive reads
2019-01-14 11:56:43 +02:00
Rafael Ávila de Espíndola
acd6999ba9 Don't use SEASTAR_HAVE_LZ4_COMPRESS_DEFAULT in scylla
The existence of LZ4_compress_default is a property of the lz4
library, not seastar.

With this patch scylla does its own configure check instead of
depending on the one done by seastar.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190114013737.5395-1-espindola@scylladb.com>
2019-01-14 11:51:20 +02:00
Rafael Ávila de Espíndola
684fb607c4 sstable: handle missing index entry
This patch fixes a crash when the index file is corrupted and we get
an empty index entry list.

Tests: unit (release)

Fixes: #2532

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190110202833.29333-1-espindola@scylladb.com>
2019-01-14 10:47:21 +01:00
Avi Kivity
f5ee466a1c Merge "Cleanup UDT and tuple names creation" from Piotr
"
Currently the logic is scattered between types.*, cql3_types.* and
sstables/mc/writer.cc.

This patchset places all the logic in types.* and makes sure we
correctly add "frozen<...>" and "FrozenType(...)" to the names of
tuples and UDTs.

Fixes #4087

Tests: unit(release)
"

* 'haaawk/4087_v1' of github.com:scylladb/seastar-dev:
  Add comment explaining tuple type name creation
  Add "FrozenType(...)" to UDT name only when it's frozen
  Move "FrozenType(...)" addition to UDT name to user_type_impl
  Add "frozen<...>" to tuple CQL name only when it's frozen
  Move "frozen<...>" addition to tuple CQL name to tuple_type_impl
  Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type
  Add "frozen<...>" to UDT CQL name only when it's frozen
  Move "frozen<...>" addition to UDT CQL name to user_type_impl
2019-01-13 15:34:24 +02:00
Benny Halevy
b243852a70 sstables: remove duplicated code in data_consume_rows_context CELL_VALUE_BYTES
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
d9e2aa65fc sstables: mc: use api::timestamp_type in write_liveness_info
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
7ea96aa778 sstables: mc: sstable_write encoding_stats are const
Encoding stats are immutable once statistics are sealed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
5d2d2bf47a mp_row_consumer_k_l::consume_deleted_cell rename ttl param to local_deletion_time
It is actually the local deletion time rather than the ttl

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
2c99eb28d8 memtable: don't use encoding_stats epochs as default
Why default to an artificial minimum when you can do better
with zero effort? Track the actual minima in the memtable instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
9b78911379 memtable: mc: update min_ttl encoding stats for dead row marker
Update min ttl with expired_liveness_ttl (although its value of max int32
is not expected to affect the minimum).

Fixes #4041

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
47964d9ddc memtable: mc: add comment regarding updating encoding stats of collection tombstones
When the row flag has_complex_deletion is set, some collection columns may have
deletion tombstones and some may not. We don't strictly need to update stats
for them, since doing so will not affect the encoding_stats anyway.

Fixes #4035

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
75ccd29b6a sstables: metadata_collector: add update tombstone stats
Conditionally update timestamp and local_deletion_time stats based on tombstone

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
0ae85a126a sstables: assert that delete_time is not live when updating stats
Be compatible with Cassandra

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
12e6b503c9 sstables: move update_deletion_time_stats to metadata collector
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
2989b986ef sstables: metadata_collector: introduce update_local_deletion_time_and_tombstone_histogram
Refs #4026
Refs #4033

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
bcb1fcd402 sstables: mc: write_liveness_info and write_collection should update tombstone_histogram
Fixes #4033

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Benny Halevy
0ca4ae658c sstables: update_local_deletion_time for row marker deletion_time and expiration
Fixes #4026

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-13 14:17:45 +02:00
Tomasz Grabiec
f12a3e2066 sstables: index_reader: Rename _promoted_index_size
Message-Id: <1547219234-21182-2-git-send-email-tgrabiec@scylladb.com>
2019-01-13 11:29:13 +02:00
Tomasz Grabiec
6c5f8e0eda sstables: index_reader: Simplify offset calculations
Now that continuous_data_consumer::position() is meaningful (since
36dd660), we can use our position in the stream to calculate offsets
instead of duplicating state machine in offset calculations.

The value of position() - data.size() always holds the current offset
in the stream.
Message-Id: <1547219234-21182-1-git-send-email-tgrabiec@scylladb.com>
2019-01-13 11:29:12 +02:00
Avi Kivity
0d52bdcbad install-dependencies.sh: unwrap long lines
Put package names one per line. This makes it easier to review changes,
and to backport changes to this file. No content changes.

Message-Id: <20190112091024.21878-1-avi@scylladb.com>
2019-01-12 14:23:27 +02:00
Avi Kivity
391d1e0fe0 table: const correctness for table::get_sstables() and related
Do not allow write access to the sstable list via this accessor. Luckily
there are no violations, and now we enforce it.
Message-Id: <20190111151049.16953-1-avi@scylladb.com>
2019-01-11 17:39:17 +01:00
Rafael Ávila de Espíndola
cd9ce18874 sstable: rename the is_boundary predicate
The new name makes it clear what is on either side of the boundary.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190110221324.33618-1-espindola@scylladb.com>
2019-01-11 14:36:49 +02:00
Piotr Jastrzebski
96b880f81c Add comment explaining tuple type name creation
To keep format compatibility we never wrap the tuple type name
into "org.apache.cassandra.db.marshal.FrozenType(...)".
Even when the tuple is frozen.
This patch adds a comment in tuple_type_impl::make_name that
explains the situation.

For more details see #4087

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:14:26 +01:00
Piotr Jastrzebski
57e655d716 Add "FrozenType(...)" to UDT name only when it's frozen
At the moment Scylla supports only frozen UDTs but
the code should be able to handle non-frozen UDTs as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:08:02 +01:00
Piotr Jastrzebski
fc17bd376b Move "FrozenType(...)" addition to UDT name to user_type_impl
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 12:07:47 +01:00
Piotr Jastrzebski
1fdfc461b8 Add "frozen<...>" to tuple CQL name only when it's frozen
At the moment Scylla supports only frozen tuples but
the code should be able to handle non-frozen tuples as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
749eee2711 Move "frozen<...>" addition to tuple CQL name to tuple_type_impl
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
7aba17de2c Merge make_cql3_tuple_type into tuple_type_impl::as_cql3_type
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
56060573bb Add "frozen<...>" to UDT CQL name only when it's frozen
At the moment Scylla supports only frozen UDTs but
the code should be able to handle non-frozen UDTs as well.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:14:30 +01:00
Piotr Jastrzebski
a928c103c2 Move "frozen<...>" addition to UDT CQL name to user_type_impl
This logic belongs in types.hh/types.cc layer.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-11 11:09:00 +01:00
Raphael S. Carvalho
1b7cad3531 database: Fix race condition in sstable snapshot
Race condition takes place when one of the sstables selected by snapshot
is deleted by compaction. Snapshot fails because it tries to link a
sstable that was previously unlinked by compaction's sstable deletion.

Fixes #4051.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20190110194048.26051-1-raphaelsc@scylladb.com>
2019-01-11 07:53:14 +02:00
Benny Halevy
2dc3776407 sstables: mc: sign-extend serialization_header min_local_deletion_time_base and min_ttl_base
Refs #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190110141439.1324-1-bhalevy@scylladb.com>
2019-01-10 16:23:20 +02:00
Gleb Natapov
a29182b447 sstable: fix use after free while applying extensions in sstable::open_file
sstable_file_io_extensions() returns an array of pointers to extensions,
but do_for_each() may defer and the array would be destroyed. The patch
keeps it alive until do_for_each completes.

Message-Id: <20190110125656.GC3172@scylladb.com>
2019-01-10 15:10:06 +02:00
Avi Kivity
b247ce01c3 table: restore indentation after changes to table::make_sstable_reader
Message-Id: <20190109175804.9352-2-avi@scylladb.com>
2019-01-10 13:00:53 +01:00
Avi Kivity
3d6be2f822 table: reduce duplication in table::make_sstable_reader
make_sstable_reader needs to deal with single-key and scanning reads, and
with restricting and non-restricting (in terms of read concurrency) readers.
Right now it does this combinatorially - there are separate cases for
restricting single-key reads, non-restricting single-key reads, restricting
scans, and non-restricting scans.

This makes further changes more complicated, so separate the two concepts.
The patch splits the code into two stages; the first selects between a single-key
and a scan, and the second selects between a restricting and non-restricting read.

This slightly pessimizes non-restricting reads (a mutation_source is created and
immediately destroyed), but that's not the common case.

Tests: unit(release)
Message-Id: <20190109175804.9352-1-avi@scylladb.com>
2019-01-10 13:00:40 +01:00
Benny Halevy
16dda033a5 sstables: row_marker: initialize _expiry
compare_row_marker_for_merge compares deletion_time also for row markers
that have missing timestamps.  This happened to succeed due to implicit
initialization to 0. However, we prefer the initialization to be explicit
and allow calling row_marker::deletion_time() in all states.
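The principle of the fix can be illustrated with a default member initializer (a simplified stand-in for sstables::row_marker; the sentinel value is illustrative):

```cpp
#include <cstdint>

// Explicitly initializing _expiry makes deletion_time() well-defined in
// every state, instead of relying on accidental zero-initialization.
struct row_marker {
    int64_t _timestamp = -1;  // illustrative "missing" sentinel
    int32_t _expiry = 0;      // explicit; previously could be indeterminate
    int32_t deletion_time() const { return _expiry; }
};
```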

Fixes #4068

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190110102949.17896-1-bhalevy@scylladb.com>
2019-01-10 12:45:07 +01:00
Avi Kivity
4a6aeced59 Merge "Fix UDTs representation in serialization header" from Piotr
"
Tests: unit(release)
"

Fixes #4073.

* commit 'FETCH_HEAD~1':
  Add test for serialization header with UDT
  Fix UDT names in serialization header
2019-01-10 12:57:11 +02:00
Piotr Jastrzebski
d4bc5b64cf Add test for serialization header with UDT
Serialization header stores column types for all
columns in sstable. If any of them is a UDT then it
has to be wrapped into
"org.apache.cassandra.db.marshal.FrozenType(...)".

This patch adds a test case to verify that.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-10 10:59:01 +01:00
Piotr Jastrzebski
3de85aebc9 Fix UDT names in serialization header
Serialization header stores type names of all
columns in a table. Including partition key columns,
clustering key columns, static columns and regular columns.

If one of those types is a user defined type then we need to
wrap its name into
"org.apache.cassandra.db.marshal.FrozenType(...)".

Fixes #4073

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-10 10:58:30 +01:00
Benny Halevy
60323b79d1 sstables: mc: sign-extend delta local_deletion_time and delta ttl
Follow Cassandra's encoding so that values that are less than the
baseline encoding_stats will wrap-around in 64-bits rather than 32.
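The difference between the two encodings can be sketched like this (function names are illustrative; the real code writes the result as a varint):

```cpp
#include <cstdint>

// A value below the baseline yields a negative delta. Zero-extending the
// 32-bit delta wraps in 32 bits; sign-extending first, as Cassandra does,
// wraps in 64 bits instead.
uint64_t delta_zero_extended(int32_t value, int32_t base) {
    return uint64_t(uint32_t(value - base));  // wrong: wraps in 32 bits
}

uint64_t delta_sign_extended(int32_t value, int32_t base) {
    return uint64_t(int64_t(value - base));   // right: wraps in 64 bits
}
```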

Fixes #4074
Refs #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190109192703.18371-1-bhalevy@scylladb.com>
2019-01-09 21:43:30 +02:00
Rafael Ávila de Espíndola
26ac2c23ef Change *_row_* names that refer to partitions
This renames some variables and functions to make it clear that they
refer to partitions and not rows.

Old versions of sstablemetadata used to refer to a row histogram, but
current versions now mention a partition histogram instead.

This patch doesn't change the exposed API names.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181229223311.4184-2-espindola@scylladb.com>
2019-01-09 14:53:42 +02:00
Takuya ASADA
f00e9051ea reloc: show error message when relocatable package doesn't exist
Both build_rpm.sh and build_deb.sh fail at the beginning of the script
when the relocatable package does not exist; we need to prevent that and
show a user-friendly message.

Fixes #4071

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190109094353.16690-1-syuu@scylladb.com>
2019-01-09 12:53:08 +02:00
Raphael S. Carvalho
f5301990fc compaction: release reference of cleaned sstable in compaction manager
Compaction manager holds reference to all cleaning sstables till the very
end, and that becomes a problem because disk space of cleaned sstables
cannot be reclaimed due to respective file descriptors opened.

Fixes #3735.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181221000941.15024-1-raphaelsc@scylladb.com>
2019-01-08 14:14:01 +02:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.
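The replacements amount to the following kinds of changes (illustrative snippets, not code from the patch): std::optional and std::string_view from `<optional>`/`<string_view>` instead of the std::experimental versions, and std::variant with std::visit where boost::variant was visited through seastar::visit.

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <variant>

// std::optional + std::string_view, previously stdx:: spellings.
std::optional<std::string_view> first_word(std::string_view s) {
    if (s.empty()) {
        return std::nullopt;
    }
    return s.substr(0, s.find(' '));  // npos means the whole string
}

// std::variant visited with std::visit, previously boost::variant
// visited through seastar::visit.
std::string describe(const std::variant<int, std::string>& v) {
    return std::visit([](auto&& x) -> std::string {
        if constexpr (std::is_same_v<std::decay_t<decltype(x)>, int>) {
            return "int:" + std::to_string(x);
        } else {
            return "string:" + x;
        }
    }, v);
}
```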

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Rafael Ávila de Espíndola
51a08c3240 sstable: remove constexpr from run time predicates
We never check these predicates at compile time.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190108010055.92042-1-espindola@scylladb.com>
2019-01-08 12:28:42 +02:00
Piotr Sarna
c5346cdf9b database, table: split table-related code to table.cc
All table:: related code is moved to table.cc source file,
which splits database.cc size in half and thus allows
faster compilation on multiple cores.

Refs #1

Message-Id: <28e67f7793ff2147ffce18df5e0b077e14d3b8bd.1546940360.git.sarna@scylladb.com>
2019-01-08 12:02:42 +02:00
Avi Kivity
8ecb528d5a Update seastar submodule
* seastar 67fd967...af6b797 (1):
  > iotune: Initialize io_rates member variables

Fixes #4064
2019-01-08 12:02:42 +02:00
Avi Kivity
d8adbeda11 tests: mutation_source_test: generate valid utf-8 data
test_fast_forwarding_across_partitions_to_empty_range uses an uninitialized
string to populate an sstable, but this can be invalid UTF-8, so the sstable
cannot be processed with sstabledump.

Make it valid by using make_random_string().

Fixes #4040.
Message-Id: <20190107193240.14409-1-avi@scylladb.com>
2019-01-08 12:02:42 +02:00
Asias He
1de24c8495 repair: Use mf.visit() in fragment_hasher
When a new fragment type is added, the code will fail to compile instead of
producing runtime errors.

Message-Id: <cf10200e4185c779aad15da3a776a5b79f5323af.1546930796.git.asias@scylladb.com>
2019-01-08 12:02:42 +02:00
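The compile-time-exhaustiveness benefit of visit() can be shown with std::variant and std::visit (seastar's visit is analogous); the fragment types below are simplified stand-ins, not Scylla's real mutation fragment types:

```cpp
#include <cassert>
#include <variant>

struct clustering_row { int r; };
struct range_tombstone { int t; };
using mutation_fragment_kind = std::variant<clustering_row, range_tombstone>;

struct fragment_hasher {
    int operator()(const clustering_row& cr) const { return cr.r; }
    int operator()(const range_tombstone& rt) const { return -rt.t; }
    // Adding a third alternative to mutation_fragment_kind without a
    // matching operator() here makes the visit() call below fail to
    // compile, instead of falling through silently at runtime.
};

int hash_fragment(const mutation_fragment_kind& mf) {
    return std::visit(fragment_hasher{}, mf);
}
```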
Rafael Ávila de Espíndola
67039e942b Remove the only use of with_alignment from scylla
In C++17 there are standard ways of requesting aligned memory, so
seastar doesn't need to provide its own.

This patch is in preparation for removing with_alignment from seastar.

Tests: unit (debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190107191019.22295-1-espindola@scylladb.com>
2019-01-07 21:34:47 +02:00
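The standard C++17 facilities alluded to above are std::aligned_alloc and over-aligned operator new (selected through alignas / std::align_val_t). A minimal illustration, with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// 1. std::aligned_alloc: size must be a multiple of the alignment;
//    release with std::free.
void* cache_line_buffer(std::size_t bytes) {
    return std::aligned_alloc(64, bytes);
}

// 2. Over-aligned types: `new` picks the aligned operator new overload
//    (the one taking std::align_val_t) automatically in C++17.
struct alignas(64) aligned_counter {
    std::uint64_t value = 0;
};

bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```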
Rafael Ávila de Espíndola
0d4529a5f1 Change timeout to fix tests in a debug build
The current timeout is way too small for debug builds. Currently,
Jenkins runs avoid the problem by increasing the timeout by 100x. This
patch increases it by 10x, which seems to be sufficient to run the
tests on most desktop machines.

Message-Id: <20190107191413.22531-1-espindola@scylladb.com>
2019-01-07 21:34:06 +02:00
Avi Kivity
34251f5ea1 tools: toolchain: update image for all-user sudo 2019-01-07 21:22:42 +02:00
Takuya ASADA
3514b185fd tools: toolchain: allow sudo for all users
A non-privileged user may not belong to the "wheel" group; for example,
Debian variants use the "sudo" group instead of "wheel".
To make sudo work in all environments, we should allow sudo for
"ALL" instead of "wheel".

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107173410.23140-1-syuu@scylladb.com>
2019-01-07 20:47:22 +02:00
Benny Halevy
40410465d7 sstables: mc: expired_liveness_ttl should be max int32_t rather than max uint32_t
Corresponding to Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE;

Fixes #4060

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190107172457.20430-1-bhalevy@scylladb.com>
2019-01-07 18:41:37 +01:00
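The distinction matters because the TTL is a signed 32-bit field. A small sketch of the two sentinels (the wrapped value shown is what common two's-complement ABIs produce, and is well-defined from C++20):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Mirrors Cassandra's EXPIRED_LIVENESS_TTL = Integer.MAX_VALUE, i.e. the
// maximum of a *signed* 32-bit integer:
constexpr std::int32_t expired_liveness_ttl =
    std::numeric_limits<std::int32_t>::max();

// Using the unsigned maximum instead wraps to -1 once the value lands in
// the signed 32-bit TTL field:
constexpr std::int32_t wrong_sentinel =
    static_cast<std::int32_t>(std::numeric_limits<std::uint32_t>::max());
```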
Avi Kivity
20b6d00e56 tools: toolchain: support dbuild from subdirectory or parent directory of scylla.git
When building something other than Scylla (like scylla-tools-java or scylla-jmx)
it is convenient to run it from some other directory. To do that, allow running
dbuild from any directory (so we locate tools/toolchain/image relative to the
dbuild script rather than use a fixed path) and mount the current directory
since it's likely the user will want to access files there.
Message-Id: <20190107165824.25164-1-avi@scylladb.com>
2019-01-07 18:35:51 +01:00
Nadav Har'El
f6e0ce02fa docs/isolation.md: new document
Start a new document with an overview of isolation in Scylla, i.e.,
scheduling groups, I/O priority classes, controllers, etc.

As all documents in docs/, this is a document for developers (not users!)
who need to understand how isolation between different pieces of Scylla
(e.g., queries, compaction, repair, etc.) works, which scheduling groups
and I/O classes we have and why, etc.

The document is still very partial and includes a lot of TODOs on
places where the explanation needs to be expanded. In particular it
needs an accurate explanation (and not just a name) of what kind of
work is done under each of the groups and classes, and an explanation
of how we set up RPC to use which scheduling groups for the code it
executes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190103183232.21348-1-nyh@scylladb.com>
2019-01-07 17:48:35 +02:00
Botond Dénes
80affca5f7 tests/querier_cache: use stats to get the no. of inactive reads
Now that we have added stats for inactive reads, the tests don't need
the `reader_concurrency_semaphore::inactive_reads()` method; instead,
they can rely on the stats to check the number of inactive reads.
2019-01-07 17:06:26 +02:00
Botond Dénes
e56c26205f reader_concurrency_semaphore: add counters for inactive reads
Add counters that give insight into inactive read related events.
Two counters are added:
* permit_based_evictions
* population
2019-01-07 16:45:49 +02:00
Nadav Har'El
da090a5458 materialized views: move hints to top-level directory
While we keep ordinary hints in a directory parallel to the data directory,
we decided to keep the materialized view hints in a subdirectory of the data
directory, named "view_pending_updates". But during boot, we expect all
subdirectories of data/ to be keyspace names, and when we notice this one,
we print a warning:

   WARN: database - Skipping undefined keyspace: view_pending_updates

This spurious warning annoyed users. But moreover, we could have bigger
problems if the user actually tries to create a keyspace with that name.

So in this patch, we move the view hints to a separate top-level directory,
which defaults to /var/lib/scylla/view_hints, but as usual can be configured.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190107142257.16342-1-nyh@scylladb.com>
2019-01-07 16:43:43 +02:00
Takuya ASADA
eddecdd0b5 dist/redhat: drop unused dependencies
wget and yum-builddep are not used anymore, don't install them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107091148.1590-7-syuu@scylladb.com>
2019-01-07 12:56:18 +00:00
Takuya ASADA
40dc62fa98 dist/debian: don't use sudo to rm debian dir
sudo is not allowed in dbuild with non-root privileges, and the directory
should be owned by the current user anyway, so stop using sudo.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107091148.1590-5-syuu@scylladb.com>
2019-01-07 12:56:18 +00:00
Takuya ASADA
237de20ff9 dist/debian: correct dbuild path
/usr/sbin/debuild is a typo; it should be /usr/bin/debuild.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190107091148.1590-4-syuu@scylladb.com>
2019-01-07 12:56:17 +00:00
Pekka Enberg
2520c8caac Merge 'Improve frozen toolchain for continuous integration' from Avi
"Add features that are useful for continuous integration pipelines (and
also ordinary developers):

 - sudo support, with and without a tty, as our packaging scripts require it
 - install ccache package to allow reducing incremental build times
 - dependencies needed to build scylla-jmx and scylla-tools-java"

* tag 'toolchain-ci/v1' of https://github.com/avikivity/scylla:
  tools: toolchain: update image for ant, maven, ccache, sudo
  tools: toolchain: dbuild: pass-through supplementary groups
  tools: toolchain: defeat PAM
  tools: toolchain: improve sudo support
  tools: toolchain: break long line in dbuild
  tools: toolchain: prepare sudoers file
  tools: toolchain: install ccache
  install-dependencies.sh: add maven and ant
2019-01-07 12:56:17 +00:00
Pekka Enberg
9b27a3035c Merge 'Reduce inclusions of "database.hh"' from Avi
"This patchset reduces inclusions of database.hh, particularly in header
files. It reduces the number of objects depending on database.hh from 166
to 116.

Tests: unit(release), playing a little with tracing"

* tag 'database.hh/v1' of https://github.com/avikivity/scylla:
  streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh
  sstables: writer.hh: add some forward declarations
  table_helper: remove database.hh include
  table_helper: de-inline insert() and setup_keyspace()
  table_helper: de-template setup_keyspace()
  table_helper: simplify template body of table_helper::insert()
  schema_tables: remove #include of database.hh
  cql_type_parser: remove dependency on user_types_metadata
  thrift: add missing include of sleep.hh
  cql3: ks_prop_defs: remove #include "database.hh"
2019-01-07 12:56:17 +00:00
Benny Halevy
b017d87a43 tests: mc: add back missing sstable_3_x_test Statistics.db files
To be able to verify the golden version with sstabledump.
These files were generated by running sstable_3_x_test and keeping its
generated output files.

Refs #4043

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190103112511.23488-2-bhalevy@scylladb.com>
2019-01-07 12:56:16 +00:00
Benny Halevy
517ad58823 tests: mc: delete empty line from write_static_row/mc-1-big-TOC.txt
Refs #4043

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190103112511.23488-1-bhalevy@scylladb.com>
2019-01-07 12:56:16 +00:00
Nadav Har'El
b14616b879 docs/logging.md: improvements
Various small improvements to docs/logging.md:
1. Describe the options to log to stdout or syslog and their defaults.
2. Mention the possibility of using nodetool instead of REST API.
3. Additional small tweaks to formatting.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190106111851.26700-1-nyh@scylladb.com>
2019-01-06 13:20:53 +02:00
Nadav Har'El
232e97ad06 docs/logging.md: new document
Add a new document about logging in Scylla, and how to change the log levels
when running Scylla and during the run.

It needs more developer-oriented information (e.g., how to create new logger
subsystems in the code) but I think it's a good start.

Some of the text is based on Glauber's writeup for the Scylla website on
changing log levels at runtime.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190106103606.26032-1-nyh@scylladb.com>
2019-01-06 12:40:14 +02:00
Benny Halevy
2daf81e80f dist: redhat/debian specs: add dependency on 'file' package
Needed by seastar-addr2line

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20190101203434.14858-1-bhalevy@scylladb.com>
2019-01-06 12:13:08 +02:00
Avi Kivity
f02c64cadf streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh
This header, which is easily replaced with a forward declaration,
introduces a dependency on database.hh everywhere. Remove it and scatter
includes of database.hh in source files that really need it.
2019-01-05 17:33:25 +02:00
Avi Kivity
ca93b88cfb sstables: writer.hh: add some forward declarations
This makes the header less dependent on previously-included headers.
2019-01-05 17:04:16 +02:00
Avi Kivity
53a21c7787 table_helper: remove database.hh include 2019-01-05 16:39:26 +02:00
Avi Kivity
7534412071 table_helper: de-inline insert() and setup_keyspace()
After previous patches de-templated these functions, we can de-inline them.
This helps reduce compile time and prepares to reduce header dependencies.
2019-01-05 16:28:46 +02:00
Avi Kivity
cfedf4ab0f table_helper: de-template setup_keyspace()
This setup function has no reason to be a template and is easily
converted. We can then later de-inline it to reduce dependencies.
2019-01-05 16:23:10 +02:00
Avi Kivity
659147cd79 table_helper: simplify template body of table_helper::insert()
Move most of the body into a non-template overload to reduce dependencies
in the header (and template bloat). The function is not on any fast path,
and noncopyable_function will likely not even allocate anything.
2019-01-05 16:22:08 +02:00
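The pattern described above (hoisting type-dependent work out through type erasure) can be sketched in miniature. Names like `do_insert` are hypothetical, and std::function stands in for seastar's noncopyable_function:

```cpp
#include <cassert>
#include <functional>
#include <string>

namespace detail {
// Non-template overload: exactly one copy in the whole program, and its
// body can live in a .cc file instead of a header.
inline std::string do_insert(const std::string& table,
                             std::function<std::string()> format_value) {
    return table + ":" + format_value();
}
}

// Only this thin shim is instantiated per T, so template bloat and
// header dependencies both shrink.
template <typename T>
std::string insert(const std::string& table, const T& value) {
    return detail::do_insert(table, [&] { return std::to_string(value); });
}
```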
Avi Kivity
c3ef99f84f schema_tables: remove #include of database.hh
Distribute in source files (and one header - table_helper.hh) that need it.
2019-01-05 15:43:07 +02:00
Avi Kivity
f43f82d1d2 cql_type_parser: remove dependency on user_types_metadata
A default parameter of type T (or lw_shared_ptr<T>) requires that T be
defined. Remove the dependency by redefining the default parameter
as an overload, for T = user_types_metadata.
2019-01-05 15:40:58 +02:00
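The default-argument-to-overload technique can be illustrated in miniature; `parse_type` and the struct body are hypothetical stand-ins for the real cql_type_parser API:

```cpp
#include <cassert>
#include <string>

struct user_types_metadata;   // a forward declaration now suffices

// Declarations as they would appear in the header:
int parse_type(const std::string& name, const user_types_metadata& md);
int parse_type(const std::string& name);   // overload replacing `md = {}`

// Definitions (normally in a .cc file), where the type is complete:
struct user_types_metadata { int version = 1; };

int parse_type(const std::string& name, const user_types_metadata& md) {
    return md.version + static_cast<int>(name.size());
}

int parse_type(const std::string& name) {
    // The default value lives here, out of the header.
    return parse_type(name, user_types_metadata{});
}
```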
Avi Kivity
4ba1d4d1dc thrift: add missing include of sleep.hh
Currently obtained indirectly through database.hh.
2019-01-05 15:39:30 +02:00
Avi Kivity
d24962e16c cql3: ks_prop_defs: remove #include "database.hh"
Replace with forward declaration to reduce rebuilds.
2019-01-05 14:26:03 +02:00
Jesse Haber-Kucharsky
17a5f7acab build: Link against libatomic
Since Scylla uses functions from the `atomic` header in its own source
code, we need to explicitly link against the stub library that is
provided for hardware architectures that do not have native support for
atomic operations.

Fixes #4053

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <7d62e762130494d73565ce8c031f53aaf866d3aa.1546645041.git.jhaberku@scylladb.com>
2019-01-05 13:38:57 +02:00
Avi Kivity
36e4e9fb54 Update seastar submodule
* seastar 6c8c229...67fd967 (1):
  > perftune.py: tune only active NVMe HW queues on i3 AWS instances
2019-01-04 13:17:29 +02:00
Avi Kivity
b0980ba7c6 compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads
The workload in #3844 has these characteristics:
 - very small data set size (a few gigabytes per shard)
 - large working set size (all the data, enough for high cache miss rate)
 - high overwrite rate (so a compaction results in 12X data reduction)

As a result, the compaction backlog controller assigns very few shares to
compaction (low data set size -> low backlog), so compaction proceeds very slowly.
Meanwhile, we have tons of cache misses, and each cache miss needs to read from a
large number of sstables (since compaction isn't progressing). The end result is
a high read amplification, and in this test, timeouts.

While we could declare that the scenario is very artificial, there are other
real-world scenarios that could trigger it. Consider a 100% write load
(population phase) followed by 100% read. Towards the end of the last compaction,
the backlog will drop more and more until compaction slows to a crawl, and until
it completes, all the data (for that compaction) will have to be read from its
input sstables, resulting in read amplification.

We should probably have read amplification affect the backlog, but for now the
simpler solution is to increase the minimum shares to 50 so that compaction
always makes forward progress. This will result in higher-than-needed compaction
bandwidth in some low-write-rate scenarios, so we will see fluctuations in request
rate (what the controller was designed to avoid), but these fluctuations will be
limited to 5%.

Since the base class backlog_controller has a fixed (0, 0) point, remove it
and add it to derived classes (setting it to (0, 50) for compaction).

Fixes #3844 (or at least improves it).
Message-Id: <20181231162710.29410-1-avi@scylladb.com>
2019-01-04 10:58:43 +01:00
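The change described above amounts to putting a floor under the controller's output. A hypothetical sketch (the real control law lives in compaction_controller; only the 50-share floor and the 1000-share scale come from the commit text):

```cpp
#include <algorithm>
#include <cassert>

// Map backlog to shares proportionally, but never below the minimum, so
// compaction always makes some forward progress.
float shares_for_backlog(float backlog, float max_backlog) {
    constexpr float min_shares = 50;    // ~5% of the scale, per the commit
    constexpr float max_shares = 1000;
    float proportional = max_shares * (backlog / max_backlog);
    return std::clamp(proportional, min_shares, max_shares);
}
```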
Duarte Nunes
b851cb1a9a distributed_loader: Forbid uploading MV sstables
Instead suggest that the views be re-created.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190103142933.35354-1-duarte@scylladb.com>
2019-01-03 16:31:20 +02:00
Avi Kivity
7d3562a403 tools: toolchain: update image for ant, maven, ccache, sudo 2019-01-03 16:16:47 +02:00
Avi Kivity
344468e20d tools: toolchain: dbuild: pass-through supplementary groups
Useful for ccache.
2019-01-03 16:16:47 +02:00
Avi Kivity
11889f5ea9 tools: toolchain: defeat PAM
Prevent PAM from enforcing security and preventing sudo from working. This is
done by replacing the default configuration (designed for workstations) with
one that uses pam_permit for everything.
2019-01-03 16:16:47 +02:00
Avi Kivity
9c258923d8 tools: toolchain: improve sudo support
Bind-mount /etc/passwd and /etc/group so sudo doesn't complain, and
support sudo without password or tty.
2019-01-03 16:16:47 +02:00
Avi Kivity
05f78df7b9 tools: toolchain: break long line in dbuild 2019-01-03 16:16:47 +02:00
Avi Kivity
f79a300081 tools: toolchain: prepare sudoers file
Don't require a tty or passwords, since they won't be available in
continuous integration environments.
2019-01-03 16:16:47 +02:00
Avi Kivity
25040824cf tools: toolchain: install ccache
Not strictly necessary, but often useful to reduce rebuild times. The user
will need to bind-mount a populated cache.
2019-01-03 16:16:47 +02:00
Avi Kivity
527e3a58ff install-dependencies.sh: add maven and ant
Add tools needed to build scylla-jmx and scylla-tools-java. While
not requirements of this repository, it's nicer if a single setup
can be used to build and run everything.

We also install pystache as it's used by packaging scripts.
2019-01-03 16:16:45 +02:00
Duarte Nunes
3235c13125 utils/fragmented_temporary_buffer: Correctly implement remove_suffix()
The current implementation breaks the invariant that

_size_bytes = reduce(_fragments, &temporary_buffer::size)

In particular, this breaks algorithms that check the individual
segment size.

Correctly implement remove_suffix() by destroying superfluous
temporary_buffer's and by trimming the last one, if needed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190103133523.34937-1-duarte@scylladb.com>
2019-01-03 13:37:01 +00:00
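The invariant and the corrected remove_suffix() behavior can be sketched with std::string fragments standing in for seastar::temporary_buffer (illustrative only, not the actual fragmented_temporary_buffer code):

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

struct fragmented_buffer {
    std::vector<std::string> fragments;
    std::size_t size_bytes = 0;   // invariant: equals the sum of fragment sizes

    void remove_suffix(std::size_t n) {
        size_bytes -= n;
        // Destroy superfluous whole fragments from the back...
        while (n > 0 && n >= fragments.back().size()) {
            n -= fragments.back().size();
            fragments.pop_back();
        }
        // ...then trim the (new) last fragment, if needed.
        if (n > 0) {
            auto& last = fragments.back();
            last.resize(last.size() - n);
        }
    }

    bool invariant_holds() const {
        auto sum = std::accumulate(
            fragments.begin(), fragments.end(), std::size_t{0},
            [](std::size_t acc, const std::string& f) { return acc + f.size(); });
        return sum == size_bytes;
    }
};
```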
Botond Dénes
021feef513 querier_cache: simplify memory eviction use-after-free fix, add tests
Simplify the fix for memory based eviction, introduced by 918d255 so
there is no need to massage the counters.

Also add a check to `test_memory_based_cache_eviction` which checks for
the bug fixed. While at it also add a check to
`test_time_based_cache_eviction` for the fix to time based eviction
(e5a0ea3).

Tests: tests/querier_cache:debug
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <c89e2788a88c2a701a2c39f377328e77ac01e3ef.1546515465.git.bdenes@scylladb.com>
2019-01-03 13:44:08 +02:00
Tomasz Grabiec
1613a623e1 Merge "Fix crash on corrupt sstable" from Rafael
* https://github.com/espindola/scylla espindola/invalid_boundary4:
  sstables: Refactor predicates on bound_kind_m
  Fix crash on corrupt sstable
2019-01-03 12:02:09 +01:00
Duarte Nunes
42d9ca8266 Merge 'Add staging SSTables support to row level repair' from Piotr
"
This series adds staging SSTables support to row level repair.
It was introduced for streaming sessions before, but since row level
repair doesn't leverage sessions at all, it's added separately.

Tests:
unit (release)
dtest (repair_additional_test.py:RepairAdditionalTest,
       excluding repair_abort_test, which fails for me locally on master)
"

* 'add_staging_sstables_generation_to_row_level_repair_2' of https://github.com/psarna/scylla:
  repair: add staging sstables support to row level repair
  main,repair: add params to row level repair init
  streaming,view: move view update checks to separate file
2019-01-03 09:40:13 +00:00
Piotr Sarna
a73d9ccf31 service: mark existing views as built before bootstrap
When a node is bootstrapping, it will receive data from other nodes
via streaming, including materialized views. Regardless of whether these
views are built on other nodes or not, building them on newly
bootstrapped nodes has no effect: updates were either already streamed
completely (if view building has finished) or will be propagated
via view building, if the process is still ongoing.
So, marking all views as 'built' for the bootstrapped node prevents it
from spawning superfluous view building processes.

Fixes #4001
Message-Id: <fd53692c38d944122d1b1013fdb0aedf517fa409.1546498861.git.sarna@scylladb.com>
2019-01-03 09:39:33 +00:00
Botond Dénes
e5a0ea390a querier_cache: unregister queriers evicted due to expired TTL
Currently queriers evicted due to their TTL expiring are not
unregistered from the `reader_concurrency_semaphore`. This can cause a
use-after-free when the semaphore tries to evict the same querier at
some later point in time, as the querier entry it has a pointer to is
now invalid.

Fix by unregistering the querier from the semaphore before destroying
the entry.

Refs: #4018
Refs: #4031

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4adfd09f5af8a12d73c29d59407a789324cd3d01.1546504034.git.bdenes@scylladb.com>
2019-01-03 10:29:26 +02:00
Piotr Sarna
bc74ac6f09 repair: add staging sstables support to row level repair
In some cases, sstables created during row level repair
should be enqueued as staging in order to generate
view updates from them.

Fixes #4034
2019-01-03 08:36:45 +01:00
Piotr Sarna
a0003c52cf main,repair: add params to row level repair init
Row level repair needs references to system distributed keyspace
and view update generator in order to enqueue some sstables
as staging.
2019-01-03 08:31:41 +01:00
Piotr Sarna
9d46715613 streaming,view: move view update checks to separate file
Checking if view update path should be used for sstables
is going to be reused in row level repair code,
so relevant functions are moved to a separate header.
2019-01-03 08:31:40 +01:00
Avi Kivity
918d255168 querier_cache: unregister querier from reader_concurrency_semaphore during eviction
In insert_querier(), we may evict older queriers to make room for the new one.
However, we forgot to unregister the evicted queriers from
reader_concurrency_semaphore. As a result, when reader_concurrency_semaphore
eventually wanted to evict something, it saw an inactive_read_handle that was
not connected to a querier_cache::entry, and crashed on use-after-free.

Fix by evicting through the inactive_read_handle associated with the querier
to be evicted. This removes traces of the querier from both
reader_concurrency_semaphore and querier_cache. We also have to massage the
statistics since querier_inactive_read::evict() updates different counters.

Fixes #4018.

Tests: unit(release)
Reviewed-by: Botond Denes <bdenes@scylladb.com>
Message-Id: <20190102175023.26093-1-avi@scylladb.com>
2019-01-03 09:15:07 +02:00
Rafael Ávila de Espíndola
28c014351f Fix crash on corrupt sstable
The check in consume_range_tombstone was too late. Before getting to
it we would fail an assert calling to_bound_kind.

This moves the check earlier and adds a testcase.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-02 17:52:07 -08:00
Rafael Ávila de Espíndola
3c9178d122 sstables: Refactor predicates on bound_kind_m
This moves the predicate functions to the start of the file, renames
is_in_bound_kind to is_bound_kind for consistency with to_bound_kind
and defines all predicates in a similar fashion.

It also uses the predicates to reduce code duplication.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-02 17:50:44 -08:00
Avi Kivity
2717bdd301 tools: toolchain: allow adjusting "docker run" command line
It is useful to adjust the command line when running the docker image,
for example to attach a data volume or a ccache directory. Add a mechanism
to do that.
Message-Id: <20181228163306.19439-1-avi@scylladb.com>
2019-01-01 21:44:50 +00:00
Avi Kivity
d19660ec0a Merge "commitlog: Use fragmented buffers for reading entries" from Duarte
"
Instead of allocating a contiguous temporary_buffer when reading
mutations during commitlog - or hint - replay, use fragmented
buffers instead.

Refs #4020
"

* 'commitlog/fragmented-read/v1' of https://github.com/duarten/scylla:
  db/commitlog: Use fragmented buffers to read entries
  db/commitlog: Implement skip in terms of input buffer skipping
  tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix()
  utils/fragmented_temporary_buffer: Add remove_suffix
  tests/fragmented_temporary_buffer_test: Add unit test for skip()
  utils/fragmented_temporary_buffer: Allow skipping in the input stream
2019-01-01 19:08:34 +02:00
Avi Kivity
6641353854 tracing: remove static class_registry
Static class_registries hinder librarification by requiring linking with
all object files (instead of a library from which objects are linked on
demand) and reduce readability by hiding dependencies and by their
horrible syntax. Hide them behind a non-static, non-template tracing
backend registry.
Message-Id: <20181229121000.7885-1-avi@scylladb.com>
2018-12-31 13:24:54 +00:00
Duarte Nunes
b7517183fa db/commitlog: Use fragmented buffers to read entries
Leverage fragmented_temporary_buffer when reading commit log
entries, avoiding large allocations.

Refs #4020

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
0e50a9bc6d db/commitlog: Implement skip in terms of input buffer skipping
This simplifies the code and allows us to get rid of the overload of
advance() taking a temporary_buffer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
8379ac6189 tests/fragmented_temporary_buffer_test: Add unit test for remove_suffix()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
1a88cd7992 utils/fragmented_temporary_buffer: Add remove_suffix
Essentially hide some bytes off the end of the buffer. Needed for
subsequent commit log changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
50dd8b67b2 tests/fragmented_temporary_buffer_test: Add unit test for skip()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Duarte Nunes
8eab0a3e01 utils/fragmented_temporary_buffer: Allow skipping in the input stream
Add fragmented_temporary_buffer::istream::skip(), needed for
subsequent commit log changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-31 13:20:37 +00:00
Avi Kivity
c180a18dbb Distribute distributed_loader into its own header and source files
distributed_loader is a sizeable fraction of database.cc, so moving it
out reduces compile time and improves readability.
Message-Id: <20181230200926.15074-1-avi@scylladb.com>
2018-12-31 14:27:27 +02:00
Avi Kivity
49958d5836 tools: toolchain: update for lz4 1.8.3
lz4 1.8.3 was released with a fix for data corruption during compression. While
the release notes indicate we aren't vulnerable, be cautious and update anyway.
Message-Id: <20181230144716.7238-1-avi@scylladb.com>
2018-12-31 14:27:27 +02:00
Hagit Segev
141fad9c14 Update README.md
fix a typo
2018-12-31 13:33:04 +02:00
Asias He
d90836a2d3 streaming: Make total_incoming_bytes and total_outgoing_bytes metrics monotonic
Currently, they increase and decrease as stream sessions are
created and destroyed. Make them Prometheus monotonically increasing
counters for easier monitoring.

Message-Id: <7c07cea25a59a09377292dc8f64ed33ff12eda87.1545959905.git.asias@scylladb.com>
2018-12-30 16:52:17 +02:00
Pekka Enberg
96172b7bca Merge 'Fixes for the view_update_from_staging_generator' from Duarte
"This series contains a couple of fixes to the
view_update_from_staging_generator, the object responsible for
generating view updates from sstables written through streaming.

Fixes #4021"
* 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla:
  db/view/view_update_from_staging_generator: Break semaphore on stop()
  db/view/view_update_from_staging_generator: Restore formatting
  db/view/view_update_from_staging_generator: Avoid creating more than one fiber
2018-12-29 18:31:40 +02:00
Duarte Nunes
f41d13f38c db/view/view_update_from_staging_generator: Break semaphore on stop()
This avoids having fibers wait on _registration_sem without ever being
notified.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:55:04 +00:00
Duarte Nunes
4974addc5c db/view/view_update_from_staging_generator: Restore formatting
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:55:02 +00:00
Duarte Nunes
201196130d db/view/view_update_from_staging_generator: Avoid creating more than one fiber
If view_update_from_staging_generator::maybe_generate_view_updates()
is called before view_update_from_staging_generator::start(), as can
happen in main.cc, then we can potentially create more than one fiber,
which leads to corrupted state and conflicting operations.

To avoid this, use just one fiber and be explicit about notifying it
that more work is needed, by leveraging a condition-variable.

Fixes #4021

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-29 12:52:51 +00:00
Duarte Nunes
66113a2d39 Merge 'Replace query_processor's sharded<database> with plain database' from Avi
"
A sharded<database> is not very useful for accessing data since data is
usually distributed across many nodes, while a sharded<database>
contains only a single node's view. So it is really only used for
accessing replicated metadata, not data. As such only the local shard
is accessed.

Use that to simplify query_processor a little by replacing sharded<database>
with a plain database.

We can probably be more ambitious and make all accesses, data and metadata,
go through storage_proxy, but this is a start.
"

* tag 'qp-unshard-database/v1' of https://github.com/avikivity/scylla:
  query_processor: replace sharded<database> with the local shard
  commitlog_replayer: don't use query_processor
  client_state: change set_keyspace() to accept a single database shard
  legacy_schema_migrator: initialize with database reference
2018-12-29 12:14:19 +00:00
Avi Kivity
0c0cc66ee7 system_keyspace, view: reduce interdependencies
system_keyspace is an implementation detail for most of its users, not
part of the interface, as it's only used to store internal data. Therefore,
including it in a header file causes unneeded dependencies.

This patch removes a dependency between views and system_keyspace.hh
by moving view_name and view_build_progress into a separate header file,
and using forward declarations where possible. This allows us to
remove an inclusion of system_keyspace.hh from a header file (the last
one), so that further changes to system_keyspace.hh will cause fewer
recompilations.
Message-Id: <20181228215736.11493-1-avi@scylladb.com>
2018-12-29 12:12:15 +00:00
Avi Kivity
30745eeb72 query_processor: replace sharded<database> with the local shard
query_processor uses storage_proxy to access data, and the local
database object to access replicated metadata. While it seems strange
that the database object is not used to access data, it is logical
when you consider that a sharded<database> only contain's this node's
data, not the cluster data.

Take advantage of this to replace sharded<database> with a single database
shard.
2018-12-29 11:02:15 +02:00
Avi Kivity
f0a709cfc8 commitlog_replayer: don't use query_processor
During normal writes, query processing happens before commitlog, so
logically, replaying the commitlog shouldn't need it. And in
fact the dependency on query_processor can be eliminated; all it needs
is the local node's database.
2018-12-29 11:00:29 +02:00
Avi Kivity
7830086317 client_state: change set_keyspace() to accept a single database shard
set_keyspace() only needs one shard (it is checking replicated state,
not sharded data) so arrange for it to receive only that one shard.
2018-12-29 10:58:39 +02:00
Avi Kivity
e4233262cf legacy_schema_migrator: initialize with database reference
Provide legacy_schema_migrator with a sharded<database> so it doesn't need
to use the one from query_processor. We want to replace query_processor's
sharded<database> with just a local database reference in order to simplify
it, and this is standing in the way.
2018-12-29 10:58:22 +02:00
Duarte Nunes
bab7e6877b streaming/stream_session: Only stage sstables for tables with views
When streaming, sstables for which we need to generate view updates
are placed in a special staging directory. However, we only need to do
this for tables that actually have views.

Refs #4021
Message-Id: <20181227215412.5632-1-duarte@scylladb.com>
2018-12-28 18:32:24 +02:00
Avi Kivity
feddf0b021 tools: toolchain: patch boost for use-after-free in Boost.Test XML output
The version of boost in Fedora 29 has a use-after-free bug that is only
exposed when ./test.py is run with the --jenkins flag.  To patch it,
use a fixed version from the copr repository scylladb/toolchain.
Message-Id: <20181228150419.29623-1-avi@scylladb.com>
2018-12-28 16:35:28 +01:00
Tomasz Grabiec
7747f2dde3 Merge "nodetool toppartitions" from Rafi & Avi
Implementation of the nodetool toppartitions query, which samples the most frequent PKs in read/write
operations over a period of time.

Content:
- data_listener classes: mechanism that interfaces with mutation readers in database and table classes,
- toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this
  interfaces with data_listeners and the REST api),
- REST api for toppartitions query.

Uses Top-k structure for handling stream summary statistics (based on implementation in C*, see #2811).
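
The stream-summary counting referenced here can be sketched as follows. This is
a hypothetical Python toy model of the "space saving" top-k idea, not Scylla's
actual top_k class (which is a C++ port of the C* implementation):

```python
# Toy model of space-saving / stream-summary top-k counting (hypothetical
# Python sketch; the real top_k is C++ and uses frequency-ordered buckets).
class SpaceSaving:
    def __init__(self, capacity):
        self.capacity = capacity          # bounded number of tracked keys
        self.counts = {}                  # key -> (count, error bound)

    def append(self, key):
        if key in self.counts:
            count, error = self.counts[key]
            self.counts[key] = (count + 1, error)
        elif len(self.counts) < self.capacity:
            self.counts[key] = (1, 0)
        else:
            # Evict the entry with the smallest count; the new key inherits
            # that count as an overestimation error bound.
            victim = min(self.counts, key=lambda k: self.counts[k][0])
            count, _ = self.counts.pop(victim)
            self.counts[key] = (count + 1, count)

    def top(self, k):
        return sorted(self.counts, key=lambda key: -self.counts[key][0])[:k]
```

Because heavy hitters are never evicted once their counts dominate, a small
fixed capacity suffices to report the most active partitions.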

What's still missing:
- JMX interface to nodetool (interface customization may be required),
- Querying #rows and #bytes (currently, only #partitions is supported).

Fixes #2811

* https://github.com/avikivity/scylla rafie_toppartitions_v7.1:
  top_k: whitespace and minor fixes
  top_k: map template arguments
  top_k: std::list -> chunked_vector
  top_k: support for appending top_k results
  nodetool toppartitions: refactor table::config constructor
  nodetool toppartitions: data listeners
  nodetool toppartitions: add data_listeners to database/table
  nodetool toppartitions: fully_qualified_cf_name
  nodetool toppartitions: Toppartitions query implementation
  nodetool toppartitions: Toppartitions query REST API
  nodetool toppartitions: nodetool-toppartitions script
2018-12-28 16:31:24 +01:00
Rafi Einstein
7677d2ba2c nodetool toppartitions: nodetool-toppartitions script
A Python script mimicking the nodetool toppartitions utility, using the Scylla REST API.

Examples:
$ ./nodetool-toppartitions --help
usage: nodetool-toppartitions [-h] [-k LIST_SIZE] [-s CAPACITY]
                              keyspace table duration

Samples database reads and writes and reports the most active partitions in a
specified table

positional arguments:
  keyspace      Name of keyspace
  table         Name of column family
  duration      Query duration in milliseconds

optional arguments:
  -h, --help    show this help message and exit
  -k LIST_SIZE  The number of the top partitions to list (default: 10)
  -s CAPACITY   The capacity of stream summary (default: 256)

$ ./nodetool-toppartitions ks test1 10000
READ
  Partition   Count
  30          2
  20          2
  10          2

WRITE
  Partition   Count
  30          1
  20          1
  10          1

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:48:03 +02:00
Rafi Einstein
197f38d4ee nodetool toppartitions: Toppartitions query REST API
An HTTP GET request starts the query (with args: ks/cf name and duration in ms).
It executes synchronously; results are returned as JSON:
$ curl -s -X GET http://localhost:10000/column_family/toppartitions/ks:cf1?duration=10000 | jq
{
  "read": [
    {
      "count": "15",
      "error": "0",
      "partition": "4b504d39354f37353131"
    },
    {
      "count": "15",
      "error": "0",
      "partition": "3738313134394d353530"
    }
  ],
  "write": [
    {
      "count": "15",
      "error": "0",
      "partition": "4b504d39354f37353131"
    },
    {
      "count": "15",
      "error": "0",
      "partition": "3738313134394d353530"
    }
  ]
}

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
6b2c21f69b nodetool toppartitions: Toppartitions query implementation
toppartitions_query installs a toppartitions_data_listener on each database
shard, waits for the designated period, then uninstalls the listeners and
collects the top-k read/write partition keys.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
404f75def5 nodetool toppartitions: fully_qualified_cf_name
Encapsulate keyspace:column_family REST API argument parsing into fully_qualified_cf_name class.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
0bffe5f83e nodetool toppartitions: add data_listeners to database/table
Add a data_listeners member to database.
Add a data_listeners* to table::config, to be used by table methods to invoke listeners.
Install an on_read() listener in table::make_reader().
Install an on_write() listener in database::apply_in_memory().

Tests: Unit (release)
Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
08ba115c16 nodetool toppartitions: data listeners
Mechanism that interfaces with the mutation readers in the database and table
classes, to allow tracking the most frequent partition keys in read and write
operations.
Basic design is specified in #2811.

Tracking top #rows and #bytes will be supported in the future.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
038f8c7988 nodetool toppartitions: refactor table::config constructor
Eliminate the extra parameters to the ctor and deduce them from the db parameter instead.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:57 +02:00
Rafi Einstein
eda43b93c9 top_k: support for appending top_k results
Allow appending results of one top_k into another.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:56 +02:00
Rafi Einstein
aeebe8e86b top_k: std::list -> chunked_vector
Replaced std::list with chunked_vector. Because chunked_vector requires
a noexcept move constructor from its value type, change the bad_boy type
in the unit test not to throw in the move constructor.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-28 16:45:07 +02:00
Avi Kivity
8e2f6d0513 Merge "Fix use-after-free when destroying partition_snapshots in the background" from Tomasz
"
partition_snapshots created in the memtable will keep a reference to
the memtable (as region*) and to memtable::_cleaner. As long as the
reader is alive, the memtable will be kept alive by
partition_snapshot_flat_reader::_container_guard. But after that
nothing prevents it from being destroyed. The snapshot can outlive the
read if mutation_cleaner::merge_and_destroy() defers its destruction
for later. When the read ends after memtable was flushed, the snapshot
will be queued in the cache's cleaner, but internally will reference
memtable's region and cleaner. This will result in a use-after-free
when the snapshot resumes destruction.

The fix is to update the snapshot's region and cleaner references at the
time of queueing to point to the cache's region and cleaner.

When the memtable is destroyed without being moved to the cache there is no
problem, because the snapshot would be queued into the memtable's cleaner,
which is drained of all snapshots on destruction.

Introduced in f3da043 (in >= 3.0-rc1)

Fixes #4030.

Tests:

  - mvcc_test (debug)

"

* tag 'fix-snapshot-merging-use-after-free-v1.1' of github.com:tgrabiec/scylla:
  tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed
  tests: mvcc: Introduce mvcc_container::migrate()
  tests: mvcc: Make mvcc_partition move-constructible
  tests: mvcc: Introduce mvcc_container::make_not_evictable()
  tests: mvcc: Allow constructing mvcc_container without a cache_tracker
  mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup
  mvcc: partition_snapshot: Introduce migrate()
  mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner
2018-12-28 12:45:10 +02:00
Tomasz Grabiec
bb1c9cb6f3 tests: mvcc: Add test_snapshot_merging_after_container_is_destroyed 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
4d13dea39a tests: mvcc: Introduce mvcc_container::migrate() 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
676868ed31 tests: mvcc: Make mvcc_partition move-constructible 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
c6798f7872 tests: mvcc: Introduce mvcc_container::make_not_evictable() 2018-12-28 10:32:39 +01:00
Tomasz Grabiec
1fa00656ea tests: mvcc: Allow constructing mvcc_container without a cache_tracker
Some test cases will need many containers to simulate memtable ->
cache transitions, but there can be only one cache_tracker per shard
due to metrics. Allow constructing a container without a cache_tracker
(and thus non-evictable).
2018-12-28 10:32:39 +01:00
Tomasz Grabiec
ac49b1def0 mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup
partition_snapshots created in the memtable will keep a reference to
the memtable (as region*) and to memtable::_cleaner. As long as the
reader is alive the memtable will be kept alive by
partition_snapshot_flat_reader::_container_guard. But after that,
nothing prevents it from being destroyed. The snapshot can outlive the
read if mutation_cleaner::merge_and_destroy() defers its destruction
for later. When the read ends after memtable was flushed, the snapshot
will be queued in the cache's cleaner, but internally will reference
memtable's region and cleaner. This will result in a use-after-free
when the snapshot resumes destruction.

The fix is to update the snapshot's region and cleaner references at the
time of queueing to point to the cache's region and cleaner.

When the memtable is destroyed without being moved to the cache there is no
problem, because the snapshot would be queued into the memtable's cleaner,
which is drained of all snapshots on destruction.

Introduced in f3da043.

Fixes #4030.
2018-12-27 18:08:50 +01:00
Tomasz Grabiec
20f5d5d1a1 mvcc: partition_snapshot: Introduce migrate()
Snapshots which outlive the memtable will need to have their
_region and _cleaner references updated.

The snapshot can be destroyed after the memtable when it is queued in
the mutation_cleaner.
2018-12-27 18:08:50 +01:00
Tomasz Grabiec
67f9afbd1a mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner 2018-12-27 18:08:50 +01:00
Gleb Natapov
37b4043677 streaming: always read from rpc::source until end-of-stream during mutation sending
rpc::source cannot be abandoned until EOS is reached, but the current code
does not obey this: if an error code is received, it throws an exception
that aborts the reading loop. Fix this by moving the exception throwing out
of the loop.

Fixes: #4025

Message-Id: <20181227135051.GC29458@scylladb.com>
2018-12-27 16:50:53 +02:00
Asias He
4d3c463536 storage_service: Stop cql server before gossip
We saw failure in dtest concurrent_schema_changes_test.py:
TestConcurrentSchemaChanges.changes_while_node_down_test test.

======================================================================
ERROR: changes_while_node_down_test (concurrent_schema_changes_test.TestConcurrentSchemaChanges)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 432, in changes_while_node_down_test
    self.make_schema_changes(session, namespace='ns2')
  File "/home/asias/src/cloudius-systems/scylla-dtest/concurrent_schema_changes_test.py", line 86, in make_schema_changes
    session.execute('USE ks_%s' % namespace)
  File "cassandra/cluster.py", line 2141, in cassandra.cluster.Session.execute
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state).result()
  File "cassandra/cluster.py", line 4033, in cassandra.cluster.ResponseFuture.result
    raise self._final_exception
ConnectionShutdown: Connection to 127.0.0.1 is closed

The test:

   session = self.patient_cql_connection(node2)
   self.prepare_for_changes(session, namespace='ns2')
   node1.stop()
   self.make_schema_changes(session, namespace='ns2') --> ConnectionShutdown exception throws

The problem is that, after receiving the DOWN event, the Python
Cassandra driver calls Cluster.on_down, which checks whether this client
has any connections to the node being shut down. If there are any
connections, the Cluster.on_down handler exits early, so the session
to the node being shut down is not removed.

If we shut down the CQL server first, the connection count will be zero
and the session will be removed.

Fixes: #4013
Message-Id: <7388f679a7b09ada10afe7e783d7868a58aac6ec.1545634941.git.asias@scylladb.com>
2018-12-27 14:13:43 +02:00
Duarte Nunes
2f69ba2844 lwt: Remove Paxos-related Cassandra code
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181227112526.4180-1-duarte@scylladb.com>
2018-12-27 13:30:10 +02:00
Duarte Nunes
66e45469b2 streaming/stream_session: Don't use table reference across defer points
When creating an sstable from which to generate view updates, we held
on to a table reference across defer points. In case there's a
concurrent schema drop, the table object might be destroyed and we
would incur a use-after-free. Solve this by holding on to a shared
pointer and pinning the table object.

Refs #4021

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181227105921.3601-1-duarte@scylladb.com>
2018-12-27 13:05:46 +02:00
Avi Kivity
b349e11aba tools: toolchain: avoid docker-provided /tmp
On at least one system, using the container's /tmp as provided by docker
results in spurious EINVALs during aio:

INFO  2018-12-27 09:54:08,997 [shard 0] gossip - Feature ROW_LEVEL_REPAIR is enabled
unknown location(0): fatal error: in "test_write_many_range_tombstones": storage_io_error: Storage I/O error: 22: Invalid argument
seastar/tests/test-utils.cc(40): last checkpoint

The setup is overlayfs over xfs.

To avoid this problem, pass through the host's /tmp to the container.
Using --tmpfs would be better, but it's not possible to guess a good size
as the amount of temporary space needed depends on build concurrency.
Message-Id: <20181227101345.11794-1-avi@scylladb.com>
2018-12-27 10:17:23 +00:00
Avi Kivity
2c4a732735 tools: toolchain: update baseline Fedora packages
Image fedora-29-20181219 was broken due to the following chain of events:

 - we install gnutls, which currently is at version 3.6.5
 - gnutls 3.6.5 introduced a dependency on nettle 3.4.1
 - the gnutls rpm does not include a version requirement on nettle,
   so an already-installed nettle will not be upgraded when gnutls is
   installed
 - the fedora:29 image which we use as a baseline has nettle installed
 - docker does not pull the latest tag in FROM statements during
   "docker build"
 - my build machine already had a fedora:29 image, with nettle 3.4
   installed (the repository's image has 3.4.1, but docker doesn't
   automatically pull if an image with the required tag exists)

As a result, the image ended up having gnutls 3.6.5 and nettle 3.4, which
are incompatible.

To fix, update all packages after installation to attempt to get a
self-consistent package set even if dependencies are not perfect, and
regenerate the image.
Message-Id: <20181226135711.24074-1-avi@scylladb.com>
2018-12-26 14:58:23 +00:00
Avi Kivity
1414837fcc tools: toolchain: improve dbuild for continuous integration environments
The '-t' flag to 'docker run' passes the tty from the caller environment
to the container, which is nice for interactive jobs, but fails if there
is no tty, such as in a continuous integration environment.

Given that, the '-i' flag doesn't make sense either as there isn't any
input to pass.

Remove both, and replace with --sig-proxy=true which allows SIGTERM to
terminate the container instead of leaving it alive. This reduces the
chances of the build stopping but leaving random containers around.
Message-Id: <20181222105837.22547-1-avi@scylladb.com>
2018-12-26 10:50:34 +00:00
Avi Kivity
bfd8ade914 tools: toolchain: update toolchain for gcc-8.2.1-6
gcc was updated with some important fixes; update the toolchain to
include it.
Message-Id: <20181219190548.28675-1-avi@scylladb.com>
2018-12-26 10:21:02 +00:00
Benny Halevy
206483e6af position_in_partition_view: print bound_weight as int
Rather than a non-printable char.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181226091115.18530-1-bhalevy@scylladb.com>
2018-12-26 11:19:30 +02:00
Rafael Ávila de Espíndola
f73c60d8cf sstables: Convert an unreachable throw into an assert in read path
The function pending_collection is only called when
cdef->is_multi_cell() is true, so the throw is dead.

This patch converts it to an assert.
Message-Id: <20181207022119.38387-1-espindola@scylladb.com>
2018-12-26 11:10:19 +02:00
Benny Halevy
52188a20fa HACKING.md: Add details about unit test debug info
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181225133513.20751-1-bhalevy@scylladb.com>
2018-12-25 16:03:24 +02:00
Avi Kivity
c96fc1d585 Merge "Introduce row level repair" from Asias
"
=== How the partition level repair works

- The repair master decides which ranges to work on.
- The repair master splits the ranges into sub-ranges, each containing around
100 partitions.
- The repair master computes the checksum of the 100 partitions and asks the
related peers to compute the checksum of the 100 partitions.
- If the checksum matches, the data in this sub range is synced.
- If the checksum mismatches, repair master fetches the data from all the peers
and sends back the merged data to peers.

=== Major problems with partition level repair

- A mismatch of a single row in any of the 100 partitions causes 100
partitions to be transferred. A single partition can be very large. Not to
mention the size of 100 partitions.

- Checksum (find the mismatch) and streaming (fix the mismatch) will read the
same data twice

=== Row level repair

Row level checksum and synchronization: detect row level mismatch and transfer
only the mismatch

=== How the row level repair works

- To solve the problem of reading data twice

Read the data only once for both checksum and synchronization between nodes.

We work on a small range which contains only a few megabytes of rows.
We read all the rows within the small range into memory, find the
mismatches and send the mismatched rows between peers.

We need to find a sync boundary among the nodes which contains only N bytes of
rows.

- To solve the problem of sending unnecessary data.

We need to find the mismatched rows between nodes and only send the delta.
This is known as the set reconciliation problem, which is a common problem
in distributed systems.

For example:
Node1 has set1 = {row1, row2, row3}
Node2 has set2 = {      row2, row3}
Node3 has set3 = {row1, row2, row4}

To repair:
Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3.
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2
Node1 sends row3 (set1 + set2 + set3 - set3) to Node3.
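
The set arithmetic above can be sketched directly. This is a toy Python model
(rows are plain strings here, while the real implementation exchanges row
hashes and mutation fragments):

```python
# Toy model of the reconciliation above (hypothetical sketch, not the C++ code).
def reconcile(master, peers):
    # The master fetches every row a peer has that it is missing (set_i - set1).
    fetched = {name: rows - master for name, rows in peers.items()}
    # The master now holds the union of all rows.
    full = master.union(*peers.values())
    # Each peer then receives exactly the rows it is missing.
    sent = {name: full - rows for name, rows in peers.items()}
    return fetched, sent

n1 = {"row1", "row2", "row3"}
n2 = {"row2", "row3"}
n3 = {"row1", "row2", "row4"}
fetched, sent = reconcile(n1, {"n2": n2, "n3": n3})
# n1 fetches nothing from n2 and row4 from n3,
# then sends {row1, row4} to n2 and {row3} to n3.
```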

=== How to implement repair with set reconciliation

- Step A: Negotiate sync boundary

class repair_sync_boundary {
    dht::decorated_key pk;
    position_in_partition position;
};

Read rows from disk into row buffers until their size is larger than N
bytes, then return the repair_sync_boundary of the last mutation_fragment
read from disk. The smallest repair_sync_boundary across all nodes is
set as the current_sync_boundary.
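
A toy sketch of this negotiation (hypothetical Python; plain ints stand in
for dht::decorated_key and position_in_partition, which compare the same way):

```python
from dataclasses import dataclass

# Toy model of Step A (hypothetical sketch): each replica reports the boundary
# it reached after buffering ~N bytes of rows; the master picks the smallest.
@dataclass(order=True, frozen=True)
class SyncBoundary:
    token: int      # stand-in for the partition key's token
    position: int   # stand-in for position_in_partition

def negotiate(boundaries):
    # current_sync_boundary = smallest boundary reported by any node, so no
    # node is asked for rows it has not yet read.
    return min(boundaries)
```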

- Step B: Get missing rows from peer nodes so that repair master contains all the rows

Request combined hashes from all nodes between last_sync_boundary and
current_sync_boundary. If the combined hashes from all nodes are identical,
data is synced, goto Step A. If not, request the full hashes from peers.

At this point, the repair master knows exactly what rows are missing. Request the
missing rows from peer nodes.

Now, local node contains all the rows.

- Step C: Send missing rows to the peer nodes

Since local node also knows what peer nodes own, it sends the missing rows to
the peer nodes.

=== How the RPC API looks like

- repair_range_start()

Step A:
- request_sync_boundary()

Step B:
- request_combined_row_hashes()
- request_full_row_hashes()
- request_row_diff()

Step C:
- send_row_diff()

- repair_range_stop()

=== Performance evaluation

We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We
created a keyspace with a replication factor of 3 and inserted 1 billion
rows into each of the 3 nodes. Each node has 241 GiB of data.
We tested 3 cases below.

1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows.

Time to repair:
   old = 87 min
   new = 70 min (rebuild took 50 minutes)
   improvement = 19.54%

2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair:
   old = 43 min
   new = 24 min
   improvement = 44.18%

3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.

Time to repair:
   old: 211 min
   new: 44 min
   improvement: 79.15%

Bytes sent on wire for repair:
   old: tx= 162 GiB,  rx = 90 GiB
   new: tx= 1.15 GiB, rx = 0.57 GiB
   improvement: tx = 99.29%, rx = 99.36%

It is worth noting that row level repair sends and receives exactly the
number of rows needed in theory.

In this test case, the repair master needs to receive 2 million rows and
send 4 million rows. Here are the details: each node has 1 billion *
0.1% distinct rows, that is 1 million rows. So the repair master receives
1 million rows from repair slave 1 and 1 million rows from repair slave 2.
The repair master sends its own 1 million rows plus the 1 million rows
received from repair slave 1 to repair slave 2, and sends its own 1
million rows plus the 1 million rows received from repair slave 2 to
repair slave 1.

In the result, we saw the rows on wire were as expected.

tx_row_nr  = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000
rx_row_nr  =  500233 + 500235 +  499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000

Fixes: #3033

Tests: dtests/repair_additional_test.py
"

* 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits)
  repair: Enable row level repair
  repair: Add row_level_repair
  repair: Add docs for row level repair
  repair: Add repair_init_messaging_service_handler
  repair: Add repair_meta
  repair: Add repair_writer
  repair: Add repair_reader
  repair: Add repair_row
  repair: Add fragment_hasher
  repair: Add decorated_key_with_hash
  repair: Add get_random_seed
  repair: Add get_common_diff_detect_algorithm
  repair: Add shard_config
  repair: Add suportted_diff_detect_algorithms
  repair: Add repair_stats to repair_info
  repair: Introduce repair_stats
  flat_mutation_reader:  Add make_generating_reader
  storage_service: Introduce ROW_LEVEL_REPAIR feature
  messaging_service: Add RPC verbs for row level repair
  repair: Export the repair logger
  ...
2018-12-25 13:13:00 +02:00
Takuya ASADA
b9a06ae552 dist/offline_installer/redhat: support building RHEL 7 offline installer
We had issue to build offline installer on RHEL because of repository
difference.
This fix enables to build offline installer both on CentOS and RHEL.

Also, it introduces --releasever <ver>, to build the offline installer for
a specific minor version of CentOS or RHEL.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181212032129.29515-1-syuu@scylladb.com>
2018-12-25 12:50:09 +02:00
Botond Dénes
3ae77a2587 configure.py: generate ${mode}-objects targets
Sometimes one wants to just compile all the source files in the
project, because, for example, one just moved code or files around and
there is no need to link and run anything, just to check that everything
still compiles.
Since linking takes up a considerable amount of time, it is worthwhile to
have a specific target that caters to such needs.
This patch introduces a ${mode}-objects target for each mode (e.g.
release-objects) that only runs the compilation step for each source
file but does not link anything.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <eaad329bf22dfaa3deff43344f3e65916e2c8aaf.1545045775.git.bdenes@scylladb.com>
2018-12-25 12:40:20 +02:00
Benny Halevy
f104951928 sstable_test: read_file should open the file read-only
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181218145156.12716-1-bhalevy@scylladb.com>
2018-12-25 12:02:46 +02:00
Rafael Ávila de Espíndola
f8c81d4d89 tests: sstables: mc: add tests with incompatible schemas
In one test the types in the schema don't match the types in the
statistics file. In another a column is missing.

The patch also updates the exceptions to have more human readable
messages.

Tests: unit (release)

Part of issue #3960.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181219233046.74229-1-espindola@scylladb.com>
2018-12-25 11:11:54 +02:00
Yibo Cai (Arm Technology China)
422987ab04 utils: add fast ascii string validation
Validate an ascii string by ORing all bytes and checking whether the 7-th
bit is 0. Compared with the original std::any_of(), which checks the
string byte by byte, the new approach validates the input eight bytes at
a time, in two independent streams. Performance is much higher in normal
cases, though slightly slower when the string is very short. See the
table below.

Speed(MB/s) of ascii string validation
+---------------+-------------+---------+
| String length | std::any_of | u64 x 2 |
+---------------+-------------+---------+
| 9 bytes       | 1691        | 1635    |
+---------------+-------------+---------+
| 31 bytes      | 2923        | 3181    |
+---------------+-------------+---------+
| 129 bytes     | 3377        | 15110   |
+---------------+-------------+---------+
| 1039 bytes    | 3357        | 31815   |
+---------------+-------------+---------+
| 16385 bytes   | 3448        | 47983   |
+---------------+-------------+---------+
| 1048576 bytes | 3394        | 31391   |
+---------------+-------------+---------+

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>
2018-12-24 09:58:08 +02:00
Tomasz Grabiec
419c771791 sstables: index_reader: Fix abort when _trust_pi == trust_promoted_index::no
data is not moved-from if _trust_pi == trust_promoted_index::no, which
triggers the assert on data.empty(). We should make it empty
unconditionally.

Message-Id: <1545408731-14333-1-git-send-email-tgrabiec@scylladb.com>
2018-12-23 12:09:21 +02:00
Tomasz Grabiec
07d153c769 sstables: mc: reader: Use enum class instead of variant
variant is an overkill here.

Message-Id: <1545409014-16289-1-git-send-email-tgrabiec@scylladb.com>
2018-12-23 12:04:02 +02:00
Duarte Nunes
e6a8883228 service/storage_proxy: Protect against empty mutation when storing hint
mutation_holder::get_mutation_for() can return nullptr, so protect
against that when storing a hint.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181221194853.98775-2-duarte@scylladb.com>
2018-12-23 11:14:44 +02:00
Duarte Nunes
6c4a34f378 service/storage_proxy: Protect against empty mutation in mutation_holder
The per_destination_mutation holder can contain empty mutations,
so make sure release_mutation() skips over those.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181221194853.98775-1-duarte@scylladb.com>
2018-12-23 11:14:43 +02:00
Duarte Nunes
5e7d18380d Merge 'Reduce dependencies on config.hh for extensions access' from Avi
"
Some files use db/config.hh just to access extensions. Reduce dependencies
on this global and volatile file by providing another path to access extensions.

Tests: unit(release)
"

* tag 'unconfig-2/v1' of https://github.com/avikivity/scylla:
  hints: reduce dependencies on db/config.hh
  commitlog: reduce dependencies on db/config.hh
  cql3: reduce dependencies on db/config.hh
  database: provide accessor to db::extensions
2018-12-21 20:15:44 +00:00
Avi Kivity
eae030b061 hints: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:44 +00:00
Avi Kivity
cc8312a8b9 commitlog: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:43 +00:00
Avi Kivity
d2dae3af86 cql3: reduce dependencies on db/config.hh
Instead of accessing extensions via config, access it via
database::extensions(). This reduces recompilations when configuration
is extended.
2018-12-21 20:15:43 +00:00
Avi Kivity
74c1afad29 database: provide accessor to db::extensions
Rather than forcing callers to go through get_config(), provide a
direct accessor. This reduces dependencies on config.hh, and will
allow separation of extensions from configuration.
2018-12-21 20:15:43 +00:00
Tomasz Grabiec
d2f96a60f6 sstables: mc: index_reader: Handle CK_SIZE split across buffers properly
We incorrectly fell through to the next state instead of returning
to read more data.

This can manifest in a number of ways, an abort, or incorrect read.

Introduced in 917528c

Fixes #4011.

Message-Id: <1545402032-4114-1-git-send-email-tgrabiec@scylladb.com>
2018-12-21 16:34:10 +02:00
Tomasz Grabiec
7afe2bad51 sstables: mc: reader: Avoid unnecessary index reads on fast forwarding
When the next pending fragments are after the start of the new range,
we know there is no need to skip.

Caught by perf_fast_forward --datasets large-part-ds3 \
                            --run-tests=large-partition-slicing

Refs #3984
Message-Id: <1545308006-16389-1-git-send-email-tgrabiec@scylladb.com>
2018-12-20 16:21:07 +00:00
Gleb Natapov
393269d34b streaming: hold to sink while close() is running and call close on error as well
Currently, if something throws in the mutation-sending loop while
streaming, the sink is not closed. Also, while close() is running, the
code does not hold onto the sink object. close() is async, so the sink
should be kept alive until it completes. The patch uses do_with() to hold
onto the sink while close() is running, and runs close() on the error
path too.

Fixes #4004.

Message-Id: <20181220155931.GL3075@scylladb.com>
2018-12-20 18:03:37 +02:00
Rafi Einstein
533e46ac72 top_k: map template arguments
Added Hash and KeyEqual template arguments to enable unordered_map in top_k implementation.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-20 16:41:40 +02:00
Rafi Einstein
75f21954d4 top_k: whitespace and minor fixes
Style and minor logic changes from code review.

Signed-off-by: Rafi Einstein <rafie@scylladb.com>
2018-12-20 16:41:33 +02:00
Tomasz Grabiec
2b55ab8c8e Merge "Add more extensive test for mutation reader fast-forwarding" from Paweł
Mutation readers allow fast-forwarding the ranges from which the data is
being read. The main user of this feature is cache which, when reading
from the underlying reader, may want to skip some data it already has.
Unsurprisingly, this adds more complexity to the implementation of the
readers and more edge cases the developers need to take care of.

While most of the readers were at least to some extent checked in this
area, those tests were usually quite isolated (e.g. one test doing
inter-partition fast-forwarding, another doing intra-partition
fast-forwarding) and as a consequence didn't cover many corner cases.

This patch adds a generic test for fast-forwarding and slicing that
covers more complicated scenarios when those operations are combined.
Needless to say that did uncover some problems, but fortunately none of
them is user-visible.

Fixes #3963.
Fixes #3997.

Tests: unit(release, debug)

* https://github.com/pdziepak/scylla.git test-fast-forwarding/v4.1:
  tests/flat_mutation_reader_assertions: accumulate received tombstones
  tests/flat_mutation_reader_assertions: add more test messages
  tests/flat_mutation_reader_assertions: relax has_monotonic_positions()
    check
  tests/mutation_readers: do not ignore streamed_mutation::forwarding
  Revert "mutation_source_test: add option to skip intra-partition
    fast-forwarding tests"
  memtable: it is not a single partition read if partition
    fast-forwaring is enabled
  sstables: add more tracing in mp_row_consumer_m
  row_cache: use make_forwardable() to implement
    streamed_mutation::forwarding
  row_cache: read is not single-partition if inter-partition forwarding
    is enabled
  row_cache: drop support for streamed_mutation::forwarding::yes
    entirely
  sstables/mp_row_consumer: position_range end bound is exclusive
  mutation_fragment_filter: handle streamed_mutation::forwarding::yes
    properly
  tests/mutation_reader: reduce sleeping time
  tests/memtable: fix partition_range use-after-free
  tests/mutation: fix partition range use-after-free
  flat_mutation_reader_from_mutations: add overload that accepts a slice
    and partition range
  flat_mutation_reader_from_mutations: fix empty range case
  flat_mutation_reader_from_mutations: destroy all remaining mutations
  tests/mutation_source: drop dropped column handling test
  tests/mutation_source: add test for complex fast_forwarding and
    slicing
2018-12-20 15:05:21 +01:00
Paweł Dziepak
3355d16938 tests/mutation_source: add test for complex fast_forwarding and slicing
While we already had tests that verified inter- and intra-partition
fast-forwarding as well as slicing, they had quite limited scope and
didn't combine those operations. The new test is meant to extensively
test these cases.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
26a30375b1 tests/mutation_source: drop dropped column handling test
Schema changes are now covered by for_each_schema_change() function.
Having some additional tests in run_mutation_source_tests() is
problematic when it is used to test intermediate mutation readers
because schema changes may be irrelevant for them, which makes the test
a waste of time (might be a problem in debug mode) and requires those
intermediate readers to use a more complex underlying reader that
supports schema changes (again, a problem in a very slow debug mode).
2018-12-20 13:27:25 +00:00
Paweł Dziepak
048ed2e3d3 flat_mutation_reader_from_mutations: destroy all remaining mutations
If the reader is fast-forwarded to another partition range, mutation_ may
be left with some partial mutations. Make sure that those are properly
destroyed.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
d50cd31eee flat_mutation_reader_from_mutations: fix empty range case
An iterator shall not be dereferenced until it is verified that it is
dereferencable.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
93488209de tests/mutation: fix partition range use-after-free 2018-12-20 13:27:25 +00:00
Paweł Dziepak
e91165d929 tests/memtable: fix partition_range use-after-free 2018-12-20 13:27:25 +00:00
Paweł Dziepak
5db8dacd1f tests/mutation_reader: reduce sleeping time
It is very bad taste to sleep anywhere in the code. The test should be
fixed to explicitly test various orderings between concurrent
operations, but before that happens let's at least reduce how much
those sleeps slow it down by changing them from milliseconds to
microseconds.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
243aade3b2 mutation_fragment_filter: handle streamed_mutation::forwarding::yes properly 2018-12-20 13:27:25 +00:00
Paweł Dziepak
dfa5b3d996 sstables/mp_row_consumer: position_range end bound is exclusive 2018-12-20 13:27:25 +00:00
Paweł Dziepak
df1d438fcd row_cache: drop support for streamed_mutation::forwarding::yes entirely 2018-12-20 13:27:25 +00:00
Paweł Dziepak
adcb3ec20c row_cache: read is not single-partition if inter-partition forwarding is enabled 2018-12-20 13:27:25 +00:00
Paweł Dziepak
7ecee197c4 row_cache: use make_forwardable() to implement streamed_mutation::forwarding
Implementing intra-partition fast-forwarding adds more complexity to
already very-much-not-trivial cache readers and isn't really critical in
any way since it is not used outside of the tests. Let's use the generic
adapter instead of natively implementing it.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
e96a5f96d9 sstables: add more tracing in mp_row_consumer_m 2018-12-20 13:27:25 +00:00
Paweł Dziepak
18825af830 memtable: it is not a single partition read if partition fast-forwarding is enabled
A single-partition reader is less expensive than one that accepts any
range of partitions, but it doesn't support fast-forwarding to another
partition range properly and therefore cannot be used if that option is
enabled.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
bcb5aed1ef Revert "mutation_source_test: add option to skip intra-partition fast-forwarding tests"
This reverts commit b36733971b. That commit made
run_mutation_reader_tests() support mutation_sources that do not implement
streamed_mutation::forwarding::yes. This is wrong since mutation_sources
are not allowed to ignore or otherwise not support that mode. Moreover,
there is absolutely no reason for them to do so since there is a
make_forwardable() adapter that can make any mutation_reader a
forwardable one (at the cost of performance, but that's not always
important).
2018-12-20 13:27:25 +00:00
Paweł Dziepak
8706750b9b tests/mutation_readers: do not ignore streamed_mutation::forwarding
It is wrong to silently ignore the streamed_mutation::forwarding option,
which completely changes how the reader is supposed to operate. The best
solution is to use the make_forwardable() adapter, which turns a
non-forwardable reader into a forwardable one.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
edf2c71701 tests/flat_mutation_reader_assertions: relax has_monotonic_positions() check
Since 41ede08a1d "mutation_reader: Allow
range tombstones with same position in the fragment stream" mutation
readers emit fragments in non-decreasing order (as opposed to strictly
increasing), so has_monotonic_positions() needs to be updated to allow
that.
2018-12-20 13:27:25 +00:00
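The relaxation above amounts to replacing a strictly-increasing check with a non-decreasing one. A minimal sketch with standalone helpers (integer stand-ins for positions; this is illustrative, not the actual flat_mutation_reader_assertions code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Old, stricter invariant: every position must be greater than the previous.
bool strictly_increasing(const std::vector<int>& positions) {
    return std::adjacent_find(positions.begin(), positions.end(),
                              [](int a, int b) { return b <= a; })
           == positions.end();
}

// Relaxed invariant: repeated positions (e.g. a range tombstone sharing a
// position with a row) are allowed; only regressions are rejected.
bool non_decreasing(const std::vector<int>& positions) {
    return std::is_sorted(positions.begin(), positions.end());
}
```

A stream of positions like 1, 2, 2, 3 fails the old check but passes the relaxed one.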
Paweł Dziepak
787d1ba7b2 tests/flat_mutation_reader_assertions: add more test messages 2018-12-20 13:27:25 +00:00
Paweł Dziepak
593fb936c2 tests/flat_mutation_reader_assertions: accumulate received tombstones
The current data model employed by mutation readers doesn't have a unique
representation of range tombstones. This complicates testing, since multiple
ways of emitting range tombstones and rows are equally valid.

This patch adds an option to verify mutation readers by checking whether
the tombstones they emit properly affect the clustered rows, regardless of
how exactly the tombstones are emitted. The interface of
flat_mutation_reader_assertions is extended by adding
may_produce_tombstones() that accepts any number of tombstones and
accumulates them. Then, produces_row_with_key() accepts an additional
argument which is the expected timestamp of the range tombstone that
affects that row.
2018-12-20 13:27:25 +00:00
Paweł Dziepak
e6d26a528f Merge "Optimize slicing sstable readers" from Tomasz
"
Contains several improvements for fast-forwarding and slicing readers. Mainly
for the MC format, but not only:

  - Exiting the parser early when going out of the fast-forwarding window [MC-format-only]
  - Avoiding reading of the head of the partition when slicing
  - Avoiding parsing rows which are going to be skipped [MC-format-only]
"

* 'sstable-mc-optimize-slicing-reads' of github.com:tgrabiec/scylla:
  sstables: mc: reader: Skip ignored rows before parsing them
  sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts
  sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row
  sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows
  sstables: mc: parser: Allow the consumer to skip the whole row
  sstables: continuous_data_consumer: Introduce skip()
  sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state()
  sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row
  sstables: reader: Do not read the head of the partition when index can be used
  sstables: mc: mutation_fragment_filter: Check the fast-forward window first
  sstables: mc: writer: Avoid calling unsigned_vint::serialized_size()
2018-12-20 12:48:22 +00:00
Avi Kivity
b66f59aa3d Merge "materialized views: Apply backpressure from view replicas" from Duarte
"
As the amount of pending view updates increases we know that there’s a
mismatch between the rate at which the base receives writes and the
rate at which the view retires them. We react by applying backpressure
to decrease the rate of incoming base writes, allowing the slow view
replicas to catch up. We want to delay the client’s next writes to a
base replica and we use the base’s backlog of view updates to derive
this delay.

To validate this approach we tested a 3 node Scylla cluster on GCE,
using n1-standard-4 instances with NVMEs. A loader running on an
n1-standard-8 instance ran cassandra-stress with 100 threads. With the
delay function d(x) set to 1s, we see no base write timeouts. With the
delay function as defined in the series, we see that backlogs stabilize
at some (arbitrary) point, as predicted, but this stabilization
co-exists with base write timeouts. However, the system overall behaves
better than the current version, with the 100 view update limit, and
also better than the version without such limit or any backpressure.

More work is necessary to further stabilize the system. Namely, we want
to keep delaying until we see the backlog is decreasing. This will
require us to add more delay beyond the stabilization point, which in
turn should minimize the base write timeouts, and will also minimize the
amount of memory the backlog takes at each base replica.

Design document:
    https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo

Fixes #2538
"

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

* 'materialized-views/backpressure/v2' of https://github.com/duarten/scylla: (32 commits)
  service/storage_proxy: Release mutation as early as possible
  service/storage_proxy: Delay replica writes based on view update backlog
  service/storage_proxy: Get the backlog of a particular base replica
  service/storage_proxy: Add counters for delayed base writes
  main: Start and stop the view_update_backlog_broker
  service: Distribute a node's view update backlog
  service: Advertise view update backlog over gossip
  service/storage_proxy: Send view update backlog from replicas
  service/storage_proxy: Prepare to receive replica view update backlog
  service/storage_proxy: Expose local view update backlog
  tests/view_schema_test: Add simple test for db::view::node_update_backlog
  db/view: Introduce node_update_backlog class
  db/hints: Initialize current backlog
  database: Add counter for current view backlog
  database: Expose current memory view update backlog
  idl: Add db::view::update_backlog
  db/view: Add view_update_backlog
  database: Wait on view update semaphore for view building
  service/storage_proxy: Use near-infinite timeouts for view updates
  database: generate_and_propagate_view_updates no longer needs a timeout
  ...
2018-12-20 12:44:51 +02:00
Asias He
bcba6b4f4d streaming: Futurize estimate_partitions
The loop can take a long time if the number of sstables and/or ranges
is large. To fix, futurize the loop.

Fixes: #4005

Message-Id: <3b05cb84f3f57cc566702142c6365a04b075018e.1545290730.git.asias@scylladb.com>
2018-12-20 12:08:03 +02:00
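Futurizing here means breaking the long estimation loop into chunks so the reactor can run other tasks in between. A minimal plain-C++ sketch of the idea (the yield callback stands in for Seastar's preemption points; names are illustrative, not Scylla's API):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sum per-sstable partition estimates, invoking a yield callback between
// fixed-size chunks so a long loop cannot monopolize the thread. In the
// real code this is done with futures rather than a callback.
long estimate_partitions_chunked(const std::vector<long>& estimates,
                                 std::size_t chunk,
                                 const std::function<void()>& yield) {
    long total = 0;
    for (std::size_t i = 0; i < estimates.size(); ++i) {
        total += estimates[i];
        if ((i + 1) % chunk == 0) {
            yield();  // preemption point between chunks
        }
    }
    return total;
}
```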
Amos Kong
385d74db01 redhat/scylla.spec: add python34-setuptools dependency
Commit 00476c3946 switched some scripts to python3, which introduced an
ImportError: No module named 'pkg_resources'.

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <293c05d9315ec6c9da1f32e8cb3d2fdf8d8d3924.1545272049.git.amos@scylladb.com>
2018-12-20 06:32:36 +02:00
Duarte Nunes
2d7c026d6e service/storage_proxy: Release mutation as early as possible
When delaying a base write, there is no need to hold on to the
mutation if all replicas have already replied.

We introduce mutation_holder::release_mutation(), which frees the
mutations that are no longer needed during the rest of the delay.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
756b601560 service/storage_proxy: Delay replica writes based on view update backlog
As the amount of pending view updates increases we know that there’s a
mismatch between the rate at which the base receives writes and the
rate at which the view retires them. We react by applying backpressure
to decrease the rate of incoming base writes, allowing the slow view
replicas to catch up. We want to delay the client’s next writes to a
base replica. We use the base’s backlog of view updates to derive
this delay.

If we achieve CL and the backlogs of all replicas involved were last
seen to be empty, then we wouldn't delay the client's reply. However,
it could be that one of the replicas is actually overloaded, and won't
reply for many new such requests. We'll eventually start applying
backpressure to the client via the background's write queue, but in
the meanwhile we may be dropping view updates. To mitigate this we rely
on the backlog being gossiped periodically.

Fixes #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
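The series doesn't pin down a single delay function here (the cover letter calls the stabilization point arbitrary), so the following is only an illustrative sketch of deriving a client delay from the worst relative view-update backlog seen among the contacted base replicas; every name is hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical delay function: grow the delay superlinearly with the
// relative backlog (0.0 = empty, 1.0 = full), capped at max_delay.
std::chrono::milliseconds delay_for_backlog(double relative_backlog,
                                            std::chrono::milliseconds max_delay) {
    double x = std::min(std::max(relative_backlog, 0.0), 1.0);
    return std::chrono::milliseconds(
        static_cast<long>(x * x * max_delay.count()));
}
```

An empty backlog adds no delay to the client's reply; a full one delays it by the full max_delay.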
Duarte Nunes
997bdf5d98 service/storage_proxy: Get the backlog of a particular base replica
Add a function that returns the view update backlog for a particular
replica.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
819b6f3406 service/storage_proxy: Add counters for delayed base writes
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
6df32bfb0c main: Start and stop the view_update_backlog_broker
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
37dfd22619 service: Distribute a node's view update backlog
This patch introduces the view_update_backlog_broker class, which is
responsible for periodically updating the local gossip state with the
current node's view update backlog. It also registers to updates from
other nodes, and updates the local coordinator's view of their view
update backlogs.

We consider the view update backlog received from a peer through the
mutation_done verb to be always fresh, but we consider the one received
through gossip to be fresh only if it has a higher timestamp than what
we currently have recorded.

This is because a node only updates its gossip state periodically, and
also because a node can transitively receive gossip state about a third
node with outdated information.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
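The freshness rule described above can be sketched as follows (struct and function names are illustrative, not Scylla's):

```cpp
#include <cassert>
#include <cstdint>

struct timestamped_backlog {
    double relative;        // 0.0 = empty, 1.0 = full
    std::int64_t timestamp; // when the owning node measured it
};

// A backlog carried on a mutation_done reply is always considered fresh;
// a gossiped backlog is applied only if its timestamp is newer than the
// one currently recorded, since gossip may relay outdated third-party state.
bool apply_update(timestamped_backlog& recorded,
                  const timestamped_backlog& incoming,
                  bool via_mutation_done) {
    if (via_mutation_done || incoming.timestamp > recorded.timestamp) {
        recorded = incoming;
        return true;
    }
    return false;  // stale gossip: keep what we have
}
```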
Duarte Nunes
8da6a31e75 service: Advertise view update backlog over gossip
This lays the groundwork for brokering a node's view update
backlog across the whole cluster. This is needed for the case when a
coordinator does not contact a given replica for a long time and ends
up using an outdated backlog view, which causes requests to be
unnecessarily delayed.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
ede5742f9b service/storage_proxy: Send view update backlog from replicas
Change the inter-node protocol so we can propagate the view update
backlog from a base replica to the coordinator through the
mutation_done and mutation_failed verbs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
34b48e1d98 service/storage_proxy: Prepare to receive replica view update backlog
In subsequent patches, replicas will reply to the coordinator with
their view update backlog. Before introducing changes to the
messaging_service, prepare the storage_proxy to receive and store
those backlogs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
776fdd4d1a service/storage_proxy: Expose local view update backlog
The local view update backlog is the max backlog out of the relative
memory backlog size and the relative hints backlog size.

We leverage the db::view::node_update_backlog class so we can send the
max backlog out of the node's shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
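The max-of-relative-backlogs rule described above can be sketched in a few lines (names are illustrative, not the db::view API):

```cpp
#include <algorithm>
#include <cassert>

// A component backlog is expressed relative to its maximum size.
double relative_backlog(double used, double max_size) {
    return max_size > 0 ? used / max_size : 0.0;
}

// The local backlog is the max of the relative memory backlog and the
// relative hints backlog; across shards the node again takes the max.
double local_view_update_backlog(double mem_used, double mem_max,
                                 double hints_used, double hints_max) {
    return std::max(relative_backlog(mem_used, mem_max),
                    relative_backlog(hints_used, hints_max));
}
```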
Duarte Nunes
6662475dd9 tests/view_schema_test: Add simple test for db::view::node_update_backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
2bd76f8fc5 db/view: Introduce node_update_backlog class
This class is an atomic view update backlog representation,
safe to update from multiple shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
6afbec4685 db/hints: Initialize current backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
8d6718b6e4 database: Add counter for current view backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
2174eed640 database: Expose current memory view update backlog
Expose the base replica's current memory view update backlog, which is
defined in terms of units consumed from the semaphore.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
d54ac4961d idl: Add db::view::update_backlog
Add db::view::update_backlog to the newly created view.idl.hh.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
12ce517242 db/view: Add view_update_backlog
The view update backlog represents the pending view data that a base replica
maintains. It is the maximum of the memory backlog - how much memory pending
view updates are consuming - and the disk backlog - how much view hints are
consuming. The size of a backlog is relative to its maximum size.

We will use this class to represent a base replica's view update
backlog at the coordinator.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
fc9176e784 database: Wait on view update semaphore for view building
View building sends view updates synchronously, which provides natural
backpressure. However, these updates still

1) Contribute to the load on the view replicas, and;
2) Add memory pressure to the base replica.

They should thus count towards the current view update backlog, and
consume units from the view update concurrency semaphore.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
e33e187096 service/storage_proxy: Use near-infinite timeouts for view updates
View updates are sent with a timeout of 5 minutes, unrelated to
any user-defined value and meant as a protection mechanism. During
normal operation we don’t benefit from timing out view writes and
offloading them to the hinted-handoff queue, since they are an
internal, non-real time workload that we already spent resources on.

This value should be increased further, but that change depends on

Refs #2538
Refs #3826

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
86198060e5 database: generate_and_propagate_view_updates no longer needs a timeout
We no longer wait on the semaphore and instead over-subscribe it, so
there's no reason to pass a timeout.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
39eda68094 database: Don't generate view updates when node is overloaded
We arrive at an overloaded state when we fail to acquire semaphore
units in the base replica. This can mean that clients are working in
interactive mode and that we fail to throttle them, so we should
start shedding load. We want to avoid impacting base table
availability by running out of memory, so we could offload the memory
queue to disk by writing the view updates as hints without attempting
to send them. However, the disk is also a limited resource and in
extreme cases we won’t be able to write hints. A tension exists
between forgetting the view updates, thereby opening up a window for
inconsistencies between base and view, or failing the base replica
write. The latter can fail the whole user write, or if the
coordinator was able to achieve CL, can instead cause inconsistencies
between base tables (we wouldn't want to store a hint, because if the
base replica is still overloaded, we would redo the whole dance).

Between the devil and the deep blue sea, we chose to forget view
updates. As a further simplification, we don't even write hints,
assuming that if clients can’t be throttled (as we'll attempt to do in
future patches), it will only be a matter of time before view updates
can’t be offloaded. We also start acquiring the semaphore units using
consume(), which is non-blocking, but allows for underflow of the
available semaphore units. This is okay, and we expect not to underflow
by much, as we stop generating new view updates.

Refs #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
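The consume()-style accounting described above can be sketched with a plain signed counter (this mirrors the idea, not Seastar's semaphore API):

```cpp
#include <cassert>
#include <cstdint>

// consume() is non-blocking and always succeeds, possibly driving the
// available count negative (underflow); signal() returns units when the
// in-flight updates complete. While underflowed, the replica stops
// generating new view updates, bounding how far below zero it goes.
struct units_counter {
    std::int64_t available;

    void consume(std::int64_t n) { available -= n; }  // may underflow
    void signal(std::int64_t n) { available += n; }
    bool overloaded() const { return available < 0; }
};
```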
Duarte Nunes
a3d30ea99a db/view: Propagate acquired semaphore units to mutate_MV()
Propagate acquired semaphore units to mutate_MV() to allow the
semaphore to be incrementally signalled as view updates are processed
by view replicas.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
8c1e6fcee8 db/timeout_clock: Define timeout_semaphore_units
Defines the type of semaphore_units<> associated with
timeout_semaphore.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
11c02c51fe database: Wait for pending view updates to drain before stopping
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
185a4594af database: Restore formatting of table::stop()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
f286d2ec34 database: Wait for pending operations in table::stop()
Stopping a table can happen concurrently with in-flight reads and
writes, which rely on table state; we must therefore prevent its
destruction before those operations complete.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
1f1fc36b72 database: Make view update concurrency semaphore memory-based
The semaphore currently limiting the number of view updates a given
base replica emits aims to control the load that is imposed on the
cluster, to protect view replicas from being overloaded when there
are bursts of traffic (especially for degenerate cases like an index
with low selectivity).

100 is, however, an arbitrary number. It might allow too much load on
the view replicas, and it might also allow too much memory from the
base shard to be consumed. Conversely, it might allow for too few
updates to be queued in case of a burst, or to absorb updates while a
view replica becomes partitioned.

To deal with the load that is inflicted on the cluster, future patches
will ensure that the rate of base writes obeys the rate at which the
slowest view replica can consume the corresponding view updates.

To protect the current shard from using too much memory for this
queue, we will limit it to 10% of the shard's memory. The goal is to
both protect the shard from being overloaded, but also to allow it to
absorb bursts of writes resulting in large view mutations.

Refs #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
bf4277fd8c service/storage_proxy: Remove unused send_to_endpoint() overloads
The send_to_endpoint() overloads that receive a non-frozen mutation
are no longer used.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
2753cfee88 db/view: Generate view updates as frozen_mutations
Working in terms of frozen_mutations allows us to account more
precisely for the memory that pending view updates consume at the
storage_proxy layer.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
715da6fd6b db/view: Reserve vector space in mutate_MV()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
5d011eb61f db/view: Cleanup mutate_MV()
In particular, extract out the logic updating the stats in case of a
failed update.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
7cfcd21bbb database: Make lambda in table::populate_views mutable
This allows an std::move() in its body to work as intended. Also, make
the lambda's argument type explicit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
122737a8ab Merge seastar upstream
* seastar 132e6cd...6c8c229 (3):
  > reactor: disable nowait aio due to a kernel bug
  > core/semaphore: Allow combining semaphore_units()
  > core/shared_ptr: Allow releasing a lw_shared_ptr to a non-const object

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181217153241.67514-2-duarte@scylladb.com>
2018-12-19 12:57:07 +02:00
Duarte Nunes
bf05e59672 seastar: Change the source repository to scylla-seastar
Scylla is at the moment incompatible with the Seastar master branch,
so in order to allow Scylla commits that depend on Seastar patches,
we change the submodule to point to scylla-seastar and use a branch
(master-20181217) to hold these dependent commits.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181217153241.67514-1-duarte@scylladb.com>
2018-12-19 12:57:03 +02:00
Rafael Ávila de Espíndola
ff18c837b7 tests: Add missing include in random-utils.hh
This file uses std::cout and so should include <iostream>.

Found with a patch to seastar that removes some redundant <iostream>
includes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181218183816.34504-1-espindola@scylladb.com>
2018-12-19 10:52:19 +00:00
Avi Kivity
dd51c659f7 config: remove "to be removed before release" notice mc sstable config
The "enable_sstables_mc_format" config item help text wants to remove itself
before release. Since scylla-3.0 did not get enough mc format mileage, we
decided to leave it in, so the notice should be removed.

Fixes #4003.
Message-Id: <20181219082554.23923-1-avi@scylladb.com>
2018-12-19 09:39:29 +00:00
Duarte Nunes
a7456db687 Merge 'Simplify natural endpoint calculation' from Calle
"
Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5

Modifies network topology calculation, reducing the number of
maps/sets used by applying the knowledge of how many replicas we
expect/need per dc and sharing endpoint and rack set (since we cannot have
overlaps).

Also includes a transposed origin test to ensure new calculation
matches the old one.

Fixes #2896
"

* 'calle/network_topology' of github.com:scylladb/seastar-dev:
  network_topology_test: Add test to verify new algorithm results equal old
  network_topology_strategy: Simplify calculate_natural_endpoints
  token_metadata: Add "get_location" ip to dc+rack accessor
  sequenced_set: Add "insert" method, following std::set semantics
2018-12-19 09:39:29 +00:00
Rafael Ávila de Espíndola
b93d8d863d Add a test with mismatched timestamps.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181218035931.3554-1-espindola@scylladb.com>
2018-12-18 11:30:56 +01:00
Tomasz Grabiec
37d9ba68bc sstables: mc: reader: Skip ignored rows before parsing them
Currently filtering happens inside consume_row_end() after the whole
row is parsed. It's much faster to skip without parsing.

This patch moves filtering and range tombstone splitting to
consume_row_start().

_stored_row is no longer needed because in case the filter returns
store_and_finish, the consumer exits with retry_later, and the parser
will call consume_row_start() again when resumed.

Tests:

  ./build/release/tests/perf/perf_fast_forward_g \
     --sstable-format=mc \
     --datasets large-part-ds1 \
     --run-tests=large-partition-skips

Before:

read    skip      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB)
1       4096      1.085142      1953       1800         32       1803       1720   4990     159604

After:

read    skip      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB)
1       4096      0.694560      1953       2812         11       2813       2684   4986     159588
2018-12-18 11:13:52 +01:00
Tomasz Grabiec
e3c3ef2f0e sstables: mc: reader: Call _cells.clear() when row ends rather than when it starts
This way we will later avoid calling clear() for ignored rows.
2018-12-18 11:11:48 +01:00
Tomasz Grabiec
fa126106f8 sstables: mc: mutation_fragment_filter: Take position_in_partition rather than a clustering_row 2018-12-18 11:11:48 +01:00
Tomasz Grabiec
522a75f761 sstables: mc: reader: Do not call consume_row_marker_and_tombstone() for static rows
mp_row_consumer_m::consume_row_marker_and_tombstone() is called for
both clustering and static rows, but it dereferences and modifies
_in_progress_row, which is only set when inside a clustering row.

Fixes #3999.
2018-12-18 11:11:47 +01:00
Tomasz Grabiec
9498977a34 sstables: mc: parser: Allow the consumer to skip the whole row
The MC format contains row size before the row body, which we can use
to skip the row without parsing its contents, which will be much
faster.
2018-12-18 11:11:47 +01:00
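The benefit of a size-prefixed row can be shown with a toy layout (one size byte per row; the real MC encoding uses vints and is more involved):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Skip an unwanted row by reading only its size prefix and advancing the
// cursor past the body, with no parsing of the body's contents.
std::size_t skip_row(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    std::uint8_t row_size = buf[pos];  // size prefix
    return pos + 1 + row_size;         // jump straight past the body
}
```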
Tomasz Grabiec
b4c3b78082 sstables: continuous_data_consumer: Introduce skip() 2018-12-18 11:11:47 +01:00
Tomasz Grabiec
36dd660507 sstables: continuous_data_consumer: Make position() meaningful inside state_processor::process_state()
Will allow state_processor to know its position in the
stream.

Currently position() is meaningless inside process_state() because in
some cases it points to the position after the buffer and in some
cases before it. This patch standardizes on the former. This is more
useful than the latter because process_state() trims from the front of
the buffer as it consumes, so the position inside the stream can be
obtained by subtracting the remaining buffer size from position(),
without introducing any new variables.
2018-12-18 11:11:47 +01:00
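The convention this standardizes on can be written as a one-line computation (a worked sketch, not the continuous_data_consumer interface):

```cpp
#include <cassert>
#include <cstddef>

// position() points just past the buffered data, so the consumer's true
// position inside the stream is position() minus the remaining
// (unconsumed) buffer size.
std::size_t stream_position(std::size_t position_after_buffer,
                            std::size_t remaining_buffer) {
    return position_after_buffer - remaining_buffer;
}
```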
Tomasz Grabiec
e950c8b00a sstables: mc: parser: Allocate dynamic_bitset once per read instead of once per row
The size of the bitset is the same for given row kind across the sstable, so we can
allocate it once.

_columns_selector is moved into row_schema structure, which we have
one for each row kind and setup in the constructor.
2018-12-18 11:11:47 +01:00
Tomasz Grabiec
fb15759934 sstables: reader: Do not read the head of the partition when index can be used
read_partition() was always called through read_next_partition(), even
if we're at the beginning of the read.  read_next_partition() is
supposed to skip to the next partition. It still works when we're
positioned before a partition: it doesn't advance the consumer, but it
clears _index_in_current_partition, because it (correctly) assumes it
corresponds to the partition we're about to leave, not the one we're
about to enter.

This means that index lookups we did in the read initializer will be
disregarded when reading starts, and we'll always start by reading
partition data from the data file. This is suboptimal for reads which
are slicing a large partition and don't need to read the front of the
partition.

Regression introduced in 4b9a34a854.

The fix is to call read_partition() directly when we're positioned at
the beginning of the partition. For that purpose a new flag was
introduced.

test_no_index_reads_when_rows_fall_into_range_boundaries has to be
relaxed, because it assumed that slicing reads will read the head of
the partition.

Refs #3984
Fixes #3992

Tested using:

 ./build/release/tests/perf/perf_fast_forward_g \
     --sstable-format=mc \
     --datasets large-part-ds1 \
     --run-tests=large-partition-slicing-clustering-keys

Before (focus on aio):

offset  read      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
4000000 1         0.001378         1        726          5        736        102      6        200       4       2        0        1        1        0        0        0  65.8%

After:

offset  read      time (s)     frags     frag/s    mad f/s    max f/s    min f/s    aio      (KiB) blocked dropped  idx hit idx miss  idx blk    c hit   c miss    c blk    cpu
4000000 1         0.001290         1        775          6        788        716      2        136       2       0        0        1        1        0        0        0  69.1%
2018-12-18 11:11:37 +01:00
Tomasz Grabiec
385a4c23fd sstables: mc: mutation_fragment_filter: Check the fast-forward window first
Otherwise the parser will keep consuming and dropping fragments
needlessly, rather than giving the user a chance to consume the
end-of-stream condition and maybe skip again.

Refs #3984
2018-12-18 11:11:37 +01:00
Tomasz Grabiec
62a1afaac9 sstables: mc: writer: Avoid calling unsigned_vint::serialized_size()
Rather than adding serialized_size() to the body size before
serializing the field, we can serialize the field to _tmp_bufs at the
beginning and have the body size automatically account for it.
2018-12-18 11:11:36 +01:00
Duarte Nunes
1f578be187 Merge 'Fix evictable shard reader related issues' from Botond
"
Some additional issues were recently discovered, related to the recent
changes to the way inactive readers are evicted and to making shard
readers evictable.
One such issue is that the `querier_cache` is not prepared for the
querier to be immediately evicted by the reader concurrency semaphore,
when registered with it as an inactive read (#3987).
The other issue is that the multishard mutation query code was not
fully prepared for evicted shard readers being re-created, or failing
while being re-created (#3991).

This series fixes both of these issues and adds a unit test which covers
the second one. I am working on a unit test which would cover the first
issue, but it's proving to be a difficult one and I don't want to delay
the fixes for these issues any longer, as they also affect 3.0.

Fixes: #3987
Fixes: #3991

Tests: unit(release, debug)
"

* 'evictable-reader-related-issues/v2' of https://github.com/denesb/scylla:
  multishard_mutation_query: reset failed readers to inexistent state
  multishard_mutation_query: handle missing readers when dismantling
  multishard_mutation_query: add support for keeping stats for discarded partitions
  multishard_mutation_query: expect evicted reader state when creating reader
  multishard_mutation_query: pretty-print the reader state in log messages
  querier_cache: check that the query wasn't evicted during registering
  reader_concurrency_semaphore: use the correct types in the constructor
  reader_concurrency_semaphore: add consume_resources()
  reader_concurrency_semaphore::inactive_read_handle: add operator bool()
2018-12-17 15:36:23 +00:00
Calle Wilund
e353a8633a network_topology_test: Add test to verify new algorithm results equal old
Transposed from origin unit test.

Creates a semi-random topology of racks, dcs, tokens and replication
factors and verifies endpoint calculation equals old algo.
2018-12-17 13:10:59 +00:00
Calle Wilund
bfc6c89b00 network_topology_strategy: Simplify calculate_natural_endpoints
Fixes #2896 (hopefully)

Implementation of origin change c000da13563907b99fe220a7c8bde3c1dec74ad5

Reduces the amount of maps and sets and the general complexity of
endpoint calculation by simply mapping DCs to expected node
counts, re-using endpoint sets, and iterating over them.

Tested with transposed origin unit test comparing old vs. new
algo results. (Next patch)
2018-12-17 13:10:59 +00:00
Botond Dénes
b4c3aab4a7 multishard_mutation_query: reset failed readers to inexistent state
When attempting to dismantle readers, some of the to-be-dismantled
readers might be in a failed state. The code waiting on the reader to
stop expects failures, but it didn't do anything besides
logging the failure and bumping a counter. Code in the lower layers did
not know how to deal with a failed reader and would trigger
`std::bad_variant_access` when trying to process (save or cleanup) it.
To prevent this, reset the state of failed readers to `inexistent_state`
so code in the lower layers doesn't attempt to further process them.
2018-12-17 13:18:08 +02:00
Botond Dénes
9cef043841 multishard_mutation_query: handle missing readers when dismantling
When dismantling the combined buffer and the compaction state we are no
longer guaranteed to have the reader each partition originated from. The
reader might have been evicted and not resumed, or resuming it might
have failed. In any case we can no longer assume the originating reader
of each partition will be present. If a reader no longer exists,
discard the partitions that it emitted.
2018-12-17 13:18:08 +02:00
Botond Dénes
438bef333b multishard_mutation_query: add support for keeping stats for discarded partitions
In the next patches we will add code that will have to discard some of
the dismantled partitions/fragments/bytes. Prepare the
`dismantle_buffer_stats` struct for being able to track the discarded
partitions/fragments/bytes in addition to those that were successfully
dismantled.
2018-12-17 13:18:08 +02:00
Botond Dénes
ce52436af4 multishard_mutation_query: expect evicted reader state when creating reader
Previously readers were created once, so `make_remote_reader()` had a
validation ensuring that no attempt was made to create a reader more
than once. This validation was done by checking that the reader-state is
either `inexistent` or `successful_lookup`. However, with the
introduction of pausing shard readers, it is now possible that a reader
will have to be created and then re-created several times, and this
validation was not updated to expect that.
Update the validation so it also accepts the reader-state
`evicted`, the state the reader will be in if it was evicted while paused.
2018-12-17 13:18:08 +02:00
Botond Dénes
1effb1995b multishard_mutation_query: pretty-print the reader state in log messages 2018-12-17 13:18:08 +02:00
Botond Dénes
5780f2ce7a querier_cache: check that the query wasn't evicted during registering
The reader concurrency semaphore can evict the querier when it is
registered as an inactive read. Make the `querier_cache` aware of this
so that it doesn't continue to process the inserted querier when this
happens.
Also add a unit test for this.
2018-12-17 13:18:08 +02:00
Botond Dénes
e1d8237e6b reader_concurrency_semaphore: use the correct types in the constructor
Previously there was a type mismatch for `count` and `memory` between
the actual type used to store them in the class (signed) and the type
of the parameters in the constructor (unsigned).
Although negative numbers are completely valid for these members,
initializing them to negative numbers doesn't make sense, which is why
the constructor used unsigned types. This restriction can backfire,
however, when someone intends to give these parameters the maximum
possible value, which, when interpreted as a signed value, will be `-1`.
What's worse, the caller might not even be aware of this
unsigned-to-signed conversion and be very surprised when they find out.
So, to prevent surprises, expose the real type of these members, trusting
the clients to know what they are doing.

Also add a `no_limits` constructor, so clients don't have to make sure
they don't overflow internal types.
2018-12-17 13:18:08 +02:00
Botond Dénes
dfd649a6b4 reader_concurrency_semaphore: add consume_resources() 2018-12-17 13:18:08 +02:00
Botond Dénes
21b44adbfe reader_concurrency_semaphore::inactive_read_handle: add operator bool() 2018-12-17 13:18:08 +02:00
Amnon Heiman
571755e117 node-exporter.service: Update command line to fix service startup
The upgrade to node_exporter 0.17 in commit
09c2b8b48a ("node_exporter_install: switch
to node_exporter 0.17") caused the service to no longer start. It turns
out node_exporter broke backwards compatibility of the command line
between 0.15 and 0.16. Fix it up.

While fixing the command line, the flags for all the collectors that are
enabled by default were removed.

Fixes #3989

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
[ penberg@scylladb.com: edit commit message ]
Message-Id: <20181213114831.27216-1-amnon@scylladb.com>
2018-12-17 10:22:17 +02:00
Rafael Ávila de Espíndola
4de14e6143 Add tests on broken mc range tombstones.
This tests that we diagnose both two consecutive range starts and two
consecutive range ends.
Message-Id: <20181214212608.95452-1-espindola@scylladb.com>
2018-12-15 13:53:25 +01:00
Avi Kivity
b023e8b45d Merge " Extract MC sstable writer to a separate compilation unit" from Tomasz
"
The motivation is to keep code related to each format separate, to make it
easier to comprehend and reduce incremental compilation times.

Also reduces dependency on sstable writer code by removing writer bits from
sstables.hh.

The ka/la format writers are still left in sstables.cc; they could also be extracted.
"

* 'extract-sstable-writer-code' of github.com:tgrabiec/scylla:
  sstables: Make variadic write() not picked on substitution error
  sstables: Extract MC format writer to mc/writer.cc
  sstables: Extract maybe_add_summary_entry() out of components_writer
  sstables: Publish functions used by writers in writer.hh
  sstables: Move common write functions to writer.hh
  sstables: Extract sstable_writer_impl to a header
  sstables: Do not include writer.hh from sstables.hh
  sstables: mc: Extract bound_kind_m related stuff into mc/types.hh
  sstables: types: Extract sstable_enabled_features::all()
  sstables: Move components_writer to .cc
  tests: sstable_datafile_test: Avoid dependency on components_writer
2018-12-14 15:05:00 +02:00
Duarte Nunes
224821303c Merge 'Reduce the dependency on database.hh' from Botond
"
Working on database.hh, or any header that is included in database.hh
(of which there are many), is a major pain, as each change involves the
recompilation of half of our compilation units.
Reduce the impact by removing the `#include "database.hh"` directive
from as many header files as possible. Many headers can make do with
just some forward declarations and don't need to include the entire
headers. I also found some headers that included database.hh without
actually needing it.

Results

Before:
    $ touch database.hh
    $ ninja build/release/scylla
    [1/154] CXX build/release/gen/cql3/CqlParser.o

After:
    $ touch database.hh
    $ ninja build/release/scylla
    [1/107] CXX build/release/gen/cql3/CqlParser.o
"

* 'reduce-dependencies-on-database-hh/v2' of https://github.com/denesb/scylla:
  treewide: remove include database.hh from headers where possible
  database_fwd.hh: add keyspace fwd declaration
  service/client_state: de-inline set_keyspace()
  Move cache_temperature into its own header
2018-12-14 12:24:48 +00:00
Piotr Sarna
63bd43e57e cql3: refuse to create an index on a static column
Secondary indexes on static columns are not yet supported,
so creating such an index should return an appropriate error.

Fixes #3993
Message-Id: <700b0a71e80da52d2d5250edacc12626b55681fa.1544785127.git.sarna@scylladb.com>
2018-12-14 11:15:28 +00:00
Rafael Ávila de Espíndola
f48d54543f Use read_rows_flat to test broken sstables.
The previous code was using mp_row_consumer_k_l to be as close to the
tested code as possible.

Given that it is testing for an unhandled exception, there is probably
more value in moving it to a higher-level, easier-to-use API.

This patch changes it to use read_rows_flat().

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181210235016.41133-1-espindola@scylladb.com>
2018-12-14 10:14:28 +01:00
Botond Dénes
1865e5da41 treewide: remove include database.hh from headers where possible
Many headers don't really need to include database.hh, the include can
be replaced by forward declarations and/or including the actually needed
headers directly. Some headers don't need this include at all.

Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
2018-12-14 08:03:57 +02:00
Botond Dénes
efe2b2c75d database_fwd.hh: add keyspace fwd declaration 2018-12-14 08:03:57 +02:00
Tomasz Grabiec
245a0d953a tests: cql_test_env: Start the compaction manager
Broken in fee4d2e

Not doing this results in compaction requests being ignored.

One effect of this is that perf_fast_forward produces many sstables instead of one.

Refs #3984
Refs #3983

Message-Id: <1544719540-10178-1-git-send-email-tgrabiec@scylladb.com>
2018-12-13 18:58:50 +02:00
Piotr Sarna
6743af5dbd cql3: refuse to create index on COMPACT STORAGE with ck
To follow C* compatibility, creating an index on a COMPACT STORAGE
table should be disallowed not only on base primary keys,
but also when the base table contains clustering keys.
Message-Id: <ab40c39730aff2e164d11ee5159ff62b8ec9e8e8.1544698186.git.sarna@scylladb.com>
2018-12-13 13:39:12 +00:00
Duarte Nunes
f8878238ed service/storage_proxy: Embed the expire timer in the response handler
Embedding the expire timer for a write response in the
abstract_write_response_handler simplifies the code as it allows
removing the rh_entry type.

It will also make the timeout easily accessible inside the handler,
for future patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181213111818.39983-1-duarte@scylladb.com>
2018-12-13 14:25:21 +02:00
Tomasz Grabiec
3889b05d7e Merge "Tests and small fixes for composite markers" from Rafael
* https://github.com/espindola/scylla espindola/add-composite-tests:
  Remove newline from exception messages.
  Fix end marker exception message.
  Add tests for broken start and end composite markers.
2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola
51fd880892 Add tests for broken start and end composite markers. 2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola
64439f6477 Fix end marker exception message.
The code tested the end marker, but the exception mentioned the start
marker.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-13 10:29:44 +01:00
Rafael Ávila de Espíndola
cfd07185b7 Remove newline from exception messages.
They are inconsistent with other uses of malformed_sstable_exception
and incompatible with adding " in sstable ..." to the message.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-13 10:29:44 +01:00
Vlad Zolotarov
7da1ac2c2c large_partition_handler: fix the message
We currently detect large partitions - not rows. So this is what we
should be reporting.

Fixes #3986

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181212215506.9879-1-vladz@scylladb.com>
2018-12-13 00:11:27 +00:00
Rafael Ávila de Espíndola
894f07f912 Move default case out of two switches.
These switches are fully covered; having the default label disables
-Wswitch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181212160904.17341-1-espindola@scylladb.com>
2018-12-12 18:20:24 +01:00
Botond Dénes
10336c13fc service/client_state: de-inline set_keyspace() 2018-12-12 18:14:03 +02:00
Botond Dénes
76fe4ebc18 Move cache_temperature into its own header
Some headers need to include database.hh just because of
cache_temperature. Move it into its own header so these includes can be
removed.
2018-12-12 16:03:45 +02:00
Tomasz Grabiec
0a853b8866 sstables: index_reader: Avoid schema copy in advance_to()
Introduced in 7e15e43.

Exposed by perf_fast_forward:

  running: large-partition-skips on dataset large-part-ds1
  Testing scanning large partition with skips.
  Reads whole range interleaving reads with skips according to read-skip pattern:
  read    skip      time (s)     frags     frag/s (...)
  1       0         5.268780   8000000    1518378

  1       1        31.695985   4000000     126199
Message-Id: <1544614272-21970-1-git-send-email-tgrabiec@scylladb.com>
2018-12-12 11:33:46 +00:00
Tomasz Grabiec
ff2ad2f6bb sstables: Make variadic write() not picked on substitution error
If write(v, out, x) doesn't match any overload, the variadic write()
will be picked, with Rest = {}. The compiler will then print error
messages about being unable to find write(v, out), which totally
obscures the original cause of the mismatch.

Make it eligible only when there are at least two write() value
parameters, so that debugging compilation errors is actually possible.
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
a14633c6d0 sstables: Extract MC format writer to mc/writer.cc
This moves all MC-related writing code to mc/writer.cc:

  - m_format_write_helpers.hh is dropped
  - m_format_write_helpers_impl.hh is dropped
  - sstable_writer_m is moved out of sstables.cc

sstable_writer_m is renamed to sstables::mc::writer
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
2636e6b5ab sstables: Extract maybe_add_summary_entry() out of components_writer
So that it can be used from writer implementations, which don't have
access to the definition of the components_writer.
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
577e71478d sstables: Publish functions used by writers in writer.hh 2018-12-12 12:07:31 +01:00
Tomasz Grabiec
faf0ff1843 sstables: Move common write functions to writer.hh
They are common for sstable writers of different formats.

Note that writer.hh is supposed to be included only by writer
implementations, not writer users.
2018-12-12 12:07:31 +01:00
Tomasz Grabiec
3b4ccc85d0 sstables: Extract sstable_writer_impl to a header 2018-12-12 12:07:31 +01:00
Tomasz Grabiec
6e3c9c3e5e sstables: Do not include writer.hh from sstables.hh
It is only needed by writer implementations.
2018-12-12 12:07:05 +01:00
Tomasz Grabiec
bd7e9ad3ab sstables: mc: Extract bound_kind_m related stuff into mc/types.hh 2018-12-12 12:06:46 +01:00
Tomasz Grabiec
a4721b4d50 sstables: types: Extract sstable_enabled_features::all() 2018-12-12 12:06:45 +01:00
Tomasz Grabiec
90074d0b75 sstables: Move components_writer to .cc 2018-12-12 12:06:45 +01:00
Tomasz Grabiec
eff47a59ee tests: sstable_datafile_test: Avoid dependency on components_writer
It's LA format specific and it's going to become private to sstable.cc
2018-12-12 12:06:22 +01:00
Avi Kivity
fa96e07e6b build: pass C compiler configuration in relocatable package build
Just like we allow customizing the C++ compiler, we should allow customizing
the C compiler.

Ref #3978
Message-Id: <20181211172821.30830-1-avi@scylladb.com>
2018-12-12 11:45:13 +01:00
Calle Wilund
707bff563e token_metadata: Add "get_location" ip to dc+rack accessor 2018-12-12 09:32:05 +00:00
Calle Wilund
66472bc52d sequenced_set: Add "insert" method, following std::set semantics 2018-12-12 09:32:05 +00:00
Asias He
b9e0db801d repair: Enable row level repair
Finally, enable the new row level repair if the cluster supports it. If
not, fall back to the old partition level repair.

Fixes #3033
2018-12-12 16:49:01 +08:00
Asias He
d372317e99 repair: Add row_level_repair
=== How the partition level repair works

- The repair master decides which ranges to work on.
- The repair master splits the ranges into sub-ranges, each containing
around 100 partitions.
- The repair master computes the checksum of the 100 partitions and asks the
related peers to compute the checksum of the same 100 partitions.
- If the checksums match, the data in this sub-range is synced.
- If the checksums mismatch, the repair master fetches the data from all the
peers and sends back the merged data to the peers.

=== Major problems with partition level repair

- A mismatch of a single row in any of the 100 partitions causes all 100
partitions to be transferred. A single partition can be very large, not to
mention the size of 100 partitions.

- Checksum (find the mismatch) and streaming (fix the mismatch) will read the
same data twice.

=== Row level repair

Row level checksum and synchronization: detect row level mismatch and transfer
only the mismatch

=== How the row level repair works

- To solve the problem of reading data twice

Read the data only once for both checksum and synchronization between nodes.

We work on a small range which contains only a few megabytes of rows.
We read all the rows within the small range into memory, find the
mismatches, and send the mismatched rows between peers.

We need to negotiate a sync boundary among the nodes such that the range
up to it contains only N bytes of rows.

- To solve the problem of sending unnecessary data.

We need to find the mismatched rows between nodes and only send the delta.
This is the set reconciliation problem, which is a common problem in
distributed systems.

For example:
Node1 has set1 = {row1, row2, row3}
Node2 has set2 = {      row2, row3}
Node3 has set3 = {row1, row2, row4}

To repair:
Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3.
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2
Node1 sends row3 (set1 + set2 + set3 - set3) to Node3.

=== How to implement repair with set reconciliation

- Step A: Negotiate sync boundary

class repair_sync_boundary {
    dht::decorated_key pk;
    position_in_partition position;
};

Each node reads rows from disk into row buffers until their size is
larger than N bytes, and returns the repair_sync_boundary of the last
mutation_fragment it read from disk. The smallest repair_sync_boundary
of all nodes is set as the current_sync_boundary.

- Step B: Get missing rows from peer nodes so that repair master contains all the rows

Request combined hashes from all nodes between last_sync_boundary and
current_sync_boundary. If the combined hashes from all nodes are identical,
the data is synced; go to Step A. If not, request the full hashes from the
peers.

At this point, the repair master knows exactly which rows are missing.
Request the missing rows from the peer nodes.

Now, local node contains all the rows.

- Step C: Send missing rows to the peer nodes

Since local node also knows what peer nodes own, it sends the missing rows to
the peer nodes.

=== What the RPC API looks like

- repair_range_start()

Step A:
- request_sync_boundary()

Step B:
- request_combined_row_hashes()
- request_full_row_hashes()
- request_row_diff()

Step C:
- send_row_diff()

- repair_range_stop()

=== Performance evaluation

We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We
created a keyspace with a replication factor of 3 and inserted 1 billion
rows into each of the 3 nodes. Each node has 241 GiB of data.
We tested the 3 cases below.

1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows.

Time to repair:
   old = 87 min
   new = 70 min (rebuild took 50 minutes)
   improvement = 19.54%

2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair:
   old = 43 min
   new = 24 min
   improvement = 44.18%

3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.

Time to repair:
   old: 211 min
   new: 44 min
   improvement: 79.15%

Bytes sent on wire for repair:
   old: tx= 162 GiB,  rx = 90 GiB
   new: tx= 1.15 GiB, rx = 0.57 GiB
   improvement: tx = 99.29%, rx = 99.36%

It is worth noting that row level repair sends and receives exactly the
number of rows needed in theory.

In this test case, the repair master needs to receive 2 million rows and
send 4 million rows. Here are the details: each node has 1 billion *
0.1% distinct rows, that is, 1 million rows. So the repair master
receives 1 million rows from repair slave 1 and 1 million rows from
repair slave 2. The repair master sends the 1 million rows originating
from the repair master plus the 1 million rows received from repair
slave 1 to repair slave 2, and sends the 1 million rows originating from
the repair master plus the 1 million rows received from repair slave 2
to repair slave 1.

In the result, we saw the rows on wire were as expected.

tx_row_nr  = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000
rx_row_nr  =  500233 + 500235 +  499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000

Fixes #3033
2018-12-12 16:49:01 +08:00
Asias He
b2b20cd5c0 repair: Add docs for row level repair 2018-12-12 16:49:01 +08:00
Asias He
fab31efae1 repair: Add repair_init_messaging_service_handler
This patch implements all the rpc handlers for row level repair.
2018-12-12 16:49:01 +08:00
Asias He
3c80727d51 repair: Add repair_meta
This patch introduces repair_meta class that is the core class for the
row level repair.

For each range to repair, repair_meta objects are created on both repair
master and repair slaves. It stores the metadata for the row level
repair algorithm, e.g., the current sync boundary, the buffer used to
hold the rows the peers are working on, the reader to read data from
sstables and the writer to write data to sstables.

This patch also implements the RPC verbs for row level repair, for
example, REPAIR_ROW_LEVEL_START/REPAIR_ROW_LEVEL_STOP to start/stop
row level repair for a range, REPAIR_GET_SYNC_BOUNDARY to get the sync
boundary peers want to work on, REPAIR_GET_ROW_DIFF to get missing rows
from repair slaves, and REPAIR_PUT_ROW_DIFF to push missing rows to
repair slaves.
2018-12-12 16:49:01 +08:00
Asias He
65099bac85 repair: Add repair_writer
repair_writer uses multishard_writer to apply the mutation_fragments to
sstables. The repair master needs one such writer for each of the repair
slaves. The repair slave needs one writer for the repair master.
2018-12-12 16:49:01 +08:00
Asias He
5b75f64e0e repair: Add repair_reader
repair_reader is used to read data from disk. It is simply a local
flat_mutation_reader reader for the repair master. It is more
complicated for the repair slave.

The repair slaves have to follow what the repair master reads from disk.

For example,

Assume repair master has 2 shards and repair slave has 3 shards
Repair master on shard 0 asks repair slave on shard 0 to read range [0,100).
Repair master on shard 1 asks repair slave on shard 1 to read range [0,100).

Repair master on shard 0 will only read the data that belongs to its
shard 0 within range [0,100). Since master and slave have different
shard counts, repair slave on shard 0 has to use the multishard reader
to collect data from all of its shards. It cannot pass range [0, 100) to
the multishard reader, otherwise it would read more data than the repair
master. Instead, the repair slave uses a sharder built from the sharding
configuration of the repair master to generate the sub-ranges that
belong to shard 0 of the repair master.

If the repair master and slave have the same sharding configuration, a
simple local reader is enough for the repair slave.
2018-12-12 16:49:01 +08:00
Asias He
27128d132d repair: Add repair_row
repair_row is the in-memory representation of a "row" that the row level
repair works on. It represents a mutation_fragment that is read from the
flat_mutation_reader. The hash of a repair_row is the combination of the
mutation_fragment hash and the partition_key hash.
2018-12-12 16:49:01 +08:00
Asias He
3e7b1d2ef4 repair: Add fragment_hasher
It is used to calculate the hash of a mutation_fragment.
2018-12-12 16:49:01 +08:00
Asias He
e135871e4a repair: Add decorated_key_with_hash
Represents a decorated_key and its hash, so that we do not need to
calculate the hash more than once if the decorated_key is used more than
once.
2018-12-12 16:49:01 +08:00
Asias He
16c1b26937 repair: Add get_random_seed
Get a random uint64_t number as the seed for the repair row hashing.
The seed is passed to xx_hasher.

We add randomization when hashing rows so that the next time we run
repair, the same row produces a different hash.
2018-12-12 16:49:01 +08:00
Asias He
54888ac52c repair: Add get_common_diff_detect_algorithm
It is used to find the common difference detection algorithms supported
by repair master and repair slaves.

It is up to repair master to choose what algorithm to use.
2018-12-12 16:49:01 +08:00
Asias He
0b294d5829 repair: Add shard_config
It is used to store the shard configuration.
2018-12-12 16:49:01 +08:00
Asias He
a36b0966cf repair: Add suportted_diff_detect_algorithms
It returns a vector of row level repair difference detection algorithms
supported by this node.

We are going to implement the "send_full_set" in the following patches.
2018-12-12 16:49:01 +08:00
Asias He
42f2cd8dc5 repair: Add repair_stats to repair_info
Also add update_statistics() to update current stats.
2018-12-12 16:49:01 +08:00
Asias He
43c04302f3 repair: Introduce repair_stats
It is used by row level repair to track repair statistics.
2018-12-12 16:49:01 +08:00
Asias He
0067d32b47 flat_mutation_reader: Add make_generating_reader
Move generating_reader from stream_session.cc to flat_mutation_reader.cc.
It will be used by repair code soon.

Also introduce a helper make_generating_reader to hide the
implementation of generating_reader.
2018-12-12 16:49:01 +08:00
Asias He
fe4afb1aa3 storage_service: Introduce ROW_LEVEL_REPAIR feature
With this feature enabled, the node supports row level repair.
2018-12-12 16:49:01 +08:00
Asias He
acc9ff8dce messaging_service: Add RPC verbs for row level repair
This patch adds the RPC verbs that are needed by the row level repair.
The usage of those verbs is shown in the following patches.

All the verbs for row level repair are sent by the repair master.
The repair master asks repair slaves to create repair meta objects,
a.k.a. repair_meta objects, to store the repair metadata needed by the
row level repair algorithm. A repair meta object is identified by the IP
address of the repair master and a uint32 number, repair_meta_id, chosen
by the repair master. When the repair master restarts or leaves the
cluster, repair slaves will detect it and remove all existing
repair_meta for that repair master. When a repair slave restarts, the
existing repair_meta on the slave will be gone.

The sync boundary used in the verbs is the position_in_partition of the
last mutation_fragment. In each repair round, peers work on
(last_sync_boundary, current_sync_boundary]
2018-12-12 16:49:01 +08:00
Asias He
8cfdcf435e repair: Export the repair logger
It will be used by the row level repair soon.
2018-12-12 16:49:01 +08:00
Asias He
e62aeae2db repair: Export repair_info
It will be used by the row level repair soon.
2018-12-12 16:49:01 +08:00
Asias He
6be3b35d52 repair: Export estimate_partitions
It will be used by row level repair soon.
2018-12-12 16:49:01 +08:00
Asias He
48341a2d4d idl: Add decorated_key support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1db4e3fd0a idl: Add row_level_diff_detect_algorithm
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
ccc706559f idl: Add get_sync_boundary_response
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1173d1dd5a idl: Add repair_sync_boundary
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
dc223e9216 idl: Add partition_key_and_mutation_fragments
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
5fbbc63676 idl: Add position_in_partition
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
e9fbc27740 idl: Add bound_weight
It will be used by the row level repair code.
2018-12-12 16:49:01 +08:00
Asias He
3c39462397 idl: Add partition_region
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
e2b9840e24 idl: Add repair_hash
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1a0bc8acf1 repair: Add struct hash<node_repair_meta_id> for node_repair_meta_id 2018-12-12 16:49:01 +08:00
Asias He
28d090ffda repair: Add struct hash<repair_hash> for repair_hash 2018-12-12 16:49:01 +08:00
Asias He
ce70225b1c repair: Introduce row_level_diff_detect_algorithm
It specifies the algorithm that is used to find the row difference in
repair.
2018-12-12 16:49:01 +08:00
Asias He
e9251df478 repair: Introduce partition_key_and_mutation_fragments
Represent a partition_key and frozen_mutation_fragments within the
partition_key.
2018-12-12 16:49:01 +08:00
Asias He
5d5a1beaec repair: Introduce node_repair_meta_id
It uses an IP address and a repair_meta_id to identify a repair
instance started by the row level repair.
2018-12-12 16:49:01 +08:00
Asias He
edd72e10ac repair: Introduce get_sync_boundary_response
The return value of the REPAIR_GET_SYNC_BOUNDARY verb. It will be used
in the row level repair code soon.
2018-12-12 16:49:01 +08:00
Asias He
95b9a889cf repair: Introduce repair_hash
It represents the hash value of a repair row.
2018-12-12 16:49:01 +08:00
Asias He
3e86b7a646 repair: Introduce repair_sync_boundary
Represents the position of a mutation_fragment read from a flat mutation
reader. Repair nodes negotiate a small sub-range, identified by two
repair_sync_boundary values, to work on in each round.
2018-12-12 16:49:01 +08:00
Asias He
063dfcda26 messaging_service: Add constructor for msg_addr
Which takes the ip address and shard id.
2018-12-12 16:49:01 +08:00
Asias He
8cb3ea98d0 xx_hasher: Allow specifying seed
It will be used by row level repair.
2018-12-12 16:49:01 +08:00
Asias He
165d3053b1 position_in_partition: Add get_type, get_bound_weight and get_clustering_key_prefix
Needed by the RPC serialization code.
2018-12-12 16:49:01 +08:00
Asias He
4e55d22a8f position_in_partition: Switch _bound_weight to use enum
The _bound_weight in position_in_partition will be sent on the wire over
RPC. Make it an enum instead of an int.
2018-12-12 16:49:01 +08:00
Asias He
5bc109e1ee position_in_partition: Add bound_weight
It will be used to change _bound_weight to use enum instead of int8_t.
2018-12-12 16:49:01 +08:00
Asias He
05c663b932 position_in_partition: Use std::optional for clustering_key_prefix
The new row level repair code will access clustering_key_prefix and it
uses std::optional everywhere. Convert position_in_partition to use
std::optional.
2018-12-12 16:49:01 +08:00
Asias He
0b31d7059b position_in_partition: Make partition_region uint8_t
It will be sent over rpc. Make the type explicit.
2018-12-12 16:49:01 +08:00
Asias He
dfd206b3a3 serializer: Add std::optional support 2018-12-12 16:49:01 +08:00
Asias He
3eecdc670f serializer: Add std::list support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
b540df2819 serializer: Add std::unordered_set support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
1367c8c47e dht: Add make_partitioner
Given the name, the shard count, and the sharding_ignore_msb_bits, make
a partitioner.

It is used by row level repair.
2018-12-12 16:49:01 +08:00
Asias He
f1a914060b dht: Add constructor for decorated_key which takes token and partition_key
decorated_key(const dht::token& t, const partition_key& k)
2018-12-12 16:49:01 +08:00
Asias He
71c1681f6c storage_service: Notify NEW_NODE only when a node is new node
This is a backport of CASSANDRA-11038.

Before this, a restarted node would be reported as a new node with a
NEW_NODE cql notification.

To fix, only send the NEW_NODE notification when the node was not
already part of the cluster.

Fixes: #3979
Tests: pushed_notifications_test.py:TestPushedNotifications.restart_node_test
Message-Id: <453d750b98b5af510c4637db25b629f07dd90140.1544583244.git.asias@scylladb.com>
2018-12-12 07:33:49 +02:00
Juliana Oliveira
5eb76c9bc6 compress: add support for Cassandra's compression parameter
This patch adds compatibility with Cassandra's "chunk_size_in_kb" while
keeping Scylla's "chunk_size_kb" compression parameter.

Fixes #3669
Tests: unit (release)

v2: use variable instead of array
v3: fix commited files

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20181211215840.GA7379@shenzou.localdomain>
2018-12-11 23:33:27 +00:00
Nadav Har'El
a0379209e6 secondary indexes: fail attempts to create a CUSTOM INDEX
Cassandra supports a "CREATE CUSTOM INDEX" to create a secondary index
with a custom implementation. The only custom implementation that Cassandra
supports is SASI. But Scylla doesn't support this, or any other custom
index implementation. If a CREATE CUSTOM INDEX statement is used, we
shouldn't silently ignore the "CUSTOM" tag, we should generate an error.

This patch also includes a regression test that "CREATE CUSTOM INDEX"
statements with valid syntax fail (before this patch, they succeeded).

Fixes #3977

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181211224545.18349-2-nyh@scylladb.com>
2018-12-11 23:33:02 +00:00
Nadav Har'El
36db4fba23 Fix typo in error message
Interestingly, this typo was copied from the original Cassandra source
code :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181211224545.18349-1-nyh@scylladb.com>
2018-12-11 23:32:58 +00:00
Avi Kivity
5b08e91bdb tools: add SYS_PTRACE capability to dbuild
LeakSanitizer uses ptrace, and docker disables ptrace by default. Add it
back so tests pass.
Message-Id: <20181208112524.19229-1-avi@scylladb.com>
2018-12-11 19:09:12 +00:00
Avi Kivity
34a31a807d build: build libdeflate with user selected C compiler
If the user specified a C compiler, use it to build libdeflate.

Fixes #3978.
Message-Id: <20181211145604.14847-1-avi@scylladb.com>
2018-12-11 14:58:16 +00:00
Duarte Nunes
89ae3fbf11 db/system_distributed_keyspace: Create the schema with min_timestamp
Different nodes can concurrently create the distributed system
keyspace on boot, before the "if not exists" clause can take effect.

However, the resulting schema mutations will be different since
different nodes use different timestamps. This patch forces the
timestamps to be the same across all nodes, so we avoid some schema
mismatches.

This fixes a bug exposed by ca5dfdf, whereby the initialization of the
distributed system keyspace is done before waiting for schema
agreement. While waiting for schema agreement in
storage_service::join_token_ring(), the node still hasn't joined the
ring and schemas can't be pulled from it, so nodes can deadlock. A
similar situation can happen between a seed node and a non-seed node,
where the seed node progresses to a different "wait for schema
agreement" barrier, but still can't make progress because it can't
pull the schema from the non-seed node still trying to join the ring.

Finally, it is assumed that changes to the schema of the current
distributed system keyspace tables will be protected by a cluster
feature and a subsequent schema synchronization, such that all nodes
will be at a point where schemas can be transferred around.

Fixes #3976

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181211113407.20075-1-duarte@scylladb.com>
2018-12-11 13:35:48 +01:00
Paweł Dziepak
e3f53542c9 Merge "Optimize sstable writing of large partitions" from Tomasz
"
This series contains several optimizations of the MC format sstable writer, mainly:
  - Avoiding output_stream when serializing into memory (e.g. a row)
  - Faster serialization of primitive types when serializing into memory

I measured the improvement in throughput (frag/s) using perf_fast_forward for
datasets with a single large partition with many small rows:

  - 10% for a row with a single cell of 8 bytes
  - 10% for a row with a single cell of 100 bytes
  -  9% for a row with a single cell of 1000 bytes
  - 13% for a row with 6 cells of 100 bytes
"

* tag 'avoid-output-stream-in-sstable-writer-v2' of github.com:tgrabiec/scylla:
  bytes_ostream: Optimize writing of fixed-size types
  sstables: mc: Write temporary data to bytes_ostream rather than file_writer
  sstables: mc: Avoid double-serialization of a range tombstone marker
  sstables: file_writer: Generalize bytes& writer to accept bytes_view
  sstables: Templetize write() functions on the writer
  sstables: Turn m_format_write_helpers.cc into an impl header
  sstables: De-futurize file_writer
  bytes_ostream: Implement clear()
  bytes_ostream: Make initial chunk size configurable
2018-12-11 12:29:24 +00:00
Duarte Nunes
d66bd0100b Merge 'Simplify db::extensions' from Avi
"
Carry out simplifications of db::extensions: less magical types, de-inline
complex functions, and reduce #include dependencies

Tests: unit(release)
"

* tag 'extensions-simplify/v1' of https://github.com/avikivity/scylla:
  extensions: remove unneeded includes
  extensions: deinline extension accessors
  extensions: return concrete types from the extension accessors
  extensions: remove dependency on cql layer
2018-12-10 22:00:51 +00:00
Avi Kivity
b251183359 extensions: remove unneeded includes
<boost/any.hpp> is not used, and "schema.hh" can be replaced with forward
declarations.
2018-12-10 21:34:09 +02:00
Avi Kivity
119a83bf2f extensions: deinline extension accessors
Quite complex code that is not performance sensitive. Move it out of line.
2018-12-10 21:22:56 +02:00
Avi Kivity
e9f5641b64 extensions: return concrete types from the extension accessors
Returning "auto" makes it harder to understand what the function is returning,
and impossible to de-inline.

Return a vector of pointers instead. The caller should iterate immediately, in
any case, and since the previous return value was a range of references to const
unique_ptrs, nothing else could be done with it anyway.
2018-12-10 21:16:45 +02:00
Tomasz Grabiec
f206ef0038 bytes_ostream: Optimize writing of fixed-size types
Inlining write() allows the writing code to be optimized for
fixed-size types. In particular, memcpy() calls and loops will be
eliminated.

Saw 4% improvement in throughput in perf_fast_forward for tiny rows.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
5a35240d47 sstables: mc: Write temporary data to bytes_ostream rather than file_writer
Currently temporary data is serialized into a file_writer, because
that's what write() functions used to expect, which goes through an
output_stream, a data_sink, into an in-memory data sink implementation
which collects the temporary_buffers.

Going through those abstractions is relatively expensive if we don't
write much, because each time we begin to write after a flush() of the
file_writer the output stream has to allocate a new buffer, which
means a large allocation for a small amount of data.

We could avoid that and write into bytes_ostream directly, which will
keep its buffer across clear().

write() functions which are used both to write directly into the data
file and to a temporary arena were templatized to accept a Writer to
which both file_writer and bytes_ostream conform.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
c4003b3e79 sstables: mc: Avoid double-serialization of a range tombstone marker 2018-12-10 20:08:16 +01:00
Tomasz Grabiec
9edb9434e5 sstables: file_writer: Generalize bytes& writer to accept bytes_view
Note that bytes is implicitly convertible to bytes_view.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
fad4fba4bc sstables: Templetize write() functions on the writer
Will allow writing to both a file_writer, or an in-memory writer like
a bytes_ostream.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
f4016996d3 sstables: Turn m_format_write_helpers.cc into an impl header
I need to templatize functions defined in it and want to avoid
explicit instantiations.

There is only one compilation unit in which this is used
(sstables.cc). I think in the long term we should move all those
"helpers" into sstables/mc/writer.{cc,hh} together with their only
user, the sstable_writer_m class from sstables.cc.
2018-12-10 20:07:43 +01:00
Tomasz Grabiec
13999a4d09 sstables: De-futurize file_writer 2018-12-10 20:07:43 +01:00
Tomasz Grabiec
a1fb441df8 bytes_ostream: Implement clear() 2018-12-10 20:07:43 +01:00
Tomasz Grabiec
7cf5de3d9c bytes_ostream: Make initial chunk size configurable 2018-12-10 20:07:43 +01:00
Avi Kivity
8e05bcbe71 extensions: remove dependency on cql layer
The extensions class reaches into cql's property_definitions class to grab
a map<sstring, sstring> type. This generates a few unneeded dependencies.

Reduce dependencies by defining the map type ourselves; if cql's property_definitions
changes in an incompatible way, it will have to adapt, rather than the extensions
class.
2018-12-10 20:55:30 +02:00
Tomasz Grabiec
1dd2bf52ca Merge "Add a couple of tests of broken sstables" From Rafael
These are the current uninteresting cases I found when looking at
malformed_sstable_exception. The existing code is working, just not
being tested.

* https://github.com/espindola/scylla.git espindola/espindola/broken-sst:
  Add a broken sstable test.
  Add a test with mismatched schema.
2018-12-10 19:30:58 +01:00
Tomasz Grabiec
538e041f22 Merge "Remove some dependencies on db::config" from Avi
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.

Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that contains only
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).

In addition, some outright pointless inclusions of db/config.hh are
removed.

The result is somewhat shorter compile times, and fewer needless
recompiles.

* https://github.com/avikivity/scylla unconfig-1/v1:
  config: remove inclusions of db/config.hh from header files
  repair: remove unneeded config.hh inclusion
  batchlog_manager: remove dependency on db::config
  auth: remove permissions_cache dependency on db::config
  auth: remove auth::service dependency on db::config
  auth: remove unneeded db/config.hh includes
2018-12-10 14:53:14 +01:00
Benny Halevy
ef53ddf3ae scylla_io_setup: correct units in low space warning
GiB -> GB

Refs #2676

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181210092503.10344-1-bhalevy@scylladb.com>
2018-12-10 13:58:49 +02:00
Avi Kivity
475b151c97 Merge "Use utils::small_vector more in read path" from Paweł
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
an observation that memory allocation functions are pointed out by the
profiler as the ones where we spent most time and while they have a
large number of callers storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.

Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.

perf_simple_query -c4 --duration 60, medians:

       ./perf_before  ./perf_after  diff
 read      343086.80     360720.53  5.1%

Tests: unit(release, small_vector in debug)
"

* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
  partition_slice: use small_vector for column_ids
  mutation_fragment_merger: use small_vector
  auth: use small_vector in resource
  auth: avoid list-initialisation of vectors
  idl: serialiser: add serialiser for utils::small_vector
  idl: serialiser: deduplicate vector serialisers
  utils: introduce small_vector
  intrusive_set_external_comparator: make iterator nothrow move constructible
  mutation_fragment_merger: value-initialise iterator
2018-12-10 13:50:59 +02:00
Duarte Nunes
a42b2895c2 Merge branch 'gossip: Send node UP event to cql client after cql server is up' from Asias
"
This is a backport of CASSANDRA-8236.

Before this patch, scylla sends the node UP event to the cql client when
it sees a new node join the cluster, i.e., when a new node's status
becomes NORMAL. The problem is, at this time, the cql server might not
be ready yet. Once the client receives the UP event, it tries to
connect to the new node's cql port and fails.

To fix, a new application_state::RPC_READY is introduced: a new node sets
RPC_READY to false when it starts gossip in the very beginning and sets
RPC_READY to true when the cql server is ready.

RPC_READY is a bad name, but I think it is better to follow Cassandra.

Nodes with or without this patch are supposed to work together with no
problem.

Refs #3843
"

* 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev:
  storage_service: Use cql_ready facility
  storage_service: Handle application_state::RPC_READY
  storage_service: Add notify_cql_change
  storage_service: Add debug log in notify_joined
  storage_service: Add extra check in notify_joined
  storage_service: Add notify_joined
  storage_service: Add debug log in notify_up
  storage_service: Add extra check in notify_up
  storage_service: Add notify_up
  storage_service: Make notify_left log debug level
  storage_service: Introduce notify_left
  storage_service: Add debug log in notify_down
  storage_service: Introduce notify_down
  storage_service: Add set_cql_ready
  gossip: Add gossiper::is_cql_ready
  gms: Add endpoint_state::is_cql_ready
  gms: Add application_state::RPC_READY
  gms: Introduce cql_ready in versioned_value
2018-12-10 11:37:59 +00:00
Asias He
06dc9b8da0 storage_service: Use cql_ready facility
At this point the cql_ready facility is ready. To use it, advertise the
RPC_READY application state in the following cases:

- When a node boots, set it to false
- When cql server is ready, set it to true
- When cql server is down, set it to false
2018-12-10 19:20:20 +08:00
Asias He
4761b53035 storage_service: Handle application_state::RPC_READY 2018-12-10 19:20:20 +08:00
Asias He
0e64814206 storage_service: Add notify_cql_change
It is called when a RPC_READY gossip application state is received.
2018-12-10 19:20:20 +08:00
Asias He
a1bbd7bcc7 storage_service: Add debug log in notify_joined 2018-12-10 19:20:20 +08:00
Asias He
17d68cb408 storage_service: Add extra check in notify_joined
Do not send the node joined event if the node is not in NORMAL status;
NORMAL status means the node has officially joined the cluster.
2018-12-10 19:20:20 +08:00
Asias He
9abb15192f storage_service: Add notify_joined
Add a helper for node joined event.
2018-12-10 19:20:20 +08:00
Asias He
60c74431f7 storage_service: Add debug log in notify_up 2018-12-10 19:20:20 +08:00
Asias He
948d2b6c78 storage_service: Add extra check in notify_up
Do not send the up event if is_cql_ready is false, which means the cql
server is not ready yet or the node is down.
2018-12-10 19:20:20 +08:00
Asias He
48cd31dc1e storage_service: Add notify_up
Add a helper for node up event.
2018-12-10 19:20:20 +08:00
Asias He
03f9c3e7e5 storage_service: Make notify_left log debug level
Be consistent with other notification log.
2018-12-10 19:20:20 +08:00
Asias He
a5ec25f28b storage_service: Introduce notify_left
Add a helper for node left event.
2018-12-10 19:20:20 +08:00
Asias He
15d7fce902 storage_service: Add debug log in notify_down 2018-12-10 19:20:19 +08:00
Asias He
f18cb0654d storage_service: Introduce notify_down
Add a helper for node down event.
2018-12-10 19:20:19 +08:00
Asias He
2f3130b36f storage_service: Add set_cql_ready
It is used to set the RPC_READY status of this node so it can be
advertised by gossip.
2018-12-10 19:20:17 +08:00
Asias He
e07150166a gossip: Add gossiper::is_cql_ready
- A new scylla node always sends application_state::RPC_READY = false
when it boots and sends application_state::RPC_READY = true when the
cql server is up

- An old scylla node that does not support application_state::RPC_READY
never has it in the endpoint_state; we can only assume its cql server is
up, so we return true here if application_state::RPC_READY is not
present
2018-12-10 19:16:44 +08:00
Asias He
2737654c75 gms: Add endpoint_state::is_cql_ready
Return whether the endpoint_state has the RPC_READY application_state.
2018-12-10 19:16:44 +08:00
Asias He
67093324ad gms: Add application_state::RPC_READY
It is used to tell peer nodes that the cql server is ready and can
accept client requests.

Follow the same name which Cassandra uses.
2018-12-10 19:16:44 +08:00
Asias He
4ed2ef23e9 gms: Introduce cql_ready in versioned_value 2018-12-10 19:16:43 +08:00
Avi Kivity
7c7da0b462 sstables: fix overflow in clustering key blocks header bit access
_ck_blocks_header is a 64-bit variable, so the mask should be 64 bits too.
Otherwise, a shift in the range 32-63 will produce wrong results.

Fix by using a 64-bit mask.

Found by Fedora 29's ubsan.

Fixes #3973.
Message-Id: <20181209120549.21371-1-avi@scylladb.com>
2018-12-10 11:09:25 +00:00
Takuya ASADA
a2d0ebf4d9 dist/offline_installer/redhat: fix missing dependencies
The offline installer with Scylla 3.0 caused a dependency error on
CentOS; add the missing packages.

Fixes #3969

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181207020711.23055-1-syuu@scylladb.com>
2018-12-10 12:47:10 +02:00
Avi Kivity
904db433d9 Merge "Re-use commitlog segments" from Calle
"
Refs #3929

Enables re-use of commitlog segments.

First, ensure we never succeed in playing back a commitlog
segment whose name does not match the IDs in the actual
file data, by determining the expected id based on the file name.
This will also handle partially written re-used files, as
each chunk header's CRC is dependent on the ID, and will
fail once we hit any left-overs.

The second part renames files and puts them into a recycle list
instead of actually deleting them when finished.
Allocating new files will then prioritize this list
before creating a new file.

Note that since consumption and release of segments can
be somewhat unbalanced, this does not really guarantee
we will use recycled files in all cases where it
might be possible, simply because of timing. It does
however give a good chance of it.

We limit recycled files based on the max disk size
setting; thus, depending on timing, we can potentially
grow disk usage more than without recycling, but not
uncontrollably.

While all this theoretically might improve disk
writes in some cases, it is far from a magic bullet.
No real performance testing has been done yet, only
functional.
"

* 'calle/commitlog-reuse' of github.com:scylladb/seastar-dev:
  commitlog: Recycle used segments instead of delete + new file
  commitlog: Terminate all segments with a zero chunk
  commitlog_replay: Enforce file name based id matching
2018-12-10 11:15:02 +02:00
Calle Wilund
55f10ffc43 commitlog: Recycle used segments instead of delete + new file
Refs #3929

When deleting a segment, IFF we have not yet filled up all reserves,
instead of actually deleting the file, put it on a "recycle" list.
The next segment allocation will, instead of creating a new one, simply
rename the segment and reuse the file and its allocated space.

We rename the file twice: once when adding it to the recycle list, with
a special prefix so we don't mix up actual replayable segments with
these, and again when we actually re-use the file (also to ensure
consecutive names).

Note that we limit the amount of recyclables, so a really stressed
application which somehow fills up the replenish queue might
cause us to still drop the segments. We could skip this, but would risk
getting too many files on disk.

Replay should be safe, since all entries are guarded by CRC based
on the file ID (i.e. file name). Thus replaying a recycled segment
will simply cause a CRC error in the main header and be ignored (see
previous patch).

Segments that are fully synced will have terminating zero-header (see
previous patch) so we know when to stop processing a recycled file.
If a file is the result of a mid-write crash, we will generate a CRC
processing error as "normally" in this case, when hitting partially
written block or coming to an old/new chunk boundary.

v2:
* Sync dir on rename
* auto -> const sstring&
* Allow recycling files as long as we're within disk space limits

v3:
* Use special names for files waiting for reuse
2018-12-10 09:09:07 +00:00
Calle Wilund
b13b6ef6a0 commitlog: Terminate all segments with a zero chunk
Writes a final chunk header of zero to the file on close, to mark
end-of-segment.
This allows us to gracefully stop replay processing of a segment file
even if it was not zeroed from the beginning (maybe recycled - hint
hint).
2018-12-10 09:09:07 +00:00
Calle Wilund
b35af84599 commitlog_replay: Enforce file name based id matching
When reading the header chunk of a commitlog file, check the stored id
value against the id derived from the file name, and ignore if
mismatched. This is a prerequisite for re-using renamed commitlog files,
as we can then fail-fast should one such be left on disk, instead of
trying to replay it.

We also check said id via the CRC check for each chunk parsed. If we
find a chunk with a mismatched id, we will get a CRC error for the
chunk, and replay will terminate (albeit not gracefully).
2018-12-10 09:09:07 +00:00
Amnon Heiman
09c2b8b48a node_exporter_install: switch to node_exporter 0.17
The newer version of node_exporter comes with important bug fixes. This
is especially important because I3.metal is not supported with the older
version of node_exporter.

The dashboards can now support both the new and the old version of
node_exporter.

Fixes #3927

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181210085251.23312-1-amnon@scylladb.com>
2018-12-10 10:54:50 +02:00
Benny Halevy
bcb486b8b9 scylla_io_setup: io_tune should not run when there is less than 10GB of disk space
Fixes #2676

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181209174852.3620-1-bhalevy@scylladb.com>
2018-12-10 10:38:33 +02:00
Yibo Cai (Arm Technology China)
6717816a8d utils/gz: optimize crc_combine for arm64
Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1544418903-26290-1-git-send-email-yibo.cai@arm.com>
2018-12-10 10:31:08 +02:00
Avi Kivity
40677fae37 Merge "Compaction strategy aware major compaction" from Raphael
"
Make major compaction aware of compaction strategy, by using an
optimal approach which suits the strategy needs.

Refs #1431.
"

* 'compaction_strategy_aware_major_compaction_v2' of github.com:raphaelsc/scylla:
  tests: add test for compaction-strategy-aware major compaction
  compaction: implement major compaction heuristic for leveled strategy
  compaction: introduce notion of compaction-strategy-aware major compaction
2018-12-10 10:10:22 +02:00
Avi Kivity
d7c7949d43 auth: remove unneeded db/config.hh includes 2018-12-09 20:11:38 +02:00
Avi Kivity
37a681e46d auth: remove auth::service dependency on db::config
auth::service already has its own configuration and a function to create it
from db::config; just move it to the caller. This reduces dependencies on the
global db::config class.
2018-12-09 20:11:38 +02:00
Avi Kivity
77e6b7a155 auth: remove permissions_cache dependency on db::config
permissions_cache already has its own configuration and a function to create it
from db::config; just move it to the caller. This reduces dependencies on the
global db::config class.
2018-12-09 20:11:38 +02:00
Avi Kivity
89be47e291 batchlog_manager: remove dependency on db::config
Extract configuration into a new struct batchlog_manager_config and have the
callers populate it using db::config. This reduces dependencies on global objects.
2018-12-09 20:11:38 +02:00
Avi Kivity
85e9b0d78d repair: remove unneeded config.hh inclusion 2018-12-09 20:11:38 +02:00
Avi Kivity
864f55e745 config: remove inclusions of db/config.hh from header files
Instead, distribute those inclusions to .cc files that require them. This
reduces rebuilds when config.hh changes, and makes it easier to locate files
that need config disaggregation.
2018-12-09 20:11:38 +02:00
Amos Kong
09a3b11c2f scylla_setup: only ask for nic in interactive mode
Currently, scylla_setup still asks for the nic even when it is already
assigned on the command line.

Fixes #3908

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <6b867e17a5583c495c771a37d5fa1e8366b1d61b.1542337635.git.amos@scylladb.com>
2018-12-09 15:29:31 +02:00
Gleb Natapov
9fb79bf379 storage_proxy: fix crash during write timeout callback invocation
rh_entry address is captured inside timeout's callback lambda, so the
structure should not be moved after it is created. Change the code to
create rh_entry in-place instead of moving it into the map.

Fixes #3972.

Message-Id: <20181206164043.GN25283@scylladb.com>
2018-12-09 10:33:37 +02:00
Vladimir Krivopalov
6a5d8934a6 db: Enable SSTables 'mc' format by default.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <ab4394b98a520b87c986bea2ceef13d015688967.1544227350.git.vladimir@scylladb.com>
2018-12-08 11:07:38 +02:00
Tomasz Grabiec
b78d98a358 tests: perf_fast_forward: Fix result_collector::add() for multi-element results
The results vector should be populated vertically, not horizontally.

Responsible for assertion failure with --cache-enabled:

  void result_collector::add(test_result_vector): Assertion `rs.size() == results.size()' failed.

Introduced in 3fc78a25bf.
Message-Id: <1544105835-24530-2-git-send-email-tgrabiec@scylladb.com>
2018-12-07 12:44:32 +00:00
Tomasz Grabiec
10cde9ae50 tests: perf_fast_forward: Fix live_range not being initialized
Broken in 470552b7ab

Causes test failure when running with --cache-enabled
Message-Id: <1544105835-24530-1-git-send-email-tgrabiec@scylladb.com>
2018-12-07 12:38:01 +00:00
Tomasz Grabiec
bb24d378b2 Merge "Fixes for collecting stats in SST3 + more tests" from Vladimir
This patchset fixes several remaining issues found during thorough
testing of SSTables 3.x statistics and enriches ~30 unit tests with
statistics validation against Cassandra-generated golden copies.

* https://github.com/argenet/scylla/tree/projects/sstables-30/sst3-tests-statistics/v1:
  sstables: Enforce estimated_partitions in generate_summary() to be
    always positive.
  sstables: Don't enforce default max_local_deletion_time value for 'mc'
    files.
  sstables: Update TTL/local deletion stats for non-expiring and live
    liveness_info.
  sstables: Collect statistics when writing RT markers to SSTables 3.x.
  tests: Return sstable_assertions from validate_read() helper.
  tests: Introduce helper for validating stats metadata in SSTables 3.x
    tests.
  tests: Add stats metadata validation to test_write_static_row.
  tests: Add stats metadata validation to
    test_write_composite_partition_key.
  tests: Add stats metadata validation to
    test_write_composite_clustering_key.
  tests: Add stats metadata validation to test_write_wide_partitions.
  tests: Add stats metadata validation to write_ttled_row
  tests: Add stats metadata validation to write_ttled_column
  tests: Add stats metadata validation to write_deleted_column
  tests: Add stats metadata validation to write_deleted_row
  tests: Add stats metadata validation to write_collection_wide_update
  tests: Add stats metadata validation to
    write_collection_incremental_update
  tests: Add stats metadata validation to write_multiple_partitions
  tests: Add stats metadata validation to write_multiple_rows
  tests: Add stats metadata validation to
    write_missing_columns_large_set
  tests: Add stats metadata validation to write_different_types
  tests: Add stats metadata validation to write_empty_clustering_values
  tests: Add stats metadata validation to write_large_clustering_key
  tests: Add stats metadata validation to write_compact_table
  tests: Add stats metadata validation to write_user_defined_type_table
  tests: Add stats metadata validation to write_simple_range_tombstone
  tests: Add stats metadata validation to
    write_adjacent_range_tombstones
  tests: Add stats metadata validation to
    write_non_adjacent_range_tombstones
  tests: Add stats metadata validation to
    write_mixed_rows_and_range_tombstones
  tests: Add stats metadata validation to
    write_adjacent_range_tombstones_with_rows
  tests: Add stats metadata validation to
    write_range_tombstone_same_start_with_row
  tests: Add stats metadata validation to
    write_range_tombstone_same_end_with_row
  tests: Add stats metadata validation to
    write_two_non_adjacent_range_tombstones
  tests: Delete unused (bogus) Statistics.db file from write_ SST3
    tests.
2018-12-07 12:05:55 +01:00
Vladimir Krivopalov
98ae39f920 tests: Delete unused (bogus) Statistics.db file from write_ SST3 tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
dcd639b4d5 tests: Add stats metadata validation to write_two_non_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
d07ab3b3ef tests: Add stats metadata validation to write_range_tombstone_same_end_with_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
b856cf837e tests: Add stats metadata validation to write_range_tombstone_same_start_with_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
ba24572fb6 tests: Add stats metadata validation to write_adjacent_range_tombstones_with_rows
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
4167c9e51d tests: Add stats metadata validation to write_mixed_rows_and_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
fd1c9b84c6 tests: Add stats metadata validation to write_non_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
1a6d613654 tests: Add stats metadata validation to write_adjacent_range_tombstones
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
57d2d1a1c6 tests: Add stats metadata validation to write_simple_range_tombstone
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
bc5d5633dc tests: Add stats metadata validation to write_user_defined_type_table
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
d9f2829ca0 tests: Add stats metadata validation to write_compact_table
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
3a1e287c6a tests: Add stats metadata validation to write_large_clustering_key
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
722fc7222a tests: Add stats metadata validation to write_empty_clustering_values
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
1367243b7e tests: Add stats metadata validation to write_different_types
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
12b10c0cca tests: Add stats metadata validation to write_missing_columns_large_set
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
c990c518fc tests: Add stats metadata validation to write_multiple_rows
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
9bb46f7cc6 tests: Add stats metadata validation to write_multiple_partitions
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
99d3cbd2fc tests: Add stats metadata validation to write_collection_incremental_update
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
0118b15c06 tests: Add stats metadata validation to write_collection_wide_update
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
85782ed729 tests: Add stats metadata validation to write_deleted_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
66913adcc6 tests: Add stats metadata validation to write_deleted_column
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
997101f105 tests: Add stats metadata validation to write_ttled_column
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
a018388049 tests: Add stats metadata validation to write_ttled_row
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
260dfb3492 tests: Add stats metadata validation to test_write_wide_partitions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
349a73c464 tests: Add stats metadata validation to test_write_composite_clustering_key.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
4f14e65d70 tests: Add stats metadata validation to test_write_composite_partition_key.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
a7b85e8009 tests: Add stats metadata validation to test_write_static_row.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
ccb2dec22b tests: Introduce helper for validating stats metadata in SSTables 3.x tests.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
5f6240cd7d tests: Return sstable_assertions from validate_read() helper.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
cc12449646 sstables: Collect statistics when writing RT markers to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Vladimir Krivopalov
2e5c221865 sstables: Update TTL/local deletion stats for non-expiring and live liveness_info.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 16:40:27 -08:00
Rafael Ávila de Espíndola
298873d33b Add a test with mismatched schema.
The sstable in the test is fine, but the schema thinks a static column
is regular.
2018-12-06 15:38:01 -08:00
Rafael Ávila de Espíndola
d392bc4924 Add a broken sstable test.
This sstable has a static column with clustering information.
2018-12-06 15:23:33 -08:00
Raphael S. Carvalho
1ddbbe51e6 tests: add test for compaction-strategy-aware major compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-12-06 18:37:16 -02:00
Raphael S. Carvalho
525ee18560 compaction: implement major compaction heuristic for leveled strategy
Major compaction for leveled strategy will now create a run of
non-overlapping sstables at the highest level. Until now, a single
sstable would be created at level 0 which was very suboptimal because
all data would need to climb up the levels again, making it a very
expensive I/O process.

Refs #1431.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-12-06 18:22:31 -02:00
Raphael S. Carvalho
3d9566e40d compaction: introduce notion of compaction-strategy-aware major compaction
That's only the very first step, which introduces the machinery for making
major compaction aware of all strategies. For the time being, a default
implementation is used for them all, which only suits size-tiered.

Refs #1431.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-12-06 18:22:30 -02:00
Vladimir Krivopalov
d2dfa2e15d sstables: Don't enforce default max_local_deletion_time value for 'mc' files.
Commit cc6c383249 has fixed an issue with
incorrectly tracking max_local_deletion_time and the check in
validate_max_local_deletion_time was called to work around old files.

This fix relaxes the conditions for enforcing the default max_local_deletion_time
so that they don't apply to SSTables in 'mc' format, because the original
problem was resolved before the 'mc' format was introduced.

This is needed to be able to read correct values from
Cassandra-generated SSTables that don't have a Scylla.db component.
Its presence or absence is used as an indicator of possibly affected
files.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 10:15:07 -08:00
Vladimir Krivopalov
0b1e6427ad sstables: Enforce estimated_partitions in generate_summary() to be always positive.
For tiny index files (< 8 bytes long) it could become zero and trigger
an assertion in prepare_summary().

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-06 10:15:07 -08:00
Raphael S. Carvalho
ffb00d2118 storage_service: remove outdated comment on ongoing compaction interrupt
After commit 5e953b5e47, compaction manager will forcefully stop
ongoing compactions instead of waiting for them to finish.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181206142600.21354-1-raphaelsc@scylladb.com>
2018-12-06 15:43:42 +01:00
Tomasz Grabiec
6012a63660 Merge "Fix window during init where waiting for a feature can be ignored" from Avi
storage_service keeps a bunch of "feature" variables, indicating cluster-wide
supported features, and has the ability to wait until the entire cluster supports
a given feature.

The propagation of features depends on gossip, but gossip is initialized after
storage_service, so the current code late-initializes the features. However, that
means that whoever waits on a feature between storage_service initialization and
gossip initialization loses their wait entry. In #3952, we have proof that this
in fact happens.

Fix this by removing the circular dependency. We now store features in a new
service, feature_service, that is started before both gossip and storage_service.
Gossip updates feature_service while storage_service reads from it.

Fixes #3953.

* https://github.com/avikivity/3953/v4.1:
  storage_service: deinline enable_all_features()
  gossiper: keep features registered
  tests/gossip: switch to seastar::thread
  storage_service: deinline init/deinit functions
  gossiper: split feature storage into a new feature_service
  gossiper: maybe enable features after start_gossiping()
  storage_service: fix gap when feature::when_enabled() doesn't work
2018-12-06 15:42:26 +01:00
Avi Kivity
33a0366ed8 storage_service: fix gap when feature::when_enabled() doesn't work
storage_service::register_features() reassigns to feature variables in
storage_service. This means that any call to feature::when_enabled() will be
orphaned when the feature is assigned.

Now that feature lifetimes are not tied to gossip, we can move the feature
initialization to the constructor and eliminate the gap. When gossip is started
it will evaluate application_states and enable features that the cluster agrees on.
2018-12-06 16:31:05 +02:00
Avi Kivity
587fd9b6c0 gossiper: maybe enable features after start_gossiping()
Since we may now start with features already registered, we need to enable
features immediately after gossip is started. This case happens in a cluster
that is already fully upgraded on startup. Before this series, features were
only added after this point.
2018-12-06 16:31:04 +02:00
Avi Kivity
4e553b692e gossiper: split feature storage into a new feature_service
Feature lifetime is tied to storage_service lifetime, but features are now managed
by gossip. To avoid circular dependency, add a new feature_service service to manage
feature lifetime.

To work around the problem, the current code re-initializes features after
gossip is initialized. This patch does not fix this problem; it only makes it
possible to solve it by untying features from gossip.
2018-12-06 16:31:04 +02:00
Avi Kivity
9b476fc377 storage_service: deinline init/deinit functions
Reduces #include dependencies later on.
2018-12-06 16:31:04 +02:00
Avi Kivity
db72a7e8bd tests/gossip: switch to seastar::thread
Much simpler to manage the long initialization chain.
2018-12-06 16:31:04 +02:00
Avi Kivity
1215512e98 gossiper: keep features registered
Gossiper unregisters enabled features as an optimization. However that makes
decoupling features from gossiper harder. Disable this optimization; since the
number of features is small and normal access is to a single feature at a time,
there is no significant performance or memory loss.
2018-12-06 16:31:04 +02:00
Paweł Dziepak
9024187222 partition_slice: use small_vector for column_ids 2018-12-06 14:21:04 +00:00
Paweł Dziepak
a014367c5b mutation_fragment_merger: use small_vector 2018-12-06 14:21:04 +00:00
Paweł Dziepak
142c4a9d84 auth: use small_vector in resource 2018-12-06 14:21:04 +00:00
Paweł Dziepak
edbcac85cb auth: avoid list-initialisation of vectors
List-initialisation forces often completely unnecessary copies of the
elements.
2018-12-06 14:21:04 +00:00
Paweł Dziepak
890a5ba8ac idl: serialiser: add serialiser for utils::small_vector 2018-12-06 14:21:04 +00:00
Paweł Dziepak
abb4953209 idl: serialiser: deduplicate vector serialisers
In Scylla we have three implementations of vector-like structures
std::vector, utils::chunked_vector and utils::small_vector. Which one is
used is largely an implementation detail and all should be serialised
by the IDL infrastructure in exactly the same way. To make sure that
it's indeed the case let's make them share the serialiser
implementation.
2018-12-06 14:21:04 +00:00
Paweł Dziepak
23d19d21bd utils: introduce small_vector
small_vector is a variation of std::vector<> that reserves a configurable
amount of storage internally, without the need for memory allocation.
This can bring measurable gains if the expected number of elements is
small. The drawback is that moving such a small_vector is more expensive
and invalidates iterators as well as references, which disqualifies it in
some cases.
2018-12-06 14:21:04 +00:00
Avi Kivity
21b4b2b9a1 Merge "Fix deadlocking multishard readers" from Botond
"
Multishard combining readers, running concurrently, with limited
concurrency and no timeout may deadlock, due to inactive shard readers
sitting on permits. To avoid this we have to make sure that all shard
readers belonging to a multishard combining readers, that are not
currently active, can be evicted to free up their permits, ensuring that
all readers can make progress.
Making inactive shard readers evictable is the solution to this
problem; however, the original series introducing this solution
(414b14a6bd) did not go all the way and
left some loose ends. These loose ends are tied up by this mini-series.
Namely, two issues remained:
* The last reader to reach EOS was not paused (made evictable).
* Readers created/resumed as part of a read-ahead were not paused
  immediately after finishing the read-ahead.

This series fixes both of these.

Fixes: #3865
Tests: unit(release, debug)
"

* 'fix-multishard-reader-deadlock/v1' of https://github.com/denesb/scylla:
  multishard_combining_reader: pause readers after reading ahead
  multishard_combining_reader: pause *all* EOS'd readers
2018-12-06 16:08:11 +02:00
Botond Dénes
ee193f1ab4 multishard_combining_reader: pause readers after reading ahead
Readers created or resumed just to read ahead should be paused right
after, to avoid consuming all available permits on the shards they
operate on, causing a deadlock.
2018-12-06 13:20:30 +02:00
Avi Kivity
d4f353d3c8 Merge "normalized python3 compatibility, shebang and encoding" from Alexys
"
This series of patches ensures that all the Python code base is python3 compliant
and consistent by applying the following logic:

- python3 classifier on setup.py to explicitly state our python compatibility matrix
- add UTF-8 encoding header
- correct every shebang to the same /usr/bin/env python3
- shebang is only added on scripts meant to be executed on their own (removed otherwise)
- migrate some leftover scripts from python2 to python3 with minimal QA

This work is important to prepare for a more drastic change on Python code styling
using the black formatter and the setting up of automated QA checks on Python code base.
"

* 'python3_everywhere' of https://github.com/numberly/scylla:
  scylla-housekeeping: fix python3 compat and shebang
  dist/ami/files/scylla_install_ami: python3 shebang
  dist/docker/redhat/docker-entrypoint.py: add encoding comment
  fix_system_distributed_tables.py: fix python3 compat and shebang
  gen_segmented_compress_params.py: add encoding comment
  idl-compiler.py: python3 shebang
  scylla-gdb.py: python3 shebang
  configure.py: python3 shebang
  tools/scyllatop/: add / normalize python3 shebang
  scripts/: add / normalize python3 shebang
  dist/common/scripts: add / normalize python3 shebang
  test.py: add encoding comment
  setup.py: add python3 classifiers
2018-12-06 12:16:57 +02:00
Avi Kivity
f073ea5f87 Merge "Fix tombstone histogram when writing SSTables 3.x" from Vladimir
"
This patchset extends a number of existing tests to check SSTables
statistics for 'mc' format and fixes an issue discovered with the help
of one of the tests.

Tests: unit {release}
"

* 'projects/sstables-30/check-stats/v2' of https://github.com/argenet/scylla:
  tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions.
  tests: Run sstable_tombstone_histogram_test for all SSTables versions.
  tests: Run min_max_clustering_key_test on all SSTables versions.
  tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions.
  tests: Run test_sstable_max_local_deletion_time on all SSTables versions.
  tests: Extend test checking tombstones histogram to cover all SSTables versions.
  sstables: Properly track row-level tombstones when writing SSTables 3.x.
  tests: Run min_max_clustering_key_test_2 for all SSTables versions.
  tests: Make reusable_sst() helper accept SSTables version parameter.
2018-12-06 11:44:33 +02:00
Botond Dénes
170fa382fa multishard_combining_reader: pause *all* EOS'd readers
Previously the last shard reader to reach EOS wasn't paused. This is a
mistake and can contribute to causing deadlocks when the number of
concurrently active readers on any shard is limited.
2018-12-06 10:30:43 +02:00
Vladimir Krivopalov
dd769f2b41 tests: Run sstable_timestamp_metadata_correcness_with_negative with all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
a098387e9f tests: Run sstable_tombstone_histogram_test for all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
06a47fc9f9 tests: Run min_max_clustering_key_test on all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
c53afd7bba tests: Expand test_sstable_max_local_deletion_time_2 to run for all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
cfbde5b89c tests: Run test_sstable_max_local_deletion_time on all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 15:29:28 -08:00
Vladimir Krivopalov
9955710cac tests: Extend test checking tombstones histogram to cover all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Vladimir Krivopalov
cdae62ec29 sstables: Properly track row-level tombstones when writing SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Vladimir Krivopalov
0f3fb32028 tests: Run min_max_clustering_key_test_2 for all SSTables versions.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Vladimir Krivopalov
c474b0d851 tests: Make reusable_sst() helper accept SSTables version parameter.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-12-05 12:36:22 -08:00
Paweł Dziepak
504c586392 intrusive_set_external_comparator: make iterator nothrow move constructible 2018-12-05 20:07:29 +00:00
Paweł Dziepak
402902ac78 mutation_fragment_merger: value-initialise iterator
ForwardIterators are default constructible, but they have to be
value-initialised to compare equal to other value-initialised instances
of that iterator.
2018-12-05 20:07:29 +00:00
Tomasz Grabiec
2c2d202354 tests: perf_fast_forward: Make output directory configurable
Message-Id: <1544020034-16340-1-git-send-email-tgrabiec@scylladb.com>
2018-12-05 21:51:01 +02:00
Tomasz Grabiec
247347058c tests: perf_fast_forward: Always print to stdout
Otherwise errors cannot be made sense of, since errors are always
reported to stdout. Without the test output we don't know what they're
referring to.

This change makes the output always go to stdout, in addition to the other
reporters, if any.
Message-Id: <1544020084-16492-1-git-send-email-tgrabiec@scylladb.com>
2018-12-05 21:51:01 +02:00
Yibo Cai (Arm Technology China)
6fadba56cc utils: optimize UTF-8 validation
UTF-8 string is now validated by boost::locale::conv::utf_to_utf, it
actually does string conversions which is more than necessary.  As
observed on Arm server, UTF-8 validation can become bottleneck under
heavy loads.

This patch introduces a brand new SIMD implementation supporting both
NEON and SSE, as well as a naive approach to handle short strings.
The naive approach is 3x faster than boost utf_to_utf, whilst the SIMD
method outperforms the naive approach by 3x~5x on Arm and x86. Details at
https://github.com/cyb70289/utf8/.

UTF-8 unit test is added to check various corner cases.

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>
2018-12-05 21:51:01 +02:00
Tomasz Grabiec
3e70ae1d06 Merge "Improve times to start / stop the nodes" from Glauber
If the compaction manager is started, compactions may start (this is
regardless of whether or not we trigger them). The problem with that is
that they start at a time in which we are flushing the commitlog and the
initialization procedure waits for the commitlog to be fully flushed and
the resulting memtables flushed before we move on.

Because there are no incoming writes, the amount of shares in memtable
flushes decrease as memory used decreases and that can cause the startup
procedure to take a long time.

We have recently started to bump the shares manually for manual flushes.
While that guarantees that we will not drive the shares to zero, I will
make the argument that we can do better by making sure that those things
are, at this point, running alone: user experience is affected by
startup times and the bump we give to user-triggered operations will
only do so much. Even if we increase the shares a lot flushes will still
be fighting for resources with compactions and startup will take longer
than it could.

By making sure that flushes are at this point running alone we improve the
user experience by making sure the startup is as fast as it can be.

There is a similar problem at the drain level, which is also fixed in this
series.

Fixes #3958

* git@github.com:glommer/scylla.git faster-restart
  compaction_manager: delay initialization of the compaction manager.
  drain: stop compactions early
2018-12-05 21:51:01 +02:00
Asias He
eeeb2da7bb gossip: Fix race in real_mark_alive and shutdown msg
In dtest, we have

   self.check_rows_on_node(node1, 2000)
   self.check_rows_on_node(node2, 2000)

which introduce the following cluster operations:

1) Initially:

- node1 up
- node2 up

2) self.check_rows_on_node(node1, 2000)
- node2 down
- node2 up (A: node2 will call gossiper::real_mark_alive when node2 boots
up to mark node1 up)

3) self.check_rows_on_node(node2, 2000)
- node1 down (B: node1 will send shutdown gossip message to node2, node2
will mark node1 down)
- node1 up (C: when node1 is up, node2 will call
gossiper::real_mark_alive)

Since there is no guarantee on the order of Operation A and Operation B, it
is possible that node2 will mark node1 as status=shutdown and mark node1 as
UP.

In Operation C, node2 will call gossiper::real_mark_alive to mark node1
up, but since node2 might think node1 is already up, node2 will exit
early in gossiper::real_mark_alive and not log "InetAddress 127.0.0.1 is
now UP, status={}"

As a result, dtest fails to see node2 reports node1 is up when it boots
node1 and fail the test.

   TimeoutError: 23 Nov 2018 10:44:19 [node2] Missing: ['127.0.0.1.* now UP']

In the log we can see node1 marked as DOWN and UP almost at the same time on node2:

   INFO  2018-11-23 22:31:29,999 [shard 0] gossip - InetAddress 127.0.0.1 is now DOWN, status = shutdown
   INFO  2018-11-23 22:31:30,006 [shard 0] gossip - InetAddress 127.0.0.1 is now UP, status = shutdown

Fixes #3940

Tests: dtest with 20 consecutive successful runs
Message-Id: <996dc325cbcc3f94fc0b7569217aa65464eaaa1c.1543213511.git.asias@scylladb.com>
2018-12-05 21:51:01 +02:00
Tomasz Grabiec
edbef7400b configure.py: Always add a rule for building gen_crc_combine_table
Fixes a build failure when only the scylla binary was selected for
building like this:

  ./configure.py --with scylla

In this case the rule for gen_crc_combine_table was missing, but it is
needed to build crc_combine_table.o

Message-Id: <1544010138-21282-1-git-send-email-tgrabiec@scylladb.com>
2018-12-05 21:51:01 +02:00
Botond Dénes
77dbc7d09a querier: fix evict_one() and evict_all_for_table()
Both of these have the same problem. They remove the to-be-evicted
entries from `_entries` but they don't unregister the `entry` from the
`reader_concurrency_semaphore`. This results in the
`reader_concurrency_semaphore` being left with a dangling pointer to the
entries, which will trigger a segfault when it tries to evict the associated
inactive reads.

Also add a unit test for `evict_all_for_table()` to check that it works
properly (`evict_one()` is only used in tests, so no dedicated test for
it).

Fixes: #3962

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <57001857e3791c6385721b624d33b667ccda2e7d.1544010868.git.bdenes@scylladb.com>
2018-12-05 21:51:01 +02:00
Avi Kivity
0be554c337 storage_service: deinline enable_all_features()
Next commit wants to make it depend on config, which is best done out-of-line.
2018-12-05 17:30:42 +02:00
Asias He
a5d8b66f2c gossip: Make favor newly added node log debug level
It is not very useful for user to know this.

Message-Id: <6c2dfc522d6974adb97c34fbc1e3a0339d2d530c.1543997137.git.asias@scylladb.com>
2018-12-05 10:45:03 +02:00
Avi Kivity
b0cb69ec25 Merge "Make sstable reader fail on unknown column names in MC format" from Piotr
"
Before the reader was just ignoring such columns but this creates a risk of data loss.

Refs #2598
"

* 'haaawk/2598/v3' of github.com:scylladb/seastar-dev:
  sstables: Add test_sstable_reader_on_unknown_column
  sstables: Exception on sstable's column not present in schema
  sstables: store column name in column_translation::column_info
  sstables: Make test_dropped_column_handling test dropped columns
2018-12-05 10:43:29 +02:00
Takuya ASADA
9388f3d626 reloc: drop --jobs from build_deb.sh/build_rpm.sh scripts
Since we merged the relocatable package, build_deb.sh/build_rpm.sh only do
packaging using a prebuilt binary taken from the relocatable package and
won't compile anything.

So passing the --jobs option to build_deb.sh/build_rpm.sh becomes
meaningless; we can drop it.

Note that we can still specify the --jobs option to reloc/build_reloc.sh; it
runs "ninja-build -jN" to compile Scylla, then generates the relocatable package.

See #3956

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181204205652.25138-1-syuu@scylladb.com>
2018-12-04 21:00:51 +00:00
Glauber Costa
0b7818d2b9 drain: stop compactions early
drain suffers from the same problem as startup suffers now: memtables
are flushed as part of the drain routine, and because there are no
incoming writes, the shares the controller assigns to flushes go down over
time, slowing down the process of drain.

This patch reorders things so that we stop compactions first, and flush
later. It guarantees that when flush do happen it will have the full
bandwidth to work with.

There is a comment in the code saying we should stop compactions
forcefully instead of waiting for them to finish. I consider this
orthogonal to this patch therefore I am not touching this. Doing so will
make the drain operation even faster but can be done later. Even when we
do it, having the flushes proceed alone instead of during compactions
will make it faster.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-12-04 13:55:59 -05:00
Glauber Costa
fee4d2eb9b compaction_manager: delay initialization of the compaction manager.
If the compaction manager is started, compactions may start (this is
regardless of whether or not we trigger them). The problem with that is
that they start at a time in which we are flushing the commitlog and the
initialization procedure waits for the commitlog to be fully flushed and
the resulting memtables flushed before we move on.

Because there are no incoming writes, the amount of shares in memtable
flushes decrease as memory used decreases and that can cause the startup
procedure to take a long time.

We have recently started to bump the shares manually for manual flushes.
While that guarantees that we will not drive the shares to zero, I will
make the argument that we can do better by making sure that those things
are, at this point, running alone: user experience is affected by
startup times and the bump we give to user-triggered operations will
only do so much. Even if we increase the shares a lot flushes will still
be fighting for resources with compactions and startup will take longer
than it could.

By making sure that flushes are at this point running alone we improve the
user experience by making sure the startup is as fast as it can be.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-12-04 13:48:42 -05:00
Tomasz Grabiec
b8c405c019 Merge "Correct the usage of row ttl and add write-read test" from Piotr
Fixes the condition which determines whether a row ttl should be used for a cell
and adds a test that uses each generated mutation to populate mutation source
and then verifies that it can read back the same mutation.

* seastar-dev.git haaawk/sst3/write-read-test/v3:
  Fix use_row_ttl condition
  Add test_all_data_is_read_back
2018-12-04 19:47:28 +01:00
Tomasz Grabiec
9a4c00beb7 utils/gz: Fix compilation on non-x86 archs
gen_crc_combine_table is now executed on every build, so it should not
fail on unsupported archs. The generated file will not contain data,
but this is fine since it should not be used.

Another problem is that u32 and u64 aliases were not visible in the #else
branch in crc_combine.cc
Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com>
2018-12-04 18:17:27 +00:00
Piotr Jastrzebski
fed3b51abe Add test_all_data_is_read_back
This tests that a source after being populated with a mutation
returns exactly the same mutation when read.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-12-04 11:42:08 +01:00
Piotr Sarna
7b0a3fbf8a auth: add abort_source to waiting for schema agreement
When the auth service is requested to stop during bootstrap,
it might have still not reached schema agreement.
Currently, waiting for this agreement is done in an infinite loop,
without taking abort_source into account.
This patch introduces checking if abort was requested
and breaking the loop in such case, so auth service can terminate.

Tests:
unit (release)
dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test)
Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com>
2018-12-04 10:41:09 +00:00
Piotr Jastrzebski
75b99838fc Fix use_row_ttl condition
The previous condition was wrong and used the row ttl too often.

We also have to change test_dead_row_marker to compare the
resulting sstable with an sstable generated by Origin, not
by sstableupgrade.
This is because sstableupgrade automatically propagates the deleted row
marker information to the cells in that row.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-12-04 10:51:36 +01:00
Avi Kivity
c3e664eec2 Merge "Improve corrupt sstable reporting" from Rafael
"
This is a small step in fixing issue #2347. It is mostly tests and
testing infrastructure, but it does include a fix for a case where we
were missing the filename in the malformed_sstable_exception.
"

* 'espindola/sstable-corruption-v2' of https://github.com/espindola/scylla:
  Add a filename to a malformed_sstable_exception.
  Try to read the full sst in broken_sst.
  Convert tests to SEASTAR_THREAD_TEST_CASE.
  Check the exception message.
  Move some tests to broken_sstable_test.cc
2018-12-04 10:32:10 +02:00
Avi Kivity
414b14a6bd Merge "Make inactive shard readers evictable" from Botond
"
This series attempts to solve the regressions recently discovered in
performance of multi-partition range-scans. Namely that they:
* Flood the reader concurrency semaphore's queues, trampling other
  reads.
* Behave very badly when too many of them are running concurrently
  (thrashing).
* May deadlock if enough of them are running without a timeout.

The solution for these problems is to make inactive shard readers
evictable. This should address all three issues listed above, to varying
degrees:
* Shard readers will now not cling onto their permits for the entire
  duration of the scan, which might be a lot of time.
* Will be less affected by infinite concurrency (more than the node can
  handle), as each scan can now make progress by evicting inactive shard
  readers belonging to other scans.
* Will not deadlock at all.

In addition to the above fix, this series also bundles two further
improvements:
* Add a mechanism to `reader_concurrency_semaphore` to be notified of
  newly inserted evictables.
* General cleanups and fixes for `multishard_combining_reader` and
  `foreign_reader`.

I can unbundle these mini-series and send them separately, if the
maintainers so prefer, although considering that this series will have
to be backported to 3.0, I think this present form is better.

Fixes: #3835
"

* 'evictable-inactive-shard-readers/v7' of https://github.com/denesb/scylla: (27 commits)
  tests/multishard_mutation_query_test: test stateless query too
  tests/querier_cache: fail resource-based eviction test gracefully
  tests/querier_cache: simplify resource-based eviction test
  tests/mutation_reader_test: add test_multishard_combining_reader_next_partition
  tests/mutation_reader_test: restore indentation
  tests/mutation_reader_test: enrich pause-related multishard reader test
  multishard_combining_reader: use pause-resume API
  query::partition_slice: add clear_ranges() method
  position_in_partition: add region() accessor
  foreign_reader: add pause-resume API
  tests/mutation_reader_test: implement the pause-resume API
  query_mutations_on_all_shards(): implement pause-resume API
  make_multishard_streaming_reader(): implement the pause-resume API
  database: add accessors for user and streaming concurrency semaphores
  reader_lifecycle_policy: extend with a pause-resume API
  query_mutations_on_all_shards(): restore indentation
  query_mutations_on_all_shards(): simplify the state-machine
  multishard_combining_reader: use the reader lifecycle policy
  multishard_combining_reader: add reader lifecycle policy
  multishard_combining_reader: drop unnecessary `reader_promise` member
  ...
2018-12-04 10:22:35 +02:00
Botond Dénes
9de4f3a834 tests/multishard_mutation_query_test: test stateless query too
In `test_read_all`, do a stateless read as well, to ensure that that
path also works correctly.
2018-12-04 08:51:05 +02:00
Botond Dénes
6676ceba7f tests/querier_cache: fail resource-based eviction test gracefully
Currently when this test fails, resources are not released in the
correct order, which results in ASAN complaining about use-after-free
in debug builds. This is due to the BOOST_REQUIRE macro aborting the
test when the predicate fails, not allowing for correct destruction
order to take place.
To avoid this ugly failure, which adds noise and might send a developer
investigating the failure down the wrong path, use the milder
BOOST_CHECK family of test macros. These will allow the test to run
to completion even when the predicate fails, allowing for the correct
destruction of the resources.
2018-12-04 08:51:05 +02:00
Botond Dénes
93e41397f7 tests/querier_cache: simplify resource-based eviction test
Now that we have an accessor for all concurrency semaphores, we don't
need the tricks of creating a dummy keyspace to get them. Use the
accessors instead.
2018-12-04 08:51:05 +02:00
Botond Dénes
dcd2d116a3 tests/mutation_reader_test: add test_multishard_combining_reader_next_partition
Test the interaction of the multishard reader with the foreign reader
w.r.t. next_partition(). next_partition() is a special operation, as
its execution is deferred until the next cross-shard operation. Give it
some extra stress-testing.
2018-12-04 08:51:05 +02:00
Botond Dénes
20e994e526 tests/mutation_reader_test: restore indentation
Left over from the previous patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
a577ff97e9 tests/mutation_reader_test: enrich pause-related multishard reader test
Enrich the existing test_multishard_combining_reader_as_mutation_source
test case with delaying the pause/resume and eviction of paused
readers.
2018-12-04 08:51:05 +02:00
Botond Dénes
22b14d593b multishard_combining_reader: use pause-resume API
Refactor the multishard combining reader to make use of the new
pause-resume API to pause inactive shard readers.

Make the pause-resume API mandatory to implement, as by now all
existing clients have adopted it.
2018-12-04 08:51:05 +02:00
Botond Dénes
77b758707c query::partition_slice: add clear_ranges() method
Allows for clearing any custom partition ranges, effectively resetting
them to the default ones. Useful for code that needs to set several
different specific partition ranges, one after the other, but doesn't
want to remember the last key it set a range for to be able to clear the
previous range with `clear_range()`.
2018-12-04 08:51:05 +02:00
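The rationale for `clear_ranges()` can be sketched in isolation like this (a hypothetical slimmed-down slice type, not the real `query::partition_slice`):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical slimmed-down slice: holds per-partition custom row ranges.
struct slice {
    std::vector<std::pair<std::string, std::string>> ranges; // (key, range)

    void set_range(std::string key, std::string range) {
        ranges.emplace_back(std::move(key), std::move(range));
    }
    // Old way: the caller must remember the last key to clear its range.
    void clear_range(const std::string& key) {
        ranges.erase(std::remove_if(ranges.begin(), ranges.end(),
                         [&](const auto& r) { return r.first == key; }),
                     ranges.end());
    }
    // New way: reset all custom ranges to the defaults in one call,
    // without remembering which keys were set.
    void clear_ranges() { ranges.clear(); }
};
```

With `clear_ranges()`, code that sets several specific ranges one after the other no longer needs to track the last key it used.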
Botond Dénes
a594fd39ce position_in_partition: add region() accessor 2018-12-04 08:51:05 +02:00
Botond Dénes
9601d23e0d foreign_reader: add pause-resume API
Allow for pausing the reader and later resuming it. Pausing the reader
waits on the ongoing read ahead (if any), executes any pending
`next_partition()` calls and then detaches the shard reader's buffer.
The paused shard reader is returned to the client.
Resuming the reader consists of getting the previously detached reader
back, or one that has the same position as the old reader had.
This API allows for making the inactive shard readers of the
`multishard_combining_reader` evictable.
The API is private; it's only accessible to classes knowing the full
definition of the `foreign_reader` (which resides in a .cc file).
2018-12-04 08:51:05 +02:00
Botond Dénes
a12fae366d tests/mutation_reader_test: implement the pause-resume API 2018-12-04 08:51:05 +02:00
Botond Dénes
f334d3717f query_mutations_on_all_shards(): implement pause-resume API 2018-12-04 08:51:05 +02:00
Botond Dénes
72ed655ef0 make_multishard_streaming_reader(): implement the pause-resume API 2018-12-04 08:51:05 +02:00
Botond Dénes
bf0d1f4eea database: add accessors for user and streaming concurrency semaphores
These will soon be needed to register inactive user and streaming reads
with the respective semaphores.
2018-12-04 08:51:05 +02:00
Botond Dénes
5f67a065c6 reader_lifecycle_policy: extend with a pause-resume API
This API provides a way for the multishard reader to pause inactive shard
readers and later resume them when they are needed again. This allows
for these paused shard readers to be evicted when the node is under
pressure.
How the readers are made evictable while paused is up to the clients.

Using this API in the `multishard_combining_reader` and implementing it
in the clients will be done in the next patches.

Provide default implementation for the new virtual methods to facilitate
gradual adoption.
2018-12-04 08:51:05 +02:00
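The "default implementations to facilitate gradual adoption" pattern looks like this in isolation (hypothetical names, not the real `reader_lifecycle_policy`):

```cpp
#include <cassert>

// Sketch: a policy interface grows a pause/resume API, but existing
// implementations keep compiling because the new virtuals have defaults.
struct lifecycle_policy {
    virtual ~lifecycle_policy() = default;
    virtual int create_reader() = 0; // pre-existing, mandatory

    // Newly added customization points, with no-op defaults so that
    // clients can adopt them gradually.
    virtual void pause(int /*reader*/) {}
    virtual int resume() { return -1; } // -1: nothing was paused
};

// An old client: only implements the pre-existing method.
struct legacy_policy : lifecycle_policy {
    int create_reader() override { return 42; }
};

// A new client: also opts into pause/resume.
struct pausing_policy : lifecycle_policy {
    int _paused = -1;
    int create_reader() override { return 7; }
    void pause(int reader) override { _paused = reader; }
    int resume() override { int r = _paused; _paused = -1; return r; }
};
```

Old clients compile and run unchanged; new clients override the pause/resume hooks to make their readers evictable.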
Botond Dénes
6f0e0c4ed7 query_mutations_on_all_shards(): restore indentation
The previous patch added half-aligned lines to improve readability of
that patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
aa6083a75b query_mutations_on_all_shards(): simplify the state-machine
The `read_context`, which handles creating, saving and looking up the
shard readers, has to deal with its `destroy_reader()` method being
called at any time, even before some other method has finished its
work. For example, it is valid for a reader to be requested to be
destroyed even before the context finishes creating it.
This means that state transitions that take time can be interleaved
with another state transition request. To deal with this, the read
context uses `future_` states, states that mark an ongoing state
transition. This allows state transition requests that arrive in the
middle of another state transition to be attached as a continuation to
the ongoing transition, and to be executed after it finishes. This
however resulted in complex code that has to handle readers being in
all sorts of different states when the `save_readers()` method is
called.
To avoid all this complexity, exploit the fact that `destroy_reader()`
receives a future<> as its argument, which resolves when all previous
state transitions have finished. Use a gate to wait on all these futures
to resolve. This way we don't need all those transitional states,
instead in `save_readers()` we only need to wait on the gate to close.
Thus the number of states `save_readers()` has to consider drops
drastically.

This has the theoretical drawback of the process of saving the readers
having to wait on each of the readers to stop, but in practice the
process finishes when the last reader is saved anyway, so I don't expect
this to result in any slowdown.
2018-12-04 08:51:05 +02:00
Botond Dénes
007619de4c multishard_combining_reader: use the reader lifecycle policy
Refactor the multishard combining reader and its clients to use the
reader lifecycle policy introduced in the previous patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
0a616c899e multishard_combining_reader: add reader lifecycle policy
Currently `multishard_combining_reader` takes two functors, one for
creating the readers and optionally one for destroying them.
A bag of functors (`std::function`) however makes for a terrible
interface, and as we are about to add some more customization points,
it's time to use something more formal: policy based design, a
well-known design pattern.

As well as merging the job of the two functors into a single policy
class, also widen the area of responsibility of the policy to include
keeping alive any resource the shard readers might need on their home
shard. Implementing a proper reader cleanup is now not optional either.

This patch only adds the `reader_managing_policy` interface,
refactoring the multishard reader to use it will be done in the next patch.
2018-12-04 08:51:05 +02:00
Botond Dénes
301abaca07 multishard_combining_reader: drop unnecessary reader_promise member
The `reader_promise` member of the `shard_reader` was used to
synchronize a foreground request to create the underlying reader with an
ongoing background request with the same goal. This is however
unnecessary. The underlying reader is created in the background only as
part of a read ahead. In this case there is no need for an extra
synchronization point; the foreground reader-create request can just
wait for the read ahead to finish, for which there already exists a
means. Furthermore, foreground reader-create requests are always followed
by a `fill_buffer()` request, so by waiting on the read ahead we ensure
that the following `fill_buffer()` call will not block.
2018-12-04 08:51:05 +02:00
Botond Dénes
a73175fdbc multishard_combining_reader: drop tracking of pending next_partition calls
Shard readers used to track pending `next_partition()` calls that they
couldn't execute, because their underlying reader wasn't created yet.
These pending calls were then executed after the reader was created.
However, the only situation where a shard reader can receive a
`next_partition()` call before its underlying reader has been created is
when `next_partition()` is called on the multishard reader before a
single fragment is read. In this case we know we are at a partition
boundary and thus this call has no effect, therefore it is safe to
ignore it.
2018-12-04 08:51:05 +02:00
Botond Dénes
ab3e639c3b foreign_reader: use bool for pending_next_partition
Foreign reader doesn't execute `next_partition()` calls straight away,
when this would require interaction with the remote reader. Instead
these calls are "remembered" and executed on the next occasion the
foreign reader has to interact with the remote reader. This was
implemented with a counter that counts the number of pending
`next_partition()` calls.
However, when `next_partition()` is called multiple times without
interleaving calls to `operator()()` or `fast_forward_to()`, only the
first such call has an effect. Thus it doesn't make sense to count these
calls; it is enough to just set a flag if there was at least one such
call.
2018-12-04 08:51:05 +02:00
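The counter-to-flag simplification can be illustrated with a minimal, self-contained sketch (hypothetical reader type; the real `foreign_reader` defers the call across shards):

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch of the flag-based deferral described above (hypothetical names).
// Multiple next_partition() calls without an interleaving read collapse
// into a single deferred skip, so a bool is enough where a counter was
// used before.
class deferred_reader {
    std::vector<std::string> _partitions;
    size_t _pos = 0;
    bool _pending_next_partition = false; // was: a counter
public:
    explicit deferred_reader(std::vector<std::string> parts)
        : _partitions(std::move(parts)) {}

    // "Remember" the request instead of acting on it immediately.
    void next_partition() { _pending_next_partition = true; }

    // The pending skip is applied lazily, on the next interaction.
    const std::string& read() {
        if (_pending_next_partition) {
            _pending_next_partition = false;
            ++_pos;
        }
        return _partitions.at(_pos);
    }
};
```

Calling `next_partition()` twice in a row advances by only one partition, which is exactly why the counter carried no extra information.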
Botond Dénes
5a4fd1abab multishard_combining_reader: drop support for streamed_mutation fast-forwarding
It doesn't make sense for the multishard reader anyway, as it's only
used by the row-cache. We are about to introduce the pausing of inactive
shard readers, and it would require complex data structures and code
to maintain support for this feature that is not even used. So drop it.
2018-12-04 08:51:05 +02:00
Botond Dénes
b36733971b mutation_source_test: add option to skip intra-partition fast-forwarding tests
To allow for using this test suite for testing mutation sources that
don't support intra-partition fast-forwarding.
2018-12-04 08:51:05 +02:00
Botond Dénes
37f0117747 reader_concurrency_semaphore: refactor eviction mechanism
As we are about to add multiple sources of evictable readers, we need a
more scalable solution than a single functor being passed that opaquely
evicts a reader when called.
Add a generic way to register and unregister evictable (inactive)
readers to the semaphore. The readers are expected to be registered when
they become evictable and are expected to be unregistered when they
cease to become evictable. The semaphore might evict any reader that is
registered to it, when it sees fit.

This also solves the problem of notifying the semaphore when new readers
become evictable. Previously there was no such mechanism, and the
semaphore would only evict any such new readers when a new permit was
requested from it.
2018-12-04 08:51:00 +02:00
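The register/unregister mechanism described above can be sketched in isolation like this (hypothetical names and a plain `std::list`, not Scylla's actual types):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <utility>

// Minimal sketch of the eviction registry: inactive readers register
// themselves with the semaphore, which may evict any of them when it
// sees fit (e.g. when a new permit is requested under pressure).
class reader_semaphore {
    struct inactive_read {
        std::function<void()> evict; // tears down the reader's resources
    };
    std::list<inactive_read> _inactive;
public:
    using handle = std::list<inactive_read>::iterator;

    // Called when a reader becomes evictable.
    handle register_inactive(std::function<void()> evict) {
        return _inactive.insert(_inactive.end(),
                                inactive_read{std::move(evict)});
    }
    // Called when a reader ceases to be evictable (it is resumed).
    void unregister_inactive(handle h) { _inactive.erase(h); }

    // Under pressure: evict one registered reader, if any.
    bool try_evict_one() {
        if (_inactive.empty()) {
            return false;
        }
        auto ir = std::move(_inactive.front());
        _inactive.pop_front();
        ir.evict();
        return true;
    }
    size_t inactive_count() const { return _inactive.size(); }
};
```

This also captures the notification fix: registration itself tells the semaphore a new evictable reader exists, instead of the semaphore only discovering them on the next permit request.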
Rafael Ávila de Espíndola
21199a7a5c Add a filename to a malformed_sstable_exception.
It is reasonable for parse() to throw when it finds something wrong
with the format. This seems to be the best spot to add the filename
and rethrow.

Also add a testcase to make sure we keep handling this error
gracefully.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 13:50:23 -08:00
Rafael Ávila de Espíndola
a6e25e4bd0 Try to read the full sst in broken_sst.
With this patch we use data_consume_rows to try to read the entire
sstable. The patch also adds a test with a particular corruption that
would not be found without parsing the file.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 13:47:49 -08:00
Rafael Ávila de Espíndola
b1190c58ec Convert tests to SEASTAR_THREAD_TEST_CASE.
This will simplify future changes to broken_sst.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 13:26:06 -08:00
Rafael Ávila de Espíndola
e5c5afffc9 Check the exception message.
This makes the tests a bit more strict by also checking the message
returned by the what() function.

This shows that some of the tests are out of sync with which errors
they check for. I will hopefully fix this in another pass.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 12:31:53 -08:00
Rafael Ávila de Espíndola
f9d81bcd43 Move some tests to broken_sstable_test.cc
sstable_test.cc was already a bit too big and there is potential for
having a lot of tests about broken sstables.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2018-12-03 12:16:30 -08:00
Rafael Ávila de Espíndola
cf4dc38259 Simplify state machine loop.
These loops have the structure:

while (true) {
  switch (state) {
  case state1:
  ...
  break;
  case state2:
  if (...) { ... break; } else {... continue; }
  ...
  }
  break;
}

There are a couple of things I find a bit odd about that structure:

* The break refers to the switch, the continue to the loop.
* A while (true) loop always hits a break or a continue.

This patch uses early returns to simplify the logic to

while (true) {
  switch (state) {
  case state1:
  ...
  return;
  case state2:
  if (...) { ... return; }
  ...
  }
}

Now there are no breaks or continues.

Tests: unit (release)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181126171726.84629-1-espindola@scylladb.com>
2018-12-03 20:34:03 +01:00
Avi Kivity
b098b5b987 Merge "Optimize checksum_combine() for CRC32" from Tomek
"
zlib's crc32_combine() is not very efficient. It is faster to re-combine
the buffer using crc32(). It's still a substantial amount of work which
could be avoided.

This patch introduces a fast implementation of crc32_combine() which
uses a different algorithm than zlib. It also utilizes intrinsics for
carry-less multiplication instruction to perform the computation faster.
The details of the algorithm can be found in code comments.

Performance results using perf_checksum and second buffer of length 64 KiB:

zlib CRC32 combine:   38'851   ns
libdeflate CRC32:      4'797   ns
fast_crc32_combine():     11   ns

So the new implementation is 3500x faster than zlib's, and 417x faster than
re-checksumming the buffer using libdeflate.

Tested on i7-5960X CPU @ 3.00GHz

Performance was also evaluated using sstable writer benchmark:

  perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \
     --value-size=10000 --rows 1000000 --datasets small-part

It yielded 9% improvement in median frag/s (129'055 vs 117'977).

Refs #3874
"

* tag 'fast-crc32-combine-v2' of github.com:tgrabiec/scylla:
  tests: perf_checksum: Test fast_crc32_combine()
  tests: Rename libdeflate_test to checksum_utils_test
  tests: libdeflate: Add more tests for checksum_combine()
  tests: libdeflate: Check both libdeflate and default checksummers
  sstables: Use fast_crc_combine() in the default checksummer
  utils/gz: Add fast implementation of crc32_combine()
  utils/gz: Add pre-computed polynomials
  utils/gz: Import Barett reduction implementation from libdeflate
  utils: Extract clmul() from crc.hh
2018-12-03 19:02:01 +02:00
Tomasz Grabiec
aa19f98d18 sstables: Write Statistics.db offset map entries in the same order as Cassandra
Before this patch we were writing offset map entries in an unspecified
order, the one returned by std::unordered_map. Cassandra writes them
sorted by metadata_type. Use the same order for improved
compatibility.

Fixes #3955.

Message-Id: <1543846649-22861-1-git-send-email-tgrabiec@scylladb.com>
2018-12-03 16:40:24 +02:00
Avi Kivity
4dc402b53f Merge "Create sstable in a sub-directory" from Benny
"
Due to an XFS heuristic, if all files are in one (or a few) directories,
then block allocation can become very slow. This is because XFS divides
the disk into a few allocation groups (AGs), and each directory allocates
preferentially from a single AG. That AG can become filled long before
the disk is full.

This patchset works around the problem by:
- creating sstable component files in their own temporary, per-sstable sub-directory,
- moving the files back into the canonical location right after being created, and finally
- removing the temp sub-directory when the sstable is sealed.
- In addition, any temporary sub-directories that might have been left over if scylla
  crashed while creating sstables are looked up and removed when populating the table.

Fixes: #3167

Tests: unit (release)
"

* 'issues/3167/v7' of https://github.com/bhalevy/scylla:
  distributed_loader::populate_column_family: lookup and remove temp sstable directories
  database: directly use std::experimental::filesystem::path for lister::path
  database: use std::experimental::filesystem::path for lister::path
  sstable: use std::experimental::filesystem rather than boost
  sstable::seal_sstable: fixup indentation
  sstable: create sstable component files in a subdirectory
  sstable::new_sstable_component_file: pass component_type rather than filename
  sstable: cleanup filename related functions
  sstable: make write_crc, write_digest, and new_sstable_component_file private methods
2018-12-03 16:26:12 +02:00
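The create-in-a-subdirectory-then-rename dance can be sketched with `std::filesystem` (the series itself uses `std::experimental::filesystem`; directory and component names below are illustrative):

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Sketch of the workaround: create all component files in a temporary
// per-sstable subdirectory (so XFS picks a fresh allocation group for
// it), move each file to its canonical location right after creation,
// and remove the temp directory when the sstable is sealed.
void write_components(const fs::path& table_dir,
                      const std::vector<std::string>& components) {
    fs::path tmp = table_dir / "sstable-1-TMP"; // hypothetical naming
    fs::create_directories(tmp);
    for (const auto& name : components) {
        fs::path staged = tmp / name;
        std::ofstream(staged) << "component data"; // create in temp dir
        fs::rename(staged, table_dir / name);      // move to canonical spot
    }
    fs::remove(tmp); // "seal": drop the now-empty temp directory
}
```

On a crash, the leftover `*-TMP` directory is what the populate-time cleanup in the series looks for and removes.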
Tomasz Grabiec
feefb23232 tests: perf_checksum: Test fast_crc32_combine() 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
dda0f9b6eb tests: Rename libdeflate_test to checksum_utils_test 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
7febdb5a5c tests: libdeflate: Add more tests for checksum_combine() 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
b22ed75416 tests: libdeflate: Check both libdeflate and default checksummers 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
1eb03b6ff1 sstables: Use fast_crc_combine() in the default checksummer 2018-12-03 14:40:35 +01:00
Tomasz Grabiec
1fb792c547 utils/gz: Add fast implementation of crc32_combine()
zlib's crc32_combine() is not very efficient. It is faster to re-combine
the buffer using crc32(). It's still a substantial amount of work which
could be avoided.

This patch introduces a fast implementation of crc32_combine() which
uses a different algorithm than zlib. It also utilizes intrinsics for
carry-less multiplication instruction to perform the computation faster.
The details of the algorithm can be found in code comments.

Performance results using perf_checksum and second buffer of length 64 KiB:

zlib CRC32 combine:   38'851   ns
libdeflate CRC32:      4'797   ns
fast_crc32_combine():     11   ns

So the new implementation is 3500x faster than zlib's, and 417x faster than
re-checksumming the buffer using libdeflate.

Tested on i7-5960X CPU @ 3.00GHz

Performance was also evaluated using sstable writer benchmark:

  perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \
     --value-size=10000 --rows 1000000 --datasets small-part

It yielded 9% improvement in median frag/s (129'055 vs 117'977).
2018-12-03 14:40:35 +01:00
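The combining property being relied on can be demonstrated with a bit-at-a-time sketch (illustrative only: this combine feeds zero bytes in O(len2), whereas zlib uses matrix exponentiation and the patch uses carry-less multiplication; the names are not the patch's):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Standard (zlib-compatible) CRC-32, bit at a time, seedable so that
// crc32(b, crc32(a)) == crc32(a + b) -- "re-combining using crc32()".
uint32_t crc32(const std::string& data, uint32_t crc = 0) {
    crc = ~crc;
    for (unsigned char c : data) {
        crc ^= c;
        for (int k = 0; k < 8; ++k) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1)));
        }
    }
    return ~crc;
}

// combine(crc(a), crc(b), b.size()) == crc(a + b), computed without
// touching the data: shift crc(a) through len2 zero bytes (CRC is
// linear over GF(2)), then XOR in crc(b).
uint32_t crc32_combine(uint32_t crc1, uint32_t crc2, size_t len2) {
    while (len2--) {
        for (int k = 0; k < 8; ++k) {
            crc1 = (crc1 >> 1) ^ (0xEDB88320u & (0u - (crc1 & 1)));
        }
    }
    return crc1 ^ crc2;
}
```

The fast implementation replaces the zero-feeding loop with precomputed polynomial tables and `clmul`, making combine cost independent of the buffer length.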
Tomasz Grabiec
cd3d9d357b utils/gz: Add pre-computed polynomials
gen_crc_combine_table.cc will be run during build to produce tables
with precomputed polynomials (4 x 256 x u32). The definitions will
reside in:

  build/<mode>/gen/utils/gz/crc_combine_table.cc

It takes 20ms to generate on my machine.

The purpose of those polynomials will be explained in crc_combine.cc
2018-12-03 14:36:09 +01:00
Tomasz Grabiec
63e0da9e58 utils/gz: Import Barett reduction implementation from libdeflate 2018-12-03 14:36:09 +01:00
Tomasz Grabiec
bb7d95d6c3 utils: Extract clmul() from crc.hh 2018-12-03 14:36:08 +01:00
Botond Dénes
0cb7c43fb5 reader_concurrency_semaphore: add dedicated .cc file
As we are about to extend the functionality of the reader concurrency
semaphore, adding more method implementations that need to go to a .cc
file, it's time we create a dedicated file, instead of keeping on
shoving them into unrelated .cc files (mutation_reader.cc).
2018-12-03 13:37:02 +02:00
Avi Kivity
d6a22c50cb Update libdeflate submodule
* libdeflate 17ec6c9...e7e54ea (1):
  > build: improve out-of-tree build with multiple output trees
2018-12-03 11:18:02 +02:00
Botond Dénes
34c2d67614 reader_concurrency_semaphore: rearrange members
Use the standard convention of the rest of the code base: type
definitions first, then data members and finally member functions.
As we are about to add more members, it's especially important to make
the growing class have a familiar member arrangement.
2018-12-03 08:26:10 +02:00
Benny Halevy
9e7125a9de distributed_loader::populate_column_family: lookup and remove temp sstable directories
These may be left over in case we crash while writing sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
857ff4f59a database: directly use std::experimental::filesystem::path for lister::path
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
585ac6e641 database: use std::experimental::filesystem::path for lister::path
We would like to get rid of boost::filesystem and gradually replace it with
std::experimental::filesystem.

TODO: using namespace fs = std::experimental::filesystem,
use fs::path directly, rather than lister::path

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
0b74927757 sstable: use std::experimental::filesystem rather than boost
Note: Requires linking with -lstdc++fs

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
61d116a1f1 sstable::seal_sstable: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
90118fa9ef sstable: create sstable component files in a subdirectory
When writing the sstable, create a temporary directory
for creating all components, so that each sstable's files will be
assigned a different allocation group on XFS.

Files are immediately renamed to their default location after creation.
Temp directory is removed when the sstable is sealed.

Additional work to be introduced in the following patches:
- When populating tables, temp directories need to be looked up and removed.

Fixes #3167

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
23d8afb20d sstable::new_sstable_component_file: pass component_type rather than filename
So we can create the file in the sstable directory and then move it into the final location.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
7b170eb0dc sstable: cleanup filename related functions
- use const sstring& params rather than sstring
- returning const sstring is superfluous

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Benny Halevy
ad5f1e4fbb sstable: make write_crc, write_digest, and new_sstable_component_file private methods
Prepare for per-sstable sub directory.
Also, these functions get most of their parameters from the sstable at
hand, so they might as well be first-class members.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-12-02 22:02:10 +02:00
Avi Kivity
2a0a36d48b tools: update toolchain to fedora-29-20181202
Added: git, sudo, python
Message-Id: <20181202185608.14141-1-avi@scylladb.com>
2018-12-02 19:00:55 +00:00
Benny Halevy
d257e5c123 sstable: remove unused get_sstable_key_range
Since 024c8ef8a1
db: adjust sstable load to use sstable self-reporting of shard ownership

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181202114523.14296-1-bhalevy@scylladb.com>
2018-12-02 18:32:34 +02:00
Avi Kivity
224c4c0b81 tools: add frozen toolchain support
Add a reference to a docker image that contains an "official" toolchain
for building Scylla. In addition, add a script that allows easy usage of
the image, and some documentation.
Message-Id: <20181202120829.21218-1-avi@scylladb.com>
2018-12-02 18:32:34 +02:00
Takuya ASADA
0fdf807f51 install-dependencies.sh: add missing packages to run build in Fedora container
The git, python and sudo packages are installed by default on a normal
Fedora installation but not in the Docker image, so we need to install
them with this script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181201020834.24961-1-syuu@scylladb.com>
2018-12-02 12:51:29 +02:00
Avi Kivity
009cbd3dcb Merge "Fix multiple summary regeneration bugs." from Vladimir
"
This patchset addresses two recently discovered bugs, both triggered by
summary regeneration:

Tests: unit {release}

+

Validated with debug build of Scylla (ASAN) that no use-after-free
occurs when re-generating Summary.db.
"

* 'projects/sstables-30/summary-regeneration/v1' of https://github.com/argenet/scylla:
  tests: Add test reading SSTables in 'mc' format with missing summary.
  sstables: When loading, read statistics before summary.
  database: Capture io_priority_class by reference to avoid dangling ref.
2018-12-02 11:56:18 +02:00
Vladimir Krivopalov
d24875b736 tests: Add test reading SSTables in 'mc' format with missing summary.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-30 11:56:56 -08:00
Vladimir Krivopalov
b0e5404071 sstables: When loading, read statistics before summary.
In case the summary is missing and we attempt to re-generate it, the
statistics must already have been read, to provide us with the values
stored in the serialization header that facilitate deserialization of
clustering prefixes.

Fixes #3947

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-30 11:56:56 -08:00
Vladimir Krivopalov
68458148e7 database: Capture io_priority_class by reference to avoid dangling ref.
The original reference points to a thread-local storage object that is
guaranteed to outlive the continuation, but copying it makes the
subsequent calls point to a local object and introduces a use-after-free
bug.

Fixes #3948

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-30 10:43:36 -08:00
Piotr Jastrzebski
329303cae7 sstables: Add test_sstable_reader_on_unknown_column
This test checks that sstable reader throws an exception
when sstable contains a column that's not present in the schema.

It also checks that dropped columns do not cause exceptions.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-30 10:29:47 +01:00
Piotr Jastrzebski
5cc3f904ce sstables: Exception on sstable's column not present in schema
Previously such a column was ignored, but it's better to be explicit
about this situation.

Refs #2598

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-30 08:59:13 +01:00
Piotr Jastrzebski
c0ce94c6f9 sstables: store column name in column_translation::column_info
This will be used for better diagnostics.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-30 08:59:00 +01:00
Duarte Nunes
1afda28cf3 Merge 'Fix filtering with LIMIT' from Piotr
"
This series adds proper handling of filtering queries with LIMIT.
Previously the limit was erroneously applied before filtering,
which led to truncated results.

To avoid that, paged filtering queries now use an enhanced pager,
which remembers how many rows were dropped and uses that information
to fetch more pages if the limit is not yet reached.

For unpaged filtering queries, paging is done internally, as in the
case of aggregations, to avoid keeping huge results in memory.

Also, previously, all limited queries used a page size of
max(page size, limit). That's not good for filtering,
because with LIMIT 1 we would then query for rows one-by-one.
To avoid that, filtered queries ask for the whole page and the results
are truncated afterwards if need be.

Tests: unit (release)
"

* 'fix_filtering_with_limit_2' of https://github.com/psarna/scylla:
  tests: add filtering with LIMIT test
  tests: split filtering tests from cql_query_test
  cql3: add proper handling of filtering with LIMIT
  service/pager: use dropped_rows to adjust how many rows to read
  service/pager: virtualize max_rows_to_fetch function
  cql3: add counting dropped rows in filtering pager
2018-11-29 23:07:40 +00:00
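The limit-after-filtering idea from the cover letter can be sketched like this (hypothetical names, greatly simplified: rows are ints and the "replica" is a vector, while the real pager works on partitions and remote fetches):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Simplified sketch: fetch full pages from the "replica", filter them,
// and keep fetching until LIMIT matching rows have been collected --
// instead of applying the limit before filtering.
std::vector<int> filtered_query(const std::vector<int>& rows,
                                std::function<bool(int)> filter,
                                size_t page_size, size_t limit) {
    std::vector<int> result;
    size_t pos = 0;
    while (result.size() < limit && pos < rows.size()) {
        // Ask for a whole page, not a limit-derived page size: with
        // LIMIT 1 we would otherwise fetch rows one by one.
        size_t end = std::min(pos + page_size, rows.size());
        for (; pos < end && result.size() < limit; ++pos) {
            if (filter(rows[pos])) {
                result.push_back(rows[pos]); // row survives filtering
            } // else: a dropped row -- does not count towards LIMIT
        }
    }
    return result;
}
```

Because dropped rows don't count towards the limit, the loop naturally fetches extra pages when a page's matches fall short, which is what the dropped-rows accounting in the enhanced pager enables.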
Piotr Jastrzebski
654eeb30ac sstables: Make test_dropped_column_handling test dropped columns
Before it was testing missing columns.
It's better to test dropped columns because they should be ignored
while for missing columns some sources will throw.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-29 16:16:44 +01:00
Avi Kivity
2dba809844 Merge "scylla_io_setup: support multiple devices" from Benny
"
This patchset adds support to scylla_io_setup for
multiple data directories as well as commitlog,
hints, and saved_caches directories.

Refs #2415

Tests: manual testing with scylla-ccm generated scylla.yaml
"

* 'projects/multidev/v3' of https://github.com/bhalevy/scylla:
  scylla_io_setup: assume default directories under /var/lib/scylla
  scylla_io_setup: add support for commitlog, hints, and saved_caches directory
  scylla_io_setup: support multiple data directories
2018-11-29 16:44:33 +02:00
Piotr Sarna
7adbdaba0b tests: add filtering with LIMIT test
Refs #3902
2018-11-29 14:53:30 +01:00
Piotr Sarna
5f97c78875 tests: split filtering tests from cql_query_test
In order to avoid blowing cql_query_test even more out of proportions,
all filtering tests are moved to a separate file.
2018-11-29 14:53:30 +01:00
Piotr Sarna
acf4eadf88 cql3: add proper handling of filtering with LIMIT
Previously, limit was erroneously applied before filtering,
which might have resulted in truncated results.
Now, both paged and unpaged queries are filtered first,
and only after that properly trimmed so only X rows are returned
for LIMIT X.

Fixes #3902
2018-11-29 14:53:30 +01:00
Piotr Sarna
5b052bdae5 service/pager: use dropped_rows to adjust how many rows to read
The filtering pager may drop some rows and as a result return fewer
rows than were fetched from the replica. To properly track how many
rows were actually read, a dropped_rows variable is introduced.
2018-11-29 14:53:29 +01:00
Piotr Sarna
021caeddf7 service/pager: virtualize max_rows_to_fetch function
Regular pagers use max_rows to figure out how many rows to fetch,
but filtering pager potentially needs the whole page to be fetched
in order to filter the results.
2018-11-29 14:14:37 +01:00
Benny Halevy
5ec191536e scylla_io_setup: assume default directories under /var/lib/scylla
If a specific directory is not configured in scylla.yaml, scylla assumes
a default location under /var/lib/scylla.

Hard-code these locations in scylla_io_setup until we have a better way
to probe scylla about them.

Be permissive: if the default directories do not exist on disk, silently
ignore them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-11-29 15:07:29 +02:00
Piotr Sarna
4f5ee3dfcd cql3: add counting dropped rows in filtering pager
A counter for dropped rows is added to the filtering pager.
This metric can be used later to implement applying LIMIT
to filtering queries properly.
Dropped rows are returned on visitor::accept_partition_end.
2018-11-29 14:06:59 +01:00
Benny Halevy
88b85b363a scylla_io_setup: add support for commitlog, hints, and saved_caches directory
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-11-29 10:09:17 +02:00
Benny Halevy
e4382caa4a scylla_io_setup: support multiple data directories
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-11-29 10:09:17 +02:00
Alexys Jacob
00476c3946 scylla-housekeeping: fix python3 compat and shebang 2018-11-29 00:04:02 +01:00
Alexys Jacob
1cf41760a8 dist/ami/files/scylla_install_ami: python3 shebang 2018-11-29 00:00:41 +01:00
Alexys Jacob
a6447f543c dist/docker/redhat/docker-entrypoint.py: add encoding comment 2018-11-29 00:00:19 +01:00
Alexys Jacob
9f041158df fix_system_distributed_tables.py: fix python3 compat and shebang 2018-11-28 23:59:51 +01:00
Alexys Jacob
887322daa2 gen_segmented_compress_params.py: add encoding comment 2018-11-28 23:59:18 +01:00
Alexys Jacob
14e65e1089 idl-compiler.py: python3 shebang 2018-11-28 23:58:38 +01:00
Alexys Jacob
170120a391 scylla-gdb.py: python3 shebang 2018-11-28 23:58:14 +01:00
Alexys Jacob
3902922113 configure.py: python3 shebang 2018-11-28 23:57:54 +01:00
Alexys Jacob
d2dbbba139 tools/scyllatop/: add / normalize python3 shebang 2018-11-28 23:57:03 +01:00
Alexys Jacob
e321b839c7 scripts/: add / normalize python3 shebang 2018-11-28 23:56:35 +01:00
Alexys Jacob
02656fb00e dist/common/scripts: add / normalize python3 shebang 2018-11-28 23:55:26 +01:00
Alexys Jacob
954da947f8 test.py: add encoding comment 2018-11-28 23:54:41 +01:00
Alexys Jacob
cbd72786dd setup.py: add python3 classifiers 2018-11-28 23:54:03 +01:00
Dan Yasny
019a2e3a27 scylla_setup: Mark required args
Fixes #3945

Message-Id: <20181128220549.3083-1-dyasny@gmail.com>
2018-11-28 22:30:02 +00:00
Avi Kivity
de17150cb2 Update seastar submodule
* seastar 1fbb633...132e6cd (2):
  > scripts: json2code: port to Python 3
  > docker/dev/Dockerfile: add c-ares-devel to docker setup
2018-11-28 19:05:21 +02:00
Duarte Nunes
a589dade07 Merge 'Fix checking for multi-column restrictions in filtering' from Piotr
"
This series fixes #3891 by amending the way restrictions
are checked for filtering. The previous implementation, which returned
false from need_filtering() when multi-column restrictions
were present, was incorrect.
Now the error is returned from the restrictions filter layer,
and once multi-column support is implemented for filtering, it will
require no further changes.

Tests: unit (release)
"

* 'fix_multi_column_filtering_check_3' of https://github.com/psarna/scylla:
  tests: add multi-column filtering check
  cql3: remove incorrect multi-column check
  cql3: check filtering restrictions only if applicable
  cql3: add pk/ck_restrictions_need_filtering()
2018-11-28 15:36:37 +00:00
Piotr Sarna
ae0ffa6575 tests: add multi-column filtering check
Multi-column restrictions filtering is not supported yet,
so a simple test case ensuring that is added.
2018-11-28 13:58:16 +01:00
Piotr Sarna
0013929782 cql3: remove incorrect multi-column check
need_filtering() incorrectly returned false if multi-column restrictions
were present. Instead, these restrictions should be allowed to need
filtering.

Fixes #3891
2018-11-28 13:58:16 +01:00
Piotr Sarna
65f21cc518 cql3: check filtering restrictions only if applicable
Primary key restrictions should be checked only when they need
filtering - otherwise it's superfluous, since they were already
applied on query level.
2018-11-28 13:58:16 +01:00
Piotr Sarna
f59ddcab52 cql3: add pk/ck_restrictions_need_filtering()
These functions return true if partition/clustering key restriction
parts of statement restrictions require filtering.
2018-11-28 13:58:16 +01:00
Duarte Nunes
d09d4bbd91 Merge 'Fix checking if system tables need view updates' from Piotr
"
This miniseries ensures that system tables are not checked
for having view updates, because they never do.
What's more, distributed system table is used in the process,
so it's unsafe to query the table while streaming it.

Tests: unit (release), dtest(update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test)
"

* 'fix_checking_if_system_tables_need_view_updates_3' of https://github.com/psarna/scylla:
  streaming: don't check view building of system tables
  database: add is_internal_keyspace
  streaming: remove unused sstable_is_staging bool class
2018-11-28 10:00:34 +00:00
Piotr Sarna
8e6021dfa1 streaming: don't check view building of system tables
System tables will never need view building, and, what's more,
are actually used in the process of view build checking.
So, checking whether system tables need a view update path
is simplified to returning 'false'.
2018-11-28 09:21:56 +01:00
Piotr Sarna
1336b9ee31 database: add is_internal_keyspace
Similarly to is_system_keyspace, it will allow checking if a keyspace
is created for internal use.
2018-11-28 09:21:56 +01:00
Piotr Sarna
6ad2c39f88 streaming: remove unused sstable_is_staging bool class
sstable_is_staging bool class is not used anywhere in the code anymore,
so it's removed.
2018-11-28 09:21:56 +01:00
Duarte Nunes
9f639edaa2 Merge 'storage_proxy: fix some bugs in early (due to errors) request completion' from Gleb
"
The series fixes #3565 and #3566
"

* 'gleb/write_failure_fixes' of github.com:scylladb/seastar-dev:
  storage_proxy: store hint for CL=ANY if all nodes replied with failure
  storage_proxy: complete write request early if all replicas replied with success or failure
  storage_proxy: check that write failure response comes from recognized replica
  storage_proxy: move code executed on write timeout into separate function
2018-11-27 21:44:01 +00:00
Takuya ASADA
52f030806f install-dependencies.sh: fix dependency issues on Debian variants
Sync Debian variants dependencies with dist/debian/control.mustache
(before merging relocatable), use scylla 3rdparty packages.

Since we use 3rdparty repo on seastar/install-dependencies.sh, drop repo
setup part from this script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031122800.11802-1-syuu@scylladb.com>
2018-11-27 21:44:01 +00:00
Gleb Natapov
17197fb005 storage_proxy: store hint for CL=ANY if all nodes replied with failure
Current code assumes that the request failed if all replicas replied with
failure, but this is not true for CL=ANY requests. Take it into account.

Fixed: #3565
2018-11-27 15:06:37 +02:00
Gleb Natapov
d1d04eae3c storage_proxy: complete write request early if all replicas replied with success or failure
Currently, if a write request reaches CL and all replicas have replied, but some
replied with failures, the request will wait for the timeout to be retired.
Detect this case and retire the request immediately instead.

Fixes #3566
2018-11-27 14:49:37 +02:00
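The early-completion rule this commit describes can be sketched with a hypothetical per-request counter (the names below are illustrative, not the actual storage_proxy code): a write can be retired either once enough successes arrive to satisfy the consistency level, or as soon as every targeted replica has replied, whether with success or failure.

```cpp
#include <cassert>

// Hypothetical sketch of a write coordinator's reply bookkeeping.
struct write_handler {
    int targets = 0;  // number of replicas the write was sent to
    int cl = 0;       // acks required to satisfy the consistency level
    int acks = 0;     // success replies received so far
    int failures = 0; // failure replies received so far

    // True when the request can be retired right now instead of
    // waiting for the write timeout to expire.
    bool can_complete() const {
        if (acks >= cl) {
            return true; // CL reached: complete with success
        }
        if (acks + failures == targets) {
            return true; // everyone replied: complete (with failure) early
        }
        return false;
    }
};
```

With 3 targets and CL=2, one ack plus one failure keeps the request pending, but a second failure means all replicas have replied and the request can fail immediately.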
Gleb Natapov
76ab3d716b storage_proxy: check that write failure response comes from recognized replica
Before accounting a failure response, we need to make sure it comes from a
replica that participates in the request.
2018-11-27 14:44:49 +02:00
Rafael Ávila de Espíndola
777ea893e6 Delete data_consume_rows_at_once.
As far as I can tell, the old sstable reading code required reading the
data into a contiguous buffer. The function data_consume_rows_at_once
implemented the old behavior, and code was incrementally moved away
from it.

Right now the only use is in two tests. The sstables used in those
tests are already used in other tests with data_consume_rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181127024319.18732-2-espindola@scylladb.com>
2018-11-27 14:11:50 +02:00
Avi Kivity
1ff6b8fb96 Merge "Don't binary compare compressed sstables in test_write_many_partitions_* tests" from Piotr
"
Compression is not deterministic so instead of binary comparing the sstable files we just read data back
and make sure everything that was written down is still present.

Tests: unit(release)
"

* 'haaawk/binary-compare-of-compressed-sstables/v3' of github.com:scylladb/seastar-dev:
  sstables: Remove compressed parameter from get_write_test_path
  sstables: Remove unused sstable test files
  sstables: Ensure compare_sstables isn't used for compressed files
  sstables: Don't binary compare compressed sstables
  sstables: Remove debug printout from test_write_many_partitions
2018-11-27 14:01:20 +02:00
Duarte Nunes
098dd90bd2 Merge 'Reduce dependencies around consistency_level.hh' from Avi
"
consistency_level.hh is rather heavyweight in both its contents and what it
includes. Reduce the number of inclusion sites and split the file to reduce
dependencies.
"

* tag 'cl-header/v2' of https://github.com/avikivity/scylla:
  consistency_level: simplify validation API
  Split consistency_level.hh header
  database: remove unneeded consistency_level.hh include
  cql: remove unneeded includes of consistency_level.hh
2018-11-27 11:59:34 +00:00
Piotr Jastrzebski
4366302c4c sstables: Extract mp_row_consumer_m::check_schema_mismatch
This method will contain common logic used in multiple places
and reduce code duplication.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <bbda2f4ea4f9325055f096dc549f63b1bb03d3b6.1543311990.git.piotr@scylladb.com>
2018-11-27 12:45:12 +01:00
Avi Kivity
4676e07400 consistency_level: simplify validation API
Remove unused parameters, replace refcounted pointers by references.
2018-11-27 13:41:49 +02:00
Avi Kivity
2c08bff8d5 Split consistency_level.hh header
It has two unrelated users: cql for validation, and storage_proxy for
complicated calculations. Split the simple stuff into a new header to reduce
dependencies.
2018-11-27 13:32:10 +02:00
Avi Kivity
b015f41344 database: remove unneeded consistency_level.hh include 2018-11-27 13:30:56 +02:00
Gleb Natapov
7bc68aa0eb storage_proxy: move code executed on write timeout into separate function
Currently the callback is in lambda, but we will want to call the code
not only during timer expiration.
2018-11-27 13:23:30 +02:00
Avi Kivity
9201d22c06 cql: remove unneeded includes of consistency_level.hh
Move the includes to .cc to reduce include pollution.
2018-11-27 13:18:33 +02:00
Raphael S. Carvalho
626afa6973 database: conditionally release sstable references from compaction manager
Not all compaction operations submitted through the compaction manager set a callback
for releasing references to exhausted sstables in the compaction manager itself.
That callback lives in the compaction descriptor, which is passed to table::compaction().
Let's make the call conditional to avoid bad function call exceptions.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181126235616.10452-1-raphaelsc@scylladb.com>
2018-11-27 12:10:43 +01:00
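The "bad function call" exceptions mentioned above come from invoking an empty std::function. A minimal sketch of the conditional call (the struct and field names here are hypothetical, not the real compaction descriptor):

```cpp
#include <cassert>
#include <functional>

// Hypothetical descriptor with an optional callback; invoking an unset
// std::function throws std::bad_function_call, so callers must test it first.
struct compaction_descriptor {
    std::function<void(int)> release_exhausted; // may be left unset
};

void on_sstable_exhausted(const compaction_descriptor& desc, int generation) {
    // Conditional call: std::function converts to bool, true only when
    // a target is set, so an unset callback is silently skipped.
    if (desc.release_exhausted) {
        desc.release_exhausted(generation);
    }
}
```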
Avi Kivity
2eaeb3e4eb Update swagger-ui submodule
Updates to version 2.2.10 with a local change (from Amnon) to support our location.

Fixes #3942.
2018-11-27 13:01:02 +02:00
Tomasz Grabiec
17a8a9d13d gdb: Properly parse unique_ptr in 'scylla lsa'
There's no _M_t._M_head_impl any more in the standard library.

We now have std_unique_ptr wrapper which abstracts this fact away so
use that.

Message-Id: <20181126174837.11542-1-tgrabiec@scylladb.com>
2018-11-27 12:32:41 +02:00
Tomasz Grabiec
eecda72175 gdb: Adjust 'scylla lsa' for removal of emergency reserve
There's no _emergency_reserve any more. Show _free_segments instead.

Message-Id: <20181126174837.11542-2-tgrabiec@scylladb.com>
2018-11-27 12:32:37 +02:00
Avi Kivity
5e759b0c07 Merge "Optimize checksum computation for the MC sstable format" from Tomek
"
One part of the improvement comes from replacing zlib's CRC32 with the one
from libdeflate, which is optimized for modern architecture and utilizes the
PCLMUL instruction.

perf_checksum test was introduced to measure performance of various
checksumming operations.

Results for 514 B (relevant for writing with compression enabled):

    test                                      iterations      median         mad         min         max
    crc_test.perf_deflate_crc32_combine            58414    16.711us     3.483ns    16.708us    16.725us
    crc_test.perf_adler_combine                165788278     6.059ns     0.031ns     6.027ns     7.519ns
    crc_test.perf_zlib_crc32_combine               59546    16.767us    26.191ns    16.741us    16.801us
    ---
    crc_test.perf_deflate_crc32_checksum        12705072    83.267ns     4.580ns    78.687ns    98.964ns
    crc_test.perf_adler_checksum                 3918014   206.701ns    23.469ns   183.231ns   258.859ns
    crc_test.perf_zlib_crc32_checksum            2329682   428.787ns     0.085ns   428.702ns   510.085ns

Results for 64 KB (relevant for writing with compression disabled):

    test                                      iterations      median         mad         min         max
    crc_test.perf_deflate_crc32_combine            25364    38.393us    17.683ns    38.375us    38.545us
    crc_test.perf_adler_combine                169797143     5.842ns     0.009ns     5.833ns     6.901ns
    crc_test.perf_zlib_crc32_combine               26067    38.663us    95.094ns    38.546us    40.523us
    ---
    crc_test.perf_deflate_crc32_checksum          202821     4.937us    14.426ns     4.912us     5.093us
    crc_test.perf_adler_checksum                   44684    22.733us   206.263ns    22.492us    25.258us
    crc_test.perf_zlib_crc32_checksum              18839    53.049us    36.117ns    53.013us    53.274us

The new CRC32 implementation (deflate_crc32) doesn't provide a fast
checksum_combine() yet, it delegates to zlib so it's as slow as the latter.

Because for CRC32 checksum_combine() is several orders of magnitude slower
than checksum(), we avoid calling checksum_combine() completely for this
checksummer. We still do it for adler32, which has combine() which is faster
than checksum().

SStable write performance was evaluated by running:

  perf_fast_forward --populate --data-directory /tmp/perf-mc \
     --rows=10000000 -c1 -m4G --datasets small-part

Below is a summary of the average frag/s for a memtable flush. Each result is
an average of about 20 flushes with stddev of about 4k.

Before:

 [1] MC,lz4: 330'903
 [2] LA,lz4: 450'157
 [3] MC,checksum: 419'716
 [4] LA,checksum: 459'559

After:

 [1'] MC,lz4: 446'917 ([1] + 35%)
 [2'] LA,lz4: 456'046 ([2] + 1.3%)
 [3'] MC,checksum: 462'894 ([3] + 10%)
 [4'] LA,checksum: 467'508 ([4] + 1.7%)

After this series, the performance of the MC format writer is similar to that
of the LA format before the series.

There seems to be a small but consistent improvement for LA too. I'm not sure
why.
"

* tag 'improve-mc-sstable-checksum-libdeflate-v3' of github.com:tgrabiec/scylla:
  tests: perf: Introduce perf_checksum
  tests: Add test for libdeflate CRC32 implementation
  sstables: compress: Use libdeflate for crc32
  sstables: compress: Rename crc32_utils to zlib_crc32_checksummer
  licenses: Add libdeflate license
  Integrate libdeflate with the build system
  Add libdeflate submodule
  sstables: Avoid checksum_combine() for the crc32 checksummer
  sstables: compress: Avoid unnecessary checksum_combine()
  sstables: checksum_utils: Add missing include
2018-11-26 20:10:46 +02:00
Tomasz Grabiec
f1a35b654a tests: perf: Introduce perf_checksum 2018-11-26 18:59:43 +01:00
Tomasz Grabiec
5b6e3fb5ed tests: Add test for libdeflate CRC32 implementation 2018-11-26 18:59:42 +01:00
Tomasz Grabiec
bf0164cdaf sstables: compress: Use libdeflate for crc32
Improves memtable flush performance by 10% in a CPU-bound case.

Unlike the zlib implementation, libdeflate is optimized for modern
CPUs. It utilizes the PCLMUL instruction.
2018-11-26 18:59:42 +01:00
Tomasz Grabiec
0ac1905f4f sstables: compress: Rename crc32_utils to zlib_crc32_checksummer 2018-11-26 18:59:42 +01:00
Tomasz Grabiec
ba141a4852 licenses: Add libdeflate license 2018-11-26 18:59:41 +01:00
Tomasz Grabiec
048d569b45 Integrate libdeflate with the build system 2018-11-26 18:59:09 +01:00
Tomasz Grabiec
f704f7bc19 Add libdeflate submodule 2018-11-26 18:57:51 +01:00
Tomasz Grabiec
743cf43847 sstables: Avoid checksum_combine() for the crc32 checksummer
checksum_combine() is much slower than re-feeding the buffer to
checksum() for the zlib CRC32 checksummer.

Introduce Checksum::prefer_combine() to determine this and select
more optimal behavior for given checksummer.

Improves performance of memtable flush with compression enabled by 30%.
2018-11-26 18:57:33 +01:00
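The Checksum::prefer_combine() knob described above can be sketched generically. The checksummer below is a hypothetical additive toy standing in for adler32/crc32 (the real code dispatches on the same static predicate, but with real checksum implementations): when combining partial checksums is cheap, combine them; otherwise re-feed the second buffer into the running checksum.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy additive checksummer; for it, combining partial sums is trivially
// cheap, so prefer_combine() is true. For zlib CRC32 it would be false,
// since crc32_combine() is orders of magnitude slower than checksum().
struct sum_checksummer {
    static constexpr bool prefer_combine() { return true; }
    static uint32_t checksum(const std::string& buf, uint32_t init = 0) {
        uint32_t s = init;
        for (unsigned char c : buf) s += c;
        return s;
    }
    static uint32_t combine(uint32_t a, uint32_t b) { return a + b; }
};

// Checksum a logical stream arriving in two buffers, picking the
// cheaper strategy for the given checksummer.
template <typename Checksummer>
uint32_t checksum_split(const std::string& a, const std::string& b) {
    if (Checksummer::prefer_combine()) {
        return Checksummer::combine(Checksummer::checksum(a),
                                    Checksummer::checksum(b));
    }
    // Re-feed: continue the running checksum over the second buffer.
    return Checksummer::checksum(b, Checksummer::checksum(a));
}
```

Either branch must yield the same value as checksumming the concatenated input, which is what makes the choice a pure performance decision.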
Avi Kivity
b351a9fee7 db/repair_decision.hh: add missing #include
Message-Id: <20181126154948.2453-1-avi@scylladb.com>
2018-11-26 18:49:08 +01:00
Tomasz Grabiec
88cf1c61ba sstables: compress: Avoid unnecessary checksum_combine() 2018-11-26 14:31:38 +01:00
Tomasz Grabiec
8372cf7bcc sstables: checksum_utils: Add missing include 2018-11-26 14:31:38 +01:00
Avi Kivity
c6d700279b class_registry: introduce a non-static variant of class_registry
class_registry's staticness has the usual problem of
static classes (loss of dependency information) and prevents us
from librarifying Scylla since all objects that define a registration
must be linked in.

Take a first step against this staticness by defining a nonstatic
variant. The static class_registry is then redefined in terms of the
nonstatic class. After all uses have been converted, the static
variant can be retired.
Message-Id: <20181126130935.12837-1-avi@scylladb.com>
2018-11-26 13:30:21 +00:00
Paweł Dziepak
62ea153629 Merge "Check for schema mismatch after dropping dead cells" from Piotr
"
Previously we were checking for schema incompatibility between the current schema and the sstable
serialization header before reading any data. This isn't the best approach, because
the data in the sstable may already be irrelevant, for example due to a column drop.

This patchset moves the check to after the actual data is read and verified to have
a timestamp new enough to classify it as non-obsolete.

Fixes #3924
"

* 'haaawk/3924/v3' of github.com:scylladb/seastar-dev:
  sstables: Enable test_schema_change for MC format
  sstables3: Throw error on schema mismatch only for live cells
  sstables: Pass column_info to consume_*_column
  sstables: Add schema_mismatch to column_info
  sstables: Store column data type in column_info
  sstables: Remove code duplication in column_translation
2018-11-26 13:10:18 +00:00
Avi Kivity
9a46ee69d4 doc: fix BYPASS CACHE documentation
The BYPASS CACHE documentation mistakenly described an earlier version of the patch.
Correct it to document the committed version.
Message-Id: <20181126125810.9344-1-avi@scylladb.com>
2018-11-26 13:04:52 +00:00
Piotr Jastrzebski
dec48dd1e2 sstables: Remove compressed parameter from get_write_test_path
This parameter is no longer used.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:46:23 +01:00
Piotr Jastrzebski
92ffccd636 sstables: Remove unused sstable test files
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:15 +01:00
Piotr Jastrzebski
a29c9189cb sstables: Ensure compare_sstables isn't used for compressed files
Binary comparing compressed sstables is wrong because compression
is not deterministic.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:15 +01:00
Piotr Jastrzebski
7e263208f0 sstables: Don't binary compare compressed sstables
This family of test_write_many_partitions_* tests writes
sstables down from memtable using different compressions.
Then it compares the resulting file with a blueprint file
and reads the data back to check everything is there.

Compression is not deterministic so this patch makes the
tests not compare resulting compressed sstable file with blueprint
file and instead only read data back.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:35:03 +01:00
Piotr Jastrzebski
5c86294a56 sstables: Enable test_schema_change for MC format
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:25:23 +01:00
Piotr Jastrzebski
4bdb86c712 sstables3: Throw error on schema mismatch only for live cells
Previously we threw an exception during the creation of
column_translation. This wasn't always correct, because sometimes the
column for which the mismatch appeared was already dropped, and the
data present in the sstable should be ignored anyway.

Fixes #3924

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-26 13:25:10 +01:00
Piotr Sarna
6ab8235369 main: fix deinitialization order for view update generator
View update generator should be stopped only after
drain_on_shutdown() is performed on storage service.
Message-Id: <4d2bda4c73422a2ebf46d6dcd06c95d960839889.1543230849.git.sarna@scylladb.com>
2018-11-26 11:21:37 +00:00
Duarte Nunes
2a371c2689 Merge 'Allow bypassing cache on a per-query basis' from Avi
"
Some queries are very unlikely to hit cache. Usually this includes
range queries on large tables, but other patterns are possible.

While the database should adapt to the query pattern, sometimes the
user has information the database does not have. By passing this
information along, the user helps the database manage its resources
more optimally.

To do this, this patch introduces a BYPASS CACHE clause to the
SELECT statement. A query thus marked will not attempt to read
from the cache, and instead will read from sstables and memtables
only. This reduces CPU time spent to query and populate the cache,
and will prevent the cache from being flooded with data that is
not likely to be read again soon. The existing cache disabled path
is engaged when the option is selected.

Tests: unit (release), manual metrics verification with ccm with and without the
    BYPASS CACHE clause.

Ref #3770.
"

* tag 'cache-bypass/v2' of https://github.com/avikivity/scylla:
  doc: document SELECT ... BYPASS CACHE
  tests: add test for SELECT ... BYPASS CACHE
  cql: add SELECT ... BYPASS CACHE clause
  db: add query option to bypass cache
2018-11-26 09:59:40 +00:00
Paweł Dziepak
13385778fd Merge "Measure performance of dataset population in perf_fast_forward" from Tomasz
* tag 'perf-ffwd-dataset-population-v2' of github.com:tgrabiec/scylla:
  tests: perf_fast_forward: Measure performance of dataset population
  tests: perf_fast_forward: Record the dataset on which test case was run
  tests: perf_fast_forward: Introduce the concept of a dataset
  tests: perf_fast_forward: Introduce make_compaction_disabling_guard()
  tests: perf_fast_forward: Initialize output manager before population
  tests: perf_fast_forward: Handle empty test parameter set
  tests: perf_fast_forward: Extract json_output_writer::write_common_test_group()
  tests: perf_fast_forward: Factor out access to cfg to a single place per function
  tests: perf_fast_forward: Extract result_collector
  tests: perf_fast_forward: Take writes into account in AIO statistics
  tests: perf_fast_forward: Reorder members
  tests: perf_fast_forward: Add --sstable-format command line option
2018-11-26 09:45:55 +00:00
Avi Kivity
58033ad3a4 doc: document SELECT ... BYPASS CACHE
Add a new cql-extensions.md file and document BYPASS CACHE there.
2018-11-26 11:37:52 +02:00
Avi Kivity
f69401c609 tests: add test for SELECT ... BYPASS CACHE
The test verifies that cache read metrics are not incremented during a cache
bypass read.
2018-11-26 11:37:52 +02:00
Avi Kivity
ecf3f92ec7 cql: add SELECT ... BYPASS CACHE clause
The BYPASS CACHE clause instructs the database not to read from or populate the
cache for this query. The new keywords (BYPASS and CACHE) are not reserved.
2018-11-26 11:37:49 +02:00
Takuya ASADA
7740cd2142 dist/common/systemd/scylla-housekeeping-restart.service.mustache: specify correct repo for Debian variants
We specify the correct repo for both Red Hat and Debian variants on -daily, but
mistakenly don't for -restart, so do the same on -restart.

Fixes #3906

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181109224509.27380-1-syuu@scylladb.com>
2018-11-26 11:02:25 +02:00
Rafael Ávila de Espíndola
6746907999 Use fully covered switches in continuous_data_consumer
do_process_buffer had two unreachable default cases and a long
if-else-if chain.

This converts the if-else-if chain to a switch and a helper
function.

This moves the error checking from run time to compile time. If we
were to add a 128-bit integer, for example, gcc would complain about it
missing from the switch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181125221451.106067-1-espindola@scylladb.com>
2018-11-25 22:52:11 +00:00
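The technique this commit applies can be shown in miniature (the enum and helper below are hypothetical stand-ins, not the actual continuous_data_consumer states): a switch over an enum class with every enumerator handled and no `default:` case lets gcc/clang's -Wswitch warning flag a newly added enumerator at compile time, instead of the omission surfacing at run time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical read states. Adding a new enumerator (say, a 128-bit
// integer) without extending the switch below triggers -Wswitch.
enum class read_kind { uint8, uint16, uint32, uint64 };

// Helper mapping each kind to the number of bytes it consumes.
// No `default:` case on purpose: full coverage is checked by the compiler.
constexpr std::size_t bytes_for(read_kind k) {
    switch (k) {
    case read_kind::uint8:  return 1;
    case read_kind::uint16: return 2;
    case read_kind::uint32: return 4;
    case read_kind::uint64: return 8;
    }
    return 0; // unreachable; silences "control reaches end of function"
}
```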
Avi Kivity
b4765af790 Merge "Introduce SSTable-run-based compaction" from Raphael
"
This new compaction approach consists of releasing exhausted fragments[1] of a run[2] as
compaction proceeds, considerably decreasing the space requirement.
These changes will immediately benefit leveled strategy because it already works with
the run concept.

[1] a fragment is an sstable composing a run; exhausted means the sstable was fully
consumed by the compaction procedure.
[2] run is a set of non-overlapping sstables which roughly span the
entire token range.

Note:
Last patch includes an example compaction strategy showing how to work with the interface.

unit tests: all modes passing
dtests: compaction ones passing
"

* 'sstable_run_based_compaction_v10' of github.com:raphaelsc/scylla:
  tests: add example compaction strategy for sstable run based approach
  sstables/compaction: propagate sstable replacement to all compaction of a CF
  sstables: store cf pointer in compaction_info
  tests/sstable_test: add test for compaction replacement of exhausted sstable
  sstables: add sstable's on closed handling
  tests/sstables: add test for sstable run based compaction
  sstables/compaction_manager: prevent partial run from being selected for compaction
  compaction: use same run identifier for sstables generated by same compaction
  sstables: introduce sstable run
  sstables/compaction_manager: release reference to exhausted sstable through callback
  sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor
  database: do not keep reference to sstable in selector when done selecting
  compaction: share sstable set with incremental reader selector
  sstables/compaction: release space earlier of exhausted input sstables
  sstables: make partitioned sstable set's incremental selector resilient to changes in the set
  database: do not store reference to sstable in incremental selector
  tests/sstables: add run identifier correctness test
  sstables: use a random uuid for sstables without run identifier
  sstables: add run identifier to scylla metadata
2018-11-25 17:20:24 +02:00
Avi Kivity
b835b93ee6 db: add query option to bypass cache
With the option enabled, we bypass the cache unconditionally and only
read from memtables+sstables. This is useful for analytics queries.
2018-11-25 16:26:08 +02:00
Piotr Jastrzebski
c2561a2796 sstables: Remove debug printout from test_write_many_partitions
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-25 13:29:10 +01:00
Raphael S. Carvalho
3fa70d6b5f tests: add example compaction strategy for sstable run based approach
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 20:16:54 -02:00
Raphael S. Carvalho
2058001f94 sstables/compaction: propagate sstable replacement to all compaction of a CF
This is needed for parallel compaction to work with sstable run based approach.
That's because regular compaction clones a set containing all sstables of its
column family. So compaction A can potentially hold a reference to a compacting
sstable of compaction B, so preventing compacting B from releasing its exhausted
sstable.

So all replacements are propagated to all compactions of a given column family,
and compactions in turn, including the one which initiated the propagation,
will do the replacement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:30 -02:00
Raphael S. Carvalho
953fdcc867 sstables: store cf pointer in compaction_info
The motivation is that we need a more efficient way to find the compactions
that belong to a given column family in the compaction list.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:28 -02:00
Raphael S. Carvalho
baf89f0df3 tests/sstable_test: add test for compaction replacement of exhausted sstable
Make sure that compaction is capable of releasing exhausted sstable space
early in the procedure.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:26 -02:00
Raphael S. Carvalho
824c20b76d sstables: add sstable's on closed handling
The motivation is that this will be useful for catching regressions in compaction
when releasing exhausted sstables early. An sstable's space
is only released once it's closed, so this will allow us to write a test
case and possibly use it for entities holding exhausted sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:25 -02:00
Raphael S. Carvalho
0085e8371d tests/sstables: add test for sstable run based compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:23 -02:00
Raphael S. Carvalho
e88d1d54b9 sstables/compaction_manager: prevent partial run from being selected for compaction
Filter out sstables belonging to a partial run being generated by an ongoing
compaction. Otherwise, that could lead to wrong decisions by the compaction
strategy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:22 -02:00
Raphael S. Carvalho
23884fe9f6 compaction: use same run identifier for sstables generated by same compaction
SSTables composing the same run will share the same run identifier.
Therefore, a new compaction strategy will be able to get all sstables belonging
to the same run from the sstable_set, which now keeps track of existing runs.

Same UUID is passed to writers of a given compaction. Otherwise, a new UUID
is picked for every sstable created by compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:20 -02:00
Raphael S. Carvalho
4f68cb34a6 sstables: introduce sstable run
An sstable run is a structure that holds all sstables that have the same
run identifier. All sstables belonging to the same run will not overlap
with one another.
It can be used by compaction strategy to work on runs instead of individual
sstables.

sstable_set structure which holds all sstables for a given column family
will be responsible for providing to its user an interface to work with
runs instead of individual sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:18 -02:00
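The run grouping described above can be sketched with hypothetical types (illustrative only; the real sstable_set and run identifiers are richer than this): fragments carrying the same run identifier are collected into one run, kept in token order since fragments of a run never overlap.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical fragment: one sstable of a run, covering a token range.
using run_id = int;

struct sstable {
    run_id run;
    int first_token; // inclusive token range covered by this fragment
    int last_token;
};

// Group fragments by run identifier, keeping each run token-ordered so a
// compaction strategy can operate on whole runs instead of single sstables.
std::map<run_id, std::vector<sstable>> group_into_runs(std::vector<sstable> ssts) {
    std::map<run_id, std::vector<sstable>> runs;
    for (auto& s : ssts) {
        auto& frags = runs[s.run];
        // Insert keeping ascending token order; fragments in a run are disjoint.
        auto it = frags.begin();
        while (it != frags.end() && it->first_token < s.first_token) {
            ++it;
        }
        frags.insert(it, s);
    }
    return runs;
}
```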
Raphael S. Carvalho
fc92fb955d sstables/compaction_manager: release reference to exhausted sstable through callback
That's important so that a reference to the sstable is not kept throughout
the compaction procedure, which would break the goal of releasing
space during compaction.

Manager passes a callback to compaction which calls it whenever
there's sstable replacement.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:16 -02:00
Raphael S. Carvalho
3f309ebba9 sstables/compaction: stop tracking exhausted input sstable in compaction_read_monitor
The motivation is that we want to release the space of an exhausted
sstable, and that will only happen when all references to it are gone
*and* the backlog tracker takes the early replacement into account.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:13 -02:00
Raphael S. Carvalho
3433de3dc0 database: do not keep reference to sstable in selector when done selecting
When compacting, we create all readers at once and do not select again
from the incremental selector, meaning the selector will keep all
respective sstables in current_sstables, preventing compaction from
releasing space as it goes on.

This change refreshes the sstable set's selector so that it no longer
holds a reference to an exhausted sstable.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:12 -02:00
Raphael S. Carvalho
f6df949c1a compaction: share sstable set with incremental reader selector
By doing that, we will be able to release an exhausted sstable from both
simultaneously.
That's achieved by sharing the set containing input sstables with the
incremental reader selector and removing exhausted sstables from the
shared set when the time has come.

A step towards reducing the disk requirement for compaction by deleting
an sstable once all of its data is in a sealed new sstable. For that to
happen, all references must be gone.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:10 -02:00
Raphael S. Carvalho
e5a0b05c15 sstables/compaction: release space earlier of exhausted input sstables
Currently, compaction only replaces input sstables at the end of
compaction, meaning compaction must be finished before any of their
space can be released.

What we can do instead is delete some input sstables earlier, under
these conditions:

1) The sstable's data must be committed to a new, sealed output sstable,
meaning it is exhausted.
2) An exhausted sstable must not overlap with a non-exhausted one,
because a tombstone in the exhausted sstable could have been purged
and the shadowed data in the non-exhausted one could be resurrected
if the system crashes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:07 -02:00
Raphael S. Carvalho
ace070c8fc sstables: make partitioned sstable set's incremental selector resilient to changes in the set
The motivation is that compaction may remove an sstable from the set while
the incremental selector is alive, and for that to work, we need to
invalidate the iterators stored by the selector. We could have added a
method to notify it, but there will be a case where the one keeping the set
cannot forward the notification to the selector. So it's better for the
selector to take care of itself. A change-counter approach is used, which
allows the selector to know when to invalidate its iterators.

After invalidation, the selector moves the iterator back into its right
place by looking up the lower bound of the current position.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:05 -02:00
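A minimal sketch of the change-counter idea, using a plain std::set of integer generations in place of the partitioned sstable set (all names here are hypothetical, not Scylla's types):

```cpp
#include <cassert>
#include <limits>
#include <set>

// The set bumps a counter on every mutation; selectors compare the
// counter they last observed against the current one to detect staleness.
struct tracked_set {
    std::set<int> sstables;   // stand-in for the partitioned sstable set
    unsigned change_count = 0;

    void insert(int gen) { sstables.insert(gen); ++change_count; }
    void erase(int gen)  { sstables.erase(gen);  ++change_count; }
};

struct selector {
    tracked_set& s;
    unsigned seen;                               // counter when iterator was taken
    int last = std::numeric_limits<int>::min();  // last position returned
    std::set<int>::iterator it;

    explicit selector(tracked_set& ts)
        : s(ts), seen(ts.change_count), it(ts.sstables.begin()) {}

    // Returns the next generation, or -1 when exhausted. If the set
    // changed, the (possibly invalidated) iterator is re-established by
    // seeking past the last returned position instead of dereferencing it.
    int next() {
        if (seen != s.change_count) {
            it = s.sstables.upper_bound(last);   // move back to the right place
            seen = s.change_count;
        }
        if (it == s.sstables.end()) {
            return -1;
        }
        last = *it;
        return *it++;
    }
};
```

In this toy, erasing an element while a selector is alive no longer risks dereferencing a dangling iterator; the selector simply re-seeks.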
Raphael S. Carvalho
8d11b0bbb4 database: do not store reference to sstable in incremental selector
Use the sstable generation instead to keep track of read sstables.
The motivation is that we will no longer keep references to sstables,
allowing their space on disk to be released as soon as they get exhausted.
Generation is used because it guarantees uniqueness of the sstable.

Reviewed-by: Botond Dénes <bdenes@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:04 -02:00
Raphael S. Carvalho
edc87014c1 tests/sstables: add run identifier correctness test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:02 -02:00
Raphael S. Carvalho
a66b1954cc sstables: use a random uuid for sstables without run identifier
Older sstables must have an identifier for them to be associated
with their own run.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:53:01 -02:00
Raphael S. Carvalho
62025fa52c sstables: add run identifier to scylla metadata
It identifies the run a particular sstable belongs to.
Existing sstables will have a random UUID associated with them
in memory.

UUID is the correct choice because it allows sstables to be
exported without conflicts between identifiers generated by
different nodes.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2018-11-24 18:52:44 -02:00
Rafael Ávila de Espíndola
d18bbe9d45 Remove unreachable default cases.
These switches are fully covered. We can be sure they will stay this
way because of -Werror and gcc's -Wswitch warning.

We can also be sure that we never have an invalid enum value since the
state machine values are not read from disk.

The patch also removes a superfluous ';'.
Message-Id: <20181124020128.111083-1-espindola@scylladb.com>
2018-11-24 09:31:51 +00:00
Piotr Jastrzebski
569508158c sstables: Pass column_info to consume_*_column
This will allow checking for schema mismatches
and providing better error messages.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Piotr Jastrzebski
9ca6877cbd sstables: Add schema_mismatch to column_info
This field is true when there's a mismatch
between the column type in the serialization header and
the current schema.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Piotr Jastrzebski
51fa8e0c94 sstables: Store column data type in column_info
This will be used to check for schema mismatches and
provide an informative error message.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Piotr Jastrzebski
99dfb9cc96 sstables: Remove code duplication in column_translation
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-11-23 21:48:14 +01:00
Raphael S. Carvalho
d29482dce8 sstables: deprecate sstable metadata's ancestors
The reason for that is that it's not available in sstable format mc,
so we can no longer rely on it in common code for the currently
supported formats.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181121170057.20900-1-raphaelsc@scylladb.com>
2018-11-23 19:38:32 +01:00
Tomasz Grabiec
8e93046abc tests: perf_fast_forward: Measure performance of dataset population 2018-11-23 19:22:50 +01:00
Tomasz Grabiec
2c95aa4d8d tests: perf_fast_forward: Record the dataset on which test case was run
Now any given test case can potentially run on many different datasets.
2018-11-23 19:22:12 +01:00
Tomasz Grabiec
470552b7ab tests: perf_fast_forward: Introduce the concept of a dataset
A dataset represents a table with data, populated in certain way, with
certain characteristics of the schema and data.

Before this change, datasets were implicitly defined, with population
hard-coded inside the populate() function.

This change gathers logic related to datasets into classes, in order to:

  - make it easier to define new datasets.

  - be able to measure performance of dataset population in a
    standardized way.

  - be able to express constraints on datasets imposed by different
    test cases.  Test cases are matched with possible datasets based
    on the abstract interface they accept (e.g. clustered_ds,
    multipartition_ds), and which must be implemented by a compatible
    dataset. To facilitate this matching, test function is now wrapped
    into a dataset_acceptor object, with an automatically-generated can_run()
    virtual method, deduced by make_test_fn().

  - be able to select tests to run based on the dataset name.
    Only tests which are compatible with that dataset will be run.
2018-11-23 19:22:09 +01:00
Tomasz Grabiec
2746f78a9f tests: perf_fast_forward: Introduce make_compaction_disabling_guard() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
b00d360281 tests: perf_fast_forward: Initialize output manager before population 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
25dc481030 tests: perf_fast_forward: Handle empty test parameter set 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
38a1b7e87b tests: perf_fast_forward: Extract json_output_writer::write_common_test_group() 2018-11-23 19:18:10 +01:00
Tomasz Grabiec
a507ca8159 tests: perf_fast_forward: Factor out access to cfg to a single place per function
Preparatory change before making n_rows be determined through a
dataset object.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
3fc78a25bf tests: perf_fast_forward: Extract result_collector
Extracts the result collection and reporting logic out of
run_test_case(). Will be needed in population tests, for which we
don't want the looping logic.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
f4a70283ee tests: perf_fast_forward: Take writes into account in AIO statistics
Relevant for population tests. So far all tests were read tests.
2018-11-23 19:18:09 +01:00
Tomasz Grabiec
96f5bd2f46 tests: perf_fast_forward: Reorder members 2018-11-23 19:18:09 +01:00
Tomasz Grabiec
3ac5e8887e tests: perf_fast_forward: Add --sstable-format command line option 2018-11-23 19:18:09 +01:00
Tomasz Grabiec
564b328b2e Merge 'Add tests for schema changes' from Paweł
This series adds a generic test for schema changes that generates
various schemas and data before and after an ALTER TABLE operation. It is
then used to check the correctness of mutation::upgrade() and sstable
readers, and led to the discovery of #3924 and #3925.

Fixes #3925.

* https://github.com/pdziepak/scylla.git schema-change-test/v3.1
  schema_builder: make member function names less confusing
  converting_mutation_partition_applier: fix collection type changes
  converting_mutation_partition_applier: do not emit empty collections
  sstable: use format() instead of sprint()
  tests/random-utils: make functions and variables inline
  tests: add models for schemas and data
  tests: generate schema changes
  tests/mutation: add test for schema changes
  tests/sstable: add test for schema changes
2018-11-23 15:11:31 +01:00
Paweł Dziepak
09439cd809 tests/sstable: add test for schema changes
for_each_schema_change() is used for testing reading an sstable that was
written with a different schema. Because of #3924, for now the mc format
is not verified this way.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
dc7f9fea5b tests/mutation: add test for schema changes 2018-11-23 12:14:06 +00:00
Paweł Dziepak
35f9f424e9 tests: generate schema changes
This patch adds the for_each_schema_change() function, which generates
schemas and data before and after some modification to the schema (e.g.
adding a column, changing its type). It can be used to test schema
upgrades.
2018-11-23 12:14:06 +00:00
Paweł Dziepak
daee4bd3b8 tests: add models for schemas and data
This patch introduces a model of Scylla schemas and data, implemented
using simple standard library primitives. It can be used for testing the
actual schemas, mutation_partitions, etc. used by Scylla by
comparing the results of various actions.

The initial use case for this model was to test schema changes, but
there is no reason why in the future it cannot be extended to test other
things as well.
2018-11-23 12:14:06 +00:00
Takuya ASADA
cf0d00b81a dist/ami: fix 'unknown configuration key: "enhanced_networking"' error while building AMI
packer 1.3.2 no longer supports the enhanced_networking directive; we
need to use the new directives ("sriov_support" and "ena_support") to
build with the new version.
packer provides an automatic configuration file fixing tool, so the new
scylla.json is generated by the following command:
 ./packer/packer fix scylla.json

Fixes #3938

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181123053719.32451-1-syuu@scylladb.com>
2018-11-23 08:15:47 +02:00
Paweł Dziepak
91793c0a43 bytes_ostream: drop appending_hash specialisation
appending_hash is used for computing hashes that become part of the
binary interface. They cannot change between Scylla versions, and the
same data needs to always result in the same hash.

At the moment, appending_hash<bytes_ostream> doesn't fulfil those
requirements since it leaks information about how the underlying buffer
is fragmented. Fortunately, it has no users so it doesn't cause any
compatibility issues.

Moreover, bytes_ostream is usually used as an output of some
serialisation routine (e.g. frozen_mutation_fragment or CQL response).
Those serialisation formats do not guarantee that there is a single
representation of a given data and therefore are not fit to be hashed by
appending_hash. Removing appending_hash<bytes_ostream> may help
prevent such incorrect uses.
Message-Id: <20181122163823.12759-1-pdziepak@scylladb.com>
2018-11-22 23:53:54 +00:00
Tomasz Grabiec
fb38f0e9f8 Update seastar submodule
* seastar b924495...1fbb633 (3):
  > rpc: Reduce code duplication
  > tests: perf: Make do_not_optimize() take the argument by const&
  > doc: Fix import paths in the tutorial
2018-11-22 23:53:54 +00:00
Paweł Dziepak
2a0e929830 tests/random-utils: make functions and variables inline
random-utils.hh is a header which may be included in multiple
translation units so all members should be non-static inline to avoid
any duplication.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
edb5402a73 sstable: use format() instead of sprint()
The format message was using the new style formatting markers ("{}")
which are understood by format() but not by sprint() (the latter is
basically deprecated).
2018-11-22 11:30:31 +00:00
Paweł Dziepak
1fbe33791d converting_mutation_partition_applier: do not emit empty collections
This patch changes the behaviour of the schema upgrade code so that if
all cells and the tombstones of a collection are removed during the
upgrade, the collection is not emitted (as opposed to emitting an empty
one). Both behaviours are valid, but the new one is more consistent with
how atomic cells are upgraded and how schema upgrades work for sstable
readers.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
7b12aaa093 converting_mutation_partition_applier: fix collection type changes
ALTER TABLE allows changing the type of a collection to a compatible
one. This includes changes from a fixed-sized type to a variable-sized
one. If that happens, the atomic_cells representing collection elements
need to be rewritten so that the value size is included. The logic for
rewriting atomic cells already exists (for those that are not
collection members) and is reused in this patch.

Fixes #3925.
2018-11-22 11:30:31 +00:00
Paweł Dziepak
43e0201ec6 schema_builder: make member function names less confusing
Right now, schema_builder member functions have names that very poorly
convey the actions that are performed for them. This is made even worse
by some overloads which drastically change the semantics. For example:

    schema_builder()
        .with_column("v1", /* ... */)
        .without_column("v1", removal_timestamp);

Creates a column "v1" and adds information that there was a column
with that name that was removed at 'removal_timestamp'.

    schema_builder()
        .with_column("v1")
        .without_column(utf8_type->decompose("v1"));

This adds column "v1" and then immediately removes it.

In order to clean up this mess the names were changed so that:
 * with_/without_ functions only add information to the schema (e.g.
   info that a column was removed, but without removing a column of that
   name if one exists)
 * functions which names start with a verb actually perform that action,
   e.g. the new remove_column() removes the column (and adds information
   that it used to exist) as in the second example.
2018-11-22 11:30:31 +00:00
Benny Halevy
dcd18e2b62 remove exec permission from top_k source files
This was introduced by 32525f2694

Cc: Rafi Einstein <rafie@scylladb.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>
2018-11-21 18:38:50 +02:00
Gleb Natapov
b4a8802edc hints: make hints manager more resilient to unexpected directory content
Currently, if the hints directory contains unexpected directories, Scylla
fails to start with an unhandled std::invalid_argument exception. Make the
manager ignore malformed files instead and try to proceed anyway.
Message-Id: <20181121134618.29936-2-gleb@scylladb.com>
2018-11-21 14:53:03 +00:00
Gleb Natapov
9433d02624 hints: add auxiliary function for scanning high level hints directory
We scan the hints directory in two places: to search for files to replay
and to search for directories to remove after resharding. The code that
translates a directory name to a shard is duplicated. It is simple now, so
this is not a big issue, but in case it grows it is better to have it in
one place.
Message-Id: <20181121134618.29936-1-gleb@scylladb.com>
2018-11-21 14:53:03 +00:00
Paweł Dziepak
4aa5d83590 Merge "Optimize sstable writing of the MC format" from Tomasz
"
Tested with perf_fast_forward from:

  github.com/tgrabiec/scylla.git perf_fast_forward-for-sst3-opt-write-v1

Using the following command line:

  build/release/tests/perf/perf_fast_forward_g --populate --sstable-format=mc \
     --data-directory /tmp/perf-mc --rows=10000000 -c1 -m4G \
     --datasets small-part

The average reported flush throughput was (stdev for the averages is around 4k):
  - for mc before the series: 367848 frag/s
  - for lc before the series: 463458 frag/s (= mc.before +25%)
  - for mc after the series: 429276 frag/s (= mc.before +16%)
  - for lc after the series: 466495 frag/s (= mc.before +26%)

Refs #3874.
"

* tag 'sst3-opt-write-v2' of github.com:tgrabiec/scylla:
  sstables: mc: Avoid serialization of promoted index when empty
  sstables: mc: Avoid double serialization of rows
  tests: sstable 3.x: Do not compare Statistics component
  utils: Introduce memory_data_sink
  schema: Optimize column count getters
  sstables: checksummed_file_data_sink_impl: Bypass output_stream
2018-11-21 13:11:40 +00:00
Tomasz Grabiec
049926bfb8 sstables: mc: Avoid serialization of promoted index when empty
calculate_write_size() adds some overhead, even if we're not going to
write anything.
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
0a9f5b563a sstables: mc: Avoid double serialization of rows
The old code was serializing the row twice: once to get the size of
its block on disk, which is needed to write the block length, and again
to actually write the block.

This patch avoids this by serializing once into a temporary buffer and
then appending that buffer to the data file writer.

I measured about 10% improvement in memtable flush throughput with
this for the small-part dataset in perf_fast_forward.
2018-11-21 14:04:27 +01:00
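The single-pass scheme described above can be sketched as follows — `row`, `serialize_row`, and `write_block` are toy stand-ins for the real writer, with a simplistic [length][payload] framing rather than the actual mc on-disk format:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct row { std::vector<uint32_t> cells; };

// Toy row encoding: raw little-endian cells appended to the buffer.
static void serialize_row(const row& r, std::string& out) {
    for (uint32_t c : r.cells) {
        out.append(reinterpret_cast<const char*>(&c), sizeof(c));
    }
}

// Serialize once into a temporary buffer, then emit [length][payload].
// The buffer's size gives the block length for free, so no second
// serialization pass is needed just to measure the block.
std::string write_block(const row& r) {
    std::string buf;
    serialize_row(r, buf);          // single serialization pass
    uint64_t len = buf.size();
    std::string out;
    out.append(reinterpret_cast<const char*>(&len), sizeof(len));
    out += buf;                     // append the buffer, do not re-serialize
    return out;
}
```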
Tomasz Grabiec
8f686af9af tests: sstable 3.x: Do not compare Statistics component
The Statistics component recorded in the test was generated using a
buggy version of Scylla, and is not correct. Exposed by fixing the bug
in the way statistics are generated.

Rather than comparing binary content, we should have explicit checks
for statistics.
2018-11-21 14:04:27 +01:00
Tomasz Grabiec
143fd6e1c2 utils: Introduce memory_data_sink 2018-11-21 14:04:27 +01:00
Tomasz Grabiec
789fac9884 schema: Optimize column count getters 2018-11-21 14:04:27 +01:00
Tomasz Grabiec
8e8b96c6ed sstables: checksummed_file_data_sink_impl: Bypass output_stream
We can avoid the data copying by switching from this:

  sink -> stream -> sink

to this:

  sink -> sink
2018-11-21 14:04:27 +01:00
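The sink -> sink arrangement can be sketched with hypothetical simplified interfaces (not Scylla's actual data_sink API): the checksumming sink updates its state over each buffer and forwards the same buffer onward, avoiding the extra copy an intermediate output_stream buffer would force:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Minimal sink interface; std::string stands in for a data buffer.
struct data_sink {
    virtual void put(std::string buf) = 0;
    virtual ~data_sink() = default;
};

// Terminal sink collecting everything in memory.
struct memory_sink : data_sink {
    std::string data;
    void put(std::string buf) override { data += std::move(buf); }
};

// Wrapping sink: checksums each buffer, then forwards it unchanged
// to the underlying sink -- no intermediate stream buffering.
struct checksummed_sink : data_sink {
    data_sink& out;
    uint32_t checksum = 0;           // toy additive checksum
    explicit checksummed_sink(data_sink& s) : out(s) {}
    void put(std::string buf) override {
        for (unsigned char c : buf) {
            checksum += c;
        }
        out.put(std::move(buf));     // forward the very same buffer
    }
};
```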
Avi Kivity
bb85a21a8f Merge "compress: Restore lz4 as default compressor" from Duarte
"
Enables sstable compression with LZ4 by default, which was the
long-time behavior until a regression turned off compression by
default.

Fixes #3926
"

* 'restore-default-compression/v2' of https://github.com/duarten/scylla:
  tests/cql_query_test: Assert default compression options
  compress: Restore lz4 as default compressor
  tests: Be explicit about absence of compression
2018-11-21 14:20:39 +02:00
Benny Halevy
76b1c184b7 conf: clean up cassandra references in scylla.yaml
Indicate the default Scylla directories, rather than Cassandra's.
Provide links to Scylla documentation where possible,
update links to Cassandra documentation otherwise.
Clean up a few typos.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181119141912.28830-1-bhalevy@scylladb.com>
2018-11-21 13:04:24 +02:00
Rafael Ávila de Espíndola
7fa7e9716d Mention scylla-tools-java and scylla-jmx in HACKING.md
I struggled a bit finding out why nodetool was not working, so it
might be a good idea to expand the documentation a bit.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181120233358.25859-1-espindola@scylladb.com>
2018-11-21 12:55:17 +02:00
Tomasz Grabiec
349c9f7a69 HACKING.md: Add a link to the slides about core dump debugging tools
Message-Id: <1542793207-1620-1-git-send-email-tgrabiec@scylladb.com>
2018-11-21 11:45:23 +02:00
Michael Munday
53fdde75f6 dht: use little endian byte order explicitly for token hash
This avoids a difference between little and big endian systems. We
now also calculate a full murmur hash for tokens with fewer than 8
bytes; however, in practice the token size is always 8.

Message-Id: <20181120214733.43800-1-mike.munday@ibm.com>
2018-11-21 11:44:29 +02:00
Michael Munday
360374cfde tests: fix compilation of partitioner_test with boost 1.68 on IBM Z
The boost multiprecision library that I am compiling against seems
to be missing an overload for the cast to a string. The easy
workaround seems to be to call str() directly instead.

This also fixes #3922.

Message-Id: <20181120215709.43939-1-mike.munday@ibm.com>
2018-11-21 11:43:42 +02:00
Duarte Nunes
9464fffc8c tests/cql_query_test: Assert default compression options
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:27 +00:00
Duarte Nunes
36dc9e3280 compress: Restore lz4 as default compressor
Fixes a regression introduced in
74758c87cd, where tables started to be
created without compression by default (before they were created with
lz4 by default).

Fixes #3926

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:27 +00:00
Duarte Nunes
5f64e34fcc tests: Be explicit about absence of compression
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-11-20 22:47:26 +00:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Takuya ASADA
42baf6a6f7 dist/ami: update packer
Update packer to latest version, 1.3.2.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110441.16284-2-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Takuya ASADA
b9a42e83ad dist/ami: enable AMI build log
To make it easier to debug AMI build errors, enable logging.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110441.16284-1-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Takuya ASADA
72411f95cb reloc/build_reloc.sh: find ninja-build after executed install-dependencies.sh
The build environment may not have ninja-build installed before running
install-dependencies.sh, so look for it after running the script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181031110737.17755-1-syuu@scylladb.com>
2018-11-20 21:29:57 +02:00
Avi Kivity
183c2369f3 Update seastar submodule
* seastar a44cedf...d59fcef (10):
  > dns: Set tcp output stream buffer size to zero explicitly
  > tests: add libc-ares to travis dependencies
  > tests: add dns_test to test suite
  > build: drop bundled c-ares package
  > prometheus: replace the instance label with an optional one
  > build: Refactor C++ dialect detection
  > build: add libatomic to install-depenencies.sh
  > core: use std::underlying_type for open_flags
  > core: introduce open_flags::operator&
  > core: Fix build for `gnu++14`
2018-11-20 21:29:57 +02:00
Tomasz Grabiec
57e25fa0f8 utils: phased_barrier: Make advance_and_await() have strong exception guarantees
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) because _gate will be
disengaged.

One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.

This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>
2018-11-20 16:15:12 +00:00
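The rearrangement can be sketched with a much-simplified barrier (hypothetical types, not seastar's phased_barrier): perform the allocation that may throw before touching member state, then commit with a nothrow move. If the allocation throws, the barrier is untouched and a retry is safe:

```cpp
#include <cassert>
#include <memory>

struct gate {
    int phase;
    explicit gate(int p) : phase(p) {}
};

struct phased_barrier {
    std::unique_ptr<gate> _gate = std::make_unique<gate>(0);

    void advance() {
        // 1. Allocate the new gate first; if this throws bad_alloc,
        //    _gate is still engaged and the object remains valid.
        auto new_gate = std::make_unique<gate>(_gate->phase + 1);
        // 2. Only now commit the state change (nothrow move assignment).
        _gate = std::move(new_gate);
    }

    int phase() const { return _gate->phase; }
};
```

The key property is that every operation preceding the commit point is side-effect-free on the object's members.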
Glauber Costa
9f403334c8 remove monitor if sstable write failed
In (almost) all SSTable write paths, we need to inform the monitor that
the write has failed as well. The monitor will remove the SSTable from
controller's tracking at that point.

Except there is one place where we are not doing that: streaming of big
mutations. Streaming of big mutations is an interesting use case, in
which it is done in 2 parts: if the writing of the SSTable fails right
away, then we do the correct thing.

But the SSTables are not committed at that point and the monitors are
still kept around with the SSTables until a later time, when they are
finally committed. Between those two points in time, it is possible that
the streaming code will detect a failure and manually call
fail_streaming_mutations(), which marks the SSTable for deletions. At
that point we should propagate that information to the monitor as well,
but we don't.

Fixes #3732 (hopefully)
Tests: unit (release)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181114213618.16789-1-glauber@scylladb.com>
2018-11-20 16:15:12 +00:00
Gleb Natapov
d144e6ceac messaging_service: enable port load balancing algorithm for RPC server
In a homogeneous cluster this will reduce the number of internal
cross-shard hops per request, since RPC calls will arrive at the correct
shard.

Message-Id: <20181118150817.GF2062@scylladb.com>
2018-11-20 16:15:12 +00:00
Michael Munday
b9a2f4a228 dht: fix byte ordered partitioner midpoint calculation
New versions of boost saturate the output of the convert_to method
so we need to mask the part we want to extract.

Updates #3922.

Message-Id: <20181116191441.35000-1-mike.munday@ibm.com>
2018-11-16 21:19:06 +02:00
Glauber Costa
c6811bd877 sstables: correctly parse estimated histograms
In commit a33f0d6, we changed the way we handle arrays during the write
and parse code to avoid reactor stalls. Some potentially big loops were
transformed into futurized loops, and also some calls to vector resizes
were replaced by a reserve + push_back idiom.

The latter broke parsing of the estimated histogram. The reason being
that the vectors that are used here are already initialized internally
by the estimated_histogram object. Therefore, when we push_back, we
don't fill the array all the way from index 0, but end up with a zeroed
beginning and only push back some of the elements we need.

We could revert this array to a resize() call. After all, the reason we
are using reserve + push_back is to avoid calling the constructor member
for each element, but we don't really expect the integer specialization
to do any of that.

However, to avoid confusion with future developers who may feel tempted
to convert this as well for the sake of consistency, it is safer to
just make sure these arrays are zeroed.

Fixes #3918

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181116130853.10473-1-glauber@scylladb.com>
2018-11-16 20:52:44 +02:00
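The bug can be reproduced with a toy parser — `parse_broken` and `parse_fixed` below are illustrative, not the actual estimated_histogram code. Because the buckets vector is already sized by its constructor, push_back appends after the pre-existing zeros instead of filling them:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// BUG version: appending to a pre-sized vector grows it past its
// intended size, leaving the zeroed prefix in place.
std::vector<int> parse_broken(const std::vector<int>& disk) {
    std::vector<int> buckets(disk.size(), 0); // pre-sized, as in the object
    for (int v : disk) {
        buckets.push_back(v);                 // appends after the zeros
    }
    return buckets;
}

// Fixed version: assign into the existing slots by index.
std::vector<int> parse_fixed(const std::vector<int>& disk) {
    std::vector<int> buckets(disk.size(), 0);
    for (size_t i = 0; i < disk.size(); ++i) {
        buckets[i] = disk[i];
    }
    return buckets;
}
```

The reserve + push_back idiom is only safe when the vector starts empty; here the constructor has already filled it.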
Avi Kivity
d708dabab9 doc: add reference to Linux' submitting-patches document
Since our development process is a derivative of Linux, almost everything there
is pertinent.

Message-Id: <20181115184037.5256-1-avi@scylladb.com>
2018-11-16 20:15:40 +02:00
Vladimir Krivopalov
759fbbd5f6 random_mutation_generator: Add row_marker to rows regardless of whether they're deleted.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <f55b91f1349f0e98def6b7ca9755b5ccf4f48a3e.1542308626.git.vladimir@scylladb.com>
2018-11-16 13:17:07 +01:00
Avi Kivity
6548a404b2 Remove patch file committed by mistake 2018-11-15 19:47:55 +02:00
Duarte Nunes
6fbf792777 db/view/view_builder: Don't timeout waiting for view to be built
Remove the timeout argument to
db::view::view_builder::wait_until_built(), a test-only function to
wait until a given materialized view has finished building.

This change is motivated by the fact that some tests running on slow
environments will timeout. Instead of incrementally increasing the
timeout, remove it completely since tests are already run under an
exterior timeout.

Fixes #3920

Tests: unit release(view_build_test, view_schema_test)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181115173902.19048-1-duarte@scylladb.com>
2018-11-15 19:41:43 +02:00
Amnon Heiman
25378916bc API: colummn_family.hh yield in map_reduce_column_families_locally
map_reduce_column_families_locally iterates over all tables (column
families) in a shard.

If the number of tables is big, it can cause latency spikes.

This patch replaces the current loop with do_for_each, allowing
preemption inside the loop.

Fixes #3886

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181115154825.23430-1-amnon@scylladb.com>
2018-11-15 18:58:23 +02:00
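The latency effect can be modeled with a toy function (not seastar's API): the yield interval bounds the longest stretch of uninterrupted work, and do_for_each corresponds to yielding after every element:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Models iterating n_tables items with a yield point every
// yield_every items. Returns the longest stretch of work done
// without giving the scheduler a chance to run other tasks --
// a stand-in for the worst-case reactor stall.
size_t longest_stretch(size_t n_tables, size_t yield_every) {
    size_t stretch = 0;
    size_t longest = 0;
    for (size_t i = 0; i < n_tables; ++i) {
        ++stretch;                     // process one table
        if (stretch >= yield_every) {  // yield point reached
            longest = std::max(longest, stretch);
            stretch = 0;               // other tasks may run here
        }
    }
    return std::max(longest, stretch);
}
```

A monolithic loop corresponds to `yield_every == n_tables` (one long stall), while a do_for_each-style loop corresponds to `yield_every == 1` (bounded latency regardless of table count).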
Nadav Har'El
45f05b06d2 view_complex_test: fix another ttl
In a previous patch I fixed most TTLs in the view_complex_test.cc tests
from low numbers to 100 seconds. I missed one. This one never caused
problems in practice, but for good form, let's fix it too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115160234.26478-1-nyh@scylladb.com>
2018-11-15 18:03:28 +02:00
Nadav Har'El
78ed7d6d0c Materialized Views and Secondary Index: no longer experimental
After this patch, the Materialized Views and Secondary Index features
are considered generally-available and no longer require passing an
explicit "--experimental=on" flag to Scylla.

The "--experimental=on" flag and the db::config::check_experimental()
function remain unused, as we graduated the only two features which used
this flag. However, we leave the support for experimental features in
the code, to make it easier to add new experimental features in the future.
Another reason to leave the command-line parameter behind is so existing
scripts that still use it will not break.

Fixes #3917

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115144456.25518-1-nyh@scylladb.com>
2018-11-15 17:59:27 +02:00
Vladimir Krivopalov
51afb1d8bd tests: Generate deleted rows and shadowable tombstones in random_mutation_generator.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <77e956890264023227e07cc6d295df870d0a5af2.1542295208.git.vladimir@scylladb.com>
2018-11-15 16:26:07 +01:00
Avi Kivity
0216f49bb0 Merge "Add filtering support for CONTAINS" from Piotr
"
This series enables filtering support for CONTAINS restriction.
"

* 'enable_filtering_for_contains_2' of https://github.com/psarna/scylla:
  tests: add CONTAINS test case to filtering tests
  cql3: enable filtering for CONTAINS restriction
  cql3: add is_satisfied_by(bytes_view) for CONTAINS
2018-11-15 16:49:29 +02:00
Nadav Har'El
4108458b8e view_complex_test: increase low ttl which may fail test on busy machine
Several of the tests in tests/view_complex_test.cc set a cell with a
TTL, and then skip time ahead artificially with forward_jump_clocks(),
to go past the TTL time and check the cell disappeared as expected.

The TTLs chosen for these tests were arbitrary numbers - some had 3 seconds,
some 5 seconds, and some 60 seconds. The actual number doesn't matter - it
is completely artificial (we move the clock with forward_jump_clocks() and
never really wait for that amount of time) and could very well be a million
seconds. But *low* numbers, like the 3 seconds, present a problem on extremely
overcommitted test machines. Our eventually() function already allows for
the possibility that things can hang for up to 8 seconds, but with a 3-second
TTL, we can find ourselves with data being expired and the test failing just
after 3 seconds of wall time have passed - while the test intended that the
data would expire only when we explicitly call forward_jump_clocks().

So this patch changes all the TTLs in this test to be the same high number -
100 seconds. This hopefully fixes #3918.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181115125607.22647-1-nyh@scylladb.com>
2018-11-15 15:34:08 +02:00
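The timing constraint behind this fix can be sketched numerically. The 8-second retry budget for eventually() is taken from the commit message above; the helper below is purely illustrative, not Scylla code.

```python
# Illustrative only: a TTL is safe for these tests when it comfortably
# exceeds the wall-clock time eventually() may spend retrying (up to 8 s
# per the commit message), so the cell cannot expire for real before the
# test jumps the clock with forward_jump_clocks().
EVENTUALLY_MAX_WAIT_S = 8

def ttl_is_safe(ttl_s, retry_budget_s=EVENTUALLY_MAX_WAIT_S):
    return ttl_s > retry_budget_s

old_ttls = [3, 5, 60]                      # the arbitrary TTLs used before
print([ttl_is_safe(t) for t in old_ttls])  # [False, False, True]
print(ttl_is_safe(100))                    # True: the new uniform TTL
```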
Piotr Jastrzebski
411437f320 Fix format string in mutation_partition::operator<<
fmt does not allow bool values for :d, and the previous
format string was resulting in:

fmt::v5::format_error: invalid type specifier

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <3980a3cdb903263e29689b1c6cd24e3592826fe0.1542284205.git.piotr@scylladb.com>
2018-11-15 12:22:10 +00:00
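fmt's behavior here parallels Python's str.format: a presentation type that doesn't match the argument raises only at format time, not at compile time. A minimal Python analogue (not the Scylla/fmt code itself):

```python
# Python analogue of the fmt bug: an argument incompatible with the
# ':d' presentation type raises only when the string is formatted.
try:
    "{:d}".format("not-an-int")   # str rejects the 'd' format code
    err = None
except ValueError as e:
    err = str(e)
print(err)
```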
Yannis Zarkadas
d292d0c78d dist/redhat: extend docker entrypoint with more cmd flags
When running the Docker image, some extra options need to be exposed
to provide extended functionality at startup. The flags
added by this commit are:

 - cluster-name: name of the Scylla cluster. cluster_name option in
scylla.yaml.
 - rpc-address: IP address for client connections (CQL). rpc_address
option in scylla.yaml.
 - endpoint-snitch: The snitch used to discover the cluster topology.
endpoint_snitch option in scylla.yaml.
 - replace-address-first-boot: Replace a Scylla node by its IP.
replace_address_first_boot option in scylla.yaml.

Signed-off-by: Yannis Zarkadas <yanniszarkadas@gmail.com>
[ penberg@scylladb.com: fix up merge conflicts ]
Message-Id: <20181108234212.19969-2-yanniszarkadas@gmail.com>
2018-11-15 09:07:52 +02:00
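As the list above describes, each new entrypoint flag simply sets the corresponding scylla.yaml option. The mapping below is reproduced for illustration, with flag spellings taken from the commit message:

```python
# Each new docker entrypoint flag maps one-to-one onto a scylla.yaml
# option (mapping taken from the commit message above).
FLAG_TO_YAML_OPTION = {
    "--cluster-name": "cluster_name",
    "--rpc-address": "rpc_address",
    "--endpoint-snitch": "endpoint_snitch",
    "--replace-address-first-boot": "replace_address_first_boot",
}
for flag, option in FLAG_TO_YAML_OPTION.items():
    print(f"{flag} -> {option}")
```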
Alexys Jacob
cd9d01cd7e test.py: coding style fixes
test.py:26:1: F401 'signal' imported but unused
test.py:27:1: F401 'shlex' imported but unused
test.py:28:1: F401 'threading' imported but unused
test.py:173:1: E305 expected 2 blank lines after class or function definition,
found 1
test.py:181:34: E241 multiple spaces after ','
test.py:183:34: E241 multiple spaces after ','
test.py:209:24: E222 multiple spaces after operator
test.py:240:5: E301 expected 1 blank line, found 0
test.py:249:23: W504 line break after binary operator
test.py:254:9: E306 expected 1 blank line before a nested definition, found 0
test.py:263:13: F841 local variable 'out' is assigned to but never used
test.py:264:33: E128 continuation line under-indented for visual indent
test.py:265:33: E128 continuation line under-indented for visual indent
test.py:266:33: E128 continuation line under-indented for visual indent
test.py:274:64: F821 undefined name 'e'
test.py:278:53: F821 undefined name 'e'

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104115255.22547-1-ultrabug@gentoo.org>
2018-11-14 19:25:14 +02:00
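A few of the flagged patterns, shown alongside the style the checks ask for. These are generic illustrations of the error codes, not the actual test.py code:

```python
# Illustrations of some findings above (not the real test.py code).

# E711: comparison to None should be 'is None', not '== None'
x = None
is_none = x is None          # preferred over: x == None

# F841: a local assigned but never used should be dropped or used
def run():
    out = "result"           # previously unused; now returned
    return out

# E722: catch a named exception instead of a bare 'except:'
try:
    int("oops")
except ValueError:
    parsed = None

print(is_none, run(), parsed)
```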
Alexys Jacob
e76a1085d3 scylla-gdb.py: coding style fixes
scylla-gdb.py:1:11: E401 multiple imports on one line
scylla-gdb.py:5:1: F811 redefinition of unused 're' from line 2
scylla-gdb.py:10:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:19:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:24:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:30:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:39:9: E722 do not use bare 'except'
scylla-gdb.py:47:33: E711 comparison to None should be 'if cond is None:'
scylla-gdb.py:63:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:90:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:115:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:139:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:161:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:184:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:204:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:210:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:214:5: E301 expected 1 blank line, found 0
scylla-gdb.py:221:5: E301 expected 1 blank line, found 0
scylla-gdb.py:224:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:252:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:267:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:284:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:300:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:314:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:318:5: E301 expected 1 blank line, found 0
scylla-gdb.py:322:5: E301 expected 1 blank line, found 0
scylla-gdb.py:337:1: E305 expected 2 blank lines after class or function
definition, found 1
scylla-gdb.py:339:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:342:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:345:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:348:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:352:129: E202 whitespace before ')'
scylla-gdb.py:361:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:363:129: E202 whitespace before ')'
scylla-gdb.py:371:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:375:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:378:5: E301 expected 1 blank line, found 0
scylla-gdb.py:383:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:386:5: E301 expected 1 blank line, found 0
scylla-gdb.py:393:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:396:5: E301 expected 1 blank line, found 0
scylla-gdb.py:407:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:410:5: E301 expected 1 blank line, found 0
scylla-gdb.py:412:9: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:439:26: E703 statement ends with a semicolon
scylla-gdb.py:462:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:500:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:506:5: E722 do not use bare 'except'
scylla-gdb.py:516:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:518:18: E271 multiple spaces after keyword
scylla-gdb.py:522:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:530:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:533:5: E301 expected 1 blank line, found 0
scylla-gdb.py:537:13: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:547:9: E722 do not use bare 'except'
scylla-gdb.py:550:26: E261 at least two spaces before inline comment
scylla-gdb.py:568:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:571:5: E301 expected 1 blank line, found 0
scylla-gdb.py:577:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:577:39: E226 missing whitespace around arithmetic operator
scylla-gdb.py:583:15: E128 continuation line under-indented for visual indent
scylla-gdb.py:596:19: E128 continuation line under-indented for visual indent
scylla-gdb.py:609:82: E227 missing whitespace around bitwise or shift operator
scylla-gdb.py:609:90: E226 missing whitespace around arithmetic operator
scylla-gdb.py:609:113: E226 missing whitespace around arithmetic operator
scylla-gdb.py:613:1: E303 too many blank lines (3)
scylla-gdb.py:645:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:659:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:671:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:678:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:679:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:680:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:681:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:682:9: E128 continuation line under-indented for visual indent
scylla-gdb.py:708:12: E111 indentation is not a multiple of four
scylla-gdb.py:721:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:723:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:725:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:727:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:729:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:748:33: E261 at least two spaces before inline comment
scylla-gdb.py:770:17: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:795:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:796:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:797:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:798:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:800:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:807:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:814:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:820:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:823:5: E301 expected 1 blank line, found 0
scylla-gdb.py:845:35: E703 statement ends with a semicolon
scylla-gdb.py:865:91: E703 statement ends with a semicolon
scylla-gdb.py:896:9: F841 local variable 'segment_size' is assigned to but
never used
scylla-gdb.py:904:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:907:5: E301 expected 1 blank line, found 0
scylla-gdb.py:915:73: E128 continuation line under-indented for visual indent
scylla-gdb.py:916:73: E128 continuation line under-indented for visual indent
scylla-gdb.py:917:73: E126 continuation line over-indented for hanging indent
scylla-gdb.py:922:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:925:5: E301 expected 1 blank line, found 0
scylla-gdb.py:933:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:934:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:934:49: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:934:51: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:934:74: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:934:76: E251 unexpected spaces around keyword / parameter equals
scylla-gdb.py:940:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:941:13: E128 continuation line under-indented for visual indent
scylla-gdb.py:949:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:950:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:951:17: E128 continuation line under-indented for visual indent
scylla-gdb.py:952:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:953:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:954:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:955:21: E128 continuation line under-indented for visual indent
scylla-gdb.py:958:1: E305 expected 2 blank lines after class or function
definition, found 1
scylla-gdb.py:958:11: E261 at least two spaces before inline comment
scylla-gdb.py:959:1: E302 expected 2 blank lines, found 0
scylla-gdb.py:971:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:989:5: E301 expected 1 blank line, found 0
scylla-gdb.py:993:5: E301 expected 1 blank line, found 0
scylla-gdb.py:995:5: E301 expected 1 blank line, found 0
scylla-gdb.py:997:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1001:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1005:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1029:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1034:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1037:46: E128 continuation line under-indented for visual indent
scylla-gdb.py:1057:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1060:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1071:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1076:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1084:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1093:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1096:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1101:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1104:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1116:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1119:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1123:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1126:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1132:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1135:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1138:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1141:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1147:15: E241 multiple spaces after ':'
scylla-gdb.py:1148:15: E241 multiple spaces after ':'
scylla-gdb.py:1149:15: E241 multiple spaces after ':'
scylla-gdb.py:1150:15: E241 multiple spaces after ':'
scylla-gdb.py:1151:15: E241 multiple spaces after ':'
scylla-gdb.py:1152:15: E241 multiple spaces after ':'
scylla-gdb.py:1153:15: E241 multiple spaces after ':'
scylla-gdb.py:1154:15: E241 multiple spaces after ':'
scylla-gdb.py:1170:20: E221 multiple spaces before operator
scylla-gdb.py:1191:40: E226 missing whitespace around arithmetic operator
scylla-gdb.py:1191:59: E226 missing whitespace around arithmetic operator
scylla-gdb.py:1225:1: E305 expected 2 blank lines after class or function
definition, found 1
scylla-gdb.py:1227:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1233:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1236:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1240:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1278:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1281:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1284:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1287:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1293:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1296:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1320:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1323:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1355:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1362:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1383:1: E302 expected 2 blank lines, found 1
scylla-gdb.py:1386:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1388:9: E306 expected 1 blank line before a nested definition,
found 0
scylla-gdb.py:1397:13: F841 local variable 'selector' is assigned to but never
used
scylla-gdb.py:1446:5: E301 expected 1 blank line, found 0
scylla-gdb.py:1477:5: E301 expected 1 blank line, found 0

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104113603.1111-1-ultrabug@gentoo.org>
2018-11-14 19:25:14 +02:00
Alexys Jacob
e58eb6d6ab idl-compiler.py: coding style fixes
idl-compiler.py:22:1: F401 'json' imported but unused
idl-compiler.py:23:1: F401 'sys' imported but unused
idl-compiler.py:24:1: F401 're' imported but unused
idl-compiler.py:25:1: F401 'glob' imported but unused
idl-compiler.py:27:1: F401 'os' imported but unused
idl-compiler.py:54:1: F811 redefinition of unused 'reindent' from line 33
idl-compiler.py:57:1: E302 expected 2 blank lines, found 1
idl-compiler.py:61:1: E302 expected 2 blank lines, found 1
idl-compiler.py:66:1: E302 expected 2 blank lines, found 1
idl-compiler.py:96:1: E302 expected 2 blank lines, found 1
idl-compiler.py:160:1: E302 expected 2 blank lines, found 1
idl-compiler.py:163:1: E302 expected 2 blank lines, found 1
idl-compiler.py:166:1: E302 expected 2 blank lines, found 1
idl-compiler.py:172:1: E302 expected 2 blank lines, found 1
idl-compiler.py:176:1: E302 expected 2 blank lines, found 1
idl-compiler.py:176:47: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:176:49: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:191:24: E203 whitespace before ':'
idl-compiler.py:191:43: E203 whitespace before ':'
idl-compiler.py:191:67: E203 whitespace before ':'
idl-compiler.py:191:84: E202 whitespace before '}'
idl-compiler.py:195:1: E302 expected 2 blank lines, found 1
idl-compiler.py:195:45: E203 whitespace before ','
idl-compiler.py:195:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:195:71: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:198:28: E225 missing whitespace around operator
idl-compiler.py:198:40: E225 missing whitespace around operator
idl-compiler.py:198:43: E272 multiple spaces before keyword
idl-compiler.py:212:25: E203 whitespace before ':'
idl-compiler.py:212:45: E203 whitespace before ':'
idl-compiler.py:212:100: E203 whitespace before ':'
idl-compiler.py:218:1: E302 expected 2 blank lines, found 1
idl-compiler.py:225:1: E302 expected 2 blank lines, found 1
idl-compiler.py:226:11: E271 multiple spaces after keyword
idl-compiler.py:228:1: E302 expected 2 blank lines, found 1
idl-compiler.py:235:1: E302 expected 2 blank lines, found 1
idl-compiler.py:238:1: E302 expected 2 blank lines, found 1
idl-compiler.py:241:5: E722 do not use bare 'except'
idl-compiler.py:243:1: E305 expected 2 blank lines after class or function
definition, found 0
idl-compiler.py:245:1: E302 expected 2 blank lines, found 1
idl-compiler.py:250:25: E231 missing whitespace after ','
idl-compiler.py:253:1: E302 expected 2 blank lines, found 1
idl-compiler.py:256:1: E302 expected 2 blank lines, found 1
idl-compiler.py:263:1: E302 expected 2 blank lines, found 1
idl-compiler.py:266:1: E302 expected 2 blank lines, found 1
idl-compiler.py:267:75: E225 missing whitespace around operator
idl-compiler.py:269:1: E302 expected 2 blank lines, found 1
idl-compiler.py:272:1: E302 expected 2 blank lines, found 1
idl-compiler.py:275:1: E302 expected 2 blank lines, found 1
idl-compiler.py:278:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:280:1: E302 expected 2 blank lines, found 1
idl-compiler.py:283:1: E302 expected 2 blank lines, found 1
idl-compiler.py:286:1: E302 expected 2 blank lines, found 1
idl-compiler.py:288:1: E302 expected 2 blank lines, found 0
idl-compiler.py:293:1: E302 expected 2 blank lines, found 1
idl-compiler.py:294:20: E203 whitespace before ':'
idl-compiler.py:294:22: E241 multiple spaces after ':'
idl-compiler.py:294:51: E203 whitespace before ':'
idl-compiler.py:294:55: E202 whitespace before '}'
idl-compiler.py:296:1: E302 expected 2 blank lines, found 1
idl-compiler.py:298:23: E203 whitespace before ':'
idl-compiler.py:300:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:301:1: E302 expected 2 blank lines, found 0
idl-compiler.py:304:1: E302 expected 2 blank lines, found 1
idl-compiler.py:304:45: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:304:47: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:311:67: E202 whitespace before '}'
idl-compiler.py:314:74: E241 multiple spaces after ':'
idl-compiler.py:316:114: E241 multiple spaces after ':'
idl-compiler.py:316:129: E203 whitespace before ':'
idl-compiler.py:326:1: E302 expected 2 blank lines, found 1
idl-compiler.py:328:27: E231 missing whitespace after ','
idl-compiler.py:328:34: E225 missing whitespace around operator
idl-compiler.py:330:1: E302 expected 2 blank lines, found 1
idl-compiler.py:332:5: F841 local variable 'typ' is assigned to but never used
idl-compiler.py:348:63: E202 whitespace before '}'
idl-compiler.py:352:1: E302 expected 2 blank lines, found 1
idl-compiler.py:353:21: E231 missing whitespace after ','
idl-compiler.py:368:30: E203 whitespace before ':'
idl-compiler.py:374:30: E203 whitespace before ':'
idl-compiler.py:411:57: E203 whitespace before ':'
idl-compiler.py:413:1: E302 expected 2 blank lines, found 1
idl-compiler.py:413:64: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:66: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:80: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:82: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:98: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:413:100: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:415:51: E225 missing whitespace around operator
idl-compiler.py:417:57: E225 missing whitespace around operator
idl-compiler.py:448:1: E302 expected 2 blank lines, found 1
idl-compiler.py:448:60: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:62: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:76: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:78: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:94: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:448:96: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:451:51: E225 missing whitespace around operator
idl-compiler.py:453:57: E225 missing whitespace around operator
idl-compiler.py:455:30: E231 missing whitespace after ','
idl-compiler.py:477:1: E302 expected 2 blank lines, found 1
idl-compiler.py:477:48: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:477:50: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:477:67: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:477:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:484:24: E222 multiple spaces after operator
idl-compiler.py:488:74: E203 whitespace before ':'
idl-compiler.py:498:20: E222 multiple spaces after operator
idl-compiler.py:507:68: E203 whitespace before ':'
idl-compiler.py:507:88: E203 whitespace before ':'
idl-compiler.py:514:87: E231 missing whitespace after ','
idl-compiler.py:520:14: E211 whitespace before '('
idl-compiler.py:521:15: E703 statement ends with a semicolon
idl-compiler.py:523:1: E302 expected 2 blank lines, found 1
idl-compiler.py:540:47: E231 missing whitespace after ':'
idl-compiler.py:542:1: E302 expected 2 blank lines, found 1
idl-compiler.py:542:47: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:542:49: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:542:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:542:71: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:547:24: E222 multiple spaces after operator
idl-compiler.py:553:47: E231 missing whitespace after ':'
idl-compiler.py:558:43: E231 missing whitespace after ':'
idl-compiler.py:560:1: E302 expected 2 blank lines, found 1
idl-compiler.py:564:1: E302 expected 2 blank lines, found 1
idl-compiler.py:564:82: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:564:84: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:564:105: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:564:107: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:573:21: E222 multiple spaces after operator
idl-compiler.py:576:25: E222 multiple spaces after operator
idl-compiler.py:577:13: F841 local variable 'sate' is assigned to but never
used
idl-compiler.py:584:66: E203 whitespace before ':'
idl-compiler.py:589:66: E203 whitespace before ':'
idl-compiler.py:589:89: E203 whitespace before ':'
idl-compiler.py:589:113: E203 whitespace before ':'
idl-compiler.py:600:48: E203 whitespace before ':'
idl-compiler.py:600:68: E203 whitespace before ':'
idl-compiler.py:602:1: E302 expected 2 blank lines, found 1
idl-compiler.py:602:1: F811 redefinition of unused 'add_vector_node' from line
330
idl-compiler.py:604:38: E231 missing whitespace after ','
idl-compiler.py:604:59: E202 whitespace before ')'
idl-compiler.py:607:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:609:1: E302 expected 2 blank lines, found 1
idl-compiler.py:615:39: E231 missing whitespace after ','
idl-compiler.py:622:1: E302 expected 2 blank lines, found 1
idl-compiler.py:630:46: E203 whitespace before ':'
idl-compiler.py:637:33: E231 missing whitespace after ':'
idl-compiler.py:640:90: E203 whitespace before ':'
idl-compiler.py:641:13: F841 local variable 'vr' is assigned to but never used
idl-compiler.py:642:1: E305 expected 2 blank lines after class or function
definition, found 0
idl-compiler.py:644:1: E302 expected 2 blank lines, found 1
idl-compiler.py:657:1: E302 expected 2 blank lines, found 1
idl-compiler.py:657:51: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:657:53: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:657:67: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:657:69: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:660:5: E265 block comment should start with '# '
idl-compiler.py:679:16: E272 multiple spaces before keyword
idl-compiler.py:692:56: E271 multiple spaces after keyword
idl-compiler.py:695:5: F841 local variable 'is_param_vector' is assigned to
but never used
idl-compiler.py:699:1: E302 expected 2 blank lines, found 1
idl-compiler.py:699:56: E202 whitespace before ')'
idl-compiler.py:711:1: E302 expected 2 blank lines, found 1
idl-compiler.py:719:26: E201 whitespace after '{'
idl-compiler.py:730:39: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:730:41: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:733:1: E302 expected 2 blank lines, found 1
idl-compiler.py:735:21: E225 missing whitespace around operator
idl-compiler.py:738:1: E302 expected 2 blank lines, found 1
idl-compiler.py:747:1: E305 expected 2 blank lines after class or function
definition, found 1
idl-compiler.py:749:1: E302 expected 2 blank lines, found 1
idl-compiler.py:767:17: E211 whitespace before '('
idl-compiler.py:767:26: E203 whitespace before ':'
idl-compiler.py:770:5: E303 too many blank lines (2)
idl-compiler.py:777:20: E211 whitespace before '('
idl-compiler.py:777:29: E203 whitespace before ':'
idl-compiler.py:783:28: E203 whitespace before ':'
idl-compiler.py:783:44: E203 whitespace before ':'
idl-compiler.py:783:82: E203 whitespace before ':'
idl-compiler.py:786:1: E302 expected 2 blank lines, found 1
idl-compiler.py:794:28: E203 whitespace before ':'
idl-compiler.py:802:33: E203 whitespace before ':'
idl-compiler.py:815:21: E126 continuation line over-indented for hanging
indent
idl-compiler.py:815:28: E203 whitespace before ':'
idl-compiler.py:815:50: E203 whitespace before ':'
idl-compiler.py:817:82: E203 whitespace before ':'
idl-compiler.py:817:104: E203 whitespace before ':'
idl-compiler.py:827:33: E203 whitespace before ':'
idl-compiler.py:827:48: E203 whitespace before ':'
idl-compiler.py:827:68: E203 whitespace before ':'
idl-compiler.py:827:84: E203 whitespace before ':'
idl-compiler.py:827:100: E203 whitespace before ':'
idl-compiler.py:859:24: E203 whitespace before ':'
idl-compiler.py:859:58: E203 whitespace before ':'
idl-compiler.py:859:78: E203 whitespace before ':'
idl-compiler.py:861:1: E302 expected 2 blank lines, found 1
idl-compiler.py:865:1: E302 expected 2 blank lines, found 1
idl-compiler.py:876:1: E302 expected 2 blank lines, found 1
idl-compiler.py:876:71: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:876:73: E251 unexpected spaces around keyword / parameter
equals
idl-compiler.py:883:21: E222 multiple spaces after operator
idl-compiler.py:884:28: E225 missing whitespace around operator
idl-compiler.py:884:46: E225 missing whitespace around operator
idl-compiler.py:884:49: E272 multiple spaces before keyword
idl-compiler.py:904:86: E203 whitespace before ':'
idl-compiler.py:904:107: E203 whitespace before ':'
idl-compiler.py:906:81: E203 whitespace before ':'
idl-compiler.py:906:106: E203 whitespace before ':'
idl-compiler.py:906:124: E203 whitespace before ':'
idl-compiler.py:906:143: E203 whitespace before ':'
idl-compiler.py:911:49: E203 whitespace before ':'
idl-compiler.py:911:69: E203 whitespace before ':'
idl-compiler.py:911:93: E203 whitespace before ':'
idl-compiler.py:918:85: E203 whitespace before ':'
idl-compiler.py:918:108: E203 whitespace before ':'
idl-compiler.py:918:151: E203 whitespace before ':'
idl-compiler.py:922:62: E203 whitespace before ':'
idl-compiler.py:922:90: E203 whitespace before ':'
idl-compiler.py:925:82: E203 whitespace before ':'
idl-compiler.py:925:110: E203 whitespace before ':'
idl-compiler.py:940:70: E203 whitespace before ':'
idl-compiler.py:940:128: E203 whitespace before ':'
idl-compiler.py:942:110: E203 whitespace before ':'
idl-compiler.py:942:168: E203 whitespace before ':'
idl-compiler.py:948:25: E203 whitespace before ':'
idl-compiler.py:948:75: E203 whitespace before ':'
idl-compiler.py:954:78: E203 whitespace before ':'
idl-compiler.py:954:101: E203 whitespace before ':'
idl-compiler.py:954:144: E203 whitespace before ':'
idl-compiler.py:957:62: E203 whitespace before ':'
idl-compiler.py:957:90: E203 whitespace before ':'
idl-compiler.py:969:13: E271 multiple spaces after keyword
idl-compiler.py:971:13: E271 multiple spaces after keyword
idl-compiler.py:976:1: E302 expected 2 blank lines, found 1
idl-compiler.py:987:1: E302 expected 2 blank lines, found 1
idl-compiler.py:1016:1: E302 expected 2 blank lines, found 1
idl-compiler.py:1023:42: E225 missing whitespace around operator
idl-compiler.py:1024:79: E225 missing whitespace around operator
idl-compiler.py:1027:1: E305 expected 2 blank lines after class or function
definition, found 0

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104112308.19409-1-ultrabug@gentoo.org>
2018-11-14 19:25:13 +02:00
Alexys Jacob
0cf480aad0 gen_segmented_compress_params.py: coding style fixes
gen_segmented_compress_params.py:52:47: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:56:64: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:60:36: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:60:48: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:70:35: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:70:48: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:99:43: E226 missing whitespace around
arithmetic operator
gen_segmented_compress_params.py:106:18: E225 missing whitespace around
operator
gen_segmented_compress_params.py:120:5: E303 too many blank lines (2)
gen_segmented_compress_params.py:200:30: E261 at least two spaces before
inline comment
gen_segmented_compress_params.py:200:31: E262 inline comment should start with
'# '
gen_segmented_compress_params.py:218:76: E261 at least two spaces before
inline comment
gen_segmented_compress_params.py:219:59: E703 statement ends with a semicolon
gen_segmented_compress_params.py:219:60: E261 at least two spaces before
inline comment

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104115753.4701-1-ultrabug@gentoo.org>
2018-11-14 19:25:12 +02:00
Alexys Jacob
43a04ad693 fix_system_distributed_tables.py: coding style fixes
fix_system_distributed_tables.py:28:20: E203 whitespace before ':'
fix_system_distributed_tables.py:29:20: E203 whitespace before ':'
fix_system_distributed_tables.py:30:20: E203 whitespace before ':'
fix_system_distributed_tables.py:31:20: E203 whitespace before ':'
fix_system_distributed_tables.py:33:20: E203 whitespace before ':'
fix_system_distributed_tables.py:34:23: E203 whitespace before ':'
fix_system_distributed_tables.py:35:23: E203 whitespace before ':'
fix_system_distributed_tables.py:39:20: E203 whitespace before ':'
fix_system_distributed_tables.py:40:20: E203 whitespace before ':'
fix_system_distributed_tables.py:41:20: E203 whitespace before ':'
fix_system_distributed_tables.py:42:20: E203 whitespace before ':'
fix_system_distributed_tables.py:43:20: E203 whitespace before ':'
fix_system_distributed_tables.py:44:20: E203 whitespace before ':'
fix_system_distributed_tables.py:45:20: E203 whitespace before ':'
fix_system_distributed_tables.py:46:20: E203 whitespace before ':'
fix_system_distributed_tables.py:47:20: E203 whitespace before ':'
fix_system_distributed_tables.py:48:20: E203 whitespace before ':'
fix_system_distributed_tables.py:52:20: E203 whitespace before ':'
fix_system_distributed_tables.py:53:20: E203 whitespace before ':'
fix_system_distributed_tables.py:54:20: E203 whitespace before ':'
fix_system_distributed_tables.py:55:20: E203 whitespace before ':'
fix_system_distributed_tables.py:56:20: E203 whitespace before ':'
fix_system_distributed_tables.py:57:20: E203 whitespace before ':'
fix_system_distributed_tables.py:58:20: E203 whitespace before ':'
fix_system_distributed_tables.py:59:20: E203 whitespace before ':'
fix_system_distributed_tables.py:60:20: E203 whitespace before ':'
fix_system_distributed_tables.py:61:20: E203 whitespace before ':'
fix_system_distributed_tables.py:62:20: E203 whitespace before ':'
fix_system_distributed_tables.py:66:19: E203 whitespace before ':'
fix_system_distributed_tables.py:67:19: E203 whitespace before ':'
fix_system_distributed_tables.py:72:19: E203 whitespace before ':'
fix_system_distributed_tables.py:73:19: E203 whitespace before ':'
fix_system_distributed_tables.py:74:19: E203 whitespace before ':'
fix_system_distributed_tables.py:78:19: E203 whitespace before ':'
fix_system_distributed_tables.py:79:19: E203 whitespace before ':'
fix_system_distributed_tables.py:80:19: E203 whitespace before ':'
fix_system_distributed_tables.py:84:19: E203 whitespace before ':'
fix_system_distributed_tables.py:85:19: E203 whitespace before ':'
fix_system_distributed_tables.py:89:19: E203 whitespace before ':'
fix_system_distributed_tables.py:90:19: E203 whitespace before ':'
fix_system_distributed_tables.py:91:19: E203 whitespace before ':'
fix_system_distributed_tables.py:95:22: E203 whitespace before ':'
fix_system_distributed_tables.py:96:22: E203 whitespace before ':'
fix_system_distributed_tables.py:99:1: E302 expected 2 blank lines, found 0
fix_system_distributed_tables.py:103:72: E201 whitespace after '['
fix_system_distributed_tables.py:103:82: E202 whitespace before ']'
fix_system_distributed_tables.py:105:43: E201 whitespace after '['
fix_system_distributed_tables.py:105:53: E202 whitespace before ']'
fix_system_distributed_tables.py:111:16: E713 test for membership should be
'not in'
fix_system_distributed_tables.py:118:20: E713 test for membership should be
'not in'
fix_system_distributed_tables.py:135:25: E722 do not use bare 'except'
fix_system_distributed_tables.py:138:5: E722 do not use bare 'except'
fix_system_distributed_tables.py:144:1: E305 expected 2 blank lines after
class or function definition, found 0
fix_system_distributed_tables.py:145:47: E251 unexpected spaces around keyword
/ parameter equals
fix_system_distributed_tables.py:145:49: E251 unexpected spaces around keyword
/ parameter equals
fix_system_distributed_tables.py:160:1: W391 blank line at end of file
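The remaining codes in this listing follow the same pattern; a hypothetical sketch of each class of fix (the dict and list contents are illustrative, not from fix_system_distributed_tables.py):

```python
# Hypothetical examples of the pycodestyle fixes reported above;
# the data is illustrative, not taken from the actual script.

# E203/E201/E202 -- no whitespace before ':' or just inside brackets:
tables = {"view_build_status": "ks"}   # fixed: was {"view_build_status" : "ks"}
cols = ["host_id", "status"]           # fixed: was [ "host_id", "status" ]

# E713 -- prefer 'not in' over 'not x in y':
missing = "token" not in cols          # fixed: was 'not "token" in cols'

# E722 -- never use a bare 'except'; name the exception:
try:
    value = int("oops")
except ValueError:                     # fixed: was 'except:'
    value = None
```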

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104113001.22783-1-ultrabug@gentoo.org>
2018-11-14 19:25:12 +02:00
Alexys Jacob
c9e3b739ae dist/docker/redhat/scyllasetup.py: coding style fixes
dist/docker/redhat/scyllasetup.py:6:1: E302 expected 2 blank lines, found 1
dist/docker/redhat/scyllasetup.py:41:21: E128 continuation line under-indented for visual indent
dist/docker/redhat/scyllasetup.py:65:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:65:51: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:67:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:67:45: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:69:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:69:42: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:79:18: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:79:42: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:80:39: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:81:70: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:84:48: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:84:70: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:86:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:86:53: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:86:78: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:89:42: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:89:58: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:92:44: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:92:63: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:95:41: E225 missing whitespace around operator
dist/docker/redhat/scyllasetup.py:95:57: E202 whitespace before ']'
dist/docker/redhat/scyllasetup.py:98:22: E201 whitespace after '['
dist/docker/redhat/scyllasetup.py:98:42: E202 whitespace before ']'

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104110913.13796-1-ultrabug@gentoo.org>
2018-11-14 19:25:11 +02:00
Alexys Jacob
1585983fc9 dist/docker/redhat: coding style fixes
dist/docker/redhat/docker-entrypoint.py:20:1: E722 do not use bare 'except'
dist/docker/redhat/commandlineparser.py:13:13: E128 continuation line
under-indented for visual indent

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104120134.9598-1-ultrabug@gentoo.org>
2018-11-14 19:25:10 +02:00
Alexys Jacob
c24e0e5599 dist/common/scripts/scylla_util.py: coding style fixes
dist/common/scripts/scylla_util.py:388:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:414:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:418:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:453:1: E302 expected 2 blank lines, found 1
dist/common/scripts/scylla_util.py:468:5: E722 do not use bare 'except'
dist/common/scripts/scylla_util.py:472:1: E302 expected 2 blank lines, found 1

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181104120832.11273-1-ultrabug@gentoo.org>
2018-11-14 19:25:09 +02:00
Vladimir Krivopalov
2c21fb4897 Use coloured tests results in test.py script output.
With the number of unit tests approaching one hundred, the output of
test.py becomes more challenging to read.
If some test fails, we will only get the details after all the tests
complete, but some tests take way longer than others.

With the coloured status, it is much simpler to immediately locate
failing tests. The developer can cancel the others and repeat the failing
ones.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <63a99a2fb70fdc33fd6eeb8e18fee977a47bd278.1541541184.git.vladimir@scylladb.com>
2018-11-14 19:23:39 +02:00
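The colouring itself needs nothing beyond ANSI escape sequences; a minimal sketch of the idea (illustrative only, not the actual test.py code):

```python
# Minimal sketch of coloured pass/fail status lines using ANSI escapes.
# This is illustrative, not the actual test.py implementation.
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"

def status_line(name, ok):
    """Return the test name with a coloured PASSED/FAILED marker."""
    colour, verdict = (GREEN, "PASSED") if ok else (RED, "FAILED")
    return f"{name}: {colour}{verdict}{RESET}"

print(status_line("row_cache_test", True))
print(status_line("sstable_test", False))
```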
Piotr Sarna
b04508041d tests: add CONTAINS test case to filtering tests 2018-11-14 16:08:19 +01:00
Piotr Sarna
0fc7d63842 cql3: enable filtering for CONTAINS restriction
With contains::is_satisfied_by(bytes_view) implemented,
it's possible to enable filtering support for CONTAINS restriction.

Fixes #3573
2018-11-14 14:39:21 +01:00
Piotr Sarna
d8a1693d84 cql3: add is_satisfied_by(bytes_view) for CONTAINS
is_satisfied_by that takes a bytes_view parameter is needed for
filtering, so it's provided for CONTAINS restriction.
2018-11-14 14:39:21 +01:00
Botond Dénes
9e4276669b flat_mutation_reader: document next_partition()
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <01fa57c7473c00e4dc891527a8628026b6dccc01.1542180913.git.bdenes@scylladb.com>
2018-11-14 13:38:38 +00:00
Avi Kivity
447f953a2c Merge "Add DEFAULT UNSET support to JSON" from Piotr
"
This series adds DEFAULT UNSET and DEFAULT NULL keyword support
to INSERT JSON statement, as stated in #3909.

Tests: unit (release)
"

* 'add_json_default_unset_2' of https://github.com/psarna/scylla:
  tests: add DEFAULT UNSET case to JSON cql tests
  tests: split JSON part of cql query test
  cql3: add DEFAULT UNSET to INSERT JSON
2018-11-13 09:14:50 -08:00
Piotr Sarna
fc4ecf9be4 tests: add DEFAULT UNSET case to JSON cql tests
A case covering DEFAULT UNSET/DEFAULT NULL params is added
to json cql query test suite.

Refs #3909
2018-11-13 18:06:15 +01:00
Piotr Sarna
cb6fd6a30d tests: split JSON part of cql query test
JSON part of cql query test is split into another file
to make cql_query_test.cc less huge.
2018-11-13 18:06:15 +01:00
Piotr Sarna
e153e590c1 cql3: add DEFAULT UNSET to INSERT JSON
When inserting a JSON, additional DEFAULT UNSET or DEFAULT NULL
keywords can be appended.
With DEFAULT UNSET, values omitted in JSON will not be changed
at all. With DEFAULT NULL (default), omitted values will be
treated as having a 'null' value.

Fixes #3909
2018-11-13 18:05:55 +01:00
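The two behaviours can be summarised with a small simulation, with plain Python dictionaries standing in for a stored row (an illustration of the semantics, not Scylla code):

```python
# Simulate INSERT JSON semantics for columns omitted from the JSON payload.
# 'row' stands in for the stored row; this is illustrative, not Scylla code.

def apply_json_insert(row, columns, payload, default_unset=False):
    updated = dict(row)
    for col in columns:
        if col in payload:
            updated[col] = payload[col]
        elif not default_unset:
            # DEFAULT NULL (the default): omitted values become null,
            # overwriting whatever was stored before.
            updated[col] = None
        # DEFAULT UNSET: omitted values are not touched at all.
    return updated

row = {"pk": 1, "v": "old"}
payload = {"pk": 1}  # 'v' omitted from the JSON
print(apply_json_insert(row, ["pk", "v"], payload))                      # v -> None
print(apply_json_insert(row, ["pk", "v"], payload, default_unset=True))  # v stays "old"
```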
Avi Kivity
a089f66755 Merge "ec2_multi_region_snitch: print a proper error message when a Public IP is not available" from Vlad
"
Fix for #3897
"Ec2MultiRegionSnitch: prints a cryptic error when a Public IP is not
available"

Ec2MultiRegionSnitch naturally requires a Public IP to be available and
therefore it's expected to refuse to work without it.

However the error message that is printed today is a total disaster and
has to be fixed ASAP to be something much more human readable.

This series adds a human readable preamble that will let a poor user
understand what he/she should do.
"

* 'improve-ec2-multi-region-snitch-error-message-when-pulic-address-is-not-available-v2' of https://github.com/vladzcloudius/scylla:
  locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved
  locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread
2018-11-13 09:02:55 -08:00
Duarte Nunes
a38f6078fb Merge 'Generating view updates during streaming' from Piotr
During streaming, there are cases when we should invoke the view write
path. In particular, if we're streaming because of repair or if a view
has not yet finished building and we're bootstrapping a new node.

The design constraints are:
1) The streamed writes should be visible to new writes, but the
   sstable should not participate in compaction, or we would lose the
   ability to exclude the streamed writes on a restart;
2) The streamed writes must not be considered when generating view
   updates for them;
3) Resilient to node restarts;
4) Resilient to concurrent stream sessions, possibly streaming mutations for overlapping ranges.

We achieve this by writing the streamed writes to an sstable in a
different folder, call it "staging". We achieve 1) by publishing the
sstable to the column family sstable set, but excluding it from
compactions. We do these steps upon boot, by looking at the staging
directory, thus achieving 3).

Fixes #3275

* 'streaming_view_to_staging_sstables_9' of https://github.com/psarna/scylla: (29 commits)
  tests: add materialized views test
  tests: add view update generator to cql test env
  main: add registering staging sstables read from disk
  database: add a check if loaded sstable is already staging
  database: add get_staging_sstable method
  streaming: stream tables with views through staging sstables
  streaming: add system distributed keyspace ref to streaming
  streaming: add view update generator reference to streaming
  main: add generating missed mv updates from staging sstables
  storage_service: move initializing sys_dist_ks before bootstrap
  db/view: add view_update_from_staging_generator service
  db/view: add view updating consumer
  table: add stream_view_replica_updates
  table: split push_view_replica_updates
  table: add as_mutation_source_excluding
  table: move push_view_replica_updates to table.cc
  database: add populating tables with staging sstables
  database: add creating /staging directory for sstables
  database: add sstable-excluding reader
  table: add move_sstable_from_staging_in_thread function
  ...
2018-11-13 15:16:31 +00:00
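Design constraint 1) can be pictured with a toy model: streamed (staging) sstables are part of the readable set but are never offered as compaction candidates (illustrative only, not Scylla's data structures):

```python
# Toy model of the staging-sstable rule: staging sstables serve reads but
# are excluded from compaction. Illustrative only, not Scylla code.

class Table:
    def __init__(self):
        self.sstables = []          # list of (name, is_staging) pairs

    def add(self, name, staging=False):
        self.sstables.append((name, staging))

    def readable(self):
        # All sstables, staging or not, are visible to reads.
        return [n for n, _ in self.sstables]

    def compaction_candidates(self):
        # Staging sstables never participate in compaction, so they can
        # still be excluded from view-update generation after a restart.
        return [n for n, staging in self.sstables if not staging]

t = Table()
t.add("md-1-big")
t.add("md-2-big", staging=True)     # streamed via the staging/ directory
print(t.readable())                 # both sstables serve reads
print(t.compaction_candidates())    # only the regular one compacts
```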
Piotr Sarna
1724ee55c7 tests: add materialized views test
Right now materialized_views_test.cc contains view updating tests,
but the intention is to move mv-related tests from cql_query_test
here and use it for all future unit testing of MV.
2018-11-13 15:21:55 +01:00
Piotr Sarna
056a78bbc7 tests: add view update generator to cql test env
Keeping view update generator in cql test env enables
generating updates from staging sstables in tests.
2018-11-13 15:04:43 +01:00
Piotr Sarna
16c042039c main: add registering staging sstables read from disk
Staging sstables read from disk are registered to the view update
generator right after initializing non system keyspaces.

Fixes #3275
2018-11-13 15:04:43 +01:00
Piotr Sarna
de43b4f41d database: add a check if loaded sstable is already staging
Staging sstables are loaded before regular ones. If the process
fails midway, an sstable can be linked both in the regular directory
and in staging directory. In such cases, the sstable remains
in staging and will be moved to the regular directory
by view update streamer service.
2018-11-13 15:04:43 +01:00
Piotr Sarna
d7849e6ea4 database: add get_staging_sstable method
This method can be used to check if an sstable is staging,
i.e. it shouldn't be compacted and will not be used
for generating view updates from other staging tables,
and returns the proper shared_sstable pointer if it is.
2018-11-13 15:04:43 +01:00
Piotr Sarna
32c0fe8df2 streaming: stream tables with views through staging sstables
While streaming to a table with paired views, staging sstables
are used. After the table is written to disk, it's used to generate
all required view updates. It's also resistant to restarts as it's
stored on a hard drive in staging/ directory.

Refs #3275
2018-11-13 15:04:42 +01:00
Piotr Sarna
dc74887ff3 streaming: add system distributed keyspace ref to streaming
Streaming code needs the system distributed keyspace to check if streamed
sstables should be staging, so a proper reference is added.
2018-11-13 15:01:53 +01:00
Piotr Sarna
7ef5e1b685 streaming: add view update generator reference to streaming
Streaming code may need the view update generator service to generate
and send view updates, so a proper reference is added.
2018-11-13 15:01:53 +01:00
Piotr Sarna
eb0c507a45 main: add generating missed mv updates from staging sstables
If any sstables are found in the staging directory, it means that
they missed generating view updates, so this is performed now.
2018-11-13 15:01:53 +01:00
Piotr Sarna
ca5dfdffc6 storage_service: move initializing sys_dist_ks before bootstrap
The bootstrapping process may need the system distributed keyspace
to generate view updates, so initializing sys_dist_ks
is moved before the bootstrapping process is launched.
2018-11-13 15:01:53 +01:00
Piotr Sarna
fc7267c797 db/view: add view_update_from_staging_generator service
A sharded service for generating mv updates after restarts
is added.
2018-11-13 15:01:52 +01:00
Piotr Sarna
ed05d91adc db/view: add view updating consumer
This consumer is used to generate and push view replica updates
from read mutations.
2018-11-13 14:54:39 +01:00
Piotr Sarna
348fa3b092 table: add stream_view_replica_updates
Generating view replica updates during streaming ignores
the staging sstable that is used to generate them.
2018-11-13 14:52:22 +01:00
Piotr Sarna
fed9c59eb8 table: split push_view_replica_updates
push_view_replica_updates is split in order to allow a different
mutation source to be provided.
2018-11-13 14:52:22 +01:00
Piotr Sarna
466d780445 table: add as_mutation_source_excluding
A variant of table::as_mutation_source that allows excluding
a single sstable is added.
2018-11-13 14:52:22 +01:00
Piotr Sarna
c825a17b9d table: move push_view_replica_updates to table.cc 2018-11-13 14:52:22 +01:00
Piotr Sarna
a17fcb8d94 database: add populating tables with staging sstables
After populating tables with regular sstables, the same procedure
is performed for staging sstables.
2018-11-13 14:52:22 +01:00
Piotr Sarna
19bf94fa8f database: add creating /staging directory for sstables
The staging directory is now created on boot.
2018-11-13 14:52:22 +01:00
Piotr Sarna
e88b85134c database: add sstable-excluding reader
When generating view updates from a staging sstable, this sstable
should not be used in the process. Hence, a reader that skips a single
sstable is added.
2018-11-13 14:52:22 +01:00
Avi Kivity
a8203ca799 Update seastar submodule
* seastar c02150e...a44cedf (5):
  > build: link against libatomic
  > dns.cc: Include name/address in resolver error messages
  > log: Print full error message for std::system_error
  > tests: test-utils: Add missing include
  > fstream: Introduce make_file_data_sink()

Fixes #3894.
2018-11-13 03:28:16 -08:00
Piotr Sarna
160a6d58d2 table: add move_sstable_from_staging_in_thread function
After materialized view updates are generated, the sstable
should be moved from staging/ to a regular directory.
It's expected to be called from seastar::async thread context.
2018-11-13 11:45:30 +01:00
Piotr Sarna
ff361ca877 sstables: add move_to_new_dir_in_thread function
When moving sstables between directories, this helper function
will create links and update generation and dir accordingly.
It's expected to be called in thread context.
2018-11-13 11:45:30 +01:00
Piotr Sarna
b7977f4790 sstables: add staging directory to regex
datadir/staging directory becomes a valid path for an sstable.
2018-11-13 11:45:30 +01:00
Piotr Sarna
e42d97060f database: provide nonfrozen version of push_view_replica_updates
Now it's also possible to pass a mutation to push to view replicas.
2018-11-13 11:45:30 +01:00
Piotr Sarna
642c3ae0e0 database: add subdir param to make_streaming_sstable_for_write
This function allows specifying a subfolder to put a newly created
sstable in - e.g. staging/ subfolder for streamed base table mutations.
2018-11-13 11:45:30 +01:00
Piotr Sarna
788e03433c table: init table.cc file
This file will be used to move table-related functions to it.
2018-11-13 11:45:30 +01:00
Piotr Sarna
8e053f9efb database: add staging sstables to a map
SSTables that belong to staging/ directory are put in the
_sstables_staging map.
2018-11-13 11:45:30 +01:00
Piotr Sarna
3970808294 sstables: add is_staging() method
This method returns true if the last part of the directory structure
is /staging.
2018-11-13 11:45:30 +01:00
Piotr Sarna
3f34312aa6 database: skip staging sstables in compaction
Staging sstables are not part of the compaction process to ensure
that each sstable can be easily excluded from the view generation process
that depends on the mentioned sstable.
2018-11-13 11:45:30 +01:00
Piotr Sarna
701d88e39f database: add staging sstables map
In order to keep track of staging sstables (used for mv updates),
a map of them is now kept in table class.
2018-11-13 11:45:30 +01:00
Paweł Dziepak
6469a1b451 Merge "Write static rows for all partitions if there are static columns" from Vladimir
"
It appears that in case when there are any static columns in serialization header,
Cassandra would write a (possibly empty) static row to every partition
in the SSTables file.

This patchset aligns Scylla's logic with that of Cassandra.

Note that Cassandra optimizes the case when no partition contains a static
row because it keeps track of updated columns, which Scylla currently does
not do - see #3901 for details.

Fixes #3900.
"

* 'projects/sstables-30/write-all-static-rows/v1' of https://github.com/argenet/scylla:
  tests: Test writing empty static rows for partitions in tables with static columns.
  sstables: Ignore empty static rows on reading.
  sstables: Write empty static rows when there are static columns in the table.
2018-11-09 12:01:25 -08:00
Raphael S. Carvalho
1c5934c934 sstables: fix procedure to get fully expired sstables with MC format
MC format lacks ancestors metadata, so we need to work around it by using
ancestors in the metadata collector, which are only available for an sstable
written during this instance. It works fine here because we only want
to know if a recently compacted sstable has an ancestor which wasn't
yet deleted.

Fixes #3852.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <20181102154951.22950-1-raphaelsc@scylladb.com>
2018-11-06 09:28:37 +02:00
Vladimir Krivopalov
69b453fb69 tests: Test writing empty static rows for partitions in tables with static columns.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-05 13:47:30 -08:00
Vladimir Krivopalov
f767dfbb33 sstables: Ignore empty static rows on reading.
Fixes #3900.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-05 13:47:30 -08:00
Vladimir Krivopalov
89051d37e3 sstables: Write empty static rows when there are static columns in the table.
This is consistent with what Cassandra does.

Fixes #3900.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-11-05 13:28:50 -08:00
Vladimir Krivopalov
2ebab69ce7 mutation_source_test: Use counter and collection columns in static rows.
They are legal and should be covered along with atomic columns.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <a1c0e0f8c0c0f12b68af6df426370511f4e1253b.1541106233.git.vladimir@scylladb.com>

[tgrabiec: fixed the patch title]
2018-11-02 10:33:27 +01:00
Vlad Zolotarov
2636395c65 locator: ec2_multi_region_snitch::start(): print a human readable error if Public IP may not be retrieved
A Public IP is required for Ec2MultiRegionSnitch. If it's not available,
a different snitch should be used.

This patch results in a readable error message being printed
instead of just a cryptic message with the HTTP response body.

Fixes #3897

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-11-01 11:50:58 -04:00
Vlad Zolotarov
c462af5549 locator: ec2_multi_region_snitch::start(): rework on top of seastar::thread
Rework ec2_multi_region_snitch::start() on top of seastar::async() in
order to simplify the code.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-11-01 10:48:37 -04:00
Paweł Dziepak
1129134a4a Merge "Convert sprint() calls to fmt" from Avi
"
The update to libfmt 5.2.1 brought with it a subtle change - calls to
sprint("%s", 3) now throw a format_error instead of returning "3". To
prevent such hidden (or not so hidden) bugs from lurking, convert all calls
to the modern fmt syntax.

Such conversion has several benefits:
 - prevent the bug from biting us
 - as fmt is being standardized, we can later move to std::format()
 - commonality with the logger format syntax (indeed, we may move the logger
   to use libfmt itself)

During the conversion, some bugs were caught and fixed. These are presented in
individual patches in the patchset.

Most of the conversion was scripted, using https://github.com/avikivity/unsprint.

Some sprint() calls remain, as they were too complex for the script. They
will be converted later.
"

* tag 'fmt-1/v1' of https://github.com/avikivity/scylla:
  toplevel: convert sprint() to format()
  repair: convert sprint() to format()
  tests: convert sprint() to format()
  tracing: convert sprint() to format()
  service: convert sprint() to format()
  exceptions: convert sprint() to format()
  index: convert sprint() to format()
  streaming: convert sprint() to format()
  streaming: progress_info: fix format string
  api: convert sprint() to format()
  dht: convert sprint() to format()
  thrift: convert sprint() to format()
  locator: convert sprint() to format()
  gms: convert sprint() to format()
  db: convert sprint() to format()
  transport: convert sprint() to format()
  utils: convert sprint() to format()
  sstables: convert sprint() to format()
  auth: convert sprint() to format()
  cql3: convert sprint() to format()
  row_cache: fix bad format string syntax
  repair: fix bad format string syntax
  tests: fix bad format string syntax
  dht: fix bad format string syntax
  sstables: fix bad format string syntax
  utils: estimated_histogram: convert generated format strings to fmt
  tests: perf_fast_forward: rename "format" variable
  tests: perf_fast_forward: massage result of sprint() into std::string
  utils: i_filter: rename "format" variable
  system_keyspace: simplify complicated sprint()
  cql: convert Cql.g sprint()s to fmt
  types: get rid of PRId64 formatting
2018-11-01 13:16:17 +00:00
Avi Kivity
a71ab365e3 toplevel: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
51ce53738f repair: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
f70ece9f88 tests: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
239ecec043 tracing: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
bb0eb9dae8 service: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
71fc5fb738 exceptions: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
7ae23d8f9b index: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
fd513c42ad streaming: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
8501e2a45d streaming: progress_info: fix format string
We try to escape % as \%, but the correct escape is %%.
2018-11-01 13:16:17 +00:00
Avi Kivity
da17c29bd3 api: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
82818758ca dht: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
7a125c6634 thrift: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
0c33d13165 locator: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
e096fa2fde gms: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
d77e044cde db: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
5f79ff0f54 transport: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
be99101f36 utils: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
455f00e993 sstables: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
eb74fe784d auth: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
cb7ee5c765 cql3: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
8cca3b2879 row_cache: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
6488b017c3 repair: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
bceff1550c tests: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
7ff5569ee8 dht: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
738e713edf sstables: fix bad format string syntax
Some sprint() calls use the fmt language instead of the printf syntax. Convert
them all the way to format().
2018-11-01 13:16:17 +00:00
Avi Kivity
3cf434b863 utils: estimated_histogram: convert generated format strings to fmt
Convert printf games to format games.

Note that fmt supports specifying the field width as an argument, but that
is left to a dedicated change.
2018-11-01 13:16:17 +00:00
Avi Kivity
8ca4b7abea tests: perf_fast_forward: rename "format" variable
The format local variable will soon alias with the format function which we
intend to use in the same context. Rename it away to avoid a clash.
2018-11-01 13:16:17 +00:00
Avi Kivity
7908f09148 tests: perf_fast_forward: massage result of sprint() into std::string
sprint() returns an std::string, but the new format() returns an sstring. Usually
an sstring is wanted, but in this case an sstring will fail as it is added to
an std::string.

Fix the failure (after the sprint->format conversion) by converting to an std::string.
2018-11-01 13:16:17 +00:00
Avi Kivity
7726ce23b7 utils: i_filter: rename "format" variable
The format variable hides the format function, which we'll soon want to use
here. Rename the format variable to unhide the function.
2018-11-01 13:16:17 +00:00
Avi Kivity
04b70a2ff8 system_keyspace: simplify complicated sprint()
update_peer_info() uses two sprint()s where one would do, which confuses
the sprint-to-fmt translator. Simplify the code by using just one call.
2018-11-01 13:16:17 +00:00
Avi Kivity
23e05a045b cql: convert Cql.g sprint()s to fmt
The only sprint() call had an extra complication due to quoting, which can be
removed now.
2018-11-01 13:16:16 +00:00
Avi Kivity
8db8c01fbe types: get rid of PRId64 formatting
It's not needed for our sprint() implementation, and gets in the way of
converting all formatting to fmt.
2018-11-01 13:16:16 +00:00
Avi Kivity
f170e3e589 Merge "dist: use perftune.py for disks tuning" from Vlad
"
Use perftune.py for tuning disks:
   - Distribute/pin disks' IRQs:
      - For NVMe drives: evenly among all present CPUs.
      - For non-NVMe drives: according to chosen tuning mode.
   - For all disks used by scylla:
      - Tune nomerges
      - Tune I/O scheduler.

It's important to tune NIC and disks together in order to keep IRQ
pinning in the same mode.

Disk are detected and tuned based on the current content of
/etc/scylla/scylla.yaml configuration file.
"

Fixes #3831.

* 'use_perftune_for_disks-v3' of https://github.com/vladzcloudius/scylla:
  dist: change the sysconfig parameter name to reflect the new semantics
  scylla_util.py::sysconfig_parser: introduce has_option()
  dist: scylla_setup and scylla_sysconfig_setup: change parameter names to reflect new semantics
  dist: don't distribute posix_net_conf.sh any more
  dist: use perftune.py to tune disks and NIC
2018-11-01 13:13:49 +00:00
Avi Kivity
96173e81e0 Update seastar submodule
* seastar c1e0e5d...c02150e (5):
  > prometheus: pass names as query parameter instead of part of the URL
  > treewide: convert printf() style formatting to fmt
  > print: add fmt_print()
  > build: Remove experimental CMake support
  > Merge "Correct and clean-up `signal_test`" from Jesse
2018-11-01 13:13:48 +00:00
Yibo Cai (Arm Technology China)
79136e895f utils/crc: calculate crc in parallel
It achieves 2.0x speedup on intel E5 and 1.1x to 2.5x speedup on
various arm64 microarchitectures.

The algorithm cuts data into blocks of 1024 bytes and calculates a crc
for each block, which is further divided into three subblocks of 336
bytes (42 uint64) each, plus 16 remaining bytes (2 uint64).

For each iteration, three independent crcs are calculated, one uint64
from each subblock. This greatly increases IPC (instructions per cycle).
After the subblocks are done, the three crcs and the remaining two uint64
are combined using carry-less multiplication to reach the final result
for one block of 1024 bytes.
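
The block layout can be sketched as follows (a simplified model: the per-subblock CRCs are shown, but the final carry-less-multiplication combine step is deliberately omitted):

```python
import zlib

BLOCK = 1024            # bytes per block
SUB = 336               # bytes per subblock (42 uint64 words)
TAIL = BLOCK - 3 * SUB  # 16 remaining bytes (2 uint64 words)

def split_block(block: bytes):
    """Cut one 1024-byte block into three 336-byte subblocks and a 16-byte tail."""
    assert len(block) == BLOCK
    subs = [block[i * SUB:(i + 1) * SUB] for i in range(3)]
    return subs, block[3 * SUB:]

def per_subblock_crcs(block: bytes):
    # Three independent CRC streams: the optimized code processes one
    # uint64 from each subblock per iteration, raising IPC. Recombining
    # them into the block CRC uses carry-less multiplication (not modeled).
    subs, tail = split_block(block)
    return [zlib.crc32(s) for s in subs], tail

data = bytes(range(256)) * 4          # exactly 1024 bytes
crcs, tail = per_subblock_crcs(data)
print(len(crcs), len(tail))           # 3 subblock CRCs, 16 tail bytes
```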

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1541042759-24767-1-git-send-email-yibo.cai@arm.com>
2018-11-01 10:19:32 +02:00
Vlad Zolotarov
84d341a12d dist: change the sysconfig parameter name to reflect the new semantics
We tune the NIC and disks together now. Change the sysconfig parameter to
reflect the new semantics.

However, if we detect the old parameter name in scylla-server, we still
update it, thereby keeping support for old installations.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:28:13 -04:00
Vlad Zolotarov
7950062a82 scylla_util.py::sysconfig_parser: introduce has_option()
has_option() returns TRUE if a given configuration option is set.
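
A minimal sketch of what such a helper might look like (a hypothetical stand-in; the real scylla_util.py implementation and the parameter names may differ):

```python
import re

class SysconfigParser:
    """Tiny stand-in for scylla_util.py's sysconfig_parser (names assumed)."""
    def __init__(self, text: str):
        self._opts = {}
        for line in text.splitlines():
            # sysconfig files are KEY=value lines; skip comments.
            m = re.match(r'\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$', line)
            if m:
                self._opts[m.group(1)] = m.group(2)

    def has_option(self, name: str) -> bool:
        # True if the given configuration option is set in the file.
        return name in self._opts

# "SET_NIC_AND_DISKS" is an illustrative option name, not the real one.
cfg = SysconfigParser("SET_NIC_AND_DISKS=yes\n# SET_NIC=yes\n")
print(cfg.has_option("SET_NIC_AND_DISKS"), cfg.has_option("SET_NIC"))
```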

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Vlad Zolotarov
9a5373254a dist: scylla_setup and scylla_sysconfig_setup: change parameter names to reflect new semantics
Change the name of the corresponding parameter (--setup-nic) to reflect
the fact that we now tune not just the NIC but the NIC and disks together.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Vlad Zolotarov
c74e1a9368 dist: don't distribute posix_net_conf.sh any more
We don't need it since we use perftune.py directly.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Vlad Zolotarov
0e47d8bb1d dist: use perftune.py to tune disks and NIC
Tune disks using perftune.py together with the NIC.
This is needed because disk and NIC tuning has to be
performed using the same mode (for non-NVMe disks).

We tune disks based on the current content of /etc/scylla/scylla.yaml.

Don't use scylla-blocktune for optimizing disks' performance
any more.

Unify the decision to optimize the NIC and disks:
either both are tuned or neither is.

Disable disk tuning for DPDK and "virtio" modes for now.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-31 15:27:00 -04:00
Takuya ASADA
5bf9a03d65 dist/debian: skip running dh_strip_nondeterminism
On some Fedora environments the dh build tries to run
dh_strip_nondeterminism, and fails since Fedora does not provide such a
command.
(see:
http://jenkins.cloudius-systems.com/view/master/job/scylla-master/job/unified-deb/3/console)

To prevent the build error we need to skip it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181030062935.9930-1-syuu@scylladb.com>
2018-10-31 10:23:54 +02:00
Tomasz Grabiec
62c7685b0d Merge "Proper support for static rows in SSTables 3.x" from Vladimir
This patchset addresses two issues with static row support in SSTables
3.x ('mc' format):

1. Since collections are allowed in static rows, we need to check for
complex deletion, set corresponding flag and write tombstones, if any.
2. Column indices need to be partitioned for static columns the same way
they are partitioned for regular ones.

 * github.com/argenet/scylla.git projects/sstables-30/columns-proper-order-followup/v1:
  sstables: Partition static columns by atomicity when reading/writing
    SSTables 3.x.
  sstables: Use std::reference_wrapper<> instead of a helper structure.
  sstables: Check for complex deletion when writing static rows.
  tests: Add/fix comments to
    test_write_interleaved_atomic_and_collection_columns.
  tests: Add test covering interleaved atomic and collection cells in
    static row.
2018-10-30 10:36:46 +01:00
Vladimir Krivopalov
d82ac02fad tests: Add test covering interleaved atomic and collection cells in static row.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 15:01:34 -07:00
Vladimir Krivopalov
7bd95399ed tests: Add/fix comments to test_write_interleaved_atomic_and_collection_columns.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 15:00:55 -07:00
Vladimir Krivopalov
6bd738ceb1 sstables: Check for complex deletion when writing static rows.
It is possible to have collections in a static row so we need to check
for collection-wide tombstones like with clustering rows.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 14:59:19 -07:00
Vladimir Krivopalov
6b7003088a sstables: Use std::reference_wrapper<> instead of a helper structure.
No need to store column_id separately as it can be accessed from the
column_definition.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 14:58:08 -07:00
Vladimir Krivopalov
8592b834d1 sstables: Partition static columns by atomicity when reading/writing SSTables 3.x.
Collections are permitted in static rows so same partitioning as for
regular columns is required.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-29 10:32:02 -07:00
Takuya ASADA
2ac14dcf25 dist/redhat: prevent build error on older Fedora/CentOS
Current scylla.spec fails to build on Fedora 27, since python2-pystache is
the new package name introduced in Fedora 28.
But Fedora 28's python2-pystache has the tag "Provides: pystache",
so we can depend on the old package name; this way scylla.spec builds
on both Fedora 27 and 28.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181028175450.31156-1-syuu@scylladb.com>
2018-10-29 11:36:40 +02:00
Yibo Cai (Arm Technology China)
1c48e3fbec utils/crc: leverage arm64 crc extension
It achieves 6.7x to 11x speedup on various arm64 microarchitectures.

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1540781879-15465-1-git-send-email-yibo.cai@arm.com>
2018-10-29 10:50:48 +02:00
Nadav Har'El
b8337f8c9d Materialized views: fix race condition in resharding while view building
When a node reshards (i.e., restarts with a different number of CPUs), and
is in the middle of building a view for a pre-existing table, the view
building needs to find the right token from which to start building on all
shards. We ran the same code on all shards, hoping they would all make
the same decision on which token to continue. But in some cases, one
shard might make the decision, start building, and make progress -
all before a second shard goes to make the decision, which will now
be different.

This resulted, in some rare cases, in the new materialized view missing
a few rows when the build was interrupted with a resharding.

The fix is to add the missing synchronization: All shards should make
the same decision on whether and how to reshard - and only then should
start building the view.
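
The shape of the fix can be sketched with a barrier (illustrative only; Scylla shards are not OS threads, and the real code uses its own cross-shard primitives):

```python
import threading

N_SHARDS = 4
barrier = threading.Barrier(N_SHARDS)
decided = threading.Event()
decisions = [None] * N_SHARDS

def build_view(shard: int, stored_token: int):
    global agreed_token
    # One shard makes the single, shared decision; everyone waits for it
    # and then synchronizes at the barrier, so no shard can race ahead,
    # make progress, and invalidate a later shard's decision.
    if shard == 0:
        agreed_token = stored_token
        decided.set()
    decided.wait()
    barrier.wait()                    # all shards agree before building
    decisions[shard] = agreed_token   # now safe to start building from here

threads = [threading.Thread(target=build_view, args=(i, 100)) for i in range(N_SHARDS)]
for t in threads: t.start()
for t in threads: t.join()
print(len(set(decisions)))  # every shard resumed from the same token
```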

Fixes #3890
Fixes #3452

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181028140549.21200-1-nyh@scylladb.com>
2018-10-28 17:20:10 +00:00
Avi Kivity
75dbff984c Merge "Re-order columns when reading/writing SSTables 3.x" from Vladimir
"
In Cassandra, row columns are stored in a BTree that uses the following
ordering on them:
    - all atomic columns go first, then all multi-cell ones
    - columns of both types (atomic and multi-cell) are
      lexicographically ordered by name with respect to each other

Scylla needs to store columns and their respective indices using the
same ordering as well as when reading them back.

Fixes #3853

Tests: unit {release}

+

Checked that the following SSTables are dumped fine using Cassandra's
sstabledump:

cqlsh:sst3> CREATE TABLE atomic_and_collection3 ( pk int, ck int, rc1 text, rc2 list<text>, rc3 text, rc4 list<text>, rc5 text, rc6 list<text>, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''};
cqlsh:sst3> INSERT INTO atomic_and_collection3 (pk, ck, rc1, rc4, rc5) VALUES (0, 0, 'hello', ['beautiful','world'], 'here');
<< flush >>

sstabledump:

[
  {
    "partition" : {
      "key" : [ "0" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 96,
        "clustering" : [ 0 ],
        "liveness_info" : { "tstamp" : "1540599270139464" },
        "cells" : [
          { "name" : "rc1", "value" : "hello" },
          { "name" : "rc5", "value" : "here" },
          { "name" : "rc4", "deletion_info" : { "marked_deleted" : "1540599270139463", "local_delete_time" : "1540599270" } },
          { "name" : "rc4", "path" : [ "45e22cb0-d97d-11e8-9f07-000000000000" ], "value" : "beautiful" },
          { "name" : "rc4", "path" : [ "45e22cb1-d97d-11e8-9f07-000000000000" ], "value" : "world" }
        ]
      }
    ]
  }
]
"

* 'projects/sstables-30/columns-proper-order/v1' of https://github.com/argenet/scylla:
  tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x.
  sstables: Re-order columns (atomic first, then collections) for SSTables 3.x.
  sstables: Use a compound structure for storing information used for reading columns.
2018-10-28 10:56:09 +02:00
Rafi Einstein
32525f2694 Space-Saving Top-k algorithm for handling stream summary statistics
Based on the following implementation ([2]) for the Space-Saving algorithm from [1].
[1] http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf
[2] https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java

The algorithm keeps a map between the keys seen and their counts, with a bound on the number of
tracked keys. The replacement policy evicts the key with the lowest count; the newcomer inherits
that count, and an estimation of the resulting error is recorded.
This error estimation can later be used to check whether the distribution we arrived at corresponds
to the real top-K, which we can display alongside the results.
Accuracy depends on the number of tracked keys.
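
A minimal sketch of the Space-Saving replacement policy described above (capacity and key names are illustrative):

```python
class SpaceSaving:
    """Track approximate top-k counts with a bounded number of keys."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.counts = {}   # key -> estimated count
        self.errors = {}   # key -> over-estimation bound

    def offer(self, key):
        if key in self.counts:
            self.counts[key] += 1
        elif len(self.counts) < self.capacity:
            self.counts[key] = 1
            self.errors[key] = 0
        else:
            # Evict the key with the lowest count; the newcomer inherits
            # that count (+1 for this occurrence) and records it as the
            # bound on how much its count may be over-estimated.
            victim = min(self.counts, key=self.counts.get)
            low = self.counts.pop(victim)
            self.errors.pop(victim)
            self.counts[key] = low + 1
            self.errors[key] = low

    def top(self, k):
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:k]

ss = SpaceSaving(capacity=2)
for key in ["a", "a", "a", "b", "c", "a"]:
    ss.offer(key)
print(ss.top(1)[0][0])  # "a" dominates the stream
```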

Introduced as part of 'nodetool toppartition' query implementation.

Refs #2811
Message-Id: <20181027220937.58077-1-rafie@scylladb.com>
2018-10-28 10:10:28 +02:00
Vladimir Krivopalov
f3dc2a4927 tests: Test interleaved atomic and multi-cell columns written to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-26 16:58:34 -07:00
Vladimir Krivopalov
7e56e9fca6 sstables: Re-order columns (atomic first, then collections) for SSTables 3.x.
In Cassandra, row columns are stored in a BTree that uses the following
ordering on them:
    - all atomic columns go first, then all multi-cell ones
    - columns of both types (atomic and multi-cell) are
      lexicographically ordered by name with respect to each other

Since schema already has all columns lexicographically sorted by name,
we only need to stably partition them by atomicity for that.
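
In Python terms, the reordering amounts to a stable partition of the already name-sorted columns (the column names below are taken from the example elsewhere in this series; the dict representation is illustrative):

```python
def reorder_for_mc(columns):
    """Stable partition: atomic columns first, multi-cell ones after,
    each group keeping its lexicographic-by-name order."""
    atomic = [c for c in columns if c["atomic"]]
    multi_cell = [c for c in columns if not c["atomic"]]
    return atomic + multi_cell

# The schema already has the columns sorted lexicographically by name.
cols = [
    {"name": "rc1", "atomic": True},
    {"name": "rc2", "atomic": False},   # list<text>
    {"name": "rc3", "atomic": True},
    {"name": "rc4", "atomic": False},   # list<text>
]
print([c["name"] for c in reorder_for_mc(cols)])  # ['rc1', 'rc3', 'rc2', 'rc4']
```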

Fixes #3853

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-26 15:58:33 -07:00
Vladimir Krivopalov
210507b867 sstables: Use a compound structure for storing information used for reading columns.
This representation makes it easier to operate with compound structures
instead of separate values that were stored in multiple containers.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-26 11:32:44 -07:00
Tomasz Grabiec
cf2d5c19fb Merge "Properly write static rows missing columns for SSTables 3.x." from Vladimir
Before this fix, write_missing_columns() helper would always deal with
regular columns even when writing static rows.

This would cause errors on reading those files.

Now, the missing columns are written correctly for regular and static
rows alike.

* github.com/argenet/scylla.git projects/sstables-30/fix-writing-static-missing-columns/v1:
  schema: Add helper method returning the count of columns of specified
    kind.
  sstables: Honour the column kind when writing missing columns in 'mc'
    format.
  tests: Add test for a static row with missing columns (SSTables 3.x).
2018-10-26 09:06:01 +02:00
Vladimir Krivopalov
9843343ad8 tests: Add test for a static row with missing columns (SSTables 3.x).
This is a test case for #3892.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-25 17:16:31 -07:00
Vladimir Krivopalov
44043cfd44 sstables: Honour the column kind when writing missing columns in 'mc' format.
Previously, we've been writing the wrong missing columns indices for
static rows because write_missing_columns() explicitly used regular
columns internally.

Now, it takes the proper column kind into account.

Fixes #3892

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-25 17:09:09 -07:00
Vladimir Krivopalov
399f815a89 schema: Add helper method returning the count of columns of specified kind.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-25 17:07:20 -07:00
Tomasz Grabiec
dcac0ac80c tests: sstables: Verify no index reads during scans which don't need it
Reproducer for https://github.com/scylladb/scylla/issues/3868

Message-Id: <1540459849-27612-2-git-send-email-tgrabiec@scylladb.com>
2018-10-25 16:14:45 +03:00
Tomasz Grabiec
46d0c157ae tests: sstables: Extract make_sstable_mutation_source()
Message-Id: <1540459849-27612-1-git-send-email-tgrabiec@scylladb.com>
2018-10-25 16:14:39 +03:00
Tomasz Grabiec
fe0a0bdf1e utils/loading_shared_values: Add missing stat update call in one of the cases
Message-Id: <1540469591-32738-1-git-send-email-tgrabiec@scylladb.com>
2018-10-25 15:15:05 +03:00
Duarte Nunes
e46ef6723b Merge seastar upstream
* seastar d152f2d...c1e0e5d (6):
  > scripts: perftune.py: properly merge parameters from the command line and the configuration file
  > fmt: update to 5.2.1
  > io_queue: only increment statistics when request is admitted
  > Adds `read_first_line.cc` and `read_first_line.hh` to CMake.
  > fstream: remove default extent allocation hint
  > core/semaphore: Change the access of semaphore_units main ctor

Due to a compile-time fight between fmt and boost::multiprecision, a
lexical_cast was added to mediate.

sprint("%s", var) no longer accepts numeric values, so some sprint()s were
converted to format() calls. Since more may be lurking we'll need to remove
all sprint() calls.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-25 12:53:30 +03:00
Benny Halevy
2a57c454f2 update_compaction_history: handle execute_cql exception
Fixes #3774

Tested using view_schema_test with and without injecting an exception in
modification_statement::do_execute for "compaction_history".

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181017105758.9602-3-bhalevy@scylladb.com>
2018-10-24 18:39:53 +03:00
Benny Halevy
44e5c2643b compaction_manager::maybe_stop_on_error: add stop_iteration param
Some call sites stop in any case, regardless of what
maybe_stop_on_error returns. Reflect that in the log messages.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181017105758.9602-2-bhalevy@scylladb.com>
2018-10-24 18:39:52 +03:00
Avi Kivity
8210f4c982 Merge "Properly writing/reading shadowable deletions with SSTables 3.x." from Vladimir
"
This patchset addresses two problems with shadowable deletion handling
in SSTables 3.x ('mc' format).

Firstly, we previously did not set a flag indicating the presence of
extended flags byte with HAS_SHADOWABLE_DELETION bitmask on writing.
This would break subsequent reading and cause all types of failures up
to crash.

Secondly, when reading rows with this extended flag set, we need to
preserve that information and create a shadowable_tombstone for the row.

Tests: unit {release}
+

Verified manually with 'hexdump' and using modified 'sstabledump' that
second (shadowable) tombstone is written for MV tables by Scylla.

+
DTest (materialized_views_test.py:TestMaterializedViews.hundred_mv_concurrent_test)
that originally failed due to this issue has successfully passed locally.
"

* 'projects/sstables-30/shadowable-deletion/v4' of https://github.com/argenet/scylla:
  tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x.
  tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x.
  sstables: Support Scylla-specific extension for writing shadowable tombstones.
  sstables: Introduce a feature for shadowable tombstones in Scylla.db.
  memtable: Track regular and shadowable tombstones separately in encoding_stats_collector.
  sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion.
  sstables: Support checking row extension flags for Cassandra shadowable deletion.
2018-10-24 18:20:16 +03:00
Tomasz Grabiec
9e756d3863 sstable_mutation_reader: Do not read partition index when scanning
Even when we're using a full clustering range, need_skip() will return
true when we start a new partition and advance_context() will be
called with position_in_partition::before_all_clustered_rows(). We
should detect that there is no need to skip to that position before
the call to advance_to(*_current_partition_key), which will read the
index page.

Fixes #3868.

Message-Id: <1539881775-8578-1-git-send-email-tgrabiec@scylladb.com>
2018-10-24 15:55:13 +03:00
Avi Kivity
925ef48fce Merge "Use relocatable package to generate .rpm/.deb" from Takuya
"
This patchset adds support generating .rpm/.deb from relocatable
package.
"

* 'reloc_rpmdeb_v5' of https://github.com/syuu1228/scylla:
  configure.py: run create-relocatable-package.py every time
  configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets
  configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target
  dist/ami: build relocatable .rpm when --localrpm specified
  dist/debian: use relocatable package to produce .deb
  dist/redhat: use relocatable package to produce .rpm
  install-dependencies.sh: add libsystemd as dependencies
  install.sh: drop hardcoded distribution name, add --target option to specify distribution
  build: add script to build relocatable package
  build: compress relocatable package
  build: add files on relocatable package to support generating .rpm/.deb
2018-10-24 14:44:09 +03:00
Takuya ASADA
59e4900ca7 configure.py: run create-relocatable-package.py every time
Right now we don't have dependencies for dist/, so ninja is not able to
detect changes under the directory.
To update the relocatable package even when the only change is under
dist/, we need to run create-relocatable-package.py every time.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
6e1617d71c configure.py: add SCYLLA-RELEASE-FILE/SCYLLA-VERSION-FILE targets
Re-generate the scylla version files when they are removed, since these
files are required for the relocatable package.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
0cb8a4cb0c configure.py: use {mode} instead of $mode on scylla-package.tar.gz build target
It's better to use {mode} to construct the fixed path, just like other build targets do.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
929f03533d dist/ami: build relocatable .rpm when --localrpm specified
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
f3c3b9183c dist/debian: use relocatable package to produce .deb
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
8e2dc9e4f4 dist/redhat: use relocatable package to produce .rpm
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
5fa7ed52e3 install-dependencies.sh: add libsystemd as dependencies
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
ce4067ca02 install.sh: drop hardcoded distribution name, add --target option to specify distribution
To allow the user to build an .rpm for Fedora, we need to support specifying the distribution.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
6319229020 build: add script to build relocatable package
To make building the relocatable package easier, add build_reloc.sh, which
builds it in one command.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
a502715b29 build: compress relocatable package
Since the debian packaging system requires the source package to be a
compressed tar file, use .gz compression.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Takuya ASADA
85fed12c07 build: add files on relocatable package to support generating .rpm/.deb
The relocatable package is missing some files needed to generate .rpm/.deb;
add them.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2018-10-24 11:29:47 +00:00
Paweł Dziepak
637b9a7b3b atomic_cell_or_collection: make operator<< show cell content
After the new in-memory representation of cells was introduced there was
a regression in atomic_cell_or_collection::operator<< which stopped
printing the content of the cell. This makes debugging more incovenient
are time-consuming. This patch fixes the problem. Schema is propagated
to the atomic_cell_or_collection printer and the full content of the
cell is printed.

Fixes #3571.

Message-Id: <20181024095413.10736-1-pdziepak@scylladb.com>
2018-10-24 13:29:51 +03:00
Avi Kivity
a9836ad758 thrift: limit message size
Limit the message size according to the configuration, to prevent a huge
message from allocating all of the server's memory.

We also need to limit memory used in aggregate by thrift, but that is left to
another patch.
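
The check itself is simple: read the length prefix of the framed transport and reject anything over the configured cap before allocating for it (a sketch, assuming Thrift framed transport; the constant and function names are illustrative):

```python
import struct

MAX_MESSAGE_SIZE = 16 * 1024 * 1024  # illustrative cap from configuration

def check_frame(header: bytes) -> int:
    # Thrift framed transport prefixes each message with a big-endian
    # 32-bit length; reject oversized frames before buffering the body.
    (size,) = struct.unpack(">i", header[:4])
    if size < 0 or size > MAX_MESSAGE_SIZE:
        raise ValueError("frame size %d exceeds limit %d" % (size, MAX_MESSAGE_SIZE))
    return size

print(check_frame(struct.pack(">i", 1024)))   # small frame accepted
```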

Fixes #3878.
Message-Id: <20181024081042.13067-1-avi@scylladb.com>
2018-10-24 09:57:58 +01:00
Raphael S. Carvalho
c958294991 tests/sstable_perf: fix compaction mode for a multi shard instance
Compaction mode fails if more than one shard is used because it doesn't
make sure sstables used as input for compaction only contain local keys.
Therefore, sstable generated by compaction has less keys than expected
because non-local keys are purged out.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181022225153.12029-1-raphaelsc@scylladb.com>
2018-10-24 09:58:34 +03:00
Glauber Costa
fc5635100d install seastar-addr2line and seastar-cpumap into scylla packages
It is very useful for investigations in scylla issues, and we have
been moving those scripts manually when needed. Make it officially
part of the scylla package.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181023184400.23187-1-glauber@scylladb.com>
2018-10-24 09:52:17 +03:00
Amnon Heiman
6bcde841bd scyllatop: Nicer error message when fail opening a log file or connecting
scyllatop uses a log file, if opening the file fails, the user should
get a clear response not an exception trace.

The same is true for connecting to scylla

After this patch the following:
$ scyllatop.py -L /usr/lib/scyllatop.log
scyllatop failed opening log file: '/usr/lib/scyllatop.log' With an error: [Errno 13] Permission denied: '/usr/lib/scyllatop.log'
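
The shape of the fix, as a sketch (not the actual scyllatop code; the path below is a stand-in for the user-supplied one):

```python
import sys

def open_log_file(path):
    # Fail with a clear one-line message instead of a traceback.
    try:
        return open(path, "a")
    except OSError as e:
        sys.exit("scyllatop failed opening log file: %r With an error: %s" % (path, e))

log = open_log_file("/tmp/scyllatop-demo.log")
log.write("started\n")
log.close()
print("ok")
```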

Fixes #3860

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20181021065525.22749-1-amnon@scylladb.com>
2018-10-24 09:50:45 +03:00
Vlad Zolotarov
4d1bb719a4 config: enable hinted handoff by default
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181019180401.12400-1-vladz@scylladb.com>
2018-10-24 09:47:36 +03:00
Vladimir Krivopalov
ad599d4342 tests: Add tests writing both regular and shadowable tombstones to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
3dcf0acfc2 tests: Add test covering writing and reading a shadowable tombstone with SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
759d36a26e sstables: Support Scylla-specific extension for writing shadowable tombstones.
The original SSTables 'mc' format, as defined in Cassandra, does not provide
a way to store shadowable deletion in addition to regular row deletion
for materialized views.
It is essential to store it because of known corner-case issues that
otherwise appear.

For this to work, we introduce a Scylla-specific extended flag to be set
in SSTables in 'mc' format that indicates a shadowable tombstone is
written after the regular row tombstone.

This is deemed to be safe because shadowable tombstones are specific to
materialized views and MV tables are not supposed to be imported or
exported.

Note that a shadowable tombstone can be written without a regular
tombstone as well as along with it.
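
The flag plumbing can be sketched as bit manipulation on the two flag bytes (the bit positions below are illustrative, not the real 'mc' on-disk values):

```python
# Row flags (first byte) and extended flags (optional second byte).
# Bit values here are made up for illustration only.
HAS_EXTENDED_FLAGS = 0x80
HAS_SCYLLA_SHADOWABLE_TOMBSTONE = 0x02   # Scylla-specific extension

def encode_row_flags(shadowable: bool):
    flags, ext = 0, 0
    if shadowable:
        # Writing a shadowable tombstone requires announcing the presence
        # of the extended-flags byte, or readers will misparse the row.
        flags |= HAS_EXTENDED_FLAGS
        ext |= HAS_SCYLLA_SHADOWABLE_TOMBSTONE
    return flags, ext

flags, ext = encode_row_flags(shadowable=True)
print(bool(flags & HAS_EXTENDED_FLAGS), bool(ext & HAS_SCYLLA_SHADOWABLE_TOMBSTONE))
```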

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
e168433945 sstables: Introduce a feature for shadowable tombstones in Scylla.db.
This is used to indicate that the SSTables being read may contain a
Scylla-specific HAS_SCYLLA_SHADOWABLE_TOMBSTONE extended flag set.

If feature is not disabled, we should not honour this flag.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
a95ba2f38a memtable: Track regular and shadowable tombstones separately in encoding_stats_collector.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
b7d48c1ccd sstables: Error out when reading SSTables 3.x with Cassandra shadowable deletion.
This flag can only be set in MV tables, which are not supposed to be
imported into Scylla.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Vladimir Krivopalov
8f79f76116 sstables: Support checking row extension flags for Cassandra shadowable deletion.
This flag can only be used in MV tables, which are not supposed to be
imported into Scylla.
Since Scylla representation of shadowable tombstones differs from that
of Cassandra, such SSTables are rejected on read and Scylla never sets
this flag on writing.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-23 16:30:42 -07:00
Avi Kivity
1533487ba8 Merge "hinted handoff: give a sender a low priority" from Vlad
"
Hinted handoff should not overpower regular flows like READs, WRITEs or
background activities like memtable flushes or compactions.

In order to achieve this, put its sending in the STREAMING CPU scheduling
group and its commitlog object into the STREAMING I/O scheduling group.

Fixes #3817
"

* 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla:
  db::hints::manager: use "streaming" I/O scheduling class for reads
  commitlog::read_log_file(): set the read I/O priority class explicitly
  db::hints::manager: add hints sender to the "streaming" CPU scheduling group
2018-10-23 16:55:05 +00:00
Raphael S. Carvalho
65e8853e8d tests: test that sstable cleanup won't get rid of a key whose token belongs to the node
Commit 1ce52d54 fixed sort order of local ranges, which is needed for cleanup to
work properly because it relies on that to perform a binary search.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20181023031322.22763-1-raphaelsc@scylladb.com>
2018-10-23 16:55:05 +00:00
Avi Kivity
d9e0ea6bb0 config: mark range_request_timeout_in_ms and request_timeout_in_ms as Used
This makes them available in scylla --help.

Fixes #3884.
Message-Id: <20181023101150.29856-1-avi@scylladb.com>
2018-10-23 11:52:03 +01:00
Paweł Dziepak
c94d2b6aa6 cql3: restore original timeout behaviour for aggregate queries
Commit 1d34ef38a8 "cql3: make pagers use
time_point instead of duration" unintentionally altered the timeout
semantics for aggregate queries. Such requests fetch multiple pages before
sending a response to the client. Originally, each of those fetches had
a timeout-duration to finish; after the problematic commit the whole
request needs to complete in a single timeout-duration. This,
unsurprisingly, makes some queries that were successful before fail with
a timeout. This patch restores the original behaviour.
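
The difference between the two semantics can be sketched as follows (illustrative only; in the real code the timeout is a time_point passed down to the pager):

```python
import time

TIMEOUT = 0.05  # per-fetch budget in seconds (illustrative)

def fetch_page(deadline):
    if time.monotonic() > deadline:
        raise TimeoutError("page fetch timed out")
    return "page"

def aggregate_query(n_pages, per_page_timeout=True):
    # Buggy behaviour: a single deadline covers the whole multi-page request.
    deadline = time.monotonic() + TIMEOUT
    pages = []
    for _ in range(n_pages):
        if per_page_timeout:
            # Restored behaviour: every fetch gets a fresh timeout-duration.
            deadline = time.monotonic() + TIMEOUT
        pages.append(fetch_page(deadline))
        time.sleep(0.02)  # simulated per-page work
    return len(pages)

print(aggregate_query(10, per_page_timeout=True))  # all 10 pages fetched
```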

Fixes #3877.

Message-Id: <20181022125318.4384-1-pdziepak@scylladb.com>
2018-10-23 12:52:42 +03:00
Takuya ASADA
950dbdb466 dist/common/sysctl.d: add new conf file to set fs.aio-max-nr
We need to raise fs.aio-max-nr to a larger value since Seastar may allocate
more than 65535 AIO events (the kernel default value).

Fixes #3842

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181023030449.15445-1-syuu@scylladb.com>
2018-10-23 11:01:07 +03:00
Tomasz Grabiec
a34e417874 Merge "Stabilise perf_fast_forward results" from Paweł
This series attempts to make the fragments per second results reported by
perf_fast_forward more stable. That includes running each test case
multiple times and reporting the median, median absolute deviation, maximum
and minimum value. That should make it relatively easy to assess how
repeatable the presented results are. Moreover, since perf_fast_forward
does IO operations, it is important that they do not introduce any
excessive noise to the results. The location of the data directory
is made configurable so that the user can choose a less noisy disk or a
ramdisk.

 * github.com/pdziepak/scylla.git stabilise-perf_fast_forward/v3:
  tests/perf_fast_forward: make fragments/s measurements more stable
  tests/perf_fast_forward: make data directory location configurable
2018-10-22 18:33:25 +02:00
Avi Kivity
d5d831f41b tests: network_topology_strategy_test: remove quadratic complexity
network_topology_strategy test creates a ring with hundreds of tokens (and one
token per node). Then, for each token, it calls get_primary_ranges(), which in
turn walks the token ring. However, because each datacenter occupies a
disjoint token range, this walk practically has to walk the entire ring until
it collects enough endpoints for each datacenter. The whole thing takes 15 minutes.

Speed this up by randomizing the token<->dc relationship. This is more realistic,
and switches the algorithm to be O(token count); now it completes in less
than a minute (still not great, but better).
Message-Id: <20181022154026.19618-1-avi@scylladb.com>
2018-10-22 17:06:57 +01:00
Paweł Dziepak
63a705dca3 tests/perf_fast_forward: make data directory location configurable
perf_fast_forward populates perf_fast_forward_output with some data and
then runs performance tests that read it. That makes the disk a
significant factor in the final result and may make the results less
repeatable. This patch adds a flag that allows setting the location
of the data directory so that the user can opt for a less noisy disk
or a ramdisk.
2018-10-22 16:52:58 +01:00
Paweł Dziepak
29e872f865 tests/perf_fast_forward: make fragments/s measurements more stable
perf_fast_forward performs various operations, many of which involve
sstable reads, and verifies via the metrics that there weren't any
unnecessary IO operations. It also provides fragments-per-second
measurements for the tests it runs. However, since some of the tests are
very short and involve IO, those values vary a lot, which makes them not
very useful.

This commit attempts to stabilise those results. Each test case is run
multiple times (by default for a second, but at least 3 times) and shows
median, median absolute deviation, maximum and minimum value. This
should allow assessing whether the changes in the results are just noise
or a real regression or improvement.
2018-10-22 16:52:58 +01:00
Duarte Nunes
f3a5ec0fd9 db/view: Don't copy keyspace name
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181022104527.14555-1-duarte@scylladb.com>
2018-10-22 13:00:00 +02:00
George Kollias
c2343dc841 Make restricting reader fill_buffer more efficient
Currently, restricting_mutation_reader::fill_buffer just reads the
lower-layer reader's fragments one by one without doing any further
transformations. This change swaps the parent and child buffers in a
single step, as suggested in #3604, and hence removes any possible
per-fragment overhead.

I couldn't find any test that exercises restricting_mutation_reader as
a mutation source, so I added test_restricted_reader_as_mutation_source
in mutation_reader_test.

Tests: unit (release), though these 4 tests are failing regardless of
my changes (they fail on master for me as well): snitch_reset_test,
sstable_mutation_test, sstable_test, sstable_3_x_test.

Fixes: #3604

Signed-off-by: George Kollias <georgioskollias@gmail.com>
Message-Id: <1540052861-621-1-git-send-email-georgioskollias@gmail.com>
2018-10-22 11:36:54 +03:00
Duarte Nunes
3fe92663d4 Merge 'Fix for a select statement with filtered columns' from Eliran
"
This patchset fixes #3803. When a select statement with filtering
is executed and the column that is needed for the filtering is not
present in the select clause, rows that should have been filtered out
according to this column would still be present in the result set.

Tests:
 1. The testcase from the issue.
 2. Unit tests (release) including the
 newly added test from this patchset.
"

* 'issues/3803/v10' of https://github.com/eliransin/scylla:
  unit test: add test for filtering queries without the filtered column
  cql3 unit test: add assertion for the number of serialized columns
  cql3: ensure retrieval of columns for filtering
  cql3: refactor find_idx to be part of statement restrictions object
  cql3: add prefix size common functionality to all clustering restrictions
  cql3: rename selection metadata manipulation functions
2018-10-21 09:53:37 +01:00
Eliran Sinvani
145f931ae7 unit test: add test for filtering queries without the filtered column
Test the use case where the column that the filtering operates on
is not a part of the select clause. The expected result is a set
containing the columns of the select clause, with the additional
columns for filtering marked as non-serializable.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:41:46 +03:00
Eliran Sinvani
86637a1d0d cql3 unit test: add assertion for the number of serialized columns
The result sets that the assertions are performed against
are result sets before serialization to the user and therefore
also contain columns that will not be serialized and sent as
the query's final result. The patch adds an assertion on the
number of columns that will be present in the final serialized
result.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:41:46 +03:00
Eliran Sinvani
fd422c954e cql3: ensure retrieval of columns for filtering
When a query that needs filtering is executed, the columns
that the coordinator is filtering by have to be retrieved. The
columns should be retrieved even if they are not used for
ordering or named in the actual select clause.
If the columns are missing from the result set, then any
filtering that restricts the missing column will not take
place.

Fixes #3803

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:41:46 +03:00
Eliran Sinvani
3e036e2c8c cql3: refactor find_idx to be part of statement restrictions object
find_idx calculates the index that will be used in the statement if
indexes are to be used. In the static form it requires redundant
information (the schema is already contained within the statement
restrictions object). In addition find_idx will need to be used for
filtering in order not to include redundant selectors in the selection
objects. This change refactors find_idx to run under the statement
restrictions object and changes its scope from private to public.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:40:24 +03:00
Eliran Sinvani
4496086bf1 cql3: add prefix size common functionality to all clustering restrictions
Up until now, knowing the prefix size, which is used to determine
whether filtering is needed, was implemented only for single-column
clustering restrictions. The patch adds a function to calculate the
prefix size for all types of clustering key restrictions given the
schema.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-21 08:39:57 +03:00
Vlad Zolotarov
a87c11bad2 storage_proxy::query_result_local: create a single tracing span on a replica shard
Every use of a tracing::global_trace_state_ptr object where a
tracing::trace_state_ptr is expected, as well as every call to
tracing::global_trace_state_ptr::get(), creates a new tracing session
(span) object.

This should never be done unless query handling moves to a different shard.

Fixes #3862

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181018003500.10030-1-vladz@scylladb.com>
2018-10-19 16:47:17 +00:00
Tomasz Grabiec
fc37b80d24 Merge "Correctly handle dropped columns in SSTable 3" from Piotr J.
Previously we were making assumptions about missing columns
(the size of their value, whether they are collections or counters), but
those assumptions were not always true. Now we use the column type
from the serialization header to get the right values.

Fixes #3859

* seastar-dev.git haaawk/projects/sstables-30/handling-dropped-columns/v4:
  sstables 3: Correctly handle dropped columns in column_translation
  sstables 3: Add test for dropped columns handling
2018-10-19 16:47:17 +00:00
Duarte Nunes
3a53b3cebc Merge 'hinted handoff: add manager::state and split storing and replaying enablement' from Vlad
"
Refs #3828
(Probably fixes it)

We found a few flaws in the way we enable hint replaying.
First of all, it was allowed before manager::start() completed.
Second, since manager::start() is called after messaging_service is
initialized, there was a time window during which hints were rejected, and this
creates an issue for MV.

Both issues above were found in the context of #3828.

This series fixes them both.

Tested {release}:
dtest: materialized_views_test.py:TestMaterializedViews.write_to_hinted_handoff_for_views_test
dtest: hintedhandoff_additional_test.py
"

* 'hinted_handoff_dont_create_hints_until_started-v1' of https://github.com/vladzcloudius/scylla:
  hinted handoff: enable storing hints before starting messaging_service
  db::hints::manager: add a "started" state
  db::hints::manager: introduce a _state
2018-10-19 16:47:16 +00:00
Avi Kivity
1ce52d5432 locator: fix abstract_replication_strategy::get_ranges() and friends violating sort order
get_ranges() is supposed to return ranges in sorted order. However, a35136533d
broke this and returned the range that was supposed to be last in the second
position (e.g. [0, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9]). This broke cleanup, which
relied on the sort order to perform a binary search. Other users of the
get_ranges() family did not rely on the sort order.

Fixes #3872.
Message-Id: <20181019113613.1895-1-avi@scylladb.com>
2018-10-19 16:47:12 +00:00
Vlad Zolotarov
aca0882a3f hinted handoff: enable storing hints before starting messaging_service
When messaging_service is started we may immediately receive a mutation
from another node (e.g. in the MV update context). If hinted handoff is not
ready to store hints at that point, we may fail some MV updates.

We are going to resolve this by start()ing the hints::managers before we
start messaging_service and by blocking hint replaying until all relevant
objects are initialized.

Refs #3828

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:49:58 -04:00
Vlad Zolotarov
cff4186517 db::hints::manager: add a "started" state
Hinting is allowed after "started" and before "stopping".
Attempts to store hints outside this time frame will be
dropped.

Refs #3828

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:41:36 -04:00
Vlad Zolotarov
fb513a4b23 db::hints::manager: introduce a _state
Introduce a multi-bit state field. In this patch it replaces the _stopping
boolean. We are going to add more states in the following patches.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:41:33 -04:00
Piotr Jastrzebski
e94254b563 sstables 3: Add test for dropped columns handling
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-18 19:13:58 +02:00
Piotr Jastrzebski
cafb3dc2ae sstables 3: Correctly handle dropped columns in column_translation
Previously we were making assumptions about missing columns
(the size of their value, whether they are collections or counters), but
those assumptions were not always true. Now we use the column type
from the serialization header to get the right values.

Fixes #3859

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-18 19:13:44 +02:00
Eliran Sinvani
ded3a03356 cql3: rename selection metadata manipulation functions
In the past, the addition of non-serializable columns was used
only for post-ordering of result sets. The newly added ALLOW FILTERING
feature will need to use these functions for other post-processing operations,
i.e. filtering. The renaming accounts for the new and existing uses of the
functions.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2018-10-18 17:52:04 +03:00
Avi Kivity
472afea6cd Update seastar submodule
* seastar 4669469...d152f2d (5):
  > build: don't link with libgcc_s explicitly
  > scheduling: add std::hash<seastar::scheduling_group>
  > prometheus: Allow preemption between each metric
  > Merge "improve memory detection in containers" from Juliana
  > Merge "perf_tests: produce json reports" from Paweł
2018-10-18 14:55:18 +03:00
Duarte Nunes
7610cedc34 Merge "db/hints: Expose current backlog" from Duarte
"
Hints are stored on disk by a hints::manager, ensuring they are
eventually sent. A hints::resource_manager ensures the hints::managers
it tracks don't consume more than their allocated resources by
monitoring disk space and disabling new hints if needed. This series
fixes some bugs related to the backlog calculation, but mainly exposes
the backlog through a hints::manager so upper layers can apply flow
control.

Refs #2538
"

* 'hh-manager-backlog/v3' of https://github.com/duarten/scylla:
  db/hints/manager: Expose current backlog
  db/hints/manager: Move decision about blocking hints to the manager
  db/hints/resource_manager: Correctly account resources in space_watchdog
  db/hints/resource_manager: Replace timer with seastar::thread
  db/hints/resource_manager: Ensure managers are correctly registered
  db/hints/resource_manager: Fix formatting
  db/hints: Disallow moving or copying the managers
2018-10-16 20:35:34 +01:00
Duarte Nunes
624472d16a db/hints/manager: Expose current backlog
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:35:00 +01:00
Duarte Nunes
6dcb7a39d4 db/hints/manager: Move decision about blocking hints to the manager
The space_watchdog enables or disables hints for the managers
associated with a particular device. We encapsulate this decision
inside the hints::managers by introducing the update_backlog()
function.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:35:00 +01:00
Duarte Nunes
207c9c8e38 db/hints/resource_manager: Correctly account resources in space_watchdog
A db::hints::resource_manager manages the resources for one or two
db::hints::managers. Each of these can be using the same or different
devices. The db::hints::space_watchdog periodically checks whether
each manager is within their resource allocation, and if not disables
it.

The watchdog iterates over the managers and accounts for the total
size they are using. This is wrong, since it can account in the same
variable the size consumed by managers using different devices.

We fix this while taking advantage of the fact that on_timer is now
called in the context of a seastar::thread, instead of using future
combinators.

Fixes #3821

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:34:54 +01:00
Duarte Nunes
25d266bdc1 db/hints/resource_manager: Replace timer with seastar::thread
This will make on_timer() much simpler, allowing a bug to be fixed in
subsequent patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Duarte Nunes
278aa13bb0 db/hints/resource_manager: Ensure managers are correctly registered
Registering a manager for a new device used
std::unordered_map::emplace(), which may not insert the specified
value if one with the same key has already been added. This could
happen if both managers were using the same device and the fiber
deferred in-between adding them.

Found during code reading. Could cause hints to not be disabled for an
overloaded manager.

Fixes #3822

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Duarte Nunes
9e3b09cf48 db/hints/resource_manager: Fix formatting
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Duarte Nunes
622ac734da db/hints: Disallow moving or copying the managers
Disable the copy and move ctors and assignment operators for both the
hints::manager and the hints::resource_manager.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-16 20:32:16 +01:00
Glauber Costa
7edae5421d sstables: print sstable path in case of an exception
Without that, we don't know where to look for the problems

Before:

 compaction failed: sstables::malformed_sstable_exception (Too big ttl: 3163676957)

After:

 compaction_manager - compaction failed: sstables::malformed_sstable_exception (Too big ttl: 4294967295 in sstable /var/lib/scylla/data/system_traces/events-8826e8e9e16a372887533bc1fc713c25/mc-832-big-Data.db)

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181016181004.17838-1-glauber@scylladb.com>
2018-10-16 20:31:20 +01:00
Asias He
7f826d3343 streaming: Expose reason for streaming
On receiving a mutation_fragment or a mutation triggered by a streaming
operation, we pass an enum stream_reason to notify the receiver what
the streaming is used for, so the receiver can decide on further operations,
e.g., sending view updates, beyond applying the streamed data to disk.

Fixes #3276
Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>
2018-10-15 22:03:28 +01:00
Benny Halevy
7eef527769 handle both special token_kinds in dht::tri_compare
Handle the before_all_keys and after_all_keys token_kind
at the highest layer before calling into the virtual
i_partitioner::tri_compare that is not set up to handle these cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181015165612.29356-1-bhalevy@scylladb.com>
2018-10-15 20:00:54 +03:00
Glauber Costa
51906f7144 compactions: log tokens that we decide not to write down to an SSTable
May be important when debugging issues related to cleanups

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181015162643.7834-1-glauber@scylladb.com>
2018-10-15 19:28:00 +03:00
Vladimir Krivopalov
092276b13d sstables: Reset opened range tombstone when moving to another partition.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <f6dc6b0bd88ca44f2ef84c2a8bee43fde82c89cc.1539396572.git.vladimir@scylladb.com>
2018-10-14 11:20:11 +03:00
Vladimir Krivopalov
926b6430fd sstables: Factor out code resetting values for a new partition.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <83a3a4ce6942b036be447bcfeb66142828e75293.1539396572.git.vladimir@scylladb.com>
2018-10-14 11:20:10 +03:00
Glauber Costa
98332de268 api: use longs instead of ints for snapshot sizes
Int types in json will be serialized to int types in C++. They will then
only be able to handle 4GB, and we tend to store more data than that.

Without this patch, listsnapshots is broken in all versions.

Fixes: #3845

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20181012155902.7573-1-glauber@scylladb.com>
2018-10-12 21:17:24 +03:00
Tomasz Grabiec
b89556512a Merge "Enable sstable_mutation_test with SSTables 3.x." from Vladimir
Introduce uppermost_bound() method instead of upper_bound() in mutation_fragment_filter and clustering_ranges_walker.

For now, this has been only used to produce the final range tombstone
for sliced reads inside consume_partition_end().

Usage of the upper bound of the current range causes problems of two
kinds:
    1. If not all the slicing ranges have been traversed with the
    clustering range walker, which is possible when the last read
    mutation fragment was before some of the ranges and reading was limited
    to a specific range of positions taken from index, the emitted range
    tombstone will not cover the untraversed slices.

    2. At the same time, if all ranges have been walked past, the end
    bound is set to after_all_clustered_rows and the emitted RT may span
    more data than it should.

To avoid both situations, the uppermost bound is used instead, which
refers to the upper bound of the last range in the sequence.

* github.com/scylladb/seastar-dev.git haaawk/projects/sstables-30/enable-mc-with-sstable-mutation-test/v2
  sstables: Use uppermost_bound() instead of upper_bound() in
    mutation_fragment_filter.
  tests: Enable sstable_mutation_test for SSTables 'mc' format.

Rebased by Piotr J.
2018-10-12 15:14:17 +02:00
Vladimir Krivopalov
5b03fe7982 tests: Enable sstable_mutation_test for SSTables 'mc' format.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-12 14:18:15 +02:00
Vladimir Krivopalov
199dc9d5a7 sstables: Use uppermost_bound() instead of upper_bound() in mutation_fragment_filter.
For now, this has been only used to produce the final range tombstone
for sliced reads inside consume_partition_end().

Usage of the upper bound of the current range causes problems of two
kinds:
    1. If not all the slicing ranges have been traversed with the
    clustering range walker, which is possible when the last read
    mutation fragment was before some of the ranges and reading was limited
    to a specific range of positions taken from index, the emitted range
    tombstone will not cover the untraversed slices.

    2. At the same time, if all ranges have been walked past, the end
    bound is set to after_all_clustered_rows and the emitted RT may span
    more data than it should.

To avoid both situations, the uppermost bound is used instead, which
refers to the upper bound of the last range in the sequence.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-12 14:18:15 +02:00
Tomasz Grabiec
193efef950 Merge "Make SST3 pass test_clustering_slices test" from Piotr
* seastar-dev.git haaawk/sst3/test_clustering_slices/v8:
  sstables: Extract on_end_of_stream from consume_partition_end
  sstables: Don't call consume_range_tombstone_end in
    consume_partition_end
  sstables: Change the way fragments are returned from consumer
2018-10-12 14:11:51 +02:00
Piotr Jastrzebski
1a6cef80f0 sstables: Change the way fragments are returned from consumer
Split range tombstone (if present) on every consume_row_end call
and store both range tombstone and row in different fields called
_stored_row and _stored_tombstone instead of using single field
called _stored.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-12 13:51:39 +02:00
Piotr Jastrzebski
3109c94c84 sstables: Don't call consume_range_tombstone_end in consume_partition_end
We don't need to check _opened_range_tombstone and _mf_filter again

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-12 13:51:28 +02:00
Piotr Jastrzebski
7dcea660e8 sstables: Extract on_end_of_stream from consume_partition_end
The new function will be called when the stream of data is finished,
while the old consume_partition_end will be called when a partition
is finished but the stream is not done yet.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-12 13:50:52 +02:00
Piotr Jastrzebski
717cb2a9e7 sstables: Adopt test_clustering_slices test for SST3
Readers for SST3 return somewhat more precise range tombstones
when the reader is slicing. Namely, SST2 readers return whole
range tombstones that overlap with the slicing range, but SST3
readers trim those range tombstones to the slicing range.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-11 15:47:47 +02:00
Tomasz Grabiec
a7a14e3af2 Merge "Handle dead row markers when writing to SSTables 3.x" from Vladimir
There is a mismatch between row markers used in SSTables 2.x (ka/la) and
liveness_info used by SSTables 3.x (mc) in that a row marker can be
written as a deleted cell but liveness_info cannot.

To handle this, for a dead row marker the corresponding liveness_info is
written as expiring liveness_info with a fake TTL set to 1.
This approach is adapted from the solution for CASSANDRA-13395 that
exercised similar issue during SSTables upgrades.

* github.com/argenet/scylla.git projects/sstables-30/dead-row-marker/v7:
  sstables: Introduce TTL limitation and special 'expired TTL' value.
  sstables: Write dead row marker as expired liveness info.
  tests: Add test covering dead row marker writing to SSTables 3.x.
2018-10-11 10:58:57 +02:00
Gleb Natapov
ceb361544a stream_session: remove unused capture
The 'consumer function' parameter for distribute_reader_and_consume_on_shards()
captures a schema_ptr (which is a seastar::shared_ptr), but the function
is later copied to another shard, at which point the schema_ptr is also copied
and its counter is incremented by the wrong shard. The capture is not
even used, so let's just drop it.

Fixes #3838

Message-Id: <20181011075500.GN14449@scylladb.com>
2018-10-11 11:10:58 +03:00
Botond Dénes
23f3831aaf table::make_streaming_reader(): add forwarding parameter
The single-range overload, when used by
make_multishard_streaming_reader(), has to create a reader that is
forwardable. Otherwise the multishard streaming reader will not produce
any output as it cannot fast-forward its shard readers to the ranges
produced by the generator.

Also add a unit test based on the real-life purpose the
multishard streaming reader was designed for: serving partitions
from a shard according to a sharding configuration that is different
from the local one. This is also the scenario that found the bug in the
first place.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <bf799961bfd535882ede6a54cd6c4b6f92e4e1c1.1539235034.git.bdenes@scylladb.com>
2018-10-11 10:59:18 +03:00
Vlad Zolotarov
5b12ec441d db::hints::manager: use "streaming" I/O scheduling class for reads
Make sure that read I/O in the context of HH sending does not overpower I/O
in the context of queries, memtable flushes or compactions.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-10 15:22:43 -04:00
Vlad Zolotarov
a89188de07 commitlog::read_log_file(): set a read I/O priority class explicitly
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-10 15:22:43 -04:00
Vlad Zolotarov
629972d586 db::hints::manager: add hints sender to the "streaming" CPU scheduling group
Make sure that HH sends do not overpower (CPU wise) regular WRITEs flow.

Fixes #3817

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-10 15:22:43 -04:00
Vladimir Krivopalov
9a04200b03 tests: Add test covering dead row marker writing to SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-10 11:44:54 -07:00
Vladimir Krivopalov
9c773fa6cf sstables: Write dead row marker as expired liveness info.
This makes it possible to distinguish expired liveness info from yet-to-expire
info and convert it into a dead row marker on read.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-10 11:44:14 -07:00
Vladimir Krivopalov
e71cc5ab20 sstables: Introduce TTL limitation and special 'expired TTL' value.
This makes it possible to store expired liveness info in the SSTables 3.x
format without introducing a possible conflict with real TTL values.

As per Cassandra, TTL cannot exceed 20 years so taking the maximum value
as a special value for indicating expired liveness info is safe.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-10 11:44:14 -07:00
Calle Wilund
3cb50c861d messaging_service: Make rpc streaming sink respect tls connection
Fixes #3787

The message service streaming sink was created using a direct call to
rpc::client::make_sink. This in turn needs a new socket, which it
creates while completely ignoring what underlying transport is active for the
client in question.

Fix by retaining the tls credentials pointer in the client wrapper, and
using it in a sink method to determine whether to create a new tls
socket, or just go ahead with a plain one.

Message-Id: <20181010003249.30526-1-calle@scylladb.com>
2018-10-10 12:55:28 +03:00
Avi Kivity
1891779e64 Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte
"
This series changes hinted handoff to work with `frozen_mutation`s
instead of naked `mutation`s. Instead of unfreezing a mutation from
the commitlog entry and then freezing it again for sending, now we'll
just keep the read, frozen mutation.

Tests: unit(release)
"

* 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla:
  db/hints/manager: Use frozen_mutation instead of mutation
  db/hints/manager: Use database::find_schema()
  db/commitlog/commitlog_entry: Allow moving the contained mutation
  service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
  service/storage_proxy: Build a shared_mutation from a frozen_mutation
  service/storage_proxy: Lift frozen_mutation_and_schema
  service/storage_proxy: Allow non-const ranges in mutate_prepare()
2018-10-09 17:48:18 +03:00
Piotr Sarna
a93d27960c tests: add secondary index paging unit test case
A simple case for SI paging is added to secondary_index_test suite.
This commit should be followed by more complex testing
and serves as an example on how to extract paging state and use it
across CQL queries.
Message-Id: <b22bdb5da1ef8df399849a66ac6a1f377e6a650a.1539090350.git.sarna@scylladb.com>
2018-10-09 15:05:20 +01:00
Avi Kivity
cfab7a2be6 Update seastar submodule
* seastar ed44af8...4669469 (2):
  > prometheus: Fix histogram text representation
  > reactor: count I/O errors

Fixes #3827.
2018-10-09 16:36:47 +03:00
Gleb Natapov
319ece8180 storage_proxy: do not pass write_stats down to send_to_live_endpoints
write_stats is referenced from write handler which is available in
send_to_live_endpoints already. No need to pass it down.

Message-Id: <20181009133017.GA14449@scylladb.com>
2018-10-09 16:33:53 +03:00
Botond Dénes
d467b518bc multishard_mutation_query(): don't attempt to stop broken readers
Currently, when stopping a reader fails, it simply won't be attempted to
be saved, and it will be left in the `_readers` array as-is. This can
lead to an assertion failure as the reader state will contain futures
that were already waited upon, and that the cleanup code will attempt to
wait on again. To prevent this, when stopping a reader fails, reset it
to nonexistent state, so that the cleanup code doesn't attempt to do
anything with it.

Refs: #3830

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a1afc1d3d74f196b772e6c218999c57c15ca05be.1539088164.git.bdenes@scylladb.com>
2018-10-09 15:59:50 +03:00
Gleb Natapov
207b57a892 storage_proxy: count number of timed out write attempts after CL is reached
It is useful to have this counter to investigate the reason for read
repairs. A non-zero value means that writes were lost after CL was reached
and RR is expected.

Message-Id: <20181009120900.GF22665@scylladb.com>
2018-10-09 15:17:07 +03:00
Piotr Sarna
b3685342a6 service/pager: avoid dereferencing null partition key
The pager::state() function returns a valid paging object even
if the pager itself is exhausted. It may also not contain the partition
key, so using it unconditionally was a bug. Now, in case there is no
partition key present, the paging state will contain an empty partition key.

Fixes #3829

Message-Id: <28401eb21ab8f12645c0a33d9e92ada9de83e96b.1539074813.git.sarna@scylladb.com>
2018-10-09 12:13:52 +03:00
Botond Dénes
4bb0bbb9e2 database: add make_multishard_streaming_reader()
Creates a streaming reader that reads from all shards. Shard readers are
created with `table::make_streaming_reader()`.
This is needed for the new row-level repair.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4b74c710bed2ef98adf07555a4c841e5b690dd8c.1538470782.git.bdenes@scylladb.com>
2018-10-09 11:07:47 +03:00
Botond Dénes
3eeb6fbd23 table::make_streaming_reader(): add single-range overload
This will be used by the `make_multishard_streaming_reader()` in the
next patch. This method will create a multishard combining reader which
needs its shard readers to take a single range, not a vector of ranges
like the existing overload.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cc6f2c9a8cf2c42696ff756ed6cb7949b95fe986.1538470782.git.bdenes@scylladb.com>
2018-10-09 11:07:46 +03:00
Botond Dénes
a56871fab7 tests/multishard_mutation_query_test: test range-tombstones spanning multiple pages
Extend the existing range-tombstone test, such that range tombstones
span multiple pages worth of rows.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <583aa826ea12118289b08d483b55b5573d27e1ee.1539002810.git.bdenes@scylladb.com>
2018-10-09 10:18:28 +03:00
Vladimir Krivopalov
e9aba6a9c3 sstables: Add missing 'mc' format into format strings map in sstable::filename().
Fixes #3832.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <269421fb2ac8ab389231cbe9ed501da7e7ff936a.1539048008.git.vladimir@scylladb.com>
2018-10-09 10:07:08 +03:00
Asias He
8edf3defdf range_streamer: Futurize add_ranges
It might take a long time for get_all_ranges_with_sources_for and
get_all_ranges_with_strict_sources_for to calculate, which causes reactor
stalls. To fix, run them in a thread and yield. Those functions are used in
the slow path, so it is ok to yield more than needed.

Fixes #3639

Message-Id: <63aa7794906ac020c9d9b2984e1351a8298a249b.1536135617.git.asias@scylladb.com>
2018-10-09 09:46:50 +03:00
Nadav Har'El
b8668dc0f8 materialized views: refuse to filter by non-key column
A materialized view can provide a filter so as to pick up only a subset
of the rows from the base table. Usually, the filter operates on columns
from the base table's primary key. If we use a filter on regular (non-key)
columns, things get hairy, and as issue #3430 showed, wrong: merely updating
this column in the base table may require us to delete, or resurrect, the
view row. But normally we need to do the above when the "new view key column"
was updated, when there is one. We use shadowable tombstones with one
timestamp to do this, so it cannot take into account the two timestamps from
those two columns (the filtered column and the new key column).

So in the current code, filtering by a non-key column does not work correctly.
In this patch we provide two test cases (one involving TTLs, and one involving
only normal updates), which demonstrate vividly that it does *not* work
correctly. With normal updates, trying to resurrect a view row that has
previously disappeared fails. With TTLs, things are even worse, and the view
row fails to disappear when the filtered column is TTLed.

In Cassandra, the same thing doesn't work correctly either (see
CASSANDRA-13798 and CASSANDRA-13832), so they decided to refuse creating
a materialized view filtering a non-key column. In this patch we also
do this - fail the creation of such an unsupported view. For this reason,
the two tests mentioned above are commented out in a "#if", with, instead,
a trivial test verifying a failure to create such a view.

Note that as explained above, when the filtered column and new view key
column are *different* we have a problem. But when they are the *same* - namely
we filter by a non-key base column which actually *is* a key in the view -
we are actually fine. This patch includes additional test cases verifying
that this case is really fine and provides correct results. Accordingly,
this case is *not* forbidden in the view creation code.

Fixes #3430.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181008185633.24616-1-nyh@scylladb.com>
2018-10-08 20:37:11 +01:00
Avi Kivity
0fa60660b8 Merge "Fix mutation fragments clobbering on fast_forward" from Vladimir
"
This patchset fixes a bug in SSTables 3.x reading when fast-forwarding
is enabled. It is possible that a mutation fragment, row or RT marker,
is read and then stored because it falls outside the current
fast-forwarding range.

If the reader is further fast-forwarded but the
stored fragment still falls outside the new range, the reader would
continue reading and fetch the next fragment, if any, which would clobber
the currently stored one. With this fix, the reader does not attempt to
read on after storing the current fragment.

Tests: unit {release}
"

* 'projects/sstables-30/row-skipped-on-double-ff/v2' of https://github.com/argenet/scylla:
  tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x.
  sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment.
  sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.
2018-10-08 20:18:42 +03:00
Vladimir Krivopalov
07d61683b6 tests: Add test for reading rows after multiple fast-forwarding with SSTables 3.x.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-08 09:09:33 -07:00
Botond Dénes
d0eb443913 result_memory_accounter: drop state_for_another_shard()
This is not used since range-scans were refactored (e49a14e30) as part
of making them stateful.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <589f30163e29299e840750457919214a26f0da93.1539005336.git.bdenes@scylladb.com>
2018-10-08 14:29:48 +01:00
Duarte Nunes
48ebe6552c Merge 'Fix issues with endpoint state replication to other shards' from Tomasz
Fixes #3798
Fixes #3694

Tests:

  unit(release), dtest([new] cql_tests.py:TruncateTester.truncate_after_restart_test)

* tag 'fix-gossip-shard-replication-v1' of github.com:tgrabiec/scylla:
  gms/gossiper: Replicate endpoint states in add_saved_endpoint()
  gms/gossiper: Make reset_endpoint_state_map() have effect on all shards
  gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
  gms/gossiper: Always override states from older generations
2018-10-08 14:19:19 +01:00
Avi Kivity
4b16867bd7 cql: relax writetime/ttl selections of collections
writetime() or ttl() selections of frozen collections can work, as they
are single cells. Relax the check to allow them, and only forbid non-frozen
collections.

Fixes #3825.

Tests: cql_query_test (release).
Message-Id: <20181008123920.27575-1-avi@scylladb.com>
2018-10-08 14:07:01 +01:00
Duarte Nunes
56e36ee14b flat_mutation_reader: Use std::move(range) in move_buffer_content_to()
Instead of open coding it.

Tests: unit(release)

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181008104328.13164-1-duarte@scylladb.com>
2018-10-08 13:57:13 +03:00
Avi Kivity
474bb4e44f cql: functions: implement min/max/count for bytes type
Uncomment existing declare() calls and implement tests. Because the
data_value(bytes) constructor is explicit, we add explicit conversion to
data_value in impl_min_function_for<> and impl_max_function_for<>.

Fixes #3824.
Message-Id: <20181008084127.11062-1-avi@scylladb.com>
2018-10-08 10:48:30 +01:00
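The explicit-conversion point above can be shown in isolation; `wrapped` and `max_as_wrapped` below are hypothetical stand-ins for data_value and impl_max_function_for<>, not the real Scylla types:

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative stand-in for a type with an explicit constructor, mirroring
// the explicit data_value(bytes) constructor described above.
struct wrapped {
    std::string bytes;
    explicit wrapped(std::string b) : bytes(std::move(b)) {}
};

// Because the constructor is explicit, a generic max-aggregate cannot rely
// on implicit conversion from the raw type; it must convert explicitly.
template <typename T>
wrapped max_as_wrapped(const std::vector<T>& in) {
    T best = in.front();
    for (const T& v : in) {
        if (best < v) {
            best = v;
        }
    }
    return wrapped(best); // explicit conversion; a bare `return best;` would not compile
}
```

This is the same shape of change the commit describes: adding an explicit conversion at the point where the aggregate's result is turned into the value type.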
Takuya ASADA
d89114d1fc dist/debian: install GPG key for cross-building
We found that on some Debian environments the Ubuntu .deb build fails with
a GPG error because the Ubuntu GPG key is missing, so we need to install it
before starting pbuilder.
Similarly, the Debian GPG key needs to be installed as well.

Fixes #3823

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181008072246.13305-1-syuu@scylladb.com>
2018-10-08 10:43:25 +03:00
Botond Dénes
b01050e28c HACKING.md: add link to the scylla-dev mailing list
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <9a5d967f791d7a0db584864f68f93bbc68f52372.1538977773.git.bdenes@scylladb.com>
2018-10-08 10:06:50 +03:00
Duarte Nunes
74d809f8be db/hints/manager: Use frozen_mutation instead of mutation
Instead of unfreezing a mutation from the commitlog and then freezing
it again to send, just keep the read frozen mutation.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:57:30 +01:00
Duarte Nunes
6eec9748fc db/hints/manager: Use database::find_schema()
Instead of using find_column_family() and repeatedly asking for
column_family::schema(), use database::find_schema().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:57:30 +01:00
Duarte Nunes
5b3d08defc db/commitlog/commitlog_entry: Allow moving the contained mutation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:57:30 +01:00
Duarte Nunes
3b6d2286e9 service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
Add an overload to send_to_endpoint() which accepts a frozen_mutation.
The motivation is to allow better accounting of pending view updates,
but this change also allows some callers to avoid unfreezing already
frozen mutations.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:37:39 +01:00
Duarte Nunes
c7639f53e0 service/storage_proxy: Build a shared_mutation from a frozen_mutation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Duarte Nunes
9e14412528 service/storage_proxy: Lift frozen_mutation_and_schema
Lift frozen_mutation_and_schema to frozen_mutation.hh, since other
subsystems using frozen_mutations will likely want to pass it around
together with the schema.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Duarte Nunes
2c739f36cc service/storage_proxy: Allow non-const ranges in mutate_prepare()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Avi Kivity
1cc81d1492 Update seastar submodule
* seastar 71e914e...ed44af8 (4):
  > Merge "Add semaphore_units<>::split() function" from Duarte
  > scheduling: introduce destroy_scheduling_group()
  > tls: include "api.hh" for listen_options
  > rpc: connection-level resource isolation
2018-10-07 20:45:49 +03:00
Duarte Nunes
4162bff37a Merge 'cql3: allow adding or dropping multiple columns in ALTER TABLE statement' from Benny
"
This patchset implements ALTER TABLE ADD/DROP for multiple columns.

Fixes: #2907
Fixes: #3691

Tests: schema_change_test
"

* 'projects/cql3/alter-table-multi/v3' of https://github.com/bhalevy/scylla:
  cql3: schema_change_test: add test_multiple_columns_add_and_drop
  cql3: allow adding or dropping multiple columns in ALTER TABLE statement
  cql3: alter_table_statement: extract add/alter/drop per-column code into functions
  cql3: testing for MVs for alter_table_statement::type::drop is not per column
  cql3: schema_change_test: add test_static_column_is_dropped
2018-10-07 17:30:09 +01:00
Benny Halevy
0f350f5d59 cql3: schema_change_test: add test_multiple_columns_add_and_drop
Add a unit test for adding or dropping multiple columns.
See https://github.com/scylladb/scylla/issues/2907

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 19:14:29 +03:00
Benny Halevy
23fecc7e5e cql3: allow adding or dropping multiple columns in ALTER TABLE statement
Fixes #2907
Fixes #3691

See Cassandra reference: https://apache.googlesource.com/cassandra/+/cassandra-3.6/src/antlr/Parser.g
/**
 * ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;
 * ALTER COLUMN FAMILY <CF> ADD <column> <newtype>; | ALTER COLUMN FAMILY <CF> ADD (<column> <newtype>,<column1> <newtype1>..... <column n> <newtype n>)
 * ALTER COLUMN FAMILY <CF> DROP <column>; | ALTER COLUMN FAMILY <CF> DROP ( <column>,<column1>.....<column n>)
 * ALTER COLUMN FAMILY <CF> WITH <property> = <value>;
 * ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>;
 */
alterTableStatement returns [shared_ptr<alter_table_statement> expr]
    @init {
        alter_table_statement::type type;
        auto props = make_shared<cql3::statements::cf_prop_defs>();
        std::vector<alter_table_statement::column_change> column_changes;
        std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames;
    }
    : K_ALTER K_COLUMNFAMILY cf=columnFamilyName
          ( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; column_changes.emplace_back(id, v); }
          | K_ADD                                     { type = alter_table_statement::type::add; }
            (         id1=cident  v1=comparatorType  b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); }
            | '('     id1=cident  v1=comparatorType  b1=cfisStatic { column_changes.emplace_back(id1, v1, b1); }
                 (',' idn=cident  vn=comparatorType  bn=cfisStatic { column_changes.emplace_back(idn, vn, bn); } )* ')'
            )

          | K_DROP  id=cident                         { type = alter_table_statement::type::drop; column_changes.emplace_back(id); }
          | K_WITH  properties[props]                 { type = alter_table_statement::type::opts; }
          | K_RENAME                                  { type = alter_table_statement::type::rename; }
               id1=cident K_TO toId1=cident { renames.emplace_back(id1, toId1); }
               ( K_AND idn=cident K_TO toIdn=cident { renames.emplace_back(idn, toIdn); } )*
          )
    {
        $expr = ::make_shared<alter_table_statement>(std::move(cf), type, std::move(column_changes), std::move(props), std::move(renames));
    }
    ;

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 19:14:26 +03:00
Benny Halevy
3fa6d3d3a8 cql3: alter_table_statement: extract add/alter/drop per-column code into functions
In preparation to supporting ALTER TABLE with multiple columns (#3691)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 18:57:06 +03:00
Alexys Jacob
eebbae066a dist/common/scripts/scylla_setup: fix gentoo linux installed package detection
The return code is expected to be 0 when the installed package is found.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181002123433.4702-1-ultrabug@gentoo.org>
2018-10-07 16:46:02 +03:00
Alexys Jacob
850d046551 dist/common/scripts/scylla_ntp_setup: fix gentoo linux systemd service name
Fix a typo: the ntpd package's systemd service is named ntpd, not sntpd.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181002123802.5576-1-ultrabug@gentoo.org>
2018-10-07 16:46:01 +03:00
Alexys Jacob
54151d2039 dist/common/scripts/scylla_cpuscaling_setup: fix file open mode for writing
The Gentoo Linux part tries to open the configuration file without the
write flag, leading to an exception.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20181002123957.6010-1-ultrabug@gentoo.org>
2018-10-07 16:46:00 +03:00
Avi Kivity
700994a4f2 Merge "Add GDB commands for examining gossiper and RPC state" from Tomasz
* 'gdb-gms-netw' of github.com:tgrabiec/scylla:
  gdb: Introduce 'scylla netw' command
  gdb: Introduce 'scylla gms' command
  gdb: Add sharded service wrapper
  gdb: Add unique_ptr wrapper
  gdb: Add list_unordered_set()
  gdb: Make std_vector wrapper indexable
  gdb: Add wrapper for std_map
2018-10-07 16:42:52 +03:00
Vlad Zolotarov
7cbe5f2983 service: priority_manager.hh: add #pragma once
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181005040552.2183-3-vladz@scylladb.com>
2018-10-07 16:04:26 +03:00
Duarte Nunes
30d6ed8f92 service/storage_proxy: Consider target liveness in sent_to_endpoint()
So that we don't attempt to send mutations to unreachable endpoints, and
instead store a hint for them, we now check the endpoint status and
populate dead_endpoints accordingly in
storage_proxy::send_to_endpoint().

Fixes #3820

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181007100640.2182-1-duarte@scylladb.com>
2018-10-07 16:04:26 +03:00
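The liveness split described above can be sketched with plain containers (a minimal illustration with hypothetical names, not the real storage_proxy interface): targets that fail a liveness check are diverted into a dead_endpoints list so they receive a hint instead of a doomed send.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hedged sketch: split targets by liveness. `split_by_liveness` and the
// string endpoints are illustrative stand-ins for gossip-backed checks.
std::pair<std::vector<std::string>, std::vector<std::string>>
split_by_liveness(const std::vector<std::string>& targets,
                  const std::function<bool(const std::string&)>& is_alive) {
    std::vector<std::string> live;
    std::vector<std::string> dead_endpoints;
    for (const auto& ep : targets) {
        if (is_alive(ep)) {
            live.push_back(ep);           // will actually be sent to
        } else {
            dead_endpoints.push_back(ep); // will get a hint, not a send
        }
    }
    return {live, dead_endpoints};
}
```

In the real code the liveness check comes from the failure detector; here it is just a caller-supplied predicate.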
Benny Halevy
581b9006d4 cql3: testing for MVs for alter_table_statement::type::drop is not per column
No column can be dropped from a table with materialized views,
so the respective exception can omit the dropped column name.

In preparation for refactoring the respective code, move the per-column
code to member functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 15:16:32 +03:00
Benny Halevy
8d298064b1 cql3: schema_change_test: add test_static_column_is_dropped
Test dropping of static column defined in CREATE TABLE, and
adding and dropping of a static column using ALTER TABLE.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-07 14:34:28 +03:00
Duarte Nunes
a69d468101 service/storage_proxy: Fix formatting of send_to_endpoint()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181006204756.32232-1-duarte@scylladb.com>
2018-10-07 11:05:32 +03:00
Vladimir Krivopalov
9db124c6e5 sstables: mp_row_consumer_m to notify reader on end of stream when storing a mutation fragment.
Without it, the reader will attempt to read further and may clobber the
stored fragment with the next one read, if any.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-05 19:09:09 -07:00
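The stop-on-store rule described above can be reduced to a small sketch (all names are illustrative, not the real mp_row_consumer_m interface): once a fragment is parked because it falls outside the current range, the consumer signals "stop" so the reader does not fetch, and thereby clobber, another fragment.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hedged sketch of the consumer's decision; `proceed` mirrors the idea of
// notifying the reader about end of stream.
enum class proceed { yes, no };

struct consumer {
    std::optional<std::string> stored;

    proceed on_fragment(std::string frag, bool in_range) {
        if (!in_range) {
            stored = std::move(frag); // park it for a later fast-forward
            return proceed::no;       // end of stream for now: do not read on
        }
        // ...a real consumer would emit the fragment downstream here...
        return proceed::yes;
    }
};
```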
Vladimir Krivopalov
8e004684e9 sstables: In mp_row_consumer_m::push_mutation_fragments(), return the called helper's value.
Instead of blindly proceeding, use whatever the call to maybe_push_*()
has returned.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-10-05 19:05:03 -07:00
Duarte Nunes
b839f551cf cql3/statements/select_statement: Don't double count unpaged queries
Unpaged queries are those for which the client didn't enable paging,
and we already account for them in
indexed_table_select_statement::do_execute().

Remove the second increment in read_posting_list().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181003121811.11750-1-duarte@scylladb.com>
2018-10-05 17:36:39 +02:00
Nadav Har'El
e4ef7fc40a materialized views: enable two tests in view_schema_test
We had two commented out tests based on Cassandra's MV unit tests, for
the case that the view's filter (the "SELECT" clause used to define the
view) filtered by a non-primary-key column. These tests used to fail
because of problems we had in the filtering code, but they now succeed,
so we can enable them. This patch also adds some comments about what
the tests do, and adds a few more cases to one of the tests.

Refs #3430.

However, note that the success of these tests does not really prove that
the non-PK-column filtering feature works fully correctly; known issues led
Cassandra to forbid it, as explained in
https://issues.apache.org/jira/browse/CASSANDRA-13798. We can probably
fix this feature with our "virtual cells" mechanism, but will need to add
a test to confirm the possible problem and its (probably needed) fix.
We do not add such a test in this patch.

In the meantime, issue #3430 should remain open: we still *allow* users
to create MV with such a filter, and, as the tests in this patch show,
this "mostly" works correctly. We just need to prove and/or fix what happens
with the complex row liveness issues a la issue #3362.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181004213637.32330-1-nyh@scylladb.com>
2018-10-04 22:43:38 +01:00
Tomasz Grabiec
3c7de9fee9 gms/gossiper: Replicate endpoint states in add_saved_endpoint() 2018-10-04 12:54:00 +02:00
Tomasz Grabiec
ddf3a61bcf gms/gossiper: Make reset_endpoint_state_map() have effect on all shards 2018-10-04 12:53:56 +02:00
Tomasz Grabiec
9e3f744603 gms/gossiper: Replicate STATUS change from mark_as_shutdown() to other shards
Lack of this may result in non-zero shards on some nodes still seeing
STATUS as NORMAL for a node which shut down, in some cases.

mark_as_shutdown() is invoked in reaction to an RPC call initiated by
the node which is shutting down. Another way a node can learn about
other node shutting down is via gossiping with a node which knows
this. In that case, the states will be replicated to non-zero
shards. The node which learnt via mark_as_shutdown() may also
eventually propagate this to non-zero shards, e.g. when it gossips
about it with other nodes, provided its local version number at the time of
mark_as_shutdown() was smaller than the one used to set the state by
the shutting-down node.
2018-10-04 12:51:42 +02:00
Tomasz Grabiec
c4ec81e126 gms/gossiper: Always override states from older generations
Application states of each node are versioned per-node with a pair of
generation number (more significant) and value version. The generation
number uniquely identifies the lifetime of a scylla
process, and changes after restart. Value versions start
from 0 on each restart. When a node gets updates for application
states, it merges them with its view of the given node. Value updates
with older versions are ignored.

The gossiper processes updates only on shard 0, and replicates value
updates to other shards. When it sees a value with a new generation,
it correctly forgets all previous values. However, non-zero shards
don't forget values from previous generations. As a result,
replication will fail to override the values on non-zero shards when
the generation number changes, until their value version exceeds the
version prior to the restart.

This will result in incorrect STATUS for non-seed nodes on non-zero
shards.  When restarting a non-seed node, it will do a shadow gossip
round before setting its STATUS to NORMAL. In the shadow round it will
learn from other nodes about itself, and set its STATUS to shutdown on
all shards with a high value version. Later, when it sets its status
to NORMAL, it will override it only on shard 0, because on other
shards the version of STATUS is higher.

This will cause CQL truncate to skip the current node if the coordinator
runs on a non-zero shard.

The fix is to override the entries on remote shards in the same way we
do on shard 0. All updates to endpoint states should be already
serialized on shard 0, and remote shards should see them in the same
order.

Introduced in 2d5fb9d

Fixes #3798
Fixes #3694
2018-10-04 12:47:27 +02:00
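The merge rule this fix enforces can be sketched as a standalone predicate (a hedged illustration; the gossiper's real types differ): a value from a newer generation must always override, no matter how large the old value's version was.

```cpp
// Illustrative (generation, version) pair, mirroring the per-node
// versioning described above. Names are not the real gms types.
struct versioned_value {
    long generation;
    long version;
};

bool should_override(const versioned_value& current, const versioned_value& incoming) {
    if (incoming.generation != current.generation) {
        // A new generation means the node restarted: forget the old
        // generation's versions entirely instead of comparing them.
        return incoming.generation > current.generation;
    }
    // Same generation: only newer value versions win.
    return incoming.version > current.version;
}
```

The bug was, in effect, that non-zero shards kept applying the same-generation branch to values from an older generation, so a high pre-restart version (e.g. the INT_MAX used for SHUTDOWN) blocked updates.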
Piotr Sarna
a5570cb288 tests: add missing get() calls in threaded context
One test case missed a few get() calls needed to wait
for continuations; it only worked accidentally
because it was followed by 'eventually()' blocks.
Message-Id: <69c145575ac81154c4b5f500d01c6b045a267088.1536839959.git.sarna@scylladb.com>
2018-10-04 10:55:45 +01:00
Piotr Sarna
8a2abd45fb tests: add collections test for secondary indexing
A test case for creating indexes on collection columns
is added to the suite.

Refs #3654
Refs #2962
Message-Id: <1b6844634b6e9a353028545813571647c92fb330.1536839959.git.sarna@scylladb.com>
2018-10-04 10:55:45 +01:00
Piotr Sarna
2d355bdf47 cql3: prevent creation of indexes on non-frozen collections
Until indexes on non-frozen collections are implemented,
creating such indexes should be disallowed to prevent unnecessary
errors on insertions/selections.

Fixes #3653
Refs #2962
Message-Id: <218cf96d5e38340806fb9446b8282d2296ba5f43.1536839959.git.sarna@scylladb.com>
2018-10-04 10:55:45 +01:00
Duarte Nunes
959559d568 cql3/statements/select_statement: Remove outdated comment
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181003193033.13862-1-duarte@scylladb.com>
2018-10-04 09:45:17 +03:00
Eliran Sinvani
20f49566a2 cql3 : add workaround to antlr3 null dereference bug
The Antlr3 exception class has a null dereference bug that crashes
the system when trying to extract the exception message using the
ANTLR_Exception<...>::displayRecognitionError(...) function. When
a parsing error occurs, the CqlParser throws an exception, which is in
turn processed for some special cases in scylla to generate a custom
message. The default case, however, creates the message using
displayRecognitionError, causing the system to crash.
The fix is a simple workaround, making sure the pointer is not null
before the call to the function. A "proper" fix can't be implemented
because the exception class itself is implemented outside scylla,
in antlr headers that reside on the host machine's OS.

Tested manually with 2 test cases: a typo causing scylla to crash, and
a cql comment without a newline at the end that also caused scylla to crash.
Ran unit tests (release).

Fixes #3740
Fixes #3764

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <cfc7e0d758d7a855d113bb7c8191b0fd7d2e8921.1538566542.git.eliransin@scylladb.com>
2018-10-03 18:30:06 +03:00
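The shape of the workaround can be sketched as a simple null guard (hypothetical names; the real fix guards an antlr3 exception field before the call into displayRecognitionError):

```cpp
#include <string>

// Hedged sketch: never hand a possibly-null message pointer to code that
// would dereference it; fall back to a generic message instead.
std::string safe_error_message(const char* raw) {
    if (raw == nullptr) {
        return "unknown parse error"; // fallback instead of a crash
    }
    return std::string(raw);
}
```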
Tomasz Grabiec
9c57abcce7 gossiper: Fix shutdown_announce_in_ms not being respected
shutdown_announce_in_ms specifies a period of time that a node which
is shutting down waits to allow its state to propagate to other nodes.
However, we were setting _enabled to false before waiting, which
will make the current node ignore gossip messages.
Message-Id: <1538576996-26283-1-git-send-email-tgrabiec@scylladb.com>
2018-10-03 15:43:00 +01:00
Tomasz Grabiec
fda8e271e3 gdb: Introduce 'scylla netw' command
Prints information about the state of the messaging service layer.

Example:

(gdb) scylla netw
Dropped messages: {0 <repeats 25 times>}
Outgoing connections:
IP: 127.0.0.2, (netw::messaging_service::rpc_protocol_client_wrapper*) 0x6000051cd220:
  stats: {replied = 0, pending = 0, exception_received = 0, sent_messages = 23, wait_reply = 0, timeout = 0}
  outstanding: 0
Server: resources={_count = 85899345, _ex = {_M_exception_object = 0x0}, _wait_list = {_list = {_front_chunk = 0x0, _back_chunk = 0x0, _nchunks = 0, _free_chunks = 0x0, _nfree_chunks = 0}, _on_expiry = {<No data fields>}, _size = 0}}
Incoming connections:
127.0.0.1:28071:
   {replied = 0, pending = 0, exception_received = 0, sent_messages = 2, wait_reply = 0, timeout = 0}
2018-10-03 15:05:22 +02:00
Tomasz Grabiec
cf07cda08f gdb: Introduce 'scylla gms' command
Prints gossiper state. Example:

(gdb) scylla gms
127.0.0.2: (gms::endpoint_state*) 0x6010050c0550 ({_generation = 1538568389, _version = 2147483647})
  gms::application_state::STATUS: {version=18, value="NORMAL,968364964011550971"}
  gms::application_state::LOAD: {version=267, value="494510"}
  gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"}
  gms::application_state::DC: {version=10, value="datacenter1"}
  gms::application_state::RACK: {version=11, value="rack1"}
  gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"}
  gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.2"}
  gms::application_state::NET_VERSION: {version=1, value="0"}
  gms::application_state::HOST_ID: {version=2, value="ee281b83-1acb-4aa3-927c-985a7d9a7c6f"}
127.0.0.1: (gms::endpoint_state*) 0x6010051422b0 ({_generation = 1538557402, _version = 0})
  gms::application_state::STATUS: {version=18, value="NORMAL,9176584852507611499"}
  gms::application_state::LOAD: {version=22521, value="409817"}
  gms::application_state::SCHEMA: {version=13, value="27e48f6a-a668-398a-b2f5-cf4b905450e9"}
  gms::application_state::DC: {version=10, value="datacenter1"}
  gms::application_state::RACK: {version=11, value="rack1"}
  gms::application_state::RELEASE_VERSION: {version=4, value="3.0.8"}
  gms::application_state::RPC_ADDRESS: {version=3, value="127.0.0.1"}
  gms::application_state::NET_VERSION: {version=1, value="0"}
  gms::application_state::HOST_ID: {version=2, value="88ff543f-e9b8-42eb-a876-c0f917078a31"}
2018-10-03 15:05:22 +02:00
Tomasz Grabiec
8c6f8b1773 gdb: Add sharded service wrapper 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
4adfed9dba gdb: Add unique_ptr wrapper 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
e29e302272 gdb: Add list_unordered_set() 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
272bc88699 gdb: Make std_vector wrapper indexable 2018-10-03 15:05:22 +02:00
Tomasz Grabiec
b436759d49 gdb: Add wrapper for std_map 2018-10-03 15:05:22 +02:00
Pekka Enberg
de48966abc cql3: Move as_json_function class to separate file
The as_json_function class is not registered as a function, but we can
still keep it in cql3/functions, as per its namespace, to reduce the size
of select_statement.cc.
Message-Id: <20181002132637.30233-1-penberg@scylladb.com>
2018-10-03 13:30:08 +01:00
Piotr Sarna
4a23297117 cql3: add asking for pk/ck in the base query
Base query partition and clustering keys are used to generate
paging state for an index query, so they always need to be present
when a paged base query is processed.
Message-Id: <f3bf69453a6fd2bc842c8bdbd602d62c91cf9218.1538568953.git.sarna@scylladb.com>
2018-10-03 13:26:51 +01:00
Piotr Sarna
50d3de0693 cql3: add checking for may_need_paging when executing base query
It's not sufficient to check for positive page_size when preparing
a base query for indexed select statement - may_need_paging() should
be called as well.
Message-Id: <d435820019e4082a64ca9807541f0c9ad334e6a8.1538568953.git.sarna@scylladb.com>
2018-10-03 13:26:51 +01:00
Piotr Sarna
11b8831c04 cql3: move base query command creation to a separate function
Message-Id: <6b48b8cbd6312da4a17bfd3c85af628b4215e9f4.1538568953.git.sarna@scylladb.com>
2018-10-03 13:26:51 +01:00
Avi Kivity
7c8143c3c4 Revert "compaction: demote compaction start/end messages to DEBUG level"
This reverts commit b443a9b930. The compaction
history table doesn't have enough information to be a replacement for this
log message yet.
2018-10-03 13:13:37 +03:00
Avi Kivity
b9702222f8 Merge "Handle simple column type schema changes in SST3" from Piotr
"
This patchset enables very simple column type conversions.
It covers only handling variable and fixed size type differences.
Two types still have to be compatible at the bit level to be able to convert a field from one to the other.
"

* 'haaawk/sst3/column_type_schema_change/v4' of github.com:scylladb/seastar-dev:
  Fix check_multi_schema to actually check the column type change
  Handle very basic column type conversions in SST3
  Enable check_multi_schema for SST3
2018-10-03 13:12:10 +03:00
Piotr Jastrzebski
3a60eac1d5 Fix check_multi_schema to actually check the column type change
Field 'e' was supposed to be read as a blob, but the test had a bug
and the read schema was treating that field as an int. This patch
changes that and makes the test really check the column type change.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-03 10:56:40 +02:00
Piotr Jastrzebski
3cecb61ac1 Handle very basic column type conversions in SST3
After this change very simple schema changes of column type
will work. This change makes sure that variable size and fixed
size types can be converted to each other but only if their bit
representation can be automatically converted between those types.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-03 10:56:40 +02:00
Piotr Jastrzebski
c117a6b3c8 Enable check_multi_schema for SST3
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2018-10-03 10:56:39 +02:00
Nadav Har'El
bebe5b5df2 materialized views: add view_updates_pending statistic
We are already maintaining a statistic of the number of pending view updates
sent but not yet completed by view replicas, so let's expose it.
As with all per-table statistics, this one will only be exposed if the
"--enable-keyspace-column-family-metrics" option is on.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2018-10-02 20:44:58 +01:00
Nadav Har'El
1d5f8d0015 materialized views: update stats.write statistics in all cases
mutate_MV usually calls send_to_endpoint() to push view updates to remote
view replicas. This function gets passed a statistics object,
service::storage_proxy_stats::write_stats and, in particular, updates
its "writes" statistic which counts the number of ongoing writes.

In the case that the paired view replica happens to be the *same* node,
we avoid calling send_to_endpoint() and call mutate_locally() instead.
That function does not take a write_stats object, so the "writes" statistic
doesn't get incremented for the duration of the write. So we should do
this explicitly.

Co-authored-by: Nadav Har'El <nyh@scylladb.com>
Co-authored-by: Duarte Nunes <duarte@scylladb.com>
2018-10-02 20:44:58 +01:00
Duarte Nunes
40a30d4129 db/schema_tables: Diff tables using ID instead of name
Currently we diff schemas based on table/view name, and if the names
match, then we detect altered schemas by comparing the schema
mutations. This fails to detect transitions which involve dropping and
recreating a schema with the same name, if a node receives these
notifications simultaneously (for example, if the node was temporarily
down or partitioned).

Note that because the ID is persisted and created when executing a
create_table_statement, even if a schema is re-created with the
exact same structure as before, we will still consider it altered,
because the mutations will differ.

This also stops schema pulling from working, since it relies on schema
merging.

The solution is to diff schemas using their ID, and not their name.

Keyspaces and user types are also susceptible to this, but in their
case it's fine: these are values with no identity, and are just
metadata. Dropping and recreating a keyspace can be viewed as dropping
all tables from the keyspace, altering it, and eventually adding new
tables to the keyspace.

Note that this solution doesn't apply to tables dropped and created
with the same ID (using the `WITH ID = {}` syntax). For that, we would
need to detect deltas instead of applying changes and then reading the
new state to find differences. However, this solution is enough,
because tables are usually created with ID = {} for very specific,
peculiar reasons. The original motivation was for the new table to
be treated exactly as the old, so the current behavior is in fact the
desired one.

Tests: unit(release), dtests(schema_test, schema_management_test)

Fixes #3797

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181001230932.47153-2-duarte@scylladb.com>
2018-10-02 20:15:46 +02:00
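The ID-keyed diff can be sketched with plain containers (a minimal illustration, not Scylla's schema_tables code): keying by ID makes a drop-and-recreate under the same name show up as one drop plus one create, never as an "alter".

```cpp
#include <map>
#include <set>
#include <string>

// Illustrative table metadata; `definition` stands in for the schema
// mutations compared by the real code.
struct table_meta {
    std::string name;
    std::string definition;
};

struct schema_diff {
    std::set<int> created;
    std::set<int> dropped;
    std::set<int> altered;
};

schema_diff diff_by_id(const std::map<int, table_meta>& before,
                       const std::map<int, table_meta>& after) {
    schema_diff d;
    for (const auto& [id, t] : after) {
        auto it = before.find(id);
        if (it == before.end()) {
            d.created.insert(id);             // new ID: created table
        } else if (it->second.definition != t.definition) {
            d.altered.insert(id);             // same ID, changed schema
        }
    }
    for (const auto& [id, t] : before) {
        if (!after.count(id)) {
            d.dropped.insert(id);             // ID gone: dropped table
        }
    }
    return d;
}
```

A name-keyed diff would classify a drop-and-recreate of "t" as an alter; by ID it is correctly a drop of the old ID and a create of the new one.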
Duarte Nunes
e404f09a23 db/schema_tables: Drop tables before creating new ones
Doing it by the inverse order doesn't support dropping and creating a
schema with the same name.

Refs #3797

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181001230932.47153-1-duarte@scylladb.com>
2018-10-02 20:15:32 +02:00
Avi Kivity
aaab8a3f46 utils: crc32: mark power crc32 assembly as not requiring an executable stack
The linker uses an opt-in system for non-executable stack: if all object files
opt into a non-executable stack, the binary will have a non-executable stack,
which is very desirable for security. The compiler cooperates by opting into
a non-executable stack whenever possible (always for our code).

However, we also have an assembly file (for fast power crc32 computations).
Since it doesn't opt into a non-executable stack, we get a binary with
executable stack, which Gentoo's build system rightly complains about.

Fix by adding the correct incantation to the file.

Fixes #3799.

Reported-by: Alexys Jacob <ultrabug@gmail.com>
Message-Id: <20181002151251.26383-1-avi@scylladb.com>
2018-10-02 18:48:23 +01:00
Avi Kivity
53a4b8ae86 Update seastar submodule
* seastar 5712816...71e914e (12):
  > Merge "rpc shard to shard connection" from Gleb
  > Merge "Fix memory leaks when stoppping memcached" from Tomasz
  > scripts: perftune.py: prioritize I/O schedulers
  > alien: fix the size of local item[]
  > seastar-addr2line: don't invoke addr2line multiple times
  > reactor: use labels for different io_priority_class:s
  > util/spinlock: fix bad namespacing of <xmmintrin.h>
  > Merge "scripts: perftune.py: support different I/O schedulers" from Vlad
  > timer: Do not require callback to be copyable
  > core/reactor: Fix hang on shutdown with long task quota
  > build: use 'ppa:scylladb/ppa' instead of URL for sourceline
  > net/dns: add net::dns::get_srv_records() helper
2018-10-02 18:48:23 +01:00
Avi Kivity
7322ac105c Merge "sstables_stats" from Benny
"
This patchset adds sstable partition/row read/write/seek statistics.

Tests: dtest sstable_generation_loading_test.py stress_tool_test.py

Fixes: #251
"

* 'projects/sstables-stats/v5' of https://github.com/bhalevy/scylla:
  sstables stats: row reads
  sstables stats: partition seeks
  sstables stats: partition reads
  sstables stats: flat mutation reads
  sstables stats: cell/cell_tombstone writes
  sstables stats: partition/row/tombstone writes
  sstables_stats: writer_impl: move common members to base class
2018-10-02 15:05:10 +03:00
Duarte Nunes
7ba944a243 service/migration_manager: Validate duplicate ID in time
We allow tables to be created with the ID property, mostly for
advanced recovery cases. However, we need to validate that the ID
doesn't match an existing one. We currently do this in
database::add_column_family(), but this is already too late in the
normal workflow: if we allow the schema change to go through, then
it is applied to the system tables and loaded the next time the node
boots, regardless of us throwing from database::add_column_family().

To fix this, we perform this validation when announcing a new table.

Note that the check wasn't removed from database::add_column_family();
it's there since 2015 and there might have been other reasons to add
it that are not related to the ID property.

Refs #2059

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181001230142.46743-1-duarte@scylladb.com>
2018-10-02 13:40:40 +03:00
Benny Halevy
bd6533f471 sstables stats: row reads
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
192c1949a3 sstables stats: partition seeks
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
edb3c23125 sstables stats: partition reads
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
e9dffa56c8 sstables stats: flat mutation reads
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:43 +03:00
Benny Halevy
4ccdc1115d sstables stats: cell/cell_tombstone writes
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:15:41 +03:00
Benny Halevy
2f48f72d5c sstables stats: partition/row/tombstone writes
Introduce per-thread sstables stats infrastructure

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:01:14 +03:00
Benny Halevy
6853c1677d sstables_stats: writer_impl: move common members to base class
To be used by sstable_writer for stats collection.

Note that this patch is factored out so it can be verified with no
other change in functionality.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2018-10-01 13:01:00 +03:00
1326 changed files with 51250 additions and 21361 deletions

.dockerignore Normal file
View File

@@ -0,0 +1,3 @@
.git
build
seastar/build

.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui
@@ -9,3 +9,6 @@
[submodule "xxHash"]
path = xxHash
url = ../xxHash
[submodule "libdeflate"]
path = libdeflate
url = ../libdeflate

View File

@@ -138,4 +138,5 @@ target_include_directories(scylla PUBLIC
${SEASTAR_INCLUDE_DIRS}
${Boost_INCLUDE_DIRS}
xxhash
libdeflate
build/release/gen)

View File

@@ -24,7 +24,9 @@ Running `./install-dependencies.sh` (as root) installs the appropriate packages
### Build system
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native thread, and up to 3 GB per native thread while linking.
**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native
thread, and up to 3 GB per native thread while linking. GCC >= 8.1.1 is
required.
Scylla is built with [Ninja](https://ninja-build.org/), a low-level rule-based system. A Python script, `configure.py`, generates a Ninja file (`build.ninja`) based on configuration options.
@@ -43,9 +45,7 @@ The full suite of options for project configuration is available via
$ ./configure.py --help
```
The most important options are:
- `--mode={release,debug,all}`: Debug mode enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) and allows for debugging with tools like GDB. Debugging builds are generally slower and generate much larger object files than release builds.
The most important option is:
- `--{enable,disable}-dpdk`: [DPDK](http://dpdk.org/) is a set of libraries and drivers for fast packet processing. During development, it's not necessary to enable support even if it is supported by your platform.
@@ -57,6 +57,29 @@ To save time -- for instance, to avoid compiling all unit tests -- you can also
$ ninja-build build/release/tests/schema_change_test
```
You can also specify a single mode. For example
```bash
$ ninja-build release
```
This will build everything in release mode. The valid modes are:
* Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)
and other sanity checks. It has no optimizations, which allows for debugging with tools like
GDB. Debugging builds are generally slower and generate much larger object files than release builds.
* Release: Fewer checks and more optimizations. It still has debug info.
* Dev: No optimizations or debug info. The objective is to compile and link as fast as possible.
This is useful for the first iterations of a patch.
Note that by default unit test binaries are stripped so they can't be used with gdb or seastar-addr2line.
To include debug information in the unit test binary, build the test binary with a `_g` suffix. For example,
```bash
$ ninja-build build/release/tests/schema_change_test_g
```
### Unit testing
Unit tests live in the `/tests` directory. Like with application source files, test sources and executables are specified manually in `configure.py` and need to be updated when changes are made.
@@ -83,7 +106,7 @@ The `-c1 -m1G` arguments limit this Seastar-based test to a single system thread
### Preparing patches
All changes to Scylla are submitted as patches to the public mailing list. Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
All changes to Scylla are submitted as patches to the public [mailing list](mailto:scylladb-dev@googlegroups.com). Once a patch is approved by one of the maintainers of the project, it is committed to the maintainers' copy of the repository at https://github.com/scylladb/scylla.
Detailed instructions for formatting patches for the mailing list and advice on preparing good patches are available at the [ScyllaDB website](http://docs.scylladb.com/contribute/). There are also some guidelines that can help you make the patch review process smoother:
@@ -112,6 +135,8 @@ The usual is "Tests: unit (release)", although running debug tests is encouraged
5. When answering review comments, prefer inline quotes as they make it easier to track the conversation across multiple e-mails.
6. The Linux kernel's [Submitting Patches](https://www.kernel.org/doc/html/v4.19/process/submitting-patches.html) document offers excellent advice on how to prepare patches and patchsets for review. Since the Scylla development process is derived from the kernel's, almost all of the advice there is directly applicable.
### Finding a person to review and merge your patches
You can use the `scripts/find-maintainer` script to find a subsystem maintainer and/or reviewer for your patches. The script accepts a filename in the git source tree as an argument and outputs a list of subsystems the file belongs to and their respective maintainers and reviewers. For example, if you changed the `cql3/statements/create_view_statement.hh` file, run the script as follows:
@@ -164,6 +189,29 @@ On a development machine, one might run Scylla as
$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes
```
To interact with Scylla it is recommended to build our versions of
cqlsh and nodetool. They are available at
https://github.com/scylladb/scylla-tools-java and can be built with
```bash
$ ./install-dependencies.sh
$ ant jar
```
cqlsh should work out of the box, but nodetool depends on a running
scylla-jmx (https://github.com/scylladb/scylla-jmx). It can be built
with
```bash
$ mvn package
```
and must be started with
```bash
$ ./scripts/scylla-jmx
```
### Branches and tags
Multiple release branches are maintained on the Git repository at https://github.com/scylladb/scylla. Release 1.5, for instance, is tracked on the `branch-1.5` branch.
@@ -254,7 +302,7 @@ In this example, `10.0.0.2` will be sent up to 16 jobs and the local machine wil
When a compilation is in progress, the status of jobs on all remote machines can be visualized in the terminal with `distccmon-text` or graphically as a GTK application with `distccmon-gnome`.
One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next section speeding up this process.
One thing to keep in mind is that linking object files happens on the coordinating machine, which can be a bottleneck. See the next sections on speeding up this process.
### Using the `gold` linker
@@ -264,6 +312,24 @@ Linking Scylla can be slow. The gold linker can replace GNU ld and often speeds
$ sudo alternatives --config ld
```
### Using split dwarf
With debug info enabled, most of the link time is spent copying and
relocating it. It is possible to leave most of the debug info out of
the link by writing it to a side .dwo file. This is done by passing
`-gsplit-dwarf` to gcc.
Unfortunately, just `-gsplit-dwarf` would slow down `gdb` startup. To
avoid that, the gold linker can be told to create an index with
`--gdb-index`.
More info at https://gcc.gnu.org/wiki/DebugFission.
Both options can be enabled by passing `--split-dwarf` to configure.py.
Note that distcc is *not* compatible with it, but icecream
(https://github.com/icecc/icecream) is.
### Testing changes in Seastar with Scylla
Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
@@ -277,3 +343,8 @@ $ git remote add local /home/tsmith/src/seastar
$ git remote update
$ git checkout -t local/my_local_seastar_branch
```
### Core dump debugging
Slides:
2018.11.20: https://www.slideshare.net/tomekgrabiec/scylla-core-dump-debugging-tools

View File

@@ -13,6 +13,11 @@ $ # Rejoice!
Please see [HACKING.md](HACKING.md) for detailed information on building and developing Scylla.
**Note**: GCC >= 8.1.1 is required to compile Scylla.
**Note**: See [frozen toolchain](tools/toolchain/README.md) for a way to build and run
on an older distribution.
## Running Scylla
* Run Scylla

View File

@@ -1,6 +1,7 @@
#!/bin/sh
VERSION=666.development
PRODUCT=scylla
VERSION=3.1.4
if test -f version
then
@@ -22,3 +23,4 @@ echo "$SCYLLA_VERSION-$SCYLLA_RELEASE"
mkdir -p build
echo "$SCYLLA_VERSION" > build/SCYLLA-VERSION-FILE
echo "$SCYLLA_RELEASE" > build/SCYLLA-RELEASE-FILE
echo "$PRODUCT" > build/SCYLLA-PRODUCT-FILE

View File

@@ -611,6 +611,54 @@
}
]
},
{
"path":"/column_family/toppartitions/{name}",
"operations":[
{
"method":"GET",
"summary":"Toppartitions query",
"type":"toppartitions_query_results",
"nickname":"toppartitions",
"produces":[
"application/json"
],
"parameters":[
{
"name":"name",
"description":"The column family name in keyspace:name format",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"duration",
"description":"Duration (in milliseconds) of monitoring operation",
"required":true,
"allowMultiple":false,
"type":"int",
"paramType":"query"
},
{
"name":"list_size",
"description":"number of the top partitions to list",
"required":false,
"allowMultiple":false,
"type":"int",
"paramType":"query"
},
{
"name":"capacity",
"description":"capacity of stream summary: determines amount of resources used in query processing",
"required":false,
"allowMultiple":false,
"type":"int",
"paramType":"query"
}
]
}
]
},
{
"path":"/column_family/metrics/memtable_columns_count/",
"operations":[
@@ -2816,6 +2864,44 @@
"description":"The column family type"
}
}
},
"toppartitions_record":{
"id":"toppartitions_record",
"description":"nodetool toppartitions query record",
"properties":{
"partition":{
"type":"string",
"description":"Partition key"
},
"count":{
"type":"long",
"description":"Number of read/write operations"
},
"error":{
"type":"long",
"description":"Indication of inaccuracy in counting PKs"
}
}
},
"toppartitions_query_results":{
"id":"toppartitions_query_results",
"description":"nodetool toppartitions query results",
"properties":{
"read":{
"type":"array",
"items":{
"type":"toppartitions_record"
},
"description":"Read results"
},
"write":{
"type":"array",
"items":{
"type":"toppartitions_record"
},
"description":"Write results"
}
}
}
}
}

View File

@@ -2228,11 +2228,11 @@
"description":"The column family"
},
"total":{
"type":"int",
"type":"long",
"description":"The total snapshot size"
},
"live":{
"type":"int",
"type":"long",
"description":"The live snapshot size"
}
}

View File

@@ -20,9 +20,9 @@
*/
#include "api.hh"
#include "http/file_handler.hh"
#include "http/transformers.hh"
#include "http/api_docs.hh"
#include <seastar/http/file_handler.hh>
#include <seastar/http/transformers.hh>
#include <seastar/http/api_docs.hh>
#include "storage_service.hh"
#include "commitlog.hh"
#include "gossiper.hh"
@@ -36,11 +36,13 @@
#include "endpoint_snitch.hh"
#include "compaction_manager.hh"
#include "hinted_handoff.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "stream_manager.hh"
#include "system.hh"
#include "api/config.hh"
logging::logger apilog("api");
namespace api {
static std::unique_ptr<reply> exception_reply(std::exception_ptr eptr) {

View File

@@ -21,13 +21,15 @@
#pragma once
#include "json/json_elements.hh"
#include <seastar/json/json_elements.hh>
#include <type_traits>
#include <boost/lexical_cast.hpp>
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/classification.hpp>
#include <boost/units/detail/utility.hpp>
#include "api/api-doc/utils.json.hh"
#include "utils/histogram.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "api_init.hh"
#include "seastarx.hh"
@@ -216,4 +218,42 @@ std::vector<T> concat(std::vector<T> a, std::vector<T>&& b) {
return a;
}
template <class T, class Base = T>
class req_param {
public:
sstring name;
sstring param;
T value;
req_param(const request& req, sstring name, T default_val) : name(name) {
param = req.get_query_param(name);
if (param.empty()) {
value = default_val;
return;
}
try {
// boost::lexical_cast does not use boolalpha. Converting a
// true/false throws exceptions. We don't want that.
if constexpr (std::is_same_v<Base, bool>) {
// Cannot use boolalpha because we (probably) want to
// accept 1 and 0 as well as true and false. And True. And fAlse.
std::transform(param.begin(), param.end(), param.begin(), ::tolower);
if (param == "true" || param == "1") {
value = T(true);
} else if (param == "false" || param == "0") {
value = T(false);
} else {
throw boost::bad_lexical_cast{};
}
} else {
value = T{boost::lexical_cast<Base>(param)};
}
} catch (boost::bad_lexical_cast&) {
throw bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));
}
}
operator T() const { return value; }
};
}
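The `req_param` specialization above accepts `1`/`0` and any capitalization of `true`/`false` precisely because `boost::lexical_cast` rejects them all. The same parsing rule as a small Python sketch (function name is illustrative, not part of the API):

```python
def parse_bool_param(param, default=False):
    """Lenient boolean query-parameter parsing: 1/0 plus any-case true/false."""
    if param == "":
        return default
    lowered = param.lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    # Mirrors the bad_param_exception thrown on a failed conversion.
    raise ValueError(f"type error - should be bool: {param!r}")

print(parse_bool_param("fAlse"))  # → False
print(parse_bool_param("1"))      # → True
```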

View File

@@ -19,9 +19,9 @@
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "database.hh"
#include "database_fwd.hh"
#include "service/storage_proxy.hh"
#include "http/httpd.hh"
#include <seastar/http/httpd.hh>
namespace api {

View File

@@ -21,8 +21,8 @@
#include "collectd.hh"
#include "api/api-doc/collectd.json.hh"
#include "core/scollectd.hh"
#include "core/scollectd_api.hh"
#include <seastar/core/scollectd.hh>
#include <seastar/core/scollectd_api.hh>
#include "endian.h"
#include <boost/range/irange.hpp>
#include <regex>

View File

@@ -22,11 +22,15 @@
#include "column_family.hh"
#include "api/api-doc/column_family.json.hh"
#include <vector>
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "sstables/sstables.hh"
#include "utils/estimated_histogram.hh"
#include <algorithm>
#include "db/data_listeners.hh"
extern logging::logger apilog;
namespace api {
using namespace httpd;
@@ -34,7 +38,7 @@ using namespace std;
using namespace json;
namespace cf = httpd::column_family_json;
const utils::UUID& get_uuid(const sstring& name, const database& db) {
std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {
auto pos = name.find("%3A");
size_t end;
if (pos == sstring::npos) {
@@ -46,11 +50,15 @@ const utils::UUID& get_uuid(const sstring& name, const database& db) {
} else {
end = pos + 3;
}
return std::make_tuple(name.substr(0, pos), name.substr(end));
}
const utils::UUID& get_uuid(const sstring& name, const database& db) {
auto [ks, cf] = parse_fully_qualified_cf_name(name);
try {
return db.find_uuid(name.substr(0, pos), name.substr(end));
return db.find_uuid(ks, cf);
} catch (std::out_of_range& e) {
throw bad_param_exception("Column family '" + name.substr(0, pos) + ":"
+ name.substr(end) + "' not found");
throw bad_param_exception(format("Column family '{}:{}' not found", ks, cf));
}
}
@@ -166,27 +174,27 @@ static future<json::json_return_type> get_cf_unleveled_sstables(http_context& ct
}, std::plus<int64_t>());
}
static int64_t min_row_size(column_family& cf) {
static int64_t min_partition_size(column_family& cf) {
int64_t res = INT64_MAX;
for (auto i: *cf.get_sstables() ) {
res = std::min(res, i->get_stats_metadata().estimated_row_size.min());
res = std::min(res, i->get_stats_metadata().estimated_partition_size.min());
}
return (res == INT64_MAX) ? 0 : res;
}
static int64_t max_row_size(column_family& cf) {
static int64_t max_partition_size(column_family& cf) {
int64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res = std::max(i->get_stats_metadata().estimated_row_size.max(), res);
res = std::max(i->get_stats_metadata().estimated_partition_size.max(), res);
}
return res;
}
static integral_ratio_holder mean_row_size(column_family& cf) {
static integral_ratio_holder mean_partition_size(column_family& cf) {
integral_ratio_holder res;
for (auto i: *cf.get_sstables() ) {
auto c = i->get_stats_metadata().estimated_row_size.count();
res.sub += i->get_stats_metadata().estimated_row_size.mean() * c;
auto c = i->get_stats_metadata().estimated_partition_size.count();
res.sub += i->get_stats_metadata().estimated_partition_size.mean() * c;
res.total += c;
}
return res;
@@ -403,22 +411,24 @@ void set_column_family(http_context& ctx, routes& r) {
return get_cf_stats(ctx, &column_family::stats::memtable_switch_count);
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](column_family& cf) {
utils::estimated_histogram res(0);
for (auto i: *cf.get_sstables() ) {
res.merge(i->get_stats_metadata().estimated_row_size);
res.merge(i->get_stats_metadata().estimated_partition_size);
}
return res;
},
utils::estimated_histogram_merge, utils_json::estimated_histogram());
});
// FIXME: this refers to partitions, not rows.
cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](column_family& cf) {
uint64_t res = 0;
for (auto i: *cf.get_sstables() ) {
res += i->get_stats_metadata().estimated_row_size.count();
res += i->get_stats_metadata().estimated_partition_size.count();
}
return res;
},
@@ -546,30 +556,36 @@ void set_column_family(http_context& ctx, routes& r) {
return sum_sstable(ctx, true);
});
// FIXME: this refers to partitions, not rows.
cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_row_size, min_int64);
return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_min_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, INT64_MAX, min_row_size, min_int64);
return map_reduce_cf(ctx, INT64_MAX, min_partition_size, min_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_row_size, max_int64);
return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_all_max_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, int64_t(0), max_row_size, max_int64);
return map_reduce_cf(ctx, int64_t(0), max_partition_size, max_int64);
});
// FIXME: this refers to partitions, not rows.
cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
});
// FIXME: this refers to partitions, not rows.
cf::get_all_mean_row_size.set(r, [&ctx] (std::unique_ptr<request> req) {
// Cassandra 3.x mean values are truncated as integrals.
return map_reduce_cf(ctx, integral_ratio_holder(), mean_row_size, std::plus<integral_ratio_holder>());
return map_reduce_cf(ctx, integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());
});
cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<request> req) {
@@ -920,5 +936,45 @@ void set_column_family(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
});
cf::toppartitions.set(r, [&ctx] (std::unique_ptr<request> req) {
auto name_param = req->param["name"];
auto [ks, cf] = parse_fully_qualified_cf_name(name_param);
api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};
api::req_param<unsigned> capacity(*req, "capacity", 256);
api::req_param<unsigned> list_size(*req, "list_size", 10);
apilog.info("toppartitions query: name={} duration={} list_size={} capacity={}",
name_param, duration.param, list_size.param, capacity.param);
return seastar::do_with(db::toppartitions_query(ctx.db, ks, cf, duration.value, list_size, capacity), [&ctx](auto& q) {
return q.scatter().then([&q] {
return sleep(q.duration()).then([&q] {
return q.gather(q.capacity()).then([&q] (auto topk_results) {
apilog.debug("toppartitions query: processing results");
cf::toppartitions_query_results results;
for (auto& d: topk_results.read.top(q.list_size())) {
cf::toppartitions_record r;
r.partition = sstring(d.item);
r.count = d.count;
r.error = d.error;
results.read.push(r);
}
for (auto& d: topk_results.write.top(q.list_size())) {
cf::toppartitions_record r;
r.partition = sstring(d.item);
r.count = d.count;
r.error = d.error;
results.write.push(r);
}
return make_ready_future<json::json_return_type>(results);
});
});
});
});
});
}
}
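The `capacity` and `list_size` parameters of the toppartitions handler above map onto a stream-summary (Space-Saving style) top-k structure: `capacity` bounds the number of tracked partitions, and counts inherited on eviction surface as the `error` field of each record. A rough Python sketch of that idea, not Scylla's actual implementation:

```python
class StreamSummary:
    """Space-Saving style top-k counter with a bounded number of entries."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counters = {}  # item -> [count, error]

    def offer(self, item):
        if item in self.counters:
            self.counters[item][0] += 1
        elif len(self.counters) < self.capacity:
            self.counters[item] = [1, 0]
        else:
            # Evict the smallest counter; its count becomes the new item's
            # error bound (we may be over-counting it by that much).
            victim = min(self.counters, key=lambda k: self.counters[k][0])
            count, _ = self.counters.pop(victim)
            self.counters[item] = [count + 1, count]

    def top(self, k):
        ranked = sorted(self.counters.items(), key=lambda kv: -kv[1][0])
        return [(item, c, e) for item, (c, e) in ranked[:k]]

summary = StreamSummary(capacity=2)
for key in ["a"] * 5 + ["b"] * 3 + ["c"]:
    summary.offer(key)
print(summary.top(1))  # → [('a', 5, 0)]
```

A larger `capacity` reduces the error bounds at the cost of more memory during the monitoring window, which is why the API doc describes it as determining "amount of resources used in query processing".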

View File

@@ -24,6 +24,7 @@
#include "api.hh"
#include "api/api-doc/column_family.json.hh"
#include "database.hh"
#include <seastar/core/future-util.hh>
#include <any>
namespace api {
@@ -71,12 +72,13 @@ struct map_reduce_column_families_locally {
std::any init;
std::function<std::any (column_family&)> mapper;
std::function<std::any (std::any, std::any)> reducer;
std::any operator()(database& db) const {
auto res = init;
for (auto i : db.get_column_families()) {
res = reducer(res, mapper(*i.second.get()));
}
return res;
future<std::any> operator()(database& db) const {
auto res = seastar::make_lw_shared<std::any>(init);
return do_for_each(db.get_column_families(), [res, this](const std::pair<utils::UUID, seastar::lw_shared_ptr<table>>& i) {
*res = reducer(*res.get(), mapper(*i.second.get()));
}).then([res] {
return *res;
});
}
};
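The change above turns a synchronous fold over column families into a future-based one so the reactor can preempt between tables. The same shape as an asyncio sketch (illustrative only, not the Seastar API):

```python
import asyncio

async def map_reduce_locally(items, init, mapper, reducer):
    """Fold mapper/reducer over items, yielding to the event loop between
    iterations the way do_for_each lets the reactor preempt between tables."""
    acc = init
    for item in items:
        acc = reducer(acc, mapper(item))
        await asyncio.sleep(0)  # preemption point
    return acc

total = asyncio.run(
    map_reduce_locally([1, 2, 3], 0, lambda x: x * x, lambda a, b: a + b))
print(total)  # → 14
```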

View File

@@ -22,15 +22,16 @@
#include "commitlog.hh"
#include <db/commitlog/commitlog.hh>
#include "api/api-doc/commitlog.json.hh"
#include "database.hh"
#include <vector>
namespace api {
template<typename Func>
static auto acquire_cl_metric(http_context& ctx, Func&& func) {
typedef std::result_of_t<Func(db::commitlog *)> ret_type;
template<typename T>
static auto acquire_cl_metric(http_context& ctx, std::function<T (db::commitlog*)> func) {
typedef T ret_type;
return ctx.db.map_reduce0([func = std::forward<Func>(func)](database& db) {
return ctx.db.map_reduce0([func = std::move(func)](database& db) {
if (db.commitlog() == nullptr) {
return make_ready_future<ret_type>();
}
@@ -63,15 +64,15 @@ void set_commitlog(http_context& ctx, routes& r) {
});
httpd::commitlog_json::get_completed_tasks.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));
});
httpd::commitlog_json::get_pending_tasks.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));
});
httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {
return acquire_cl_metric(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));
});
}

View File

@@ -47,8 +47,8 @@ void set_compaction_manager(http_context& ctx, routes& r) {
for (const auto& c : cm.get_compactions()) {
cm::summary s;
s.ks = c->ks;
s.cf = c->cf;
s.ks = c->ks_name;
s.cf = c->cf_name;
s.unit = "keys";
s.task_type = sstables::compaction_name(c->type);
s.completed = c->total_keys_written;
@@ -103,29 +103,37 @@ void set_compaction_manager(http_context& ctx, routes& r) {
});
cm::get_compaction_history.set(r, [] (std::unique_ptr<request> req) {
return db::system_keyspace::get_compaction_history().then([] (std::vector<db::system_keyspace::compaction_history_entry> history) {
std::vector<cm::history> res;
res.reserve(history.size());
for (auto& entry : history) {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
res.push_back(std::move(h));
}
return make_ready_future<json::json_return_type>(res);
});
std::function<future<>(output_stream<char>&&)> f = [](output_stream<char>&& s) {
return do_with(output_stream<char>(std::move(s)), true, [] (output_stream<char>& s, bool& first){
return s.write("[").then([&s, &first] {
return db::system_keyspace::get_compaction_history([&s, &first](const db::system_keyspace::compaction_history_entry& entry) mutable {
cm::history h;
h.id = entry.id.to_sstring();
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
for (auto it : entry.rows_merged) {
httpd::compaction_manager_json::row_merged e;
e.key = it.first;
e.value = it.second;
h.rows_merged.push(std::move(e));
}
auto fut = first ? make_ready_future<>() : s.write(", ");
first = false;
return fut.then([&s, h = std::move(h)] {
return formatter::write(s, h);
});
}).then([&s] {
return s.write("]").then([&s] {
return s.close();
});
});
});
});
};
return make_ready_future<json::json_return_type>(std::move(f));
});
cm::get_compaction_info.set(r, [] (std::unique_ptr<request> req) {

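The rewrite above streams the compaction history as a JSON array instead of materializing a vector of entries: write `[`, prefix every element after the first with `", "`, then write `]` and close the stream. The separator logic as a minimal Python generator (entry shape is hypothetical):

```python
import json

def stream_json_array(entries):
    """Yield chunks of a JSON array without building it in memory,
    emitting ", " before every element except the first."""
    yield "["
    first = True
    for entry in entries:
        if not first:
            yield ", "
        first = False
        yield json.dumps(entry)
    yield "]"

chunks = stream_json_array([{"ks": "k1", "cf": "t1"}, {"ks": "k1", "cf": "t2"}])
print("".join(chunks))  # → [{"ks": "k1", "cf": "t1"}, {"ks": "k1", "cf": "t2"}]
```

Keeping per-chunk state (`first`) rather than the whole array is what lets the handler bound memory regardless of how long the compaction history is.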
View File

@@ -22,6 +22,7 @@
#include "api/config.hh"
#include "api/api-doc/config.json.hh"
#include "db/config.hh"
#include "database.hh"
#include <sstream>
#include <boost/algorithm/string/replace.hpp>

View File

@@ -23,7 +23,7 @@
#include "api/lsa.hh"
#include "api/api.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "utils/logalloc.hh"
#include "log.hh"

View File

@@ -21,7 +21,7 @@
#include "messaging_service.hh"
#include "message/messaging_service.hh"
#include "rpc/rpc_types.hh"
#include <seastar/rpc/rpc_types.hh>
#include "api/api-doc/messaging_service.json.hh"
#include <iostream>
#include <sstream>
@@ -139,7 +139,7 @@ void set_messaging_service(http_context& ctx, routes& r) {
messaging_verb v = i; // for type safety we use messaging_verb values
auto idx = static_cast<uint32_t>(v);
if (idx >= map->size()) {
throw std::runtime_error(sprint("verb index out of bounds: %lu, map size: %lu", idx, map->size()));
throw std::runtime_error(format("verb index out of bounds: {:d}, map size: {:d}", idx, map->size()));
}
if ((*map)[idx] > 0) {
c.count = (*map)[idx];

View File

@@ -26,6 +26,7 @@
#include "service/storage_service.hh"
#include "db/config.hh"
#include "utils/histogram.hh"
#include "database.hh"
namespace api {

View File

@@ -28,13 +28,17 @@
#include <db/commitlog/commitlog.hh>
#include <gms/gossiper.hh>
#include <db/system_keyspace.hh>
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "repair/repair.hh"
#include "locator/snitch_base.hh"
#include "column_family.hh"
#include "log.hh"
#include "release.hh"
#include "sstables/compaction_manager.hh"
#include "sstables/sstables.hh"
#include "database.hh"
sstables::sstable::version_types get_highest_supported_format();
namespace api {
@@ -72,6 +76,19 @@ static std::vector<ss::token_range> describe_ring(const sstring& keyspace) {
}
void set_storage_service(http_context& ctx, routes& r) {
using ks_cf_func = std::function<future<json::json_return_type>(std::unique_ptr<request>, sstring, std::vector<sstring>)>;
auto wrap_ks_cf = [&ctx](ks_cf_func f) {
return [&ctx, f = std::move(f)](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
auto column_families = split_cf(req->get_query_param("cf"));
if (column_families.empty()) {
column_families = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());
}
return f(std::move(req), std::move(keyspace), std::move(column_families));
};
};
ss::local_hostid.set(r, [](std::unique_ptr<request> req) {
return db::system_keyspace::get_local_host_id().then([](const utils::UUID& id) {
return make_ready_future<json::json_return_type>(id.to_sstring());
@@ -299,24 +316,44 @@ void set_storage_service(http_context& ctx, routes& r) {
});
});
ss::scrub.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
auto disable_snapshot = req->get_query_param("disable_snapshot");
ss::scrub.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
// TODO: respect this
auto skip_corrupted = req->get_query_param("skip_corrupted");
return make_ready_future<json::json_return_type>(json_void());
});
ss::upgrade_sstables.set(r, [&ctx](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto keyspace = validate_keyspace(ctx, req->param);
auto column_family = req->get_query_param("cf");
auto exclude_current_version = req->get_query_param("exclude_current_version");
return make_ready_future<json::json_return_type>(json_void());
});
auto f = make_ready_future<>();
if (!req_param<bool>(*req, "disable_snapshot", false)) {
auto tag = format("pre-scrub-{:d}", db_clock::now().time_since_epoch().count());
f = parallel_for_each(column_families, [keyspace, tag](sstring cf) {
return service::get_local_storage_service().take_column_family_snapshot(keyspace, cf, tag);
});
}
return f.then([&ctx, keyspace, column_families] {
return ctx.db.invoke_on_all([=] (database& db) {
return do_for_each(column_families, [=, &db](sstring cfname) {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
return cm.perform_sstable_scrub(&cf);
});
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
}));
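The rewritten scrub handler above snapshots every requested table under a `pre-scrub-<timestamp>` tag (unless snapshots are disabled) before performing the scrub itself. A synchronous, dependency-free sketch of that control flow — names like `run_scrub` and `scrub_request` are illustrative, and the real endpoint runs through Seastar futures and per-shard invocation:

```cpp
#include <functional>
#include <string>
#include <vector>

// Illustrative request shape for the scrub endpoint sketch below.
struct scrub_request {
    std::string keyspace;
    std::vector<std::string> tables;
    bool disable_snapshot = false;
};

// Sketch of the scrub ordering: first snapshot every table (unless
// disabled), then scrub each table. Returns an ordered log of the
// operations performed, purely for illustration.
inline std::vector<std::string> run_scrub(
        const scrub_request& req,
        const std::function<void(const std::string& ks, const std::string& cf,
                                 const std::string& tag)>& take_snapshot,
        const std::function<void(const std::string& ks, const std::string& cf)>& scrub_table) {
    std::vector<std::string> log;
    if (!req.disable_snapshot) {
        // The real code builds the tag from db_clock::now(); fixed here.
        const std::string tag = "pre-scrub-1";
        for (const auto& t : req.tables) {
            take_snapshot(req.keyspace, t, tag);
            log.push_back("snapshot:" + t);
        }
    }
    for (const auto& t : req.tables) {
        scrub_table(req.keyspace, t);
        log.push_back("scrub:" + t);
    }
    return log;
}
```

The key property mirrored from the patch: all snapshots complete before any scrub begins, so a scrub that corrupts data can always be rolled back to the pre-scrub snapshot.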
ss::upgrade_sstables.set(r, wrap_ks_cf([&ctx](std::unique_ptr<request> req, sstring keyspace, std::vector<sstring> column_families) {
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
return ctx.db.invoke_on_all([=] (database& db) {
return do_for_each(column_families, [=, &db](sstring cfname) {
auto& cm = db.get_compaction_manager();
auto& cf = db.find_column_family(keyspace, cfname);
return cm.perform_sstable_upgrade(&cf, exclude_current_version);
});
}).then([]{
return make_ready_future<json::json_return_type>(0);
});
}));
ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<request> req) {
auto keyspace = validate_keyspace(ctx, req->param);
@@ -454,7 +491,7 @@ void set_storage_service(http_context& ctx, routes& r) {
return service::get_storage_service().map_reduce(adder<service::storage_service::drain_progress>(), [] (auto& ss) {
return ss.get_drain_progress();
}).then([] (auto&& progress) {
auto progress_str = sprint("Drained %s/%s ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
auto progress_str = format("Drained {}/{} ColumnFamilies", progress.remaining_cfs, progress.total_cfs);
return make_ready_future<json::json_return_type>(std::move(progress_str));
});
});
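The `sprint` → `format` changes throughout these hunks swap printf-style specifiers (`%s`, `%d`) for brace placeholders (`{}`). A minimal sketch of the placeholder style using only `std::ostringstream` — this is not Seastar's actual `format` (which is backed by the fmt library and supports specs like `{:d}`), just an illustration of why `"Drained %s/%s"` becomes `"Drained {}/{}"`:

```cpp
#include <sstream>
#include <string>
#include <utility>

// Base case: no arguments left, return the remaining format string.
inline std::string brace_format(const std::string& fmt) { return fmt; }

// Substitute each "{}" in order with the next argument, streamed via
// operator<<. Extra arguments with no matching placeholder are ignored.
template <typename T, typename... Rest>
std::string brace_format(const std::string& fmt, T&& v, Rest&&... rest) {
    auto pos = fmt.find("{}");
    if (pos == std::string::npos) {
        return fmt;
    }
    std::ostringstream os;
    os << fmt.substr(0, pos) << v
       << brace_format(fmt.substr(pos + 2), std::forward<Rest>(rest)...);
    return os.str();
}
```

Because arguments are streamed rather than matched against a type specifier, the same `{}` works for strings, integers, and any type with an `operator<<`, which is what lets the patch delete the `%s`/`%d` distinctions.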
@@ -665,7 +702,11 @@ void set_storage_service(http_context& ctx, routes& r) {
auto coordinator = std::hash<sstring>()(cf) % smp::count;
return service::get_storage_service().invoke_on(coordinator, [ks = std::move(ks), cf = std::move(cf)] (service::storage_service& s) {
return s.load_new_sstables(ks, cf);
}).then([] {
}).then_wrapped([] (auto&& f) {
if (f.failed()) {
auto msg = fmt::format("Failed to load new sstables: {}", f.get_exception());
return make_exception_future<json::json_return_type>(httpd::server_error_exception(msg));
}
return make_ready_future<json::json_return_type>(json_void());
});
});
@@ -699,7 +740,7 @@ void set_storage_service(http_context& ctx, routes& r) {
} catch (std::out_of_range& e) {
throw httpd::bad_param_exception(e.what());
} catch (std::invalid_argument&){
throw httpd::bad_param_exception(sprint("Bad format in a probability value: \"%s\"", probability.c_str()));
throw httpd::bad_param_exception(format("Bad format in a probability value: \"{}\"", probability.c_str()));
}
});
});
@@ -735,7 +776,7 @@ void set_storage_service(http_context& ctx, routes& r) {
return make_ready_future<json::json_return_type>(json_void());
});
} catch (...) {
throw httpd::bad_param_exception(sprint("Bad format value: "));
throw httpd::bad_param_exception(format("Bad format value: "));
}
});


@@ -22,7 +22,7 @@
#include "api/api-doc/system.json.hh"
#include "api/api.hh"
#include "http/exception.hh"
#include <seastar/http/exception.hh>
#include "log.hh"
namespace api {


@@ -22,6 +22,7 @@
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "types.hh"
#include "types/collection.hh"
/// LSA migrator for cells with irrelevant type
///
@@ -191,20 +192,20 @@ bool atomic_cell_or_collection::equals(const abstract_type& type, const atomic_c
if (a.timestamp() != b.timestamp()) {
return false;
}
if (a.is_live() != b.is_live()) {
return false;
}
if (a.is_live()) {
if (!b.is_live()) {
if (a.is_counter_update() != b.is_counter_update()) {
return false;
}
if (a.is_counter_update()) {
if (!b.is_counter_update()) {
return false;
}
return a.counter_update_value() == b.counter_update_value();
}
if (a.is_live_and_has_ttl() != b.is_live_and_has_ttl()) {
return false;
}
if (a.is_live_and_has_ttl()) {
if (!b.is_live_and_has_ttl()) {
return false;
}
if (a.ttl() != b.ttl() || a.expiry() != b.expiry()) {
return false;
}
@@ -243,16 +244,18 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
+ imr_object_type::size_overhead + external_value_size;
}
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection& c) {
if (!c._data.get()) {
std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection::printer& p) {
if (!p._cell._data.get()) {
return os << "{ null atomic_cell_or_collection }";
}
using dc = data::cell;
os << "{ ";
if (dc::structure::get_member<dc::tags::flags>(c._data.get()).get<dc::tags::collection>()) {
os << "collection";
if (dc::structure::get_member<dc::tags::flags>(p._cell._data.get()).get<dc::tags::collection>()) {
os << "collection ";
auto cmv = p._cell.as_collection_mutation();
os << to_hex(cmv.data.linearize());
} else {
os << "atomic cell";
os << p._cell.as_atomic_cell(p._cdef);
}
return os << " @" << static_cast<const void*>(c._data.get()) << " }";
return os << " }";
}


@@ -26,7 +26,7 @@
#include "tombstone.hh"
#include "gc_clock.hh"
#include "utils/managed_bytes.hh"
#include "net/byteorder.hh"
#include <seastar/net/byteorder.hh>
#include <cstdint>
#include <iosfwd>
#include <seastar/util/gcc6-concepts.hh>


@@ -24,6 +24,7 @@
// Not part of atomic_cell.hh to avoid cyclic dependency between types.hh and atomic_cell.hh
#include "types.hh"
#include "types/collection.hh"
#include "atomic_cell.hh"
#include "atomic_cell_or_collection.hh"
#include "hashing.hh"


@@ -67,7 +67,19 @@ public:
bytes_view serialize() const;
bool equals(const abstract_type& type, const atomic_cell_or_collection& other) const;
size_t external_memory_usage(const abstract_type&) const;
friend std::ostream& operator<<(std::ostream&, const atomic_cell_or_collection&);
class printer {
const column_definition& _cdef;
const atomic_cell_or_collection& _cell;
public:
printer(const column_definition& cdef, const atomic_cell_or_collection& cell)
: _cdef(cdef), _cell(cell) { }
printer(const printer&) = delete;
printer(printer&&) = delete;
friend std::ostream& operator<<(std::ostream&, const printer&);
};
friend std::ostream& operator<<(std::ostream&, const printer&);
};
namespace std {


@@ -72,19 +72,19 @@ public:
return make_ready_future<authenticated_user>(anonymous_user());
}
virtual future<> create(stdx::string_view, const authentication_options& options) const override {
virtual future<> create(std::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> alter(stdx::string_view, const authentication_options& options) const override {
virtual future<> alter(std::string_view, const authentication_options& options) const override {
return make_ready_future();
}
virtual future<> drop(stdx::string_view) const override {
virtual future<> drop(std::string_view) const override {
return make_ready_future();
}
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {
virtual future<custom_options> query_custom_options(std::string_view role_name) const override {
return make_ready_future<custom_options>();
}


@@ -23,7 +23,6 @@
#include "auth/authorizer.hh"
#include "exceptions/exceptions.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
@@ -58,12 +57,12 @@ public:
return make_ready_future<permission_set>(permissions::ALL);
}
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override {
virtual future<> grant(std::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke(stdx::string_view, permission_set, const resource&) const override {
virtual future<> revoke(std::string_view, permission_set, const resource&) const override {
return make_exception_future<>(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}
@@ -74,7 +73,7 @@ public:
"LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));
}
virtual future<> revoke_all(stdx::string_view) const override {
virtual future<> revoke_all(std::string_view) const override {
return make_exception_future(
unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));
}


@@ -45,7 +45,7 @@
namespace auth {
authenticated_user::authenticated_user(stdx::string_view name)
authenticated_user::authenticated_user(std::string_view name)
: name(sstring(name)) {
}


@@ -41,7 +41,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <iosfwd>
#include <optional>
@@ -49,7 +49,6 @@
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
@@ -67,7 +66,7 @@ public:
/// An anonymous user.
///
authenticated_user() = default;
explicit authenticated_user(stdx::string_view name);
explicit authenticated_user(std::string_view name);
};
///


@@ -57,7 +57,7 @@ inline bool any_authentication_options(const authentication_options& aos) noexce
class unsupported_authentication_option : public std::invalid_argument {
public:
explicit unsupported_authentication_option(authentication_option k)
: std::invalid_argument(sprint("The %s option is not supported.", k)) {
: std::invalid_argument(format("The {} option is not supported.", k)) {
}
};


@@ -45,7 +45,6 @@
#include "auth/common.hh"
#include "auth/password_authenticator.hh"
#include "cql3/query_processor.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
const sstring auth::authenticator::USERNAME_KEY("username");


@@ -41,7 +41,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <memory>
#include <set>
#include <stdexcept>
@@ -55,10 +55,10 @@
#include "auth/authentication_options.hh"
#include "auth/resource.hh"
#include "auth/sasl_challenge.hh"
#include "bytes.hh"
#include "enum_set.hh"
#include "exceptions/exceptions.hh"
#include "stdx.hh"
namespace db {
class config;
@@ -122,7 +122,7 @@ public:
///
/// The options provided must be a subset of `supported_options()`.
///
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const = 0;
virtual future<> create(std::string_view role_name, const authentication_options& options) const = 0;
///
/// Alter the authentication record of an existing user.
@@ -131,39 +131,25 @@ public:
///
/// Callers must ensure that the specification of `alterable_options()` is adhered to.
///
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const = 0;
virtual future<> alter(std::string_view role_name, const authentication_options& options) const = 0;
///
/// Delete the authentication record for a user. This will disallow the user from logging in.
///
virtual future<> drop(stdx::string_view role_name) const = 0;
virtual future<> drop(std::string_view role_name) const = 0;
///
/// Query for custom options (those corresponding to \ref authentication_options::options).
///
/// If no options are set the result is an empty container.
///
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const = 0;
virtual future<custom_options> query_custom_options(std::string_view role_name) const = 0;
///
/// System resources used internally as part of the implementation. These are made inaccessible to users.
///
virtual const resource_set& protected_resources() const = 0;
///
/// A stateful SASL challenge which supports many authentication schemes (depending on the implementation).
///
class sasl_challenge {
public:
virtual ~sasl_challenge() = default;
virtual bytes evaluate_response(bytes_view client_response) = 0;
virtual bool is_complete() const = 0;
virtual future<authenticated_user> get_authenticated_user() const = 0;
};
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const = 0;
};


@@ -41,7 +41,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <optional>
#include <stdexcept>
@@ -54,7 +54,6 @@
#include "auth/permission.hh"
#include "auth/resource.hh"
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
@@ -117,14 +116,14 @@ public:
///
/// \throws \ref unsupported_authorization_operation if granting permissions is not supported.
///
virtual future<> grant(stdx::string_view role_name, permission_set, const resource&) const = 0;
virtual future<> grant(std::string_view role_name, permission_set, const resource&) const = 0;
///
/// Revoke a set of permissions from a role for a particular \ref resource.
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke(stdx::string_view role_name, permission_set, const resource&) const = 0;
virtual future<> revoke(std::string_view role_name, permission_set, const resource&) const = 0;
///
/// Query for all directly granted permissions.
@@ -138,7 +137,7 @@ public:
///
/// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.
///
virtual future<> revoke_all(stdx::string_view role_name) const = 0;
virtual future<> revoke_all(std::string_view role_name) const = 0;
///
/// Revoke all permissions granted to any role for a particular resource.


@@ -48,9 +48,9 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f
struct empty_state { };
return delay_until_system_ready(as).then([&as, func = std::move(func)] () mutable {
return exponential_backoff_retry::do_until_value(1s, 1min, as, [func = std::move(func)] {
return func().then_wrapped([] (auto&& f) -> stdx::optional<empty_state> {
return func().then_wrapped([] (auto&& f) -> std::optional<empty_state> {
if (f.failed()) {
auth_log.info("Auth task failed with error, rescheduling: {}", f.get_exception());
auth_log.debug("Auth task failed with error, rescheduling: {}", f.get_exception());
return { };
}
return { empty_state() };
@@ -60,16 +60,14 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f
}
future<> create_metadata_table_if_missing(
stdx::string_view table_name,
std::string_view table_name,
cql3::query_processor& qp,
stdx::string_view cql,
std::string_view cql,
::service::migration_manager& mm) {
auto& db = qp.db().local();
if (db.has_schema(meta::AUTH_KS, sstring(table_name))) {
return make_ready_future<>();
}
static auto ignore_existing = [] (seastar::noncopyable_function<future<>()> func) {
return futurize_apply(std::move(func)).handle_exception_type([] (exceptions::already_exists_exception& ignored) { });
};
auto& db = qp.db();
auto parsed_statement = static_pointer_cast<cql3::statements::raw::cf_statement>(
cql3::query_processor::parse_statement(cql));
@@ -78,20 +76,29 @@ future<> create_metadata_table_if_missing(
auto statement = static_pointer_cast<cql3::statements::create_table_statement>(
parsed_statement->prepare(db, qp.get_cql_stats())->statement);
const auto schema = statement->get_cf_meta_data(qp.db().local());
const auto schema = statement->get_cf_meta_data(qp.db());
const auto uuid = generate_legacy_id(schema->ks_name(), schema->cf_name());
schema_builder b(schema);
b.set_uuid(uuid);
schema_ptr table = b.build();
return ignore_existing([&mm, table = std::move(table)] () {
return mm.announce_new_column_family(table, false);
});
return mm.announce_new_column_family(b.build(), false);
}
future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db) {
future<> wait_for_schema_agreement(::service::migration_manager& mm, const database& db, seastar::abort_source& as) {
static const auto pause = [] { return sleep(std::chrono::milliseconds(500)); };
return do_until([&db] { return db.get_version() != database::empty_version; }, pause).then([&mm] {
return do_until([&mm] { return mm.have_schema_agreement(); }, pause);
return do_until([&db, &as] {
as.check();
return db.get_version() != database::empty_version;
}, pause).then([&mm, &as] {
return do_until([&mm, &as] {
as.check();
return mm.have_schema_agreement();
}, pause);
});
}
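The change to `wait_for_schema_agreement` above threads an `abort_source` through both polling loops so that `stop()` can interrupt a wait that would otherwise spin forever. A synchronous sketch of that abortable-poll pattern, using `std::atomic<bool>` in place of `seastar::abort_source` and a blocking sleep in place of a future (names are illustrative):

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Poll `done` on a fixed cadence until it returns true, but check the
// abort flag on every iteration so shutdown is not blocked by a
// condition that never becomes true (the failure mode the patch fixes).
inline void poll_until(const std::function<bool()>& done,
                       const std::atomic<bool>& aborted,
                       std::chrono::milliseconds pause = std::chrono::milliseconds(1)) {
    while (!done()) {
        if (aborted.load()) {
            // seastar::abort_source::check() throws abort_requested_exception;
            // a plain runtime_error stands in for it here.
            throw std::runtime_error("abort requested");
        }
        std::this_thread::sleep_for(pause);
    }
}
```

This is also why the matching `stop()` methods in `default_authorizer` and `password_authenticator` gain an extra `handle_exception_type` for `abort_requested_exception`: aborting the wait now surfaces as an exception that shutdown must swallow.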


@@ -22,7 +22,7 @@
#pragma once
#include <chrono>
#include <experimental/string_view>
#include <string_view>
#include <seastar/core/future.hh>
#include <seastar/core/abort_source.hh>
@@ -76,12 +76,12 @@ inline future<> delay_until_system_ready(seastar::abort_source& as) {
future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);
future<> create_metadata_table_if_missing(
stdx::string_view table_name,
std::string_view table_name,
cql3::query_processor&,
stdx::string_view cql,
std::string_view cql,
::service::migration_manager&);
future<> wait_for_schema_agreement(::service::migration_manager&, const database&);
future<> wait_for_schema_agreement(::service::migration_manager&, const database&, seastar::abort_source&);
///
/// Time-outs for internal, non-local CQL queries.


@@ -61,6 +61,7 @@ extern "C" {
#include "cql3/untyped_result_set.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "database.hh"
namespace auth {
@@ -94,11 +95,11 @@ default_authorizer::~default_authorizer() {
static const sstring legacy_table_name{"permissions"};
bool default_authorizer::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<bool> default_authorizer::any_granted() const {
static const sstring query = sprint("SELECT * FROM %s.%s LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);
static const sstring query = format("SELECT * FROM {}.{} LIMIT 1", meta::AUTH_KS, PERMISSIONS_CF);
return _qp.process(
query,
@@ -112,7 +113,7 @@ future<bool> default_authorizer::any_granted() const {
future<> default_authorizer::migrate_legacy_metadata() const {
alogger.info("Starting migration of legacy permissions metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
@@ -160,7 +161,7 @@ future<> default_authorizer::start() {
_migration_manager).then([this] {
_finished = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();
if (legacy_metadata_exists()) {
if (!any_granted().get0()) {
@@ -178,7 +179,7 @@ future<> default_authorizer::start() {
future<> default_authorizer::stop() {
_as.request_abort();
return _finished.handle_exception_type([](const sleep_aborted&) {});
return _finished.handle_exception_type([](const sleep_aborted&) {}).handle_exception_type([](const abort_requested_exception&) {});
}
future<permission_set>
@@ -187,8 +188,7 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc
return make_ready_future<permission_set>(permissions::NONE);
}
static const sstring query = sprint(
"SELECT %s FROM %s.%s WHERE %s = ? AND %s = ?",
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? AND {} = ?",
PERMISSIONS_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
@@ -210,13 +210,12 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc
future<>
default_authorizer::modify(
stdx::string_view role_name,
std::string_view role_name,
permission_set set,
const resource& resource,
stdx::string_view op) const {
std::string_view op) const {
return do_with(
sprint(
"UPDATE %s.%s SET %s = %s %s ? WHERE %s = ? AND %s = ?",
format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
PERMISSIONS_NAME,
@@ -234,17 +233,16 @@ default_authorizer::modify(
}
future<> default_authorizer::grant(stdx::string_view role_name, permission_set set, const resource& resource) const {
future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "+");
}
future<> default_authorizer::revoke(stdx::string_view role_name, permission_set set, const resource& resource) const {
future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) const {
return modify(role_name, std::move(set), resource, "-");
}
future<std::vector<permission_details>> default_authorizer::list_all() const {
static const sstring query = sprint(
"SELECT %s, %s, %s FROM %s.%s",
static const sstring query = format("SELECT {}, {}, {} FROM {}.{}",
ROLE_NAME,
RESOURCE_NAME,
PERMISSIONS_NAME,
@@ -272,9 +270,8 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {
});
}
future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
static const sstring query = sprint(
"DELETE FROM %s.%s WHERE %s = ?",
future<> default_authorizer::revoke_all(std::string_view role_name) const {
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME);
@@ -293,8 +290,7 @@ future<> default_authorizer::revoke_all(stdx::string_view role_name) const {
}
future<> default_authorizer::revoke_all(const resource& resource) const {
static const sstring query = sprint(
"SELECT %s FROM %s.%s WHERE %s = ? ALLOW FILTERING",
static const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",
ROLE_NAME,
meta::AUTH_KS,
PERMISSIONS_CF,
@@ -311,8 +307,7 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
res->begin(),
res->end(),
[this, res, resource](const cql3::untyped_result_set::row& r) {
static const sstring query = sprint(
"DELETE FROM %s.%s WHERE %s = ? AND %s = ?",
static const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",
meta::AUTH_KS,
PERMISSIONS_CF,
ROLE_NAME,


@@ -77,13 +77,13 @@ public:
virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;
virtual future<> grant(stdx::string_view, permission_set, const resource&) const override;
virtual future<> grant(std::string_view, permission_set, const resource&) const override;
virtual future<> revoke( stdx::string_view, permission_set, const resource&) const override;
virtual future<> revoke( std::string_view, permission_set, const resource&) const override;
virtual future<std::vector<permission_details>> list_all() const override;
virtual future<> revoke_all(stdx::string_view) const override;
virtual future<> revoke_all(std::string_view) const override;
virtual future<> revoke_all(const resource&) const override;
@@ -96,7 +96,7 @@ private:
future<> migrate_legacy_metadata() const;
future<> modify(stdx::string_view, permission_set, const resource&, stdx::string_view) const;
future<> modify(std::string_view, permission_set, const resource&, std::string_view) const;
};
} /* namespace auth */


@@ -44,6 +44,8 @@
#include <algorithm>
#include <chrono>
#include <random>
#include <string_view>
#include <optional>
#include <boost/algorithm/cxx11/all_of.hpp>
#include <seastar/core/reactor.hh>
@@ -56,6 +58,7 @@
#include "log.hh"
#include "service/migration_manager.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
namespace auth {
@@ -93,8 +96,7 @@ static bool has_salted_hash(const cql3::untyped_result_set_row& row) {
return !row.get_or<sstring>(SALTED_HASH, "").empty();
}
static const sstring update_row_query = sprint(
"UPDATE %s SET %s = ? WHERE %s = ?",
static const sstring update_row_query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
@@ -102,12 +104,12 @@ static const sstring update_row_query = sprint(
static const sstring legacy_table_name{"credentials"};
bool password_authenticator::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<> password_authenticator::migrate_legacy_metadata() const {
plogger.info("Starting migration of legacy authentication metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
@@ -157,7 +159,7 @@ future<> password_authenticator::start() {
_stopped = do_after_system_ready(_as, [this] {
return async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash).get0()) {
if (legacy_metadata_exists()) {
@@ -182,10 +184,10 @@ future<> password_authenticator::start() {
future<> password_authenticator::stop() {
_as.request_abort();
return _stopped.handle_exception_type([] (const sleep_aborted&) { });
return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});
}
db::consistency_level password_authenticator::consistency_for_user(stdx::string_view role_name) {
db::consistency_level password_authenticator::consistency_for_user(std::string_view role_name) {
if (role_name == DEFAULT_USER_NAME) {
return db::consistency_level::QUORUM;
}
@@ -211,10 +213,10 @@ authentication_option_set password_authenticator::alterable_options() const {
future<authenticated_user> password_authenticator::authenticate(
const credentials_map& credentials) const {
if (!credentials.count(USERNAME_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", USERNAME_KEY));
throw exceptions::authentication_exception(format("Required key '{}' is missing", USERNAME_KEY));
}
if (!credentials.count(PASSWORD_KEY)) {
throw exceptions::authentication_exception(sprint("Required key '%s' is missing", PASSWORD_KEY));
throw exceptions::authentication_exception(format("Required key '{}' is missing", PASSWORD_KEY));
}
auto& username = credentials.at(USERNAME_KEY);
@@ -226,8 +228,7 @@ future<authenticated_user> password_authenticator::authenticate(
// Rely on query processing caching statements instead, and lets assume
// that a map lookup string->statement is not gonna kill us much.
return futurize_apply([this, username, password] {
static const sstring query = sprint(
"SELECT %s FROM %s WHERE %s = ?",
static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -241,7 +242,11 @@ future<authenticated_user> password_authenticator::authenticate(
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
if (res->empty() || !passwords::check(password, res->one().get_as<sstring>(SALTED_HASH))) {
auto salted_hash = std::optional<sstring>();
if (!res->empty()) {
salted_hash = res->one().get_opt<sstring>(SALTED_HASH);
}
if (!salted_hash || !passwords::check(password, *salted_hash)) {
throw exceptions::authentication_exception("Username and/or password are incorrect");
}
return make_ready_future<authenticated_user>(username);
@@ -249,13 +254,15 @@ future<authenticated_user> password_authenticator::authenticate(
std::throw_with_nested(exceptions::authentication_exception("Could not verify password"));
} catch (exceptions::request_execution_exception& e) {
std::throw_with_nested(exceptions::authentication_exception(e.what()));
} catch (exceptions::authentication_exception& e) {
std::throw_with_nested(e);
} catch (...) {
std::throw_with_nested(exceptions::authentication_exception("authentication failed"));
}
});
}
future<> password_authenticator::create(stdx::string_view role_name, const authentication_options& options) const {
future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
}
@@ -267,13 +274,12 @@ future<> password_authenticator::create(stdx::string_view role_name, const authe
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
}
future<> password_authenticator::alter(stdx::string_view role_name, const authentication_options& options) const {
future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
}
static const sstring query = sprint(
"UPDATE %s SET %s = ? WHERE %s = ?",
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name(),
SALTED_HASH,
meta::roles_table::role_col_name);
@@ -285,9 +291,8 @@ future<> password_authenticator::alter(stdx::string_view role_name, const authen
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
}
future<> password_authenticator::drop(stdx::string_view name) const {
static const sstring query = sprint(
"DELETE %s FROM %s WHERE %s = ?",
future<> password_authenticator::drop(std::string_view name) const {
static const sstring query = format("DELETE {} FROM {} WHERE {} = ?",
SALTED_HASH,
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -298,7 +303,7 @@ future<> password_authenticator::drop(stdx::string_view name) const {
{sstring(name)}).discard_result();
}
future<custom_options> password_authenticator::query_custom_options(stdx::string_view role_name) const {
future<custom_options> password_authenticator::query_custom_options(std::string_view role_name) const {
return make_ready_future<custom_options>();
}
@@ -307,75 +312,13 @@ const resource_set& password_authenticator::protected_resources() const {
return resources;
}
::shared_ptr<authenticator::sasl_challenge> password_authenticator::new_sasl_challenge() const {
class plain_text_password_challenge : public sasl_challenge {
const password_authenticator& _self;
public:
plain_text_password_challenge(const password_authenticator& self) : _self(self) {
}
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes evaluate_response(bytes_view client_response) override {
plogger.debug("Decoding credentials from client token");
sstring username, password;
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_credentials[USERNAME_KEY] = std::move(username);
_credentials[PASSWORD_KEY] = std::move(password);
_complete = true;
return {};
}
bool is_complete() const override {
return _complete;
}
future<authenticated_user> get_authenticated_user() const override {
return _self.authenticate(_credentials);
}
private:
credentials_map _credentials;
bool _complete = false;
};
return ::make_shared<plain_text_password_challenge>(*this);
::shared_ptr<sasl_challenge> password_authenticator::new_sasl_challenge() const {
return ::make_shared<plain_sasl_challenge>([this](std::string_view username, std::string_view password) {
credentials_map credentials{};
credentials[USERNAME_KEY] = sstring(username);
credentials[PASSWORD_KEY] = sstring(password);
return this->authenticate(credentials);
});
}
}
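The hunk above replaces the per-authenticator `plain_text_password_challenge` subclass with a generic challenge that is handed a completion callback. A simplified standalone sketch of that pattern (plain `std::function` and `bool` instead of seastar futures and `authenticated_user`; all names here are illustrative, not the Scylla API):

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Sketch of a callback-parameterized SASL challenge: the challenge object
// only collects credentials, and the supplied callback decides how they are
// actually verified. This is the shape of the refactor in the diff above.
class plain_challenge_sketch {
public:
    using completion_callback =
        std::function<bool(std::string_view, std::string_view)>;

    explicit plain_challenge_sketch(completion_callback f)
        : _when_complete(std::move(f)) {}

    // Stand-in for evaluate_response(): record the decoded credentials.
    void set_credentials(std::string username, std::string password) {
        _username = std::move(username);
        _password = std::move(password);
    }

    bool is_complete() const { return _username && _password; }

    // Stand-in for get_authenticated_user(): delegate to the callback.
    bool authenticate() const { return _when_complete(*_username, *_password); }

private:
    std::optional<std::string> _username, _password;
    completion_callback _when_complete;
};
```

The benefit is the one visible in the diff: the authenticator no longer defines a nested subclass, it just passes a lambda that forwards to its own `authenticate()`.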


@@ -61,7 +61,7 @@ class password_authenticator : public authenticator {
seastar::abort_source _as;
public:
static db::consistency_level consistency_for_user(stdx::string_view role_name);
static db::consistency_level consistency_for_user(std::string_view role_name);
password_authenticator(cql3::query_processor&, ::service::migration_manager&);
@@ -81,13 +81,13 @@ public:
virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> create(std::string_view role_name, const authentication_options& options) const override;
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override;
virtual future<> alter(std::string_view role_name, const authentication_options& options) const override;
virtual future<> drop(stdx::string_view role_name) const override;
virtual future<> drop(std::string_view role_name) const override;
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override;
virtual future<custom_options> query_custom_options(std::string_view role_name) const override;
virtual const resource_set& protected_resources() const override;


@@ -24,19 +24,9 @@
#include "auth/authorizer.hh"
#include "auth/common.hh"
#include "auth/service.hh"
#include "db/config.hh"
namespace auth {
permissions_cache_config permissions_cache_config::from_db_config(const db::config& dc) {
permissions_cache_config c;
c.max_entries = dc.permissions_cache_max_entries();
c.validity_period = std::chrono::milliseconds(dc.permissions_validity_in_ms());
c.update_period = std::chrono::milliseconds(dc.permissions_update_interval_in_ms());
return c;
}
permissions_cache::permissions_cache(const permissions_cache_config& c, service& ser, logging::logger& log)
: _cache(c.max_entries, c.validity_period, c.update_period, log, [&ser, &log](const key_type& k) {
log.debug("Refreshing permissions for {}", k.first);


@@ -22,7 +22,7 @@
#pragma once
#include <chrono>
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <iostream>
#include <optional>
@@ -37,7 +37,6 @@
#include "auth/resource.hh"
#include "auth/role_or_anonymous.hh"
#include "log.hh"
#include "stdx.hh"
#include "utils/hash.hh"
#include "utils/loading_cache.hh"
@@ -59,8 +58,6 @@ namespace auth {
class service;
struct permissions_cache_config final {
static permissions_cache_config from_db_config(const db::config&);
std::size_t max_entries;
std::chrono::milliseconds validity_period;
std::chrono::milliseconds update_period;


@@ -61,7 +61,7 @@ std::ostream& operator<<(std::ostream& os, resource_kind kind) {
return os;
}
static const std::unordered_map<resource_kind, stdx::string_view> roots{
static const std::unordered_map<resource_kind, std::string_view> roots{
{resource_kind::data, "data"},
{resource_kind::role, "roles"}};
@@ -101,24 +101,25 @@ static permission_set applicable_permissions(const role_resource_view& rv) {
permission::DESCRIBE>();
}
resource::resource(resource_kind kind) : _kind(kind), _parts{sstring(roots.at(kind))} {
resource::resource(resource_kind kind) : _kind(kind) {
_parts.emplace_back(roots.at(kind));
}
resource::resource(resource_kind kind, std::vector<sstring> parts) : resource(kind) {
_parts.reserve(parts.size() + 1);
resource::resource(resource_kind kind, utils::small_vector<sstring, 3> parts) : resource(kind) {
_parts.insert(_parts.end(), std::make_move_iterator(parts.begin()), std::make_move_iterator(parts.end()));
}
resource::resource(data_resource_t, stdx::string_view keyspace)
: resource(resource_kind::data, std::vector<sstring>{sstring(keyspace)}) {
resource::resource(data_resource_t, std::string_view keyspace) : resource(resource_kind::data) {
_parts.emplace_back(keyspace);
}
resource::resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table)
: resource(resource_kind::data, std::vector<sstring>{sstring(keyspace), sstring(table)}) {
resource::resource(data_resource_t, std::string_view keyspace, std::string_view table) : resource(resource_kind::data) {
_parts.emplace_back(keyspace);
_parts.emplace_back(table);
}
resource::resource(role_resource_t, stdx::string_view role)
: resource(resource_kind::role, std::vector<sstring>{sstring(role)}) {
resource::resource(role_resource_t, std::string_view role) : resource(resource_kind::role) {
_parts.emplace_back(role);
}
sstring resource::name() const {
@@ -173,7 +174,7 @@ data_resource_view::data_resource_view(const resource& r) : _resource(r) {
}
}
std::optional<stdx::string_view> data_resource_view::keyspace() const {
std::optional<std::string_view> data_resource_view::keyspace() const {
if (_resource._parts.size() == 1) {
return {};
}
@@ -181,7 +182,7 @@ std::optional<stdx::string_view> data_resource_view::keyspace() const {
return _resource._parts[1];
}
std::optional<stdx::string_view> data_resource_view::table() const {
std::optional<std::string_view> data_resource_view::table() const {
if (_resource._parts.size() <= 2) {
return {};
}
@@ -210,7 +211,7 @@ role_resource_view::role_resource_view(const resource& r) : _resource(r) {
}
}
std::optional<stdx::string_view> role_resource_view::role() const {
std::optional<std::string_view> role_resource_view::role() const {
if (_resource._parts.size() == 1) {
return {};
}
@@ -230,9 +231,9 @@ std::ostream& operator<<(std::ostream& os, const role_resource_view& v) {
return os;
}
resource parse_resource(stdx::string_view name) {
static const std::unordered_map<stdx::string_view, resource_kind> reverse_roots = [] {
std::unordered_map<stdx::string_view, resource_kind> result;
resource parse_resource(std::string_view name) {
static const std::unordered_map<std::string_view, resource_kind> reverse_roots = [] {
std::unordered_map<std::string_view, resource_kind> result;
for (const auto& pair : roots) {
result.emplace(pair.second, pair.first);
@@ -241,7 +242,7 @@ resource parse_resource(stdx::string_view name) {
return result;
}();
std::vector<sstring> parts;
utils::small_vector<sstring, 3> parts;
boost::split(parts, name, [](char ch) { return ch == '/'; });
if (parts.empty()) {

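The `parse_resource` change above still splits a hierarchical resource name such as `data/ks/table` on `/` (via `boost::split`), only into a `small_vector` instead of a `std::vector`. A minimal sketch of that split using only the standard library (the helper name is illustrative, not the Scylla API):

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Split a resource name like "data/ks/table" on '/'. The first part names
// the resource kind (looked up via the reverse of the `roots` map in the
// diff above); the remaining parts are keyspace/table or role components.
static std::vector<std::string> split_resource_name(std::string_view name) {
    std::vector<std::string> parts;
    while (true) {
        auto pos = name.find('/');
        parts.emplace_back(name.substr(0, pos));  // npos -> rest of string
        if (pos == std::string_view::npos) {
            break;
        }
        name.remove_prefix(pos + 1);
    }
    return parts;
}
```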

@@ -41,7 +41,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <iostream>
#include <optional>
#include <stdexcept>
@@ -54,15 +54,15 @@
#include "auth/permission.hh"
#include "seastarx.hh"
#include "stdx.hh"
#include "utils/hash.hh"
#include "utils/small_vector.hh"
namespace auth {
class invalid_resource_name : public std::invalid_argument {
public:
explicit invalid_resource_name(stdx::string_view name)
: std::invalid_argument(sprint("The resource name '%s' is invalid.", name)) {
explicit invalid_resource_name(std::string_view name)
: std::invalid_argument(format("The resource name '{}' is invalid.", name)) {
}
};
@@ -98,16 +98,16 @@ struct role_resource_t final {};
class resource final {
resource_kind _kind;
std::vector<sstring> _parts;
utils::small_vector<sstring, 3> _parts;
public:
///
/// A root resource of a particular kind.
///
explicit resource(resource_kind);
resource(data_resource_t, stdx::string_view keyspace);
resource(data_resource_t, stdx::string_view keyspace, stdx::string_view table);
resource(role_resource_t, stdx::string_view role);
resource(data_resource_t, std::string_view keyspace);
resource(data_resource_t, std::string_view keyspace, std::string_view table);
resource(role_resource_t, std::string_view role);
resource_kind kind() const noexcept {
return _kind;
@@ -123,7 +123,7 @@ public:
permission_set applicable_permissions() const;
private:
resource(resource_kind, std::vector<sstring> parts);
resource(resource_kind, utils::small_vector<sstring, 3> parts);
friend class std::hash<resource>;
friend class data_resource_view;
@@ -131,7 +131,7 @@ private:
friend bool operator<(const resource&, const resource&);
friend bool operator==(const resource&, const resource&);
friend resource parse_resource(stdx::string_view);
friend resource parse_resource(std::string_view);
};
bool operator<(const resource&, const resource&);
@@ -150,7 +150,7 @@ class resource_kind_mismatch : public std::invalid_argument {
public:
explicit resource_kind_mismatch(resource_kind expected, resource_kind actual)
: std::invalid_argument(
sprint("This resource has kind '%s', but was expected to have kind '%s'.", actual, expected)) {
format("This resource has kind '{}', but was expected to have kind '{}'.", actual, expected)) {
}
};
@@ -166,9 +166,9 @@ public:
///
explicit data_resource_view(const resource& r);
std::optional<stdx::string_view> keyspace() const;
std::optional<std::string_view> keyspace() const;
std::optional<stdx::string_view> table() const;
std::optional<std::string_view> table() const;
};
std::ostream& operator<<(std::ostream&, const data_resource_view&);
@@ -187,7 +187,7 @@ public:
///
explicit role_resource_view(const resource&);
std::optional<stdx::string_view> role() const;
std::optional<std::string_view> role() const;
};
std::ostream& operator<<(std::ostream&, const role_resource_view&);
@@ -197,20 +197,20 @@ std::ostream& operator<<(std::ostream&, const role_resource_view&);
///
/// \throws \ref invalid_resource_name when the name is malformed.
///
resource parse_resource(stdx::string_view name);
resource parse_resource(std::string_view name);
const resource& root_data_resource();
inline resource make_data_resource(stdx::string_view keyspace) {
inline resource make_data_resource(std::string_view keyspace) {
return resource(data_resource_t{}, keyspace);
}
inline resource make_data_resource(stdx::string_view keyspace, stdx::string_view table) {
inline resource make_data_resource(std::string_view keyspace, std::string_view table) {
return resource(data_resource_t{}, keyspace, table);
}
const resource& root_role_resource();
inline resource make_role_resource(stdx::string_view role) {
inline resource make_role_resource(std::string_view role) {
return resource(role_resource_t{}, role);
}


@@ -21,7 +21,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <memory>
#include <optional>
#include <stdexcept>
@@ -33,7 +33,6 @@
#include "auth/resource.hh"
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
@@ -60,31 +59,31 @@ public:
class role_already_exists : public roles_argument_exception {
public:
explicit role_already_exists(stdx::string_view role_name)
: roles_argument_exception(sprint("Role %s already exists.", role_name)) {
explicit role_already_exists(std::string_view role_name)
: roles_argument_exception(format("Role {} already exists.", role_name)) {
}
};
class nonexistant_role : public roles_argument_exception {
public:
explicit nonexistant_role(stdx::string_view role_name)
: roles_argument_exception(sprint("Role %s doesn't exist.", role_name)) {
explicit nonexistant_role(std::string_view role_name)
: roles_argument_exception(format("Role {} doesn't exist.", role_name)) {
}
};
class role_already_included : public roles_argument_exception {
public:
role_already_included(stdx::string_view grantee_name, stdx::string_view role_name)
role_already_included(std::string_view grantee_name, std::string_view role_name)
: roles_argument_exception(
sprint("%s already includes role %s.", grantee_name, role_name)) {
format("{} already includes role {}.", grantee_name, role_name)) {
}
};
class revoke_ungranted_role : public roles_argument_exception {
public:
revoke_ungranted_role(stdx::string_view revokee_name, stdx::string_view role_name)
revoke_ungranted_role(std::string_view revokee_name, std::string_view role_name)
: roles_argument_exception(
sprint("%s was not granted role %s, so it cannot be revoked.", revokee_name, role_name)) {
format("{} was not granted role {}, so it cannot be revoked.", revokee_name, role_name)) {
}
};
@@ -104,7 +103,7 @@ class role_manager {
public:
virtual ~role_manager() = default;
virtual stdx::string_view qualified_java_name() const noexcept = 0;
virtual std::string_view qualified_java_name() const noexcept = 0;
virtual const resource_set& protected_resources() const = 0;
@@ -115,17 +114,17 @@ public:
///
/// \returns an exceptional future with \ref role_already_exists for a role that has previously been created.
///
virtual future<> create(stdx::string_view role_name, const role_config&) const = 0;
virtual future<> create(std::string_view role_name, const role_config&) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<> drop(stdx::string_view role_name) const = 0;
virtual future<> drop(std::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<> alter(stdx::string_view role_name, const role_config_update&) const = 0;
virtual future<> alter(std::string_view role_name, const role_config_update&) const = 0;
///
/// Grant `role_name` to `grantee_name`.
@@ -135,7 +134,7 @@ public:
/// \returns an exceptional future with \ref role_already_included if granting the role would be redundant, or
/// create a cycle.
///
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const = 0;
virtual future<> grant(std::string_view grantee_name, std::string_view role_name) const = 0;
///
/// Revoke `role_name` from `revokee_name`.
@@ -144,26 +143,26 @@ public:
///
/// \returns an exceptional future with \ref revoke_ungranted_role if the role was not granted.
///
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const = 0;
virtual future<> revoke(std::string_view revokee_name, std::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<role_set> query_granted(stdx::string_view grantee, recursive_role_query) const = 0;
virtual future<role_set> query_granted(std::string_view grantee, recursive_role_query) const = 0;
virtual future<role_set> query_all() const = 0;
virtual future<bool> exists(stdx::string_view role_name) const = 0;
virtual future<bool> exists(std::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<bool> is_superuser(stdx::string_view role_name) const = 0;
virtual future<bool> is_superuser(std::string_view role_name) const = 0;
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
virtual future<bool> can_login(stdx::string_view role_name) const = 0;
virtual future<bool> can_login(std::string_view role_name) const = 0;
};
}


@@ -21,7 +21,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <iosfwd>
#include <optional>
@@ -29,7 +29,6 @@
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include "stdx.hh"
namespace auth {
@@ -38,7 +37,7 @@ public:
std::optional<sstring> name{};
role_or_anonymous() = default;
role_or_anonymous(stdx::string_view name) : name(name) {
role_or_anonymous(std::string_view name) : name(name) {
}
};


@@ -36,7 +36,7 @@ namespace meta {
namespace roles_table {
stdx::string_view creation_query() {
std::string_view creation_query() {
static const sstring instance = sprint(
"CREATE TABLE %s ("
" %s text PRIMARY KEY,"
@@ -51,7 +51,7 @@ stdx::string_view creation_query() {
return instance;
}
stdx::string_view qualified_name() noexcept {
std::string_view qualified_name() noexcept {
static const sstring instance = AUTH_KS + "." + sstring(name);
return instance;
}
@@ -63,8 +63,7 @@ stdx::string_view qualified_name() noexcept {
future<bool> default_role_row_satisfies(
cql3::query_processor& qp,
std::function<bool(const cql3::untyped_result_set_row&)> p) {
static const sstring query = sprint(
"SELECT * FROM %s WHERE %s = ?",
static const sstring query = format("SELECT * FROM {} WHERE {} = ?",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -98,7 +97,7 @@ future<bool> default_role_row_satisfies(
future<bool> any_nondefault_role_row_satisfies(
cql3::query_processor& qp,
std::function<bool(const cql3::untyped_result_set_row&)> p) {
static const sstring query = sprint("SELECT * FROM %s", meta::roles_table::qualified_name());
static const sstring query = format("SELECT * FROM {}", meta::roles_table::qualified_name());
return do_with(std::move(p), [&qp](const auto& p) {
return qp.process(


@@ -21,13 +21,12 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <functional>
#include <seastar/core/future.hh>
#include "seastarx.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
@@ -40,13 +39,13 @@ namespace meta {
namespace roles_table {
stdx::string_view creation_query();
std::string_view creation_query();
constexpr stdx::string_view name{"roles", 5};
constexpr std::string_view name{"roles", 5};
stdx::string_view qualified_name() noexcept;
std::string_view qualified_name() noexcept;
constexpr stdx::string_view role_col_name{"role", 4};
constexpr std::string_view role_col_name{"role", 4};
}

auth/sasl_challenge.cc Normal file

@@ -0,0 +1,102 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2019 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include "auth/sasl_challenge.hh"
#include "exceptions/exceptions.hh"
namespace auth {
/**
* SASL PLAIN mechanism specifies that credentials are encoded in a
* sequence of UTF-8 bytes, delimited by 0 (US-ASCII NUL).
* The form is : {code}authzId<NUL>authnId<NUL>password<NUL>{code}
* authzId is optional, and in fact we don't care about it here as we'll
* set the authzId to match the authnId (that is, there is no concept of
* a user being authorized to act on behalf of another).
*
* @param bytes encoded credentials string sent by the client
* @return map containing the username/password pairs in the form an IAuthenticator
* would expect
* @throws javax.security.sasl.SaslException
*/
bytes plain_sasl_challenge::evaluate_response(bytes_view client_response) {
sstring username, password;
auto b = client_response.crbegin();
auto e = client_response.crend();
auto i = b;
while (i != e) {
if (*i == 0) {
sstring tmp(i.base(), b.base());
if (password.empty()) {
password = std::move(tmp);
} else if (username.empty()) {
username = std::move(tmp);
}
b = ++i;
continue;
}
++i;
}
if (username.empty()) {
throw exceptions::authentication_exception("Authentication ID must not be null");
}
if (password.empty()) {
throw exceptions::authentication_exception("Password must not be null");
}
_username = std::move(username);
_password = std::move(password);
return {};
}
bool plain_sasl_challenge::is_complete() const {
return _username && _password;
}
future<authenticated_user> plain_sasl_challenge::get_authenticated_user() const {
return _when_complete(*_username, *_password);
}
}
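The parse in `plain_sasl_challenge::evaluate_response` above walks the token from the end: the last NUL-delimited field is the password, the one before it the username, and any leading authzId is simply never assigned. A self-contained sketch of the same reverse scan over a `std::string_view` (the helper name is illustrative):

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Decode a SASL PLAIN token of the form authzId<NUL>authnId<NUL>password.
// Scanning with reverse iterators means the first field found (from the
// back) is the password and the second is the username; the optional
// authzId is ignored, mirroring the evaluate_response() logic above.
static std::pair<std::string, std::string>
decode_plain(std::string_view token) {
    std::string username, password;
    auto b = token.crbegin();
    auto e = token.crend();
    auto i = b;
    while (i != e) {
        if (*i == '\0') {
            // i.base()/b.base() convert back to forward iterators bounding
            // the field between this NUL and the previous boundary.
            std::string tmp(i.base(), b.base());
            if (password.empty()) {
                password = std::move(tmp);
            } else if (username.empty()) {
                username = std::move(tmp);
            }
            b = ++i;
            continue;
        }
        ++i;
    }
    return {username, password};
}
```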

auth/sasl_challenge.hh Normal file

@@ -0,0 +1,89 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2019 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <functional>
#include <optional>
#include <string_view>
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include "auth/authenticated_user.hh"
#include "bytes.hh"
#include "seastarx.hh"
namespace auth {
///
/// A stateful SASL challenge which supports many authentication schemes (depending on the implementation).
///
class sasl_challenge {
public:
virtual ~sasl_challenge() = default;
virtual bytes evaluate_response(bytes_view client_response) = 0;
virtual bool is_complete() const = 0;
virtual future<authenticated_user> get_authenticated_user() const = 0;
};
class plain_sasl_challenge : public sasl_challenge {
public:
using completion_callback = std::function<future<authenticated_user>(std::string_view, std::string_view)>;
explicit plain_sasl_challenge(completion_callback f) : _when_complete(std::move(f)) {
}
virtual bytes evaluate_response(bytes_view) override;
virtual bool is_complete() const override;
virtual future<authenticated_user> get_authenticated_user() const override;
private:
std::optional<sstring> _username, _password;
completion_callback _when_complete;
};
}


@@ -36,12 +36,12 @@
#include "auth/standard_role_manager.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "db/config.hh"
#include "db/consistency_level_type.hh"
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "service/migration_listener.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
namespace auth {
@@ -97,7 +97,7 @@ private:
void on_drop_view(const sstring& ks_name, const sstring& view_name) override {}
};
static future<> validate_role_exists(const service& ser, stdx::string_view role_name) {
static future<> validate_role_exists(const service& ser, std::string_view role_name) {
return ser.underlying_role_manager().exists(role_name).then([role_name](bool exists) {
if (!exists) {
throw nonexistant_role(role_name);
@@ -105,19 +105,6 @@ static future<> validate_role_exists(const service& ser, stdx::string_view role_
});
}
service_config service_config::from_db_config(const db::config& dc) {
const qualified_name qualified_authorizer_name(meta::AUTH_PACKAGE_NAME, dc.authorizer());
const qualified_name qualified_authenticator_name(meta::AUTH_PACKAGE_NAME, dc.authenticator());
const qualified_name qualified_role_manager_name(meta::AUTH_PACKAGE_NAME, dc.role_manager());
service_config c;
c.authorizer_java_name = qualified_authorizer_name;
c.authenticator_java_name = qualified_authenticator_name;
c.role_manager_java_name = qualified_role_manager_name;
return c;
}
service::service(
permissions_cache_config c,
cql3::query_processor& qp,
@@ -139,8 +126,7 @@ service::service(
if ((_authenticator->qualified_java_name() == password_authenticator_name())
&& (_role_manager->qualified_java_name() != standard_role_manager_name())) {
throw incompatible_module_combination(
sprint(
"The %s authenticator must be loaded alongside the %s role-manager.",
format("The {} authenticator must be loaded alongside the {} role-manager.",
password_authenticator_name(),
standard_role_manager_name()));
}
@@ -161,7 +147,7 @@ service::service(
}
future<> service::create_keyspace_if_missing() const {
auto& db = _qp.db().local();
auto& db = _qp.db();
if (!db.has_keyspace(meta::AUTH_KS)) {
std::map<sstring, sstring> opts{{"replication_factor", "1"}};
@@ -184,7 +170,9 @@ future<> service::start() {
return once_among_shards([this] {
return create_keyspace_if_missing();
}).then([this] {
return when_all_succeed(_role_manager->start(), _authorizer->start(), _authenticator->start());
return _role_manager->start().then([this] {
return when_all_succeed(_authorizer->start(), _authenticator->start());
});
}).then([this] {
_permissions_cache = std::make_unique<permissions_cache>(_permissions_cache_config, *this, log);
}).then([this] {
@@ -196,24 +184,26 @@ future<> service::start() {
}
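The `service::start()` hunk above replaces a single `when_all_succeed` over all three modules with a chain that starts the role manager first and only then starts the authorizer and authenticator concurrently. The same sequential-then-concurrent pattern, sketched with plain `std::async` instead of seastar futures (illustrative only; assumes the later tasks depend on the first having completed):

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <vector>

// Run `first` to completion, then launch the remaining tasks in parallel
// and wait for all of them -- the ordering introduced in the diff above,
// where _role_manager->start() must finish before the dependents begin.
template <typename F, typename... Rest>
void start_in_order(F first, Rest... rest) {
    first();  // must complete before dependents start
    std::vector<std::future<void>> deps;
    (deps.push_back(std::async(std::launch::async, rest)), ...);
    for (auto& f : deps) {
        f.get();
    }
}
```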
future<> service::stop() {
// Only one of the shards has the listener registered, but let's try to
// unregister on each one just to make sure.
_migration_manager.unregister_listener(_migration_listener.get());
return _permissions_cache->stop().then([this] {
return when_all_succeed(_role_manager->stop(), _authorizer->stop(), _authenticator->stop());
});
}
future<bool> service::has_existing_legacy_users() const {
if (!_qp.db().local().has_schema(meta::AUTH_KS, meta::USERS_CF)) {
if (!_qp.db().has_schema(meta::AUTH_KS, meta::USERS_CF)) {
return make_ready_future<bool>(false);
}
static const sstring default_user_query = sprint(
"SELECT * FROM %s.%s WHERE %s = ?",
static const sstring default_user_query = format("SELECT * FROM {}.{} WHERE {} = ?",
meta::AUTH_KS,
meta::USERS_CF,
meta::user_name_col_name);
static const sstring all_users_query = sprint(
"SELECT * FROM %s.%s LIMIT 1",
static const sstring all_users_query = format("SELECT * FROM {}.{} LIMIT 1",
meta::AUTH_KS,
meta::USERS_CF);
@@ -256,7 +246,7 @@ service::get_uncached_permissions(const role_or_anonymous& maybe_role, const res
return _authorizer->authorize(maybe_role, r);
}
const stdx::string_view role_name = *maybe_role.name;
const std::string_view role_name = *maybe_role.name;
return has_superuser(role_name).then([this, role_name, &r](bool superuser) {
if (superuser) {
@@ -270,7 +260,7 @@ service::get_uncached_permissions(const role_or_anonymous& maybe_role, const res
return do_with(permission_set(), [this, role_name, &r](auto& all_perms) {
return get_roles(role_name).then([this, &r, &all_perms](role_set all_roles) {
return do_with(std::move(all_roles), [this, &r, &all_perms](const auto& all_roles) {
return parallel_for_each(all_roles, [this, &r, &all_perms](stdx::string_view role_name) {
return parallel_for_each(all_roles, [this, &r, &all_perms](std::string_view role_name) {
return _authorizer->authorize(role_name, r).then([&all_perms](permission_set perms) {
all_perms = permission_set::from_mask(all_perms.mask() | perms.mask());
});
@@ -287,7 +277,7 @@ future<permission_set> service::get_permissions(const role_or_anonymous& maybe_r
return _permissions_cache->get(maybe_role, r);
}
future<bool> service::has_superuser(stdx::string_view role_name) const {
future<bool> service::has_superuser(std::string_view role_name) const {
return this->get_roles(std::move(role_name)).then([this](role_set roles) {
return do_with(std::move(roles), [this](const role_set& roles) {
return do_with(false, roles.begin(), [this, &roles](bool& any_super, auto& iter) {
@@ -305,7 +295,7 @@ future<bool> service::has_superuser(stdx::string_view role_name) const {
});
}
future<role_set> service::get_roles(stdx::string_view role_name) const {
future<role_set> service::get_roles(std::string_view role_name) const {
//
// We may wish to cache this information in the future (as Apache Cassandra does).
//
@@ -316,7 +306,7 @@ future<role_set> service::get_roles(stdx::string_view role_name) const {
future<bool> service::exists(const resource& r) const {
switch (r.kind()) {
case resource_kind::data: {
const auto& db = _qp.db().local();
const auto& db = _qp.db();
data_resource_view v(r);
const auto keyspace = v.keyspace();
@@ -411,7 +401,7 @@ static void validate_authentication_options_are_supported(
future<> create_role(
const service& ser,
stdx::string_view name,
std::string_view name,
const role_config& config,
const authentication_options& options) {
return ser.underlying_role_manager().create(name, config).then([&ser, name, &options] {
@@ -435,7 +425,7 @@ future<> create_role(
future<> alter_role(
const service& ser,
stdx::string_view name,
std::string_view name,
const role_config_update& config_update,
const authentication_options& options) {
return ser.underlying_role_manager().alter(name, config_update).then([&ser, name, &options] {
@@ -452,7 +442,7 @@ future<> alter_role(
});
}
future<> drop_role(const service& ser, stdx::string_view name) {
future<> drop_role(const service& ser, std::string_view name) {
return do_with(make_role_resource(name), [&ser, name](const resource& r) {
auto& a = ser.underlying_authorizer();
@@ -468,14 +458,14 @@ future<> drop_role(const service& ser, stdx::string_view name) {
});
}
future<bool> has_role(const service& ser, stdx::string_view grantee, stdx::string_view name) {
future<bool> has_role(const service& ser, std::string_view grantee, std::string_view name) {
return when_all_succeed(
validate_role_exists(ser, name),
ser.get_roles(grantee)).then([name](role_set all_roles) {
return make_ready_future<bool>(all_roles.count(sstring(name)) != 0);
});
}
future<bool> has_role(const service& ser, const authenticated_user& u, stdx::string_view name) {
future<bool> has_role(const service& ser, const authenticated_user& u, std::string_view name) {
if (is_anonymous(u)) {
return make_ready_future<bool>(false);
}
@@ -485,7 +475,7 @@ future<bool> has_role(const service& ser, const authenticated_user& u, stdx::str
future<> grant_permissions(
const service& ser,
stdx::string_view role_name,
std::string_view role_name,
permission_set perms,
const resource& r) {
return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {
@@ -493,7 +483,7 @@ future<> grant_permissions(
});
}
future<> grant_applicable_permissions(const service& ser, stdx::string_view role_name, const resource& r) {
future<> grant_applicable_permissions(const service& ser, std::string_view role_name, const resource& r) {
return grant_permissions(ser, role_name, r.applicable_permissions(), r);
}
future<> grant_applicable_permissions(const service& ser, const authenticated_user& u, const resource& r) {
@@ -506,7 +496,7 @@ future<> grant_applicable_permissions(const service& ser, const authenticated_us
future<> revoke_permissions(
const service& ser,
stdx::string_view role_name,
std::string_view role_name,
permission_set perms,
const resource& r) {
return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {
@@ -517,7 +507,7 @@ future<> revoke_permissions(
future<std::vector<permission_details>> list_filtered_permissions(
const service& ser,
permission_set perms,
std::optional<stdx::string_view> role_name,
std::optional<std::string_view> role_name,
const std::optional<std::pair<resource, recursive_permissions>>& resource_filter) {
return ser.underlying_authorizer().list_all().then([&ser, perms, role_name, &resource_filter](
std::vector<permission_details> all_details) {


@@ -21,7 +21,7 @@
#pragma once
#include <experimental/string_view>
#include <string_view>
#include <memory>
#include <optional>
@@ -35,16 +35,11 @@
#include "auth/permissions_cache.hh"
#include "auth/role_manager.hh"
#include "seastarx.hh"
#include "stdx.hh"
namespace cql3 {
class query_processor;
}
namespace db {
class config;
}
namespace service {
class migration_manager;
class migration_listener;
@@ -55,8 +50,6 @@ namespace auth {
class role_or_anonymous;
struct service_config final {
static service_config from_db_config(const db::config&);
sstring authorizer_java_name;
sstring authenticator_java_name;
sstring role_manager_java_name;
@@ -141,13 +134,13 @@ public:
///
/// \returns an exceptional future with \ref nonexistant_role if the role does not exist.
///
future<bool> has_superuser(stdx::string_view role_name) const;
future<bool> has_superuser(std::string_view role_name) const;
///
/// Return the set of all roles granted to the given role, including itself and roles granted through other roles.
///
/// \returns an exceptional future with \ref nonexistent_role if the role does not exist.
future<role_set> get_roles(stdx::string_view role_name) const;
future<role_set> get_roles(std::string_view role_name) const;
future<bool> exists(const resource&) const;
@@ -197,7 +190,7 @@ bool is_protected(const service&, const resource&) noexcept;
///
future<> create_role(
const service&,
stdx::string_view name,
std::string_view name,
const role_config&,
const authentication_options&);
@@ -210,7 +203,7 @@ future<> create_role(
///
future<> alter_role(
const service&,
stdx::string_view name,
std::string_view name,
const role_config_update&,
const authentication_options&);
@@ -219,20 +212,20 @@ future<> alter_role(
///
/// \returns an exceptional future with \ref nonexistant_role if the named role does not exist.
///
future<> drop_role(const service&, stdx::string_view name);
future<> drop_role(const service&, std::string_view name);
///
/// Check if `grantee` has been granted the named role.
///
/// \returns an exceptional future with \ref nonexistent_role if `grantee` or `name` do not exist.
///
future<bool> has_role(const service&, stdx::string_view grantee, stdx::string_view name);
future<bool> has_role(const service&, std::string_view grantee, std::string_view name);
///
/// Check if the authenticated user has been granted the named role.
///
/// \returns an exceptional future with \ref nonexistent_role if the user or `name` do not exist.
///
future<bool> has_role(const service&, const authenticated_user&, stdx::string_view name);
future<bool> has_role(const service&, const authenticated_user&, std::string_view name);
///
/// \returns an exceptional future with \ref nonexistent_role if the named role does not exist.
@@ -242,7 +235,7 @@ future<bool> has_role(const service&, const authenticated_user&, stdx::string_vi
///
future<> grant_permissions(
const service&,
stdx::string_view role_name,
std::string_view role_name,
permission_set,
const resource&);
@@ -254,7 +247,7 @@ future<> grant_permissions(
/// \returns an exceptional future with \ref unsupported_authorization_operation if granting permissions is not
/// supported.
///
future<> grant_applicable_permissions(const service&, stdx::string_view role_name, const resource&);
future<> grant_applicable_permissions(const service&, std::string_view role_name, const resource&);
future<> grant_applicable_permissions(const service&, const authenticated_user&, const resource&);
///
@@ -265,7 +258,7 @@ future<> grant_applicable_permissions(const service&, const authenticated_user&,
///
future<> revoke_permissions(
const service&,
stdx::string_view role_name,
std::string_view role_name,
permission_set,
const resource&);
@@ -290,7 +283,7 @@ using recursive_permissions = bool_class<struct recursive_permissions_tag>;
future<std::vector<permission_details>> list_filtered_permissions(
const service&,
permission_set,
std::optional<stdx::string_view> role_name,
std::optional<std::string_view> role_name,
const std::optional<std::pair<resource, recursive_permissions>>& resource_filter);
}


@@ -21,7 +21,7 @@
#include "auth/standard_role_manager.hh"
#include <experimental/optional>
#include <optional>
#include <unordered_set>
#include <vector>
@@ -39,6 +39,7 @@
#include "exceptions/exceptions.hh"
#include "log.hh"
#include "utils/class_registrator.hh"
#include "database.hh"
namespace auth {
@@ -46,9 +47,9 @@ namespace meta {
namespace role_members_table {
constexpr stdx::string_view name{"role_members" , 12};
constexpr std::string_view name{"role_members" , 12};
static stdx::string_view qualified_name() noexcept {
static std::string_view qualified_name() noexcept {
static const sstring instance = AUTH_KS + "." + sstring(name);
return instance;
}
@@ -72,7 +73,7 @@ struct record final {
role_set member_of;
};
static db::consistency_level consistency_for_role(stdx::string_view role_name) noexcept {
static db::consistency_level consistency_for_role(std::string_view role_name) noexcept {
if (role_name == meta::DEFAULT_SUPERUSER_NAME) {
return db::consistency_level::QUORUM;
}
@@ -80,9 +81,8 @@ static db::consistency_level consistency_for_role(stdx::string_view role_name) n
return db::consistency_level::LOCAL_ONE;
}
-static future<stdx::optional<record>> find_record(cql3::query_processor& qp, stdx::string_view role_name) {
-static const sstring query = sprint(
-"SELECT * FROM %s WHERE %s = ?",
+static future<std::optional<record>> find_record(cql3::query_processor& qp, std::string_view role_name) {
+static const sstring query = format("SELECT * FROM {} WHERE {} = ?",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -93,12 +93,12 @@ static future<stdx::optional<record>> find_record(cql3::query_processor& qp, std
{sstring(role_name)},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return stdx::optional<record>();
return std::optional<record>();
}
const cql3::untyped_result_set_row& row = results->one();
return stdx::make_optional(
return std::make_optional(
record{
row.get_as<sstring>(sstring(meta::roles_table::role_col_name)),
row.get_as<bool>("is_superuser"),
@@ -109,8 +109,8 @@ static future<stdx::optional<record>> find_record(cql3::query_processor& qp, std
});
}
-static future<record> require_record(cql3::query_processor& qp, stdx::string_view role_name) {
-return find_record(qp, role_name).then([role_name](stdx::optional<record> mr) {
+static future<record> require_record(cql3::query_processor& qp, std::string_view role_name) {
+return find_record(qp, role_name).then([role_name](std::optional<record> mr) {
if (!mr) {
throw nonexistant_role(role_name);
}
@@ -123,12 +123,12 @@ static bool has_can_login(const cql3::untyped_result_set_row& row) {
return row.has("can_login") && !(boolean_type->deserialize(row.get_blob("can_login")).is_null());
}
stdx::string_view standard_role_manager_name() noexcept {
std::string_view standard_role_manager_name() noexcept {
static const sstring instance = meta::AUTH_PACKAGE_NAME + "CassandraRoleManager";
return instance;
}
stdx::string_view standard_role_manager::qualified_java_name() const noexcept {
std::string_view standard_role_manager::qualified_java_name() const noexcept {
return standard_role_manager_name();
}
@@ -166,8 +166,7 @@ future<> standard_role_manager::create_metadata_tables_if_missing() const {
future<> standard_role_manager::create_default_role_if_missing() const {
return default_role_row_satisfies(_qp, &has_can_login).then([this](bool exists) {
if (!exists) {
-static const sstring query = sprint(
-"INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, true, true)",
+static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, true, true)",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -191,12 +190,12 @@ future<> standard_role_manager::create_default_role_if_missing() const {
static const sstring legacy_table_name{"users"};
bool standard_role_manager::legacy_metadata_exists() const {
return _qp.db().local().has_schema(meta::AUTH_KS, legacy_table_name);
return _qp.db().has_schema(meta::AUTH_KS, legacy_table_name);
}
future<> standard_role_manager::migrate_legacy_metadata() const {
log.info("Starting migration of legacy user metadata.");
static const sstring query = sprint("SELECT * FROM %s.%s", meta::AUTH_KS, legacy_table_name);
static const sstring query = format("SELECT * FROM {}.{}", meta::AUTH_KS, legacy_table_name);
return _qp.process(
query,
@@ -227,7 +226,7 @@ future<> standard_role_manager::start() {
return this->create_metadata_tables_if_missing().then([this] {
_stopped = auth::do_after_system_ready(_as, [this] {
return seastar::async([this] {
wait_for_schema_agreement(_migration_manager, _qp.db().local()).get0();
wait_for_schema_agreement(_migration_manager, _qp.db(), _as).get0();
if (any_nondefault_role_row_satisfies(_qp, &has_can_login).get0()) {
if (this->legacy_metadata_exists()) {
@@ -251,12 +250,11 @@ future<> standard_role_manager::start() {
future<> standard_role_manager::stop() {
_as.request_abort();
return _stopped.handle_exception_type([] (const sleep_aborted&) { });
return _stopped.handle_exception_type([] (const sleep_aborted&) { }).handle_exception_type([](const abort_requested_exception&) {});;
}
-future<> standard_role_manager::create_or_replace(stdx::string_view role_name, const role_config& c) const {
-static const sstring query = sprint(
-"INSERT INTO %s (%s, is_superuser, can_login) VALUES (?, ?, ?)",
+future<> standard_role_manager::create_or_replace(std::string_view role_name, const role_config& c) const {
+static const sstring query = format("INSERT INTO {} ({}, is_superuser, can_login) VALUES (?, ?, ?)",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -269,7 +267,7 @@ future<> standard_role_manager::create_or_replace(stdx::string_view role_name, c
}
future<>
standard_role_manager::create(stdx::string_view role_name, const role_config& c) const {
standard_role_manager::create(std::string_view role_name, const role_config& c) const {
return this->exists(role_name).then([this, role_name, &c](bool role_exists) {
if (role_exists) {
throw role_already_exists(role_name);
@@ -280,7 +278,7 @@ standard_role_manager::create(stdx::string_view role_name, const role_config& c)
}
future<>
standard_role_manager::alter(stdx::string_view role_name, const role_config_update& u) const {
standard_role_manager::alter(std::string_view role_name, const role_config_update& u) const {
static const auto build_column_assignments = [](const role_config_update& u) -> sstring {
std::vector<sstring> assignments;
@@ -301,8 +299,7 @@ standard_role_manager::alter(stdx::string_view role_name, const role_config_upda
}
return _qp.process(
-sprint(
-"UPDATE %s SET %s WHERE %s = ?",
+format("UPDATE {} SET {} WHERE {} = ?",
meta::roles_table::qualified_name(),
build_column_assignments(u),
meta::roles_table::role_col_name),
@@ -312,7 +309,7 @@ standard_role_manager::alter(stdx::string_view role_name, const role_config_upda
});
}
future<> standard_role_manager::drop(stdx::string_view role_name) const {
future<> standard_role_manager::drop(std::string_view role_name) const {
return this->exists(role_name).then([this, role_name](bool role_exists) {
if (!role_exists) {
throw nonexistant_role(role_name);
@@ -320,8 +317,7 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {
// First, revoke this role from all roles that are members of it.
const auto revoke_from_members = [this, role_name] {
-static const sstring query = sprint(
-"SELECT member FROM %s WHERE role = ?",
+static const sstring query = format("SELECT member FROM {} WHERE role = ?",
meta::role_members_table::qualified_name());
return _qp.process(
@@ -359,8 +355,7 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {
// Finally, delete the role itself.
auto delete_role = [this, role_name] {
-static const sstring query = sprint(
-"DELETE FROM %s WHERE %s = ?",
+static const sstring query = format("DELETE FROM {} WHERE {} = ?",
meta::roles_table::qualified_name(),
meta::roles_table::role_col_name);
@@ -379,14 +374,14 @@ future<> standard_role_manager::drop(stdx::string_view role_name) const {
future<>
standard_role_manager::modify_membership(
-stdx::string_view grantee_name,
-stdx::string_view role_name,
+std::string_view grantee_name,
+std::string_view role_name,
membership_change ch) const {
const auto modify_roles = [this, role_name, grantee_name, ch] {
-const auto query = sprint(
-"UPDATE %s SET member_of = member_of %s ? WHERE %s = ?",
+const auto query = format(
+"UPDATE {} SET member_of = member_of {} ? WHERE {} = ?",
meta::roles_table::qualified_name(),
(ch == membership_change::add ? '+' : '-'),
meta::roles_table::role_col_name);
@@ -402,8 +397,7 @@ standard_role_manager::modify_membership(
switch (ch) {
case membership_change::add:
return _qp.process(
-sprint(
-"INSERT INTO %s (role, member) VALUES (?, ?)",
+format("INSERT INTO {} (role, member) VALUES (?, ?)",
meta::role_members_table::qualified_name()),
consistency_for_role(role_name),
internal_distributed_timeout_config(),
@@ -411,8 +405,7 @@ standard_role_manager::modify_membership(
case membership_change::remove:
return _qp.process(
-sprint(
-"DELETE FROM %s WHERE role = ? AND member = ?",
+format("DELETE FROM {} WHERE role = ? AND member = ?",
meta::role_members_table::qualified_name()),
consistency_for_role(role_name),
internal_distributed_timeout_config(),
@@ -426,7 +419,7 @@ standard_role_manager::modify_membership(
}
future<>
standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view role_name) const {
standard_role_manager::grant(std::string_view grantee_name, std::string_view role_name) const {
const auto check_redundant = [this, role_name, grantee_name] {
return this->query_granted(
grantee_name,
@@ -457,7 +450,7 @@ standard_role_manager::grant(stdx::string_view grantee_name, stdx::string_view r
}
future<>
standard_role_manager::revoke(stdx::string_view revokee_name, stdx::string_view role_name) const {
standard_role_manager::revoke(std::string_view revokee_name, std::string_view role_name) const {
return this->exists(role_name).then([this, revokee_name, role_name](bool role_exists) {
if (!role_exists) {
throw nonexistant_role(sstring(role_name));
@@ -479,7 +472,7 @@ standard_role_manager::revoke(stdx::string_view revokee_name, stdx::string_view
static future<> collect_roles(
cql3::query_processor& qp,
stdx::string_view grantee_name,
std::string_view grantee_name,
bool recurse,
role_set& roles) {
return require_record(qp, grantee_name).then([&qp, &roles, recurse](record r) {
@@ -497,7 +490,7 @@ static future<> collect_roles(
});
}
future<role_set> standard_role_manager::query_granted(stdx::string_view grantee_name, recursive_role_query m) const {
future<role_set> standard_role_manager::query_granted(std::string_view grantee_name, recursive_role_query m) const {
const bool recurse = (m == recursive_role_query::yes);
return do_with(
@@ -508,8 +501,7 @@ future<role_set> standard_role_manager::query_granted(stdx::string_view grantee_
}
future<role_set> standard_role_manager::query_all() const {
-static const sstring query = sprint(
-"SELECT %s FROM %s",
+static const sstring query = format("SELECT {} FROM {}",
meta::roles_table::role_col_name,
meta::roles_table::qualified_name());
@@ -534,19 +526,19 @@ future<role_set> standard_role_manager::query_all() const {
});
}
-future<bool> standard_role_manager::exists(stdx::string_view role_name) const {
-return find_record(_qp, role_name).then([](stdx::optional<record> mr) {
+future<bool> standard_role_manager::exists(std::string_view role_name) const {
+return find_record(_qp, role_name).then([](std::optional<record> mr) {
return static_cast<bool>(mr);
});
}
future<bool> standard_role_manager::is_superuser(stdx::string_view role_name) const {
future<bool> standard_role_manager::is_superuser(std::string_view role_name) const {
return require_record(_qp, role_name).then([](record r) {
return r.is_superuser;
});
}
future<bool> standard_role_manager::can_login(stdx::string_view role_name) const {
future<bool> standard_role_manager::can_login(std::string_view role_name) const {
return require_record(_qp, role_name).then([](record r) {
return r.can_login;
});


@@ -23,14 +23,13 @@
#include "auth/role_manager.hh"
#include <experimental/string_view>
#include <string_view>
#include <unordered_set>
#include <seastar/core/abort_source.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sstring.hh>
#include "stdx.hh"
#include "seastarx.hh"
namespace cql3 {
@@ -43,7 +42,7 @@ class migration_manager;
namespace auth {
stdx::string_view standard_role_manager_name() noexcept;
std::string_view standard_role_manager_name() noexcept;
class standard_role_manager final : public role_manager {
cql3::query_processor& _qp;
@@ -58,7 +57,7 @@ public:
, _stopped(make_ready_future<>()) {
}
virtual stdx::string_view qualified_java_name() const noexcept override;
virtual std::string_view qualified_java_name() const noexcept override;
virtual const resource_set& protected_resources() const override;
@@ -66,25 +65,25 @@ public:
virtual future<> stop() override;
virtual future<> create(stdx::string_view role_name, const role_config&) const override;
virtual future<> create(std::string_view role_name, const role_config&) const override;
virtual future<> drop(stdx::string_view role_name) const override;
virtual future<> drop(std::string_view role_name) const override;
virtual future<> alter(stdx::string_view role_name, const role_config_update&) const override;
virtual future<> alter(std::string_view role_name, const role_config_update&) const override;
virtual future<> grant(stdx::string_view grantee_name, stdx::string_view role_name) const override;
virtual future<> grant(std::string_view grantee_name, std::string_view role_name) const override;
virtual future<> revoke(stdx::string_view revokee_name, stdx::string_view role_name) const override;
virtual future<> revoke(std::string_view revokee_name, std::string_view role_name) const override;
virtual future<role_set> query_granted(stdx::string_view grantee_name, recursive_role_query) const override;
virtual future<role_set> query_granted(std::string_view grantee_name, recursive_role_query) const override;
virtual future<role_set> query_all() const override;
virtual future<bool> exists(stdx::string_view role_name) const override;
virtual future<bool> exists(std::string_view role_name) const override;
virtual future<bool> is_superuser(stdx::string_view role_name) const override;
virtual future<bool> is_superuser(std::string_view role_name) const override;
virtual future<bool> can_login(stdx::string_view role_name) const override;
virtual future<bool> can_login(std::string_view role_name) const override;
private:
enum class membership_change { add, remove };
@@ -97,9 +96,9 @@ private:
future<> create_default_role_if_missing() const;
future<> create_or_replace(stdx::string_view role_name, const role_config&) const;
future<> create_or_replace(std::string_view role_name, const role_config&) const;
future<> modify_membership(stdx::string_view role_name, stdx::string_view grantee_name, membership_change) const;
future<> modify_membership(std::string_view role_name, std::string_view grantee_name, membership_change) const;
};
}


@@ -45,7 +45,6 @@
#include "auth/default_authorizer.hh"
#include "auth/password_authenticator.hh"
#include "auth/permission.hh"
#include "db/config.hh"
#include "utils/class_registrator.hh"
namespace auth {
@@ -118,19 +117,19 @@ public:
});
}
virtual future<> create(stdx::string_view role_name, const authentication_options& options) const override {
virtual future<> create(std::string_view role_name, const authentication_options& options) const override {
return _authenticator->create(role_name, options);
}
virtual future<> alter(stdx::string_view role_name, const authentication_options& options) const override {
virtual future<> alter(std::string_view role_name, const authentication_options& options) const override {
return _authenticator->alter(role_name, options);
}
virtual future<> drop(stdx::string_view role_name) const override {
virtual future<> drop(std::string_view role_name) const override {
return _authenticator->drop(role_name);
}
virtual future<custom_options> query_custom_options(stdx::string_view role_name) const override {
virtual future<custom_options> query_custom_options(std::string_view role_name) const override {
return _authenticator->query_custom_options(role_name);
}
@@ -218,11 +217,11 @@ public:
return make_ready_future<permission_set>(transitional_permissions);
}
virtual future<> grant(stdx::string_view s, permission_set ps, const resource& r) const override {
virtual future<> grant(std::string_view s, permission_set ps, const resource& r) const override {
return _authorizer->grant(s, std::move(ps), r);
}
virtual future<> revoke(stdx::string_view s, permission_set ps, const resource& r) const override {
virtual future<> revoke(std::string_view s, permission_set ps, const resource& r) const override {
return _authorizer->revoke(s, std::move(ps), r);
}
@@ -230,7 +229,7 @@ public:
return _authorizer->list_all();
}
virtual future<> revoke_all(stdx::string_view s) const override {
virtual future<> revoke_all(std::string_view s) const override {
return _authorizer->revoke_all(s);
}


@@ -77,7 +77,7 @@ protected:
, _io_priority(iop)
, _interval(interval)
, _update_timer([this] { adjust(); })
, _control_points({{0,0}})
, _control_points()
, _current_backlog(std::move(backlog))
, _inflight_update(make_ready_future<>())
{
@@ -125,7 +125,7 @@ public:
flush_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
flush_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, float soft_limit, std::function<float()> current_dirty)
: backlog_controller(sg, iop, std::move(interval),
std::vector<backlog_controller::control_point>({{soft_limit, 10}, {soft_limit + (hard_dirty_limit - soft_limit) / 2, 200} , {hard_dirty_limit, 1000}}),
std::vector<backlog_controller::control_point>({{0.0, 0.0}, {soft_limit, 10}, {soft_limit + (hard_dirty_limit - soft_limit) / 2, 200} , {hard_dirty_limit, 1000}}),
std::move(current_dirty)
)
{}
@@ -139,7 +139,7 @@ public:
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, float static_shares) : backlog_controller(sg, iop, static_shares) {}
compaction_controller(seastar::scheduling_group sg, const ::io_priority_class& iop, std::chrono::milliseconds interval, std::function<float()> current_backlog)
: backlog_controller(sg, iop, std::move(interval),
std::vector<backlog_controller::control_point>({{0.5, 10}, {1.5, 100} , {normalization_factor, 1000}}),
std::vector<backlog_controller::control_point>({{0.0, 50}, {1.5, 100} , {normalization_factor, 1000}}),
std::move(current_backlog)
)
{}


@@ -20,7 +20,7 @@
*/
#include "bytes.hh"
#include "core/print.hh"
#include <seastar/core/print.hh>
static inline int8_t hex_to_int(unsigned char c) {
switch (c) {
@@ -55,7 +55,7 @@ bytes from_hex(sstring_view s) {
auto half_byte1 = hex_to_int(s[i * 2]);
auto half_byte2 = hex_to_int(s[i * 2 + 1]);
if (half_byte1 == -1 || half_byte2 == -1) {
throw std::invalid_argument(sprint("Non-hex characters in %s", s));
throw std::invalid_argument(format("Non-hex characters in {}", s));
}
out[i] = (half_byte1 << 4) | half_byte2;
}


@@ -22,18 +22,18 @@
#pragma once
#include "seastarx.hh"
#include "core/sstring.hh"
#include <seastar/core/sstring.hh>
#include "hashing.hh"
#include <experimental/optional>
#include <optional>
#include <iosfwd>
#include <functional>
#include "utils/mutable_view.hh"
using bytes = basic_sstring<int8_t, uint32_t, 31, false>;
using bytes_view = std::experimental::basic_string_view<int8_t>;
using bytes_view = std::basic_string_view<int8_t>;
using bytes_mutable_view = basic_mutable_view<bytes_view::value_type>;
using bytes_opt = std::experimental::optional<bytes>;
using sstring_view = std::experimental::string_view;
using bytes_opt = std::optional<bytes>;
using sstring_view = std::string_view;
inline sstring_view to_sstring_view(bytes_view view) {
return {reinterpret_cast<const char*>(view.data()), view.size()};


@@ -24,9 +24,9 @@
#include <boost/range/iterator_range.hpp>
#include "bytes.hh"
#include "core/unaligned.hh"
#include <seastar/core/unaligned.hh>
#include "hashing.hh"
#include "seastar/core/simple-stream.hh"
#include <seastar/core/simple-stream.hh>
/**
* Utility for writing data into a buffer when its final size is not known up front.
*
@@ -57,12 +57,12 @@ private:
value_type data[0];
void operator delete(void* ptr) { free(ptr); }
};
// FIXME: consider increasing chunk size as the buffer grows
static constexpr size_type chunk_size{512};
static constexpr size_type default_chunk_size{512};
private:
std::unique_ptr<chunk> _begin;
chunk* _current;
size_type _size;
size_type _initial_chunk_size = default_chunk_size;
public:
class fragment_iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
chunk* _current = nullptr;
@@ -102,13 +102,13 @@ private:
}
// Figure out next chunk size.
// - must be enough for data_size
// - must be at least chunk_size
// - must be at least _initial_chunk_size
// - try to double each time to prevent too many allocations
// - do not exceed max_chunk_size
size_type next_alloc_size(size_t data_size) const {
auto next_size = _current
? _current->size * 2
: chunk_size;
: _initial_chunk_size;
next_size = std::min(next_size, max_chunk_size());
// FIXME: check for overflow?
return std::max<size_type>(next_size, data_size + sizeof(chunk));
@@ -116,13 +116,19 @@ private:
// Makes room for a contiguous region of given size.
// The region is accounted for as already written.
// size must not be zero.
[[gnu::always_inline]]
value_type* alloc(size_type size) {
if (size <= current_space_left()) {
if (__builtin_expect(size <= current_space_left(), true)) {
auto ret = _current->data + _current->offset;
_current->offset += size;
_size += size;
return ret;
} else {
return alloc_new(size);
}
}
[[gnu::noinline]]
value_type* alloc_new(size_type size) {
auto alloc_size = next_alloc_size(size);
auto space = malloc(alloc_size);
if (!space) {
@@ -140,19 +146,22 @@ private:
}
_size += size;
return _current->data;
};
}
public:
bytes_ostream() noexcept
explicit bytes_ostream(size_t initial_chunk_size) noexcept
: _begin()
, _current(nullptr)
, _size(0)
, _initial_chunk_size(initial_chunk_size)
{ }
bytes_ostream() noexcept : bytes_ostream(default_chunk_size) {}
bytes_ostream(bytes_ostream&& o) noexcept
: _begin(std::move(o._begin))
, _current(o._current)
, _size(o._size)
, _initial_chunk_size(o._initial_chunk_size)
{
o._current = nullptr;
o._size = 0;
@@ -162,6 +171,7 @@ public:
: _begin()
, _current(nullptr)
, _size(0)
, _initial_chunk_size(o._initial_chunk_size)
{
append(o);
}
@@ -199,18 +209,20 @@ public:
return place_holder<T>{alloc(sizeof(T))};
}
[[gnu::always_inline]]
value_type* write_place_holder(size_type size) {
return alloc(size);
}
// Writes given sequence of bytes
[[gnu::always_inline]]
inline void write(bytes_view v) {
if (v.empty()) {
return;
}
auto this_size = std::min(v.size(), size_t(current_space_left()));
if (this_size) {
if (__builtin_expect(this_size, true)) {
memcpy(_current->data + _current->offset, v.begin(), this_size);
_current->offset += this_size;
_size += this_size;
@@ -219,11 +231,12 @@ public:
while (!v.empty()) {
auto this_size = std::min(v.size(), size_t(max_chunk_size()));
std::copy_n(v.begin(), this_size, alloc(this_size));
std::copy_n(v.begin(), this_size, alloc_new(this_size));
v.remove_prefix(this_size);
}
}
[[gnu::always_inline]]
void write(const char* ptr, size_t size) {
write(bytes_view(reinterpret_cast<const signed char*>(ptr), size));
}
@@ -393,14 +406,19 @@ public:
bool operator!=(const bytes_ostream& other) const {
return !(*this == other);
}
};
-template<>
-struct appending_hash<bytes_ostream> {
-template<typename Hasher>
-void operator()(Hasher& h, const bytes_ostream& b) const {
-for (auto&& frag : b.fragments()) {
-feed_hash(h, frag);
+// Makes this instance empty.
+//
+// The first buffer is not deallocated, so callers may rely on the
+// fact that if they write less than the initial chunk size between
+// the clear() calls then writes will not involve any memory allocations,
+// except for the first write made on this instance.
+void clear() {
+if (_begin) {
+_begin->offset = 0;
+_size = 0;
+_current = _begin.get();
+_begin->next.reset();
+}
+}
};


@@ -61,6 +61,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// - _last_row points at a direct predecessor of the next row which is going to be read.
// Used for populating continuity.
// - _population_range_starts_before_all_rows is set accordingly
// - _underlying is engaged and fast-forwarded
reading_from_underlying,
end_of_stream
@@ -99,7 +100,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
// forward progress is not guaranteed in case iterators are getting constantly invalidated.
bool _lower_bound_changed = false;
// Points to the underlying reader conforming to _schema,
// either to *_underlying_holder or _read_context->underlying().underlying().
flat_mutation_reader* _underlying = nullptr;
std::optional<flat_mutation_reader> _underlying_holder;
future<> do_fill_buffer(db::timeout_clock::time_point);
future<> ensure_underlying(db::timeout_clock::time_point);
void copy_from_cache_to_buffer();
future<> process_static_row(db::timeout_clock::time_point);
void move_to_end();
@@ -186,23 +193,22 @@ future<> cache_flat_mutation_reader::process_static_row(db::timeout_clock::time_
return make_ready_future<>();
} else {
_read_context->cache().on_row_miss();
-return _read_context->get_next_fragment(timeout).then([this] (mutation_fragment_opt&& sr) {
-if (sr) {
-assert(sr->is_static_row());
-maybe_add_to_cache(sr->as_static_row());
-push_mutation_fragment(std::move(*sr));
-}
-maybe_set_static_row_continuous();
+return ensure_underlying(timeout).then([this, timeout] {
+return (*_underlying)(timeout).then([this] (mutation_fragment_opt&& sr) {
+if (sr) {
+assert(sr->is_static_row());
+maybe_add_to_cache(sr->as_static_row());
+push_mutation_fragment(std::move(*sr));
+}
+maybe_set_static_row_continuous();
+});
});
}
}
inline
void cache_flat_mutation_reader::touch_partition() {
-if (_snp->at_latest_version()) {
-rows_entry& last_dummy = *_snp->version()->partition().clustered_rows().rbegin();
-_snp->tracker()->touch(last_dummy);
-}
+_snp->touch();
}
inline
@@ -232,14 +238,36 @@ future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point t
});
}
inline
future<> cache_flat_mutation_reader::ensure_underlying(db::timeout_clock::time_point timeout) {
if (_underlying) {
return make_ready_future<>();
}
return _read_context->ensure_underlying(timeout).then([this, timeout] {
flat_mutation_reader& ctx_underlying = _read_context->underlying().underlying();
if (ctx_underlying.schema() != _schema) {
_underlying_holder = make_delegating_reader(ctx_underlying);
_underlying_holder->upgrade_schema(_schema);
_underlying = &*_underlying_holder;
} else {
_underlying = &ctx_underlying;
}
});
}
inline
future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
if (_state == state::move_to_underlying) {
if (!_underlying) {
return ensure_underlying(timeout).then([this, timeout] {
return do_fill_buffer(timeout);
});
}
_state = state::reading_from_underlying;
_population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
auto end = _next_row_in_range ? position_in_partition(_next_row.position())
: position_in_partition(_upper_bound);
return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
return _underlying->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
return read_from_underlying(timeout);
});
}
@@ -280,7 +308,7 @@ future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_poin
inline
future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::time_point timeout) {
return consume_mutation_fragments_until(_read_context->underlying().underlying(),
return consume_mutation_fragments_until(*_underlying,
[this] { return _state != state::reading_from_underlying || is_buffer_full(); },
[this] (mutation_fragment mf) {
_read_context->cache().on_row_miss();
@@ -425,7 +453,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
_read_context->cache().on_mispopulate();
return;
}
clogger.trace("csm {}: populate({})", this, cr);
clogger.trace("csm {}: populate({})", this, clustering_row::printer(*_schema, cr));
_lsa_manager.run_in_update_section_with_allocator([this, &cr] {
mutation_partition& mp = _snp->version()->partition();
rows_entry::compare less(*_schema);
@@ -577,7 +605,7 @@ void cache_flat_mutation_reader::move_to_next_entry() {
inline
void cache_flat_mutation_reader::add_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_to_buffer({})", this, mf);
clogger.trace("csm {}: add_to_buffer({})", this, mutation_fragment::printer(*_schema, mf));
if (mf.is_clustering_row()) {
add_clustering_row_to_buffer(std::move(mf));
} else {
@@ -599,7 +627,7 @@ void cache_flat_mutation_reader::add_to_buffer(const partition_snapshot_row_curs
// (2) If _lower_bound > mf.position(), mf was emitted
inline
void cache_flat_mutation_reader::add_clustering_row_to_buffer(mutation_fragment&& mf) {
clogger.trace("csm {}: add_clustering_row_to_buffer({})", this, mf);
clogger.trace("csm {}: add_clustering_row_to_buffer({})", this, mutation_fragment::printer(*_schema, mf));
auto& row = mf.as_clustering_row();
auto new_lower_bound = position_in_partition::after_key(row.key());
push_mutation_fragment(std::move(mf));
@@ -640,7 +668,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const range_tombstone& rt) {
inline
void cache_flat_mutation_reader::maybe_add_to_cache(const static_row& sr) {
if (can_populate()) {
clogger.trace("csm {}: populate({})", this, sr);
clogger.trace("csm {}: populate({})", this, static_row::printer(*_schema, sr));
_read_context->cache().on_static_row_insert();
_lsa_manager.run_in_update_section_with_allocator([&] {
if (_read_context->digest_requested()) {

cache_temperature.hh (new file)

@@ -0,0 +1,45 @@
/*
* Copyright 2018 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <stdint.h>
namespace ser {
template <typename T>
class serializer;
};
class cache_temperature {
float hit_rate;
explicit cache_temperature(uint8_t hr) : hit_rate(hr/255.0f) {}
public:
uint8_t get_serialized_temperature() const {
return hit_rate * 255;
}
cache_temperature() : hit_rate(0) {}
explicit cache_temperature(float hr) : hit_rate(hr) {}
explicit operator float() const { return hit_rate; }
static cache_temperature invalid() { return cache_temperature(-1.0f); }
friend struct ser::serializer<cache_temperature>;
};


@@ -20,7 +20,7 @@
*/
#pragma once
#include <core/sstring.hh>
#include <seastar/core/sstring.hh>
#include <boost/lexical_cast.hpp>
#include "exceptions/exceptions.hh"
#include "json.hh"


@@ -68,7 +68,7 @@ mutation canonical_mutation::to_mutation(schema_ptr s) const {
auto cf_id = mv.table_id();
if (s->id() != cf_id) {
throw std::runtime_error(sprint("Attempted to deserialize canonical_mutation of table %s with schema of table %s (%s.%s)",
throw std::runtime_error(format("Attempted to deserialize canonical_mutation of table {} with schema of table {} ({}.{})",
cf_id, s->id(), s->ks_name(), s->cf_name()));
}


@@ -63,7 +63,7 @@ class partition_cells_range {
public:
class iterator {
const mutation_partition& _mp;
stdx::optional<mutation_partition::rows_type::const_iterator> _position;
std::optional<mutation_partition::rows_type::const_iterator> _position;
cells_range _current;
public:
explicit iterator(const mutation_partition& mp)


@@ -142,7 +142,7 @@ public:
}
template<template<typename> typename R>
GCC6_CONCEPT( requires Range<R, clustering_key_prefix_view> )
static stdx::optional<typename R<clustering_key_prefix_view>::bound> to_range_bound(const bound_view& bv) {
static std::optional<typename R<clustering_key_prefix_view>::bound> to_range_bound(const bound_view& bv) {
if (&bv._prefix.get() == &_empty_prefix) {
return {};
}


@@ -200,8 +200,9 @@ public:
return _current_start;
}
position_in_partition_view upper_bound() const {
return _current_end;
// Returns the upper bound of the last range in provided ranges set
position_in_partition_view uppermost_bound() const {
return position_in_partition_view::for_range_end(_ranges.back());
}
// When lower_bound() changes, this also does


@@ -61,6 +61,8 @@ public:
// Return a list of sstables to be compacted after applying the strategy.
compaction_descriptor get_sstables_for_compaction(column_family& cfs, std::vector<shared_sstable> candidates);
compaction_descriptor get_major_compaction_job(column_family& cf, std::vector<shared_sstable> candidates);
std::vector<resharding_descriptor> get_resharding_jobs(column_family& cf, std::vector<shared_sstable> candidates);
// Some strategies may look at the compacted and resulting sstables to
@@ -73,6 +75,9 @@ public:
// Return if optimization to rule out sstables based on clustering key filter should be applied.
bool use_clustering_key_filter() const;
// Return true if compaction strategy ignores sstables coming from partial runs.
bool ignore_partial_runs() const;
// An estimation of number of compaction for strategy to be satisfied.
int64_t estimated_pending_compactions(column_family& cf) const;
@@ -111,7 +116,7 @@ public:
} else if (short_name == "TimeWindowCompactionStrategy") {
return compaction_strategy_type::time_window;
} else {
throw exceptions::configuration_exception(sprint("Unable to find compaction strategy class '%s'", name));
throw exceptions::configuration_exception(format("Unable to find compaction strategy class '{}'", name));
}
}


@@ -28,7 +28,7 @@
#include <boost/range/iterator_range.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include "utils/serialization.hh"
#include "util/backtrace.hh"
#include <seastar/util/backtrace.hh>
#include "unimplemented.hh"
enum class allow_prefixes { no, yes };
@@ -98,7 +98,7 @@ public:
static bytes serialize_value(RangeOfSerializedComponents&& values) {
auto size = serialized_size(values);
if (size > std::numeric_limits<size_type>::max()) {
throw std::runtime_error(sprint("Key size too large: %d > %d", size, std::numeric_limits<size_type>::max()));
throw std::runtime_error(format("Key size too large: {:d} > {:d}", size, std::numeric_limits<size_type>::max()));
}
bytes b(bytes::initialized_later(), size);
auto i = b.begin();
@@ -145,7 +145,7 @@ public:
}
len = read_simple<size_type>(_v);
if (_v.size() < len) {
throw_with_backtrace<marshal_exception>(sprint("compound_type iterator - not enough bytes, expected %d, got %d", len, _v.size()));
throw_with_backtrace<marshal_exception>(format("compound_type iterator - not enough bytes, expected {:d}, got {:d}", len, _v.size()));
}
}
_current = bytes_view(_v.begin(), len);


@@ -284,7 +284,7 @@ private:
// bytes is the static prefix or not).
auto value_size = size(*it);
if (value_size > static_cast<size_type>(std::numeric_limits<size_type>::max() - uint8_t(is_compound))) {
throw std::runtime_error(sprint("First component size too large: %d > %d", value_size, std::numeric_limits<size_type>::max() - is_compound));
throw std::runtime_error(format("First component size too large: {:d} > {:d}", value_size, std::numeric_limits<size_type>::max() - is_compound));
}
if (!is_compound) {
return value_size;
@@ -295,7 +295,7 @@ private:
for ( ; it != values.end(); ++it) {
auto value_size = size(*it);
if (value_size > std::numeric_limits<size_type>::max()) {
throw std::runtime_error(sprint("Component size too large: %d > %d", value_size, std::numeric_limits<size_type>::max()));
throw std::runtime_error(format("Component size too large: {:d} > {:d}", value_size, std::numeric_limits<size_type>::max()));
}
len += sizeof(size_type) + value_size + sizeof(eoc_type);
}
@@ -346,7 +346,7 @@ public:
}
len = read_simple<size_type>(_v);
if (_v.size() < len) {
throw_with_backtrace<marshal_exception>(sprint("composite iterator - not enough bytes, expected %d, got %d", len, _v.size()));
throw_with_backtrace<marshal_exception>(format("composite iterator - not enough bytes, expected {:d}, got {:d}", len, _v.size()));
}
}
auto value = bytes_view(_v.begin(), len);
@@ -468,7 +468,7 @@ public:
template <typename Component>
friend inline std::ostream& operator<<(std::ostream& os, const std::pair<Component, eoc>& c) {
return os << "{value=" << c.first << "; eoc=" << sprint("0x%02x", eoc_type(c.second) & 0xff) << "}";
return os << "{value=" << c.first << "; eoc=" << format("0x{:02x}", eoc_type(c.second) & 0xff) << "}";
}
friend std::ostream& operator<<(std::ostream& os, const composite& v);
@@ -511,7 +511,7 @@ public:
auto marker = it->second;
++it;
if (it != e && marker != composite::eoc::none) {
throw runtime_exception(sprint("non-zero component divider found (%d) mid", sprint("0x%02x", composite::eoc_type(marker) & 0xff)));
throw runtime_exception(format("non-zero component divider found ({:d}) mid", format("0x{:02x}", composite::eoc_type(marker) & 0xff)));
}
}
return ret;


@@ -95,7 +95,7 @@ shared_ptr<compressor> compressor::create(const std::map<sstring, sstring>& opti
return create(i->second, [&options](const sstring& key) -> opt_string {
auto i = options.find(key);
if (i == options.end()) {
return std::experimental::nullopt;
return std::nullopt;
}
return { i->second };
});
@@ -108,11 +108,12 @@ thread_local const shared_ptr<compressor> compressor::snappy = make_shared<snapp
thread_local const shared_ptr<compressor> compressor::deflate = make_shared<deflate_processor>(namespace_prefix + "DeflateCompressor");
const sstring compression_parameters::SSTABLE_COMPRESSION = "sstable_compression";
const sstring compression_parameters::CHUNK_LENGTH_KB = "chunk_length_kb";
const sstring compression_parameters::CHUNK_LENGTH_KB = "chunk_length_in_kb";
const sstring compression_parameters::CHUNK_LENGTH_KB_ERR = "chunk_length_kb";
const sstring compression_parameters::CRC_CHECK_CHANCE = "crc_check_chance";
compression_parameters::compression_parameters()
: compression_parameters(nullptr)
: compression_parameters(compressor::lz4)
{}
compression_parameters::~compression_parameters()
@@ -127,12 +128,14 @@ compression_parameters::compression_parameters(const std::map<sstring, sstring>&
validate_options(options);
auto chunk_length = options.find(CHUNK_LENGTH_KB);
auto chunk_length = options.find(CHUNK_LENGTH_KB) != options.end() ?
options.find(CHUNK_LENGTH_KB) : options.find(CHUNK_LENGTH_KB_ERR);
if (chunk_length != options.end()) {
try {
_chunk_length = std::stoi(chunk_length->second) * 1024;
} catch (const std::exception& e) {
throw exceptions::syntax_exception(sstring("Invalid integer value ") + chunk_length->second + " for " + CHUNK_LENGTH_KB);
throw exceptions::syntax_exception(sstring("Invalid integer value ") + chunk_length->second + " for " + chunk_length->first);
}
}
auto crc_chance = options.find(CRC_CHECK_CHANCE);
@@ -149,11 +152,13 @@ void compression_parameters::validate() {
if (_chunk_length) {
auto chunk_length = _chunk_length.value();
if (chunk_length <= 0) {
throw exceptions::configuration_exception(sstring("Invalid negative or null ") + CHUNK_LENGTH_KB);
throw exceptions::configuration_exception(
fmt::sprintf("Invalid negative or null for %s/%s", CHUNK_LENGTH_KB, CHUNK_LENGTH_KB_ERR));
}
// _chunk_length must be a power of two
if (chunk_length & (chunk_length - 1)) {
throw exceptions::configuration_exception(sstring(CHUNK_LENGTH_KB) + " must be a power of 2.");
throw exceptions::configuration_exception(
fmt::sprintf("%s/%s must be a power of 2.", CHUNK_LENGTH_KB, CHUNK_LENGTH_KB_ERR));
}
}
if (_crc_check_chance && (_crc_check_chance.value() < 0.0 || _crc_check_chance.value() > 1.0)) {
@@ -192,6 +197,7 @@ void compression_parameters::validate_options(const std::map<sstring, sstring>&
static std::set<sstring> keywords({
sstring(SSTABLE_COMPRESSION),
sstring(CHUNK_LENGTH_KB),
sstring(CHUNK_LENGTH_KB_ERR),
sstring(CRC_CHECK_CHANCE),
});
std::set<sstring> ckw;
@@ -200,7 +206,7 @@ void compression_parameters::validate_options(const std::map<sstring, sstring>&
}
for (auto&& opt : options) {
if (!keywords.count(opt.first) && !ckw.count(opt.first)) {
throw exceptions::configuration_exception(sprint("Unknown compression option '%s'.", opt.first));
throw exceptions::configuration_exception(format("Unknown compression option '{}'.", opt.first));
}
}
}
@@ -241,7 +247,7 @@ size_t lz4_processor::compress(const char* input, size_t input_len,
output[1] = (input_len >> 8) & 0xFF;
output[2] = (input_len >> 16) & 0xFF;
output[3] = (input_len >> 24) & 0xFF;
#ifdef SEASTAR_HAVE_LZ4_COMPRESS_DEFAULT
#ifdef HAVE_LZ4_COMPRESS_DEFAULT
auto ret = LZ4_compress_default(input, output + 4, input_len, LZ4_compressBound(input_len));
#else
auto ret = LZ4_compress(input, output + 4, input_len);


@@ -29,7 +29,6 @@
#include <seastar/core/sstring.hh>
#include "exceptions/exceptions.hh"
#include "stdx.hh"
class compressor {
@@ -73,7 +72,7 @@ public:
}
// to cheaply bridge sstable compression options / maps
using opt_string = stdx::optional<sstring>;
using opt_string = std::optional<sstring>;
using opt_getter = std::function<opt_string(const sstring&)>;
static shared_ptr<compressor> create(const sstring& name, const opt_getter&);
@@ -99,11 +98,12 @@ public:
static const sstring SSTABLE_COMPRESSION;
static const sstring CHUNK_LENGTH_KB;
static const sstring CHUNK_LENGTH_KB_ERR;
static const sstring CRC_CHECK_CHANCE;
private:
compressor_ptr _compressor;
std::experimental::optional<int> _chunk_length;
std::experimental::optional<double> _crc_check_chance;
std::optional<int> _chunk_length;
std::optional<double> _crc_check_chance;
public:
compression_parameters();
compression_parameters(compressor_ptr);
@@ -118,6 +118,10 @@ public:
std::map<sstring, sstring> get_options() const;
bool operator==(const compression_parameters& other) const;
bool operator!=(const compression_parameters& other) const;
static compression_parameters no_compression() {
return compression_parameters(nullptr);
}
private:
void validate_options(const std::map<sstring, sstring>&);
};


@@ -22,17 +22,17 @@
# of tokens assuming they have equal hardware capability.
#
# If you already have a cluster with 1 token per node, and wish to migrate to
# multiple tokens per node, see http://wiki.apache.org/cassandra/Operations
# multiple tokens per node, see http://cassandra.apache.org/doc/latest/operating
num_tokens: 256
# Directory where Scylla should store data on disk.
# If not set, the default directory is $CASSANDRA_HOME/data/data.
# If not set, the default directory is /var/lib/scylla/data.
data_file_directories:
- /var/lib/scylla/data
# commit log. when running on magnetic HDD, this should be a
# separate spindle than the data directories.
# If not set, the default directory is $CASSANDRA_HOME/data/commitlog.
# If not set, the default directory is /var/lib/scylla/commitlog.
commitlog_directory: /var/lib/scylla/commitlog
# commitlog_sync may be either "periodic" or "batch."
@@ -65,7 +65,7 @@ commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
# seed_provider class_name is saved for future use.
# seeds address are mandatory!
# seeds address(es) are mandatory!
seed_provider:
# Addresses of hosts that are deemed contact points.
# Scylla nodes use this list of hosts to find each other and learn
@@ -242,8 +242,11 @@ batch_size_fail_threshold_in_kb: 50
# The directory where hints files are stored if hinted handoff is enabled.
# hints_directory: /var/lib/scylla/hints
# The directory where hints files are stored for materialized-view updates
# view_hints_directory: /var/lib/scylla/view_hints
# See http://wiki.apache.org/cassandra/HintedHandoff
# See https://docs.scylladb.com/architecture/anti-entropy/hinted-handoff
# May either be "true" or "false" to enable globally, or contain a list
# of data centers to enable per-datacenter.
# hinted_handoff_enabled: DC1,DC2
@@ -389,7 +392,7 @@ partitioner: org.apache.cassandra.dht.Murmur3Partitioner
# memory_allocator: NativeAllocator
# saved caches
# If not set, the default directory is $CASSANDRA_HOME/data/saved_caches.
# If not set, the default directory is /var/lib/scylla/saved_caches.
# saved_caches_directory: /var/lib/scylla/saved_caches
@@ -535,7 +538,7 @@ commitlog_total_space_in_mb: -1
# /proc/sys/net/core/wmem_max
# /proc/sys/net/core/rmem_max
# /proc/sys/net/ipv4/tcp_wmem
# /proc/sys/net/ipv4/tcp_wmem
# /proc/sys/net/ipv4/tcp_rmem
# and: man tcp
# internode_send_buff_size_in_bytes:
# internode_recv_buff_size_in_bytes:
@@ -609,9 +612,12 @@ commitlog_total_space_in_mb: -1
# of compaction, including validation compaction.
# compaction_throughput_mb_per_sec: 16
# Log a warning when compacting partitions larger than this value
# Log a warning when writing partitions larger than this value
# compaction_large_partition_warning_threshold_mb: 100
# Log a warning when writing rows larger than this value
# compaction_large_row_warning_threshold_mb: 50
# When compacting, the replacement sstable(s) can be opened before they
# are completely written, and used in place of the prior sstables for
# any range that has been written. This helps to smoothly transfer reads


@@ -1,4 +1,5 @@
#!/usr/bin/python3
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Copyright (C) 2015 ScyllaDB
#
@@ -95,8 +96,16 @@ def have_pkg(package):
return subprocess.call(['pkg-config', package]) == 0
def pkg_config(option, package):
output = subprocess.check_output(['pkg-config', option, package])
def pkg_config(package, *options):
# Add the directory containing the package to the search path, if a file is
# specified instead of a name.
if package.endswith('.pc'):
local_path = os.path.dirname(package)
env = { 'PKG_CONFIG_PATH' : '{}:{}'.format(local_path, os.environ.get('PKG_CONFIG_PATH', '')) }
else:
env = None
output = subprocess.check_output(['pkg-config'] + list(options) + [package], env=env)
return output.decode('utf-8').strip()
@@ -133,27 +142,6 @@ def flag_supported(flag, compiler):
return try_compile(flags=['-Werror'] + split, compiler=compiler)
def debug_flag(compiler):
src_with_auto = textwrap.dedent('''\
template <typename T>
struct x { auto f() {} };
x<int> a;
''')
if try_compile(source=src_with_auto, flags=['-g', '-std=gnu++1y'], compiler=compiler):
return '-g'
else:
print('Note: debug information disabled; upgrade your compiler')
return ''
def debug_compress_flag(compiler):
if try_compile(compiler=compiler, flags=['-gz']):
return '-gz'
else:
return ''
def gold_supported(compiler):
src_main = 'int main(int argc, char **argv) { return 0; }'
if try_compile_and_link(source=src_main, flags=['-fuse-ld=gold'], compiler=compiler):
@@ -197,7 +185,9 @@ class Thrift(object):
def default_target_arch():
if platform.machine() in ['i386', 'i686', 'x86_64']:
return 'nehalem'
return 'westmere' # support PCLMUL
elif platform.machine() == 'aarch64':
return 'armv8-a+crc+crypto'
else:
return ''
@@ -225,18 +215,41 @@ class Antlr3Grammar(object):
return self.source.endswith(end)
def find_headers(repodir, excluded_dirs):
walker = os.walk(repodir)
_, dirs, files = next(walker)
for excl_dir in excluded_dirs:
try:
dirs.remove(excl_dir)
except ValueError:
# Ignore complaints about excl_dir not being in dirs
pass
is_hh = lambda f: f.endswith('.hh')
headers = list(filter(is_hh, files))
for dirpath, _, files in walker:
if dirpath.startswith('./'):
dirpath = dirpath[2:]
headers += [os.path.join(dirpath, hh) for hh in filter(is_hh, files)]
return headers
modes = {
'debug': {
'sanitize': '-fsanitize=address -fsanitize=leak -fsanitize=undefined',
'sanitize_libs': '-lasan -lubsan',
'opt': '-O0 -DDEBUG -DDEBUG_SHARED_PTR -DDEFAULT_ALLOCATOR -DDEBUG_LSA_SANITIZER',
'libs': '',
'cxxflags': '-DDEBUG -DDEBUG_LSA_SANITIZER',
# Disable vptr because of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88684
'cxx_ld_flags': '-O0 -fsanitize=address -fsanitize=leak -fsanitize=undefined -fno-sanitize=vptr',
},
'release': {
'sanitize': '',
'sanitize_libs': '',
'opt': '-O3',
'libs': '',
'cxxflags': '',
'cxx_ld_flags': '-O3',
},
'dev': {
'cxxflags': '',
'cxx_ld_flags': '-O1',
},
}
@@ -271,13 +284,17 @@ scylla_tests = [
'tests/perf/perf_sstable',
'tests/cql_query_test',
'tests/secondary_index_test',
'tests/json_cql_query_test',
'tests/filtering_test',
'tests/storage_proxy_test',
'tests/schema_change_test',
'tests/mutation_reader_test',
'tests/mutation_query_test',
'tests/row_cache_test',
'tests/test-serialization',
'tests/broken_sstable_test',
'tests/sstable_test',
'tests/sstable_datafile_test',
'tests/sstable_3_x_test',
'tests/sstable_mutation_test',
'tests/sstable_resharding_test',
@@ -306,6 +323,7 @@ scylla_tests = [
'tests/log_heap_test',
'tests/managed_vector_test',
'tests/crc_test',
'tests/checksum_utils_test',
'tests/flush_queue_test',
'tests/dynamic_bitset_test',
'tests/auth_test',
@@ -352,12 +370,19 @@ scylla_tests = [
'tests/json_test',
'tests/auth_passwords_test',
'tests/multishard_mutation_query_test',
'tests/top_k_test',
'tests/utf8_test',
'tests/small_vector_test',
'tests/data_listeners_test',
'tests/truncation_migration_test',
]
perf_tests = [
'tests/perf/perf_mutation_readers',
'tests/perf/perf_checksum',
'tests/perf/perf_mutation_fragment',
'tests/perf/perf_idl',
'tests/perf/perf_vint',
]
apps = [
@@ -380,7 +405,7 @@ arg_parser.add_argument('--pie', dest='pie', action='store_true',
help='Build position-independent executable (PIE)')
arg_parser.add_argument('--so', dest='so', action='store_true',
help='Build shared object (SO) instead of executable')
arg_parser.add_argument('--mode', action='store', choices=list(modes.keys()) + ['all'], default='all')
arg_parser.add_argument('--mode', action='append', choices=list(modes.keys()), dest='selected_modes')
arg_parser.add_argument('--with', dest='artifacts', action='append', choices=all_artifacts, default=[])
arg_parser.add_argument('--cflags', action='store', dest='user_cflags', default='',
help='Extra flags for the C++ compiler')
@@ -400,6 +425,8 @@ arg_parser.add_argument('--dpdk-target', action='store', dest='dpdk_target', def
help='Path to DPDK SDK target location (e.g. <DPDK SDK dir>/x86_64-native-linuxapp-gcc)')
arg_parser.add_argument('--debuginfo', action='store', dest='debuginfo', type=int, default=1,
help='Enable(1)/disable(0)compiler debug information generation')
arg_parser.add_argument('--compress-exec-debuginfo', action='store', dest='compress_exec_debuginfo', type=int, default=1,
help='Enable(1)/disable(0) debug information compression in executables')
arg_parser.add_argument('--static-stdc++', dest='staticcxx', action='store_true',
help='Link libgcc and libstdc++ statically')
arg_parser.add_argument('--static-thrift', dest='staticthrift', action='store_true',
@@ -414,24 +441,28 @@ arg_parser.add_argument('--python', action='store', dest='python', default='pyth
help='Python3 path')
add_tristate(arg_parser, name='hwloc', dest='hwloc', help='hwloc support')
add_tristate(arg_parser, name='xen', dest='xen', help='Xen support')
arg_parser.add_argument('--split-dwarf', dest='split_dwarf', action='store_true', default=False,
help='use of split dwarf (https://gcc.gnu.org/wiki/DebugFission) to speed up linking')
arg_parser.add_argument('--enable-gcc6-concepts', dest='gcc6_concepts', action='store_true', default=False,
help='enable experimental support for C++ Concepts as implemented in GCC 6')
arg_parser.add_argument('--enable-alloc-failure-injector', dest='alloc_failure_injector', action='store_true', default=False,
help='enable allocation failure injection')
arg_parser.add_argument('--with-antlr3', dest='antlr3_exec', action='store', default=None,
help='path to antlr3 executable')
arg_parser.add_argument('--with-ragel', dest='ragel_exec', action='store', default=None,
help='path to ragel executable')
args = arg_parser.parse_args()
defines = []
defines = ['XXH_PRIVATE_API',
'SEASTAR_TESTING_MAIN',
]
extra_cxxflags = {}
cassandra_interface = Thrift(source='interface/cassandra.thrift', service='Cassandra')
scylla_core = (['database.cc',
'table.cc',
'atomic_cell.cc',
'hashers.cc',
'schema.cc',
'frozen_schema.cc',
'schema_registry.cc',
@@ -461,16 +492,16 @@ scylla_core = (['database.cc',
'compress.cc',
'sstables/mp_row_consumer.cc',
'sstables/sstables.cc',
'sstables/sstables_manager.cc',
'sstables/mc/writer.cc',
'sstables/sstable_version.cc',
'sstables/compress.cc',
'sstables/row.cc',
'sstables/partition.cc',
'sstables/compaction.cc',
'sstables/compaction_strategy.cc',
'sstables/compaction_manager.cc',
'sstables/integrity_checked_file_impl.cc',
'sstables/prepended_input_stream.cc',
'sstables/m_format_write_helpers.cc',
'sstables/m_format_read_helpers.cc',
'transport/event.cc',
'transport/event_notifier.cc',
@@ -485,6 +516,7 @@ scylla_core = (['database.cc',
'cql3/keyspace_element_name.cc',
'cql3/lists.cc',
'cql3/sets.cc',
'cql3/tuples.cc',
'cql3/maps.cc',
'cql3/functions/functions.cc',
'cql3/functions/castas_fcts.cc',
@@ -564,21 +596,24 @@ scylla_core = (['database.cc',
'db/consistency_level.cc',
'db/system_keyspace.cc',
'db/system_distributed_keyspace.cc',
'db/size_estimates_virtual_reader.cc',
'db/schema_tables.cc',
'db/cql_type_parser.cc',
'db/legacy_schema_migrator.cc',
'db/commitlog/commitlog.cc',
'db/commitlog/commitlog_replayer.cc',
'db/commitlog/commitlog_entry.cc',
'db/data_listeners.cc',
'db/hints/manager.cc',
'db/hints/resource_manager.cc',
'db/config.cc',
'db/extensions.cc',
'db/heat_load_balance.cc',
'db/large_partition_handler.cc',
'db/large_data_handler.cc',
'db/marshal/type_parser.cc',
'db/batchlog_manager.cc',
'db/view/view.cc',
'db/view/view_update_generator.cc',
'db/view/row_locking.cc',
'index/secondary_index_manager.cc',
'index/secondary_index.cc',
@@ -592,6 +627,7 @@ scylla_core = (['database.cc',
'utils/managed_bytes.cc',
'utils/exceptions.cc',
'utils/config_file.cc',
'utils/gz/crc_combine.cc',
'gms/version_generator.cc',
'gms/versioned_value.cc',
'gms/gossiper.cc',
@@ -650,6 +686,7 @@ scylla_core = (['database.cc',
'init.cc',
'lister.cc',
'repair/repair.cc',
'repair/row_level.cc',
'exceptions/exceptions.cc',
'auth/allow_all_authenticator.cc',
'auth/allow_all_authorizer.cc',
@@ -668,9 +705,11 @@ scylla_core = (['database.cc',
'auth/transitional.cc',
'auth/authentication_options.cc',
'auth/role_or_anonymous.cc',
'auth/sasl_challenge.cc',
'tracing/tracing.cc',
'tracing/trace_keyspace_helper.cc',
'tracing/trace_state.cc',
'tracing/tracing_backend_registry.cc',
'table_helper.cc',
'range_tombstone.cc',
'range_tombstone_list.cc',
@@ -682,6 +721,10 @@ scylla_core = (['database.cc',
'data/cell.cc',
'multishard_writer.cc',
'multishard_mutation_query.cc',
'reader_concurrency_semaphore.cc',
'distributed_loader.cc',
'utils/utf8.cc',
'utils/ascii.cc',
] + [Antlr3Grammar('cql3/Cql.g')] + [Thrift('interface/cassandra.thrift', 'Cassandra')]
)
@@ -744,18 +787,21 @@ idls = ['idl/gossip_digest.idl.hh',
'idl/tracing.idl.hh',
'idl/consistency_level.idl.hh',
'idl/cache_temperature.idl.hh',
'idl/view.idl.hh',
]
scylla_tests_dependencies = scylla_core + idls + [
headers = find_headers('.', excluded_dirs=['idl', 'build', 'seastar', '.git'])
scylla_tests_generic_dependencies = [
'tests/cql_test_env.cc',
'tests/test_services.cc',
]
scylla_tests_dependencies = scylla_core + idls + scylla_tests_generic_dependencies + [
'tests/cql_assertions.cc',
'tests/result_set_assertions.cc',
'tests/mutation_source_test.cc',
]
scylla_tests_seastar_deps = [
'seastar/tests/test-utils.cc',
'seastar/tests/test_runner.cc',
'tests/data_model.cc',
]
deps = {
@@ -763,7 +809,6 @@ deps = {
}
pure_boost_tests = set([
'tests/partitioner_test',
'tests/map_difference_test',
'tests/keys_test',
'tests/compound_test',
@@ -773,6 +818,7 @@ pure_boost_tests = set([
'tests/test-serialization',
'tests/range_test',
'tests/crc_test',
'tests/checksum_utils_test',
'tests/managed_vector_test',
'tests/dynamic_bitset_test',
'tests/idl_test',
@@ -794,6 +840,8 @@ pure_boost_tests = set([
'tests/observable_test',
'tests/json_test',
'tests/auth_passwords_test',
'tests/top_k_test',
'tests/small_vector_test',
])
tests_not_using_seastar_test_framework = set([
@@ -812,6 +860,7 @@ tests_not_using_seastar_test_framework = set([
'tests/memory_footprint',
'tests/gossip',
'tests/perf/perf_sstable',
'tests/small_vector_test',
]) | pure_boost_tests
for t in tests_not_using_seastar_test_framework:
@@ -822,9 +871,8 @@ for t in scylla_tests:
deps[t] = [t + '.cc']
if t not in tests_not_using_seastar_test_framework:
deps[t] += scylla_tests_dependencies
deps[t] += scylla_tests_seastar_deps
else:
deps[t] += scylla_core + idls + ['tests/cql_test_env.cc']
deps[t] += scylla_core + idls + scylla_tests_generic_dependencies
perf_tests_seastar_deps = [
'seastar/tests/perf/perf_tests.cc'
@@ -833,22 +881,32 @@ perf_tests_seastar_deps = [
for t in perf_tests:
deps[t] = [t + '.cc'] + scylla_tests_dependencies + perf_tests_seastar_deps
deps['tests/sstable_test'] += ['tests/sstable_datafile_test.cc', 'tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
deps['tests/sstable_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
deps['tests/sstable_datafile_test'] += ['tests/sstable_utils.cc', 'tests/normalizing_reader.cc']
deps['tests/mutation_reader_test'] += ['tests/sstable_utils.cc']
deps['tests/bytes_ostream_test'] = ['tests/bytes_ostream_test.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/input_stream_test'] = ['tests/input_stream_test.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/UUID_test'] = ['utils/UUID_gen.cc', 'tests/UUID_test.cc', 'utils/uuid.cc', 'utils/managed_bytes.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'hashers.cc']
deps['tests/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'tests/murmur_hash_test.cc']
deps['tests/allocation_strategy_test'] = ['tests/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/log_heap_test'] = ['tests/log_heap_test.cc']
deps['tests/anchorless_list_test'] = ['tests/anchorless_list_test.cc']
deps['tests/perf/perf_fast_forward'] += ['release.cc']
deps['tests/perf/perf_simple_query'] += ['release.cc']
deps['tests/meta_test'] = ['tests/meta_test.cc']
deps['tests/imr_test'] = ['tests/imr_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc']
deps['tests/reusable_buffer_test'] = ['tests/reusable_buffer_test.cc']
deps['tests/utf8_test'] = ['utils/utf8.cc', 'tests/utf8_test.cc']
deps['tests/small_vector_test'] = ['tests/small_vector_test.cc']
deps['tests/multishard_mutation_query_test'] += ['tests/test_table.cc']
deps['tests/vint_serialization_test'] = ['tests/vint_serialization_test.cc', 'vint-serialization.cc', 'bytes.cc']
deps['utils/gz/gen_crc_combine_table'] = ['utils/gz/gen_crc_combine_table.cc']
warnings = [
'-Wall',
'-Werror',
'-Wno-mismatched-tags', # clang-only
'-Wno-maybe-uninitialized', # false positives on gcc 5
'-Wno-tautological-compare',
@@ -862,7 +920,11 @@ warnings = [
'-Wno-misleading-indentation',
'-Wno-overflow',
'-Wno-noexcept-type',
'-Wno-nonnull-compare'
'-Wno-nonnull-compare',
'-Wno-error=cpp',
'-Wno-ignored-attributes',
'-Wno-overloaded-virtual',
'-Wno-stringop-overflow',
]
warnings = [w
@@ -877,11 +939,11 @@ optimization_flags = [
optimization_flags = [o
for o in optimization_flags
if flag_supported(flag=o, compiler=args.cxx)]
modes['release']['opt'] += ' ' + ' '.join(optimization_flags)
modes['release']['cxx_ld_flags'] += ' ' + ' '.join(optimization_flags)
gold_linker_flag = gold_supported(compiler=args.cxx)
dbgflag = debug_flag(args.cxx) if args.debuginfo else ''
dbgflag = '-g' if args.debuginfo else ''
tests_link_rule = 'link' if args.tests_debuginfo else 'link_stripped'
if args.so:
@@ -924,18 +986,22 @@ for pkglist in optional_packages:
compiler_test_src = '''
#if __GNUC__ < 7
#if __GNUC__ < 8
#error "MAJOR"
#elif __GNUC__ == 7
#if __GNUC_MINOR__ < 3
#elif __GNUC__ == 8
#if __GNUC_MINOR__ < 1
#error "MINOR"
#elif __GNUC_MINOR__ == 1
#if __GNUC_PATCHLEVEL__ < 1
#error "PATCHLEVEL"
#endif
#endif
#endif
int main() { return 0; }
'''
if not try_compile_and_link(compiler=args.cxx, source=compiler_test_src):
print('Wrong GCC version. Scylla needs GCC >= 7.3 to compile.')
print('Wrong GCC version. Scylla needs GCC >= 8.1.1 to compile.')
sys.exit(1)
if not try_compile(compiler=args.cxx, source='#include <boost/version.hpp>'):
@@ -951,6 +1017,14 @@ if not try_compile(compiler=args.cxx, source='''\
print('Installed boost version too old. Please update {}.'.format(pkgname("boost-devel")))
sys.exit(1)
if try_compile(args.cxx, source = textwrap.dedent('''\
#include <lz4.h>
void m() {
LZ4_compress_default(static_cast<const char*>(0), static_cast<char*>(0), 0, 0);
}
'''), flags=args.user_cflags.split()):
defines.append("HAVE_LZ4_COMPRESS_DEFAULT")
has_sanitize_address_use_after_scope = try_compile(compiler=args.cxx, flags=['-fsanitize-address-use-after-scope'], source='int f() {}')
@@ -961,7 +1035,8 @@ globals().update(vars(args))
total_memory = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
link_pool_depth = max(int(total_memory / 7e9), 1)
build_modes = modes if args.mode == 'all' else [args.mode]
selected_modes = args.selected_modes or modes.keys()
build_modes = {m: modes[m] for m in selected_modes}
build_artifacts = all_artifacts if not args.artifacts else args.artifacts
status = subprocess.call("./SCYLLA-VERSION-GEN")
@@ -980,26 +1055,31 @@ seastar_flags = []
if args.dpdk:
# fake dependencies on dpdk, so that it is built before anything else
seastar_flags += ['--enable-dpdk']
elif args.dpdk_target:
seastar_flags += ['--dpdk-target', args.dpdk_target]
if args.staticcxx:
seastar_flags += ['--static-stdc++']
if args.staticboost:
seastar_flags += ['--static-boost']
if args.staticyamlcpp:
seastar_flags += ['--static-yaml-cpp']
if args.gcc6_concepts:
seastar_flags += ['--enable-gcc6-concepts']
if args.alloc_failure_injector:
seastar_flags += ['--enable-alloc-failure-injector']
if args.split_dwarf:
seastar_flags += ['--split-dwarf']
# We never compress debug info in debug mode
modes['debug']['cxxflags'] += ' -gz'
# We compress it by default in release mode
flag_dest = 'cxx_ld_flags' if args.compress_exec_debuginfo else 'cxxflags'
modes['release'][flag_dest] += ' -gz'
modes['debug']['cxxflags'] += ' ' + dbgflag
modes['release']['cxxflags'] += ' ' + dbgflag
seastar_cflags = args.user_cflags
seastar_cflags += ' ' + debug_compress_flag(compiler=args.cxx)
seastar_cflags += ' -Wno-error'
if args.target != '':
seastar_cflags += ' -march=' + args.target
seastar_ldflags = args.user_ldflags
seastar_flags += ['--compiler', args.cxx, '--c-compiler', args.cc, '--cflags=%s' % (seastar_cflags), '--ldflags=%s' % (seastar_ldflags),
'--c++-dialect=gnu++1z', '--optflags=%s' % (modes['release']['opt']), ]
'--c++-dialect=gnu++17', '--use-std-optional-variant-stringview=1', '--optflags=%s' % (modes['release']['cxx_ld_flags']), ]
libdeflate_cflags = seastar_cflags
status = subprocess.call([args.python, './configure.py'] + seastar_flags, cwd='seastar')
@@ -1013,25 +1093,26 @@ ninja = find_executable('ninja') or find_executable('ninja-build')
if not ninja:
print('Ninja executable (ninja or ninja-build) not found on PATH\n')
sys.exit(1)
status = subprocess.call([ninja] + list(pc.values()), cwd='seastar')
if status:
print('Failed to generate {}\n'.format(pc))
sys.exit(1)
def query_seastar_flags(seastar_pc_file, link_static_cxx=False):
pc_file = os.path.join('seastar', seastar_pc_file)
cflags = pkg_config(pc_file, '--cflags', '--static')
libs = pkg_config(pc_file, '--libs', '--static')
if link_static_cxx:
libs = libs.replace('-lstdc++ ', '')
return cflags, libs
for mode in build_modes:
cfg = dict([line.strip().split(': ', 1)
for line in open('seastar/' + pc[mode])
if ': ' in line])
if args.staticcxx:
cfg['Libs'] = cfg['Libs'].replace('-lstdc++ ', '')
modes[mode]['seastar_cflags'] = cfg['Cflags']
modes[mode]['seastar_libs'] = cfg['Libs']
seastar_cflags, seastar_libs = query_seastar_flags(pc[mode], link_static_cxx=args.staticcxx)
modes[mode]['seastar_cflags'] = seastar_cflags
modes[mode]['seastar_libs'] = seastar_libs
seastar_deps = 'practically_anything_can_change_so_lets_run_it_every_time_and_restat.'
args.user_cflags += " " + pkg_config("--cflags", "jsoncpp")
libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-llz4', '-lz', '-lsnappy', pkg_config("--libs", "jsoncpp"),
maybe_static(args.staticboost, '-lboost_filesystem'), ' -lcrypt', ' -lcryptopp',
args.user_cflags += " " + pkg_config('jsoncpp', '--cflags')
args.user_cflags += ' -march=' + args.target
libs = ' '.join([maybe_static(args.staticyamlcpp, '-lyaml-cpp'), '-latomic', '-llz4', '-lz', '-lsnappy', pkg_config('jsoncpp', '--libs'),
' -lstdc++fs', ' -lcrypt', ' -lcryptopp', ' -lpthread',
maybe_static(args.staticboost, '-lboost_date_time'), ])
xxhash_dir = 'xxHash'
@@ -1043,10 +1124,10 @@ if not args.staticboost:
args.user_cflags += ' -DBOOST_TEST_DYN_LINK'
for pkg in pkgs:
args.user_cflags += ' ' + pkg_config('--cflags', pkg)
libs += ' ' + pkg_config('--libs', pkg)
user_cflags = args.user_cflags
user_ldflags = args.user_ldflags
args.user_cflags += ' ' + pkg_config(pkg, '--cflags')
libs += ' ' + pkg_config(pkg, '--libs')
user_cflags = args.user_cflags + ' -fvisibility=hidden'
user_ldflags = args.user_ldflags + ' -fvisibility=hidden'
if args.staticcxx:
user_ldflags += " -static-libgcc -static-libstdc++"
if args.staticthrift:
@@ -1067,11 +1148,6 @@ if args.antlr3_exec:
else:
antlr3_exec = "antlr3"
if args.ragel_exec:
ragel_exec = args.ragel_exec
else:
ragel_exec = "ragel"
with open(buildfile, 'w') as f:
f.write(textwrap.dedent('''\
configure_args = {configure_args}
@@ -1084,14 +1160,11 @@ with open(buildfile, 'w') as f:
depth = {link_pool_depth}
pool seastar_pool
depth = 1
rule ragel
command = {ragel_exec} -G2 -o $out $in
description = RAGEL $out
rule gen
command = echo -e $text > $out
description = GEN $out
rule swagger
command = seastar/json/json2code.py -f $in -o $out
command = seastar/scripts/seastar-json2code.py -f $in -o $out
description = SWAGGER $out
rule serializer
command = {python} ./idl-compiler.py --ns ser -f $in -o $out
@@ -1100,6 +1173,9 @@ with open(buildfile, 'w') as f:
command = {ninja} -C $subdir $target
restat = 1
description = NINJA $out
rule run
command = $in > $out
description = GEN $out
rule copy
command = cp $in $out
description = COPY $out
@@ -1108,18 +1184,23 @@ with open(buildfile, 'w') as f:
''').format(**globals()))
for mode in build_modes:
modeval = modes[mode]
fmt_lib = 'fmt'
f.write(textwrap.dedent('''\
cxxflags_{mode} = {opt} -DXXH_PRIVATE_API -I. -I $builddir/{mode}/gen -I seastar -I seastar/build/{mode}/gen
cxx_ld_flags_{mode} = {cxx_ld_flags}
ld_flags_{mode} = $cxx_ld_flags_{mode}
cxxflags_{mode} = $cxx_ld_flags_{mode} {cxxflags} -I. -I $builddir/{mode}/gen
libs_{mode} = -l{fmt_lib}
seastar_libs_{mode} = {seastar_libs}
rule cxx.{mode}
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags -c -o $out $in
description = CXX $out
depfile = $out.d
rule link.{mode}
command = $cxx $cxxflags_{mode} {sanitize_libs} $ldflags {seastar_libs} -o $out $in $libs $libs_{mode}
command = $cxx $ld_flags_{mode} $ldflags -o $out $in $libs $libs_{mode}
description = LINK $out
pool = link_pool
rule link_stripped.{mode}
command = $cxx $cxxflags_{mode} -s {sanitize_libs} $ldflags {seastar_libs} -o $out $in $libs $libs_{mode}
command = $cxx $ld_flags_{mode} -s $ldflags -o $out $in $libs $libs_{mode}
description = LINK (stripped) $out
pool = link_pool
rule ar.{mode}
@@ -1141,7 +1222,11 @@ with open(buildfile, 'w') as f:
s/exceptions::syntax_exception e/exceptions::syntax_exception\& e/' $
build/{mode}/gen/${{stem}}Parser.cpp
description = ANTLR3 $in
''').format(mode=mode, antlr3_exec=antlr3_exec, **modeval))
rule checkhh.{mode}
command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags -x c++ --include=$in -c -o $out /dev/null
description = CHECKHH $in
depfile = $out.d
''').format(mode=mode, antlr3_exec=antlr3_exec, fmt_lib=fmt_lib, **modeval))
f.write(
'build {mode}: phony {artifacts}\n'.format(
mode=mode,
@@ -1149,11 +1234,11 @@ with open(buildfile, 'w') as f:
)
)
compiles = {}
ragels = {}
swaggers = {}
serializers = {}
thrifts = set()
antlr3_grammars = set()
seastar_dep = 'seastar/build/{}/libseastar.a'.format(mode)
for binary in build_artifacts:
if binary in other:
continue
@@ -1172,10 +1257,17 @@ with open(buildfile, 'w') as f:
if binary.endswith('.a'):
f.write('build $builddir/{}/{}: ar.{} {}\n'.format(mode, binary, mode, str.join(' ', objs)))
else:
objs.extend(['$builddir/' + mode + '/' + artifact for artifact in [
'libdeflate/libdeflate.a'
]])
objs.append('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.o')
if binary.startswith('tests/'):
local_libs = '$libs'
if binary not in tests_not_using_seastar_test_framework or binary in pure_boost_tests:
local_libs = '$seastar_libs_{} $libs'.format(mode)
if binary in pure_boost_tests:
local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
if binary not in tests_not_using_seastar_test_framework:
pc_path = os.path.join('seastar', pc[mode].replace('seastar.pc', 'seastar-testing.pc'))
local_libs += ' ' + pkg_config(pc_path, '--libs', '--static')
if has_thrift:
local_libs += ' ' + thrift_libs + ' ' + maybe_static(args.staticboost, '-lboost_system')
# Our code's debugging information is huge, and multiplied
@@ -1183,24 +1275,18 @@ with open(buildfile, 'w') as f:
# So we strip the tests by default; The user can very
# quickly re-link the test unstripped by adding a "_g"
# to the test name, e.g., "ninja build/release/testname_g"
f.write('build $builddir/{}/{}: {}.{} {} {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
f.write('build $builddir/{}/{}: {}.{} {} | {}\n'.format(mode, binary, tests_link_rule, mode, str.join(' ', objs), seastar_dep))
f.write(' libs = {}\n'.format(local_libs))
f.write('build $builddir/{}/{}_g: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
f.write('build $builddir/{}/{}_g: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
f.write(' libs = {}\n'.format(local_libs))
else:
f.write('build $builddir/{}/{}: link.{} {} {}\n'.format(mode, binary, mode, str.join(' ', objs),
'seastar/build/{}/libseastar.a'.format(mode)))
f.write('build $builddir/{}/{}: link.{} {} | {}\n'.format(mode, binary, mode, str.join(' ', objs), seastar_dep))
if has_thrift:
f.write(' libs = {} {} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system')))
f.write(' libs = {} {} $seastar_libs_{} $libs\n'.format(thrift_libs, maybe_static(args.staticboost, '-lboost_system'), mode))
for src in srcs:
if src.endswith('.cc'):
obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
compiles[obj] = src
elif src.endswith('.rl'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.rl', '.hh')
ragels[hh] = src
elif src.endswith('.idl.hh'):
hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
serializers[hh] = src
@@ -1213,26 +1299,37 @@ with open(buildfile, 'w') as f:
antlr3_grammars.add(src)
else:
raise Exception('No rule for ' + src)
compiles['$builddir/' + mode + '/gen/utils/gz/crc_combine_table.o'] = '$builddir/' + mode + '/gen/utils/gz/crc_combine_table.cc'
compiles['$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'] = 'utils/gz/gen_crc_combine_table.cc'
f.write('build {}: run {}\n'.format('$builddir/' + mode + '/gen/utils/gz/crc_combine_table.cc',
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table'))
f.write('build {}: link.{} {}\n'.format('$builddir/' + mode + '/utils/gz/gen_crc_combine_table', mode,
'$builddir/' + mode + '/utils/gz/gen_crc_combine_table.o'))
f.write(
'build {mode}-objects: phony {objs}\n'.format(
mode=mode,
objs=' '.join(compiles)
)
)
gen_headers = []
for th in thrifts:
gen_headers += th.headers('$builddir/{}/gen'.format(mode))
for g in antlr3_grammars:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
gen_headers_dep = ' '.join(gen_headers)
for obj in compiles:
src = compiles[obj]
gen_headers = list(ragels.keys())
gen_headers += ['seastar/build/{}/gen/http/request_parser.hh'.format(mode)]
gen_headers += ['seastar/build/{}/gen/http/http_response_parser.hh'.format(mode)]
for th in thrifts:
gen_headers += th.headers('$builddir/{}/gen'.format(mode))
for g in antlr3_grammars:
gen_headers += g.headers('$builddir/{}/gen'.format(mode))
gen_headers += list(swaggers.keys())
gen_headers += list(serializers.keys())
f.write('build {}: cxx.{} {} || {} \n'.format(obj, mode, src, ' '.join(gen_headers)))
f.write('build {}: cxx.{} {} || {} {}\n'.format(obj, mode, src, seastar_dep, gen_headers_dep))
if src in extra_cxxflags:
f.write(' cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode=mode, extra_cxxflags=extra_cxxflags[src], **modeval))
for hh in ragels:
src = ragels[hh]
f.write('build {}: ragel {}\n'.format(hh, src))
for hh in swaggers:
src = swaggers[hh]
f.write('build {}: swagger {} | seastar/json/json2code.py\n'.format(hh, src))
f.write('build {}: swagger {} | seastar/scripts/seastar-json2code.py\n'.format(hh, src))
for hh in serializers:
src = serializers[hh]
f.write('build {}: serializer {} | idl-compiler.py\n'.format(hh, src))
@@ -1252,17 +1349,27 @@ with open(buildfile, 'w') as f:
if cc.endswith('Parser.cpp') and has_sanitize_address_use_after_scope:
# Parsers end up using huge amounts of stack space and overflowing their stack
f.write(' obj_cxxflags = -fno-sanitize-address-use-after-scope\n')
f.write('build seastar/build/{mode}/libseastar.a seastar/build/{mode}/apps/iotune/iotune seastar/build/{mode}/gen/http/request_parser.hh seastar/build/{mode}/gen/http/http_response_parser.hh: ninja {seastar_deps}\n'
for hh in headers:
f.write('build $builddir/{mode}/{hh}.o: checkhh.{mode} {hh} || {gen_headers_dep}\n'.format(
mode=mode, hh=hh, gen_headers_dep=gen_headers_dep))
f.write('build seastar/build/{mode}/libseastar.a seastar/build/{mode}/apps/iotune/iotune: ninja | always\n'
.format(**locals()))
f.write(' pool = seastar_pool\n')
f.write(' subdir = seastar\n')
f.write(' target = build/{mode}/libseastar.a build/{mode}/apps/iotune/iotune build/{mode}/gen/http/request_parser.hh build/{mode}/gen/http/http_response_parser.hh\n'.format(**locals()))
f.write(' subdir = seastar/build/{mode}\n'.format(**locals()))
f.write(' target = seastar seastar_testing app_iotune\n'.format(**locals()))
f.write(textwrap.dedent('''\
build build/{mode}/iotune: copy seastar/build/{mode}/apps/iotune/iotune
''').format(**locals()))
f.write('build build/$mode/scylla-package.tar: package build/{mode}/scylla build/{mode}/iotune\n'.format(**locals()))
f.write('build build/{mode}/scylla-package.tar.gz: package build/{mode}/scylla build/{mode}/iotune build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build {}: phony\n'.format(seastar_deps))
f.write('rule libdeflate.{mode}\n'.format(**locals()))
f.write(' command = make -C libdeflate BUILD_DIR=../build/{mode}/libdeflate/ CFLAGS="{libdeflate_cflags}" CC={args.cc} ../build/{mode}/libdeflate//libdeflate.a\n'.format(**locals()))
f.write('build build/{mode}/libdeflate/libdeflate.a: libdeflate.{mode}\n'.format(**locals()))
mode = 'dev' if 'dev' in modes else modes[0]
f.write('build checkheaders: phony || {}\n'.format(' '.join(['$builddir/{}/{}.o'.format(mode, hh) for hh in headers])))
f.write(textwrap.dedent('''\
rule configure
command = {python} configure.py $configure_args
@@ -1278,3 +1385,9 @@ with open(buildfile, 'w') as f:
build clean: clean
default {modes_list}
''').format(modes_list=' '.join(build_modes), **globals()))
f.write(textwrap.dedent('''\
build always: phony
rule scylla_version_gen
command = ./SCYLLA-VERSION-GEN
build build/SCYLLA-RELEASE-FILE build/SCYLLA-VERSION-FILE: scylla_version_gen
''').format(modes_list=' '.join(build_modes), **globals()))


@@ -38,44 +38,44 @@ private:
static bool is_compatible(const column_definition& new_def, const data_type& old_type, column_kind kind) {
return ::is_compatible(new_def.kind, kind) && new_def.type->is_value_compatible_with(*old_type);
}
static atomic_cell upgrade_cell(const abstract_type& new_type, const abstract_type& old_type, atomic_cell_view cell,
atomic_cell::collection_member cm = atomic_cell::collection_member::no) {
if (cell.is_live() && !old_type.is_counter()) {
if (cell.is_live_and_has_ttl()) {
return atomic_cell::make_live(new_type, cell.timestamp(), cell.value().linearize(), cell.expiry(), cell.ttl(), cm);
}
return atomic_cell::make_live(new_type, cell.timestamp(), cell.value().linearize(), cm);
} else {
return atomic_cell(new_type, cell);
}
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, atomic_cell_view cell) {
if (!is_compatible(new_def, old_type, kind) || cell.timestamp() <= new_def.dropped_at()) {
return;
}
auto new_cell = [&] {
if (cell.is_live() && !old_type->is_counter()) {
if (cell.is_live_and_has_ttl()) {
return atomic_cell_or_collection(
atomic_cell::make_live(*new_def.type, cell.timestamp(), cell.value().linearize(), cell.expiry(), cell.ttl())
);
}
return atomic_cell_or_collection(
atomic_cell::make_live(*new_def.type, cell.timestamp(), cell.value().linearize())
);
} else {
return atomic_cell_or_collection(*new_def.type, cell);
}
}();
dst.apply(new_def, std::move(new_cell));
dst.apply(new_def, upgrade_cell(*new_def.type, *old_type, cell));
}
static void accept_cell(row& dst, column_kind kind, const column_definition& new_def, const data_type& old_type, collection_mutation_view cell) {
if (!is_compatible(new_def, old_type, kind)) {
return;
}
cell.data.with_linearized([&] (bytes_view cell_bv) {
auto&& ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = ctype->deserialize_mutation_form(cell_bv);
auto new_ctype = static_pointer_cast<const collection_type_impl>(new_def.type);
auto old_ctype = static_pointer_cast<const collection_type_impl>(old_type);
auto old_view = old_ctype->deserialize_mutation_form(cell_bv);
collection_type_impl::mutation_view new_view;
collection_type_impl::mutation new_view;
if (old_view.tomb.timestamp > new_def.dropped_at()) {
new_view.tomb = old_view.tomb;
}
for (auto& c : old_view.cells) {
if (c.second.timestamp() > new_def.dropped_at()) {
new_view.cells.emplace_back(std::move(c));
new_view.cells.emplace_back(c.first, upgrade_cell(*new_ctype->value_comparator(), *old_ctype->value_comparator(), c.second, atomic_cell::collection_member::yes));
}
}
dst.apply(new_def, ctype->serialize_mutation_form(std::move(new_view)));
if (new_view.tomb || !new_view.cells.empty()) {
dst.apply(new_def, new_ctype->serialize_mutation_form(std::move(new_view)));
}
});
}
public:


@@ -171,7 +171,7 @@ void counter_cell_view::apply(const column_definition& cdef, atomic_cell_or_coll
});
}
stdx::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, atomic_cell_view b)
std::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, atomic_cell_view b)
{
assert(!a.is_counter_update());
assert(!b.is_counter_update());
@@ -204,7 +204,7 @@ stdx::optional<atomic_cell> counter_cell_view::difference(atomic_cell_view a, at
++a_it;
}
stdx::optional<atomic_cell> diff;
std::optional<atomic_cell> diff;
if (!result.empty()) {
diff = result.build(std::max(a.timestamp(), b.timestamp()));
} else if (a.timestamp() > b.timestamp()) {


@@ -26,8 +26,6 @@
#include "atomic_cell_or_collection.hh"
#include "types.hh"
#include "stdx.hh"
class mutation;
class mutation;
@@ -385,7 +383,7 @@ public:
});
}
stdx::optional<counter_shard_view> get_shard(const counter_id& id) const {
std::optional<counter_shard_view> get_shard(const counter_id& id) const {
auto it = boost::range::find_if(shards(), [&id] (counter_shard_view csv) {
return csv.id() == id;
});
@@ -395,7 +393,7 @@ public:
return *it;
}
stdx::optional<counter_shard_view> local_shard() const {
std::optional<counter_shard_view> local_shard() const {
// TODO: consider caching local shard position
return get_shard(counter_id::local());
}
@@ -424,7 +422,7 @@ struct counter_cell_view : basic_counter_cell_view<mutable_view::no> {
// Computes a counter cell containing minimal amount of data which, when
// applied to 'b' returns the same cell as 'a' and 'b' applied together.
static stdx::optional<atomic_cell> difference(atomic_cell_view a, atomic_cell_view b);
static std::optional<atomic_cell> difference(atomic_cell_view a, atomic_cell_view b);
friend std::ostream& operator<<(std::ostream& os, counter_cell_view ccv);
};


@@ -91,7 +91,7 @@ options {
#include "cql3/ut_name.hh"
#include "cql3/functions/function_name.hh"
#include "cql3/functions/function_call.hh"
#include "core/sstring.hh"
#include <seastar/core/sstring.hh>
#include "CqlLexer.hpp"
#include <algorithm>
@@ -115,7 +115,7 @@ using operations_type = std::vector<std::pair<::shared_ptr<cql3::column_identifi
// problem. It is up to the user to ensure it is actually assigned to.
template <typename T>
struct uninitialized {
std::experimental::optional<T> _val;
std::optional<T> _val;
uninitialized() = default;
uninitialized(const uninitialized&) = default;
uninitialized(uninitialized&&) = default;
@@ -242,7 +242,7 @@ struct uninitialized {
return res;
}
bool convert_boolean_literal(stdx::string_view s) {
bool convert_boolean_literal(std::string_view s) {
std::string lower_s(s.size(), '\0');
std::transform(s.cbegin(), s.cend(), lower_s.begin(), &::tolower);
return lower_s == "true";
@@ -253,8 +253,7 @@ struct uninitialized {
{
for (auto&& p : operations) {
if (*p.first == *key && !p.second->is_compatible_with(update)) {
// \%s is escaped for antlr
add_recognition_error(sprint("Multiple incompatible setting of column \%s", *key));
add_recognition_error(format("Multiple incompatible setting of column {}", *key));
}
}
operations.emplace_back(std::move(key), std::move(update));
@@ -382,9 +381,11 @@ selectStatement returns [shared_ptr<raw::select_statement> expr]
@init {
bool is_distinct = false;
::shared_ptr<cql3::term::raw> limit;
::shared_ptr<cql3::term::raw> per_partition_limit;
raw::select_statement::parameters::orderings_type orderings;
bool allow_filtering = false;
bool is_json = false;
bool bypass_cache = false;
}
: K_SELECT (
( K_JSON { is_json = true; } )?
@@ -394,12 +395,14 @@ selectStatement returns [shared_ptr<raw::select_statement> expr]
K_FROM cf=columnFamilyName
( K_WHERE wclause=whereClause )?
( K_ORDER K_BY orderByClause[orderings] ( ',' orderByClause[orderings] )* )?
( K_PER K_PARTITION K_LIMIT rows=intValue { per_partition_limit = rows; } )?
( K_LIMIT rows=intValue { limit = rows; } )?
( K_ALLOW K_FILTERING { allow_filtering = true; } )?
( K_BYPASS K_CACHE { bypass_cache = true; })?
{
auto params = ::make_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, is_json);
auto params = ::make_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, is_json, bypass_cache);
$expr = ::make_shared<raw::select_statement>(std::move(cf), std::move(params),
std::move(sclause), std::move(wclause), std::move(limit));
std::move(sclause), std::move(wclause), std::move(limit), std::move(per_partition_limit));
}
;
@@ -470,6 +473,7 @@ insertStatement returns [::shared_ptr<raw::modification_statement> expr]
std::vector<::shared_ptr<cql3::column_identifier::raw>> column_names;
std::vector<::shared_ptr<cql3::term::raw>> values;
bool if_not_exists = false;
bool default_unset = false;
::shared_ptr<cql3::term::raw> json_value;
}
: K_INSERT K_INTO cf=columnFamilyName
@@ -487,13 +491,15 @@ insertStatement returns [::shared_ptr<raw::modification_statement> expr]
}
| K_JSON
json_token=jsonValue { json_value = $json_token.value; }
( K_DEFAULT K_UNSET { default_unset = true; } | K_DEFAULT K_NULL )?
( K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
( usingClause[attrs] )?
{
$expr = ::make_shared<raw::insert_json_statement>(std::move(cf),
std::move(attrs),
std::move(json_value),
if_not_exists);
if_not_exists,
default_unset);
}
)
;
@@ -817,10 +823,15 @@ createIndexStatement returns [::shared_ptr<create_index_statement> expr]
;
indexIdent returns [::shared_ptr<index_target::raw> id]
@init {
std::vector<::shared_ptr<cql3::column_identifier::raw>> columns;
}
: c=cident { $id = index_target::raw::values_of(c); }
| K_KEYS '(' c=cident ')' { $id = index_target::raw::keys_of(c); }
| K_ENTRIES '(' c=cident ')' { $id = index_target::raw::keys_and_values_of(c); }
| K_FULL '(' c=cident ')' { $id = index_target::raw::full_collection(c); }
| '(' c1=cident { columns.push_back(c1); } ( ',' cn=cident { columns.push_back(cn); } )* ')' { $id = index_target::raw::columns(std::move(columns)); }
;
/**
@@ -894,8 +905,8 @@ alterKeyspaceStatement returns [shared_ptr<cql3::statements::alter_keyspace_stat
/**
* ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;
* ALTER COLUMN FAMILY <CF> ADD <column> <newtype>;
* ALTER COLUMN FAMILY <CF> DROP <column>;
* ALTER COLUMN FAMILY <CF> ADD <column> <newtype>; | ALTER COLUMN FAMILY <CF> ADD (<column> <newtype>,<column1> <newtype1>..... <column n> <newtype n>)
* ALTER COLUMN FAMILY <CF> DROP <column>; | ALTER COLUMN FAMILY <CF> DROP ( <column>,<column1>.....<column n>)
* ALTER COLUMN FAMILY <CF> WITH <property> = <value>;
* ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>;
*/
@@ -903,21 +914,38 @@ alterTableStatement returns [shared_ptr<alter_table_statement> expr]
@init {
alter_table_statement::type type;
auto props = make_shared<cql3::statements::cf_prop_defs>();
std::vector<alter_table_statement::column_change> column_changes;
std::vector<std::pair<shared_ptr<cql3::column_identifier::raw>, shared_ptr<cql3::column_identifier::raw>>> renames;
bool is_static = false;
}
: K_ALTER K_COLUMNFAMILY cf=columnFamilyName
( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; }
| K_ADD id=cident v=comparatorType ({ is_static=true; } K_STATIC)? { type = alter_table_statement::type::add; }
| K_DROP id=cident { type = alter_table_statement::type::drop; }
( K_ALTER id=cident K_TYPE v=comparatorType { type = alter_table_statement::type::alter; column_changes.emplace_back(alter_table_statement::column_change{id, v}); }
| K_ADD { type = alter_table_statement::type::add; }
( id=cident v=comparatorType s=cfisStatic { column_changes.emplace_back(alter_table_statement::column_change{id, v, s}); }
| '(' id1=cident v1=comparatorType s1=cfisStatic { column_changes.emplace_back(alter_table_statement::column_change{id1, v1, s1}); }
(',' idn=cident vn=comparatorType sn=cfisStatic { column_changes.emplace_back(alter_table_statement::column_change{idn, vn, sn}); } )* ')'
)
| K_DROP { type = alter_table_statement::type::drop; }
( id=cident { column_changes.emplace_back(alter_table_statement::column_change{id}); }
| '(' id1=cident { column_changes.emplace_back(alter_table_statement::column_change{id1}); }
(',' idn=cident { column_changes.emplace_back(alter_table_statement::column_change{idn}); } )* ')'
)
| K_WITH properties[props] { type = alter_table_statement::type::opts; }
| K_RENAME { type = alter_table_statement::type::rename; }
id1=cident K_TO toId1=cident { renames.emplace_back(id1, toId1); }
( K_AND idn=cident K_TO toIdn=cident { renames.emplace_back(idn, toIdn); } )*
)
{
$expr = ::make_shared<alter_table_statement>(std::move(cf), type, std::move(id),
std::move(v), std::move(props), std::move(renames), is_static);
$expr = ::make_shared<alter_table_statement>(std::move(cf), type, std::move(column_changes), std::move(props), std::move(renames));
}
;
cfisStatic returns [bool isStaticColumn]
@init{
bool isStatic = false;
}
: (K_STATIC { isStatic=true; })?
{
$isStaticColumn = isStatic;
}
;
@@ -1568,12 +1596,12 @@ comparator_type [bool internal] returns [shared_ptr<cql3_type::raw> t]
#endif
;
native_or_internal_type [bool internal] returns [shared_ptr<cql3_type> t]
native_or_internal_type [bool internal] returns [data_type t]
: n=native_type { $t = n; }
// The "internal" types, only supported when internal==true:
| K_EMPTY {
if (internal) {
$t = cql3_type::empty;
$t = empty_type;
} else {
add_recognition_error("Invalid (reserved) user type name empty");
}
@@ -1584,28 +1612,28 @@ comparatorType returns [shared_ptr<cql3_type::raw> t]
: tt=comparator_type[false] { $t = tt; }
;
native_type returns [shared_ptr<cql3_type> t]
: K_ASCII { $t = cql3_type::ascii; }
| K_BIGINT { $t = cql3_type::bigint; }
| K_BLOB { $t = cql3_type::blob; }
| K_BOOLEAN { $t = cql3_type::boolean; }
| K_COUNTER { $t = cql3_type::counter; }
| K_DECIMAL { $t = cql3_type::decimal; }
| K_DOUBLE { $t = cql3_type::double_; }
| K_DURATION { $t = cql3_type::duration; }
| K_FLOAT { $t = cql3_type::float_; }
| K_INET { $t = cql3_type::inet; }
| K_INT { $t = cql3_type::int_; }
| K_SMALLINT { $t = cql3_type::smallint; }
| K_TEXT { $t = cql3_type::text; }
| K_TIMESTAMP { $t = cql3_type::timestamp; }
| K_TINYINT { $t = cql3_type::tinyint; }
| K_UUID { $t = cql3_type::uuid; }
| K_VARCHAR { $t = cql3_type::varchar; }
| K_VARINT { $t = cql3_type::varint; }
| K_TIMEUUID { $t = cql3_type::timeuuid; }
| K_DATE { $t = cql3_type::date; }
| K_TIME { $t = cql3_type::time; }
native_type returns [data_type t]
: K_ASCII { $t = ascii_type; }
| K_BIGINT { $t = long_type; }
| K_BLOB { $t = bytes_type; }
| K_BOOLEAN { $t = boolean_type; }
| K_COUNTER { $t = counter_type; }
| K_DECIMAL { $t = decimal_type; }
| K_DOUBLE { $t = double_type; }
| K_DURATION { $t = duration_type; }
| K_FLOAT { $t = float_type; }
| K_INET { $t = inet_addr_type; }
| K_INT { $t = int32_type; }
| K_SMALLINT { $t = short_type; }
| K_TEXT { $t = utf8_type; }
| K_TIMESTAMP { $t = timestamp_type; }
| K_TINYINT { $t = byte_type; }
| K_UUID { $t = uuid_type; }
| K_VARCHAR { $t = utf8_type; }
| K_VARINT { $t = varint_type; }
| K_TIMEUUID { $t = timeuuid_type; }
| K_DATE { $t = simple_date_type; }
| K_TIME { $t = time_type; }
;
collection_type [bool internal] returns [shared_ptr<cql3::cql3_type::raw> pt]
@@ -1651,7 +1679,7 @@ unreserved_keyword returns [sstring str]
unreserved_function_keyword returns [sstring str]
: u=basic_unreserved_keyword { $str = u; }
| t=native_or_internal_type[true] { $str = t->to_string(); }
| t=native_or_internal_type[true] { $str = t->as_cql3_type().to_string(); }
;
basic_unreserved_keyword returns [sstring str]
@@ -1698,6 +1726,10 @@ basic_unreserved_keyword returns [sstring str]
| K_NON
| K_DETERMINISTIC
| K_JSON
| K_CACHE
| K_BYPASS
| K_PER
| K_PARTITION
) { $str = $k.text; }
;
@@ -1835,9 +1867,17 @@ K_OR: O R;
K_REPLACE: R E P L A C E;
K_DETERMINISTIC: D E T E R M I N I S T I C;
K_JSON: J S O N;
K_DEFAULT: D E F A U L T;
K_UNSET: U N S E T;
K_EMPTY: E M P T Y;
K_BYPASS: B Y P A S S;
K_CACHE: C A C H E;
K_PER: P E R;
K_PARTITION: P A R T I T I O N;
K_SCYLLA_TIMEUUID_LIST_INDEX: S C Y L L A '_' T I M E U U I D '_' L I S T '_' I N D E X;
K_SCYLLA_COUNTER_SHARD_LIST: S C Y L L A '_' C O U N T E R '_' S H A R D '_' L I S T;


@@ -45,6 +45,7 @@
#include "cql3/lists.hh"
#include "cql3/maps.hh"
#include "cql3/sets.hh"
#include "types/list.hh"
namespace cql3 {


@@ -77,9 +77,9 @@ int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
if (tval.is_unset_value()) {
return now;
}
return with_linearized(*tval, [] (bytes_view val) {
return with_linearized(*tval, [&] (bytes_view val) {
try {
data_type_for<int64_t>()->validate(val);
data_type_for<int64_t>()->validate(val, options.get_cql_serialization_format());
} catch (marshal_exception& e) {
throw exceptions::invalid_request_exception("Invalid timestamp value");
}
@@ -98,9 +98,9 @@ int32_t attributes::get_time_to_live(const query_options& options) {
if (tval.is_unset_value()) {
return 0;
}
auto ttl = with_linearized(*tval, [] (bytes_view val) {
auto ttl = with_linearized(*tval, [&] (bytes_view val) {
try {
data_type_for<int32_t>()->validate(val);
data_type_for<int32_t>()->validate(val, options.get_cql_serialization_format());
}
catch (marshal_exception& e) {
throw exceptions::invalid_request_exception("Invalid TTL value");


@@ -43,7 +43,7 @@
#include "exceptions/exceptions.hh"
#include "cql3/term.hh"
#include <experimental/optional>
#include <optional>
namespace cql3 {
/**


@@ -57,7 +57,7 @@ void validate_operation_on_durations(const abstract_type& type, const cql3::oper
check_false(type.is_user_type(), "Slice conditions are not supported on UDTs containing durations");
// We're a duration.
throw exceptions::invalid_request_exception(sprint("Slice conditions are not supported on durations"));
throw exceptions::invalid_request_exception(format("Slice conditions are not supported on durations"));
}
}
@@ -119,7 +119,7 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
}
if (!receiver.type->is_collection()) {
throw exceptions::invalid_request_exception(sprint("Invalid element access syntax for non-collection column %s", receiver.name_as_text()));
throw exceptions::invalid_request_exception(format("Invalid element access syntax for non-collection column {}", receiver.name_as_text()));
}
shared_ptr<column_specification> element_spec, value_spec;
@@ -131,7 +131,7 @@ column_condition::raw::prepare(database& db, const sstring& keyspace, const colu
element_spec = maps::key_spec_of(*receiver.column_specification);
value_spec = maps::value_spec_of(*receiver.column_specification);
} else if (&ctype->_kind == &collection_type_impl::kind::set) {
throw exceptions::invalid_request_exception(sprint("Invalid element access syntax for set column %s", receiver.name()));
throw exceptions::invalid_request_exception(format("Invalid element access syntax for set column {}", receiver.name()));
} else {
abort();
}


@@ -125,12 +125,12 @@ std::ostream& operator<<(std::ostream& out, const column_identifier::raw& id) {
column_identifier::new_selector_factory(database& db, schema_ptr schema, std::vector<const column_definition*>& defs) {
auto def = get_column_definition(schema, *this);
if (!def) {
throw exceptions::invalid_request_exception(sprint("Undefined name %s in selection clause", _text));
throw exceptions::invalid_request_exception(format("Undefined name {} in selection clause", _text));
}
// Do not allow explicitly selecting hidden columns. We also skip them on
// "SELECT *" (see selection::wildcard()).
if (def->is_view_virtual()) {
throw exceptions::invalid_request_exception(sprint("Undefined name %s in selection clause", _text));
if (def->is_hidden_from_cql()) {
throw exceptions::invalid_request_exception(format("Undefined name {} in selection clause", _text));
}
return selection::simple_selector::new_factory(def->name_as_text(), add_and_get_index(*def, defs), def->type);
}


@@ -85,20 +85,19 @@ assignment_testable::test_result
constants::literal::test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver)
{
auto receiver_type = receiver->type->as_cql3_type();
if (receiver_type->is_collection()) {
if (receiver_type.is_collection()) {
return test_result::NOT_ASSIGNABLE;
}
if (!receiver_type->is_native()) {
if (!receiver_type.is_native()) {
return test_result::WEAKLY_ASSIGNABLE;
}
auto kind = receiver_type.get()->get_kind();
auto kind = receiver_type.get_kind();
switch (_type) {
case type::STRING:
if (cql3_type::kind_enum_set::frozen<
cql3_type::kind::ASCII,
cql3_type::kind::TEXT,
cql3_type::kind::INET,
cql3_type::kind::VARCHAR,
cql3_type::kind::TIMESTAMP,
cql3_type::kind::DATE,
cql3_type::kind::TIME>::contains(kind)) {
@@ -159,8 +158,8 @@ constants::literal::test_assignment(database& db, const sstring& keyspace, ::sha
constants::literal::prepare(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver)
{
if (!is_assignable(test_assignment(db, keyspace, receiver))) {
throw exceptions::invalid_request_exception(sprint("Invalid %s constant (%s) for \"%s\" of type %s",
_type, _text, *receiver->name, receiver->type->as_cql3_type()->to_string()));
throw exceptions::invalid_request_exception(format("Invalid {} constant ({}) for \"{}\" of type {}",
_type, _text, *receiver->name, receiver->type->as_cql3_type().to_string()));
}
return ::make_shared<value>(cql3::raw_value::make_value(parsed_value(receiver->type)));
}


@@ -46,7 +46,7 @@
#include "cql3/operation.hh"
#include "cql3/values.hh"
#include "cql3/term.hh"
#include "core/shared_ptr.hh"
#include <seastar/core/shared_ptr.hh>
namespace cql3 {
@@ -164,7 +164,7 @@ public:
virtual assignment_testable::test_result test_assignment(database& db, const sstring& keyspace, ::shared_ptr<column_specification> receiver);
virtual sstring to_string() const override {
return _type == type::STRING ? sstring(sprint("'%s'", _text)) : _text;
return _type == type::STRING ? sstring(format("'{}'", _text)) : _text;
}
};
@@ -180,7 +180,7 @@ public:
try {
auto value = options.get_value_at(_bind_index);
if (value) {
_receiver->type->validate(*value);
_receiver->type->validate(*value, options.get_cql_serialization_format());
}
return value;
} catch (const marshal_exception& e) {
@@ -246,7 +246,7 @@ public:
return value_cast<int64_t>(long_type->deserialize_value(value_view));
});
if (increment == std::numeric_limits<int64_t>::min()) {
throw exceptions::invalid_request_exception(sprint("The negation of %d overflows supported counter precision (signed 8 bytes integer)", increment));
throw exceptions::invalid_request_exception(format("The negation of {:d} overflows supported counter precision (signed 8 bytes integer)", increment));
}
m.set_cell(prefix, column, make_counter_update_cell(-increment, params));
}


@@ -26,20 +26,15 @@
#include "cql3_type.hh"
#include "cql3/util.hh"
#include "ut_name.hh"
#include "database.hh"
#include "user_types_metadata.hh"
#include "types/map.hh"
#include "types/set.hh"
#include "types/list.hh"
namespace cql3 {
sstring cql3_type::to_string() const {
if (_type->is_user_type()) {
return "frozen<" + util::maybe_quote(_name) + ">";
}
if (_type->is_tuple()) {
return "frozen<" + _name + ">";
}
return _name;
}
shared_ptr<cql3_type> cql3_type::raw::prepare(database& db, const sstring& keyspace) {
cql3_type cql3_type::raw::prepare(database& db, const sstring& keyspace) {
try {
auto&& ks = db.find_keyspace(keyspace);
return prepare_internal(keyspace, ks.metadata()->user_types());
@@ -58,16 +53,16 @@ bool cql3_type::raw::references_user_type(const sstring& name) const {
class cql3_type::raw_type : public raw {
private:
shared_ptr<cql3_type> _type;
cql3_type _type;
public:
raw_type(shared_ptr<cql3_type> type)
raw_type(cql3_type type)
: _type{type}
{ }
public:
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace) {
virtual cql3_type prepare(database& db, const sstring& keyspace) {
return _type;
}
shared_ptr<cql3_type> prepare_internal(const sstring&, lw_shared_ptr<user_types_metadata>) override {
cql3_type prepare_internal(const sstring&, lw_shared_ptr<user_types_metadata>) override {
return _type;
}
@@ -76,15 +71,15 @@ public:
}
virtual bool is_counter() const {
return _type->is_counter();
return _type.is_counter();
}
virtual sstring to_string() const {
return _type->to_string();
return _type.to_string();
}
virtual bool is_duration() const override {
return _type->get_type()->equals(duration_type);
return _type.get_type()->equals(duration_type);
}
};
@@ -115,35 +110,35 @@ public:
return true;
}
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
assert(_values); // "Got null values type for a collection";
if (!_frozen && _values->supports_freezing() && !_values->_frozen) {
throw exceptions::invalid_request_exception(sprint("Non-frozen collections are not allowed inside collections: %s", *this));
throw exceptions::invalid_request_exception(format("Non-frozen collections are not allowed inside collections: {}", *this));
}
if (_values->is_counter()) {
throw exceptions::invalid_request_exception(sprint("Counters are not allowed inside collections: %s", *this));
throw exceptions::invalid_request_exception(format("Counters are not allowed inside collections: {}", *this));
}
if (_keys) {
if (!_frozen && _keys->supports_freezing() && !_keys->_frozen) {
throw exceptions::invalid_request_exception(sprint("Non-frozen collections are not allowed inside collections: %s", *this));
throw exceptions::invalid_request_exception(format("Non-frozen collections are not allowed inside collections: {}", *this));
}
}
if (_kind == &collection_type_impl::kind::list) {
return make_shared(cql3_type(to_string(), list_type_impl::get_instance(_values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
return cql3_type(list_type_impl::get_instance(_values->prepare_internal(keyspace, user_types).get_type(), !_frozen));
} else if (_kind == &collection_type_impl::kind::set) {
if (_values->is_duration()) {
throw exceptions::invalid_request_exception(sprint("Durations are not allowed inside sets: %s", *this));
throw exceptions::invalid_request_exception(format("Durations are not allowed inside sets: {}", *this));
}
return make_shared(cql3_type(to_string(), set_type_impl::get_instance(_values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
return cql3_type(set_type_impl::get_instance(_values->prepare_internal(keyspace, user_types).get_type(), !_frozen));
} else if (_kind == &collection_type_impl::kind::map) {
assert(_keys); // "Got null keys type for a collection";
if (_keys->is_duration()) {
throw exceptions::invalid_request_exception(sprint("Durations are not allowed as map keys: %s", *this));
throw exceptions::invalid_request_exception(format("Durations are not allowed as map keys: {}", *this));
}
return make_shared(cql3_type(to_string(), map_type_impl::get_instance(_keys->prepare_internal(keyspace, user_types)->get_type(), _values->prepare_internal(keyspace, user_types)->get_type(), !_frozen), false));
return cql3_type(map_type_impl::get_instance(_keys->prepare_internal(keyspace, user_types).get_type(), _values->prepare_internal(keyspace, user_types).get_type(), !_frozen));
}
abort();
}
@@ -160,11 +155,11 @@ public:
sstring start = _frozen ? "frozen<" : "";
sstring end = _frozen ? ">" : "";
if (_kind == &collection_type_impl::kind::list) {
return sprint("%slist<%s>%s", start, _values, end);
return format("{}list<{}>{}", start, _values, end);
} else if (_kind == &collection_type_impl::kind::set) {
return sprint("%sset<%s>%s", start, _values, end);
return format("{}set<{}>{}", start, _values, end);
} else if (_kind == &collection_type_impl::kind::map) {
return sprint("%smap<%s, %s>%s", start, _keys, _values, end);
return format("{}map<{}, {}>{}", start, _keys, _values, end);
}
abort();
}
@@ -177,7 +172,7 @@ public:
: _name(std::move(name)) {
}
virtual std::experimental::optional<sstring> keyspace() const override {
virtual std::optional<sstring> keyspace() const override {
return _name.get_keyspace();
}
@@ -185,7 +180,7 @@ public:
_frozen = true;
}
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
if (_name.has_keyspace()) {
// The provided keyspace is the one of the current statement this is part of. If it's different from the keyspace of
// the UTName, we reject since we want to limit user types to their own keyspace (see #6643)
@@ -199,16 +194,16 @@ public:
}
if (!user_types) {
// bootstrap mode.
throw exceptions::invalid_request_exception(sprint("Unknown type %s", _name));
throw exceptions::invalid_request_exception(format("Unknown type {}", _name));
}
try {
auto&& type = user_types->get_type(_name.get_user_type_name());
if (!_frozen) {
throw exceptions::invalid_request_exception("Non-frozen User-Defined types are not supported, please use frozen<>");
}
return make_shared<cql3_type>(_name.to_string(), std::move(type));
return cql3_type(std::move(type));
} catch (std::out_of_range& e) {
throw exceptions::invalid_request_exception(sprint("Unknown type %s", _name));
throw exceptions::invalid_request_exception(format("Unknown type {}", _name));
}
}
bool references_user_type(const sstring& name) const override {
@@ -244,7 +239,7 @@ public:
}
_frozen = true;
}
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata> user_types) override {
if (!_frozen) {
freeze();
}
@@ -253,9 +248,9 @@ public:
if (t->is_counter()) {
throw exceptions::invalid_request_exception("Counters are not allowed inside tuples");
}
ts.push_back(t->prepare_internal(keyspace, user_types)->get_type());
ts.push_back(t->prepare_internal(keyspace, user_types).get_type());
}
return make_cql3_tuple_type(tuple_type_impl::get_instance(std::move(ts)));
return tuple_type_impl::get_instance(std::move(ts))->as_cql3_type();
}
bool references_user_type(const sstring& name) const override {
@@ -265,7 +260,7 @@ public:
}
virtual sstring to_string() const override {
return sprint("tuple<%s>", join(", ", _types));
return format("tuple<{}>", join(", ", _types));
}
};
@@ -279,19 +274,19 @@ cql3_type::raw::is_counter() const {
return false;
}
std::experimental::optional<sstring>
std::optional<sstring>
cql3_type::raw::keyspace() const {
return std::experimental::nullopt;
return std::nullopt;
}
void
cql3_type::raw::freeze() {
sstring message = sprint("frozen<> is only allowed on collections, tuples, and user-defined types (got %s)", to_string());
sstring message = format("frozen<> is only allowed on collections, tuples, and user-defined types (got {})", to_string());
throw exceptions::invalid_request_exception(message);
}
shared_ptr<cql3_type::raw>
cql3_type::raw::from(shared_ptr<cql3_type> type) {
cql3_type::raw::from(cql3_type type) {
return ::make_shared<raw_type>(type);
}
@@ -326,32 +321,31 @@ cql3_type::raw::frozen(shared_ptr<raw> t) {
return t;
}
thread_local shared_ptr<cql3_type> cql3_type::ascii = make("ascii", ascii_type, cql3_type::kind::ASCII);
thread_local shared_ptr<cql3_type> cql3_type::bigint = make("bigint", long_type, cql3_type::kind::BIGINT);
thread_local shared_ptr<cql3_type> cql3_type::blob = make("blob", bytes_type, cql3_type::kind::BLOB);
thread_local shared_ptr<cql3_type> cql3_type::boolean = make("boolean", boolean_type, cql3_type::kind::BOOLEAN);
thread_local shared_ptr<cql3_type> cql3_type::double_ = make("double", double_type, cql3_type::kind::DOUBLE);
thread_local shared_ptr<cql3_type> cql3_type::empty = make("empty", empty_type, cql3_type::kind::EMPTY);
thread_local shared_ptr<cql3_type> cql3_type::float_ = make("float", float_type, cql3_type::kind::FLOAT);
thread_local shared_ptr<cql3_type> cql3_type::int_ = make("int", int32_type, cql3_type::kind::INT);
thread_local shared_ptr<cql3_type> cql3_type::smallint = make("smallint", short_type, cql3_type::kind::SMALLINT);
thread_local shared_ptr<cql3_type> cql3_type::text = make("text", utf8_type, cql3_type::kind::TEXT);
thread_local shared_ptr<cql3_type> cql3_type::timestamp = make("timestamp", timestamp_type, cql3_type::kind::TIMESTAMP);
thread_local shared_ptr<cql3_type> cql3_type::tinyint = make("tinyint", byte_type, cql3_type::kind::TINYINT);
thread_local shared_ptr<cql3_type> cql3_type::uuid = make("uuid", uuid_type, cql3_type::kind::UUID);
thread_local shared_ptr<cql3_type> cql3_type::varchar = make("varchar", utf8_type, cql3_type::kind::TEXT);
thread_local shared_ptr<cql3_type> cql3_type::timeuuid = make("timeuuid", timeuuid_type, cql3_type::kind::TIMEUUID);
thread_local shared_ptr<cql3_type> cql3_type::date = make("date", simple_date_type, cql3_type::kind::DATE);
thread_local shared_ptr<cql3_type> cql3_type::time = make("time", time_type, cql3_type::kind::TIME);
thread_local shared_ptr<cql3_type> cql3_type::inet = make("inet", inet_addr_type, cql3_type::kind::INET);
thread_local shared_ptr<cql3_type> cql3_type::varint = make("varint", varint_type, cql3_type::kind::VARINT);
thread_local shared_ptr<cql3_type> cql3_type::decimal = make("decimal", decimal_type, cql3_type::kind::DECIMAL);
thread_local shared_ptr<cql3_type> cql3_type::counter = make("counter", counter_type, cql3_type::kind::COUNTER);
thread_local shared_ptr<cql3_type> cql3_type::duration = make("duration", duration_type, cql3_type::kind::DURATION);
thread_local cql3_type cql3_type::ascii{ascii_type};
thread_local cql3_type cql3_type::bigint{long_type};
thread_local cql3_type cql3_type::blob{bytes_type};
thread_local cql3_type cql3_type::boolean{boolean_type};
thread_local cql3_type cql3_type::double_{double_type};
thread_local cql3_type cql3_type::empty{empty_type};
thread_local cql3_type cql3_type::float_{float_type};
thread_local cql3_type cql3_type::int_{int32_type};
thread_local cql3_type cql3_type::smallint{short_type};
thread_local cql3_type cql3_type::text{utf8_type};
thread_local cql3_type cql3_type::timestamp{timestamp_type};
thread_local cql3_type cql3_type::tinyint{byte_type};
thread_local cql3_type cql3_type::uuid{uuid_type};
thread_local cql3_type cql3_type::timeuuid{timeuuid_type};
thread_local cql3_type cql3_type::date{simple_date_type};
thread_local cql3_type cql3_type::time{time_type};
thread_local cql3_type cql3_type::inet{inet_addr_type};
thread_local cql3_type cql3_type::varint{varint_type};
thread_local cql3_type cql3_type::decimal{decimal_type};
thread_local cql3_type cql3_type::counter{counter_type};
thread_local cql3_type cql3_type::duration{duration_type};
const std::vector<shared_ptr<cql3_type>>&
const std::vector<cql3_type>&
cql3_type::values() {
static thread_local std::vector<shared_ptr<cql3_type>> v = {
static thread_local std::vector<cql3_type> v = {
cql3_type::ascii,
cql3_type::bigint,
cql3_type::blob,
@@ -368,7 +362,6 @@ cql3_type::values() {
cql3_type::timestamp,
cql3_type::tinyint,
cql3_type::uuid,
cql3_type::varchar,
cql3_type::varint,
cql3_type::timeuuid,
cql3_type::date,
@@ -378,16 +371,6 @@ cql3_type::values() {
return v;
}
shared_ptr<cql3_type>
make_cql3_tuple_type(tuple_type t) {
auto name = sprint("tuple<%s>",
::join(", ",
t->all_types()
| boost::adaptors::transformed(std::mem_fn(&abstract_type::as_cql3_type))));
return ::make_shared<cql3_type>(std::move(name), std::move(t), false);
}
std::ostream&
operator<<(std::ostream& os, const cql3_type::raw& r) {
return os << r.to_string();


@@ -54,17 +54,14 @@ namespace cql3 {
class ut_name;
class cql3_type final {
sstring _name;
data_type _type;
bool _native;
public:
cql3_type(sstring name, data_type type, bool native = true)
: _name(std::move(name)), _type(std::move(type)), _native(native) {}
cql3_type(data_type type) : _type(std::move(type)) {}
bool is_collection() const { return _type->is_collection(); }
bool is_counter() const { return _type->is_counter(); }
bool is_native() const { return _native; }
bool is_native() const { return _type->is_native(); }
data_type get_type() const { return _type; }
sstring to_string() const;
const sstring& to_string() const { return _type->cql3_type_name(); }
// For UserTypes, we need to know the current keyspace to resolve the
// actual type used, so Raw is a "not yet prepared" CQL3Type.
@@ -77,11 +74,11 @@ public:
virtual bool is_counter() const;
virtual bool is_duration() const;
virtual bool references_user_type(const sstring&) const;
virtual std::experimental::optional<sstring> keyspace() const;
virtual std::optional<sstring> keyspace() const;
virtual void freeze();
virtual shared_ptr<cql3_type> prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata>) = 0;
virtual shared_ptr<cql3_type> prepare(database& db, const sstring& keyspace);
static shared_ptr<raw> from(shared_ptr<cql3_type> type);
virtual cql3_type prepare_internal(const sstring& keyspace, lw_shared_ptr<user_types_metadata>) = 0;
virtual cql3_type prepare(database& db, const sstring& keyspace);
static shared_ptr<raw> from(cql3_type type);
static shared_ptr<raw> user_type(ut_name name);
static shared_ptr<raw> map(shared_ptr<raw> t1, shared_ptr<raw> t2);
static shared_ptr<raw> list(shared_ptr<raw> t);
@@ -102,74 +99,38 @@ private:
}
public:
enum class kind : int8_t {
ASCII, BIGINT, BLOB, BOOLEAN, COUNTER, DECIMAL, DOUBLE, EMPTY, FLOAT, INT, SMALLINT, TINYINT, INET, TEXT, TIMESTAMP, UUID, VARCHAR, VARINT, TIMEUUID, DATE, TIME, DURATION
};
using kind_enum = super_enum<kind,
kind::ASCII,
kind::BIGINT,
kind::BLOB,
kind::BOOLEAN,
kind::COUNTER,
kind::DECIMAL,
kind::DOUBLE,
kind::EMPTY,
kind::FLOAT,
kind::INET,
kind::INT,
kind::SMALLINT,
kind::TINYINT,
kind::TEXT,
kind::TIMESTAMP,
kind::UUID,
kind::VARCHAR,
kind::VARINT,
kind::TIMEUUID,
kind::DATE,
kind::TIME,
kind::DURATION>;
using kind_enum_set = enum_set<kind_enum>;
private:
std::experimental::optional<kind_enum_set::prepared> _kind;
static shared_ptr<cql3_type> make(sstring name, data_type type, kind kind_) {
return make_shared<cql3_type>(std::move(name), std::move(type), kind_);
}
public:
static thread_local shared_ptr<cql3_type> ascii;
static thread_local shared_ptr<cql3_type> bigint;
static thread_local shared_ptr<cql3_type> blob;
static thread_local shared_ptr<cql3_type> boolean;
static thread_local shared_ptr<cql3_type> double_;
static thread_local shared_ptr<cql3_type> empty;
static thread_local shared_ptr<cql3_type> float_;
static thread_local shared_ptr<cql3_type> int_;
static thread_local shared_ptr<cql3_type> smallint;
static thread_local shared_ptr<cql3_type> text;
static thread_local shared_ptr<cql3_type> timestamp;
static thread_local shared_ptr<cql3_type> tinyint;
static thread_local shared_ptr<cql3_type> uuid;
static thread_local shared_ptr<cql3_type> varchar;
static thread_local shared_ptr<cql3_type> timeuuid;
static thread_local shared_ptr<cql3_type> date;
static thread_local shared_ptr<cql3_type> time;
static thread_local shared_ptr<cql3_type> inet;
static thread_local shared_ptr<cql3_type> varint;
static thread_local shared_ptr<cql3_type> decimal;
static thread_local shared_ptr<cql3_type> counter;
static thread_local shared_ptr<cql3_type> duration;
static thread_local cql3_type ascii;
static thread_local cql3_type bigint;
static thread_local cql3_type blob;
static thread_local cql3_type boolean;
static thread_local cql3_type double_;
static thread_local cql3_type empty;
static thread_local cql3_type float_;
static thread_local cql3_type int_;
static thread_local cql3_type smallint;
static thread_local cql3_type text;
static thread_local cql3_type timestamp;
static thread_local cql3_type tinyint;
static thread_local cql3_type uuid;
static thread_local cql3_type timeuuid;
static thread_local cql3_type date;
static thread_local cql3_type time;
static thread_local cql3_type inet;
static thread_local cql3_type varint;
static thread_local cql3_type decimal;
static thread_local cql3_type counter;
static thread_local cql3_type duration;
static const std::vector<shared_ptr<cql3_type>>& values();
static const std::vector<cql3_type>& values();
public:
cql3_type(sstring name, data_type type, kind kind_)
: _name(std::move(name)), _type(std::move(type)), _native(true), _kind(kind_enum_set::prepare(kind_)) {
}
kind_enum_set::prepared get_kind() const {
assert(_kind);
return *_kind;
}
using kind = abstract_type::cql3_kind;
using kind_enum_set = abstract_type::cql3_kind_enum_set;
kind_enum_set::prepared get_kind() const { return _type->get_cql3_kind(); }
};
shared_ptr<cql3_type> make_cql3_tuple_type(tuple_type t);
inline bool operator==(const cql3_type& a, const cql3_type& b) {
return a.get_type() == b.get_type();
}
#if 0
public static class Custom implements CQL3Type


@@ -67,6 +67,12 @@ class error_collector : public error_listener<RecognizerType, ExceptionBaseType>
*/
const sstring_view _query;
/**
* An empty bitset to be used as a workaround for AntLR null dereference
* bug.
*/
static typename ExceptionBaseType::BitsetListType _empty_bit_list;
public:
/**
@@ -144,6 +150,14 @@ private:
break;
}
default:
// AntLR Exception class has a bug of dereferencing a null
// pointer in the displayRecognitionError. The following
// if statement makes sure it will not be null before the
// call to that function (displayRecognitionError).
// bug reference: https://github.com/antlr/antlr3/issues/191
if (!ex->get_expectingSet()) {
ex->set_expectingSet(&_empty_bit_list);
}
ex->displayRecognitionError(token_names, msg);
}
return msg.str();
@@ -345,4 +359,8 @@ private:
#endif
};
template<typename RecognizerType, typename TokenType, typename ExceptionBaseType>
typename ExceptionBaseType::BitsetListType
error_collector<RecognizerType,TokenType,ExceptionBaseType>::_empty_bit_list = typename ExceptionBaseType::BitsetListType();
}


@@ -41,6 +41,7 @@
#pragma once
#include "function.hh"
#include "types.hh"
#include "cql3/cql3_type.hh"
#include <vector>
@@ -73,7 +74,7 @@ public:
return _arg_types;
}
virtual data_type return_type() const {
virtual const data_type& return_type() const {
return _return_type;
}
@@ -92,7 +93,7 @@ public:
}
virtual sstring column_name(const std::vector<sstring>& column_names) override {
return sprint("%s(%s)", _name, join(", ", column_names));
return format("{}({})", _name, join(", ", column_names));
}
virtual void print(std::ostream& os) const override;
@@ -106,9 +107,9 @@ abstract_function::print(std::ostream& os) const {
if (i > 0) {
os << ", ";
}
os << _arg_types[i]->as_cql3_type()->to_string();
os << _arg_types[i]->as_cql3_type().to_string();
}
os << ") -> " << _return_type->as_cql3_type()->to_string();
os << ") -> " << _return_type->as_cql3_type().to_string();
}
}


@@ -249,7 +249,7 @@ struct aggregate_type_for<timeuuid_native_type> {
template <typename Type>
class impl_max_function_for final : public aggregate_function::aggregate {
std::experimental::optional<typename aggregate_type_for<Type>::type> _max{};
std::optional<typename aggregate_type_for<Type>::type> _max{};
public:
virtual void reset() override {
_max = {};
@@ -258,7 +258,7 @@ public:
if (!_max) {
return {};
}
return data_type_for<Type>()->decompose(Type{*_max});
return data_type_for<Type>()->decompose(data_value(Type{*_max}));
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {
@@ -296,7 +296,7 @@ make_max_function() {
template <typename Type>
class impl_min_function_for final : public aggregate_function::aggregate {
std::experimental::optional<typename aggregate_type_for<Type>::type> _min{};
std::optional<typename aggregate_type_for<Type>::type> _min{};
public:
virtual void reset() override {
_min = {};
@@ -305,7 +305,7 @@ public:
if (!_min) {
return {};
}
return data_type_for<Type>()->decompose(Type{*_min});
return data_type_for<Type>()->decompose(data_value(Type{*_min}));
}
virtual void add_input(cql_serialization_format sf, const std::vector<opt_bytes>& values) override {
if (!values[0]) {


@@ -42,7 +42,7 @@
#pragma once
#include "function.hh"
#include <experimental/optional>
#include <optional>
namespace cql3 {
namespace functions {


@@ -0,0 +1,156 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/*
* Copyright (C) 2018 ScyllaDB
*
* Modified by ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include "cql3/functions/function.hh"
#include "cql3/functions/scalar_function.hh"
#include "cql3/cql3_type.hh"
#include "bytes_ostream.hh"
#include "types.hh"
#include <boost/algorithm/cxx11/any_of.hpp>
namespace cql3 {

namespace functions {

/*
 * This function is used for handling 'SELECT JSON' statement.
 * 'SELECT JSON' is supposed to return a single column named '[json]'
 * with JSON representation of the query result as its value.
 * In order to achieve it, selectors from 'SELECT' are wrapped with
 * 'as_json' function, which also keeps information about underlying
 * selector names and types. This function is not registered in functions.cc,
 * because it should not be invoked directly from CQL.
 * Case-sensitive column names are wrapped in additional quotes,
 * as stated in CQL-JSON documentation.
 */
class as_json_function : public scalar_function {
    std::vector<sstring> _selector_names;
    std::vector<data_type> _selector_types;
public:
    as_json_function(std::vector<sstring>&& selector_names, std::vector<data_type> selector_types)
        : _selector_names(std::move(selector_names)), _selector_types(std::move(selector_types)) {
    }

    virtual bytes_opt execute(cql_serialization_format sf, const std::vector<bytes_opt>& parameters) override {
        bytes_ostream encoded_row;
        encoded_row.write("{", 1);
        for (size_t i = 0; i < _selector_names.size(); ++i) {
            if (i > 0) {
                encoded_row.write(", ", 2);
            }
            bool has_any_upper = boost::algorithm::any_of(_selector_names[i], [](unsigned char c) { return std::isupper(c); });
            encoded_row.write("\"", 1);
            if (has_any_upper) {
                encoded_row.write("\\\"", 2);
            }
            encoded_row.write(_selector_names[i].c_str(), _selector_names[i].size());
            if (has_any_upper) {
                encoded_row.write("\\\"", 2);
            }
            encoded_row.write("\": ", 3);
            sstring row_sstring = _selector_types[i]->to_json_string(parameters[i]);
            encoded_row.write(row_sstring.c_str(), row_sstring.size());
        }
        encoded_row.write("}", 1);
        return bytes(encoded_row.linearize());
    }

    virtual const function_name& name() const override {
        static const function_name f_name = function_name::native_function("as_json");
        return f_name;
    }

    virtual const std::vector<data_type>& arg_types() const override {
        return _selector_types;
    }

    virtual const data_type& return_type() const override {
        return utf8_type;
    }

    virtual bool is_pure() override {
        return true;
    }

    virtual bool is_native() override {
        return true;
    }

    virtual bool is_aggregate() override {
        // Aggregates of aggregates are currently not supported, but JSON handles them
        return false;
    }

    virtual void print(std::ostream& os) const override {
        os << "as_json(";
        bool first = true;
        for (const sstring& selector_name: _selector_names) {
            if (first) {
                first = false;
            } else {
                os << ", ";
            }
            os << selector_name;
        }
        os << ") -> " << utf8_type->as_cql3_type().to_string();
    }

    virtual bool uses_function(const sstring& ks_name, const sstring& function_name) override {
        return false;
    }

    virtual bool has_reference_to(function& f) override {
        return false;
    }

    virtual sstring column_name(const std::vector<sstring>& column_names) override {
        return "[json]";
    }
};

}

}


@@ -43,7 +43,7 @@
#include "native_scalar_function.hh"
#include "exceptions/exceptions.hh"
-#include "core/print.hh"
+#include <seastar/core/print.hh>
#include "cql3/cql3_type.hh"
namespace cql3 {
@@ -56,7 +56,7 @@ namespace functions {
inline
shared_ptr<function>
make_to_blob_function(data_type from_type) {
-    auto name = from_type->as_cql3_type()->to_string() + "asblob";
+    auto name = from_type->as_cql3_type().to_string() + "asblob";
return make_native_scalar_function<true>(name, bytes_type, { from_type },
[] (cql_serialization_format sf, const std::vector<bytes_opt>& parameters) {
return parameters[0];
@@ -66,7 +66,7 @@ make_to_blob_function(data_type from_type) {
inline
shared_ptr<function>
make_from_blob_function(data_type to_type) {
-    sstring name = sstring("blobas") + to_type->as_cql3_type()->to_string();
+    sstring name = sstring("blobas") + to_type->as_cql3_type().to_string();
return make_native_scalar_function<true>(name, to_type, { bytes_type },
[name, to_type] (cql_serialization_format sf, const std::vector<bytes_opt>& parameters) -> bytes_opt {
auto&& val = parameters[0];
@@ -74,13 +74,12 @@ make_from_blob_function(data_type to_type) {
return val;
}
try {
-            to_type->validate(*val);
+            to_type->validate(*val, sf);
return val;
} catch (marshal_exception& e) {
using namespace exceptions;
-            throw invalid_request_exception(sprint(
-                "In call to function %s, value 0x%s is not a valid binary representation for type %s",
-                name, to_hex(val), to_type->as_cql3_type()->to_string()));
+            throw invalid_request_exception(format("In call to function {}, value 0x{} is not a valid binary representation for type {}",
+                name, to_hex(val), to_type->as_cql3_type().to_string()));
}
});
}


@@ -27,7 +27,7 @@ namespace functions {
namespace {
-using bytes_opt = std::experimental::optional<bytes>;
+using bytes_opt = std::optional<bytes>;
class castas_function_for : public cql3::functions::native_scalar_function {
castas_fctn _func;
@@ -35,7 +35,7 @@ public:
castas_function_for(data_type to_type,
data_type from_type,
castas_fctn func)
-    : native_scalar_function("castas" + to_type->as_cql3_type()->to_string(), to_type, {from_type})
+    : native_scalar_function("castas" + to_type->as_cql3_type().to_string(), to_type, {from_type})
, _func(func) {
}
virtual bool is_pure() override {


@@ -47,7 +47,7 @@
#include "cql3/functions/function.hh"
#include "cql3/functions/abstract_function.hh"
#include "exceptions/exceptions.hh"
-#include "core/print.hh"
+#include <seastar/core/print.hh>
#include "cql3/cql3_type.hh"
#include "cql3/selection/selector.hh"


@@ -44,18 +44,18 @@
#include "function_name.hh"
#include "types.hh"
#include <vector>
-#include <experimental/optional>
+#include <optional>
namespace cql3 {
namespace functions {
class function {
public:
-    using opt_bytes = std::experimental::optional<bytes>;
+    using opt_bytes = std::optional<bytes>;
virtual ~function() {}
virtual const function_name& name() const = 0;
virtual const std::vector<data_type>& arg_types() const = 0;
-    virtual data_type return_type() const = 0;
+    virtual const data_type& return_type() const = 0;
/**
* Checks whether the function is a pure function (as in doesn't depend on, nor produce side effects) or not.


@@ -41,7 +41,7 @@
#pragma once
-#include "core/sstring.hh"
+#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include <iosfwd>
#include <functional>


@@ -27,6 +27,10 @@
#include "cql3/sets.hh"
#include "cql3/lists.hh"
#include "cql3/constants.hh"
+#include "database.hh"
+#include "types/map.hh"
+#include "types/set.hh"
+#include "types/list.hh"
namespace cql3 {
namespace functions {
@@ -59,17 +63,17 @@ functions::init() {
for (auto&& type : cql3_type::values()) {
// Note: because text and varchar ends up being synonymous, our automatic makeToBlobFunction doesn't work
// for varchar, so we special case it below. We also skip blob for obvious reasons.
-        if (type == cql3_type::varchar || type == cql3_type::blob) {
+        if (type == cql3_type::blob) {
continue;
}
// counters are not supported yet
-        if (type->is_counter()) {
+        if (type.is_counter()) {
warn(unimplemented::cause::COUNTERS);
continue;
}
-        declare(make_to_blob_function(type->get_type()));
-        declare(make_from_blob_function(type->get_type()));
+        declare(make_to_blob_function(type.get_type()));
+        declare(make_from_blob_function(type.get_type()));
}
declare(aggregate_fcts::make_count_function<int8_t>());
declare(aggregate_fcts::make_max_function<int8_t>());
@@ -123,10 +127,9 @@ functions::init() {
declare(aggregate_fcts::make_max_function<utils::UUID>());
declare(aggregate_fcts::make_min_function<utils::UUID>());
-    //FIXME:
-    //declare(aggregate_fcts::make_count_function<bytes>());
-    //declare(aggregate_fcts::make_max_function<bytes>());
-    //declare(aggregate_fcts::make_min_function<bytes>());
+    declare(aggregate_fcts::make_count_function<bytes>());
+    declare(aggregate_fcts::make_max_function<bytes>());
+    declare(aggregate_fcts::make_min_function<bytes>());
// FIXME: more count/min/max
@@ -163,7 +166,7 @@ functions::make_arg_spec(const sstring& receiver_ks, const sstring& receiver_cf,
std::transform(name.begin(), name.end(), name.begin(), ::tolower);
return ::make_shared<column_specification>(receiver_ks,
receiver_cf,
-            ::make_shared<column_identifier>(sprint("arg%d(%s)", i, name), true),
+            ::make_shared<column_identifier>(format("arg{:d}({})", i, name), true),
fun.arg_types()[i]);
}
@@ -283,13 +286,13 @@ functions::get(database& db,
if (compatibles.empty()) {
throw exceptions::invalid_request_exception(
-            sprint("Invalid call to function %s, none of its type signatures match (known type signatures: %s)",
+            format("Invalid call to function {}, none of its type signatures match (known type signatures: {})",
name, join(", ", candidates)));
}
if (compatibles.size() > 1) {
throw exceptions::invalid_request_exception(
-            sprint("Ambiguous call to function %s (can be matched by following signatures: %s): use type casts to disambiguate",
+            format("Ambiguous call to function {} (can be matched by following signatures: {}): use type casts to disambiguate",
name, join(", ", compatibles)));
}
@@ -328,7 +331,7 @@ functions::validate_types(database& db,
const sstring& receiver_cf) {
if (provided_args.size() != fun->arg_types().size()) {
throw exceptions::invalid_request_exception(
-            sprint("Invalid number of arguments in call to function %s: %d required but %d provided",
+            format("Invalid number of arguments in call to function {}: {:d} required but {:d} provided",
fun->name(), fun->arg_types().size(), provided_args.size()));
}
@@ -344,7 +347,7 @@ functions::validate_types(database& db,
auto&& expected = make_arg_spec(receiver_ks, receiver_cf, *fun, i);
if (!is_assignable(provided->test_assignment(db, keyspace, expected))) {
throw exceptions::invalid_request_exception(
-                sprint("Type error: %s cannot be passed as argument %d of function %s of type %s",
+                format("Type error: {} cannot be passed as argument {:d} of function {} of type {}",
provided, i, fun->name(), expected->type->as_cql3_type()));
}
}
@@ -419,7 +422,7 @@ function_call::bind_and_get(const query_options& options) {
// simplify things.
auto val = t->bind_and_get(options);
if (!val) {
-        throw exceptions::invalid_request_exception(sprint("Invalid null value for argument to %s", *_fun));
+        throw exceptions::invalid_request_exception(format("Invalid null value for argument to {}", *_fun));
}
buffers.push_back(std::move(to_bytes_opt(val)));
}
@@ -433,13 +436,13 @@ function_call::execute_internal(cql_serialization_format sf, scalar_function& fu
try {
        // Check the method didn't lie about its declared return type
if (result) {
-            fun.return_type()->validate(*result);
+            fun.return_type()->validate(*result, sf);
}
return result;
} catch (marshal_exception& e) {
-        throw runtime_exception(sprint("Return of function %s (%s) is not a valid value for its declared return type %s",
+        throw runtime_exception(format("Return of function {} ({}) is not a valid value for its declared return type {}",
                 fun, to_hex(result),
-                *fun.return_type()->as_cql3_type()
+                fun.return_type()->as_cql3_type()
));
}
}
@@ -485,7 +488,7 @@ function_call::raw::prepare(database& db, const sstring& keyspace, ::shared_ptr<
});
auto&& fun = functions::functions::get(db, keyspace, _name, args, receiver->ks_name, receiver->cf_name, receiver);
if (!fun) {
-        throw exceptions::invalid_request_exception(sprint("Unknown function %s called", _name));
+        throw exceptions::invalid_request_exception(format("Unknown function {} called", _name));
}
if (fun->is_aggregate()) {
throw exceptions::invalid_request_exception("Aggregation function are not supported in the where clause");
@@ -497,13 +500,13 @@ function_call::raw::prepare(database& db, const sstring& keyspace, ::shared_ptr<
// Functions.get() will complain if no function "name" type check with the provided arguments.
// We still have to validate that the return type matches however
if (!receiver->type->is_value_compatible_with(*scalar_fun->return_type())) {
-            throw exceptions::invalid_request_exception(sprint("Type error: cannot assign result of function %s (type %s) to %s (type %s)",
+            throw exceptions::invalid_request_exception(format("Type error: cannot assign result of function {} (type {}) to {} (type {})",
fun->name(), fun->return_type()->as_cql3_type(),
receiver->name, receiver->type->as_cql3_type()));
}
if (scalar_fun->arg_types().size() != _terms.size()) {
-            throw exceptions::invalid_request_exception(sprint("Incorrect number of arguments specified for function %s (expected %d, found %d)",
+            throw exceptions::invalid_request_exception(format("Incorrect number of arguments specified for function {} (expected {:d}, found {:d})",
fun->name(), fun->arg_types().size(), _terms.size()));
}
@@ -562,7 +565,7 @@ function_call::raw::test_assignment(database& db, const sstring& keyspace, share
sstring
function_call::raw::to_string() const {
-    return sprint("%s(%s)", _name, join(", ", _terms));
+    return format("{}({})", _name, join(", ", _terms));
}


@@ -59,13 +59,6 @@ namespace cql3 {
namespace functions {
-#if 0
-    // We special case the token function because that's the only function whose argument types actually
-    // depend on the table on which the function is called. Because it's the sole exception, it's easier
-    // to handle it as a special case.
-    private static final FunctionName TOKEN_FUNCTION_NAME = FunctionName.nativeFunction("token");
-#endif
class functions {
static thread_local std::unordered_multimap<function_name, shared_ptr<function>> _declared;
private:


@@ -43,7 +43,7 @@
#include "types.hh"
#include "native_function.hh"
-#include "core/shared_ptr.hh"
+#include <seastar/core/shared_ptr.hh>
namespace cql3 {
namespace functions {


@@ -43,7 +43,7 @@
#include "native_function.hh"
#include "scalar_function.hh"
-#include "core/shared_ptr.hh"
+#include <seastar/core/shared_ptr.hh>
namespace cql3 {
namespace functions {
