Commit Graph

14955 Commits

Author SHA1 Message Date
Asias He
f539e993d3 gossip: Relax generation max difference check
start node 1 2 3
shutdown node2
shutdown node1 and node3
start node1 and node3
nodetool removenode node2
clean up all scylla data on node2
bootstrap node2 as a new node

I saw node2 could not bootstrap stuck at waiting for schema information to compelte for ever:

On node1, node3

    [shard 0] gossip - received an invalid gossip generation for peer 127.0.0.2; local generation = 2, received generation = 1521779704

On node2

    [shard 0] storage_service - JOINING: waiting for schema information to complete

This is becasue in nodetool removenode operation, the generation of node1 was increased from 0 to 2.

   gossiper::advertise_removing () calls eps.get_heart_beat_state().force_newer_generation_unsafe();
   gossiper::advertise_token_removed() calls eps.get_heart_beat_state().force_newer_generation_unsafe();

Each force_newer_generation_unsafe increases the generation by 1.

Here is an example,

Before nodetool removenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
   {
   "addrs": "127.0.0.2",
   "generation": 0,
   "is_alive": false,
   "update_time": 1521778757334,
   "version": 0
   },
```

After nodetool revmoenode:
```
curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool
 {
     "addrs": "127.0.0.2",
     "application_state": [
         {
             "application_state": 0,
             "value": "removed,146b52d5-dc94-4e35-b7d4-4f64be0d2672,1522038476246",
             "version": 214
         },
         {
             "application_state": 6,
             "value": "REMOVER,14ecc9b0-4b88-4ff3-9c96-38505fb4968a",
             "version": 153
            }
     ],
     "generation": 2,
     "is_alive": false,
     "update_time": 1521779276246,
     "version": 0
 },
```

In gossiper::apply_state_locally, we have this check:

```
if (local_generation != 0 && remote_generation > local_generation + MAX_GENERATION_DIFFERENCE) {
    // assume some peer has corrupted memory and is broadcasting an unbelievable generation about another peer (or itself)
  logger.warn("received an invalid gossip generation for peer {}; local generation = {}, received generation = {}",ep, local_generation, remote_generation);

}
```
to skip the gossip update.

To fix, we relax generation max difference check to allow the generation
of a removed node.

After this patch, the removed node bootstraps successfully.

Tests: dtest:update_cluster_layout_tests.py
Fixes #3331

Message-Id: <678fb60f6b370d3ca050c768f705a8f2fd4b1287.1522289822.git.asias@scylladb.com>
2018-03-29 12:09:49 +03:00
Glauber Costa
b092234f2b sstables: print informative message earlier
Just saw this today during a crash when creating Materialized Views.
It is still unclear why this happened. But the message says:

Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: scylla: sstables/sstables.cc:2973: sstables::sstable::remove_sstable_with_temp_toc(seastar::sstring, seastar::sstring, seastar::sstring, int64_t, sstables::sstable::version_types, sstables::sstable::format_types)::<lambda()>: Assertion `tmptoc == true' failed.
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: Aborting on shard 0.
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: Backtrace:
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000005b4b4c
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000005b4df5
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000005b4ea3
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libpthread.so.0+0x000000000000f0ff
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x00000000000355f6
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x0000000000036ce7
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x000000000002e565
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: /lib64/libc.so.6+0x000000000002e611
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x00000000015969d0
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x0000000001596f7a
Mar 28 15:55:58 ip-172-31-24-9 scylla[14055]: 0x000000000051ca8d

I can't even guess which table caused the problem, let alone which SSTable.
That's because those asserts are the very first thing we do. We can discuss
whether or not assert is the right behaviour (usually we can't guarantee the
state is sane if that is missing, so I don't see a problem)

But it would be nice to see which SSTable we are processing before we assert.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180328160856.10717-1-glauber@scylladb.com>
2018-03-28 19:55:04 +03:00
Avi Kivity
4419e60207 Merge "Add a confiugration API" from Amnon
"
The configuration API is part of scylla v2 configuration.
It uses the new definition capabilities of the API to dynamically create
the swagger definition for the configuration.
This mean that the swagger will contain an entry with description and
type for each of the config value.

To get the v2 of the swager file:
http://localhost:10000/v2

If using with swagger ui, change http://localhost:10000/api-doc to http://localhost:10000/v2
It takes longer to load because the file is much bigger now.
"

* 'amnon/config_api_v5' of github.com:scylladb/seastar-dev:
  Explanation about the API V2
  API: add the config API as part of the v2 API.
  Defining the config api
2018-03-28 12:45:17 +03:00
Amnon Heiman
71a04b5d26 Explanation about the API V2
Currently it holds a general explanation about the V2 and specific entry
about the config.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2018-03-28 12:42:04 +03:00
Amnon Heiman
94c2d82942 API: add the config API as part of the v2 API.
After this patch, the API v2 will contain a config section with all the
configuration parametes.

get http://localhost:10000/v2

Will contain the config section.

An example for getting a configuration parameter:
curl http://localhost:10000/v2/config/listen_address

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2018-03-28 12:42:04 +03:00
Amnon Heiman
6d907e43e0 Defining the config api
The config API is created dynamically from the config. This mean that
the swagger definition file will contain the description and types based on the
configuration.

The config.json file is used by the code generator to define a path that is
used to register the handler function.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2018-03-28 12:41:55 +03:00
Vladimir Krivopalov
b268ea951a tests: perf_fast_forward: Sanitize JSON files names.
Substitute various brackets and parentheses with alnum strings, remove
whitespaces, strip single-range values off curly braces.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <206adea8d05a1e64ce2627df1e4da3a845454906.1522171869.git.vladimir@scylladb.com>
2018-03-28 12:29:07 +03:00
Tomasz Grabiec
52c61df930 Relax includes
To avoid unnecessary recompilations.
Message-Id: <1522168295-994-1-git-send-email-tgrabiec@scylladb.com>
2018-03-28 10:49:07 +03:00
Avi Kivity
4c3e82bd67 Merge "db/view: Populate views with existing base table data" from Duarte
"
This series introduces the view_builder class, a sharded service
responsible for building all defined materialized views. This process
entails walking over the existing data in a given base table, and using
it to calculate and insert the respective entries for one or more views.

The view_builder uses the migration_manager to subscribe to schema
change events, and update its bookkeeping accordingly. We prefer this
to having the database call into the view_builder, as that would
create a cyclic dependency.

We serialize changes to the views of a particular base table, such
that schema changes do not interfere with the view building process.

We employ a flat_mutation_reader for each base table for which we're
building views.

We consume from the reader associated with each base table until all
its views are built. If the reader reaches the end and there are
incomplete views, then a view was added while others were being built.
In such cases, we restart the reader to the beginning of the current
token, but not to the beginning of the token range, when the view is
added. Then, when we exhaust the reader, we simply create a new one
for the whole token range, and resume building the pending views.

We aim to be resource-conscious. On a given shard, at any given moment,
we consume at most from one reader. We also strive for fairness, in that
each build step inserts entries for the views of a different base. Each
build step reads and generates updates for batch_size rows. We lack a
controller, which could potentially allow us to go faster (to execute
multiple steps at the same time, or consume more rows per batch), and
also which would apply backpressure, so we could, for example, delay
executing a build step.

Interaction with the system tables:
  - When we start building a view, we add an entry to the
    scylla_views_builds_in_progress system table. If the node restarts
    at this point, we'll consider these newly inserted views as having
    made no progress, and we'll treat them as new views;
  - When we finish a build step, we update the progress of the views
    that we built during this step by writing the next token to the
    scylla_views_builds_in_progress table. If the node restarts here,
    we'll start building the views at the token in the next_token
    column.
  - When we finish building a view, we mark it as completed in the
    built views system table, and remove it from the in-progress system
    table. Under failure, the following can happen:
        * When we fail to mark the view as built, we'll redo the last
          step upon node reboot;
        * When we fail to delete the in-progress record, upon reboot
          we'll remove this record.
    A view is marked as completed only when all shards have finished
    their share of the work, that is, if a view is not built, then all
    shards will still have an entry in the in-progress system table;
  - A view that a shard finished building, but not all other shards,
    remains in the in-progress system table, with first_token ==
    next_token.

Interaction with the distributed system tables:
  - When we start building a view, we mark the view build as being
    in-progress;
  - When we finish building a view, we mark the view as being built.
    Upon failure, we ensure that if the view is in the in-progress
    system table, then it may not have been written to this table. We
    don't load the built views from this table when starting. When
    starting, the following happens:
         * If the view is in the system.built_views table and not the
           in-progress system table, then it will be in this one;
         * If the view is in the system.built_views table and not in
           this one, it will still be in the in-progress system table -
           we detect this and mark it as built in this table too,
           keeping the invariant;
         * If the view is in this table but not in system.built_views,
           then it will also be in the in-progress system table - we
           don't detect this and will redo the missing step, for
           simplicity.

View building is necessarily a sharded process. That means that on
restart, if the number of shards has changed, we need to calculate
the most conservative token range that has been built, and build
the remainder.

When building view updates, we consider that everything is new and
nothing pre-existing is there (which means no tombstones will be sent
out to the paired view replicas).

Tests:
  unit (debug)
  dtest (materialized_view_test.py(smp=1, smp=2))
"

* 'view-building/v4' of https://github.com/duarten/scylla: (22 commits)
  tests/view_build_test: Add tests for view building
  tests/cql_test_env: Move eventually() to this file
  tests/cql_assertions: Assert result set is not empty
  tests/cql_test_env: Start the view_builder
  db/view/view_builder: Allow synchronizing with the end of a build
  db/view/view_builder: Actually build views
  flat_mutation_reader: Make reader from mutation fragments
  db/view/view_builder: React to schema changes
  service/migration_listener: Add class for view notifications
  db/view: Introduce view_builder
  column_family: Add function to populate views
  column_family: Allow synchronizing with in-progress writes
  database: Compare view id instead of name in find_views()
  database: Add get_views() function
  db/view: Return a future when sending view updates
  service/storage_service: Allow querying the view build status
  db: Introduce system_distributed_keyspace
  tests: Add unit test for build_progress_virtual_reader
  db/system_keyspace: Add API for MV-related system tables
  db/system_keyspace: Add virtual reader for MV in-progress build status
  ...
2018-03-27 15:41:28 +03:00
Daniel Fiala
051ed12ad2 cql3/functions: Print function declaration with cql3 types, not with internal types.
Signed-off-by: Daniel Fiala <daniel@scylladb.com>
Message-Id: <20180327084953.20313-3-daniel@scylladb.com>
2018-03-27 13:33:29 +03:00
Duarte Nunes
9f5cfa76f7 tests/view_build_test: Add tests for view building
This is a separate file from view_schema_test because that one is
already becoming too long to run; also, having multiple test files
means they can be executed in parallel.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
e5031f70ef tests/cql_test_env: Move eventually() to this file
Move eventually() from view_schema_test to cql_test_env.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
8528584056 tests/cql_assertions: Assert result set is not empty
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
a2c94e7925 tests/cql_test_env: Start the view_builder
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
a45fa8eaa2 db/view/view_builder: Allow synchronizing with the end of a build
Intended for use by unit tests, this patch allows synchronizing with
the end of a build for a particular view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
5f822e3928 db/view/view_builder: Actually build views
This patch adds the missing view building code to the eponymous class.

We consume from the reader associated with each base table until all
its views are built. If the reader reaches the end and there are
incomplete views, then a view was added while others were being built.
In such cases, we restart the reader to the beginning of the current
token, but not to the beginning of the token range, when the view is
added. Then, when we exhaust the reader, we simply create a new one
for the whole token range, and resume building the pending views.

We aim to be resource-conscious. On a given shard, at any given moment,
we consume at most from one reader. We also strive for fairness, in that
each build step inserts entries for the views of a different base. Each
build step reads and generates updates for batch_size rows. We lack a
controller, which could potentially allow us to go faster (to execute
multiple steps at the same time, or consume more rows per batch), and
also which would apply backpressure, so we could, for example, delay
executing a build step.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
1f3e3d3813 flat_mutation_reader: Make reader from mutation fragments
Builds a reader from a set of ordered mutations fragments. This is
useful for building a reader out of a subset of segments returned by a
different reader. It is equivalent to building a mutation out of the
set of mutation fragments, and calling
make_flat_mutation_reader_from_mutations, except that it doest not yet
support fast-forwarding.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
a21efeffa0 db/view/view_builder: React to schema changes
The view_builder now uses the migration_manager to subscribe to schema
change events, and update its bookkeeping accordingly. We prefer this
to having the database call into the view_builder, as that would
create a cyclic dependency.

We serialize changes to the views of a particular base table, such
that schema changes do not interfere with the upcoming view building
code.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
3ffa3b6b54 service/migration_listener: Add class for view notifications
Add a convenience base class for view notifications, which provides
a default implementation for all other types of notifications.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:11 +01:00
Duarte Nunes
901faabaa2 db/view: Introduce view_builder
This patch introduces the view_builder class, a sharded service
responsible for building all defined materialized views. This process
entails walking over the existing data in a given base table, and using
it to calculate and insert the respective entries for one or more views.

This patch introduces only the bootstrap functionality, which is
responsible for loading the data stored in the system tables and
filling the in-memory data structures with the relevant information,
to be used in subsequent patches for the actual view building. The
interaction with the system tables is as follows.

Interaction with the tables in system_keyspace:
  - When we start building a view, we add an entry to the
    scylla_views_builds_in_progress system table. If the node restarts
    at this point, we'll consider these newly inserted views as having
    made no progress, and we'll treat them as new views;
  - When we finish a build step, we update the progress of the views
    that we built during this step by writing the next token to the
    scylla_views_builds_in_progress table. If the node restarts here,
    we'll start building the views at the token in the next_token
    column.
  - When we finish building a view, we mark it as completed in the
    built views system table, and remove it from the in-progress system
    table. Under failure, the following can happen:
        * When we fail to mark the view as built, we'll redo the last
          step upon node reboot;
        * When we fail to delete the in-progress record, upon reboot
          we'll remove this record.
    A view is marked as completed only when all shards have finished
    their share of the work, that is, if a view is not built, then all
    shards will still have an entry in the in-progress system table;
  - A view that a shard finished building, but not all other shards,
    remains in the in-progress system table, with first_token ==
    next_token.

Interaction with the distributed system table (view_build_status):
  - When we start building a view, we mark the view build as being
    in-progress;
  - When we finish building a view, we mark the view as being built.
    Upon failure, we ensure that if the view is in the in-progress
    system table, then it may not have been written to this table. We
    don't load the built views from this table when starting. When
    starting, the following happens:
         * If the view is in the system.built_views table and not the
           in-progress system table, then it will be in view_build_status;
         * If the view is in the system.built_views table and not in
           this one, it will still be in the in-progress system table -
           we detect this and mark it as built in this table too,
           keeping the invariant;
         * If the view is in this table but not in system.built_views,
           then it will also be in the in-progress system table - we
           don't detect this and will redo the missing step, for
           simplicity.

View building is necessarily a sharded process. That means that on
restart, if the number of shards has changed, we need to calculate
the most conservative token range that has been built, and build
the remainder.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
f298f57137 column_family: Add function to populate views
The populate_views() function takes a set of views to update, a
tokento select base table partitions, and the set of sstables to
query. This lays the foundation for a view building mechanism to exist,
which walks over a given base table, reads data token-by-token,
calculates view updates (in a simplified way, compared to the existing
functions that push view updates), and sends them to the paired view
replicas.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
67dd3e6e5d column_family: Allow synchronizing with in-progress writes
This patch adds a mechanism to class column_family through which we
can synchronize with in-progress writes. This is useful for code that,
after some modification, needs to ensure that new writes will see it
before it can proceed.

In particular, this will be used by the view building code, which needs
to wait until the in-progress writes, which may have missed that there
is now a view, is observable to the view building code.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
9640205f11 database: Compare view id instead of name in find_views()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
9b9ba525f7 database: Add get_views() function
Returns all the schemas that are views.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
dc44a08370 db/view: Return a future when sending view updates
While we now send view mutations asynchronously in the normal view
write path, other processes interested in sending view updates, such
as streaming or view building, may wish to do it synchronously.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
ff15068a41 service/storage_service: Allow querying the view build status
This patch adds support for the nodetool viewbuildstatus command,
which shows the progress of a materialized view build across the
cluster.

A view can be absent from the result, successfully built, or
currently being built.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
78b232d98f db: Introduce system_distributed_keyspace
This patch introduces a distributed system keyspace, used to hold
system tables that need to be replicated across a set of replicas
(that is, can't use the LocalStrategy).

In following patches, we will use this keyspace to hold a table
containing view building status updates for each node, used to support
range movements and a new nodetool command.

Fixes #3237

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
412f081db9 tests: Add unit test for build_progress_virtual_reader
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
4227641a3d db/system_keyspace: Add API for MV-related system tables
This patch implements an API to access the MV-related system tables,
which pertain to the view building process.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
b2cae7ea09 db/system_keyspace: Add virtual reader for MV in-progress build status
Provide a virtual reader so users can query the in-progress view table
in a way compatible with Apache Cassandra.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
7811474697 db/system_keyspace: Add Scylla-specific MV system table
When building a materialized view, we divide our work by shard, so we
need to register which shard did what work in the in-progress system
table. We also add the token we started at, which will enable some
optimizations in the view building code.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Duarte Nunes
38831888d2 db/system_keyspace: Include MV system tables in all_tables()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-03-27 01:20:10 +01:00
Avi Kivity
16a7650873 Merge "More extensions: commitlog + system tables" from Calle
"
Additional extension points.

* Allows wrapping commitlog file io (including hinted handoff).
* Allows system schema modification on boot, allowing extensions
  to inject extensions into hardcoded schemas.

Note: to make commitlog file extensions work, we need to both
enforce we can be notified on segment delete, and thus need to
fix the old issue of hard ::unlink call in segment destructor.
Segment delete is therefore moved to a batch routine, run at
intervals/flush. Replay segments and hints are also deleted via
the commitlog object, ensuring an extension is notified (metadata).

Configurable listeneres are now allowed to inject configuration
object into the main config. I.e. a local object can, either
by becoming a "configurable" or manually, add references to
self-describing values that will be parsed from the scylla.yaml
file, effectively extending it.

All these wonderful abstractions courtesy of encryption of course.
But super generalized!
"

* 'calle/commitlog_ext' of github.com:scylladb/seastar-dev:
  db::extensions: Allow extensions to modify (system) schemas
  db::commitlog: Add commitlog/hints file io extension
  db::commitlog: Do segment delete async + force replay delete go via CL
  main/init: Change configurable callbacks and calls to allow adding opts
  util::config_file: Add "add" config item overload
2018-03-26 16:18:22 +03:00
Calle Wilund
ff41f47a08 db::extensions: Allow extensions to modify (system) schemas
Allows extensions/config listeners to potentially augument
(system) schemas at boot time. This is only useful for schemas
who do not pass through system_schema tables.
2018-03-26 11:58:28 +00:00
Calle Wilund
bb1a2c6c2e db::commitlog: Add commitlog/hints file io extension
To allow on-disk data to be augumented.
2018-03-26 11:58:27 +00:00
Calle Wilund
2bc98aebaf db::commitlog: Do segment delete async + force replay delete go via CL
Refs #2858

Push segement files to be deleted to a pending list, and process at
intervals or flush-requests (or shutdown). Note that we do _not_
indescrimenately do deletes in non-anchored tasks, because we need
to guarantee that finshed segments are fully deleted and gone on CL
shutdown, not to be mistaken for replayables.

Also make sure we delete segments replayed via commitlog call,
so IFF we add metadata processing for CL, we can clear it out.
2018-03-26 11:58:27 +00:00
Duarte Nunes
a985ea0fcb column_family: Don't retry flushing memtable if shutdown is requested
Since we just keep retrying, this can cause Scylla to not shutdown for
a while.

The data will be safe in the commit log.

Note that this patch doesn't fix the issue when shutdown goes through
storage_service::drain_on_shutdown - more work is required to handle
that case.

Ref #3318.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-3-duarte@scylladb.com>
2018-03-26 14:36:40 +03:00
Duarte Nunes
50ad37d39b column_family: Increase scope of exception handling when flushing a memtable
In column_family::try_flush_memtable_to_sstable, the handle_exception()
block is on the inside of the continuations to
write_memtable_to_sstable(), which, if it fails, will leave the
sstable in the compaction_backlog_tracker::_ongoing_writes map, which
will waste disk space, and that sstable will map to a dangling pointer
to a destroyed database_sstable_write_monitor, which causes a seg
fault when accessed (for example, through the backlog_controller,
which accounts the _ongoing_writes when calculating the backlog).

Fix this by increasing the scope of handle_exception().

Fixes #3315

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-2-duarte@scylladb.com>
2018-03-26 14:36:16 +03:00
Duarte Nunes
b7bd9b8058 backlog_controller: Stop update timer
On database shutdown, this timer can cause use-after-free errors if
not stopped.

Refs #3315

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180324140822.3743-1-duarte@scylladb.com>
2018-03-26 14:36:16 +03:00
Botond Dénes
0e6aa91269 Fix test.py output and error handling
* Don't dump output of failed tests immediately, print the output
for failed tests in the end instead.
* Fix exception printing in run_test(): don't assume passed in error
object is a `bytes` (or bytes-like) object, call the object's str
operator instead and let callers encode bytes objects instead.
* Don't assume Exception object has an `out` member, use operator str
instead to convert it to string.
* Don't print progress in run_test() directly because it results in
incomprehensible output as the executors race to print to stdout. Leave
progress report to the caller who can serialize progress prints.
* Automatically detect non-tty stdout and don't try to edit already
printed text.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <7bb7e0003ded9b28710250bff851ea849bb99f7d.1522062795.git.bdenes@scylladb.com>
2018-03-26 14:26:45 +03:00
Avi Kivity
999df41a49 Merge "Bug fixes for access-control, and finalizing roles" from Jesse
"
This series does not add or change any features of access-control and
roles, but addresses some bugs and finalizes the switch to roles.

"auth: Wait for schema agreement" and the patch prior help avoid false
negatives for integration tests and error messages in logs.

"auth: Remove ordering dependence" fixes an important bug in `auth` that
could leave the default superuser in a corrupted state when it is first
created.

Since roles are feature-complete (to the best of the author's knowledge
as of this writing), the final patch in the series removes any warnings
about them being unimplemented.

Tests: unit (release), dtest (PENDING)
"

* 'jhk/auth_fixes/v1' of https://github.com/hakuch/scylla:
  Roles are implemented
  auth: Increase delay before background tasks start
  auth: Remove ordering dependence
  auth: Don't warn on rescheduled task
  auth: Wait for schema agreement
  Single-node clusters can agree on schema
2018-03-26 09:29:41 +03:00
Jesse Haber-Kucharsky
849cf49b8d Roles are implemented
Fixes #1941.
2018-03-26 00:52:59 -04:00
Jesse Haber-Kucharsky
af24637565 auth: Increase delay before background tasks start
I've observed failures due to "missing" the peer nodes by about 1
second. Adding 5 second to the existing delay should cover most false
negative test results.

Fixes #3320.
2018-03-26 00:52:55 -04:00
Jesse Haber-Kucharsky
00f7bc676d auth: Remove ordering dependence
If `auth::password_authenticator` also creates `system_auth.roles` and
we fix the existence check for the default superuser in
`auth::standard_role_manager` to only search for the columns that it
owns (instead of the column itself), then both modules' initialization
are independent of one another.

Fixes #3319.
2018-03-25 22:38:11 -04:00
Jesse Haber-Kucharsky
968c61c296 auth: Don't warn on rescheduled task
Apache Cassandra also prints at the `info` level. This change prevents
tasks which we expect to be rescheduled from failing tests and scaring
users.

A good example of this importance of this change is when queries with a
quorum consistency level (for the default superuser) fail because a
quorum is not available. We will try again in this case, and this should
not cause integration tests to fail.
2018-03-25 22:38:11 -04:00
Jesse Haber-Kucharsky
881656cea4 auth: Wait for schema agreement
Some modules of `auth` create a default superuser if it does not already
exist.

The existence check is through a SELECT query with quorum consistency
level. If the schema for the applicable tables has not yet propagated to
a peer node at the time that it processes this query, then the
`storage_proxy` will print an error message to the log and the query
will be retried.

Eventually, the schema will propagate and the default superuser will be
created. However, the error message in the log causes integration tests
to fail (and is somewhat annoying).

Now, prior to querying for existing data, we wait for all gossip peers
to have the same schema version as we do.

Fixes #2852.
2018-03-25 22:38:08 -04:00
Jesse Haber-Kucharsky
3e415e28bc Single-node clusters can agree on schema
At some points while bootstrapping [1], new non-seed Scylla nodes wait
for schema agreement among all known endpoints in the cluster.

The check for schema agreement was in
`service::migration_manager::is_ready_for_bootstrap`. This function
would return `true` if, at the time of its invocation, the node was
aware of at least one `UP` peer (not itself) and that all `UP` peers had
the same schema version as the node.

We wish to re-use this check in the `auth` sub-system to ensure that
the schema for internal system tables used for access-control have
propagated to the entire cluster.

Unlike in `service/storage_service.cc`, where `is_ready_for_bootstrap`
was only invoked for seed nodes, we wish to wait for schema agreement
for all nodes regardless of whether or not they are seeds.

For a single-node cluster with itself as a seed,
`is_ready_for_bootstrap` would always return `false`.

We therefore change the conditions for schema agreement. Schema
agreement is now reached when there are no known peers (so the endpoint
map of the gossiper consists only of ourselves), or when there is at
least one `UP` peer and all `UP` peers have the same schema version as
us.

This change should not impact any bootstrap behavior in
`storage_service` because seed nodes do not invoke the function and
non-seed nodes wait for peer visibility before checking for schema
agreement.

Since this function is no longer checking for schema agreement only in
the context of bootstrapping non-seed nodes, we rename it to reflect its
generality.

[1] http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html
2018-03-25 22:08:42 -04:00
Duarte Nunes
aed28c667c db/view: Pass pending endpoints to storage_proxy::send_to_endpoint
This minimizes the number of mutation copies by just doing a single
call to send_to_endpoint().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180325121412.76844-2-duarte@scylladb.com>
2018-03-25 15:45:22 +03:00
Duarte Nunes
fb54c09e0b service/storage_proxy: Pass pending endpoints to send_to_endpoint()
This will allow us to minimize the number of mutation copies in
mutate_MV().

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180325121412.76844-1-duarte@scylladb.com>
2018-03-25 15:45:21 +03:00
Avi Kivity
389fb54a42 tests: sstable_test: fix for_each_sstable_version concept (again)
I see the following error:

seastar/core/future-util.hh:597:10: note:   constraints not satisfied
seastar/core/future-util.hh:597:10: note:     with ‘sstables::sstable_version_types* c’
seastar/core/future-util.hh:597:10: note:     with ‘sub_partitions_read::run_test_case()::<lambda(sstables::sstable::version_types)> aa’
seastar/core/future-util.hh:597:10: note: the required expression ‘seastar::futurize_apply(aa, (* c.begin()))’ would be ill-formed
seastar/core/future-util.hh:597:10: note: ‘seastar::futurize_apply(aa, (* c.begin()))’ is not implicitly convertible to ‘seastar::future<>’

The C array all_sstable_versions decayed to a pointer (see second gcc note)
and of course doesn't support std::begin().

Fix by replacing the C array with an std::array<>, which supports std::begin().

Not clear what made this break again, or why it worked before.
Message-Id: <20180325095239.12407-1-avi@scylladb.com>
2018-03-25 13:02:57 +01:00