Commit Graph

17341 Commits

Author SHA1 Message Date
Asias He
edd72e10ac repair: Introduce get_sync_boundary_response
The return value of the REPAIR_GET_SYNC_BOUNDARY verb. It will be used
in the row level repair code soon.
2018-12-12 16:49:01 +08:00
Asias He
95b9a889cf repair: Introduce repair_hash
It represents the hash value of a repair row.
2018-12-12 16:49:01 +08:00
Asias He
3e86b7a646 repair: Introduce repair_sync_boundary
Represents the position of a mutation_fragment read from a flat mutation
reader. In each round, repair nodes negotiate a small sub-range, delimited
by two repair_sync_boundary values, to work on.
2018-12-12 16:49:01 +08:00
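A minimal sketch of how two boundaries can delimit one round's sub-range. It assumes simplified, hypothetical stand-ins: the real type pairs a dht::decorated_key with a position_in_partition, while here the key is a token plus a string and the position is a plain integer. The half-open `(start, end]` convention is also an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical stand-in for dht::decorated_key: a token plus the partition key.
struct simple_decorated_key {
    int64_t token;
    std::string key;
};

// Hypothetical stand-in for position_in_partition: just an ordinal here.
using simple_position = int32_t;

// Sketch of a sync boundary: a partition plus a position inside it.
struct sync_boundary {
    simple_decorated_key pk;
    simple_position position;
};

// Order boundaries by token, then key, then position within the partition.
inline bool operator<(const sync_boundary& a, const sync_boundary& b) {
    return std::tie(a.pk.token, a.pk.key, a.position) <
           std::tie(b.pk.token, b.pk.key, b.position);
}

// Assumed convention: a round covers the half-open range (start, end].
inline bool in_round(const sync_boundary& row, const sync_boundary& start,
                     const sync_boundary& end) {
    return start < row && !(end < row);
}
```

With this ordering, each round only needs to exchange two small values instead of a description of every row in the range.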
Asias He
063dfcda26 messaging_service: Add constructor for msg_addr
The constructor takes the IP address and the shard id.
2018-12-12 16:49:01 +08:00
Asias He
8cb3ea98d0 xx_hasher: Allow specifying seed
It will be used by row level repair.
2018-12-12 16:49:01 +08:00
Asias He
165d3053b1 position_in_partition: Add get_type, get_bound_weight and get_clustering_key_prefix
Needed by the RPC serialization code.
2018-12-12 16:49:01 +08:00
Asias He
4e55d22a8f position_in_partition: Switch _bound_weight to use enum
The _bound_weight in position_in_partition will be sent over the wire in RPC.
Make it enum instead of int.
2018-12-12 16:49:01 +08:00
Asias He
5bc109e1ee position_in_partition: Add bound_weight
It will be used to change _bound_weight to use enum instead of int8_t.
2018-12-12 16:49:01 +08:00
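A sketch of why an enum with an explicit underlying type suits RPC: the wire value is a fixed-width int8_t, and the receiver can validate it before converting back. The enumerator names and the -1/0/1 values are assumptions chosen to mirror the previous int8_t encoding; `to_wire` and `from_wire` are hypothetical helpers.

```cpp
#include <cstdint>
#include <stdexcept>

// Assumed enumerator names and values, mirroring the previous int8_t encoding.
enum class bound_weight : int8_t {
    before_all_prefixed = -1,
    equal = 0,
    after_all_prefixed = 1,
};

// What goes on the wire: the explicit underlying int8_t value.
inline int8_t to_wire(bound_weight w) {
    return static_cast<int8_t>(w);
}

// Validate a value received over RPC before converting it back.
inline bound_weight from_wire(int8_t v) {
    if (v < -1 || v > 1) {
        throw std::invalid_argument("invalid bound_weight");
    }
    return static_cast<bound_weight>(v);
}
```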
Asias He
05c663b932 position_in_partition: Use std::optional for clustering_key_prefix
The new row level repair code will access clustering_key_prefix and it
uses std::optional everywhere. Convert position_in_partition to use
std::optional.
2018-12-12 16:49:01 +08:00
Asias He
0b31d7059b position_in_partition: Make partition_region uint8_t
It will be sent over rpc. Make the type explicit.
2018-12-12 16:49:01 +08:00
Asias He
dfd206b3a3 serializer: Add std::optional support
2018-12-12 16:49:01 +08:00
Asias He
3eecdc670f serializer: Add std::list support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
Asias He
b540df2819 serializer: Add std::unordered_set support
Needed by the row level repair RPC verbs.
2018-12-12 16:49:01 +08:00
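A simplified sketch of how serializers for std::optional and std::list can be layered on a raw byte writer. The encoding here is an assumption, not Scylla's IDL format: a one-byte presence flag for optional, and a 32-bit count followed by the elements for list, in host byte order, with no bounds checking.

```cpp
#include <cstdint>
#include <cstring>
#include <list>
#include <optional>
#include <vector>

using buffer = std::vector<uint8_t>;

// Append the raw bytes of a trivially copyable value (host byte order).
template <typename T>
void write_raw(buffer& out, const T& v) {
    const auto* p = reinterpret_cast<const uint8_t*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

// Read a trivially copyable value and advance the cursor (no bounds checks).
template <typename T>
T read_raw(const buffer& in, size_t& pos) {
    T v;
    std::memcpy(&v, in.data() + pos, sizeof(T));
    pos += sizeof(T);
    return v;
}

// std::optional<T>: a one-byte presence flag, then the value if engaged.
template <typename T>
void serialize(buffer& out, const std::optional<T>& v) {
    write_raw<uint8_t>(out, v ? 1 : 0);
    if (v) {
        write_raw(out, *v);
    }
}

template <typename T>
std::optional<T> deserialize_optional(const buffer& in, size_t& pos) {
    if (read_raw<uint8_t>(in, pos) == 0) {
        return std::nullopt;
    }
    return read_raw<T>(in, pos);
}

// std::list<T>: a 32-bit element count, then each element in order.
template <typename T>
void serialize(buffer& out, const std::list<T>& l) {
    write_raw<uint32_t>(out, static_cast<uint32_t>(l.size()));
    for (const auto& e : l) {
        write_raw(out, e);
    }
}

template <typename T>
std::list<T> deserialize_list(const buffer& in, size_t& pos) {
    auto n = read_raw<uint32_t>(in, pos);
    std::list<T> l;
    for (uint32_t i = 0; i < n; ++i) {
        l.push_back(read_raw<T>(in, pos));
    }
    return l;
}
```

An std::unordered_set serializer could follow the same count-then-elements pattern; the receiver simply inserts each element, since iteration order is unspecified.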
Asias He
1367c8c47e dht: Add make_partitioner
Given the name, the shard count and the sharding_ignore_msb_bits, make a
partitioner.

It is used by row level repair.
2018-12-12 16:49:01 +08:00
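A sketch of a name-based partitioner factory of the kind described above. The `partitioner` interface, the two subclasses, and the error handling are hypothetical simplifications; only the idea of building a partitioner from a name, a shard count and sharding_ignore_msb_bits comes from the commit.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical minimal partitioner interface.
struct partitioner {
    partitioner(unsigned shards, unsigned ignore_msb)
        : shard_count(shards), sharding_ignore_msb_bits(ignore_msb) {}
    virtual ~partitioner() = default;
    virtual std::string name() const = 0;
    unsigned shard_count;
    unsigned sharding_ignore_msb_bits;
};

struct murmur3_partitioner final : partitioner {
    using partitioner::partitioner;
    std::string name() const override { return "org.apache.cassandra.dht.Murmur3Partitioner"; }
};

struct byte_ordered_partitioner final : partitioner {
    using partitioner::partitioner;
    std::string name() const override { return "org.apache.cassandra.dht.ByteOrderedPartitioner"; }
};

// Pick the implementation by name and pass the sharding parameters through.
inline std::unique_ptr<partitioner> make_partitioner(const std::string& name,
                                                     unsigned shard_count,
                                                     unsigned sharding_ignore_msb_bits) {
    if (name == "org.apache.cassandra.dht.Murmur3Partitioner") {
        return std::make_unique<murmur3_partitioner>(shard_count, sharding_ignore_msb_bits);
    }
    if (name == "org.apache.cassandra.dht.ByteOrderedPartitioner") {
        return std::make_unique<byte_ordered_partitioner>(shard_count, sharding_ignore_msb_bits);
    }
    throw std::invalid_argument("unknown partitioner: " + name);
}
```

Taking the shard parameters explicitly, rather than from local configuration, lets a node build a partitioner that matches a peer's sharding.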
Asias He
f1a914060b dht: Add constructor for decorated_key which takes token and partition_key
decorated_key(const dht::token& t, const partition_key& k)
2018-12-12 16:49:01 +08:00
Juliana Oliveira
5eb76c9bc6 compress: add support for Cassandra's compression parameter
This patch adds compatibility with Cassandra's "chunk_size_in_kb"
compression parameter, while keeping Scylla's "chunk_size_kb".

Fixes #3669
Tests: unit (release)

v2: use variable instead of array
v3: fix committed files

Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20181211215840.GA7379@shenzou.localdomain>
2018-12-11 23:33:27 +00:00
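A sketch of accepting either spelling of the option. It assumes compression options arrive as a string-to-string map, uses the option names exactly as spelled in the commit message above, and, as a hypothetical policy, prefers the Cassandra key when both are present. `chunk_size_kb` is a hypothetical helper name.

```cpp
#include <map>
#include <optional>
#include <string>

using options_map = std::map<std::string, std::string>;

// Returns the chunk size in KB, or std::nullopt if neither option is set.
// std::stoi throws std::invalid_argument on a non-numeric value.
inline std::optional<int> chunk_size_kb(const options_map& opts) {
    // Cassandra's spelling (checked first: assumed precedence).
    if (auto it = opts.find("chunk_size_in_kb"); it != opts.end()) {
        return std::stoi(it->second);
    }
    // Scylla's original spelling, still accepted for compatibility.
    if (auto it = opts.find("chunk_size_kb"); it != opts.end()) {
        return std::stoi(it->second);
    }
    return std::nullopt;
}
```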
Nadav Har'El
a0379209e6 secondary indexes: fail attempts to create a CUSTOM INDEX
Cassandra supports a "CREATE CUSTOM INDEX" to create a secondary index
with a custom implementation. The only custom implementation that Cassandra
supports is SASI. But Scylla doesn't support this, or any other custom
index implementation. If a CREATE CUSTOM INDEX statement is used, we
shouldn't silently ignore the "CUSTOM" tag, we should generate an error.

This patch also includes a regression test checking that "CREATE CUSTOM INDEX"
statements with valid syntax fail (before this patch, they succeeded).

Fixes #3977

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181211224545.18349-2-nyh@scylladb.com>
2018-12-11 23:33:02 +00:00
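The fix amounts to a validation step that refuses the CUSTOM keyword instead of dropping it. The sketch below assumes a hypothetical, simplified parse result; std::invalid_argument stands in for whatever error type the CQL layer actually raises.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, simplified parse result of CREATE [CUSTOM] INDEX.
struct create_index_statement {
    std::string index_name;
    bool is_custom = false;    // true when the CUSTOM keyword was used
    std::string custom_class;  // implementation class named with USING, if any
};

// Reject CUSTOM instead of silently creating a regular index.
inline void validate(const create_index_statement& stmt) {
    if (stmt.is_custom) {
        throw std::invalid_argument("CUSTOM INDEX is not supported");
    }
}
```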
Nadav Har'El
36db4fba23 Fix typo in error message
Interestingly, this typo was copied from the original Cassandra source
code :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181211224545.18349-1-nyh@scylladb.com>
2018-12-11 23:32:58 +00:00
Avi Kivity
5b08e91bdb tools: add SYS_PTRACE capability to dbuild
LeakSanitizer uses ptrace, and docker disables ptrace by default. Add it
back so tests pass.
Message-Id: <20181208112524.19229-1-avi@scylladb.com>
2018-12-11 19:09:12 +00:00
Avi Kivity
34a31a807d build: build libdeflate with user selected C compiler
If the user specified a C compiler, use it to build libdeflate.

Fixes #3978.
Message-Id: <20181211145604.14847-1-avi@scylladb.com>
2018-12-11 14:58:16 +00:00
Duarte Nunes
89ae3fbf11 db/system_distributed_keyspace: Create the schema with min_timestamp
Different nodes can concurrently create the distributed system
keyspace on boot, before the "if not exists" clause can take effect.

However, the resulting schema mutations will be different since
different nodes use different timestamps. This patch forces the
timestamps to be the same across all nodes, so we avoid some schema
mismatches.

This fixes a bug exposed by ca5dfdf, whereby the initialization of the
distributed system keyspace is done before waiting for schema
agreement. While waiting for schema agreement in
storage_service::join_token_ring(), the node still hasn't joined the
ring and schemas can't be pulled from it, so nodes can deadlock. A
similar situation can happen between a seed node and a non-seed node,
where the seed node progresses to a different "wait for schema
agreement" barrier, but still can't make progress because it can't
pull the schema from the non-seed node still trying to join the ring.

Finally, it is assumed that changes to the schema of the current
distributed system keyspace tables will be protected by a cluster
feature and a subsequent schema synchronization, such that all nodes
will be at a point where schemas can be transferred around.

Fixes #3976

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181211113407.20075-1-duarte@scylladb.com>
2018-12-11 13:35:48 +01:00
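A toy model of why a shared timestamp helps: two nodes that build the same schema mutation with their own clocks produce different mutations, while a fixed timestamp makes them equal, so the resulting schema versions agree. The struct and the constant's value are hypothetical; the real code uses Scylla's own mutation types and minimum timestamp.

```cpp
#include <cstdint>
#include <string>

// Hypothetical, simplified schema mutation for one distributed system table.
struct schema_mutation {
    std::string table;
    int64_t timestamp;
};

inline bool operator==(const schema_mutation& a, const schema_mutation& b) {
    return a.table == b.table && a.timestamp == b.timestamp;
}

// Assumed constant: any fixed value known to every node works for this sketch.
constexpr int64_t fixed_schema_timestamp = 0;

// Before: each node stamps the mutation with its own clock reading.
inline schema_mutation make_with_local_clock(const std::string& table, int64_t now) {
    return {table, now};
}

// After: every node uses the same timestamp, so concurrent creators agree.
inline schema_mutation make_with_fixed_timestamp(const std::string& table) {
    return {table, fixed_schema_timestamp};
}
```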
Paweł Dziepak
e3f53542c9 Merge "Optimize sstable writing of large partitions" from Tomasz
"
This series contains several optimizations of the MC format sstable writer, mainly:
  - Avoiding output_stream when serializing into memory (e.g. a row)
  - Faster serialization of primitive types when serializing into memory

I measured the improvement in throughput (frag/s) using perf_fast_forward for
datasets with a single large partition with many small rows:

  - 10% for a row with a single cell of 8 bytes
  - 10% for a row with a single cell of 100 bytes
  -  9% for a row with a single cell of 1000 bytes
  - 13% for a row with 6 cells of 100 bytes
"

* tag 'avoid-output-stream-in-sstable-writer-v2' of github.com:tgrabiec/scylla:
  bytes_ostream: Optimize writing of fixed-size types
  sstables: mc: Write temporary data to bytes_ostream rather than file_writer
  sstables: mc: Avoid double-serialization of a range tombstone marker
  sstables: file_writer: Generalize bytes& writer to accept bytes_view
  sstables: Templatize write() functions on the writer
  sstables: Turn m_format_write_helpers.cc into an impl header
  sstables: De-futurize file_writer
  bytes_ostream: Implement clear()
  bytes_ostream: Make initial chunk size configurable
2018-12-11 12:29:24 +00:00
Duarte Nunes
d66bd0100b Merge 'Simplify db::extensions' from Avi
"
Carry out simplifications of db::extensions: less magical types, de-inline
complex functions, and reduce #include dependencies

Tests: unit(release)
"

* tag 'extensions-simplify/v1' of https://github.com/avikivity/scylla:
  extensions: remove unneeded includes
  extensions: deinline extension accessors
  extensions: return concrete types from the extension accessors
  extensions: remove dependency on cql layer
2018-12-10 22:00:51 +00:00
Avi Kivity
b251183359 extensions: remove unneeded includes
<boost/any.hpp> is not used, and "schema.hh" can be replaced with forward
declarations.
2018-12-10 21:34:09 +02:00
Avi Kivity
119a83bf2f extensions: deinline extension accessors
Quite complex code that is not performance sensitive. Move it out of line.
2018-12-10 21:22:56 +02:00
Avi Kivity
e9f5641b64 extensions: return concrete types from the extension accessors
Returning "auto" makes it harder to understand what the function is returning,
and impossible to de-inline.

Return a vector of pointers instead. The caller should iterate immediately, in
any case, and since the previous return value was a range of references to const
unique_ptrs, nothing else could be done with it anyway.
2018-12-10 21:16:45 +02:00
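A sketch of an accessor with a concrete return type, assuming a hypothetical, simplified `extensions` class that owns its entries through unique_ptr. Returning `std::vector<const T*>` spells out what the caller receives and allows the definition to move out of the header.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified extension type.
struct schema_extension {
    std::string name;
};

class extensions {
    std::map<std::string, std::unique_ptr<schema_extension>> _schema_extensions;
public:
    void add(std::unique_ptr<schema_extension> ext) {
        auto n = ext->name;
        _schema_extensions.emplace(std::move(n), std::move(ext));
    }
    // Concrete return type instead of `auto`: callers see exactly what they get.
    std::vector<const schema_extension*> schema_extensions() const;
};

// Out-of-line definition (it would normally live in the .cc file).
inline std::vector<const schema_extension*> extensions::schema_extensions() const {
    std::vector<const schema_extension*> result;
    result.reserve(_schema_extensions.size());
    for (const auto& entry : _schema_extensions) {
        result.push_back(entry.second.get());
    }
    return result;
}
```

The pointers stay valid only as long as the `extensions` object does, which matches the "iterate immediately" usage described above.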
Tomasz Grabiec
f206ef0038 bytes_ostream: Optimize writing of fixed-size types
Inlining write() allows the writing code to be optimized for
fixed-size types. In particular, memcpy() calls and loops will be
eliminated.

Saw a 4% improvement in throughput in perf_fast_forward for tiny rows.
2018-12-10 20:08:16 +01:00
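A heavily simplified illustration of the inlining argument. `simple_ostream` is a hypothetical stand-in for bytes_ostream backed by one growable buffer (the real class manages a chain of chunks). Because `write()` is a template defined in the class, the compiler sees `sizeof(T)` as a constant at each call site and can typically lower the memcpy to a plain store.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for bytes_ostream: a single growable buffer.
class simple_ostream {
    std::vector<char> _buf;
public:
    // Defined in the class (hence inline): for a fixed-size T the copy
    // length is a compile-time constant, so no generic memcpy loop is needed.
    template <typename T>
    void write(const T& v) {
        static_assert(std::is_trivially_copyable_v<T>, "fixed-size types only");
        auto old = _buf.size();
        _buf.resize(old + sizeof(T));
        std::memcpy(_buf.data() + old, &v, sizeof(T));
    }
    size_t size() const { return _buf.size(); }
    const char* data() const { return _buf.data(); }
};
```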
Tomasz Grabiec
5a35240d47 sstables: mc: Write temporary data to bytes_ostream rather than file_writer
Currently, temporary data is serialized into a file_writer, because
that's what the write() functions used to expect. The data goes through
an output_stream and a data_sink into an in-memory data sink
implementation which collects the temporary_buffers.

Going through those abstractions is relatively expensive if we don't
write much, because each time we begin to write after a flush() of the
file_writer, the output stream has to allocate a new buffer, which
means a large allocation for a small amount of data.

We can avoid that by writing into a bytes_ostream directly, which
keeps its buffer across clear().

The write() functions, which are used both to write directly into the
data file and to write into a temporary arena, were templatized to
accept a Writer, to which both file_writer and bytes_ostream conform.
2018-12-10 20:08:16 +01:00
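A sketch of the Writer pattern described above: serialization functions are templates over any type with a `write(const char*, size_t)` member, so the same code can target an in-memory buffer or the data file. Both writer types and the length-prefixed string encoding are hypothetical simplifications, not Scylla's sstable format.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory writer (stands in for bytes_ostream).
struct memory_writer {
    std::vector<char> buf;
    void write(const char* p, size_t n) { buf.insert(buf.end(), p, p + n); }
    void clear() { buf.clear(); }  // keeps capacity, so later writes reuse it
};

// Hypothetical file writer (stands in for file_writer); "the file" is a string.
struct fake_file_writer {
    std::string written;
    void write(const char* p, size_t n) { written.append(p, n); }
};

// Templatized on the Writer, so the same serialization code serves both.
template <typename Writer>
void write_u16_be(Writer& out, uint16_t v) {
    const char bytes[2] = {static_cast<char>(v >> 8), static_cast<char>(v & 0xff)};
    out.write(bytes, sizeof(bytes));
}

template <typename Writer>
void write_string(Writer& out, const std::string& s) {
    write_u16_be(out, static_cast<uint16_t>(s.size()));
    out.write(s.data(), s.size());
}
```

Usage: serialize a row into a `memory_writer`, copy its buffer into the file writer, then `clear()` it; the buffer's capacity survives, so the next row does not allocate again.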
Tomasz Grabiec
c4003b3e79 sstables: mc: Avoid double-serialization of a range tombstone marker
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
9edb9434e5 sstables: file_writer: Generalize bytes& writer to accept bytes_view
Note that bytes is implicitly convertible to bytes_view.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
fad4fba4bc sstables: Templatize write() functions on the writer
Will allow writing to either a file_writer or an in-memory writer like
a bytes_ostream.
2018-12-10 20:08:16 +01:00
Tomasz Grabiec
f4016996d3 sstables: Turn m_format_write_helpers.cc into an impl header
I need to templatize functions defined in it and want to avoid
explicit instantiations.

There is only one compilation unit in which this is used
(sstables.cc). I think in the long term we should move all those
"helpers" into sstables/mc/writer.{cc,hh} together with their only
user, the sstable_writer_m class from sstables.cc.
2018-12-10 20:07:43 +01:00
Tomasz Grabiec
13999a4d09 sstables: De-futurize file_writer
2018-12-10 20:07:43 +01:00
Tomasz Grabiec
a1fb441df8 bytes_ostream: Implement clear()
2018-12-10 20:07:43 +01:00
Tomasz Grabiec
7cf5de3d9c bytes_ostream: Make initial chunk size configurable
2018-12-10 20:07:43 +01:00
Avi Kivity
8e05bcbe71 extensions: remove dependency on cql layer
The extensions class reaches into cql's property_definitions class to grab
a map<sstring, sstring> type. This generates a few unneeded dependencies.

Reduce dependencies by defining the map type ourselves; if cql's property_definitions
changes in an incompatible way, it will have to adapt, rather than the extensions
class.
2018-12-10 20:55:30 +02:00
Tomasz Grabiec
1dd2bf52ca Merge "Add a couple of tests of broken sstables" from Rafael
These are the current uninteresting cases I found when looking at
malformed_sstable_exception. The existing code is working, just not
being tested.

* https://github.com/espindola/scylla.git espindola/espindola/broken-sst:
  Add a broken sstable test.
  Add a test with mismatched schema.
2018-12-10 19:30:58 +01:00
Tomasz Grabiec
538e041f22 Merge "Remove some dependencies on db::config" from Avi
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.

Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that contains only
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).

In addition, some outright pointless inclusions of db/config.hh are
removed.

The result is somewhat shorter compile times, and fewer needless
recompiles.

* https://github.com/avikivity/scylla unconfig-1/v1:
  config: remove inclusions of db/config.hh from header files
  repair: remove unneeded config.hh inclusion
  batchlog_manager: remove dependency on db::config
  auth: remove permissions_cache dependency on db::config
  auth: remove auth::service dependency on db::config
  auth: remove unneeded db/config.hh includes
2018-12-10 14:53:14 +01:00
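A sketch of the intermediate-config pattern. `global_config` is a hypothetical stand-in for db::config; the option names follow the familiar permissions-cache settings, but the default values and the struct and function names are illustrative assumptions. The consumer's header only needs its own small struct, and the translation function is the one place that sees both types.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical slice of the global configuration (stands in for db::config).
struct global_config {
    uint32_t permissions_validity_in_ms = 10000;         // illustrative default
    uint32_t permissions_update_interval_in_ms = 2000;   // illustrative default
    uint32_t permissions_cache_max_entries = 1000;       // illustrative default
    // ... many unrelated options the consumer does not care about ...
};

// The consumer declares only what it needs, without including the global header.
struct permissions_cache_config {
    std::chrono::milliseconds validity;
    std::chrono::milliseconds update_period;
    std::size_t max_entries;
};

// Translation function owned by the caller.
inline permissions_cache_config make_permissions_cache_config(const global_config& c) {
    return permissions_cache_config{
        std::chrono::milliseconds(c.permissions_validity_in_ms),
        std::chrono::milliseconds(c.permissions_update_interval_in_ms),
        c.permissions_cache_max_entries,
    };
}
```

Adding an unrelated option to the global struct then recompiles only the translation code, not every consumer.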
Benny Halevy
ef53ddf3ae scylla_io_setup: correct units in low space warning
GiB -> GB

Refs #2676

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20181210092503.10344-1-bhalevy@scylladb.com>
2018-12-10 13:58:49 +02:00
Avi Kivity
475b151c97 Merge "Use utils::small_vector more in read path" from Paweł
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
the observation that the profiler points to memory allocation functions
as the ones where we spend the most time and, while they have a large
number of callers, storage allocation for some vectors was close to
the top. The gains are not huge, since the problem is a lot of things
adding up and not a single slow thing, but we need to start with
something.

Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.

perf_simple_query -c4 --duration 60, medians:

       ./perf_before  ./perf_after  diff
 read      343086.80     360720.53  5.1%

Tests: unit(release, small_vector in debug)
"

* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
  partition_slice: use small_vector for column_ids
  mutation_fragment_merger: use small_vector
  auth: use small_vector in resource
  auth: avoid list-initialisation of vectors
  idl: serialiser: add serialiser for utils::small_vector
  idl: serialiser: deduplicate vector serialisers
  utils: introduce small_vector
  intrusive_set_external_comparator: make iterator nothrow move constructible
  mutation_fragment_merger: value-initialise iterator
2018-12-10 13:50:59 +02:00
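The core idea of a small vector is inline storage for the first N elements, so short vectors never touch the allocator. The sketch below is a hypothetical, much-reduced version, not utils::small_vector: it supports only trivially copyable types, push_back and iteration, and disables copying and moving to keep the internal data pointer valid.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>

template <typename T, size_t N>
class small_vector {
    static_assert(std::is_trivially_copyable_v<T>, "sketch supports trivially copyable T only");
    static_assert(N > 0, "need at least one inline slot");
    T _inline[N];                // the first N elements live here
    std::unique_ptr<T[]> _heap;  // used only after growing past N
    T* _data = _inline;
    size_t _size = 0;
    size_t _capacity = N;
public:
    small_vector() = default;
    small_vector(const small_vector&) = delete;  // copy/move omitted in this sketch
    small_vector& operator=(const small_vector&) = delete;

    void push_back(const T& v) {
        if (_size == _capacity) {
            grow(_capacity * 2);
        }
        _data[_size++] = v;
    }
    size_t size() const { return _size; }
    bool uses_inline_storage() const { return _data == _inline; }
    T& operator[](size_t i) { return _data[i]; }
    const T& operator[](size_t i) const { return _data[i]; }
    T* begin() { return _data; }
    T* end() { return _data + _size; }
private:
    // Move to a larger heap buffer; the old buffer is freed after copying.
    void grow(size_t new_capacity) {
        auto fresh = std::make_unique<T[]>(new_capacity);
        std::memcpy(fresh.get(), _data, _size * sizeof(T));
        _heap = std::move(fresh);
        _data = _heap.get();
        _capacity = new_capacity;
    }
};
```

For a type like column_ids, where most slices are short, the common case stays entirely in the object itself.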
Duarte Nunes
a42b2895c2 Merge branch 'gossip: Send node UP event to cql client after cql server is up' from Asias
"
This is a backport of CASSANDRA-8236.

Before this patch, scylla sends the node UP event to the cql client when it
sees a new node join the cluster, i.e., when a new node's status
becomes NORMAL. The problem is that, at this time, the cql server might not
be ready yet. Once the client receives the UP event, it tries to
connect to the new node's cql port and fails.

To fix this, a new application_state::RPC_READY is introduced. A new node
sets RPC_READY to false when it first starts gossiping, and sets it to
true once the cql server is ready.

RPC_READY is a bad name, but I think it is better to follow Cassandra.

Nodes with or without this patch are supposed to work together with no
problem.

Refs #3843
"

* 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev:
  storage_service: Use cql_ready facility
  storage_service: Handle application_state::RPC_READY
  storage_service: Add notify_cql_change
  storage_service: Add debug log in notify_joined
  storage_service: Add extra check in notify_joined
  storage_service: Add notify_joined
  storage_service: Add debug log in notify_up
  storage_service: Add extra check in notify_up
  storage_service: Add notify_up
  storage_service: Make notify_left log debug level
  storage_service: Introduce notify_left
  storage_service: Add debug log in notify_down
  storage_service: Introduce notify_down
  storage_service: Add set_cql_ready
  gossip: Add gossiper::is_cql_ready
  gms: Add endpoint_state::is_cql_ready
  gms: Add application_state::RPC_READY
  gms: Introduce cql_ready in versioned_value
2018-12-10 11:37:59 +00:00
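A toy model of the gating described in this series: the joined notification requires NORMAL status, and UP requires the gossiped RPC_READY flag. All types and function names are hypothetical simplifications. The event strings borrow the native protocol's NEW_NODE, UP and DOWN event names for illustration, and sending DOWN when RPC_READY turns false is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified gossip view of one peer.
struct endpoint_state {
    std::string status;      // e.g. "BOOT" or "NORMAL"
    bool cql_ready = false;  // last gossiped value of RPC_READY
};

// Hypothetical stand-in for the channel that pushes events to cql clients.
struct client_events {
    std::vector<std::string> sent;
};

// Joined event only for nodes that have officially joined (status NORMAL).
inline void notify_joined(client_events& out, const std::string& ep, const endpoint_state& st) {
    if (st.status != "NORMAL") {
        return;
    }
    out.sent.push_back("NEW_NODE " + ep);
}

// UP only once the peer's cql server is ready (false also covers a down node).
inline void notify_up(client_events& out, const std::string& ep, const endpoint_state& st) {
    if (!st.cql_ready) {
        return;
    }
    out.sent.push_back("UP " + ep);
}

// Reaction to a new RPC_READY value arriving via gossip (assumed behavior).
inline void on_rpc_ready(client_events& out, const std::string& ep, endpoint_state& st, bool ready) {
    st.cql_ready = ready;
    if (ready) {
        notify_up(out, ep, st);
    } else {
        out.sent.push_back("DOWN " + ep);
    }
}
```

Usage: a booting node gossips RPC_READY=false, so clients learn about the new node but receive no UP until the flag flips to true.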
Asias He
06dc9b8da0 storage_service: Use cql_ready facility
At this point the cql_ready facility is ready. To use it, advertise the
RPC_READY application state in the following cases:

- When a node boots, set it to false
- When cql server is ready, set it to true
- When cql server is down, set it to false
2018-12-10 19:20:20 +08:00
Asias He
4761b53035 storage_service: Handle application_state::RPC_READY
2018-12-10 19:20:20 +08:00
Asias He
0e64814206 storage_service: Add notify_cql_change
It is called when an RPC_READY gossip application state is received.
2018-12-10 19:20:20 +08:00
Asias He
a1bbd7bcc7 storage_service: Add debug log in notify_joined
2018-12-10 19:20:20 +08:00
Asias He
17d68cb408 storage_service: Add extra check in notify_joined
Do not send the node joined event unless the node is in NORMAL status,
which means the node has officially joined the cluster.
2018-12-10 19:20:20 +08:00
Asias He
9abb15192f storage_service: Add notify_joined
Add a helper for node joined event.
2018-12-10 19:20:20 +08:00
Asias He
60c74431f7 storage_service: Add debug log in notify_up
2018-12-10 19:20:20 +08:00
Asias He
948d2b6c78 storage_service: Add extra check in notify_up
Do not send the UP event if is_cql_ready is false, which means the cql
server is not ready yet or the node is down.
2018-12-10 19:20:20 +08:00
Asias He
48cd31dc1e storage_service: Add notify_up
Add a helper for node up event.
2018-12-10 19:20:20 +08:00