Represent a position of a mutation_fragment read from a flat mutation
reader. In each round, repair nodes negotiate a small sub-range to work
on, delimited by two repair_sync_boundary objects.
The new row-level repair code will access clustering_key_prefix, and it
uses std::optional everywhere. Convert position_in_partition to use
std::optional.
This patch adds compatibility for Cassandra's "chunk_size_in_kb"
compression parameter while keeping Scylla's own "chunk_size_kb".
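A minimal sketch of accepting both spellings, assuming the compression
options arrive as a string map. The helper name
chunk_size_kb_from_options and the precedence (Scylla's key wins when
both are present) are assumptions made for illustration, not taken from
the patch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: return the chunk size in KB from either spelling.
// Assumptions: values are decimal integers (std::stoi throws otherwise),
// and Scylla's own key takes precedence if both keys are present.
inline std::optional<int> chunk_size_kb_from_options(
        const std::map<std::string, std::string>& opts) {
    for (const char* key : {"chunk_size_kb", "chunk_size_in_kb"}) {
        auto it = opts.find(key);
        if (it != opts.end()) {
            return std::stoi(it->second);
        }
    }
    return std::nullopt;  // neither spelling was supplied
}
```

With this shape, a table created with Cassandra's option name and one
created with Scylla's both resolve to the same setting.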
Fixes #3669
Tests: unit (release)
v2: use variable instead of array
v3: fix committed files
Signed-off-by: Juliana Oliveira <juliana@scylladb.com>
Message-Id: <20181211215840.GA7379@shenzou.localdomain>
Cassandra supports a "CREATE CUSTOM INDEX" to create a secondary index
with a custom implementation. The only custom implementation that Cassandra
supports is SASI. But Scylla doesn't support this, or any other custom
index implementation. If a CREATE CUSTOM INDEX statement is used, we
shouldn't silently ignore the "CUSTOM" tag; we should generate an error.
This patch also includes a regression test that "CREATE CUSTOM INDEX"
statements with valid syntax fail (before this patch, they succeeded).
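The intended behavior can be sketched as a validation step that rejects
the statement instead of dropping the tag. The struct, its fields, and
the exception type below are hypothetical stand-ins, not the actual cql3
classes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical parsed form of a CREATE [CUSTOM] INDEX statement.
struct create_index_statement {
    std::string index_name;
    bool is_custom;  // true when the CUSTOM keyword was present
};

// Reject CUSTOM indexes explicitly rather than silently creating a
// regular index. The exception type is an assumption for this sketch.
inline void validate(const create_index_statement& stmt) {
    if (stmt.is_custom) {
        throw std::invalid_argument("CUSTOM index not supported");
    }
}
```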
Fixes #3977
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181211224545.18349-2-nyh@scylladb.com>
Different nodes can concurrently create the distributed system
keyspace on boot, before the "if not exists" clause can take effect.
However, the resulting schema mutations will be different since
different nodes use different timestamps. This patch forces the
timestamps to be the same across all nodes, so we avoid some schema
mismatches.
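A toy illustration of why a shared timestamp helps: two nodes that build
the same definition independently only produce equal mutations if the
timestamp is equal too. The types and the constant's value are
assumptions for this sketch and do not mirror the real schema code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>

// Toy stand-in for a schema mutation: a definition plus a write timestamp.
struct schema_mutation {
    std::string definition;
    std::int64_t timestamp;
    bool operator==(const schema_mutation& o) const {
        return std::tie(definition, timestamp) == std::tie(o.definition, o.timestamp);
    }
};

// Hypothetical cluster-wide constant; the actual value is arbitrary here.
constexpr std::int64_t fixed_schema_timestamp = 1;

// Every node builds the same definition; only the timestamp can differ.
inline schema_mutation make_keyspace_mutation(std::int64_t ts) {
    return {"distributed system keyspace", ts};
}
```

Built with fixed_schema_timestamp, the mutations from any two nodes
compare equal; built with each node's local clock, they generally do not.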
This fixes a bug exposed by ca5dfdf, whereby the initialization of the
distributed system keyspace is done before waiting for schema
agreement. While waiting for schema agreement in
storage_service::join_token_ring(), the node still hasn't joined the
ring and schemas can't be pulled from it, so nodes can deadlock. A
similar situation can happen between a seed node and a non-seed node,
where the seed node progresses to a different "wait for schema
agreement" barrier, but still can't make progress because it can't
pull the schema from the non-seed node still trying to join the ring.
Finally, it is assumed that changes to the schema of the current
distributed system keyspace tables will be protected by a cluster
feature and a subsequent schema synchronization, such that all nodes
will be at a point where schemas can be transferred around.
Fixes #3976
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181211113407.20075-1-duarte@scylladb.com>
"
This series contains several optimizations of the MC format sstable writer, mainly:
- Avoiding output_stream when serializing into memory (e.g. a row)
- Faster serialization of primitive types when serializing into memory
I measured the improvement in throughput (frag/s) using perf_fast_forward for
datasets with a single large partition with many small rows:
- 10% for a row with a single cell of 8 bytes
- 10% for a row with a single cell of 100 bytes
- 9% for a row with a single cell of 1000 bytes
- 13% for a row with 6 cells of 100 bytes
"
* tag 'avoid-output-stream-in-sstable-writer-v2' of github.com:tgrabiec/scylla:
bytes_ostream: Optimize writing of fixed-size types
sstables: mc: Write temporary data to bytes_ostream rather than file_writer
sstables: mc: Avoid double-serialization of a range tombstone marker
sstables: file_writer: Generalize bytes& writer to accept bytes_view
sstables: Templetize write() functions on the writer
sstables: Turn m_format_write_helpers.cc into an impl header
sstables: De-futurize file_writer
bytes_ostream: Implement clear()
bytes_ostream: Make initial chunk size configurable
"
Carry out simplifications of db::extensions: less magical types, de-inline
complex functions, and reduce #include dependencies
Tests: unit(release)
"
* tag 'extensions-simplify/v1' of https://github.com/avikivity/scylla:
extensions: remove unneeded includes
extensions: deinline extension accessors
extensions: return concrete types from the extension accessors
extensions: remove dependency on cql layer
Returning "auto" makes it harder to understand what the function is returning,
and impossible to de-inline.
Return a vector of pointers instead. The caller should iterate immediately, in
any case, and since the previous return value was a range of references to const
unique_ptrs, nothing else could be done with it anyway.
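A minimal sketch of the idea, with hypothetical class and member names:
once the accessor returns a concrete vector of pointers, it can be
declared in the header and defined out of line, which an "auto" return
type deduced from a range expression does not allow.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct extension { std::string name; };  // hypothetical extension type

class extensions {
    std::map<std::string, std::unique_ptr<extension>> _exts;
public:
    void add(std::string name) {
        _exts.emplace(name, std::make_unique<extension>(extension{name}));
    }
    // Concrete return type: the declaration alone tells the caller what
    // it gets, and the body can live in a .cc file.
    std::vector<const extension*> all() const;
};

// Out-of-line definition (would sit in the .cc file).
std::vector<const extension*> extensions::all() const {
    std::vector<const extension*> result;
    result.reserve(_exts.size());
    for (const auto& entry : _exts) {
        result.push_back(entry.second.get());
    }
    return result;
}
```

The pointers are only valid while the owning object is alive and
unmodified, which matches the expectation that callers iterate
immediately.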
Inlining write() allows the writing code to be optimized for
fixed-size types. In particular, memcpy() calls and loops will be
eliminated.
Saw 4% improvement in throughput in perf_fast_forward for tiny rows.
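A rough sketch of the effect, using a toy byte sink rather than the real
bytes_ostream: when write() is a template visible at the call site, the
size of the value is a compile-time constant, so compilers typically
turn the memcpy() into a plain store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Toy byte sink; not Scylla's bytes_ostream.
class byte_sink {
    std::vector<char> _buf;
public:
    // Defined in the class body, so implicitly inline. sizeof(T) is known
    // at compile time for every instantiation.
    template <typename T>
    void write(T value) {
        static_assert(std::is_trivially_copyable_v<T>, "fixed-size types only");
        const std::size_t old_size = _buf.size();
        _buf.resize(old_size + sizeof(T));
        std::memcpy(_buf.data() + old_size, &value, sizeof(T));
    }
    std::size_t size() const { return _buf.size(); }
    const char* data() const { return _buf.data(); }
};
```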
Currently temporary data is serialized into a file_writer, because
that's what write() functions used to expect, which goes through an
output_stream, a data_sink, into an in-memory data sink implementation
which collects the temporary_buffers.
Going through those abstractions is relatively expensive if we don't
write much, because each time we begin to write after a flush() of the
file_writer the output stream has to allocate a new buffer, which
means a large allocation for a small amount of data.
We could avoid that and write into bytes_ostream directly, which will
keep its buffer across clear().
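The buffer-reuse pattern can be sketched with a toy scratch buffer built
on std::vector, whose clear() leaves capacity untouched. This is an
illustration of the pattern only; the class name is hypothetical and the
real bytes_ostream is chunked and works differently inside.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Toy scratch buffer: clear() drops the contents but keeps the storage,
// so the next small write does not allocate.
class scratch_buffer {
    std::vector<char> _buf;
public:
    void write(std::string_view bytes) {
        _buf.insert(_buf.end(), bytes.begin(), bytes.end());
    }
    std::string_view view() const { return {_buf.data(), _buf.size()}; }
    void clear() { _buf.clear(); }  // size becomes 0, capacity is retained
    std::size_t capacity() const { return _buf.capacity(); }
};
```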
write() functions which are used both to write directly into the data
file and to a temporary arena were templatized to accept a Writer to
which both file_writer and bytes_ostream conform.
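A small sketch of what templating on the writer looks like. Both writer
types and the length-prefixed format are hypothetical; the only point is
that one serialization routine serves any type with a matching write()
member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Two hypothetical writers sharing the same write(const char*, size_t) shape.
struct memory_writer {
    std::string data;
    void write(const char* p, std::size_t n) { data.append(p, n); }
};
struct counting_writer {
    std::size_t bytes = 0;
    void write(const char*, std::size_t n) { bytes += n; }
};

// One routine, instantiated once per writer type it is used with.
template <typename Writer>
void write_length_prefixed(Writer& out, std::string_view s) {
    assert(s.size() <= 255);  // toy format: a single length byte
    const auto len = static_cast<std::uint8_t>(s.size());
    out.write(reinterpret_cast<const char*>(&len), 1);
    out.write(s.data(), s.size());
}
```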
I need to templatize functions defined in it and want to avoid
explicit instantiations.
There is only one compilation unit in which this is used
(sstables.cc). I think in the long term we should move all those
"helpers" into sstables/mc/writer.{cc,hh} together with their only
user, the sstable_writer_m class from sstables.cc.
The extensions class reaches into cql's property_definitions class to grab
a map<sstring, sstring> type. This generates a few unneeded dependencies.
Reduce dependencies by defining the map type ourselves; if cql's property_definitions
changes in an incompatible way, it will have to adapt, rather than the extensions
class.
These are the current uninteresting cases I found when looking at
malformed_sstable_exception. The existing code is working, just not
being tested.
* https://github.com/espindola/scylla.git espindola/espindola/broken-sst:
Add a broken sstable test.
Add a test with mismatched schema.
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.
Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that contains only the
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).
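The scheme can be sketched as follows. All names and fields here are
hypothetical stand-ins; the sketch only shows the shape of a consumer
owning a small config struct while the translation from the global
config lives with the caller.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in for a large global config class (hypothetical fields).
struct global_config {
    std::uint32_t cache_max_entries = 1000;
    std::uint32_t cache_validity_ms = 10000;
    // ... many unrelated options would live here
};

// The consumer declares only what it needs, so its header does not have
// to include the global config header.
struct cache_config {
    std::uint32_t max_entries;
    std::chrono::milliseconds validity;
};

class cache {
    cache_config _cfg;
public:
    explicit cache(cache_config cfg) : _cfg(cfg) {}
    const cache_config& config() const { return _cfg; }
};

// Translation function: lives with the caller, which already depends on
// global_config.
inline cache_config make_cache_config(const global_config& gc) {
    return {gc.cache_max_entries, std::chrono::milliseconds(gc.cache_validity_ms)};
}
```

A change to an unrelated global option then recompiles the caller but
not the consumer.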
In addition, some outright pointless inclusions of db/config.hh are
removed.
The result is somewhat shorter compile times, and fewer needless
recompiles.
* https://github.com/avikivity/scylla unconfig-1/v1:
config: remove inclusions of db/config.hh from header files
repair: remove unneeded config.hh inclusion
batchlog_manager: remove dependency on db::config
auth: remove permissions_cache dependency on db::config
auth: remove auth::service dependency on db::config
auth: remove unneeded db/config.hh includes
"
This series optimises the read path by replacing some usages of
std::vector by utils::small_vector. The motivation for this change was
an observation that the profiler points to memory allocation functions
as the ones where we spend the most time. While they have a large number
of callers, storage allocation for some vectors was close to the top.
The gains are not huge, since the problem is a lot of things adding up
and not a single slow thing, but we need to start with something.
Unfortunately, the performance of boost::container::small_vector is
quite disappointing so a new implementation of a small_vector was
introduced.
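For readers unfamiliar with the technique, here is a deliberately
simplified sketch of small-buffer storage. It is not utils::small_vector:
it is restricted to trivially copyable types, and unlike a real small
vector it does not keep elements contiguous once it spills to the heap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Toy small-buffer vector: the first N elements live inline, so short
// sequences never touch the allocator.
template <typename T, std::size_t N>
class tiny_vector {
    static_assert(std::is_trivially_copyable_v<T> && std::is_default_constructible_v<T>,
                  "sketch supports simple value types only");
    std::array<T, N> _inline{};
    std::vector<T> _overflow;  // used only once size exceeds N
    std::size_t _size = 0;
public:
    void push_back(T v) {
        if (_size < N) {
            _inline[_size] = v;      // no allocation on this path
        } else {
            _overflow.push_back(v);  // spill to the heap
        }
        ++_size;
    }
    const T& operator[](std::size_t i) const {
        assert(i < _size);
        return i < N ? _inline[i] : _overflow[i - N];
    }
    std::size_t size() const { return _size; }
    bool uses_heap() const { return _size > N; }
};
```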
perf_simple_query -c4 --duration 60, medians:
        ./perf_before   ./perf_after   diff
 read       343086.80      360720.53   5.1%
Tests: unit(release, small_vector in debug)
"
* tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla:
partition_slice: use small_vector for column_ids
mutation_fragment_merger: use small_vector
auth: use small_vector in resource
auth: avoid list-initialisation of vectors
idl: serialiser: add serialiser for utils::small_vector
idl: serialiser: deduplicate vector serialisers
utils: introduce small_vector
intrusive_set_external_comparator: make iterator nothrow move constructible
mutation_fragment_merger: value-initialise iterator
"
This is a backport of CASSANDRA-8236.
Before this patch, Scylla sends the node UP event to the cql client as
soon as it sees a new node join the cluster, i.e., when the new node's
status becomes NORMAL. The problem is that, at this time, the new node's
cql server might not be ready yet. Once the client receives the UP
event, it tries to connect to the new node's cql port and fails.
To fix this, a new application_state::RPC_READY is introduced. A new
node sets RPC_READY to false when it starts gossip at the very beginning,
and sets RPC_READY to true when the cql server is ready.
RPC_READY is a bad name, but I think it is better to follow Cassandra.
Nodes with or without this patch are supposed to work together with no
problem.
Refs #3843
"
* 'asias/node_up_down.upstream.v4.1' of github.com:scylladb/seastar-dev:
storage_service: Use cql_ready facility
storage_service: Handle application_state::RPC_READY
storage_service: Add notify_cql_change
storage_service: Add debug log in notify_joined
storage_service: Add extra check in notify_joined
storage_service: Add notify_joined
storage_service: Add debug log in notify_up
storage_service: Add extra check in notify_up
storage_service: Add notify_up
storage_service: Make notify_left log debug level
storage_service: Introduce notify_left
storage_service: Add debug log in notify_down
storage_service: Introduce notify_down
storage_service: Add set_cql_ready
gossip: Add gossiper::is_cql_ready
gms: Add endpoint_state::is_cql_ready
gms: Add application_state::RPC_READY
gms: Introduce cql_ready in versioned_value
At this point the cql_ready facility is ready. To use it, advertise the
RPC_READY application state in the following cases:
- When a node boots, set it to false
- When cql server is ready, set it to true
- When cql server is down, set it to false
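The three cases above can be written down as a small mapping. The enum
and function names are hypothetical, and the rule in should_notify_up
(report UP only when the peer is NORMAL and advertises RPC_READY as
true) is an assumption drawn from the series description, not a copy of
the storage_service code.

```cpp
#include <cassert>

// Hypothetical events in a node's lifecycle that affect RPC_READY.
enum class cql_event { node_boot, cql_server_started, cql_server_stopped };

// Value to advertise for RPC_READY after each event, per the list above.
constexpr bool rpc_ready_after(cql_event e) {
    return e == cql_event::cql_server_started;
}

// Assumption: clients are told a peer is UP only when it is both in
// NORMAL status and advertises RPC_READY = true.
constexpr bool should_notify_up(bool status_normal, bool rpc_ready) {
    return status_normal && rpc_ready;
}
```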