"This series contains a couple of fixes to the
view_update_from_staging_generator, the object responsible for
generating view updates from sstables written through streaming.
Fixes #4021"
* 'materialized-views/staging-generator-fixes/v2' of https://github.com/duarten/scylla:
db/view/view_update_from_staging_generator: Break semaphore on stop()
db/view/view_update_from_staging_generator: Restore formatting
db/view/view_update_from_staging_generator: Avoid creating more than one fiber
(cherry picked from commit 96172b7bca)
"
As the amount of pending view updates increases we know that there’s a
mismatch between the rate at which the base receives writes and the
rate at which the view retires them. We react by applying backpressure
to decrease the rate of incoming base writes, allowing the slow view
replicas to catch up. We want to delay the client’s next writes to a
base replica and we use the base’s backlog of view updates to derive
this delay.
To validate this approach we tested a 3 node Scylla cluster on GCE,
using n1-standard-4 instances with NVMEs. A loader running on an
n1-standard-8 instance ran cassandra-stress with 100 threads. With the
delay function d(x) set to 1s, we see no base write timeouts. With the
delay function as defined in the series, we see that backlogs stabilize
at some (arbitrary) point, as predicted, but this stabilization
co-exists with base write timeouts. However, the system overall behaves
better than the current version, with the 100 view update limit, and
also better than the version without such limit or any backpressure.
More work is necessary to further stabilize the system. Namely, we want
to keep delaying until we see the backlog is decreasing. This will
require us to add more delay beyond the stabilization point, which in
turn should minimize the base write timeouts, and will also minimize the
amount of memory the backlog takes at each base replica.
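As an illustration only, deriving a delay from a base replica's backlog can be sketched as below. The names `toy_backlog` and `delay_for`, the `budget` parameter and the cubic shape are assumptions made for this sketch; the actual delay function is the one defined in the series and is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical view of a base replica's backlog of pending view updates.
struct toy_backlog {
    double current; // amount currently pending
    double max;     // amount at which the backlog is considered full
};

// Hypothetical delay function: maps the relative backlog x in [0, 1] to a
// delay in [0, budget]. The cubic shape is an assumption of this sketch; it
// keeps delays small for small backlogs and grows quickly as x approaches 1.
std::chrono::microseconds delay_for(toy_backlog b, std::chrono::microseconds budget) {
    double x = b.max > 0 ? std::min(1.0, std::max(0.0, b.current / b.max)) : 0.0;
    double scaled = static_cast<double>(budget.count()) * x * x * x;
    return std::chrono::microseconds(
            static_cast<std::chrono::microseconds::rep>(scaled));
}
```

Any monotonically increasing function would serve the same purpose: the larger the backlog, the longer the client's next write to that base replica is delayed.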
Design document:
https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo
Fixes #2538
"
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
* 'materialized-views/backpressure/v2' of https://github.com/duarten/scylla: (32 commits)
service/storage_proxy: Release mutation as early as possible
service/storage_proxy: Delay replica writes based on view update backlog
service/storage_proxy: Get the backlog of a particular base replica
service/storage_proxy: Add counters for delayed base writes
main: Start and stop the view_update_backlog_broker
service: Distribute a node's view update backlog
service: Advertise view update backlog over gossip
service/storage_proxy: Send view update backlog from replicas
service/storage_proxy: Prepare to receive replica view update backlog
service/storage_proxy: Expose local view update backlog
tests/view_schema_test: Add simple test for db::view::node_update_backlog
db/view: Introduce node_update_backlog class
db/hints: Initialize current backlog
database: Add counter for current view backlog
database: Expose current memory view update backlog
idl: Add db::view::update_backlog
db/view: Add view_update_backlog
database: Wait on view update semaphore for view building
service/storage_proxy: Use near-infinite timeouts for view updates
database: generate_and_propagate_view_updates no longer needs a timeout
...
(cherry picked from commit b66f59aa3d)
The "enable_sstables_mc_format" config item help text wants to remove itself
before release. Since scylla-3.0 did not get enough mc format mileage, we
decided to leave it in, so the notice should be removed.
Fixes #4003.
Message-Id: <20181219082554.23923-1-avi@scylladb.com>
(cherry picked from commit dd51c659f7)
Different nodes can concurrently create the distributed system
keyspace on boot, before the "if not exists" clause can take effect.
However, the resulting schema mutations will be different since
different nodes use different timestamps. This patch forces the
timestamps to be the same across all nodes, so we avoid some schema
mismatches.
This fixes a bug exposed by ca5dfdf, whereby the initialization of the
distributed system keyspace is done before waiting for schema
agreement. While waiting for schema agreement in
storage_service::join_token_ring(), the node still hasn't joined the
ring and schemas can't be pulled from it, so nodes can deadlock. A
similar situation can happen between a seed node and a non-seed node,
where the seed node progresses to a different "wait for schema
agreement" barrier, but still can't make progress because it can't
pull the schema from the non-seed node still trying to join the ring.
Finally, it is assumed that changes to the schema of the current
distributed system keyspace tables will be protected by a cluster
feature and a subsequent schema synchronization, such that all nodes
will be at a point where schemas can be transferred around.
Fixes #3976
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181211113407.20075-1-duarte@scylladb.com>
(cherry picked from commit 89ae3fbf11)
Currently, if the hints directory contains unexpected directories, Scylla
fails to start with an unhandled std::invalid_argument exception. Make the
manager ignore malformed files instead and try to proceed anyway.
Message-Id: <20181121134618.29936-2-gleb@scylladb.com>
(cherry picked from commit b4a8802edc)
We scan hints directory in two places: to search for files to replay and
to search for directories to remove after resharding. The code that
translates a directory name to a shard is duplicated. It is simple now, so
not a big issue, but in case it grows it is better to have it in one place.
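A minimal sketch of such a shared helper, assuming shard directories are named by their decimal shard number. The function name `shard_from_dir_name` is hypothetical and this is not the code from the patch; it only illustrates reporting a malformed name to the caller instead of throwing.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: translates a hints sub-directory name to a shard
// number. Returns std::nullopt for names that are not a plain decimal
// number, so callers can skip unexpected entries instead of failing.
std::optional<unsigned> shard_from_dir_name(const std::string& name) {
    unsigned shard = 0;
    const char* first = name.data();
    const char* last = first + name.size();
    auto [ptr, ec] = std::from_chars(first, last, shard);
    if (ec != std::errc() || ptr != last) {
        return std::nullopt; // empty, non-numeric, trailing junk or overflow
    }
    return shard;
}
```

Both directory scans could then call the one helper and ignore entries for which it returns no value.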
Message-Id: <20181121134618.29936-1-gleb@scylladb.com>
(cherry picked from commit 9433d02624)
"
This series changes hinted handoff to work with `frozen_mutation`s
instead of naked `mutation`s. Instead of unfreezing a mutation from
the commitlog entry and then freezing it again for sending, now we'll
just keep the read, frozen mutation.
Tests: unit(release)
"
* 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla:
db/hints/manager: Use frozen_mutation instead of mutation
db/hints/manager: Use database::find_schema()
db/commitlog/commitlog_entry: Allow moving the contained mutation
service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
service/storage_proxy: Build a shared_mutation from a frozen_mutation
service/storage_proxy: Lift frozen_mutation_and_schema
service/storage_proxy: Allow non-const ranges in mutate_prepare()
(cherry picked from commit 1891779e64)
Remove the timeout argument to
db::view::view_builder::wait_until_built(), a test-only function to
wait until a given materialized view has finished building.
This change is motivated by the fact that some tests running on slow
environments will time out. Instead of incrementally increasing the
timeout, remove it completely since tests are already run under an
exterior timeout.
Fixes #3920
Tests: unit release(view_build_test, view_schema_test)
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181115173902.19048-1-duarte@scylladb.com>
(cherry picked from commit 6fbf792777)
mutate_MV usually calls send_to_endpoint() to push view updates to remote
view replicas. This function gets passed a statistics object,
service::storage_proxy_stats::write_stats and, in particular, updates
its "writes" statistic which counts the number of ongoing writes.
In the case that the paired view replica happens to be the *same* node,
we avoid calling send_to_endpoint() and call mutate_locally() instead.
That function does not take a write_stats object, so the "writes" statistic
doesn't get incremented for the duration of the write. So we should do
this explicitly.
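The explicit accounting can be pictured with the synchronous sketch below. `toy_write_stats`, `ongoing_write_guard` and `apply_locally` are hypothetical names, and the real code path is asynchronous, so this only illustrates keeping the "writes" gauge raised for the duration of a local write.

```cpp
#include <cassert>

// Hypothetical stand-in for the statistics object; only the gauge of
// ongoing writes is modelled.
struct toy_write_stats {
    long writes = 0;
};

// Increments the gauge on construction and decrements it on destruction, so
// the write is counted for exactly as long as the guard is alive.
class ongoing_write_guard {
    toy_write_stats& _stats;
public:
    explicit ongoing_write_guard(toy_write_stats& stats) : _stats(stats) { ++_stats.writes; }
    ~ongoing_write_guard() { --_stats.writes; }
    ongoing_write_guard(const ongoing_write_guard&) = delete;
    ongoing_write_guard& operator=(const ongoing_write_guard&) = delete;
};

// Hypothetical local write path. Returns the gauge value observed while the
// write is in progress, to make the effect visible to the caller.
long apply_locally(toy_write_stats& stats) {
    ongoing_write_guard guard(stats);
    // ... the local mutation would be applied here ...
    return stats.writes;
}
```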
Co-authored-by: Nadav Har'El <nyh@scylladb.com>
Co-authored-by: Duarte Nunes <duarte@scylladb.com>
(cherry picked from commit 1d5f8d0015)
During streaming, there are cases when we should invoke the view write
path. In particular, this applies if we're streaming because of repair, or
if a view has not yet finished building and we're bootstrapping a new node.
The design constraints are:
1) The streamed writes should be visible to new writes, but the
sstable should not participate in compaction, or we would lose the
ability to exclude the streamed writes on a restart;
2) The streamed writes must not be considered when generating view
updates for them;
3) Resilient to node restarts;
4) Resilient to concurrent stream sessions, possibly streaming mutations for overlapping ranges.
We achieve this by writing the streamed writes to an sstable in a
different folder, call it "staging". We achieve 1) by publishing the
sstable to the column family sstable set, but excluding it from
compactions. We do these steps upon boot, by looking at the staging
directory, thus achieving 3).
Fixes #3275
* 'streaming_view_to_staging_sstables_9' of https://github.com/psarna/scylla: (29 commits)
tests: add materialized views test
tests: add view update generator to cql test env
main: add registering staging sstables read from disk
database: add a check if loaded sstable is already staging
database: add get_staging_sstable method
streaming: stream tables with views through staging sstables
streaming: add system distributed keyspace ref to streaming
streaming: add view update generator reference to streaming
main: add generating missed mv updates from staging sstables
storage_service: move initializing sys_dist_ks before bootstrap
db/view: add view_update_from_staging_generator service
db/view: add view updating consumer
table: add stream_view_replica_updates
table: split push_view_replica_updates
table: add as_mutation_source_excluding
table: move push_view_replica_updates to table.cc
database: add populating tables with staging sstables
database: add creating /staging directory for sstables
database: add sstable-excluding reader
table: add move_sstable_from_staging_in_thread function
...
(cherry picked from commit a38f6078fb)
When a node reshards (i.e., restarts with a different number of CPUs), and
is in the middle of building a view for a pre-existing table, the view
building needs to find the right token from which to start building on all
shards. We ran the same code on all shards, hoping they would all make
the same decision on which token to continue. But in some cases, one
shard might make the decision, start building, and make progress -
all before a second shard goes to make the decision, which will now
be different.
This resulted, in some rare cases, in the new materialized view missing
a few rows when the build was interrupted with a resharding.
The fix is to add the missing synchronization: All shards should make
the same decision on whether and how to reshard - and only then should
start building the view.
Fixes #3890
Fixes #3452
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20181028140549.21200-1-nyh@scylladb.com>
(cherry picked from commit b8337f8c9d)
Limit message size according to the configuration, to prevent a huge message
from allocating all of the server's memory.
We also need to limit memory used in aggregate by thrift, but that is left to
another patch.
Fixes #3878.
Message-Id: <20181024081042.13067-1-avi@scylladb.com>
(cherry picked from commit a9836ad758)
"
Hinted handoff should not overpower regular flows like READs, WRITEs or
background activities like memtable flushes or compactions.
In order to achieve this, put its sending in the STREAMING CPU scheduling
group and its commitlog object into the STREAMING I/O scheduling group.
Fixes #3817
"
* 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla:
db::hints::manager: use "streaming" I/O scheduling class for reads
commitlog::read_log_file(): set the a read I/O priority class explicitly
db::hints::manager: add hints sender to the "streaming" CPU scheduling group
(cherry picked from commit 1533487ba8)
"
Refs #3828
(Probably fixes it)
We found a few flaws in the way we enable hints replaying.
First of all, it was allowed before manager::start() was complete.
Then, since manager::start() is called after messaging_service is
initialized, there was a time window when hints were rejected, and this
created an issue for MV.
Both issues above were found in the context of #3828.
This series fixes them both.
Tested {release}:
dtest: materialized_views_test.py:TestMaterializedViews.write_to_hinted_handoff_for_views_test
dtest: hintedhandoff_additional_test.py
"
* 'hinted_handoff_dont_create_hints_until_started-v1' of https://github.com/vladzcloudius/scylla:
hinted handoff: enable storing hints before starting messaging_service
db::hints::manager: add a "started" state
db::hints::manager: introduce a _state
(cherry picked from commit 3a53b3cebc)
"
Hints are stored on disk by a hints::manager, ensuring they are
eventually sent. A hints::resource_manager ensures the hints::managers
it tracks don't consume more than their allocated resources by
monitoring disk space and disabling new hints if needed. This series
fixes some bugs related to the backlog calculation, but mainly exposes
the backlog through a hints::manager so upper layers can apply flow
control.
Refs #2538
"
* 'hh-manager-backlog/v3' of https://github.com/duarten/scylla:
db/hints/manager: Expose current backlog
db/hints/manager: Move decision about blocking hints to the manager
db/hints/resource_manager: Correctly account resources in space_watchdog
db/hints/resource_manager: Replace timer with seastar::thread
db/hints/resource_manager: Ensure managers are correctly registered
db/hints/resource_manager: Fix formatting
db/hints: Disallow moving or copying the managers
"
This patchset makes it possible to use SSTables 'mc' format, commonly
referred to as 'SSTables 3.x', when running a Scylla instance.
Several bugs found along the way are fixed. Also, a configuration option
is introduced to allow running Scylla either with 'mc' or 'la' format
as default.
Tests: unit {release}
+ tested Scylla with both 'la' and 'mc' formats to work fine:
cqlsh> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE test;
cqlsh:test> CREATE TABLE cfsst3 (pk int, ck int, rc int, PRIMARY KEY (pk, ck)) WITH compression = {'sstable_compression': ''};
cqlsh:test> INSERT INTO cfsst3 (pk, ck, rc) VALUES ( 4, 7, 8);
<<flush>>
cqlsh:test> DELETE from cfsst3 WHERE pk = 4 and ck> 3 and ck < 8;
<<flush>>
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 2, 3);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 4, 6);
cqlsh:test> SELECT * FROM cfsst3 ;
pk | ck | rc
----+----+------
2 | 3 | null
4 | 6 | null
(2 rows)
<<Scylla restart>>
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 5, 7);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 6, 8);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 7, 9);
cqlsh:test> INSERT INTO cfsst3 (pk, ck) VALUES ( 8, 10);
cqlsh:test> SELECT * from cfsst3 ;
pk | ck | rc
----+----+------
5 | 7 | null
8 | 10 | null
2 | 3 | null
4 | 6 | null
7 | 9 | null
6 | 8 | null
(6 rows)
"
* 'projects/sstables-30/try-runtime/v8' of https://github.com/argenet/scylla:
database: Honour enable_sstables_mc_format configuration option.
sstables: Support SSTables 'mc' format as a feature.
db: Add configuration option for enabling SSTables 'mc' format.
tests: Add test for reading a complex column with zero subcolumns (SST3).
sstables: Fix parsing of complex columns with zero subcolumns.
sstables: Explicitly cast api::timestamp_type to uint64_t when delta-encoding.
sstables: Use parser_type instead of abstract_type::parse_type in column_translation.
bytes: Add helper for turning bytes_view into sstring_view.
sstables: Only forward the call to fast_forwarding_to in mp_row_consumer_m if filter exists.
sstables: Fix string formatting for exception messages in m_format_read_helpers.
sstables: Don't validate timestamps against the max value on parsing.
sstables: Always store only min bases in serialization_header.
sstables: Support 'mc' version parsing from filename.
SST3: Make sure we call consume_partition_end
This flag will only be used for testing purposes until Scylla 3.0
release and will be removed once SSTables 'mc' testing is completed.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"
This series changes commitlog write path so that it uses fragmented
buffers and therefore avoids large allocations. This is done by first
switching the code to use seastar memory_output_stream interface, which
can handle fragmented buffers without any additional action needed from
the user code, and then making it use buffers of fixed size 128 kB.
Tests: unit(release, debug) dtest(commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup commitlog_test.py:TestCommitLog.test_commitlog_replay_with_alter_table)
"
* tag 'fragmented-commitlog-writes/v3' of https://github.com/pdziepak/scylla:
commitlog: switch to fragmented buffers
commitlog: drop buffer pools
commitlog: drop recovery from bad alloc
utils: drop data_output
commitlog: use memory_output_stream
serialization_visitors: add support for memory_output_stream
utils: fragmented_temporary_buffer::view: add remove_prefix()
utils: fragmented_temporary_buffer: add empty() and size_bytes()
utils: fragmented_temporary_buffer: add get_ostream()
idl: serializer: don't assume Iterator::value_type is bytes_view
idl: serializer: create buffer view from streams
utils: crc: accept FragmentRange
Currently timeout is opt-in, that is, all methods that even have it
default it to `db::no_timeout`. This means that ensuring timeout is used
where it should be is completely up to the author and the reviewers of
the code. As humans are notoriously prone to mistakes this has resulted
in a very inconsistent usage of timeout, many clients of
`flat_mutation_reader` passing the timeout only to some members and only
on certain call sites. This is small wonder considering that some core
operations like `operator()()` only recently received a timeout
parameter and others like `peek()` didn't even have one until this
patch. Both of these methods call `fill_buffer()` which potentially
talks to the lower layers and is supposed to propagate the timeout.
All this makes the `flat_mutation_reader`'s timeout effectively useless.
To bring order to this chaos, make the timeout parameter a mandatory one
on all `flat_mutation_reader` methods that need it. This ensures that
humans now get a reminder from the compiler when they forget to pass the
timeout. Clients can still opt-out from passing a timeout by passing
`db::no_timeout` (the previous default value) but this will now be
explicit and developers should think before typing it.
There were surprisingly few core call sites to fix up. Where a timeout
was available nearby I propagated it to be able to pass it to the
reader, where I couldn't I passed `db::no_timeout`. Authors of the
latter kind of code (view, streaming and repair are some of the notable
examples) should maybe consider propagating down a timeout if needed.
In the test code (the vast majority of the changes) I just used
`db::no_timeout` everywhere.
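The effect of making the parameter mandatory can be shown on a toy reader. Everything below (`toy_reader`, `reader_clock`, `reader_no_timeout`) is a hypothetical stand-in written for this sketch, not the `flat_mutation_reader` interface; it only shows that without a default argument every call site has to spell out a timeout, and that `peek()` and `operator()()` both forward it to `fill_buffer()`.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <stdexcept>

using reader_clock = std::chrono::steady_clock;
// Stand-in for db::no_timeout: a deadline that is never reached.
const reader_clock::time_point reader_no_timeout = reader_clock::time_point::max();

class toy_reader {
    std::deque<int> _buffer;
    int _next_value = 0;

    // Talks to the "lower layer"; gives up if the deadline has passed.
    void fill_buffer(reader_clock::time_point timeout) {
        if (reader_clock::now() >= timeout) {
            throw std::runtime_error("timed out");
        }
        _buffer.push_back(_next_value++);
    }
public:
    // No default argument: omitting the timeout is a compile error.
    int peek(reader_clock::time_point timeout) {
        if (_buffer.empty()) {
            fill_buffer(timeout);
        }
        return _buffer.front();
    }
    int operator()(reader_clock::time_point timeout) {
        int value = peek(timeout);
        _buffer.pop_front();
        return value;
    }
};
```

A caller that really wants no deadline writes `reader(reader_no_timeout)`, which makes the opt-out visible in review.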
Tests: unit(release, debug)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>
So far commitlog was using contiguous buffers for storing the data that
is about to be written to disk. It was able to coalesce small writes so
that multiple small mutations would use the same buffer, but if a
mutation was large the commitlog would attempt to allocate a single,
appropriately large buffer. This excessively stresses the memory
allocator and may cause memory fragmentation to become an issue. The
solution is to use fixed-size buffers of 128 kB, which is the standard
buffer size in Scylla, and to keep large values fragmented.
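The following sketch illustrates the idea of keeping a large write fragmented across fixed-size buffers. `toy_fragmented_buffer` is a hypothetical class written for this example and is unrelated to the actual commitlog or `fragmented_temporary_buffer` code; only the 128 kB fragment size is taken from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t fragment_size = 128 * 1024;

// Hypothetical buffer that never allocates more than fragment_size bytes
// contiguously, however large a single write is.
class toy_fragmented_buffer {
    std::vector<std::vector<char>> _fragments;
    std::size_t _size = 0;
public:
    void write(const char* data, std::size_t len) {
        while (len > 0) {
            // Start a new fragment when there is none or the last one is full.
            if (_fragments.empty() || _fragments.back().size() == fragment_size) {
                _fragments.emplace_back();
                _fragments.back().reserve(fragment_size);
            }
            auto& frag = _fragments.back();
            // Small writes are coalesced into the remaining space.
            std::size_t n = std::min(len, fragment_size - frag.size());
            frag.insert(frag.end(), data, data + n);
            data += n;
            len -= n;
            _size += n;
        }
    }
    std::size_t size_bytes() const { return _size; }
    std::size_t fragment_count() const { return _fragments.size(); }
};
```

A 300 kB write ends up in three fragments (128 kB, 128 kB and 44 kB), and a following small write is appended to the last one.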
Buffer pools were added in 7191a130bb
"Commitlog: recycle buffers to reduce fragmentation." They introduce a
lot of complexity and will become unnecessary once the code is switched
to use fixed-size 128kB buffers.
If a node cannot allocate a 128 kB buffer it is already in very bad shape, so
there isn't much value in trying to recover by attempting smaller
allocations and it just adds more complexity to the segment allocation.
It actually may be better to let some requests fail and give the node a
chance to recover rather than trying to use every last byte of free
memory and end up with bad_alloc in a noexcept context.
In previous patches, we gave up on an old (and broken) attempt to track
the timestamps of many unselected base-table columns through one row marker
in the view table - and replaced them by "virtual cells", one per unselected
cell.
The do_delete_old_entry() function still contains old code which maintained
that row marker, and is no longer needed. That old code is not only no longer
needed, it also no longer does anything, because all columns now appear in
the view (as virtual columns) so the code ignores them when calculating the
row marker.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180829131914.16042-1-nyh@scylladb.com>
"
When a view's partition key contains only columns from the base's partition
key (and not an additional one), the liveness - existence or disappearance -
of a view-table row is tied to the liveness of the base table row. And
that, in turn, depends not only on selected columns (base-table columns
SELECTed to also appear in the view) but also on unselected columns.
This means that we may need to keep a view row alive even without data,
just because some unselected column is alive in the base table. Before this
patch set we tried to build a single "row marker" in the view column which
tried to summarize the liveness information in all unselected columns.
But this proved unworkable, as explained in issue #3362 and as will be
demonstrated in unit tests at the end of this series.
Because we can't replace several unselected cells by one row marker, what
we do in this series is to add, for each of the unselected cells, a "virtual
cell" which contains the cell's liveness information (timestamp, deletion,
ttl) but not its value. For collections, we can't represent the entire
collection by one virtual cell, and rather need a collection of virtual
cells.
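As a rough picture of what a virtual cell carries, consider the simplified sketch below. The types `toy_cell` and `toy_virtual_cell` and the liveness rule in `keeps_row_alive` are assumptions of this example (expiry and timestamp reconciliation are ignored); they are not Scylla's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified base-table cell.
struct toy_cell {
    std::int64_t timestamp;
    std::optional<std::int32_t> ttl; // seconds, if the cell expires
    bool deleted;
    std::string value;
};

// The "virtual cell": the same liveness information, but no value.
struct toy_virtual_cell {
    std::int64_t timestamp;
    std::optional<std::int32_t> ttl;
    bool deleted;
};

toy_virtual_cell to_virtual(const toy_cell& c) {
    return toy_virtual_cell{c.timestamp, c.ttl, c.deleted};
}

// In this sketch a view row stays alive while any virtual cell is alive.
bool keeps_row_alive(const std::vector<toy_virtual_cell>& cells) {
    for (const auto& c : cells) {
        if (!c.deleted) {
            return true;
        }
    }
    return false;
}
```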
Fixes #3362
"
* 'virtual-cols-v3' of https://github.com/nyh/scylla:
Materialized Views: test that virtual columns are not visible
Materialized Views: unit test reproducing fixed issue #3362
Materialized Views: no need for elaborate row marker calculations
Materialized Views: add unselected columns as virtual columns
Materialized Views: fill virtual columns
Do not allow selecting a virtual column
schema: persist "view virtual" columns to a separate system table
schema: add "view virtual" flag to schema's column_definition
Add "empty" type name to CQL parser, but only for internal parsing
Memtable flushes for system and regular region groups run under the
memtable_scheduling_group, but the controller adjusts shares based on
the occupancy of the regular region group.
It can happen that regular is not under pressure, but system is. In
this case the controller will incorrectly assign low shares to the
memtable flush of system. This may result in high latency and low
throughput for writes in the system group.
I observed writes to the system keyspace timing out (on scylla-2.3-rc2)
in the dtest: limits_test.py:TestLimits.max_cells_test, which went
away after this.
Fixes #3717.
Message-Id: <1535016026-28006-1-git-send-email-tgrabiec@scylladb.com>
Now that we have separate virtual cells to represent unselected columns
in a materialized view, we no longer need the elaborate row-marker liveness
calculations which aimed (but failed) to do the same thing. So that code
can be removed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When a view's partition key contains only columns from the base's partition
key (and not an additional one), the liveness (existence or disappearance)
of a view-table row is tied to the liveness of the base table row - and
that depends not only on selected columns (base-table columns SELECTed to
also appear in the view) but also on unselected columns.
This means that we may need to keep a view row alive even without data,
just because some unselected column is alive in the base table. Before this
patch we tried to build a single "row marker" in the view column which
summarizes the liveness information in all unselected columns, but this
proved unworkable, as explained in issue #3362 and as will be demonstrated
in unit tests in a later patch.
Because we can't replace several unselected cells by one row marker, what
we do in this patch is to add, for each of the unselected cells, a "virtual
cell" which contains the cell's liveness information (timestamp, deletion,
ttl) but not its value. For collections, we can't represent the entire
collection by one virtual cell, and rather need a collection of virtual
cells.
This patch just adds the virtual columns to the view schema. Code in
the previous patch, when it notices the virtual columns in the view's
schema, adds the appropriate content into these columns.
We may need to add virtual columns to a view when first created, but also
when an unselected column is added to the base table with "ALTER TABLE",
so both are supported in this patch.
Fixes #3362.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The add_cells_to_view() function usually adds selected cells from the base
table to the view mutation. For issue #3362, we sometimes want to also
add unselected cells as "virtual" cells - truncated versions of the
base-table cells just without the values.
This patch contains the code to fill the virtual columns' data using the
regular columns from the base table.
This patch does not yet actually *add* any virtual columns to the schema,
so until that is done (in the next patch), this patch will not yet cause
any behavior change. This is important for bisectability.
Refs #3362.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In the previous patch, we added a "view virtual" flag on columns. In this
patch we add persistence to this flag: i.e., writing it to the on-disk
schema table and reading it back on startup. But the implementation is
not as simple as adding a flag:
In the on-disk system tables, we have a "columns" table listing all the
columns in the database and their types. Cqlsh's "DESCRIBE MATERIALIZED
VIEW" works by reading this "columns" table, and listing all of the
requested view's columns. Therefore, we cannot add "virtual columns" -
which are columns not added by the user and not intended to be seen -
to this list.
We therefore need to create in this patch a separate list for virtual
columns, in a new table "view_virtual_columns". This table is essentially
identical to the existing "columns" table, just separate. We need to write
each column to the appropriate table (columns with the view_virtual flag to
"view_virtual_columns", columns without it to the old "columns"), read
from both on startup, and remember to delete columns from both when a table
is dropped.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Even before this patch, Scylla supported the "empty" type (a column with
no content) but only internally - i.e., in code but not in CQL syntax.
The "empty" type was used in dense tables without regular columns, and a
special optimization in db::cql_type_parser::parse() allowed this type
name to be parsed when reading the schema tables, without allowing the
"empty" type to be used by users in CQL statements.
However, parse() only supported "empty" itself, and more complex types
like list<empty> were not recognized by parse(). In the following patches,
we plan to add virtual columns to materialized views, with types empty,
list<empty> or map<something, empty>. We need all these types to work, and
before this patch, they don't. But we want all of these types to only work
internally - when Scylla's code creates these hidden columns; we do not
want to add the "empty" type to CQL's syntax.
This is what we do in this patch: The CQL parser's comparator_type rule
now has a parameter, "internal", used to differentiate internal calls
via db::cql_type_parser::parse() from calls from CQL query parsing.
If a user tries something like:
CREATE TABLE e (pk empty PRIMARY KEY);
He will get the error:
Invalid (reserved) user type name empty
Note that here, as usual, unknown types are treated as "user types",
and "empty" is not allowed as a user type name - we "reserve" it in case
one day in the future we will want to allow users a direct syntax to
create empty columns. We already have, following Cassandra, a bunch of
other names reserved from being user type names, including "byte",
"complex", and others (see _reserved_type_names()), and using "empty"
as a type name will result in a similar error message.
Just like all other type names, the name "empty" is not a reserved
keyword in other senses: a user can create a table or a column with
the name "empty", just like he can create one with the name "int".
Refs #3362.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When murmur3_ignore_msb_bits was introduced in 1.7, we set its default to zero
(to avoid resharding on upgrade) and set it to 12 in the scylla.yaml template
(to make sure we get the right value for new clusters).
Now, however, things have changed:
- clusters installed before 1.7 are a small minority
- they should have resharded long ago
- resharding is much better these days
- we have more migrations from Cassandra compared to old clusters
To allow clusters that migrated using their cassandra.yaml, and to clean up
the default scylla.yaml, make the default 12.
Users upgrading from pre-1.7 clusters will need to update their scylla.yaml,
or to reshard (which is a good idea anyway).
Fixes #3670.
Message-Id: <20180808063003.26046-1-avi@scylladb.com>
"
This series replaces infinite time-outs in internal distributed
(non-local) CQL queries with finite ones.
The implementation of tracing, which also performs internal queries,
already has finite time-outs, so it is unchanged.
Fixes #3603.
"
* 'jhk/finite_time_outs/v2' of https://github.com/hakuch/scylla:
Use finite time-outs for internal auth. queries
Use finite query time-outs for `system_distributed`
The moving operation changes a node's token to a new token. It is
supported only when a node has one token. The legacy moving operation was
useful in the early days, before vnodes were introduced, when a node had
only one token. I don't think it is useful anymore.
In the future, we might support adjusting the number of vnodes to rebalance
the token range each node owns.
Removing it simplifies the cluster operation logic and code.
Fixes #3475
Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>
std::random_device() uses the relatively slow /dev/urandom, and we rarely if
ever intend to use it directly - we normally want to use it to seed a faster
random_engine (a pseudo-random number generator).
In many places in the code, we first created a random_device variable, and then
using it created a random_engine variable. However, this practice created the
risk of a programmer accidentally using the random_device object, instead of the
random_engine object, because both have the same API; this hurts performance.
This risk materialized in just two places in the code, utils/uuid.cc and
gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is
not included in this patch, and the fix for gossiper.{cc,hh} is included here.
To avoid risking the same mistake in the future, this patch switches across the
code to an idiom where the random_device object is not *named*, so cannot be
accidentally used. We use the following idiom:
std::default_random_engine _engine{std::random_device{}()};
Here std::random_device{}() creates the random device (/dev/urandom) and pulls
a random integer from it. It then uses this seed to create the random_engine
(the pseudo-random number generator). The std::random_device{} object is
temporary and unnamed, and cannot be unintentionally used directly.
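For illustration, the idiom could be used in a class like the one below. `toy_peer_picker` is a hypothetical example and not the gossiper code; it shows that only the seeded engine is reachable by name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical class, for illustration only.
class toy_peer_picker {
    // The random_device is a temporary: it is asked for one seed value and
    // then destroyed, so no later code can call it by mistake.
    std::default_random_engine _engine{std::random_device{}()};
public:
    // Returns an index in [0, n - 1]; n must be greater than zero.
    std::size_t pick(std::size_t n) {
        std::uniform_int_distribution<std::size_t> dist(0, n - 1);
        return dist(_engine);
    }
};
```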
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180726154958.4405-1-nyh@scylladb.com>
This series contains a couple of fixes to the bookkeeping of the view
build process, which could cause data to be left behind in the system
tables.
* git@github.com:duarten/scylla.git materialized-views/view-build-fixes/v1:
Duarte Nunes (3):
db/system_keyspace: Add function to remove view build status of a
shard
db/view: Don't have shard 0 clear other shard's status on drop
db/view: Restrict writes to the distributed system keyspace to shard 0
This series contains a couple of fixes to the adjusting of clustering
keys in the build_progress_virtual_reader, some of which could
potentially cause heap overflows when querying the legacy system table.
* git@github.com:duarten/scylla.git materialized-views/build-progress-virtual-reader-fixes/v1:
Duarte Nunes (3):
db/view/build_progress_virtual_reader: Use correct schema to adjust ck
db/view/build_progress_virtual_reader: Fix full ck detection
db/view/build_progress_virtual_reader: Also adjust end RT bound
Use abstract_type::to_string() to prettify partition key components.
Manually tested by setting --compaction-large-partition-warning-threshold-mb
to zero and inspecting the output for compound and non-compound partition
keys.
As an optimization, the virtual reader doesn't change the underlying
key if it is not full, and hence doesn't include the extra clustering
key. However, this detection is broken because it checked for 3
clustering columns, instead of 2.
This patch fixes that by obtaining the clustering key size from the
underlying schema instead of hardcoding the size.
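A minimal sketch of the corrected check, using hypothetical stand-in types (`toy_schema`, `toy_clustering_key`) rather than Scylla's schema and key classes. It also assumes, for the sake of the example, that the extra clustering component is the last one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the underlying schema and a clustering key.
struct toy_schema {
    std::size_t clustering_key_size;
};
using toy_clustering_key = std::vector<std::string>;

// A key is full when it has as many components as the underlying schema
// declares; the size is not hardcoded.
bool is_full(const toy_schema& underlying, const toy_clustering_key& ck) {
    return ck.size() == underlying.clustering_key_size;
}

// Sheds the extra (assumed last) component only from full keys; prefix keys
// do not contain it and are returned unchanged.
toy_clustering_key adjust(const toy_schema& underlying, toy_clustering_key ck) {
    if (is_full(underlying, ck)) {
        ck.pop_back();
    }
    return ck;
}
```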
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The virtual reader adjusts clustering keys obtained from the
underlying, scylla-specific schema, and potentially sheds the extra
clustering key that's absent from the Cassandra-compatible schema.
This patch ensures we use the correct schema to iterate over the
key.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>