"If a node is bootstrapped with auto_bootstrap disabled, it will not
wait for schema sync before creating global keyspaces for auth and
tracing. When such schema changes are then reconciled with schema on
other nodes, they may overwrite changes made by the user before the
node was started, because they will have a higher timestamp.
To prevent that, let's use the minimum timestamp so that the default schema
always loses to manual modifications. This is what Cassandra does.
Fixes #2129."
* tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev:
db: Create default auth and tracing keyspaces using lowest timestamp
migration_manager: Append actual keyspace mutations with schema notifications
(cherry picked from commit 6db6d25f66)
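As an illustration of why the lowest timestamp helps, here is a minimal sketch of last-write-wins reconciliation. The names keyspace_definition, reconcile() and make_default_keyspace() are hypothetical stand-ins, not the actual Scylla types or functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-ins for illustration; not the real schema types.
using timestamp_type = int64_t;
constexpr timestamp_type min_timestamp = std::numeric_limits<timestamp_type>::min();

struct keyspace_definition {
    std::string replication;   // replication settings, as free text
    timestamp_type timestamp;  // write timestamp of this definition
};

// Last-write-wins: the definition with the higher timestamp survives.
inline const keyspace_definition& reconcile(const keyspace_definition& a,
                                            const keyspace_definition& b) {
    return a.timestamp >= b.timestamp ? a : b;
}

// Default definition created at startup. Stamping it with the minimum
// timestamp means it cannot win against a definition written by a user.
inline keyspace_definition make_default_keyspace() {
    return {"SimpleStrategy, rf=1", min_timestamp};
}
```

With this stamping, reconciling a user-altered definition against the default one returns the user's definition regardless of argument order.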
The skip() implementation for the compressed file input stream incorrectly
handled the case of skipping to the end of file: In that case we just need
to update the file pointer, but not skip anywhere in the compressed disk
file; In particular, we must NOT call locate() to find the relevant on-disk
compressed chunk, because there is none - locate() can only be called on
actual positions of bytes, not on the one-past-end-of-file position.
Fixes #2143
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170308100057.23316-1-nyh@scylladb.com>
(cherry picked from commit 506e074ba4)
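The end-of-file case described in the skip() fix above can be sketched with a simplified model. compressed_stream_model and its fields are hypothetical; only the names skip() and locate() come from the commit message, and the real stream does actual I/O.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified model of a compressed input stream.
struct compressed_stream_model {
    uint64_t uncompressed_length;  // total number of uncompressed bytes
    uint64_t chunk_size;           // uncompressed bytes per on-disk chunk
    uint64_t pos = 0;              // current uncompressed position
    uint64_t locate_calls = 0;     // counts chunk lookups, for illustration

    // Maps a byte position to its chunk index. Only valid for positions of
    // actual bytes, i.e. position < uncompressed_length.
    uint64_t locate(uint64_t position) {
        assert(chunk_size > 0);
        if (position >= uncompressed_length) {
            throw std::out_of_range("locate() called on a non-byte position");
        }
        ++locate_calls;
        return position / chunk_size;
    }

    void skip(uint64_t n) {
        if (n > uncompressed_length - pos) {
            throw std::out_of_range("skip() past end of file");
        }
        pos += n;
        if (pos == uncompressed_length) {
            return;  // at EOF: only the file pointer moves, no chunk to find
        }
        locate(pos);  // a real stream would now seek to this chunk
    }
};
```

Skipping exactly to the end leaves pos equal to uncompressed_length without a call to locate().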
db_apply() expects to be given a time point at which the request will
time out. Originally, do_apply_counter_update() passed 0, which meant
that all requests were timed out if do_apply() needed to wait. The
caller of do_apply_counter_update() is already given a correct timeout
time point so the only thing needed to fix this problem is to propagate
it properly inside do_apply_counter_update() to the call to do_apply().
Fixes #2119.
Message-Id: <20170307104405.5843-1-pdziepak@scylladb.com>
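A minimal model of the timeout problem described above, assuming a steady clock and passing "now" explicitly so the behaviour is deterministic. do_apply_model(), counter_update_buggy() and counter_update_fixed() are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::steady_clock;  // assumption for this sketch

enum class apply_result { applied, timed_out };

// Models the callee: if the request has to wait and its deadline has
// already passed, it is timed out.
inline apply_result do_apply_model(bool needs_to_wait,
                                   clock_type::time_point now,
                                   clock_type::time_point timeout) {
    if (needs_to_wait && timeout <= now) {
        return apply_result::timed_out;
    }
    return apply_result::applied;
}

// Before the fix: the caller's deadline is ignored and a zero time point is
// passed, so every request that needs to wait is timed out.
inline apply_result counter_update_buggy(bool needs_to_wait,
                                         clock_type::time_point now,
                                         clock_type::time_point /*timeout*/) {
    return do_apply_model(needs_to_wait, now, clock_type::time_point{});
}

// After the fix: the deadline given to the caller is propagated unchanged.
inline apply_result counter_update_fixed(bool needs_to_wait,
                                         clock_type::time_point now,
                                         clock_type::time_point timeout) {
    return do_apply_model(needs_to_wait, now, timeout);
}
```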
logalloc::reclaim_lock prevents reclaim from running, which may cause
a regular allocation to fail even though there is enough free memory.
To solve that, there is an allocation_section which acquires reclaim_lock
and, if the allocation fails, runs the reclaimer outside of the lock and
retries the allocation. The patch makes use of allocation_section instead of
direct use of reclaim_lock in memtable code.
Fixes #2138.
Message-Id: <20170306160050.GC5902@scylladb.com>
(cherry picked from commit d7bdf16a16)
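The retry pattern described above can be sketched as follows. region_model and with_allocation_section() are hypothetical simplifications; the real allocation_section and reclaim_lock in logalloc have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical model of a memory region with some reclaimable space.
struct region_model {
    std::size_t free_bytes;
    std::size_t reclaimable_bytes;
    bool reclaim_locked = false;

    void allocate(std::size_t n) {
        if (n > free_bytes) {
            throw std::bad_alloc();
        }
        free_bytes -= n;
    }

    // Reclaim cannot make progress while the lock is held.
    void reclaim() {
        if (reclaim_locked) {
            return;
        }
        free_bytes += reclaimable_bytes;
        reclaimable_bytes = 0;
    }
};

// Runs fn with reclaim locked. If the allocation fails, the lock is
// dropped, the reclaimer runs, and fn is retried once.
template <typename Func>
void with_allocation_section(region_model& r, Func fn) {
    for (int attempt = 0;; ++attempt) {
        r.reclaim_locked = true;
        try {
            fn();
            r.reclaim_locked = false;
            return;
        } catch (const std::bad_alloc&) {
            r.reclaim_locked = false;
            if (attempt == 1 || r.reclaimable_bytes == 0) {
                throw;  // reclaiming cannot help any further
            }
            r.reclaim();
        }
    }
}
```

Holding only the lock, without the retry, would surface the first std::bad_alloc even though reclaiming could have freed enough memory.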
If query_time is time_point::min(), which is used by
to_data_query_result(), the result of subtraction of
gc_grace_seconds() from query_time will overflow.
I don't think this bug would currently have user-perceivable
effects. This affects which tombstones are dropped, but in case of
to_data_query_result() uses, tombstones are not present in the final
data query result, and mutation_partition::do_compact() takes
tombstones into consideration while compacting before expiring them.
Fixes the following UBSAN report:
/usr/include/c++/5.3.1/chrono:399:55: runtime error: signed integer overflow: -2147483648 - 604800 cannot be represented in type 'int'
Message-Id: <1488385429-14276-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 4b6e77e97e)
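One possible way to guard such a subtraction is to saturate at time_point::min(). This is a sketch under the assumption of a clock with a 32-bit signed seconds representation (matching the 'int' in the UBSAN report); gc_clock_model and saturating_subtract() are hypothetical, and the actual patch may take a different approach.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical clock type with a 32-bit signed count of seconds.
struct gc_clock_model {
    using rep = int32_t;
    using period = std::ratio<1>;
    using duration = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<gc_clock_model, duration>;
};

// Returns t - d, clamped to time_point::min() instead of overflowing.
inline gc_clock_model::time_point
saturating_subtract(gc_clock_model::time_point t, gc_clock_model::duration d) {
    using time_point = gc_clock_model::time_point;
    assert(d >= gc_clock_model::duration::zero());
    // min() + d cannot overflow for non-negative d, so compare on that side.
    if (t.time_since_epoch() < time_point::min().time_since_epoch() + d) {
        return time_point::min();
    }
    return t - d;
}
```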
Since gcc-5/stretch=5.4.1-2 was removed from the apt repository, we are no
longer able to build gcc-5.
To avoid the dead link, use launchpad.net archives instead of apt-get source.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1488189378-5607-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit ba323e2074)
Fixes the following UBSAN warning:
core/semaphore.hh:293:74: runtime error: reference binding to misaligned address 0x0000006c55d7 for type 'struct basic_semaphore', which requires 8 byte alignment
Since the field was not initialized properly, this probably also fixes some
user-visible bug.
Message-Id: <1488368222-32009-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 0c84f00b16)
Failing to close a file properly before destroying file's object causes
crashes.
[tgrabiec: fixed typo]
Fixes #2122.
Message-Id: <20170221144858.GG11471@scylladb.com>
(cherry picked from commit 0977f4fdf8)
Set murmur3_partitioner_ignore_msb_bits to 12 (enabling the new sharding
algorithm), but do this in scylla.yaml rather than the built-in defaults.
This avoids changing the configuration for existing clusters, as their
scylla.yaml file will not be updated during the upgrade.
Message-Id: <20170214123253.3933-1-avi@scylladb.com>
(cherry picked from commit 9b113ffd3e)
"This series contains some fixes and a unit test for the logic responsible
for locking counter cells."
* 'pdziepak/cell-locking-fixes/v1' of github.com:cloudius-systems/seastar-dev:
tests: add test for counter cell locker
cell_locking: fix schema upgrades
cell_locker: make locker non-movable
cell_locking: allow to be included by anyone
(cherry picked from commit b8c4b35b57)
ninja-build-1.6.0-2.fc23.src.rpm was deleted from the Fedora web site for
some reason, but there is ninja-build-1.7.2-2 on EPEL, so we don't need to
backport from Fedora anymore.
Fixes #2087
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1487155729-13257-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit 9c8515eeed)
"This series makes sure that schemas containing both counter and non-counter
regular or static columns are not allowed."
* 'pdziepak/disallow-mixed-schemas/v1' of github.com:cloudius-systems/seastar-dev:
schema: verify that there are no both counter and non-counter columns
test/mutation_source: specify whether to generate counter mutations
tests/canonical_mutation: don't try to upgrade incompatible schemas
(cherry picked from commit 9e4ae0763d)
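The rule introduced by this series can be sketched as a simple predicate over the regular and static columns. column_model and has_consistent_counter_usage() are hypothetical names; the real check lives in the schema code and may report errors differently.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of a regular or static column.
struct column_model {
    std::string name;
    bool is_counter;
};

// Valid when the columns are either all counters or all non-counters.
inline bool has_consistent_counter_usage(const std::vector<column_model>& columns) {
    auto is_counter = [](const column_model& c) { return c.is_counter; };
    return std::all_of(columns.begin(), columns.end(), is_counter)
        || std::none_of(columns.begin(), columns.end(), is_counter);
}
```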
Merge commit 45b6070832 used a butchered version of the storage_proxy
patch to adjust to the rpc timer change instead of the one I sent. This
patch fixes the differences.
Message-Id: <20170206095237.GA7691@scylladb.com>
(cherry picked from commit 3c372525ed)
scylla-housekeeping needs to run in 'restart mode' to check the version during
scylla-server restart, but this mode was not invoked by a systemd timer, so
add it.
The existing scylla-housekeeping.timer is renamed to
scylla-housekeeping-daily.timer, since it runs 'daily mode'.
Fixes #1953
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1486180031-18093-1-git-send-email-syuu@scylladb.com>
(cherry picked from commit e82932b774)
The comparator constructor took schema by value instead of by const l-value
reference and, consequently, later tried to access an object that had been
destroyed long ago.
Message-Id: <20170202135853.8190-1-pdziepak@scylladb.com>
(cherry picked from commit 37b0c71f1d)
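The lifetime problem described above can be illustrated with a small comparator that keeps a reference to its schema. schema_model and column_comparator are hypothetical; the buggy constructor is shown only in a comment because using it would be undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema: columns in their defined order.
struct schema_model {
    std::vector<std::string> column_order;
};

class column_comparator {
    const schema_model& _schema;  // non-owning; the schema must outlive us
public:
    // Buggy form (do not use):
    //   explicit column_comparator(schema_model s) : _schema(s) {}
    // binds _schema to the by-value parameter, which is destroyed when the
    // constructor returns, leaving a dangling reference.
    explicit column_comparator(const schema_model& s) : _schema(s) {}

    bool operator()(const std::string& a, const std::string& b) const {
        return position(a) < position(b);
    }
private:
    // Index of the column in the schema; unknown names sort last.
    std::size_t position(const std::string& name) const {
        for (std::size_t i = 0; i < _schema.column_order.size(); ++i) {
            if (_schema.column_order[i] == name) {
                return i;
            }
        }
        return _schema.column_order.size();
    }
};
```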
"Before, the logic for releasing writes blocked on dirty worked like this:
1) When region group size changes and it is not under pressure and there
are some requests blocked, then schedule request releasing task
2) request releasing task, if no pressure, runs one request and if there are
still blocked requests, schedules next request releasing task
If requests don't change the size of the region group, then either some request
executes or there is a request releasing task scheduled. The amount of scheduled
tasks is at most 1, there is a single releasing thread.
However, if requests themselves would change the size of the group, then each
such change would schedule yet another request releasing thread, growing the task
queue size by one.
The group size can also change when memory is reclaimed from the group (e.g.
when it contains sparse segments). Compaction may start many request releasing
threads due to group size updates.
Such behavior is detrimental for performance and stability if there are a lot
of blocked requests. This can happen on 1.5 even with modest concurrency
because timed out requests stay in the queue. This is less likely on 1.6 where
they are dropped from the queue.
The releasing of tasks may start to dominate over other processes in the
system. When the number of scheduled tasks reaches 1000, polling stops and the
server becomes unresponsive until all of the released requests are done, which
is either when they start to block on dirty memory again or run out of blocked
requests. It may take a while to reach pressure condition after memtable flush
if it brings virtual dirty much below the threshold, which is currently the
case for workloads with overwrites producing sparse regions.
I saw this happening in a write workload from issue #2021 where the number of
request releasing threads grew into thousands.
Fix by ensuring there is at most one request releasing thread at a time. There
will be one releasing fiber per region group which is woken up when pressure is
lifted. It executes blocked requests until pressure occurs."
* tag 'tgrabiec/lsa-single-threaded-releasing-v2' of github.com:cloudius-systems/seastar-dev:
tests: lsa: Add test for reclaimer starting and stopping
tests: lsa: Add request releasing stress test
lsa: Avoid avalanche releasing of requests
lsa: Move definitions to .cc
lsa: Simplify hard pressure notification management
lsa: Do not start or stop reclaiming on hard pressure
tests: lsa: Adjust to take into account that reclaimers are run synchronously
lsa: Document and annotate reclaimer notification callbacks
tests: lsa: Use with_timeout() in quiesce()
(cherry picked from commit 7a00dd6985)
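The "at most one releaser" idea from this series can be sketched with a synchronous model. region_group_model is hypothetical and uses a plain re-entrancy flag where the real code uses a fiber per region group that is woken when pressure is lifted.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical, synchronous model of a region group with blocked requests.
class region_group_model {
    std::deque<std::function<void()>> _blocked;
    bool _under_pressure = false;
    bool _releaser_running = false;
public:
    std::size_t releaser_starts = 0;  // how many release loops were started

    void block(std::function<void()> request) {
        _blocked.push_back(std::move(request));
    }

    void set_pressure(bool pressure) {
        _under_pressure = pressure;
        if (!pressure) {
            maybe_release();
        }
    }

    // Called on every size change, possibly from inside a released request.
    void on_size_change() { maybe_release(); }

    std::size_t blocked() const { return _blocked.size(); }
private:
    void maybe_release() {
        if (_releaser_running) {
            return;  // a releaser already exists; do not start another
        }
        _releaser_running = true;
        ++releaser_starts;
        while (!_under_pressure && !_blocked.empty()) {
            auto request = std::move(_blocked.front());
            _blocked.pop_front();
            request();  // may call on_size_change() re-entrantly
        }
        _releaser_running = false;
    }
};
```

Even when every released request changes the group size, a single release loop drains the queue, instead of one new task being scheduled per size change.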
"This series introduces support for counters. The implementation of
counters more or less follows the design described on our wiki page [1].
Counter cells contain many shards with replicas being able to modify
and announce new versions only of the shards that they own. Historically,
there were three types of shards: local, remote and global. In these
patches only support for the global ones is added.
[1] https://github.com/scylladb/scylla/wiki/Counters
Currently, counters are only enabled as an experimental feature, as there
are still several things that need to be done before they become production
ready. Namely, the performance is expected to be quite poor (especially
for writes), there is no proper tracing support and timed out counter
requests may not be recognized and dropped early. There are also no
counter-related metrics.
However, apart from these problems there are no other missing parts of
counter implementation and they are expected to work correctly.
Fixes #577."
* 'pdziepak/counters/v3-rebased' of github.com:cloudius-systems/seastar-dev: (38 commits)
perf_simple_query: add counter tables tests
thrift: add support for counter operations
cql3: allow counters in CREATE TABLE statements
cql3: selection: do not panic when seeing counters
storage_proxy: support counter updates
storage_proxy: add get_live_endpoints()
cql3: add counter increment and decrement operations
db: add operations for applying counter updates
counters: implement transforming counter deltas to shards
add infrastructure for locking counter cells
add fnv1a hasher
position_in_partition: add feed_hash()
position_in_partition: add functions for querying object type
types: make counter_type_impl report its cql3_type
transport: encode counters as long_type
mutation_partition: make for_each_cell() accessible outside source file
messaging_service: add COUNTER_MUTATION verb
storage_service: add COUNTERS feature
idl: add idl description of consistency level
schema: make is_counter() return correct value
...
The leader receives counter updates as deltas which have to be
transformed to counter shards. In order to do that, current local shard
of the modified counter cell needs to be read, logical clock incremented
and the value modified by the specified delta.
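A minimal sketch of that transformation, assuming a shard is just a value plus a logical clock. counter_shard_model and apply_delta() are hypothetical; real shards also carry an owner identifier and the real code reads the current shard from storage.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the leader's own shard of a counter cell.
struct counter_shard_model {
    int64_t value;
    int64_t logical_clock;
};

// Turns a delta into the next version of the local shard: the logical
// clock is incremented and the value is adjusted by the delta.
inline counter_shard_model apply_delta(const counter_shard_model& current,
                                       int64_t delta) {
    return {current.value + delta, current.logical_clock + 1};
}
```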
The leader receives counter updates in the form of deltas which need to be
transformed to counter shards. In order to do that the node needs to
read its current state of the modified counter cells. Since this is
essentially a read-modify-write operation, an appropriate locking
mechanism is needed.
The counter cell locker introduced in this patch uses a hashtable of
partition entries, each containing a hashtable of cell entries. Inside a
cell entry there is a semaphore used for synchronization. Once no longer
needed, cell entries and partition entries are removed.
In order to avoid deadlocks cell entries are always locked in the same
order which is the lexicographical order of (clustering key, column id)
pairs. Note that schema changes are not a difficulty since they do not
make it possible to change ordering of such pairs.
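The ordering rule above can be sketched by sorting the cells to lock before acquiring anything. cell_address and locking_order() are hypothetical, and clustering keys are modelled as plain strings, whereas the real comparison depends on the schema.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cell address: (clustering key, column id).
using cell_address = std::pair<std::string, uint32_t>;

// Returns the cells in the order in which they should be locked:
// lexicographically by (clustering key, column id), without duplicates.
// Two updates that lock in this order cannot wait on each other in a cycle.
inline std::vector<cell_address> locking_order(std::vector<cell_address> cells) {
    std::sort(cells.begin(), cells.end());
    cells.erase(std::unique(cells.begin(), cells.end()), cells.end());
    return cells;
}
```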