Compare commits

...

44 Commits

Author SHA1 Message Date
Pekka Enberg
adad12ddc3 release: prepare for 2.3.rc1 2018-07-24 09:21:38 +03:00
Avi Kivity
a77bb1fe34 Merge "row_cache: Fix violation of continuity on concurrent eviction and population" from Tomasz
"
The problem happens under the following circumstances:

  - we have a partially populated partition in cache, with a gap in the middle

  - a read with no clustering restrictions trying to populate that gap

  - eviction of the entry for the lower bound of the gap concurrent with population

The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals once the cache is cleared.

Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.

The problem is in ensure_population_lower_bound(), which returns true if the
current clustering range covers all rows, which means that the populator has
the right to set the continuity flag to true on the row it inserts. This is
correct only if the current population range actually starts before all
clustering rows. Otherwise, we're populating starting from _last_row and
should consult it.

Fixes #3608.
"

* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
  row_cache: Fix violation of continuity on concurrent eviction and population
  position_in_partition: Introduce is_before_all_clustered_rows()

(cherry picked from commit 31151cadd4)
2018-07-18 12:05:51 +02:00
Tomasz Grabiec
3c7e6dfdb9 mutation_partition: Fix exception-safety of row copy constructor
In case population of the vector throws, the vector object would not
be destroyed. It's a managed object, so in addition to causing a leak,
it would corrupt memory if later moved by the LSA, because it would
try to fixup forward references to itself.

Caused sporadic failures and crashes of row_cache_test, especially
with allocation failure injector enabled.

Introduced in 27014a23d7.
Message-Id: <1531757764-7638-1-git-send-email-tgrabiec@scylladb.com>

(cherry picked from commit 3f509ee3a2)
2018-07-17 18:25:12 +02:00
Amos Kong
fab136ae1d scylla_setup: nic setup dialog is only for interactive mode
The current code raises the dialog even in non-interactive mode when options
are passed to scylla_setup. This blocked the automated artifact-test.

Fixes #3549

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <58f90e1e2837f31d9333d7e9fb68ce05208323da.1531824972.git.amos@scylladb.com>
(cherry picked from commit 0fcdab8538)
2018-07-17 18:24:23 +03:00
Botond Dénes
a4218f536b storage_proxy: use the original row limits for the final results merging
`query_partition_key_range()` does the final result merging and trimming
(if necessary) to make sure we don't send more rows to the client than
requested. This merging and trimming is done by a continuation attached
to the `query_partition_key_range_concurrent()` which does the actual
querying. The continuation captures by value the `row_limit` and
`partition_limit` fields of the `query::read_command` object of the
query. This has an unexpected consequence. The lambda object is
constructed after the call to `query_partition_key_range_concurrent()`
returns. If this call doesn't defer, any modifications made to the read
command object by `query_partition_key_range_concurrent()` will be
visible to the lambda. This is undesirable because
`query_partition_key_range_concurrent()` updates the read command object
directly as the vnodes are traversed which in turn will result in the
lambda doing the final trimming according to a decremented `row_limits`,
which will cause the paging logic to declare the query as exhausted
prematurely because the page will not be full.
To avoid all this make a copy of the relevant limit fields before
`query_partition_key_range_concurrent()` is called and pass these copies
to the continuation, thus ensuring that the final trimming will be done
according to the original page limits.
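The capture-timing pitfall described above can be sketched outside C++ as well. Below is a minimal Python analogue (all names here are illustrative, not Scylla's actual API) showing why the limits must be copied before the mutating call:

```python
class ReadCommand:
    """Stand-in for query::read_command: holds a mutable row limit."""
    def __init__(self, row_limit):
        self.row_limit = row_limit

def query_concurrent(cmd, rows_found):
    # Like query_partition_key_range_concurrent(): decrements the shared
    # command object as ranges are traversed.
    cmd.row_limit -= rows_found

def trim_buggy(cmd, rows):
    query_concurrent(cmd, len(rows))
    # The continuation reads the limit only now, after it was decremented,
    # so a full page gets trimmed down to nothing.
    return rows[:cmd.row_limit]

def trim_fixed(cmd, rows):
    row_limit = cmd.row_limit          # copy the limit up front
    query_concurrent(cmd, len(rows))
    return rows[:row_limit]            # trim by the original page limit
```

With `row_limit=3` and three fetched rows, the buggy variant returns an empty page (and paging would declare the query exhausted), while the fixed variant keeps all three rows.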

Spotted while investigating a dtest failure on my 1865/range-scans/v2
branch. On that branch the way range scans are executed on replicas is
completely refactored. These changes apparently reduce the number of
continuations in the read path to the point where an entire page can be
filled without deferring and thus causing the problem to surface.

Fixes #3605.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>
(cherry picked from commit cc4acb6e26)
2018-07-16 16:55:12 +03:00
Takuya ASADA
9f4431ef04 dist/common/scripts/scylla_prepare: fix error when /etc/scylla/ami_disabled exists
In this part, a shell command wasn't converted to python3; this fixes it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180715075015.13071-1-syuu@scylladb.com>
(cherry picked from commit 9479ff6b1e)
2018-07-16 09:56:57 +03:00
Takuya ASADA
66250bf8cc dist/redhat: drop scylla_lib.sh from .rpm
Since we dropped scylla_lib.sh in 58e6ad22b2,
we need to remove it from the RPM spec file too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180712155129.17056-1-syuu@scylladb.com>
(cherry picked from commit 1511d92473)
2018-07-16 09:44:48 +03:00
Takuya ASADA
88fe3c2694 dist/common/scripts/scylla_ec2_check: support custom NIC ifname on EC2
Since some AMIs use consistent network device naming, the primary NIC
ifname is not 'eth0'.
But we hardcoded the NIC name as 'eth0' in scylla_ec2_check, so we need to
add a --nic option to specify a custom NIC ifname.

Fixes #3584

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180712142446.15909-1-syuu@scylladb.com>
(cherry picked from commit ee61660b76)
2018-07-16 09:44:26 +03:00
Takuya ASADA
db4c3d3e52 dist/common/scripts/scylla_util.py: fix typo
Fix a typo, and rename get_mode_cpu_set() to get_mode_cpuset(), since the
term 'cpuset' does not include '_' in other places.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180711141923.12675-1-syuu@scylladb.com>
(cherry picked from commit 8f80d23b07)
2018-07-16 09:43:47 +03:00
Takuya ASADA
ca22a1cd1a dist/common/scripts: drop scylla_lib.sh
Drop scylla_lib.sh: all bash scripts that depended on the library have
already been converted to python3, and all scylla_lib.sh features are
implemented in scylla_util.py.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180711114756.21823-1-syuu@scylladb.com>
(cherry picked from commit 58e6ad22b2)
2018-07-16 09:43:25 +03:00
Avi Kivity
f9b702764e Update scylla-ami submodule
* dist/ami/files/scylla-ami 5200f3f...d53834f (1):
  > Merge "AMI scripts python3 conversion" from Takuya

(cherry picked from commit 83d72f3755)
2018-07-16 09:43:15 +03:00
Avi Kivity
54701bd95c Merge "more conversion from bash to python3" from Takuya
"Converted more scripts to python3."

* 'script_python_conversion2_v2' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_util.py: make run()/out() functions shorter
  dist/ami: install python34 to run scylla_install_ami
  dist/common/scripts/scylla_ec2_check: move ec2 related code to class aws_instance
  dist/common/scripts: drop class concolor, use colorprint()
  dist/ami/files/.bash_profile: convert almost all lines to python3
  dist/common/scripts: convert node_exporter_install to python3
  dist/common/scripts: convert scylla_stop to python3
  dist/common/scripts: convert scylla_prepare to python3

(cherry picked from commit 693cf77022)
2018-07-16 09:41:50 +03:00
Asias He
30eca5f534 storage_service: Limit the number of times the REPLICATION_FINISHED verb can retry
In the removenode operation, if the messaging service is stopped, e.g., due
to disk I/O error isolation, the node can keep retrying the
REPLICATION_FINISHED verb infinitely.

A Scylla log full of such messages was observed:

[shard 0] storage_service - Fail to send REPLICATION_FINISHED to $IP:0:
seastar::rpc::closed_error (connection is closed)

To fix, limit the number of retries.

Tests: update_cluster_layout_tests.py

Fixes #3542

Message-Id: <638d392d6b39cc2dd2b175d7f000e7fb1d474f87.1529927816.git.asias@scylladb.com>
(cherry picked from commit bb4d361cf6)
2018-07-16 09:33:56 +03:00
Piotr Sarna
cd057d3882 database: make drop_column_family wait on reads in progress
drop_column_family now waits for both writes and reads in progress.
This solves possible liveness issues with the row cache, where a
column_family could be dropped prematurely, before a read request finished.

The phaser operation is passed inside the database::query() call.
There are other places where reading logic is applied (e.g. view
replicas), but these are guarded with different synchronization
mechanisms, while _pending_reads_phaser applies to regular reads only.

Fixes #3357

Reported-by: Duarte Nunes <duarte@scylladb.com>
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
Message-Id: <d58a5ee10596d0d62c765ee2114ac171b6f087d2.1529928323.git.sarna@scylladb.com>
(cherry picked from commit 03753cc431)
2018-07-16 09:32:15 +03:00
Piotr Sarna
c5a5a2265e database: add phaser for reads
Currently drop_column_family waits on write_in_progress phaser,
but there's no such mechanism for reads. This commit adds
a corresponding reads phaser.

Refs #3357

Reported-by: Duarte Nunes <duarte@scylladb.com>
Signed-off-by: Piotr Sarna <sarna@scylladb.com>
Message-Id: <70b5fdd44efbc24df61585baef024b809cabe527.1529928323.git.sarna@scylladb.com>
(cherry picked from commit e1a867cbe3)
2018-07-16 09:32:06 +03:00
Takuya ASADA
3e482c6c9d dist/common/scripts/scylla_util.py: use os.open(O_EXCL) to verify disk is unused
To simplify is_unused_disk(), just try to open the disk instead of
checking multiple block subsystems.
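A minimal sketch of the idea (assuming Linux semantics, where opening a block device with O_EXCL fails with EBUSY while the device is mounted or otherwise claimed by the kernel; this is not the actual scylla_util.py code):

```python
import errno
import os

def is_unused_disk(dev):
    # On Linux, opening a block device with O_EXCL fails with EBUSY if the
    # kernel considers it in use (mounted, part of an MD array, ...), so a
    # single open() replaces probing multiple block subsystems.
    try:
        fd = os.open(dev, os.O_RDONLY | os.O_EXCL)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False
        raise
    os.close(fd)
    return True
```

Note that on a regular file, O_EXCL without O_CREAT is ignored, so the check is meaningful only for block device paths.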

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180709102729.30066-1-syuu@scylladb.com>
(cherry picked from commit 1a5a40e5f6)
2018-07-11 12:51:17 +03:00
Avi Kivity
5b6cadb890 Update scylla-ami submodule
* dist/ami/files/scylla-ami 67293ba...5200f3f (1):
  > Add custom script options to AMI user-data

(cherry picked from commit 7d0df2a06d)
2018-07-11 12:51:08 +03:00
Takuya ASADA
9cf8cd6c02 dist/common/scripts/scylla_util.py: strip double quote from sysconfig parameter
Currently sysconfig_parser.get() returns the parameter including double
quotes, which causes a problem when appending text with sysconfig_parser.set().
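A sketch of the stripping step (the helper name and exact rules here are assumptions for illustration; the real parser lives in scylla_util.py):

```python
def strip_sysconfig_quotes(value):
    # sysconfig files often quote values, e.g. SCYLLA_ARGS="--foo 1".
    # Return the value without the surrounding double quotes so that a
    # later set() can append text without it ending up inside the quotes.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    return value
```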

Fixes #3587

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180706172219.16859-1-syuu@scylladb.com>
(cherry picked from commit 929ba016ed)
2018-07-11 12:51:01 +03:00
Vlad Zolotarov
b34567b69b dist: scylla_lib.sh: get_mode_cpu_set: split the declaration and assignment to the local variable
In bash, a local variable declaration is a separate operation with its own
exit status (always 0); therefore constructs like

local var=`cmd`

will always result in a 0 exit status ($? value) regardless of the actual
result of the "cmd" invocation.

To overcome this, we should split the declaration and the assignment, like this:

local var
var=`cmd`

Fixes #3508

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1529702903-24909-3-git-send-email-vladz@scylladb.com>
(cherry picked from commit 7495c8e56d)
2018-07-11 12:50:51 +03:00
Vlad Zolotarov
02b763ed97 dist: scylla_lib.sh: get_mode_cpu_set: don't let the error messages out
References #3508

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1529702903-24909-2-git-send-email-vladz@scylladb.com>
(cherry picked from commit f3ca17b1a1)
2018-07-11 12:50:43 +03:00
Takuya ASADA
05500a52d7 dist/common/scripts/scylla_sysconfig_setup: fix typo
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180705133313.16934-1-syuu@scylladb.com>
(cherry picked from commit 4df982fe07)
2018-07-11 12:50:32 +03:00
Avi Kivity
4afa558e97 Update scylla-ami submodule
* dist/ami/files/scylla-ami 0fd9d23...67293ba (1):
  > scylla_install_ami: fix broken argument parser

Fixes #3578.

(cherry picked from commit dd083122f9)
2018-07-11 12:50:24 +03:00
Takuya ASADA
f3956421f7 dist/ami: hardcode target for scylla_current_repo since we don't have --target option anymore
We broke build_ami.sh when we dropped Ubuntu support: the scylla_current_repo
command does not finish because of a missing argument ('--target' with no
distribution name, since $TARGET is always blank now).
It needs to be hardcoded as centos.

Fixes #3577

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180705035251.29160-1-syuu@scylladb.com>
(cherry picked from commit 3bcc123000)
2018-07-11 12:49:52 +03:00
Takuya ASADA
a17a6ce8f5 dist/debian/build_deb.sh: simplify build_deb.sh
Use is_debian()/is_ubuntu() to detect the target distribution; also install
pystache by path since the package name differs between Fedora and
CentOS.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180703193224.4773-1-syuu@scylladb.com>
(cherry picked from commit 3cb7ddaf68)
2018-07-11 12:49:40 +03:00
Takuya ASADA
58a362c1f2 dist/ami/files/.bash_profile: drop Ubuntu support
Drop Ubuntu support on login prompt, too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180703192813.4589-1-syuu@scylladb.com>
(cherry picked from commit ed1d0b6839)
2018-07-11 12:49:30 +03:00
Alexys Jacob
361b2dd7a5 Support Gentoo Linux on node_health_check script.
Gentoo Linux was not supported by the node_health_check script,
which resulted in the following error message being displayed:

"This s a Non-Supported OS, Please Review the Support Matrix"

This patch adds support for Gentoo Linux while adding a TODO note
to add support for authenticated clusters, which the script does
not support yet.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20180703124458.3788-1-ultrabug@gentoo.org>
(cherry picked from commit 8c03c1e2ce)
2018-07-11 12:49:22 +03:00
Duarte Nunes
f6a2bafae2 Merge 'Expose sharding information to connections' from Avi
"
In the same way that drivers can route requests to a coordinator that
is also a replica of the data used by the request, we can allow
drivers to route requests directly to the shard. This patchset
adds and documents a way for drivers to know which shard a connection
is connected to, and how to perform this routing.
"

* tag 'shard-info-alt/v1' of https://github.com/avikivity/scylla:
  doc: documented protocol extension for exposing sharding
  transport: expose more information about sharding via the OPTIONS/SUPPORTED messages
  dht: add i_partitioner::sharding_ignore_msb()

(cherry picked from commit 33d7de0805)
2018-07-09 17:06:30 +03:00
Avi Kivity
2ec25a55cd Update seastar submodule
* seastar d7f35d7...814a055 (1):
  > reactor: pollable_fd: limit fragment count to IOV_MAX
2018-07-09 17:05:26 +03:00
Avi Kivity
d3fb7c5515 .gitmodules: branch seastar
This allows us to backport individual patches to seastar for
branch-2.3.
2018-07-09 17:03:50 +03:00
Botond Dénes
b1ac6a36f2 tests/cql_query_test: add unit test for querying empty ranges
A bug was found recently (#3564) in the paging logic, where the code
assumed the queried ranges list is non-empty. This assumption is
incorrect, as there can be valid (if rare) queries that result in the
ranges list being empty. Add a unit test that executes such a query with
paging enabled to detect any future bugs related to assumptions about
the ranges list being non-empty.

Refs: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f5ba308c4014c24bb392060a7e72e7521ff021fa.1530618836.git.bdenes@scylladb.com>
(cherry picked from commit c236a96d7d)
Message-Id: <af315aef64d381a7f486ba190c9a1b5bdd6f800b.1530698046.git.bdenes@scylladb.com>
2018-07-04 12:13:33 +02:00
Botond Dénes
8cba125bce query_pager: use query::is_single_partition() to check for singular range
Use query::is_single_partition() to check whether the queried ranges are
singular or not. The current method of using
`dht::partition_range::is_singular()` is incorrect, as it is possible to
build a singular range that doesn't represent a single partition.
`query::is_single_partition()` correctly checks for this so use it
instead.

Found during code-review.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f671f107e8069910a2f84b14c8d22638333d571c.1530675889.git.bdenes@scylladb.com>
(cherry picked from commit 8084ce3a8e)
2018-07-04 12:03:18 +02:00
Tomasz Grabiec
f46f9f7533 Merge "Fix atomic_cell_or_collection::external_memory_usage()" from Paweł
After the transition to the new in-memory representation in
aab6b0ee27 'Merge "Introduce new in-memory
representation for cells" from Paweł'
atomic_cell_or_collection::external_memory_usage() stopped accounting
for the externally stored data. Since it wasn't covered by the unit
tests, the bug remained unnoticed until now.

This series fixes the memory usage calculation and adds proper unit
tests.

* https://github.com/pdziepak/scylla.git fix-external-memory-usage/v1:
  tests/mutation: properly mark atomic_cells that are collection members
  imr::utils::object: expose size overhead
  data::cell: expose size overhead of external chunks
  atomic_cell: add external chunks and overheads to
    external_memory_usage()
  tests/mutation: test external_memory_usage()

(cherry picked from commit 2ffb621271)
2018-07-04 11:45:06 +02:00
Botond Dénes
090d991f8e query_pager: be prepared for _ranges being empty
do_fetch_page() checks in the beginning whether there is a saved query
state already, meaning this is not the first page. If there is not, it
checks whether the query is for a single partition or a range scan
to decide whether to enable stateful queries or not. This check
assumed that there is at least one range in _ranges, which does not hold
under some circumstances. Add a check for _ranges being empty.

Fixes: #3564
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>
(cherry picked from commit 59a30f0684)
2018-07-03 18:33:25 +03:00
Avi Kivity
ae15a80d01 Merge "more scylla_setup fixes" from Takuya
"
Added NIC / disk existence checks and a --force-raid mode to
scylla_raid_setup.
"

* 'scylla_setup_fix4' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_raid_setup: verify specified disks are unused
  dist/common/scripts/scylla_raid_setup: add --force-raid to construct RAID even if only one disk is specified
  dist/common/scripts/scylla_setup: don't accept disk path if it's not block device
  dist/common/scripts/scylla_raid_setup: verify specified disk paths are block device
  dist/common/scripts/scylla_sysconfig_setup: verify NIC existence

(cherry picked from commit a36b1f1967)
2018-07-03 18:33:04 +03:00
Takuya ASADA
6cf902343a scripts: merge scylla_install_pkg to scylla-ami
scylla_install_pkg was initially written for the one-liner installer, but now
it is only used for creating the AMI, and it is just a few lines of code, so it
should be merged into the scylla_install_ami script.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180612150106.26573-2-syuu@scylladb.com>
(cherry picked from commit 084c824d12)
2018-07-03 18:32:58 +03:00
Takuya ASADA
d5e59f671c dist/ami: drop Ubuntu AMI support
Drop Ubuntu AMI since it's not maintained for a long time, and we have
no plan to officially provide it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180612150106.26573-1-syuu@scylladb.com>
(cherry picked from commit fafcacc31c)
2018-07-03 18:32:53 +03:00
Avi Kivity
38944655c5 Update scylla-ami submodule
* dist/ami/files/scylla-ami 36e8511...0fd9d23 (2):
  > scylla_install_ami: merge scylla_install_pkg
  > scylla_install_ami: drop Ubuntu AMI

(cherry picked from commit 677991f353)
2018-07-03 18:32:45 +03:00
Avi Kivity
06e274ff34 Merge "scylla_setup fixes" from Takuya
"
I found problems in the previously submitted patchsets 'scylla_setup fixes'
and 'more fixes for scylla_setup', so I fixed them and merged them into one
patchset.

Also added a few more patches.
"

* 'scylla_setup_fix3' of https://github.com/syuu1228/scylla:
  dist/common/scripts/scylla_setup: allow input multiple disk paths on RAID disk prompt
  dist/common/scripts/scylla_raid_setup: skip constructing RAID0 when only one disk specified
  dist/common/scripts/scylla_raid_setup: fix module import
  dist/common/scripts/scylla_setup: check disk is used in MDRAID
  dist/common/scripts/scylla_setup: move unmasking scylla-fstrim.timer on scylla_fstrim_setup
  dist/common/scripts/scylla_setup: use print() instead of logging.error()
  dist/common/scripts/scylla_setup: implement do_verify_package() for Gentoo Linux
  dist/common/scripts/scylla_coredump_setup: run os.remove() when deleting directory is symlink
  dist/common/scripts/scylla_setup: don't include the disk on unused list when it contains partitions
  dist/common/scripts/scylla_setup: skip running rest of the check when the disk detected as used
  dist/common/scripts/scylla_setup: add a disk to selected list correctly
  dist/common/scripts/scylla_setup: fix wrong indent
  dist/common/scripts: sync instance type list for detecting NIC type to the latest one
  dist/common/scripts: verify systemd unit existence using 'systemctl cat'

(cherry picked from commit 0b148d0070)
2018-07-03 18:32:35 +03:00
Avi Kivity
c24d4a8acb Merge "Fix handling of stale write replies in storage_proxy" from Gleb
"
If a coordinator sends write requests with ID=X and restarts, it may get a reply to
the request after it restarts and sends another request with the same ID (but to
different replicas). This condition will trigger an assert in the coordinator. Drop
the assertion in favor of a warning and initialize the handler id in a way that makes
this situation less likely.

Fixes: #3153
"

* 'gleb/write-handler-id' of github.com:scylladb/seastar-dev:
  storage_proxy: initialize write response id counter from wall clock value
  storage_proxy: drop virtual from signal(gms::inet_address)
  storage_proxy: do not assert on getting an unexpected write reply

(cherry picked from commit a45c3aa8c7)
2018-07-02 11:56:52 +03:00
Nadav Har'El
5f95b76c65 repair: fix combination of "-pr" and "-local" repair options
When nodetool repair is used with the combination of the "-pr" (primary
range) and "-local" (only repair with nodes in the same DC) options,
Scylla needs to define the "primary ranges" differently: Rather than
assign one node in the entire cluster to be the primary owner of every
token, we need one node in each data-center - so that a "-local"
repair will cover all the tokens.

Fixes #3557.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180701132445.21685-1-nyh@scylladb.com>
(cherry picked from commit 3194ce16b3)
2018-07-02 11:56:41 +03:00
Tomasz Grabiec
0bdb7e1e7c row_cache: Fix memtable reads concurrent with cache update missing writes
Introduced in 5b59df3761.

It is incorrect to erase entries from the memtable being moved to
cache if partition update can be preempted because a later memtable
read may create a snapshot in the memtable before memtable writes for
that partition are made visible through cache. As a result the read
may miss some of the writes which were in the memtable. The code was
checking for presence of snapshots when entering the partition, but
this condition may change if update is preempted. The fix is to not
allow erasing if update is preemptible.

This also caused SIGSEGVs because we were assuming that no such
snapshots will be created and hence were not invalidating iterators on
removal of the entries, which results in undefined behavior when such
snapshots are actually created.

Fixes SIGSEGV in dtest: limits_test.py:TestLimits.max_cells_test

Fixes #3532

Message-Id: <1530129009-13716-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit b464b66e90)
2018-07-01 15:36:21 +03:00
Avi Kivity
56ea4f3154 Merge "Disable sstable filtering based on min/max clustering key components" from Tomasz
"
With DateTiered and TimeWindow, there is a read optimization enabled
which excludes sstables based on overlap with recorded min/max values
of clustering key components. The problem is that it doesn't take into
account partition tombstones and static rows, which should still be
returned by the reader even if there is no overlap in the query's
clustering range. A read which returns no clustering rows can
mispopulate cache, which will appear as a partition deletion or writes
to the static row being lost, until node restart or eviction of the
partition entry.

There is also a bad interaction between cache population on read and
that optimization. When the clustering range of the query doesn't
overlap with any sstable, the reader will return no partition markers
for the read, which leads cache populator to assume there is no
partition in sstables and it will cache an empty partition. This will
cause later reads of that partition to miss prior writes to that
partition until it is evicted from cache or node is restarted.

Disable until a more elaborate fix is implemented.

Fixes #3552
Fixes #3553
"

* tag 'tgrabiec/disable-min-max-sstable-filtering-v1' of github.com:tgrabiec/scylla:
  tests: Add test for slicing a mutation source with date tiered compaction strategy
  tests: Check that database conforms to mutation source
  database: Disable sstable filtering based on min/max clustering key components

(cherry picked from commit e1efda8b0c)
2018-06-27 17:01:28 +03:00
Calle Wilund
d9c178063c sstables::compress: Ensure unqualified compressor name if possible
Fixes #3546

Both older origin and scylla writes "known" compressor names (i.e. those
in origin namespace) unqualified (i.e. LZ4Compressor).

This behaviour was not preserved in the virtualization change. But
probably should be.
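For illustration, unqualifying a known compressor name might look like this sketch (the namespace prefix is Cassandra's compressor package; the helper itself is hypothetical, not the actual sstables::compress code):

```python
ORIGIN_NAMESPACE = "org.apache.cassandra.io.compress."

def unqualified_compressor_name(name):
    # Known (origin-namespace) compressors are written unqualified,
    # e.g. "LZ4Compressor" rather than the fully qualified class name.
    if name.startswith(ORIGIN_NAMESPACE):
        return name[len(ORIGIN_NAMESPACE):]
    return name
```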

Message-Id: <20180627110930.1619-1-calle@scylladb.com>
(cherry picked from commit 054514a47a)
2018-06-27 17:01:22 +03:00
Avi Kivity
b21b7f73b9 version: prepare for scylla 2.3-rc0 2018-06-27 14:14:19 +03:00
50 changed files with 1037 additions and 687 deletions

.gitmodules

@@ -1,6 +1,6 @@
 [submodule "seastar"]
 	path = seastar
-	url = ../seastar
+	url = ../scylla-seastar
 	ignore = dirty
 [submodule "swagger-ui"]
 	path = swagger-ui


@@ -1,6 +1,6 @@
 #!/bin/sh
-VERSION=666.development
+VERSION=2.3.rc1
 if test -f version
 then


@@ -187,7 +187,24 @@ size_t atomic_cell_or_collection::external_memory_usage(const abstract_type& t)
         return 0;
     }
     auto ctx = data::cell::context(_data.get(), t.imr_state().type_info());
-    return data::cell::structure::serialized_object_size(_data.get(), ctx);
+    auto view = data::cell::structure::make_view(_data.get(), ctx);
+    auto flags = view.get<data::cell::tags::flags>();
+    size_t external_value_size = 0;
+    if (flags.get<data::cell::tags::external_data>()) {
+        if (flags.get<data::cell::tags::collection>()) {
+            external_value_size = get_collection_mutation_view(_data.get()).data.size_bytes();
+        } else {
+            auto cell_view = data::cell::atomic_cell_view(t.imr_state().type_info(), view);
+            external_value_size = cell_view.value_size();
+        }
+        // Add overhead of chunk headers. The last one is a special case.
+        external_value_size += (external_value_size - 1) / data::cell::maximum_external_chunk_length * data::cell::external_chunk_overhead;
+        external_value_size += data::cell::external_last_chunk_overhead;
+    }
+    return data::cell::structure::serialized_object_size(_data.get(), ctx)
+        + imr_object_type::size_overhead + external_value_size;
 }
 std::ostream& operator<<(std::ostream& os, const atomic_cell_or_collection& c) {


@@ -60,6 +60,7 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
     //  - _next_row_in_range = _next.position() < _upper_bound
     //  - _last_row points at a direct predecessor of the next row which is going to be read.
     //    Used for populating continuity.
+    //  - _population_range_starts_before_all_rows is set accordingly
     reading_from_underlying,
     end_of_stream
@@ -86,6 +87,13 @@ class cache_flat_mutation_reader final : public flat_mutation_reader::impl {
     partition_snapshot_row_cursor _next_row;
     bool _next_row_in_range = false;
+    // True iff current population interval, since the previous clustering row, starts before all clustered rows.
+    // We cannot just look at _lower_bound, because emission of range tombstones changes _lower_bound and
+    // because we mark clustering intervals as continuous when consuming a clustering_row, it would prevent
+    // us from marking the interval as continuous.
+    // Valid when _state == reading_from_underlying.
+    bool _population_range_starts_before_all_rows;
     // Whether _lower_bound was changed within current fill_buffer().
     // If it did not then we cannot break out of it (e.g. on preemption) because
     // forward progress is not guaranteed in case iterators are getting constantly invalidated.
@@ -231,6 +239,7 @@ inline
 future<> cache_flat_mutation_reader::do_fill_buffer(db::timeout_clock::time_point timeout) {
     if (_state == state::move_to_underlying) {
         _state = state::reading_from_underlying;
+        _population_range_starts_before_all_rows = _lower_bound.is_before_all_clustered_rows(*_schema);
         auto end = _next_row_in_range ? position_in_partition(_next_row.position())
                                       : position_in_partition(_upper_bound);
         return _read_context->fast_forward_to(position_range{_lower_bound, std::move(end)}, timeout).then([this, timeout] {
@@ -360,7 +369,7 @@ future<> cache_flat_mutation_reader::read_from_underlying(db::timeout_clock::tim
 inline
 bool cache_flat_mutation_reader::ensure_population_lower_bound() {
-    if (!_ck_ranges_curr->start()) {
+    if (_population_range_starts_before_all_rows) {
         return true;
     }
     if (!_last_row.refresh(*_snp)) {
@@ -415,6 +424,7 @@ inline
 void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
     if (!can_populate()) {
         _last_row = nullptr;
+        _population_range_starts_before_all_rows = false;
         _read_context->cache().on_mispopulate();
         return;
     }
@@ -448,6 +458,7 @@ void cache_flat_mutation_reader::maybe_add_to_cache(const clustering_row& cr) {
         with_allocator(standard_allocator(), [&] {
             _last_row = partition_snapshot_row_weakref(*_snp, it, true);
         });
+        _population_range_starts_before_all_rows = false;
     });
 }


@@ -211,6 +211,7 @@ struct cell {
         imr::member<tags::chunk_next, imr::pod<uint8_t*>>,
         imr::member<tags::chunk_data, imr::buffer<tags::chunk_data>>
     >;
+    static constexpr size_t external_chunk_overhead = sizeof(uint8_t*) * 2;
     using external_last_chunk_size = imr::pod<uint16_t>;
     /// The last fragment of an externally stored value
@@ -224,6 +225,7 @@ struct cell {
         imr::member<tags::last_chunk_size, external_last_chunk_size>,
         imr::member<tags::chunk_data, imr::buffer<tags::chunk_data>>
     >;
+    static constexpr size_t external_last_chunk_overhead = sizeof(uint8_t*) + sizeof(uint16_t);
     class context;
     class minimal_context;


@@ -383,9 +383,13 @@ filter_sstable_for_reader(std::vector<sstables::shared_sstable>&& sstables, colu
     };
     sstables.erase(boost::remove_if(sstables, sstable_has_not_key), sstables.end());
+    // FIXME: Workaround for https://github.com/scylladb/scylla/issues/3552
+    // and https://github.com/scylladb/scylla/issues/3553
+    const bool filtering_broken = true;
     // no clustering filtering is applied if schema defines no clustering key or
     // compaction strategy thinks it will not benefit from such an optimization.
-    if (!schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
+    if (filtering_broken || !schema->clustering_key_size() || !cf.get_compaction_strategy().use_clustering_key_filter()) {
         return sstables;
     }
     ::cf_stats* stats = cf.cf_stats();
@@ -2699,7 +2703,7 @@ future<> database::drop_column_family(const sstring& ks_name, const sstring& cf_
     remove(*cf);
     cf->clear_views();
     auto& ks = find_keyspace(ks_name);
-    return cf->await_pending_writes().then([this, &ks, cf, tsf = std::move(tsf), snapshot] {
+    return when_all_succeed(cf->await_pending_writes(), cf->await_pending_reads()).then([this, &ks, cf, tsf = std::move(tsf), snapshot] {
         return truncate(ks, *cf, std::move(tsf), snapshot).finally([this, cf] {
             return cf->stop();
         });
@@ -3139,7 +3143,7 @@ database::query(schema_ptr s, const query::read_command& cmd, query::result_opti
             seastar::ref(get_result_memory_limiter()),
             max_result_size,
             timeout,
-            std::move(cache_ctx)).then_wrapped([this, s = _stats, hit_rate = cf.get_global_cache_hit_rate()] (auto f) {
+            std::move(cache_ctx)).then_wrapped([this, s = _stats, hit_rate = cf.get_global_cache_hit_rate(), op = cf.read_in_progress()] (auto f) {
         if (f.failed()) {
             ++s->total_reads_failed;
             return make_exception_future<lw_shared_ptr<query::result>, cache_temperature>(f.get_exception());
@@ -3167,7 +3171,7 @@ database::query_mutations(schema_ptr s, const query::read_command& cmd, const dh
             std::move(accounter),
             std::move(trace_state),
             timeout,
-            std::move(cache_ctx)).then_wrapped([this, s = _stats, hit_rate = cf.get_global_cache_hit_rate()] (auto f) {
+            std::move(cache_ctx)).then_wrapped([this, s = _stats, hit_rate = cf.get_global_cache_hit_rate(), op = cf.read_in_progress()] (auto f) {
         if (f.failed()) {
             ++s->total_reads_failed;
             return make_exception_future<reconcilable_result, cache_temperature>(f.get_exception());

@@ -475,6 +475,8 @@ private:
// after some modification, needs to ensure that new writes will see it before
// it can proceed, such as the view building code.
utils::phased_barrier _pending_writes_phaser;
// Corresponding phaser for in-progress reads.
utils::phased_barrier _pending_reads_phaser;
private:
void update_stats_for_new_sstable(uint64_t disk_space_used_by_sstable, const std::vector<unsigned>& shards_for_the_sstable) noexcept;
// Adds new sstable to the set of sstables
@@ -817,6 +819,14 @@ public:
return _pending_writes_phaser.advance_and_await();
}
utils::phased_barrier::operation read_in_progress() {
return _pending_reads_phaser.start();
}
future<> await_pending_reads() {
return _pending_reads_phaser.advance_and_await();
}
void add_or_update_view(view_ptr v);
void remove_view(view_ptr v);
void clear_views();
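The `_pending_reads_phaser` added here mirrors `_pending_writes_phaser`: each read obtains an operation token from `start()` (as in `op = cf.read_in_progress()` above), and `advance_and_await()` completes only once every operation from earlier phases has finished. A minimal synchronous Python sketch of the idea (illustrative only; seastar's `utils::phased_barrier` is future-based):

```python
# Simplified, synchronous stand-in for seastar's utils::phased_barrier:
# each in-progress operation holds a token for the current phase, and
# advance_and_await() fires once all previously started operations finish.
class PhasedBarrier:
    def __init__(self):
        self._pending = 0   # operations still running in the current phase
        self._waiters = []  # callbacks to invoke when the phase drains

    def start(self):
        """Begin an operation; returns a finish() callback (the 'operation' token)."""
        self._pending += 1
        finished = [False]
        def finish():
            if finished[0]:
                return      # ignore double completion
            finished[0] = True
            self._pending -= 1
            if self._pending == 0:
                waiters, self._waiters = self._waiters, []
                for w in waiters:
                    w()
        return finish

    def advance_and_await(self, on_drained):
        """Invoke on_drained once every previously started operation has finished."""
        if self._pending == 0:
            on_drained()
        else:
            self._waiters.append(on_drained)
```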

@@ -384,6 +384,10 @@ public:
return "biased-token-round-robin";
}
virtual unsigned sharding_ignore_msb() const {
return 0;
}
friend bool operator==(token_view t1, token_view t2);
friend bool operator<(token_view t1, token_view t2);
friend int tri_compare(token_view t1, token_view t2);

@@ -290,6 +290,11 @@ murmur3_partitioner::token_for_next_shard(const token& t, shard_id shard, unsign
return bias(n);
}
unsigned
murmur3_partitioner::sharding_ignore_msb() const {
return _sharding_ignore_msb_bits;
}
using registry = class_registrator<i_partitioner, murmur3_partitioner, const unsigned&, const unsigned&>;
static registry registrator("org.apache.cassandra.dht.Murmur3Partitioner");

@@ -52,6 +52,7 @@ public:
virtual unsigned shard_of(const token& t) const override;
virtual token token_for_next_shard(const token& t, shard_id shard, unsigned spans) const override;
virtual unsigned sharding_ignore_msb() const override;
private:
using uint128_t = unsigned __int128;
static int64_t normalize(int64_t in);

dist/ami/build_ami.sh

@@ -11,11 +11,9 @@ print_usage() {
echo " --repo repository for both install and update, specify .repo/.list file URL"
echo " --repo-for-install repository for install, specify .repo/.list file URL"
echo " --repo-for-update repository for update, specify .repo/.list file URL"
echo " --target specify target distribution"
exit 1
}
LOCALRPM=0
TARGET=centos
while [ $# -gt 0 ]; do
case "$1" in
"--localrpm")
@@ -34,10 +32,6 @@ while [ $# -gt 0 ]; do
INSTALL_ARGS="$INSTALL_ARGS --repo-for-update $2"
shift 2
;;
"--target")
TARGET="$2"
shift 2
;;
*)
print_usage
;;
@@ -62,91 +56,42 @@ pkg_install() {
fi
}
case "$TARGET" in
"centos")
AMI=ami-ae7bfdb8
REGION=us-east-1
SSH_USERNAME=centos
;;
"trusty")
AMI=ami-ff427095
REGION=us-east-1
SSH_USERNAME=ubuntu
;;
"xenial")
AMI=ami-da05a4a0
REGION=us-east-1
SSH_USERNAME=ubuntu
;;
*)
echo "build_ami.sh does not support this distribution."
exit 1
;;
esac
AMI=ami-ae7bfdb8
REGION=us-east-1
SSH_USERNAME=centos
if [ $LOCALRPM -eq 1 ]; then
sudo rm -rf build/*
REPO=`./scripts/scylla_current_repo --target $TARGET`
REPO=`./scripts/scylla_current_repo --target centos`
INSTALL_ARGS="$INSTALL_ARGS --localrpm --repo $REPO"
if [ ! -f /usr/bin/git ]; then
pkg_install git
fi
if [ "$TARGET" = "centos" ]; then
if [ ! -f dist/ami/files/scylla.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-kernel-conf.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-conf.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-server.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-debuginfo.x86_64.rpm ]; then
dist/redhat/build_rpm.sh --dist --target epel-7-x86_64
cp build/rpms/scylla-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla.x86_64.rpm
cp build/rpms/scylla-kernel-conf-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-kernel-conf.x86_64.rpm
cp build/rpms/scylla-conf-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-conf.x86_64.rpm
cp build/rpms/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-server.x86_64.rpm
cp build/rpms/scylla-debuginfo-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-debuginfo.x86_64.rpm
fi
if [ ! -f dist/ami/files/scylla-jmx.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-jmx.git
cd scylla-jmx
dist/redhat/build_rpm.sh --target epel-7-x86_64
cd ../..
cp build/scylla-jmx/build/rpms/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-jmx.noarch.rpm
fi
if [ ! -f dist/ami/files/scylla-tools.noarch.rpm ] || [ ! -f dist/ami/files/scylla-tools-core.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-tools-java.git
cd scylla-tools-java
dist/redhat/build_rpm.sh --target epel-7-x86_64
cd ../..
cp build/scylla-tools-java/build/rpms/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-tools.noarch.rpm
cp build/scylla-tools-java/build/rpms/scylla-tools-core-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-tools-core.noarch.rpm
fi
else
if [ ! -f dist/ami/files/scylla-server_amd64.deb ]; then
./scripts/git-archive-all --force-submodules --prefix scylla build/scylla.tar
tar -C build/ -xvpf build/scylla.tar
cd build/scylla
dist/debian/build_deb.sh --dist --target $TARGET
cd ../..
cp build/scylla/build/debs/scylla_`cat build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/SCYLLA-RELEASE-FILE`-0ubuntu1~${TARGET}_amd64.deb dist/ami/files/scylla_amd64.deb
cp build/scylla/build/debs/scylla-kernel-conf_`cat build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/SCYLLA-RELEASE-FILE`-0ubuntu1~${TARGET}_amd64.deb dist/ami/files/scylla-kernel-conf_amd64.deb
cp build/scylla/build/debs/scylla-conf_`cat build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/SCYLLA-RELEASE-FILE`-0ubuntu1~${TARGET}_amd64.deb dist/ami/files/scylla-conf_amd64.deb
cp build/scylla/build/debs/scylla-server_`cat build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/SCYLLA-RELEASE-FILE`-0ubuntu1~${TARGET}_amd64.deb dist/ami/files/scylla-server_amd64.deb
cp build/scylla/build/debs/scylla-server-dbg_`cat build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/SCYLLA-RELEASE-FILE`-0ubuntu1~${TARGET}_amd64.deb dist/ami/files/scylla-server-dbg_amd64.deb
fi
if [ ! -f dist/ami/files/scylla-jmx_all.deb ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-jmx.git
cd scylla-jmx
dist/debian/build_deb.sh --target $TARGET
cd ../..
cp build/scylla-jmx/build/debs/scylla-jmx_`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`-0ubuntu1~${TARGET}_all.deb dist/ami/files/scylla-jmx_all.deb
fi
if [ ! -f dist/ami/files/scylla-tools_all.deb ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-tools-java.git
cd scylla-tools-java
dist/debian/build_deb.sh --target $TARGET
cd ../..
cp build/scylla-tools-java/build/debs/scylla-tools_`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE | sed 's/\.rc/~rc/'`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`-0ubuntu1~${TARGET}_all.deb dist/ami/files/scylla-tools_all.deb
fi
if [ ! -f dist/ami/files/scylla.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-kernel-conf.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-conf.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-server.x86_64.rpm ] || [ ! -f dist/ami/files/scylla-debuginfo.x86_64.rpm ]; then
dist/redhat/build_rpm.sh --dist --target epel-7-x86_64
cp build/rpms/scylla-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla.x86_64.rpm
cp build/rpms/scylla-kernel-conf-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-kernel-conf.x86_64.rpm
cp build/rpms/scylla-conf-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-conf.x86_64.rpm
cp build/rpms/scylla-server-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-server.x86_64.rpm
cp build/rpms/scylla-debuginfo-`cat build/SCYLLA-VERSION-FILE`-`cat build/SCYLLA-RELEASE-FILE`.*.x86_64.rpm dist/ami/files/scylla-debuginfo.x86_64.rpm
fi
if [ ! -f dist/ami/files/scylla-jmx.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-jmx.git
cd scylla-jmx
dist/redhat/build_rpm.sh --target epel-7-x86_64
cd ../..
cp build/scylla-jmx/build/rpms/scylla-jmx-`cat build/scylla-jmx/build/SCYLLA-VERSION-FILE`-`cat build/scylla-jmx/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-jmx.noarch.rpm
fi
if [ ! -f dist/ami/files/scylla-tools.noarch.rpm ] || [ ! -f dist/ami/files/scylla-tools-core.noarch.rpm ]; then
cd build
git clone --depth 1 https://github.com/scylladb/scylla-tools-java.git
cd scylla-tools-java
dist/redhat/build_rpm.sh --target epel-7-x86_64
cd ../..
cp build/scylla-tools-java/build/rpms/scylla-tools-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-tools.noarch.rpm
cp build/scylla-tools-java/build/rpms/scylla-tools-core-`cat build/scylla-tools-java/build/SCYLLA-VERSION-FILE`-`cat build/scylla-tools-java/build/SCYLLA-RELEASE-FILE`.*.noarch.rpm dist/ami/files/scylla-tools-core.noarch.rpm
fi
fi

@@ -7,121 +7,8 @@ fi
# User specific environment and startup programs
. /usr/lib/scylla/scylla_lib.sh
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
echo
echo ' _____ _ _ _____ ____ '
echo ' / ____| | | | | __ \| _ \ '
echo ' | (___ ___ _ _| | | __ _| | | | |_) |'
echo ' \___ \ / __| | | | | |/ _` | | | | _ < '
echo ' ____) | (__| |_| | | | (_| | |__| | |_) |'
echo ' |_____/ \___|\__, |_|_|\__,_|_____/|____/ '
echo ' __/ | '
echo ' |___/ '
echo ''
echo ''
echo 'Nodetool:'
echo ' nodetool help'
echo 'CQL Shell:'
echo ' cqlsh'
echo 'More documentation available at: '
echo ' http://www.scylladb.com/doc/'
echo 'By default, Scylla sends certain information about this node to a data collection server. For information, see http://www.scylladb.com/privacy/'
echo
if [ `ec2_is_supported_instance_type` -eq 0 ]; then
TYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type`
tput setaf 1
tput bold
echo " $TYPE is not supported instance type!"
tput sgr0
echo -n "To continue startup ScyllaDB on this instance, run 'sudo scylla_io_setup' "
if ! is_systemd; then
echo "then 'initctl start scylla-server'."
else
echo "then 'systemctl start scylla-server'."
fi
echo "For a list of optimized instance types and more EC2 instructions see http://www.scylladb.com/doc/getting-started-amazon/"
echo
else
SETUP=
if is_systemd; then
SETUP=`systemctl is-active scylla-ami-setup`
fi
if [ "$SETUP" == "activating" ]; then
tput setaf 4
tput bold
echo " Constructing RAID volume..."
tput sgr0
echo
echo "Please wait for setup. To see status, run "
echo " 'systemctl status scylla-ami-setup'"
echo
echo "After setup finished, scylla-server service will launch."
echo "To see status of scylla-server, run "
echo " 'systemctl status scylla-server'"
echo
elif [ "$SETUP" == "failed" ]; then
tput setaf 1
tput bold
echo " AMI initial configuration failed!"
tput sgr0
echo
echo "To see status, run "
echo " 'systemctl status scylla-ami-setup'"
echo
else
if is_systemd; then
SCYLLA=`systemctl is-active scylla-server`
else
if [ "`initctl status scylla-server|grep "running, process"`" != "" ]; then
SCYLLA="active"
else
SCYLLA="failed"
fi
fi
if [ "$SCYLLA" == "activating" ]; then
tput setaf 4
tput bold
echo " ScyllaDB is starting..."
tput sgr0
echo
echo "Please wait for start. To see status, run "
echo " 'systemctl status scylla-server'"
echo
elif [ "$SCYLLA" == "active" ]; then
tput setaf 4
tput bold
echo " ScyllaDB is active."
tput sgr0
echo
echo "$ nodetool status"
echo
nodetool status
else
tput setaf 1
tput bold
echo " ScyllaDB is not started!"
tput sgr0
echo "Please wait for startup. To see status of ScyllaDB, run "
if ! is_systemd; then
echo " 'initctl status scylla-server'"
echo "and"
echo " 'sudo cat /var/log/upstart/scylla-server.log'"
echo
else
echo " 'systemctl status scylla-server'"
echo
fi
fi
fi
echo -n " "
/usr/lib/scylla/scylla_ec2_check
if [ $? -eq 0 ]; then
echo
fi
fi
~/.scylla_ami_login

dist/ami/files/.scylla_ami_login (new executable file)

@@ -0,0 +1,118 @@
#!/usr/bin/python3
#
# Copyright 2018 ScyllaDB
#
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
import os
import sys
import argparse
sys.path.append('/usr/lib/scylla')
from scylla_util import *
MSG_HEADER = '''
_____ _ _ _____ ____
/ ____| | | | | __ \| _ \
| (___ ___ _ _| | | __ _| | | | |_) |
\___ \ / __| | | | | |/ _` | | | | _ <
____) | (__| |_| | | | (_| | |__| | |_) |
|_____/ \___|\__, |_|_|\__,_|_____/|____/
__/ |
|___/
Nodetool:
nodetool help
CQL Shell:
cqlsh
More documentation available at:
http://www.scylladb.com/doc/
By default, Scylla sends certain information about this node to a data collection server. For information, see http://www.scylladb.com/privacy/
'''[1:-1]
MSG_UNSUPPORTED_INSTANCE_TYPE = '''
{red}{type} is not a supported instance type!{nocolor}
To continue starting ScyllaDB on this instance, run 'sudo scylla_io_setup' then 'systemctl start scylla-server'.
For a list of optimized instance types and more EC2 instructions see http://www.scylladb.com/doc/getting-started-amazon/
'''[1:-1]
MSG_SETUP_ACTIVATING = '''
{green}Constructing RAID volume...{nocolor}
Please wait for setup. To see status, run
'systemctl status scylla-ami-setup'
After setup finishes, the scylla-server service will launch.
To see status of scylla-server, run
'systemctl status scylla-server'
'''[1:-1]
MSG_SETUP_FAILED = '''
{red}AMI initial configuration failed!{nocolor}
To see status, run
'systemctl status scylla-ami-setup'
'''[1:-1]
MSG_SCYLLA_ACTIVATING = '''
{green}ScyllaDB is starting...{nocolor}
Please wait for start. To see status, run
'systemctl status scylla-server'
'''[1:-1]
MSG_SCYLLA_FAILED = '''
{red}ScyllaDB is not started!{nocolor}
Please wait for startup. To see status of ScyllaDB, run
'systemctl status scylla-server'
'''[1:-1]
MSG_SCYLLA_ACTIVE = '''
{green}ScyllaDB is active.{nocolor}
$ nodetool status
'''[1:-1]
if __name__ == '__main__':
colorprint(MSG_HEADER)
aws = aws_instance()
if not aws.is_supported_instance_class():
colorprint(MSG_UNSUPPORTED_INSTANCE_TYPE.format(type=aws.instance_class()))
else:
setup = systemd_unit('scylla-ami-setup.service')
res = setup.is_active()
if res == 'activating':
colorprint(MSG_SETUP_ACTIVATING)
elif res == 'failed':
colorprint(MSG_SETUP_FAILED)
else:
server = systemd_unit('scylla-server.service')
res = server.is_active()
if res == 'activating':
colorprint(MSG_SCYLLA_ACTIVATING)
elif res == 'failed':
colorprint(MSG_SCYLLA_FAILED)
else:
colorprint(MSG_SCYLLA_ACTIVE)
run('nodetool status', exception=False)
print(' ', end='')
res = run('/usr/lib/scylla/scylla_ec2_check --nic eth0', exception=False)
if res == 0:
print('')
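The message templates above rely on `colorprint` from `scylla_util` to expand the `{red}`/`{green}`/`{nocolor}` placeholders into terminal escape sequences. A hypothetical stand-in (the real helper in `scylla_util` may differ) could look like:

```python
# Hypothetical stand-in for scylla_util.colorprint: expands color
# placeholders to ANSI escape sequences, then prints the message.
COLORS = {
    'red': '\033[31;1m',    # bold red
    'green': '\033[32;1m',  # bold green
    'nocolor': '\033[0m',   # reset attributes
}

def colorprint(msg, **kwargs):
    # Merge caller-supplied fields with the color codes before formatting.
    fmt = dict(COLORS)
    fmt.update(kwargs)
    print(msg.format(**fmt))
```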

@@ -64,14 +64,11 @@
"source": "files/",
"destination": "/home/{{user `ssh_username`}}/"
},
{
"type": "file",
"source": "../../scripts/scylla_install_pkg",
"destination": "/home/{{user `ssh_username`}}/scylla_install_pkg"
},
{
"type": "shell",
"inline": [
"sudo yum install -y epel-release",
"sudo yum install -y python34",
"sudo /home/{{user `ssh_username`}}/scylla-ami/scylla_install_ami {{ user `install_args` }}"
]
}

@@ -1,6 +1,8 @@
#!/bin/sh
#!/usr/bin/python3
#
# Copyright 2016 ScyllaDB
# Copyright 2018 ScyllaDB
#
#
# This file is part of Scylla.
#
@@ -17,42 +19,46 @@
# You should have received a copy of the GNU General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
if [ "`id -u`" -ne 0 ]; then
echo "Requires root permission."
exit 1
fi
import os
import sys
import tempfile
import tarfile
from scylla_util import *
if [ -f /usr/bin/node_exporter ] || [ -f /usr/bin/prometheus-node_exporter ]; then
echo "node_exporter already installed"
exit 1
fi
VERSION='0.14.0'
INSTALL_DIR='/usr/lib/scylla/Prometheus/node_exporter'
. /usr/lib/scylla/scylla_lib.sh
if __name__ == '__main__':
if os.getuid() > 0:
print('Requires root permission.')
sys.exit(1)
if is_gentoo_variant; then
emerge -uq app-metrics/node_exporter
if is_systemd; then
echo "app-metrics/node_exporter does not install systemd service files, please file a bug if you need them."
else
rc-update add node_exporter default
rc-service node_exporter start
fi
else
version=0.14.0
dir=/usr/lib/scylla/Prometheus/node_exporter
mkdir -p $dir
cd $dir
curl -L https://github.com/prometheus/node_exporter/releases/download/v$version/node_exporter-$version.linux-amd64.tar.gz -o $dir/node_exporter-$version.linux-amd64.tar.gz
tar -xvzf $dir/node_exporter-$version.linux-amd64.tar.gz
rm $dir/node_exporter-$version.linux-amd64.tar.gz
ln -s $dir/node_exporter-$version.linux-amd64/node_exporter /usr/bin
. /etc/os-release
if os.path.exists('/usr/bin/node_exporter') or os.path.exists('/usr/bin/prometheus-node_exporter'):
print('node_exporter already installed')
sys.exit(1)
if is_systemd; then
systemctl enable node-exporter
systemctl start node-exporter
else
cat <<EOT >> /etc/init/node_exporter.conf
if is_gentoo_variant():
run('emerge -uq app-metrics/node_exporter')
if is_systemd():
print('app-metrics/node_exporter does not install systemd service files, please file a bug if you need them.')
sys.exit(1)
else:
run('rc-update add node_exporter default')
run('rc-service node_exporter start')
else:
data = curl('https://github.com/prometheus/node_exporter/releases/download/v{version}/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), byte=True)
with open('/var/tmp/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION), 'wb') as f:
f.write(data)
with tarfile.open('/var/tmp/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION)) as tf:
tf.extractall(INSTALL_DIR)
os.remove('/var/tmp/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION))
os.symlink('{install_dir}/node_exporter-{version}.linux-amd64/node_exporter'.format(install_dir=INSTALL_DIR, version=VERSION), '/usr/bin/node_exporter')
if is_systemd():
node_exporter = systemd_unit('node-exporter.service')
node_exporter.enable()
node_exporter.start()
else:
conf = '''
# Run node_exporter
start on startup
@@ -60,9 +66,9 @@ start on startup
script
/usr/bin/node_exporter
end script
EOT
service node_exporter start
fi
fi
'''[1:-1]
with open('/etc/init/node_exporter.conf', 'w') as f:
f.write(conf)
run('service node_exporter start')
printf "node_exporter successfully installed\n"
print('node_exporter successfully installed')

@@ -28,6 +28,7 @@ OUTPUT_PATH4="$OUTPUT_PATH/data_model"
OUTPUT_PATH5="$OUTPUT_PATH/network_checks"
IS_FEDORA="0"
IS_DEBIAN="0"
IS_GENTOO="0"
JMX_PORT="7199"
CQL_PORT="9042"
PRINT_DM=NO
@@ -75,7 +76,7 @@ while getopts ":hdncap:q:" opt; do
done
##Check server release (Fedora/Oracle/Debian)##
##Check server release (Fedora/Oracle/Debian/Gentoo)##
cat /etc/os-release | grep -i fedora &> /dev/null
if [ $? -ne 0 ]; then
cat /etc/os-release | grep -i oracle &> /dev/null
@@ -89,7 +90,12 @@ if [ $? -ne 0 ]; then
IS_DEBIAN="1"
fi
if [ "$IS_FEDORA" == "1" ] && [ "$IS_DEBIAN" == "1" ]; then
cat /etc/os-release | grep -i gentoo &> /dev/null
if [ $? -ne 0 ]; then
IS_GENTOO="1"
fi
if [ "$IS_FEDORA" == "1" ] && [ "$IS_DEBIAN" == "1" ] && [ "$IS_GENTOO" == "1" ]; then
echo "This is a non-supported OS, please review the support matrix"
exit 222
fi
@@ -108,7 +114,7 @@ if [ $? -ne 0 ]; then
else
echo "Scylla-server Service: OK"
echo "--------------------------------------------------"
fi
fi
##Scylla-JMX service status##
@@ -125,7 +131,7 @@ if [ $? -ne 0 ]; then
else
echo "Scylla-JMX Service (nodetool): OK"
echo "--------------------------------------------------"
fi
fi
#Install 'net-tools' pkg, to be used for netstat command#
@@ -141,6 +147,9 @@ if [ "$IS_DEBIAN" == "0" ]; then
sudo apt-get install net-tools -y | grep already
fi
if [ "$IS_GENTOO" == "0" ]; then
sudo emerge -1uq sys-apps/ethtool sys-apps/net-tools
fi
#Create dir structure to save output_files#
echo "--------------------------------------------------"
@@ -182,6 +191,12 @@ if [ "$IS_DEBIAN" == "0" ]; then
cp -p /etc/default/scylla-server $OUTPUT_PATH2
fi
if [ "$IS_GENTOO" == "0" ]; then
sudo emerge -1uq app-portage/portage-utils
sudo qlist -ICv scylla > $OUTPUT_PATH2/scylla-pkgs.txt
cp -p /etc/default/scylla-server $OUTPUT_PATH2
fi
#Scylla Logs#
echo "--------------------------------------------------"
@@ -192,7 +207,11 @@ journalctl --help &> /dev/null
if [ $? -eq 0 ]; then
journalctl -t scylla > $OUTPUT_PATH/scylla-logs.txt
else
cat /var/log/syslog | grep -i scylla > $OUTPUT_PATH/scylla-logs.txt
if [ "$IS_GENTOO" == "0" ]; then
cat /var/log/scylla/scylla.log > $OUTPUT_PATH/scylla-logs.txt
else
cat /var/log/syslog | grep -i scylla > $OUTPUT_PATH/scylla-logs.txt
fi
fi
gzip -f $OUTPUT_PATH/scylla-logs.txt
@@ -224,6 +243,7 @@ if [ "$SCYLLA_SERVICE" == "1" ]; then
echo "Skipping Data Model Info Collection"
echo "--------------------------------------------------"
else
# TODO: handle connecting with authentication
cqlsh `hostname -i` $CQL_PORT -e "HELP" &> /dev/null
if [ $? -eq 0 ]; then
echo "Collecting Data Model Info (using port $CQL_PORT)"
@@ -357,7 +377,7 @@ if [ "$IS_FEDORA" == "0" ]; then
echo "## /etc/sysconfig/scylla-server ##" >> $REPORT
fi
if [ "$IS_DEBIAN" == "0" ]; then
if [ "$IS_DEBIAN" == "0" ] || [ "$IS_GENTOO" == "0" ]; then
echo "## /etc/default/scylla-server ##" >> $REPORT
fi

@@ -23,7 +23,6 @@ import os
import sys
import argparse
import subprocess
import shutil
from scylla_util import *
if __name__ == '__main__':
@@ -62,7 +61,7 @@ ExternalSizeMax=1024G
with open('/etc/systemd/coredump.conf', 'w') as f:
conf = f.write(conf_data)
if args.dump_to_raiddir:
shutil.rmtree('/var/lib/systemd/coredump')
rmtree('/var/lib/systemd/coredump')
makedirs('/var/lib/scylla/coredump')
os.symlink('/var/lib/scylla/coredump', '/var/lib/systemd/coredump')
run('systemctl daemon-reload')

@@ -24,46 +24,36 @@ import sys
import argparse
from scylla_util import *
def get_en_interface_type():
type, subtype = curl('http://169.254.169.254/latest/meta-data/instance-type').split('.')
if type in ['c3', 'c4', 'd4', 'd2', 'i2', 'r3']:
return 'ixgbevf'
if type in ['i3', 'p2', 'r4', 'x1']:
return 'ena'
if type == 'm4':
if subtype == '16xlarge':
return 'ena'
else:
return 'ixgbevf'
def is_vpc_enabled():
with open('/sys/class/net/eth0/address') as f:
mac = f.read().strip()
mac_stat = curl('http://169.254.169.254/latest/meta-data/network/interfaces/macs/{}/'.format(mac))
return True if re.search(r'^vpc-id$', mac_stat, flags=re.MULTILINE) else False
if __name__ == '__main__':
if not is_ec2():
sys.exit(0)
parser = argparse.ArgumentParser(description='Verify EC2 configuration is optimized.')
parser.add_argument('--nic', default='eth0',
help='specify NIC')
args = parser.parse_args()
type = curl('http://169.254.169.254/latest/meta-data/instance-type')
en = get_en_interface_type()
match = re.search(r'^driver: (\S+)$', out('ethtool -i eth0'), flags=re.MULTILINE)
if not is_valid_nic(args.nic):
print('NIC {} doesn\'t exist.'.format(args.nic))
sys.exit(1)
aws = aws_instance()
instance_class = aws.instance_class()
en = aws.get_en_interface_type()
match = re.search(r'^driver: (\S+)$', out('ethtool -i {}'.format(args.nic)), flags=re.MULTILINE)
driver = match.group(1)
if not en:
print('{bold_red}{type} doesn\'t support enhanced networking!{no_color}'.format(bold_red=concolor.BOLD_RED, type=type, no_color=concolor.NO_COLOR))
colorprint('{red}{instance_class} doesn\'t support enhanced networking!{nocolor}'.format(instance_class=instance_class))
print('''To enable enhanced networking, please use the instance type which supports it.
More documentation available at:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#enabling_enhanced_networking''')
sys.exit(1)
elif not is_vpc_enabled():
print('{bold_red}VPC is not enabled!{no_color}'.format(bold_red=concolor.BOLD_RED, no_color=concolor.NO_COLOR))
elif not aws.is_vpc_enabled(args.nic):
colorprint('{red}VPC is not enabled!{nocolor}')
print('To enable enhanced networking, please enable VPC.')
sys.exit(1)
elif driver != en:
print('{bold_red}Enhanced networking is disabled!{no_color}'.format(bold_red=concolor.BOLD_RED, no_color=concolor.NO_COLOR))
colorprint('{red}Enhanced networking is disabled!{nocolor}')
print('''More documentation available at:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html''')
sys.exit(1)

@@ -28,6 +28,8 @@ if __name__ == '__main__':
if os.getuid() > 0:
print('Requires root permission.')
sys.exit(1)
if is_systemd():
systemd_unit('scylla-fstrim.timer').unmask()
if is_redhat_variant():
systemd_unit('fstrim.timer').disable()
if dist_name() == 'Ubuntu' and os.path.exists('/etc/cron.weekly/fstrim'):

@@ -1,122 +0,0 @@
#
# Copyright (C) 2016 ScyllaDB
is_debian_variant() {
[ -f /etc/debian_version ]
}
is_redhat_variant() {
[ -f /etc/redhat-release ]
}
is_gentoo_variant() {
[ -f /etc/gentoo-release ]
}
is_systemd() {
grep -q '^systemd$' /proc/1/comm
}
is_ec2() {
[ -f /sys/hypervisor/uuid ] && [ "$(head -c 3 /sys/hypervisor/uuid)" = "ec2" ]
}
is_selinux_enabled() {
STATUS=`getenforce`
if [ "$STATUS" = "Disabled" ]; then
return 0
else
return 1
fi
}
ec2_is_supported_instance_type() {
TYPE=`curl -s http://169.254.169.254/latest/meta-data/instance-type|cut -d . -f 1`
case $TYPE in
"i2"|"i3") echo 1;;
*) echo 0;;
esac
}
verify_args() {
if [ -z "$2" ] || [[ "$2" =~ ^--+ ]]; then
echo "Requires more parameter for $1."
print_usage
exit 1
fi
}
#
# get_mode_cpu_set <mode name, e.g. 'mq', 'sq', 'sq_split'>
#
get_mode_cpu_set() {
local mode=$1
local mode_cpu_mask=`/usr/lib/scylla/perftune.py --tune net --nic "$nic" --mode "$mode" --get-cpu-mask` 2>&-
# If the given mode is not supported - return invalid CPU set
if [[ "$?" -ne "0" ]]; then
echo "-1"
else
echo "$mode_cpu_mask" | /usr/lib/scylla/hex2list.py
fi
}
#
# check_cpuset_conf <NIC name>
#
get_tune_mode() {
local nic=$1
# if cpuset.conf doesn't exist use the default mode
[[ ! -e '/etc/scylla.d/cpuset.conf' ]] && return
local cur_cpuset=`cat /etc/scylla.d/cpuset.conf | cut -d "\"" -f2- | cut -d" " -f2`
local mq_cpuset=`get_mode_cpu_set 'mq'`
local sq_cpuset=`get_mode_cpu_set 'sq'`
local sq_split_cpuset=`get_mode_cpu_set 'sq_split'`
local tune_mode=""
case "$cur_cpuset" in
"$mq_cpuset")
tune_mode="--mode mq"
;;
"$sq_cpuset")
tune_mode="--mode sq"
;;
"$sq_split_cpuset")
tune_mode="--mode sq_split"
;;
esac
# if cpuset is something different from what we expect - use the default mode
echo "$tune_mode"
}
#
# create_perftune_conf [<NIC name>]
#
create_perftune_conf() {
local nic=$1
[[ -z "$nic" ]] && nic='eth0'
# if exists - do nothing
[[ -e '/etc/scylla.d/perftune.yaml' ]] && return
local mode=`get_tune_mode "$nic"`
/usr/lib/scylla/perftune.py --tune net --nic "$nic" $mode --dump-options-file > /etc/scylla.d/perftune.yaml
}
. /etc/os-release
if is_debian_variant || is_gentoo_variant; then
SYSCONFIG=/etc/default
else
SYSCONFIG=/etc/sysconfig
fi
. $SYSCONFIG/scylla-server
for i in /etc/scylla.d/*.conf; do
if [ "$i" = "/etc/scylla.d/*.conf" ]; then
break
fi
. "$i"
done

@@ -1,33 +1,71 @@
#!/bin/bash -e
#!/usr/bin/python3
#
# Copyright 2018 ScyllaDB
#
. /usr/lib/scylla/scylla_lib.sh
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
if [ "$AMI" = "yes" ] && [ -f /etc/scylla/ami_disabled ]; then
rm /etc/scylla/ami_disabled
exit 1
fi
import os
import sys
import glob
from scylla_util import *
if [ "$NETWORK_MODE" = "virtio" ]; then
ip tuntap del mode tap dev $TAP
ip tuntap add mode tap dev $TAP user $USER one_queue vnet_hdr
ip link set dev $TAP up
ip link set dev $TAP master $BRIDGE
chown $USER.$GROUP /dev/vhost-net
elif [ "$NETWORK_MODE" = "dpdk" ]; then
modprobe uio
modprobe uio_pci_generic
/usr/lib/scylla/dpdk-devbind.py --force --bind=uio_pci_generic $ETHPCIID
for n in /sys/devices/system/node/node?; do
echo $NR_HUGEPAGES > $n/hugepages/hugepages-2048kB/nr_hugepages
done
if [ "$ID" = "ubuntu" ]; then
hugeadm --create-mounts
fi
else # NETWORK_MODE = posix
if [ "$SET_NIC" = "yes" ]; then
create_perftune_conf "$IFNAME"
/usr/lib/scylla/posix_net_conf.sh $IFNAME --options-file /etc/scylla.d/perftune.yaml
fi
fi
if __name__ == '__main__':
if os.getuid() > 0:
print('Requires root permission.')
sys.exit(1)
if is_redhat_variant():
cfg = sysconfig_parser('/etc/sysconfig/scylla-server')
else:
cfg = sysconfig_parser('/etc/default/scylla-server')
ami = cfg.get('AMI')
mode = cfg.get('NETWORK_MODE')
/usr/lib/scylla/scylla-blocktune
if ami == 'yes' and os.path.exists('/etc/scylla/ami_disabled'):
os.remove('/etc/scylla/ami_disabled')
sys.exit(1)
if mode == 'virtio':
tap = cfg.get('TAP')
user = cfg.get('USER')
group = cfg.get('GROUP')
bridge = cfg.get('BRIDGE')
run('ip tuntap del mode tap dev {TAP}'.format(TAP=tap))
run('ip tuntap add mode tap dev {TAP} user {USER} one_queue vnet_hdr'.format(TAP=tap, USER=user))
run('ip link set dev {TAP} up'.format(TAP=tap))
run('ip link set dev {TAP} master {BRIDGE}'.format(TAP=tap, BRIDGE=bridge))
run('chown {USER}.{GROUP} /dev/vhost-net'.format(USER=user, GROUP=group))
elif mode == 'dpdk':
ethpciid = cfg.get('ETHPCIID')
nr_hugepages = cfg.get('NR_HUGEPAGES')
run('modprobe uio')
run('modprobe uio_pci_generic')
run('/usr/lib/scylla/dpdk-devbind.py --force --bind=uio_pci_generic {ETHPCIID}'.format(ETHPCIID=ethpciid))
for n in glob.glob('/sys/devices/system/node/node?'):
with open('{n}/hugepages/hugepages-2048kB/nr_hugepages'.format(n=n), 'w') as f:
f.write(nr_hugepages)
if dist_name() == 'Ubuntu':
run('hugeadm --create-mounts')
else:
set_nic = cfg.get('SET_NIC')
ifname = cfg.get('IFNAME')
if set_nic == 'yes':
create_perftune_conf(ifname)
run('/usr/lib/scylla/posix_net_conf.sh {IFNAME} --options-file /etc/scylla.d/perftune.yaml'.format(IFNAME=ifname))
run('/usr/lib/scylla/scylla-blocktune')


@@ -23,6 +23,8 @@ import os
import argparse
import pwd
import grp
import sys
import stat
from scylla_util import *
if __name__ == '__main__':
@@ -40,6 +42,8 @@ if __name__ == '__main__':
help='specify the root of the tree')
parser.add_argument('--volume-role', default='all',
help='specify how will this device be used (data, commitlog, or all)')
parser.add_argument('--force-raid', action='store_true', default=False,
help='force constructing RAID when only one disk is specified')
args = parser.parse_args()
@@ -60,6 +64,12 @@ if __name__ == '__main__':
if not os.path.exists(disk):
print('{} is not found'.format(disk))
sys.exit(1)
if not stat.S_ISBLK(os.stat(disk).st_mode):
print('{} is not a block device'.format(disk))
sys.exit(1)
if not is_unused_disk(disk):
print('{} is busy'.format(disk))
sys.exit(1)
if os.path.exists(args.raiddev):
print('{} is already in use'.format(args.raiddev))
@@ -74,12 +84,20 @@ if __name__ == '__main__':
elif is_gentoo_variant():
run('emerge -uq sys-fs/mdadm sys-fs/xfsprogs')
print('Creating RAID0 for scylla using {nr_disk} disk(s): {disks}'.format(nr_disk=len(disks), disks=args.disks))
if len(disks) == 1 and not args.force_raid:
raid = False
fsdev = disks[0]
else:
raid = True
fsdev = args.raiddev
print('Creating {type} for scylla using {nr_disk} disk(s): {disks}'.format(type='RAID0' if raid else 'XFS volume', nr_disk=len(disks), disks=args.disks))
if dist_name() == 'Ubuntu' and dist_ver() == '14.04':
run('udevadm settle')
run('mdadm --create --verbose --force --run {raid} --level=0 -c1024 --raid-devices={nr_disk} {disks}'.format(raid=args.raiddev, nr_disk=len(disks), disks=args.disks.replace(',', ' ')))
run('udevadm settle')
run('mkfs.xfs {} -f'.format(args.raiddev))
if raid:
run('udevadm settle')
run('mdadm --create --verbose --force --run {raid} --level=0 -c1024 --raid-devices={nr_disk} {disks}'.format(raid=fsdev, nr_disk=len(disks), disks=args.disks.replace(',', ' ')))
run('udevadm settle')
run('mkfs.xfs {} -f'.format(fsdev))
else:
procs=[]
for disk in disks:
@@ -93,22 +111,24 @@ if __name__ == '__main__':
procs.append(proc)
for proc in procs:
proc.wait()
run('udevadm settle')
run('mdadm --create --verbose --force --run {raid} --level=0 -c1024 --raid-devices={nr_disk} {disks}'.format(raid=args.raiddev, nr_disk=len(disks), disks=args.disks.replace(',', ' ')))
run('udevadm settle')
run('mkfs.xfs {} -f -K'.format(args.raiddev))
if raid:
run('udevadm settle')
run('mdadm --create --verbose --force --run {raid} --level=0 -c1024 --raid-devices={nr_disk} {disks}'.format(raid=fsdev, nr_disk=len(disks), disks=args.disks.replace(',', ' ')))
run('udevadm settle')
run('mkfs.xfs {} -f -K'.format(fsdev))
if is_debian_variant():
confpath = '/etc/mdadm/mdadm.conf'
else:
confpath = '/etc/mdadm.conf'
res = out('mdadm --detail --scan')
with open(confpath, 'w') as f:
f.write(res)
if raid:
res = out('mdadm --detail --scan')
with open(confpath, 'w') as f:
f.write(res)
makedirs(mount_at)
run('mount -t xfs -o noatime {raid} "{mount_at}"'.format(raid=args.raiddev, mount_at=mount_at))
run('mount -t xfs -o noatime {raid} "{mount_at}"'.format(raid=fsdev, mount_at=mount_at))
makedirs('{}/data'.format(root))
makedirs('{}/commitlog'.format(root))
@@ -122,7 +142,7 @@ if __name__ == '__main__':
os.chown('{}/coredump'.format(root), uid, gid)
if args.update_fstab:
res = out('blkid {}'.format(args.raiddev))
res = out('blkid {}'.format(fsdev))
match = re.search(r'^/dev/\S+: (UUID="\S+")', res.strip())
uuid = match.group(1)
with open('/etc/fstab', 'a') as f:


@@ -22,7 +22,6 @@
import os
import sys
import argparse
import logging
import glob
import shutil
import io
@@ -49,11 +48,28 @@ def interactive_ask_service(msg1, msg2, default = None):
elif ans == 'no' or ans =='n':
return False
def interactive_choose_nic():
nics = [os.path.basename(n) for n in glob.glob('/sys/class/net/*') if n != '/sys/class/net/lo']
if len(nics) == 0:
print('A NIC was not found.')
sys.exit(1)
elif len(nics) == 1:
return nics[0]
else:
print('Please select a NIC from the following list:')
while True:
print(nics)
n = input('> ')
if is_valid_nic(n):
return n
def do_verify_package(pkg):
if is_debian_variant():
res = run('dpkg -s {}'.format(pkg), silent=True, exception=False)
elif is_redhat_variant():
res = run('rpm -q {}'.format(pkg), silent=True, exception=False)
elif is_gentoo_variant():
res = 0 if len(glob.glob('/var/db/pkg/*/{}-*'.format(pkg))) > 0 else 1
if res != 0:
print('{} package is not installed.'.format(pkg))
sys.exit(1)
@@ -67,22 +83,18 @@ def list_block_devices():
devices = []
for p in ['/dev/sd*', '/dev/hd*', '/dev/xvd*', '/dev/nvme*', '/dev/mapper/*']:
devices.extend([d for d in glob.glob(p) if d != '/dev/mapper/control'])
return devices
def get_unused_disks():
unused = []
for dev in list_block_devices():
with open('/proc/mounts') as f:
s = f.read().strip()
count_raw = len(re.findall('^{} '.format(dev), s, flags=re.MULTILINE))
count_pvs = 0
if shutil.which('pvs'):
s = out('pvs -o pv_name --nohead')
count_pvs = len(re.findall(dev, s, flags=re.MULTILINE))
s = out('swapon --show=NAME --noheadings')
count_swap = len(re.findall(dev, s, flags=re.MULTILINE))
if count_raw + count_pvs + count_swap == 0:
unused.append(dev)
# dev contains partitions
if len(glob.glob('/sys/class/block/{dev}/{dev}*'.format(dev=dev.replace('/dev/','')))) > 0:
continue
# dev is used
if not is_unused_disk(dev):
continue
unused.append(dev)
return unused
def run_setup_script(name, script):
@@ -90,7 +102,7 @@ def run_setup_script(name, script):
res = run(script, exception=False)
if res != 0:
if interactive:
print('{red}{name} setup failed. Press any key to continue...{no_color}'.format(red=concolor.BOLD_RED, name=name, no_color=concolor.NO_COLOR))
colorprint('{red}' + name + ' setup failed. Press any key to continue...{nocolor}')
input()
else:
print('{} setup failed.'.format(name))
@@ -99,12 +111,12 @@ def run_setup_script(name, script):
if __name__ == '__main__':
if os.getuid() > 0:
logging.error('Requires root permission.')
print('Requires root permission.')
sys.exit(1)
parser = argparse.ArgumentParser(description='Configure environment for Scylla.')
parser.add_argument('--disks',
help='specify disks for RAID')
parser.add_argument('--nic',
parser.add_argument('--nic', default='eth0',
help='specify NIC')
parser.add_argument('--ntp-domain',
help='specify NTP domain')
@@ -115,7 +127,7 @@ if __name__ == '__main__':
parser.add_argument('--developer-mode', action='store_true', default=False,
help='enable developer mode')
parser.add_argument('--no-ec2-check', action='store_true', default=False,
help='skip EC2 configuration check(only on EC2)')
help='skip EC2 configuration check')
parser.add_argument('--no-kernel-check', action='store_true', default=False,
help='skip kernel version check')
parser.add_argument('--no-verify-package', action='store_true', default=False,
@@ -150,12 +162,14 @@ if __name__ == '__main__':
if len(sys.argv) == 1:
interactive = True
if not interactive and not args.no_raid_setup and not args.disks:
parser.print_help()
sys.exit(1)
if not interactive and not args.no_sysconfig_setup and not args.nic:
parser.print_help()
sys.exit(1)
if not interactive:
if not args.no_raid_setup and not args.disks:
parser.print_help()
sys.exit(1)
if not args.no_sysconfig_setup or (is_ec2() and not args.no_ec2_check):
if not is_valid_nic(args.nic):
print('NIC {} doesn\'t exist.'.format(args.nic))
sys.exit(1)
disks = args.disks
nic = args.nic
@@ -178,13 +192,15 @@ if __name__ == '__main__':
fstrim_setup = not args.no_fstrim_setup
selinux_reboot_required = False
print('{green}Skip any of the following steps by answering \'no\'{no_color}'.format(green=concolor.GREEN, no_color=concolor.NO_COLOR))
colorprint('{green}Skip any of the following steps by answering \'no\'{nocolor}')
if is_ec2():
if interactive:
ec2_check = interactive_ask_service('Do you want to run the Amazon EC2 configuration check?', 'Yes - runs a script to verify that this instance is optimized for running Scylla. No - skips the configuration check.', 'yes')
if ec2_check:
nic = interactive_choose_nic()
if ec2_check:
run('/usr/lib/scylla/scylla_ec2_check')
run('/usr/lib/scylla/scylla_ec2_check --nic {}'.format(nic))
if interactive:
kernel_check = interactive_ask_service('Do you want to check your kernel version?', 'Yes - runs a script to verify that the kernel for this instance qualifies to run Scylla. No - skips the kernel check.', 'yes')
@@ -202,7 +218,6 @@ if __name__ == '__main__':
if enable_service:
if is_systemd():
systemd_unit('scylla-server.service').enable()
systemd_unit('scylla-fstrim.timer').unmask()
elif is_gentoo_variant():
run('rc-update add scylla-server default')
@@ -277,10 +292,14 @@ if __name__ == '__main__':
else:
print('Please select unmounted disks from the following list: {}'.format(devices))
selected = []
dsklist = []
while len(devices):
print('type \'cancel\' to cancel RAID/XFS setup.')
print('type \'done\' to finish selection. Selected: {}'.format(selected))
dsk = input('> ')
if len(dsklist) > 0:
dsk = dsklist.pop(0)
else:
dsk = input('> ')
if dsk == 'cancel':
raid_setup = 0
break
@@ -290,12 +309,16 @@ if __name__ == '__main__':
break
if dsk == '':
continue
if dsk.find(',') > 0:
dsklist = dsk.split(',')
continue
if not os.path.exists(dsk):
print('{} not found'.format(dsk))
continue
if not stat.S_ISBLK(os.stat(dsk).st_mode):
print('{} is not a block device'.format(dsk))
selected += dsk
continue
selected.append(dsk)
devices.remove(dsk)
disks = ','.join(selected)
if raid_setup:
@@ -312,21 +335,9 @@ if __name__ == '__main__':
if interactive:
sysconfig_setup = interactive_ask_service('Do you want to setup a system-wide customized configuration for Scylla?', 'Yes - setup the sysconfig file. No - skips this step.', 'yes')
if sysconfig_setup:
nics = [os.path.basename(n) for n in glob.glob('/sys/class/net/*') if n != '/sys/class/net/lo']
if len(nics) == 0:
print('A NIC was not found.')
sys.exit(1)
elif len(nics) == 1:
nic=nics[0]
else:
print('Please select a NIC from the following list:')
while True:
print(nics)
n = input('> ')
if os.path.exists('/sys/class/net/{}'.format(n)):
nic = n
break
set_nic = interactive_ask_service('Do you want to enable Network Interface Card (NIC) optimization?', 'Yes - optimize the NIC queue settings. Selecting Yes greatly improves performance. No - skip this step.', 'yes')
nic = interactive_choose_nic()
if interactive:
set_nic = interactive_ask_service('Do you want to enable Network Interface Card (NIC) optimization?', 'Yes - optimize the NIC queue settings. Selecting Yes greatly improves performance. No - skip this step.', 'yes')
if sysconfig_setup:
setup_args = '--setup-nic' if set_nic else ''
run_setup_script('NIC queue', '/usr/lib/scylla/scylla_sysconfig_setup --nic {nic} {setup_args}'.format(nic=nic, setup_args=setup_args))


@@ -1,10 +1,40 @@
#!/bin/bash -e
#!/usr/bin/python3
#
# Copyright 2018 ScyllaDB
#
. /usr/lib/scylla/scylla_lib.sh
#
# This file is part of Scylla.
#
# Scylla is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Scylla is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Scylla. If not, see <http://www.gnu.org/licenses/>.
if [ "$NETWORK_MODE" = "virtio" ]; then
ip tuntap del mode tap dev $TAP
elif [ "$NETWORK_MODE" = "dpdk" ]; then
/usr/lib/scylla/dpdk-devbind.py -u $ETHPCIID
/usr/lib/scylla/dpdk-devbind.py -b $ETHDRV $ETHPCIID
fi
import os
import sys
from scylla_util import *
if __name__ == '__main__':
if os.getuid() > 0:
print('Requires root permission.')
sys.exit(1)
if is_redhat_variant():
cfg = sysconfig_parser('/etc/sysconfig/scylla-server')
else:
cfg = sysconfig_parser('/etc/default/scylla-server')
if cfg.get('NETWORK_MODE') == 'virtio':
run('ip tuntap del mode tap dev {TAP}'.format(TAP=cfg.get('TAP')))
elif cfg.get('NETWORK_MODE') == 'dpdk':
run('/usr/lib/scylla/dpdk-devbind.py -u {ETHPCIID}'.format(ETHPCIID=cfg.get('ETHPCIID')))
run('/usr/lib/scylla/dpdk-devbind.py -b {ETHDRV} {ETHPCIID}'.format(ETHDRV=cfg.get('ETHDRV'), ETHPCIID=cfg.get('ETHPCIID')))


@@ -64,6 +64,10 @@ if __name__ == '__main__':
help='AMI instance mode')
args = parser.parse_args()
if args.nic and not is_valid_nic(args.nic):
print('NIC {} not found.'.format(args.nic))
sys.exit(1)
ifname = args.nic if args.nic else cfg.get('IFNAME')
network_mode = args.mode if args.mode else cfg.get('NETWORK_MODE')


@@ -27,14 +27,19 @@ import platform
import configparser
import io
import shlex
import shutil
def curl(url):
def curl(url, byte=False):
max_retries = 5
retries = 0
while True:
try:
req = urllib.request.Request(url)
return urllib.request.urlopen(req).read().decode('utf-8')
with urllib.request.urlopen(req) as res:
if byte:
return res.read()
else:
return res.read().decode('utf-8')
except urllib.error.HTTPError:
logging.warn("Failed to grab %s..." % url)
time.sleep(5)
@@ -79,6 +84,10 @@ class aws_instance:
continue
self._disks[t] += [ self.__xenify(dev) ]
def __mac_address(self, nic='eth0'):
with open('/sys/class/net/{}/address'.format(nic)) as f:
return f.read().strip()
def __init__(self):
self._type = self.__instance_metadata("instance-type")
self.__populate_disks()
@@ -95,6 +104,25 @@ class aws_instance:
"""Returns the class of the instance we are running in. i.e.: i3"""
return self._type.split(".")[0]
def is_supported_instance_class(self):
if self.instance_class() in ['i2', 'i3']:
return True
return False
def get_en_interface_type(self):
instance_class = self.instance_class()
instance_size = self.instance_size()
if instance_class in ['c3', 'c4', 'd2', 'i2', 'r3']:
return 'ixgbevf'
if instance_class in ['c5', 'c5d', 'f1', 'g3', 'h1', 'i3', 'm5', 'm5d', 'p2', 'p3', 'r4', 'x1']:
return 'ena'
if instance_class == 'm4':
if instance_size == '16xlarge':
return 'ena'
else:
return 'ixgbevf'
return None
def disks(self):
"""Returns all disks in the system, as visible from the AWS registry"""
disks = set()
@@ -133,6 +161,11 @@ class aws_instance:
"""Returns the private IPv4 address of this instance"""
return self.__instance_metadata("local-ipv4")
def is_vpc_enabled(self, nic='eth0'):
mac = self.__mac_address(nic)
mac_stat = self.__instance_metadata('network/interfaces/macs/{}'.format(mac))
return True if re.search(r'^vpc-id$', mac_stat, flags=re.MULTILINE) else False
## Regular expression helpers
# non-advancing comment matcher
@@ -222,37 +255,24 @@ class scylla_cpuinfo:
return len(self._cpu_data["system"])
def run(cmd, shell=False, silent=False, exception=True):
stdout=None
stderr=None
if silent:
stdout=subprocess.DEVNULL
stderr=subprocess.DEVNULL
if shell:
if exception:
return subprocess.check_call(cmd, shell=True, stdout=stdout, stderr=stderr)
else:
p = subprocess.Popen(cmd, shell=True, stdout=stdout, stderr=stderr)
return p.wait()
stdout=subprocess.DEVNULL if silent else None
stderr=subprocess.DEVNULL if silent else None
if not shell:
cmd = shlex.split(cmd)
if exception:
return subprocess.check_call(cmd, shell=shell, stdout=stdout, stderr=stderr)
else:
if exception:
return subprocess.check_call(shlex.split(cmd), stdout=stdout, stderr=stderr)
else:
p = subprocess.Popen(shlex.split(cmd), stdout=stdout, stderr=stderr)
return p.wait()
p = subprocess.Popen(cmd, shell=shell, stdout=stdout, stderr=stderr)
return p.wait()
def out(cmd, shell=False, exception=True):
if shell:
if exception:
return subprocess.check_output(cmd, shell=True).strip().decode('utf-8')
else:
p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
return p.communicate()[0].strip().decode('utf-8')
if not shell:
cmd = shlex.split(cmd)
if exception:
return subprocess.check_output(cmd, shell=shell).strip().decode('utf-8')
else:
if exception:
return subprocess.check_output(shlex.split(cmd)).strip().decode('utf-8')
else:
p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE)
return p.communicate()[0].strip().decode('utf-8')
p = subprocess.Popen(cmd, shell=shell, stdout=subprocess.PIPE)
return p.communicate()[0].strip().decode('utf-8')
def is_debian_variant():
return os.path.exists('/etc/debian_version')
@@ -306,17 +326,80 @@ def makedirs(name):
if not os.path.isdir(name):
os.makedirs(name)
def rmtree(path):
if not os.path.islink(path):
shutil.rmtree(path)
else:
os.remove(path)
def dist_name():
return platform.dist()[0]
def dist_ver():
return platform.dist()[1]
def is_unused_disk(dev):
# dev is not in /sys/class/block/, like /dev/nvme[0-9]+
if not os.path.isdir('/sys/class/block/{dev}'.format(dev=dev.replace('/dev/',''))):
return False
try:
fd = os.open(dev, os.O_EXCL)
os.close(fd)
return True
except OSError:
return False
CONCOLORS = {'green':'\033[1;32m', 'red':'\033[1;31m', 'nocolor':'\033[0m'}
def colorprint(msg):
print(msg.format(**CONCOLORS))
def get_mode_cpuset(nic, mode):
try:
mode_cpu_mask=out('/usr/lib/scylla/perftune.py --tune net --nic "{nic}" --mode "{mode}" --get-cpu-mask'.format(nic=nic, mode=mode))
return hex2list(mode_cpu_mask)
except subprocess.CalledProcessError:
return '-1'
def get_cur_cpuset():
cfg = sysconfig_parser('/etc/scylla.d/cpuset.conf')
cpuset=cfg.get('CPUSET')
return re.sub(r'^--cpuset (.+)$', r'\1', cpuset).strip()
def get_tune_mode(nic):
if not os.path.exists('/etc/scylla.d/cpuset.conf'):
return
cur_cpuset=get_cur_cpuset()
mq_cpuset=get_mode_cpuset(nic, 'mq')
sq_cpuset=get_mode_cpuset(nic, 'sq')
sq_split_cpuset=get_mode_cpuset(nic, 'sq_split')
if cur_cpuset == mq_cpuset:
return 'mq'
elif cur_cpuset == sq_cpuset:
return 'sq'
elif cur_cpuset == sq_split_cpuset:
return 'sq_split'
def create_perftune_conf(nic='eth0'):
if os.path.exists('/etc/scylla.d/perftune.yaml'):
return
mode=get_tune_mode(nic)
yaml=out('/usr/lib/scylla/perftune.py --tune net --nic "{nic}" --mode {mode} --dump-options-file'.format(nic=nic, mode=mode))
with open('/etc/scylla.d/perftune.yaml', 'w') as f:
f.write(yaml)
def is_valid_nic(nic):
return os.path.exists('/sys/class/net/{}'.format(nic))
class SystemdException(Exception):
pass
class systemd_unit:
def __init__(self, unit):
try:
run('systemctl cat {}'.format(unit), silent=True)
except subprocess.CalledProcessError:
raise SystemdException('unit {} not found'.format(unit))
self._unit = unit
def start(self):
@@ -336,8 +419,7 @@ class systemd_unit:
return run('systemctl disable {}'.format(self._unit))
def is_active(self):
res = out('systemctl is-active {}'.format(self._unit), exception=False)
return True if re.match(r'^active', res, flags=re.MULTILINE) else False
return out('systemctl is-active {}'.format(self._unit), exception=False)
def mask(self):
return run('systemctl mask {}'.format(self._unit))
@@ -368,7 +450,7 @@ class sysconfig_parser:
self.__load()
def get(self, key):
return self._cfg.get('global', key)
return self._cfg.get('global', key).strip('"')
def set(self, key, val):
if not self._cfg.has_option('global', key):
@@ -379,9 +461,3 @@ class sysconfig_parser:
def commit(self):
with open(self._filename, 'w') as f:
f.write(self._data)
class concolor:
GREEN = '\033[0;32m'
RED = '\033[0;31m'
BOLD_RED = '\033[1;31m'
NO_COLOR = '\033[0m'


@@ -51,6 +51,18 @@ is_redhat_variant() {
is_debian_variant() {
[ -f /etc/debian_version ]
}
is_debian() {
case "$1" in
jessie|stretch) return 0;;
*) return 1;;
esac
}
is_ubuntu() {
case "$1" in
trusty|xenial|bionic) return 0;;
*) return 1;;
esac
}
pkg_install() {
@@ -99,7 +111,7 @@ if [ ! -f /usr/bin/dh_testdir ]; then
fi
if [ ! -f /usr/bin/pystache ]; then
if is_redhat_variant; then
sudo yum install -y python2-pystache || sudo yum install -y pystache
sudo yum install -y /usr/bin/pystache
elif is_debian_variant; then
sudo apt-get install -y python-pystache
fi
@@ -125,12 +137,12 @@ echo $VERSION > version
cp -a dist/debian/debian debian
cp dist/common/sysconfig/scylla-server debian/scylla-server.default
if [ "$TARGET" = "jessie" ] || [ "$TARGET" = "stretch" ]; then
REVISION="1~$TARGET"
elif [ "$TARGET" = "trusty" ]; then
if [ "$TARGET" = "trusty" ]; then
cp dist/debian/scylla-server.cron.d debian/
REVISION="0ubuntu1~$TARGET"
elif [ "$TARGET" = "xenial" ] || [ "$TARGET" = "bionic" ]; then
fi
if is_debian $TARGET; then
REVISION="1~$TARGET"
elif is_ubuntu $TARGET; then
REVISION="0ubuntu1~$TARGET"
else
echo "Unknown distribution: $TARGET"


@@ -201,7 +201,6 @@ rm -rf $RPM_BUILD_ROOT
%{_prefix}/lib/scylla/api/api-doc/*
%{_prefix}/lib/scylla/scyllatop/*
%{_prefix}/lib/scylla/scylla_config_get.py
%{_prefix}/lib/scylla/scylla_lib.sh
%{_prefix}/lib/scylla/scylla_util.py
%if 0%{?fedora} >= 27
%{_prefix}/lib/scylla/scylla-gdb.py


@@ -0,0 +1,82 @@
Protocol extensions to the Cassandra Native Protocol
====================================================
This document specifies extensions to the protocol defined
by Cassandra's native_protocol_v4.spec and native_protocol_v5.spec.
The extensions are designed so that a driver supporting them can
continue to interoperate with Cassandra and other compatible servers
with no configuration needed; the driver can discover the extensions
and enable them conditionally.
An extension can be discovered by using the OPTIONS request; the
returned SUPPORTED response will have zero or more options beginning
with SCYLLA, indicating extensions defined in this document, in
addition to options documented by Cassandra. How to use the extension
is further explained in this document.
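
As a rough sketch of what that discovery step can look like on the driver side (the `supported` map below is illustrative example data, and `scylla_extensions` is our name for the helper, not a driver API):

```python
def scylla_extensions(supported):
    """Pick the SCYLLA-prefixed options out of a SUPPORTED response,
    parsing numeric values from their base-10 ASCII representation."""
    exts = {}
    for key, value in supported.items():
        if not key.startswith('SCYLLA'):
            continue  # regular Cassandra-documented option
        exts[key] = int(value) if value.isdigit() else value
    return exts

# Hypothetical options as a driver might receive them in SUPPORTED:
supported = {
    'CQL_VERSION': '3.4.4',
    'SCYLLA_SHARD': '3',
    'SCYLLA_NR_SHARDS': '12',
}
print(scylla_extensions(supported)['SCYLLA_NR_SHARDS'])  # 12
```

A server that does not implement the extensions simply returns no SCYLLA-prefixed options, so the helper returns an empty map and the driver falls back to standard behavior.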
# Intranode sharding
This extension allows the driver to discover how Scylla internally
partitions data among logical cores. It can then create at least
one connection per logical core, and send queries directly to the
logical core that will serve them, greatly improving load balancing
and efficiency.
To use the extension, send the OPTIONS message. The data is returned
in the SUPPORTED message, as a set of key/value options. Numeric values
are returned as their base-10 ASCII representation.
The keys and values are:
- `SCYLLA_SHARD` is an integer, the zero-based shard number this connection
is connected to (for example, `3`).
- `SCYLLA_NR_SHARDS` is an integer containing the number of shards on this
node (for example, `12`). All shard numbers are smaller than this number.
- `SCYLLA_PARTITIONER` is the fully-qualified name of the partitioner in use (e.g.
`org.apache.cassandra.partitioners.Murmur3Partitioner`).
- `SCYLLA_SHARDING_ALGORITHM` is the name of an algorithm used to select how
partitions are mapped into shards (described below)
- `SCYLLA_SHARDING_IGNORE_MSB` is an integer parameter to the algorithm (also
described below)
Currently, one `SCYLLA_SHARDING_ALGORITHM` is defined,
`biased-token-round-robin`. To apply the algorithm,
perform the following steps (assuming infinite-precision arithmetic):
- subtract the minimum token value from the partition's token
in order to bias it: `biased_token = token - (-2**63)`
- shift `biased_token` left by `ignore_msb` bits, discarding any
bits beyond the 63rd:
`biased_token = (biased_token << SCYLLA_SHARDING_IGNORE_MSB) % (2**64)`
- multiply by `SCYLLA_NR_SHARDS` and perform a truncating division by 2**64:
`shard = (biased_token * SCYLLA_NR_SHARDS) / 2**64`
(this apparently convoluted algorithm replaces a slow division instruction with
a fast multiply instruction).
In C or C++ with 128-bit arithmetic support, these operations can be performed
performed in three steps:
```c++
uint64_t biased_token = token + ((uint64_t)1 << 63);
biased_token <<= ignore_msb;
int shard = ((unsigned __int128)biased_token * nr_shards) >> 64;
```
In languages without 128-bit arithmetic support, use the following (this example
is for Java):
```Java
private int scyllaShardOf(long token) {
token += Long.MIN_VALUE;
token <<= ignoreMsb;
long tokLo = token & 0xffffffffL;
long tokHi = (token >>> 32) & 0xffffffffL;
long mul1 = tokLo * nrShards;
long mul2 = tokHi * nrShards;
long sum = (mul1 >>> 32) + mul2;
return (int)(sum >>> 32);
}
```
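
In a language with arbitrary-precision integers the steps above translate directly; a minimal sketch in Python (the function name is ours, and the parameter values in the usage lines are illustrative, matching the 12-shard example earlier in this document):

```python
def scylla_shard_of(token, nr_shards, ignore_msb):
    """Map a signed 64-bit token to a shard number using the
    biased-token-round-robin algorithm described above."""
    biased_token = token + 2**63                        # bias into [0, 2**64)
    biased_token = (biased_token << ignore_msb) % 2**64  # shift, keep low 64 bits
    return (biased_token * nr_shards) >> 64              # truncating divide by 2**64

print(scylla_shard_of(-2**63, 12, 12))     # 0  (minimum token -> first shard)
print(scylla_shard_of(2**63 - 1, 12, 12))  # 11 (maximum token -> last shard)
```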
It is recommended that drivers open connections until they have at
least one connection per shard, then close excess connections.


@@ -92,7 +92,7 @@ public:
imr::member<tags::back_pointer, imr::tagged_type<tags::back_pointer, imr::pod<basic_object*>>>,
imr::member<tags::object, Structure>
>;
static constexpr size_t size_overhead = sizeof(basic_object*);
private:
explicit object(uint8_t* ptr) noexcept
: basic_object(ptr)


@@ -164,6 +164,30 @@ abstract_replication_strategy::get_primary_ranges(inet_address ep) {
return ret;
}
dht::token_range_vector
abstract_replication_strategy::get_primary_ranges_within_dc(inet_address ep) {
dht::token_range_vector ret;
sstring local_dc = _snitch->get_datacenter(ep);
std::unordered_set<inet_address> local_dc_nodes = _token_metadata.get_topology().get_datacenter_endpoints().at(local_dc);
auto prev_tok = _token_metadata.sorted_tokens().back();
for (auto tok : _token_metadata.sorted_tokens()) {
auto&& eps = calculate_natural_endpoints(tok, _token_metadata);
// Unlike get_primary_ranges() which checks if ep is the first
// owner of this range, here we check if ep is the first just
// among nodes which belong to the local dc of ep.
for (auto& e : eps) {
if (local_dc_nodes.count(e)) {
if (e == ep) {
insert_token_range_to_sorted_container_while_unwrapping(prev_tok, tok, ret);
}
break;
}
}
prev_tok = tok;
}
return ret;
}
std::unordered_multimap<inet_address, dht::token_range>
abstract_replication_strategy::get_address_ranges(token_metadata& tm) const {
std::unordered_multimap<inet_address, dht::token_range> ret;


@@ -113,6 +113,10 @@ public:
// This function is the analogue of Origin's
// StorageService.getPrimaryRangesForEndpoint().
dht::token_range_vector get_primary_ranges(inet_address ep);
// get_primary_ranges_within_dc() is similar to get_primary_ranges()
// except it assigns a primary node for each range within each dc,
// instead of one node globally.
dht::token_range_vector get_primary_ranges_within_dc(inet_address ep);
std::unordered_multimap<inet_address, dht::token_range> get_address_ranges(token_metadata& tm) const;


@@ -1396,12 +1396,17 @@ row::row(const schema& s, column_kind kind, const row& o)
if (_type == storage_type::vector) {
auto& other_vec = o._storage.vector;
auto& vec = *new (&_storage.vector) vector_storage;
vec.present = other_vec.present;
vec.v.reserve(other_vec.v.size());
column_id id = 0;
for (auto& cell : other_vec.v) {
auto& cdef = s.column_at(kind, id++);
vec.v.emplace_back(cell_and_hash { cell.cell.copy(*cdef.type), cell.hash });
try {
vec.present = other_vec.present;
vec.v.reserve(other_vec.v.size());
column_id id = 0;
for (auto& cell : other_vec.v) {
auto& cdef = s.column_at(kind, id++);
vec.v.emplace_back(cell_and_hash{cell.cell.copy(*cdef.type), cell.hash});
}
} catch (...) {
_storage.vector.~vector_storage();
throw;
}
} else {
auto cloner = [&] (const auto& x) {


@@ -457,7 +457,10 @@ coroutine partition_entry::apply_to_incomplete(const schema& s,
pe.upgrade(pe_schema.shared_from_this(), s.shared_from_this(), pe_cleaner, no_cache_tracker);
}
bool can_move = !pe._snapshot;
// When preemptible, later memtable reads could start using the snapshot before
// snapshot's writes are made visible in cache, which would cause them to miss those writes.
// So we cannot allow erasing when preemptible.
bool can_move = !preemptible && !pe._snapshot;
auto src_snp = pe.read(reg, pe_cleaner, s.shared_from_this(), no_cache_tracker);
lw_shared_ptr<partition_snapshot> prev_snp;


@@ -273,6 +273,11 @@ public:
return is_partition_end() || (_ck && _ck->is_empty(s) && _bound_weight > 0);
}
bool is_before_all_clustered_rows(const schema& s) const {
return _type < partition_region::clustered
|| (_type == partition_region::clustered && _ck->is_empty(s) && _bound_weight < 0);
}
template<typename Hasher>
void feed_hash(Hasher& hasher, const schema& s) const {
::feed_hash(hasher, _bound_weight);


@@ -1004,6 +1004,22 @@ static dht::token_range_vector get_primary_ranges(
utils::fb_utilities::get_broadcast_address());
}
// get_primary_ranges_within_dc() is similar to get_primary_ranges(),
// but instead of each range being assigned just one primary owner
// across the entire cluster, here each range is assigned a primary
// owner in each of the clusters.
static dht::token_range_vector get_primary_ranges_within_dc(
database& db, sstring keyspace) {
auto& rs = db.find_keyspace(keyspace).get_replication_strategy();
return rs.get_primary_ranges_within_dc(
utils::fb_utilities::get_broadcast_address());
}
static sstring get_local_dc() {
return locator::i_endpoint_snitch::get_local_snitch_ptr()->get_datacenter(
utils::fb_utilities::get_broadcast_address());
}
struct repair_options {
// If primary_range is true, we should perform repair only on this node's
@@ -1256,21 +1272,14 @@ static int do_repair_start(seastar::sharded<database>& db, sstring keyspace,
rlogger.info("primary-range repair");
// when "primary_range" option is on, neither data_centers nor hosts
// may be set, except data_centers may contain only local DC (-local)
#if 0
if (options.data_centers.size() == 1 &&
options.data_centers[0] == DatabaseDescriptor.getLocalDataCenter()) {
options.data_centers[0] == get_local_dc()) {
ranges = get_primary_ranges_within_dc(db.local(), keyspace);
} else
#endif
#if 0
if (options.data_centers.size() > 0 || options.hosts.size() > 0) {
} else if (options.data_centers.size() > 0 || options.hosts.size() > 0) {
throw std::runtime_error("You need to run primary range repair on all nodes in the cluster.");
} else {
#endif
ranges = get_primary_ranges(db.local(), keyspace);
#if 0
}
#endif
} else {
ranges = get_local_ranges(db.local(), keyspace);
}


@@ -1,109 +0,0 @@
#!/bin/bash -e
#
# Copyright (C) 2015 ScyllaDB
if [ "`id -u`" -ne 0 ]; then
echo "Requires root permission."
exit 1
fi
print_usage() {
echo "scylla_install_pkg --local-pkg /home/scylla/rpms --repo [URL]"
echo " --local-pkg install locally built .rpm/.deb on specified directory"
echo " --repo repository for both install and update, specify .repo/.list file URL"
echo " --repo-for-install repository for install, specify .repo/.list file URL"
echo " --repo-for-update repository for update, specify .repo/.list file URL"
exit 1
}
LOCAL_PKG=
UNSTABLE=0
REPO_FOR_INSTALL=
REPO_FOR_UPDATE=
while [ $# -gt 0 ]; do
case "$1" in
"--local-pkg")
LOCAL_PKG=$2
shift 2
;;
"--repo")
REPO_FOR_INSTALL=$2
REPO_FOR_UPDATE=$2
shift 2
;;
"--repo-for-install")
REPO_FOR_INSTALL=$2
shift 2
;;
"--repo-for-update")
REPO_FOR_UPDATE=$2
shift 2
;;
*)
print_usage
shift 1
;;
esac
done
. /etc/os-release
if [ -f /etc/debian_version ]; then
echo "#!/bin/sh" >> /usr/sbin/policy-rc.d
echo "exit 101" >> /usr/sbin/policy-rc.d
chmod +x /usr/sbin/policy-rc.d
cp /etc/hosts /etc/hosts.orig
echo 127.0.0.1 `hostname` >> /etc/hosts
if [ "$REPO_FOR_INSTALL" != "" ]; then
curl -L -o /etc/apt/sources.list.d/scylla_install.list $REPO_FOR_INSTALL
fi
apt-get -o Acquire::AllowInsecureRepositories=true \
-o Acquire::AllowDowngradeToInsecureRepositories=true update
if [ "$LOCAL_PKG" = "" ]; then
apt-get install -o APT::Get::AllowUnauthenticated=true \
-y --force-yes scylla
else
if [ ! -f /usr/bin/gdebi ]; then
apt-get install -y --force-yes gdebi-core
fi
echo Y | gdebi $LOCAL_PKG/scylla-kernel-conf*.deb
echo Y | gdebi $LOCAL_PKG/scylla-conf*.deb
echo Y | gdebi $LOCAL_PKG/scylla-server_*.deb
echo Y | gdebi $LOCAL_PKG/scylla-server-dbg*.deb
echo Y | gdebi $LOCAL_PKG/scylla-jmx*.deb
echo Y | gdebi $LOCAL_PKG/scylla-tools*.deb
echo Y | gdebi $LOCAL_PKG/scylla_*.deb
fi
mv /etc/hosts.orig /etc/hosts
rm /usr/sbin/policy-rc.d
rm /etc/apt/sources.list.d/scylla_install.list
if [ "$REPO_FOR_UPDATE" != "" ]; then
curl -L -o /etc/apt/sources.list.d/scylla.list $REPO_FOR_UPDATE
fi
apt-get -o Acquire::AllowInsecureRepositories=true \
-o Acquire::AllowDowngradeToInsecureRepositories=true update
else
if [ "$REPO_FOR_INSTALL" != "" ]; then
curl -L -o /etc/yum.repos.d/scylla_install.repo $REPO_FOR_INSTALL
fi
if [ "$ID" = "centos" ]; then
yum install -y epel-release
elif [ "$ID" = "rhel" ]; then
rpm -ivh http://download.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-7.noarch.rpm
else
echo "Unsupported distribution"
exit 1
fi
if [ "$LOCAL_PKG" = "" ]; then
yum install -y scylla
else
yum install -y $LOCAL_PKG/scylla*.*.rpm
fi
rm /etc/yum.repos.d/scylla_install.repo
if [ "$REPO_FOR_UPDATE" != "" ]; then
curl -L -o /etc/yum.repos.d/scylla.repo $REPO_FOR_UPDATE
fi
fi

Submodule seastar updated: d7f35d7663...814a0552b6


@@ -85,7 +85,7 @@ static bool has_clustering_keys(const schema& s, const query::read_command& cmd)
_query_read_repair_decision = state->get_query_read_repair_decision();
} else {
// Reusing readers is currently only supported for singular queries.
if (_ranges.front().is_singular()) {
if (!_ranges.empty() && query::is_single_partition(_ranges.front())) {
_cmd->query_uuid = utils::make_random_uuid();
}
_cmd->is_first_page = true;
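The hunk above guards `ranges.front()` with an emptiness check, since calling `front()` on an empty range vector is undefined behavior. A minimal sketch of the corrected condition, using a hypothetical `partition_range` stand-in rather than Scylla's `dht` types:

```cpp
#include <cassert>
#include <vector>

// Illustrative stand-in for dht::partition_range; not Scylla's actual type.
struct partition_range {
    bool singular;
    bool is_singular() const { return singular; }
};

// Reader reuse is keyed by a query id; it is only assigned when the range
// list is non-empty AND the first range is a single partition. The old code
// called front() unconditionally, which is UB for an empty vector.
bool should_assign_query_uuid(const std::vector<partition_range>& ranges) {
    return !ranges.empty() && ranges.front().is_singular();
}
```

The short-circuiting `&&` is what makes the guard safe: `front()` is never evaluated when `ranges` is empty.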


@@ -211,7 +211,7 @@ protected:
protected:
virtual bool waited_for(gms::inet_address from) = 0;
virtual void signal(gms::inet_address from) {
void signal(gms::inet_address from) {
if (waited_for(from)) {
signal();
}
@@ -221,7 +221,7 @@ public:
abstract_write_response_handler(shared_ptr<storage_proxy> p, keyspace& ks, db::consistency_level cl, db::write_type type,
std::unique_ptr<mutation_holder> mh, std::unordered_set<gms::inet_address> targets, tracing::trace_state_ptr trace_state,
storage_proxy::write_stats& stats, size_t pending_endpoints = 0, std::vector<gms::inet_address> dead_endpoints = {})
: _id(p->_next_response_id++), _proxy(std::move(p)), _trace_state(trace_state), _cl(cl), _type(type), _mutation_holder(std::move(mh)), _targets(std::move(targets)),
: _id(p->get_next_response_id()), _proxy(std::move(p)), _trace_state(trace_state), _cl(cl), _type(type), _mutation_holder(std::move(mh)), _targets(std::move(targets)),
_dead_endpoints(std::move(dead_endpoints)), _stats(stats) {
// original comment from cassandra:
// during bootstrap, include pending endpoints in the count
@@ -285,10 +285,13 @@ public:
}
// return true on last ack
bool response(gms::inet_address from) {
signal(from);
auto it = _targets.find(from);
assert(it != _targets.end());
_targets.erase(it);
if (it != _targets.end()) {
signal(from);
_targets.erase(it);
} else {
slogger.warn("Received outdated write ack from {}", from);
}
return _targets.size() == 0;
}
future<> wait() {
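The `response()` change above replaces a hard `assert` with tolerant handling of an ack from a node that is no longer in the target set (e.g. a duplicate or late ack). A sketch of that pattern, with `std::string` standing in for `gms::inet_address`:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical simplification of abstract_write_response_handler's ack path.
struct write_handler {
    std::unordered_set<std::string> targets;
    int warnings = 0;
    // Returns true on the last ack, mirroring the diff's contract.
    bool response(const std::string& from) {
        auto it = targets.find(from);
        if (it != targets.end()) {
            targets.erase(it);   // first ack from this node: count it
        } else {
            ++warnings;          // outdated/duplicate ack: log, don't crash
        }
        return targets.empty();
    }
};
```

The key point is that the lookup happens before any state change, so a stray ack degrades to a warning instead of tripping the old `assert(it != _targets.end())`.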
@@ -632,9 +635,12 @@ void storage_proxy_stats::split_stats::register_metrics_for(gms::inet_address ep
}
}
using namespace std::literals::chrono_literals;
storage_proxy::~storage_proxy() {}
storage_proxy::storage_proxy(distributed<database>& db, storage_proxy::config cfg)
: _db(db)
, _next_response_id(std::chrono::system_clock::now().time_since_epoch()/1ms)
, _hints_resource_manager(cfg.available_memory / 10)
, _hints_for_views_manager(_db.local().get_config().data_file_directories()[0] + "/view_pending_updates", {}, _db.local().get_config().max_hint_window_in_ms(), _hints_resource_manager, _db)
, _background_write_throttle_threahsold(cfg.available_memory / 10) {
@@ -3323,9 +3329,22 @@ storage_proxy::query_partition_key_range(lw_shared_ptr<query::read_command> cmd,
slogger.debug("Estimated result rows per range: {}; requested rows: {}, ranges.size(): {}; concurrent range requests: {}",
result_rows_per_range, cmd->row_limit, ranges.size(), concurrency_factor);
// The call to `query_partition_key_range_concurrent()` below
// updates `cmd` directly when processing the results. Under
// some circumstances, when the query executes without deferring,
// this updating will happen before the lambda object is constructed
// and hence the updates will be visible to the lambda. This will
// result in the merger below trimming the results according to the
// updated (decremented) limits and causing the paging logic to
// declare the query exhausted due to the non-full page. To avoid
// this save the original values of the limits here and pass these
// to the lambda below.
const auto row_limit = cmd->row_limit;
const auto partition_limit = cmd->partition_limit;
return query_partition_key_range_concurrent(query_options.timeout(*this), std::move(results), cmd, cl, ranges.begin(), std::move(ranges),
concurrency_factor, std::move(query_options.trace_state), cmd->row_limit, cmd->partition_limit)
.then([row_limit = cmd->row_limit, partition_limit = cmd->partition_limit](std::vector<foreign_ptr<lw_shared_ptr<query::result>>> results) {
.then([row_limit, partition_limit](std::vector<foreign_ptr<lw_shared_ptr<query::result>>> results) {
query::result_merger merger(row_limit, partition_limit);
merger.reserve(results.size());
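The comment in the hunk above describes an evaluation-order hazard: when the call runs without deferring, it mutates `cmd` before the continuation's captures are taken. A minimal sketch of the fix, snapshotting the limit into a local before the mutating call (names here are illustrative, not Scylla's):

```cpp
#include <cassert>
#include <functional>

// If the callee runs synchronously and decrements `limit` before the
// continuation's capture is evaluated, capturing from the shared object
// would see the decremented value. Saving the original into a const local
// first makes the captured value deterministic.
int run_query(int& limit, std::function<int(int)> merge) {
    const int saved_limit = limit;   // snapshot BEFORE the call
    limit -= 7;                      // the call mutates the shared state
    return merge(saved_limit);       // continuation sees the snapshot
}
```

This mirrors moving `cmd->row_limit`/`cmd->partition_limit` out of the lambda's init-captures and into locals taken before `query_partition_key_range_concurrent()` is invoked.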


@@ -143,7 +143,7 @@ public:
};
private:
distributed<database>& _db;
response_id_type _next_response_id = 1; // 0 is reserved for unique_response_handler
response_id_type _next_response_id;
std::unordered_map<response_id_type, rh_entry> _response_handlers;
// This buffer hold ids of throttled writes in case resource consumption goes
// below the threshold and we want to unthrottle some of them. Without this throttled
@@ -263,6 +263,13 @@ public:
return _db;
}
response_id_type get_next_response_id() {
auto next = _next_response_id++;
if (next == 0) { // 0 is reserved for unique_response_handler
next = _next_response_id++;
}
return next;
}
void init_messaging_service();
// Applies mutation on this node.
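`get_next_response_id()` above keeps 0 reserved for `unique_response_handler` even after the counter wraps. A sketch of that wrap-around-safe generator; `uint8_t` is used only so a test can exercise the wrap cheaply:

```cpp
#include <cassert>
#include <cstdint>

// Post-increment id source that skips the reserved value 0 on wrap-around.
struct id_source {
    uint8_t next = 1;
    uint8_t get_next() {
        uint8_t id = next++;
        if (id == 0) {   // 0 is reserved; take the following id instead
            id = next++;
        }
        return id;
    }
};
```

Seeding the real counter from the current time (as the constructor change does) makes ids unlikely to collide with handlers from a previous incarnation, but the reserved-zero check is still needed for the wrap case.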


@@ -2643,14 +2643,20 @@ future<> storage_service::send_replication_notification(inet_address remote) {
// notify the remote token
auto done = make_shared<bool>(false);
auto local = get_broadcast_address();
auto sent = make_lw_shared<int>(0);
slogger.debug("Notifying {} of replication completion", remote);
return do_until(
[done, remote] {
return *done || !gms::get_local_failure_detector().is_alive(remote);
[done, sent, remote] {
// The node can send REPLICATION_FINISHED to itself, in which case
// is_alive will be true. If the messaging_service is stopped,
// REPLICATION_FINISHED can be sent infinitely here. To fix, limit
// the number of retries.
return *done || !gms::get_local_failure_detector().is_alive(remote) || *sent >= 3;
},
[done, remote, local] {
[done, sent, remote, local] {
auto& ms = netw::get_local_messaging_service();
netw::msg_addr id{remote, 0};
(*sent)++;
return ms.send_replication_finished(id, local).then_wrapped([id, done] (auto&& f) {
try {
f.get();
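The `do_until` change above adds a retry cap so `REPLICATION_FINISHED` cannot be resent forever when the node is notifying itself and the messaging service is stopped. A synchronous sketch of the bounded loop, with hypothetical `send`/`is_alive` callbacks standing in for `messaging_service` and the failure detector:

```cpp
#include <functional>

// Keep notifying until the remote acknowledges, is declared dead, or we
// have tried 3 times. Returns the number of send attempts made.
int notify_until_done(std::function<bool()> send,
                      std::function<bool()> is_alive) {
    bool done = false;
    int sent = 0;
    while (!done && is_alive() && sent < 3) {
        ++sent;
        done = send();   // true when the remote acknowledged
    }
    return sent;
}
```

Without the `sent < 3` term, a `send` that always fails against an always-alive peer would loop indefinitely, which is exactly the self-notification case the comment describes.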


@@ -33,6 +33,7 @@
#include "unimplemented.hh"
#include "stdx.hh"
#include "segmented_compress_params.hh"
#include "utils/class_registrator.hh"
namespace sstables {
@@ -299,7 +300,8 @@ size_t local_compression::compress_max_size(size_t input_len) const {
void compression::set_compressor(compressor_ptr c) {
if (c) {
auto& cn = c->name();
unqualified_name uqn(compressor::namespace_prefix, c->name());
const sstring& cn = uqn;
name.value = bytes(cn.begin(), cn.end());
for (auto& p : c->options()) {
if (p.first != compression_parameters::SSTABLE_COMPRESSION) {


@@ -2993,3 +2993,18 @@ SEASTAR_TEST_CASE(test_time_conversions) {
});
}
// Corner-case test that checks for the paging code's preparedness for an empty
// range list.
SEASTAR_TEST_CASE(test_empty_partition_range_scan) {
return do_with_cql_env_thread([] (cql_test_env& e) {
e.execute_cql("create keyspace empty_partition_range_scan with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};").get();
e.execute_cql("create table empty_partition_range_scan.tb (a int, b int, c int, val int, PRIMARY KEY ((a,b),c) );").get();
auto qo = std::make_unique<cql3::query_options>(db::consistency_level::LOCAL_ONE, infinite_timeout_config, std::vector<cql3::raw_value>{},
cql3::query_options::specific_options{1, nullptr, {}, api::new_timestamp()});
auto res = e.execute_cql("select * from empty_partition_range_scan.tb where token (a,b) > 1 and token(a,b) <= 1;", std::move(qo)).get0();
assert_that(res).is_rows().is_empty();
});
}


@@ -29,6 +29,9 @@
#include "database.hh"
#include "partition_slice_builder.hh"
#include "frozen_mutation.hh"
#include "mutation_source_test.hh"
#include "schema_registry.hh"
#include "service/migration_manager.hh"
SEASTAR_TEST_CASE(test_querying_with_limits) {
return do_with_cql_env([](cql_test_env& e) {
@@ -74,3 +77,33 @@ SEASTAR_TEST_CASE(test_querying_with_limits) {
});
});
}
SEASTAR_THREAD_TEST_CASE(test_database_with_data_in_sstables_is_a_mutation_source) {
do_with_cql_env([] (cql_test_env& e) {
run_mutation_source_tests([&] (schema_ptr s, const std::vector<mutation>& partitions) -> mutation_source {
try {
e.local_db().find_column_family(s->ks_name(), s->cf_name());
service::get_local_migration_manager().announce_column_family_drop(s->ks_name(), s->cf_name(), true).get();
} catch (const no_such_column_family&) {
// expected
}
service::get_local_migration_manager().announce_new_column_family(s, true).get();
column_family& cf = e.local_db().find_column_family(s);
for (auto&& m : partitions) {
e.local_db().apply(cf.schema(), freeze(m)).get();
}
cf.flush().get();
cf.get_row_cache().invalidate([] {}).get();
return mutation_source([&] (schema_ptr s,
const dht::partition_range& range,
const query::partition_slice& slice,
const io_priority_class& pc,
tracing::trace_state_ptr trace_state,
streamed_mutation::forwarding fwd,
mutation_reader::forwarding fwd_mr) {
return cf.make_reader(s, range, slice, pc, std::move(trace_state), fwd, fwd_mr);
});
});
return make_ready_future<>();
}).get();
}


@@ -659,6 +659,46 @@ void test_mutation_reader_fragments_have_monotonic_positions(populate_fn populat
});
}
static void test_date_tiered_clustering_slicing(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
simple_schema ss;
auto s = schema_builder(ss.schema())
.set_compaction_strategy(sstables::compaction_strategy_type::date_tiered)
.build();
auto pkey = ss.make_pkey();
mutation m1(s, pkey);
ss.add_static_row(m1, "s");
m1.partition().apply(ss.new_tombstone());
ss.add_row(m1, ss.make_ckey(0), "v1");
mutation_source ms = populate(s, {m1});
// query row outside the range of existing rows to exercise sstable clustering key filter
{
auto slice = partition_slice_builder(*s)
.with_range(ss.make_ckey_range(1, 2))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms.make_reader(s, prange, slice))
.produces(m1, slice.row_ranges(*s, pkey.key()))
.produces_end_of_stream();
}
{
auto slice = partition_slice_builder(*s)
.with_range(query::clustering_range::make_singular(ss.make_ckey(0)))
.build();
auto prange = dht::partition_range::make_singular(pkey);
assert_that(ms.make_reader(s, prange, slice))
.produces(m1)
.produces_end_of_stream();
}
}
static void test_clustering_slices(populate_fn populate) {
BOOST_TEST_MESSAGE(__PRETTY_FUNCTION__);
auto s = schema_builder("ks", "cf")
@@ -1012,6 +1052,7 @@ void test_slicing_with_overlapping_range_tombstones(populate_fn populate) {
}
void run_mutation_reader_tests(populate_fn populate) {
test_date_tiered_clustering_slicing(populate);
test_fast_forwarding_across_partitions_to_empty_range(populate);
test_clustering_slices(populate);
test_mutation_reader_fragments_have_monotonic_positions(populate);


@@ -53,7 +53,7 @@
#include "cell_locking.hh"
#include "flat_mutation_reader_assertions.hh"
#include "service/storage_proxy.hh"
#include "random-utils.hh"
#include "simple_schema.hh"
using namespace std::chrono_literals;
@@ -78,7 +78,7 @@ static atomic_cell make_atomic_cell(data_type dt, T value) {
template<typename T>
static atomic_cell make_collection_member(data_type dt, T value) {
return atomic_cell::make_live(*dt, 0, dt->decompose(std::move(value)));
return atomic_cell::make_live(*dt, 0, dt->decompose(std::move(value)), atomic_cell::collection_member::yes);
};
static mutation_partition get_partition(memtable& mt, const partition_key& key) {
@@ -1603,3 +1603,116 @@ SEASTAR_TEST_CASE(test_continuity_merging) {
}
});
}
class measuring_allocator final : public allocation_strategy {
    size_t _allocated_bytes = 0;
public:
virtual void* alloc(migrate_fn mf, size_t size, size_t alignment) override {
_allocated_bytes += size;
return standard_allocator().alloc(mf, size, alignment);
}
virtual void free(void* ptr, size_t size) override {
standard_allocator().free(ptr, size);
}
virtual void free(void* ptr) override {
standard_allocator().free(ptr);
}
virtual size_t object_memory_size_in_allocator(const void* obj) const noexcept override {
return standard_allocator().object_memory_size_in_allocator(obj);
}
size_t allocated_bytes() const { return _allocated_bytes; }
};
SEASTAR_THREAD_TEST_CASE(test_external_memory_usage) {
measuring_allocator alloc;
auto s = simple_schema();
auto generate = [&s] {
size_t data_size = 0;
auto m = mutation(s.schema(), s.make_pkey("pk"));
auto row_count = tests::random::get_int(1, 16);
for (auto i = 0; i < row_count; i++) {
auto ck_value = to_hex(tests::random::get_bytes(tests::random::get_int(1023) + 1));
data_size += ck_value.size();
auto ck = s.make_ckey(ck_value);
auto value = to_hex(tests::random::get_bytes(tests::random::get_int(128 * 1024)));
data_size += value.size();
s.add_row(m, ck, value);
}
return std::pair(std::move(m), data_size);
};
for (auto i = 0; i < 16; i++) {
auto [ m, size ] = generate();
with_allocator(alloc, [&] {
auto before = alloc.allocated_bytes();
auto m2 = m;
auto after = alloc.allocated_bytes();
BOOST_CHECK_EQUAL(m.partition().external_memory_usage(*s.schema()),
m2.partition().external_memory_usage(*s.schema()));
BOOST_CHECK_GE(m.partition().external_memory_usage(*s.schema()), size);
BOOST_CHECK_EQUAL(m.partition().external_memory_usage(*s.schema()), after - before);
});
}
}
SEASTAR_THREAD_TEST_CASE(test_cell_external_memory_usage) {
measuring_allocator alloc;
auto test_live_atomic_cell = [&] (data_type dt, bytes_view bv) {
with_allocator(alloc, [&] {
auto before = alloc.allocated_bytes();
auto ac = atomic_cell_or_collection(atomic_cell::make_live(*dt, 1, bv));
auto after = alloc.allocated_bytes();
BOOST_CHECK_GE(ac.external_memory_usage(*dt), bv.size());
BOOST_CHECK_EQUAL(ac.external_memory_usage(*dt), after - before);
});
};
test_live_atomic_cell(int32_type, { });
test_live_atomic_cell(int32_type, int32_type->decompose(int32_t(1)));
test_live_atomic_cell(bytes_type, { });
test_live_atomic_cell(bytes_type, bytes(1, 'a'));
test_live_atomic_cell(bytes_type, bytes(16, 'a'));
test_live_atomic_cell(bytes_type, bytes(32, 'a'));
test_live_atomic_cell(bytes_type, bytes(1024, 'a'));
test_live_atomic_cell(bytes_type, bytes(64 * 1024 - 1, 'a'));
test_live_atomic_cell(bytes_type, bytes(64 * 1024, 'a'));
test_live_atomic_cell(bytes_type, bytes(64 * 1024 + 1, 'a'));
test_live_atomic_cell(bytes_type, bytes(1024 * 1024, 'a'));
auto test_collection = [&] (bytes_view bv) {
auto collection_type = map_type_impl::get_instance(int32_type, bytes_type, true);
auto m = make_collection_mutation({ }, int32_type->decompose(0), make_collection_member(bytes_type, data_value(bytes(bv))));
auto cell = atomic_cell_or_collection(collection_type->serialize_mutation_form(m));
with_allocator(alloc, [&] {
auto before = alloc.allocated_bytes();
auto cell2 = cell.copy(*collection_type);
auto after = alloc.allocated_bytes();
BOOST_CHECK_GE(cell2.external_memory_usage(*collection_type), bv.size());
BOOST_CHECK_EQUAL(cell2.external_memory_usage(*collection_type), cell.external_memory_usage(*collection_type));
BOOST_CHECK_EQUAL(cell2.external_memory_usage(*collection_type), after - before);
});
};
test_collection({ });
test_collection(bytes(1, 'a'));
test_collection(bytes(16, 'a'));
test_collection(bytes(32, 'a'));
test_collection(bytes(1024, 'a'));
test_collection(bytes(64 * 1024 - 1, 'a'));
test_collection(bytes(64 * 1024, 'a'));
test_collection(bytes(64 * 1024 + 1, 'a'));
test_collection(bytes(1024 * 1024, 'a'));
}


@@ -1111,7 +1111,12 @@ std::unique_ptr<cql_server::response> cql_server::connection::make_supported(int
opts.insert({"CQL_VERSION", cql3::query_processor::CQL_VERSION});
opts.insert({"COMPRESSION", "lz4"});
opts.insert({"COMPRESSION", "snappy"});
auto& part = dht::global_partitioner();
opts.insert({"SCYLLA_SHARD", sprint("%d", engine().cpu_id())});
opts.insert({"SCYLLA_NR_SHARDS", sprint("%d", smp::count)});
opts.insert({"SCYLLA_SHARDING_ALGORITHM", part.cpu_sharding_algorithm_name()});
opts.insert({"SCYLLA_SHARDING_IGNORE_MSB", sprint("%d", part.sharding_ignore_msb())});
opts.insert({"SCYLLA_PARTITIONER", part.name()});
auto response = std::make_unique<cql_server::response>(stream, cql_binary_opcode::SUPPORTED, tr_state);
response->write_string_multimap(opts);
return response;


@@ -164,7 +164,7 @@ class unqualified_name {
public:
// can be optimized with string_views etc.
unqualified_name(const sstring& pkg_pfx, const sstring& name)
: _qname(name.compare(0, pkg_pfx.size(), pkg_pfx) == 0 ? name.substr(pkg_pfx.size() + 1) : name)
: _qname(name.compare(0, pkg_pfx.size(), pkg_pfx) == 0 ? name.substr(pkg_pfx.size()) : name)
{}
operator const sstring&() const {
return _qname;
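The `unqualified_name` fix above changes `substr(pkg_pfx.size() + 1)` to `substr(pkg_pfx.size())`: assuming the package prefix already carries its trailing dot (as with `org.apache.cassandra.io.compress.`), the `+ 1` variant ate the first letter of the class name. A sketch using `std::string`:

```cpp
#include <string>

// Strip a package prefix (which is assumed to include the trailing '.')
// from a qualified name; leave unqualified names untouched.
std::string unqualify(const std::string& pkg_pfx, const std::string& name) {
    if (name.compare(0, pkg_pfx.size(), pkg_pfx) == 0) {
        return name.substr(pkg_pfx.size());   // not size() + 1
    }
    return name;
}
```

`compare(0, n, pfx) == 0` is the standard prefix test: it compares the first `pkg_pfx.size()` characters of `name` against the whole prefix.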