Commit Graph

225 Commits

Author SHA1 Message Date
Daniel Fiala
06089474c9 Print warning if user uses default cluster_name
* Configuration for cluster_name is commented-out in config file.
* Default value set to empty string and if not rewritten by user then
  warning is printed and value is reset to "ScyllaDB Cluster".

Fixes #2648.

Message-Id: <20170808113322.9313-1-daniel@scylladb.com>
2017-08-08 14:47:17 +03:00
Duarte Nunes
e11e66723a main: Don't catch polymorphic exceptions by value
GCC trunk complains due to exception slicing.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170727163021.8000-1-duarte@scylladb.com>
2017-07-31 10:12:13 +03:00
Tomasz Grabiec
8624edc0fa legacy_schema_migrator: Take storage_proxy as dependency
Will be needed to query for mutations.
2017-07-11 14:52:23 +02:00
Glauber Costa
780a6e4d2e change task quota's default
The default of 2ms is somewhat arbitrary. Now that we have a lot more
mileage deploying Scylla applications in production it does sound not
only arbitrary, but high.

In particular, it is really hard to achieve 1ms latencies in the face of
CPU-heavy workloads with it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <1499354495-27173-1-git-send-email-glauber@scylladb.com>
2017-07-11 13:50:39 +03:00
Gleb Natapov
d23111312f main: wait for wait_for_gossip_to_settle() to complete during boot
Boot should not continue until a future returned by
wait_for_gossip_to_settle() is resolved.  Commit 991ec4a16 mistakenly
broke that, so restore it back. Also fix calls for supervisor::notify()
to be in the right places.

Message-Id: <20170702082355.GQ14563@scylladb.com>
2017-07-02 11:32:36 +03:00
Avi Kivity
c4ae2206c7 messaging: respect inter_dc_tcp_nodelay configuration parameter
We respect it partially (client side only) for now.

Fixes #6.
Message-Id: <20170623172048.23103-1-avi@scylladb.com>
2017-06-24 21:49:27 +02:00
Gleb Natapov
ca812a8ea0 database: reset node's hit rate information on connection drop
Node may go down, so after it restarts cache hit rate info will be
incorrect and it can be overwhelmed with traffic until new and
up-to-date cache hit rate arrives. Solve this by dropping node's
information on connection reset, it is more accurate than relying on
gossip which may be slow and miss reboot of a node.
2017-06-13 09:57:14 +03:00
Gleb Natapov
991ec4a16c periodically calculate avg cache hit rate between all shards
This patch adds new class cache_hitrate_calculator whose responsibility
is to periodically calculate average cache hit rates between all shards
for each CF.
2017-06-13 09:57:14 +03:00
Avi Kivity
8979d7abf0 Deprecate non-murmur3 partitioners
Removing non-murmur3 partitioners will allow us to reduce memory footprint
and speed up some code by utilizing the properties of the murmur3 partitioner
token.
Message-Id: <20170528172536.16079-1-avi@scylladb.com>
2017-05-28 19:35:56 +02:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Calle Wilund
48ddcbb77b database/main: encapsulate system CF dir touching 2017-05-10 16:44:47 +00:00
Calle Wilund
9eb91bc30b main: Add legacy schema migration to startup 2017-05-10 16:44:47 +00:00
Takuya ASADA
7a59336b8a main.cc: drop FS type check
Since we add support ext4, we don't need to limit filesystem to XFS anymore.

Fixes #1933

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1493212525-26264-1-git-send-email-syuu@scylladb.com>
2017-04-26 17:35:55 +03:00
Raphael S. Carvalho
43ac19eb52 database: wire up new resharding algorithm
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-04-21 17:11:31 -03:00
Vlad Zolotarov
2d8fcde695 init: add a proper message when there is a bad 'seeds' configuration
Fixes #2193

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1490912678-32004-1-git-send-email-vladz@scylladb.com>
2017-04-02 10:41:52 +03:00
Tomasz Grabiec
388315c1ff sstables: Expose index metrics 2017-03-28 18:10:39 +02:00
Amnon Heiman
7b04841dda main: Name the http servers
In main there are two http servers that start, the API and prometheus.
This patch name them accordingly so their metrics will have more
meaning.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1489055282-10887-1-git-send-email-amnon@scylladb.com>
2017-03-09 12:30:49 +02:00
Amnon Heiman
4e8d73098f main: Prometheus should start as early as possible
There is no need to wait when starting the prometheus server. As it is
up to each of the modules to register its metrics when it is ready.

This is especially important when debuging boot issues.

This patch moves the prometheus initilization to be done at an early
stage of the boot sequencec.

Fixes #2144

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1489041986-28974-1-git-send-email-amnon@scylladb.com>
2017-03-09 11:26:51 +02:00
Vlad Zolotarov
f2e4629254 main.cc: expose scylla version as a gauge metrics
Add a new metric that exposes the current ScyllaDB version as a
gauge metrics.

The version is exposed as a label with the "version" key.

Fixes #1979

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1487083703-27929-1-git-send-email-vladz@scylladb.com>
2017-02-16 16:57:55 +02:00
Calle Wilund
c4c4eb06c4 main.cc: remove scylla dns dependency
Use seastar facilities instead.
2017-02-06 11:36:57 +00:00
Calle Wilund
feffc2bbe1 main/init: Lookup inet addresses from config by dns lookup
I.e. allow symbolic names in addition to ip addresses.
2017-02-06 09:45:37 +00:00
Calle Wilund
ff8f82f21c scylla tls: Add option support for client auth and tls opts
Refs #1813 (fixes scylla part)

Added require_client_auth and priority_string options to
server_encryption_options/client_encryption_options an process them.

Allows TLS method/algo specification. Also enabled enforcing known cert
authentication for both node-to-node and client communication.
2017-02-06 09:45:09 +00:00
Avi Kivity
0591303b72 Merge "avoid excessive memory usage during resharding" from Rapahel
"Intended to reduce memory usage when resharding by sharing sstable
components among shards. File descriptors are also shared from now
on, meaning that a much smaller number of file descriptors will be
used during resharding.

Fixes #1951."

branch 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla

* 'excessive_memory_usage_v4' of github.com:raphaelsc/scylla:
  db: avoid excessive memory usage during resharding
  checked_file_impl: add support to dup
  sstables: group sstable components that can be shared among shards
  sstables: rename sstable member
2017-01-09 20:43:50 +02:00
Raphael S. Carvalho
68dfcf5256 db: avoid excessive memory usage during resharding
After resharding, sstables may be owned by all shards, which
means that file descriptors and memory usage for metadata will
increase by a factor equal to number of shards. That can easily
lead to OOM.

SSTable components are immutable, so they can be stored in one
shard and shared with others that need it. We use the following
formula to decide which shard will open the sstable and share
it with the others: (generation % smp::count), which is the
inverse of how we calculate generation for new sstables.
So if no resharding is performed, everything is shard-local.
With this approach, resource usage due to loaded sstables will
be evenly distributed among shards.

For this approach to work, we now only populate keyspaces from
shard 0. It's now the sole responsible for iterating through
column family dirs. In addition, most of population functions
are now free and take distributed database object as parameter.

Fixes #1951.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-01-09 15:24:36 -02:00
Avi Kivity
85f4e16336 main: fix incorrect low memory warning
A spurious division by smp::count warns that memory is low even when plenty
is available.  Fix by removing the division.

Fix #2002.

Message-Id: <20170108122216.27233-1-avi@scylladb.com>
Tested-by: Benoît Canet <benoit@scylladb.com>
2017-01-08 15:14:36 +02:00
Gleb Natapov
9ed3346f98 main: fix error reporting about low memory
Message-Id: <20170108112144.GT1829@scylladb.com>
2017-01-08 13:46:48 +02:00
Vlad Zolotarov
492295eb7f init: move supervisor_notify() out of main.cc
Transform the supervisor_notify() and related functions into
the "supervisor" class and place this class implementation in
a separate .cc file.

This is going to fix the compilation breakage of tests introduced
by a

commit 8014adc2a1

    init: serialize the creation of system_traces KS objects

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1483663955-20096-1-git-send-email-vladz@scylladb.com>
2017-01-06 10:10:55 +00:00
Vlad Zolotarov
8014adc2a1 init: serialize the creation of system_traces KS objects
Serialize the creation of a system_traces KS objects when
they do not exist - the initial cluster boot.
Avoid creating them in parallel by different cluster Nodes
in order to avoid issue #420.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1483552503-12873-3-git-send-email-vladz@scylladb.com>
2017-01-05 12:41:38 +01:00
Nadav Har'El
45f19f2633 main: better error message on failing to start Prometheus
Previously, if the Prometheus port (by default, 0.0.0.0:9180) could not
be opened, the following message appeared in the log about 10 seconds into
the run, and Scylla crashed.

ERROR 2017-01-01 19:31:04,066 [shard 0] seastar - Exiting on unhandled exception: std::system_error (error system:98, Address already in use)

The puzzled user would have no idea *which* address was already in use, why,
or why Scylla stopped.

In this patch, before the above message we get the much more informative
message:

ERROR 2017-01-01 19:58:19,080 [shard 0] init - Could not start Prometheus API server on 0.0.0.0:9180: std::system_error (error system:98, Address already in use)

We continue to print the original message - and exit - in this case,
under the assumption that it's better not to run the database while
improperly configured.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20170102121304.2060-1-nyh@scylladb.com>
2017-01-04 14:58:26 +02:00
Avi Kivity
339cc0c2fa main: verify sufficient memory per shard
Refuse to boot if we don't have at least 1 GiB per shard, unless in developer
mode.

The primary violator here is docker, but since it starts in developer mode,
it won't get fixed.  We need some extra logic for this case.
Message-Id: <20161221090222.28677-1-avi@scylladb.com>
2016-12-27 12:05:52 +02:00
Amnon Heiman
70b2a1bfd4 Set the prometheus prefix to scylla
This patch make the prometheus prefix configurable and set the default
value to scylla.

Fixes #1964

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1482671970-21487-1-git-send-email-amnon@scylladb.com>
2016-12-25 15:21:53 +02:00
Raphael S. Carvalho
27fb8ec512 db: avoid excessive disk usage during sstable resharding
Shared sstables will now be resharded in the same order to guarantee
that all shards owning a sstable will agree on its deletion nearly
the same time, therefore, reducing disk space requirement.
That's done by picking which column family to reshard in UUID order,
and each individual column family will reshard its shared sstables
in generation order.

Fixes #1952.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <87ff649ed24590c55c00cbb32bffd8fa2743e36e.1482342754.git.raphaelsc@scylladb.com>
2016-12-21 23:18:06 +02:00
Vlad Zolotarov
62cad0f5f5 tracing: don't start tracing until a Tracing service is fully initialized
RPC messaging service is initialized before the Tracing service, so
we should prevent creation of tracing spans before the service is
fully initialized.

We will use an already existing "_down" state and extend it in a way
that !_down equals "started", where "started" is TRUE when the local
service is fully initialized.

We will also split the Tracing service initialization into two parts:
   1) Initialize the sharded object.
   2) Start the tracing service:
      - Create the I/O backend service.
      - Enable tracing.

Fixes issue #1939

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1481836429-28478-1-git-send-email-vladz@scylladb.com>
2016-12-21 12:40:14 +02:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Tomasz Grabiec
f7197dabf8 commitlog: Fix replay to not delete dirty segments
The problem is that replay will unlink any segments which were on disk
at the time the replay starts. However, some of those segments may
have been created by current node since the boot. If a segment is part
of reserve for example, it will be unlinked by replay, but we will
still use that segment to log mutations. Those mutations will not be
visible to replay after a crash though.

The fix is to record preexisting segents before any new segments will
have a chance to be created and use that as the replay list.

Introduced in abe7358767.

dtest failure:

 commitlog_test.py:TestCommitLog.test_commitlog_replay_on_startup

Message-Id: <1481117436-6243-1-git-send-email-tgrabiec@scylladb.com>
2016-12-07 15:54:47 +02:00
Takuya ASADA
2976799ef2 main: fix startup failing on Ubuntu 15.10/16.04
Since Ubuntu 15.10/16.04 still uses Upstart to manage GUI session (not as init), when we directly launch Scylla on Ubuntu's GUI Terminal(not using systemctl or initctl), raise(SIGSTOP) mistakenly calls (Because GUI session has "UPSTART_JOB" environment variable, won't happen when running Scylla as systemd service).

To avoid this, we need to verify UPSTART_JOB == "scylla-server".
If it's part of GUI session UPSTART_JOB has to be "unity7", we need to avoid raise(SIGSTOP) in that case.

Fixes #1199

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1480620421-28967-1-git-send-email-syuu@scylladb.com>
2016-12-05 16:28:25 +02:00
Avi Kivity
28857e42e7 Merge " Virtualize size_estimates system table" from Duarte
"We currently write the size_estimates system table for every schema
on a periodic basis, currently set to 5 minutes, which can interfere
with an ongoing workload.

This patchset virtualizes it such that queries are intercepted and we
calculate the results on the fly, only for the ranges the caller is interested in.

Fixes #1616"

* 'virtual-estimates/v4' of github.com:duarten/scylla:
  size_estimates_virtual_reader: Add unit test
  db: Delete size_estimates_recorder
  size_estimates: Add virtual reader
  column_family: Add support for virtual readers
  storage_service: get_local_tokens() returns a future
  nonwrapping_range: Add slice() function
  range: Find a sequence's lower and upper bounds
  system_keyspace: Build mutations for size estimates
  size_estimates: Store the token range as bytes
  range_estimates: Add schema
  murmur3_partitioner: Convert maximum_token to sstring
2016-11-28 10:12:59 +02:00
Avi Kivity
07d5a20bae Wire up sharding ignore msb parameter to configuration
We might have used a fancy map<sstring, any> to pass the parameters, but
that's overkill for now.
2016-11-22 22:40:47 +02:00
Duarte Nunes
6a37d87c76 db: Delete size_estimates_recorder
Now that access to the size_estimates system is virtualized, we no
longer need the recorder.

Fixes #1616

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-11-21 11:15:05 +00:00
Raphael S. Carvalho
9a9f0d3a0f main: fix exception handling when initializing data or commitlog dirs
Exception handling was broken because after io checker, storage_io_error
exception is wrapped around system error exceptions. Also the message
when handling exception wasn't precise enough for all cases. For example,
lack of permission to write to existing data directory.

Fixes #883.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <b2dc75010a06f16ab1b676ce905ae12e930a700a.1478542388.git.raphaelsc@scylladb.com>
2016-11-14 12:34:10 +02:00
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
Takuya ASADA
587d375e19 main: exit with 1 when verify_seastar_io_scheduler() failed
Since we are exiting Scylla process in engine().at_exit() using
::_exit(0), even verify_seastar_io_scheduler() throwing an exception,
scylla always exit with 0.

Systemd misunderstands scylla-server.service was shutdown successfully
because of this, so we need to pass correct exit code to ::_exit() here.

Fixes #1674

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1475065607-15486-1-git-send-email-syuu@scylladb.com>
2016-10-17 13:57:00 +03:00
Raphael S. Carvalho
76862d0d9c main: start compaction procedure after commit log is replayed
Commit log replay is a synchronous operation in bootstrap, so services
will only be started after it's completed. By starting compaction before,
less bandwidth will be available to both and consequently boot will be
slowed down. Fix is simply about moving compaction, which is an
asynchronous operation after commitlog replay is over.

Fixes #1620.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d2a173a4ee4d474317b970c6b39530e61067fea9.1475527955.git.raphaelsc@scylladb.com>
2016-10-06 18:25:24 +03:00
Avi Kivity
c94fb1bf12 build: reduce inclusions of messaging_service.hh
Remove inclusions from header files (primary offender is fb_utilities.hh)
and introduce new messaging_service_fwd.hh to reduce rebuilds when the
messaging service changes.

Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>
2016-10-05 11:46:49 +03:00
Gleb Natapov
26ae8e8365 implement listen_on_broadcast_address option
When using multiple physical network interfaces, set this to true to
listen on broadcast_address in addition to the listen_address, allowing
nodes to communicate in both interfaces.  Ignore this property if the
network configuration automatically routes between the public and
private networks such as EC2.

Message-Id: <20160921094810.GA28654@scylladb.com>
2016-09-26 08:49:54 +03:00
Pekka Enberg
f1d0401ed2 main: Use proper logger for API server messages
We have a "startlog" that we can use to print out API server messages.

Message-Id: <1474358312-26510-1-git-send-email-penberg@scylladb.com>
2016-09-20 11:09:59 +03:00
Tomasz Grabiec
9476bc5a31 Introduce --abort-on-lsa-bad-alloc command line option
Useful for triggerring core dump on allocation failure inside LSA,
which makes it easier to debug allocation failures. They normally
don't cause aborts, just fail the current operation, which makes it
hard to figure out what was the cause of allocation failure.

Message-Id: <1470233631-18508-1-git-send-email-tgrabiec@scylladb.com>
2016-08-03 17:26:44 +03:00
Amnon Heiman
bb4268a8a5 Add prometheus API
This patch adds the prometheus API it adds the proto library to the
compilation, adds an optional configuration parameter to change the
prometheus listening port and start the prometheus API in main.

To disable the prometheus API, set its listening port to 0.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470228764-19545-2-git-send-email-amnon@scylladb.com>
2016-08-03 15:55:18 +03:00
Duarte Nunes
9ffdf4a5cd db: Implement size_estimates_recorder
This patch implements the size_estimates_recorder, which periodically
writes estimations for all the non-system column families in the
size_estimates system table. The size_estimates_recorder class
corresponds to the one in Cassandra's SizeEstimatesRecorder.java.

Estimation is carried out by shard 0. Since we're estimating based on
data in shared sstables, having multiple shards doing this would skew
the results.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-19 09:44:58 +00:00
Paweł Dziepak
7e06499458 repair: convert hashing to streamed_mutations
This patch makes hashing for repair calculate checksums in a way that
doesn't require rebuilding whole mutation.
Unfortunately, such checksums are incompatible with the old ones so the
old way for computing checksums is preserved for compatibility reasons.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:23 +01:00