Commit Graph

9705 Commits

Author SHA1 Message Date
Piotr Jastrzebski
cd9f3f94c4 Fix row_cache::update
Clear continuous flag on the last cache entry
with key smaller than a partition being dropped from
memtable on flush and not saved in cache.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <0b5293cc0bf8bb858e62aa8dd00ae7fe7a484380.1467059472.git.piotr@scylladb.com>
2016-06-28 10:25:38 +02:00
Piotr Jastrzebski
eb959a8b81 Change check for artificial entry
in cache_entry destructor from
_key.has_key() to _lru_link.is_linked()

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f6d3d1bc49d9f6dd5b67a10cbe862466047b039d.1467059472.git.piotr@scylladb.com>
2016-06-28 10:24:29 +02:00
Nadav Har'El
164c760324 Switch compression chunk default from 64 KB to 4 KB
Following Cassandra, our default sstable compression chunk size is 64 KB.
The big downside of this default size is that small reads need to read
and uncompress a large chunk, around 32 KB (if compression halves the data
size). In this patch we switch the default chunk size to 4 KB, which allows
faster small reads (the report in issue #1337 was of a 60-fold speedup...).

Since commit 2f56577, large reads will not be signficantly slowed down by
changing to a small chunk size. The remaining potential downside of this
change is lowering of the compression ratio because of the smaller chunks
individually compressed. However, experimentation shows that the compression
ratio is hurt somewhat, but not dramatically, by lowering the chunk size:
A recent survey of Cassandra compression in
https://www.percona.com/blog/2016/03/09/evaluating-database-compression-methods/
reports a compression ratio of 2 for 64 KB chunks, vs. 1.75 for 4 KB chunks.
My own test on a cassandra-stress workload (whose data is relatively hard
to compress), showed compression ratio 1.25 for 64 KB chunk, vs. 1.23 for
4 KB chunks.

Also remember that if a user wants to control the chunk length for a
particular table, he can - the 64 KB or 4 KB sizes are just the default.

Fixes #1337

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1467063335-12096-1-git-send-email-nyh@scylladb.com>
2016-06-28 08:50:24 +03:00
Tomasz Grabiec
6108d91362 scylla-gdb: Introduce scylla ptr
Helps in identifying pointers allocated through seastar
allocator. Shows to which thread the pointer belongs, to which size
class, whether it's live or free, what's the offset realtive to the
live object.

Example:

  (gdb) scylla ptr 0x6040abe88170
  thread 1, small (size <= 320), live (0x6040abe88140 +48)

Message-Id: <1467047215-1763-1-git-send-email-tgrabiec@scylladb.com>
2016-06-27 20:11:56 +03:00
Avi Kivity
22ec25b1b3 Merge seastar upstream
* seastar 3029ebe...c15055c (5):
  > memory: add option to mlock() all memory
  > reactor: run idle poll handler with a pure poll function
  > ignore all but one failed futures in map_reduce
  > tutorial: more general exception printout on startup
  > resource: don't abort on too-high io queue count

Fixes #1395.
Fixes #1400.
2016-06-27 19:24:04 +03:00
Tomasz Grabiec
85a37cb379 Merge tag '1398/v3' from https://github.com/avikivity/scylla
From Avi:

Both the cql binary transport and the rpc server have protection against
too many concurrent requests overwhelming the database due to transient
allocations.  There work by estimating the amount of memory a request
requires, and accounting that against a semaphore.  When the semaphore
blocks, we stop dequeing requests from the tcp connection.

Unfortunately, this doesn't work for reads, because we can't estimate the
required memory size.  A small read request can require many sstables to be
read, perhaps concurrently, and a large response to be generated.

Fix by limiting the number of concurrent reads in a shard to 100.  This
is more than enough concurrency for any reasonable disk, and there is no
network communication at this level, so we're safe from high network
latency requiring high concurrency.

Fixes #1398.
2016-06-27 18:04:33 +02:00
Avi Kivity
f03cd6e913 db: add statistics about queued reads 2016-06-27 17:25:08 +03:00
Avi Kivity
edeef03b34 db: restrict replica read concurrency
Since reading mutations can consume a large amount of memory, which, moreover,
is not predicatable at the time the read is initiated, restrict the number
of reads to 100 per shard.  This is more than enough to saturate the disk,
and hopefully enough to prevent allocation failures.

Restriction is applied in column_family::make_sstable_reader(), which is
called either on a cache miss or if the cache is disabled.  This allows
cached reads to proceed without restriction, since their memory usage is
supposedly low.

Reads from the system keyspace use a separate semaphore, to prevent
user reads from blocking system reads.  Perhaps we should select the
semaphore based on the source of the read rather than the keyspace,
but for now using the keyspace is sufficient.
2016-06-27 17:17:56 +03:00
Avi Kivity
bea7d7ee94 mutation_reader: introduce restricting_reader
A restricting_reader wraps a mutation_reader, and restricts it concurrency
using a provided semaphore; this allows controlling read concurrency, which
is important since reads can consume a lot of resources ((number of
participating sstables) * 128k after we have streaming mutations, and a lot
more before).
2016-06-27 17:17:52 +03:00
Avi Kivity
f96e5d7c1b managed_bytes: fix build with gcc 6
gcc 6 complains that deleting a managed_bytes::external isn't defined
because the size isn't known.  I'm not sure it's correct, but there's no
way to tell because flexible arrays aren't standardized.

Fix by using an array of zero size.
Message-Id: <1466715187-4125-1-git-send-email-avi@scylladb.com>
2016-06-27 10:54:10 +02:00
Avi Kivity
056b427855 range_tombstone_list: use non-template lambda for cloning tombstones
Using a template lambda invokes a bug in Fedora 24's boost where the
lambda's parameter is an internal boost type rather than a range_tombestone.

Constraining the parameter with an explicit type avoids the problem.
Message-Id: <1466844211-17298-1-git-send-email-avi@scylladb.com>
2016-06-27 10:48:59 +02:00
Amnon Heiman
a439a6b8d3 API: Add the collectd enable/disable implementation
This adds the implementation to the enable and disable of the collectd
metrics.

An example for disabling all collectd metrics that has write in their
type_instance part:

curl -X POST --header "Content-Type: application/json" --header "Accept:
application/json"
"http://localhost:10000/collectd/.*?instance=.*&type=.*&type_instance=.*write.*&enable=false"

After that a call to:
curl -X GET "http://localhost:10000/collectd/"

Would return those metrics with the enable set to "false"

An example to enable all the metrics in cache that their type starts
with byt:

curl -X POST --header "Content-Type: application/json" --header "Accept:
application/json"
"http://localhost:10000/collectd/cache?type=byt.*&enable=true"

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1466932139-19264-3-git-send-email-amnon@scylladb.com>
2016-06-26 12:26:50 +03:00
Amnon Heiman
4d7837af40 API Definition: collectd to support enable disable
This adds to the definition of the collectd API the ability to turn on
and off specific collectd metrics.

For the GET end point a POST option was added that allow to enable or
disable a metric.

The general GET endpoint now returns the enable flag that indicates if
the metric is enable.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1466932139-19264-2-git-send-email-amnon@scylladb.com>
2016-06-26 12:26:48 +03:00
Duarte Nunes
dfbf68cd24 commitlog: Define operator<< in namespace db
Needed for compilation with gcc6.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1466852874-8448-1-git-send-email-duarte@scylladb.com>
2016-06-26 10:05:28 +03:00
Avi Kivity
5b81448ed6 main: add scylla --version option
Fixes #1384.
Message-Id: <1466691517-29964-1-git-send-email-avi@scylladb.com>
2016-06-23 16:24:03 +02:00
Duarte Nunes
1ffae6e6ee database_test: Add test case for row limit
This patch introduces database_test and adds a test case to ensure
the row limit is respected when querying multiple partition ranges.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20160623111723.17523-1-duarte@scylladb.com>
2016-06-23 14:20:34 +02:00
Avi Kivity
e647ec1c4a Merge "thrift: Implement describe verbs" from Duarte
"This patchset implements the thrib describe verbs:
    - describe_keyspace
    - describe_keyspaces
    - describe_cluster_name
    - describe_version
    - describe_ring
    - describe_local_ring
    - describe_token_map
    - describe_partitioner
    - describe_snitch
    - describe_schema_versions

The verbs describe_splits and describe_splits_ex are not implemented
because they are marked as experimentail (Origin's thrift interface has
this to say about them: "experimental API for hadoop/parallel query
support. may change violently and without warning."). Some drivers have
moved away from depending on this verb (SPARKC-94). The correct way to
implement the verbs for us would be to use the size_estimates system table
(CASSANDRA-7688). However, we currently don't populate size_estimates, which
is done by SizeEstimatesRecorder.java in Origin."
2016-06-23 13:30:39 +03:00
Duarte Nunes
b291c22e39 thrift: Complete describe_keyspace verb
This patch completes the describe_keyspace verb by adding setting the
remaining fields.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 12:02:47 +02:00
Duarte Nunes
febc48166d thrift: Type name is already based on Origin
This patch removes a conversion function from an internal type
name to Origin's naming, which isn't needed because the
abstract_type hierarchy already keeps that mapping.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 12:02:47 +02:00
Duarte Nunes
8b00fe3989 thrift: Add explanatory note about describe_splits
We don't implement describe_splits, and this patch describes why that
it. In a nutshell, to properly implement this, we would need something
like Origin's SizeEstimatesRecorder.java, but as the verb is marked as
experimental, we don't do it for now.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 12:02:46 +02:00
Duarte Nunes
b175204cfe thrift: Implement describe_snitch verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:47 +02:00
Duarte Nunes
9e6ab878d6 thrift: Implement describe_partitioner verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:47 +02:00
Duarte Nunes
358b03c409 thrift: Implement describe_token_map verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:47 +02:00
Duarte Nunes
1ea7102d9f thrift: Implement describe_ring verbs
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:42 +02:00
Duarte Nunes
8377264226 thrift: Implement describe_version verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:10 +02:00
Duarte Nunes
8370450dcb trhift: Implement describe_cluster_name verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:09 +02:00
Duarte Nunes
2a898743c6 thrift: Implement describe_schema_versions verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:53:07 +02:00
Pekka Enberg
c464e85a3f Merge "thrift: Implement DDL verbs" from Duarte
"This patchset implements the thrift DDL verbs:
    - system_add_column_family
    - system_drop_column_family
    - system_update_column_family
    - system_add_keyspace
    - system_drop_keyspace
    - system_update_keyspace"
2016-06-23 12:46:58 +03:00
Duarte Nunes
3c02af083c thrift: Implement system_update_keyspace verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:29 +02:00
Duarte Nunes
aa16c303ca thrift: Implement system_drop_keyspace verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:29 +02:00
Duarte Nunes
8ff3fbe916 thrift: Implement system_drop_column_family verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:29 +02:00
Duarte Nunes
f6fab027c6 thrift: Implement system_update_column_family verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
de46653036 thrift: Implement system_add_column_family verb
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
25a8ffb09a thrift: Extract keyspace_from_thrift function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
74cb796de7 thrift: Extract schema_from_thrift function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
9d85ea6304 thrift: Complete system_add_keyspace verb
This patch completes the system_add_keyspace verb by setting all
relevant options on the new schemas.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
05f7a6d63e thrift: Add basic support for dynamic CF
In thrift, a static column family is one where all columns are
defined upon schema creation. It maps to a CQL table with a singular
partition key and a set of regular columns.

On the other hand, a dynamic column family is one which allows column
to be dynamically added by insertion requests. It maps to a CQL table
with a partition key and a clustering key, which will hold the names of
the dynamic columns, and a regular column, which will how the respective
values. If the thrift comparator type is composite, then there will be a
clustering column for each of the composite's components.

There can also be mixed column families; supporting those is future
work.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:41:28 +02:00
Duarte Nunes
49b8bff21c thrift: Extract make_exception to common header
This patch moves the make_exception function from thrift/handler.cc to
the new header file thrift/utils.hh.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-23 11:40:52 +02:00
Benoît Canet
e37b18b231 scylla_ntp_setup: Define an ntp server on ubuntu if there is none
The pool directive from ntp.conf is not recognized by ntpdate.
Strip it and put the ubuntu server in place.

Fixes: #1345

Signed-of-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466607457-14029-1-git-send-email-benoit@scylladb.com>
2016-06-23 12:40:13 +03:00
Benoît Canet
8b6bb0251d README.md: Fix markdown formating
I suspect wrong formatting causes us trouble in the
docker hub descriptions.

Signed-of-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466603787-13423-1-git-send-email-benoit@scylladb.com>
2016-06-23 12:39:04 +03:00
Duarte Nunes
aacc7193f2 schema: Replace keyspace's schema_ptr on CF update
This patch ensures we replace the schema_ptr held by its respective
keyspace object when a column family is being updated.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20160623085710.26168-1-duarte@scylladb.com>
2016-06-23 11:11:52 +02:00
Tzach Livyatan
3fa7bb1292 scylla_setup: Ignore case in prompt responses
Fix #1376 by converting each response to lowercase.

Signed-off-by: Tzach Livyatan <tzach@scylladb.com>
Message-Id: <1466672539-5625-1-git-send-email-tzach@scylladb.com>
2016-06-23 12:08:26 +03:00
Glauber Costa
e08fa7dafa fix potential stale data in cache update
We currently have a problem in update_cache, that can be trigger by ordering
issues related to memtable flush termination (not initiation) and/or
update_cache() call duration.

That issue is described in #1364, and in short, happens if a call to
update_cache starts before and ongoing call finishes. There is now a new SSTable
that should be consulted by the presence checker that is not.

The partition checker operates in a stale list because we need to make sure the
SSTable we just wrote is excluded from it.  This patch changes the partition
checker so that all SSTables currently in use are consulted, except for the one
we have just flushed. That provides both the guarantee that we won't check our
own SSTable and access to the most up-to-date SSTable list.

Fixes #1364

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <fa1cee672bba8e21725c6847353552791225295f.1466534499.git.glauber@scylladb.com>
2016-06-23 10:54:44 +02:00
Pekka Enberg
bcba45f546 Merge "Prevent old node to join new cluster" from Asias
Fixes #1253
2016-06-23 10:25:38 +03:00
Piotr Jastrzebski
9b011bff18 row_cache: add contiguity flag to cache entry to reduce disk IO during scans
Add contiguity flag to cache entry and set it in scanning reader.
Partitions fetched during scanning are continuous
and we know there's nothing between them.

Clear contiguity flag on cache entries
when the succeeding entry is removed.

Use continuous flag in range queries.
Don't go do disk if we know that there's nothing
between two entries we have in cache. We know that
when continuous flag of the first one is set to true.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <72bae432717037e95d1ac9465deaccfa7c7da707.1466627603.git.piotr@scylladb.com>
2016-06-23 09:43:15 +03:00
Avi Kivity
5af22f6cb1 main: handle exceptions during startup
If we don't, std::terminate() causes a core dump, even though an
exception is sort-of-expected here and can be handled.

Add an exception handler to fix.

Fixes #1379.
Message-Id: <1466595221-20358-1-git-send-email-avi@scylladb.com>
2016-06-23 09:25:33 +03:00
Avi Kivity
a192c80377 gdb: fully-qualify type names
gdb gets confused if a non-fully-qualified class name is used when
we are in some namespace context.  Help it out by adding a :: prefix.
Message-Id: <1466587895-8690-1-git-send-email-avi@scylladb.com>
2016-06-22 12:04:17 +02:00
Avi Kivity
9dacd4fb80 Merge "query: Add new limits" from Duarte
This patchset adds two new types of query limits:

 - Per partition row limit, which limits how many rows
   a given partition may return; needed both for thrift
   and for future CQL features;
 - Limit on the number of partitions returned, needed
   by thrift.
2016-06-22 11:03:13 +03:00
Duarte Nunes
82dbf5bff3 storage_proxy: Trace when retrying a query
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:48:15 +02:00
Duarte Nunes
69798df95e query: Limit number of partitions returned
This is required to implement a thrift verb.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-06-22 09:48:13 +02:00