Commit Graph

7370 Commits

Author SHA1 Message Date
Pekka Enberg
3c72ea9f96 gms: Fix gossiper::handle_major_state_change() restart logic
The restart logic is wrong because C* had a bug in
bf599fb5b062cbcc652da78b7d699e7a01b949ad, which they fixed later, and we
translated the broken version. We must check whether there is an existing
endpoint state and call the on_restart() hooks on that, not on the newly
available endpoint state.

Spotted while inspecting the code.

Acked-by: Asias He <asias@scylladb.com>
2015-11-16 11:25:48 +02:00
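A minimal sketch of the ordering described in 3c72ea9f96, assuming simplified stand-in types: `mini_gossiper`, `endpoint_state`, and `on_restart_hooks` here are hypothetical and are not the real gossiper classes. It only illustrates that the hooks see the previously known state before it is replaced.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the gossiper types.
struct endpoint_state {
    int generation = 0;
};

class mini_gossiper {
    std::map<std::string, endpoint_state> _states;
public:
    // Each hook receives the state that was known *before* the restart.
    std::vector<std::function<void(const std::string&, const endpoint_state&)>> on_restart_hooks;

    void handle_major_state_change(const std::string& endpoint, const endpoint_state& new_state) {
        auto it = _states.find(endpoint);
        if (it != _states.end()) {
            // Notify with the existing state, not with new_state.
            for (auto& hook : on_restart_hooks) {
                hook(endpoint, it->second);
            }
        }
        // Only now does the newly available state replace the old one.
        _states[endpoint] = new_state;
    }
};
```

With this ordering, the first announcement of an endpoint triggers no restart hook, and a later announcement passes the older state to each hook.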
Tomasz Grabiec
7e0f99cc3b Merge tag 'native-preparatory/v1' from https://github.com/avikivity/scylla.git
Assorted patches that pave the way for native storage (while not
committing us in any way).
2015-11-16 10:01:38 +01:00
Tomasz Grabiec
b6e7312c07 Merge 'memtable-allocating_section/v2' from https://github.com/avikivity/scylla.git
From Avi:

Memtables do not use an allocating_section to guard against allocation
failure, and hence can fail an allocation.  Reproducible by changing
perf_mutation to use an allocating type (bytes_type with a nontrivial
size) and making the loop longer.

Fix by using an allocating_section.
2015-11-16 10:01:05 +01:00
Avi Kivity
a40a62d840 memtable: use allocating_section to guard allocations
Without this, an allocation can fail, and we may not be able to reclaim
memory.
2015-11-16 10:56:06 +02:00
Avi Kivity
1c425d6b50 logalloc: allow allocating_section code blocks to return references
2015-11-15 19:10:24 +02:00
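The idea behind a40a62d840 and 1c425d6b50 can be illustrated with a generic retry wrapper. This is only a sketch under stated assumptions: `retrying_section` is a hypothetical class and does not reproduce the logalloc allocating_section interface. It runs a block, reclaims memory on allocation failure, retries, and forwards whatever the block returns, including references.

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical stand-in; this is not the logalloc::allocating_section API.
class retrying_section {
    std::size_t _max_attempts;
public:
    explicit retrying_section(std::size_t max_attempts = 3) : _max_attempts(max_attempts) {}

    // Runs fn(); on std::bad_alloc calls reclaim() and tries again,
    // giving up (and rethrowing) after _max_attempts attempts.
    // decltype(auto) keeps reference return types intact.
    template <typename Reclaim, typename Func>
    decltype(auto) operator()(Reclaim&& reclaim, Func&& fn) {
        for (std::size_t attempt = 1; ; ++attempt) {
            try {
                return fn();
            } catch (const std::bad_alloc&) {
                if (attempt >= _max_attempts) {
                    throw;
                }
                reclaim();
            }
        }
    }
};
```

The `decltype(auto)` return type is the design choice that matters for the second commit: with plain `auto`, a block returning `T&` would hand back a copy.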
Tomasz Grabiec
6626d1a100 storage_proxy: Stop latency_counter before throwing
2015-11-15 10:34:55 +02:00
Glauber Costa
0989f80c97 provide cf_stats where one is needed
Recently, I have introduced cf_stats into the database, propagating all the way
back to the column family. The problem, however, is that some tests create a
column family config themselves instead of going through make_column_family.

That is ultimately ok if those tests are not expected to flush memtables. But
if they are, the cf_stats pointer will be null and we will crash. Although
there are many solutions to this, the one that is in tune with our current
practices is to have the test that requires it provide an empty cf_stats storage
area that can be written to. That's already how we handle the disk directory and
other things like compaction properties.

With this patch, test.py passes again.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:31:32 +02:00
Glauber Costa
fe2b928a3f change defaults for commitlog maximum size
arbitrary 8G -> all_memory.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:29:23 +02:00
Glauber Costa
00c12319f1 config: change type for commitlog maximum size config option
This patch changes the type of commitlog_total_space_in_mb from uint32_t to
int64_t.  Moving to 64 is not strictly needed, since even a
signed 32-bit type would allow us to easily handle 2TB. But since we store that
in the commitlog as a 64-bit value, let's match it.

Moving from unsigned to signed, however, allows us to represent negative
numbers.  With that in place, we can change the semantics of the value
slightly, so as to allow a negative number to mean "all memory".

The reason behind this is that the default value, "8GB", is an artifact of the
JVM.  We don't need that, and in a many-shards configuration, each shard flushes
the commitlog way too often, since 8GB / many_shards = small_number.

8GB also happens to be a popular heap size for C* in the JVM. For us, we would
like to equate that (at least) with the amount of memory. The problem is how to
do that without introducing new options or changing the semantics of existing
options too radically.

The proposed solution will allow us to still parse C* yaml files, since those
will always have positive numbers, while introducing our own defaults.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-15 10:29:23 +02:00
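The convention described in 00c12319f1 (a negative configured size means "all memory") can be sketched as follows. The helper names and the megabyte units for total memory are assumptions made for illustration. This is not the code from the patch.

```cpp
#include <cstdint>

// Hypothetical helper: resolve the configured limit (in MB) to an effective limit.
// A negative value is taken to mean "use all memory"; positive values,
// such as those found in existing C* yaml files, are used unchanged.
inline std::uint64_t effective_commitlog_mb(std::int64_t configured_mb,
                                            std::uint64_t total_memory_mb) {
    return configured_mb < 0 ? total_memory_mb
                             : static_cast<std::uint64_t>(configured_mb);
}

// Hypothetical helper: the share each shard gets when the limit is split evenly.
inline std::uint64_t per_shard_commitlog_mb(std::int64_t configured_mb,
                                            std::uint64_t total_memory_mb,
                                            unsigned shards) {
    return effective_commitlog_mb(configured_mb, total_memory_mb) / shards;
}
```

For example, with 32 shards a fixed 8192 MB limit leaves 256 MB per shard, which is the "8GB / many_shards = small_number" effect the message describes.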
Avi Kivity
baf3b2ff4f Merge "Eliminate errors/warnings on building ubuntu package" from Takuya
"Fixes part of #493 (a few warnings still remain)"
2015-11-15 10:26:19 +02:00
Takuya ASADA
df9e7d542f dist: ignore bogus 'dist/ubuntu/debian/source/lintian-overrides' on ubuntu package building
2015-11-15 15:03:02 +09:00
Takuya ASADA
d40ce51a01 dist: fix 'init.d-script-missing-dependency-on-remote_fs' message on building ubuntu package
2015-11-15 15:03:02 +09:00
Takuya ASADA
87ed97c387 dist: fix 'init.d-script-not-included-in-package' message on building ubuntu package
2015-11-15 15:03:02 +09:00
Takuya ASADA
100c22cfa5 dist: place 'default' file by debian way
2015-11-15 15:03:02 +09:00
Takuya ASADA
7c4c182268 dist: move redhat/sysconfig to common/sysconfig
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
0c9f837d04 dist: fix 'source-is-missing' message on building ubuntu package
2015-11-15 15:03:02 +09:00
Takuya ASADA
580ca26092 dist: fix 'extended-description-line-too-long' message on building ubuntu package
2015-11-15 15:03:02 +09:00
Takuya ASADA
ee45201b5a dist: make ubuntu package as 'debian non-native package'
The Debian packaging system has two types of packages, 'native' and 'non-native'.
A 'native' package is one made just for Debian: its source tar.gz contains the debian/ directory, and it has no debian.tar.gz.
A 'non-native' package has an orig.tar.gz, which is the upstream source tarball, plus a debian.tar.gz, which contains the debian/ directory.
Scylla is currently 'native' but should be 'non-native', since it is not just for Debian. So move debian/ to dist/ubuntu/, create orig.tar.gz using git-archive-all, copy dist/ubuntu/debian/ to debian/, and then generate debian.tar.gz.
2015-11-15 15:03:02 +09:00
Takuya ASADA
c9c9a86195 dist: fix 'maintainer-script-needs-depends-on-adduser postinst' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
922f747122 dist: ignore some files under debian/ which are created on building time
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
10a7bd2d4a dist: fix 'wrong-section-according-to-package-name' and 'debug-package-should-be-priority-extra' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
63e262e507 correct permission of cassandra-rackdc.properties
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
91bd7cf71a dist: fix 'maintainer-script-lacks-debhelper-token' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
89fe283506 dist: fix 'ancient-standards-version 3.9.2' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Takuya ASADA
c09226fda8 dist: fix 'missing-license-paragraph-in-dep5-copyright' message on building ubuntu package
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-11-15 15:03:02 +09:00
Avi Kivity
36994a5d08 managed_bytes: add a constructor from std::initializer_list<>
Not actually used in the patchset now, but nice.
2015-11-13 17:13:07 +02:00
Avi Kivity
f3afe3e876 allocation_strategy: constify migrate_fn
Since abstract_type will be providing our migrate_fn, they must be const,
and indeed a migration does not change the migration function.
2015-11-13 17:13:07 +02:00
Avi Kivity
a4a776e66c cql3: add operation::make_cell() helpers
atomic_cell will soon become type-aware, so add helpers to class operation
that can supply the type, as it is available in operation::column.type.

(the type will be used in following patches)
2015-11-13 17:13:07 +02:00
Avi Kivity
79f7431a03 db: change collection_mutation::{one,view} not to use nested classes
Nested classes cannot be forward-declared, so change the naming
not to use them.  Follows atomic_cell{,_view}.
2015-11-13 17:13:07 +02:00
Avi Kivity
3fcb7add2e types: fix concrete_type::native_type_move()
The source is modified during a move, and so must not be const.
2015-11-13 17:13:07 +02:00
Avi Kivity
68a902ad0c data_value: add constructor from bool
schema_tables manages some boolean columns stored in system tables; it
dynamically creates them from C++ values.  But as we lacked bool->data_value
conversion, the C++ value was converted to an int32_type.  Somehow this didn't
cause any problems, but with some pending patches I have, it does.

Add a bool->data_value converting constructor to fix this.
2015-11-13 17:13:07 +02:00
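The pitfall fixed in 68a902ad0c comes from C++ overload resolution: with no bool overload available, a bool argument is promoted to an integer. A small sketch, assuming a hypothetical `tiny_value` class that is far simpler than the real data_value:

```cpp
#include <cstdint>
#include <string>

// Hypothetical miniature of a tagged value; not the real data_value class.
class tiny_value {
    std::string _type;
public:
    tiny_value(std::int32_t) : _type("int32") {}
    // Without this overload, `tiny_value(true)` would promote the bool
    // to an integer and silently pick the int32 constructor above.
    tiny_value(bool) : _type("boolean") {}

    const std::string& type_name() const { return _type; }
};
```

With both overloads present, `tiny_value(true)` is tagged as a boolean and `tiny_value(7)` as an int32.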
Avi Kivity
47499dcf18 data_value: make conversion from bytes explicit
Since bytes is a very generic value that is returned from many calls,
it is easy to pass it by mistake to a function expecting a data_value,
and to get a wrong result.  It is impossible for the data_value constructor
to know if the argument is a genuine bytes variable, a data_value of another
type, but serialized, or some other serialized data type.

To prevent misuse, make the data_value(bytes) constructor
(and the complementary data_value(optional<bytes>) constructor) explicit.
2015-11-13 17:12:29 +02:00
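The effect of marking such constructors explicit, as in 47499dcf18, can be checked at compile time. In this sketch `bytes` is aliased to `std::string`, `tiny_data_value` is a hypothetical class, and C++17 `std::optional` is assumed. None of these are the real Scylla types.

```cpp
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

using bytes = std::string;  // stand-in for the serialized-bytes type

// Hypothetical miniature; not the real data_value class.
class tiny_data_value {
    bytes _raw;
public:
    explicit tiny_data_value(bytes b) : _raw(std::move(b)) {}
    explicit tiny_data_value(std::optional<bytes> b)
        : _raw(b ? std::move(*b) : bytes()) {}

    const bytes& raw() const { return _raw; }
};

// Passing bytes where a tiny_data_value is expected no longer compiles,
// while deliberate construction still works.
static_assert(!std::is_convertible<bytes, tiny_data_value>::value,
              "implicit conversion from bytes is disabled");
static_assert(std::is_constructible<tiny_data_value, bytes>::value,
              "explicit construction from bytes is allowed");
```

A caller now has to write `tiny_data_value(b)` on purpose, which makes accidental use of already-serialized data visible at the call site.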
Asias He
2ebd18a248 streaming: Add two more virtual destructors
For stream_event_handler and stream_task.
2015-11-13 09:55:19 +02:00
Asias He
25a03013bf storage_service: Fix cql and thrift server stop in decommission
When do_stop_native_transport exits, cserver is destroyed which can
happen before cserver->stop(). Fix by capturing cserver in
cserver->stop()'s continuation to extend its lifetime. The same for
thrift server.

scylla: scylla/seastar/core/sharded.hh:327: seastar::sharded<Service>::~sharded()
[with Service = transport::cql_server]: Assertion `_instances.empty()' failed.
2015-11-13 09:40:07 +02:00
Tomasz Grabiec
b883ed3abf paging_state: Move instead of copy when possible
2015-11-12 20:19:00 +02:00
Glauber Costa
fa1ae45218 database: export collectd metrics about the state of memtable flushing
When analyzing a recent performance issue, I found it helpful to keep track of
the number of memtables that are currently in flight, as well as how much memory
they are consuming in the system.

Although those are memtable statistics, I am grouping them under the "cf_stats"
structure: since the column family is a central piece of the puzzle, it is reasonable
to assume that more metrics about it would be welcome in the future.

Note that we don't want to reuse the "stats" structure in the column family: for one,
the fields do not always map precisely (pending flushes, for instance, only tracks explicit
flushes), and also the stats structure is a lot more complex than we need.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-12 20:17:22 +02:00
Tomasz Grabiec
f3f2bf0b44 schema: Move definitions to source file
2015-11-12 13:50:01 +02:00
Avi Kivity
6a9ed4a4eb Merge seastar upstream
* seastar 5c10d3e...20bf03b (5):
  > do not re-throw exception to get to an exception pointer
  > Adding timeout counter to the rpc
  > configure.py: support for pkg-config before release 0.28
  > future: don't forget to warn about ignored exception
  > tutorial: continue network API section
2015-11-12 11:19:52 +02:00
Asias He
6aa5bfe59f range_streamer: Add virtual destructor to i_source_filter
Found by debug build

==10190==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x602000084430 in thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   16 bytes;
  size of the deallocated type: 8 bytes.
    #0 0x7fe244add512 in operator delete(void*, unsigned long) (/lib64/libasan.so.2+0x9a512)
    #1 0x3c674fe in std::default_delete<dht::range_streamer::i_source_filter>::operator()(dht::range_streamer::i_source_filter*)
       const /usr/include/c++/5.1.1/bits/unique_ptr.h:76
    #2 0x3c60584 in std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> >::~unique_ptr()
       /usr/include/c++/5.1.1/bits/unique_ptr.h:236
    #3 0x3c7ac22 in void __gnu_cxx::new_allocator<std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> > >::destroy<std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> > >(std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> >*) /usr/include/c++/5.1.1/ext/new_allocator.h:124
...
2015-11-12 11:19:22 +02:00
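The class of bug reported by AddressSanitizer in 6aa5bfe59f arises when a derived object is deleted through a base pointer whose destructor is not virtual. A minimal sketch of the corrected shape, assuming hypothetical names (`source_filter`, `even_endpoint_filter`, `filters_destroyed`) that are not taken from range_streamer:

```cpp
#include <memory>

int filters_destroyed = 0;  // hypothetical counter, used only to observe destruction

struct source_filter {
    // The virtual destructor makes deletion through a base pointer
    // run the derived destructor and free the full derived object.
    virtual ~source_filter() = default;
    virtual bool should_include(int endpoint) const = 0;
};

struct even_endpoint_filter final : source_filter {
    long padding = 0;  // makes the derived type larger than the base
    ~even_endpoint_filter() override { ++filters_destroyed; }
    bool should_include(int endpoint) const override { return endpoint % 2 == 0; }
};
```

Holding the object in a `std::unique_ptr<source_filter>`, as the stack trace above does with i_source_filter, then destroys it correctly when the pointer goes out of scope.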
Avi Kivity
47c7dd96c5 Merge "Aggregate paging support" from Calle
"This adds repeated paged querying to do aggregate queries (similar to
origin). Uses "batched" paging."

Fixes #549
2015-11-11 20:47:14 +02:00
Calle Wilund
fdc549cd47 select_statement: Handle aggregate queries
Fixes #549.

Being clinically absent-minded, aggregate query support (i.e. count(...))
was left out of the "paging" change set.

This adds repeated paged querying to do aggregate queries (similar to
origin). Uses "batched" paging.
2015-11-11 18:41:47 +01:00
Calle Wilund
cc88763961 query_pager: Add method for repeated queries
Adds a fetch_page method that, instead of returning a full result set, appends rows
to a pre-existing one. Used for "batching".
2015-11-11 18:40:14 +01:00
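The "batched" approach in cc88763961 and fdc549cd47 can be sketched with an in-memory pager. Everything here is an assumption made for illustration: `vector_pager`, its `fetch_page` signature, and `count_all` are hypothetical and do not mirror the real query_pager interface.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical pager over an in-memory row set.
class vector_pager {
    const std::vector<int>& _rows;
    std::size_t _pos = 0;
public:
    explicit vector_pager(const std::vector<int>& rows) : _rows(rows) {}

    bool exhausted() const { return _pos >= _rows.size(); }

    // Appends up to page_size rows to `out` instead of returning a fresh result set.
    void fetch_page(std::vector<int>& out, std::size_t page_size) {
        std::size_t end = std::min(_rows.size(), _pos + page_size);
        out.insert(out.end(), _rows.begin() + _pos, _rows.begin() + end);
        _pos = end;
    }
};

// Drains the pager page by page, then aggregates (here: counts) over the accumulated rows.
inline std::size_t count_all(vector_pager& pager, std::size_t page_size) {
    if (page_size == 0) {
        page_size = 1;  // guard against a loop that never advances
    }
    std::vector<int> accumulated;
    while (!pager.exhausted()) {
        pager.fetch_page(accumulated, page_size);
    }
    return accumulated.size();
}
```

Counting ten rows with a page size of three takes four fetches and still yields a single aggregate result.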
Amnon Heiman
1b369be663 compaction_strategy should accept both class name and full class name
For compatibility reasons, compaction_strategy should accept both class
name strategy and the full class name that includes the package name.

In origin the returned name depends on the configuration; we cannot mimic
that, as we are using an enum for the type.

So for now the returned class name remains the class name itself; we can
consider changing it in the future.

If the name is org.apache.cassandra.db.compaction.Name then it will be
compared as Name.

The error message was modified to report the name it was given.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Fixes #545
2015-11-11 15:31:39 +02:00
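The name matching described in 1b369be663 amounts to dropping the package prefix before comparing. A minimal sketch, assuming a hypothetical helper name; the prefix string is the one quoted in the commit message:

```cpp
#include <string>

// Hypothetical helper: strip the package prefix so that both the short
// and the fully qualified class name compare equal.
inline std::string short_strategy_name(const std::string& name) {
    static const std::string prefix = "org.apache.cassandra.db.compaction.";
    if (name.compare(0, prefix.size(), prefix) == 0) {
        return name.substr(prefix.size());
    }
    return name;
}
```

Names without the prefix pass through unchanged, so an error message can still report exactly what the caller supplied.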
Avi Kivity
b95ff5c09e Merge "Commitlog format change: add scylla "magic"" from Calle
"Slight file format change for commitlog segments, now including
a scylla "marker". Allows for fast-fail if trying to load an
Origin segment.

WARNING: This changes the file format, and there is no good way for me to
check if a CL is "old" scylla, or Origin (since "version" is the same). So
either "old" scylla files also fail, or we never fail (until later, and
worse). Thus, if upgrading from an older version to this patch, make sure to
clean out all commit logs first."
2015-11-11 11:42:47 +02:00
Avi Kivity
dcc7302312 Merge "Paging support" from Calle
Fixes #355

"Implements query paging similar to origin. If driver sets a "page size" in
a query, and we cannot know that we will not exceed this limit in a single
query, the query is performed using a "pager" object, which, using modified
partition ranges and query limits, keeps track of returned rows to "page"
through the results.

Implementation structure sort of mimics the origin design, even though it
is maybe a little bit overkill for us (currently). On the other hand, it
does not really hurt.

This implementation is tested using the "paging_test" subset in dtest.
It passes all tests except:

* test_paging_using_secondary_indexes
* test_paging_using_secondary_indexes_with_static_cols
* test_failure_threshold_deletions

The first two because we don't have secondary indexes yet, the last
because the test depends on "tombstone_failure_threshold" in origin.

Potential todo: Currently the pager object does not shortcut result
building fully when page limit is exceeded. Could save a little work
here, but probably not very significant."
2015-11-11 10:45:41 +02:00
Avi Kivity
5aecf210e2 Merge "gossip shutdown fix + streaming fix" from Asias
Fixes: #540 #542
2015-11-11 10:27:43 +02:00
Asias He
efda753c0c token_metadata: Implement pending_endpoints_for
It is used in storage_proxy::create_write_response_handler. The second
argument should be keyspace name instead of the keyspace class.

Refs: #539
2015-11-11 09:41:21 +02:00
Calle Wilund
43712a583d commitlog_replayer: Special case exception from "old/origin file"
And write some nice informative stuff.
2015-11-10 17:14:22 +01:00
Calle Wilund
85b8d65374 commitlog: Change file format to include magic marker
Allows us to fail fast if someone tries to replay an Origin commit log.

WARNING: This changes the file format, and there is no good way for me to
check if a CL is "old" scylla, or Origin (since "version" is the same). So
either "old" scylla files also fail, or we never fail (until later, and
worse). Thus, if upgrading from an older version to this patch, make sure to
clean out all commit logs first.
2015-11-10 17:11:06 +01:00
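The fail-fast check enabled by 85b8d65374 can be sketched as a header test performed before any replay work starts. The marker value, the header layout, and the function name below are all hypothetical; the real segment format is not reproduced here.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical marker value; the actual constant used by the commitlog is not shown here.
constexpr std::uint32_t example_magic = 0x5343594c;

// Throws immediately if the segment does not start with the expected marker,
// so a foreign (or pre-marker) file is rejected before any entry is parsed.
inline void check_segment_header(const std::vector<unsigned char>& header) {
    std::uint32_t magic = 0;
    if (header.size() < sizeof(magic)) {
        throw std::runtime_error("commit log segment header too short");
    }
    std::memcpy(&magic, header.data(), sizeof(magic));  // native byte order, for simplicity
    if (magic != example_magic) {
        throw std::runtime_error("commit log segment lacks the expected marker");
    }
}
```

A replayer could catch this exception specifically and print an informative message, which is the kind of special-casing 43712a583d describes.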
Glauber Costa
72573d0b46 storage proxy: be more vocal about timeouts
If a timeout happens, we should log it. Trace level is really
not adequate for this.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-11-10 16:41:22 +02:00