Commit Graph

11716 Commits

Author SHA1 Message Date
Tomasz Grabiec
f1c2481040 sstables: Fix bug in promoted index generation
maybe_flush_pi_block, which is called for each cell, assumes that
block_first_colname will be empty when the first cell is encountered
for each partition.

This didn't hold after writing partition which generated no index
entry, because block_first_colname was cleared only when there way any
data written into the promoted index. Fix by always clearing the name.

The effect was that the promoted index entry for the next partition
would be flushed sooner than necessary (still counting since the start
of the previous partition) and with offset pointing to the start of
the current partition. This will cause parsing error when such sstable
is read through promoted index entry because the offset is assumed to
point to a cell not to partition start.

Fixes #1567

Message-Id: <1470909915-4400-1-git-send-email-tgrabiec@scylladb.com>
2016-08-11 13:08:48 +03:00
Amnon Heiman
a24941cc5f build_deb: Add dist flag
The dist flag mark the debian package as distributed package.
As such the housekeeping configuration file will be included in the
package and will not need to be created by the scylla_setup.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1470907208-502-2-git-send-email-amnon@scylladb.com>
2016-08-11 12:25:07 +03:00
Pekka Enberg
d1a052237d dist/docker: Fix typo in "--overprovisioned" help text
Reported by Mathias Bogaert (@analytically).
Message-Id: <1470904395-4614-1-git-send-email-penberg@scylladb.com>
2016-08-11 11:38:03 +03:00
Nadav Har'El
7409688356 README.md: add another required package
I tried to compile scylladb on a new Fedora 24 system, and the "-lsystemd"
library was missing during like. We need the systemd-devel package for that.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470865063-12871-1-git-send-email-nyh@scylladb.com>
2016-08-11 10:21:21 +02:00
Avi Kivity
42d8701121 Merge seastar upstream
* seastar 64ae228...59613e7 (20):
  > reactor: fix I/O queue pending requests collectd metric
  > simple_input_stream: introduce copy_to() member function
  > scollectd: Fix merge skew between "disabled" and "descriped" metrics patches
  > Merge "Write-behind for XFS"
  > Merge "Fix the SMP queue poller" from Tomasz
  > Merge "collectd syntactical sugar & descriptions" from Calle
  > Fix build failure introduced by 5b1051ce0de5b772f22444abe6a1d97076b49b1f
  > Add --abort-on-seastar-bad-alloc option
  > Merge "Make arp comply with C++ strict aliasing rules"
  > doc: use install-dependencies.sh on docs
  > add execute bit on install-dependencies.sh
  > core/reactor: Fix use-after-free on io_event's promise
  > tcp: write option length correctly
  > tcp: make tcp options comply with strict aliasing rules
  > tcp: comply with strict aliasing rules
  > add libdl to library list
  > reactor: add exception counter
  > install-dependencies.sh: remove unnecessary sudo
  > install-dependencies.sh: install add-apt-repository when it's not installed
  > install-dependencies.sh: add protobuf to dependencies, for newly added prometheus API support

Fixes #1558.
2016-08-10 15:14:31 +03:00
Tomasz Grabiec
d7f8ce7722 Merge branch 'raphael/fix_min_max_metadata_v2' from git@github.com:raphaelsc/scylla.git
Fix for generation of sstables min/max clustering metadata from Raphael.
2016-08-10 10:43:35 +02:00
Pekka Enberg
6a5ab6bff4 dist/docker: Add '--smp', '--memory', and '--overprovisoned' options
Add '--smp', '--memory', and '--overprovisioned' options to the Docker
image. The options are written to /etc/scylla.d/docker.conf file, which
is picked up by the Scylla startup scripts.

You can now, for example, restrict your Docker container to 1 CPU and 1
GB of memory with:

   $ docker run --name some-scylla penberg/scylla --smp 1 --memory 1G --overprovisioned 1

Needed by folks who want to run Scylla on Docker in production.

Cc: Sasha Levin <alexander.levin@verizon.com>
Message-Id: <1470680445-25731-1-git-send-email-penberg@scylladb.com>
2016-08-10 11:34:08 +03:00
Raphael S. Carvalho
8deb1ca19d tests: add test to check sstables's min and max clustering values
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-08-09 15:54:40 -03:00
Raphael S. Carvalho
ef6ddf2398 sstables: fix tracking of min and max clustering components
Scylla was tracking min and max column names instead. Min and max
clustering components are tracked to optimize reads that use a
clustering filter. For more details:
https://issues.apache.org/jira/browse/CASSANDRA-5514

Also fix potential bug if clustering value is empty.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-08-09 15:01:30 -03:00
Vlad Zolotarov
5deec0e327 tracing::write_complete(): improve a message in case of a logic error
Improve a message if there is a logic error and add logging of such
errors.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:43 +03:00
Vlad Zolotarov
67d537ecb5 tracing: issue a write event if a single session creates a lot of events
Currently write events are issued every time a trace session is closed.
However if a single session creates a lot of events we will start dropping them
after the total amount of pending records bypasses the limit.

This patch will issue a write event before the session end in that case.

Since now new events may be added to the active tracing session while it's
scheduled for write we have to ensure the following:
   - Not to add the already pending for write session to the pending bulk.
   - Grab all pending data in a specific session in a synchronous way during
     the write event.
   - Serialize creation of events mutations - otherwise the "monotonic nanos"
     logic won't work.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:43 +03:00
Vlad Zolotarov
5391bcc5a9 tracing: improve a back pressure policy
Use a per-shard tracing records budget instead
of maintaining a fixed-size per-session records budget and
a per-shard sessions budget.

The original policy could lead to some irrational situations,
when we have a single tracing session that creates a substantial
amount of records that we can handle but we would start dropping
new records after it surpasses the per-session limit.

The new policy handles a per-shard trace records budget that is
being consumed by each trace() call and by a primary session destructor
when a session record is created.

Each active record may only be in one of the following states:
   - cached: stored in its session's object. When record is in this state
             it's not going to be written to I/O during the next write event.
   - pending for write: when record is in this state it's going to be written
             to I/O during the next write event.
   - flushing: the record is being currently written to the I/O.

There are counters of the total amount of records in each state above.

Each record may only be in a specific state at every point of time and
thereby it must be accounted only in one and only one of the three
counters.

The sum of all three counters should not be greater than
(max_pending_trace_records + write_event_records_threshold) at any time
(actually it can get as high as a value above plus (max_pending_sessions)
if all sessions are primary but we won't take this into an account for
simplicity).

The same is about the number of outstanding sessions: it may not be greater
than (max_pending_sessions + write_event_sessions_threshold) at any time.

If total number of tracing records is greater or equal to the limit
above, the new trace point is going to be dropped.

If current number or records plus the expected number of trace records
per session (exp_trace_events_per_session) is greater than the limit
above new sessions will be dropped. A new session will also be dropped if
there are too many active sessions.

When the record or a session is dropped the appropriate statistics
counters are updated and there is a rate-limited warning message printed
to the log.

Every time a number of records pending for write is greater or equal to
(write_event_records_threshold) or a number of sessions pending for
write is greater or equal to (write_event_sessions_threshold) a write
event is issued.

Every 2 seconds a timer would write all pending for write records
available so far.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:43 +03:00
Vlad Zolotarov
d8fe5317d1 tracing::trace_keyspace_helper: make events' mutations applying loop interruptible
When building events' mutation don't apply them in a tight loop
but rather apply each of them in a separate continuation to allow
reactor to interrupt this loop if it takes too long for it to
complete (e.g. where there are a lot of mutations to apply).

Since building all events' mutations is asynchronous now we can
no longer keep the "nanos" state in a global trace_keyspace_helper
object but rather have to move it into the per-session
backend_session_state class.

backend_session_state class is a backend-specific implementation of a
tracing::backend_session_state_base class.

An instance of the above object is created by a
tracing::i_tracing_backend_helper::allocate_session_state() virtual
method and is stored in a tracing::one_session_records object.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 19:00:39 +03:00
Vlad Zolotarov
63a0502ed1 tracing: rework the interface between the tracing/trace_state and the backend
Before this patch the interaction between the layers above was as follows:
   - trace_state was passing the trace event data to a backend object every
     time trace() method was called.
   - trace_state was passing the session data to a backend object in a destructor.
   - A backend object was storing this data in a form of lambda where all data
     above was caught in a capture list. This was primarily done in order to
     delay the call for make_xxx_mutation(). Lambdas were stored in a map by a session
     ID and they were executed when a kick() method was called.
   - A tracing::tracing object was periodically calling a kick() method of a
     backend that was initiating a write of all pending data to the storage.

All backend methods used in the described above interactions were virtual.
Thereby, for instance, for each and every trace record we were calling a virtual method that was
receiving a significant amount of parameters, store a lambda in a map and return.
This is clearly a suboptimal way of using virtual functions since we prevent a compiler
from inlining an obviously inlinable operations.

This patch changes the interaction scheme to be as follows:
   - Trace events and session data are stored and passed around in a form of structs
     that hold all relevant information (no more lambdas).
   - As long as a trace session is active its data is aggregated inside the corresponding
     trace_state object.
   - The object containing all records is passed and stored as a lw_shared_ptr to save extra
     copies and to shorten capture lists.
   - All aggregated data is passed to a tracing::tracing object in a trace_state destructor.
     The data is stored in a std::deque in a tracing::tracing object (instead of a map by a session ID).
   - A single backend's virtual method call writes all data aggregated so far (kick()
     method is not needed any more), every time a write event occurs.
   - Backend has only one virtual method now:
      - Write a bulk of sessions' data aggregated so far.
   - Backend's virtual method receives a records bulk object by reference.

As a result:
   - A latency of a single trace event that has no formatting improved from 0.2us to 0.1us.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 15:25:52 +03:00
Vlad Zolotarov
960b423ce0 tracing/tracing.cc: rename a logger object
s/logger/tracing_logger/

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 15:21:47 +03:00
Nadav Har'El
c2e4f5ba16 Avoid some warnings in debug build
The sanitizer of the debug build warns when a "bool" variable is read when
containing a value not 0 or 1. In particular, if a class has an
uninitialized bool field, which class logic allows to only be set later,
then "move"ing such an object will read the uninitialized value and produce
this warning.

This patch fixes four of these warnings seen in sstable_test by initializing
some bool fields to false, even though the code doesn't strictly need this
initialization.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470744318-10230-1-git-send-email-nyh@scylladb.com>
2016-08-09 13:21:45 +01:00
Vlad Zolotarov
e1b2926a8d tracing: add a missing try-catch in params building
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-08-09 15:21:41 +03:00
Duarte Nunes
0ed19ec64d thrift: Set default validator
This patch sets the default validator for dynamic column families.
Doing so has no consequences in terms of behavior, but it causes the
correct type to be shown when describing the column family through
cassandra-cli.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470739773-30497-1-git-send-email-duarte@scylladb.com>
2016-08-09 13:55:07 +02:00
Avi Kivity
da4d33802e Merge "Add configuration file to scylla-housekeeping" from Amnon
"The series adds an optional configuration file to the scylla-housekeeping. The
file act as a way to prevent the scylla-housekeeping to run. A missing
configuration file, will make the scylla-housekeeping immediately.

The series adds a flag to the build_rpm that differentiate between public
distributions that would contain the configuration file and private
distributions that will not contain it which will cause the setup script to
create it."
2016-08-09 14:52:19 +03:00
Nadav Har'El
e005762271 sstable: avoid copying non-existant value
The promoted-index reading code contained a bug where it copied the value
of an disengaged optional (this non-value was never used, but it was still
copied ). Fix it by keeping the optional<> as such longer.

This bug caused tests/sstable_test in the debug build to crash (the release
build somehow worked).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470742418-8813-1-git-send-email-nyh@scylladb.com>
2016-08-09 14:35:18 +03:00
Duarte Nunes
f63886b32e thrift: Send empty col metadata when describing ks
This patch ensures we always send the column metadata, even when the
column family is dynamic and the metadata is empty, as some clients
like cassandra-cli always assume its presence.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470740971-31169-1-git-send-email-duarte@scylladb.com>
2016-08-09 14:33:18 +03:00
Pekka Enberg
9ff242d339 cql3: Filter compaction strategy class from compaction options
Cassandra 2.x does not store the compaction strategy class in compaction
options so neither should we to avoid confusing the drivers.

Fixes #1538.
Message-Id: <1470722615-29106-1-git-send-email-penberg@scylladb.com>
2016-08-09 10:38:37 +02:00
Pekka Enberg
c23acbe5e6 Update scylla-ami submodule
* dist/ami/files/scylla-ami 2e599a3...14c1666 (1):
  > setup coredump on first startup
2016-08-09 11:09:24 +03:00
Nadav Har'El
bce020efbd Fix failing tests
Commit 0d8463aba5 broke some of the tests with an assertion
failure about local_is_initialized(). It turns out that there is more than
one level of local_is_initialized() we need to check... For some tests,
neither locals were initialized, but for others, one was and the other
wasn't, and the wrong one was tested.

With this patch, all unit tests except "flush_queue_test.cc" pass on my
machine. I doubt this test is relevant to the promoted index patches,
but I'll continue to investigate it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1470695199-32649-1-git-send-email-nyh@scylladb.com>
2016-08-09 10:51:40 +03:00
Calle Wilund
1593f4254b flush_queue_test: Start semaphore in propagation tests not initialized
Somehow, all my local runs timed ok anyway, but obviously not on all
machines.
Message-Id: <1470727968-1759-1-git-send-email-calle@scylladb.com>
2016-08-09 09:35:28 +02:00
Asias He
d8bff4f745 gossip: Fix debug log in wait_for_gossip_to_settle
There is an extra '{}' in the logger format string.

Fixes:

gossip - Gossip looks settled. 8 gossip round completed: ???

Message-Id: <1470278008-29914-2-git-send-email-asias@scylladb.com>
2016-08-08 16:38:21 +03:00
Takuya ASADA
60ce16cd54 dist/common/scripts: mkdir -p /var/lib/scylla/coredump before symlinking
We are creating this dir in scylla_raid_setup, but user may create XFS volume w/o using the command, scylla_coredump_setup should work on such condition.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470638615-17262-1-git-send-email-syuu@scylladb.com>
2016-08-08 16:02:55 +03:00
Pekka Enberg
3b31d500c8 dist/docker: Document data volume and cpuset configuration
Message-Id: <1470649675-5648-1-git-send-email-penberg@scylladb.com>
2016-08-08 15:50:30 +03:00
Pekka Enberg
4372da426c dist/docker: Add '--broadcast-rpc-address' command line option
We already have a '--broadcat-address' command line option so let's add
the same thing for RPC broadcast address configuration.

Message-Id: <1470656449-11038-1-git-send-email-penberg@scylladb.com>
2016-08-08 15:46:42 +03:00
Avi Kivity
98226a14ac Merge "Exception propagation writers in commitlog batch"
"
While periodic mode is a all-bets-off crap-shoot as far as knowing if
data actually reached disk or not, batch mode is supposed to be
somewhat more reliable/deterministic.

Thus, if we get an exception writing/flushing the current buffer,
we should propagate exceptions to all execution paths involved
in this buffer.

Flush queue can now (optionally) propagate exceptions to all clients, and
commit log uses this to ensure that commit log writers in batch mode
all generate exceptions on disk errors.

Also includes some rudimentary tests for flush queue mechanisms.

Note: other main user, sstable flushing, is not affected, as default
mode is still to keep exceptions to individual worker continuations,
not waiters."
2016-08-08 15:33:26 +03:00
Avi Kivity
700feda0db Merge "promoted index for reading partial partitions" from Nadav
"The goal of this patch series is to support reading and writing of a
"promoted index" - the Cassandra 2.* SSTable feature which allows reading
only a part of the partition without needing to read an entire partition
when it is very long. To make a long story short, a "promoted index" is
a sample of each partition's column names, written to the SSTable Index
file with that partition's entry. See a longer explanation of the index
file format, and the promoted index, here:

     https://github.com/scylladb/scylla/wiki/SSTables-Index-File

There are two main features in this series - first enabling reading of
parts of partitions (using the promoted index stored in an sstable),
and then enable writing promoted indexes to new sstables. These two
features are broken up into smaller stand-alone pieces to facilitate the
review.

Three features are still missing from this series and are planned to be
developed later:

1. When we fail to parse a partition's promoted index, we silently fall back
   to reading the entire partition. We should log (with rate limiting) and
   count these errors, to help in debugging sstable problems.

2. The current code only uses the promoted index when looking for a single
   contiguous clustering-key range. If the ck range is non-contiguous, we
   fall back to reading the entire partition. We should use the promoted
   index in that case too.

3. The current code only uses the promoted index when reading a single
   partition, via sstable::read_row(). When scanning through all or a
   range of partitions (read_rows() or read_range_rows()), we do not yet
   use the promoted index; We read contiguously from data file (we do not
   even read from the index file, so unsurprisingly we can't use it)."
2016-08-07 17:53:17 +03:00
Avi Kivity
bbaebb39a8 Update scylla-ami submodule
* dist/ami/files/scylla-ami 863cc45...2e599a3 (1):
  > Do not set developer-mode on unsupported instance types
2016-08-07 17:51:24 +03:00
Nadav Har'El
fc063ae62d tests: add test for promoted index writing
In this unit test, we create using Scylla C++ code, the same large
partition with 13520 CQL rows as we previously imported from Cassandra
for the large partition test. We then verify that the sstable index file
we just wrote is byte-for-byte identical to the one previously created by
Cassandra. They should indeed be identical, because the data file has the
same layout (even if timestamps are different) and our default promoted-
index block size is the same (64K) so the sample of columns should be
identical.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:15 +03:00
Nadav Har'El
0d8463aba5 sstables: promoted index write support
This patch adds writing of promoted index to sstables.

The promoted index is basically a sample of columns and their positions
for large partitions: The promoted index appears in the sstable's index
file for partitions which are larger than 64 KB, and divides the partition
to 64 KB blocks (as in Cassandra, this interval is configurable through
the column_index_size_in_kb config parameter). Beyond modifying the index
file, having a promoted index may also modify the data file: Since each
of blocks may be read independently, we need to add in the beginning of
each block the list of range tombstones that are still open at that
position.

See also https://github.com/scylladb/scylla/wiki/SSTables-Index-File

Fixes #959

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:12 +03:00
Nadav Har'El
09ede5f333 range tombstone accumulator: add method
Add to the range_tombstone_accumulator a range_tombstones_for_row(ck)
method.

Just like the existing tombstone_for_row(ck), this function drops from
the accumulator tombstones that end before ck. But while the existing
function returned just a single tombstone affecting the given row (the
most recent tombstone), the new function range_tombstones_for_row(ck)
returns all the accumulated range tombstones which cover ck.

This function will be useful for the promoted-index writing code later,
which divides a partition into blocks which may be read independently,
so each block needs to start with a repeat of the earlier tombstones
which still cover the first row in the new block.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:10 +03:00
Nadav Har'El
022a69caea sstables: promoted index read support
This patch adds support more efficiently reading small parts of a large
partition, without reading the entire partition as we had to do so far.
This is done using the "promoted index".

The "promoted index" is stored in the sstable index file, and provides
for each large sstable row ("partition" in CQL nomenclature) a sample of
the column names at (for example) 64KB intervals. This means that when we
read a slice of columns (e.g., cql rows), or page through a large partition,
we do not have to read the entire partition from disk.

This patch only implements the read side of promoted index - a later patch
will add the write-side support (i.e., writing the promoted index to the
index file while saving the sstable). Nevertheless this patch can already
be tested by reading existing sstables from Cassandra which include a
promoted index - such as the one included in the test in the previous patch.

The use of the promoted index currently has two limitations:

1. It is only used when reading a single partition with sstable::read_row(),
not when scanning through many partitions with sstable::read_range_rows()
or sstable::read_rows().

2. It is only used when filtering a single clustering-key range, rather
than a list of disjoint ranges. A single range is the common case.

These two issues will be improved later. In the meantime, in those
unsupported cases we simply continue to read entire partitions, so we're not
worse-off than before.

Also note that this patch only helps when sstable::read_row() is used with
a clustering-key prefix (i.e., a slice). Our higher-level request handling
code may decide to read an entire partition into the cache, and not use
a clustering-key prefix at all when reading. We will need to indepdently
improve the high-level code to use read_row()'s slicing capabilities
when paging through large partitions, for example.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:09 +03:00
Nadav Har'El
6cdf5684f5 sstables: introduce find_disk_ranges()
Our sstable reading code is currently hard-coded to read entire partitions,
even if we know that only a subset of the columns are requested.
This patch introduces find_disk_ranges(), a function to find the ranges of
bytes we need to read from the sstable data file to guarantee that the
desired columns from the desired partition are read.

The returned range may be the entire byte range of the given partition -
as found using the summary and index files - but if the index contains a
"promoted index" (basically a sample of column positions for each key)
we may return a smaller range. The "disk_read_range" type introduced in
the previous patch is extended here to support reading a partial partition -
by including additional information which would be missed when reading only
part of a partition (viz., the partition key and the partition's tombstone).

This function isn't used in this patch - we will wire its use in the next
patch, which will complete the read-side support for the promoted index.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:08 +03:00
Nadav Har'El
1d38a69e49 sstables: expose promoted index in index entry
Our index_entry type, holding one partition's entry that we read from the
index file, already contained the "_promoted_index" which we read from
disk - as an unparsed byte buffer. But there wasn't any API to access
this buffer after it was read. This patch adds a trivial getter, to get
a read-only view of this buffer.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:06 +03:00
Nadav Har'El
4e5a09538d make column-name parsing code public
The "struct column" code in partition.cc is generally useful code for
parsing serialized column names from the sstable. It is currently private
inside the "mp_row_consumer" class. But in a next patch we'll also want
to use it in the "sstable" class, for the promoted-index parsing code,
which among other things also needs to deserialize column names.

The trivial fix, in this patch, is to make this code "public". However,
for now it is still available only in partition.cc.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:04 +03:00
Nadav Har'El
1975abbfd6 sstables: disk_read_range
Currently, the main sstable data parsing entry point data_consume_rows()
takes a contiguous range of bytes to read from disk and parse. This range
is supposed to be an entire partition or contiguous group of partitions.
and is self contained (can be parsed without extra information about the
identity of these partitions).

For the promoted index feature (which we will add in a following patch)
we will want the range to span only a part of a partition, and will need
the caller to provide some information not available to the parser (such
as the partition's key). In the future, we will also want to support a
vector of byte ranges, instead of just one.

So in preparation for this, this patch simply replaces the start/end pair
by a new class disk_read_range, which can be easily extended in later
patches. No new functionality is introduced in this patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:02 +03:00
Nadav Har'El
6fed716bd3 sstables: STOP_THEN_ATOM_START parser state
In a later patch adding "promoted index" read support, we would like to
parse only part of an sstable row. In that case, the parser should start
not at the usual ROW_START state, but rather at the ATOM_START state.

But there's a problem: The sstable parser consumer currently assumes that
the parser stops after the start of the row, before reading any atoms.
So in the partial row case too, we must stop parsing before reading the
first atom.

For this, this patch adds the new "STOP_THEN_ATOM_START" parser state.
When starting in this state, the parser stops immediately (with
row_consumer::proceed::no), and when restarted again it will be in the
ATOM_START case.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:47:01 +03:00
Nadav Har'El
3dd079fb7a tests: add test for reading parts of a large partition
This patch adds a test that takes an sstable with one partition of 13,520
clustering rows (spanning 700 KB in the data file), and attempts to read
various slices CQL rows, counting that we got back the expected number
of rows.

The sstable included here was generated by Cassandra, and includes a
promoted index. Promoted index reading is not supported yet (we will
add it in the next patch), so for now the code will always read the
entire partition from disk; But still the clustering-key filtering is
already functional, and will drop some of the rows as requested,
so this test will pass.

Later, when we add promoted index support, we should check that this test
still passes - promoted index will make the reads in this test more
efficient (which the test cannot verify), but the important thing to check
is that it doesn't break any of these tests.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2016-08-07 17:46:59 +03:00
Takuya ASADA
3d45d6579b dist/ami/files: add a message for privacy policy agreement on login prompt
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1470212054-351-1-git-send-email-syuu@scylladb.com>
2016-08-07 17:40:25 +03:00
Amnon Heiman
beba3a31cb scylla-housekeeping.service: Add a configuration file
Adding the configuration file is a way to make the running
scylla-housekeeping service not run the check version.

If the file does not exists, it will be set by the setup script.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-08-07 16:24:43 +03:00
Amnon Heiman
9c4bf651ae install housekeeping.conf based on dist flag
The new dist flag difrentiate between public distribution and private
compilations.

For public distributation the housekeeping configuration file will be
installed.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-08-07 16:24:42 +03:00
Amnon Heiman
060c3029b0 scylla_setup: create the scylla-housekeeping conf file if missing
When running the scylla_setup, the script would check that the
housekeeping configuration file: housekepping.conf exists.
If not, it would ask the user if to run the check version option or not
and create the file accordingly.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-08-07 16:24:23 +03:00
Amnon Heiman
63582dd043 Add a config file for housekeeping
The housekeeping.conf is a configuration file for scylla-housekeeping.
By default it will be included in the rpm and state that the
check-version would be run.

If the file is missing, or if check-version is set to false, the check
version operation will not be performed.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-08-07 13:14:59 +03:00
Amnon Heiman
93737a601f scylla-housekeeping: An optional config file
This patch adds an optional config file that can be passed from the
command line.

If the config file is specified and does not exist, the script will
terminate.

The only parameter that is currently available is check-version that can
be either true or false.

The ConfigParser module is used to read the config file, that should be
on the form of:
[housekeeping]
check-version: True

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-08-07 13:14:59 +03:00
Duarte Nunes
e0a43a82c6 system_keyspace: Correctly deal with wrapped ranges
This patch ensures we correctly deal with ranges that wrap around when
querying the size_estimates system table.

Ref #693

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470412433-7767-1-git-send-email-duarte@scylladb.com>
2016-08-05 19:17:00 +03:00
Avi Kivity
b0a275945f Merge "Remove compact columns" from Duarte
"The compact column is a dense schema's single regular column. Its
existence has been a source of bugs, so this patchset removes the
column_kind::compact_column, as well as further references to compact
columns from the code base.

Fixes #1542"
2016-08-05 12:39:23 +03:00