Commit Graph

12013 Commits

Author SHA1 Message Date
Paweł Dziepak
3ecceaee48 Merge "Fix fast_forward_to() on sstable reader being ignored in some cases" from Tomasz
"When mutation reader enters the partition using index,
streamed_mutation object is returned to the user before the row start
fragment is processed. In that case, when we process the row start, we
should ignore it and not call setup_for_partition() again. That may
override user's fast_forward_to() request."

* 'tgrabiec/fix-initial-fast-forward-to-for-single-key-sstable-readers' of github.com:scylladb/seastar-dev:
  tests: mutation_source_test: Test forwarding in single-key readers
  sstables: Remove unused code
  sstables: mutation_reader: Fix setup_for_partition() being called twice in some cases
  sstables: Fix verify_end_state() to tolerate ATOM_START_2 state
2017-05-17 15:35:30 +01:00
Avi Kivity
eb69fe78a4 Merge "Adding private repository to housekeeping" from Amnon
"This series adds private repository support to scylla-housekeeping"

* 'amnon/housekeeping_private_repo_v3' of github.com:cloudius-systems/seastar-dev:
  scylla-housekeeping service: Support private repositories
  scylla-housekeeping-upstart: Use repository id, when checking for version
  scylla-housekeeping: support private repositories
2017-05-17 15:56:46 +03:00
Tomasz Grabiec
84648f73ef Merge "Fix performance problems with high shard counts tag" from Avi
From http://github.com/avikivity/scylla exponential-sharder/v3.

The sharder, which takes a range of tokens and splits it among shards, is
slow with large shard count and the default
murmur3_partitioner_ignore_msb_bits.

This patchset fixes excessive iteration in sstable sharding metadata writer and
nonsingular range scans.

Without this patchset, sealing a memtable takes > 60 ms on a 48-shard
system.  With the patchset, it drops below the latency tracker threshold I
used (5 ms).
2017-05-17 14:03:33 +02:00
Avi Kivity
68034604e1 dht: murmur3_partitioner: simplify moving to and from the zero-based token range 2017-05-17 13:50:30 +03:00
Avi Kivity
1a99ebaa65 storage_proxy: switch to the exponential sharder for nonsingular queries
Nonsingular queries used exponential expansion of the token space to
avoid spending too much cpu time on near-empty tables, but the generation
of the search space was itself exponential.  Switch to the exponential sharder
which has linear cost.
2017-05-17 13:50:30 +03:00
Avi Kivity
00f48f96cb sstables: select just the shard we want when writing sharding metadata
On a system with many shards, this saves many useless iterations where
we just skip the unwanted shard.
2017-05-17 13:50:30 +03:00
Avi Kivity
44a1a51987 tests: add tests for dht::split_range_to_single_shard() 2017-05-17 13:50:30 +03:00
Avi Kivity
76f12a8842 dht: add split_range_to_single_shard()
Intersects a shard's owning range with a ring position range, and returns
the sorted result.
2017-05-17 13:50:27 +03:00
Tomasz Grabiec
1da3daa4f4 range: Use more standard notation for singular range
Reuse notation for a single-element set.

Message-Id: <1494923827-10097-1-git-send-email-tgrabiec@scylladb.com>
2017-05-17 13:28:42 +03:00
Avi Kivity
a65e8bd215 dht: add a ring-position-range-vector variant of the exponential sharder
The "exponentiality" is not carried over from one range to another, because
we expect one or two ranges (two ranges result from a wrapped around thrift
token range).
2017-05-17 13:18:52 +03:00
Avi Kivity
6eb6f12909 tests: add test for ring_position_exponential_sharder 2017-05-17 13:18:52 +03:00
Avi Kivity
f671ac13b4 dht: add an exponential ring_position range sharder
Like the regular sharder, the exponential sharder divides a range into
subranges owned by individual shards.  Unlike the regular sharder, it
generates ever-increasing subranges, spanning more and more shards, and
eventually returns several subranges per shard.  To avoid using
exponential cpu and memory, subranges belonging to a single shard are merged,
and a flag is set to indicate the subranges are not ordered wrt. each other.
2017-05-17 13:18:49 +03:00
Avi Kivity
025c6b45b2 dht: extend i_partitioner::next_token_for_shard()
Right now, next_token_for_shard() only allows iterating linearly in shard
order.  Add the ability to select a specific shard to skip to (in case we're
only interested in a single shard), and to select larger ranges (so that
exponential increases are not implemented by iteration).
2017-05-17 12:30:03 +03:00
Avi Kivity
7156ea8804 dht: make ring_position_range_sharder more independent of global_partitioner
Useful for testing.
2017-05-17 12:30:03 +03:00
Avi Kivity
302fec8293 dht: make i_partitioner::name() const 2017-05-17 12:30:03 +03:00
Avi Kivity
f462c4327e dht: make i_partitioner keep track of the number of shards it was configured with
Useful for testing classes layered on top of the partitioner (the sharders).
2017-05-17 12:30:03 +03:00
Avi Kivity
04b16ae8ec dht: fix partitioner initialization for tests
The partitioners now depend on smp::count to be initialized correctly,
but smp::count isn't available at static initialization time.

The scylla executable isn't affected because it calls set_global_partitioner()
after smp::count has been initialized.

Fix by deferring initialization to the first global_partitioner() call.
2017-05-17 12:30:03 +03:00
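The deferred initialization described in this commit can be illustrated with a minimal sketch. It is not the code from the patch: `shard_count` stands in for `smp::count` and `partitioner` for the real partitioner class, and both are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for smp::count: its final value is not yet known
// while static objects are being initialized.
static unsigned shard_count = 0;

// Hypothetical stand-in for the real partitioner class.
struct partitioner {
    unsigned shards;
    explicit partitioner(unsigned s) : shards(s) {}
};

// The instance is built on the first call rather than at static
// initialization time, so the shard count is read only after it was set.
// This sketch is single-threaded; it does no locking.
inline partitioner& global_partitioner() {
    static std::unique_ptr<partitioner> instance;
    if (!instance) {
        assert(shard_count > 0);  // must be configured before first use
        instance = std::make_unique<partitioner>(shard_count);
    }
    return *instance;
}
```

A test would set `shard_count` first and only then call `global_partitioner()`, which is the ordering the commit message says the scylla executable already follows.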
Avi Kivity
1c6cecd9d0 utils: introduce div_ceil()
Divides integrals but rounds up rather than down.
2017-05-17 12:30:03 +03:00
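A minimal sketch of such a helper, assuming non-negative operands and a positive divisor. The signature used in the actual patch may differ.

```cpp
#include <cassert>
#include <type_traits>

// Illustrative sketch: integer division that rounds up instead of down.
// Only meaningful for non-negative dividends and positive divisors.
template <typename T>
constexpr T div_ceil(T dividend, T divisor) {
    static_assert(std::is_integral<T>::value, "div_ceil() expects an integral type");
    assert(divisor > 0);
    // Same result as (dividend + divisor - 1) / divisor, but it cannot
    // overflow when dividend is close to the maximum value of T.
    return dividend / divisor + (dividend % divisor != 0 ? 1 : 0);
}

static_assert(div_ceil(7, 2) == 4, "a remainder rounds the quotient up");
static_assert(div_ceil(8, 2) == 4, "an exact division is unchanged");
```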
Avi Kivity
f1dbb951da Merge "Materialized views: implement read before write" from Duarte
"This patch ensures we read the base table rows that an update
is modifying, in order to correctly calculate the set of
materialized view updates.

The read-before-write is performed on the shard applying the
update and attempts to do a precise read of the rows being modified,
which can be more than one in case of ranged deletions or a
batch update."

* 'materialized-views/read-existing/v2' of https://github.com/duarten/scylla:
  database: Read existing base mutations
  db/view: Calculate clustering ranges for MV read-before-write query
  db/view: Replace entry if cells don't match
  view_info: Store base regular col in the view's PK as column_id
  compound_view_wrapper: Add tri_compare
  bound_view: Build range bound from bound_view
  clustering_bounds_comparator: Enable Range concept
  range: Add lvalue version of transform()
  tests: Add test case for nonwrapping_range::intersection()
  nonwrapping_range: Add intersection() function
2017-05-17 12:26:26 +03:00
Duarte Nunes
983af595e9 database: Read existing base mutations
When generating updates for a materialized view we need to read the
existing base row, to be able to determine the primary key of the view
row the new base update will supplant, in case the view includes a
base non-primary key column in its own primary key. That old view row
will be tombstoned or updated, if it exists, depending on the difference
between the new base row and the existing one, if any.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:19 +02:00
Duarte Nunes
8a77bfe35b db/view: Calculate clustering ranges for MV read-before-write query
Introduce the calculate_affected_clustering_ranges() function to
calculate the smallest subset of affected clustering ranges that we
need to query for.

The update_requires_read_before_write() function checks whether
a view is potentially affected by the base update.

The patch also cleans up the may_be_affected_by() function.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:19 +02:00
Duarte Nunes
ec681060a8 db/view: Replace entry if cells don't match
If a base table regular column is part of the view's pk, and if that
column changes, we should replace the entry, by deleting the row(s)
with the old value and inserting a new one.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:19 +02:00
Duarte Nunes
f41a5e554d view_info: Store base regular col in the view's PK as column_id
This patch stores the base_non_pk_column_in_view column as column_id,
which is more convenient, and it also stores a two-level optional to
encode both lazy initialization and the absence of such a column.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
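The two-level optional mentioned above can be sketched as follows. This is an illustration only: `view_info_sketch` and the `column_id` alias are hypothetical, and `std::optional` (C++17) is assumed here, which may not be the optional type the patch uses.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using column_id = std::uint32_t;  // hypothetical alias for illustration

class view_info_sketch {
    // Outer optional: empty until the value has been computed (lazy init).
    // Inner optional: empty when the view has no such base column.
    std::optional<std::optional<column_id>> _base_non_pk_column;
public:
    bool is_initialized() const { return _base_non_pk_column.has_value(); }

    // Records the result of the lookup, including a "no such column" result.
    void initialize(std::optional<column_id> id) { _base_non_pk_column.emplace(id); }

    // Precondition: initialize() has been called.
    std::optional<column_id> base_non_pk_column() const {
        assert(is_initialized());
        return *_base_non_pk_column;
    }
};
```

The point of the nesting is that "not computed yet" and "computed, and there is no such column" are different states, which a single optional cannot distinguish.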
Duarte Nunes
257eaa0d05 compound_view_wrapper: Add tri_compare
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
Duarte Nunes
06a6679826 bound_view: Build range bound from bound_view
We introduce the bound_view::to_range_bound() function, which builds a
wrapping_range or nonwrapping_range bound from a bound_view.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
Duarte Nunes
8288e504fb clustering_bounds_comparator: Enable Range concept
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
Duarte Nunes
fb1e966137 range: Add lvalue version of transform()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
Duarte Nunes
f365b7f1f7 tests: Add test case for nonwrapping_range::intersection()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
Duarte Nunes
1f9359efba nonwrapping_range: Add intersection() function
intersection() returns an optional range with the intersection of
this range and the other, specified range.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-05-17 10:33:18 +02:00
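The idea of an intersection that returns an optional range can be sketched with a deliberately simplified type. `closed_range` is hypothetical: it models only closed integer intervals, whereas the real nonwrapping_range presumably also handles open and unbounded ends.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical simplified range: the closed interval [start, end] over int.
struct closed_range {
    int start;
    int end;

    // Returns the overlap of this range and `other`, or an empty optional
    // when the two ranges are disjoint.
    std::optional<closed_range> intersection(const closed_range& other) const {
        assert(start <= end && other.start <= other.end);
        const int lo = std::max(start, other.start);
        const int hi = std::min(end, other.end);
        if (lo > hi) {
            return std::nullopt;
        }
        return closed_range{lo, hi};
    }
};
```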
Avi Kivity
f5dae826ce Merge "Migrate schema tables to v3 format" from Calle
"Defines origin v3-format for system/schema tables, and use them for
schema storage/retrieval.

Includes a legacy_schema_migrator implementation/port from origin. Note
that since we don't support features like triggers, functions and
aggregates, it will bail if encountering such a feature used.

Note also that this patch set does not convert the "hints" and
"backlog" tables, even though these have changed in v3 as well.
That will be a separate patch set.

Tested against dtests. Note that patches for dtest + ccm
will follow."

* 'calle/systemtables' of github.com:cloudius-systems/seastar-dev: (36 commits)
  legacy_schema_migrator: Actually truncate legacy schema tables on finish
  database: Extract "remove" from "drop_columnfamily"
  v3 schema test fixes
  thrift: Update CQL mapping of static CFs
  schema_tables: Use v3 schema tables and formats
  type_parser: Origin expects empty string -> bytes_type
  cf_prop_defs: Add crc_check_chance as recognized (even if we don't use)
  types_test: v3 style schemas enforce explicit "frozen" in tuples/ut:s
  cql3_type: v3 to_string
  cql_types: Introduce cql3_type::empty and associate with empty data_type
  schema: rename column accessors to be in line with origin
  schema: Add "is_static_compact_table"
  schema_builder: Add helper to generate unique column names akin origin
  schema: Add utility functions for static columns
  schema: Use heterogeneous comparator for columns bounds
  cql3_type_parser: Resolve from cql3 names/expressions
  cql3_type: Add "prepare_interal" and "references_user_type"
  cql3::cql3_type: Add prepare_internal path using only "local" holders
  cql3_type: Add virtual destructor.
  database/main: encapsulate system CF dir touching
  ...
2017-05-17 11:25:52 +03:00
Asias He
0abfe39d8f database: Log compaction strategy setting on shard 0 only
The compaction strategy is per node not per shard. Do not duplicate the
same log on all shards.

Message-Id: <1494835519.git.asias@scylladb.com>
2017-05-17 11:17:41 +03:00
Avi Kivity
f09f056515 Merge seastar upstream
* seastar 4a3118c...45b718b (7):
  > tests: make connect_test use a random port
  > log: Introduce log.info0
  > configure.py: link to DPDK PMD drivers which are already built on build/dpdk and enabled by default on DPDK config
  > Update fmt submodule
  > perftune: fix perftune.py IndexError when NIC uses less IRQs than requested.
  > build: Add more required build dependencies to the Dockerfile
  > Prometheus: Reserve in protobuf object before iterating
2017-05-17 11:16:58 +03:00
Raphael S. Carvalho
a58699cc92 sstables: kill sstable::mark_for_deletion_on_disk
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170515233021.21223-1-raphaelsc@scylladb.com>
2017-05-17 11:15:59 +03:00
Raphael S. Carvalho
deabf06d49 lcs: log invariant restoration
It will be useful for understanding the strategy behavior after
invariant is possibly broken by resharding.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170515234925.22793-1-raphaelsc@scylladb.com>
2017-05-17 11:15:41 +03:00
Avi Kivity
2eef7cd395 Merge "compress the tracing session ID when compression is requested" from Vlad
"Tested with:
   - test.py --mode release
   - debug/test-serialization
   - c-s with both debug and release compiled scylla with authentication enabled:
     cassandra-stress write  n=10000 no-warmup -rate threads=10 -mode native unprepared cql3 user='cassandra' password='cassandra'"

* 'compress_tracing_session_id-v6' of github.com:cloudius-systems/seastar-dev:
  cql_server::response: rework the tracing session ID insertion
  utils::UUID: align the UUID serialization API with the similar API of other classes in the project
  utils: serialization: unify the variety of serialize_XXX(...)
  cql_server::response: rework the compress(...) method
  cql_server::response: store the frame flags inside the class
2017-05-17 09:48:49 +03:00
Pekka Enberg
374c3d66ab Merge "Fixes for CQL regressions" from Duarte
"This series fixes a set of regressions introduced by
 f7bc88734a, resulting in two failed
 tests:

   testDenseNonCompositeTable(org.apache.cassandra.cql3.validation.operations.CreateTest)

 and

   testStaticColumnsWithDistinct(org.apache.cassandra.cql3.validation.entities.StaticColumnsTest)"

* 'cql-fixes/v1' of github.com:duarten/scylla:
  update_statement: Reject empty values for dense clustering key
  modification_statement: Fix detection of clustering keys
  cql3/restrictions/statement_restrictions: Consider statement type
  cql3/statements/modification_statement: Extract statement_type
2017-05-17 09:29:24 +03:00
Vlad Zolotarov
a0737abdc5 cql_server::response: rework the tracing session ID insertion
Insert the tracing session ID into the response body in the cql_server::response constructor.

Fixes #2356

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-16 15:57:28 -04:00
Vlad Zolotarov
494ea82a88 utils::UUID: align the UUID serialization API with the similar API of other classes in the project
The standard serialization API (e.g. in data_value) includes the following methods:

size_t serialized_size() const;
void serialize(bytes::iterator& it) const;
bytes serialize() const;

Align the utils::UUID API with the pattern above.

The only addition is that the output iterator parameter of the second method above
becomes a template parameter, so that we may serialize into different output destinations.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-16 15:56:03 -04:00
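The three-method pattern with a templated output iterator might look like the sketch below. It is an assumption-laden illustration, not the patch: `uuid_sketch`, the `bytes` alias, the helper `write_big_endian()`, and the big-endian byte order are all hypothetical choices made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;  // hypothetical stand-in for the project's bytes type

// Hypothetical helper: writes v through `out`, most significant byte first.
template <typename OutputIterator>
void write_big_endian(OutputIterator& out, std::uint64_t v) {
    for (int shift = 56; shift >= 0; shift -= 8) {
        *out++ = static_cast<std::uint8_t>(v >> shift);
    }
}

struct uuid_sketch {
    std::uint64_t most_sig_bits;
    std::uint64_t least_sig_bits;

    std::size_t serialized_size() const { return 2 * sizeof(std::uint64_t); }

    // Templated on the iterator type so callers can write into different
    // kinds of destinations; the iterator is advanced past the written bytes.
    template <typename OutputIterator>
    void serialize(OutputIterator& out) const {
        write_big_endian(out, most_sig_bits);
        write_big_endian(out, least_sig_bits);
    }

    // Convenience overload that allocates a buffer of the right size.
    bytes serialize() const {
        bytes b(serialized_size());
        auto it = b.begin();
        serialize(it);
        return b;
    }
};
```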
Vlad Zolotarov
7706775a63 utils: serialization: unify the variety of serialize_XXX(...)
Use the same templated implementation for all different serialize_XXX(...).

The chosen implementation is based on the std::copy_n(char*, size, OutputIterator),
which is heavily optimized and will be using memcpy/memmove where possible.

This patch also removes the unneeded specializations that accept signed integer
values, since we were casting them to unsigned values anyway.

The std::ostream based specializations are also removed since they are not used
anywhere except for a test-serialization.cc and adjusting the ostream to the iterator
is a single-liner.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-16 15:56:03 -04:00
Vlad Zolotarov
a33fe5b775 cql_server::response: rework the compress(...) method
Clean up the compress(...) method interface:
   - Encapsulate the technical details inside the method:
      - Re-write the _body inside the method instead of returning it.
      - Set the response::_flags inside the method.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-16 15:53:35 -04:00
Vlad Zolotarov
c00814383d cql_server::response: store the frame flags inside the class
It makes a lot more sense to keep the flags mask inside the response and update it each time
the corresponding feature is set instead of holding the separate components like tracing state
pointer.

This patch adds this ability to set the flags.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-05-16 14:31:54 -04:00
Takuya ASADA
da55aecca3 dist: add conflict with Cassandra
Cassandra and Scylla cannot be installed on the same instance, so add
cassandra to 'Conflicts'.

Fixes #2157

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1494856314-9322-1-git-send-email-syuu@scylladb.com>
2017-05-16 19:18:27 +03:00
Gleb Natapov
c7ad3b9959 database: remove temporary sstables sequentially
The code that removes each sstable runs in a thread. Removing a lot of
sstables in parallel may start a lot of threads, each of which
takes 128k for its stack. There is not much benefit in running
deletion in parallel anyway, so fix it by deleting sstables sequentially.

Fixes #2384

Message-Id: <20170516103018.GQ3874@scylladb.com>
2017-05-16 15:06:10 +03:00
Tomasz Grabiec
bdf3c536aa tests: mutation_source_test: Test forwarding in single-key readers 2017-05-16 13:36:10 +02:00
Tomasz Grabiec
e07cc44af2 sstables: Remove unused code 2017-05-16 13:31:01 +02:00
Tomasz Grabiec
0e23f8aa9b sstables: mutation_reader: Fix setup_for_partition() being called twice in some cases
When the mutation reader enters the partition using index,
streamed_mutation object is returned to the user before the row start
fragment is processed. In that case, when we process the row start, we
should ignore it and not call setup_for_partition() again. That may
override user's fast_forward_to() request.
2017-05-16 13:31:01 +02:00
Tomasz Grabiec
a1dea3c4fc sstables: Fix verify_end_state() to tolerate ATOM_START_2 state
We would be in that state if consume_row_start() returns proceed::yes
and the stream ends after that. This can happen if slicing using
promoted index determined that there are no cells in the partition in
the range.
2017-05-16 13:31:01 +02:00
Raphael S. Carvalho
706ce5a27b sstables: do not swallow system error exception in read_simple
If error code is different than ENOENT, exception is swallowed.
That can lead to a variety of problems down the road.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170515225309.19185-1-raphaelsc@scylladb.com>
2017-05-16 08:47:34 +02:00
Alexys Jacob
9ddc05899d Fix scylla-housekeeping version detection to work with newer setuptools
Newer setuptools parse_version() doesn't like dashed version strings,
so we should trim them to avoid false negative version_compare() checks.

Signed-off-by: Alexys Jacob <ultrabug@gentoo.org>
Message-Id: <20170511162646.22129-1-ultrabug@gentoo.org>
2017-05-15 12:41:49 +03:00
Gleb Natapov
385645e8df storage_proxy: Fix mutation logging
Log mutation type only if mutation set is not empty.

Message-Id: <20170510142406.GA30426@scylladb.com>
2017-05-11 15:49:52 +01:00