Commit Graph

4972 Commits

Author SHA1 Message Date
Piotr Sarna
dbe8491655 view: cache is_index for view pointer
It's detrimental to keep querying index manager whether a view
is backing a secondary index every time, so this value is cached
at construct time.
At the same time, this value is not simply passed to view_info
when being created in secondary index manager, in order to
decouple materialized view logic from secondary indexes as much as
possible (the sole existence of is_index() is bad enough).
2019-02-20 12:52:32 +01:00
Nadav Har'El
05db7d8957 Materialized views: name the "batch_memory_max" constant
Give the constant 1024*1024 introduced in an earlier commit a name,
"batch_memory_max", and move it from view.cc to view_builder.hh.
It now resides next to the pre-existing constant that controlled how
many rows were read in each build step, "batch_size".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190217100222.15673-1-nyh@scylladb.com>
2019-02-17 13:28:16 +00:00
Rafael Ávila de Espíndola
9cd14f2602 Don't write to system.large_partition during shutdown
The included testcase used to crash because during database::stop() we
would try to update system.large_partition.

There doesn't seem to be an order we can stop the existing services in
cql_test_env that makes this possible.

This patch then adds another step when shutting down a database: first
stop updating system.large_partition.

This means that during shutdown any memtable flush, compaction or
sstable deletion will not be reflected in system.large_partition. This
is hopefully not too bad since the data in the table is TTLed.

This seems to impact only tests, since main.cc calls _exit directly.

Tests: unit (release,debug)

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190213194851.117692-1-espindola@scylladb.com>
2019-02-15 10:49:10 +01:00
Gleb Natapov
0b84b04f97 consistency_level: make it more const correct
Message-Id: <20190214122631.GF19055@scylladb.com>
2019-02-14 14:52:51 +02:00
Nadav Har'El
fec562ec8f Materialized views: limit size of row batching during bulk view building
The bulk materialized-view building processes (when adding a materialized
view to a table with existing data) currently reads the base table in
batches of 128 (view_builder::batch_size) rows. This is clearly better
than reading entire partitions (which may be huge), but still, 128 rows
may grow pretty large when we have rows with large strings or blobs,
and there is no real reason to buffer 128 rows when they are large.

Instead, when the rows we read so far exceed some size threshold (in this
patch, 1MB), we can operate on them immediately instead of waiting for
128.

As a side-effect, this patch also solves another bug: At worst case, all
the base rows of one batch may be written into one output view partition,
in one mutation. But there is a hard limit on the size of one mutation
(commitlog_segment_size_in_mb, by default 32MB), so we cannot allow the
batch size to exceed this limit. By not batching further after 1MB,
we avoid reaching this limit when individual rows do not reach it but
128 of them did.

Fixes #4213.

This patch also includes a unit test reproducing #4213, and demonstrating
that it is now solved.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190214093424.7172-1-nyh@scylladb.com>
2019-02-14 12:04:40 +02:00
Calle Wilund
e70286a849 db/extensions: Allow schema extensions to turn themselves off
Fixes #4222

Iff an extension creation callback returns null (not exception)
we treat this as "I'm not needed" and simply ignore it.

Message-Id: <20190213124311.23238-1-calle@scylladb.com>
2019-02-13 14:50:51 +02:00
Calle Wilund
4e657c0633 system_keyspace: Add waitable for trunc. migration
For tests. Hooray for separation of concern.
2019-02-13 09:08:12 +00:00
Calle Wilund
64e8c6f31d storage_service: Add features disabling for tests 2019-02-13 09:08:12 +00:00
Calle Wilund
12ebcf1ec7 commitlog_replay: Use dedicated table for truncation
Fixes #4083

Instead of sharded collection in system.local, use a
dedicated system table (system.truncated) to store
truncation positions. Makes query/update easier
and easier on the query memory.

The code also migrates any existing truncation
positions on startup and clears the old data.
2019-02-13 09:08:12 +00:00
Calle Wilund
4a52ed7884 commitlog: Accept recycled (not yet re-used) segments in replay
Refs #4085

Changes commitlog descriptor to both accept "Recycled-Commitlog..."
file names, and preserve said name in the descriptor.

This ensures we pick up the not-yet-used recycled segments left
from a crash for replay. The replay in turn will simply ignore
the recycled files, and post actual replay they will be deleted
as needed.

Message-Id: <20190129123311.16050-1-calle@scylladb.com>
2019-02-12 12:23:55 +02:00
Glauber Costa
e0bfd1c40a allow Cassandra SSTables with counters to be imported if they are new enough
Right now Cassandra SSTables with counters cannot be imported into
Scylla.  The reason for that is that Cassandra changed their counter
representation in their 2.1 version and kept transparently supporting
both representations.  We do not support their old representation, nor
there is a sane way to figure out by looking at the data which one is in
use.

For safety, we had made the decision long ago to not import any
tables with counters: if a counter was generated in older Cassandra, we
would misrepresent them.

In this patch, I propose we offer a non-default way to import SSTables
with counters: we can gate it with a flag, and trust that the user knows
what they are doing when flipping it (at their own peril). Cassandra 2.1
is by now pretty old. many users can safely say they've never used
anything older.

While there are tools like sstableloader that can be used to import
those counters, there are often situations in which directly importing
SSTables is either better, faster, or worse: the only option left.  I
argue that having a flag that allow us to import them when we are sure
it is safe is better than having no option at all.

With this patch I was able to successfully import Cassandra tables with
counters that were generated in Cassandra 2.1, reshard and compact their
SSTables, and read the data back to get the same values in Scylla as in
Cassandra.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190210154028.12472-1-glauber@scylladb.com>
2019-02-10 17:50:48 +02:00
Calle Wilund
ba6a8ef35b tls: Use a default prio string disabling TLS1.0 forcing min 128bits
Fixes #4010

Unless user sets this explicitly, we should try explicitly avoid
deprecated protocol versions. While gnutls should do this for
connections initiated thusly, clients such as drivers etc might
use obsolete versions.

Message-Id: <20190107131513.30197-1-calle@scylladb.com>
2019-02-05 15:34:18 +02:00
Avi Kivity
6c71eae63f Merge "API: Stream compaction history records" from Amnon
"
get_compaction_history can return a lot of records which will add up to a
big http reply.

This series makes sure it will not create large allocations when
returning the results.

It adds an api to the query_processor to use paged queries with a
consumer function that returns a future, this way we can use the http
stream after each record.

This implementation will prevent large allocations and stalls.

Fixes #4152
"

* 'amnon/compaction_history_stream_v7' of github.com:scylladb/seastar-dev:
  tests/query_processor_test: add query_with_consumer_test
  system_keyspace, api: stream get_compaction_history
  query_processor: query and for_each_cql_result with future
2019-02-05 14:16:36 +02:00
Amnon Heiman
6c7742d616 system_keyspace, api: stream get_compaction_history
get_compaciton_history can return big chunk of data.

To prevent large memory allocation, the get_compaction_history now read
each compaction_history record and use the http stream to send it.

Fixes #4152

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2019-02-05 11:14:53 +02:00
Piotr Jastrzebski
834bec5cc9 Read shard awareness columns as dropped
Without this new version of Scylla won't be able to
start with system tables inherited after older version
that had shard awareness columns.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <cb62f20fc0c98f532c6f4ad5e08b3794951e85bd.1549289050.git.piotr@scylladb.com>
2019-02-04 18:43:11 +02:00
Calle Wilund
9cadbaa96f commitlog_replayer: Bugfix: finding truncation positions uses local var ref
"uuid" was ref:ed in a continuation. Works 99.9% of the time because
the continuation is not actually delayed (and assuming we begin the
checks with non-truncated (system) cf:s it works).
But if we do delay continuation, the resulting cf map will be
borked.

Fixes #4187.

Message-Id: <20190204141831.3387-1-calle@scylladb.com>
2019-02-04 16:51:13 +02:00
Avi Kivity
468f8c7ee7 Merge "Print a warning if a row is too large" from Rafael
"
This is a first step in fixing #3988.
"

* 'espindola/large-row-warn-only-v4' of https://github.com/espindola/scylla:
  Rename large_partition_handler
  Print a warning if a row is too large
  Remove defaut parameter value
  Rename _threshold_bytes to _partition_threshold_bytes
  keys: add schema-aware printing for clustering_key_prefix
2019-02-03 13:57:42 +02:00
Piotr Jastrzebski
ad217bbdc7 Revert "system_keyspace: add sharding information to local table"
This reverts commit bdce561ada.

Those columns are not used and cause problems with tools.

Refs #4112
Message-Id: <c772ebc0ebc001e5bdf229424c6d51dc58cd5d2e.1548945023.git.piotr@scylladb.com>
2019-01-31 19:06:55 +01:00
Rafael Ávila de Espíndola
625080b414 Rename large_partition_handler
Now that it also handles large rows, rename it to large_data_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:14 -08:00
Rafael Ávila de Espíndola
1185138a34 Print a warning if a row is too large
Tests: unit (release)

Refs #3988.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 15:03:10 -08:00
Rafael Ávila de Espíndola
776d5bb9e2 Remove defaut parameter value
The value is already passed by cql_table_large_partition_handler, so
the default was just for nop_large_partition_handler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:02:01 -08:00
Rafael Ávila de Espíndola
30528fa853 Rename _threshold_bytes to _partition_threshold_bytes
A followup patch will add a threshold for rows.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2019-01-28 13:02:01 -08:00
Duarte Nunes
ea34e242de Merge 'Do not use hints for view building' from Piotr
"
This series prevents view building to fall back to storing hints.
Instead, it will try to send hints to an endpoint as if it has
consistency level ONE, and in case of failure retry the whole
building step. Then, view building will never be marked as finished
prematurely (because of pending hints), which will help avoid
creating inconsistencies when decommissioning a node from the cluster.

Tests:
  unit (release)
  dtest (materialized_views_test.py.*)

Fixes #3857
Fixes #4039
"

* 'do_not_mark_view_as_built_with_hints_7' of https://github.com/psarna/scylla:
  db,view: add updating view_building_paused statistics
  database: add view_building_paused metrics
  table: make populate_views not allow hints
  db,view: add allow_hints parameter to mutate_MV
  storage_proxy: add allow_hints parameter to send_to_endpoint
2019-01-28 10:31:14 +00:00
Piotr Sarna
9a6261ca27 db,view: add updating view_building_paused statistics
Each time view building does is paused because of connection failure,
view_building_paused metrics is bumped.
2019-01-28 09:38:42 +01:00
Piotr Sarna
e30cf22956 db,view: add allow_hints parameter to mutate_MV
Mutating MV function can now accept a parameter whether
hints should be allowed during sending mutations to endpoints.
2019-01-28 09:38:42 +01:00
Piotr Sarna
e0fe9ce2c0 storage_proxy: add allow_hints parameter to send_to_endpoint
With hints allowed, send_to_endpoint will leverage consistency level ANY
to send data. Otherwise, it will use the default - cl::ONE.
2019-01-28 09:38:41 +01:00
Rafael Ávila de Espíndola
5332ebd50c Update the description of compaction_large_partition_warning_threshold_mb
Despite the name, this option also controls if a warning is issued
during memtable writes.

Warning during memtable writes is useful but the option name also
exists in cassandra, so probably the best we can do is update the
description.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190125020821.72815-1-espindola@scylladb.com>
2019-01-28 09:09:35 +02:00
Piotr Jastrzebski
ad016a732b Move set_type_impl out of types.hh to types/set.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
b1e1b66732 Move list_type_impl out of types.hh to types/list.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
147cc031db Move map_type_impl out of types.hh to types/map.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:56:38 +01:00
Piotr Jastrzebski
7666e81b51 Decouple database.hh from types/user.hh
This commit declares shared_ptr<user_types_metadata> in
database.hh were user_types_metadata is an incomplete type so
it requires
"Allow to use shared_ptr with incomplete type other than sstable"
to compile correctly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:55:04 +01:00
Piotr Jastrzebski
e92b4c3dbc Move user_type_impl out of types.hh to types/user.hh
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2019-01-24 09:04:04 +01:00
Rafael Ávila de Espíndola
f7d1dc16d4 database: Use nop_large_partition_handler to avoid self-reporting
Currently nop_large_partition_handler is only used in tests, but it
can also be used avoid self-reporting.

Tests: unit(Release)

I also tested starting scylla with
--compaction-large-partition-warning-threshold-mb=0.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20190123205059.39573-1-espindola@scylladb.com>
2019-01-23 21:11:21 +00:00
Duarte Nunes
88c7c1e851 Merge 'hinted handoff: cache cf mappings' from Vlad
"
Cache cf mappings when breaking in the middle of a segment sending so
that the sender has them the next time it wants to send this segment
for where it left off before.

Also add the "discard" metric so that we can track hints that are being
discarded in the send flow.
"

Fixes #4122

* 'hinted_handoff_cache_cf_mappings-v1' of https://github.com/vladzcloudius/scylla:
  hinted handoff: cache column family mappings for segments that were not sent out in full
  hinted handoff: add a "discarded" metric
2019-01-23 00:44:41 +00:00
Vlad Zolotarov
34829b8f81 hinted handoff: cache column family mappings for segments that were not sent out in full
We will try to send a particular segment later (in 1s) from the place
where we left off if it wasn't sent out in full before. However we may miss
some of column family mappings when we get back to sending this file and
start sending from some entry in the middle of it (where we left off)
if we didn't save column family mappings we cached while reading this segment
from its begining.

This happens because commitlog doesn't save a column family information
in every entry but rather once for each uniq column family (version) per
"cycle" (see commitlog::segment description for more info).

Therefore we have to assume that a particular column family mapping
appears once in the whole segment (worst case). And therefore, when we
decide to resume sending a segment we need to keep the column family
mappings we accumulated so far and drop them only after we are done with
this particular segment (sent it out in full).

Fixes #4122

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-01-22 15:24:22 -05:00
Vlad Zolotarov
4516a8cfc4 hinted handoff: add a "discarded" metric
Account the amount of hints that were discarded in the send path.
This may happen for instance due to a schema change or because a hint
being to old.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2019-01-22 14:11:09 -05:00
Benny Halevy
93270dd8e0 gc_clock: make 64 bit
Fixes: #3353

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Benny Halevy
9878b36895 db: get default_time_to_live as int32_t rather than gc_clock::rep
Otherwise, value_cast<> throws std::bad_cast exception
when gc_clock::rep is defined as int64_t.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-01-22 15:34:32 +02:00
Botond Dénes
4e89dea9ea database: don't allow access to global semaphores
Recently we had a bug (#4096) due to a component
(`multishard_mutation_query()`) assuming that all reads used the
semaphore obtainable via `database::user_read_concurrency_sem()`.
This problem revealed that it is plain wrong to allow access to the
shard-global semaphores residing in the database object. Instead all
code wishing to access the relevant semaphore for some read, should do
so via the relevant `table` object, thus guaranteeing that it will get
the correct semaphore, configured for that table.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <4f3a6780eb3240822db34aba7c1ba0a675a96592.1547734212.git.bdenes@scylladb.com>
2019-01-21 16:29:02 +02:00
Avi Kivity
e0914a080e schema_tables: partially de-template make_map_mutation()
make_map_mutation() is called several times, hopfully with the same Map type
parameter. Replace the Func parameter with a noncopyable_function<>.
2019-01-20 15:55:20 +02:00
Avi Kivity
630f841e5b hints: de-template scan_for_hints_dirs()
This function is called twice, and is not doing anything performance critical,
so replace the template parameter Func with std::function<>.x
2019-01-20 15:55:20 +02:00
Avi Kivity
6e6372e8d2 Revert "Merge "Type-eaese gratuitous templates with functions" from Avi"
This reverts commit 31c6a794e9, reversing
changes made to 4537ec7426. It causes bad_function_calls
in some situations:

INFO  2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708
INFO  2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema)
INFO  2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables
INFO  2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)
2019-01-20 11:32:14 +02:00
Avi Kivity
2407c35cc1 schema_tables: partially de-template make_map_mutation()
make_map_mutation() is called several times, hopfully with the same Map type
parameter. Replace the Func parameter with a noncopyable_function<>.
2019-01-17 18:54:43 +02:00
Avi Kivity
81d004b2c0 hints: de-template scan_for_hints_dirs()
This function is called twice, and is not doing anything performance critical,
so replace the template parameter Func with std::function<>.x
2019-01-17 18:51:46 +02:00
Piotr Sarna
02d88de082 db,view: add consuming units in staging table registration
View update generator service can accept sstables even before it starts,
but it should still acknowledge the number of waiters in the semaphore.

Reported-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <fcaa0f2884ebb4d34d1716e9e1cfed0642b4b85d.1547661048.git.sarna@scylladb.com>
2019-01-16 18:05:17 +00:00
Duarte Nunes
04a14b27e4 Merge 'Add handling staging sstables to /upload dir' from Piotr
"
This series adds generating view updates from sstables added through
/upload directory if their tables have accompanying materialized views.
Said sstables are left in /upload directory until updates are generated
from them and are treated just like staging sstables from /staging dir.
If there are no views for a given tables, sstables are simply moved
from /upload dir to datadir without any changes.

Tests: unit (release)
"

* 'add_handling_staging_sstables_to_upload_dir_5' of https://github.com/psarna/scylla:
  all: rename view_update_from_staging_generator
  distributed_loader: fix indentation
  service: add generating view updates from uploaded sstables
  init: pass view update generator to storage service
  sstables: treat sstables in upload dir as needing view build
  sstables,table: rename is_staging to requires_view_building
  distributed_loader: use proper directory for opening SSTable
  db,view: make throttling optional for view_update_generator
2019-01-15 18:19:27 +00:00
Piotr Sarna
0eb703dc80 all: rename view_update_from_staging_generator
The new name, view_update_generator, is both more concise
and correct, since we now generate from directories
other than "/staging".
2019-01-15 17:31:47 +01:00
Piotr Sarna
beb4836726 db,view: make throttling optional for view_update_generator
Currently registering new view updates is throttled by a semaphore,
which makes sense during stream sessions in order to avoid overloading
the queue. Still, registration also occurs during initialization,
where it makes little sense to wait on a semaphore, since view update
generator might not have started at all yet.
2019-01-15 16:47:01 +01:00
Piotr Sarna
b9203ec4f8 view: wait for stream sessions to finish before view building
During streaming, there's a race between streamed sstables
and view creation, which might result in some tables not being
used to generate view updates, even though they should.
That happens when the decision about view update path for a table
is done before view creation, but after already receiving some sstables
via streaming. These will not be used in view building even though
they should.
Hence, a phaser is used to make the view builder wait for all ongoing
stream sessions for a table to finish before proceeding with build steps.

Refs #4032
2019-01-15 09:36:55 +01:00
Rafael Ávila de Espíndola
26ac2c23ef Change *_row_* names that refer to partitions
This renames some variables and functions to make it clear that they
refer to partitions and not rows.

Old versions of sstablemetadata used to refer to a row histogram, but
current versions now mention a partition histogram instead.

This patch doesn't change the exposed API names.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20181229223311.4184-2-espindola@scylladb.com>
2019-01-09 14:53:42 +02:00