16140 Commits

Author SHA1 Message Date
Piotr Sarna
27590816f0 cql3: add is_all_eq to primary key restrictions
is_all_eq is later needed to decide if restrictions can be used
in an indexed query.
2018-07-18 18:45:08 +02:00
Piotr Sarna
20a349777e cql3: add explicit conversion between key restrictions
Partition and clustering key restrictions sometimes need to be converted
and this commit provides a way to do that.
2018-07-18 18:45:08 +02:00
Piotr Sarna
f1357defd6 cql3: add apply_to() method to single column restriction
This method allows copying single column restriction,
possibly with a new column definition.
2018-07-18 18:44:38 +02:00
Piotr Sarna
30f9924ad5 cql3: make primary key restrictions' values unambiguous
using directive must be used to disambiguate the overridden method.
2018-07-18 13:28:37 +02:00
Avi Kivity
31151cadd4 Merge "row_cache: Fix violation of continuity on concurrent eviction and population" from Tomasz
"
The problem happens under the following circumstances:

  - we have a partially populated partition in cache, with a gap in the middle

  - a read with no clustering restrictions trying to populate that gap

  - eviction of the entry for the lower bound of the gap concurrent with population

The population may incorrectly mark the range before the gap as continuous.
This may result in temporary loss of writes in that clustering range. The
problem heals by clearing cache.

Caught by row_cache_test::test_concurrent_reads_and_eviction, which has been
failing sporadically.

The problem is in ensure_population_lower_bound(), which returns true if
current clustering range covers all rows, which means that the populator has a
right to set continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts since before all
clustering rows. Otherwise, we're populating since _last_row and should
consult it.

Fixes #3608.
"

* 'tgrabiec/fix-violation-of-continuity-on-concurrent-read-and-eviction' of github.com:tgrabiec/scylla:
  row_cache: Fix violation of continuity on concurrent eviction and population
  position_in_partition: Introduce is_before_all_clustered_rows()
2018-07-18 10:11:34 +03:00
Asias He
506eed325a dht: Fix typo in boot_strapper.cc
Eror -> Error

Message-Id: <ab1050c526f6e70c3a365595376acde7706d86e9.1531877929.git.asias@scylladb.com>
2018-07-18 10:00:27 +03:00
Tomasz Grabiec
894961006b Merge "db/view/view_builder: Fixes to bookkeeping" from Duarte
This series contains a couple of fixes to the bookkeeping of the view
build process, which could cause data to be left behind in the system
tables.

* git@github.com:duarten/scylla.git materialized-views/view-build-fixes/v1:

Duarte Nunes (3):
  db/system_keyspace: Add function to remove view build status of a
    shard
  db/view: Don't have shard 0 clear other shard's status on drop
  db/view: Restrict writes to the distributed system keyspace to shard 0
2018-07-17 18:01:28 +02:00
Tomasz Grabiec
25d09e51ac Merge "db/view/build_progress_virtual_reader: Fixes to clustering key adjusts" from Duarte
This series contains a couple of fixes to the adjusting of clustering
keys in the build_progress_virtual_reader, some of which could
potentially cause heap overflows when querying the legacy system table.

* git@github.com:duarten/scylla.git materialized-views/build-progress-virtual-reader-fixes/v1:

Duarte Nunes (3):
  db/view/build_progress_virtual_reader: Use correct schema to adjust ck
  db/view/build_progress_virtual_reader: Fix full ck detection
  db/view/build_progress_virtual_reader: Also adjust end RT bound
2018-07-17 18:00:30 +02:00
Avi Kivity
9ffa6b9ad6 Merge "Fix leaks and corruption of continuity in cache in case of bad_alloc from key linearization" from Tomasz
"
This series fixes two issues related to bad_allocs and keys which require
linearization (larger than 12.8 KiB). With such keys, comparators may throw if
memory allocation fails. This may cause lookups in partition and rows trees to
fail with bad_alloc.

The first issue (#3583) was that partition version merging
(mutation_partition::apply_monotonically()) was not taking into account that
lookups may fail. If we fail, the partition which is being applied may be
incorrectly left with the clustering range from the beginning of the range up
to the current row marked as continuous, if the current row has the continuity
flag set, because we've moved all of the preceding rows into the target, and
the correct lower bound row is no longer there in the source. This may mark
some discontinuous ranges as continuous. Merging is retried by
allocating_section, and there will be no problem if it eventually succeeds,
since the original continuity will be reflected in the sum. The problem will persist if
it doesn't eventually succeed, when we're really out of memory.

The user-perceivable effect of this would be temporary loss of writes in the
clustering range which was marked as continuous but shouldn't have been. Introduced in
2.2-rc1.

The second issue (#3585) is that the code which inserts partitions in memtable
and cache will leak the entry if boost::intrusive::set::insert() throws. This
will also cause SIGSEGV when cache tries to evict from such a leaked entry.
"

* tag 'tgrabiec/fix-bad-continuity-on-oom-in-apply-v2' of github.com:tgrabiec/scylla:
  managed_bytes: Mark read_linearize() as an allocation point
  tests: Relax expectation about continuity after failed merging
  tests: mutation_partition: Verify continuity is consistent on bad_alloc on merging
  tests: Switch to seastar's allocation failure injector
  mutation_partition: Introduce set_continuity()
  clustering_interval_set: Introduce contained_in()
  clustering_interval_set: Introduce add() overload accepting another interval set
  mutation_partition: Fix merging to not leave the source with broader continuity on bad_alloc
  mutation_partition: Preserve continuity in case row merging with no tracker throws
  memtable, cache: Fix exception safety of partition entry insertions
2018-07-17 18:19:37 +03:00
Tomasz Grabiec
477d7b439b row_cache: Fix violation of continuity on concurrent eviction and population
ensure_population_lower_bound() returned true if current clustering
range covers all rows, which means that the populator has a right to
set continuity flag to true on the row it inserts. This is correct
only if the current population range actually starts since before all
clustering rows. Otherwise we're populating since _last_row, and
should consult it.

The fix introduces a new flag, set when starting to populate, which
indicates if we're populating from the beginning of the range or
not. We cannot simply check if _last_row is set in
ensure_population_lower_bound() because _last_row can be set and then
become empty again.

Fixes #3608
2018-07-17 16:43:21 +02:00
Tomasz Grabiec
8d47d21149 position_in_partition: Introduce is_before_all_clustered_rows() 2018-07-17 16:43:21 +02:00
Tomasz Grabiec
612b223819 managed_bytes: Mark read_linearize() as an allocation point 2018-07-17 16:39:43 +02:00
Tomasz Grabiec
be678a81ee tests: Relax expectation about continuity after failed merging
Currently we check that the sum of continuities is exactly the same as
expected on failure. Relax this to require that continuity is not
broader, since in some bad_alloc scenarios, or preemption, we will
have to mark some ranges as discontinuous.
2018-07-17 16:39:43 +02:00
Tomasz Grabiec
f366ac76e8 tests: mutation_partition: Verify continuity is consistent on bad_alloc on merging 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
d9db79a85d tests: Switch to seastar's allocation failure injector
It catches more allocation sites.
2018-07-17 16:30:01 +02:00
Tomasz Grabiec
6b1fe6cbe5 mutation_partition: Introduce set_continuity() 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
ac772cbd81 clustering_interval_set: Introduce contained_in() 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
d24ebe8565 clustering_interval_set: Introduce add() overload accepting another interval set 2018-07-17 16:30:01 +02:00
Tomasz Grabiec
c6c54021a8 mutation_partition: Fix merging to not leave the source with broader continuity on bad_alloc
When clustering keys are larger than 12.8 KiB they may get fragmented
and key comparator will need to linearize them on comparison. This may
cause lookups in the rows tree to fail with bad_alloc. Partition
version merging (mutation_partition::apply_monotonically()) was not
taking this into account. If we fail on lookup, the partition which is
being applied may be incorrectly left with the clustering range from
the beginning up to the current row marked as continuous, if the current
row has the continuity flag set, because we've moved all of the
preceding rows into the target, and the correct lower bound row is no
longer there in the source. This may mark some discontinuous ranges as
continuous.

Merging is retried by allocating_section, and there will be no problem
if it eventually succeeds, since the original continuity will be reflected in the
sum. The problem will persist if it doesn't eventually succeed, when
we're really out of memory.

To protect against this, we could reset the continuity flag of the
current row in the source when exiting on exception.

Fixes #3583
2018-07-17 16:30:01 +02:00
Tomasz Grabiec
de5c52f422 mutation_partition: Preserve continuity in case row merging with no tracker throws
Example:

 p:      row{key=A, cont=0} row{key=C, cont=1}
 this:                      row{key=C, cont=0}

When we get to processing key=C, key=A was already moved to this, so p
has stale continuity on key=C, which marks (-inf,C) as continuous,
whereas it should mark only (A, C). That's not a problem if merging
succeeds, but if exception happens at this point, we will violate the
invariant which says that the sum of p and this should yield the same
logical partition. It wouldn't because continuity of the sum is
calculated as a set union, and (-inf, A) would be incorrectly turned
into a continuous range.

This is not a problem currently because continuity is always full when
there is no tracker (memtables), so won't change anyway, and when
there is a tracker (cache) we never merge but overwrite instead, so
there is no memory allocation and thus no possibility for failure. But
better be safe.
2018-07-17 16:30:01 +02:00
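The stale-continuity example in the commit above can be reproduced with a small model. This is a minimal sketch under stated assumptions, not Scylla's code: rows are modelled as a `std::map<char, bool>` (key to continuity flag), and `interval` and `continuous_intervals()` are hypothetical names introduced only for illustration. A row with cont=1 marks the range between the previous row and itself as continuous, so removing the previous row silently widens that range to start at -inf.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model of a continuous range (lower, upper).
struct interval {
    bool from_minus_inf; // true when no row precedes the continuous row
    char lower;          // meaningful only when from_minus_inf is false
    char upper;
};

// For every row whose continuity flag is set, report the range that the flag
// covers: from the previous row (or -inf if there is none) up to the row.
std::vector<interval> continuous_intervals(const std::map<char, bool>& rows) {
    std::vector<interval> out;
    bool have_prev = false;
    char prev = 0;
    for (const auto& row : rows) {
        if (row.second) {
            out.push_back(interval{!have_prev, prev, row.first});
        }
        have_prev = true;
        prev = row.first;
    }
    return out;
}
```

With p = {A: cont=0, C: cont=1} the only continuous range is (A, C). Once A has been moved out of p, the same flag on C describes (-inf, C), which is the broader continuity the commit guards against.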
Tomasz Grabiec
567da3e063 memtable, cache: Fix exception safety of partition entry insertions
boost::intrusive::set::insert() may throw if keys require
linearization and that fails, in which case we will leak the entry.

When this happens in cache, we will also violate the invariant for
entry eviction, which assumes all tracked entries are linked, and
cause a SEGFAULT.

Use the non-throwing and faster insert_before() instead. Where we
can't use insert_before(), use alloc_strategy_unique_ptr<> to ensure
that entry is deallocated on insert failure.

Fixes #3585.
2018-07-17 16:30:01 +02:00
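The ownership-guard idea from the commit above can be sketched with standard C++ only. Everything here is an assumption made for illustration: `linking_set` is a hypothetical stand-in for an intrusive container whose insert() may throw, and `std::unique_ptr` plays the role that alloc_strategy_unique_ptr<> plays in the commit. The point is only that the entry stays owned by a guard until it is linked.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <vector>

int live_entries = 0; // number of entries currently allocated

struct entry {
    int key;
    explicit entry(int k) : key(k) { ++live_entries; }
    ~entry() { --live_entries; }
};

// Hypothetical stand-in for an intrusive set: it links entries it does not
// own, and insert() can throw, as a comparator that allocates could.
struct linking_set {
    std::vector<entry*> linked;
    bool fail_next_insert = false;

    void insert(entry& e) {
        if (fail_next_insert) {
            fail_next_insert = false;
            throw std::bad_alloc();
        }
        linked.push_back(&e);
    }
};

// The guard owns the entry until it is linked. If insert() throws, the guard's
// destructor frees the entry, so nothing leaks and nothing unlinked is tracked.
void insert_entry(linking_set& s, int key) {
    auto guard = std::make_unique<entry>(key);
    s.insert(*guard);
    guard.release(); // linked: whoever unlinks the entry now frees it
}
```

A failed insert leaves `live_entries` unchanged and the set empty; a successful one hands the entry over to the container's owner.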
Tomasz Grabiec
c82c0be0be tests: mutation_diff: Ignore differences in memory addresses
Differences in memory addresses are not necessarily differences in
values.

Refs #3571

Message-Id: <1531824919-12737-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 16:32:04 +03:00
Amos Kong
0fcdab8538 scylla_setup: nic setup dialog is only for interactive mode
The current code raises the dialog even in non-interactive mode, when options
are passed to scylla_setup. This blocked the automated artifact-test.

Fixes #3549

Signed-off-by: Amos Kong <amos@scylladb.com>
Message-Id: <58f90e1e2837f31d9333d7e9fb68ce05208323da.1531824972.git.amos@scylladb.com>
2018-07-17 16:31:18 +03:00
Paweł Dziepak
422d1eaeb9 Merge "Improve usability of pkeys in system.large_partitions table" from Avi
"
Partition keys are currently stored in serialized form in the
system.large_partitions table. This is an obstacle to operators
who usually can't deserialize partition keys in their heads.

Improve the situation by deserializing the partition key for them.
"

* tag 'pkey-print/v1' of https://github.com/avikivity/scylla:
  large_partition_handler: output friendly partition key
  keys: schema-aware printing of a partition_key
2018-07-17 13:51:22 +01:00
Avi Kivity
002ac87aac Update seastar submodule
* seastar aac6cf1...6b97e00 (5):
  > Merge "changes to fix travis CI builds" from Kefu
  > tls.cc: Make "close" timeout delay exception proof
  > core/sharded: mark foreign_ptr::get_owner_shard() const
  > core/memory: Expose counter of large allocations
  > tests: add test for multi-fragmented net::packet

Fixes #3461.
Ref scylladb/seastar#474.
2018-07-17 15:43:01 +03:00
Tomasz Grabiec
3f509ee3a2 mutation_partition: Fix exception-safety of row copy constructor
In case population of the vector throws, the vector object would not
be destroyed. It's a managed object, so in addition to causing a leak,
it would corrupt memory if later moved by the LSA, because it would
try to fixup forward references to itself.

Caused sporadic failures and crashes of row_cache_test, especially
with allocation failure injector enabled.

Introduced in 27014a23d7.
Message-Id: <1531757764-7638-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 13:21:21 +01:00
Avi Kivity
acb3163639 large_partition_handler: output friendly partition key
Use abstract_type::to_string() to prettify partition key components.

Manually tested by setting --compaction-large-partition-warning-threshold-mb
to zero and inspecting the output for compound and non-compound partition
keys.
2018-07-17 14:44:52 +03:00
Avi Kivity
bfd14b4123 keys: schema-aware printing of a partition_key
Add a with_schema() helper to decorate a partition key with its
schema for pretty-printing purposes, and matching operator<<.

This is useful to print partition keys where the operator, who
may not be familiar with the encoding, may see them.
2018-07-17 14:43:12 +03:00
Tomasz Grabiec
d94c7c07a3 lsa: Disable alloc failure injector inside the LSA sanitizer
Message-Id: <1531814822-30259-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 11:27:56 +01:00
Botond Dénes
cc4acb6e26 storage_proxy: use the original row limits for the final results merging
`query_partition_key_range()` does the final result merging and trimming
(if necessary) to make sure we don't send more rows to the client than
requested. This merging and trimming is done by a continuation attached
to the `query_partition_key_range_concurrent()` which does the actual
querying. The continuation captures by value the `row_limit` and
`partition_limit` fields of the `query::read_command` object of the
query. This has an unexpected consequence. The lambda object is
constructed after the call to `query_partition_key_range_concurrent()`
returns. If this call doesn't defer, any modifications made to the read
command object by `query_partition_key_range_concurrent()` will be
visible to the lambda. This is undesirable because
`query_partition_key_range_concurrent()` updates the read command object
directly as the vnodes are traversed which in turn will result in the
lambda doing the final trimming according to a decremented `row_limit`,
which will cause the paging logic to declare the query as exhausted
prematurely because the page will not be full.
To avoid all this make a copy of the relevant limit fields before
`query_partition_key_range_concurrent()` is called and pass these copies
to the continuation, thus ensuring that the final trimming will be done
according to the original page limits.

Spotted while investigating a dtest failure on my 1865/range-scans/v2
branch. On that branch the way range scans are executed on replicas is
completely refactored. These changes apparently reduce the number of
continuations in the read path to the point where an entire page can be
filled without deferring and thus causing the problem to surface.

Fixes #3605.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>
2018-07-16 16:54:50 +03:00
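The ordering problem described in the commit above does not depend on Seastar and can be shown in a few lines. This is a simplified sketch, not the storage_proxy code: `read_command` here has a single field, `fetch_rows()` is a hypothetical stand-in for the querying step that decrements the limit in place, and the trimming lambda works on row counts rather than results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct read_command {
    uint32_t row_limit;
};

// Stand-in for the querying step: returns how many rows it fetched and
// decrements the command's remaining limit in place, without deferring.
uint32_t fetch_rows(read_command& cmd, uint32_t rows_available) {
    uint32_t fetched = std::min(cmd.row_limit, rows_available);
    cmd.row_limit -= fetched;
    return fetched;
}

// Problematic order: the lambda is built after fetch_rows() returned, so it
// captures the already decremented limit.
uint32_t page_size_late_capture(read_command cmd, uint32_t rows_available) {
    uint32_t fetched = fetch_rows(cmd, rows_available);
    auto trim = [limit = cmd.row_limit](uint32_t rows) { return std::min(rows, limit); };
    return trim(fetched);
}

// Fixed order: copy the limit before querying and trim against the copy.
uint32_t page_size_early_copy(read_command cmd, uint32_t rows_available) {
    const uint32_t original_limit = cmd.row_limit;
    uint32_t fetched = fetch_rows(cmd, rows_available);
    auto trim = [original_limit](uint32_t rows) { return std::min(rows, original_limit); };
    return trim(fetched);
}
```

With a limit of 10 and plenty of rows available, the late capture trims the page against a limit of 0, while the early copy returns the full page of 10 rows.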
Takuya ASADA
9479ff6b1e dist/common/scripts/scylla_prepare: fix error when /etc/scylla/ami_disabled exists
A shell command in this part was not converted to python3; fix it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180715075015.13071-1-syuu@scylladb.com>
2018-07-16 09:29:38 +03:00
Takuya ASADA
1511d92473 dist/redhat: drop scylla_lib.sh from .rpm
Since we dropped scylla_lib.sh at 58e6ad22b2,
we need to remove it from the RPM spec file too.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180712155129.17056-1-syuu@scylladb.com>
2018-07-15 14:46:22 +03:00
Avi Kivity
ef9b36376c Merge "database: support multiple data directories" from Glauber
"
While Cassandra supports multiple data directories, we have been
historically supporting just one. The one-directory model suits us
better because of the I/O Scheduler and so far we have seen very few
requests -- if any, to support this.

Still, the infrastructure needed to support multiple directories can be
beneficial so I am trying to bring this in.

For simplicity, we will treat the first directory in the list as the
main directory. By being able to still associate one singular directory
with a table, most of the code doesn't have to change and we don't have
to worry about how to distribute data between the directories.

In this design:
- We scan all data directories for existing data.
- resharding only happens within a particular data directory.
- snapshot details are accumulated with data for all directories that
  host snapshots for the tables we are examining
- snapshots are created with files in their own directories, but the
  manifest file goes to the main directory. For this one, note that in
  Cassandra the same thing happens, except that there is no "main"
  directory. Still, the manifest file is in just one of them.
- SSTables are flushed into the main directory.
- Compactions write data into the main directory

Despite the restrictions, one example of usage of this is recovery.  If
we have network attached devices for instance, we can quickly attach a
network device to an existing node and make the data immediately
available as it is compacted back to main storage.

Tests: unit (release)
"

* 'multi-data-file-v2' of github.com:glommer/scylla:
  database: change ident
  database: support multiple data directories
  database: allow resharing to specify a directory
  database: support multiple directories in get_snapshot_details
  database: move get_snapshot_info into a seastar::thread
  snapshots: always create the snapshot directory
  sstables: pass sstable dir with entry descriptor
  database: make nodetool listsnapshots print correct information
  sstables: correctly create descriptors for snapshots
2018-07-15 13:31:04 +03:00
Avi Kivity
8ee807321f Merge "scylla streaming with rpc streaming" from Asias
"
This work is on top of Gleb's rpc streaming, which was merged recently.

What this series does is to replace scylla streaming service's data plane to
use the new rpc streaming instead of the old rpc verb to send the mutations for
scylla streaming. Other parts of scylla streaming, the control plane, are not
changed.

My test bootstraps a new node into an existing one-node cluster with smp 2;
scylla stores data on a ramdisk to minimize disk I/O impact.

I saw a 2x improvement in streaming bandwidth.

Before:
   [shard 0] stream_session - [Stream #2ae92320-5fc8-11e8-911a-000000000000]
   Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1570312 KiB, 109521.02 KiB/s
   [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 14.338 seconds

After:
   [shard 0] stream_session - [Stream #e5589ac0-5fc7-11e8-b463-000000000000]
   Streaming plan for Bootstrap-ks3-index-0 succeeded, peers={127.0.0.1}, tx=0 KiB, 0.00 KiB/s, rx=1546875 KiB, 220415.36 KiB/s
   [shard 0] range_streamer - Bootstrap with 127.0.0.1 for keyspace=ks3 succeeded, took 7.018 seconds

Tests: dtest update_cluster_layout_tests.py

Fixes: #3591
"

* tag 'asias/scylla_streaming_with_rpc_streaming_v8' of github.com:scylladb/seastar-dev:
  streaming: Add rpc streaming support
  storage_service: Introduce STREAM_WITH_RPC_STREAM feature
  streaming: Add estimate_partitions to send_info
  messaging_service: Add streaming with rpc streaming support
  messaging_service: Add streaming_domain
  database: Add add_sstable_and_update_cache
  database: Add make_streaming_sstable_for_write
2018-07-15 12:36:52 +03:00
Avi Kivity
8c993e0728 messaging: tag RPC services with scheduling groups
Assign a scheduling_group for each RPC service. Assignment is
done by connection (get_rpc_client_idx()) - all verbs on the
same connection are assigned the same group. While this may seem
arbitrary, it avoids priority inversion; if two verbs on the same
connection have different scheduling groups, the verb with the low
shares may cause a backlog and stall the connection, including
following requests from verbs that ought to have higher shares.

The scheduling_group parameters are encapsulated in different
classes as they are passed around to avoid adding dependencies.
Message-Id: <20180708140433.6426-1-avi@scylladb.com>
2018-07-13 13:57:08 +02:00
Vladimir Krivopalov
cf7b42619d clustering_ranges_walker: Improve class consistency and readability.
This patch addresses several issues.
  1. The class no longer uses placement-new trick for move-assignment.
     It was incorrect to use because the class contains const references
     and re-initializing the same region of memory would result in undefined
     behaviour on accessing these members.

  2. Use boost::iterator_range for tracking the current range of
     cr_ranges. It is easier to deal with and avoids possible bugs like
     assigning only one of two iterators
Message-Id: <4096182c4ee2fb1157e135c487c41012b266ba69.1531440684.git.vladimir@scylladb.com>
2018-07-13 11:23:33 +02:00
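One common way to get a well-defined move-assignment for a class like the one described above is to hold a pointer instead of a reference, and to keep the two iterators in a single range object. The sketch below illustrates that general technique only; it is not taken from the patch. `walker`, `range_list` and `iter_range` are hypothetical names, and `iter_range` is a hand-rolled stand-in for boost::iterator_range so that the example needs only the standard library.

```cpp
#include <cassert>
#include <vector>

using range_list = std::vector<int>;

// Stand-in for boost::iterator_range: both ends travel together, so an
// assignment cannot update one iterator and forget the other.
struct iter_range {
    range_list::const_iterator first;
    range_list::const_iterator last;
};

class walker {
    const range_list* _ranges; // a pointer is assignable; a reference member is not
    iter_range _remaining;
public:
    explicit walker(const range_list& ranges)
        : _ranges(&ranges), _remaining{ranges.begin(), ranges.end()} {}

    // Defaulted moves are enough: no placement-new over *this is needed.
    walker(walker&&) = default;
    walker& operator=(walker&&) = default;

    const range_list& ranges() const { return *_ranges; }
    bool done() const { return _remaining.first == _remaining.last; }
    int current() const { return *_remaining.first; }
    void advance() { ++_remaining.first; }
};
```

Assigning `w = walker(other)` then simply rebinds the pointer and the range as a unit.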
Asias He
deff5e7d60 streaming: Add rpc streaming support
This patch changes scylla streaming to use the recently added rpc
streaming feature provided by seastar to send mutation fragments for
scylla streaming instead of the rpc verbs.

It also changes the receiver to write to the sstable file directly,
skipping writing to memtable.
2018-07-13 08:36:47 +08:00
Asias He
71e22fe981 storage_service: Introduce STREAM_WITH_RPC_STREAM feature
With this feature, the node supports scylla streaming using the rpc
streaming.
2018-07-13 08:36:47 +08:00
Asias He
faa6769cdb streaming: Add estimate_partitions to send_info
The sender needs to estimate the number of partitions to send, because
the receiver needs this to prepare the sstables.
2018-07-13 08:36:46 +08:00
Asias He
ddfb4590ce messaging_service: Add streaming with rpc streaming support
Preparation for adding rpc streaming in scylla streaming.

- register_stream_mutation_fragments is used to register the rpc
streaming verb

- make_sink_and_source_for_stream_mutation_fragments is used to get the
sink and source object for the sender

- make_sink_for_stream_mutation_fragments is used to get a sink object
for the receiver
2018-07-13 08:36:46 +08:00
Asias He
671e1b08fe messaging_service: Add streaming_domain
The rpc streaming needs a streaming_domain id for the same logical
server. Choose one for our messaging service.
2018-07-13 08:36:46 +08:00
Asias He
6540051f77 database: Add add_sstable_and_update_cache
Since we can write mutations to sstable directly in streaming, we need
to add those sstables to the system so it can be seen by the query.
Also we need to update the cache so the query reflects the latest data.
2018-07-13 08:36:45 +08:00
Asias He
dfc2739625 database: Add make_streaming_sstable_for_write
This will be used to create sstable for streaming receiver to write the
mutations received from network to sstable file instead of writing to
memtable.
2018-07-13 08:36:45 +08:00
Takuya ASADA
ee61660b76 dist/common/scripts/scylla_ec2_check: support custom NIC ifname on EC2
Some AMIs use consistent network device naming, so the primary NIC
ifname is not always 'eth0'.
scylla_ec2_check hardcoded the NIC name as 'eth0', so add a
--nic option to specify a custom NIC ifname.

Fixes #3584

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180712142446.15909-1-syuu@scylladb.com>
2018-07-12 18:22:28 +03:00
Tomasz Grabiec
b17f7257a9 sstables: index_reader: Reduce size of index_entry by indirecting promoted_index
Reduces size of index_entry from 384 bytes to 64 bytes by using
indirection for the optional promoted index instead of embedding it.

Improves query time from 9ms to 4ms in a micro benchmark with a very
large index page.

Message-Id: <1531406354-10089-1-git-send-email-tgrabiec@scylladb.com>
2018-07-12 17:46:58 +03:00
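The size effect of indirecting an optional member, as in the commit above, can be seen with two toy structs. This is an illustrative sketch only: `promoted_index_stub`, `entry_embedded`, `entry_indirect` and `make_entry()` are hypothetical, and their sizes are not the 384 and 64 bytes quoted in the commit message.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical large member that most entries do not need.
struct promoted_index_stub {
    std::array<char, 256> payload;
};

// Embedding: every entry pays for the large member, present or not.
struct entry_embedded {
    uint64_t position;
    bool has_promoted_index;
    promoted_index_stub promoted_index;
};

// Indirection: every entry pays for one pointer; the large member is
// allocated only when it is actually present.
struct entry_indirect {
    uint64_t position;
    std::unique_ptr<promoted_index_stub> promoted_index; // null when absent
};

static_assert(sizeof(entry_indirect) < sizeof(entry_embedded),
              "indirection should shrink the per-entry footprint");

entry_indirect make_entry(uint64_t position, bool with_promoted_index) {
    entry_indirect e{position, nullptr};
    if (with_promoted_index) {
        e.promoted_index = std::make_unique<promoted_index_stub>();
    }
    return e;
}
```

Smaller entries mean more of them fit in a cache line and in a page of the index, at the cost of an extra allocation and pointer chase for entries that do carry the member.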
Tomasz Grabiec
101dcdbb48 gdb: Fix scylla heapprof command
Type of _frames was changed to static_vector<>

Message-Id: <1531233685-20786-2-git-send-email-tgrabiec@scylladb.com>
2018-07-12 16:51:30 +03:00
Tomasz Grabiec
059133ffa8 gdb: Introduce iteration wrapper for static_vector
Message-Id: <1531233685-20786-1-git-send-email-tgrabiec@scylladb.com>
2018-07-12 16:51:30 +03:00
Duarte Nunes
63b63b0461 utils/loading_cache: Avoid using invalidated iterators
When periodically reloading the values in the loading_cache, we would
iterate over the list of entries and call the load() function for
those which need to be reloaded.

For some concrete caches, load() can remove the entry from the LRU set,
and can be executed inline from the parallel_for_each(). This means we
could potentially keep iterating using an invalidated iterator.

Fix this by using a temporary container to hold those entries to be
reloaded.

Spotted when reading the code.

Also use if constexpr and fix the comment in the function containing
the changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712124143.13638-1-duarte@scylladb.com>
2018-07-12 13:59:09 +01:00
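The "temporary container" fix described above follows a general pattern: decide what to reload in one pass, then call the loader in a second pass that no longer walks the list. The sketch below shows that pattern with standard containers. It is not the loading_cache code; `cache_entry`, `lru_list`, `reload_all()` and `evicting_load()` are hypothetical names, and keys are copied so that the second pass holds no references into the list.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

struct cache_entry {
    std::string key;
    bool needs_reload;
};

using lru_list = std::list<cache_entry>;
using loader = std::function<void(lru_list&, const std::string&)>;

// Pass 1 collects the keys to reload; pass 2 calls the loader. The loader may
// erase entries freely because no list iterator is live during pass 2.
std::size_t reload_all(lru_list& lru, const loader& load) {
    std::vector<std::string> to_reload;
    for (const auto& e : lru) {
        if (e.needs_reload) {
            to_reload.push_back(e.key); // copy: stays valid if the entry goes away
        }
    }
    for (const auto& key : to_reload) {
        load(lru, key);
    }
    return to_reload.size();
}

// A loader that removes the entry it was asked to reload, which is the case
// that would invalidate an iterator held by a single-pass loop.
void evicting_load(lru_list& lru, const std::string& key) {
    lru.remove_if([&key](const cache_entry& e) { return e.key == key; });
}
```

Calling `reload_all(lru, evicting_load)` on a list with two stale entries removes both and leaves the rest untouched.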
Botond Dénes
2e7bf9c6f9 loading_cache::reload(): obtain key before calling _load()
The continuation attached to _load() needs the key of the loaded entry
to check whether it was disposed during the load. However if _load()
invalidates the entry the continuation's capture line will access
invalid memory while trying to obtain the key.
To avoid this save a copy of the key before calling _load() and pass it
to both _load() and the continuation.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b571b73076ca863690f907fbd3fb4ff54e597b28.1531393608.git.bdenes@scylladb.com>
2018-07-12 13:42:42 +01:00
Avi Kivity
a4a2f743a8 Merge "Avoid large allocations when reading sstable index pages" from Tomasz
"
If there are a lot of partitions in the index page, index_list may grow large
and require large contiguous blocks of memory, because it's based on
std::vector. That puts pressure on the memory allocator, and if memory is
fragmented, may not be possible to satisfy without a lot of eviction. Switch
to chunked_vector to avoid this.

Refs #3597
"

* 'tgrabiec/avoid-large-alloc-in-index-reader' of github.com:tgrabiec/scylla:
  sstables: Switch index_list to chunked_vector to avoid large allocations
  utils: chunked_vector: Do not require T to be default-constructible for clear()
  utils: chunked_vector: Implement front()
2018-07-12 15:30:18 +03:00
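The idea behind replacing std::vector with a chunked container can be sketched as follows. This is a deliberately small model, not Scylla's utils::chunked_vector: `chunked_sketch` and its `ChunkSize` parameter are hypothetical, and unlike the real class it requires T to be default-constructible and copy-assignable. Elements live in fixed-size chunks, so the largest contiguous allocation for element storage is one chunk, however many elements are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

template <typename T, std::size_t ChunkSize = 128>
class chunked_sketch {
    std::vector<std::unique_ptr<T[]>> _chunks; // small: one pointer per chunk
    std::size_t _size = 0;
public:
    void push_back(const T& value) {
        if (_size == _chunks.size() * ChunkSize) {
            // All chunks are full: allocate one more fixed-size chunk instead
            // of reallocating a single ever-growing contiguous buffer.
            _chunks.push_back(std::make_unique<T[]>(ChunkSize));
        }
        _chunks[_size / ChunkSize][_size % ChunkSize] = value;
        ++_size;
    }

    T& operator[](std::size_t i) { return _chunks[i / ChunkSize][i % ChunkSize]; }
    const T& operator[](std::size_t i) const { return _chunks[i / ChunkSize][i % ChunkSize]; }

    T& front() { return (*this)[0]; }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

Indexing costs one extra division and pointer dereference, which is the price paid for never asking the allocator for a block larger than one chunk.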