Commit Graph

471 Commits

Author SHA1 Message Date
Asias He
8624467e26 utils: Remove utils/utils.cc
It is used to make sure the header compiles in the early days.
Message-Id: <531fc6570805bd163afedd53f5d71e1b79a477d1.1520840644.git.asias@scylladb.com>
2018-03-12 09:47:40 +02:00
Tomasz Grabiec
654d4b76c0 anchorless_list: Introduce all_elements_reversed() 2018-03-06 11:50:26 +01:00
Avi Kivity
d973445a94 Merge "sstable/schema extensions" from Calle
"
Adds extension points to schema/sstables to enable hooking in
stuff, like, say, something that modifies how sstable disk io
works. (Cough, cough, *encryption*)

Extensions are processed as property keywords in CQL. To add
an extension, a "module" must register it into the extensions
object on boot time. To avoid globals (and yet don't),
extensions are reachable from config (and thus from db).

Table/view tables already contain an extension element, so
we utilize this to persist config.

schema_tables tables/views from mutations now require a "context"
object (currently only extensions, but abstracted for easier
further changes.

Because of how schemas currently operate, there is a super
lame workaround to allow "schema_registry" access to config
and by extension extensions. DB, upon instansiation, calls
a thread local global "init" in schema_registry and registers
the config. It, in turn, can then call table_from_mutations
as required.

Includes the (modified) patch to encapsulate compression
into objects, mainly because it is nice to encapsulate, and
isolate a little.
"

* 'calle/extensions-v5' of github.com:scylladb/seastar-dev:
  extensions: Small unit test
  sstables: Process extensions on file open
  sstables::types: Add optional extensions attribute to scylla metadata
  sstables::disk_types: Add hash and comparator(sstring) to disk_string
  schema_tables: Load/save extensions table
  cql: Add schema extensions processing to properties
  schema_tables: Require context object in schema load path
  schema_tables: Add opaque context object
  config_file_impl: Remove ostream operators
  main/init: Formalize configurables + add extensions to init call
  db::config: Add extensions as a config sub-object
  db::extensions: Configuration object to store various extensions
  cql3::statements::property_definitions: Use std::variant instead of any
  sstables: Add extension type for wrapping file io
  schema: Add opaque type to represent extensions
  sstables::compress/compress: Make compression a virtual object
2018-02-26 17:15:29 +02:00
Paweł Dziepak
5dfa36c526 lsa: add basic sanitizer
LSA being an allocator built on top of the standard may hide some
erroneous usage from AddressSanitizer. Moreover, it has its own classes
of bugs that could be caused by incorrect user behaviour (e.g. migrator
returning wrong object size).

This patch adds basic sanitizer for the LSA that is active in the debug
mode and verifies if the allocator is used correctly and if a problem is
found prints information about the affected object that it has collected
earlier. Theat includes the address and size of an object as well as
backtrace of the allocation site. At the moment the following errors are
being checked for:
 * leaks, objects not freed at region destructor
 * attempts to free objects at invalid address
 * mismatch between object size at allocation and free
 * mismatch between object size at allocation and as reported by the
   migrator
 * internal LSA error: attempt to allocate object at already used
   address
 * internal LSA error: attempt to merge regions containing allocated
   objects at conflicting addresses

Message-Id: <20180226122314.32049-1-pdziepak@scylladb.com>
2018-02-26 14:35:13 +02:00
Vladimir Krivopalov
721bd3eef6 Added missing 'override' to skip() in buffer_input_stream and prepended_input_stream.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <4e91bead8de7f6fa9b3bfdab8bda73efdb22749d.1519152303.git.vladimir@scylladb.com>
2018-02-20 19:49:11 +00:00
Duarte Nunes
9ce0be60d4 utils/flush_queue: Remove unused function
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180216234502.23931-1-duarte@scylladb.com>
2018-02-19 13:09:11 +00:00
Tomasz Grabiec
7e0ff8a920 lsa: Disable allocation failure injection inside merge()
Fixes termiantion in tests due to throw from merge(), which is noexcept.
2018-02-14 16:42:49 +01:00
Tomasz Grabiec
66701c1671 lsa: Make region deregistration robust against duplicates 2018-02-14 16:42:49 +01:00
Tomasz Grabiec
cf876bbe2d lsa: Make region allocation exception safe
We were not unregisterring in case add() fails.
2018-02-14 16:42:49 +01:00
Calle Wilund
2ee68ce0d4 config_file_impl: Remove ostream operators
We don't generate default strings for command line, so these are not
needed as such, and conflict with other operators in to_string.hh
2018-02-07 10:11:46 +00:00
Paweł Dziepak
dcd79af8ed lsa: optimise disabling reclamation and invalidation counter
Most of the lsa gory details are hidden in utils/logalloc.cc. That
includes the actual implementation of a lsa region: region_impl.

However, there is code in the hot path that often accesses the
_reclaiming_enabled member as well as its base class
allocation_strategy.

In order to optimise those accesses another class is introduced:
basic_region_impl that inherits from allocation_strategy and is a base
of region_impl. It is defined in utils/logalloc.hh so that it is
publicly visible and its member functions are inlineable from anywhere
in the code. This class is supposed to be as small as possible, but
contain all members and functions that are accessed from the fast path
and should be inlined.
2018-01-30 18:33:26 +01:00
Paweł Dziepak
d825ae37bf lsa: split alloc section into reserving and reclamation-disabled parts
Allocating sections reserves certain amount of memory, then disables
reclamation and attempts to perform given operation. If that fails due
to std::bad_alloc the reserve is increased and the operation is retried.

Reserving memory is expensive while just disabling reclamation isn't.
Moreover, the code that runs inside the section needs to be safely
retryable. This means that we want the amount of logic running with
reclamation disabled as small as possible, even if it means entering and
leaving the section multiple times.

In order to reduce the performance penalty of such solution the memory
reserving and reclamation disabling parts of the allocating sections are
separated.
2018-01-30 18:33:26 +01:00
Paweł Dziepak
eb2e88e925 linearization_context: remove non-trivial operations from fast path
Since linearization_context is thread_local every time it is accessed
the compiler needs to emit code that checks if it was already
constructed and does so if it wasn't. Moreover, upon leaving the context
from the outermost scope the map needs to be cleared.

All these operations impose some performance overhead and aren't really
necessary if no buffers were linearised (the expected case). This patch
rearranges the code so that lineatization_context is trivially
constructible and the map is cleared only if it was modified.
2018-01-30 18:33:25 +01:00
Vladimir Krivopalov
9fdf4b24b5 Add helper input streams: buffer_input_stream and prepended_input_stream.
buffer_input_stream is a simple input_stream wrapping a single
temporary_buffer.

prepended_input_stream suits for the case when some data has been read
into a buffer and the rest is still in a stream. It accepts a buffer and
a data_source and first reads from the buffer and then, when it ends,
proceeds reading from the data_source.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-01-29 11:57:04 -08:00
Botond Dénes
12b1520415 exponential_backoff_retry::do_until_value(): restore indentation
Deferred from previous patch.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <a10053f6c0ed8a24a74e51f1df4e9a5acf59922d.1517222195.git.bdenes@scylladb.com>
2018-01-29 10:50:01 +00:00
Botond Dénes
e0c082616a exponential_backoff_retry::do_until_value(): fix use-after-move
The exponential_backoff_retry instance is captured by move and is then
indirectly moved again as repeat_until_value() moves the lambda its
passed into its internal state. This caused problems as internal
lambdas store references to the instance and these references go stale
after the move.
To fix this keep hold of the existential_backoff_retry instance in an
enclosing do_with() to make it safe for internal lambdas to reference
it.

Indentation will be fixed by the next patch.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <adc49d25a6176756d60e092f3713c0c897732382.1517222195.git.bdenes@scylladb.com>
2018-01-29 10:50:01 +00:00
Duarte Nunes
bfe5a8e96f utils/managed_vector: Return reference to emplaced element
We are in 2018, after all.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180126105417.54285-1-duarte@scylladb.com>
2018-01-26 13:49:56 +01:00
Tomasz Grabiec
1292315579 anchorless_list: Introduce last() 2018-01-18 11:32:49 +01:00
Tomasz Grabiec
5c85e9c2db lsa: Expose max_zone_segments for tests 2018-01-16 13:17:20 +01:00
Tomasz Grabiec
99708cc498 lsa: Expose tracker::non_lsa_used_space()
So that it can be used in unit tests.
2018-01-16 13:17:20 +01:00
Tomasz Grabiec
e5f8176c32 lsa: Fix memory leak on zone reclaim
_free_segments_in_zones is not adjusted by
segment_pool::reclaim_segments() for empty zones on reclaim under some
conditions. For instance when some zone becomes empty due to regular
free() and then reclaiming is called from the std allocator, and it is
satisfied from a zone after the one which is empty. This would result
in free memory in such zone to appear as being leaked due to corrupted
free segment count, which may cause a later reclaim to fail. This
could result in bad_allocs.

The fix is to always collect such zones.

Fixes #3129
Refs #3119
Refs #3120
2018-01-16 13:17:11 +01:00
Glauber Costa
80c4a211d8 consolidate timeout_clock
At the moment, various different subsystems use their different
ideas of what a timeout_clock is. This makes it a bit harder to pass
timeouts between them because although most are actually a lowres_clock,
that is not guaranteed to be the case. As a matter of fact, the timeout
for restricted reads is expressed as nanoseconds, which is not a valid
duration in the lowres_clock.

As a first step towards fixing this, we'll consolidate all of the
existing timeout_clocks in one, now called db::timeout_clock. Other
things that tend to be expressed in terms of that clock--like the fact
that the maximum time_point means no timeout and a semaphore that
wait()s with that resolution are also moved to the common header.

In the upcoming patch we will fix the restricted reader timeouts to
be expressed in terms of the new timeout_clock.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-01-11 12:07:41 -05:00
Duarte Nunes
40ad65666f utils/exponential_backoff_retry: Add helper to automate retries
This patch adds the do_until_value static member function to
exponential_backoff_retry, which retries the specified function until
it returns an engaged optional.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-12-28 13:00:28 +00:00
Duarte Nunes
9a602c7796 utils/exponential_backoff_retry: Add abort_source-based retry
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2017-12-28 13:00:28 +00:00
Duarte Nunes
1374f898b9 Merge seastar upstream
Class optimized_optional was moved into seastar, and its usage
simplified so move_and_disengage() is replaced in favour of
std::exchange(_, { }).

* seastar adaca37...b0f5591 (9):
  > Merge "core: Introduce cancellation mechanism" from Duarte
  > Fix Seastar build that no longer builds with --enable-dpdk after the recent commit fd87ea2
  > noncopyable_function: support function objects whose move constructors throw
  > Adding new hardware options to new config format, using new config format for dpdk device
  > Fix check for Boost version during pre-build configuration.
  > variant_utils: add variant_visitor constructor for C++17 mode
  > Merge "Allows json object to be stream to an" from Amnon
  > Merge 'Default to C++17' from Avi
  > Add const version of subscript operator to circular_buffer

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20171228112126.18142-1-duarte@scylladb.com>
2017-12-28 13:24:18 +02:00
Vlad Zolotarov
6c037899b5 utils::fb_utilities: add is_me(addr) method
Add a widely used method that returns TRUE if a given address is a broadcast
address of the local node.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-12-14 15:05:48 -05:00
Avi Kivity
e6940d8d4a Merge "Gossip propagation and stabilization" from Calle
"Fixes #2866
Fixes #2894

Changes gossip propagation to allow "atomic" grouping of values to ensure
their respective order.
Modifies gossip bootstrap startup to potentially wait longer in cases
where stabilization (messages done) takes time, to avoid data loss
in repair."

* 'calle/gossip' of github.com:scylladb/seastar-dev:
  gossip: wait for stabilized gossip on bootstrap
  gossiper: Prevent race condition in  propagation
  utils::to_string: Add printers for pairs+maps
  utils::in: Add helper type for perfect forwarding initializer lists
2017-12-12 17:59:00 +02:00
Vlad Zolotarov
0145ae2b4b utils::crc32: add power64 crc32 HW accelerated implementation
Based on the work of Anton Blanchard <anton@au.ibm.com>, IBM that may be found
here: https://github.com/antonblanchard/crc32-vpmsum

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-12-08 13:38:13 -05:00
Michael Munday
18c0ab539e utils/allocation_strategy: force alignment to be at least sizeof(void*)
The alignment of packed structs can be 1. The system¹ posix_memalign
function will return EINVAL when passed this alignment. This fix
forces the alignment to be at least sizeof(void*).

¹ The seastar implementation of posix_memalign does not appear to
  have this limitation currently.
2017-12-08 10:12:41 -05:00
Michael Munday
5158b3f484 utils::crc: introduce process_le/be(T) methods
Replace the oblique process(T) overloads for integer types with
explicit process_le/be(T) methods that would interpret the given integer
as a stream of bytes using the corresponding endiannes.

For instance

process_le(0x11223344) would treat this integer as the following array of bytes:
{0x44, 0x33, 0x22, 0x11}.

process_be(0x11223344) on the other hand would treat this integer as if it's
{0x11, 0x22, 0x33, 0x44}.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2017-12-08 10:12:21 -05:00
Michael Munday
26b7c2622e utils/crc: use zlib for crc32 on non-x86 platforms
Ideally we should use the Castagnoli polynomial to match the SSE 4.2
crc32 instructions, but this works for now.
2017-12-08 09:47:50 -05:00
Calle Wilund
f4362a5289 utils::in: Add helper type for perfect forwarding initializer lists
wrapper type (courtesy of
http://cpptruths.blogspot.se/2013/09/21-ways-of-passing-parameters-plus-one.html#inTidiom)
to enable move semantics in initializer lists. Useful as an engineering
overkill to retain nice call sites.
2017-12-05 14:28:34 +00:00
Amnon Heiman
3f8d9a87ee estimated_histogram: update the sum and count when merging
When merging histograms the count and the sum should be updated.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20171122154822.23855-1-amnon@scylladb.com>
2017-11-22 16:55:55 +01:00
Glauber Costa
6c4e8049a0 estimated_histogram: also fill up sum metric
Prometheus histograms have 3 embedded metrics: count, buckets, and sum.
Currently we fill up count and buckets but sum is left at 0. This is
particularly bad, since according to the prometheus documentation, the
best way to calculate histogram averages is to write:

  rate(metric_sum[5m]) / rate(metric_count[5m])

One way of keeping track of the sum is adding the value we sampled,
every time we sample. However, the interface for the estimated histogram
has a method that allows to add a metric while allowing to adjust the
count for missing metrics (add_nano())

That makes acumulating a sum inaccurate--as we will have no values for
the points that were added. To overcome that, when we call add_nano(),
we pretend we are introducing new_count - _count metrics, all with the
same value.

Long term, doing away with sampling may help us provide more accurate
results.

After this patch, we are able to correctly calculate latency averages
through the data exported in prometheus.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20171122144558.7575-1-glauber@scylladb.com>
2017-11-22 16:10:12 +01:00
Vladimir Krivopalov
61b1988aa1 Use meaningful error messages when throwing a marshal_exception
Fixes #2977

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <20171121005108.23074-1-vladimir@scylladb.com>
2017-11-21 16:05:43 +02:00
Daniel Fiala
21ea05ada1 utils/big_decimal: Fix compilation issue with converion of cpp_int to uint64_t.
Signed-off-by: Daniel Fiala <daniel@scylladb.com>
Message-Id: <20171121134854.16278-1-daniel@scylladb.com>
2017-11-21 15:51:29 +02:00
Jesse Haber-Kucharsky
6f4241574c utils/loading_cache: Include necessary dependency 2017-11-15 23:17:05 -05:00
Tomasz Grabiec
8d69d217af lsa: Guarantee invalidated references on allocating section retry
There is existing code (e.g. use of partition_snapshot_row_cursor in
cache_streamed_mutation) which assumes that references will be
invalidated when bad_alloc is thrown from allocating_section. That is
currently the case because on retry we will attempt memory reclamation
which will invalidate references either through compaction or
eviction. Make this guarantee explicit.
2017-11-13 20:55:13 +01:00
Daniel Fiala
ce2f010859 utils/big_decimal: Added necessary operators and methods for aggregate functions.
Signed-off-by: Daniel Fiala <daniel@scylladb.com>
2017-11-12 15:51:29 +01:00
Tomasz Grabiec
5348d9f596 managed_bytes: Declare copy constructor as allocation point
Because of the small size optimization, not all copies will call the
allocator, so allocation failure injection may miss this site if the
value is not large enough. Make the testing more effective by marking
this place explicitly as an allocation point.
2017-11-07 15:33:24 +01:00
Tomasz Grabiec
34ccf234ea Integrate with allocation failure injection framework 2017-11-07 15:33:24 +01:00
Calle Wilund
287b6fd8bd config_file: Add optional "error_handler" to yaml parse functions
Allowing parse errors / unknown options to be ignored.
2017-11-06 09:53:05 +00:00
Duarte Nunes
044b8deae4 Merge 'Solves problems related to gossip which can be observed in a large cluster' from Tomasz
"The main problem fixed is slow processing of application state changes.
This may lead to a bootstrapping node not having up to date view on the
ring, and serve incorrect data.

Fixes #2855."

* tag 'tgrabiec/gossip-performance-v3' of github.com:scylladb/seastar-dev:
  gms/gossiper: Remove periodic replication of endpoint state map
  gossiper: Check for features in the change listener
  gms/gossiper: Replicate changes incrementally to other shards
  gms/gossiper: Document validity of endpoint_state properties
  storage_service: Update token_metadata after changing endpoint_state
  gms/gossiper: Process endpoints in parallel
  gms/gossiper: Serialize state changes and notifications for given node
  utils/loading_shared_values: Allow Loader to return non-future result
  gms/gossiper: Encapsulate lookup of endpoint_state
  storage_service: Batch token metadata and endpoint state replication
  utils/serialized_action: Introduce trigger_later()
  gossiper: Add and improve logging
  gms/gossiper: Don't fire change listeners when there is no change
  gms/gossiper: Allow parallel apply_state_locally()
  gms/gossiper: Avoid copies in endpoint_state::add_application_state()
  gms/failure_detector: Ignore short update intervals
2017-10-18 10:13:25 +01:00
Duarte Nunes
c468e59817 Merge 'Extract config file mechanism + allow additional' from Calle
"Extracts the yaml/boost-po aspects of the "self-describing" db::config
into an abstract type.

db::config is then reimplemented in said type, removing some of the
slightly cumbersome entanglement with seastar opts (log).

Adds a main hook for additional configuration files (options + file)"

* 'calle/config' of github.com:scylladb/seastar-dev:
  main/init: Add registerable configuration objects
  db::config: Re-implement on utils/config_file.
  utils::config_file: Abstract out config file to external type
2017-10-18 09:50:53 +01:00
Tomasz Grabiec
f7a7e97095 utils/loading_shared_values: Allow Loader to return non-future result 2017-10-18 08:49:52 +02:00
Tomasz Grabiec
2e2ae4671e utils/serialized_action: Introduce trigger_later()
Can be used instead of trigger() to improve batching.
2017-10-18 08:49:52 +02:00
Calle Wilund
05db87e068 utils::config_file: Abstract out config file to external type
Handling all the boost::commandline + YAML stuff.
This patch only provides an external version of these functions,
it does not modify the db::config object. That is for a follow-up
patch.
2017-10-18 00:51:41 +00:00
Pekka Enberg
ae92055b52 Merge "Bring histogram closer to what Prometheus expects" from Glauber
"Histograms are a native prometheus type, and there are many functions
available that operate on them. There is extensive documentation about
them at https://prometheus.io/docs/practices/histograms/

One example is the function histogram_quantile(), that can extract
useful quantiles from the histograms. Currently, those functions don't
work well.

The reasons are twofold:
1) We are only exporting 16 metrics, starting from 1usec. That means
that the highest latency we can differentiate is 4ms. After that,
everything falls into the same bin.

2) The format that prometheus expects is that each bin will contain
the total number of points seen *up until that bin*, while we
currently export the total number of points that falls between bins.
IOW, it is a cummulative histogram.

About point two, granted it is a bit hidden in their website, but it is
there. The following phrase about a caveat make it clear:

"Note that we divide the sum of both buckets. The reason is that the
histogram buckets are cumulative. The le="0.3" bucket is also contained
in the le="1.2" bucket; dividing it by 2 corrects for that."

It is also not needed to accumulate things that fall over the last bin:
the _count component of the histogram will already account for that."

Acked-by: Amnon Heiman <amnon@scylladb.com>
Acked-by: Gleb Natapov <gleb@scylladb.com>

* 'prometheus-histograms' of github.com:glommer/scylla:
  storage_proxy: change reporting of estimated histograms
  estimated_histogram: bring histogram closer to what prometheus expects.
2017-10-17 20:23:10 +03:00
Tomasz Grabiec
68fe1a5bee utils/loading_cache: Fix compilation on older compilers
Message-Id: <1507728312-10585-1-git-send-email-tgrabiec@scylladb.com>
2017-10-12 14:55:34 +03:00
Botond Dénes
d2b294dc06 loading_cache: prepend this-> to method calls on captured this
To make gcc 6.3 happy.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <849402e20a1ffa6f603eff4fe295981a94b9ca79.1507282527.git.bdenes@scylladb.com>
2017-10-06 12:09:34 +02:00