Commit Graph

3800 Commits

Author SHA1 Message Date
Avi Kivity
5be417175e Merge "snitch creation"
From Vlad:

"Currently the database always creates a SimpleSnitch and ignores the corresponding parameter
provided by the user. This series fixes this situation:
   - Changes the snitch creation interface to comply with the Java-like interface that
     is already used in the topology_strategy class family.
   - Fixes all the places where a SimpleSnitch was created ignoring the user configuration."
2015-06-14 17:59:15 +03:00
Avi Kivity
06031ea273 Merge seastar upstream 2015-06-14 17:47:32 +03:00
Avi Kivity
25420a6fdf core: add support for --cpuset command line option
Syntax: [cpu-]cpu(,[cpu-]cpu=)...
Default: all processors
2015-06-14 16:11:31 +03:00
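To illustrate the --cpuset syntax from 25420a6fdf, here is a minimal sketch that parses a comma-separated list of CPU ids and inclusive ranges into a set. It is not Seastar's parser: the name parse_cpuset, the 65535 id cap and the exact grammar (one reading of the syntax line in the commit message) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// Hypothetical helper, not Seastar's actual parser.
// Parses a comma-separated list of CPU ids and inclusive ranges,
// e.g. "0-3,8" -> {0, 1, 2, 3, 8}. Returns std::nullopt on malformed input.
std::optional<std::set<unsigned>> parse_cpuset(const std::string& spec) {
    std::set<unsigned> cpus;
    std::size_t pos = 0;
    // Reads one decimal number starting at pos. Fails if no digit is present
    // or if the value exceeds an arbitrary cap (an assumption of this sketch).
    auto read_number = [&](unsigned& out) {
        std::size_t start = pos;
        unsigned value = 0;
        while (pos < spec.size() && spec[pos] >= '0' && spec[pos] <= '9') {
            value = value * 10 + static_cast<unsigned>(spec[pos] - '0');
            if (value > 65535) return false;
            ++pos;
        }
        out = value;
        return pos > start;
    };
    while (true) {
        unsigned first = 0;
        if (!read_number(first)) return std::nullopt;
        unsigned last = first;
        if (pos < spec.size() && spec[pos] == '-') {
            ++pos;  // a range "first-last"
            if (!read_number(last) || last < first) return std::nullopt;
        }
        for (unsigned cpu = first; cpu <= last; ++cpu) cpus.insert(cpu);
        if (pos == spec.size()) break;
        if (spec[pos] != ',') return std::nullopt;
        ++pos;  // skip the comma and expect another entry
    }
    return cpus;
}
```

With this sketch, "0-3,8" selects five processors, while an empty string, a reversed range such as "3-1" or a trailing comma is rejected.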
Avi Kivity
f85a2b48bb resource: support using only a subset of a machine's processors
This is useful for running multiple seastar applications on the same
machine, for testing purposes.
2015-06-14 16:10:21 +03:00
Gleb Natapov
50b18a56cd test: add semaphore test 2015-06-14 16:03:46 +03:00
Gleb Natapov
361db498d1 semaphore: add wait() with timeout support 2015-06-14 16:02:19 +03:00
Gleb Natapov
f19ba7c334 Add timer move constructor 2015-06-14 16:02:18 +03:00
Gleb Natapov
1b2bf57a2b move timer out of reactor.hh to its own header 2015-06-14 16:02:16 +03:00
Vlad Zolotarov
2f14c53f4e gossiping_property_file_snitch: use a logger from i_endpoint_snitch
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-14 15:31:58 +03:00
Vlad Zolotarov
e045d8465c db: use snitch name from the configuration file
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-14 15:31:58 +03:00
Vlad Zolotarov
03ffaea768 locator: introduce i_endpoint_snitch::create_snitch()
- Kill make_snitch().
- i_endpoint_snitch::create_snitch() uses the utilities from class_registrator.hh.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-14 15:31:49 +03:00
Avi Kivity
0b0ad13418 doc: prevent -- from becoming an emdash 2015-06-14 09:20:38 +03:00
Avi Kivity
5251c2523a thread: point out async() as an easy way to launch a thread 2015-06-14 09:14:30 +03:00
Avi Kivity
980a7dc881 thread: more documentation 2015-06-14 08:56:42 +03:00
Raphael S. Carvalho
bf33cae9e1 sstables: re-change buffer size of output streams
Commit 0993142d8 changed buffer size of output streams to a better
number, but this change was lost when translating the sstable write
code to use the thread facility.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-11 15:31:27 +03:00
Gleb Natapov
c2f975bee9 test querying of multiple singular ranges 2015-06-11 15:18:07 +03:00
Gleb Natapov
fc6f6634fa support query of multiple singular ranges 2015-06-11 15:18:07 +03:00
Gleb Natapov
b7155ad862 pass partitions_ranges separately from read_command
partitions_ranges will be split across different
destinations, so pass it separately from read_command to avoid copying the
latter for each destination.
2015-06-11 15:18:07 +03:00
Avi Kivity
ce6cd4b67e Merge "Store keyspace strategy options to database"
From Pekka:

"This series fixes up schema management code to store keyspace strategy
options to database. The map is stored as JSON just like in Origin."
2015-06-11 14:21:53 +03:00
Avi Kivity
09f0a90cac Merge seastar upstream 2015-06-11 14:12:16 +03:00
Vlad Zolotarov
2f238e7d2e dpdk: exclude the KNI module from the required modules
We don't use this module and its compilation is broken in DPDK 2.0.0
against Linux kernels 4.0.x.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-11 14:11:29 +03:00
Pekka Enberg
5b4c073170 db/legacy_schema_tables: Store keyspace strategy options
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-11 13:02:42 +03:00
Pekka Enberg
673c0c1759 JSON parsing and formatting helper functions
There are various places in Origin where we convert a Map to a JSON string
and vice versa.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-11 13:02:42 +03:00
Pekka Enberg
d088cb8181 Fix keyspace strategy options to preserve key-value ordering
Fix keyspace strategy options to preserve key-value ordering by
switching to std::map. We need this to be able to store the map in
database as JSON because unordered maps can cause the schema merging
code to attempt a keyspace update, which we don't support, even though
the values did not change.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-11 13:02:42 +03:00
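The ordering argument in d088cb8181 can be sketched as follows: std::map iterates in sorted key order, so two maps with the same contents always serialize to the same string, whatever the insertion order. The helper to_json_string is hypothetical and does no escaping; it is not the JSON helper added in 673c0c1759.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: serializes string options as a flat JSON object.
// No escaping is done, so it only suits plain keys and values.
std::string to_json_string(const std::map<std::string, std::string>& options) {
    std::string out = "{";
    bool first = true;
    for (const auto& [key, value] : options) {  // sorted key order
        if (!first) out += ",";
        first = false;
        out += "\"" + key + "\":\"" + value + "\"";
    }
    out += "}";
    return out;
}
```

An std::unordered_map gives no such guarantee, so equal option sets could produce different strings and be mistaken for a changed keyspace.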
Avi Kivity
b5b1fa730b dpdk: compatibility with dpdk 2.1 2015-06-11 11:12:07 +03:00
Pekka Enberg
b7a23ddadd database: Memtable flush batching
Currently, we flush out memtables very aggressively which results in
lots of small sstable writes. The proper fix here is to do accounting on
the memtable size but before that happens, bump up the threshold to
another magic number which gives better batching:

  $ ./build/release/seastar --smp 1 --data-file-directories data --commitlog-directory commitlog/

  $ tools/bin/cassandra-stress write -mode cql3 native prepared -rate threads=32

Before:

  Results:
  op rate                   : 37280
  partition rate            : 37280
  row rate                  : 37280
  latency mean              : 0.8
  latency median            : 0.6
  latency 95th percentile   : 1.1
  latency 99th percentile   : 7.6
  latency 99.9th percentile : 11.9
  latency max               : 50.5
  Total operation time      : 00:00:30
  END

After:

  Results:
  op rate                   : 46721
  partition rate            : 46721
  row rate                  : 46721
  latency mean              : 0.7
  latency median            : 0.5
  latency 95th percentile   : 0.9
  latency 99th percentile   : 1.3
  latency 99.9th percentile : 5.8
  latency max               : 96.3
  Total operation time      : 00:00:39
  END

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-11 10:24:35 +03:00
Calle Wilund
65f25e1840 Database/commitlog - Fix broken assert
Previous patch added an assert that is not true in the case a test runs
without an attached commit log, yet still generates enough mutations to cause
a memtable flush.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-10 15:20:36 +03:00
Calle Wilund
8b9a63a3c6 Database/commitlog: guard against replay position reordering
Commit log guarantees that once an RP is assigned to a data frame/caller, it
will not block before returning the result via future. However, this is not
enough, since we could
a.) Have blocked earlier, in which case the return value processing will be
async anyway
b.) Even if no blocking takes place, future chaining mechanism could decide
it has to reorder execution.

Assuming though that the case where this happens is rare, and cases where it
actually affects the rule of replay position ordering are even rarer, we can
guard against it by simply keeping track of the highest RP _discarded_ (sent
to sstable flush), and if we attempt to apply a mutation with a higher RP,
simply re-do the operation (i.e. write same entry to commit log again).

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-10 11:56:45 +03:00
Avi Kivity
4c4e90a948 Merge seastar upstream 2015-06-10 11:49:08 +03:00
Avi Kivity
c4756c7622 fstream: fix dropped future in write path
Noticed by Raphael.
2015-06-10 11:48:20 +03:00
Avi Kivity
11b09a0b27 Merge "snitching"
From Vlad:

"The series includes the first production snitch implementation:
gossiping_property_file_snitch.

There are also a few fixes/improvements in different parts of the project
that were discovered on the way."
2015-06-10 10:29:50 +03:00
Vlad Zolotarov
dc732d95d5 gossiping_property_file_snitch_test: Checks parsing facilities of gossiping_property_file_snitch class
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Added "missing declarations" tests
   - tests/urchin/snitch_property_files: renamed: s/-/_/
   - Reworked to use boost testing facilities
   - Use snitch::stop().
2015-06-09 15:33:38 +03:00
Vlad Zolotarov
cbbdcad649 locator: gossiping_property_file_snitch
Reads the configuration from cassandra-rackdc.properties.
This file may include the following fields:
   - dc: Local Data Center name
   - rack: Local Rack name
   - prefer_local: A boolean value that defines if cluster should prefer
                   local address - relevant for AWS cloud.

Class will schedule a timer that will re-read the property file and inform a
Gossiper if there are changes in the local configuration.

Differences from the Origin C* implementation:
   - No support for a legacy property_file_snitch.
   - Class supports overriding the property file name in a constructor.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v4:
   - Fix a debug compilation: define reload_property_file_period() to be a constexpr
     method instead of a member.
   - Don't stop() the snitch when snitch_is_ready is set to an exceptional state.

New in v2:
   - Adjust to new file interface.
   - Futurize reload_propery_file().
   - Use trim() and split() from boost::algorithm.
   - Read optimization and logging:
      - Re-read the file only if it was changed since the last read.
      - Use logging facilities from log.hh.
   - Cleanups:
      - Introduce bad_property_file_error exception.
      - Remove unnecessary check after dma_read_exactly() call.
   - Styling.
   - Copyright.
   - Move most of the functions implementation into the .cc file.
   - Added stop() method.
2015-06-09 15:33:38 +03:00
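The property file handling described in cbbdcad649 can be sketched with the standard library alone. This is not the snitch's code (the commit uses trim() and split() from boost::algorithm and its own bad_property_file_error). The struct rackdc_properties, the function parse_rackdc_properties, the '#' comment rule and the use of std::runtime_error are all assumptions.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical container for the three fields named in the commit message.
struct rackdc_properties {
    std::string dc;
    std::string rack;
    bool prefer_local = false;
};

// Strips leading and trailing spaces, tabs and line-ending characters.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Parses "key=value" lines. Blank lines and lines starting with '#' are
// skipped (an assumption about the file format); unknown keys are ignored.
rackdc_properties parse_rackdc_properties(const std::string& text) {
    rackdc_properties props;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        auto eq = line.find('=');
        if (eq == std::string::npos) {
            throw std::runtime_error("malformed property line: " + line);
        }
        std::string key = trim(line.substr(0, eq));
        std::string value = trim(line.substr(eq + 1));
        if (key == "dc") props.dc = value;
        else if (key == "rack") props.rack = value;
        else if (key == "prefer_local") props.prefer_local = (value == "true");
    }
    return props;
}
```

A periodic reload, as the commit describes, would call such a parser again only when the file has changed and compare the result with the previous one.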
Vlad Zolotarov
aa11ebca41 locator: added a production_snitch_base implementation
- Implements the non-trivial versions of get_rack() and get_datacenter().
  Performs a lookup in the following order:
     1) Searches in gossiper::endpoint_state_map.
     2) Searches in a SystemTable.
     3) If not found in any of the above, returns a default value.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Introduce db::system_keyspace::endpoint_dc_rack.
   - Kill trim() and split().
   - Added missing copyright and license statements.
   - _my_rack and _my_dc are not optional anymore.
   - Added a promise that has to be set when snitch is stopped.
2015-06-09 15:33:38 +03:00
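The three-step lookup in aa11ebca41 can be sketched like this. The types are stand-ins: lookup_sources, its two maps and the free function get_rack are hypothetical, and the real code consults the gossiper and the system keyspace rather than plain hash maps.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

using endpoint = std::string;  // stand-in for an inet address

// Hypothetical stand-ins for the gossiper state map and the system table.
struct lookup_sources {
    std::unordered_map<endpoint, std::string> gossip_state;
    std::unordered_map<endpoint, std::string> system_table;
};

// Returns the rack for an endpoint: gossip state first, then the system
// table, then the supplied default.
std::string get_rack(const lookup_sources& src, const endpoint& ep,
                     const std::string& default_rack) {
    if (auto it = src.gossip_state.find(ep); it != src.gossip_state.end()) {
        return it->second;
    }
    if (auto it = src.system_table.find(ep); it != src.system_table.end()) {
        return it->second;
    }
    return default_rack;
}
```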
Vlad Zolotarov
a2594015f9 locator: futurize snitch creation
- Forbid explicit snitch creation with constructor.
- Allow the creation of snitches only with locator::make_snitch() template
  function.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v4:
   - Make sure the snitch is stopped before it's destroyed when _snitch_is_ready
     is returned in an exceptional state.

New in v2:
   - Change snitch_ptr to be std::unique_ptr<i_endpoint_snitch>
   - abstract_replication_strategy::create_replication_strategy(): explicitly
     specify (template) types of create_object() parameters.
   - Re-arrange the loop in merge_keyspaces() so that lambdas that depend on
     "this" complete before there is a chance that "this" gets destroyed.
   - create_keyspace(): Don't add a new keyspace if a keyspace with this name
     already exists.
   - i_endpoint_snitch: added a stop() virtual method
      - Added a stop() pure virtual method.
      - Added an enum class snitch_state and a _state member initialized to snitch_state::initializing,
        added an assert() in a destructor requiring _state to become snitch_state::stopped,
        which should be set when stop() is complete.
   - rack_inferring_snitch: added a stop() method.
   - simple_snitch: added a stop() method.
   - Added stop() methods to abstract_replication_strategy and keyspace.
   - Updated database::stop() to wait for all keyspaces in _keyspaces to stop.
2015-06-09 15:33:38 +03:00
Vlad Zolotarov
aecc2b4279 locator: reworked the snitch infrastructure
- Introduce snitch_base class that implements all snitch functionality
  except for get_rack() and get_datacenter() methods.
- Requires the inheriting classes to initialize _my_rack and _my_dc fields.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Returned copyright lines.
   - Make _my_dc and _my_rack non-optional for now.
   - Styling and add an "override" qualifier to virtual function implementations.
   - Move most of snitch_base members into snitch_base.cc.
   - snitch_base.hh: Add "Modified by Cloudius Systems" to a license.
   - simple_snitch: copyright
   - rack_inferring_snitch: copyright
2015-06-09 15:33:38 +03:00
Vlad Zolotarov
ab14716ce8 gossiper: "Start" gossiper on all CPUs and initialize its services only on CPU0
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:33:38 +03:00
Vlad Zolotarov
4703987faf gossiper: replicate the endpoint_state_map and _live_endpoints on all shards
For all replicated maps:
   - Keep the shadow copy on CPU0 and if at the end of a gossiper task execution
     it differs from the current contents of the map replicate it on all shards
     and update the shadow copy on CPU0.
   - Ensure that gossiper task is restarted 1 second AFTER the current iteration
     is over and not 1 second after it started.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - Rename: _live_endpoints_shadow -> _shadow_live_endpoints
   - s/inly/only/
   - Clean up the things that don't belong to this patch.
   - Replicate _live_endpoints as well
   - gossiper: copy _shadow_endpoint_state_map
2015-06-09 15:33:38 +03:00
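The shadow-copy scheme in 4703987faf can be modeled in a single thread as below. This is only a sketch: replicated_state and replicate_if_changed are hypothetical names, the map type is a stand-in for endpoint_state_map, and real replication would cross shards instead of copying within one vector.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using state_map = std::map<std::string, int>;  // stand-in for endpoint_state_map

// Hypothetical model: CPU0 owns the live map and a shadow copy of what was
// last replicated; every shard holds its own replica.
struct replicated_state {
    state_map current;              // modified by the gossiper task on CPU0
    state_map shadow;               // contents last pushed to the replicas
    std::vector<state_map> shards;  // one replica per shard

    explicit replicated_state(std::size_t shard_count) : shards(shard_count) {}

    // Called at the end of a gossiper iteration. Returns true if the map
    // changed and a replication round was performed.
    bool replicate_if_changed() {
        if (current == shadow) return false;  // unchanged, skip the copy
        for (auto& replica : shards) replica = current;
        shadow = current;
        return true;
    }
};
```

The comparison relies on operator==() for the map's value type, which is why 1e32bdf090 adds one for the gossiper state.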
Vlad Zolotarov
e850a723e4 class_registrator::create(): Enforce reference arguments
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:33:38 +03:00
Vlad Zolotarov
f1aa0df4c3 class_registrator: ensure the static member initialization order
There was a possibility for initialization disorder of static member _classes
and its usage in another static class.

Defining the _classes inside the static method that is called when it's accessed ensures
the proper initialization (aka "standard trick", quoting Avi ;)).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:33:38 +03:00
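The "standard trick" mentioned in f1aa0df4c3 is the function-local static: the object is constructed the first time the accessor is called, so it exists before any other static object uses it. A minimal sketch, with hypothetical registry and registrator types that only loosely resemble class_registrator:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry mapping class names to factory functions.
class registry {
public:
    using factory = std::function<std::string()>;

    // The map is a function-local static, constructed on first use, so
    // static objects elsewhere can register themselves regardless of the
    // order in which translation units are initialized.
    static std::unordered_map<std::string, factory>& classes() {
        static std::unordered_map<std::string, factory> instance;
        return instance;
    }
};

// A registrator whose constructor runs during static initialization.
struct registrator {
    registrator(const std::string& name, registry::factory f) {
        registry::classes().emplace(name, std::move(f));
    }
};

// Registered before main() runs; safe because classes() builds the map on demand.
static registrator simple_snitch_registrator(
    "SimpleSnitch", [] { return std::string("simple"); });
```

Had the map been a plain static data member, a registrator in another translation unit could have run before the map was constructed.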
Vlad Zolotarov
73278798a9 added missing methods (stubs) required for snitch implementation
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v2:
   - storage_service: add a non-const version of get_token_metadata().
   - get_broadcast_address(): check if net::get_messaging_service().local_is_initialized()
     before calling net::get_local_messaging_service().listen_address().
   - get_broadcast_address(): return an inet_address by value.
   - system_keyspace: introduce db::system_keyspace::endpoint_dc_rack
   - fb_utilities: use listen_address as broadcast_address for now
2015-06-09 15:33:29 +03:00
Vlad Zolotarov
1e32bdf090 gms: added missing operator==() required for endpoint_state_map comparison.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:18:46 +03:00
Vlad Zolotarov
c1f0d285bb database: make the create_keyspace() function declaration match the definition.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-09 15:18:46 +03:00
Avi Kivity
e07a1e2924 Merge seastar upstream 2015-06-09 12:54:50 +03:00
Avi Kivity
4bbd90d14c reactor: workaround missing FALLOC_FL_ZERO_RANGE in kernel headers
Prehistoric kernels don't expose FALLOC_FL_ZERO_RANGE, humor them.
2015-06-09 12:53:58 +03:00
Avi Kivity
7a464ddf99 reactor: batch aio
Instead of issuing a system call for every aio, wait for them to accumulate,
and issue them all at once.  This reduces syscall count, and allows the kernel
to batch requests (by plugging the I/O queues during the call).  A poller is
added so that requests are not delayed too much.

Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-09 12:52:40 +03:00
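The batching idea in 7a464ddf99 can be modeled without touching the kernel. In this sketch, batching_queue is a hypothetical class and submit_batch() merely counts calls; it stands in for one io_submit()-style system call that takes many requests at once, and it does not reproduce the reactor's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of request batching.
class batching_queue {
public:
    // Queues a request; no "system call" is made here.
    void queue(int request_id) { _pending.push_back(request_id); }

    // Called by a poller: hands over all accumulated requests in one call.
    // Returns the number of requests submitted.
    std::size_t flush() {
        if (_pending.empty()) return 0;
        std::size_t n = _pending.size();
        submit_batch(_pending);
        _pending.clear();
        return n;
    }

    std::size_t submit_calls() const { return _submit_calls; }

private:
    // Stand-in for the single batched system call.
    void submit_batch(const std::vector<int>&) { ++_submit_calls; }

    std::vector<int> _pending;
    std::size_t _submit_calls = 0;
};
```

Three queued requests followed by one flush() cost one submission instead of three, which is the saving the commit message describes.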
Raphael S. Carvalho
d1ed0744f0 schema: add sstable compressor property
The compressor field specifies which compression algorithm
must be used when compressing the sstable data file.
This is a small step towards compressed sstable data files.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-09 11:18:56 +03:00
Shlomi Livne
bd89fa4905 config: add string_list (vec of sstring) as config data type + use for datadir
To handle the fact that --data-file-directories is supposed to be 1+
folders.

Note that boost::program_options already "reserves" the use of std::vector
as receiver of values for multitoken options (i.e. those with more than
one value). Thus, values receiving a list of tokens via command line
should adhere to the multi-token rules, i.e. space separated values.

End result is that --data-file-directories now accepts multiple paths,
white space separated,
i.e. --data-file-directories <path1> <path2>
And as it turns out, this is really a nicer way of writing stuff than
using "," or ":" separation of paths etc, so...

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-09 10:40:45 +03:00
Avi Kivity
551114586d Merge "Initial table merging"
Pekka says:

"This series is the initial table merging code conversion. We now store
column family metadata in the database but without information about the
actual columns."
2015-06-09 10:39:54 +03:00
Avi Kivity
7f7381dc1e Merge seastar upstream 2015-06-09 08:45:25 +03:00