Snitch class semantics are defined to be per-node. To achieve this, we
introduce a static member in the i_endpoint_snitch class that
holds the pointer to the relevant snitch class instance.
Since the snitch contents are not always purely const, the instance has to be
per shard, so we make it a "distributed". All the I/O
takes place on a single shard, and any changes are
propagated to the rest of the shards.
The application is responsible for initializing this distributed<snitch>
before it is used for the first time.
This patch effectively reverts most of the "locator: futurize
snitch creation" a2594015f9 patch, namely the part that modifies the
code that was creating the snitch instance. Since the snitch is now
created explicitly by the application, and all the rest of the code
simply assumes that the above global is initialized, we no longer need
those changes, and the code goes back to being as simple
as it was before that patch.
So, to summarize, this patch does the following:
- Reverts the changes introduced by a2594015f9 that required a snitch to be
  created and stored in the strategy object every time a replication strategy
  was created. More specifically, methods like
  keyspace::create_replication_strategy() no longer return a future<>, which
  significantly simplifies the code that calls them.
- Introduces the global distributed<snitch_ptr> object:
   - It belongs to the i_endpoint_snitch class.
   - A corresponding interface was added to access both the global and the
     shard-local instances.
   - locator::abstract_replication_strategy::create_replication_strategy() no
     longer accepts snitch_ptr&&: it gets the corresponding shard-local
     instance of the snitch and passes it to the replication strategy's
     constructor by itself.
- Adjusts the existing snitch infrastructure to the new semantics:
   - Modified create_snitch() to create and start all per-shard snitch
     instances and update the global variable.
   - Introduced a static i_endpoint_snitch::stop_snitch() function that
     properly stops the global distributed snitch.
   - Added code to the gossiping_property_file_snitch that distributes the
     changed data to all per-shard snitch objects.
   - Made all existing snitch classes properly maintain their state so that
     they can shut down cleanly.
- Patches both urchin and cql_query_test to initialize a snitch instance before
  all other services.
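The per-shard arrangement described above can be sketched without Seastar as
follows. This is only an illustration: toy_snitch, toy_snitch_registry and
propagate_from_io_shard() are hypothetical names, and a plain vector indexed by
shard number stands in for distributed<snitch_ptr>.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a snitch: holds data that may change at runtime.
struct toy_snitch {
    std::string datacenter = "dc-unknown";
};

// Hypothetical stand-in for distributed<snitch_ptr>: one instance per shard,
// reachable through static members of a single class.
class toy_snitch_registry {
    static inline std::vector<std::unique_ptr<toy_snitch>> _instances;
public:
    // Must be called by the application before any other use.
    static void create_snitch(std::size_t shard_count) {
        _instances.clear();
        for (std::size_t i = 0; i < shard_count; ++i) {
            _instances.push_back(std::make_unique<toy_snitch>());
        }
    }
    static bool is_initialized() { return !_instances.empty(); }
    // Shard-local access; asserts that create_snitch() ran first.
    static toy_snitch& local(std::size_t shard) {
        assert(shard < _instances.size());
        return *_instances[shard];
    }
    // All I/O happens on shard 0; its result is then copied to every shard.
    static void propagate_from_io_shard() {
        assert(is_initialized());
        const toy_snitch& src = *_instances[0];
        for (std::size_t i = 1; i < _instances.size(); ++i) {
            *_instances[i] = src;
        }
    }
    // Destroys all per-shard instances.
    static void stop_snitch() { _instances.clear(); }
};
```

Code that needs a snitch then calls toy_snitch_registry::local(shard) instead
of creating its own instance, which is what lets the callers stay synchronous.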
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v6:
- Rebased to the current master.
- Extended a commit message a little - the summary.
New in v5:
- database::create_keyspace(): added a missing _keyspaces.emplace()
New in v4:
- Kept the database::create_keyspace() to return future<> by Glauber's request
and added a description to this method that needs to be changed when Glauber
adds his bits that require this interface.
To support initializing the replication_strategy of the system tables keyspace
without requiring a snitch to be created.
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
We need to do that in order to close the database cleanly, flushing all pending
data before we do.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This change abstracts reading from on-disk data sources behind a single
reader which is then composed with memtable readers. This change also
abstracts all data sources behind a single reader obtained via
column_family::make_reader(). That reader is then used by algorithms
like column_family::for_all_partitions() or
column_family::query(). Having those abstractions will make it easier
to add row cache, because it will be encapsulated in a single place.
The current model was not really correct because Origin doesn't support
querying partition ranges by their value. We can query slices
according to dht::decorated_key ordering, which orders partitions
first by token, then by key value.
ring_position encapsulates a range constraint. The key value is optional; when
it is absent, only the token is constrained.
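A minimal sketch of that ordering, using hypothetical toy_decorated_key and
toy_ring_position types with an integer token and a string key. As a
simplification, a position without a key compares equal to every partition
with the same token; the real ring_position may treat that case differently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Simplified stand-in for dht::decorated_key: a token plus the key value.
struct toy_decorated_key {
    std::int64_t token;
    std::string key;
};

// Simplified stand-in for ring_position: the key value is optional.
struct toy_ring_position {
    std::int64_t token;
    std::optional<std::string> key;
};

// Three-way comparison of a partition against a position: token first, then
// key value, but only when the position constrains the key.
inline int compare(const toy_decorated_key& dk, const toy_ring_position& pos) {
    if (dk.token != pos.token) {
        return dk.token < pos.token ? -1 : 1;
    }
    if (!pos.key) {
        return 0; // only the token is constrained
    }
    const int c = dk.key.compare(*pos.key);
    return c < 0 ? -1 : (c > 0 ? 1 : 0);
}
```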
Currently, Origin generates sstables in the form CF-UUID, where UUID
is a string of numbers.
We also do CF-UUID, but for us, the UUID has dashes separating its components.
Because of the current test, we fail to load our own sstables. That test
really isn't that important, since we are currently not doing anything with the
UUID. And if we were, we should be able to accept both formats anyway.
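If the UUID ever does get validated, accepting both spellings could look
roughly like this. normalize_uuid() is a hypothetical helper, not something
this patch adds, and it assumes the undashed form is 32 hex digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Returns the 32 hex digits of a UUID written either as one run of digits or
// in the dashed 8-4-4-4-12 form; returns std::nullopt for anything else.
inline std::optional<std::string> normalize_uuid(const std::string& s) {
    std::string digits;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '-') {
            // Dashes are only valid at the four canonical positions.
            if (s.size() != 36 || (i != 8 && i != 13 && i != 18 && i != 23)) {
                return std::nullopt;
            }
            continue;
        }
        if (!std::isxdigit(c)) {
            return std::nullopt;
        }
        digits.push_back(static_cast<char>(std::tolower(c)));
    }
    if (digits.size() != 32) {
        return std::nullopt;
    }
    return digits;
}
```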
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
The system keyspace is not created in the same way as the others, and
converting it would be hard because it is created inside the
database constructor. So make sure that it is created when the database boots.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
A lot of our tests run in memory only, but now that our write path is complete,
we may start running into problems soon, as we write down the sstables.
It would be nice to force the database to run in-memory only in some situations.
Even in the real world, some scenarios may benefit from that in the future.
This patch forces durable_writes to be always false in case we force the data
directory to be an empty list.
For system tables, the patch also fixes a bug. Because system tables were
forcibly initialized with durable_writes = false, we would never write them to
disk, even when we were supposed to.
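The rule boils down to something like the following sketch.
effective_durable_writes() is a hypothetical helper name; the point is only
that system tables and user tables would go through the same decision.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: durable writes are only honoured when there is at
// least one data directory to write sstables into.
inline bool effective_durable_writes(bool requested,
                                     const std::vector<std::string>& data_dirs) {
    return requested && !data_dirs.empty();
}
```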
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
The "mutation_reader" defined in database.cc is a convenient mechanism
for iterating over mutations. It can be useful for more than just
database.cc (I want to use it in the compaction code), so this patch moves
the type's definition to mutation.hh, and the make_memtable_reader()
function to memtable::make_reader() (in memtable.hh).
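For readers unfamiliar with the idiom, a reader of this kind can be pictured
as a callable that yields the next mutation until it runs out. The sketch
below is synchronous and uses hypothetical toy_* types; the real type works
with futures and real mutations.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, much simplified mutation: one key and one value.
struct toy_mutation {
    std::string key;
    std::string value;
};

// A reader is a callable that yields the next mutation, or nullopt at the end.
using toy_mutation_reader = std::function<std::optional<toy_mutation>()>;

class toy_memtable {
    std::map<std::string, std::string> _partitions;
public:
    void apply(toy_mutation m) { _partitions[m.key] = std::move(m.value); }

    // Returns a reader that walks the partitions in key order. The reader
    // must not outlive the memtable, and the memtable must not be modified
    // while the reader is in use.
    toy_mutation_reader make_reader() const {
        auto it = _partitions.begin();
        auto end = _partitions.end();
        return [it, end]() mutable -> std::optional<toy_mutation> {
            if (it == end) {
                return std::nullopt;
            }
            toy_mutation m{it->first, it->second};
            ++it;
            return m;
        };
    }
};
```

Because the reader is just a value, other code (compaction, for instance) can
consume it without knowing where the mutations come from.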
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
partitions_ranges will be manipulated and split across different
destinations, so provide it separately from read_command to avoid copying the
latter for each destination.
The previous patch added an assert that does not hold when a test runs
without an attached commit log yet still generates enough mutations to cause
a memtable flush.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Commit log guarantees that once an RP is assigned to a data frame/caller, it
will not block before returning the result via future. However, this is not
enough, since we could
a.) Have blocked earlier, in which case the return value processing will be
async anyway
b.) Even if no blocking takes place, future chaining mechanism could decide
it has to reorder execution.
Assuming, though, that this case is rare, and that cases where it
actually violates the replay position ordering rule are even rarer, we can
guard against it by simply keeping track of the highest RP _discarded_ (sent
to sstable flush); if we then attempt to apply a mutation whose RP is lower
than that, we simply re-do the operation (i.e. write the same entry to the
commit log again).
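The guard can be sketched as below. Everything here is a simplification:
replay_position is reduced to a single counter, toy_column_family is a
hypothetical type, and the sketch treats a mutation as reordered when its RP
is lower than the highest RP already sent to flush, which is my reading of
the ordering rule.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using replay_position = std::uint64_t; // simplified: a single counter

struct toy_column_family {
    replay_position highest_flushed_rp = 0;
    std::vector<replay_position> memtable; // RPs of the applied mutations

    // Called when the active memtable is handed over to the sstable flush.
    void flush() {
        for (replay_position rp : memtable) {
            if (rp > highest_flushed_rp) {
                highest_flushed_rp = rp;
            }
        }
        memtable.clear();
    }

    // Applies one mutation. write_to_commitlog returns the RP assigned to the
    // entry. If that RP is already behind the flushed high-water mark, the
    // entry is written again to obtain a fresh RP. Returns the RP finally used.
    replay_position apply(const std::function<replay_position()>& write_to_commitlog) {
        replay_position rp = write_to_commitlog();
        while (rp < highest_flushed_rp) {
            rp = write_to_commitlog(); // re-do: same entry, new position
        }
        memtable.push_back(rp);
        return rp;
    }
};
```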
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
- Forbid explicit snitch creation with constructor.
- Allow the creation of snitches only with locator::make_snitch() template
function.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v4:
- Make sure the snitch is stopped before it's destroyed when _snitch_is_ready
is returned in an exceptional state.
New in v2:
- Change snitch_ptr to be std::unique_ptr<i_endpoint_snitch>
- abstract_replication_strategy::create_replication_strategy(): explicitly
specify (template) types of create_object() parameters.
- Re-arrange the loop in merge_keyspaces() so that lambdas that depend on
"this" complete before there is a chance that "this" gets destroyed.
- create_keyspace(): Don't add a new keyspace if a keyspace with this name
already exists.
- i_endpoint_snitch: added a stop() virtual method
- Added a stop() pure virtual method.
- Added an enum class snitch_state and a _state member initialized to snitch_state::initializing,
added an assert() in a destructor requiring _state to become snitch_state::stopped,
which should be set when stop() is complete.
- rack_inferring_snitch: added a stop() method.
- simple_snitch: added a stop() method.
- Added stop() methods to abstract_replication_strategy and keyspace.
- Updated database::stop() to wait for all keyspaces in _keyspaces to stop.
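The stop()/destructor contract from the v2 notes can be pictured roughly like
this. The sketch is synchronous and uses hypothetical toy_* class names; only
snitch_state::initializing and snitch_state::stopped come from the notes
above, and the running state is an assumption added for illustration.

```cpp
#include <cassert>

// "running" is a hypothetical intermediate state.
enum class snitch_state { initializing, running, stopped };

class toy_endpoint_snitch {
public:
    virtual ~toy_endpoint_snitch() {
        // A snitch must be stopped before it is destroyed.
        assert(_state == snitch_state::stopped);
    }
    // Every concrete snitch has to say how it shuts down.
    virtual void stop() = 0;
    snitch_state state() const { return _state; }
protected:
    snitch_state _state = snitch_state::initializing;
};

class toy_simple_snitch : public toy_endpoint_snitch {
public:
    // Nothing to tear down here, so stopping only records the new state.
    void stop() override { _state = snitch_state::stopped; }
};
```

Destroying a snitch without calling stop() first trips the assert, which is
the early warning the state member is there to provide.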
To handle the fact that --data-file-directories is supposed to take one or
more folders.
Note that boost::program_options already "reserves" the use of std::vector
as the receiver of values for multitoken options (i.e. those with more than
one value). Thus, options receiving a list of tokens via the command line
should adhere to the multitoken rules, i.e. space-separated values.
The end result is that --data-file-directories now accepts multiple paths,
whitespace separated,
i.e. --data-file-directories <path1> <path2>
And as it turns out, this is really a nicer way of writing things than
using "," or ":" separation of paths etc., so...
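To illustrate the resulting command line semantics without pulling in Boost,
here is a small stand-alone imitation. collect_multitoken() is a hypothetical
function and not how boost::program_options is implemented; it only shows
which tokens end up in the vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects every token following `option` up to the next token that starts
// with "--". This imitates multitoken option semantics only.
inline std::vector<std::string> collect_multitoken(const std::vector<std::string>& args,
                                                   const std::string& option) {
    std::vector<std::string> values;
    bool collecting = false;
    for (const std::string& arg : args) {
        if (arg.rfind("--", 0) == 0) {
            // A new option starts; collect only if it is the one we want.
            collecting = (arg == option);
            continue;
        }
        if (collecting) {
            values.push_back(arg);
        }
    }
    return values;
}
```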
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
This reverts commit a19d2171eb.
This commit breaks cql_query_test.
[asias@hjpc urchin]$ ./cql_query_test
Running 1 test case...
WARNING: Not implemented: COMPACT_TABLES
WARNING: Not implemented: METRICS
WARNING: Not implemented: PERMISSIONS
cql_query_test: core/distributed.hh:290: Service&
distributed<Service>::local() [with Service =
service::storage_service]: Assertion `local_is_initialized()' failed.
unknown location(0): fatal error in "test_create_keyspace_statement":
signal: SIGABRT (application abort requested)
tests/test-utils.cc(31): last checkpoint
*** 1 failure detected in test suite "tests/urchin/cql_query_test.cc"
(gdb) bt
#0 0x00000032930348d7 in __GI_raise (sig=sig@entry=6) at
../sysdeps/unix/sysv/linux/raise.c:55
#1 0x000000329303653a in __GI_abort () at abort.c:89
#2 0x000000329302d47d in __assert_fail_base (fmt=0x3293186cb8
"%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x8ec10a "local_is_initialized()",
file=file@entry=0x92508d "core/distributed.hh",
line=line@entry=290, function=function@entry=0x8ed440
<distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
"Service& distributed<Service>::local() [with Service =
service::storage_service]")
at assert.c:92
#3 0x000000329302d532 in __GI___assert_fail (assertion=0x8ec10a
"local_is_initialized()", file=0x92508d "core/distributed.hh",
line=290,
function=0x8ed440
<distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
"Service& distributed<Service>::local() [with Service =
service::storage_service]") at assert.c:101
#4 0x0000000000430f19 in local (this=<optimized out>) at
core/distributed.hh:290
#5 get_local_storage_service () at service/storage_service.hh:3326
#6 keyspace::create_replication_strategy (this=0x7ffff6bf8350) at
database.cc:690
#7 0x000000000061537a in
_ZZZN2db20legacy_schema_tables15merge_keyspacesERN7service13storage_proxyEOSt3mapI13basic_sstringIcjLj15EE13lw_shared_ptrIN5query10result_setEESt4lessIS6_ESaISt4pairIKS6_SA_EEESI_ENKUlRT_E0_clISt6ve
ctorISF_SG_EEEDaSK_ENKUlR8databaseE_clESQ_ () at
db/legacy_schema_tables.cc:584
#8 0x0000000000617d19 in operator() (__closure=0x7ffff6bf8650) at
./core/distributed.hh:284
In the test, storage_service and other services are not started.
Let's revert it and figure out a way to run cql_query_test with the
needed services started properly and then bring the "storage_service:
Remove ad-hoc token_metadata creation" change back.
Adapt for_all_partitions() to use futures instead of iterators,
as that will be the interface to sstables. We drop use of nway_merger as
that is not able to use futures and instead open-code the heap
functionality.
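The open-coded heap amounts to a k-way merge. A synchronous sketch of the idea
follows; for_all_sorted() is a hypothetical name, the sources are plain sorted
vectors of keys rather than future-based readers, and a key present in several
sources is visited once per source instead of being merged.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Merges several key-sorted sources and calls func for each key in order.
// func returns false to stop early. Returns true if all keys were visited.
inline bool for_all_sorted(const std::vector<std::vector<std::string>>& sources,
                           const std::function<bool(const std::string&)>& func) {
    // Heap entry: (index of the source, position within that source).
    using entry = std::pair<std::size_t, std::size_t>;
    auto greater = [&sources](const entry& a, const entry& b) {
        return sources[a.first][a.second] > sources[b.first][b.second];
    };
    // With a "greater" comparator the smallest key sits on top of the heap.
    std::priority_queue<entry, std::vector<entry>, decltype(greater)> heap(greater);
    for (std::size_t i = 0; i < sources.size(); ++i) {
        if (!sources[i].empty()) {
            heap.emplace(i, std::size_t{0});
        }
    }
    while (!heap.empty()) {
        const entry e = heap.top();
        heap.pop();
        if (!func(sources[e.first][e.second])) {
            return false;
        }
        // Refill the heap from the source we just consumed from.
        if (e.second + 1 < sources[e.first].size()) {
            heap.emplace(e.first, e.second + 1);
        }
    }
    return true;
}
```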
* Forward commitlog replay_position to column_family.memtable, updating
highest RP if needed
* When flushing memtable, signal back to commitlog that RP has been dealt with
to potentially remove finished segment(s)
Note: since memtable flushing right now is _not_ explicitly ordered,
this does not actually work, since we need to guarantee ordering with
regards to RP. I.e. if we flush N blocks, we must guarantee that:
a.) We report "flushed RP" in RP order
b.) For a given RP1, all RP* lower than RP1 must also have been flushed.
(The latter means that it is fine to, say, flush X tables at the same time, as
long as we report a single RP that is the highest, and no lower RP:s exist in
non-flushed tables)
I am however letting someone else deal with ensuring MT->sstable flush order.
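For whoever picks that up, the reporting rule could be checked along these
lines. reportable_rp() is a hypothetical helper with replay positions reduced
to plain integers: it returns the highest flushed RP only when no lower RP is
still sitting in an unflushed table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using replay_position = std::uint64_t; // simplified: a single counter

// Returns the RP that may be reported to the commit log as flushed, or
// std::nullopt if reporting has to wait for more flushes to complete.
inline std::optional<replay_position> reportable_rp(
        const std::vector<replay_position>& flushed,
        const std::vector<replay_position>& unflushed) {
    if (flushed.empty()) {
        return std::nullopt;
    }
    const replay_position highest = *std::max_element(flushed.begin(), flushed.end());
    for (replay_position rp : unflushed) {
        if (rp < highest) {
            return std::nullopt; // a lower RP is still only in memory
        }
    }
    return highest;
}
```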
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Store a lw_shared_ptr<keyspace_metadata> in struct keyspace so callers
in migration manager, for example, can look it up.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
The function write_datafile was renamed to write_components and made a member
of the sstable class because writing components requires access to private
members. This change is an important step towards generating components
other than the data file.
The respective test cases were adapted to the changes.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>