From Vlad:
"The series includes the first production snitch implementation:
gossiping_property_file_snitch.
There are also a few fixes/improvements in different parts of the project
that were discovered on the way."
Reads the configuration from cassandra-rackdc.properties.
This file may include the following fields:
- dc: Local Data Center name
- rack: Local Rack name
- prefer_local: A boolean value that defines whether the cluster should prefer
the local address; relevant for the AWS cloud.
The class schedules a timer that re-reads the property file and informs the
Gossiper if the local configuration has changed.
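A minimal sketch of reading the three fields listed above, assuming the file uses plain "key=value" lines with '#' comments. The names rackdc_config, trim_copy and parse_rackdc are hypothetical and are not taken from the series:

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical holder for the three fields described above.
struct rackdc_config {
    std::string dc;
    std::string rack;
    bool prefer_local = false;
};

// Strip leading and trailing whitespace.
inline std::string trim_copy(const std::string& s) {
    const char* ws = " \t\r\n";
    auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) {
        return "";
    }
    auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse "key=value" lines. Blank lines and lines starting with '#' are
// skipped; unknown keys are ignored. The line syntax is an assumption.
inline rackdc_config parse_rackdc(std::istream& in) {
    rackdc_config cfg;
    std::string line;
    while (std::getline(in, line)) {
        line = trim_copy(line);
        if (line.empty() || line[0] == '#') {
            continue;
        }
        auto eq = line.find('=');
        if (eq == std::string::npos) {
            throw std::runtime_error("bad property line: " + line);
        }
        auto key = trim_copy(line.substr(0, eq));
        auto val = trim_copy(line.substr(eq + 1));
        if (key == "dc") {
            cfg.dc = val;
        } else if (key == "rack") {
            cfg.rack = val;
        } else if (key == "prefer_local") {
            cfg.prefer_local = (val == "true");
        }
    }
    return cfg;
}
```

A periodic reload would call such a parser again and compare the result with the previous one before notifying anyone.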
Differences from the Origin C* implementation:
- No support for a legacy property_file_snitch.
- Class supports overriding the property file name in a constructor.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v4:
- Fix a debug compilation: define reload_property_file_period() to be a constexpr
method instead of a member.
- Don't stop() the snitch when snitch_is_ready is set to an exceptional state.
New in v2:
- Adjust to new file interface.
- Futurize reload_property_file().
- Use trim() and split() from boost::algorithm.
- Read optimization and logging:
- Re-read the file only if it was changed since the last read.
- Use logging facilities from log.hh.
- Cleanups:
- Introduce bad_property_file_error exception.
- Remove unnecessary check after dma_read_exactly() call.
- Styling.
- Copyright.
- Move most of the functions implementation into the .cc file.
- Added stop() method.
- Implements the non-trivial versions of get_rack() and get_datacenter().
Performs a lookup in the following order:
1) Searches in a gossiper::endpoint_state_map.
2) Searches in a SystemTable.
3) If not found in any of the above returns a default value.
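The lookup order above can be sketched as a chain of fallbacks. This is only an illustration: the two maps stand in for the gossiper state and the system table, and lookup_datacenter is a hypothetical name, not the patch's get_datacenter():

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: an endpoint is a plain string here, and each map
// plays the role of one lookup source.
using endpoint = std::string;
using dc_map = std::unordered_map<endpoint, std::string>;

// Try the gossip state first, then the system table, then the default.
inline std::string lookup_datacenter(const endpoint& ep,
                                     const dc_map& gossip_state,
                                     const dc_map& system_table,
                                     const std::string& default_dc) {
    auto g = gossip_state.find(ep);
    if (g != gossip_state.end()) {
        return g->second;
    }
    auto s = system_table.find(ep);
    if (s != system_table.end()) {
        return s->second;
    }
    return default_dc;
}
```

A rack lookup would follow the same pattern with a different value stored per endpoint.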
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- Introduce db::system_keyspace::endpoint_dc_rack.
- Kill trim() and split().
- Added missing copyright and license statements.
- _my_rack and _my_dc are not optional anymore.
- Added a promise that has to be set when snitch is stopped.
- Forbid explicit snitch creation with constructor.
- Allow the creation of snitches only with locator::make_snitch() template
function.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v4:
- Make sure the snitch is stopped before it's destroyed when _snitch_is_ready
is returned in an exceptional state.
New in v2:
- Change snitch_ptr to be std::unique_ptr<i_endpoint_snitch>
- abstract_replication_strategy::create_replication_strategy(): explicitly
specify (template) types of create_object() parameters.
- Re-arrange the loop in merge_keyspaces() so that lambdas that depend on
"this" complete before there is a chance that "this" gets destroyed.
- create_keyspace(): Don't add a new keyspace if a keyspace with this name
already exists.
- i_endpoint_snitch: added a stop() virtual method
- Added a stop() pure virtual method.
- Added an enum class snitch_state and a _state member initialized to snitch_state::initializing,
added an assert() in a destructor requiring _state to become snitch_state::stopped,
which should be set when stop() is complete.
- rack_inferring_snitch: added a stop() method.
- simple_snitch: added a stop() method.
- Added stop() methods to abstract_replication_strategy and keyspace.
- Updated database::stop() to wait for all keyspaces in _keyspaces to stop.
- Introduce snitch_base class that implements all snitch functionality
except for get_rack() and get_datacenter() methods.
- Requires the inheriting classes to initialize _my_rack and _my_dc fields.
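The snitch_state item above (a _state member that starts as initializing, plus a destructor assert() requiring stopped) might look roughly like this. It is a simplified sketch: the stop() here is synchronous for brevity, and the class names lifecycle_checked and demo_snitch are hypothetical:

```cpp
#include <cassert>

// Only the two states named in the changelog are modelled here.
enum class snitch_state { initializing, stopped };

class lifecycle_checked {
public:
    virtual ~lifecycle_checked() {
        // Destroying an object that was never stopped is treated as a bug.
        assert(_state == snitch_state::stopped);
    }
    // Implementations must set _state to stopped once stopping completes.
    virtual void stop() = 0;
    snitch_state state() const { return _state; }
protected:
    snitch_state _state = snitch_state::initializing;
};

class demo_snitch : public lifecycle_checked {
public:
    void stop() override { _state = snitch_state::stopped; }
};
```

With this shape, forgetting to call stop() before destruction aborts in debug builds instead of silently leaking a running timer or pending work.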
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- Returned copyright lines.
- Make _my_dc and _my_rack non-optional for now.
- Styling and add an "override" qualifier to virtual functions implementations.
- Move most of snitch_base members into snitch_base.cc.
- snitch_base.hh: Add "Modified by Cloudius Systems" to a license.
- simple_snitch: copyright
- rack_inferring_snitch: copyright
For all replicated maps:
- Keep the shadow copy on CPU0 and if at the end of a gossiper task execution
it differs from the current contents of the map replicate it on all shards
and update the shadow copy on CPU0.
- Ensure that gossiper task is restarted 1 second AFTER the current iteration
is over and not 1 second after it started.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- Rename: _live_endpoints_shadow -> _shadow_live_endpoints
- s/inly/only/
- Clean up the things that don't belong to this patch.
- Replicate _live_endpoints as well
- gossiper: copy _shadow_endpoint_state_map
There was a possibility for initialization disorder of static member _classes
and its usage in another static class.
Defining the _classes inside the static method that is called when it's accessed ensures
the proper initialization (aka "standard trick", quoting Avi ;)).
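The "standard trick" is usually known as the construct-on-first-use idiom: a function-local static is initialized the first time the function is called, so it is ready even when another static object uses it during its own construction. A minimal sketch, with hypothetical registry and registrator classes that are not the project's own:

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

class registry {
public:
    using creator = std::function<int()>;

    static void register_class(const std::string& name, creator c) {
        classes()[name] = std::move(c);
    }
    static bool has(const std::string& name) {
        return classes().count(name) != 0;
    }
    static int create(const std::string& name) {
        return classes().at(name)();
    }
private:
    // The map lives inside the accessor, so it is constructed on first use
    // rather than at an unspecified point during static initialization.
    static std::map<std::string, creator>& classes() {
        static std::map<std::string, creator> _classes;
        return _classes;
    }
};

// A static object whose constructor touches the registry. With a plain
// static data member, this could run before the map was constructed.
struct registrator {
    registrator(const std::string& name, registry::creator c) {
        registry::register_class(name, std::move(c));
    }
};

static registrator simple_reg("simple", [] { return 1; });
```

Since C++11 the initialization of a function-local static is also thread-safe, which a hand-rolled "initialized" flag would not give for free.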
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- storage_service: add a non-const version of get_token_metadata().
- get_broadcast_address(): check if net::get_messaging_service().local_is_initialized()
before calling net::get_local_messaging_service().listen_address().
- get_broadcast_address(): return an inet_address by value.
- system_keyspace: introduce db::system_keyspace::endpoint_dc_rack
- fb_utilities: use listen_address as broadcast_address for now
Instead of issuing a system call for every aio, wait for them to accumulate,
and issue them all at once. This reduces syscall count, and allows the kernel
to batch requests (by plugging the I/O queues during the call). A poller is
added so that requests are not delayed too much.
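The batching idea can be sketched independently of the kernel interface: queuing a request does no system call, and a poller hands the accumulated batch to a single submit call. Everything here (io_request, batching_submitter, the submit callback) is hypothetical; in real Linux AIO code the callback would correspond to one io_submit() call carrying the whole array of requests:

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical description of one pending I/O.
struct io_request {
    int fd;
    std::size_t offset;
    std::size_t len;
};

class batching_submitter {
public:
    using submit_fn = std::function<void(const std::vector<io_request>&)>;

    explicit batching_submitter(submit_fn submit) : _submit(std::move(submit)) {}

    // Only records the request; nothing is submitted yet.
    void queue(io_request r) { _pending.push_back(r); }

    // Called by the poller. Submits everything queued so far in one call and
    // returns true if there was any work.
    bool poll() {
        if (_pending.empty()) {
            return false;
        }
        std::vector<io_request> batch;
        batch.swap(_pending);
        _submit(batch);
        return true;
    }
private:
    submit_fn _submit;
    std::vector<io_request> _pending;
};
```

The poller bounds the added latency: a request waits at most until the next poll() rather than until some batch size is reached.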
Reviewed-by: Pekka Enberg <penberg@cloudius-systems.com>
The compressor field specifies which compression algorithm must be used
to compress the sstable data file.
This is a small step towards compressed sstable data files.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
To handle the fact that --data-file-directories is supposed to be 1+
folders.
Note that boost::program_options already "reserves" the use of std::vector
as the receiver of values for multitoken options (i.e. those with more than
one value). Thus, options receiving a list of tokens via the command line
should adhere to the multitoken rules, i.e. space-separated values.
End result is that --data-file-directories now accepts multiple paths,
whitespace separated,
i.e. --data-file-directories <path1> <path2>
And as it turns out, this is really a nicer way of writing stuff than
using "," or ":" separation of paths etc, so...
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
Pekka says:
"This series is the initial table merging code conversion. We now store
column family metadata in the database but without information about the
actual columns."
From Glauber:
"Up until now, we were still generating one future per element that we write.
Now that we have new infrastructure, we can avoid that, and generate only the
ones we really need to. This has the added advantage of lifting the need to do
lambda captures and allowing for a more straightforward forwarding of rest...
parameters"
Up until now, we were still generating one future per element that we write.
Now that we have new infrastructure, we can avoid that, and generate only the
ones we really need to. This has the added advantage of lifting the need to do
lambda captures and allowing for a more straightforward forwarding of rest...
parameters
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We always return a future, but with the threaded writer, we can get rid of
that. So while reads will still return a future, the writer will be able to
return void.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Table merging code needs to compare schema_ptrs for equality so add
comparison operators for column_definition and schema classes.
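Comparing two shared pointers with == only compares addresses, so equality of the pointed-to objects needs operators on the classes themselves. A sketch with hypothetical, much-reduced stand-ins (column_def, table_schema, same_schema) for the real classes:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for a column definition.
struct column_def {
    std::string name;
    std::string type;
};

inline bool operator==(const column_def& a, const column_def& b) {
    return a.name == b.name && a.type == b.type;
}
inline bool operator!=(const column_def& a, const column_def& b) {
    return !(a == b);
}

// Hypothetical, reduced stand-in for a schema.
struct table_schema {
    std::string ks_name;
    std::string cf_name;
    std::vector<column_def> columns;
};

inline bool operator==(const table_schema& a, const table_schema& b) {
    return a.ks_name == b.ks_name && a.cf_name == b.cf_name
        && a.columns == b.columns;
}
inline bool operator!=(const table_schema& a, const table_schema& b) {
    return !(a == b);
}

using table_schema_ptr = std::shared_ptr<const table_schema>;

// Compare the schemas the pointers refer to, not the pointers themselves.
inline bool same_schema(const table_schema_ptr& a, const table_schema_ptr& b) {
    if (a == b) {
        return true;  // same object, or both null
    }
    return a && b && *a == *b;
}
```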
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
For some reason, I added a fsync call when the file underlying the
stream gets truncated. That happens when flushing a file whose
size isn't aligned to the requested DMA buffer.
Instead, fsync should only be called when closing the stream, so this
patch changes the code to do that.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Glauber says:
"The current series fixes a small conversion bug and brings some much needed
cleanups (like in key.cc).
With that in place, it implements and wires support for collections and range
tombstones."
We're looking up shared_ptr<column_identifier> type so make sure we
lookup by value, not by pointer.
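To illustrate the difference: a hash map keyed by shared_ptr hashes and compares the pointer by default, so an equal key held in a different object is not found. Supplying functors that dereference the key gives lookup by value. The identifier type and functor names below are hypothetical, and keys are assumed to be non-null:

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the identifier type.
struct identifier {
    std::string text;
};
using identifier_ptr = std::shared_ptr<identifier>;

// Hash the pointed-to value instead of the pointer.
struct deref_hash {
    std::size_t operator()(const identifier_ptr& p) const {
        return std::hash<std::string>()(p->text);
    }
};

// Compare the pointed-to values instead of the pointers.
struct deref_equal {
    bool operator()(const identifier_ptr& a, const identifier_ptr& b) const {
        return a->text == b->text;
    }
};

// Default functors: lookup by pointer identity.
using by_pointer_map = std::unordered_map<identifier_ptr, int>;
// Dereferencing functors: lookup by value.
using by_value_map =
    std::unordered_map<identifier_ptr, int, deref_hash, deref_equal>;
```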
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
We did so because passing shared pointers in the old code was so much easier.
But that is no longer the case, so we can avoid the reference bump.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This is the code to write a range tombstone. This is not yet wired up to actually do it.
The use case for collections is a lot simpler, and will be handled first.
The actual code should be virtually identical, though.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We can insert markers in the end of composites, which can be used to identify
the presence of ranges in a column.
One option would be to change all methods in sstables/key.hh to take an
optional marker parameter, and append that as the last marker.
But because we are talking about a single byte, and always added to the end,
it's a lot easier to allow the composite to be created normally, and then
replace the last byte with the marker.
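A sketch of the "build normally, then overwrite the last byte" approach. The bytes alias, the marker enum and its values, and with_marker are all assumptions made for illustration; they are not taken from sstables/key.hh:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical byte buffer type for a serialized composite.
using bytes = std::vector<std::int8_t>;

// Hypothetical marker values; the real encoding may differ.
enum class marker : std::int8_t { start_range = -1, none = 0, end_range = 1 };

// Take a composite that was created the usual way and replace its final
// byte with the requested marker. The length does not change.
inline bytes with_marker(bytes composite, marker m) {
    if (composite.empty()) {
        throw std::invalid_argument("empty composite");
    }
    composite.back() = static_cast<std::int8_t>(m);
    return composite;
}
```

Because only the last byte changes, none of the composite-building functions need an extra parameter.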
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Tomek got confused with the fact that we had to pass bytes_type for this code
to work. And well, that's understandable: that code evolved quite a bit since
its first user, and now the interface is not quite the best for the job,
forcing us to employ weird tricks like that for the code to work.
In this cleanup, I am creating a serializer object, that will encode
information about how to serialize the component passed. In the majority of the
cases, a simple sstable_serializer will just serialize to itself - accepting as
parameters byte_views.
In the case we need to operate on a deeply exploded view - the only case for
which we truly needed types, the respective serializer will take a types vector
and use it accordingly.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
In a testament to how confusing our old code was, while collapsing the futures
I ended up getting the end_of_row element inside the loop for clustered keys.
The end of row, as the name implies, should only be written at the end of the
(thrift) row (= CQL partition).
Move it to the right place.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
writer.hh includes sstables.hh which includes writer.hh
We can remove the circular reference if we include core/fstream.hh in writer.hh instead
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
From Glauber:
"This is my attempt to convert the sstable write code to seastar threads.
The code does look a lot cleaner now, and the future path on how to improve
it, a bit more clear."