242 Commits

Author SHA1 Message Date
Pekka Enberg
a358990855 db/legacy_schema_tables: Pass storage_proxy by reference
We always operate on the local storage proxy so pass it by reference.
This simplifies DEFINITIONS_UPDATE message handler where all we have is
a "this" pointer to the local storage proxy.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-07 16:27:58 +03:00
Glauber Costa
5044e5191d gate: use with_gate idiom
Aside from guaranteeing that we will always leave correctly, it will also
allow us to change the implementation of the enter / leave pair without
disrupting existing code.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-05 19:18:34 +03:00
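The idiom named in this commit can be illustrated without Seastar. The following is a minimal synchronous sketch; `simple_gate` and `with_simple_gate` are hypothetical names invented for illustration, and the real Seastar gate works with futures, which this sketch does not model.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical, simplified stand-in for a gate: counts operations in flight.
class simple_gate {
    int _count = 0;
    bool _closed = false;
public:
    void enter() {
        if (_closed) {
            throw std::runtime_error("gate closed");
        }
        ++_count;
    }
    void leave() {
        assert(_count > 0);
        --_count;
    }
    void close() { _closed = true; }
    int in_flight() const { return _count; }
};

// Runs func() between enter() and leave(). The guard's destructor calls
// leave() on every exit path, including when func() throws.
template <typename Func>
auto with_simple_gate(simple_gate& g, Func&& func) {
    g.enter();
    struct leaver {
        simple_gate& gate;
        ~leaver() { gate.leave(); }
    } guard{g};
    return std::forward<Func>(func)();
}
```

Because callers only ever see the wrapper, the enter / leave pair behind it can change without touching them, which is the second benefit the commit message mentions.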
Calle Wilund
abe3851376 Move memtable in_flight_seal.leave() to finally (symmetry) 2015-07-05 16:04:40 +03:00
Paweł Dziepak
183b6fc6d9 db: do not return already expired cells in queries
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-02 17:25:41 +02:00
Tomasz Grabiec
3fc951e807 mutation_partition: Use default value for row_limit in query() 2015-07-02 13:25:46 +02:00
Nadav Har'El
c892228018 compaction: remove compacted sstables
After compaction, remove the source sstables. This cannot be done
immediately, as ongoing reads might be using them, so we mark the sstable
as "to be deleted", and when all references to this sstable are lost and
the object is destroyed, we see this flag and delete the on-disk files.

This patch doesn't change the low-level compact_sstables() (which doesn't
mark its input sstables for deletion), but rather the higher-level example
"strategy" column_family::compact_all_sstables(). I thought we might want
to do this so that future strategies can mark the input sstables for
deletion only after performing other steps and confirming that they do not
want to abort the compaction and return to the old files. If we
decide this isn't needed, we can easily move the mark_for_deletion() call
to compact_sstables().

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-30 15:00:39 +03:00
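The deferred deletion described above can be sketched with reference counting. This is an illustration under assumptions, not the actual sstable code: `fake_sstable` is a hypothetical class, std::shared_ptr stands in for whatever shared pointer the project uses, and "deleting" only records a name instead of removing files.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an sstable. Deleting the on-disk files is
// simulated by appending the name to a list.
class fake_sstable {
    std::string _name;
    std::vector<std::string>& _deleted;
    bool _marked_for_deletion = false;
public:
    fake_sstable(std::string name, std::vector<std::string>& deleted)
        : _name(std::move(name)), _deleted(deleted) {}
    void mark_for_deletion() { _marked_for_deletion = true; }
    ~fake_sstable() {
        if (_marked_for_deletion) {
            _deleted.push_back(_name);   // the real code would unlink files
        }
    }
};

// Returns how many sstables were "deleted" before and after the last
// reader dropped its reference.
std::pair<std::size_t, std::size_t> run_deletion_example() {
    std::vector<std::string> deleted;
    auto sst = std::make_shared<fake_sstable>("sstable-1", deleted);
    auto reader_ref = sst;        // an ongoing read holds a reference
    sst->mark_for_deletion();     // compaction has finished with this input
    sst.reset();                  // the column family drops its reference
    assert(deleted.empty());      // nothing deleted while the read is alive
    std::size_t before = deleted.size();
    reader_ref.reset();           // last reference gone, destructor runs
    std::size_t after = deleted.size();
    return {before, after};
}
```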
Avi Kivity
e85197a806 db: fix std::terminate() called during failed find_schema()
One of the find_schema variants calls a find_uuid() that throws out_of_range,
without converting it to no_such_column_family first.  This results in
std::terminate() being called due to exception specifications.

Fix by converting the exception.
2015-06-29 13:32:28 +03:00
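A minimal sketch of the conversion follows. The types here are simplified and hypothetical (`tiny_database`, integer UUIDs); only the names find_uuid, find_schema and no_such_column_family come from the commit message. Dynamic exception specifications were removed in C++17, so the sketch shows just the conversion, not the std::terminate() path.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Simplified stand-in for the exception callers expect.
struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class tiny_database {
    std::map<std::string, int> _uuids;   // "ks.cf" -> fake uuid
public:
    void add(const std::string& name, int uuid) { _uuids[name] = uuid; }

    // Low-level lookup: std::map::at throws std::out_of_range when the
    // name is unknown.
    int find_uuid(const std::string& name) const { return _uuids.at(name); }

    // Converts the low-level exception into the one callers expect.
    int find_schema(const std::string& name) const {
        try {
            return find_uuid(name);
        } catch (const std::out_of_range&) {
            throw no_such_column_family("no such column family: " + name);
        }
    }
};

// True when looking up `name` fails with no_such_column_family.
bool reports_missing(const tiny_database& db, const std::string& name) {
    try {
        db.find_schema(name);
    } catch (const no_such_column_family&) {
        return true;
    }
    return false;
}

void run_conversion_example() {
    tiny_database db;
    db.add("ks.cf", 42);
    assert(db.find_schema("ks.cf") == 42);
    assert(reports_missing(db, "ks.unknown"));
}
```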
Glauber Costa
e19f7e93a4 database: column families loading
Same as for keyspaces, and we will reuse most of the code. CFs that are
found on disk upon bootstrap are now recreated in memory.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-25 09:50:55 -04:00
Glauber Costa
b3a4eb83d3 database: keyspaces loading
This patch bootstraps the keyspaces found in system sstables and makes
our in-memory structures reflect them. It tries to reuse as much code
as we can from db::legacy_system_tables, but keeping everything local
and without applying any mutations to the database - since the latter
would only unnecessarily complicate the write path.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-25 09:50:55 -04:00
Glauber Costa
c0ad0dc1d3 database: pass storage proxy to init_data_directory
In order for us to call some function from db::legacy_schema_tables, we need
a working storage proxy. We will use those functions in order to leverage the
work done in keyspace / table creation from mutations.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-25 09:50:55 -04:00
Nadav Har'El
3f5114e415 compaction: compact_all_sstables demo function
This is an example of how to use the low-level compact_sstable() function
to compact all the sstables of one column family into one. It is not a
full-fledged "compaction strategy" but the real ones can be based on this
example.

Among the things that this code doesn't do yet is to delete the old
sstables. In the future, this should happen automatically in the sstable
destructor when all the references to the sstable get deleted.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-25 11:05:35 +03:00
Glauber Costa
4d07c952cc remove global create_keyspace
There is only one user in tree, and as Tomek pointed out, it is buggy. The
reason is that ksm is a shard-local structure, and it is currently used
indiscriminately by all shards which can easily lead to problems.

We could fix it with some tricks, but it is way better and safer to make sure
the callers are doing it right instead. There is only one caller, so let's fix
that.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-24 12:44:05 -04:00
Glauber Costa
f0f04892d3 database: no longer replicate create_keyspace code
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-24 12:44:05 -04:00
Avi Kivity
3d22623a6b Merge "Flush schema changes to disk" from Glauber
"This is the current patchset to flush and persist schema changes to disk.
It is not perfect, in the sense that older changes still in flight won't be
waited for. But as we discussed - at this moment we'll just note that, and
leave the fix for later"
2015-06-24 17:08:33 +03:00
Glauber Costa
fd6ec4a7ca database: fix sstables exception
This exception handling code is clearly bogus. That's old code, and it is not
the proper way to propagate it.

Fix it to use then_wrapped. Also include the filename in the message, so we have
a better clue about what happened.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-24 06:16:54 +03:00
Glauber Costa
a6ef9815e9 database: futurize seal_active_memtables
By doing this, it is possible to synchronously wait for the seal to complete by
waiting on this future. This is useful in situations where we want to
synchronously flush data to disk.

Existing callers are not patched, so their current behavior of
asynchronously initiating a write is preserved.

TODO: A better interface would guarantee that all writes before this one are
also complete

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-23 15:23:37 -04:00
Tomasz Grabiec
d4e0e5957b db: Integrate cache with the read path 2015-06-23 13:49:25 +02:00
Tomasz Grabiec
b9288d9fa7 db: Make column_family managed by lw_shared_ptr<>
It will be share-owned by readers.
2015-06-23 13:49:24 +02:00
Tomasz Grabiec
83e7a21dfb mutation: Add apply() helper which works on mutation_opt 2015-06-23 13:49:23 +02:00
Vlad Zolotarov
3520d4de10 locator: introduce a global distributed<snitch_ptr> i_endpoint_snitch::snitch_instance()
Snitch class semantics are defined to be per-node. To make it so, we
introduce a static member in the i_endpoint_snitch class that
holds the pointer to the relevant snitch class instance.

Since the snitch contents are not always pure const it has to be per
shard, therefore we'll make it a "distributed". All the I/O is going
to take place on a single shard and if there are changes - they are going
to be propagated to the rest of the shards.

The application is responsible for initializing this distributed<snitch_ptr>
before it's used for the first time.

This patch effectively reverts most of the "locator: futurize
snitch creation" a2594015f9 patch - the part that modifies the
code that was creating the snitch instance. Since snitch is
created explicitly by the application and all the rest of the code
simply assumes that the above global is initialized we won't need
all those changes any more and the code will get back to be nice and simple
as it was before the patch above.

So, to summarize, this patch does the following:
   - Reverts the changes introduced by a2594015f9 related to the fact that
     every time a replication strategy was created there should have been created
     a snitch that would have been stored in this strategy object. More specifically,
     methods like keyspace::create_replication_strategy() do not return a future<>
     any more, which significantly simplifies the code that calls them.
   - Introduce the global distributed<snitch_ptr> object:
      - It belongs to the i_endpoint_snitch class.
      - A corresponding interface has been added to access both global and
        shard-local instances.
      - locator::abstract_replication_strategy::create_replication_strategy() does
        not accept snitch_ptr&& - it'll get and pass the corresponding shard-local
        instance of the snitch to the replication strategy's constructor by itself.
      - Adjusted the existing snitch infrastructure to the new semantics:
         - Modified the create_snitch() to create and start all per-shard snitch
           instances and update the global variable.
         - Introduced a static i_endpoint_snitch::stop_snitch() function that properly
           stops the global distributed snitch.
         - Added the code to the gossiping_property_file_snitch that distributes the
           changed data to all per-shard snitch objects.
         - Made all existing snitch classes properly maintain their state in order
           to be able to shut down cleanly.
         - Patched both urchin and cql_query_test to initialize a snitch instance before
           all other services.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v6:
   - Rebased to the current master.
   - Extended a commit message a little - the summary.

New in v5:
   - database::create_keyspace(): added a missing _keyspaces.emplace()

New in v4:
   - Kept the database::create_keyspace() to return future<> by Glauber's request
     and added a description to this method that needs to be changed when Glauber
     adds his bits that require this interface.
2015-06-22 23:18:31 +03:00
Shlomi Livne
0ce374a853 Add support for setting keyspace replication_strategy
To support initialization of system tables keyspace replication_strategy
without the need of having snitch creation.

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-06-22 13:19:55 +03:00
Glauber Costa
f4a167670a database: seal active memtables when we close the database
Failing to do so can lead to data not being written to disk when
we terminate.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-21 09:39:31 +03:00
Glauber Costa
1f13d3e38f database: gate seal_active_memtable
We need to do that in order to close the database cleanly, flushing all pending
data before we do.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-21 09:39:29 +03:00
Avi Kivity
f221301d5e Merge "preparation work - system table handling" from Glauber 2015-06-18 17:49:29 +03:00
Tomasz Grabiec
51cae834e3 db: Put all sstables behind single reader
This change abstracts reading from on-disk data sources behind a single
reader which is then composed with memtable readers. This change also
abstracts all data sources behind a single reader obtained via
column_family::make_reader(). That reader is then used by algorithms
like column_family::for_all_partitions() or
column_family::query(). Having those abstractions will make it easier
to add row cache, because it will be encapsulated in a single place.
2015-06-18 16:33:33 +02:00
Tomasz Grabiec
a8fde0847e db: Fix too broad catch clause
The current handling, which ignores the future and a FIXME, should
apply only to the case when a table is missing.
2015-06-18 15:47:40 +02:00
Tomasz Grabiec
3779506990 db: query: Make partition_range hold ring_position
Current model was not really correct because Origin doesn't support
querying of partition ranges by their value. We can query slices
according to dht::decorated_key ordering, which orders partitions
first by token then by key value.

ring_position encapsulates range constraint. Key value is optional, in
which case only token is constrained.
2015-06-18 15:47:40 +02:00
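The ordering described above can be sketched as follows. All types are hypothetical simplifications (an integer token, a string key), and the rule that a position without a key matches every key with the same token is an assumption made for illustration, not something the commit states. The sketch needs C++17 for std::optional.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical simplified decorated key: a token plus the key value.
struct decorated_key_sketch {
    std::int64_t token;
    std::string key;
};

// Orders partitions first by token, then by key value.
bool key_less(const decorated_key_sketch& a, const decorated_key_sketch& b) {
    return std::tie(a.token, a.key) < std::tie(b.token, b.key);
}

// A position on the ring. Without a key, only the token is constrained.
struct ring_position_sketch {
    std::int64_t token;
    std::optional<std::string> key;
};

// Three-way comparison of a decorated key against a ring position.
// Assumption: a position with no key compares equal to any key that has
// the same token.
int compare_to_position(const decorated_key_sketch& dk,
                        const ring_position_sketch& pos) {
    if (dk.token != pos.token) {
        return dk.token < pos.token ? -1 : 1;
    }
    if (!pos.key) {
        return 0;
    }
    int c = dk.key.compare(*pos.key);
    return (c > 0) - (c < 0);
}

void run_ordering_example() {
    decorated_key_sketch a{10, "zebra"};
    decorated_key_sketch b{20, "apple"};
    assert(key_less(a, b));   // the token decides before the key value
    assert(compare_to_position(a, ring_position_sketch{10, std::nullopt}) == 0);
}
```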
Glauber Costa
2f5b1b642b database: be more forgiving with sstables parsing
Currently, Origin generates sstables in the form CF-UUID, where UUID
is a string of numbers.

We also do CF-UUID, but for us, UUID has dashes separating the UUID components.

Due to the current test, we fail to load our current sstables. That test
really isn't that important, since we are currently not doing anything with the
UUID. And if we were, we should be able to accept both formats anyway.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-18 09:22:20 -04:00
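Accepting both directory formats could look like the sketch below. It is not the actual parsing code: `normalize_uuid` and `parse_cf_dir` are hypothetical helpers, the undashed form is assumed to be 32 hexadecimal digits, and the column family name is assumed to contain no dash.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the UUID with dashes removed and letters lowercased, or an empty
// string when the input is neither 32 hex digits nor the dashed
// 8-4-4-4-12 form.
std::string normalize_uuid(const std::string& text) {
    std::string hex;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        if (c == '-') {
            // Dashes are accepted only at the canonical positions.
            if (text.size() != 36 || (i != 8 && i != 13 && i != 18 && i != 23)) {
                return "";
            }
        } else if (std::isxdigit(c)) {
            hex.push_back(static_cast<char>(std::tolower(c)));
        } else {
            return "";
        }
    }
    return hex.size() == 32 ? hex : "";
}

// Splits "CF-UUID" at the first dash (assumption: the column family name
// itself has no dash). Returns false if the name does not parse.
bool parse_cf_dir(const std::string& dir, std::string& cf, std::string& uuid_hex) {
    std::size_t dash = dir.find('-');
    if (dash == std::string::npos || dash == 0) {
        return false;
    }
    cf = dir.substr(0, dash);
    uuid_hex = normalize_uuid(dir.substr(dash + 1));
    return !uuid_hex.empty();
}

void run_uuid_example() {
    std::string cf1, id1, cf2, id2;
    assert(parse_cf_dir("users-0290003c977e397cac3efdfdc01d626b", cf1, id1));
    assert(parse_cf_dir("users-0290003c-977e-397c-ac3e-fdfdc01d626b", cf2, id2));
    assert(cf1 == cf2 && id1 == id2);   // both forms name the same table
}
```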
Glauber Costa
efc57ef65e create system directory if it doesn't exist
The system keyspace is not created in the same way as the others, and it
would be hard to convert because it is created inside the database
constructor. Make sure the system directory is created when the database boots.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-18 09:22:20 -04:00
Glauber Costa
057c38b61c only populate system keyspace
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-18 09:22:20 -04:00
Glauber Costa
a95a529865 database: allow for empty data file directories
A lot of our tests run in memory only, but now that our write path is complete,
we may start running into problems soon, as we write down the sstables.

It would be nice to force the database to run in-memory only in some situations.
Even in the real world, some scenarios may benefit from that in the future.

This patch forces durable_writes to be always false in case we force the data
directory to be an empty list.

For system tables, the patch also fixes a bug. Because system tables were
forcibly initialized with durable_writes = false, we would never write them to
disk, even when we were supposed to.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-18 09:22:20 -04:00
Pekka Enberg
8345874dda database: Add database::has_schema() helper
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-17 15:45:45 +03:00
Nadav Har'El
78a8ac8470 Make mutation_reader usable outside database.cc
The "mutation_reader" defined in database.cc is a convenient mechanism
for iterating over mutations. It can be useful for more than just
database.cc (I want to use it in the compaction code), so this patch moves
the type's definition to mutation.hh, and the make_memtable_reader()
function to memtable::make_reader() (in memtable.hh).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-16 14:03:34 +02:00
Gleb Natapov
2d409250f2 remove ad-hoc token_metadata creation 2015-06-15 12:51:09 +03:00
Vlad Zolotarov
e045d8465c db: use snitch name from the configuration file
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-14 15:31:58 +03:00
Vlad Zolotarov
03ffaea768 locator: introduce i_endpoint_snitch::create_snitch()
- Kill make_snitch().
   - i_endpoint_snitch::create_snitch() uses the utilities from class_registrator.hh.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-06-14 15:31:49 +03:00
Gleb Natapov
b7155ad862 pass partitions_ranges separately from read_command
partitions_ranges will be split across different destinations, so provide
it separately from read_command to avoid copying the latter for each
destination.
2015-06-11 15:18:07 +03:00
Calle Wilund
65f25e1840 Database/commitlog - Fix broken assert
Previous patch added an assert that is not true in the case a test runs
without an attached commit log, yet still generates enough mutations to cause
a memtable flush.

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-10 15:20:36 +03:00
Calle Wilund
8b9a63a3c6 Database/commitlog: guard against replay position reordering
Commit log guarantees that once an RP is assigned to a data frame/caller, it
will not block before returning the result via future. However, this is not
enough, since we could
a.) Have blocked earlier, in which case the return value processing will be
async anyway
b.) Even if no blocking takes place, future chaining mechanism could decide
it has to reorder execution.

Assuming though that the case where this happens is rare, and cases where it
actually affects the rule of replay position ordering are even rarer, we can
guard against it by simply keeping track of the highest RP _discarded_ (sent
to sstable flush), and if we attempt to apply a mutation with a higher RP,
simply re-do the operation (i.e. write same entry to commit log again).

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-10 11:56:45 +03:00
Vlad Zolotarov
a2594015f9 locator: futurize snitch creation
- Forbid explicit snitch creation with constructor.
   - Allow the creation of snitches only with locator::make_snitch() template
     function.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>

New in v4:
   - Make sure the snitch is stopped before it's destroyed when _snitch_is_ready
     is returned in an exceptional state.

New in v2:
   - Change snitch_ptr to be std::unique_ptr<i_endpoint_snitch>
   - abstract_replication_strategy::create_replication_strategy(): explicitly
     specify (template) types of create_object() parameters.
   - Re-arrange the loop in merge_keyspaces() so that lambdas that depend on
     "this" complete before there is a chance that "this" gets destroyed.
   - create_keyspace(): Don't add a new keyspace if a keyspace with this name
     already exists.
   - i_endpoint_snitch: added a stop() virtual method
      - Added a stop() pure virtual method.
      - Added an enum class snitch_state and a _state member initialized to snitch_state::initializing,
        added an assert() in a destructor requiring _state to become snitch_state::stopped,
        which should be set when stop() is complete.
   - rack_inferring_snitch: added a stop() method.
   - simple_snitch: added a stop() method.
   - Added stop() methods to abstract_replication_strategy and keyspace.
   - Updated database::stop() to wait for all keyspaces in _keyspaces to stop.
2015-06-09 15:33:38 +03:00
Shlomi Livne
bd89fa4905 config: add string_list (vec of sstring) as config data type + use for datadir
To handle the fact that --data-file-directories is supposed to be 1+
folders.

Note that boost::program_options already "reserves" the use of std::vector
as the receiver of values for multitoken options (i.e. those with more than
one value). Thus, options receiving a list of tokens via the command line
should adhere to the multi-token rules, i.e. space-separated values.

The end result is that --data-file-directories now accepts multiple paths,
whitespace separated,
i.e. --data-file-directories <path1> <path2>
And as it turns out, this is really a nicer way of writing stuff than
using "," or ":" separation of paths etc, so...

Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-09 10:40:45 +03:00
Pekka Enberg
87e525b6b5 database: Add update and drop column family stubs
They're needed by table merging in db/legacy_schema_tables.cc.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-08 14:42:36 +03:00
Asias He
1aac08b8ab Revert "storage_service: Remove ad-hoc token_metadata creation"
This reverts commit a19d2171eb.

This commit breaks cql_query_test.

   [asias@hjpc urchin]$ ./cql_query_test
   Running 1 test case...
   WARNING: Not implemented: COMPACT_TABLES
   WARNING: Not implemented: METRICS
   WARNING: Not implemented: PERMISSIONS
   cql_query_test: core/distributed.hh:290: Service&
   distributed<Service>::local() [with Service =
   service::storage_service]: Assertion `local_is_initialized()' failed.
   unknown location(0): fatal error in "test_create_keyspace_statement":
   signal: SIGABRT (application abort requested)
   tests/test-utils.cc(31): last checkpoint

   *** 1 failure detected in test suite "tests/urchin/cql_query_test.cc"
   (gdb) bt
   #0  0x00000032930348d7 in __GI_raise (sig=sig@entry=6) at
   ../sysdeps/unix/sysv/linux/raise.c:55
   #1  0x000000329303653a in __GI_abort () at abort.c:89
   #2  0x000000329302d47d in __assert_fail_base (fmt=0x3293186cb8
   "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
   assertion=assertion@entry=0x8ec10a "local_is_initialized()",
   file=file@entry=0x92508d "core/distributed.hh",
       line=line@entry=290, function=function@entry=0x8ed440
   <distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
   "Service& distributed<Service>::local() [with Service =
   service::storage_service]")
       at assert.c:92
   #3  0x000000329302d532 in __GI___assert_fail (assertion=0x8ec10a
   "local_is_initialized()", file=0x92508d "core/distributed.hh",
   line=290,
       function=0x8ed440
   <distributed<service::storage_service>::local()::__PRETTY_FUNCTION__>
   "Service& distributed<Service>::local() [with Service =
   service::storage_service]") at assert.c:101
   #4  0x0000000000430f19 in local (this=<optimized out>) at
   core/distributed.hh:290
   #5  get_local_storage_service () at service/storage_service.hh:3326
   #6  keyspace::create_replication_strategy (this=0x7ffff6bf8350) at
   database.cc:690
   #7  0x000000000061537a in
   _ZZZN2db20legacy_schema_tables15merge_keyspacesERN7service13storage_proxyEOSt3mapI13basic_sstringIcjLj15EE13lw_shared_ptrIN5query10result_setEESt4lessIS6_ESaISt4pairIKS6_SA_EEESI_ENKUlRT_E0_clISt6ve
   ctorISF_SG_EEEDaSK_ENKUlR8databaseE_clESQ_ () at
   db/legacy_schema_tables.cc:584
   #8  0x0000000000617d19 in operator() (__closure=0x7ffff6bf8650) at
   ./core/distributed.hh:284

In the test, storage_service and other services are not started.

Let's revert it and figure out a way to run cql_query_test with the
needed services started properly and then bring the "storage_service:
Remove ad-hoc token_metadata creation" change back.
2015-06-05 08:21:59 +03:00
Asias He
a19d2171eb storage_service: Remove ad-hoc token_metadata creation
Use token_metadata from storage_service when creating a
replication_strategy in keyspace::create_replication_strategy.
2015-06-04 17:16:50 +08:00
Asias He
edee90550c database: Fix boost::find compile error
boost::find confuses the compiler when both <boost/algorithm/string/find.hpp> and
<boost/range/algorithm/find.hpp> are included.
2015-06-04 17:12:09 +08:00
Avi Kivity
7fa17d9880 Merge "range query read path"
Conflicts:
	database.cc
2015-06-04 10:21:48 +03:00
Avi Kivity
9765eda012 db: drop memtables that were successfully flushed 2015-06-03 16:39:53 +03:00
Avi Kivity
a71c287c10 db: add sstables to the range scan read path 2015-06-03 16:39:46 +03:00
Calle Wilund
5418673659 Column family seal_active_memtable fix: don't use local by ref in cont.
Signed-off-by: Calle Wilund <calle@cloudius-systems.com>
2015-06-03 13:43:47 +02:00
Tomasz Grabiec
b2549a7b14 Merge branch 'calle/secondary_index' from seastar-dev.git 2015-06-03 13:22:01 +02:00