76 Commits

Author SHA1 Message Date
Piotr Jastrzebski
ec3d59bf13 Add flag to configure max size of a cached partition.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
(cherry picked from commit 636a4acfd0)
2016-07-27 14:09:34 +03:00
Tomasz Grabiec
35c1781913 schema_tables: Fix hang during keyspace drop
Fixes #1484.

We drop tables as part of keyspace drop. Table drop starts with
creating a snapshot on all shards. All shards must use the same
snapshot timestamp which, among other things, is part of the snapshot
name. The timestamp is generated using supplied timestamp generating
function (joinpoint object). The joinpoint object will wait for all
shards to arrive and then generate and return the timestamp.

However, we drop tables in parallel, using the same joinpoint
instance. So joinpoint may be contacted by snapshotting shards of
tables A and B concurrently, generating timestamp t1 for some shards
of table A and some shards of table B. Later the remaining shards of
table A will get a different timestamp. As a result, different shards
may use different snapshot names for the same table. The snapshot
creation will never complete because the sealing fiber waits for all
shards to signal it, on the same name.

The fix is to give each table a separate joinpoint instance.

Message-Id: <1469117228-17879-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit 5e8f0efc85)
2016-07-22 15:36:45 +02:00
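
The hang described in 35c1781913 above can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions: `toy_joinpoint` and `snapshot_names_agree` are hypothetical names, not Scylla's joinpoint class, and the model simply gives every group of `shard_count` arrivals one shared timestamp instead of actually waiting for shards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy, single-threaded stand-in for a joinpoint (hypothetical, not Scylla's
// class): every group of `shard_count` arrivals shares one timestamp.
class toy_joinpoint {
    std::size_t _shard_count;
    std::size_t _arrivals = 0;
    long _timestamp;
public:
    toy_joinpoint(std::size_t shard_count, long first_timestamp)
        : _shard_count(shard_count), _timestamp(first_timestamp) {}

    // Returns the timestamp of the current group; once the group is full,
    // the next group gets a new value.
    long arrive() {
        long ts = _timestamp;
        if (++_arrivals % _shard_count == 0) {
            ++_timestamp;
        }
        return ts;
    }
};

// `arrivals` lists table names in the order their shards reach the joinpoint.
// Returns true if all shards of every table observed the same timestamp.
bool snapshot_names_agree(const std::vector<std::string>& arrivals,
                          std::size_t shard_count,
                          bool joinpoint_per_table) {
    toy_joinpoint shared(shard_count, 100);
    std::map<std::string, toy_joinpoint> per_table;
    std::map<std::string, long> first_seen;
    for (const std::string& table : arrivals) {
        long ts = joinpoint_per_table
            ? per_table.try_emplace(table, shard_count, 100L).first->second.arrive()
            : shared.arrive();
        auto [it, inserted] = first_seen.emplace(table, ts);
        if (!inserted && it->second != ts) {
            return false; // two shards of one table disagree on the snapshot name
        }
    }
    return true;
}
```

With two shards per table and the interleaved arrival order A, B, A, B, the shared instance hands table A two different values, while one instance per table keeps each table consistent.
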
Tomasz Grabiec
9c430c2cff schema_tables: Add more logging
Message-Id: <1468917771-2592-1-git-send-email-tgrabiec@scylladb.com>
(cherry picked from commit a0832f08d2)
2016-07-20 10:13:28 +03:00
Duarte Nunes
aacc7193f2 schema: Replace keyspace's schema_ptr on CF update
This patch ensures we replace the schema_ptr held by its respective
keyspace object when a column family is being updated.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20160623085710.26168-1-duarte@scylladb.com>
2016-06-23 11:11:52 +02:00
Calle Wilund
8cdf4e37fb schema_tables: Fix merge_keyspaces to handle alter keyspace
Must keep "altered" alive into the call chain.
2016-05-10 14:32:51 +00:00
Duarte Nunes
809b45e160 udt: Add drop type statement
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 18:07:02 +02:00
Duarte Nunes
d1f215b743 udt: Merge user defined type mutations
This patch implements the merge_types() function,
allowing mutations to user defined types to be applied.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
d6d29f7c52 schema: Replace ad hoc func with indirect_equal_to
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
dd75fe8ec0 udt: Add mutations for user defined types
This patch implements mutations for user defined types.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Duarte Nunes
c7b3a4b144 udt: Parse user types system table
This patch loads and parses the user types system table during
bootstrap.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-04-20 09:54:06 +02:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Avi Kivity
a919113fdb schema_tables: fix deadlock in cross-node communications
Seastar wrongly limits the number of concurrent submit_to()s to a single
remote shard.  This can cause an ABBA deadlock:

  fiberA                fiberB (x127)
  submit_to(0)                         # lock schema
  <- returns
                        submit_to(0)   # lock schema (waits)
  submit_to(0)                         # do work (waits)

The fiberBs wait for fiberA, which in turn waits for a fiberB to return.

While the correct fix is to remove the client-side limit and replace it
with a server-side per-verb limit, we start with a simpler fix that
replaces the blocking lock call with a non-blocking call, removing the
deadlock.

Fixes #1088.

Message-Id: <1459095357-28950-1-git-send-email-avi@scylladb.com>
2016-03-28 10:12:10 +03:00
Tomasz Grabiec
53bbcf4a1e schema_tables: Wait for notifications to be processed.
Listeners may defer since:

 93015bcc54 "migration_manager: Make the migration callbacks runs inside seastar thread"

Not all places were adjusted to wait for them. Fix that.

Message-Id: <1458837613-27616-1-git-send-email-tgrabiec@scylladb.com>
2016-03-24 19:04:12 +02:00
Asias He
93015bcc54 migration_manager: Make the migration callbacks runs inside seastar thread
At the moment, the callbacks return void, so it is impossible to wait for
the callbacks to complete. Make the callbacks run inside a seastar
thread, so if we need to wait for the callback, we can make it call
foo_operation().get() in the callback. It is easier than making the
callbacks return future<>.
2016-03-15 15:41:23 +08:00
Glauber Costa
a339296385 database: turn sstable generation number into an optional
This patch makes sure that every time we need to create a new generation number
(the very first step in the creation of a new SSTable), the respective CF is already
initialized and populated. Failure to do so can lead to data being overwritten.
Extensive details about why this is important can be found
in Scylla's GitHub issue #1014.

Nothing should be writing to SSTables before we have the chance to populate the
existing SSTables and calculate what the next generation number should be.

However, if that happens, we want to protect against it in a way that does not
involve overwriting existing tables. This is one of the ways to do it: every
column family starts in an unwriteable state, and when it can finally be written
to, we mark it as writeable.

Note that this *cannot* be a part of add_column_family. That adds a column family
to a db in memory only, and if anybody is about to write to a CF, that was most
likely already called. We need to call this explicitly when we are sure we're ready
to issue disk operations safely.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-10 21:06:05 -05:00
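
The idea in a339296385 above, a column family that cannot hand out generation numbers until it is explicitly marked writeable, can be sketched as follows. This is a minimal illustration, not the actual patch: `toy_column_family`, `mark_ready_for_writes` and `next_generation` are hypothetical names, and throwing `std::logic_error` is an assumed way to refuse early writes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical model: the generation counter is empty until existing
// SSTables have been populated, so early writers cannot reuse a number.
class toy_column_family {
    std::optional<std::uint64_t> _last_generation;
public:
    bool writeable() const { return _last_generation.has_value(); }

    // Called explicitly once the existing SSTables are known.
    void mark_ready_for_writes(std::uint64_t highest_existing_generation) {
        _last_generation = highest_existing_generation;
    }

    // Refuses to produce a generation before the CF is marked writeable.
    std::uint64_t next_generation() {
        if (!_last_generation) {
            throw std::logic_error("column family is not ready for writes");
        }
        return ++*_last_generation;
    }
};
```

Keeping the counter in an optional makes the "not yet populated" state explicit instead of relying on a sentinel value such as zero.
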
Tomasz Grabiec
04f2482d74 schema_tables: Log results of schema merge
Currently schema changes are only logged at the coordinator node, which
initiates the change. It would be helpful in post-mortem analysis to
also see when and how schema changes are resolved when applied on
other nodes.
Message-Id: <1456953095-1982-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 11:12:15 +02:00
Calle Wilund
590ec1674b truncate: Require timestamp join-function to ensure equal values
Fixes #937

In fixing #884, truncation not truncating memtables properly,
time stamping in truncate was made shard-local. This however
breaks the snapshot logic, since for all shards in a truncate,
the sstables should snapshot to the same location.

This patch adds a required function argument to truncate (and
by extension drop_column_family) that produces a time stamp in
a "join" fashion (i.e. same on all shards), and utilizes the
joinpoint type in caller to do so.

Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>
2016-02-24 18:59:31 +02:00
Calle Wilund
18203a4244 database::truncate/drop: Move time stamp generation to shard
Fixes #884

Time stamps for truncation must be generated after flush, either by
splitting the truncate into two (or more) for-each-shard operations,
or simply by doing time stamping per shard (this solution).

We generate TS on each shard after flushing, and then rely on the
actual stored value to be the highest time point generated.

From a batch replay point of view, however, this should be functionally
equivalent and not a problem.
2016-02-09 15:45:37 +00:00
Gleb Natapov
63a5aa6122 prevent superfluous frozen_mutation copying
Sometimes frozen_mutation is copied while it can be moved instead. Fix
those cases.

Message-Id: <20160204165708.GI6705@scylladb.com>
2016-02-07 10:54:16 +02:00
Paweł Dziepak
4927ff95da schema: read collections from comparator
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-18 08:35:33 +01:00
Tomasz Grabiec
e62857da48 schema_tables: Wait for make_directory_for_column_family() to finish in merge_tables()
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
71bbbceced schema_tables: Notify about table creation after it is fully inited
I'm not aware of any issues it could cause, but it makes more sense
that way.
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
8deb3f18d3 query_processor: Invalidate prepared statements when columns change
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :

"1. Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column
3. execute the prepared statement
Expected result - get all the columns including the new column
Actual result - get the columns except the new column"
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
d80ffc580f schema_tables: Notify about table schema update
2016-01-11 10:34:54 +01:00
Tomasz Grabiec
8817e9613d migration_manager: Simplify notifications
Currently the notify_*() method family broadcasts to all shards, so
schema merging code invokes them only on shard 0, to avoid doubling
notifications. We can simplify this by making the notify_*() methods
per-instance and thus shard-local.
2016-01-11 10:34:54 +01:00
Paweł Dziepak
f24f677dde db/schema_tables: simplify column difference computation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Paweł Dziepak
ae3acd0f9c system_tables: store schema::dropped_columns in system tables
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-11 10:34:54 +01:00
Tomasz Grabiec
d8ff9ee441 schema_tables: Make merge_tables() compare by mutations
Schema version is calculated from mutations, so merge_schema should
also look at mutation changes to detect schema changes whenever
version changes.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
5707c5e7ca schema_tables: Simplify merge_tables() and merge_keyspaces()
read_schema_for_keyspaces() drops empty results so the emptiness
checks are always false and we can remove some redundancy.
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
bfefe5a546 schema_tables: Calculate digest from mutations
We want the node's schema version to change whenever
table_schema_version of any table changes. The latter is calculated by
hashing mutations so we should also use mutation hash when calculating
schema digest.
2016-01-11 10:34:53 +01:00
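
To illustrate why bfefe5a546 above derives the digest from mutations, here is a toy digest that changes whenever any mutation changes. It is a sketch under assumptions: mutations are modelled as plain strings, the hash is FNV-1a (a swapped-in technique, not the hash Scylla uses), and sorting the inputs first is an illustrative choice to make the result independent of the order in which mutations are supplied. `toy_schema_digest` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy digest over string-encoded mutations (hypothetical encoding).
// Uses 64-bit FNV-1a; sorting makes the digest order-independent.
std::uint64_t toy_schema_digest(std::vector<std::string> mutations) {
    std::sort(mutations.begin(), mutations.end());
    std::uint64_t h = 14695981039346656037ULL;   // FNV-1a offset basis
    auto mix = [&h](unsigned char byte) {
        h ^= byte;
        h *= 1099511628211ULL;                   // FNV-1a prime
    };
    for (const std::string& m : mutations) {
        for (unsigned char c : m) {
            mix(c);
        }
        mix(0); // separator so adjacent mutations cannot run together
    }
    return h;
}
```

Two nodes that hold the same set of mutations compute the same value regardless of input order, and changing a single mutation yields a different value.
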
Tomasz Grabiec
b91c92401f migration_manager: Implement migration_manager::announce_column_family_update
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
8164902c84 schema_tables: Change column_family schema on schema sync
Notifications are not implemented yet.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
04eb58159a query: Add schema_version field to read_command
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
f58c2dec1e schema: Make schema objects versioned
The version needs to change not only on structural changes but also on
temporal ones. This is needed for nodes to detect whether they have
already synchronized with the version they see, even if it has the same
structure as past versions. We also need to end up with the same
version on all nodes when schema changes are commuted.

For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
fdb9e01eb4 schema_tables: Use schema_mutations for schema_ptr translations
We will be able to reuse the code in frozen_schema. We need to read
data in mutation form so that we can construct the correct
schema_table_version, and attach the mutations to schema_ptr.
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
d07e32bc32 schema_tables: Simplify schema building invocation chain
2016-01-08 21:10:26 +01:00
Tomasz Grabiec
3c3ea20640 schema_tables: Drop pkey parameter from add_table_to_schema_mutation()
It simplifies add_table_to_schema_mutation() interface.

The current code is also a bit confusing: partition_key is created
with the keyspaces() schema and used in mutations destined for the
columnfamilies() schema. It works because the types are the same, but
it looks a bit scary.
2016-01-08 21:10:26 +01:00
Pekka Enberg
e56bf8933f Improve not implemented errors
Print out the function name where we're throwing the exception from to
make it easier to debug such exceptions.
2015-12-18 10:51:37 +01:00
Tomasz Grabiec
bc23ebcbc3 schema_tables: Replace schema_result::value_type with equivalent movable type
future<> requires and will assert nothrow move constructible types.
2015-12-07 09:50:27 +01:00
Tomasz Grabiec
8d88ece896 schema_tables: Fix "comment" property not being loaded from storage
2015-11-30 10:57:36 +02:00
Avi Kivity
2c3591cbd9 data_value de-any-fication
We use boost::any to convert to and from database values (stored in
serialized form) and native C++ values.  boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance.  We later retrieve the real value using
boost::any_cast.

However, data_value (which has a boost::any member) already has type
information as a data_type instance.  By teaching data_type instances about
the corresponding native type, we can eliminate the use of boost::any.

While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type.  We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
2015-10-30 17:38:51 +01:00
Raphael S. Carvalho
6bea503f9a db: fallback to sizetiered if compaction strategy isn't supported
It may happen that the user migrates a table to Scylla whose
compaction strategy isn't supported yet, such as date-tiered.
Let's handle that by falling back to size-tiered compaction
strategy and printing a warning message.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-29 09:33:28 +02:00
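
The fallback described in 6bea503f9a above can be sketched as a lookup with a default branch. This is an illustration only: `toy_compaction_strategy` and `strategy_from_class_name` are hypothetical names, the set of supported strategies is an assumption, and the warning text is made up for the example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Assumed set of supported strategies, for illustration only.
enum class toy_compaction_strategy { size_tiered, leveled };

// Maps a strategy class name to a supported strategy. Unknown names fall
// back to size-tiered and a warning is written to `log`.
toy_compaction_strategy strategy_from_class_name(const std::string& name,
                                                 std::ostream& log) {
    if (name == "SizeTieredCompactionStrategy") {
        return toy_compaction_strategy::size_tiered;
    }
    if (name == "LeveledCompactionStrategy") {
        return toy_compaction_strategy::leveled;
    }
    log << "warning: compaction strategy " << name
        << " is not supported, falling back to size-tiered\n";
    return toy_compaction_strategy::size_tiered;
}
```

Passing the log stream as a parameter keeps the sketch testable; a real implementation would use the project's own logger.
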
Raphael S. Carvalho
a21af32eed db: do not ignore compaction strategy class
When building the in-memory schema for a column family, we were
ignoring compaction strategy class because of a bug in the
existing code. Example: suppose that you create a column family
with leveled compaction strategy. This option would be ignored
and the default strategy (size-tiered) would be used instead.
Found this problem while working on leveled compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-18 11:06:37 +03:00
Glauber Costa
e99e418238 schema_tables: make sure CF directory exists upon creation
In Cassandra, when you create a new column family, a directory for it
immediately appears under the KS directory.

In the past, we have made a decision to delay that creation until the first
SSTable is created, which works well in general.

There is a problem, however, for backup restoration: the standard procedure to
call loadNewSSTables is to do that in an empty directory. But the directory
simply won't be there until we create the first SSTable: bummer!

In the current incarnation of the code in schema_tables.cc, there is already
some code that runs on CPU0 only. That is a perfect place for the directory
creation. So let's do it.

After this patch, a directory for the CF appears right after the CF creation.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-17 13:08:07 +02:00
Glauber Costa
b2fef14ada do not calculate truncation time independently
Currently, we are calculating truncated_at during truncate() independently for
each shard. It will work if we're lucky, but it is fairly easy to trigger cases
in which each shard will end up with a slightly different time.

The main problem here is that this time is used as the snapshot name when auto
snapshots are enabled. Prior to my last fixes, this would just generate two
separate directories in this case, which is wrong but not severe.

But after the fix, this means that both shards will wait for one another to
synchronize and this will hang the database.

Fix this by making sure that the truncation time is calculated before
invoke_on_all in all needed places.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-09 17:17:11 +03:00
Pekka Enberg
95012793e5 db/schema_tables: Wire up drop keyspace notifications
Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-08 13:10:48 +02:00
Pekka Enberg
5878f62b18 db/schema_tables: Clean up indentation
Almost the whole file is (accidentally) indented four spaces to the
right for no reason. Fix that up because it's annoying as hell.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
1f9e769dd3 db/schema_tables: Remove obsolete ifdef'd code
Remove ifdef'd code that we won't be converting to C++ because of design
differences.

Signed-off-by: Pekka Enberg <penberg@scylladb.com>
2015-10-06 17:09:27 +02:00
Pekka Enberg
6e304cd58c db/schema_tables: Fix merge_keyspaces() to actually drop keyspaces
When we query schema keyspaces after we have applied a delete mutation,
the dropped keyspace does not exist in the "after" result set. Fix the
merge_keyspaces() algorithm to take that into account.

Make merge_keyspaces() actually call database::drop_keyspace() when a
keyspace is dropped.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-10-06 14:53:35 +03:00
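
The observation in 6e304cd58c above, that a dropped keyspace is present in the "before" result set but absent from the "after" one, amounts to a set difference. A minimal sketch, assuming keyspaces are identified by name; `dropped_keyspaces` is a hypothetical helper, not a function from schema_tables.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the keyspace names present before the schema merge but missing
// afterwards, i.e. the keyspaces that should be dropped.
std::vector<std::string> dropped_keyspaces(const std::set<std::string>& before,
                                           const std::set<std::string>& after) {
    std::vector<std::string> dropped;
    std::set_difference(before.begin(), before.end(),
                        after.begin(), after.end(),
                        std::back_inserter(dropped));
    return dropped;
}
```

Each returned name would then be passed to the drop path, which in the commit above is database::drop_keyspace().
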