Since the table is written from all shards, and the time stamps may
conflict, we define the truncated_at time as the highest time point,
i.e. the conservative choice.
Truncation records are not portable between us and Origin.
We need to detect origin records from a migrated system and ensure
that we neither try to use them nor, more to the point, crash because
of a data format error when loading them.
This problem was seen by Tzach when doing a migration from an origin
setup.
Updated record storage to use IDL-serialized types + added versioning
and magic marking + odd-size-checking to ensure we load only correct
data. The code will also deal with records from an older version of
scylla.
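A minimal sketch of the load-time checks described above. The header
layout (4-byte magic followed by a 4-byte version), the constant values
and the names classify_record/make_record are assumptions made for
illustration, not the actual record format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical constants; the real magic and version values are not
// stated in the text.
constexpr std::uint32_t record_magic = 0x54524352;
constexpr std::uint32_t current_version = 1;

enum class record_kind { ours, foreign };

// Build a blob with the assumed header, for demonstration only.
inline std::vector<std::uint8_t> make_record(std::uint32_t version) {
    std::vector<std::uint8_t> blob(2 * sizeof(std::uint32_t));
    std::memcpy(blob.data(), &record_magic, sizeof(record_magic));
    std::memcpy(blob.data() + sizeof(record_magic), &version, sizeof(version));
    return blob;
}

// Classify a stored blob without throwing: anything too short, with the
// wrong magic, or with an unknown version is treated as foreign data
// (for example an origin record) and must be skipped by the caller.
inline record_kind classify_record(const std::vector<std::uint8_t>& blob) {
    constexpr std::size_t header_size = 2 * sizeof(std::uint32_t);
    if (blob.size() < header_size) {
        return record_kind::foreign;
    }
    std::uint32_t magic = 0;
    std::uint32_t version = 0;
    std::memcpy(&magic, blob.data(), sizeof(magic));
    std::memcpy(&version, blob.data() + sizeof(magic), sizeof(version));
    if (magic != record_magic) {
        return record_kind::foreign;
    }
    if (version == 0 || version > current_version) {
        return record_kind::foreign;
    }
    return record_kind::ours;
}
```

The point of the sketch is that every check runs before any field is
interpreted, so a foreign blob is rejected rather than parsed.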
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.
Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.
Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.
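The lifetime rule above can be illustrated with a small sketch. It
assumes a registry that only keeps weak references, so a version stays
resolvable exactly as long as some requestor holds a live schema_ptr.
The class schema_registry_sketch and its methods learn/find are
hypothetical, and a string stands in for the version UUID.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

struct schema {
    std::string version;  // stand-in for the table_schema_version UUID
};
using schema_ptr = std::shared_ptr<const schema>;

class schema_registry_sketch {
    // Weak references: the registry itself does not keep schemas alive.
    std::map<std::string, std::weak_ptr<const schema>> _by_version;
public:
    // Register a version and hand ownership to the caller.
    schema_ptr learn(std::string version) {
        auto s = std::make_shared<const schema>(schema{std::move(version)});
        _by_version[s->version] = s;
        return s;
    }
    // Returns nullptr if nobody holds the schema any more.
    schema_ptr find(const std::string& version) const {
        auto it = _by_version.find(version);
        if (it == _by_version.end()) {
            return nullptr;
        }
        return it->second.lock();
    }
};
```

Under this assumption, a requestor that drops its schema_ptr before the
reply arrives may find that the version can no longer be looked up.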
Schema requesting across nodes is currently stubbed (throws runtime
exception).
The version needs to change value not only on structural changes but
also on temporal ones. This is needed for nodes to detect if the version they
see was already synchronized with or not even if it has the same
structure as the past versions. We also need to end up with the same
version on all nodes when schema changes are commuted.
For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
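A rough sketch of the static-schema case, assuming a plain FNV-1a hash
over the software version string and a column id. The real version is a
UUID and the actual hash function is not stated here, so the 64-bit
result and the name static_schema_version are illustrative only. A
fixed algorithm is used instead of std::hash so that every node
computes the same value.

```cpp
#include <cstdint>
#include <string_view>

constexpr std::uint64_t fnv_offset = 14695981039346656037ull;
constexpr std::uint64_t fnv_prime = 1099511628211ull;

// FNV-1a over a byte string, continuing from an existing hash state.
inline std::uint64_t fnv1a(std::string_view data, std::uint64_t h = fnv_offset) {
    for (unsigned char c : data) {
        h ^= c;
        h *= fnv_prime;
    }
    return h;
}

// Hypothetical helper: mix the software version and a column id into
// one deterministic value. The id is fed in byte by byte, lowest byte
// first, so the result does not depend on host endianness.
inline std::uint64_t static_schema_version(std::string_view software_version,
                                           std::uint32_t column_id) {
    std::uint64_t h = fnv1a(software_version);
    for (int i = 0; i < 4; ++i) {
        h ^= (column_id >> (8 * i)) & 0xffu;
        h *= fnv_prime;
    }
    return h;
}
```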
Current service initialization is a total mess in cql_test_env. Start
the services in the same order as in main.cc.
Fixes #715, #716
'./test.py --mode release' passes.
This method is intended to return content of the system table
COMPACTION_HISTORY as a vector of compaction_history_entry.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
1) Start node 1, node 2, node 3
2) Stop node 3
3) Start node 4 to replace node 3
4) Kill node 4 (removal of node 3 in system.peers is not flushed to disk)
5) Start node 4 (will load node 3's token and host_id info in bootup)
This causes the
"Token .* changing ownership from 127.0.0.3 to 127.0.0.4"
messages to be printed again in step 5), which is unexpected and fails the dtest:
FAIL: replace_first_boot_test (replace_address_test.TestReplaceAddress)
----------------------------------------------------------------------
Traceback (most recent call last):
File "scylla-dtest/replace_address_test.py",
line 220, in replace_first_boot_test
self.assertEqual(len(movedTokensList), numNodes)
AssertionError: 512 != 256
Backport: CASSANDRA-8801
a53a6ce Decommissioned nodes will not rejoin the cluster.
Tested with:
topology_test.py:TestTopology.decommissioned_node_cant_rejoin_test
Since bytes is a very generic value that is returned from many calls,
it is easy to pass it by mistake to a function expecting a data_value,
and to get a wrong result. It is impossible for the data_value constructor
to know if the argument is a genuine bytes variable, a data_value of another
type, but serialized, or some other serialized data type.
To prevent misuse, make the data_value(bytes) constructor
(and the complementary data_value(optional<bytes>)) explicit.
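A small sketch of the effect, using std::string as a stand-in for bytes
and a stripped-down class named data_value_sketch (both are assumptions
for illustration, not the real types). With explicit constructors,
bytes no longer converts silently at a call site.

```cpp
#include <optional>
#include <string>
#include <type_traits>
#include <utility>

using bytes = std::string;  // stand-in for the real bytes type

class data_value_sketch {
    std::optional<bytes> _serialized;
public:
    // explicit: the caller has to spell out that the bytes are meant
    // to be a serialized value.
    explicit data_value_sketch(bytes b) : _serialized(std::move(b)) {}
    explicit data_value_sketch(std::optional<bytes> b) : _serialized(std::move(b)) {}
    bool is_null() const { return !_serialized.has_value(); }
};

// A function expecting a data_value can no longer be handed bytes by mistake.
static_assert(!std::is_convertible_v<bytes, data_value_sketch>,
              "implicit conversion from bytes must not compile");
static_assert(std::is_constructible_v<data_value_sketch, bytes>,
              "explicit construction still works");
```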
We use boost::any to convert to and from database values (stored in
serialized form) and native C++ values. boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance. We later retrieve the real value using
boost::any_cast.
However, data_value (which has a boost::any member) already has type
information as a data_type instance. By teaching data_type instances about
the corresponding native type, we can eliminate the use of boost::any.
While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type. We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
This map will contain the (internal) IPs corresponding to specific Nodes.
The mapping is also stored in the system.peers table.
So, instead of always connecting to the external IP, messaging_service::get_rpc_client()
will query _preferred_ip_cache, and only if there is no entry for a given
Node will it connect to the external IP.
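The lookup with fallback can be sketched as follows. The class
preferred_ip_cache_sketch and its method names are hypothetical, and
plain strings stand in for the address type; only the "use the cached
internal IP, otherwise the external one" rule comes from the text.

```cpp
#include <string>
#include <unordered_map>

using address = std::string;  // stand-in for the real address type

class preferred_ip_cache_sketch {
    // external IP of a node -> preferred (internal) IP
    std::unordered_map<address, address> _preferred;
public:
    void cache_preferred_ip(const address& external, const address& internal) {
        _preferred[external] = internal;
    }
    // Return the cached internal IP, or the external IP when there is
    // no entry for this node.
    address get_preferred_ip(const address& external) const {
        auto it = _preferred.find(external);
        if (it == _preferred.end()) {
            return external;
        }
        return it->second;
    }
};
```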
We will call init_local_preferred_ip_cache() at the end of system table init.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- Improved the _preferred_ip_cache description.
- Code styling issues.
New in v3:
- Make get_internal_ip() public.
- get_rpc_client(): bring back the get_preferred_ip() usage that was
dropped in v2 by mistake during rebase.
get_preferred_ips() returns all preferred_ip's stored in system.peers
table.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
New in v2:
- Get rid of extra std::move().
Fixes #423
* CF ID now maps to a truncation record comprised of a set of
per-shard RP:s and a high-mark timestamp
* Retrieving RP:s is done in "bulk"
* Truncation time is calculated as max of all shards.
This version of the patch will accept "old" truncation data, though the
result of applying it will most likely not be correct (just one shard).
Record is still kept as a blob, "new" format is indicated by
record size.
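A sketch of the record shape described above, assuming a simple
in-memory struct. The names truncation_record_sketch and update, and
the field layout of replay_position, are illustrative and do not
reflect the stored blob format.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

// Simplified replay position (RP); the fields are assumptions.
struct replay_position {
    std::uint64_t segment_id = 0;
    std::uint32_t pos = 0;
};

struct truncation_record_sketch {
    std::map<unsigned, replay_position> positions;  // shard -> RP
    std::int64_t truncated_at = 0;                  // high-mark timestamp

    // Each shard reports its own RP and time stamp; the record keeps
    // every RP but only the highest time stamp.
    void update(unsigned shard, replay_position rp, std::int64_t timestamp) {
        positions[shard] = rp;
        truncated_at = std::max(truncated_at, timestamp);
    }
};
```

With two shards reporting time stamps 100 and 90, the record ends up
with both RPs and a truncation time of 100.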
Align with rest of file (for better or worse). This allows calls from
entity without query_processor handy (i.e. storage_proxy).
Added "minimal" setup method for the "global" state, to facilitate
tests. Doing a full setup either in cql_test_env or after it is created
breaks badly. (Not sure why). So quick workaround.
Updated the current two users (batchlog_manager and commitlog_replayer)
callsites to conform.
All database code was converted to it when storage_proxy was made
distributed, but then new code was written to use storage_proxy& again.
Passing distributed<> object is safer since it can be passed between
shards safely. There was a patch to fix one such case yesterday, I found
one more while converting.
Fixes #266
Some callsites are fine: if we just get the message and process it, as is the
case with check_health for instance, msg will be alive and all is good. But if
we return a future inside the processing, msg must be kept alive. Classic bug,
appearing again.
Pekka saw this in practice in another bug. We haven't seen anything that is
related to this, but it is certainly wrong.
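The lifetime issue can be shown without Seastar. In this sketch a
std::function stands in for a continuation that runs after the handler
has returned, and handle_message is a hypothetical name. The message is
moved into the closure so it stays alive until the deferred work runs.
Capturing it by reference instead would leave a dangling reference.

```cpp
#include <functional>
#include <string>
#include <utility>

// Stand-in for work that completes after the handler has returned.
using continuation = std::function<std::string()>;

// The handler takes the message by value and moves it into the closure,
// so the closure owns the message for as long as it exists.
inline continuation handle_message(std::string msg) {
    return [msg = std::move(msg)] { return "handled: " + msg; };
}
```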
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Before:
host_id in system.local is empty
After:
host_id in system.local is inserted correctly
This fixes a nasty problem that we always get a new host_id when
booting up a node with data.
We set status to COMPLETED in join_token_ring
set_bootstrap_state(db::system_keyspace::bootstrap_state::COMPLETED)
but
cqlsh 127.0.0.$i -e "SELECT * from system.local;"
shows
bootstrapped -> IN_PROGRESS
The static sstring state_name is the bad boy.
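A sketch of how a static local can produce this symptom. It assumes the
static was initialized from the argument of the first call; the text
only names the variable, so the exact mechanism and the helper
functions below are assumptions.

```cpp
#include <string>

enum class bootstrap_state { NEEDS_BOOTSTRAP, COMPLETED, IN_PROGRESS };

inline std::string to_name(bootstrap_state s) {
    switch (s) {
    case bootstrap_state::NEEDS_BOOTSTRAP: return "NEEDS_BOOTSTRAP";
    case bootstrap_state::COMPLETED:       return "COMPLETED";
    case bootstrap_state::IN_PROGRESS:     return "IN_PROGRESS";
    }
    return "UNKNOWN";
}

// Buggy: a static local is initialized only once, on the first call,
// so every later call returns the name of the first state ever passed.
inline const std::string& state_name_buggy(bootstrap_state s) {
    static std::string state_name = to_name(s);
    return state_name;
}

// Fixed: compute the name on every call.
inline std::string state_name_fixed(bootstrap_state s) {
    return to_name(s);
}
```

If the first call passes IN_PROGRESS, the buggy variant keeps returning
"IN_PROGRESS" even when COMPLETED is set later.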
To avoid spreading the futures all over, we will resort to a cache with this,
the same way we did for the dc/rack information.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
I'm not sure what happened. We have the same commented code in both .hh
and .cc. It is very confusing when enabling some of the code. Let's
remove the duplicated code in .cc and leave it in .hh only.
They are multi-cell in Origin. This has nothing to do with 2.2 vs 2.1,
and it is just a plain bug.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We will invoke the schema builder from schema_tables.cc, and at that point, the
information about compact storage no longer exists anywhere. If we just call it
like this, it will be the same as calling it with compact_storage::no, which
will trigger a (wrong) recomputation for compact_storage::yes CFs.
The best way to solve that is to make the compact_storage parameter mandatory
every time we create a new table - instead of defaulting to no. This will
ensure that the correct dense and compound calculation are always done when
calling the builder with a parameter, and not done at all when we call it
without a parameter.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This table exists in 2.1.8, and although it is dropped in 2.2, we
should at least list its schema.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
The 2.1.8 system tables have 3 more fields that the 2.2 ones don't.
Since we aim at 2.1 compatibility, we have to include them.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
There's nothing legacy about it so rename legacy_schema_tables to
schema_tables. The naming comes from a Cassandra 3.x development branch
which is not relevant for us in the near future.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
We should pass inet_address.addr().
With this, tokens in system.peers are updated correctly.
(1 rows)
cqlsh> SELECT tokens from system.peers;
tokens
------------------------------------------------------------------------
{'-5463187748725106974', '8051017138680641610', '8833112506891013468'}
(1 rows)
I get this error if I pass inet_address to it:
boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_any_cast>
> (boost::bad_any_cast: failed conversion using boost::any_cast)
Ability to enable/disable specific sub-modules. These settings do not
affect system tables, which are always persisted, cached and written to
the commitlog.
enable-in-memory-data-store marks if tables will be written/read to/from
disk
enable-commitlog marks if tables will be written to commitlog
enable-cache marks if tables will be written/read to/from cache
Please note in-memory-data-store does not change the read path so "old"
sstables are still read and cache may be used to cache their data
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
The at_exit() callback needs to return a future. In one place we forgot,
and now that at_exit() takes an std::function<>, this is verified at
compilation time and fails compilation.
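A sketch of why the std::function<> signature catches this at compile
time. future_stub is a stand-in for the real future type and
exit_hooks_sketch is a hypothetical container. The check relies only on
the standard rule that std::function<R()> is constructible from a
callable whose result converts to R.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>
#include <utility>
#include <vector>

struct future_stub {};  // stand-in for the real future type

class exit_hooks_sketch {
    std::vector<std::function<future_stub()>> _hooks;
public:
    // Only callables returning something convertible to future_stub
    // are accepted here.
    void at_exit(std::function<future_stub()> f) { _hooks.push_back(std::move(f)); }
    std::size_t size() const { return _hooks.size(); }
};

inline future_stub returns_future() { return {}; }
inline void returns_nothing() {}

// A callback that returns the future type is accepted...
static_assert(std::is_constructible_v<std::function<future_stub()>,
                                      decltype(&returns_future)>,
              "callback returning a future is accepted");
// ...while one that forgets to return a future is rejected by the compiler.
static_assert(!std::is_constructible_v<std::function<future_stub()>,
                                       decltype(&returns_nothing)>,
              "callback returning void is rejected");
```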
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>