Every lsa-allocated object is prefixed by a header that contains information
needed to free or migrate it. This includes its size (for freeing) and
an 8-byte migrator (for migrating). Together with some flags, the overhead
is 14 bytes (16 bytes if the default alignment is used).
This patch reduces the header size to 1 byte (8 bytes if the default alignment
is used). It uses the following techniques:
- ULEB128-like encoding (actually more like ULEB64) so a live object's header
can typically be stored using 1 byte
- indirection, so that migrators can be encoded in a small index pointing
to a migrator table, rather than using an 8-byte pointer; this exploits
the fact that only a small number of types are stored in LSA
- moving the responsibility for determining an object's size to its
migrator, rather than storing it in the header; this exploits the fact
that the migrator stores type information, and object size is in fact
information about the type
The patch improves the results of memory_footprint_test as following:
Before:
- in cache: 976
- in memtable: 947
After:
mutation footprint:
- in cache: 880
- in memtable: 858
A reduction of about 10%. Further reductions are possible by reducing the
alignment of lsa objects.
logalloc_test was adjusted to free more objects, since with the lower
footprint, rounding errors (to full segments) are different and caused
false errors to be detected.
Missing: adjustments to scylla-gdb.py; will be done after we agree on the
new descriptor's format.
The diagnostic that clang spits out when it sees an unrecognized warning
is itself a warning, so the test compilation succeeds and we don't notice
the warning is not supported.
Adding -Werror turns the warning about the unrecognized warning into an
error, allowing the detection machinery to work.
"Work on this series started with fixing the 'nodetool clearsnapshot'.
The current master code ignores the snapshots in deleted keyspaces (issue #2045).
I noticed that in many places our code has to build the path to some directory/file
it simply had the sstring(<path1>) + "/" + sstring(<path2>) constructs which may cause us issues
if somebody decides to complile/run scylla on not-Unix-based OS, like Microsoft Windows.
I understand that this is a long shot but if we can make it right now - why not to.
The answer is boost::filesystem::path class - its synchronous parts, of course.
I decided to take an initiative and fix the issues above and then use the fixed code for
fixing the issue #2045:
- Fix some minor issues in the existing code.
- Extend the lister class and move it into the separate files outside database.cc.
On the way I've found an issue in the existing code (issue #2071).
This series fixes this one too (PATCH2)."
Move lister class away from database.cc.
This is a preparation for moving it to the seastar library.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This patch adds the may_be_affected_by() function to the view class,
which is responsible to determine whether an update to a base class
affects one of its views.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Live counter cells are collections of shards, each one representing the
sum of all operations performed by a particular replica. This commits
introduces an in-memory representation of counters as well as basic
operations such as merge, difference and hashing.
Transform the supervisor_notify() and related functions into
the "supervisor" class and place this class implementation in
a separate .cc file.
This is going to fix the compilation breakage of tests introduced
by a
commit 8014adc2a1
init: serialize the creation of system_traces KS objects
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <1483663955-20096-1-git-send-email-vladz@scylladb.com>
This patch adds a set of tests for materialized view schema
handling, complementing the dtests for the same feature.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the alter_view_statement, which enables users to
change the properties of a materialized view.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Now that access to the size_estimates system is virtualized, we no
longer need the recorder.
Fixes#1616
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
By default, io checker will cause Scylla to shutdown if it finds
specific system errors. Right now, io checker isn't flexible
enough to allow a specialized handler. For example, we don't want
to Scylla to shutdown if there's an permission problem when
uploading new files from upload dir. This desired flexibility is
made possible here by allowing a handler parameter to io check
functions and also changing existing code to take advantage of it.
That's a step towards fixing #1709.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This patch adds the parsing for the "CREATE MATERIALIZED VIEW" statement,
following Cassandra 3 syntax. For example:
CREATE MATERIALIZED VIEW building_by_city
AS SELECT * FROM buildings
WHERE city IS NOT NULL
PRIMARY KEY(city, name);
It also adds the "IS NOT NULL" operator needed for this purpose.
As in Cassandra, "IS NOT NULL" can only be used for materialized
view creation, and not in a normal SELECT. It can only be used with
the NULL operand (i.e., "IS NOT 3" will be a syntax error).
The current implementation of this statement just does some sanity
checking (such as to verify that "city" is a valid column name and that
the "building" base table exists), complains that materialized views are
not yet supported:
SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="Failed parsing statement: [CREATE MATERIALIZED VIEW building_by_city AS
SELECT * FROM buildings
WHERE city IS NOT NULL
PRIMARY KEY(city, name);] reason: unsupported operation: Materialized views not yet supported">
As mentioned above, the "IS NOT NULL" restriction is not allowed in
ordinary selects not creating a materialized views:
SELECT * FROM buildings WHERE city IS NOT NULL;
InvalidRequest: code=2200 [Invalid query] message="restriction 'city IS NOT null' is only supported in materialized view creation"
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1475742927-30695-1-git-send-email-nyh@scylladb.com>
Remove clustering_key_filter_factory and clustering_key_filtering_context.
Use partition_slice directly with a static get_ranges method.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Cassandra 1.x clusters often use RandomPartitioner. Supporting
RandomPartitioner will allow easier migration to Scylla
Tests are added to make sure scylla generates the same token as
Cassandra does for the same partition key.
Fixes#1438
Message-Id: <3bc8b7f06fad16d59aaaa96e2827198ce74214c6.1469166766.git.asias@scylladb.com>
This patch implements the size_estimates_recorder, which periodically
writes estimations for all the non-system column families in the
size_estimates system table. The size_estimates_recorder class
corresponds to the one in Cassandra's SizeEstimatesRecorder.java.
Estimation is carried out by shard 0. Since we're estimating based on
data in shared sstables, having multiple shards doing this would skew
the results.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The sstables::key class now delegates much of its functionality
to the composite class. All existing behavior is preserved.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
To ensure isolation of operation when streaming a mutation from a
mutable source (such as cache or memtable) MVCC is used.
Each entry in memtable or cache is actually a list of used versions of
that entry. Incoming writes are either applied directly to the last
verion (if it wasn't being read by anyone) or preprended to the list
(if the former head was being read by someone). When reader finishes it
tries to squash versions together provided there is no other reader that
could prevent this.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This commit introduces mutation_fragment class which represents the parts
of mutation streamed by streamed_mutation.
mutation_fragment can be:
- a static row (only one in the mutation)
- a clustering row
- start of range tombstone
- end of range rombstone
There is an ordering (implemented in position_in_partition class) between
mutation_fragment objects. It reflects the order in which content of
partition appears in the sstables.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This patch changes the type of the mutation partition's row_tombstones
to be a range_tombstone_list, so that they are now represented as a
set of disjoint ranges. All of its usages are updated accordingly.
Fixes#1155
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This class is responsible for representing a set of range tombstones
as non-overlapping disjoint sets of range tombstones.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the range_tombstone class, composed of
a [start, end] pair of clustering_key_prefixes, the type
of inclusiveness of each bound, and a tombstone.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>