It is safe to copy column_mapping across shards. Such a guarantee comes at
the cost of performance.
This patch makes commitlog_entry_writer use IDL generated writer to
serialise commitlog_entry so that column_mapping is not copied. This
also simplifies commitlog_entry itself.
Performance difference tested with:
perf_simple_query -c4 --write --duration 60
(medians)
        before     after      diff
write   79434.35   89247.54   +12.3%
This patch allows a view schema to be frozen. To unfreeze such a
schema, we add an is_view attribute to the schema idl.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
reconcilable_result can be merged with another or transformed into
query::result. Make sure that short_read information is never lost.
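A minimal sketch of the invariant described above, assuming a simplified result type. The names partial_result and merge are hypothetical stand-ins and not the actual reconcilable_result API:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; these are not the real Scylla types.
enum class short_read : bool { no = false, yes = true };

struct partial_result {
    uint32_t row_count;
    short_read is_short_read;
};

// Merging keeps the flag set if either input was a short read,
// so the information cannot be lost along the way.
inline partial_result merge(const partial_result& a, const partial_result& b) {
    const bool any_short = a.is_short_read == short_read::yes
                        || b.is_short_read == short_read::yes;
    return partial_result{a.row_count + b.row_count,
                          any_short ? short_read::yes : short_read::no};
}
```

The same rule would apply to any transformation step: the output carries the flag whenever the input did.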
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
When paging is used the cluster is allowed to return fewer rows than the
client asked for. However, if it does so we need a way of telling that
to the coordinator and the paging implementation so that they can
differentiate between short reads caused by the replica running out of
data to send and short reads caused by any other means.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
If we're talking to just one replica, the digest is not going to be used,
so better not to calculate it at all. The optimization helps with
LOCAL_ONE queries where the result is large, but does not contain large
blobs (many small rows).
This patch adds a digest_algorithm parameter to the READ_DATA verb that
can take on two values: none and MD5 (default), and sets it to none when
we're reading from one replica.
In the future we may add other values for more hardware-friendly digest
algorithms.
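The selection rule can be sketched as follows. The two enum values come from the description above; the helper choose_digest_algorithm and its replica_count parameter are assumptions made for illustration only:

```cpp
#include <cassert>
#include <cstddef>

// The two values named in this message; the exact declaration is an assumption.
enum class digest_algorithm { none, MD5 };

// Hypothetical helper: skip the digest when reading from a single replica,
// because nothing would ever compare it against another replica's digest.
inline digest_algorithm choose_digest_algorithm(std::size_t replica_count) {
    return replica_count == 1 ? digest_algorithm::none : digest_algorithm::MD5;
}
```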
Message-Id: <1479380600-19206-1-git-send-email-avi@scylladb.com>
Wrapping ranges are a pain, so we are moving wrap handling to the edges.
Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.
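As an illustration of handling wrap-around at the edge, the sketch below splits a wrapping range into two non-wrapping ones. It assumes integer tokens and inclusive bounds; token_range and unwrap are hypothetical names and not the actual range API:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical simplified range over integer tokens, both bounds inclusive.
struct token_range {
    int64_t start;
    int64_t end;
};

inline bool is_wrap_around(const token_range& r) {
    return r.start > r.end;
}

// Split a wrapping range into [min, end] and [start, max].
// A range that does not wrap is returned unchanged.
inline std::vector<token_range> unwrap(const token_range& r) {
    if (!is_wrap_around(r)) {
        return std::vector<token_range>{r};
    }
    constexpr int64_t lo = std::numeric_limits<int64_t>::min();
    constexpr int64_t hi = std::numeric_limits<int64_t>::max();
    return std::vector<token_range>{token_range{lo, r.end}, token_range{r.start, hi}};
}
```

Once ranges are unwrapped at the entry points, internal code only ever sees ranges with start <= end.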
Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
The main idea is to log queries that take "too long" to complete,
where "too long" means longer than a given threshold.
To achieve the above this patch does the following:
- Introduce two new properties to the tracing::trace_state:
- "Full tracing": when the tracing of this query was explicitly requested.
In this state we will record all possible traces related to this query:
both on the coordinator and on any replica involved.
- "Log slow query": when slow query logging is enabled.
If slow query logging is enabled and a session's "duration" is above
the specified threshold we will create a record in the "slow queries log"
and write all trace records created on the coordinator and on a replica
if a replica's session lasts longer than that threshold.
(We will propagate the Coordinator's slow query logging threshold to replicas
in the context of a specific tracing/logging session).
The properties above are independent, namely they may be enabled and/or disabled
independently and any combination of them is legal (naturally, creating a tracing
session when both states above are disabled makes no sense).
- Instrument the tracing::tracing service to allow the following:
- Enable/disable slow query logging.
- Set/get the slow query duration threshold (in microseconds).
- Set/get the slow query log record TTL value (in seconds).
- Instrument the trace_keyspace_helper to write a slow query log entry
when requested.
- The slow query logging is disabled by default and the threshold is set to half a second.
- The TTL of a slow log record is set to 86400 seconds by default.
- It makes sense to use the same "slow query logging threshold" and "slow query record TTL"
both on a coordinator and on replica nodes in the context of the same tracing session:
- Pass both TTL and a threshold to the replica in a trace_info.
This patch also implements the new slow query logging specific logic:
- Don't write the pending tracing records before the end of a tracing session
until "duration" reaches the logging threshold.
- Don't build the parameters<sstring, sstring> map unless we know we will write it
to I/O.
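The decision of when to write trace records can be sketched as below. This is a simplified model, not the tracing::trace_state implementation: trace_props_set and should_write_records are hypothetical names, and the comparison assumes "above the threshold" means strictly greater:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// The two independent properties described above, modelled as bit flags.
enum class trace_state_props : uint8_t {
    full_tracing = 1 << 0,
    log_slow_query = 1 << 1,
};

// Hypothetical minimal bit set standing in for an enum_set.
struct trace_props_set {
    uint8_t bits = 0;
    void set(trace_state_props p) {
        bits = static_cast<uint8_t>(bits | static_cast<uint8_t>(p));
    }
    bool contains(trace_state_props p) const {
        return (bits & static_cast<uint8_t>(p)) != 0;
    }
};

// Full tracing always writes. Slow query logging writes only once the
// session duration exceeds the threshold (assumed strictly greater).
inline bool should_write_records(const trace_props_set& props,
                                 std::chrono::microseconds duration,
                                 std::chrono::microseconds threshold) {
    if (props.contains(trace_state_props::full_tracing)) {
        return true;
    }
    return props.contains(trace_state_props::log_slow_query) && duration > threshold;
}
```

With the default threshold of half a second (500000 microseconds), a 100 ms session with only slow query logging enabled would write nothing.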
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
- Instead of keeping separate booleans introduce a trace_state_props_set enum_set and
pass it around instead of separate booleans.
- Change the trace_info to hold this value in addition to write_on_close. Initialize
the corresponding bit in the enum_set based on the write_on_close value in the trace_info
constructor for backward compatibility.
- Separate a trace_state constructor into two:
- For a primary session object.
- For a secondary session object.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
This patch changes the type of query::clustering_range to express that
ranges that wrap around are not allowed, and ranges that have the
start bound after the end bound are considered empty.
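The "start after end means empty" rule can be illustrated with a simplified range over integers. nonwrapping_int_range is a hypothetical stand-in and not the real query::clustering_range type:

```cpp
#include <cassert>

// Hypothetical simplified non-wrapping range with inclusive integer bounds.
struct nonwrapping_int_range {
    int start;
    int end;

    // A start bound after the end bound denotes an empty range,
    // never a range that wraps around.
    bool is_empty() const {
        return start > end;
    }

    bool contains(int v) const {
        return start <= v && v <= end;
    }
};
```

Under wrapping semantics the range {5, 2} would contain 6; here it contains nothing.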
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
In names of functions and variables:
s/flush_/write_/
s/store_/write_/
In i_tracing_backend_helper:
s/flush()/kick()/
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
This patch adds support to send a cell's ttl as part of a query's
result. This is needed for thrift support.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch makes hashing for repair calculate checksums in a way that
doesn't require rebuilding the whole mutation.
Unfortunately, such checksums are incompatible with the old ones, so the
old way of computing checksums is preserved for compatibility reasons.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
This patch adds a per-partition row limit. It ensures both local
queries and the reconciliation logic abide by this limit.
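How a per-partition limit interacts with an overall row limit can be sketched as follows. The function apply_limits and the representation of a partition as a vector of ints are assumptions for illustration, not the query code itself:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: each inner vector holds the rows of one partition.
using partition_rows = std::vector<int>;

// Take at most per_partition_limit rows from each partition and at most
// row_limit rows overall, visiting partitions in order.
inline std::vector<partition_rows> apply_limits(const std::vector<partition_rows>& partitions,
                                                std::size_t row_limit,
                                                std::size_t per_partition_limit) {
    std::vector<partition_rows> out;
    std::size_t remaining = row_limit;
    for (const auto& p : partitions) {
        if (remaining == 0) {
            break;
        }
        const std::size_t take = std::min({p.size(), per_partition_limit, remaining});
        if (take == 0) {
            continue; // nothing to emit for this partition
        }
        out.emplace_back(p.begin(), p.begin() + static_cast<std::ptrdiff_t>(take));
        remaining -= take;
    }
    return out;
}
```

Applying the same function both on the replica and after merging replica results is one way to make sure the limit holds on both paths.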
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the range_tombstone class, composed of
a [start, end] pair of clustering_key_prefixes, the type
of inclusiveness of each bound, and a tombstone.
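A rough sketch of the shape of such a class. The member names, the bound_inclusiveness enum, and the use of std::vector<int> with lexicographic comparison as a key are all assumptions; real clustering key prefix comparison is more involved:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the real types.
using clustering_key = std::vector<int>;

struct tombstone {
    int64_t timestamp;
    int32_t deletion_time;
};

enum class bound_inclusiveness { inclusive, exclusive };

struct range_tombstone {
    clustering_key start;
    bound_inclusiveness start_kind;
    clustering_key end;
    bound_inclusiveness end_kind;
    tombstone tomb;

    // True if key falls between the bounds, honouring the inclusiveness
    // of each bound. Uses plain lexicographic comparison.
    bool covers(const clustering_key& key) const {
        const bool after_start = start_kind == bound_inclusiveness::inclusive
                ? !(key < start) : start < key;
        const bool before_end = end_kind == bound_inclusiveness::inclusive
                ? !(end < key) : key < end;
        return after_start && before_end;
    }
};
```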
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
... and make it a clustering_key_prefix, in preparation of
supporting not-whole-row range tombstones.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
We did the clean up in idl/gossip_digest.idl.hh, but the patch to clean
up gms/application_state.hh was never merged.
To maintain compatibility with previous versions of scylla, we cannot
change application_state.hh; instead, change the idl to be in sync with
application_state.hh.
Message-Id: <3a78b159d5cb60bc65b354d323d163ce8528b36d.1458557948.git.asias@scylladb.com>
The result digest is going to be computed in the query result builder and
will require information not available in the query result. That's why the
digest now needs to be sent to the other nodes together with the result,
as they won't be able to compute it on their own.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
The query result footprint for cassandra-stress mutation as reported
by tests/memory-footprint increased by 18% from 285 B to 337 B.
perf_simple_query shows slight regression in throughput (-8%):
build/release/tests/perf/perf_simple_query -c4 -m1G --partitions 100000
Before: ~433k tps
After: ~400k tps
Test auto-generated and writer-based serialization as well as
deserialization of simple compound type, vectors and variants.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
"Fixes #884
Fixes #895
Also at seastar-dev: calle/truncate_more
1.) Change truncation records to be stored with IDL serialization
2.) Fix db::serializers encoding of replay_position
3.) Detect attempted reading of Origin truncation records, and instead
of crashing, ignore and warn.
4.) Change truncation time stamps to be generated per-shard, _after_
CF flush is done, otherwise data in memtables at flush would be
retained/replayed on next start. Retain the highest time stamp
generated.
Note for (3): This patch set does _not_ clear out origin records
automatically. This is because I feel that is a somewhat drastic and
irreversible thing to do. If we want to give the user a means
to get rid of the (3) warning, we should probably either tell them to
use cqlsh, or add an API call for this, so they can do it explicitly.
"
There are only two messages: prepare_message and outgoing_file_message.
Actually, only prepare_message is sent on the wire.
Flatten the namespace.