When we use str.format() to pass variables into the message, it always
raises an exception such as "KeyError: 'red'", because the message contains
color variables that are not passed to str.format().
To avoid the error we need to pass all format variables to colorprint()
and run str.format() inside the function.
Fixes #3649
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180803015216.14328-1-syuu@scylladb.com>
When murmur3_ignore_msb_bits was introduced in 1.7, we set its default to zero
(to avoid resharding on upgrade) and set it to 12 in the scylla.yaml template
(to make sure we get the right value for new clusters).
Now, however, things have changed:
- clusters installed before 1.7 are a small minority
- they should have resharded long ago
- resharding is much better these days
- we have more migrations from Cassandra compared to old clusters
To allow clusters that migrated using their cassandra.yaml, and to clean up
the default scylla.yaml, make the default 12.
Users upgrading from pre-1.7 clusters will need to update their scylla.yaml,
or to reshard (which is a good idea anyway).
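For a pre-1.7 cluster that has not resharded, keeping the old layout would mean pinning the option explicitly in scylla.yaml (option name as used in this message; check your version's template for the exact key):

```yaml
# Keep the pre-1.7 sharding layout to avoid resharding on upgrade.
murmur3_ignore_msb_bits: 0
```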
Fixes #3670.
Message-Id: <20180808063003.26046-1-avi@scylladb.com>
Currently scylla_ec2_check exits silently when the EC2 instance is optimized
for Scylla; the result of the check is not clear, so we need to output a
message.
Note that this change affects the AMI login prompt too.
Fixes #3655
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20180808024256.9601-1-syuu@scylladb.com>
These tests check the correctness of resulting compacted SSTables based
on the files produced by compacting input files with Cassandra.
Note that output files are not identical to those generated by Cassandra
because Scylla compaction does not yet optimise delta-encoded values
using the serialization header.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
Message-Id: <3fa05ce72352292d1026ce80ac87552889d10d96.1533667535.git.vladimir@scylladb.com>
Since our scripts were converted to Python, we can no longer
source them from a shell. Execute them directly instead. Also,
we now need to import configuration variables ourselves, since
scylla_prepare, being an independent process, won't do it for
us.
Fixes #3647
Message-Id: <20180802153017.11112-1-avi@scylladb.com>
"
Store the sizes of the request and the response for each traced query.
In the example below I traced the cassandra-stress write workload with a default schema using probabilistic tracing.
Here is an entry created for one of the queries:
cassandra@cqlsh> SELECT parameters FROM system_traces.sessions where session_id=30c3a8ea-96bb-11e8-8a97-000000000000;
parameters
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{'consistency_level': 'LOCAL_ONE', 'page_size': '5000', 'param[0]': 'f749eb03d6a995d8b3496075da8f20aa9228c5db12401e8a37000fa5baa13531...', 'param[1]': '845809b53a9aff7eef8f85308eaef79e03c696653ca23957f1ed5d539dc00463...', 'param[2]': 'd303585def93a5d40e41ceb12880ad3ede3d9f6308a1b1c5e42e911a191f1de1...', 'param[3]': 'be77c7da059d4b52687cd9b3eaa7d04cdfe7b5e38e84a8eea318299a01c7845f...', 'param[4]': '32faaaea1b3d73d9d628a4945b69a8531740348d49ee30c03f697dd2d63e8dee...', 'param[5]': '50503850374d34323330', 'query': 'UPDATE "standard1" SET "C0" = ?,"C1" = ?,"C2" = ?,"C3" = ?,"C4" = ? WHERE KEY=?', 'serial_consistency_level': 'SERIAL'}
(1 rows)
cassandra@cqlsh> SELECT request_size,response_size FROM system_traces.sessions where session_id=30c3a8ea-96bb-11e8-8a97-000000000000;
request_size | response_size
--------------+---------------
239 | 4
(1 rows)
Now let's try to read the same keyspace1.standard1 entry (based on the "key" value in "param[5]") from cqlsh and trace it using TRACING ON.
cassandra@cqlsh> TRACING ON
Now Tracing is enabled
cassandra@cqlsh> SELECT * from keyspace1.standard1 where key=0x50503850374d34323330;
 key                    | C0                                                                     | C1                                                                     | C2                                                                     | C3                                                                     | C4
------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------+------------------------------------------------------------------------
 0x50503850374d34323330 | 0xf749eb03d6a995d8b3496075da8f20aa9228c5db12401e8a37000fa5baa135315430 | 0x845809b53a9aff7eef8f85308eaef79e03c696653ca23957f1ed5d539dc00463e10e | 0xd303585def93a5d40e41ceb12880ad3ede3d9f6308a1b1c5e42e911a191f1de12924 | 0xbe77c7da059d4b52687cd9b3eaa7d04cdfe7b5e38e84a8eea318299a01c7845fb8a2 | 0x32faaaea1b3d73d9d628a4945b69a8531740348d49ee30c03f697dd2d63e8dee5dde
(1 rows)
Tracing session: 639ca0a0-96bb-11e8-8a97-000000000000
activity | timestamp | source | source_elapsed
------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------
Execute CQL3 query | 2018-08-02 21:20:20.906000 | 192.168.1.138 | 0
Parsing a statement [shard 0] | 2018-08-02 21:20:20.906358 | 192.168.1.138 | --
Processing a statement [shard 0] | 2018-08-02 21:20:20.906405 | 192.168.1.138 | 47
Creating read executor for token -5698461774438220979 with all: {192.168.1.138} targets: {192.168.1.138} repair decision: NONE [shard 0] | 2018-08-02 21:20:20.906445 | 192.168.1.138 | 87
read_data: querying locally [shard 0] | 2018-08-02 21:20:20.906448 | 192.168.1.138 | 90
Start querying the token range that starts with -5698461774438220979 [shard 0] | 2018-08-02 21:20:20.906452 | 192.168.1.138 | 94
Querying is done [shard 0] | 2018-08-02 21:20:20.906509 | 192.168.1.138 | 151
Done processing - preparing a result [shard 0] | 2018-08-02 21:20:20.906533 | 192.168.1.138 | 175
Request complete | 2018-08-02 21:20:20.906186 | 192.168.1.138 | 186
cassandra@cqlsh> TRACING OFF
Disabled Tracing.
cassandra@cqlsh> SELECT request_size,response_size FROM system_traces.sessions where session_id=639ca0a0-96bb-11e8-8a97-000000000000;
request_size | response_size
--------------+---------------
82 | 369
(1 rows)
"
* 'tracing_request_response_size-v2' of https://github.com/vladzcloudius/scylla:
tracing: move all tracing related API functions to a cold path
tracing: store a query response size
tracing: store request size
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.
This is consistent with the documentation of the function in its man
page.
As of Fedora 28, the function's behavior changed so that the encrypted
text is not `nullptr` on error, but instead the string "*0".
The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):
Some implementations return `NULL` on failure, and others return an
_invalid_ hashed passphrase, which will begin with a `*` and will
not be the same as SALT.
Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.
With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.
Fixes #3637.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
This patch completes what was started in a4282c2c6e.
Make trace_state_ptr a wrapper class around lw_shared_ptr<trace_state> that
hints that bool(trace_state_ptr) is likely to return false.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Add a new "response_size" column to system_traces.sessions and store the size of the uncompressed response
for a traced query.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Add a new column "request_size" to system_traces.sessions and store
the uncompressed request frame data size.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
"
There is an exception safety problem in imr::utils::object. If multiple
memory allocations are needed and one of them fails, the main object is
going to be freed (as expected). However, at this stage it is not
constructed yet, so when the LSA asks its migrator for the size it may get
a meaningless value. The solution is to remember the size until the object
is fully created and use sized deallocation in case of failures.
Fixes #3618.
Tests: unit(release, debug/imr_test)
"
Each IMR type needs its own LSA migrator. It is possible that the user will
provide a migrator for a different type than the one whose instance is
being created. This patch adds compile-time detection of that bug.
Each member of a structure may require different deserialisation
context. They are provided by context_for<Tag>() method of the context
used to deserialise the structure itself.
imr::utils::object needs to add backpointer to the structure it manages
so that it can be used in the LSA memory. This is done by creating a
structure that has two members: the backpointer and the actual structure
that imr::utils::object is to manage. imr::utils::object_context creates
appropriate deserialisation context for it.
context_for() is called for each member of a structure. The object_context
implementation of context_for() always created a deserialisation context
for the underlying structure, regardless of which member it was, so one was
created for the backpointer as well. This is wrong since the context may
read the object on its creation.
The fix is to use no_context_t for the backpointer.
imr::utils::object::make() handles creation of IMR objects. They are
created in three phases:
1. The size of the object and all additional needed memory allocations
is determined
2. All needed buffers are allocated
3. Data is written to the allocated space
When IMR objects are deallocated, the LSA asks their migrator for the size.
The migrator may read some parts of the object to figure out its size. This
is a problem if there is an allocation failure in make() at point 2.
If one of the required allocations fails, the buffers that were already
acquired need to be freed. However, since the object hasn't been fully
created yet, the migrator won't return a valid value.
The solution for this is to remember object size until all allocations
are completed. This way the LSA won't need to ask migrators for it in
case of failure. imr::alloc::object_allocator already does that but
imr::utils::object doesn't. This patch fixes that.
For some reason the doc entry for large_partitions was outdated.
It contained incorrect ORDERING information and a wrong usage example,
since large_partitions' schema changed multiple times during
the review process.
Message-Id: <1910f270419536ebccffde163ec1bfc67d273306.1533128957.git.sarna@scylladb.com>
We need the mapping from dht::token_range to
std::vector<inet_address> and from inet_address to dht::token_range_vector in
various places. Currently, we use std::unordered_multimap and convert it to
std::unordered_map. It is better to use std::unordered_map in the first
place. The changes look like this:
- Change from
std::unordered_multimap<dht::token_range, inet_address>
to
std::unordered_map<dht::token_range, std::vector<inet_address>>
- Change from
std::unordered_multimap<inet_address, dht::token_range>
to
std::unordered_map<inet_address, dht::token_range_vector>
Message-Id: <b8ecc41775e46ec064db3ee07510c404583390aa.1533106019.git.asias@scylladb.com>
The calculation consists of several parts with preemption points between
them, so a table can be added while the calculation is ongoing. Do not
assume that a table exists in the intermediate data structures.
Fixes #3636
Message-Id: <20180801093147.GD23569@scylladb.com>
"
This series replaces infinite time-outs in internal distributed
(non-local) CQL queries with finite ones.
The implementation of tracing, which also performs internal queries,
already has finite time-outs, so it is unchanged.
Fixes #3603.
"
* 'jhk/finite_time_outs/v2' of https://github.com/hakuch/scylla:
Use finite time-outs for internal auth. queries
Use finite query time-outs for `system_distributed`
The move operation changes a node's token to a new token. It is
supported only when a node has one token. The legacy move operation was
useful in the early days, before vnodes were introduced, when a node had
only one token. I don't think it is useful anymore.
In the future, we might support adjusting the number of vnodes to rebalance
the token ranges each node owns.
Removing it simplifies the cluster operation logic and code.
Fixes #3475
Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>
"
When we are out of memtable space (real or virtual), the LSA will defer running
our mutation application and run it later, when memory is in fact available.
However, it will run it in the main scheduling group, giving the write more
shares than it would otherwise get.
This patchset fixes the problem by running those deferred mutation applications
in the correct scheduling group.
Fixes #3638
"
* tag '3638/v2' of https://github.com/avikivity/scylla:
database: tag dirty memory managers with scheduling groups
logalloc: run releaser() in user-provided scheduling group
dirty memory managers run code on behalf of their callers
in a background fiber, so provide that background fiber with
the scheduling group appropriate to their caller.
- system: main (we want to let system writes through quickly)
- dirty: statement (normal user writes)
- streaming: streaming (streaming writes)
* seastar 6b97e00...d40faff (10):
> tutorial: update build as needed for newer pandoc
> core: fix __libc_free return type signature
> future-utils: when_all: avoid calling member function on an uninitialized data member
> future-util: reduce continuations in when_all (variadic version)
> future-utils: remove allocation in when_all() if all futures are available
> future: reduce allocations in when_all()
> future: fill missing futurize::from_tuple() functions
> future: expose more types in continuation_base
> log: predict logger::is_enabled() as false
> README: add Resources section with information about the mailing list etc.
Let the user specify which scheduling group should run the
releaser, since it is running functions on the user's behalf.
Perhaps a cleaner interface is to require the user to call
a long-running function for the releaser, and so we'd just
inherit its scheduling group, but that's a much bigger change.
Both the Prometheus and the API servers are used for maintenance
operations, similarly to streaming. Run them under the streaming
scheduling group to prevent them from impacting normal operations,
and rename the streaming scheduling group to reflect the more
generic role.
This helps to prevent spikes from Prometheus or API requests from
interfering with the normal workload. Using an existing group is
preferable to creating a new group because in the worst case, all
the non-main-workload groups compete with the main workload.
Consolidating them allows us to give them significant shares in
total without increasing competition in the worst case.
The group's label is unchanged to preserve compatibility with
dashboards.
A nice side effect is that repair, which is initiated by API calls,
gets placed into the maintenance group naturally. Compaction tasks
which are run by compaction manager are not changed.
Message-Id: <20180714160723.23655-1-avi@scylladb.com>
Most queries run without tracing (and those that run with tracing
are not sensitive to a few cycles), so mark the tracing paths as
cold.
Message-Id: <20180723133000.30482-1-avi@scylladb.com>
This will allow continuous integration to use the optimal number
of compiler jobs, without having to resort to complex calculations
from its scripting environment.
Message-Id: <20180722172050.13148-1-avi@scylladb.com>
"
This series adds some optimisations to the paging logic that attempt to
close the performance gap between paged and unpaged queries. The
former are more complex, so they are always going to be slower, but the
performance loss was unacceptably large.
Fixes #3619.
Performance with paging:
./perf_paging_before ./perf_paging_after diff
read 271246.13 312815.49 15.3%
Without paging:
./perf_nopaging_before ./perf_nopaging_after diff
read 343732.17 342575.77 -0.3%
Tests: unit(release), dtests(paging_test.py, paging_additional_test.py)
"
* tag 'optimise-paging/v1' of https://github.com/pdziepak/scylla:
cql3: select statement: don't copy metadata if not needed
cql3: query_options: make simple getter inlineable
cql3: metadata: avoid copying column information
query_pager: avoid visiting result_view if not needed
query::result_view: add get_last_partition_and_clustering_key()
query::result_reader: fix const correctness
tests/uuid: add more tests including make_random_uuid()
utils: uuid: don't use std::random_device()
std::random_device() uses the relatively slow /dev/urandom, and we rarely if
ever intend to use it directly - we normally want to use it to seed a faster
random_engine (a pseudo-random number generator).
In many places in the code, we first created a random_device variable, and then
using it created a random_engine variable. However, this practice created the
risk of a programmer accidentally using the random_device object, instead of the
random_engine object, because both have the same API; This hurts performance.
This risk materialized in just two places in the code, utils/uuid.cc and
gms/gossiper.cc. A patch for to uuid.cc was sent previously by Pawel and is
not included in this patch, and the fix for gossiper.{cc,hh} is included here.
To avoid risking the same mistake in the future, this patch switches across the
code to an idiom where the random_device object is not *named*, so cannot be
accidentally used. We use the following idiom:
std::default_random_engine _engine{std::random_device{}()};
Here std::random_device{}() creates the random device (/dev/urandom) and pulls
a random integer from it. It then uses this seed to create the random_engine
(the pseudo-random number generator). The std::random_device{} object is
temporary and unnamed, and cannot be unintentionally used directly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180726154958.4405-1-nyh@scylladb.com>