Use seastar::checked_ptr<weak_ptr<pepared_statement>> instead of shared_ptr for passing prepared statements around.
This allows an easy tracking and handling of statements invalidation.
This implementation will throw an exception every time an invalidated
statement reference is dereferenced.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
* seastar 6b21197...2ebe842 (6):
> Merge "Various improvements to execution stages" from Paweł
> app-template: allow apps to specify a name for help message
> bool_class: avoid initializing object of incomplete type
> app-template: make sure we can still get help with required options
> prometheus: Http handler that returns prometheus 0.4 protobuf or text format
> Update DPDK to 17.02
Includes patch from Pawel to adjust to updated execution_stage interface.
Adds yet another magic function "SCYLLA_COUNTER_SHARD_LIST", indicating that
argument value, which must be a list of tuples <int, UUID, long, long>,
should be inserted as an actual counter value, not update.
This of course to allow counters to be read from sstable loader.
Note that we also need to allow timestamps for counter mutations,
as well as convince the counter code itself to treat the data as
already baked. So ugly wormhole galore.
v2:
* Changed flag names
* More explicit wormholing, bypassing normal counter path, to
avoid read-before-write etc
* throw exceptions on unhandled shard types in marshalling
v3:
* Added counter id ordering check
* Added batch statement check for mixing normal and raw counter updates
Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>
This patch checks for additional permissions when modifying a table
with views, since that update will require reading from the table and
writing into its views.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
- Add a inserts, updates, deletes members to cql_stats.
- Store cql_stats& in a modification_statement and increment the corresponding counter according to the value of a "type" field.
- Store cql_stats& in a batch_statement and increment the statistics for each BATCH member.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Store the trace state in the abstract_write_response_handler.
Instrument send_mutation RPC to receive an additional
rpc::optional parameter that will contain optional<trace_info>
value.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
We want the format of query results to be eventually defined in the
IDL and be independent of the format we use in memory to represent
collections. This change is a step in this direction.
The change decouples format of collection cells in query results from
our in-memory representation. We currently use collection_mutation_view,
after the change we will use CQL binary protocol format. We use that because
it requires less transformations on the coordinator side.
One complication is that some list operations need to retrieve keys
used in list cells, not only values. To satisfy this need, new query
option was added called "collections_as_maps" which will cause lists
and sets to be reinterpreted as maps matching their underlying
representation. This allows the coordinator to generate mutations
referencing existing items in lists.
List operations and prefetching were not handling static columns
correctly. One issue was that prefetching was attaching static column
data to row data using ids which might overlap with clustered columns.
Another problem was that list operations were always constructing
clustering key even if they worked on a static column. For static
columns the key would be always empty and lookup would fail.
The effect was that list operations which depend on curent state had
no effect. Similar problem could be observed on C* 2.1.9, but not on 2.2.3.
Fixes#903.
This is how Java does. But in C++, "throw new", although valid, would require
the catcher to catch a pointer to the exception - which isn't really what we
do.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Persist column family's "is_dense" value to system tables. Please note
that we throw an exception if "is_dense" is null upon read. That needs
to be fixed later by inferring the value from other information like
Origin does.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Current model was not really correct because Origin doesn't support
querying of partition ranges by their value. We can query slices
according to dht::decorated_key ordering, which orders partitions
first by token then by key value.
ring_position encapsulates range constraint. Key value is optional, in
which case only token is constrained.
partitions_ranges will be manipulated upon to be split for different
destination, so provide it separately from read_command to not copy the
later for each destination.
To prepare a user-defined type, we need to look up its name in the keyspace.
While we get the keyspace name as an argument to prepare(), it is useless
without the database instance.
Fix the problem by passing a database reference along with the keyspace.
This precolates through the class structure, so most cql3 raw types end up
receiving this treatment.
Origin gets along without it by using a singleton. We can't do this due
to sharding (we could use a thread-local instance, but that's ugly too).
Hopefully the transition to a visitor will clean this up.
This gives about 30% increase in tps in:
build/release/tests/perf/perf_simple_query -c1 --query-single-key
This patch switches query result format from a structured one to a
serialized one. The problems with structured format are:
- high level of indirection (vector of vectors of vectors of blobs), which
is not CPU cache friendly
- high allocation rate due to fine-grained object structure
On replica side, the query results are probably going to be serialized
in the transport layer anyway, so this change only subtracts
work. There is no processing of the query results on replica other
than concatenation in case of range queries. If query results are
collected in serialized form from different cores, we can concatenate
them without copying by simply appending the fragments into the
packet. This optimization is not implemented yet.
On coordinator side, the query results would have to be parsed from
the transport layer buffers anyway, so this also doesn't add work, but
again saves allocations and copying. The CQL server doesn't need
complex data structures to process the results, it just goes over it
linearly consuming it. This patch provides views, iterators and
visitors for consuming query results in serialized form. Currently the
iterators assume that the buffer is contiguous but we could easily
relax this in future so that we can avoid linearization of data
received from seastar sockets.
The coordinator side could be optimized even further for CQL queries
which do not need processing (eg. select * from cf where ...) we
could make the replica send the query results in the format which is
expected by the CQL binary protocol client. So in the typical case the
coordinator would just pass the data using zero-copy to the client,
prepending a header.
We do need structure for prefetched rows (needed by list
manipulations), and this change adds query result post-processing
which converts serialized query result into a structured one, tailored
particularly for prefetched rows needs.
This change also introduces partition_slice options. In some queries
(maybe even in typical ones), we don't need to send partition or
clustering keys back to the client, because they are already specified
in the query request, and not queried for. The query results hold now
keys as optional elements. Also, meta-data like cell timestamp and
ttl is now also optional. It is only needed if the query has
writetime() or ttl() functions in it, which it typically won't have.
'keys' and 'prefix' are used twice in the same expression, and as the language
does not guarantee any ordering in this case, any moves are illegal.
Get rid of them.