test_case_sensitivity from tests/view_schema_test.cc was well-intentioned,
aiming to test from different angles the issue of non-lowercase (quoted)
column names and their interaction with materialized views.
But unfortunately, it didn't test anything! This is because the quotation
marks were forgotten, so all the identifiers in this test were folded to
lowercase, and the test didn't test non-lowercase identifiers like it
intended.
So this patch adds the missing quotes, to make this test great again.
After the patches for issues #3388 and #3391 which I sent earlier, the
test *passes* (before those patches, the fixed test did not pass -
the unfixed test trivially passed).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-8-nyh@scylladb.com>
When the secondary index code builds a "%s IS NOT NULL" clause for a
CQL statement, it needs to quote the column name if it needs to be
(not only lowercase, digits and _).
Fixes #3401.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-7-nyh@scylladb.com>
We had another case-sensitivity bug in materialized views, where if
a case-sensitive (quoted) column name was listed explicitly on "SELECT"
(instead of implicitly, e.g., in "SELECT *") the column name was
incorrectly folded to lower-case and inserts would fail.
This patch fixes the code, where a "SELECT" statement was built using
the desired column names, but column names that needed quoting were
not being quoted. The bug was in a helper function build_select_statement()
which took column name strings and failed to quote them. We clean up this
function to take column definitions instead of strings - and take care
of the quoting itself. It also needs to quote the table's name in the
select statement being built.
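As a rough illustration of the idea (not the actual Scylla code), the sketch below builds a SELECT from column definitions and does the quoting itself. The names column_def_sketch, quote_if_needed() and build_select_sketch() are hypothetical, and the quoting rule (leave a name bare only if it matches [a-z][a-z0-9_]*) is an assumption that ignores reserved keywords.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified column definition; only the name matters here.
struct column_def_sketch {
    std::string name;
};

// Assumed rule: a name stays unquoted only if it matches [a-z][a-z0-9_]*.
// Reserved CQL keywords are not handled in this sketch.
std::string quote_if_needed(const std::string& name) {
    bool plain = !name.empty() && name[0] >= 'a' && name[0] <= 'z';
    for (char c : name) {
        plain = plain && ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_');
    }
    if (plain) {
        return name;
    }
    std::string out = "\"";
    for (char c : name) {
        if (c == '"') {
            out += '"';  // an embedded double quote is escaped by doubling it
        }
        out += c;
    }
    return out + "\"";
}

// Builds "SELECT <columns> FROM <table>", quoting every identifier as needed.
// An empty column list is rendered as "SELECT *".
std::string build_select_sketch(const std::string& table,
                                const std::vector<column_def_sketch>& columns) {
    std::string stmt = "SELECT ";
    if (columns.empty()) {
        stmt += "*";
    }
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i > 0) {
            stmt += ", ";
        }
        stmt += quote_if_needed(columns[i].name);
    }
    return stmt + " FROM " + quote_if_needed(table);
}
```

Because the caller passes definitions rather than pre-formatted strings, it cannot forget the quoting step.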
Fixes #3391.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-6-nyh@scylladb.com>
Before this patch, if a materialized view is defined with the restriction
IS NOT NULL on a case-sensitive (quoted) column name, inserts fail with
a "restriction 'foobar IS NOT null' unknown column foobar" error, where
foobar is the lowercased version of the case-sensitive column name.
The problem is that the code uses single_column_relation::to_string()
to convert the relation into a CQL where clause. And indeed, this method
generates a CQL expression, but it calls column_identifier::raw::to_string()
to print identifiers. This is the wrong function - it doesn't quote
identifiers that need quoting because they are not lowercase.
So this patch uses column_identifier::raw::to_cql_string() (a method we
added in the previous patch) to generate the properly quoted CQL relation.
Fixes #3388
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-5-nyh@scylladb.com>
Implement a method column_identifier::raw::to_cql_string(). Exactly like
the one without "raw", this method quotes the identifier name as needed
for CQL. We'll need this method in a later patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-4-nyh@scylladb.com>
There is no reason for to_cql_string() and maybe_quote() to both
implement the same quoting algorithm. Use the latter to implement the
former.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-3-nyh@scylladb.com>
The utility function maybe_quote() is supposed to quote identifier names
(name of keyspace, table, or column) according to CQL rules, e.g., if the
name has any uppercase or non-alphanumeric characters, it needs to be
quoted. Unfortunately, it didn't quite do the right thing, so this patch
fixes that. This patch also adds a comment explaining what maybe_quote()
is supposed to do (until now, users could only guess).
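For readers unfamiliar with the rule, here is a minimal sketch of this kind of quoting. It is not the actual maybe_quote() implementation: the names needs_quoting() and maybe_quote_sketch() are hypothetical, and the rule used (a name is left bare only if it matches [a-z][a-z0-9_]*, and embedded double quotes are doubled) is an assumption. Reserved CQL keywords, which would also need quoting, are not handled.

```cpp
#include <string>

// Assumed rule: an identifier can stay unquoted only if it starts with a
// lowercase letter and contains only lowercase letters, digits and '_'.
bool needs_quoting(const std::string& name) {
    if (name.empty() || name[0] < 'a' || name[0] > 'z') {
        return true;
    }
    for (char c : name) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_';
        if (!ok) {
            return true;
        }
    }
    return false;
}

// Hypothetical stand-in for maybe_quote(): returns the name unchanged when it
// is safe, otherwise wraps it in double quotes and doubles embedded quotes.
std::string maybe_quote_sketch(const std::string& name) {
    if (!needs_quoting(name)) {
        return name;
    }
    std::string quoted = "\"";
    for (char c : name) {
        if (c == '"') {
            quoted += '"';
        }
        quoted += c;
    }
    quoted += '"';
    return quoted;
}
```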
Fixes #3400.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-2-nyh@scylladb.com>
In commit d674b6f672, I fixed a case-sensitive
column name bug by avoiding CQL quoting of a column name
in create_index_statement.cc when building a "targets" option string.
However, there is also matching code in target_parser.hh to unquote
that option string. So this unquoting code is no longer necessary, and
should be dropped.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-1-nyh@scylladb.com>
In the current code, if the base table has a compound partition key (i.e.,
multiple partition-key columns), searching its secondary indexes doesn't work.
There is no real reason for this; it was just a bug in preparing the
second query:
Every SI query is converted to two queries. The first queries the associated
materialized view, to find a list of primary keys. Those we need to use in a
second query, of the base table. The second query needs to list, as
restrictions, the keys found above. When a partition key is compound, its
components build one key and one restriction. But in the buggy code, we
incorrectly used each component as a separate (improperly formatted) key
and restriction, and obviously this didn't work.
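The difference between the two behaviours can be sketched as follows. This is only an illustration under assumed types: partition_key_sketch, view_row_sketch and both functions are hypothetical names, and keys are modelled as plain lists of strings rather than Scylla's real key classes.

```cpp
#include <string>
#include <vector>

// One base-table partition key: an ordered list of component values.
using partition_key_sketch = std::vector<std::string>;

// Hypothetical view row: the base partition key components that the first
// query (on the materialized view) returned for one matching row.
struct view_row_sketch {
    std::vector<std::string> base_partition_key_components;
};

// Intended behaviour: each view row yields exactly one key that holds all
// of its components, and therefore one restriction in the second query.
std::vector<partition_key_sketch> keys_for_second_query(
        const std::vector<view_row_sketch>& rows) {
    std::vector<partition_key_sketch> keys;
    for (const auto& row : rows) {
        keys.push_back(row.base_partition_key_components);
    }
    return keys;
}

// Buggy behaviour, for contrast: every component is treated as its own key.
std::vector<partition_key_sketch> keys_per_component(
        const std::vector<view_row_sketch>& rows) {
    std::vector<partition_key_sketch> keys;
    for (const auto& row : rows) {
        for (const auto& component : row.base_partition_key_components) {
            keys.push_back(partition_key_sketch{component});
        }
    }
    return keys;
}
```

With two view rows over a two-column partition key, the first function produces two keys of two components each, while the second produces four malformed single-component keys.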
This patch also adds a test that reproduces this problem and confirms its fix.
In the fixed code I also found another incorrect use of to_cql_string() (which
could break case-sensitive primary key column names) and changed it to
to_string().
Fixes #3210.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429124138.24406-1-nyh@scylladb.com>
We make multiple attempts to mark a node as alive. We do that by
sending an EchoMessage, and marking the node as alive upon receiving a
successful answer. In case there's a network partition and the nodes
can't reach each other, multiple messages may be delivered and
processed.
We can avoid processing duplicate EchoMessage replies by checking
whether we had already marked the node as alive.
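The check can be pictured with the following sketch. It is not the gossiper code: liveness_tracker_sketch and its members are hypothetical, and endpoints are modelled as plain strings.

```cpp
#include <set>
#include <string>

// Hypothetical tracker that ignores echo replies for nodes already alive.
class liveness_tracker_sketch {
    std::set<std::string> _live;   // endpoints currently marked alive
    int _processed = 0;            // replies that actually changed state
public:
    // Returns true if this reply caused the node to be marked alive,
    // false if it was a duplicate and was skipped.
    bool on_echo_reply(const std::string& endpoint) {
        if (_live.count(endpoint) != 0) {
            return false;  // already alive: nothing left to do
        }
        _live.insert(endpoint);
        ++_processed;
        return true;
    }

    bool is_alive(const std::string& endpoint) const {
        return _live.count(endpoint) != 0;
    }

    int processed_replies() const {
        return _processed;
    }
};
```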
Fixes #1184
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180428191942.31990-1-duarte@scylladb.com>
"
Recently many changes have landed in seastar for the I/O Scheduler. We
can now describe the I/O storage of a machine by its visible properties
like throughput and bandwidth instead of relying on an indirect
calculation.
For the instances we support, we can just measure that and start using
them right away.
A version of iotune that computes those properties is not yet ready, but
in its making I have noticed that we aren't really setting the nomerges
and scheduler properties of the disks under testing. We definitely
should, since that can influence the results. So this patchset also
starts doing that.
The commandline for iotunev2 shouldn't change much. When it is ready we
will just adjust this script once more.
"
* 'scylla_io_setup' of github.com:glommer/scylla:
scylla_io_setup: preconfigure i3 and i2 instances with new I/O scheduler properties
scylla_lib: drop support for m3 and c3 AWS instance types
io_setup: call blocktune before tuning I/O
blocktune: allow it to be called as a library.
scripts: move scylla-blocktune to scripts location
* seastar 70aecca...ac02df7 (5):
> Merge "Prefix preprocessor definitions" from Jesse
> cmake: Do not enable warnings transitively
> posix: prevent unused variable warning
> build: Adjust DPDK options to fix compilation
> io_scheduler: adjust property names
DEBUG, DEFAULT_ALLOCATOR, and HAVE_LZ4_COMPRESS_DEFAULT macro
references prefixed with SEASTAR_. Some may need to become
Scylla macros.
An iterator was incorrectly dereferenced when the timestamp resolution was
not explicitly specified.
The following dtests are fixed:
compaction_additional_test.CompactionAdditionalStrategyTests_with_TimeWindowCompactionStrategy.compaction_is_started_on_boot_test
compaction_additional_test.CompactionAdditionalTest.compact_data_by_time_window_test
compaction_additional_test.CompactionAdditionalTest.compaction_removes_ttld_data_by_time_windows_test
compaction_test.TestCompaction_with_DateTieredCompactionStrategy.compaction_strategy_switching_test
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180427192545.17440-1-raphaelsc@scylladb.com>
We can use iotunev2 (or any other I/O generator) to test for the limits
of the disks for the i2 and i3 instance classes. The values I got here
are the values I got from ~5 invocations of the (yet to be upstreamed)
iotune v2, with the IOPS numbers rounded for convenience of reading.
During the execution, I verified that the disks were saturated so we
can trust these numbers even if iotunev2 is merged in a different form.
The numbers are very consistent, unlike what we usually saw with the
first version of iotune.
Previously, we were just multiplying the concurrency number by the
number of disks. Now that we have better infrastructure, we will
manually test i3.large and i3.xlarge, since their disks are smaller
and slower.
For the other i3 sizes, and all instances in the i2 family, storage scales up
by adding more disks. So we can keep multiplying the characteristics of
one known disk by the number of disks and assuming perfect scaling.
Example for i3, obtained with i3.2xlarge:
read_iops = 411k
read_bandwidth = 1.9GB/s
So for i3.16xlarge, we would have read_iops = 3.28M and 15GB/s - very
close to the numbers advertised by AWS.
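The arithmetic behind that estimate, as a small sketch. The struct and function names are hypothetical, and the disk counts are an assumption not stated above: one NVMe disk for i3.2xlarge and eight for i3.16xlarge.

```cpp
#include <cstdint>

// Hypothetical container for the per-disk properties measured above.
struct io_properties_sketch {
    std::uint64_t read_iops;
    std::uint64_t read_bandwidth;  // bytes per second
};

// Assumes perfect scaling: N identical disks deliver N times one disk.
io_properties_sketch scale_by_disks(io_properties_sketch one_disk, unsigned ndisks) {
    return {one_disk.read_iops * ndisks, one_disk.read_bandwidth * ndisks};
}
```

Calling scale_by_disks({411000, 1900000000}, 8) gives 3,288,000 IOPS and 15.2 GB/s, which is where the rounded 3.28M and 15GB/s figures come from.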
Signed-off-by: Glauber Costa <glauber@scylladb.com>
m3 has 80GB SSDs in its largest form and I doubt anybody has ever
used it with Scylla.
I am also not aware of any c3 deployments. Since it is a past generation,
it doesn't even show up in the default instance selector anymore.
I propose we drop AMI support for it. In practice, what that means is
that we won't auto-tune its I/O properties and people that want to use
it will have to run scylla_io_setup - like they do today with the EBS
instances.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We are not configuring the disks the way we want them with respect to
scheduler and nomerges. This is an oversight that became clear now that
I started rewriting iotune, since I will explicitly test for that. But
since this can affect the results, it should have been here all along.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
This patch makes the functions in scylla-blocktune available as a
library for other scripts - namely scylla_io_setup.
The filename, scylla-blocktune, is not the most convenient thing to call
from python so instead of just wrapping it in the usual test for
__main__ I am just splitting the file into two.
Another option would be to patch all callers to call
scylla_blocktune.py, but because we are usually not using extensions in
scripts that are meant to be called directly I decided for the split.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
scylla-blocktune currently lives in the top level but this is mostly
historical. When the time comes for us to install it, the packaging systems
will copy it to /usr/lib/scylla with the others.
So for consistency let's make sure that it also lives in the scripts
directory.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
After upgrade from 1.7 to 2.0, nodes will record a per-table schema
version which matches that on 1.7 to support the rolling upgrade. Any
later schema change (after the upgrade is done) will drop this record
from affected tables so that the per-table schema version is
recalculated. If nodes perform a schema pull (they detect schema
mismatch), then the merge will affect all tables and will wipe the
per-table schema version record from all tables, even if their schema
did not change. If then only some nodes get restarted, the restarted
nodes will load tables with the new (recalculated) per-table schema
version, while not restarted nodes will still use the 1.7 per-table
schema version. Until all nodes are restarted, writes or reads between
nodes from different groups will involve a needless exchange of schema
definition.
This will manifest in logs with repeated messages indicating schema
merge with no effect, triggered by writes:
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
The sync will be performed if the receiving shard forgets the foreign
version, which happens if it doesn't process any request referencing
it for more than 1 second.
This may impact latency of writes and reads.
The fix is to treat schema changes which drop the 1.7 per-table schema
version marker as an alter, which will switch in-memory data
structures to use the new per-table schema version immediately,
without the need for a restart.
Fixes #3394
Tests:
- dtest: schema_test.py, schema_management_test.py
- reproduced and validated the fix with run_upgrade_tests.sh from git@github.com:tgrabiec/scylla-dtest.git
- unit (release)
Message-Id: <1524764211-12868-1-git-send-email-tgrabiec@scylladb.com>
"
This patch series introduces initial support for writing SSTables in
'mc' format (aka SSTables 3.0).
Currently, the following components are written in 3.0 format:
- Data.db
- Index.db
- Summary.db
(there were no changes to summary files format compared to ka/la)
Other SSTables components are written in the old format for now as they
still need to exist to satisfy post-flush processing.
For now, only rows are written to the data file and indexed. Range
tombstones are not supported.
Writing rows is supported in full with the only exception being counter
cells. All the other features (TTLed data, row/cell level tombstones,
collections, etc) are supported.
Unit tests rely on producing files and binary-comparing them with
'golden' copies that are produced using Cassandra 3.11. This is done to
not block until reading SSTables 3.0 format is implemented.
=======================================
Implementation notes
=======================================
Internally, sstable_writer has been refactored to support multiple
implementations that are instantiated in its constructor based on the
sstable version. Little to no code is shared among sstable_writer_v2 and
sstable_writer_v3 as we only intend to support sstable_writer_v2
alongside sstable_writer_v3 for a single release (to be able to do
rollback on rolling upgrade failure) and then plan to get rid of it
entirely and switch to always writing SSTables in the new format.
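The shape of that refactoring might look roughly like the sketch below: a public writer that picks a private implementation in its constructor. All names here (writer_sketch, impl_v2, impl_v3, sstable_version_sketch) are hypothetical and the implementations are reduced to returning a format name; this is not the real sstable_writer interface.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical subset of the supported on-disk format versions.
enum class sstable_version_sketch { la, mc };

class writer_sketch {
    // Common interface for the version-specific implementations.
    struct impl {
        virtual ~impl() = default;
        virtual std::string format_name() const = 0;
    };
    // Stands in for the old-format writer (sstable_writer_v2 in the text).
    struct impl_v2 final : impl {
        std::string format_name() const override { return "la"; }
    };
    // Stands in for the new-format writer (sstable_writer_v3 in the text).
    struct impl_v3 final : impl {
        std::string format_name() const override { return "mc"; }
    };

    std::unique_ptr<impl> _impl;

    static std::unique_ptr<impl> make_impl(sstable_version_sketch v) {
        switch (v) {
        case sstable_version_sketch::la:
            return std::make_unique<impl_v2>();
        case sstable_version_sketch::mc:
            return std::make_unique<impl_v3>();
        }
        throw std::invalid_argument("unknown sstable version");
    }
public:
    // The implementation is chosen once, based on the sstable version.
    explicit writer_sketch(sstable_version_sketch v) : _impl(make_impl(v)) {}

    std::string format_name() const { return _impl->format_name(); }
};
```

Keeping the two implementations separate behind one interface makes it cheap to delete the old one later, which matches the stated plan.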
The design of sstable_writer_v3 mostly follows that of its precursors
sstable_writer(_v2) and components_writer. Some refactoring and further
code rearrangements are expected in the future but the main code is
there.
"
* 'projects/sstables-30/write-rows/v2' of https://github.com/argenet/scylla:
Add tests for writing data and index files in SSTables 3.0 ('mc') format.
Support for writing SSTables 3.0 ('mc') Data.db and Index.db files - rows only.
Add missing enum values to bound_kind.
Add building blocks for writing data in SSTables 3.0 format.
Refactor sstable_writer to support various internal implementations.
Add is_fixed_length() to data types.
Add mutation_partition::apply_insert() overload that accepts TTL and expiry for row marker.
bound_kind::clustering, bound_kind::excl_end_incl_start and
bound_kind::incl_end_excl_start are used during SSTables 3.0 writing.
bound_kind::static_clustering is not used yet but added for completeness
and parity with the Origin.
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
For any given CQL data type, this member returns whether its values are
of fixed or variable length. This is used by SSTables 3.0 format to only
store the length value for variable-length cells.
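A simplified sketch of how a writer could use such a flag. The function name is hypothetical, and the 4-byte big-endian length prefix is an assumption chosen for brevity; the real format's length encoding is not reproduced here.

```cpp
#include <cstdint>
#include <string>

// Appends one cell value to `out`. A length prefix is written only when the
// type is variable-length; fixed-length values are written bare because the
// reader already knows their size from the type.
// Assumption: the prefix is a 4-byte big-endian integer.
void write_cell_value(std::string& out, const std::string& value, bool fixed_length) {
    if (!fixed_length) {
        auto len = static_cast<std::uint32_t>(value.size());
        for (int shift = 24; shift >= 0; shift -= 8) {
            out.push_back(static_cast<char>((len >> shift) & 0xff));
        }
    }
    out += value;
}
```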
For #1969.
Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
"
This patchset prepares everything for support of both 2.x and 3.x formats and implements reading, from the sstable 3.x
format, a very simple table with just partition keys.
Tests: units (release)
"
* 'haaawk/sstables3/read_only_partitions_v4' of ssh://github.com/scylladb/seastar-dev: (22 commits)
Test for reading sstable in MC format with no columns
Use new mp_row_consumer_m and data_consume_rows_context_m
Introduce mp_row_consumer_m
Rename mp_row_consumer to mp_row_consumer_k_l
Introduce consumer_m and data_consume_rows_context_m
Use read_short_length_bytes in RANGE_TOMBSTONE
Use read_short_length_bytes in ATOM_START
Use read_short_length_bytes in ROW_START
Add continuous_data_consumer::read_short_length_bytes
Reduce duplication with continuous_data_consumer::read_partial_int
Add test for a simple table with just partition key
Add test for reading index
Extract mp_row_consumer to separate header
Make sstable_mutation_reader independent from mp_row_consumer
Make sstable_mutation_reader a template
Make data_consume_context a template
Move data_consume_rows_context from row.cc to row.hh
Decouple sstable.hh and row.hh
Reduce visibility of sstable::data_consume_*
Move data_consume_context to separate header
...
Take DataConsumeRowsContext type as parameter.
This will allow us to implement different context
for reading 3.x files.
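In outline, the parameterization could look like the sketch below. The types are hypothetical placeholders (the real contexts are state machines over the data file); only the idea of selecting the context at compile time is shown.

```cpp
#include <string>

// Hypothetical stand-ins for the per-format parsing contexts.
struct context_k_l_sketch {
    static std::string format() { return "ka/la"; }
};
struct context_m_sketch {
    static std::string format() { return "mc"; }
};

// The reader is written once and takes the context type as a parameter,
// so a 3.x context can be plugged in without touching the reader logic.
template <typename DataConsumeRowsContext>
class mutation_reader_sketch {
public:
    std::string reads_format() const {
        return DataConsumeRowsContext::format();
    }
};
```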
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>