mirror of
https://github.com/scylladb/scylladb.git
synced 2026-06-01 12:36:56 +00:00
f917f73616782673153531dbbe8ad999c42698db
"Our domain objects have a schema-version-dependent format, for efficiency reasons. The data structures which map between columns and values rely on column ids, which are consecutive integers. For example, we store cells in a vector where the index into the vector is an implicit column id identifying the table column of the cell. When columns are added or removed, the column ids may shift. So, to access mutations or query results, one needs to know the version of the schema corresponding to them.
In the case of query results, the schema version to which a result conforms will always be the version which was used to construct the query request. So there's no change in the way query result consumers operate to handle schema changes. The interfaces for querying needed to be extended to accept a schema version and do the conversions if necessary.
Shard-local interfaces work with a full definition of the schema version, represented by the schema type (usually passed as schema_ptr). Schema versions are identified across shards and nodes with a UUID (table_schema_version type). We maintain a schema version registry (schema_registry) to avoid fetching definitions we already know about. When we get a request using an unknown schema, we need to fetch the definition from the source, which must know it, to obtain a shard-local schema_ptr for it.
Because mutation representation is schema-version dependent, mutations of different versions don't necessarily commute. When a column is dropped from the schema, the dropped column is no longer representable in the new schema. It is generally fine to not hold data for dropped columns; the intent behind dropping a column is to lose the data in that column. However, when merging an incoming mutation with an existing mutation, both of which have different schema versions, we'd have to choose which schema should be considered "latest" in order not to lose data.
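The column-id shift described above can be illustrated with a minimal sketch. The `schema`, `row`, and `upgrade` names below are hypothetical, simplified stand-ins chosen for illustration; they are not Scylla's actual types or functions. The sketch assumes columns keep their names across versions, so a row can be re-expressed under another schema by matching names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema version: the position of a name in
// `columns` is its column id.
struct schema {
    std::vector<std::string> columns;

    // Returns the column id for `name`, or nullopt if this version lacks it.
    std::optional<std::size_t> column_id(const std::string& name) const {
        for (std::size_t i = 0; i < columns.size(); ++i) {
            if (columns[i] == name) {
                return i;
            }
        }
        return std::nullopt;
    }
};

// Cells are addressed by column id only, so a row cannot be interpreted
// without the schema version it was written under.
using row = std::vector<std::optional<std::string>>;

// Re-express a row written under `from` in terms of `to`, matching columns
// by name. Cells of columns that `to` does not have are dropped.
row upgrade(const row& r, const schema& from, const schema& to) {
    assert(r.size() <= from.columns.size());
    row out(to.columns.size());
    for (std::size_t id = 0; id < r.size(); ++id) {
        if (auto new_id = to.column_id(from.columns[id])) {
            out[*new_id] = r[id];
        }
    }
    return out;
}
```

With a first version holding columns `a, b, c` and a second holding `a, c, d`, the cell of `c` moves from id 2 to id 1 and the cell of `b` is lost. Reading the old row with the new schema without such a conversion would attribute `b`'s value to `c`.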
Schema changes can be made concurrently in the cluster and initiated on different nodes, so there is not always a single notion of the latest schema. However, schema changes are commutative, and by merging changes nodes eventually agree on the version. For example, adding column A (version X) on one node and adding column B (version Y) on another eventually results in a schema version with both A and B (version Z). We cannot tell which version among X and Y is newer, but we can tell that version Z is newer than both X and Y. So the solution to the problem of merging conflicting mutations could be to ensure that such a merge is performed using a schema which is superior to the schemas of both mutations.
The approach taken in the series for ensuring this is as follows. When a node receives a mutation of an unknown schema version, it first performs a schema merge with the source of that mutation. The schema merge makes sure that the current node's version is superior to the schema of the incoming mutation. Once a version has been synced with, it is remembered as such and won't be synced with on later mutations. Because of this bookkeeping, schema versions must be monotonic; we don't want table altering to result in any earlier version, because that would cause nodes to avoid syncing with them. The version is a cryptographically-secure hash of schema mutations, which should fulfill this purpose in practice.
TODO: It's possible that the node is already performing a sync triggered by broadcast schema mutations. To avoid triggering a second sync needlessly, the schema merging should mark incoming versions as being synced with.
Each table shard keeps track of its current schema version, which is considered to be superior to all versions which are going to be applied to it. All data sources for a given column family within a shard have the same notion of current schema version. Individual entries in cache and memtables may be at earlier versions, but this is hidden behind the interface.
The entries are upgraded to the current version lazily on access. Sstables are immutable, so they don't need to track the current version. Like any other data source, they can be queried with any schema version.
Note, the series triggered a bug in the demangler: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68700"
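The sync-before-apply bookkeeping described in the message above can be sketched as follows. This is a simplified illustration under stated assumptions, not the series' implementation: `table_shard`, `ensure_synced`, `merge`, and the `fetch` callback are hypothetical names, a version id is a plain string instead of a UUID, a schema is reduced to a set of column names, and only additive changes are modeled so that a merge is a set union.

```cpp
#include <functional>
#include <set>
#include <string>
#include <unordered_set>

// Hypothetical stand-ins for a schema version id and a schema definition.
using version_id = std::string;
using column_set = std::set<std::string>;

class table_shard {
    column_set _current;                     // current schema of this shard
    std::unordered_set<version_id> _synced;  // versions already merged in
public:
    int merges_performed = 0;

    // Set union is commutative, so the order of merges does not matter.
    void merge(const column_set& remote) {
        _current.insert(remote.begin(), remote.end());
        ++merges_performed;
    }

    // Call before applying a mutation written under version `v`.
    // `fetch` stands in for asking the mutation's source for the definition.
    void ensure_synced(const version_id& v,
                       const std::function<column_set(const version_id&)>& fetch) {
        if (_synced.count(v)) {
            return;  // the current schema is already superior to v
        }
        merge(fetch(v));
        _synced.insert(v);
    }

    const column_set& current() const { return _current; }
};
```

Syncing with version X (column A) and then version Y (column B) leaves the shard with both columns, which corresponds to version Z in the example from the message. A later mutation at version X triggers no further merge because X is already recorded as synced.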
#Scylla
##Building Scylla
In addition to the packages required by Seastar, Scylla requires the following packages.
###Submodules
Scylla uses submodules, so make sure you pull the submodules first by doing:
git submodule init
git submodule update --recursive
###Building and Running Scylla on Fedora
- Install required packages:
sudo yum install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel
- Build Scylla
./configure.py --mode=release --with=scylla --disable-xen
ninja-build build/release/scylla -j2 # you can use more cpus if you have tons of RAM
- Run Scylla
./build/release/scylla
- Run Scylla with one CPU and ./tmp as the data directory
./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1
- For more run options:
./build/release/scylla --help
###Building Fedora RPM
As a pre-requisite, you need to install Mock on your machine:
# Install mock:
sudo yum install mock
# Add user to the "mock" group:
sudo usermod -a -G mock "$USER" && newgrp mock
Then, to build an RPM, run:
./dist/redhat/build_rpm.sh
The built RPM is stored in the /var/lib/mock/<configuration>/result directory.
For example, on Fedora 21 mock reports the following:
INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result
###Building Fedora-based Docker image
Build a Docker image with:
cd dist/docker
docker build -t <image-name> .
Run the image with:
docker run -p $(hostname -i):9042:9042 -i -t <image-name>
##Contributing to Scylla
Do not send pull requests.
Send patches to the mailing list address scylladb-dev@googlegroups.com. Be sure to subscribe.
In order for your patches to be merged, you must sign the Contributor's License Agreement, protecting your rights and ours. See http://www.scylladb.com/opensource/cla/.