Quantcast
Channel: CodeSection,代码区,Python开发技术文章_教程 - CodeSec
Viewing all articles
Browse latest Browse all 9596

Interview: Liran Zvibel of WekaIO

$
0
0

Joakim is the resident interviewer for the D Blog. He has also interviewed members of the D community for This Week in D and is responsible for the Android port of LDC .


Interview: Liran Zvibel of WekaIO
WekaIO is a San Jose, CA based startup with engineering inTel Aviv, Israel, that has built Matrix, the world’s fastest file system , in D. Recently, they posted impressive numbers in the IO-500 Node Challenge . Liran Zvibel, the co-founder and CEO of WekaIO, has been a regular speaker at DConf, talking about their use of D at DConf 2015 , 2016 , and 2018 .

WekaIO is an expression of a design goal of D, that you can write your prototype quickly and easily in D, then continue working on the same codebase until it reaches production quality , as opposed to prototyping in a different high-level programming language. Liran took some time out of his busy schedule to answer some questions about WekaIO and their use of D.

Liran Zvibel at DConf 2018

Joakim: Tell us about the enterprise storage market that WekaIO is in. How do you use D in your product?

Liran: Any compute environment has a mix of CPU power, networking, and storage: this is uniform across many organization types and sizes, and whether the infrastructure is in-house (“on premises”) or in the public cloud (“The cloud is just somebody else’s computer,” as they say).

The storage component provides the current state and also the history needed by compute. While compute is stateless (how many times have you rebooted your computer to fix a problem?) and the network is ephemeral, storage must be able to keep its state consistently and coherently while providing enough performance for the compute to do its job (otherwise the running jobs are IO-bound, and nobody likes that).

There are three main types of centralized storage systems:

Block storage systems, that provide the abstraction of a local drive that sits remotely. Several systems may get access to their own “volume” on the centralized system. The AWS equivalent for such a system is Elastic Block Store. These volumes are usually not shared between more than one server, and the reason to use them is failure resiliency, reliability, and performance (also some advanced features such as taking a point in time, backup, integration into a VM environment, etc.). File systems, centralized storage that is also shared, and allows several
servers on the network to access the same data. This is the kind of system
that WekaIO provides. Traditionally, people have turned to block-based solutions if they needed high performance, then created a local file system based on that shared volume, but with WekaIO we show that we enable a shareable file system that is even faster than a local file system over block storage. Object storage solutions, these enable storing objects with reduced semantics (no ability to modify, data stored is only eventually consistent, etc) to enable cost savings, and generally don’t care about performance.

When I review storage systems here, I talk mainly about block-based and file system solutions as object storage is much simpler and is implemented

using different methods.

Requirements for storage systems are:

Reliability Performance (low-latency IOPS and throughput) Features

Traditionally, these systems have a “data path” that cares about reliability and performance, and then a “control path” or “management path” that takes care of higher-level features and making reliability work at a higher level.

For many storage systems, the data path is implemented in some system programming language, such as C or C++, and the management path is implemented in a higher-level language such as python, Java, Go, etc. Our previous company, XIV (that IBM acquired in 2007) used a private version of C that had polymorphism, generic programming, and integrated RPC, for code that was a mixture of XML, C, and weird header files.

At WekaIO, we use the D programming Language to implement both the data path and the control path, as we can use a single language to get both machine-level access with high performance when needed, and a higher-level language for the features.

Joakim: How did you end up choosing D? You mentioned at DConf that you

initially tried a combination of Python and C++.

Liran: The first part of the work we had done on the C++/Python combo was to work on an efficient in-process RPC mechanism between Python and C++, that would

allow us to refer to the same object from C++ and Python in an efficient way. That way the same object could have run in the C++ context when needing performance, and in Python when needing brevity. The idea was to start implementing the system in Python then convert pieces to C++ as needed, where inter-process communication (inter or intra servers) would be done in C++ only. At that time, we had a prototype of the system implemented in Python and we started working on the C++ part, and the RPC definition first.

When we discovered D, we realized we may be able to get a single language to handle both the high-level and performance requirements, and we started by running some pet projects to verify that this was indeed a working language. The next phase was implementing our RPC over D (which was much more elegant than the C++ version), and a tracing system that would allow us to debug (the tracing system was reviewed in my DConf 2015 talk).

The biggest limitation we had initially was the availability of an optimizing compiler, as the reference compiler, DMD, does not provide assembly that is equivalent to LLVM or even GCC when running on modern x86 processors. After DConf 2015, and with the help of David Nadlinger, we were able to get LDC (the D compiler with an LLVM backend) to compile our code and generate results that met our needs.

Joakim: Andrei Alexandrescu, one of the co-architects of the D language, blogged about visiting you in Tel Aviv : how your code is very compact and that you added new features without growing the codebase much. What key features of D do you use to accomplish this and the speed and other benefits of your storage system?

Liran: The strongest part of D is its generic programming. We use generic programming in two ways, that usually are contradictory but with D they actual

Viewing all articles
Browse latest Browse all 9596

Trending Articles