Blockchain technology
1 Dec, 2017
Baldolino Calvino

In this post, we will talk about the possible use of blockchain technology in scientific communication. Via Twitter, I came across an article on the Digital Science news blog about the potential of blockchain technology in scientific research. Digital Science is a company founded by the Englishman Timo Hannay and operated by the publishing mega-group Holtzbrinck, which controls some of the world’s largest publishing houses, such as Springer Nature and Macmillan. After working at Nature and founding and directing Digital Science and Overleaf, Hannay is now a non-executive director of SAGE. In other words, this is a game of giants.

Digital Science focuses on producing software and services for researchers and research institutions. In that article, they present the Blockchain for Research Digital Science Report [1], a paper that describes the current view of the possible impact of blockchain technology on academic research and communication. In addition, they launch a special funding grant of up to $30k for research on blockchain use in science: the Catalyst Grant for the Blockchain. Applications are open until January 15, 2018 and range from proposals for creating cryptocurrencies or crypto-protocols to creating groups that influence how the new technology is implemented in scientific research. This is a sure sign that mainstream scholarly publishing is keenly watching the blockchain and is eager to take part in this nascent application of the technology.

But what, after all, is the blockchain? The name can be read literally as a chain of blocks, which corresponds with some fidelity to its basic structure. The block chain is a “chain” of digital structures called blocks, which are data packets with a standardized structure: each block contains a cryptographic pointer (hash pointer) that connects it to the immediately preceding block, a timestamp and a stored data set. Each block is thus a small database or, to be more exact, the set of blocks is a database distributed into discrete entities. The big conceptual leap in the block chain is that a block cannot be changed successfully without also changing every block that comes after it. To alter the chain, the cryptographic pointer of each of those blocks (created with the SHA-256 hash function) must be recalculated. If the chain is long enough and each block is costly to produce, the computational power required for such a change is far from trivial, making the whole operation unfeasible. To make things even more difficult, the block chain has no central server and can be stored on an arbitrary number of servers (nodes). An equally arbitrary fraction of these nodes compete with each other to perform the cryptographic calculations necessary to generate the next block (this is what is called “mining the block chain”) in return for a reward.
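The hash-pointer idea can be made concrete with a few lines of Python. What follows is only a minimal sketch under my own assumptions: the field names (`prev_hash`, `timestamp`, `data`), the JSON serialization and the helper functions are hypothetical choices for illustration, not the layout of Bitcoin or of any real block chain.

```python
import hashlib
import json
import time


def block_hash(block):
    """SHA-256 digest of a block's contents, serialized deterministically."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(data, prev_hash, timestamp=None):
    """Build a block holding data, a timestamp and the hash of its predecessor."""
    return {
        "prev_hash": prev_hash,
        "timestamp": time.time() if timestamp is None else timestamp,
        "data": data,
    }


def build_chain(records):
    """Chain the records; the first block points at a dummy all-zero hash."""
    chain = []
    prev_hash = "0" * 64
    for i, record in enumerate(records):
        # A fixed counter stands in for the timestamp to keep the demo repeatable.
        block = make_block(record, prev_hash, timestamp=i)
        chain.append(block)
        prev_hash = block_hash(block)
    return chain


def first_broken_link(chain):
    """Index of the first block whose pointer no longer matches, or None."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain = build_chain(["result A", "result B", "result C"])
print(first_broken_link(chain))  # intact chain: no broken link

chain[0]["data"] = "forged result"
print(first_broken_link(chain))  # the pointer stored in block 1 no longer matches
```

Changing the first block silently is not enough: the forger would also have to recompute the pointer in every later block, which is exactly the work that a real network makes expensive.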

Illustration by Matthäus Wander (Wikimedia)

In the case of the first implementation of a block chain in history, the Bitcoin digital currency, the difficulty of the calculations needed to create a new block is controlled by the system ruleset, and each node tries to find a valid answer by varying an arbitrary number written to the block, the nonce. The cryptographic operation required for this follows a protocol called proof-of-work and is an example of a brute-force approach to the problem. The reward comes in the form of cryptocurrency units. That is, the “miners” (servers that compete through intensive calculations) who “find” a block (beating the digital competition and generating the block before the others) receive digital coins in return. The block chain network is decentralized and accumulates ever more computational capacity through the competition between “miners”. Thus, over time, it becomes practically unfeasible to alter the block chain. As a consequence, data written to the blocks of the chain is treated as unalterable and the records inserted in the block chain are said to be immutable. This solves a big old problem in information technology: how can one trust digital data obtained from an arbitrary server? Normally, this requires a “centralized authority”, corresponding to a real-world institution, that is trusted by all parties involved. The openings for potential problems are obvious. Satoshi Nakamoto, the elusive creator of blockchain and Bitcoin, created a solution by combining existing concepts such as a “distributed network” and a “cryptographic trust proof” to ensure that the “chain of blocks” could be trusted by anonymous users without the presence of a central authority. Obviously, such reliability is not inherent in any block chain and will depend on its characteristics and implementation. The Bitcoin cryptocurrency has so far survived with no history of security issues in its block chain. The same can no longer be said of the several hundred “clone” cryptocurrencies (altcoins) that emerged in the wake of Bitcoin’s success, many of them clearly fraudulent and some involved in serious security issues.
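The nonce search described above can be sketched as a toy proof-of-work. This is a simplification under stated assumptions: the functions `mine` and `verify`, the `|` separator and the “leading zeros in the hex digest” rule are my own illustrative choices. Bitcoin itself compares a double SHA-256 hash of the block header against a numeric target.

```python
import hashlib


def digest_for(block_data, nonce):
    """SHA-256 hex digest of the block data combined with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()


def mine(block_data, difficulty):
    """Brute force: try nonces 0, 1, 2, ... until the digest starts with
    `difficulty` hexadecimal zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = digest_for(block_data, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data, nonce, difficulty):
    """Checking a claimed solution takes a single hash computation."""
    return digest_for(block_data, nonce).startswith("0" * difficulty)


nonce, digest = mine("block with some transactions", difficulty=4)
print(nonce, digest)
print(verify("block with some transactions", nonce, difficulty=4))
```

The asymmetry is the point: finding the nonce takes tens of thousands of attempts on average at this toy difficulty, and each extra zero multiplies that by sixteen, while anyone can check the answer with one hash.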

In the case of Bitcoin, the data stored in the blocks are the transactions between network users, summarized in each block by a structure called the Merkle root and forming a “digital ledger”, but virtually any type of data can be stored in Bitcoin’s “block chain” or in similar chains. There are already companies, in the early stages of deployment, that offer a “notarial service” where documents can be “permanently registered” on the network. Researchers can already use this notarial function to record their publications (thus obtaining an immutable timestamp that proves their claim of priority over a discovery). However, what is envisioned for scientific research is much broader and covers the whole cycle of research data production and publication.
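To give an idea of how a single Merkle root can stand in for many records, here is a minimal sketch. It assumes plain SHA-256 and hypothetical document contents. Bitcoin applies SHA-256 twice and works on transaction identifiers, so this code does not reproduce real Bitcoin roots. It does follow Bitcoin's practice of duplicating the last hash when a level has an odd number of entries.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(items):
    """Hash the items, then hash them pairwise, level by level,
    until a single digest (the root) remains."""
    if not items:
        raise ValueError("need at least one item")
    level = [sha256(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


# Hypothetical research artifacts to be "notarized" together.
documents = [b"manuscript v1", b"raw data", b"analysis script"]
root = merkle_root(documents)
tampered = merkle_root([b"manuscript v1", b"raw data (edited)", b"analysis script"])

print(root)
print(root == tampered)  # False
```

Only the root needs to be written to the chain: any later change to any of the documents yields a different root, so the short value recorded at a given time vouches for the whole set.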


My personal interest in using the blockchain recently sprang from a somewhat disturbing finding. Like many, I use the Git version control program to organize my work locally and the GitHub code repository to store my data, research, etc. Git efficiently tracks each version of your work by assigning it a SHA-1 hash. Thus, it is impossible to tamper with your code or data without your knowledge. Or so I thought, naively. As strikingly demonstrated by Mike Gerwitz in his Horror Story, it is not only possible but really easy to change your data and even the timestamps in your Git history. This is because Git has these functions built in, and they are not difficult to use (a quick and clear tutorial can be found on the Link Intersystems blog). A scenario came to my mind in which a research director implicated in fraud nervously tampers with his databases, including the personal accounts of his scientists, in order to escape conviction. Very sinister. How can you trust your own database like that, even with the support of a centralized external institution like GitHub?
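To see what Git's hashes do and do not protect, the identifier Git gives to a file's contents can be reproduced by hand. In Git's default object format, a blob ID is the SHA-1 of the header `blob <size>` followed by a zero byte and then the content, which is what `git hash-object` computes. The function name `git_blob_id` and the sample contents below are my own, for illustration only.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 of "blob <size>\\0" + content, as in Git's default object format."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = git_blob_id(b"measurement: 4.2\n")
edited = git_blob_id(b"measurement: 4.7\n")

print(original)
print(edited)
print(original == edited)  # False
```

The hash faithfully identifies the content, so an edited file always gets a new ID. But a rewritten history is simply a new, internally consistent set of IDs: unless someone outside the repository recorded the old ones, nothing shows which version came first. That external, tamper-resistant record is what a block chain timestamp could provide.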

That was when I remembered the blockchain and its immutability. It is perfect for giving research data from sensitive projects (such as research on new drugs for humans) or from public projects like Open Science a degree of trust that can hardly be achieved otherwise. As I am a fan of the concept of Open Science, I soon understood the crucial role that the block chain can play as a decentralized provider of trust. This would free open research from the doubts about possible plagiarism that surround it. This principle of “confidence building” can be used throughout the lifecycle of scientific research, unifying data collection, analysis and presentation into a single chain of reliability.

But it doesn’t end there. Another potential impact of blockchain technology on scientific research is direct funding with cryptocurrencies. This would open up an interesting perspective: the same block chain used to record and publish scientific data could also be used to fund those same scientific activities. Institutions wishing to foster research could thus audit funded projects in real time, using their own funding tool. It’s almost surreal, isn’t it?

This certainly explains the great interest of the academic publishing industry in this technology. Whoever masters it will lead a race that promises to change the way the scientific world works in a revolutionary and lasting way. But does the block chain already have applicability in science? To try to find out more, I went to the Blockchain for Science project page, whose creator, Soenke Bartling, was interviewed for the Digital Science document. In his FAQ, the project creator is already quite visionary, coming up with several bold notions (even stating that his “final frontier” is “multi-planetary humanity”), but he plainly concedes that decentralized, immutable, blockchain-based databases and the like will be useless unless scientists are educated about their advantages and a movement is created. The authors are in a position to provide this education to the academic world. So we can conclude from this project, itself still in a conceptual phase, that the implementation of the block chain in science is at a nascent stage, prior to any practical application.

However, I can see a point where even the authors of this project are losing sight of some possibilities. According to them, problems of information mismatch, lack of prior information and nonexistent metadata, among others, will eventually be resolved. They then ask “how do you organize the sharing of this data?”, suggesting that the block chain will only be of use in that next big step. In fact, the Blockchain for Science project misses a great opportunity to use the block chain even before the information dissemination step. What unifies all these issues is a problem of trust among multiple interacting parties, a situation in which the block chain has been used very successfully (building a cryptocurrency requires solving problems such as double spending). Therefore, I believe that an appropriate implementation of the block chain could solve, among others, problems related to database reliability (including omitted data), the reliability of scientific publication (improving or replacing the peer review system) and the transparency of science funding (perhaps creating a system where only results, even theoretical ones, authorize the funding of a project, and judgment is not centralized).

Will we ever see this day in our lifetime?

References:

  1. Science, Digital; van Rossum, Joris (2017): Blockchain for Research. figshare. doi:10.6084/m9.figshare.5607778.v1