
Development and challenge of data sharing based on block chain technology

2020-12-28


(Information Security Research Center, Harbin Engineering University, Harbin 150001, China)

Abstract: The emergence of Bitcoin has drawn wide attention to blockchain technology. Its high security has led to wide use in finance, network security, the Internet of Things, and other areas. At the same time, the explosive growth of data brings many opportunities for data sharing, but traditional data sharing methods cannot cope with the increasingly complex network environment and the massive growth of data. This paper covers five aspects: ① the motivation for applying blockchain technology to data sharing scenarios; ② the typical application architecture of blockchain technology in data sharing scenarios; ③ the typical applications of secure storage, access control, and auditing in the fields of smart medical treatment, smart transportation, smart cities, and supply chain management; ④ the opportunity of combining blockchain-based data sharing mechanisms with emerging technologies such as artificial intelligence; ⑤ future research directions from the perspective of application expansion and supervision.

Key words: blockchain; data sharing; smart medical treatment; smart traffic; smart city; supply chain management

0 Introduction

In recent years, the rapid development of blockchain technology has attracted worldwide attention from various sectors. Blockchain technology, as the foundation for cryptocurrencies, has contributed to the success of Bitcoin[1], the most popular cryptocurrency. Following the success of Bitcoin, Ethereum[2], proposed by Vitalik Buterin and formally specified by Gavin Wood, came into being. In essence, a blockchain is a tamper-resistant distributed ledger in which transaction information and other data can be recorded. Because blockchain itself does not require a third-party authentication mechanism, it can ensure the security of transactions in an untrusted network environment. At present, blockchain has been widely used in various scenarios in a number of industries, for example, AI[3], cyber security[4], the Internet of Things (IoT)[5-6], edge computing[7-8], finance[9], banking[10], and insurance[11]. Although blockchain technology has been widely applied, there are still untapped opportunities[12].

In the current information age, data have become the most valuable resource, and current scientific research is based on data collection. With the development of hardware, the widespread adoption of devices has led to an explosive growth of data. This provides excellent opportunities to explore data sharing. Until now, there have been few studies exploring ways to share data. Most existing data sharing modes use centralized data management. In an era of explosive data growth, centralized data management has many disadvantages, such as high response latency, lack of security, and a large workload[13]. Unlike the centralized approach, the blockchain approach to data management is distributed, and data are verified by the participating nodes reaching a consensus. In Ref. [14], the author compares traditional data sharing, which uses the cloud for third-party storage, with data sharing based on blockchain, and finds that the blockchain-based scheme performs more reliably in data tracking, data storage, and data sharing. In Ref. [12], the authors integrated blockchain technology into data sharing by using a distributed hash table (DHT) to store encrypted data off the chain and storing the addresses in the blockchain. In Ref. [15], Wang, et al. proposed a decentralized storage system based on blockchain. The system combines the InterPlanetary File System (IPFS), Ethereum, and attribute-based encryption (ABE). In this framework, the data provider can distribute a key to the data consumer and encrypt the data to be shared by specifying an access policy.

In this article, we first survey existing research to summarize the types and forms of shared data. We found that data sharing has many requirements in two aspects, namely user security and information security. In view of the problems that exist in data sharing, we identify several advantages of applying blockchain to data sharing. This study focuses on summarizing the role of blockchain in data sharing in smart medical treatment, intelligent traffic, smart cities, supply chain management, and other domains, and proposes a common data sharing architecture based on blockchain. We also discuss the application of blockchain for data sharing in the emerging field of AI. A large number of studies have shown that the application of blockchain for data sharing in AI has taken shape and has enormous potential waiting to be tapped. Although the application of blockchain to data sharing is in full swing, it has also exposed many problems. We found a lack of laws and regulations governing the application of blockchain to data sharing, which could lead to illegal applications and eventually to considerable data leakage. This could cause severe privacy disclosure disasters[16].

The structure of the remainder of this paper is shown in Fig.1. The second section mainly introduces the definition of data sharing, the basics of blockchain, and the development process of blockchain. The third section analyzes the requirements for data sharing and proposes the advantages of blockchain for data sharing. The fourth section introduces a common architecture for data sharing based on blockchain. The fifth section focuses on the application of blockchain to data sharing. The sixth section puts forward the development trend and current situation of blockchain in emerging fields. At the end, the paper summarizes and puts forward the research direction that needs attention, and a plan for further research.

Fig.1 Organization of the article

1 The theory and development of blockchain

1.1 Blockchain infrastructure

What is the nature of blockchain? As the name implies, a blockchain is a series of blocks linked together. So, what content is recorded in a block, and how are blocks linked? First, each block can be considered a page of a ledger, and a block can record any information, such as transactions, various types of data, and even code. Code recorded in a block can act as a smart contract, as described later. Take the Bitcoin blockchain, a mature blockchain system, as an example. In the Bitcoin system, the addresses of buyers and sellers as well as specific transaction information are recorded in the blocks. Second, the nodes participating in the blockchain system package the information they want to record, and after a series of processes, a block is formed. However, not every packaged block is recorded in the chain. A candidate block is broadcast to the blockchain network and verified by the nodes in the network, and it is recorded in the chain only after it passes this verification. Blocks are ordered sequentially according to the timestamps recorded in them. Fig.2 shows the transaction model of Bitcoin.

Fig.2 Bitcoin trading model
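The way blocks are linked can be illustrated with a minimal Python sketch. It assumes a simplified block layout (the field names `prev_hash`, `timestamp`, and `records` are hypothetical and are not Bitcoin's actual block format): each block stores the hash of its predecessor, so editing an earlier block breaks every later link.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash the block's full contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_block(prev_hash: str, records: list, timestamp: float) -> dict:
    """Package records into a block that points at its predecessor."""
    return {"prev_hash": prev_hash, "timestamp": timestamp, "records": records}


def chain_is_valid(chain: list) -> bool:
    """Every block must reference the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


genesis = make_block("0" * 64, ["genesis"], 0.0)
b1 = make_block(block_hash(genesis), [{"from": "alice", "to": "bob", "amount": 1}], 1.0)
b2 = make_block(block_hash(b1), [{"from": "bob", "to": "carol", "amount": 1}], 2.0)
chain = [genesis, b1, b2]
print(chain_is_valid(chain))   # True

# Editing an earlier block changes its hash, so the link stored in b2 breaks.
b1["records"][0]["amount"] = 100
print(chain_is_valid(chain))   # False
```

This sketch covers only the linking of blocks; broadcasting and verification by other nodes are omitted.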

So, why is a blockchain immutable? Take the Bitcoin system as an example. Bitcoin adopts a proof-of-work (PoW) mechanism, in which computing power is the measure that determines who obtains the bookkeeping right. If attackers want to modify information in the blockchain, they need to control a majority of the computing power in the system (the so-called 51% attack), which would consume enormous resources and is beyond what a single party can normally achieve. Therefore, the proof-of-work mechanism makes the blockchain practically immutable.
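The cost asymmetry behind proof of work can be sketched in a few lines of Python. This is a toy under stated assumptions: the header string is a placeholder, difficulty is expressed as a number of leading zero hex digits, and a single SHA-256 is used, whereas Bitcoin applies double SHA-256 to a binary header and compares against a numeric target.

```python
import hashlib


def mine(header: str, difficulty: int) -> int:
    """Search for a nonce whose hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not hashlib.sha256(f"{header}{nonce}".encode()).hexdigest().startswith(prefix):
        nonce += 1
    return nonce


def verify(header: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed nonce costs a single hash."""
    digest = hashlib.sha256(f"{header}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


header = "prev_hash|merkle_root|timestamp"   # placeholder header contents
nonce = mine(header, difficulty=4)           # about 16**4 = 65,536 attempts on average
print(verify(header, nonce, 4))              # True

# A modified header almost never fits the old nonce (chance about 1 in 65,536),
# so an attacker has to redo the search for the edited block and its successors.
print(verify(header + "tampered", nonce, 4))
```

Finding the nonce takes many hash evaluations while checking it takes one, which is why rewriting history requires out-computing the rest of the network.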

1.2 Development stage and typical application of blockchain technology

(1)Blockchain 1.0: Bitcoin[1]

In 2008, Satoshi Nakamoto proposed Bitcoin, a peer-to-peer electronic cash system, in response to the global economic crisis and inflation. Bitcoin is also known as blockchain 1.0. Fig.3 shows the block structure of Bitcoin.

Fig.3 Block structure of Bitcoin

(2)Blockchain 2.0: Ethereum[2]

Ethereum is called blockchain 2.0, and is essentially a transaction-based state machine. The Ethereum state machine starts from a genesis state without any transactions on the network and moves from this initial state to a new state each time transactions are executed. The state reached after the latest transaction is the current state of Ethereum. Ethereum is currently the second-largest blockchain in the world after Bitcoin: as of 12:00 noon on March 3, 2019, the price of ETH was US$135.17, and the market value was about US$14.2 billion. Fig.4 shows the Ethereum state transition.

Fig.4 Ethereum state transition
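The state-machine view can be made concrete with a small Python sketch. It assumes a drastically simplified world state (a mapping from account name to balance) and a hypothetical `Tx` type; the real Ethereum state also includes nonces, contract code, storage, and gas accounting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    sender: str
    recipient: str
    value: int


def apply_tx(state: dict, tx: Tx) -> dict:
    """Return the next state; reject transfers the sender cannot afford."""
    if tx.value < 0 or state.get(tx.sender, 0) < tx.value:
        raise ValueError("invalid transaction")
    new_state = dict(state)   # the previous state is left untouched
    new_state[tx.sender] = new_state.get(tx.sender, 0) - tx.value
    new_state[tx.recipient] = new_state.get(tx.recipient, 0) + tx.value
    return new_state


genesis_state = {"alice": 10, "bob": 0}
state = genesis_state
for tx in [Tx("alice", "bob", 4), Tx("bob", "alice", 1)]:
    state = apply_tx(state, tx)
print(state)   # {'alice': 7, 'bob': 3}
```

Replaying the same transactions from the same genesis state always yields the same final state, which is what lets independent nodes agree on it.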

(3)Blockchain 3.0: Hyperledger Fabric[18]

Hyperledger Fabric was originally contributed to the Hyperledger project by Digital Asset and IBM, and the project is hosted by the Linux Foundation. It is a well-known implementation of the blockchain network framework. As a basis for developing applications or solutions with a modular architecture, Hyperledger Fabric supports plug-and-play components such as consensus and membership services. Hyperledger Fabric uses container technology to run smart contracts called chaincode, which contain the system’s application logic.

1.3 Blockchain classification

In many existing studies, the blockchain is divided into three categories: public blockchain, private blockchain, and consortium blockchain. Table 1 compares the attributes of the three types of blockchain.

Table 1 Comparison of the properties of three types of blockchain

Public blockchain: In a public blockchain, the data recorded by the blockchain are publicly visible, and every member of the blockchain can participate in the consensus process. Every node on the public blockchain can freely join and exit the network and participate in reading and writing data.

Consortium blockchain: A consortium blockchain is managed by multiple institutions, each of which selects a representative (node) to manage the blockchain. It sits between the public and private blockchains and is only partially decentralized. Only a preselected set of nodes participates in the consensus process of the consortium blockchain.

Private blockchain: In a private blockchain, only nodes from a particular organization are allowed to join the consensus process. A private blockchain is fully controlled by one organization and is therefore considered a centralized network.

2 Motivation for applying blockchain technology in data sharing

2.1 Data sharing

In the present information age, data have become the most valuable resource. How to achieve data sharing in different scenarios has become a popular topic. So, what is data sharing? Data sharing means that different users can read other people’s data and perform various operations and analyses on them. As the name implies, the most important parts of data sharing are data and sharing. In terms of data, what kind of data can be shared? With the massive circulation of data in the market, we divide data into two categories according to a demand analysis of the data themselves: message and record. The forms of data sharing differ with the type of data shared. Nowadays, there are two mature forms of data sharing: public announcement and remote use. Table 2 collates the forms of data sharing for different data types.

Table 2 Forms of data sharing for different data types

(1)Types of shared data

The shared data can be divided into two categories: message and record.

Record: According to the data requirements of various scenarios in the market, the data in the record class mainly represent personal medical information.

Message: The data in the message class contain real-time information, for example, traffic information at a certain time.

(2)Forms of sharing data

Public announcement: Some data sharing is in the form of public release, which does not require any authorized access or encryption of the data.

Remote use: In the Internet of Things, many devices are widely scattered, and their data are generally stored in the cloud with the help of third parties[19-20]. In this case, data sharing needs to be authorized, and only authorized people have the right to perform different types of operations on the data. This form of sharing is remote use.

2.2 Typical requirements for data sharing

According to a large number of research surveys, the traditional way to share data has been unable to meet the present demand for data sharing. We summarize the current requirements for data sharing as follows.

(1)User security

Anonymous transaction: To protect their privacy, users who participate in data sharing usually do not want others to know their true identity. However, with traditional data sharing, users can be located by IP address, which can reveal their real identity. Anonymity in data sharing therefore requires that a user’s true identity is not disclosed.

Traceability: With today’s data explosion, a wide variety of data types emerge in an endless stream, and the sources of data are diverse. When data turn out to be incorrect, it may become important to trace the source. Who did the recording? When? These questions can be answered by tracing the source. However, few data sharing methods guarantee that data can be traced. Therefore, traceability in data sharing has become an urgent problem to be solved.

Identity authentication: In an untrustworthy network environment, network attackers can be found everywhere, and avoiding network attacks has become a serious issue. An attacker in the network can present a “false” identity to modify and access shared data, which creates a bottleneck in the development of data sharing. Therefore, authenticating the identity of users is an important function.

Undeniable: In an untrustworthy sharing environment, the two parties who are sharing data have no way of knowing each other’s identities. This poses a significant challenge for data-sharing transactions. The data sharing parties must not be able to deny the transactions they initiate, and the transactions should be executed automatically. This calls for undeniability (non-repudiation).

(2)Data security

Data confidentiality: Traditional data sharing cannot resist the various emerging types of network attacks. When an attack on a network is successful, much of the shared data are compromised. This represents an irreparable loss to the data owner. Using suitable technology to encrypt or sign the data can effectively prevent this situation.

Non-tamperability: When data are shared in a single way that relies on a third party, malicious attackers may modify the data in transit and pass the result off as the correct data. Ensuring that the data are tamper-proof then becomes a difficult problem.

(3)Use convenience

Automatic transactions: In sharing transactional data, payment may be involved between the data requester and the data provider: when the data requester receives the data, the data provider should obtain the corresponding compensation. In this case, automatic transactions must be supported.

Complex logic: In the face of more complex sharing scenarios, common sharing rules may not be practical for some users, and some users need to customize the data sharing rules. At present, traditional data sharing methods cannot meet this demand.

2.3 The advantages of blockchain technology in data sharing applications

(1)Privacy protection

Take the popular blockchain 1.0 system, Bitcoin, as an example. For a user participating in the Bitcoin system, running the wallet software generates a key pair using elliptic curve cryptography, and an address belonging to that user is derived from the public key. In any transaction, the corresponding transfer is made to this address. The address is unique and does not reveal any personal information. In the era of blockchain 2.0, with the rise of Ethereum, the deployment of smart contracts provides a series of conveniences for data sharing. A smart contract is essentially code that encodes logic, so smart contracts can be used to authenticate users and to decide whether data are available to a given user. In Ref. [20], Yue, et al. proposed a sharing model for big data based on smart contracts. This model implements data authentication by designing smart contracts. Experiments under this model showed that the probability of a successful attack is extremely small. This verified that applying blockchain technology to data sharing can counter security problems in the network and maintain the security of data.
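The idea that an address is derived from a public key and carries no personal information can be sketched as follows. This is a deliberately simplified illustration: the "public key" is 33 random bytes rather than a real elliptic-curve key, and the address is a truncated SHA-256 digest, whereas Bitcoin hashes the public key with SHA-256 followed by RIPEMD-160 and encodes the result with a checksum. The function names are hypothetical.

```python
import hashlib
import secrets


def placeholder_pubkey() -> bytes:
    """Stand-in for a public key: 33 random bytes, NOT a real elliptic-curve key."""
    return secrets.token_bytes(33)


def address_from_pubkey(pubkey: bytes) -> str:
    """Simplified address: first 20 bytes of SHA-256(pubkey), hex-encoded."""
    return hashlib.sha256(pubkey).digest()[:20].hex()


alice_pub = placeholder_pubkey()
addr = address_from_pubkey(alice_pub)
print(len(addr))                               # 40 hex characters
print(addr == address_from_pubkey(alice_pub))  # True: the derivation is deterministic
```

Because the address is only a hash of key material, other participants can send to it and verify it without learning the owner's name or network identity.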

(2)Immutability

As a typical representative of the public blockchain, all transactions in the Bitcoin system are publicly visible. Under the proof-of-work (PoW) mechanism of the Bitcoin system, tampering with records in the blockchain requires controlling a majority of the network’s computing power (a 51% attack), which consumes enormous resources. An attacker therefore has little incentive to attempt changes to the data.

(3)Flexible scalability

By using smart contracts on blockchain 2.0 (Ethereum), users can customize and generate a smart contract based on their sharing rules. Through this smart contract, users can control data sharing policies[21] and customize access rules.
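A user-defined sharing policy of this kind can be sketched in Python. The class below is a toy stand-in for a smart contract, not Solidity and not any scheme from the cited work; the names `SharingContract`, `grant`, `revoke`, and `is_allowed` are hypothetical.

```python
class SharingContract:
    """Toy stand-in for a smart contract holding one owner's sharing rules."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.rules = {}   # requester -> set of permitted operations

    def _require_owner(self, caller: str) -> None:
        if caller != self.owner:
            raise PermissionError("only the data owner may change the policy")

    def grant(self, caller: str, requester: str, operation: str) -> None:
        self._require_owner(caller)
        self.rules.setdefault(requester, set()).add(operation)

    def revoke(self, caller: str, requester: str, operation: str) -> None:
        self._require_owner(caller)
        self.rules.get(requester, set()).discard(operation)

    def is_allowed(self, requester: str, operation: str) -> bool:
        return operation in self.rules.get(requester, set())


contract = SharingContract(owner="patient_1")
contract.grant("patient_1", "hospital_a", "read")
print(contract.is_allowed("hospital_a", "read"))    # True
print(contract.is_allowed("hospital_a", "write"))   # False
```

On a real chain the caller's identity would come from the transaction signature rather than from a string argument.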

3 Data sharing application architecture based on blockchain technology

Fig.5 summarizes the architecture of blockchain technology applied to data sharing. The architecture is divided into four layers, from bottom to top: terminal device layer, consensus layer, data sharing layer, and data storage layer. The main functions and main technologies of each layer are described below.

Fig.5 Blockchain-based data sharing architecture

3.1 Data storage layer

As the top layer of the data sharing architecture, the data storage layer mainly provides the function of storing data. At present, there are two popular storage modes. The first is for small amounts of data that do not need to be changed. Most researchers propose storing such data in the blockchain, because the non-tamperability of the blockchain provides safe storage. However, this applies only to small amounts of data that do not change. The second mode is for large amounts of data that need to be modified frequently. Here, the structure and proof-of-work mechanism of the blockchain itself cannot supply the space, resources, and data throughput required, so a series of distributed storage technologies are used instead.
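One common way to combine the two modes is to keep the bulky data off the chain and record only a digest on the chain. The following Python sketch assumes an in-memory dictionary as the off-chain store and an append-only list as the ledger; both, and the function names, are hypothetical stand-ins.

```python
import hashlib

off_chain_store = {}   # stands in for cloud or distributed storage
on_chain_ledger = []   # append-only list standing in for the blockchain


def share(record_id: str, data: bytes) -> None:
    """Store the data off-chain and append its digest to the ledger."""
    off_chain_store[record_id] = data
    on_chain_ledger.append(
        {"id": record_id, "sha256": hashlib.sha256(data).hexdigest()}
    )


def fetch_verified(record_id: str) -> bytes:
    """Return off-chain data only if it matches the latest on-chain digest."""
    data = off_chain_store[record_id]
    entry = next(e for e in reversed(on_chain_ledger) if e["id"] == record_id)
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError("off-chain data does not match the on-chain digest")
    return data


share("rec-1", b"blood pressure: 120/80")
print(fetch_verified("rec-1"))   # b'blood pressure: 120/80'

# A change made outside the protocol is detected on the next read.
off_chain_store["rec-1"] = b"blood pressure: 90/60"
try:
    fetch_verified("rec-1")
except ValueError as err:
    print(err)
```

Legitimate updates call `share` again, which appends a new digest, so the ledger keeps growing slowly while the large payload stays off-chain.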

For the task of combining blockchain and data sharing, a series of distributed storage technologies have been used. These technologies include PDS, IPFS, and BigchainDB.

Personal data storage (PDS)[22]: In Ref. [23], Chowdhury, et al. proposed the concept of PDS. PDS is a service that allows individuals to store, manage, and deploy their critical personal data in a highly secure and structured manner. It also provides users with a central point of control over their personal information. Ref. [23] elaborates the specific functions of PDS in detail: the user needs to control a secure digital space, namely the PDS, in which his data can be kept. Given the vast number of data sources that users interact with every day, interoperability alone is not enough. There needs to be a centralized location where the user can view data about himself. PDS also enables users to easily control data flow and manage fine-grained authorization for third-party services.

BigchainDB[24]: A scalable blockchain database with both blockchain and database attributes. In BigchainDB, each node has its own local MongoDB database, and all communication between nodes uses the Tendermint protocol. To keep stored data tamper-proof, BigchainDB does not provide any interface for deleting or modifying data. In addition, all transactions are cryptographically signed. After a transaction is stored, changing its content invalidates the signature, which can be detected (unless the public key is also changed, but this should also be detectable because each block of transactions is signed by a node, and the public keys of all nodes are known). BigchainDB’s data are therefore tamper-proof.

IPFS[25]: The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that attempts to connect all computing devices to the same file system. IPFS is similar to the Web, except that it generates a hash value for each file stored in it, and this hash plays the role that a URL plays on the Web: users address files by their hash values. IPFS uses distributed hash tables (DHTs) to locate content. IPFS has no single point of failure, and there is no need for mutual trust between nodes.
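Addressing a file by its hash can be illustrated with a minimal in-memory store. This sketch assumes a plain SHA-256 hex digest as the address and a hypothetical `ContentStore` class; actual IPFS content identifiers use a multihash format and split files into blocks, which is not modeled here.

```python
import hashlib


class ContentStore:
    """Minimal content-addressed store: the address is the hash of the content."""

    def __init__(self) -> None:
        self._blobs = {}

    def add(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Any holder of the address can check that the content was not altered.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
a1 = store.add(b"scan-2020-12-28")
a2 = store.add(b"scan-2020-12-28")   # identical content, identical address
print(a1 == a2)                      # True
print(store.get(a1))                 # b'scan-2020-12-28'
```

Because the address is derived from the content, a requester does not need to trust the node that serves the file; recomputing the hash is enough.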

3.2 Data sharing layer

The data sharing layer mainly includes the blockchain layer and the contract layer. Its purpose is to support operations on the shared data, authentication of system users, and changes to user permissions.

The data operations include data query, insert, modify, and delete. The data sharing layer provides an interface for users to manipulate the data and interact with the storage layer. For data query, modification, and deletion, the data sharing layer first authenticates the identity of the user. If the authentication passes, the user’s request is processed, and the corresponding operation on the data is carried out by calling an interface that has been deployed in advance. When data are inserted, the data first pass through the consensus layer; the data sharing layer then authenticates the data and the user’s identity and either forwards the data to the storage layer or stores them in the blockchain.

In the case of simultaneous data sharing by multiple organizations, it is necessary to ensure the identity security of the users who share data. The data sharing layer ensures that users in the system can be validated and that the system administrator can change the permissions of users if necessary. Systems based on blockchain 2.0 (Ethereum) also use smart contracts to implement custom sharing rules. Access control is achieved by validating data signatures and applying the custom sharing rules.
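The authenticate-then-authorize flow of the data sharing layer can be sketched as follows. The sketch makes loud simplifications: an HMAC tag over the request stands in for a digital signature, and the user keys and permission table are hypothetical in-memory dictionaries rather than on-chain state.

```python
import hashlib
import hmac

USER_KEYS = {"dr_lee": b"k1-secret", "auditor": b"k2-secret"}        # hypothetical
PERMISSIONS = {"dr_lee": {"query", "modify"}, "auditor": {"query"}}  # hypothetical


def sign(user: str, operation: str, record_id: str) -> str:
    """Client side: tag the request with the user's key (signature stand-in)."""
    msg = f"{user}|{operation}|{record_id}".encode()
    return hmac.new(USER_KEYS[user], msg, hashlib.sha256).hexdigest()


def handle_request(user: str, operation: str, record_id: str, tag: str) -> str:
    """Sharing layer: authenticate first, then apply the sharing rules."""
    key = USER_KEYS.get(user)
    if key is None:
        return "rejected: unknown user"
    msg = f"{user}|{operation}|{record_id}".encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return "rejected: authentication failed"
    if operation not in PERMISSIONS.get(user, set()):
        return "rejected: operation not permitted"
    return f"ok: {operation} on {record_id}"


print(handle_request("auditor", "query", "rec-1", sign("auditor", "query", "rec-1")))
# ok: query on rec-1
print(handle_request("auditor", "modify", "rec-1", sign("auditor", "modify", "rec-1")))
# rejected: operation not permitted
print(handle_request("auditor", "query", "rec-1", "bad-tag"))
# rejected: authentication failed
```

An administrator changing a user's permissions corresponds to editing the `PERMISSIONS` table; in a deployed system that change would itself be a recorded transaction.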

3.3 Consensus layer

The consensus layer encapsulates various consensus algorithms. In a decentralized system, achieving consensus among nodes that hold different data is crucial. The more decentralized the system, the higher the requirements on the consensus algorithm. In the consensus layer, the encapsulated consensus algorithm is used to reach consensus between nodes, and the results generated after reaching consensus are sent to the data sharing layer. The algorithms commonly used in mature blockchain systems are PoW, PoS, BFT, DPoS, and PBFT.

PoW: The idea of PoW is that a node that wants to create a new block of transactions must first prove that it has expended a certain amount of computational work. Popular public blockchain systems, such as Bitcoin and (at the time of writing) Ethereum, use PoW for consensus. In a PoW-based protocol, nodes compete to find a random number (nonce) that makes the hash value of the block header meet a difficulty target; the node that finds one obtains the right to add the block, which the other nodes then verify, and the chain backed by the majority of the computing power is accepted as valid. This search for a nonce consumes a large amount of resources and time, so an attack is prohibitively expensive for a malicious party; security is obtained by consuming resources. Blockchains that use PoW to achieve consensus have the disadvantages of low throughput and long transaction times.

Proof of stake (PoS): A consensus protocol based on PoS addresses the high energy consumption of PoW[27]. PoS is a form of proof of ownership of the currency: the right to create a block depends on the stake a node holds rather than on its computing power. However, PoS can cause delays when creating new blocks.
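The core idea of stake-weighted selection can be sketched with the Python standard library. The stakes below are hypothetical, and a seeded pseudo-random generator stands in for the verifiable randomness that deployed PoS protocols rely on; penalties for misbehavior are not modeled.

```python
import random
from collections import Counter

stakes = {"node_a": 60, "node_b": 30, "node_c": 10}   # hypothetical stakes


def pick_validator(stakes: dict, rng: random.Random) -> str:
    """Choose the next block proposer with probability proportional to stake."""
    nodes = list(stakes)
    weights = [stakes[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


rng = random.Random(42)   # fixed seed so the run is repeatable
counts = Counter(pick_validator(stakes, rng) for _ in range(10_000))
for node, wins in counts.most_common():
    print(node, wins)     # node_a is chosen most often, node_c least often
```

No hash search is involved, which is why the energy cost is far lower than with PoW.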

Byzantine fault tolerance (BFT): BFT is a fault-tolerant technology in distributed computing. The Byzantine hypothesis is a model of the real world, where computers and networks can behave in unpredictable ways due to hardware errors, network congestion or outages, and malicious attacks. Byzantine fault-tolerant techniques are designed to handle these aberrant behaviors and meet the specification requirements of the problem to be solved.
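A standard result for BFT protocols such as PBFT is that n nodes can tolerate at most f Byzantine nodes when n ≥ 3f + 1. The short sketch below computes this bound and a matching quorum size; the function names are hypothetical, and the quorum formula is the usual intersection argument rather than a detail taken from this paper.

```python
def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Smallest q such that any two quorums share at least one honest node.

    Two quorums overlap in at least 2q - n nodes, and the overlap must exceed f,
    so q >= (n + f + 1) / 2. This equals 2f + 1 when n = 3f + 1.
    """
    f = max_faulty(n)
    return (n + f + 2) // 2


for n in (4, 7, 10):
    print(n, max_faulty(n), quorum(n))
# 4 1 3
# 7 2 5
# 10 3 7
```

For example, a consortium of seven institutions running such a protocol can keep operating correctly with up to two faulty or malicious nodes.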

3.4 Terminal device layer

With the explosive growth of data, data sharing is urgently needed in different industries and application backgrounds. There are many possibilities for applying blockchain technology to achieve data sharing. The variety of devices that perform data operations at the bottom layer is even more complex.

The terminal device layer in data sharing mainly collects data and performs a series of operations on the data to let users interact with the data sharing layer and the data storage layer. The terminal device layer mainly contains two kinds of users. One is the data producer, that is, the user who collects data and uploads them to the consensus layer, from which the data pass through the data sharing layer to the data storage layer. The other is the data consumer, who queries the shared data.

At present, Internet of Things devices with sensor functions, such as smart phones, vehicles, mobile computers, and other devices with collection, storage, and routing functions can be used as terminals for data producers.

4 Typical application

This article focuses on the roles that blockchain plays in data sharing for smart medical treatment, smart transportation, smart cities, and supply chain management, and summarizes its relationship to these domains, as shown in Table 3.

Table 3 Fields and functions of blockchain applications in data sharing

4.1 Applications of blockchain in medical treatment data sharing

With the rapid development of the economy and medical technology, sharing medical data intelligently and providing more convenience for patients has become an urgent problem. A large number of studies have found the following challenges in sharing medical data: ① Explosive growth of data: The data explosion makes the traditional server-based sharing model unsuitable; ② Cross-institutional data interoperability: Patients often need their information to be exchanged between multiple institutions, and the traditional data sharing model cannot efficiently support reading and writing medical data across them; ③ Data security and privacy: When sharing medical data, multiple parties operate in an untrusted network environment, so data security and privacy cannot be guaranteed. Some scholars believe that using blockchain can overcome these challenges.

In a typical application scenario of medical and health data sharing, the participants in data sharing include patients, portable health data acquisition terminals, health data storage, the hospital currently visited, hospitals visited in the past, doctors, and third-party storage. When a patient visits a hospital, the authorized doctor reads the patient’s other historical records in the shared data and combines them with the health data shared from the data acquisition terminal to conduct a comprehensive diagnosis and treatment. The related medical records in the shared data storage system also provide auxiliary information for the patient’s medical follow-up.

Current research focuses on how to apply blockchain technology to secure the sharing of personal medical record data between different medical institutions, especially the shared storage mechanism, the authorization mechanism, and the data security audit mechanism. Of these, the shared storage mechanism concerns the safe storage and use of patients’ medical data within one medical institution and the use of a patient’s medical data by different medical institutions, so as to achieve fast query and safe storage of medical data. The authorization mechanism concerns access control over patients’ medical record data within one medical institution and between different medical institutions, so that specific users can perform specific operations on the data and illegal users are denied access. The data security audit mechanism concerns combining smart contracts with encryption and signatures, so as to audit whether a user’s identity is correct and whether the data have been tampered with, without the need for a third party. Using blockchain thus removes the security concerns raised by auditing through untrusted third parties.

(1)Secure storage

Real-time sharing is greatly affected by the need to share medical record data across different locations, and the throughput of the blockchain is greatly reduced if medical data, which consume a large amount of storage, are stored directly in the blockchain. Many scholars have therefore proposed using the blockchain to store only metadata, composed of indexes, summaries, and digital signatures of the relevant real data. The advantages of combining blockchain technology in this way are as follows: ① the throughput of the blockchain is guaranteed; ② data recorded on the blockchain will not be tampered with; ③ metadata are encrypted and recorded on the chain and cannot be used by illegal personnel, thus ensuring the privacy of the data.

In Ref. [27], the author proposed a medical data management system based on the consortium blockchain that realizes the sharing of patients’ medical records by recording real medical data in the blockchain.

Blockchain technology has been applied to secure storage as follows. In Ref. [28], the author proposed a searchable encryption scheme based on blockchain in which the data provider first builds an index of the real data and creates a smart contract to describe the location of the index. The data provider then stores the index and the smart contract on the blockchain, and encrypts the real medical data and stores them in the cloud. The blockchain ensures that the index is not tampered with, while the index makes queries on the real data efficient. Storing the index on the blockchain thus ensures both the security of the data and the query efficiency. In Ref. [29], the authors proposed a scenario in which the user could access the data of a medical data provider (the data provider could obtain the data from the medical institution, the patient, etc.). The data provider encrypts the real data and stores them in the cloud, while patient information, the medical data summary, the signed data ID, and other data are stored in the blockchain in the form of metadata. The metadata stored in the blockchain cannot be tampered with, and the real data can be checked against the metadata.
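The division of labor between an on-chain index and encrypted off-chain data can be illustrated with a small keyword index. This is a simplified sketch and not the scheme of Ref. [28]: keywords are hidden behind HMAC tokens computed with a hypothetical shared key, the "chain" is a dictionary, and the `cloud://` locations are invented placeholders.

```python
import hashlib
import hmac

INDEX_KEY = b"shared-index-key"   # hypothetical key held by authorized users

on_chain_index = {}   # keyword token -> list of cloud locations


def keyword_token(keyword: str) -> str:
    """Hide the keyword behind a keyed hash so the index does not expose it."""
    return hmac.new(INDEX_KEY, keyword.lower().encode(), hashlib.sha256).hexdigest()


def publish(keywords: list, location: str) -> None:
    """Data provider: register where the encrypted record lives, per keyword."""
    for kw in keywords:
        on_chain_index.setdefault(keyword_token(kw), []).append(location)


def search(keyword: str) -> list:
    """Authorized requester: recompute the token and look up the locations."""
    return on_chain_index.get(keyword_token(keyword), [])


publish(["diabetes", "2019"], "cloud://bucket/rec-001.enc")
publish(["diabetes"], "cloud://bucket/rec-002.enc")
print(search("Diabetes"))   # ['cloud://bucket/rec-001.enc', 'cloud://bucket/rec-002.enc']
print(search("asthma"))     # []
```

Only the small index entries would live on the chain; the encrypted records themselves stay in cloud storage and are fetched by location after a successful lookup.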

In Ref. [30], the author proposed a medical information service platform. In this framework, data are not stored in several centralized hospital databases but are distributed: the data are partitioned and uploaded to distributed databases. When the data are partitioned, an identity variable is added to the header of each data block to prove the validity of the data. After the data provider uploads the data to the platform, it applies to join the blockchain network by calling the client, then extracts the data from the local database according to the chaincode on the blockchain, and stores the metadata in the blockchain.

(2)Access control

In a medical scenario, the control of access to medical data is of utmost importance, and progress has been made in using blockchain to control access. Today, most access control methods in medical scenarios rely on the deployment of smart contracts. By deploying a smart contract, the two parties sharing data can be authenticated. The corresponding smart contract can be invoked according to the user’s identity and requirements, and different functions can be realized by executing different smart contracts. The advantages of combining blockchain are as follows: ① Scalability can be realized through the deployment of a contract. Users can customize data sharing rules and generate different policies for different needs; ② A smart contract is used to realize access control, which can ensure that the shared data are not tampered with. For the scenario of sharing medical data in different places, blockchain technology has been applied to access control as follows.

In the system proposed in Ref.[28], the provider of medical data (a medical institution or the patient) stores the data index in the blockchain and the real data in the cloud. When a data requester issues a data request, the blockchain listens for the request, looks up the corresponding smart contract, and locates the corresponding data according to that contract. The smart contract deployed on the chain checks the identity of the requester against the corresponding rules, so that only requests with a valid identity can reach the real data. In a similar application scenario, the authors of Ref.[31] put forward a medical data sharing system based on a consortium blockchain. The system allows only specific miners to take part in mining blocks, and deploys smart contracts that authenticate the identity of the requester and grant the requester the appropriate permissions for the data. Likewise, in Ref.[32], a blockchain-based architecture called MeDShare was proposed to manage access control over data obtained from network entities. In this architecture, there are two identities: patient and medical data provider. The information obtained from the medical data provider is distributed through IPFS, and the distributed medical data are accessed through the blockchain and the smart contracts deployed on it. In Ref.[33], Vora proposed a system called BHEEM for sharing medical data. Different types of smart contracts are stored in the blockchain and assigned according to user identity, so that different users have different capabilities for accessing the data.

In Ref.[30], the author proposed a medical information service platform in which data providers access data from their local databases through chaincode on the blockchain. After the platform receives a request, it processes the request according to the chaincode and returns the data on demand. In Ref.[34], access control for shared medical data is achieved by creating different types of smart contracts and storing patient records in a local database.
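The common pattern in the systems above, a contract that holds owner-defined rules and decides each request by the requester's identity, can be sketched in a few lines. This is a plain-Python stand-in rather than real contract or chaincode, and the class name `AccessContract`, its methods, and the identifiers used with it are hypothetical; identity authentication (signatures) is assumed to have happened before `request` is called.

```python
class AccessContract:
    """Toy stand-in for an on-chain access-control smart contract."""

    def __init__(self, owner):
        self.owner = owner
        self.grants = {}  # (requester, record_id) -> set of allowed actions
        self.log = []     # every decision is recorded for later auditing

    def grant(self, caller, requester, record_id, actions):
        """Only the data owner may define or change the sharing rules."""
        if caller != self.owner:
            raise PermissionError("only the data owner may change the policy")
        self.grants[(requester, record_id)] = set(actions)

    def request(self, requester, record_id, action):
        """Decide a request against the stored rules and log the outcome."""
        allowed = action in self.grants.get((requester, record_id), set())
        self.log.append((requester, record_id, action, allowed))
        return allowed
```

For example, after `grant("patient-1", "dr-lee", "rec-7", {"read"})` on a contract owned by `"patient-1"`, a read request by `"dr-lee"` is allowed, while a write request by the same user or any request by another user is refused. Every decision lands in the log, which is the hook for the auditing function discussed next.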

(3)Auditing

When medical record data are shared across different locations, multiple users participate in data sharing, and both the identity of the participants and the shared data need to be audited to ensure that the right people share the right data. Because of growing distrust, traditional audits carried out by third-party institutions no longer meet the current demand for data sharing. Some scholars have therefore proposed using the blockchain itself for auditing. Owing to the characteristics of the blockchain, no third party is needed to perform the audit, so blockchain technology can solve the audit problem in data sharing.

Kaanichen proposed an audit architecture for data use based on blockchain.The system has three main identities: data provider, service provider, data processor.The layered identity encryption mechanism and the smart contract in the blockchain are used to achieve the audit function.In Ref.[35], proxy re-encryption and other encryption techniques are used to ensure the data auditing.For situations where multiple sets of data are shared, how the data are audited is important.

4.2 Applications of blockchain in smart traffic data sharing

The typical architecture for data sharing using blockchain in intelligent transportation is a model that includes data requesters, data providers, and edge nodes(RSUs running on the vehicle blockchain).The data requester requests the data provider to share the data.The data provider collects traffic-related information and shares the data stored in its edge nodes to receive rewards based on its contributions.Each car chooses its own role based on the data requirements and driving plans.An edge node in an RSU is an edge device(node).The RSU is upgraded to have computing capabilities and storage space for computing and storing services.A certain number of RSUs in the same region form a set of vehicle edges.Each vehicle edge group has a local controller and a storage pool.This local controller acts as a data broker to manage data requests from local data requesters.The storage pool stores local data uploaded by the vehicle.After it finds the best data provider, each data requester sends a request for data requirements to the nearest RSU.The data provider makes the decision about the data sharing authorization.

In data sharing for intelligent transportation, the data owner first stores the data to be uploaded in the edge node, which audits and authenticates the data and the identity of the user.If this is successful, the processed data are packaged and broadcast to the blockchain in the form of metadata, and then a block is generated through a specific consensus algorithm.When a data requester makes a request, the data provider generates smart contract custom data access rules to control the shared data access.

At present, most research focuses on the application of blockchain technology to the sharing of traffic information and parking space information between vehicles, especially the storage mechanism and the access control mechanism for shared traffic data. The storage mechanism deals with the safe storage of explosively growing traffic data without affecting the efficiency and security of the system. To meet this requirement, the blockchain is mainly used to store metadata or data indices instead of real data. The blockchain can also be used to record API functions, while the real data are stored by a third party. The benefits of this are as follows: ① Data stored on the blockchain are guaranteed to be readable and tamper-resistant; ② The blockchain provides interfaces for third-party storage, which can cope with mass data storage; ③ The user’s operations are packaged into blocks and recorded in the blockchain, which makes the shared data traceable. Access control mechanisms mainly focus on how the vehicle nodes that collect data connect to the corresponding storage organization, and on how data sharing between the data provider and the data requester is controlled. The data must not be tampered with by other parties or by attackers. Once the two sides reach an agreement, data sharing is executed automatically, and neither side can repudiate it. The benefits of this approach are: ① The deployment of smart contracts makes data sharing non-repudiable for both parties; ② The deployment of smart contracts enables user identity authentication in the untrusted vehicular network environment.

(1)Secure storage

Sharing message data for public release: In complex traffic conditions, real-time traffic information sharing is needed. However, storing and distributing data among many vehicles to achieve sharing has become a considerable challenge. In view of this situation, Zhang et al.[36] proposed a secure data sharing system based on blockchain. To handle the wide geographical area covered by the Internet of Vehicles, the author proposed dividing the whole Internet of Vehicles into multiple regions. Each region is assigned a parent blockchain and a secondary blockchain for storing information. The parent blockchain is maintained by all entities participating in the system, and the secondary blockchain is maintained by its own participants. Smart contracts on the parent chain are used to ensure data consistency between the parent chain and the secondary chain. A real-time traffic information sharing system based on blockchain is also proposed in Ref.[37]. In this system, to deal with the vast volume of traffic data and its safe storage, the author generates an index of the real data and stores it in the blockchain in the form of metadata. A proof-of-work mechanism is used to audit the stored data, and only audited data are eligible for storage on the blockchain. Storing the index in the blockchain allows data to be queried efficiently and ensures that the data are not tampered with.

With the development of the economy, increasingly more cars are driving on the highway, but parking resources are very scarce.In Ref.[38], a parking system based on blockchain was proposed that provides a storage interface for parking information through a smart contract, and uses the consortium blockchain as a distributed database to store user information and smart contract code.The user information is stored in the blockchain.Due to the tamper-proof nature of the blockchain, the user’s identity can be guaranteed not to be tampered with, thus ensuring the user’s security.

Sharing publicly released record data: Publicly released record data usually requires a large amount of storage resources.Due to the limited storage resources of devices in the Internet of Vehicles, it is a great challenge to realize the storage of record data for the Internet of Vehicles.At the same time, data that is shared through public distribution needs to be audited before it is stored, and the identity of the person who uploaded the data needs to be audited.In view of these two needs, relevant scholars proposed to use blockchain technology to solve the corresponding problems.

With the emergence of autonomous vehicles, timely vehicle diagnosis has become essential. In the case of a traffic accident, the diagnostic data (that is, the data for past accidents) need to be shared. Because diagnostic data are historical, data from multiple time periods may require a large amount of storage. Simply storing all of it in the blockchain would produce a very large chain and cause problems with throughput and running speed. In Ref.[39], a blockchain-based system for sharing vehicle diagnostic data is proposed. The author proposes a segmented ledger, which stores segments of data rather than the bulk data. The timestamps of the blockchain also record when the data were stored, which ensures the traceability of the data.

(2)Access control

In a public message data sharing scenario: As road conditions become increasingly complex and unpredictable, drivers heading to an unfamiliar place, or drivers who want to know the condition of a section of road (for example, whether the road surface is frozen or whether there has been an accident), need real-time information. Most of this type of information is shared publicly. For this situation, the author of Ref.[37] put forward an effective blockchain-based public vehicular network, Creditcoin, in which roadside units serve as edge nodes. The user uploads local data to be stored temporarily in an edge node, and the edge node validates the request through a signature algorithm and an encryption function. After verification and auditing, the metadata are stored in the blockchain. When a data request is made, the data provider generates a smart contract that encodes the data sharing rules, and access to the data is controlled through this custom smart contract.

In Ref.[40], charging pile information is traded, and the sharing process is as follows. First, during the exploration process, the electric vehicle sends a request to the blockchain. The charging stations then send bids for the request, and the electric vehicle selects a charging station. In this setting, malicious attackers may flood the blockchain with malicious messages and requests. As a result, the system cannot run normally, and the charging station cannot obtain any user information or share its own information. To counter this, the system adopts a smart contract and charges a certain fee for each request, which effectively deters such attacks. Information sharing between the car and the charging pile is also studied in Ref.[41], which proposes a framework combining a lightning network and a smart contract. The smart contract verifies the identity of the vehicle by checking its secret key and signature. If the verification fails, the charging pile terminates the session, thereby controlling access to the data.
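The fee-per-request defense can be illustrated with a small sketch. It assumes a prepaid balance per vehicle and a flat fee in integer token units; the class `ChargingRequestContract` and its method names are hypothetical and do not reproduce the contract of Ref.[40].

```python
class ChargingRequestContract:
    """Toy contract: each charging request costs a flat fee (integer units)."""

    def __init__(self, fee):
        if fee <= 0:
            raise ValueError("fee must be positive")
        self.fee = fee
        self.balances = {}  # vehicle id -> prepaid balance
        self.requests = []  # accepted (vehicle id, requested energy in kWh)

    def deposit(self, vehicle, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balances[vehicle] = self.balances.get(vehicle, 0) + amount

    def submit_request(self, vehicle, energy_kwh):
        """Accept the request only if the vehicle can pay the fee."""
        if self.balances.get(vehicle, 0) < self.fee:
            return False  # rejected: flooding is bounded by the balance
        self.balances[vehicle] -= self.fee
        self.requests.append((vehicle, energy_kwh))
        return True
```

Because every accepted request reduces the sender's balance, the number of requests an attacker can inject is bounded by the funds it is willing to spend, while honest vehicles that send occasional requests pay only a small cost.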

(3)Real-time data sharing

In a content-centric data sharing scenario in a vehicular network, the information holder may maliciously disseminate false data to the consumer, thereby influencing the recipient’s driving decisions and even causing traffic congestion and accidents. In Ref.[42], a data sharing system consisting of a two-layer blockchain is constructed. At the lower layer, nodes request a service by announcing their requirements in the Named Data Networking (NDN) paradigm. At the upper layer, nodes submit their demand and supply to the nearest roadside unit for further matching. Simulations verify the validity of the system and show that the data sharing mechanism facilitates secure information exchange in Vehicular Named Data Networking (VNDN).

To handle the increasing number of transactions in vehicle-to-grid (V2G) networks, Ref.[43] proposed a lightweight blockchain-based protocol called the directed acyclic graph-based V2G network (DV2G), which is a distributed ledger technology (DLT). The aim is to improve the computing power of the V2G network and make it more suitable for micro-transactions.

As vehicle black boxes become increasingly common with the development of the Internet of Vehicles, it is very important to enable the safe and effective sharing and trading of data over the vehicular network. Ref.[44] proposed a blockchain-based model of a vehicle data market platform, together with a blockchain-based data sharing scheme that uses data owner-based attribute-based encryption (DO-ABE). The model meets the basic requirements of data confidentiality and integrity. The system stores metadata on-chain and the encrypted original data off-chain, and adopts a consortium blockchain to process large, privacy-sensitive black box video data safely and effectively. In addition, data owners in this model can control their data using the blockchain-based DO-ABE and owner-defined access control lists.

Sharing sensing data safely and reliably in a vehicular fog environment remains an open problem. Ref.[45] proposed an efficient, privacy-preserving, and verifiable scheme for collecting and sharing vehicular fog sensing data based on a permissioned blockchain. In the data acquisition stage, the secure and verifiable calculation of the mean and variance of the collected vehicle sensing data is realized by combining the homomorphic 2-DNF (disjunctive normal form) cryptosystem with an identity-based signature scheme. At the same time, to achieve efficient and reliable data sharing, the scheme uses the permissioned blockchain to maintain an immutable and tamper-proof record of the derived sensing data. The security analysis shows that the scheme is secure in terms of location privacy protection, verifiability, and immutability.
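Why a 2-DNF cryptosystem suffices for the mean and variance can be seen from the arithmetic alone: both statistics follow from the count, the sum of the readings, and the sum of their squares, and a square needs only a single multiplication per reading. The sketch below performs this arithmetic on plaintext integers; it is not the protocol of Ref.[45], contains no encryption, and the function names are hypothetical.

```python
from fractions import Fraction

def aggregate(readings):
    """Return (n, sum of x, sum of x^2) for a list of integer readings.

    In a homomorphic setting these three aggregates would be computed over
    ciphertexts; here they are computed in the clear for illustration.
    """
    n = len(readings)
    s1 = sum(readings)
    s2 = sum(x * x for x in readings)
    return n, s1, s2

def mean_and_variance(n, s1, s2):
    """Population mean and variance from the aggregates alone."""
    mean = Fraction(s1, n)
    variance = Fraction(s2, n) - mean * mean
    return mean, variance
```

For three speed readings of 50, 60, and 70, the aggregates are (3, 180, 11000), which give a mean of 60 and a population variance of 200/3 without the party that computes the statistics ever handling an individual reading on its own.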

In disaster areas, ground rescue vehicles need to transmit a large amount of data (rescue commands, road damage, rescue experience, etc.) to ensure safe driving and efficient rescue. When communication infrastructure is damaged by a disaster, UAVs can perform immediate rescue missions in the damaged areas and assist data sharing in the ground Internet of Vehicles (IoV). Ref.[46] proposed a lightweight vehicular blockchain-enabled secure (LVBS) data sharing framework for UAV-aided IoV in disaster relief. It addresses the security threats to data sharing between vehicles and UAVs in a disaster, which arise from the unreliable network environment, unreliable tracing of misbehavior, and low quality of shared data in UAV-aided IoV.

To address the security risks and centralized structure of traditional intelligent transportation systems, a secure data sharing and customized service scheme based on a consortium blockchain (DSCSCB) is proposed in Ref.[47]. Its ciphertext-policy attribute-based proxy re-encryption algorithm divides the key into an attribute key and a search key for keyword search. This not only solves the problem that proxy re-encryption alone cannot retrieve data, but also realizes data sharing and data forwarding. The algorithm effectively controls access rights to the data and provides a secure communication environment for VANETs.

4.3 Applications of blockchain in smart city data sharing

With the development of science and technology, the construction of smart cities is progressing steadily. It involves multiple companies and government agencies, and the operation of a smart city usually requires data sharing across organizations. At present, realizing such cross-organization data sharing is the biggest challenge in the development of smart cities. The traditional solution is to set up a data sharing platform, which then extracts data from each organization and department. However, this data sharing mode causes several problems: ① The credibility of the data is questioned, so the data must be traceable to confirm their source; the chained block structure of the blockchain ensures that data recorded on the chain can be traced; ② While data are being transmitted to the platform, they may be tampered with in the network, and unauthorized parties may impersonate legitimate users to access and control the data illegally; ③ Rules for sharing data need to be customized. In multi-party data sharing, a single set of rules is no longer applicable, so customizable sharing rules have become an urgent requirement.

In the data sharing architecture of mature smart cities, there are three identities: Data consumer, data provider, and data executor.Data consumers cannot access data directly; data providers share data locally.The data executor provides a trusted data-sharing environment and acts as a middleman between the data consumer and the data provider.When sharing data in a smart city, the data owner first uploads the information collected at the terminal device through the blockchain network, and after verification, stores some verification information and metadata in the blockchain.The real data are stored in a local database or a third-party storage organization.After a data requester makes a request, the blockchain network processes the request through monitoring.Most of the data processing is handled by the data executor, and the data are shared according to the rules customized by the data provider.

At present, the research focuses on data shared by multiple institutions.For example, in a smart grid, data are collected by multiple power centers to achieve faster dispatching.Courts, public security, and other relevant institutions share certain data.Until now, aspects that the research has focused on and the advantages furnished by the application of blockchain technology are as follows: ① Secure storage using blockchain: Data sharing in smart cities meets the needs of secure storage, mainly to solve data traceability.Scholars have found that storing data in the blockchain using the blockchain’s own chained data structure, the use of timestamps, and the non-tampering of data can meet the traceability needs of data; ② Customized sharing rules using blockchain: Because of the flexible programming and automatic execution of smart contracts, the ability to customize shared rules can be provided to multiple organizations; ③ Access control of data: Through the combination of blockchain and cryptography, users can be authorized, and the programmability of smart contracts is utilized to control permissions according to the corresponding identities of users, so as to achieve access control of data.

(1)Secure storage

Sharing record data in different places: Because data sharing in smart cities mostly involves multiple parties, ensuring the correctness of the data shared by each party becomes a major problem. The data structure of the blockchain makes it traceable and tamper-resistant, which ensures the provenance and security of the data. Therefore, many scholars use blockchain to secure the storage of shared data in smart cities.

Because a smart grid needs to share data among multiple power agencies, the data held by each participating node must be tamper-proof and traceable. This is a stricter requirement than in traditional distributed networks. Because blockchain has these features, the system in Ref.[48], which is based on Hyperledger Fabric, encrypts and signs the data to be uploaded by each system and stores them in the blockchain. Owing to the blockchain data structure, the data on the chain are traceable. The focus in Ref.[49] was to collect users’ electricity data for analysis and achieve optimal dispatching. The system first groups users according to their electricity consumption, and each group has its own private blockchain. After the electricity meter in the grid collects the data, it sends the data to the mining node, which integrates and stores the data in the blockchain. Because the system first groups users and then stores the data in their respective blockchains, the security of the data is ensured. Moreover, because of the grouping, no private blockchain holds a large amount of data, so storing data on chain does not greatly affect performance, and throughput can be guaranteed to some extent. In a scenario of data sharing by multiple government agencies, blockchain was adopted in Ref.[50] to store three types of data: metadata, smart contracts, and records of data operations. The stored operation records serve as evidence, so that no party can deny the operations it performed, which secures the data.

(2)Access control

For remote record data sharing: When data are shared in a smart city, a large number of smart devices leads to information leakage when messages are passed between devices.The traditional way of data sharing cannot control access to shared data.To meet this demand, many scholars have applied blockchain to data sharing in smart cities for access control.

When resources are scarce, water and electricity must be conserved.However, due to the fast pace of modern life, people may forget to turn off water or electricity when they travel, leading to a waste of resources.In Ref.[51], the use of smart contracts to realize trading and regulatory resources based on blockchain is proposed.A smart device terminal collects data and shares the data to the corresponding platform, which performs data analysis to achieve resource management.The platform uses smart contract technology for authorization to achieve access control, as only authorized users are eligible to interact with data in the blockchain network.However, this method may consume time and space resources in the initial blockchain.

(3)Custom sharing rules

Data sharing for records used in different places: When multiple institutions share data simultaneously in a smart city, it may involve different sharing rules for each other’s data holdings.For example, when courts, public security organizations, banks, and other institutions share relevant data, they have different rights to operate the data according to their different identities.The need for each institution to be able to share data with its own specific sharing rules is particularly important.In response to this demand, Ref.[50]proposed a data sharing platform based on blockchain that involves each organization in data sharing as a node in the blockchain, and converts its paper contract to a smart contract for deployment.This satisfies the needs of different institutions to customize their shared rules.

4.4 Applications of blockchain in supply chain data management

A supply chain is a system of activities and associated information flows that moves a product or service from a supplier to a customer. The global supply chain market reportedly grew by more than $13 billion in 2017. This suggests that the supply chain has considerable potential for development, but the problems in supply chain management should not be underestimated. Because existing supply chain systems are centralized, there are considerable problems in product certification and traceability, and the traditional centralized way of data sharing is prone to arbitrary tampering and illegal use of data by the central party. At present, the following problems in supply chain management need to be solved: ① Achieving traceability of the data recorded in the supply chain; ② Avoiding centralized management, under which the party in control can tamper with and use the data as it sees fit.

The current popular architecture of blockchain-based supply chains includes suppliers, regulators, data processors, and ordinary users.Regulators have the highest authority over blockchain ledgers to read and write to them, and they can authorize nodes to join the blockchain.The supplier is the buyer and seller of the commodities that may be involved in each process, and has the right to write to the blockchain.That is, when the corresponding product information needs to be updated, the data executor is called to modify the ledger.Ordinary users are those who use the supply chain.They have only the right to inquire about the ledger and cannot modify it.

At present, the research focuses on the historical transaction information of various products and production information recorded in the blockchain.① Using blockchain for secure storage: The transaction information given to the products is encrypted, signed, and stored in the blockchain.The data structure of the blockchain itself makes it traceable, which meets the demand for traceability in supply chain records; ② Blockchain used for access control: Users in the system are classified by authorization through the use of digital signatures and smart contracts, and their identities are verified using smart contracts to achieve access control so that users can be identified.Moreover, the combination of smart contract and signature ensures the privacy of users and data.

(1)Secure storage

As supply chains are increasingly used to ensure food safety, food safety has received a great deal of attention. Information about the source of the food needs to be accurate and must not be tampered with. To this end, registration records and access records are stored in the blockchain. Owing to the tamper-proof and decentralized nature of the blockchain, this approach ensures the identity of each node in the system and the non-repudiation of the data operations performed by each node. Against this background, Refs.[52-54] proposed a blockchain-based food information tracking system that utilizes two blockchains: the reservation chain and the supply chain of food information records. Users who need to add real records to the supply chain must first make reservations in the reservation chain. This saves storage resources, because only real food records are stored in the supply chain.

Ref.[55]proposed a system based on blockchain similar to Bitcoin to record product transaction information.In each block, the seller’s information, product ID, and transaction amount are recorded so that the transaction records of each product can be queried.Moreover, the traders of the products can be queried through the record information on the blockchain, so traders cannot deny their operations.
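A hash-chained ledger with the fields named above (seller information, product ID, transaction amount) can be sketched as follows. This is a toy model that assumes one transaction per block and omits consensus and signatures; `ProductLedger` and its methods are hypothetical names, not the design of Ref.[55].

```python
import hashlib
import json

GENESIS = "0" * 64
FIELDS = ("prev", "seller", "product_id", "amount")

def block_hash(block):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps({k: block[k] for k in FIELDS}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class ProductLedger:
    """Toy chain with one product transaction per block."""

    def __init__(self):
        self.blocks = []

    def record(self, seller, product_id, amount):
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        block = {"prev": prev, "seller": seller,
                 "product_id": product_id, "amount": amount}
        block["hash"] = block_hash(block)
        self.blocks.append(block)

    def history(self, product_id):
        """All (seller, amount) records for one product, in chain order."""
        return [(b["seller"], b["amount"])
                for b in self.blocks if b["product_id"] == product_id]

    def is_valid(self):
        """Recompute every hash and link; any edit to a past block fails."""
        prev = GENESIS
        for b in self.blocks:
            if b["prev"] != prev or b["hash"] != block_hash(b):
                return False
            prev = b["hash"]
        return True
```

Calling `history` with a product ID returns the sellers that handled the product, and `is_valid` turns false as soon as any recorded amount or seller is altered after the fact, which is what prevents a trader from denying or rewriting an earlier transaction.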

(2)Access control

Data sharing scenarios for the remote use of record classes: In supply chain management, most cases involve data sharing by multiple organizations.Data leakage can easily occur in interagency communication.Traditional supply chain management methods cannot manage access control for data in the supply chain.Many scholars combine blockchain technology with encryption and signature to overcome the access control bottleneck in the supply chain.

Blockchain is used for access control as follows. Ref.[52] proposed a blockchain-based food information tracking system, which is built on Hyperledger and uses a permissioned blockchain. There are three identities in the system: regulator, member, and customer. Regulators have the highest authority on the blockchain. Members are the suppliers, companies, etc. in the food supply chain. They have the right to write and submit data to the blockchain, but they can only see the data from the previous step and the data for the next step. Access control for these three identities is achieved by recording identity information in the blockchain. Each identity uses its own private key to log in. Smart contracts handle the queries and search requests sent by regulators, members, and customers by checking the identities recorded in the blockchain.
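The visibility rule for regulators and members can be written down as a small function. The sketch reads the rule literally (a member sees the step before and the step after its own) and assumes that supply chain steps are numbered from 0; the function name and this numbering are hypothetical. The text does not state what customers may see, so the sketch deliberately defines no rule for them.

```python
def visible_steps(role, n_steps, position=None):
    """Return the indices of the supply chain steps a party may read."""
    if role == "regulator":
        # Regulators have the highest authority and see every step.
        return list(range(n_steps))
    if role == "member":
        if position is None or not 0 <= position < n_steps:
            raise ValueError("members need a valid position in the chain")
        # Literal reading: only the previous and the next step are visible.
        neighbours = (position - 1, position + 1)
        return [s for s in neighbours if 0 <= s < n_steps]
    raise PermissionError(f"no visibility rule defined for role {role!r}")
```

In a four-step chain, the member at step 1 can read steps 0 and 2, the member at step 0 can read only step 1, and a regulator can read all four.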

For drug tracking, in Ref.[54], the personnel responsible for drug transportation, transaction, and inspection in the drug supply chain are given corresponding authorization and sign the same digital signature during their work.Access control by identity is achieved this way.

5 Emerging application: AI

Blockchain and AI have become two of the most popular technologies, and many researchers have devoted themselves to integrating them. The key elements of AI technology are data, algorithms, and computing power. Refs.[56-59] hold that the combination of AI technology and blockchain technology is of great significance and will bring us into the fourth human revolution. At the data level, the distributed nodes of the blockchain and the characteristics of blockchain technology can supply the volume of data required by AI technology, while ensuring that these data are authentic and valid. Furthermore, some AI algorithms help to improve the efficiency of blockchain.

5.1 Enhanced data security[3]

Ref.[60] mentions that data are the input for various AI algorithms that mine valuable features, while data are scattered across the Internet and controlled by different stakeholders who do not trust each other. The use of data in such a complex network space is difficult to authorize or verify. Ref.[61] also states that building machine learning models requires a large amount of data, and that collecting and organizing data accurately is a very tedious and time-consuming process. Thanks to blockchain technology, these problems can gradually be solved. In many different application fields, the integration of machine learning and blockchain technology is beneficial. Ref.[62] combines AI and blockchain to build an AI blockchain trust news platform, which addresses the following problems: ① the verification of news authenticity is offered to the general public through blockchain technology; ② AI is used to detect and prevent fake news and deepfakes; ③ the trust property of the blockchain is used to provide a platform on which journalists and the general public can release trusted news. Ref.[63] proposed a practical blockchain-based model training paradigm, distributed AI, whose purpose is to train a model on distributed data while leaving data ownership and the interest in the trained model with the data owner. This model addresses the problems of data privacy, ownership, exchange, and model privacy in AI technology. Ref.[64] proposed an AI-blockchain system to manage electronic medical records, using blockchain to better manage the exponential growth of routine health care-related data. AI technology can analyze such large data accurately, thus increasing the security of medical data, reducing costs, and improving interoperability. The rapid growth of data generated by connected devices in the industrial Internet of Things provides new possibilities for improving the service quality of emerging applications through data sharing. However, security and privacy issues (for example, data leakage) are a major barrier for data providers to share data over wireless networks. Leaks of private data can cause serious problems beyond financial losses to providers. Ref.[65] first designs a blockchain-empowered secure data sharing architecture for distributed parties. Then, by incorporating privacy-preserving federated learning, it turns the data sharing problem into a machine learning problem. Data privacy can be well maintained by sharing data models rather than exposing the actual data. Finally, federated learning is integrated into the consensus process of the permissioned blockchain, so that the computing work of consensus can also be used for federated training. The numerical results obtained on real data sets show that the data sharing scheme has good accuracy, high efficiency, and enhanced security.
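The idea of sharing models rather than data can be illustrated with federated averaging on a one-parameter model. The sketch below is a generic illustration, not the scheme of Ref.[65]: it assumes a least-squares model y ≈ w·x, one local gradient step per round, and size-weighted averaging, and it leaves out the blockchain and the privacy mechanisms entirely. All function names are hypothetical.

```python
def local_update(weights, data, lr):
    """One gradient step of least squares y ~ w*x on a party's private data."""
    w = weights[0]
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return [w - lr * grad]

def federated_average(updates, sizes):
    """Average the parties' models, weighted by their local data sizes."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(dim)]

def run_rounds(datasets, rounds=30, lr=0.05):
    """Repeat: every party trains locally, then only the models are merged."""
    weights = [0.0]
    sizes = [len(d) for d in datasets]
    for _ in range(rounds):
        updates = [local_update(weights, d, lr) for d in datasets]
        weights = federated_average(updates, sizes)
    return weights
```

With two parties holding `[(1, 2), (2, 4)]` and `[(3, 6)]`, both drawn from y = 2x, `run_rounds` converges to a weight close to 2 even though neither party ever sees the other's samples; only the locally updated weights are exchanged.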

5.2 High efficiency

For smart contracts that run on a blockchain, using AI to analyze data can lead to more accurate and intelligent contract execution, and scholars have conducted studies in this regard[3]. Ref.[66] proposes a blockchain-based framework for sharing and improving machine learning models. The framework shows that blockchain technology not only gives AI access to more diverse, more accurate, and more numerous data sources, but also allows AI technology to be integrated into smart contracts, making their operation more intelligent and efficient. Ref.[67] also states that AI enables fast and effective processing in a blockchain system and drives smart contracts by automatically updating and modifying the relevant conditions and verifying data. Ref.[68] proposed a peer-to-peer (P2P) knowledge trading market for the Internet of Things based on edge AI. In this study, blockchain technology and machine learning are used to build a knowledge blockchain that ensures the security and efficiency of the market. Ref.[69] established a mutual-trust data sharing framework to break down the data barriers between different operators. This framework combines AI technology with smart contracts to operate the network automatically, thus providing a secure and reliable environment for data sharing, and it shows some performance advantages over other data sharing schemes.

Traditional blockchain consensus requires a large amount of computation at every miner node, which wastes considerable computing resources. Recent studies have improved the proof-of-work (PoW) algorithms, but have not completely eliminated the human factor. Ref.[70] proposes a new node selection algorithm based on AI technology that utilizes nearly complementary information from each node and relies on a specially designed convolutional neural network to reach an agreement. This algorithm avoids complex hash operations and redundant verification operations, which saves energy. Experiments also showed that the algorithm has great advantages as a proof mechanism for blockchain.

The private insurance industry provides claims services to its customers while employing state-of-the-art operations, processes, and mathematical models to maximize profits. However, the traditional approach based on a human-in-the-loop model is time-consuming and inaccurate. Ref.[71] presents a framework for a secure and automated insurance system that reduces human interaction, secures insurance activities, alerts and notifies risky customers, detects fraudulent claims, and reduces the monetary losses of the insurance industry. The scheme uses a blockchain-based framework to secure transactions and share data among the different interacting agents within the insurance network, and adopts the Extreme Gradient Boosting (XGBoost) machine learning algorithm to support the above insurance services.
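To make the cost of the hash operations mentioned above concrete, the following is a minimal proof-of-work sketch. It is a toy hash puzzle, not Bitcoin's actual block format or difficulty encoding; the function names mine and verify, the record string, and the difficulty value are assumptions chosen for illustration.

```python
import hashlib

def mine(block_data, difficulty):
    """Search for a nonce whose SHA-256 digest starts with `difficulty` hex zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

def verify(block_data, nonce, difficulty):
    """Checking a solution takes one hash, whereas finding it takes many."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Hypothetical record; each extra hex zero multiplies the expected work by 16.
nonce, digest = mine("shared-data-record", difficulty=3)
print(f"hash attempts: {nonce + 1}, digest: {digest}")
```

Every miner repeats this search independently, so the total work grows with the number of miners; this redundancy is what a learned node selection approach such as that of Ref.[70] aims to avoid.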

Machine learning (ML) technology is expected to be used in specific applications of vehicular social networks (VSNs). The support vector machine (SVM) is a typical ML method that has been widely used because of its high efficiency. Because of the limitations of their data sources, the data collected by different entities often contain very different attributes. In some real-world scenarios, many entities therefore face the same problem when training SVM classifiers: each of them lacks sufficient attribute data. It is thus necessary for multiple entities to share data, combine data sets with different attributes, and jointly train a comprehensive classifier. However, data sharing raises data privacy issues. To solve this problem, Ref.[72] proposed a privacy-preserving SVM classifier training scheme over vertically partitioned data sets held by multiple data providers. This scheme uses a consortium blockchain and a threshold homomorphic cryptosystem to establish a secure SVM classifier training platform without a trusted third party.
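The following minimal sketch shows why vertically partitioned data suits a linear SVM: the decision value w·x + b splits into per-provider partial sums that can be combined without revealing any provider's attributes. It is not the scheme of Ref.[72]. Zero-sum random masking is used here as a simple stand-in for the threshold homomorphic cryptosystem, the masks are generated in one place purely for illustration, only the evaluation of an already trained classifier is shown, and all names, weights, and feature values are hypothetical.

```python
import random

def partial_score(weights, features):
    """One provider's share of the linear decision value over its own attributes."""
    return sum(w * x for w, x in zip(weights, features))

def zero_sum_masks(n, rng):
    """Random integer masks that cancel out when all of them are added together."""
    masks = [rng.randint(-10**6, 10**6) for _ in range(n - 1)]
    masks.append(-sum(masks))
    return masks

# Hypothetical vertical partition: three providers hold different attributes
# of the same record, together with the matching slice of the weight vector.
parties = [
    {"weights": [2, -1], "features": [3, 4]},
    {"weights": [1], "features": [5]},
    {"weights": [-3, 2], "features": [1, 2]},
]
bias = -4

rng = random.Random(0)
masks = zero_sum_masks(len(parties), rng)

# Each provider publishes only its masked partial score.
masked = [partial_score(p["weights"], p["features"]) + m
          for p, m in zip(parties, masks)]

# The masks cancel in the sum, so the aggregate equals w·x + b.
decision_value = sum(masked) + bias
label = 1 if decision_value >= 0 else -1
print(decision_value, label)
```

A real deployment would replace the masking step with encryption under a shared threshold key, so that no single party could decrypt an individual contribution.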

6 Future research directions

6.1 Scalability

The data structure of blockchain itself, together with its heavy authentication process for recording data, makes it difficult for a blockchain to cope with the growing volume of data, to store and process big data, and to scale. At present, the scalability problem remains unsolved. Some scholars have pointed out that one solution is to restructure the blockchain and expand its storage[73], but this method has not been proven. The authors of Ref.[74] focused on the issue of blockchain scalability and outlined directions for further research on it.

6.2 Lack of government and legal regulation

Because blockchain is an emerging technology, its rapid development and success have set off a new “revolutionary wave.” As blockchain technology is applied to electronic currency, medical treatment, the Internet of Things, smart cities, and other fields, people are enjoying the convenience it brings, but corresponding management problems have also emerged.

With the explosive growth of electronic currency, the problems exposed by the lack of legal norms for blockchain technology should not be underestimated. The author of Ref.[75] investigated the state of blockchain legislation in various countries and found that many countries have no relevant regulations, which could lead to the illegal circulation of electronic currency and possibly to “inflation.”

On the other hand, Ref.[76] points out that many illegal actors exploit the anonymity provided by blockchain to carry out illegal activities. Because relevant regulations for blockchain are lacking, some criminals have an opportunity to escape punishment.

7 Conclusion

This paper introduces data sharing applications and blockchain technology, discusses the advantages of and motivations for applying blockchain technology to data sharing scenarios, and summarizes a typical data sharing architecture consisting of a data storage layer, a data sharing layer, a consensus layer, and a terminal device layer. It also summarizes the typical roles that blockchain technology plays in data sharing scenarios: secure storage, access control, auditing, and custom sharing rules. Artificial intelligence and other applications that need large-scale data support will be research hotspots for blockchain-based data sharing technology. At the same time, the scalability of the technology and the capacity for business supervision need to be improved rapidly to meet the demands of large-scale application.