Methodology

Summary 

The CBECI mining map aims to track the geographic distribution of Bitcoin’s total hashrate over time. Available in monthly intervals starting from September 2019, it extrapolates from a representative sample of geolocational mining facility data. This sample is based on the aggregation of individual pool distributions that are periodically collected from several Bitcoin mining pools through a dedicated API. 

Data collection 

 We have partnered with several Bitcoin mining pools to collect geolocational mining facility data in a non-obtrusive and privacy-preserving manner. This geolocational data is based on IP addresses of mining facility operators (‘hashers’) that connect to the servers of mining pools.  

Assumption 1: IP addresses of mining facility operators are an accurate indicator of hashrate location.

Each participating mining pool aggregates IP addresses on its end to create an average geographic distribution of total pool hashpower by country and region. Pools then periodically push their individual distributions to our database via a dedicated API endpoint, connecting with a unique, pseudonymous access token that obfuscates the identity of the pool. The corresponding name table is encrypted and stored locally for security purposes. Please note that CCAF at no point has access to the underlying IP addresses or any other sensitive pool data.

Data shared includes the following parameters: 

  • Average hashrate for a given country/region over the selected period (different units accepted)

  • Country name from the World Bank classification list 

  • Data period (ranging from “daily” to “monthly”) 

  • Period start date in YYYY-MM-DD format 

  • Province name for pools that want to provide a more granular regional breakdown within a country (optional, free format)  
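To make the parameter list above concrete, the following is a minimal sketch of how one submitted record could be represented and checked. The class name, field names, unit table, and the set of allowed periods are assumptions made for illustration only; the actual API schema is not described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical unit multipliers (to hashes per second). The text only states
# that "different units" are accepted, not which ones.
UNIT_TO_HS = {"H/s": 1.0, "TH/s": 1e12, "PH/s": 1e15, "EH/s": 1e18}

# Assumed set of periods; the text only gives the range "daily" to "monthly".
ALLOWED_PERIODS = {"daily", "weekly", "monthly"}


@dataclass(frozen=True)
class PoolSubmission:
    """One record of a pool's distribution (illustrative, not the real schema)."""
    country: str                    # country name from the World Bank list
    avg_hashrate: float             # average hashrate over the period
    unit: str                       # unit of avg_hashrate
    period: str                     # data period, e.g. "monthly"
    period_start: str               # period start date, YYYY-MM-DD
    province: Optional[str] = None  # optional, free-format regional name

    def __post_init__(self):
        if self.unit not in UNIT_TO_HS:
            raise ValueError(f"unsupported unit: {self.unit}")
        if self.period not in ALLOWED_PERIODS:
            raise ValueError(f"unsupported period: {self.period}")
        if self.avg_hashrate < 0:
            raise ValueError("hashrate must be non-negative")
        # Raises ValueError if the date is not a valid YYYY-MM-DD string.
        date.fromisoformat(self.period_start)

    @property
    def hashrate_hs(self) -> float:
        """Average hashrate normalised to hashes per second."""
        return self.avg_hashrate * UNIT_TO_HS[self.unit]
```

Normalising every record to a single unit at the point of entry keeps later aggregation steps free of unit handling.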

Data aggregation and analysis 

Upon receiving individual pool data, we apply a number of data validation techniques to ensure that reported data is complete and constitutes a reasonable approximation. Among other checks, we compare reported data with publicly observed data from third-party services such as Coin Metrics or BTC.com. 
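One such plausibility check could look like the sketch below: the hashrate a pool reports across all countries is compared with an estimate of that pool's hashrate from a public source. The function name and the 25% tolerance are hypothetical; the validation rules actually applied are not specified in this document.

```python
def check_pool_report(by_country, observed_total, tolerance=0.25):
    """Compare a pool's reported total with a publicly observed estimate.

    by_country:     dict mapping country -> reported hashrate (one unit)
    observed_total: the pool's hashrate according to a third-party source
    tolerance:      maximum accepted relative deviation (assumed value)

    Returns (is_plausible, relative_deviation).
    """
    if observed_total <= 0:
        raise ValueError("observed_total must be positive")
    reported_total = sum(by_country.values())
    deviation = abs(reported_total - observed_total) / observed_total
    return deviation <= tolerance, deviation
```

A report that fails such a check would be a candidate for follow-up with the pool rather than automatic rejection.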

We then aggregate the individual pool distributions to create a broader sample. This process is performed manually to ensure that no individual pool's data can be inferred. As a result, we never add new pools or retrospective data in isolation, which may occasionally delay map updates because the timing of data sharing needs to be coordinated across pools. The sample serves as a proxy for the geographic distribution of Bitcoin's total hashrate by means of extrapolation.

If significant anomalies are detected during the data collection process, we may exceptionally resort to alternative data sources or extrapolate historical data to ensure that the dataset remains an approximate reflection of real circumstances. We are in regular contact with industry experts, mining firms, and academics to monitor the situation and take appropriate action where needed. In the rare event that these anomalies affect a significant share of the sample, publication will be postponed until we are sufficiently confident that the issues have been resolved.
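The aggregation and extrapolation described above can be sketched as follows. This is an illustration under simplifying assumptions (all pools report in the same unit and for the same period), not the procedure as actually performed, which is manual. The function names and the example figures are hypothetical.

```python
from collections import defaultdict


def aggregate_shares(pool_reports):
    """Combine per-pool distributions into country shares of the sample.

    pool_reports: list of dicts, each mapping country -> hashrate.
    Returns a dict mapping country -> share of total sample hashrate.
    """
    totals = defaultdict(float)
    for report in pool_reports:
        for country, hashrate in report.items():
            totals[country] += hashrate
    sample_total = sum(totals.values())
    if sample_total <= 0:
        raise ValueError("sample contains no hashrate")
    return {c: h / sample_total for c, h in totals.items()}


def extrapolate(shares, network_hashrate):
    """Apply sample shares to total network hashrate (Assumption 2)."""
    return {c: s * network_hashrate for c, s in shares.items()}


# Made-up figures for two pools, in the same arbitrary unit.
pools = [{"A": 30.0, "B": 10.0}, {"A": 10.0, "C": 50.0}]
shares = aggregate_shares(pools)
estimate = extrapolate(shares, network_hashrate=300.0)
```

Because only the summed distribution is carried forward, the per-pool inputs cannot be read off the published shares as long as more than one pool contributes to each update.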

Assumption 2: data provided by participating mining pools constitutes a representative sample of Bitcoin’s total geographic hashrate distribution.

This approach is based on the implicit assumption that participating mining pools constitute a representative sample of the total Bitcoin hashrate (please see the Discussion section for the limitations of this approach). Data is currently provided by three mining pools for the latest period. As shown in Table 1 below, the sample has captured between approximately 32% and 38% of total Bitcoin hashrate since the launch of the mining map.[1]

Table 1: Mining pool sample 

| Period | BTC.com | Poolin | ViaBTC | Foundry | Average share of total hashrate |
|---|---|---|---|---|---|
| Sep 2019 – Apr 2020 | X | X | X |  | 37% |
| May 2020 – Jan 2021 | X | X | X |  | 32% |
| Feb 2021 – Apr 2021 | X | X | X | X | 35% |
| May 2021 – Aug 2021 | X | X | X | X | 34% |
| Aug 2021 – Nov 2021 | X | X | X | X | 38% |
| Dec 2021 – Jan 2022 | X | X |  | X | 33% |

Regional breakdowns are available for China and the United States, although the frequency of country-level updates may differ owing to peculiarities of the data collection process. We plan to add more granularity in future updates to better capture regional hashing activity in other countries. If regional hashrate data is not available for a given pool, we extrapolate from the existing sample.

Assumption 3: the available sample of regional data is representative of the total hashrate distribution within a given country.
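Under Assumption 3, the regional extrapolation can be sketched as below: the regional shares observed among pools that do report a breakdown are applied to the country's total estimated hashrate, including the portion from pools without regional data. The function name and figures are hypothetical and only illustrate the idea.

```python
def regional_estimate(country_total, regional_sample):
    """Distribute a country's hashrate across regions using sample shares.

    country_total:   estimated hashrate for the whole country
    regional_sample: dict mapping region -> hashrate, taken only from
                     pools that provide a regional breakdown
    """
    sample_total = sum(regional_sample.values())
    if sample_total <= 0:
        raise ValueError("no regional data available for this country")
    return {
        region: country_total * hashrate / sample_total
        for region, hashrate in regional_sample.items()
    }


# Made-up figures: regional data covers 4 units of a 100-unit country total.
regions = regional_estimate(100.0, {"Region A": 3.0, "Region B": 1.0})
```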

Discussion 

There are various methods available to approximate Bitcoin’s geographical hashrate distribution, each with its own set of trade-offs and limitations. We opted for the top-down mining pool approach because it balances data availability and robustness on the one hand with data granularity and confidentiality on the other.  

Maintaining a bottom-up list of individual mining facilities is cumbersome, prone to error and/or omission, requires constant monitoring and updating, relies on a combination of different data sources that tend to be difficult to verify, and exposes individual facility operators to potential privacy risks. In contrast, the top-down pool approach eliminates much of the overhead associated with data collection (by significantly reducing the amount of data that needs to be collected), uses a single, consistent methodology throughout the process (thereby reducing the risk of incompatible or otherwise conflicting data points), and protects the privacy of both individual mining facility and pool operators (thanks to the double aggregation performed by different parties).

As with every model, however, there are certain limitations that arise from this approach.  

  • Sample may not be sufficiently representative 

The Bitcoin mining map is based on an extrapolation of a sample of mining pool data. This sample may not be fully representative as it (i) represents less than half of Bitcoin’s total hashrate, and (ii) is dominated by mining pools previously headquartered in China. 

While earlier versions of the mining map appeared overly biased towards China, there are reasons to believe that the sample has nevertheless provided a reasonable approximation of the actual hashrate distribution to date. For one, all participating pools maintain servers in various geographies across the globe to serve their foreign customer base with minimal latency. Furthermore, Chinese pools have dominated Bitcoin mining in recent years, partly because their relatively low fees have attracted numerous foreign hashers. With the recent exodus, we expect this trend to accelerate. Finally, the data is based on three mining pools that operate as independent businesses, which allows us to further cross-check for potential anomalies.

The research team is actively looking to partner with additional mining pools and hashers to improve the accuracy and reliability of the mining map. Please get in touch if you would like to contribute. 

  • Use of VPNs or proxy services 

It is no secret in the industry that hashers in certain locations use virtual private networks (VPNs) or proxy services to hide their IP addresses in order to obfuscate their location. Such behaviour may distort the sample and result in an overestimation (or underestimation) of hashrate in some provinces or countries. For one mining pool, this effect was particularly visible in the Chinese province of Zhejiang. To mitigate this issue, we divided the hashrate of Zhejiang province proportionally among other Chinese provinces listed in the pool’s dataset.
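The proportional reallocation described above can be sketched as follows. This is a minimal illustration with made-up figures, not the pool's data; the function name is hypothetical.

```python
def redistribute(province_hashrate, excluded):
    """Reallocate one province's hashrate proportionally to the others.

    province_hashrate: dict mapping province -> hashrate for one pool
    excluded:          province whose hashrate is to be redistributed
    """
    if excluded not in province_hashrate:
        return dict(province_hashrate)
    moved = province_hashrate[excluded]
    rest = {p: h for p, h in province_hashrate.items() if p != excluded}
    rest_total = sum(rest.values())
    if rest_total <= 0:
        raise ValueError("no other provinces to receive the hashrate")
    # Each remaining province receives a slice proportional to its own share.
    return {p: h + moved * h / rest_total for p, h in rest.items()}


# Made-up figures for illustration only.
adjusted = redistribute(
    {"Zhejiang": 20.0, "Sichuan": 60.0, "Xinjiang": 20.0}, "Zhejiang"
)
```

The adjustment preserves the pool's total hashrate for the country and the relative ranking of the remaining provinces; it only removes the province suspected of reflecting proxy endpoints rather than actual mining.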

Following the government ban in June 2021, reported hashrate for Mainland China effectively plummeted to zero during July and August but suddenly surged back to more than 20% in September. This strongly suggests that significant underground mining activity has formed in the country, which appears to provide empirical support for similar assertions by leading industry insiders and miners. Access to off-grid electricity and geographically scattered, small-scale operations are among the major means used by underground miners to hide their operations from authorities and circumvent the ban.[2]

A comeback of this magnitude within one month would seem unlikely given physical constraints, as it takes time to find existing, or build new, non-traceable hosting facilities at such scale. This highlights an inherent trade-off of our top-down pool-based approach, which is theoretically vulnerable to deliberate obfuscation by individual miners who may, for various reasons, choose to conceal their location. This behaviour is best illustrated by the persistently high reported shares of countries like Germany and Ireland where, as far as we can tell, no meaningful mining activities exist.[3]

In practice, however, we believe this limitation only moderately affects the validity of the overall analysis most of the time, with the exception of sudden ‘shocks’ that fundamentally alter risk tolerance and miner expectations. The reason is network latency and the corresponding revenue losses: greater latency in network connections generally reduces mining revenues because longer block propagation times may result in orphaned blocks that yield no reward. Even when the added latency is small, it puts affected hashers at a disadvantage relative to other hashers in the global competitive race to solve the hash puzzle. Unless they have political or regulatory motives, hashers are therefore incentivised to use servers in close vicinity.