Translation of an article by Eric Wall, Chief Investment Officer at Arcane Assets.
- Bitcoin is pseudonymous: the protocol does not know your real name, but your transactions can be linked to you in many ways;
- Blockchain analytics companies specialize in deanonymizing activity on the Bitcoin blockchain and sell this data to corporations and law enforcement agencies;
- Understanding how the system works, and using Tor, coin control, CoinJoin transactions, and single-use addresses, can be critical to preserving anonymity on the Bitcoin network;
- This article aims to give the reader a general idea of how anonymous Bitcoin is. The following articles in this series will look at various Bitcoin wallets, as well as at cryptocurrencies and exchange platforms in regions with limited economic and political freedom.
When examining cryptocurrencies at the protocol level, it is immediately clear that they are more privacy-oriented than traditional digital payment systems. At the basic level of these protocols, there is usually no mapping between users' cryptographic key pairs and their personal data.
There are many perspectives on cryptocurrencies as money, but in these articles we will focus primarily on privacy. The level of privacy that cryptocurrencies provide varies greatly depending on the user's choices and on the use of supporting privacy tools. We will also see that adoption of cryptocurrencies, and of bitcoin in particular, is growing in countries where the population's economic freedom is limited.
Bitcoin privacy
Bitcoin is neither completely anonymous nor completely transparent. Its privacy exists in a gray zone and ultimately depends on the skills of the user and the capabilities of the blockchain analyst. There is no perfect way to ensure privacy for any kind of online activity. Moreover, privacy is never static: it keeps evolving through an ongoing contest between those who build tools to protect anonymity and those who build tools to break it. Bitcoin is no exception, and its protocol continues to evolve as well.
As a result, activists or journalists who are considering using bitcoin to hide from the prying eyes of an authoritarian government or corporation need to understand what types of traces they leave when they use bitcoin.
Tracking transactions
When you make a Bitcoin transaction, you leave two types of traces, which can be divided into “on-chain” and “off-chain” traces. The information stored on the blockchain does not directly connect your identity to your transactions, but it does contain data that can help link your transactions to one another. Traces that link your identity to your transactions fall into the second, “off-chain” category.
- What is outside the blockchain
When you make a transaction, you are most often sending money to, or receiving it from, some organization that knows who you are. That organization holds information about you outside the blockchain.
As a result, someone with sufficient motivation can figure out how you use your bitcoins, how much you have, and with whom you have traded.
There are also countless ways to link you to a transaction even when you deal with an organization that does not know you, because Bitcoin transactions are usually sent over the Internet in unencrypted packets, and the originating IP address can be determined in various ways. For transactions broadcast from a full node (Bitcoin Core), determining the originating IP address requires intercepting and analyzing network traffic. “Light” mobile wallets (Mycelium, Blockchain Wallet, Coinbase Wallet), however, broadcast transactions through company-run servers, which can directly see your IP address as well as your complete transaction history. The same applies to most hardware wallets (Ledger, Trezor) when used with their default setup.
More importantly, your IP address reveals your Internet service provider, which in turn knows to whom the IP address is assigned and is often legally required to retain this information for several months.
Even if you use a public Wi-Fi network to broadcast transactions, you can still accidentally associate your real identity with that IP address through the websites you visit and the services your device connects to. For example, the Dropbox app connects to Dropbox's servers when you turn on your laptop, which associates the IP address with your Dropbox account. The same happens when you log in to your account on any website. Even if you do not log in anywhere, the cookies stored on your laptop can identify you by linking you to your previous browsing history. Many websites track users this way for analytics purposes; by one estimate, Google alone tracks users on 80% of websites.
Even if you delete your cookies, website operators can track you across different websites through your browser's unique fingerprint and thereby associate the IP address with you. And even if no services are running on your device, its MAC address may be visible to the operator of the network you connect to, and it can then be linked to you using more elaborate methods.
You may also be linked to an address or transaction through a simple web search: few people besides you would have any reason to look up that particular transaction or address, so the search service can connect it to your IP address.
The Tor network is currently the best available way to hide your device and IP address when looking up transaction information or broadcasting transactions. Many wallets, including Bitcoin Core, offer it as a configurable option.
Tor Browser can be a useful tool for masking your Internet activity: in addition to hiding your IP address, it clears cookies every time you exit and is designed to resist most browser-fingerprinting techniques.
- What's on the blockchain
An easy way to see what kind of information the Bitcoin blockchain contains is to use a block explorer. We will use the Blockstream.info explorer.
The most recent block in the Bitcoin blockchain at the time of writing (#563899) contains 2,122 transactions. Let's see what we can learn about a randomly selected one.
Transactions contain inputs and outputs and are identified by a transaction ID (shown at the top of the screenshot above). Every transaction your Bitcoin wallet sends is identified by such an ID.
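A transaction ID is not assigned by anyone; it is derived from the transaction's contents. A minimal sketch, assuming you already have the transaction serialized without its witness data (since SegWit, the txid excludes witnesses): the txid is the double SHA-256 of those bytes, which explorers display in reversed byte order. The function name and the placeholder bytes are hypothetical.

```python
import hashlib

def txid_from_serialized(tx_bytes: bytes) -> str:
    """Return the txid for a transaction serialized WITHOUT witness data.

    Bitcoin hashes the serialization twice with SHA-256; explorers and
    Bitcoin Core show the resulting 32 bytes in reverse order as hex.
    """
    digest = hashlib.sha256(hashlib.sha256(tx_bytes).digest()).digest()
    return digest[::-1].hex()

# Hypothetical placeholder bytes, not a real transaction.
print(txid_from_serialized(b"\x02\x00\x00\x00"))
```

Because the ID is a hash of the contents, changing any input or output produces a different txid.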
Besides the transaction ID, we can find out:
- The approximate time when the transaction was made (from the block header);
- Addresses to which bitcoins were sent, and amounts ("outputs");
- Source of transaction funds ("inputs").
Let's look at each of these points in turn for the transaction above: e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8.
Time
Blocks, not transactions, contain timestamps. These timestamps are not necessarily precise, but as long as most miners report the correct time, block timestamps should be accurate to within a few hours. This does not mean that a block's timestamp is always within a few hours of when the transaction was broadcast, since it can sometimes take much longer for a transaction to be included in a block.
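Why are block timestamps only accurate to within hours? Nodes enforce two loose bounds, sketched below as an illustration of Bitcoin Core's rules: a block's timestamp must be greater than the median of the previous 11 blocks' timestamps (“median time past”), and it must not be more than two hours ahead of the validating node's current time. The function name and sample values are hypothetical.

```python
from statistics import median

MAX_FUTURE_DRIFT = 2 * 60 * 60  # two hours, in seconds

def timestamp_acceptable(block_time: int, previous_times: list[int], node_time: int) -> bool:
    """Check a block timestamp against the two bounds nodes apply.

    previous_times: timestamps of preceding blocks (only the last 11 are used).
    node_time: the validating node's current time, in Unix seconds.
    """
    median_time_past = median(previous_times[-11:])
    return median_time_past < block_time <= node_time + MAX_FUTURE_DRIFT

prev = list(range(1000, 12000, 1000))  # 11 made-up timestamps: 1000 ... 11000
print(timestamp_acceptable(6500, prev, node_time=100_000))  # True
```

Note that the example block stamped 6,500 is accepted even though its predecessor claims 11,000: timestamps need not even increase from block to block, which is why they only bound the time loosely.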
Some block explorers supplement this data with the time at which they first saw the transaction on the network, which gives a more accurate picture of when it was broadcast.
The approximate time at which a transaction was included in a block can be determined from the block header (in our case, block 563899, with a timestamp of 2019-02-20 14:45 UTC).
Addresses to which bitcoins were sent and amounts:
- 32Z63LVtUERdEEwz275JHt3o4cewPfE8YC 0.26119849 BTC
- 31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2 0.2214705 BTC
There is more to an address than meets the eye. Bitcoin addresses are often described as “hard-to-read, Bitcoin-only email addresses,” but an address is not simply a pointer to a particular user's cryptographic key pair. It is a cryptographic description of the spending rules that must be satisfied the next time someone wants to move those bitcoins.
For example, if you send bitcoins to 37k7toV1Nv4DfmQbmZ8KuZDQCYK9x5KpzP, you are not sending them to the owner of a particular private key but to a spending rule that pays out to anyone who can provide two different strings with the same SHA-1 hash. Note that because many address formats in use today contain only a hash of the spending rules, we generally cannot tell what those rules are until someone spends bitcoins from the address, since the spender must then reveal the data that was hashed.
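The idea that an address commits to a hidden spending rule can be sketched in a few lines. This is a simplified model, not real script evaluation: it assumes a P2WSH-style commitment, where the address carries the SHA-256 hash of the script (legacy P2SH addresses, such as the 3-prefixed ones in this article, use HASH160 instead), and it models the SHA-1 collision rule as a plain Python predicate. All names are hypothetical.

```python
import hashlib

def commitment(script: bytes) -> bytes:
    """What a P2WSH-style address encodes: only the SHA-256 hash of the script."""
    return hashlib.sha256(script).digest()

def reveal_matches(committed: bytes, revealed_script: bytes) -> bool:
    """At spend time the spender reveals the script; anyone can check it against the hash."""
    return hashlib.sha256(revealed_script).digest() == committed

def sha1_collision_rule(a: bytes, b: bytes) -> bool:
    """Model of the bounty's rule: two different strings with equal SHA-1 hashes."""
    return a != b and hashlib.sha1(a).digest() == hashlib.sha1(b).digest()

script = b"hypothetical script bytes"
addr_payload = commitment(script)
# Before the coins are spent, an observer sees only addr_payload, not the rule itself.
print(reveal_matches(addr_payload, script))  # True
print(sha1_collision_rule(b"abc", b"abd"))   # False
```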
In our example, bitcoins have since been spent from both addresses, so their spending rules are known. 32Z63LVtUERdEEwz275JHt3o4cewPfE8YC was revealed to be a 2-of-2 multisignature address when it was spent in transaction f491dfe9867c36e85950116a90a612806060608866ad0f3598d70d146750162f. We'll take a closer look at how this is revealed in the next section.
Likewise, 31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2 is a heavily used 2-of-3 multisig address that held approximately 2,700 bitcoins at the time of writing. More advanced block explorers, such as oxt.me, even display an address's balance over time and the hours of the day when it is most active.
Since 18:00-22:00 UTC are the hours of least activity for this address, it is reasonable to assume that they correspond to the night hours of 01:00-05:00 or 02:00-06:00 in the region where the address is operated. Considering its hours of activity, its volume, and its multisignature setup, one can guess that this address belongs to a crypto exchange in the GMT+7 or GMT+8 time zone.
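The time-zone guess above is simple modular arithmetic. A minimal sketch, assuming you have counted an address's transactions per UTC hour (the histogram below is made up, not oxt.me data) and that the quietest four-hour window corresponds to a local night starting at 01:00 or 02:00; the function names are hypothetical:

```python
def quietest_window_start(hourly_counts: list[int], width: int = 4) -> int:
    """Return the UTC hour at which the least-active `width`-hour window begins."""
    assert len(hourly_counts) == 24

    def window_total(start: int) -> int:
        return sum(hourly_counts[(start + i) % 24] for i in range(width))

    return min(range(24), key=window_total)

def implied_utc_offset(quiet_start_utc: int, assumed_local_start: int) -> int:
    """UTC offset in hours (normalized to -11..+12) mapping the quiet window onto local night."""
    offset = (assumed_local_start - quiet_start_utc) % 24
    return offset - 24 if offset > 12 else offset

counts = [50] * 24
for hour in (18, 19, 20, 21):
    counts[hour] = 2  # made-up quiet hours, UTC
start = quietest_window_start(counts)
print(start, [implied_utc_offset(start, local) for local in (1, 2)])  # 18 [7, 8]
```

The result (UTC+7 or UTC+8) only narrows down a region under the stated assumption about when people sleep; it is a clue, not proof.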
For better privacy, it is recommended not to reuse Bitcoin addresses. So-called HD (hierarchical deterministic) wallets can generate many addresses that are all recoverable from a single seed, and they automatically generate a new address for you for every transaction.
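How can one seed yield an unlimited supply of keys? A minimal sketch of BIP32 hardened private-key derivation using only the standard library: each child key is computed from the parent key, a chain code, and an index with HMAC-SHA512. Assumptions: this covers only hardened derivation and stops at private keys; real wallets usually derive receiving addresses with non-hardened derivation (which requires elliptic-curve point arithmetic, omitted here) and then turn public keys into addresses. It is an illustration, not wallet code. The seed used below is the one from BIP32's first test vector, so results can be checked against the specification.

```python
import hashlib
import hmac

# Order n of the secp256k1 group; valid private keys lie in 1..n-1.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED_OFFSET = 0x80000000  # BIP32 marks hardened indices with the top bit

def master_key(seed: bytes) -> tuple[int, bytes]:
    """Master private key and chain code: HMAC-SHA512 keyed with b"Bitcoin seed"."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key = int.from_bytes(digest[:32], "big")
    if not 0 < key < N:
        raise ValueError("invalid master key; use a different seed")
    return key, digest[32:]

def derive_hardened(parent_key: int, chain_code: bytes, index: int) -> tuple[int, bytes]:
    """Child private key and chain code for hardened index `index` (written index')."""
    if not 0 <= index < HARDENED_OFFSET:
        raise ValueError("index out of range")
    i = index + HARDENED_OFFSET
    data = b"\x00" + parent_key.to_bytes(32, "big") + i.to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    tweak = int.from_bytes(digest[:32], "big")
    child_key = (tweak + parent_key) % N
    if tweak >= N or child_key == 0:
        raise ValueError("invalid child key; skip to the next index")
    return child_key, digest[32:]

seed = bytes(range(16))  # 000102...0f; real wallets derive the seed from a mnemonic
key, chain = master_key(seed)
for i in range(3):
    child, _ = derive_hardened(key, chain, i)
    print(f"key {i}': {child:064x}")
```

Because derivation is deterministic, backing up the seed is enough to regenerate every key, so handing out a fresh address per payment costs the user nothing.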
Bitcoin transactions commonly pay to two addresses: one output is the actual payment and the other is the change output. Because an input must always be spent in full, the change output returns the remainder to the sender.
The change output and the actual recipient's address can often be told apart by round amounts (in bitcoin or in fiat at the time of the transaction), the order of the outputs in the transaction, and so on. In the transaction we chose, the change is easy to find, since it is returned to the same address that funded the transaction.
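The change-detection heuristics above can be expressed as a short function. A minimal sketch, assuming only two heuristics (address reuse first, then round amounts in satoshis); real analytics tools combine many more signals, and the function names are hypothetical. The example uses the two outputs of our transaction and assumes, as the text indicates, that the change went back to the funding address 31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2.

```python
from typing import Optional

def trailing_zeros(sats: int) -> int:
    """Count trailing decimal zeros: a rough measure of how 'round' an amount is."""
    count = 0
    while sats and sats % 10 == 0:
        sats //= 10
        count += 1
    return count

def guess_change(outputs: dict[str, int], input_addresses: set[str]) -> Optional[str]:
    """Guess which output address is the change; `outputs` maps address -> satoshis."""
    # Heuristic 1: an output paying back to an input address is very likely change.
    reused = [addr for addr in outputs if addr in input_addresses]
    if len(reused) == 1:
        return reused[0]
    # Heuristic 2 (two outputs only): payments tend to be rounder, so change is the less round one.
    if len(outputs) == 2:
        (addr_a, sats_a), (addr_b, sats_b) = outputs.items()
        za, zb = trailing_zeros(sats_a), trailing_zeros(sats_b)
        if za != zb:
            return addr_a if za < zb else addr_b
    return None  # no confident guess

outputs = {
    "32Z63LVtUERdEEwz275JHt3o4cewPfE8YC": 26_119_849,
    "31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2": 22_147_050,
}
print(guess_change(outputs, {"31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2"}))
```

On these amounts, the round-number heuristic alone would point the wrong way: the payment appears to have been round in dollars (about $1,000) rather than in bitcoin, so the change happens to have the rounder satoshi count. This is one reason analysts combine several heuristics.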
More generally, different Bitcoin wallets leave different traces on the blockchain, just as different browsers reveal information about themselves when you browse websites. As a result, it is sometimes possible to identify transactions created by a specific wallet. Every small piece of information helps a blockchain analyst build a more accurate picture of who you are and what you do.
Source of funds
In Bitcoin transactions, the “source of funds” is always an output of an earlier transaction that has not yet been spent, known as an unspent transaction output (UTXO). Keep in mind that every block explorer shows a mix of raw blockchain data and derived data. One explorer might display a transaction like this:
Here the “source of funds” is displayed as an address. In the Blockstream explorer, the source of funds is displayed as a transaction:
The Blockstream explorer does not show an address as the source of funds because addresses are not technically part of transaction inputs, and the originating address cannot always be deduced (example). Moreover, since address reuse is discouraged, it is helpful to mentally separate the Bitcoin transaction model from traditional payment systems, rather than reinforcing, by showing addresses as senders, the idea that money can or should be sent back to the same address it came from.
Let's take a closer look at the technical side and examine the transaction data you can get from your own full Bitcoin node (or this web tool). This is what it looks like:
The source of funds is described by the “vin” array. It refers not to an address but to an output of a previous transaction, 593e2d5c65b3505d897a13033741037d6c59e683b3345314a58253a8f1572758, where “vout”: 0 refers to that transaction's first output (“vout”: 1 would mean its second output, and so on). This UTXO is the source of funds.
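Reading the source of funds out of the decoded transaction is mechanical. A minimal sketch, assuming JSON shaped like the output of Bitcoin Core's `decoderawtransaction` (or verbose `getrawtransaction`), trimmed here to the two fields discussed; the helper name is hypothetical:

```python
import json

decoded = json.loads("""
{
  "txid": "e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8",
  "vin": [
    {"txid": "593e2d5c65b3505d897a13033741037d6c59e683b3345314a58253a8f1572758", "vout": 0}
  ]
}
""")

def spent_outpoints(tx: dict) -> list[tuple[str, int]]:
    """Each input points at (previous txid, output index), never at an address."""
    return [(vin["txid"], vin["vout"]) for vin in tx["vin"]]

for prev_txid, index in spent_outpoints(decoded):
    print(f"spends output #{index} of {prev_txid}")
```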
For clarity: the source of funds for a transaction is neither an address nor a transaction, but a specific output of a specific previous transaction. Knowing this will help you protect your privacy while using bitcoin (more on this in the following sections).
The last hexadecimal string in txinwitness is the witness script, which reveals a 2-of-3 multisignature spending rule and suggests that this address may belong to an exchange. The other two hexadecimal strings in txinwitness are ordinary signatures that satisfy the 2-of-3 multisig condition.
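How does a hex string reveal “2-of-3”? A bare multisig script has the form `OP_m <pubkey 1> ... <pubkey n> OP_n OP_CHECKMULTISIG`, where OP_1 to OP_16 are the bytes 0x51 to 0x60 and OP_CHECKMULTISIG is 0xae. A minimal parser sketch, assuming every public key is a directly pushed 33-byte compressed key; the script in the example is built from dummy keys, not the real witness script:

```python
OP_1, OP_16, OP_CHECKMULTISIG = 0x51, 0x60, 0xAE
PUSH_33 = 0x21  # "push the next 33 bytes" (a compressed public key)

def parse_multisig(script_hex: str) -> tuple[int, int]:
    """Return (m, n) for a bare m-of-n multisig script, or raise ValueError."""
    script = bytes.fromhex(script_hex)
    if len(script) < 3 or script[-1] != OP_CHECKMULTISIG:
        raise ValueError("script does not end in OP_CHECKMULTISIG")
    if not (OP_1 <= script[0] <= OP_16 and OP_1 <= script[-2] <= OP_16):
        raise ValueError("m or n is not an OP_1..OP_16 opcode")
    m, n = script[0] - OP_1 + 1, script[-2] - OP_1 + 1
    pos, keys = 1, 0
    while pos < len(script) - 2:
        if script[pos] != PUSH_33:
            raise ValueError("only 33-byte compressed keys are handled in this sketch")
        pos += 1 + 33
        keys += 1
    if pos != len(script) - 2 or keys != n or m > n:
        raise ValueError("inconsistent multisig script")
    return m, n

dummy_key = "02" + "11" * 32  # 33 bytes, not a real public key
script = "52" + ("21" + dummy_key) * 3 + "53" + "ae"
print(parse_multisig(script))  # (2, 3)
```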
Now that we have identified the source of funds, we can see that this output holds 0.48298999 BTC (~$1,850), even though the payment itself was only about $1,000. This is an unwanted disclosure of funds. Imagine a friend pays you $10, but the transaction shows that they actually own a million dollars and have direct access to it: this is clearly bad for privacy. If you are concerned about revealing information about your bitcoin holdings when sending a payment, you need to know which inputs your transactions use.
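The numbers in this example can be checked directly. Working in satoshis (1 BTC = 100,000,000 satoshis) avoids floating-point rounding, and the fee is whatever the input provides beyond the outputs. The dollar rate below is only the rough rate implied by the text (~$1,850 for 0.48298999 BTC), not a quoted market price:

```python
SATS_PER_BTC = 100_000_000

input_sats = 48_298_999  # the spent UTXO, 0.48298999 BTC
outputs = {
    "32Z63LVtUERdEEwz275JHt3o4cewPfE8YC": 26_119_849,  # the ~$1,000 payment
    "31w3iWUN5EMJMW2YRCc5m4RFqm3zN61xK2": 22_147_050,  # change back to the sender
}

fee_sats = input_sats - sum(outputs.values())
usd_per_btc = 1850 / (input_sats / SATS_PER_BTC)  # rough rate implied by the article
payment_usd = outputs["32Z63LVtUERdEEwz275JHt3o4cewPfE8YC"] / SATS_PER_BTC * usd_per_btc

print(fee_sats)  # 32100
print(f"payment ~${payment_usd:,.0f}, but the input revealed ~$1,850")
```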
Combining knowledge
Since every transaction spends outputs of earlier transactions, transactions are linked together into a so-called transaction graph. If you pay a friend in bitcoin, your friend will not only see the inputs you used in the transaction; you can also see when your friend spends those coins and to which addresses they send them.
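Because every input names an earlier output, the transaction graph can be walked mechanically. A minimal sketch of walking backwards to all ancestors with breadth-first search, assuming a hypothetical in-memory index from each txid to the outpoints its inputs spend; the short IDs are made up:

```python
from collections import deque

# Hypothetical index: txid -> list of (previous txid, output index) spent by its inputs.
SPENDS = {
    "tx_d": [("tx_b", 0), ("tx_c", 1)],
    "tx_b": [("tx_a", 0)],
    "tx_c": [("tx_a", 1)],
    "tx_a": [],  # e.g. funded from outside our index
}

def ancestors(txid: str, spends: dict[str, list[tuple[str, int]]]) -> list[str]:
    """Return every earlier transaction whose outputs flow into `txid`, nearest first."""
    seen, order, queue = {txid}, [], deque([txid])
    while queue:
        current = queue.popleft()
        for prev_txid, _index in spends.get(current, []):
            if prev_txid not in seen:
                seen.add(prev_txid)
                order.append(prev_txid)
                queue.append(prev_txid)
    return order

print(ancestors("tx_d", SPENDS))  # ['tx_b', 'tx_c', 'tx_a']
```

Inverting the same index (outpoint to the transaction that spends it) lets you follow coins forward instead, which is how you can see where your friend later sends them.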
Some addresses are widely known in the Bitcoin community, such as the Bitfinex cold wallet or the confiscated Silk Road coins. An organization may also publish its own addresses online, and analytics companies routinely collect such information.
Other addresses are determined through clustering.
Clustering
Let's return to our transaction e70c2ed31c05fbf2865a15a696a7ca0cb8f3afef92c34f4e41051dc2356827c8. Here we can immediately see that coins linked to both the source of funds of our transaction and our transaction itself (the red dots) were later spent together to fund a third transaction (the big blue dot).
Since the private keys for both were used to sign the big-blue-dot transaction, these addresses now belong to the same cluster (along with 407 other addresses whose coins are inputs to that transaction). This lets us assume that they all have a single owner. This method of linking addresses is known as the common-input-ownership heuristic.
Blockchain analytics companies use this heuristic to build giant clusters. WalletExplorer, for example, has assigned these two addresses to a cluster of 162,787 addresses. Analytics companies tag such clusters with identifying information (IP addresses, user accounts, organizations, real names) to map out the ecosystem of Bitcoin transactions, and then sell that data to law enforcement agencies and other companies.
Many blockchain analytics companies receive transaction information directly from their customers (for example, cryptocurrency exchanges). The two largest analytics companies, Chainalysis and Elliptic, have stated that they do not track the transactions of specific individuals, only those of exchanges and other businesses.
Deanonymization of one address in a cluster leads to deanonymization of the entire cluster.
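The common-input-ownership heuristic, and the way a single label spreads to a whole cluster, can be sketched with a union-find structure. Assumptions: each transaction is reduced to the list of addresses its inputs came from, the addresses and the label are hypothetical, and no exceptions (such as the CoinJoin transactions discussed below) are filtered out:

```python
class UnionFind:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

def cluster(tx_inputs: list[list[str]]) -> UnionFind:
    """Common-input-ownership: all input addresses of one transaction share an owner."""
    uf = UnionFind()
    for addresses in tx_inputs:
        for addr in addresses:
            uf.find(addr)  # register single-input addresses too
        for other in addresses[1:]:
            uf.union(addresses[0], other)
    return uf

# Hypothetical input-address lists of three transactions.
uf = cluster([["addr1", "addr2"], ["addr2", "addr3"], ["addr4"]])

# One identified address labels its entire cluster.
known = {uf.find("addr1"): "Exchange X (hypothetical)"}
for addr in ["addr1", "addr2", "addr3", "addr4"]:
    print(addr, known.get(uf.find(addr), "unknown"))
```

Here addr3 never appeared alongside addr1, yet it inherits the label through addr2; this transitive chaining is how clusters grow to hundreds of thousands of addresses.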
Fighting heuristics
We now know that there are many ways to associate your identity with a specific Bitcoin address or transaction. Taken together, this data can destroy all of our financial privacy.
Some Bitcoin users deliberately try to make life harder for blockchain analysts. Some methods confuse the heuristics, while others try to avoid triggering them altogether. Bitcoin wallets can help users by automating some of these methods or by exposing them in the user interface.
Here is a partial list of these methods:
- Randomizing the order of outputs when creating a transaction (example).
- Avoiding address reuse (using HD wallets).
- Using PayNym, a public, shareable identifier (an implementation of BIP47 payment codes) that lets you receive payments at different addresses you control, known only to you and the sender. With PayNym, every payment arrives at a new address without you having to hand out a new address manually each time.
- Coin selection / coin control: wallets can be designed to let users manually choose the inputs of a transaction, so that users avoid revealing that they own certain coins.
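Coin control can be thought of as a selection rule that the user constrains. A minimal sketch, assuming a hypothetical wallet in which each UTXO carries a user-assigned tag and the policy is: never spend coins with an excluded tag, and prefer the single smallest coin that covers the payment, so the transaction reveals as little of the wallet as possible. Fees are ignored, and real wallets weigh more factors:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Utxo:
    outpoint: str  # "txid:vout" (hypothetical values below)
    sats: int
    tag: str       # user-assigned label, e.g. "salary" or "private"

def select_coins(utxos: list[Utxo], amount: int, exclude_tags: set[str]) -> Optional[list[Utxo]]:
    allowed = [u for u in utxos if u.tag not in exclude_tags]
    # Best case for privacy: one input that covers the payment on its own.
    single = [u for u in allowed if u.sats >= amount]
    if single:
        return [min(single, key=lambda u: u.sats)]
    # Otherwise combine coins, largest first, linking as few of them as possible.
    chosen, total = [], 0
    for u in sorted(allowed, key=lambda u: u.sats, reverse=True):
        chosen.append(u)
        total += u.sats
        if total >= amount:
            return chosen
    return None  # not enough funds outside the excluded coins

wallet = [
    Utxo("aaaa:0", 50_000_000, "private"),
    Utxo("bbbb:1", 3_000_000, "salary"),
    Utxo("cccc:0", 1_500_000, "salary"),
]
print([u.outpoint for u in select_coins(wallet, 2_000_000, {"private"})])  # ['bbbb:1']
```

Excluding the large "private" coin keeps the recipient from learning about it, which is exactly the disclosure problem from the 0.48298999 BTC example.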
A more advanced privacy-enhancing technique is the CoinJoin transaction: a scheme in which many different users combine their inputs into a single shared transaction before it is broadcast.
In our example, we saw that the input of a transaction always refers to the specific output of the previous transaction, and not to the entire transaction:
However, the inputs and outputs within a single transaction are not linked to each other in any way; a transaction is valid as long as its inputs contain enough bitcoin to cover all of its outputs.
The mixed outputs of a CoinJoin transaction have equal amounts, so an observer cannot be sure which input funded a particular output. As a result, a payment can have many possible “sources of funds” that are indistinguishable from one another, as well as many possible destinations. This does not technically hide the source or destination of the funds, but it mixes them up in a way that makes it hard to determine who sent bitcoins where.
Interestingly, these transactions undermine the common-input-ownership heuristic: all of their inputs will be marked as belonging to the same owner, although in reality they do not. The images below show false clusters of independent payments in a CoinJoin transaction.
However, because these transactions have distinctive outputs with identical amounts, they are fairly easy to spot and exclude.
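Both properties of CoinJoin outputs, the ambiguity they create and how easy they are to spot, come from grouping outputs by amount. A minimal sketch, assuming a made-up transaction and a made-up rule that flags any transaction with at least three equal-valued outputs; real analytics tools use more elaborate rules:

```python
from collections import Counter

def equal_output_groups(output_sats: list[int]) -> dict[int, int]:
    """Map each amount that appears more than once to how many outputs share it."""
    return {amount: n for amount, n in Counter(output_sats).items() if n > 1}

def looks_like_coinjoin(output_sats: list[int], min_equal: int = 3) -> bool:
    """Flag transactions with a suspiciously large group of identical outputs."""
    return any(n >= min_equal for n in equal_output_groups(output_sats).values())

# Hypothetical CoinJoin: five participants each receive 0.1 BTC, plus some change outputs.
coinjoin = [10_000_000] * 5 + [1_234_567, 987_654, 55_555]
groups = equal_output_groups(coinjoin)
print(groups)                                         # {10000000: 5}
print(looks_like_coinjoin(coinjoin))                  # True
print(looks_like_coinjoin([26_119_849, 22_147_050]))  # False
```

Each 0.1 BTC output could belong to any of the five participants, so its anonymity set is 5; that same regularity is what makes the transaction trivial to flag.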
The same principle, however, can be used to create transactions that are indistinguishable from ordinary ones, in a scheme called PayJoin, or Pay-to-EndPoint (P2EP). In this type of transaction, both the payer and the recipient contribute inputs, and the recipient receives the payment.
This is hardly a mixing transaction, but it breaks the common-input-ownership heuristic. More importantly, it leaves blockchain analysts no telltale sign with which to rule such transactions out. If PayJoin comes into widespread use, the common-input-ownership heuristic itself will become unreliable, which would be a major blow to blockchain analytics companies.
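Why PayJoin misleads the heuristics is easiest to see with numbers. A toy sketch with made-up names and amounts and no fee: Alice pays Bob 0.3 BTC, but Bob also contributes one of his own coins as an input:

```python
# Hypothetical PayJoin: Alice pays Bob 30,000,000 sats (0.3 BTC); fees ignored.
inputs = {"alice_coin": 100_000_000, "bob_coin": 50_000_000}
outputs = {"bob_new": 80_000_000, "alice_change": 70_000_000}

# A valid, fee-less toy transaction: inputs exactly cover outputs.
assert sum(inputs.values()) == sum(outputs.values())

# Common-input-ownership merges Alice's and Bob's coins into one (wrong) cluster.
assumed_single_owner = set(inputs)

# The output amounts suggest a payment of 0.7 or 0.8 BTC...
apparent_payment_candidates = sorted(outputs.values())
# ...but the real transfer is Bob's net gain.
actual_payment = outputs["bob_new"] - inputs["bob_coin"]
print(apparent_payment_candidates, actual_payment)  # [70000000, 80000000] 30000000
```

Both the clustering and the amount analysis come out wrong, and nothing in the transaction's shape reveals that it is a PayJoin.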
Lightning network
The Lightning Network is a technology being built on top of the Bitcoin protocol for fast, small payments. Lightning transactions differ in many ways from transactions on the Bitcoin main chain, including in terms of privacy:
- Lightning transactions are not recorded on the public blockchain;
- Lightning transactions use onion routing, which does not reveal the final recipient to the rest of the network;
- Lightning transactions do not combine inputs and are not susceptible to clustering.
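The onion-routing idea can be illustrated with a toy: the sender wraps the route in one encryption layer per hop, and each hop can peel only its own layer, learning just the next hop. Big assumptions: the XOR keystream built from SHA-256 below is NOT secure and is nothing like Lightning's actual Sphinx construction (specified in BOLT #4), the per-hop keys are simply given rather than negotiated, and the node names are hypothetical.

```python
import hashlib
import json

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream. Illustration only, not secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

def build_onion(route: list[str], keys: dict[str, bytes], payload: str) -> bytes:
    """Wrap from the last hop outwards; each layer names only the next hop."""
    blob = keystream_xor(keys[route[-1]], json.dumps({"next": None, "payload": payload}).encode())
    for hop, next_hop in zip(reversed(route[:-1]), reversed(route[1:])):
        layer = {"next": next_hop, "inner": blob.hex()}
        blob = keystream_xor(keys[hop], json.dumps(layer).encode())
    return blob

def peel(hop_key: bytes, blob: bytes) -> dict:
    """A hop removes its own layer; the inner blob stays unreadable to it."""
    return json.loads(keystream_xor(hop_key, blob))

route = ["carol", "dave", "erin"]
keys = {name: hashlib.sha256(name.encode()).digest() for name in route}  # toy keys
onion = build_onion(route, keys, "pay 1000 sats")
layer = peel(keys["carol"], onion)
print(layer["next"])  # dave
layer = peel(keys["dave"], bytes.fromhex(layer["inner"]))
print(layer["next"])  # erin
print(peel(keys["erin"], bytes.fromhex(layer["inner"]))["payload"])  # pay 1000 sats
```

Carol forwards to Dave without learning that Erin is the recipient, and Dave cannot tell whether Erin is the final hop or just another relay.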
The Lightning Network is a system of payment channels that requires liquidity. The merchants and users who accept Lightning payments today are a small subset of all Bitcoin users, and not all payments (especially large ones) can be routed through the channel system yet. This also means that while Lightning can provide better privacy for transactions within its channels, those channels still have to be funded with regular Bitcoin transactions, which carry the privacy concerns described above.
Conclusion
The purpose of this article is to provide an understanding of Bitcoin privacy. Bitcoin's pseudonymous, public blockchain creates an environment in which privacy ultimately depends on the tools used by both users and blockchain analysts. In the next article, we'll take a closer look at how different wallets can help you increase your anonymity on the Bitcoin network.