BUIP045: Unified address format for BUIP037

deadalnix

Active Member
Sep 18, 2016
115
196
This is mostly an adaptation of Rusty Russell's work for BUIP037.

The address is composed of the following elements:
1. Prefix for type, followed by colon. Currently "btc:" or "testnet:".
2. The version field, encoded using z-base32
3. The hash field, encoded using z-base32
4. At least 35 bits of crc64-ecma, padded up to a multiple of 5 to reach a letter boundary. The CRC covers the prefix (as ASCII), the version and the hash.
5. The final letter is the Damm algorithm check digit of the entire previous string, using the 32 way quasigroup formed by GF(2^5) and x^5 + x^2 + 1 as base polynomial and n = 2*x + y as generator.
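
As a rough illustration of steps 2 and 3, here is a minimal sketch of z-base32 encoding for a bit string. The alphabet is the standard z-base-32 one; the function name `zbase32_encode_bits` is hypothetical, and padding with zero bits on the right is an assumption made only so the sketch stands alone (the proposal itself fills the remaining bits with CRC bits).

```python
# Standard z-base-32 alphabet, in encoding order (index 0 is 'y').
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32_encode_bits(value, bit_length):
    """Encode the `bit_length` low bits of `value`, most significant bits first.

    Assumption for this sketch: the bit string is padded on the right with
    zero bits up to the next multiple of 5.
    """
    pad = -bit_length % 5
    value <<= pad
    n_chars = (bit_length + pad) // 5
    return "".join(
        ZBASE32[(value >> (5 * (n_chars - 1 - i))) & 31] for i in range(n_chars)
    )


print(zbase32_encode_bits(0x00, 8))  # yy
print(zbase32_encode_bits(0xFF, 8))  # 9h
```

Because each character carries exactly 5 bits, the first character depends only on the first 5 bits of the input, which is why the leading letter can hint at the version.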

Steps 4 and 5 add 8 characters to the address, 7 for CRC and 1 for Damm, and ensure this

This has various advantages:
- A unified address format for all current addresses and future features to come.
- Nicer looking than base58 - subjective, but base58 addresses tend to be intimidating for newcomers.
- Protected against errors via a CRC64.
- Protected against any single character change via Damm.
- As long as the number of versions is kept under 64, the first letter indicates the type of address, as it encodes the 5 most significant bits of the version.
- Addresses are compact enough: 42 digits for a 20B hash, 61 for 32B.
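
The lengths quoted in the last point can be reproduced with a small calculation. This is only a sketch: it assumes the version field is a single byte and that version, hash and CRC are packed into one continuous bit stream before encoding. Neither is stated explicitly above, but together they give exactly 42 and 61 characters. The function name and parameters are hypothetical.

```python
def address_length(hash_bytes, version_bits=8, min_crc_bits=35):
    """Return (characters after the prefix, CRC bits actually used).

    Assumptions: an 8-bit version field, and version + hash + CRC packed
    into one bit stream that is cut into 5-bit characters.
    """
    payload_bits = version_bits + 8 * hash_bytes
    # Round version + hash + minimum CRC up to the next multiple of 5 bits.
    total_bits = -(-(payload_bits + min_crc_bits) // 5) * 5
    crc_bits = total_bits - payload_bits
    return total_bits // 5 + 1, crc_bits  # +1 for the Damm check character


print(address_length(20))  # (42, 37)
print(address_length(32))  # (61, 36)
```

Under these assumptions the CRC ends up slightly longer than 35 bits, because it absorbs the padding needed to reach a letter boundary.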

Examples (I did not run CRC or Damm on them, I just filled them with random digits):
btc:yybul3unspd4bn2aji33natie5sd2iqsikrfgm8ejk (42 digits for a P2PKH, 20B hash).
btc:b1ee6ceodhmo2d0rmr7utt31u0163kdoakirgoigsd112d5batlt2a (61 digits for a P2SH, 32B hash).

EDIT: Change the quasigroup used for the Damm algorithm.
 

deadalnix

Note that we can save 1 digit by using 30+ bits of CRC instead of 35. The chance of a random error slipping through is then still at most about 1 in a billion (1 in 2^30), so it may be good enough, and it would make addresses 1 digit shorter.
 

adamstgbit

Well-Known Member
Mar 13, 2016
1,206
2,650
I really dislike "Nicer looking than base58"

"yybul3unspd4bn2aji33natie5sd2iqsikrfgm8ejk"
or
"1M3Sew6p6vnru62cPMefrLz51kT9DaRgoX"

one looks like a bitcoin address the other simply does not, both still look unreadable...
can you keep base58 encoding and still have all these other neat things you speak of?

i'm just saying bitcoin addresses have been around for a while and we can all recognize them quickly; changing this established look is undesirable IMO.
 

deadalnix

You are thinking about the 1M people who are used to this format, I'm talking about the 7B who aren't.

One could make it work with base58, but as 58 is not a power of 2, the spec would have to be significantly more complex.
 
What's the state of it?

Rusty on his blog:
I knew it was half-baked.
Is it by now more baked?

Can you elaborate what problems it precisely solves? What are the disadvantages of the old format?

Does it need a hardfork, or can it be added as an additional format?

If this BUIP makes my old addresses useless, I'm strongly against it. If not, if it is an additional format, I might support it, if you answer the questions above more precisely.
 

adamstgbit

> You are thinking about the 1M people who are used to this format, I'm talking about the 7B who aren't.
>
> One could make it work with base58, but as 58 is not a power of 2, the spec would have to be significantly more complex.

then invent a new base which looks more like bitcoin addresses.

yes, i'm thinking short term. this tiny change will have a monumental impact on people. the fact that it is accompanied by the segwit HF and EC really makes one feel that we killed bitcoin and created something new.
 

deadalnix

> Is it by now more baked?

If people are happy with it, this is baked enough. You tell me.

> Can you elaborate what problems it precisely solves? What are the disadvantages of the old format?

There are 2 old formats: P2PKH (addresses in 1xxxx) and P2SH (3xxxx). Other types of outputs are not representable using the usual addresses. If you play around with blockchain explorers, you'll notice that they can't provide addresses for various complex outputs.

Soft fork SegWit from Core doesn't have a way to represent its addresses right now. They want to add new address formats to handle this, but that is not done at this stage (see BIP142).

I propose to add one address format that can handle them all. This address format can provide an address for all BUIP037 outputs, including future extensions. It can also support existing addresses (P2SH and P2PKH) if we want it to.

> Does it need a hardfork, or can it be added as an additional format?

It is not a fork at all.

> If this BUIP makes my old addresses useless, I'm strongly against it. If not, if it is an additional format, I might support it, if you answer the questions above more precisely.

Addresses are only useful if you, me, and whoever else we use an address format with agree on its meaning. The two of us could start using this format today without anybody else in the network knowing about it. However, it is useful to have some generally agreed-upon meaning for addresses, so that different wallets, blockchain explorers, exchanges and so on can use the same format and provide a good experience to users.

Existing address formats each represent only one specific thing, so we are doomed to add new address formats to represent new things again and again, unless we have a more generic address format like this one.

You can use the old address format as long as you and whoever you are exchanging with (and the software you are using) accept it.
 

freetrader

Moderator
Staff member
Dec 16, 2015
2,806
6,088
I'm happy enough with the new address format - the only thing I don't fully understand is where the new 'version' field would come into play.

Still, I would consider this BUIP as-is a good candidate for a new address format for post-fork addresses on the BTCfork project, and would like to cobble together a rudimentary implementation of a library for this with some tests.

@deadalnix : if you have a preferred place (e.g. existing repo) where a collaborative development could take place on this feature, I'm open to suggestions. Otherwise I'd start a new repo for it under my GitHub user account.

Update: as per Slack discussion, working title: 'librustyaddress' :)
 

freetrader

If I understand Rusty (and this BUIP) correctly, the prefix is supposed to be ASCII.

This is where I would suggest a change to allow the prefix to be UTF-8 encoded Unicode separated from the rest by a colon.

Supporting reasons for this suggested change:
  • all valid ASCII sequences are valid UTF-8 encoded Unicode as well, so no loss there
  • Unicode support is better suited to global needs when trying to get across human-readable prefixes
  • human readable unicode representation can be more compact, which is important for small devices
  • would allow the Bitcoin symbol (code point U+20BF, expected to be part of the Unicode 10.0 standard in June 2017) to be used as an address prefix.
 

theZerg

Moderator
Staff member
Aug 28, 2015
1,012
2,327
As LukeJr also mentioned, how hard would it be to add BIP47 reusable payment codes in? This would be an extremely useful feature whereas having a new address format, with the few small described advantages seems to me like churn.
 
> If people are happy with it, this is baked enough. You tell me.
> It is not a fork at all.
>
> You can use the old address format as long as you and whoever you are exchanging with accept it (and the software you are using).

Ok, so the pub key and the scripts inside remain the same, it is just a way for wallets to view things?

If so, if this is just an option for wallets to represent things, I don't see a reason to be against it. If the market wants it, it should take it. But I'm still somewhat puzzled why anybody should use them. Will it make the handling of multisig and other scripts significantly easier? IMO the different address formats have not been the major obstacle here.

I agree with @theZerg that some additional strong feature like Payment codes would be nice.
 

deadalnix

Yes, this is just a way for wallets to represent things. The problem with the current way of doing things is that each address format is limited to one use case, so every new use case requires a new address format. That's why you saw the addition of addresses in 3xxx when P2SH was introduced. There is no way to express soft fork SegWit addresses, for instance. So my expectation is that current addresses will mostly remain in use for existing use cases, while this format can be used for any new feature.

I'm not super familiar with payment codes, so I'll dig into them and see what could be done here.
 

freetrader

It's kind of an opportune moment to raise a BUIP047 to match BIP47.

Maybe BUIP046 could be an HD wallet, they seem like a prerequisite?

Either way, if BIP47 requires HD, then it seems like a large implementation effort which might be better timed after an unbundling of the wallet. Unless someone contributes already-existing code that can be easily integrated.

I'm not convinced it's a good idea to inflate this BUIP with it, rather keep BUIPs focused and small.
 

deadalnix

> If I understand Rusty (and this BUIP) correctly, the prefix is supposed to be ASCII.
>
> This is where I would suggest a change to allow the prefix to be UTF-8 encoded Unicode separated from the rest by a colon.
>
> Supporting reasons for this suggested change:
>   • all valid ASCII sequences are valid UTF-8 encoded Unicode as well, so no loss there
>   • Unicode support is better suited to global needs when trying to get across human-readable prefixes
>   • human readable unicode representation can be more compact, which is important for small devices
>   • would allow the Bitcoin symbol (code point U+20BF, expected to be part of the Unicode 10.0 standard in June 2017) to be used as an address prefix.

As discussed offline, I don't think making the prefix UTF-8 is such a great idea. That would require checking that the UTF-8 encoding is canonical, and even then there are all kinds of ways to encode the same string, because code points aren't graphemes.

I'm worried that some of this could be exploited to trick users. I don't have a specific attack scenario in mind, but that wouldn't be the first time unicode and utf-8 are exploited for security. For this reason I'm not 100% convinced this is a good idea.
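
To make the "code points aren't graphemes" concern concrete, here is a small sketch using Python's standard `unicodedata` module. The letter "é" is an arbitrary example chosen for illustration; it is not a proposed prefix.

```python
import unicodedata

# Two ways to write what a user sees as the same character "é".
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

print(composed == decomposed)        # False
print(composed.encode("utf-8"))      # b'\xc3\xa9'
print(decomposed.encode("utf-8"))    # b'e\xcc\x81'

# Only after normalization do the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Both byte sequences are valid UTF-8, so a checksum computed over the prefix bytes would differ between two prefixes that look identical on screen, unless a normalization form were mandated.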
 
Do you know the Ethereum discussion about which address format to use?
http://ethereum.stackexchange.com/questions/267/why-dont-ethereum-addresses-have-checksums
Maybe this is interesting for this discussion:

Ethereum apps don't take the Bitcoin approach because there is an even more featureful way of representing raw Ethereum addresses, called the ICAP, which looks like this: "XE7338O073KYGTWWZN0F2WZ0R8PX5ZPPZS". Like the standard Bitcoin address representation, it uses a wider range of alphanumeric characters to save space and includes a checksum. But that's not all, folks!

For one thing, the ICAP is a fully valid International Bank Account Number (or IBAN). That means that existing bank software can understand it and interact with it.

For another, the ICAP doesn't have to use hexadecimal addresses. Instead, once we all do switch over to using namereg contracts it can just use your actual human readable string to end up with something like "XE81ETHXREGJEFFCOLEMAN", which still matches bank formats but might be possible to actually remember!
Some more questions (excuse me for not knowing too much):
- does the format you proposed have strong checksums? (similar question - is it possible to mistype it, like a public key?)
- how does it behave with some kind of "individualization", meaning building something "nice" and "readable" like 1BergmannXhkjsU72HsnlC (a fantasy address)? If the format made brute-forcing individualized addresses easier, it could be attractive.
 

freetrader

@Christoph Bergmann : yes, it has a strong checksum:

In error detection, the Damm algorithm is a check digit algorithm that detects all single-digit errors and all adjacent transposition errors.

See also: https://en.wikipedia.org/wiki/Damm_algorithm#Strengths_and_weaknesses
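
For illustration, here is a minimal sketch of a Damm-style check character over GF(2^5) with the polynomial x^5 + x^2 + 1. Several points are assumptions, since the BUIP text does not pin them down: each character is mapped to its z-base32 index, in "n = 2*x + y" x is taken to be the running state and y the next character, "2" is the field element x, and only the z-base32 characters are processed (how the prefix and the colon would map to field elements is not specified). All function names are hypothetical.

```python
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def gf32_double(v):
    """Multiply by 2 (the element x) in GF(2^5) modulo x^5 + x^2 + 1."""
    v <<= 1
    if v & 0x20:
        v ^= 0x25  # 0b100101 encodes x^5 + x^2 + 1
    return v


def damm_state(text):
    """Fold the characters through the quasigroup: state = 2*state + digit."""
    state = 0
    for ch in text:
        state = gf32_double(state) ^ ZBASE32.index(ch)
    return state


def damm_check_char(body):
    """Character that brings the final state back to zero."""
    return ZBASE32[gf32_double(damm_state(body))]


def damm_is_valid(body_with_check):
    return damm_state(body_with_check) == 0


body = "ybndrfg8ejk"
full = body + damm_check_char(body)
print(damm_is_valid(full))              # True
print(damm_is_valid("b" + full[1:]))    # False: first character changed
```

With this choice of operation, changing one character always changes the final state, and so does swapping two different adjacent characters, which matches the properties quoted above.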

Vanity addresses could still be constructed, they would just have to be manufactured using the address creation algorithm outlined here. That means that the alphabet is different due to the z-base32 encoding choice:

abcdefghijkmnopqrstuwxyz13456789

(letters are not in order above - I've sorted for human readability)
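
As a quick sanity check, the sorted list above can be derived from the z-base-32 alphabet in its encoding order, which also shows which characters a vanity string cannot contain. The helper `is_expressible` and the sample words are purely illustrative.

```python
# z-base-32 alphabet in encoding order (index 0 is 'y').
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"

letters = sorted(c for c in ZBASE32 if c.isalpha())
digits = sorted(c for c in ZBASE32 if c.isdigit())
sorted_alphabet = "".join(letters + digits)
print(sorted_alphabet)  # abcdefghijkmnopqrstuwxyz13456789

# Lowercase letters and digits that are NOT part of the alphabet.
excluded = sorted(set("abcdefghijklmnopqrstuvwxyz0123456789") - set(ZBASE32))
print(excluded)  # ['0', '2', 'l', 'v']


def is_expressible(word):
    """True if every character of `word` exists in the z-base32 alphabet."""
    return all(c in ZBASE32 for c in word)


print(is_expressible("bergmann"))  # True
print(is_expressible("hello"))     # False, 'l' is not in the alphabet
```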

Since the addresses described by this BUIP are less compact and also less computationally intensive to derive than current Bitcoin addresses, you should be able to get more "vanity" for your computational buck.
 

deadalnix

> - does the format you proposed have strong checksums? (similar question - is it possible to mistype it, like a public key?)

It has at least 35 bits of CRC, plus a Damm check digit.

> - how does it behave with some kind of "individualization", meaning building something "nice" and "readable" like 1BergmannXhkjsU72HsnlC (a fantasy address)? If the format made brute-forcing individualized addresses easier, it could be attractive.

You'd have to try to generate a bunch of addresses just like you do for current addresses. This doesn't change much here.
 

deadalnix

I ran the numbers: in the worst case, a random string has a 1 in about 1000B chance of being a valid address (1 in 1099511627776 exactly, which is 2^40), and the odds can be better depending on the type of address. For instance, it is 4 times that number for traditional P2SH/P2PKH addresses.

In addition, it is 100% guaranteed that an error on one character is detected because of Damm.
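
These figures can be reproduced with a short calculation. This is a sketch under assumptions that are not spelled out in the thread: the Damm character is counted as 5 further check bits on top of the CRC, the version field is one byte, and the CRC absorbs the padding needed to reach a 5-bit boundary. The function name is hypothetical.

```python
DAMM_BITS = 5  # one z-base32 character


def check_bits(hash_bytes, version_bits=8, min_crc_bits=35):
    """Total check bits (CRC + padding absorbed by the CRC + Damm)."""
    payload_bits = version_bits + 8 * hash_bytes
    padding = -(payload_bits + min_crc_bits) % 5
    return min_crc_bits + padding + DAMM_BITS


worst_case = 2 ** (35 + DAMM_BITS)
print(worst_case)                        # 1099511627776
print(2 ** check_bits(20) // worst_case)  # 4, for a 20-byte hash
```

The worst case corresponds to an address type where the minimum 35 CRC bits already land on a letter boundary, so no extra CRC bits are gained from padding.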
 

freetrader

Some technical notes arrived at after trying to implement what's described:

1. The 64-bit CRC algorithm needs to be specified exactly. 'crc64-ecma' can lead to various interpretations - there are at least two conflicting parameter sets, variously implemented in libraries, that lay claim to the 'ECMA' moniker. Wikipedia lists ECMA-182 as the underlying spec, and these implementations share the polynomial defined by ECMA-182, but some of these 'ECMA' implementations do not strictly conform to ECMA-182 (notably the Golang implementation at [1]). The other contender for the 'ECMA' designation is CRC-64/XZ, which uses the same polynomial but differs in the other parameters.
So it needs to be made precise which one is used. The Golang implementation conforms to the CRC-64/XZ parametrization as listed in [2].

2. It needs to be made precise how the bits of the computed 8-byte CRC are appended to the preceding data prior to zbase32 encoding. This makes a great difference to the resulting address of course, and currently is not fully specified. The most intuitive (in my view) would be to start appending from the most significant end of the CRC value (the high bits first). I don't think it really matters either way though, as long as there is no ambiguity.
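
To illustrate both notes, here is a bit-by-bit sketch of the two parameter sets, plus a helper that takes the CRC bits from the most significant end. The parameters follow the catalogue entries for CRC-64/ECMA-182 and CRC-64/XZ; the function names are hypothetical, and nothing here settles which variant or bit order the BUIP intends.

```python
MASK64 = (1 << 64) - 1
POLY = 0x42F0E1EBA9EA3693  # ECMA-182 polynomial, shared by both variants


def reflect64(value):
    """Reverse the bit order of a 64-bit value."""
    return int(f"{value:064b}"[::-1], 2)


def crc64_ecma182(data):
    """CRC-64/ECMA-182: init 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            crc = ((crc << 1) ^ POLY if crc >> 63 else crc << 1) & MASK64
    return crc


def crc64_xz(data):
    """CRC-64/XZ: init all ones, reflected input/output, final XOR all ones."""
    rpoly = reflect64(POLY)
    crc = MASK64
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ rpoly if crc & 1 else crc >> 1
    return crc ^ MASK64


def top_bits(crc, n):
    """Take the n most significant bits of a 64-bit CRC (high bits first)."""
    return crc >> (64 - n)


msg = b"123456789"
print(hex(crc64_ecma182(msg)))  # 0x6c40df5f0b497347
print(hex(crc64_xz(msg)))       # 0x995dc9bbdf1939fa
print(top_bits(crc64_xz(msg), 35))
```

The same input gives unrelated values under the two parameter sets, so two implementations that each claim 'crc64-ecma' would produce different addresses for the same hash.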

[1] https://golang.org/src/hash/crc64/crc64_test.go (w.r.t. the 'outECMA' test values)
[2] http://reveng.sourceforge.net/crc-catalogue/17plus.htm#crc.cat-bits.64