Below are several zero-width Unicode characters, placed between underscores. Can your browser display them? Your text editor? Your terminal? To understand why not being able to display them might be a problem, read Tracking via pasted text or about the Trojan Source vulnerability.
This page also contains unassigned code points and various control characters which certain programs render as zero-width even though they shouldn't.
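If you want to check pasted text for these characters programmatically, a simple first pass is to scan for code points in Unicode general category Cf (format). A minimal sketch in Go (hasInvisible is an illustrative name, and a thorough checker would also cover fillers like U+3164, which is category Lo):

```go
package main

import (
	"fmt"
	"unicode"
)

// hasInvisible reports whether s contains characters in Unicode
// category Cf (format), which covers most zero-width characters,
// e.g. U+200B ZERO WIDTH SPACE and the bidirectional controls
// abused by Trojan Source attacks.
func hasInvisible(s string) bool {
	for _, r := range s {
		if unicode.Is(unicode.Cf, r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasInvisible("plain text"))      // false
	fmt.Println(hasInvisible("zero\u200Bwidth")) // true
}
```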
As a point of reference, here are a few positive-width characters:
0020: _ _ | 00E9: _é_ | 03A9: _Ω_ | 5B57: _字_ | 1F407: _🐇_
00AD: __ soft hyphen
034F: _͏_ combining grapheme joiner
061C: __ Arabic letter mark
070F: __ Syriac abbreviation mark
115F: _ᅟ_ Hangul choseong filler
1160: _ᅠ_ Hangul jungseong filler
17B4: _឴_ Khmer vowel inherent aq
17B5: _឵_ Khmer vowel inherent aa
180B: _᠋_ Mongolian free variation selector one (FVS1)
180C: _᠌_ Mongolian free variation selector two (FVS2)
180D: _᠍_ Mongolian free variation selector three (FVS3)
180E: __ Mongolian vowel separator
180F: _᠏_ Mongolian free variation selector four (FVS4)
200B: __ zero width space
200C: __ zero width non-joiner
200D: __ zero width joiner
200E: __ left-to-right mark
200F: __ right-to-left mark
2028: _
_ line separator
2029: _
_ paragraph separator
202A: __ left-to-right embedding
202B: __ right-to-left embedding
202C: __ pop directional formatting
202D: __ left-to-right override
202E: __ right-to-left override
2060: __ word joiner
2061: __ function application
2062: __ invisible times
2063: __ invisible separator
2064: __ invisible plus
2065: __ unassigned
2066: __ left-to-right isolate
2067: __ right-to-left isolate
2068: __ first strong isolate
2069: __ pop directional isolate
206A: __ inhibit symmetric swapping (deprecated)
206B: __ activate symmetric swapping (deprecated)
206C: __ inhibit arabic form shaping (deprecated)
206D: __ activate arabic form shaping (deprecated)
206E: __ national digit shapes (deprecated)
206F: __ nominal digit shapes (deprecated)
3164: _ㅤ_ Hangul filler
FE00: _︀_ variation selector-1 (VS1)
FE01: _︁_ variation selector-2 (VS2)
FE02: _︂_ variation selector-3 (VS3)
FE03: _︃_ variation selector-4 (VS4)
FE04: _︄_ variation selector-5 (VS5)
FE05: _︅_ variation selector-6 (VS6)
FE06: _︆_ variation selector-7 (VS7)
FE07: _︇_ variation selector-8 (VS8)
FE08: _︈_ variation selector-9 (VS9)
FE09: _︉_ variation selector-10 (VS10)
FE0A: _︊_ variation selector-11 (VS11)
FE0B: _︋_ variation selector-12 (VS12)
FE0C: _︌_ variation selector-13 (VS13)
FE0D: _︍_ variation selector-14 (VS14)
FE0E: _︎_ variation selector-15 (VS15)
FE0F: _️_ variation selector-16 (VS16)
FEFF: __ zero width no-break space
FFA0: _ᅠ_ halfwidth Hangul filler
FFF0: __ unassigned
FFF1: __ unassigned
FFF2: __ unassigned
FFF3: __ unassigned
FFF4: __ unassigned
FFF5: __ unassigned
FFF6: __ unassigned
FFF7: __ unassigned
FFF8: __ unassigned
FFF9: __ interlinear annotation anchor
FFFA: __ interlinear annotation separator
FFFB: __ interlinear annotation terminator
FFFC: __ object replacement character
FFFE: __ (removed from Atom feed, as this character is not allowed in XML)
FFFF: __ (removed from Atom feed, as this character is not allowed in XML)
13430: __ Egyptian hieroglyph vertical joiner
13431: __ Egyptian hieroglyph horizontal joiner
13432: __ Egyptian hieroglyph insert at top start
13433: __ Egyptian hieroglyph insert at bottom start
13434: __ Egyptian hieroglyph insert at top end
13435: __ Egyptian hieroglyph insert at bottom end
13436: __ Egyptian hieroglyph overlay middle
13437: __ Egyptian hieroglyph begin segment
13438: __ Egyptian hieroglyph end segment
13439: __ Egyptian hieroglyph insert at middle
1343A: __ Egyptian hieroglyph insert at top
1343B: __ Egyptian hieroglyph insert at bottom
1343C: __ Egyptian hieroglyph begin enclosure
1343D: __ Egyptian hieroglyph end enclosure
1343E: __ Egyptian hieroglyph begin walled enclosure
1343F: __ Egyptian hieroglyph end walled enclosure
1BCA0: __ shorthand format letter overlap
1BCA1: __ shorthand format continuing overlap
1BCA2: __ shorthand format down step
1BCA3: __ shorthand format up step
1D159: _𝅙_ musical symbol null notehead
1D173: __ musical symbol begin beam
1D174: __ musical symbol end beam
1D175: __ musical symbol begin tie
1D176: __ musical symbol end tie
1D177: __ musical symbol begin slur
1D178: __ musical symbol end slur
1D179: __ musical symbol begin phrase
1D17A: __ musical symbol end phrase
E0000: __ unassigned
E0001: __ language tag (deprecated)
E0002: __ unassigned
... (E0002-E0019 unassigned)
E0019: __ unassigned
E0020: __ tag space
... (E0020-E007F formerly used for tagging texts by language)
E007F: __ cancel tag
E0080: __ unassigned
... (E0080-E00FF unassigned)
E00FF: __ unassigned
E0100: _󠄀_ variation selector 17
... (E0100-E01EF: variation selectors supplement)
E01EF: _󠇯_ variation selector 256
E01F0: __ unassigned
... (E01F0-E0FFF unassigned)
E0FFF: __ unassigned
EFFFD: __ unassigned
EFFFE: __ (removed from Atom feed, as this character is not allowed in XML)
EFFFF: __ (removed from Atom feed, as this character is not allowed in XML)
10FFFD: __ unassigned (private use)
10FFFE: __ (removed from Atom feed, as this character is not allowed in XML)
10FFFF: __ (removed from Atom feed, as this character is not allowed in XML)
Not included: samples from unassigned ranges *FF80-*FFFF, with the exception of EFF80-EFFFF, a sample of which is included.
00: __ (removed from Atom feed, as this character is not allowed in XML)
01: __ (removed from Atom feed, as this character is not allowed in XML)
02: __ (removed from Atom feed, as this character is not allowed in XML)
03: __ (removed from Atom feed, as this character is not allowed in XML)
04: __ (removed from Atom feed, as this character is not allowed in XML)
05: __ (removed from Atom feed, as this character is not allowed in XML)
06: __ (removed from Atom feed, as this character is not allowed in XML)
07: __ (removed from Atom feed, as this character is not allowed in XML)
08: __ (removed from Atom feed, as this character is not allowed in XML)
0B: __ (removed from Atom feed, as this character is not allowed in XML)
0C: __ (removed from Atom feed, as this character is not allowed in XML)
0E: __ (removed from Atom feed, as this character is not allowed in XML)
0F: __ (removed from Atom feed, as this character is not allowed in XML)
10: __ (removed from Atom feed, as this character is not allowed in XML)
11: __ (removed from Atom feed, as this character is not allowed in XML)
12: __ (removed from Atom feed, as this character is not allowed in XML)
13: __ (removed from Atom feed, as this character is not allowed in XML)
14: __ (removed from Atom feed, as this character is not allowed in XML)
15: __ (removed from Atom feed, as this character is not allowed in XML)
16: __ (removed from Atom feed, as this character is not allowed in XML)
17: __ (removed from Atom feed, as this character is not allowed in XML)
18: __ (removed from Atom feed, as this character is not allowed in XML)
19: __ (removed from Atom feed, as this character is not allowed in XML)
1A: __ (removed from Atom feed, as this character is not allowed in XML)
1B: __ (removed from Atom feed, as this character is not allowed in XML)
1C: __ (removed from Atom feed, as this character is not allowed in XML)
1D: __ (removed from Atom feed, as this character is not allowed in XML)
1E: __ (removed from Atom feed, as this character is not allowed in XML)
1F: __ (removed from Atom feed, as this character is not allowed in XML)
7F: __ (removed from Atom feed, as this character is not allowed in XML)
80: __ (removed from Atom feed, as this character is not allowed in XML)
81: __ (removed from Atom feed, as this character is not allowed in XML)
82: __ (removed from Atom feed, as this character is not allowed in XML)
83: __ (removed from Atom feed, as this character is not allowed in XML)
84: __ (removed from Atom feed, as this character is not allowed in XML)
85: __ (removed from Atom feed, as this character is not allowed in XML)
86: __ (removed from Atom feed, as this character is not allowed in XML)
87: __ (removed from Atom feed, as this character is not allowed in XML)
88: __ (removed from Atom feed, as this character is not allowed in XML)
89: __ (removed from Atom feed, as this character is not allowed in XML)
8A: __ (removed from Atom feed, as this character is not allowed in XML)
8B: __ (removed from Atom feed, as this character is not allowed in XML)
8C: __ (removed from Atom feed, as this character is not allowed in XML)
8D: __ (removed from Atom feed, as this character is not allowed in XML)
8E: __ (removed from Atom feed, as this character is not allowed in XML)
8F: __ (removed from Atom feed, as this character is not allowed in XML)
90: __ (removed from Atom feed, as this character is not allowed in XML)
91: __ (removed from Atom feed, as this character is not allowed in XML)
92: __ (removed from Atom feed, as this character is not allowed in XML)
93: __ (removed from Atom feed, as this character is not allowed in XML)
94: __ (removed from Atom feed, as this character is not allowed in XML)
95: __ (removed from Atom feed, as this character is not allowed in XML)
96: __ (removed from Atom feed, as this character is not allowed in XML)
97: __ (removed from Atom feed, as this character is not allowed in XML)
98: __ (removed from Atom feed, as this character is not allowed in XML)
99: __ (removed from Atom feed, as this character is not allowed in XML)
9A: __ (removed from Atom feed, as this character is not allowed in XML)
9B: __ (removed from Atom feed, as this character is not allowed in XML)
9C: __ (removed from Atom feed, as this character is not allowed in XML)
9D: __ (removed from Atom feed, as this character is not allowed in XML)
9E: __ (removed from Atom feed, as this character is not allowed in XML)
9F: __ (removed from Atom feed, as this character is not allowed in XML)
0020: _ _ space [for comparison]
205F: _ _ medium mathematical space (MMSP)
202F: _ _ narrow no-break space
2006: _ _ six-per-em space
2009: _ _ thin space
200A: _ _ hair space
For more details about each character, consult the Unicode Character Properties utility or the Unicode Character Name Index.
Since September 2023, Unicode is at version 15.1 and contains 149,813 characters.
Contact me if you know of any other characters or programs that should be listed here.
This test is also available as a plain .txt file.
Client Hello Mirror is a server which outputs your browser's TLS Client Hello message, emphasizing aspects of it that are detrimental to privacy:
It supports HTTPS and Gemini, is written in Go and is free/libre software. I'll explain why and how I wrote it.
A TLS connection starts with the client sending the server a "Hello" message that contains a set of supported capabilities and various parameters. This initial message presents two main privacy problems:
» Tracking Users across the Web via TLS Session Resumption (2018)
» tlsfingerprint.io
I think these issues deserve more attention than they receive.
Among the browser testing tools available online, I hoped to find a web service that presented the complete Client Hello message, but my search came up empty. So, because I wanted such a tool to exist (for both Gemini and the web) and because I wanted to draw more attention to TLS privacy issues, I wrote Client Hello Mirror. This was a bit of a challenge, since many of the values in the Client Hello message are not exposed by TLS libraries.
I chose Go for this project for a single reason: uTLS, a "fork of the Go standard TLS library, providing low-level access to the ClientHello for mimicry purposes", made by the folks behind tlsfingerprint.io. Go developers are spoiled with such libraries: see also JA3Transport and CycleTLS, both of which are based on uTLS.
This made Go look like a good language for messing around with TLS – and indeed it is.
The first hurdle was figuring out how to extract the Client Hello bytes from the TCP stream and return them in an HTTP or Gemini response. I found the answer in Filippo Valsorda's GoLab 2018 talk "Building a DIY proxy with the net package" – the io.MultiReader trick that he details was exactly what was needed.
» Building a DIY proxy with the net package
» (code and slides)
» peek function (Client Hello Mirror)
Once I had the raw Client Hello bytes, the next step was to decode them. I based the Client Hello parser on the clientHelloMsg.unmarshal function in Go's built-in TLS library. The Client Hello message breakdown in Michael Driscoll's "The Illustrated TLS 1.3 Connection" was helpful in further developing the parser, as was Wireshark.
» The Illustrated TLS 1.3 Connection
And so the first version of the tool came about, which returned the Client Hello message as JSON.
After staring at the JSON output for a while, I noticed that obfuscated_ticket_age values for pre-shared keys (used for session resumption in TLS 1.3) weren't obfuscated at all. No matter what client I used, when resuming a session, the number of milliseconds since my last connection was plainly embedded in the Client Hello message, exposed to on-path observers. That's because Go's TLS server was setting the ticket_age_add value to zero for all session tickets, so clients added zero to the ticket age, resulting in no obfuscation.
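For reference, RFC 8446 (section 4.2.11.1) defines the value the client sends as the ticket age in milliseconds plus ticket_age_add, modulo 2^32. A toy illustration of why a zero ticket_age_add leaks the real age (the function name is mine):

```go
package main

import "fmt"

// obfuscatedTicketAge computes the value a TLS 1.3 client sends when
// resuming a session: (ticket age in ms + ticket_age_add) mod 2^32.
// uint32 addition in Go wraps, which gives us the mod 2^32 for free.
// If the server sets ticket_age_add to zero, the true age goes over
// the wire in the clear.
func obfuscatedTicketAge(ageMs, ticketAgeAdd uint32) uint32 {
	return ageMs + ticketAgeAdd
}

func main() {
	const age = 52_340 // ~52 seconds since the ticket was issued
	fmt.Println(obfuscatedTicketAge(age, 0))          // leaks the real age
	fmt.Println(obfuscatedTicketAge(age, 0x9f3c11aa)) // looks unrelated to the age
}
```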
I reported this on May 10, 2022, as Go 1.19 was nearing release. Go's security team assigned the issue a CVE ID and backported the fix to Go 1.17 and 1.18 as well:
Non-random values for ticket_age_add in session tickets in crypto/tls before Go 1.17.11 and Go 1.18.3 allow an attacker that can observe TLS handshakes to correlate successive connections by comparing ticket ages during session resumption.
Part of making this server was figuring out how to properly drop root privileges in Go and how to correctly set timeouts on TCP connections. Tackling these issues is not as straightforward as it may appear. I assisted Solderpunk in dealing with them for Molly Brown as well.
» Golang: dropping privileges – my Stack Overflow answer
» Molly Brown: drop privileges
» Molly Brown: timeouts
The timeouts thread goes into tedious subtleties regarding what really happens when you call Close() on a TCP connection. It turns out that, by default, the kernel doesn't close the connection until its write buffer is emptied. The write buffer can be quite large and connections can be quite slow, so this can take a very long time – hours or days after you call Close(). So if you're looking to make it harder for "slow loris" attacks to exhaust socket descriptors, don't rely on timeouts/deadlines without also calling SetLinger(0) on the TCP connection before closing it.
About one year after starting this project, I came across a blog post by Andrew Ayer titled "Parsing a TLS Client Hello with Go's cryptobyte Package". It turns out that he wrote a very similar server at about the same time as me:
» tlshello.agwa.name
» Parsing a TLS Client Hello with Go's cryptobyte Package
» github.com/AGWA/tlshacks
Internally, his approach is very different. For one thing, he wrote the code for parsing the Client Hello message from scratch, whereas I extended the parser in Go's TLS library. For another thing, he managed to expose the full Client Hello message to a standard Go HTTP listener, which is something I had failed to figure out, leading me to do HTTP "by hand".
Not using a proper HTTP library may sound like asking for trouble on the request parsing side, but my code only looks at the first line of the request. It's so trivial that I dare say it is secure, as it only deals with the minimal subset of HTTP required for this to work (no request headers, no methods other than GET and HEAD, no HTTP/2…) and it doesn't serve files. Still, I would have preferred to use Go's HTTP library instead, because that would have made my code more useful to other developers. If you need to use the Client Hello message in an HTTP response, you're probably better off using Andrew Ayer's method.
What I took from his implementation was the idea of extracting TLS parameter and extension information from CSV files published by IANA. That's how the /json/v2 endpoint was born; it expands many numeric identifiers (of TLS versions, cipher suites, etc) into JSON objects containing a bit more information. This information is also used when generating the front page.
A subtle point about tlshello.agwa.name is that it doesn't use session resumption, so clients will never send it a pre_shared_key extension or a session_ticket value.
I wanted to highlight TLS fingerprinting, so I included the popular JA3 fingerprint in the output. However, Chromium developers recently decided to randomize the ordering of extensions on each TLS handshake, as a counter to protocol ossification. This makes Chromium's JA3 fingerprint change on every connection, which prompted me to make a variant of JA3 that remains the same when extensions are shuffled. So I took JA3, sorted the extension codes and called the new fingerprint Normalized JA3 (NJA3).
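The core idea can be sketched as follows (fingerprint and join are illustrative names, and this is far simpler than the real NJA3, which also ignores SNI and padding, adds code groups, and so on): build a JA3-style string from the handshake parameters, but sort the extension codes first so shuffling them doesn't change the hash.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// fingerprint builds a JA3-style fingerprint; sorting the extension
// codes makes it invariant under Chromium's per-connection shuffling.
func fingerprint(version uint16, ciphers, extensions, curves, pointFormats []uint16) string {
	sorted := append([]uint16(nil), extensions...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	s := strings.Join([]string{
		strconv.Itoa(int(version)),
		join(ciphers), join(sorted), join(curves), join(pointFormats),
	}, ",")
	return fmt.Sprintf("%x", md5.Sum([]byte(s)))
}

func join(vals []uint16) string {
	parts := make([]string, len(vals))
	for i, v := range vals {
		parts[i] = strconv.Itoa(int(v))
	}
	return strings.Join(parts, "-")
}

func main() {
	a := fingerprint(771, []uint16{4865, 4866}, []uint16{0, 23, 65281}, []uint16{29, 23}, []uint16{0})
	b := fingerprint(771, []uint16{4865, 4866}, []uint16{65281, 0, 23}, []uint16{29, 23}, []uint16{0})
	fmt.Println(a == b) // true: same fingerprint despite shuffled extensions
}
```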
A few days later, I came across a presentation by Troy Kent titled "(JA) 3 Reasons to Rethink Your Encrypted Traffic Analysis Strategies", which made a number of insightful suggestions, some of which I implemented. One of them was to ignore SNI, padding and other extensions that clients don't necessarily send on every connection. I also added five extra code groups and made a couple of changes inspired by mercury's Network Protocol Fingerprinting (NPF) specification. These modifications made NJA3 more precise and robust. It's more than "Normalized JA3" at this point.
» NJA3 documentation
» A first look at Chrome's TLS ClientHello permutation in the wild
» "(JA) 3 Reasons to Rethink Your Encrypted Traffic Analysis Strategies"
» Network Protocol Fingerprinting (NPF) specification
In Firefox, Certificate Transparency checks can be enabled by setting security.pki.certificate_transparency.mode = 1 in about:config, but doing so makes your TLS fingerprint stand out. Mozilla should enable this by default.
» client.tlsfingerprint.io
» browserleaks.com/tls
Some of the features that I wished for didn't make it in. I would have liked the server to support early data / 0-RTT session resumption, as well as the legacy sessionID-based resumption method, but Go's crypto/tls library does not support them.
Also, I would have liked the server to detect clients' susceptibility to session prolongation attacks (see section 3.1 of the paper linked below). That, however, would require substantially more effort than it's probably worth. What's important is to know that even though the maximum lifetime of TLS 1.3 pre-shared keys is 7 days, a server can use them to track visitors over a much longer period, by just issuing a new one on each connection. This allows for tracking users indefinitely, as long as they connect at least once a week. This can be solved by clients sticking to the expiry date of the initial pre-shared key, but I doubt that any TLS libraries do this. As for other resumption methods, TLS session tickets and session IDs have a shorter maximum lifetime, but otherwise have the same problem.
» Tracking Users across the Web via TLS Session Resumption (2018)
TLS token binding (RFCs 8471, 8472 and 8473, formerly Channel ID) looks like it can be as bad for privacy as session resumption, but Chromium removed support for it in 2018. Edge might still support it, though. Token binding appears to be on its way out, but if it sticks around, Client Hello Mirror will probably highlight it at some point.
This concludes my exploration of TLS privacy issues, at least for now. On a similar note, I'm also interested in figuring out how feasible it is nowadays to determine device clock skews via the TCP timestamps option, and to what extent they can be used for device fingerprinting. But I'll leave that for another time.
» Remote physical device fingerprinting (2005)
E-mail privacy is notoriously bad:
What bothers me most is that IMAP and webmail have led to billions of people and organizations accumulating years and years' worth of private messages on mail servers. This has increased their risk of data compromise to the point of near-certainty. Webmail in particular has made people dependent on their online accounts for accessing their e-mail archives. And providers make zero effort to persuade people to move their data offline, even though this could be done with a couple of clicks in the webmail interface.
Given this state of affairs, it's worth figuring out how to minimize our information exposure. On that note, here are a few features of my e-mail setup:
These are desiderata I have irrespective of which e-mail service I'm using. Filtering, storing and synchronizing e-mail across multiple devices should be the client's responsibility, while the server should know and do as little as possible (namely: receive messages, encrypt them to the user's public key – ideally – and store them until fetched).
Hardly anyone uses POP anymore, but returning to POP's download-and-delete approach would, in principle, be appropriate for most use cases, as software for end-to-end encrypted file synchronization can supplant IMAP's sync functionality – and could also cover e-mail filters and various client settings. Current mail clients don't make this approach easy in practice, but it's worth striving towards (by client developers in particular), as it's fundamentally more secure and private than the dominant server-centric paradigm.
When using POP, there are a few things to keep in mind. Most importantly: POP3 only downloads messages from the Inbox folder, so any server-side filters for putting specific messages in specific folders will have to be moved client-side. Filters and folder names can contain sensitive information, so moving them off the server is a good idea anyway. Client-side filters can either be configured in the e-mail client itself, or in a separate program like fdm.
However, there is one filter you might want to add server-side: "save all mail to inbox" (including spam). Otherwise your client will not download messages marked as spam and you won't get a chance to see the wrongly-flagged ones. Spam filtering can be configured client-side as well, either by making use of headers added by the server's spam checker, or by running a spam checker yourself. Thunderbird, for example, comes with a built-in junk classifier, and Claws Mail can integrate with SpamAssassin and Bogofilter via plugins.
Another issue is that e-mail service providers sometimes don't allow POP3 to work as it's supposed to. For example, when using Microsoft's e-mail service (Outlook/Hotmail), you need to explicitly allow the POP3 client to delete messages (Settings -> Mail -> Sync email -> POP options), otherwise they'll just be moved to another folder. In GMail, POP3 has to be explicitly enabled:
» Read Gmail messages on other email clients using POP
Some privacy-focused providers like Proton Mail and Tutanota require special client software and do not support POP3. There is bridging software available for using Proton Mail with regular IMAP and SMTP, but there's currently no such software for Tutanota.
» hydroxide, third-party Proton Mail bridge
» official Proton Mail bridge
IMAP can be used in a POP-like fashion as well, without being limited to the Inbox folder, but most clients don't support this mode of operation. I've only seen this feature in getmail, which has several delete_* options to control the removal of messages from the server after they're fetched. Deletion does not have to be immediate; getmail can be set to remove messages a specific number of days after they are downloaded, which means that you can still access recent messages on other computers via webmail or IMAP, and if the server gets compromised, the attacker won't get most of your mail.
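For illustration, a getmailrc along these lines might look as follows (the server, account and mailbox names are placeholders; delete_after takes a number of days):

```ini
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.example.com
username = alice
# fetch more than just the Inbox
mailboxes = ("INBOX", "Lists")

[destination]
type = Maildir
path = ~/Mail/

[options]
# keep messages on the server for 30 days after retrieval,
# then remove them
delete_after = 30
```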
When moving your e-mail archive offline, you'll need to use IMAP in order to preserve the folder structure. Beyond that, however, bear in mind that POP3 is much simpler, has a smaller attack surface, and works in a fetch-and-delete manner with every e-mail client that supports it.
It's official: STARTTLS is to be avoided. Appendix A of RFC 8314 details why, after a long period in which implicit TLS for e-mail protocols was deemed deprecated, it is now the recommended option. In their 2021 Usenix paper on the topic, Damian Poddebniak et al. provided a jaw-dropping illustration of how prone STARTTLS is to implementation flaws.
In short, use implicit TLS: POP3 over port 995, IMAP over port 993 and SMTP submission over port 465, rather than STARTTLS on the traditional plaintext ports.
For synchronizing e-mail messages (and any other files) between devices in real time, there are several privacy-preserving solutions. Proprietary programs aside, the easiest option might be Nextcloud with end-to-end encryption enabled. You don't need to self-host: there are many Nextcloud providers, some of which offer a few GB of storage at no cost.
Then there's Syncthing, an excellent program for peer-to-peer sync:
The problem with peer-to-peer sync is that the devices you're synchronizing need to be online at the same time. You can work around this by also synchronizing to an "always-online" device (a server, a phone, etc). If you don't fully trust this device, you can hide your files from it via some form of transparent file encryption. Syncthing supports this via its "untrusted devices" feature, which is still in beta phase, but has been around for a while and reportedly works well. I used eCryptfs instead; it is Linux-only, but there are cross-platform solutions as well, such as EncFS, securefs, CryFS and Cryptomator.
My setup involved a phone and two computers synchronizing an encrypted folder via Syncthing; the folder was opaque to my phone, but the computers mounted it automatically on login, and synchronization was seamless. I took these notes when setting up eCryptfs:
# basic setup of ~/Private/ (and ~/.Private/), auto-mounted on login:
sudo apt install ecryptfs-utils
sudo modprobe ecryptfs
ecryptfs-setup-private
# setup of ~/Private/ (and ~/.Private/) WITHOUT auto-mount on login:
ecryptfs-setup-private --nopwcheck --noautomount --wrapping
ecryptfs-mount-private
ecryptfs-umount-private
# to change the passphrase:
ecryptfs-rewrap-passphrase ~/.ecryptfs/wrapped-passphrase
At the moment I don't need real-time synchronization between devices. I rely on my backup if I have to move to a different machine.
When it comes to backup, I've used Borg and Duplicati and have heard good things about restic. All three support encryption, so backups can be safely stored on other people's machines. Duplicati is the most beginner-friendly of the three: it has a graphical interface and works on Windows.
Borg is easy to set up and has great documentation. This is how I use it to back up important files to my Raspberry Pi via SSH:
export BORG_REPO=myraspberrypi:/home/backup/mystuff
BORG_PASSPHRASE="$(pass show bkp)"
export BORG_PASSPHRASE
borg create --progress --stats ::"{now:%Y-%m-%d}" ~/Documents ~/Mail
borg prune --verbose --list --keep-daily=7 --keep-weekly=4
The Pi then rsyncs the backup to an off-site server.
Vdirsyncer and DecSync CC can be used for synchronizing calendars and contacts as files. Alternatively, EteSync can be used for end-to-end encrypted and history-preserving synchronization of contacts and calendars.
Vdirsyncer syncs between a CardDAV/CalDAV server and the local filesystem; it runs on GNU/Linux and macOS. DecSync CC is an Android contacts & calendar provider which uses a local directory instead of a server; the directory can be synchronized using a separate program (see above). EteSync is a cross-platform server-based solution with end-to-end encryption, which also saves the history of changes made to your calendar and contacts.
I, for one, back up my contacts from my phone to my laptop manually. As for the calendar, I only access it on my laptop. Real-time synchronization is overkill for my needs.
Here are a few examples of client-added e-mail headers and what information they expose: Date includes your timezone, Content-Language hints at your locale, User-Agent (or X-Mailer) identifies your e-mail client and its version, and Message-ID can embed the sending machine's hostname.
The order of headers can also be used for fingerprinting, as well as differences in how multi-line headers are indented. There may also be other subtle differences between how the various e-mail clients generate headers.
My current e-mail setup consists of mutt, mpop, msmtp and a small bespoke filtering script that took the place of fdm. When I need IMAP, I use getmail.
I configured mpop and msmtp to retrieve mail account passwords from pass. You could also use a variant of pass called passage that uses age instead of GPG.
A more approachable solution for non-techies is to use Thunderbird. The procedure for moving e-mail offline is simple: add your account via IMAP and then drag each folder to the Local Folders section. After the transfer completes, double-check that the number of messages in your local folders is the same as on the IMAP side, then delete all IMAP folder contents. Messages will be stored in your Thunderbird profile directory, which you can then back up:
» How to back up your Thunderbird profile directory
Before adding any accounts to Thunderbird, you may want to set the following in Settings -> General -> Config Editor (at the bottom):
# check all folders for new messages (including spam and trash)
mail.server.default.check_all_folders_for_new = true
# change default sort order (new messages first)
mailnews.default_sort_order = 2
mailnews.default_news_sort_order = 2
Thunderbird does not have good privacy protections by default, but it can be configured to get you most of the way there. At a minimum, I would set the following:
# hide timezone
mail.sanitize_date_header = true
# don't add Content-Language header
mail.suppress_content_language = true
# don't add User-Agent header
general.useragent.override = ""
You can also go to "Settings -> General -> Connection" and configure Thunderbird to proxy traffic through Tor. That's the equivalent of:
network.proxy.socks = 127.0.0.1
network.proxy.socks_port = 9050
network.proxy.socks_remote_dns = true
For more in-depth Thunderbird hardening, HorlogeSkynet's user.js file is worth looking into:
This is not to say I recommend it in its entirety. For instance, it sets privacy.resistFingerprinting = true, which paradoxically makes Thunderbird more fingerprintable by adding a Firefox-specific User-Agent header to all outgoing e-mail. This setting was made for Firefox and does not work well with Thunderbird.
See also:
» How do I disable telemetry in Thunderbird?
The e-mail address is an identifier that links one's online accounts. I generally don't want them linked, so most often I don't reveal my real e-mail address to services I sign up to.
When I need a throwaway address for an unimportant account, I reach for anonbox, Guerrilla Mail or (for longer-term addresses) danwin1210.de and Fedora Email (not affiliated with the distro). I use SimpleLogin when I want messages to be forwarded to one of my real addresses. In cases where I don't want messages going through a third-party e-mail forwarder, I fall back to catch-all e-mail on my own domain.
If you start receiving spam, using a unique address for each service allows you to identify which one leaked your address to the spammers. It also enables you to block addresses that are receiving unwanted messages.
Messages sent by companies very often include links with tracking parameters. Often these parameters can be stripped without breaking the link, which is something that mail clients can do automatically, to some extent. For instance, Proton Mail has this feature:
» Proton Mail tracking links protection
Thunderbird inherits a similar feature from Firefox:
» Firefox Query Parameter Stripping
privacy.query_stripping.enabled = true
privacy.query_stripping.enabled.pbmode = true
Proton Mail also prompts users when they click links, displaying the links in full and asking for confirmation before opening them. This is meant to protect against phishing and could be enhanced by presenting the link in a way that's easier to read, highlighting the hostname.
The more mail clients incorporate such features, the better.
Sometimes institutions and companies request sensitive information via e-mail. You may be asked to attach a photo of your ID, for example. Consider that, no matter what their privacy policies say, the number of organizations with good e-mail hygiene is close to zero. Messages remain on their servers (probably owned and managed by third parties) for who knows how long, increasing the odds that your data will be compromised. So, if possible, send an ephemeral link to the document instead. Either link to the file on a server you control or use a service which supports in-browser encryption. I have used:
Some employees are trained never to click links in e-mail, so this may not always work, but it's worth trying. Also consider that it may be more private to present the information in person or send it via postal mail. Paper is better for privacy; digital copies tend to multiply.
It's strange to me that so many e-mail privacy guides jump straight to PGP when there is much lower-hanging fruit that would benefit most people's privacy more. Still, end-to-end encryption (via PGP, S/MIME, codecrypt, etc.) is important in some contexts and should be used when appropriate.
There have been developments on the PGP UX front. For one thing, Proton Mail made PGP work such that users don't need to think about it or even know what it is. Proton Mail supports PGP encryption to external servers as well, with key discovery done automatically via WKD. The Pretty Easy Privacy project has also made noteworthy advances on the PGP UX front, as has Mailvelope. Researchers may continue to publish papers about how Johnny can't encrypt, but the tools have clearly gotten better and can get better still.
Modern keyservers like keys.openpgp.org send verification e-mails to check if the key owner really controls the e-mail address(es) associated with the key. Going further, projects like Keyoxide and Keybase offer proofs of online identity by associating public accounts and website(s) with PGP keys in a verifiable way. This helps greatly when assessing whether a particular key belongs to who it's supposed to.
The Memory Hole project was a standardization effort that addressed PGP encryption and/or signing of e-mail headers. The proposed standard was implemented by programs such as Enigmail and Mailpile. The most recent push in this regard is the IETF draft "Protected Headers for Cryptographic E-mail", the last revision of which was published in December 2019. Mutt, for example, has implemented coverage for the Subject header (see the crypt_protected_headers_read option).
There is also a standard called Autocrypt for negotiating end-to-end encryption between two users of e-mail:
Lastly, there is software for e-mail servers which automatically encrypts incoming mail to the recipient's PGP key. This is not end-to-end encryption, but it's noteworthy nonetheless.
Further reading/viewing:
What I'm ultimately hoping for is the development of a newbie-friendly private-by-default e-mail client which nails it on every privacy front. One such effort worth keeping an eye on is Mailpile, which is currently being overhauled with help from NLnet.
» Mailpile 2 – "A Mail Client in Six Steps"
I'd also like to see better integration between server and client, such that the client is able to:
smtp_tls_policy_maps);
That's about it.
P.S. » Use plain text email.
Thanks to sl1200 for letting me know about Syncthing's "untrusted devices" feature and reminding me of Keyoxide.
» sl1200
In the first part of this analysis (October 2021), I detailed a number of security and privacy problems with the /e/OS app installer. To recap:
"Apps", the /e/ app installer, downloads applications from CleanAPK.org, an intermediary which provides apps that originate from F-Droid and elsewhere.
Since apps are not downloaded directly from F-Droid or Google Play, the installer takes certain measures to protect against tampering. Unfortunately, these measures can be bypassed in the majority of cases. This means that CleanAPK.org (or whoever compromises it) can get maliciously modified apps installed on /e/ users' devices, either when the user is installing a new app or during the update process.
Such an attack can be targeted at specific users, based on device information which the installer reveals to the CleanAPK server every time it checks for updates, namely: the list of installed apps, device model, build ID, Android version and installed languages.
Since then, things have changed somewhat. May 2022 brought the launch of /e/OS v1.0, as well as App Lounge, a rewrite of the installer which downloads applications directly from Google Play.
I expected this release to solve all the security vulnerabilities detailed in the previous article, but it wasn't until /e/OS v1.3 (App Lounge v2.3.4) that they were dealt with. The first few versions of App Lounge had F-Droid signature verification removed, which made the situation worse than before: CleanAPK could have gotten the user to install a malicious version of any app just by including it in an API response (and still can, for those who haven't updated).
While understated by the developers, solving this problem was likely the most important security enhancement that the app installer ever received, so be sure to update to /e/OS 1.3 or later (App Lounge currently does not have a self-update mechanism; it is only updated along with the entire operating system).
So progress has been made. I have not tested the patch, but it looks like, four years after the initial release, the app installer finally provides adequate protection against application spoofing. However, this only applies to native apps. PWAs are a separate discussion.
Alongside native apps from Google Play and F-Droid, the app installer also provides progressive web apps (PWAs): web pages which can be installed alongside native applications. CleanAPK provides a list of links to PWAs from all over the web. When you install a PWA, App Lounge does nothing to verify that the link pointing to it is correct. CleanAPK can easily change any of these links and serve spoofed PWAs.
Responding to this observation, one of the developers mentioned that /e/ will be looking into W3C MiniApps, which can be packaged and verified. I don't know what their plans for MiniApps are, but as long as App Lounge provides PWAs, something ought to be done to secure their delivery.
My suggestion to /e/ devs is to make App Lounge update its PWA index by fetching a verified, offline-signed list from an /e/ server. Note, however, that this would only protect the list of links. It would not help against PWA servers themselves being compromised, which is something that web apps in general are not protected against. Web apps can be signed, but hardly ever are, because browsers have no built-in way to verify such signatures.
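As a minimal sketch of the verified-list idea – using a pinned SHA-256 digest in place of a full offline signature scheme, with hypothetical names throughout:

```python
import hashlib
import json

def verify_pwa_index(raw_index: bytes, expected_sha256: str) -> list:
    """Reject the PWA index unless it matches the digest shipped with the app."""
    if hashlib.sha256(raw_index).hexdigest() != expected_sha256:
        raise ValueError("PWA index failed integrity check")
    return json.loads(raw_index)

# Hypothetical index as it might come off the wire:
index = json.dumps([{"name": "Example PWA", "url": "https://example.com/app"}]).encode()
pinned = hashlib.sha256(index).hexdigest()  # would ship inside App Lounge

print(verify_pwa_index(index, pinned)[0]["url"])  # https://example.com/app
```

A real deployment would verify an Ed25519 or similar signature against a public key baked into App Lounge, so the list can be updated without shipping a new APK; the pinned-hash version above only conveys the shape of the check.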
I also suggest more prominently presenting the PWA's URL within App Lounge, so that users have a better chance to spot at least obvious alterations, such as "uber.com" being changed to "evil.xxx". App Lounge currently displays each PWA's domain name under the "package name" field at the bottom of the app details page, which is easy to miss.
A quirk of CleanAPK is that all PWA links it provides end with the redundant parameters ?force=pwa&source=mlpwa. The source parameter looks like it may be a referrer signal, but a developer told me that this is neither intended nor required and asked me to open a ticket for it. Here is that ticket: https://gitlab.e.foundation/e/backlog/-/issues/5767. I opened it three months ago; no change yet.
Side note: around March 2022, /e/ devs were planning to get PWAs from pwastore.com rather than CleanAPK, but they seem to have dropped that idea.
App Lounge inherits the previous installer's privacy problems and adds a few more. Namely, it sends Google the list of apps you have installed and the following device properties:
Build.BOOTLOADER
Build.BRAND
Build.DEVICE
Build.FINGERPRINT
Build.HARDWARE
Build.ID
Build.MANUFACTURER
Build.MODEL
Build.PRODUCT
Build.RADIO
Build.VERSION.RELEASE
Build.VERSION.SDK_INT
Features
GL.Extensions
GL.Version
GSF.version
HasFiveWayNavigation
HasHardKeyboard
Keyboard
Locales
Navigation
Platforms
Screen.Density
Screen.Height
Screen.Width
ScreenLayout
SharedLibraries
TouchScreen
UserReadableName
Vending.versionString
Vending.version
This info is sent regardless of whether the user logs in with a Google account. An example of the values these properties can take is available here. The corresponding source code can be found here.
It appears that sending this data is at least partly necessary in order to get apps from Google. Aurora Store behaves similarly, although its privacy policy does not go into much detail (I opened an issue about this). However, unlike App Lounge, Aurora Store provides a way to spoof the device information it sends to Google.
I want to stress that the only way to get apps from Google Play without having to trust a third party is to download them straight from Google, which implies disclosing (at least some of) this data. I don't fault /e/ for using this method to gain access to Google Play apps. However, the user should be made aware of its privacy implications and be allowed to opt out before any connection is made to Google.
UPDATE 2023-02-15: Starting with /e/OS 1.8, App Lounge has a "no-Google mode", allowing users to opt out.
As for HTTP headers, the User-Agent header is no longer as revealing as it was, but this doesn't help against Google, because the device details listed above are a large superset of the info that used to be exposed in the User-Agent string. The list of languages installed on your system is still exposed via the Accept-Language header, which is apparently required in order to get applications in the correct language.
Data sent to Google is only documented on an obscure page on the e.foundation website, which is better than nothing – in today's world it even stands out as going above and beyond. But privacy-related information should be presented clearly to all users before any app sends any data to any server. Does App Lounge contain an intro screen telling people what data goes where? No. The intro screen contains a block of legalese that does not mention any of this.
In conclusion, App Lounge sends Google more than enough information to identify most devices, even in its so-called "anonymous mode". /e/ developers should look into spoofing as much of this info as possible. Regarding the list of installed applications, one way to hide it from Google might be for App Lounge to imitate Parcimonie and do update checks for each app separately, through a proxy (Tor, if possible), at random intervals.
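A Parcimonie-style scheduler could look something like this minimal sketch (the function name and default intervals are made up for illustration):

```python
import random

def schedule_update_checks(packages, min_delay=3600, max_delay=86400, rng=None):
    """Give each installed app an independent, random check time, so the full
    app list is never revealed to the server in a single burst of requests."""
    rng = rng or random.Random()
    return {pkg: rng.uniform(min_delay, max_delay) for pkg in packages}

for pkg, delay in sorted(schedule_update_checks(
        ["org.mozilla.fenix", "org.schabi.newpipe"]).items()):
    print(f"check {pkg} in {delay:.0f}s (ideally via its own Tor circuit)")
```

The point is that Google only ever sees one package name per request, from what looks like an unrelated client each time, rather than the whole list at once.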
When opening App Lounge, the user is greeted with a list of mostly proprietary applications with attractive images, but full of trackers. App Lounge fetches this list from Google Play. Allowing Google to determine what gets promoted in /e/'s privacy-focused app store is obviously problematic, as it steers users in the wrong direction. A privacy-focused system encouraging users to give up their privacy.
The old app installer has pretty much the same problem, except that it gets its app list from CleanAPK, which makes different recommendations based on unknown criteria.
In October 2021, I wrote a forum post and a ticket suggesting that the app installer's homepage ought to contain entirely privacy-respecting free/libre software. Users could still install whatever they wanted, but they would have to look for the junk, not have it pushed on them.
The issue was largely ignored until July 2022, when I prodded the devs about it over e-mail, which sparked a semi-private discussion on the ticket thread (I can only see it while logged in). In short, they think not having popular crap like Facebook one tap away is bad UX, too inconvenient for most people. So they intend to leave these apps on the homepage, but are considering a few approaches to improve the situation, such as:
Their approach could end up being better than what I initially suggested. Having mainstream apps on the front page can be an opportunity to educate users: App Lounge could strongly highlight the problematic apps as threats to privacy and discourage their use, promoting good alternatives where available. Will /e/ devs do this? It remains to be seen.
The privacy score is already a step in this direction, but I would argue it is insufficient. It is also objectively unreliable. To give an example I used previously, Tor Browser has a lower privacy score than Facebook Lite, even though Tor Browser does not do any tracking. Another example: if an app only requests access to contacts and to the internet, it will get a very good privacy score even if all it does is send your contact list to a data broker. An automatically-generated privacy score cannot be expected to be accurate in all cases, and applying human review to Google Play's entire app catalog is out of the question for /e/. Still, they could perhaps apply human review to only the most popular apps. They could also make use of F-Droid's tracking flag, which is arrived at by humans aided by various tools.
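A toy version of such a score makes the failure mode obvious. The scoring rule below is invented for illustration; it is not CleanAPK's actual formula:

```python
# Toy permission-count "privacy score": 10 is best, minus one point per
# risky permission requested. Purely illustrative.
RISKY = {"CAMERA", "LOCATION", "CONTACTS", "MICROPHONE", "SMS", "STORAGE"}

def naive_score(permissions: set) -> int:
    return max(0, 10 - len(permissions & RISKY))

# An app that exists solely to upload your contacts scores nearly perfect...
print(naive_score({"CONTACTS", "INTERNET"}))  # 9

# ...while a tracker-free app that legitimately needs a few permissions
# scores worse, despite being far more privacy-respecting.
print(naive_score({"CAMERA", "LOCATION", "STORAGE", "INTERNET"}))  # 7
```

Any metric derived from declared permissions or static analysis alone will rank apps by what they *could* do, not by what they *actually* do with your data – which is exactly the Tor Browser vs Facebook Lite inversion.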
The /e/ team posted their current method for calculating the privacy score in a recent developer blog post. There they acknowledge its limitations and claim that they are looking into ways to improve its accuracy.
Aude M. mentioned requesting a study to determine the best solution for the homepage problem, but the link to the study is dead and no information has been provided on what it entails. All in all, I am disappointed by the lack of transparency regarding this decision. I don't see why they didn't keep the discussion public.
A while ago I did a deep dive into potential F-Droid info leaks via TLS and HTTP. I found that they had TLS session resumption enabled, although they thought they had disabled it. This is a problem because TLS session identifiers can uniquely tag a device much like HTTP cookies can. F-Droid devs quickly dealt with it by reducing TLS session lifetime to 60 seconds. /e/ developers should take this into consideration as well, not just for App Lounge, but for any of their apps which make TLS connections.
Untested hypothesis: even though CleanAPK no longer sits between the user and Google Play, it may be able to block updates to Google Play apps. This is because if an application is included in both CleanAPK's and Google Play's API responses, only the CleanAPK result is displayed to the user. CleanAPK could present a past version of a specific Google Play app to specific users in order to keep them from updating. I have not checked this; I suppose the updater logic could differ from the what-the-user-is-shown logic, so App Lounge might grab the update from Google Play anyway. /e/ devs should check this, at least until their plan to cut CleanAPK out completely comes to fruition.
App Lounge is moving in the right direction, but is currently neither as private nor as secure as its alternatives: F-Droid is better at handling free software, Aurora Store is better at dealing with Google Play, and PWAs are safer to install by using a web browser and checking the URL. With more attention paid to security, App Lounge could shine as a PWA installer, but other than that, even after all the planned improvements are implemented, App Lounge will still lag behind F-Droid and Aurora.
Also, from my perspective, the past and current issues with /e/'s app installer do not inspire confidence in /e/OS as a whole, nor do some of /e/'s other failures. For the sake of those who will end up using the system, I hope things will improve, but I switched to GrapheneOS a while ago and have not looked back since.
The reason I dove into /e/OS in the first place was that I had no better option for my previous phone: the OEM's Android build was spyware and none of my preferred Android distributions had functioning builds for my device. Were it compatible with GrapheneOS, CalyxOS, DivestOS, LineageOS or Replicant, I would have used one of them instead. At this point I wonder if I wouldn't have been better off sticking with the stock OS, clearing out as much of the junk as possible via adb and using something like NetGuard to block unwanted connections. Among other advantages, going that route would have preserved verified boot, a significant security feature.
The main thing that differentiates /e/ from the above-mentioned Android distros is its well-integrated suite of cloud services, offering many of the conveniences provided by a Google account, but with the same pitfall of storing your data in plaintext on someone else's computer. If /e/ provided end-to-end encryption for those services (like EteSync does, for instance), it would truly stand out. Just saying "we're not Google" doesn't cut it for me.
This article has been discussed on the /e/ Foundation forum.
Zero-width characters can be used to embed hidden information inside plain text. This is of primary concern to journalists and their sources, but it can affect anyone browsing the Internet. For example, a page can be dynamically generated server-side to include, between every few words:
By copying text from the page and pasting it somewhere public, you would be revealing this information to anyone who knew how to look for it. Details and demo in this article:
To check if your browser displays zero-width characters, see the zero-width character test.
Other plain text watermarking techniques / canary traps are explained on Zach Aysan's blog:
To fingerprint text, server software could embed a hidden number between every few words, matching a log entry that contains information about the visitor (username, IP address, cookie, browser details, referrer link, timestamp). For easily finding pasted excerpts online, the software could similarly hide a static page-specific identifier within the text, that can later be put into search engines.
To achieve this, aside from zero-width characters, the software could use some of the other techniques described by Zach Aysan: "differences in dashes (en, em, and hyphens), quotes (straight vs curly), word spelling (color vs colour), and the number of spaces after sentence endings", different types of spaces, homoglyphs (a vs а), diacritic forms (ț vs ţ), ligatures (fi vs fi, Ⅳ vs IV, ½ vs 1/2), as well as inserting hard-to-detect typos into the text. However, zero-width characters are by far the most potent technique, since they can be used to encode any number of bits between any two visible characters.
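To make the mechanism concrete, here is a minimal sketch of how a server could encode a per-visitor identifier into zero-width characters (the two-character alphabet and the 16-bit payload size are arbitrary choices):

```python
# Zero width space encodes a 0 bit, zero width non-joiner a 1 bit.
ZW0, ZW1 = "\u200b", "\u200c"

def hide(text: str, visitor_id: int, bits: int = 16) -> str:
    """Embed visitor_id as zero-width characters after the first word."""
    payload = "".join(ZW1 if (visitor_id >> i) & 1 else ZW0
                      for i in range(bits - 1, -1, -1))
    first, _, rest = text.partition(" ")
    return first + payload + " " + rest

def reveal(text: str) -> int:
    """Recover the embedded identifier from pasted text."""
    hidden = [c for c in text if c in (ZW0, ZW1)]
    return int("".join("1" if c == ZW1 else "0" for c in hidden), 2)

marked = hide("copy this text", 0xBEEF)
print(marked == "copy this text")  # False: the strings only *look* identical
print(hex(reveal(marked)))         # 0xbeef
```

Sixteen bits is enough to distinguish 65,536 visitors, and the payload survives copy-paste through most editors and web forms untouched – which is the whole point of the attack.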
A partial solution is to convert the text to ASCII, if the language allows. There are also tools such as:
However, they don't protect against the more sophisticated versions of this hack. A more complete tool would have to include not just a list of forbidden/allowed characters, but also a spellchecker and a way to detect trailing whitespace – an x-ray mode that might be triggered when dubious text is detected in the clipboard. And not just for text: image-based steganography can be used in a similar way. A technical solution might never be perfect, but it could cover the vast majority of cases.
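A basic scrubber along these lines can be sketched by filtering out Unicode "format" (Cf) characters – note that this also strips legitimate bidi and joiner marks that some scripts need, which is exactly why a complete tool is harder than it looks:

```python
import unicodedata

def scrub(text: str):
    """Drop every character in Unicode category Cf (format characters,
    which covers the zero-width characters); report how many were found."""
    cleaned = "".join(c for c in text if unicodedata.category(c) != "Cf")
    return cleaned, len(text) - len(cleaned)

clean, removed = scrub("pay\u200bload\u2060 text")
print(clean)    # payload text
print(removed)  # 2
```

Anything removed is a red flag worth investigating, but a count of zero proves nothing: homoglyphs, spacing tricks and planted typos all pass this filter.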
An almost perfect non-technical solution is to retype the text. You can also try downloading the page twice from different accounts / IP addresses and diff the two versions, or check if the hashes match. Another solution is to take a screenshot of the text and run it through OCR software.
In May 2022, seven months after this article was published, the /e/ team launched App Lounge, a rewrite of the app installer which downloads applications directly from Google Play. In the follow-up I address the changes that it brought.
The /e/ operating system (formerly eelo) is a privacy-focused variant of Android, forked from Lineage OS. It is part of the Murena project and is led by Gaël Duval, of Mandrake/Mandriva fame.
"Apps", the /e/ app installer, downloads applications from CleanAPK.org, an intermediary which provides apps that originate from F-Droid and elsewhere.
Since apps are not downloaded directly from F-Droid or Google Play, the installer takes certain measures to protect against tampering. Unfortunately, these measures can be bypassed in the majority of cases. This means that CleanAPK.org (or whoever compromises it) can get maliciously modified apps installed on /e/ users' devices, either when the user is installing a new app or during the update process.
Such an attack can be targeted at specific users, based on device information which the installer reveals to the CleanAPK server every time it checks for updates, namely: the list of installed apps, device model, build ID, Android version and installed languages. If the installer is configured to install updates automatically (as is the default), CleanAPK can push apps to users' devices in the background. It can install new apps, but can not replace installed apps with different ones.
These conclusions are based on:
According to the FAQ about its built-in app installer,
The /e/ app installer relies on a third-party application repository called cleanapk.org.
Apps served by cleanapk.org can be searched here.
The FAQ also states:
Apps are checked either using a PGP signature check, or a checksum.
In case you suspect an app to be tampered, we would request you to please report the same here support@e.email and we will take appropriate action keeping you informed.
I read the code that checked for tampering and found that it could be easily bypassed. On the 21st of May 2021, I reported this to /e/ with a 90 day publication deadline, which I later extended by another 60 days after finding that the initial fix did not adequately address the problem. The team tackled many of the issues I reported (/e/OS v0.19 contains the fixes), but as of today (Oct 29 2021), the fundamental problem remains unsolved and the attacks described above are still possible.
CleanAPK.org has a REST API which responds to search queries and returns metadata about available apps. For an example of how it works, let's say you search for NewPipe, tap the first result, then press "install"; the app installer would make the following API calls (I have removed unessential parameters):
https://api.cleanapk.org/v2/apps?action=search&keyword=newpipe
https://api.cleanapk.org/v2/apps?action=app_detail&id=5b16ba2089bb69289976fbab
https://apk.cleanapk.org/any_50e9e98f0d94f06facfeab7707437300_FDROID.org.schabi.newpipe.apk
https://api.cleanapk.org/v2/apps?action=download&app_id=5b16ba2089bb69289976fbab
The last call is made redundantly, as the same information is also included in the second call.
On the client side, the installer tries to determine the original source of the app that the user is about to install: /e/ (system app), F-Droid or Google Play. Based on this, it applies one of three checks:
In order to provide tamper protection for Google Play apps, the installer would need to fetch signing keys from the source, which is difficult, because Google Play does not publish them like F-Droid does. You generally can't check that the app you got from a third party is the same as on Google Play unless you also download it from Google Play.
Looking at the installer's code, we find that signature verification for F-Droid apps can also be bypassed. It used to be the case – and still is, for un-updated systems – that if the API response contained "package_name":"com.google.android.gms", the installer assumed that the APK file was a system app (MicroG) with the same package name and did no further verification. And if the response contained a SHA1 hash of the APK file, the installer assumed that it was a Google Play app and did no signature verification. Both of these assumptions could be leveraged to install malicious apps.
Now the app installer extracts the package name from the APK file, then makes a GET request to https://f-droid.org/en/packages/PACKAGE_NAME/ to determine whether it is an F-Droid app. This means that the f-droid.org server gets to learn about every app that /e/ users are trying to install. This method also does not protect against instances where the attacker modifies the package name within the APK file. By modifying the package name, CleanAPK can induce a fallback to the checkGoogleApp() function, which provides no tamper protection.
So, in the end, F-Droid apps from CleanAPK are not protected against tampering either. CleanAPK can make it appear as though you are installing a libre app like NewPipe and instead you would be installing a malicious version of it (or a different app entirely) with a different internal package name.
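The flawed dispatch described above can be sketched as follows; the function names and the stubbed F-Droid lookup are illustrative reconstructions, not the actual /e/ source:

```python
def is_fdroid_app(package_name: str) -> bool:
    # The real installer issues an HTTP GET to
    # https://f-droid.org/en/packages/<package_name>/ – stubbed out here.
    return package_name in {"org.schabi.newpipe"}

def pick_check(apk_package_name: str) -> str:
    """Decide which verification path an APK gets, based only on the
    package name read out of the (attacker-controlled) APK itself."""
    if is_fdroid_app(apk_package_name):
        return "verify F-Droid PGP signature"
    # Any unrecognized package name lands in the unprotected path:
    return "checkGoogleApp: no tamper protection"

print(pick_check("org.schabi.newpipe"))       # verify F-Droid PGP signature
# A tampered APK whose internal package name was changed by the attacker
# silently drops out of signature verification:
print(pick_check("org.schabi.newpipe.evil"))  # checkGoogleApp: no tamper protection
```

The root problem is visible in the signature of pick_check: the routing decision is derived from data the attacker controls, so the attacker chooses which check gets applied.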
System apps are the exception to the rule: they are not downloaded from CleanAPK, but from https://gitlab.e.foundation/. The only apps that currently fall under this category are MicroG and Mail.
In Android, once an app is installed, its updates must be signed with the same key. This means that CleanAPK won't be able to get a malicious APK installed if its package name collides with that of an already installed app. However, avoiding collisions is easy, especially since CleanAPK receives a list of every user's apps.
During update checks (which occur daily, by default), the app installer sends a request to api.cleanapk.org for each installed app, with the exception of pre-installed system apps (like the installer itself). Each request looks something like this:
GET /v2/apps?action=search&keyword=org.schabi.newpipe&by=package_name
Host: api.cleanapk.org
Accept-Language: en-US,es-VE;q=0.9
User-Agent: Dalvik/2.1.0 (Linux; U; Android 8.0.0; SM-J337U Build/R16NW)
Accept-Encoding: gzip
The User-Agent header contains the device model, build ID and Android version. Installed languages are also revealed to CleanAPK via the Accept-Language header. The information from these headers, along with the IP address and the list of installed apps, can be used to uniquely identify most /e/OS devices. Aside from privacy, this also affects security, since device fingerprinting can enable targeted attacks. A compromised CleanAPK.org can quite easily target a specific device to install malware on.
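To illustrate how little it takes, here is a sketch that folds those signals into a single identifier (a real tracker wouldn't even need to hash them):

```python
import hashlib

def fingerprint(user_agent: str, accept_language: str,
                ip: str, installed_apps: list) -> str:
    """Combine the signals CleanAPK receives on every update check
    into one stable identifier. Illustration only."""
    material = "|".join([user_agent, accept_language, ip,
                         ",".join(sorted(installed_apps))])
    return hashlib.sha256(material.encode()).hexdigest()[:16]

# The same signals seen in the request above (example IP from RFC 5737):
print(fingerprint(
    "Dalvik/2.1.0 (Linux; U; Android 8.0.0; SM-J337U Build/R16NW)",
    "en-US,es-VE;q=0.9", "203.0.113.7",
    ["org.schabi.newpipe", "org.mozilla.fenix"]))
```

The app list alone is highly distinctive – few devices share the exact same set of installed apps – and combining it with device model, locale and IP address makes the identifier effectively unique and stable across daily update checks.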
My suggestions to /e/OS developers regarding the app installer:
Also, use a better hash function than SHA1. SHA256, for instance.
As mentioned, if the installer is configured to install updates automatically (which it is, by default), CleanAPK can push malicious apps to users' devices in the background. Also, the installer sends the list of installed apps and other device data to CleanAPK every time it checks for updates. The "Apps" app cannot be uninstalled or disabled from the Android UI and there is no obvious way to disable update checks. Still, there are ways for users to limit the reach of the /e/ app installer:
App settings:
- Update check interval: Monthly
- Automatically install updates: uncheck
Android Settings > Apps > Apps:
> FORCE STOP
> Permissions:
- Storage: uncheck
> Data usage:
- Disable all cellular data access: check
- Disable all Wi-Fi data access: check
- Disable all VPN data access: check
Alternatively, users can uninstall or disable "Apps" (and other system apps) from the command line:
# Uninstall:
adb shell pm uninstall --user 0 foundation.e.apps
# Reinstall:
adb shell cmd package install-existing foundation.e.apps
# If the above does not work, try:
adb shell pm install -r --user 0 /system/app/Apps/Apps.apk
# Disable:
adb shell pm disable-user --user 0 foundation.e.apps
# Enable:
adb shell pm enable foundation.e.apps
You can also block internet connections for /e/ "Apps" using a firewall such as NetGuard. Or you can block connections to CleanAPK by enabling adb root in the developer options and adding the following to /system/etc/hosts
:
0.0.0.0 api.cleanapk.org
0.0.0.0 apk.cleanapk.org
F-Droid and Aurora Store can be used instead to give you access to pretty much the same broad range of apps, but directly from the source. You can install F-Droid from the official website and Aurora from its official website or from F-Droid.
/e/ "Apps" provides an embedded privacy score (calculated by CleanAPK) and list of trackers (detected by Exodus Privacy) for each app. However, the way in which CleanAPK calculates the privacy score is not clearly documented and I found it to be misleading at times (for instance, Tor Browser has a lower privacy score than Facebook Lite). Aurora Store also provides a list of trackers detected by Exodus Privacy. As for F-Droid, it lets you know when an app tracks and reports your activity, which is determined using a computer-assisted human review in which an Exodus Privacy report is taken into account. F-Droid checks for other anti-features as well.
Other than F-Droid, it is not clear what other source(s) CleanAPK uses. But there is at least one sign that it uses APKPure.com:
https://api.cleanapk.org/v1/apps?action=app_detail&id=5beee6c54ecab4722bde0e28
This API call returns metadata about the Uber app. The response contains: "apk_App uploaded by:":"Mòbile Soe". The APKPure page for Uber also contains "App uploaded by: Mòbile Soe". The same is also true for other proprietary apps served by CleanAPK, but not all.
If APKPure is used as a source, then we're relying on two intermediaries, not just CleanAPK. So it's even more important for the app installer to have proper tamper protections in place.
Maybe CleanAPK also gets apps from Google Play directly, or maybe not. It would be good to have transparency on this.
I can't shake the sense that the /e/ team is stretched too thin. The scope of the /e/ project is huge: maintaining online services, maintaining their own apps and forks, supporting around 200 devices, etc. They have an enormous amount of work on their hands. So, even though it is important, I can't say I'm surprised by the lack of care given to app verification, nor by other signs of rushed work.
But since /e/ is focused on privacy, I am somewhat surprised by the data it leaks. You would think that, before claiming privacy, such a project would analyze and disclose its own network connections. But this is not the first time /e/ has been shown to fail in this regard. Users deserve to know exactly what data is sent where and who can access it.
That being said, let's not lose sight of the fact that /e/ is significantly more privacy-respecting than the average Android system. Also, the problems I have described can be solved, and based on my communications with the team, it appears they will be.
Thanks to Arnau Vàzquez Palma, Hans-Christoph Steiner and Pete Fotheringham for pointing out a few errors in the initial version of the article. They have been corrected.
This article has been discussed on: