I recently stumbled upon an interesting proposal for defining security policies called security.txt.
As I’m quite interested in the security scene myself, I wondered how many websites already use a security.txt.
What is a security.txt file?
The security.txt is a proposed standard to define security policies on websites.
It consists of a simple, machine-readable format for a text file which should be placed in a well-known location.
Below is a minified version of CloudFlare’s security.txt as an example:
Contact: https://hackerone.com/cloudflare
Contact: mailto:firstname.lastname@example.org
Contact: https://www.cloudflare.com/abuse/
Preferred-Languages: en
Encryption: https://www.cloudflare.com/gpg/security-at-cloudflare-pubkey-06A67236.txt
Canonical: https://www.cloudflare.com/.well-known/security.txt
Policy: https://www.cloudflare.com/disclosure
Hiring: https://www.cloudflare.com/careers/jobs/
Expires: Tue, 15 Mar 2022 13:43:01 -0700
As you can see in the example above, there are quite a few fields which can be defined. I’ll try my best to give a brief overview:
- Acknowledgments links to the website’s hall of fame for security researchers
- Canonical indicates the canonical locations where the security.txt can be found
- Contact represents a way to contact the organisation, usually an e-mail address or website
- Encryption links to a location where the key for encrypted communication with the organisation can be found
- Expires is the timestamp after which the content should be treated as stale
- Hiring links to security-related jobs at the organisation
- Policy links to the vulnerability disclosure policy
- Preferred-Languages is a set of languages that are preferred for submitting reports
More info and an easy to use generator for your own website can be found on the official website.
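To make the format concrete, here is a minimal Python sketch of how such a file could be read into a mapping of field names to values. The function name `parse_security_txt` and the sample text are made up for illustration; this is not the parser used for the analysis in this post, and it skips everything that is not a plain "Field: value" line.

```python
from collections import defaultdict


def parse_security_txt(text):
    """Collect 'Field: value' lines into a dict of field name -> list of values."""
    fields = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments (lines starting with '#').
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Field: value' line
        fields[name.strip()].append(value.strip())
    return dict(fields)


# Illustrative sample, based on a few fields from the example above.
sample = """\
# A comment line
Contact: https://hackerone.com/cloudflare
Contact: https://www.cloudflare.com/abuse/
Preferred-Languages: en
Policy: https://www.cloudflare.com/disclosure
"""

parsed = parse_security_txt(sample)
print(parsed["Contact"])
```

Fields such as Contact may appear more than once, which is why every field maps to a list rather than a single value.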
My general idea was to get a list of domains to check, crawl the /.well-known/security.txt path and save the response as well as some metadata in a database. Though the specification allows an alternative path under /security.txt, I’ll only query the /.well-known/security.txt path for simplicity’s sake.
Regarding the list of domains, I found a very handy shell script by Adam Baldwin which fetches the Alexa Top websites and outputs the first 1000 in a slim text file.
For each requested domain, I saved the following data:
- current timestamp
- requested domain
- returned status code
- response’s body
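The crawl-and-store step could look roughly like the following Python sketch. It is not the script used for this post: the helper names (`build_url`, `fetch`, `init_db`, `save_result`), the use of HTTPS, the timeout, and the `timestamp`, `domain` and `body` column names are assumptions. Only the `results` table and the `statusCode` and `headerContentType` columns are taken from the query shown later in this post.

```python
import sqlite3
import urllib.error
import urllib.request
from datetime import datetime, timezone


def build_url(domain):
    # Assumption: every domain is queried over HTTPS.
    return f"https://{domain}/.well-known/security.txt"


def fetch(domain, timeout=10.0):
    """Return (status_code, content_type, body); network failures still raise."""
    try:
        with urllib.request.urlopen(build_url(domain), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth recording.
        return err.code, err.headers.get("Content-Type", ""), ""


def init_db(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               timestamp TEXT, domain TEXT, statusCode INTEGER,
               headerContentType TEXT, body TEXT)"""
    )


def save_result(conn, domain, status_code, content_type, body):
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
        (now, domain, status_code, content_type, body),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_db(conn)
# A real run would call fetch(domain) for every domain in the list;
# a canned response is stored here so the sketch runs without network access.
save_result(conn, "example.com", 404, "text/html", "")
```

Errors other than HTTP status errors (broken redirects, invalid certificates, unreachable hosts) are deliberately left to raise, so a caller can count them as failed requests.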
After crawling all these domains and inspecting the data, I ended up with 116 failed requests. Most of them were due to broken redirects, invalid SSL certificates, or domains that simply do not serve any HTTP/HTTPS traffic, like the amazonaws.com root domain. That was nothing I wanted to inspect manually, so I decided to just ignore them.
In general, most of the successful requests just returned a 404 Not Found or similar errors indicating there is no content to be served.
While going through the database manually, I also discovered that many responses with a 200 OK status code had a Content-Type: text/html header set. This seemed a bit off, as the specification explicitly states the file must be served as plain text. After digging around a bit more, I found out that many websites returned a 404 Not Found page with a 200 OK status code.
In order to filter out the false positives, I’ve added a new column indicating whether the given domain has a security.txt or not. My approach was to treat an entry as not having a security.txt when either the Content-Type header did not include text/plain or the returned status code was not 200 OK:
UPDATE results SET hasSecurityFile = 0 WHERE headerContentType NOT LIKE '%text/plain%' OR statusCode IS NOT 200;
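To see what this filter does, the following Python sketch runs the same condition against three made-up rows in an in-memory SQLite database. The example domains and the `DEFAULT 1` on the `hasSecurityFile` column are assumptions for illustration; the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: hasSecurityFile starts out as 1 and is only ever cleared.
conn.execute(
    "CREATE TABLE results (domain TEXT, statusCode INTEGER, "
    "headerContentType TEXT, hasSecurityFile INTEGER DEFAULT 1)"
)
rows = [
    ("good.example", 200, "text/plain; charset=utf-8"),  # plausible security.txt
    ("soft404.example", 200, "text/html"),                # error page served as 200
    ("missing.example", 404, "text/plain"),               # plain-text 404
]
conn.executemany(
    "INSERT INTO results (domain, statusCode, headerContentType) VALUES (?, ?, ?)",
    rows,
)
conn.execute(
    "UPDATE results SET hasSecurityFile = 0 "
    "WHERE headerContentType NOT LIKE '%text/plain%' OR statusCode IS NOT 200"
)
remaining = [
    row[0]
    for row in conn.execute("SELECT domain FROM results WHERE hasSecurityFile = 1")
]
print(remaining)  # ['good.example']
```

One caveat of this condition: a row whose Content-Type header is NULL but whose status code is 200 is not cleared, because `NOT LIKE` on NULL evaluates to NULL rather than true.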
After this cleanup I ended up with 105 entries with a security.txt in place. I still manually reviewed the remaining ones and filtered out a few more false positives, leaving a final count of 101 valid entries.
In the next step I’ll further analyze them.
The official website lists a few projects involving security.txt in some way. There I found a TypeScript script by Movitz Sunar which parses a given security.txt input and checks it against the security.txt specification. I used it as a base and modified it a bit to fit my needs; my version can be found in the Downloads section.
Out of all 101 security.txt files, I found only one which is fully compliant with the current (12th) version of the draft.
The most common violation, occurring 96 times, was a missing Expires field, which is required according to section 3.5.5. Of the Expires fields that were present, only one had a valid value according to ISO 8601.
This was followed by missing mailto: prefixes for e-mail addresses in the Contact field, with 13 violations.
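The two checks mentioned above can be sketched in Python as follows. This is an illustration and not the modified TypeScript script used for the analysis: `is_iso8601` and `find_violations` are hypothetical names, the input is assumed to be a dict mapping each field name to a list of its values, `datetime.fromisoformat` is only a rough stand-in for a full ISO 8601 validator, and the bare e-mail detection is a simple heuristic.

```python
from datetime import datetime


def is_iso8601(value):
    """Rough ISO 8601 check via datetime.fromisoformat (not a full validator)."""
    candidate = value.strip()
    if candidate.endswith("Z"):
        # Older Python versions reject a trailing 'Z', so normalise it.
        candidate = candidate[:-1] + "+00:00"
    try:
        datetime.fromisoformat(candidate)
    except ValueError:
        return False
    return True


def find_violations(fields):
    """Return a list of human-readable violations for the parsed fields."""
    violations = []
    expires = fields.get("Expires", [])
    if not expires:
        violations.append("missing Expires field")
    elif not all(is_iso8601(value) for value in expires):
        violations.append("Expires is not a valid ISO 8601 timestamp")
    for contact in fields.get("Contact", []):
        # Heuristic: an '@' without any URI scheme is a bare e-mail address.
        is_bare_email = (
            "@" in contact
            and "://" not in contact
            and not contact.startswith("mailto:")
        )
        if is_bare_email:
            violations.append(f"Contact e-mail without mailto: prefix: {contact}")
    return violations


print(find_violations({"Contact": ["security@example.com"]}))
```

Note that an RFC 5322 style date such as the one in the CloudFlare example earlier in this post is reported as invalid by this check, because it is not in ISO 8601 format.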
Another interesting observation is the usage of the Signature and Acknowledgement fields, which were part of earlier versions of the draft but are not anymore. The Acknowledgement field was renamed in the 4th version of the draft, while the Signature field was dropped in the 5th version. There were a total of four Signature fields and 48 usages of either the old Acknowledgement field or a misspelled version of it.
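One way to spot outdated or misspelled field names automatically is fuzzy matching against the current field list. The Python sketch below uses `difflib.get_close_matches` from the standard library; the function name `classify_field`, the case-insensitive comparison and the similarity cutoff of 0.8 are choices made for this illustration, not part of the analysis script.

```python
import difflib

# Field names from the overview earlier in this post.
KNOWN_FIELDS = [
    "Acknowledgments", "Canonical", "Contact", "Encryption",
    "Expires", "Hiring", "Policy", "Preferred-Languages",
]


def classify_field(name):
    """Return ('known' | 'similar' | 'unknown', matching known field or None)."""
    lowered = {field.lower(): field for field in KNOWN_FIELDS}
    key = name.strip().lower()
    if key in lowered:
        return "known", lowered[key]
    # Look for the single closest known name above the similarity cutoff.
    close = difflib.get_close_matches(key, list(lowered), n=1, cutoff=0.8)
    if close:
        return "similar", lowered[close[0]]
    return "unknown", None


for field_name in ["Contact", "Acknowledgement", "Signature"]:
    print(field_name, classify_field(field_name))
```

With this cutoff, near-misses of Acknowledgments are reported as "similar", while a dropped field such as Signature has no close counterpart and ends up as "unknown".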
A total of seven files were PGP signed. All signatures were hashed with a SHA-2 algorithm, three with 256 bits and four with 512 bits.
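Signed files can be recognised by the OpenPGP cleartext signature framing, where the first line is `-----BEGIN PGP SIGNED MESSAGE-----` and an optional `Hash:` armor header names the digest algorithm. The Python sketch below reads that header; the function names and the sample text are invented for illustration, and the signature itself is not verified.

```python
SIGNED_MARKER = "-----BEGIN PGP SIGNED MESSAGE-----"


def is_pgp_signed(text):
    """True if the text starts with the cleartext signature marker."""
    lines = text.strip().splitlines()
    return bool(lines) and lines[0].strip() == SIGNED_MARKER


def signed_hash_algorithm(text):
    """Return the value of the 'Hash:' armor header, or None if there is none."""
    if not is_pgp_signed(text):
        return None
    for line in text.strip().splitlines()[1:]:
        if not line.strip():
            break  # a blank line ends the armor headers
        if line.lower().startswith("hash:"):
            return line.split(":", 1)[1].strip()
    return None


# Illustrative sample; the signature block is truncated on purpose.
sample = """-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Contact: mailto:security@example.com
-----BEGIN PGP SIGNATURE-----
"""

print(signed_hash_algorithm(sample))  # SHA512
```

Counting the returned values over all signed files would give a split between SHA256 and SHA512 like the one reported above.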
Twenty files also had a Preferred-Languages field set. All of them accept en (English). Additionally, the following languages are accepted: ru (Russian, two), tr (Turkish, one), et (Estonian, one) and pl (Polish, one).
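A tally like this can be produced with a few lines of Python. In the sketch below, `count_languages` is a hypothetical helper and the three input values are made up to show the mechanics; they are not the crawled data. It assumes each Preferred-Languages value is a comma-separated list of language tags.

```python
from collections import Counter


def count_languages(values):
    """Count how many Preferred-Languages values mention each language tag."""
    counts = Counter()
    for value in values:
        # Use a set so a tag repeated within one value is only counted once.
        tags = {tag.strip().lower() for tag in value.split(",") if tag.strip()}
        counts.update(tags)
    return counts


# Made-up input: one Preferred-Languages value per file.
counts = count_languages(["en", "en, ru", "EN, pl"])
print(counts["en"], counts["ru"], counts["pl"])  # 3 1 1
```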
Downloads
A .zip file containing my scripts to crawl and analyze the data, as well as the SQLite database and the list of domains I’ve crawled, can be downloaded here. For the lazy ones, here is a GitHub Gist.