Recently I’ve stumbled upon an interesting proposal on defining security policies called security.txt.
As I’m quite interested in the security scene myself, I wondered how many websites already use a security.txt?
Table of contents
Open Table of contents
What is a security.txt file?
The security.txt is a proposed standard to define security policies on websites.
It consists of a simple and machine readable format for a text file which should be placed in a well known location.
Below is a minified version of CloudFlare’s security.txt as an example:
Contact: https://hackerone.com/cloudflare
Contact: mailto:security@cloudflare.com
Contact: https://www.cloudflare.com/abuse/
Preferred-Languages: en
Encryption: https://www.cloudflare.com/gpg/security-at-cloudflare-pubkey-06A67236.txt
Canonical: https://www.cloudflare.com/.well-known/security.txt
Policy: https://www.cloudflare.com/disclosure
Hiring: https://www.cloudflare.com/careers/jobs/
Expires: Tue, 15 Mar 2022 13:43:01 -0700
As you can see in the example above there are quite a few fields which can be defined. I’ll try my best and give a brief overview:
Acknowledgments
links to the website’s hall of fame for security researchersCanonical
indicates related places where the security.txt can be foundContact
represents a way to contact the organisation, usually an e-mail address or websiteEncryption
links to a location where the key can be found which is used to sign the security.txtExpires
is a timestamp from when on the content should be treated as staleHiring
links to security related jobs at the organisationPolicy
links to the vulnerability disclosurePreferred-Languages
is a set of languages that are preferred for submitting reports
More info and an easy to use generator for your own website can be found on the official website.
Idea
My general idea was to get a list of domains to check, crawl the /.well-known/security.txt
path and save the response as well as some meta data in a database. Though the specification allows an alternative path under /security.txt
, I’ll only query the /.well-known/security.txt
path for simplicity’s sake.
Regarding the list of domains I found a very handy shell script by Adam Baldwin which fetches the Alexa Top websites and outputs the first 1000 in a slim text file.
Crawling
To get the data we need, I wrote a quick and dirty JavaScript script that reads a file with a list of domains as input, crawls all of them and saves:
- current timestamp
- requested domain
- returned status code
- returned
Content-Type
header - response’s body
After crawling all these domains and inspecting the data, I got 116 failed requests. While inspecting them most of them were due to either broken redirects, invalid SSL certificates or simply not serving any HTTP/HTTPS traffic on them like the amazonaws.com
root domain. Nothing I want to manually inspect, so I decided to just ignore them.
In general most of the successful requests just returned a 404 Not Found
or similar errors indicating there is no content to be served.
While going manually through the database I also discovered that many requests, which were returned with a 200 OK
status code, had a Content-Type: text/html
header set. This seemed a bit off as the specification explicitly states the file must be served as plain text. After digging around a bit more, I found out that many websites returned a 404 Not Found
page with a 200 OK
status code.
In order to filter out the false positives, I’ve added a new column indicating if the given domain has a security.txt or not. My approach was to treat all entries as does not have security.txt when either the content-type header did not include text/plain
or the returned status code was not 200 OK
.
UPDATE results SET hasSecurityFile = 0 WHERE headerContentType NOT LIKE "%text/plain%" OR statusCode IS NOT 200;
After this cleanup I’ve ended up with 105 entries with a security.txt in place. I still manually reviewed the remaining ones and filtered out a few false positives, leaving us with a final amount of 101 valid entries.
In the next step I’ll further analyze them.
Analyzing
The official website lists a few projects involving security.txt in some way. On there I’ve found a TypeScript script by Movitz Sunar which parses a given security.txt input and checks it against the security.txt’s specification. I’ve used this one as a base and modified it a bit to fit my needs, my version can be found in the Downloads paragraph.
Out of all 101 security.txt files I only found one which is fully compliant to the current (12th) version of the draft.
The most common violation with 96 times was a missing Expires
field which is a required field stated in section 3.5.5. Only one field had a valid value according to ISO-8601.
Followed by missing mailto:
prefixes when using e-mail addresses in the Contact
field with 13 violations.
Another interesting observation is the usage of the Signature
and Acknowledgements
/Acknowledgement
fields which were part of earlier versions of the draft but are not anymore. The Acknowledgments
field was renamed in the 4th version of the draft meanwhile the Signature
field was dropped in the 5th version of the draft. There were a total of four Signature
fields and 48 usages of either the old Acknowledgments
field or a misspelled version of it.
A total of seven files were PGP signed. All signatures were hashed with a SHA-2 algorithm, three with 256 bits and four with 512 bits.
Twenty files also had a Preferred-Languages
field set. All of them are accepting en
(English). Additionally the following languages are also accepted: ru
(Russian, two), tr
(Turkish, one), et
(Estonian, one) and pl
(Polish, one).
Downloads
A .zip-file containing my scripts to crawl and analyze the data as well as the sqlite database and the list of domains I’ve crawled can be downloaded here. For the lazy ones, here is a GitHub Gist.