This is part 4 of an ongoing study of web font file sizes, subsetting, and file sizes of the subsets.
I used the collection of freely available web fonts that is Google Fonts.
- In part 1 I wondered how many bytes is "normal" for a web font by studying all regular fonts, meaning no bolds, italics, etc. The answer was around 20K for a LATIN subset.
- In part 2 I wondered how a font grows, by subsetting fonts one character at a time. The answer was about 0.1K per character.
- Part 3 was a re-study of part 1, but this time focusing on variable fonts using only one variable dimension: weight, i.e. a variable bold-ness. This time the answer was: 35K is the median file size of a wght-variable font.
Now, instead of focusing on just regular or just weight-variable fonts, I thought let's just do them all and let you, my dear reader, do your own filtering and analysis and draw your own conclusions.
One constraint I kept was focusing only on the LATIN subset (see part 1 as to what LATIN means) because, as Boris Shapira notes: "...even with basic high school Chinese, we would need a minimum of 3,000 characters...", which is an order of magnitude larger than Latin, and we do need to keep some sort of apples-to-apples comparison here.
The study
First download all Google fonts (see part 1).
Then subset all of the fonts to LATIN and drop all fonts that don't support at least 200 characters. 200 and a bit is what the average LATIN font out there supports. This resulted in excluding fonts that focus mostly on non-Latin, e.g. Chinese characters. But it also dropped some fonts that are close to 200 Latin characters but not quite there. See part 1 for the "magic" 200 number. So this replicates part 1 and part 3 but this time for all available fonts.
This 200-LATIN filtering leaves us with 3277 font files to study and 261 font file "rejects". The full list of rejects is in rejects.txt.
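The 200-character filter can be sketched roughly like this. It is a minimal illustration, not the script used in the study: `parse_unicode_range` and `supports_enough` are hypothetical helpers, and the short range string is a toy stand-in for the real LATIN definition from part 1. In practice the set of supported code points would come from each font's character map.

```python
def parse_unicode_range(spec):
    """Expand a CSS-style unicode-range string into a set of code points."""
    points = set()
    for part in spec.split(","):
        part = part.strip().upper().replace("U+", "")
        if not part:
            continue
        # "0020-00FF" is a range, "0131" is a single code point
        start, _, end = part.partition("-")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        points.update(range(lo, hi + 1))
    return points


def supports_enough(font_codepoints, subset_codepoints, minimum=200):
    """True if the font covers at least `minimum` code points of the subset."""
    return len(set(font_codepoints) & subset_codepoints) >= minimum


# Toy stand-in for the LATIN range (an assumption; see part 1 for the real one)
toy_latin = parse_unicode_range("U+0020-00FF, U+0131, U+0152-0153")

# A font covering only printable ASCII (95 characters) would be a "reject"
ascii_only = range(0x20, 0x7F)
print(supports_enough(ascii_only, toy_latin))
```

The threshold is a parameter so that readers can try cutoffs other than 200 when doing their own filtering.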
Finally, subset each of the remaining fonts 10 characters at a time to see how they grow. This replicates part 2 for all fonts, albeit a bit more coarsely (10 characters at a time as opposed to 1). Hey, it still took over 24 hours while running 10 threads simultaneously, meaning 10 copies of the subsetting script! The subsets are 1 character, 10 characters, 20... up to 200. I ended up with 68,817 font files.
(20 subsets from 10 to 200, plus 1 single-character subset) * 3277 files = 21 * 3277 = 68,817
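The file count can be double-checked with a few lines of Python. The variable names are only for illustration; the numbers come straight from the text.

```python
FONT_COUNT = 3277  # fonts that passed the 200-character filter

# 1 character, then 10, 20, ... up to 200 characters
subset_sizes = [1] + list(range(10, 201, 10))

total_files = len(subset_sizes) * FONT_COUNT
print(len(subset_sizes), total_files)  # prints: 21 68817
```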
Data
LATIN
The LATIN subset data is available as CSV (latin.csv) and HTML (latin.html).
Subsets
The subset data is available as CSV (stats.csv) and as a Google spreadsheet.
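If you want a starting point for your own analysis, here is a rough sketch that computes the median file size per subset size. The column names `chars` and `bytes` are assumptions, as is the helper name `median_size_by_subset`; check the header row of stats.csv and adjust. The sample rows are made-up numbers, not data from the study.

```python
import csv
import io
from collections import defaultdict
from statistics import median


def median_size_by_subset(csv_file, size_col="chars", bytes_col="bytes"):
    """Group rows by subset size and return the median byte count for each."""
    sizes = defaultdict(list)
    for row in csv.DictReader(csv_file):
        sizes[int(row[size_col])].append(int(row[bytes_col]))
    return {n: median(values) for n, values in sorted(sizes.items())}


# Made-up sample in the assumed layout, only to show the shape of the input
sample = io.StringIO(
    "font,chars,bytes\n"
    "A,1,1000\n"
    "A,10,2000\n"
    "B,1,1400\n"
    "B,10,2600\n"
)
medians = median_size_by_subset(sample)
print(medians)
```

For the real data, pass an open file instead of the sample, e.g. `open("stats.csv", newline="")`.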
Some observations
- The data set contains 3277 different font files, each being subset 21 times
- 588 are variable fonts
- 429 are variable only on the weight axis
- 196 are variable on more than one axis, e.g. [wdth,wght] or [FLAR,VOLM,slnt,wght]
- 63 use the [opsz] axis (it's been suggested this is the "expensive" one in terms of file size)
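Counts like the ones above can be reproduced by bucketing each font by its list of axes. Here is a minimal sketch, assuming the axes are stored as a bracketed string such as `[wdth,wght]` and that an empty value means a static font; `parse_axes`, `classify` and `uses_opsz` are hypothetical helper names.

```python
def parse_axes(text):
    """Turn "[wdth,wght]" into ["wdth", "wght"]; empty input gives []."""
    inner = text.strip().strip("[]")
    return [tag.strip() for tag in inner.split(",") if tag.strip()]


def classify(axes):
    """Bucket a font by how many variable axes it has."""
    if not axes:
        return "static"
    if axes == ["wght"]:
        return "wght-only"
    if len(axes) > 1:
        return "multi-axis"
    return "single-axis (not wght)"


def uses_opsz(axes):
    """True if optical size is one of the font's axes."""
    return "opsz" in axes


for value in ["", "[wght]", "[wdth,wght]", "[FLAR,VOLM,slnt,wght]", "[opsz,wght]"]:
    axes = parse_axes(value)
    print(value or "(none)", classify(axes), uses_opsz(axes))
```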
Conclusions
I'd love to hear your analysis on the data! I hope this data can be useful and I'm looking forward to any and all insights.