Ashley Sheridan.co.uk

Formatting File Sizes For Other Languages


Whenever you're dealing with files on the web, you'll likely reach a point where you need to display a file's size, whether that's for an email attachment or for a list of downloads for your own Linux distro.

Now, it might suit your purposes just fine to leave this in English, but if you're venturing into new markets around the world, you'll want to format file sizes to suit each locale.

Converting Bytes

If you're dealing with files in any way, you'll most likely be dealing with file sizes in bytes. Working in bytes is especially helpful with larger files (or data sizes), as whole-number byte counts are much less likely to run into floating point rounding errors (which can be a really frustrating problem when you're dealing with numbers that need to be exact!).
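As a quick illustration of working in bytes: PHP's filesize() returns a plain integer, and whole numbers like this add up exactly, whereas decimal fractions often can't be represented exactly as floats. This is a throwaway sketch; the temporary file and its 1500-byte size exist only for demonstration.

```php
<?php
// filesize() reports a file's size as a whole number of bytes (an int)
$path = tempnam(sys_get_temp_dir(), 'size');
file_put_contents($path, str_repeat('x', 1500));
clearstatcache(true, $path);
$bytes = filesize($path);
unlink($path);

var_dump($bytes);              // int(1500)

// Integer byte counts add up exactly; decimal fractions often don't
var_dump(1000 + 500 === 1500); // bool(true)
var_dump(0.1 + 0.2 === 0.3);   // bool(false)
```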

So, given that, a first attempt at a function to convert bytes into something more human-readable might look like this one, which I wrote for my own custom PHP framework some years ago:

function formatDataSize($size)
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
    $divider = 1024;
    $depth = count($units) - 1;
    $i = 0;

    while ($size >= $divider && $i < $depth) {
        $size /= $divider;
        $i++;
    }

    return sprintf('%01.0f %s', $size, $units[$i]);
}

This works, but annoyingly it never rounds up to the next unit. Given 1023 bytes as input, it returns "1023 B" rather than rounding up by 1 byte to "1 KB".

Rounding Units

To fix this, the while loop needs to check whether dividing $size would take it above a threshold, which indicates that it's close enough to the next unit to be rounded up:

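function formatDataSize($size, $decimalPlaces = 2)
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
    $divider = 1024;
    // Just under 1000 / 1024 (0.9765625), so a value moves up a unit once it reaches (roughly) 1000
    $roundingPercentage = 0.97656249999999;
    $depth = count($units) - 1;
    $i = 0;

    while (($size / $divider) > $roundingPercentage && $i < $depth) {
        $size /= $divider;
        $i++;
    }

    // {$decimalPlaces} needs the $ so PHP interpolates it into the format string
    return sprintf("%01.{$decimalPlaces}f %s", $size, $units[$i]);
}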

There's one other small change here too: the sprintf() format now uses $decimalPlaces (two by default) instead of zero, so we can see the fractional part of the result.
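The threshold value may look arbitrary. It sits just below 1000 ÷ 1024 = 0.9765625, so for a whole number of bytes the loop moves up a unit exactly when the value is 1000 or more. Here is a small check of that reading; movesUpAUnit() and the ROUNDING_PERCENTAGE constant are hypothetical names for illustration.

```php
<?php
// The article's threshold, sitting just below 1000 / 1024 = 0.9765625
const ROUNDING_PERCENTAGE = 0.97656249999999;

// Hypothetical helper: would this whole-number size move up to the next unit?
function movesUpAUnit(int $size, int $divider = 1024): bool
{
    return ($size / $divider) > ROUNDING_PERCENTAGE;
}

// With a divider of 1024, this matches a plain ">= 1000" check for whole numbers
foreach (range(0, 2048) as $bytes) {
    if (movesUpAUnit($bytes) !== ($bytes >= 1000)) {
        throw new LogicException("Mismatch at $bytes bytes");
    }
}
```

At the higher units $size is usually fractional, so there the threshold behaves as "1000, give or take a tiny margin".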

The updated function brings us closer. Now, if we pass in a value of 1000 (just under 1 KB), we get "0.98 KB": the value moves up to kilobytes, and sprintf() rounds it to two decimal places. However, the actual result of 1000 ÷ 1024 is 0.9765625. What if we don't want that rounded up, but simply cut off at a set number of decimal places?

There is a simple (albeit ugly) way to achieve this: subtract a small amount from $size in order to always force rounding down:

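// Nudge the value down by half of the last decimal place (0.005 suits two places)
// so that sprintf() rounds down instead of up
$forceRoundingValue = 0.005;
// max() stops a 0-byte size from coming out negative
$roundedValue = max(0.0, $size - $forceRoundingValue);

return sprintf("%01.{$decimalPlaces}f %s", $roundedValue, $units[$i]);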
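Subtracting a fixed 0.005 only suits two decimal places (it's half of 0.01), and it still relies on floating point. If you have the raw byte count as an integer, one alternative is to truncate with integer division, which is exact. This is a sketch of that swapped-in technique rather than the approach above; truncatedDataSize() is a hypothetical name, and it assumes a 64-bit PHP build, a non-negative $decimalPlaces, and that $bytes × 10^$decimalPlaces fits in PHP_INT_MAX (tens of petabytes at two decimal places).

```php
<?php
// Hypothetical alternative to subtracting 0.005: truncate using integer
// arithmetic on the raw byte count, so no floating-point rounding is involved
function truncatedDataSize(int $bytes, int $decimalPlaces = 2): string
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
    $depth = count($units) - 1;
    $i = 0;

    // Move up a unit once the value reaches 1000, like the threshold above
    while ($i < $depth && intdiv($bytes, 1024 ** $i) >= 1000) {
        $i++;
    }

    $scale = 10 ** $decimalPlaces;
    // intdiv() rounds toward zero, so this keeps exactly $decimalPlaces digits
    $truncated = intdiv($bytes * $scale, 1024 ** $i);

    // Default separators here; locale-aware ones can be passed the same way
    return number_format($truncated / $scale, $decimalPlaces) . ' ' . $units[$i];
}

echo sprintf('%.2f', 1000 / 1024), "\n"; // 0.98 (rounded)
echo truncatedDataSize(1000), "\n";      // 0.97 KB (truncated)
```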

Formatting for a Locale

Now that we have something that converts bytes into a form humans can more easily understand, we need to format it according to the app's current locale. For example, France uses a comma as the decimal separator instead of a period.

This requires a different number formatting function built into PHP, the aptly named number_format().

The 3rd and 4th arguments to this function let us specify the decimal and thousands separators to use for our number. We could hard-code these for each language within our app, but if we're localising it properly, we will already have set our application's locale with setlocale(). Because of that, we can let PHP do the hard work and tell us what the separators should be:

// setlocale() accepts several candidates and uses the first one the system supports
setlocale(LC_ALL, 'fr_FR', 'fr');

function formatDataSize($size, $decimalPlaces = 2)
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
    $divider = 1024;
    $roundingPercentage = 0.97656249999999;
    $depth = count($units) - 1;
    $i = 0;

    while (($size / $divider) > $roundingPercentage && $i < $depth) {
        $size /= $divider;
        $i++;
    }

    // Separators for the current locale, e.g. ',' as the decimal point for French
    $locale = localeconv();

    $forceRoundingValue = 0.005;
    // max() stops a 0-byte size from coming out negative
    $roundedValue = max(0.0, $size - $forceRoundingValue);

    $formattedSize = number_format(
        $roundedValue,
        $decimalPlaces,
        $locale['decimal_point'],
        $locale['thousands_sep']
    );

    return "$formattedSize {$units[$i]}";
}

The localeconv() function returns an array of formatting information for the current locale, covering both plain numbers and monetary values. Its decimal_point and thousands_sep entries can be passed straight into number_format(), which gives us a correctly formatted size for our chosen locale.
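One thing to keep in mind is that setlocale() changes the locale for the whole PHP process, not just one function call. If you only need a locale's separators temporarily, you could switch LC_NUMERIC (the category that localeconv() reads decimal_point and thousands_sep from), format the number, and then switch back. A minimal sketch, with formatNumberForLocale() as a hypothetical helper:

```php
<?php
// Hypothetical helper: format a number with the separators of the first
// supported locale in $locales, then restore the previous LC_NUMERIC setting
function formatNumberForLocale(float $number, int $decimalPlaces, string ...$locales): string
{
    // Passing '0' only queries the current setting without changing it
    $previous = setlocale(LC_NUMERIC, '0');

    if ($locales !== []) {
        // Tries each candidate in turn; the locale is left unchanged if none exists
        setlocale(LC_NUMERIC, ...$locales);
    }

    $conventions = localeconv();
    $formatted = number_format(
        $number,
        $decimalPlaces,
        $conventions['decimal_point'],
        $conventions['thousands_sep']
    );

    // Put the process-wide locale back the way we found it
    setlocale(LC_NUMERIC, $previous);

    return $formatted;
}
```

On a system with a French locale installed, formatNumberForLocale(0.97, 2, 'fr_FR', 'fr') would be expected to return "0,97"; if none of the candidates exist, the current locale's separators are used instead.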

Localising the Units

We almost have a completely localised file size, but there is one issue left with the unit itself. For French, the unit abbreviations are wrong: instead of KB, MB, GB and so on, French uses Ko, Mo and Go (the "o" stands for octet, the French word for byte).
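If the gettext extension isn't available, or you only need a handful of languages, a plain lookup array is a minimal alternative. This is a sketch with a hypothetical localiseUnit() function and a hand-written French table; To and Po follow the same pattern as Ko, Mo and Go.

```php
<?php
// Hypothetical per-language unit names, for use without gettext
function localiseUnit(string $unit, string $language): string
{
    $names = [
        // French uses "o" (octet) where English uses "B" (byte)
        'fr' => ['B' => 'o', 'KB' => 'Ko', 'MB' => 'Mo', 'GB' => 'Go', 'TB' => 'To', 'PB' => 'Po'],
    ];

    // Fall back to the English abbreviation if the language or unit is missing
    return $names[$language][$unit] ?? $unit;
}
```

The trade-off is that translations now live in code rather than in the translator-friendly .po files that gettext uses.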

Using a standard gettext dictionary and one small change (passing the unit through _(), PHP's alias for gettext()), we can localise the unit for any language that our application supports:

// setlocale() accepts several candidates and uses the first one the system supports
setlocale(LC_ALL, 'fr_FR', 'fr');

function formatDataSize($size, $decimalPlaces = 2)
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
    $divider = 1024;
    $roundingPercentage = 0.97656249999999;
    $depth = count($units) - 1;
    $i = 0;

    while (($size / $divider) > $roundingPercentage && $i < $depth) {
        $size /= $divider;
        $i++;
    }

    $locale = localeconv();

    $forceRoundingValue = 0.005;
    // max() stops a 0-byte size from coming out negative
    $roundedValue = max(0.0, $size - $forceRoundingValue);

    // _() returns the translation from the active text domain,
    // or the original string if there isn't one
    $unit = _($units[$i]);

    $formattedSize = number_format(
        $roundedValue,
        $decimalPlaces,
        $locale['decimal_point'],
        $locale['thousands_sep']
    );

    return "$formattedSize $unit";
}

Now, passing in 1000 bytes gives "0,97 Ko" if our locale is set to a French variant and our gettext catalogue maps KB to Ko.
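The code above assumes gettext has already been pointed at your translation catalogues. A sketch of that bootstrap follows; the domain name, directory layout, and the initUnitTranslations(), N_() and translateUnit() helpers are all assumptions for illustration. One practical snag is that xgettext can't extract strings passed through a variable, as in _($units[$i]), so here the unit names are wrapped in a no-op marker that xgettext can be told about with --keyword=N_.

```php
<?php
// Hypothetical gettext bootstrap. Assumes compiled catalogues live under
// $directory, e.g. locale/fr_FR/LC_MESSAGES/units.mo for a "units" domain
function initUnitTranslations(string $domain, string $directory): void
{
    if (!function_exists('textdomain')) {
        return; // gettext extension not loaded; translateUnit() falls back to English
    }
    bindtextdomain($domain, $directory);
    bind_textdomain_codeset($domain, 'UTF-8');
    textdomain($domain);
}

// No-op marker so `xgettext --keyword=N_` can find the literal unit names
function N_(string $msgid): string
{
    return $msgid;
}

function translateUnit(string $unit): string
{
    // gettext() returns $unit unchanged when no translation exists
    return function_exists('gettext') ? gettext($unit) : $unit;
}

$units = [N_('B'), N_('KB'), N_('MB'), N_('GB'), N_('TB'), N_('PB')];
```

With a French catalogue loaded (msgid "KB", msgstr "Ko" and so on), translateUnit('KB') would return "Ko"; without one, it returns the English abbreviation unchanged.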