Say you want to know if a particular file is encoded using UTF-8[1]. On a UNIX box, you could just use the
Now, I know that’s not right. I created the Housman & the Yeats files using vim, & vim is set to use UTF-8[2], so something is funny somewhere.
In poking around to try to figure out a better method to find out if a file is UTF-8 or not, I discovered just the command I needed: isutf8. Yes, the name of the command is “is UTF8” all crammed together & lowercased, which certainly makes it easy to remember. It’s part of the moreutils package that you can download & install. Here’s how I did it.
On my Linux box running Debian:
On my Mac, using Homebrew[3]:
Once isutf8 was installed, I tried again to see if those text files were UTF-8:
That’s right—nothing. As it should be. In typical UNIX fashion, no news is good news, & means that the command did NOT find any files that were NOT UTF-8. Or, to put it another way, all three text files were in fact UTF-8, so the command did nothing.
Let’s see what happens with some other files:
Yep. Those were definitely not UTF-8 encoded.
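If you're curious what a check like this boils down to, here's a minimal Python sketch of the same idea. To be clear, this is not how isutf8 is actually implemented, & the names non_utf8_files & main are my own inventions for illustration. It simply tries to decode each file's bytes as strict UTF-8 & reports only the failures, so silence means everything passed.

```python
def non_utf8_files(paths):
    """Return the paths whose contents are not valid UTF-8 (hypothetical helper)."""
    bad = []
    for path in paths:
        # Read raw bytes so Python does not guess an encoding for us.
        with open(path, "rb") as handle:
            data = handle.read()
        try:
            data.decode("utf-8")  # strict decoding raises on invalid byte sequences
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def main(paths):
    """Print only the failures and return a UNIX-style status: 0 if all passed, 1 otherwise."""
    failures = non_utf8_files(paths)
    for path in failures:
        print(f"{path}: not valid UTF-8")
    return 1 if failures else 0
```

To use it as a script, you could finish the file with import sys & sys.exit(main(sys.argv[1:])). One caveat: plain ASCII files pass this check too, because every ASCII file is also valid UTF-8.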
I don’t think I’ll be using isutf8 constantly, but it’s sure a handy little tool to have around.[4]
[1] If you don’t know what UTF-8 is, read the Wikipedia article. Here’s the upshot: you want all your text editors & operating systems & web browsers to support & use UTF-8 by default. It makes life a lot easier.
[2] set enc=utf-8 in my .vimrc file, of course.
[4] Eagle-eyed readers might have noticed a list of software packages that were installed along with isutf8 when I gave the Homebrew listing. Looking over the list at the moreutils site, I think I’m going to have a lot to play with & write about over the coming months:
- chronic: runs a command quietly unless it fails
- combine: combine the lines in two files using boolean operations
- ifdata: get network interface info without parsing ifconfig output
- ifne: run a program if the standard input is not empty
- isutf8: check if a file or standard input is utf-8
- lckdo: execute a program with a lock held
- mispipe: pipe two commands, returning the exit status of the first
- parallel: run multiple jobs at once
- pee: tee standard input to pipes
- sponge: soak up standard input and write to a file
- ts: timestamp standard input
- vidir: edit a directory in your text editor
- vipe: insert a text editor into a pipe
- zrun: automatically uncompress arguments to command