M-x all-things-emacs

Quick Tip: dos2unix, et al

April 30th, 2007 by Ryan McGeary · 8 Comments

I despise the fact that we live in a world with different end-of-line file formats. Windows/DOS uses CRLF, Unix uses LF, and Macs used to use CR[1]. Thankfully, Macs started to adopt the Unix format when OS X was released; if only Windows could do the same.

What I despise even more is that some editors seem incapable of telling a DOS file from a Unix file. There’s nothing worse than finding a once-perfect Unix file corrupted by a small section of lines with CRLFs while the rest of the file keeps only LFs. Most of the time, the blame can be placed on one’s editor configuration, but I also blame some editor defaults for not at least maintaining the format that the file was opened in. To be fair, most power editors like emacs, vim, TextMate, etc. behave “correctly” by default and keep the format that the file was opened in, but many others (unnamed) do not.

There’s not a whole lot we can do to avoid these problems without hounding our peers, but there are ways to fix these problems after they’re found.

Let’s fix the nastier problem first. When you find a file corrupted with half LFs and half CRLFs, strip out the ^M (CR) characters with a quick search and replace. Move to the top of the buffer first (M-<), since query-replace only works from point onward. Then run M-% (query-replace) and substitute C-q C-m with nothing. C-q runs quoted-insert and is useful for inserting control characters (e.g. ^M, entered as C-m). Afterwards, hit the exclamation point (!) to tell query-replace to replace all remaining matches without asking.
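If you want to check a file from outside Emacs, the same cleanup can be sketched in a few lines of Python. This is only an illustration and not part of the tip above: the function names count_line_endings and strip_carriage_returns are made up here, and the sketch assumes the whole file fits in memory and that every CR in it is unwanted, just like the replace-everything approach.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF and bare-LF line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to get the bare LFs.
    lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "lf": lf}


def strip_carriage_returns(data: bytes) -> bytes:
    """Remove every CR byte, mirroring 'replace ^M with nothing'."""
    return data.replace(b"\r", b"")


# Hypothetical sample: two Unix lines and two DOS lines in one file.
mixed = b"one\ntwo\r\nthree\r\nfour\n"
print(count_line_endings(mixed))      # mixed when both counts are nonzero
print(strip_carriage_returns(mixed))
```

Working on bytes rather than text keeps Python from translating the newlines itself before you get to look at them.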

Other times, you will run into DOS-formatted files and will just want to convert them to Unix format for consistency’s sake. To do this, open the buffer and run C-x <RET> f, then enter unix or undecided-unix when prompted for the new coding system. This runs set-buffer-file-coding-system; once you save the buffer, the result is very similar to running dos2unix myfile.txt at the command line.
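For comparison, here is roughly what a whole-file conversion looks like when scripted. It is a minimal sketch, assuming the file is small enough to read in one go; dos_to_unix is a hypothetical helper name, not the real dos2unix tool, and it makes no attempt to cover that tool's other options.

```python
from pathlib import Path


def dos_to_unix(path: str) -> int:
    """Rewrite the file at `path` with LF endings.

    Returns the number of CRLF pairs that were converted.
    """
    target = Path(path)
    data = target.read_bytes()  # bytes, so Python does no newline translation
    converted = data.count(b"\r\n")
    if converted:
        # Only touch the file when there is something to fix.
        target.write_bytes(data.replace(b"\r\n", b"\n"))
    return converted
```

Calling dos_to_unix("myfile.txt") rewrites the file in place, and a second call on the same file returns 0 because nothing is left to convert. Unlike the ^M search and replace, this only touches a CR that sits directly before an LF.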

[1] CR is Carriage Return. LF is Line Feed (aka Newline).

Tags: osx · quick · tips · unix · windows

8 responses so far ↓

  • 1 James // Apr 30, 2007 at 3:27 pm

    Fantastic. I never knew that ‘!’ turns a q-r-r into a replace-regexp. I’ve always just done a C-g, then run a replace-regexp reusing the last replacement.

  • 2 Peter // Apr 30, 2007 at 3:50 pm

    Have you seen the package: http://centaur.maths.qmul.ac.uk/Emacs/files/eol-conversion.el

  • 3 Christoph // May 1, 2007 at 9:03 pm

    C-x <RET> C-f in fact is C-x <RET> f.

  • 4 Ryan McGeary // May 2, 2007 at 12:39 am

    Thanks Christoph. I fixed the typo. I’m sorry if this caused anyone else unnecessary confusion.

  • 5 David Rysdam // Jul 22, 2010 at 3:30 pm

    Just found your blog and it’s great. Only problem: How am I going to remember these tips when I need them?

    Anyway: Run M-% (query-replace) and substitute C-q C-m with nothing. ... Afterwards hit the exclamation point (!) to tell query-replace to replace all matches with no questions.

    Why not M-x replace-string and then no (!)?

  • 6 David Rysdam // Jul 22, 2010 at 3:32 pm

    Oh man, I thought this entry was from a few months ago but it’s actually 3 years old. Errr…

  • 7 Ryan McGeary // Jul 23, 2010 at 12:28 am

    David,

    Good point, but mostly, it’s because I don’t have replace-string bound to a key binding. M-% is ingrained in my typical workflow.

  • 8 anon // Jun 28, 2012 at 12:56 am

    Tips can easily be converted to a macro and saved; thus reused