Posts from June 2008

Search & Replace with vim

I was presented with an 11,000 line text file a couple of days ago and tasked with importing the contents of that file into a database.

Here’s a little snippet of the file.

08-06-06 16:29:07,127.0.0.1,first call
08-06-06 16:30:24,127.0.0.1,first call
08-06-06 16:32:11,127.0.0.1,first call
08-06-06 16:34:38,127.0.0.1,first call
08-07-06 16:07:49,127.0.0.1,first call
08-07-06 16:10:46,127.0.0.1,first call
08-07-06 16:11:31,127.0.0.1,first call
08-07-06 16:30:20,127.0.0.1,first call
08-07-06 16:40:01,127.0.0.1,first call
08-07-06 16:40:14,127.0.0.1,first call
08-07-06 18:14:10,127.0.0.1,first call
08-07-06 22:21:12,127.0.0.1,first call

The biggest problem was that the date/time was in the wrong format. It really needed to be in YYYY-mm-dd HH:MM:SS format so that MySQL could treat it as a real date.

My first thought was to throw together a little Python script to parse the file, line by line, and insert it into the database. But, the more I thought about it, the more that approach started to seem like overkill. My second thought was to use vim & a little bit of regex fun and just turn the whole file into a gigantic INSERT sql statement.  There will probably turn out to be an even easier way that I don’t know about.

Either way, that led me to two things that I’d never done in vim before: doing a find/replace using a regex more complicated than %s/foo/bar/g, and using backreferences.

My general knowledge of backreferences was spotty at best so I spent a while just reading about those before figuring out the exact, vim-specific, syntax and how it applied to my situation.

Which takes us to, how do we fix that date?

Let’s start with the regex to just match the MM-DD-YY part of each line

^\d{2}-\d{2}-\d{2}\s+

(Here’s the same thing split out line-by-line)

^\d{2} (?#Match the 2-digit month)
-\d{2} (?#Match the 2-digit day)
-\d{2} (?#Match the 2-digit year)
\s+ (?#Match the whitespace between the date & time)

That will match the part of the line beginning with 2 numbers followed by a - followed by 2 more numbers followed by a - followed by 2 more numbers followed by some sort of white space.
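For comparison, here is a minimal sketch of the same match outside of vim, using Python's standard `re` module. The names `DATE_PREFIX`, `line`, and `matched_text` are just illustrative, and the sample line is taken from the snippet above.

```python
import re

# Two digits, a dash, two digits, a dash, two digits, then whitespace,
# anchored at the start of the line.
DATE_PREFIX = re.compile(r"^\d{2}-\d{2}-\d{2}\s+")

line = "08-06-06 16:29:07,127.0.0.1,first call"
match = DATE_PREFIX.match(line)

# group(0) is the whole matched text: the date plus the trailing space.
matched_text = match.group(0) if match else None
print(repr(matched_text))
```

Only the date column and the space after it are matched; the time, IP, and text are left alone.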

The thing that really threw me is that vim requires that you escape all of the things like {, }, +, etc. So the regex then becomes

^\d\{2\}-\d\{2\}-\d\{2\}\s\+

You can see in the image below how just that first column with the dates is being matched.

You can also see how I’m replacing the entire date with “foo” which is not very useful.

To actually reorder it, we need the backreferences which ends up being pretty easy. Just put parentheses around each expression you’re going to refer back to later. And don’t forget to escape those parens!

%s/^\(\d\{2\}\)-\(\d\{2\}\)-\(\d\{2\}\)\s\+/foo/gc

(Here’s the same thing split out line-by-line)

^(\d{2}) (?#Match the month and capture to backreference 1)
-(\d{2}) (?#Match the day and capture to backreference 2)
-(\d{2}) (?#Match the year and capture to backreference 3)
\s+ (?#Match the whitespace between date & time)

That looks really ugly but it’s exactly the same as before except for the parens. And having those backreferences means we can use \1, \2, & \3 to refer to those three fields in the replace section.

In this case \1 is the Month, \2 is the Day, & \3 is the Year.

So to put it into YYYY-mm-dd format, we need to reorder things and make the year 4 digits.

%s/^\(\d\{2\}\)-\(\d\{2\}\)-\(\d\{2\}\)\s\+/20\3-\1-\2 /gc

You can see below how it’s going through and rearranging the dates. I like to use the gc option when doing find/replaces so it prompts me before each replace. That way I can double-check that it’s doing what I want before I tell it to go ahead and do the whole file.
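The same reordering can be sketched in Python with `re.sub`, which also supports backreferences in the replacement. This is only an illustration, not what I actually ran: `fix_date` is a made-up helper name, and the hard-coded "20" assumes every date in the file falls in the 2000s.

```python
import re

# Groups 1, 2, and 3 capture the month, day, and two-digit year.
PATTERN = re.compile(r"^(\d{2})-(\d{2})-(\d{2})\s+")

def fix_date(line: str) -> str:
    # \g<3> is Python's explicit backreference syntax. The literal "20"
    # is an assumption that all years are 20xx.
    return PATTERN.sub(r"20\g<3>-\g<1>-\g<2> ", line)

print(fix_date("08-06-06 16:29:07,127.0.0.1,first call"))
# prints: 2006-08-06 16:29:07,127.0.0.1,first call
```

Lines that don't start with a date in that shape come back unchanged, which is roughly the safety net that the c flag gives you in vim.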

The rest is pretty easy. We just need to quote each comma-delimited field & put parentheses at the beginning and end of each line. People with stronger regex-fu than me can probably do this in one step. I’m going to do it in three because it’s easier for me to conceptualize.

Put a (' at the beginning of each line

%s/^/('/g

Put a '), at the end of each line. That will leave a trailing comma at the end of the file that needs to be removed.

%s/$/'),/g

Put quotes around each of the fields by replacing every comma with ','.

%s/,/','/g

Then I just put the rest of the insert statement at the beginning of the file and I’m done.

INSERT INTO `theTable`
(`theDate`, `theIP`, `theText`)
VALUES
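For anyone curious what the "little Python script" version of these steps might look like, here is a rough sketch. It is not something I ran against the real file: `to_insert` is a hypothetical helper, the "20" century prefix is an assumption, and the plain string quoting only works because none of the fields contain a comma or a single quote.

```python
import re

DATE = re.compile(r"^(\d{2})-(\d{2})-(\d{2})\s+")

def to_insert(lines):
    """Build one multi-row INSERT statement from the raw log lines."""
    rows = []
    for line in lines:
        # Step 1: reorder MM-DD-YY into YYYY-mm-dd (assumes years are 20xx).
        line = DATE.sub(r"20\g<3>-\g<1>-\g<2> ", line.rstrip("\n"))
        # Step 2: quote the fields by turning each comma into ','.
        line = line.replace(",", "','")
        # Step 3: wrap the whole line in (' and ').
        rows.append("('" + line + "')")
    header = "INSERT INTO `theTable` (`theDate`, `theIP`, `theText`) VALUES\n"
    # Joining the rows avoids the trailing comma that the vim approach leaves.
    return header + ",\n".join(rows) + ";"

sample = [
    "08-06-06 16:29:07,127.0.0.1,first call\n",
    "08-07-06 16:07:49,127.0.0.1,first call\n",
]
sql = to_insert(sample)
print(sql)
```

With data that might contain quotes or commas, a parameterized query through a database driver would be the safer route than building the string by hand.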

The whole thing took about 15 minutes, including the time I spent reading up on backreferences. That’s probably the same amount of time it would have taken me to throw together a little script to do the same thing but I think the benefit comes from learning more about complex find/replaces in vim.

The Editor War

I have a thought for everyone regarding the editor holy-war that flares up from time to time.

It doesn’t matter which editor you use as long as you’re productive.

That said, I’m always trying new editors in the hopes of finding something that makes me more efficient. But I also don’t care in the slightest which editor you use.

Here’s a short list of editors that I’ve used on a regular basis over the years:

Emacs

When I was in college, the CS department had 2 computer labs. The first lab had 1 SGI workstation and a couple of dumb-terminals that connected to the main SunOS box that served the campus. The second lab had 10 underpowered Gateway boxes running RedHat 4. The computers in the first lab were the only ones that had X-Windows installed so they were usually taken. I didn’t want to have to deal with a modem connection from the Macintosh labs in the library so that meant living in the command-line world of the RH 4 boxes.

I liked Emacs but never really got beyond a very basic level with it. I gave up on it when I started running my own linux server at home. The box I was using had very low specs and vim was much faster to load than Emacs. I learned the basics of vim and that was enough to get by with.

UltraEdit

I was always on the lookout for a decent Windows-based editor since I was dividing my time between Windows & Linux. I don’t remember how I found out about UltraEdit but it’s still my favorite Windows editor of all time. It’s lightweight, inexpensive, and very full-featured. There’s one feature that stands out above all others: the ability to edit remote files over SFTP/FTP. That single feature was my main reason for using UltraEdit and the feature that convinced me to quit using the trial version and pay for the full version. I’ve yet to see another editor that handles remote editing as seamlessly.

Textpad

I’ll only mention Textpad briefly because I’m sometimes forced to use it at work. There are so many things that bother me about it that it’s hard to know where to start. The #1 reason that I hate Textpad with the fire of a thousand suns is non-standard keyboard shortcuts. For example, find & replace? Press F5. All of the shortcuts that I’ve been using for so long that they’re part of my muscle-memory are useless in Textpad. That’s such a basic thing and it completely destroys my usability experience with Textpad.

jEdit

I went in for an interview at a PHP development shop a while back and they used jEdit exclusively. It looked interesting so I downloaded it and gave it a try. I liked the cross-platform aspect of it and it was zippy enough that I wasn’t always aware that I was using a Java app. It also has the ability to assign custom keyboard shortcuts to pretty much anything which is good. But the remote file editing kept causing it to crash, the XML validator was clunky and hard to use, and I could never get it to properly format Python scripts so I eventually gave up on it. It may be worth trying again someday but I’m not willing to make the time investment I think it will take to get really good at it.

Vim

I’ve been using Vim (or gVim on Windows) on and off for years now but, about 4 months ago, I decided that I was going to spend a month using nothing but vim and see how good I got. I’m a very keyboard oriented person so switching between the mouse & keyboard can be jarring and disruptive. Even UltraEdit bothered me with how much I still had to use the mouse. So I printed out a bunch of cheat sheets of vim commands, ran through the vim tutor, and plastered post-it notes of commonly used functions all over my monitors. I also uninstalled UltraEdit so I wouldn’t be tempted.

That’s what really made the difference. Using vim casually can feel frustrating and slow but spending the time to learn as many different keyboard commands as possible is completely worth it. I’m about a thousand times faster with vim than I am with anything else and I’m learning new things about it all the time. That initial learning curve was worth getting through and I have no plans to switch to anything else near term.

I love that I can keep my vimrc file on my thumbdrive so I can easily copy it from machine to machine and get exactly the same behavior everywhere. I love that every *nix machine I ever work on will have some flavor of it that I can use.

Pretty much the only thing I don’t like is that I can’t remote-edit files from gVim on Windows. I can open up a putty window and use vim from there but it’s not as convenient. I’m also so used to the vim keyboard commands that I end up messing things up by trying to use them in other programs out of habit.

But, if you only take one thing away from this post, it’s that it doesn’t matter what you use. Find something you like and spend the time it takes to get really good at it. You may be slow in the short term while you’re learning but you’ll make your time investment back ten-fold once you’re up to speed.

And make sure to always keep an eye out for something better.

Interviewing is hard and annoying

We’ve been interviewing on and off for the last couple of months. We’ve brought in 13 people, made roughly 4 offers (none of which were accepted), and have yet to see someone that I really felt we had to have.

I’ve identified two major reasons why we can’t hire anyone good:

  • We can’t pay premium salaries for really talented people
  • There’s a lot of mandatory overtime.

Let’s start with the first one. We’re not a huge company so, on the surface, paying above-average salaries probably seems like a waste. What I can’t get anyone to realize is that this is horribly wrong. Hiring one truly talented person and paying them what they’re worth would bring more benefit to the company than the legions of kids-right-out-of-school or the out-of-work-and-desperate who get shuffled through our team.

Sadly the people who have the final say about how much money goes out don’t really understand what it means to be a truly good programmer and consequently can’t see the benefit of paying for one.

But let’s say that I find someone to overcome the money obstacle. That leaves #2. The kind of person I really want to hire isn’t going to stand for the sort of death marches that everyone here is accustomed to. This person is going to have the pick of jobs in this area, if they’re even looking. It’s going to take something a hell of a lot more appealing than just money to woo them.

So that substantially shrinks our interviewing pool, in turn creating the next issue. It’s hard to find competent people.

I’d thought all of this stuff had been pretty well covered by other people but some people still aren’t listening. So here are a few tips for you job-hunting programmers.

  1. Proofread your resume.  Spellcheck it. Then get 2-3 other people to read it. Then spell check it again. Then give it to more people to read it.  Writing “ect” instead of “etc”, “firer” instead of “fire”, mixing past & present tense, etc (See my little joke there?) will pretty much kill your chances with me from the beginning.  This is such an easy one to get right and so many people blow it.  I wish I lived in a world where you wouldn’t actually get to come in for an interview but I’m not that lucky.
  2. Get your technologies right. Claiming to be an expert in “Java Script” or “My Sequel” is not going to impress me.  Again, I’d like to say you won’t even be considered for an interview but some people don’t look as critically on those kinds of mistakes as I do.
  3. You get extra points if you bring code samples without being asked.
  4. Don’t act surprised when I ask you to write code as part of the interview.  It kills me how many people trying to get hired as programmers completely panic when they find out they have to write code.
  5. Don’t put a skill on your resume if you don’t really know anything about it. If it was something you studied in school 4 years ago and haven’t used since, take it off because you’re just going to look like a fool when I ask you questions you can’t answer.
  6. Know something about our company. “I don’t know why specifically I want to work here. I just want to stay in Louisville” is not going to give me warm-fuzzies when I ask why you want to work for us.

Now for the testing. I always ask people to write a very small amount of code and I have a few different questions that I like to ask (most of them, I’ve cribbed from other people). I usually vary the questions depending on the person I’m interviewing and I give a lot of leeway on syntax since this is being done on paper and/or the whiteboard.  I just want to know if you can think logically and if you know at least a little bit of the languages you claim to know.

  • Someone right out of school without much language specific experience might get a recursion problem (Fibonacci, multiplication, and so on).
  • I ask C programmers pointer questions (that I “borrowed” from Joel Spolsky).
  • I ask Java & PHP people to come up with a data structure for storing a list of people and information about them, then sort it and print it out.  I specifically ask people not to use a database and you fail if you ignore me and use a database anyway.
  • People with database experience get asked to draw some basic tables to store report information. There are some one-to-many relationships & some normalization issues to consider. Then they get asked to come up with SQL for certain queries.

And it kills me how many people claiming years of experience can’t do these simple things. I know this has been discussed to death (Google “FizzBuzz”) but it’s still mind-boggling to me.

How did these people get jobs they claimed to have before? Were they lying? Are they just padding things to get the next job?

And, more importantly, how do I get the people who can do these tests (and aren’t psychopaths) to work for us?