OpenRefine

I use OpenRefine every single day. I work with a lot of data, and OpenRefine has the tools I need to help import the data, make sense of it, clean it up, give it new structure, and then export it in the format I want.

It’s pretty easy to use, but its feature set is so extensive that it’s hard to take in all at once. So, here I present to you my greatest hits list of OpenRefine resources. We’ll be updating it constantly as we discover new resources, tips, and tricks.

If you’re looking for an understanding of what OpenRefine offers that other data parsing tools don’t, this is a great overview. And this may be the closest there is to a master list of OpenRefine resources on the web.

Manual

As far as I can tell, this is the only OpenRefine manual out there. It was created by the geniuses at Free Your Metadata.

Tutorials

Here’s a tutorial from folks at UT Austin that has lots of helpful examples.

Enipedia (a group out of the Netherlands devoted to energy, industry, and open data) offers a completely stellar OpenRefine tutorial that takes you through the major tools OpenRefine offers. Run through the whole tutorial here.

If you want to clean up your messy data (don’t we all?!), Free Your Metadata offers a screencast and step-by-step lesson.

Did you know that you can reconcile your sorta-standard vocabularies to ABSOLUTELY STANDARD vocabularies using OpenRefine tools? Is this blowing your mind? IS THIS A GAME CHANGER? Again, the folks at Free Your Metadata show you how.

Let’s get advanced. Do you want to turn a crazy Word document into structured data? I’m going to write a tutorial on how I did this in the next couple of weeks, but I learned everything I needed to know from my friend Christie Peterson, who in turn learned it from this tutorial by David Huynh. Go to section five and see what he has to say. He’s an engineer at Google, and while this tutorial is very straightforward, it’s not as hand-holdy as the ones from the Free Your Metadata folks.

School of Data has a very friendly tutorial to get you started.

This is much less relevant to me, but a tutorial for geospatial data can be found here.

Recipes and Getting It Done

Something that I LOVE is that Refine can save your steps and let you do the same actions over and over onto different data sets. Some folks have published these steps as “recipes.”

Here’s a recipe to help you analyze Apache server logs.

Wanna remove duplicate rows? Here’s how.
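
If it helps to see the idea outside of Refine, here’s a minimal Python sketch. It is my own illustration and not the linked recipe: the function name is made up, and it assumes two rows count as duplicates only when every cell matches exactly.

```python
def remove_duplicate_rows(rows):
    """Keep the first occurrence of each row, preserving the original order.

    Hypothetical helper for illustration: rows are lists of cell values,
    and two rows are duplicates only if every cell is identical.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [["Box 1", "Letters"], ["Box 2", "Diaries"], ["Box 1", "Letters"]]
print(remove_duplicate_rows(rows))
```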

Finding nearly duplicate items.
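
One common way to catch near-duplicates is to boil each value down to a normalized key and group values that share a key. The Python sketch below is a simplified illustration of that idea, loosely modeled on fingerprint-style keying. It is an assumption on my part that this matches what you need, it is not the linked recipe, and the function names are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: trim, lowercase, drop ASCII punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster_near_duplicates(values):
    """Group values that share a fingerprint; return only groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = ["Smith, John", "john smith", "Jane Doe"]
print(cluster_near_duplicates(names))
```

This only catches differences in case, punctuation, spacing, and word order. Actual misspellings need a fuzzier comparison.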

Capitalize the first letter of a string.
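
For comparison, here’s the same transformation as a small Python sketch (my own illustration, with a made-up function name, not the linked recipe). It uppercases only the first character and leaves the rest of the string alone.

```python
def capitalize_first(value):
    """Uppercase the first character and leave the rest untouched."""
    if not value:
        return value  # nothing to do for an empty string
    return value[0].upper() + value[1:]


print(capitalize_first("finding aid for the Smith papers"))
```

I’m deliberately not using Python’s built-in `str.capitalize()` here, because it also lowercases everything after the first letter, which would mangle proper nouns like “Smith.”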

Here’s what it looks like to use a recipe.

Here’s how to make a column that just has part of a string of characters. (For instance, maybe you know that heading information is always at the beginning of a note, or maybe you want to pull a date out of a title.)
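
To make those two cases concrete, here’s a rough Python sketch. It is my own illustration rather than the linked recipe: the function names are hypothetical, and the year pattern assumes four-digit years between 1500 and 2099, which may not fit your data.

```python
import re


def leading_part(note, length):
    """Return the first `length` characters of a note."""
    return note[:length]


def first_year(title):
    """Return the first four-digit year (1500-2099) found in a title, or None."""
    match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", title)
    return match.group(1) if match else None


print(leading_part("Heading: rest of the note", 7))
print(first_year("Annual report, 1923"))
```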

How to add leading zeroes to a number (this could be SUPER USEFUL for formatting call numbers or accession numbers).
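
Outside of Refine, zero-padding looks something like this in Python. This is a minimal sketch with a made-up function name, assuming you want every number padded to the same fixed width.

```python
def pad_number(number, width):
    """Left-pad a number (or numeric string) with zeroes to a fixed width."""
    return str(number).zfill(width)


print(pad_number(42, 5))
```

Padding to a consistent width matters mostly for sorting: as plain text, “10” sorts before “9”, but “0010” sorts after “0009”.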

FEAR NOT GREL. General Refine Expression Language is a powerful tool for whipping your data into shape. An explanation of GREL is here. Much more useful examples are here.

General Problem Solving

StackOverflow has a pretty good community for OpenRefine. They may be mean to you. Don’t be afraid, but do check to see if your question has been addressed already.
