You may not think you've got much in common with an investigative journalist or an academic medical researcher. But if you're trying to extract useful information from an ever-increasing inflow of data, you'll likely find visualization useful -- whether it's to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience.
There are many tools around to help turn data into graphics, but they can carry hefty price tags. The cost can make sense for professionals whose primary job is to find meaning in mountains of information, but you might not be able to justify such an expense if you or your users only need a graphics application from time to time, or if your budget for new tools is somewhat limited. If one of the higher-priced options is out of your reach, there are a surprising number of highly robust tools for data visualization and analysis that are available at no charge.
Want to see all the tools at once?
For quick reference, check out our chart listing 22 free data visualization tools.
Here's a rundown of some of the better-known options, many of which were demonstrated at the Computer-Assisted Reporting (CAR) conference last month. Others are not as well known but show great promise. They range from easy enough for a beginner (i.e., anyone who can do rudimentary spreadsheet data entry) to expert (requiring hands-on coding). But they all share one important characteristic: They're free. Your only investment: time.
Before you can analyze and visualize data, it often needs to be "cleaned." What does that mean? Perhaps some entries list "New York City" while others say "New York, NY" and you need to standardize them before you can see patterns. There might be some records with misspellings or numerical data-entry errors. The following two tools are designed to help get your data in tip-top shape to be analyzed.
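To make the idea of standardizing concrete, here is a minimal Python sketch of the "New York City" versus "New York, NY" case. The lookup table, the `standardize` helper, and the sample records are all hypothetical and only illustrate the concept; this is not how either tool below is implemented.

```python
# Hypothetical lookup table mapping known variants to one canonical label.
CANONICAL = {
    "new york city": "New York, NY",
    "new york, ny": "New York, NY",
    "nyc": "New York, NY",
}


def standardize(value):
    """Return the canonical spelling for a known variant, else the trimmed input."""
    # Lowercase and collapse runs of whitespace so minor differences still match.
    key = " ".join(value.lower().split())
    return CANONICAL.get(key, value.strip())


# Made-up sample records with inconsistent spellings.
records = ["New York City", "new york, NY ", "Boston, MA"]
cleaned = [standardize(r) for r in records]
```

A hand-built table like this only works when the variants are known in advance, which is why tools that suggest likely matches can save time on larger data sets.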
What it does: DataWrangler, a Web-based service from Stanford University's Visualization Group, is designed for cleaning and rearranging data so it's in a form that other tools such as a spreadsheet app can use.
Click on a row or column, and DataWrangler will suggest changes. For example, if you click on a blank row, several suggestions pop up such as "delete row" or "delete empty rows."
There's also a history list that allows for easy undo.
Drawbacks: I found that unexpected changes occurred as I attempted to explore DataWrangler's options; I constantly had to click "clear" to reset. And not all suggestions are useful ("promote row to header" seemed an odd suggestion when the row was blank) or easy to understand ("fold split 1 using 2 as key").
And while the fact that DataWrangler is a Web-based service makes it convenient to use, don't forget that it sends your data off to an external site -- which means it isn't an option for sensitive internal information. However, there are plans for a future release of a stand-alone desktop version. Another important thing to keep in mind is that DataWrangler is currently alpha code, and its creators say it's "still a work in progress."
Skill level: Advanced beginner.
Runs on: Any Web browser.
What it does: Google Refine can be described as a spreadsheet on steroids for taking a first look at both text and numerical data. Like Excel, it can import and export data in a number of formats, including tab- and comma-separated text files and Excel, XML and JSON files.
Refine features several built-in algorithms that find text items that are spelled differently but actually should be grouped together. After importing your data, you simply select edit cells --> cluster and edit and select which algorithm you want to use. After Refine runs, you decide whether to accept or reject each suggestion. For example, you could say yes to combining Microsoft and Microsoft Corp., but no to combining Coach Inc. with CQG Inc. If it's offering too few or too many suggestions, you can change the strength of the suggestion function.
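One general way to find text items that are spelled differently but belong together is a "key collision" approach: reduce each value to a normalized key and group the values whose keys match. The Python sketch below illustrates that idea under stated assumptions; the `fingerprint` and `cluster` functions and the sample names are hypothetical, and this is not presented as Refine's own code.

```python
import string
from collections import defaultdict


def fingerprint(text):
    """Build a normalized key: lowercase, strip punctuation, sort unique tokens."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(stripped.split())))


def cluster(values):
    """Group values whose keys collide; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Made-up sample data.
names = ["Microsoft Corp.", "microsoft corp", "Corp. Microsoft",
         "Coach Inc.", "CQG Inc."]
suggestions = cluster(names)
```

In this sketch the three Microsoft spellings land in one suggested group, while Coach Inc. and CQG Inc. stay separate. A strict key match like this would not pair "Microsoft" with "Microsoft Corp."; catching that kind of near match takes a looser, distance-based comparison, which is also where false positives start to appear.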
There are also numerical options that offer quick and easy overviews of data distributions. This functionality can reveal anomalies that might be the result of data input errors, such as $800,000 instead of $80,000 for a salary entry. It can also expose inconsistencies, such as differences in the way compensation data is reported from entry to entry, with some showing, say, hourly wages and others showing weekly pay or yearly salaries.
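The kind of check a distribution overview makes possible can be sketched in a few lines of Python. This is an illustrative assumption, not Refine's method: the `flag_outliers` helper, the factor of 5 and the sample figures are all hypothetical, and the rule simply flags entries that sit far above or below the median.

```python
from statistics import median


def flag_outliers(values, factor=5.0):
    """Return values more than `factor` times above or below the median."""
    mid = median(values)
    return [v for v in values if v > mid * factor or v < mid / factor]


# Made-up salary column: one entry has an extra zero, one looks like an hourly wage.
salaries = [72000, 80000, 85000, 91000, 800000, 78000, 38.5]
suspect = flag_outliers(salaries)
```

Here both the $800,000 entry and the 38.5 entry are flagged for review. A crude rule like this only points at candidates; deciding whether a value is a typo, a different pay unit or a genuine outlier still takes a human look.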
Beyond data housekeeping, Google Refine offers some useful analysis tools, such as sorting and filtering.
What's cool: Once you get used to which commands do what, this is a powerful tool for data manipulation and analysis that strikes a good balance between functionality and ease of use. The undo/redo list of every action you've taken lets you roll back when needed. And text functions handle Java-syntax regular expressions, allowing you to look for patterns (such as, say, three numbers followed by two digits) as well as specific text strings and numbers.
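As a rough illustration of pattern matching with regular expressions, here is a short Python sketch. The pattern (three digits, a hyphen, then two digits) and the sample cells are hypothetical; a simple expression like this one is written the same way in Java syntax.

```python
import re

# Hypothetical pattern: exactly three digits, a hyphen, then exactly two digits.
PATTERN = re.compile(r"\b\d{3}-\d{2}\b")

# Made-up cell values to search.
cells = ["Order 123-45 shipped", "ref 12-345", "no code here", "batch 987-65"]

# Keep the matched text from each cell that contains the pattern.
matches = [PATTERN.search(c).group() for c in cells if PATTERN.search(c)]
```

Only the first and last cells contain the pattern; "12-345" has the digits grouped the wrong way round, so it is skipped.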
Finally, while this is a browser-based application, it works with files on your desktop, so your data remains local.
Drawbacks: Although Google Refine looks like a spreadsheet, you can't do typical spreadsheet calculations with it; for that, you must export to a conventional spreadsheet application. If you've got a large data set, carve out some time in your day to go through all of Refine's suggested changes, since it can take a while. And, depending on the data set, be prepared when looking for text items to merge: You're likely to get either a lot of false positives or missed problems -- or both.
Skill level: Advanced beginner. Knowledge of data analysis concepts is more important than technical prowess; power Excel users who understand data-cleaning needs should be comfortable with this.
Runs on: Windows, Mac OS X (if it appears to do nothing after loading on a Mac, point a browser manually to http://127.0.0.1:3333/ ), Linux.