Library Data Mashups

Samantha Becker, Research Project Manager, University of Washington Information School
Michael Crandall, Senior Lecturer, University of Washington Information School
Rebecca Blakewood, Research Analyst, University of Washington Information School

First steps

  • decide what you want to know
  • find out what data is available
  • figure out where you can get the data from

What data to mash?

National data

  • IMLS Public Library Survey
  • Edge Initiative
  • Impact Survey/American Community Survey
  • Census

Local/Regional data

  • Community indicators
  • City/county data
  • Community anchor institutions or agencies
  • Hospitals and school districts are good resources, too
  • Local media outlets — may not share actual dataset, but might share results

Your own library data

  • Library use stats
  • Circulation stats, including collection use and other usage
  • Patron surveys

Combining datasets

Conceptual mash: guide decision making; give a roughly right idea of the community; reveal areas for further research; raise questions to validate through a community needs assessment.

National vs. Texas vs. New Braunfels conceptual data mashup comparison: married-couple family; never married; elementary school (grades 1-5); Hispanic; share of non-English households that speak Spanish at home.

Pew Library Typology Report — “Distant Admirers” 3/14/14

Largest share of Hispanics (27%) of any group; less likely to have broadband access at home; more likely to feel overloaded with too much info; less likely to feel finding info is “easy”; 43% say their children have visited the library in the past year vs. 70%; 78% know where the library is.

New Braunfels conceptual mash:

  • Large Hispanic population suggests likelihood of greater population of “Distant Admirers”
  • Higher percentage of married households with young children, plus Pew observation they are more likely to highly value libraries, suggests outreach strategy to Hispanic families with young children.

An actual mash: Edge & PLS

Are there differences in Edge scores that are attributable to library characteristics?

  • Very small (under 5,000 LSA)
  • Small (5,001-15,000 LSA)
  • Medium single location (15,001-65,000 LSA)
  • Medium multiple locations (15,001-65,000 LSA)
  • Medium-large (65,001-175,000 LSA)
  • Large (175,001-300,000 LSA)
  • Extra Large (> 300,000 LSA)
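As a minimal sketch of this kind of mash, the PLS and Edge extracts could be joined on the PLS library identifier (FSCSKEY) and binned into the peer groups above. The column names and sample values below are invented; note the medium single- vs. multiple-location split would additionally need outlet counts, which the simple binning here ignores.

```python
import pandas as pd

# Toy stand-ins for the real PLS and Edge extracts (values invented)
pls = pd.DataFrame({
    "FSCSKEY": ["TX0001", "TX0002", "TX0003"],
    "POPU_LSA": [3200, 48000, 410000],  # legal service area population
})
edge = pd.DataFrame({
    "FSCSKEY": ["TX0001", "TX0002", "TX0003"],
    "edge_score": [540, 620, 710],
})

# Bin LSA into the peer groups listed above (single- vs. multi-location
# medium libraries would also need an outlet count, omitted here)
bins = [0, 5000, 15000, 65000, 175000, 300000, float("inf")]
labels = ["Very small", "Small", "Medium", "Medium-large", "Large", "Extra large"]
pls["peer_group"] = pd.cut(pls["POPU_LSA"], bins=bins, labels=labels)

# Join the two datasets on the shared library identifier
mashed = pls.merge(edge, on="FSCSKEY")
print(mashed[["FSCSKEY", "peer_group", "edge_score"]])
```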

Scores increase fairly regularly as you go from very small to extra-large libraries: what does that mean?

The difference isn’t really a true difference. While larger libraries score higher on average, smaller libraries still manage to get high scores. But there is more variability in the scores of the smallest libraries, suggesting something else is going on.

Scores range from roughly 200 to 800 points for the very small libraries.

So what is going on?

  • The variance is “real” (ANOVA p = .001)
  • But only between extra-large and other peer groups
  • Could it have something to do with community characteristics?
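The peer-group comparison above can be checked with a one-way ANOVA; here is a sketch using scipy with invented score samples (the real test would use the actual Edge scores per peer group):

```python
from scipy import stats

# Invented Edge-score samples: wide spread for very small libraries,
# higher and tighter scores for extra-large ones (illustrative only)
very_small = [250, 480, 700, 320, 610]
extra_large = [720, 750, 690, 740, 710]

# One-way ANOVA: is the between-group difference "real"?
f_stat, p_value = stats.f_oneway(very_small, extra_large)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A post-hoc test (e.g. Tukey’s HSD) would be needed to pin the difference to specific peer-group pairs, as the notes do for extra-large vs. the rest.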

A more complex actual mash: Multilevel modeling

  • State: influences on library support — politics; economy
  • Community: influences on library use: income; education
  • Library: resources and services: per capita funding; square feet

Do these factors lead to higher or lower Edge scores?
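One way to sketch such a multilevel model is a random-intercept regression with libraries grouped by state, as in statsmodels’ MixedLM. Everything below (variable names, simulated data) is invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate libraries nested in states (all values invented)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "state": rng.choice(["TX", "WA", "OH", "GA"], size=n),
    "per_capita_funding": rng.uniform(10, 60, size=n),
    "median_income": rng.uniform(30_000, 90_000, size=n),
})
# Outcome with a state-level effect plus library-level predictors
state_effect = {"TX": -20, "WA": 30, "OH": 0, "GA": -10}
df["edge_score"] = (
    400
    + 3 * df["per_capita_funding"]
    + 0.0005 * df["median_income"]
    + df["state"].map(state_effect)
    + rng.normal(0, 25, size=n)
)

# Random intercept per state; fixed effects for community/library predictors
model = smf.mixedlm(
    "edge_score ~ per_capita_funding + median_income",
    df, groups=df["state"],
)
result = model.fit()
print(result.params)
```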

Data analysis and visualization tools

Pros and cons: tools vary in

  • Handling large data files
  • Using your own data
  • Attractiveness
  • Technical ability required
  • Smartness
  • Storage costs

Excel: Pivot tables — good ole pivot tables can help you prepare your data to be visualized using Excel charts or web tools. Dragging variables around lets you see the data in different ways. Need to look into this methodology more!
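pandas’ pivot_table gives roughly the same drag-variables-around workflow in code; the circulation data below is invented:

```python
import pandas as pd

# Invented circulation stats
circ = pd.DataFrame({
    "branch": ["Main", "Main", "East", "East"],
    "format": ["Book", "DVD", "Book", "DVD"],
    "checkouts": [1200, 300, 800, 150],
})

# index/columns/values play the role of Excel's drag-and-drop areas
pivot = pd.pivot_table(
    circ, index="branch", columns="format",
    values="checkouts", aggfunc="sum",
)
print(pivot)
```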

Infogr.am tool

Datawrapper — create widgets from your data to use on your website

Jolicharts — another tool; simpler data sets

Statwing — more complicated datasets

Explore public data resources

Impact Survey Reports

Explore library data app developed — lots of cross-tabulation possibilities