Unique identifiers: why open grants data need them


(Michael Lenczner)

We’ve written about why data standards are important: they minimize inconsistencies between open grants datasets, so that data from different funders can be easily combined. The combined dataset, in turn, lets us ask interesting questions about the philanthropic landscape and identify potential funding gaps. One important component of most data standards is the unique identifier, or unique ID. In this week’s post, we’ll take a closer look at unique IDs, explore why they’re necessary for data sharing, and give an example of how they work.

As this 360Giving blog post explains, an easy way to think about unique IDs is to consider the unique codes used in everyday systems we are already familiar with. What good would barcodes, license plates, or email addresses be if the same string of characters were used more than once, to refer to different objects, cars, or people? They would no longer work as unique references.

So we can think of a unique ID as a barcode for an item that might be referred to in more than one dataset. In the context of grants data, unique IDs are usually used to identify specific projects or organizations, such as funders and grantees.

What happens to datasets without unique IDs?
Why couldn’t we just refer to an organization by its name in a dataset, instead of using a unique ID? There are two reasons unique IDs are needed to identify organizations in datasets: specificity and consistency.

Specificity: Unique IDs allow datasets to be specific about what they are referring to. Organization names are sometimes ambiguous, and the same name can refer to several different organizations. “OTF” in a grants dataset could refer to the Ontario Trillium Foundation, the Ontario Teachers’ Federation, or the Open Technology Fund. Within one organization, staff may know from context which “OTF” is meant, but that shorthand breaks down when the data is published openly, or when it needs to be combined or compared with other datasets. Unique IDs ensure that publishers can be specific about exactly which organizations they are referring to in their data.

Consistency: Without unique IDs, the same organization could be entered into a grants database a dozen different ways. For example, the same YWCA Toronto branch could be entered as “YWCA”, “Young Women’s Christian Association - Toronto”, or “YWCA Toronto”. Using unique IDs ensures that an organization is identified consistently by the same code, across different datasets and even by different publishers.
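To make both problems concrete, here is a minimal Python sketch that totals grant amounts per recipient, first by name and then by unique ID. Everything in it is made up for illustration: the records, the amounts, the field names `recipient_name` and `recipient_id`, and the placeholder IDs, which are not real registration numbers.

```python
from collections import defaultdict

# Hypothetical grant records. Names, IDs, and amounts are invented for
# illustration; the IDs are placeholders, not real registration numbers.
grants = [
    {"recipient_name": "YWCA", "recipient_id": "EXAMPLE-REG-0001", "amount": 10000},
    {"recipient_name": "YWCA Toronto", "recipient_id": "EXAMPLE-REG-0001", "amount": 25000},
    {"recipient_name": "Young Women's Christian Association - Toronto",
     "recipient_id": "EXAMPLE-REG-0001", "amount": 5000},
    # Two different organizations that happen to share the shorthand "OTF".
    {"recipient_name": "OTF", "recipient_id": "EXAMPLE-REG-0002", "amount": 7000},
    {"recipient_name": "OTF", "recipient_id": "EXAMPLE-REG-0003", "amount": 3000},
]


def total_by(records, key):
    """Sum grant amounts, grouping records by the given field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


totals_by_name = total_by(grants, "recipient_name")
totals_by_id = total_by(grants, "recipient_id")

# Grouping by name splits one organization into three rows (consistency)
# and merges two organizations into a single "OTF" row (specificity).
for name, total in totals_by_name.items():
    print(f"by name: {name}: {total}")

# Grouping by ID yields exactly one row per organization.
for org_id, total in totals_by_id.items():
    print(f"by ID:   {org_id}: {total}")
```

Grouped by name, the one YWCA branch is counted as three separate recipients while the two different “OTF” organizations collapse into one. Grouped by ID, each organization gets exactly one total.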

How are unique IDs assigned or decided for organizations?
Many grants data standards require organizational unique IDs to be built from existing official registration numbers. For example, the 360Giving data standard we introduced a few weeks ago requires unique IDs to use an existing organizational registration number (e.g. a registered charity number), preceded by a list code: a prefix that identifies the registry the registration number is taken from (e.g. “GB-CHC” for the Charity Commission). Websites such as org-id.guide help data publishers choose an acceptable and relevant list code.
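As a rough sketch of how such an identifier fits together, the Python below joins a list code and a registration number, and splits an identifier back into its parts. It assumes the two parts are joined with a hyphen. The function names are hypothetical, and the registration number is a made-up placeholder; only the “GB-CHC” list code comes from the example above.

```python
def make_org_id(list_code: str, registration_number: str) -> str:
    """Build an organization ID as '<list code>-<registration number>'.

    Assumes a hyphen joins the two parts; check the standard you are
    publishing to before relying on this.
    """
    code = list_code.strip().upper()
    number = registration_number.strip()
    if not code or not number:
        raise ValueError("both a list code and a registration number are required")
    return f"{code}-{number}"


def split_org_id(org_id: str, known_list_codes: set[str]) -> tuple[str, str]:
    """Split an organization ID back into (list code, registration number).

    List codes can themselves contain hyphens, so the ID is matched against
    a set of known codes rather than split on the first hyphen.
    """
    # Try longer codes first so a longer code wins over a shorter prefix of it.
    for code in sorted(known_list_codes, key=len, reverse=True):
        prefix = code + "-"
        if org_id.startswith(prefix) and len(org_id) > len(prefix):
            return code, org_id[len(prefix):]
    raise ValueError(f"no known list code matches {org_id!r}")


# "1234567" is a placeholder, not a real charity number.
org_id = make_org_id("GB-CHC", "1234567")
print(org_id)

# In practice the set of known codes would come from a register of list codes.
print(split_org_id(org_id, {"GB-CHC"}))
```

Because the list code travels with the number, two registries that happen to issue the same registration number still produce different IDs.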

In short, IDs that are standardized and unique increase the utility of our datasets. Without them, our datasets would be more ambiguous, messier, and harder to compare against one another.