Development Funding for PBCore provided by the Corporation for Public Broadcasting
PBCore is a Metadata & Cataloging Resource
Alison M. White
Paul E. Burrows
Efthimis N. Efthimiadis
PB Core is the result of the Public Broadcasting Metadata Dictionary Project (PBMD Project). It is an effort by public radio and television broadcasters to develop a schema for the description of their assets. The PBMD Project is under the auspices of the Corporation for Public Broadcasting. The paper discusses the user-centered development of the schema, the elements of the PB Core, the application profile, and the feedback and evaluation process of the schema.
Keywords: Public Broadcasting Metadata Dictionary Project, Dublin Core, PB Core, Media Asset Description.
As public broadcasting endeavors to maintain our value and values in a dramatically altered media environment, we know we must do three things: develop and deliver content across multiple platforms, strengthen our editorial and service partnerships, and engage in more efficient methods of conducting our new and legacy activities.
The recent convergence of IT capabilities with those of radio and television broadcasting has caused us and our constituents to appreciate that our prized editorial output (video clips, audio interviews, transcripts, etc.) can be understood as a series of digital assets that can be identified, exchanged and distributed using an advanced digital infrastructure. Our ability to network, that is, to exchange rich media content within and across our newsrooms, production suites, satellite and terrestrial distribution systems, etc., and even with our educational and community partners (schools, libraries, museums), has never been greater. We have been afforded a tremendous opportunity for cultural relevance and operational efficiency.
In a public broadcasting system made up of hundreds of independent licensees, the challenges of organizing universal processes for asset appraisal, digitization, rights clearance, preservation, etc. are myriad, perhaps overwhelming. We did understand, however, that the foundation of any future effort in this direction would be a single, shared protocol for identifying and describing our rich media assets.
The Public Broadcasting Metadata Dictionary Project (PBMD Project) is a cross-organizational, multi-disciplined effort to establish a standard for all public broadcasting content (radio and television), in order that metadata might be more easily exchanged between colleagues, software systems, institutions, community partners, individual citizens, etc. The PBMD Project will be a "touchstone," a single, streamlined standard to which other database structures, including those of PBS, NPR, major producing stations, and other asset/content management systems will be "mapped." It can also be used as a guide for the onset of an archival or asset management process at an individual station or institution.
The project has been extant since January of 2002, and during its first two phases of CPB Future Fund support, a team of individuals representing public broadcasting's key institutions and endeavors, along with subject matter experts (see appendix for list of participants) has worked to:
The main goal of the PBMD Project is to create a schema that is easily understood, implemented and adopted by the Public Broadcasting community at large. The PBMD Project embarked on a detailed review of existing metadata standards that are used for the description of rich media assets. These included standards that deal with the descriptive, administrative, and educational aspects of the assets. In general, while many of the metadata standards discussed below are in development, the Dublin Core Element Set has remained stable since its 1.1 revision in 1999 [1]. Additions and other changes to the Dublin Core model come in the form of recommendations and application profiles, but the basic core of 15 elements remains unchanged. We have therefore built our model upon the Dublin Core, which provides a solid foundation that is extensible, scalable, and easy to understand.
The standards that were considered were OAIS, SMEF-DM, MARC, METS and MPEG-7, as well as the educational standards SCORM, LOM, IMS. These are briefly discussed below.
OAIS: Reference Model for an Open Archival Information System [2] is a framework and reference architecture for digital preservation.
SMEF-DM: Standard Media Exchange Framework - Data Model [3] is an end-to-end, workflow-oriented broadcast production model. Our assets may involve domains or materials not exclusive or even related to broadcasting, such as CD-ROM, DVD, books. We determined that our metadata should describe assets as objects or files. However, SMEF mandates a specific workflow with limited options. For example, assumptions are made on the order of activities. Our experience is that productions have many different workflows that must be accommodated.
MPEG-7: "Multimedia Content Description Interface" [4] is a highly structured standard focusing on multimedia. Our model does not preclude a station adopting MPEG-7 because the PB Core is based on the Dublin Core model and will map to MPEG-7. On the other hand, MPEG-7 is narrowly focused on multimedia, not on the wide range of other media or materials that will be found in a producing station's repository. See Hunter [5,6] and Agnew [7].
MARC: The MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form [8]. MARC requires a cataloging skill set that is not likely to be found in most public broadcasting stations. Our model insists on the integrity of each asset (version or format of the content). A Dublin Core crosswalk maps to key fields in MARC (http://www.loc.gov/marc/dccross.html).
METS: Metadata Encoding and Transmission Standard [9]. The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library.
SCORM: The Sharable Content Object Reference Model [10]. This is an application profile "to provide a comprehensive suite of e-learning capabilities that enable interoperability, accessibility and reusability of Web-based learning content."
IEEE LOM: IEEE 1484 Learning Objects Metadata. A Learning Object is defined as any entity, digital or non-digital, which can be used, re-used or referenced during technology-supported learning [11]. The mapping of LOM to Dublin Core is available at [12].
IMS Global Learning Consortium. IMS Meta-Data v1.2.2 [13]. The IMS Project originated in higher education but it now involves stakeholders in corporate and government training, K-12, and continuing education. The IMS learning consortium develops learning technology interoperability specifications. IMS initially set out to produce a unified specification covering metadata, content, administrative systems, and learner information. This proved to be too large a specification, and IMS broke it up into component parts, with separate working groups developing each, and each being released separately.
The above schemas have many commonalities and there is an effort to increase interoperability among them. For example, SCORM uses LOM vocabulary. All schemas could be mapped to qualified Dublin Core elements. Extensions to our model as well as value lists (element types) allow for incorporating some of these needs.
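To make the idea of mapping a schema onto Dublin Core concrete, the following is a minimal sketch of a crosswalk in Python. The fifteen Dublin Core element names are those of the version 1.1 element set; the local station field names (`program_name`, `tape_number`, and so on) and the sample record are invented for illustration and do not come from any actual station database or from the PB Core itself.

```python
# The fifteen element names of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk: local station database field -> Dublin Core element.
# The left-hand field names are assumptions made up for this example.
STATION_TO_DC = {
    "program_name": "title",
    "producer": "creator",
    "synopsis": "description",
    "air_date": "date",
    "tape_number": "identifier",
    "usage_terms": "rights",
}


def crosswalk(record, mapping):
    """Translate a local record into Dublin Core.

    Returns a pair (mapped, unmapped): the Dublin Core view of the record,
    and the local fields for which the crosswalk has no target element.
    """
    mapped = {}
    unmapped = {}
    for field_name, value in record.items():
        target = mapping.get(field_name)
        if target in DUBLIN_CORE_ELEMENTS:
            # Dublin Core elements are repeatable, so values are kept in lists.
            mapped.setdefault(target, []).append(value)
        else:
            unmapped[field_name] = value
    return mapped, unmapped


# Invented sample record, used only to exercise the function.
local_record = {
    "program_name": "Example Program",
    "producer": "Example Station",
    "air_date": "2002-10-16",
    "edit_bay": "Suite 3",
}
dc_record, leftovers = crosswalk(local_record, STATION_TO_DC)
print(dc_record)
print(leftovers)
```

Fields that have no Dublin Core target (here the invented `edit_bay`) are returned separately rather than dropped, which is one way to see where extensions or qualifiers might be needed.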
Many parties have asked us why we did not adopt and adapt metadata schemas already in existence or in development. For several reasons, the existing standards were not appropriate to our needs. Basically, alternative schemas were either too cursory in their descriptive capabilities or far too ponderous.
An implementation project, such as the Public Broadcasting Metadata Dictionary Project, generally finds that no one metadata standard completely meets its needs for descriptions of media essence. General standards, like Dublin Core, are often folded into domain- or sector-specific standards, such as MPEG-7 for multimedia and IEEE/LOM for educational resources. New elements may be devised which meet local needs not covered by any existing standards. The Public Broadcasting Core can be thought of as an application profile whose schema combines elements from multiple standards, with application-specific constraints (as in the use of specific controlled vocabularies or structured values). The PB Core must be understandable and usable by all public broadcasting entities, from the smallest local NPR radio station to the largest public television producers of national programming.
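As an illustration of what an application profile with vocabulary constraints can look like in practice, here is a small Python sketch. It is not the PB Core profile: the element selection, the `educationalLevel` extension, and the vocabulary values are assumptions chosen only to show elements from more than one source being combined and checked.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ProfileElement:
    name: str                                    # element name used in the profile
    source: str                                  # standard the element is borrowed from
    required: bool = False                       # whether every record must carry it
    vocabulary: Optional[FrozenSet[str]] = None  # None means free text is allowed


# Hypothetical profile mixing general and locally defined elements.
PROFILE = [
    ProfileElement("title", "Dublin Core", required=True),
    ProfileElement("rights", "Dublin Core"),
    ProfileElement("type", "Dublin Core",
                   vocabulary=frozenset({"MovingImage", "Sound", "Text"})),
    ProfileElement("educationalLevel", "local extension",
                   vocabulary=frozenset({"K-12", "Higher Education"})),
]


def validate(record, profile):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    known = {element.name: element for element in profile}
    # First pass: every required element must be present.
    for element in profile:
        if element.required and element.name not in record:
            problems.append(f"missing required element: {element.name}")
    # Second pass: every supplied value must belong to the profile
    # and respect its controlled vocabulary, if one is defined.
    for name, value in record.items():
        element = known.get(name)
        if element is None:
            problems.append(f"element not in profile: {name}")
        elif element.vocabulary is not None and value not in element.vocabulary:
            problems.append(f"value {value!r} not allowed for {name}")
    return problems


print(validate({"title": "Example Program", "type": "Sound"}, PROFILE))
print(validate({"type": "Podcast"}, PROFILE))
```

The second call reports two problems: the missing title and a type value outside the assumed vocabulary.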
The PBMD Project's primary interest is in data exchange, data crosswalks, and interoperability, not necessarily in creating a complete metadata model that can be exploited by digital asset management systems for comprehensive, original cataloging and markup of essence. The Project desires to facilitate the sharing of metadata and the discovery of valued assets. The PB Core is intended to be "simple," but not "simplistic." Furthermore, the PB Core should be considered as a starting point that may accommodate metadata extensions of interest to specific communities and users.
Consequently, the Project undertook a path that would reflect the Public Broadcasting industry's needs and wants regarding media assets by gathering together representatives from public broadcasting and growing a consensus. The unique quality of public broadcasting, both television and radio, is its local ownership and local ties to its surrounding communities. In a parallel fashion, the Public Broadcasting Metadata Dictionary Project was designed to tap into the various local constituencies and develop a metadata core from "grassroots" origins, rather than by administrative edict.
The Project conducted a detailed "needs assessment" of public broadcasters. Such measures are revealing and often unmask and articulate conditions, issues, needs, and desires that otherwise are dismissed or forgotten. By applying user-centered techniques, the PBMD Project was able to discover a wide spectrum of needs and apply the most appropriate metadata elements.
Public broadcasters have always endeavored to engage in complex and robust relationships with their constituents, whether those are viewers, listeners, educators, community leaders, etc. We have always provided extensive outreach for our broadcast content, with particular emphasis on the needs of K-12 teachers and lifelong learners. Today, with the advent of the Internet, that outreach is more significant and successful than ever before. As mentioned above, we also have an extremely complex structure; as opposed to our media counterparts, who increasingly concentrate their ownership and control of media outlets, very little of public broadcasting's operations are centralized. We have innumerable systems for producing and tracking our content, and our institutions are structured in a variety of ways, often based on who holds the broadcast license.
In order to ascertain the metadata needs of our "external" users (constituents) and "internal" users (local and national staff), we first created a list of users, and then double-checked this "strawman" with the core PBMD Project working group. A "User Requirements Team" was formed from within the working group. Using the now-modified user list, they set out to create a series of Use Case Scenarios.
During this process, the "User Requirements Team" interviewed a large number of stakeholders, including national program distributors, local station broadcast operations and IT staff, a K-12 "learning object" consortium, an independent television production company, a television graphic artist, and "interactive" specialists (web and TV).
The interviews provided very useful feedback that helped define aspects such as the levels of granularity for the description of assets, the specificity with respect to the number of elements, type of information to be described, such as rights, and encoding standards, e.g., XML. For example, what emerged from the interviews was a clear division between full-program metadata (such as title, format, date), which serves the needs of national distribution and local broadcast operations, and fragment, or clip-level data, which serves the needs of producers, educators, and website programmers. Most use case participants felt that it was critical to have a simple, intuitive set of metadata elements, with extensions for particular constituencies, e.g., K-12 curriculum-correlation, or graphics creation, so that the maximum number of assets could be identified and retrieved by the greatest number of individuals and institutions.
There was a great deal of concern about rights management, without which future business and service models crumble. Several interviewees felt that the working group should also determine standards for metadata exchange, such as XML.
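The interviewees' suggestion of XML as an exchange format can be sketched with Python's standard library. The example below serializes a record of Dublin Core elements, including a rights statement, and reads it back. The Dublin Core namespace URI is the published one; the `record` wrapper element and the sample values are assumptions made for this example and are not part of any PB Core specification.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def record_to_xml(record):
    """Serialize {element name: [values]} as namespaced XML text."""
    root = ET.Element("record")  # hypothetical wrapper element
    for element_name, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element_name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Parse XML produced by record_to_xml back into a dictionary."""
    root = ET.fromstring(xml_text)
    record = {}
    for child in root:
        # ElementTree stores tags as "{namespace}name"; keep only the name.
        name = child.tag.split("}", 1)[-1]
        record.setdefault(name, []).append(child.text or "")
    return record


# Invented sample values.
sample = {
    "title": ["Example Program"],
    "rights": ["Cleared for broadcast use only"],
}
xml_text = record_to_xml(sample)
print(xml_text)
print(xml_to_record(xml_text) == sample)
```

Because both sides agree on the element names and the namespace, the receiving system can recover the same record that the sending system wrote, which is the property an exchange format needs.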
A powerhouse of motivated and opinionated experts was assembled to contribute to the Public Broadcasting Metadata Dictionary Project. The members were drawn from a variety of communities related to public broadcasting:
The initial work of the members for the Public Broadcasting Metadata Dictionary Project lasted seven months. The overarching goal of the group was to recommend usable metadata fields that would facilitate the exchange of program and resource information between public broadcasting communities and other interested parties. Guiding our work process was the question, "How would a particular metadata element ultimately contribute to the discovery of public broadcasting's intellectual content by various end-users?" The objectives of the Working Group were to:
In the seven-month time period, two full meetings of the entire Working Group were conducted, as well as follow-up committee work.
These activities led to an intensive three-day work session in Boston (October 16-18, 2002), where the Public Broadcasting Metadata Core was refined and honed by the PB Core Review Team.
Before the Boston Summit, the PB Core Review Team had surveyed existing metadata dictionaries from various authorities and organizations, including those in use by several public broadcasting groups. A total of 467 separate metadata elements were compiled, which spawned 2335 recommendations for grouping and collapsing the elements into the most relevant. From these recommendations, a total of 249 working metadata elements and their qualifiers were selected.
The work of the PB Core Review Team at the Boston Summit combined redundant elements, discarded the less relevant, and debated the appropriate application of preferred metadata within the dictionary. The Summit yielded a preliminary draft of 58 metadata elements and their qualifiers that were most appropriate to public broadcasting and related communities. (For details, see http://www.pbcore.org and select User Guide.)
Many of the 58 metadata elements selected for the Public Broadcasting Core of metadata descriptors were drawn from the Dublin Core Metadata Initiative. Others were retained from existing public broadcasting digital asset management systems in development. Still others were drawn from additional working groups.
The PB Core Elements could be placed into three categories or clusters:
Table 1 reviews the 58 elements and qualifiers currently under consideration by the Public Broadcasting Metadata Dictionary Project. The Registration Authorities listed represent the agency of responsibility for the long term integrity and viability of particular metadata elements and associated qualifiers:
DESCRIPTIONS ABOUT THE CONTENT...
DESCRIPTIONS RELATED TO INTELLECTUAL PROPERTY...
DESCRIPTIONS IDENTIFYING A MEDIA ASSET'S INSTANTIATION...
SPECIAL EXTENSIONS ...
The 58 elements are delineated by 15 attributes according to the modified ISO 11179 Specification and Standardization of Data Elements [14]. The full accounting of the specification is too large a document to include in this paper.
PBMD Project's interest is in data exchange, data crosswalks, and interoperability, not necessarily in creating a complete metadata model that can be exploited by digital asset management systems for comprehensive, original cataloging and markup of essence. Consequently, the primary desire of PBMD Project is to facilitate the sharing of metadata and the discovery of valued assets. Within the Application Profile, issues of concern to PBMD Project are:
The Project recognizes that it needs to remain focused on the fact that the Working Group is not a body of "standards makers." Rather, we are "real life implementers" who are tasked with generating effective solutions in order to service the efficient and widespread delivery of public broadcasting's intellectual content. Similar to our day-to-day business, we are engaged in applied and practical solution-making.
Like many other groups debating the application of metadata schemes, the Project remains conflicted about how best to match metadata descriptors with various instantiations of essence and assets. The question of whether to embrace a "one-to-one" relationship between a metadata record and its associated essence, or to subscribe to a "one-to-many" relationship between a metadata record and the various instantiations of its essence, still plagues the PBMD Project. Compelling arguments have been presented on both sides of the issue. We are hopeful that the next phase of our project, a Request for Comments, will assist us in sorting out a solution.
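The two options can be contrasted with a short Python sketch. This is only one possible reading of the debate, not a design adopted by the Project; the class names, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instantiation:
    identifier: str    # e.g. a tape number or file name (invented here)
    media_format: str  # physical or digital format of this copy


@dataclass
class ContentRecord:
    """One-to-many: the content is described once, with many instantiations."""
    title: str
    description: str
    instantiations: List[Instantiation] = field(default_factory=list)


def to_one_to_one(content: ContentRecord) -> List[Dict[str, str]]:
    """One-to-one: emit a separate, complete record for every instantiation.

    The content description is repeated in each record, so a later change
    to the title would have to be applied to every copy.
    """
    return [
        {
            "title": content.title,
            "description": content.description,
            "identifier": inst.identifier,
            "format": inst.media_format,
        }
        for inst in content.instantiations
    ]


# Invented sample data.
program = ContentRecord(
    title="Example Program",
    description="A sample description.",
    instantiations=[
        Instantiation("TAPE-0001", "videotape"),
        Instantiation("example.mpg", "MPEG file"),
    ],
)
flat_records = to_one_to_one(program)
print(len(flat_records))
print(flat_records[0])
```

The one-to-many form avoids repeating descriptive data, while the one-to-one form yields self-contained records that are simpler to exchange individually; the sketch shows the trade-off without settling it.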
To a great extent, the work of the Public Broadcasting Metadata Working Group has modeled an unheard-of process: coordination and consensus across vastly different institutions, on a topic of extreme detail and importance. The Preliminary PB Core is ready to be reviewed and tested.
During the next several months the Working Group will be asked to engage in an even more difficult process: a mid-course evaluation.
The group will be divided into task teams, and through research, interviews, conference calls, and "thought papers," will address the following issues and objectives:
It is our assumption that these difficult questions will be answered in a manner that leads the project to the RFC (Request for Comments) process, and then test implementations in typical metadata scenarios.
The RFC process will include other public broadcasting production, IT and broadcast operations staff, key software vendors serving the industry, standards organizations, partnering institutions, etc.
Test implementations of the PB Core, still to be determined, will likely include radio, television and website production collaborations, tape libraries, national program distribution systems, as well as national producers of content. Consideration will be given to additional test participants whose products, services and projects are used by, or relate to, public broadcasting stations and organizations.
The PBMD Project process has illuminated for participants and observers alike the critical need for a new, "advanced networking" approach toward conducting our core activities. We must change our institutions and infrastructures, even our funding models, to reflect a new spirit of exchange, collaboration and consolidation. Certainly, without Internet-like standards for descriptive and administrative metadata, rich media file formats, file exchange, etc., we will not be able to keep pace with changes in the media environment, nor will we advance our public service mission.
[1] Dublin Core Metadata Element Set, Version 1.1: Reference Description. Retrieved April 15, 2003, from http://www.dublincore.org/documents/dces/
[2] OAIS: Reference Model for an Open Archival Information System. Retrieved April 15, 2003, from http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html
[3] SMEF-DM: Standard Media Exchange Framework - Data Model. Retrieved April 15, 2003, from http://www.bbc.co.uk/guidelines/smef/
[4] MPEG-7: "Multimedia Content Description Interface" ISO/IEC JTC1/SC29/WG11. Retrieved April 15, 2003, from http://mpeg.telecomitalialab.com/standards/mpeg-7/mpeg-7.htm
[5] Hunter, J. "A Proposal for the Integration of Dublin Core and MPEG-7", ISO/IEC JTC1/SC29/WG11 M6500, 54th MPEG Meeting, La Baule, October 2000. Retrieved April 15, 2003, from http://archive.dstc.edu.au/RDU/staff/jane-hunter/m6500.zip
[6] Hunter, J. ViDE Video Access Group, "An Application Profile which combines DC and MPEG-7 for Simple Video Description", February 12, 2002. Retrieved April 15, 2003, from http://archive.dstc.edu.au/RDU/staff/jane-hunter/publications.html
[7] Agnew, G. (2003) A Tale of Two Schemas: Mapping Dublin Core to MPEG7. Retrieved May 15, 2003, from http://gondolin.rutgers.edu/MIC/text/how/mpeg7DC_agnew.pdf
[8] MARC: Machine Readable Cataloging. Retrieved April 15, 2003, from http://www.loc.gov/marc/
[9] METS: Metadata Encoding and Transmission Standard. Retrieved April 15, 2003, from http://www.loc.gov/standards/mets/
[10] SCORM: The Sharable Content Object Reference Model. Retrieved April 15, 2003, from http://www.adlnet.org/
[11] IEEE Learning Technology Standards Committee (LTSC). Learning Object Metadata. Draft Document v3.6. Retrieved April 15, 2003, from http://ltsc.ieee.org/doc/wg12/LOM3.6.html
[12] Sutton, S.A. (1999) IEEE 1484 LOM mappings to Dublin Core: Learning Object Metadata: Draft Document v3.6, IEEE Learning Technology Standards Committee (LTSC), 5 September 1999. Retrieved April 15, 2003, from http://www.ischool.washington.edu/sasutton/IEEE1484.html
[13] IMS Global Learning Consortium. IMS Meta-Data v1.2.2. Retrieved April 15, 2003, from http://www.imsproject.org/metadata/index.cfm
[14] ISO 11179 Specification and Standardization of Data Elements. Retrieved January 11, 2003, from http://www.diffuse.org/meta.html#ISO11179
Marcia Brooks, WGBH (Project Director)
Dennis Haarsager, KWSU
Amy Rantanen, WGBH
Alison M. White, Corporation for Public Broadcasting
Grace Agnew, Rutgers University (AMIA)
Judy Brown, University of Wisconsin, DOD Academic CoLab (SCORM Project)
Efthimis N. Efthimiadis, The Information School, University of Washington, Seattle, WA
Alan Baker, Minnesota Public Radio
Nancy Baldacci, American Public Television
Sharon Blair, AMIA Local Television Task Force
Marty Bloss, National Public Radio Public Radio Satellite Service (NPR PRSS)
Paul E. Burrows, Media Solutions, OIT, University of Utah
Brian Callahan, WHRO (former participant)
Michael Connet, onCourse
David Felland, Milwaukee Public Television
Tom Handy, KWSU (former participant)
Steven Heard, Public Interactive
Rob Holt, NPR Online
Dave Johnston, PBS Online
Ann Lootens, WGBH
Dave MacCarn, WGBH
Chuck McConnell, NETA/OSBE
Bea Morse, Public Broadcasting Service (PBS)
Robin Mudge, onCourse (former participant)
Lesley Norman, David Grubin Productions (former participant)
Meg O'Hara, WNET
Tim Olson, KQED
Marilyn Pierce, Public Broadcasting Service (PBS)
Richard Ruotolo, Public Radio International (PRI)
James Steinbach, Wisconsin Public Television
Brent Trinacty, Public Radio International (PRI) (former participant)
Cate Twohill, Public Broadcasting Service (PBS)
Steven Vedro, Consultant
Tracy Vosburgh, WPSX (Penn State University)
Michael Yoch, NPR Online (former participant)
Art Zygielbaum, Nebraska Education Television
Additional Support Provided By
Scott Bridgewater, National Public Radio Public Radio Satellite Service (NPR PRSS)
Carrie Lowe, Public Broadcasting Service (PBS)
Thom Shepard, WGBH
and other staff at many of the participating organizations listed above