Public Broadcasting Metadata Dictionary
Test Implementation Questionnaire

 

 


Participant Information:

Test site: B

Organization: Kentucky Educational Television

Test focus: Ingestion and description

Test partner:

Respondent: Lisa Carter

Please answer any questions that are relevant to the testing carried out at your location. If a question does not apply, you may skip it or note that it is not applicable to your test (N/A).

 


PBMD Documentation:

1.      In the documentation provided, is the PBMD presented in a manner that is easy enough to interpret?

From the documentation provided, and being a professional librarian, I was able to understand PBMD and interpret it for our local situation. However, I found that it was my responsibility to further define the elements, not only to address local conventions but also to clarify what was meant, so that the PB Core based fields in the database could be filled out by non-librarians. I don't think that PB Core as is can be handed over to non-librarians without guidance with the expectation that data entry will be consistently applied across organizations. Element names are foreign to everyday language, and some of the fields need further clarification (coverage.spatial, coverage.temporal, title, creator/contributor).

2.      Is the PBMD presented in a way that allows you to easily apply it to your desired use?

This is a difficult question. While I think PBMD is a necessary and good thing, as it currently stands it is not easily applied to desired use. Guidelines for application would be extremely helpful. The students had an easy time of it because I formatted the old database to match field for field with the WGBH tool, and I provided them with localized definitions and guidance throughout the test. My understanding of what people want is a ready-made database that their old data sets can easily be mapped to, and guidelines that clearly describe what information is necessary to put in which fields. While that expectation might be a bit unrealistic, because you can't craft guidelines and a tool that will meet everyone's needs, it is nonetheless the reality.

 


Export:

3.      Was it clear how to map existing data elements to the PBMD for exporting?

Mapping our existing data elements was relatively easy, but we did not try exporting, I think due to the complexity of the WGBH tool. The biggest issue in mapping the existing data elements to the PB Core elements was the inconsistency with which our data had been recorded over the years. In addition, there were many one-to-many maps that needed to be made. Both of these would have required significant cleanup if we had tried exporting. It was almost better that we didn't try to automate it, since we needed human intervention for every record anyway.

4.      What were the most challenging obstacles to the export process?

The most challenging obstacles to the export process were:

-         Inconsistency in how data was originally entered into old databases (it doesn't matter whether they were set up based on another schema or not). For example, in many cases KET staff had entered what was really a program title into the field KET had labeled "Series Title", a field that also contains real series titles. In other cases, an actual program title was placed in the field KET had labeled "Program Title" (in other words, the database had both a Program Title and a Series Title field, but how the fields were used was inconsistent). This could be as drastic as a "Category" field that usually held PUBLISHER information but, for 50-100 records out of 10,000, contained GENRE information.

-         One-to-many data fields. Most organizations using a very simple flat-file database (such as an unsophisticated FMP database or an Excel document) tend to string data along in one field instead of having repeatable fields. For example, when our CREATORS/CONTRIBUTORS were not in the "Notes" field but actually in a "Producer" field, several names were entered into one instance of the field. On export, we would have had to find a way to split the data out so that different creators/contributors would end up in different instances of the CREATORS/CONTRIBUTORS elements.

-         Determining where the "extra" data goes. Many stations have been keeping information about their assets that does not have a place in PB Core. For example, KET had some tapes with budget information in a "Budget" field. The tool that the records are exported to needs to be simple enough to allow organizations to add fields that are relevant to them.
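The one-to-many splitting described in the second bullet above is the kind of step that could be partially scripted. This is a minimal sketch, not our actual process: the field contents and the delimiters (semicolon, comma, slash) are assumptions that would have to be verified against the real legacy data before any automated split is trusted.

```python
import re

def split_contributors(raw):
    """Split one multi-name legacy field into one value per person."""
    # Assumed delimiters: semicolon, comma, or slash.
    parts = re.split(r"[;,/]", raw or "")
    return [p.strip() for p in parts if p.strip()]

# A hypothetical legacy "Producer" field holding several names at once;
# each resulting name would become its own CREATORS/CONTRIBUTORS instance.
legacy = "J. Smith; K. Jones, L. Brown"
for name in split_contributors(legacy):
    print(name)
```

Even with a script like this, records where names contain the delimiter characters would still need the human intervention mentioned above.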

5.      What worked well? N/A

6.      Does the basic structure of the PBMD map easily to your existing metadata fields? See answer #4.

7.      Was the XML/DTD documentation good enough to allow you to create your own PBMD compliant XML documents? N/A

8.      What would make this process easier for you? (Internal system needs as well as possible changes to PBMD)

I think the most helpful preparation would be to start an effort now to get organizations to understand that metadata generation is necessary (and explain why), that they will eventually have to map to a metadata schema like PB Core, and that they should start thinking about and planning for the cleanup their data will need, or begin training people in documentation standards.

Also, a database tool simple enough to be mapped to existing flat-file data sets for export would be helpful. The WGBH tool did not seem to have a table or interface where you could see how the data from our old database (even when already mapped to PB Core) could be imported. I would have liked to see one layout where all the PB Core elements were laid out.

9.      Please describe the process of mapping your existing fields to PBMD fields.

Intellectually, I analyzed the data in the old KET databases, determining, based on my librarian background, what sort of information was being recorded in which fields. I then created a new table based on the PB Core elements. Before pulling the data into the new table, I went through each PB Core element, copied the relevant information from the PB Core element definitions, and placed the copied definition next to the element name. I then created a "KET Usage" definition describing, based on my analysis of KET data, which data would be recorded in which PB Core elements. Based on this "KET Usage" definition, I used FMP and its import/export function to map the old fields to the PB Core based fields. It was from this PB Core based table that the students worked to fill out the WGBH-supplied tool.
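The field-to-field mapping described above was done by hand inside FMP, but the core idea is a crosswalk table. As a sketch only, with hypothetical field names (the real KET map was built from analysis of the actual data), it could look like this; note how fields with no PB Core home are carried along rather than silently dropped:

```python
# Hypothetical legacy field names; the real KET-to-PB Core map was built by hand.
CROSSWALK = {
    "Series Title": "TITLE.SERIES",
    "Program Title": "TITLE.PROGRAM",
    "Record Date": "DATE.CREATED",
    "Air Dates": "DATE.ISSUED",
    "General Description": "DESCRIPTION",
}

def remap(record):
    """Rename a flat legacy record's keys to PB Core element names.

    Fields with no PB Core home are kept under an ANNOTATION key so
    that "extra" data (budget fields, etc.) is not lost on the way.
    """
    out, leftovers = {}, []
    for field, value in record.items():
        if field in CROSSWALK:
            out[CROSSWALK[field]] = value
        else:
            leftovers.append(f"{field}: {value}")
    if leftovers:
        out["ANNOTATION"] = "; ".join(leftovers)
    return out
```

A one-field-to-one-element crosswalk like this only works after the data has been cleaned; records where the "wrong" kind of data sits in a field still need human review first.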

10.  Please describe the process involved in extracting PBMD compliant XML docs from your existing system. N/A

11.  Was this process one which you could automate in the future? N/A

 


Delivery:

12.  Please describe how you delivered PBMD documents to your test partner or project manager. N/A

13.  Were there challenges in this process that should be addressed by the PBMD development team? N/A

 


Ingest:

14.  Describe the process of mapping the PBMD to your existing collectionÕs descriptive metadata fields.

See my answer to question #9; the process was the same.

15.  Was this task an easy one to carry out? If not, please describe a few of the challenges.

The most challenging aspects of mapping the PBMD to my existing collection's descriptive metadata fields were the same as the obstacles to exporting outlined in question #4:

-         Inconsistency in how data was originally entered into old databases (it doesn't matter whether they were set up based on another schema or not). Another example, in addition to that given in #4, is determining which data from a variety of fields should be used to designate DATE.CREATED and DATE.ISSUED. The available date fields in the old KET database included "Record Date", "Air Dates", "Request Date", "Production Start", and "Production End", and while it seems that you would map "Record Date" to DATE.CREATED and "Air Dates" to DATE.ISSUED, it became apparent in analyzing KET's data that sometimes the actual date created had been recorded in the "Production Start" field.

-         For KET, the most useful fields in the old databases were the "General Description" and various "Notes" fields. The challenge here was taking the various data that had been dumped in these fields and placing it in the appropriate PB Core field. This can only be done by human intervention. Sometimes "General Description" would contain CREATOR/CONTRIBUTOR data, other times DESCRIPTION data, and other times tape condition, air dates, closed captioning, distribution, and part information.

-         Related to the issue above is determining where the "extra" data goes. Many stations have been keeping information about their assets that does not have a place in PB Core. Another example, in addition to that given in #4, is Nola code information, which might contain rights information and needs to be kept a while longer for KET's purposes. The KET database also contained information about the metadata itself: "IPD ID" denoted the number of the Independent Program Description sheet the information came from, and there was a "Created" field for the date the data was input into the system, a field for who created the data, and one for who modified it. Since the data in these records was being dragged forward, some of these fields might have been helpful to maintain.
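The date-field ambiguity in the first bullet above (DATE.CREATED hiding in "Production Start") suggests a fallback rule rather than a fixed one-to-one map. This is a hedged sketch with illustrative field names; as noted, any automated pick should be flagged for human review, not trusted outright.

```python
def pick_date_created(record):
    """Pick a DATE.CREATED value from candidate legacy date fields.

    Field names are illustrative. The function also reports which
    field supplied the value, so a reviewer can check the fallbacks.
    """
    # Preferred source first, then the field the date sometimes hid in.
    for field in ("Record Date", "Production Start"):
        value = (record.get(field) or "").strip()
        if value:
            return value, field
    return None, None
```

Returning the source field alongside the value is the design point here: it turns a silent guess into something a cataloger can audit record by record.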

16.  Please describe how you received PBMD documents from your test partner. N/A

17.  What were the initial challenges dealing with the documents that were delivered? N/A

18.  Please describe the process by which you interpreted and ingested the XML documents you were delivered. N/A

19.  Does having the content described with PBMD fields in your current system help you find materials that are described?

We have not really tried to use any of the improved databases for retrieval yet. The problem I anticipate is that because we are using FMP, and not a search tool that allows searching across fields, people are going to have to know which field a certain kind of information is stored in. For example, if they are used to searching "Series Title" for any title, they may have problems when they search the cleaned-up database, since titles which are really program titles will have been moved to the "Program Title" field.

20.  Is this an improvement over what you currently use?

I think that a PB Core based system is going to be an improvement over what we currently use, because it will standardize data entry and separate different kinds of data. I think the biggest challenge will be enforcing compliance, in terms of having the right fields filled out with the right kind of information.

21.  How much of this information will likely travel through the system to end users? (stations or consumers) NA

22.  Do you see any benefit to upgrading your internal systems to take advantage of PBMD?

I do, but I don't know how producers and engineering staff will react. While they desire interoperability and retrievability, there is little realization that standardized, authority-based data entry, and the time it takes to do that data entry, is necessary for good retrieval.

23.  What were the most difficult problems you faced in this test? NA

24.  Do you have suggestions on how we might address these challenges? NA

 


Content Description: (new content records)

25.  Does the PBMD provide you with sufficient descriptive metadata elements and modifiers to properly describe your collection?

While I think the Core contains most of the fields that are important for standardizing metadata generation in many organizations and enhancing interoperability, I think additional metadata elements will need to be added at each station for local use. In the case of KET and its digitizing project, we found that while we were describing the digital file primarily (and consciously chose not to create separate records to describe the tapes), we still had reasons for generating and keeping data about the original source tapes, since this would inform questions of visual quality about the digital files. This meant that the PBMD did not have sufficient descriptive metadata elements for our purposes: since our "item" was a digital file, there was no way to describe a related item more fully in the same record where you were describing the item at hand. For example, some of the tapes we digitized had bars and tone that didn't match the video, and this was deliberately not corrected in the digital file. However, in PB Core there seems to be no place to clarify that the mismatch between the calibrations and the actual video was a feature of the source video. Similarly, there was no place in the PB Core element set for us to note that the original video was a 3/4" tape, which seems like it would be useful in describing the digital file. While it is true that the tape and the digital video file can be linked through the RELATION.TYPE and RELATION.IDENTIFIER fields, since there will not be any records for the related item, this descriptive information would be lost without adding local elements.

Along these lines, there is no provision in PB Core for describing a program with several different formats or instantiations. In the video world, it is quite common to have several copies of the exact same program where the only information that differs is the technical metadata. In some cases even that might not differ, for example when you have a digital video file stored on one server and an exact copy of the same file on another server for backup purposes. The only difference would be the storage location of the file. It seems silly to have two different records for what is essentially the same item. And in terms of usability, users don't like to see several records for the same "item" which are all identical except that one copy is a 3/4" tape, one is a 1" tape, one is a digital file, and one is a backup digital file. If they see four different records, they think you have four different tapes. While I'm sure this is based on traditional library/archives practice, I think the reality of how this is applied in a production environment needs to be rethought.

Finally, there will always need to be local "handling" fields. Internal management fields like airing information, condition, related formats, information about who reformatted the item (engineering staff or technicians), metadata generation and updating activities, and the dates related to these are all important to managing the life cycle of the items and need to be included in an asset management system. While PBMD might not have a place for these, implementation guidelines might need to address these concerns.

26.  Does the PBMD element set easily map to your existing records for material you are describing using the PBMD?

The PBMD element set did not easily map to our existing records. It was only after I had gone through and analyzed the old data, created some maps to PB Core, and exported the records to a transitional database that the mapping was complete. See questions #15 and #4 for more discussion of the challenges I faced.

27.  Please list any elements or modifiers that you feel are missing for this type of use.

As mentioned above, we needed to be able to describe features of the source tape while describing the digital file. Elements such as Condition, Relation Format, Relation Location, and metadata tracking fields like who created the metadata record, who updated it, and the date might be considered. Also, when you are describing a digital file that is merely a surrogate for a videotape, how do you record the "creator" of the digital file? While the producer/director, etc., should be recorded in the CREATOR field, where would you record the technician who didn't "create" the content, but "created" the digital file?

28.  Please list elements and modifiers that you feel are not necessary to have in the core set of metadata fields.

Since we were describing video, there was some question about the relevancy of the audio elements: FORMAT.AUDIO.BIT.DEPTH, FORMAT.AUDIO.DATA.RATE, FORMAT.AUDIO.SAMPLING.RATE. Since these were archival videos, we questioned the relevancy of DATE.ISSUED and wondered exactly which date should go in it: the date it was placed on a storage location anyone in the agency could access, the date the item will be posted on the web, or the date it is made available for ordering. FORMAT.TIME.START is questionable, since these were all complete (beginning-to-end) programs, so all the TIME.STARTs would be 00:00:00:00. DATE.AVAILABLE.START and DATE.AVAILABLE.END we did not use at all, and I have left them out of even the eventual, complete element set. They seem repetitive of DATE.ISSUED, or unclear as to what data belongs in them.

The application guidelines for TITLE and its qualified incarnations, TITLE.SERIES, TITLE.PROGRAM, and TITLE.EPISODE, need to be clarified, because entering data in all of them is repetitive. Some better delineation between what should go in TITLE and what goes in the others would be helpful. Otherwise, TITLE seems unnecessary, because you can just combine the others to come up with the TITLE.
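The "combine the others" point above can be made concrete with a trivial sketch. The separator and ordering are assumptions of mine; the lack of any prescribed format is exactly the guideline gap being described.

```python
def derive_title(series=None, program=None, episode=None):
    """Combine the qualified title parts into a single TITLE string.

    The ": " separator and series-program-episode order are assumed;
    no particular convention is given in the documentation we had.
    """
    # Drop empty parts so a missing qualifier leaves no stray separator.
    parts = [p for p in (series, program, episode) if p]
    return ": ".join(parts)
```

If TITLE can always be derived this way, it is redundant to type; if it cannot, the guidelines should say when it differs from the combination.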

Finally, the definition of "core" needs to be clearer, or reconsidered. Is the Core the mandatory fields? Is it all 58 elements? Is it just the xx.00 elements? First, I don't think you are ever going to get anyone in the television industry to fill out 58 core elements. I think 15 elements is the most you can expect out of anyone. Perhaps your 58 could be pared down by offering tools that automatically generate the instantiation fields in a digitizing process, or gather them from the files or the software that generates the files. Also, offering tools to import old data sets will help ease the transition. Certainly, there have to be ways to integrate a tool based on the PB Core element set into the current workflow and the tools used to create, edit, and air programs, allowing for the harvesting of metadata for most of the instantiation fields. The pop-up value lists are going to make things easier. If typing only has to be done in the TITLE, DESCRIPTION, RELATION.IDENTIFIER, CREATOR, PUBLISHER, CONTRIBUTOR, RIGHTS, DATE, IDENTIFIER, and LOCATION fields, you'll have a much better chance of people adopting this standard. Fifty-eight elements is intimidating (even if almost half of them are easy instantiation fields). If you can get a software company to develop an easy-to-use package that automatically harvests metadata from files or from editing or ingestion software, allows other fields to be set with a default, and provides pop-up value lists for most of the rest (again getting the actual typing down to 15 fields), then I think your chances of getting this adopted are greater.

29.  Please describe the top three challenges you faced in this test.

The top three PB Core specific challenges that I faced as the local test manager/supervisor included:

  1. Mapping the existing data to the PB Core elements, creating definitions for local use and dealing with implementation questions regarding what data should go in which field. (Especially: COVERAGE.SPATIAL, COVERAGE.TEMPORAL, CREATOR/CONTRIBUTOR, ANNOTATION)
  2. The proper implementation of TITLE and TITLE.SERIES, TITLE.PROGRAM, TITLE.EPISODE and understanding how or if these elements worked together.
  3. Dealing with the description of our very real situation (several formats described by one record, not having enough data to fill out all the fields, etc) within a standard that was developed for broader use.

Greater than the challenges with PB Core itself were the challenges we had working with the supplied tool:

  1. The importing and exporting of data into the tool and back out appeared so complex that we didn't even try it. We are still very concerned about how we are going to get our cleaned-up information out of the tool so we can use it. Basically, while the tool is very sophisticated and clearly along the lines of what we all need, we are not in any way prepared to move from our relatively flat-file database to something so complex.
  2. Setting the tool up on a host machine as a multi-user file did not go well at all (my choice, I admit), and many of the problems we had with the test resulted from this setup. Providing this database for data entry over the web was unreasonable in the time given, and even with several months it seems unrealistic because of the tool's complexity. So how to use this tool with several different people working at the same time on the same set of records was a real issue for us. It didn't help when lightning took the power out and the tool had to be restarted from afar. I think that if the tool were not so complex, it might have been easier to manage in this sort of setup.
  3. Other issues with the tool were minor and could have been fixed appropriately if the test/implementation period were longer.

30.  What could have been provided that would have helped make this process easier or more effective?

Primarily, a simplified tool made accessible over the web would have really helped us with the testing. Again, we were prepared for a relatively flat file.

Application guidelines on what kind or form of data was expected in certain fields would have helped a lot. I think for people who aren't librarians or catalogers, you might need to provide more assistance and guidance on exactly what kind of information is expected to appear in some of the fields.

31.  Would you recommend others use the PBMD as a starting point for their archiving or collection digitization projects?

Yes, but I wouldn't encourage them to do it now without some sort of archival intervention, a robust IT environment, committed computer support staff, consideration of the systems they have to integrate it into, and (in appearance at least) a simplified sample tool they could work with.

However, if you can educate people about what PBMD is and its potential, then they can start thinking about structuring their metadata gathering to comply with the schema. At the very least, if people can record TITLE, DESCRIPTION, RELATION.IDENTIFIER, CREATOR, PUBLISHER, CONTRIBUTOR, RIGHTS, DATE, IDENTIFIER, and LOCATION on a spreadsheet or in a database, that would help. If they could just start pulling their source descriptive material together, that would help. Or they could just start working with their producers to document that information about the programs anywhere.

32.  Was the provided File Maker Pro mark-up tool a good fit for the project you are using it for?

I don't think so. While it's a potentially great tool for an asset management system, we really aren't ready to take such a system on. All we are doing right now is recording descriptive information for digital video surrogates of our tape archives. We expected and planned to have very flat data to integrate into, and export from, a longer-term asset management system. To continue to use this tool at KET, we would have to reconceptualize how we are managing metadata right now, and none of the planning and cultural changes have taken place to lay the groundwork for that. We would have to look more closely at what it is that we want to be able to do with the data, and whether or not we want to adopt MARS, before investing the time to customize the database and train people how to use it effectively.

33.  What were the tool's strong points and weaknesses?

Both its strong points and weaknesses lie in its complexity. It's clear that this database is well structured to handle a robust metadata structure and would contribute effectively to many different uses of the database (cataloging, collection management, and retrieval). But with all the tables, relationships, and lookups, I think it is too complex to figure out how to use effectively in a short period of time, or by users who are not committed to managing structured data (i.e., producers, APs, interns).

One thing that would really help with the database would be mouse-over dialog boxes that pop up to remind people of the definition of the element they are using.

 


Mapping: (crosswalks between metadata schemes and PBMD)

34.  Please evaluate the form in which the PBMD is currently presented for your use and the supporting documentation.

I really like the website and think it is highly useful. I think some additional work could be done to make a few points and the structure clearer (like indenting the qualifiers, or elements people could leave out, in the menu bar on the left-hand side of the screen). I think there could be more examples on the website (http://www.utah.edu/cpbmetadata/PBCore/Title.html) and the ability to cut and paste recommended value lists from the web pages.

It may be just me and my inexperience showing, but the DTD and the dummy XML were really not useful to me at all.

The PBCore Printout was handy, but I had already done that work and enhanced it for our purposes. I think it would be very helpful for people who are new to it, though.

The only thing on my wish list along these lines was a ready-made (simple) FMP database that was ready for me to plug my data into and go, with the definitions handy for reference. I had to create this myself and wasn't happy with the results. Something between what I created on my own and the tool we received for the test would be what I had in mind.

35.  Considering the audience at which the PBMD is aimed, please evaluate how complete the PBMD is.

I think the PBMD is very complete. If anything, there are too many fields to fill out, although I think most of them are necessary. I think just the right amount of "options" is presented, in the sense that if someone were starting with brand-new items and wanted to do a "complete" job, this would document most of the things they would need to gather metadata about.

36.  Please describe the process that you used to create crosswalks between the standards that were mapped to the PBMD. N/A

37.  Do you feel that the PBMD provides enough of a core metadata set for content description, and how does this compare to the other standards included in your test? N/A

38.  Are these crosswalks something that, if finalized, might be easily kept up to date to allow the automated conversion of materials marked up in one standard to be translated to another? N/A

 


General:

39.  What are the primary uses you see for the PBMD in your organization?

I think the primary use the PBMD has at KET is as a way to structure the data they should be gathering about their programming. KET has not employed standards in the past, and thus held out no incentive for accurate description of its assets. I think PBMD will help whoever is given the responsibility of implementing an asset management system here explain what kind of metadata we should be gathering and how it should be structured. It will also hold people to a standard. We can persuade the producers and tape librarians to give us better metadata because "it needs to be PB Core compliant". People here are craving structure and anything that promises them retrievability, and I think PB Core is a fundamental piece of the foundation that has to be built.

40.  Do you believe that materials that have been defined using the PBMD schema will be easier to exchange between partners?

Sure, if you can get those partners to implement PBMD in a consistent and documented way. The issues outlined above regarding more specific application guidelines, and developing as many automation features and controlled vocabularies as possible, will assist in having data consistently applied.

41.  Do these partners need to be also using the PBMD to make it useful, and how important is wide acceptance/implementation?

As long as you have the crosswalks, and people really are putting the title in TITLE (or its crosswalked companions) and not sometimes putting the program title in the Series Title field, I think you can probably get by with not everyone using PBMD. While I don't think its wide acceptance/implementation is necessary, or even as important as being able to crosswalk/map, I think that public television stations are anxious for this structure. Not to have it imposed upon them, but to have it offered up in easy-to-use, simple tools that can be integrated with their other tools (like Protrack, Virage, etc.).

42.  Please give a short evaluation of the support materials (app profile etc)

I'm still not quite sure what the application profile is. Again, I think the website is great, and the PBCore Printout was also helpful. I didn't look at the DTD at all, or the .xml file, but I'm glad to have them for when we get to that point. The one thing I could have used is what I would think an application profile would be: guidelines for application, such as what form the title should take (capitalize which words, where do I put my colon, how do TITLE and the other TITLE elements work together, if at all).

43.  What further documentation will be necessary to allow Public broadcasters and their partners to effectively use the PBMD?

I think, for the people who want it, you are going to need to provide more of a bridge between the element definitions and the source data they actually have to work with. I think you need to tell them which of the elements are the most necessary (if you don't have the data readily at hand, should you go look it up, and why?). I think you might need to translate some of the element names (I still can't remember what kind of data is supposed to go in FORMAT.IDENTIFIER without looking at the definition). I think you need a starter tool, like a really simple database interface with help built in (what was the definition of this field again?). I think you need to find someone who can explain in simple terms what PB Core is and what benefits using it will provide, so that it won't intimidate people. Again, a simple web tool might go a long way toward getting people to begin using it.

44.  What changes would you suggest we make before releasing this to the rest of the industry?

I hate to beat a dead horse, but you have to simplify. You need a simpler tool for the people who aren't doing anything more than labeling tapes right now. It needs to be easily accessible (not expensive software) and modifiable (with guidance, I would hope). You have to make the names of the elements less intimidating. You need to provide simplified versions of the definitions where possible. You need to educate people about what PB Core is and what it isn't. You need to do a lot of user analysis to find out what it is that people want, and show them how PB Core will help them reach their goals. One thing that would be really helpful is if you could find out what kinds of systems and software stations are already using, and come up with templates or examples for how PB Core can be used in those systems (FMP template, Access template, .xml template, Virage template, Protrack, Excel). I think you also need to be able to concisely explain how this will work with or within the NGIS, and how PB Core will help stations utilize the NGIS more effectively.

45.  What opportunities do you see to exploit the PBMD?

I'm not quite sure what you mean by this question. Personally, I'm going to use it to construct databases for my various projects (not only KET's database, but the University of Kentucky's video archive). When people call me, I'm going to suggest that they look at the PB Core elements rather than advise them to build a database from scratch. I think the PBMD can be used to finally get everyone talking about describing their content and keeping better records.

46.  Do you think that having content described with the PBMD scheme will increase reuse of the material within your organization?

The use of any metadata scheme would increase reuse of the materials within KET. But it can't do it alone. There's still a lot of work to do to develop a robust database with a web-based interface that people can use for searching and data entry from their desktops, and that integrates with Medialogger, Virage, and Protrack. And then there's the training and getting people to use it. It's going to take management support and a change in the culture of KET. PBMD is not a magic bullet that will by itself increase the use of the programming at KET. But it is a significant part of the puzzle, a foundation from which to work.