How To Find Your OAI Sets And Records
Find Your Harvesting URL
- Contact your systems manager if you need help finding your base harvesting URL. It may look something like this: https://cdm16007.contentdm.oclc.org/oai/oai.php
- To view all sets available for harvest, add “?verb=ListSets” to the end of the base URL shown above. The link https://cdm16007.contentdm.oclc.org/oai/oai.php?verb=ListSets will then display all of your sets in XML format. Take note of the setSpec values for the next step.
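The ListSets step above can also be scripted. The following is a minimal Python sketch, not an official tool: the function names (list_sets_url, parse_sets, fetch_sets) are illustrative, and it assumes the server returns a standard OAI-PMH 2.0 response. It reads only the first page of results and does not follow a resumptionToken.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace defined by the OAI-PMH 2.0 protocol for its response elements.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets_url(base_url):
    """Append the ListSets verb to a base harvesting URL."""
    return f"{base_url}?{urlencode({'verb': 'ListSets'})}"


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs found in a ListSets response."""
    root = ET.fromstring(xml_text)
    sets = []
    for node in root.findall(".//oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = node.findtext("oai:setName", default="", namespaces=OAI_NS)
        sets.append((spec, name))
    return sets


def fetch_sets(base_url):
    """Download the first ListSets page and parse it (requires network access)."""
    with urlopen(list_sets_url(base_url), timeout=30) as response:
        return parse_sets(response.read())
```

For example, calling fetch_sets("https://cdm16007.contentdm.oclc.org/oai/oai.php") would return the setSpec and setName of each set the server reports on its first response page.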
To view the metadata records of a specific set, decide which metadata schema you want to view the records in and choose a setSpec from the previous step. In this example, we’ll view the State Library of Ohio Rare Books Collection, whose setSpec value is “p267401cdi”, in Qualified Dublin Core.
Add the following string to your base harvesting URL: “?verb=ListRecords&set=[setSpec chosen above]&metadataPrefix=oai_qdc”
The full URL is now: https://cdm16007.contentdm.oclc.org/oai/oai.php?verb=ListRecords&set=p267401cdi&metadataPrefix=oai_qdc
This link will display the metadata in XML format. You can see how each metadata value is mapped. Review a few records to make sure your metadata is displaying as expected.
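To make the review step less manual, the sketch below builds the ListRecords URL and counts how often each metadata field appears in one page of results. It is only an illustration: the function names are hypothetical, it assumes each record's metadata element holds a single container element (such as the Qualified Dublin Core wrapper) whose children are the fields, and it does not follow a resumptionToken to later pages.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlencode

# Clark-notation prefix for elements in the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, set_spec, metadata_prefix="oai_qdc"):
    """Build a ListRecords URL for one set and one metadata schema."""
    query = urlencode(
        {"verb": "ListRecords", "set": set_spec, "metadataPrefix": metadata_prefix}
    )
    return f"{base_url}?{query}"


def field_counts(xml_text):
    """Count metadata fields (by local element name) in one response page."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for metadata in root.iter(f"{OAI}metadata"):
        for container in metadata:      # assumed: one schema wrapper per record
            for field in container:     # the individual metadata fields
                local_name = field.tag.rsplit("}", 1)[-1]
                counts[local_name] += 1
    return counts
```

A field that appears far less often than the number of records, or not at all, is a good candidate for a closer look at its mapping.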
Disable Page Level Metadata In OAI-PMH Feed
DPLA requires each item to be described by a single item-level record, not by individual page-level records. Please take a moment to review your OAI-PMH output settings. If you’re using CONTENTdm, please check the following settings:
CONTENTdm Administration > “Administration” tab > “Harvesting”
The “Enable compound object pages” option in the “OAI-PMH” section controls whether page-level records are included in the feed. This option should be disabled so that only item-level records are exposed.
If you’re using another digital asset management system, please contact your server administrator to determine the equivalent settings in your system.
Removing Deleted Records From OAI-PMH Feed
It would be helpful during our initial DPLA setup of your collections if you could remove the references to deleted records in the OAI-PMH feed. This isn’t required, but it simplifies the QA process of your metadata.
If you are using a digital asset management system other than CONTENTdm, please contact your server administrator to find out how to perform this task in your system.
CONTENTdm keeps track of deleted records and sends basic information about them to any OAI-PMH harvesters (including us in our role with the DPLA). To the best of our knowledge, this information isn’t used in any other context in CONTENTdm, so it should be safe to remove the references to these deleted records from the collection (with one caveat explained below). You should still confirm with your CONTENTdm support person that this functionality has not changed recently.
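To see whether a feed currently exposes deleted records, you can look for record headers carrying status="deleted", which is how the OAI-PMH protocol marks them. The sketch below is a hedged illustration: the function name deleted_identifiers is hypothetical, and it inspects a single ListRecords (or ListIdentifiers) response page that you have already downloaded.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for elements in the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def deleted_identifiers(xml_text):
    """Return identifiers of records flagged as deleted in one response page."""
    root = ET.fromstring(xml_text)
    deleted = []
    for header in root.iter(f"{OAI}header"):
        # OAI-PMH marks a deleted record with status="deleted" on its header.
        if header.get("status") == "deleted":
            deleted.append(header.findtext(f"{OAI}identifier", default=""))
    return deleted
```

An empty list for every page of a set suggests that the set no longer reports deleted records.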
It’s not possible to clear these references using the Graphical User Interface (GUI) in CONTENTdm. You must have access to the back-end server so that you can manually delete a file from it. If your CONTENTdm server is hosted by OCLC, you must send a request to the CONTENTdm Support/Hosting teams for this change to be implemented.
Each CONTENTdm collection must be processed separately. The file to be removed from each collection is: /index/description/delete.log
After you delete the “delete.log” file, CONTENTdm will recreate it the next time you delete an item from that collection. This won’t be a problem, as we simply wish to clear out the deleted references during the initial setup, when we’re looking most closely at your data to verify that our harvesting process is correct.
Removing “deleted” references from the OAI-PMH feed should only be problematic if you have another entity harvesting data from these collections and this other entity needs to know when records have been deleted from the collections. This information would then be used to keep that other entity’s records up-to-date. If you have no other harvesters doing this, then removal of the deleted references shouldn’t be a problem.
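For a self-hosted server, the per-collection cleanup described above could be sketched as follows. This is an illustration under stated assumptions, not a supported procedure: the function name clear_delete_logs is hypothetical, you must supply the actual collection directories on your server (their location is not given here), and the script defaults to a dry run that only reports what it finds.

```python
from pathlib import Path

# Path of the log relative to a collection directory, as given above.
DELETE_LOG = Path("index") / "description" / "delete.log"


def clear_delete_logs(collection_dirs, dry_run=True):
    """Find delete.log in each collection directory; remove it unless dry_run.

    collection_dirs is a list of collection directory paths that you supply;
    where they live on your server is an assumption of this sketch.
    Returns the list of delete.log paths that were found.
    """
    found = []
    for collection_dir in collection_dirs:
        log_path = Path(collection_dir) / DELETE_LOG
        if log_path.is_file():
            found.append(log_path)
            if not dry_run:
                log_path.unlink()
    return found
```

Run it once with the default dry_run=True to review the list of files, and pass dry_run=False only after confirming that no other harvester depends on the deleted-record information.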