10.4 PDF Metadata: Two Steps Forward, One Step Back
The two steps forward:
- Documents exported to PDF from applications such as Pages now have metadata embedded in them.
- The filesystem indexes this metadata and can use it to filter and group PDFs (through Spotlight).
The step backward is that metadata embedded in documents such as Pages and Word files is not translated to PDF metadata. Instead, the user’s name is set as the Author and the file name (sans .extension) is set as the Title. This is a reasonable thing to do when the user has not manually input any metadata, but not otherwise.
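The behaviour I would expect can be sketched in a few lines of Python. This is only an illustration of the rule, not how Pages or the system export path is actually implemented; the function name `pdf_metadata_for_export` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def pdf_metadata_for_export(
    source_path: str,
    user_name: str,
    doc_title: Optional[str] = None,
    doc_author: Optional[str] = None,
) -> dict:
    """Choose Title/Author for an exported PDF (hypothetical helper).

    Metadata entered in the source document wins; the file name
    (without extension) and the user's name are only fallbacks.
    """
    has_title = bool(doc_title and doc_title.strip())
    has_author = bool(doc_author and doc_author.strip())
    title = doc_title.strip() if has_title else Path(source_path).stem
    author = doc_author.strip() if has_author else user_name
    return {"Title": title, "Author": author}
```

With no document metadata this reproduces what 10.4 does today (file name and user name); with metadata present, the exported PDF would carry the values the user actually typed.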
I will walk through and illustrate the problem with a recent Walter Pincus story from the Washington Post, “British Intelligence Warned of Iraq War”.
- I copied the text of the article to a Pages document, allowing me to trim out the advertisements and links that would remain if I saved the web page directly as a PDF.
- I added a few bits of information to the Pages document using the Info section of the Document Inspector.
- I exported the Pages document as a PDF.
- I opened the PDF in Preview and opened the Info window (Tools -> Get Info) to see what metadata from the Pages document had been carried over. None.
Based on my (possibly totally incorrect) understanding of the Quartz 2D Programming Guide’s “Creating A PDF”, it should be possible for applications to write their internal metadata to exported PDFs. It may just be that the APIs are new or updated, meaning that developers will need to update their software to use them.
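For what it is worth, the PDF format itself has a standard place for this information: a document information dictionary, referenced from the file trailer, with entries such as /Title and /Author. The Python sketch below does not use Quartz at all; it hand-writes a minimal one-page blank PDF just to show where those values live. The helper names (`pdf_string`, `build_pdf`, `read_info`) are mine, the string handling is ASCII-only, and the reader is a naive regular expression rather than a real PDF parser.

```python
import re


def pdf_string(text: str) -> str:
    """Escape text as a PDF literal string (ASCII-only sketch)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def build_pdf(title: str, author: str) -> bytes:
    """Build a blank one-page PDF whose Info dictionary holds title/author."""
    objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        # Object 4 is the document information dictionary.
        "<< /Title %s /Author %s >>" % (pdf_string(title), pdf_string(author)),
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += ("%d 0 obj\n%s\nendobj\n" % (number, body)).encode("ascii")
    xref_start = len(out)
    out += ("xref\n0 %d\n" % (len(objects) + 1)).encode("ascii")
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += ("%010d 00000 n \n" % offset).encode("ascii")
    # The trailer's /Info entry points at the metadata dictionary.
    out += (
        "trailer\n<< /Size %d /Root 1 0 R /Info 4 0 R >>\nstartxref\n%d\n%%%%EOF\n"
        % (len(objects) + 1, xref_start)
    ).encode("ascii")
    return bytes(out)


def read_info(pdf: bytes) -> dict:
    """Naively read /Title and /Author back (no escaped or nested parens)."""
    text = pdf.decode("latin-1")
    return dict(re.findall(r"/(Title|Author) \(([^()\\]*)\)", text))
```

For example, `read_info(build_pdf("British Intelligence Warned of Iraq War", "Walter Pincus"))` gives back the same title and author. These are the kind of fields I was looking for in Preview's Info window, so an exporting application only needs some way to supply them when the PDF is created.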

