PDF Module 1.0

1 Introduction

This module allows manipulation of PDF documents.

1.1 Namespace conventions

The module defined by this document defines functions and elements in the namespace http://expath.org/ns/pdf. In this document, the pdf prefix, when used, is bound to this namespace URI.

Error codes are defined in the namespace http://expath.org/ns/error. In this document, the err prefix, when used, is bound to this namespace URI.

1.2 Error management

Error conditions are identified by a code (a QName). When such an error condition is reached during the execution of the function, a dynamic error is thrown, with the corresponding error code (as if the standard XPath function error had been called).
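As an illustration of the paragraph above, a caller could trap such dynamic errors with the standard XQuery try/catch expression. This is a minimal sketch, not a normative example. It assumes that the processor resolves the module import by namespace URI alone, that `$pdf-document` is supplied externally, and it uses the hypothetical prefix `experr` for the error namespace so that the predeclared `err` prefix of try/catch (which gives access to `$err:code` and `$err:description`) stays available.

```xquery
xquery version "3.1";

(: Assumption: the processor provides this module under its namespace URI. :)
import module namespace pdf = "http://expath.org/ns/pdf";

(: Hypothetical prefix for the error namespace of section 1.1. :)
declare namespace experr = "http://expath.org/ns/error";

declare variable $pdf-document as xs:base64Binary external;

try {
  pdf:reverse($pdf-document)
} catch experr:* {
  (: Re-raise the same error code with some added context. :)
  error($err:code, "PDF processing failed: " || $err:description)
}
```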

2 Generalities

This section contains general information related to the functions detailed below.

2.1 Page ranges

The syntax for page ranges is as follows: for a single page, use the page number; for a range of pages, put a hyphen between the start page and the end page.

Single page: 17.

Page range: 17 - 22.
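The syntax above can be made concrete with a small helper that expands a page range string into the page numbers it covers. The function `local:expand-page-range` is hypothetical and not part of this module; it is only a sketch of how the two forms could be interpreted, assuming that whitespace around the hyphen is optional.

```xquery
xquery version "3.1";

(: Hypothetical helper, not defined by this module: expands "17" or
   "17 - 22" into the sequence of page numbers it denotes. :)
declare function local:expand-page-range($range as xs:string) as xs:integer* {
  (: Split on the hyphen and trim the surrounding whitespace. :)
  let $parts := tokenize($range, "-") ! normalize-space(.)
  return
    if (count($parts) = 1) then
      xs:integer($parts[1])
    else if (count($parts) = 2) then
      xs:integer($parts[1]) to xs:integer($parts[2])
    else
      error(xs:QName("local:invalid-range"), "Not a page range: " || $range)
};

(: Yields 17, then 17 18 19 20 21 22. :)
local:expand-page-range("17"),
local:expand-page-range("17 - 22")
```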

3 Creation

These functions are for creation of PDF documents.

3.1 The `pdf:create()` Function

This function is used for creating a PDF document.

pdf:create($contents as xs:base64Binary*) as xs:base64Binary

3.2 The `pdf:create-page()` Function

This function is used for creating a PDF page.

pdf:create-page($contents as xs:base64Binary?) as xs:base64Binary

4 Metadata

These functions are for getting and setting metadata about a PDF document and its contents.

4.1 Overall metadata

This is metadata about the document itself.

4.1.1 Basic overall metadata

The basic metadata is given in the table below (this is an excerpt from [PDF Reference 1.7]):

Basic overall PDF metadata
Key (as `xs:string`)	Value	Meaning
`Title`	`xs:string`	The document’s title. (Optional)
`Author`	`xs:string`	The name of the person who created the document. (Optional)
`Subject`	`xs:string`	The subject of the document. (Optional)
`Keywords`	`xs:string`	Keywords associated with the document. (Optional)
`Creator`	`xs:string`	If the document was converted to PDF from another format, the name of the application (for example, Adobe FrameMaker®) that created the original document from which it was converted. (Optional)
`Producer`	`xs:string`	If the document was converted to PDF from another format, the name of the application (for example, Acrobat Distiller) that converted it to PDF. (Optional)
`CreationDate`	`xs:date`	The date and time the document was created. (Optional)
`ModDate`	`xs:date`	The date and time the document was most recently modified. (Required if `PieceInfo` is present in the document catalog; otherwise optional)
`Trapped`	`xs:string`	A string indicating whether the document has been modified to include trapping information (see [PDF Reference 1.7], section 10.10.5, “Trapping Support”). The legal values are: "True", "False", and "Unknown". The default value is "Unknown". (Optional)

4.1.2 Custom overall metadata

This is custom metadata about the document itself. Each item of custom metadata is read or written as an entry in the map(xs:string, xs:string) that constitutes the document's overall metadata.

4.2 The `pdf:get-document-metadata()` Function

This function is used to get the global metadata for a PDF document, as it is defined in [PDF Reference 1.7], section 10.2 Metadata. It returns the basic metadata along with the custom metadata. See 4.1 Overall metadata for details about the overall metadata.

pdf:get-document-metadata($contents as xs:base64Binary?) as map(xs:string, xs:string)?

$contents is the PDF contents to get the metadata from.
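Because the result is a map, individual entries can be read with the XQuery 3.1 lookup operator. The following is a sketch only; it assumes that the processor resolves the module import by namespace URI and that `$pdf-document` is supplied externally.

```xquery
xquery version "3.1";

(: Assumption: the processor provides this module under its namespace URI. :)
import module namespace pdf = "http://expath.org/ns/pdf";

declare variable $pdf-document as xs:base64Binary external;

let $metadata := pdf:get-document-metadata($pdf-document)
return (
  (: Each lookup yields the empty sequence if the key is absent
     or if no metadata map was returned at all. :)
  $metadata?Title,
  $metadata?Author
)
```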

4.3 The `pdf:set-document-metadata()` Function

This function is used to set the overall metadata for a PDF document.

pdf:set-document-metadata($contents as xs:base64Binary?, $metadata as item()) as xs:base64Binary*

$contents is the PDF contents to apply the metadata to.
$metadata is the overall metadata to be applied. See 4.1 Overall metadata for details about the overall metadata.
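One plausible way to use this function is to read the existing metadata, change a few entries, and write the map back. The sketch below assumes that the processor resolves the module import by namespace URI, that `$pdf-document` is supplied externally, and that `$metadata` accepts a map of the shape returned by `pdf:get-document-metadata()`; the title value is purely illustrative.

```xquery
xquery version "3.1";

(: Assumption: the processor provides this module under its namespace URI. :)
import module namespace pdf = "http://expath.org/ns/pdf";

declare variable $pdf-document as xs:base64Binary external;

(: Fall back to an empty map if the document carries no metadata. :)
let $current := (pdf:get-document-metadata($pdf-document), map {})[1]
let $updated :=
  $current
    => map:put("Title", "Quarterly report")  (: illustrative value :)
    => map:put("Trapped", "False")           (: one of the legal values :)
return pdf:set-document-metadata($pdf-document, $updated)
```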

4.4 Component metadata

This is metadata about individual components of a document.

5 Content navigation

These functions are for navigation among the PDF objects of a PDF document.

6 Content manipulation

These functions are for manipulation of contents of the PDF documents.

6.1 The `pdf:merge()` Function

This function is used for merging multiple PDF documents or subsets of them (groups of pages).

pdf:merge($contents as xs:base64Binary,
	$new-resource-metadata as element(pdf:resource-metadata)) as xs:base64Binary

6.2 The `pdf:split()` Function

This function is used for splitting a PDF document into a number of sections. It returns a sequence of sections.

There can be many options for splitting: page splitting, bookmarks splitting, extract until text.

pdf:split($contents as xs:base64Binary?,
	$options as element(pdf:options)) as xs:base64Binary*

$contents is the PDF contents to be split.
$options are the options for the current operation.

6.2.1 The `pdf:options` element

	<pdf:options>
		(pdf:split-delimiter)
	</pdf:options>

The pdf:split-delimiter child element specifies the delimiter used for splitting the input PDF document.

6.3 The `pdf:extract()` Function

This function is used for extracting pages from a PDF document. It has a parameter for setting the deletion of the respective pages after extraction. It returns the extracted pages.

6.4 The `pdf:insert()` Function

This function is used for inserting pages into a PDF document.

6.5 The `pdf:delete()` Function

This function is used for deleting pages from a PDF document. It returns the PDF document with the respective pages removed.

pdf:delete($contents as xs:base64Binary?, $page-ranges as xs:string*) as xs:base64Binary?

$contents is the PDF contents from which certain pages are to be deleted.
$page-ranges is the specification of the page ranges to be deleted. The syntax for page ranges is described in 2.1 Page ranges. For an example of usage, see scenario 16.5 Delete pages.

6.6 The `pdf:rotate()` Function

This function is used for rotating indicated pages of a PDF document. The rest of the pages remain unchanged and the page order is maintained. It returns the modified PDF document.

pdf:rotate($contents as xs:base64Binary?, $page-ranges-and-directions as map(xs:string, xs:string)?) as xs:base64Binary?

$contents is the PDF contents in which certain pages are to be rotated.
$page-ranges-and-directions is the specification of the page ranges to be rotated and of their rotation directions. This argument is a map, having as keys the page ranges and as values the corresponding directions. The syntax for page ranges is described in 2.1 Page ranges. The legal values for rotation direction are: "right 90" (clockwise by 90 degrees), "180" (rotation by 180 degrees), and "left 90" (counterclockwise by 90 degrees). For an example of usage, see scenario 16.6 Rotate pages.
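Since only three direction values are legal, a caller may want to check a rotation map before passing it to the function. The helper `local:illegal-directions` below is hypothetical and not part of this module; it is a sketch that returns the page ranges whose direction is not one of the legal values listed above.

```xquery
xquery version "3.1";

(: Hypothetical helper, not defined by this module: returns the keys of
   the rotation map whose value is not a legal rotation direction. :)
declare function local:illegal-directions(
  $rotations as map(xs:string, xs:string)
) as xs:string* {
  let $legal := ("right 90", "180", "left 90")
  for $range in map:keys($rotations)
  where not($rotations($range) = $legal)
  return $range
};

(: Yields "5 - 7", because "upside down" is not a legal direction. :)
local:illegal-directions(map {
  "3": "right 90",
  "5 - 7": "upside down"
})
```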

6.7 The `pdf:reverse()` Function

This function is used for reversing pages of a PDF document. It returns the PDF document with its pages in reverse order.

pdf:reverse($contents as xs:base64Binary?) as xs:base64Binary?

$contents is the PDF contents for which to reverse page order.
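The functions `pdf:delete()`, `pdf:rotate()`, and `pdf:reverse()` each take the PDF contents as their first argument and return the modified contents, so they can be chained with the XQuery 3.1 arrow operator. The sketch below assumes that the processor resolves the module import by namespace URI and that `$pdf-document` is supplied externally; the page ranges are illustrative.

```xquery
xquery version "3.1";

(: Assumption: the processor provides this module under its namespace URI. :)
import module namespace pdf = "http://expath.org/ns/pdf";

declare variable $pdf-document as xs:base64Binary external;

(: Each step receives the result of the previous one as $contents. :)
$pdf-document
  => pdf:delete(("3", "5 - 7"))
  => pdf:rotate(map { "1": "right 90" })
  => pdf:reverse()
```

Note that the page numbers given to `pdf:rotate()` here refer to the document as it stands after the deletion step.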

7 Links and bookmarks

These functions are for manipulation of links and bookmarks of a PDF document.

7.1 The `pdf:create-bookmark()` Function

This function is used to create bookmarks in a PDF document.

7.2 The `pdf:edit-bookmark()` Function

This function is used to edit bookmarks of a PDF document.

7.3 The `pdf:delete-bookmark()` Function

This function is used to delete bookmarks from a PDF document.

7.4 The `pdf:import-bookmarks()` Function

This function is used to import bookmarks to a PDF document.

7.5 The `pdf:export-bookmarks()` Function

This function is used to export bookmarks from a PDF document.

7.6 The `pdf:create-link()` Function

This function is used to create links in a PDF document.

7.7 The `pdf:edit-link()` Function

This function is used to edit links of a PDF document.

7.8 The `pdf:delete-link()` Function

This function is used to delete links from a PDF document.

7.9 The `pdf:import-links()` Function

This function is used to import links to a PDF document.

7.10 The `pdf:export-links()` Function

This function is used to export links from a PDF document.

8 Form controls

These functions are designated for gathering information about and interacting with form controls of a PDF document.

8.1 The `pdf:get-text-fields()` Function

This function gets all the text fields of the PDF contents. It returns a map containing, for each text field, a pair of fully qualified name and value.

pdf:get-text-fields($contents as xs:base64Binary?) as map(xs:string, xs:string)?

$contents is the PDF contents to get the text fields from.

8.2 The `pdf:set-text-fields()` Function

This function sets the text fields of the PDF contents. It returns the updated PDF contents.

pdf:set-text-fields($contents as xs:base64Binary?, $text-fields as map(xs:string, xs:string)?) as xs:base64Binary?

$contents is the PDF contents in which to set the text fields.
$text-fields is a map containing, for each text field to be set, a pair of fully qualified name and value.
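The two text field functions can be combined into a round trip: read the fields, transform the values, and write them back. The following is a sketch under the assumptions that the processor resolves the module import by namespace URI and that `$pdf-document` is supplied externally. It normalizes the whitespace of every field value.

```xquery
xquery version "3.1";

(: Assumption: the processor provides this module under its namespace URI. :)
import module namespace pdf = "http://expath.org/ns/pdf";

declare variable $pdf-document as xs:base64Binary external;

(: Fall back to an empty map if the document has no text fields. :)
let $fields := (pdf:get-text-fields($pdf-document), map {})[1]
let $cleaned := map:merge(
  map:for-each($fields, function($name, $value) {
    (: Keep the field name, tidy up the value. :)
    map:entry($name, normalize-space($value))
  })
)
return pdf:set-text-fields($pdf-document, $cleaned)
```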

9 Stamping, commenting, annotating, and marking

These functions associate various objects with a PDF document.

9.1 The `pdf:stamp()` Function

This function is used for applying a stamp to a PDF document. It has two signatures, the first one without a CSS selector for the stamp.

When using the first signature, the content of the $stamp-styling parameter should consist of a set of CSS declarations that will be applied only to the respective stamp. Also, the respective stamp will be applied on every page of the PDF contents.

pdf:stamp($contents as xs:base64Binary?, $stamp as item(), $stamp-styling as xs:string) as xs:base64Binary?

pdf:stamp($contents as xs:base64Binary?, $stamp as item(), $stamp-selector as xs:string, $stamp-styling as xs:string) as xs:base64Binary?

$contents is the PDF contents whose pages are to be stamped.
$stamp is the stamp to be applied to the PDF contents. The stamp can be a text, an image, a PDF document, an HTML + JavaScript + CSS document, an SVG document, etc. (implementations should define which formats are supported).
$stamp-selector is the CSS selector used to match the current stamp. Such selector is needed for further operations on the stamp, such as updating and deleting.
$stamp-styling is the CSS styling for the current stamp.
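A call using the second signature might look as follows. This is a sketch only: the selector `#draft-stamp`, the stamp text, and the CSS declarations are illustrative, and whether an implementation supports text stamps or these particular CSS properties is an assumption. It also assumes that the processor resolves the module import by namespace URI and that `$pdf-document` is supplied externally.

```xquery
xquery version "3.1";

(: Assumption: the processor provides this module under its namespace URI. :)
import module namespace pdf = "http://expath.org/ns/pdf";

declare variable $pdf-document as xs:base64Binary external;

(: Illustrative CSS declarations; support is implementation-defined. :)
let $styling := "color: red; font-size: 48pt; opacity: 0.3;"
return
  pdf:stamp($pdf-document, "DRAFT", "#draft-stamp", $styling)
```

Giving the stamp a selector is what would later allow it to be matched again for updating or deleting.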

10 Contents rasterization

These functions are for rendering the PDF documents.

10.1 The `pdf:to-image()` Function

This function is used for converting pages of a PDF document to images. Returns a sequence of the generated images.

pdf:to-image($contents as xs:base64Binary?, $format as xs:string, $scaling as xs:string) as xs:base64Binary*

$contents is the PDF contents whose pages are to be converted to images.
$format is the format of the output images.
$scaling is the scaling of the output images.

11 Attachments

These functions are for manipulation of PDF file attachments.

11.1 The `pdf:list-file-attachments()` Function

This function is used to list the file attachments of a PDF document.

11.2 The `pdf:import-attachment()` Function

This function is used for importing an attachment into a PDF document.

11.3 The `pdf:export-attachment()` Function

This function is used for exporting an attachment from a PDF document.

12 Validation

These functions are designated for validating a PDF document.

12.1 The `pdf:validate()` Function

This function is used for validating a PDF document.

12.2 The `pdf:validate-links()` Function

This function is used for validating the links contained by a PDF document.

12.3 The `pdf:validate-bookmarks()` Function

This function is used for validating the bookmarks contained by a PDF document.

13 Optimization

These functions are designated for optimizing a PDF document.

13.1 The `pdf:optimize()` Function

This function is used for optimizing a PDF document, by reducing the file size without affecting quality.

13.2 The `pdf:linearize()` Function

This function is used for linearizing a PDF document, for fast delivery over a network. The first page can be displayed while the rest of the document is still being downloaded in the background.

13.3 The `pdf:compress()` Function

This function is used for compressing a PDF document.

13.4 The `pdf:uncompress()` Function

This function is used for uncompressing a PDF document.

14 Security

These functions are for protection of the PDF documents.

14.1 The `pdf:encrypt()` Function

This function is used for encrypting a PDF document by using a digital certificate.

14.2 The `pdf:decrypt()` Function

This function is used for decrypting a PDF document.

14.3 The `pdf:add-signature()` Function

This function is used for adding an electronic signature to a PDF document.

14.4 The `pdf:remove-signature()` Function

This function is used for removing an electronic signature from a PDF document.

15 Repairing

These functions are for repairing and recovering PDF documents.

15.1 The `pdf:repair()` Function

This function is used for repairing a PDF document.

16 Scenarios of usage

This section presents usage scenarios for the functions defined by this module.

16.1 Insert a (blank) page every nth page

pdf:insert(pdf:create-page(), $pattern)

16.2 Delete broken links

for $broken-link in pdf:validate-links($pdf-document)
return if () then () else ()

16.3 Audit Bookmarks and Links

One can validate bookmarks and links, and export those found to be broken. These can be included in a report, or fixed and imported back into the document.

16.4 Validate PDF document

One can validate a PDF document using certain validation criteria, such as: the dimensions of all or certain pages should be A4 (297 mm * 210 mm), there should be no contents within a certain rectangular area of a page (the left margin where the print shop inserts a bar code, for instance), the number of pages should be less than N, or a particular PDF version should be used.

16.5 Delete pages

pdf:delete($pdf-document, ("3", "5 - 7", "13 - 17"))

16.6 Rotate pages

pdf:rotate($pdf-document, map {
	"3": "right 90",
	"5 - 7": "180",
	"13 - 17": "left 90"
})

EXPath Candidate Module 9 April 2013
