pdf annotation extractor

From their README

PDF Annotation Extractor

An Alfred Workflow to extract annotations as Markdown & insert Pandoc Citations as References.

Automatically determines correct page numbers, merges highlights across page breaks, prepends a YAML header with bibliographic information, and adds some more small quality-of-life conveniences.


  • Requirement: Alfred 5 with Powerpack

  • Install Homebrew

  • Install pdfannots2json by pasting the following into your terminal:

    brew install mgmeyers/pdfannots2json/pdfannots2json
  • Download the latest release.

  • Set the hotkey by double-clicking the sky-blue field at the top left.

  • Set up the workflow configuration inside the app.


Requirements for the PDF

  • The PDF Annotation Extractor works on any PDF that has valid annotations saved in the PDF file. (Some PDF readers like Skim or Zotero 6 do not store annotations in the PDF itself by default.)
  • The filename of the PDF must be exactly the citekey (without @), optionally followed by an underscore and some text like {citekey}_{title}.pdf. The citekey must not contain underscores (_).

You can achieve such a filename pattern with automatic renaming rules of most reference managers, for example with the ZotFile plugin for Zotero or the AutoFile feature of BibDesk.
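
The filename rule above can be sketched in a few lines of Python. This is only an illustration of the naming convention, not the workflow's own code; the function name `citekey_from_filename` and the example filenames are hypothetical.

```python
from pathlib import Path


def citekey_from_filename(pdf_path: str) -> str:
    """Return the citekey encoded in a PDF filename.

    Assumes the pattern described above: the file stem is either the
    citekey alone or `{citekey}_{anything}`, and the citekey itself
    contains no underscores.
    """
    path = Path(pdf_path)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF file: {pdf_path}")
    # Everything before the first underscore is treated as the citekey.
    citekey = path.stem.split("_", 1)[0]
    if not citekey:
        raise ValueError(f"no citekey found in: {pdf_path}")
    return citekey


print(citekey_from_filename("Smith2019_Some_Long_Title.pdf"))
print(citekey_from_filename("Smith2019.pdf"))
```

Because the split happens at the first underscore, a citekey that itself contained an underscore would be cut short, which is why the rule forbids underscores in citekeys.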


Use the hotkey to trigger the Annotation Extraction on the PDF file currently selected in Finder.

Annotation Types extracted

  • Highlight ➡️ bullet point, quoting text and prepending the comment
  • Underline ➡️ output to Drafts.app; they are not included in the annotations.
  • Free Comment ➡️ blockquote of the comment text
  • Strikethrough ➡️ Markdown strikethrough
  • Rectangle ➡️ image
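
As a rough sketch of the mapping listed above, the following Python function turns one annotation into a line of Markdown with a Pandoc citation. The dictionary keys (`type`, `text`, `comment`, `page`), the type names, and the exact output layout are assumptions made for illustration; the workflow's real output may look different. Underlines and rectangles are left out of this sketch.

```python
from typing import Optional


def to_markdown(annotation: dict, citekey: str) -> Optional[str]:
    """Format one annotation as Markdown (hypothetical data layout)."""
    kind = annotation["type"]
    text = annotation.get("text", "").strip()
    comment = annotation.get("comment", "").strip()
    # Pandoc citation with a page locator, e.g. [@Smith2019, p. 12]
    cite = f"[@{citekey}, p. {annotation['page']}]"

    if kind == "highlight":
        # Bullet point: comment first (if any), then the quoted text.
        prefix = f"{comment} " if comment else ""
        return f'- {prefix}"{text}" {cite}'
    if kind == "free comment":
        # Blockquote of the comment text.
        return f"> {comment} {cite}"
    if kind == "strikethrough":
        return f"~~{text}~~ {cite}"
    # Other types are not handled in this sketch.
    return None


example = {"type": "highlight", "text": "foo bar", "comment": "Key claim:", "page": 12}
print(to_markdown(example, "Smith2019"))
```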

Automatic Page Number Identification

Instead of the PDF page numbers, this workflow retrieves information about the real page numbers from the BibTeX library and inserts them. If there is no page data in the BibTeX entry (e.g., monographs), you are prompted to enter the page number manually.

  • In that case, enter the real page number of your first PDF page.
  • If there is content before the actual text (e.g., a foreword or table of contents), the real page number 1 often occurs later in the PDF. In that case, you must enter a negative page number, reflecting the page number the first PDF page would have if the numbering were continued backwards. Example: Your PDF is a book with a foreword that uses Roman numerals; real page number 1 is PDF page number 12. Continuing the numbering backwards, the first PDF page would have page number -10, so you enter -10 when prompted for a page number.
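
The arithmetic behind the example above can be sketched as follows. This is an illustration of the offset calculation only, not the workflow's actual code, and both function names are hypothetical.

```python
def first_real_page(pdf_page: int, real_page: int) -> int:
    """Printed page number that PDF page 1 would have.

    `pdf_page` is the 1-based PDF page on which the printed page
    number `real_page` appears.
    """
    return real_page - (pdf_page - 1)


def to_real_page(pdf_page: int, first_page: int) -> int:
    """Map a 1-based PDF page to its printed page number, given the
    printed number of PDF page 1 (from BibTeX or entered manually)."""
    return first_page + (pdf_page - 1)


# Example from the text: real page 1 sits on PDF page 12.
offset_start = first_real_page(pdf_page=12, real_page=1)
print(offset_start)                    # -10, the value to enter
print(to_real_page(12, offset_start))  # 1
print(to_real_page(1, 217))            # 217, e.g. an article starting on p. 217
```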

Annotation Codes

Insert these special codes at the beginning of an annotation to invoke special actions on that annotation. Annotation Codes do not apply to Strikethroughs. (You can run the Alfred command acode to display a cheat sheet showing all the following information.)

  • +: Merge this highlight/underline with the previous highlight/underline. Works for annotations on the same page (= skipping text in between) and for annotations across two pages.
  • ? foo (free comments): Turns "foo" into a Question Callout (> [!QUESTION]) and moves it up. (Callouts are Obsidian-specific syntax.)
  • ##: Turns highlighted/underlined text into a heading that is added at that location. The number of # determines the heading level. If the annotation is a free comment, the text following the # is used as heading instead (Space after # required).
  • =: Adds highlighted/underlined text as tags to the YAML-frontmatter (mostly used for Obsidian as output). If the annotation is a free comment, uses the text after the =. In both cases, the annotation is removed afterwards.
  • _ (highlights only): Removes the _ and creates a copy of the annotation, but with the type underline. This annotation code avoids having to highlight and underline the same text segment to have it in both places.
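
To make the `+` code more concrete, here is a minimal Python sketch of merging a continuation into the previous annotation. It assumes each annotation is a dictionary with hypothetical `text` and `comment` keys and that the code sits at the start of the comment; it is not the workflow's own implementation, and the other codes are not covered.

```python
from typing import List


def merge_continuations(annotations: List[dict]) -> List[dict]:
    """Merge annotations whose comment starts with "+" into the
    preceding annotation (hypothetical data layout)."""
    merged: List[dict] = []
    for anno in annotations:
        comment = anno.get("comment", "").strip()
        if comment.startswith("+") and merged:
            prev = merged[-1]
            # Append the continued text to the previous annotation.
            prev["text"] = f'{prev["text"]} {anno["text"]}'.strip()
            # Keep any comment text that follows the "+" code.
            rest = comment[1:].strip()
            if rest:
                prev["comment"] = f'{prev.get("comment", "")} {rest}'.strip()
        else:
            # Copy so the input list is left untouched. A "+" with no
            # predecessor is kept as an ordinary annotation.
            merged.append(dict(anno))
    return merged


annos = [
    {"text": "first part", "comment": "note", "page": 3},
    {"text": "second part", "comment": "+", "page": 4},
    {"text": "unrelated", "comment": "", "page": 4},
]
for a in merge_continuations(annos):
    print(a["page"], a["text"])
```

In this sketch the merged annotation keeps the page of its first part, which is one plausible choice for highlights that run across a page break.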

Extracting Images

  • Any rectangle type annotation in the PDF is extracted as an image.
  • The respective image is saved in the attachments subfolder of the output folder, and named {citekey}_image{n}.png.
  • The image is embedded in the Markdown file with the ![[ ]] syntax, e.g. ![[filename.png|foobar]]
  • If the rectangle annotation has a comment, it is used as the alt-text for the image. (Note that some PDF readers like PDF Expert do not allow you to add a comment to rectangular annotations.)
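
The naming and embedding rules in this list can be sketched in Python as follows. The function name, the `output_folder` parameter, and the choice to start counting at 1 are assumptions for illustration; only the filename pattern, the attachments subfolder, and the ![[ ]] syntax come from the description.

```python
from pathlib import Path
from typing import Tuple


def image_target(output_folder: str, citekey: str, n: int, alt_text: str = "") -> Tuple[Path, str]:
    """Return the save path and the Markdown embed for the n-th image."""
    filename = f"{citekey}_image{n}.png"
    # Images go into the "attachments" subfolder of the output folder.
    path = Path(output_folder) / "attachments" / filename
    # The rectangle's comment, if any, becomes the alt-text after "|".
    embed = f"![[{filename}|{alt_text}]]" if alt_text else f"![[{filename}]]"
    return path, embed


path, embed = image_target("notes", "Smith2019", 1, "Figure 2")
print(path.name)
print(embed)
```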


  • Update to the latest version of pdfannots2json by running brew upgrade pdfannots2json in your terminal.
  • This workflow does not work with annotations that are not actually saved in the PDF file. Some PDF readers like Skim or Zotero 6 store annotations outside the PDF by default, but you can tell those PDF readers to save the notes in the actual PDF.
  • This workflow sometimes does not work when the PDF has bigger free-form annotations (e.g., from using a stylus on a tablet). Delete all those annotations that are "free form" and the workflow should work.
  • If the hotkey does not work when triggered in Preview, the Alfred app most likely does not have permission to access the app. You can give Alfred permission in the macOS System Settings.
  • There are some cases where the extracted text is all jumbled up. In that case, it is a problem with the upstream pdfannots2json. The issue is tracked here, and you can also report your problem.

As a fallback, you can use pdfannots as extraction engine, as a different PDF engine sometimes fixes issues. This requires installing pdfannots via pip3 install pdfannots, and switching the fallback engine via aconf. Note that pdfannots does not support image extraction or extracting only recent annotations, so generally you want to keep using pdfannots2json.



About the Developer

In my day job, I am a sociologist studying the social mechanisms underlying the digital economy. For my PhD project, I investigate the governance of the app economy and how software ecosystems manage the tension between innovation and compatibility. If you are interested in this subject, feel free to get in touch!
