Ghostreader References

While it's entirely possible to create effective custom prompts purely with plain language and one or two simple variables, the options become nearly limitless with a bit of imagination and the use of some more complex control structures.

In this document, you'll find descriptions of all the variables, subroutines, and other tools you can use to customize your prompts.

For those unfamiliar with the terminology, here's a quick overview:

  • Variables: Placeholders for content that will be pulled from the document or selection each time the prompt is run. Ghostreader's variables include options such as {{ document.title }} and {{ selection.sentence }}.
  • Subroutines: Processes that can be added to modify, quantify, or search within a given variable. They are separated from the variable by a pipe symbol (|) and often take additional arguments. (If you're familiar with Jinja2, you'll know these as "filters".) Ghostreader's subroutines include options like num_tokens and central_sentences.
  • Utilities: Extra tools you can use to craft prompts, such as input and response.

Variables

These are all the variables that can be used to retrieve information about the current document for Ghostreader to use as input for prompts. They should be enclosed within curly brackets ({{ }}) unless they're part of a statement, like {% if %} or {% set %}.
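
As a small illustration of that bracket rule, the sketch below uses the same variable in both positions. It relies only on document.author from this reference. The {# #} lines are standard Jinja comments, and it is an assumption that Ghostreader drops them from the rendered prompt the way stock Jinja does.

```jinja
{# Inside a statement, the variable is written without its own brackets. #}
{% if document.author %}
{# In ordinary prompt text, the variable needs double curly brackets. #}
This document was written by {{ document.author }}.
{% endif %}
```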

document.author

Renders the author of the targeted document, as it appears in the document metadata.

Example Usage:

{% if "Maria Popova" in document.author or "Jillian Hess" in document.author %}

document.category

Renders the category of the targeted document, as it appears in the document metadata.

Example Usage:

{% if "pdf" in document.category %}

document.content

Renders the full text of the targeted document.

Note that this variable should generally be used inside an if-else statement that falls back to the central_sentences or central_paragraphs subroutine for documents that exceed a certain token limit. If the prompt tries to render the full content of, say, a 15,000-word essay, it will exceed GPT-3.5’s context window and fail (or cost you a few dollars per GPT-4 summary).

Example Usage:

{% if (document.content | num_tokens) > 25000 %}
{{ document.html | central_paragraphs | join('\n\n') }}
{% elif (document.content | num_tokens) > 2500 %}
{{ document.content | central_sentences | join('\n\n') }}
{% else %}
{{ document.content }}
{% endif %}

document.domain

Renders the domain that the target document was saved or imported from, such as nytimes.com or jillianhess.substack.com.

Example Usage:

{% if "nytimes.com" in document.domain %}

document.length

Renders the length of the document, in number of words, as it appears in the document metadata.

Example Usage:

{% if document.length > 1000 %}

document.title

Renders the title of the document, as it appears in the document metadata.

Example Usage:

I'm about to read a document entitled "{{ document.title }}" written by {{ document.author }}. I want you to identify the 3 most interesting questions I should consider while reading.

document.source

Renders the source of the targeted document, as it appears in the document metadata. (Only relevant for RSS feeds.)

Example Usage:

{% if "NYT" in document.source %}

document.summary

Renders the current contents of the target document’s summary metadata field. (The summary was probably generated by a prompt itself—how very meta!)

Example Usage:

I'm about to read a document entitled "{{ document.title }}" written by {{ document.author }} having the following summary:
===
{{ document.summary }}
===

document.tags

This variable renders a list of all the tags applied to the target document. Since it returns a list, you should generally apply the join(', ') filter to it to flatten the tags into a comma-separated string.

Example Usage:

Tags: {{ document.tags | join(', ') }}

document.note

Renders the contents of the document note field.

Example Usage:

{% if document.note %}

document.language

Renders the language that's been set in the metadata of the document.

Example Usage:

I'm reading a document in {{ document.language }}.

document.progress

Renders the reading progress of the document. Returns an integer representing the percentage read.
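
This section has no example, so here is a minimal sketch of one possible use. It assumes the integer runs from 0 to 100 and that 100 means the document is finished. The wording of the prompt text is purely illustrative.

```jinja
{% if document.progress < 100 %}
I have read {{ document.progress }}% of "{{ document.title }}" so far. Remind me of the main points a reader would have met by this stage.
{% else %}
I have finished reading "{{ document.title }}". Give me three questions to test my understanding.
{% endif %}
```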

document.highlights

This variable returns an array containing the document’s highlights, each of which carries its own tags and note. To extract the relevant information in a way that GPT can process, you need to use this variable within a for loop, like so:

{% for highlight in document.highlights %}
    Tags: {{ highlight.tags | join(', ') }}
    Note: {{ highlight.note }}
    Highlight: {{ highlight.content }}
{% endfor %}

The highlight.tags variable works in the same way as the document.tags variable and should always be used with the join function.

highlight.note and highlight.content render the content of their respective fields.
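
The loop can also be combined with if statements to pass along only some highlights. The sketch below keeps just the highlights that have a note attached. It assumes that an empty note evaluates as false, which is how Jinja treats empty strings and missing values.

```jinja
{% if (document.highlights | length) > 0 %}
Here are the highlights I annotated with a note:
{% for highlight in document.highlights %}
{% if highlight.note %}
Highlight: {{ highlight.content }}
Note: {{ highlight.note }}
{% endif %}
{% endfor %}
{% else %}
I have not highlighted anything in this document yet.
{% endif %}
```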

document.html

Returns the raw HTML of the document.

Example Usage:

{% if (document.content | num_tokens) > 25000 %}
{{ document.html | central_paragraphs | join('\n\n') }}
{% elif (document.content | num_tokens) > 2500 %}
{{ document.content | central_sentences | join('\n\n') }}
{% else %}
{{ document.content }}
{% endif %}

document.key_sentences

Returns the top sentences of the document in a single string. The number of sentences returned depends on the length of the document, so you can specify a token limit if you need to keep the response within a certain length, e.g. document.key_sentences(200).
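
A minimal sketch of how this might sit inside a prompt, using the 200-token limit mentioned above. The surrounding prompt text is illustrative only.

```jinja
Here are the key sentences from "{{ document.title }}":
===
{{ document.key_sentences(200) }}
===
Based on these sentences, write a two-sentence summary of the document.
```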

selection

Returns the text selected by the user when invoking Ghostreader (i.e. the highlighted passage).

Example Usage:

I just came across the word or phrase "{{ selection }}" as used in the following sentence: "{{ selection.sentence }}"

selection.sentence

Returns the full sentence that contains the selected text.

Example Usage:

I just came across the word or phrase "{{ selection }}" as used in the following sentence: "{{ selection.sentence }}"

selection.paragraph

Returns the full paragraph that contains the selected text.

Example Usage:

Within the document, I selected this text:
{{ selection }}

That selection is contained in the paragraph:
{{ selection.paragraph }}

Subroutines

These can be used to modify the results of a variable or run another process on it. To use a subroutine with a variable, include it inside the same set of curly brackets {{ }}, but separate it from the variable using a pipe symbol |.

Example: {{ document.content | central_sentences | join('\n\n') }}

central_sentences

Mainly used with document.content, this subroutine analyzes the provided content and returns the sentences that best convey its main points. Note that the example usage includes the join() filter, which adds line breaks between each query result to keep the limits of each result clear.

Arguments:

  • top_k: integer (default = none)
  • target_tokens: integer (default = none)
  • document_order: boolean (default = true)

Example Usage:

{{ document.content | central_sentences(top_k=10, target_tokens=500, document_order=true) | join('\n\n') }}

central_paragraphs

Similar to the central_sentences subroutine, but returns full paragraphs. Note that the example usage includes the join() filter, which adds line breaks between each query result to keep the limits of each result clear.

Arguments:

  • html: string
  • target_ratio: float (default = 0.1)
  • max_paragraphs: integer (default = 10)
  • min_paragraph_length: integer (default = 150)
  • max_paragraph_length: integer (default = 700)
  • min_sentence_words: integer (default = 5)
  • max_tokens: integer (default = 3000)
  • max_tokens_tolerance: float (default = 0.1)
  • centrality_token_threshold: integer (default = 50)
  • document_order: boolean (default = true)

Example Usage:

{{ document.html | central_paragraphs(max_paragraphs=30, min_paragraph_length=50, max_paragraph_length=800) | join('\n\n') }}

classify

This subroutine can be used to detect sentiment, labels, and tags without needing to be heavily trained for your specific use-case. Note that running this on long texts or in situations with a lot of classes can be very slow.

Note that the example usage includes the join() filter, but uses a comma as a delimiter rather than two newlines like many of the other subroutines on this page. That's because this subroutine is meant to return single words, rather than full sentences or paragraphs.

Arguments:

  • classes: list, strings
  • hypothesis: string (prepends the selected class in the output)
  • multi_label: boolean (default = false)
  • threshold: float (default = 0.8)

Example Usage:

{{ document.content | truncate | classify(classes=["news", "sports"], hypothesis="This news article was published under {}", multi_label=True) | join(', ') }}

document_range

Returns full sentences, beginning at the document position given by the first argument and ending at the position given by the second.

For example, document_range(10, 12) would return the sentences found between the 10% position and the 12% position.
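
This subroutine has no example usage here, so the following is only a sketch. It assumes that document_range is piped from document.content like the other subroutines and that its output can be rendered directly. If it returns a list in practice, add join('\n\n') as in the other examples.

```jinja
Here is the opening tenth of the document:
===
{{ document.content | document_range(0, 10) }}
===
Based on this opening, what is the author setting out to argue?
```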

lexical_search

Searches the provided content and returns literal keyword matches to the provided query. Note that the example usage includes the join() filter, which adds line breaks between each query result to keep the limits of each result clear.

Arguments:

  • query: string
  • document_order: boolean (default = false)
  • tokens_before: integer (default = 1)
  • tokens_after: integer (default = 1)
  • limit: integer (default = 5)

Example Usage:

{{ document.content | lexical_search(query=selection, document_order=true, tokens_before=2, tokens_after=2, limit=50) | join('\n\n') }}

most_similar

Searches the provided content and returns semantic matches to the provided query. Note that the example usage includes the join() filter, which adds line breaks between each query result to keep the limits of each result clear.

Arguments:

  • query: string
  • top_k: integer (default = 10)
  • sentences_before: integer (default = 0)
  • sentences_after: integer (default = 0)
  • document_order: boolean (default = true)
  • threshold: float (default = 0.5)
  • extractive_summary_result_threshold: integer (default = none)
  • extractive_summary_score_threshold: float (default = none)

Example Usage:

{{ document.content | most_similar(query=selection, top_k=10, document_order=true) | join('\n\n') }}

num_tokens

Counts the number of tokens that the returned content will use. The primary use case for this subroutine is checking that the content won't exceed the model's context window and cause the prompt to fail. Often used inside an if statement.

Example Usage:

{% if (document.content | num_tokens) > 2500 %}

truncate

Shortens the content so that it doesn't exceed the specified number of tokens.

Arguments:

  • max_tokens: integer (default = 1000)

Example Usage:

{{ document.content | truncate(max_tokens=1500) }}

Utilities

This section describes some extra utilities that you can use while crafting your prompts. Some of these are built-in Jinja features, while others are specific to Ghostreader.

set

This Jinja statement lets you declare your own variable. It's useful in combination with other utilities like input and response, as well as to store other variables in certain formats.

Like other Jinja statements such as if, it goes inside curly brackets with percent signs ({% %}).

Example Usage:

{% set doc_tags = document.tags | join(', ') %}
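
Once set, the variable can be reused anywhere later in the prompt. The sketch below extends the example above. It relies on standard Jinja behavior, in which joining an empty list yields an empty string that evaluates as false.

```jinja
{% set doc_tags = document.tags | join(', ') %}
{% if doc_tags %}
This document is tagged: {{ doc_tags }}.
{% else %}
This document has no tags.
{% endif %}
```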

length

The length Jinja filter returns the "number of items in a container" (as stated in the Jinja documentation). For the purposes of Ghostreader prompts, this is most applicable for variables like document.highlights or document.tags that return a list or an array.

Example Usage:

{% if (document.highlights | length) > 2 %}

join

The join Jinja filter is used to concatenate items using a specified delimiter. This is useful for variables like document.tags, and for subroutines like central_sentences.

Example Usage:

{{ document.content | central_sentences | join('\n\n') }}

input

This allows the prompt to take input from the user. The input is usually stored in a variable using the set statement.

Note the use of the minus symbol (-) in the example below. Adding it strips the whitespace immediately before and after the statement, so the set tag doesn't leave stray blank lines in the rendered prompt.

Example Usage:

{%- set query = input() -%}
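
As a fuller sketch, the stored input can then be passed to a subroutine and repeated in the prompt text. The variable name query and the prompt wording are illustrative. The most_similar arguments are the ones listed earlier in this reference.

```jinja
{%- set query = input() -%}
Here are the passages from the document most relevant to my question:
===
{{ document.content | most_similar(query=query, top_k=5) | join('\n\n') }}
===
Using only those passages, answer this question: {{ query }}
```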

response

This powerful utility allows you to chain GPT responses inside of a single prompt. You can use this to tell GPT to analyze the provided content and classify it or answer a question about it, then set that response as a variable that can be used to influence the rest of the prompt.

Note the use of the minus symbol (-) in the example below. Adding it strips the whitespace immediately before and after the statement, so the set tag doesn't leave stray blank lines in the rendered prompt.

Example Usage:

Please extract the most surprising and non-core topic of the passage for further exploration. Do not extract the core topic. Describe this surprising, non-core topic in a few words in the style of the title of a Wikipedia article.
{%- set topic = response() -%}

Write a three-sentence, easy-to-read paragraph in the style of a Wikipedia page teaching me about the topic: {{ topic }}.
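
A stored response can also steer an if statement. The sketch below is one hedged way to do that. The variable name verdict is illustrative, and the sketch assumes that Jinja's built-in lower filter is available in Ghostreader. The passage is repeated in the second step rather than relying on the model remembering the first.

```jinja
Is the following passage written in a technical register? Answer with only the word "yes" or "no".
===
{{ selection.paragraph }}
===
{%- set verdict = response() -%}
{% if "yes" in (verdict | lower) %}
Explain this passage in plain language for a non-specialist: {{ selection.paragraph }}
{% else %}
Summarize this passage in one sentence: {{ selection.paragraph }}
{% endif %}
```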