Parsing

An article I saved is missing content such as text, images, videos, or tables. How do I fix missing content in saved articles?

We built a benchmark test that we run against the top 200 articles saved to Readwise from Instapaper and Pocket, and Reader already parses them better. Even so, it's not possible to parse the open web 100% correctly 100% of the time.

If you save documents using the browser extensions on web or using the Safari browser and share sheet on iOS, these methods will generally result in the highest quality parsing because Reader is getting the full document content rather than the naked URL.

If the document is still missing content (such as images), you should report it through the feedback section of the Reader app by selecting Report document parsing issue. (Make sure the app is open to the document you're reporting when you do this to ensure that we receive the correct metadata in your report!) We have an engineer dedicated to fixing parsing tickets and we are constantly upgrading our parsing.

An article I saved includes non-core content such as an advertisement. How do I fix extra content in saved articles?

If the document has extraneous content (such as inline advertisements), you should report it through the feedback section of the Reader app by selecting Report document parsing issue. (Make sure the app is open to the document you're reporting when you do this to ensure that we receive the correct metadata in your report!) We have an engineer dedicated to fixing parsing tickets and we are constantly upgrading our parsing.

How do I save articles behind paywalls?

If you save documents using the browser extensions on web or using the Safari browser and share sheet on iOS, paywalled documents should be saved without issue.

I noticed that articles from large news sites such as NYT, Washington Post, Medium, and so on do not contain the full content. How do I save the full content of an article behind a paywall?

If you save documents using the browser extensions on web or using the Safari browser and share sheet on iOS, paywalled documents should be saved without issue. Note that if you save directly from another app on iOS (e.g. NYTimes, Medium, etc), this may result in partial parsing because Reader can only get the naked URL and those apps aggressively block read-it-later apps.

If you're seeing documents come over with partial content from a source in your Feed, you can quickly open the original article in your browser (by pressing O on your keyboard or tapping ... > Open on mobile), make sure you're logged into the site in question (e.g. NYT, Medium, etc), and then re-save the page using the browser extension or mobile share sheet.

Will Reader store the content of my saved articles even if the original article is changed or removed from the web?

Yes, saving an article to Reader parses the page and saves the content as-is. Reader will never try to re-parse previously saved content, so the version in your Reader library will always reflect the way you originally read it, and your highlights and notes will never lose their context.

What if I want to manually refresh an article to reflect updates to the original?

Since Reader intentionally stores the originally saved version to preserve highlight and note context, the only way to get an updated version of an article is to delete it from your library and re-save it. Note that this will also delete any highlights and notes associated with the document, in both Reader and Readwise.

How does Reader detect duplicate content?

Reader’s current de-duping logic will catch any documents that are saved multiple times with the same URL. If you re-save a URL, the document will be moved to the top of your Library and will feature a green dot in the upper left to indicate that you’ve saved it more than once.

However, the current logic isn’t able to detect duplicate content, so saving the same document from slightly different URLs will result in the creation of a second version. This often happens when a URL contains tracking information or other additional parameters (e.g. https://site-name.com/article-title?utm_source=rss-feed vs https://site-name.com/article-title), and this is the primary reason that articles saved from a Feed source won’t be recognized as duplicates of the same articles saved from the original website.
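
If you save URLs through your own scripts or automations, one way to reduce these duplicates is to strip tracking parameters before saving. The sketch below is only an illustration and is not Reader's de-duping logic: the function name strip_tracking and the list of parameters treated as tracking (utm_*, fbclid, gclid) are assumptions chosen for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters for this example; not taken from Reader.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the parameters we kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_tracking("https://site-name.com/article-title?utm_source=rss-feed"))
```

With this normalization, both example URLs above reduce to https://site-name.com/article-title, so a URL-based duplicate check would treat them as the same document. Parameters that are not on the assumed list are left untouched, since some sites use query parameters to identify the article itself.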