ThePendulum 7de8f94caa Added hash comparison to duplicate avoidance. 2019-11-05 04:52:17 +01:00

ripunzel

A powerful utility to fetch almost all of a reddit user's content. It supports many image and video hosts and offers extensive file naming options.

Features

Most features are optional and can easily be disabled!

  • Freely configure target paths with variables like username, formatted date, post title, subreddit, image ID, index in album and many more. All variables can be used to create directories
  • Fetch any (reasonable) number of users in one go, with their profile image and description
  • Search various archives for posts no longer listed on reddit, but for which the source or preview is still available
  • Save image descriptions or other variables as metadata (EXIF for JPEG only, broader support coming soon!)
  • Avoid duplicates
  • Extract single images from albums

Supported hosts

  • Reddit text/self, images and videos*
  • Imgur (requires API key as of late 2019)
  • Gfycat
  • PornHub (videos)
  • Erome
  • Vidble
  • Eroshare archive

Plans and ideas

  • Avoid redownloading unless specified otherwise
  • Support for more image hosts (e.g. vidble, erome)
  • Watch-mode (keep the process running and automatically save new posts for specified users)
  • Templates for text/self posts (use any variable inside text files)
  • Support for subreddits
  • Expand metadata support to PNGs, GIFs and videos
  • Save additional details to an index file
  • Only download non-default profile images (avoid standard avatars)

Installation

ripunzel requires a reasonably recent version of Node.js. Before use, dependencies must be installed as follows:

npm install

Usage

npm start -- (--user <username> | --post <post-id> | --fetch <content-url>)

Optional arguments

  • --users <username> [<username>...]: You may fetch posts from multiple users by supplying a space-separated list of usernames to --users.
  • --posts <post-id> [<post-id>...]: Fetch multiple posts by supplying a space-separated list of post IDs to --posts.
  • --fetch <content-url>: Fetch content directly from a URL to an album or image on one of the supported hosts
  • --file-users <filepath>: Fetch posts from multiple users by supplying a file with newline-separated usernames
  • --file-posts <filepath>: Fetch multiple posts by supplying a file with newline-separated post IDs
  • --file-fetch <filepath>: Fetch content directly from multiple sources by supplying a file with newline-separated URLs
  • --limit <number>: Maximum number of posts per user to fetch content from. The limit is applied after filtering out ignored posts, crossposts and reposts. Posts requested directly by ID may be discarded as duplicates, but are not otherwise affected by the limit.
  • --sort <method>: How posts should be sorted when fetched. This affects the {post.index} variable, and in combination with --limit decides which posts will be included.
  • --ignore <prop> [<prop>...]: Ignore posts with any of the following properties: pinned, stickied, hidden, over_18, spoiler.
  • --exclude <source> [<source>...]: Do not include posts from these sources (e.g. self, reddit, imgur, gfycat, ...). Should not be used in combination with --include.
  • --include <source> [<source>...]: Only include posts from these sources (e.g. self, reddit, imgur, gfycat, ...). Should not be used in combination with --exclude.
  • --base <path>: Overwrite the base path variables {base.posts} and {base.direct}, preserving the remainder of the filepath pattern.
  • --label <name>: Arbitrary text made available as the {label} variable.
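The interaction between --ignore, --include, --exclude and --limit described above can be sketched as follows. This is only an illustration of the documented order (filter first, then limit), not ripunzel's actual implementation; the function name selectPosts and the post shape (id, host and the boolean properties) are assumptions made for the example.

```javascript
// Hypothetical post shape: { id, host, pinned, stickied, ... }.
// None of these names are taken from ripunzel's source.
function selectPosts(posts, { ignore = [], include = [], exclude = [], limit = Infinity } = {}) {
  const filtered = posts.filter((post) => {
    // --ignore: drop posts that have any of the listed properties set
    if (ignore.some((prop) => post[prop])) return false;
    // --include: when given, keep only posts from the listed sources
    if (include.length > 0 && !include.includes(post.host)) return false;
    // --exclude: drop posts from the listed sources
    if (exclude.includes(post.host)) return false;
    return true;
  });

  // The limit is applied last, so filtered-out posts do not count towards it
  return filtered.slice(0, limit);
}

const posts = [
  { id: 'a1', host: 'imgur', pinned: true },
  { id: 'b2', host: 'gfycat' },
  { id: 'c3', host: 'self' },
  { id: 'd4', host: 'imgur' },
];

const selected = selectPosts(posts, { ignore: ['pinned'], exclude: ['self'], limit: 2 });
console.log(selected.map((post) => post.id)); // [ 'b2', 'd4' ]
```

Because the pinned post and the self post are removed before the limit is applied, a limit of 2 still yields two posts here.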

Examples

  • npm start -- --user AWildSketchAppeared
  • npm start -- --users ShittyWatercolour AwildSketchAppeared --limit 10 --sort top
  • npm start -- --user GallowBoob --limit 50 --ignore pinned stickied

Reddit videos

The audio stream for videos with sound uploaded to reddit directly (v.redd.it) is hosted separately, and will be saved as such alongside the video (typically {filename}-0 and {filename}-1). For them to be muxed into a single file automatically, ffmpeg must be available on the system, and the separate source files will be deleted when muxing has succeeded. If ffmpeg is not available, the separate files will remain as is.
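As a rough sketch of the muxing step, the two separate files can be combined with ffmpeg's stream copy mode, which avoids re-encoding. The function names, the file names and the assumption about which file holds video and which holds audio are made up for this example, and the exact ffmpeg options ripunzel uses may differ.

```javascript
const { execFile } = require('child_process');

// Build the ffmpeg arguments for combining a video-only and an audio-only file.
// "-c copy" copies both streams into the output without re-encoding.
function buildMuxArgs(videoPath, audioPath, outputPath) {
  return ['-i', videoPath, '-i', audioPath, '-c', 'copy', outputPath];
}

// Run ffmpeg; resolves to true on success, false when ffmpeg is missing or fails.
function mux(videoPath, audioPath, outputPath) {
  return new Promise((resolve) => {
    execFile('ffmpeg', buildMuxArgs(videoPath, audioPath, outputPath), (error) => {
      resolve(!error);
    });
  });
}

// Hypothetical file names following the {filename}-0 / {filename}-1 convention
console.log(buildMuxArgs('clip-0.mp4', 'clip-1.mp4', 'clip.mp4').join(' '));
```

A caller would delete the two source files only when mux resolves to true, which matches the behaviour described above: without ffmpeg, the separate files simply stay in place.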

Configuration

The default configuration aims to be sensible. However, a multitude of options make this utility particularly powerful.

To change the configuration, please refer to config/default.js. I recommend not editing this file directly, but instead creating a copy named config/local.js, as the default configuration might be overwritten in updates and is a useful reference when recovering from configuration errors. The structure of config/local.js must match the structure of the default configuration, but it only needs to contain the properties you wish to override. If preferred, you may instead use JSON in config/local.json.
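The effect of a partial config/local.js can be pictured as a recursive overlay on top of the defaults. The sketch below is an illustration only: mergeConfig is a hypothetical helper, and the library grouping and the titleLength value of 200 are assumptions rather than values from config/default.js.

```javascript
// Recursively overlay the properties of `local` on top of `defaults`.
function mergeConfig(defaults, local) {
  const isPlainObject = (value) => value !== null && typeof value === 'object' && !Array.isArray(value);
  const merged = { ...defaults };

  for (const [key, value] of Object.entries(local)) {
    // Nested objects are merged; everything else is replaced outright
    merged[key] = isPlainObject(value) && isPlainObject(defaults[key])
      ? mergeConfig(defaults[key], value)
      : value;
  }

  return merged;
}

// Hypothetical structure; consult config/default.js for the real one
const defaults = {
  library: { dateFormat: 'YYYYMMDD', titleLength: 200, extractSingleAlbumItem: true },
};

// A local configuration only needs the properties it overrides
const local = { library: { titleLength: 100 } };

const config = mergeConfig(defaults, local);
console.log(config.library);
// { dateFormat: 'YYYYMMDD', titleLength: 100, extractSingleAlbumItem: true }
```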

API keys

Unfortunately, it is necessary to register for the reddit and imgur APIs for this application to work reliably. Example details have been provided in config/default.js, but must be overwritten (preferably in config/local.js). More information on registering APIs will become available in this section soon.

Library

Path patterns dictate where and how a file will be saved. Various variables and options are available, and you may use subdirectories divided by /.

Variables

  • {base.posts} or {base.direct}: An optional variable intended to set the beginning most paths have in common, for content fetched via reddit and content fetched directly respectively. The variable must be added to each path manually and is not prefixed automatically, so as to allow for exceptions. The configuration for both will be overruled by the --base argument.
  • {label}: Arbitrary text specified by the --label argument.
Item (individual image, video or text)
  • {item.id}: The ID of the individual image or video
  • {item.title}: The title of the individual image or video
  • {item.description}: The description of the individual image or video
  • {item.date}: The submission date of the individual image or video, formatted by the dateFormat configuration described below
  • {item.index}: The index of the individual image or video in an album, offset by the indexOffset configuration described below
  • {tags.extracted}: Whether the item has been extracted as the only item in an album
  • {tags.preview}: Whether the image is a reddit preview because it was unavailable on the original host
  • {ext}: The extension of the medium. Must typically be included, but may be omitted for self (text) posts on Unix systems
Album
  • {album.id}: The ID of the media host album
  • {album.title}: The title of the media host album
  • {album.description}: The description of the media host album
  • {album.date}: The submission date of the media host album, formatted by the dateFormat configuration described below
Post
  • {post.id}: The ID of the reddit post
  • {post.title}: The title of the reddit post
  • {post.date}: The submission date of the reddit post, formatted by the dateFormat configuration described below
  • {post.index}: The index of the post according to the sort method
  • {post.score}: The current karma score of the post
  • {post.hash}: The hash of the post
  • {post.subreddit}: The name of the subreddit the post is submitted to
Host
  • {host.name} or {host.label}: Name of the source the content was hosted on, e.g. 'imgur' or 'gfycat'
  • {host.id}: ID of the source the content was hosted on
User
  • {user.name} or {user.username}: The nickname of the reddit user that submitted the post
  • {user.id}: The ID of the reddit user that submitted the post
  • {user.created}: The creation date or birthday of the reddit user, formatted according to dateFormat described below
  • {tags.verified}: Whether the reddit user is verified
  • {tags.verifiedEmail}: Whether the reddit user has verified their e-mail address
  • {tags.gold}: Whether the reddit user is a gold member of reddit
Profile

Many reddit users have a 'subreddit' of their own in the form of a profile (not to be confused with users that have created an actual subreddit for themselves). These variables are only available for users that have enabled this.

  • {profile.title}: The title of the reddit user's profile
  • {profile.id}: The ID of the reddit user's profile
  • {profile.description}: The description of the reddit user's profile
  • {tags.over18}: Whether the profile contains adult content and requires an 'over 18' age confirmation
{tags.x}

Tags are variables that will only be inserted when another variable is present. When you use a tag, you must configure a string of text that is inserted in place of a tag variable when the associated variable is available.

divider, {div} and {divs.x}

The {div} variable will insert an arbitrary string as configured by the divider option, intended, of course, to be used as a divider between other components. Similar to tags, {divs.x} will insert a divider only when the specified variable is present. For example, {divs.item.title} will only insert a divider when {item.title} is present. This makes sure a filename will look like, for example, either ./20191101 - abc123 - Hello world!.jpeg when a title is available or ./20191101 - abc123.jpeg when no title is available, instead of ./20191101 - abc123 - .jpeg when a title is not available.
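The conditional divider behaviour can be sketched with a small pattern renderer. This is a simplified illustration, not ripunzel's implementation: renderPattern and lookup are hypothetical helpers, the ' - ' divider is inferred from the example filenames, and the conditional divider is assumed to be written as {divs.} followed by the full variable name.

```javascript
// Look up a dotted path such as "item.title" in a nested object.
function lookup(values, path) {
  return path.split('.').reduce((value, key) => (value == null ? undefined : value[key]), values);
}

// Replace {variables} in a pattern; dividers get special treatment.
function renderPattern(pattern, values, divider = ' - ') {
  return pattern.replace(/\{([\w.]+)\}/g, (match, name) => {
    // {div} always inserts the divider
    if (name === 'div') return divider;

    // {divs.x} inserts the divider only when variable x has a value
    if (name.startsWith('divs.')) {
      return lookup(values, name.slice('divs.'.length)) ? divider : '';
    }

    const value = lookup(values, name);
    return value == null ? '' : String(value);
  });
}

const pattern = '{post.date}{div}{item.id}{divs.item.title}{item.title}.{ext}';

const withTitle = renderPattern(pattern, {
  post: { date: '20191101' },
  item: { id: 'abc123', title: 'Hello world!' },
  ext: 'jpeg',
});

const withoutTitle = renderPattern(pattern, {
  post: { date: '20191101' },
  item: { id: 'abc123' },
  ext: 'jpeg',
});

console.log(withTitle); // 20191101 - abc123 - Hello world!.jpeg
console.log(withoutTitle); // 20191101 - abc123.jpeg
```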

dateFormat

Affects the representation of {item.date}, {album.date} and {post.date} and defaults to YYYYMMDD. See this documentation for an overview of all available tokens.

titleLength

Titles can sometimes be longer than you prefer your filenames to be, or even overflow the operating system's limit (255 bytes for Linux). This property cuts off titles at a fixed number of characters.
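A minimal sketch of character-based truncation, with truncateTitle as a hypothetical helper name. Note that a character limit is not the same as a byte limit: non-ASCII characters take more than one byte in UTF-8, so a conservative titleLength leaves room for the rest of the filename.

```javascript
// Cut a title off at a fixed number of characters.
function truncateTitle(title, titleLength) {
  // Array.from splits by code point, so surrogate pairs (e.g. emoji) stay intact
  return Array.from(title).slice(0, titleLength).join('');
}

const title = 'A very long post title that would make for an unwieldy filename';
const shortTitle = truncateTitle(title, 20);
console.log(shortTitle); // A very long post tit

// Three accented characters occupy six bytes in UTF-8
const accented = '\u00e9'.repeat(3);
console.log(accented.length, Buffer.byteLength(accented, 'utf8')); // 3 6
```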

indexOffset

Arrays start at 0, but so as not to tire myself out debating the matter, you may offset the index by any numerical value you like. Affects the {item.index} variable for album items.

slashSubstitute

The patterns represent Unix file paths, and a / therefore indicates a new directory. You may freely use directories in your patterns, but titles or descriptions may contain a / that is not supposed to create a new directory. All instances of / in a variable value will be replaced with the configured slash substitute.
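The substitution can be sketched as below. The helper name substituteSlashes and the '#' substitute are assumptions for the example; the key point is that only variable values are rewritten, while slashes in the pattern itself keep acting as directory separators.

```javascript
// Replace every "/" in a variable value with the configured substitute.
function substituteSlashes(value, slashSubstitute) {
  return String(value).split('/').join(slashSubstitute);
}

const postTitle = 'Before/after comparison';

// The "/" between user and title comes from the pattern and stays a directory separator
const filepath = `someuser/${substituteSlashes(postTitle, '#')}.jpg`;
console.log(filepath); // someuser/Before#after comparison.jpg
```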

extractSingleAlbumItem

Some albums contain only one image or video. By setting extractSingleAlbumItem to true (default), the item will be saved in accordance with the individual item patterns rather than the album patterns. An extracted item will inherit the title and description of the album if it has none of its own. Extracted items will have a truthy {tags.extracted} variable.
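The extraction rule can be sketched as follows. The album and item shapes and the extractSingleItem helper are hypothetical and only mirror the behaviour described above: a single-item album yields a standalone item that falls back to the album's title and description.

```javascript
// Return a standalone item for single-item albums, or null when the album should stay an album.
function extractSingleItem(album, extractSingleAlbumItem = true) {
  if (!extractSingleAlbumItem || album.items.length !== 1) return null;

  const [item] = album.items;

  return {
    ...item,
    // Inherit from the album only when the item has no value of its own
    title: item.title || album.title,
    description: item.description || album.description,
    // Marks the item so a {tags.extracted} style variable can be filled in
    extracted: true,
  };
}

const album = {
  id: 'xyz789',
  title: 'Sunset',
  description: 'Taken last summer',
  items: [{ id: 'abc123', description: 'Close-up' }],
};

console.log(extractSingleItem(album));
// { id: 'abc123', description: 'Close-up', title: 'Sunset', extracted: true }
```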