How to mass-download files from a subreddit

Please update Python before installation to meet the requirement, then install the BDFR (Bulk Downloader for Reddit). The BDFR downloads from sources that might be a subreddit, a multireddit, a user list, or individual links. These sources are combined and downloaded to disk according to a naming and organisational scheme defined by the user.

There are three modes to the BDFR: download, archive, and clone. Each one has a command that performs similar but distinct functions. The download command will download the resource linked in the Reddit submission, such as the images, video, etc. The archive command will download the submission data itself and store it, such as the submission details, upvotes, text, and statistics, as well as all the comments on that submission.

Lastly, the clone command will perform both functions of the previous commands at once and is more efficient than running those commands sequentially. Note that the clone command is not a true, faithful clone of Reddit. It simply retrieves much of the raw data that Reddit provides. To get a true clone of Reddit, another tool such as HTTrack should be used. The bare commands are not enough on their own, however: you should chain parameters from the Options according to your use case.
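
The three modes can be illustrated with a small helper that assembles a command line for the tool. This is only a sketch under stated assumptions: the helper name `build_bdfr_command` is hypothetical, and the `--subreddit` and `--limit` flags are assumed from the tool's help output rather than taken from this article, so check `bdfr download --help` before relying on them.

```python
import shlex

# The three modes described above.
VALID_MODES = ("download", "archive", "clone")


def build_bdfr_command(mode, directory, subreddit=None, limit=None):
    """Build an argument list for the bdfr command-line tool.

    Hypothetical helper: it only assembles the arguments and never
    runs anything. The flag names are assumptions.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    command = ["bdfr", mode, directory]
    if subreddit is not None:
        command += ["--subreddit", subreddit]  # assumed flag name
    if limit is not None:
        command += ["--limit", str(limit)]  # assumed flag name
    return command


if __name__ == "__main__":
    # Print the command instead of executing it, so it can be reviewed first.
    cmd = build_bdfr_command("clone", "./reddit_output", subreddit="pics", limit=25)
    print(shlex.join(cmd))
```

Printing the command rather than running it keeps the sketch safe to try; the list can be passed to `subprocess.run` once the flags have been verified.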

Some options are common to both the archive and download commands of the BDFR, while others apply only to the download command.

The download command downloads the files and resources linked to in the submission, or a text submission itself, to the disk in the specified directory. The archive command has options of its own. The clone command can take all the options of both the archive and download commands, since it performs the functions of both.

The BDFR authenticates with Reddit through OAuth2. This means that it is a secure, token-based system for making requests.

This also means that the BDFR only has access to specific parts of the account authenticated, by default only saved posts, upvoted posts, and the identity of the authenticated account.

Note that authentication is not required unless accessing private things like upvoted posts, saved posts, and private multireddits.

Thousands of new images are uploaded to Reddit every day.

Web Scraping Images

To achieve our goal, we will use ParseHub, a free and powerful web scraper that can work with any website. Enter the URL of the subreddit you will be scraping. The page will now be rendered inside the app. Make sure to use the old.

You can now make the first selection of your scraping job. Start by clicking on the title of the first post on the page. It will be highlighted in green to indicate that it has been selected.

The rest of the posts will be highlighted in yellow. Click on the second post on the list to select them all.


To authenticate, the BDFR will first look for a token in the configuration file that signals that there's been a previous authentication. If this is not there, then the BDFR will attempt to register itself with your account.

This is normal, and if you run the program, it will pause and show a Reddit URL. Click on this URL and it will take you to Reddit, where the permissions being requested will be shown.
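
The token check described above can be sketched as follows. This is a minimal illustration, not the BDFR's actual code: the function names are hypothetical, and the key name `user_token` and its placement in the `[DEFAULT]` section are assumptions about the configuration file.

```python
import configparser


def has_saved_token(config_text, key="user_token"):
    """Return True if the config text holds a non-empty token.

    The key name and the [DEFAULT] section are assumptions.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return bool(parser.defaults().get(key, "").strip())


def next_step(config_text):
    """Describe what happens next, following the flow in the text."""
    if has_saved_token(config_text):
        return "reuse saved token"
    return "show Reddit URL and wait for the user to approve"


if __name__ == "__main__":
    fresh = "[DEFAULT]\nclient_id = example\n"
    print(next_step(fresh))
    print(next_step(fresh + "user_token = example-token\n"))
```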

Read this and confirm that there are no more permissions than needed to run the program. You should not grant unneeded permissions; by default, the BDFR only requests permission to read your saved or upvoted submissions and identify as you.

If the permissions look safe, confirm them, and the BDFR will save a token that will allow it to authenticate with Reddit from then on. Most users will not need to do anything extra to use any of the current features. However, if additional features such as scraping messages, PMs, etc., are added in the future, these will require additional scopes. There is normally no need to do this, but it is allowed by the BDFR.

These can all be changed if the user wishes. However, do not do so if you don't know what you are doing. The defaults are specifically chosen to carry a very low security risk if your token were to be compromised, however unlikely that is.
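
One way to guard against over-broad permissions is to compare the requested scopes with a small allow-list before authorising. The sketch below is an illustration only: `extra_scopes` is a hypothetical helper, and the allow-list is an assumed stand-in for the defaults described above (identity plus read access to saved and upvoted posts), not a value copied from the BDFR.

```python
# Assumed stand-in for the default scopes; not copied from the BDFR.
ALLOWED_SCOPES = frozenset({"identity", "history", "read"})


def extra_scopes(requested, allowed=ALLOWED_SCOPES):
    """Return the requested scopes that fall outside the allow-list.

    `requested` is a comma-separated string such as "identity, read".
    """
    requested_set = {s.strip() for s in requested.split(",") if s.strip()}
    return sorted(requested_set - set(allowed))


if __name__ == "__main__":
    unexpected = extra_scopes("identity, history, privatemessages")
    if unexpected:
        print("Review before granting:", ", ".join(unexpected))
    else:
        print("Only the expected scopes are requested.")
```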

Never grant more permissions than you absolutely need. For more details on the configuration file and the values therein, see Configuration Files.

The naming and folder schemes for the BDFR are both completely customisable. A number of different fields can be given, which will be replaced with properties from a submission when downloading it.

At least one key must be included in the file scheme, otherwise an error will be thrown. The folder scheme, however, can be null or a simple static string. In the former case, all files will be placed in the folder specified with the directory argument. If the folder scheme is a static string, then all submissions will be placed in a folder of that name. In both cases, there will be no separation between submissions.
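
The rules above can be sketched as a small template renderer. This is an illustration under assumptions, not the BDFR's implementation: the `{KEY}` placeholder syntax, the field names `TITLE`, `POSTID`, and `SUBREDDIT`, and both function names are hypothetical, and file extensions are left out.

```python
import re
from pathlib import PurePosixPath

# Assumed placeholder syntax: an upper-case key wrapped in braces.
PLACEHOLDER = re.compile(r"\{([A-Z]+)\}")


def render(scheme, submission):
    """Replace each {KEY} in the scheme with the submission's value."""
    return PLACEHOLDER.sub(lambda m: str(submission[m.group(1)]), scheme)


def build_path(directory, file_scheme, folder_scheme, submission):
    """Combine the directory, folder scheme, and file scheme into one path."""
    if not PLACEHOLDER.search(file_scheme):
        # Mirrors the rule that the file scheme needs at least one key.
        raise ValueError("file scheme must contain at least one key")
    # A null folder scheme puts files directly in the target directory;
    # a static string puts every submission in one folder of that name.
    folder = render(folder_scheme, submission) if folder_scheme else ""
    return PurePosixPath(directory) / folder / render(file_scheme, submission)


if __name__ == "__main__":
    post = {"TITLE": "Sunset", "POSTID": "abc123", "SUBREDDIT": "pics"}
    print(build_path("out", "{TITLE}_{POSTID}", "{SUBREDDIT}", post))
    print(build_path("out", "{POSTID}", None, post))
```

Including a per-post identifier in the file scheme is the design choice that keeps names distinct in this sketch; two posts with the same title would otherwise render to the same file name.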

No combination of other keys will necessarily be unique, and non-unique names may result in posts being skipped: the BDFR will see files by the same name and skip the download, assuming that they are already downloaded.

The configuration files are, by default, stored in the configuration directory for the user. On Windows, if Python has been installed through the Windows Store, the folder will appear in a different place from the usual one.

Note that the hash included in the file path may change from installation to installation. If you need to submit a bug, it is this file that you will need to submit with the report. The config. At the moment, the following keys must be included in the configuration file supplied. None of these should be modified unless you know what you're doing, as the default values will enable the BDFR to function just fine.

A configuration is included in the BDFR when it is installed, and this will be placed in the configuration directory as the default.
