Package 'RedditExtractoR'

Title: Reddit Data Extraction Toolkit
Description: A collection of tools for extracting structured data from <https://www.reddit.com/>.
Authors: Ivan Rivera <[email protected]>
Maintainer: Ivan Rivera <[email protected]>
License: GPL-3
Version: 3.0.10
Built: 2024-12-26 03:26:19 UTC
Source: https://github.com/ivan-rivera/redditextractor


Find subreddits by keywords

Description

Search for subreddits and their attributes based on a keyword

Usage

find_subreddits(keywords)

Arguments

keywords

A string representing your search query

Value

A data frame of the subreddits found and their attributes

Examples

## Not run: 
find_subreddits("cats")

## End(Not run)

Find Reddit thread URLs

Description

Find URLs to Reddit threads of interest. There are 2 available search strategies: by keywords and by home page. Searching by keywords can help you narrow down your search to a topic of interest that crosses multiple subreddits, whereas searching by home page can help you find, for example, top posts within a specific subreddit.

Usage

find_thread_urls(
  keywords = NA,
  sort_by = "top",
  subreddit = NA,
  period = "month"
)

Arguments

keywords

An optional string that you want to search for, e.g. "cute kittens". If NA, then either your front page or the front page of a specified subreddit will be searched

sort_by

A string representing how you want Reddit to sort the results. Note that this string is conditional on whether you are searching by keywords or not. If you are searching by keywords, then it must be one of: relevance, comments, new, hot, top; if you are not searching by keywords, then it must be one of: hot, new, top, rising

subreddit

(optional) A string representing the subreddit of interest

period

A string representing the period of interest (hour, day, week, month, year, all)

Value

A data frame with URLs to Reddit threads that are relevant to your input parameters

Examples

## Not run: 
find_thread_urls(keywords="cute kittens", subreddit="cats", sort_by="new", period="month")
find_thread_urls(subreddit="cats", sort_by="rising", period="all")

## End(Not run)
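
The dependence of sort_by on keywords described in the arguments can be checked before a request is made. The following is a minimal sketch only: valid_sort_options and check_sort_by are hypothetical helpers written for illustration and are not part of RedditExtractoR. Only the allowed values themselves are taken from the argument descriptions.

```r
# Hypothetical helper (not part of RedditExtractoR): the allowed sort_by
# values depend on whether a keyword search is being performed.
valid_sort_options <- function(keywords = NA) {
  if (is.na(keywords)) {
    # Front page or subreddit front page search
    c("hot", "new", "top", "rising")
  } else {
    # Keyword search
    c("relevance", "comments", "new", "hot", "top")
  }
}

# Hypothetical helper: TRUE when sort_by is permitted for this kind of search.
check_sort_by <- function(sort_by, keywords = NA) {
  sort_by %in% valid_sort_options(keywords)
}

check_sort_by("rising")                                # allowed without keywords
check_sort_by("rising", keywords = "cute kittens")     # not allowed with keywords
check_sort_by("relevance", keywords = "cute kittens")  # allowed with keywords
```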

Get thread contents of Reddit URLs

Description

This function takes a collection of URLs and returns a list with 2 data frames: (1) a data frame containing metadata describing each thread, and (2) a data frame with the comments found in all threads.

Usage

get_thread_content(urls)

Arguments

urls

A vector of strings, each a URL pointing to a Reddit thread

Details

The URLs are retained in both data frames, which allows you to join them if needed.

Value

A list with 2 data frames "threads" and "comments"
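
Because the thread URL appears in both data frames, comments can be matched to their thread metadata with a join. The sketch below uses mock data in place of a real call so that it runs offline. The column name url is an assumption, and the URLs, titles and comment text are invented placeholders.

```r
# Mock data standing in for the result of get_thread_content(urls).
# Assumption: the shared column is named "url"; all values are invented.
url_a <- "https://www.reddit.com/r/cats/comments/aaa/"
url_b <- "https://www.reddit.com/r/cats/comments/bbb/"

content <- list(
  threads = data.frame(
    url = c(url_a, url_b),
    title = c("Thread A", "Thread B"),
    stringsAsFactors = FALSE
  ),
  comments = data.frame(
    url = c(url_a, url_a, url_b),
    comment = c("first reply", "second reply", "only reply"),
    stringsAsFactors = FALSE
  )
)

# Attach thread metadata to every comment via the shared URL column.
joined <- merge(content$comments, content$threads, by = "url")
joined
```

With real data, content would be replaced by the list returned from get_thread_content.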


Find data relating to a vector of Reddit users

Description

Given a vector of valid Reddit user names, obtain a list consisting of general information about each user, their comments and threads

Usage

get_user_content(users)

Arguments

users

A vector of strings representing valid Reddit user names

Value

A nested list named by user name. Each element is itself a list that has "about" (list), "comments" (data frame) and "threads" (data frame)

Examples

## Not run: 
get_user_content(c("memes", "nationalgeographic"))

## End(Not run)
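
The nested return value can be navigated with ordinary list indexing. The sketch below builds a mock object with the documented shape so that it runs offline. The user name some_user and every field inside "about", "comments" and "threads" are invented for illustration and may not match the real column names.

```r
# Mock object mimicking the documented shape of the get_user_content() result.
# All names and values below are invented placeholders.
user_data <- list(
  some_user = list(
    about = list(name = "some_user"),
    comments = data.frame(
      comment = c("first", "second"),
      stringsAsFactors = FALSE
    ),
    threads = data.frame(title = "a thread", stringsAsFactors = FALSE)
  )
)

# Top-level names are the user names that were requested.
names(user_data)

# Drill into one user's general information.
user_data[["some_user"]]$about$name

# Count the comments retrieved for each user.
comment_counts <- vapply(user_data, function(u) nrow(u$comments), integer(1))
comment_counts
```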

Reddit Data Extraction Toolkit

Description

Reddit is an online bulletin board and a social networking website where registered users can submit and discuss content. This package uses the Reddit API to retrieve thread URLs, comments, subreddits and user information. For more information about the usage of this package, please see the following GitHub page: https://github.com/ivan-rivera/RedditExtractor

Details

Package: RedditExtractoR
Type: Package
Version: 3.0.10
Date: 2015-06-14
License: GPL-3

The package contains a collection of functions for extracting threads of interest and their corresponding comments, as well as functions for analysing the structure of these threads.

Author(s)

Ivan Rivera

Maintainer: Ivan Rivera <[email protected]>

See Also

https://www.reddit.com