Title: | Reddit Data Extraction Toolkit |
---|---|
Description: | A collection of tools for extracting structured data from <https://www.reddit.com/>. |
Authors: | Ivan Rivera <[email protected]> |
Maintainer: | Ivan Rivera <[email protected]> |
License: | GPL-3 |
Version: | 3.0.10 |
Built: | 2024-12-26 03:26:19 UTC |
Source: | https://github.com/ivan-rivera/redditextractor |
Search for subreddits and their attributes based on a keyword
find_subreddits(keywords)
keywords | A string representing your search query |
A data frame of the subreddits found for the query, with their attributes
## Not run:
find_subreddits("cats")
## End(Not run)
Find URLs to Reddit threads of interest. There are 2 available search strategies: by keywords and by home page. Searching by keywords can help you narrow the search to a topic that spans multiple subreddits, whereas searching by home page can help you find, for example, the top posts within a specific subreddit.
find_thread_urls(keywords = NA, sort_by = "top", subreddit = NA, period = "month")
keywords | An optional string to search for, e.g. "cute kittens". If NA, the search covers either your front page or, if a subreddit is specified, the front page of that subreddit |
sort_by | A string specifying how Reddit should sort the results. The allowed values depend on whether you are searching by keywords: with keywords, it must be one of relevance, comments, new, hot, top; without keywords, it must be one of hot, new, top, rising |
subreddit | (optional) A string naming the subreddit of interest |
period | A string specifying the time period of interest (hour, day, week, month, year, all) |
A data frame with URLs to Reddit threads that match your input parameters
## Not run:
find_thread_urls(keywords = "cute kittens", subreddit = "cats", sort_by = "new", period = "month")
find_thread_urls(subreddit = "cats", sort_by = "rising", period = "all")
## End(Not run)
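Because the allowed sort_by values depend on whether keywords are supplied, it can help to check arguments before issuing a request. The sketch below encodes the rules listed above for sort_by and period; check_sort_by() and check_period() are hypothetical helpers written for illustration, not functions exported by RedditExtractoR, and they assume scalar arguments.

```r
# Hypothetical helper (not part of RedditExtractoR): returns TRUE if sort_by
# is allowed for the chosen search strategy, per the find_thread_urls() docs.
check_sort_by <- function(sort_by, keywords = NA) {
  allowed <- if (is.na(keywords)) {
    # Searching a front page (no keywords)
    c("hot", "new", "top", "rising")
  } else {
    # Searching by keywords
    c("relevance", "comments", "new", "hot", "top")
  }
  sort_by %in% allowed
}

# Hypothetical helper: returns TRUE if period is one of the documented values
check_period <- function(period) {
  period %in% c("hour", "day", "week", "month", "year", "all")
}

check_sort_by("new", keywords = "cute kittens")     # TRUE
check_sort_by("rising", keywords = "cute kittens")  # FALSE: "rising" is front-page only
check_sort_by("rising")                             # TRUE
check_period("month")                               # TRUE
```

Returning a logical rather than calling stop() leaves it to the caller to decide whether an invalid combination should be an error or just a warning.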
This function takes a collection of Reddit thread URLs and returns a list with 2 data frames:
1. a data frame containing metadata describing each thread;
2. a data frame with the comments found in all threads.
get_thread_content(urls)
urls | A character vector of URLs, each pointing to a Reddit thread |
The URLs are retained in both tables, so the two tables can be joined if needed.
A list with 2 data frames "threads" and "comments"
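Since both tables keep the thread URL, comments can be matched to their thread metadata with a standard join. The sketch below uses base R's merge() on small mock data frames standing in for the "threads" and "comments" elements; the url column name and the title and comment columns are assumptions for illustration, and the example URLs are placeholders.

```r
# Mock stand-ins for the "threads" and "comments" data frames returned by
# get_thread_content(); column names are assumed and URLs are placeholders.
threads <- data.frame(
  url   = c("https://www.reddit.com/r/cats/comments/example1/",
            "https://www.reddit.com/r/cats/comments/example2/"),
  title = c("Thread A", "Thread B"),
  stringsAsFactors = FALSE
)
comments <- data.frame(
  url     = c("https://www.reddit.com/r/cats/comments/example1/",
              "https://www.reddit.com/r/cats/comments/example1/",
              "https://www.reddit.com/r/cats/comments/example2/"),
  comment = c("first", "second", "third"),
  stringsAsFactors = FALSE
)

# Attach thread metadata to every comment by joining on the shared url column
joined <- merge(comments, threads, by = "url")

# Count comments per thread title
table(joined$title)
```

If the real tables share other column names besides url, merge() keeps both copies and appends its default suffixes ".x" and ".y"; pass the suffixes argument or select columns first to avoid that.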
Given a vector of valid Reddit user names, obtain a list containing general information about each user, together with their comments and threads.
get_user_content(users)
users | A vector of strings representing valid Reddit user names |
A nested list keyed by user name, in which each element is a list with "about" (list), "comments" (data frame) and "threads" (data frame)
## Not run:
get_user_content(c("memes", "nationalgeographic"))
## End(Not run)
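The nested return value can be flattened with base R list tools. The sketch below builds a mock list shaped like the documented result (user name, then "about", "comments" and "threads") and summarises it; the fields inside "about" and the comment and title columns are placeholders, not the package's actual columns.

```r
# Mock result shaped like the documented return value of get_user_content();
# the inner fields and column names are placeholders for illustration.
users <- list(
  memes = list(
    about    = list(name = "memes"),
    comments = data.frame(comment = c("c1", "c2"), stringsAsFactors = FALSE),
    threads  = data.frame(title = "t1", stringsAsFactors = FALSE)
  ),
  nationalgeographic = list(
    about    = list(name = "nationalgeographic"),
    comments = data.frame(comment = "c3", stringsAsFactors = FALSE),
    threads  = data.frame(title = c("t2", "t3"), stringsAsFactors = FALSE)
  )
)

# Count how many comments and threads were retrieved for each user
summary_df <- data.frame(
  user     = names(users),
  comments = vapply(users, function(u) nrow(u$comments), integer(1)),
  threads  = vapply(users, function(u) nrow(u$threads), integer(1)),
  row.names = NULL
)

# Stack every user's comments into one data frame, tagged with the user name
all_comments <- do.call(rbind, lapply(names(users), function(n) {
  data.frame(user = n, users[[n]]$comments, stringsAsFactors = FALSE)
}))

summary_df
```

vapply() is used instead of sapply() so that each per-user count is guaranteed to be a single integer.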
Reddit is an online bulletin board and social networking website where registered users can submit and discuss content. This package uses the Reddit API to retrieve thread URLs, comments, subreddits and user information. For more information about using this package, see the GitHub page: https://github.com/ivan-rivera/RedditExtractor
Package: | RedditExtractoR |
---|---|
Type: | Package |
Version: | 3.0.10 |
Date: | 2015-06-14 |
License: | GPL-3 |
The package contains a collection of functions for extracting threads of interest and their corresponding comments, as well as functions for analysing the structure of these threads.
Ivan Rivera
Maintainer: Ivan Rivera <[email protected]>