Title: | A 'Shiny' App for Exploration of Text Collections |
---|---|
Description: | Facilitates dynamic exploration of text collections through an intuitive graphical user interface and the power of regular expressions. The package contains 1) a helper function to convert a data frame to a 'corporaexplorerobject' and 2) a 'Shiny' app for fast and flexible exploration of a 'corporaexplorerobject'. The package also includes demo apps with which one can explore Jane Austen's novels and the State of the Union Addresses (data from the 'janeaustenr' and 'sotu' packages respectively). |
Authors: | Kristian Lundby Gjerde [aut, cre] |
Maintainer: | Kristian Lundby Gjerde |
License: | GPL-3 \| file LICENSE |
Version: | 0.9.0.9000 |
Built: | 2024-11-01 05:25:10 UTC |
Source: | https://github.com/kgjerde/corporaexplorer |
`run_janeausten_app()` is a convenience function that runs the demo app directly, without first creating a corporaexplorerobject. It is equivalent to `explore(create_janeausten_app())`. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
```r
run_janeausten_app(...)

create_janeausten_app()
```
Argument | Description |
---|---|
`...` | Arguments passed to `explore()`. |
The demo app's data are Jane Austen's six novels, retrieved through the "janeaustenr" package (https://github.com/juliasilge/janeaustenr) and converted to a corporaexplorerobject as shown at https://kgjerde.github.io/corporaexplorer/articles/jane_austen.html. The "janeaustenr" package must be installed for these functions to work.
`run_janeausten_app()` launches a Shiny app. `create_janeausten_app()` returns a corporaexplorerobject.
```r
library(corporaexplorer)

## Create corporaexplorerobject for demo app:
jane_austen <- create_janeausten_app()

if (interactive()) {
  ## Run the corporaexplorerobject:
  explore(jane_austen)

  ## Or create and run the demo app in one step:
  run_janeausten_app()
}
```
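Because these functions depend on the "janeaustenr" package, a script that may run on machines without it can check for it first. The sketch below uses base R's `requireNamespace()`; the helper `demo_packages_available()` is hypothetical and not part of corporaexplorer.

```r
# Hypothetical helper (not part of corporaexplorer): reports which of the
# packages needed by the Jane Austen demo are installed.
demo_packages_available <- function(pkgs = c("corporaexplorer", "janeaustenr")) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

available <- demo_packages_available()

if (!all(available)) {
  message("Please install: ", paste(names(available)[!available], collapse = ", "))
} else if (interactive()) {
  # Create and run the demo app in one step
  corporaexplorer::run_janeausten_app()
}
```

Checking up front gives a readable message instead of an error raised partway through building the corpus.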
Two demo apps for exploring the United States Presidential State of the Union addresses. The data are provided by the sotu package and include all addresses through 2016. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
```r
run_sotu_app(...)

create_sotu_app()

run_sotu_decade_app(...)

create_sotu_decade_app()
```
Argument | Description |
---|---|
`...` | Arguments passed to `explore()`. |
For details, see https://kgjerde.github.io/corporaexplorer/articles/sotu.html.
The `run_sotu_*` functions launch a Shiny app. The `create_sotu_*` functions return a corporaexplorerobject.
Launch a Shiny app for exploration of a text collection. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
`explore()` explores a corporaexplorerobject created with the `prepare_data()` function. App settings can optionally be specified in the arguments to `explore()`.
`explore0()` is a convenience function for exploring a data frame or character vector directly, without first creating a corporaexplorerobject with `prepare_data()`; instead, one is created on the fly as the app launches. It is functionally equivalent to `explore(prepare_data(dataset, use_matrix = FALSE))`.
```r
explore(
  corpus_object,
  search_options = list(),
  ui_options = list(),
  search_input = list(),
  plot_options = list(),
  ...
)

explore0(
  dataset,
  arguments_prepare_data = list(use_matrix = FALSE),
  arguments_explore = list()
)
```
Argument | Description |
---|---|
`corpus_object` | A corporaexplorerobject created by `prepare_data()`. |
`search_options` | List. Specify how search operations in the app are carried out. Available options: |
`ui_options` | List. Specify custom app settings (see example below). Currently available: |
`search_input` | List. Gives the opportunity to pre-populate the following sidebar fields (see example below): |
`plot_options` | List. Specify custom plot settings (see example below). Currently available: |
`...` | Other arguments passed to |
`dataset` | Data frame or character vector as specified in `prepare_data()`. |
`arguments_prepare_data` | List. Arguments to be passed to `prepare_data()`. |
`arguments_explore` | List. Arguments to be passed to `explore()`. |
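The package description highlights regular expressions as the basis for searching, and the search terms in the example below (such as `"Tottenham"` and `"Spurs"`) are entered in the sidebar. The base R sketch below counts pattern matches per document to show how much pattern choice matters; `count_matches()` is hypothetical and does not reproduce the app's internal search code.

```r
# Illustrative only: counting matches of a regular expression per document
# with base R. corporaexplorer's own search code may work differently.
count_matches <- function(texts, pattern, ignore_case = TRUE) {
  hits <- gregexpr(pattern, texts, ignore.case = ignore_case, perl = TRUE)
  # gregexpr() returns -1 for "no match", so count only positive positions
  vapply(hits, function(m) sum(m > 0), integer(1))
}

docs <- c(
  "Spurs beat Arsenal.",
  "Tottenham Hotspur, a.k.a. Spurs, won again.",
  "No football here."
)

loose  <- count_matches(docs, "spurs?")        # also matches inside "Hotspur"
strict <- count_matches(docs, "\\bspurs?\\b")  # whole words only
```

Adding word boundaries (`\\b`) drops the hit inside "Hotspur", so the loose pattern gives 1, 2 and 0 matches while the strict one gives 1, 1 and 0.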
For `explore0()`: by default, no document term matrix is generated. The data are therefore prepared for exploration faster than with the default settings in `prepare_data()`, but searches in the app are likely to be slower.
Launches a Shiny app.
```r
library(corporaexplorer)

# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
  "This is a document about ", month.name[1:10], ". ",
  "This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)

# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")

if (interactive()) {
  # Running exploration app:
  explore(corpus)
  explore(corpus,
    search_options = list(optional_info = TRUE),
    ui_options = list(font_size = "10px"),
    search_input = list(search_terms = c("Tottenham", "Spurs")),
    plot_options = list(
      max_docs_in_wall_view = 12001,
      colours = c("gray", "green")
    )
  )

  # Running app to extract documents:
  run_document_extractor(corpus)
}

if (interactive()) {
  explore0(rep(sample(LETTERS), 10))
  explore0(rep(sample(LETTERS), 10),
    arguments_explore = list(search_input = list(search_terms = "Z"))
  )
}
```
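`explore0()` takes two lists, `arguments_prepare_data` and `arguments_explore`, and is documented as equivalent to `explore(prepare_data(dataset, use_matrix = FALSE))`. As a sketch of how such a two-stage call can be wired up, the hypothetical `explore_in_two_steps()` below forwards one list to `prepare_data()` and the other to `explore()` using `do.call()`. It is not the package's implementation; in particular, this sketch simply replaces the default `list(use_matrix = FALSE)` when `prepare_args` is supplied.

```r
# Hypothetical wrapper (not part of corporaexplorer) mirroring the
# two-stage call explore(prepare_data(dataset, ...), ...).
explore_in_two_steps <- function(dataset,
                                 prepare_args = list(use_matrix = FALSE),
                                 explore_args = list()) {
  # Stage 1: build the corporaexplorerobject
  corpus <- do.call(corporaexplorer::prepare_data, c(list(dataset), prepare_args))
  # Stage 2: launch the app with any app settings
  do.call(corporaexplorer::explore, c(list(corpus), explore_args))
}

if (requireNamespace("corporaexplorer", quietly = TRUE) && interactive()) {
  explore_in_two_steps(
    rep(sample(LETTERS), 10),
    explore_args = list(search_input = list(search_terms = "Z"))
  )
}
```

Keeping the two argument lists separate avoids name clashes between data-preparation settings and app settings.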
Convert a data frame or character vector to a 'corporaexplorerobject' for subsequent exploration.
```r
prepare_data(dataset, ...)

## S3 method for class 'data.frame'
prepare_data(
  dataset,
  date_based_corpus = TRUE,
  text_column = "Text",
  grouping_variable = NULL,
  within_group_identifier = "sequential",
  columns_doc_info = c("Date", "Title", "URL"),
  corpus_name = NULL,
  use_matrix = TRUE,
  matrix_without_punctuation = TRUE,
  tile_length_range = c(1, 10),
  columns_for_ui_checkboxes = NULL,
  ...
)

## S3 method for class 'character'
prepare_data(
  dataset,
  corpus_name = NULL,
  use_matrix = TRUE,
  matrix_without_punctuation = TRUE,
  ...
)
```
Argument | Description |
---|---|
`dataset` | Object to convert to a corporaexplorerobject: a data frame or a character vector. |
`...` | Other arguments to be passed to |
`date_based_corpus` | Logical. Set to |
`text_column` | Character. Default: `"Text"`. The column in `dataset` containing the texts. |
`grouping_variable` | Character string indicating a column name in `dataset`. If `date_based_corpus` is `TRUE`, this argument is ignored. If `date_based_corpus` is `FALSE`, this argument is used to group the documents, e.g., if `dataset` is organised by chapters belonging to different books. The order of groups in the app is determined as follows: |
`within_group_identifier` | Character string indicating column name in |
`columns_doc_info` | Character vector. The columns from |
`corpus_name` | Character string with the name of the corpus. |
`use_matrix` | Logical. Should the function create a document term matrix for fast searching? If |
`matrix_without_punctuation` | Logical. Should punctuation and digits be stripped from the text before constructing the document term matrix? If |
`tile_length_range` | Numeric vector of length two. Fine-tunes the tile lengths in the document wall and day corpus views. Tile length is calculated by |
`columns_for_ui_checkboxes` | Character. Character or factor column(s) in `dataset`. Includes sets of checkboxes in the app sidebar for convenient filtering of the corpus. Typically useful for columns with a small set of unique (and short) values. Checkboxes will be arranged by |
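To make `matrix_without_punctuation` concrete, the base R sketch below shows what stripping punctuation and digits does to two short texts before they are split into terms. The function `strip_punct_digits()` is hypothetical, and corporaexplorer's own preprocessing and tokenisation may differ.

```r
# Illustrative only: strip punctuation and digits, then split into terms.
# This is not corporaexplorer's internal preprocessing.
strip_punct_digits <- function(x) {
  x <- gsub("[[:punct:][:digit:]]+", " ", x)  # replace runs with a space
  trimws(gsub("\\s+", " ", x))                # collapse repeated whitespace
}

texts <- c("In 1811, Sense & Sensibility appeared.", "Emma (1815)!")
cleaned <- strip_punct_digits(texts)
terms <- strsplit(tolower(cleaned), " ", fixed = TRUE)
```

After stripping, the years, the ampersand and the brackets no longer appear among the terms; only the words remain.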
For data.frame: each row in `dataset` is treated as the basic unit of the corpus, typically a chapter in a book or a single document in a document collection. The following column names are reserved and cannot be used in `dataset`: "Date_", "cx_ID", "Text_original_case", "Text_column_", "Tile_length", "Year_", "cx_Seq", "Weekday_n", "Day_without_docs", "Invisible_fake_date".
A character vector will be converted to a simple corporaexplorerobject with no metadata.
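Because the reserved column names listed above cannot be used in `dataset`, a data frame can be checked, and any clashing columns renamed, before it is passed to `prepare_data()`. The helper `find_reserved_columns()` and the `orig_` prefix are hypothetical choices for this sketch, not part of corporaexplorer.

```r
# Reserved names as listed in the documentation above
reserved_cx_columns <- c(
  "Date_", "cx_ID", "Text_original_case", "Text_column_", "Tile_length",
  "Year_", "cx_Seq", "Weekday_n", "Day_without_docs", "Invisible_fake_date"
)

# Hypothetical helper: returns the column names of `df` that are reserved
find_reserved_columns <- function(df) {
  intersect(names(df), reserved_cx_columns)
}

docs <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-02-01")),
  Text = c("First document.", "Second document."),
  Year_ = c(2020, 2020)
)

clash <- find_reserved_columns(docs)
if (length(clash) > 0) {
  # Rename clashing columns, e.g. "Year_" becomes "orig_Year_"
  hit <- names(docs) %in% clash
  names(docs)[hit] <- paste0("orig_", names(docs)[hit])
}
```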
A corporaexplorerobject to be passed as an argument to `explore()` and `run_document_extractor()`.
```r
library(corporaexplorer)

## From data.frame
# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
  "This is a document about ", month.name[1:10], ". ",
  "This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)

# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")

if (interactive()) {
  # Running exploration app:
  explore(corpus)

  # Running app to extract documents:
  run_document_extractor(corpus)
}

## From character vector
alphabet_corpus <- prepare_data(LETTERS)

if (interactive()) {
  # Running exploration app:
  explore(alphabet_corpus)
}
```
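For a corpus that is not organised by date, such as chapters belonging to different books, the arguments `date_based_corpus`, `grouping_variable`, `columns_doc_info` and `columns_for_ui_checkboxes` can be combined. The sketch below uses placeholder data; the column names `Book` and `Chapter` are assumptions chosen for illustration, and the exact requirements on `dataset` when `date_based_corpus = FALSE` should be checked against the full documentation.

```r
# Placeholder chapter-level data (not real texts); one row per chapter.
chapters <- data.frame(
  Book = c("Book A", "Book A", "Book B"),
  Chapter = c(1L, 2L, 1L),
  Text = c("Text of chapter one.", "Text of chapter two.", "Another book begins."),
  stringsAsFactors = FALSE
)

# The distinct books (groups) present in the data
books <- unique(chapters$Book)

if (requireNamespace("corporaexplorer", quietly = TRUE) && interactive()) {
  book_corpus <- corporaexplorer::prepare_data(
    chapters,
    date_based_corpus = FALSE,               # organise by groups, not dates
    grouping_variable = "Book",              # group chapters by book
    columns_doc_info = c("Book", "Chapter"), # metadata shown in the app
    columns_for_ui_checkboxes = "Book"       # sidebar checkboxes per book
  )
  corporaexplorer::explore(book_corpus)
}
```

A column with few, short values such as `Book` is the kind of column the `columns_for_ui_checkboxes` description recommends.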
This function will be removed in a future version of corporaexplorer.
```r
run_document_extractor(corpus_object, max_html_docs = 400, ...)
```
Argument | Description |
---|---|
`corpus_object` | A corporaexplorerobject. |
`max_html_docs` | The maximum number of documents allowed in one HTML report. |
`...` | Other arguments passed to |
Shiny app for simple retrieval/extraction of documents from a "corporaexplorerobject" in a reading-friendly format. Interrupt R to stop the application (usually by pressing Ctrl+C or Esc).
```r
library(corporaexplorer)

# Constructing test data frame:
dates <- as.Date(paste(2011:2020, 1:10, 21:30, sep = "-"))
texts <- paste0(
  "This is a document about ", month.name[1:10], ". ",
  "This is not a document about ", rev(month.name[1:10]), "."
)
titles <- paste("Text", 1:10)
test_df <- tibble::tibble(Date = dates, Text = texts, Title = titles)

# Converting to corporaexplorerobject:
corpus <- prepare_data(test_df, corpus_name = "Test corpus")

if (interactive()) {
  # Running exploration app:
  explore(corpus)

  # Running app to extract documents:
  run_document_extractor(corpus)
}
```
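`max_html_docs` caps the number of documents in one HTML report (400 by default). When a selection is larger, one option is to work through it in batches. The hypothetical `batch_ids()` below computes such batches in base R; it illustrates the arithmetic only and is not code from corporaexplorer.

```r
# Hypothetical helper: splits document IDs into consecutive batches of at
# most `max_html_docs` each (the documented default limit is 400).
batch_ids <- function(doc_ids, max_html_docs = 400) {
  batch_no <- ceiling(seq_along(doc_ids) / max_html_docs)
  split(doc_ids, batch_no)
}

batches <- batch_ids(1:1000)
lengths(batches)  # three batches: 400, 400 and 200 documents
```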
Created by `corporaexplorer:::create_test_data()`.

```r
test_data
```

A corporaexplorerobject.