text

R Language Analysis Suite

An R-package for analyzing natural language with transformer-based large language models. The text package is part of the R Language Analysis Suite, which includes:

talk - a package that transforms voice recordings into text, audio features, or embeddings.
text - a package that provides tools for many language tasks, such as converting digital text into word embeddings.
topics - a package with tools for visualizing language patterns as topics.
L-BAM Library - a library that provides pre-trained models for different psychological assessments, such as mental health issues, personality, and related behaviours.

talk and text offer access to large language models from Hugging Face.

The R Language Analysis Suite is created through a collaboration between psychology and computer science to address research needs and to ensure that state-of-the-art techniques are used. The suite is continuously tested on Ubuntu, macOS, and Windows using the latest stable R version.

The text-package has two main objectives:
* First, to serve R-users as a point solution for transforming text to state-of-the-art word embeddings that are ready to be used for downstream tasks. The package provides a user-friendly link to language models based on transformers from Hugging Face.
* Second, to serve as an end-to-end solution that provides state-of-the-art AI techniques tailored for social and behavioral scientists.

Please cite our tutorial article when using the text package: The text-package: An R-package for Analyzing and Visualizing Human Language Using Natural Language Processing and Deep Learning.

Point solution for transforming text to embeddings

Recent advances in NLP research have produced improved representations of human language (i.e., language models), which in turn have yielded large performance gains on tasks related to understanding human language. The text package makes these state-of-the-art (SOTA) models easily accessible through an interface to Hugging Face in Python.

The text package provides access to many contemporary state-of-the-art language models that use deep learning to model word order and context. Multilingual language models can also represent several languages; multilingual BERT, for example, covers 104 languages.

Table 1. Some of the available language models

Model	Reference	Layers	Dimensions	Language
'bert-base-uncased'	Devlin et al., 2019	12	768	English
'roberta-base'	Liu et al., 2019	12	768	English
'distilbert-base-cased'	Sanh et al., 2019	6	768	English
'bert-base-multilingual-cased'	Devlin et al., 2019	12	768	Top 104 languages on Wikipedia
'xlm-roberta-large'	Conneau et al., 2020	24	1024	100 languages

See Hugging Face for a more comprehensive list of models.
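
As a minimal sketch of the point solution described above, the following shows how a model name from Table 1 might be passed to textEmbed(). It assumes that the text package and its Python backend are already installed and initialized (for example via textrpp_install() and textrpp_initialize()). The example sentences and the column name answer are made up for illustration, and the structure of the returned object may differ between package versions.

```r
# Sketch only: requires the text package and a working Python backend.
library(text)
library(tibble)

# Hypothetical example data: a single text column named "answer".
responses <- tibble(
  answer = c(
    "I feel calm and at peace with my life.",
    "Work has been stressful and I sleep badly."
  )
)

# Retrieve contextual embeddings from a Hugging Face model listed in Table 1.
# layers = -2 requests the second-to-last hidden layer.
embeddings <- textEmbed(
  texts = responses,
  model = "bert-base-uncased",
  layers = -2
)

# Assumption: text-level embeddings are stored under $texts, keyed by the
# input column name, with one row per text and one column per dimension.
text_level <- embeddings$texts$answer
dim(text_level)
```

Swapping in another model from Table 1 should only require changing the model argument, although larger models such as 'xlm-roberta-large' need more memory and download time.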

An end-to-end package

The text package also provides functions for analyzing word embeddings with well-tested machine learning algorithms and statistics. The focus is on analyzing and visualizing text and its relation to other text or numerical variables. For example, the textTrain() function examines how well the word embeddings from a text can predict a numeric or categorical variable. Other functions plot statistically significant words in the word embedding space.
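
The textTrain() step described above could look roughly like the sketch below. It assumes that the installed version of the package bundles the example objects word_embeddings_4 (precomputed embeddings) and Language_based_assessment_data_8 (texts and rating-scale scores), and that they use the element names shown; check the package documentation if these names differ in your version.

```r
# Sketch only: uses example data assumed to ship with the text package.
library(text)

# Assumption: precomputed text-level embeddings for the "harmonytexts" column.
x <- word_embeddings_4$texts$harmonytexts

# Assumption: numeric outcome, the total score on a rating scale.
y <- Language_based_assessment_data_8$hilstotal

# Train a cross-validated model that predicts y from the embeddings.
model <- textTrain(x = x, y = y)

# Inspect the association between predicted and observed scores.
model$results
```

Because the embeddings are precomputed here, this step runs in R alone and does not need to download a language model.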
