README for Salary Prediction Project

Project Overview

This project involves predicting employee salary levels using various features related to their job satisfaction, workload, experience, and role within the company. The goal is to classify whether an employee's salary is "high" or not, based on the given dataset.

Dataset

The dataset used in this project (ds_cp.csv) includes the following columns:

satisfaction_level: Employee satisfaction level (0-1)
number_project: Number of projects assigned to the employee
average_montly_hours: Average monthly working hours
exp_in_company: Number of years the employee has worked at the company
role: Role of the employee within the company
salary: Salary level (categorical: "low", "medium", "high")

Project Steps

1. Data Preprocessing

Read Data: Load the dataset into R using read.csv.
Convert Salary to Binary: The salary column is converted into a binary format where "high" is labeled as 1 and others as 0.
Oversampling and Undersampling: To address class imbalance, both oversampling and undersampling techniques are applied using the ROSE package.
Convert Role to Numeric: The role column is converted from a categorical variable to a numeric format.
Select Specific Columns: Create a new dataframe with selected columns for model training and testing.

2. Data Splitting

Create Training and Testing Sets: The dataset is split into training (70%) and testing (30%) sets using the caret package.

3. Model Training

Train SVM Model: A Support Vector Machine (SVM) model with a linear kernel is trained using the training dataset. The model training includes cross-validation to optimize hyperparameters.

4. Evaluation

The trained model is evaluated on the testing set to measure its performance.

Required Libraries

The following R libraries are required for this project:

ROSE
smotefamily
caret

To install these packages, you can run:

install.packages(c("ROSE", "smotefamily", "caret"))

Code Overview

Load and Preprocess Data

# Clear environment
rm(list = ls())

# Load libraries
library(ROSE)
library(smotefamily)
library(caret)

# Read dataset
data <- read.csv("ds_cp.csv")
head(data)
table(data$salary)

# Convert salary to binary
names(data) <- gsub(",", "", names(data))
data$salary_binary <- ifelse(data$salary == "high", 1, 0)

Oversampling and Undersampling

# Apply oversampling and undersampling
both <- ovun.sample(salary_binary ~ ., data = data, method = "both", p = 0.5, seed = 222)$data
table(both$salary)
both <- both$data
both$salary_binary <- ifelse(both$salary == "high", 1, 0)

Convert Role to Numeric

# Convert role to numeric
data$role <- as.factor(data$role)
data$role_numeric <- as.numeric(data$role)

Create Dataframe with Selected Columns

# Select specific columns
temp_data <- c("satisfaction_level", "number_project", "average_montly_hours", "exp_in_company", "role_numeric", "salary_binary")
data2 <- data.frame(matrix(nrow = nrow(data)))
colnames(data2) <- temp_data

for (col_name in temp_data) {
  data2[[col_name]] <- data[[col_name]]
}

# Remove the first column from data2
data2 <- data2[, -1]
head(data2)

Split Data into Training and Testing Sets

# Split the data
set.seed(1221)
index <- createDataPartition(data2$salary_binary, p = 0.7, list = FALSE)
training_data <- data2[index, ]
testing_data <- data2[-index, ]
anyNA(data2)
summary(data2)

Train SVM Model

# Train SVM model
training_data[["salary_binary"]] = factor(training_data[["salary_binary"]])
trlctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

svm_linear <- train(salary_binary ~ ., data = training_data, method = "svmLinear",
                   trControl = trlctrl,
                   preProcess = c("centre", "scale"),
                   tuneLength = 10)

Results and Evaluation

Evaluate the model using the testing dataset.
Check metrics such as accuracy, precision, recall, and F1-score to determine the model's performance

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
HR_Data_Tutorial.csv		HR_Data_Tutorial.csv
README.md		README.md
main.R		main.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

README for Salary Prediction Project

Project Overview

Dataset

Project Steps

1. Data Preprocessing

2. Data Splitting

3. Model Training

4. Evaluation

Required Libraries

Code Overview

Load and Preprocess Data

Oversampling and Undersampling

Convert Role to Numeric

Create Dataframe with Selected Columns

Split Data into Training and Testing Sets

Train SVM Model

Results and Evaluation

About

Uh oh!

Releases

Packages

Languages

pranit144/Salary-classifier-in-R

Folders and files

Latest commit

History

Repository files navigation

README for Salary Prediction Project

Project Overview

Dataset

Project Steps

1. Data Preprocessing

2. Data Splitting

3. Model Training

4. Evaluation

Required Libraries

Code Overview

Load and Preprocess Data

Oversampling and Undersampling

Convert Role to Numeric

Create Dataframe with Selected Columns

Split Data into Training and Testing Sets

Train SVM Model

Results and Evaluation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages