RecommendEngine

RecommendationEngine provides four models for making recommendations based on users' behavior data provided by tianyu.

You can run the algorithms using spark-submit.

Building RecommendEngine

Replace the project's build.sbt and hdfs.scala with the DC/OS variants under repFiles/dcos:

cd /$path_of_the_project/RecommendEngine/
cp repFiles/dcos/hdfs.scala src/main/scala/tianyu/algorithm/util/
cp repFiles/dcos/build.sbt .

Use the Simple Build Tool (sbt) to build the project:

sbt clean
sbt package

You can then find the packaged jar recommendationengine_2.11-1.0.jar under ~/RecommendEngine/target/scala-2.11/.
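
To confirm the build succeeded, check that the jar is present at the path above:

ls ~/RecommendEngine/target/scala-2.11/recommendationengine_2.11-1.0.jar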

Using scripts to build the package and upload it to the artifact store

Modify RecommendEngine/scripts/pack2dcos.sh, setting the target path and the path of proxy_scp, which is used to upload the local jar to DC/OS.

Then run:

source scripts/pack2dcos.sh
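
The contents of pack2dcos.sh are not reproduced in this README. As a rough, hypothetical sketch of what such a script might do, assuming proxy_scp is invoked like scp with a local source and a remote destination (both variables below are placeholders, not values from the project):

#!/bin/bash
# Hypothetical sketch of pack2dcos.sh -- adapt both variables to your setup.
PROXY_SCP=/usr/local/bin/proxy_scp               # placeholder: path of proxy_scp
TARGET_PATH=core@dcos-master:/var/artifacts/     # placeholder: target path on DC/OS

sbt clean package
"$PROXY_SCP" target/scala-2.11/recommendationengine_2.11-1.0.jar "$TARGET_PATH"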

Spark submit commands

Common Spark and class configuration

  • Example command:
/$path_of_spark_package/bin/spark-submit \
	--class tianyu.algorithm.ARDcosTest \
	--jars /$fullpath/scopt_2.11-3.3.0.jar,/$fullpath/spark-avro_2.11-3.2.0.jar \
	--master local[*] \
	--executor-memory 2G \
	~/myjars/recommendationengine_2.11-1.0.jar \
    ...
  • Parameter descriptions
  1. --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
  2. --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
  3. --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
  4. --executor-memory: Amount of memory allocated to each executor of the Spark application
  5. --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
  6. application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
  7. application-arguments: Arguments passed to the main method of your main class, if any
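
Putting the pieces together: all Spark options come before the application jar, and all application arguments come after it. A sketch of a full invocation (the paths, master URL, and --conf value below are placeholders, not project defaults):

/opt/spark/bin/spark-submit \
    --class tianyu.algorithm.ARDcosTest \
    --jars /libs/scopt_2.11-3.3.0.jar,/libs/spark-avro_2.11-3.2.0.jar \
    --master spark://23.195.26.187:7077 \
    --deploy-mode client \
    --executor-memory 2G \
    --conf "spark.driver.maxResultSize=2g" \
    ~/myjars/recommendationengine_2.11-1.0.jar \
    --outDir /tianyu/analysis \
    --topN 10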

Association Rules (Items)

  • Class: tianyu.algorithm.ARDcosTest:
  • Example command:
/$path_of_spark_package/bin/spark-submit \
	--class tianyu.algorithm.ARDcosTest \
	...
	--outDir /tianyu/analysis \
	--timeProcess true \
	--endTime now \
	--numPastMonths 3 \
	--maxItems 1000 \
	--minSupport 0.01 \
	--minConfidence 0.8 \
	--topN 10 \
	--numPartitions 200
  • Parameter descriptions
  1. --outDir: Root directory for writing analysis results.
  2. --timeProcess: true runs AR with time processing; false runs it without.
  3. --endTime: End time for slicing the log data. "now" means the current time; any other value must be formatted as yyyyMMdd.
  4. --numPastMonths: Length of the period, in months, used to slice the data into transactions and recent history.
  5. --maxItems: Maximum number of items in a transaction; transactions with more items are filtered out.
  6. --minSupport: Minimum support for filtering frequent itemsets (an itemset's support is the fraction of transactions containing it).
  7. --minConfidence: Minimum confidence for filtering association rules (the confidence of X => Y is support(X and Y) / support(X)).
  8. --topN: Maximum number of recommendations per user.
  9. --numPartitions: Number of tasks per stage.
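
The results land under the directory given by --outDir. Assuming that path lives on HDFS (the build swaps in an hdfs.scala utility, so HDFS output is likely, though not stated explicitly here), they can be listed with:

hdfs dfs -ls /tianyu/analysis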

Cluster (Users)

  • Class: tianyu.algorithm.ClusterDcosTest:
  • Example command:
/$path_of_spark_package/bin/spark-submit \
    --class tianyu.algorithm.ClusterDcosTest \
    ...
    ~/myjars/recommendationengine_2.11-1.0.jar \
    --outDir /tianyu/lynnDockerTest \
    --numCluster 5 \
    --maxIterations 10 \
    --topN 10 
  • Parameter descriptions
  1. --outDir: Root directory for writing analysis results.
  2. --numCluster: Number of clusters into which the users are divided.
  3. --maxIterations: Maximum number of iterations for clustering users.
  4. --topN: Maximum number of recommendations per user.

Cosine Similarity (Items)

  • Class: tianyu.algorithm.CSDcosTest:
  • Example command:
  • Parameter descriptions

Matrix Factorization (Users and Items)

  • Class: tianyu.algorithm.MFDcosTest:
  • Example command:
/$path_of_spark_package/bin/spark-submit \
    --class tianyu.algorithm.MFDcosTest \
    ...
    ~/myjars/recommendationengine_2.11-1.0.jar \
    --outDir /tianyu/lynnDockerTest \
    --rank 10 \
    --reg 1 \
    --maxIter 10 \
    --topN 10 \
    --numBlocks 200
  • Parameter descriptions
  1. --outDir: Root directory for writing analysis results.
  2. --rank: Rank of the factor matrices for users and items.
  3. --reg: Regularization parameter (lambda) in ALS.
  4. --maxIter: Maximum number of iterations to run.
  5. --topN: Maximum number of recommendations per user.
  6. --numBlocks: Number of blocks the users and items will be partitioned into in order to parallelize computation.

Adjusted Cosine Similarities (Items)

  • Class: tianyu.algorithm.CSDcosTest:
  • Example command:
/$path_of_spark_package/bin/spark-submit \
    --class tianyu.algorithm.CSDcosTest \
    ...
    ~/myjars/recommendationengine_2.11-1.0.jar \
    --outDir /tianyu/analysis \
    --minSim 0.7 \
    --topSim 20 \
    --minCommons 5 \
    --topN 10 
  • Parameter descriptions
  1. --outDir: Root directory for writing analysis results.
  2. --minSim: Minimum adjusted similarity for keeping an (i, j) item-pair similarity.
  3. --topSim: Maximum number of similar items kept per item (used to calculate predictions).
  4. --minCommons: Minimum number of common ratings (r_ui) required to consider two items similar.
  5. --topN: Maximum number of recommendations per user.

Search for a user's history and recommendations in a specific model

  • Class: tianyu.algorithm.Comparison:
  • Example command:
/$path_of_spark_package/bin/spark-submit \
    --class tianyu.algorithm.Comparison \
    ...
    ~/myjars/recommendationengine_2.11-1.0.jar \
    --rootDir /tianyu/lynnDockerTest \
    --user zc14607X \
    --alg Association \
    --subType Full_History \
    --DateTime 2017-06-15-08
  • Parameter descriptions
  1. --rootDir: Root directory of the result files.
  2. --user: Name, ID, or account of the user to look up.
  3. --alg: Name of the recommendation model (Association, Cluster, ALS, CosSim).
  4. --subType: Sub-type of the chosen recommendation model (Full_History, Time_Window, Basic).
  5. --DateTime: Time when the analysis was executed, formatted as yyyy-MM-dd-HH.
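
For example, to look up the same user's results under the ALS model (assuming Basic is a valid sub-type for ALS; the other values reuse the example above):

/$path_of_spark_package/bin/spark-submit \
    --class tianyu.algorithm.Comparison \
    ...
    ~/myjars/recommendationengine_2.11-1.0.jar \
    --rootDir /tianyu/lynnDockerTest \
    --user zc14607X \
    --alg ALS \
    --subType Basic \
    --DateTime 2017-06-15-08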
