RecommendationEngine provides four models for making recommendations based on users' behavior data provided by tianyu.
You can run the algorithms using spark-submit.
Replace build.sbt and hdfs.scala of the project package:

cd /$path_of_the_project/RecommendEngine/
cp repFiles/dcos/hdfs.scala src/main/scala/tianyu/algorithm/util/
cp repFiles/dcos/build.sbt .

Use the simple build tool SBT to build the project:

sbt clean
sbt package

Then you can find the packaged jar recommendationengine_2.11-1.0.jar under ~/RecommendEngine/target/scala-2.11/.
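The real build.sbt is the one copied from repFiles/dcos above; purely for orientation, here is a minimal sketch of what such a build definition typically contains. The Spark version and the "provided" scoping are assumptions; the scopt and spark-avro versions are inferred from the --jars line in the examples below.

```scala
// Hypothetical sketch of build.sbt -- the real file ships in repFiles/dcos.
name := "RecommendationEngine"

version := "1.0"

scalaVersion := "2.11.8" // assumed 2.11.x, matching recommendationengine_2.11-1.0.jar

libraryDependencies ++= Seq(
  // Spark is supplied by the cluster at runtime, hence "provided" (assumed).
  "org.apache.spark" %% "spark-core"  % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided",
  // Passed separately via --jars at submit time (see the examples below).
  "com.github.scopt" %% "scopt"       % "3.3.0" % "provided",
  "com.databricks"   %% "spark-avro"  % "3.2.0" % "provided"
)
```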
Modify RecommendEngine/scripts/pack2dcos.sh, setting the target path and the path of proxy_scp, which is used to upload the local jar to DC/OS. Then run the following command:

source scripts/pack2dcos.sh

- Example codes:
/$path_of_spark_package/bin/spark-submit \
--class tianyu.algorithm.ARDcosTest \
--jars /$fullpath/scopt_2.11-3.3.0.jar,/$fullpath/spark-avro_2.11-3.2.0.jar \
--master local[*] \
--executor-memory 2G \
~/myjars/recommendationengine_2.11-1.0.jar \
...

- Instruction of parameters:
- --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
- --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
- --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
- --executor-memory: Amount of memory allocated to each executor for running the Spark application (e.g. 2G).
- --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
- application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
- application-arguments: Arguments passed to the main method of your main class, if any
- Class: tianyu.algorithm.ARDcosTest:
- Codes for parameters:
/$path_of_spark_package/bin/spark-submit \
--class tianyu.algorithm.ARDcosTest \
...
--outDir /tianyu/analysis \
--timeProcess true \
--endTime now \
--numPastMonths 3 \
--maxItems 1000 \
--minSupport 0.01 \
--minConfidence 0.8 \
--topN 10 \
--numPartitions 200

- Instruction of parameters:
- --outDir: Root directory for writing analysis results.
- --timeProcess: Set to true to run AR with time processing, or false to run it without.
- --endTime: End time for cutting out the data in the logs. "now" means the current time; any other time must be formatted as yyyyMMdd.
- --numPastMonths: Length of the period, in months, used to cut the log data into transactions and recent history.
- --maxItems: Maximum number of items in a transaction. A transaction is filtered out if it contains more than this number of items.
- --minSupport: Minimum support for filtering frequent itemsets.
- --minConfidence: Minimum confidence for filtering association rules (see the sketch after this list).
- --topN: Maximum number of recommendations for each user.
- --numPartitions: Number of tasks per stage.
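The source of ARDcosTest is not reproduced here, but minSupport, minConfidence, maxItems, and numPartitions are the standard knobs of frequent-itemset mining. As a hedged illustration only, a sketch using Spark MLlib's FPGrowth (whether ARDcosTest is actually built on FPGrowth is an assumption):

```scala
import org.apache.spark.mllib.fpm.FPGrowth
import org.apache.spark.rdd.RDD

// Hypothetical sketch: mining association rules with Spark MLlib FPGrowth.
// transactions: each record is the array of items in one user transaction.
def mineRules(transactions: RDD[Array[String]],
              maxItems: Int, minSupport: Double,
              minConfidence: Double, numPartitions: Int): Unit = {
  // Drop oversized transactions, mirroring the --maxItems filter.
  val filtered = transactions.filter(_.length <= maxItems)

  val model = new FPGrowth()
    .setMinSupport(minSupport)       // --minSupport
    .setNumPartitions(numPartitions) // --numPartitions
    .run(filtered)

  // Keep only rules at or above the --minConfidence threshold.
  model.generateAssociationRules(minConfidence).collect().foreach { rule =>
    println(s"${rule.antecedent.mkString(",")} => ${rule.consequent.mkString(",")}" +
            s" (confidence ${rule.confidence})")
  }
}
```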
- Class: tianyu.algorithm.ClusterDcosTest:
- Codes for parameters:
/$path_of_spark_package/bin/spark-submit \
--class tianyu.algorithm.ClusterDcosTest \
...
~/myjars/recommendationengine_2.11-1.0.jar \
--outDir /tianyu/lynnDockerTest \
--numCluster 5 \
--maxIterations 10 \
--topN 10

- Instruction of parameters:
- --outDir: Root directory for writing analysis results.
- --numCluster: Number of clusters into which users are divided (see the sketch after this list).
- --maxIterations: Maximum number of iterations for clustering users.
- --topN: Maximum number of recommendations for each user.
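How ClusterDcosTest builds its user clusters is not shown here; assuming it uses Spark ML's KMeans over per-user feature vectors (both assumptions), numCluster and maxIterations map onto the estimator as follows:

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.sql.DataFrame

// Hypothetical sketch: clustering users by their behavior feature vectors.
// userFeatures is assumed to carry a Vector column named "features".
def clusterUsers(userFeatures: DataFrame,
                 numCluster: Int, maxIterations: Int): DataFrame = {
  val kmeans = new KMeans()
    .setK(numCluster)          // --numCluster
    .setMaxIter(maxIterations) // --maxIterations
    .setFeaturesCol("features")
    .setPredictionCol("cluster")

  // Returns the input rows with an extra "cluster" column per user.
  kmeans.fit(userFeatures).transform(userFeatures)
}
```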
- Class: tianyu.algorithm.MFDcosTest:
- Codes for parameters:
/$path_of_spark_package/bin/spark-submit \
--class tianyu.algorithm.MFDcosTest \
...
~/myjars/recommendationengine_2.11-1.0.jar \
--outDir /tianyu/lynnDockerTest \
--rank 10 \
--reg 1 \
--maxIter 10 \
--topN 10 \
--numBlocks 200

- Instruction of parameters:
- --outDir: Root directory for writing analysis results.
- --rank: Rank of the factor matrices for both users and items (see the sketch after this list).
- --reg: Regularization parameter in ALS.
- --maxIter: Maximum number of iterations to run.
- --topN: Maximum number of recommendations for each user.
- --numBlocks: Number of blocks the users and items are partitioned into in order to parallelize computation.
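Because --reg is documented above as the regularization parameter in ALS, the four training parameters map directly onto Spark ML's ALS estimator. A minimal sketch; the input column names and the use of recommendForAllUsers (Spark 2.2+) are assumptions:

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.DataFrame

// Hypothetical sketch: ALS matrix factorization over (user, item, rating) rows.
def trainAndRecommend(ratings: DataFrame, rank: Int, reg: Double,
                      maxIter: Int, topN: Int, numBlocks: Int): DataFrame = {
  val als = new ALS()
    .setRank(rank)           // --rank
    .setRegParam(reg)        // --reg
    .setMaxIter(maxIter)     // --maxIter
    .setNumBlocks(numBlocks) // --numBlocks (sets both user and item blocks)
    .setUserCol("user")
    .setItemCol("item")
    .setRatingCol("rating")

  val model = als.fit(ratings)
  // Top-N recommendations per user (--topN); requires Spark 2.2+.
  model.recommendForAllUsers(topN)
}
```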
- Class: tianyu.algorithm.CSDcosTest:
- Codes for parameters:
/$path_of_spark_package/bin/spark-submit \
--class tianyu.algorithm.CSDcosTest \
...
~/myjars/recommendationengine_2.11-1.0.jar \
--outDir /tianyu/analysis \
--minSim 0.7 \
--topSim 20 \
--minCommons 5 \
--topN 10

- Instruction of parameters:
- --outDir: Root directory for writing analysis results.
- --minSim: Minimum adjusted similarity used to filter item-pair (i, j) similarities (see the sketch after this list).
- --topSim: Maximum number of similar items kept for each item (used to calculate predictions).
- --minCommons: Minimum number of common ratings (r_ui) required to consider two items similar.
- --topN: Maximum number of recommendations for each user.
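For reference, adjusted cosine similarity is conventionally computed over the ratings r_ui that two items i and j share, with each rating centered on its user's mean. A plain Scala sketch of that convention (whether CSDcosTest computes it exactly this way is an assumption):

```scala
// Hypothetical sketch: adjusted cosine similarity between two items.
// commonRatings holds, for each user u who rated both items i and j,
// the pair (r_ui, r_uj) already centered on that user's mean rating.
def adjustedCosine(commonRatings: Seq[(Double, Double)],
                   minCommons: Int): Option[Double] = {
  // Two items need at least --minCommons common raters to be comparable.
  if (commonRatings.size < minCommons) return None

  val dot   = commonRatings.map { case (a, b) => a * b }.sum
  val normI = math.sqrt(commonRatings.map { case (a, _) => a * a }.sum)
  val normJ = math.sqrt(commonRatings.map { case (_, b) => b * b }.sum)

  if (normI == 0.0 || normJ == 0.0) None
  else Some(dot / (normI * normJ)) // pairs below --minSim are filtered out later
}
```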
- Class: tianyu.algorithm.Comparison:
- Codes for parameters:
/$path_of_spark_package/bin/spark-submit \
--class tianyu.algorithm.Comparison \
...
~/myjars/recommendationengine_2.11-1.0.jar \
--rootDir /tianyu/lynnDockerTest \
--user zc14607X \
--alg Association \
--subType Full_History \
--DateTime 2017-06-15-08

- Instruction of parameters:
- --rootDir: Root directory of results file.
- --user: Name, ID or account of the user to look up.
- --alg: Name of the recommendation model (Association, Cluster, ALS, CosSim).
- --subType: Subsidiary type of the specific recommendation model (Full_History, Time_Window, Basic).
- --DateTime: Time when the analysis was executed, formatted as yyyy-MM-dd-HH (see the sketch after this list).
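All of the per-class options above are plain command-line arguments, and scopt_2.11-3.3.0.jar appears on the --jars line, so the classes presumably parse them with scopt. As an illustration, a sketch of what a scopt 3.x parser for Comparison's options might look like; the Config field names are hypothetical, not the project's actual code:

```scala
// Hypothetical sketch of a scopt 3.x argument parser for Comparison.
object ComparisonArgs {
  case class Config(rootDir: String = "", user: String = "", alg: String = "",
                    subType: String = "", dateTime: String = "")

  val parser = new scopt.OptionParser[Config]("Comparison") {
    opt[String]("rootDir").required().action((x, c) => c.copy(rootDir = x))
      .text("Root directory of the result files")
    opt[String]("user").required().action((x, c) => c.copy(user = x))
      .text("Name, ID or account of the user to look up")
    opt[String]("alg").required().action((x, c) => c.copy(alg = x))
      .text("Recommendation model: Association, Cluster, ALS or CosSim")
    opt[String]("subType").action((x, c) => c.copy(subType = x))
      .text("Subsidiary type: Full_History, Time_Window or Basic")
    opt[String]("DateTime").action((x, c) => c.copy(dateTime = x))
      .text("Execution time of the analysis, formatted yyyy-MM-dd-HH")
  }

  // Returns None (and prints the usage text) when the arguments do not parse.
  def parse(args: Array[String]): Option[Config] = parser.parse(args, Config())
}
```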