Open
Commits
43 commits
dbd307f
fix the issues in preprocess and np.stack in batches in vec
LangFeng0912 Mar 22, 2023
896e819
add the learn_sep into the type4py pepeline
LangFeng0912 Mar 22, 2023
b38ba1e
add gen_cluster and update reduce for batches script
LangFeng0912 Mar 23, 2023
bfb9732
add infer_project CLI command for infer the dataset
LangFeng0912 Mar 24, 2023
4cc376f
add more comments
LangFeng0912 Mar 28, 2023
084aaa2
fix the issues
LangFeng0912 Apr 6, 2023
298c962
fix the issues, add the predicts script amd exceptions script
LangFeng0912 Apr 10, 2023
0408072
update infer-project .py
LangFeng0912 May 29, 2023
e336713
add project-base inference pipeline
LangFeng0912 May 29, 2023
08a9f51
add project-base inference for ml & hybrid
LangFeng0912 May 29, 2023
bd4bb81
add script explanations
LangFeng0912 May 29, 2023
51a9693
update infer-project base approach name t4pyre and t4pyright
LangFeng0912 Jun 5, 2023
5c5f522
update t4pyright logic in infer-project approach
LangFeng0912 Jun 6, 2023
8698994
rename type_preprocess script
LangFeng0912 Jun 8, 2023
e9b1a11
update TypeAnnotationFinder & Masker to libsa4py and import from it
LangFeng0912 Jun 8, 2023
0b491e2
update comments
LangFeng0912 Jun 8, 2023
afbddd7
update vectorize
LangFeng0912 Aug 17, 2023
2e80ff1
update preprocess
LangFeng0912 Aug 17, 2023
f7d1fef
update learn_split.py
LangFeng0912 Aug 17, 2023
ed8af0d
update learn_split.py
LangFeng0912 Aug 17, 2023
2ff621e
update learn_split.py
LangFeng0912 Aug 17, 2023
1a5332a
update learn_split.py
LangFeng0912 Aug 17, 2023
40f1302
update learn_split.py
LangFeng0912 Aug 17, 2023
9aa640b
update pipeline
LangFeng0912 Aug 18, 2023
e01df3c
update pipeline
LangFeng0912 Aug 18, 2023
3eb4fe9
update pipeline
LangFeng0912 Aug 18, 2023
33d169f
update Dockerfile for cuda version
LangFeng0912 Aug 18, 2023
8e0f0de
update model parameters
LangFeng0912 Aug 18, 2023
3dc74c5
update model parameters
LangFeng0912 Aug 23, 2023
97c1c11
update model parameters
LangFeng0912 Aug 23, 2023
67b8a70
update infer main approach
LangFeng0912 Aug 23, 2023
b369011
update infer main approach
LangFeng0912 Aug 23, 2023
acbdae1
update infer main approach
LangFeng0912 Aug 23, 2023
d612597
update infer main approach
LangFeng0912 Aug 23, 2023
704af65
update infer main approach
LangFeng0912 Aug 23, 2023
6fab977
update type preprocess_list
LangFeng0912 Aug 24, 2023
ace9baf
update type preprocess_list
LangFeng0912 Aug 24, 2023
1750b74
update eval scripts
LangFeng0912 Aug 24, 2023
46b3ea5
update eval scripts
LangFeng0912 Aug 24, 2023
9592acf
update infer project scripts
LangFeng0912 Aug 24, 2023
40880e1
update infer project scripts
LangFeng0912 Aug 24, 2023
502a1ca
update infer project scripts
LangFeng0912 Aug 24, 2023
3e94481
update README.md
LangFeng0912 Sep 6, 2023
68 changes: 48 additions & 20 deletions Dockerfile.cuda
@@ -1,15 +1,41 @@
# NOTE: This Docker file is configured to deploy Type4Py on our server and GPUs.
# For us, these configs seem to work: CUDA 11.0.3, ONNX v1.10.0, nvidia driver 450.36.06
# FROM --platform=linux/amd64 ubuntu

FROM nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04
WORKDIR /type4py/
# Put the required models files in a folder "t4py_model_files" inside "/type4py"
# -type4py/
# --type4py/
# --t4py_model_files/
COPY . /type4py
ENV T4PY_LOCAL_MODE="1"

RUN apt update --fix-missing -y && apt upgrade -y && apt install -y python3-pip libpq-dev
RUN ln -snf /usr/share/zoneinfo/$CONTAINER_TIMEZONE /etc/localtime && echo $CONTAINER_TIMEZONE > /etc/timezone

# RUN apt-get purge libappstream3
RUN apt-get update

# python 3.8 installed by one of the following packages
# install packages needed
RUN apt-get install -y vim
RUN apt-get install -y wget
RUN apt-get install -y unzip
RUN apt-get install -y git
RUN apt install -y software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa

RUN apt install -y expect

RUN apt-get install -y python3-distutils

RUN wget https://bootstrap.pypa.io/get-pip.py
RUN python3 get-pip.py

RUN pip --version

RUN apt-get install -y libssl-dev

# download watchman
RUN wget https://github.com/facebook/watchman/releases/download/v2022.12.12.00/watchman_ubuntu20.04_v2022.12.12.00.deb
# RUN dpkg -i watchman_ubuntu20.04_v2022.12.12.00.deb
# RUN apt-get -f -y install
# RUN watchman version

RUN apt install -y python3.8-venv
RUN python3 -m venv py38
# RUN /bin/bash -c "source py38/bin/activate"

# The current model files are pickled with the below ver. of sklearn
RUN pip install scikit-learn==0.24.1
@@ -20,20 +46,22 @@ RUN pip install https://type4py.com/pretrained_models/annoy-wheels/annoy-1.17.0-
# For production env., install ONNXRuntime with GPU support
RUN pip install onnx==1.10 onnxruntime==1.10 onnxruntime-gpu==1.10

# Install Type4Py
RUN pip install -e .
RUN pip install --upgrade pip
RUN pip install setuptools-rust

# Web server's required packages
RUN pip install -r type4py/server/requirements.txt
# install libsa4py
RUN git clone https://github.com/LangFeng0912/libsa4py.git
RUN pip install -r libsa4py/requirements.txt
RUN pip install -e libsa4py/

# install type4py
RUN git clone https://github.com/LangFeng0912/type4py.git
# RUN pip install -e type4py/

# Install NLTK corpus
RUN python3 -c "import nltk; nltk.download('stopwords')"
RUN python3 -c "import nltk; nltk.download('wordnet')"
RUN python3 -c "import nltk; nltk.download('omw-1.4')"
RUN python3 -c "import nltk; nltk.download('averaged_perceptron_tagger')"

WORKDIR /type4py/type4py/server/

EXPOSE 5010

CMD ["bash", "run_server.sh"]
# download dataset
RUN wget https://zenodo.org/record/8255564/files/ManyTypes4PyV8.tar.gz?download=1
74 changes: 57 additions & 17 deletions README.md
@@ -39,6 +39,8 @@ pip install .
Follow the below steps to train and evaluate the Type4Py model.
## 1. Extraction
**NOTE:** Skip this step if you're using the ManyTypes4Py dataset.

**NOTE:** A new version of the ManyTypes4Py dataset (MTV0.8) is available on [Zenodo](https://zenodo.org/record/8321283).
```
$ type4py extract --c $DATA_PATH --o $OUTPUT_DIR --d $DUP_FILES --w $CORES
```
@@ -63,48 +65,86 @@ $ type4py vectorize --o $OUTPUT_DIR
Description:
- `$OUTPUT_DIR`: The path that was used in the previous step to store processed projects.

## 4. Learning
[//]: # (## 4. Learning)

[//]: # (```)

[//]: # ($ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE)

[//]: # (```)

[//]: # (Description:)

[//]: # (- `$OUTPUT_DIR`: The path that was used in the previous step to store processed projects.)

[//]: # (- `--c`: Trains the complete model. Use `type4py learn -h` to see other configurations.)

[//]: # ()
[//]: # (- `--p $PARAM_FILE`: The path to user-provided hyper-parameters for the model. See [this](https://github.com/saltudelft/type4py/blob/main/type4py/model_params.json) file as an example. [Optional])

## 4*. Learning separately
```
$ type4py learn --o $OUTPUT_DIR --c --p $PARAM_FILE
$ type4py learns --o $OUTPUT_DIR --dt $DATA_TYPE --c --p $PARAM_FILE
```
Description:
- `$OUTPUT_DIR`: The path that was used in the previous step to store processed projects.
- `$DATA_TYPE`: The data type to train on in sequential learning: `var`, `param`, or `ret`.
- `--c`: Trains the complete model. Use `type4py learn -h` to see other configurations.

- `--p $PARAM_FILE`: The path to user-provided hyper-parameters for the model. See [this](https://github.com/saltudelft/type4py/blob/main/type4py/model_params.json) file as an example. [Optional]
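
Training all three data types means running the command once per `$DATA_TYPE`. The following is a minimal sketch, assuming `type4py` is on the `PATH` and that the `learns` flags are exactly those shown above. The helper name `build_learns_commands`, the example paths, and the `var`, `param`, `ret` ordering are hypothetical. The helper only builds argument lists, so nothing is executed until a list is passed to `subprocess.run`.

```python
from typing import List, Optional

# Values listed for --dt above; the order here is an assumption.
DATA_TYPES = ("var", "param", "ret")


def build_learns_commands(output_dir: str,
                          param_file: Optional[str] = None) -> List[List[str]]:
    """Build one `type4py learns` invocation per data type (hypothetical helper)."""
    commands = []
    for data_type in DATA_TYPES:
        cmd = ["type4py", "learns", "--o", output_dir, "--dt", data_type, "--c"]
        if param_file is not None:  # --p is optional
            cmd += ["--p", param_file]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in build_learns_commands("./output"):
        print(" ".join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` so that a failing step stops the sequence.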

## 5. Testing
## 5**. Generating Type Clusters
```
$ type4py predict --o $OUTPUT_DIR --c
$ type4py gen_type_clu --o $OUTPUT_DIR --dt $DATA_TYPE
```
- `$OUTPUT_DIR`: The path that was used in the previous step to store processed projects.
- `$DATA_TYPE`: The data type to generate type clusters for: `var`, `param`, or `ret`.

## 6. Reducing Type Cluster
To reduce the dimension of the created type clusters in step 5, run the following command:
> Note: The reduced version of type clusters causes a slight performance loss in type prediction.
```
$ type4py reduce --o $OUTPUT_DIR --d $DIMENSION
```

Description:
- `$OUTPUT_DIR`: The path that was used in the first step to store processed projects.
- `--c`: Predicts using the complete model. Use `type4py predict -h` to see other configurations.
- `$DIMENSION`: Reduces the dimension of type clusters to the specified value [Default: 256]
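
This README does not say which technique `reduce` applies. Purely as an illustration of what reducing type-cluster vectors to `$DIMENSION` components can look like, here is a minimal sketch that assumes PCA, computed with NumPy's SVD. The function name `reduce_dimension` and the random input are hypothetical and are not taken from Type4Py.

```python
import numpy as np


def reduce_dimension(clusters: np.ndarray, dim: int = 256) -> np.ndarray:
    """Project row vectors onto their top `dim` principal components (PCA)."""
    centered = clusters - clusters.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(1000, 512))  # stand-in for type-cluster vectors
    print(reduce_dimension(vectors, dim=256).shape)
```

Dropping the low-variance components discards some information, which is one plausible reason a reduced cluster can cost a little prediction accuracy.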

## 6. Evaluating
## 7*. Project-based inference
```
$ type4py infer_project --m results --p raw_projects --o results --a t4py
```
$ type4py eval --o $OUTPUT_DIR --t c --tp 10
- `--m`: The path where the trained model is saved.
- `--p`: The path to the raw projects used for project-based inference.
- `--o`: The path where the inference results are written.
- `--a`: The inference approach to use: `t4py`, `t4pyre`, or `t4pyright`.
```
$ type4py infer_project --m results --p raw_projects --o results --a t4pyre
```
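
To compare approaches on the same projects, the command can be generated once per `--a` value. The following is a minimal sketch, assuming the `infer_project` flags are exactly those shown above. The helper name `build_infer_command` and the directory names are hypothetical, and the helper only builds the command without running it.

```python
import shlex
from typing import List

# Approach names listed for --a above.
APPROACHES = ("t4py", "t4pyre", "t4pyright")


def build_infer_command(model_dir: str, projects_dir: str,
                        output_dir: str, approach: str) -> List[str]:
    """Build a `type4py infer_project` invocation (hypothetical helper)."""
    if approach not in APPROACHES:
        raise ValueError(f"unknown approach {approach!r}; expected one of {APPROACHES}")
    return ["type4py", "infer_project",
            "--m", model_dir, "--p", projects_dir,
            "--o", output_dir, "--a", approach]


if __name__ == "__main__":
    for approach in APPROACHES:
        # shlex.join quotes arguments safely for display in a shell.
        print(shlex.join(build_infer_command("results", "raw_projects", "results", approach)))
```

Validating the approach name up front gives a clear error before any model files are loaded.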

## 7. Testing
```
$ type4py predicts --o $OUTPUT_DIR
```

Description:
- `$OUTPUT_DIR`: The path that was used in the first step to store processed projects.
- `--t`: Evaluates the model considering different prediction tasks. E.g., `--t c` considers all predictions tasks,
i.e., parameters, return, and variables. [Default: c]
- `--tp 10`: Considers Top-10 predictions for evaluation. For this argument, You can choose a positive integer between 1 and 10. [Default: 10]

Use `type4py eval -h` to see other options.
[//]: # (- `--c`: Predicts using the complete model. Use `type4py predict -h` to see other configurations.)

## Reduce
To reduce the dimension of the created type clusters in step 5, run the following command:
> Note: The reduced version of type clusters causes a slight performance loss in type prediction.
## 8. Evaluating
```
$ type4py reduce --o $OUTPUT_DIR --d $DIMENSION
$ type4py eval --o $OUTPUT_DIR --t c --tp 10
```

Description:
- `$OUTPUT_DIR`: The path that was used in the first step to store processed projects.
- `$DIMENSION`: Reduces the dimension of type clusters to the specified value [Default: 256]
- `--t`: Evaluates the model on a given prediction task. E.g., `--t c` considers all prediction tasks,
i.e., parameters, return, and variables. [Default: c]
- `--tp 10`: Considers the Top-10 predictions for evaluation. For this argument, you can choose a positive integer between 1 and 10. [Default: 10]

Use `type4py eval -h` to see other options.
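
To make the `--tp` argument concrete, here is a minimal sketch of a Top-k metric: a prediction counts as correct if the true type appears among the first k ranked candidates. It assumes exact string matching and is not Type4Py's evaluation code, which may apply other match criteria. The function name `top_k_accuracy` and the sample data are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int = 10) -> float:
    """Fraction of samples whose true type is among the first k ranked predictions."""
    if not 1 <= k <= 10:
        raise ValueError("k must be between 1 and 10")
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(predictions, truths) if truth in ranked[:k])
    return hits / len(truths)


if __name__ == "__main__":
    ranked = [["int", "str"], ["str", "bool"], ["List[int]", "dict"]]
    expected = ["int", "bool", "dict"]
    print(top_k_accuracy(ranked, expected, k=1))
    print(top_k_accuracy(ranked, expected, k=2))
```

In this sample only the first true type is ranked first, so raising k from 1 to 2 increases the score.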


# Converting Type4Py to ONNX
To convert the pre-trained Type4Py model to the [ONNX](https://onnxruntime.ai/) format, use the following command: