
fix import issue when running huggingface_lowresource.sh #4

Open
wolvecap wants to merge 1 commit into lancopku:main from wolvecap:main

Conversation

@wolvecap

Contributor

No description provided.

@RenShuhuai-Andy RenShuhuai-Andy left a comment

Member

Thanks very much for the PR. Here is some feedback:

  1. There is no need to change how packages are imported in taa/archive.py and the other modules, since I have moved the __main__ block out of taa/search.py into examples/reproduce_experiment.py.
  2. The __main__ block in taa/search_augment_train.py may also need to move to the examples folder; please check.
  3. If a dataset in huggingface/datasets does not come with its own validation set, please split off 10% of the training samples for validation.
  4. More specific revision suggestions are left as comments on each file; please check them.

Thanks again.

Comment thread taa/data.py
Comment on lines +109 to +113
if C.get()['ir'] < 1 and C.get()['method'] != 'bt':
    # rebalanced data
    ir_index = np.where(labels == 0)
    texts = np.append(texts, texts[ir_index].repeat(int(1 / C.get()['ir']) - 1))
    labels = np.append(labels, labels[ir_index].repeat(int(1 / C.get()['ir']) - 1))

Member

I have fixed this.

Comment thread taa/data.py
Comment on lines -48 to +59
transform_train.transforms.insert(0, Augmentation(default_policy()))
pass

Member

I have fixed this.

Comment thread taa/archive.py
('random_word_swap', 0.4009335761117499, 0.3015697007069029)]]


def default_policy():

Member

I have deleted the default policy since it is no longer used.

class_num = train_dataset.features['label'].num_classes
all_train_examples = get_examples(train_dataset, text_key)

train_examples, valid_examples = general_split(all_train_examples, test_size=test_size, train_size=1-test_size)

Member

A check should be added: if the dataset already has its own validation set, there is no need to split one off from the training set.
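The requested check could look roughly like the sketch below. It is only an illustration, not code from the taa package: `resolve_train_valid` is a hypothetical helper, `splits` stands in for a mapping from split name to examples, and the `"validation"` key name is an assumption about how the validation split is labeled. The shuffle is a plain random hold-out and does not stratify by label.

```python
import random
from typing import Dict, List, Sequence, Tuple


def resolve_train_valid(
    splits: Dict[str, Sequence],
    valid_fraction: float = 0.1,
    seed: int = 0,
    valid_key: str = "validation",  # assumed split name, adjust per dataset
) -> Tuple[List, List]:
    """Hypothetical helper: return (train, valid) example lists."""
    train = list(splits["train"])

    if valid_key in splits:
        # The dataset ships its own validation set: use it untouched.
        return train, list(splits[valid_key])

    # No validation set: hold out a fraction of the training examples.
    rng = random.Random(seed)
    indices = list(range(len(train)))
    rng.shuffle(indices)
    n_valid = max(1, int(len(train) * valid_fraction))
    held_out = indices[:n_valid]
    held_out_set = set(held_out)

    train_part = [ex for i, ex in enumerate(train) if i not in held_out_set]
    valid_part = [train[i] for i in held_out]
    return train_part, valid_part
```

With the default `valid_fraction=0.1`, a dataset that has only a training split gets 10% of its examples held out, while a dataset that already provides a validation split keeps its full training set.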


2 participants