Feedback on Dataset Ambiguities #1

Open
AggarwalTushar opened this issue Jun 13, 2024 · 0 comments

Hi,

Thank you for the effort that went into creating this dataset :) It looks really useful for testing code models. However, I found that some examples in the dataset are ambiguous and some are noisy. They are listed below:

Problem_id_32: The task is to create a transition matrix from the given adjacency list, where entry i of the adjacency list stores all the nodes connected to node i. One would naturally update transition_matrix[node][neighbor] with the respective probability, but the ground truth updates transition_matrix[neighbor][node] instead. I feel the instruction should explicitly state which orientation is expected (row-major or column-major, i.e. whether the source node indexes the row or the column).
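To make the ambiguity concrete, here is a minimal sketch of both orientations. The function name, the `source_is_row` flag, and the uniform-probability assumption are mine for illustration and are not taken from the dataset.

```python
def build_transition_matrix(adj_list, source_is_row=True):
    """Build a uniform random-walk transition matrix from an adjacency list.

    adj_list[i] lists the neighbors of node i (hypothetical input format).
    """
    n = len(adj_list)
    matrix = [[0.0] * n for _ in range(n)]
    for node, neighbors in enumerate(adj_list):
        if not neighbors:
            continue  # node without neighbors: left as zeros in this sketch
        p = 1.0 / len(neighbors)
        for neighbor in neighbors:
            if source_is_row:
                matrix[node][neighbor] += p   # each row sums to 1
            else:
                matrix[neighbor][node] += p   # each column sums to 1
    return matrix


adj = [[1, 2], [2], [0]]
by_row = build_transition_matrix(adj, source_is_row=True)
by_col = build_transition_matrix(adj, source_is_row=False)
```

The two results are transposes of each other, so a test written for one orientation rejects an otherwise correct solution written for the other.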

Problem_id_66: The instruction for this problem looks good, but the ground truth, in addition to the edits mentioned in the instruction, adds one more row of data, namely "2024-01-02,P1001,Canada,Online,34,72.99,24,Female", to the 'data' variable. A model would not add this row unless the instruction explicitly asks for it, so most likely all models will fail on this example.

Problem_id_21: The task is to modify the 'distances_to' function to support negative weights. In the source code, the input is an undirected graph. In the ground truth, the undirected graph is converted into a directed one, and the Bellman-Ford algorithm is then implemented on top of it. I feel the instruction should explicitly mention that the undirected graph has to be converted into a directed one.
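The sketch below shows why the conversion matters. It uses a hypothetical edge-list representation and a standalone `bellman_ford` function, not the dataset's actual `distances_to` code. If every undirected edge is expanded into two directed edges, any negative edge immediately forms a negative cycle, so the result depends entirely on how the conversion is done.

```python
def bellman_ford(num_nodes, edges, source):
    """Return shortest distances from source, or None if a negative
    cycle is reachable. edges holds directed (u, v, weight) triples."""
    dist = [float("inf")] * num_nodes
    dist[source] = 0
    # Relax every edge num_nodes - 1 times.
    for _ in range(num_nodes - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # One more pass: any further improvement means a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist


undirected = [(0, 1, 4), (1, 2, -2), (0, 2, 5)]

# Interpretation A: read each pair as a single directed edge u -> v.
one_way = list(undirected)
# Interpretation B: expand each undirected edge into both directions.
both_ways = undirected + [(v, u, w) for u, v, w in undirected]

print(bellman_ford(3, one_way, 0))    # [0, 4, 2]
print(bellman_ford(3, both_ways, 0))  # None (1 -> 2 -> 1 has weight -4)
```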

Problem_id_28: The task is to implement a function that checks whether the given string contains special characters. The set of special characters should be explicitly listed in the instruction; otherwise the model cannot know what counts as a special character.
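As an illustration, here are two reasonable definitions of "special character" that disagree on a simple input. Both function names are hypothetical, and neither is claimed to be the definition used in the ground truth.

```python
import string


def has_special_punctuation(text):
    """Special = any ASCII punctuation character (string.punctuation)."""
    return any(ch in string.punctuation for ch in text)


def has_special_non_alnum(text):
    """Special = anything that is not a letter or a digit."""
    return any(not ch.isalnum() for ch in text)


sample = "hello world"
print(has_special_punctuation(sample))  # False: a space is not punctuation
print(has_special_non_alnum(sample))    # True: a space is not alphanumeric
```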

Please look into these and make the necessary changes to the dataset.
