About Datasets

Created by Patrick Kellenberger, Modified on Thu, 7 Nov, 2024 at 7:15 PM by Patrick Kellenberger

Q. My research uses datasets that have been withdrawn by their creators, such as DukeMTMC-ReID or MS-Celeb-1M. What should I do?

A. Generally, papers should not use datasets that have been withdrawn by their creators, as doing so may involve ethical violations or even legal complications. Under some circumstances, authors may feel they need to use such datasets, for example if fair comparison is impossible in any other way. However, authors who use such datasets should always explain the need to do so carefully and in some detail, as such claims will be closely scrutinized. Note that in many cases alternative datasets exist. The recommended course is not to use the dataset and, if necessary, to explain that this may affect certain comparisons with prior art. It is a violation of policy for a referee or area chair to require comparison on a dataset that has been withdrawn.

Q. My research relies on broadly used public datasets created by others, which have not been withdrawn but for which it is unclear whether they were approved by an IRB. Is this allowed?

A. In the case of broadly used datasets that are still offered by their creators, for which IRB approval status is unclear, authors are encouraged to discuss the situation, e.g., why no better alternatives are available.

Q. I wish to claim a dataset contribution in my paper, but I either cannot release the data publicly, or am not sure I will be able to do so by the time of publication. Is this an issue?

A. YES. If you wish to claim a dataset as one of your contributions, it is expected that the dataset will be ready and available at the time you submit the camera-ready paper. If you cannot ensure that you can meet this deadline, then the release of the dataset should not be one of the major scientific contributions of your paper. Note that it is still acceptable to submit work relying on a non-public dataset; you just cannot claim that dataset as one of your contributions, and the paper will have to be evaluated based on its other merits.