• VL Hub is a vision-language pretraining framework that integrates CLIP pretraining, LiT-Tuning, CoCa, conventional timm vision models, and SimCLR contrastive models into a single train-test-eval framework, making it easier to compare models across architectures and objectives.
  • ArcheType is a framework that uses large language models for data cleaning and integration. This repository also includes the D4Tables dataset, a zero-shot dataset for experimenting with column type annotation.
  • JANuS is an image classification dataset designed for experimentation with controlled data. Each dataset in JANuS is either a subset or a superset of an existing dataset, and each is fully captioned and fully labeled, with either human-annotated or synthetic labels. JANuS is ideal for groups that don't have the resources to experiment with large-scale pretraining.
  • iNat Captions is a version of the iNaturalist 2021 dataset that includes ground-truth captions. The dataset is in webdataset format, allowing for convenient pretraining of vision-language models using VL Hub.
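The webdataset layout mentioned above groups each sample's files under a shared basename inside a tar shard, so an image and its caption travel together. A minimal sketch using only Python's standard library (the shard name, sample key, and caption below are invented for illustration; only the naming convention reflects the actual format):

```python
import io
import tarfile

def write_shard(path, samples):
    """Write (key, image_bytes, caption) triples as key.jpg / key.txt entries."""
    with tarfile.open(path, "w") as tar:
        for key, image_bytes, caption in samples:
            # Each field of a sample is stored as <key>.<extension>;
            # loaders regroup entries that share the same basename.
            for suffix, payload in ((".jpg", image_bytes),
                                    (".txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

# A single toy sample; real shards would hold thousands of images.
write_shard("shard-000000.tar",
            [("inat_000001", b"\xff\xd8\xff\xd9",
              "a close-up of a monarch butterfly")])

with tarfile.open("shard-000000.tar") as tar:
    names = sorted(tar.getnames())
print(names)  # image and caption share the key "inat_000001"
```

At training time, a webdataset-aware loader streams such shards sequentially, which is what makes the format convenient for large-scale vision-language pretraining.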