A feature-rich data annotation tool can significantly improve the accuracy of your ML projects. Let’s look at some open-source data annotation tools that serve multiple use cases.
Whether you are developing a machine learning model for a self-driving car or a product recommendation app, the first step is to label your training data. In other words, every dataset should be labeled (or annotated) so that, once deployed, the model can recognize similar patterns in unannotated data and take the appropriate action. Whether you end up with a high-performing ML model or a failed project depends in large part on the data annotation tools you use to label your data.
According to CloudFactory, a data annotation tool is “a cloud-based, on-premise, or containerized software solution that annotates training data for machine learning.” While many commercial tools are available for purchase, open-source or freeware data annotation tools are often preferred. Not only are they available at no cost, but they also let you customize your tools’ annotation accuracy thresholds and security features.
When choosing among the myriad of open-source projects, you should evaluate them in terms of the following:
To get the most out of your machine learning project, make sure the annotation tool can work with every file type you will need to label. You should be able to search, filter, sort, clone and merge datasets, whether they are stored locally or in the cloud.
Different tools store annotations in different output formats, such as Pascal VOC, TFRecords, or text files (CSV, TXT), to name a few. Choose a tool that meets your format requirements; otherwise, you will need to spend additional time converting your annotations to your target format.
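To give a sense of what such a conversion involves, here is a minimal sketch that turns a Pascal VOC-style pixel box (xmin, ymin, xmax, ymax) into a YOLO-style text line (class id, then center and size normalized by the image dimensions). The function names and sample values are illustrative assumptions, not part of any tool's API.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-coordinate corner box to normalized center/size values."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_line(class_id, box, img_w, img_h):
    """Format one annotation as a YOLO-style text line (hypothetical helper)."""
    xc, yc, w, h = voc_to_yolo(*box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Illustrative values: a 200x200 px box in a 640x480 image, class id 0.
print(yolo_line(0, (100, 200, 300, 400), 640, 480))
# prints: 0 0.312500 0.625000 0.312500 0.416667
```

Even a conversion this small has to be written, tested and maintained for every pair of formats, which is why picking a tool that exports your target format directly saves time.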
Ensure that the tool’s annotation methods for building and managing ontologies, such as classes and attributes, meet your particular use case requirements. While many tools cover a wide range of use cases, others focus only on specific types of labeling. According to CloudFactory, your chosen annotation tool should be able to annotate images for all the computer vision tasks you will employ, such as classification, object annotation, or semantic segmentation.
As far as the annotation app itself is concerned, not all tools can be used both online and offline. While some tools can function as both Windows apps and web-based apps, most of them are web-based only, so choose accordingly.
Also weigh the privacy risks before committing to a web-only tool. Working with a third-party web app may expose your system to a data breach. Likewise, you will want your tool to prevent unauthorized viewing or downloading of your data by annotators. In short, ensure that the annotation tool will help you meet any regulatory compliance requirements your use cases fall under.
Finally, look for tools that include hotkeys and a user interface that make manual annotation more efficient and less time-consuming.
CVAT can perform both image and video data annotation and can be installed in the local network using Docker or locally on any operating system. You can also work with it entirely online from CVAT’s website. With CVAT, you have a variety of annotation shapes to choose from, including everything from rectangles, polygons and polylines to points, cuboids, tags and tracks. It also supports a wide range of annotation formats such as CVAT, Pascal, XML, MS COCO, YOLO and TFRecords. Hotkeys and semantic segmentation are also supported.
Data quality for annotation can be set from very high full resolution to completely compressed. Among its collaborative features is its capability to divide annotation tasks among team members and monitor, visualize, and analyze annotation jobs. It also supports automated annotation using pre-trained models.
LabelImg has a Qt graphical interface that you can install locally on any operating system. It is available for Windows, Linux and Mac, and as a Python library through Anaconda or Docker. It supports a number of output formats such as Pascal, YOLO’s txts, CSV and TFRecords. It offers hotkeys and image verification, but it supports only the bounding box annotation shape and has no browser support.
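As an illustration of what a bounding-box annotation looks like once exported, the sketch below reads boxes from a Pascal VOC-style XML document using only the Python standard library. The sample XML is hand-written for this example (the file name and values are assumptions), and `read_voc_boxes` is a hypothetical helper, not a LabelImg function.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the Pascal VOC annotation layout (illustrative values).
SAMPLE_XML = """<annotation>
  <filename>street_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_boxes(xml_text):
    """Return one dict per <object>, holding its label and pixel corner coordinates."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(bndbox.findtext("xmin")),
            "ymin": int(bndbox.findtext("ymin")),
            "xmax": int(bndbox.findtext("xmax")),
            "ymax": int(bndbox.findtext("ymax")),
        })
    return boxes


print(read_voc_boxes(SAMPLE_XML))
```

A reader like this is often the first step in a training pipeline, since most frameworks expect boxes as plain numbers rather than XML.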
VIA can be run through a browser window and can label image, audio, and video data. Annotation shapes supported include bounding boxes, circles, ellipses, as well as polygons, points and polylines. You can also use it for text annotation. Supported output formats include COCO JSONs, Pascal and CSVs. Exporting to other formats will require additional external transformations. Along with hotkeys, VIA includes project management functionality for setting up multiple jobs for annotators and tracking their progress.
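The kind of external transformation mentioned above might look like the following sketch, which flattens a COCO-style dictionary into CSV rows. It assumes the general COCO layout (`images`, `annotations`, `categories`, with `bbox` stored as x, y, width, height); the exact fields in a given VIA export have not been verified here, and `coco_to_csv` and the sample data are hypothetical.

```python
import csv
import io

# Minimal hand-written sample in the general COCO layout (illustrative values).
SAMPLE_COCO = {
    "images": [{"id": 1, "file_name": "street_001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 7, "name": "car"}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 7, "bbox": [100, 200, 200, 200]}],
}


def coco_to_csv(coco):
    """Flatten COCO-style annotations into CSV text, one row per bounding box."""
    files = {img["id"]: img["file_name"] for img in coco["images"]}
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file_name", "label", "x", "y", "width", "height"])
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        writer.writerow([files[ann["image_id"]], names[ann["category_id"]], x, y, w, h])
    return out.getvalue()


print(coco_to_csv(SAMPLE_COCO))
```

Looking up file names and labels by id, as done here, is the main wrinkle when moving from COCO's relational layout to a flat format.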
VoTT can import data from both local and cloud storage and export labeled data back to local or cloud storage. It can run from source or on Windows, Linux or OSX. It is also available as a standalone web application that can run on any web browser. However, the web app requires that the dataset be uploaded to the cloud as it cannot access a local file system. It supports two types of annotation shapes: polygons and rectangles. Features include project tracking metrics and keyboard shortcuts. Along with common output formats CSV, Generic JSONs, Pascal, and TFRecords, VoTT also supports Microsoft Cognitive Toolkit (CNTK) and Azure Custom Vision Service.
CoLabeler is freeware rather than open source, but like its open-source counterparts it is free to download, install, use, and share. It uses bounding box and 2-D point annotation shapes and also supports text annotation.
Most small and medium-sized teams prefer free open-source tools. However, there may come a time when commercial solutions become a better value. For example, open-source tools are difficult to scale, as they typically do not offer the workflow features necessary for enterprise-scale teams working on data annotation. Additionally, while they are not free, commercial solutions can greatly reduce the cost of ownership associated with open-source tools, such as workflow development and ongoing support, which are typically built into commercial tools.
However, unlike self-built, open-source tools, vendors don’t typically build their tools to a customer’s specifications. You must decide what custom features you are willing to forgo both now and in the future. With that in mind, CloudFactory lists some key questions you should ask vendors before making a move to a commercial solution. For example, how does a vendor’s tool differ from other commercially available tools? What aspects of the machine data labeling process does their tool support? Are they open to making changes and feature enhancements to better serve your use cases?
In terms of dataset management, what features do they offer? Where can files be stored? What volume of data can the tool handle? Will you be able to upload pre-annotated images into the tool?
Does the tool come with an API and/or SDK? Can you upload custom-built classes and attributes into the tool? Can your own algorithms be plugged into the tool? Finally, what enterprise features are built into the tool, including security compliance or certifications, quality control, quality assurance or AI?
If you need control, data security, and the agility to make feature enhancements or other changes, self-built and self-managed open-source tools may end up being a better option than commercially produced tools.
IT Analyst, CMR Executive Advisory