Harvard Releases Free Large-Scale AI Training Database

Photo by Aleks Marinkovic on Unsplash

  • Written by Andrea Miliani Former Tech News Expert

Harvard University announced it will release, free of charge, a dataset of almost 1 million public-domain books for AI training. The dataset was created by the university’s new program, the Institutional Data Initiative (IDI).

In a Rush? Here are the Quick Facts!

  • Harvard, in collaboration with Google Books, is releasing a free dataset of almost 1 million public-domain books for training AI models
  • The dataset was created by the new Institutional Data Initiative, a program backed by Microsoft and OpenAI
  • Small organizations can use this collection to compete more fairly in the AI field

According to Wired, the dataset includes publications scanned by Google Books that are no longer protected by copyright, which usually expires a set period after the author’s death or the work’s publication. The collection spans many formats and genres, from creative writing by famous authors such as Charles Dickens, Shakespeare, and Dante to textbooks and dictionaries.

According to IDI’s executive director Greg Leppert, the goal is to “level the playing field” and give more organizations and small projects access to valuable training data so they can join the AI race. The dataset is larger than the books dataset used to train popular AI models such as Meta’s Llama. “I think about it a bit like the way that Linux has become a foundational operating system for so much of the world,” said Leppert.

The IDI was officially launched today, with funding and public support from OpenAI and Microsoft. The initiative aims to work with knowledge institutions such as government agencies and libraries “to develop data collections and best practices for artificial intelligence.” Details on how to download the new dataset have not been revealed; the only information so far is that Google will help distribute it.

Because it contains only public-domain works, the new collection should avoid the copyright infringement disputes that many AI companies have faced this year. “Large public domain datasets like these further demolish the ‘necessity defense’ some AI companies use to justify scraping copyrighted work to train their models,” Ed Newton-Rex, a former executive at Stability AI who now runs a nonprofit that certifies ethically trained AI tools, told Wired.

Newton-Rex recently led a petition to stop tech companies from scraping data to train their AI models.

Project Astra And Mariner Showcased As Key Innovations In Gemini 2.0

Image by Amanz, from Unsplash

  • Written by Kiara Fabbri Former Tech News Writer
  • Fact-Checked by Justyn Newman Former Lead Cybersecurity Editor

Google and Alphabet CEO Sundar Pichai announced on Wednesday the launch of Gemini 2.0, the latest version of the company’s AI model, which aims to advance multimodal reasoning and the development of intelligent agents.

In a Rush? Here are the Quick Facts!

  • Gemini 2.0 Flash offers low latency and improved performance for developers.
  • Project Astra improves dialogue and memory, supporting multilingual conversations and integration with tools such as Google Search, Lens, and Maps.
  • Project Mariner, an early prototype, completed web tasks with an 83.5% success rate on the WebVoyager benchmark.

With Gemini 2.0, Google moves closer to its vision of creating a universal assistant, with new features that will expand the utility of artificial intelligence across various domains.

Gemini 2.0 follows Gemini 1.0, introduced in December 2023, which marked the beginning of Google’s push into multimodal AI models capable of processing and understanding text, images, video, audio, and code.

Gemini 2.0 extends these capabilities to multimodal output as well as input: besides accepting images, video, and audio, it can natively generate images and text-to-speech audio in multiple languages.

At the core of Gemini 2.0 is its “agentic” nature, meaning it can reason through problems and take actions on behalf of the user under their supervision. The release marks the beginning of a broader effort to integrate these advanced capabilities into everyday products.

These capabilities bring Gemini 2.0 closer to becoming a universal assistant, capable of supporting various applications across industries.

For developers, the launch also includes the Gemini 2.0 Flash model, a workhorse with low latency and improved performance, available through the Gemini API and Google AI Studio. The model is designed for real-time, efficient interactions, which should improve the speed and responsiveness of AI applications.

Gemini 2.0 Flash is available to developers now as an experimental model, with general availability planned for early 2025. Users can also try an upgraded version of the Gemini assistant in the Gemini app on desktop and mobile.

Google also showcased new projects powered by Gemini 2.0 that demonstrate the evolving capabilities of AI agents. Project Astra is an AI assistant that uses multimodal understanding to interact with the real world. Feedback from trusted testers has helped Google refine its capabilities.

The latest updates include improved dialogue, with support for multiple languages and accents, and new tool integrations like Google Search, Lens, and Maps. Project Astra also boasts better memory, allowing it to recall past conversations, and lower latency, enabling near-instant responses.

Google is expanding its tester program and plans to bring these features to various devices, including prototype glasses.

Project Mariner, another project powered by Gemini 2.0, is a browser-based AI agent designed to assist with complex tasks. It can navigate web pages, understand text, code, images, and forms, and perform tasks like filling out forms.

Although still in its early stages, it has performed well, achieving an 83.5% success rate on the WebVoyager benchmark. Safety measures are in place, such as requiring user confirmation before sensitive actions like making a purchase.

Google is also experimenting with AI agents in gaming, where they help players navigate virtual worlds, and in robotics, where Gemini 2.0’s spatial reasoning is applied to real-world tasks. Google emphasizes responsible development, working with experts to mitigate risks and ensure the safety of these advanced AI systems.