Thinking Machines Lab

Thinking Machines Lab is an artificial intelligence research and product company. We're building a future where everyone has access to the knowledge and tools to make AI work for their unique needs and goals.

While AI capabilities have advanced dramatically, key gaps remain. The scientific community's understanding of frontier AI systems lags behind rapidly advancing capabilities. Knowledge of how these systems are trained is concentrated within the top research labs, limiting both the public discourse on AI and people's abilities to use AI effectively. And, despite their potential, these systems remain difficult for people to customize to their specific needs and values. To bridge the gaps, we're building Thinking Machines Lab to make AI systems more widely understood, customizable and generally capable.

We are scientists, engineers, and builders who've created some of the most widely used AI products, including ChatGPT and Character.ai, open-weights models like Mistral, as well as popular open source projects like PyTorch, OpenAI Gym, Fairseq, and Segment Anything.

Our Mission


We are focused on three key objectives:

  1. Enhancing Accessibility: AI should be as customizable and adaptable as possible. Thinking Machines Lab is developing tools that make it easier for users, both technical and non-technical, to shape AI models according to their specific needs.

  2. Promoting Transparency: Unlike many AI companies that guard their research behind closed doors, Thinking Machines Lab is committed to publishing technical notes, research papers, and code. By taking an open-science approach, it aims to encourage collaboration and accelerate AI advancements.

  3. Advancing AI Capabilities: Current AI systems excel in structured tasks like coding and mathematics, but they still struggle with real-world adaptability. Thinking Machines Lab is working to develop models that can handle a broader range of tasks and integrate into more industries.

Science is better when shared

Scientific progress is a collective effort. We believe that we'll most effectively advance humanity's understanding of AI by collaborating with the wider community of researchers and builders. We plan to frequently publish technical blog posts, papers, and code. We think sharing our work will not only benefit the public, but also improve our own research culture.

AI that works for everyone

Emphasis on human-AI collaboration. Instead of focusing solely on making fully autonomous AI systems, we are excited to build multimodal systems that work with people collaboratively.

More flexible, adaptable, and personalized AI systems. We see enormous potential for AI to help in every field of work. While current systems excel at programming and mathematics, we're building AI that can adapt to the full spectrum of human expertise and enable a broader spectrum of applications.

Solid foundations matter

Model intelligence as the cornerstone. In addition to our emphasis on human-AI collaboration and customization, model intelligence is crucial and we are building models at the frontier of capabilities in domains like science and programming. Ultimately, the most advanced models will unlock the most transformative applications and benefits, such as enabling novel scientific discoveries and engineering breakthroughs.

Infrastructure quality as a top priority. Research productivity is paramount and heavily depends on the reliability, efficiency, and ease of use of infrastructure. We aim to build things correctly for the long haul, to maximize both productivity and security, rather than taking shortcuts.

Advanced multimodal capabilities. We see multimodality as critical to enabling more natural and efficient communication, preserving more information, better capturing intent, and supporting deeper integration into real-world environments.

Learning by doing

Research and product co-design. Products enable iterative learning through deployment, while great products and research strengthen each other. Products keep us grounded in reality and guide us to solve the most impactful problems.

Empirical and iterative approach to AI safety. The most effective safety measures come from a combination of proactive research and careful real-world testing. We plan to contribute to AI safety by:

  1. Maintaining a high safety bar--preventing misuse of our released models while maximizing users' freedom.

  2. Sharing best practices and recipes for how to build safe AI systems with the industry.

  3. Accelerating external research on alignment by sharing code, datasets, and model specs.

We believe that methods developed for present day systems, such as effective red-teaming and post-deployment monitoring, provide valuable insights that will extend to future, more capable systems.

Measure what truly matters. We'll focus on understanding how our systems create genuine value in the real world. The most important breakthroughs often come from rethinking our objectives, not just optimizing existing metrics.

Our Team


Mira Murati
(CEO)
Lilian Weng
(Co-Founder)
Barret Zoph
(CTO)
Jonathan Lachman
(Head of Ops)
John Schulman
(Chief Scientist)
Alex Gartrell
(Infrastructure)
Alexander Kirillov
(Multimodality)
Andrew Tulloch
(Technical Staff)
Brydon Eastman
(Research Scientist)
Christian Gibson
(Technical Staff)
Devendra Chaplot
(Founding Team)
Ian O'Connell
(SW Engr in Boulder)
Jacob Menick
(Tech Staff in UK)
Joshua Gross
(Technical Staff)
Kurt Shuster
(Research Engr NYC)
Kyle Luther
(Technical Staff)
Luke Metz
(Research Engineer)
Mario Saltarelli
(IT)
Myle Ott
(Founding Team)
Nikki Sommer
(Head of HR)
Noah Shpak
(Tech Staff NYC)
Pia Santos
(Founding Team)
Randall Lin
(Technical Staff)
Rowan Zellers
(Research Engineer)
Sam Schoenholz
(Research Engineer)
Sam Shleifer
(Research in NYC)
Stephen Chen
(Software Engineer)
Stephen Roller
(Research in NYC)
Yinghai Lu
(Technical Staff)

Join Us

We're building AI systems that push technical boundaries while delivering real value to as many people as possible. Our team combines rigorous engineering with creative exploration, and we're looking for collaborators to help shape this vision.

You can follow us on X at @thinkymachines or submit job applications here if you're interested in working with us.

Product Builders

Join us in the exciting early stages of building something transformative. We are looking for people with a strong track record of building successful AI-driven products from the ground up and enthusiasm about wearing multiple hats--building functional product prototypes, crafting smooth UI designs and directing product decisions--to bring cutting-edge AI to the real world.

Machine Learning Experts

We're putting together a small, high-caliber team of machine learning scientists and engineers. The activities will range from building training infrastructure to carrying out exploratory research projects. Whether you hold a PhD or are self-taught, we're interested in candidates who can demonstrate concrete achievements in ML research and engineering through:

  • Research publications in ML
  • Open-source ML implementations
  • Experience building and scaling ML systems

Research Program Manager

An efficient process can transform an entire team's productivity. As our first Research Program Manager, you'll shape how our team operates, scale our human data efforts, and lead key projects like GPU compute planning. If you excel at problem solving, fast learning, and driving operational excellence, this is your chance to make a big impact. A strong technical foundation and the ability to learn things fast will be highly valued.


Posts


Mira Murati
(CEO)

Today, we are excited to announce Thinking Machines Lab (ThinkingMachines.ai), an artificial intelligence research and product company. We are scientists, engineers, and builders behind some of the most widely used AI products and libraries, including ChatGPT, Character.ai, PyTorch, and Mistral. Our mission is to make artificial intelligence work for you by building a future where everyone has access to the knowledge and tools to make AI serve their unique needs.

We are committed to open science through publications and code releases, while focusing on human-AI collaboration that serves diverse domains. Our approach embraces co-design of research and products to enable learning from real-world deployment and rapid iteration. This work requires three core foundations: state-of-the-art model intelligence, high-quality infrastructure, and advanced multimodal capabilities. We are committed to building models at the frontier of capabilities to deliver on this promise.

If you’re interested in joining our team, consider applying here.


I started Thinking Machines Lab alongside a remarkable team of scientists, engineers, and builders. We're building three things:
  1. Helping people adapt AI systems to work for their specific needs
  2. Developing strong foundations to build more capable AI systems
  3. Fostering a culture of open science that helps the whole field understand and improve these systems
Our goal is simple: advance AI by making it broadly useful and understandable through solid foundations, open science, and practical applications.



Lilian Weng
(Co-Founder)
This is something we have been cooking together for a few months and I'm very excited to announce it today.

Thinking Machines Lab is my next adventure and I'm feeling very proud and lucky to start it with a group of talented colleagues. Learn more about our vision at: ThinkingMachines.ai


AI Terms

Artificial intelligence (AI) is a technology that enables computers to perform tasks that typically require human intelligence. AI uses math, logic, and large amounts of data to learn and improve its performance over time.

AI Algorithm is a set of instructions that teach a computer how to learn, analyze data, and make decisions. AI algorithms are the building blocks of artificial intelligence.

AI Alignment refers to a process of encoding human values into AI models to make them safer and more reliable.

AI Bias is when an AI system produces biased results that discriminate against certain groups of people. This bias can be based on factors like race, gender, disability, or socioeconomic status.

AI Distillation is a machine learning technique that transfers knowledge from a large AI model to a smaller one. The larger model is called the "teacher model" and the smaller model is called the "student model".
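
As a rough illustration of the teacher/student idea, the sketch below scores how closely a student's predictions match a teacher's. It assumes one common formulation (KL divergence between temperature-softened output distributions); the logits and the names `softmax` and `distillation_loss` are made up for this example and do not describe any particular lab's method.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature softens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between the two temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores for three classes (illustrative values only).
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
bad_student = [0.5, 4.0, 1.0]    # prefers a different class

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```

Training the student means adjusting its parameters to push this loss toward zero, so the student that agrees with the teacher gets the lower score.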

AI Fine-tuning is the process of adapting a pre-trained model to a specific task or dataset. It's a fundamental technique in Deep Learning.

AI Hallucinations are incorrect or misleading results from artificial intelligence (AI) models. These errors can occur when AI models misinterpret data or make incorrect assumptions.

AI Parameters are variables that control how a model processes data and makes decisions. They are learned during training and adjusted to improve the model's performance.

Artificial General Intelligence (AGI) refers to a theoretical type of artificial intelligence that could perform any intellectual task a human can, including reasoning, learning, problem-solving, and adapting to new situations, without being specifically programmed for each task. It is considered a future goal of AI research, since current AI systems are primarily designed for specific tasks rather than broad, human-like cognitive abilities.

Augmented intelligence is a technology that uses artificial intelligence (AI) and machine learning to help humans work more effectively. It's also known as intelligence amplification (IA) or cognitive augmentation.

Autonomous Systems are systems that can operate independently and make decisions without human supervision. They can be used in many fields, including transportation, healthcare, and manufacturing.

Chatbot is a computer program designed to simulate conversation with human users.

Data Mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems.

Deep Learning (DL) is a type of machine learning that teaches computers to learn from data using artificial neural networks in which multiple layers of processing are used to extract progressively higher level features from data.

Diffusion Models are a type of artificial intelligence (AI) that can generate images, audio, and other data. They are a leading generative AI technology.

Generative AI (GenAI) is a type of artificial intelligence (AI) that can create new content, ideas, or data. It can produce text, images, videos, music, and more.

GPT stands for Generative Pre-trained Transformer, a type of artificial intelligence (AI) that can create human-like text. GPT models are used in applications like ChatGPT, and are a key advancement in AI.

Human-AI Collaboration is a partnership between human intelligence and artificial intelligence (AI) to solve problems and create new ideas. This collaboration can be beneficial in many fields, including healthcare, customer service, and scientific research.

Human-Centered AI is AI that seeks to augment the abilities of, address the societal needs of, and draw inspiration from human beings. It researches and builds effective partners and tools for people, such as a robot helper and companion for the elderly.

Intelligence might be defined as the ability to learn and perform suitable techniques to solve problems and achieve goals, appropriate to the context in an uncertain, ever-varying world. A fully pre-programmed factory robot is flexible, accurate, and consistent - but not intelligent.

Large Language Model (LLM) is an artificial intelligence (AI) system that can understand, generate, and process human language. LLMs are trained on large amounts of data using machine learning techniques.

Machine learning (ML) is a subfield of artificial intelligence in which computers learn patterns from data instead of following explicitly programmed rules. Artificial intelligence itself is broadly defined as the capability of a machine to imitate intelligent human behavior, and AI systems are used to perform complex tasks in a way that is similar to how humans solve problems.

Multimodal Systems process multiple types of user input, such as speech, gestures, and eye movements, and then coordinate that input with multimedia output. Multimodal systems are designed to recognize human language and behavior.

Natural Language Processing (NLP) is a field of computer science that enables computers to understand and process human language, interpreting the meaning of text or speech. It is often used for tasks like sentiment analysis, machine translation, and text summarization. In effect, it lets machines "communicate" in a way similar to humans.

Neural Network is a type of artificial intelligence (AI) that teaches computers to process data in a way that mimics the human brain. Neural networks are a key part of machine learning, a subset of AI that allows systems to learn and improve on their own.

Pre-training is a machine learning technique that involves training a model on a large dataset before using it for a specific task. It's a common way to initialize machine learning models. For language models, pre-training on web text yields a model that can generate content resembling what is found on the web.

Post-training models are machine learning models that have been refined after initial training. The goal of post-training is to improve the model's performance and make it more efficient. It targets a narrower range of behaviors, like being a chat assistant.

Prompt Engineering is the process of writing instructions that guide AI models to produce desired outputs. It's a creative and iterative process that involves choosing words, phrases, and formats to get the AI to respond in a meaningful way.

Reinforcement Learning is a type of machine learning where an AI learns to make decisions by performing actions and receiving rewards or penalties. It lets an agent learn action sequences that optimize its total rewards, such as winning games, without explicit examples of good techniques, enabling autonomy.
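
To make the reward-driven loop concrete, here is a minimal sketch that assumes tabular Q-learning (one of many reinforcement learning algorithms) on a made-up five-cell corridor. The environment, the reward of 1 for reaching the last cell, and all hyperparameters are illustrative assumptions rather than anything from a real system.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, reaching cell 4 ends the episode
ACTIONS = [-1, +1]    # move one cell left or right

def step(state, action):
    """Apply an action; reward is 1.0 only when the goal cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values from trial and error alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                      # explore
            else:
                best = max(q[state])                      # exploit, breaking ties randomly
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])  # Q-learning update
            state = next_state
    return q

q_table = train()
# Greedy action for each non-terminal cell.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

The agent is never shown a correct route. It only sees rewards, yet the learned policy ends up moving right in every cell because that sequence maximizes the discounted total reward.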

Supervised Learning is where a computer learns to predict human-given labels, such as dog breeds based on labeled dog pictures.
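
The dog-breed example can be sketched with a very small classifier. This assumes a nearest-centroid method and invented measurements (height in cm, weight in kg) standing in for labeled pictures; real image classifiers are far more complex.

```python
def fit_centroids(examples):
    """Average the feature vectors for each human-given label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

# Invented training data: (height_cm, weight_kg) paired with a breed label.
training_data = [
    ((20.0, 2.5), "chihuahua"),
    ((18.0, 2.0), "chihuahua"),
    ((22.0, 3.0), "chihuahua"),
    ((57.0, 30.0), "labrador"),
    ((60.0, 34.0), "labrador"),
    ((55.0, 28.0), "labrador"),
]

centroids = fit_centroids(training_data)
print(predict(centroids, (21.0, 2.8)))
print(predict(centroids, (58.0, 31.0)))
```

The labels supplied with the training data are what make this supervised: the model only learns to reproduce categories that a person assigned.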

Transformer Models are a type of neural network that has transformed natural language processing in AI. They take a different approach compared to previous models: transformers can train on entire texts all at once in parallel, rather than word-by-word in order. This makes training drastically faster.

Unsupervised Learning does not require labels, sometimes making its own prediction tasks, such as trying to predict each successive word in a sentence.
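
A toy version of the next-word task is sketched below. It assumes a simple bigram count model and a made-up corpus; the point is only that the prediction targets come from the raw text itself, with no human-written labels.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which; the text supplies its own targets."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of a word, or None if the word is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up corpus for illustration.
corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."
model = train_bigram(corpus)

print(predict_next(model, "the"))
print(predict_next(model, "sat"))
print(predict_next(model, "zebra"))
```

Large language models are trained on the same kind of task at vastly greater scale, using neural networks instead of count tables.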

AI Jobs

AI Engineer is a professional who develops, programs, and maintains artificial intelligence (AI) systems. They use AI and machine learning (ML) to create applications that can perform human-like tasks.

AI Ethics Specialist is a professional who focuses on ensuring that artificial intelligence (AI) systems are developed and deployed in a way that aligns with ethical principles, considering factors like fairness, transparency, accountability, and privacy. They identify potential risks and mitigate them throughout the AI development process.

AI Product Manager is a professional responsible for overseeing the development and launch of products powered by artificial intelligence (AI), ensuring they meet both business objectives and customer needs. They collaborate with data scientists, engineers, and other stakeholders to define features and manage the product lifecycle throughout its development and deployment.

Business Intelligence Developer is a software engineer who designs, builds, and maintains systems that transform raw data into actionable insights for businesses, primarily by creating dashboards, reports, and data visualizations that support informed decision-making. In essence, they bridge the gap between raw data and meaningful business intelligence.

Computer Vision Engineer creates systems that enable computers to process and understand visual data. This includes developing algorithms that allow machines to identify and interpret images and videos.

Data Engineer is a software professional who builds and maintains an organization's data infrastructure. They ensure that data is accurate, accessible, and secure. Data engineers are vital to the success of an organization because they make large amounts of data usable and actionable.

Data Scientist works together with analysts and businesses to convert data insights into action. They make diagrams, graphs, and charts to represent trends and predictions. Data summarization helps stakeholders understand and implement results effectively.

Deep Learning Engineer is an AI expert who develops and maintains machine learning models using deep learning algorithms. Deep learning is a subset of machine learning that uses neural networks with many layers to analyze data.

Machine Learning Engineer is a professional who designs and builds AI systems that can learn and make predictions. They are part of a data science team and work with data scientists, data engineers, and others.

NLP Engineer designs algorithms and systems that allow computers to understand and respond to human language. NLP engineers work in many industries, including healthcare, finance, and retail.

Prompt Engineer is a professional who designs and optimizes prompts for AI models. Prompts are the instructions given to AI models, such as ChatGPT or Midjourney, to generate specific responses.

Robotics Engineer designs, builds, and maintains robots and robotic systems. They work in many industries, including manufacturing, healthcare, and aerospace.


History

Date Event
2025-07-04 GPT-5 model will be released
2025-02-18 T-M-L launched new website
2024-12-15 T-M-L was founded
2024-09-07 T-M-L registered domain name
2023-10-30 Biden signs Safe AI Executive Order
2023-08-01 DALL-E v3 model released
2023-03-14 GPT-4 model released
2022-11-30 ChatGPT model released
2022-04-01 DALL-E v2 model released
2021-01-01 DALL-E v1 model released
2020-06-01 GPT-3 model released
2019-02-14 GPT-2 model released
2018-06-11 GPT-1 model released

Papers

Date Document
2024-12-05 OpenAI o1 System Card
2024-11-28 Reward Hacking
2024-01-01 The Rise of Thinking Machines
2023-05-31 Let’s Verify Step by Step
2023-01-31 Scaling Laws for Learning
2022-03-21 Language & Coding Creativity
2021-07-14 Evaluating LLMs Trained on Code
2020-01-01 Deterrence of Thinking Machines
1983-01-01 Thoughts of Thinking Machines
