Test Information Space

Found 83 bookmarks

ChatGPT - Hybrid Intelligence for TIS

#ChatGPT #Testing

·chatgpt.com·Mar 5, 2025

Grok Conversation / X

(Pulling TIS up by Grok, like its own knot per Fishbach 2022. After reviewing training, the chatbot eventually realizes it is part of the equation. Ends with a compressed prompt, though too big for a Quora question.)

#Grok #Testing

·x.com·Mar 5, 2025

A new experiment to help people explore more career possibilities

#Recruiting #Testing #Google #Labor

·blog.google·Feb 21, 2025

Researchers Replicate OpenAI's Hot New AI Tool in 24 Hours

(Incidentally, a bar or vinculum over a letter in Roman numerals is a multiplier of 1000.)

#Reasoning #Training #Testing #Large Language Models #Blog #Research #Questions and Answers

·futurism.com·Feb 10, 2025

Mem 2.0 Alpha Testing Guide - Mem

#Mem #Testing

·get.mem.ai·Jan 8, 2025

AI Models Are Getting Smarter. New Tests Are Racing to Catch Up

#Testing #Model #Evaluation

·time.com·Dec 25, 2024

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

#Large Language Models #Training #Testing #Paper #PDF

·arxiv.org·Dec 9, 2024

Advancing red teaming with people and AI | OpenAI

#OpenAI #Testing

·openai.com·Nov 22, 2024

Machine Learning Might Mean Less Chip Testing

#Testing #Hardware #Machine Learning

·spectrum.ieee.org·Nov 10, 2024

ChatGPT - Emerging Tech and Politics

#ChatGPT #Search #Testing

·chatgpt.com·Oct 31, 2024

ChatGPT - Critiquing Search Engines vs AI

#ChatGPT #Search #Evaluation #Testing

·chatgpt.com·Oct 31, 2024

ChatGPT - Best Planners for Month

(Showing a conversation thread rather than only the current message.)

#ChatGPT #Search #Testing

·chatgpt.com·Oct 31, 2024

How to try and use Grok 2 for free

https://lmarena.ai/

#Grok #Testing

·medium.com·Oct 16, 2024

Assume that a green block is atop a red block, and a yellow block is atop a separate blue block. The purple block is put on the yellow block. What is the yellow block under?

(Winograd.)

#OpenAI #Chatbot #Testing

·poe.com·Sep 19, 2024

Red Teaming o1 Part 2/2– Detecting Deception with Marius Hobbhahn of Apollo Research

#OpenAI #Testing #Large Language Models

·youtube.com·Sep 19, 2024

Red Teaming o1 Part 1/2–Automated Jailbreaking w/ Haize Labs' Leonard Tang, Aidan Ewart& Brian Huang

#OpenAI #Testing #Large Language Models

·youtube.com·Sep 19, 2024

Testing and mitigating elections-related risks \ Anthropic

#Anthropic #Elections #Testing

·anthropic.com·Jun 6, 2024

POE Gemini 1.5 Flash

(Ran an exercise in NotebookLM with Gemini 1.5 Pro using separate notes for each section, then ran it here with all sections together. NotebookLM can also be given a shelf of books and quizzed on the subject matter, possibly the project reports as well.)

#Gemini #Testing #NotebookLM #POE

·poe.com·May 15, 2024

Powerful New Chatbot Mysteriously Returns in the Middle of the Night

#Chatbot #Comparison #Testing

·gizmodo.com·May 7, 2024

Which other authors appear to be the most influential to the answerer in the QUORA CONTENT?

Was able to upload and interrogate the pair of HTML files (12.3 MB combined) downloaded from Quora content. The apps may eventually be more integrated; at least, chatbots have benefited elsewhere from editors or notebooks. It did, however, think the voice was more Gen Xer than Boomer. Incidentally, Google AI Studio also offers access to Gemini 1.5, but neither could parse YouTube links the way the Gemini Pro/Ultra site can.

#POE #Quora #Gemini #Testing

·poe.com·Apr 11, 2024

Convert content from the image to text.

Gemini 1.5 Pro OCR did well on both a printed-label test and a handwriting test with lower- and upper-case lines. It balked if the part with the manufacturer was included.

#Gemini #POE #OCR #Testing

·poe.com·Apr 10, 2024

Why it's impossible to review AIs, and why TechCrunch is doing it anyway | TechCrunch

#AI #Testing #Review

·techcrunch.com·Mar 23, 2024

PlanRunner2024 - Poe

Prior Prompt Set: Classify ideas in the text and outline a blog post on planning methods. (Upload Rocketbook PDF with 10 pages.) What are the various types of essay or blog writing styles to cover this kind of topic? What if the topic were the use of new chatbots for planning, in contrast to the old ways of working directly from spreadsheets, apps, schedulers, notebooks, journals, or index cards? What is the upshot, e.g. a prompt list to template the format and set up the content? Detail a contextual prompt as a style guide to use for a customized chatbot that specializes in daily and weekly planning. The goal is to allow the user to engage with the chatbot to develop their plans on a continuous basis. What is a good handle name for this chatbot? What is a good greeting message for such a chatbot to cue the user?

#POE #Testing #Chatbot #Gemini

·poe.com·Mar 11, 2024

Alex on X: "Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval. For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of… https://t.co/m7wWhhu6Fg" / X

#Testing #Claude

·twitter.com·Mar 5, 2024