Board

2272 bookmarks
Context-Free Grammar Parsing With LLMs
Last week, I open-sourced a method to coerce LLMs into only generating a specific structure for a given regex pattern (ReLLM on Github). The library has proven extremely useful for a handful of tasks I’ve been doing with LLMs (everything from categorization to simple agent automation).
·blog.matt-rickard.com·
Sitting on the bench
There are many reasons to pick working for a bigger company in tech. The benefits, the pay, and, at least until recently, the job security. In many ways, it's hard to argue with the cold logic of taking a seat on a star destroyer, if you can land one. But odds are you'll be sitting on the bench if you do. That is, your talents won't ge...
·world.hey.com·
Google Cloud suffers major outage in Paris
Google Cloud customers in France were hit by one of the worst cloud outages in recent history, and it might have exposed a weakness in how it designs cloud regions.
"There's a saying in programming: make the easy things easy and the hard things possible," Pennarun said. "There's lots of products out there that make the hard things possible. But surprisingly, there's very few products that make the easy things easy."
That might be good advice for enterprise tech entrepreneurs: Tailscale has raised $150 million in funding since it was founded in 2019 and just hit the 100-employee milestone. The dirty secret of enterprise tech is that most companies don't need infrastructure that makes hard things possible to thrive on the internet; you are not Google.
·runtime.news·
Taylor Swift and Launch Cadence
From 2006-2019, Taylor Swift released albums every 2 years — around the industry standard (although extremely impressive to sustain for so long). But in 2019, Taylor started increasing the velocity. In 2019, she recorded one album. In 2020, during the pandemic, two albums. In 2021, two more. In 2022, another. And there’s one slated for release in July this year. So that’s 7 albums in 4 years. She’s not the only one with the skill and discipline to have a launch cadence like this.
·blog.matt-rickard.com·
LAION, The Pile, and more datasets
What's actually used to train these LLMs? A brief look at some of the datasets involved. LAION-5B: Stable Diffusion was trained on a dataset called LAION-5B ("Large-scale Artificial Intelligence Open Network"), which is comprised of 5.85 billion image-text pairs crawled from the internet. The actual crawled data comes from Common Crawl. Common Crawl: 3.15 billion pages contained in 380 TB. OpenAI's GPT-3 was, in part, trained by the data in Common Crawl. It is a non-profit founded by Gil Elbaz
·matt-rickard.com·
Google I/O and the Coming AI Battles
Google I/O suggests that AI is a sustaining innovation for all of Big Tech; that means the real battle will be between incumbents and Big Tech on one side, and open source on the other.
·stratechery.com·
Accel - 40 Years in Tech
Technology has transformed our world and paved the way for a bright future. Join us as we reflect on key moments of innovation since 1983.
·40-years.accel.com·
User-Defined Functions in Databases vs SaaS - what's the difference?
Co-authored by Carl Sverre from SingleStore. What is a User-Defined Function? A User-Defined Function (UDF) encapsulates business logic in such a way that it can be safely run within another service's infrastructure. Historically, general purpose UDFs...
When reaching for a UDF system as a tool to solve a problem, ask yourself this: “am I modifying my own system’s behaviour, or is someone else modifying my system?”. If the answer is that you want to modify the behaviour of your own system, a database UDF may be a good option. If you want to grant someone else the ability to modify how your software behaves, then a SaaS UDF is the way to go.
·blog.suborbital.dev·
To Understand Pants, Understand Bazel’s History
Unlike I think pretty much all other professionals that don’t have it within their power to create their own tools – if you work in sales and you want better sales tools, you have to find a software engineer to do it for you – but we are software engineers and the tools that we use are themselves made out of software. So we have it in our power to fix them.
[I] went to work at Foursquare, I quickly noticed that Foursquare had the exact same problem. They had this big Scala code base and it wasn’t scaling. The solution at the time – and I am not joking – was to give all of the engineers a stick of RAM, a screwdriver, and to say just upgrade your laptops.
I work on Earthly, another open-source build tool tackling similar problems. To Benjy, though, the important thing is not the potential for competition, it’s the size of the problem. I think an example of how much work there is to do in this space is the fact that Earthly and Pants are so different in their approaches, and yet both really fill in these needs.
There’s so much open space here to fill with good technology that two systems with radically different architectures and radically different approaches can both be very useful in their own right and also complementary.
·earthly.dev·
That underdog DNA
Jason just penned a beautiful, succinct ode to the underdogs. Go read it. It's funny how finding just the right word unlocks the perfect mental image. We've often thought of ourselves as being in the corner of the small business, but that was never quite right. There are many kinds of small businesses, not all of them thinking of thems...
·world.hey.com·
We stand with the Underdogs
What do they got? A big team, lots of money, a strong brand, seemingly unlimited resources, panache, reputation, all that. They’re established. They’re your competitors. You want to look away, but you see them everywhere. Their ads on your social, their name in the media, your dream clients on their website. But you know what else they...
·world.hey.com·
The AI Startup Litmus Test
Differentiation is critical for Generative AI startups. Use the AI Litmus Test to determine if your company is unique, hard and defensible.
·nfx.com·
The New AI Moats
“We Have No Moat, And Neither Does OpenAI,” a supposedly leaked document from Google, makes some interesting points. The competitive landscape shifts, and so do the moats. What is no longer a moat: Data is no longer a moat. For example, GPT-3 and Stable Diffusion were trained on public data sets by companies or groups with zero proprietary data. Now,
·blog.matt-rickard.com·
Second-level Thinking
Howard Marks, the founder of Oaktree Capital Management, makes the distinction between first-level thinking and second-level thinking. First-level thinking is superficial analysis — investors (or any other decision makers) making decisions on market sentiment, recent news, or stock price. His examples:
·blog.matt-rickard.com·
Self-hosted Compilers and Bootstrapped AI
The Go compiler was initially written in C but is now entirely written in Go. The Rust compiler is written in Rust (initially in OCaml). These compilers are capable of compiling themselves. Linux is compiled and developed on Linux. PyPy is a self-hosted Python interpreter.
·blog.matt-rickard.com·
llm.ts
There are over 100 different LLMs, with more shipping every day. They differ slightly in their architectures, and the data they were trained on, but all of them do text completion. It’s the APIs that are fragmented — OpenAI uses a “completions” endpoint with parameters like “
·blog.matt-rickard.com·
Unix Philosophy for AI
Text processing was the initial pitch for the development of Unix at Bell Labs (see An Oral History of Unix). It became more than that. Spell checkers in `ed` used the `sort` command. Then there was `AWK`, the text processing language used by the `awk` tool by Aho, Weinberger, and Kernighan. Then there were Unix pipes, the development that made the
·blog.matt-rickard.com·
The data view from AWS re:Invent
A look at Data & Analytics announcements out of AWS re:Invent in Nov/Dec-22.
Last year's focus was on adding serverless flavors of their analytical & streaming engines, plus on expanding their data marketplace features. However, AWS's push towards "serverless" was soon criticized as only delivering auto-scaling capabilities rather than truly lifting infrastructural burdens off the customer. If it neither has on-demand pricing nor turns off fully when idle, it is not serverless by most cloud observers' definition (which is also AWS's own definition from when they invented serverless functions with Lambda).
Beyond Redshift catching up to Snowflake on the separation of storage and compute, data sharing, and serverless with on-demand pricing, they have also caught up on data marketplace capabilities.
·hhhypergrowth.com·