Data Laced with History: Causal Trees & Operational CRDTs
After mulling over my bullet points, it occurred to me that the network problems I was dealing with—background cloud sync, editing across multiple devices, real-time collaboration, offline support, and reconciliation of distant or conflicting revisions—were all pointing to the same question: was it possible to design a system where any two revisions of the same document could be merged deterministically and sensibly without requiring user intervention?
It’s what happened after sync that was troubling. On encountering a merge conflict, you’d be thrown into a busy conversation between the network, model, persistence, and UI layers just to get back into a consistent state. The data couldn’t be left alone to live its peaceful, functional life: every concurrent edit immediately became a cross-architectural matter.
I kept several questions in mind while doing my analysis. Could a given technique be generalized to arbitrary and novel data types? Did the technique pass the PhD Test? And was it possible to use the technique in an architecture with smart clients and dumb servers?
Concurrent edits are sibling branches. Subtrees are runs of characters. By the nature of reverse timestamp+UUID sort, sibling subtrees are sorted in the order of their head operations.
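The sibling ordering described above can be sketched as a sort key. This is a minimal illustration, not the article's implementation: the `OpId` type, its field names, and the ascending direction of the UUID tie-break are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OpId:
    """Hypothetical identifier of the head operation of a subtree."""
    timestamp: int  # logical timestamp (assumed to be an integer counter)
    site: str       # UUID of the originating device, kept as a string


def sort_siblings(heads: List[OpId]) -> List[OpId]:
    # Reverse timestamp order: the newest head operation comes first.
    # Ties between concurrent edits are broken by site UUID
    # (ascending here, which is an assumption).
    return sorted(heads, key=lambda op: (-op.timestamp, op.site))


heads = [OpId(3, "b"), OpId(5, "a"), OpId(3, "a")]
print(sort_siblings(heads))
```

Because the key depends only on data stored in each operation, every peer that holds the same set of siblings arrives at the same order.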
This is the underlying premise of the Causal Tree. In contrast to all the other CRDTs I’d been looking into, the design presented in Victor Grishchenko’s brilliant paper was simultaneously clean, performant, and consequential. Instead of dense layers of theory and labyrinthine data structures, everything was centered around the idea of atomic, immutable, metadata-tagged, and causally-linked operations, stored in low-level data structures and directly usable as the data they represented.
I’m going to be calling this new breed of CRDTs operational replicated data types, partly to avoid confusion with the existing term “operation-based CRDTs” (or CmRDTs), and partly because “replicated data type” (RDT) seems to be gaining popularity over “CRDT” and the term can be expanded to “ORDT” without impinging on any existing terminology.
Much like Causal Trees, ORDTs are assembled out of atomic, immutable, uniquely-identified and timestamped “operations” which are arranged in a basic container structure. (For clarity, I’m going to be referring to this container as the structured log of the ORDT.) Each operation represents an atomic change to the data while simultaneously functioning as the unit of data resultant from that action. This crucial event–data duality means that an ORDT can be understood as either a conventional data structure in which each unit of data has been augmented with event metadata; or alternatively, as an event log of atomic actions ordered to resemble its output data structure for ease of execution.
To implement a custom data type as a CT, you first have to “atomize” it, or decompose it into a set of basic operations, then figure out how to link those operations such that a mostly linear traversal of the CT will produce your output data. (In other words, make the structure analogous to a one- or two-pass parsable format.)
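As a rough sketch of what "atomizing" a string might look like, the example below treats each character as an operation that names the operation it was typed after, then rebuilds the text with a single depth-first traversal. The `Atom` type, its fields, the empty root atom, and the tie-break rule are all assumptions made for illustration; deletion and the flat array storage used by real Causal Trees are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

AtomId = Tuple[int, str]  # (timestamp, site UUID); a hypothetical layout


@dataclass(frozen=True)
class Atom:
    timestamp: int
    site: str
    cause: Optional[AtomId]  # id of the atom this one follows; None for the root
    value: str               # the character this operation inserts

    @property
    def id(self) -> AtomId:
        return (self.timestamp, self.site)


def render(atoms: List[Atom]) -> str:
    # Group atoms under their causing atom, then order siblings newest first.
    children: Dict[Optional[AtomId], List[Atom]] = defaultdict(list)
    for atom in atoms:
        children[atom.cause].append(atom)
    for siblings in children.values():
        siblings.sort(key=lambda a: (-a.timestamp, a.site))

    # Iterative depth-first traversal: each subtree is a run of characters.
    out: List[str] = []
    stack = list(reversed(children[None]))
    while stack:
        atom = stack.pop()
        out.append(atom.value)
        stack.extend(reversed(children[atom.id]))
    return "".join(out)


root = Atom(0, "A", None, "")
c = Atom(1, "A", root.id, "C")
# Site A types "MD" after "C" while site B concurrently types "TRL" after "C".
site_a = [Atom(2, "A", c.id, "M"), Atom(3, "A", (2, "A"), "D")]
site_b = [Atom(2, "B", c.id, "T"), Atom(3, "B", (2, "B"), "R"), Atom(4, "B", (3, "B"), "L")]
print(render([root, c] + site_a + site_b))
```

The result does not depend on the order in which the atoms arrive, since the output is determined entirely by the causal links and the sibling sort.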
OT and CRDT papers often cite 50ms as the threshold at which people start to notice latency in their text editors. Therefore, any code we might want to run on a CT (including merge, initialization, and serialization/deserialization) has to fall within this range. Except for trivial cases, this precludes O(n²) or slower complexity: a 10,000 word article at 0.01ms per character would take 7 hours to process! The essential CT functions have to be O(n log n) at the very worst.
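The seven-hour figure can be checked with a few lines of arithmetic. The excerpt gives the word count and the per-character cost; the average of five characters per word is an assumption added here to turn words into characters.

```python
def quadratic_hours(words: int, chars_per_word: int = 5, ms_per_step: float = 0.01) -> float:
    """Hours needed for an O(n^2) pass, where n is the character count."""
    n = words * chars_per_word          # assumed average word length
    total_ms = n ** 2 * ms_per_step     # n^2 steps at ms_per_step each
    return total_ms / 1000 / 3600       # milliseconds -> seconds -> hours


print(round(quadratic_hours(10_000), 1))
```

With these assumptions the quadratic pass comes out at a little under seven hours, in line with the quoted estimate.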
Of course, CRDTs aren’t without their difficulties. For instance, a CRDT-based document will always be “live”, even when offline. If a user inadvertently revises the same CRDT-based document on two offline devices, they won’t see the familiar pick-a-revision dialog on reconnection: both documents will happily merge and retain any duplicate changes. (With ORDTs, this can be fixed after the fact by filtering changes by device, but the user will still have to learn to treat their documents with a bit more caution.) In fully decentralized contexts, malicious users will have a lot of power to irrevocably screw up the data without any possibility of a rollback, and encryption schemes, permission models, and custom protocols may have to be deployed to guard against this. In terms of performance and storage, CRDTs contain a lot of metadata and require smart and performant peers, whereas centralized architectures are inherently more resource-efficient and only demand the bare minimum of their clients. You’d be hard-pressed to use CRDTs in data-heavy scenarios such as screen sharing or video editing. You also won’t necessarily be able to layer them on top of existing infrastructure without significant refactoring.
Perhaps a CRDT-based text editor will never quite be as fast or as bandwidth-efficient as Google Docs, for such is the power of centralization. But in exchange for a totally decentralized computing future? A world full of devices that control their own data and freely collaborate with one another? Data-centric code that’s entirely free from network concerns? I’d say: it’s surely worth a shot!
·archagon.net·
How to Make a Great Government Website—Asterisk
Summary: Dave Guarino, who has worked extensively on improving government benefits programs like SNAP in California, discusses the challenges and opportunities in civic technology. He explains how a simplified online application, GetCalFresh.org, was designed to address barriers that prevent eligible people from accessing SNAP benefits, such as a complex application process, required interviews, and document submission. Guarino argues that while technology alone cannot solve institutional problems, it provides valuable tools for measuring and mitigating administrative burdens. He sees promise in using large language models to help navigate complex policy rules. Guarino also reflects on California's ambitious approach to benefits policy and the structural challenges, like Prop 13 property tax limits, that impact the state's ability to build up implementation capacity.
there are three big categories of barriers. The application barrier, the interview barrier, and the document barrier. And that’s what we spent most of our time iterating on and building a system that could slowly learn about those barriers and then intervene against them.
The application is asking, “Are you convicted of this? Are you convicted of that? Are you convicted of this other thing?” What is that saying to you, as a person, about what the system thinks of you?
Often they’ll call from a blocked number. They’ll send you a notice of when your interview is scheduled for, but this notice will sometimes arrive after the actual date of the interview. Most state agencies are really slammed right now for a bunch of reasons, including Medicaid unwinding. And many of the people assisting on Medicaid are the same workers who process SNAP applications. If you missed your phone interview, you have to call to reschedule it. But in many states, you can’t get through, or you have to call over and over and over again. For a lot of people, if they don’t catch that first interview call, they’re screwed and they’re not going to be approved.
getting to your point about how a website can fix this: the end result was the lowest-burden application form that actually gets a caseworker what they need to efficiently and effectively process it. We did a lot of iteration to figure out that sweet spot.
We didn’t need to do some hard system integration that would potentially take years to develop — we were just using the system as it existed. Another big advantage was that we had to do a lot of built-in data validation because we could not submit anything that was going to fail the county application. We discovered some weird edge cases by doing this.
A lot of times when you want to build a new front end for these programs, it becomes this multiyear, massive project where you’re replacing everything all at once. But if you think about it, there’s a lot of potential in just taking the interfaces you have today, building better ones on top of them, and then using those existing ones as the point of integration.
Government tends to take a more high-modernist approach to the software it builds, which is like “we’re going to plan and know up front how everything is, and that way we’re never going to have to make changes.” In terms of accreting layers — yes, you can get to that point. But I think a lot of the arguments I hear that call for a fundamental transformation suffer from the same high-modernist thinking that is the source of much of the status quo.
If you slowly do this kind of stuff, you can build resilient and durable interventions in the system without knocking it over wholesale. For example, I mentioned procedural denials. It would be adding regulations, it would be making technology systems changes, blah, blah, blah, to have every state report why people are denied, at what rate, across every state up to the federal government. It would take years to do that, but that would be a really, really powerful change in terms of guiding feedback loops that the program has.
Guarino argues that attempts to fundamentally transform government technology often suffer from the same "high-modernist" thinking that created problematic legacy systems in the first place. He advocates for incremental improvements that provide better measurement and feedback loops.
when you start to read about civic technology, it very, very quickly becomes clear that things that look like they are tech problems are actually about institutional culture, or about policy, or about regulatory requirements.
If you have an application where you think people are struggling, you can measure how much time people take on each page. A lot of what technology provides is more rigorous measurement of the burdens themselves. A lot of these technologies have been developed in commercial software because there’s such a massive incentive to get people who start a transaction to finish it. But we can transplant a lot of those into government services and have orders of magnitude better situational awareness.
There’s this starting point thesis: Tech can solve these government problems, right? There’s healthcare.gov and the call to bring techies into government, blah, blah, blah. Then there’s the antithesis, where all these people say, well, no, it’s institutional problems. It’s legal problems. It’s political problems. I think either is sort of an extreme distortion of reality. I see a lot of more oblique levers that technology can pull in this area.
LLMs seem to be a fundamental breakthrough in manipulating words, and at the end of the day, a lot of government is words. I’ve been doing some active experimentation with this because I find it very promising. One common question people have is, “Who’s in my household for the purposes of SNAP?” That’s actually really complicated when you think about people who are living in poverty — they might be staying with a neighbor some of the time, or have roommates but don’t share food, or had to move back home because they lost their job.
I’ve been taking verbatim posts from Reddit that are related to the household question and inputting them into LLMs with some custom prompts that I’ve been iterating on, as well as with the full verbatim federal regulations about household definition. And these models do seem pretty capable at doing some base-level reasoning over complex, convoluted policy words in a way that I think could be really promising.
caseworkers are spending a lot of their time figuring out, wait, what rule in this 200-page policy manual is actually relevant in this specific circumstance? I think LLMs are going to be really impactful there.
It is certainly the case that I’ve seen some productive tensions in counties where there’s more of a mix of that and what you might consider California-style Republicans who are like, “We want to run this like a business, we want to be efficient.” That tension between efficiency and big, ambitious policies can be a healthy, productive one. I don’t know to what extent that exists at the state level, and I think there’s hints of more of an interest in focusing on state-level government working better and getting those fundamentals right, and then doing the more ambitious things on a more steady foundation.
California seemed to really try to take every ambitious option that the feds give us on a whole lot of fronts. I think the corollary of that is that we don’t necessarily get the fundamental operational execution of these programs to a strong place, and we then go and start adding tons and tons of additional complexity on top of them.
·asteriskmag.com·
Why does every job feel like someone is just passing the buck? : r/ExperiencedDevs
The last three jobs I've held in the last 5 years have all felt like someone just handing me the keys to a sinking boat before they jump off. Every job is sold as having at least some greenfield development where you can "own" the domain and "lead" the direction of the project, but once you accept the offer and get on-boarded, you realize that the system is so brittle that any change will completely break and cause incidents, and there is a year's worth of backlog issues to address with duct tape and glue before you could even consider fixing the fundamental problems.
Often the teams that built these systems are long gone, so there is nobody to ask for help when you're learning the rough edges, you're just on your own. The technology decisions are all completely set in stone because we could never justify the risk of making changes. There is so much tech debt and maintenance work, we don't really have time to do any new development with the current staffing levels. The job then becomes dominated by on-call responsibilities and fire-fighting. It's 90% toil, and almost zero actual system design and development work.
Being responsible for a whole system that you didn't build, that you know is brittle and broken, but which you cannot fix, is incredibly stressful. It's almost a hopeless situation.
·reddit.com·
What I learned getting acquired by Google
While there were undoubtedly people who came in for the food, worked 3 hours a day, and enjoyed their early retirements, all the people I met were earnest, hard-working, and wanted to do great work. What beat them down were the gauntlet of reviews, the frequent re-orgs, the institutional scar tissue from past failures, and the complexity of doing even simple things on the world stage. Startups can afford to ignore many concerns, Googlers rarely can. What also got in the way were the people themselves - all the smart people who could argue against anything but not for something, all the leaders who lacked the courage to speak the uncomfortable truth, and all the people that were hired without a clear project to work on, but must still be retained through promotion-worthy made-up work.
Another blocker to progress that I saw up close was the imbalance of a top heavy team. A team with multiple successful co-founders and 10-20 year Google veterans might sound like a recipe for great things, but it’s also a recipe for gridlock. This structure might work if there are multiple areas to explore, clear goals, and strong autonomy to pursue those paths.
Good teams regularly pay down debt by cleaning things up on quieter days. Just as real is process debt. A review added because of a launch gone wrong. A new legal check to guard against possible litigation. A section added to a document template. Layers accumulate over the years until you end up unable to release a new feature for months after it's ready because it's stuck between reviews, with an unclear path out.
·shreyans.org·
Interview with Kevin Kelly, editor, author, and futurist
To write about something hard to explain, write a detailed letter to a friend about why it is so hard to explain, and then remove the initial “Dear Friend” part and you’ll have a great first draft.
To be interesting just tell your story with uncommon honesty.
Most articles and stories are improved significantly if you delete the first page of the manuscript draft. Immediately start with the action.
Each technology cannot stand alone. It takes a saw to make a hammer and it takes a hammer to make a saw. And it takes both tools to make a computer, and in today’s factory it takes a computer to make saws and hammers. This co-dependency creates an ecosystem of highly interdependent technologies that support each other.
On the other hand, I see this technium as an extension of the same self-organizing system responsible for the evolution of life on this planet. The technium is evolution accelerated. A lot of the same dynamics that propel evolution are also at work in the technium
Our technologies are ultimately not contrary to life, but are in fact an extension of life, enabling it to develop yet more options and possibilities at a faster rate. Increasing options and possibilities is also known as progress, so in the end, what the technium brings us humans is progress.
Libraries, journals, communication networks, and the accumulation of other technologies help create the next idea, beyond the efforts of a single individual
We also see near-identical parallel inventions of tricky contraptions like slingshots and blowguns. However, because it was so ancient, we don’t have a lot of data for this behavior. What we would really like is to have a N=100 study of hundreds of other technological civilizations in our galaxy. From that analysis we’d be able to measure, outline, and predict the development of technologies. That is a key reason to seek extraterrestrial life.
When information is processed in a computer, it is being ceaselessly replicated and re-copied while it computes. Information wants to be copied. Therefore, when certain people get upset about the ubiquitous copying happening in the technium, their misguided impulse is to stop the copies. They want to stamp out rampant copying in the name of “copy protection,” whether it be music, science journals, or art for AI training. But the emergent behavior of the technium is to copy promiscuously. To ban, outlaw, or impede the superconductivity of copies is to work against the grain of the system.
the worry of some environmentalists is that technology can only contribute more to the problem and none to the solution. They believe that tech is incapable of being green because it is the source of relentless consumerism at the expense of diminishing nature, and that our technological civilization requires endless growth to keep the system going. I disagree.
Over time evolution arranges the same number of atoms in more complex patterns to yield more complex organisms, for instance producing an agile lemur the same size and weight as a jelly fish. We seek the same shift in the technium. Standard economic growth aims to get consumers to drink more wine. Type 2 growth aims to get them to not drink more wine, but better wine.
[[An optimistic view of capitalism]]
to measure (and thus increase) productivity we count up the number of refrigerators manufactured and sold each year. More is generally better. But this counting tends to overlook the fact that refrigerators have gotten better over time. In addition to making cold, they now dispense ice cubes, or self-defrost, and use less energy. And they may cost less in real dollars. This betterment is truly real value, but is not accounted for in the “more” column
it is imperative that we figure out how to shift more of our type 1 growth to type 2 growth, because we won’t be able to keep expanding the usual “more.” We will have to perfect a system that can keep improving and getting better with fewer customers each year, smaller markets and audiences, and fewer workers. That is a huge shift from the past few centuries where every year there has been more of everything.
“degrowthers” are correct in that there are limits to bulk growth — and running out of humans may be one of them. But they don’t seem to understand that evolutionary growth, which includes the expansion of intangibles such as freedom, wisdom, and complexity, doesn’t have similar limits. We can always figure out a way to improve things, even without using more stuff — especially without using more stuff!
the technium is not inherently contrary to nature; it is inherently derived from evolution and thus inherently capable of being compatible with nature. We can choose to create versions of the technium that are aligned with the natural world.
Social media can transmit false information at great range at great speed. But compared to what? Social media's influence on elections from transmitting false information was far less than the influence of the existing media of cable news and talk radio, where false information was rampant. Did anyone seriously suggest we should regulate what cable news hosts or call-in radio listeners could say? Bullying middle schoolers on social media? Compared to what? Does it even register when compared to the bullying done in school hallways? Radicalization on YouTube? Compared to talk radio? To googling?
Kids are inherently obsessive about new things, and can become deeply infatuated with stuff that they outgrow and abandon a few years later. So the fact they may be infatuated with social media right now should not in itself be alarming. Yes, we should indeed understand how it affects children and how to enhance its benefits, but it is dangerous to construct national policies for a technology based on the behavior of children using it.
Since it is the same technology, inspecting how it is used in other parts of the world would help us isolate what is being caused by the technology and what is being caused by the peculiar culture of the US.
You don’t notice what difference you make because of the platform's humongous billions-scale. In aggregate your choices make a difference which direction it — or any technology — goes. People prefer to watch things on demand, so little by little, we have steered the technology to let us binge watch. Streaming happened without much regulation or even enthusiasm of the media companies. Street usage is the fastest and most direct way to steer tech.
Vibrators instead of the cacophony of ringing bells on cell phones is one example of a marketplace technological solution
The long-term effects of AI will affect our society to a greater degree than electricity and fire, but its full effects will take centuries to play out. That means that we’ll be arguing, discussing, and wrangling with the changes brought about by AI for the next 10 decades. Because AI operates so close to our own inner self and identity, we are headed into a century-long identity crisis.
What we tend to call AI will not be considered AI years from now
What we are discovering is that many of the cognitive tasks we have been doing as humans are dumber than they seem. Playing chess was more mechanical than we thought. Playing the game Go is more mechanical than we thought. Painting a picture and being creative was more mechanical than we thought. And even writing a paragraph with words turns out to be more mechanical than we thought
out of the perhaps a dozen cognitive modes operating in our minds, we have managed to synthesize two of them: perception and pattern matching. Everything we’ve seen so far in AI is because we can produce those two modes. We have not made any real progress in synthesizing symbolic logic and deductive reasoning and other modes of thinking
we are slowly realizing we still have NO IDEA how our own intelligences really work, or even what intelligence is. A major byproduct of AI is that it will tell us more about our minds than centuries of psychology and neuroscience have
There is no monolithic AI. Instead there will be thousands of species of AIs, each engineered to optimize different ways of thinking, doing different jobs
Now from the get-go we assume there will be significant costs and harms of anything new, which was not the norm in my parents' generation
The astronomical volume of money and greed flowing through this frontier overwhelmed and disguised whatever value it may have had
The sweet elegance of blockchain enables decentralization, which is a perpetually powerful force. This tech just has to be matched up to the tasks — currently not visible — where it is worth paying the huge cost that decentralization entails. That is a big ask, but taking the long-view, this moment may not be a failure
My generic career advice for young people is that if at all possible, you should aim to work on something that no one has a word for. Spend your energies where we don’t have a name for what you are doing, where it takes a while to explain to your mother what it is you do. When you are ahead of language, that means you are in a spot where it is more likely you are working on things that only you can do. It also means you won’t have much competition.
Your 20s are the perfect time to do a few things that are unusual, weird, bold, risky, unexplainable, crazy, unprofitable, and look nothing like “success.” The less this time looks like success, the better it will be as a foundation
·noahpinion.substack.com·