Technology Commentary

Technology Commentary

8218 bookmarks
Custom sorting
Traffic 101: Packets Mostly Flow - Slack Engineering
Traffic 101: Packets Mostly Flow - Slack Engineering
Slack handles billions of inbound network requests per day, all of which traverse through our edge network and ingress load balancing tiers. In this blog post, we’ll talk about how a request flows — from a Slack’s user perspective — across the vast ether of the network to reach AWS and then Slack’s internal …
·slack.engineering·
Traffic 101: Packets Mostly Flow - Slack Engineering
Let’s Just Kill The Advertising Industry
Let’s Just Kill The Advertising Industry
I was watching Redbeard‘s presentation from 28c3 summarizing the current context of corporate datatrawling and not for the first time it struck me just how w...
·humaniterations.net·
Let’s Just Kill The Advertising Industry
The Atari Ten Commandments
The Atari Ten Commandments
Back in 1986, a resurgent Atari was just hitting its stride. The 520ST was a hit and selling well. Rather than rest on their laurels (something they would do in later years), Atari announced the 1040ST in January 1986. After a short delay it became available in March and quickly became the biggest selling ST, at least in the US.
·goto10retro.com·
The Atari Ten Commandments
Let the internet run out
Let the internet run out
Sometimes I read the last post in my Mastodon feed, and pull to refresh, and there’s nothing new to see. Other times I’ll read everything in my RSS feeds, so there’s nothing in my queue. Then there are places like Twitter, Facebook, Instagram, Threads, Reddit, and many
·birchtree.me·
Let the internet run out
The Real Business Value of Platform Engineering
The Real Business Value of Platform Engineering
If your focus is solely on the developer experience, you’re missing half the point — and probably half the value — of platform engineering.
·thenewstack.io·
The Real Business Value of Platform Engineering
Generative AI and intellectual property — Benedict Evans
Generative AI and intellectual property — Benedict Evans
If you put all the world’s knowledge into an AI model and use it to make something new, who owns that and who gets paid? This is a completely new problem that we’ve been arguing about for 500 years.
·ben-evans.com·
Generative AI and intellectual property — Benedict Evans
Web UX: Study Guide
Web UX: Study Guide
Unsure where to start? Use this collection of links to our articles and videos to learn how users interact with the web and how to design effective web user experiences.
·nngroup.com·
Web UX: Study Guide
Even Friendlier SQL with DuckDB
Even Friendlier SQL with DuckDB
TLDR; DuckDB continues to push the boundaries of SQL syntax to both simplify queries and make more advanced analyses possible. Highlights include dynamic column selection, queries that start with the FROM clause, function chaining, and list comprehensions. We boldly go where no SQL engine has gone before! Who says that SQL should stay frozen in time, chained to a 1999 version of the specification? As a comparison, do folks remember what JavaScript felt like before Promises? Those didn’t launch until 2012! It’s clear that innovation at the programming syntax layer can have a profoundly positive impact on an entire language ecosystem. We believe there are many valid reasons for innovation in the SQL language, among them opportunities to simplify basic queries and also to make more dynamic analyses possible. Many of these features arose from community suggestions! Please let us know your SQL pain points on Discord or GitHub and join us as we change what it feels like to write SQL! If you have not had a chance to read the first installment in this series, please take a quick look here. The future is now The first few enhancements in this list were included in the “Ideas for the Future” section of the prior post. Reusable column aliases When working with incremental calculated expressions in a select statement, traditional SQL dialects force you to either write out the full expression for each column or create a common table expression (CTE) around each step of the calculation. Now, any column alias can be reused by subsequent columns within the same select statement. Not only that, but these aliases can be used in the where and order by clauses as well. Old way 1: Repeat yourself select 'These are the voyages of the starship Enterprise...' AS intro, instr('These are the voyages of the starship Enterprise...', 'starship') AS starship_loc substr('These are the voyages of the starship Enterprise...', instr('These are the voyages of the starship Enterprise...', 'starship') + len('starship') + 1) AS trimmed_intro; Old way 2: All the CTEs WITH intro_cte AS ( SELECT 'These are the voyages of the starship Enterprise...' AS intro ), starship_loc_cte AS ( SELECT intro, instr(intro, 'starship') AS starship_loc FROM intro_cte ) SELECT intro, starship_loc, substr(intro, starship_loc + len('starship') + 1) AS trimmed_intro FROM starship_loc_cte; New way SELECT 'These are the voyages of the starship Enterprise...' AS intro, instr(intro, 'starship') AS starship_loc, substr(intro, starship_loc + len('starship') + 1) AS trimmed_intro; intro starship_loc trimmed_intro These are the voyages of the starship Enterprise… 30 Enterprise… Dynamic column selection Databases typically prefer strictness in column definitions and flexibility in the number of rows. This can help by enforcing data types and recording column level metadata. However, in data science workflows and elsewhere, it is very common to dynamically generate columns (for example during feature engineering). No longer do you need to know all of your column names up front! DuckDB can select and even modify columns based on regular expression pattern matching, EXCLUDE or REPLACE modifiers, and even lambda functions (see the section on lambda functions below for details!). Let’s take a look at some facts gathered about the first season of Star Trek. Using DuckDB’s httpfs extension, we can query a csv dataset directly from GitHub. It has several columns so let’s DESCRIBE it. INSTALL httpfs; LOAD httpfs; CREATE TABLE trek_facts AS SELECT * FROM 'https://raw.githubusercontent.com/Alex-Monahan/example_datasets/main/Star_Trek-Season_1.csv'; DESCRIBE trek_facts; column_name column_type null key default extra season_num BIGINT YES NULL NULL NULL episode_num BIGINT YES NULL NULL NULL aired_date DATE YES NULL NULL NULL cnt_kirk_hookups BIGINT YES NULL NULL NULL cnt_downed_redshirts BIGINT YES NULL NULL NULL bool_aliens_almost_took_over_planet BIGINT YES NULL NULL NULL bool_aliens_almost_took_over_enterprise BIGINT YES NULL NULL NULL cnt_vulcan_nerve_pinch BIGINT YES NULL NULL NULL cnt_warp_speed_orders BIGINT YES NULL NULL NULL highest_warp_speed_issued BIGINT YES NULL NULL NULL bool_hand_phasers_fired BIGINT YES NULL NULL NULL bool_ship_phasers_fired BIGINT YES NULL NULL NULL bool_ship_photon_torpedos_fired BIGINT YES NULL NULL NULL cnt_transporter_pax BIGINT YES NULL NULL NULL cnt_damn_it_jim_quote BIGINT YES NULL NULL NULL cnt_im_givin_her_all_shes_got_quote BIGINT YES NULL NULL NULL cnt_highly_illogical_quote BIGINT YES NULL NULL NULL bool_enterprise_saved_the_day BIGINT YES NULL NULL NULL COLUMNS() with regular expressions The COLUMNS expression can accept a string parameter that is a regular expression and will return all column names that match the pattern. How did warp change over the first season? Let’s examine any column name that contains the word warp. SELECT episode_num, COLUMNS('.*warp.*') FROM trek_facts; episode_num cnt_warp_speed_orders highest_warp_speed_issued 0 1 1 1 0 0 2 1 1 3 1 0 … … … 27 1 1 28 0 0 29 2 8 The COLUMNS expression can also be wrapped by other functions to apply those functions to each selected column. Let’s simplify the above query to look at the maximum values across all episodes: SELECT MAX(COLUMNS('.*warp.*')) FROM trek_facts; max(trek_facts.cnt_warp_speed_orders) max(trek_facts.highest_warp_speed_issued) 5 8 We can also create a WHERE clause that applies across multiple columns. All columns must match the filter criteria, which is equivalent to combining them with AND. Which episodes had at least 2 warp speed orders and at least a warp speed level of 2? SELECT episode_num, COLUMNS('.*warp.*') FROM trek_facts WHERE COLUMNS('.*warp.*') = 2; -- cnt_warp_speed_orders = 2 -- AND -- highest_warp_speed_issued = 2 episode_num cnt_warp_speed_orders highest_warp_speed_issued 14 3 7 17 2 7 18 2 8 29 2 8 COLUMNS() with EXCLUDE and REPLACE Individual columns can also be either excluded or replaced prior to applying calculations on them. For example, since our dataset only includes season 1, we do not need to find the MAX of that column. It would be highly illogical. SELECT MAX(COLUMNS(* EXCLUDE season_num)) FROM trek_facts; max(trek_facts.episode_num) max(trek_facts.aired_date) max(trek_facts.cnt_kirk_hookups) … max(trek_facts.bool_enterprise_saved_the_day) 29 1967-04-13 2 … 1 The REPLACE syntax is also useful when applied to a dynamic set of columns. In this example, we want to convert the dates into timestamps prior to finding the maximum value in each column. Previously this would have required an entire subquery or CTE to pre-process just that single column! SELECT MAX(COLUMNS(* REPLACE aired_date::timestamp AS aired_date)) FROM trek_facts; max(trek_facts.season_num) max(trek_facts.episode_num) max(aired_date := CAST(aired_date AS TIMESTAMP)) … max(trek_facts.bool_enterprise_saved_the_day) 1 29 1967-04-13 00:00:00 … 1 COLUMNS() with lambda functions The most flexible way to query a dynamic set of columns is through a lambda function. This allows for any matching criteria to be applied to the names of the columns, not just regular expressions. See more details about lambda functions below. For example, if using the LIKE syntax is more comfortable, we can select columns matching a LIKE pattern rather than with a regular expression. SELECT episode_num, COLUMNS(col - col LIKE '%warp%') FROM trek_facts WHERE COLUMNS(col - col LIKE '%warp%') = 2; episode_num cnt_warp_speed_orders highest_warp_speed_issued 14 3 7 17 2 7 18 2 8 29 2 8 Automatic JSON to nested types conversion The first installment in the series mentioned JSON dot notation references as future work. However, the team has gone even further! Instead of referring to JSON-typed columns using dot notation, JSON can now be automatically parsed into DuckDB’s native types for significantly faster performance, compression, as well as that friendly dot notation! First, install and load the httpfs and json extensions if they don’t come bundled with the client you are using. Then query a remote JSON file directly as if it were a table! INSTALL httpfs; LOAD httpfs; INSTALL json; LOAD json; SELECT starfleet[10].model AS starship FROM 'https://raw.githubusercontent.com/vlad-saling/star-trek-ipsum/master/src/content/content.json'; starship USS Farragut - NCC-1647 - Ship on which James Kirk served as a phaser station operator. Attacked by the Dikironium Cloud Creature, killing half the crew. ad. Now for some new SQL capabilities beyond the ideas from the prior post! FROM first in SELECT statements When building a query, the first thing you need to know is where your data is coming FROM. Well then why is that the second clause in a SELECT statement?? No longer! DuckDB is building SQL as it should have always been - putting the FROM clause first! This addresses one of the longest standing complaints about SQL, and the DuckDB team implemented it in 2 days. FROM my_table SELECT my_column; Not only that, the SELECT statement can be completely removed and DuckDB will assume all columns should be SELECTed. Taking a look at a table is now as simple as: FROM my_table; -- SELECT * FROM my_table Other statements like COPY are simplified as well. COPY (FROM trek_facts) TO 'phaser_filled_facts.parquet'; This has an additional benefit beyond saving keystrokes and staying in a development flow state: autocomplete will have much more context when you begin to choose columns to query. Give the AI a helping hand! Note that this syntax is completely optional, so your SELECT * FROM keyboard shortcuts are safe, even if they are obsolete… 🙂 Function chaining Many SQL blogs advise the use of CTEs instead of subqueries. Among other benefits, they are much more readable. Operations are compartmentalized into discrete chunks and they can be read in order top to bottom instead of forcing the reader to work their way inside out. DuckDB enables the same interpretability improvement for every scalar function! Use the dot operator to c
·duckdb.org·
Even Friendlier SQL with DuckDB
Everything you wanted to know about GPG – but were scared to ask
Everything you wanted to know about GPG – but were scared to ask
Each year, around the New Year Holiday, I get to re-learn GPG in all its glory. I’ve used GPG for many years and have marveled at how well it works (when it does), yet how hard it is to get s…
·hypecycles.com·
Everything you wanted to know about GPG – but were scared to ask
WordPress’ 100 Year Plan: Putting A Price On Your Legacy
WordPress’ 100 Year Plan: Putting A Price On Your Legacy
The company behind WordPress decided to offer a service that covers your web hosting long after you die. Sticker shock aside, I don’t hate the idea.
·tedium.co·
WordPress’ 100 Year Plan: Putting A Price On Your Legacy
PDF Format History: Where Our Future History Lies
PDF Format History: Where Our Future History Lies
The story of the PDF, the portable document format that’s become one of the internet’s defining information formats. It’ll be with us after we’re long gone.
·tedium.co·
PDF Format History: Where Our Future History Lies
Documentation Is More than Your Thinnest Viable Platform
Documentation Is More than Your Thinnest Viable Platform
Documentation is not only the first step in your platform strategy, it's what can make platform engineering self-service and self-marketed.
·thenewstack.io·
Documentation Is More than Your Thinnest Viable Platform
A note to young folks: download the things you love
A note to young folks: download the things you love
Many years ago, my favorite podcast was The Totally Rad Show. It was a fun video podcast where the hosts reviewed movies, games, TV shows, and other general nerd stuff. The hosts were Jeff Cannata (who last year I called the podcasting GOAT), Dan Trachtenberg (who went on to direct
·birchtree.me·
A note to young folks: download the things you love
When tech says ‘no’ — Benedict Evans
When tech says ‘no’ — Benedict Evans
The tech industry always has a reason why any new laws or regulations are bad - indeed, so does any industry. They always say that! The trouble is, sometimes it’s true, and some laws are (or would be) disasters. So which is it? Well, there are three ways that people say ‘NO!’
·ben-evans.com·
When tech says ‘no’ — Benedict Evans
Incident Response: Three Ts to Rule Them All
Incident Response: Three Ts to Rule Them All
The best operations platforms can quickly resolve high-impact incidents and elevate continuous learning, in which teams are ahead of issues before they start.
·thenewstack.io·
Incident Response: Three Ts to Rule Them All