Next Newsletter

Some Unexpected Things About Subsetting | Credibly Curious

#done

·njtierney.com·Mar 16, 2023

Population Pyramid Plots in `ggplot2` | Credibly Curious

·njtierney.com·Mar 16, 2023

Evidence to prioritise pop-up cycleways | Robin Lovelace

I gave a talk today at the ‘Ideas with Beers’ seminar series, hosted each Tuesday for the past several months by Brian Deegan, an experienced transport infrastructure engineer and Principal Design Engineer at Urban Movement.

#done

·robinlovelace.net·Mar 16, 2023

Programming notes

·jsta.github.io·Mar 16, 2023

Install complexity as a de-motivator for reproducibility - Jemma Stachelek

·jsta.rbind.io·Mar 16, 2023

Loading packages: the difference between R’s library() and require() functions – Tim Farewell

·timfarewell.co.uk·Mar 15, 2023

My easy R script header template – Tim Farewell

·timfarewell.co.uk·Mar 15, 2023

Notes on Jeremy Howard's videos · TomAF

This weekend I got round to watching lessons 4 and 5 of the “Introduction to Machine Learning for Coders” course by fast....

·afiodorov.github.io·Mar 14, 2023

Ten Techniques Learned From fast.ai

Right now, Jeremy Howard – the co-founder of fast.ai – holds the 105th highest score for the plant seedling classification contest on Kaggle, but he's dropping fast. Why? His own students are beating him. And their names can now be found across the tops of leaderboards all over Kaggle.

·blog.floydhub.com·Mar 14, 2023

Responsible Innovation: The Next Wave of Design Thinking

Building your moral imagination to create a more ethical future

·medium.com·Mar 14, 2023

A crash course in AI terms (machine learning, diffusion models) in 6 minutes or less

Understanding a few terms helps us understand and use AI better.

·mythicalai.substack.com·Mar 13, 2023

neuroplausible: I Hate Matlab: How an IDE, a Language, and a Mentality Harm

This blog post is inspired by a few Matlab-related tweets of mine, which turned into days-long discussions with fellow science and non-science tweeps. Those tweets of mine in turn are motivated by two main things: my desire for programming in psychology, neuroscience, and science in general to be taught and taught well, and my desire for students to learn transferable skills more generally. This blog post is premised on a number of themes which came up on Twitter. The great need for scientists to be able to code. The fact that Matlab is akin to bad training wheels on a bicycle, which never aid with learning to ride, but are used over again because they are better than walking. And the idea that while there is a best tool for every job, not every tool is best for any job. The discussion on Twitter was motivating and so I promised everybody I would write up what I think. So this blog post is about how I think teaching Matlab, the whole ecosystem not just the language, within psychology harms students more than it helps them in many cases in my experience. To clarify, Matlab used to be the best tool for many things. Before things like the NumPy/Matplotlib/Jupyter trilogy, it was probably the only tool that had “everything”. When Matlab first came out, the alternative was Fortran (which has goto statements, if you don’t know why this is scary, never mind, you’re lucky). But I believe it is now more a cause of brain-rot than mind-expanding awesomeness (please do not watch Arrival just to get this Sapir-Whorf reference). It is now more user- and science-jail than a freeing experience that allows us to make prototypes fast (it of course still does the latter). The Matlab logo is a visually appealing render of an eigenfunction of the wave equation. If you are a proficient coder and love Matlab, then this blog post is not really for you. Importantly, my intended audience are those who wish to see an improvement in the teaching of programming within psychology. 
I am talking from the perspective of my experiences within my field: psychology and cognitive science. I have designed two things from scratch: a course that I taught while working as a postdoc at Oxford, and a workshop I ran as a PhD student. Both aimed to teach psychology students the principles of coding before diving into Python. I also want people in science to have dependable, transferable skills, to be able to move to other languages, and to have as much fun as possible while learning. Because of my training, I am privileged enough to be able to pick up a new language in a couple of hours. I want others to have such opportunities too. This is not only because it is useful for science as an endeavour to have skilled researchers, but also for us as individuals: anyone who emerges from their degree as a coder will have more opportunities, both within and outside science. To reiterate my titular claim: the way we teach Matlab in psychology appears to be more harmful than helpful. I would like us to move beyond Matlab because the ecosystem it provides is a dangerous attractor, into which many of my peers and my students are involuntarily sucked. In this post I will outline the main reasons why the Matlab ecosystem and language deserve the provocative description above. I intend to use “Matlab” to mean the whole ecosystem: the IDE, the language, and the mentality it brings about, because I think they are inseparable. In the same way that “C programmers [allocate] their own damn memory, probably right after building their own computer out of rocks and twigs”, Matlab coders within psychology also have and create a culture around them, aided by the IDE and the pre-existing community they have joined.
Limited Skill Transfer
Firstly, Matlab is not sufficient to provide us with a transferable programming skillset.
Matlab provides a programming environment in which nothing, at least superficially, seems hard, and thus nothing meaningful about coding itself is learned. We do not need to worry about namespaces, or even much about functions. And we do not need to learn anything too complex to get some OK-looking figures. This is great for prototyping: we can produce something that works well enough impressively quickly. But this comes at a huge cost to us as newbie coders. We have not learned any of the important skills that would enable us to pick up another language. And we will undeniably need to pick up other languages, because that is the state psychology is in; e.g., R is becoming the standard for statistical analyses. Yet we have just learned a language that does not help us do that, since it did not push us to learn the basics that other languages have at their core. IDEs are extremely useful if you are already a proficient coder. However, for beginners they can act more like bad training wheels on a bicycle, hindering deeper learning. To put this another way, people learning to drive do not tend to learn on an automatic gearbox. They learn on a manual gearbox, and it is tough. Learning the harder of the two types, manual, allows us to transfer easily to the easier of the two if need be. Most Americans, by contrast, learn to drive an automatic and almost never learn manual, so their skills do not transfer easily. Although the metaphor is simplistic, it suffices to explain why Matlab is not the best language to learn: it is a car with an automatic gearbox. We cannot easily transfer what we have learned to driving stick. In fact, automatic-only licences exist in my home country and the UK. If you learn only automatic, you cannot be expected to know stick, whereas if you learn manual transmission you know “everything”.
Furthermore, I posit that Matlab knowledge can make it harder for us to shift to another language than having no programming knowledge at all. Matlab has an IDE with GUI functionality that lets us edit variables interactively, as in Excel, which we know causes demonstrable problems. It leads some of our students to think that the Matlab IDE is what programming is, in much the same way some of our students think SPSS is what statistics is. Moreover, heavy dependence on manually editing things is extremely bad, because our workflow will be neither reproducible nor replicable. In addition, all the bells and whistles of the IDE and the GUI never force us to think about variables deeply, since we can always visualise them. Keeping a mental model of what the code is doing, writing down what the code should be doing, imagining the data structures, and so on, is a skill one needs to develop. More than once I have been asked to help people who were editing their variables in the GUI and hence understood neither their own code nor how to debug it. This is not their fault, but had they learned to code without the GUI, they would never have picked up such terrible habits. They had not learned exactly what a loop was, and a lot of other helper scripts worked just fine, so they had no feedback that editing in the GUI is maladaptive per se. In most other languages, there is no GUI and no IDE with the language baked in. As a result, many of us use Matlab by just pressing buttons and hoping something useful will come out the other end. Shocking though it may seem, many of you have confirmed over chat and Twitter that this is what we and our students do. When we move on, the GUI and IDE crutches will be snatched away from us and we will have to learn to code all over again. We would never have needed to do this had we learned using a manual gearbox, i.e., not Matlab.
Matlab puts a ceiling on the kinds of projects we can do, both in size and in scope. Optimising for hardware, needing to lower space and time complexity, and wanting something very specific like web scraping are all tougher within Matlab. This is because Matlab is more a domain-specific than a domain-general language, it is centrally controlled, and the GUI and IDE cannot cope with large projects easily. There is a command-line mode, but we will be largely uncomfortable with it given that we only know Matlab. To further underline my point, Matlab explicitly teaches us some very unorthodox programming principles. Some “features” do not exist in (m)any other languages, and certainly not in any we will likely want to learn in the near future (Python, C/C++, R, Julia, even LaTeX). For example, we are not allowed to have more than a single externally accessible function per file, and that file must have the same name as the function we wish to access. In essence, this means we cannot have more than one public function per file if we are, e.g., trying to code up a library in a clear way. Matlab does not permit us to store all our global variables in one file, e.g., if we need constant values. Due to all this, Matlab promotes spaghetti code. This adds to why many of us feel embarrassed to share our code online: we never learned to write neat code, because Matlab allows us to be quick and dirty without any repercussions. Perhaps most flagrantly, arrays in Matlab start at 1 (as they also do in R and Julia, though not in Python or C/C++). One has no idea how maladaptive this is until one moves outside Matlab. Computer science starts from zero for a reason. If we want to learn generalisable skills, learning that indexing starts at 1 will hinder us. It may even cause us to introduce very nasty, hard-to-find bugs when we move outside the Matlab ecosystem. All of this together leaves us more confused by new languages, as the baggage we carry from learning Matlab needs to be actively unlearned and inhibited.
Closed Source Means Closed Science
Secondly, Matlab is closed source, proprietary, and prohibitively expensive if you have to buy it yourself. They obfuscate their source code in many cases, meaning bugs are much harder to spot and impossible to edit ourselves without r

·neuroplausible.com·Mar 12, 2023
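
The post's point about 1-based indexing is easiest to see when porting code. The hypothetical Python sketch below is not from the post, and its function names are made up for illustration. It shows two common porting mistakes. The first is a loop over MATLAB's `1:numel(x)`, which fails loudly. The second is a slice translated from `x(2:end)`, which silently drops an element.

```python
# Hypothetical porting example: MATLAB indexes from 1, Python from 0.
# MATLAB original (shown for reference only, not executed):
#   total = 0;
#   for i = 1:numel(x)
#       total = total + x(i);
#   end
#   rest = x(2:end);   % everything except the first element


def sum_literal_port(x):
    """Literal translation of the 1-based loop: buggy in Python."""
    total = 0
    for i in range(1, len(x) + 1):  # mirrors MATLAB's 1:numel(x)
        total += x[i]               # skips x[0]; x[len(x)] raises IndexError
    return total


def sum_zero_based(x):
    """Correct version: valid Python indices run from 0 to len(x) - 1."""
    total = 0
    for i in range(len(x)):
        total += x[i]
    return total


def rest_literal_port(x):
    """Literal translation of x(2:end): silently drops the first TWO elements."""
    return x[2:]


def rest_zero_based(x):
    """Correct translation of x(2:end): everything except the first element."""
    return x[1:]


if __name__ == "__main__":
    data = [10, 20, 30]
    print(sum_zero_based(data))     # 60
    print(rest_zero_based(data))    # [20, 30]
    print(rest_literal_port(data))  # [30], wrong, but no error is raised
    try:
        sum_literal_port(data)
    except IndexError as err:
        print("literal port failed:", err)
```

The slicing case is the nastier of the two, because nothing signals that the result is wrong.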

Are Airbnb guests less energy efficient than their host? • Max Halford

TL;DR: I compared the energy consumption of Airbnb guests versus their host, in the same apartment, during 2022. It appears that guests do in fact consume more energy than hosts. The data I used is available to any Airbnb host. I also open-sourced all the code I wrote for this analysis.
Introduction
European energy prices soared in 2022, to the point where some Airbnb hosts have become reluctant to rent, believing their guests are too wasteful and cost too much.

·maxhalford.github.io·Mar 10, 2023

Run Predictions Inside the Database

It parses a fitted R model object and returns a formula in Tidy Eval code that calculates the predictions. It works with several database back-ends because it leverages dplyr and dbplyr for the final SQL translation of the algorithm. It currently supports lm(), glm(), randomForest(), ranger(), earth(), xgb.Booster.complete(), cubist(), and ctree() models.

·tidypredict.tidymodels.org·Mar 10, 2023
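
tidypredict itself is an R package that works on R model objects and relies on dplyr/dbplyr for the SQL translation. The hypothetical Python sketch below only illustrates the underlying idea. It turns the coefficients of an already-fitted linear model into a plain SQL expression, so the database computes the predictions instead of the client. The function name `linear_model_to_sql`, the `houses` table, and the coefficients are illustrative assumptions. Column names are assumed to be trusted identifiers, since no quoting or escaping is done.

```python
import sqlite3


def linear_model_to_sql(intercept, coefficients, table, alias="prediction"):
    """Return a SELECT that computes intercept + sum(coef * column) in the database.

    coefficients maps column name -> coefficient of an already-fitted model.
    Table and column names are pasted in verbatim, so they must be trusted
    identifiers; no quoting or escaping is done in this sketch.
    """
    terms = [repr(float(intercept))]
    for column, coef in coefficients.items():
        terms.append(f"({float(coef)!r} * {column})")
    return f"SELECT *, {' + '.join(terms)} AS {alias} FROM {table}"


if __name__ == "__main__":
    # Made-up coefficients and rows, purely to exercise the generated SQL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE houses (area REAL, rooms REAL)")
    conn.executemany("INSERT INTO houses VALUES (?, ?)", [(50.0, 2.0), (80.0, 3.0)])
    sql = linear_model_to_sql(10.0, {"area": 2.0, "rooms": 5.0}, "houses")
    print(sql)
    # SELECT *, 10.0 + (2.0 * area) + (5.0 * rooms) AS prediction FROM houses
    for row in conn.execute(sql):
        print(row)  # (area, rooms, prediction)
```

Because the output is ordinary SQL arithmetic, it only needs basic expression support from the database. Tree ensembles and other model types would need more elaborate translations (e.g. nested `CASE WHEN` expressions), which this sketch does not attempt.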

Online gradient descent written in SQL • Max Halford

Edit: this post generated a few insightful comments on Hacker News. I’ve also put the code in a notebook for ease of use.
Introduction
Modern MLOps is complex because it involves too many components. You need a message bus, a stream processing engine, an API, a model store, a feature store, a monitoring service, etc. Sadly, containerisation software and the unbundling trend have encouraged an appetite for complexity. I believe MLOps shouldn’t be this complex.

·maxhalford.github.io·Mar 10, 2023
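
The post expresses online gradient descent in SQL, and that SQL version is not reproduced here. The hypothetical Python sketch below only shows the update rule itself. It fits a linear regression with squared loss and updates the weights after every single observation, so the data can arrive as a stream. The learning rate and the noise-free toy stream are illustrative assumptions, not values from the post.

```python
def online_sgd(stream, n_features, learning_rate=0.01):
    """Online gradient descent for linear regression with squared loss.

    stream yields (x, y) pairs one at a time; x is a sequence of floats.
    The weights are updated after every observation, so the whole dataset
    never needs to be held in memory.
    """
    weights = [0.0] * n_features
    bias = 0.0
    for x, y in stream:
        prediction = bias + sum(w * xi for w, xi in zip(weights, x))
        error = prediction - y  # derivative of 0.5 * (prediction - y)**2
        weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
        bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    # Noise-free toy stream generated from y = 3x + 1, replayed many times.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    stream = (((x,), 3.0 * x + 1.0) for _ in range(2000) for x in xs)
    weights, bias = online_sgd(stream, n_features=1, learning_rate=0.05)
    print(weights, bias)  # close to [3.0] and 1.0
```

Each update touches only the current row and the current weights. This is what makes the method a natural fit for streaming or row-by-row settings.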

The Most Underrated R packages: Part 2

A curated list of awesome and lesser-known R libraries

·towardsdatascience.com·Mar 10, 2023

The end of range anxiety: how has the range of electric cars changed over time?

The average range of electric cars has more than tripled since 2011.

·hannahritchie.substack.com·Mar 9, 2023

Measuring Goodhart’s law

Goodhart’s law famously says: “When a measure becomes a target, it ceases to be a good measure.” Although originally from economics, it’s something we have to grapple with at OpenAI when figuring out how to optimize objectives that are difficult or costly to measure.

·openai.com·Mar 9, 2023

Solving (some) formal math olympiad problems

We built a neural theorem prover for Lean that learned to solve a variety of challenging high-school olympiad problems, including problems from the AMC12 and AIME competitions, as well as two problems adapted from the IMO.

·openai.com·Mar 9, 2023

The Tidymodels Extension for Time Series Modeling

The time series forecasting framework for use with the tidymodels ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the forecast and prophet packages. Refer to "Forecasting Principles & Practice, Second edition" (https://otexts.com/fpp2/). Refer to "Prophet: forecasting at scale" (https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/).

·business-science.github.io·Mar 9, 2023

AdaBoost data mining algorithm in plain English - Hacker Bits

The AdaBoost data mining algorithm is part of a longer article about many more data mining algorithms. What does it do? AdaBoost is a boosting algorithm which constructs a classifier. As you probably remember, a classifier takes a bunch of data ...

·hackerbits.com·Mar 9, 2023
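
The boosting idea summarised above can be sketched in a few dozen lines. This is a minimal, hypothetical Python version of discrete AdaBoost, not Hacker Bits' own code. It uses one-feature threshold "stumps" as weak learners and labels in {-1, +1}. After each round, misclassified points are up-weighted so the next stump focuses on them. The final classifier is the sign of the alpha-weighted vote. Function names and the toy data are illustrative.

```python
import math


def fit_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None  # (error, feature index, threshold, polarity)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] >= threshold else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, threshold, polarity)
    return best


def stump_predict(stump, row):
    _, j, threshold, polarity = stump
    return polarity if row[j] >= threshold else -polarity


def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []      # list of (alpha, stump)
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # weak learner is no better than chance
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified points, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data that no single threshold can separate (+, +, -, -, +, +).
    X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([adaboost_predict(model, row) for row in X])  # [1, 1, -1, -1, 1, 1]
```

No single stump classifies the toy data correctly. The weighted vote of three stumps does, which is the point of boosting weak learners.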

k-Nearest Neighbor (kNN) data mining algorithm in plain English - Hacker Bits

The kNN data mining algorithm is part of a longer article about many more data mining algorithms. What does it do? kNN, or k-Nearest Neighbors, is a classification algorithm. However, it differs from the classifiers previously described because it's a lazy ...

·hackerbits.com·Mar 9, 2023
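
"Lazy" here means that training does no real work. The classifier simply stores the data, and all computation happens at prediction time. A minimal, hypothetical Python sketch (class name and toy points are illustrative) ranks stored points by Euclidean distance and takes a majority vote among the k nearest.

```python
import math
from collections import Counter


class KNNClassifier:
    """A lazy learner: 'fitting' only stores the training data."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        self.points = [tuple(p) for p in points]
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work happens here: sort every stored point by distance to the query.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        nearest = [label for _, label in ranked[: self.k]]
        # Majority vote; on a tie, Counter.most_common favours the label seen first,
        # i.e. the one belonging to the closer neighbour.
        return Counter(nearest).most_common(1)[0][0]


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    model = KNNClassifier(k=3).fit(points, labels)
    print(model.predict((0.2, 0.2)))  # a
    print(model.predict((5.5, 5.5)))  # b
```

Because every prediction scans all stored points, this naive version costs O(n) distance computations per query. Practical implementations often use spatial indexes such as k-d trees instead.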

Naive Bayes data mining algorithm in plain English - Hacker Bits

The Naive Bayes data mining algorithm is part of a longer article about many more data mining algorithms. What does it do? Naive Bayes is not a single algorithm, but a family of classification algorithms that share one common assumption: Every ...

·hackerbits.com·Mar 9, 2023
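
As a minimal, hypothetical illustration of one member of the family, the Python sketch below implements categorical Naive Bayes with add-one (Laplace) smoothing. It relies on the standard Naive Bayes assumption that features are conditionally independent given the class. That assumption lets the class likelihood factor into a product of per-feature terms, which the code sums as logarithms. Function names and the toy weather data are illustrative.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class, per-feature value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[label][j][value] += 1
            feature_values[j].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, row):
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum over j of log P(feature_j = value | class); summing the
        # per-feature terms is valid only under the conditional-independence assumption.
        score = math.log(count / total)
        for j, value in enumerate(row):
            numerator = value_counts[label][j][value] + 1  # add-one smoothing
            denominator = count + len(feature_values[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = train_naive_bayes(rows, labels)
    print(predict_naive_bayes(model, ("rainy", "cool")))  # yes
    print(predict_naive_bayes(model, ("sunny", "hot")))   # no
```

Smoothing keeps an unseen feature value from zeroing out an entire class. Working in log space avoids numerical underflow when there are many features.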

20 random tips and tricks for working with R, Rmarkdown, and RStudio | Ashirwad Barnwal

My top 20 R-related tips & tricks

·ashirwad.netlify.app·Mar 7, 2023

Inside the Suspicion Machine

Obscure government algorithms are making life-changing decisions about millions of people around the world. Here, for the first time, we reveal how one of these systems works.

·wired.com·Mar 7, 2023

Presentation-Ready Data Summary and Analytic Result Tables

Creates presentation-ready tables summarizing data sets, regression models, and more. The code to create the tables is concise and highly customizable. Data frames can be summarized with any function, e.g. mean(), median(), even user-written functions. Regression models are summarized and include the reference rows for categorical variables. Common regression models, such as logistic regression and Cox proportional hazards regression, are automatically identified and the tables are pre-filled with appropriate column headers.

·danieldsjoberg.com·Mar 7, 2023

The Illustrated Transformer

In the previous post, we looked at Attention, a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer, a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. In fact, Google Cloud recommends The Transformer as a reference model for its Cloud TPU offering. So let’s try to break the model apart and look at how it functions. The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as a part of the Tensor2Tensor package. Harvard’s NLP group created a guide annotating the paper with a PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one, to hopefully make them easier to understand for people without in-depth knowledge of the subject matter. 2020 Update: I’ve created a “Narrated Transformer” video which is a gentler approach to the topic.
A High-Level Look
Let’s begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.

·jalammar.github.io·Mar 3, 2023
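
At the core of The Transformer is attention. "Attention is All You Need" defines scaled dot-product attention as softmax(QKᵀ / √d_k) V. The pure-Python sketch below implements just that formula, for a single head with no masking, batching, or learned projections. The matrices in the usage example are illustrative values only.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v (lists of lists).

    Each output row is a weighted average of the rows of V, with weights
    softmax(q . k / sqrt(d_k)) taken over all keys k.
    """
    d_k = len(K[0])
    output = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


if __name__ == "__main__":
    # Two identical keys get equal weight, so the output is the mean of the V rows.
    Q = [[1.0, 0.0]]
    K = [[1.0, 1.0], [1.0, 1.0]]
    V = [[0.0, 2.0], [4.0, 6.0]]
    print(scaled_dot_product_attention(Q, K, V))  # [[2.0, 4.0]]
```

Each query row is processed independently of the others, so the whole computation can run in parallel as matrix products. This reflects the parallelization advantage the post highlights.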

Trying to isolate Russia

The New York Times shows how the West tried to isolate Russia and how things haven’t gone as expected. A series of packed bubbles, cartograms, and flowcharts provide a visual timeline for eac…

·flowingdata.com·Mar 2, 2023

100 visualizations from a single dataset with 6 data points

The structure of a dataset can help you pick a visualization method or chart type, but it only takes you part of the way there. To demonstrate, Ferdio started with a simple dataset with six data po…

·flowingdata.com·Mar 2, 2023

Using satellite imagery to assess the damage in Ukraine

The Economist combined two satellite imagery sources, one that estimates fire events and one that estimates building damage, to assess the extent of damage in Ukraine: Both approaches have weakness…

·flowingdata.com·Mar 2, 2023
