Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI
Common Crawl does not contain the “entire web,” nor a representative sample of it. Despite its size, there are important limitations on how much of the web is covered. The crawling process is almost entirely automated and prioritizes pages on domains that are frequently linked to, which makes domains related to digitally marginalized communities less likely to be included. Language and regional coverage are strongly skewed toward English-language content. Moreover, a growing number of relevant domains, such as Facebook and the New York Times, block Common Crawl from crawling most (or all) of their pages.

When used as a source for AI training, Common Crawl should be handled with care, but such care is often lacking. Because of Common Crawl’s deliberate lack of curation, AI builders do not use it directly as training data for their models. Instead, they choose from a variety of filtered Common Crawl versions to train their LLMs. However, there is little reflection among AI builders about the limitations and biases of Common Crawl’s archive. Popular filtered versions are especially problematic when used to train LLMs for end-user products because the filtering techniques behind them are simplistic, often focused on removing pornography or boilerplate text such as the names of navigation menu items, while leaving many other types of problematic content untouched; the sketch at the end of this section illustrates why.

Common Crawl and AI builders share responsibility for making generative AI more trustworthy. While Common Crawl was never primarily about providing AI training data, it now positions itself as an important building block for LLM development, yet it continues to provide a source that AI builders need to filter before model training. Both groups can help make generative AI more trustworthy in their own ways. Common Crawl should better highlight the limitations and biases of its data and be more transparent and inclusive about its governance. It could also promote transparency around generative AI by requiring AI builders to attribute their use of Common Crawl. AI builders should put more effort into filtering out more types of problematic content and better account for the various cultural contexts in which their generative AI products are deployed. There is also a need for industry standards and best practices for end-user products to reduce potential harms when using Common Crawl or similar sources for training data. In addition, AI builders should create or support dedicated intermediaries tasked with filtering Common Crawl in transparent, accountable, and continuously updated ways. Long term, there should be less reliance on sources like Common Crawl and a greater emphasis on training generative AI on datasets created and curated by people in equitable and transparent ways.
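To make the point about simplistic filtering concrete, the following is a minimal, hypothetical sketch of the kind of keyword-blocklist and boilerplate heuristics that filtered Common Crawl derivatives often rely on. The blocklist terms, thresholds, and function names are illustrative assumptions rather than any specific pipeline’s code; the point is that such surface-level checks cannot catch misinformation, subtle hate speech, or biased framing.

```python
# Illustrative sketch only: a simplistic blocklist-and-boilerplate filter of the
# kind often applied to Common Crawl text. The blocklist terms, the five-word
# threshold, and the function names are hypothetical placeholders.

BLOCKLIST = {"pornography", "xxx"}  # placeholder terms; real lists are much longer


def keep_line(line: str) -> bool:
    """Drop likely boilerplate: short lines without sentence-ending punctuation,
    such as navigation menu items ('Home', 'About', 'Contact')."""
    line = line.strip()
    return len(line.split()) >= 5 and line.endswith((".", "!", "?", "\""))


def filter_document(text: str) -> str | None:
    """Return cleaned text, or None if the document is rejected outright."""
    # Reject the whole document if any blocklisted term appears anywhere.
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        return None
    # Otherwise keep only lines that pass the boilerplate heuristic.
    kept = [line for line in text.splitlines() if keep_line(line)]
    return "\n".join(kept) if kept else None


if __name__ == "__main__":
    sample = (
        "Home\n"
        "About\n"
        "This article repeats a misleading medical claim that no keyword list will catch.\n"
    )
    # The navigation lines are dropped, but the misleading claim passes untouched,
    # because the filter checks surface features of the text, not its meaning.
    print(filter_document(sample))
```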