No Clocks

2687 bookmarks
autodb: Automatic Database Normalisation for Data Frames
Automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Usage to design your actual database for you is not advised.) Originally inspired by the 'AutoNormalize' library for 'Python' by 'Alteryx' (<https://github.com/alteryx/autonormalize>), with various changes and improvements. Automatic discovery of functional or approximate dependencies, normalisation based on those, and plotting of the resulting "database" via 'Graphviz', with options to exclude some attributes at discovery time, or remove discovered dependencies at normalisation time.
·cran.r-project.org·
Access, retrieve, and work with CMHC data.
Wrapper around the Canadian Mortgage and Housing Corporation (CMHC) web interface. It enables programmatic and reproducible access to a wide variety of housing data from CMHC.
·mountainmath.github.io·
HelloData - Full Product Demo (6-3-2024)
Power your multifamily rent surveys with real-time data on over 25M units nationwide, sourced entirely from property websites and public data sources.
·youtu.be·
Data Pipeline Design Patterns - #1. Data flow patterns
Data pipelines built (and added on to) without a solid foundation will suffer from poor efficiency, slow development speed, long times to triage production issues, and poor testability. What if your data pipelines are elegant and enable you to deliver features quickly? An easy-to-maintain and extendable data pipeline significantly increases developer morale, stakeholder trust, and the business bottom line! Using the correct design pattern will increase feature delivery speed and developer value (allowing devs to do more in less time), decrease toil during pipeline failures, and build trust with stakeholders. This post goes over the most commonly used data flow design patterns, what they do, when to use them, and, more importantly, when not to use them. By the end of this post, you will have an overview of the typical data flow patterns and be able to choose the right one for your use case.
·startdataengineering.com·
Advanced Tidyverse
Use piped workflows for efficient data cleaning and visualization.
·sesync-ci.github.io·
Summarizing and Querying Data from Excel Spreadsheets Using eparse and a Large Language Model
Editor's Note: This post was written by Chris Pappalardo, a Senior Director at Alvarez & Marsal, a leading global professional services firm. The standard processes for building with LLM work well for documents that contain mostly text, but do not work as well for documents that contain tabular data (like spreadsheets). We wrote about our latest thinking on Q&A over csvs on the blog a couple weeks ago, and we loved reading Chris's exploration of working with csvs and LangChain using agents, chai
·blog.langchain.dev·
Agentic AI for Data Management and Warehousing
Explore how Agentic AI for data management enhances automation, governance, and decision-making by leveraging intelligent workflows, real-time insights
·xenonstack.com·