Cooking With DuckDB

Tutorials
A Beginner’s Guide to Data Engineering — Part I
Data Engineering: The Close Cousin of Data Science
How to trigger a spark job from AWS Lambda
Wondering how to execute a Spark job on an AWS EMR cluster based on a file upload event on S3? Then this post is for you. In this post we go over how to trigger Spark jobs on an AWS EMR cluster using AWS Lambda. The Lambda function will execute in response to an S3 upload event. We will go over this event-driven pattern with code snippets and set up a fully functioning pipeline.
Hadoop Vs SQL | Find Out The Top 6 Most Successful Differences
Guide to Hadoop vs SQL. Here we discuss the Hadoop vs SQL head-to-head comparison and key differences, along with infographics and a comparison table.
Course Information - Big Data Platforms, Autumn 2021
The University of Helsinki's free online course, open to everyone, teaching the basics of programming. The course covers the basic ideas of modern programming, the tools used in programming, and the design of algorithms. No prior programming knowledge is required to take part.
The Apache Cassandra Beginner Tutorial
There are lots of data-storage options available today. You have to choose between managed or unmanaged, relational or NoSQL, write- or read-optimized, proprietary or open-source — and it doesn't end there. Once you begin your search, you will end up in the universe that is database marketing. All of the vendors
danielbeach/data-engineering-practice: Data Engineering Practice Problems
Data Engineering Practice Problems. Contribute to danielbeach/data-engineering-practice development by creating an account on GitHub.
DevOps for Data Science
NoSQL databases sample models: MongoDB, Neo4j, Swagger, Cassandra
Get the sample models for MongoDB, Neo4j, Cassandra, Swagger, Avro, Parquet, Glue, and more! After download, open the models using Hackolade, and learn through the examples how to leverage the modeling power of the software.
How to Put a Database in Kubernetes - DZone Cloud
Learn the key steps of deploying databases and stateful workloads in Kubernetes and meet cloud-native technologies that can streamline Apache Cassandra for K8s.
Ultimate CI Pipeline for All of Your Python Projects
Everything you ever wanted for your Python project's continuous integration pipeline, up and running in a matter of minutes
Starting your journey with Microsoft Azure Data Factory
In this article, we will go through the Microsoft Azure Data Factory service, which can be used to ingest, copy, and transform data generated from various data sources.
What's the difference between ETL & ELT?
This post goes over what the ETL and ELT data pipeline paradigms are. It tries to address the inconsistency in naming conventions and how to understand what they really mean. Finally, it ends with a comparison of the two paradigms and how to use these concepts to build efficient and scalable data pipelines.
Where to validate incoming data?
When you watch the blueprint I also use in my cookbook you see the different phases: Connect, Processing Framework, Store and Buffer. At…
A Beginner Guide to Airflow
A step-by-step guide on how to start with Airflow: from your local set-up to creating simple tasks.
How to improve at SQL as a data engineer
Are you disappointed with online SQL tutorials that aren't deep enough? Are you frustrated knowing that you are missing SQL skills, but can't quite put your finger on it? This post is for you. In this post, we go over a few topics that can take your SQL skills to the next level and help you be a better data engineer.
6 Key Concepts, to Master Window Functions
In this post, we go over 6 key concepts to help you master window functions. Window functions are one of the most powerful features of SQL. They are very useful in analytics and for performing operations that cannot be done easily with the standard group by, subquery, and filters. Despite this, window functions are not used frequently. If you have ever thought 'window functions are confusing', then this post is for you.
What are Common Table Expressions(CTEs) and when to use them?
You have heard of Common Table Expressions (CTEs), but are not sure what they are and when to use them. What if you knew exactly what Common Table Expressions (CTEs) were and when to use them? In this post, we go over what CTEs are and compare their performance against subqueries, derived tables, and temp tables to help you decide when to use them.
Event-driven architecture with Azure Eventgrid (Databricks-Azure Data Factory)
A step-by-step guide on how to build an event-driven architecture in Azure.
Designing a Data Project to Impress Hiring Managers
Frustrated that hiring managers are not reading your GitHub projects? Then this post is for you. In this post, we discuss a way to impress hiring managers by hosting a live dashboard with near real-time data. We will also go over coding best practices such as project structure, automated formatting, and testing to make your code professional. By the end of this post, you will have deployed a live dashboard that you can link to your resume and LinkedIn.
Data Engineering Project: Stream Edition · Start Data Engineering
Data engineering project for beginners, stream edition. In this post we design and build a simple data streaming pipeline using Apache Kafka, Apache Flink and PostgreSQL DB. We will also review the design and understand some common issues to avoid while building distributed stream processing systems.