Wondering how to execute a spark job on an AWS EMR cluster, based on a file upload event on S3? Then this post if for you. In this post we go over how to trigger spark jobs on an AWS EMR cluster, using AWS Lambda. The lambda function will execute in response to an S3 upload event. We will go over this event driven pattern with code snippets and set up a fully functioning pipeline.
Course Information - Big Data Platforms, Autumn 2021
Helsingin yliopiston kaikille avoin ja ilmainen ohjelmoinnin perusteet opettava verkkokurssi. Kurssilla perehdytään nykyaikaisen ohjelmoinnin perusideoihin sekä ohjelmoinnissa käytettävien työvälineiden lisäksi algoritmien laatimiseen. Kurssille osallistuminen ei vaadi ennakkotietoja ohjelmoinnista.
There are lots of data-storage options available today. You have to choose between managed or unmanaged, relational or NoSQL, write- or read-optimized, proprietary or open-source — and it doesn't end there. Once you begin your search, you will end up in the universe that is database marketing. All of the vendors
Get the sample models for MongoDB, Neo4j, Cassandra, Swagger, Avro, Parquet, Glue, and more! After download, open the models using Hackolade, and learn through the examples how to leverage the modeling power of the software.
Learn the key steps of deploying databases and stateful workloads in Kubernetes and meet cloud-native technologies that can streamline Apache Cassandra for K8s.
Starting your journey with Microsoft Azure Data Factory
In this article, we will go through the Microsoft Azure Data Factory service, that can be used to ingest, copy and transform data generated from various data sources
This post goes over what the ETL and ELT data pipeline paradigms are. It tries to address the inconsistency in naming conventions and how to understand what they really mean. Finally ends with a comparison of the 2 paradigms and how to use these concepts to build efficient and scalable data pipelines.
Are you disappointed with online SQL tutorials that aren't deep enough? Are you frustrated knowing that you are missing SQL skills, but can't quite put your finger on it? This post is for you. In this post, we go over a few topics that can take your SQL skills to the next level and help you be a better data engineer.
In this post, we go over 6 key concepts to help you master window functions. Window functions are one the most powerful features of SQL, they are very useful in analytics and performing operations that cannot be done easily with the standard group by, subquery and filters. Despite this, window functions are not used frequently. If you have ever thought 'window functions are confusing', then this post is for you.
What are Common Table Expressions(CTEs) and when to use them?
You have heard of Common Table Expressions(CTEs), but are not be sure what they are and when to use them. What if you knew exactly what Common Table Expressions(CTEs) were and when to use them? In this post, we go over what CTEs are, and their performance comparisons against subqueries, derived tables, and temp tables to help decide when to use them.
Designing a Data Project to Impress Hiring Managers
Frustrated that hiring managers are not reading your Github projects? then this post is for you. In this post, we discuss a way to impress hiring managers by hosting a live dashboard with near real-time data. We will also go over coding best practices such as project structure, automated formatting, and testing to make your code professional. By the end of this post, you will have deployed a live dashboard that you can link to your resume and LinkedIn.
Data Engineering Project: Stream Edition · Start Data Engineering
Data engineering project for beginners, stream edition. In this post we design and build a simple data streaming pipeline using Apache Kafka, Apache Flink and PostgreSQL DB. We will also review the design and understand some common issues to avoid while building distributed stream processing systems.