Apache Flink Use Cases & Benefits of Integration with Confluent’s Data Streaming Platform

What is Apache Flink?

Apache Flink is a distributed framework for stateful stream and batch processing. It offers a set of powerful, well-designed APIs that power stream processing platforms at a wide range of companies. This blog covers the basics of event streaming with Apache Kafka and builds on those concepts to explain how Confluent’s new Flink service extends its existing platform to provide the next level of stream processing.

Apache Kafka: An Event-Streaming Platform

An event stream is a continuous flow of data in which each event represents a change of state in the system. Events travel from Publishers to Subscribers (in Kafka, these are the Producers and the Consumers). Examples of events include the constantly changing coordinates of a car in a city, a confirmation of a payment, and the value of a stock; there is no limit. Events are stored in topics, each of which groups events of one category, so the previous examples could be stored in topics called ‘car_location’, ‘transactions’ and ‘stock_values’.

These topics are then broken down into smaller, more manageable chunks called Partitions, which are spread across the Kafka Cluster, a collection of servers called Brokers. The main functions of the brokers are to receive and store messages and to serve them to consumers. Partitions are replicated across brokers to provide fault tolerance and high availability, and are further broken down into Segments, which are the physical files in which the data is stored. Each partition is structured as an immutable log: an append-only list of records.
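The topic, partition and append-only log ideas can be sketched in a few lines of Python. This is a toy, in-memory model and not Kafka's implementation: the `Topic` and `Partition` classes are hypothetical names, there is no replication or segment handling, and CRC32 stands in for the hash used by Kafka's default partitioner.

```python
import zlib


class Partition:
    """An append-only log: records are only ever added at the end."""

    def __init__(self):
        self.records = []

    def append(self, key, value):
        self.records.append((key, value))
        return len(self.records) - 1  # offset of the record just written

    def read(self, offset):
        # Reading never removes data; each consumer tracks its own offset.
        return self.records[offset:]


class Topic:
    """A named collection of partitions (hypothetical, in-memory only)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def produce(self, key, value):
        # Hash the key so events with the same key land in the same partition.
        # CRC32 is a stand-in here, not the hash Kafka actually uses.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        offset = self.partitions[index].append(key, value)
        return index, offset


car_location = Topic("car_location")
p1, o1 = car_location.produce("car-42", (51.50, -0.12))
p2, o2 = car_location.produce("car-42", (51.51, -0.13))
```

Because both events share the key `car-42`, they are written to the same partition at consecutive offsets, which is what preserves per-key ordering in this model.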

Confluent provides its users with a wide range of additional features, using Apache Kafka, Kafka Streams and a streaming SQL engine called ksqlDB to build an efficient, real-time data streaming platform. Paired with the large number of connectors available in the Confluent Hub, this makes Confluent a strong choice for any business, new or old, that wants to view and manage its data in one place. They say it best, defining their service as providing “enterprise-grade capabilities [needed to run] mission-critical use cases [which allows its users to] operationalise and scale all your data streaming projects so you never lose focus on your core business.”

Methods of Enriching Data

Kafka Streams and Apache Flink are two ways of performing stream processing. Confluent currently uses ksqlDB, running on the Kafka Streams engine, to transform data as it flows through the Kafka cluster and, by extension, the data read by consumers.

There are two main types of data transformation. Stateful transformations rely on previously seen, persisted data in a stream, which influences how the next event(s) are processed. Stateless transformations process each event independently of the others. Examples of stateful transformations are reduce, aggregate, count and join. Examples of stateless transformations are map, filter, flatMap and branch; groupBy is also stateless on its own, although it usually precedes a stateful aggregation.
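The difference between the two kinds of transformation can be illustrated with plain Python generators. This is a sketch only: it does not use the Kafka Streams or Flink APIs, and the `transactions` data and function names are made up for the example.

```python
from collections import defaultdict

transactions = [
    {"account": "A", "amount": 40},
    {"account": "B", "amount": -5},
    {"account": "A", "amount": 60},
    {"account": "B", "amount": 25},
]


def stateless_filter(events):
    # Stateless: each event is judged on its own, with no memory of earlier ones.
    for event in events:
        if event["amount"] > 0:
            yield event


def stateful_running_total(events):
    # Stateful: the output for each event depends on everything seen so far
    # for the same key, so the totals must be kept between events.
    totals = defaultdict(int)
    for event in events:
        totals[event["account"]] += event["amount"]
        yield event["account"], totals[event["account"]]


positive = list(stateless_filter(transactions))
running = list(stateful_running_total(transactions))
```

The `totals` dictionary is the state. In a real stream processor that state has to survive restarts, which is why stateful operators rely on persistent state stores and checkpointing.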

Apache Flink: Stream Processing Evolved

There are a few key limitations to the stream processing methods currently in use, namely metadata declaration, checkpointing, and the limited complexity of statements. This is where Apache Flink and Confluent step in, offering a unified stream and batch processing framework in which, with Apache Kafka as the storage layer, users can leverage four APIs at different levels of abstraction to filter, enrich and join data. Flink is also fully integrated with Confluent’s tooling for security, governance and observability.

Apache Flink comes with four APIs, each of which supports many different use cases and is highly customisable. Flink also supports a range of languages, including Java, Scala, Python and SQL. In decreasing order of abstraction, the APIs are Flink SQL, the Table API, the DataStream API and ProcessFunction. With these APIs, Flink unifies stream and batch processing, so you can select the mode most appropriate for your data: Batch Processing Mode (BPM) for bounded data (like tables), or Stream Processing Mode (SPM) for unbounded data streams (such as a constantly changing stock price). This means that you can mix real-time and historical data processing in the same application and reuse the same semantics, logic and code when switching between them.
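The idea of reusing one piece of logic for both bounded and unbounded data can be sketched without Flink at all. The example below is plain Python, not the Flink API: `average_price_per_window` and `live_prices` are hypothetical names, and the count-based window is a simplification chosen for brevity.

```python
import itertools


def average_price_per_window(prices, window_size):
    """Yield the average of every `window_size` consecutive prices.

    Works on any iterable, so the caller decides whether the input is
    bounded (a list) or unbounded (a generator that never ends).
    """
    window = []
    for price in prices:
        window.append(price)
        if len(window) == window_size:
            yield sum(window) / window_size
            window = []


# Bounded input: a finite, historical data set processed to completion.
historical = [10.0, 12.0, 11.0, 13.0]
batch_result = list(average_price_per_window(historical, 2))


def live_prices():
    # Unbounded input: a stand-in for a price feed that never ends.
    for tick in itertools.count():
        yield 100.0 + (tick % 4)


# The stream never finishes, so take only the first three results.
stream_result = list(
    itertools.islice(average_price_per_window(live_prices(), 2), 3)
)
```

The processing function is identical in both cases; only the source changes. That is the property the paragraph above describes when it says logic and code can be reused across batch and streaming modes.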

While Flink alone is useful, it can be relatively difficult to set up and manage. Confluent offers Flink as a managed service with an assortment of additional features that make it more powerful and accessible:

Serverless: Because Flink is difficult to set up and operate yourself, Confluent has made its Flink service cloud-native and serverless to ensure efficiency and speed
Metadata Importation: Confluent has removed the need to recreate tables – if you already have Kafka topics with schemas in the Schema Registry, you can immediately browse and query them in Flink SQL
Independent Scalability: With Kafka as the storage layer and Flink as the computational layer, Confluent separates storage from compute, so each can be scaled independently of the other
Evergreen Runtime: The Flink runtime is not versioned; as part of the fully managed service it is updated automatically, so you never have to update or upgrade it yourself
Autoscaling Workloads: On Confluent Cloud, workloads scale automatically and require no user intervention – you can view the related metrics in the Confluent Control Panel or integrate them with existing observability platforms
Usage-Based Billing: Compute resources are allocated automatically in the Flink layer and deallocated once you stop using them, so you only pay for what you use. Relatedly, communication between Flink and the Kafka cluster is kept region-specific: you cannot query across regions, but this is by design to avoid expensive data transfer charges
Built-In Security Model: Confluent provides a built-in security model for Flink that uses the same systems already in place for Kafka. RBAC (role-based access control) is also available with Flink and is easily defined in the Confluent Control Panel
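To make the Metadata Importation point more concrete, here is a rough sketch of how a topic's registered schema could be turned into a table definition. It is an illustration under stated assumptions and not how Confluent implements the feature: `schema_registry` is a hand-written dictionary rather than a real Schema Registry client, `table_for_topic` is a hypothetical helper, and the type mapping covers only a few primitive types.

```python
# Hypothetical, hand-written stand-in for schemas held in a Schema Registry.
# The "<topic>-value" subject name follows the registry's default naming strategy.
schema_registry = {
    "transactions-value": {
        "type": "record",
        "name": "Transaction",
        "fields": [
            {"name": "account", "type": "string"},
            {"name": "amount", "type": "double"},
        ],
    }
}

# Illustrative mapping from a few Avro primitive types to SQL column types.
AVRO_TO_SQL = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}


def table_for_topic(topic, registry):
    """Derive a table description from the schema registered for a topic."""
    schema = registry[f"{topic}-value"]
    columns = [(f["name"], AVRO_TO_SQL[f["type"]]) for f in schema["fields"]]
    return {"table": topic, "columns": columns}


transactions_table = table_for_topic("transactions", schema_registry)
```

The point of the sketch is that the column names and types already exist in the schema, so nothing needs to be declared a second time before querying.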

Use Cases and Potential Customers

Confluent Cloud for Apache Flink® has a wide range of potential customers and use cases, thanks to the range of features and additional services that Confluent ships with Flink. Confluent will not only provide its users with Flink but will also continue to support ksqlDB, which will still run on the Kafka Streams engine. Because the previous stream processing system is not being replaced, users will be able to migrate to Flink at their own pace, which strengthens Confluent’s position as a platform for managing data.

Examples of data pipelines that would benefit from Flink include: a stock market application that tracks constantly changing prices; a retail catalogue of all items and prices, together with transactions and orders; a transport application that uses customer data and driver locations to set current prices; and a banking application that keeps a record of all transactions and current account balances.

Any customer currently using Apache Kafka knows that keeping the cluster healthy, managing resources and ensuring near-perfect uptime can be difficult. Confluent Cloud removes the elements that make managing a cluster difficult and allows users to dedicate their time to getting the most out of their data instead of managing resources. Additionally, Flink SQL is ANSI SQL compliant, so teams that already know SQL can adopt the platform more easily.

Due to the nature of Flink and Kafka, you can integrate your current systems into Confluent with relative ease. Whether you need a better system to manage an endless stream of data or a finite set of values, Confluent provides a solution, and its relationship with Flink is just beginning.


