Customers now expect to see their data in real time, without lag. The challenge is to process data as it arrives and, where necessary, transform or clean it to make sense of it.
It’s not just about the quantity of data you can analyze at once; it’s about the speed, scale, and quality of processing.
A streaming data platform plays an increasingly important role in the Big Data ecosystem as more companies move toward real-time processing.
A streaming platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
- Store streams of records in a fault-tolerant durable way.
- Process streams of records as they occur.
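The abstraction behind these three capabilities is Kafka's partitioned, append-only commit log, which producers append to and consumer groups read from at their own pace, each tracking its own offset. As a conceptual illustration only (this is not Kafka's actual API, just a toy model in plain Python):

```python
# Toy model of Kafka's core abstraction: an append-only log per topic,
# with each consumer group tracking its own read offset. Illustrative
# only -- real Kafka adds partitioning, replication, and durable storage.

class MiniLog:
    def __init__(self):
        self.topics = {}    # topic name -> list of records (the "log")
        self.offsets = {}   # (group, topic) -> next offset to read

    def publish(self, topic, record):
        """Append a record to the topic's log (the 'publish' capability)."""
        self.topics.setdefault(topic, []).append(record)

    def consume(self, group, topic):
        """Return records appended since this group's last read."""
        start = self.offsets.get((group, topic), 0)
        records = self.topics.get(topic, [])[start:]
        self.offsets[(group, topic)] = start + len(records)
        return records

log = MiniLog()
log.publish("payments", {"user": "alice", "amount": 42})
log.publish("payments", {"user": "bob", "amount": 7})

print(log.consume("fraud-detector", "payments"))  # both records
print(log.consume("fraud-detector", "payments"))  # [] -- already caught up
```

Because records stay in the log after being read, a second consumer group would receive the same two records independently, which is what distinguishes this model from a classic message queue.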
Apache Kafka, originally developed at LinkedIn and open-sourced in early 2011 (it is now part of the Apache Software Foundation), has become the leading distributed data streaming platform.
Apache Kafka offers a single platform to publish and subscribe, store, and process streams of data in real-time, at any scale and in any environment.
“Kafka is generally used for two broad classes of applications:
- Building real-time streaming data pipelines that reliably get data between systems or applications
- Building real-time streaming applications that transform or react to the streams of data”
Kafka is used in production by more than 35% of Fortune 500 companies, including Uber, Twitter, Airbnb, Goldman Sachs, and Netflix. Its users include seven of the top 10 global banks, eight of the top 10 insurance companies, and nine of the top 10 U.S. telecom companies [source].
Uber, for example, uses Kafka as a message bus connecting the different parts of its ecosystem. The picture below shows how data flows through Kafka pipelines to power many of Uber’s analytical use cases (on the right).
The New York Times uses Apache Kafka and the Kafka Streams API to store and distribute, in real time, published content to the various applications and systems that make it available to readers. The picture below shows the New York Times log/Kafka-based publishing architecture.
At Larus we have integrated Kafka to support a variety of big data use cases, helping enterprises build real-time pipelines and streaming applications.
We have worked with Apache Kafka for the past several years, supporting companies in architecting and optimizing their Kafka solutions through tailored consulting services: assessing them for reliability, scalability, and performance, and sharing best practices. We use Kafka in combination with Spark, Hadoop, and NoSQL databases such as Couchbase and Neo4j.
In the financial sector we have implemented Apache Kafka to stream data in real time and to support the migration to an event-driven, microservices-based architecture. With Kafka we helped financial institutions detect fraud by providing fraud analysts with the most relevant, recent, and complete data to feed their machine learning models. Kafka aggregates data across sources to give a real-time view of potential exposures, so that portfolios can be adjusted and value at risk computed on the fly. With Kafka at the center of the risk-management practice, we delivered real-time views of company-wide market, credit, and liquidity risk.
Partnership with Kafka vendor Confluent
Last year, Larus joined the Confluent Partner Program as a Consulting Partner. Working together, we help our customers harness the torrent of continuously changing data by enabling the Confluent Platform, which is based on Apache Kafka.
The Confluent Platform is a streaming platform that enables you to organize and manage data from many different sources with one reliable, high-performance system. Confluent Platform improves Apache Kafka by expanding its integration capabilities, adding tools to optimize and manage Kafka clusters, and methods to ensure the streams are secure.
We’ve developed the official Apache Kafka-Neo4j Integration
The project is composed of two different parts:
Neo4j Streams, a plugin that runs directly inside Neo4j, enabling it to handle Apache Kafka events in three ways:
- as a Source, where any database change is published as a Kafka event
- as a Sink, to ingest any kind of Kafka event into your graph
- via the Neo4j Streams procedures, which let you produce and consume Kafka events directly from Cypher
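As a rough illustration of the third mode, the streams procedures can be invoked from Cypher along these lines (topic name and payload are placeholder examples; see the project documentation for the exact signatures and options):

```cypher
// Produce: publish a payload to a Kafka topic directly from Cypher
CALL streams.publish('my-topic', 'Hello from Neo4j')

// Consume: read events from a Kafka topic and use them in a query
CALL streams.consume('my-topic', {timeout: 5000}) YIELD event
RETURN event.data
```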
The second pillar is the Kafka Connect plugin, which runs in the Kafka cluster and ingests Kafka events into Neo4j via the Bolt driver.
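To give an idea of how the sink side is wired up, a Kafka Connect configuration for the connector looks roughly like the sketch below. All values (URI, credentials, topic name, Cypher statement) are placeholders, and property names should be checked against the connector's documentation on Confluent Hub:

```json
{
  "name": "Neo4jSinkConnector",
  "config": {
    "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
    "topics": "my-topic",
    "neo4j.server.uri": "bolt://localhost:7687",
    "neo4j.authentication.basic.username": "neo4j",
    "neo4j.authentication.basic.password": "<password>",
    "neo4j.topic.cypher.my-topic": "MERGE (p:Person {name: event.name})"
  }
}
```

The per-topic Cypher template is the key idea: each event consumed from the topic is bound as `event` and applied to the graph through the statement you define.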
The connector received the Verified Gold certification from Confluent, assuring users that it’s fully compatible with the Kafka Connect API.
Providing change-data-capture and ingestion capabilities as a Neo4j extension and as the Kafka Connect Neo4j Sink on Confluent Hub allows you to integrate real-time streaming with graph querying and analytics.
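On the change-data-capture side, the source is enabled through a few settings in `neo4j.conf`; the sketch below uses placeholder values and should be checked against the Neo4j Streams documentation for the exact configuration keys:

```properties
# neo4j.conf -- example Neo4j Streams source configuration (placeholder values)
kafka.bootstrap.servers=localhost:9092
streams.source.enabled=true
# Publish changes to :Person nodes (all properties) to the 'people' topic
streams.source.topic.nodes.people=Person{*}
```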
We are seeing strong demand for graphs applied to streaming data for real-time fraud detection, financial transaction monitoring, customer 360, and real-time recommendations. The picture below shows how Neo4j works with Kafka and Confluent to enrich event streams with powerful graph-based analytics.
For more details about our project please see the following link, video and articles.
In this talk from FOSDEM 2019, Andrea Santurbano, CTO at Larus, shows how to leverage real-time streaming with Neo4j Streams.
If you’re interested in hearing how to leverage streaming data to build a faster, real-time business, contact us today at firstname.lastname@example.org