Skip to content

rnhc1000/pipeline-api

Repository files navigation

Pipelining Data from a Web Page

When dealing with events, in order to get my hands dirty and understand how to build an application that collect analytics data from a web page, where each button clicked or an input field has been filled in or a mouse move has been done, to generates an event and this event be consumed by an API that has a Kafka broker attached and persisting these data to a AWS S3-like object storage, partitioned by hour through a Spark processing and having this data available through Hive SQL tables.
A front-end application was built pipeline-ui in order to generate the events to emulate a user surfing in a web page and collecting some data. Hopefully this codebase can help someone to better understand how a pipeline is built under the hood.

Table of contents

Overview


Requirements

There are some specific requirements to be met, such as authentication and authorization, data persistence, paginated data recovery, authenticated access to endpoints and some other requirements.

The app has been coded using ReactJS, TypeScript, Java 21, Spring Boot 3.3.5, Gradle, Javadoc, Spring Security, Spring JPA, Spark, Kafka, Minio, OpenAPI, PostgresSQL, Docker and hosted in an AWS EC2 instance with secure access provided by a NGINX SSL proxy reverse and being live at Pipeline


Project Structure

  • docs
    • javadocs
  • src
    • main
    • java
      • br.dev.ferreiras.PipelineApiApplication
        • config
        • controller
          • handlers
        • dto
        • entity
        • enums
        • mapper
        • repository
        • services
          • exceptions
    • resources
      • db.migration
      • certs
    • test

Howto Build and Run


"""

...

 ....
"""



Screenshot

Links

Built with

My Skills

Code Snippet

/**
 *
 * @author ricardo@ferreiras.dev.br
 * @version 1.1.030901
 * @since 1.0
 *
 */
package br.dev.ferreiras.pipeline_api.config;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.*;
import org.springframework.kafka.core.*;
import org.springframework.kafka.support.serializer.JsonSerializer;
import java.util.HashMap;
import java.util.Map;


@Configuration
public class KafkaProducerConfig {

    @Bean
    public ProducerFactory<String, Object> producerFactory() {
        Map<String, Object> configProps = new HashMap<>();

        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "172.31.40.1:9092");
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        configProps.put(ProducerConfig.ACKS_CONFIG, "all");
        configProps.put(ProducerConfig.RETRIES_CONFIG, 3);

        return new DefaultKafkaProducerFactory<>(configProps);
    }

    @Bean
    public KafkaTemplate<String, Object> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}

Outcome...

{
        "topic": "click-events",
        "partition": 0,
        "offset": 29,
        "timestamp": 1749676029658,
        "key": null,
        "value": {
        "countryCode": "BR",
        "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
        "formData": {
        "username": "guacadmin",
        "email": "rnhc1000@gmail.com",
        "address": "Rua Lisboa, 757",
        "country": "BR"
        },
        "city": "New City",
        "longitude": 49.4762,
        "timestamp": "2025-06-11T21:07:09.392Z",
        "latitude": 6.5238,
        "referer": "http://192.168.0.18:8000/pages",
        "region": "MA",
        "clientIp": "192.168.0.16",
        "buttonId": "loginButton",
        "cookieData": {
        "user-id": "5bff99ec-4b5a-474d-818c-41a33874c89f",
        "session-id": "",
        "source": "landing-page"
        }
        },
        "headers": "{}"
        }

Continued development

  • Unit Tests - TBD
  • Subscriber Authentication - TBD
    • Spring JWT-OAuth2 - TBD
  • Records Pagination - TBD

Useful resources

Author

Ricardo Ferreira

- Portfolio

My Portfolio...

About

Web page analytics api

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published