
Big Data Developer Resume


Dearborn, MI

SUMMARY:

  • 6+ years of software development experience, including 4+ years of extensive data engineering experience using Big Data/Spark technologies.
  • Strong experience building data pipelines and deploying, monitoring, and maintaining them in production.
  • Good experience with programming languages Scala, Java, and Python.
  • Strong experience working with large datasets and designing highly scalable and optimized data modelling and data integration pipelines.
  • Good understanding of distributed systems architecture and parallel computing paradigms.
  • Strong experience working with the Spark processing framework for large-scale data transformations, cleansing, aggregations, etc.
  • Good experience working with Spark Core, the Spark DataFrame API, Spark SQL, and the Spark Streaming APIs.
  • Strong experience fine-tuning long-running Spark applications and troubleshooting common failures.
  • Utilized various Spark features such as broadcast variables, accumulators, caching/persisting, and dynamic allocation (illustrated in the sketch following this summary).
  • Worked on real-time data integration using Kafka and Spark Streaming.
  • Experience developing Kafka producers and consumers for streaming millions of events.
  • Strong experience working with various Hadoop ecosystem components such as HDFS, Hive, HBase, Sqoop, Oozie, Impala, YARN, and Hue.
  • Strong experience using Hive for creating centralized data warehouses and data modelling for efficient data access.
  • Strong experience creating partitioned tables in Hive and bucketing for improving large join performance.
  • Extensive experience utilizing AWS cloud services such as S3, EMR, Redshift, Athena, and the Glue metastore for building and managing data lakes natively on the cloud.
  • Hands on experience in importing and exporting data into HDFS and Hive using Sqoop.
  • Exposure to the NoSQL databases HBase and Cassandra.
  • Extensive experience working with structured, semi-structured, and unstructured data by implementing complex MapReduce programs.
  • Experience with the design, development, and maintenance of ongoing metrics, reports, analyses, and dashboards in Tableau to drive key business decisions and communicate key concepts to stakeholders.
  • Experience using various Hadoop Distributions (Cloudera, Hortonworks, etc.) to fully implement and leverage new Hadoop features.
  • Good exposure to other cloud providers GCP and Azure and utilized Azure Databricks for learning and experimentation.
  • Strong experience as a core Java developer building REST APIs and other integration applications.
  • Strong experience working with databases such as Oracle, DB2, Teradata, and MySQL, with proficiency in writing complex SQL queries.
  • Great team player and quick learner with effective communication, motivation, and organizational skills, combined with attention to detail and business improvement.
  • Experienced across the complete SDLC, including requirements gathering, design, development, testing, and production support.
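
As a concrete illustration of the Spark features listed above, here is a minimal Scala sketch of a broadcast join with caching. It is only a sketch: it assumes a running SparkSession, and the S3 paths, table layout, and column names are hypothetical placeholders rather than anything from the original projects.

    // Minimal sketch of a broadcast join with caching in Spark/Scala.
    // Paths and column names are hypothetical.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("broadcast-join-sketch").getOrCreate()

        // Large fact table and a small dimension table.
        val orders   = spark.read.parquet("s3://bucket/orders/")
        val products = spark.read.parquet("s3://bucket/dim_product/")

        // Broadcasting the small side avoids shuffling the large fact table.
        val enriched = orders.join(broadcast(products), Seq("product_id"))

        // Cache the result because several downstream aggregations reuse it.
        enriched.cache()
        enriched.groupBy("category").count().show()

        spark.stop()
      }
    }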

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, HBase, Cassandra, Parquet and Snappy.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, Azure Databricks, Azure Data Lake

Languages: Java, SQL, Scala, Python

NoSQL Databases: HBase and Snowflake

Methodology: Agile, Waterfall

Development / Build Tools: Eclipse, Maven, IntelliJ, JUnit, and Log4j

Databases / DB Languages: MySQL, PL/SQL, PostgreSQL, and Oracle

PROFESSIONAL EXPERIENCE:

Big Data Developer

Confidential

Responsibilities:

  • Developed Spark applications to implement various data cleansing/validation and processing activities for large-scale datasets ingested from traditional data warehouse systems.
  • Worked with both batch and real-time streaming data sources.
  • Developed custom Kafka producers to write the streaming messages from external REST applications to Kafka topics.
  • Developed Spark Streaming applications to consume the streaming JSON messages from Kafka topics (see the sketch below).
  • Developed data transformation jobs using Spark DataFrames to flatten JSON documents.
  • Worked with Spark on improving the performance and optimization of existing transformations.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model, which receives data from Kafka in near real time and persists it to HBase.
  • Worked with, and learned a great deal from, AWS cloud services such as EMR, S3, RDS, Redshift, Athena, and Glue.
  • Migrated existing on-premises data pipelines to AWS.
  • Worked on automating provisioning of AWS EMR clusters.
  • Used HiveQL to analyze the partitioned and bucketed data and executed Hive queries on Parquet tables to perform data analysis meeting the business specification logic.
  • Experienced in using the Avro, Parquet, ORC, and JSON file formats; developed UDFs in Hive.
  • Worked with the Log4j framework for logging debug, info, and error data.
  • Used Jenkins for Continuous integration.
  • Generated various kinds of reports using Tableau based on client specifications.
  • Used Jira for bug tracking and Git to check in and check out code changes.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams; worked extensively on Agile methodology projects.
  • Worked with Scrum team in delivering agreed user stories on time for every Sprint.
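
As a rough illustration of the streaming work described above, the sketch below consumes JSON messages from a Kafka topic with Spark Structured Streaming, flattens the payload with the DataFrame API, and writes Parquet output. The topic name, schema fields, broker address, and paths are hypothetical placeholders, not the original code, and the spark-sql-kafka package is assumed to be on the classpath.

    // Illustrative Structured Streaming sketch: Kafka JSON in, flattened Parquet out.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

    object KafkaJsonStreamSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-json-stream").getOrCreate()

        // Expected shape of the incoming JSON messages (hypothetical fields).
        val schema = new StructType()
          .add("learnerId", StringType)
          .add("eventType", StringType)
          .add("eventTime", TimestampType)

        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "learner-events")
          .load()

        // Parse the Kafka value bytes as JSON and flatten the struct into columns.
        val flattened = raw
          .select(from_json(col("value").cast("string"), schema).as("event"))
          .select("event.*")

        flattened.writeStream
          .format("parquet")
          .option("path", "s3://bucket/learner-events/")
          .option("checkpointLocation", "s3://bucket/checkpoints/learner-events/")
          .start()
          .awaitTermination()
      }
    }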

Environment: AWS, S3, EMR, Spark, Kafka, Hive, Athena, Glue, Redshift, Teradata, Tableau

Data Engineer

Confidential, Dearborn, MI

Responsibilities:

  • Ingested gigabytes of clickstream data daily from external sources such as FTP servers and S3 buckets using customized, home-grown input adapters.
  • Created Spark JDBC ingestion jobs to import/export data between RDBMS sources and the S3 data store (see the sketch below).
  • Developed various Spark applications using Scala to perform cleansing, transformation, and enrichment of this clickstream data.
  • Involved in data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning and reporting.
  • Troubleshot Spark applications for improved error tolerance and reliability.
  • Fine-tuned Spark applications/jobs to improve the efficiency and overall processing time of the pipelines.
  • Created a Kafka producer API to send live-stream JSON data into various Kafka topics.
  • Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into Snowflake.
  • Utilized Spark's in-memory capabilities to handle large datasets.
  • Used broadcast variables in Spark, effective and efficient joins, transformations, and other capabilities for data processing.
  • Experienced in working with EMR clusters and S3 in the AWS cloud.
  • Created Hive tables and loaded and analyzed data using Hive scripts; implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Involved in continuous integration of the application using Jenkins.
  • Interacted with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
  • Followed Agile methodologies while working on the project.
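
A minimal sketch of the kind of Spark JDBC ingestion job mentioned above, landing an RDBMS table in S3 as Parquet. The connection URL, table name, credentials (read from environment variables), partitioning bounds, and target path are all hypothetical.

    // Sketch: parallel JDBC extract from an RDBMS into S3 as partitioned Parquet.
    import org.apache.spark.sql.SparkSession

    object JdbcToS3Sketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("jdbc-to-s3").getOrCreate()

        // Read the source table over JDBC, split into parallel partitions on a
        // numeric key to avoid a single-threaded extract.
        val source = spark.read
          .format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")
          .option("dbtable", "SALES.CLICKSTREAM_DAILY")
          .option("user", sys.env("DB_USER"))
          .option("password", sys.env("DB_PASSWORD"))
          .option("partitionColumn", "ID")
          .option("lowerBound", "1")
          .option("upperBound", "1000000")
          .option("numPartitions", "8")
          .load()

        // Land the extract in S3 as date-partitioned Parquet.
        source.write
          .mode("overwrite")
          .partitionBy("LOAD_DATE")
          .parquet("s3://bucket/raw/clickstream_daily/")

        spark.stop()
      }
    }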

Environment: AWS EMR, Spark, Snowflake, Hive, HDFS, Sqoop, Kafka, Scala, Java, S3, CloudWatch, AWS Simple Workflow (SWF)

Big Data/Hadoop Developer

Confidential

Responsibilities:

  • Worked on installing Kafka on a virtual machine and created topics for different users.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Responsible for importing real-time data from source systems into Kafka clusters.
  • Worked with Spark performance-improvement options such as broadcasting, caching, repartitioning, and modifying Spark executor configurations.
  • Implemented Spark RDD transformations to map business logic and applied actions on top of the transformations.
  • Involved in migrating MapReduce jobs to Spark jobs and used Spark and the DataFrames API to load structured data into Spark clusters.
  • Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive and, after processing and analyzing the data in Spark SQL, delivered it to the BI team for report generation.
  • Performed SQL Joins among Hive tables to get input for Spark batch process.
  • Worked with the data science team to build statistical models with Spark MLlib and PySpark.
  • Involved in importing data from various sources to the Cassandra cluster using Sqoop.
  • Worked on creating Cassandra data models from the existing Oracle data model.
  • Designed column families in Cassandra; ingested data from the RDBMS, performed data transformations, and then exported the transformed data to Cassandra per the business requirements (see the sketch below).
  • Used Sqoop import functionality for loading historical data from the RDBMS into HDFS.
  • Designed workflows and coordinators in Oozie to automate and parallelize Hive jobs on the Hortonworks (HDP 2.2) Hadoop environment.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Created data pipelines per the business requirements and scheduled them using Oozie coordinators.
  • Wrote Python scripts to parse XML documents and load the data into the database.
  • Worked extensively with Apache NiFi to build flows for the existing Oozie jobs to handle incremental loads, full loads, and semi-structured data, to pull data from REST APIs into Hadoop, and to automate all NiFi flows to run incrementally.
  • Created NiFi flows to trigger Spark jobs and used PutEmail processors to get notifications of any failures.
  • Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to Hive and the AWS cloud.
  • Used version control tools such as GitHub to share code among team members.
  • Involved in daily Scrum meetings to discuss development progress and was active in making the meetings more productive.
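
A hedged sketch of the RDBMS-to-Cassandra export described above, written with the DataStax spark-cassandra-connector. The keyspace, table, column names, staging path, and filter are hypothetical, and the connector package is assumed to be on the classpath.

    // Sketch: write a transformed DataFrame to a Cassandra table.
    import org.apache.spark.sql.SparkSession

    object RdbmsToCassandraSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("rdbms-to-cassandra")
          .config("spark.cassandra.connection.host", "cassandra-host")
          .getOrCreate()

        // Source data previously staged in HDFS (path is a placeholder).
        val customers = spark.read.parquet("hdfs:///staging/customers/")

        // Simple transformation before export: keep active customers only.
        val active = customers.filter("status = 'ACTIVE'")

        // Write to the target Cassandra keyspace/table.
        active.write
          .format("org.apache.spark.sql.cassandra")
          .option("keyspace", "crm")
          .option("table", "customers_by_id")
          .mode("append")
          .save()

        spark.stop()
      }
    }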

Environment: Hadoop, HDFS, Hive, Python, HBase, NiFi, Spark, MySQL, Oracle 12c, Linux, Hortonworks, Oozie, MapReduce, Sqoop, Shell Scripting, Apache Kafka, Scala, AWS.

Hadoop Developer

Confidential

Responsibilities:

  • Analyzed functional specifications based on project requirements.
  • Ingested data from various data sources into Hadoop HDFS/Hive tables using Sqoop, Flume, and Kafka.
  • Extended Hive core functionality by writing custom UDFs in Java (see the sketch below).
  • Developed Hive queries for the user requirements.
  • Worked on multiple POCs implementing a data lake for multiple data sources, including Teamcenter, SAP, Workday, and machine logs.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Worked on the MS SQL Server PDW migration for the MSBI warehouse.
  • Planned, scheduled, and implemented Oracle to MS SQL Server migrations for AMAT in-house applications and tools.
  • Worked on the Solr search engine to index incident report data and developed dashboards in the Banana reporting tool.
  • Integrated Tableau with Hadoop data sources to build dashboards providing various insights into the organization's sales.
  • Worked with Spark to build BI reports in Tableau; Tableau was integrated with Spark using Spark SQL.
  • Developed Spark jobs using Scala and Python on top of YARN/MRv2 for interactive and batch analysis.
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
  • Developed workflows in LiveCompare to analyze SAP data and reporting.
  • Worked on Java development, support, and tools support for in-house applications.
  • Participated in daily Scrum meetings and iterative development.
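
The custom UDF work noted above was done as Java Hive UDFs; as an analogous sketch in Scala (the language used for the other examples here), the snippet below registers a Spark SQL UDF and applies it to a hypothetical Hive table. The function name, masking logic, and table are illustrative only, and Hive support is assumed to be available to the SparkSession.

    // Sketch: a Scala UDF registered in Spark SQL, analogous to a Hive UDF.
    import org.apache.spark.sql.SparkSession

    object SparkSqlUdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("spark-sql-udf-sketch")
          .enableHiveSupport() // assumes a Hive metastore is configured
          .getOrCreate()

        // Custom function: mask all but the last four characters of an ID.
        spark.udf.register("mask_id", (id: String) =>
          if (id == null || id.length <= 4) id
          else "*" * (id.length - 4) + id.takeRight(4))

        // Use the UDF in plain SQL over a Hive table (table name is a placeholder).
        spark.sql(
          "SELECT mask_id(incident_id) AS masked_id, severity FROM reports.incidents"
        ).show()

        spark.stop()
      }
    }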

Environment: Hadoop, Hive, Sqoop, Spark, Kafka, Scala, MS SQL Server, Java.
