
Big Data Engineer Resume


Bloomington, IL

SUMMARY

  • Over 7 years of experience in software design, development, maintenance, testing, and troubleshooting of enterprise applications.
  • Over 4 years of experience in the design, development, maintenance, and support of Big Data analytics using Hadoop ecosystem components such as HDFS, Hive, Pig, Sqoop, ZooKeeper, MapReduce, and Oozie.
  • Strong working experience with the ingestion, storage, processing, and analysis of big data.
  • Good experience writing MapReduce programs using Scala.
  • Expertise in writing Hadoop jobs for analyzing data using Hive and Pig.
  • Good experience designing NiFi flows for data routing, transformation, and mediation logic.
  • Experience with SQL, PL/SQL, and NoSQL database concepts.
  • Good experience with job workflow scheduling tools such as Oozie.
  • Experience creating databases, tables, and views in Hive and Impala.
  • Experience with performance tuning of MapReduce and Hive jobs.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop ecosystem components.
  • Implemented several optimization mechanisms such as combiners, distributed cache, data compression, and custom partitioners to speed up jobs (see the sketch at the end of this summary).
  • Worked with Sqoop to import and export data between databases such as MySQL and Oracle and HDFS/Hive.
  • Good exposure to ETL and reporting tools such as Informatica and Tableau.
  • Experience with cloud configuration in Amazon Web Services (AWS).
  • Experience with different data formats such as JSON, Avro, Parquet, RC, and ORC, and compression codecs such as Snappy and bzip2.
  • Proficient in PL/SQL programming - Stored Procedures, Functions, Packages, SQL tuning, and creation of Oracle Objects - Tables, Views, Materialized Views, Triggers, Sequences, Database Links, and User Defined Data Types
  • Experience in various phases of the Software Development Life Cycle (analysis, requirements gathering, design), with expertise in documenting requirement specifications, functional specifications, test plans, source-to-target mappings, and SQL joins.
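
A minimal sketch of one of the optimizations listed above: a custom partitioner for a Hadoop MapReduce job, written in Scala. The key layout (records keyed by a region prefix) and the class name are illustrative assumptions, not taken from any particular project.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Hypothetical custom partitioner: routes records whose key carries a region
// prefix (e.g. "US-1234") so that each region lands on a predictable reducer.
class RegionPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val region = key.toString.takeWhile(_ != '-')           // prefix before the first '-'
    (region.hashCode & Integer.MAX_VALUE) % numPartitions   // non-negative bucket index
  }
}
```

The partitioner would be attached to a job with job.setPartitionerClass(classOf[RegionPartitioner]).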

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, HDFS, ZooKeeper, Hive, Pig, Sqoop, Oozie, Flume, YARN, Spark.

DB Languages: SQL, PL/SQL, Teradata, Oracle

Programming Languages: Java, C, and Scala.

Frameworks: Spring.

Scripting Languages: JSP & Servlets, JavaScript, Python

Web Services: RESTful

Databases: RDBMS, HBase, Cassandra, MongoDB

Tools: Eclipse, NetBeans.

Platforms: Windows, Linux, Unix

Application Servers: Apache Tomcat.

Methodologies: Agile, Waterfall

ETL & Reporting Tools: Informatica, Tableau

PROFESSIONAL EXPERIENCE

Confidential, Bloomington, IL

Big Data Engineer

Responsibilities:

  • Wrote Scala code for different transformations in Spark, creating RDDs over the ingested data.
  • Wrote processing logic based on client requirements.
  • Worked with ORC and JSON file formats and used various compression techniques to optimize storage in HDFS.
  • Designed and developed recommendation workflows.
  • Hands-on experience on the AWS platform with EC2, S3, and EMR.
  • Tuned EMRFS so that Hadoop reads and writes directly to AWS S3 in parallel with good performance.
  • Knowledge of Amazon EC2 Spot integration and Amazon S3 integration.
  • Implemented Spark Streaming in Scala to process JSON messages and push them to a Kafka topic (see the sketch after this list).
  • Loaded structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Performed integration testing on individual software modules.
  • Worked with the infrastructure team to release builds to production.
  • Worked with the architecture review team on any changes to data feeds or design.
  • Responsible for monitoring the application and keeping it performing well.
  • Responsible for optimizing resource allocation in distributed systems.
  • Migrated existing data from an RDBMS (Oracle) to Hadoop using Sqoop for processing.
  • Used Sqoop export to move data back to the RDBMS.
  • Used GitHub to set the overall direction of the project and track its progress.
  • Used various compression codecs to compress the data in HDFS effectively.
  • Created Hive tables and loaded and analyzed data using Hive queries.
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Worked on various production issues during month-end support and provided resolutions without missing any SLAs.
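
A minimal sketch of the Spark Streaming flow referenced above, assuming JSON messages arrive one per line on a TCP socket and are forwarded to a Kafka topic; the host, port, broker address, and topic name are placeholders.

```scala
import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JsonToKafka {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("json-to-kafka"), Seconds(5))

    // Placeholder source: one JSON document per line on a TCP socket.
    val lines = ssc.socketTextStream("ingest-host", 9999)

    // Light validation before publishing; real logic would parse and enrich the JSON.
    val cleaned = lines.filter(_.trim.nonEmpty)

    cleaned.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Create one producer per partition so it is not serialized with the closure.
        val props = new Properties()
        props.put("bootstrap.servers", "broker:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        records.foreach(json => producer.send(new ProducerRecord[String, String]("events", json)))
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```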

Environment: Hive, Spark, Scala, Spark SQL, HDFS, SAP BO, Sqoop, Kafka, Spark Streaming, AWS.

Confidential, Seattle, WA

Big Data Engineer

Responsibilities:

  • Implemented workflows to process around 400 messages per second and push them to DocumentDB as well as Event Hubs.
  • Developed a custom message producer that can generate about 4,000 messages per second for scalability testing.
  • Implemented call-back and notification architectures for real-time data.
  • Implemented Spark Streaming in Scala to process JSON messages and push them to a Kafka topic.
  • Implemented applications that process weather images and send the data in JSON format.
  • Created custom dashboards in Azure using Application Insights and its query language to process the metrics sent to Application Insights.
  • Created real-time streaming dashboards in Power BI, using Stream Analytics to push datasets to Power BI.
  • Developed a custom message consumer to consume data from the Kafka producer and push the messages to Service Bus and Event Hub (Azure components).
  • Wrote auto-scaling functions that consume data from Azure Service Bus or Azure Event Hub and send it to DocumentDB.
  • Wrote a Spark application to capture the change feed from DocumentDB using the Java API and write the updates to a new DocumentDB.
  • Created DStreams from sources such as Flume and Kafka and performed various Spark transformations and actions on them (see the sketch after this list).
  • Used different Spark APIs to perform the necessary transformations and actions on data arriving from Kafka in real time.
  • Applied various parsing techniques with the Spark APIs to cleanse the data from Kafka.
  • Worked with Spark SQL on different file formats such as Avro and Parquet.
  • Implemented zero-downtime deployment for all production pipelines in Azure.
  • Implemented CI/CD pipelines to build and deploy the projects in the Hadoop environment.
  • Implemented the pipelines in Jenkins.
  • Used custom receivers, socket streams, file streams, and directory streams in Spark Streaming.
  • Used Lambda, Kinesis, DynamoDB, and CloudWatch from AWS.
  • Used Application Insights, DocumentDB, Service Bus, Azure Data Lake Store, Azure Blob Storage, Event Hub, and Azure Functions.
  • Used Python to run the Ansible playbook that deploys the Logic Apps to Azure.
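
A minimal sketch of creating a DStream from Kafka and applying transformations and an action, as described above, using the spark-streaming-kafka-0-10 direct stream API; the broker address, consumer group, and topic name are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaDStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-dstream"), Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "weather-consumers",
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream over the placeholder topic "weather-json".
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("weather-json"), kafkaParams))

    // Example transformations and action: keep non-empty payloads, count them per batch.
    stream.map(_.value)
      .filter(_.trim.nonEmpty)
      .count()
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```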

Environment: Hadoop, Hive, HDFS, Azure, AWS, Spark Streaming, Spark SQL, Scala, Python, Java, web servers, Maven, Jenkins, Ansible.

Confidential, Atlanta, Georgia

Big Data Engineer

Responsibilities:

  • Used NiFi to ingest data from various sources into the data lake.
  • Created Hive managed and external tables.
  • Used Kafka for the streaming application.
  • Created topics in the Kafka broker that receive data from the sources via NiFi; a Spark job consumes the data and pushes it into Azure DocumentDB.
  • Worked with partitioning, bucketing, and other optimizations in Hive (see the sketch after this list).
  • Developed and implemented core API services using Spark with Scala.
  • Used Rally to track the user stories and tasks to be completed in each sprint.
  • Ingested data from Hive into Spark, created DataFrames, and wrote the updates to Azure DocumentDB.
  • Prepared data for business users with Paxata (a data preparation tool).
  • Worked on various production issues during month-end support and provided resolutions without missing any SLAs.
  • Used GitHub to set the overall direction of the project and track its progress.
  • Used Paxata to deliver data to BI users for building dashboards on power generation trends and people without electricity.
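
A minimal sketch of the Hive partitioning and bucketing mentioned above, issued through Spark SQL from Scala; the database, table, columns, bucket count, and HDFS location are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object HiveLayoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partition-bucket")
      .enableHiveSupport()
      .getOrCreate()

    // External table partitioned by day and bucketed by meter_id so that
    // lookups and joins on meter_id touch fewer files.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS energy.readings (
        meter_id STRING,
        reading  DOUBLE,
        recorded TIMESTAMP
      )
      PARTITIONED BY (ds STRING)
      CLUSTERED BY (meter_id) INTO 32 BUCKETS
      STORED AS ORC
      LOCATION '/data/lake/energy/readings'
    """)

    // Register any partitions already written under the external location.
    spark.sql("MSCK REPAIR TABLE energy.readings")

    spark.stop()
  }
}
```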

Environment: CDH, Hadoop, HDFS, Map Reduce, Hive, Pig, Scala, Spark, Sqoop, UNIX.

Confidential, New York City, New York

Hadoop Engineer

Responsibilities:

  • Processed large sets of structured, semi-structured, and unstructured data and supported the systems application architecture.
  • Participated in multiple big data POCs to evaluate different architectures, tools, and vendor products.
  • Designed and implemented a Spark Streaming application.
  • Pulled data from an Amazon S3 bucket into HDFS.
  • Spun up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
  • Worked with various file formats such as Avro and ORC.
  • Used Sqoop to pull history data from Teradata and load it into HDFS.
  • Built a DI/DQ utility to validate the data between Teradata and the Hadoop Distributed File System (see the sketch after this list).
  • Responsible for optimizing resource allocation in distributed systems.
  • Responsible for monitoring the application and keeping it performing well.
  • Used Bedrock (an automation tool) to perform incremental Hive updates.
  • Optimized the Hive queries using Spark, bringing cluster usage down from 80% of resources to 20% by using coalesce and repartition.
  • Worked on a POC to evaluate the performance of Paxata (a data wrangling tool).
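
A minimal sketch of the kind of DI/DQ check referenced above: comparing row counts between a Teradata source table read over JDBC and the corresponding Hive table backed by the data landed in HDFS. The JDBC URL, credentials, and table names are placeholders, and the Teradata JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object RowCountCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("di-dq-rowcount")
      .enableHiveSupport()
      .getOrCreate()

    // Source-side count from Teradata over JDBC (placeholder connection details).
    val sourceCount = spark.read.format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=SALES")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "SALES.ORDERS_HISTORY")
      .option("user", "svc_user")
      .option("password", sys.env.getOrElse("TD_PASSWORD", ""))
      .load()
      .count()

    // Target-side count from the Hive table over the sqooped HDFS data.
    val targetCount = spark.table("staging.orders_history").count()

    if (sourceCount == targetCount) println(s"PASS: $sourceCount rows on both sides")
    else println(s"FAIL: Teradata=$sourceCount, Hive/HDFS=$targetCount")

    spark.stop()
  }
}
```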

Environment: CDH, Hadoop, HDFS, Map Reduce, Hive, Pig, Sqoop, Spark, UNIX.

Confidential

Java/UI Developer

Responsibilities:

  • Involved in the design and development of requirements in an SDLC that followed the Waterfall methodology.
  • Developed the application in the Eclipse IDE using the Spring framework and MVC architecture.
  • Worked with users to analyze the requirements and implemented them using Java frameworks.
  • Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures (see the sketch after this list).
  • Developed the presentation layer and responsive GUI using JSP, HTML5, CSS/CSS3, and Bootstrap; client-side validations were done using Spring MVC, XSLT, and jQuery.
  • Transformed XML into HTML using XSLT with the DOM parser and TransformerFactory, and hosted the application on a WebLogic server.
  • Worked with Spring IoC, Spring MVC, the Spring Messaging framework, and Spring AOP to develop application service components.
  • Used JavaScript, jQuery, and the Ajax API for intensive user operations and client-side validations.
  • Used JDBC to connect to the Oracle database.
  • Implemented database application programming for Oracle and PostgreSQL servers using stored procedures, triggers, and views.
  • Wrote SQL queries and stored procedures for data manipulation in the Oracle database.
  • Developed web services using SOAP, WSDL, and REST.
  • Used SOAP (Simple Object Access Protocol) web services to exchange XML data between the applications.
  • Configured Log4j for logging and debugging and maintained the pom.xml in Maven to manage compile-time dependencies.
  • Performed unit testing using JUnit and supported the system in production.
  • Used Ant to build the deployment JAR and WAR files.
  • Used core Java concepts such as collections, garbage collection, multithreading, and OOP, along with APIs for encryption and compression of incoming requests, to provide security.
  • Wrote and implemented test scripts to support test-driven development (TDD) and continuous integration.
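
The JDBC work above was done in Java; the sketch below shows the same pattern (a parameterized query and a stored-procedure call over java.sql), written in Scala to match the other sketches in this resume. The connection URL, credentials, table, and procedure name are hypothetical.

```scala
import java.sql.DriverManager

object JdbcSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder Oracle connection details.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@db-host:1521:ORCL", "app_user", sys.env.getOrElse("DB_PASSWORD", ""))
    try {
      // Parameterized query.
      val ps = conn.prepareStatement("SELECT name, status FROM accounts WHERE id = ?")
      ps.setLong(1, 42L)
      val rs = ps.executeQuery()
      while (rs.next()) println(s"${rs.getString("name")} -> ${rs.getString("status")}")
      rs.close(); ps.close()

      // Stored-procedure call (procedure name is illustrative).
      val cs = conn.prepareCall("{ call update_account_status(?, ?) }")
      cs.setLong(1, 42L)
      cs.setString(2, "ACTIVE")
      cs.execute()
      cs.close()
    } finally {
      conn.close()
    }
  }
}
```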

Environment: Eclipse, Java, JavaScript, jQuery, Ajax, SQL, Oracle 10g, Maven, Log4j, HTML4, CSS2, Bootstrap 2.0, XML, PostgreSQL
