
Sr. Big Data Engineer Resume


Franklin Lakes, NJ

PROFESSIONAL SUMMARY:

  • 7+ years of Software Engineering experience, including 5+ years as a Big Data/Hadoop Engineer working with AWS.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Hands-on experience in Test-Driven Development and Software Development Life Cycle (SDLC) methodologies such as Agile and Scrum.
  • Experienced in performing real-time analytics on distributed NoSQL databases such as Cassandra, HBase and MongoDB.
  • Good understanding of designing attractive data visualization dashboards using Tableau.
  • Good experience developing Scala scripts and UDFs using both DataFrames and RDDs in Spark for data aggregation, queries, and writing data back into OLTP systems.
  • Hands-on experience using Spark's Scala API to develop data ingestion pipelines with Kafka.
  • Experience implementing services using a Microservices architecture in which services are deployed independently; implemented Spring Boot Microservices to divide the application into sub-modules.
  • Hands-on experience designing and developing POCs in Spark, using Scala, to compare the performance of Spark with Hive and SQL/Oracle.
  • Used Flume and Kafka to direct data from different sources to/from HDFS.
  • Worked with the AWS cloud and created EMR clusters with Spark to analyze and process raw data accessed from S3 buckets.
  • Scripted an ETL pipeline in Python that ingests files from AWS S3 into Redshift tables.
  • Hands on experience with various file formats such as ORC, Avro, Parquet and JSON.
  • Knowledge of using the Databricks platform, Cloudera Manager and Hortonworks Distribution to monitor and manage clusters.
  • Excellent implementation knowledge of Enterprise/Web/Client-Server applications using Java and J2EE.
  • Expertise in working with Linux/Unix and shell commands on the Terminal.
  • Expertise with Python, Scala and Java in the design, development, administration and support of large-scale distributed systems.
  • Experience working with CQL (Cassandra Query Language) to retrieve data from Cassandra clusters.
  • Ability to develop MapReduce programs using Java and Python (see the sketch after this list).
  • Good knowledge of Amazon AWS concepts such as the EMR and EC2 web services, which provide fast and efficient processing.
  • Good understanding and exposure to Python programming.
  • Experience in using various IDEs (Eclipse, IntelliJ) and repositories (SVN, Git).
  • Exported and imported data to and from Oracle using SQL Developer for analysis.
  • Good experience in using Sqoop for traditional RDBMS data pulls and worked with different distributions of Hadoop like Hortonworks and Cloudera.
  • Experience in designing components using UML Use Case, Class, Sequence, Deployment and Component diagrams for the requirements.
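
A minimal sketch of the kind of MapReduce program referenced above, written in Java as a map-only data-cleansing job; the input format, field delimiter, and class names are illustrative assumptions, not code from any of the projects below.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Map-only job that drops CSV records containing empty fields.
    public class CleanseRecords {

        public static class CleanseMapper extends Mapper<Object, Text, NullWritable, Text> {
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",", -1);
                for (String field : fields) {
                    if (field.trim().isEmpty()) {
                        return;             // skip records with missing values
                    }
                }
                context.write(NullWritable.get(), value);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cleanse-records");
            job.setJarByClass(CleanseRecords.class);
            job.setMapperClass(CleanseMapper.class);
            job.setNumReduceTasks(0);       // map-only: no reduce phase needed
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }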

TECHNICAL SKILLS:

Big data/Hadoop: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Hue, Flume, Kafka and Spark

NoSQL Databases: HBase, MongoDB & Cassandra

Cloud Services: Amazon AWS, EC2, Redshift, MS Azure

Java/J2EE Technologies: Servlets, JSP, JDBC, JSTL, EJB, JAXB, JAXP, JMS, JAX-RPC, JAX-WS

Programming Languages: Java, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Scala

Methodologies: Software Development Lifecycle (SDLC), Waterfall Model and Agile, STLC (Software Testing Life cycle) & UML, Design Patterns (Core Java and J2EE)

Database: Oracle 12c/11g, MYSQL, SQL Server 2016/2014

Web/ Application Servers: WebLogic, Tomcat, JBoss

Web Technologies: HTML5, CSS3, XML, JavaScript, JQuery, AJAX, WSDL, SOAP

Tools and IDE: Eclipse, NetBeans, Maven, DB Visualizer, Visual Studio 2008, SQL Server Management Studio

PROFESSIONAL EXPERIENCE:

Confidential, Franklin Lakes, NJ

Sr. Big Data Engineer

Responsibilities:

  • Working as a Sr. Big Data Engineer with Big Data and Hadoop ecosystem components.
  • Worked in an Agile development environment and participated in daily scrums and other design-related meetings.
  • Installed and Configured Apache Hadoop clusters for application development and Hadoop tools.
  • Installed and configured Hive, wrote Hive UDFs, and used a repository of UDFs for Pig Latin.
  • Developed data pipelines using Pig and Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
  • Installed Kafka on the Hadoop cluster and configured producers and consumers in Java to establish a connection from the source to HDFS with popular hashtags.
  • Worked on Spark SQL: created DataFrames by loading data from Hive tables, created prep data and stored it in AWS S3 (see the sketch after this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Created FTP scripts to Sqoop data from DB2 and save it in AWS in Avro format.
  • Developed Nifi flow to move data from different sources to HDFS and from HDFS to S3 buckets.
  • Worked with NoSQL databases like HBase in creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building common learner data model which gets the data from Kafka in near real-time and persist it to HBase.
  • Developed various data loading strategies and performed various transformations for analyzing the datasets by using Hortonworks Distribution for Hadoop ecosystem.
  • Developed Spark scripts using Java and Python shell commands as per the requirements.
  • Involved in ingesting data received from various relational database providers onto HDFS for analysis and other big data operations.
  • Used AWS infrastructure to host the portal.
  • Used EC2, RDS, and S3 features of AWS.
  • Deployed code into the EMR Cluster in S3 buckets.
  • Migrated the existing on-prem code to an AWS EMR cluster.
  • Implemented Spark to migrate MapReduce jobs into Spark RDD transformations and streamed data using Spark Streaming.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data.
  • Worked with teams in setting up AWS EC2 instances by using different AWS services.
  • Extensively worked with Cloudera Hadoop distribution components and custom packages.
  • Implemented AWS Redshift (a petabyte-scale data warehouse service in the cloud).
  • Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
  • Wrote complex Hive queries and UDFs in Java and Python.
  • Analyzed the data by performing Hive queries (Hive QL), Pig scripts, Spark SQL and Spark streaming.
  • Developed tools using Python, Shell scripting, XML to automate some of the menial tasks.
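
A minimal sketch of the Spark SQL step referenced above (loading a Hive table into a DataFrame, preparing the data, and storing it in S3). The production code was written in Scala and Python; this Java version uses the equivalent API, and the database, table, column, and bucket names are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class HivePrepToS3 {
        public static void main(String[] args) {
            // Hive-enabled session so spark.sql() can read managed Hive tables.
            SparkSession spark = SparkSession.builder()
                    .appName("hive-prep-to-s3")
                    .enableHiveSupport()
                    .getOrCreate();

            // Load a Hive table into a DataFrame and keep only the rows needed downstream.
            Dataset<Row> prep = spark.sql(
                    "SELECT customer_id, order_date, amount FROM warehouse.orders WHERE amount IS NOT NULL");

            // Persist the prepared data to S3 as Parquet.
            prep.write().mode("overwrite").parquet("s3a://example-bucket/prep/orders/");

            spark.stop();
        }
    }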

Environment: Hadoop 3.0, Agile, Hive 2.3, SQL, Spark, Python, AWS, MVC, NoSQL, HBase 2.1, Hortonworks, XML

Confidential, San Jose, CA

Big Data Engineer

Responsibilities:

  • Worked as a Big Data Engineer providing solutions to big data problems.
  • Understood business requirements and was involved in preparing design documents according to client requirements.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and Scala for data cleaning and preprocessing.
  • Installed and configured Hive, wrote Hive UDFs, and used MapReduce and JUnit for unit testing.
  • Responsible for automating build processes towards CI/CD automation goals.
  • Installed and configured Hadoop ecosystem components such as Hive, Oozie and Sqoop on a Cloudera Hadoop cluster, helping with performance tuning and monitoring.
  • Involved in Agile methodologies, daily scrum meetings and sprint planning.
  • Analyzed the existing data flow to the warehouses and took a similar approach to migrate the data into HDFS.
  • Used partitioning, bucketing, map-side joins and parallel execution to optimize Hive queries, decreasing execution time from hours to minutes.
  • Defined the application architecture and design for the Big Data Hadoop initiative to maintain structured and unstructured data; created the architecture for the enterprise.
  • Identified data sources, created source-to-target mappings, estimated storage, and provided support for Hadoop cluster setup and data partitioning.
  • Involved in gathering requirements from the client and estimating timelines for developing complex Hive queries for the logistics application.
  • Worked with cloud provisioning team on a capacity planning and sizing of the nodes (Master and Slave) for an AWS EMR Cluster.
  • Responsible for creating an instance on Amazon EC2 (AWS) and deployed the application on it.
  • Worked with Amazon EMR to process data directly in S3 and to copy data from S3 to the Hadoop Distributed File System (HDFS) on the Amazon EMR cluster, setting up Spark Core for analysis work.
  • Exposure to Spark architecture and how RDDs work internally, processing data from local files, HDFS and RDBMS sources by creating RDDs and optimizing for performance.
  • Involved in data pipeline using Pig, Sqoop to ingest cargo data and customer histories into HDFS for analysis.
  • Created HBase tables to load large sets of semi-structured data coming from various sources.
  • Responsible for loading the customer's data and event logs from Kafka into HBase using REST API.
  • Created custom Scala UDFs for Spark and Kafka procedures to cover functionality that was not working in the production environment.
  • Developed workflows in Oozie and scheduled jobs on mainframes, preparing the data refresh strategy and capacity planning documents required for project development and support.
  • Worked with different Oozie actions, such as Sqoop, Pig, Hive and shell actions, to design workflows.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
  • Implemented Kafka consumers to move data from Kafka partitions into HBase for near real-time analysis (see the sketch after this list).
  • Involved in collecting and aggregating large amounts of log data using Flume and staging data in HDFS for further analysis.
  • Ingested all formats of structured and unstructured data including Logs/Transactions, Relational Databases using Sqoop & Flume into HDFS.
  • Assisted with data capacity planning and node forecasting and collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
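
A minimal sketch of a Kafka consumer that moves records into HBase, as referenced above; it uses the HBase Java client rather than the REST API mentioned earlier, assumes Kafka clients 2.x for poll(Duration), and the topic, table, and column family names are illustrative.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KafkaToHBase {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");
            props.put("group.id", "event-log-loader");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = hbase.getTable(TableName.valueOf("event_logs"))) {

                consumer.subscribe(Collections.singletonList("customer-events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        if (record.key() == null) {
                            continue;                   // need a key to build the row key
                        }
                        // Row key = Kafka message key; raw payload goes into cf:payload.
                        Put put = new Put(Bytes.toBytes(record.key()));
                        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("payload"),
                                Bytes.toBytes(record.value()));
                        table.put(put);
                    }
                }
            }
        }
    }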

Environment: AWS S3, EMR, Python 3.8, PySpark 3.0, Scala, Hadoop 2.7.2, MapReduce, Hive, Impala, Sqoop, Spark SQL

Confidential, Natrona Heights, PA

Hadoop Developer

Responsibilities:

  • Created the automated build and deployment process for application, re-engineering setup for better user experience, and leading up to building a continuous integration system.
  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Involved in Daily Scrum (Agile) meetings, Sprint planning and estimation of the tasks for the user stories, participated in retrospective and presenting Demo at end of the sprint.
  • Created S3 buckets, managed bucket policies, and utilized S3 and Glacier for archival storage and backup on AWS.
  • Utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test and deploy.
  • Involved in loading data from LINUX file system to HDFS.
  • Reviewed Hadoop log files to detect failures.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
  • Streamed data in real time using Spark 1.6.0 with Kafka.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Implemented test scripts to support test driven development and continuous integration.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Implemented custom code for MapReduce partitioners and custom Writables.
  • Designed and configured Kafka cluster to accommodate heavy throughput of 1 million messages per second.
  • Used the Kafka producer API to produce messages (see the sketch after this list).
  • Implemented data ingestion and handling clusters in real time processing using Kafka.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Deployed Microservices, including provisioning AWS environments using Ansible Playbooks.
  • Created scripts in Python which integrated with Amazon API to control instance operations.
  • Used Spark Streaming to collect data from Kafka in near real time and perform the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase).
  • Along with the Infrastructure team, implemented a Kafka-Storm based data pipeline.
  • Wrote Python scripts for internal testing that push data read from a file into a Kafka queue, which in turn is consumed by the Storm application.
  • Supported MapReduce programs running on the cluster.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked on tuning the performance of Pig queries.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Mentored analyst and test team for writing Hive Queries.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
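
A minimal sketch of producing messages with the Kafka producer API, as referenced above; broker addresses, the topic name, and the payload are placeholders.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class FeedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");
            props.put("acks", "all");       // wait for full acknowledgement from the replicas
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Asynchronous send; the callback reports delivery metadata or the failure.
                producer.send(new ProducerRecord<>("social-feed", "user-42", "{\"text\":\"hello\"}"),
                        (metadata, exception) -> {
                            if (exception != null) {
                                exception.printStackTrace();
                            } else {
                                System.out.printf("sent to %s-%d@%d%n",
                                        metadata.topic(), metadata.partition(), metadata.offset());
                            }
                        });
            }
        }
    }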

Environment: Hadoop 2.7, HBase, PL/SQL, XML, Kafka 0.10.0.0, Pig, MapReduce, Spark 1.6, Sqoop, Microservices, Scala, HDFS, AWS, CI/CD, S3, Agile, Linux

Confidential, St. Louis, MO

Sr. Java/AWS Engineer

Responsibilities:

  • Performed Analysis, Design, Development, Integration and Testing of application modules.
  • Worked on multiple AWS accounts with different VPCs for Prod and Non-Prod, where key objectives included automation, build-out, integration and cost control.
  • Launched Amazon EC2 cloud instances using Amazon Web Services (Linux) and configured the launched instances for specific applications.
  • Implemented Microservices on the Pivotal Cloud Foundry platform built upon Spring Boot services.
  • Assisted in migrating the existing data center into the AWS environment.
  • Created S3 buckets, managed bucket policies, and utilized S3 and Glacier for archival storage and backup on AWS.
  • Utilized Kubernetes and Docker as the runtime environment of the CI/CD system to build, test and deploy.
  • Managed containers using Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes.
  • Designed and implemented topic configuration in a new Kafka cluster (see the sketch after this list).
  • Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.
  • Used security groups, network ACLs, Internet Gateways, NAT instances and Route tables to ensure a secure zone for organizations in AWS public cloud.
  • Responsible for build and deployment automation using Docker, Kubernetes containers and Ansible.
  • Developed the back-end web services using Python.
  • Used Ansible and Ansible Tower as configuration management tools to automate repetitive tasks, quickly deploy critical applications, and proactively manage change.
  • Used Ansible Playbooks to setup Continuous Delivery Pipeline.
  • Deployed Microservices, including provisioning AWS environments using Ansible Playbooks.
  • Created scripts in Python which integrated with Amazon API to control instance operations.
  • Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
  • Maintained monitoring and alerting of production and corporate servers using the CloudWatch service.
  • Migrated applications from internal data center to AWS.
  • Deployed and configured Git repositories with branching, tagging, and notifications.
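
A minimal sketch of configuring topics in a new Kafka cluster, as referenced above; it assumes a Kafka version that ships the AdminClient API (0.11+), and the topic name, partition count, and replication factor are illustrative sizing only.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class TopicSetup {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // Create a topic with 12 partitions and replication factor 3, then wait for the result.
                NewTopic ordersEvents = new NewTopic("orders-events", 12, (short) 3);
                admin.createTopics(Collections.singletonList(ordersEvents)).all().get();
            }
        }
    }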

Environment: Spring Boot, Microservices, AWS, EC2, CI/CD, S3, Docker, Linux, Kafka, Jenkins, Python, Kubernetes

Confidential

Java Developer

Responsibilities:

  • Involved in Software Development Life Cycle (SDLC) of the application: Requirement gathering, Design Analysis and Code development.
  • Implemented Struts framework based on the Model View Controller design paradigm.
  • Designed the application by implementing Struts based on MVC Architecture, simple Java Beans as a Model, JSP UI Components as View and Action Servlets as a Controller.
  • Used JNDI to perform lookup services for the various components of the system.
  • Involved in designing and developing dynamic web pages using HTML and JSP with Struts tag libraries.
  • Used HQL (Hibernate Query Language) to query the database system (see the sketch after this list) and used the JDBC thin driver to connect to the database.
  • Developed Hibernate entities, mappings and customized criterion queries for interacting with database.
  • Responsible for designing rich user interface applications using JavaScript, CSS, HTML and AJAX, and developed web services using SOAP UI.
  • Used JPA to persistently store large amounts of data in the database.
  • Implemented modules using Java APIs, Java collection, Threads, XML, and integrating the modules.
  • Applied J2EE Design Patterns such as Factory, Singleton, and Business delegate, DAO, Front Controller Pattern and MVC.
  • Used JPA for the management of relational data in application.
  • Designed and developed business components using Session and Entity Beans in EJB.
  • Developed the EJBs (Stateless Session beans) to handle different transactions such as online funds transfer, bill payments to the service providers.
  • Developed XML configuration files and properties files used in the Struts Validator framework for validating form inputs on the server side.
  • Extensively used AJAX technology to add interactivity to the web pages.
  • Developed JMS Sender and Receivers for the loose coupling between the other modules and Implemented asynchronous request processing using Message Driven Bean.
  • Used JDBC for data access from Oracle tables.
  • Successfully installed and configured the IBM WebSphere Application server and deployed the business tier components using EAR file.
  • Involved in deployment of application on Weblogic Application Server in Development & QA environment.
  • Used Log4j for External Configuration Files and debugging.
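
A minimal sketch of the HQL usage referenced above, using the classic Hibernate Session/Query API of that era; the Account entity and its fields are hypothetical.

    import java.util.List;

    import org.hibernate.Query;
    import org.hibernate.Session;
    import org.hibernate.SessionFactory;

    // DAO-style lookup using HQL; "Account" is a hypothetical mapped entity.
    public class AccountDao {

        private final SessionFactory sessionFactory;

        public AccountDao(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public List findAccountsAbove(double minBalance) {
            Session session = sessionFactory.openSession();
            try {
                Query query = session.createQuery(
                        "from Account a where a.balance > :minBalance order by a.balance desc");
                query.setParameter("minBalance", minBalance);
                return query.list();        // each element is a mapped Account instance
            } finally {
                session.close();
            }
        }
    }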

Environment: JSP 1.2, Servlets, Struts 1.2.x, JMS, EJB 2.1, Java, JavaScript, Ajax, Eclipse, JPA, ANT.
