Sr. Bigdata Engineer Resume

Pittsburgh, PA

SUMMARY:

  • 9 years of results-oriented IT experience, including 4+ years of Big Data development, design, and analytical processing, plus some administration work.
  • Experience in designing, implementing, and improving analytical solutions in Big Data on Apache Hadoop, Hive, Spark, Kafka, NiFi, and MiNiFi.
  • Experience in analyzing data using Spark, Hive QL, PIG Latin, and custom MapReduce programs in Java.
  • Experience in importing and exporting data using SQOOP from HDFS to relational database systems / mainframe and vice versa.
  • Experience in creating automated data flow between RDBMS and Hadoop clusters using Apache NiFi, dataflow automation using Minifi and automating configuration deployment to tenants using Minifi C2 service.
  • Experience in creating data models and design.
  • Excellent knowledge of RDBMS, including MySQL, Oracle, and DB2.
  • Experience in writing Map Reduce jobs in Java using Eclipse Builder and PIG.
  • Experience in designing and implementing NoSQL database stores such as Mongo DB, Elasticsearch, HBase.
  • Experience in development, implementation and testing of Database projects.
  • In-depth understanding of Data Structure and Algorithms.
  • Background with traditional databases such as DB2, Oracle, SQL Server.
  • Good understanding of SSH and SSL.
  • Strong knowledge of all phases of the Software Development Life Cycle (SDLC); commended for technical, analytical, and problem-solving skills and for effective task prioritization across engineering disciplines to troubleshoot complex system-level issues.
  • Worked on both Agile and waterfall models.
  • Extremely comfortable with all types of hardware systems.
  • Strong written, oral, interpersonal, and presentation communication skills.
  • Ability to perform at a high level, meet deadlines, and adapt to ever-changing priorities.

TECHNICAL SKILLS:

Big Data Ecosystem: Spark, Scala, Kafka, Hive, Nifi, Minifi, Minifi C2, SQOOP, HDFS, HBase, Hadoop MapReduce, Zookeeper, Pig, YARN, Oozie, JSON, XML, Git, Bitbucket.

Amazon AWS: EC2, S3, EMR, AMI, EBS, VPC, CloudWatch, CloudFormation.

Languages: Scala, C, C++, Java, COBOL, JCL, CICS, and SQL/PLSQL.

Methodologies: Agile and Waterfall.

Database: DB2, Oracle 10g, PL/SQL, MySQL, HBase, VSAM, MS SQL Server.

DevOps Tools: Terraform and Ansible.

IDE / Testing Tools: Eclipse, IntelliJ IDEA.

Operating System: Windows, Unix/Linux, IBM Z/OS.

Scripts: Shell Scripting, Scala scripting.

Distribution: Cloudera, MAPR and Hortonworks

PROFESSIONAL EXPERIENCE:

Confidential, Pittsburgh, PA

Sr. Bigdata Engineer

Responsibilities:

  • Worked on batch processing of data using Apache Spark.
  • Good experience with Spark Scala programming and a solid understanding of its in-memory processing capability.
  • Worked on the core and Spark SQL modules of Spark extensively using programming languages like Scala.
  • Worked on creating RDDs, DataFrames, and Datasets for the required input data and performed data transformations and actions using Spark Scala.
  • Worked on creating Kafka Producer, Consumer, Brokers, Topic and partitions.
  • Experience in Writing the Scala functions, procedures, Constructors and Traits.
  • Real time streaming of the data using Spark with Kafka.
  • Worked on integrating Apache Kafka with Apache Spark for data processing.
  • Created a Hive data warehouse and loaded data with Apache Spark.
  • Created both external and internal Hive tables, and partitioned and bucketed them as required.
  • Worked on different AWS components like S3, EC2, EMR, EBS, VPC, CloudWatch, and CloudFormation.
  • Exposure to using Apache Kafka to develop a data pipeline of logs as a stream of messages using producers and consumers.
  • Worked on fine tuning the Spark programs and Hive queries.
  • Worked on resolving long-running Spark jobs.
  • Created CloudFormation templates for OpsWorks stacks, including automating Minifi C2 service startup and creating Nifi and Kafka (EMR) clusters; also automated SSL certificate deployment to clusters.
  • Used Terraform and Ansible to recreate all the CloudFormation work after finding some performance gain.
  • Infrastructure management using Terraform and configuration management using Ansible.
  • Good understanding of different file formats and their advantages and disadvantages.
  • Created dataflow between SQL Server and Hadoop clusters using Apache Nifi and Minifi.
  • Automated the Nifi configuration deployment to Minifi tenants using Minifi C2 service.
  • Knowledge of cluster creation; involved in creating 10-node Nifi and Kafka clusters (including infrastructure such as VPC, subnets, security groups, Internet and NAT gateways, and bastion hosts) and securing them using Java OpenSSL certificates.
  • Deployed Apache MiNiFi windows service for 30 clients for extraction of real time streaming data from Microsoft SQL database through XT Applications.
  • Completed end-to-end design and development of the Apache NiFi flow that acts as the agent between Apache MiNiFi and Kafka for transformation of real-time data in Analytics.

Environment: Spark, Scala, Kafka, Hive, SparkSQL, Apache NiFi, Minifi, Minifi C2, YARN, AWS (EC2, S3, EMR, EBS, VPC, CloudWatch, CloudFormation), Java, JSON, XML, IntelliJ IDEA, Git, Bitbucket, CentOS, Amazon Linux 2.

Confidential, New Jersey

Sr. Scala/Spark/Hadoop Developer/Lead

Responsibilities:

  • Creating the Case Classes.
  • Working with DataFrames and RDDs.
  • Parsing JSON objects into flattened formats using Scala.
  • Creating the tables in Hive and integrating data between Hive & Spark.
  • Worked on the core and Spark SQL modules of Spark extensively using programming languages like Scala.
  • Hands on experience with Spark Scala programming and good understanding of its 'In Memory' processing capability.
  • Worked on creating RDDs, DataFrames, and Datasets for the required input data and performed data transformations using Spark Scala.
  • Writing Scala User-Defined Functions (UDFs) to solve the business requirements.
  • Experience in Kafka Producer, Consumer, Brokers, Topic and partitions.
  • Experienced with batch processing of data sources using Apache Spark.
  • Experienced in working with RDDs.
  • Experience in Writing the Scala functions, procedures, Constructors and Traits.
  • Real time streaming the data using Spark with Kafka.
  • Worked on Creating Kafka topics, partitions, writing custom partitioner classes.
  • Exposure to using Apache Kafka to develop a data pipeline of logs as a stream of messages using producers and consumers.
  • Experience in integrating Apache Kafka with Apache Spark for real time processing.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Importing and exporting data into HDFS and Hive using SQOOP.
  • Experienced in defining job flows.
  • Processed the low latency queries.

Environment: Spark, Scala, Kafka, Hive, SparkSQL, SQOOP, MapReduce, YARN, AWS, Java, JSON, XML, Eclipse, Git.

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase database and SQOOP.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Involved in loading data from UNIX file system to HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best income logic using Pig scripts and UDFs.
  • Writing the Hive queries using HQL.
  • Implemented test scripts to support test driven development and continuous integration.
  • Responsible to manage data coming from different sources.
  • Loaded and transformed large sets of structured and semi-structured data.
  • Cluster coordination services through Zookeeper.
  • Experience in performance analysis and capacity planning for growing Hadoop clusters.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in creating data models and design.
  • Exported the analyzed data to the relational databases using SQOOP for visualization and to generate reports for the BI team.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and SQOOP.
  • Designed and developed the full stack for the Hadoop Distributed File System (HDFS) framework, including MapReduce, HBase, Hive, Pig, Zookeeper, etc.
  • Writing Pig Latin scripts to process the data.
  • Importing and exporting data into HDFS and Hive using SQOOP.
  • Written Hive queries for data analysis to meet the Business requirements.
  • Experience in development related CRUD operations.
  • Experience in Schema defining.
  • Creating Indexes and Aggregation framework.
  • Exported the analyzed data to relational databases using SQOOP for visualization and to generate reports.
  • Developed Hive queries for the analysts.
  • Gained good experience with NoSQL databases.
  • Involved in loading data from UNIX file system to HDFS.
  • Supported MapReduce programs running on the cluster.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and SQOOP. Cluster co-ordination through Zookeeper.
  • Involved in creating Hive tables, loading with data and writing hive queries.

Environment: Hadoop, HDFS, Hive, HBase, SQOOP, PIG, MapReduce, Zookeeper, Java (JDK 1.6), Eclipse, PL/SQL, MySQL, Shell Scripting, and Ubuntu.

Confidential

Java, Mainframes developer

Responsibilities:

  • Involved in requirement gathering.
  • Involved in presenting the prototype to the customer.
  • Involved in designing.
  • Analysis of the business functionality of the system.
  • Involved in designing Use-case.
  • Involved in designing and implementing.
  • Involved in writing SQL queries.
  • Design and Coding as per requirements.
  • Review the coded programs.
  • Coordinating the unit and system testing phases.
  • Preparing test scripts.
  • Review of Unit and Integration test cases.
  • Addressing critical issues and fixing bugs, also involved in design discussion.
  • Project Quality Team member.
  • Design and Coding as per client requirements.
  • UTP preparation, Unit testing and System testing.
  • Handling critical tickets of Severity 1-2.
  • Analyzing problem tickets and coordinating with plant users to resolve them.
  • Solving Business related issues within the System.
  • Offshore coordination with Onsite leads.

Confidential

Software Engineer

Environment: COBOL, JCL, DB2, REXX, VSAM, CICS, QMF, Expediter, SPUFI, FILE-AID, Easytrieve, TSO/ISPF, DB2 utilities.

Responsibilities:

  • Design and Coding as per requirements.
  • Review the coded programs.
  • Coordinating the unit and system testing phases.
  • Preparing test scripts.
  • Review of Unit and Integration test cases.
  • Implemented Exception handling using custom exception.
  • Addressing critical issues and fixing bugs, also involved in code reviews, design discussion.
  • Preparing the estimates for the minor improvements and enhancement works.
