
Big Data Engineer Resume


Pittsburgh, PA

SUMMARY:

  • Around 12 years of professional IT experience, including development, deployment and maintenance of critical software and Big Data applications, process analysis, process design, business analysis and data analysis.
  • Provided a solution using Hive and Sqoop (to import/export data) that replaced the traditional ETL process with HDFS-based loads for faster loading of data into target tables.
  • Created UDFs and Oozie workflows to Sqoop data from source systems into HDFS and then into the target tables.
  • Implemented custom data types, InputFormat, RecordReader, OutputFormat and RecordWriter classes for MapReduce computations.
  • Developed Spark scripts in Scala that apply custom RDD transformations and perform actions on RDDs (see the sketch after this list).
  • Hands-on experience with the Cloudera distribution.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting the systems application architecture.
  • Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
  • Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners.
  • Knowledge of Python and R programming.
  • Experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Experience in using different file formats such as CSV and JSON.
  • Experience with big data ingestion tools such as Sqoop, Flume and Apache Kafka.
  • Experience in using Flume and Kafka to load log data from multiple sources into HDFS.
  • Experience with NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in retrieving data from databases such as MySQL and Oracle into HDFS using Sqoop and ingesting it into HBase.
  • Experience in creating Tableau scorecards and dashboards using stacked bars, bar graphs and geographical maps.
  • Good understanding of software development methodologies such as Agile and Waterfall.
  • More than 5 years of experience in IT process methodologies: ITIL, Six Sigma and ISO 9000.
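
To illustrate the Spark/Scala bullet above, here is a minimal sketch of the kind of RDD transformation-and-action script it describes. The HDFS paths, column layout and application name are hypothetical placeholders, not details of an actual engagement.

    import org.apache.spark.{SparkConf, SparkContext}

    object RddTransformations {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("rdd-transformations"))

        // Hypothetical CSV input: id,event_type,amount
        val lines = sc.textFile("hdfs:///data/raw/events")

        // Transformations: parse, drop malformed rows, aggregate amounts by event type
        val totalsByType = lines
          .map(_.split(","))
          .filter(_.length == 3)
          .map(cols => (cols(1), cols(2).toDouble))
          .reduceByKey(_ + _)

        // Actions: materialize the result and persist it back to HDFS
        totalsByType.collect().foreach { case (eventType, total) => println(s"$eventType -> $total") }
        totalsByType.saveAsTextFile("hdfs:///data/curated/event_totals")

        sc.stop()
      }
    }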

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, YARN, Pig, Hive, HBase, Flume, Sqoop, Impala, Apache Spark, Kafka, Scala, MongoDB, Cassandra.

NoSQL Databases: Cassandra, MongoDB, HBase.

Databases: MySQL, Oracle

Operating Systems: Windows, UNIX, Linux

Other Tools: PuTTY, WinSCP

Languages: Python, Scala, R, SQL, HTML, C and basics of Java.

App/Web servers: WebSphere, Tomcat.

Process: ITIL Expert, ISO 9001, Lean Six Sigma

PROFESSIONAL EXPERIENCE:

Confidential, Pittsburgh, PA

Big Data Engineer

Responsibilities:

  • Worked on a live 24-node cluster running HDP 2.2.
  • Worked with Sqoop (version 1.4.3) jobs with incremental load to populate HAWQ external tables and then load the data into internal tables.
  • Built data import and export jobs to copy data to and from HDFS using Sqoop.
  • Worked with the Spark Core and Spark SQL modules of Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Experience in reviewing Hadoop log files to detect failures.
  • Worked on RCA documentation for production failures.
  • Hands-on experience in defect resolution.
  • Performed benchmarking of the NoSQL databases Cassandra and HBase.
  • Worked with Pig, the NoSQL database HBase and Sqoop for analyzing the Hadoop cluster as well as big data.
  • Knowledge of workflow schedulers such as Oozie, crontab and Autosys.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Created Hive tables and worked on them for data analysis to meet the business requirements.
  • Developed a data pipeline using Spark and Hive to ingest, transform and analyze data (see the sketch after this list).
  • Worked on data cleansing to populate Hive external and internal tables.
  • Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
  • Supported and built the data science team's projects on Hadoop.
  • Used Flume to load application server logs into HDFS.
  • Worked with data modeling teams; created conceptual, logical and physical models.
  • Experience working with the NoSQL database HBase for real-time data analytics.
  • Hands-on experience working as a production support engineer.
  • Hands-on experience in Hadoop testing.
  • Automated incremental loads to move data into the production cluster.
  • Ingested data from various file systems into HDFS using UNIX command-line utilities.
  • Worked on EPIC user stories and delivered them on time.
  • Worked on the data ingestion part of the malicious-intent model; automated incremental jobs that run on a daily basis.
  • Hands-on experience in Agile and Scrum methodologies.
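
A minimal sketch of the Spark-and-Hive pipeline pattern described above, written with the SparkSession API for brevity; the staging and target table names and columns are hypothetical, not the actual project schema.

    import org.apache.spark.sql.SparkSession

    object SparkHivePipeline {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark read and write warehouse tables directly
        val spark = SparkSession.builder()
          .appName("spark-hive-pipeline")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Hypothetical staging table (e.g. loaded by Sqoop); cleanse rows and derive the partition column
        val cleansed = spark.table("staging.transactions")
          .filter($"amount".isNotNull && $"txn_date".isNotNull)
          .withColumn("load_date", $"txn_date".cast("date"))

        // Write into a date-partitioned table so downstream Hive queries can prune partitions
        cleansed.write
          .mode("append")
          .partitionBy("load_date")
          .saveAsTable("curated.transactions")

        spark.stop()
      }
    }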

Confidential, Bowie, MD

Hadoop Developer

Responsibilities:

  • Worked on a live 90-node Hadoop cluster running CDH 4.4.
  • Worked with highly unstructured and semi-structured data of 90 TB in size (270 TB).
  • Extracted data from Teradata into HDFS using Sqoop.
  • Worked with Sqoop (version 1.4.3) jobs with incremental load to populate Hive external tables.
  • Extensive experience in writing Pig (version 0.10) scripts to transform raw data from several data sources into baseline data.
  • Served in the Hadoop SME role for the customer in the absence of the onshore SME.
  • Developed Hive (version 0.10) scripts for end-user/analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Worked with Java resources to write UDFs as needed for use in Pig and Hive queries (see the sketch after this list).
  • Developed Oozie workflows for scheduling and orchestrating the ETL process.
  • Worked with the admin team on designing the cluster environment and upgrading it from CDH 3 to CDH 4.
  • Very good experience with both MapReduce 1 (JobTracker) and MapReduce 2 (YARN) setups.
  • Worked on performance tuning to ensure that assigned systems were patched, configured and optimized for maximum functionality and availability. Implemented solutions that reduced single points of failure and improved system uptime to 99.9% availability.
  • Very good experience in monitoring and managing the Hadoop cluster using Cloudera Manager.
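
The UDF bullet above refers to Java; a comparable minimal JVM sketch is shown here in Scala for consistency with the other snippets, using the classic org.apache.hadoop.hive.ql.exec.UDF API. The class name and normalization logic are made up for illustration.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Trivial Hive UDF: normalize free-text codes before they are joined in HiveQL
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }

Once packaged into a jar, a function like this would typically be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before being used in queries.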

Confidential, Atlanta, GA

Hadoop Developer

Responsibilities:

  • Worked on a live 90-node Hadoop cluster running CDH 4.1.
  • Worked with highly unstructured and semi-structured data of 120 TB in size (360 TB).
  • Developed Pig UDFs to pre-process the data for analysis.
  • Involved in the setup and deployment of the Hadoop cluster.
  • Developed MapReduce programs for refined queries on big data.
  • Involved in loading data from the UNIX file system into HDFS.
  • Loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to relational databases using Sqoop and generated reports for the BI team.
  • Managed and scheduled jobs on the Hadoop cluster using Oozie.
  • Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
  • Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second; used the Kafka 0.6.3 producer APIs to produce messages (see the sketch after this list).
  • Provided daily code contributions and worked in a test-driven development environment.
  • Developed merge jobs in Python to extract and load data into a MySQL database.
  • Developed simple to complex MapReduce jobs using Hive.
  • Implemented partitioning and bucketing in Hive.
  • Mentored the analyst and test teams in writing Hive queries.
  • Involved in setting up HBase to use HDFS.
  • Extensively used Pig for data cleansing.
  • Performed benchmarking of the NoSQL databases Cassandra and HBase for the Kafka- and Storm-based streams, along with the infrastructure team.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Knowledgeable in Spark and Scala, mainly in framework exploration for the transition from Hadoop MapReduce to Spark.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
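
The Kafka bullet above cites the 0.6.3 producer API; as an illustration of the same produce-a-message idea, here is a minimal Scala sketch using the current org.apache.kafka.clients.producer API, with placeholder broker, topic, key and payload values.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object LogEventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        // Broker list and topic are placeholders
        props.put("bootstrap.servers", "broker1:9092,broker2:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        // acks=1 trades a little durability for the throughput a high-volume feed needs
        props.put("acks", "1")

        val producer = new KafkaProducer[String, String](props)
        try {
          // In a real pipeline the payload would come from the upstream log collector
          producer.send(new ProducerRecord[String, String]("app-logs", "host-01", "sample log line"))
        } finally {
          producer.close()
        }
      }
    }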

Confidential, Atlanta, GA

Application Support Engineer - Hadoop

Responsibilities:

  • Worked on a live 110-node Hadoop cluster running CDH 4.0.
  • Processed data into HDFS by developing solutions; analyzed the data using MapReduce, Pig and Hive and produced summary results from Hadoop for downstream systems.
  • Involved in the development of MapReduce jobs using HiveQL statements.
  • Worked closely with individuals at various levels to coordinate and prioritize multiple projects, estimate scope, and schedule and track projects throughout the SDLC.
  • Worked with various Java development teams on Hadoop MapReduce and HDFS; developed multiple MapReduce jobs for data cleaning and processing.
  • Involved in MapReduce jobs using HiveQL queries on data stored in HDFS.
  • Wrote a Storm topology to accept events from the Kafka producer and emit them into the Cassandra DB (see the sketch after this list).
  • Experienced in managing and reviewing Hadoop log files.
  • Designed a data warehouse using Hive.
  • Handled structured, semi-structured and unstructured data.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive.
  • Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second; used the Kafka 0.6.3 producer APIs to produce messages.
  • Implemented partitioning and bucketing in Hive.
  • Mentored the analyst and test teams in writing Hive queries.
  • Supported and assisted QA engineers in understanding, testing and troubleshooting.
  • Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Involved in database migrations to transfer data from one database to another and in the complete virtualization of many client applications.
  • Created HBase tables to store data in variable formats coming from different portfolios.
  • Used Sqoop extensively to import data from various systems and sources (such as MySQL) into HDFS.
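
A minimal sketch of the Kafka-to-Cassandra Storm pattern described above, assuming the Storm 1.x APIs with the older storm-kafka spout and the DataStax Java driver; the ZooKeeper host, topic, keyspace, table and column names are placeholders rather than the actual deployment.

    import com.datastax.driver.core.Cluster
    import org.apache.storm.{Config, StormSubmitter}
    import org.apache.storm.kafka.{KafkaSpout, SpoutConfig, StringScheme, ZkHosts}
    import org.apache.storm.spout.SchemeAsMultiScheme
    import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer, TopologyBuilder}
    import org.apache.storm.topology.base.BaseBasicBolt
    import org.apache.storm.tuple.Tuple

    // Terminal bolt: take the raw string emitted by the Kafka spout and write it to Cassandra
    class CassandraWriterBolt extends BaseBasicBolt {
      // Opened lazily on the worker so the bolt itself stays serializable
      @transient private lazy val session =
        Cluster.builder().addContactPoint("cassandra-host").build().connect("eventspace")

      override def execute(tuple: Tuple, collector: BasicOutputCollector): Unit = {
        val payload = tuple.getString(0)
        // Keyspace, table and columns are hypothetical
        session.execute("INSERT INTO raw_events (event_id, payload) VALUES (uuid(), ?)", payload)
      }

      // Terminal bolt: nothing to declare downstream
      override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit = {}
    }

    object EventsToCassandraTopology {
      def main(args: Array[String]): Unit = {
        // Kafka spout reading a hypothetical "app-events" topic via ZooKeeper
        val spoutConfig = new SpoutConfig(new ZkHosts("zk1:2181"), "app-events",
          "/kafka-spout", "events-to-cassandra")
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme())

        val builder = new TopologyBuilder()
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 2)
        builder.setBolt("cassandra-writer", new CassandraWriterBolt(), 4)
          .shuffleGrouping("kafka-spout")

        StormSubmitter.submitTopology("events-to-cassandra", new Config(), builder.createTopology())
      }
    }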

Confidential 

Process Engineer (Assistant Director)

Responsibilities:

  • Maintain Application Development Lifecycle methods and processes
  • Develop processes and support SAP implementation
  • Support Apache Hadoop implementation from process and documentation perspective
  • Work with stakeholders at all levels to ensure IT strategy is developed and maintained
  • Develop workflows as per ITIL standards
  • Work with various development projects to understand issues and perform Root Cause Analysis
  • Work with process owners and technical teams to maintain and improve existing IT processes

Confidential

Data & Process Lead Analyst

Responsibilities:

  • Conceptualize new processes as per business requirements; conduct FMEA on processes to ensure that control points are built into the process; enhance processes by making them leaner.
  • Conduct performance review calls with vendors; prepare the performance matrix for vendors.
  • Ensure proper documentation of technical changes in the present architecture.
  • Customer interaction: status updates and bridge calls.
  • Define processes as per the standard's requirements; define the audit checklist.
  • Train resources on auditing as per the audit checklist; identify gaps in implementation and drive fixing of the gaps.
  • Understand the requirements of the standard; define processes and the audit checklist accordingly.
  • Metrics management: identify new measurement parameters for processes to measure and improve process performance; review data behavior and patterns with internal and external customers.
  • Analyze audit reports and drive process enhancements to make processes fool-proof; review internal/external escalations to redefine audit control points.
  • Enhance the audit process to drive process compliance.
  • Ensure ticket closure within SLA; ensure timely acknowledgement, assignment and queue management of tickets to the team; help the team with technical issues; identify and handle all internal and external escalations on time; ensure timely escalation to the appropriate next levels.
  • Reporting: prepare timely reports for the client, prepare RCAs as applicable, and run SQL queries on the reporting database.
  • Managed a process improvement project to improve customer satisfaction across Confidential IT Infrastructure Management Services within the scope of Microland (Incident Management, Desktop Management, Windows Server Management).
  • Developed comprehensive operating procedures for Managed Services based on ITIL and ISO standards.
  • Audited services based on the ITIL framework to ensure adherence to Standard Operating Procedures.
  • Developed metrics and KPIs for key IT infrastructure support processes.
  • Analyzed performance reports on a daily, weekly and monthly basis to identify deviations from agreed service levels.

Confidential

Data & Process Analyst

Responsibilities:

  • End-to-end planning for process transitions to meet the needs of business timelines and the SOW.
  • Alignment of processes to Microland and business standard processes, with sign-off from the external customer on the same; definition of metrics to be shared with customers (internal/external) and sign-off on the same.
  • Definition of any new process required to deliver services to the customer as per business requirement.
  • Revision of any existing process as per changes in business requirement or for improvement in service performance.
  • Ensuring all processes have defined steps to be audited, defined sample size to be audited and defined frequency at which the audit needs to be conducted.
  • Ensure sign-off on these with the process owner.
  • Ensure that all engineers/executives/leads following the process are trained on the audit steps as and when those steps are defined or revised.
  • Training of Quality Analysts on new audits and audit requirements.
  • Ensuring all processes have defined metrics to measure process performance as per business requirements.
  • Maintain the Technical Support website; use HTML to add articles.
  • Ensuring SOPs for all reports and dashboards are documented as per document management guidelines and are stored live and updated in the central repository.
  • Use VB macros to automate Excel reporting.
  • Analyzing process performance metrics to identify any areas of improvement and drive improvement together with the process owner.
  • Analyzing processes to identify any areas of automation in the process and drive automation initiatives together with process owners.
  • Participating in internal reviews to discuss metrics on process performance, understand process owners' pain points, identify areas of improvement and drive improvement together with the process owner.
  • Participating in operations reviews with the customer to discuss metrics on process performance, understand process owners' pain points, identify areas of improvement and drive improvement together with the process owner.
