Big Data Engineer Resume
San Francisco
SUMMARY:
- Over 8 years of experience in the software industry across a wide range of legacy and Big Data technologies
- Experience in Big Data Hadoop development and administration at Confidential, San Francisco
- Experience in Big Data Hadoop development and administration at Confidential
- 5 years and 5 months of experience in Mainframes and Big Data technologies in the Wealth Management vertical at Confidential
- Cloudera Certified Developer for Apache Hadoop
- Good work exposure across development, maintenance, reverse engineering, and production stability, spanning all phases of the SDLC
- Personal skills include systematic problem analysis, quick learning, good communication, dedication to work, teamwork, and effective time management
TECHNICAL SKILLS:
Mainframe Skills: COBOL, S-COBOL, VSAM, JCL, PL/1, REXX, Easytrieve
Hadoop Skills: Spark, Core Java, Python, Scala, Hadoop, Pig, Hive, Sqoop, Flume, HDFS, MapReduce, YARN, Oozie, ZooKeeper, Cloudera Impala, Phoenix
Hadoop Administration: Hortonworks, Cloudera, Amazon EMR
BI Skills: Pentaho, IBM DataStage
Operating Systems: Windows, Linux (CentOS, Ubuntu, Red Hat)
OLTP: CICS, IMS Transactions, TGADP (Web to Mainframe connectivity)
Databases: IBM DB2 with SQL, HBase, MongoDB
Database Tools: Query Management Facility (QMF), SPUFI, TOAD, SQL Workbench/J
Utilities: ISPF/TSO, DFSORT
Version Control: Changeman, Endevor, GitHub
Testing Tools: Expediter, Smart Test, File-Aid, Abend-Aid
Quality Tools: HP Service Center (Peregrine), ClearQuest (CQ), HP Quality Center (QC)
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco
Big Data Engineer
Responsibilities:
- Re-architected the entire batch system for incremental processing and set up the data pipeline to run under the Airflow scheduler (a minimal DAG sketch follows this list).
- Since most of the codebase was in Python, Airflow (also built in Python) was a more sensible choice than Oozie.
- The new incremental data pipeline resulted in a direct EMR cost saving of 30% due to lower hardware requirements, and ensured up-to-date data availability in the Impala database for reporting.
- Work is currently underway to rewrite the reporting modules from Python to Spark to enable distributed processing.
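A minimal sketch of the kind of Airflow DAG that could schedule such an incremental pipeline; the DAG id, task names, schedule, and callables are illustrative assumptions rather than the actual production code.

```python
# Sketch of a daily incremental-processing DAG run under the Airflow scheduler.
# All names here (dag_id, task ids, callables) are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_increment():
    """Pull only the records added since the last successful run (placeholder)."""


def load_to_impala():
    """Refresh the Impala reporting tables with the new increment (placeholder)."""


default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="incremental_batch_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = PythonOperator(task_id="extract_increment", python_callable=extract_increment)
    load = PythonOperator(task_id="load_to_impala", python_callable=load_to_impala)

    extract >> load  # run the incremental extract before refreshing the reporting tables
```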
Confidential
Principal Engineer
Technologies Used: HBase, Hive, Impala, Sqoop, Spark, MapReduce, Pig, Apache Phoenix, Hortonworks Hadoop Distro, Python
Responsibilities:
- Installing and Setting up the Hadoop cluster with Hortonworks distribution
- Data Modeling in HBASE
- Pre-processing scripts in Python to format the data for HBase load (see the sketch after this list)
- Work with the resource on the SLS side to agree upon the benchmarking parameters
- Decide upon the compression algorithm, splits, and Bloom filters in HBase in alignment with SLS
- Building bulk load scripts in HBase using the importtsv utility
- Performing bulk loads and capturing various parameters for benchmarking
- Execute the SQL scripts from SLS on the HBase cluster using Apache Phoenix
- Capturing, comparing and reporting various benchmarking parameters
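A minimal sketch of the kind of Python pre-processing that formats raw records into the tab-separated layout expected by the HBase importtsv bulk-load utility; the file names, column layout, and row-key scheme are illustrative assumptions.

```python
# Reshape raw CSV records into a TSV file for HBase bulk load via importtsv, e.g.:
#   hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
#     -Dimporttsv.columns=HBASE_ROW_KEY,cf:name,cf:amount <table> <hdfs_path>
# Input/output paths, column names, and the row-key scheme are hypothetical.
import csv

INPUT_CSV = "raw_records.csv"
OUTPUT_TSV = "hbase_load.tsv"


def build_row_key(record):
    """Compose a row key that spreads writes across region splits (assumed scheme)."""
    return f"{record['customer_id'][::-1]}|{record['txn_date']}"


with open(INPUT_CSV, newline="") as src, open(OUTPUT_TSV, "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst, delimiter="\t")
    for record in reader:
        # One TSV line per record: row key first, then the column-family values.
        writer.writerow([build_row_key(record), record["name"], record["amount"]])
```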
Confidential
Principal Engineer
Technologies Used: Hive, Pig, Sqoop, Spark, Spark SQL, Oozie, Flume, Oracle, Java, Hortonworks Hadoop Distro
Responsibilities:
- Installing and Setting up the Hadoop cluster with Hortonworks distribution
- Development and maintenance of scripts in Hive, Pig and Sqoop
- Setting up workflows in Oozie and scheduling them at the required times
- Data Analytics (see the Spark SQL sketch after this list)
- REST APIs for UI-to-Hive connectivity
- Performance tuning of the batch scripts to meet SLAs
- End-to-end performance and system testing
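A minimal PySpark sketch of the kind of Spark SQL job used for analytics over Hive tables; the database, table, and column names are illustrative assumptions, not the project's actual schema.

```python
# Run a Spark SQL aggregation over a Hive table and persist the result back to Hive.
# warehouse.transactions / warehouse.daily_totals are hypothetical table names.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_batch_report")
    .enableHiveSupport()          # lets Spark SQL read tables from the Hive metastore
    .getOrCreate()
)

daily_totals = spark.sql("""
    SELECT txn_date, SUM(amount) AS total_amount
    FROM   warehouse.transactions
    GROUP  BY txn_date
""")

daily_totals.write.mode("overwrite").saveAsTable("warehouse.daily_totals")

spark.stop()
```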
Confidential, New Jersey
Principal Engineer
Technologies Used: HDFS, MapReduce, Pig, Hive, Sqoop, DataStage, Impala, and shell scripting
Responsibilities:
- Resolve production defects under Confidential and ensure business continuity
- Data Analysis
- Data Reporting
- Data clean-ups by executing special-request, single-purpose COBOL programs
- Defect Triaging
- Applying tactical technical solutions to production problems and coming up with strategic solutions for them
- Attend to and resolve any issues with batch jobs across Test, QA and Production environments
- Performance tuning of batch jobs to increase cost savings
- Setting up and scheduling new jobs and batch processes from Test through Production regions
- Provide 24/7 support for all production problems
Confidential
Analyst I - Apps Programmer
Responsibilities:
- Develop High-Level Design (HLD) documents from the SRS
- Review the HLD with project teams and business
- Prepare Low-Level Design (LLD) documents from the HLD
- Develop and modify COBOL programs based on the requirements
- Prepare Test plans
- Unit, Regression & Performance Testing in Test environments
- Peer reviews
- Promotion of code from Test to Production environment using the version control tools
- Data mining and supporting testing teams in QA environments
- Support guarantee period for new changes in Production
Confidential
Software Engineer
Responsibilities:
- Analysis, Coding, Unit testing of individual components.
- Preparation of test cases and testing the programs.
- Review of other components.
- Monitoring and estimating workloads.
- Support for delivered components.
- Delivering the programs to the client.