Senior Hadoop Developer Resume
Cincinnati, OH
SUMMARY
- IT Consultant with 6½ years of extensive experience in operating, developing, maintaining, monitoring, and upgrading Hadoop clusters.
- Extensive Retail and Telecom domain knowledge, with a primary skill set in Merchandising, Finance, Product Design and Development, and Supply Chain Management.
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the operation, implementation, administration, and support of ETL processes for large-scale data warehouses.
- Good experience in translating clients' Big Data business requirements into Hadoop-centric solutions.
- Hands-on experience installing, configuring, and maintaining Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, Spark, Kafka, ZooKeeper, Hue, and Sqoop on the Hortonworks distribution.
- Hands-on experience developing and deploying enterprise applications using major components of the Hadoop ecosystem, including Hadoop 2.x, YARN, Hive, Pig, MapReduce, Spark, Kafka, Storm, Oozie, HBase, Flume, Sqoop, and ZooKeeper.
- Experience in converting Hive/SQL queries into Spark transformations using Java (a short illustrative sketch follows this summary). Experience in ETL development using Kafka, Flume, and Sqoop.
- Built large-scale data processing pipelines and data storage platforms using open-source big data technologies.
- Experience in the installation, configuration, management, and deployment of Big Data solutions and the underlying Hadoop cluster infrastructure.
- Experience installing and configuring Hive, its services, and the Hive Metastore. Exposure to the Hive Query Language, including importing data and altering and dropping tables.
- Experience installing and running Pig, including its execution modes, the Grunt shell, and Pig Latin editors. Good knowledge of loading, storing, filtering, combining, and splitting data.
- Experience in tuning and debugging running Spark applications.
- Experience integrating Kafka with Spark for real-time data processing.
- In-depth knowledge of database imports; worked with imported data to populate Hive tables. Exposure to exporting data from relational databases to the Hadoop Distributed File System.
- Experience in setting up high-availability Hadoop clusters.
- Good knowledge of Hadoop cluster planning, including choosing a distribution, selecting hardware for both master and slave nodes, and cluster sizing.
- Experience in developing Shell Scripts for system management.
- Experience in Hadoop administration, with good knowledge of Hadoop features such as safe mode and auditing.
- Experience with software development processes and models: Agile, Waterfall, and Scrum.
- Good knowledge of sprint-planning tools such as Rally and Jira, as well as GitHub version control.
- Experience in UNIX shell scripting, with a good understanding of OOP and data structures.
- Team player and fast learner with good analytical and problem-solving skills.
- Self-starter with the ability to work independently as well as in a team.
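To illustrate the Hive/SQL-to-Spark conversion mentioned in the summary, here is a minimal Java sketch; the table and column names (retail_db.sales, store_id, sale_amount) are hypothetical placeholders rather than details from any actual engagement.

```java
// Minimal sketch: a HiveQL aggregate rewritten as Spark Dataset transformations.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.sum;

public class HiveToSparkSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-spark")
                .enableHiveSupport()          // read Hive Metastore tables directly
                .getOrCreate();

        // Original HiveQL (hypothetical):
        //   SELECT store_id, SUM(sale_amount) AS total_sales
        //   FROM retail_db.sales GROUP BY store_id;
        Dataset<Row> totals = spark.table("retail_db.sales")
                .groupBy("store_id")
                .agg(sum("sale_amount").alias("total_sales"));

        totals.show();
        spark.stop();
    }
}
```

The same query could also be submitted as text via spark.sql(...); the Dataset form is shown here because it is what a "transformation" rewrite typically looks like.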
TECHNICAL SKILLS
Operating Systems: MS-DOS, Windows 95/98/NT/2000/XP, Windows 7/8/10, z/OS, UNIX
Project Management Tools/Methodology: MS Project, Unified Modeling Language (UML), Rational Unified Process (RUP), Software Development Life Cycle (SDLC), Agile (Scrum), Kanban
Process/Model Tools: Rational Rose, MS Visio, Rally, Jira
Hadoop/Big Data Technologies: HDFS, Spark, Scala, Hive, Pig, Sqoop, Flume, Java, Kafka, Gobblin
Language: JCL, REXX, EXTRIEVE, SQL, COBOL
Database: DB2, MS Access, Oracle 9i, HBase
Database Tools: IBM DB2 Connect, TOAD, SQL Developer
Testing Strategies: System Integration Testing, Regression, and System Testing
Testing Tools: HP Quality Center
Office Tools: MS Word, MS Excel, MS PowerPoint, MS Access, MS Project
Web Related: HTML, XML, VBScript, and JavaScript
Other: Tandem (Outside Overview), Total Systems (TSYS)
PROFESSIONAL EXPERIENCE
Senior Hadoop Developer
Confidential, Cincinnati, OH
Responsibilities:
- Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
- Involved in end-to-end data processing, including ingestion, transformation, quality checks, and splitting.
- Streamed data in real time using Spark Streaming with Kafka (a minimal sketch follows this project entry).
- Developed Spark scripts in Scala as per requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
- Performed different types of transformations and actions on the RDDs to meet business requirements.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Analyzed the Hadoop cluster and various big data analytics tools, including Pig, HBase, and Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Very good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Developed multiple MapReduce jobs in Java and Python for data cleaning and pre-processing.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Used AngularJS for data binding and Node.js for back-end API support.
- Created a single-page application that loads multiple views through route services, using the AngularJS framework to make the user experience more dynamic.
- Handled cluster configuration and inter- and intra-cluster data transfer using DistCp and HFTP.
- Responsible for managing data coming from various sources.
- Installed and configured Hive and wrote Hive UDFs.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Installed and configured Hadoop MapReduce and HDFS.
- Installed and configured Pig.
- Built a REST web service with a Node.js server on the back end to handle requests from front-end jQuery Ajax calls.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Involved in managing and reviewing Hadoop log files.
- Used Sqoop to load data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet the business requirements.
- Responsible for creating Hive tables and working on them using HiveQL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Involved in scheduling multiple Hive jobs with the Oozie workflow engine.
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Java, Node.js, Oozie, HBase, Kafka, Spark, Scala, Eclipse, Linux, Oracle, Teradata.
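As referenced in the streaming bullet above, this is a minimal sketch of the Kafka-to-Spark-Streaming ingestion pattern. It is written in Java for consistency with the other sketches (the production scripts for this project were in Scala), and it assumes the spark-streaming-kafka-0-10 integration; the broker address, topic name, and consumer group id are hypothetical placeholders.

```java
// Minimal sketch: consume a Kafka topic with Spark Streaming and count records per batch.
import java.util.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;

public class KafkaSparkStreamingSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("kafka-spark-streaming");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092");      // hypothetical broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "ingest-group");                // hypothetical group id

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("events"), kafkaParams));

        // Pull out the message values and count records in each 10-second batch.
        stream.map(ConsumerRecord::value)
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```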
Hadoop Developer
Confidential, Boise, ID
Responsibilities:
- Responsible for Installation and configuration of Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Involved in designing the Cassandra data model and used CQL (Cassandra Query Language) to perform CRUD operations against Cassandra.
- Involved in moving log files generated from various sources into HDFS through Flume for further processing.
- Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import and export data from relational sources and handled incremental loading of customer and transaction data by date.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Used GZIP with AWS CloudFront to forward compressed files to destination nodes/instances.
- Implemented solutions using Scala and SQL for faster testing and processing of data; streamed data in real time with Kafka.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce the run-time of the scripts.
- Worked on Agile methodology projects extensively.
- Worked with data serialization formats (Avro, Parquet, JSON, CSV) for converting complex objects into sequences of bits.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Scheduled data refresh on Tableau Server for weekly and monthly increments based on business change to ensure that the views and dashboards were displaying the changed data accurately.
- Installed, upgraded, and managed Hadoop clusters and distributions, along with Hive and HBase.
- Advanced knowledge in performance troubleshooting and tuning Hadoop clusters.
- Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Implemented an analytical platform using HiveQL functions and different kinds of join operations, such as map joins and bucketed map joins.
- Used Oozie operational services for batch processing and scheduling workflows dynamically.
- Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Optimized MapReduce jobs to use HDFS efficiently by using compression mechanisms such as LZO and Snappy.
- Processed the source data into structured data and stored it in the NoSQL database Cassandra.
- Created ALTER, INSERT, and DELETE queries involving lists, sets, and maps in DataStax Cassandra.
- Designed and developed a Java API (Commerce API) that provides connectivity to Cassandra through Java services (a minimal sketch follows this project entry).
- Responsible for continuous monitoring and managing Elastic MapReduce cluster through AWS console.
- Developed EJB service components for the middle tier and implemented business logic using J2EE design patterns on WebLogic Application Server.
- Evaluated the suitability of Hadoop and its ecosystem for the project by implementing and validating various proof-of-concept (POC) applications, eventually adopting them as part of the Big Data Hadoop initiative.
Environment: Map Reduce, HDFS, Hive, EJB 3, Pig, HBase, SQL, Sqoop, Flume, Oozie, Apache Kafka, Scala, Zookeeper, J2EE, Eclipse, Cassandra.
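A minimal sketch, assuming the DataStax Java driver (3.x Cluster/Session API), of the kind of Java service used for the CQL CRUD work described above; the contact point, keyspace, and table/column names are hypothetical placeholders.

```java
// Minimal sketch: CQL CRUD against Cassandra from a Java service.
import java.util.UUID;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraCrudSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")                 // hypothetical contact point
                .build()) {
            Session session = cluster.connect("commerce");    // hypothetical keyspace

            UUID id = UUID.randomUUID();

            // Insert a row (collection columns such as lists, sets, and maps
            // bind the corresponding java.util types the same way).
            session.execute("INSERT INTO customers (id, name) VALUES (?, ?)", id, "Jane Doe");

            // Read it back.
            Row row = session.execute("SELECT id, name FROM customers WHERE id = ?", id).one();
            System.out.println(row.getUUID("id") + " -> " + row.getString("name"));

            // Update and delete follow the same execute(...) pattern.
            session.execute("UPDATE customers SET name = ? WHERE id = ?", "J. Doe", id);
            session.execute("DELETE FROM customers WHERE id = ?", id);
        } // closing the Cluster also closes its sessions
    }
}
```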
Hadoop Developer
Confidential, Madison, WI
Responsibilities:
- Gathered the business requirements by working alongside Teradata analysis team.
- Worked on Hadoop ecosystem components including but not limited to HDFS, MapReduce, Hive, Pig, Sqoop and HBase.
- Worked on several web-based application developments using HTML, CSS, AJAX, jQuery, AngularJS, and JavaScript.
- Created data-refining queries using MapReduce.
- Participated in loading data into HDFS from UNIX file system.
- Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to analyze HDFS data (a minimal Pig UDF sketch follows this project entry).
- Imported data extensively using Sqoop and Flume, and used Sqoop to export data from HDFS to RDBMS.
- Performed transformations, event joins, and pre-aggregations using Pig as an ETL step prior to storing data in HDFS.
- Joined multiple tables of a source system using join queries in Hive and loaded the results into Elasticsearch.
- Implemented lambda architecture as a solution.
- Worked on partitioning Hive tables to improve performance and query times.
- Used Pig to transport data into HBase.
- Automated data loading into HDFS by developing Oozie workflows.
- Worked alongside administrators at all levels to architect, implement, and test a Big Data analytical solution spanning various sources.
- Worked on source-system analysis, data analysis, data modeling, and ETL.
- Worked on MapReduce programs for data extraction, transformation, and aggregation across multiple file formats such as XML, JSON, and CSV.
- Extracted data from web server output files by developing Pig Latin scripts and loaded the data into HDFS.
- Helped QA engineers improve their testing and troubleshooting.
- Provided production rollout support: monitored the solution after it went live and resolved any issues discovered.
- Using JIRA, documented operational problems while complying with standards and procedures.
- Managed and reviewed data backups and logs as well as assisted in cluster maintenance, monitoring and troubleshooting.
- Worked on gathering the requirements from business partners and converting those requirements into technical specifications.
- Provided daily solutions and technical support to clients for Hadoop and its ecosystem, including the HAWQ database.
- Expert in bucketing and partitioning concepts; created managed and external tables in Hive to improve performance.
- Prepared reports by using Sqoop to import data into HDFS and Hive.
Environment: Hadoop, Cloudera CDH, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Java, flat files, Oracle 11g, MySQL, Windows NT, UNIX, Linux, shell scripting, CentOS, Maven, SBT, Amazon S3, JIRA, Git Stash, Eclipse, SQL.
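A minimal sketch of the kind of Java Pig UDF referenced above; the package, class, and field names are hypothetical. In a Pig script it would be registered and invoked roughly as shown in the leading comment.

```java
// Pig usage (roughly):
//   REGISTER my-udfs.jar;
//   cleaned = FOREACH raw GENERATE com.example.pig.NormalizeText(description);
package com.example.pig;   // hypothetical package

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Normalizes a free-text field: trims whitespace and lower-cases the value.
public class NormalizeText extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Pass null or empty records through quietly rather than failing the job.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}
```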
Java Developer
Confidential
Responsibilities:
- Collected and analyzed user requirements and functional specifications.
- Developed message driven beans to listen to JMS.
- Developed the web interface using Servlets, Java Server Pages, HTML, and CSS.
- Developed the GUI using HTML, CSS, JSP, and JavaScript.
- Created components for isolated business logic.
- Used WebLogic to deploy the application in local and development environments.
- Extensively used JDBC prepared statements to embed SQL queries in the Java code (a minimal sketch follows this project entry).
- Developed Data Access Objects (DAOs) using Spring Framework 3.
- Developed rich Internet web applications using Java applets and Silverlight.
- Used JavaScript for client-side validations and the Struts Validator framework for server-side validation.
- Provided on call support based on the priority of the issues.
- Deployed the application in a J2EE architecture.
- Implemented the Session Façade pattern using session and entity beans.
Environment: Java, J2EE, JDBC, JSP, Struts, JMS, Spring, SQL, MS-Access, JavaScript, HTML.
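A minimal sketch of the JDBC prepared-statement usage referenced above; the DAO, table, and column names are hypothetical, and the DataSource is assumed to come from the application server (e.g. via JNDI).

```java
// Minimal sketch: embedding a SQL query in Java with a JDBC PreparedStatement.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class OrderDao {                       // hypothetical DAO
    private final DataSource dataSource;      // e.g. looked up from the app server via JNDI

    public OrderDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Returns the total amount for one order, or 0.0 if the order is not found.
    public double findOrderTotal(String orderId) throws SQLException {
        String sql = "SELECT total_amount FROM orders WHERE order_id = ?";
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, orderId);         // bind the parameter instead of concatenating SQL
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getDouble("total_amount") : 0.0;
            }
        }
    }
}
```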