Sr. Hadoop Developer Resume
New York, NY
SUMMARY
- 7+ years of professional IT experience in project development, implementation, deployment, and maintenance using Hadoop ecosystem technologies, with domain knowledge in finance, banking, communication, insurance, retail, and health care.
- 4+ years of hands-on experience with Hadoop ecosystem technologies such as HDFS, MapReduce, YARN, Spark, Hive, Pig, Oozie, Sqoop, Flume, ZooKeeper, and HBase.
- Two years of hands-on experience using the Spark framework with Scala.
- 3+ years of Java programming experience developing web-based applications and client-server technologies.
- In-depth understanding of Hadoop architecture and its components, such as JobTracker, TaskTracker, NameNode, DataNode, and ResourceManager, and of MapReduce concepts.
- Proficient knowledge of Apache Spark and Apache Storm for processing real-time data.
- Extensive knowledge of programming with Resilient Distributed Datasets (RDDs).
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
- Experience with installation and configuration of Spark standalone mode for testing and development environments.
- Developed simple to complex MapReduce jobs in Java.
- Worked on a live 60-node Hadoop cluster running Cloudera CDH4.
- Extensive experience in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Developed UDFs, UDAFs, and UDTFs for Hive and Pig.
- Good knowledge of partitioning and bucketing concepts; designed and managed partitions and buckets and created external tables in Hive to optimize performance.
- Good experience with Avro files, RC files, combiners, and counters for best practices and performance improvements.
- Good knowledge of join, group, and aggregation concepts; resolved performance issues in Hive and Pig scripts by applying them.
- Experience with Big Data ML toolkits such as Mahout and Spark ML.
- Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
- Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
- Experience in importing data from relational database management systems (RDBMS) such as MySQL and Oracle into HDFS and Hive, and exporting the processed data back into RDBMS using Sqoop.
- Experience in importing data from RDBMS to HBase and exporting data into RDBMS using Sqoop.
- Implemented Flume for collecting, aggregating, and moving large amounts of server logs and streaming data to HDFS.
- Experience in HBase cluster setup and implementation.
- Administered, installed, upgraded, and managed Cassandra distributions.
- Good knowledge of performance troubleshooting and tuning of Cassandra clusters, and understanding of application-driven Cassandra data modeling.
- Experience in setting up Hadoop in a pseudo-distributed environment.
- Experience in setting up Hive, Pig, HBase, and Sqoop on the Ubuntu operating system.
- Experience as a Java developer in web and client-server technologies using Java, J2EE, Servlets, JSP, EJB, the Hibernate framework, and the Spring framework.
- Good understanding of the software development life cycle (SDLC) and sound knowledge of project implementation methodologies including Waterfall and Agile.
TECHNICAL SKILLS
- Hadoop
- Cloudera
- Big Data
- HDFS
- MapReduce
- Sqoop
- Spark
- Hive
- HBase
- Linux
- Java
- Eclipse
- Hadoop Distribution of Cloudera
- PL/SQL
- Toad 9.6
- Windows NT
- MongoDB
- Cassandra
- Tableau
- Unix shell scripting
- PuTTY
PROFESSIONAL EXPERIENCE
Sr. Hadoop Developer
Confidential - New York, NY
Responsibilities:
- Developed simple to complex MapReduce streaming jobs in Java for processing and validating the data.
- Developed a data pipeline using MapReduce, Flume, Sqoop, and Pig to ingest customer behavioral data into HDFS for analysis.
- Developed MapReduce and Spark jobs to discover trends in data usage by users.
- Implemented Spark using Python and Spark SQL for faster processing of data.
- Implemented algorithms for real-time analysis in Spark.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
- Used the Spark-Cassandra Connector to load data to and from Cassandra.
- Streamed data in real time using Spark with Kafka.
- Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
- Exported the analyzed data to relational databases using Sqoop to visualize it further and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Analyzed the data by running Hive queries (HiveQL) and Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Pig Latin scripts to perform MapReduce jobs.
- Developed product profiles using Pig and commodity UDFs.
- Developed Hive scripts in HiveQL to denormalize and aggregate the data.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Created UDFs to store specialized data structures in HBase and Cassandra.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Used Impala to read, write, and query the Hadoop data in HDFS from HBase or Cassandra.
- Used the Tez framework for building high-performance jobs in Pig and Hive.
- Configured Kafka to read and write messages from external programs.
- Configured Kafka to handle real-time data.
- Developed end-to-end data processing pipelines, from receiving data through the Kafka distributed messaging system to persisting it in HBase.
- Wrote a Storm topology to emit data into Cassandra.
- Wrote a Storm topology to accept data from a Kafka producer and process the data.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Used the JUnit framework to unit test the application.
- Developed interactive shell scripts for scheduling various data cleansing and data loading processes.
- Performed data validation on the data ingested using MapReduce by building a custom model to filter out all the invalid data and cleanse the data.
- Experience with data wrangling and creating workable datasets.
- Developed schemas to handle reporting requirements using Jaspersoft.
Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Solr, Storm, Tez, Impala, Mahout, Cassandra, Cloudera Manager, MySQL, Jaspersoft, multi-node cluster with Linux (Ubuntu), Windows, Unix.
Hadoop Developer
Confidential - Phoenix, AZ
Responsibilities:
- Analyzed the Hadoop cluster using different big data analytic tools including Kafka, Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Streamed data in real time using Spark with Kafka.
- Configured Spark Streaming, using Scala, to receive real-time data from Kafka and store the stream data in HDFS.
- Worked within the Apache Hadoop framework, using Opinion Lab statistics to ingest data from a streaming application programming interface (API), automating processes with Oozie workflows, and using Hive to draw conclusions about consumer sentiment from data patterns for external client use.
- Wrote the Storm topology with HDFS and Hive bolts as destinations.
- Expertise in Storm topology development, maintenance, and bug fixes.
- Developed Hadoop Streaming MapReduce jobs using Java.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Implemented test scripts to support test-driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Involved in loading data from Linux file system to HDFS.
- Imported and exported data into and out of HDFS using Sqoop.
- Good knowledge of building Apache Spark applications using Scala.
- Experience working on processing unstructured data using Pig.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Good knowledge of NoSQL databases such as HBase and Cassandra.
- Handled administration, installation, upgrading, and management of Cassandra distributions.
- Advanced knowledge of performance troubleshooting and tuning of Cassandra clusters.
- Scaled the Cassandra cluster based on load patterns.
- Good understanding of Cassandra Data Modeling based on applications.
- Experience with Cassandra Performance tuning.
- Highly involved in development/implementation of Cassandra environment.
- Planned, deployed, monitored, and maintained Amazon AWS cloud infrastructure consisting of multiple EC2 nodes and VMware VMs as required in the environment.
- Expertise in AWS data migration between database platforms, such as SQL Server to Amazon Aurora, using the RDS tool.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log files.
- Involved in scheduling the Oozie workflow engine to run multiple Pig jobs.
- Responsible for developing a data pipeline using Flume, Sqoop, and Pig to extract the data from weblogs and store it in HDFS.
- Data scrubbing and processing with Oozie.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Involved in developing Hive DDLs to create, alter and drop tables.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Explored Spark to improve the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, Storm, and Spark on YARN.
- Used Spark features such as parallelize, partitioning, caching (both in-memory and on-disk serialization), and Kryo serialization.
Environment: Hadoop, Cloudera, Big Data, HDFS, MapReduce, Sqoop, Spark, Hive, HBase, Linux, Java, Eclipse, Hadoop Distribution of Cloudera, PL/SQL, Toad 9.6, Windows NT, MongoDB, Cassandra, Tableau, Unix shell scripting, and PuTTY.
Hadoop Developer
Confidential - Auburn Hills, MI
Responsibilities:
- Installed and configured Hadoop ecosystem components and Cloudera Manager using the CDH distribution.
- Interacted frequently with business partners.
- Designed and developed a Medicare-Medicaid claims system using Model-driven architecture on a customized framework built on Spring.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Imported trading and derivatives data into the Hadoop Distributed File System and ecosystem (MapReduce, Pig, Hive, Sqoop).
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Created tables in HBase and loaded data into them.
- Developed scripts to load data from HBase to the Hive metastore and perform MapReduce jobs.
- Took part in setting up the Hadoop ecosystem in the dev and QA environments.
- Managed and reviewed Hadoop Log files.
- Responsible for writing Pig scripts and Hive queries for data processing.
- Ran Sqoop to import data from Oracle and other databases.
- Created a shell script to collect raw logs from different machines.
- Created static and dynamic partitions in Hive.
- Implemented Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, and UNION.
- Optimized the Hive tables using techniques such as partitioning and bucketing to provide better performance with HiveQL queries.
- Defined Pig UDFs for financial functions such as swap, hedging, speculation, and arbitrage.
- Coded many MapReduce programs to process unstructured log files.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used different data formats (text and Avro) while loading the data into HDFS.
- Used parameterized Pig scripts and optimized them using ILLUSTRATE and EXPLAIN.
- Involved in configuring HA, handling Kerberos security issues, and NameNode failure restoration from time to time as part of zero downtime.
- Implemented the Fair Scheduler.
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Shell Scripting, Sqoop, CDH Distribution, Windows, Java 6, Eclipse, Ant, Log4j, and JUnit.
Hadoop Developer
Confidential - Chula Vista, CA
Responsibilities:
- Hands-on experience joining raw data with the reference data using Pig scripts.
- Wrote custom UDFs in Hive.
- Hands-on experience extracting data from different databases and copying it into HDFS using Sqoop.
- Wrote a Sqoop incremental import job to move new and updated data from the database to HDFS.
- Created an Oozie coordinated workflow to execute the Sqoop incremental job daily.
- Used Oozie workflow engine to run multiple Hive and Pig jobs.
- Hands-on experience exporting the results into relational databases using Sqoop for visualization and report generation for the BI team.
- Involved in installing and configuring Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Worked with clients on requirements based on their business needs.
- Communicated deliverable status to users, stakeholders, and clients, and drove periodic review meetings.
- Completed tasks and projects on time and per quality goals.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, ZooKeeper, Ganglia, Unix, and Shell scripting.
Hadoop Developer
Confidential - Miami, FL
Responsibilities:
- Integrated Kafka with Storm for real-time data processing and wrote Storm topologies to store the processed data directly in MongoDB and HDFS.
- Experience in writing Spark SQL scripts.
- Imported data from different sources into Spark RDD for processing.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Involved in loading data from edge node to HDFS using shell scripting.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slots configuration.
- Completed unit testing of the new Hadoop jobs in standalone mode, designated for the Unit region, using MRUnit.
- Developed Spark scripts using Scala and Python shell commands as per the requirements.
- Experience in managing and reviewing Hadoop log files.
- Experience in Hive partitioning and bucketing, performing joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Avro.
- Optimized Hive analytics SQL queries, created tables and views, and wrote custom UDFs and Hive-based exception processing.
- Involved in transferring legacy Teradata tables to HDFS and HBase tables using Sqoop, and vice versa.
- Configured the Fair Scheduler to provide fair resources to all the applications across the cluster.
Environment: Hortonworks Hadoop, Ambari, Spark, Solr, Kafka, MongoDB, Linux, HDFS, Hive, Pig, Sqoop, Flume, Zookeeper, RDBMS.
Java/J2EE Developer
Confidential - Roseville, CA
Responsibilities:
- Wrote design documents based on requirements from the MMSEA user guide.
- Performed requirement gathering, design, coding, testing, implementation and deployment.
- Worked on modeling of dialog processes and business processes, and coded Business Objects, QueryMapper, and JUnit files.
- Involved in the design and creation of class diagrams, sequence diagrams, and activity diagrams using UML models.
- Created the Business Object methods in Java and integrated the activity diagrams.
- Involved in developing JSP pages using Struts custom tags, jQuery and Tiles Framework.
- Used JavaScript to perform client-side validation and the Struts Validator framework for server-side validation.
- Worked on web services using SOAP and WSDL.
- Wrote Query Mappers and JUnit test cases, with MQ experience.
- Developed the UI using XSL and JavaScript.
- Managed software configuration using ClearCase and SVN.
- Designed, developed, and tested features and enhancements.
- Performed error rate analysis of production issues and technical errors.
- Developed a test environment for testing all the web services exposed as part of the core module and their integration with partner services in integration testing.
- Analyzed the user requirements document and developed a test plan covering test objectives, test strategies, test environment, and test priorities.
- Responsible for performing end-to-end system testing of the application and writing JUnit test cases.
- Performed functional, performance, integration, regression, smoke, and user acceptance testing (UAT).
- Converted complex SQL queries running on mainframes into Pig and Hive as part of a migration from mainframes to a Hadoop cluster.
Environment: Shell Scripting, Java 6, JEE, Spring, Hibernate, Eclipse, Oracle 10g, JavaScript, Servlets, Node.js, JMS, Ant, Log4j, JUnit, Hadoop (Pig & Hive).
Java Developer
Confidential
Responsibilities:
- Involved in the design and implementation of the architecture for the project using OOAD, UML, and design patterns.
- Involved in design and development of the server-side layer using XML, JSP, JDBC, JNDI, EJB, and DAO patterns in the Eclipse IDE.
- Made extensive use of HTML, CSS, JavaScript, and Ajax for client-side development and validation.
- Used parsers to convert XML files to Java objects and vice versa.
- Developed screens using XML documents and XSL.
- Developed client programs, using Apache Axis, for consuming the web services published by the Country Defaults Department, which keeps track of information such as life span, inflation rates, and retirement age.
- Developed JavaBeans and JSPs using Spring and JSTL tag libraries for supplements.
- Developed EJBs, servlets, and JSP files for implementing business rules and security options using IBM WebSphere.
- Involved in creating tables and stored procedures in SQL for data manipulation and retrieval using SQL Server, Oracle, and DB2.
- Trained end users on the developed application.
Environment: Java, JSF Framework, Eclipse IDE, Ajax, Apache Axis, OOAD, UML, WebLogic, JavaScript, HTML, XML, CSS, SQL Server, Oracle, Web services, Spring, Windows.
