Hadoop Developer Resume
Cambridge, MA
SUMMARY
- A dynamic professional with 8+ years of experience in the IT industry, including 4+ years of solid software development experience on mission-critical, data-intensive Big Data and Hadoop applications.
- Experience in developing applications that perform large-scale distributed data processing using Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, Java, Spark, Storm, HBase, Cassandra, Kafka, ZooKeeper and Flume.
- Excellent understanding and knowledge of Big Data and Hadoop architecture.
- In-Depth knowledge and experience in design, development and deployments of Big Data projects using Hadoop / Data Analytics / NoSQL / Distributed Machine Learning frameworks.
- Good understanding of Spark architecture and its various components, such as Spark Streaming, Spark SQL and the SparkR programming paradigm.
- Solid experience using YARN and tools like Pig and Hive for data analysis, Sqoop for data ingestion, Oozie for scheduling and Zookeeper for coordinating cluster resources.
- Excellent understanding on CDH, HDP, Pivotal, MapR and Apache Hadoop distributions.
- Good understanding of HDFS Designs, Daemons, HDFS high availability (HA), HDFS Federation.
- Experience in analyzing data using Pig Latin, HiveQL, HBase and custom MapReduce programs in Java.
- Excellent understanding of Hadoop architecture and the different daemons of a Hadoop cluster, including the Resource Manager, Node Manager, NameNode and DataNode.
- Experience in developing cloud computing applications using Amazon EC2, S3 and EMR.
- Excellent experience in working with high velocity Real-time data processing frameworks using tools like Kafka, Spark and Storm.
- Expert in working with the Hive data warehouse: creating tables and managing data distribution by implementing partitioning and bucketing.
- Expertise in implementing ad-hoc queries using HiveQL and writing Pig Latin scripts to sort, group, join and filter data as part of data transformation per the business requirements.
- Experienced with different compression techniques (LZO, Snappy, Gzip, Bzip2) to save storage and optimize data transfer over the network.
- Experience with NoSQL databases like HBase, Cassandra, MongoDB and Couchbase, as well as other ecosystem components like ZooKeeper, Oozie, Impala, Storm, Spark Streaming, Spark SQL, Kafka and Flume.
- Extended Hive and Pig core functionality with custom UDFs written in Java (a minimal UDF sketch follows this summary).
- Experience in importing and exporting data with Sqoop between HDFS (Hive & HBase) and relational database systems (Oracle, MySQL, DB2, Informix, Teradata).
- Hands-on experience setting up workflows with the Apache Oozie workflow engine, managing and scheduling Hadoop jobs via the Oozie Coordinator.
- Good understanding of SQL database concepts and ETL/data warehousing technologies like Talend.
- Extensive experience in Requirements gathering, Analysis, Design, Reviews, Coding and Code Reviews, Unit and Integration Testing.
- Extensive experience using Java/JEE design patterns such as Singleton, Factory, MVC and Front Controller to reuse the most effective and efficient strategies.
- Expertise in using IDEs like WebSphere Studio (WSAD), Eclipse, NetBeans, MyEclipse and WebLogic Workshop.
- Experience in developing service components using JDBC.
- Experience in designing and developing web services (SOAP and RESTful).
- Solid experience developing applications using the Scrum methodology.
- Good understanding and experience with Software Development methodologies like Agile and Waterfall.
- Excellent communication and analytical skills; flexible and eager to learn new technologies that contribute to the company's success.
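A minimal sketch of the kind of Java Hive UDF referenced above; the class name, function registration and normalization logic are hypothetical and shown only for illustration.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes a free-text code column before aggregation in Hive.
// Registered with: ADD JAR normalize-udf.jar; CREATE TEMPORARY FUNCTION normalize_code AS '...';
public class NormalizeCodeUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;                       // pass NULLs through untouched
        }
        String value = input.toString().trim().toUpperCase();
        return new Text(value.isEmpty() ? "UNKNOWN" : value);
    }
}
```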
TECHNICAL SKILLS
Languages/Tools: Java, XML, XSLT, HTML/XHTML, HDML, DHTML, Python, Scala, R, Git.
Big Data Technologies: Apache Hadoop, HDFS, Spark, Hive, Pig, Talend, HBase, Sqoop, Oozie, ZooKeeper, Mahout, Kafka, Storm, Impala.
Java Technologies: JSE (Java architecture, OOP concepts); JEE (JDBC, JNDI, JSF (JavaServer Faces), Spring, Hibernate, SOAP/REST web services).
Web Technologies: HTML, XML, JavaScript, WSDL, SOAP, JSON, AngularJS.
Databases/NoSQL: MS SQL Server, MySQL, Oracle, MS Access, Teradata, Netezza, Greenplum, HBase, Cassandra, MongoDB.
PROFESSIONAL EXPERIENCE
Confidential, Cambridge, MA
Hadoop Developer
Responsibilities:
- Maintained system integrity of all sub-components (primarily HDFS, MapReduce, HBase and Hive).
- Migrated the required data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
- Integrated Apache Storm with Kafka to perform web analytics; uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Involved in creating mappings, sessions and workflows using a wide range of Informatica transformations.
- Developed multiple Kafka producers and consumers from scratch using the low-level and high-level APIs (a minimal producer sketch follows this section).
- Involved in moving data from Hive tables into Cassandra for real-time analytics.
- Wrote create, alter, insert and delete queries involving lists, sets and maps in DataStax Cassandra.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning and buckets.
- Wrote HiveQL scripts to create, load and query tables in Hive.
- Analyzed, developed and implemented the ETL architecture using Erwin and Informatica.
- Worked with HiveQL on big data of logs to perform a trend analysis of user behavior on various online modules.
- Used a wide variety of Informatica transformations to load all kinds of data and to automate the process.
- Supported MapReduce programs running on the cluster.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm and webMethods technologies.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Streamed data in real time using Spark with Kafka.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables through the Hive ODBC connector.
- Strongly recommended bringing in Elasticsearch and was responsible for its installation, configuration and administration.
- Developed and maintained efficient Talend ETL jobs for data ingestion.
- Worked on the Talend RTX ETL tool; developed and scheduled jobs in the Talend Integration Suite.
- Modified reports and Talend ETL jobs based on feedback from QA testers and users in the development and staging environments.
- Involved in migrating Hadoop jobs to higher environments such as SIT, UAT and Prod.
Environment: Hortonworks, HDFS, Hive, HQL scripts, Scala, MapReduce, Storm, Spark, Java, HBase, Cassandra, Pig, Sqoop, Shell Scripts, Oozie Coordinator, MySQL, Tableau, Elasticsearch, Talend, Informatica and SFTP.
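A minimal sketch of a Kafka producer for clickstream events like those described above, using the newer Java producer client; the broker address, topic name and payload are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical clickstream producer; broker, topic and message contents are placeholders.
public class ClickstreamProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Each record carries one click event as a JSON string, keyed by user id so that
        // events for the same user land in the same partition.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clickstream", "user-123", "{\"page\":\"/home\"}"));
        }
    }
}
```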
Confidential, Indianapolis, IN
Hadoop Developer
Responsibilities:
- Coordinated with business customers to gather business requirements and interacted with other technical peers to derive technical requirements.
- Developed Sqoop jobs to import and store massive volumes of data in HDFS and Hive.
- Developed MapReduce, Pig and Hive scripts to cleanse, validate and transform data.
- Worked on performance issues, tuning the Pig and Hive scripts.
- Used the Oozie workflow engine to orchestrate multiple Hive and Pig jobs.
- Designed and developed PIG data transformation scripts to work against unstructured data from various data points and created a base line.
- Involved in implementing Spark RDD transformations and actions for business analysis (a minimal RDD sketch follows this section).
- Worked on creating and optimizing Hive scripts for data analysts based on the requirements.
- Created Hive UDFs to encapsulate complex and reusable logic for the end users.
- Experienced in working with Sequence files and compressed file formats.
- Converted raw files (CSV, TSV) to different file formats such as Parquet and Avro, with data type conversion, using Cascading.
- Developed the code for removing or replacing error fields in the data using Cascading.
- Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.
- Monitored the Cascading flows using Driven to ensure the desired results were obtained.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Developed utility helper classes to get data from HBase tables.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
Environment: Cloudera, Linux (CentOS, RedHat), UNIX Shell, Pig, Hive, MapReduce, YARN, Apache Spark, Eclipse, Core Java, JDK 1.7, Oozie Workflows, AWS, EMR, HBase, Cassandra, Sqoop, Scala, Kafka.
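A minimal sketch of the Spark RDD transformation/action pattern mentioned above, written against the Java API with Java 8 lambdas for brevity; the input path, delimiter and column layout are assumptions.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

// Hypothetical job: counts events per customer from a comma-delimited file in HDFS.
public class EventCounts {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("EventCounts");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///data/events/");

            // Transformations: keep well-formed rows and key them by customer id (first column).
            JavaPairRDD<String, Integer> counts = lines
                    .filter(line -> line.split(",").length > 1)
                    .mapToPair(line -> new Tuple2<>(line.split(",")[0], 1))
                    .reduceByKey(Integer::sum);

            // Action: write the aggregated counts back to HDFS.
            counts.saveAsTextFile("hdfs:///data/event_counts");
        }
    }
}
```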
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed solutions to process data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop extensively to import data from various systems/sources (such as MySQL) into HDFS.
- Applied Hive queries to perform data analysis on HBase using the storage handler in order to meet the business requirements.
- Created components such as Hive UDFs to supply functionality missing from Hive for analytics.
- Hands-on experience with NoSQL databases like HBase.
- Developed scripts and batch jobs to schedule a bundle (a group of coordinators) consisting of various Hadoop programs using Oozie.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in ETL, Data Integration and Migration.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Installed and configured Hadoop, MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal mapper sketch follows this section).
- Worked on Informatica ETL to parse the data, which was then loaded into HDFS.
- Responsible for building scalable distributed data solutions using Hadoop.
- Assisted in setting up the Hortonworks cluster and installing all the ecosystem components through Ambari and manually from the command line.
- Developed scripts for tracking the changes in file permissions of the files and directories through audit logs in HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in loading data from UNIX file system to HDFS.
- Load and transform large sets of structured, semi structured and unstructured data.
- Balanced HDFS manually to decrease network utilization and increase job performance.
- Set up automated processes to archive/clean the unwanted data on the cluster, in particular on HDFS and Local file system.
Environment: Cloudera, HDFS, Hive, Pig, Sqoop, Linux, HBase, Tableau, Informatica, MicroStrategy, Shell Scripting, Ubuntu, RedHat Linux.
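A minimal sketch of a map-only cleaning step like the Java MapReduce jobs mentioned above; the delimiter and expected field count are hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: drops malformed rows and trims whitespace before loading into Hive.
public class CleanRecordsMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5;   // assumed schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return;                                 // skip malformed record
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('|');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```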
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Pig and MapReduce to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Used Pig to perform transformations, event joins, bot-traffic filtering and some pre-aggregations before storing the data in HDFS.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Loaded the aggregated data onto DB2 for reporting on the dashboard.
- Monitoring and Debugging Hadoop jobs/Applications running in production.
- Building, packaging and deploying the code to the Hadoop servers.
- Moved data between Oracle and HDFS using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked with different file formats and compression techniques to determine standards.
- Developed Hive scripts for implementing control tables logic in HDFS.
- Designed and implemented static and dynamic partitioning and bucketing in Hive.
- Created HBase tables to store data in various formats coming from different portfolios and processed the data using Spark (a minimal HBase client sketch follows this section).
- Provided cluster coordination services through ZooKeeper.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Hands-on experience writing MapReduce code to turn unstructured data into structured data and to insert data into HBase from HDFS.
- Extracted data from MongoDB through Sqoop, placed it in HDFS and processed it.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
- Wrote shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
Environment: JDK, Ubuntu Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, MongoDB, ZooKeeper, HBase, Java, Shell Scripting, Informatica, Cognos, SQL, Teradata.
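A minimal sketch of writing a processed record into HBase with the Java client, as referenced above; the table name, row key design, column family and qualifiers are placeholders.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical writer: stores one parsed portfolio record in an HBase table.
public class PortfolioWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("portfolio_events"))) {
            // Row key combines portfolio id and event date; "d" is the data column family.
            Put put = new Put(Bytes.toBytes("portfolio1#2016-01-01"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("1250.00"));
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("currency"), Bytes.toBytes("USD"));
            table.put(put);
        }
    }
}
```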
Confidential
Java/J2EE Developer
Responsibilities:
- Analysis of system requirements and development of design documents.
- Development of Spring Services.
- Development of persistence classes using the Hibernate framework (a minimal entity sketch follows this section).
- Development of SOA services using the Apache Axis web service framework.
- Development of the user interface using Apache Struts 2.0, JSPs, Servlets, jQuery, HTML and JavaScript.
- Developed client functionality using ExtJS.
- Development of JUnit test cases to test business components.
- Extensively used Java Collection API to improve application quality and performance.
- Extensively used Java 5 features such as generics, the enhanced for loop and type safety.
- Provided production support and designed enhancements for the existing product.
Environment: Java 1.5, SOA, Spring, ExtJS, Struts 2.0, Servlets, JSP, GWT, jQuery, JavaScript, CSS, Web Services, XML, Oracle, WebLogic Application Server, Eclipse, UML, Microsoft Visio.
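A minimal sketch of a Hibernate-mapped persistence class like those mentioned above, using JPA annotations; the entity, table and column names are hypothetical.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

// Hypothetical entity mapped to a CUSTOMER table; Hibernate persists instances of this class.
@Entity
@Table(name = "CUSTOMER")
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    @Column(name = "NAME", nullable = false)
    private String name;

    public Long getId() { return id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}
```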
Confidential
Java Developer
Responsibilities:
- Involved in the analysis, design, implementation, and testing of the project.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Involved in creating the object-relational mapping using Hibernate.
- Developed web components using JSP, Servlets and JDBC (a minimal JDBC sketch follows this section).
- Implemented database using SQL Server.
- Consumed Web Services for transferring data between different applications.
- Wrote complex SQL and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Developed user and technical documentation.
Environment: Java, SQL, Servlets, HTML, XML, JavaScript, Spring, Hibernate.
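A minimal sketch of the JDBC access pattern used in web components like those above; the connection URL, credentials, table and column names are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical DAO method: looks up an order total with a parameterized query.
public class OrderDao {

    private static final String URL = "jdbc:sqlserver://localhost:1433;databaseName=orders";

    public double findOrderTotal(long orderId) throws SQLException {
        String sql = "SELECT total FROM orders WHERE order_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setLong(1, orderId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getDouble("total") : 0.0;
            }
        }
    }
}
```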