Hadoop Developer Resume
Boston
SUMMARY
- Qualified IT professional with 5+ years of experience, including 3+ years as a Hadoop developer.
- Working experience across all phases of the SDLC, including requirement analysis, design, code construction, and testing.
- 2+ years of experience in software development, building solutions for enterprise and web-based applications using Java and J2EE technologies.
- Experience with Hadoop distributions from Amazon, Cloudera, and Hortonworks, and with Amazon cloud ecosystem components such as Redshift, DynamoDB, and EMR.
- Experience with HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, HBase, Impala, Cassandra, ZooKeeper, and Hue.
- Experience in schema design and working knowledge of NoSQL databases such as HBase and Cassandra.
- Experience in Python and shell scripting.
- Experience importing and exporting data using Flume and Kafka.
- Experience in data modeling with Cassandra.
- Good knowledge of Hadoop ecosystem components, cluster architecture, and cluster monitoring.
- Basic knowledge of the real-time processing tools Storm and Spark, including Spark Streaming, Spark SQL, and building Spark applications in Scala.
- Exposure to Spark architecture and how RDDs work internally.
- Experience in the Scala programming language, used for data processing with Spark.
- Experience loading CSV, JSON, and Parquet files in Apache Spark using the DataFrame API (see the sketch after this list).
- Experience analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Experience with monitoring tools such as Ganglia, Cloudera Manager, and Ambari.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems, including Teradata.
- Hands-on experience with production deployments and code fixes.
- Basic knowledge of machine learning and predictive analytics.
- Good experience with agile approaches, including Scrum.
- Worked with Talend data visualization tools.
- Wrote custom UDFs to extend Hive and Pig core functionality.
- Worked with Postgres, MySQL, and Oracle 10g.
- Development experience with the Eclipse and NetBeans IDEs.
- Passionate about working with the most cutting-edge Big Data technologies.
- Able to adapt to evolving technology, with a strong sense of responsibility and accomplishment.
- Willing to update my knowledge and learn new skills to meet business requirements.
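The sketch below illustrates the DataFrame API usage referenced in the summary: loading CSV, JSON, and Parquet files. It assumes the Spark 2.x Java API (SparkSession); the application name, file paths, and reader options are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class FileLoadExample {
    public static void main(String[] args) {
        // Local master only so the sketch can run standalone; a real job
        // would get its master from spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("FileLoadExample")
                .master("local[*]")
                .getOrCreate();

        // CSV with a header row; schema inferred for brevity (hypothetical path)
        Dataset<Row> csv = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///data/input/events.csv");

        // JSON records, one object per line (hypothetical path)
        Dataset<Row> json = spark.read().json("hdfs:///data/input/events.json");

        // Columnar Parquet files (hypothetical path)
        Dataset<Row> parquet = spark.read().parquet("hdfs:///data/input/events.parquet");

        System.out.println("CSV rows: " + csv.count());
        json.printSchema();
        parquet.show(5);

        spark.stop();
    }
}
```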
TECHNICAL SKILLS
Hadoop Technologies: HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Flume, Oozie, ZooKeeper, Ambari, Hue, Spark, Storm, Talend, Ganglia, Kafka
Operating System: Windows, Unix, Linux
Languages: Java, J2EE, SQL, PL/SQL, Shell Script, Python
Project Management Tools: MS Project, MS Office, TFS, HP Quality Center Tool
Front-End: HTML, JSTL, DHTML, JavaScript, CSS, XML, XSL, XSLT
Databases: MySQL, Oracle 11g/10g/9i, SQL Server
NoSQL Databases: HBase, Cassandra
File System: HDFS
Reporting Tools: Jasper Reports, Tableau, Talend
IDE Tools: Eclipse, NetBeans
Application Servers: Apache Tomcat, WebLogic
PROFESSIONAL EXPERIENCE
Confidential, Boston
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Wrote MapReduce jobs using Pig Latin and the Java API.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan for a proof of concept on Impala.
- Migrated HiveQL queries to Impala to minimize query response time.
- Handled Hive queries using Spark SQL, which integrates with the Spark environment.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Responsible for creating Hive tables, loading them with the structured data produced by MapReduce jobs, and writing Hive queries to further analyze the logs and identify issues and behavioral patterns.
- Worked with sequence files, RC files, map-side joins, bucketing, and partitioning to improve Hive performance and storage.
- Implemented daily cron jobs that automate parallel data-loading tasks into HDFS using Oozie coordinator jobs.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts to ingest data from relational databases and compare it with historical data.
- Loaded data from the Teradata database into HDFS using Sqoop.
- Submitted and tracked MapReduce jobs using the JobTracker.
- Migrated MapReduce jobs to Spark jobs.
- Loaded structured and semi-structured data into Spark clusters using Spark SQL and the DataFrame API.
- Integrated Spark Streaming with Sprinkler using a pull mechanism and loaded JSON data from social media into HDFS.
- Participated in the requirement and design phases to implement a streaming Lambda architecture for real-time processing using Spark and Kafka.
- Created Oozie workflow and coordinator jobs to launch jobs on schedule as data became available.
- Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregations.
- Used visualization tools such as Power View for Excel and Tableau to visualize data and generate reports.
- Exported data to Tableau and to Excel with Power View for presentation and refinement.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources (a minimal sketch follows this job entry).
- Implemented Hive generic UDFs to encapsulate business logic.
- Extracted data from Cassandra using Sqoop, placed it in HDFS, and processed it.
- Implemented test scripts to support test-driven development and continuous integration.
- Followed a story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Apache Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Cassandra, Linux, Maven, Teradata, ZooKeeper, Tableau.
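A minimal sketch of the kind of Pig UDF mentioned above, written in Java by extending EvalFunc; the class name and the normalization rule are hypothetical stand-ins for the actual business logic.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: normalizes a string field so downstream joins match reliably.
public class NormalizeCountryCode extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        // Trim whitespace and upper-case the raw field value
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In a Pig script, the jar containing such a UDF is loaded with REGISTER and the function is aliased with DEFINE before being called in a FOREACH ... GENERATE statement.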
Confidential - Minneapolis, MN
Hadoop Developer
Responsibilities:
- Wrote transformer/mapping MapReduce pipelines in Java (a minimal sketch follows this job entry).
- Created Hive tables, loaded them with data, and wrote Hive queries that invoke MapReduce jobs in the backend.
- Designed and implemented incremental imports into Hive tables.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Collected, aggregated, and moved data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Managed and reviewed Hadoop log files.
- Migrated ETL jobs to Pig scripts that perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Worked with the Avro data serialization system to handle JSON data formats.
- Worked with different file formats such as sequence files, XML files, and map files using MapReduce programs.
- Performed unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Developed scripts to automate end-to-end data management and synchronization between all the clusters.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Used Cassandra as a NoSQL database.
- Created and maintained technical documentation for launching Hadoop clusters and executing Pig scripts.
Environment: Hadoop, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Linux, Java, Eclipse, Cassandra.
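A minimal sketch of a transformer-style Mapper like those in the pipelines above; the log line layout (timestamp, host, message separated by spaces) and the tab-separated output are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: keys each log line by host and emits timestamp + message.
public class LogParseMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Text outKey = new Text();
    private final Text outValue = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed layout: timestamp<space>host<space>message
        String[] parts = value.toString().split(" ", 3);
        if (parts.length == 3) {
            outKey.set(parts[1]);                      // host
            outValue.set(parts[0] + "\t" + parts[2]);  // timestamp + message
            context.write(outKey, outValue);
        }
    }
}
```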
Confidential - Phoenix, AZ
Hadoop Developer
Responsibilities:
- Responsible for loading customer data and event logs from MSMQ into HBase using the Java API (a minimal sketch follows this job entry).
- Created HBase tables to store variable-format input data coming from different portfolios.
- Added huge volumes of data as rows and columns in HBase.
- Used Sqoop for transferring data from HBase to HDFS and vice versa.
- Responsible for architecting Hadoop clusters with CDH4 on CentOS and managing them with Cloudera Manager.
- Initiated and successfully completed a proof of concept on Flume for pre-processing, improved reliability, and easier scalability compared with traditional MSMQ.
- Used Flume to collect log data from different sources and transfer it to Hive tables, using different SerDes to store data in JSON, XML, and sequence file formats.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Performed end-to-end performance tuning of Hadoop clusters and MapReduce routines against very large data sets.
- Developed Pig UDFs to pre-process the data for analysis.
- Monitored Hadoop cluster job performance, performed capacity planning, and managed cluster nodes.
- Proficient in Cloudera Manager, an end-to-end tool for managing Hadoop operations.
Environment: Hadoop (CDH4), Big Data, HDFS, Pig, Hive, MapReduce, Sqoop, Cloudera Manager, Linux, Flume, HBase.
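A minimal sketch of writing an event record into HBase with the Java client, as referenced above. It assumes the HBase 1.x client API (ConnectionFactory/Table); the table name, column family, qualifiers, and row key layout are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EventLoader {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer_events"))) {
            // Hypothetical row key: customer id + event timestamp
            Put put = new Put(Bytes.toBytes("cust42_20150101T101500"));
            put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("type"), Bytes.toBytes("login"));
            put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("source"), Bytes.toBytes("msmq"));
            table.put(put);
        }
    }
}
```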
Confidential, NY
JAVA/J2EE Developer
Responsibilities:
- Actively responsible for analysis, design, implementation, and deployment across the project's full software development lifecycle (SDLC).
- Designed and developed user interface using JSP, HTML and JavaScript.
- Developed Struts action classes, action forms and performed action mapping using Struts framework and performed data validation in form beans and action classes.
- Extensively used Struts framework as the controller to handle subsequent client requests and invoke the model based upon user requests.
- Defined the search criteria, retrieved the customer's record from the database, made the required changes, and saved the updated record back to the database.
- Validated the fields of the user registration and login screens by writing JavaScript validations.
- Developed build and deployment scripts using Apache ANT to customize WAR and EAR files.
- Used the DAO pattern and JDBC for database access (a minimal sketch follows this job entry).
- Developed stored procedures and triggers in PL/SQL to calculate and update tables that implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post-production support and maintenance of the application.
Environment: Oracle 11g, Java 1.5, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6.
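A minimal DAO sketch using plain JDBC, in the spirit of the DAO/JDBC item above; it is written with try-with-resources (Java 7+) for brevity, and the customers table, column names, and connection details are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CustomerDao {
    private final String url;
    private final String user;
    private final String password;

    public CustomerDao(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    // Looks up a customer's name by id; returns null if no row matches.
    public String findCustomerName(long customerId) throws SQLException {
        String sql = "SELECT name FROM customers WHERE id = ?";
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("name") : null;
            }
        }
    }
}
```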
Confidential
Java / Web Developer
Responsibilities:
- Familiar with various aspects of agile methodologies, such as Scrum and task estimation.
- Used various design patterns such as Factory, Singleton, Session Facade, DAO, DTO, Service Locator, and transaction management.
- Developed web services using SOAP for sending data to and receiving data from external interfaces.
- Participated in requirement gathering, requirement analysis, scope definition, and design.
- Worked with various J2EE components such as Servlets, JSPs, JNDI, and JDBC on the WebLogic application server.
- Developed and coded the interfaces and classes required for the application and created appropriate relationships between the system classes and the provided interfaces.
- Assisted project managers with drafting use case scenarios during the planning stages.
- Developed use cases, class diagrams, and sequence diagrams.
- Used JavaScript for client-side validation.
- Used HTML, CSS, and JavaScript to create web pages.
- Deployed servlets and JSP pages on the Apache Tomcat server (a minimal sketch follows this job entry).
Environment: Java, J2EE, JDBC, HTML, CSS, JavaScript, Servlets, JSP, Oracle, Eclipse, WebLogic
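A minimal HttpServlet sketch of the kind deployed on Tomcat for these pages; the servlet class, request parameter, and output are hypothetical, and the URL mapping would be declared in web.xml.

```java
import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet: echoes a greeting for the "name" request parameter.
public class GreetingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        String name = request.getParameter("name");
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h1>Hello, " + (name != null ? name : "guest") + "</h1>");
        out.println("</body></html>");
    }
}
```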