Sr. Hadoop Developer Resume
NY
PROFESSIONAL SUMMARY:
- 7 years of professional IT experience across all phases of the Software Development Life Cycle (SDLC), including hands-on experience in Java/J2EE technologies and Big Data analytics.
- 4 years of experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie and AWS.
- Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
- Excellent understanding of NoSQL databases such as MongoDB, HBase and Cassandra.
- Strong experience working with Hadoop distributions such as Cloudera, Hortonworks, MapR and Apache.
- Experience installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
- Experience with AWS services such as EMR, EC2, S3, CloudFormation and Redshift for fast, efficient processing of Big Data.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, MapReduce, HDFS Federation (Hadoop Gen2), High Availability and YARN, and a good understanding of workload management, scalability and distributed platform architectures.
- Good understanding of R Programming, Data Mining and Machine Learning techniques.
- Strong experience in real-time data analytics using Storm, Kafka, Flume and Spark.
- Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
- Involved in upgrading existing MongoDB instances from version 2.4 to 2.6, including upgrading security roles and implementing newer features.
- Responsible for performing reads and writes to Cassandra from a web application using Java JDBC connectivity.
- Experience extending Hive and Pig core functionality using custom UDFs and UDAFs.
- Experience debugging MapReduce jobs using counters and MRUnit tests.
- Expertise in writing real-time processing applications using spouts and bolts in Storm.
- Good understanding of Spark MLlib algorithms such as classification, clustering and regression.
- Good understanding of Spark Streaming with Kafka for real-time processing.
- Extensive experience working with Spark features such as RDD transformations, Spark MLlib and Spark SQL.
- Experienced in moving data from different sources using Kafka producers and consumers, and pre-processing data using Storm topologies.
- Experienced in migrating ETL transformations to Pig Latin scripts, including transformations and join operations.
- Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
- Good knowledge of streaming data from different sources, such as log files, JMS and application sources, into HDFS using Flume sources.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Worked on Docker-based containerized applications.
- Knowledge of data warehousing and ETL tools like Talend and Pentaho.
- Experienced in using monitoring tools such as Cloudera Manager, Ambari and Ganglia to check cluster status.
- Experience testing MapReduce programs using MRUnit and JUnit.
- Extensive experience in middle-tier development using J2EE technologies such as JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
- Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, Node.js, Ajax, jQuery and AngularJS.
- Extensive experience working with SOA-based architectures using REST-based web services (JAX-RS) and SOAP-based web services (JAX-WS).
- Experience with version control systems such as SVN and Git (GitHub), JIRA/Mingle for issue tracking and Crucible for code reviews.
- Experience with application servers such as JBoss, Tomcat, WebLogic and IBM WebSphere.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.
Hadoop Distributions: Cloudera, MapR, Hortonworks
Languages: Java, Scala, Python, SQL, HTML, JavaScript and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
Web Design Tools: HTML, JavaScript, jQuery, CSS and AngularJS
Development/Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating Systems: UNIX, Linux, Mac OS and Windows variants
ETL/BI Tools: Talend, Tableau
PROFESSIONAL EXPERIENCE:
Confidential, NY
Sr. Hadoop Developer
Responsibilities:
- Involved in importing and exporting data between the Hadoop data lake and relational systems such as Oracle and MySQL using Sqoop.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Experienced in writing Spark applications in Scala and Python (PySpark).
- Imported Avro files using Apache Kafka and performed analytics using Spark with Scala.
- Extracted real-time data using Kafka and Spark Streaming by creating DStreams, converting them into RDDs, processing them and storing the results in Cassandra.
- Experience building real-time data pipelines with Kafka Connect and Spark Streaming.
- Configured, deployed and maintained multi-node Dev and Test Kafka Clusters.
- Processed and transferred the data from Kafka into HDFS through Spark Streaming APIs.
- Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting and grouping.
- Developed scripts that load data into Spark DataFrames and perform in-memory computation to generate the output response.
- Used sbt to build Scala-based Spark projects and executed them using spark-submit.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Built Cassandra nodes on AWS and set up the Cassandra cluster using Ansible automation.
- Worked extensively with Amazon Web Services (AWS) cloud services such as EC2, S3, EMR, EBS, RDS and VPC.
- Used a highly available AWS environment to launch applications in different regions and implemented CloudFront with AWS Lambda to reduce latency.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Developed Oozie Bundles to Schedule Pig, Sqoop and Hive jobs to create data pipelines.
- Experience using ORC, Avro, Parquet, RCFile and JSON file formats; developed UDFs for Hive and Pig.
- Developed Hive queries to analyze the data and generate end reports for business users.
- Wrote extensive Hive queries to transform data for use by downstream models.
- Used Spark and Spark SQL with the Scala API to read Parquet data and create tables in Hive.
- Experience writing and tuning extensive Impala queries and creating views for ad hoc and business processing.
- Designed solutions for various system components using Microsoft Azure.
- Wrote a generic, extensive data quality check framework using Impala for use by the application.
- Generated various marketing reports using Tableau with Hadoop as a source for data.
- Migrated MapReduce programs into Spark transformations using Spark and Scala; the initial migration was done in Python (PySpark).
- Involved in ingesting data into Cassandra and moving the ingested data from Cassandra to the Hadoop data lake.
- Involved in Cassandra data modeling and building efficient data structures.
- Wrote a Storm topology to emit data into Cassandra.
- Understanding of Kerberos authentication in Oozie workflows for Hive and Cassandra.
- Developed complex Talend ETL jobs to migrate the data from flat files to database.
Environment: Hadoop, Hive, Impala, Oracle, Spark, Python, Pig, Sqoop, Oozie, MapReduce, Git, HDFS, Cassandra, Apache Kafka, Storm, Linux, Tableau, Solr, Confluence, Jenkins, Jira
Confidential, WI
Hadoop Developer
Responsibilities:
- Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
- Extensively used the Spark stack to develop pre-processing jobs using the RDD, Dataset and DataFrame APIs to transform the data for upstream consumption.
- Developed real-time data processing applications using Scala and Python and implemented Apache Spark Streaming from various streaming sources such as Kafka, Flume and JMS.
- Worked on extracting and enriching HBase data across multiple tables using joins in Spark.
- Worked on writing APIs to load the processed data to HBase tables.
- Replaced the existing MapReduce programs with Spark applications using Scala.
- Built on-premises data pipelines using Kafka and Spark Streaming, fed by the API streaming gateway REST service.
- Developed the Hive UDF’s to handle data quality and create filtered datasets for further processing
- Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
- Good knowledge of the Kafka Streams API for data transformation.
- Implemented a logging framework, the ELK stack (Elasticsearch, Logstash, Kibana), on AWS.
- Set up Spark on EMR to process large volumes of data stored in Amazon S3.
- Developed Oozie workflow for scheduling & orchestrating the ETL process.
- Used Talend tool to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs and analyzed the performance of StreamSets and Kafka Streams.
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioural patterns.
- Wrote, optimized and tested Pig Latin scripts.
- Used the Python Pandas and NumPy modules for data analysis, data scraping and parsing.
- Deployed applications using Jenkins integrated with Git version control.
- Participated in production support on a regular basis to support the Analytics platform
- Used Rally for task/bug tracking.
- Used Git for version control.
Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.
Confidential, Boston, MA
Hadoop Developer
Responsibilities:
- Participated in configuring the Hadoop cluster and load balancing across the nodes.
- Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
- Created Hive queries to compare the raw data with EDW reference tables and perform aggregates.
- Experienced in developing custom input formats and data types to parse and process unstructured and semi-structured input data and map it into key-value pairs to implement business logic in MapReduce.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Experienced in analyzing data with Hive and Pig.
- Experienced in designing RESTful services using Java-based APIs such as Jersey.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Integrated bulk data into the Cassandra File System using MapReduce programs.
- Expertise in design and data modeling for the Cassandra NoSQL database.
- Experienced in managing and reviewing Hadoop log files
- Defined multiple job flows using Oozie workflow.
- Worked with Spark on top of YARN (MRv2) for interactive and batch analysis.
- Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues.
- Expertise in writing Scala code using higher-order functions for iterative algorithms in Spark to improve performance.
- Experience in managing and monitoring Hadoop cluster using Cloudera Manager.
- Experienced in analyzing and optimizing RDDs by controlling partitioning for the given data.
- Good understanding of the DAG for the entire Spark application flow in the Spark web UI.
- Experienced in writing real-time processing applications using Spark Streaming with Kafka.
- Developed custom mappers in Python, and Hive UDFs and UDAFs based on the given requirements.
- Experience implementing custom serializers, interceptors, sources and sinks in Flume, as required, to ingest data from multiple sources.
- Experience setting up fan-in flows in Flume, a V-shaped architecture that takes data from many sources and ingests it into a single sink.
- Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in requirement gathering to set up a cluster.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Unit tested a sample of raw data, improved performance and turned it over to production.
Environment: CDH, Java (JDK 1.7), Hadoop, MapReduce, HDFS, Hive, Sqoop, Flume, Cassandra, Pig, Oozie, Kerberos, Scala, Spark SQL, Spark Streaming, Kafka, Linux, AWS, Shell Scripting, MySQL, Oracle 11g, SQL*Plus
Confidential
Java Developer
Responsibilities:
- Coded front-end components using HTML, JavaScript and jQuery; back-end components using Java, Spring and Hibernate; service-oriented components using RESTful and SOAP-based web services; and rules-based components using JBoss Drools.
- Involved in design and development phases of Software Development Life Cycle (SDLC).
- Proficient in writing SQL queries and stored procedures for multiple databases, including Oracle and SQL Server 2005.
- Integrated Spring dependency injection across the different layers of the application, with Hibernate as the O/R mapping tool, for rapid development and ease of maintenance.
- Wrote stored procedures using PL/SQL. Performed query optimization to achieve faster indexing and make the system more scalable.
- Worked with Struts MVC objects such as Action Servlet, Controllers, validators, Web Application Context, Handler Mapping, Message Resource Bundles and Form Controller, and with JNDI look-up for J2EE components.
- Implemented connectivity to the database server using JDBC.
- Developed RESTful web services using Spring IoC to give users a way to run the job and generate a daily status report.
- Developed and exposed the SOAP web services by using JAX-WS, WSDL, AXIS, JAXP and JAXB.
- Configured domains in production, development and testing environments using configuration wizard.
- Created SOAP Handler to enable authentication and audit logging during Web Service calls.
- Created service layer APIs and domain objects using Struts.
- Used AJAX and JavaScript for validations and integrating business server-side components on the client side within the browser.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Implemented the project using Agile Scrum methodology; involved in daily stand-up meetings, sprint showcases and sprint retrospectives.
- Developed the user interface using JSP, JSP tag libraries and JavaScript to simplify the complexities of the application.
- Developed a Dojo based front end including forms and controls and programmed event handling.
- Used XSLT to transform XML data structures into HTML pages.
- Deployed EJB Components on Tomcat. Used JDBC API for interaction with Oracle DB.
- Developed the UI panels using JSF, XHTML, CSS, Dojo and jQuery.
Environment: Java 6 (JDK 1.6), JEE, Spring 3.1 framework, Spring Model View Controller (MVC), JavaServer Pages (JSP) 2.0, Servlets 3.0, JDBC 4.0, AJAX, Web Services, REST API, JSON, JavaBeans, jQuery, JavaScript, Oracle 10g, JUnit, HtmlUnit, XSLT, HTML/DHTML.
Confidential
Java Developer
Responsibilities:
- Used message-driven beans for asynchronous processing of customer alerts.
- Used the Struts framework to generate forms and actions for validating user request data.
- Developed server-side validation checks using Struts validators, along with JavaScript validations.
- Developed and implemented data validations using JSPs and Struts custom tags.
- Developed applications that access the database with JDBC to execute queries, prepared statements and procedures.
- Developed programs to manipulate data and perform CRUD operations against the database on request.
- Worked on developing Use Cases, Class Diagrams, Sequence diagrams, and Data Models.
- Developed and Deployed SOAP Based Web Services on Tomcat Server
- Coded SQL, PL/SQL and views for the IBM DB2 database.
- Worked on issues that arose while converting Java components to AJAX.
- Supported development of the business tier using stateless session beans.
- Extensively used JDBC to access the database objects.
- Used ClearCase for source code control and JUnit for unit testing.
- Reviewed code and performed integrated module testing.
Environment: Java 5, J2EE 1.4, AJAX, Struts 1.0, Web Services, SOAP, HTML, XML, JSP, JDBC, Ant, IBM, Tomcat, JUnit, DB2, Rational Rose, Eclipse Helios, CVS.
