Hadoop Developer Resume
New York
SUMMARY:
- Around 9 years of experience in the IT industry across the complete software development life cycle (SDLC), including business requirements gathering, system analysis and design, data modeling, development, testing and implementation of projects.
- Around 5 years of experience in the development, implementation and configuration of Hadoop ecosystem tools such as HDFS, MapReduce, Hive, Oozie, Sqoop, NiFi, Kafka, ZooKeeper, Elasticsearch, Knox, Ranger, Cassandra, HBase, MongoDB, Spark Core, Spark Streaming, Spark DataFrames and Spark MLlib.
- Experienced in configuring, deploying and managing different Hadoop distributions such as Cloudera (CDH4 & CDH5) and Hortonworks (HDP).
- Experience importing and exporting data with Sqoop between the Hadoop Distributed File System and relational database systems.
- Experience in handling various file formats such as Avro, SequenceFile, text, XML, JSON and Parquet with different compression codecs such as gzip, LZO and Snappy.
- Experienced with the Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming and Spark MLlib.
- Imported data from HDFS into Spark DataFrames for in-memory computation to generate optimized output and better visualizations.
- Expertise in writing Spark RDD transformations and actions, DataFrames and case classes for the required input data; performed data transformations using Spark Core and converted RDDs to DataFrames.
- Experienced in collecting real-time streaming data and building pipelines for raw data from different sources using Kafka, storing the data in HDFS and NoSQL stores using Spark.
- Extended Hive core functionality with custom User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs).
- Implemented a POC using Impala for data processing on top of Hive to make better use of its C++ execution engine.
- Experience with the NoSQL databases HBase and Cassandra and their integration with Hadoop clusters.
- Implemented an HBase cluster as part of a POC to address HBase limitations.
- Explored beta Spark APIs to improve the performance and optimization of existing algorithms in different deployment modes such as YARN, Mesos and standalone for a POC.
- Expertise in using the ETL tool Informatica PowerCenter (Designer, Workflow Manager, Repository Manager), data quality and ETL concepts.
- Experienced with NiFi to automate the data movement between different Hadoop systems.
- Worked with Hadoop security tools such as Knox and Ranger, integrating an LDAP store with a Kerberos KDC.
- Good understanding of security requirements for Hadoop and of integration with Kerberos authentication and authorization infrastructure.
- Experienced in cloud integration with AWS using Elastic MapReduce (EMR), Simple Storage Service (S3), EC2 and Redshift, and with Microsoft Azure.
- Experienced with different relational database management systems such as Teradata, PostgreSQL, DB2, Oracle and SQL Server.
- Experienced in scheduling and monitoring production jobs using Oozie and Azkaban.
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Spark Core, Spark SQL, Spark Streaming, Spark MLlib, Impala, Kafka, YARN, Oozie, ZooKeeper, Solr, Sqoop, NiFi, Knox, Ranger and Kerberos.
Cloud Services: EMR, EC2, S3, CloudWatch, Redshift, BigQuery and MS Azure.
Languages: Java, Scala, Python, Pandas, R, PL/SQL, Unix Shell Scripting.
UI Technologies: HTML5, JavaScript, CSS3, Angular, XML, JSP, JSON, AJAX.
Development Tools: IntelliJ, Postman, Scala IDE, Jupyter, Zeppelin, Conda.
Frameworks/Web Server: Spring, JSP, Hibernate, WebLogic, WebSphere, Tomcat.
SQL/NoSQL Databases: Teradata, PostgreSQL, Oracle, HBase, MongoDB, Cassandra, MySQL and DB2.
Other tools: GitHub, Bitbucket, SVN, JIRA, Vagrant, Docker, Maven.
WORK HISTORY:
Confidential, New York
Hadoop Developer
Responsibilities:
- Actively involved in the installation, configuration, design, development and maintenance of a Hadoop cluster and its toolset, following the complete software development life cycle under an agile methodology.
- Working with the latest version of a Hadoop distribution, Hortonworks (HDP 2.x).
- Working on both batch and streaming data processing, with ingestion into NoSQL stores and HDFS in file formats such as Parquet and Avro.
- Working on integration of Kafka with Spark Streaming for high-speed data processing.
- Developed multiple Kafka producers and consumers per business requirements and customized partitioning to get optimized results.
- Working on data pipelines per business requirements and scheduling them using Oozie.
- Working on advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala and Python as required.
- Working on cluster coordination using ZooKeeper, with data capacity planning and node forecasting.
- Working with experimental Spark APIs (SparkContext, Spark SQL, Spark Streaming, Spark DataFrames) for better optimization of existing algorithms.
- Involved in configuration and development of the Hadoop environment on AWS using services such as EC2, EMR, Redshift, Route 53 and CloudWatch.
- Experience with machine learning, training models using supervised classification algorithms.
- Worked on Spark and MLlib to develop a linear regression model for logistic information.
- Worked on exporting data to the RDBMS and analyzing it for visualization and to generate reports for the BI team.
- Supported setting up the QA environment and updating configurations for implementing scripts.
Environment: Scala, Spark SQL, Spark Streaming, Spark DataFrames, Spark MLlib, HDFS, Hive, Sqoop, Kafka, Shell Scripting, Cassandra, Python, AWS, Tableau, SQL Server, GitHub, Maven.
Confidential - Philadelphia, PA
Hadoop Developer
Responsibilities:
- Involved in the installation, configuration and design of Hadoop clusters using the Cloudera and Hortonworks distributions.
- Involved in the complete Big Data flow of the application: data ingestion from upstream into HDFS, processing the data in HDFS and analyzing it using several tools.
- Imported data in various formats such as JSON, SequenceFile, text, CSV, Avro and Parquet into the HDFS cluster, compressed for optimization.
- Ingested data from RDBMS sources such as Oracle, SQL Server and Teradata into HDFS using Sqoop.
- Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions with bucketing.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Imported and exported data into HDFS and Hive using Sqoop and Kafka, in batch and streaming modes.
- Experienced with Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into HBase.
- Analyzed the performance of Spark streaming and batch jobs using Spark tuning parameters.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Developed Spark scripts by using Python shell commands as per the requirement.
- Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch.
- Experience in managing and reviewing huge Hadoop log files.
- Expertise in designing and creating various analytical reports and automated dashboards to help users identify critical KPIs and facilitate strategic planning in the organization.
- Involved in cluster maintenance, cluster monitoring and troubleshooting.
- Created data pipelines per business requirements and scheduled them using Oozie coordinators.
- Maintained technical documentation for each step of the development environment and for launching Hadoop clusters.
- Built the automated build and deployment framework using GitHub, Maven and related tools.
- Used IntelliJ IDEA for code development and debugging.
- Worked with BI tools such as Tableau to create dashboards, including daily, weekly and monthly reports, using Tableau Desktop and published them to the HDFS cluster.
Environment: Scala, Hadoop, HDFS, Hive, Oozie, Sqoop, NiFi, Spark, Kafka, Elasticsearch, Shell Scripting, HBase, Python, GitHub, Tableau, Oracle, MySQL, Teradata and AWS.
Confidential - Plano, TX
Hadoop Developer
Responsibilities:
- Involved in requirements gathering and story narration, and worked through the complete Software Development Life Cycle (SDLC) using Agile methodologies.
- Involved in installing, configuring, supporting and managing Hadoop clusters using the Hortonworks (HDP) and Cloudera (CDH) distributions.
- Worked on Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experienced in managing and reviewing Hadoop log files and documenting issues on a daily basis in the resolution portal.
- Implemented dynamic partitions and buckets in Hive.
- Experience configuring spouts and bolts in various Storm topologies and validating data in the bolts.
- Experienced in running Hadoop streaming jobs to process terabytes of XML data.
- Experience loading and transforming large sets of structured, semi-structured and unstructured data, using Sqoop to move data between the Hadoop Distributed File System and relational database systems in both directions.
- Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in databases such as HBase.
- Established and implemented firewall rules and validated them with vulnerability scanning tools.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Implemented Storm builder topologies to perform cleansing operations before moving data into HBase.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Developed a custom file system plugin for Hadoop so that it can access files on the Data Platform. This plugin allows Hadoop MapReduce programs, Cassandra, Pig and Hive to work unmodified and access files directly.
- Used Spark to create APIs in Java and Python for Big Data analysis.
- Experience in troubleshooting errors in Cassandra, Hive and MapReduce.
- Used version control tools such as GitHub to pull changes from upstream to the local branch, check for conflicts, clean up code and review other developers' code.
- Involved with development teams to discuss JIRA stories and understand the requirements.
- Actively involved in the complete Agile life cycle to design, develop, deploy and support solutions.
Environment: Hadoop, Hive, Pig, Storm, Cassandra, Sqoop, Impala, Oozie, Java, Python, Shell Scripting, MapReduce, Java Collections, MySQL.
Confidential - Austin, TX
Java Developer
Responsibilities:
- Responsible for and active in analysis, design, implementation and deployment across the full Software Development Life Cycle (SDLC) of the project.
- Designed and developed user interface using JSP, HTML and JavaScript.
- Defined the search criteria and retrieved customer records from the database, made the required changes and saved the updated records back to the database.
- Validated the fields of user registration screen and login screen by writing JavaScript and jQuery validations.
- Used DAO and JDBC for database access.
- Developed stored procedures and triggers using PL/SQL in order to calculate and update the tables to implement business logic.
- Designed and developed XML processing components for dynamic menus in the application.
- Involved in post-production support and maintenance of the application.
- Involved in the analysis, design, implementation, and testing of the project modules.
- Implemented the presentation layer with HTML, XHTML and JavaScript.
- Developed web components using JSP and JDBC.
- Implemented database using SQL Server.
- Designed tables and indexes.
- Wrote complex SQL queries and stored procedures.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Created user and technical documentation.
Environment: Java, Oracle, HTML, XML, SQL, J2EE, JUnit, JDBC, JSP, Tomcat, SQL Server, MongoDB, JavaScript, GitHub, SourceTree, NetBeans.
Confidential - San Jose, CA
Responsibilities:
- Participated in user requirement sessions to gather business requirements and in technical walkthroughs.
- Involved in requirements analysis, design, development, integration and testing of application modules.
- Involved in debugging and troubleshooting bugs and resolving the issues.
- Deployed applications on JBoss Application Server.
- Developed Hibernate POJO Classes, Hibernate Configuration file and Hibernate Mapping files.
- Experience in unit and functional testing and coding with the JUnit framework.
- Utilized Model-View-Presenter (MVP) design pattern, decoupling view and presenter in front-end development.
- Used SVN for version control to maintain the code repository, with the TortoiseSVN client.
- Worked extensively with core Java concepts for back-end coding, including the Collections API and multithreading.
- Developed all the UI pages using HTML5, CSS3, JSON, JavaScript, Bootstrap and Node.js.
- Implemented a Single Page Application (SPA) front end for displaying user requests, user record history and security settings for various users using JavaScript and AngularJS.
- Used AJAX to fetch data from the server asynchronously as JSON/XML objects.
- Used Node.js with the Flux framework in the development of web applications.
- Implemented a Node.js server to manage authentication. Used Spring Core annotations for dependency injection, Spring MVC for REST APIs and Spring Boot for microservices.
- Implemented the project using the Vaadin Framework.
- Designed and developed base framework classes, common re-usable components.
- Used Maven to define dependencies and build the application, and JUnit for test suite execution and assertions.
- Hands-on experience creating Docker containers and images and deploying code using Docker and AWS services.
- Created Jenkins jobs to trigger CloudFormation scripts and deploy JAR/EAR files to AWS EC2 instances by triggering Ansible playbooks.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS), RabbitMQ, Java Database Connectivity (JDBC) and Java Naming and Directory Interface (JNDI).
- Designed the architecture with JSP as the view, servlets as the controller and a combination of EJBs and Java classes as the model. Used Struts 2, JSTL, Struts-EL and tag libraries.
- Communicated with production and QA teams for support.
- Involved in bug fixing and closing tickets raised by the QA team.
- Experience writing SQL queries and working with SQL Server 2008/2012.
- Experience working with the defect tracking tool JIRA.
- Good knowledge of Agile and Waterfall methodologies.
Environment: JDK 1.8, JSP, JBoss 7, Unit Testing, JDBC, XML, DOM, SAX, SVN, HTML, DHTML, JNDI, RESTful Web Services, Node.js, HTML5, CSS, Microservices, Spring Boot, SoapUI, Groovy, Grails, AWS, PII Data, Jenkins, JUnit, SQL, SQL Server, PL/SQL Developer, Log4j, Ant, JIRA
