Senior Hadoop Developer Resume
Medford, MA
SUMMARY
- 8 years of overall experience across a variety of industries, including 3+ years in Big Data technologies (the Apache Hadoop stack and Apache Spark), 4+ years in Java technologies, and 1+ year in .NET technologies.
- Hands-on experience with Hadoop ecosystem components such as Hadoop, Spark, HDFS, MapReduce, YARN, Tez, Hive, Sqoop, Flume, Pig, Impala, Oozie, ZooKeeper, HBase, and Kafka.
- Hands-on experience working in multiple domains such as Manufacturing, Healthcare, and Finance & Banking.
- Experience working with Cloudera, Hortonworks, Amazon Web Services, and Microsoft Azure HDInsight Hadoop distributions.
- In-depth knowledge of Apache Hadoop architecture (1.x and 2.x) and Apache Spark 1.x architecture.
- Experience in implementing OLAP multi-dimensional cube functionality using AtScale.
- Responsible for ingesting structured data residing in traditional back-end databases into Hadoop and Hive using Sqoop.
- Hands-on experience in writing MapReduce jobs to perform data cleaning and preprocessing using Java and Python.
- Experience working with SQL on Hadoop using Apache Hive.
- Hands-on experience in writing Apache Spark SQL and Spark Streaming programs in Scala and Python.
- Experienced in transporting and processing real-time event streams using Spark Streaming and Kafka.
- Experience in writing Hive UDFs to incorporate complex business logic into Hive queries (an illustrative sketch follows this summary).
- Responsible for modifying and performance-tuning Hive scripts, resolving automation job failure issues, and reloading data into the Hive data warehouse when needed.
- Experience in writing Spark transformations and actions using Spark SQL (RDDs and DataFrames) in Scala by converting Hive/SQL queries.
- Strong knowledge of Hadoop architecture and daemons such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of MapReduce concepts.
- Hands-on experience in writing MapReduce programs in Java, Pig, and Python to handle diverse data sets with Map and Reduce tasks.
- Developed multiple Map Reduce jobs to perform data cleaning and preprocessing.
- Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design.
- Expertise in writing Hive UDFs and GenericUDFs to incorporate complex business logic into Hive queries.
- Experienced in optimizing Hive queries by tuning configuration parameters.
- Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
- Extensively used Apache Flume to collect logs and error messages across the cluster.
- Experience in implementing real-time streaming and analytics using Spark Streaming and Kafka.
- Experience in data ingestion using Sqoop from RDBMS to HDFS and Hive, and vice versa.
- Proficient in Java/J2EE technologies - Core Java, JSP, JavaBeans, Java Servlets, Ajax, JDBC, ODBC, Web Services, Swing, Hibernate, Spring, Struts, XML, and XSLT.
- Proficient in .NET technologies - C#, ASP.NET, Entity Framework, WCF, Ajax, and MVC.
- Good experience with MVC architecture using the Spring, Struts, and ASP.NET frameworks.
- Performed data analysis using MySQL, SQL Server Management Studio, and Oracle.
- Experience with ETL tools including Informatica, Talend, and SSIS.
- Experience working with Cloudera (CDH3, CDH4, and CDH5) and Hortonworks Hadoop distributions.
- Hands-on experience with AWS infrastructure services: Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2).
- Worked with Oozie and Zookeeper to manage the flow of jobs and coordination in the cluster
- Experience in performance tuning and monitoring of Hadoop clusters by gathering and analyzing data on the existing infrastructure using Cloudera Manager.
- Experience with configuration of Hadoop ecosystem components: MapReduce, Hive, HBase, Pig, Sqoop, Oozie, ZooKeeper, Flume, Storm, Spark, YARN, and Tez.
- Experience with Restful Services and Amazon Web Services
- Hands-on experience with Amazon EC2, EMR, and S3.
- Conversant with web/application servers - Tomcat, WebSphere, WebLogic, and IIS.
- Experience in writing Maven and SBT scripts to build and deploy Java and Scala Applications
- Implemented unit testing with JUnit and MRUnit.
- Expertise in web application development with JSP, HTML, CSS, JavaScript, ASP.NET, C#, and jQuery.
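As referenced above, a minimal, hypothetical sketch of a Hive UDF of the kind described in this summary. The package, class name, and masking rule are illustrative assumptions rather than actual project code; Hive's UDF base class resolves the evaluate() method by reflection.

    // Hypothetical Hive UDF carrying simple business logic:
    // mask an account number, keeping only the last four characters visible.
    package com.example.hive.udf;                      // illustrative package name

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public final class MaskAccountNumber extends UDF {
        public Text evaluate(final Text account) {
            if (account == null) {
                return null;                           // propagate SQL NULLs
            }
            String value = account.toString();
            int visible = Math.min(4, value.length());
            String masked = "****" + value.substring(value.length() - visible);
            return new Text(masked);
        }
    }

Once packaged in a JAR, such a function would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION mask_account AS 'com.example.hive.udf.MaskAccountNumber', and then used like any built-in function in a query.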
TECHNICAL SKILLS
Big Data Technologies: Hadoop, MapReduce, HDFS, Hive, Pig, ZooKeeper, Sqoop, Oozie, Flume, Impala, HBase, Kafka, Storm
Big Data Frameworks: HDFS, YARN, Spark
Hadoop Distributions: Cloudera (CDH3, CDH4, CDH5), Hortonworks, Amazon EMR
Programming Languages: Java, C, C++, Shell Scripting, Scala
Databases: RDBMS (MySQL, Oracle, Microsoft SQL Server, Teradata, DB2), PL/SQL, Cassandra, MongoDB
IDE and Tools: Eclipse, NetBeans, Tableau
Operating Systems: Windows XP/Vista/7, Linux/Unix
Frameworks: Spring, Hibernate, JSF, EJB, JMS
Scripting Languages: JSP & Servlets, JavaScript, XML, HTML, Python
Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss
Methodologies: Agile, SDLC, Waterfall
Web Services: Restful, SOAP
ETL Tools: Talend, Informatica
Others: Solr, Elasticsearch
PROFESSIONAL EXPERIENCE
Confidential, Medford, MA
Senior Hadoop Developer
Responsibilities:
- Experience working with the Hortonworks distribution of Hadoop.
- Experience in implementing OLAP multi-dimensional cube functionality using AtScale.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for ingesting structured ERP system data residing in a traditional back-end Microsoft SQL Server database onto the Hadoop data platform using Sqoop.
- Experience in writing Azure PowerShell scripts to copy or move data from the local file system to HDFS (Azure Blob storage).
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Analyzed large data sets by running Hive queries and Pig scripts
- Developed Simple to complex Map Reduce Jobs using Hive and Pig
- Involved in running Hadoop jobs for processing millions of records of text data
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a minimal sketch follows this list of responsibilities).
- Involved in loading data from LINUX file system to HDFS
- Responsible for managing data from multiple sources
- Extracted files from relational databases using Sqoop, placed them in HDFS, and processed them.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Assisted in exporting analyzed data to relational databases using Sqoop
- Managed and reviewed Hadoop Log files
- Loaded log data into HDFS using Flume.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS
- Used JDBC for database connectivity with MySQL Server
- Extensive work in the ETL process consisting of data sourcing, transformation, mapping, conversion, and loading using Talend.
- Experience working with Spark Streaming.
- Wrote Spark SQL queries using DataFrames.
- Experience in importing and exporting data into HDFS and Hive using Sqoop.
- Hands on experience in defining, partitioning, bucketing, compressing Hive tables to meet business requirement.
- Experience in performance tuning of Hive queries.
- Implemented Ad-hoc query using Hive to perform analytics on structured data.
- Worked extensively with Hive DDL and Hive Query Language (HQL), and implemented business logic using Hive UDFs to perform ad-hoc queries on structured data.
- Implemented Optimized joins to perform analysis on different data sets using Map Reduce programs.
- Written Hive queries for data analysis to meet the business requirements.
- Hands on experience in working with Impala.
- Hands on experience in writing MapReduce programs to meet business needs.
- Hands on experience in writing Linux/Unix Shell scripting.
- Experienced in transporting and processing real-time event streams using Kafka.
- Experienced in defining cron job flows.
- Able to understand and migrate ETL and BI code across multiple ETL and BI tools such as Talend.
- Experienced in analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.
- Diverse experience in utilizing Java and Python tools in business, web, and client-server environments, including the Java platform, JSP, Servlets, JavaBeans, JSTL, JSP custom tags, EL, JSF, and JDBC.
- Deep JVM knowledge and heavy experience with functional programming languages such as Scala.
- Involved in converting Hive/SQL queries into Spark transformations and actions using Spark SQL (RDDs and DataFrames) in Python and Scala.
- Implemented Spark SQL queries in Scala for faster testing and processing of data.
- Implemented Spark Streaming to read real-time data from Kafka in parallel, process it in parallel, and save the results in Parquet format in Hive.
- Built an analytics POC to analyze outpatient details with R and SparkR (using a logistic regression algorithm).
- Installed Zeppelin in Cloudera Dev environment and executed Spark programs
- Developed applications using Eclipse
- Used Hadoop Streaming to write jobs in Python.
- Expertise in writing shell scripts to monitor Hadoop jobs.
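As referenced in the data-cleaning bullet above, a minimal sketch of the kind of MapReduce cleaning job described here. The field count, delimiter, and class names are illustrative assumptions, not the actual project code.

    // Hypothetical map-only job: drop malformed CSV records and trim fields.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CleanRecordsJob {

        public static class CleanMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {

            private static final int EXPECTED_FIELDS = 5;   // assumed record width

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",", -1);
                if (fields.length != EXPECTED_FIELDS) {
                    context.getCounter("clean", "malformed").increment(1);
                    return;                                  // drop malformed rows
                }
                StringBuilder cleaned = new StringBuilder();
                for (int i = 0; i < fields.length; i++) {
                    if (i > 0) cleaned.append(',');
                    cleaned.append(fields[i].trim());        // normalize whitespace
                }
                context.write(NullWritable.get(), new Text(cleaned.toString()));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "clean-records");
            job.setJarByClass(CleanRecordsJob.class);
            job.setMapperClass(CleanMapper.class);
            job.setNumReduceTasks(0);                        // map-only cleaning pass
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }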
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Sqoop, Flume, Java, Python, Oracle 10g, MySQL, Ubuntu, Agile, XML, SQL Server, YARN, Cloudera, Teradata, Talend, UNIX Shell Scripting, Oozie, Scala, Spark, R, Maven, SBT, Zeppelin, Eclipse, IntelliJ
Confidential, Santa Barbara, CA
Sr. Hadoop Developer.
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce
- Experience with Cloudera distribution of Hadoop.
- Experience in implementing applications on Spark frameworks using Scala.
- Wrote Spark SQL queries using DataFrames.
- Developed Spark code in Scala in the IntelliJ IDEA IDE, using SBT as the build tool.
- Experience in writing Spark transformations and actions using Spark SQL (RDDs and DataFrames) in Scala by converting Hive/SQL queries (see the sketch after this list of responsibilities).
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Hands on experience in defining, partitioning, bucketing, compressing Hive tables to meet business requirement.
- Experience in performance tuning of Hive queries.
- Implemented Ad-hoc query using Hive to perform analytics on structured data.
- Worked extensively with Hive DDL and Hive Query Language (HQL), and implemented business logic using Hive UDFs to perform ad-hoc queries on structured data.
- Written Hive queries for data analysis to meet the business requirements.
- Hands on experience in working with Impala.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis
- Used Pig as ETL tool to do transformations, event joins, filtering and some pre-aggregations before storing the data onto HDFS.
- Hands on experience in writing, executing pig scripts.
- Hands on experience in writing Pig UDFs.
- Configured Oozie workflows to automate data flow, preprocessing, and cleaning tasks using Hadoop actions.
- Performed daily monitoring of cluster status and health, including the DataNode, JobTracker, TaskTracker, and NameNode daemons.
- Experience with configuration of Hadoop Ecosystem components: Map Reduce, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Flume, Storm, Spark, Yarn, Tez.
- Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
- Knowledge on rendering and delivering reports in desired formats by using reporting tools such as Tableau.
- Worked on debugging, performance tuning of Hive & Pig Jobs
- Worked on tuning the performance of Pig queries.
- Involved in loading data from LINUX file system to HDFS
- Importing and exporting data from different relational databases into HDFS and Hive using Sqoop and performed transformations using MapReduce and Hive
- Analyzed data by performing Hive Queries and running the Pig Scripts to study the behavior in a particular aspect
- Experience working on processing unstructured data using Pig and Hive
- Used UDFs to implement business logic in Hadoop
- Supported MapReduce Programs those are running on the cluster
- Gained experience in managing and reviewing Hadoop log files
- Created HBase tables to store various data formats coming from different applications.
- Developed ETL Scripts for Data acquisition and Transformation using Talend
- Extensive experience with Talend source & connections configuration, credentials management, context management
- Implemented and assisted with Talend installations and Talend server setup, including the MDM server.
- Implemented a proof of concept to analyze streaming data using Apache Spark with Scala and Python; used Maven and SBT to build and deploy the Spark programs.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed simple to complex MapReduce jobs using Java, Pig, and Hive.
- Developed applications using Eclipse and used Maven as the build and deployment tool.
- Exported the analyzed data to the relational databases using Sqoop for visualization
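As referenced above, a minimal sketch of converting a Hive/SQL aggregate query into DataFrame transformations. The project work was done in Scala; this illustration uses Spark's Java Dataset API (assuming a Spark 2.x-style SparkSession), and the table and column names are hypothetical.

    // Hypothetical conversion of a HiveQL aggregate into DataFrame operations.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.sum;

    public class HiveQueryToDataFrame {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hive-query-conversion")
                    .enableHiveSupport()                 // read Hive-managed tables
                    .getOrCreate();

            // Original HiveQL (illustrative):
            //   SELECT region, SUM(amount) AS total
            //   FROM sales
            //   WHERE year = 2016
            //   GROUP BY region;
            Dataset<Row> totals = spark.table("sales")
                    .filter(col("year").equalTo(2016))
                    .groupBy(col("region"))
                    .agg(sum(col("amount")).alias("total"));

            totals.show();
            spark.stop();
        }
    }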
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Java, Oracle 10g, MySQL, SQL Server, Ubuntu, Agile, YARN, Spark, Hortonworks, Teradata, Talend, UNIX Shell Scripting, Oozie, Maven, Eclipse
Confidential, Austin, Texas
Hadoop Developer/ Java
Responsibilities:
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required
- Manipulated, transformed, and analyzed data from various types of databases
- Worked extensively in creating Map Reduce jobs to power data for search and aggregation
- Extensively used Pig for data cleansing with Tez
- Created HBase tables to store various data formats coming from different applications.
- Designed a data warehouse using Hive
- Have strong understanding of Dynamic Partitioning in Hive
- Created partitioned and bucketed tables in Hive to provide representative samples during predictive modeling.
- Worked with business teams and created Hive queries for ad hoc access
- Created several UDFs in Pig and Hive to give additional support for the project
- Performed analytics with Hive queries.
- Implemented counters on HBase data to count total records across different tables.
- Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Implemented a data pipeline by chaining multiple mappers using ChainMapper (a brief sketch follows this list of responsibilities).
- Experience with Hortonworks distribution of Hadoop.
- Worked on a Hadoop cluster that ranged from 5-8 nodes during the pre-production stage and was sometimes extended up to 26 nodes during production.
- Experience in Importing and exporting data into HDFS and Hive using Sqoop.
- Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
- Experienced in handling data from different data sets, join them and pre-process using Pig join operations.
- Moved bulk data into HBase using MapReduce integration.
- Worked extensively with Sqoop for importing data from Oracle and Netezza
- Created ETL Scripts for Data acquisition and Transformation using Talend
- Able to understand and migrate ETL and BI code across multiple ETL and BI tools such as Talend.
- Developed applications using Eclipse and used Maven as the build and deployment tool.
- Evaluated usage of command line Oozie/Hue for Workflow Orchestration
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Created tables, partitions, and buckets, and performed analytics using ad-hoc Hive queries.
- Provided batch processing solution to certain unstructured and large volume of data by using Hadoop MapReduce framework
- Experience with configuration of Hadoop Ecosystem components: Map Reduce, Hive, HBase, Pig, Sqoop, Oozie, Zookeeper, Flume, Storm, Spark, Yarn, Tez.
- Mentored analysts and the test team in writing Hive queries.
- Used R for analytics, predictive modeling and regression analysis
- Implemented test scripts to support test driven development and continuous integration
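As referenced in the ChainMapper bullet above, a minimal sketch of chaining two mappers into a single map phase. The stage names and filtering rule are illustrative assumptions, not the actual project logic.

    // Hypothetical pipeline: normalize records, then filter them, in one map task.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChainedPipelineJob {

        // Stage 1: lower-case every record.
        public static class NormalizeMapper
                extends Mapper<LongWritable, Text, LongWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                context.write(key, new Text(value.toString().toLowerCase()));
            }
        }

        // Stage 2: keep only records matching a business rule.
        public static class FilterMapper
                extends Mapper<LongWritable, Text, LongWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                if (value.toString().contains("status=active")) {   // assumed rule
                    context.write(key, value);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "chained-mapper-pipeline");
            job.setJarByClass(ChainedPipelineJob.class);

            // The output of NormalizeMapper feeds directly into FilterMapper.
            ChainMapper.addMapper(job, NormalizeMapper.class,
                    LongWritable.class, Text.class, LongWritable.class, Text.class,
                    new Configuration(false));
            ChainMapper.addMapper(job, FilterMapper.class,
                    LongWritable.class, Text.class, LongWritable.class, Text.class,
                    new Configuration(false));

            job.setNumReduceTasks(0);                    // map-only chain for the sketch
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }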
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, HBase, Flume, Java, Oracle 10g, Netezza, MySQL, Ubuntu, Agile, Cloudera, UNIX Shell Scripting, Oozie, Maven, Eclipse
Confidential, Carson City, NV
Java/ J2EE Developer.
Responsibilities:
- Worked as a senior developer for the project
- Created UML class diagrams that depict the code’s design and its compliance with the functional requirements
- Analysis, Design, Development and Unit Testing of the modules
- Used Java Mail notification mechanism to send confirmation email to applied companies
- Also involved in writing JSPs, JavaScript, and Servlets to generate dynamic web pages and web content.
- Developed various Java classes, SQL queries and procedures to retrieve and manipulate the data from backend Oracle database using JDBC
- Used Enterprise Java Beans as a middleware in developing a three-tier distributed application
- Developed session beans and entity beans for business and data processing.
- Implemented Web Services with REST
- Developed user interface using HTML, CSS, JSPs and AJAX
- Performed client-side validation with JavaScript and jQuery, and applied server-side validation to the web pages as well.
- Developed the application leveraging the Model-View-Controller (MVC) architecture, with Maven and Ant as build tools.
- Used JIRA for bug tracking of the web application.
- Wrote Spring Core and Spring MVC configuration to associate DAOs with the business layer (a brief sketch follows this list of responsibilities).
- Worked with HTML, DHTML, CSS, and JavaScript in UI pages.
- Wrote Web Services using SOAP for sending and getting data from the external interface.
- Extensively worked with JUnit framework to write JUnit test cases to perform unit testing of the application
- Implemented JDBC modules in JavaBeans to access the database.
- Designed the architecture and tables for the back-end Oracle database.
- Application hosted on WebLogic and developed using the Eclipse IDE.
- Used XSL/XSLT for transforming and displaying reports. Developed Schemas for XML.
- Involved in writing the ANT scripts to build and deploy the application.
- Developed a web-based reporting for monitoring system with HTML and Tiles using Struts framework.
- Implemented field-level validations with AngularJS, JavaScript, and jQuery.
- Preparation of unit test scenarios and unit test cases
- Branding the site with CSS
- Code review and unit testing the code
- Involved in unit testing using JUnit.
- Implemented Log4j to trace logs and to track information.
- Involved in project discussions with clients and analyzed complex project requirements as well as prepared design documents
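As referenced in the Spring Core/Spring MVC bullet above, a minimal sketch of how a controller can delegate to a business-layer service that wraps a DAO. All class, bean, and view names are hypothetical; the actual wiring would live in the project's Spring configuration.

    // Hypothetical Spring MVC controller -> service -> DAO association.
    import java.util.List;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.stereotype.Service;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    @Controller
    public class ApplicationController {

        @Autowired
        private ApplicationService applicationService;    // business layer

        @RequestMapping(value = "/applications", method = RequestMethod.GET)
        public String listApplications(Model model) {
            model.addAttribute("applications", applicationService.findAll());
            return "applicationList";                      // logical JSP view name
        }
    }

    @Service
    class ApplicationService {

        @Autowired
        private ApplicationDao applicationDao;             // data-access layer

        public List<String> findAll() {
            return applicationDao.findAll();
        }
    }

    // DAO contract; a JDBC- or Hibernate-backed implementation would be a Spring bean.
    interface ApplicationDao {
        List<String> findAll();
    }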
Environment: Java, JSP, EJB, JMS, JavaScript, JSF, XML, JBoss, WebSphere, WebLogic, Hibernate, Spring, SQL, PL/SQL, CSS, Log4j, JUnit, Eclipse, Oracle 11g, LoadRunner, TFS
Confidential
Java Developer
Responsibilities:
- Interacted with clients to gather functional requirements such as SEO requirements, Captcha implementation, and consultation form implementation.
- Involved in Analysis, design, development and testing of the modules
- Developed master pages and static pages
- Developed consultation form with Captcha functionality and mailing functionality
- Developed Services with AngularJS
- Worked with the Shibboleth Identity Provider and Service Provider.
- Used IIS and Apache as web servers.
- Developed analysis level documentation such as Use Case Model, Activity, Sequence and Class Diagrams.
- Developed the application using Struts MVC for the web layer.
- Developed UI layer logics of the application using JSP, JavaScript, HTML/DHTML, and CSS.
- Implemented URL Rewrite and Redirection using URLRewriteFilter
- Implemented English to French Toggling functionality
- Extensively used Core Java, Servlets, JSP and XML.
- Used Struts 1.2 in presentation tier.
- Involved in writing JSP and JSF components. Used the JSTL tag library (Core, Logic, Nested, Bean, and HTML taglibs) to create standard dynamic web pages.
- Application was based on MVC architecture with JSP serving as presentation layer, Servlets as controller and Hibernate in business layer to access to Oracle Database.
- Developed the DAO layer for the application using Spring's Hibernate Template support (see the sketch after this list of responsibilities).
- Implemented Google Analytics for all pages by using Google Scripts
- Implemented Log4j Logging in the application
- Hosted the web application in Testing Environment and Supported network team to host the same in Live
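As referenced in the DAO-layer bullet above, a minimal sketch of a DAO built on Spring's HibernateTemplate support. The entity, property, and class names are illustrative assumptions; the session factory and mapping would be defined in the project's Spring and Hibernate configuration.

    // Hypothetical DAO for consultation-form submissions using HibernateDaoSupport.
    import java.util.List;

    import org.springframework.orm.hibernate3.support.HibernateDaoSupport;

    public class ConsultationRequestDao extends HibernateDaoSupport {

        // Persist a new consultation-form submission.
        public void save(ConsultationRequest request) {
            getHibernateTemplate().save(request);
        }

        // Fetch all submissions, newest first (HQL names are assumptions).
        @SuppressWarnings("unchecked")
        public List<ConsultationRequest> findAll() {
            return (List<ConsultationRequest>) getHibernateTemplate()
                    .find("from ConsultationRequest order by createdDate desc");
        }
    }

    // Minimal entity stub so the sketch compiles; getters/setters omitted.
    class ConsultationRequest {
        private Long id;
        private String email;
        private java.util.Date createdDate;
    }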
Environment: ASP .Net, C# .Net, SQL Server, SSIS, SharePoint, Entity Framework, Outlook, SMTP Mailing, HTML, JavaScript, JSON, JQuery, CSS, Visual Studio, SQL Server Management Studio, Team Foundation Server, XML, and IIS