Sr. Hadoop Developer Resume
IL
PROFESSIONAL SUMMARY:
- 9+ years of experience in software development, deployment, and maintenance of applications at various stages.
- 4 years of experience with major components of the Hadoop ecosystem, including MapReduce, HDFS, Hive, Pig, Pentaho, HBase, ZooKeeper, Sqoop, Oozie, Flume, Storm, YARN, Spark, Scala, and Avro.
- Extensively worked with Hadoop tools including Pig, Hive, Oozie, Sqoop, Spark (including DataFrames), HBase, and MapReduce programming.
- Applied partitioning and bucketing in Hive and designed both managed and external tables to optimize performance.
- Developed Spark applications using Scala for easy Hadoop transitions. Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Developed Spark code with Spark SQL and Spark Streaming for faster testing and processing of data.
- Experience applying current development approaches, including building Spark applications in Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Thorough knowledge of data extraction, transformation, and loading in Hive, Pig, and HBase.
- Hands-on experience coding MapReduce/YARN programs in Java and Scala for analyzing big data.
- Worked with Apache Spark, a fast, general-purpose engine for large-scale data processing, using the functional programming language Scala.
- Hands-on experience writing Pig Latin scripts, working with the Grunt shell, and scheduling jobs with Oozie.
- Hadoop Distributions: Worked with Apache Hadoop along with the enterprise distributions from Cloudera and Hortonworks. Good knowledge of the MapR distribution.
- Data Ingestion into Hadoop (HDFS): Ingested data into Hadoop from sources such as Oracle and MySQL using Sqoop.
- Created Sqoop jobs with incremental load to populate Hive external tables. Imported real-time data into Hadoop using Kafka and worked with Flume.
- Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Experience designing and implementing secure Hadoop clusters using Kerberos.
- Processed real-time data using the Spark Streaming API with Scala.
- Good exposure to MongoDB, its functionality and Cassandra implementation.
- Good experience working in agile development environments, including the Scrum methodology.
- Good knowledge of the Spark framework for both batch and real-time data processing.
- Expertise in Storm for adding reliable real-time data processing capabilities to enterprise Hadoop.
- Hands-on experience in scripting for automation and monitoring using Shell, PHP, Python, and Perl.
- Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Experience migrating data between HDFS and relational database systems using Sqoop, according to client requirements.
- Experienced in deploying Hadoop clusters using Puppet.
- Excellent knowledge of migrating existing Pig Latin scripts to Java Spark code.
- Experience transferring data between the Hadoop ecosystem and structured data stores in RDBMSs such as MySQL, Oracle, Teradata, and DB2 using Sqoop.
- Strong knowledge of upgrading MapR, CDH, and HDP clusters.
- Experience designing and developing POCs in Scala, deploying them on a YARN cluster, and comparing the performance of Spark with Hive and SQL/Teradata.
- Java Experience: Created applications in core Java and built client-server applications requiring database access and persistent connectivity using JDBC, JSP, Spring, and Hibernate. Implemented web services for network-related applications in Java.
- Methodologies: Hands-on experience working with different software methodologies, including Waterfall and Agile.
- NoSQL Databases: Worked with NoSQL databases such as HBase, MongoDB, and Cassandra.
- AWS: Planned, deployed, and maintained Amazon AWS cloud infrastructure consisting of multiple nodes, and deployed applications on AWS.
- Experience developing web applications using Java, JSP, Servlets, JavaScript, jQuery, AngularJS, Node.js, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat, and Linux.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Java, Python, JRuby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Methodology: Agile, Waterfall
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, Ext JS, JSON and Node.js
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, Linux, Mac OS and Windows variants
Data analytical tools: R and MATLAB
ETL Tools: Talend, Informatica, Pentaho
PROFESSIONAL EXPERIENCE:
Confidential, IL
Sr. Hadoop Developer
Responsibilities:
- Designed, developed, and tested MapReduce programs on mobile offer redemptions and sent the output to downstream applications such as HAVI.
- Scheduled this MapReduce job through an Oozie workflow.
- Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using Scala IDE for Eclipse.
- Implemented various machine learning techniques in Scala using a Scala machine learning library.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Created RDDs in Spark using Scala and Python.
- Worked closely with the admin team to gather hardware for DataNodes, edge nodes, and NameNodes.
- Loaded data from Oracle, Netezza, and SQL Server into Hive and HDFS using Sqoop.
- Used Talend Open Studio to load files into Hive tables and performed ETL aggregations in Hive.
- Designed and created ETL jobs in Talend to load large volumes of data into Cassandra.
- Used Sqoop to import data from SQL Server to Cassandra.
- Analyzed and wrote Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Developed machine learning algorithms using Mahout for mining data stored in HDFS.
- Used Flume extensively to gather and move log files from application servers to a central location in the Hadoop Distributed File System (HDFS).
- Worked with the Oozie workflow manager to schedule Hadoop jobs, including resource-intensive jobs.
- Responsible for cluster maintenance: adding and removing nodes, monitoring and troubleshooting the cluster, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Loaded data into Hive tables and extensively used HiveQL (HQL) queries to query data in them.
- Introduced Tableau visualization on top of Hadoop to produce reports for the business and BI teams.
- Created UDFs in Pig and Hive and applied partitioning and bucketing in Hive to improve performance.
- Created indexes and tuned SQL queries in Hive; set up database connections using Sqoop.
- Involved in Hadoop NameNode metadata backups and load balancing as part of cluster maintenance and monitoring.
- Worked on Spark with Python and Scala.
- Serialized data in Hadoop using the Avro serialization system.
- Used the file system check utility (fsck) to check the health of files in HDFS.
- Monitored nightly jobs that export data out of HDFS for offsite storage as part of HDFS backup.
- Used Pig to analyze large data sets and to write the results back to HBase.
- Scheduled, monitored and debugged various MapReduce, Pig, Hive jobs using Oozie Workflow.
- Designed and deployed a Storm cluster integrated with Kafka and HBase.
- Implemented authentication and authorization services using the Kerberos authentication protocol.
- Ingested large volumes of XML files into Hadoop using DOM parsers within MapReduce.
- Extracted Daily Sales, Hourly Sales, and Product Mix data for items sold in Yum Brands restaurants and loaded it into the Global Data Warehouse.
- Scheduled multiple MapReduce jobs in Oozie. Extracted promotions data for all stores within the USA by writing MapReduce jobs and automating them with UNIX shell scripts.
- Gathered business requirements in meetings to support successful POC implementation and the move to production.
Environment: Hadoop, MapReduce, Sqoop, Hive, Flume, Oozie, Pig, HBase, Scala, Python, ZooKeeper, Talend Open Studio, Kafka, Storm, Oracle, Apache Cassandra, SQL Server 2008, MySQL, Java, SQL, PL/SQL, Toad, Eclipse Kepler IDE, Microsoft Office 2007, MS Outlook 2007, SharePoint team site.
Confidential, Minneapolis, MN
Hadoop Developer
Responsibilities:
- Developed a data pipeline using Flume, Sqoop, Hive, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Used Hive to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data in HDFS.
- Extensive experience in ETL (Talend) data ingestion, in-stream data processing, batch analytics, and data persistence strategy.
- Designed and developed ETL (Talend) workflows using Java for processing data in HDFS/Cassandra using Oozie.
- Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, MapReduce, Sqoop, Kafka, YARN, Oozie, and ZooKeeper, and with Hadoop architecture and its components.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and streaming operations.
- Explored Spark to improve the performance of and optimize existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Imported data from sources such as HDFS and HBase into Spark RDDs.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Involved in setting up high availability for the NameNode, ResourceManager, HBase Master, and Oozie.
- Responsible for monitoring regular daily MapReduce jobs scheduled through Oozie.
- Involved in developing Hive DDL statements to create, alter, and drop Hive tables, and worked with Storm.
- Created scalable, high-performance web services for data tracking.
- Loaded data from the UNIX file system into HDFS. Installed and configured Hive, wrote Hive UDFs, and set up cluster coordination services through ZooKeeper.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Extracted data from other data sources into HDFS using Sqoop.
- Exported processed data from Hadoop to relational databases or external file systems using Sqoop and the HDFS get and copyToLocal commands.
- Experienced in managing Bitbucket repositories for Java and Python code.
- Experienced in managing Hadoop clusters using Cloudera Manager.
- Involved in using Confidential to access Hive table metadata from Map Reduce.
Environment: MapReduce, YARN, Hive, Pig, Cassandra, Oozie, Talend, Sqoop, Splunk, Kafka, Oracle 11g, Core Java, Cloudera, Eclipse, Python, Scala, Spark, SQL, Tableau, Bitbucket, UNIX shell scripting.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Applied transformations to data loaded into Spark DataFrames and performed in-memory computation to generate the output response.
- Developed multiple POCs using Spark with Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL.
- Migrated MapReduce programs to Spark transformations using Spark and Scala; the initial versions were written in Python (PySpark).
- Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Used Hive to analyze the partitioned data and compute various metrics for reporting.
- Imported data from sources such as HDFS into Spark DataFrames.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Experienced with SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Reduced the latency of Spark jobs by tuning Spark configurations and applying other performance optimization techniques.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Pig.
- Used Hive and Spark SQL connections to generate Tableau BI reports.
- Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
- Created Hive generic UDFs to process business logic that varies by policy.
- Developed various data connections from data sources to SSIS and Tableau Server for report and dashboard development.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala; good experience with the Spark shell and Spark Streaming.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Developed solutions utilizing Hadoop ecosystem components such as Hadoop, Spark, Hive, HBase, Pig, Sqoop, Oozie, Ambari, and ZooKeeper.
- Experience writing MapReduce programs with the Java API to cleanse structured and unstructured data.
- Experience with RDBMSs such as Oracle and Teradata.
- Loaded data from MySQL and Teradata into HBase where necessary using Sqoop.
Environment: Scala, Spark, Kafka, Hive, Hortonworks, Oozie, Play Framework, Akka, Git, Elasticsearch, Logstash, Kibana, Kerberos.
Confidential
Java Developer
Responsibilities:
- Involved in design, development and testing phases of the project.
- Implemented the GUI using HTML, JSP, Tiles, Struts tag libraries, and CSS components.
- Configured faces-config.xml for the page navigation rules and created managed and backing beans for the Optimization module.
- Developed the application using JSF, MyFaces, Spring, and JDO technologies, which communicated with Java. Used JSF layout for the View of MVC. JavaScript and DHTML were also used for front-end interactivity.
- Used LDAP for user authentication and authorization.
- Developed an enterprise application using Spring MVC, JSP, and MySQL.
- Developed client-side web service components using JAX-WS.
- Extensively used JUnit to test the application code for client-server data transfer.
- Developed and enhanced products in alignment with design and business objectives.
- Used SVN as the repository for managing and deploying application code.
- Participated in system integration and user acceptance testing.
- Developed the front end using JSTL, JSP, HTML, and JavaScript.
- Used XML to maintain queries, JSP page mappings, bean mappings, etc.
- Used Oracle 10g as the backend database and wrote PL/SQL scripts.
- Maintained and modified the system based on user feedback using OO concepts.
- Implemented database transactions using Spring AOP and Java EE CDI.
- Enhanced the organization's reputation by fulfilling requests and exploring opportunities.
- Performed business analysis, worked on reporting services, and integrated with Sage Accpac (ERP).
- Developed new functionality and maintained existing functionality using Spring MVC and Hibernate.
- Developed test cases for integration testing using JUnit.
- Extensively used tools such as AccVerify, Checkstyle, and Clockworks to check the code.
- Created new web pages and maintained existing ones built with JSP and Servlets.
- Presented the logical and physical process flow to various teams using PowerPoint and Visio diagrams.
Environment: Java JDK, Java J2EE, Informatica, Oracle (TOAD and SQL Developer), Servlets, JBoss Application Server, Waterfall, JSPs, EJBs, DB2, RAD, XML, Web Server, JUnit, Hibernate, MS Access, Microsoft Excel.
Confidential
Java Developer
Responsibilities:
- Worked on projects using Java and Java EE web applications to create fully integrated client management systems.
- Developed the UI using HTML, JavaScript, and JSP, and developed business logic and interfacing components using business objects, JDBC, and XML.
- Participated in user requirement sessions to analyze and gather business requirements.
- Developed the user-visible site using Perl, back-end admin sites using Python, and big data components using core Java.
- Involved in development of the application using Spring Web MVC and other components of the
- Elaborated use cases based on business requirements and created class diagrams and sequence diagrams.
- Implemented object-relational mapping in the persistence layer using the Hibernate (ORM) framework.
- Implemented REST web services with the Jersey API to handle customer requests.
- Experienced in developing RESTful web services, both consuming and producing them.
- Used Hibernate for database connectivity and Hibernate Query Language (HQL) to add and retrieve information from the database.
- Implemented Spring JDBC for connecting to the Oracle database.
- Designed the application using an MVC framework for easy maintainability.
- Provided bug fixing and testing for existing web applications.
- Involved in the full system life cycle, with responsibility for development, testing, and implementation.
- Involved in Unit Testing, Integration Testing and System Testing.
- Implemented Form Beans and their Validations.
- Wrote Hibernate components.
- Developed client-side validations with JavaScript.
Environment: Spring, JSP, Servlets, REST, Oracle, AJAX, JavaScript, jQuery, Hibernate, WebLogic, Log4j, HTML, XML, CVS, Eclipse, SOAP Web Services, XSLT, XSD, UNIX, Maven, Mockito, JUnit, Jenkins, shell scripting, MVS, ISPF.
