Sr. Hadoop/Spark Developer Resume
Houston, TX
SUMMARY
- Overall 8+ years of IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
- Expertise with tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Rich working experience in loading data into Hive tables and writing Hive queries (joins, ORDER BY, GROUP BY, etc.) on data imported from RDBMS sources with Sqoop.
- Experience in designing and developing applications in Spark using Scala, and in comparing the performance of Spark with Hive and SQL/Oracle.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
- Good understanding of NoSQL databases and hands-on experience writing applications on HBase, Cassandra, and MongoDB.
- Strong grasp of Apache Spark concepts with Scala, writing Scala transformations for live streaming data; performed clickstream analysis using Spark with Scala, gathering data from Kafka and Flume and processing it through Hive, Pig, Python, and shell scripting.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, XML, Apache Parquet, and Avro.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in migrating data between HDFS and relational database systems in both directions using Sqoop, according to client requirements.
- Extensive experience importing and exporting data using stream processing platforms such as Flume and Kafka.
- Wrote Scala code for data analytics in Spark using map, flatMap, reduceByKey, groupByKey, etc. to analyze real-time streaming data; worked with data from various sources such as marketing, research, logs, and website traffic (see the sketch at the end of this summary).
- Involved in performance tuning of applications, using techniques such as the Tez execution engine and the ORC file format for further optimization.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
- Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
- Experienced in working with Amazon Web Services (AWS) using EC2 for computing and S3 as storage mechanism.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked in large and small teams for systems requirement, design & development.
- Key participant in all phases of software development life cycle with Analysis, Design, Development, Integration, Implementation, Debugging, and Testing of Software Applications in client server environment, Object Oriented Technology and Web based applications.
- Experience using IDEs such as Eclipse and IntelliJ, and repositories SVN and Git.
- Experience using build tools Ant and Maven.
- Prepared standard coding guidelines and analysis and testing documentation.
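A minimal sketch of the kind of Scala/Spark transformation pipeline described above (map, flatMap, reduceByKey), counting term frequencies in raw log data; the HDFS path and application name are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object LogTermCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("log-term-counts").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical HDFS location of raw web/log data
    val lines = sc.textFile("hdfs:///data/weblogs/raw/*")

    // flatMap -> map -> reduceByKey: count how often each term appears
    val counts = lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(term => (term, 1L))
      .reduceByKey(_ + _)

    // Print the 20 most frequent terms
    counts.sortBy(_._2, ascending = false).take(20).foreach(println)
    spark.stop()
  }
}
```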
TECHNICAL SKILLS
Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Machine Learning
Hadoop Ecosystem: Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Zookeeper and Oozie
Spark Streaming Technologies: Spark, Spark SQL
NOSQL Databases: HBase, Cassandra, MongoDB
Java/J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB, RESTful
Programming Languages: Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, Java Script, Shell Scripting
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Databases: Microsoft SQL Server, MySQL, Oracle, DB2
Build Tools: Jenkins, Maven, Ant, Anaconda with Spyder
Business Intelligence Tools: Tableau, Splunk, QlikView
Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ
Development Methodologies: Agile/Scrum, Waterfall
PROFESSIONAL EXPERIENCE
Confidential - Houston, TX
Sr. Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop (see the DataFrame sketch after this list).
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment, working with both traditional and non-traditional source systems as well as RDBMS and NoSQL data stores for data access and analysis.
- Worked on a cluster of 130 nodes.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Analyzed the SQL scripts and designed the solution for implementation using PySpark.
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file Formats in Hive.
- Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
- Good experience with continuous integration of the application using Jenkins.
- Used reporting tools such as Tableau, connected to Hive, to generate daily data reports.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
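A minimal sketch of the Kafka-to-Cassandra streaming flow referenced above, assuming the Spark Streaming kafka-0-10 integration and the DataStax spark-cassandra-connector; the broker address, topic name, record layout, and Cassandra keyspace/table are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.kafka.common.serialization.StringDeserializer
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object LearnerEventStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-event-stream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical host

    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second batch interval

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",             // hypothetical brokers
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "learner-stream",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("learner_events"), kafkaParams))

    // Parse hypothetical "userId,courseId,score" records and persist to Cassandra
    stream.map(_.value.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toDouble))
      .saveToCassandra("learning", "learner_events",
        SomeColumns("user_id", "course_id", "score"))        // hypothetical keyspace/table

    ssc.start()
    ssc.awaitTermination()
  }
}
```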
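And a minimal sketch of the DataFrame-style aggregation with a simple Scala UDF; it is written against the SparkSession API for brevity (the original work was on Spark 1.6 with SQLContext/HiveContext), the Hive tables, columns, and grade-band logic are hypothetical, and the resulting staging table is what a Sqoop export to the OLTP system would read:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyAggregates {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-aggregates")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical Hive table of learner activity events
    val events = spark.table("learning.learner_events")

    // Simple UDF: bucket raw scores into grade bands
    val gradeBand = udf((score: Double) =>
      if (score >= 90) "A" else if (score >= 75) "B" else "C")

    val daily = events
      .withColumn("grade", gradeBand($"score"))
      .groupBy($"course_id", $"grade")
      .agg(count(lit(1)).as("attempts"), avg($"score").as("avg_score"))

    // Staging table a downstream Sqoop export could push to the OLTP system
    daily.write.mode("overwrite").saveAsTable("staging.daily_course_grades")
  }
}
```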
Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Oozie, Jenkins, Cloudera, Oracle 12c, Linux.
Confidential, Bentonville, AR
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive and MapReduce.
- Worked on installation and configuration of ZooKeeper to coordinate and monitor cluster resources.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on POCs with Apache Spark, using Scala to implement Spark in the project.
- Consumed the data from Kafka using Apache Spark.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the partitioning sketch after this list).
- Worked on creating HBase tables to load large sets of semi-structured data coming from various sources.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs), and User Defined Aggregate Functions (UDAFs) written in Python.
- Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Responsible for loading data files from various external sources such as Oracle and MySQL into a staging area in MySQL databases.
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Actively involved in code review and bug fixing for improving the performance.
- Created Linux shell scripts to automate the daily ingestion of IVR data.
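A minimal sketch of the dynamic-partition loading pattern mentioned above, issued through spark.sql for consistency with the other sketches (the same HiveQL can be run from the Hive CLI or beeline); the database and table names are hypothetical and the staging table is assumed to already exist:

```scala
import org.apache.spark.sql.SparkSession

object DynamicPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dynamic-partition-load")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic partitioning must be enabled before a partition column
    // can take its values from the query itself
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hypothetical target table, partitioned by load date
    // (bucketing would be declared in Hive DDL with CLUSTERED BY ... INTO N BUCKETS)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS warehouse.sales_part (
        |  customer_id BIGINT,
        |  amount      DOUBLE
        |) PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Dynamic-partition insert: one partition per distinct load_date in the source
    spark.sql(
      """INSERT OVERWRITE TABLE warehouse.sales_part PARTITION (load_date)
        |SELECT customer_id, amount, load_date
        |FROM staging.sales_raw""".stripMargin)
  }
}
```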
Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Kafka, Apache Spark, Shell Scripting, HBase, Python, Kerberos, Agile, ZooKeeper, Maven, Ambari, Hortonworks, Teradata, MySQL.
Confidential, NJ
Hadoop Developer
Responsibilities:
- Importing and exporting data into HDFS and Hive using Sqoop and Kafka.
- Developed different components of the system, such as the Hadoop process involving MapReduce and Hive.
- Developed an interface for validating incoming data into HDFS before kicking off the Hadoop process.
- Wrote Hive queries using optimizations such as user-defined functions and customized Hadoop shuffle and sort parameters.
- Worked on tuning Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation, and how they translate into MapReduce jobs (see the join/aggregation sketch after this list).
- Developed MapReduce programs for different file types, using combiners with UDFs and UDAFs.
- Experience working with multi-node cluster tools that offer several commands for reporting HBase usage.
- Experience in creating, dropping, and altering tables at runtime without blocking updates and queries, using HBase and Hive.
- Experience pre-processing logs and semi-structured content stored on HDFS using Pig.
- Experience importing and exporting structured data into the Hive warehouse, enabling business analysts to write Hive queries.
- Experience in managing and reviewing Hadoop log files.
- Experience with Unix shell scripts for business processes and for loading data from different interfaces into HDFS.
- Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
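The Hive and Pig tuning above centers on join strategy and on aggregating before the shuffle. A minimal Spark SQL sketch of those two ideas is shown below with hypothetical table names; in Hive itself the corresponding knobs are hive.auto.convert.join (map joins) and hive.map.aggr (map-side aggregation):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinAggTuning {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-agg-tuning")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables: a large fact table and a small dimension table
    val orders    = spark.table("sales.orders")
    val customers = spark.table("sales.customers")

    // Broadcasting the small side is the Spark analogue of a Hive map join;
    // partial aggregation before the shuffle plays the role of Hive's
    // map-side aggregation.
    val summary = orders
      .join(broadcast(customers), "customer_id")
      .groupBy("region")
      .count()

    summary.show()
  }
}
```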
Environment: Linux 6.7, CDH 5.5.2, MapReduce, Hive 1.1, Pig, HBase, Shell Script, Sqoop 1.4.3, Eclipse, Java 1.8.
Confidential, Dublin, OH
Big Data/Hadoop Developer
Responsibilities:
- Developed automated scripts to install Hadoop clusters
- Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode; mapped DB2 V9.7 and V10.x data types to Hive data types and performed validations.
- Developed Hive jobs to transfer 8 years of bulk data from DB2 and MS SQL Server to the HDFS layer.
- Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts.
- Built a job automation framework to support and operationalize data loads.
- Automated the DDL creation process in Hive by mapping the DB2 data types (see the DDL sketch after this list).
- Monitored Hadoop cluster job performance and capacity planning.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Experienced with the Hadoop framework, HDFS, and MapReduce processing implementation.
- Tuned Hadoop performance with high availability and was involved in the recovery of Hadoop clusters.
- Responsible for coding Java batch jobs, RESTful services, MapReduce programs, and Hive queries, as well as testing, debugging, peer code review, troubleshooting, and maintaining status reports.
- Designed business classes and used design patterns such as Data Access Object and MVC.
- Used Avro and Parquet file formats for data serialization.
- Good experience with ETL data flows using Informatica PowerCenter.
- Developed several test cases using MRUnit for testing MapReduce applications.
- Responsible for troubleshooting and resolving the performance issues of Hadoop cluster.
- Used Bzip2 compression to compress files before loading them into Hive.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile devices, and pushed it to HDFS.
- Experience using HBase as the backend database for application development.
- Supported and troubleshot Hive programs running on the cluster and was involved in fixing issues arising out of duration testing.
- Prepared daily and weekly project status reports and shared them with the client.
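A minimal sketch of the kind of automated DB2-to-Hive DDL generation described above; the type map is an illustrative subset and the table and column names are hypothetical:

```scala
object Db2ToHiveDdl {
  // Illustrative subset of a DB2 -> Hive type mapping (not exhaustive)
  private val typeMap = Map(
    "INTEGER"   -> "INT",
    "BIGINT"    -> "BIGINT",
    "SMALLINT"  -> "SMALLINT",
    "DECIMAL"   -> "DECIMAL",
    "VARCHAR"   -> "STRING",
    "CHAR"      -> "STRING",
    "DATE"      -> "DATE",
    "TIMESTAMP" -> "TIMESTAMP"
  )

  /** Build a Hive CREATE TABLE statement from (columnName, db2Type) pairs. */
  def createTableDdl(table: String, columns: Seq[(String, String)]): String = {
    val cols = columns
      .map { case (name, db2Type) =>
        val hiveType = typeMap.getOrElse(db2Type.toUpperCase, "STRING") // fallback
        s"  $name $hiveType"
      }
      .mkString(",\n")
    s"CREATE TABLE IF NOT EXISTS $table (\n$cols\n) STORED AS ORC"
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical column list pulled from the DB2 catalog
    val columns = Seq(("claim_id", "BIGINT"), ("member_nm", "VARCHAR"), ("svc_dt", "DATE"))
    println(createTableDdl("warehouse.claims", columns))
  }
}
```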
Environment: Hadoop, MapReduce, Flume, Sqoop, Hive, Pig, Web Services, Linux, Core Java, Informatica, HBase, Avro, JIRA, Git, Cloudera, MRUnit, MS SQL Server, UNIX, DB2
Confidential, Columbus
Java/J2EE Developer
Responsibilities:
- Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
- Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
- Applied OOAD principle for the analysis and design of the system.
- Implemented XML Schema as part of the XQuery query language.
- Applied J2EE design patterns like Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Objects (DAO) and Adapter during the development of components.
- Used RAD for the Development, Testing and Debugging of the application.
- Used Websphere Application Server to deploy the build.
- Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON, and CSS.
- Used J2EE for the development of business layer services.
- Developed Struts Action Forms, Action classes and performed action mapping using Struts.
- Performed data validation in Struts Form beans and Action Classes.
- Developed a POJO-based programming model using the Spring framework.
- Used the Inversion of Control (IoC) pattern and dependency injection of the Spring framework for wiring and managing business objects.
- Used Web Services to connect to mainframe for the validation of the data.
- SOAP has been used as a protocol to send request and response in the form of XML messages.
- JDBC framework has been used to connect the application with the Database.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Log4j framework has been used for logging debug, info & error data.
- Used Hibernate framework for Entity Relational Mapping.
- Used Oracle 10g database for data persistence and SQL Developer was used as a database client.
- Extensively worked on Windows and UNIX operating systems.
- Used SecureCRT to transfer files from the local system to the UNIX system.
- Performed Test Driven Development (TDD) using JUnit.
- Used Ant script for build automation.
- The PVCS version control system was used to check in and check out developed artifacts; it was integrated with the Eclipse IDE.
- Used Rational ClearQuest for defect logging and issue tracking.
Environment: Windows XP, Unix, RAD 7.0, Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, WebSphere, Ant, (Servlet, JSP), HTML, AJAX, JavaScript, CSS, jQuery, JSON, SOAP, WSDL, XML, Eclipse, Agile, Jira, Oracle 10g, WinSCP, Log4J, JUnit.
Confidential
Java/J2EE Developer
Responsibilities:
- Implemented new module development and new change requests, and fixed code defects identified in pre-production and production environments.
- Wrote technical design document with class, sequence, and activity diagrams in each use case.
- Created Wiki pages using Confluence Documentation.
- Developed various reusable helper and utility classes which were used across all modules of the application.
- Involved in developing XML compilers using XQuery.
- Developed the Application using Spring MVC Framework by implementing Controller, Service classes.
- Involved in writing the Spring configuration XML file containing bean declarations and declarations of other dependent objects.
- Used Hibernate as the persistence framework; involved in creating DAOs and used Hibernate for ORM mapping.
- Written Java classes to test UI and Web services through JUnit.
- Performed functional and integration testing and was extensively involved in critical release/deployment activities. Responsible for designing rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS, and HTML.
- Used SVN for version control. Log4J was used to log both User Interface and Domain Level Messages.
- Used Soap UI for testing the Web Services.
- Used Maven for dependency management and project structure.
- Created deployment documents for various environments such as Test, QC, and UAT.
- Involved in system wide enhancements supporting the entire system and fixing reported bugs.
- Explored Spring MVC, Spring IOC, Spring AOP, and Hibernate in creating the POC.
- Performed data manipulation on the front end using JavaScript and JSON.
Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4J, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema, SVN, Git.