Sr. Hadoop/Spark Developer Resume

Detroit, MI

SUMMARY:

  • 8+ years of IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
  • Expertise with Hadoop ecosystem tools, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Strong experience writing Python applications using libraries such as Pandas, NumPy, SciPy, and Matplotlib.
  • Good knowledge of machine learning in Python, including concepts such as data preprocessing, regression, and classification, and appropriate model selection techniques.
  • Good exposure to the Agile software development process.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Strong experience with Hadoop distributions such as Cloudera, MapR, and Hortonworks.
  • Good understanding of NoSQL databases and hands on work experience in writing applications on NoSQL databases like HBase, Cassandra and MongoDB.
  • Experienced in writing complex MapReduce programs that work with different file formats, including Text, Sequence, XML, Parquet, and Avro.
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
  • Experience in migrating data between HDFS and relational database systems, in both directions, using Sqoop.
  • Extensive Experience on importing and exporting data using stream processing platforms like Flume and Kafka.
  • Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
  • Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC, SOAP and RESTful web services.
  • Strong experience with data warehousing and ETL concepts using Informatica PowerCenter, OLAP, OLTP, and AutoSys.
  • Experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex Oracle queries.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 for storage.
  • Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
  • Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
  • Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
  • Worked in large and small teams for systems requirement, design & development.
  • Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server environments.
  • Object-oriented development experience using IDEs such as Eclipse and IntelliJ and repositories such as SVN and Git.
  • Experience using build tools such as Ant and Maven.
  • Prepared standard coding guidelines and analysis and testing documentation.

TECHNICAL SKILLS:

BigData/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper and Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Shell Scripting

Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts, JMS, EJB, RESTful

Application Servers: WebLogic, WebSphere, JBoss, Tomcat

Cloud Computing Tools: Amazon Web Services (AWS)

Operating Systems: UNIX, Windows, LINUX

Databases: Microsoft SQL Server, MySQL, Oracle, DB2

Build Tools: Jenkins, Maven, ANT

Business Intelligence Tools: Tableau, Splunk, QlikView

Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, IntelliJ

Development Methodologies: Agile/Scrum, Waterfall

Version Control Tools: Git, SVN

PROFESSIONAL EXPERIENCE:

Confidential, Detroit, MI

Sr. Hadoop/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (a minimal sketch of this pattern follows this list).
  • Configured, deployed, and maintained multi-node development and test Kafka clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/Datasets/SQL and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations applied during the ingestion process itself.
  • Designed, developed, and maintained data integration programs in Hadoop and RDBMS environments, working with both traditional and non-traditional source systems and with RDBMS and NoSQL data stores, for data access and analysis.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to evaluate adopting Impala in the project.
  • Worked on a cluster of 130 nodes.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in HDFS.
  • Involved in creating Hive tables and in loading and analyzing data using Hive queries.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file Formats in Hive.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Good experience with continuous integration of the application using Jenkins.
  • Used reporting tools such as Tableau, connected to Hive, to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
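
A minimal sketch of the Kafka-to-Cassandra streaming pattern described above, using the Spark 1.6-era Java API and the DataStax Spark-Cassandra connector. The broker address, topic, keyspace, table, and record layout are hypothetical placeholders, and the real learner-data-model transformations are elided.

    import java.io.Serializable;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    import com.datastax.spark.connector.japi.CassandraJavaUtil;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class LearnerModelStream {

        // Hypothetical record persisted to Cassandra; bean properties map to columns.
        public static class LearnerRecord implements Serializable {
            private String learnerId;
            private String event;
            public LearnerRecord() { }
            public LearnerRecord(String learnerId, String event) {
                this.learnerId = learnerId;
                this.event = event;
            }
            public String getLearnerId() { return learnerId; }
            public void setLearnerId(String learnerId) { this.learnerId = learnerId; }
            public String getEvent() { return event; }
            public void setEvent(String event) { this.event = event; }

            static LearnerRecord parse(String line) {
                String[] parts = line.split(",", 2); // assumed CSV payload
                return new LearnerRecord(parts[0], parts[1]);
            }
        }

        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("learner-model-stream");
            // The batch interval is a tuning knob; 10 seconds is only a placeholder.
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "broker1:9092"); // hypothetical broker
            Set<String> topics = Collections.singleton("learner-events"); // hypothetical topic

            // Receiver-less direct stream from Kafka.
            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class,
                    StringDecoder.class, StringDecoder.class, kafkaParams, topics);

            // Transform each message and persist each micro-batch to Cassandra.
            stream.map(message -> LearnerRecord.parse(message._2()))
                  .foreachRDD(rdd -> CassandraJavaUtil.javaFunctions(rdd)
                          .writerBuilder("learner_ks", "learner_model",
                                  CassandraJavaUtil.mapToRow(LearnerRecord.class))
                          .saveToCassandra());

            jssc.start();
            jssc.awaitTermination();
        }
    }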

Environment: Hadoop YARN, Spark Core, Spark Streaming, Spark SQL, Scala, Python, Kafka, Hive, Sqoop, AWS, Elasticsearch, Impala, Cassandra, Tableau, Talend, Oozie, Jenkins, Cloudera, Oracle 12c, Linux.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce.
  • Managed the fully distributed Hadoop cluster as an additional responsibility; trained to take over the duties of a Hadoop administrator, including managing the cluster and handling upgrades and installation of Hadoop ecosystem tools.
  • Worked on installing and configuring ZooKeeper to coordinate and monitor cluster resources.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on POCs with Apache Spark, using Scala, to evaluate adopting Spark in the project.
  • Consumed data from Kafka using Apache Spark.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Involved in loading data from the Linux file system into HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (see the sketch after this list).
  • Worked on creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Extended Hive and Pig core functionality with custom user-defined functions (UDFs), user-defined table-generating functions (UDTFs), and user-defined aggregate functions (UDAFs), written in Python.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experienced in performing CRUD operations in HBase.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Responsible for loading data files from various external sources such as Oracle and MySQL into a staging area in MySQL databases.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Actively involved in code review and bug fixing for improving the performance.
  • Good experience in handling data manipulation using Python scripts.
  • Involved in developing, building, testing, and deploying to the Hadoop cluster in distributed mode.
  • Created Linux shell scripts to automate the daily ingestion of IVR data.
  • Helped the analytics team with Aster queries using HCatalog.
  • Automated the History and Purge Process.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Developed the verification and control process for the daily load.
  • Provided daily production support, monitoring and troubleshooting Hadoop/Hive jobs.
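
The Hive partitioning and bucketing called out above follows a standard recipe: partition by a date column so queries prune directories, and bucket by a join key. Below is a minimal sketch issuing HiveQL through a Spark 1.x HiveContext in Java; the table names (tx_raw, tx_part) and columns are hypothetical.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.hive.HiveContext;

    public class HivePartitioningDemo {
        public static void main(String[] args) {
            JavaSparkContext sc =
                    new JavaSparkContext(new SparkConf().setAppName("hive-partitioning-demo"));
            HiveContext hive = new HiveContext(sc.sc());

            // Partitioned by ingest date, bucketed by customer id.
            hive.sql("CREATE TABLE IF NOT EXISTS tx_part "
                   + "(txn_id STRING, amount DOUBLE, customer_id STRING) "
                   + "PARTITIONED BY (ds STRING) "
                   + "CLUSTERED BY (customer_id) INTO 32 BUCKETS "
                   + "STORED AS ORC");

            // Dynamic partitioning: the partition value comes from the data itself.
            hive.sql("SET hive.exec.dynamic.partition=true");
            hive.sql("SET hive.exec.dynamic.partition.mode=nonstrict");
            hive.sql("SET hive.enforce.bucketing=true");

            // The dynamic-partition column (ds) must be selected last.
            hive.sql("INSERT OVERWRITE TABLE tx_part PARTITION (ds) "
                   + "SELECT txn_id, amount, customer_id, ds FROM tx_raw");

            sc.stop();
        }
    }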

Environment: Hadoop, HDFS, Pig, Apache Hive, Sqoop, Kafka, Apache Spark, Storm, Solr, Shell Scripting, HBase, Python, Kerberos, Agile, ZooKeeper, Maven, Ambari, Hortonworks, MySQL.

Confidential, Columbia, MD

Big Data/Hadoop Developer

Responsibilities:

  • Worked on the proof-of-concept for the Apache Hadoop 1.20.2 framework initiation.
  • Installed and configured Hadoop clusters and eco-system
  • Developed automated scripts to install Hadoop clusters
  • Involved in all phases of the big data implementation, including requirement analysis, design, development, building, testing, and deployment of the Hadoop cluster in fully distributed mode.
  • Mapped DB2 V9.7 and V10.x data types to Hive data types and performed validations.
  • Loaded and retrieved unstructured data (CLOB, BLOB, etc.).
  • Developed Hive jobs to transfer 8 years of bulk data from DB2 and MS SQL Server to the HDFS layer.
  • Implemented Data Integrity and Data Quality checks in Hadoop using Hive and Linux scripts
  • Built a job automation framework to support and operationalize data loads.
  • Automated the DDL creation process in Hive by mapping the DB2 data types.
  • Monitored Hadoop cluster job performance and capacity planning.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Experienced with the Hadoop framework and with HDFS and MapReduce processing implementation.
  • Tuned Hadoop performance for high availability and was involved in the recovery of Hadoop clusters.
  • Responsible for coding Java batch programs, RESTful services, MapReduce programs, and Hive queries; also responsible for testing, debugging, peer code review, troubleshooting, and maintaining status reports.
  • Designed business classes and used design patterns such as Data Access Object and MVC.
  • Used Avro and Parquet file formats for serialization of data.
  • Good experience with ETL data flows using Informatica PowerCenter.
  • Developed several test cases using MRUnit for testing MapReduce applications (a minimal sketch follows this list).
  • Responsible for troubleshooting and resolving the performance issues of Hadoop cluster.
  • Used the Bzip2 compression technique to compress files before loading them into Hive.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile devices and pushed to HDFS.
  • Experience in using HBase as backend database for the application development.
  • Supported and troubleshot Hive programs running on the cluster and was involved in fixing issues arising from duration testing.
  • Prepared daily and weekly project status reports and shared them with the client.
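
One of the MRUnit test cases mentioned above might look like the following minimal sketch. The TokenMapper and its input are hypothetical stand-ins, not the project's actual mappers.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Before;
    import org.junit.Test;

    public class TokenMapperTest {

        // Hypothetical mapper under test: emits (token, 1) per whitespace-separated token.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }

        private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

        @Before
        public void setUp() {
            mapDriver = MapDriver.newMapDriver(new TokenMapper());
        }

        @Test
        public void emitsOneCountPerToken() throws IOException {
            // MRUnit feeds the input through the mapper and asserts on the exact output.
            mapDriver.withInput(new LongWritable(0), new Text("hadoop spark"))
                     .withOutput(new Text("hadoop"), new IntWritable(1))
                     .withOutput(new Text("spark"), new IntWritable(1))
                     .runTest();
        }
    }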

Environment: Hadoop, MapReduce, Flume, Sqoop, Hive, Pig, Web Services, Linux, Core Java, Informatica, HBase, Avro, JIRA, Git, Cloudera, MRUnit, MS SQL Server, UNIX, DB2.

Confidential, Buffalo, NY

Hadoop Developer

Responsibilities:

  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Configured Sqoop Jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Involved in creating Hive internal and external tables, loading data, and writing Hive queries, which run internally as MapReduce jobs (see the sketch after this list).
  • Involved in Migrating the Hive queries to Impala.
  • Created batch analysis job prototypes using Hadoop, Pig, Oozie, Hue and Hive.
  • Assisted with data capacity planning and node forecasting.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Documented system processes and procedures for future reference.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Monitored and performance-tuned Hadoop clusters: screened Hadoop cluster job performance, performed capacity planning, monitored cluster connectivity and security, and managed and reviewed Hadoop log files.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
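
The internal/external table distinction above matters for data ownership: dropping a managed (internal) table deletes its files, while dropping an external table removes only the metadata, leaving the underlying HDFS data (for example, a Sqoop import directory) intact. A minimal sketch over Hive JDBC follows; the HiveServer2 URL, user, HDFS path, and table names are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class HiveTableSetup {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "etl", ""); // hypothetical
                 Statement stmt = conn.createStatement()) {

                // Managed table: Hive owns the storage; DROP TABLE deletes the files.
                stmt.execute("CREATE TABLE IF NOT EXISTS orders_managed "
                           + "(order_id STRING, amount DOUBLE) STORED AS ORC");

                // External table over a Sqoop-imported directory: DROP TABLE keeps the files.
                stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS orders_ext "
                           + "(order_id STRING, amount DOUBLE) "
                           + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                           + "LOCATION '/data/sqoop/orders'");
            }
        }
    }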

Environment: Hadoop, MapReduce, HDFS, Hive, HBase, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, Linux, Cluster Management

Confidential, Chicago, IL

Sr.Java/J2EE Developer

Responsibilities:

  • Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
  • Involved in implementing the design across the phases of the software development life cycle (SDLC), including development, testing, implementation, and maintenance support.
  • Applied OOAD principles in the analysis and design of the system.
  • Implemented XML Schema as part of the XQuery query language.
  • Applied J2EE design patterns such as Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Object (DAO), and Adapter during the development of components (a minimal DAO sketch follows this list).
  • Used RAD for the Development, Testing and Debugging of the application.
  • Used WebSphere Application Server to deploy the build.
  • Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON, and CSS.
  • Used J2EE for the development of business layer services.
  • Developed Struts Action Forms, Action classes and performed action mapping using Struts.
  • Performed data validation in Struts Form beans and Action Classes.
  • Developed a POJO-based programming model using the Spring framework.
  • Used the Inversion of Control (IoC) pattern and dependency injection in the Spring framework for wiring and managing business objects.
  • Used Web Services to connect to mainframe for the validation of the data.
  • Used SOAP as the protocol to send requests and responses in the form of XML messages.
  • Used the JDBC framework to connect the application to the database.
  • Used Eclipse for the development, testing, and debugging of the application.
  • Used the Log4j framework for logging debug, info, and error data.
  • Used the Hibernate framework for object-relational mapping.
  • Used Oracle 10g database for data persistence and SQL Developer was used as a database client.
  • Extensively worked on Windows and UNIX operating systems.
  • Used SecureCRT to transfer files from the local system to the UNIX system.
  • Performed Test Driven Development (TDD) using JUnit.
  • Used Ant script for build automation.
  • Used the PVCS version control system, integrated with the Eclipse IDE, to check in and check out developed artifacts.
  • Used Rational ClearQuest for defect logging and issue tracking.
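
The DAO pattern noted above keeps persistence behind an interface so business code never touches JDBC directly; swapping the data store touches only the implementation class. A minimal sketch with a hypothetical Trade entity and trades table:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical value object carried between layers (the DTO).
    class Trade {
        final long id;
        final String symbol;
        Trade(long id, String symbol) {
            this.id = id;
            this.symbol = symbol;
        }
    }

    // The DAO contract: business logic depends on this, not on JDBC.
    interface TradeDao {
        List<Trade> findBySymbol(String symbol) throws SQLException;
    }

    // JDBC implementation of the contract.
    class JdbcTradeDao implements TradeDao {
        private final Connection conn;

        JdbcTradeDao(Connection conn) {
            this.conn = conn;
        }

        @Override
        public List<Trade> findBySymbol(String symbol) throws SQLException {
            List<Trade> result = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT id, symbol FROM trades WHERE symbol = ?")) {
                ps.setString(1, symbol);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        result.add(new Trade(rs.getLong("id"), rs.getString("symbol")));
                    }
                }
            }
            return result;
        }
    }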

Environment: Windows XP, Unix, RAD 7.0, Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, WebSphere, Ant, Servlets, JSP, HTML, AJAX, JavaScript, CSS, jQuery, JSON, SOAP, WSDL, XML, Eclipse, Agile, Jira, Oracle 10g, WinSCP, Log4J, JUnit.

Confidential, Omaha, NE

Java/J2EE Developer

Responsibilities:

  • Designed and developed the application using agile methodology.
  • Implemented new modules and change requests, and fixed defects identified in pre-production and production environments.
  • Wrote technical design document with class, sequence, and activity diagrams in each use case.
  • Created Wiki pages using Confluence Documentation.
  • Developed various reusable helper and utility classes which were used across all modules of the application.
  • Involved in developing XML compilers using XQuery.
  • Developed the application using the Spring MVC framework, implementing controller and service classes (a minimal sketch follows this list).
  • Involved in writing the Spring configuration XML file containing bean declarations and other dependent object declarations.
  • Used Hibernate as the persistence framework; involved in creating DAOs and Hibernate ORM mappings.
  • Wrote Java classes to test the UI and web services through JUnit.
  • Performed functional and integration testing and was extensively involved in release/deployment activities; responsible for designing rich user interface applications using JSP, JSP tag libraries, Spring tag libraries, JavaScript, CSS, and HTML.
  • Used SVN for version control; Log4j was used to log both user-interface and domain-level messages.
  • Used SoapUI for testing the web services.
  • Used Maven for dependency management and project structure.
  • Created deployment documents for various environments such as Test, QC, and UAT.
  • Involved in system wide enhancements supporting the entire system and fixing reported bugs.
  • Explored Spring MVC, Spring IOC, Spring AOP, and Hibernate in creating the POC.
  • Performed data manipulation on the front end using JavaScript and JSON.
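
A minimal sketch of the Spring MVC controller/service split described above. The GreetingService, request path, and view name are hypothetical; the controller delegates to the service and hands the result to a JSP view via the model.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.stereotype.Service;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RequestParam;

    // Service layer: business logic stays out of the controller.
    @Service
    class GreetingService {
        String greet(String name) {
            return "Hello, " + name;
        }
    }

    // Controller: maps the request, delegates to the service, and selects the view.
    @Controller
    class GreetingController {
        private final GreetingService service;

        @Autowired
        GreetingController(GreetingService service) {
            this.service = service;
        }

        @RequestMapping(value = "/greet", method = RequestMethod.GET)
        public String greet(@RequestParam("name") String name, Model model) {
            model.addAttribute("message", service.greet(name));
            return "greeting"; // resolved to a JSP by the configured ViewResolver
        }
    }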

Environment: Java, J2EE, JSP, Spring, Hibernate, CSS, JavaScript, Oracle, JBoss, Maven, Eclipse, JUnit, Log4J, AJAX, Web services, JNDI, JMS, HTML, XML, XSD, XML Schema, SVN, Git.
