Sr. Hadoop/Spark Developer Resume
NJ
SUMMARY:
- 10+ years of IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.
- Expertise with Hadoop ecosystem tools including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, Spark, Kafka, YARN, Oozie, and ZooKeeper.
- Excellent knowledge of Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of the MapReduce programming paradigm.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Good exposure to the Agile software development process.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong experience with Hadoop distributions like Cloudera, MapR and Hortonworks.
- Good understanding of NoSQL databases and hands-on experience writing applications on databases such as HBase (NoSQL) and Postgres.
- Experienced in writing complex MapReduce programs that work with different file formats like Text, SequenceFile, XML, Parquet and Avro.
- Experience using the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa.
- Extensive experience importing and exporting data using stream processing platforms like Flume and Kafka.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Experience in database design using PL/SQL to write Stored Procedures, Functions, Triggers and strong experience in writing complex queries for Oracle.
- Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 for storage.
- Strong experience in Object-Oriented Design, Analysis, Development, Testing and Maintenance.
- Excellent implementation knowledge of Enterprise/Web/Client Server using Java, J2EE.
- Experienced in using agile approaches, including Extreme Programming, Test-Driven Development and Agile Scrum.
- Worked in large and small teams on system requirements, design and development.
- Key participant in all phases of the software development life cycle, including analysis, design, development, integration, implementation, debugging, and testing of software applications in client-server environments, object-oriented technology and web-based applications.
- Experience in using various IDEs (Eclipse, IntelliJ) and repositories (SVN, Git).
- Experience using build tools Ant and Maven.
- Preparation of standard code guidelines and analysis and testing documentation.
TECHNICAL SKILLS:
BigData/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Spark, Kafka, Zookeeper and Oozie
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, Java, Scala, Python, SQL, PL/SQL, Pig Latin, HiveQL, JavaScript, Shell Scripting
Java & J2EE Technologies: Core Java, RESTful
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Cloud Computing Tools: Amazon Web Services (AWS)
Operating Systems: UNIX, Windows, Linux
Databases: Microsoft SQL Server, MySQL, Oracle, DB2
Build Tools: Jenkins, Maven, Ant
Business Intelligence Tools: Tableau, Splunk
Development Tools: Eclipse, NetBeans, IntelliJ
Development Methodologies: Agile/Scrum, Waterfall
Version Control Tools: Git, SVN
PROFESSIONAL EXPERIENCE:
Confidential, NJ
Sr. Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Hive.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
- Tuned the performance of Spark applications by setting the right batch interval time, the correct level of parallelism, and memory configuration.
- Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the Spark cluster.
- Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, efficient joins, and transformations during the ingestion process itself.
- Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores for data access and analysis.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Tuned Hadoop performance with high availability and was involved in the recovery of Hadoop clusters.
- Worked on a cluster of 310 nodes.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Responsible for developing a data pipeline with AWS to extract data from weblogs and store it in HDFS.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Implemented schema extraction for Parquet and Avro file formats in Hive.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Good experience with continuous integration of applications using Jenkins.
- Used reporting tools like Tableau to connect with Hive for generating daily reports.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
Sr. Java/J2EE Developer
Confidential
Responsibilities:
- Involved in Requirement Analysis, Design, Development and Testing of the risk workflow system.
- Involved in the implementation of design using vital phases of the Software development life cycle (SDLC) that includes Development, Testing, Implementation and Maintenance Support.
- Applied OOAD principles for the analysis and design of the system.
- Implemented XML Schema as part of XQuery query language
- Applied J2EE design patterns like Singleton, Business Delegate, Service Locator, Data Transfer Object (DTO), Data Access Objects (DAO) and Adapter during the development of components.
- Used RAD for the Development, Testing and Debugging of the application.
- Used WebSphere Application Server to deploy the build.
- Developed front-end screens using Struts, JSP, HTML, AJAX, jQuery, JavaScript, JSON and CSS.
- Used J2EE for the development of business layer services.
- Developed Struts Action Forms, Action classes and performed action mapping using Struts.
- Performed data validation in Struts Form beans and Action Classes.
- Developed a POJO-based programming model using the Spring Framework.
- Used the IoC (Inversion of Control) pattern and Dependency Injection of the Spring Framework for wiring and managing business objects.
- Used Web Services to connect to mainframe for the validation of the data.
- Used SOAP as the protocol to send requests and responses in the form of XML messages.
- Used JDBC to connect the application to the database.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Used the Log4j framework for logging debug, info and error data.
- Used the Hibernate framework for Object-Relational Mapping (ORM).
- Used Oracle 10g database for data persistence and SQL Developer was used as a database client.
- Extensively worked on Windows and UNIX operating systems.
- Used SecureCRT to transfer files from the local system to the UNIX system.
- Performed Test Driven Development (TDD) using JUnit.
- Used Ant script for build automation.
- Used the PVCS version control system, integrated with the Eclipse IDE, to check in and check out the developed artifacts.
- Used Rational ClearQuest for defect logging and issue tracking.
Environment: Windows XP, Unix, RAD 7.0, Core Java, J2EE, Struts, Spring, Hibernate, Web Services, Design Patterns, WebSphere, Ant, Servlet, JSP, HTML, AJAX, JavaScript, CSS, jQuery, JSON, SOAP, WSDL, XML, Eclipse, Agile, Jira, Oracle 10g, WinSCP, Log4j, JUnit.
