Sr. Hadoop/Spark Developer Resume
NJ
SUMMARY:
- A versatile software developer with over 9 years of experience, including 5 years focused extensively on Hadoop and 4+ years in Java/J2EE enterprise application design, development, and maintenance.
- Professional experience developing and implementing on a Big Data Management Platform (BMP) using HDFS, Spark, MapReduce, Hive, Pig Latin, Oozie, Flume, Kafka, Impala, and other Hadoop ecosystem components for data storage and retrieval.
- Extensive experience installing, configuring, and using ecosystem components such as MapReduce, HDFS, HBase, Hive, Pig, Flume, Sqoop, Spark, Mahout, and R packages.
- 6 or more years of experience with tools and frameworks across the big data Hadoop ecosystem (HDFS, Hive, Pig Latin, MapReduce, Oozie, Sqoop, YARN, HBase, NoSQL).
- Experience using Sqoop to move data between RDBMS and HDFS; experience with Data Warehouse/ODS environments.
- Wrote MapReduce programs for data processing and analysis, with expertise in developing MapReduce jobs to scrub, sort, filter, join, and summarize data.
- Proficient in analyzing data with Hive and Pig Latin on the Hadoop ecosystem; developed batch processing jobs using Java, MapReduce, Pig, and Hive.
- Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
- Worked on Oozie for managing Hadoop jobs and configuring job flows; experienced in designing both time-driven and data-driven automated workflows using Oozie.
- Experience writing Pig Latin and HiveQL scripts for data analysis and ETL, extending default functionality with User Defined Functions (UDFs) and User Defined Aggregate Functions (UDAFs) for custom data-specific processing (see the UDF sketch at the end of this summary).
- Experience in cluster coordination using ZooKeeper, and in building and maintaining big data and fast-data applications using open-source software such as Spark, Sqoop, Flume, and Kafka.
- Experience in loading logs from multiple sources directly into HDFS using Flume.
- Managed data extraction jobs and built new data pipelines from various structured and unstructured sources into Hadoop.
- In-depth understanding of Hadoop architecture and its components, such as the Resource Manager, Application Master, Name Node, Data Node, and HBase design principles.
- Executed and supported test scripts, analyzing and validating results against established success criteria.
- Familiar with developing predictive models using machine learning and data mining algorithms.
- Supported patching, incident, and problem management activities, and provided work guidance and technical assistance to less senior engineers.
- Experience writing queries to move data from HDFS to Hive and analyzing data with HiveQL, including an understanding of partitioning, bucketing, and Hive query optimization.
- Strong knowledge of object-oriented concepts and the Software Development Life Cycle (SDLC), including the business interaction, requirements analysis, software architecture, design, development, testing, and documentation phases.
- Profound experience in machine learning, Python, R statistical software, and the data analytics tool Tableau.
- Created various use cases with massive public data sets and ran performance tests to verify the efficacy of MapReduce, Pig, and Hive in standalone, pseudo-distributed, cluster, and cloud modes.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Working experience with Enterprise Data Warehouse (EDW) architecture and data modeling concepts such as star schema, spanning development, deployment, and testing across various domains.
- Expertise in designing and implementing disaster recovery plans for Hadoop clusters.
- Experience with Apache, Hortonworks (HDP), and Cloudera (CDH) distributions; experience developing logging standards and mechanisms based on Log4j.
- Experienced in preparing and executing Unit Test Plan and Unit Test Cases after software development.
- Worked with development peers to tune infrastructure and plan resource management, including adding/removing cluster nodes for maintenance or capacity needs.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Experience writing Shell scripts in Linux OS and integrating them with other solutions.
- Intensive work experience developing enterprise solutions using Java, J2EE, Servlets, JSP, JDBC, Struts, Spring, Hibernate, JavaBeans, JSF, and MVC.
- Expert at creating UML diagrams (use case, activity, class, and sequence diagrams) using Microsoft Visio.
- Hands-on experience using Python to maintain and improve web applications.
- A great team player with the ability to multi-task, work on multiple projects, and prioritize correctly; experienced with large projects, handling clients, and leading offshore teams, with the ability to communicate effectively at all levels of the organization, including technical staff, management, and customers.
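As a concrete illustration of the Hive UDF work referenced in this summary, the following is a minimal sketch of a reflection-based Hive UDF in Java; the class name and normalization logic are hypothetical examples, not code from an actual engagement.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/** Hypothetical example UDF: trims and lower-cases a string column for consistent joins. */
public final class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve SQL NULL semantics
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```

Once packaged into a jar, a UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL queries.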
TECHNICAL SKILLS:
Hadoop & Big Data Technologies: HDFS, Hive, Pig Latin, Sqoop, Flume, MapReduce, Oozie, HBase, Cassandra, Spark, ZooKeeper, YARN, Kafka, Storm, Machine Learning, Neural Networks
Programming Languages: SQL, PL/SQL, R, Python, Java (JDK 1.4/1.5/1.6), C/C++, Scala, HTML
Frameworks: Hibernate 2.x/3.x, Spring 2.x/3.x, Struts1.x/2.x and JPA
Databases: NoSQL, Oracle 9i/10g, Microsoft SQL Server 2008/2012, DB2, MySQL 4.x/5.x, Teradata
Machine Learning & Data Mining Algorithms: Supervised/Unsupervised Learning, Regression, Classification, Clustering, Time Series, Naive Bayes, Decision Trees, Random Forest, Bayesian Methods, Hierarchical Clustering, Reinforcement Learning, Dimensionality Reduction, Feature Selection, Feature Extraction
Scripting Languages: Unix Shell Scripting, Perl
Java/J2EE & Web Technologies: J2EE, EJB, JSF, Servlets, JSP, JSTL, CSS, HTML, XML, AngularJS, AJAX, JavaScript
Methodologies: Agile/Scrum, UML, Rational Unified Process and Waterfall.
Web Application Servers: IBM WebSphere, Tomcat, WebLogic
Distributed platforms: Hortonworks, Cloudera, MapR
Operating Systems: Windows 7/8/10, UNIX, Linux
Development Tools: Eclipse, NetBeans, SVN, Git, Ant, Maven, Visio, SOAP UI, JMX Explorer, XML Spy, QC, QTP, Jira, SQL Developer, TOAD
PROFESSIONAL EXPERIENCE:
Confidential, Philadelphia
Sr. Hadoop /Spark Developer
Responsibilities:
- The main aim of the project was to tune the performance of the existing Hive queries.
- Implemented Spark applications using Scala and Java, utilizing the DataFrame and Spark SQL APIs for faster data processing.
- Created end-to-end Spark applications in Scala to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
- Developed data pipelines using Spark, Hive, and Sqoop to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for high data volumes.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the batch Spark sketch after this list).
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Streamed data in real time using Spark with Kafka (see the streaming sketch after this list).
- Created applications using Kafka that monitor consumer lag within Apache Kafka clusters; used in production by multiple companies.
- Developed Python scripts to copy data between the clusters; the scripts enabled very large volumes of data to be copied quickly.
- Hands-on experience developing predictive models using machine learning algorithms.
- Ingested syslog messages, parsed them, and streamed the data to Apache Kafka.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce before loading the results back into HDFS.
- Exported the analyzed data to relational databases using Sqoop for the BI team to visualize and generate reports from.
- Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis.
- Analyzed the data by running Hive queries (HiveQL) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed HiveQL scripts to de-normalize and aggregate the data.
- Created HBase tables and column families to store the user event data.
- Scheduled and executed workflows in Oozie to run Hive jobs.
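The following is a minimal sketch, in Java, of the kind of Hive-to-Spark conversion described above: a HiveQL aggregation re-expressed with the DataFrame API. It is written against the Spark 2.x SparkSession API, and the database, table, and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.count;
import static org.apache.spark.sql.functions.sum;

public class OrderSummaryJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("order-summary")
                .enableHiveSupport() // read and write Hive tables directly
                .getOrCreate();

        // Hypothetical Hive table; replaces an equivalent HiveQL GROUP BY query.
        Dataset<Row> orders = spark.table("sales.orders");
        Dataset<Row> summary = orders
                .filter(col("status").equalTo("COMPLETE"))
                .groupBy(col("region"))
                .agg(sum("amount").alias("total_amount"),
                     count(col("order_id")).alias("order_count"));

        summary.write().mode("overwrite").saveAsTable("sales.order_summary");
        spark.stop();
    }
}
```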
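And a minimal streaming sketch for the Spark-with-Kafka work above, using the Spark Structured Streaming Kafka source (the original jobs on this stack may have used the older DStream API); the broker address, topic name, and output paths are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaStreamJob {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-stream")
                .getOrCreate();

        // Hypothetical broker list and topic.
        Dataset<Row> raw = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker1:9092")
                .option("subscribe", "user-events")
                .load();

        // Kafka delivers the payload as binary; cast it to a string column.
        Dataset<Row> events = raw.selectExpr("CAST(value AS STRING) AS event");

        // Append each micro-batch to Parquet files on HDFS (hypothetical paths).
        StreamingQuery query = events.writeStream()
                .format("parquet")
                .option("path", "/data/streams/user_events")
                .option("checkpointLocation", "/checkpoints/user_events")
                .start();

        query.awaitTermination();
    }
}
```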
Environment: Hadoop, HDFS, MapR 5.1, HBase, Spark, Scala, Hive, MapReduce, Python, Sqoop, ETL, Java, PL/SQL, Oracle 11g, Unix/Linux, Machine Learning
Confidential, NJ
Sr. Hadoop Developer
Responsibilities:
- Extensively worked on Oozie and Unix scripts for batch processing and dynamic workflow scheduling.
- Implemented data ingestion from multiple sources like IBM DB2, Teradata and Oracle using Sqoop, SFTP and MR jobs.
- Developed Sqoop scripts to import and export data from relational sources, handling incremental loads and updated changes into the HDFS layer.
- Developed transformations and aggregated the data for large data sets using MapReduce, Pig, Hive scripts.
- Implemented partitioning and bucketing in Hive tables and ran the scripts in parallel to improve performance.
- Developed test cases in JUnit for unit testing of MapReduce Jobs.
- Explored different BI reporting tools to determine which best suited the requirements.
- Worked on JDBC connectivity using Squirrel and ODBC connection setups with Tableau and Toad, and stress-tested Hadoop data for BI reporting tools (see the Hive JDBC sketch after this list).
- Implemented a process to automatically update the Hive tables by reading a change file provided by business users.
- Experienced working with different file formats - Avro, Parquet and JSON.
- Experience in using Gzip, LZO, Snappy and Bzip2 compressions.
- Experience reading and writing files in HDFS using the Java FileSystem API (see the HDFS sketch after this list).
- Developed Pig and Hive UDFs based on requirements.
- Developed workflow jobs using Oozie services to run the MapReduce, Pig, and Hive jobs, and created JIL scripts to run the Oozie jobs.
- Improved performance using advanced joins in Apache Pig and Apache Hive.
- Tuned MapReduce job parameters and configuration parameters to improve performance.
- Copied data between production and lower environments.
- Created reporting views in Impala using Sentry Policy files.
- Involved in all phases of the SDLC including analysis, design, development, testing, and deployment of Hadoop cluster.
- Experience with Agile development processes and practices.
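To illustrate the Java FileSystem API usage mentioned above, here is a minimal sketch that writes a file to HDFS and reads it back; the paths and file contents are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWriteExample {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/data/staging/example.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello,hdfs\n".getBytes(StandardCharsets.UTF_8));
        }

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }
    }
}
```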
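Likewise, a minimal sketch of JDBC connectivity against HiveServer2, creating a partitioned, bucketed table of the kind described above; the endpoint, credentials, and schema are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical HiveServer2 endpoint and user.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Partitioned, bucketed table (hypothetical schema).
            stmt.execute("CREATE TABLE IF NOT EXISTS events ("
                    + " user_id BIGINT, action STRING)"
                    + " PARTITIONED BY (event_date STRING)"
                    + " CLUSTERED BY (user_id) INTO 32 BUCKETS"
                    + " STORED AS ORC");

            try (ResultSet rs = stmt.executeQuery("SHOW PARTITIONS events")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```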
Environment: Cloudera Hadoop, Hortonworks Hadoop, HDFS, Hive, Pig, MapReduce, Oozie, Flume, Sqoop, Kafka, Impala, Informatica Big Data, Tableau, Teradata, UNIX, Shell Scripting, Kerberos security, IBM DB2, Oracle
Confidential - Topeka, MO
Big Data/ Hadoop Consultant
Responsibilities:
- Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
- Responsible for building scalable, distributed data solutions using Hadoop.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from DB2/Oracle into HDFS using Sqoop.
- Created Hive external tables to store the Pig script output, and worked on them for data analysis in order to meet business requirements.
- Worked with application teams to install operating systems, Hadoop updates, patches, version upgrades as required.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats (see the mapper sketch after this list).
- Optimized the existing Hive and Pig scripts.
- Developed MapReduce pipeline jobs to process the data and create the HFiles to be bulk-loaded into HBase for faster access without a performance hit.
- Actively participated in weekly team meetings with technical teams to review the code.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in moving the log files generated from various sources to HDFS for further processing through Flume.
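As a sketch of the Java MapReduce cleansing work described above, the following mapper drops malformed CSV records and trims each field; the expected field count and output layout are hypothetical.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Drops malformed CSV rows and trims whitespace from each field. */
public class CsvCleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    private static final int EXPECTED_FIELDS = 5; // hypothetical schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleanse", "malformed").increment(1);
            return; // skip malformed records
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(sb.toString()));
    }
}
```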
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java, SQL, Cloudera Manager, Sqoop, Flume, Oozie, Eclipse, Oracle, DB2, MySQL, Unix Shell, HBase
Confidential - Dallas, TX
Java/Hadoop Engineer
Responsibilities:
- Developed MapReduce jobs in Java for data cleansing and pre-processing.
- Moved data from Oracle to HDFS and vice versa using Sqoop.
- Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
- Used Sqoop to efficiently transfer data between databases and HDFS and used Flume to stream the log data from servers.
- Developed Map Reduce programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Created internal and external Hive tables as per requirements, defined with appropriate static and dynamic partitions for efficiency.
- Implemented partitioning (static and dynamic) and bucketing in Hive for better organization of the data.
- Worked with different file formats and compression techniques to determine standards.
- Developed Hive queries and UDFs to analyze and transform the data in HDFS.
- Developed Hive scripts for implementing control tables logic in HDFS.
- Developed Pig scripts and UDFs per the business logic (see the Pig UDF sketch after this list).
- Analyzed and transformed data with Hive and Pig.
- Developed Oozie workflows, scheduled through a scheduler on a monthly basis.
- Designed and developed read lock capability in HDFS.
- Implemented a Hadoop float type equivalent to the Oracle Decimal type.
- Involved in end-to-end implementation of the ETL logic.
- Coordinated effectively with the offshore team and managed project deliverables on time.
- Worked on QA support, test data creation, and unit testing activities.
- Monitored and debugged Hadoop jobs and applications running in production.
- Provided user support and application support on the Hadoop infrastructure.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Evaluated and compared different tools for test data management with Hadoop.
- Helped and directed the testing team to get up to speed on Hadoop application testing.
- Installed a 20-node UAT Hadoop cluster.
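A minimal sketch of a Pig EvalFunc UDF in Java, of the kind described above; the class name and upper-casing logic are hypothetical examples.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/** Hypothetical example UDF: upper-cases the first field of the input tuple. */
public class UpperCase extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // pass nulls through, Pig-style
        }
        return input.get(0).toString().toUpperCase();
    }
}
```

In a Pig script, a UDF like this would be made available with REGISTER (for the jar) and DEFINE before being invoked on a relation's fields.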
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, HDFS, Oozie, Cassandra, Hue, Java, Eclipse, VSS, Red Hat Linux.
Confidential, TN
Java/J2EE Developer
Responsibilities:
- Developed the application using the Struts framework, which leverages the classical Model-View-Controller (MVC) architecture; UML diagrams such as use cases, class diagrams, interaction diagrams (sequence and collaboration), and activity diagrams were used.
- Worked in an Agile work environment with Content Management system for workflow management and content versioning.
- Involved in designing user screens and validations using HTML, jQuery, Ext JS and JSP as per user requirements.
- Responsible for validation of Client interface JSP pages using Struts form validations.
- Integrated Struts with Spring IoC.
- Used Spring dependency injection to provide loose coupling between layers.
- Used Hibernate ORM framework with Spring framework for data persistence and transaction management.
- Used the Hibernate 3.0 object-relational mapping framework to persist and retrieve data from the database (see the DAO sketch after this list).
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Implemented the logging mechanism using Log4j framework.
- Wrote test cases in JUnit for unit testing of classes.
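A minimal sketch of the Hibernate 3-style persistence code described above; CustomerDao and Customer are hypothetical names, and the session factory bootstrap assumes a hibernate.cfg.xml and a Customer mapping file on the classpath.

```java
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.Configuration;

public class CustomerDao {
    // Reads hibernate.cfg.xml from the classpath (Hibernate 3 style bootstrap).
    private static final SessionFactory FACTORY =
            new Configuration().configure().buildSessionFactory();

    /** Persists a Customer (a hypothetical POJO mapped via Customer.hbm.xml). */
    public void save(Customer customer) {
        Session session = FACTORY.openSession();
        Transaction tx = session.beginTransaction();
        try {
            session.save(customer);
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback(); // roll back on any failure, then rethrow
            throw e;
        } finally {
            session.close();
        }
    }
}
```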
Environment: JDK 1.5, J2EE 1.4, Agile Development Process, Struts 1.3, Spring 2.0, Web Services (JAX-WS, Axis 2), Hibernate 3.0, RSA, JMS, JSP, Servlets 2.5, WebSphere 6.1, SQL Server 2005, Windows XP, HTML, XML, XSLT, XSD, IBM Rational Application Developer (RAD), ANT 1.6, Log4J, jQuery, JavaScript, Ext JS, JUnit 3.8, SVN.
Confidential, Fremont, CA
Java Developer
Responsibilities:
- Analysis, design, project planning, effort estimation, and development of the FTM application based on MVC using the Struts framework and server-side J2EE technologies.
- Part of the core agile team in developing the application in Agile Development Methodology.
- Involved in mentoring team in technical discussions and Technical review of Design Documents.
- Hands-on code development using core Java, Servlets, and the Hibernate framework APIs.
- Used Hibernate to develop persistent classes following ORM principles.
- Developed Hibernate configuration files for establishing database connections, and Hibernate mapping files based on POJO classes.
- Developed JUnit test cases and system test cases for all the developed modules and classes, and used JMeter for performance testing (see the test sketch after this list).
- Used SVN for source control.
- Used Maven for product lifecycle management.
- Involved in code reviews and verifying bug analysis reports.
- Created PL/SQL stored procedures, functions, and triggers for the Oracle 11g database.
- Used Eclipse Juno as the IDE and Tomcat 6.0/7.0 as the application server.
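A minimal sketch of a JUnit 3.8-style unit test matching the testing work described above; PriceCalculator and its behavior are hypothetical.

```java
import junit.framework.TestCase;

/** Hypothetical class under test. */
class PriceCalculator {
    double applyDiscount(double price, double rate) {
        return price * (1.0 - rate);
    }
}

/** JUnit 3.8-style test case: test methods are discovered by the "test" name prefix. */
public class PriceCalculatorTest extends TestCase {
    public void testDiscountIsApplied() {
        PriceCalculator calc = new PriceCalculator();
        assertEquals(90.0, calc.applyDiscount(100.0, 0.10), 0.001);
    }
}
```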
Environment: Java, J2EE 1.5, Struts 1.3, Hibernate 3.0, JSP, Servlets, XML, Tomcat, JDBC, Oracle SQL Developer, Oracle, jQuery
