Sr. Hadoop/Spark Developer Resume
New Hartford, NY
SUMMARY
- Around 9 years of professional experience in the analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Hadoop technologies.
- Hands-on experience with various Hadoop distributions (Apache, Hortonworks, Cloudera, MapR).
- Experience working with Amazon EMR, Cloudera (CDH3, CDH4 and CDH5) and Hortonworks Hadoop distributions.
- Expertise in Hadoop ecosystem tools including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Kafka, Spark, ZooKeeper and Oozie.
- Good knowledge of Amazon EMR (Elastic MapReduce) for performing big data operations in AWS.
- Knowledge of Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Excellent understanding of Spark and its benefits in Big Data Analytics.
- Hands-on experience with stream processing frameworks such as Storm and Spark Streaming.
- Experience designing and developing a POC in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Hands-on experience using Scala with Spark Streaming and Spark batch processing to process streaming and batch data.
- Working knowledge of other programming languages such as C and C++, and markup languages such as XML and HTML5.
- Participated in all Business Intelligence activities related to data warehousing, ETL and report development methodology.
- Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Experience in data analysis using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Experience working with Cloudera, Hortonworks, MapR and Apache distributions.
- Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
- Experienced working with Hadoop big data technologies (HDFS and MapReduce programs), the Hadoop ecosystem (HBase, Hive, Pig) and the NoSQL database MongoDB.
- Experience querying and analyzing data in Cassandra through CQL for quick searching, sorting and grouping.
- Implemented data quality in the ETL tool Talend, with good knowledge of data warehousing and ETL tools such as IBM DataStage, Informatica and Talend.
- Scraped and analyzed data using Machine Learning algorithms in Python and SQL.
- Experience using the DataStax Spark Cassandra Connector to load data to and from Cassandra.
- Experience writing applications that use NoSQL databases such as HBase, Cassandra and MongoDB.
- Extensive experience importing and exporting data using Flume and Kafka.
- Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
- Expertise in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Good working experience with Hadoop architecture, HDFS, MapReduce and other components of the Cloudera Hadoop ecosystem.
- Experience developing data pipelines that use Kafka to store data in HDFS.
- Experience migrating data with Sqoop from HDFS to relational database systems and vice versa according to client requirements.
- Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
- Worked with NiFi to manage the flow of data from source to HDFS.
- Good experience with source control repositories such as CVS, Git and SVN.
- Experience with scripting technologies such as Python and UNIX shell scripts.
- Excellent Java development skills using J2EE, J2SE, Servlets, JSP, EJB, JDBC.
- Experience working with the Spring and Hibernate frameworks in Java.
- Experience developing user interfaces using HTML, JSP and Java Swing.
- Involved in ingesting data from various databases such as Teradata (Sales Data Warehouse), AS400, DB2 and SQL Server using Sqoop.
- Used Spring Core annotations for dependency injection (Spring DI), Spring MVC for REST APIs and Spring Boot for microservices.
- Good understanding of and working experience with cloud-based architectures.
- Experience handling various file formats such as Avro, Parquet and SequenceFile.
- Expertise in implementing enterprise, web and client-server applications using Java/J2EE.
- Expertise in Oracle ORMB and stored procedures.
- Good understanding of and experience with software development methodologies such as Agile and Waterfall; performed unit, regression, white-box and black-box testing.
- Able to work with onsite and offshore team members.
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, HBase, Cassandra, MongoDB, Spark, Solr, Ambari, Hue, Avro, Mahout, Impala, Oozie, NiFi and ZooKeeper
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Database: MySQL, Oracle 10g/11g, PL/SQL, MS SQL Server 2012
NoSQL Databases: HBase, Cassandra and MongoDB
Programming Languages: C, C++, Java, JavaScript, Python, Scala
Frameworks: Struts, Spring, Hibernate, Spring Boot, Microservices
Operating Systems: Windows 7/8/10, Vista, Ubuntu, Linux, UNIX, Mac OS
Cloud Platforms: AWS Cloud, Google Cloud
Application Servers: WebLogic, WebSphere, Tomcat
Architecture: Client-Server Architecture, Relational DBMS, OLAP, OLTP
Testing: Selenium WebDriver, JUnit
Modelling Tools: Visual Paradigm for UML, Rational Rose, StarUML
ETL Tools: Talend, Informatica, Tableau
IDE Tools: NetBeans, Eclipse, IntelliJ, Visual Studio Code
Build Tools: Maven, Jenkins
Development Methodologies: Waterfall, Agile/Scrum
PROFESSIONAL EXPERIENCE
Confidential, New Hartford, NY
Sr. Hadoop/Spark Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured and unstructured data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Expertise in implementing Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Built on-premise data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau to visualize the data sets created, and tested native Drill, Impala and Spark connectors.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; created Hive tables, loaded them with data and wrote Hive queries that run as MapReduce jobs in the backend.
- Analyzed the SQL scripts and designed a solution to implement them in Scala.
- Implemented complex Hive UDFs to execute business logic within Hive queries.
- Responsible for bulk loading data into HBase using MapReduce by directly creating HFiles and loading them.
- Evaluated the performance of Spark SQL vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume, staging the data in HDFS for further analysis.
- Utilized the Apache Hadoop environment from Cloudera.
- Responsible for developing data pipelines by implementing Kafka producers and consumers.
- Performed data analysis with HBase using Apache Phoenix.
- Exported the analyzed data to Impala to generate reports for the BI team.
- Developed multiple Spark jobs in PySpark for data cleaning and pre-processing.
- Managed and reviewed Hadoop log files to resolve configuration issues.
- Developed a program to extract named entities from OCR files.
- Used Gradle for building and testing the project.
- Fixed defects as needed during the QA phase; supported QA testing, troubleshot defects and identified their sources.
- Participated in development/implementation of Cloudera Hadoop environment.
- Used Mingle and later moved to JIRA for task/bug tracking.
- Used Git for version control.
Environment: MapR, Cloudera, Hadoop, HDFS, AWS, Pig, Hive, Impala, Drill, Spark SQL, OCR, MapReduce, Flume, Sqoop, Oozie, Storm, Zeppelin, Mesos, Docker, Solr, Kafka, MapR-DB, Spark, Scala, HBase, ZooKeeper, Tableau, Shell Scripting, Gerrit, Java, Redis.
Confidential, Kenilworth, NJ
Hadoop/Spark Developer
Responsibilities:
- Researched and recommended a suitable technology stack for Hadoop migration, considering the current enterprise architecture.
- Pulled data from the data warehouse using Sqoop and placed it in HDFS.
- Extensively used the Spark stack, including the RDD, Dataset and DataFrame APIs, to develop pre-processing jobs that transform the data for upstream consumption.
- Developed real-time data processing applications using Scala and Python, and implemented Apache Spark Streaming from various streaming sources such as Kafka, Flume and JMS.
- Extracted and enriched HBase data across multiple tables using joins in Spark.
- Wrote APIs to load the processed data into HBase tables.
- Replaced the existing MapReduce programs with Spark applications written in Scala.
- Built on-premise data pipelines using Kafka and Spark Streaming, fed by the API streaming gateway REST service.
- Developed Hive UDFs to handle data quality and create filtered datasets for further processing.
- Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
- Experienced in writing Sqoop scripts to import data into Hive/HDFS from RDBMS.
- Good knowledge of the Kafka Streams API for data transformation.
- Implemented a logging framework based on the ELK stack (Elasticsearch, Logstash and Kibana) on AWS.
- Set up Spark on Amazon EMR to process large volumes of data stored in Amazon S3.
- Developed Oozie workflow for scheduling & orchestrating the ETL process.
- Used Talend to create workflows for processing data from multiple source systems.
- Created sample flows in Talend and StreamSets with custom-coded JARs, and analyzed the performance of StreamSets and Kafka Streams.
- Developed Hive Queries to analyze the data in HDFS to identify issues and behavioural patterns.
- Involved in writing optimized Pig scripts and in developing and testing Pig Latin scripts.
- Used the Python pandas and NumPy modules for data analysis, data scraping and parsing.
- Deployed applications using Jenkins integrated with Git version control.
- Participated in production support on a regular basis to support the Analytics platform
- Used Rally for task/bug tracking.
- Used Git for version control.
Environment: MapR, Hadoop, HBase, HDFS, AWS, Pig, Hive, Drill, Spark SQL, MapReduce, Spark Streaming, Kafka, Flume, Sqoop, Oozie, Jupyter Notebook, Docker, Spark, Scala, Talend, Shell Scripting, Java.
Confidential, Dublin, OH
Hadoop Developer
Responsibilities:
- Loaded data from different data sources (Teradata, DB2, Oracle and flat files) into HDFS using Sqoop and then into partitioned Hive tables.
- Created Pig scripts and wrapped them as shell commands to provide aliases for common operations in the project business flow.
- Implemented various Hive queries for analysis and called them from a Java client engine to run on different nodes.
- Worked on NiFi to automate the data movement between different Hadoop systems.
- Extracted data from flat files and other RDBMS databases into a staging area and populated the data warehouse.
- Created several Hive UDFs to hide or abstract complex, repetitive rules.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Involved in End to End implementation of ETL logic.
- Reviewed ETL application use cases before onboarding them to Hadoop.
- Developed Bash scripts to fetch log files from an FTP server and process them for loading into Hive tables.
- Designed and implemented custom NiFi processors to react to and process data in the pipeline.
- Scheduled all the Bash scripts using the Resource Manager scheduler.
- Moved data from HDFS to Cassandra using MapReduce and the BulkOutputFormat class.
- Developed MapReduce programs for applying business rules to the data.
- Implemented Apache Kafka as a replacement for a more traditional message broker (JMS Solace) to reduce licensing costs, decouple processing from data producers and buffer unprocessed messages.
- Created NiFi flows to trigger Spark jobs and used PutEmail processors to send notifications on any failures.
- Designed a data warehouse using Hive, created and managed Hive tables in Hadoop.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Implemented the receiver-based approach in Spark Streaming, linking with the StreamingContext through the Java API and handling proper closing and waiting for stages.
- Maintained the authentication module to support Kerberos.
- Implemented rack topology scripts for the Hadoop cluster.
- Resolved issues related to the old Hazelcast EntryProcessor API.
- Worked with the admin team on designing and performing the upgrade from CDH 3 to HDP 4.
- Developed helper classes that abstract the Cassandra cluster connection and act as a core toolkit.
- Enhanced an existing module written as Python scripts.
- Used dashboard tools like Tableau.
Environment: Hadoop, Linux, MapReduce, HDFS, HBase, Hive, Pig, Tableau, NoSQL, Shell Scripting, Sqoop, Java, Eclipse, NiFi, Oracle 10g, Maven, open source technologies (Apache Kafka, Apache Spark), ETL, Hazelcast, Git, Mockito, Python.
Confidential
Java Developer
Responsibilities:
- Involved in Analysis, Design, Development and Testing of the application.
- Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
- Enhanced the Port search functionality by adding a VPN Extension Tab.
- Created end-to-end functionality for viewing and editing VPN Extension details.
- Used the Agile process to develop the application, as it allows faster development than RUP.
- Designed UI screens using JSP, jQuery, Ajax and HTML.
- Used Hibernate as the persistence framework.
- Used Struts MVC framework and WebLogic Application Server in this application.
- Involved in creating DAOs and used Hibernate for ORM mapping.
- Built REST API endpoints for various concepts.
- Wrote procedures and triggers to validate the consistency of metadata.
- Wrote SQL code blocks using cursors to shift records from various tables based on checks.
- Wrote Java classes to test the UI and web services through JUnit and JWebUnit.
- Performed functional and integration testing.
- Extensively involved in release/deployment related critical activities.
- Tested the entire application using JUnit and JWebUnit.
- Used Log4J to log both user interface and domain-level messages.
- Used Perforce for version control.
Environment: JAVA, JSP, Servlets, J2EE, EJB, Struts Framework, JDBC, WebLogic Application Server, Hibernate, Oracle 9i, UNIX, Web Services, CVS, Eclipse, Rational Rose, JUnit, JWebUnit.
Confidential
Java Developer
Responsibilities:
- Involved in analysis, design and coding in a Java/JSP front-end environment.
- As feature owner, responsible for developing use cases, class diagrams and sequence diagrams for the modules using UML and Rational Rose Enterprise Edition.
- Designed the dynamic stress reporting in C++.
- Developed application using Spring, Servlets, JSP and EJB.
- Implemented MVC (Model View Controller) architecture.
- Designed the Application flow using Rational Rose.
- Used web servers like Apache Tomcat.
- Implemented Application prototype using HTML, CSS and JavaScript.
- Developed the user interfaces with the Spring tag libraries.
- Developed build and deployment scripts using Apache Ant to customize WAR, EAR and EJB JAR files.
- Prepared field validation and scenario-based test cases using JUnit, and tested the module in three phases: unit testing, system testing and regression testing.
- Coded and unit tested according to client standards.
- Used Oracle Database for data storage and coded stored procedures, functions and triggers.
- Wrote DB queries using SQL for interacting with database.
- Designed and developed XML processing components for dynamic menus in the application.
- Created Components using JAVA, Spring and JNDI.
- Prepared Spring deployment descriptors using XML.
- Good knowledge of C++.
- Handled problem management during QA, implementation and post-production support.
- Developed a logging component using Apache Log4J to log messages and errors, and wrote JUnit test cases to verify the code under different conditions.
Environment: Java, HTML, Spring, JSP, Servlets, C++, BMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, SQL, PL/SQL, CSS.
