Hadoop Developer Resume
San Jose, California
SUMMARY:
- Over 8 years of professional IT experience in the analysis, design, administration, development, deployment, and maintenance of critical software and Big Data applications, working as both a developer and an administrator.
- Hands-on experience in developing and deploying enterprise applications using major Hadoop ecosystem components such as MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark Streaming, Spark SQL, Storm, Kafka, Oozie, and Cassandra.
- Hands-on experience using the MapReduce programming model for batch processing of data stored in HDFS.
- Exposure to administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig.
- Installed and configured multiple Hadoop clusters of different sizes and with ecosystem components like Pig, Hive, Sqoop, Flume, HBase, Oozie and Zookeeper.
- Worked on the major Hadoop distributions, Cloudera and Hortonworks.
- Responsible for designing and building a Data Lake using Hadoop and its ecosystem components.
- Very good experience with the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
- Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked with Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Experienced with Apache Spark for implementing advanced procedures such as text analytics, using its in-memory computing capabilities from Scala.
- Experience in installing, configuring, managing, supporting, and monitoring Hadoop clusters using the Apache and Cloudera distributions.
- Experience with middleware architectures built on Sun Java technologies such as J2EE and Servlets, and application servers such as WebSphere and WebLogic.
- Used different Spark modules, including Spark Core, RDDs, DataFrames, and Spark SQL.
- Converted various Hive queries into the required Spark transformations and actions (see the sketch following this summary).
- Experience working on the open-source Apache Hadoop distribution with technologies including HDFS, MapReduce, Python, Pig, Hive, Hue, HBase, Sqoop, Oozie, ZooKeeper, Spark, Spark Streaming, Storm, Kafka, Cassandra, Impala, Snappy, Greenplum, MongoDB, and Mesos.
- In-depth knowledge of Scala and experience building Spark applications with it.
- Good experience working with Tableau and Spotfire, including enabling JDBC/ODBC connectivity from those tools to Hive tables.
- Designed neat and insightful dashboards in Tableau.
- Adequate knowledge of Scrum, Agile and Waterfall methodologies.
- Designed and developed multiple J2EE Model 2 (MVC) web applications.
- Worked with various tools and IDEs such as Eclipse, IBM Rational, Apache Ant, MS Office, PL/SQL Developer, and SQL*Plus.
- Highly motivated, able to work independently or as an integral part of a team, and committed to the highest professional standards.
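A minimal sketch of the kind of Hive-to-Spark conversion referenced above, assuming a Spark 2.x SparkSession with Hive support (on Spark 1.6 the equivalent entry point is HiveContext). The table and column names (sales, region, amount) are illustrative placeholders, not from an actual project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark")
      .enableHiveSupport()          // read existing Hive tables via the metastore
      .getOrCreate()

    // Original HiveQL:
    //   SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    val viaSql = spark.sql(
      "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")

    // Equivalent DataFrame transformations plus an action:
    val viaDf = spark.table("sales")
      .groupBy("region")
      .agg(sum("amount").as("total"))

    viaDf.show(20)                  // the action triggers execution
    spark.stop()
  }
}
```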
PROFESSIONAL EXPERIENCE:
Confidential, San Jose, California
Hadoop Developer
Responsibilities:
- Worked on developing the architecture document and the accompanying guidelines.
- Analyzed the requirements for setting up the cluster.
- Worked on analyzing Hadoop cluster and different big data analytic tools including MapReduce, Hive and Spark.
- Involved in loading data from the Linux file system, servers, and Java web services using Kafka producers and partitions.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions.
- Implemented Storm topologies to pre-process data before moving it into HDFS.
- Implemented Kafka high-level consumers to pull data from Kafka partitions and move it into HDFS.
- Implemented a POC to migrate MapReduce programs to Spark transformations using Spark and Scala.
- Migrated complex MapReduce programs into Spark RDD transformations and actions (see the RDD sketch after this list).
- Implemented Spark RDD transformations to map business logic and applied actions on top of those transformations.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables (see the partitioned-table sketch after this list).
- Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce, Hive, and Pig.
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
- Experienced in developing custom input formats and data types to parse and process unstructured and semi-structured input data, mapping it into key-value pairs to implement business logic in MapReduce.
- Involved in using HCatalog to access Hive table metadata from MapReduce and Pig code.
- Experience implementing custom Flume serializers, interceptors, sources, and sinks as required to ingest data from multiple sources.
- Experience setting up fan-out workflows in Flume, designing a V-shaped architecture to take data from many sources and ingest it into a single sink.
- Developed shell, Perl, and Python scripts to automate and provide control flow for Pig scripts.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Evaluated usage of Oozie for Workflow Orchestration.
- Converted unstructured data to structured data by writing Spark code.
- Indexed documents using Apache Solr.
- Set up SolrCloud for distributed indexing and search.
- Automated all jobs, from pulling data from sources such as MySQL and pushing the result sets to HDFS, through running MapReduce, Pig, and Hive jobs, using Kettle and Oozie for workflow management.
- Worked on NoSQL databases such as Cassandra and MongoDB for POC purposes, storing images and URIs.
- Integrated bulk data into the Cassandra file system using MapReduce programs.
- Used the Talend ETL tool to develop multiple jobs and set up workflows.
- Created Talend jobs to copy files from one server to another, utilizing the Talend FTP components.
- Worked on MongoDB for distributed storage and processing.
- Designed and implemented Cassandra and associated RESTful web service.
- Implemented row-level updates and real-time analytics on Cassandra data using CQL.
- Used Cassandra CQL with the Java APIs to retrieve data from Cassandra tables.
- Worked on analyzing and examining customer behavioral data using Cassandra.
- Created partitioned tables in Hive and mentored the analyst and SQA teams in writing Hive queries.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in cluster setup, monitoring, test benchmarks for results.
- Involved in building and deploying applications using Maven, integrated with the Jenkins CI/CD server.
- Involved in Agile methodologies, daily scrum meetings, and sprint planning. Worked on a 350-node Hadoop cluster on Cloudera distribution 7.7.0.
- Tested loading the raw data, populating the staging tables, and storing the refined data in partitioned tables in the EDW.
- Installed a Kerberos-secured Kafka cluster (without encryption) for a POC and also set up Kafka ACLs.
- Experience integrating Solr with HBase using the Lily HBase Indexer (Key-Value Indexer).
- Created a NoSQL solution for a legacy RDBMS using Kafka, Spark, Solr, and the HBase indexer for ingestion, with Solr and HBase for real-time querying.
- Involved in the installation of MapR and the upgrade from MapR 5.0 to MapR 5.2.
- Worked with both CDH4 and CDH5 applications. Transferred large volumes of data back and forth between the development and production clusters.
- Troubleshot MapReduce job execution issues by inspecting and reviewing log files.
- Worked extensively on Elasticsearch querying and indexing to retrieve documents at high speed.
- Installed, configured, and maintained several Hadoop clusters, including HDFS, YARN, Hive, HBase, Knox, Kafka, Oozie, Ranger, Atlas, Infra Solr, ZooKeeper, and NiFi, in Kerberized environments.
- Worked on setting up high availability for the major production cluster and designed automatic failover control using ZooKeeper and quorum journal nodes.
- Assisted in configuration, development and testing of Autosys JIL and other scripts.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Deployed dashboards for monitoring cluster nodes, using Graphite as the data source and collectd as the metric sender.
- Worked with the Navigator API to export denied-access events on the cluster to help prevent security threats.
- Worked with Hadoop tools like Flume, Hive, Sqoop and Oozie on the Hadoop cluster.
- Experience with the workflow scheduling and monitoring tools Rundeck and Control-M.
- Used Flume to collect, aggregate, and store web log data from different sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Participated in configuring and monitoring distributed, multi-platform servers using Puppet. Used the Puppet server and workstation to manage and configure nodes.
- Experience in managing virtual instances and disks using Puppet.
- Deployed Puppet and Puppet Dashboard for configuration management of existing infrastructure.
- Production experience in large environments using the configuration management tool Puppet, supporting 500+ servers and developing manifests.
- Implemented continuous integration webhooks and workflows around Jenkins to automate the dev-test-deploy cycle for the Puppet codebase.
- Set up the Puppet master and clients and wrote scripts to deploy applications to the Dev, QA, and production environments.
- Created a continuous integration system using Ant, Jenkins, and Puppet for faster, flawless deployments.
- Worked on the installation, configuration, and maintenance of Debian/Red Hat, CentOS, and SUSE servers at multiple data centers.
- Automated the installation, deployment and maintenance of Middleware Application Servers to RHEL Development and Test Environments.
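RDD sketch: a minimal, hypothetical example of the MapReduce-to-Spark migration pattern mentioned in this list (paths, record layout, and the key field are placeholders), showing how a mapper/reducer pair maps onto RDD transformations and an action.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MapReduceToSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mr-to-spark"))

    val raw = sc.textFile("hdfs:///data/raw/events/*")      // was: mapper input

    val counts = raw
      .map(_.split('\t'))                                   // parse raw record
      .filter(_.length > 2)                                 // drop malformed rows
      .map(fields => (fields(1), 1L))                       // was: mapper emit (key, 1)
      .reduceByKey(_ + _)                                   // was: reducer sum per key

    counts.saveAsTextFile("hdfs:///data/out/event_counts")  // action: write results
    sc.stop()
  }
}
```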
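Partitioned-table sketch: an illustrative example of creating a partitioned Hive table and loading pre-aggregated data into it from Spark. It assumes Hive support is enabled; the table, column, and partition names are placeholders, not the actual schema.

```scala
import org.apache.spark.sql.SparkSession

object LoadPartitionedHiveTable {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-partitioned-table")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql(
      """CREATE TABLE IF NOT EXISTS agg_events (
        |  user_id STRING,
        |  event_count BIGINT
        |) PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert of pre-aggregated data from a staging table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE agg_events PARTITION (event_date)
        |SELECT user_id, COUNT(*) AS event_count, event_date
        |FROM staging_events
        |GROUP BY user_id, event_date""".stripMargin)

    spark.stop()
  }
}
```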
Environment: Cloudera, Puppet, Hadoop 2.7.2, HDFS 2.7.2, AWS S3, AWS EC2, Spark SQL 1.6.1, Sqoop 1.4.6, Spark 1.6.3, Scala 2.12, MySQL, Shell Scripting, Java, GitHub.
Confidential, Philadelphia, PA
Hadoop Developer
Responsibilities:
- Created documentation processes and server diagrams, prepared server requisition documents, and uploaded them to SharePoint.
- Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
- Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Wrote Scala scripts to make Spark Streaming work with Kafka as part of the Spark-Kafka integration effort (see the streaming sketch after this list).
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau to visualize the generated data sets, and tested the native Drill, Impala, and Spark connectors.
- Implemented complex Hive UDFs to execute business logic within Hive queries (see the UDF sketch after this list).
- Responsible for bulk-loading data into HBase using MapReduce by directly creating HFiles and loading them.
- Developed different kinds of custom filters and handled predefined filters on HBase data using the API.
- Evaluated the performance of Spark vs. Impala vs. Drill on offline data as part of a POC.
- Worked on Solr configuration and customizations based on requirements.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Responsible for developing data pipeline by implementing Kafka producers and consumers and configuring brokers.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Developed unit tests for MapReduce programs using the MRUnit testing library.
- Experience in managing and reviewing Hadoop Log files.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Set up Spark on EMR to process huge volumes of data stored in Amazon S3.
- Developed Pig UDFs for manipulating the data per the business requirements and worked on developing custom Pig loaders.
- Used Gradle for building and testing the project.
- Fixed defects as needed during the QA phase, supported QA testing, troubleshot defects, and identified their root causes.
- Used Mingle and later moved to JIRA for task/bug tracking.
- Used Git for version control.
- Knowledgeable in documenting processes, creating server diagrams, and preparing server requisition documents.
- Created Kafka topics for different users.
- Installed ZooKeeper, brokers, Schema Registry, and Control Center on multiple machines.
- Set up ACL/SSL security for different users and assigned users to multiple topics.
- Developed security so that users can connect over SSL.
- Assigned access to users across multiple user logins.
- Set up the Hortonworks infrastructure, from configuring clusters down to individual nodes.
- Managed the cluster configuration to meet the needs of the analysis, whether I/O-bound or CPU-bound.
- Worked on setting up high availability for the major production cluster. Performed Hadoop version updates using automation tools.
- Worked on setting up a 100-node production cluster and a 40-node backup cluster at two different data centers.
- Automated the setup of Hadoop clusters and the creation of nodes.
- Monitored and maintained improvements in CPU utilization.
- Performance-tuned and managed growth of the OS, disk usage, and network traffic.
- Involved in loading data from LINUX file system to HDFS.
- Performed architecture design, data modeling, and implementation of the Big Data platform and analytics applications for the consumer products.
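Streaming sketch: an illustrative example of Spark Streaming consuming from Kafka, written against the spark-streaming-kafka-0-10 integration (an assumed version). Broker addresses, topic name, group id, and output path are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-stream"), Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092,broker2:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumers",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clickstream"), kafkaParams))

    // Persist each micro-batch of message values to HDFS.
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/clickstream/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```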
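UDF sketch: a simple Hive UDF against the classic org.apache.hadoop.hive.ql.exec.UDF API, written in Scala via Java interop. The function name and masking rule are illustrative, not the actual business logic.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

class MaskEmail extends UDF {
  // Hive resolves evaluate() by reflection; null-safe handling is required.
  def evaluate(input: Text): Text =
    if (input == null) null
    else new Text(input.toString.replaceAll("(^[^@]{2})[^@]*(@.*$)", "$1***$2"))
}

// Registration from the Hive CLI / Beeline (shown as comments):
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
//   SELECT mask_email(email) FROM customers LIMIT 10;
```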
Environment: Cloudera, Hadoop, AWS, Sqoop, Oozie, Docker, Kafka, Spark, Scala, HBase, ZooKeeper, MySQL, Tableau, Shell Scripting, Java.
Confidential, New York, New York
Hadoop Developer
Responsibilities:
- Worked on analyzing the Hadoop cluster and different Big Data analytics tools, including Pig, Hive, the HBase database, and Sqoop.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Developed simple to complex MapReduce streaming jobs in Java for processing and validating the data.
- Developed data pipeline using MapReduce, Flume, Sqoop and Pig to ingest customer behavioral data into HDFS for analysis.
- Developed MapReduce and Spark jobs to discover trends in data usage by users.
- Implemented Spark using Python and Spark SQL for faster processing of data.
- Implemented algorithms for real time analysis in Spark.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
- Used the Spark-Cassandra Connector to load data to and from Cassandra (see the connector sketch after this list).
- Streamed data in real time using Spark with Kafka.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed product profiles using Pig and commodity UDFs.
- Developed Hive scripts in HiveQL to De-Normalize and Aggregate the data.
- Created HBase tables and column families to store the user event data.
- Wrote automated HBase test cases for data quality checks using the HBase command-line tools.
- Created UDFs to store specialized data structures in HBase and Cassandra.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Used Impala to read, write and query the Hadoop data in HDFS from HBase or Cassandra.
- Used Tez framework for building high performance jobs in Pig and Hive.
- Configured Kafka to read and write messages from external programs.
- Configured Kafka to handle real time data.
- Developed end-to-end data processing pipelines, from receiving data through the distributed messaging system Kafka to persisting the data into HBase.
- Wrote a Storm topology to emit data into the Cassandra database.
- Wrote a Storm topology to accept data from a Kafka producer and process the data.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Used the JUnit framework to perform unit testing of the application.
- Developed interactive shell scripts for scheduling various data cleansing and data loading process.
- Performed data validation on the data ingested using MapReduce by building a custom model to filter all the invalid data and cleanse the data.
- Experience with data wrangling and creating workable datasets.
- Experienced in managing and reviewing the Hadoop log files.
- Used Pig as an ETL tool to perform transformations, event joins, and some pre-aggregations before storing the data in HDFS.
- Worked hands-on with the ETL process; handled importing data from various data sources and performed transformations.
- Expertise in the design and deployment of Hadoop clusters and different Big Data analytics tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
- Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data. Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Worked on tuning Hive and Pig to improve performance and resolve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Imported data from different sources such as HDFS and HBase into Spark RDDs (see the HBase sketch after this list).
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Performed real time analysis on the incoming data.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
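Connector sketch: an illustrative example of reading from and writing to Cassandra with the DataStax Spark-Cassandra Connector. Keyspace, table, and column names are placeholders, and the connector dependency/version is an assumption.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-round-trip")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)

    // Read a Cassandra table as an RDD of (user_id, event_count) pairs.
    val events = sc.cassandraTable[(String, Long)]("analytics", "daily_counts")
      .select("user_id", "event_count")

    // Transform and write the result back to another table.
    events
      .map { case (user, count) => (user, count * 2) }
      .saveToCassandra("analytics", "adjusted_counts",
        SomeColumns("user_id", "event_count"))

    sc.stop()
  }
}
```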
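HBase sketch: an illustrative example of loading data from HDFS and from an HBase table into Spark RDDs using the MapReduce TableInputFormat. Table, column family, and paths are placeholders.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-to-rdd"))

    // Plain HDFS input.
    val hdfsRdd = sc.textFile("hdfs:///data/raw/clicks/*")

    // HBase input via the MapReduce TableInputFormat.
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "user_events")
    val hbaseRdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // Pull one column (cf:count) out of each HBase row.
    val counts = hbaseRdd.map { case (rowKey, result) =>
      val key = Bytes.toString(rowKey.get())
      val value = Option(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("count")))
        .map(Bytes.toString).getOrElse("0")
      (key, value.toLong)
    }

    println(s"hdfs lines: ${hdfsRdd.count()}, hbase rows: ${counts.count()}")
    sc.stop()
  }
}
```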
Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Solr, Storm, Tez, Impala, Mahout, Cassandra, Cloudera Manager, MySQL, multi-node cluster with Linux (Ubuntu), Windows, Unix.
Confidential, New York, New York
Selenium Tester
Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure and setting up infrastructure
- Analyzed technical and functional requirements documents, and designed and developed the QA test plan, test cases, and test scenarios, maintaining the end-to-end process flow.
- Developed testing script for internal brokerage application that is utilized by branch and financial market representatives to recommend and manage customer portfolios; including international and capital markets.
- Designed and developed smoke and regression automation scripts and a functional test automation framework for all modules using Selenium WebDriver.
- Created Data Driven scripts for adding multiple customers, checking online accounts, user interfaces validations, and reports validations.
- Performed cross verification of trade entry between mainframe system, its web application and downstream system.
- Extensively used the Selenium WebDriver API (XPath and CSS locators) to test the web application (see the locator sketch after this list).
- Configured Selenium WebDriver, TestNG, Maven, Cucumber, and a BDD framework, and created Selenium automation scripts in Java using TestNG.
- Performed Data-Driven testing by developing Java based library to read test data from Excel & Properties files.
- Extensively performed DB2 database testing to validate trade entries from the mainframe to the backend system. Developed a data-driven framework with Java, Selenium WebDriver, and Apache POI, used for multiple trade order entries.
- Developed an internal application using AngularJS and Node.js, connecting to Oracle on the backend.
- Expertise in debugging issues in the front end of the web application, which was developed using HTML5, CSS3, AngularJS, Node.js, and Java.
- Developed a smoke automation test suite for the regression test suite.
- Applied various testing techniques in test cases to cover all business scenarios for quality coverage.
- Interacted with the development team to understand the design flow, review code, and discuss unit test plans.
- Executed tests for system, integration, and regression testing in the testing environment.
- Conducted defect triage meetings and defect root-cause analysis, tracked defects in HP ALM Quality Center, managed defects by following up on open items, and retested defects with regression testing.
- Provided QA/UAT sign-off after closely reviewing all test cases in Quality Center, along with receiving the policy sign-off for the project.
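Locator sketch: the original scripts were written in Java with TestNG; this minimal sketch exercises the same Selenium WebDriver locator calls from Scala for consistency with the other examples. The URL, locators, and credentials are placeholders, and it assumes chromedriver is on the PATH.

```scala
import org.openqa.selenium.By
import org.openqa.selenium.chrome.ChromeDriver

object LoginSmokeCheck {
  def main(args: Array[String]): Unit = {
    val driver = new ChromeDriver()       // assumes a local chromedriver installation
    try {
      driver.get("https://example.com/login")

      // CSS locators for the credential fields, XPath locator for the submit button.
      driver.findElement(By.cssSelector("input#username")).sendKeys("test-user")
      driver.findElement(By.cssSelector("input#password")).sendKeys("secret")
      driver.findElement(By.xpath("//button[@type='submit']")).click()

      // Simple smoke assertion on the landing page title.
      assert(driver.getTitle.contains("Dashboard"), "login did not reach the dashboard")
    } finally {
      driver.quit()
    }
  }
}
```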
Environment: HP ALM, Selenium WebDriver, JUnit, Cucumber, AngularJS, Node.js, Jenkins, GitHub, Windows, UNIX, Agile, MS SQL, IBM DB2, PuTTY, WinSCP, FTP Server, Notepad++, C#, DB Visualizer.
Confidential, New York, New York.
Software Test Engineer
Responsibilities:
- Involved in preparing Test Plan and Test cases.
- Developed test cases for automation team for regression testing.
- Formulated methods to perform positive and negative testing against requirements.
- Performed backend testing using SQL queries.
- Reported bugs found during test using Quality Center.
- Conducted functional, regression, black box and system testing.
- Reviewed functional design for internal product documentation.
- Used Quality Center for requirements management, planning, scheduling, running tests, defect tracking, and managing defects.
- Analyzed, tested, and certified application-specific software and performed ambiguity reviews of business requirements and functional specification documents.
- Developed Manual Test cases and test scripts to test the functionality of the application.
- Provided test results, graphs, and analysis of application performance data by email or phone during testing to the application developer and manager.
- Implemented automated testing methodologies such as data-driven and keyword-driven testing.
- Created and executed regression scripts using Quick Test Professional.
- Inserted various checkpoints, parameterized the test scripts, and applied regular expressions in the scripts.
- Documented test bugs in Quality Center.