Senior Hadoop Developer Resume
Stamford, CT
SUMMARY
- Over 9 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
- More than 4 years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, Zookeeper, Sqoop, Flume, Oozie and AWS.
- Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
- Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, jQuery and AngularJS.
- Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
- Strong experience working with different Hadoop distributions like Cloudera, Hortonworks, MapR and Apache distributions.
- Experience in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH 5.X) distributions and on Amazon web services (AWS).
- Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
- In-depth understanding/knowledge of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop Gen2 Federation, High Availability and YARN, and a good understanding of workload management, scalability and distributed platform architectures.
- Good understanding of R Programming, Data Mining and Machine Learning techniques.
- Strong experience and knowledge of real time data analytics using Storm, Kafka, Flume and Spark.
- Good experience in general data analytics on distributed computing clusters like Hadoop using Apache Spark, Impala, and Scala.
- Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
- Experience in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement.
- Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
- Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
- Experience in extending Hive and Pig core functionality by using custom UDFs and UDAFs (a minimal UDF sketch follows this summary).
- Debugged MapReduce jobs using Counters and MRUnit testing.
- Expertise in writing real-time processing applications using spouts and bolts in Storm.
- Worked on setting up Apache NiFi and performing a POC with NiFi in orchestrating a data pipeline.
- Extensive experience working with Spark tools like RDD transformations, Spark MLlib and Spark SQL.
- Experienced in moving data from different sources using Kafka producers, consumers and pre-process data using Storm topologies.
- Experienced in migrating ETL transformations using Pig Latin scripts and join operations.
- Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
- Highly knowledgeable in streaming data from different data sources like log files, JMS and application sources into HDFS using Flume sources.
- Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
- Experience with testing MapReduce programs using MRUnit and JUnit.
- Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
- Involved in the incident management and change management processes. Heavily involved in a fully automated CI/CD pipeline process using GitHub, Jenkins and Puppet.
- Extensive experience in working with SOA-based architectures using REST-based web services with JAX-RS and SOAP-based web services with JAX-WS.
- Hands-on experience with build and deployment tools Maven, ANT, Gradle and Jenkins.
- Experience working with version control tools like SVN and Git revision control systems such as GitHub, using JIRA/Mingle to track issues and Crucible for code reviews.
- Worked on various tools and IDEs like Eclipse, IBM Rational, Visio, the Apache Ant build tool, MS Office, PL/SQL Developer and SQL*Plus.
- Experience in different application servers like JBoss/Tomcat, WebLogic, and IBM WebSphere.
- Experience in working with Onsite-Offshore model.
- Implemented the ELK stack (Elasticsearch, Logstash & Kibana) logging framework on AWS.
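A minimal sketch of the kind of custom Hive UDF referenced above, written in Scala against Hive's classic UDF API; the class name, function name and normalization logic are illustrative assumptions, not taken from a specific project.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Illustrative UDF: normalizes free-text codes before they are joined in Hive.
    // Registered in Hive with (names are placeholders):
    //   ADD JAR normalize-udf.jar;
    //   CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode';
    class NormalizeCode extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toUpperCase)
      }
    }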
TECHNICAL SKILLS
Programming Languages: Java, Python, Scala, Shell Scripting, SQL, PL/SQL.
J2EE Technologies: Java, Spring, Servlets, SOAP/REST services, JSP, JDBC, XML, Hibernate.
Big Data Ecosystem: HBase, Hortonworks, MapReduce, Hive, Pig, Sqoop, Impala, Cassandra, Oozie, NiFi, Zookeeper, Flume, Ambari, Storm, Spark and Kafka.
Databases: NoSQL, Oracle 10g/11g/12c, SQL Server 2008/2008 R2/2012/2014/2016/2017, MySQL.
Database Tools: Oracle SQL Developer, MongoDB, TOAD and PLSQL Developer.
Cloud Tools: AWS, S3, EMR, EC2
Modeling Tools: UML on Rational Rose 4.0/7.5/7.6/8.1
Web Technologies: HTML5, JavaScript, XML, JSON, jQuery, Ajax, CSS3.
Servers & Tools: WebLogic, WebSphere, Apache Tomcat, Apache Cassandra, Eclipse, NetBeans, WinSCP.
Operating Systems: Windows, UNIX, Linux (Ubuntu, CentOS), Solaris, Windows Server 2003/2008/2012/2016.
Frameworks: MVC, Struts, Log4j, JUnit, Maven, ANT, Web Services.
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential, Stamford, CT
Responsibilities:
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Designed and implemented MapReduce based large-scale parallel relation-learning system.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Installed and configured a multi-node, fully distributed Hadoop cluster.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Built Spark Scripts by utilizing Scala shell commands depending on the requirement.
- Worked with a Senior Engineer on configuring Kafka for streaming data.
- Imported data into HDFS from various SQL databases and files using Sqoop, and from streaming systems using Storm, into a Big Data Lake.
- Involved in scripting (Python and shell) to provision and spin up virtualized Hadoop clusters.
- Worked with NoSQL databases like HBase to create tables and store data.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Wrote Pig scripts to store the data into HBase.
- Collected all the logs from source systems into HDFS using Kafka and performed analytics on them.
- Exported the analyzed data to Teradata using Sqoop for visualization and to generate reports for the BI team.
- Extracted files from RDBMS through Sqoop, placed them in HDFS and processed them.
- Spark Streaming collects this data from Kafka in near real time and performs the necessary transformations and aggregations on the fly to build the common learner data model, persisting the data in a NoSQL store (HBase); see the streaming sketch at the end of this section.
- Installed and configured Hive and wrote Hive UDFs.
- Involved in HDFS maintenance and administering it through the Hadoop Java API.
- Loaded data into the cluster from dynamically generated files using Flume and from RDBMS using Sqoop.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Sound knowledge in programming Spark using Scala.
- Involved in writing Java APIs for interacting with HBase.
- Involved in writing Flume and Hive scripts to extract, transform and load data into the database.
- Used HBase as the data store.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Created an interface to convert mainframe data into ASCII.
- Experienced in installing, configuring and using Hadoop Ecosystem components.
- Experienced in Importing and exporting data into HDFS and Hive using Sqoop.
- Wrote Hive UDFs and used MRUnit and JUnit for unit testing of MapReduce jobs.
- Experienced in working with various kinds of data sources such as Teradata and Oracle.
- Successfully loaded files from Teradata into HDFS, and loaded data from HDFS into Hive and Impala.
- Performed POCs in a Spark test environment.
- Started using Apache NiFi to copy the data from local file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Monitored and managed the Hadoop cluster using Apache Ambari.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Experience in administering installation, configuration, troubleshooting, security, backup, performance monitoring and fine-tuning of Red Hat Linux.
- Involved in implementing security on the Hortonworks Hadoop cluster using Kerberos, working along with the operations team to move the non-secured cluster to a secured cluster.
- Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access.
Environment: Java, Hadoop, Hive, Pig, Sqoop, Scala, Flume, HBase, Hortonworks, Oracle 10g/11g/12c, Teradata, Cassandra, HDFS, Kafka, Data Lake, Spark, MapReduce, Ambari, Cloudera, Tableau, AWS, Jenkins, Maven, Snappy, Zookeeper, NoSQL, Shell Scripting, Ubuntu, Solr.
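A minimal, illustrative sketch of the Kafka-to-Spark Streaming ingestion described in the learner data model bullet above, written in Scala against the spark-streaming-kafka-0-10 integration; the broker address, topic and group id are placeholder assumptions, and the HBase persistence step is omitted.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object LearnerModelStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("learner-model-stream")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Placeholder broker, group id and offset policy.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "learner-model",
          "auto.offset.reset" -> "latest"
        )

        // Direct stream over a placeholder topic.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("server-logs"), kafkaParams)
        )

        // Aggregate records per key on the fly; the HBase write is omitted here.
        stream.map(record => (record.key, record.value))
          .reduceByKey(_ + " " + _)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }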
Hadoop Developer
Confidential - Englewood Cliffs, NJ
Responsibilities:
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed data pipeline using Sqoop, Hive, Pig and Java MapReduce to ingest claim and policy histories into HDFS for analysis.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Applied MapReduce framework jobs in Java for data processing by installing and configuring Hadoop and HDFS.
- Wrote the shell scripts to monitor the health check of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Created Hive external tables, loaded data into the tables and queried the data using HQL.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among the MapReduce jobs submitted by users.
- Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
- Worked on processing streaming data from Kafka topics using Scala and ingesting the data into Cassandra.
- Developed NiFi workflows to automate the data movement between different Hadoop systems.
- Performed a POC on single-member debugging in Spark and Hive.
- Responsible for architecting Hadoop clusters with CDH3.
- Imported data from mainframe dataset to HDFS using Sqoop. Also handled importing of data from various data sources (i.e. Oracle, DB2, Cassandra, and MongoDB) to Hadoop, performed transformations using Hive and MapReduce.
- Worked on Scala programming in developing Spark streaming jobs for building stream data platform integrating with Kafka.
- Analyzed the data by performing Hive queries, ran Pig scripts, Spark SQL and Spark Streaming.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Worked on NoSQL databases including HBase and Elasticsearch.
- Performed cluster co-ordination through Zookeeper.
- Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
- Loaded all datasets into Hive and Cassandra from source CSV files using Spark/PySpark (see the load sketch at the end of this section).
- Ran trials connecting Kafka to storage layers such as HBase, MongoDB and HDFS/Hive, and to other analytics tools.
- Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
- Migrated the computational code from HQL to PySpark.
- Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
- Developed NiFi Workflows to pick up the data from Data Lake as well as from server logs and send that to Kafka broker.
- Installed and configured Hive and wrote Hive UDFs.
- Performed data analysis in Hive by creating tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase NoSQL database and Sqoop.
- Developed shell scripts to pull the data from third-party systems into the Hadoop file system.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig.
- Migrated an existing on-premises application to AWS, used AWS services like EC2 and S3 for processing and storing large data sets, and worked with Elastic MapReduce to set up a Hadoop environment on AWS EC2 instances.
- Responsible for migrating the code base from the Hortonworks platform to Amazon EMR and evaluated Amazon ecosystem components like Redshift.
- Led the offshore team in automating the NiFi workflows using the NiFi REST API.
- Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
- Load log data into HDFS using Flume. Worked extensively in creating MapReduce jobs to power data for search and aggregation.
Environment: Hadoop, MapReduce, Scala, Spark, Kafka, HDFS, Flume, NiFi, Cassandra, Sqoop, Pig, HBase, Hive, ZooKeeper, Cloudera, Oozie, Elasticsearch, AWS, Jenkins, Maven, NoSQL, UNIX/Linux.
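A minimal sketch, in Scala, of the CSV-to-Hive load pattern referenced in this section; the landing path, database and target table name are illustrative assumptions rather than actual project names.

    import org.apache.spark.sql.SparkSession

    object LoadCsvToHive {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("csv-to-hive-load")
          .enableHiveSupport()
          .getOrCreate()

        // Landing path and target database/table are placeholders.
        val df = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/landing/source_csv/*.csv")

        // Persist as a managed Hive table for downstream querying.
        spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
        df.write.mode("overwrite").saveAsTable("analytics.source_data")
      }
    }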
Big Data Developer
Confidential
Responsibilities:
- Hands-on experience developing applications leveraging Hadoop ecosystem components (Hadoop, MapReduce, Spark, Pig, Hive and Sqoop).
- Installed and configured three-node clusters in fully distributed and pseudo-distributed modes. Extracted data from relational databases such as SQL Server and MySQL by developing Scala and SQL code (see the extraction sketch at the end of this section).
- Deployed, configured and maintained compute resources on Azure Cloud.
- Developed, documented and maintained new processes.
- Imported and exported data (MySQL, CSV and text files) from local/external file systems and MySQL to HDFS on a regular basis.
- Worked with structured, semi-structured and unstructured data, automated in the Big Bench tool.
- Configured Spark Streaming to receive real-time data from Kafka and stored the stream data in HDFS.
- Worked with Spark to create structured data from the pool of unstructured data received.
- Analyzed the data and proposed NoSQL database solutions to meet requirements.
- Installed and integrated Redshift with Hadoop to meet business requirements.
- Installed and maintained a Hadoop Hortonworks cluster on EC2 servers.
- Worked on creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka. Worked with, and learned a great deal from, AWS cloud services like EC2, S3, EBS, RDS and VPC.
- Configured the Hadoop stack on EC2 servers and transferred data between S3 and EC2 instances.
- Involved in developing machine learning libraries for data analysis and data visualization.
- Developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Involved in continuous monitoring and managing the Hadoop cluster using Hortonworks.
- Developed optimal strategies for distributing the web log data over the cluster importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Used Tableau for data visualization.
- Created applications using Kafka, which monitors consumer lag within Apache Kafka clusters.
- Developed Spark scripts by using Scala shell commands as per the requirement.
- Developed and maintained ETL (data extraction, transformation and loading) mappings using Informatica Designer.
- Installed, configured, supported and managed Hadoop clusters using Apache, Cloudera (CDH4) distributions and Amazon Web Services (AWS).
- Explored prebuilt ETL metadata, mappings and DAC metadata, and developed and maintained SQL code as needed for the SQL Server database.
- Responsible for configuring the Hadoop cluster and troubleshooting common cluster problems.
- Involved in handling the issues related to cluster start, node failures on the system.
- Handled cluster configuration and inter- and intra-cluster data transfer (distcp and hftp).
Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Kafka, Scala, Pig, NiFi, Hortonworks, Cloudera, HBase, Spark, Oozie, Cassandra, Python, Shell Scripting, AWS, AWS EMR, EC2.
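A minimal sketch, in Scala, of the relational-extraction pattern mentioned in this section, using Spark's JDBC data source; the connection URL, table, credentials and output path are placeholder assumptions.

    import org.apache.spark.sql.SparkSession

    object ExtractFromMySql {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("jdbc-extract").getOrCreate()

        // Connection URL, table name and credentials are placeholders.
        val orders = spark.read.format("jdbc")
          .option("url", "jdbc:mysql://dbhost:3306/sales")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
          .load()

        // Land the extract in HDFS as Parquet for downstream Hive/Spark processing.
        orders.write.mode("overwrite").parquet("hdfs:///data/raw/sales/orders")
      }
    }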
Hadoop/SQL Developer
Confidential - Charlotte, NC
Responsibilities:
- Imported log files from MasterCard, BASE II and Visa organizations from mainframes using GoldenGate software and ingested these log files into Hive by creating Hive external tables for each type of log file.
- Wrote complex Hive and Spark SQL queries for data analysis to meet business requirements.
- Created Hive external tables to store the GGS output and worked on them for data analysis to meet the business requirements.
- Created HBase tables to load huge amounts of structured, semi-structured and unstructured data coming from NoSQL and Tandem systems.
- Used ESP scheduled jobs to automate the pipeline workflow and orchestrate the MapReduce jobs that extract the data in a timely manner.
- Involved in Hive performance optimizations like partitioning, bucketing and performing several types of joins on Hive tables, and implemented Hive SerDes like JSON and Avro (see the table DDL sketch at the end of this section).
- Designed and implemented Map Reduce-based large-scale parallel relation-learning system.
- Worked with the Parquet and Avro data serialization systems to handle all data formats.
- Implemented shell, Python and HQL scripts to meet the business requirements.
- Performed technical analysis, ETL design, development and deployment on the data as per the business requirements.
- Involved in performing various data manipulations using various Talend components.
- Developed Spark Streaming applications for real-time processing.
- Experienced in managing and reviewing Hadoop log files.
- Used the StreamSets engine to stream the data in real time.
- Experienced in working with different scripting technologies like Python and Unix shell scripts.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Designed and developed a corporate intranet used in the daily workflow.
- Applied different transformations and actions in Spark SQL, like joins and collect.
- Drove and led solution design services, including requirements analysis, functional and technical design leadership, and documentation/review with business and IT constituents.
- Worked in agile development environments using continuous integration and deployment.
Environment: Hadoop, HDFS, Hive, Spark, MapReduce, Pig, Cloudera, Avro, CDH, Shell script, Eclipse, Python, MySQL, AWS S3, GCP.
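A minimal sketch, in Scala via Spark SQL with Hive support, of the partitioned and bucketed Hive table pattern referenced in the optimization bullet above; the table, column names and storage location are illustrative assumptions.

    import org.apache.spark.sql.SparkSession

    object PartitionedTableDdl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partitioned-table-ddl")
          .enableHiveSupport()
          .getOrCreate()

        // Illustrative external table, partitioned by transaction date and bucketed by id.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS txn_logs (
            |  txn_id STRING,
            |  amount DOUBLE
            |) PARTITIONED BY (txn_date STRING)
            |CLUSTERED BY (txn_id) INTO 32 BUCKETS
            |STORED AS ORC
            |LOCATION '/data/warehouse/txn_logs'""".stripMargin)

        // The dynamic-partition load is typically run on the Hive side so bucketing is honored, e.g.:
        //   SET hive.exec.dynamic.partition.mode=nonstrict;
        //   INSERT OVERWRITE TABLE txn_logs PARTITION (txn_date)
        //   SELECT txn_id, amount, txn_date FROM txn_logs_staging;
      }
    }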
JAVA Developer
Confidential
Responsibilities:
- Developed the Web Based Rich Internet Application (RIA) using J2EE (Spring framework) and Macromedia/Adobe Flex.
- Developed applications using the latest Java/J2EE technologies such as Java 8, JUnit and Cassandra.
- Extensively worked on designing and implementing multi-tier applications using Java 8, J2EE, JDBC, JSP, HTML5, Spring Boot, AngularJS, Servlets and JavaBeans with Eclipse.
- Used new Java 8 features, such as lambda expressions, streams and functional interfaces.
- Designed and developed, using the Java collections API with Java 8 features and other Java processes, solutions to fulfill business requirements such as clients' account statements and running balances based on each transaction.
- Replaced existing AngularJS code with Angular 4 code, which decreased the number of lines of code for the web application and improved performance.
- Expertise in Big Data architecture with the Hadoop file system and its ecosystem tools: MapReduce, HBase, Hive, Agile, Pig, Zookeeper, Oozie, Flume, Avro, Impala, Apache Spark, Spark Streaming and Spark SQL.
- Worked on modularization of the JDK under Project Jigsaw. RAML specification/build experience with MuleSoft.
- Extensively worked on both the Enterprise and Community editions of Mule ESB, and configured Mule API Manager and RAML.
- Worked on RAML and REST-based web services for Mule ESB flows, as well as MuleSoft MMC and enterprise release capabilities.
- Developed software for AWS in Java 8 (using Spring Framework 5.0, MySQL 5.6, AWS Aurora, Lambda, API Gateway, S3, SNS, SQS, DynamoDB, EC2, EBS, Akamai WAF (web application firewall) and the Apache Tomcat web server).
- Developed RESTful web services using the Grails framework.
- Used the Spring RESTful API to create RESTful web services, with JSON as the data format between the front end and the middle-tier controller.
- Developed Spring RESTful microservices and implemented Netflix Eureka and Ribbon as part of service discovery, using Apache Axis.
- Worked on developing front-end technologies such as JavaScript, Angular 2.0+, jQuery, HTML, CSS, JSON, JSP and Struts 1.0/2.0.
- Team leader on numerous projects utilizing Java, Java EE, Enterprise JavaBeans, and Apache Struts web applications to create fully integrated client management systems.
- Deployed Spring Boot based microservices in Docker and Amazon EC2 containers using Jenkins.
- Worked with Splunk and ELK stack for creating monitoring and analytics solutions.
- Developed microservices using Spring MVC, Spring Boot, and Spring Cloud.
- Used a microservices architecture, with Spring Boot based services interacting through REST.
- Built Spring Boot microservices for the delivery of software products across the enterprise.
Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Scala, Spark, AWS, Linux.
