Hadoop Senior Consultant Resume
Ashburn, VA
PROFESSIONAL SUMMARY:
- Over 9 years of IT experience across multiple domains with the Hadoop ecosystem, Core Java, and SQL/PL-SQL technologies, with hands-on project experience in verticals including financial services and trade compliance.
- Extensive hands-on experience in Spark Core, Spark SQL, Spark Streaming, and Spark machine learning using the Scala and Python programming languages.
- Solid understanding of RDD operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables, and broadcast optimization.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Experience in exposing Apache Spark as web services.
- Good understanding of the driver, executors, and the Spark web UI.
- Experience in submitting Apache Spark and MapReduce jobs to YARN.
- Experience in real-time processing using Apache Spark with Flume and Kafka.
- Migrated Python machine learning modules to scalable, high-performance, fault-tolerant distributed systems such as Apache Spark.
- Strong experience in Spark SQL UDFs, Hive UDFs, and Spark SQL performance tuning. Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
- Good expertise in coding in Python, Scala, and Java.
- Good understanding of the MapReduce framework architectures (MRv1 and YARN).
- Good knowledge and understanding of Hadoop architecture and the various components of the Hadoop ecosystem: HDFS, MapReduce, Pig, Sqoop, and Hive.
- Developed various MapReduce applications to perform ETL workloads on metadata and terabytes of data.
- Hands-on experience in cleansing semi-structured and unstructured data using Pig Latin scripts.
- Good working knowledge of creating Hive tables and using HQL for data analysis to meet business requirements.
- Experience in managing and reviewing Hadoop log files.
- Good working experience with NoSQL databases such as HBase, Cassandra, and MongoDB.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems/mainframes.
- Experience in working with Flume to load log data from multiple sources directly into HDFS.
- Experience in scheduling time-driven and data-driven Oozie workflows.
- Used ZooKeeper on a distributed HBase cluster for configuration and management.
- Worked with the Avro data serialization system.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Experience in writing shell scripts to dump shared data from landing zones to HDFS.
- Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
- Expertise in client-side design and validation using HTML and JavaScript.
- Excellent communication and interpersonal skills; detail-oriented, analytical, time-bound, responsible team player with the ability to coordinate in a team environment, a high degree of self-motivation, and quick learning.
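The transformations-and-actions model noted above can be sketched in plain Python, using lazy generators to stand in for Spark's deferred RDD evaluation (a hedged illustration only; `spark_map`, `spark_filter`, and `collect` are made-up names, not Spark APIs, and no Spark dependency is used):

```python
# Sketch of Spark-style lazy transformations vs. eager actions using plain
# Python generators. Names are illustrative, not real Spark APIs.

def spark_map(func, data):
    """Lazy 'transformation': builds the pipeline, computes nothing yet."""
    return (func(x) for x in data)

def spark_filter(pred, data):
    """Lazy 'transformation': extends the lineage, still defers work."""
    return (x for x in data if pred(x))

def collect(data):
    """Eager 'action': forces evaluation of the whole pipeline."""
    return list(data)

records = range(10)
pipeline = spark_filter(lambda x: x % 2 == 0, spark_map(lambda x: x * x, records))
result = collect(pipeline)   # evaluation happens only here, as with rdd.collect()
```

As in Spark, chaining the first two calls does no work; only the final action materializes results.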
TECHNICAL SKILLS:
Confidential Frameworks: Hadoop, Hive, Kafka, AWS, Cassandra, HBase, Flume, Pig, Sqoop, MapReduce, Cloudera, MongoDB, Spark, Scala
Confidential distributions: Cloudera, Amazon EMR
Programming languages: Oracle PL/SQL, Core Java, Scala, Python, Shell Scripting
Operating Systems: Windows, Linux (Ubuntu)
Databases: Oracle 10g, MySQL, Netezza, SQL Server, Teradata, Postgres
Development Tools: Eclipse, PL/SQL Developer, Toad, PuTTY
Development methodologies: Agile, Waterfall
Messaging Services: ActiveMQ, Kafka, JMS
Version Control: PVCS, SVN, CVS, Git
Analytics: Tableau, SPSS, SAS EM, SAS JMP
PROFESSIONAL EXPERIENCE:
Confidential, Ashburn, VA
Hadoop Senior Consultant
Responsibilities:
- Write Spark jobs to read data into a DataFrame and apply various transformations and actions to filter and shape the data into the required format.
- Build scalable frameworks using Spark's advanced APIs.
- Write Spark jobs to write final data into HDFS and RDBMS.
- Manage and monitor the Hadoop cluster.
- Manage the ETL team and motivate/educate them to learn and work efficiently.
- Migrated Hive queries into Spark SQL to improve performance.
- Executed Spark RDD transformations and actions as per business analysis needs.
- Imported data from MySQL to HDFS using Sqoop and managed Hadoop log files.
- Fully automated job scheduling, monitoring, and cluster management without human intervention.
- Used Sqoop to import and export data among HDFS, MySQL, and Hive.
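The read-filter-transform-write pattern in the responsibilities above can be sketched in plain Python (the actual jobs used Spark DataFrames; the column names and validity rule below are invented for illustration):

```python
import csv
import io

# Minimal sketch of an ETL pass: read rows, keep valid ones, reshape them,
# and write the result. Column names ("id", "amount", "status") are made up.

raw = "id,amount,status\n1,250,OK\n2,-30,BAD\n3,75,OK\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# transformation: filter out bad rows and cast into the required format
clean = [{"id": int(r["id"]), "amount": float(r["amount"])}
         for r in rows if r["status"] == "OK"]

# write phase: serialize the cleaned rows (stand-in for HDFS/RDBMS output)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["id", "amount"])
writer.writeheader()
writer.writerows(clean)
final = out.getvalue()
```

In Spark the same shape becomes `spark.read.csv(...)`, a `filter`/`select` chain, and a `write` call, but the pipeline structure is identical.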
Environment: Linux, Hadoop, Spark Core, Spark SQL, Scala, Hive
Confidential (CITI Bank), USA, Jul 2015 to Jun 2017
Hadoop Consultant
Responsibilities:
- Load and transform large sets of structured, semi-structured, and unstructured data coming from different source systems and a variety of portfolios.
- Used Spark DataFrames to read text, CSV, and image data from HDFS, S3, and Hive.
- Worked closely with data scientists to build predictive models using Spark.
- Cleaned input text data using Spark machine learning feature extraction APIs.
- Migrated Hive queries into Spark SQL to improve performance.
- Involved in migrating code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
- Trained models using historical data stored in HDFS and Amazon S3.
- Used Spark Streaming to load the trained model and predict on real-time data from Kafka.
- Executed Spark RDD transformations and actions as per business analysis needs.
- Imported data from MySQL to HDFS using Sqoop and managed Hadoop log files.
- Fully automated job scheduling, monitoring, and cluster management without human intervention.
- Created Hive tables and involved in metadata loading and writing Hive UDFs.
- Used Sqoop to import and export data among HDFS, MySQL, and Hive.
- Migrated Python scikit-learn machine learning models to DataFrame-based Spark ML algorithms.
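The streaming-scoring pattern above (a trained model applied to messages arriving from Kafka) can be sketched in plain Python; the in-memory queue stands in for a Kafka topic, and the threshold function is a stand-in for the real Spark ML model, not the actual one used:

```python
from queue import Queue

# Hedged sketch: messages arrive on a queue (stand-in for a Kafka topic) and
# a previously trained model scores each one. The threshold "model" below is
# illustrative only.

def predict(features, threshold=0.5):
    # stand-in for model.transform(): score is the mean of the feature vector
    score = sum(features) / len(features)
    return 1 if score >= threshold else 0

topic = Queue()
for msg in ([0.9, 0.8], [0.1, 0.2], [0.6, 0.5]):
    topic.put(msg)

predictions = []
while not topic.empty():          # micro-batch style consumption
    predictions.append(predict(topic.get()))
```

In Spark Streaming the queue becomes a Kafka DStream/structured stream and `predict` becomes the loaded model's `transform`, but the consume-then-score loop is the same.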
Environment: Spark Core, Spark SQL, Spark Streaming, Spark machine learning, Scala, DataFrames, Datasets, AWS, Kafka, Hive, Sqoop, HBase, GitHub, Webflow, Amazon S3, Amazon EMR.
Confidential
Hadoop Associate Consultant
Responsibilities:
- Created various MapReduce jobs for performing ETL transformations on the transactional and application-specific data sources.
- Imported data from our relational data stores to Hadoop using Sqoop.
- Wrote Pig scripts and executed them using the Grunt shell.
- Worked on converting existing MapReduce batch applications for better performance.
- Confidential analysis using Pig and user-defined functions (UDFs).
- Worked on loading tables into Impala for faster retrieval using different file formats.
- The system was initially developed in Java; the Java filtering program was restructured to put the business rule engine in a JAR that can be called from both Java and Hadoop.
- Created reports and dashboards using structured and unstructured data.
- Upgraded the operating system and/or Hadoop distribution as new versions were released, using Puppet.
- Performed joins, group-bys, and other operations in MapReduce using Java and Pig.
- Processed the output from Pig and Hive and formatted it before writing it to the Hadoop output file.
- Used Hive definitions to map the output file to tables.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Wrote data ingesters and MapReduce programs.
- Reviewed HDFS usage and system design for future scalability and fault tolerance.
- Wrote MapReduce/HBase jobs.
- Worked with HBase, a NoSQL database.
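The map/shuffle/reduce flow behind the MapReduce jobs above can be sketched in plain Python, using word count as the canonical example (a hedged illustration; the real jobs were Java/Pig, and the function names here are invented):

```python
from collections import defaultdict

# Plain-Python sketch of the three MapReduce phases.

def map_phase(line):
    """Mapper: emit a (key, value) pair per word."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: aggregate the grouped values per key."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data big", "data pipelines"]
pairs = [p for line in lines for p in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
```

Joins and group-bys in MapReduce follow the same shape: the mapper emits the join/group key, the shuffle co-locates matching records, and the reducer combines them.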
Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Linux, Java 7, Eclipse, NoSQL.
Confidential
Hadoop Associate Consultant
Responsibilities:
- Developed Confidential solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them with business solutions.
- Installed and configured Apache Hadoop, Hive, and HBase.
- Worked on a Hortonworks cluster, which was used to process the Confidential.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Used Sqoop to pull data into HDFS from RDBMS and vice versa.
- Defined workflows using Oozie.
- Created partitions on Hive tables and analyzed this data to compute various metrics for reporting.
- Created data models for Hive tables.
- Managed and reviewed Hadoop log files.
- Used Pig as an ETL tool for transformations, joins, and pre-aggregations before loading data onto HDFS.
- Worked on large sets of structured, semi-structured, and unstructured data.
- Responsible for managing data coming from different sources.
- Installed and configured Hive and developed Hive UDFs to extend Hive's core functionality.
- Responsible for loading data from UNIX file systems to HDFS.
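The Hive partitioning noted above can be sketched in plain Python: rows are grouped by a partition key so a query scans only one group, the way Hive prunes partition directories (column names below are invented for illustration):

```python
from collections import defaultdict

# Sketch of partitioning by a key (like a Hive table PARTITIONED BY (dt)).
# Each distinct key value gets its own bucket, mirroring one HDFS directory
# per partition.

rows = [
    {"dt": "2016-01-01", "clicks": 10},
    {"dt": "2016-01-02", "clicks": 7},
    {"dt": "2016-01-01", "clicks": 5},
]

partitions = defaultdict(list)
for row in rows:
    partitions[row["dt"]].append(row)   # one "directory" per partition value

# metric computed by scanning only one partition (partition pruning)
jan1_clicks = sum(r["clicks"] for r in partitions["2016-01-01"])
```

A Hive query with `WHERE dt = '2016-01-01'` touches only that partition's files, which is what makes partitioned reporting queries cheap.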
Environment: Apache Hadoop 2.x, MapReduce, HDFS, Hive, HBase, Pig, Oozie, Linux, Java 7, Eclipse.
Confidential
Associate Consultant
Responsibilities:
- Full life cycle experience including requirements analysis, high level design, detailed design, data model design, coding, testing and creation of functional and technical design documentation.
- Extensively involved in writing stored procedures, functions, and packages as per the business requirements.
- Redesigned existing procedures and packages to enhance performance.
- Debugged Pro*C and PL/SQL code blocks of stored procedures.
- Generated ad-hoc reports using SQL and stored procedures.
- Involved in continuous enhancements and fixing of production problems.
- Analyzed CRs raised from UAT and production.
- Coordinated with the UAT and production teams as well as with the users.
- Used bulk collections for better performance and easier retrieval of data by reducing context switching between the SQL and PL/SQL engines.
- Wrote SQL, PL/SQL, and SQL*Plus programs to retrieve data using cursors and exception handling.
- Involved in SIT and UAT support for solving critical issues.
- Involved in the requirements, design, coding, and testing phases of the functionality.
- Created and maintained database objects.
- Performed end-to-end functional testing for the entire application.
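The bulk-collection idea above (batching rows to cut per-row engine round trips) has a direct analogue in Python's standard `sqlite3` module, sketched here as an illustration; in PL/SQL the same role is played by BULK COLLECT / FORALL:

```python
import sqlite3

# Batching analogue of PL/SQL bulk collections: one executemany() call
# inserts all rows instead of one execute() per row, reducing per-row
# overhead between the driver and the SQL engine.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")

rows = [(i, i * 10.0) for i in range(1, 6)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)  # one batched call
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
```

The design point is the same in both environments: crossing the SQL/procedural boundary once per batch is far cheaper than once per row.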
Environment: Oracle 11g, SQL, PL/SQL, Pro*C, PuTTY, Sun Solaris.
Confidential
Software Engineer
Responsibilities:
- Involved in writing stored procedures, functions, and packages as per the business requirements.
- Developed Pro*C programs for flat file generation.
- Worked on Requests for Change (RFC) and Production Problem Resolutions (PPR).
- Provided support across the various phases of the project.
- Prepared and executed unit test cases.
Environment: Oracle 11g, SQL, PL/SQL, Pro*C, PuTTY, Sun Solaris