
Senior Spark/Hadoop Consultant Resume


Virginia

SUMMARY

  • Over 10 years of professional IT experience, including 4+ years of experience in Big Data, Hadoop development and ecosystem analytics, and the design and development of Java-based enterprise applications.
  • Experience with different Hadoop distributions: Cloudera (CDH3, CDH4 and CDH5), Hortonworks Data Platform (HDP) and Amazon Elastic MapReduce (EMR).
  • Extensive experience in Spark/Scala, MapReduce MRv1 and MapReduce MRv2 (YARN).
  • Extensive experience in testing, debugging and deploying MapReduce jobs on Hadoop platforms.
  • Extensive experience working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, HBase and Spark.
  • Experience with the Cloudera CDH4 and CDH5 distributions.
  • Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
  • Experience with Sequence file, Avro, ORC and Parquet file formats and compression.
  • Experience importing and exporting data with Sqoop between HDFS and relational database systems.
  • Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Experience creating MapReduce code in Java per business requirements.
  • Developed MapReduce jobs to automate data transfer from HBase.
  • Strong experience working with Elastic MapReduce and setting up environments on Amazon EC2 instances.
  • Hands-on NoSQL database experience with HBase and Cassandra.
  • Good working knowledge of the Eclipse and IntelliJ IDEs for developing and debugging Java applications.
  • Expertise in creating UIs using JSP, HTML, XML and JavaScript.
  • Created data maps from databases to dimension and fact tables.
  • Worked with Oracle for data import/export operations.
  • Carried out QA deployments and worked on process flow diagrams.
  • Created dimension and fact jobs and scheduled job runs.
  • Well experienced with tools like PuTTY and WinSCP.
  • Technical exposure to the Cassandra CLI: creating keyspaces and column families and analyzing fetched data.
  • Hands-on SQL experience in a data warehouse environment with reporting.
  • Strong understanding of OLAP and data mining concepts.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Documents (FSD). Strong knowledge of the Software Development Life Cycle (SDLC).
  • Created Business Intelligence reports using Tableau.
  • Strong knowledge of Flume and Kafka.
  • Experienced in Agile methodologies, Scrum stories and sprints in a Python-based environment, along with data analytics, data wrangling and Excel data extracts.
  • Experience with Scrum, Agile and Waterfall models.

TECHNICAL SKILLS

Programming Languages: C, C++, Core Java, Advanced Java, COBOL, JCL, SQL, Unix

Data Handling: Oozie, Ambari, Flume, Hadoop, HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, ZooKeeper

Scripting Languages: Perl, Shell scripting

Databases: MS SQL Server 2008, Oracle 11g, DB2, MongoDB

Software Experience: Oracle Data Miner (ODM), Visual Studio C++, Eclipse, Oracle SQL Developer, Web Services, MQ communication

Libraries/Tools Explored: Eclipse, MQSeries, ChangeMan, TSO, File-AID, Panvalet, LCS, STS, NDM, SPUFI, QMF

Web Related: HTML, JavaScript, Servlets, JSP, XQuery, XPath, XML, XSLT

Operating Systems: Android, iOS, Microsoft Windows 2000/XP, Unix/Solaris, Linux, MVS OS/390, MS-DOS

PROFESSIONAL EXPERIENCE

Senior Spark/Hadoop Consultant

Confidential, Virginia

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, the HBase database and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Flume, Hive, Pig, Sqoop and HBase on the Hadoop cluster.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Managed Hadoop cluster resources, including adding/removing cluster nodes for maintenance and capacity needs.
  • Involved in loading data from the Unix file system to HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Installed and configured Hive, and wrote Hive UDFs.
  • Provided cluster coordination services through ZooKeeper.
  • Experience managing and reviewing Hadoop log files.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.

Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, ZooKeeper

Senior Spark/Hadoop Consultant

Confidential, Rensselaer, NY

Responsibilities:

  • Designed the entire architecture of the data pipeline for analysis.
  • Worked on Sqoop jobs to import data from Oracle into HDFS.
  • Wrote Scala scripts to load processed data into DataStax Cassandra 4.8.
  • Performed performance tuning of Spark and Sqoop jobs.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Wrote a MapReduce job to compare two TSV files and save the processed output into Oracle.
  • Hands-on design and development of an application using Hive (UDFs).
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
  • Provided support to data analysts in running Pig and Hive queries.
  • Transformed the Ab Initio process into Hadoop using Pig and Hive.
  • Created partitioned tables in Hive.
  • Created reports using Tableau on HiveServer2.
  • Worked on data modeling for dimension and fact tables in the Hive warehouse.
  • Scheduled jobs through Walgreens' internal EBS scheduling system.
  • Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hortonworks Data Platform 2.3.4, Hadoop 2.7, Spark 1.4.1, Scala 2.10, SBT 0.13, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, Java, Oracle 11g, DataStax Cassandra 4.8, CentOS, Windows, Python 3.0
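The TSV-comparison job described above can be sketched as a reduce-side grouping on a key column. This is a plain-Python stand-in for the original MapReduce Java job; the key position and row shapes are illustrative assumptions.

```python
from collections import defaultdict

def compare_tsv(left_rows, right_rows, key_idx=0):
    """Group rows from two TSV sources by a key column, mimicking the
    shuffle phase of a MapReduce compare job: rows sharing a key land
    together, and an empty side flags a mismatch between the files.
    key_idx (an assumption here) marks the join-key column."""
    grouped = defaultdict(lambda: ([], []))
    for row in left_rows:
        grouped[row[key_idx]][0].append(row)
    for row in right_rows:
        grouped[row[key_idx]][1].append(row)
    return dict(grouped)
```

In the actual job the grouped output was written to Oracle; here the returned dict stands in for that sink.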

Senior Hadoop/Spark Developer

Confidential, Chicago, IL

Responsibilities:

  • Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
  • Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
  • Developed many Spark applications for data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning exercises.
  • Worked on troubleshooting Spark applications to make them more error tolerant.
  • Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations and other capabilities.
  • Worked extensively with Sqoop to import data from Oracle.
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Involved in creating Hive tables and loading and analyzing data using Hive scripts.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Good experience with continuous integration of applications using Jenkins.
  • Used reporting tools like Tableau connected to Impala to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.

Environment: Spark, Hive, Sqoop, Shell scripting, AWS EMR, Kafka, AWS S3, MapReduce, Scala, Eclipse, Maven
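The broadcast-variable joins mentioned above follow the map-side join pattern: a small table is shipped whole to every worker so the large side never shuffles. A minimal plain-Python sketch, where a dict stands in for Spark's broadcast and the field names are hypothetical:

```python
def broadcast_join(events, profiles, key="user_id"):
    """Map-side (broadcast) join: the small profiles table becomes an
    in-memory lookup -- the 'broadcast' side -- so each event row is
    enriched with profile fields without shuffling the large side."""
    lookup = {p[key]: p for p in profiles}   # broadcast the small table
    return [{**e, **lookup.get(e[key], {})} for e in events]
```

Events with no matching profile pass through unenriched, mirroring a left outer join.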

Spark/Hadoop Developer

Confidential, Sterling, VA

Responsibilities:

  • Extensively migrated the existing architecture to Spark Streaming to process live streaming data.
  • Responsible for Spark Core configuration based on the type of input source.
  • Executed Spark code written in Scala for Spark Streaming/SQL for faster data processing.
  • Performed SQL joins among Hive tables to get input for the Spark batch process.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Developed Kafka producers and brokers for message handling.
  • Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
  • Used the Amazon CLI for data transfers to and from Amazon S3 buckets.
  • Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
  • Experience pulling data from Amazon S3 buckets into the data lake, building Hive tables on top of it and creating data frames in Spark for further analysis.
  • Implemented Spark RDD transformations and actions to implement business analysis.
  • Used Amazon CloudWatch to monitor and track resources on AWS.
  • Worked on data serialization formats for converting complex objects into sequences of bits using the Avro, Parquet, JSON and CSV formats.
  • The RDDs and data frames undergo various transformations and actions and are stored in HDFS as Parquet files.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra.
  • Implemented CRUD operations using CQL on top of the Cassandra file system.
  • Worked on analyzing and examining customer behavioral data using Cassandra.
  • Set up SolrCloud for distributed indexing and search.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Involved in daily Scrum meetings to discuss development progress and was active in making Scrum meetings more productive.

Environment: Cassandra, Kafka, Spark, Pig, Hive, Oozie, AWS, Solr, SQL, Scala, Python, Java, FileZilla, PuTTY, IntelliJ, GitHub
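The Kafka-to-Cassandra streaming pipeline above hinges on per-key stateful aggregation, which Spark Streaming exposes as updateStateByKey. A plain-Python stand-in for that step (the running-counter semantics are an illustrative assumption):

```python
def update_state(state, batch):
    """Fold one micro-batch of (key, value) pairs into persistent
    per-key state, in the spirit of Spark Streaming's updateStateByKey;
    the dict stands in for the state store backing the Cassandra writes."""
    for key, value in batch:
        state[key] = state.get(key, 0) + value
    return state
```

Each micro-batch updates only the keys it touches, so state for quiet keys carries forward unchanged.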

Java/J2EE Developer

Confidential, New York, NY

Responsibilities:

  • Responsible for all stages of design, development and deployment of applications.
  • Used Agile (Scrum) methodologies for software development.
  • Implemented the application using the Struts 2 framework, which is based on the Model-View-Controller design pattern.
  • Developed custom tags to simplify JSP 2.0 code. Designed UI screens using JSP 2.0, Ajax and HTML. Used JavaScript for client-side validation.
  • Actively involved in designing and implementing the Value Object, Service Locator, MVC and DAO design patterns.
  • Developed and used JSP custom tags in the web tier to dynamically generate web pages.
  • Used Java Message Service for reliable and asynchronous exchange of important information such as order submissions: consumed messages from the Java message queue and generated emails to be sent to customers.
  • Designed and developed stateless session beans (EJB 3).
  • Used jQuery as a JavaScript library.
  • Used the Data Access Object (DAO) pattern to introduce an abstraction layer between the business logic tier (business objects) and the persistent storage tier (data source).
  • Implemented session EJBs at the middle tier to house the business logic.
  • Used RESTful web services for sending and getting data from different applications using the Jersey framework.
  • Developed stored procedures and complex packages extensively using PL/SQL and shell programs.
  • Used DB2 as the database and developed complex SQL queries.
  • Used the JUnit framework for unit testing and Maven to build the application; deployed on WebSphere 8.5 using the RAD 7.5 IDE.
  • Used HP Quality Center for defect reporting and tracking.
  • Prepared Low-Level Design, High-Level Design and Unit Test Results documents.
  • Used Log4J for logging.

Environment: Struts 2, EJB 3, WebSphere 8.5, jQuery, Java 1.6, REST (Jersey), JSP 2.0, Servlets 2.5, JMS, XML, JavaScript, UML, HTML5, JNDI, CVS, Log4J, JUnit, Eclipse, Ant, Maven, DB2, RAD 7.5, Windows XP
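The DAO abstraction described above can be sketched as follows (in Python for brevity; the original work used Java and session EJBs, and the class and field names here are illustrative):

```python
from abc import ABC, abstractmethod

class OrderDao(ABC):
    """DAO interface: the business tier codes against this abstraction,
    never against a concrete data source."""
    @abstractmethod
    def find(self, order_id): ...
    @abstractmethod
    def save(self, order): ...

class InMemoryOrderDao(OrderDao):
    """Concrete DAO backed by a dict; a DB2/JDBC implementation could
    be swapped in without touching the business logic."""
    def __init__(self):
        self._store = {}
    def find(self, order_id):
        return self._store.get(order_id)
    def save(self, order):
        self._store[order["id"]] = order
```

Swapping the concrete class is the whole point of the pattern: tests use the in-memory DAO while production binds the database-backed one.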
