
Spark/Hadoop Consultant Resume


Atlanta, GA

SUMMARY

  • Over 9 years of professional IT experience, including 4+ years of experience in Big Data, Hadoop development and ecosystem analytics, and the design and development of Java-based enterprise applications.
  • Experience in different Hadoop distributions such as Cloudera (CDH3, CDH4, and CDH5), Hortonworks Data Platform (HDP), and Amazon Elastic MapReduce (EMR).
  • Extensive experience in Spark/Scala, MapReduce MRv1, and MapReduce MRv2 (YARN).
  • Extensive experience in testing, debugging, and deploying MapReduce jobs on Hadoop platforms.
  • Extensive experience in working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, HBase, and Spark.
  • Experience with the Cloudera CDH4 and CDH5 distributions.
  • Expert in creating Pig and Hive UDFs in Java to analyze data efficiently (see the sketch following this summary).
  • Experience with Sequence file, Avro, ORC, and Parquet file formats and compression.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems.
  • Successfully loaded files to Hive and HDFS from Oracle and SQL Server using Sqoop.
  • Experience in writing MapReduce code in Java per business requirements.
  • Developed MapReduce jobs to automate transfer of data from HBase.
  • Strong experience in working with Elastic MapReduce and setting up environments on Amazon AWS EC2 instances.
  • Hands-on NoSQL database experience with HBase and Cassandra.
  • Good working knowledge of the Eclipse and IntelliJ IDEs for developing and debugging Java applications.
  • Expertise in creating UI using JSP, HTML, XML and JavaScript.
  • Created data maps from databases to dimension and fact tables.
  • Worked with Oracle for data import/export operations.
  • Carried out QA deployments and worked on process flow diagrams.
  • Created dimension and fact jobs and scheduled job runs.
  • Well experienced in using networking tools such as PuTTY and WinSCP.
  • Technical exposure to the Cassandra CLI: creating keyspaces and column families and analyzing fetched data.
  • Hands-on SQL experience in a data warehouse environment with reporting.
  • Strong understanding of OLAP and data mining concepts.
  • Worked on performance tuning of Hadoop jobs by applying techniques such as map-side joins, partitioning, and bucketing.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Documents (FSD). Strong knowledge of the Software Development Life Cycle (SDLC).
  • Experienced in preparing and executing Unit Test Plan and Unit Test Cases after software development.
  • Expertise in defect management and defect tracking, and in performance tuning to deliver the highest-quality product.
  • Created Business Intelligence reports using Tableau.
  • Strong knowledge of Flume and Kafka.
  • Experienced in Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics, data wrangling, and Excel data extracts.
  • Experience in software development in Python (libraries used: BeautifulSoup, NumPy, SciPy, matplotlib, python-twitter, OpenCV, pandas, NetworkX, urllib2, and MySQLdb for database connectivity) and IDEs such as Sublime Text, PyCharm, and Emacs.
  • Good knowledge of machine learning concepts using the Mahout and MALLET packages.
  • Experience in Scrum, Agile and Waterfall models.
  • Good communication skills; committed, result-oriented, and hardworking, with a quest to learn new technologies.
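
As a minimal illustration of the Hive UDF experience above, the sketch below shows the shape of a simple Java UDF; the class name and normalization logic are hypothetical examples, not taken from a specific engagement:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that trims and lower-cases a string column.
    public final class NormalizeText extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;  // Hive passes NULLs through; preserve them.
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

A UDF like this would be packaged in a JAR, added with ADD JAR, and registered with CREATE TEMPORARY FUNCTION before use in HiveQL.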

TECHNICAL SKILLS

Big Data / Hadoop Framework: Spark, HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, Amazon AWS (EMR)

Databases: Cassandra, MySQL, Oracle

Languages: Java, Scala 2.11.0, Python 2.7/3.x, Pig Latin, HiveQL

Web Technologies: JSP, Servlets, JavaBeans, JDBC, AWT

Operating Systems: CentOS, Ubuntu, Macintosh, Windows 10, Windows 98/2000/NT/XP

Front-End: HTML/HTML5, CSS3, JavaScript/jQuery

Development Tools: Microsoft SQL Studio, DbVisualizer, Eclipse, IntelliJ, MySQL Workbench, PyCharm, Sublime Text, PL/SQL Developer

Reporting Tool: Tableau, SAP Business Objects

Office Tools: Microsoft Office Suite

Development Methodologies: Agile/Scrum, Waterfall

Other skills: Machine Learning, Internet of Things.

PROFESSIONAL EXPERIENCE

Confidential, Atlanta GA

Spark/Hadoop Consultant

Responsibilities:

  • Designed the entire architecture of the data pipeline for analysis.
  • Worked on Scala 2.10 jobs using Spark 1.6.0 for data processing with the RDD and DataFrame APIs.
  • Wrote Scala/Spark jobs to integrate real-time data coming from Kafka and to parse and stage it.
  • Wrote Scala applications to load processed data into DataStax Cassandra 4.8.
  • Performance-tuned Spark and Sqoop jobs.
  • Worked on creating a Spark application to load data into HBase via Apache Phoenix.
  • Worked on migrating queries written in Hive to Spark applications.
  • Worked on integrating Spark applications with microservices for request/response to the web UI.
  • Built Spark applications and deployed them on the cluster.
  • Worked on Apache NiFi to uncompress and move JSON files from the local file system to HDFS.
  • Created Oozie jobs for workflows of Spark, Sqoop, and shell scripts.
  • Created a Spark application to load data into dynamically partitioned Hive tables (see the sketch after this list).
  • Worked on stateful transformations in Spark applications.
  • Worked on Hive scripts to apply various transformations and save the data in Parquet file format.
  • Wrote a MapReduce job to compare two CSV files and save the processed output into HBase.
  • Hands-on design and development of an application using Hive UDFs.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Supported data analysts in running Pig and Hive queries.
  • Created dynamic partitioned tables in Hive.
  • Worked on Sqoop jobs to import data from Oracle EDB into HDFS.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Created Reports using Tableau on HiveServer2.
  • Worked on POCs to integrate Spark with other tools.
  • Worked on data modeling for dimension and fact tables in the Hive warehouse.
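
The dynamic-partition load above can be sketched as follows. The original jobs were written in Scala; this minimal Spark 1.6 sketch uses the equivalent Java API for consistency with the other examples, and the staging path, table, and column names are hypothetical:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;

    public class HiveDynamicPartitionLoad {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("HiveDynamicPartitionLoad"));
            HiveContext hc = new HiveContext(sc.sc());

            // Dynamic partitioning must be enabled before the insert.
            hc.setConf("hive.exec.dynamic.partition", "true");
            hc.setConf("hive.exec.dynamic.partition.mode", "nonstrict");

            // Hypothetical staging path; warehouse.events is assumed to
            // already exist, partitioned by event_date.
            DataFrame staged = hc.read().json("hdfs:///staging/events");
            staged.registerTempTable("staged_events");

            // Hive routes each row to its partition using the last SELECT column.
            hc.sql("INSERT INTO TABLE warehouse.events PARTITION (event_date) "
                 + "SELECT id, payload, event_date FROM staged_events");
        }
    }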

Environment: CDH 5.7.6, Hadoop 2.6, Spark 1.6.0, Scala 2.10, HBase 1.2.0, Apache Phoenix 4.7, Maven, Apache NiFi, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, IntelliJ, Oracle EDB, DataStax Cassandra 4.8, CentOS, Windows, Python 3.0, Tableau 9.0

Confidential

Spark/Hadoop Consultant

Responsibilities:

  • Designed the entire architecture of the data pipeline for analysis.
  • Worked on Scala jobs using Spark 1.4.1 for data processing with the RDD and DataFrame APIs.
  • Worked on Sqoop jobs to import data from Oracle into HDFS.
  • Wrote Scala jobs to integrate and parse real-time data coming from various messaging queues.
  • Wrote Scala scripts to load processed data into DataStax Cassandra 4.8.
  • Performance-tuned Spark and Sqoop jobs.
  • Wrote Python scripts to automate the entire job execution flow and integrate it into one script.
  • Worked on Hive scripts to apply various transformations and save the data in ORC file format.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Wrote a MapReduce job to compare two TSV files and save the processed output into Oracle (see the sketch after this list).
  • Hands-on design and development of an application using Hive UDFs.
  • Built a custom parser/loader application for data migration to HBase.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Supported data analysts in running Pig and Hive queries.
  • Resolved Kerberos issues by using the MIT Kerberos ticket tool to connect Hive to various tools.
  • Transformed Ab Initio processes into Hadoop using Pig and Hive.
  • Created partitioned tables in Hive.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Created reports using Tableau on HiveServer2.
  • Worked on data modeling for dimension and fact tables in the Hive warehouse.
  • Scheduled jobs through the Confidential EBS internal scheduling system.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
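
The TSV comparison job above can be sketched as a tag-and-join MapReduce program; the column layout, key position, and class names here are hypothetical:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class TsvCompare {

        // Mapper: keys each record by its first TSV column, tagged with the source file.
        public static class CompareMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String src = ((FileSplit) ctx.getInputSplit()).getPath().getName();
                String[] cols = value.toString().split("\t", 2);
                String payload = cols.length > 1 ? cols[1] : "";
                ctx.write(new Text(cols[0]), new Text(src + "\t" + payload));
            }
        }

        // Reducer: flags keys present in only one file or whose payloads differ.
        public static class CompareReducer extends Reducer<Text, Text, Text, Text> {
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context ctx)
                    throws IOException, InterruptedException {
                List<String> payloads = new ArrayList<String>();
                for (Text v : values) {
                    // Strip the source-file tag; keep only the record payload.
                    payloads.add(v.toString().substring(v.toString().indexOf('\t') + 1));
                }
                if (payloads.size() != 2) {
                    ctx.write(key, new Text("MISSING_IN_ONE_FILE"));
                } else if (!payloads.get(0).equals(payloads.get(1))) {
                    ctx.write(key, new Text("PAYLOAD_MISMATCH"));
                }
            }
        }
    }

The flagged output would then be exported to Oracle, for example with Sqoop or DBOutputFormat.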

Environment: Hortonworks Data Platform 2.3.4, Hadoop 2.7, Spark 1.4.1, Scala 2.10, SBT 0.13, Sqoop 1.4.6, MapReduce, HDFS, Pig, Hive 0.13, Java, Oracle 11g, DataStax Cassandra 4.8, CentOS, Windows, Python 3.0

Confidential, Rensselaer, NY

Hadoop Consultant

Responsibilities:

  • Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode high availability, capacity planning, and slots configuration.
  • Configured MySQL Database to store Hive metadata.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation, and how they translate to MapReduce jobs.
  • Worked on tuning Hive and Pig scripts to improve performance.
  • Good experience in writing MapReduce programs in Java in an MRv2/YARN environment.
  • Good experience in troubleshooting performance issues and tuning the Hadoop cluster.
  • Administered, installed, upgraded, and managed HDP 2.2, Pig, Hive, and HBase.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Knowledge of performance troubleshooting and tuning of Hadoop clusters.
  • Developed Pig scripts to transform raw data into intelligent data as specified by business users.
  • Involved in the start-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling of a few jobs).
  • Used Tableau 9.0 to create reports representing analysis in graphical format.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented a script to transmit sysprin information from Teradata and Oracle to HBase using Sqoop.
  • Implemented best-income logic using Pig scripts and UDFs.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on tuning the performance of Pig queries.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Managed data coming from different sources.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Experience in managing and reviewing Hadoop log files.
  • Implemented a Mahout recommendation engine.
  • Managed jobs using the Fair Scheduler.
  • Created MapReduce jobs using Pig Latin and Hive queries.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Designed a data warehouse using Hive.
  • Created partitioned tables in Hive (see the sketch after this list).
  • Developed Pig Latin scripts to extract data from web server output files to load into HDFS.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
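
A minimal sketch of creating a partitioned Hive table programmatically over HiveServer2, the same endpoint the Tableau reports above connected to; the host, schema, and table definition are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class CreatePartitionedTable {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Hypothetical HiveServer2 endpoint and credentials.
            Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hs2-host:10000/default", "etl_user", "");
            Statement stmt = conn.createStatement();
            stmt.execute("CREATE TABLE IF NOT EXISTS web_logs ("
                    + " ip STRING, url STRING, status INT)"
                    + " PARTITIONED BY (log_date STRING)"
                    + " STORED AS ORC");
            stmt.execute("ALTER TABLE web_logs ADD IF NOT EXISTS"
                    + " PARTITION (log_date = '2015-06-01')");
            stmt.close();
            conn.close();
        }
    }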

Environment: Hortonworks Data Platform 2.2, Hadoop, Sqoop, Flume, Oozie, MapReduce, HDFS, Pig, Hive, HBase, Java, Oracle 10g, MySQL, Ubuntu, Python 3.0

Confidential, Pittsburgh, PA

Hadoop Developer

Responsibilities:

  • Involved in review of functional and non-functional requirements.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Monitored the Hadoop cluster using tools like Nagios, Ganglia, and Cloudera Manager.
  • Managed and reviewed Hadoop log files.
  • Wrote Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Integrated NoSQL databases (HBase, Cassandra); see the sketch after this list.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system to HDFS.
  • Implemented Apache Solr for fast information retrieval.
  • Created reports using Tableau on the Hive ecosystem.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Gained very good business knowledge of fraud suspect identification, the appeals process, etc.
  • Developed a custom file system plug-in for Hadoop so it can access files on the data platform.
  • This plug-in allows Hadoop MapReduce programs, HBase, Pig, and Hive to work unmodified and access files directly.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Designed and developed Talend jobs to extract data from Oracle into MongoDB.
  • Used R to feed and analyze datasets collected from web data and implemented statistical techniques (regression) for analysis.
  • Represented data in pie charts, histograms, and line graphs using R.
  • Extracted feeds from social media sites such as Facebook and Twitter using Python scripts.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Set up a Hadoop cluster on Amazon EC2 using Apache Whirr for a POC.
  • Wrote a recommendation engine using Mahout.
  • Configured services with the Hue web interface.
  • Effectively used Sqoop to transfer data from databases (MySQL, Oracle) to HDFS and Hive.
  • Created Hive managed and external tables defined with static and dynamic partitions.
  • Supported data analysts in running MapReduce programs.
  • Contributed to the creation and maintenance of system documentation.
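
A minimal sketch of the HBase integration above, using the classic HBase client API of that era; the table, column family, and values are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseWriter {
        public static void main(String[] args) throws Exception {
            // Picks up hbase-site.xml from the classpath.
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "customer_events");  // hypothetical table
            try {
                Put put = new Put(Bytes.toBytes("row-0001"));
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("source"),
                        Bytes.toBytes("web"));
                table.put(put);
            } finally {
                table.close();
            }
        }
    }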

Environment: Java (JDK 1.7), MapReduce, HDFS, Pig, Hive, HBase, Flume, Sqoop, Oozie, Cloudera Manager, Unix.

Confidential, IL

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, the HBase database, and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Wrote MapReduce jobs to analyze the data.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Resource management of the Hadoop cluster, including adding/removing cluster nodes for maintenance and capacity needs.
  • Loaded log data into HDFS.
  • Monitored the Hadoop cluster using tools like Cloudera Manager.
  • Installed and configured Hive and wrote Hive UDFs.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Automated scripts to monitor HDFS and HBase through cron jobs (see the sketch after this list).
  • Provided cluster coordination services through ZooKeeper.
  • Planned, designed, and implemented the processing of massive amounts of marketing information, complete with information enrichment, text analytics, and natural language processing.
  • Prepared a multi-cluster test harness to exercise the system for performance and failover.
  • Developed a high-performance cache, making the site stable and improving its performance.
  • Created a complete processing engine, enhanced for performance.
  • Provided administrative support for parallel computation research on a 24-node Fedora/Linux cluster.
  • Built and supported standards-based infrastructure capable of supporting tens of thousands of computers in multiple locations.
  • Negotiated and managed projects related to designing and deploying this architecture.
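
A minimal sketch of the kind of HDFS monitoring check referenced above, suitable for running from cron; the alert threshold is a hypothetical example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class HdfsCapacityCheck {
        public static void main(String[] args) throws Exception {
            // Reads fs.defaultFS from core-site.xml on the classpath.
            FileSystem fs = FileSystem.get(new Configuration());
            FsStatus status = fs.getStatus();
            double usedPct = 100.0 * status.getUsed() / status.getCapacity();
            System.out.printf("HDFS used: %.1f%%%n", usedPct);
            if (usedPct > 80.0) {  // hypothetical alert threshold
                System.err.println("WARNING: HDFS usage above threshold");
                System.exit(1);    // non-zero exit lets cron-driven alerting fire
            }
        }
    }
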
Environment: HDFS, HBase, Hive, Pig, Java, JDBC, Struts, Maven, Subversion, JUnit, SQL, PuTTY, and Eclipse

Confidential, OH

Hadoop Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools, including Pig, the HBase database, and Sqoop.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Flume, Hive, Pig, Sqoop, and HBase on the Hadoop cluster.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Implemented a nine-node CDH3 Hadoop cluster on Red Hat Linux.
  • Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning, and slots configuration.
  • Resource management of the Hadoop cluster, including adding/removing cluster nodes for maintenance and capacity needs.
  • Involved in loading data from the UNIX file system to HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Responsible for managing data coming from different sources.
  • Installed and configured Hive and wrote Hive UDFs.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Provided cluster coordination services through ZooKeeper (see the sketch after this list).
  • Experience in managing and reviewing Hadoop log files.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop.
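
A minimal sketch of connecting to the ZooKeeper quorum used for the cluster coordination above; the quorum address and znode path are hypothetical:

    import java.util.List;
    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ZkCheck {
        public static void main(String[] args) throws Exception {
            final CountDownLatch connected = new CountDownLatch(1);
            // Hypothetical quorum address; 30-second session timeout.
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 30000,
                    new Watcher() {
                        public void process(WatchedEvent event) {
                            if (event.getState() == Event.KeeperState.SyncConnected) {
                                connected.countDown();
                            }
                        }
                    });
            connected.await();
            // List the znodes HBase registers for its live region servers.
            List<String> servers = zk.getChildren("/hbase/rs", false);
            System.out.println("Live region servers: " + servers);
            zk.close();
        }
    }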

Environment: Hadoop, HDFS, Hive, Flume, HBase, Sqoop, Pig, Java (JDK 1.6), Eclipse, MySQL, Ubuntu, ZooKeeper

Confidential, IL

J2EE Developer

Responsibilities:

  • Involved in presentation-tier development using the JSF framework and ICE Faces tag libraries.
  • Involved in business requirement gathering and technical specifications.
  • Implemented J2EE standards and MVC2 architecture using the JSF framework.
  • Implemented Servlets, JSP, and Ajax to design the user interface.
  • Extensive experience in building GUI (Graphical User Interface) using JSF and ICE Faces.
  • Developed Rich Enterprise Applications using ICE Faces and Portlets technologies.
  • Experience using ICE Faces Tag Libraries to develop user interface components.
  • Used JSF, JSP, JavaScript, HTML, and CSS for manipulating and validating data and customizing error messages in the user interface.
  • Used EJB (Session beans) to implement the business logic, JMS for communication for sending updates to various other applications and MDB for routing priority requests.
  • All the Business logic in all the modules is written in core Java.
  • Wrote web services using SOAP for sending data to and getting data from the external interface.
  • Developed web-based reporting for the monitoring system with HTML and Tiles using the Struts framework.
  • The middleware services layer was implemented using EJB (stateless Enterprise JavaBeans) in a WebSphere environment.
  • Used Design patterns such as Business delegate, Service locator, Model View Controller, Session façade, DAO.
  • Funds Transfers are sent to another application using JMS technology asynchronously.
  • Involved in implementing the JMS (Java messaging service) for asynchronous communication.
  • Involved in writing JMS publishers to post messages (see the sketch after this list).
  • Involved in writing MDB (Message Driven Beans) as subscribers.
  • Created stored procedures using PL/SQL for data modification (using DML insert, update, and delete) in Oracle.
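
A minimal sketch of the JMS publishing described above; the JNDI names are hypothetical examples of what would be configured in the application server:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.naming.InitialContext;

    public class FundsTransferPublisher {
        public void publish(String payload) throws Exception {
            InitialContext ctx = new InitialContext();
            // Hypothetical JNDI names bound in the application server.
            ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/TransferCF");
            Queue queue = (Queue) ctx.lookup("jms/FundsTransferQueue");
            Connection conn = cf.createConnection();
            try {
                Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer = session.createProducer(queue);
                TextMessage msg = session.createTextMessage(payload);
                producer.send(msg);  // asynchronous hand-off to the consuming application
            } finally {
                conn.close();
            }
        }
    }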

Environment: J2EE, EJB, JSF, ICE Faces, Web Services, XML, XSD, Agile, Microsoft Visio, ClearCase, Oracle 9i/10g, WebLogic 8.1/10.3, RAD, Log4j, Servlets, JSP, Unix.

Confidential

J2EE Developer

Responsibilities:

  • Created stored procedures using PL/SQL for data modification (using DML insert, update, and delete) in Oracle.
  • Involved in Requirement Analysis, Development and Documentation.
  • Used MVC architecture (Jakarta Struts framework) for the web tier.
  • Participated in developing the form beans and action mappings required for the Struts implementation, and the validation framework using Struts.
  • Developed front-end screens with JSP using Eclipse.
  • Involved in the development of the Medical Records module; responsible for developing the functionality using Struts and EJB components.
  • Coded DAO objects using JDBC (DAO pattern); see the sketch after this list.
  • XML and XSDs are used to define data formats.
  • Implemented J2EE design patterns (Value Object, Singleton, DAO) for the presentation, business, and integration tiers of the project.
  • Involved in Bug fixing and functionality enhancements.
  • Designed and developed a robust logging mechanism for each order process using Log4j.
  • Involved in writing Oracle SQL queries.
  • Involved in the check-in and check-out process using CVS.
  • Created SAP Business Objects Reports.
  • Developed additional functionality in the software as per business requirements.
  • Involved in requirement analysis and complete development of client side code.
  • Followed Sun coding and documentation standards.
  • Participated in project planning with business analysts and team members to analyze the business requirements and translated them into working software.
  • Developed software application modules using a disciplined software development process.
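
A minimal sketch of a DAO invoking one of the PL/SQL stored procedures above via JDBC; the connection details and procedure name are hypothetical:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class OrderDao {
        // Hypothetical connection details; in the application these came from a pool.
        private static final String URL = "jdbc:oracle:thin:@db-host:1521:ORCL";

        public void updateOrderStatus(long orderId, String status) throws Exception {
            Connection conn = DriverManager.getConnection(URL, "app_user", "secret");
            CallableStatement cs = null;
            try {
                cs = conn.prepareCall("{call update_order_status(?, ?)}");
                cs.setLong(1, orderId);
                cs.setString(2, status);
                cs.execute();
            } finally {
                if (cs != null) cs.close();
                conn.close();
            }
        }
    }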

Environment: Java, J2EE, JSP, EJB, ANT, Struts 1.2, Log4j, WebLogic 7.0, JDBC, MyEclipse, Windows XP, CVS, Oracle, SAP Business Objects, Netezza
