
Hadoop/Spark/Big Data Consultant Resume

Des Moines, Iowa

SUMMARY:

  • Over 7 years of IT experience in Hadoop and Big Data ecosystems.
  • Experience in designing and developing large-scale distributed data management, analytics and J2EE enterprise applications.
  • Experience in clearly articulating pros and cons of various technologies and platforms.
  • Strong development skills on Spark, MapReduce, Pig, Hive, HBase, Sqoop and Flume.
  • Proven ability to excel in fast-paced development environments using the latest frameworks/tools (Spark with Python - PySpark).
  • Hands on experience in optimizing MapReduce algorithms using combiners and partitioners to get the best results.
  • Experience in designing NoSQL databases like HBase and Cassandra.
  • Experience in importing and exporting data to/from HDFS and Hive.
  • Developed Spark scripts by using Scala as per the requirement.
  • Hands on experience on Apache Spark and Scala.
  • Experience in designing, building, documenting and consuming REST APIs using Swagger, and in developing Spring Boot microservices.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data (a sketch of such a pipeline follows this list).
  • Analyzed large volumes of structured data using Spark SQL.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Experienced with Apache Spark for implementing advanced procedures such as text analytics, using in-memory computing capabilities written in Scala.
  • Hands on experience in designing and developing full-text search applications.
  • Hands on experience on SOLR schema creation, indexing data and Elasticsearch.
  • Extensively used Solr query types such as phrase, wildcard and range queries.
  • Experience in Apache Spark cluster and streams processing using Spark Streaming.
  • Involved in SolrCloud implementation and created custom data types and filters.
  • Experience in tuning Solr documents for best relevance.
  • Good experience on Storm, Kafka, Flume and Sqoop.
  • Performed cluster coordination services through Zookeeper.
  • In-Depth knowledge of Scala and Experience building Spark applications using Scala.
  • Good experience and knowledge on Amazon Web Services (AWS), Hortonworks Data Platform (HDP) and Cloudera Distribution Hadoop (CDH).
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Good Knowledge on Hadoop Cluster architecture and monitoring the cluster.
  • Excellent understanding of Hadoop architecture, YARN and various components like HDFS, Name node, Data Node, Job Tracker and Task Tracker.
  • Worked on implementing Spark and Storm frameworks to ingest data in real time and apply transformations in Scala.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Expert level skills in J2EE components/frameworks - EJB 2.0/3.0, Servlets, JSP, JSF, Java Beans, JDBC, JMS, RMI.
  • Extensive experience in using Struts, Hibernate 3.0, EJB 2.0 and J2EE design patterns.
  • Extensive experience in SOA and Web Services (SOAP / RESTful).
  • Developed multi-tiered object-oriented system architectures utilizing Use cases, UML diagrams.
  • Familiar with all aspects of technology projects including business requirements, design specifications, design patterns, deployment and MQ Series.
  • Extensive development work on IBM WebSphere 7.0/8.5, RAD 7.0/8.5, WebLogic 7.0/8.1 and Oracle Application Server.
  • Experience writing shell scripts to execute HiveQL.
  • Wrote automated shell scripts in Linux/Unix environment using bash.
  • Migrated HiveQL queries into SparkSQL to improve performance.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into HBase.
  • Used the DataStax Spark connector to store data into and retrieve data from Cassandra.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into Cassandra.
  • Developed functional programs in Scala to connect the streaming data application, gather web data in JSON and XML, and pass it to Flume.
  • Expert level familiarity with Tomcat and Apache 2.x web servers.
  • Extensive knowledge of Scrum, Agile and the Software Development Life Cycle (SDLC), with a thorough understanding of phases such as requirements, development, integration, documentation, testing and deployment of client-server architectures and web technologies in medium to large-scale enterprise projects.
  • Able to work effectively at all organizational levels and manage rapid changes.
  • Good communication skills and a genuine team player with good organizational and self-management skills.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
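The bullets above describe Kafka/Spark/Hive ingestion pipelines; below is a minimal sketch of that pattern in Scala, assuming Spark 2.x with the spark-streaming-kafka-0-10 integration. The broker address, topic name and Hive table (analytics.raw_events) are illustrative placeholders rather than details from the actual engagements.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaSparkHivePipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-spark-hive-pipeline")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Micro-batches every 30 seconds.
    val ssc = new StreamingContext(spark.sparkContext, Seconds(30))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",          // placeholder broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "pipeline-consumer",
      "auto.offset.reset"  -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Transform each micro-batch and append it to an existing Hive table.
    stream.foreachRDD { rdd =>
      val batch = rdd.map(_.value).toDF("raw_event")
      batch.write.mode("append").insertInto("analytics.raw_events")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

In a production pipeline the record values would be parsed and enriched before the write and Kafka offsets committed explicitly; this sketch only shows the wiring.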

TECHNICAL SKILLS:

Big Data: Hadoop (HDFS, MapReduce), Sqoop, Flume, Hive, Pig, HBase, Cassandra, SOLR, ZooKeeper, Storm, Spark, Scala, Kafka, MongoDB, Predix, AWS.

Languages/Frameworks: Java/J2EE (EJB, JSP, JDBC, JMS, JAXB, JNDI, JAX-RPC, SOAP/RESTful Web Services), JSF, ADF, Struts, Swing, XML, SQL and Shell Programming.

Web/Application Servers: WebSphere, WebLogic Server and Apache Tomcat.

Web Technologies: XML, XSLT, CSS, HTML, JavaScript.

Databases: Oracle 11g/10g/9i/8i, MS Access.

PROFESSIONAL EXPERIENCE:

Confidential, Des Moines, Iowa

Hadoop/Spark/Big Data Consultant

Responsibilities:

  • Performed performance optimizations on Spark/Scala applications.
  • Used Spark as an ETL tool.
  • Implemented best offer logic using Pig scripts and Pig UDFs.
  • Analyzed large volumes of structured data using Spark SQL.
  • Responsible for loading data from external systems, and parsing and cleaning it for data scientists.
  • Created Docker images for Spark and Postgres.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Hive, Spark, Python, Sqoop, flume, Oozie.
  • Avoided MapReduce by using PySpark, boosting performance roughly 3x.
  • Used Spark SQL to access Hive tables for analytics and fast processing.
  • Imported data from a Postgres database into Hive using Sqoop with optimized techniques.
  • Developed a Cassandra application to ingest log and time-series data (see the Cassandra connector sketch following this project).
  • Developed a Spark Streaming application to process real-time events.
  • Researched customer needs and developed applications accordingly.
  • Developed microservices using Spring Boot to interact with MongoDB for storing analytical configurations.
  • Built a Cassandra cluster in the AWS environment.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Fine-tuned Cassandra and Spark clusters and Hive queries.
  • Traveled to customer sites to identify current drilling issues.
  • Responsible for creating and maintaining the microservices, Postgres and RabbitMQ services in the cloud environments (GE Predix, AWS and Azure).
  • Migrated required data from Oracle and MySQL into HDFS using Sqoop, and imported flat files of various formats into HDFS.
  • Developed Spark scripts by using Scala as per the requirement.
  • Involved in business requirement gathering, analysis and preparing design documents.
  • Developed MapReduce jobs to ingest data into HBase and index it into SOLR.
  • Involved in preparing SOLR collection and schema creation.
  • Developed Spark jobs using Scala for processing locomotive events
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Involved in debugging and fine tuning the SOLR cluster and queries
  • Designed and developed applications using the SolrJ API to index and search documents.
  • Involved in importing documents data from external system to HDFS
  • Developed Spark streaming applications to ingest emails and instant messages into HBase and Elasticsearch.
  • Involved in troubleshooting performance issues and tuning the Hadoop cluster.
  • Wrote code to interact with HBase using the HBase Java client API (see the sketch after this list).
  • Managing and allocating tasks for onsite and offshore resources
  • Involved in setting up Kerberos and authenticating from web application
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
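A minimal sketch of the HBase Java client interaction referenced in the bullets above, assuming the HBase 1.x client API; the table name, column family and row key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientSketch {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("message_events"))

    // Write one cell: column family "d", qualifier "payload".
    val rowKey = Bytes.toBytes("user42|2017-06-01T12:00:00")
    val put = new Put(rowKey)
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("{\"type\":\"email\"}"))
    table.put(put)

    // Read the same cell back.
    val result = table.get(new Get(rowKey))
    val payload = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("payload")))
    println(payload)

    table.close()
    connection.close()
  }
}
```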

Environment: HDP 2.6, AWS, Azure, Cassandra, Rabbit MQ, Postgres, SPARK, Hive, Elasticsearch, Hadoop, HDFS, Docker, Sqoop, MongoDB, Spring Boot, Swagger
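For the Cassandra ingestion and cluster work listed in this project, here is a hedged sketch using the DataStax Spark Cassandra connector; the keyspace, table, columns and contact host are assumptions for illustration only.

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraIngestSketch {
  // Field names map to snake_case columns via the connector's default column mapper.
  case class SensorReading(deviceId: String, eventTime: Long, value: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("cassandra-ingest")
      .set("spark.cassandra.connection.host", "cassandra-node1") // placeholder host
    val sc = new SparkContext(conf)

    // Write a small batch of time-series readings into Cassandra.
    val readings = sc.parallelize(Seq(
      SensorReading("rig-42", System.currentTimeMillis(), 98.6)))
    readings.saveToCassandra("telemetry", "sensor_readings",
      SomeColumns("device_id", "event_time", "value"))

    // Read the table back as an RDD of CassandraRow.
    sc.cassandraTable("telemetry", "sensor_readings")
      .take(10)
      .foreach(println)

    sc.stop()
  }
}
```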

Confidential, Atlanta GA

Hadoop/Spark Developer

Responsibilities:

  • Developed MapReduce jobs to process documents
  • Responsible for SOLR implementation and setup collections in SolrCloud.
  • Involved in Hadoop cluster setup and configuring Hadoop Ecosystems.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote code to parse external documents before copying them to HDFS.
  • Developed Spark scripts by using Scala as per the requirement.
  • Developed HBase ingestion for documents and performed tuning.
  • Developed a web application to search and ingest documents in SOLR using the SolrJ API (see the sketch after this list).
  • Developed Spark jobs using Scala for processing locomotive events
  • Responsible for interacting with business partners and gather requirements and prepare technical design documents.
  • Developed a service-oriented architecture (SOA) based design of the application.
  • Responsible for writing detailed design documents, class diagrams and sequence diagrams.
  • Developed composite components using JSF 2.0.
  • Coordinating with the Onsite team and Clients.
  • Prepared unit test cases and executed them.
  • Involved in integration testing and user acceptance support.
  • Involved in the Production Support.
  • Collaborate with product/business users, data scientists and other engineers to define requirements to design, build and tune complex solutions.
  • Involved in business requirement gathering, analysis and preparing design documents
  • Involved in preparing SOLR collection and schema creation.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Involved in debugging and fine tuning the SOLR cluster and queries
  • Involved in importing document data from external systems to HDFS.
  • Developed Spark Streaming applications to process real-time events and ingest emails and instant messages into HBase and Elasticsearch.
  • Managing and allocating tasks for onsite and offshore resources
  • Involved in setting up Kerberos and authenticating from web application
  • Involved in refactoring the existing application to improve its performance.
  • Interacting with client to map the legacy data with SCOPE specific data.
  • Developed Java service classes to interface between the application and external systems.
  • Wrote SQL queries to create the batch table.
  • Involved in the build process and ran the deployment procedure in the UNIX environment on a regular basis.
  • Monitored log files on a regular basis in the UNIX environment.
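A minimal SolrJ sketch matching the SOLR 5.2.1 environment below: index one document, then run a phrase query against it. The collection URL and field names are illustrative, and the constructor form follows the SolrJ 5.x API.

```scala
import scala.collection.JavaConverters._

import org.apache.solr.client.solrj.SolrQuery
import org.apache.solr.client.solrj.impl.HttpSolrClient
import org.apache.solr.common.SolrInputDocument

object SolrJSketch {
  def main(args: Array[String]): Unit = {
    // SolrJ 5.x constructor, pointed at a single collection.
    val solr = new HttpSolrClient("http://solr-host:8983/solr/documents")

    // Index one document.
    val doc = new SolrInputDocument()
    doc.addField("id", "doc-0001")
    doc.addField("title", "Locomotive maintenance report")
    doc.addField("body", "Full extracted text of the document ...")
    solr.add(doc)
    solr.commit()

    // Phrase query against the title field.
    val query = new SolrQuery("title:\"maintenance report\"")
    query.setRows(10)
    val response = solr.query(query)
    response.getResults.asScala.foreach(d => println(d.getFieldValue("id")))

    solr.close()
  }
}
```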

Environment: Hortonworks Data Platform (HDP 2.3), Hadoop, HDFS, Spark, Kafka, Hive, SOLR 5.2.1, HBase, Sqoop, Solr, Sun Solaris, Elasticsearch 2.0.0, RSA, Primefaces, JSF, RAD 8/8.5, AngularJS, Websphere Application Server 8/8.5, Java 1.7, Subversion, EJB 3.0, Oracle 11g.

Confidential, Arlington, VA

Hadoop Developer

Responsibilities:

  • Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data processing.
  • Used Impala to read, write and query the Hadoop data in HDFS and configured Kafka to read and write messages from external programs.
  • Used Pig as an ETL tool to perform transformations, joins and some pre-aggregations before storing the data onto HDFS.
  • Created Stored Procedures to transform the Data and worked extensively in SQL for various needs of the transformations while loading the data.
  • Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames (see the sketch after this list).
  • Implemented Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
  • Responsible for loading bulk amounts of data into HBase using MapReduce by directly creating HFiles and loading them.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data.
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
  • Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
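The bullets above mention converting Hive/SQL queries into Spark DataFrame transformations; the sketch below shows one such conversion, assuming a Spark 2.x SparkSession with Hive support (on Spark 1.x the same logic would run through HiveContext). The sales.transactions table and its columns are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hiveql-to-dataframes")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL aggregation run as-is against the metastore table ...
    val viaHiveQl = spark.sql(
      """SELECT customer_id, SUM(amount) AS total_amount
        |FROM sales.transactions
        |WHERE tx_date >= '2016-01-01'
        |GROUP BY customer_id""".stripMargin)

    // ... and the equivalent DataFrame transformation chain.
    val viaDataFrames = spark.table("sales.transactions")
      .filter(col("tx_date") >= "2016-01-01")
      .groupBy("customer_id")
      .agg(sum("amount").alias("total_amount"))

    viaDataFrames.write.mode("overwrite").saveAsTable("sales.customer_totals")
    spark.stop()
  }
}
```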

Environment: Cloudera, Hadoop, HDFS, Hive, Impala, Spark SQL, Python, Sqoop, Oozie, Storm, Spark, Scala, MySQL, Shell Scripting

Confidential, Concord, N.C

Hadoop Developer

Responsibilities:

  • Involved in importing data from external systems to HDFS
  • Analyzed the requirements provided by Business Analysts and prepared impact analysis and estimates.
  • Prepared use cases and functional specification documents.
  • Prepared high-level design documents.
  • Wrote functional and technical design specifications, test documents and unit test cases, and developed modules.
  • Involved in setting up Hortonworks on Amazon Web Services (AWS) for POC.
  • Developed MapReduce jobs and Pig scripts
  • Managed and scheduled jobs using Oozie.
  • Responsible for designing and managing Sqoop jobs that load data from Oracle to HDFS
  • Wrote Hive generic UDFs to perform business logic operations at record level (see the UDF sketch after this list).
  • Involved in business requirement gathering and analysis
  • Involved in troubleshooting performance issues and tuning the Hadoop cluster.
  • Responsible for writing Pig scripts and Hive queries.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Created Hive managed and external tables to store processed data
  • Responsible for creating Hive tables, partitions, bucketing, loading data and writing hive queries
  • Developed web application which interacts with HBase using HBase client API
  • Responsible for developing conceptual designs from requirements.
  • Responsible for writing detail design documents and class diagrams and sequence diagrams.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Involved in developing SQL queries, PL/SQL stored procedures, and functions.
  • Used Struts framework for implementing the Web tier of the application.
  • Performed code reviews and refactoring during development, strictly adhering to the development checklist.
  • Involved in the Integration Testing and User Acceptance Testing support
  • Used design patterns extensively to achieve clean separation of layers, including MVC, DAO and Singleton.
  • Experience in configuring Hadoop Clusters and HDFS.
  • Prepared the test plans and executed test cases for unit, integration and system testing.
  • Involved in code promotion and rebase of DEV and TEST streams using ClearCase
  • Involved in the entire software development cycle spanning requirements gathering, analysis, design, development, building, testing, and deployment.
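As an illustration of the record-level Hive UDF work above, here is a minimal sketch in Scala. It uses the simple UDF interface rather than GenericUDF for brevity, and the account-masking rule, class name and registration statements (shown in comments) are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Masks all but the last four characters of an account number.
// Registration in Hive (illustrative):
//   ADD JAR hdfs:///apps/udfs/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'MaskAccountNumber';
//   SELECT mask_account(account_number) FROM accounts;
class MaskAccountNumber extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val value = input.toString
      val masked =
        if (value.length <= 4) value
        else "*" * (value.length - 4) + value.takeRight(4)
      new Text(masked)
    }
  }
}
```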

Environment: Web Services, Oracle 10g, JavaScript, CSS, HTML, AJAX, Unix Shell Scripts, JUnit, Hortonworks Data Platform (HDP 2.1), HDFS, Hive, HBase, Pig, Sqoop, Flume, Kafka
