Big Data Consultant Resume
Richardson, TX
SUMMARY
- Broad IT experience in architecture, analysis, design, development, implementation, maintenance, and support, including developing strategic methods for deploying big data technologies to efficiently solve big data processing requirements.
- Worked extensively on Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark (Scala), Sqoop, Flume, Kafka, Pig, Hive, HBase, Phoenix, Oozie, and ZooKeeper.
- Experience implementing Spark RDDs in Scala.
- Experience with data extraction, transformation, and loading (ETL) using Hive, Pig, and Spark (Scala).
- Experience in handling messaging services using Apache Kafka.
- Hands-on experience with streaming data ingestion and processing.
- Experienced in designing time-driven and data-driven automated workflows using Zena.
- Strong acumen for choosing the right Hadoop ecosystem components and providing effective solutions to big data problems.
- Well versed in design and architecture principles for implementing big data systems.
- Hands-on experience configuring and working with Sqoop to load data from multiple sources directly into HDFS.
- Expertise in relational databases such as Oracle, MySQL, and SQL Server.
- Well versed in the sprint ceremonies practiced in Agile methodology.
- Involved in all phases of the SDLC, including analysis, design, development, integration, and implementation.
- Strong analytical and problem-solving skills; highly motivated team player with good communication and interpersonal skills.
- Strong adherence to Agile standards.
- Provided 24/7 support as per company requirements.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Avro, Parquet and Snappy.
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR and Apache
Languages: Scala, Java, SQL, JavaScript, XML and C/C++
RDBMS: Teradata, Oracle 9i/10g/11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, Linux, Windows variants
Methodologies: Agile, Waterfall
NoSQL Databases: HBase, Cassandra, MongoDB
Version Control Tools: SVN, Git
PROFESSIONAL EXPERIENCE
Confidential, Richardson, TX
Big Data Consultant
Responsibilities:
- Involved in the complete SDLC of a big data project, including requirement analysis, design, coding, testing, and production.
- Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the results in Parquet format in HDFS (see the ingestion sketch after this list).
- Used Spark and Spark SQL to read the Parquet data and create the tables in Hive.
- Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
- Extensively worked on Sqoop to import/export data between RDBMS and Hive tables.
- Involved in implementing the data preparation solution responsible for data transformation as well as delivering user stories.
- Developed and tested data ingestion, preparation, and dispatch jobs.
- Created Hive external tables on top of HBase for easy querying.
- Created Zena processes for automation.
- Worked with Spark SQL for processing data in the Hive tables.
- Prepared Pig Latin scripts to perform transformations (ETL) as per the use case requirements.
- Created Hive target tables to hold the data after all ETL operations in Pig/Spark.
- Created HQL scripts to perform data validation once transformations were done as per the use case.
- Implemented Snappy compression on HBase tables to reclaim space in the cluster.
- Hands-on experience accessing and performing CRUD operations against HBase data.
- Integrated Phoenix to add a SQL layer on top of HBase, using the salting feature for the best read and write performance.
- Wrote shell scripts to automate the process, scheduling and calling them from Zena.
- Developed ETL processes using Spark, Scala, Hive, and HBase; collaborated closely with both the onsite and offshore teams.
- Provided 24/7 production support as per company requirements.
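A minimal sketch of the Kafka + Spark Streaming ingestion referenced above, assuming a hypothetical `events` topic, broker address, and HDFS landing path; the actual feed, schema, and batch interval were project-specific:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaToParquet").getOrCreate()
    import spark.implicits._

    // Micro-batch interval is an assumption; tune to the feed volume
    val ssc = new StreamingContext(spark.sparkContext, Seconds(60))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",               // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "ingest-demo",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Each micro-batch: Kafka records -> RDD -> DataFrame -> Parquet on HDFS
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val json = spark.createDataset(rdd.map(_.value()))
        spark.read.json(json)
          .write.mode("append")
          .parquet("hdfs:///data/landing/events")           // hypothetical landing path
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Writing Parquet per micro-batch keeps the landed data immediately queryable from the Hive tables defined over the same HDFS path.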
Environment: Hadoop, HDFS, YARN, MapReduce, Tez, Hive, Sqoop, Kafka, Pig, Spark, Scala, Java (JDK 1.8), Eclipse, IntelliJ, MySQL, ZooKeeper, Oracle, Shell Scripting, Zena, Phoenix.
Confidential, Greenville, SC
Big Data Consultant
Responsibilities:
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Created HQL queries and wrote Hive UDFs to implement business requirements.
- Involved in creating Hive tables, loading data into them, and writing Hive queries that run on both the MapReduce and Tez execution engines.
- Used compression techniques such as Snappy on Hive tables to save storage and optimize data transfer over the network.
- Created unit test cases covering both positive and negative scenarios.
- Installed the Zena workflow engine to run multiple Hive and Pig jobs independently, triggered by time and data availability.
- Performed CRUD operations against HBase data, following best practices for handling the data.
- Configured Kafka to read and write messages from external programs and to handle real-time data.
- Involved in a POC to migrate MapReduce jobs to Spark RDD transformations using Scala (see the sketch after this list).
- Extensively used Hive queries to retrieve data according to business requirements.
- Used Pig to analyze large data sets and wrote the results back to HBase.
- Worked closely with the application support team to deploy the developed jobs to production.
- Created dispatcher jobs using Sqoop export to load the data into Teradata target tables.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Zena workflows to automate loading data into HDFS and pre-processing it with Pig; developed Pig scripts to pull data from HDFS.
- Used the Tez framework to build high-performance Pig and Hive jobs.
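An illustrative sketch of the MapReduce-to-Spark POC referenced above, assuming a hypothetical tab-delimited input and a simple count-per-key job; the real jobs, schemas, and paths differed:

```scala
import org.apache.spark.sql.SparkSession

object MrToSparkPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MrToSparkPoc").getOrCreate()
    val sc = spark.sparkContext

    // Mapper equivalent: parse each row, drop malformed records, emit (key, 1)
    val pairs = sc.textFile("hdfs:///data/raw/events")       // hypothetical input path
      .map(_.split("\t", -1))
      .filter(_.length >= 2)
      .map(fields => (fields(0), 1L))

    // Reducer equivalent: sum the counts per key
    val counts = pairs.reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/poc/event_counts")   // hypothetical output path
    spark.stop()
  }
}
```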
Environment: Hadoop, HDFS, MapReduce, Hive, Flume, Sqoop, Pig, Kafka, MySQL, Unix, ZooKeeper, Java, Oracle, Shell Scripting, Zena.
Confidential, Charlotte, NC
Big Data Consultant
Responsibilities:
- Loaded and transformed large sets of structured and semi-structured data.
- Created MapReduce jobs, HiveQL queries, and Pig scripts.
- Imported data from an existing SQL Server database into Hive and HBase using Sqoop.
- Supported code/design analysis, strategy development, and project planning.
- Created reports for the BI team, using Sqoop to export data from HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in requirement analysis, design, and development.
- Exported and imported data between HDFS/Hive and relational sources using Sqoop.
- Worked closely with the business and analytics teams to gather system requirements.
- Loaded data into HBase tables and Hive partitioned tables according to business requirements.
- Created Hive tables using advanced Hive concepts such as bucketing, partitioning, and UDFs (see the sketch after this list).
- Created Phoenix tables for indexing and better performance.
- Fostered a friendly team environment.
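A rough sketch of the partitioned Hive table creation and load described above, run through Spark SQL with Hive support; the database, table, and column names are placeholders, and the bucketing and UDF details are omitted:

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionedLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitionedLoad")
      .enableHiveSupport()                  // read/write the Hive metastore
      .getOrCreate()

    // Partitioned target table (placeholder schema)
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales.orders_curated (
        |  order_id    BIGINT,
        |  customer_id BIGINT,
        |  amount      DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Allow dynamic partitions, then load from a placeholder staging table
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE sales.orders_curated PARTITION (order_date)
        |SELECT order_id, customer_id, amount, order_date
        |FROM sales.orders_staging""".stripMargin)

    spark.stop()
  }
}
```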
Environment: Hadoop Framework, MapReduce, HDFS, Core Java, Hive, Pig, Sqoop, Zena, Shell scripting, UNIX, Teradata, Oracle
Confidential, Cary, NC
Hadoop Developer
Responsibilities:
- Developed data pipelines using MapReduce, Flume, Sqoop, and Pig to ingest customer behavioral data into HDFS for analysis.
- Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
- Exported the analyzed data to relational databases using Sqoop so the BI team could visualize it and generate reports.
- Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis.
- Analyzed the data by performing Hive queries (Hive QL) and running Pig scripts (Pig Latin) to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Pig Latin scripts that run as MapReduce jobs.
- Developed product profiles using Pig and commodity UDFs.
- Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
- Created HBase tables and column families to store the user event data (see the sketch after this list).
- Wrote automated HBase test cases for data quality checks using HBase command-line tools.
- Created UDFs to store specialized data structures in HBase and Cassandra.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Used Impala to read, write, and query Hadoop data stored in HDFS, HBase, or Cassandra.
- Used the Tez framework for building high-performance jobs in Pig and Hive.
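A minimal sketch of creating an HBase table and column family for user event data, using the HBase 1.x client API typical of this stack; the table and family names are placeholders:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

object CreateUserEventTable {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()               // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin

    try {
      // Placeholder table with one short-named column family for event cells
      val descriptor = new HTableDescriptor(TableName.valueOf("user_events"))
      descriptor.addFamily(new HColumnDescriptor("e"))

      if (!admin.tableExists(descriptor.getTableName)) {
        admin.createTable(descriptor)
      }
    } finally {
      admin.close()
      connection.close()
    }
  }
}
```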
Environment: Hadoop, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Tez, Impala, MySQL, Unix.