
Big Data Engineer Resume


Dallas, TX

SUMMARY

  • 6+ years of software development experience, including 4+ years on big data technologies like Hadoop and other Hadoop ecosystem components such as Hive, Pig, Sqoop, HBase, NiFi, Kafka, Oozie, Spark, Shell, YARN, Flink, Ranger, and ZooKeeper.
  • Good exposure to AWS cloud services for big data development.
  • Good hands-on knowledge of the Hadoop ecosystem and its components such as MapReduce and HDFS.
  • Worked on installing, configuring, and administering Hadoop clusters for distributions like HDP and CDP.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries (see the sketch after this summary).
  • Expert in working with Beeline and PrestoSQL.
  • Experience in using Spark.
  • Experience in dealing with structured and semi-structured data in HDFS.
  • Knowledge in UNIX shell scripting.
  • Good understanding of ETL tools.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Experienced with data formats like JSON, Parquet, Avro, and ORC.
  • Hands-on experience in configuring and working with Storm to load data from multiple sources directly into HDFS.
  • Experience in scripting with Python.
  • Experience in using Apache Sqoop to import and export data.
  • Experience with web services such as REST.
  • Experience working with NiFi.
  • Experience using the change management tool Git and the build server software Jenkins.
  • Experience with the Kafka messaging and collection framework.
  • Experience using streaming technologies.
  • Strong knowledge of Hadoop cluster installation, capacity planning, performance tuning, benchmarking, disaster recovery planning, and application deployment in production clusters.
  • Good exposure to databases like MySQL, SQL, Postgres, and Netezza.
  • Experience using software development methodologies like Agile to deliver solutions.
  • Develop and execute maintainable automation tests for acceptance, functional, and regression test cases; investigate and debug test failures, updating tests or reporting bugs as necessary; and provide test coverage analysis based on automation results.
  • Comprehensive knowledge of the Software Development Life Cycle coupled with excellent communication skills.
  • Worked on scheduling to maximize CPU utilization and on performing backup and restore of different components.
  • Experience with the HBase and Apache Phoenix database engines.
  • Good exposure to PySpark scripts.
  • Good exposure to data governance.
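
The sketch below illustrates the Hive partitioning approach referenced in this summary, using PySpark with Hive support. It is a minimal example only: the database, table, and column names (sales_db.orders, order_dt, customer_id) are hypothetical placeholders, not actual project objects.

    # Minimal PySpark + HiveQL sketch of partitioned Hive tables; all names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-partitioning-sketch")
        .enableHiveSupport()  # lets spark.sql() run HiveQL against the Hive metastore
        .getOrCreate()
    )

    # Partitioning by load date lets queries prune whole directories instead of
    # scanning the full table. A CLUSTERED BY (customer_id) INTO 32 BUCKETS clause
    # could be added for bucketing; bucketed loads are typically run through
    # Beeline so Hive enforces the bucketing on write.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_db.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
        )
        PARTITIONED BY (order_dt STRING)
        STORED AS ORC
    """)

    # Dynamic partition insert: Hive derives order_dt for each row
    # (the partition column must come last in the SELECT list).
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales_db.orders PARTITION (order_dt)
        SELECT order_id, customer_id, amount, order_dt
        FROM sales_db.orders_staging
    """)

    # Partition pruning: only the matching order_dt directories are read.
    spark.sql("SELECT COUNT(*) FROM sales_db.orders WHERE order_dt = '2020-01-01'").show()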

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Big Data Engineer

Responsibilities:

  • Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.
  • Developed Pig UDFs to preprocess the data for analysis.
  • Implemented a pipeline to load XMLs into HDFS using Storm and Flink.
  • Used Pig Latin and PySpark scripts to extract data from the output files, process it, and load it into HDFS.
  • Worked on creating custom datasets for downstream reporting.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Used the Kafka messaging framework.
  • Used Kafka in combination with Apache Storm and Hive for real-time analysis of streaming data.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS (a sketch follows this list).
  • Used data formats like ORC, Avro, and Parquet.
  • Delivered the solution using Agile methodology.
  • Used Spark for fast processing of data in Hive and HDFS.
  • Used Spark SQL for structured data processing with the DataFrame and Dataset APIs.
  • Used Kafka in conjunction with ZooKeeper for deployment management, which requires monitoring ZooKeeper metrics alongside the Kafka clusters.
  • Developed Hive queries to process the data and generate results in a tabular format.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Used NiFi to load data from the FTP server into HDFS.
  • Used NiFi to expose data through RESTful APIs.
  • Used HBase to load huge datasets and maintain change data capture.
  • Worked on upgrading the cluster from HDP 2.6.5 to HDP 3.1.2 and from HDP 2.6.5 to CDP 7.1.2.
  • Good exposure to Hive 3 and the Hive Warehouse Connector (HWC) for Spark.
  • Worked on migration activities, i.e., migrated all Pig scripts to PySpark, and moved Hive on MR and Hive on Tez workloads to Beeline on Tez.
  • Worked with data science teams to modularize the PySpark code and provide support after production deployments.
  • Good exposure to data encryption on HDFS using Ranger KMS.
  • Worked on a POC to ingest real-time XML data (OTAs) into HDFS using NiFi, replacing Storm.
  • Worked on tag sync using Ranger and Atlas to capture metadata and PII information for data governance needs.
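
Below is a minimal sketch of the Kafka-to-HDFS streaming step described above, written with Spark Structured Streaming as one possible implementation. The broker addresses, topic name, HDFS paths, and trigger interval are hypothetical placeholders.

    # Hypothetical Kafka -> Spark -> HDFS streaming sketch; requires the
    # spark-sql-kafka package on the classpath (e.g. via --packages).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

    # Read the raw event stream from Kafka.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
        .option("subscribe", "booking-events")
        .option("startingOffsets", "latest")
        .load()
        .select(col("key").cast("string"), col("value").cast("string"), col("timestamp"))
    )

    # Persist the stream to HDFS as ORC files, checkpointing offsets so the job
    # can resume where it left off after a restart.
    query = (
        events.writeStream
        .format("orc")
        .option("path", "hdfs:///data/raw/booking_events")
        .option("checkpointLocation", "hdfs:///checkpoints/booking_events")
        .outputMode("append")
        .trigger(processingTime="1 minute")
        .start()
    )

    query.awaitTermination()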

Environment: Hadoop 2, Hadoop 3, Java, Python, HDFS, Pig, Sqoop, HBase, Hive 2, Hive 3, Spark, Oozie, NiFi, Storm, Shell Scripting, Linux, and RDBMS.

Confidential, Englewood, CO

Hadoop Developer

Responsibilities:

  • Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.
  • Created UDFs and Oozie workflows to Sqoop the data from the source to HDFS and then into the target tables.
  • Implemented custom data types, InputFormat, RecordReader, OutputFormat, and RecordWriter classes for MapReduce computations.
  • Developed Pig UDFs to preprocess the data for analysis.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Used Pig Latin scripts to extract data from the output files, process it, and load it into HDFS.
  • Extensively involved in entire QA Process and defect Management life cycle.
  • Created reports for the BI team by using Sqoop to export data into HDFS and Hive.
  • Implemented partitioning, dynamic partitions, bucketing in HIVE.
  • Used the messaging frameworks Kafka and Storm.
  • Used Storm to process the unbounded streams of data.
  • Used Kafka in combination with Apache Storm and HBase for real-time analysis of streaming data.
  • Used Storm as a distributed real-time computation system for processing large volumes of high-velocity data.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Java.
  • Involved in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Used spouts, bolts, and topologies in the Storm streaming API.
  • Used Storm to process streaming APIs.
  • Used data formats like Avro and Parquet.
  • Delivered the solution using Agile methodology.
  • Used Spark for fast processing of data in Hive and HDFS.
  • Used Spark SQL for structured data processing with the DataFrame and Dataset APIs (see the sketch after this list).
  • Used Kafka in conjunction with ZooKeeper for deployment management, which requires monitoring ZooKeeper metrics alongside the Kafka clusters.
  • Developed Hive queries to process the data and generate the results in a tabular format.
  • Handled importing of data from multiple data sources using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Written Hive queries for data analysis to meet the business requirements.
  • Created Hive tables and worked on them using HiveQL.
  • Automated all the jobs for pulling data from FTP server to load data into Hive tables using Oozie workflows.
  • Involved in designing and developing non-trivial ETL processes within Hadoop using tools like Pig, Sqoop, Flume, and Oozie.
  • Responsible for developing data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
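
Below is a minimal PySpark sketch of the Spark SQL / DataFrame processing described above (the Dataset API itself is Scala/Java only, so the Python equivalent uses DataFrames). The paths and column names are hypothetical placeholders.

    # Hypothetical Spark SQL / DataFrame processing sketch over Parquet data.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("spark-sql-sketch").getOrCreate()

    # Load Parquet output produced by the upstream ingestion jobs.
    clicks = spark.read.parquet("hdfs:///data/curated/clicks")

    # DataFrame API: filter and aggregate with plain transformations.
    daily_counts = (
        clicks
        .filter(F.col("event_type") == "purchase")
        .groupBy("event_date", "product_id")
        .agg(F.count("*").alias("purchases"), F.sum("amount").alias("revenue"))
    )

    # Spark SQL on the same data via a temporary view.
    daily_counts.createOrReplaceTempView("daily_purchases")
    top_products = spark.sql("""
        SELECT event_date, product_id, purchases, revenue
        FROM daily_purchases
        WHERE purchases >= 100
        ORDER BY revenue DESC
    """)

    # Write the result back to HDFS for downstream Hive/BI consumption.
    top_products.write.mode("overwrite").parquet("hdfs:///data/reports/top_products")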

Environment: Hadoop, Java, J2EE, HDFS, Pig, Sqoop, HBase, AWS, Shell Scripting, Linux, and RDBMS.

Confidential, Chicago

Hadoop Developer

Responsibilities:

  • Involved in source system analysis, data analysis, and data modeling for ETL (Extract, Transform, and Load).
  • Used Sqoop extensively to ingest data from various source systems into HDFS.
  • Was part of a POC effort to help build new Hadoop clusters.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Used Hive to quickly produce results for the reports that were requested.
  • Developed shell scripts that act as wrappers to start Hadoop jobs and set configuration parameters (a sketch follows this list).
  • Worked on standalone as well as distributed Hadoop applications.
  • Understood complex data structures of different types (structured, semi-structured) and de-normalized them for storage.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Involved in loading data from the Linux file system to HDFS.
  • Responsible for managing data from multiple sources.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
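
The wrapper scripts themselves were written in shell; the short Python sketch below illustrates the same idea of setting configuration parameters and launching a Hadoop job. The jar path, driver class, HDFS paths, and configuration values are hypothetical, and the sketch assumes the driver parses -D generic options via ToolRunner.

    # Hypothetical wrapper that sets configuration parameters and starts a Hadoop job.
    import subprocess
    import sys

    # Configuration parameters passed to the job as -D key=value generic options.
    JOB_CONF = {
        "mapreduce.job.queuename": "etl",
        "mapreduce.map.memory.mb": "4096",
        "mapreduce.reduce.memory.mb": "8192",
    }

    def run_hadoop_job(input_path: str, output_path: str) -> int:
        """Build the hadoop jar command line, run it, and return the exit code."""
        cmd = ["hadoop", "jar", "/opt/jobs/text-cleaner.jar", "com.example.CleanerDriver"]
        for key, value in JOB_CONF.items():
            cmd += ["-D", f"{key}={value}"]
        cmd += [input_path, output_path]

        print("Launching:", " ".join(cmd))
        return subprocess.run(cmd).returncode

    if __name__ == "__main__":
        exit_code = run_hadoop_job("hdfs:///data/raw/logs", "hdfs:///data/clean/logs")
        sys.exit(exit_code)  # propagate failure so the scheduler can retry or alert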
