
Data Engineer Resume


San Jose, CA

SUMMARY:

  • Over 7 years of professional IT experience in Big Data using the Hadoop framework, covering analysis, design, development, testing, documentation, deployment, and integration with SQL and Big Data technologies.
  • Strong experience in batch processing and real-time data processing using Kafka and Spark Streaming (see the sketch after this list).
  • Proficient in SQL, Java, Python, and Scala, with experience in writing MapReduce programs.
  • Experience in designing ETL architectures and developing ETL data pipelines to integrate data from different source systems.
  • Extensive experience working with Hadoop, HDFS, Hive, HBase, Sqoop, and Spark.
  • Good knowledge of the Hadoop architecture and its components, such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, and DataNode.
  • Hands-on experience working with the Hortonworks Data Platform (HDP) and Cloudera.
  • Experience in Hadoop and related Big Data technologies such as HBase, Hive, Pig, Flume, Oozie, Sqoop, and ZooKeeper.
  • Extensive knowledge of Hadoop technology for storage, query writing, processing, and analysis of data.
  • Extensive experience developing Pig Latin scripts for transformations and using Hive Query Language for data analytics.
  • Good knowledge of generating MapReduce jobs through Pig, Hive, and Sqoop.
  • Extended Pig and Hive core functionality by writing custom User Defined Functions for data analysis and file processing in Pig Latin scripts.
  • Experience creating Hive internal and external tables using a shared metastore.
  • Worked on importing data into HBase using the HBase shell.
  • Used Apache Oozie for scheduling and managing Hadoop jobs.
  • Excellent programming skills with experience in Java, C, SQL, and Python.
  • In-depth knowledge of analyzing data using HiveQL, Pig Latin, HBase, and custom MapReduce programs in Java.
  • Good understanding of ZooKeeper for coordinating and managing Hadoop jobs.
  • Extensive knowledge of RDBMS such as Oracle, Microsoft SQL Server, and MySQL.
  • Extensive experience working on various databases and database script development using SQL and PL/SQL.
  • Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experience with operating systems: Linux, Red Hat, and UNIX.
  • Supported MapReduce programs running on the cluster and wrote custom MapReduce scripts for data processing in Java.
  • Experience developing Spark jobs using Scala in a test environment for faster data processing, and used Spark SQL for querying.
  • Experience with IDEs such as Eclipse and NetBeans.
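
As a minimal illustration of the Kafka and Spark Streaming experience noted above, the Scala sketch below consumes a Kafka topic with the spark-streaming-kafka-0-10 direct stream API; the broker address, topic name, and consumer group are hypothetical placeholders, not values from any actual project.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamSketch")
        val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        // Hypothetical broker, topic, and group id.
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "broker1:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "resume-sketch",
          "auto.offset.reset" -> "latest"
        )
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Count the records received in each micro-batch.
        stream.map(_.value).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }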

TECHNICAL SKILLS:

Programming Languages: Java, C, SQL, PL/SQL, Pig Latin, Scala, HTML, XML.

Web Technologies: HTML, CSS, JavaScript, JSP, jQuery, XML.

Hadoop/Big Data: MapReduce, Spark, Pig, Hive, Sqoop, HBase, Flume, Kafka, YARN, Oozie, ZooKeeper, Kerberos, Impala.

RDBMS Languages: MySQL, PL/SQL.

Cloud: Azure, AWS.

NoSQL: MongoDB, HBase, Apache Cassandra.

Tools/IDEs: NetBeans, Eclipse, Git, PuTTY.

Operating Systems: Linux, Windows, Ubuntu, Red Hat Linux, UNIX.

Methodologies: Agile, Waterfall.

Hadoop Testing: MRUnit, Quality Center, Hive testing.

PROFESSIONAL EXPERIENCE:

Confidential, San Jose, CA

Data Engineer

Responsibilities:

  • Involved in the high-level design of the Hadoop 2.6.3 architecture for the existing data structure and problem statement; set up a 64-node cluster and configured the entire Hadoop platform using the Hortonworks Data Platform (HDP) with Ambari Server.
  • Worked on analyzing a Hadoop 2.7.2 cluster and different Big Data analytic tools, including Pig 0.16.0, Hive 2.0, the HBase 1.1.2 database, and Sqoop 1.4.6.
  • Integrated real-time data using Kafka and Spark Streaming in Java and Scala to process data coming from different source systems such as XLSX files, CSV files, and databases.
  • Explored Spark with Scala to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs (see the sketch after this list).
  • Developed Spark SQL applications using both SQL and the DataFrame DSL.
  • Implemented algorithms for real-time analysis in Spark.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for large volumes of data.
  • Practical knowledge of integrating Kafka with third-party systems such as Spark and Hadoop.
  • Involved in validating the aggregate table based on the rollup process documented in the data mapping; developed HiveQL and Spark SQL and automated the flow using shell scripting.
  • Worked with various HDFS file formats such as Parquet and JSON for serializing and deserializing data.
  • Developed MapReduce programs to parse the raw data and store the refined data in tables.
  • Designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume 1.7.0.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on the data.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Created Hive tables, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
  • Used Oozie 1.2.1 operational services for batch processing and scheduling workflows dynamically, and created UDFs to store specialized data structures in HBase and Cassandra.
  • Used Impala to read, write, and query Hadoop data in HDFS, HBase, or Cassandra, and configured Kafka to read and write messages from external programs.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Have experience with NiFi, which runs in a cluster and provides real-time control that makes it easy to manage the movement of data between any source and any destination.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Made extensive use of aliases for Oozie and HDFS commands.
  • Created a complete processing engine based on the Hortonworks distribution, tuned for performance.
  • Managed and reviewed Hadoop log files.
  • Involved in identifying and analyzing defects, questionable function errors, and inconsistencies in output.
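
A minimal Scala sketch of the Spark SQL and DataFrame usage referenced in the list above (the bullet on SparkContext, Spark SQL, DataFrames, and pair RDDs); the HDFS path, view name, and column names are hypothetical.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.sum

    object SparkSqlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SparkSqlSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical refined data set written earlier in the pipeline.
        val orders = spark.read.parquet("hdfs:///data/refined/orders")
        orders.createOrReplaceTempView("orders")

        // The same aggregation expressed with SQL and with the DataFrame DSL.
        val bySql = spark.sql(
          "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id")
        val byDsl = orders.groupBy("customer_id").agg(sum("amount").as("total"))

        bySql.show(10)
        byDsl.show(10)
        spark.stop()
      }
    }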

Environment: Hadoop, MapReduce, YARN, Spark, Scala, Hive, Pig, HBase, Sqoop, Flume, Oracle 11g, Core Java, Hortonworks, HDFS, Eclipse

Confidential, Richmond, VA

Data Engineer

Responsibilities:

  • Executed Hive queries that helped in the analysis of market trends by comparing the new data with EDW reference tables and historical data.
  • Managed and reviewed Hadoop log files for the JobTracker, NameNode, secondary NameNode, DataNodes, and TaskTrackers.
  • Tested raw market data and executed performance scripts on the data to reduce runtime.
  • Involved in loading the created files into HBase for faster access to large sets of customer data without affecting performance.
  • Imported and exported data between HDFS and RDBMS using Sqoop and Kafka.
  • Executed Hive jobs to parse the logs and structure them in a relational format to enable effective queries on the log data.
  • Created Hive tables (internal/external) for loading data and wrote queries that run internally as MapReduce jobs to process the data (see the sketch after this list).
  • Developed Pig scripts for capturing data changes and record processing between new data and data already existing in HDFS.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
  • Involved in importing data from different data sources and performed various queries using Hive, MapReduce, and Pig Latin.
  • Involved in loading data from the local file system to HDFS using HDFS shell commands.
  • Wrote UNIX shell scripts for processing and loading data from various interfaces into HDFS.
  • Developed different components of the Hadoop ecosystem process involving MapReduce and Hive.
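
As referenced in the Hive tables bullet above, the sketch below uses spark.sql to issue HiveQL for one external and one managed (internal) table; the table names, schema, and HDFS location are hypothetical.

    import org.apache.spark.sql.SparkSession

    object HiveTableSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveTableSketch")
          .enableHiveSupport()
          .getOrCreate()

        // External table over raw log files already sitting in HDFS.
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
            ts STRING, level STRING, message STRING)
          ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
          LOCATION 'hdfs:///data/raw/logs'
        """)

        // Managed (internal) table holding a structured summary of the logs.
        spark.sql("""
          CREATE TABLE IF NOT EXISTS log_summary AS
          SELECT level, COUNT(*) AS cnt FROM raw_logs GROUP BY level
        """)

        spark.sql("SELECT * FROM log_summary").show()
        spark.stop()
      }
    }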

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Big Data, Java, Flume, Kafka, YARN, HBase, Oozie, SQL scripting, Linux shell scripting, Eclipse, and Cloudera.

Confidential, Saline, MI

Data Engineer

Responsibilities:

  • Involved in the high-level design of the Hadoop 2.6.3 architecture for the existing data structure and problem statement; set up a 64-node cluster and configured the entire Hadoop platform using the Hortonworks Data Platform (HDP) with Ambari Server.
  • Implemented a data interface to get customer information using REST APIs, pre-processed the data using MapReduce 2.0, and stored it in HDFS (Hortonworks).
  • Extracted files from MySQL, Oracle, and Teradata through Sqoop 1.4.6, placed them in HDFS (Cloudera Distribution), and processed them.
  • Configured the Hive 1.1.1 metastore, which stores the metadata for Hive tables and partitions in a relational database.
  • Worked with various HDFS file formats such as Avro 1.7.6, SequenceFile, and JSON, and various compression formats such as Snappy and bzip2 (see the sketch after this list).
  • Developed efficient MapReduce programs for filtering out unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
  • Developed Pig 0.15.0 UDFs to pre-process the data for analysis and migrated ETL operations into the Hadoop system using Python 3.5.1 scripts.
  • Used Pig as an ETL tool to do transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
  • Troubleshot, debugged, and resolved Talend issues while maintaining the health and performance of the ETL environment.
  • Developed Hive queries for data sampling and analysis for the analysts.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Developed custom UNIX shell scripts to do pre- and post-validations of master and slave nodes, before and after configuring the NameNode and DataNodes respectively.
  • Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
  • Developed small distributed applications in our projects using ZooKeeper 3.4.7 and scheduled the workflows using Oozie 4.2.0.
  • Developed complex Talend job mappings to load data from various sources using different components.
  • Created HBase tables from Hive and wrote HiveQL statements to access HBase 0.98.12.1 table data.
  • Proficient in designing row keys and schemas for the NoSQL database HBase, with knowledge of another NoSQL database, Cassandra.
  • Used Hive to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into HBase.
  • Created a MapReduce program that inspects current and prior versions of HBase data to identify transactional updates; these updates are loaded into Hive external tables, which are in turn referenced by Hive scripts when generating transactional feeds.
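
The file-format bullet above (Avro, SequenceFile, JSON, Snappy, bzip2) can be illustrated with this Scala sketch; paths are hypothetical, and the Avro write assumes the spark-avro package is on the classpath.

    import org.apache.spark.sql.SparkSession

    object FileFormatSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("FileFormatSketch").getOrCreate()

        // Hypothetical input: one JSON record per line.
        val events = spark.read.json("hdfs:///data/raw/events_json")

        // Rewrite as Snappy-compressed Parquet for efficient downstream reads.
        events.write
          .mode("overwrite")
          .option("compression", "snappy")
          .parquet("hdfs:///data/refined/events_parquet")

        // Avro output; on older Spark versions the format name is
        // "com.databricks.spark.avro" instead of "avro".
        events.write.mode("overwrite").format("avro").save("hdfs:///data/refined/events_avro")

        spark.stop()
      }
    }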

Environment: Hadoop (Hortonworks), HDFS, MapReduce, Hive, Scala, Pig, Sqoop, AWS, UNIX shell scripting

Confidential

Hadoop and Java Developer

Responsibilities:

  • Worked on distributed/cloud computing with Hortonworks-distributed Hadoop (CDH4) and Ambari Server.
  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing.
  • Involved in installing Hadoop ecosystem components.
  • Imported and exported data into HDFS, Pig, Hive, and HBase using Sqoop.
  • Experience in Extraction, Transformation, and Loading (ETL) of data from different sources and building pipelines to load it into the data warehouse.
  • Experience loading data into HDFS using Sqoop from RDBMS after filtering the input data.
  • Built Hive external and managed tables on top of the data and performed various operations in Hive queries to clean the data (trim) and add calculated attributes by extraction.
  • Used Hive partitioning, bucketing, map joins, and sort-merge bucketing to generate optimal results.
  • Performed various troubleshooting on job failures and found root-cause solutions in the ResourceManager.
  • Developed Pig scripts for capturing data changes and record processing between new data and data already existing in HDFS.
  • Installed and configured Hive and wrote Hive User Defined Functions (see the sketch after this list).
  • Worked with the Maven build tool for building JAR files.
  • Knowledge of the Struts Tiles framework for layout management.
  • Worked on the design, analysis, development, and testing phases of the application.
  • Developed named HQL queries and Criteria queries for use in the application.
  • Developed the user interface using JSP and HTML.
  • Used JDBC for database connectivity.
  • Consistently met deadlines as well as requirements for all production work orders.
  • Executed SQL statements for searching contractors based on criteria.
  • Developed and integrated the application using the Eclipse IDE.
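
The Hive User Defined Functions bullet above can be illustrated with a small Scala sketch of the classic UDF API (it assumes the hive-exec dependency is on the classpath); the class name, function name, and JAR path are hypothetical.

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // A trivial Hive UDF that trims and lower-cases a string column.
    // After packaging into a JAR it would be registered in Hive with, for example:
    //   ADD JAR hdfs:///jars/clean-string.jar;
    //   CREATE TEMPORARY FUNCTION clean_str AS 'CleanString';
    class CleanString extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) null
        else new Text(input.toString.trim.toLowerCase)
      }
    }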

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Core Java, SQL, Oracle, J2EE, JSP, Hibernate, HTML, JDBC, Maven, Eclipse.
