Big Data Engineer Resume


SUMMARY

  • Proficient in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, YARN, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and ZooKeeper, along with AWS.
  • Experience with StreamSets and Looker.
  • Worked on Google Cloud Storage and BigQuery.
  • Worked on Databricks notebooks; ran Spark jobs with Scala/Python scripts by attaching notebooks to the cluster.
  • Strong comprehension of Hadoop daemons and MapReduce concepts.
  • Strong knowledge of Spark with Scala for processing large data volumes in streaming workloads.
  • Hands-on experience in provisioning and managing multi-tenant Cassandra clusters in a public cloud environment - Amazon Web Services (AWS) EC2.
  • Experience with Elasticsearch and Kibana.
  • Experienced in developing UDFs for Pig and Hive using Java.
  • Hands-on experience in developing UDFs, DataFrames, and SQL queries in Spark SQL.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (a sketch follows this summary).
  • Unit testing with JUnit test cases and integration of developed code.
  • Worked with NoSQL databases like HBase, Cassandra, and MongoDB for information extraction and for storing large volumes of data.
  • Knowledge of implementing Hortonworks (HDP 2.3 and HDP 2.1) and Cloudera (CDH3, CDH4, CDH5) distributions on Linux.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
  • Experienced in writing Storm topologies to accept events from a Kafka producer and emit them into Cassandra.
  • Experience in designing star schemas and snowflake schemas for data warehouse and ODS architectures.
  • Ability to develop MapReduce programs using Java and Python.
  • Good understanding and exposure to Python programming.
  • Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
  • Exported and imported data to and from Oracle using SQL Developer for analysis.
  • Good experience in using Sqoop for traditional RDBMS data pulls.
  • Practical experience with Git, Jenkins, and Docker in a team environment, and experience with Agile practices.
  • Worked with different distributions of Hadoop like Hortonworks and Cloudera.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
  • Extensive experience in shell and Python scripting.
  • Extensive use of open-source software and web/application servers such as Eclipse 3.x IDE and Apache Tomcat 6.0.
  • Involved in report development using tools like Tableau and Looker; used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
  • Experience with cluster monitoring tools like Ambari and Apache Hue.
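
A minimal sketch of the Kafka-to-Spark Streaming integration noted above, using the Structured Streaming Kafka source (it assumes the spark-sql-kafka connector is on the classpath); the broker address, topic name, and eventType field are illustrative placeholders, not details from the original projects.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-spark-streaming-sketch")
      .getOrCreate()
    import spark.implicits._

    // Read raw events from Kafka; broker and topic names are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "latest")
      .load()
      .selectExpr("CAST(value AS STRING) AS raw")

    // Toy transformation: running count of events by a JSON field in the payload.
    val counts = events
      .select(get_json_object($"raw", "$.eventType").alias("eventType"))
      .groupBy($"eventType")
      .count()

    // Console sink for the sketch; a real job would target HDFS, Hive, or Kafka.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-sketch")
      .start()
      .awaitTermination()
  }
}
```

Submitted with spark-submit, this prints running per-eventType counts to the console; a production job would write to HDFS, Hive, or another Kafka topic instead.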

PROFESSIONAL EXPERIENCE

Confidential

BIG DATA ENGINEER

Responsibilities:

  • Responsible for developing and supporting Data warehousing operations.
  • Involved in petabyte-scale data migration operations.
  • Experienced in dealing with large-scale HIPAA-compliant data applications and handling sensitive information like PHI (Protected Health Information) in a secure environment.
  • Built and supported ETL pipelines using Spark-based applications.
  • Maintained resources on-premises as well as on the cloud.
  • Utilized various cloud-based services to maintain and monitor various cluster resources.
  • Conducted ETL Data Integration, Cleansing, and Transformations using Apache Kudu and Spark.
  • Used Apache Nifi for file conversions and data processing.
  • Developed applications to map the data between different sources and destinations using Python and Scala.
  • Reviewed and conducted performance tuning on various Spark applications.
  • Responsible for managing data from disparate sources.
  • Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Used Hive scripts within Spark for data cleaning and transformation (see the sketch after this list).
  • Responsible for migrating data from various conventional data sources as per the architecture.
  • Developed Spark applications in Scala and Python to migrate the data.
  • Developed Linux based shell scripts to automate the applications.
  • Provided support for building Kafka consumer applications.
  • Performed unit testing and collaborated with the QA team for possible bug fixes.
  • Collaborated with data modelers and other developers during the implementation.
  • Worked in an Agile-based Scrum Methodology.
  • Loaded data into Hive partitioned tables.
  • Exported the analyzed data to relational databases using Kudu for visualization and to generate reports for the Business Intelligence team.
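
A minimal sketch of the kind of Spark/Hive cleansing step referenced above; the staging.claims_raw and curated.claims_clean table names and the column list are hypothetical placeholders, not the project's actual schema.

```scala
import org.apache.spark.sql.SparkSession

object HiveCleansingSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets spark.sql run HiveQL against existing metastore tables.
    val spark = SparkSession.builder()
      .appName("hive-cleansing-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Table and column names are illustrative; the real pipeline used project-specific schemas.
    val cleaned = spark.sql(
      """
        |SELECT TRIM(patient_id)             AS patient_id,
        |       UPPER(TRIM(state_code))      AS state_code,
        |       CAST(claim_amount AS DOUBLE) AS claim_amount
        |FROM   staging.claims_raw
        |WHERE  patient_id IS NOT NULL
        |""".stripMargin)
      .dropDuplicates("patient_id")

    // Persist the cleansed records to a curated Hive table for downstream reporting.
    cleaned.write
      .mode("overwrite")
      .saveAsTable("curated.claims_clean")
  }
}
```

enableHiveSupport() is what allows the HiveQL above to run against tables already registered in the metastore.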

Environment: AWS, Linux, Spark-SQL, Python, Scala, CDH 5.12.1, Kudu, Spark, Oozie, Cloudera Manager, Hue, SQL Server, Maven, Git, Agile methodology.

Confidential

Hadoop Developer

Responsibilities:

  • Involved in analyzing business requirements and prepared detailed specifications that follow project guidelines required for project development.
  • Used Sqoop to import data from Relational Databases like MySQL, Oracle.
  • Involved in importing structured and unstructured data into HDFS.
  • Responsible for fetching real-time data using Kafka and processing using Spark and Scala.
  • Worked on Kafka to import real-time weblogs and ingested the data to Spark Streaming.
  • Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
  • Worked on Building and implementing real-time streaming ETL pipeline using Kafka Streams API.
  • Worked on Hive to implement Web Interfacing and stored the data in Hive tables.
  • Migrated Map Reduce programs into Spark transformations using Spark and Scala.
  • Experienced with SparkContext, Spark SQL, and Spark on YARN.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables from Spark for faster data processing.
  • Implemented data quality checks using Spark Streaming and flagged records as passable or bad.
  • Implemented Hive partitioning and bucketing on the collected data in HDFS (see the sketch after this list).
  • Implemented Sqoop jobs for large data exchanges between RDBMS and Hive clusters.
  • Developed traits, case classes, and related constructs in Scala.
  • Developed Spark scripts using Scala shell commands as per the business requirement.
  • Worked on Cloudera distribution and deployed on AWS EC2 Instances.
  • Worked on connecting the Cassandra database to the Amazon EMR File System for storing the database in S3.
  • Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Deployed the project on Amazon EMR with S3 connectivity for setting backup storage.
  • Well versed in using Elastic Load Balancing with Auto Scaling for EC2 servers.
  • Configured workflows that involve Hadoop actions using Oozie.
  • Used Python for pattern matching in build logs to format warnings and errors.
  • Coordinated with the SCRUM team in delivering agreed user stories on time for every sprint.
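
A small sketch of the partitioning and bucketing mentioned above, expressed with Spark's DataFrameWriter API (the original work may instead have used HiveQL DDL with CLUSTERED BY ... INTO n BUCKETS); the weblog table names, columns, and bucket count are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Raw weblog table name and columns are placeholders.
    val weblogs = spark.table("weblogs_raw")

    // Partition by event date and bucket by user id so queries that filter on a date
    // and join on user_id avoid full-table scans.
    weblogs.write
      .partitionBy("event_date")
      .bucketBy(32, "user_id")
      .sortBy("user_id")
      .mode("overwrite")
      .saveAsTable("weblogs_curated")
  }
}
```

bucketBy only works together with saveAsTable, which is why the sketch writes to a metastore table rather than a raw HDFS path.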

Environment: Hadoop YARN, Spark SQL, Spark-Streaming, AWS S3, AWS EMR, Spark-SQL, GraphX, Scala, Python, Kafka, Hive, Pig, Sqoop, Cloudera, Oracle 10g, Linux.

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for collecting, cleaning, and storing data for analysis using Kafka, Sqoop, Spark, and HDFS
  • Used the Kafka and Spark frameworks for real-time and batch data processing
  • Ingested large amounts of data from different data sources into HDFS using Kafka
  • Implemented Spark using Scala and performed data cleansing by applying transformations and actions
  • Used case classes in Scala to convert RDDs into DataFrames in Spark (see the sketch after this list)
  • Processed and analyzed data stored in HBase and HDFS
  • Developed Spark jobs using Scala on top of YARN for interactive and batch analysis.
  • Developed UNIX shell scripts to load large numbers of files into HDFS from the Linux file system.
  • Experience in querying data using Spark SQL for faster processing of the data sets.
  • Offloaded data from EDW into Hadoop Cluster using Sqoop.
  • Developed Sqoop scripts for importing and exporting data into HDFS and Hive
  • Created Hive internal and external tables with partitioning and bucketing for further analysis in Hive
  • Used Oozie workflow to automate and schedule jobs
  • Used Zookeeper for maintaining and monitoring clusters
  • Exported the data into RDBMS using Sqoop for BI team to perform visualization and to generate reports
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager
  • Used JIRA for project tracking and participated in daily scrum meetings
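
A brief sketch of the case-class-to-DataFrame conversion mentioned above; the Transaction fields, the HDFS path, and the CSV layout are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative record type; the real schema followed project-specific fields.
case class Transaction(txnId: String, customerId: String, amount: Double)

object RddToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-dataframe-sketch")
      .getOrCreate()
    import spark.implicits._   // brings in the .toDF() conversion for RDDs of case classes

    // Parse delimited lines from HDFS into the case class; the path is a placeholder.
    val txnRdd = spark.sparkContext
      .textFile("hdfs:///data/raw/transactions/*.csv")
      .map(_.split(","))
      .filter(_.length == 3)                                   // drop malformed rows
      .map(f => Transaction(f(0).trim, f(1).trim, f(2).trim.toDouble))

    // Convert the RDD to a DataFrame and continue with Spark SQL.
    val txnDf = txnRdd.toDF()
    txnDf.createOrReplaceTempView("transactions")
    spark.sql("SELECT customerId, SUM(amount) AS total FROM transactions GROUP BY customerId").show()
  }
}
```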

Environment: Spark, Sqoop, Scala, Hive, Kafka, YARN, Teradata, RDBMS, HDFS, Oozie, Zookeeper, HBase, Tableau, Hadoop (Cloudera), JIRA

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Built reusable Hive UDF libraries for business requirements, which enabled users to apply these UDFs in Hive queries.
  • Responsible for Data Modeling in Cassandra as per our requirement.
  • Managing and scheduling Jobs on a Hadoop cluster using Oozie and cron jobs.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Used Elasticsearch and MongoDB for storing and querying the offers and non-offers data.
  • Created UDFs to calculate the pending payment for a given Residential or Small Business customer and used them in Pig and Hive scripts (see the sketch after this list).
  • Deployed and built the application using Maven.
  • Used Python scripting for large-scale text processing utilities.
  • Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
  • Responsible for data modeling in MongoDB to load incoming data that was both structured and unstructured.
  • Processed unstructured files such as XML and JSON files using a custom-built Java API and pushed them into MongoDB.
  • Wrote test cases in MRUnit for unit testing of MapReduce programs.
  • Involved in developing templates and screens in HTML and JavaScript.
  • Developed the XML Schema and Web services for the data maintenance and structures.
  • Built and deployed applications into multiple UNIX based environments and produced both unit and functional test results along with release notes.
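
A sketch of a pending-payment Hive UDF along the lines of the bullets above; the production UDFs were written in Java, and the billed/paid inputs here are an assumed simplification rather than the actual business rule.

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hive UDF sketch: returns the outstanding balance for a customer record.
// The real UDFs were Java; this Scala version only illustrates the shape, and
// the billed/paid semantics are assumptions, not the original calculation.
class PendingPaymentUDF extends UDF {
  def evaluate(billed: java.lang.Double, paid: java.lang.Double): java.lang.Double = {
    if (billed == null) {
      null
    } else {
      val paidAmount = if (paid == null) 0.0 else paid.doubleValue()
      java.lang.Double.valueOf(math.max(billed.doubleValue() - paidAmount, 0.0))
    }
  }
}
```

After packaging the class into a jar and running ADD JAR in Hive, it would be registered with something like CREATE TEMPORARY FUNCTION pending_payment AS 'PendingPaymentUDF' (function and class names here are placeholders).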

Environment: HDFS, Kafka, Sqoop, Scala, Java, Hive, Oozie, NoSQL, Oracle, MySQL, Git, Zookeeper, DataStax Cassandra, JIRA, Hortonworks Data Platform, Jenkins, Agile (Scrum).

Confidential

Software Engineer

Responsibilities:

  • Worked on analyzing Hadoop clusters using different big data analytic tools including Pig, Hive, HBase, and MapReduce
  • Extracted everyday customer transaction data from DB2, exported it to Hive, and set up online analytical processing
  • Installed and configured Hadoop, MapReduce, and HDFS clusters
  • Created Hive tables, loaded the data, and performed data manipulations using Hive queries in MapReduce execution mode.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources and make it suitable for ingestion into the Hive schema for analysis (see the sketch after this list).
  • Loaded the structured data resulting from MapReduce jobs into Hive tables.
  • Identified issues in behavioral patterns and analyzed the logs using Hive queries.
  • Analyzed and transformed stored data by writing MapReduce or Pig jobs based on business requirements
  • Used Flume to collect, aggregate, and store weblog data from different sources such as web servers, mobile devices, and network devices, and imported it into HDFS
  • Using Oozie, developed a workflow to automate the tasks of loading data into HDFS and pre-processing it with Pig scripts
  • Integrated MapReduce with HBase to import bulk data using MR programs
  • Used Maven extensively for building jar files of MapReduce programs and deployed to Cluster.
  • Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
  • Developed data pipeline using Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
  • Used SQL queries, stored procedures, user-defined functions (UDFs), and database triggers, with tools like SQL Profiler and Database Tuning Advisor (DTA)
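
A sketch of a map-only cleansing job like the ones described above; the original jobs were written in Java, and the pipe-delimited five-field record layout here is an assumption.

```scala
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Map-only cleansing sketch: keeps well-formed, pipe-delimited records and drops the rest.
// The production jobs were Java; the 5-field layout is illustrative only.
class CleansingMapper extends Mapper[LongWritable, Text, NullWritable, Text] {

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, NullWritable, Text]#Context): Unit = {
    val fields = value.toString.split("\\|", -1)

    // Emit only records with the expected number of non-empty fields, trimmed of whitespace.
    if (fields.length == 5 && fields.forall(_.trim.nonEmpty)) {
      context.write(NullWritable.get(), new Text(fields.map(_.trim).mkString("|")))
    }
  }
}
```

The driver would set this mapper class and numReduceTasks(0) so the cleansed records stream straight back to HDFS without a reduce phase.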

Environment: HDFS, Map Reduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Talend, HiveQL, Java, Maven, Avro, Eclipse and Shell Scripting.
