
Cloud Big Data Engineer Resume


Charlotte, NC

PROFESSIONAL SUMMARY:

  • IT professional with 7+ years of experience in software design, development, deployment and maintenance of business applications in the health, insurance, finance (BFSI), retail and investment sectors.
  • 4 years of experience in the Big Data domain using various Hadoop ecosystem tools and Spark APIs.
  • Solid understanding of the architecture and workings of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and its ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Ambari, ZooKeeper, Oozie, Storm, Spark and Kafka.
  • Experienced in building highly reliable, scalable Big Data solutions on the Cloudera, Hortonworks and AWS EMR Hadoop distributions.
  • Expertise in developing Spark applications in Scala using the Spark Core, Spark SQL and Spark Streaming APIs and deploying them to YARN clusters in client and cluster mode via spark-submit (a sketch follows this list).
  • Created and transformed RDDs and Datasets and ran actions on them in Scala, building the applications against the Spark framework with the SBT and Maven build tools.
  • Experience using DStreams in Spark Streaming, accumulators, broadcast variables and the various caching/persistence levels.
  • Deep understanding of performance tuning and partitioning for optimizing Spark applications.
  • Worked on real-time data integration using Kafka data pipelines, Spark Streaming and HBase (see the streaming sketch below this list).
  • In-depth understanding of NoSQL databases such as HBase and MongoDB and their integration with Hadoop clusters.
  • Experience in streaming data ingestion using Kafka and stream processing platforms like Spark Streaming.
  • Configured and deployed a multi-node Cloudera Hadoop cluster on Amazon EC2 instances and pseudo-distributed clusters on local Linux machines for proofs of concept (POCs).
  • Strong working experience in extracting, wrangling, ingesting, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
  • Solid understanding of the Hadoop MRv1 and MRv2 (YARN) architectures.
  • Developed, deployed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, the distributed cache, compression techniques, and multiple Hadoop input and output formats.
  • Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data with static and dynamic partitioning and bucketing, and optimizing HiveQL queries (see the Hive partitioning sketch below this list).
  • Ingested structured data from SQL Server, MySQL and Teradata into HDFS, Hive and HBase using Sqoop.
  • Extensive experience in performing ETL on structured and semi-structured data using Pig Latin scripts.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading and storing data.
  • Experience migrating data with Sqoop from HDFS and Hive to relational database systems and vice versa, according to client requirements.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage, along with the Service Catalog service.
  • Experienced with job workflow schedulers such as Oozie and monitoring tools like Hue.
  • Proficient knowledge of and hands-on experience in writing shell scripts on Linux.
  • Expertise in the core Java platform and object-oriented design.
  • Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML and HTML.
  • Extensive experience in developing and deploying applications on WebLogic, Apache Tomcat and JBoss.
  • Development experience with RDBMSs, including writing SQL queries, views, stored procedures and triggers.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
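
The following is a minimal, illustrative sketch (not drawn from any specific project; application, table and path names are hypothetical) of the kind of Spark application described above: Spark Core and Spark SQL in Scala, with a broadcast variable and a cached DataFrame, submitted to YARN with spark-submit in client or cluster deploy mode.

import org.apache.spark.sql.SparkSession

object ClaimsEnrichment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClaimsEnrichment")
      .getOrCreate()
    import spark.implicits._

    // Small lookup table shipped to every executor as a broadcast variable.
    val statusLookup = spark.sparkContext.broadcast(Map("A" -> "Active", "C" -> "Closed"))

    // Spark Core: RDD transformations on raw text input.
    val parsed = spark.sparkContext.textFile("hdfs:///data/claims/input")
      .map(_.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), statusLookup.value.getOrElse(f(1), "Unknown"), f(2).toDouble))

    // Spark SQL: convert to a DataFrame, cache it and aggregate.
    val claims = parsed.toDF("claimId", "status", "amount").cache()
    claims.createOrReplaceTempView("claims")
    spark.sql("SELECT status, SUM(amount) AS total FROM claims GROUP BY status")
      .write.mode("overwrite").parquet("hdfs:///data/claims/output")

    spark.stop()
  }
}

// Example submission to YARN (cluster deploy mode; use --deploy-mode client for client mode):
//   spark-submit --master yarn --deploy-mode cluster --class ClaimsEnrichment claims-enrichment.jar

Broadcasting the small lookup map avoids shipping it with every task, and caching the DataFrame keeps it in memory for the aggregation.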
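
Similarly, a hedged sketch of the real-time Kafka to Spark Streaming integration mentioned above; broker, topic and group names are illustrative, and the HBase write is stubbed out to keep the example short.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToHBase {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHBase")
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "claims-stream",
      "auto.offset.reset" -> "latest"
    )

    // Direct DStream over the Kafka topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("claims-events"), kafkaParams)
    )

    stream.map(record => record.value())
      .filter(_.nonEmpty)
      .foreachRDD { rdd =>
        rdd.foreachPartition { events =>
          // In the real pipeline each partition would be written to HBase
          // (for example via the HBase client's BufferedMutator); stubbed here.
          events.foreach(e => println(e))
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}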
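
And a short sketch of the Hive partitioning pattern referred to above, run through Spark's Hive support so it stays in Scala like the other examples; the database, table and column names are hypothetical, and the bucketing clause is noted in a comment because it is normally part of the Hive DDL itself.

import org.apache.spark.sql.SparkSession

object HivePartitioningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HivePartitioningDemo")
      .enableHiveSupport()
      .getOrCreate()

    // Allow fully dynamic partition values on insert.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Foundation table partitioned by load date; in Hive itself the DDL would
    // typically also add bucketing, e.g. CLUSTERED BY (customer_id) INTO 16 BUCKETS.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales.orders_part (
        |  order_id STRING,
        |  customer_id STRING,
        |  amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic partitioning: the partition value is taken from the data itself.
    spark.sql(
      """INSERT OVERWRITE TABLE sales.orders_part PARTITION (load_date)
        |SELECT order_id, customer_id, amount, load_date
        |FROM sales.orders_stage""".stripMargin)

    spark.stop()
  }
}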

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark, Storm

Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR

Languages: C, Java, PL/SQL, Pig Latin, HiveQL, Scala

IDE Tools: Eclipse, NetBeans, IntelliJ, Spring Tool Suite (STS)

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful.

Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS

Reporting Tools / ETL Tools: Tableau, Power View for Microsoft Excel, Splunk

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, MongoDB)

Build Automation tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE:

Confidential, Charlotte, NC

Cloud Big Data Engineer

Responsibilities:

  • Working as part of the Big Data analytics team on cloud (AWS) big data infrastructure such as the Hadoop ecosystem, HDFS and Spark; building data pipelines over gigabytes to terabytes of data and triaging the challenges of manipulating such large datasets.
  • Working on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark and Hive.
  • Used various Spark transformations and actions for cleansing the input data (see the sketch after this list).
  • Developed shell scripts to generate Hive CREATE TABLE statements from the data and load the data into the tables.
  • Wrote MapReduce jobs using the Java API.
  • Responsible for performing extensive data validation using Hive.
  • Implemented static partitioning, dynamic partitioning and bucketing in Hive for efficient data access.
  • Involved in designing and developing non-trivial ETL processes within Hadoop using tools like Sqoop and Oozie.
  • Used DML statements to perform different operations on Hive Tables.
  • Developing Hive queries for creating foundation tables from stage data.
  • Developed custom Java MapReduce counters to track the records processed by each MapReduce job.
  • Involved in building Oozie workflows with Java and shell actions to submit Spark jobs.
  • Involved in creating dashboards in Splunk using AWS CloudWatch logs.
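
An illustrative sketch of the cleansing and foundation-table load described in the bullets above; the bucket, table and column names are hypothetical and not taken from the actual engagement.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object StageToFoundation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StageToFoundation")
      .enableHiveSupport()
      .getOrCreate()

    // Read semi-structured stage data landed on S3.
    val stage = spark.read.json("s3://example-bucket/stage/accounts/")

    // Transformations: drop rows with null keys and negative balances.
    val cleansed = stage
      .na.drop(Seq("account_id"))
      .filter(col("balance") >= 0)

    // Action: a quick validation count before the write.
    println(s"Cleansed row count: ${cleansed.count()}")

    // Persist the cleansed data as a foundation Hive table.
    cleansed.write.mode("overwrite").saveAsTable("foundation.accounts")
    spark.stop()
  }
}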

Environment: Hadoop 2.6, Spark, Scala, Hive, MapReduce, MySQL 8.4, SQL Server 2014/2012, Java, Sqoop, Splunk, file formats such as JSON and Parquet, AWS S3, EMR, CloudWatch, Service Catalog, Hue, Control-M.

Confidential, Springfield, IL

BigData/Hadoop developer

Responsibilities:

  • Involved in creating Hive tables and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Used the Cloudera QuickStart VM for deploying the cluster.
  • Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Mentored the analyst and test teams in writing Hive queries.
  • Analyzed the data by performing Hive queries and running Pig scripts to validate data.
  • Generated datasets and loaded them into the Hadoop ecosystem.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts and Sqoop jobs.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
  • Used Sqoop, Pig and Hive as ETL tools for pulling and transforming data.
  • Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality (a short UDF sketch follows this list).
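
The resume describes these UDFs as written in Java; purely as an illustration, and kept in Scala for consistency with the earlier sketches, a Hive UDF of the kind described might look like the following. The class name, masking rule and column names are hypothetical.

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// A simple Hive UDF that masks all but the last four characters of a string.
class MaskUdf extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val s = input.toString
      val masked = if (s.length <= 4) s else "*" * (s.length - 4) + s.takeRight(4)
      new Text(masked)
    }
  }
}

// After packaging the class into a jar, it would be registered and used in Hive with:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask AS 'MaskUdf';
//   SELECT mask(ssn) FROM customers;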

Environment: Cloudera, Hadoop, HDFS, Spark, Oozie, Pig, Hive, MapReduce, Sqoop, MongoDB, Linux, Core Java, SOAP, XML, JMS, JBoss.

Confidential

Hadoop Developer

Responsibilities:

  • Installed and configured clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper and Sqoop.
  • Involved in the installation and configuration of the Cloudera Hadoop distribution (CDH 3.x and CDH 4.x).
  • Involved in setting up a 50-node Hadoop cluster.
  • Developed Hive and Pig scripts.
  • Worked on Sqoop and Hive tuning activities.
  • Worked on cluster upgrades, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
  • Installed and integrated Oozie with the Hadoop stack to run multiple Hive and Pig scripts.
  • Involved in creating and maintaining Hive tables and loading data into them using Hive queries and MapReduce jobs.
  • Handled Data load from different UNIX file systems to HDFS.
  • Customized the SSH settings in the Master node.
  • Resolved various platform-related issues faced by users.

Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kerberos, Shell script, UNIX

Confidential

Java Developer

Responsibilities:

  • Analyzed, designed and developed the system to meet the requirements of business users.
  • Participated in the design review of the system to perform object analysis and provide the best possible solutions for the application.
  • Implemented the presentation tier using HTML, JSP, Servlets and AJAX frameworks.
  • Used AJAX to implement part of the functionality for the Customer Registration and View Customer Information modules.
  • Used JavaScript for client-side validation.
  • Implemented Struts MVC framework for developing J2EE based web application.
  • Used JDBC to connect to and access the database.
  • Used IBM WebSphere to deploy J2EE application components.
  • The database tier used SQL Server.
  • Developed JUnit test cases.

Environment: Java, JSP, Servlets, Struts, HTML, JavaScript, jQuery, SQL Server, WebSphere MQ, JUnit, XML, AJAX, Windows NT, CVS.
