
Senior Big Data Cloud Engineer Resume


Houston, TX

SUMMARY

  • Overall 9+ years of professional IT experience in software development, including 7+ years of experience in the ingestion, storage, querying, processing and analysis of Big Data using Hadoop technologies and solutions.
  • Expertise in Azure development, having worked on Azure web applications, App Services, serverless computing, Azure Storage, Azure Databricks, Azure SQL Database, Azure Virtual Machines, Azure AD, Azure Search, Azure DNS, Azure VPN Gateway and Notification Hubs.
  • Proficient with Hadoop ecosystem components such as Pig, Hive, HDFS, MapReduce, Sqoop, Flume, HBase and Impala.
  • Good knowledge of Hadoop architecture and its daemons such as JobTracker, TaskTracker, NameNode and DataNode.
  • Experience in YARN environments with Storm, Spark, Kafka and Avro.
  • Expertise in importing and exporting terabytes of data using Sqoop between HDFS and relational database systems.
  • Experienced in improving the performance and optimization of existing algorithms in Hadoop using Spark, working with the Spark context, Spark SQL, DataFrames, pair RDDs and Spark on YARN (a minimal sketch follows this list).
  • Hands on experience in loading unstructured data into HDFS using Flume/Kafka.
  • Developed Kafka producers that compress and bundle many small files into larger Avro and SequenceFile containers before writing to HDFS, to make the best use of the Hadoop block size.
  • Good hands-on experience in NoSQL databases such as HBase, MongoDB and Cassandra.
  • Experience in streaming real-time data using Flume into HDFS.
  • Experience in analyzing data using HiveQL, Pig Latin, HBase and custom MapReduce programs in Java.
  • Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Expertise in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Developed simple to complex MapReduce jobs using Hive and Pig to handle files in multiple formats such as JSON, text, XML and SequenceFile.
  • Worked extensively with combiners, partitioning and the distributed cache to improve the performance of MapReduce jobs.
  • Working knowledge of AWS cloud infrastructural components and hands-on experience in AWS provisioning.
  • Experienced in working with streaming data using Kafka Streams.
  • Experience in working with different data sources like Flat files, XML files, log files and Database.
  • Very good understanding and working knowledge of object-oriented programming (OOP).
  • Expertise in distributed and web environments focused on core Java technologies such as collections, multithreading, I/O, exception handling and memory management.
  • Expertise in application development using Java, RDBMS, and UNIX shell scripting.
  • Knowledge of data warehousing and ETL tools like Informatica and Pentaho.
  • Experience in working with BI teams in generating reports and designing ETL workflows on Tableau.
  • Extensive experience with SQL, PL/SQL and database concepts.
  • Working knowledge in SQL, Stored Procedures, Functions, Packages, DB Triggers and Indexes.
  • Good experience with version control tools like CVS, SVN, ClearCase and Git.
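
A minimal PySpark sketch in the spirit of the Spark optimization bullet above; the database, table and column names (sales_db.orders, customer_id, amount) are placeholders chosen for illustration, not details from any actual project.

    # Minimal PySpark sketch: Spark SQL / DataFrame work alongside an equivalent pair-RDD path.
    # Table and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("optimization-sketch")
             .enableHiveSupport()
             .getOrCreate())
    sc = spark.sparkContext

    # Spark SQL / DataFrame path: push the filter and aggregation into the SQL engine.
    orders = spark.sql("SELECT customer_id, amount FROM sales_db.orders WHERE amount > 0")
    totals_df = orders.groupBy("customer_id").sum("amount")

    # Equivalent pair-RDD path: reduceByKey combines values per key before the shuffle,
    # the kind of change that cuts shuffle volume in older MapReduce-style code.
    pairs = orders.rdd.map(lambda row: (row["customer_id"], row["amount"]))
    totals_rdd = pairs.reduceByKey(lambda a, b: a + b)

    totals_df.show(10)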

TECHNICAL SKILLS

Languages: C, C++, Java, XML, JavaScript, Python, Visual Basic, PL/SQL.

Big Data technologies: HDFS, MapReduce, NiFi, Hive, Pig, Sqoop, Flume, Impala, HBase, MongoDB, Cassandra, Oozie, Zookeeper, YARN, Storm, Kafka, AWS, Spark.

Frameworks: Spring, Hibernate, Struts.

J2EE Technologies: JSP, Servlets, EJB, JMS, JTA, JNDI, LDAP, JPA, JDBC, Annotations, AOP (Aspect Oriented Programming), IoC (Dependency Injection), Java Mail.

Web Services: WSDL, SOAP, Apache CXF/XFire, Apache Axis, REST, Jersey.

Web/Application Servers: Apache Tomcat 6.4, JBoss, WebSphere, WebLogic.

Version Control: GIT, CVS, SVN, CLEARCASE.

Databases: Oracle, DB2, SQL Server, MySQL.

IDEs & Tools: Eclipse, WebSphere Studio Application Developer, SQL Server, Notepad++, NetBeans, MS Office suite.

Operating systems: UNIX, Windows, LINUX.

PROFESSIONAL EXPERIENCE

Confidential

Senior Big Data Cloud Engineer

Responsibilities:

  • Interacted with multiple teams to understand their business requirements and design flexible, common components.
  • Hands-on experience with Unified Data Analytics on Databricks: the Databricks workspace user interface, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.
  • Good understanding of Spark architecture with Databricks and Structured Streaming; set up Databricks on AWS and Microsoft Azure, provisioned Databricks workspaces for business analytics, managed clusters in Databricks and managed the machine learning lifecycle.
  • Analyzed the data flow from different sources to target to provide the corresponding design Architecture in Azure environment.
  • Created an Application Interface Document enabling downstream teams to build a new interface to transfer and receive files through Azure Data Share.
  • Created a linked service to land the data from an SFTP location into Azure Data Lake.
  • Validated the source files for data integrity and data quality by reading header and trailer information and performing column validations.
  • Used Hive to do transformations, joins, filtering and some pre-aggregations before storing the data onto HDFS.
  • With Data Lake as a source, created an End-to-End pipeline with Kafka.
  • Implemented two different pipelines with Kafka Streams, one carrying JSON-formatted data and the other carrying Avro data.
  • Worked on transforming the data using Kafka Streams.
  • Worked with the Schema Registry while dealing with the Avro data format.
  • Implemented SSL on Kafka topics to prevent unauthorized access.
  • Experience in working with different formats of data such as text, JSON and Avro and passing them through the Kafka pipeline.
  • Experience in creating and managing pipelines using Azure Data Factory, copying data, configuring data flow in and out of Azure Data Lake Stores according to technical requirements.
  • Experience in creating Data sets and Data flows in Azure from different databases.
  • Used Sqoop for importing and exporting data between DB2, Oracle, HDFS and Hive.
  • Worked with three layers for storing data: a raw layer, an intermediate layer and a publish layer.
  • Created external Hive tables to store and query the loaded data.
  • Used the Avro file format compressed with Snappy in intermediate tables for faster processing of data.
  • Wrote multiple Python scripts to make REST API calls and push data from CSV files to web applications (see the sketch after this list).
  • Used the Parquet file format for published tables and created views on those tables.
  • Automated the jobs with Airflow and Jenkins.
  • Implemented Spark SQL to access Hive tables from Spark for faster processing of data.
  • Active member in developing a POC on streaming data using Apache Kafka and Spark Streaming.
  • Participated in evaluation and selection of new technologies to support system efficiency.
  • Participated in the development and execution of system and disaster recovery processes.
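
A hedged sketch of the kind of Python script referenced in the REST API bullet above; the endpoint URL, auth token and file path are hypothetical placeholders, not details from the actual project.

    # Sketch of a CSV-to-REST push script; endpoint, token and file path are placeholders.
    import csv
    import requests

    API_URL = "https://example.com/api/records"     # hypothetical endpoint
    HEADERS = {"Authorization": "Bearer <token>"}   # hypothetical auth header

    def push_csv(path):
        """Read a CSV file and POST each row as JSON to the web application."""
        with open(path, newline="") as fh:
            for row in csv.DictReader(fh):
                resp = requests.post(API_URL, json=row, headers=HEADERS, timeout=30)
                resp.raise_for_status()

    if __name__ == "__main__":
        push_csv("landing/daily_extract.csv")       # placeholder file path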

Environment: Hadoop, MAPR, HDFS, Hive, Spark, Kafka, Sqoop, Pig, Java, Python, Eclipse, Tableau, UNIX, and Maven.

Confidential, Houston, TX

Lead Hadoop Developer

Responsibilities:

  • Developed a data pipeline using Kafka, HBase, Spark and Hive to ingest, transform and analyze customer behavioral data.
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations.
  • Developing use cases for processing real time streaming data using tools like Spark Streaming.
  • Involved in file movements between HDFS and AWS S3, worked extensively with S3 buckets in AWS, and created data partitions on large data sets in S3 along with DDL on the partitioned data.
  • Imported required tables from RDBMS to HDFS using Sqoop and used Spark and Kafka to get real time streaming of data into HBase.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Experienced in implementing Spark RDD transformations, actions to implement business analysis and Worked with Spark accumulators and broadcast variables.
  • Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework and handled JSON data.
  • Involved in the requirements and design phases to implement a streaming Lambda architecture for real-time processing using Spark and Kafka.
  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Used Spark Streaming to divide streaming data into micro-batches as input to the Spark engine for batch processing (see the sketch after this list).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Involved in building real time data pipeline using Kafka and Spark with storage as MongoDB.
  • Used Oozie for designing workflows and Falcon for Job scheduling.
  • Fix defects as needed during the QA phase, support QA testing, troubleshoot defects and identify the source of defects.
  • Worked along with solution design teams to arrive at the optimal set of tools for each specific source of data.
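
A minimal sketch of the Kafka-to-Spark micro-batch idea above, written here with PySpark Structured Streaming rather than the DStream API and assuming the spark-sql-kafka connector is on the classpath; the broker address and topic name are placeholders.

    # Sketch of a Kafka -> Spark micro-batch stream; broker and topic names are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Read the Kafka topic as a stream; each trigger interval becomes one micro-batch
    # handed to the Spark engine for batch-style processing.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
              .option("subscribe", "customer-events")              # placeholder topic
              .load()
              .select(col("value").cast("string").alias("payload")))

    query = (events.writeStream
             .outputMode("append")
             .format("console")                       # sink kept trivial for the sketch
             .trigger(processingTime="30 seconds")
             .start())
    query.awaitTermination()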

Environment: Apache Hadoop, Apache Hive, Map Reduce, Pig, Zookeeper, SQOOP, Spark, Oozie, Falcon, Kafka, Tableau, Hortonworks, AWS.

Confidential

Sr. Big Data Engineer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Design and support of Data ingestion, Data Migration and Data processing for BI and Data Analytics.
  • Worked on Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Coordinated with business customers to gather business requirements and interact with other technical peers to derive technical requirements and delivered the BRD and TDD documents.
  • Created concurrent access for Hive tables with shared and exclusive locking that can be enabled in Hive with the help of the ZooKeeper implementation in the cluster.
  • Importing and exporting data into HDFS and Hive using SQOOP.
  • Worked on building BI reports in Tableau with Spark using Shark and Spark SQL.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying on the log data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Used the Spark API over Hadoop YARN to perform analytics on data in Hive (a sketch follows this list).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Hands on experience in AWS Cloud in various AWS services such as Redshift cluster, Route 53 domain configuration.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Plan, deploy, monitor, and maintain Amazon AWS cloud infrastructure consisting of multiple EC2 nodes.
  • Worked with Aurora using the RDS tooling.
  • Implemented Kafka Custom encoders for custom input format to load data into Kafka Partitions.
  • Implemented Storm builder topologies to perform cleansing operations before moving data into HBase.
  • Migrated an existing on-premises application to AWS.
  • Worked on performance tuning of Apache NiFi workflow to optimize the data ingestion speeds.
  • Involved in creating Hive Tables, loading data and writing hive queries.
  • Created Data model for Hive tables.
  • Involved in Unit testing and delivered Unit test plans and results documents.
  • Worked on Oozie workflow engine for Job scheduling.
  • Implemented a POC on the Hadoop stack and different big data analytic tools, covering migration from different databases (i.e. Teradata, Oracle, MySQL) to Hadoop.
  • Extensively involved in Design phase and delivered Design documents.
  • Involved in Testing and coordination with business in User testing.
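
A minimal PySpark-on-YARN sketch in the spirit of the Hive analytics bullets above; the database, table and column names (analytics.clickstream, event_date, region, user_id) are assumptions made purely for illustration.

    # Sketch of Spark-on-YARN analytics over a partitioned Hive table; names are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-metrics-sketch")
             .master("yarn")                 # run against the YARN resource manager
             .enableHiveSupport()
             .getOrCreate())

    # The predicate on the partition column (event_date) lets Hive/Spark prune partitions
    # so only the relevant HDFS directories are scanned.
    daily = spark.sql("""
        SELECT region,
               COUNT(*)                AS events,
               COUNT(DISTINCT user_id) AS users
        FROM analytics.clickstream
        WHERE event_date = '2018-06-01'
        GROUP BY region
    """)

    daily.write.mode("overwrite").saveAsTable("analytics.daily_region_metrics")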

Environment: Apache Hadoop, Apache Hive, Map Reduce, Pig, Zookeeper, SQOOP, Spark, Oozie, Kafka, Storm, AWS.

Confidential, Madison, WI

Hadoop Developer

Responsibilities:

  • Involved in the Complete Software development life cycle (SDLC) to develop the application.
  • Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
  • Involved in importing data from LINUX file system to HDFS.
  • Experience in managing and reviewing Hadoop log files.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Implemented test scripts to support test driven development and continuous integration.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Installed and configured Hadoop and Hadoop eco system. Developed multiple MapReduce jobs in java for data cleaning and preprocessing.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Supported MapReduce programs running on the cluster.
  • Analyzed large data sets by running Hive queries and Pig scripts.
  • Worked on tuning the performance of Pig queries.
  • Used AWS services like EC2 and S3 for small data sets.
  • Installed Oozie workflow engine to run multiple MapReduce jobs.
  • Used Kafka for Website activity tracking and Stream processing.
  • Used Flume, Kafka to load log data into HDFS.
  • Worked to develop a stream-filtering system on top of Apache Kafka (see the sketch after this list).
  • Designed a system using Kafka to auto-scale the backend servers based on the events throughput.
  • Used Apache Cassandra (http://cassandra.apache.org/), along with support contracts and services available from third parties.
  • Designed and implemented a Cassandra NoSQL database.
  • Worked with application teams to install operating system and Hadoop updates, patches and Kafka version upgrades as required.
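
A hedged Python sketch of the Kafka stream-filtering idea above, using the kafka-python client; the broker address, topic names and event schema are placeholders, not values from the actual system.

    # Sketch of a Kafka stream filter using kafka-python; broker, topics and schema are placeholders.
    import json
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        "site-activity",                                   # placeholder source topic
        bootstrap_servers="broker1:9092",                  # placeholder broker
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="broker1:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Forward only page-view events to a filtered topic for downstream consumers.
    for message in consumer:
        event = message.value
        if event.get("type") == "page_view":
            producer.send("site-activity-filtered", event)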

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Linux, Java, Oozie, HBase, Kafka, Cassandra, AWS, Hortonworks.

Confidential, Omaha, NE

Big Data Engineer

Responsibilities:

  • Developed the application using the Struts framework, which leverages the classical Model View Controller (MVC) architecture.
  • Extensively worked on the user interface for a few modules using JSPs, JavaScript and Ajax.
  • Created business logic using Servlets and POJOs and deployed them on WebLogic Server.
  • Involved in running Hadoop jobs for processing millions of records of text data.
  • Responsible for Cluster maintenance, adding and removing cluster nodes, Cluster Monitoring and Troubleshooting, manage and review data backups and log files.
  • Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop Clusters.
  • Monitored multiple Hadoop clusters environments using Ganglia.
  • Managing and scheduling Jobs on a Hadoop cluster.
  • Involved in defining job flows, managing and reviewing log files.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Worked on the Oozie workflow engine to run multiple MapReduce, Hive and Pig jobs.
  • Implemented MapReduce programs on log data to transform it into a structured form and extract user information (see the Hadoop Streaming sketch after this list).
  • Responsible for loading and transforming large sets of structured, semi structured and unstructured data.
  • Collected the log data from web servers and integrated into HDFS using Flume.
  • Responsible to manage data coming from different sources.
  • Extracted files from CouchDB, placed them into HDFS using Sqoop and pre-processed the data for analysis.
  • Gained experience with NoSQL database.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
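
A minimal Hadoop Streaming sketch in Python in the spirit of the log-processing MapReduce bullet above; the assumption that the user id sits in the third whitespace-separated field of each log line is purely illustrative.

    # ---- mapper.py: emit (user_id, 1) per log line; field position is an assumption ----
    import sys

    for line in sys.stdin:
        fields = line.strip().split()
        if len(fields) > 2:
            print("%s\t1" % fields[2])        # assumed: user id is the third field

    # ---- reducer.py: sum the counts per user id (input arrives key-sorted) ----
    import sys

    current_user, count = None, 0
    for line in sys.stdin:
        user, value = line.rstrip("\n").split("\t", 1)
        if user != current_user:
            if current_user is not None:
                print("%s\t%d" % (current_user, count))
            current_user, count = user, 0
        count += int(value)
    if current_user is not None:
        print("%s\t%d" % (current_user, count))

The two scripts would be submitted together through the hadoop-streaming jar with the -mapper and -reducer options, with input and output HDFS paths supplied at run time.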

Environment: JDK, J2EE, Struts, JSP, Spring, Servlets, WebSphere, HTML, XML, JavaScript, Informatica Power Center, Hadoop, HDFS, Pig, Hive, MapReduce, HBase, Sqoop, Python, Oozie, Ganglia and Flume.
