Cloud Big Data Engineer Resume
Charlotte, NC
PROFESSIONAL SUMMARY:
- IT professional with 7+ years of experience in software design, development, deployment, and maintenance of business applications in the healthcare, insurance, banking and financial services (BFSI), retail, and investment sectors.
- 4 years of experience in the Big Data domain using Hadoop ecosystem tools and Spark APIs.
- Solid understanding of the architecture and workings of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and ecosystem components such as MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Ambari, ZooKeeper, Oozie, Storm, Spark, and Kafka.
- Experienced in building highly reliable, scalable big data solutions on Hadoop distributions including Cloudera, Hortonworks, and AWS EMR.
- Expertise in developing Spark applications with the Spark Core, Spark SQL, and Spark Streaming APIs in Scala and deploying them on YARN in client and cluster mode via spark-submit.
- Created RDDs and Datasets in Scala, applied transformations and actions on them, and integrated the applications with the Spark framework using the SBT and Maven build automation tools.
- Experience using DStreams in Spark Streaming, accumulators, broadcast variables, and the various caching/persistence levels.
- Deep understanding of performance tuning and partitioning for optimizing Spark applications.
- Worked on real-time data integration using Kafka pipelines, Spark Streaming, and HBase.
- In-depth understanding of NoSQL databases such as HBase and MongoDB, including their integration with Hadoop clusters.
- Experience in streaming data ingestion using Kafka and stream processing platforms like Spark Streaming.
- Configured and deployed a multi-node Cloudera Hadoop cluster on Amazon EC2 instances and pseudo-distributed clusters on local Linux machines for proofs of concept (POCs).
- Strong working experience in extracting, wrangling, ingesting, processing, storing, querying, and analyzing structured, semi-structured, and unstructured data.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Developed, deployed, and supported several MapReduce applications in Java to handle semi-structured and unstructured data (a minimal sketch appears at the end of this summary).
- Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, the distributed cache, compression techniques, and multiple Hadoop input and output formats.
- Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON data formats.
- Expertise in working with the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning and bucketing, and optimizing HiveQL queries.
- Ingested structured data from SQL Server, MySQL, and Teradata into HDFS, Hive, and HBase using Sqoop.
- Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Expertise in moving structured schema data between Pig and Hive using HCatalog.
- Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading, and storing data.
- Experience migrating data with Sqoop from HDFS and Hive to relational database systems and vice versa, per client requirements.
- Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute, S3 for storage, and AWS Service Catalog.
- Experienced with workflow schedulers such as Oozie and monitoring tools such as Hue.
- Proficient knowledge and hands-on experience writing shell scripts in Linux.
- Expertise in core Java packages and object-oriented design.
- Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
- Extensive experience in developing and deploying applications on WebLogic, Apache Tomcat, and JBoss.
- Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.
- Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
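A minimal sketch of the Java MapReduce pattern referenced in the summary above: a mapper that validates semi-structured, pipe-delimited records and tracks malformed ones with a custom counter instead of failing the job. The record layout, class name, and counter names are illustrative assumptions, not drawn from any specific project.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: parses pipe-delimited records, emits (recordType, 1),
// and uses a custom counter to track malformed lines rather than failing the job.
public class RecordValidationMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Custom counters surfaced in the job history UI after the run.
    public enum RecordQuality { VALID, MALFORMED }

    private static final IntWritable ONE = new IntWritable(1);
    private final Text recordType = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\\|");
        if (fields.length < 3 || fields[0].isEmpty()) {
            context.getCounter(RecordQuality.MALFORMED).increment(1);
            return; // skip the bad record and keep processing
        }
        context.getCounter(RecordQuality.VALID).increment(1);
        recordType.set(fields[0]);
        context.write(recordType, ONE);
    }
}
```

Paired with a standard sum reducer and a driver that configures input and output formats, this is the same counter-based record-tracking pattern described under the Cloud Big Data Engineer role below.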
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark, Storm
Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR
Languages: C, Java, PL/SQL, Pig Latin, HiveQL, Scala
IDE Tools: Eclipse, NetBeans, IntelliJ, Spring Tool Suite (STS)
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful.
Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS
Reporting/ETL Tools: Tableau, Power View for Microsoft Excel, Splunk
Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, MongoDB)
Build Automation tools: SBT, Ant, Maven
PROFESSIONAL EXPERIENCE:
Confidential, Charlotte, NC
Cloud Big Data Engineer
Responsibilities:
- Working as part of the Big Data analytics team on AWS-based big data infrastructure, including the Hadoop ecosystem, HDFS, Spark, and related cloud services; building data pipelines over gigabytes to terabytes of data and triaging the challenges of manipulating such large datasets.
- Working on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark and Hive.
- Used various Spark transformations and actions to cleanse the input data.
- Developed shell scripts to generate Hive CREATE TABLE statements from the data and load the data into the tables.
- Wrote MapReduce jobs using the Java API.
- Responsible for performing extensive data validation using Hive.
- Implemented static and dynamic partitioning and bucketing in Hive for efficient data access (see the Hive sketch following this role's Environment line).
- Involved in designing and developing nontrivial ETL processes within Hadoop using tools like Sqoop and Oozie.
- Used DML statements to perform different operations on Hive Tables.
- Developed Hive queries to create foundation tables from stage data.
- Developed custom Java MapReduce counters to track the records processed by each MapReduce job.
- Built Oozie workflows with Java and shell actions to submit Spark jobs.
- Created dashboards in Splunk using AWS CloudWatch logs.
Environment: Hadoop 2.6, Spark, Scala, Hive, MapReduce, MySQL 8.4, SQL Server 2014/2012, Java, Sqoop, Splunk, JSON and Parquet file formats, AWS S3, EMR, CloudWatch, Service Catalog, Hue, Control-M.
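A minimal sketch of the Hive partitioning and bucketing described above, expressed as HiveQL issued from Java through the HiveServer2 JDBC driver; the connection URL, credentials, table names, and columns are illustrative assumptions, and the same statements can equally be run from the Hive CLI or Beeline.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Illustrative loader: creates a partitioned, bucketed Hive foundation table and
// populates it from a staging table with a dynamic-partition insert.
public class HiveFoundationLoad {
    public static void main(String[] args) throws Exception {
        // HiveServer2 URL, user, and table/column names are assumptions for this sketch.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/analytics", "etl_user", "");
             Statement stmt = conn.createStatement()) {

            // Allow the partition value to come from the SELECT instead of being hard-coded.
            stmt.execute("SET hive.exec.dynamic.partition=true");
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            stmt.execute("SET hive.enforce.bucketing=true"); // needed on older Hive releases

            stmt.execute("CREATE TABLE IF NOT EXISTS claims_foundation ("
                    + " claim_id STRING, member_id STRING, claim_amount DOUBLE)"
                    + " PARTITIONED BY (load_date STRING)"
                    + " CLUSTERED BY (member_id) INTO 32 BUCKETS"
                    + " STORED AS PARQUET");

            // Dynamic-partition insert: one partition per distinct load_date in the stage data.
            stmt.execute("INSERT OVERWRITE TABLE claims_foundation PARTITION (load_date)"
                    + " SELECT claim_id, member_id, claim_amount, load_date FROM claims_stage");
        }
    }
}
```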
Confidential, Springfield, IL
BigData/Hadoop developer
Responsibilities:
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Used the Cloudera QuickStart VM to deploy the cluster.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Mentored the analyst and test teams in writing Hive queries.
- Analyzed the data by performing Hive queries and running Pig scripts to validate data.
- Generated datasets and loaded them into the Hadoop ecosystem.
- Assisted in exporting analyzed data to relational databases using Sqoop.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts, and Sqoop jobs.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Worked with Linux systems and RDBMS database on a regular basis in order to ingest data using Sqoop.
- Used Sqoop, Pig, Hive as ETL tools for pulling and transforming data.
- Developed customized UDFs in Java to extend Hive and Pig Latin functionality (see the UDF sketch following this role's Environment line).
Environment: Cloudera, Hadoop, HDFS, Spark, Oozie, Pig, Hive, MapReduce, Sqoop, MongoDB, Linux, Core Java, SOAP, XML, JMS, JBOSS.
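A minimal sketch of a customized Hive UDF in Java of the kind referenced above; the class name and the normalization rule are illustrative assumptions.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Illustrative Hive UDF: trims and upper-cases a free-text code column.
public final class NormalizeCode extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // preserve NULLs so downstream queries behave as expected
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```

Registered from Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize_code AS 'NormalizeCode', after which it can be used like any built-in function in HiveQL; a Pig EvalFunc is packaged and registered along similar lines.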
Confidential
Hadoop Developer
Responsibilities:
- Installed and configured clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper, and Sqoop.
- Involved in installing and configuring the Cloudera distribution of Hadoop, CDH 3.x and 4.x.
- Involved in setting up a 50-node Hadoop cluster.
- Developed Hive/Pig scripts.
- Worked on Sqoop and Hive tuning activities.
- Worked on cluster upgrades, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
- Installed and integrated Oozie with the Hadoop stack to run multiple hive and Pig scripts.
- Involved in creating and maintaining Hive tables, loading data into the tables using Hive queries and MapReduce jobs.
- Handled data loads from different UNIX file systems into HDFS (see the HDFS load sketch following this role's Environment line).
- Customized the SSH settings in the Master node.
- Resolved various platform-related issues faced by users.
Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kerberos, Shell script, UNIX
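A minimal sketch of loading files from a local UNIX file system into HDFS with the Hadoop FileSystem Java API, as referenced above; the NameNode URI and paths are illustrative assumptions, and the same load is often scripted with hdfs dfs -put.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative HDFS load: copies a local landing directory into an HDFS staging area.
public class HdfsLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI and directory paths are assumptions for this sketch.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf)) {
            fs.copyFromLocalFile(
                    new Path("file:///data/landing/2016-01-01/"),
                    new Path("/user/etl/staging/2016-01-01/"));
        }
    }
}
```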
Confidential
Java Developer
Responsibilities:
- Analyzed, designed, and developed the system to meet the business users' requirements.
- Participated in design reviews of the system to perform object analysis and provide the best possible solutions for the application.
- Implemented the presentation tier using HTML, JSP, Servlets, and AJAX frameworks.
- Used AJAX to implement part of the functionality for the Customer Registration and View Customer Information modules.
- Used JavaScript for client-side validation.
- Implemented the Struts MVC framework for developing the J2EE-based web application.
- Used JDBC to connect to and access the database (see the JDBC sketch following this role's Environment line).
- Used IBM WebSphere to deploy J2EE application components.
- The database tier used SQL Server.
- Developed JUnit test cases.
Environment: Java, JSP, Servlets, Struts, HTML, JavaScript, jQuery, SQL Server, WebSphere MQ, JUnit, XML, AJAX, Windows NT, CVS.
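A minimal sketch of the JDBC data-access pattern used in this role; the connection URL, credentials, and customer table schema are illustrative assumptions, and a container-managed DataSource would typically replace DriverManager in the deployed application.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative DAO method: looks up a customer name with a parameterized query.
public class CustomerDao {
    // URL, user, and password are placeholders; in practice they come from configuration.
    private static final String URL = "jdbc:sqlserver://dbhost:1433;databaseName=customerdb";

    public String findCustomerName(int customerId) throws SQLException {
        String sql = "SELECT customer_name FROM customer WHERE customer_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "app_user", "app_password");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("customer_name") : null;
            }
        }
    }
}
```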