
Big Data Engineer Resume


Charlotte, NC

SUMMARY

  • 8+ years of IT experience with expertise in Hadoop ecosystem components such as HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark SQL, Spark Streaming, and Hive; ETL tools such as DataStage; and cloud technologies such as AWS, Azure, and Snowflake, applied to scalable, distributed, high-performance computing.
  • Experience with Spark Core and Spark SQL and their core abstraction APIs, RDDs and DataFrames.
  • Experience gathering customer requirements, writing test cases, and partnering with developers to ensure full understanding of internal/external customer needs.
  • Excellent knowledge of building, automating, and fine-tuning both batch and real-time data engineering pipelines.
  • Proficient with streaming technologies such as Kafka and Spark Streaming.
  • Expertise in documenting source-to-target data mappings and business rules associated with ETL processes.
  • Followed Agile development methodology and actively participated in daily scrum meetings.
  • Experience working with AWS services such as S3, EMR, EC2, Step Functions, and CloudWatch.
  • Experience working with AWS ElastiCache (Memcached & Redis) and NoSQL databases (HBase, Cassandra, MongoDB) for database performance tuning and data modeling.
  • Experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Expertise in writing HiveQL queries for data analytics on top of big data.
  • Created a private cloud using Kubernetes that supports DEV, TEST, and PROD environments.
  • Experience developing complex shell/Python scripts in Linux and using DevOps tools such as Jenkins, Maven, Terraform, Ansible, Docker, and Kubernetes.
  • Expertise in writing Sqoop jobs to migrate huge amounts of data from relational database systems to HDFS/Hive tables and vice versa according to client requirements.
  • Experience configuring Spark Streaming to receive real-time data from Apache Kafka and store the streamed data to HDFS, and expertise in using Spark SQL with various data sources such as JSON, Parquet, and Hive (an illustrative sketch follows this list).
  • Extensively used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data, and used DataFrame operations to perform the required validations on the data.
  • Expertise in NoSQL databases and their integration with Hadoop clusters to store and retrieve huge amounts of data.
  • Good knowledge of SQL queries and of creating database objects such as stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.
  • Extensive experience in data storage and documentation using NoSQL databases such as MongoDB and HBase, and cloud data platforms such as Snowflake.
  • Experience visualizing data using BI services and tools such as Power BI, Tableau, Plotly, and Matplotlib.
  • Experience with Azure transformation projects and implementing ETL and data solutions using Azure Data Factory (ADF), SSIS.
  • Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
  • Experience in data warehousing and business intelligence across various domains.
  • Created Tableau dashboards designed for large data volumes sourced from SQL Server.
  • Extracted, transformed, and loaded (ETL) source data into the respective target tables to build data marts.
  • Conducted gap analysis and created use cases, workflows, screenshots, and PowerPoint presentations for various data applications.
  • Active involvement in all Scrum ceremonies (Sprint Planning, Daily Scrum, Sprint Review, and Retrospective meetings) and assisted the Product Owner in creating and prioritizing user stories.
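
A minimal, hedged PySpark Structured Streaming sketch of the Kafka-to-HDFS pattern described above; the broker address, topic name, event schema, and HDFS paths are illustrative placeholders, not details from an actual engagement.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Assumed event layout; adjust to the real payload.
event_schema = StructType([
    StructField("customer_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw stream from a placeholder Kafka topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
       .option("subscribe", "customer-events")              # placeholder topic
       .load())

# Parse the JSON payload into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), event_schema).alias("e"))
             .select("e.*"))

# Land the parsed stream on HDFS as Parquet with checkpointing.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/events")             # placeholder output path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())
query.awaitTermination()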

TECHNICAL SKILLS

Hadoop/Spark Ecosystem: Hadoop, MapReduce, Pig, Hive/Impala, YARN, Kafka, Flume, Oozie, Zookeeper, Spark, Airflow

Hadoop Distribution: Cloudera distribution and Hortonworks

AWS: Amazon EC2, S3, RDS, IAM, Auto Scaling, CloudWatch, SNS, Athena, Glue, Kinesis, Lambda, EMR, Redshift, DynamoDB

Azure/Other Cloud: Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake, Azure HDInsight, GCP, OpenStack.

ETL/BI Tools: Informatica, SSIS, Tableau, Power BI, SSRS

CI/CD: Jenkins, Splunk, Ant, Maven, Gradle.

Containerization: Docker, Kubernetes

Ticketing Tools: JIRA, Service Now, Remedy

Operating Systems: Linux, Windows, Ubuntu, Unix

Database: Oracle, SQL Server, Cassandra, Teradata, PostgreSQL, Snowflake, HBase, MongoDB

Programming Languages/Frameworks: Scala, Hibernate, PL/SQL, R

Scripting: Python, Shell Scripting, JavaScript, jQuery, HTML, JSON, XML.

Web/Application Servers: Apache Tomcat, WebLogic, WebSphere

IDE Tools: Eclipse, NetBeans

Version Control: Git, Subversion, Bitbucket, TFS.

SDLC: Agile, Scrum, Waterfall, Kanban.

PROFESSIONAL EXPERIENCE

Confidential, Charlotte NC

Big Data Engineer

Responsibilities:

  • Involved in all phases of SDLC including Requirement Gathering, Design, Analysis and Testing of customer specifications, Development, and Deployment of the Application.
  • Developed data pipelines using Spark, Hive, Impala, and HBase to ingest customer behavioral data and financial histories into the Hadoop cluster for analysis.
  • Processed different kinds of data, including unstructured (logs, clickstreams, shares, likes, topics, etc.), semi-structured (XML, JSON), and structured (RDBMS) data.
  • Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
  • Involved in data ingestion of log files from various servers using NiFi.
  • Experience designing and building solutions for data ingestion in both real-time and batch modes using Sqoop, Impala, Kafka, and Spark.
  • Monitored workload, job performance, and capacity planning using Cloudera Manager.
  • Created an end-to-end Machine learning pipeline in PySpark and Python.
  • Wrote various Spark programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats, and stored the refined data in partitioned tables in the EDW.
  • Optimized the PySpark jobs to run on a Kubernetes cluster for faster data processing.
  • Built ETL/ELT pipelines using data technologies such as PySpark, Hive, Presto, and Databricks.
  • Experience in data architecture best practices, integration, and data governance solutions (data catalogs, data governance frameworks, metadata, and data quality).
  • Worked extensively with importing metadata into Hive using Python and migrated existing tables and applications to work on AWS cloud (S3).
  • Developed Python scripts to manage AWS resources through API calls using the boto3 SDK and worked with the AWS CLI (an illustrative sketch follows this list).
  • Set up the CI/CD pipelines using Maven, GitHub, and AWS.
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
  • Created UNIX shell scripts to parameterize the Sqoop and Hive jobs.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs (MapReduce, Hive, and Sqoop) as well as system-specific jobs such as Java programs and shell scripts.
  • Experienced in the successful implementation of ETL solution between an OLTP and OLAP database in support of Decision Support Systems with expertise in all phases of SDLC.
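
A hedged boto3 sketch in the spirit of the AWS automation bullet above: stage a PySpark script on S3 and submit it as an EMR step. The bucket, key, script name, and cluster id are hypothetical placeholders.

import boto3

s3 = boto3.client("s3")
emr = boto3.client("emr")

# Upload the job script to a placeholder bucket/key.
s3.upload_file("etl_job.py", "example-data-bucket", "jobs/etl_job.py")

# Submit the script as a Spark step on a placeholder EMR cluster.
response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXX",   # placeholder cluster id
    Steps=[{
        "Name": "nightly-etl",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     "s3://example-data-bucket/jobs/etl_job.py"],
        },
    }],
)
print("Submitted EMR step:", response["StepIds"][0])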

Environment: MySQL, SQL Server, Python, Spark, Kubernetes, Hive, AWS, Sqoop, Spark SQL, Kafka, Oozie, Airflow, Oracle.

Confidential, PA

Big Data Engineer

Responsibilities:

  • Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in Azure.
  • Extracted, transformed, and loaded data from source systems into Azure data storage services using a combination of Azure Data Factory, T-SQL, and U-SQL in Azure Data Lake Analytics.
  • Performed data ingestion into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Integrated Kubernetes with networking, storage, and security to provide a comprehensive infrastructure, and orchestrated Kubernetes containers across multiple hosts.
  • Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (an illustrative sketch follows this list).
  • Implemented ETL and data movement solutions using Azure Data Factory and SSIS.
  • Developed dashboards and visualizations to help business users analyze data and to provide data insights to upper management, with a focus on Microsoft products such as SQL Server Reporting Services (SSRS) and Power BI.
  • Migrated data from traditional database systems to Azure SQL databases.
  • Implemented ad-hoc analysis solutions using Azure Data Lake Analytics/Store and HDInsight.
  • Designed and implemented streaming solutions using Kafka and Azure Stream Analytics.
  • Experience managing Azure Data Lakes (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure Services. Used U-SQL for data transformation as part of a cloud data integration strategy.
  • Worked with Microsoft on-prem data platforms, specifically SQL Server and related technologies such as SSIS, SSRS, and SSAS.
  • Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment; gained experience in DWH/BI project implementation using Azure Data Factory.
  • Involved in designing logical and physical data models for the staging, DWH, and data mart layers.
  • Created Power BI visualizations and dashboards as per the requirements.
  • Pulled data into Power BI from various sources such as SQL Server, Excel, Oracle, and SQL Azure.
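
A minimal PySpark sketch in the style of the Databricks bullets above: read raw usage data from an ADLS Gen2 container, aggregate it by customer and day, and write a curated output. The storage account, container names, paths, and column names are assumptions for illustration only.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

# Placeholder ADLS Gen2 locations (abfss://<container>@<account>.dfs.core.windows.net/...).
raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/usage/"
curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/daily_usage/"

usage = spark.read.parquet(raw_path)   # assumed columns: customer_id, event_ts, duration_sec

daily_usage = (usage
               .withColumn("usage_date", F.to_date("event_ts"))
               .groupBy("customer_id", "usage_date")
               .agg(F.count("*").alias("events"),
                    F.sum("duration_sec").alias("total_duration_sec")))

daily_usage.write.mode("overwrite").parquet(curated_path)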

Environment: Azure, Scala, Hive, HDFS, Apache Spark, Kubernetes, Oozie, Sqoop, Cassandra, Shell Scripting, Power BI, MongoDB, Jenkins, UNIX, JIRA, Git.

Confidential, Mountain View, CA

Hadoop Developer

Responsibilities:

  • Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities.
  • Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
  • Created Spark jobs and Hive Jobs to summarize and transform data.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
  • Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (an equivalent sketch follows this list).
  • Used different tools for data integration with different databases and Hadoop.
  • Built real-time data pipelines by developing Kafka producers and Spark Streaming applications for consumption.
  • Ingested syslog messages, parsed them, and streamed the data to Kafka.
  • Handled importing data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the transformed data back into HDFS.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Analyzed the data by performing Hive queries (HiveQL) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Scheduled and executed workflows in Oozie to run various jobs.
  • Implemented business logic in Hive and wrote UDFs to process the data for analysis.
  • Addressed issues arising from the huge volume of data and transitions.
  • Documented operational problems by following standards and procedures using JIRA.
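
A hedged sketch of the Hive-query-to-DataFrame conversion mentioned above. The original work used Scala; this PySpark version only illustrates the same idea, and the table and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-dataframe")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent HiveQL:
#   SELECT region, COUNT(DISTINCT customer_id) AS customers
#   FROM transactions
#   WHERE txn_date >= '2020-01-01'
#   GROUP BY region
#   ORDER BY customers DESC;
result = (spark.table("transactions")                     # placeholder Hive table
          .filter(F.col("txn_date") >= "2020-01-01")
          .groupBy("region")
          .agg(F.countDistinct("customer_id").alias("customers"))
          .orderBy(F.desc("customers")))

result.write.mode("overwrite").saveAsTable("reporting.customers_by_region")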

Environment: Spark, Scala, Hive, Apache NiFi, Kafka, HDFS, Oracle, HBase, MapReduce, Oozie, Sqoop

Confidential

Data Engineer

Responsibilities:

  • Involved in gathering requirements and analyzing requirement specifications for the reports.
  • Formulated policies and procedures necessary for data management, processing, and quality assessment functions. Ensured system deliverables met business requirements by participating in identifying, designing, and testing solutions.
  • Defined specifications like use case documentation, activity diagram, and business process flow using Microsoft Visio.
  • Performed complex data analysis in support of ad-hoc, standard, and project related requests.
  • Identified and resolved data related issues.
  • Performed Data Mining to analyze the patterns of the data sets.
  • Prepared Test plans which include an introduction, various test strategies, test schedules, QA team’s role, test deliverables, etc.
  • Provided support to the development and testing teams during the lifecycle of the project.
  • Developed various types of complex reports, such as drill-down, drill-through, and cross-tab reports and Tableau scorecards.
  • Created quick filters, table calculations, calculated fields and performed conditional sorting and filtering as per the requirements.
  • Designed dashboard templates as per the requirement and dashboard content which includes complex tabular reports.
  • Utilized capabilities of Tableau such as Data extracts, Data blending, Forecasting, Dashboard actions and Table calculations.
  • Developed formulas in MS Excel with extensive use of tools such as Pivot Tables and the VLOOKUP function (an equivalent pandas sketch follows this list).
  • Identified key metrics and built dashboards in Tableau to provide business insights.
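
An equivalent pandas sketch of the Excel-style analysis described above (a VLOOKUP-style join plus a pivot-table summary); the file names and columns are hypothetical.

import pandas as pd

orders = pd.read_csv("orders.csv")           # placeholder extract
regions = pd.read_csv("region_lookup.csv")   # placeholder lookup table

# VLOOKUP equivalent: enrich orders with each customer's region.
enriched = orders.merge(regions[["customer_id", "region"]],
                        on="customer_id", how="left")

# Pivot-table equivalent: total and average order amount by region and month.
enriched["order_month"] = pd.to_datetime(enriched["order_date"]).dt.to_period("M")
summary = pd.pivot_table(enriched,
                         values="order_amount",
                         index="region",
                         columns="order_month",
                         aggfunc=["sum", "mean"])
print(summary.head())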

Environment: Tableau, Oracle DB, MS Excel, SQL Server, Erwin

Confidential

SQL Developer

Responsibilities:

  • Developed complex SQL statements to extract data and packaged/encrypted the data for delivery to customers.
  • Provided business intelligence analysis to decision-makers using an interactive OLAP tool.
  • Created T-SQL statements (SELECT, INSERT, UPDATE, DELETE) and stored procedures (an illustrative sketch follows this list).
  • Defined Data requirements and elements used in XML transactions.
  • Created Informatica mappings using various transformations such as Joiner, Aggregator, Expression, Filter, and Update Strategy.
  • Worked to ensure high levels of Data consistency between diverse source systems including flat files, XML and SQL Database.
  • Developed and ran ad-hoc data queries against multiple database types to identify systems of record, data inconsistencies, and data quality issues.
  • Performed Tableau administration using Tableau admin commands.
  • Involved in defining the source to target Data mappings, business rules and Data definitions.
  • Performed metrics reporting, data mining, and trend analysis in a helpdesk environment using Access.
  • Worked on SQL Server Integration Services (SSIS) to integrate and analyze data from multiple heterogeneous information sources.
  • Built reports and report models using SSRS to enable end user report builder usage.
  • Created Excel charts and pivot tables for the Ad-hoc Data pull.
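
A hedged Python/pyodbc sketch of the parameterized T-SQL work described above; the server, database, table, and stored procedure names are placeholders, not the actual objects.

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver01;DATABASE=SalesDW;Trusted_Connection=yes;"   # placeholder server/database
)
cursor = conn.cursor()

# Parameterized SELECT for an ad-hoc data pull.
cursor.execute(
    "SELECT customer_id, order_total FROM dbo.Orders WHERE order_date >= ?",
    "2015-01-01",
)
rows = cursor.fetchall()

# Call a hypothetical stored procedure that packages data for delivery.
cursor.execute("EXEC dbo.usp_ExportCustomerOrders @RunDate = ?", "2015-01-01")
conn.commit()

cursor.close()
conn.close()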

Environment: SQL, PL/SQL, T/SQL, XML, Informatica, Tableau, OLAP, SSIS, SSRS, Excel, OLTP.
