
Big Data Engineer Resume


Ashburn, VA

PROFESSIONAL SUMMARY:

  • 7+ years of IT experience in Hadoop and ETL across the complete software development life cycle, including requirement gathering, analysis, design, development, implementation, and maintenance of Big Data applications.
  • Experience in OLAP, OLTP, and data warehousing concepts with emphasis on ETL, reporting, and analytics; expertise in data migration, data profiling, data cleansing, transformation, integration, and data import.
  • Expertise in Microsoft Azure Cloud Services (PaaS & IaaS), Application Insights, Document DB, Internet of Things (IoT), Azure Monitoring, Key Vault, Visual Studio Online (VSO) and SQL Azure.
  • Hands-on experience working with Big Data ecosystem components such as HDFS, Hive, Sqoop, Scala, Spark, and Kafka.
  • Experience in using tools such as Sqoop, Flume, and Kafka to ingest structured, semi-structured, and unstructured data into the cluster.
  • Experience with Hadoop clusters on the Cloudera and Hortonworks (HDP) distributions, and in working with structured data using HiveQL: join operations, Hive UDFs, partitioning, bucketing, and internal/external tables.
  • Good knowledge of creating event-processing data pipelines using Kafka and Spark Streaming.
  • Good knowledge of and experience with Hive query optimization and performance tuning.
  • Hands-on experience in writing Pig Latin scripts and custom implementations using Hive and Pig UDFs.
  • Experience in using Flume to load log files into HDFS and Oozie for workflow design and scheduling.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, Scala, and Python.
  • Solid experience and understanding of implementing large-scale data warehousing programs and end-to-end (E2E) data integration solutions on Snowflake Cloud, GCP, AWS Redshift, Informatica Intelligent Cloud Services (IICS - CDI), and Informatica PowerCenter, integrated with multiple relational databases (MySQL, Teradata, Oracle, Sybase, SQL Server, DB2).
  • Experience in developing web applications using Python, Django, C++, XML, CSS, HTML, JavaScript, and jQuery.
  • Experience with data formats such as JSON, Avro, Parquet, and ORC, and implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake, and Data Factory.
  • Experience working with NoSQL database technologies such as Cassandra, MongoDB, and HBase; experience with Snowflake, Hadoop distributions (Cloudera, Hortonworks), AWS (EC2, EMR, RDS, Redshift, DynamoDB, Snowball), and Databricks (Data Factory, notebooks, etc.).
  • Built Spark jobs using PySpark to perform ETL on data in an S3 data lake (see the sketch after this list); experience in developing Spring Boot applications for transformations.
  • Experience in creating reports and dashboards in visualization tools such as Tableau, Spotfire, and Power BI; responsible for creating, debugging, scheduling, and monitoring jobs using Apache NiFi, Oozie, Luigi, and Airflow.
  • Practical understanding of data modeling (dimensional and relational) concepts such as star-schema modeling, snowflake-schema modeling, and fact and dimension tables.
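
A minimal sketch of the PySpark-on-S3 ETL pattern referenced in the summary above: it reads raw JSON from an S3 data lake, applies light cleanup, and writes partitioned Parquet back out. The bucket names, paths, and column names are illustrative placeholders, and an environment with S3 access already configured (such as EMR) is assumed.

    # Minimal PySpark ETL sketch for an S3 data lake; paths and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

    # Read raw JSON landed in the (hypothetical) raw bucket.
    raw_df = spark.read.json("s3://example-raw-bucket/events/")

    # Light cleanup: deduplicate on an assumed key and derive a partition date.
    clean_df = (
        raw_df
        .dropDuplicates(["event_id"])
        .withColumn("event_date", F.to_date("event_ts"))
        .filter(F.col("event_date").isNotNull())
    )

    # Write curated, date-partitioned Parquet to the (hypothetical) curated bucket.
    (clean_df.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://example-curated-bucket/events/"))

Partitioning the curated output by date keeps downstream Hive- or Athena-style queries pruned to only the days they need.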

TECHNICAL SKILLS:

Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS

Hadoop Ecosystem: Hadoop, MapReduce, Yarn, HDFS, Pig, Oozie, Zookeeper, Hue

Big Data Ecosystem: Spark, Spark SQL, Spark Streaming, Spark MLlib, Hive

NOSQL Databases: HBase, Cassandra, MongoDB, CouchDB

Monitoring Tools: New Relic, CloudWatch

Cloud Technologies: Amazon Web Services (EC2, S3, RDS, Redshift, QuickSight), Microsoft Azure, Google Cloud Platform (GCP), IICS

Methodologies/Tools: SDLC, Agile, Scrum, Waterfall Model

PROFESSIONAL EXPERIENCE:

Confidential, Ashburn, VA

BIG Data Engineer

Responsibilities:

  • Migrated objects using a custom ingestion framework from a variety of sources such as Oracle, SAP HANA, MongoDB, and Teradata.
  • Planned and designed the data warehouse in a star schema; designed the table structures and documented them.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS in both directions using Sqoop.
  • Designed and implemented an end-to-end big data platform on a data appliance.
  • Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Spark on Hadoop.
  • Worked on Apache Spark, utilizing the Spark SQL and Spark Streaming components to support intraday and real-time data processing.
  • Designed and built multi-terabyte, end-to-end data warehouse infrastructure from the ground up on Confidential Redshift, handling millions of records every day.
  • Wrote various data normalization jobs for new data ingested into Redshift.
  • Worked extensively on AWS components such as Elastic MapReduce (EMR).
  • Created data pipelines for ingestion and aggregation, and loaded consumer response data from an AWS S3 bucket into Hive external tables in HDFS to serve as the feed for Tableau dashboards.
  • Created monitors, alarms, and notifications for EC2 hosts using CloudWatch, CloudTrail, and SNS.
  • Configured EC2 instances, IAM users, and IAM roles, and created an S3 data pipe using the Boto API to load data from internal data sources.
  • Experience in data ingestion techniques for batch and stream processing using AWS Batch, AWS Kinesis, and AWS Data Pipeline.
  • Developed HP Vertica vSQL scripts for bulk loading and delta loading of stage and target tables using IICS Cloud Data Integration.
  • Developed and deployed Spark code on a Hadoop cluster running on GCP.
  • Created messaging queues using RabbitMQ to read and process data from HDFS.
  • Developed frontend and backend modules using Python on the Django web framework and created the UI using JavaScript, Bootstrap, and HTML5/CSS, backed by Cassandra and MySQL.
  • Worked on IBM BDWM and created an integration model for mortgage.
  • Built ETL data pipelines for data movement to S3 and then to Redshift.
  • Wrote AWS Lambda code in Python to convert, compare, and sort nested JSON files (see the sketch after this list).
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Wrote UDFs in PySpark to perform transformations and loads.
  • Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 into the Hadoop cluster.
  • Worked with ORC, Avro, JSON, and Parquet file formats; created external tables and queried on top of these files using BigQuery.
  • Experience delivering solutions using other established database technologies such as MS SQL Server, Oracle, Sybase, and IBM Information Server.
  • Performed source analysis, tracing data back to its sources and finding its roots through Teradata, DB2, etc.
  • Documented research reports describing the experiments conducted, results, and findings, and made strategic recommendations to technology, product, and senior management.
  • Worked closely with regulatory delivery leads to ensure robustness in prop trading control frameworks using Hadoop, Python Jupyter Notebook, Hive and NoSQL.
  • Implemented a Continuous Integration and Continuous Delivery process using GitLab along with Python and shell scripts to automate routine jobs, including synchronizing installers, configuration modules, packages, and requirements for the applications.
  • Worked on Informatica PowerCenter tools: Designer, Repository Manager, Workflow Manager, and Monitor.
  • Experienced in working with version control systems like Git and used source code management client tools like Git Bash, GitHub, and GitLab.
  • Checked the data and table structures in the Postgres and Redshift databases and ran queries to generate reports.
  • Migrated credit card data marts into the IDW using IBM BDW.
  • Created Snowpipe for continuous data loading from staged data residing on cloud gateway servers.
  • Developed an automated process for code builds and deployments using Jenkins, Ant, Maven, Sonatype, shell scripts, and Route 53.
  • Configured and deployed instances on GCP environments.
  • Installed and configured applications such as Docker and Kubernetes for orchestration purposes.
  • Developed an automation system using PowerShell scripts and JSON templates to remediate Azure services.
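
An illustrative sketch of the Python Lambda described above for nested JSON handling: triggered by an S3 put event, it flattens the nested document and writes the key-sorted result back under a processed/ prefix. The bucket, key, and prefix names are hypothetical.

    # Hypothetical AWS Lambda handler: flatten nested JSON arriving in S3.
    import json
    import boto3

    s3 = boto3.client("s3")

    def flatten(obj, parent_key="", sep="."):
        """Recursively flatten nested dicts into dotted keys."""
        items = {}
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else key
            if isinstance(value, dict):
                items.update(flatten(value, new_key, sep=sep))
            else:
                items[new_key] = value
        return items

    def lambda_handler(event, context):
        # Standard S3 event shape: bucket name and object key of the new file.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]

        payload = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
        flat = flatten(payload)

        # Write the flattened, key-sorted document under a processed/ prefix.
        s3.put_object(
            Bucket=bucket,
            Key=f"processed/{key.rsplit('/', 1)[-1]}",
            Body=json.dumps(flat, sort_keys=True).encode("utf-8"),
        )
        return {"status": "ok", "keys": len(flat)}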

Confidential - Miami, FL

Data Engineer with Azure

Responsibilities:

  • Responsible for Creating, Debugging, Scheduling and Monitoring jobs using Apache NiFi tool.
  • Performed data mapping between source systems and target systems, performed logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data.
  • Developed highly complex Python and Scala code that is maintainable, easy to use, and satisfies application requirements for data processing and analytics using built-in libraries.
  • Worked with the Snowflake connector stage to extract data from Snowflake and perform transformations per business logic.
  • Converted SSIS jobs to IICS jobs with the help of BRD documents and Visio flowcharts.
  • Implemented AWS Lambdas to drive real-time monitoring dashboards from system logs.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Created several types of data visualizations using Python and Tableau.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to profile data from DB2 and SQL Server database systems.
  • Explored cloud technologies such as GCP Compute Engine, Data Prep and Cloud SQL for successful implementation of data warehouse.
  • Worked on migrating to Snowflake from Oracle.
  • Worked on the design and development of ETL solutions using Azure Data Factory.
  • Used Databricks in ADF Pipeline to automate ETL Process.
  • Created Azure Databricks PySpark notebooks to convert data in storage and databases to Delta Lake tables (see the sketch after this list).
  • Hands-on experience with message brokers such as Apache Kafka and RabbitMQ.
  • Used Python and Boto3 to write scripts that automate launching, starting, and stopping EC2 instances and taking snapshots of the servers.
  • Developed Azure Data Factory and Databricks pipelines to move data from Azure Blob Storage/file shares to Azure SQL Data Warehouse and Blob Storage.
  • Worked effectively with Google Cloud big data technologies such as Dataproc, Dataflow, BigQuery, and GCP Storage, with knowledge of Pub/Sub.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics in Azure Databricks.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions.
  • Involved in creating PySpark DataFrames in Azure Databricks to read data from Data Lake or Blob Storage and used the Spark SQL context for transformations.
  • Designed, developed, and implemented ETL pipelines using PySpark and Azure Data Factory.
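
A minimal sketch of the blob-storage-to-Delta-Lake conversion mentioned above, written as a Databricks PySpark notebook cell. It assumes a Databricks runtime with Delta Lake, storage access already configured, and placeholder container, path, schema, and table names; the spark session object is predefined in a Databricks notebook.

    # Databricks notebook cell (sketch): CSV in blob/ADLS storage -> Delta table.
    from pyspark.sql import functions as F

    source_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/"  # hypothetical

    raw_df = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv(source_path))

    # Stamp each load so downstream consumers can track refresh dates.
    curated_df = raw_df.withColumn("load_date", F.current_date())

    # Assumes the 'curated' schema already exists; the table name is hypothetical.
    (curated_df.write
        .format("delta")
        .mode("overwrite")
        .saveAsTable("curated.sales_delta"))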

Confidential - Charlotte, NC

Associate Data Engineer with AWS

Responsibilities:

  • Involved in data mapping specifications defining what data will be extracted from an internal data warehouse, transformed, and sent to an external entity, and created and executed detailed system test plans.
  • Responsible for building the ETL process to create the data backend for a high-impact project, the Claims Analytics Engine.
  • Gathered data from multiple raw files from varied sources, analyzed it for issues, and brought heterogeneous data (a volume of 100+ million records) onto a homogeneous platform.
  • Ingested and standardized insurance claims data across ten major insurance plans, including Medicare, Medicaid, UHC, etc.
  • Built an SSAS OLAP cube allowing non-technical business leaders to access customizable analytics on high-level revenue data.
  • Carried out ad hoc data analysis on Atrium market share in various geographic regions and medical domains, e.g., opportunity and market share for cardiac surgery patients.
  • Analyzed the diversification of new revenue streams proposed by BCBS and confirmed it as high risk.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Converted Conceptual Model to the Logical and Physical model and reviewed it with internal data architects, business analysts and business users.
  • Created the test environment and loaded the Staging area with data from multiple sources.
  • Worked in the Hadoop ecosystem and used tools such as Sqoop and Hive to pull data from multiple source systems and query it (see the sketch after this list).
  • Tested the ETL process both before and after the data validation.
  • Tested the database to check field-size validation, check constraints, and stored procedures, cross-verifying the field sizes defined within the application against the metadata.
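
An illustrative PySpark snippet for querying the Sqoop-ingested claims data in Hive, as referenced above. Hive support on the cluster is assumed, and the database, table, and column names are hypothetical.

    # Sketch: query a Hive table of ingested claims data via Spark SQL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("claims-hive-query")
             .enableHiveSupport()
             .getOrCreate())

    claims_by_plan = spark.sql("""
        SELECT plan_name, COUNT(*) AS claim_count
        FROM staging.insurance_claims      -- hypothetical Hive table
        GROUP BY plan_name
        ORDER BY claim_count DESC
    """)

    claims_by_plan.show(truncate=False)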

Confidential

Data Analyst/ Data Modeler

Responsibilities:

  • Extracted a large volume of employee activity data from the company's in-house tool to Power BI using SQL queries.
  • Created weekly KPIs and SLAs for senior management to measure employee productivity and client satisfaction rates against targets using Power BI dashboards, resulting in time savings of up to 10%.
  • Extracted data from SAP to MS Excel using SQL queries to create balance sheets and income statements for more than 10 clients.
  • Worked on building the data model using Erwin as per the requirements. Designed the grain of facts depending on reporting requirements.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using star and snowflake Schemas.
  • Good exposure to other cloud providers GCP and Azure and utilized Azure Databricks for learning and experimentation.
  • Gathered all the sales analysis report prototypes from business analysts belonging to different business units; participated in JAD sessions involving the discussion of various reporting needs.
  • Scripted stored procedures and user-defined scalar functions to be used in the SSIS packages and SQL scripts.
  • Generated monthly and quarterly reports as bar charts for the management using different techniques like data filtering, adding interactivity, deploying reports to report server using SSRS.
  • Generated SQL queries to extract consumer and sales data for a key client operating in different geographies with more than 140 product offerings.
  • Identified key issues around customer churn rate, purchasing behaviour, seasonal impact on product margins in various geographies.
  • Performed ad hoc financial analysis, provided year-wise performance reports, and presented them to the client for decision making.
  • Produced operational reports in SSRS, i.e., drill-down, drill-through, dashboard, and matrix reports; responsible for ETL through SSIS and loading data into the DB from different input sources.
