Senior Data Engineer Resume

SUMMARY

  • IT professional with 8 years of experience specializing in the Big Data ecosystem: data acquisition, ingestion, modeling, storage, analysis, integration, and processing.
  • Experience in all phases of the Software Development Life Cycle (SDLC), including analysis, design, development, testing, and deployment of systems spanning the big data and Hadoop ecosystem, Machine Learning, Cloud Computing, Business Intelligence, and ETL/ELT.
  • Strong programming skills in Scala, Java, R, and Python.
  • Experience using Python IDEs and editors such as PyCharm, Sublime Text, and IDLE.
  • Experienced in data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing.
  • In-depth knowledge of the Spark ecosystem and architecture, developing production-ready Spark applications using Spark Core, Spark Streaming, Spark SQL, DataFrames, Datasets, and Spark ML.
  • Experience with Confidential Kinesis to stream, analyze, and process real-time logs from Apache application servers, and with Confidential Kinesis Firehose to store the processed log files in a Confidential S3 bucket (a producer sketch follows this list).
  • Experienced in designing and maintaining relational databases using MySQL, MS SQL Server and PostgreSQL.
  • Experience working with various AWS services such as EC2, S3, ELB, Auto Scaling, Route 53, SNS, SES, CloudWatch, RDS, DynamoDB, VPC, ElastiCache, Elasticsearch, CloudFormation, CloudFront, and ECS.
  • Strong experience in spinning up AWS infrastructure using Terraform and CloudFormation templates.
  • Expertise in analysis, design, and implementation of Data Warehousing/BI solutions using tools such as Tableau, Alteryx, Power BI, SQL Server, Confidential, DataStage, and Hadoop.
  • Strong working experience with SQL and NoSQL databases, data modeling and data pipelines. Involved in end-to-end development and automation of ETL pipelines using SQL and Python.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Created and maintained user accounts, profiles, network security and security groups, using AWS-IAM.
  • Very strong knowledge of ETL tools, including Informatica PowerCenter and BDM, Confidential DataStage, and SSIS, as well as the reporting tools SSRS, Tableau, and Power BI.
  • Built an ETL decision-making engine using Spark, Hive, Scala, and Unix shell scripts, with front-end dashboards built on Elasticsearch and Kibana.
  • Experience in all phases of Data Warehouse development like requirements gathering, design, development, implementation, testing, and documentation.
  • Hands-on working experience with RESTful APIs, API lifecycle management, and consuming RESTful services.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MapReduce, and Spark.
  • Good knowledge of NoSQL databases, including Apache Cassandra, MongoDB, DynamoDB, CouchDB, and Redis.
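
Illustrative sketch of the Kinesis log-streaming pattern above, written in Python with boto3. The stream name, region, and record shape are assumptions for illustration only, and the Firehose delivery stream that lands processed files in S3 is assumed to be configured separately.

    import json
    import boto3

    # Hypothetical stream name and region; Firehose delivery to S3 is set up
    # outside this script.
    kinesis = boto3.client("kinesis", region_name="us-east-1")

    def ship_log_line(log_line: str) -> None:
        """Push one Apache access-log line into the Kinesis stream."""
        kinesis.put_record(
            StreamName="apache-access-logs",
            Data=json.dumps({"raw": log_line}).encode("utf-8"),
            # Partition by client IP (first field of the log line) so records
            # spread across shards.
            PartitionKey=log_line.split(" ")[0],
        )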

PROFESSIONAL EXPERIENCE

Senior Data Engineer

Confidential

Responsibilities:

  • Installed and configured Apache Hadoop big data components such as HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Pig, Ambari, and NiFi.
  • Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building the common learner data model, which consumes data from Kinesis in near real time and persists it into Cassandra.
  • Wrote Terraform code to spin up AWS infrastructure (EC2, EMR, S3, RDS) with the required security controls.
  • Wrote Python scripts to parse Confidential documents and load the data into a PostgreSQL database.
  • Installed applications on AWS EC2 instances and configured storage on S3 buckets.
  • Exported the analyzed data to Redshift using Spark so the BI team could visualize it and generate reports.
  • Developed a PySpark framework implementing an ETL architecture that ingests raw data and stores structured data in the Hadoop cluster.
  • Designed and developed automation for Spark, Unix, and Python scripts using Airflow DAGs (a minimal DAG sketch follows this list).
  • Experienced in data modeling techniques employing data warehousing concepts such as star, snowflake, and extended star schemas.
  • Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Worked extensively with SparkContext, Spark SQL, RDD transformations and actions, and DataFrames.
  • Managed, maintained, and optimized SQL Server and PostgreSQL in Windows and Linux environments.
  • Managed Confidential Web Services (AWS) infrastructure with automation and configuration management tools such as Ansible, Puppet, and custom-built tooling.
  • Designed cloud-hosted solutions, drawing on experience with the specific AWS product suite.
  • Used Power BI Gateways to keep dashboards and reports up to date, published Power BI reports to the required organizations, and made Power BI dashboards available in web clients and mobile apps.
  • Implemented Confidential Kinesis to stream, analyze, and process real-time logs from Apache application servers and Confidential Kinesis Firehose to store the processed log files in a Confidential S3 bucket.
  • Stored and retrieved data from data warehouses using Confidential Redshift.
  • Developed custom ETL solutions, batch processing and real-time data ingestion pipeline to move data in and out of Hadoop using PySpark and shell scripting.
  • Improved performance by writing user-defined functions (UDFs) with PySpark functions and Spark SQL DataFrames (a UDF sketch follows this list).
  • Installed Airflow and created a database in PostgreSQL to store metadata from Airflow.
  • Sound experience building production ETL pipelines between several source systems and the Enterprise Data Warehouse by leveraging Informatica PowerCenter, SSIS, SSAS, and SSRS.
  • Migrated from JMS Solace to Apache Kafka, using ZooKeeper to manage synchronization, serialization, and coordination across the cluster.
  • Enhance existing data integration/ ETL processes as per the business needs for claims processing in Facets.
  • Worked as an AWS Glue application developer alongside the Redshift DBA to support the PDM application.
  • Built the ETL decision-making engine using Spark, Hive, Scala, and Unix shell scripts; the front-end dashboards were built using Elasticsearch and Kibana.
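
A minimal Airflow DAG sketch for the Spark/Unix/Python orchestration described above. The DAG id, schedule, and script paths are hypothetical, and Airflow 2.x operator imports are assumed.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Hypothetical daily pipeline: shell ingest step followed by a Spark job.
    with DAG(
        dag_id="daily_spark_etl",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        ingest = BashOperator(
            task_id="ingest_raw_files",
            bash_command="bash /opt/etl/ingest_raw.sh {{ ds }}",
        )
        transform = BashOperator(
            task_id="spark_transform",
            bash_command="spark-submit /opt/etl/transform.py --run-date {{ ds }}",
        )
        # Run the Spark step only after ingestion succeeds.
        ingest >> transform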
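
A minimal PySpark UDF sketch for the DataFrame work described above; the column name and masking rule are purely illustrative.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf_sketch").getOrCreate()

    # Hypothetical masking rule applied to an illustrative email column.
    @udf(returnType=StringType())
    def mask_email(email):
        if email is None:
            return None
        user, _, domain = email.partition("@")
        return user[:2] + "***@" + domain

    df = spark.createDataFrame([("alice@example.com",), ("bob@example.org",)], ["email"])
    df.withColumn("email_masked", mask_email("email")).show(truncate=False)

Row-at-a-time Python UDFs add serialization overhead, so built-in Spark SQL functions or pandas UDFs are usually the faster choice when they can express the same logic.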

Environment: Hadoop, MapReduce, HDFS, Hive, PySpark, Cassandra, Sqoop, Oozie, SQL, Power BI, PostgreSQL, Kafka, Spark, Microservices, Scala, Java, Python 3.6, AWS, .Net, Airflow, GitHub, Redshift, Talend Big Data Integration, Solr, Impala, AWS CloudFormation templates, AWS RDS, AWS Kinesis, AWS CloudWatch.

Data Engineer

Confidential, Harrisburg, PA

Responsibilities:

  • Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data across different sources and targets, including write-back scenarios.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Used AWS Lambda to run queries from Python against MySQL via the Python MySQL connector package (a minimal handler sketch follows this list).
  • Responsible for updating and maintaining Terraform scripts, cluster deployment manifest, Docker images.
  • Used Apache Airflow for authoring and orchestrating big data workflows.
  • Worked on AWS EC2 instance creation, setting up AWS VPCs, and launching EC2 instances into different private and public subnets based on the requirements of each application.
  • Documented logical, physical, relational and dimensional data models. Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Experienced in implementing Microservices and Service-Oriented Architecture (SOA) with XML-based web services (SOAP/UDDI/WSDL) using top-down and bottom-up approaches.
  • Designed and deployed data pipelines using AWS services such as EMR, AWS DynamoDB, Lambda, Glue, EC2, S3, RDS, EBS, Elastic load Balancer (ELB), Auto-scaling groups.
  • Created serverless ETL workflows in the cloud using AWS Glue, the Glue Data Catalog, S3, RDS, CloudWatch, and Lambda (a Glue job skeleton follows this list).
  • Leveraged Confidential Web Services such as EC2, RDS, EBS, ELB, Auto Scaling, AMI, and IAM through the AWS console and API integration.
  • Worked on utilizing AWS cloud services like S3, EMR, Redshift, Athena and Glue Meta store.
  • Developed and deployed Infrastructure as Code (IaC) using Terraform scripts to provision services such as AWS VPC, Elasticsearch, AWS RDS (MySQL, Confidential, Postgres), and S3 buckets.
  • Administered, optimized, and ensured the availability of PostgreSQL and MariaDB.
  • Designed and developed Spark code using Python, PySpark, and Spark SQL for high-speed data processing to meet critical business requirements.
  • Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
  • Deployed and tested (CI/CD) our developed code using Visual Studio Team Services (VSTS).
  • Analyzed the SQL scripts and designed the solution to implement them using Python.
  • Facilitated in developing testing procedures, test cases and User Acceptance Testing (UAT).
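
A minimal sketch of the Lambda-to-MySQL pattern above, assuming mysql-connector-python is bundled with the deployment package; the environment variable names, table, and query are illustrative.

    import os
    import mysql.connector  # mysql-connector-python

    def lambda_handler(event, context):
        # Connection details come from environment variables configured on the
        # function; the claims table and query below are placeholders.
        conn = mysql.connector.connect(
            host=os.environ["DB_HOST"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
            database=os.environ["DB_NAME"],
        )
        try:
            cur = conn.cursor(dictionary=True)
            cur.execute("SELECT id, status FROM claims WHERE status = %s", ("OPEN",))
            rows = cur.fetchall()
            cur.close()
            return {"count": len(rows), "rows": rows}
        finally:
            conn.close()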
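
A skeleton of a serverless Glue ETL job along the lines described above; the catalog database, table name, dropped column, and S3 path are placeholders.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions

    # Standard Glue job bootstrap.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog, drop a junk column, write Parquet to S3.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="claims_raw"
    )
    cleaned = source.drop_fields(["_corrupt_record"])
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/claims/"},
        format="parquet",
    )
    job.commit()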

Environment: Java, Spark, YARN, Hive, Pig, Scala, .Net, Mahout, NiFi, Python, Apache Airflow, Redshift, Hadoop, DynamoDB, Kibana, NoSQL, Sqoop, MySQL, PostgreSQL, AWS.

Data Analyst

Confidential

Responsibilities:

  • Conducted design discussions and meetings to arrive at the appropriate data model.
  • Used Agile methodologies with Scrum stories and sprints in a Python-based environment.
  • Conducted statistical analysis on healthcare data using Python and various tools (a brief analysis sketch follows this list).
  • Designed and developed Confidential and PL/SQL code, along with data import/export, data conversion, and data cleansing routines.
  • Worked as a data warehouse specialist, actively managing the project and keeping it on track for delivery.
  • Developed a star- and snowflake-schema-based dimensional model to build the data warehouse.
  • Designed and deployed data pipelines using AWS services such as EMR, AWS DynamoDB, Lambda, Glue, EC2, S3, RDS, EBS, Elastic load Balancer (ELB), Auto-scaling groups.
  • Actively participated in data mapping activities for the data warehouse.
  • Developed XML configuration files and properties files used in the Struts Validator framework to validate form inputs on the server side.
  • Used SOAP as an XML-based protocol for web service operation invocation.
  • Involved in deployment of application on WebLogic Application Server in Development & QA environment.
  • Developed web applications with rich internet application features using Java applets, Silverlight, and Java.
  • Played a key role in the high-level design for the implementation of the application.
  • Designed and established the process and mapping the functional requirement to the workflow process.
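
A brief sketch of the kind of statistical analysis described above, using pandas and SciPy; the file, cohort labels, and column names are illustrative only.

    import pandas as pd
    from scipy import stats

    # Illustrative healthcare extract with a cohort label and length of stay.
    df = pd.read_csv("claims_sample.csv")

    # Per-cohort summary statistics.
    print(df.groupby("cohort")["length_of_stay"].describe())

    # Welch's t-test comparing length of stay between two cohorts.
    a = df.loc[df["cohort"] == "A", "length_of_stay"].dropna()
    b = df.loc[df["cohort"] == "B", "length_of_stay"].dropna()
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}")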

Environment: OLAP, OLTP, XML, SQL, SSIS, PL/SQL, SAS, SSRS, UNIX, Excel, ETL, Tableau, Java, Servlets, Java Beans, JSP, EJB, J2EE, Struts, XSLT, JavaScript, HTML, CSS, Spring 3.2, MS Visio, Eclipse, JDBC, Windows XP, AWS.
