
Data Engineer Resume


TX

SUMMARY

  • Data Engineer with 7+ years of experience building data-intensive applications and creating pipelines using Python and shell scripting, with extensive knowledge of Amazon Web Services (AWS) and hands-on experience in Data Extraction, Transformation and Loading (ETL); a minimal example of such a pipeline step is sketched after this list.
  • 6+ years of experience in software development, including design and development of enterprise and web-based applications.
  • Hands-on technical experience in Python, Java, Q++ (MasterCraft), DB2 SQL, and R programming, with primary exposure to the P&C Insurance domain.
  • Experience with Amazon Web Services (Amazon EC2, Amazon S3, Amazon RDS, Amazon Elastic Load Balancing, Amazon SQS, AWS Identity and access management, Amazon SNS, AWS Cloud Watch, Amazon EBS, Amazon CloudFront, VPC, DynamoDB, Lambda and Redshift)
  • Experience in using python integrated IDEs like PyCharm, Sublime Text, and IDLE.
  • Experience in developing web applications and implementing Model View Control (MVC) architecture using server-side applications Django and Flask.
  • Working knowledge of Kubernetes to deploy, scale, load balance, and manage Docker containers.
  • Good knowledge in Data Extraction, Transforming and Loading (ETL) using various tools such as SQL Server Integration Services (SSIS), Data Transformation Services (DTS).
  • Experience in Database Design and development with Business Intelligence using SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), OLAP Cubes, Star Schema and Snowflake Schema.
  • Data ingestion into Azure services and processing of the data in Azure Databricks.
  • Creating and enhancing CI/CD pipeline to ensure Business Analysts can build, test, and deploy quickly.
  • Building Data Warehouse using Star and Snowflake schemas.
  • Extensive knowledge on Exploratory Data Analysis, Big Data Analytics using Spark, Predictive analysis using Linear and Logistic Regression models and good understanding in supervised and unsupervised algorithms.
  • Worked on different statistical techniques like Linear/Logistic Regression, Correlational Tests, ANOVA, Chi-Square Analysis, K-means Clustering.
  • Hands-on experience visualizing data using Power BI, Tableau, R (ggplot), and Python (pandas, Matplotlib, NumPy, SciPy).
  • Integrating Azure Databricks with Power BI and creating dashboards.
  • Good knowledge of writing Data Analysis Expressions (DAX) in the Tabular data model.
  • Hands-on knowledge of designing database schemas and achieving normalization.
  • Proficient in all phases of Software Development Life Cycle (SDLC) including Requirements gathering, Analysis, Design, Reviews, Coding, Unit Testing, and Integration Testing.
  • Well versed with Scrum methodologies.
  • Analyzed the requirements and developed Use Cases, UML Diagrams, Class Diagrams, Sequence Diagrams, and State Machine Diagrams.
  • Excellent communication and interpersonal skills with ability in resolving complex business problems.
  • Direct interaction with clients and business users across different locations to resolve critical issues.
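
A minimal sketch of the kind of Python ETL pipeline step summarized above, assuming a simple CSV extract staged in S3; the bucket, key, and column names are hypothetical.

    # Extract a raw CSV from S3, apply a simple transformation, and load the
    # curated result back to S3. All bucket/key/column names are hypothetical.
    import io

    import boto3
    import pandas as pd

    s3 = boto3.client("s3")

    # Extract: pull the raw extract from S3.
    obj = s3.get_object(Bucket="example-raw-bucket", Key="orders/2020-01-01.csv")
    orders = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Transform: basic cleansing and a per-customer aggregation.
    orders = orders.dropna(subset=["order_id", "amount"])
    daily_totals = (orders.groupby("customer_id", as_index=False)["amount"]
                    .sum()
                    .rename(columns={"amount": "daily_total"}))

    # Load: write the curated output back to S3 for downstream warehousing.
    buffer = io.StringIO()
    daily_totals.to_csv(buffer, index=False)
    s3.put_object(Bucket="example-curated-bucket",
                  Key="orders_daily/2020-01-01.csv",
                  Body=buffer.getvalue())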

TECHNICAL SKILLS

Languages: Python, Java, R, Q++, C, C++

Tools: PyCharm, Visual Studio, RStudio, Power BI, Tableau, SAS Studio, Gephi, Eclipse, PuTTY, Mainframes, Excel, Jupyter Notebook, Azure Databricks

Operating System: Windows, Unix, Linux

Databases: Oracle, MySQL, SQL Server, PostgreSQL, NoSQL (MongoDB)

Methodologies: Waterfall, Agile

ETL Tools: Apache Airflow, Informatica, SSIS (SQL Server Integration Services)

Cloud Services: Amazon Web Services (AWS)

PROFESSIONAL EXPERIENCE

Confidential, TX

Data Engineer

Responsibilities:

  • Worked as a Sr. Data Engineer with Big Data and Hadoop ecosystem components.
  • Involved in converting Hive/SQL queries into Spark transformations using Scala.
  • Created Spark data frames using Spark SQL and prepared data for data analytics by storing it in AWS S3.
  • Responsible for loading data from Kafka into HBase using REST API.
  • Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations in Scala using the Spark framework.
  • Used Spark Streaming APIs to perform transformations and actions on the fly to build a common learner data model that consumes data from Kafka in near real time and persists it to HBase (see the sketch after this section).
  • Created Sqoop scripts to import and export customer profile data from RDBMS to S3 buckets.
  • Developed various enrichment applications in Spark using Scala for cleansing and enrichment of clickstream data with customer profile lookups.
  • Troubleshooting Spark applications for improved error tolerance and reliability.
  • Used Spark Data frame and Spark API to implement batch processing of Jobs.
  • Used Apache Kafka and Spark Streaming to ingest data from Adobe Live Stream REST API connections.
  • Automated creation and termination of AWS EMR clusters.
  • Worked on fine-tuning and performance enhancements of various Spark applications and Hive scripts.
  • Used Spark concepts such as broadcast variables, caching, and dynamic allocation to design more scalable Spark applications.
  • Identified source systems, their connectivity, related tables, and fields; ensured data suitability for mapping; prepared unit test cases; and supported the testing team in fixing defects.
  • Defined HBase tables to store various data formats of incoming data from different portfolios.
  • Developed the verification and control process for daily data loading.
  • Involved in daily production support to monitor and troubleshoot Hive and Spark jobs.

Environment: AWS EMR, S3, Spark, Hive, Sqoop, Scala, MySQL, Oracle DB, Athena, Redshift
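
A minimal PySpark Structured Streaming sketch of the Kafka-to-storage pattern described in this section; the original work was implemented in Scala and persisted to HBase via a REST API, and the broker, topic, schema, and S3 paths below are hypothetical.

    # Consume learner events from Kafka in near real time, parse them, and
    # persist micro-batches to S3 (a stand-in for the HBase sink).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("learner-events").getOrCreate()

    schema = StructType([
        StructField("learner_id", StringType()),
        StructField("event_type", StringType()),
        StructField("event_ts", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
           .option("subscribe", "learner-events")             # hypothetical topic
           .load())

    # Parse the JSON payload and keep only well-formed records.
    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("e"))
              .select("e.*")
              .filter(col("learner_id").isNotNull()))

    # Persist each micro-batch as Parquet with checkpointing for fault tolerance.
    query = (events.writeStream
             .format("parquet")
             .option("path", "s3a://example-bucket/learner-events/")
             .option("checkpointLocation", "s3a://example-bucket/checkpoints/")
             .start())
    query.awaitTermination()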

Confidential, IL

Data Engineer

Responsibilities:

  • Extensively worked in Sqoop to migrate data from RDBMS to HDFS.
  • Ingested data from various source systems like Teradata, MySQL, Oracle databases.
  • Developed Spark application to perform Extract Transform and load using Spark RDD and Data frames.
  • Created Hive external tables on top of data in HDFS and wrote ad-hoc Hive queries to analyze the data based on business requirements.
  • Utilized partitioning and bucketing in Hive to improve query processing times (a sketch of this pattern follows this section).
  • Performed incremental data ingestion using Sqoop, as the existing application generates data on a daily basis.
  • Migrated/reimplemented Map Reduce jobs to Spark applications for better performance.
  • Handled data in different file formats like Avro and Parquet.
  • Extensively used Cloudera Hadoop distributions within the project.
  • Used Git for maintaining and versioning the code.
  • Created Oozie workflows to automate the data pipelines.
  • Involved in a fully automated CI/CD pipeline process using GitHub and Jenkins.
  • Used Cloudera Manager for installation and management of Hadoop Cluster.
  • Exported data from the HDFS environment into an RDBMS using Sqoop for report generation and visualization purposes.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke MapReduce jobs in the backend. Experienced in handling large datasets using partitions, Spark in-memory capabilities, Spark broadcasts, and effective, efficient joins.
  • Worked on designing and deploying a Hadoop cluster and different big data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Impala, and Cassandra, with the Hortonworks distribution.
  • Utilized the Apache Hadoop environment provided by Cloudera; monitored and debugged Spark jobs running on a Spark cluster using Cloudera Manager.
  • Wrote Hive SQL queries for ad-hoc data analysis to meet business requirements.
  • Delivered unit test plans and was involved in unit testing and documentation.

Environment: Cloudera (CDH 5.x), Spark, Scala, Sqoop, Oozie, Hive, HDFS, MySQL, Oracle DB, Teradata, Linux, Shell Scripting.
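
A minimal PySpark sketch of the Hive partitioning and bucketing pattern noted above; the table, column, and path names are hypothetical, and the point is only to show how partition pruning and bucketing cut query times.

    # Write an ingested transactions dataset as a partitioned, bucketed Hive
    # table, then run a partition-pruned ad-hoc query. All names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Assume transactions were landed in HDFS (e.g. via Sqoop) as Parquet.
    txns = spark.read.parquet("hdfs:///data/staging/transactions")

    # Partition by transaction date and bucket by customer_id so queries that
    # filter on txn_date and join on customer_id scan far less data.
    (txns.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .bucketBy(32, "customer_id")
     .sortBy("customer_id")
     .format("parquet")
     .saveAsTable("analytics.transactions_bucketed"))

    # Ad-hoc analysis then prunes partitions automatically.
    daily = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM analytics.transactions_bucketed
        WHERE txn_date = '2019-06-01'
        GROUP BY customer_id
    """)
    daily.show(10)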

Confidential, CA

Data Engineer

Responsibilities:

  • Involved in Analysis, Design, and Implementation/translation of Business User requirements.
  • Worked on collection of large sets of Structured and Unstructured data using Python Script.
  • Worked on creating DL algorithms using LSTM and RNN.
  • Actively involved in designing and developing data ingestion, aggregation, and integration in the Hadoop environment.
  • Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
  • Experience in creating Hive Tables, Partitioning and Bucketing.
  • Performed data analysis and data profiling using complex SQL queries on various source systems including Oracle 10g/11g and SQL Server 2012.
  • Identified inconsistencies in data collected from different sources.
  • Designed object model, data model, tables, constraints, necessary stored procedures, functions, triggers, and packages for Oracle Database.
  • Wrote Spark applications for Data validation, cleansing, transformations, and custom aggregations.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Worked on installing cluster, commissioning & decommissioning of Data node, Name node high availability, capacity planning, and slots configuration.
  • Developed Spark applications for teh entire batch processing by using Scala.
  • Stored the transformed time-series data from the Spark engine, built on top of a Hive platform, in Amazon S3 and Redshift (see the sketch after this section).
  • Facilitated deployment of multi-clustered environments using AWS EC2 and EMR, as well as deploying Docker containers for cross-functional deployment.
  • Visualized the results using Tableau dashboards; the Python Seaborn library was used for data interpretation in deployment.
  • Created PDF reports using Golang and XML documents to send to all customers at the end of each month.
  • Applied various data mining techniques: Linear Regression & Logistic Regression, classification, clustering.

Environment: R, SQL Server, Oracle, HDFS, HBase, AWS, MapReduce, Hive, Impala, Pig, Sqoop, NoSQL, Tableau, RNN, LSTM, Unix/Linux.
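
A minimal PySpark sketch of the batch cleanse-aggregate-store flow described in this section; the column names, S3 buckets, and hourly aggregation are hypothetical, and the original applications were written in Scala.

    # Validate and cleanse raw time-series readings, aggregate them hourly, and
    # persist the result to S3 as Parquet for a downstream Redshift COPY.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("timeseries-batch").getOrCreate()

    raw = spark.read.parquet("s3a://example-raw/readings/")  # hypothetical path

    # Validation and cleansing: drop malformed rows and duplicates.
    clean = (raw
             .filter(F.col("device_id").isNotNull() & (F.col("value") >= 0))
             .dropDuplicates(["device_id", "reading_ts"]))

    # Custom aggregation: hourly averages per device.
    hourly = (clean
              .withColumn("hour", F.date_trunc("hour", F.col("reading_ts")))
              .groupBy("device_id", "hour")
              .agg(F.avg("value").alias("avg_value"),
                   F.count("*").alias("n_readings"))
              .withColumn("dt", F.to_date("hour")))

    # Persist partitioned by date; Redshift can then load the same files via COPY.
    (hourly.write
     .mode("overwrite")
     .partitionBy("dt")
     .parquet("s3a://example-curated/readings_hourly/"))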

Confidential

Python Developer

Responsibilities:

  • Creating web-based applications using Python on the Django framework for data processing.
  • Implementing the preprocessing procedures along with deployment using AWS services and creating virtual machines using EC2.
  • Good knowledge of exploratory data analysis; performed data wrangling and data visualization.
  • Validating the data to check for proper conversion, identifying and cleaning unwanted data, and profiling data for accuracy, completeness, and consistency.
  • Preparing standard reports, charts, graphs, and tables from a structured data source by querying data repositories using Python and SQL.
  • Developed and produced dashboards and key performance indicators to monitor organizational performance.
  • Defined data needs, evaluated data quality, and extracted/transformed data for analytic projects and research.
  • Used the Django framework for application development. Designed and maintained databases using Python and developed a Python-based RESTful API (web service) using Flask, SQLAlchemy, and PostgreSQL (see the sketch after this section).
  • Worked on server-side applications using Python programming.
  • Performed efficient delivery of code and continuous integration in line with Agile principles.
  • Experience with Agile methodologies, Scrum stories, and sprints in a Python-based environment.
  • Importing and exporting data between different data sources using SQL Server Management Studio.
  • Maintaining program libraries, user manuals, and technical documentation.

Environment: Python, Django, RESTful web services, MySQL, PostgreSQL, Visio, SQL Server Management Studio, AWS.
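
A minimal Flask + SQLAlchemy sketch of the kind of Python RESTful web service described above; the model, routes, and connection string are hypothetical, not the original application.

    # A tiny REST API backed by PostgreSQL via Flask-SQLAlchemy. All names and
    # the connection string are hypothetical.
    from flask import Flask, jsonify, request
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost/reports"
    db = SQLAlchemy(app)

    class Report(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(120), nullable=False)
        metric = db.Column(db.Float, nullable=False)

    @app.route("/reports", methods=["GET"])
    def list_reports():
        # Return all stored reports as JSON.
        return jsonify([{"id": r.id, "name": r.name, "metric": r.metric}
                        for r in Report.query.all()])

    @app.route("/reports", methods=["POST"])
    def create_report():
        # Persist a new report row from the JSON request body.
        payload = request.get_json()
        report = Report(name=payload["name"], metric=payload["metric"])
        db.session.add(report)
        db.session.commit()
        return jsonify({"id": report.id}), 201

    if __name__ == "__main__":
        with app.app_context():
            db.create_all()
        app.run(debug=True)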
