
Data Engineer Resume


Dublin, OH

SUMMARY

  • 8+ years of professional IT experience in project development, implementation, deployment, and maintenance using Big Data technologies, designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, Scala, YARN, Kafka, Pig, Hive, Sqoop, Flume, Oozie, Impala, HBase, Spark integration with Cassandra, Avro, Solr, and Zookeeper.
  • 7+ years of experience as a developer using Big Data technologies such as Databricks/Spark and the Hadoop ecosystem.
  • Hands-on experience with Unified Data Analytics on Databricks, the Databricks Workspace user interface, managing Databricks notebooks, Delta Lake with Python, and Delta Lake with Spark SQL (see the Delta Lake sketch after this list).
  • Good understanding of Spark architecture with Databricks and Structured Streaming.
  • Set up Databricks on AWS and Microsoft Azure, set up Databricks Workspaces for business analytics, and managed clusters in Databricks.
  • Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step Functions, CloudWatch, SNS, DynamoDB, and SQS.
  • Proficiency in multiple databases, including MongoDB, MySQL, Oracle, and MS SQL Server.
  • Worked as team JIRA administrator, providing access, working assigned tickets, and teaming with project developers to test product requirements/bugs/new improvements.
  • Created Snowflake schemas by normalizing the dimension tables as appropriate and creating a sub-dimension named Demographic as a subset of the Customer dimension.
  • Hands-on experience in test-driven development (TDD), behavior-driven development (BDD), and acceptance-test-driven development (ATDD) approaches.
  • Worked with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance, and designed a star schema in BigQuery.
  • Provided full life-cycle support for logical/physical database design, schema management, and deployment. Adept at the database deployment phase, with strict configuration management and controlled coordination with different teams.
  • Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
  • Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy. Experience in creating and running Docker images with multiple microservices.
  • Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
  • Skilled in data parsing, data ingestion, data manipulation, data architecture, data modeling, and data preparation, with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
  • Experienced in building automated regression scripts for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python.
  • Excellent communication skills; work successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
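
A minimal sketch of the Delta Lake with Python and Spark SQL usage referenced above, as it might run in a Databricks notebook; the storage paths and table name are illustrative assumptions, not taken from the resume:

```python
# Minimal Delta Lake sketch for a Databricks notebook.
# The mount point, paths, and table name below are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

# Land an incoming batch of JSON records as a Delta table.
raw_df = spark.read.json("/mnt/raw/customers/")
raw_df.write.format("delta").mode("overwrite").save("/mnt/delta/customers")

# Expose the Delta location to Spark SQL and query it.
spark.sql("CREATE TABLE IF NOT EXISTS customers USING DELTA LOCATION '/mnt/delta/customers'")
spark.sql("SELECT country, COUNT(*) AS customer_count FROM customers GROUP BY country").show()
```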

TECHNICAL SKILLS

Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, HBase, YARN, Kafka, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Elasticsearch, MongoDB, Avro, Storm, Parquet, Snappy, AWS

Cloud Technologies: AWS, Azure, Google cloud platform (GCP)

IDEs: IntelliJ, Eclipse, Spyder, Jupyter.

Databases & Warehouses: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server, HBase, NoSQL, MS Access, Teradata

Programming / Query Languages: Java, SQL, Python, NoSQL, PySpark, PL/SQL, Linux shell scripts, Scala.

Data Engineer/Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, Zookeeper, AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, NiFi, Linux, BigQuery, Bash Shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports.

Version Controllers: GIT, SVN, Bitbucket

ETL Tools: Informatica, Talend

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, AWS EMR

PROFESSIONAL EXPERIENCE

Confidential, Dublin, OH

Data Engineer

Responsibilities:

  • Hands-on experience in Azure cloud services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Worked on creating tabular models in Azure Analysis Services to meet business reporting requirements.
  • Good experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (SQL DW).
  • Extracted, transformed, and loaded data from source systems to Azure Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics.
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL DB, Azure SQL DW) and processed the data in Azure Databricks.
  • Experience working with the Snowflake data warehouse.
  • Moved data from Azure Blob storage to the Snowflake database (see the Databricks-to-Snowflake sketch after this list).
  • Designed a custom Spark REPL application to handle similar datasets.
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
  • Performed Hive test queries on local sample files and HDFS files.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Worked on analyzing the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Spark, and Sqoop.
  • Built pipelines to copy data from source to destination in Azure Data Factory.
  • Created stored procedures and scheduled them in the Azure environment.
  • Experience using SSIS tools such as the Import and Export Wizard, Package Installation, and the SSIS Package Designer.
  • Experience in ETL processes involving migrations and sync processes between two databases.
  • Analyzed data profiling results and performed various transformations.
  • Hands-on creation of reference tables using the Informatica Analyst and Informatica Developer tools.
  • Wrote Python scripts to parse JSON documents and load the data into the database (see the JSON-loading sketch after this list).
  • Generated various capacity planning reports (graphical) using Python packages such as NumPy and matplotlib.
  • Analyzed various generated logs and predicted/forecast the next occurrence of events with various Python libraries.
  • Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and Big Data modeling techniques using Python.
  • Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake.
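
A hedged sketch of the Azure Blob-to-Snowflake copy mentioned above, using the Spark Snowflake connector from a Databricks notebook; the storage account, secret scope, database, and table names are placeholders introduced here for illustration:

```python
# Sketch: copy data from Azure Blob storage into Snowflake from Databricks using the
# Spark Snowflake connector. All account, container, and table names are placeholders.
# Assumes the Databricks-provided `spark` session and `dbutils` secret utilities.
df = (spark.read
      .format("parquet")
      .load("wasbs://landing@examplestorage.blob.core.windows.net/orders/"))

sf_options = {
    "sfUrl": "example_account.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("sf-scope", "sf-user"),        # assumed secret scope
    "sfPassword": dbutils.secrets.get("sf-scope", "sf-password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
}

(df.write
   .format("snowflake")          # connector short name available on Databricks
   .options(**sf_options)
   .option("dbtable", "ORDERS")
   .mode("append")
   .save())
```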
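
And a small sketch of the JSON-parsing load scripts noted above; the ODBC data source, staging table, and field names are hypothetical:

```python
# Sketch: parse a folder of JSON documents and insert them into a SQL Server staging
# table via pyodbc. The DSN, table, and field names are hypothetical.
import glob
import json

import pyodbc

conn = pyodbc.connect("DSN=etl_target")
cur = conn.cursor()

for path in glob.glob("incoming/*.json"):
    with open(path) as fh:
        doc = json.load(fh)
    cur.execute(
        "INSERT INTO staging.events (event_id, event_type, payload) VALUES (?, ?, ?)",
        (doc["id"], doc["type"], json.dumps(doc)),
    )

conn.commit()
conn.close()
```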

Environment: Azure, ADF, Azure Databricks, Snowflake, Linux, Oracle 11g, SQL, SQL Server, MySQL, SSIS

Confidential, Dover, OH

Data Engineer

Responsibilities:

  • Hands-on development of a data platform from scratch; took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
  • Worked closely with data scientists to understand data requirements for the experiments.
  • Migrated from a SAS application to PySpark.
  • Refactored the SAS code to PySpark SQL.
  • Used AWS EMR for data extraction, transformation, and loading from homogeneous or heterogeneous data sources, and dumped the output Parquet files into S3 for the modelers (see the EMR PySpark sketch after this list).
  • Developed scripts to load data into Hive from HDFS and was involved in ingesting data into the data warehouse using various data loading techniques; wrote shell scripts to run the jobs in a Linux environment. Ingested data from RESTful APIs, databases, and CSV files.
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Used the Django REST framework and integrated new and existing API endpoints.
  • Worked on the Django ORM API to create and insert data into the tables and access the database.
  • Used and customized an NGINX server for checking the developed project.
  • Implemented the use of Amazon EMR for Big Data processing on a Hadoop cluster of virtual servers on Amazon EC2 and S3.
  • Worked on building Docker images and running jobs on a Kubernetes cluster.
  • Extensive expertise using the core Spark APIs and processing data on an EMR cluster.
  • Created S3 buckets and managed bucket policies; utilized S3 and Glacier for storage and backup on AWS.
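
A brief sketch of the EMR PySpark extract-transform-load pattern described above; the S3 buckets, join key, and filter logic are assumptions made for illustration:

```python
# Sketch of an EMR PySpark job: read raw CSV extracts, enrich by joining, and write
# Parquet to S3 for downstream modelers. Bucket names and columns are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims-etl").getOrCreate()

claims = spark.read.option("header", True).csv("s3://example-raw/claims/")
members = spark.read.option("header", True).csv("s3://example-raw/members/")

# Enrich claims with member attributes and keep only positive amounts.
enriched = (claims.join(members, on="member_id", how="left")
                  .withColumn("claim_amount", F.col("claim_amount").cast("double"))
                  .filter(F.col("claim_amount") > 0))

enriched.write.mode("overwrite").parquet("s3://example-curated/claims/")
```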

Environment: Hadoop, HDFS, Hive, Spark, Cloudera, AWS EC2, AWS S3, AWS EMR, Sqoop, Kafka, YARN, Shell Scripting, Pig, Cassandra, Oozie, Agile methods, MySQL

Confidential, Edison, NJ

Data Engineer

Responsibilities:

  • Experienced in the design and deployment of a Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Kafka, and Spark with the Cloudera distribution.
  • Worked on the Cloudera distribution and deployed it on AWS EC2 instances.
  • Hands-on experience with Cloudera Hue to import data through the GUI.
  • Worked on integrating Apache Kafka with a Spark Streaming process to consume data from external REST APIs and run custom functions (see the Kafka Structured Streaming sketch after this list).
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Configured, deployed, and maintained multi-node dev and test Kafka clusters.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in running Hadoop streaming jobs to process terabytes of text data. Worked with different file formats such as Text, SequenceFile, Avro, ORC, and Parquet.
  • Configured, supported, and maintained all network, firewall, storage, load balancer, operating system, and software components in AWS EC2.
  • Implemented the use of Amazon EMR for Big Data processing on a Hadoop cluster of virtual servers on Amazon EC2 and S3.
  • Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
  • Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM, CloudFormation), focusing on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.
  • Supported continuous storage in AWS using Elastic Block Store, S3, and Glacier; created volumes and configured snapshots for EC2 instances.
  • Implemented a generalized solution model using AWS SageMaker.
  • Extensive expertise using the core Spark APIs and processing data on an EMR cluster.
  • Worked on ETL migration services by developing and deploying AWS Lambda functions to build a serverless data pipeline that writes to the Glue Catalog and can be queried from Athena.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Performed data analysis with Cassandra using Hive external tables.
  • Designed the column families in Cassandra.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
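
A hedged sketch of the Kafka-to-Spark streaming consumption mentioned above, written with Structured Streaming in PySpark; the broker address, topic name, and message schema are assumptions:

```python
# Sketch: consume a Kafka topic with Spark Structured Streaming and apply a custom
# transformation. Broker, topic, and schema are illustrative; the Kafka source
# requires the spark-sql-kafka package to be available on the cluster.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
])

stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "sensor-events")
          .load())

# Decode the Kafka value bytes as JSON and flatten into columns.
parsed = (stream.selectExpr("CAST(value AS STRING) AS json")
                .select(F.from_json("json", schema).alias("evt"))
                .select("evt.*"))

query = (parsed.writeStream
         .outputMode("append")
         .format("console")        # replace with a real sink (HDFS, Kafka, etc.)
         .start())
query.awaitTermination()
```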

Environment: Hadoop, HDFS, Hive, Spark, Cloudera, AWS EC2, AWS S3, AWS EMR, Sqoop, Kafka, YARN, Shell Scripting, Scala, Pig, Databricks, Snowflake, Oozie, Agile methods, MySQL

Confidential - Chicago, IL

Data Engineer

Responsibilities:

  • Hands-on experience in Azure cloud services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure HDInsight, Azure Monitoring, Key Vault, and Azure Data Lake.
  • Worked extensively on running Spark jobs in the Azure HDInsight environment.
  • Used Spark as the data processing framework and worked on performance tuning of the production jobs.
  • Ingested data from MS SQL Server into Azure data storage.
  • Worked on creating tabular models in Azure Analysis Services to meet business reporting requirements.
  • Good experience working with Azure Blob and Data Lake storage and loading data into Azure Synapse Analytics (SQL DW).
  • As a Hadoop developer, responsible for managing the data pipelines and the data lake.
  • Experience working with the Snowflake data warehouse.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Designed a custom Spark REPL application to handle similar datasets.
  • Used Hadoop scripts for HDFS (Hadoop Distributed File System) data loading and manipulation.
  • Performed Hive test queries on local sample files and HDFS files.
  • Created, tested, and maintained PHP scripts, MySQL programming, forms, reports, triggers, and procedures for the data warehouse.
  • Created a database application using PHP with MySQL as the database to monitor customer profiles and complaints.
  • Experienced in designing, modeling, developing, and supporting web-based projects, with responsibilities including analysis, design, development, implementation, and maintenance.
  • Moved data from Hive to Azure SQL DB with the help of pipelines and data flows.
  • Migrated data from different sources to the destination with the help of ADF.
  • Transferred data from one server to other servers using tools like Bulk Copy Program (BCP) and SQL Server.
  • Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests.
  • Developed Spark SQL to load tables into HDFS and run SELECT queries on top.
  • Developed an analytical component using Scala, Spark, and Spark Streaming.
  • Worked on the NoSQL databases HBase and MongoDB.
  • Good experience logging defects in Jira and Azure DevOps.
  • Hands-on experience working with the Snowflake database and running ETL pipelines on the Snowflake warehouse.
  • Copied data from Snowflake to S3 and from S3 to Snowflake; created Snowflake warehouses and managed permissions by creating roles (see the COPY INTO sketch after this list).
  • Hands-on experience with SnowSQL and Snowflake query optimization.
  • Used Python APIs for extracting daily data from multiple vendors.
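
A sketch of moving data between S3 and Snowflake with COPY INTO statements, run from Python via the Snowflake connector, in the spirit of the Snowflake work above; the account, stage, warehouse, and table names are placeholders:

```python
# Sketch: move data between S3 and Snowflake with COPY INTO, executed from Python
# using snowflake-connector-python. Account, stage, and table names are placeholders,
# and the external stage @s3_stage is assumed to already point at the S3 bucket.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
cur = conn.cursor()

# Load: S3 -> Snowflake.
cur.execute("""
    COPY INTO orders
    FROM @s3_stage/orders/
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""")

# Unload: Snowflake -> S3.
cur.execute("""
    COPY INTO @s3_stage/exports/orders/
    FROM orders
    FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
    OVERWRITE = TRUE
""")

cur.close()
conn.close()
```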

Environment: Hadoop, Azure, Spark, Hive, Oozie, Java, Linux, Maven, MS-SQL, SSIS, Oracle 11g/10g, Zookeeper

Confidential - Charlotte, NC

Data Engineer

Responsibilities:

  • Experience in developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL in Azure Data Lake Analytics. Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Used Sqoop to import and export data between Oracle/PostgreSQL and HDFS for analysis.
  • Migrated existing MapReduce programs to Spark models using Python.
  • Responsible for designing logical and physical data models for various data sources on Confidential Redshift.
  • Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
  • Performed data validation between data present in the data lake and the S3 bucket.
  • Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
  • Designed batch processing jobs using Apache Spark to increase speed ten-fold compared to that of MapReduce jobs.
  • Used Kafka for real-time data ingestion.
  • Created different topics for reading data in Kafka.
  • Created database objects such as stored procedures, UDFs, triggers, indexes, and views using T-SQL in both OLTP and relational data warehouse environments in support of ETL.
  • Created report models from cubes as well as the relational data warehouse to build ad hoc reports and chart reports.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
  • Created many Spark UDFs and UDAFs in Hive for functions that were not pre-existing in Hive and Spark SQL (see the PySpark UDF sketch after this list).
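
A minimal sketch of registering a Python UDF for use from Spark SQL, in the spirit of the UDF/UDAF work noted above; the masking function and the customers table are hypothetical:

```python
# Sketch: register a Python UDF so it can be called from Spark SQL alongside the
# built-in Hive functions. The masking logic and table name are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-demo").enableHiveSupport().getOrCreate()

def mask_ssn(ssn):
    """Keep only the last four digits of an SSN-like string."""
    return None if ssn is None else "***-**-" + ssn[-4:]

spark.udf.register("mask_ssn", mask_ssn, StringType())

spark.sql("SELECT mask_ssn(ssn) AS ssn_masked FROM customers LIMIT 10").show()
```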

Environment: Linux, Apache Hadoop framework, HDFS, YARN, Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop, MS SQL Server 2014, Teradata, ETL, SSIS, Alteryx, Tableau (Desktop 9.x/Server 9.x), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas), AWS Redshift, Spark (PySpark, MLlib, Spark SQL).

Confidential

Data Engineer

Responsibilities:

  • Documented the complete process flow to describe program development, logic, testing, implementation, application integration, and coding.
  • Recommended structural changes and enhancements to systems and databases.
  • Conducted design reviews and technical reviews with other project stakeholders.
  • Was part of the complete life cycle of the project, from requirements through production support.
  • Created test plan documents for all back-end database modules.
  • Used MS Excel, MS Access, and SQL to write and run various queries.
  • Worked extensively on creating tables, views, and SQL queries in MS SQL Server.
  • Worked with internal architects, assisting in the development of current and target-state data architectures.
  • Coordinated with business users to provide an appropriate, effective, and efficient way to design new reporting needs based on user requirements and the existing functionality.
  • Wrote Python scripts to parse JSON documents and load the data into the database.
  • Generated various capacity planning reports (graphical) using Python packages such as NumPy and matplotlib (see the pandas/matplotlib sketch after this list).
  • Analyzed various generated logs and predicted/forecast the next occurrence of events with various Python libraries.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Extensively performed large data reads/writes to and from CSV and Excel files using pandas.
  • Tasked with maintaining RDDs using Spark SQL.
  • Communicated and coordinated with other departments to collect business requirements.
  • Used Python APIs for extracting daily data from multiple vendors.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
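
A short sketch of the kind of pandas/matplotlib capacity-planning report mentioned above; the file names and columns are invented for illustration:

```python
# Sketch: read daily capacity extracts, combine them with pandas, and plot a simple
# capacity-planning report. Both input files are assumed to have 'date' and
# 'storage_tb' columns; names are illustrative only.
import pandas as pd
import matplotlib.pyplot as plt

daily = pd.read_csv("capacity_daily.csv", parse_dates=["date"])
forecast = pd.read_excel("capacity_forecast.xlsx", parse_dates=["date"])

combined = daily.merge(forecast, on="date", how="left",
                       suffixes=("_actual", "_forecast"))

ax = combined.plot(x="date", y=["storage_tb_actual", "storage_tb_forecast"],
                   figsize=(8, 4))
ax.set_ylabel("Storage (TB)")
ax.set_title("Storage capacity: actual vs. forecast")
plt.tight_layout()
plt.savefig("capacity_report.png")
```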

Environment: SQL, SQL Server 2012, MS Office, MS Visio, Jupyter, R 3.1.2, Python, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, Business Intelligence Development Studio.
