Data Engineer Resume
SUMMARY:
- Accomplished IT professional with 8+ years of experience, specializing in Big Data systems, data acquisition, ingestion, modeling, storage, analysis, integration, and data processing.
- A skilled, enthusiastic engineer with solid problem-solving, troubleshooting, and analytical abilities, who actively participates in understanding and delivering business requirements.
- Experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spark integration with Cassandra, and ZooKeeper.
- Collaborated closely with business, production support, and engineering teams to dive deep into data, enable effective data-driven decision making, and support analytics work.
- Strong Hadoop and platform support experience with major Hadoop distributions such as Cloudera, Hortonworks, Amazon EMR, and Azure HDInsight.
- Excellent knowledge of Hadoop architecture and its core concepts, such as distributed frameworks, parallel processing, high availability, fault tolerance, and scalability.
- Extensive working experience with Big Data systems such as Hadoop (HDFS, MapReduce, YARN), Spark, Kafka, Hive, Impala, HBase, Sqoop, Pig, Airflow, Oozie, ZooKeeper, Ambari, Flume, and NiFi.
- Proficient at writing MapReduce jobs and UDFs to gather, analyze, transform, and deliver data according to business requirements.
- Hands-on experience building real-time data streaming solutions using Apache Spark, Spark SQL and DataFrames, Kafka, Spark Streaming, and Apache Storm.
- Proficient in building PySpark and Scala applications for interactive analysis, batch processing, and stream processing.
- Strong working experience with SQL and NoSQL databases, data modeling, and data pipelines. Involved in end-to-end development and automation of ETL pipelines using SQL and Python.
- Acquired significant knowledge with AWS clo
PROFESSIONAL EXPERIENCE:
Confidential
Data Engineer
Responsibilities:
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data to and from sources such as Azure SQL, Blob storage, Azure SQL Data Warehouse, and a write-back tool.
- Strong experience leading multiple Azure Big Data and data transformation implementations in the Banking and Financial Services, High Tech, and Utilities industries.
- Implemented large Lambda architectures using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
- Prepared the complete data mapping for all migrated jobs using SSIS.
- Designed SSIS packages to transfer data from flat files to SQL Server using Business Intelligence Development Studio.
- Extensively used SSIS transformations and tasks such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL task, Script task, and Send Mail task.
- Designed an end-to-end scalable architecture to solve business problems using Azure components such as HDInsight, Data Factory, Data Lake, Storage, and Machine Learning Studio.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL Activity.
- Developed Spark applications using Scala and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Performed data analysis and collaborated with the downstream analytics team to shape the data according to their requirements.
- Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and memory settings.
- Created builds and releases for multiple projects (modules) in the production environment using Visual Studio Team Services (VSTS).
- Designed and developed a real-time stream processing application using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
- Ran Hive scripts through Hive, Impala, Hive on Spark, and, in some cases, Spark SQL.
- Used Azure Kubernetes Service (AKS) to deploy a managed Kubernetes cluster in Azure, created the AKS cluster through the Azure portal and the Azure CLI, and used template-driven deployment options such as Resource Manager templates and Terraform.
- Used Kubernetes to deploy, scale, load balance, and manage Docker containers with multiple namespaced versions.
- Designed strategies for optimizing all aspects of the continuous integration, release, and deployment processes using container and virtualization technologies such as Docker and Kubernetes.
- Built Docker containers for a microservices project and deployed them to Dev.
- Collected the JSON data from HTTP Source and developed Spark
Confidential
Data Engineer/Big Data Developer
Responsibilities:
- Created Spark jobs on Databricks to perform tasks such as data validation, standardization, and normalization, then applied transformations according to the use cases.
- Worked with the Data Science team running machine learning models on a Spark EMR cluster and delivered data according to business requirements.
- Experience with Apache Hadoop big data components such as HDFS, MapReduce, YARN, Hive, HBase, Sqoop, Pig, NiFi, and Kafka.
- Built a data pipeline and performed analysis using the AWS stack (EMR, EC2, S3, RDS, Lambda, Glue, SQS, and Redshift).
- Created Sqoop jobs for data ingestion and incremental data loads from RDBMS to HDFS.
- Used Spark's in-memory capabilities to handle large datasets on the S3 data lake.
- Loaded data into S3 buckets, then filtered it and loaded it into Hive external tables.
- Developed batch and stream processing applications requiring functional pipelining using Spark APIs.
- Strong hands-on experience creating and modifying SQL stored procedures, functions, views, indexes, and triggers.
- Built a custom REST API to support real-time customer analytics for data scientists and applications.
- Extracted and enriched multiple Cassandra tables using joins in Spark SQL, and converted Hive queries into Spark transformations.
- Automated the process of transforming and ingesting terabytes of monthly data using Kafka, S3, Lambda, and Oozie.
- Migrated data from an on-prem Cloudera cluster to AWS EC2 instances deployed on an EMR cluster, and developed an ETL pipeline to extract logs, store them in the AWS S3 data lake, and process them further using PySpark.
- Fetched live data from an Oracle database using Spark Streaming and Kafka, with the feed coming from an API Gateway REST service.
- Performed ETL operations using Python, Spark SQL, S3, and Redshift on terabytes of data to obtain customer insights.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Wrote Python scripts to automate the extraction of weblogs using Airflow DAGs.
- Hands-on experience in API design and development using Spring Boot for data movement across various systems.
- Wrote unit tests and worked with the DevOps team on installing libraries and Jenkins agents for production ETL jobs and microservices.
- Created complex ETL packages using SSIS to extract data from staging tables to partitioned tables with incremental load.
- Created reusable SSIS packages to extract data from multi-formatted flat files, Excel, and XML files into the UL database and DB2 billing systems.
- Developed, deployed, and monitored SSIS packages.
- Used Ansible to configure the environment and deployed applications in a CI/CD process using a Jenkins pipeline.
- Additionally, deployed designs utilizing Terra
Confidential
Data Engineer
Responsibilities:
- Worked with the Hortonworks distribution of Hadoop.
- Involved in the development of the Confidential Data Lake and in building the Confidential Data Cube on a Microsoft Azure HDInsight cluster.
- Responsible for managing data coming from disparate data sources.
- Ingested incremental updates from structured ERP systems residing on a Microsoft SQL Server database onto the Hadoop data platform using Sqoop.
- Implemented OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
- Responsible for transporting and processing real-time stream data sourced from Magento and Form site APIs for inventory management using NiFi, Kafka, and Storm.
- Experience working with RESTful APIs.
- Created HBase tables to store various data formats coming from different applications.
- Developed Linux shell scripts to extract EDI POS sales data from an SFTP server and process it in the Hive data warehouse.
- Implemented a proof of concept to analyze streaming data using Apache Spark with Scala, using Maven/SBT to build and deploy the Spark programs.
- Responsible for building the Confidential data cube on the Spark framework by writing Spark SQL queries in Scala to improve data processing efficiency and reporting query response time.
- Developed Spark code in Scala in the IntelliJ IDE using SBT.
- Tuned the performance of Sqoop, Hive, and Spark jobs.
- Responsible for modifying ETL data load scripts, scheduling automated jobs, and resolving production issues on time.
- Wrote Azure PowerShell scripts to copy or move data from the local file system to HDFS Blob storage.
- Developed Oozie workflows to automate the ETL process by scheduling multiple Sqoop, Hive, and Spark jobs.
- Monitored cluster status and health daily using the Ambari UI.
- Rendered and delivered reports in the desired formats using reporting tools such as Tableau.
- Maintained technical documentation for launching and executing jobs on Hadoop clusters.
- Involved in story-driven agile development and actively participated in daily scrum meetings.
- Responsible for independently coding intermediate to complex modules following development standards.
- Planned and conducted code reviews for changes and enhancements to ensure standards compliance and systems interoperability.
- Responsible for modifying, debugging, and testing the code before deploying it on the production cluster.
Environment: Hadoop Stack, Java, Sqoop, Hive, AtScale, Oozie, Microsoft SQL Server, NiFi, Kafka, Storm, Ubuntu, HBase, YARN, Hortonworks, UNIX Shell Scripting, Azure PowerShell, cron, Scala, Spark, R, Maven, SBT, IntelliJ, Tableau, Microsoft Azure HDInsight, SSMS, Azure Data Factory, Azure Data Warehouse, SAP HANA.
Confidential
Big Data Developer
Responsibilities:
- Worked with the Hortonworks distribution.
- Installed, configured, and maintained a Hadoop cluster based on business requirements.
- Involved in end-to-end implementation of ETL pipelines using Python and SQL for high-volume data analysis, and reviewed use cases before onboarding to HDFS.
- Responsible for loading, managing, and reviewing terabytes of log files using the Ambari web UI.
- Used Sqoop to migrate data between relational databases and HDFS.
- Ingested data from MS SQL, Teradata, and Cassandra databases.
- Performed ad hoc queries using Hive joins and bucketing techniques for faster data access.
- Used NiFi to automate the data flow between disparate systems.
- Designed dataflow models and target tables to obtain relevant metrics from various sources.
- Developed Bash scripts to fetch log files from an FTP server and executed Hive jobs to parse them.
- Implemented various Hive queries for analytics.
- Created external tables, optimized Hive queries, and improved cluster performance by 30%.
- Enhanced scripts of existing Python modules.
- Wrote APIs to load the processed data into HBase tables.
- Migrated ETL jobs to Pig scripts to apply joins, aggregations, and transformations.
- Worked with Jenkins build and continuous integration tools.
- Wrote Groovy scripts to automate the Jenkins pipeline's integration and delivery service.
- Used Power BI as a front-end BI tool and MS SQL Server as a back-end database to design and develop dashboards, workbooks, and complex aggregate calculations.
- Used Jenkins for CI/CD and SVN for version control.
- Used Informatica as an ETL tool to create source/target definitions, mappings, and sessions to extract, transform, and load data into staging tables from various sources.
- Designed and developed Informatica processes to extract data from internal check issue systems.
- Used Informatica PowerExchange to extract data from one of the EIC's operational systems, Datacom.
- Extensive experience building and publishing customized interactive reports and dashboards, and scheduling reports, using Tableau Desktop and Tableau Server.
- Extensive experience with the Tableau Administration Tool, Tableau interactive dashboards, and the Tableau suite.
- Developed Tableau visualizations and dashboards using Tableau Desktop and published them on Tableau Server.
- Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems into the target database.
- Analyzed data stored in S3 buckets using SQL and PySpark, stored the processed data in Redshift, and validated data sets by implementing Spark components.
- Worked as ETL developer and Tableau developer, widely involved in designing, developing, and debugging ETL mappings using the Informatica Designer tool, and created advanced chart types, visualizations, and complex calculations to manipulate the data using Tableau D
Confidential
Data Analyst
Responsibilities:
- Gathered data visualization requirements from business users.
- Wrote SQL queries to extract data from the sales data marts as per the requirements.
- Developed Tableau data visualizations using scatter plots, geographic maps, pie charts, bar charts, and density charts.
- Designed and deployed rich graphic visualizations with drill-down and drop-down menu options, parameterized using Tableau.
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
- Explored traffic data from databases, connected it with transaction data, and presented and wrote a report for every campaign, providing suggestions for future promotions.
- Extracted data using SQL queries and transferred it to Microsoft Excel and Python for further analysis.
- Performed data cleaning, merging, and exporting of the dataset in Tableau Prep.
- Applied data processing and cleaning techniques to reduce text noise and dimensionality in order to improve the analysis.
Environment: Python, Informatica v9.x, MS SQL Server, T-SQL, SSIS, SSRS, SQL Server Management Studio, Oracle, Excel.
