
Sr. Azure Data Engineer Resume


SUMMARY

  • More than 10 years of experience gathering system requirements, analyzing requirements, and designing and developing systems.
  • Excellent domain knowledge of Canadian banking and finance, including Canadian capital markets, insurance, and telecommunications.
  • 6+ years of experience with Big Data technologies such as Hadoop, Hive, Spark, Kafka, Sqoop, Flume, HBase, and Cassandra.
  • Excellent knowledge of Big Data infrastructure: distributed file systems (HDFS), parallel processing (the MapReduce framework), and the complete Hadoop ecosystem - Hive, Pig, Sqoop, Spark, Kafka, HBase, NoSQL, Oozie, and Flume.
  • Configured and administered Azure services such as Resource Groups, Storage Accounts, Blob Storage, Delta Lake, cluster configuration, Event Hubs, and Cosmos DB across development, testing, and production environments.
  • Experience working with Azure Code Pipeline and creating CloudFormation JSON templates to create a custom-sized VPC and migrate a production infrastructure into Azure.
  • Hands-on experience building Azure notebooks and dbutils functions using Visual Studio Code, and creating deployments using Git.
  • Automated the build of Azure infrastructure using Terraform and CloudFormation-style templates.
  • Hands-on experience with the Spark framework - Spark Core, Spark Streaming, and Spark SQL - for data processing using the Scala programming language.
  • Experienced Hadoop/Java and Spark/Scala developer with end-to-end experience developing applications in the Hadoop ecosystem.
  • Experience with Agile/Scrum methodologies: iterating quickly on product changes, developing user stories, and working through the backlog.
  • Hands-on experience writing HiveQL queries for data cleansing and processing, and experienced in Hive performance optimization using partitioning, bucketing, and parallel execution.
  • Excellent understanding and knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Proficient in developing Sqoop scripts to extract data from various RDBMS databases into HDFS.
  • Good working experience with different file formats (Parquet, TextFile, Avro, ORC) and compression codecs (GZIP, Snappy, LZO).
  • Strong core Java programming skills, including OOP concepts: classes, methods, inheritance, encapsulation, loops, exception handling, etc.
  • Hands-on experience in application development using Java, RDBMS (SQL), and Linux/Unix shell scripting.
  • Experience with version control tools such as SVN and Git (GitHub) and with JIRA for issue tracking.
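The bucketing optimization mentioned above rests on a simple mechanism: Hive hashes the bucket key and takes it modulo the bucket count, so equal keys always land in the same bucket file and joins on that key can avoid a full shuffle. A toy Python sketch of the idea (the hash function and key are illustrative only, not Hive's actual hashing):

```python
NUM_BUCKETS = 4  # corresponds to CLUSTERED BY (...) INTO 4 BUCKETS

def bucket_for(key: str) -> int:
    # Stable toy hash (Python's built-in hash() is salted per process,
    # so a byte sum is used here for determinism).
    return sum(key.encode()) % NUM_BUCKETS

# Equal keys always map to the same bucket, which is what makes
# bucket-map joins and sampling by bucket possible.
assert bucket_for("cust_42") == bucket_for("cust_42")
```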

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Spark, Kafka, HBase.

Languages: Java, Scala, Python

RDBMS/Databases: Oracle 10g and MS SQL Server

Operating Systems: Windows 2003, UNIX, Linux

Build Tools: SBT, Maven

Version Control Tools: SVN and Git

PROFESSIONAL EXPERIENCE

Confidential

Sr. Azure Data Engineer

Responsibilities:

  • Participated in infrastructure development and operations involving Azure cloud platforms: firewall setup, Blob Storage, resource groups, networking, etc.
  • Created notebooks using Databricks, Scala, and Spark, capturing data from Delta tables in Delta Lake.
  • Created Azure Data Factory pipelines and managed Data Factory policies; utilized Blob Storage for storage and backup on Azure. Extensive knowledge of migrating applications from internal data storage to Azure.
  • Experience building streaming applications in Azure notebooks using Kafka and Spark.
  • Captured SCD Type 2 (slowly changing dimension) records and updated, inserted, or deleted rows based on business requirements using Databricks.
  • Developed the framework for creating new snapshots and deleting old snapshots in Azure Blob Storage, and set up lifecycle policies to back up data from Delta Lake.
  • Expert in building Azure notebook functions using Python, Scala, and Spark.
  • Built and configured a virtual data center in the Azure cloud to support enterprise data warehouse hosting, including a Virtual Private Cloud (VPC), public and private subnets, security groups, and route tables.
  • Integrated the framework with CloudFormation to automate Azure environment creation, with the ability to deploy to Azure using build scripts (Azure CLI) and to automate solutions using Terraform.
  • Created Git repositories and specified branching strategies that best fit the needs of the client.
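The SCD Type 2 capture described above follows a standard shape: an unchanged key is skipped, a changed key expires the current row and appends a new version. A minimal sketch in plain Python (rather than a Databricks/Delta MERGE statement; the table layout and column names are hypothetical):

```python
from datetime import date

def scd2_merge(dimension, incoming, today=date(2024, 1, 1)):
    # Index the current (open) rows by business key.
    by_key = {r["key"]: r for r in dimension if r["is_current"]}
    for rec in incoming:
        cur = by_key.get(rec["key"])
        if cur and cur["value"] == rec["value"]:
            continue                      # no change: nothing to do
        if cur:                           # change: expire the old row
            cur["is_current"] = False
            cur["end_date"] = today
        # Insert the new version as the current row.
        dimension.append({**rec, "start_date": today,
                          "end_date": None, "is_current": True})
    return dimension

dim = [{"key": 1, "value": "Toronto", "start_date": date(2020, 1, 1),
        "end_date": None, "is_current": True}]
out = scd2_merge(dim, [{"key": 1, "value": "Montreal"}])
```

In Databricks the same logic would typically be expressed as a `MERGE INTO` against a Delta table; the Python version just makes the expire-and-append rule explicit.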

Technologies Used: Git, Azure (Cosmos DB, Kafka, Delta Lake, Blob Storage, Event Hubs, Databricks, notebooks, Scala, Python), Terraform.

Confidential

Sr. Big Data Developer

Responsibilities:

  • Worked on ingesting source data into the Hadoop data lake (OLONA) from various databases using Sqoop.
  • Read, processed, and parsed CSV source data files through HQL scripts and ingested them into Hive and Impala tables.
  • Extensively worked on Hive, HBase, and Impala tables, partitions, and buckets for analyzing large volumes of data.
  • Scheduled Hive queries daily by writing an Oozie workflow and using an Oozie coordinator.
  • Worked on database testing and QA validation to ensure the product was bug free.
  • Developed the application using Agile methodology.
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
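The daily Oozie scheduling above pairs a coordinator (the "run it every day" part) with a workflow (the Hive actions themselves). A minimal sketch of the coordinator definition, with a hypothetical app name, path, and date window:

```xml
<coordinator-app name="daily-hive-load" frequency="${coord:days(1)}"
                 start="2020-01-01T06:00Z" end="2021-01-01T06:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- Workflow containing the Hive action; path is hypothetical -->
      <app-path>${nameNode}/apps/daily-hive-load</app-path>
    </workflow>
  </action>
</coordinator-app>
```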

Technologies Used: Cloudera, Hive, Impala, HBase, Oozie, SQL, GitHub, JIRA, Confluence

Confidential

Azure Developer

Responsibilities:

  • Designed and developed the application using Agile/Scrum methodologies.
  • Configured, monitored, and automated Azure notebooks, and was involved in deploying the content cloud platform.
  • Hands-on experience building Azure notebook functions using Databricks and Scala, and creating deployments through Data Factory.
  • Created Terraform scripts to automate Azure services including firewall, Blob Storage, database, and application configuration; the scripts create stacks or single servers, or join web servers to stacks.
  • Involved in DevOps migration/automation processes for build and deployment systems.
  • Developed Docker-based microservices and deployment modules with Jenkins and Kubernetes-based pipelines/frameworks.
  • Used Kubernetes as an open-source platform for automating deployment, scaling, and operation of application containers across clusters of hosts, providing container-centric infrastructure.
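The Terraform automation of Azure services described above could look roughly like the following sketch using the `azurerm` provider (resource group, location, and account names are hypothetical):

```hcl
resource "azurerm_resource_group" "rg" {
  name     = "rg-data-dev"        # hypothetical name
  location = "canadacentral"
}

resource "azurerm_storage_account" "sa" {
  name                     = "stdatadev001"   # hypothetical name
  resource_group_name      = azurerm_resource_group.rg.name
  location                 = azurerm_resource_group.rg.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```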

Technologies Used: Git, Azure (Cosmos DB, Kafka, Delta Lake, Blob Storage, Databricks, notebooks, Scala, Python), Terraform, UNIX/Linux.

Confidential

Big Data/ETL Lead

Responsibilities:

  • Designed and developed the application using Agile/Scrum methodologies.
  • Involved in architecture design and implementation of the best Hadoop/Big Data solutions based on client requirements.
  • Performed data ingestion, batch processing, data extraction, transformation, loading, and real-time streaming using Hadoop frameworks.
  • Experience working with market data on a capital markets project (Bloomberg).
  • Worked as an ETL/Big Data lead, leading a team of six developers.
  • Interacted with multiple teams (business analysts, project management, and upstream development teams), progressively tracking issues and resolving them.
  • Read, processed, and parsed CSV source data files through Spark/Scala and ingested them into Hive tables.
  • Implemented Kafka streaming with the Spark framework to ingest and analyze large volumes of BMO data arriving at 25 TPS from the source systems.
  • Loaded data into the Hadoop ecosystem (Hive and HDFS) using the Spoon ETL tool.
  • Extensively worked on Hive table partitions and buckets for analyzing large volumes of data.
  • Version control using GitHub and Bitbucket; document maintenance using JIRA and Confluence.
  • Expert in performance tuning and optimization of Hive jobs and SQL queries.
  • Scheduled and monitored Hive jobs daily using crontab.
  • Worked in the Azure cloud IaaS stack with components including Delta Lake, Azure Blob Storage, notebooks, DBFS, Spark, Scala, Data Factory, and Cosmos DB.
  • The CSV source data was read through core Scala (creating RDDs, DataFrames, and Datasets; Scala methods, classes, and objects; pattern matching; working with lists and collections; etc.).
  • Provided knowledge transfer to end users and junior developers on the Hive queries and the business requirements.

Technologies Used: Hortonworks Hadoop Distribution (HDP) 2.4, Azure, Hive, Confluent Kafka, Control-M, MySQL, Spark 2.3, Scala 2.1, HDFS, Unix, HBase, Git, Apache Parquet.

Confidential

Senior Hadoop Developer

Responsibilities:

  • Designed and developed the application using Agile methodologies.
  • Loaded data into the Hadoop ecosystem (Hive and Impala).
  • Extensively worked on Hive and Impala tables, partitions, and buckets for analyzing large volumes of data.
  • Version control using GitHub and SourceTree; document maintenance using JIRA and Confluence.
  • Expert in performance tuning and optimization of Hive jobs and SQL queries.
  • Extensively worked with Apache Flume to flow data from Guidewire BillingCenter to the HDFS destination.
  • Scheduled and monitored Hive jobs daily using Control-M.
  • Jenkins was used for continuous integration and automated builds.
  • Used Apache Parquet with Hive to make the advantages of compressed, efficient columnar data representation available to this project in the Hadoop ecosystem.
  • Read, processed, and parsed semi-structured source data files through Spark/Scala and ingested them into Hive tables.
  • Provided knowledge transfer to end users and junior developers on the Hive queries and the business requirements.
  • Proactively involved in ongoing maintenance, support and improvements in Hadoop cluster.
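The columnar advantage that Parquet brings (mentioned above) comes down to layout: a query that needs one column reads one contiguous block instead of touching every field of every row. A toy Python illustration of the row-to-column transpose (the data is made up):

```python
# Row-oriented layout: each record carries all fields together.
rows = [
    {"symbol": "AAPL", "qty": 100, "price": 187.5},
    {"symbol": "MSFT", "qty": 50, "price": 415.0},
]

# Columnar layout: one list per column, as Parquet stores data on disk.
columns = {k: [r[k] for r in rows] for k in rows[0]}

# An aggregate over `price` now scans a single list; the other
# columns are never touched (the essence of column pruning).
avg_price = sum(columns["price"]) / len(columns["price"])
```

Columnar layout also compresses better, since each block holds values of one type.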

Technologies Used: Cloudera Distribution (CDH) 5.1, Hive, Impala, Flume, Control-M, MySQL, Spark 2.2, Scala, HDFS, Unix, HBase, Git, Apache Parquet.

Confidential

Senior Hadoop Developer

Responsibilities:

  • Worked on ingesting source data into the Hadoop data lake from various databases using Sqoop.
  • Read, processed, and parsed CSV source data files through Spark/Scala and ingested them into Hive tables.
  • Extensively worked on Hive tables, partitions, and buckets for analyzing large volumes of data.
  • Scheduled Hive queries daily by writing an Oozie workflow and using an Oozie coordinator.
  • Worked on database testing and QA validation to ensure the product was bug free.
  • Provided knowledge transfer to end users and junior developers on the Hive queries and the business requirements.
  • Developed the application using Agile methodology.
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.

Technologies Used: Hortonworks Hadoop Distribution (HDP 2.4), HDFS, Spark 1.6, Scala 2.1, Kerberos, Unix, Hive, Tableau, Oozie, HBase, Kafka, NoSQL, MySQL

Confidential

Responsibilities:

  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements.
  • Designed and developed the application using Agile methodologies.
  • Involved in loading data from system-generated data sources to HDFS; experienced in writing multiple Java-based MapReduce jobs for cleaning and processing the data.
  • Loaded data into the Hadoop ecosystem using Hive from various data sources and file systems.
  • Extensively worked on Hive tables, partitions, and buckets for analyzing large volumes of data.
  • Used Apache Avro with Hive to take advantage of a compact, efficient binary data serialization format in the Hadoop ecosystem.
  • Experience with source code repository systems (SVN) and revision control systems such as Git.
  • Debugged MapReduce jobs using job history logs and task syslogs.
  • Developed shell scripts for adding process dates to the source files.
  • Transferred incoming log files to a parser (custom Java code that loads them into HDFS and HBase) using the Kafka message broker.
  • Designed and developed ETL workflows using Oozie for business requirements, including automating the extraction of data from a MySQL database into HDFS using Sqoop scripts.
  • Splunk was used to retrieve data from the Hadoop cluster.
  • Extensively worked with Apache Flume to collect logs and error messages across the cluster.
  • Created a Spark POC to capture user clickstream data and find which topics users are interested in; Scala was used in the Spark project.
  • Scheduled Hive jobs using Oozie and Falcon process files.
  • Performed defect coordination with both development and testing teams.
  • Performed data analytics in Hive and exported the metrics back to an Oracle database using Sqoop.
  • Installed Hadoop single-node and multi-node clusters on Red Hat OS to store data in HDFS for performing various Hadoop jobs.
  • Jenkins was used for continuous integration and automated builds.
  • Conducted root cause analysis and resolved production problems and data issues.
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
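The Java MapReduce jobs above all share the same shape: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. A toy in-process Python sketch of that shape over made-up log lines:

```python
from collections import defaultdict

# Hypothetical log lines: "<region> <method> <path>".
logs = ["us-east GET /home", "us-east GET /cart", "eu-west GET /home"]

def map_phase(line):
    # Map: emit (region, 1) for each log line.
    region, _method, _path = line.split()
    yield region, 1

# Shuffle: group emitted values by key.
shuffled = defaultdict(list)
for line in logs:
    for k, v in map_phase(line):
        shuffled[k].append(v)

# Reduce: aggregate each key's values (here, a count per region).
counts = {k: sum(vs) for k, vs in shuffled.items()}
```

In real MapReduce the shuffle happens across the cluster; this sketch only shows the map/group/reduce contract.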

Technologies Used: JDK 1.7, Hortonworks Hadoop Distribution (HDP 2.3), Red Hat Linux, HDFS, MapReduce, Hive, Pig, Kafka, ZooKeeper, Apache Parquet, Oozie, HBase, NoSQL, Splunk, Spark, Apache Avro, BigSQL, Hive Data Integration 6.1, Apache Solr 6.1.0, Core Java, Jenkins, MySQL

Confidential

Hadoop developer

Responsibilities:

  • Proactively monitored systems and services; handled architecture design and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action. Documented system processes and procedures for future reference.
  • Used Flume to collect, aggregate, and store web log data from different sources, such as web servers and network devices, and pushed it to HDFS.
  • Wrote Java MapReduce programs over log data to transform it into structured form and find user location, age group, and time spent.
  • Migrated data from RDBMS sources to Hive, Pig, and HDFS using Sqoop.
  • Read mainframe copybooks and mainframe files in Java and migrated the data to Hadoop ecosystems along with the metadata.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation by our BI team.
  • The Jira bug-tracking tool was used for logging bugs.
  • The ETL tool Informatica was used in this project, which gave me exposure to Informatica.
  • The Tableau reporting tool was used to generate reports from the Hadoop cluster.
  • Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java MapReduce, streaming MapReduce, Pig, Hive, Sqoop, and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).
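The Flume pipeline described above (web logs collected and pushed to HDFS) is wired together as a source, a channel, and a sink in an agent properties file. A minimal sketch; the agent name, log path, and HDFS path are hypothetical:

```properties
# Agent wiring: one exec source -> memory channel -> HDFS sink.
agent1.sources  = weblog
agent1.channels = mem
agent1.sinks    = hdfs-out

# Tail the web server's access log (path is hypothetical).
agent1.sources.weblog.type = exec
agent1.sources.weblog.command = tail -F /var/log/httpd/access_log
agent1.sources.weblog.channels = mem

# Buffer events in memory between source and sink.
agent1.channels.mem.type = memory
agent1.channels.mem.capacity = 10000

# Land events in date-partitioned HDFS directories.
agent1.sinks.hdfs-out.type = hdfs
agent1.sinks.hdfs-out.hdfs.path = hdfs://namenode/logs/%Y-%m-%d
agent1.sinks.hdfs-out.channel = mem
```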

Technologies Used: Hortonworks Hadoop Distribution (HDP 2.2), Flume, Hive, Sqoop, Pig, Oozie, Ambari, Java, Linux, CentOS/Red Hat, Informatica, Core Java, Jenkins, Jira

Confidential, NY

Hadoop Developer

Responsibilities:

  • Collaborated with business users, product owners, and developers to contribute to the analysis of functional requirements.
  • Hands-on experience with Hadoop technology and frameworks: coding, testing, and implementation.
  • Developed Hive/Pig scripts to process data using custom Java functions to convert data types. Created documentation for the production team.
  • The Cassandra and HBase NoSQL databases were used.
  • Hands-on experience designing and architecting large-scale distributed applications.
  • Hands-on experience with Cassandra cluster nodes.
  • This project was built on top of an Ubuntu Linux environment.
  • Provided support to data analysts in running Pig and Hive queries for creating views.
  • Configured the Pig, Hive, Sqoop, and HBase ecosystems on Hadoop, developing Pig Latin scripts to process data and Hive queries to load data into Hive tables.
  • Designed and developed ETL workflows using Oozie for business requirements, including automating the extraction of data from a MySQL database into HDFS using Sqoop scripts.
  • Designed the HBase (NoSQL) database for this project.
  • Designed workflows by scheduling Hive processes for log file data streamed into HDFS using Kafka.
  • Developed MapReduce programs to extract and transform the data sets; results were exported back to an RDBMS using Sqoop.
  • Integrated Hadoop ecosystems with Tableau and designed reports in Tableau according to the requirements.
  • Performed data management and data profiling on the Hadoop cluster.
  • Performed data modeling by defining and analyzing the data requirements needed to support business processes within the scope of the corresponding information systems.
  • Performed data analytics in Hive and exported the metrics back to an Oracle database using Sqoop.
  • Installed and administered a Hadoop cluster; debugged and troubleshot issues in development and test environments.
  • The JIRA bug-tracking tool was used for logging bugs.
  • Conducted root cause analysis and resolved production problems and data issues.
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
  • Involved in minor and major release activities.

Technologies Used: Cloudera Hadoop Distribution (CDH 2.0), JDK 1.6, Red Hat Linux, HDFS, MapReduce, Hive, Pig, Sqoop, Kafka, ZooKeeper, Oozie, Netezza, Teradata, DB2, Cassandra, NoSQL, MongoDB, Core Java, JIRA

Confidential

Java Developer

Responsibilities:

  • Designed the application's UML class diagrams and sequence diagrams using RSA.
  • Involved in creating the technical design for the project along with core team members.
  • The Java application ran on the JVM (Java Virtual Machine).
  • Interacted with the business requirements team and developed business processes.
  • Developed task utility services necessary for generating documents.
  • Developed the application using core Java programming skills.
  • Handled overnight builds in this project and was involved in the release process.
  • Worked on Windows and Oracle Enterprise Linux, with Apache Tomcat and Oracle WebLogic Server.

Technologies Used: JDK 1.5, Core Java, J2EE, Linux, HTML, JSP, Spring (IoC and AOP), JSF, WebSphere, Hibernate, JavaScript, Maven, CSS, DB2, XML, UML, XSLT, FTP, HTTP, RSA 7.0, JUnit, Log4j, Apache Velocity, JMS, JDBC, EJB, and Web Services
