
Azure Data Engineer Resume


Dallas, TX

SUMMARY

  • Experience in the design, development, maintenance, and support of Big Data analytics using Java and the Hortonworks and Cloudera Hadoop ecosystem tools such as HDFS, Hive, Sqoop, Pig, Spark, and Kafka.
  • Experienced in processing big data on the Hortonworks, Cloudera, and Apache Hadoop frameworks using MapReduce programs.
  • Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions and data warehouse tools for reporting and data analysis.
  • Excellent understanding and knowledge of NoSQL databases like HBase and MongoDB.
  • Good knowledge of the Hadoop ecosystem, HDFS, big data, RDBMS, and Spark.
  • Experience with the RDD architecture, implementing Spark operations on RDDs, and optimizing transformations and actions in Spark.
  • Experienced in installing, configuring, supporting, and monitoring Hadoop clusters using the Apache, Cloudera, and Hortonworks distributions and AWS.
  • Hands-on experience with various AWS services such as Redshift clusters and Route 53 domain configuration.
  • Experience in implementing OLAP multi-dimensional cube functionality using Azure SQL Data Warehouse.
  • Developed Spark applications using Spark SQL in Azure Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch after this list). Experience with statistics, data analysis, and machine learning using R and Python.
  • Good knowledge of Spark, Hadoop, HBase, Hive, Pig Latin scripts, MR, Sqoop, Flume, and HiveQL.
  • Experience in analyzing data using Pig Latin, HiveQL and HBase.
  • Capturing data from existing databases that provide SQL interfaces using Sqoop.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Implemented proofs of concept on the Hadoop stack and different big data analytics tools, including migration from different databases (e.g., Teradata, Oracle, MySQL) to Hadoop.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Successfully loaded files to Hive and HDFS from MongoDB and HBase.
  • Experience in configuring Hadoop Clusters and HDFS.
  • Good understanding of Apache Hue, GitHub, and SVN.
  • Extensive experience importing and exporting data using stream-processing platforms like Flume and Kafka.
  • Worked extensively in Java, J2EE, XML, XSL, EJB, JSP, JSF, JDBC, MVC, Jakarta Struts, JSTL, Spring 2.0, design patterns, and UML.
  • Extensive experience in Object Oriented Programming, using Java & J2EE (Servlets, JSP, Java Beans, EJB, JDBC, RMI, XML, JMS, Web Services, AJAX).
  • Excellent analytical and problem-solving skills and the ability to quickly learn new technologies. Worked with the Agile, Scrum, and Confidential software development frameworks for managing product development.
  • Deployed data warehouse and BI solutions, getting information to users as quickly as possible so they could see data and request relevant changes while development was under way. Developed and tested reports and dashboards in close collaboration with users.
  • Good communication and interpersonal skills. A very good team player with the ability to work independently.
  • Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR), creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB).
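
A minimal PySpark sketch of the Databricks-style extract/transform/aggregate flow mentioned above; the paths, column names, and table layout are illustrative assumptions, not taken from any specific project.

# Minimal Spark SQL sketch of an extract/transform/aggregate flow;
# paths and column names are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("usage-insights").getOrCreate()

# Extract: read two different file formats from the data lake.
events = spark.read.json("/mnt/raw/usage_events/")    # assumed JSON landing zone
users = spark.read.parquet("/mnt/curated/users/")     # assumed Parquet dimension

# Transform and aggregate: join, then summarize usage per customer segment.
usage_by_segment = (events.join(users, "user_id")
                    .groupBy("segment")
                    .agg(F.count("*").alias("events"),
                         F.countDistinct("user_id").alias("active_users")))

usage_by_segment.write.mode("overwrite").parquet("/mnt/curated/usage_by_segment/")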

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Spark, Scala, Kafka.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, Java Script, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer, Tableau, QlikView, MicroStrategy, Informatica

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

Methodologies: Agile (Scrum), Theta’s Pragmatic Agile methodology, Waterfall.

PROFESSIONAL EXPERIENCE

Confidential, Dallas, TX

Azure Data Engineer

Responsibilities:

  • Analyzed, designed, and built modern data solutions using Azure PaaS services to support visualization of data. Understood the current production state of the application and determined the impact of new implementations on existing business processes.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool, and back again.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Developed web pages and form validation with the team using Angular, Bootstrap.js, Node.js, Express.js, HTML5, and CSS3.
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements (a sample is sketched after this list).
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Hands-on experience developing SQL scripts for automation purposes.
  • Created Build and Release for multiple projects (modules) in production environment using Visual Studio Team Services (VSTS).
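
A minimal sketch of the kind of PySpark UDF described above; the business rule, column, table, and mount path are hypothetical.

# Minimal PySpark UDF sketch; table/column names and paths are illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

# Hypothetical business rule: normalize raw usage codes before aggregation.
@udf(returnType=StringType())
def normalize_code(raw_code):
    return raw_code.strip().upper() if raw_code else "UNKNOWN"

df = spark.read.parquet("/mnt/datalake/raw/usage")   # assumed mount path
(df.withColumn("usage_code", normalize_code(col("usage_code")))
   .groupBy("usage_code").count()
   .write.mode("overwrite").saveAsTable("analytics.usage_summary"))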

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Spark, Kafka, IntelliJ, ADF, Cosmos, sbt, Zeppelin, YARN, Scala, SQL, Git.

Confidential, New York

Big Data Engineer

Responsibilities:

  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported unstructured data into HDFS using Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets.
  • Used the HBase Java API in Java applications.
  • Automated the jobs that extract data from different data sources such as MySQL and push the result sets to the Hadoop Distributed File System on Cloudera.
  • Implemented MapReduce jobs using the Java API, Pig Latin, and HiveQL.
  • Participated in the setup and deployment of the Cloudera Hadoop cluster.
  • Hands-on design and development of an application using Hive UDFs.
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Worked with and learned a great deal from Amazon Web Services (AWS) cloud services like EC2 and S3.
  • Hands-on experience with cloud services like Amazon Web Services (AWS).
  • Created data pipelines for different events to load data from DynamoDB to an AWS S3 bucket and then into an HDFS location.
  • Worked on reading multiple data formats on HDFS using Python.
  • Automatically scaled up the EMR instances based on the data.
  • Implemented Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.
  • Imported real-time web logs using Kafka as a messaging system and ingested the data into Spark Streaming (see the sketch after this list).
  • Deployed the project on Amazon EMR with S3 connectivity.
  • Used Amazon EMR for processing big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
  • Worked on AWS cloud services (EC2, S3, RDS, CloudWatch, Redshift, EMR, Kinesis).
  • Loaded the data into Simple Storage Service (S3) in the AWS cloud.
  • Good knowledge of using Elastic Load Balancers for auto scaling of EC2 servers.
  • Worked on the design and implementation of multi-tier applications using web-based technologies like Spring MVC and Spring Boot.
  • Worked with a high-quality data lake and data warehousing team and designed the team to scale. Built cross-functional relationships with data analysts.
  • Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets. Created a Lambda deployment function and configured it to receive events from an S3 bucket.
  • Designed the data models used in data-intensive AWS Lambda applications aimed at complex analysis, creating analytical reports for end-to-end traceability.
  • Worked with both a data lake and a data swamp, and with NoSQL, the technology behind many of the benefits of a data lake.
  • Executed the Spark jobs on Amazon EMR.
  • Migrated an existing on-premises application to AWS.
  • Initially migrated existing MapReduce programs to the Spark model using Python.
  • Designed data visualizations to present the current impact and growth of the department using the Python package Matplotlib.
  • Involved in data analysis using Python and handled ad hoc requests as per requirements.
  • Developed Python scripts for automating tasks.
  • Supported data analysts in running Pig and Hive queries.
  • Involved in writing HiveQL and Pig Latin.
  • Created, debugged, scheduled, and monitored jobs using Airflow and Oozie.
  • Developed a data warehouse model in Snowflake for datasets using WhereScape.
  • Involved in migrating objects from Teradata to Snowflake.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Configured an HA cluster for both manual and automatic failover.
  • Excellent working knowledge of Spark Core, Spark SQL, and Spark Streaming.
  • Extensive experience importing and exporting data using stream-processing platforms like Flume and Kafka.
  • Used various Java APIs such as Apache POI, JavaMail, and iText as part of test automation.
  • Performed API-level testing for web services, enhanced the test harness, and developed many test suites using XML and Python. Configured, deployed, and maintained multi-node dev and test Kafka clusters.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based MapReduce.
  • Specified the cluster size, allocated resource pools, and distributed Hadoop by writing the specifications in JSON format.
  • Experience in writing Solr queries for various search documents.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the Cosmos activity.
  • Worked with multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team.
  • Developed and deployed the outcome using Spark and Scala code in a Hadoop cluster running on GCP.
  • Designed and developed test plans for ETL unit testing and integration testing.
  • Involved in converting Hive/SQL queries into Spark transformations using APIs like Spark SQL and DataFrames in Python.
  • Analyzed the SQL scripts and designed the solution to implement them using Python.
  • Generated graphs and reports using the ggplot package in RStudio for analytical models.
  • Primarily responsible for designing, implementing, testing, and maintaining database solutions for Azure.
  • Utilized the Agile and Scrum methodology to help manage and organize a team of developers with regular code review sessions.
  • Performed validation on machine learning output from R.
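
A minimal PySpark Structured Streaming sketch of the Kafka-to-Spark ingestion described above; the broker address, topic name, and HDFS paths are assumptions, and the spark-sql-kafka connector is assumed to be available on the cluster.

# Minimal Kafka ingestion sketch with Structured Streaming;
# broker, topic, and paths are illustrative assumptions only.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("weblog-stream").getOrCreate()

# Read web-log events from a Kafka topic as a streaming DataFrame.
logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
        .option("subscribe", "weblogs")                      # assumed topic
        .load()
        .select(col("value").cast("string").alias("raw_log")))

# Persist the raw stream to HDFS in Parquet for downstream processing.
query = (logs.writeStream
         .format("parquet")
         .option("path", "/data/weblogs/raw")                # assumed HDFS path
         .option("checkpointLocation", "/data/weblogs/_chk")
         .start())
query.awaitTermination()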

Environment: Hortonworks Big Data, Apache Hadoop, Hive, Python, Hue, ZooKeeper, MapReduce, Sqoop, Crunch API, Pig 0.10 and 0.11, HCatalog, Unix, Java, JSP, Eclipse, Maven, Oracle, SQL Server, Linux, MySQL.

Confidential

Big Data Engineer

Responsibilities:

  • Processed big data using a Hadoop cluster consisting of 40 nodes.
  • Designed and configured Flume servers to collect data from the network proxy servers and store it in HDFS.
  • Loaded customer profile data, customer spending data, and credit data from legacy warehouses onto HDFS using Sqoop.
  • Built data pipelines using Pig and Java MapReduce to store data on HDFS.
  • Applied transformations and filtered traffic using Pig.
  • Used pattern-matching algorithms to recognize the customer across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
  • Performed unit testing using MRUnit.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Experience designing and developing a POC in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Hands-on experience with various AWS services such as Redshift clusters and Route 53 domain configuration.
  • Consumed the data from Kafka using Apache Spark.
  • Performed various benchmarking steps to optimize the performance of Spark jobs and thus improve the overall processing.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive, and was involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Set up and benchmarked Hadoop/HBase clusters for internal use.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Python (see the sketch after this list).
  • Developed Spark scripts using Python shell commands as per the requirements.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Developed merge jobs in Python to extract and load data into a MySQL database.
  • Analyzed the data by performing Hive queries and running Pig scripts to study employee behavior.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
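
A minimal sketch of a MapReduce-style job rewritten as Spark RDD transformations, as mentioned above; the word-count logic and HDFS paths are illustrative assumptions.

# MapReduce job expressed as Spark RDD transformations;
# input/output paths are illustrative assumptions only.
from pyspark import SparkContext

sc = SparkContext(appName="mapreduce-to-rdd")

# Classic map/reduce word count: flatMap/map ~ mapper, reduceByKey ~ reducer.
counts = (sc.textFile("hdfs:///data/logs/*")           # assumed input path
            .flatMap(lambda line: line.split())         # map phase
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))            # reduce phase

counts.saveAsTextFile("hdfs:///data/logs_wordcount")     # assumed output path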

Environment: Hadoop, Hive, ZooKeeper, Python, MapReduce, Sqoop, Pig 0.10 and 0.11, JDK 1.6, HDFS, Flume, Oozie, DB2, HBase, Mahout, Unix, Linux

Confidential 

Big Data Developer/Admin

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team. Extensively used Pig for data cleansing.
  • Created partitioned tables in Hive. Managed and reviewed Hadoop log files.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs (see the sketch after this list).
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting. Installed and configured Pig and wrote Pig Latin scripts.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data; responsible for managing data coming from different sources.
  • Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
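
A minimal PySpark sketch of creating and querying a partitioned Hive table, as referenced above; the database, table, column names, and staging table are hypothetical.

# Partitioned Hive table sketch from PySpark; all object names are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioned-table")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS logs")

# Partitioned table definition (runs as HiveQL through the Hive metastore).
spark.sql("""
    CREATE TABLE IF NOT EXISTS logs.web_events (
        user_id STRING,
        url     STRING,
        bytes   BIGINT
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# Load staged data into one partition (staging table assumed to exist),
# then aggregate for reporting.
spark.sql("""
    INSERT OVERWRITE TABLE logs.web_events PARTITION (event_date = '2020-01-01')
    SELECT user_id, url, bytes FROM logs.web_events_staging
""")
spark.sql("""
    SELECT event_date, COUNT(*) AS hits
    FROM logs.web_events
    GROUP BY event_date
""").show()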

Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Java (JDK 1.6), SQL, Sqoop, Eclipse, Git, Unix, Linux, Subversion.
