Azure Data Engineer Resume
Bellevue, WA
SUMMARY
- 5+ years of experience in the IT industry, including nearly 4 years across all phases of Hadoop ecosystem components and Big Data technologies.
- Experienced in configuring, installing, upgrading, and managing Hortonworks and Cloudera Hadoop distributions.
- Hands-on experience with Big Data and Hadoop ecosystem components (HDFS, MapReduce, YARN, Hive, Hue, Sqoop, Flume, Spark, Oozie, HBase, and Pig).
- Experienced in implementing Big Data projects using the Hortonworks distribution.
- Experienced in managing and reviewing Hadoop log files.
- Experience in handling both Cloudera and Hortonworks Hadoop distributions.
- Hands-on experience configuring Hadoop clusters in a professional environment and on Amazon Web Services (AWS) using EC2 instances.
- Experienced in working with MapReduce programs and Hive commands.
- Experienced in implementing NameNode High Availability using QJM and NFS to avoid a single point of failure.
- Performed commissioning, decommissioning, and balancing of nodes, managed nodes, and tuned servers for optimal performance on running clusters.
- Experienced in writing Oozie workflows and job controllers for job automation.
- Strong experience with data warehouse tooling, including ETL tools such as Informatica, BI tools such as MicroStrategy and Tableau, and relational database systems such as Oracle, MySQL, and PostgreSQL.
- Experienced in close monitoring and analysis of MapReduce job executions on the cluster at the task level.
- Experienced with the complete software development lifecycle, including design, development, testing, and implementation of moderately to highly complex systems.
- Developed NiFi workflows to pick up multiple retail files from an FTP location and move them to HDFS on a daily basis.
- Working knowledge of Sqoop and Flume for data processing.
- Enabled cluster security using Kerberos.
- Strong monitoring, troubleshooting, and performance tuning skills.
- Excellent knowledge of NoSQL databases such as HBase.
- Experienced in loading data from different data sources (Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Strong overall experience in system administration, installation, upgrades, patching, migration, configuration, troubleshooting, security, backup, disaster recovery, performance monitoring, and fine-tuning on Linux (RHEL) systems.
- Communicated with diverse client communities offshore and onshore, with a dedication to client satisfaction and quality outcomes; extensive experience coordinating offshore development activities.
- Represented production support on company-wide project teams.
- Monitored process and software changes impacting production support, communicated project information to production support staff, and raised production support issues to the project team.
- Prioritized workload to provide timely and accurate resolutions; performed production support activities involving issue assignment, analysis, and resolution within the specified SLAs.
TECHNICAL SKILLS
Big Data components: HDFS, MapReduce, HBase, Pig, Cassandra, Hive, Scala, Sqoop, Oozie, Flume, Kafka, ZooKeeper, MongoDB
Programming Languages: Java (J2SE, J2EE), C, C#, PL/SQL, Swift, SQL+, ASP.NET, JDBC, Python
Testing Tools: JUnit, HP Unified Functional Testing, HP Performance Center, Selenium, WinRunner, LoadRunner, QTP
UNIX Tools: Apache, Yum, RPM
Operating Systems: Windows, Linux, Ubuntu, Mac OS, Red Hat Linux
Protocols: TCP/IP, HTTP and HTTPS
Web Servers: Apache Tomcat
Cluster Management Tools: Cloudera Manager, Hortonworks, Ambari
Methodologies: Agile, V-model, Waterfall model
Databases: HBase, MongoDB, Cassandra, Oracle 10g, MySQL, CouchDB, MS SQL Server
PROFESSIONAL EXPERIENCE
Azure Data Engineer
Confidential
Responsibilities:
- Designed and developed Hadoop - based Big Data analytic solutions and engaged clients in technical discussions.
- Worked with multiple Azure services, including Azure Data Factory, Azure Data Lake, Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, and HDInsight.
- Worked on the creation and implementation of custom Hadoop applications in the Azure environment.
- Created ADF pipelines to load data from on-premises sources into an Azure SQL Server database and Azure Data Lake Storage.
- Developed complex Hive queries to extract data from various sources (Data Lake) and store it in HDFS.
- Used Azure Data Lake Analytics and HDInsight/Databricks to generate ad hoc analyses.
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipeline to move data in and out of Hadoop using PySpark and shell scripting.
- Ingested data into Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Worked on all aspects of data mining, data collection, data cleaning, model development, data validation, and data visualization.
- Experienced in managing Azure Data Lake Storage (ADLS) and Databricks Delta Lake, with an understanding of how to integrate them with other Azure services.
- Worked on building data pipelines using Azure Data Factory, Azure Databricks, loading data to Azure Data Lake.
- Handled bringing enterprise data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HBase tables.
- Responsible for estimating the cluster size, monitoring, and troubleshooting of the Hadoop cluster.
- Used Zeppelin, Jupyter notebooks, and Spark-Shell to develop, test, and analyze Spark jobs before scheduling customized Spark jobs.
- Worked with Azure Blob and Data Lake storage and loaded data into Azure Synapse Analytics (SQL DW).
- Performed Hive tuning techniques such as partitioning, bucketing, and memory optimization.
- Developed Spark applications using PySpark and Spark-SQL for data extraction, transformation, and aggregation from various file formats, analysing and transforming the data to uncover insights into customer usage patterns.
- Analyzed, designed, and built modern data solutions using Azure PaaS services to support data visualization.
- Used Databricks utilities (widgets) to pass parameters at run time from ADF to Databricks (see the sketch following this section).
- Integrated data storage options with Spark, notably with Azure Data Lake Storage and Blob storage.
- Hands-on experience creating Spark clusters in both HDInsight and Azure Databricks environments.
- Created an Oozie workflow to automate the process of loading data into HDFS and Hive.
- Created tables using NoSQL databases like HBase to load massive volumes of semi-structured data from sources.
- Created and provisioned numerous Databricks clusters for batch and continuous streaming data processing and installed the required libraries on the clusters.
- Worked on creating tabular models on Azure analysis services for meeting business reporting requirements.
- Developed SSIS modules to move data from a variety of sources such as MS Excel, flat files, and CSV files.
- Designed, developed, and deployed Business Intelligence solutions using SSIS, SSRS, and SSAS.
- Implemented a variety of MapReduce tasks in Scala for data cleansing and data analysis in Impala.
- Fetched live stream data using Spark Streaming and Kinesis.
- Imported and exported data between HDFS and relational database systems using Sqoop and loaded it into partitioned Hive tables.
Environment: Azure Data Factory, Azure Databricks, Azure Data Lake, Blob Storage, HDFS, MapReduce, Spark, SQL, Hive, HBase, HDInsight, Kafka, Oozie, NiFi, Jenkins, OLAP, OLTP, Scala, SSIS, Agile.
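A minimal sketch of the ADF-to-Databricks parameter passing described above, assuming a Databricks notebook where dbutils and spark are available; the widget names and paths are hypothetical:

    # Databricks notebook: read run-time parameters supplied by an ADF pipeline
    # (passed as base parameters on the notebook activity).
    dbutils.widgets.text("load_date", "")    # hypothetical widget name
    dbutils.widgets.text("source_path", "")  # hypothetical widget name

    load_date = dbutils.widgets.get("load_date")
    source_path = dbutils.widgets.get("source_path")

    # Use the parameters to drive the daily load from ADLS.
    df = spark.read.parquet(f"{source_path}/dt={load_date}")
    df.write.mode("overwrite").parquet(f"/mnt/curated/retail/dt={load_date}")  # hypothetical mount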
Hadoop Engineer
Confidential - Bellevue, WA
Responsibilities:
- Involved in the requirements-gathering phase, collecting needs from business users to accommodate constantly changing user requirements.
- Created a data quality framework for Spark (PySpark) that performs schema validation and data profiling (see the sketch following this section).
- Developed Spark code using Scala and Spark-SQL/Streaming for quicker data testing and processing.
- Used Python and Scala with Spark to design data and ETL pipelines.
- Developed complex, maintainable, easy-to-use Python and Scala scripts that meet application requirements for data processing and analytics using built-in libraries.
- Contributed to the design of Spark SQL queries, DataFrames, data imports from data sources, transformations, read/write operations, and saving results to an output directory in HDFS/AWS S3.
- Created Pig Latin scripts to import data from web server output files and store it in HDFS.
- Created Tableau visualizations to help internal and external teams view and extract information from big data platforms.
- Responsible for conducting Hive queries and running Pig scripts on raw data to analyze and clean it.
- Created Hive tables, imported data, and wrote Hive queries.
- Worked on ETL (Extract, Transform, Load) processing, which includes data source, data transformation, mapping, conversion, and loading.
- Used multiple compression algorithms to optimize MapReduce jobs to make the most of HDFS.
- Worked with AWS Elastic Compute Cloud (EC2) infrastructure for computational operations, while Simple Storage Service (S3) was used for storage.
- Configured AWS CLI and performed necessary actions on the AWS services using scripting.
- Able to execute and monitor Hadoop and Spark tasks on AWS using EMR, S3, and CloudWatch services.
- Configured the monitoring and alerting of production and corporate servers/storage using CloudWatch.
- Developed Docker containers and merged them into the workflow to keep them lightweight.
- Developed several MapReduce programs to extract, transform, and aggregate data from a variety of file formats including XML, JSON, CSV, and other compressed file formats.
- Migrated existing data from Teradata/SQL Server to Hadoop and performed ETL operations on it.
- Good knowledge of querying data from Cassandra for searching, grouping, and sorting.
- Optimized existing Hadoop algorithms using SparkContext, Spark-SQL, DataFrames, and pair RDDs.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Created AWS Lambda functions, provisioned EC2 instances in the AWS environment, implemented security groups, and administered Amazon VPCs.
- Developed ETL pipelines in and out of the data warehouse and developed major regulatory and financial reports using advanced SQL queries.
- Experience using the AWS services Athena and Redshift and Glue ETL jobs.
- Experience using Terraform to create Infrastructure as Code on AWS.
- Scripting experience in PySpark, involving data cleansing and transformation.
- Used AWS Kinesis to gather and load data onto HDFS and Sqoop to load data from relational databases.
- Developed job processing scripts using Oozie workflow to automate data loading into HDFS.
- Developed SQL queries for both dimensional and relational data warehouses and performed data analysis.
- Good experience with use-case development and software methodologies like Agile.
Environment: HDFS, Spark, Spark SQL, PySpark, Scala, Python, AWS S3, EC2, CLI, EMR, CloudWatch, Docker, DataFrames, Pair RDDs, NiFi, SQL, Pig Latin, Hive, Tableau, MapReduce.
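A minimal sketch of the kind of schema validation and profiling performed by the PySpark data quality framework described above; the schema, column names, and input path are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import types as T

    spark = SparkSession.builder.appName("data-quality-check").getOrCreate()

    # Expected schema for an incoming feed (columns are hypothetical).
    expected = T.StructType([
        T.StructField("customer_id", T.LongType(), False),
        T.StructField("event_ts", T.TimestampType(), True),
        T.StructField("amount", T.DoubleType(), True),
    ])

    df = spark.read.parquet("s3a://example-bucket/landing/events/")  # hypothetical path

    # Schema validation: fail fast if columns or types drift from the contract.
    if df.schema != expected:
        raise ValueError("Schema drift detected: " + df.schema.simpleString())

    # Lightweight profiling: total rows and per-column null counts.
    total_rows = df.count()
    null_counts = {c: df.filter(df[c].isNull()).count() for c in df.columns}
    print(total_rows, null_counts)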
Hadoop Engineer
Confidential
Responsibilities:
- Responsible for cluster maintenance, monitoring, management, commissioning and decommissioning of DataNodes, troubleshooting, reviewing data backups, and managing and reviewing log files for Cloudera (CDH 4.x.x and CDH 5.x.x) distributions.
- Hands-on experience provisioning and managing multi-node Hadoop clusters on Amazon Web Services (AWS) EC2 instances.
- Added/installed new components and removed existing ones through Cloudera Manager.
- Monitored workload, job performance, and capacity planning using Cloudera Manager.
- Installed Ambari on an existing cluster for monitoring workload, job performance, and capacity planning.
- Performed major and minor upgrades and patch updates.
- Created and managed cron jobs.
- Installed MapR ecosystem components such as Pig, Hive, HBase, and Sqoop in a POC cluster.
- Experience in setting up tools like Nagios and Ganglia for monitoring Hadoop clusters.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Handled data movement between HDFS and different web sources using Flume and Sqoop.
- Extracted files from NoSQL databases such as Cassandra, Cloud DB, and HBase through Sqoop and placed them in HDFS for processing.
- Implemented the Fair Scheduler to share cluster resources among the MapReduce jobs submitted by users.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Deep and thorough understanding of ETL tools and how they can be applied in a Big Data environment, supporting and managing Hadoop clusters.
- Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing (a comparable mapper is sketched after this section).
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Extensively worked with Informatica to extract data from flat files, Oracle, and Teradata and to load the data into the target database.
- Experienced in deploying Hadoop Cluster using automation tools like Chef and Puppet.
- Responsible for developing data pipeline using HDInsight, Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
- Commissioned DataNodes as data grew and decommissioned DataNodes from the cluster when hardware degraded.
- Set up and managed HA NameNode to avoid single points of failure in large clusters.
- Worked with data delivery teams to set up new Hadoop and Linux users, set up Kerberos principals, and test HDFS and Hive.
- Worked on Tableau to generate reports on HDFS data.
- Held regular discussions with other technical teams regarding upgrades, process changes, special processing, and feedback.
Environment: Java (JDK 1.7), Linux, Shell Scripting, Tableau, MapReduce, Teradata, SQL Server, NoSQL, Cloudera Hadoop, Flume, Sqoop, Chef, Puppet, Pig, Hive, Zookeeper and HBase.
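The data-cleaning MapReduce jobs above were written in Java; since that code is not included here, the following is a comparable minimal data-cleaning mapper sketched in Python for Hadoop Streaming (the field count, delimiter, and paths are hypothetical):

    #!/usr/bin/env python
    # Minimal Hadoop Streaming mapper: drops malformed rows and trims fields.
    # Example run (hypothetical paths):
    #   hadoop jar hadoop-streaming.jar -input /raw/events -output /clean/events \
    #       -mapper mapper.py -file mapper.py
    import sys

    EXPECTED_FIELDS = 5  # hypothetical record width

    for line in sys.stdin:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        # Skip records with the wrong number of columns or an empty key field.
        if len(fields) != EXPECTED_FIELDS or not fields[0]:
            continue
        print("\t".join(fields))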
Big Data Engineer
Confidential
Responsibilities:
- Created, validated, and maintained scripts to load data from and into tables in Oracle PL/SQL and SQL Server 2008 R2.
- Wrote stored procedures and triggers.
- Converted, tested, and validated Oracle scripts for SQL Server.
- Developed Kafka producers and consumers, HBase clients, and Apache Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive (see the consumer sketch following this section).
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS.
- Used Solr for database integration from IBM Maximo to SQL Server.
- Upgraded IBM Maximo database from 5.2 to 7.5.
- Analyzed, validated, and documented changed records for the IBM Maximo web application.
- Imported data from a MySQL database into Hive using Sqoop.
- Wrote MapReduce jobs.
- Developed, validated, and maintained HiveQL queries.
- Ran reports using Pig and Hive queries.
- Wrote and implemented Apache Pig scripts to load data from and store data into Hive.
- Installed and configured Hue.
- Managed Amazon Web Services (AWS) infrastructure with automation and configuration management tools such as IBM Deploy, Puppet, or custom-built tooling, designing cloud-hosted solutions with specific AWS product suite experience.
- Used JUnit for unit testing.
- Conducted data mining, data modelling, statistical analysis, business intelligence gathering, trending, and benchmarking using Datameer.
- Used Tableau for visualization and generated reports for financial data consolidation, reconciliation, and segmentation.
- Designed and developed scripts for transferring files between servers using FTP/SFTP according to business requirements.
- Implemented machine learning techniques such as clustering and regression in Tableau and created interactive dashboards.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache, Cloudera (CDH3, CDH4), and YARN distributions.
- Supported the full testing cycle for ETL processes, including bug fixes, using the data integration tool Pentaho to design ETL jobs in the process of building data warehouses and data marts.
Environment: HDFS, Hive, Pig, Sqoop, Zookeeper, Oozie, ETL, Pentaho BI 5.0.1, AWS, Tableau, Hive Query, CentOS, Cloudera
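A minimal sketch of the Kafka consumer side described above, assuming the kafka-python client library; the topic, broker address, and consumer group are hypothetical:

    from kafka import KafkaConsumer  # assumes the kafka-python package
    import json

    # Consume JSON events and hand them off for downstream HBase/HDFS loading.
    consumer = KafkaConsumer(
        "retail-events",                       # hypothetical topic
        bootstrap_servers=["localhost:9092"],  # hypothetical broker
        group_id="etl-loader",                 # hypothetical consumer group
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for message in consumer:
        record = message.value
        # Placeholder for the actual sink (e.g. write to HBase or HDFS).
        print(message.topic, message.partition, message.offset, record)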
Confidential
Jr. Java Developer
Responsibilities:
- Worked with several clients with day-to-day requests and responsibilities.
- Involved in analysing system failures, identifying root causes, and recommending courses of action.
- Integrated Struts, Hibernate, and JBoss Application Server to provide efficient data access.
- Involved in HTML page Development using CSS and JavaScript.
- Developed the presentation layer with JSF, JSP, and JavaScript technologies.
- Designed table structures and coded scripts to create tables, indexes, views, sequences, synonyms, and database triggers. Involved in writing database procedures, triggers, and PL/SQL statements for data retrieval.
- Developed UI components using jQuery and JavaScript functionality.
- Designed database and coded PL/SQL stored Procedures, triggers required for the project.
- Used the Session and FacesContext JSF objects to pass content from one bean to another.
- Designed and developed Session Beans to implement business logic.
- Tuned SQL statements, Hibernate mappings, and WebSphere Application Server to improve performance and consequently meet the SLAs.
- Created the EAR and WAR files and deployed the application in different environments.
- Engaged in analysing requirements, identifying various individual logical components, expressing the system design through UML diagrams using Rational Rose.
- Assisted in designing, building, and maintaining database to analyze life cycle of checking and debit transactions.
- Involved in running shell scripts.
- Extensively used HTML and CSS in developing the front-end.
- Deployed and tested the application on WebSphere Application Server.
- Designed and Developed JSP pages to store and retrieve information.
- Suggested improvements in monitoring and application logging.
- Ensured OPS SLAs were not broken.
- Provided SME triage and root cause analysis in war rooms.
- Participated in 24/7 support rotations.
Environment: Java, J2EE, JSP, JavaScript, JSF (Sun RI), Ajax4JSF, Spring, XML, XHTML, Hibernate, Oracle 9i, PL/SQL, SOAP Web services, WebSphere, Oracle, JUnit, SVN.