Azure Big Data Engineer Resume
Scottsdale, AZ
SUMMARY
- Around 6 years of professional IT experience in analysis, design, and development using Big Data technologies, Python, Scala, and SQL.
- Strong experience with Apache Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase, Pig, Spark, Spark Streaming, Impala, Kafka, and Sqoop.
- Strong experience processing large sets of structured and semi-structured data and supporting systems application architecture.
- Good experience assessing business rules, collaborating with stakeholders, and performing source-to-target data mapping, design, and review.
- Experience optimizing ETL workflows.
- Worked on C/C++ and shell scripting to generate final bills for customers.
- Used Spark SQL to read data from Hive and process it with Spark APIs (see the PySpark sketch at the end of this summary).
- Involved in converting Hive/SQL queries into Spark transformations for efficient data access.
- Experience working both independently and collaboratively to solve problems and deliver high quality results in a fast-paced, unstructured environment.
- Strong written and oral communication skills; learn and adapt quickly to emerging technologies and paradigms.
- Experience with the Hadoop 2.0 architecture, YARN (MRv2), and developing YARN applications on it.
- Worked on building scalable distributed data solutions using Hadoop.
- Worked on analyzing data using HiveQL and writing UNIX shell scripts.
- Wrote custom UDFs to extend Hive functionality.
- Worked on managing and reviewing Hadoop log files.
- Used Sqoop to move data from relational databases into Hadoop and Flume to collect data and populate HDFS.
- Used HBase for quick lookups such as updates, inserts, and deletes in Hadoop.
- Very good knowledge of Spark architecture and of Python/Scala scripting against it.
- Experience with Cloudera, Hortonworks and MapR distributions.
- Worked in the Cloudera Hadoop and Spark developer environment, with on-demand lab work using a virtual machine in the cloud.
- Worked with business stakeholders and other SMEs to understand high-level business requirements.
- Experience with Apache Spark Core, Spark SQL, Spark Streaming, and MLlib components.
- Work experience with cloud infrastructure like Amazon Web Services (AWS).
- Experience in data modeling, complex data structures, data processing, data quality, and the data life cycle.
- Experience running MapReduce and Spark jobs on YARN.
- Expertise in interactive data visualization and analysis with BI tools such as Tableau.
- Hands-on experience with Snowflake.
- Hands-on experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Participated in design reviews, code reviews, unit testing and integration testing.
- Strong experience with SQL, PL/SQL, and database concepts.
- Experience with NoSQL databases such as HBase and MongoDB.
- Knowledge of administrative tasks such as installing Hadoop and its ecosystem components, including Hive, Spark, and HBase.
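A minimal PySpark sketch of the Hive-to-Spark pattern described above: reading a Hive table with Spark SQL and processing it with the DataFrame API. The database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("hive-to-spark-example")
             .enableHiveSupport()          # lets Spark SQL query Hive metastore tables
             .getOrCreate())

    # Pull data from Hive with Spark SQL
    orders = spark.sql("SELECT order_id, customer_id, amount, order_date FROM sales.orders")

    # Process with the DataFrame API: daily revenue per customer
    daily_revenue = (orders
                     .groupBy("customer_id", "order_date")
                     .agg(F.sum("amount").alias("daily_revenue")))

    # Persist the result back to Hive for downstream consumers
    daily_revenue.write.mode("overwrite").saveAsTable("sales.daily_revenue")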
TECHNICAL SKILLS
Hadoop/Big Data ecosystems: HDFS, Spark, PySpark, Spark Streaming, Kafka, Hive, MapReduce, Impala, Sqoop, Oozie.
NoSQL Databases: HBase, MongoDB
Tools and IDEs: Eclipse, IntelliJ IDEA, Aqua Data Studio, Altova MapForce, NetBeans, Maven, SBT.
Languages: C, C++, Java, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting, Perl scripting, Python and Scala
Databases and Data Warehousing: Teradata, Oracle, SQL Server, MySQL, DB2, PostgreSQL
ETL Tools: DataStage, Teradata
Operating Systems: Windows 95/98/2000/XP/Vista/7, Unix, Linux
PROFESSIONAL EXPERIENCE
Confidential, Scottsdale, AZ
Azure Big Data Engineer
Responsibilities:
- Build ingestion pipelines that ingest data from various databases and complex source systems into a Delta Lake built on Azure, using Spark, Spark Streaming, Python, and Scala (see the streaming sketch after this list).
- Partner end-to-end with product managers, architects, and data scientists to understand business requirements, design prototypes, and bring new ideas to the production environment.
- Attend project kick-off meetings, capture requirements, conduct impact analysis, and provide sizing for development efforts and timelines based on that analysis, sprint planning, and stand-ups.
- Build distributed, scalable, and reliable data pipelines that ingest and process data at scale in batch and real time.
- Created pipelines in Azure Data Factory (ADF) using datasets to extract, transform, and load data from sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse.
- Experience migrating SQL/Netezza databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Prepare high-level and low-level design documents for the captured requirements, based on analysis and per quality procedures and standards; prepare and maintain all deliverables from design through rollout with reliable quality.
- Develop Spark Core and Spark SQL/Streaming scripts in Python and Scala for faster data processing, and load the data into HBase.
- Develop complex, high-performance, and highly available distributed software systems that will successfully deliver to customers.
- Work with programming languages and frameworks including Scala, Python, Spark, SQL, and Hive.
- Work with various columnar storage and file formats, including Parquet, Avro, and ORC.
- Provide day-to-day support of existing data ingestion jobs that process roughly 10-1000 TB of data.
- Create, support, and schedule ingestions from new sources using Oozie and crontab.
- Worked effectively with Azure products: Azure Data Lake Store, Azure HDInsight, Cosmos DB, Power BI, Data Factory, and Databricks.
- Work on version control with Git and CI/CD with Jenkins.
- Handle unit testing and production readiness, including responsibility for testing frameworks and scripts before they go to QA and production.
- Work on loading data from Databricks Delta tables to Snowflake (see the Snowflake connector sketch after this list).
- Work closely with the QA team in deploying the code with Jenkins and make necessary changes as per QA team requests.
- Build and troubleshoot Cosmos DB SQL solutions that meet business and technical requirements.
- Handle deployment and maintenance, including working with internal and external data providers on data validation, providing feedback, and making customized changes to data feeds and data mappings for analytical and operational use.
- Participate in all release management activities, providing project descriptions, back-out plans, and implementation of enhancements into the live environment.
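A minimal sketch of the streaming ingestion pattern referenced above: Spark Structured Streaming reading from Kafka and landing data in a Delta Lake table on Azure storage. It assumes a Databricks cluster (or a Spark installation with the Kafka and Delta Lake connectors); the broker, topic, storage account, and paths are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

    # Read the raw event stream from Kafka
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "orders_topic")
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers bytes; cast the payload to string and keep the event time
    events = raw.select(F.col("value").cast("string").alias("payload"),
                        F.col("timestamp"))

    # Append to a bronze Delta table on ADLS Gen2 with checkpointing for recovery
    (events.writeStream
     .format("delta")
     .option("checkpointLocation",
             "abfss://lake@account.dfs.core.windows.net/_checkpoints/orders")
     .outputMode("append")
     .start("abfss://lake@account.dfs.core.windows.net/bronze/orders"))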
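And a sketch of the Delta-to-Snowflake load mentioned above, using the Snowflake Spark connector as shipped with Databricks (on plain Spark the format name is net.snowflake.spark.snowflake). All connection values, paths, and table names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("delta-to-snowflake").getOrCreate()

    # Read the curated Delta table
    df = spark.read.format("delta").load(
        "abfss://lake@account.dfs.core.windows.net/gold/daily_revenue")

    sf_options = {
        "sfUrl": "myaccount.snowflakecomputing.com",
        "sfUser": "etl_user",
        "sfPassword": "********",   # use a secret scope/key vault in practice
        "sfDatabase": "ANALYTICS",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "LOAD_WH",
    }

    # Push the DataFrame into a Snowflake table
    (df.write
     .format("snowflake")
     .options(**sf_options)
     .option("dbtable", "DAILY_REVENUE")
     .mode("overwrite")
     .save())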
Confidential
Big Data Developer
Responsibilities:
- Developed data pipelines using Spark, Hive, Pig, Python, and HBase for processing daily and historical data.
- Developed Spark Core and Spark SQL scripts using Scala and Python for faster data processing.
- Implemented dynamic partitioning and bucketing in Hive for efficient data access (see the sketch at the end of this section).
- Created big data workflows in Oozie to ingest data from various sources into Hadoop; these workflows comprise heterogeneous jobs such as Spark, Hive, Sqoop, and shell scripts.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs, DataFrames, and Python scripts.
- Implemented Apache NiFi flow topologies for moving data to HDFS.
- Extensively used Sqoop for importing and exporting data between RDBMS and HDFS.
- Very good experience in data-driven and test-driven development environments.
- Loaded delta records into HBase tables and used Apache Phoenix to pull and aggregate the data.
- Used ETL processes to extract, transform, and load data into the staging area and the data warehouse.
- Worked under minimal direction in fast-paced energetic environment, managing multiple projects and priorities at once.
- Worked on real-time data ingestion using Change Data Capture (CDC) methodologies.
- Migrated Pig and Hive jobs to Spark to improve performance and tune the code.
- Generated surrogate keys for dimension and fact tables for indexing and faster data access in the data warehouse.
- Created Hive tables and queried them using HiveQL.
- Built the dimensional data model (star schema) using Erwin Data Modeler.
- Experience working with the Change Capture stage and the Slowly Changing Dimension (SCD) stage.
- Involved in Unit testing and Integration testing to test jobs and the flow.
- Experience troubleshooting by tuning mappings and identifying and resolving performance bottlenecks at various levels, including source, target, mapping, and session.
- Created data flow diagrams and source-to-stage and stage-to-target mapping documents indicating the source tables, columns, data types, required transformations, and business rules to be applied.
- Used stored procedures in Sqoop exports to update data in MySQL/Netezza databases.
- Used Git for version control and Jenkins for continuous integration and continuous deployment.
- Performed daily database maintenance, which involved monitoring the daily script runs and troubleshooting any errors in the process.
Environment: Hortonworks, Spark, Spark SQL, Hive, NiFi, HBase, shell scripting, Scala, Python, Oozie, Hue, Sqoop, HDFS, Netezza, SQL Developer
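A minimal sketch of the dynamic partitioning and bucketing work called out in this section, expressed with the PySpark DataFrame writer (the Spark-side equivalent of Hive dynamic partitions and bucketed tables). Database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partition-bucket-example")
             .enableHiveSupport()          # register the table in the Hive metastore
             .getOrCreate())

    events = spark.table("staging.events")

    # One partition directory per distinct event_date, 16 buckets keyed on customer_id
    (events.write
     .mode("overwrite")
     .partitionBy("event_date")
     .bucketBy(16, "customer_id")
     .sortBy("customer_id")
     .saveAsTable("warehouse.events_bucketed"))

    # Queries that filter on the partition column scan only the matching partitions
    spark.sql("""
        SELECT customer_id, SUM(amount) AS total
        FROM warehouse.events_bucketed
        WHERE event_date = '2020-01-15'
        GROUP BY customer_id
    """).show()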