Cloud/Big Data Engineer Resume
PA
SUMMARY
- 7+ years of professional IT experience, including management of Big Data operations and the Hadoop environment.
- Experience designing and building applications with Big Data technologies such as HDFS, MapReduce, Sqoop, Hive, PySpark and Spark SQL, HBase, Python, Snowflake, S3 storage, and Airflow.
- Strong data engineering and ETL pipeline development skills on batch and streaming data using PySpark and Spark SQL (a minimal PySpark sketch appears after this summary).
- Good understanding of Apache Kafka streaming applications (KSQL, KStream, KTable).
- Working knowledge of AWS cloud services for big data development, such as EMR, S3, Redshift, and CloudWatch.
- Created Spark clusters and configured high-concurrency clusters in Azure Databricks to accelerate the preparation of high-quality data.
- Working knowledge of Python Integrated Development Environments such as PyCharm, Jupyter Notebook, and Anaconda.
- Experience with Spark SQL in Databricks for data extraction, transformation, and aggregation from numerous file types, as well as analyzing and transforming data to reveal insights into customer usage trends.
- Experience tuning MapReduce jobs and complex Hive queries for performance.
- Good experience with stream-processing systems using PySpark.
- Good knowledge of Jenkins and AWS for setting up CI/CD pipelines.
- Hands-on experience in installation, configuration, support, and management of Big Data and Hadoop cluster infrastructure.
- Strong development skills in Java, J2EE, JDBC.
- Responsible for troubleshooting performance issues with Tableau reports.
- Good understanding of Teradata queries for data import and extraction.
- Set up and configured Big Data architecture on the AWS cloud platform using a CI/CD solution that included Git, Jenkins, Docker, and Kubernetes.
- Used BI tools to run ad-hoc queries directly on Hadoop via Impala.
- Practical experience using Sqoop and Flume for data acquisition into a Hadoop cluster.
- Experienced in using Zookeeper.
- Thorough understanding of Spark architecture and Structured Streaming on Databricks; used Databricks workspaces for business analytics and set up Databricks on AWS and Microsoft Azure.
- Good at Databricks cluster management and machine learning lifecycle management.
- Strong interest and experience in AWS cloud infrastructure database migrations, including converting existing Oracle and MS SQL Server databases to PostgreSQL, MySQL, and Aurora.
- Good knowledge of BI tools such as SSRS.
- Expertise with relational databases such as MySQL and NoSQL databases such as MongoDB and Cassandra, including their configuration and administration.
- Extensive experience with Informatica performance tuning, including identifying bottlenecks at the source, target, and mapping levels.
- Set up and administered MySQL replication in master-slave and master-master configurations.
- Excellent knowledge of Amazon web services technologies such as EMR, S3, and CloudWatch for running and monitoring Hadoop/Spark operations on AWS.
- Working knowledge of Kubernetes, AWS (Amazon Web Services), Elastic MapReduce (EMR), S3 storage, EC2 instances, and data warehousing.
- Have a strong understanding of job orchestration solutions such as Oozie and Airflow.
- Experience delivering data and analytics solutions using AWS, Azure, and similar cloud data lakes.
- Experience using Oozie to create automated workflows that are both time- and data-driven.
- Expertise in utilizing Airflow to create, debug, schedule, and monitor jobs.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF).
- Used Spark to stream data from many sources, including the cloud (AWS, Azure) and on-premises systems.
- Excellent communication skills, work ethic, and the ability to work effectively in highly technical teams.
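As a minimal illustration of the batch ETL pattern summarized above, the following PySpark/Spark SQL sketch reads raw events from S3, aggregates them, and writes curated output back to S3. The bucket paths and column names are hypothetical, not from any specific engagement.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical paths and column names, for illustration only.
spark = SparkSession.builder.appName("daily-usage-etl").getOrCreate()

# Read raw events from S3 (batch).
events = spark.read.parquet("s3://example-bucket/raw/events/")

# Clean and aggregate with Spark SQL functions.
daily_usage = (
    events
    .filter(F.col("event_type").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(F.count("*").alias("event_count"),
         F.sum("duration_sec").alias("total_duration_sec"))
)

# Write curated output back to S3, partitioned by date.
(daily_usage.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/daily_usage/"))
```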
TECHNICAL SKILLS
Programming Languages: Python, SQL, Java, Scala, C#
Big Data Technologies: HDFS, Apache NiFi, MapReduce, Sqoop, Pig, Hive, Oozie, Impala, ZooKeeper, Spark, Kafka, Snowflake, Flume
Operating System: Unix, Linux, Windows, Mac OS X, Ubuntu
Database Technologies: MySQL, Hive, HBase, Oracle, AWS Aurora, AWS DynamoDB, MongoDB, PostgreSQL, Teradata
Java tools: J2EE, JDBC, Jenkins
Cloud Platforms: AWS, Microsoft Azure
Data Integration & Visualization Tools: Tableau, Informatica, Power BI, SSIS, SSRS
NoSQL Databases: HBase, Cassandra, MongoDB
Development Environments: Jupyter Notebook, PyCharm, Visual Studio, IntelliJ
Azure: Data lake, Data Factory, Databricks, Azure SQL
AWS Services: EC2, EMR, S3, Redshift, Lambda, Glue, Kinesis
Version control tools: TFS, GIT
PROFESSIONAL EXPERIENCE
Confidential, PA
Cloud/Big Data Engineer
Responsibilities:
- Used the ingestion tool Sqoop to extract data from Teradata.
- Developed Spark applications implementing various aggregations and transformations using Spark RDDs and Spark SQL.
- Used AWS Lambda to develop APIs and run code in AWS without managing servers.
- Built S3 buckets, managed their policies, and used S3 and Glacier for storage and backup on AWS.
- Configured Hadoop ecosystem tools such as Hive, Pig, ZooKeeper, Flume, Impala, and Sqoop.
- Developed Spark scripts to import data from S3 buckets in AWS.
- Involved in migrating objects from Teradata to Snowflake.
- Developed data warehouse models in Snowflake for over 100 datasets using WhereScape.
- Used Airflow DAGs to schedule all workflows developed in Python (a minimal sketch appears after this list).
- Designed, developed, tested and maintained Tableau functional reports based on user requirements.
- Used Spark SQL in Databricks for data extraction, transformation, and aggregation from numerous file formats, analyzing and transforming data to reveal insights into customer usage trends.
- Worked on multiple POCs with Apache NiFi.
- Worked on a CI/CD solution using Git, Jenkins, Docker, and Kubernetes to set up and configure Big Data architecture on the AWS cloud platform.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, U-SQL, and Azure Data Lake Analytics; ingested data into Azure services and processed it in Azure Databricks.
- Constructed real-time data pipelines with Kafka and Spark Streaming.
- Designed MySQL stored procedures and triggers to reduce bandwidth between servers and clients.
- Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and processing.
- Implemented Sqoop jobs for large sets of structured and semi-structured data migration between HDFS and other databases like Hive.
- Extracted data from log files and pushed it into HDFS using Flume.
- Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication and schema design.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
- Installed, configured, tested, monitored, upgraded, and tuned new and existing PostgreSQL databases.
- Worked on Lambda functions that aggregate data from incoming events and then store the results in DynamoDB (a minimal sketch appears at the end of this project entry).
- Wrote parsers in Python to extract useful data from the design database.
- Used an automatic job scheduler to schedule multiple applications in production.
- Handled errors and exceptions while developing and deploying code.
- Responsible for the data quality of the entire data processing pipeline.
- Handled product delivery and provided production support to fix bugs.
- Strong communication and analytical skills, as well as a solid ability to multitask and work independently or in teams.
- Built an Oozie workflow to automate Spark applications on an ongoing basis.
- Supported the setup of a QA environment and updated configurations for implementing Pig and Sqoop scripts.
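A minimal sketch of the Airflow scheduling pattern referenced above; the DAG id, schedule, and task callables are illustrative placeholders rather than the production workflow.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Illustrative task callables; the real workflows were project-specific.
def extract():
    pass

def transform_and_load():
    pass

with DAG(
    dag_id="example_daily_pipeline",      # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="transform_and_load",
                               python_callable=transform_and_load)
    extract_task >> load_task             # run the load step after extract
```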
Environment: AWS, Azure, Snowflake, Sqoop, Kafka, Scala, MapReduce, Apache Spark, PySpark, Pig, Hive, MySQL, PostgreSQL, MongoDB, Teradata, Tableau, Oozie, Airflow, Python, IntelliJ, TeamCity, and GitHub.
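Related to the Lambda and DynamoDB bullet above, a minimal sketch of a handler that aggregates incoming event records and writes the result to a DynamoDB table; the table name, payload shape, and key scheme are assumptions, not the actual production design.

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("event_aggregates")   # hypothetical table name

def handler(event, context):
    # Sum a numeric field across the incoming batch of records
    # (assumes an SQS-style event with integer "amount" values).
    records = event.get("Records", [])
    total = 0
    for record in records:
        body = json.loads(record["body"])
        total += int(body.get("amount", 0))

    # Store one aggregate item per invocation (illustrative key scheme).
    table.put_item(Item={
        "aggregate_id": context.aws_request_id,
        "record_count": len(records),
        "total_amount": total,
    })
    return {"statusCode": 200, "body": json.dumps({"total": total})}
```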
Confidential, TX
Big-Data Engineer
Responsibilities:
- Involved in architecture design, development, and implementation of Hadoop deployment, backup, and recovery systems.
- Developed MapReduce programs in Python on Hadoop to parse the raw data, populate staging tables, and store the refined data in partitioned Hive tables (a minimal sketch appears after this list).
- Queried managed and external tables created in Hive using Impala.
- Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse, and created DAGs to run the Airflow workflows.
- Installed and configured Pig and wrote Pig Latin scripts.
- Loaded data into HDFS using Kafka and moved the data back to S3 buckets after processing.
- Designed and implemented MongoDB-based applications; installed, managed, and developed MongoDB clusters.
- Experience in data cleansing and data mining.
- Set up and benchmarked Hadoop/HBase clusters for internal purposes.
- Developed a custom NiFi processor for filtering text from flow files, which included executing Spark and Sqoop scripts through NiFi, creating scatter-and-gather patterns in NiFi, ingesting data from PostgreSQL into HDFS, and fetching Hive metadata and storing it in HDFS.
- Configured MySQL replication as part of a high-availability solution.
- Wrote, compiled, and executed Apache Spark programs in Scala as necessary to perform ETL jobs on ingested data.
- Supported BI teams to help them implement important BI systems.
- Created Hive-based reports to support application metrics.
- Used AWS SageMaker to quickly build, train, and deploy models.
- Deployed the code to EMR via Jenkins CI/CD.
- Coordinated with the data science team in creating PySpark jobs and worked on Oozie workflows for task scheduling.
- Performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools (Tableau).
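A minimal Hadoop Streaming sketch of the Python MapReduce pattern referenced above; the input layout, key choice, and file name are illustrative only.

```python
#!/usr/bin/env python
# Illustrative Hadoop Streaming job: count records per key. Run roughly as:
#   hadoop jar hadoop-streaming.jar \
#       -input /raw/events -output /staging/event_counts \
#       -mapper "python mr_counts.py map" -reducer "python mr_counts.py reduce"
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print(f"{fields[0]}\t1")          # key assumed to be the first column

def reducer():
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key == current_key:
            count += int(value)
        else:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    # Select mapper or reducer via a command-line flag.
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()
```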
Environment: Python, Apache Hadoop, HBase, MongoDB, Apache Hive, PySpark, Oozie, HDFS, GitHub, Power BI, AWS S3, Glue
Confidential
Hadoop/Big Data Engineer
Responsibilities:
- Gathered input from business partners and subject matter experts on business requirements.
- Installed and configured a Hadoop cluster on Amazon Web Services (AWS) for a proof of concept.
- Used Sqoop to regularly import data from RDBMS sources into HDFS and Hive.
- Created Hive tables and worked on them with HiveQL, which automatically runs MapReduce jobs in the backend.
- Extensively involved in the creation and testing of ETL operations using SQL queries (subqueries and join conditions).
- Profiled structured, unstructured, and semi-structured data across various sources to identify patterns, and implemented data quality metrics using queries or Python scripts as appropriate for each source (a minimal sketch appears after this list).
- Created data pipelines for various events to load data from DynamoDB to an AWS S3 bucket and subsequently to an HDFS location.
- Designed custom MapReduce data analysis and data cleaning programs using Pig Latin scripts.
- Converted existing BO reports to Tableau dashboards.
- Used Jupyter Notebook to analyze and connect the data from multiple sources.
- Used BI tools to run ad-hoc queries directly on Hadoop via Impala.
- Queried Teradata tables using SQL Assistant.
- Used Oozie to manage and schedule batch jobs on a Hadoop cluster.
- Built stored procedures and views in Snowflake for loading dimensions and facts, and used them in Talend.
- Experienced in managing and reviewing Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Used Informatica to construct ETL programs implementing business requirements.
- Used ZooKeeper to provide cluster coordination services.
- Used Cloudera Manager to help monitor the Hadoop cluster.
- Analyzed large datasets to determine the best approach to aggregate and report on them.
- Developed PL/SQL triggers and master tables for automatic creation of primary keys.
- Responsible for using Oozie to manage workflows.
- Participated in daily Scrum meetings to discuss the development and progress of sprints and help make them more productive.
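A minimal sketch of the kind of Python data-quality check referenced above; the file, column names, and thresholds are hypothetical.

```python
import pandas as pd

# Hypothetical source extract and rules; real checks were driven by
# source-specific requirements.
df = pd.read_csv("source_extract.csv")

metrics = {
    "row_count": len(df),
    "null_customer_id_pct": df["customer_id"].isna().mean() * 100,
    "duplicate_rows": int(df.duplicated().sum()),
    "negative_amounts": int((df["amount"] < 0).sum()),
}

# Fail fast when a metric breaches its (illustrative) threshold.
thresholds = {"null_customer_id_pct": 1.0, "duplicate_rows": 0, "negative_amounts": 0}
failures = {name: value for name, value in metrics.items()
            if name in thresholds and value > thresholds[name]}

print(metrics)
if failures:
    raise ValueError(f"Data quality checks failed: {failures}")
```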
Environment: Hadoop, Map Reduce, HDFS, Hive, Pig, Hadoop distribution of Cloudera, Informatica, Jupyter notebook, AWS, Linux, XML, Eclipse, PL/SQL.
Confidential
.Net/Hadoop Developer
Responsibilities:
- Participated in the evaluation of both functional and non-functional requirements.
- Installed and configured Hadoop MapReduce and HDFS.
- Analyzed data on Hadoop clusters using various big data analytics tools, including Flume, Pig, Hive, HBase, Oozie, ZooKeeper, Spark, Sqoop, and Kafka.
- Strong understanding of and experience with the NoSQL databases HBase and Cassandra.
- Installed and set up Hive, and wrote Hive UDFs to implement various business requirements.
- Developed reusable Mapplets and Transformations.
- Participated in the start-up and completion of a proof of concept on Flume for pre-processing.
- Used the debugger to debug mappings and gain troubleshooting information about data and error conditions.
- Upgraded Oracle 9i to 10g for the latest features and tested the databases.
- Created complex SQL queries and stored procedures.
- Created an XML schema and web services for data support and structures.
- Responsible for the management of data from various sources.
- Used the Hibernate ORM framework in conjunction with the Spring framework for data persistence and transaction management.
- Used the Struts validation framework for form-level validations.
- Wrote Selenium test cases for unit testing of classes.
- Provided technical assistance for production environments, including resolving issues, identifying flaws, and proposing and implementing solutions.
- Built and deployed a Java application across different Unix systems, as well as producing unit and functional test results and release notes.
Environment: Hadoop, HBase, Hive, .Net, SSIS, Power BI, Python, Cassandra, MapReduce, Oracle
Confidential
.Net Developer
Responsibilities:
- Created Crystal Reports from the ground up, including gathering requirements from business customers, defining report specifications, and deploying the reports in multiple systems.
- Used ASP.NET, C#.NET, HTML, XML, and XSLT to design and develop an interactive graphical user interface for multiple modules using WinForms, Windows services, desktop applications, inheritable WinForms, and user controls.
- Performed high-level design of ETL DTS and SSIS packages for data integration using OLE DB connections.
- Strong SSRS report development experience.
- Developed PL/SQL queries and a JDBC module for querying the database.
- Created a Windows Workflow with custom activities for web service access.
- Involved in the development of business logic in the middle tier, comprising code-behind files, user controls, and classes written in C#.NET.
- Used ADO.NET and objects including DataAdapter, DataReader, DataSet, and DataTable for consistent access to SQL data sources.
- Involved in J2EE design and Struts using MVC Architecture.
- ASP.NET was used to provide easier client-side validation, improved session management, and powerful data access management.
- TFS (Microsoft Team Foundation Server) was used for version control.
- Built the application using the MVC framework.
- Implemented logging and application error handling using MSDN libraries such as the Event Viewer and log files.
- Maintained a positive relationship with the client and gathered information.
- Used .NET (C#), JavaScript, jQuery, and MS SQL Server to maintain and enhance existing features and develop new ones.
Environment: C#.Net, MVC3, Entity Framework, LINQ, ADO.Net, Visual Studio 2010, Web Services, PL/SQL, Oracle 10g, WCF, XML, XSLT, JavaScript, jQuery, Ajax, SSIS, SSRS, SQL Server 2008, SharePoint, Agile