
Senior Big Data Engineer Resume


Ada, MI

SUMMARY

  • Over 8 years of technical IT experience in all phases of Software Development Life Cycle (SDLC) with skills in data analysis, design, development, testing and deployment of software systems.
  • Experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spring Boot, and Spark integration with Cassandra, Solr, and Zookeeper.
  • Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step Functions, CloudWatch, SNS, DynamoDB, and SQS.
  • Proficiency in multiple databases such as MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server. Worked on different file formats such as delimited files, Avro, JSON, and Parquet. Docker container orchestration using ECS, ALB, and Lambda.
  • Created snowflake schemas by normalizing the dimension tables as appropriate and creating a sub-dimension named Demographic as a subset of the Customer dimension.
  • Hands-on experience in test-driven development (TDD), behaviour-driven development (BDD), and acceptance-test-driven development (ATDD) approaches.
  • Managed databases and Azure data platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, and NoSQL DB), as well as SQL Server, Oracle, and data warehouses; built multiple data lakes.
  • Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools such as Tableau and Power BI.
  • Expertise in Java programming with a good understanding of OOP, I/O, collections, exception handling, lambda expressions, and annotations.
  • Provided full life cycle support to logical/physical database design, schema management and deployment. Adept at database deployment phase with strict configuration management and controlled coordination with different teams.
  • Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy. Experience in working on creating and running Docker images with multiple micro services.
  • Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
  • Extensive hands-on experience in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, and effective use of Azure SQL Database, MapReduce, Hive, SQL, and PySpark to solve big data problems.
  • Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, Kafka
  • Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX and Spark Streaming for processing and transforming complex data using in-memory computing capabilities written in Scala. Worked with Spark to improve efficiency of existing algorithms using Spark Context, Spark SQL, Spark MLlib, Data Frame, Pair RDD's and Spark YARN.
  • Able to use Sqoop to migrate data between RDBMS, NoSQL databases and HDFS.
  • Experience in Extraction, Transformation and Loading (ETL) data from various sources into Data Warehouses, as well as data processing like collecting, aggregating and moving data from various sources using Apache Flume, Kafka, Power BI and Microsoft SSIS.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
  • Expertise in designing complex mappings, performance tuning, and slowly changing dimension and fact tables.
  • Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
  • Experienced in building automated regression scripts for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python.
  • Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
  • Excellent communication skills; work successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.
  • Developed Spark applications in Python (PySpark) on distributed environments to load large numbers of CSV files with differing schemas into Hive ORC tables (a sketch of this pattern follows this list).
  • Good knowledge of Data Marts, OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.
  • Ability to work effectively in cross-functional team environments, excellent communication, and interpersonal skills.
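
A minimal PySpark sketch of the CSV-to-Hive-ORC loading pattern mentioned above; the landing path, database, and table names are illustrative assumptions, and differing file schemas are reconciled by aligning each file's columns to the target table:

    # Hedged sketch: load CSV files with differing schemas into an existing Hive ORC table.
    # The landing path and the analytics.learner_events table are assumed placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lit

    spark = (SparkSession.builder
             .appName("csv_to_hive_orc")
             .enableHiveSupport()
             .getOrCreate())

    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///landing/csv/"))                      # assumed landing directory

    target = spark.table("analytics.learner_events").schema  # assumed ORC target table
    for field in target.fields:
        if field.name not in df.columns:                     # add any columns a file is missing
            df = df.withColumn(field.name, lit(None).cast(field.dataType))

    (df.select([f.name for f in target.fields])              # align column order to the table
       .write
       .mode("append")
       .insertInto("analytics.learner_events"))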

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR

Machine Learning Classification Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayes Classifier, Extra Trees Classifier, Stochastic Gradient Descent, etc.

Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)

IDE’s: IntelliJ, Eclipse, Spyder, Jupyter

Ensemble and Stacking: Averaged Ensembles, Weighted Averaging, Base Learning, Meta Learning, Majority Voting, Stacked Ensemble, Auto ML - Scikit-Learn, MLjar, etc.

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, HBASE

Programming / Query Languages: Java, SQL, Python Programming (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SAS, R Programming (Caret, Glmnet, XGBoost, rpart, ggplot2, sqldf), RStudio, PL/SQL, Linux shell scripts, Scala.

Data Engineer/Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, Zookeeper, AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, GCP, Google Cloud Shell, Linux, PuTTY, Bash Shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design.

PROFESSIONAL EXPERIENCE

Confidential, Ada, MI

Senior Big Data Engineer

Responsibilities:

  • Installing, configuring and maintaining Data Pipelines
  • Transforming business problems into Big Data solutions and defining Big Data strategy and Roadmap.
  • Designing the business requirement collection approach based on the project scope and SDLC methodology.
  • Extracting files from Hadoop and dropping them into S3 on a daily or hourly basis.
  • Authoring Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labelling, and all cleaning and conforming tasks (see the first sketch after this list).
  • Writing Pig Scripts to generate MapReduce jobs and performing ETL procedures on the data in HDFS.
  • Develop solutions to leverage ETL tools and identify opportunities for process improvements using Informatica and Python
  • Conduct root cause analysis and resolve production problems and data issues
  • Performance tuning, code promotion and testing of application changes
  • Conduct performance analysis and optimize data processes. Make recommendations for continuous improvement of the data processing environment.
  • Developed a data platform from scratch and took part in requirement gathering and analysis phase of the project in documenting the business requirements.
  • Design and implement multiple ETL solutions with various data sources through extensive SQL scripting, ETL tools, Python, shell scripting, and scheduling tools. Data profiling and data wrangling of XML, web feeds, and file handling using Python, UNIX, and SQL.
  • Developing Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats, using Kafka and integrating Spark Streaming. Developed data analysis tools using SQL and Python code.
  • Loading data from different sources to a data warehouse to perform some data aggregations for business Intelligence using python.
  • Designed and implemented Sqoop for the incremental job to read data from DB2 and load to hive tables and connected to Tableau for generating interactive reports using Hive server2.
  • Used Sqoop to channel data between RDBMS sources and HDFS.
  • Created Spark code to process streaming data from the Kafka cluster and load it into a staging area for processing.
  • Created data pipelines used for business reports and processed streaming data using Kafka on an on-premises cluster.
  • Processed data from Kafka topics and surfaced the real-time streams in dashboards.
  • Developed Spark applications using Pyspark and Spark-SQL for data extraction, transformation and aggregation from multiple file formats.
  • Used SSIS to build automated multi-dimensional cubes.
  • Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python and NoSQL databases such as HBase and Cassandra (see the second sketch after this list).
  • Collected data using Spark Streaming from AWS S3 bucket in near-real-time and performed necessary Transformations and Aggregation on the fly to build the common learner data model and persisting the data in HDFS.
  • Prepared and uploaded SSRS reports. Managed database and SSRS permissions.
  • Used Apache NiFi to copy data from the local file system to HDP. Good knowledge of various modules of AML including Watch List Filtering, Suspicious Activity Monitoring, CTR, CDD, and EDD.
  • Used SQL Server Management Tool to check data in the database.
  • Validated test data in DB2 tables on Mainframes and on Teradata using SQL queries.
  • Identified and documented Functional/Non-Functional and other related business decisions for implementing Actimize-SAM to comply with AML Regulations.
  • Automated and scheduled recurring reporting processes using UNIX shell scripting and Teradata utilities such as MultiLoad, BTEQ, and FastLoad.
  • Implemented Actimize Anti-Money Laundering (AML) system to monitor suspicious transactions and enhance regulatory compliance.
  • Worked on Dimensional and Relational Data Modelling using Star and Snowflake Schemas, OLTP/OLAP system, Conceptual, Logical and Physical data modelling using Erwin.
  • Used Oozie to automate data loading into the Hadoop Distributed File System.
  • Developed automated regression scripts for validation of ETL processes between multiple databases such as AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL) using Python.
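
A short, hypothetical sketch of the kind of PySpark cleaning UDF described in the bullets above; the column names, normalization rule, and S3 paths are illustrative assumptions only:

    # Hedged sketch of a PySpark cleaning/conforming UDF; names and paths are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("cleaning_udfs").getOrCreate()

    @udf(returnType=StringType())
    def normalize_phone(raw):
        """Keep digits only and return the last 10, or None if the value is unusable."""
        if raw is None:
            return None
        digits = "".join(ch for ch in raw if ch.isdigit())
        return digits[-10:] if len(digits) >= 10 else None

    customers = spark.read.parquet("s3://example-bucket/raw/customers/")   # assumed input
    conformed = customers.withColumn("phone", normalize_phone(col("phone_raw")))
    conformed.write.mode("overwrite").parquet("s3://example-bucket/conformed/customers/")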
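
And a hedged sketch of the Kafka-to-HDFS staging flow, written here with Structured Streaming for brevity; the broker addresses, topic name, and output paths are placeholders:

    # Hedged sketch: consume a Kafka topic and persist it to an HDFS staging area.
    # Brokers, topic, and paths are assumed values; requires the spark-sql-kafka package.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka_to_hdfs_staging").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # assumed brokers
              .option("subscribe", "learner-events")                           # assumed topic
              .option("startingOffsets", "latest")
              .load()
              .select(col("key").cast("string"), col("value").cast("string")))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///staging/learner_events/")                # assumed staging path
             .option("checkpointLocation", "hdfs:///checkpoints/learner_events/")
             .outputMode("append")
             .start())

    query.awaitTermination()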

Environment: Cloudera Manager (CDH5), Hadoop, PySpark, HDFS, NiFi, Pig, Hive, S3, Kafka, Scrum, Git, Sqoop, Oozie, Informatica, Tableau, OLTP, OLAP, HBase, Scala, Cassandra, SQL Server, Python, Shell Scripting, XML, Unix.

Confidential, Jersey City, NJ

Senior Big Data Engineer

Responsibilities:

  • Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines
  • Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines (a DAG sketch follows this list)
  • Performed data extraction, transformation, loading, and integration in data warehouse, operational data stores and master data management
  • Strong understanding of AWS components such as EC2 and S3
  • Responsible for data services and data movement infrastructures
  • Good understanding of ETL concepts, building ETL solutions and Data modeling
  • Worked on architecting the ETL transformation layers and writing spark jobs to do the processing.
  • Designed & built infrastructure for the Google Cloud environment from scratch
  • Experienced in fact dimensional modeling (Star schema, Snowflake schema), transactional modeling and SCD (Slowly changing dimension)
  • Leveraged cloud and GPU computing technologies for automated machine learning and analytics pipelines, such as AWS, GCP
  • Worked with Confluence and Jira
  • Designed and implemented configurable data delivery pipeline for scheduled updates to customer facing data stores built with Python
  • Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regressors) and Statistical Modeling
  • Compiled data from various sources to perform complex analysis for actionable results
  • Experience in working with different join patterns and implemented both Map and Reduce Side Joins.
  • Data visualization: Pentaho, Tableau, D3. Knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Big data analysis using techniques and tools such as Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib, Scala, NumPy, SciPy, Pandas, and Scikit-learn.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python, and used the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively; developed and designed POCs using Scala, Spark SQL, and MLlib libraries.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS.
  • Used Flume and a spool directory for loading data from the local file system (LFS) to HDFS.
  • Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
  • Created Partitioned Hive tables and worked on them using Hive QL.
  • Worked on the continuous integration tool Jenkins and automated JAR file builds.
  • Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Experience in setting up the whole app stack, and in setting up and debugging Logstash to send Apache logs to AWS Elasticsearch.
  • Implemented Spark scripts using Scala and Spark SQL to read Hive tables into Spark for faster processing of data (a PySpark equivalent is sketched after this list).
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Tested Apache Tez for building high performance batch and interactive data processing applications on Pig and Hive jobs.
  • Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met
  • Implemented a Continuous Delivery pipeline with Docker, GitHub, and AWS
  • Participated in the full software development lifecycle with requirements, solution design, development, QA implementation, and product support using Scrum and other Agile methodologies
  • Collaborated with team members and stakeholders to design and develop the data environment
  • Prepared associated documentation for specifications, requirements, and testing
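
A minimal Apache Airflow DAG sketch of the scheduling pattern described at the top of this list; the task callables, IDs, and schedule are illustrative assumptions rather than the project's actual pipelines:

    # Hedged Airflow 2.x sketch of a daily ETL DAG; task logic is intentionally stubbed out.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        pass  # placeholder: pull data from the source systems

    def transform():
        pass  # placeholder: apply business rules and conform the data

    def load():
        pass  # placeholder: write the result to the warehouse

    with DAG(
        dag_id="daily_etl_pipeline",                       # assumed DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_transform >> t_load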
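
The project's Spark scripts were written in Scala; as an illustration only, here is an equivalent PySpark sketch of reading a Hive table through Spark SQL, with an assumed database, table, and aggregation:

    # Hedged PySpark equivalent of the Scala Spark SQL scripts; table names are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive_to_spark")
             .enableHiveSupport()
             .getOrCreate())

    daily_totals = spark.sql("""
        SELECT   trade_date, symbol, SUM(quantity) AS total_qty
        FROM     trading.executions              -- assumed Hive table
        GROUP BY trade_date, symbol
    """)

    daily_totals.write.mode("overwrite").saveAsTable("trading.daily_totals")   # assumed target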

Environment: Hadoop, Hive, AWS, GCP, BigQuery, HBase, Scala, Flume, Apache Tez, Cloud Shell, Azure Databricks, Docker, Jira, MySQL, Postgres, SQL Server, Python, Spark, Spark SQL

Confidential, Chicago, IL

Big Data Developer

Responsibilities:

  • As a Big Data Developer, worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
  • Responsibilities include gathering business requirements, developing strategy for data cleansing and data migration, writing functional and technical specifications, creating source to target mapping, designing data profiling and data validation jobs in Informatica, and creating ETL jobs in Informatica.
  • Involved in various phases of development; analyzed and developed the system following Agile Scrum methodology.
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive.
  • Worked on Hadoop cluster which ranged from 4-8 nodes during pre-production stage and it was sometimes extended up to 24 nodes during production.
  • Built APIs that will allow customer service representatives to access the data and answer queries.
  • Designed changes to transform current Hadoop jobs to HBase.
  • Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
  • Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
  • Extending the functionality of Hive with custom UDFs and UDAFs.
  • The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports and established self-service reporting model in Cognos for business users.
  • Implemented Bucketing and Partitioning using hive to assist the users with data analysis.
  • Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (a partitioning sketch follows this list).
  • Develop database management systems for easy access, storage, and retrieval of data.
  • Perform DB activities such as indexing, performance tuning, and backup and restore.
  • Expertise in writing Hadoop Jobs for analysing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
  • Performed various optimizations such as using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Expert in creating Hive UDFs using Java to analyse the data efficiently.
  • Responsible for loading the data from BDW Oracle database, Teradata into HDFS using Sqoop.
  • Implemented AJAX, JSON, and JavaScript to create interactive web screens.
  • Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB. Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analysed them by running Hive queries. Processed image data through the Hadoop distributed system using Map and Reduce, then stored it in HDFS.
  • Created Session Beans and controller Servlets for handling HTTP requests from Talend
  • Performed data visualization and designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
  • Wrote documentation for each report including purpose, data source, column mapping, transformation, and user group.
  • Utilized Waterfall methodology for team and project management
  • Used Git for version control with the Data Engineer team and Data Scientist colleagues. Created Tableau dashboards using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc. with the Show Me functionality, and built dashboards and stories as needed using Tableau Desktop and Tableau Server.
  • Performed statistical analysis using SQL, Python, R Programming and Excel.
  • Worked extensively with Excel VBA Macros, Microsoft Access Forms
  • Imported, cleaned, filtered, and analysed data using tools such as SQL, Hive, and Pig.
  • Used Python and SAS to extract, transform, and load source data from transaction systems and generated reports, insights, and key conclusions.
  • Developed Spark jobs using Scala for faster real-time analytics and used Spark SQL for querying.
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, which allowed end users to understand the data on the fly with quick filters for on-demand information.
  • Analysed and recommended improvements for better data consistency and efficiency
  • Designed and developed data mapping procedures (ETL: data extraction, data analysis, and loading processes) for integrating data using R programming.
  • Effectively communicated plans, project status, project risks, and project metrics to the project team, and planned test strategies in accordance with the project scope.
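
A hedged sketch of the Hive partitioning pattern referenced above, issued here through Spark SQL; the database, table, and column names are placeholders (bucketing would add a CLUSTERED BY clause to the DDL):

    # Hedged sketch: partitioned Hive table plus a dynamic-partition insert, via Spark SQL.
    # Database, table, and column names are assumed placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive_partitioning")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE TABLE IF NOT EXISTS bdw.sales_events (
            order_id    STRING,
            customer_id STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (event_date STRING)
        STORED AS ORC
    """)

    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE bdw.sales_events PARTITION (event_date)
        SELECT order_id, customer_id, amount, event_date
        FROM   bdw.sales_events_staging            -- assumed staging table
    """)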

Environment: Cloudera CDH4.3, Hadoop, Pig, Hive, Informatica, HBase, MapReduce, HDFS, Sqoop, Impala, SQL, Tableau, Python, SAS, Flume, Scala, JavaScript, Oozie, Linux, NoSQL, MongoDB, Talend, Git.

Confidential, Bentonville, AR

Hadoop/ Spark Developer

Responsibilities:

  • Experience in Big Data Analytics and design in Hadoop ecosystem using MapReduce Programming, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, Kafka
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive.
  • Responsible for importing data to HDFS using Sqoop from different RDBMS servers and exporting data using Sqoop to the RDBMS servers after aggregations for other ETL operations.
  • Experience in designing and developing applications in PySpark using python to compare the performance of Spark with Hive.
  • Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.
  • Created/modified shell scripts for scheduling various data cleansing scripts and ETL load processes.
  • Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
  • Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.
  • Built the Oozie pipeline, which performs several actions: moving files, Sqooping data from the source Teradata or SQL systems, exporting it into the Hive staging tables, performing aggregations per business requirements, and loading into the main tables.
  • Ran Apache Hadoop, CDH, and MapR distributions, dubbed Elastic MapReduce (EMR), on EC2.
  • Performed forking wherever there was scope for parallel processing to optimize data latency.
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Wrote a Pig script that picks data from one HDFS path, performs aggregation, and loads it into another path, which later populates another domain table; converted this script into a JAR and passed it as a parameter in the Oozie script.
  • Developed JSON scripts for deploying the pipeline in Azure Data Factory (ADF) that processes the data using the SQL Activity. Built an ETL that executes the business analytical model through a Spark JAR.
  • Hands-on experience with Git Bash commands such as git pull to pull code from the source and develop against requirements, git add to stage files, git commit after the code build, and git push to the pre-prod environment for code review; later used screwdriver.yaml, which builds the code and generates artifacts that are released into production.
  • Created a logical data model from the conceptual model and converted it into the physical database design using Erwin. Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
  • Connected to AWS Redshift through Tableau to extract live data for real time analysis.
  • Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
  • Involved in creating UNIX shell scripts; performed defragmentation of tables, partitioning, compression, and indexing for improved performance and efficiency.
  • Developed reusable objects like PL/SQL program units and libraries, database procedures and functions, database triggers to be used by the team and satisfying the business rules.
  • Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources
  • Developed and implemented an R and Shiny application showcasing machine learning for business forecasting. Developed predictive models using Python and R to predict customer churn and classify customers.
  • Partner with infrastructure and platform teams to configure, tune tools, automate tasks and guide the evolution of internal big data ecosystem; serve as a bridge between data scientists and infrastructure/platform teams.
  • Implemented Big Data Analytics and Advanced Data Science techniques to identify trends, patterns, and discrepancies on petabytes of data by using Azure Databricks, Hive, Hadoop, Python, PySpark, Spark SQL, MapReduce, and Azure Machine Learning.
  • Performed data analysis using regressions, data cleaning, Excel VLOOKUP, histograms, and the TOAD client, and presented the analysis and suggested solutions to investors.
  • Rapid model creation in Python using pandas, NumPy, scikit-learn, and Plotly for data visualization (a compressed sketch follows this list). These models were then implemented in SAS, where they interfaced with MS SQL databases and were scheduled to update on a timely basis.
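
A compressed, hypothetical sketch of the pandas/scikit-learn modeling flow mentioned in the bullets above; the feature file and column names are placeholders:

    # Hedged sketch of rapid model creation with pandas and scikit-learn; inputs are placeholders.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score

    df = pd.read_csv("churn_features.csv")                 # assumed extract from the warehouse
    X = df.drop(columns=["customer_id", "churned"])        # assumed id/label columns
    y = df["churned"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    model = RandomForestClassifier(n_estimators=300, random_state=42)
    model.fit(X_train, y_train)

    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"Hold-out AUC: {auc:.3f}")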

Environment: MapReduce, Hadoop, Spark, Hive, Pig, Sqoop, HBase, Oozie, Impala, Kafka, JSON, XML, PL/SQL, SQL, HDFS, UNIX, Python, Scala, SAS, PySpark, Redshift, Azure, Shell Scripting.

Confidential

Data & Reporting Analyst

Responsibilities:

  • Imported Legacy data from SQL Server and Teradata into Amazon S3.
  • Created consumption views on top of metrics to reduce the running time for complex queries.
  • Exported data into Snowflake by creating staging tables to load data files of different types from Amazon S3 (a loading sketch follows this list).
  • Compared the data at a leaf level across various databases when data transformation or data loading took place, and analyzed data quality for these loads (checking for any data loss or data corruption).
  • End-to-end development of Actimize models for the trading compliance solutions of the project bank.
  • As part of data migration, wrote many SQL scripts to find data mismatches and worked on loading the history data from Teradata SQL to Snowflake.
  • Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e. name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project
  • Worked on retrieving data from the file system to S3 using Spark commands
  • Built S3 buckets and managed policies for S3 buckets and used S3 bucket and Glacier for storage and backup on AWS
  • Created performance dashboards in Tableau, Excel, and PowerPoint for the key stakeholders
  • Incorporated predictive modelling (rule engine) to evaluate the Customer/Seller health score using python scripts, performed computations and integrated with the Tableau viz.
  • Designed, developed and maintained Big Data streaming and batch applications using Storm.
  • Worked with stakeholders to communicate campaign results, strategy, issues or needs.
  • Analysed marketing campaigns from various perspectives including CTR, conversion rates, seasonal/geographical trends, search queries, landing page, conversion funnel, quality score, competitors, distribution channel, etc. to achieve maximum ROI for clients.
  • Worked with the business to identify gaps in mobile tracking and came up with solutions to address them.
  • Analysed click events on the Hybrid landing page, including bounce rate, conversion rate, jump-back rate, list/gallery view, etc., and provided valuable information for landing page optimization.
  • Evaluated the traffic and performance of Daily Deals PLA ads and compared those items with non-Daily-Deal items to assess the possibility of increasing ROI; suggested improvements and modified existing BI components (reports, stored procedures)
  • Understood business requirements in depth and came up with a test strategy based on business rules
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala, and gained good experience using Spark Shell and Spark Streaming.
  • Prepared Test Plan to ensure QA and Development phases are in parallel
  • Written and executed Test Cases and reviewed with Business & Development Teams.
  • Implemented Defect Tracking process using JIRA tool by assigning bugs to Development Team
  • Used the automated regression tool (Qute), reducing manual effort and increasing team productivity
  • Involved in functional testing, integration testing, regression testing, smoke testing, and performance testing. Tested Hadoop MapReduce jobs developed in Python, Pig, and Hive
  • Created Metric tables, End user views in Snowflake to feed data for Tableau refresh.
  • Generated Custom SQL to verify the dependency for the daily, Weekly, Monthly jobs.
  • Using Nebula Metadata, registered Business and Technical Datasets for corresponding SQL scripts
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Closely involved in scheduling Daily, Monthly jobs with Precondition/Post condition based on the requirement.
  • Monitor the Daily, Weekly, Monthly jobs and provide support in case of failures/issues.
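
A hedged sketch of staging S3 files into Snowflake with COPY INTO through the snowflake-connector-python client; the connection parameters, external stage, and table are placeholders, and the sketch assumes the stage already points at the S3 bucket:

    # Hedged sketch: COPY INTO a Snowflake staging table from an existing external S3 stage.
    # Credentials, warehouse, database, stage, and table names are assumed placeholders.
    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse="LOAD_WH",
        database="ANALYTICS",
        schema="STAGING",
    )

    try:
        cur = conn.cursor()
        cur.execute("""
            COPY INTO STAGING.ORDERS
            FROM @S3_LANDING/orders/
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
            ON_ERROR = 'ABORT_STATEMENT'
        """)
    finally:
        conn.close()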

Environment: Hadoop, MapReduce, AWS, Snowflake, AWS S3, Scala, Kafka, GitHub, ServiceNow, HP Service Manager, Jira, EMR, Nebula, Teradata, SQL Server, Apache Spark, Sqoop.
