
Big Data Hadoop Developer Resume


SUMMARY

  • 12+ years of experience working as a Lead Architect/Developer Engineer across AWS, Azure, GCP (GCP Certified), and Oracle Cloud Services: Web App, WebLogic, S3, Blob Storage, Snowflake DW, Redshift, EMR, Kinesis, EC2, Lambda, Big Table, HBase, Azure Cosmos DB, Event Hubs, Databricks, Hadoop, Hive, Impala, Spark/PySpark, Informatica (PowerCenter, BDQ, IDQ, BDM, PIM, MDM), MS Dynamics CRM, IBM DataStage, MS SQL Server, Oracle, PostgreSQL, and UNIX/Linux; DevOps and automation with Terraform, Jenkins, Docker, Chef, Ansible, Kubernetes, and CI/CD; Palo Alto network security; data integration and migration across IaaS, PaaS, and SaaS; dashboards and analytics with Tableau, Looker, Octave, Matlab, SAS, Web Services, and Machine Learning; SalesForce development and IoT; client/server configuration and troubleshooting, with Spiceworks and Netbox for solutions and monitoring; supporting the hosted PIM instance and creating PIM instances
  • Data loading using OOTB capability into PIM for the required capabilities; hands-on with OOTB APIs from Informatica PIM
  • PIM functionality as a data store for multiple product data types; hands-on with OOTB APIs from Informatica PowerCenter, BDQ, IDQ, BDM, PIM, and MDM; data syndication capability, UI capabilities, and fetching product data
  • PIM Repository changes, including adding new fields, enumerations, import/export field parameters, categories, sub-entities, etc.; work on items and variants
  • Structure groups and features; article attributes; data merge from supplier catalogs to the master catalog; creation of DQ rules; advanced SQL queries for interfaces or reporting purposes. Experienced with the two major PIM/IDQ components, IDQ Analyst and IDQ Developer, and the series of IDQ Options. Hands-on experience with the PIM/IDQ workbench for developing global, reusable data quality rules, and strong experience in data profiling using IDQ Analyst. Good understanding of the Supplier Portal, dashboards in PIM, Media Manager, and solutions and monitoring using Netbox and GLPI; plugin customizations using the SDK, REST services, etc.; supporting upgrades to the tools and technology
  • Developed in Pentaho, Informatica, and Talend for Big Data projects, dashboard designs, and data visualizations. Worked with Pentaho (PDI) Kettle, Informatica IDQ, PowerCenter, MDM, data management, Electronic Data Interchange (EDI) data integration/quality, and data governance
  • AWS Analytics services, Azure, DevOps with Terraform services, GCP, Big Table, Directory, scripting, automation, Data Lake, S3, Blob Storage, Salesforce, Azure Data Factory, ESB (Enterprise Service Bus), Logic Apps, API, EMR, Hive, Sqoop
  • Managed Error Handling, Performance Tuning, Error Logging, clustering and High Availability
  • Worked with Pentaho, DataStage, Informatica, and Talend Data Fabric ETL; data warehouse concepts including the Kimball methodology, star and snowflake schemas, and fact/dimension tables; Pentaho, Informatica, and Talend/Informatica Integration Cloud Services; IaaS, PaaS, SaaS.
  • Developed complex ETL SSIS jobs from various sources such as SQL Server, PostgreSQL, and flat files, and loaded them into target databases using the Pentaho, Informatica, and Talend Open Studio ETL tools. Created Big Data Hadoop/Talend dashboards
  • Designed and coded ETL jobs in Pentaho, Informatica, Tableau, and Talend to process data into target databases. Used MS SQL Server and the Pentaho, Informatica, and Talend Admin Console Job Conductor to schedule ETL jobs on daily, weekly, monthly, and yearly cadences
  • Generated Data Quality Dashboards, KPI reports, Looker development, Dynamics CRM, Electronic Data Interchange (EDI), ETL, SSIS, SSAS, SSRS, dashboard data visualizations, SQL, databases, data management, data warehousing, and data governance concepts. Worked with Hadoop and Talend to architect, design, and build Big Data solutions on Hive/Hadoop; created dashboards and data visualizations
  • Developed in Pentaho, Informatica, and Talend for dashboards, data visualizations, and Big Data Spark projects; worked as a Tableau engineer on the Pentaho/BI/Reporting platform to enable data access, analytic models, and data visualization.

TECHNICAL SKILLS

  • MS SQL Server Database Administrator/Developer
  • BI
  • ETL
  • Pentaho
  • Informatica developer IDQ 9.1/9.5.1
  • Informatica Power Mart 6.x/5.x
  • Power Exchange
  • SQL Server Configuration
  • Looker
  • Mode
  • Replication
  • Virtualization
  • Visual Studio
  • C#
  • .Net
  • C++
  • VB
  • SDLC
  • Data Migration S3 to Hadoop/HIVE
  • HIVE to S3
  • Healthcare
  • HEDIS
  • HIPAA
  • PHI
  • ESB(Enterprise Service Bus)
  • Logic Apps
  • API SharePoint
  • PowerShell scripts
  • REST API
  • Open Data Protocol
  • MS Excel
  • Access
  • Visio
  • Oracle
  • Java
  • Cogito
  • Star
  • Radar
  • Workbench
  • AWS
  • Redshift
  • Data Lake
  • Azure Data Factory
  • IaaS
  • SaaS
  • PaaS
  • Databricks
  • S3
  • PowerBI
  • data migration
  • on-prem to Azure
  • AWS
  • GCP
  • Cloud
  • Data Cloud Architect
  • Azure
  • Ansible
  • Jenkins
  • Docker
  • Kubernetes
  • DevOps
  • Automation
  • ECS
  • EC2
  • ECR
  • Lambda
  • VPC
  • S3
  • IoT
  • HTTP
  • REST
  • JSON and IP technologies
  • CI/CD
  • JavaScript
  • IBM DataStage SMS
  • SSRS
  • SSAS
  • SAAS
  • SSIS
  • ETL
  • SalesForce Developer
  • Electronic Data Interchange (EDI)
  • Crystal Reports
  • NoSQL
  • PLSQL
  • MySQL
  • T-SQL
  • SQL queries
  • Transact-SQL
  • SQL Server architecture
  • Data Science
  • Data Analysis
  • data mining
  • Data Warehouse
  • Business Intelligence
  • Statistical analysis
  • concatenations
  • pivot tables
  • Table partitioning and archiving
  • Data cubes
  • Data marts
  • IaaS
  • PaaS
  • SaaS
  • Cloud Computing
  • Big Data
  • Optimize Stored Procedures
  • Indexing
  • Consistency Checks
  • performance tuning; SQL Server log shipping
  • SQL replication
  • scripting
  • Fine tune database
  • Function and trigger design and coding
  • Index implementation and maintenance
  • IBM DataStage Clustering
  • Indexing
  • Random Forest
  • Machine Learning
  • Python
  • PySpark
  • PowerPivot
  • PowerView
  • Matlab/R
  • Ruby
  • Ruby on Rails
  • Agile
  • Waterfall
  • E-Commerce
  • Hadoop
  • SSAS OLAP cubes
  • Tableau
  • Pentaho
  • Hadoop
  • PIG
  • Hive
  • Spark
  • Oracle
  • Tableau Architect
  • Looker
  • PHP
  • SQL Server Integration and Analytics Services
  • Tableau
  • Looker
  • Data cubes
  • Data Science
  • Data Analysis
  • Data Warehouse Architect
  • mapping
  • E-Commerce
  • Looker Developer
  • Hadoop
  • Big Data
  • MapReduce
  • Allscripts
  • R
  • HBase
  • Data modeling
  • HR Analytics
  • Data Integration architecture
  • OLTP
  • OLAP
  • database design
  • performance tuning and security model implementations
  • BI and analytic tools
  • Business Objects
  • QlikView
  • Tableau
  • COGNOS
  • Agile
  • Waterfall
  • Scrum development methodology
  • Web Services
  • Hyperion
  • OBIEE
  • Informatica
  • PowerCenter
  • BDQ
  • IDQ
  • BDM
  • PIM
  • MDM
  • Healthcare
  • HIPAA
  • X12 EDI
  • Healthcare 835/837 Formats
  • PHI
  • HEDIS
  • Cerner
  • Med Epic Tapestry
  • Epic Caché
  • FACETS
  • Epic Clarity
  • HIM
  • Meditech
  • TriZetto Reporting
  • Eclipsys
  • Allscripts
  • Cerner
  • Siemens and McKesson EMR
  • Epic systems
  • Epic Beaker/Labs
  • HIPAA
  • EDI
  • Revenue Cycle
  • HEDIS
  • SOX
  • Compliance

PROFESSIONAL EXPERIENCE

Sr. Lead ETL Pentaho, Informatica, SSIS, Big Data Hadoop Developer

Confidential

Hands on Tools: Pentaho, Informatica, Azure, Azure Data Factory, Talend 5.6.3, Spark 1.6, SQL Server 2014, BO 6.5, XiR2, Business Objects 3.1 and 4.0, 4.1, Microstrategy 9.4, Tableau 10.x, Hadoop 2.6.5, IBM DataStage, IDQ 9.6.1.

Project Environment: Spark API, Cloudera Hadoop YARN, Spark 1.6, Data Aggregation, Data frames, SQL

Responsibilities:

  • Configuration, troubleshooting, and supporting the hosted PIM instance; creating a PIM instance and setting up the Informatica PIM instance with user access
  • Data loading using OOTB capability into PIM for the required capabilities, Hands on OOTB APIs from Informatica PIM, Data syndication capability, UI capabilities and fetching product data
  • Experienced working on Informatica MDM, UI, Workflows, Models, Hub, Data Quality, Modelling, Architecting complex multi-domain scalable MDM implementations using Informatica MDM 9.x, 10.x Stack, EBX5 and Stibo.
  • Experience working with Informatica MDM 10.2+, Informatica 360, Product 360
  • Extensive experience designing scalable solutions for real-time integration with a downstream application involving large volumes of data for MDM projects.
  • Used PIM functionality as a data store for multiple product data types
  • PIM Repository changes, including adding new fields, enumerations, import/export field parameters, categories, sub-entities, etc. Worked on items, variants, products, structures and relationships; structure groups and features; article attributes; data merge from supplier catalogs to the master catalog; creation of DQ rules; advanced SQL queries for interfaces or reporting purposes.
  • Experienced with the two major PIM/IDQ components, IDQ Analyst and IDQ Developer, and the series of IDQ Options. Hands-on experience with the PIM/IDQ workbench for developing global, reusable data quality rules, and strong experience in data profiling using IDQ Analyst. Good understanding of the Supplier Portal, dashboards in PIM, Media Manager, and plugin customizations using the SDK, REST services, etc. Supported upgrades to the tools and technology being used in the project.
  • Experienced leveraging third party Cloud Service Providers (IaaS, PaaS, SaaS).
  • Experienced as an established hands-on technical leader of Cloud Software Engineering teams.
  • Experienced working with Agile development and methodologies and SaaS
  • Experienced working in building and delivering Cloud Services and Cloud Automation solutions.
  • Worked on hosting Cloud services and translated business requirements into securely implemented capabilities in the Cloud.
  • Worked in S3-to-Hive data migration, data warehousing, and a high-performance enterprise computing environment.
  • Worked as a Looker Developer and an Enterprise Data Warehouse Architect to design an Enterprise Data Warehouse as a single-source repository for all financial, sales, and marketing data, including dashboards/ad-hoc reports for clients/stakeholders.
  • Provided reports requiring information for improved Health Care reporting
  • Worked in ESB (Enterprise Service Bus), Logic Apps, and API; reviewed and documented the existing SQL database design and proposed and implemented an architecture to migrate existing data from the data repository to an enterprise data warehouse.
  • Suggested design and action plans to review the existing platform and document the baseline of existing databases;
  • Designed an enterprise data warehouse that is scalable and accommodates the needs of the business.
  • Documented the proposed data warehouse and database architecture, workflows, ERDs, Data Dictionary (DD), S3-to-Hive and Hadoop-to-S3 data migrations, and related artifacts
  • Documented stored procedures in the existing data marts;
  • Performed Extract/Transform/Load (ETL) the business data into data cubes and data warehouse;
  • Designed a data warehouse to ingest data from different data sources and scalable for future requirements.
  • Recommended and configured standard end user business Intelligence applications in developing reports and extracts, ad hoc queries and dashboards, Looker, Tableau, LookUp, Pentaho, Talend.
  • Worked in Designing Enterprise Data Warehouses;
  • Worked with SalesForce Development, SQL Server Business Intelligence Architecture (BIA) top-down and bottom-up approach
  • Developed and designed Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) databases
  • Worked in ESB (Enterprise Service Bus), Logic Apps, and API; designed and implemented data extracts using Replication, Stored Procedures, and SQL Server Integration Services (SSIS);
  • Created procedures and policies for data warehouses; Installing, maintaining, administering Microsoft SQL Server;
  • Worked in creating requirements, documents, systems architecture and interfaces, data models, configuration management documents, and procedural manuals
  • Worked in Maintaining database structures to support the Software Development Life Cycle (SDLC) phases
  • Worked in developing and designing GCP, HBase, Bigtable, BigQuery, and Microsoft Azure SQL Data Warehouses; developed and maintained SQL Server 2016 / SQL Server Analysis Services (SSAS) / SSIS / SSRS.
  • Worked with Pentaho, ETL, SSIS, Talend Open Studio & Talend Enterprise platform for data management
  • Managed Error Handling, Performance Tuning, Error Logging, clustering and High Availability in Talend
  • Worked with SalesForce, Informatica Cloud Data Integration to deliver accessible, trusted, and secured data to facilitate more valuable business decisions to identify competitive advantages to better service customers, and build an empowered workforce.
  • Worked with IBM DataStage, Azure Data Factory pipelines and other Azure Data Platform to orchestrate management tasks using Azure Automation,
  • Worked with Azure Data Factory (ADF) as a SaaS solution to compose and orchestrate Azure data services, in a Data Cloud Architect role spanning Azure, Ansible, Jenkins, Docker, Kubernetes, and DevOps/CI/CD automation; utilized Azure Data Factory to create, schedule, and manage data pipelines
  • Worked as a Data Engineer to help form a cloud based big data engineering team to deliver platform automation and security.
  • Built data workflows by using AWS EMR, Spark, Spark SQL, Scala, and Python
  • Configured and installed tools with a highly available architecture, created ETL SSIS packages, VB, C#
  • Designed, reviewed and fixed security vulnerabilities at network/subnet/security groups level
  • Created security standardized templates including password management strategy and implementation
  • Installed custom software and automated the installation process. Optimized the Redshift database for performance. Knowledge of Looker, S3-to-Hive data migration, SSIS, ETL, Electronic Data Interchange (EDI), AWS EMR, Spark, Scala, Hadoop, HortonWorks, S3, and Redshift
  • Expert level knowledge of IBM DataStage, Linux/Unix, PowerShell, network security architecture and database tuning
  • Strong communication and teaching skills to train teams, with proven problem-solving and critical-thinking skills
  • Used Informatica Cloud Data Integration for global, distributed data warehouse and analytics projects, as a Data Cloud Architect working with Azure, Ansible, Jenkins, Docker, Kubernetes, and DevOps/CI/CD automation.
  • Worked with cloud data warehouses AWS Redshift, Salesforce, Azure SQL Data Warehouse, and Snowflake and Informatica Cloud Data Integration solutions to augment the performance, productivity, and extensive connectivity to cloud and on-premises sources.
  • Developed required modifications of business logic in Data Mart and transition to Data Lake
  • Created Thematic Heat maps using MapInfo in Tableau
  • Worked with Azure Data Factory pipelines and other Azure Data Platform to orchestrate management tasks using Azure Automation
  • Worked with Looker and Azure Data Factory (ADF) as a SaaS solution to compose and orchestrate Azure data services.
  • Utilized Azure Data Factory to create, schedule, and manage data pipelines within a DevOps/CI/CD automation environment (Azure, Ansible, Jenkins, Docker, Kubernetes)
  • Worked with Informatica Cloud for its flexible and scalable transformations and advanced capabilities to seamlessly integrate growing data volumes across disparate sources in the DWH using wizards, preconfigured templates, Electronic Data Interchange (EDI), and out-of-the-box mappings; developed dashboard reports using Tableau
  • Experienced with Tableau and PWX Informatica CDC 10.X project implementations, Informatica BDM, PWX, B2B DT, DX
  • Worked with Oracle PL/SQL, Big Data, enterprise projects implementations
  • Worked with PWX Informatica CDC to stream data in real time, PWXCCL remote logger to apache Kafka distributed platform
  • Used Tableau, Looker, Pentaho, Hadoop, and Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra (see the streaming sketch after this list).
  • Worked with SalesForce Developer Tools, Azure Data Factory pipelines, and other Azure Data Platform services to orchestrate management tasks using Azure Automation, along with S3-to-Hive data migration
  • Worked with Azure Data Factory (ADF) as a SaaS solution to compose and orchestrate Azure data services.
  • Utilized Tableau, Looker, and Azure Data Factory to create, schedule, and manage data pipelines, including Electronic Data Interchange (EDI) feeds
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which gets the data from Kafka in near real time and persists it into Cassandra
  • Used Pentaho, Hadoop, Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system via Sqoop (see the PySpark sketch after this list). Worked with EMR provisioning and updates through the service catalog using CloudFormation.
  • Worked with SalesForce Developer Tools, Hadoop cluster set up, performance fine-tuning, monitoring and administration.
  • Worked with Big Data, HD Insight, Hadoop Eco System, Hive, MapReduce YARN, Tez, Presto, Beeline, Pig, Spark, Scala
  • Worked with AWS Cloud formation to create service catalog to launch EMR clusters with desired setup.
  • Worked with AWS EMR, Salesforce, Ranger, and Hadoop technologies
  • Performed performance tuning of applications, setting the right batch interval time and the correct level of parallelism.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked on ESB (Enterprise Service Bus), Logic Apps, API, and Scala scripts/UDFs using DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into OLTP
  • Configured deployed and maintained multi-node Dev and Test Kafka Clusters.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Optimized existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, and Pair RDDs.
  • Worked on migrating Map Reduce programs into Spark transformations using Spark and Scala.
  • Worked with Hadoop/Hive/Big data to architect, design and build solutions to create dashboards/Data Visualizations
  • Utilized Hadoop HiveQL (HQL) development and performance tuning on full-lifecycle implementations.
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Worked with data migration S3 to Hive, ETL Electronic Data Interchange (EDI) interfacing components of solution design and configuration activity
  • Worked with ECS, EC2, ECR, Lambda, VPC, S3, and IoT, and with HTTP, REST, JSON, and IP technologies
  • Performed cross system joins for identifying duplicates and data anomalies
  • Created IDQ Dashboards/KPI Metrics Dashboards for reporting using Hadoop/Talend
  • Performed ETL, Electronic Data Interchange (EDI), and resolved data quality issues with analysis
  • Created and managed policies in AWS Ranger for Hive.
  • Worked with EMR provisioning, updating through service catalog using Cloud formation.
  • Worked with Hadoop cluster set up, performance fine-tuning, monitoring and administration.
  • Worked with Hadoop Eco System, Hive, Salesforce, MapReduce YARN, Tez, Presto, Beeline, Pig, Spark, Scala
  • Worked with AWS Cloud formation to create service catalog to launch EMR clusters with desired setup.
  • Worked with AWS EMR, Ranger, and Hadoop technologies
  • Worked with Tableau, Pentaho, Salesforce, Informatica Power Center Development/IDQ Output
  • Developed, enhanced, supported, and integrated products and software solutions
  • Deployed, maintained, managed product and software solutions for various clients
  • Worked as a Tableau/Pentaho BI engineer on the Salesforce reporting platform to enable data access, analytic models, and visualization.
  • Worked with ESB(Enterprise Service Bus), Logic Apps, API, Data Management processes for Reporting and Analytics
  • Worked with Big Data technologies including Hadoop HDFS, MapReduce, Pig, Hbase, and Hive, Python, and SQL
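
The Spark-on-YARN analytics over Hive described above can be pictured with a minimal PySpark sketch using the Spark 1.6-era HiveContext API; the database, table, and column names are illustrative assumptions, not the actual project schema:

# Minimal sketch: Spark 1.6 on Cloudera/YARN reading a Hive table,
# aggregating with DataFrames, and writing the result back to Hive.
# Table and column names below are hypothetical.
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import functions as F

conf = SparkConf().setAppName("hive-claims-aggregation").setMaster("yarn-client")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)  # HiveContext exposes metastore tables to Spark SQL

# Read a Hive table registered in the metastore (hypothetical table)
claims = sqlContext.table("analytics.claims_837")

# DataFrame aggregation: claim counts and billed amounts per provider per month
summary = (claims
           .groupBy("provider_id", "claim_month")
           .agg(F.count("claim_id").alias("claim_count"),
                F.sum("billed_amount").alias("total_billed")))

# Persist the aggregate back to Hive for downstream dashboards (Tableau/Looker)
summary.write.mode("overwrite").saveAsTable("analytics.claims_summary")

sc.stop()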
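
The Kafka-to-Cassandra pipeline for the common learner data model is sketched minimally below with the Spark 1.6 direct-stream API. The broker address, topic, keyspace, table, and field names are assumptions, and the DataFrame write assumes the spark-cassandra-connector package is on the classpath; checkpointing and exactly-once handling are omitted for brevity.

# Minimal sketch: Kafka -> Spark Streaming (direct stream) -> Cassandra.
# Broker, topic, keyspace, table, and field names are hypothetical.
import json
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="learner-event-stream")
sqlContext = SQLContext(sc)
ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["learner-events"], {"metadata.broker.list": "kafka-broker:9092"})

def persist_batch(rdd):
    # Transform each micro-batch into the common learner data model and persist it
    if rdd.isEmpty():
        return
    rows = rdd.map(lambda kv: json.loads(kv[1])) \
              .map(lambda e: Row(learner_id=e["learner_id"],
                                 event_type=e["event_type"],
                                 event_ts=e["timestamp"]))
    (sqlContext.createDataFrame(rows)
        .write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="learning", table="learner_events")
        .mode("append")
        .save())

stream.foreachRDD(persist_batch)
ssc.start()
ssc.awaitTermination()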

Confidential, San Francisco, CA

Lead Big Data/Hadoop + Cloud (AWS/Azure) Azure Cloud Architect, Pentaho/Informatica/Talend Spark/Hive Hadoop Developer

Hands on Tools: Talend 5.4, Spark 1.6.0, SQL Server 2013, VB, C#, SQL, Oracle 11.2.0.4, ETL, Hadoop 2.4.1.

Project Environment: JSON, XML, IBM DataStage Spark API, Spark-SQL, Data Frames, SQL/Oracle, Talend OS ETL, SQL server, PostgreSQL

Responsibilities:

  • Experienced leveraging third party Cloud Service Providers (IaaS, PaaS, SaaS).
  • Experienced as an established hands-on technical leader of Cloud Software Engineering teams.
  • Hands on AWS experience with strong management skills
  • Experienced working with Agile development and methodologies and SaaS
  • Experienced working in building and delivering Cloud Services and Cloud Automation solutions.
  • Worked on SalesForce development and hosting Cloud services, and translated business requirements into securely implemented capabilities in the Cloud.
  • Worked in data warehousing and high-performance enterprise computing environment.
  • Utilized Talend open studio & Talend Enterprise platform for big data management,
  • Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
  • Worked with SalesForce Developer Tools, IBM DataStage, GCP, HBase, Bigtable, BigQuery, Azure Data Factory pipelines, and other Azure Data Platform services to orchestrate management tasks using Azure Automation
  • Worked with Azure Data Factory (ADF) as a SaaS solution to compose and orchestrate Azure data services, as a Data Cloud Architect working with Azure, Ansible, Jenkins, Docker, Kubernetes, and DevOps/CI/CD automation.
  • Utilized Azure Data Factory to create, schedule and manage data pipelines
  • Worked with IBM DataStage, Big Data, Hadoop, Salesforce, Hive, Pig, Sqoop, Pentaho, Informatica
  • Established, maintained, and enforced ETL architecture design principles, techniques, standards, and best practices
  • Managed technical designs of ETL and Electronic Data Interchange (EDI) reference architectures to ensure high data quality, data integration performance, error recovery/handling, and optimized performance. Created and managed policies in AWS Ranger for Hive.
  • Worked with ESB (Enterprise Service Bus), Logic Apps, API, Azure Data Factory pipelines, and other Azure Data Platform services, along with Azure Automation
  • Worked with Azure Data Factory (ADF) as a SaaS solution to compose and orchestrate Azure data services.
  • Utilized Azure Data Factory to create, schedule and manage data pipelines
  • Worked with EMR provisioning, updating through service catalog using Cloud formation.
  • Worked with Hadoop cluster set up, performance fine-tuning, monitoring and administration.
  • Worked with Hadoop Eco System, Hive, MapReduce YARN, Tez, Presto, Beeline, Pig, Spark, Scala
  • Worked with AWS Cloud formation to create service catalog to launch EMR clusters with desired setup.
  • Worked with Looker, ESB (Enterprise Service Bus), Logic Apps, API, AWS EMR, Ranger, and Hadoop technologies
  • Worked with Informatica Cloud Data Integration Azure IaaS, SaaS, PaaS Electronic Data Interchange (EDI) to deliver accessible, trusted, and secured data to facilitate more valuable business decisions to identify competitive advantages to better service customers, and build an empowered workforce.
  • Used Informatica Cloud Data Integration for global, distributed data warehouse and analytics projects.
  • Worked with Electronic Data Interchange (EDI), Salesforce, Azure IaaS, PaaS, SaaS, cloud data warehouses (AWS Redshift, Azure SQL Data Warehouse, and Snowflake), and Informatica Cloud Data Integration solutions to augment performance, productivity, and extensive connectivity to cloud and on-premises sources
  • Worked with S3-to-Hive data migration and Informatica Cloud for its flexible and scalable transformations and advanced capabilities to seamlessly integrate growing data volumes across disparate sources in the data warehouse using wizards, preconfigured templates, and out-of-the-box mappings
  • Reviewed/assessed existing ETL applications to update features, performance improvements, upgrades, and ongoing sustainment
  • Conducted design reviews, code reviews, and performance tuning minimizing bottlenecks, maximizing performance
  • Researched and recommended future improvements in ETL, Pentaho, and Informatica, and daily functions
  • Worked with team to develop in-house knowledge repository for best practices, solution documentations and manuals
  • Used Pentaho, Informatica, Tableau, ETL, IaaS, PaaS, SaaS, and SSIS for data management, data integration, data quality, MDM, and data governance
  • Worked to architect, design and built Big Data solutions using Hive Hadoop for analysis and to build dashboards
  • Used Spark-Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model which gets the data from Kafka in near real time
  • Used Hadoop, Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Worked with SalesForce Development tools, Azure Data Factory pipelines and other Azure Data Platform to orchestrate management tasks using Azure
  • Worked with S3-to-Hive data migration, Python, PySpark, and Azure Data Factory (ADF) as a SaaS solution to compose and orchestrate Azure data services
  • Utilized Azure Data Factory to create, schedule and manage data pipelines
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Worked with Hadoop, Azure SaaS, IaaS, PaaS, SSIS, SSAS OLAP cubes, Pentaho, Hadoop, PIG, Hive, Spark, Oracle, and MS SQL Server
  • Implemented custom Error handling in ETL Pentaho Informatica jobs and worked on different methods of logging
  • Hands-on data integration, Tableau, ETL SSIS, Electronic Data Interchange (EDI), data management/data warehousing experience, Pentaho, Informatica PowerCenter/IDQ development with direct development experience Informatica Business Glossary, Informatica Versions 10.x/9.1/8.x/7.x Source Analyzer, Mapping Designer, Mapplet, Transformations, Workflow Monitor, Workflow Manager, Informatica developer IDQ 9.1/9.5.1, Informatica Power Mart 6.x/5.x, Power Exchange, Power Connect, Power Analyzer Data Profiling and Data cleaning, Flat file system Fixed width, Delimiter, OLAP.
  • Worked with SalesForce, Pentaho, Informatica MDM 9.X or 10.X, MDM Hub & IDD
  • Performed Informatica Data Profiling with IDQ and Analyzer
  • Performed analysis to identify data anomalies, data cleansing (ETL) and resolved data quality issues
  • Developed IDQ using Joiner to configure & Develop Business Rules
  • Performed data integration using Pentaho, Salesforce, Informatica, cross system joins for identifying duplicates and data anomalies
  • Created IDQ Dashboards/KPI Metrics for reporting. Performed ETL and resolved data quality issues with analysis
  • Extensively worked on Informatica IDE/IDQ. Involved in data profiling using IDQ (Analyst Tool) prior to data staging.
  • Used Pentaho, Informatica IDQ's standardized plans for addresses and names clean ups.
  • Worked on IDQ file configuration at user's machines and resolved the issues.
  • Used IDQ to complete initial data profiling and removing duplicate data, worked on IDQ admin tasks and as Admin and IDQ developer.
  • Implemented Informatica Business Glossary, Informatica Data Quality, MDM, PowerCenter, Analyst/Metadata Manager for glossary
  • Performed with Pentaho ETL Data Conversion and Data Transformation, SF Data Modeling, Jitterbit.
  • Used Talend Integration Suite and Talend Open Studio; strong knowledge of and experience in using Informatica PowerCenter ETL
  • Strong experience in S3-to-Hive data migration and in Extraction, Transformation, and Loading (ETL) of data from various sources into Data Warehouses and Data Marts using Informatica PowerCenter (Designer, Workflow Manager, Workflow Monitor, Metadata Manager); see the S3-to-Hive sketch after this list.
  • Performed S3-to-Hive data migration and data manipulations using various Talend components like tMap, tJavaRow, tJava, tOracleRow, tOracleInput, tOracleOutput, tMSSQLInput, and many more.
  • Analyzed the source data to assess data quality using Talend Data Quality.
  • Troubleshoot data integration issues and bugs, analyze reasons for failure, implement optimal solutions, and revise procedures
  • Performed Electronic Data Interchange (EDI), Salesforce, Migration projects to migrate data from data warehouses on Oracle/DB2 and migrated those to Netezza.
  • Used SQL queries and other data analysis methods, as well as Talend Enterprise Data Quality
  • Performed profiling and comparison of data used to make decisions regarding how to measure business rules quality of the data.
  • Worked on Tableau dashboards and the Talend RTX ETL tool; developed and scheduled jobs in Talend Integration Suite.
  • Responsible for tuning ETL mappings, Workflows and underlying data model to optimize load and query performance.
  • Developed Talend ESB services and deployed them on ESB servers on different instances.
  • Monitored and supported the Talend jobs scheduled through Talend Admin Center (TAC).
  • Developed Oracle PL/SQL, DDLs, and Stored Procedures and worked on performance tuning
  • Tuned SQL; strong understanding of Dimensional Modeling, OLAP, Star and Snowflake Schemas, and Fact and Dimension tables
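
A minimal sketch of the S3-to-Hive migration pattern referenced above, assuming an EMR-style cluster with S3 access, a Hive metastore, and the spark-csv package available; the bucket, paths, schema, and table names are hypothetical:

# Minimal sketch: land delimited extracts from S3 into a partitioned Hive table.
# Bucket, prefix, columns, and table names are illustrative assumptions.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="s3-to-hive-migration")
sqlContext = HiveContext(sc)

# Read CSV extracts landed in S3 (requires the spark-csv package on Spark 1.x)
orders = (sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("s3://company-data-lake/raw/orders/*.csv"))

# Light cleansing before loading: drop rows missing the business keys
clean = orders.dropna(subset=["order_id", "customer_id"])

# Write into a partitioned Hive table in the warehouse
(clean.write
      .mode("append")
      .partitionBy("order_date")
      .saveAsTable("edw.orders_staging"))

sc.stop()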

Confidential, San Francisco, CA

Lead Talend Spark Hadoop Developer/ETL BI DW Dashboard Developer

Hands on Tools: Pentaho 5.x, Talend 5.3.0, Hadoop 2.4.1, Oracle 11.2.0.4, Informatica IDQ 9.6

Project Environment: ETL, AWS, S3, Redshift, Talend, Hadoop, Spark, Scala, Python, HiveQL, HQL, Data Visualizations Dashboards, Cloudera Hadoop YARN

Responsibilities:

  • Experienced leveraging third-party Cloud Service Providers (IaaS, PaaS, SaaS). Experienced as an established hands-on technical leader of Cloud Software Engineering teams.
  • Hands on AWS experience with strong management skills
  • Experienced working with Agile development and methodologies and SaaS
  • Experienced working in building and delivering Cloud Services and Cloud Automation solutions.
  • Worked on hosting Cloud services and translated business requirements into securely implemented capabilities in the Cloud.
  • Worked in data warehousing and high-performance enterprise computing environment.
  • Worked as a Data Engineer to help form a cloud based big data engineering team to deliver platform automation and security.
  • Built data workflows using AWS EMR, PySpark, Spark SQL, Scala, and Python
  • Configured and installed tools with a highly available architecture; created ETL SSIS packages with VB, C#, and SQL
  • Designed, reviewed and fixed security vulnerabilities at network/subnet/security groups level
  • Created security standardized templates including password management strategy and implementation
  • Installed custom software and automated installation process
  • Optimized Redshift database for optimal performance
  • Expert knowledge of Salesforce, Azure, IaaS, PaaS, SaaS, AWS EMR, Spark, Scala, Hadoop, HortonWorks, S3, RedShift
  • Expert level knowledge of Linux/Unix, PowerShell, network security architecture and database tuning
  • Strong communication and teaching skills to train teams, with proven problem-solving and critical-thinking skills
  • Performed Extraction, Transformation, and Loading (ETL) of data from various sources into Data Warehouses and Data Marts using Informatica PowerCenter
  • Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
  • Worked with Big Data, Salesforce, Hadoop, Hive, Pig, Sqoop, Pentaho, Informatica
  • Established, maintained, and enforced ETL architecture design principles, techniques, standards, and best practices
  • Managed technical designs of ETL reference architectures to ensure high data quality, data integration performance, error recovery/handling, and optimized performance. Created and managed policies in AWS Ranger for Hive.
  • Worked with EMR provisioning, updating through service catalog using Cloud formation.
  • Worked with Hadoop cluster set up, performance fine-tuning, monitoring and administration.
  • Worked with Hadoop Eco System, Hive, MapReduce YARN, Tez, Presto, Beeline, Pig, Spark, Scala
  • Worked with AWS Cloud formation to create service catalog to launch EMR clusters with desired setup.
  • Worked with AWS EMR, Ranger, and Hadoop technologies
  • Worked with S3-to-Hive data migration, Informatica Cloud Data Integration, and Azure IaaS, PaaS, SaaS to deliver accessible, trusted, and secured data to facilitate more valuable business decisions, identify competitive advantages to better service customers, and build an empowered workforce.
  • Used Informatica Cloud Data Integration Electronic Data Interchange (EDI) for global, distributed data warehouse and analytics projects.
  • Worked with data migration S3 to Hive, cloud data warehouses, Tableau, AWS Redshift, Azure SQL Data Warehouse, and Snowflake and Informatica Cloud Data Integration solutions to augment the performance, productivity, and extensive connectivity to cloud and on-premises sources.
  • Worked with Informatica Cloud for its flexible and scalable transformations and advanced capabilities to seamlessly integrate growing data volumes across disparate sources in the data warehouse using wizards, preconfigured templates, and out-of-the-box mappings
  • Reviewed/assessed existing ETL applications to update features, performance improvements, upgrades, and ongoing sustainment
  • Conducted design reviews, code reviews, and performance tuning minimizing bottlenecks, maximizing performance
  • Researched and recommended future improvements in ETL, Pentaho, and Informatica, and daily functions
  • Worked with team to develop in-house repository for best practices, solution documentations, manuals, and procedures for user education.
  • Used Pentaho, Informatica, ETL, SSIS for data management, data integration, data quality, MDM, data governance
  • Worked to architect, design, Tableau reporting, and built Big Data solutions using Hive Hadoop for analysis and to build dashboards
  • Used Hadoop, Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Worked with Hadoop, Electronic Data Interchange (EDI), SSIS, SSAS OLAP cubes, Pentaho, Hadoop, PIG, Hive, Spark, Oracle, and MS SQL Server
  • Implemented custom Error handling in ETL Pentaho Informatica jobs and worked on different methods of logging
  • Hands-on data integration, data management/data warehousing experience, Pentaho, Informatica PowerCenter/IDQ development with direct development experience Informatica Business Glossary, Informatica Versions 10.x/9.1/8.x/7.x Source Analyzer, Mapping Designer, Mapplet, Transformations, Workflow Monitor, Workflow Manager, Informatica developer IDQ 9.1/9.5.1, Informatica Power Mart 6.x/5.x, Power Exchange, Power Connect, Power Analyzer Data Profiling and Data cleaning, Flat file system Fixed width, Delimiter, OLAP.
  • Worked with Pentaho, Electronic Data Interchange (EDI), Informatica MDM 9.X or 10.X, MDM Hub & IDD
  • Performed Informatica Data Profiling with IDQ and Analyzer
  • Performed analysis to identify data anomalies, built Salesforce and Tableau reports, performed data cleansing (ETL), and resolved data quality issues
  • Developed IDQ using Joiner to configure & Develop Business Rules
  • Performed data integration using Pentaho and Informatica, with cross-system joins for identifying duplicates and data anomalies (see the PySpark join sketch after this list)
  • Created IDQ Dashboards/KPI Metrics for reporting. Performed ETL and resolved data quality issues with analysis
  • Extensively worked on Informatica IDE/IDQ. Involved in data profiling using IDQ (Analyst Tool) prior to data staging.
  • Used Pentaho, Informatica IDQ's standardized plans for addresses and names clean ups.
  • Worked on IDQ file configuration at user's machines and resolved the issues.
  • Used IDQ to complete initial data profiling and remove duplicate data; extensively worked on IDQ admin tasks as both IDQ Admin and IDQ Developer. Created ETL/SSIS packages and Salesforce, VB, C#, and SQL code.
  • Implemented Informatica Business Glossary, Informatica Data Quality, MDM, PowerCenter, Informatica Analyst/Metadata
  • Performed on Pentaho ETL Data Conversion and Data Transformation, SF Data Modeling, Jitterbit.
  • Worked with client and provider analytics, developed new data marts for new and existing data warehouses.
  • Worked with IDQ, Informatica PowerCenter, Oracle, Dimensional Data Modeling for Healthcare/Payor Data solutions
  • Worked on data integration using ETL SSIS, developing data models using ERWIN, PLSQL, Informatica Data Analyst, EPIC, Facets, Informatica MDM, Informatica IDD, TOAD, salesforce.
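
A minimal PySpark sketch of the cross-system join used to flag duplicates and anomalies before loading the warehouse; the staging tables, key column, and DQ output tables are hypothetical assumptions:

# Minimal sketch: cross-system join for duplicate and anomaly detection.
# Table and column names are illustrative only.
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql import functions as F

sc = SparkContext(appName="cross-system-dq-check")
sqlContext = HiveContext(sc)

members_crm = sqlContext.table("staging.members_crm")        # e.g. CRM/Salesforce extract
members_claims = sqlContext.table("staging.members_claims")  # e.g. claims system extract

# Duplicates: the same member key appearing more than once within a single source
dupes = (members_crm.groupBy("member_id")
         .agg(F.count("*").alias("cnt"))
         .filter(F.col("cnt") > 1))

# Anomalies: members present in the claims feed but missing from CRM
crm_keys = members_crm.select("member_id").distinct().withColumn("in_crm", F.lit(1))
missing_in_crm = (members_claims.join(crm_keys, on="member_id", how="left_outer")
                  .filter(F.col("in_crm").isNull()))

# Persist the exceptions for DQ dashboards / KPI metrics
dupes.write.mode("overwrite").saveAsTable("dq.member_duplicates")
missing_in_crm.write.mode("overwrite").saveAsTable("dq.members_missing_in_crm")

sc.stop()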

Confidential, Nashville, TN

Azure Architect/ETL Pentaho/Talend Big Data Hadoop Developer

Responsibilities:

  • Experienced in S3-to-Hive data migration and in Azure architecture, design, and implementation of on-prem and hybrid cloud solutions utilizing Azure and AWS
  • Experienced with DevOps, CI/CD pipeline tools Jenkins, RLM, Bitbucket.
  • Proficient in Linux, VMWare and container technologies
  • Experienced leveraging third party Cloud Service Providers (IaaS, PaaS, SaaS).
  • Experienced as an established hands-on technical leader of Cloud Software Engineering teams.
  • Hands-on AWS experience with strong management skills
  • Experienced working with Agile development and methodologies and SaaS
  • Experienced working in building and delivering Cloud Services and Cloud Automation solutions.
  • Worked on hosting Cloud services and translated business requirements into securely implemented capabilities in the Cloud.
  • Worked in data warehousing and high-performance enterprise computing environment.
  • Performed data integration using ETL/Pentaho/Informatica/Talend Open Studio Integration Suite.
  • Used Pentaho/PDI, Kettle, Informatica, SSIS, Batch Data Analysis using Hive, SQL SSAS, VB, C#, SQL.
  • Worked with Healthcare Data/Claims 835 and 837 formats for analytical purposes, X12 Electronic Data Interchange (EDI), PHI
  • Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
  • Worked with Informatica Cloud Data Integration to deliver accessible, trusted, and secured data to facilitate valuable business decisions to identify competitive advantages to better service customers, and build an empowered workforce.
  • Used Informatica Cloud Data Integration for global, distributed data warehouse and analytics projects.
  • Worked with cloud data warehouses, Tableau, AWS Redshift, Azure SQL Data Warehouse, and Snowflake and Informatica Cloud Data Integration solutions to augment the performance, productivity, and extensive connectivity to cloud and on-premises sources.
  • Worked with Informatica Cloud for its flexible and scalable transformations and advanced capabilities to seamlessly integrate growing data volumes across disparate sources in the data warehouse using wizards, preconfigured templates, and out-of-the-box mappings
  • Worked with Big Data, Hadoop, Hive, Pig, Sqoop, Salesforce, Pentaho, Informatica
  • Established, maintained, and enforced SSIS packages, ETL architecture design principles, techniques, standards, and best practices
  • Managed technical designs of ETL reference architectures for data quality, data integration performance, and error recovery/handling; created SSIS packages with VB, C#, SQL, and Java. Created and managed policies in AWS Ranger for Hive.
  • Worked with EMR provisioning, updating through service catalog using Cloud formation.
  • Worked with Hadoop cluster set up, performance fine-tuning, monitoring and administration.
  • Worked with Hadoop Eco System, Hive, MapReduce YARN, Tez, Presto, Beeline, Pig, Spark, Scala
  • Worked with AWS Cloud formation to create service catalog to launch EMR clusters with desired setup.
  • Worked with AWS EMR, Ranger, and Hadoop technologies
  • Worked with X12 EDI standards for healthcare data, HEDIS, and HIPAA
  • Worked with Big Data, Electronic Data Interchange (EDI), Hadoop, Hive, Pig, Sqoop, Pentaho, Informatica
  • Established, maintained, and enforced ETL architecture design principles, techniques, standards, and best practices
  • Managed technical designs of ETL Electronic Data Interchange (EDI) reference architectures to ensure high data quality, data integration performance, and error recovery/handling, optimize performance, created SSIS packages, VB, C#, SQL, MS SQL Server
  • Reviewed/assessed existing ETL applications to update features, performance improvements, upgrades, and ongoing sustainment
  • Conducted design reviews, code reviews, and performance tuning minimizing bottlenecks, maximizing performance
  • Researched and recommended future improvements in ETL, Pentaho, and Informatica, and daily functions
  • Worked with team to develop in-house knowledge repository for best practices, solution documentations and manuals
  • Used Pentaho, Informatica, ETL, Electronic Data Interchange (EDI), SSIS for data management, data integration, data quality, MDM, data governance
  • Worked to architect, design and built Big Data solutions using Hive Hadoop for analysis and to build dashboards
  • Used Spark-Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model which gets the data from Kafka in near real time and Persists into Cassandra.
  • Used Hadoop, Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded the data into Spark RDD and do in memory data Computation to generate the Output response.
  • Worked with Hadoop, SSIS, SSAS OLAP cubes, Electronic Data Interchange (EDI), Pentaho, Hadoop, PIG, Hive, Spark, Oracle, and MS SQL Server
  • Implemented custom Error handling in ETL Pentaho Informatica jobs and worked on different methods of logging
  • Hands-on data integration, data management/data warehousing experience, Pentaho, Informatica PowerCenter/IDQ development with direct development experience Informatica Business Glossary, Informatica Versions 10.x/9.1/8.x/7.x Source Analyzer, Mapping Designer, Mapplet, Transformations, Workflow Monitor, Workflow Manager, Informatica developer IDQ 9.1/9.5.1, Informatica Power Mart 6.x/5.x, Power Exchange, Power Connect, Power Analyzer Data Profiling and Data cleaning, Flat file system Fixed width, Delimiter, OLAP.
  • Worked with Pentaho, Informatica MDM 9.X or 10.X, MDM Hub & IDD
  • Performed Informatica Data Profiling with IDQ and Analyzer
  • Performed analysis to identify data anomalies, data cleansing (ETL) and resolved data quality issues
  • Developed IDQ using Joiner to configure & Develop Business Rules
  • Performed data integration using Pentaho, Salesforce, Informatica, cross system joins for identifying duplicates and data anomalies
  • Created IDQ Dashboards/KPI Metrics for reporting. Performed ETL and resolved data quality issues with analysis
  • Extensively worked on Informatica IDE/IDQ. Involved in data profiling using IDQ (Analyst Tool) prior to data staging.
  • Used Pentaho, Informatica IDQ's standardized plans for addresses and names clean ups.
  • Worked on IDQ file configuration at user's machines and resolved the issues.
  • Used IDQ to complete initial data profiling and remove duplicate data; extensively worked on IDQ admin tasks as both IDQ Admin and IDQ Developer.
  • Implemented Informatica Business Glossary, Informatica Data Quality, MDM, PowerCenter, Informatica Analyst/Metadata
  • Performed on Pentaho ETL Data Conversion, Electronic Data Interchange (EDI), and Data Transformation, SF Data Modeling, Jitterbit.
  • Reviewed/assessed existing ETL applications to update features, performance improvements, upgrades, and ongoing sustainment
  • Conducted design reviews, code reviews, and performance tuning minimizing bottlenecks, maximizing performance
  • Researched and recommended future improvements in ETL, Pentaho, and Informatica, and daily functions
  • Developed in-house knowledge repository for best practices, solution documentations, manuals, and procedures for education
  • Used Pentaho, Informatica, ETL, SSIS for data management, data integration, data quality, MDM, data governance
  • Worked to architect, design and built Big Data solutions using Hive Hadoop for analysis and to build dashboards
  • Used Hadoop, Hive, Spark-Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model which gets the data from Kafka in near real time and Persists into Cassandra.
  • Worked with Informatica Cloud Data Integration to deliver accessible, trusted, secured data to facilitate valuable business decisions to identify competitive advantages to better service customers, and build an empowered workforce.
  • Used Informatica Cloud Data Integration for global, distributed data warehouse and analytics projects.
  • Worked with Salesforce, cloud data warehouses AWS Redshift, Azure SQL Data Warehouse, and Snowflake, and Informatica Cloud Data Integration solutions to augment the performance, productivity, and extensive connectivity to cloud and on-premises sources.
  • Worked with Informatica Cloud for its flexible and scalable transformations and advanced capabilities to seamlessly integrate growing data volumes across disparate sources in the data warehouse using wizards, preconfigured templates, and out-of-the-box mappings
  • Used Hadoop, Salesforce, Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
  • Worked with Hadoop, Salesforce, SSAS OLAP cubes, Pentaho, PIG, Hive, Spark, Oracle, and MS SQL Server
  • Implemented custom Error handling in ETL Pentaho Informatica jobs and worked on different methods of logging
  • Hands-on data integration, data management/data warehousing experience; Informatica PowerCenter/IDQ development with direct development experience: Informatica Business Glossary, Informatica Versions 10.x/9.1/8.x/7.x Source Analyzer, Mapping Designer, Mapplet, Transformations, Workflow Monitor, Workflow Manager, Informatica Developer IDQ 9.1/9.5.1, Informatica Power Mart 6.x/5.x, Power Exchange, Power Connect, Power Analyzer, data profiling and data cleaning, flat file systems (fixed width, delimited).
  • Created and managed policies in AWS Ranger for Hive.
  • Worked with EMR provisioning, updating through service catalog using Cloud formation.
  • Worked with Hadoop cluster set up, performance fine-tuning, monitoring and administration.
  • Worked with Hadoop Eco System, Hive, MapReduce YARN, Tez, Presto, Beeline, Pig, Spark, Scala
  • Worked with AWS CloudFormation to create a service catalog to launch EMR clusters with the desired setup (see the EMR provisioning sketch after this list).
  • Worked with AWS EMR, Ranger, and Hadoop technologies
  • Worked with IBM, Informatica MDM 9.X or 10.X, MDM Hub & IDD
  • Performed Informatica Data Profiling with IDQ and Analyzer
  • Performed analysis to identify data anomalies, data cleansing (ETL) and resolved data quality issues
  • Developed IDQ using Joiner to configure & Develop Business Rules
  • Performed cross system joins for identifying duplicates and data anomalies
  • Created IDQ, geospatial Dashboards/KPI Metrics for reporting. Performed ETL SSIS, resolved data issues with analysis
  • Extensively worked on Salesforce and Informatica IDE/IDQ. Involved in data profiling using IDQ (Analyst Tool) prior to data staging.
  • Used IDQ's standardized plans for addresses and names clean ups.
  • Worked on IDQ file configuration at user's machines and resolved the issues.
  • Used IDQ to complete initial data profiling and remove duplicates; worked on IDQ admin tasks as both IDQ Admin and IDQ Developer.
  • Implemented Informatica Business Glossary, Informatica Data Quality, MDM, PowerCenter, Informatica Analyst/Metadata
  • Performed on Pentaho ETL Data Conversion and Data Transformation, Salesforce, SF Data Modeling, Jitterbit.
  • Worked in Product Development (PDP) Data Governance Office (DGO), Business Glossary.
  • Responsible for developing the Informatica Business Glossary solution based on functional and technical design specifications for business and technical requirements, and for developing a catalog structure in Informatica.
  • Worked with IBM, Hadoop, Star Schema, Dimension and Fact data models for dashboards, data visualizations projects
  • Worked to architect, design and built Big Data solutions using Hive Hadoop.
  • Designed and developed POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Worked on Pentaho, Informatica, SSIS, and SQL queries; worked on the design, development, and testing of mappings.
  • Created dashboards, Tableau, data visualizations, ETL job infrastructure using Talend Open Studio, Hadoop, Informatica
  • Managed Error Handling, Performance Tuning, Error Logging, clustering and High Availability in Talend
  • Worked with IBM, Business Analysts to correlate business requirements to domain entities and data elements
  • Managed ETL, Tableau, Pentaho, Talend, Informatica, SSIS, interfacing components, dashboard designs, solution design and configuration activity
  • Monitored the daily, weekly, and ad-hoc runs to develop dashboards and load data into the target systems.
  • Created test plans, Electronic Data Interchange (EDI), test data for extraction and transformation processes and resolved data issues following the data standards.
  • Used Talend, Hadoop, IDQ tool for profiling, applying rules and develop mappings to move data from source to target systems.
  • Developed dashboards, Transformations, Mapplets and Mappings using Informatica Designer to implement business logic.
  • Presented dashboard design architectures to the various stakeholders, customers, servers, Network, Security and other teams.
  • Provided technical leadership and governance of the big data team and the implementation of solution architecture
  • Managed the architecture, Electronic Data Interchange (EDI), dashboard design changes due to business requirements and other interface integration changes
  • Provided overall architect responsibilities including roadmaps, leadership, planning, technical innovation, and security
  • Designed, Layout, and Deployed Hadoop clusters in the cloud using Hadoop ecosystem & open Source platforms
  • Configured and tuned production and development Hadoop environments with the various intermixing Hadoop components
  • Provided End-to-end systems implementation such as data security and privacy concerns
  • Designed and implemented Tableau geospatial big data ingestion, processing and delivery
  • Provided cloud-computing infrastructure solutions on Amazon Web Services AWS - EC2, VPCs, S3, IAM
  • Involved in the administration, configuration management, monitoring, debugging, and performance tuning, technical resolution on Hadoop applications suit, Hadoop platform, MapReduce
  • Worked with Tableau, star, snowflake schemas, indexing, aggregate tables, dimension tables, constraints, keys, and fact tables
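
The EMR provisioning described above went through AWS CloudFormation and Service Catalog; purely as an illustration, the sketch below expresses the same kind of cluster launch parameters with boto3's EMR client. The region, subnet, key pair, log bucket, release label, and instance sizes are hypothetical assumptions, not the project's actual configuration.

# Illustrative sketch only: launching an EMR cluster with Hadoop/Hive/Spark via boto3.
# All identifiers and sizes below are hypothetical.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="analytics-emr",
    ReleaseLabel="emr-5.12.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m4.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m4.xlarge", "InstanceCount": 3},
        ],
        "Ec2SubnetId": "subnet-0123456789abcdef0",
        "Ec2KeyName": "analytics-keypair",
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    LogUri="s3://analytics-emr-logs/",
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)

print("Launched EMR cluster:", response["JobFlowId"])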
