- 8+ years of professional experience in information technology, with expertise in Big Data, Hadoop, Spark, Hive, Impala, Sqoop, Flume, Kafka, SQL tuning, ETL development, report development, database development, and data modelling, and strong knowledge of Oracle database architecture.
- Extensive experience in designing and developing data integration solutions using ETL tools such as Informatica PowerCenter and Teradata utilities for handling large volumes of data.
- Hands-on experience in GCP, BigQuery, GCS buckets, Google Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, bq command-line utilities, Dataproc, Stackdriver.
- Good knowledge of and experience with the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), data pipelines, and data analysis and processing with Hive SQL, Impala, Spark, and Spark SQL.
- Strong experience with Informatica Designer, Workflow Manager, Workflow Monitor, Repository Manager.
- Created clusters in Google Cloud and managed them using Kubernetes (k8s). Used Jenkins to deploy code to Google Cloud, create new namespaces, build Docker images, and push them to the Google Cloud container registry.
- Extensive experience in Data Mining solutions to various business problems and generating data visualizations using Tableau, PowerBI, Birst, Alteryx.
- Used Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS.
- Analysed data and provided insights with R Programming and Python Pandas.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Hands-on experience architecting ETL transformation layers and writing Spark jobs to do the processing.
- Worked on Python OpenStack APIs and used Python scripts to update content in the database and manipulate files.
- Experience in DWBI across verticals such as: Games, Sales, Online, Online Marketing, Social Media Analytics and ecommerce.
- Worked on migrating data from Teradata to AWS using Python and BI tools like Alteryx.
- Good programming experience with Python and Scala.
- Hands-on experience with NoSQL databases such as HBase and Cassandra.
- Analysed approaches for migrating Oracle databases to Redshift.
- Performed remediation on threats using FireEye NX and Helix HX.
- Experience creating dashboards in Stackdriver; can set up alerting and create custom metrics using Google API developer tools.
- Migrated Splunk 6.5 from bare metal servers to AWS.
- Experience with ETL workflow management tools such as Apache Airflow, including significant experience writing Python scripts to implement workflows.
- Experience with scripting languages like PowerShell, Perl, Shell, etc.
- Designed data systems for the social-local-mobile eCommerce system.
- Expert knowledge and experience in fact and dimensional modelling (star schema, snowflake schema), transactional modelling, and SCD (slowly changing dimensions).
- Extensive experience in writing MS SQL, T-SQL procedures, ORACLE TOAD functions and queries.
- Created and managed a private lab with Dell PowerEdge and AWS to host a clustered Splunk environment.
- Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy.
- Effective team member, collaborative and comfortable working independently
- Proficient in achieving Oracle SQL plan stability and maintaining baselines with SQL plans, ASH, AWR, ADDM, and SQL Advisor for proactive follow-up and SQL rewrites.
- Used the Pandas API to organise data in time series and tabular formats for easy timestamp data manipulation and retrieval.
- Designing and implementing Splunk-based best-practice solutions.
- Experience with shell scripting to automate various activities.
- Used JSON schema to define table and column mapping from S3 data to Redshift
- Application development with Oracle Forms; reporting with OBIEE, Discoverer, and Report Builder; and ETL development.
ETL: Informatica PowerCenter 10.x/9.6/9.1/8.6/8.5/8.1/7.1, Informatica PowerExchange
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Spark, Kafka, Storm
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, C++, Java, Python, Scala, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Confidential, Kansas City, KS
Sr. Data Engineer
- Build and architect multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinate tasks among the team.
- Design and architect the various layers of the data lake.
- Design star schemas in BigQuery.
- Working on migrating the mobile application from Skava to Google Cloud by breaking chunks of code into microservices.
- Perform Informatica Cloud Services and Informatica PowerCenter administration, ETL strategies, and Informatica ETL mappings. Set up the Secure Agent and connect different applications and their data connectors to process different kinds of data, including unstructured (logs, clickstreams, shares, likes, topics, etc.), semi-structured (XML, JSON), and structured (RDBMS) data.
- Installed and configured the Splunk Enterprise environment on Linux; configured Universal and Heavy Forwarders.
- Extensive data knowledge in supply chain, services, e-commerce, retail, banking, and media domains.
- Using REST APIs with Python to ingest data from and some other site into BigQuery.
- Build a program with Python and Apache Beam and execute it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
- Work on the automation factory building and Alteryx server setup to improve the reporting process and enhance the customer experience.
- Building a configurable Scala- and Spark-based framework to connect to common data sources such as MySQL, Oracle, Postgres, SQL Server, Salesforce, and BigQuery and load their data into BigQuery.
- Optimization and troubleshooting, and test case integration into the CI/CD pipeline using Docker images.
- Extensive knowledge and hands-on experience implementing PaaS, IaaS, and SaaS delivery models inside the enterprise (data centre) and in public clouds using AWS, Google Cloud, Kubernetes, etc.
- Extensively worked on Informatica tools such as Source Analyzer, Mapping Designer, Workflow Manager, Workflow Monitor, Mapplets, Worklets, and Repository Manager.
- Building data pipeline ETLs for data movement to S3, then to Redshift.
- Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
- Hands-on experience with Informatica PowerCenter and PowerExchange in integrating with different applications and relational databases.
- Prepared dashboards using Tableau for summarizing Configuration, Quotes, Orders and other e-commerce data.
- Monitoring BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.
- Configured EC2 instances and IAM users and roles, and created an S3 data pipe using the Boto API to load data from internal data sources.
- Hands on experience with Alteryx software for ETL, data preparation for EDA and performing spatial and predictive analytics.
- Submit Spark jobs using gsutil and Spark submission and have them executed on the Dataproc cluster.
- Write a Python program to maintain raw file archival in a GCS bucket.
- Experienced in configuring Splunk input and output configuration files.
- Provided best-practice documents for Docker, Jenkins, Puppet, and Git.
- Expertise in implementing DevOps culture through CI/CD tools such as Repos, CodeDeploy, CodePipeline, and GitHub.
- Developed Shell Scripts for Automation and dependency functions.
- Used dynamic cache memory and index cache to improve the performance of Informatica server
- Backing up AWS Postgres to S3 via a daily job run on EMR using DataFrames.
- Developed a server-based web traffic statistical analysis tool with RESTful APIs using Flask and Pandas.
- Analyse various types of raw files such as JSON, CSV, and XML with Python using Pandas, NumPy, etc.
- Write Scala programs for Spark transformations in Dataproc.
Environment: Informatica PowerCenter 10.x/9.x, GCP, BigQuery, GCS Bucket, Google Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, AWS, AWS S3, Splunk, Alteryx, Cloud SQL, MySQL, Postgres, SQL Server, Salesforce SQL, Python, Scala, Spark, Hive, Sqoop, Spark SQL.
Confidential, Chicago, IL
- Using Google Cloud Functions with Python to load data into BigQuery when CSV files arrive in a GCS bucket.
- Write a program to download a SQL dump from their equipment maintenance site and load it into a GCS bucket. On the other side, load this SQL dump from the GCS bucket into MySQL (hosted in Google Cloud SQL) and load the data from MySQL into BigQuery using Python, Scala, Spark, and Dataproc.
- Integrate Collibra with the data lake using the Collibra Connect API.
- Involved in creating the Tables in Greenplum and loading the data through Alteryx for Global Audit Tracker.
- Implemented Change Data Capture using Informatica PowerExchange 9.1.
- Designed and developed Informatica PowerCenter 9.5 mappings to extract, transform, and load data into Oracle 11g target tables.
- Create knowledge objects, regex statements, and Splunk instances.
- Process and load bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
- Worked with ETL tools including Talend Data Integration, Talend Big Data, Pentaho Data Integration, and Informatica.
- Data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
- Developed Python scripts to automate the ETL process using Apache Airflow, as well as cron scripts on UNIX.
- Set up alerting and monitoring using Stackdriver in GCP.
- Create, maintain, support, repair, and customize system and Splunk applications, search queries, and dashboards.
- Developed various ETL flows using Informatica PowerCenter and PowerExchange.
- Created an iterative macro in Alteryx to send JSON requests to a web service, download the JSON responses, and analyze the response data.
- Developed shell scripts for job automation, which will generate the log file for every job.
- Extensively used Spark SQL and the DataFrames API in building Spark applications.
- Experience with cloud-based version control technologies such as GitHub.
- Design and develop ETL processes in AWS Glue to migrate campaign data from external sources such as S3 and ORC/Parquet/text files into AWS Redshift.
- Create firewall rules to access Google Dataproc from other machines.
- Write Scala programs for Spark transformations in Dataproc.
Environment: Informatica PowerCenter 9.5, GCP, BigQuery, GCS Bucket, Google Cloud Functions, Apache Beam, Alteryx, Cloud Dataflow, Cloud Shell, Splunk, Cloud SQL, MySQL, AWS Glue, Postgres, SQL Server, Python, Scala, Spark, Hive, Spark SQL.
Confidential, San Diego, CA
- Analyse client data using Scala, Spark, and Spark SQL and present an end-to-end data lake design to the team.
- Design the transformation layers for the ETL written in Scala and Spark and distribute the work among the team, including myself.
- Created interactive Alteryx workflow using Action tool for Giftcard & Donation Tracking related projects.
- Keep the team motivated to deliver the project on time and work side by side with other members as a team member.
- Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift.
- Used Debugger in Informatica Power Center Designer to check the errors in mapping.
- Perform fact and dimensional modelling and propose a solution for loading it.
- Process data with Scala, Spark, and Spark SQL and load it into partitioned Hive tables in Parquet file format.
- Develop Spark jobs with partitioned RDDs (hash, range, custom) for faster processing.
- Develop near-real-time data pipelines using Flume, Kafka, and Spark Streaming to ingest client data from their web log server and apply transformations.
- Develop Sqoop scripts and Sqoop jobs to ingest data from client-provided databases in batches on an incremental basis.
- Published Alteryx workflow in Alteryx Gallery as well as scheduled the flow in Alteryx Server.
- Use DistCp to load files from S3 to HDFS; process, cleanse, and filter data using Scala, Spark, Spark SQL, Hive, and Impala queries; and load it into Hive tables for data scientists to apply their ML algorithms and generate recommendations as part of the data lake processing layer.
- Build part of the Oracle database in Redshift.
- Load data into NoSQL databases (HBase, Cassandra).
- Load data into Amazon Redshift and use AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
- Involved in performance tuning of Informatica jobs.
- Combine all the above steps in an Oozie workflow to run the end-to-end ETL process.
- Use YARN in Cloudera Manager to monitor job processing.
- Developing under the Scrum methodology and in a CI/CD environment using Jenkins.
- Participate in the architecture council for database architecture recommendations.
- Utilized Unix Shell Scripts for adding the header to the flat file targets.
- Preparation of the Test Cases and involvement in Unit Testing and System Integration Testing.
- Perform deep analysis of SQL execution plans and recommend hints, restructuring, or new indexes or materialized views for better performance.
- Deploy EC2 instances for Oracle databases.
Environment: ETL, Informatica 8.x, Hadoop Ecosystem (HDFS, YARN, Pig, Hive, Alteryx, Sqoop, Flume, Oozie, Kafka, Hive SQL, Impala, Spark, Scala, HBase, Cassandra), EC2, EBS Volume, AWS, VPC, S3, Oracle 12c, Oracle Enterprise Linux, Shell Scripting.
- Devised simple and complex SQL scripts to check and validate data flow in various applications.
- Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.
- Developed logistic regression models (using R programming and Python) to predict subscription response rate based on customer variables such as past transactions, response to prior mailings, promotions, demographics, interests, and hobbies.
- Developed K-shell scripts to run from Informatica pre-session and post-session commands.
- Created Tableau dashboards and reports for data visualization, reporting, and analysis and presented them to the business.
- Designed and developed Spark jobs with Scala to implement an end-to-end data pipeline for batch processing.
- Created data connections and published them on Tableau Server for use with operational or monitoring dashboards.
- Knowledge of the Tableau administration tool for configuration, adding users, managing licenses and data connections, scheduling tasks, and embedding views by integrating with other platforms.
- Worked with senior management to plan, define, and clarify dashboard goals, objectives, and requirements.
- Responsible for daily communications to management and internal organizations regarding status of all assigned projects and tasks.
Environment: Hadoop Ecosystem (HDFS, YARN, Pig, Hive, Sqoop, Flume, Oozie, Kafka, Hive SQL, Impala, Spark).