Data Scientist/Data Engineer Resume
Indianapolis, IN
SUMMARY:
- 8+ years of experience in Analysis, Design, Development and Implementation as a Data Engineer.
- Expert in providing ETL solutions for any type of business model.
- Designed and delivered solutions for complex data issues.
- Experience in the design and development of scalable systems using Hadoop technologies in various environments. Extensive experience analyzing data with the Hadoop ecosystem, including HDFS, MapReduce, Hive, and Pig.
- Experience in understanding the security requirements for Hadoop.
- Extensive experience working with Informatica PowerCenter.
- Implemented Integration solutions for cloud platforms with Informatica Cloud.
- Worked with Talend, a Java-based ETL tool.
- Proficient in SQL, PL/SQL and Python coding.
- Experience developing on-premises and real-time processes.
- Excellent understanding of Enterprise Data Warehouse best practices; involved in full life-cycle development of data warehousing.
- Expertise in DBMS concepts.
- Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.
- Skilled in designing and implementing ETL Architecture for cost effective and efficient environment.
- Optimized and tuned ETL processes & SQL Queries for better performance.
- Performed complex data analysis and provided critical reports to support various departments.
- Worked with Business Intelligence tools like Business Objects and Data Visualization tools like Tableau.
- Extensive Shell/Python scripting experience for Scheduling and Process Automation.
- Good exposure to Development, Testing, Implementation, Documentation and Production support.
- Developed effective working relationships with client teams to understand and support requirements, developed tactical and strategic plans to implement technology solutions, and effectively managed client expectations.
- Solid knowledge and experience in Deep Learning techniques including Feedforward Neural Networks, Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN).
- Statistical methods: Hypothesis Testing, T-Test, Z-Test, Gradient Descent, Newton's Method, ANOVA test, Chi-square test. Libraries: NumPy, Pandas, Matplotlib, Scikit-learn, NLTK, Plotly, Seaborn, Scikit-Image, OpenCV.
- Actively contributed to all phases of the project life cycle, including Data Acquisition (web scraping), Data Cleaning, Data Engineering (dimensionality reduction with PCA & LDA, normalization, weight of evidence, information value), Feature Selection, Feature Scaling & Feature Engineering, Statistical Modeling (decision trees, regression models, neural networks, SVM, clustering), Testing and Validation (ROC plots, k-fold cross-validation), and Data Visualization.
- Implemented Bayes nets, the Viterbi algorithm, and image processing using Gaussian noise.
- Worked with various text analytics and word embedding approaches such as Word2Vec, CountVectorizer, GloVe, and LDA.
- Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools and application of Statistical Concepts.
- Worked on several python packages like NumPy, Pandas, Matplotlib, SciPy, Seaborn and Scikit-learn.
- Experience using cloud services on AWS, Azure, and GCP, including EC2, S3, AWS Lambda, and EMR.
- Experience working with statistical and regression analysis, multi-objective optimization.
- Good knowledge of the performance metrics used to evaluate algorithms.
- Worked with clients to identify analytical needs and documented them for further use.
- Worked on outlier analysis with various methods such as Z-score analysis, linear regression, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), and Isolation Forest (see the sketch after this list).
- Worked on gradient-boosted decision trees with XGBoost to improve performance and accuracy, along with other boosting methods such as AdaBoost.
- Worked and extracted data from various database sources like Oracle, SQL Server, DB2, MongoDB and Teradata.
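A minimal sketch of the outlier-analysis approach mentioned above, combining Z-score filtering with an Isolation Forest. The synthetic data, thresholds, and contamination rate are illustrative assumptions, not values from any client project.

```python
# Illustrative outlier detection: Z-score filtering plus Isolation Forest.
# All data and thresholds here are assumptions for the example.
import numpy as np
from scipy import stats
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
X[:10] += 8.0  # inject a few obvious outliers

# Z-score: flag rows where any feature is more than 3 standard deviations from the mean
z_scores = np.abs(stats.zscore(X))
z_outliers = (z_scores > 3).any(axis=1)

# Isolation Forest: flag roughly 1% of rows as anomalies
iso = IsolationForest(contamination=0.01, random_state=42)
iso_outliers = iso.fit_predict(X) == -1

print(f"Z-score outliers: {z_outliers.sum()}, Isolation Forest outliers: {iso_outliers.sum()}")
```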
SKILLS:
Languages: R, SQL, Python, Shell scripting, Java, Scala, C++.
IDE: R Studio, Jupyter Notebook, PyCharm, Atom.
Databases: Oracle 11g, SQL Server, MS Access, MySQL, MongoDB, Cassandra, PL/SQL, ETL.
Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Impala, Kafka, Spark MLlib, PySpark, Sqoop.
Systems: Windows XP/7/8/10, Ubuntu, Unix, Linux
Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, Wordcloud, Kernlab, Neuralnet, twitteR, NLP, Reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, Rpy2, TensorFlow, PyTorch, CNN, RNN, XGBoost
Technologies: HTML, CSS, PHP, JavaScript
Tools: R console, Python (NumPy, pandas, SciKit-learn, SciPy), SPSS.
Visualization: Tableau, SSAS, SSRS, QlikView, Business Objects, Power BI, and Cognos.
Data Warehousing: Informatica PowerCenter 9.x/8.x/7.x, Informatica Cloud, Talend Open Studio
Version Controls: GIT, SVN
Cloud: Google Cloud, Azure, AWS
WORK EXPERIENCE:
Confidential, Indianapolis, IN
Data Scientist/Data Engineer
Responsibilities:
- Analyzed and cleansed raw data using HiveQL.
- Performed data transformations using MapReduce and Hive for different file formats.
- Involved in converting Hive/SQL queries into transformations using Python.
- Performed complex joins on tables in hive with various optimization techniques
- Created Hive tables as per requirements, defining internal or external tables with appropriate static and dynamic partitions for efficiency.
- Worked extensively with Hive DDL and Hive Query Language (HQL).
- Involved in loading data from edge node to HDFS using shell scripting.
- Understand and manage Hadoop Log Files.
- Manage Hadoop infrastructure with Cloudera Manager.
- Created and maintained technical documentation for launching Hadoop cluster and for executing Hive queries.
- Built integrations between applications, primarily Salesforce.
- Extensive work in Informatica Cloud.
- Expertise in Informatica Cloud apps: Data Synchronization, Data Replication, Task Flows, Mapping Configurations, and real-time apps such as Process Designer and Process Developer.
- Worked extensively with flat files, loading them into on-premises applications and retrieving data from applications to files.
- Worked with WSDL and SoapUI for APIs.
- Wrote SOQL queries and created test data in Salesforce for unit testing of Informatica Cloud mappings.
- Prepared TDDs and test case documents after each process was developed.
- Identify and validate data between source and target applications.
- Verify data consistency between systems.
- Responsible for supervising data cleansing, validation, data classification, and data modelling activities.
- Developed algorithms in Python such as K-Means, Random Forest, linear regression, XGBoost, and SVM as part of data analysis (see the sketch after this list).
- Built a streaming pipeline with Confluent on AWS using Python to support CI/CD.
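A minimal sketch of the kind of Python modelling described above, pairing a supervised Random Forest with unsupervised K-Means clustering. The dataset, features, and parameters are placeholders for illustration, not the client's data.

```python
# Illustrative supervised and unsupervised modelling in scikit-learn;
# the dataset and parameters are placeholders, not client data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Supervised: Random Forest classifier
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
print("Random Forest accuracy:", rf.score(X_test, y_test))

# Unsupervised: K-Means clustering on the same features
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)
print("K-Means cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```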
Environment: Python, big data ecosystems, Hadoop, HDFS, Hive, Pig, Cloudera, MapReduce, Informatica Cloud Services, Salesforce, Unix scripts, flat files, XML files, and AWS.
Confidential, Austin, TX
Data Scientist/ Data Engineer
Responsibilities:
- Designed a data workflow model to create a data lake in the Hadoop ecosystem so that reporting tools like Tableau can plug in to generate the necessary reports.
- Created Source to Target Mappings (STM) for the required tables by understanding the business requirements for the reports
- Developed PySpark and Spark SQL code to process the data in Apache Spark on Amazon EMR and perform the necessary transformations based on the STMs developed.
- Created Hive tables on HDFS in Parquet format to store the data processed by Apache Spark on the Cloudera Hadoop cluster.
- Wrote multiple MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed formats.
- Loaded log data directly into HDFS using Flume.
- Leveraged AWS S3 as storage layer for HDFS.
- Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark (see the sketch after this list).
- Used Bitbucket as the code repository and frequently used Git commands (clone, push, and pull, among others) against the repository.
- Used the Hadoop Resource Manager to monitor the jobs run on the Hadoop cluster.
- Used Confluence to store the design documents and the STMs
- Met with business and engineering teams on a regular basis to keep requirements in sync and deliver on them.
- Used Jira to track the stories worked on under the Agile methodology.
- Involved in creating various regression and classification models using scikit-learn estimators such as Linear Regression, Decision Trees, and Random Forest.
- Involved in creating machine learning models with hyperparameter tuning on test content, useful for making better decisions regarding the products.
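A condensed sketch of the PySpark and Spark SQL flow described above: decode raw JSON from S3, apply STM-style transformations, and persist a partitioned Parquet Hive table. The S3 path, column names, and target table name are hypothetical placeholders.

```python
# Condensed PySpark sketch: JSON in, STM-style transforms, partitioned Parquet Hive table out.
# The S3 path, column names, and table name are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("stm-transform")
         .enableHiveSupport()
         .getOrCreate())

# Decode raw JSON objects from S3 into a DataFrame
raw_df = spark.read.json("s3://example-bucket/landing/events/")

# Apply STM-style transformations (illustrative column logic only)
clean_df = (raw_df
            .withColumn("event_date", F.to_date("event_ts"))
            .withColumn("amount", F.col("amount").cast("double"))
            .dropDuplicates(["event_id"]))

# Persist as a partitioned Parquet Hive table
(clean_df.write
 .mode("overwrite")
 .format("parquet")
 .partitionBy("event_date")
 .saveAsTable("analytics.events_curated"))
```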
Environment: Spark, Hive, Pig, Flume, IntelliJ IDE, AWS CLI, AWS EMR, AWS S3, REST API, shell scripting, Git, PySpark, Spark SQL, Spyder IDE, Tableau.
Confidential
Python Developer/Data Analyst
Responsibilities:
- Developed workflows triggered by events from other systems.
- Developed easy-to-use documentation for the frameworks and tools developed, for adoption by other teams.
- Developed Hive UDFs and Pig UDFs using Python in Microsoft HDInsight environment.
- Implemented end-to-end systems for Data Analytics, Data Automation and customized visualization tools using Python, R, Hadoop and MongoDB.
- Used Pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Keras, TensorFlow, OpenCV, and PyTorch in Python for developing various machine learning algorithms.
- Performed data profiling to merge the data from multiple data sources.
- Worked on different file types (CSV, JSON, Excel) for data cleaning and data analysis.
- Used Python for statistical operations on the data and ggplot2 for visualizing it.
- Worked with several use cases like campaign sales analysis, forecasting sales, KPI analysis.
- Managed offshore projects and coordinated work for 24-hour productivity cycle
- Designed and developed horizontally scalable APIs using Python Flask (see the sketch after this list).
- Experience developing entire frontend and backend modules using Python on the Django and Flask web frameworks.
- Worked on development of SQL and stored procedures on MySQL, and with SQLAlchemy.
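A minimal sketch of a horizontally scalable Flask API as described above. The endpoint paths, payload fields, and port are illustrative assumptions; persistence via SQLAlchemy is only noted in a comment.

```python
# Minimal stateless Flask API sketch; routes and payload fields are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/v1/health", methods=["GET"])
def health():
    # Stateless health check so any instance behind a load balancer can answer
    return jsonify(status="ok")

@app.route("/api/v1/scores", methods=["POST"])
def create_score():
    payload = request.get_json(force=True)
    # In a real service this would persist via SQLAlchemy; echoed back here
    return jsonify(received=payload), 201

if __name__ == "__main__":
    # Each instance is stateless, so scaling out is a matter of adding workers
    app.run(host="0.0.0.0", port=5000)
```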
Environment: Python, JavaScript, Django Framework 1.3, Flask, HTML, CSS, SQL, MySQL, LAMP, jQuery, Apache web server, SQLAlchemy.
Confidential
ETL/Informatica Developer
Responsibilities:
- Analyzed requirements from business users.
- Performed data analysis for each requirement and provided source-to-target mapping rule documents.
- Performed data validation/profiling by writing complex SQL queries joining several tables.
- Identified the source-to-target mapping attributes across different source systems.
- Designed data models to support user's business requirements.
- Designed and developed complex aggregate, joiner, and lookup transformation rules (business rules) to generate consolidated (fact/summary) data identified by dimensions using Informatica PowerCenter ETL.
- Used the Slowly Changing Dimensions wizard (Type 2) to update the data in the target dimension tables (see the sketch after this list).
- Created sessions, database connections and batches using Informatica Server Manager/Workflow Manager.
- Optimized mappings, sessions/tasks, source, and target databases as part of the performance tuning.
- Configured the server and email variables using Informatica Server Manager/Workflow Manager.
- Used all types of caches like dynamic, static and persistent caches while creating sessions/tasks.
- Used Metadata Reporter to run reports against the repository.
- Designed the physical structures necessary to support the logical database design.
- Designed processes to extract, transform, and load data to the Data Mart.
- Involved in developing Informatica mappings using PowerCenter Designer and Server Manager/Workflow Manager to create the sessions, and performed extensive testing and data cleansing.
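The Type 2 slowly-changing-dimension pattern applied by the wizard above can be illustrated with a small pandas sketch. The actual work was done in Informatica PowerCenter; the table and column names below are hypothetical.

```python
# Illustrative pandas sketch of Type 2 SCD logic (expire changed rows, append new versions);
# the real implementation used the Informatica SCD wizard, and these columns are hypothetical.
import pandas as pd

dim = pd.DataFrame({
    "customer_id": [1, 2],
    "city": ["Austin", "Dallas"],
    "current_flag": ["Y", "Y"],
    "effective_date": ["2020-01-01", "2020-01-01"],
    "end_date": [None, None],
})
incoming = pd.DataFrame({"customer_id": [1], "city": ["Houston"]})
load_date = "2021-06-15"

# Find dimension rows whose tracked attribute changed
merged = dim.merge(incoming, on="customer_id", how="inner", suffixes=("", "_new"))
changed_ids = merged.loc[merged["city"] != merged["city_new"], "customer_id"]

# Expire the current versions of the changed rows ...
expire_mask = dim["customer_id"].isin(changed_ids) & (dim["current_flag"] == "Y")
dim.loc[expire_mask, ["current_flag", "end_date"]] = ["N", load_date]

# ... and append the new versions as the current rows
new_rows = incoming[incoming["customer_id"].isin(changed_ids)].assign(
    current_flag="Y", effective_date=load_date, end_date=None
)
dim = pd.concat([dim, new_rows], ignore_index=True)
print(dim)
```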
Environment: Informatica PowerCenter 8.x (Repository Manager, Designer, Workflow Monitor, Workflow Manager), SQL Server, Netezza 4.2, SQL, PL/SQL, UNIX.