Big Data Solution Designer Resume
SUMMARY
- 10+ years of experience in Designing and Implementing Customer Intranet and Extranet Portal solutions for various clients in different sectors.
- More than Four years of experience in Hadoop Development/Administration built on six years of experience in Java Application Development.
- Good knowledge of and SME-level expertise in the Hadoop ecosystem.
- Experienced in working with Big Data and the Hadoop Distributed File System (HDFS).
- Three years of experience installing, configuring, and testing Hadoop ecosystem components.
- Hands-on experience working with ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie.
- Strong knowledge of Hadoop, Hive, and Hive's analytical functions.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Efficient in building Hive, Pig, and MapReduce scripts.
- Implemented proofs of concept on the Hadoop stack and different big data analytic tools, including migration from different databases (i.e., MySQL, Cassandra) to Hadoop.
- Successfully loaded files into Hive and HDFS from MongoDB, Cassandra, and HBase.
- Loaded datasets into Hive for ETL operations.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Experience in using DbVisualizer, ZooKeeper, and Cloudera Manager.
- Worked in different areas including Collaboration, Document Management, Portal and Web Content Management.
- Expert in SharePoint Governance and used Best Practices for numerous client requirements.
- Implemented Taxonomy and Folksonomy for clients
- Well-rounded experience across the full software development life cycle.
- Excellent analytical, problem solving, and communication skills.
- Design and development skills for multi-tiered web applications; experienced in developing and implementing web applications using VB.NET and Visual Studio .NET; Brainbench Certified for ASP.NET.
- Microsoft Certified Professional
- Certified Enterprise Architect from University of Toronto.
- Certified Hadoop Developer and Administrator.
TECHNICAL SKILLS
Programming Languages: .NET, VB.NET, C#.NET, ASP.NET, JAVA, J2EE, C++, C, Visual Basic 6.0, Python, R
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Giraph, Tableau, SAS.
UNIX Tools: Apache, Yum, RPM
Web Technologies: ASP.NET, SharePoint Apps, AJAX, jQuery, InfoPath 2010, JSP, PHP, XML, XSLT, XPath, VBScript, JavaScript, Unified Access Gateway (UAG), BHOLD, SPML, ADFS, Bash
Database Technologies: SQL Server 2005/2008, SQL Server Reporting Services, Forefront Identity Management 2010, Report Builder 3.0, Crystal Reports, Oracle 8i/9i/10g, DB2, MySQL, Cassandra, MongoDB
Operating Systems: Linux AS3.0, Solaris, WINDOWS NT/2000/XP/2003
Enterprise Applications: Forefront Identity Management 2010, Unified Access Gateway, Documentum, MQ Series, Siebel, eRoom, MS Office 2003/2007, MS FrontPage 2003
Web Services: SOAP, DAML-S, BPEL4WS, FIM Web Service, Kerberos
Source Control Systems: Visual Source Safe, Subversion, PVCS, ClearCase, Team Foundation Server.
Application Servers: Tomcat, BEA WebLogic, IBM WebSphere, MKS
PROFESSIONAL EXPERIENCE
Confidential
Big Data Solution Designer
Responsibilities:
- Leveraged Hadoop and a Cloudera cluster to perform large-scale data extraction and ingestion into a data lake
- Applied regression and reinforcement learning techniques on projects for predictive analytics
- Implemented neural network use cases using CNNs and RNNs.
- Used Python/PySpark and NoSQL databases to create models and predictions for the data.
- Converted SAS POC code into Python/PySpark for data analysis on legacy data
- Performed predictive modeling and feature selection on large datasets in HDFS
- Built sentiment analysis for chatbots in Scala and Python.
- Developed a Scala project to profile pdsium entities and flow them into the HDFS data lake
- Analyzed Hadoop clusters using the Big Data ecosystem and its tools, including Hive, Spark, and MapReduce
- Conducted in-depth research on Hive to analyze partitioned and bucketed data
- Leveraged Sqoop to import data from RDBMS into HDFS
- Collaborated on Java projects to bring data from various sources into Hive/HDFS.
- Used CI/CD with Jenkins and Agile methodologies on the projects
- Developed an ETL framework using data warehousing and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
- Performed cleaning and filtering on imported data using Hive and MapReduce.
- Exported an HDFS Parquet file of customer data from HDFS to MySQL.
- Analyzed web/application real-time log data and streamed it to HDFS using Kafka/Spark Streaming (a sketch follows this list).
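A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS log flow described above; the broker address, topic name, and HDFS paths are illustrative placeholders, not values from the original projects.

```python
# Hypothetical sketch: stream application logs from Kafka into HDFS as Parquet.
# Broker, topic, and paths are placeholders; the spark-sql-kafka connector must be on the classpath.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("log-ingest-sketch")
         .getOrCreate())

# Read the raw log stream from Kafka.
logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "app-logs")
        .load()
        .selectExpr("CAST(value AS STRING) AS line", "timestamp"))

# Land the stream in the data lake as Parquet, checkpointing for fault tolerance.
query = (logs.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/lake/raw/app_logs")
         .option("checkpointLocation", "hdfs:///tmp/checkpoints/app_logs")
         .start())

query.awaitTermination()
```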
Confidential
Responsibilities:
- Applied regression and reinforcement learning techniques on projects for predictive analytics
- Implemented neural network use cases using CNNs and RNNs.
- Used Keras and TensorFlow to model deep learning datasets and predict results for various financial and non-financial projects.
- Created option-plan financing simulation and prediction models using scikit-learn, PySpark, Hadoop, and supervised learning
- Created ML pipelines for analytics on datasets using regression and classification models such as logistic regression, k-means, and random forest.
- Used R, SAS, and Python for model creation, visualization, and data cleaning.
- Created machine learning pipelines in R and Python to build multiple models for predictive analysis and saved them with pickle (a sketch follows this list).
- Used PySpark and NoSQL databases to create models and predictions for the data.
- Used Excel modeling and simulation with @RISK and PrecisionTree to build decision trees for solving multivariate problems.
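A minimal scikit-learn sketch of the classification-pipeline-plus-pickle pattern mentioned above; the synthetic data and output file name are illustrative assumptions, not project artifacts.

```python
# Hypothetical sketch: build a classification pipeline and persist it with pickle.
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features, then fit a logistic regression classifier.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
print("holdout accuracy:", pipeline.score(X_test, y_test))

# Save the fitted pipeline for later scoring.
with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)
```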
Confidential
Big Data and IoT Engineer
Responsibilities:
- Leveraged Hadoop and HDP to perform large-scale data ingestion and data analysis
- Applied regression analysis and reinforcement learning techniques on projects
- Used Excel modeling and simulation with @RISK and PrecisionTree to build decision trees for solving multivariate problems.
- Completed a neural network POC for the client.
- Created ML pipelines in R to perform data cleaning and visualization, and used Python MLlib in Anaconda to create machine learning models and analyses.
- Derived logic from SAS analysis to create new SQL metadata structures on HDFS
- Converted SAS POC code into R and PySpark for data analysis on legacy data
- Performed predictive modeling and feature selection on large datasets in HDFS using PySpark
- Worked on multiple Data extraction and ingestion use cases.
- Worked on shell scripting and performed daily job troubleshooting and monitoring
- Good expertise in the Base SAS 9.1.3 reporting tool
- Analyzed Hadoop clusters using big data analytic tools including Hive, Spark, and MapReduce
- Conducted in-depth research on Hive to analyze partitioned and bucketed data
- Leveraged Sqoop to import data from RDBMS into HDFS
- Developed an ETL framework using Python and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
- Performed cleaning and filtering on imported data using Hive and MapReduce
- Exported an HDFS Parquet file of customer data from HDFS to MySQL (a sketch follows this list).
- Extensive experience in data ingestion technologies such as Flume and Kafka
- Used passwordless SSH to connect to various Dev and production environments.
- Imported multiple SQL/NoSQL database tables into HDFS with default delimiters using non-default file formats
- Employed security using an authentication and authorization system such as Kerberos.
- Provided guidance and techniques for cluster performance tuning and high availability.
- Developed Spark functions and MapReduce programs to parse the historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
- Created Hive queries along with Oozie workflows using shell scripts and Jenkins
- Created visualizations in MicroStrategy in the form of enterprise dashboards and reports
- Created SQL queries in MicroStrategy to filter out and optimize results for better datasets
- Wrote Hive UDFs to extract data from SQL databases as well as other RDBMS databases.
- Created test cases during two-week sprints using agile methodology.
- Used CI/CD with Git to run automated workflows for code deployment to all platforms
- Designed data visualization to present current impact and growth.
- Provided Hadoop training regarding installation, configuration, and rack awareness.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Organized content to optimize query performance.
- Analyzed web/application log data and streamed it to HDFS using Kafka.
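A minimal PySpark sketch of the Parquet-to-MySQL export mentioned above; the HDFS path, JDBC URL, table name, and credentials are illustrative placeholders.

```python
# Hypothetical sketch: export customer data stored as Parquet in HDFS to MySQL over JDBC.
# Path, URL, table, and credentials are placeholders; the MySQL JDBC driver must be on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-to-mysql-sketch").getOrCreate()

# Read the curated customer dataset from the data lake.
customers = spark.read.parquet("hdfs:///data/lake/curated/customers")

# Write it to a MySQL table, overwriting any previous load.
(customers.write
    .format("jdbc")
    .option("url", "jdbc:mysql://dbhost:3306/analytics")
    .option("dbtable", "customers")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .mode("overwrite")
    .save())
```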
Confidential
Senior Data Analyst/Data Scientist
Responsibilities:
- Leveraged Hadoop and HDP to analyze massive amounts of call data and identify and improve churn rates for the clients
- Analyzed Hadoop clusters using big data analytic tools including Pig, Hive, and MapReduce
- Conducted in-depth research on Hive to analyze partitioned and bucketed data
- Developed NiFi flows to automate the loading of data into HDFS and Hive for data pre-processing
- Architected 10-node Hadoop clusters with Hortonworks
- Successfully implemented HWX on a 15-node cluster in AWS
- Used logistic regression and other ML algorithms to provide churn prediction results and improve churn by 0.5 to 1% (a sketch follows this list).
- Worked on shell scripting and performed daily job troubleshooting and monitoring
- Worked on various data extraction and ingestion use cases for multiple projects.
- Leveraged Sqoop to import data from RDBMS into HDFS
- Experience in utilizing SAS functions, procedures, macros, and other SAS applications for data updates, data cleansing, and reporting.
- Skilled in merging SAS datasets, preparing data, producing and validating reports, applying SAS formats, and managing data.
- Migrated existing SAS ETL and analytics code to Spark.
- Developed an ETL framework using Python and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
- Performed cleaning and filtering on imported data using Hive and MapReduce
- Converted Python notebooks in Jupyter to PySpark notebooks using Pandas and other libraries
- Used R to create machine learning pipelines to clean and visualize data
- Exported an HDFS Parquet file of customer data from HDFS to MySQL.
- Extensive experience in data ingestion technologies such as Flume, Kafka, and NiFi
- Used Kafka, NiFi, and Flume to bring real-time and near-real-time data into HDFS and HBase from third-party sources and data lakes.
- Used the Scala build tool (sbt) with Spark to package applications.
- Used passwordless SSH to connect to various Dev and production environments.
- Imported multiple NoSQL database tables into HDFS with default delimiters using non-default file formats
- Employed security using an authentication and authorization system such as Kerberos.
- Provided guidance and techniques for cluster performance tuning and high availability.
- Developed Spark functions and MapReduce programs to parse the historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
- Created Hive queries along with Oozie workflows from a Java application using HQL libraries.
- Wrote Hive UDFs to extract data from NoSQL databases as well as other RDBMS databases using HQL.
- Created test cases during two-week sprints using Agile methodology.
- Designed data visualization to present current impact and growth.
- Developed Natural Language Processing and Machine Learning Systems using Python/C++.
- Created MapReduce programs in Python and ran them on CentOS using the shell.
- Provided Hadoop training regarding installation, configuration, and rack awareness.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
- Created AWS EMR and EC2 clusters to mimic on-premise HWX clusters
- Created roles and users in AWS using the dashboard
- Migrated 50 TB of data using AWS Direct Connect to AWS buckets and Redshift.
- Fixed any stale configurations on the cluster and cloud setup.
- Organized content to optimize query performance.
- Analyzed the web log data.
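A minimal Spark MLlib sketch of the logistic-regression churn model referenced above; the input path, feature columns, and label column are illustrative assumptions rather than the client's actual schema.

```python
# Hypothetical sketch: churn prediction with a Spark ML pipeline.
# The HDFS path and column names (usage_minutes, num_complaints, tenure_months, churned) are placeholders.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("churn-sketch").getOrCreate()

# Load pre-aggregated call/usage features from the data lake.
calls = spark.read.parquet("hdfs:///data/lake/curated/call_features")
train, test = calls.randomSplit([0.8, 0.2], seed=42)

# Assemble raw feature columns into a single vector and fit logistic regression.
assembler = VectorAssembler(
    inputCols=["usage_minutes", "num_complaints", "tenure_months"],
    outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="churned")
model = Pipeline(stages=[assembler, lr]).fit(train)

# Score the holdout set and report area under the ROC curve.
predictions = model.transform(test)
auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(predictions)
print("holdout AUC:", auc)
```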
Confidential
Senior Big Data Engineer/Consultant
Responsibilities:
- Provided Hadoop ecosystem guidance and topology, along with feature and upgrade best practices and SME support on product use, customization, and maintenance
- Installed and upgraded the Apache Hadoop ecosystem on a cluster using Ambari and the command line.
- Configured and memory-tuned Apache Spark/Scala jobs within Hortonworks.
- Applied regression analysis and reinforcement learning techniques on projects
- Used the incremental import technique to import various data formats.
- Used various legacy Java APIs and classes within new Scala/Spark functions.
- Exported an HDFS Parquet file of customer data from HDFS to MySQL.
- Extensive experience in data ingestion technologies such as Flume, Kafka, and NiFi
- Used Kafka, NiFi, and Flume to bring real-time and near-real-time data into HDFS and HBase from third-party sources and data lakes.
- Created various Oozie jobs in the Hortonworks distribution and performed error handling in Oozie for the same.
- Troubleshot various performance and memory management issues with Spark jobs.
- Tuned data structures and serialized RDD storage using Kryo serialization (a sketch follows this list).
- Used the Scala build tool (sbt) with Spark to package applications.
- Used passwordless SSH to connect to various Dev and production environments.
- Imported multiple NoSQL database tables into HDFS with default delimiters using non-default file formats
- Employed security using an authentication and authorization system such as Kerberos.
- Provided guidance and techniques for cluster performance tuning and high availability.
- Developed Spark functions and MapReduce programs to parse the historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
- Created Hive queries along with Oozie workflows using a Java application
- Wrote Hive UDFs to extract data from NoSQL databases as well as other RDBMS databases.
- Created test cases during two-week sprints using Agile methodology.
- Designed data visualization to present current impact and growth.
- Developed Natural Language Processing and Machine Learning Systems using Python/C++.
- Created ML pipelines in R to perform data cleaning and visualization, and used Python MLlib in Jupyter/Spyder to create machine learning models and analyses.
- Created MapReduce programs in Python and ran them on CentOS using the shell.
- Provided Hadoop training regarding installation, configuration, and rack awareness.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
- Fixed any stale configurations on the cluster and cloud setup.
- Organized content to optimize query performance.
- Analyzed the web log data.
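A minimal sketch of the kind of Spark memory and Kryo serialization tuning described above; the specific memory sizes, partition counts, and paths are illustrative assumptions, not the project's actual settings.

```python
# Hypothetical sketch: configure a Spark job with Kryo serialization and tuned memory settings.
# All values shown are placeholders chosen for illustration.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tuned-spark-job-sketch")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .config("spark.kryoserializer.buffer.max", "512m")
         .config("spark.executor.memory", "8g")
         .config("spark.executor.cores", "4")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

# Persist a frequently reused dataset so repeated actions avoid recomputation.
events = spark.read.parquet("hdfs:///data/lake/raw/events")
events.persist(StorageLevel.MEMORY_AND_DISK)
print(events.count())
```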
Confidential
Hadoop Data Engineer/Administrator
Responsibilities:
- Provided Hadoop ecosystem guidance and topology, along with feature awareness, upgrade best practices, and SME support on product use, customization, and maintenance in Hortonworks.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Replaced default Derby metadata storage system for Hive with MySQL system
- Executed queries using Hive and developed Map-Reduce jobs to analyze data.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS (a PySpark analogue is sketched after this list).
- Developed Pig UDFs to preprocess the data for analysis.
- Developed Hive queries for the analysts.
- Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, the HBase NoSQL database, and Sqoop.
- Imported and exported data in HDFS and Hive using Sqoop in Hortonworks.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed the data.
- Environment setup and configuration of Apache Hadoop was done using Cloudera Manager.
- Involved in loading data from LINUX and UNIX file system to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Shell and Pig.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Administrator for Pig, Hive, and HBase, installing updates, patches, and upgrades.
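A brief PySpark analogue of the Pig Latin web-server-log extraction described above (shown in Python rather than Pig for illustration); the log path, log pattern, and output location are assumptions.

```python
# Hypothetical sketch: parse web server access logs and land structured records in HDFS.
# This mirrors the Pig Latin extraction flow in PySpark; paths and the log pattern are placeholders.
import re

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("weblog-extract-sketch").getOrCreate()

# Common Log Format: host, timestamp, request, status, bytes.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\S+)')

def parse_line(line):
    """Return a structured row for a log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    host, ts, method, url, status, size = m.groups()
    return Row(host=host, timestamp=ts, method=method, url=url,
               status=int(status), bytes=0 if size == "-" else int(size))

raw = spark.sparkContext.textFile("hdfs:///logs/webserver/access_log*")
parsed = raw.map(parse_line).filter(lambda r: r is not None).toDF()

# Store the cleaned records for downstream Hive analysis.
parsed.write.mode("overwrite").parquet("hdfs:///data/lake/raw/web_logs")
```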
Confidential
Technical Consultant
Responsibilities:
- Led Hadoop upgrade projects for Confidential to create a portal with enhanced functionality using best practices.
- Created governance documentation for Hadoop administration sites post-migration to manage metadata
- Replaced default Derby metadata storage system for Hive with MySQL system.
- Developed Hive queries for the analysts.
- Managed Hadoop log files.
- Analyzed the web log data using HiveQL (a sketch follows this list).
- Provided Hadoop training regarding installation configuration and rack awareness.
- Supported code/design analysis, strategy development and project planning.
- Shared responsibility for administration of Hadoop, Hive and Pig
- Created design documents and provided expertise in Azure Proxy ADFS sync.
- Provided recommendations on best-practice functionality, translating business needs for various departments.
- Translated requirements into system-based functionality, and planned and executed projects from inception to implementation.
- Provided Hadoop ecosystem guidance and mentorship to various business analysts and the infrastructure team.
- Created a content migration POC from OpenText Livelink to the portal.
- Used various legacy Java APIs and classes within new Scala/Spark functions.
- Exported an HDFS file of customer data from HDFS to MySQL.
- Created Various Oozie jobs in Hortonworks distribution.
- Tested internal Claims/Kerberos-based authentication.
- Upgraded and migrated 100 GB of content with site collection artifacts to the new intranet
- Provisioned and administered sites using PowerShell scripts for site creation, restore, import/export and backup as an administrator.
- Used publishing best practices to create publishing templates and artifacts
- Organized content to optimize query performance.
- Maintained current industry knowledge of development concepts, best practices, and procedures for 2013/2016 solutions.
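A short sketch of the kind of HiveQL web-log analysis mentioned above, run here through SparkSession's Hive support; the web_logs table and its columns are illustrative assumptions.

```python
# Hypothetical sketch: HiveQL analysis of web log data via Spark's Hive support.
# The web_logs table and its columns (status, url, log_date) are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("weblog-hiveql-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Daily error counts by URL, restricted to HTTP 4xx/5xx responses.
errors_by_url = spark.sql("""
    SELECT log_date,
           url,
           COUNT(*) AS error_count
    FROM web_logs
    WHERE status >= 400
    GROUP BY log_date, url
    ORDER BY error_count DESC
    LIMIT 20
""")

errors_by_url.show(truncate=False)
```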
Confidential
Technical Consultant
Responsibilities:
- Led the upgrade projects for LSUC to create a data and extranet application with enhanced functionality using best practices.
- Created authentication for the Hadoop application using best practices.
- Created a governance plan for the Hadoop ecosystem and its applications.
- Performed installation, configuration, and migration of server setup, farm setup, and database setup along with datasets from various sources
- Extracted data from RDBMS into HDFS and HBase using HCatalog functionality; cleaned the data using Pig scripts and stored it in various locations (a sketch follows this list).
- Created a Java application for running scripts and queries to perform ETL on data lakes and data warehouses
- Used PowerShell commands to perform all administration-related activities such as site creation, backup, and restore to other site collections; changed settings via the admin portal and updated the SP portal.
- Monitored ULS and timer job logs to identify any issues and discrepancies from an administration perspective.
- Performed troubleshooting activities using SP logs and best practices.
- Provided technical guidance to the team in the technologies related to the development and support of an enterprise SharePoint environment
- Used jQuery, JavaScript, and Knockout.js to customize functionality on the site
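A brief PySpark analogue of the extract-and-clean flow described above (the original work used HCatalog and Pig); the JDBC connection details, column names, and target table are illustrative assumptions.

```python
# Hypothetical sketch: pull a table from an RDBMS over JDBC, clean it, and store it in Hive.
# Connection details, column names, and the target table are placeholders;
# the JDBC driver must be on the classpath and a "staging" Hive database is assumed to exist.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("rdbms-extract-clean-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Extract the source table over JDBC.
members = (spark.read.format("jdbc")
           .option("url", "jdbc:mysql://dbhost:3306/source_db")
           .option("dbtable", "members")
           .option("user", "etl_user")
           .option("password", "etl_password")
           .load())

# Basic cleaning: drop rows missing a key, deduplicate, and normalize email case.
cleaned = (members
           .dropna(subset=["member_id"])
           .dropDuplicates(["member_id"])
           .withColumn("email", F.lower(F.col("email"))))

# Store the cleaned data as a managed Hive table for downstream use.
cleaned.write.mode("overwrite").saveAsTable("staging.members_clean")
```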